For the past few months, I’ve been hard at work on my thesis on facial expression analysis with Microsoft Kinect. Unfortunately, my blog has suffered quite a bit in the wake of everything I’ve had going on, but I’m trying to post a few new things as the school semester winds down.
The project is making progress, and I am trying to prepare a final product to present. As I described in my last thesis update, I was able to test a sample application that overlays a 3D mesh on the user’s face and tracks the different points on the face. In particular, the tracker reports a set of values called Animation Units, or AUs, which are essential to the expression recognition algorithm. They are defined based on the Candide3 model, and the different values are delineated on Microsoft’s API page about the Kinect Face Tracking SDK. There are six AUs, each ranging from -1 to 1, that describe the movement and placement of basic facial features such as the eyes, eyebrows, and mouth. The table of values for the AUs is reproduced below (from Microsoft’s Kinect API page):
|AU Name and Value|AU Value Interpretation|
|---|---|
|AU0 – Upper Lip Raiser|0 = neutral, covering teeth; 1 = showing teeth fully; -1 = maximal possible pushed down lip|
|AU1 – Jaw Lowerer|0 = closed; 1 = fully open; -1 = closed, same as 0|
|AU2 – Lip Stretcher|0 = neutral; 1 = fully stretched (joker’s smile); -1 = fully rounded (kissing mouth)|
|AU3 – Brow Lowerer|0 = neutral; -1 = raised almost all the way; 1 = fully lowered (to the limit of the eyes)|
|AU4 – Lip Corner Depressor|0 = neutral; -1 = very happy smile; 1 = very sad frown|
|AU5 – Outer Brow Raiser|0 = neutral; -1 = fully lowered as a very sad face; 1 = raised as in an expression of deep surprise|
Using these values, I was able to determine a set of bounds on the AUs for each expression in my application. However, I learned quickly that the extremal values of -1 and 1 are just that: extremal. Even sitting in front of the camera making the most exaggerated faces I could, it was very difficult to get above 0.5 or 0.6 in some areas. In addition, the eyebrow data was almost always inaccurate in my testing because I wear glasses, which confused the camera. The Kinect saw the top of my glasses as my eyebrows and thus registered very little eyebrow movement. When I took my glasses off, placement and tracking returned to normal, but then I could not see the data being processed.
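To give a rough idea of what bounds on the AUs can look like, here is a minimal Python sketch. It is not the code from my thesis application: the expression names, the `Bound` and `classify` helpers, and every numeric threshold are assumptions made up for illustration. The thresholds are deliberately set well below 1, since the measured values rarely get past 0.5 or 0.6.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Bound:
    """Inclusive range that one AU (by index, AU0..AU5) must fall in."""
    au: int
    low: float
    high: float

    def contains(self, aus: Sequence[float]) -> bool:
        return self.low <= aus[self.au] <= self.high


# Hypothetical bounds, for illustration only. AU indices follow the table:
# 1 = jaw lowerer, 2 = lip stretcher, 3 = brow lowerer,
# 4 = lip corner depressor, 5 = outer brow raiser.
EXPRESSION_BOUNDS = {
    "surprised": [Bound(1, 0.3, 1.0), Bound(5, 0.2, 1.0)],
    "happy": [Bound(2, 0.2, 1.0), Bound(4, -1.0, -0.2)],
    "sad": [Bound(4, 0.2, 1.0), Bound(5, -1.0, -0.1)],
    "angry": [Bound(3, 0.3, 1.0), Bound(4, 0.0, 1.0)],
}


def classify(aus: Sequence[float]) -> str:
    """Return the first expression whose bounds all hold, else 'neutral'."""
    if len(aus) != 6:
        raise ValueError("expected six AU values (AU0..AU5)")
    # Bounds may overlap; dictionary order decides which expression wins.
    for name, bounds in EXPRESSION_BOUNDS.items():
        if all(b.contains(aus) for b in bounds):
            return name
    return "neutral"
```

With these made-up numbers, an open jaw plus raised outer brows (`classify([0.0, 0.5, 0.0, 0.0, 0.0, 0.4])`) comes out as surprised, and a frame of all zeros comes out as neutral. Real bounds would have to be tuned per person and per camera setup.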
Another unforeseen problem was that the camera is perhaps a bit too sensitive. Since it tracks many frames every second, there will be some variation in the tracked points even if you sit as still as possible, and if the camera loses track of the face entirely, the points can really go haywire. Keeping the points within specific bounds is therefore more difficult than expected: if the camera loses track of the face, the application can say you went from surprised to sad to angry in less than a second. It will probably not get done in this version of the project, but in the future it would be useful to store a buffer of points, perhaps covering a second or so, to stabilize the constantly changing data and provide more time for processing before the frame changes completely.
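The buffering idea could look something like the Python sketch below. This is only one possible approach (a simple moving average), not something I have built into the project. The `AUSmoother` class, the `tracked` flag (a stand-in for whatever the tracker reports when it loses the face), and the default window of 30 frames (about one second if the camera delivers roughly 30 frames per second) are all assumptions for illustration.

```python
from collections import deque
from typing import List, Optional, Sequence


class AUSmoother:
    """Moving average over the most recent successfully tracked frames."""

    def __init__(self, window: int = 30, num_aus: int = 6) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.num_aus = num_aus
        # deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=window)

    def update(self, aus: Optional[Sequence[float]],
               tracked: bool = True) -> Optional[List[float]]:
        """Feed one frame; return the averaged AUs, or None if nothing is buffered."""
        if tracked:
            if aus is None or len(aus) != self.num_aus:
                raise ValueError("expected %d AU values" % self.num_aus)
            self.frames.append(tuple(aus))
        # Frames where tracking was lost are ignored, so haywire values
        # never enter the average.
        if not self.frames:
            return None
        n = len(self.frames)
        return [sum(f[i] for f in self.frames) / n for i in range(self.num_aus)]
```

Each new frame goes through `update`, and the averaged values are what get compared against the expression bounds. A longer window is steadier but reacts more slowly to a real change of expression, so the window size is a trade-off.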
Overall I’ve certainly learned a lot throughout the course of this project and I would like to continue to work on it more throughout the summer. I will post a few more thoughts on the process and results as I carry on and wrap things up. I’ll have a copy of my thesis online at some point too.
In other news, I’m going to be starting the Master’s program in Computer Science at Appalachian State University in the fall, and I’m pretty excited about it. I plan to do research on algorithms and graph theory.
UPDATE: See the next post in this series here: Kinect Face Tracking — Results : Thesis Update #4