Kinect Face Tracking — Results: Thesis Update #4


For background, see my three previous posts in this series:

  1. Facial Expression Analysis With Microsoft Kinect : Thesis Update #1
  2. Some Faces : Thesis Update #2
  3. Animation Units for Facial Expression Tracking : Thesis Update #3

My thesis has now been successfully completed and defended, and I am on break for the summer. I thought I would write a post to wrap up some loose ends and cover a few things I didn’t have a chance to talk about before.

In my last post, I discussed the significance of animation units to my facial tracking algorithm. The way I use these units is pretty simple, but it can lead to some complex classifications. The first step is to consider what the desired face looks like. Picture it in the mind’s eye. Say, for example, we want to recognize a surprised face. How would this look? Chances are, when thinking of a surprised face, the mind visualizes raised eyebrows, wide-open eyes, and a gaping mouth. The next task (and the bulk of my algorithm) asks: how do we translate these facial movements into the points tracked on our facial mesh?

We use animation units as “wrappers” for sets of points on the face plane. Instead of having to track and check many separate places on the eyebrows or mouth, for example, the animation units let us track the movement of each feature as a whole. Since each animation unit value lies between -1 and 1, we assign “bounds” to each expression: if the user’s features fall within those ranges, we assume the user is making that expression. At present these bounds are determined by hand, through extensive testing to see which values show up frequently when a given expression is made. It would not be difficult to build a classifier for these expression bounds and train it over many different faces and expressions, in order to get more accurate bounds for each type of face.
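The bounds check itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from my application: the animation unit names (`brow_raiser`, `brow_lowerer`, `jaw_lowerer`), the `classify` function, and every numeric bound below are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

# Maps an animation unit name to its allowed (low, high) range.
Bounds = Dict[str, Tuple[float, float]]

# Hypothetical bounds for two expressions. These numbers are placeholders
# for illustration only, not values measured in the thesis.
EXPRESSION_BOUNDS: Dict[str, Bounds] = {
    "surprised": {"brow_raiser": (0.3, 1.0), "jaw_lowerer": (0.4, 1.0)},
    "angry (closed mouth)": {"brow_lowerer": (0.3, 1.0), "jaw_lowerer": (-1.0, 0.1)},
}


def classify(aus: Dict[str, float],
             table: Dict[str, Bounds] = EXPRESSION_BOUNDS) -> Optional[str]:
    """Return the first expression whose bounds all contain the frame's values."""
    for expression, bounds in table.items():
        # Every listed unit must be present and inside its range.
        if all(au in aus and low <= aus[au] <= high
               for au, (low, high) in bounds.items()):
            return expression
    return None  # no expression matched this frame


frame = {"brow_raiser": 0.6, "brow_lowerer": 0.0, "jaw_lowerer": 0.7}
print(classify(frame))
```

Keeping the bounds in a table rather than in nested `if` statements makes it easy to add an expression or adjust a range without touching the matching logic.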

In my application, we track six different types of expressions.





kissing face




In addition, we look for two different types of angry faces: angry with a closed mouth (glaring at someone) or angry with an open mouth (as in yelling).

angry – closed mouth

angry – open mouth

To see a simple flowchart detailing some preliminary bounds for each expression (not exhaustive), check out the chart below.

ThesisLogic flowchart

There is a bit of “lag” in my application when recognizing these expressions, because the live video stream captures many frames each second and there is a small delay in figuring out which expression each frame’s data fits. As a result, the output of my program is still a bit inaccurate. Because it prints the detected expression on every frame, work can build up, and after a while the program starts showing expressions at a slight delay. For example, if the user acts surprised, the program sometimes will not print “surprised” until a fraction of a second afterwards, because it’s busy working through the frames as they come in.

A simple remedy would be to create a “buffer” of tracked points and use the average of the data over a few seconds to determine the facial expression. The camera is very sensitive, so the data changes at the slightest movement of the face; even sitting as still as possible results in small changes in the data. A buffer could also help with another problem I noticed: when the camera loses track of the face for only a moment, it spits out garbage data as it attempts to relocate the face.
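A buffer like this could look roughly as follows. This is a minimal Python sketch of the idea, not something implemented in my application: the `AUBuffer` class, the `tracked` flag (assuming the tracker can report whether it currently has the face), the unit name `jaw_lowerer`, and the window size are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


class AUBuffer:
    """Keeps the most recent tracked frames and reports their per-unit mean."""

    def __init__(self, size: int = 30) -> None:
        # A deque with maxlen drops the oldest frame automatically.
        self.frames: Deque[Dict[str, float]] = deque(maxlen=size)

    def add(self, aus: Dict[str, float], tracked: bool = True) -> None:
        # Skip frames captured while the face was lost, so garbage
        # values never enter the average.
        if tracked:
            self.frames.append(dict(aus))

    def average(self) -> Optional[Dict[str, float]]:
        if not self.frames:
            return None
        count = len(self.frames)
        # Assumes every frame reports the same set of animation units.
        return {au: sum(frame[au] for frame in self.frames) / count
                for au in self.frames[0]}


buffer = AUBuffer(size=3)
for value in (0.0, 0.5, 1.0):
    buffer.add({"jaw_lowerer": value})
buffer.add({"jaw_lowerer": -0.9}, tracked=False)  # ignored: face was lost
print(buffer.average())
```

The averaged values would then be checked against the expression bounds once per window instead of once per frame. The trade-off is that a longer window smooths more jitter but also reacts more slowly to a real change of expression.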

Overall, this is a good proof of concept of the capabilities of the Kinect Face Tracking API, and there is a lot of room for improvement in the future. Possible future additions/enhancements include:

  • tracking a wider range of expressions
  • wider range of accessibility (glasses/hats, children, elderly people)
  • more specific bounds for facial expressions, for example learned with neural networks or another classifier
  • more interactive application
  • use facial expression recognition to interface with other environments (e.g., call a command by smiling)
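On the “more specific bounds” item, a much simpler starting point than a neural network would be to derive each expression’s bounds from labeled recordings. The Python sketch below is one hypothetical way to do that: it takes the minimum and maximum value seen for each animation unit and widens them by a small margin. The `fit_bounds` function, the margin, and the sample values are assumptions made up for illustration, not measured data.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # one frame: animation unit name -> value


def fit_bounds(samples: List[Sample],
               margin: float = 0.05) -> Dict[str, Tuple[float, float]]:
    """Derive (low, high) bounds per unit from frames labeled as one expression."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    bounds = {}
    for au in samples[0]:
        values = [sample[au] for sample in samples]
        # Widen the observed range slightly, then clamp to the
        # valid animation unit range of -1 to 1.
        low = max(-1.0, min(values) - margin)
        high = min(1.0, max(values) + margin)
        bounds[au] = (low, high)
    return bounds


# Made-up frames recorded while several people held a surprised face.
surprised_frames = [
    {"brow_raiser": 0.45, "jaw_lowerer": 0.60},
    {"brow_raiser": 0.70, "jaw_lowerer": 0.98},
    {"brow_raiser": 0.55, "jaw_lowerer": 0.75},
]
print(fit_bounds(surprised_frames))
```

Recording frames from many different faces before fitting is what would make the resulting ranges less tied to one person.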

19 comments on “Kinect Face Tracking — Results: Thesis Update #4”

  4. no name says:

    Some useful tips:

    1. if using Kinect for Windows (not Kinect for Xbox), switch to seated mode + near mode
    2. try choosing a room with a dark, plain-colored background
    3. switch to a lower depth stream mode (the one with lower resolution)
    4. just capture every frame and post-process it later; try to avoid processing during the onFrameReady call
    5. C++ face tracking is faster than C# WPF… use threading and Dispatcher.Invoke(() => { /* code */ })
    6. avoid skeleton data while face tracking

  5. no name says:


    7. face AUs vary from person to person… try testing with various faces (i.e. multiple human test subjects) and get an approximate boundary
    8. try capturing the X, Y and Z-axis position for each frame, and use these as a reference to minimize errors

  6. Congratulations on your work. I’m working on facial expressions with Kinect too. Have you found any solutions to the problem of AU values differing from person to person?

    • I think you can use machine learning techniques to combine multiple “training sets” of faces and sort of normalize the values that are coming in.

    • no name says:

      go for easy not ML! science is not the solution to everything.. magic is

      use a struct or class that can store the AUs for each session.

      class class_AU
      { public List<float> AU_storage = new List<float>(); }

      and later, if you define something like
      Dictionary<string, class_AU>, things get much easier.. but this is just in-memory runtime storage.. you need a file or DB to make things permanent..
      and yes, since AUs are float values, just compare them with these:
      ">, <, == (this one is risky with floats.. so is the next one), !=, etc."

      • no name says:

        *dictionary with keys = person name and values = class_AU

      • What about when there is a person that you have not seen before and do not have any “background data” on them?

      • yes that definitely works for keeping a list of AUs over time but how do you plan to “normalize” this data across various faces when you find that a smiling AU value on one person is not the same as a smiling AU on another?

      • no name says:

        “On July 24, 2013 at 13:34 themusegarden said:

        What about when there is a person that you have not seen before and do not have any “background data” on them?”

        answer: have, say, 1-minute conversation sessions (multiple sessions) with the new person.. record the data in a list/dictionary or DB.. later mark the data as smiling/sad/happy etc. now this data can be used as a reference!


        “On July 24, 2013 at 13:33 themusegarden said:

        yes that definitely works for keeping a list of AUs over time but how do you plan to “normalize” this data across various faces when you find that a smiling AU value on one person is not the same as a smiling AU on another?”

        answer: that is a serious issue with the face tracking toolkit… i remember when i used to smile.. it gave sad face readings.. i guess the sensor could see the fake smile 😀
        pre-record each person’s data and mark it as sad/happy/smiling etc., then use this as a reference for each person.. alternatively you can define a range: something like 0.1 to 0.7 works for most people’s smiles instead of 0.0 to 1.0 (maybe.. for your test subjects, i.e. the people in focus, this may vary based on face/region/smile etc.)

      • So manually classify expressions at first and use this to guide further interactions with the kinect?


  7. Niko says:

    Hi Allison,
    I am currently working on my diploma thesis in computer science,
    which is about recognition of German Sign Language with the Kinect.
    While searching for existing concepts and solutions I found your blog and read
    the entries about your thesis. As emotional expressions are an
    important part of sign language, your approach sounds very interesting
    and is well suited to the face recognition part of my thesis.
    Is your thesis published somewhere so I could have a look at it
    and quote it in my thesis?
    Under which license did you publish your implementation
    (may I use / modify the code)?

    Kind regards

  8. Henry says:

    I am working on my thesis project, which detects facial emotion and uses it to control an application.
    I have already succeeded in detecting the points, but I still don’t understand how I can access each point individually.
    I have only just started learning C# and the Kinect SDK.
    Could you give me a clue?
    Also, would you mind sharing the code? I will cite you in my work.
    Thank you

  9. hoa le says:

    I’ve tried running your code; it’s a great job. I’ve also seen the data.txt file where you saved the results.
    I would like to ask: what should I do if I want to save all of the AU values and labeled emotions from the start to the end of tracking?
