Below are journal entries describing my progress as I attempt to complete my research project.
My first week has now come to a close. For the most part I have yet to start on my actual research. Instead I have been learning the basics of working with Arduinos and Processing by making servos move based on the position of a face in the webcam feed. At first I had no webcam to work with, so for the first few days I read up on facial tracking, robots with expressive faces, interactive robots, and human-like robots.
Once the webcam arrived I ran into some problems making the servo move the way I wanted, which were easily resolved with the help of the graduate students at the RHCLab. Then I hit a problem with communication between Processing and the Arduino board: I could not figure out how to properly transfer an int value from Processing to the Arduino to use for rotating the servos. To work around the issue I instead transferred a char value (a single byte), did the computations I needed on the Arduino, and used the computed value to rotate the servos, solving my problem.
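The byte workaround reduces to a small rescaling computation on the Arduino side, sketched below. The function name and the 0-255 to 0-180 mapping are illustrative assumptions, not the exact code I wrote:

```cpp
// Illustrative sketch (not my exact code): Processing sends the face's
// horizontal position as a single byte (0-255), and the Arduino rescales
// it to a servo angle in degrees (0-180) before writing it to the servo.
int byteToServoAngle(unsigned char b) {
    // Integer rescale of the 0-255 byte range onto the 0-180 servo range.
    return (static_cast<int>(b) * 180) / 255;
}
```

Sending a single byte sidesteps the multi-byte ordering issues of sending an int over serial, at the cost of coarser resolution.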
I have now completed my second week here at Notre Dame. This week I have been working with AAM facial tracking. We ran into several problems attempting to install the software on the Mac I was working on. I tried resolving the issues on my own, which required compiling and linking the built OpenCV libraries on the Mac and then compiling the face tracker software against those libraries. Compiling OpenCV produced several compiler errors, and when no error was encountered the machine would simply freeze under the processing load. After getting help from my mentor, a graduate student, and the department's system manager, we decided that installing it on a Mac was simply too much of a hassle on our system, so we tried Linux instead. Linux turned out to be much better, although it required no less effort. After finally getting OpenCV properly installed we were able to compile the face tracker, and the real work began.
I spent most of my time understanding the source code for the face tracker software. Once I felt that I had a good understanding of it, I began editing the source code so I could print out the internal representation of the AAM model used in the software. This model is a set of triangles. Once I had access to the model I began working on a program that takes two such sets and prints out the differences between them. My hope was to use this program to measure changes from one facial state to another, thereby allowing me to define the AUs from the AAM. For the most part this does not seem feasible, as the changes in the individual points (that make up the triangles) were not significant enough to determine a change in facial expression.
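The comparison program works along these lines. This is a simplified sketch of the idea, with illustrative names and threshold, not the program itself:

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Simplified sketch of the set comparison: given the mesh vertices from
// two facial states, return the indices of the points that moved more
// than a threshold (in pixels).
std::vector<int> changedPoints(const std::vector<Pt>& a,
                               const std::vector<Pt>& b,
                               double threshold) {
    std::vector<int> moved;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        double dx = b[i].x - a[i].x;
        double dy = b[i].y - a[i].y;
        if (std::sqrt(dx * dx + dy * dy) > threshold)
            moved.push_back(static_cast<int>(i));
    }
    return moved;
}
```

The problem described above shows up here directly: the per-point displacements between expressions were too small, so no threshold reliably separated an expression change from noise.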
Due to the imminent release of a more accurate face tracking software, I spent most of my third week learning the differences between the previous tracker and the new one we will be working with. The new software uses a Constrained Local Model (CLM) to track the face in the video; CLM is based on the AAM algorithm. To increase the accuracy of the face tracker, the new software uses depth information to make up for inconsistencies in lighting conditions between test data and training data. Given its similarities to AAM, the steps needed to translate our CLM face tracker data into movements of the robot's face will be very similar. In my readings on the CLM model I came across a promising way to transform human actions into robot actions. In the coming days I will attempt to create and test a simple executable for this method that works on the CLM.
This is the fourth week of my research. There is not much progress to report, as I am still awaiting the release of the software. Most of my time this week has been spent looking through the AAM code, trying to better understand how all the pieces work together (since the CLM-Z tracker is based on this tracker). I decided to look once more into using the triangular mesh that AAM defines to determine the facial expression. This time, instead of comparing two sets, I am looking for a way to tell which facial expression is present based only on a single given set.
Due to the delayed release of the CLM-Z tracker, I have been rereading papers related to my research, in hopes of better understanding the pieces of the tracker and methods of analyzing facial expressions. So far I have found two possible methods of Action Unit detection that could be used to control the robot face. The issue is that neither of these methods uses AAM; rather, they use dense optical flow (the movement of objects across successive frames) and bilinear factorization. My hope was to fuse the two methods in some way that allows tracking of facial features, but given the computational requirements of AAM and of each AU detection method, this does not seem like a viable option. Instead, my best bet seems to be to work with the AAM parameters and determine which AUs are present from them. For this I need to better understand the purpose of the AAM parameters and how they are affected by the input image frame.
This is now my sixth week. This week I have focused on creating the program that recognizes and synthesizes the facial expressions. To do this I set up the FacePoser program from the Source SDK in Steam to use as a virtual robot face. I ran into many problems with the recognition part of the module. Some of the most difficult were factoring in the relative distance of the human face from the webcam and the relative position of the face when comparing the shape of the face between frames. I have not yet fully worked these out, but there are some simple ways of normalizing the shape so that the relative position is not a problem. I just need to make sure that the normalization doesn't adversely affect the calculations that need to be done in the synthesis portion.
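The normalization I have in mind is the standard translate-and-scale step: subtract the shape's centroid and divide by its RMS size, which removes both the face's position in the image and its distance from the camera. A sketch of that assumed approach (names are illustrative):

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Sketch of a simple shape normalization: translate the landmark shape
// so its centroid sits at the origin, then divide by its RMS size.
// Two shapes that differ only by image position or camera distance
// become identical after this step.
void normalizeShape(std::vector<Pt>& shape) {
    double cx = 0, cy = 0;
    for (const Pt& p : shape) { cx += p.x; cy += p.y; }
    cx /= shape.size();
    cy /= shape.size();

    double scale = 0;
    for (Pt& p : shape) {
        p.x -= cx;                      // remove position
        p.y -= cy;
        scale += p.x * p.x + p.y * p.y;
    }
    scale = std::sqrt(scale / shape.size());
    if (scale > 0)
        for (Pt& p : shape) { p.x /= scale; p.y /= scale; }  // remove distance
}
```

The worry mentioned above applies here: dividing by scale discards absolute pixel magnitudes, so any synthesis calculation that relies on them would need the scale factor carried along separately.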
I have recorded test videos for use later on. This greatly simplifies testing the code for the project and guarantees that conditions will be properly reproduced in case of a bug (at least for the input expressions). I recorded four 30-second videos for creating the .max files used in the translations, and another set of five one-minute videos is being used to test the translation. Lighting conditions are kept nearly identical throughout each set. Each of the five one-minute videos is translated using each of the four max files, which tests the effect the max files have on the final output. A special set of videos, each only 10 seconds long, was also recorded. These three videos cover raising the eyebrows, lowering them, and the two combined. The combined-movement video will be used to translate all three videos in the set, to test how isolating the maximum movements for certain flex points affects accuracy.
This week I wrote the code for translating the CLM-Z parameters into flex point movements for the Source SDK FacePoser program. The code is broken up into three pieces: Recognition, Synthesis, and I/O. The I/O class handles the file formatting for FacePoser. I haven't had a chance to test the program yet, and I still need to work out a way to get the timing of each shape as it is added to the flex point list. The timing needs to be accurate to the millisecond; ctime.h's clock() method seems promising, so I need to experiment with it. A few other issues remain. The first is that the face must be at a set angle to the camera; distance does not matter, as it is accounted for, and the general position of the face in the image also does not affect the program. Lighting and background need to be constant throughout the input videos to maximize tracker accuracy. Finally, the accuracy of the translation depends on the accuracy of the learned maximum pixel movement values for the CLM-Z parameters.
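The clock() idea boils down to something like this (a sketch, not my final code). One caveat worth noting: clock() measures processor time, not wall-clock time, so heavy per-frame computation will inflate the values it reports:

```cpp
#include <ctime>

// Sketch: milliseconds elapsed since 'start', using ctime's clock().
// clock() counts CPU ticks, which CLOCKS_PER_SEC converts to seconds;
// multiplying by 1000 first keeps millisecond resolution in integer math.
// Because this is processor time rather than wall time, computation
// between frames stretches the recorded timestamps.
long elapsedMillis(std::clock_t start) {
    return (std::clock() - start) * 1000L / CLOCKS_PER_SEC;
}
```

Each shape appended to the flex point list would then be stamped with elapsedMillis() measured from the start of the video.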
The .vcd files output from the sets of videos were created this week. I watched each animation with the corresponding video next to it to check accuracy, and noticed two major errors. The first is that the timing of the movements is wrong: the total animation length was approximately 90 seconds, but the video length was 60 seconds. This means that the overhead of the CLM and Recognition calculations slowed down the timing. The second error is that the eye area was portrayed inaccurately. The CLM tracker does not track the eyelids but rather the eye socket shape, so the animation would randomly have its eyes closed when no such action occurred in the video. There is nothing I can really do about this, as it is an issue with the tracker, not the translation. The rest of the animation was odd to look at because of the timing effect, but I could still tell it was highly inaccurate. I believe this has to do with the way I record the maximum movements. Face tilt affects overall point positions and therefore throws off the maxes that were recorded. Also, because some flex points affect the same area of the face, the maximum recorded for one flex point can affect the values of others.