Week One: Talking to Android
Week Two: Learning about OCR
After getting acquainted with the Android platform, I spent this week learning about optical character recognition (OCR), the intermediate step between taking a picture with a phonecam and translating its text content. OCR has been around for a while, particularly for generating an editable version of a printed, scanned document, but OCR for text in photographs is a newer, more difficult nut to crack. Since computers can't "see" images - unlike humans, they can't recognize patterns amidst input colors of light - the process of recognizing alphabet-like patterns within a large, complex image is rather involved. While tesseract, the OCR engine I found last week, already does this to some extent, it is streamlined for analyzing scanned documents, so to make tesseract work with phonecam photos I have to do a bit of preprocessing to reduce the rotation and skew introduced by human (often novice) photographers.
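(My actual preprocessing lives in MATLAB, and the function names below are my own invention - but for a flavor of what deskewing involves, here's a rough Python sketch of one classic approach: try candidate angles and keep the one where the "ink" piles up most sharply into horizontal rows.)

```python
import math

def row_profile_variance(img, angle_deg):
    """Score a candidate skew angle: rotate each ink pixel's coordinates
    and measure how sharply the ink concentrates into horizontal rows."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = {}
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:  # foreground ("ink") pixel in a binary image
                ry = round(-x * sin_t + y * cos_t)  # row index after derotation
                rows[ry] = rows.get(ry, 0) + 1
    counts = list(rows.values())
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)

def estimate_skew(img, candidates=range(-10, 11)):
    """Return the candidate angle (degrees) with the most peaked row profile."""
    return max(candidates, key=lambda a: row_profile_variance(img, a))
```

Once the angle is known, rotating the image by its negation squares the text lines up for tesseract.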
Week Three: Text Detection Woes
I devoted the start of this week to that text detection algorithm, to make sure I had refined it enough for use with tesseract and any other preprocessing I would decide to do. It was a bit of a struggle - I have been writing primarily in MATLAB, and though I had written some small MATLAB scripts and functions for a one-hour course I took last semester, I had never written anything on this scale. There are so many great ways to manipulate matrices efficiently in MATLAB, and it's been taking me a while to shift into that way of thinking about programming. (Surprisingly, I feel like I've picked up the mathematical concepts really quickly. Mathematical morphology is pretty great when someone else has already implemented its operations. ^_~)
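(For anyone who hasn't met mathematical morphology: the building blocks really are simple, which is part of the charm. In MATLAB I just call the Image Processing Toolbox's imdilate and friends; here's a toy Python sketch of binary dilation to show what's happening underneath.)

```python
def dilate(img, se):
    """Binary dilation: an output pixel turns on if the structuring
    element, centered on it, overlaps any foreground pixel of img."""
    h, w = len(img), len(img[0])
    cy, cx = len(se) // 2, len(se[0]) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hit = False
            for j, se_row in enumerate(se):
                for i, s in enumerate(se_row):
                    yy, xx = y + j - cy, x + i - cx
                    if s and 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
                        hit = True
            out[y][x] = 1 if hit else 0
    return out
```

Dilation followed by erosion (a "closing") is what merges nearby letter blobs into word- and line-sized regions - the heart of morphological text detection.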
Week Four: More Fun with Image Preprocessing
Yet another week devoted to making amateur photos of text "readable" by the machine. After getting my text detection algorithm into good working condition - though I will be refining it more once all these preprocessing puzzle pieces exist to be put together - I have been working on "simpler" concepts like autorotating text, correcting perspective, and reducing blur. During my early research two weeks ago I stumbled on a great image preprocessing paper whose autorotation and perspective correction algorithms I thought I'd implement. In general, they were a great starting point, but since I don't want my program to make any assumptions about how text looks - other than that it is originally printed horizontally, no matter the skew a photograph puts on it - I've had to make several adjustments.
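(I won't reproduce the paper's algorithms here, but for context: perspective correction is usually expressed as warping the image through a 3x3 homography, which you can recover from four corner correspondences. A minimal Python sketch of the warping step - the matrices below are just illustrative values, not anything from my code:)

```python
def apply_homography(H, pts):
    """Map (x, y) points through a 3x3 homography H (list of 3 rows).
    Points are treated as homogeneous coordinates: multiply by H,
    then divide by the third component."""
    warped = []
    for x, y in pts:
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        warped.append(((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
                       (H[1][0] * x + H[1][1] * y + H[1][2]) / w))
    return warped
```

The division by w is what makes it projective rather than affine - it's exactly the effect that squeezes the far edge of a tilted page, and undoing it is what straightens the text back out.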
Week Five: Blue Hens?! (or, What the Hell, UDel)
On Monday, Tuesday, and Wednesday this week I went to a "systems research mentoring workshop" (what a cumbersome name, though I can't think of a better way to describe it) sponsored by CRA and CDC at the University of Delaware. Basically, this was a grad school pep rally/boot camp. For half the program, the invited faculty and researchers presented hot topics in systems research - can't say it changed my mind about [not] doing systems work forever, but it's really fascinating stuff, so I was glad to learn more about it - and for the other half, they shared how to put together a strong grad school application and how to succeed in academia. They didn't have to convince me to go for my Ph.D. - I've always been pretty academic, plus I want people to call me "Dr." - but after being a little shaken up by how research-oriented they proposed we could be in undergrad, I'm pretty excited about applying and think I'll do well. The profs' favorite job perk (travel) resonated with me - I've had the travel bug since going to Japan last year - so I will suffer through (or revel in the nerdity of) the next x years of schooling to become a fancy jet-setting researcher. I'm really glad Margaret was cool with me working my schedule around this short break (though I admit, I'm not so thrilled about working longer days to make up for it!).
Week Six: Starting to finish?
This was the week I wanted to "finish up" my code (or rather, shift the majority of my time from writing code to testing it) and I feel like I've done about what I wanted. I realized this week that I could clean up my rotation algorithm even further, and I wrote a tiny piece of code to use the hit-miss transform (more morphology :P ) to remove some artifacts of using MATLAB to do the rotation. To MATLAB, image rotation is just a matrix transformation, but this simple process leaves sawteeth and other such debris on the edges of objects, so I figured it would be nice to smooth out the edges a bit. It's a minor adjustment, but in some cases it really helps tesseract pick out letters correctly.
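(In MATLAB this is a call to bwhitmiss with a pair of structuring elements; to show the idea itself, here's a toy Python sketch using my own pattern choice - a "hit" on the center pixel and a "miss" on all eight neighbors, which matches exactly the lone specks that rotation can scatter around.)

```python
def hit_miss_isolated(img):
    """Hit-or-miss transform for one specific pattern: the center pixel
    must be foreground (hit) and all eight neighbors background (miss),
    so it matches exactly the isolated foreground pixels."""
    h, w = len(img), len(img[0])
    matches = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            alone = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy or dx:
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
                            alone = False
            if alone:
                matches.append((y, x))
    return matches

def remove_isolated(img):
    """Erase every pixel the hit-or-miss pattern matched."""
    out = [row[:] for row in img]
    for y, x in hit_miss_isolated(img):
        out[y][x] = 0
    return out
```

Other structuring-element pairs pick out other shapes (single-pixel bumps, notches along an edge), which is how the sawteeth get smoothed without disturbing the letters themselves.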
Week Seven: Seeing Results
Though this week marked the start of my transition to testing, I started the week with a few meetings. On Monday, Margaret invited Sahar, the other DMP student, and me to have lunch with Jennifer Rexford, a professor in the CS department at Princeton. This was a particularly interesting meeting for me - the professor I work with back at Georgia Tech, Nick Feamster, has worked extensively with Dr. Rexford, so I've read a few of her papers in preparation for my own work. I enjoyed our informal little lunch - like so many of the faculty members I've met because of research, Dr. Rexford is fun and interesting on top of her intellectual merits - and I look forward to seeing her at events like SIGCOMM in the future.
Week Eight: I'll Let the Machine Do the Learning
After seeing that the original morphological text detection algorithm was not so robust to my fuzzy, lo-res natural scene images, I decided to spend a bit of time researching possible improvements. Recognizing text out of a background is a pretty classic example of a machine needing intelligence (though some odd few may argue that vision is not an intelligence-based task), so I wanted to come up with more features from my data set that could help my programs detect and isolate a pattern. I spent a few hours looking up signal processing and machine-learning techniques - do wavelet transforms, Gabor filters, or Markov random fields ring a bell for anyone? - and quickly realized that nothing so sophisticated could be implemented in the next two weeks for adequate testing. (It was really fun to read about though! I was particularly fascinated by this paper; though it was studied more from the angle of the human perception of randomness, I thought it may be interesting to try to model text and background textures using Markov random fields. A project for the future - if I ever have the free time. Ha!)
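(None of this made it into my code, but to give a flavor of the reading: a Gabor filter is just a cosine grating at some orientation, damped by a Gaussian envelope, and its responses make decent texture features for telling stroke-like patterns from background. A quick Python sketch of sampling one - the parameter values are arbitrary, not from any paper I read:)

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Sample a real-valued Gabor kernel: a cosine wave of the given
    wavelength, oriented at angle theta, under a Gaussian envelope
    of width sigma."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotate coords
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel
```

Convolving an image with a bank of these at several orientations and wavelengths gives, per pixel, a little feature vector describing the local texture - which is exactly the kind of input a classifier would want.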
Week Nine: The Code is Done!
I worked Weka's clustering techniques into my text candidate analysis this week, and after a bit of toying around I'd say I'm done with my code - well, done for the scope of the internship. Margaret and I have been discussing trying to publish some of this work, and I'm a bit conflicted about that - I've really liked my project and want to see the entire application take workable shape and not just the image processing stuff I've been focusing on here, but I'm still grappling with how to fit in everything I already have planned for this coming semester. I am applying to grad school this year, which is daunting in itself, and I already know of one paper my lab at Tech wants to submit (to NSDI! due in October! WE HAVE BARELY STARTED OH MY GOD) which will keep me completely busy. And to think I was planning on starting that work-life balance thing soon...
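(The clustering itself happens inside Weka, so its internals stay hidden from my code - but for anyone curious what an algorithm like Weka's SimpleKMeans is doing underneath, here's a small Python sketch of plain k-means. The feature vectors are invented for illustration; think of them as stand-ins for per-candidate-region measurements.)

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its cluster; repeat until done iterating."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # initialize centers at random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda ci: sum((a - b) ** 2
                                             for a, b in zip(p, centers[ci])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # guard against an emptied cluster
                centers[c] = tuple(sum(vals) / len(members)
                                   for vals in zip(*members))
    return centers, clusters
```

In the text candidate analysis, grouping regions by features like size and density lets the obviously-non-text clusters be thrown away wholesale instead of judging each blob in isolation.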
Week Ten: I Did It!
After nine days of working on it - and nine other weeks of work on the research behind it - I finished up my final report today. I'm mostly exhausted right now, but it feels pretty damn good to have finished. The writing process was actually less painful than I had thought it would be - I mean, it took a long time to draft all the content, but after learning to use LaTeX I was so happy to see my paper take shape and look professional. Again this week working remotely has gone well - though Margaret is really busy meeting deadlines of her own, she gave me lots of good feedback about my drafts (that Post-It feature in Adobe Reader is so handy for mark-ups!). So, very soon (i.e., as soon as I stop writing this entry and post my files) you can read my very own research paper, the first one I've ever written entirely by myself. I'm so proud :')