Week Three

Week Three: Text Detection Woes

I devoted the start of this week to that text detection algorithm, to make sure I had refined it enough for use with tesseract and whatever other preprocessing I decide to do. It was a bit of a struggle - I have been writing primarily in MATLAB, and though I had written some small MATLAB scripts and functions for a one-hour course I took last semester, I had never written anything on this scale. There are so many great ways to manipulate matrices efficiently in MATLAB, and it's been taking me a while to shift into that way of thinking about programming. (Surprisingly, I feel like I've picked up the mathematical concepts really quickly. Mathematical morphology is pretty great when someone else has already implemented its operations. ^_~)

My current version of the algorithm, which is slightly simplified from Hasan and Karam's and tweaked a little to fit my application (I tried to ease some of their character size restrictions, so my text detector will be more general), is working pretty nicely right now. It draws a reasonable box around what is most likely text (only the "loudest" noise and fuzziness seem to throw it off), and tesseract is recognizing some of my straighter, better-focused pictures almost perfectly. Text rotation and skew still seem to be a problem, as do fuzzy character edges, so those are my next priorities for preprocessing.
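To make the "box around what is most likely text" idea concrete, here is a minimal sketch in plain Python (not MATLAB). It is not Hasan and Karam's algorithm and not the actual detector described above. It only shows the general morphological trick, under the assumption that the input is already a binary edge map: dilate so neighboring characters merge into one blob, then take the bounding box of the largest blob. The names `dilate`, `largest_component_box`, and `detect_text_box` are made up for illustration.

```python
from collections import deque


def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Turn on every pixel in the square window around (y, x).
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def largest_component_box(img):
    """Return (top, left, bottom, right) of the largest 4-connected blob, or None."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best_size, best_box = 0, None
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            size = 0
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)
    return best_box


def detect_text_box(edges, radius=1):
    """Assumed pipeline: dilate the edge map, then box the biggest blob."""
    return largest_component_box(dilate(edges, radius))


if __name__ == "__main__":
    # Toy edge map: three thin "characters" one pixel apart, plus one noise pixel.
    edges = [[0] * 12 for _ in range(8)]
    for r in (2, 3):
        for c in (2, 4, 6):
            edges[r][c] = 1
    edges[7][11] = 1
    print(detect_text_box(edges))  # (1, 1, 4, 7)
```

In the toy example the three separate strokes fuse into a single region after dilation, while the lone noise pixel stays a small blob of its own and loses out on size. Note that the returned box includes the padding added by the dilation. A real detector would need more than this (size and aspect-ratio filters, for one), and "loud" noise that forms a bigger blob than the text would fool this sketch completely.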

I needed a little break from the heavy scripting in the middle of the week, so I made a lot of progress on the structure of my Android application. I've set up the user interface for taking a picture and sending it to my processing server, which right now can run the text detection algorithm, run the output through tesseract, and send the "recognized" text back to the user. Since I had not finished the text detection algorithm when I started on the UI, I wrote an intermediate Android activity to allow the user to draw their own bounding box around the text they want translated. Learning the application and activity lifecycle from the Android documentation took a while, but after writing this slightly non-trivial mock-up of my application I feel really confident about developing Android applications. I haven't done much cell phone programming, but Android code feels a lot easier to write (once you learn it) than run-of-the-mill J2ME. It's hard to recommend learning it when there are no Android phones yet, but I am excited to see how it catches on once real devices can run it.
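The server-side flow described above (find the text box, crop to it, OCR the crop, send the text back, with a user-drawn box as the fallback) can be sketched roughly like this in Python. This is an assumption-heavy illustration, not the real server code: `process_image`, `crop`, and the injected `detect_box` and `run_ocr` callables are hypothetical names, the image is just a list of rows, and the OCR step is a stand-in rather than an actual tesseract call.

```python
def crop(image, box):
    """Cut out the inclusive (top, left, bottom, right) region of a row-major image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def process_image(image, detect_box, run_ocr, user_box=None):
    """Detect (or accept) a text box, crop to it, and return the OCR'd text.

    detect_box(image) -> (top, left, bottom, right) or None   (hypothetical)
    run_ocr(region)   -> str                                   (hypothetical)
    """
    # A box drawn by the user takes priority over automatic detection.
    box = user_box if user_box is not None else detect_box(image)
    if box is None:
        return ""  # nothing that looks like text
    return run_ocr(crop(image, box)).strip()


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any imaging or OCR libraries.
    image = [list(".........."), list("..HELLO..."), list("..........")]
    fake_detect = lambda img: (1, 2, 1, 6)
    fake_ocr = lambda region: " " + "".join("".join(row) for row in region) + "\n"
    print(process_image(image, fake_detect, fake_ocr))  # HELLO
```

Passing the detector and OCR step in as functions keeps the flow testable with fakes, and it makes the user-drawn box a one-argument override instead of a separate code path.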

This week outside of work was still hectic - Friday was the deadline for the camera-ready version of the SIGCOMM paper I was working on with my lab at Georgia Tech - but I managed to have a little fun. The Sunday before this week I went strawberry picking with Erika, which was fun and amazingly cheap for an organic farm. (Okay, they can't say they're organic since it takes a long time to be certified, but they don't use pesticides, which made Erika happy.) We got an enormous amount of berries - I used a quart for strawberry muffins, which I must say turned out famously ^_~, and we still had an entire gallon bag left. I'm still trying to keep my workout schedule of running 2-3 miles every other morning and going to yoga class the other days of the week, but I admit yoga is not easy. It would be great if they had "yoga for people who have never regularly exercised," because I'd fit right in. The classes I'm going to, though, are really cool about letting us decide for ourselves what our bodies can do, so while I spend my non-class days just trying to build some strength on easy poses, I still feel comfortable going to class and trying to soak in the hard stuff.

Mostly, I'm looking forward to the weekend, when I go to Philadelphia to see The Roots and Gnarls Barkley in concert with my boyfriend. His sister lives in Philly, so we'll get to see her and her adorable one-year-old daughter too. Weekend trips up here are going to be awesome - the states and important cities are much closer together than they are back home :)