Week 1

 

My research is concerned with the Automated Classification of Protein Crystallization Images.

 

Protein crystallization is at the very frontier of developing new ways to fight and overcome diseases such as cancer and malaria. Current approaches to fighting these diseases tend to rely on debilitating processes in which it becomes a race to see whether the human or the invading agent will be killed off first. One of the aims of crystallographers is to identify protein crystal structures that can act as inhibitors and thereby neutralize the invading parasite's ability to spread. To that end, crystallographers run an increasing number of experiments in which precipitates of different kinds are mixed at various temperatures and in different proportions. Such experiments are very costly to run. Modern automation processes in the preparation of precipitates, among others, yield hundreds of thousands of images daily. Identifying successful crystal growth is thus not only costly but also intensive in terms of human hours, and yet it is vital to the millions of people living with disease the world over.

 

To reduce the number of images that require human viewing, a subgroup of the vision group, led by Linda Shapiro at the University of Washington, developed an image classification subsystem. The team achieved an extremely low false negative rate of 2.9% while maintaining a tolerable false positive rate of 37.7%. While these results are unmatched by comparable systems in the crystallography literature, we hope to improve on the false positive rate by extending this research.

 

My part of the research will be implementing an algorithm to compute five texture measures from the co-occurrence matrix. From the five texture measures I hope to identify whether an area that contains connected segments has texture or not. We hypothesize that crystals will be found in areas containing connected segments that are not textured.
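To make the idea concrete, here is a minimal sketch of a co-occurrence matrix and five measures computed from it. It is written in plain Python rather than Matlab, and it is not the project's implementation. I have not yet settled which five measures to use, so the sketch assumes the ones commonly derived from a co-occurrence matrix: energy, entropy, contrast, homogeneity, and correlation. The function names, the displacement (dr, dc), and the tiny test images are all made up for illustration.

```python
import math


def cooccurrence(image, dr, dc, levels):
    """Normalized gray-level co-occurrence matrix for displacement (dr, dc).

    image is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs whose first pixel has
    value i and whose pixel displaced by (dr, dc) has value j.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("displacement leaves no pixel pairs")
    return [[n / total for n in row] for row in counts]


def texture_measures(p):
    """Five assumed texture measures of a normalized co-occurrence matrix p."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i > 0 and var_j > 0:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    else:
        # Correlation is undefined for a constant region; 0.0 is a
        # convention chosen for this sketch only.
        correlation = 0.0
    return {
        "energy": energy,
        "entropy": entropy,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "correlation": correlation,
    }


# Hypothetical 2x2 examples: a flat region and a checkerboard.
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
flat_measures = texture_measures(cooccurrence(flat, 0, 1, 2))
checker_measures = texture_measures(cooccurrence(checker, 0, 1, 2))
```

On these toy inputs the flat region gives maximal energy and zero contrast, while the checkerboard gives lower energy and nonzero contrast. That is the kind of difference I hope will separate untextured from textured areas, though where to draw the line on real images is still an open question.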

 

My background is in mathematics. I have done most of my programming in Java. However, Matlab comes with built-in functions for image analysis, and these make matrix manipulation less computationally expensive and less algorithmically complex. Hence, I will learn Matlab and implement my algorithms in it. Additionally, I will read Computer Vision by Linda G. Shapiro and George C. Stockman to inform the implementation of my algorithms.

 

This first week I read chapters 1, 2, 3, 5, and 7 of the Shapiro and Stockman book. I also implemented Canny and Sobel filters to compare their relative effectiveness at edge detection. The Canny filter appeared superior to the Sobel filter in the smoothness of the image it produced.
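For readers unfamiliar with these filters, here is a rough sketch of the Sobel step in plain Python. It is not the Matlab code I ran this week. It convolves the image with the two standard 3x3 Sobel kernels and thresholds the gradient magnitude. The function names, the threshold, and the tiny step-edge image are hypothetical. The Canny filter is not shown; it adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of a gradient step like this one, which is probably why its output looked smoother.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude at each interior pixel; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for kr in range(3):
                for kc in range(3):
                    pixel = image[r + kr - 1][c + kc - 1]
                    gx += SOBEL_X[kr][kc] * pixel
                    gy += SOBEL_Y[kr][kc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude reaches threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(image)]


# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = edge_map(step, threshold=20)
```

In this toy image the two interior columns on either side of the step are marked as edge pixels, so the detected edge is two pixels wide. Thick responses like this are one thing Canny's non-maximum suppression is designed to thin out.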

 

My biggest obstacle was learning the syntax of Matlab. It took me a long time to remember that writing is the name for saving data to a file. But at the end of the week, having had a couple of successes implementing my filters, I realized that programming is like riding a bike: though one might get rusty at it, one never forgets! ;-)

 

This weekend I'm going to the Fremont street fair and also salsa dancing.