Final Report

I won an award from the Computing Research Association, sponsored by the NSF, that matches computer science undergraduates with professors to do research for a summer. I worked with Professor Margrit Betke and several of her graduate students in the Image and Video Computing group at Boston University. I worked mostly in the field of computer vision, with a focus on medical image analysis, on three main projects. The first project dealt with automatic methods to detect the lungs, and nodules within the lungs, in a CT scan of the chest. Next, I moved on to a project involving the development of a computer aided diagnosis system. My last substantial project of the summer entailed developing a method for finding medical images in a database that are similar to a given scan.

1. Background/Motivation

Lung cancer is the leading cause of cancer death in the United States, causing 160,000 deaths per year. Within five years, 85% of people with lung cancer will die. As with many other diseases, early detection improves the rates drastically. In Stage I, the detection and resection of pulmonary nodules improves the rate to 33%. The improvement in prognosis with early detection has led to the proposal of CT-based lung cancer screening and diagnostic image analysis systems. If screening were to take place, the volume of CT scans that radiologists would need to process would increase dramatically, and radiologists would be greatly aided by computer vision techniques.

2. Nodule Segmentation

As an introduction to the automatic nodule detection project, I used a computer aided diagnosis system written by a graduate student to identify lung borders. The next step was to help with the nodule segmentation. In a paper entitled "Multi-criterion 3D Segmentation and Registration of Pulmonary Nodules on CT: A Preliminary Investigation" that was refereed by the International Conference on Diagnostic Imaging and Analysis in August 2002, we compared the results of our segmentation methods with those of a radiologist. A nodule segmentation method that identifies nodules well usually uses characteristics such as shape, size, location, and density. The radiologist hand-marked nodules so that we could compare our results against them. The radiologist's nodule data came in a different form from the output of our segmentation algorithm, so to verify that our methods segmented nodules correctly we first needed to put that data into a form suitable for comparison. I took the data given to us by the radiologist and parsed it into the data structures that we used in our segmentation algorithm.
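The comparison step described above might be sketched as follows. This is only an illustration under stated assumptions, not the code used in the project: the text format of the radiologist's marks (one "x y z" centroid per line), the Nodule structure, the parse_marks and match_nodules helpers, and the 5 mm matching threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Nodule:
    # Centroid in millimetres; a hypothetical stand-in for the
    # data structure used by the segmentation algorithm.
    x: float
    y: float
    z: float


def parse_marks(text):
    """Parse hand-marked nodules, one 'x y z' centroid per line (assumed format)."""
    nodules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        x, y, z = (float(value) for value in line.split())
        nodules.append(Nodule(x, y, z))
    return nodules


def _distance(a, b):
    return dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def match_nodules(detected, marked, max_dist_mm=5.0):
    """Greedily pair each marked nodule with the nearest unused detection.

    Returns (pairs, missed, extra): matched pairs, marked nodules with no
    detection nearby, and detections that match no mark.
    """
    unused = list(detected)
    pairs, missed = [], []
    for mark in marked:
        best = min(unused, key=lambda d: _distance(d, mark), default=None)
        if best is not None and _distance(best, mark) <= max_dist_mm:
            pairs.append((mark, best))
            unused.remove(best)
        else:
            missed.append(mark)
    return pairs, missed, unused
```

With both data sets in the same structure, the matched pairs, missed marks, and unmatched detections give a simple summary of how well a segmentation agrees with the radiologist.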

3. Interface Development

Boston University has developed a computer aided diagnosis system, and the second part of my time here was spent working with that program. This software, when fully developed, has the potential to help radiologists diagnose certain types of lung disease. At this point the software is still in the developmental phase and, therefore, needs more work. I spent a few weeks adding features and fixing existing problems in order to make the program more user-friendly. I spent time sorting through the code to become more familiar with the software. I also thoroughly tested the software to help find existing problems. I was working in an unfamiliar development environment and with an unfamiliar language, but I still managed to fix the most pressing problems and add a few new features that will hopefully be valuable to our consulting radiologist.

4. Similarity Search

The last few weeks of my time here were spent developing a new project. The group that works on medical imaging here has not yet done much with medical image databases or data mining and would like to explore the many interesting topics that exist in this field. I began exploring the idea of a similarity search in an image database. The first step was to decide what makes two images similar. As a start, we decided to use only lung CT scans of patients with lung cancer. I have come up with two different equations that have the potential to measure the similarity of two images. The validity of both equations still needs to be tested. The first equation considers visual similarity and is based primarily on location. For example, if the query scan has a nodule in the middle of the left lung and a nodule in the middle of the right lung, the query should return scans with a nodule in the middle of the left lung and a nodule in the middle of the right lung. At first glance these scans should look similar. Instead of location, the second equation uses volume and shape as the principal measures of similarity. In this case the scans returned may not be as visually similar, but the results could perhaps show how important nodule volume and shape are in terms of patient diagnosis. Before these equations can be tested, the current database must be developed further. More data regarding nodule volume and shape needs to be collected as well.
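To make the difference between the two kinds of measure concrete, here is a minimal sketch. The two equations themselves are not given in this report, so the distance functions below are illustrative stand-ins rather than the ones I developed. The NoduleFeature structure, the normalized position within a lung, the use of sphericity as a shape descriptor, and the penalty for a missing nodule are all assumptions.

```python
from dataclasses import dataclass
from math import dist, sqrt


@dataclass(frozen=True)
class NoduleFeature:
    lung: str          # "left" or "right"
    position: tuple    # assumed normalized (x, y, z) within the lung, each in [0, 1]
    volume_mm3: float  # assumed to be positive
    sphericity: float  # assumed shape descriptor; 1.0 means a perfect sphere


# Largest possible distance inside the unit cube, charged when the
# candidate scan has no nodule in the same lung as a query nodule.
MISSING_PENALTY = sqrt(3.0)


def location_distance(query, candidate):
    """Mean distance from each query nodule to the nearest candidate nodule in the same lung."""
    if not query:
        return 0.0
    total = 0.0
    for q in query:
        same_lung = [dist(q.position, c.position) for c in candidate if c.lung == q.lung]
        total += min(same_lung) if same_lung else MISSING_PENALTY
    return total / len(query)


def volume_shape_distance(query, candidate):
    """Mean best-match difference in relative volume plus sphericity; location is ignored."""
    if not query:
        return 0.0
    if not candidate:
        return float("inf")
    total = 0.0
    for q in query:
        total += min(
            abs(q.volume_mm3 - c.volume_mm3) / max(q.volume_mm3, c.volume_mm3)
            + abs(q.sphericity - c.sphericity)
            for c in candidate
        )
    return total / len(query)


def rank(query, database, distance):
    """Return scan ids from most similar to least similar under the given distance."""
    return sorted(database, key=lambda scan_id: distance(query, database[scan_id]))
```

Calling rank with location_distance favors scans whose nodules sit in the same places as the query's, while calling it with volume_shape_distance favors scans whose nodules have similar size and shape wherever they lie, so the same query can produce two different orderings of the database.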


As a side project I have been working on a webpage. One of the requirements for participants in the Computing Research Association's Distributed Mentor Project program is maintaining a webpage with a project description and a weekly journal describing our work. I had no prior experience with HTML, so I have been teaching it to myself on the side. After reading books, web tutorials, and the source of others' websites, I have been able to create a webpage of my own.
