# Research Journal

**Week 1 (June 18 - 24)**

I think I met a new friend outside my building, too. We got into an animated discussion about Russia and the USSR and stayed out so late that I had to walk her home - she lives in an apartment building nearby. I really hope to see her again - even though she's younger than me, we're really well matched when it comes to knowledge and interests.


**Week 2 (June 25 - July 1)**

I spent much of this week with *Data Mining: Practical Machine Learning Tools and Techniques*. The book explains logistic regression in more detail and also has a full section on using Weka, the machine-learning software we had agreed I would use for this project. The graduate student was also kind enough to teach me how to use Weka, and I ran some simple tests using Weka's built-in data sets.

Professor Dutta was not feeling well this week, so our meeting was cancelled. However, she sent me an assignment by email: three papers on the subject of seizure prediction and detection to read and summarize. I will post their titles and my summaries as I complete them.

I took the subway to Brooklyn this week to visit an old school friend of mine. It was really nice to see her again and have fun together like we used to. Brooklyn is really different from Manhattan, but interesting too.


**Week 3 (July 2 - 8)**

As I'd hoped, Dr. Dutta and I discussed the two papers I had read. She highlighted several points that are particularly relevant to my project, briefly explained some of the difficult equations in the paper I had not understood, and instructed me to read it again and summarize it for next week.

The main part of the meeting was taken up by an explanation of the basis for my project. Dr. Dutta explained how the EEG data was gathered using a 96-channel micro-electrode array implanted directly in a patient's brain (she even showed me pictures of the surgery - gory but fascinating!) and how it is analyzed one window at a time, using windows of a fixed width, say 5 seconds. She also explained how the data is filtered to produce recordings at various bandwidths; the hope is that patterns missed at one bandwidth may be more obvious at another. We discussed how the data is represented as a table containing the various features that may be useful for classifying it, along with the data points at each time interval. Finally, she explained the basics of support vector regression and assigned me a paper on it, as well as one on ROC curves.
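As a toy illustration of the fixed-width windowing idea (the sampling rate and window length here are made-up placeholders, not the values from the real recordings):

```java
public class Windowing {
    // Split a single channel into non-overlapping windows of `seconds` length,
    // given the sampling rate in Hz. Trailing samples that do not fill a
    // complete window are dropped.
    static double[][] windows(double[] channel, int hz, int seconds) {
        int width = hz * seconds;
        int count = channel.length / width;
        double[][] out = new double[count][width];
        for (int w = 0; w < count; w++) {
            System.arraycopy(channel, w * width, out[w], 0, width);
        }
        return out;
    }
}
```

Each resulting window is then analyzed on its own, so features can be computed per window and per bandwidth.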


**Week 4 (July 9 - 15)**

At our meeting this week, we got into the details of using SVR on the EEG data. Dr. Dutta explained that the goal is to use regression to find a line that best divides the data into classes. This is easy if the data is linearly separable (that is, distributed in a way that allows a neat line to be drawn between classes), but our data will not be. In such a case, one uses a kernel function (of which there are several) to transform the data into a higher-dimensional space (usually higher than humans can visualize) where there does exist a clear dividing line, or hyperplane, between classes. To find the hyperplane that leaves the maximum amount of space, or margin, between the classes, the algorithm must learn a weight for each feature such that the weighted sum yields the equation of the maximum-margin hyperplane. These weights are learned from pre-classified training samples. (Though in our case, the output is a continuous value rather than a simple binary class label.) For next week, I am to start playing around with EEG data, albeit not the dataset I will use in the end. I will be posting my progress as it occurs, and am excited to have some real programming to do - no more papers!
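For reference, the standard soft-margin formulation of epsilon-insensitive SVR, in the usual textbook notation (my notes, not something from the meeting), where C controls the trade-off between the flatness of the solution and the error tolerated on the training samples:

```latex
\min_{w,\,b,\,\xi,\,\xi^*} \quad \frac{1}{2}\lVert w \rVert^2
    + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
\quad \text{subject to} \quad
\begin{cases}
    y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
    \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
    \xi_i,\ \xi_i^* \ge 0,
\end{cases}
```

where \(\varphi\) is the (possibly implicit) kernel feature map, \(\varepsilon\) is the width of the tube within which errors are ignored, and the slack variables \(\xi_i, \xi_i^*\) absorb deviations beyond it.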


**Week 5 (July 16 - 22)**

This week, I finally was able to do some programming! I downloaded a zipped folder from the University of Bonn's website containing 100 text files, each with 4096 data points from an EEG. This still is not the data I will be working with, but it's close enough. I copied one file's worth into MS Excel and used the graph-making function to turn it into a line graph - it looked like a real EEG! I then wrote a file-processing program in Java (it was nice to be back working in my "native tongue"!) to turn the simple text file into a .arff file that could be used by Weka. Since this was just practice, my goal was to see whether, given a set number of data points from an EEG, the software could accurately predict the next point.
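In outline, the conversion works like this (a simplified sketch rather than my actual program; the demo signal and output file name here are made up - the real input is a Bonn text file with one sample per line):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ArffConverter {

    // Slice the raw samples into non-overlapping windows of `width` points.
    // The first width-1 points of each window become numeric attributes and
    // the last point becomes the (numeric) target, so Weka can try to predict
    // the next sample from the preceding ones.
    static String toArff(double[] samples, int width) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation eeg-window\n\n");
        for (int i = 1; i < width; i++) {
            sb.append("@attribute p").append(i).append(" numeric\n");
        }
        sb.append("@attribute next numeric\n\n@data\n");
        for (int start = 0; start + width <= samples.length; start += width) {
            List<String> row = new ArrayList<>();
            for (int j = 0; j < width; j++) {
                row.add(Double.toString(samples[start + j]));
            }
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo with a synthetic signal; in practice the samples would be read
        // from one of the Bonn text files instead.
        double[] samples = new double[4096];
        for (int i = 0; i < samples.length; i++) samples[i] = Math.sin(i / 10.0);
        Files.writeString(Path.of("demo.arff"), toArff(samples, 8));
    }
}
```

With width 8, a 4096-point file yields exactly 512 instances of 7 attributes plus one target each.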

Since there are 4096 points, I split the data into 512 groups of 8 points each, labeling the first 7 points of each group as attributes and the last one as the class label (really a continuous target value, as used in SV regression). Then I opened the file in Weka and experimented with various ways of splitting the data between training and testing sets, as well as with different values of C, a regularization parameter that controls the trade-off between the size of the margin and the error tolerated on the training samples. Using root mean squared error to measure my results, I got the best performance using 70% of the data as a training set and 30% as a test set, with a C of 10. The full results can be viewed here.
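Weka reports the root mean squared error directly; as a sanity check on what that number means, the metric itself is easy to compute by hand for any set of predictions (a minimal sketch, independent of Weka's implementation):

```java
public class Rmse {
    // Root mean squared error: the square root of the average squared
    // difference between predicted and actual values. Lower is better;
    // 0 means the predictions are exact.
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }
}
```

Because the errors are squared before averaging, a few large misses hurt the score more than many small ones.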

At our meeting, after briefly discussing my results from last week (they were pretty self-evident), Dr. Dutta finally began explaining Granger Causality, which, as the title of my page, has been piquing my curiosity for a while. It is basically a measure for determining whether one data set - in this case, one channel - influences another. This is done by using data values from two channels and seeing whether the class label is predicted with better accuracy than when using only one channel, as I had been doing until now. If, for instance, I achieve better accuracy using data from both channel 1 and channel 2 than I had using just channel 2, that means channel 1 is affecting channel 2. If not, channel 1 has no detectable influence on channel 2; it may still be that 2 affects 1 (which is checked by running the test with the roles reversed), or that the two channels are independent. My assignments for this week were to read the papers on Granger Causality listed below, and to run tests using two files from the Bonn data to see if one channel affects the other. Dr. Dutta said that not much work has been done on combining SVR with Granger Causality, so if I get significant results, my work might be publishable. Definitely an interesting thought.
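In outline, the test amounts to: fit a model of the target channel from its own past values, fit it again with the other channel's past values added, and compare the residual errors. A smaller residual for the augmented model suggests the other channel carries predictive information. Here is a simplified least-squares sketch of that comparison (my real tests will use SVR in Weka, not plain linear regression; this just shows the logic):

```java
public class Granger {
    // Solve the linear system A x = b by Gaussian elimination with partial
    // pivoting. Note: A and b are modified in place.
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
            double[] tr = A[col]; A[col] = A[pivot]; A[pivot] = tr;
            double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
        return x;
    }

    // Ordinary least squares of y on the columns of X (plus an intercept),
    // via the normal equations; returns the residual sum of squares.
    static double residualSS(double[][] X, double[] y) {
        int n = y.length, k = X[0].length + 1;
        double[][] Z = new double[n][k];
        for (int i = 0; i < n; i++) {
            Z[i][0] = 1.0;
            System.arraycopy(X[i], 0, Z[i], 1, k - 1);
        }
        double[][] G = new double[k][k];
        double[] g = new double[k];
        for (int i = 0; i < n; i++)
            for (int a = 0; a < k; a++) {
                g[a] += Z[i][a] * y[i];
                for (int c = 0; c < k; c++) G[a][c] += Z[i][a] * Z[i][c];
            }
        double[] beta = solve(G, g);
        double rss = 0.0;
        for (int i = 0; i < n; i++) {
            double pred = 0.0;
            for (int a = 0; a < k; a++) pred += Z[i][a] * beta[a];
            double d = y[i] - pred;
            rss += d * d;
        }
        return rss;
    }

    // Residual sums of squares for two models of `target`: one using only
    // its own p past values, one also using p past values of `other`.
    // A noticeably smaller second entry suggests `other` Granger-causes
    // `target`.
    static double[] grangerRss(double[] other, double[] target, int p) {
        int n = target.length - p;
        double[][] own = new double[n][p];
        double[][] both = new double[n][2 * p];
        double[] y = new double[n];
        for (int t = 0; t < n; t++) {
            y[t] = target[t + p];
            for (int lag = 1; lag <= p; lag++) {
                own[t][lag - 1] = target[t + p - lag];
                both[t][lag - 1] = target[t + p - lag];
                both[t][p + lag - 1] = other[t + p - lag];
            }
        }
        return new double[]{residualSS(own, y), residualSS(both, y)};
    }
}
```

The same comparison carries over to SVR: train once on the target channel's own lags, once on both channels' lags, and compare the two RMSE values on held-out data.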

I really had a nice time this weekend! One of the girls from the lab invited a group of us to her parents' summer home in upstate New York for a long weekend. We took long walks in the country, picnicked near the lake, and barbecued on the lawn. There even was a fireworks display nearby! I love the country - just being around grass and trees relaxes me - so I really enjoyed the weekend and came back refreshed and ready for another week of work.


**Week 6 (July 23 - 29)**

I admitted to Dr. Dutta at our meeting that I hadn't read all the papers she had assigned; it was a short week and I was still having trouble with the mathematics, so I read only the parts of the first two that she had said were most important. She graciously explained to me what I had not understood; namely, the algorithm used to apply Granger Causality to a time series, such as the EEG records I am working with. I didn't quite understand it, and hope that she'll explain it again before I need it...

We discussed my test results from last week and agreed that they were pretty inconclusive; however, as Dr. Dutta pointed out, that does not mean that this line of research is not worth pursuing. It could just mean that those two channels had no effect on each other. Then Dr. Dutta assigned me the project I had been both looking forward to and dreading: running Granger Causality tests on all combinations of two files in the data set. Since there are 100 files, that means 100