#### Summer Research!

Back to Main Page

#### Week 11

This week is my last week, and a short one too! I am working on completing the paper, which is going slowly. My professor and I have made arrangements for me to continue to work on the paper during the school year.

I also had the opportunity to attend some of the graduate student orientation events this week. This was a great opportunity to learn about topics like choosing an advisor, developing a thesis, and managing the workload of graduate school.

Reflecting, this summer has been very productive. I have learned about databases, graphs and graphlets, and SQL. I have a much better idea of what research is like in an academic setting, and I have much more information about graduate school.

#### Week 10

I have finally written code that correctly accounts for rotations! The trick was to use the matrices. The algorithm was to first take the large, resultant matrix. Order the rows and columns every way possible, and see how many of these reordered ones looked like the original result. Save this number as N. Then, follow the same rearranging procedure for the two small matrices, save these numbers as g and h. Then, the number of rotations that are found is N/(gh).

I spent most of this week debugging the code. Now I am working on getting everything commented, and writing the final paper.

#### Week 9

Writing code to remove the rotations has been very difficult and frustrating! There are a lot of cases to consider. I have the feeling that at present, my code is incredibly innefficient and inelegant. It is also incorrect!

I am also increasingly nervous that it will not be completely working in the next two weeks. I am trying to stay optimistic! Often times there are breakthroughs :) This weekend I may try to rewrite portions of the rotation algorithm to see if trying a new method works.

#### Week 8

I've started to write code to find the best search for a shape.

The first part was to write code to find the number of comparisons required. This required a bit of thought, but ended up being quite straight forward because we had enumerated the comparisons in the SQL code. The second part is to write code that tells me the number of times the shape will be found (since if you look for it certain ways, there will be repetition). This has been tricky! The first algorithm I tried was to make a list of all the small shapes that could make the larger shape. Then, I rearranged the matrix representation of the large shape in every arrangement of rows and columns. For each matrix, I then looked for the smaller matrices, in order to count how often they appeared.

However, this gave me a very high count. I realized this was because of symmetry - for example, a triangle would be counted 3 times, and a rectangle would be counted 4 times. In the databases, we are storing the shapes with particular rules which mean that a triangle is only counted one time. So, I'm working on how to count the rotations, to remove this.

#### Week 7

This week I started to consider how to best search for a shape. So, suppose you have a path of length 4. Is it better to look for a path of length three and a path of length 2 with one overlapping node, or two paths of length three where there are two overlapping nodes?

There are a few factors to consider. There are graph independent factors - these are things like the number of comparisons required. So, if there is one overlapping node, one comparison is required. If there are two overlapping nodes, there are two comparisons required. There are also additional comparisons required to show that it is an induced path of length 5, which means no two points on the line are connected to eachother.

There are also graph dependent factors - these are dependent on what type of data you are using. For example, one graph might have very few paths of length 3, so it would be faster to look up two P3 shapes than to look up one P3 shape and one P2 shape.

Ideally, I will have code that will tell me what is the best small shapes to look for are, in terms of speed.

#### Week 6

I can't believe I only have a month left! I hope I can make some more progress.

This week I'm focusing on a few long term things - mainly, getting ready to write the paper required at the end of the DREU. I have used latex, but never a research paper format. So, I now have a sample format.

I've also been reading papers to provide more background for the paper. The papers are a bit difficult to parse, in part because I'm not sure I have enough background to tell what is relevent. I'm especially interested in papers that might give me an idea for how to give the nodes in an order to specify the graph shape (to solve the problem I encountered on week 4). But I haven't found any papers that address that yet.

#### Week 5

This week I wrote code that takes one final shape, and returns a list of the different ways you can make the shape with two of the standard small graphlets.

To do this, I used the code which I had already, which takes two small graphlets and returns everything they could combine to form. The algorithm is to first list all graphlets smaller than the input. Then, take every combination of two graphlets from the list. See what they can build, using the original code. If it builds the input graph, save the two shapes, and how they need to combine to form the input. Then, return the list of small graphlets that can build the input graphlet.

To make this slightly more efficient, I consider the size of the two small graphlets, compared to the input graphlet. For example, we don't need to test any combinations of the small graphlets where there are fewer nodes than in the input graphlet. Likewise, if either small graphlet has more nodes than the input graphlet, it won't work.

This is probably not the most efficient algorithm to solve this problem, but it does leverage the code that I already have. At the moment, I am only working with small graphlets and the time hasn't been an issue.

#### Week 4

This week I ran into some challenges. The SQL code runs on a database. Each relation is a set containing tuples of nodes which form the small graphlet. The SQL code returns a new set of tuples which form the large graphlet. However, the values in the tuples do not have a specific order. This is a problem because there is repetition (so triangle "1, 2, 3" is the same as triangle "3, 1, 2" and "2, 1, 3". It's also a problem because it makes it difficult to tell what the exact shape is. For paths of length 3 for example, is path "3, 1, 2" the same set of nodes and edges as path "1, 3, 2", or path "2, 1, 3"? This is difficult to fix, and Prof. Melanie and I are looking more closely at how to fix this. This has been my greatest challenge so far.

I am continuing to read papers. This involves making a citation trail and finding papers which have cited papers I have already read. In the next few days I'd like to start keeping track of the citations in latex. I haven't made a bibliography in latex in a while, so it may have a bit of a learning curve!

#### Week 3

This week I modified the python code, adding the capability to generate SQL code. This means that I can run the python code. This code takes in two small graphlets and gives us a file with sequal code which shows all the ways the two shapes could combine. Then, I can run the SQL code and it returns a set of the nodes which form the final shape.

Modifying the python code has been very time consuming and a bit frustrating. There have been two parts of this - debugging the python code, and then debugging the SQL code that is generated. Then, I have to go back and update the python code so it will generate the working SQL code.

This week, I had the opportunity to give a three minute talk about my project to a group of other summer interns. It was a great opportunity to practice presentation skills! At the same intern meeting, I met the Dean of Computer Science and Informatics. He spoke about graduate school and told us a bit about the programs available to women and minorities at Indiana University.

#### Week 2

I'm starting to work through the python code I have inherited from another student and trying to understand the different parts. It's somewhat tricky to wade through the code and figure out what each part is doing. Soon I am going to need to modify the python code so it generates SQL code, in order to look up the small graphlets in order to find larger graphlets.

But before I do that, I need to learn SQL! I've been working through chapters in two textbooks on databases, SQL, and a very small amount of relational algebra.

I also had the opportunity to tour the super computer Big Red II! Extremely cool!

#### Week 1

I had a great first week! I got my computer accounts up and running, did quite a lot of reading, and started to get familiar with Indiana University campus. I had my first research meeting.