Summer Research!

By Ileane O'Leary

Weekly Journal Entries

My Paper (Draft)

About Me

I am a rising senior at Harvey Mudd College. I am a joint CS/Math major. I will graduate in May, 2014. More About Me

This summer I am doing research at Indiana University in Bloomington, Indiana. I'm working under the guidance of Professor Yuqing Melanie Wu, in the Computer Science and Informatics department. Professor Wu's research involves databases, as well as data integration and data mining. In 2010, Professor Wu won the Indiana University Trustee's Teaching Award and the Women in Computing Advisor of the Year. See Professor Wu's Website

About My Project

Problem

Develop an algorithm to efficiently count induced subgraphs in a large network graph, especially induced subgraphs of 5 or more nodes.

Definitions

A graphlet is a small connected undirected unlabeled graph, up to graph isomorphism. An n-graphlet is a graphlet with n nodes.
An induced graph has an edge between nodes a and b if and only if the original graph has an edge between the nodes which correspond to a and b.

What do these graphlets look like?

Here is a diagram displaying all 2, 3, 4, and 5 graphlets.
All 5-graphlets.

Algorithm

Given the graph G, a graphlet A, and a database containing the locations of graphlets of size less than size of A,
  1. Determine how to build A using graphlets that are smaller than A.
  2. Look for small graphlets that fit together in the correct ways.
  3. Record results that build graphlet A in a new database.

How does this database work?

We could store every graphlet with size less than size A, but this would be a lot of things to store. A previous student found that it was enough to store all paths, claws, cycles, and cliques up to size n-1. Here are the shapes we must store for a 5-graphlet. Notice that they all have 4 or fewer nodes. In the labeling, "P" stands for path, "S" stands for claw, "C" stands for cycle, and "K" stands for clique.

These are the graphlets we need to store to find 5-graphlets.

How do you join small graphlets to get a bigger graphlet?

First, decide how many nodes will "join". Then, look at every combination in which that many nodes of the two small graphlets overlap. For example, when we add two paths of length two, this is what it looks like:

How to add two paths of length 2.
Then, when we want to find either the triangle, or the path of three, we can look for paths of two that share one node. For example, in the graph below, we can see the two length two paths, (1,3) in blue and (3,4) in purple. These tuples would be saved in the P2 relation. Then, when we were looking for the path of length 3, we'd look for two paths of length two with a common node. The common node in this case is node 3.

If we wanted a triangle, like the one marked in red, we would again look for two paths of length two with a common node, but we would also check to see if there was an edge between the two outer nodes.

How to add two paths of length 2.

Interested?

Check out the draft of my paper: My Paper (Draft)

To learn more contact me at ioleary@hmc.edu
or, look at the following papers:
"RAGE - A rapid graphlet enumerator for large networks"
"Counting and Detecting Small Subgraphs via Equations and Matrix Multiplication"
"An Algorithm for Subgraph Isomorphism"