This summer I am doing research at Indiana University in Bloomington, Indiana. I'm working under the guidance of Professor Yuqing Melanie Wu, in the Computer Science and Informatics department. Professor Wu's research involves databases, as well as data integration and data mining. In 2010, Professor Wu won the Indiana University Trustee's Teaching Award and the Women in Computing Advisor of the Year.
If we wanted a triangle, like the one marked in red, we would again look for two paths of length two with a common node, but we would also check to see if there was an edge between the two outer nodes.
To learn more contact me at ioleary@hmc.edu
About My Project
Problem
Develop an algorithm to efficiently count induced subgraphs in a large network graph, especially induced subgraphs of 5 or more nodes.Definitions
A graphlet is a small connected undirected unlabeled graph, up to graph isomorphism. An n-graphlet is a graphlet with n nodes.
An induced graph has an edge between nodes a and b if and only if the original graph has an edge between the nodes which correspond to a and b.
What do these graphlets look like?
Here is a diagram displaying all 2, 3, 4, and 5 graphlets.
Algorithm
Given the graph G, a graphlet A, and a database containing the locations of graphlets of size less than size of A,
How does this database work?
We could store every graphlet with size less than size A, but this would be a lot of things to store. A previous student found that it was enough to store all paths, claws, cycles, and cliques up to size n-1. Here are the shapes we must store for a 5-graphlet. Notice that they all have 4 or fewer nodes. In the labeling, "P" stands for path, "S" stands for claw, "C" stands for cycle, and "K" stands for clique. How do you join small graphlets to get a bigger graphlet?
First, decide how many nodes will "join". Then, look at every combination in which that many nodes of the two small graphlets overlap.
For example, when we add two paths of length two, this is what it looks like:
Then, when we want to find either the triangle, or the path of three, we can look for paths of two that share one node. For example, in the graph below, we can see the two length two paths, (1,3) in blue and (3,4) in purple. These tuples would be saved in the P2 relation. Then, when we were looking for the path of length 3, we'd look for two paths of length two with a common node. The common node in this case is node 3. Interested?
or, look at the following papers:
"RAGE - A rapid graphlet enumerator for large networks"
"Counting and Detecting Small Subgraphs via Equations and Matrix Multiplication"
"An Algorithm for Subgraph Isomorphism"