Research

This summer I'm researching data depth (DD), a method of efficiently analyzing large data sets using their geometric properties. DD is useful for detecting outliers and for visualizing the shape and properties of a data set. Essentially, a depth measure gives a center-outward ordering of the points, which can be used to form depth contours: the boundaries of nested regions of increasing depth.
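As one concrete illustration of a center-outward ordering, here is a small sketch of convex hull peeling depth, a classical depth measure chosen only as an example (it is not the proximity-graph measure described below). It assumes 2-D points; the helper names `convex_hull_indices` and `peeling_depth` are hypothetical. Each pass removes the current convex hull, and a point's depth is the pass in which it is removed (1 = outermost). As a simplification, points lying on a hull edge but not at a vertex are deferred to the next layer.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points: Sequence[Point]) -> List[int]:
    """Andrew's monotone chain; returns indices of the hull vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def half_hull(seq) -> List[int]:
        chain: List[int] = []
        for i in seq:
            # Pop the last vertex while it does not make a strict left turn.
            while len(chain) >= 2 and cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    return lower[:-1] + upper[:-1]


def peeling_depth(points: Sequence[Point]) -> List[int]:
    """depth[i] is the peeling layer (1 = outermost hull) that contains point i."""
    remaining = list(range(len(points)))
    depth = [0] * len(points)
    layer = 1
    while remaining:
        local = convex_hull_indices([points[i] for i in remaining])
        hull = {remaining[j] for j in local}
        for i in hull:
            depth[i] = layer
        remaining = [i for i in remaining if i not in hull]
        layer += 1
    return depth


pts = [(0, 0), (8, 0), (8, 8), (0, 8),  # outer square
       (3, 3), (5, 3), (5, 5), (3, 5),  # inner square
       (4, 4)]                           # center
print(peeling_depth(pts))  # [1, 1, 1, 1, 2, 2, 2, 2, 3]
```

The successive peeled hulls act as empirical depth contours, and points with the smallest depth are the natural outlier candidates.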

Over the summer I analyzed new ideas for depth measures based on proximity graphs and studied their strengths and weaknesses. One depth measure I looked into is the distance to the convex hull measured along the edges of a proximity graph. Here is a forest formed by traversing edges inward from the convex hull until all points are reached. The level of a point in its tree is its depth under this measure.
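The traversal can be sketched as a multi-source breadth-first search that starts at every convex hull vertex at once. The text does not say which proximity graph was used, so this sketch assumes the Gabriel graph (edge i-j exists when no other point lies in the closed disk with diameter i-j), built by brute force in O(n^3). It also assumes distinct 2-D points and puts hull vertices at level 0; all function names are hypothetical. The `parent` array records the forest: each tree is rooted at a hull vertex.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points: Sequence[Point]) -> List[int]:
    """Andrew's monotone chain; returns indices of the hull vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def half_hull(seq) -> List[int]:
        chain: List[int] = []
        for i in seq:
            while len(chain) >= 2 and cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    return half_hull(order)[:-1] + half_hull(reversed(order))[:-1]


def gabriel_graph(points: Sequence[Point]) -> Dict[int, List[int]]:
    """Assumed proximity graph: brute-force Gabriel graph as an adjacency list."""
    n = len(points)
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            cx, cy = (xi + xj) / 2, (yi + yj) / 2
            r2 = ((xi - xj) ** 2 + (yi - yj) ** 2) / 4
            # The edge survives only if the closed diametral disk holds no other point.
            if not any((points[k][0] - cx) ** 2 + (points[k][1] - cy) ** 2 <= r2
                       for k in range(n) if k not in (i, j)):
                adj[i].append(j)
                adj[j].append(i)
    return adj


def hull_graph_depth(points: Sequence[Point]) -> Tuple[List[int], List[Optional[int]]]:
    """Level of each point in the BFS forest grown inward from the hull vertices."""
    adj = gabriel_graph(points)
    depth = [-1] * len(points)  # -1 marks a point not reached (disconnected graph)
    parent: List[Optional[int]] = [None] * len(points)
    queue = deque()
    for h in convex_hull_indices(points):
        depth[h] = 0
        queue.append(h)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                parent[v] = u  # v joins the tree containing u
                queue.append(v)
    return depth, parent


pts = [(0, 0), (8, 0), (8, 8), (0, 8), (3, 3), (5, 3), (5, 5), (3, 5), (4, 4)]
depth, parent = hull_graph_depth(pts)
print(depth)  # [0, 0, 0, 0, 1, 1, 1, 1, 2]
```

The Gabriel graph is a convenient stand-in because, for distinct points, it contains a Euclidean minimum spanning tree and is therefore connected, so the search reaches every point. Swapping in another proximity graph only requires replacing `gabriel_graph`.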

I also defined an estimator for data sets and a new graph that performs well under this depth measure, yet whose construction time depends only linearly on the dimension of the data set.