Kaplan-Meier Survival Analysis



Basic Idea Behind The Kaplan-Meier Estimation:



After a clinical trial is carried out, there is a follow-up period during which the researchers attempt to determine whether the proposed treatment was successful. One of the most common outcomes measured in clinical trials is death.

However, patients involved in trials do not always complete the follow-up, which leaves the data incomplete. For example, if the follow-up period is 12 years, a patient may leave after 4 years. Such a patient complicates the analysis, because their outcome within the prescribed timeframe cannot be determined.

The Kaplan-Meier estimate of the survival curve addresses this problem: it gives researchers a usable interpretation of the data even when follow-up is incomplete for some patients.


Basic Definitions:




Censored Data: Patient data that does not cover the entire follow-up period, that is, data for a patient who left the study for a reason other than death.
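For reference, the Kaplan-Meier estimate is usually written as a product over the times at which deaths were observed. The symbols below (t_i, d_i, n_i) are standard textbook notation and are an assumption here, since this page does not define them:

```latex
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Here t_i is a time at which at least one death occurred, d_i is the number of deaths at t_i, and n_i is the number of patients still at risk just before t_i (alive and not yet censored). Censored patients lower n_i at later times but never count as deaths.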


Method Employed:




Example:
Taken from Survival Curves: Accrual and The Kaplan-Meier Estimate at http://www.cancerguide.org/scurve_km.html. CancerGuide (May 28, 2003).


Dataset:
This example involves seven patients. The numbers below give, in ascending order, the number of years each patient was followed before either dying or leaving the study.

The total desired follow-up period was 12 years.

A plus (+) sign after a number indicates that the patient left the study after that many years and was alive at that point.

1, 2+, 3+, 4, 5+, 10, 12+


From this data, the following table could be constructed:

| Interval (Start-End) | # at Risk at Start of Interval | # Censored During Interval | # at Risk at End of Interval | # Who Died at End of Interval | Proportion Surviving This Interval | Cumulative Survival at End of Interval |
|---|---|---|---|---|---|---|
| 0-1 | 7 | 0 | 7 | 1 | 6/7 = 0.86 | 0.86 |
| 1-4 | 6 | 2 | 4 | 1 | 3/4 = 0.75 | 0.86*0.75 = 0.64 |
| 4-10 | 3 | 1 | 2 | 1 | 1/2 = 0.5 | 0.86*0.75*0.5 = 0.32 |
| 10-12 | 1 | 0 | 1 | 0 | 1/1 = 1 | 0.86*0.75*0.5*1 = 0.32 |
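The calculation in the table can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the analysis on this page; the function name `kaplan_meier` and the `(time, died)` pair layout are assumptions made for the example.

```python
from collections import Counter

def kaplan_meier(observations):
    """Compute Kaplan-Meier rows from (time, died) pairs.

    died=False marks a censored patient (left the study alive).
    Returns one (time, at_risk, deaths, proportion, cumulative) tuple
    per distinct death time.
    """
    deaths = Counter(t for t, died in observations if died)
    rows = []
    cumulative = 1.0
    for t in sorted(deaths):
        # Patients still under observation just before time t
        at_risk = sum(1 for time, _ in observations if time >= t)
        d = deaths[t]
        proportion = (at_risk - d) / at_risk  # fraction surviving this interval
        cumulative *= proportion              # running product of the fractions
        rows.append((t, at_risk, d, proportion, cumulative))
    return rows

# The seven patients from the example: 1, 2+, 3+, 4, 5+, 10, 12+
patients = [(1, True), (2, False), (3, False), (4, True),
            (5, False), (10, True), (12, False)]

for t, at_risk, d, p, s in kaplan_meier(patients):
    print(f"t={t:>2}  at risk={at_risk}  died={d}  "
          f"interval={p:.2f}  cumulative={s:.2f}")
```

Computed without intermediate rounding, the cumulative survival after the deaths at years 1, 4, and 10 comes out to about 0.86, 0.64, and 0.32. The survival curve only drops at death times; censored patients simply reduce the number at risk for later intervals.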

Using this information, a graph was constructed.



The Kaplan-Meier analysis program that I used was written by Wenting Zhou, a member of my research group. The program takes as input a file containing, for each patient, the survival time and whether or not the observation was censored. It outputs a file readable by Microsoft Excel, which I then used to plot the graph. Each group was entered into the program separately, so that the program created an independent curve for each group.
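The workflow described above (survival times and censoring flags in, an Excel-readable file out) might look roughly like the following sketch. It is not Wenting Zhou's program: the `time,censored` input layout, the CSV output format, and all function names are assumptions chosen for illustration.

```python
import csv
import io
from collections import Counter

def kaplan_meier(records):
    """records: list of (time, censored) pairs. Returns [(time, survival)]."""
    deaths = Counter(t for t, censored in records if not censored)
    curve = [(0.0, 1.0)]  # everyone is alive at time zero
    survival = 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for time, _ in records if time >= t)
        survival *= (at_risk - deaths[t]) / at_risk
        curve.append((t, survival))
    return curve

def read_records(handle):
    """Parse rows of 'time,censored'; censored is 1 (left alive) or 0 (died)."""
    return [(float(row[0]), row[1].strip() == "1")
            for row in csv.reader(handle) if row]

def write_curve(curve, handle):
    """Write the step points as CSV, which Excel can open directly."""
    writer = csv.writer(handle)
    writer.writerow(["time", "survival"])
    for t, s in curve:
        writer.writerow([t, round(s, 4)])

# Hypothetical input for one group; in practice this would be a file on disk,
# and the same three calls would be repeated once per group.
group_input = io.StringIO("1,0\n2,1\n3,1\n4,0\n5,1\n10,0\n12,1\n")
group_output = io.StringIO()
write_curve(kaplan_meier(read_records(group_input)), group_output)
print(group_output.getvalue())
```

Running each group through the pipeline separately yields one curve per group, which can then be plotted side by side in a spreadsheet.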


The Harvard Dataset:




The Harvard Dataset can be found on the CAMDA website at www.camda.duke.edu/camda03/contest.asp. It contains five distinct groups: adenocarcinomas (AD), squamous cell carcinomas (SQ), carcinoids (COID), small cell lung carcinomas (SMLC), and normal lung (NL). There are approximately 200 samples in the dataset, and 12,600 genes were analyzed for each.


My Results:




The clusters do not appear to be particularly related to survival. The links below lead to the Kaplan-Meier curves.

| K | Full Dataset | Shortened Dataset | Overlapped Dataset |
|---|---|---|---|
| 2 | Click Here | Click Here | Click Here |
| 3 | Click Here | Click Here | Click Here |
| 4 | Click Here | Click Here | Click Here |


Future Plans:




I will be working on a new clustering algorithm soon. When the new clusters are created, I will use this analysis again.







Questions or Comments?
Email Me! Emily K. Mower