Cross-Validation





Basic Ideas Behind Cross-Validation




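When a clustering program is developed in a supervised situation, where the correct grouping of each sample is already known, it is necessary to be sure that it can also perform in an unsupervised situation, where the correct grouping is not known. Cross-validation is used to check this.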

In cross-validation, a portion of the data is set aside as training data leaving the remainder as testing data.

The quality of the program's performance on the testing data, which it did not see during training, reflects how well it would perform in an unsupervised setting.
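
The idea of a single training/testing split can be sketched in Python as follows. This is only an illustration and not the code used for this project: the function names `split_data` and `success_rate`, the 25% testing fraction, the toy data, and the simple threshold rule standing in for "the program" are all assumptions.

```python
import random

def split_data(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and set aside a fraction of them as testing data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)

def success_rate(predict, testing):
    """Fraction of testing samples whose predicted label equals the known label."""
    correct = sum(1 for value, label in testing if predict(value) == label)
    return correct / len(testing)

# Hypothetical toy data: (value, label) pairs, label 1 when value >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
training, testing = split_data(data)

# Hypothetical stand-in for "the program": a threshold placed halfway
# between the two class means, computed from the training data only.
class0 = [v for v, label in training if label == 0]
class1 = [v for v, label in training if label == 1]
threshold = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def predict(value):
    return int(value >= threshold)

rate = success_rate(predict, testing)
print(f"success rate on testing data: {rate:.2f}")
```

The threshold is computed without looking at the testing samples, so the printed rate estimates performance on data the program has not seen.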


Methods Employed




The user was first asked to input the desired level of cross-validation, CV.

Using this information, the data was partitioned into CV equal groups. CV-1 of these groups were training sets, and the last group was the test set.

The program was run on this arrangement and the success rate on the test set was recorded.

Then, a different group became the test set, and the previous test group rejoined the training sets. This occurred CV times, so that each group was the test group exactly once.

The success rates were averaged over all CV trials to arrive at the final success rate.
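
The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used here: the names `make_folds`, `cross_validate`, and `train_nearest_neighbor` are hypothetical, the toy data is made up, and the one-nearest-neighbor rule is only a stand-in for whatever program is being validated.

```python
import random

def make_folds(samples, cv, seed=0):
    """Shuffle the samples and deal them into cv groups of (nearly) equal size."""
    if cv < 2 or cv > len(samples):
        raise ValueError("cv must be between 2 and the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::cv] for i in range(cv)]

def cross_validate(samples, cv, train, seed=0):
    """Average the success rate over cv trials.

    Each group serves as the test set exactly once; the other cv-1 groups
    are merged into the training set for that trial. `train` takes a list
    of (value, label) pairs and returns a predict function.
    """
    folds = make_folds(samples, cv, seed)
    rates = []
    for i, test_set in enumerate(folds):
        training_set = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train(training_set)
        correct = sum(1 for value, label in test_set if predict(value) == label)
        rates.append(correct / len(test_set))
    return sum(rates) / len(rates)

def train_nearest_neighbor(training_set):
    """Hypothetical stand-in program: copy the label of the closest training value."""
    def predict(value):
        nearest = min(training_set, key=lambda s: abs(s[0] - value))
        return nearest[1]
    return predict

# Hypothetical toy data: (value, label) pairs, label 1 when value >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
final_rate = cross_validate(data, cv=5, train=train_nearest_neighbor)
print(f"average success rate over 5 trials: {final_rate:.2f}")
```

With 20 samples and `cv=5`, each trial trains on 16 samples and tests on the remaining 4. Shuffling before partitioning is a design choice of this sketch, so that no group depends on the original ordering of the data.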


My Results




This cross-validation technique was applied to the KNN-Clustering algorithm.

For the results, please click here.







Questions or Comments?
Email Me! Emily K. Mower