Last week I was away in Maine, and after a week off it was really great to get back to the lab. For part of this week, I worked with the other undergraduate intern in the lab on some pages for a website about his project. We wanted to create documentation with concrete examples of how to use some basic bioinformatics tools to work with VCF files, and this drew on a lot of the experience from my work earlier in the summer. It was a lot of fun to collaborate and put our knowledge and experiences together!
On my own project, I ran my simulated 23andMe files for several samples through an in-house imputation pipeline used for another project in the lab. The 1KGP samples are a good positive control for this pipeline because the 1KGP VCFs come from whole-genome sequencing, so the true genotypes are known at the sites being imputed. If our imputation is good, the records in the imputed VCF should match those in the original 1KGP file. We found that this was the case for the vast majority of the SNPs, which is an encouraging sign. The lab plans to update the imputation algorithm, so my scripts for comparing VCFs will also be useful in the future for checking the new algorithm against both the old one and the positive controls.
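
As a rough illustration of this kind of comparison, here is a minimal Python sketch. It is not the lab's pipeline or my actual scripts. It assumes uncompressed, single-allelic VCF text with a `GT` field, and the sample name `SAMPLE1`, the function names, and the records are all made up for the example. Sites are matched on chromosome, position, REF, and ALT, and phasing is ignored, so `0|1` and `1|0` count as the same genotype.

```python
import io


def read_genotypes(handle, sample):
    """Map (chrom, pos, ref, alt) -> unordered genotype for one sample."""
    genotypes = {}
    sample_col = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # column holding this sample
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line before the records")
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        gt = fields[sample_col].split(":")[format_keys.index("GT")]
        # Ignore phasing: treat "0|1", "1|0" and "0/1" as the same genotype.
        alleles = tuple(sorted(gt.replace("|", "/").split("/")))
        genotypes[(chrom, int(pos), ref, alt)] = alleles
    return genotypes


def concordance(truth, imputed):
    """Return (matching sites, shared sites) between two genotype maps."""
    shared = truth.keys() & imputed.keys()
    matches = sum(1 for site in shared if truth[site] == imputed[site])
    return matches, len(shared)


# Made-up records for illustration only (not real 1KGP or 23andMe data).
HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n"
)
TRUTH_VCF = HEADER + (
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1|1\n"
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0|0\n"
)
IMPUTED_VCF = HEADER + (
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DS\t1|0:1.0\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DS\t0|1:1.1\n"
    "1\t400\t.\tT\tC\t.\tPASS\t.\tGT:DS\t0|0:0.0\n"
)

truth = read_genotypes(io.StringIO(TRUTH_VCF), "SAMPLE1")
imputed = read_genotypes(io.StringIO(IMPUTED_VCF), "SAMPLE1")
matches, shared = concordance(truth, imputed)
print(f"{matches} of {shared} shared sites match")
```

For real files, the `io.StringIO` objects would be replaced by open file handles (or `gzip.open(path, "rt")` for `.vcf.gz` files), and multi-allelic sites would need extra handling that this sketch leaves out.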