Week 1

By Maya Anand

For the first week, I focused mainly on getting familiar with the lab, reading papers to build background knowledge for my project, and learning the basics of Unix. I learned commands and concepts such as scp, wget, ssh, absolute and relative paths, tmux, find, grep, gzip, and tar. We also went over the basics of git, which I found really helpful: although I've used GitHub for many projects in my CS classes at school, I had never learned it in a formal way.

I started to learn about the 1000 Genomes Project, which ran from 2008 to 2015 and sequenced 2,504 individuals with the goal of finding genetic variants with frequencies of at least 1% in the population. The project's data is published online for anyone to download and will be the data source for my project.

During the week I read up on the file formats I will need to work with, such as PED, VCF, BAM/CRAM, FASTQ, and the 23andMe-style format. I also learned about programs used to work with these files, such as PLINK, samtools, bcftools, vcftools, and tabix/bgzip. I downloaded one VCF file from the 1000 Genomes Project and practiced using tabix, gzip, and bcftools to extract different pieces of information from it, so that I could understand how the file is laid out and how the tools work. Because the documentation for some of these tools isn't very thorough, I relied on trial and error and on online bioinformatics forums to figure out how everything worked. I also read papers about algorithms and concepts that might be related to my project, such as GERMLINE, identity by descent and identity by state (IBD/IBS), and ERSA.

During this week, I also went through orientation at the New York Genome Center, got my badge, and took care of some paperwork that needed to be signed.
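To make the VCF layout a little more concrete, here is a minimal Python sketch (not how bcftools or PLINK are implemented) that reads a tiny VCF excerpt, computes each site's alternate allele frequency (the quantity behind the "at least 1%" goal), and counts how many alleles two samples share identical by state (IBS 0, 1 or 2) at each site. The sample names S1 to S3, the positions and the genotypes are all made up for illustration, and the sketch assumes uncompressed text, a GT field, and no missing calls.

```python
"""Minimal sketch: parse a tiny, made-up VCF excerpt, compute the alternate
allele frequency per site, and count identity-by-state (IBS) between two
samples. Illustration only; real analyses would use bcftools/PLINK."""

# Hypothetical VCF excerpt with invented positions and genotypes.
# Real 1000 Genomes files have many more header lines and 2,504 samples.
VCF_TEXT = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
22\t100\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1\t1|1
22\t200\t.\tG\tA\t100\tPASS\t.\tGT\t0|1\t0|1\t0|0
"""


def parse_vcf(lines):
    """Yield (chrom, pos, ref, alt, {sample: (allele1, allele2)}) per data line.

    Assumes every genotype is fully called (no '.'), so int() always works.
    """
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        genotypes = {}
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_index]
            # '|' marks a phased genotype, '/' an unphased one
            a1, a2 = gt.replace("|", "/").split("/")
            genotypes[sample] = (int(a1), int(a2))
        yield chrom, int(pos), ref, alt, genotypes


def alt_allele_frequency(genotypes):
    """Fraction of all alleles at the site that are not the reference (0)."""
    alleles = [a for pair in genotypes.values() for a in pair]
    return sum(1 for a in alleles if a != 0) / len(alleles)


def ibs_state(g1, g2):
    """Number of alleles (0, 1 or 2) shared identical by state at one site."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele can only be matched once
            shared += 1
    return shared


if __name__ == "__main__":
    for chrom, pos, ref, alt, gts in parse_vcf(VCF_TEXT.splitlines()):
        af = alt_allele_frequency(gts)
        ibs = ibs_state(gts["S1"], gts["S2"])
        print(f"{chrom}:{pos} {ref}>{alt} AF={af:.2f} IBS(S1,S2)={ibs}")
```

Per-site IBS counts like these are only a starting point: methods such as GERMLINE and ERSA look for long shared segments across many consecutive sites to infer IBD and relatedness.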