Week 7
Jul 3 - Jul 9
I spent the weekend at my cousin's place in Houston and worked from there. I installed Ubuntu and MySQL on my laptop. Then I downloaded a flat file from DrugBank containing detailed information on a large number of drugs, and wrote a small program to create a TSV file listing only the drug names and their gene targets. The file of drugs and their suspected ADRs that Jian had sent me generally used the Canadian brand names for the drugs, which often differed from their official names. So I also included each drug's brand names in the TSV file to make sure I didn't miss any of them. I didn't know which database to use for mapping genes to pathways. The paper I had read used the GeneGo database, but that was not freely available. After some googling I finally found the CTD database, which had a downloadable file of gene-pathway associations.
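The brand-name trick can be sketched roughly like this. It is a minimal illustration, not the program I actually wrote: the records below are hypothetical placeholders (the real DrugBank flat file has its own format that would have to be parsed first), and the column names are my own choice. The idea is to write one TSV row per drug and also build a lookup so that either an official name or a brand name resolves to the same drug.

```python
import csv
import io

# Hypothetical placeholder records; real data would be parsed from the DrugBank file.
DRUGS = [
    {"name": "DrugA", "brands": ["BrandA1", "BrandA2"], "targets": ["GENE1", "GENE2"]},
    {"name": "DrugB", "brands": ["BrandB1"], "targets": ["GENE3"]},
]


def write_drug_targets_tsv(drugs, out):
    """Write one row per drug: official name, brand names, gene targets."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["drug", "brand_names", "gene_targets"])
    for d in drugs:
        writer.writerow([d["name"], ";".join(d["brands"]), ";".join(d["targets"])])


def build_name_index(drugs):
    """Map every official or brand name (lowercased) to the official drug name."""
    index = {}
    for d in drugs:
        for alias in [d["name"], *d["brands"]]:
            index[alias.lower()] = d["name"]
    return index


def resolve(name, index):
    """Return the official name for an official or brand name, or None if unknown."""
    return index.get(name.strip().lower())


buf = io.StringIO()
write_drug_targets_tsv(DRUGS, buf)
print(buf.getvalue())

index = build_name_index(DRUGS)
print(resolve("BrandA2", index))  # -> DrugA
```

Lowercasing every alias is an assumption about how the names in the ADR file were written; spelling variants would still slip through and need fuzzier matching.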
With the help of MySQL I used these files to find ADR-pathway links. The results were very disappointing: cancer was the top-ranked pathway for every ADR. Even when I excluded cancer, the ADR-pathway links I found did not make sense. The CTD database was not adequately annotated. I then tried looking at the ADR-gene links found through this method, but I couldn't find any articles supporting them either. I suspected this was because the ADR-drug database we were using was not very reliable, as I had noted earlier.
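The linking step amounts to chaining three tables: ADR to drug, drug to gene, gene to pathway. The sketch below is one plausible way to do it, not necessarily the exact query I ran. It uses Python's built-in SQLite in place of MySQL so it runs on its own, and the table names, column names and rows are hypothetical. Each ADR-pathway pair is ranked by how many distinct genes connect them.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the join itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adr_drug (adr TEXT, drug TEXT);
CREATE TABLE drug_gene (drug TEXT, gene TEXT);
CREATE TABLE gene_pathway (gene TEXT, pathway TEXT);
""")

# Hypothetical toy rows, not real data.
conn.executemany("INSERT INTO adr_drug VALUES (?, ?)",
                 [("ADR1", "DrugA"), ("ADR1", "DrugB"), ("ADR2", "DrugB")])
conn.executemany("INSERT INTO drug_gene VALUES (?, ?)",
                 [("DrugA", "GENE1"), ("DrugA", "GENE2"),
                  ("DrugB", "GENE2"), ("DrugB", "GENE3")])
conn.executemany("INSERT INTO gene_pathway VALUES (?, ?)",
                 [("GENE1", "PathwayX"), ("GENE2", "PathwayX"),
                  ("GENE3", "PathwayX"), ("GENE2", "PathwayY")])

# ADR -> drug -> gene -> pathway, counting distinct genes behind each link.
QUERY = """
SELECT ad.adr, gp.pathway, COUNT(DISTINCT dg.gene) AS n_genes
FROM adr_drug AS ad
JOIN drug_gene AS dg ON dg.drug = ad.drug
JOIN gene_pathway AS gp ON gp.gene = dg.gene
GROUP BY ad.adr, gp.pathway
ORDER BY ad.adr, n_genes DESC, gp.pathway
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

With raw gene counts like these, a pathway that contains a very large number of genes will tend to rank near the top for almost every ADR. That may be part of why cancer came out first everywhere; normalizing by pathway size (for example with an enrichment test) would be one way to correct for it.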
Meanwhile, the deadline for submitting the paper was drawing closer, and we had no results. The data mining part was not working well either. We still needed to annotate a large number of the ADRs for the system to work, and we also needed to normalize the ADRs. Bob asked Skatje to help with the normalization. Skatje spent hours on it, and with everyone working we finally got the system to produce tolerable precision and recall.
Professor Gonzalez came back this week, so we all met with her and went over what we had done. I told them I hadn't found anything meaningful yet and went over my results again. Bob and Professor Gonzalez were impressed when they heard that I had actually found the genes in the drug/ADR overlap to be relevant. About 30-80 genes were present in the overlap, and among them I had verified 3-6. About 5-6 genes were associated with the ADRs in the articles I read, and most of them were present in the overlap. They agreed that this was good enough for the paper and asked me to follow up on it. I hadn't thought it was a big deal, so I hadn't mentioned this finding in detail earlier. Professor Gonzalez said that there are millions of genes out there, so narrowing the range of genes possibly causing an ADR down to even 100 would still be a great help.
I was very relieved: there was actually a finding I could mention in the paper, and it might help get the paper published!