Week 5
Jun 19 - Jun 25
I finally finished the survey on Friday and met Professor Gonzalez on Monday. I told her that I had slacked off that week, and I could see that she was a bit disappointed. She explained that in research no one pushes you to work. You need to push yourself; otherwise you fail. There will be people to guide you and help you out, but you need to keep working on your own, keep moving forward, and ask for help when you get stuck. She advised me to set aside a fixed amount of time for my research each day and then perhaps use surfing Facebook or eating ice cream as rewards for finishing the tasks, so that I don't slack off. I felt very guilty and promised myself never to do this again.
We also discussed what I might do next. Our first aim was to test the hypothesis that there is a significant similarity between the network of genes affected by a given drug and the gene network involved in its known ADRs. To do that, I needed:
1) A list of drugs with their known ADRs
2) The known gene targets of the drugs and the genes responsible for the metabolism of drugs
Jian already had a list of drugs and their ADRs from MedEffect (the Canadian database of reported adverse reactions), and he e-mailed it to me. From the papers I had read, I had learned about several databases containing information about the protein targets of drugs: DrugBank, Matador, SuperTarget, PharmaPendium and PharmGKB. Apart from PharmaPendium, they were all free. Among them, PharmGKB turned out to be the most informative.
My first challenge was choosing which drugs to use. To get reliable results from GeneRanker, we needed to provide at least 10 seed genes, but many of the drugs had as few as 2 known gene targets. I went through the list provided by Jian and selected 11 drugs that had enough known targets. GeneRanker required the Entrez IDs of the genes as input, while the available databases listed only the names of the gene targets. So I had to use another online tool to convert the official gene names to Entrez IDs, which took a LOT of time. Next, I put the Entrez IDs into GeneRanker, obtained the extended list of genes, and copy-pasted them into Excel. This was the list of genes possibly affected by the drug.
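The drug-selection and ID-conversion steps above could also be scripted instead of done by hand. Here is a minimal Python sketch, assuming each drug's targets are available as a list of official gene symbols and that a symbol-to-Entrez lookup table has already been downloaded from somewhere. The function names, the lookup table and all the gene names and IDs below are hypothetical placeholders, not real data; only the 10-seed cutoff comes from the GeneRanker requirement described above.

```python
MIN_SEEDS = 10  # GeneRanker needed at least 10 seed genes (from the text)


def select_drugs(drug_targets, min_seeds=MIN_SEEDS):
    """Keep drugs with at least `min_seeds` distinct known target genes."""
    return {
        drug: sorted(set(genes))
        for drug, genes in drug_targets.items()
        if len(set(genes)) >= min_seeds
    }


def to_entrez(symbols, symbol_to_entrez):
    """Map official gene symbols to Entrez IDs via a lookup table.

    Returns the IDs found plus the symbols that had no entry, so that
    unmapped genes can be checked by hand instead of silently dropped.
    """
    ids = [symbol_to_entrez[s] for s in symbols if s in symbol_to_entrez]
    missing = [s for s in symbols if s not in symbol_to_entrez]
    return ids, missing


if __name__ == "__main__":
    # Placeholder data: not real drugs, gene symbols or Entrez IDs.
    targets = {
        "drug_A": [f"GENE{i}" for i in range(12)],
        "drug_B": ["GENE1", "GENE2"],
    }
    lookup = {f"GENE{i}": str(1000 + i) for i in range(11)}
    for drug, genes in select_drugs(targets).items():
        ids, missing = to_entrez(genes, lookup)
        print(drug, len(ids), missing)
```

With this toy data only drug_A passes the cutoff, and GENE11 is reported as unmapped rather than quietly disappearing from the seed list.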
For each drug, I selected some of the adverse reactions it was suspected to cause and some it was never suspected to cause. Some drugs had over 100 suspected ADRs, and it was not possible to use them all. When selecting the ADRs, I noticed that the disease the drug was supposed to treat was almost always listed as an ADR of that drug, and that some of the side effects I found by googling the drug were missing from the list. So I was no longer sure whether this list was reliable. Thus, when choosing the known ADRs, I made sure to pick ones I also found by googling, and when choosing the unknown ADRs, I only picked ones I could not find linked to the given drug when I googled. I put each of the chosen ADRs into GeneRanker and obtained a list of all the genes it was known to be associated with. Next, I used Excel to calculate the Jaccard index for each drug/ADR pair. For the drugs, I considered all the genes in the list, since the total number of genes was not too high. But for the ADRs, I generally considered only the top 500, since GeneRanker produced long lists of up to 5000 genes in some cases.
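The Jaccard index I computed in Excel is the size of the intersection of two gene sets divided by the size of their union. A minimal Python sketch of the same calculation, assuming the ADR's GeneRanker output is a list ordered best-ranked first, so that slicing it keeps the top 500 genes. The function names and the toy genes G1 to G9 are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drug_adr_similarity(drug_genes, ranked_adr_genes, top_n=500):
    """Compare the full drug gene list with the top `top_n` genes of a ranked ADR list."""
    return jaccard(drug_genes, ranked_adr_genes[:top_n])


if __name__ == "__main__":
    drug = ["G1", "G2", "G3", "G4"]
    adr_ranked = ["G3", "G9", "G1", "G7", "G2"]  # best-ranked first
    print(drug_adr_similarity(drug, adr_ranked, top_n=3))
```

With top_n=3 only G3, G9 and G1 are kept from the ADR list, so the overlap is {G1, G3} out of a union of five genes, giving 0.4. Note that G2 drops out of the overlap purely because of the cutoff, which is why the choice of 500 matters.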
The results were very disappointing. There was absolutely no significant difference between the Jaccard similarity of drug/known ADR pairs and that of drug/unknown ADR pairs. Our experiment had failed horribly. Professor Gonzalez was away at a conference, so I e-mailed her the results. She suggested I try the following:
1) Ask Fabian to try using PTQL to find more gene targets of the drugs, since the number of seed genes I had been using might be too small for GeneRanker to function properly. We tried it, but PTQL did not work well: it could not correctly identify genes and drugs in its database and so produced clearly wrong outputs.
2) Root out some of the more common genes, since they may be acting as noise. Fabian gave me a list of the top genes across all ADRs in the database used by GeneRanker, and I recalculated the Jaccard similarities after removing those genes from the lists, but this still did not make any difference.
3) Try a different set of drugs and ADRs. Professor Gonzalez asked an epidemiologist for suggestions. She recommended trying levodopa, which produces nightmares; clarithromycin + cisapride, which produces long QT syndrome; and Geodon, which produces dyskinesia. She also asked us to see whether, by comparing the networks of these drugs and ADRs, we could predict which of these ADRs quetiapine can cause. I again followed a similar method to find the Jaccard similarity of the drug/ADR pairs, and again the similarity showed no pattern. But this time I read a few papers and looked through PharmGKB to find the names of the genes suspected to be responsible for drug-induced long QT syndrome and dyskinesia. It turned out that most of the genes suspected of causing long QT syndrome were present in the clarithromycin/cisapride/long QT syndrome overlap. Similarly, the genes associated with dyskinesia were present in the overlap between Geodon/dyskinesia and quetiapine/dyskinesia. So from the given data, it looked like quetiapine causes dyskinesia, and that turned out to be true. So although the number of genes in the overlap did not reveal anything important, the highly ranked genes in it might be relevant. I checked whether this held for my other experiments, and it seemed to be valid there as well. I also noticed a tradeoff between recall and precision: if I included more genes from the ADR list, I could often capture all the genes that the papers I read suspected of causing the ADR, but then the number of genes in the overlap also increased significantly.
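Suggestions 2 and 3 mostly come down to a few set operations on the gene lists. Below is a minimal Python sketch, assuming each drug's genes and each ADR's GeneRanker output are plain lists of gene symbols (the ADR list ordered best-ranked first), that the common "noise" genes are given as a set like the one Fabian provided, and that the literature genes are the ones collected from papers and PharmGKB. The function names and the toy genes (HUB, G1 to G9) are hypothetical; this is an illustration of the idea, not what GeneRanker or my Excel sheets actually did.

```python
def remove_common(genes, common):
    """Drop frequently seen genes (suspected noise), keeping the original order."""
    common = set(common)
    return [g for g in genes if g not in common]


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_precision_recall(drug_genes, ranked_adr_genes, literature_genes, top_n):
    """Score the drug/ADR overlap against genes reported in the literature.

    The overlap is taken between the drug's genes and the top `top_n` ADR genes,
    and is returned in the ADR list's rank order (best-ranked first).
    """
    drug_set = set(drug_genes)
    overlap = [g for g in ranked_adr_genes[:top_n] if g in drug_set]
    truth = set(literature_genes)
    hits = [g for g in overlap if g in truth]
    precision = len(hits) / len(overlap) if overlap else 0.0
    recall = len(set(hits)) / len(truth) if truth else 0.0
    return overlap, precision, recall


if __name__ == "__main__":
    # Removing a shared "hub" gene changes the Jaccard index (0.5 -> 0.33 here).
    print(jaccard(["HUB", "G1", "G2"], ["HUB", "G1", "G3"]))
    print(jaccard(remove_common(["HUB", "G1", "G2"], {"HUB"}),
                  remove_common(["HUB", "G1", "G3"], {"HUB"})))

    drug = ["G1", "G2", "G3", "G4", "G5", "G6"]
    adr_ranked = ["G1", "G8", "G2", "G3", "G9", "G4", "G5"]
    literature = ["G1", "G4"]
    for k in (2, 4, 7):
        overlap, p, r = overlap_precision_recall(drug, adr_ranked, literature, k)
        print(k, overlap, round(p, 2), round(r, 2))
```

With the toy data, going from the top 2 to the top 7 ADR genes raises recall from 0.5 to 1.0 but lowers precision from 1.0 to 0.4, which is the recall/precision tradeoff described above. Keeping the overlap in rank order also makes it easy to look at just the highly ranked genes.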
Professor Gonzalez also e-mailed Bob, Jian, Laura and Fabian and asked them to help me with my project. We would now be working as a team, with Bob as our team leader. Bob added another student from the Diego lab, Ryan, to the team to help with the text-mining part.
The temperature in Arizona had been rising steadily, and by then it was so hot I could no longer go out on the terrace even in the evening. I missed singing on the terrace. I also barely saw Mark. I mainly hung out with Skatje and Melissa. We ate dinner out and watched a few movies: Up, My Sister's Keeper and Drag Me to Hell. The first two were awesome, but I have absolutely no idea how the last one got such great reviews. We probably only sat through the entire movie because we had paid $10 for it. Oh, and there was good news: I got my laptop. It looked sturdy and worked well but was so huge and heavy! Oh well, no use complaining about that now. I just hope this one doesn't break down too.