I feel like 10 weeks is not enough for this research project. Since starting this project, I've come to realize that there's a lot more that I don't know than what I do know. However, at least I now know more about what I need to learn!
I'm glad that I'm doing this project at my home university so I can continue working with my professors during the school year. They've been very kind to me! I really appreciate having advisors who make time to work with me on my projects, even though I am an undergraduate. I think that is a key thing I'll be looking for when I apply to graduate schools; having a relationship with my advisors is very important to me.
I think one of the most fascinating things about working with NLP and chatbots is the large number of unsolved problems in this field, as well as how many individual components go into creating a program that can communicate effectively with humans. Not only is there the computational side of digesting and understanding data, but there are also psychological and linguistic components to these problems -- I've decided to pick up a linguistics minor at my university to get a better understanding of these fields.
Thank you so much to my advisors and CRA-W for giving me this opportunity to do research over the summer! I've been so happy to spend my time on research this summer -- I'm so glad that I was able to learn so much without having to worry about working part-time. I feel so lucky to be in this program again. If there are any undergraduates reading this, I would highly recommend applying to DREU!
It's my birthday today and I'm spending it in lab. :) Thank you to my advisors and lab partners for making it an enjoyable one!
I've realized that there's a lot of work that comes with collecting the data needed to train models. I knew this before in the abstract, but I never really understood just how much work goes into collecting and annotating the ridiculously huge amounts of information that make up training data.
I read an article from a data scientist at NYU about how much data they scrape (forums, Reddit, textbooks, etc.) for their NLP research projects. I feel like scraping is a moral and ethical issue in computing. The area between owning the rights to the information you've collected and the freedom to use information in the name of scientific research is really grey in the community!
I've come to really appreciate people that understand the mathematical theory behind the ML and NLP algorithms I'm using. I feel like the more theory one understands, the better one is able to approach solving a problem using ML. I feel overwhelmed trying to keep up with what I'm supposed to know in order to apply it to my research project. However, it seems like this is normal among undergraduate researchers.
I was notified this week that I've been accepted to present a poster at Grace Hopper! I'm really excited. I presented a poster there last year on my previous summer's DREU project, and to say the least, it was a very eye-opening experience. There are a lot of things I wish I had done (or done better), so I'm really excited to have another opportunity to attend this event.
I finally get to use PyDial. I created a script that downloads the RxNorm files and builds SQLite databases that PyDial's ontology tool can use. Or so I thought; the ontology tool that PyDial provides only works with databases that have a single table, and mine has seven. This could mean that I'll have to create my own ontology tool, which hopefully means that I can handle my database in a similar fashion to the existing ontology tool, modified slightly to deal with databases of several tables.
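For anyone curious, here's a minimal sketch of the kind of loading script I mean, assuming the pipe-delimited RxNorm .RRF files have already been downloaded; the table and the column positions are an illustrative subset, not the full schema:

```python
import csv
import sqlite3

# RxNorm ships as pipe-delimited .RRF files. This loads RXNCONSO.RRF (the
# concept-names file) into a single SQLite table; the real file has many
# more columns than the three kept here.
def load_rxnconso(db_path, rrf_path):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS rxnconso
                    (rxcui TEXT, tty TEXT, name TEXT)""")
    with open(rrf_path) as f:
        for row in csv.reader(f, delimiter="|"):
            # Column positions (0=RXCUI, 12=TTY, 14=STR) are from my notes;
            # verify them against the RxNorm documentation.
            conn.execute("INSERT INTO rxnconso VALUES (?, ?, ?)",
                         (row[0], row[12], row[14]))
    conn.commit()
    conn.close()

load_rxnconso("rxnorm.db", "RXNCONSO.RRF")
```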
Another issue I'm facing with PyDial is that it's written entirely in Python 2. I was hoping to use Python 3 so I could use newer libraries in my code if need be, but it's likely that I'll have to write my code in Python 2. There is a library (python-future) that allows Python 2 code to be compatible with Python 3, which I did try to use, but the way that PyDial handles importing modules seems to be incompatible with Python 3 even after running it through python-future. I guess I can say that I've learned a lot about Python's import system this week.
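For reference, this is roughly what a python-future compatibility shim looks like at the top of a module -- a minimal sketch based on the library's documented usage, not on PyDial's actual code:

```python
# Python 2/3 compatibility shims from python-future (python-future.org).
from __future__ import absolute_import, division, print_function

from future import standard_library
standard_library.install_aliases()  # lets Py3-style stdlib imports work on Py2

import urllib.request  # now resolves under both Python 2 and Python 3

response = urllib.request.urlopen("https://www.python.org")
print(response.getcode())  # .getcode() exists on both versions
```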
This week somebody called me a "real computer programmer". Am I really a real computer programmer? I spend most of my time dealing with messy problems that I've created in my own code (I wish my code were more elegant) and reading documentation, because I feel so new to everything! What makes a fake computer programmer? At what point does one transcend from being a "fake" to a "real" computer programmer?
So it turns out that REST and SOAP APIs are not that scary, and actually quite easy to use if you have the right Python modules. (Thank you, requests.) The difficult part is integrating the data into PyDial. PyDial does have an Ontology module that can be accessed globally; however, all of the data in the ontology must already be present for the other modules to access it -- so I can't make API calls while the chatbot is mid-conversation. I could download all of RxNorm's data and create an ontology out of it, but that dataset would be really huge -- RxNorm contains information on every single medication currently available in the United States. If I were to have the system make RxNorm API calls on an as-needed basis, I would have to change the architecture of every module currently in PyDial... I'm still having trouble understanding the research behind PyDial's natural language processing, so I don't think that is a feasible goal...
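To give a sense of how painless the requests side is, here's a minimal sketch of a lookup against RxNav's public RxNorm REST service; the endpoint is the one I've been working with, but treat the exact JSON paths as illustrative:

```python
import requests

BASE = "https://rxnav.nlm.nih.gov/REST"

def find_rxcui(drug_name):
    """Look up the RxNorm concept identifier (RxCUI) for a drug name."""
    resp = requests.get(BASE + "/rxcui.json", params={"name": drug_name})
    resp.raise_for_status()
    # The JSON layout below is from memory; double-check it against the docs.
    ids = resp.json().get("idGroup", {}).get("rxnormId", [])
    return ids[0] if ids else None

print(find_rxcui("aspirin"))  # prints an RxCUI string if one is found
```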
I've been struggling a lot with understanding the research behind PyDial. This is the first time I've worked with statistical research of any kind, and to be honest, statistics is not my strong suit! I'm also struggling with understanding how to use RxNorm, the dataset from the U.S. National Library of Medicine that will form part of the chatbot's knowledge database. I don't have any pharmaceutical knowledge and have never worked with a database of this size and scope before. On top of that, I wasn't familiar with the web technologies (SOAP APIs? REST APIs?) that RxNorm uses.
However, I'm really determined not to let all of this get me down! I knew starting the project that the learning curve would be really steep.
I'm very shy, and I feel like one of my weaknesses is being too shy to ask for help -- but at least now that I'm aware of it, I can work to overcome it. Normally, if I'm confused by a problem, I scour the internet for resources that can help me understand what I'm struggling with. However, that is usually a very time-consuming process. So this time, I asked one of my advisors to meet up and help me understand the medical terminology behind RxNorm -- he was actually really happy to answer my questions! I thanked him for helping me out, and he thanked me for working with him!
It feels really good to be appreciated!
I spent this week helping out at a machine learning summer camp for high school girls, hosted by the math department of the University of Minnesota. Despite being new to machine learning myself, I was interested in helping out at this camp because I wanted to see how the instructor would teach ML to students who have very little background knowledge in the field!
I have to say, I was really impressed with what the girls were able to accomplish in such a short amount of time. Machine learning is typically seen as a highly specialized, technical field, so I really appreciated how the instructor was able to explain basic machine learning concepts (SVMs, linear regression) in terms that high school students could understand. Granted, they didn't get a lot of the theoretical background, but they were able to create their own ML projects using existing toolkits, primarily TensorFlow and scikit-learn.
Besides the quality of teaching, I feel like the quantity of tutorials that are available online really contributed to the camp's success. The last two days of the camp were dedicated to the students' final projects, where many of them used online resources and tutorials to make the most of the frameworks. I was really impressed with the camp!
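To give a flavor of what those starter projects looked like, here's a minimal sketch of the kind of classifier the campers built with scikit-learn; the dataset and model choice here are my own illustration, not an actual camper project:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Train a linear SVM -- one of the concepts covered at camp -- on the
# bundled iris dataset and report held-out accuracy.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```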
One of my advisors recommended that we use PyDial, a framework for creating spoken dialogue systems developed by the University of Cambridge. However, he wasn't able to get the demo configuration up and running on his system. My first task, of course, is to do just that.
The documentation for the PyDial system is not up to production standards, which is reasonable given that it was designed to be used by a specific research team. (I'm happy enough that they're sharing the code with us!) Much of this week was spent studying how Python packages are structured and fixing issues with Python modules that PyDial depends on, which were no longer compatible with Python 2.7.
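As an example of the kind of packaging quirk I kept running into, here's a small hypothetical sketch -- the layout and names are made up, not PyDial's own:

```python
# core.py, living inside a hypothetical package:
#
#   mypkg/
#       __init__.py
#       helpers.py
#       core.py
#
# Python 2 allows the implicit relative import "import helpers" here, but
# Python 3 removed implicit relative imports, so it has to be explicit:
from . import helpers
```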
I did get the PyDial demo running! I didn't realize it at first, because the system's response to one of my requests ("I want a cheap hotel in the centre of town.") started off with "I am sorry but there is no place", which didn't seem right... I spent an hour trying to debug it before I realized that it was actually the correct answer. I laughed a lot with the people in my lab.
I also started reading the publications that are associated with PyDial. This is my first time reading about the back-end of a dialogue system and statistical natural language processing, so I feel like the learning curve for this project will be very steep.
Hello! This is my second summer working with CRA's DREU program. Honestly, I'd say that last year was the zenith of my college experience (so far) because it introduced me to research. I'm hoping that this summer is even better than the last... there are a lot of things that I want to accomplish.
List of goals for the summer
On a different note, this week I learned how to solder! Pictured below are a bunch of microbugs that I helped prepare. They're little robots that run towards the nearest source of light. All I did was solder on the transistors for the kids, since they frequently mess up that step. It's something small, but it was oddly refreshing to work with something actually tangible!