A good portion of the day today was spent updating our DSSG blog page, which, if you are interested, can be found here. I cheated a little: my post is really just a link to this site, with a short explanation of why I have this blog and how my involvement with DREU and the DSSG program fit together. I did help quite a bit with the formatting, though, and then I did more work trying to figure out what is going on with the ugly rides. In particular, I started looking for factors that might help categorize which routes end up being the ugly rides best handed off to taxis. We also continued our code review and kept writing the main function that will call everything else, but it got difficult at points when we did not understand what Frank's code was doing or why it was structured the way it was. In the end, we decided we will go over it again when Frank gets back on Monday.
Frank was not here today, nor will he be tomorrow. Seeing as we did code review today, and a lot of the code was written by him, he was definitely sorely missed. Most of the morning was taken up with code review, where we set ourselves the goal of collectively coding up the main function: the script that calls all the other functions and interacts with the dispatcher. This task may sound fairly simple, almost trivial, but when you have so many parts that need to work together, the code is anything but obvious. We talked for a good while about how we might make our code more object oriented, but we're still uncertain what that would really look like in the context of our specific algorithm. We're not done coding it up yet, but we will continue with it tomorrow.
The afternoon was almost exclusively taken up by our Tableau tutorial, led by presenter John Cicero. It was incredibly interesting, and he made the software seem very user friendly and fun to play around with. During the presentation I tried to take some notes for Frank, but the software really does seem pretty intuitive to use. Also, John Cicero was very insistent that we could email him at any time with questions or concerns about Tableau, and that he would be happy to assist us in any way he could.
This morning I worked on getting my scripts onto the server, along with the CSV file I needed, so that I could run them there and hopefully get faster results, since my program was pretty computationally expensive. Even once I had it up and running, it was still taking a while, but after about an hour of processing rows I had some interesting initial findings. First, some of the incredibly expensive cost-per-boardings (ranging from upwards of $100 to about $495) showed very high cost savings when put on a taxi instead. However, I realized something must be wrong with my methodology, because some of the taxi costs were coming out to zero dollars, which cannot be right, and some of the really expensive cost-per-boardings corresponded to very cheap taxi rides, meaning very short distances. That is suspicious, because most of the ugly rides cover longer distances, and it is unclear how someone could rack up such a high cost per boarding without going very far.
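To give a sense of the kind of sanity check that flagged the problem, here is a minimal sketch. The fare parameters ($2.50 flag drop, $2.25 per mile) and the ride fields are made up for the illustration, not our actual rates or data; the point is just that a zero-mile or zero-dollar ride should never show up.

```python
# Illustrative sanity check for the taxi-cost comparison.
# The fare parameters below are hypothetical, not the rates we actually used.
BASE_FARE = 2.50   # hypothetical flag-drop charge, in dollars
PER_MILE = 2.25    # hypothetical per-mile rate, in dollars

def taxi_cost(miles):
    """Estimated taxi fare for a trip of the given length in miles."""
    return BASE_FARE + PER_MILE * miles

def flag_suspicious(rides):
    """Yield rides whose numbers don't make sense: a trip distance of
    (essentially) zero, or a huge cost per boarding on a very short trip."""
    for ride in rides:
        if ride["miles"] <= 0 or (ride["cost_per_boarding"] > 100 and ride["miles"] < 1):
            yield ride

# A zero-mile ride like the first one is what tipped me off that the
# distance lookup, not the fare formula, was returning bad values.
example = [{"miles": 0.0, "cost_per_boarding": 250.0},
           {"miles": 12.4, "cost_per_boarding": 30.0}]
print(list(flag_suspicious(example)))
```

A check like this separates "the fare formula is wrong" from "the distance feeding the fare formula is wrong," which is exactly the distinction I needed to make.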
The presentations went really well. We were asked quite a few questions throughout ours, and as a result we went over time by a good amount. Afterwards, we chatted with a reporter from Xconomy, which was cool. Overall, I think we were well prepared to answer questions and explain the nuances of our project.
Today being a Tuesday, we started off the day with team updates in our stand-up meeting. It seems like everyone has been busy this past week preparing for our upcoming presentation tomorrow. In light of this, after our big group meeting, my team rehearsed the PowerPoint presentation we had put together. I was hoping this would be just a quick run-through, but with everyone there to give input, we ended up making more revisions and adding and removing material as we went, so the whole run-through took quite some time. Kristen presented the whole thing in our run-through, but now we've decided that she and Frank will split the presentation. Our group can be a bit frustrating at times when it comes to decision making and dividing up tasks, because typically everyone is willing to do the task, but no one wants to step on anyone else's toes, and so no one ever just says, "Ok, I'm doing this." Anyhow, I was still a little worried that we might go over time in our presentation tomorrow, but we'll practice again tomorrow morning and I'm sure it'll be fine.
The rest of the day I was assigned the task of looking into how much cost savings would be possible if some of the ugly rides were sent to taxis instead. In theory this sounds like a simple one-day task, but it ended up being much harder than I anticipated, because to get the mileage between latitude/longitude pairs I had to write a separate Python script that makes OSRM requests, which are also very slow. Because of that, I also had to restrict the method I was using to calculate cost, simplifying it with some basic assumptions so the program would not take so long to run. Surprisingly, the data I was getting back showed that most of the ugly rides would NOT be better off on a taxi, but this can be explained by the many simplifications in this very first pass at calculating cost differences. For example, I was only comparing against the cost per boarding of the ugly ride itself, but really an ugly ride affects all the other riders who share the bus with it, and if we recalculated what their costs would be once the ugly ride was removed, the cost savings might be more dramatic.
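For anyone curious, the mileage-lookup part of a script like mine looks roughly like the sketch below. The server URL is a placeholder (a local OSRM instance would go there), and this uses only the public OSRM route API, not our project's actual code. One easy-to-miss detail: OSRM takes coordinates as longitude,latitude and reports distances in meters.

```python
# Sketch of looking up driving mileage between lat/lon pairs via OSRM.
# The server URL is a placeholder, not our actual setup.
import json
import urllib.request

OSRM_URL = "http://localhost:5000"  # assumed local OSRM instance

def meters_to_miles(meters):
    """OSRM reports distances in meters; the cost comparison wants miles."""
    return meters / 1609.344

def driving_miles(lat1, lon1, lat2, lon2):
    """Driving distance in miles between two points, via the OSRM route API.

    Note that OSRM expects coordinates as lon,lat (not lat,lon).
    """
    url = (f"{OSRM_URL}/route/v1/driving/"
           f"{lon1},{lat1};{lon2},{lat2}?overview=false")
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return meters_to_miles(body["routes"][0]["distance"])
```

Each ugly ride needs at least one such request, which is why the whole comparison ran so slowly.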
After an absolutely amazing week of vacationing in Reston, Virginia, I finally came back to work today, ready to catch up on what I missed and help the team again. My morning was spent for the most part catching up: going through missed emails, reading the paper for our reading group, looking over new code that was pushed, and going to CSE to get Leonardo (my beloved work laptop) back. Before I knew it, 10:30 AM had rolled around and I had to go meet with the TCAT group back in CSE. Our meeting this week consisted mainly of quick updates on how each of the projects is going, and Anat told us that high-schoolers will be coming in this week to help with the access map project. Also, next week the goal is to try out the battery interrupter circuit project to make sure we can assemble it easily and efficiently. After our meeting, I had to hurry back to the eScience center to be on time for our 11 AM reading group discussion. Honestly, I don't think I got much out of this week's reading group. We talked a lot about census data and the distinction between organic and designed data, and how designed data can be better in some ways because it better represents the population. It was somewhat interesting at first, but talking about it for an hour seemed like a stretch to me.
After lunch I familiarized myself more with the slides for our big presentation on Wednesday. The presentation is supposed to be eight minutes long, leaving two minutes for questions at the end, and it will be given in front of the eScience Steering Committee. Also, Valentina told us that a reporter might be coming, and that she believes he is supposed to talk to our group after the presentations. That's pretty exciting. Kristen had updated the slide that explains our algorithm, but then she had to leave, so I looked over her changes and made some further changes myself. I also timed myself just speaking the notes at the bottom of every slide, and it came to eight and a half minutes. That worried me a little, because I think it's highly likely we'll go over our time, and a lot of people will probably have questions, since our problem can be a bit confusing at first.
Additionally, in the afternoon we all took some time to talk through the algorithm again, specifically how we are going to do the pickup and drop-off insertions and how they will handle the time windows. Unsurprisingly, we ran into our constant struggle between doing a comprehensive search and keeping the program from being too slow and computationally expensive. For now, we have decided to take only the best solution from the pickup insertion and use it to find drop-offs; if that yields no results too often, we can increase the number of insertion points we work from, using maybe the three or five best options. The reason we don't want to do that right away is that each additional insertion point requires more calls to the OSRM routing function, which is slow and computationally expensive, so we are trying to limit the number of calls we make to it.
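The idea can be sketched like this. This is my own toy illustration, not our actual code: `cost_fn` stands in for the slow routing call (here it is just route length so the example runs), and the pruning knob `k` is the "best one / best three / best five" choice described above.

```python
# Sketch of k-best pickup insertion followed by drop-off insertion.
# cost_fn stands in for the slow OSRM-backed cost of a candidate route.
import heapq

def insertion_candidates(route, stop, cost_fn):
    """Cost of inserting `stop` at every position in `route`.
    Each cost_fn call stands in for one slow routing-service call."""
    out = []
    for i in range(len(route) + 1):
        candidate = route[:i] + [stop] + route[i:]
        out.append((cost_fn(candidate), candidate))
    return out

def best_insertions(route, pickup, dropoff, cost_fn, k=1):
    """Keep only the k cheapest pickup insertions, then try drop-off
    positions for just those, instead of for every pickup position.
    If k=1 fails to find a feasible route too often, raise k to 3 or 5."""
    top_pickups = heapq.nsmallest(k, insertion_candidates(route, pickup, cost_fn))
    results = []
    for _, with_pickup in top_pickups:
        # The drop-off must come after the pickup, so skip earlier positions.
        p = with_pickup.index(pickup)
        for j in range(p + 1, len(with_pickup) + 1):
            candidate = with_pickup[:j] + [dropoff] + with_pickup[j:]
            results.append((cost_fn(candidate), candidate))
    return min(results) if results else None

# Toy cost: number of stops. A real cost would come from OSRM travel times.
cost, route = best_insertions(["A", "B", "C"], "P", "D", cost_fn=len, k=1)
```

With k = 1, the routing function is called once per pickup position plus once per drop-off position on a single route; each increase in k multiplies only the drop-off phase, which is why we want to raise it gradually rather than search everything up front.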
Throughout my research I kept a daily blog that details what I did and my experiences. On this page you are invited to check out my different entries.