I arrived at the Universtiy of Michigan on Sunday and got settled into the dorm with little difficulty. On Monday I got the last of the paperwork stuff taken care of and met with my Mentor Ewdard Olson.There are two other undergraduates here for the summer as well, and we all went to lunch to discuss what we would be interested in working on for the summer. I was chose to work on the adhoc network the robots used to communicate becuase mulit-robot systems has been something I have been interested in for some time and this seemed like a great oppurtunity to dive into thes subject! Ryan , one of the other undergrads working in the lab decided to work on this problem as well The rest of the week was spent familiarizing myself with the lab and network programming in C. The latter was particularly dubious becuase I had not programmed in C since my sophmore year!By Friday we had two laptops communication via a simple peer-to-peer c program over their own adhoc network
Started this week by writing and testing our own flooding and forwarding protocol. We tested it with 3 laptops, A,B, and C. the goal wast to get a message from laptop A to C when A and C are out of range of eachother via B. We placed laptop A on oneside of the building and laptop C on the other and had laptop B directly between them. The test was a sucess and we moved on to analyzing the throughput of different methods of transmitting, mainly broadcast vs unicast.We ran into issues with getting any useble data because we have so few nodes in our system, and both methods seemed to work equally well in terms of stress on the network. However, we did find that signifigantly more packets were lost lost using broadcast than were lost using unicast.
This week I worked on finding out whether broadcast or unicast signals put more strain on a networks bandwidth. Professor Olson suggested this as a good place to start in order for me to be able to show a clear motivation for my research. I used NS-2 for my early experiments but had to abandon this approach because, while NS-2 is great for most aspects of wireless networks, its implementation of broadcast made it hard to discern any impact on the overall network. My real world experiment showed that bandwidth consumption was related more to the total number of signals sent rather than the method they were sent in. For example, if you wanted to send a message to four computers, unicasting to each of them would put more strain on the network than a single broadcast message. This relates to the routing protocol that the robots use to communicate because they use a flooding route request. That means that every node that hasn't seen the request before rebroadcasts the request. The question now becomes can we implement a system that can utilize unicast when the network topography indicates that broadcast would result in a higher total number of requests sent, most of which are unnecessary.
In order to keep the system completely distributed, all decision are made with only local knowledge, I've chosen to pursue an approach in which each node uses the information from each of the messages it receives, or overhears if we set their interfaces to promiscuous mode, to estimate the state of the network and weigh each of its options accordingly when it comes to forwarding a route request. To this end, I want to use to view it as a multi-armed bandit problem in which each of the options, i.e. unicast to one neighbor, perhaps multicast to a subset of neighbors or to broadcast the message. Another option would be to answer the request itself if the node had a route it was decently sure of. I spent the week reading up on the family of bandit problems and papers on somewhat related works. I am feeling good about the potential of this approach, and am now focused on creating a good mathematical model on which to base the "arms"
This week I investigated another possibility to perform more intelligent routing. The approach I am now pursing is to have each node include a brief history of the nodes it has heard from recently in all of the control packets it sends. When a node receives a control packet from that node, it can use that nodes encounter table to make a a more informed decision on how to send out its request. I am taking this approach because it seems the like a quick way to see improvements over the aodv routing protocol being used by the system now, and given the time frame I have to work I think this path is the most likely to come to fruition. Also, the mulit-armed bandit approach can be added in later in late to help each node choose between alternative routes or even frequencies if the lab decides to utilize this aspect of the robots radios. I spent the week designing the the system and began to implement it in ns-2.35 so I can run simulations in order to evaluate the effectiveness of my routing protocol and quickly modify the code without having to worry about dealing with the physical robots.
Using the implementation of aodv already implemented in ns-2.35 as the backbone of my protocol, I was able to quickly code up the initial implementation of my protocol on Monday and Tuesday. I began the initial test simulations on Wednesday and the results were promising , but then I ran into segmentation fault trouble. After , rigorous debugging I discovered that the problem was not with my routing protocol,I would have been surprised if it was, but an omission on my part of fully integrating my custom protocol into the ns-2.35 environment. Once that was resolved I did some comparisons between aodv and my protocol. As expected there were far fewer control packets sent; however, my protocol had a significantly higher rate of dropped packets. At first, I thought this was because multiple nodes might have selected the same node as a candidate for forwarding a request and their subsequent route request packets collided, but when I examined the data I found that my protocol had far fewer collisions than aodv (due to the reduced number of control packets being sent). So now I have turned my focus towards discovering why this is occurring.
After a thorough examination of the simulation data I found that the reason my protocol had so many dropped packets was that, during the route request phase, the time to live on the request packet would expire before the destination node could be found,and because no path could be established, a large number of packets would be dropped. The aodv protocol avoids this sort of problem by using a much broader search over the iterations (albeit rather inefficiently) so that the likelihood of finding the destination before the time to live expired is much higher than my directed version. To solve this problem, I implemented a fail-safe that would be implemented in the event that the directed route request failed to find the destination. Essentially the protocol will now broadcast a route request if a connection with the desired destination was not established within a small time period. This adjustment insures that the worst-case performance of the protocol is the same as aodv plus some constant. I spoke with professor Olson and he recommended that I should implement my protocol in a real-world test using the om1p radios that the robots used in the magic competition. To that end, I spent the rest of the week adapting the radio code from the old magic code base to run my algorithm. I proved to be rather straight forward, all I really needed to do was create data structures to store the information that I was using intelligently route packets. The tough part was encapsulating the necessary information into the packets that would be sent over the radio. The issue was that the simulator didn't require me to set an explicit size on the data structure that I was passing into the header of the control packets. This was helpful because I needed this structure to be dynamic in order to avoid reserving an arbitrarily large amount of space , and potentially making the packets bigger than necessary and thus more likely to be dropped.I solved this problem by creating an encode and decode function. The former was run on the transmitting node and it stored the addresses and last encounter time as 32 bit integers and kept track of how many total bits the data structure contained and passed that number into the header as well. The decode function was run on the receiver side and it used the encoded information to recreate the original data structure. I ran bare-bones version using two radios and was able to recover the encounter table on the receiving node and use it to make route request decisions. The next step is to introduce more nodes (because two nodes hardly verifies that my protocol is working correctly) and to see what happens!
The om2p radios came in this week and Ryan began trying to get them up and running. I continued to further integrate my code into the infrastructure set up in the old magic code base. As I continue to work on the code, I am beginning to understand the thought behind how the whole system works. Unfortunately om2ps are proving to be problematic and towards the end of the week I started to help Ryan resolve the issue. The problem is that we can't get the laptop to communicate with the radio. Right now we are operating under the assumption that we are flashing the firmware incorrectly. We are using OpenWrt as the os for the om2ps. It is a basic linux distribution for embedded devices. Hopefully we are making a silly mistake and we will be able to make progress soon.
After running into many difficulties, we decided to use wireshark to see if there was any communication happening at all.It turned out that the om2p's were submitting a DHCP request which our laptop was not configured to respond to. Once we changed the wired connection settings we were able to ssh into the radios. The next step was to write scripts to get the radios to the desired configuration. One thing we had to do was make sure that all the radios were on the same subnet, and configured for mesh networking. This proved to be straight forward with the latter being taken care of during the image building process.We then decided to focus on getting messages from the sending computer to its attached radio , then focus on sending the message from one radio to another ,and finally focus on recovering the message on the other side. We leveraged some of the existing code in the original magic code base and re-implemented some of the java code in c. We got the message to the radio and we successfully detected the broadcast message on a different radio. Ideally we would like to send both directed messages and broadcast messages and this will be the next step. We brought everything full circle and were able to see the message on a second computer. We all took some time to come together and assemble robot chasis. We broke into pairs and fit the pieces together and applied would glue for extra security. The next day the robots were sanded and filled before Justin took them out to get painted.
This week was crazy because the lab is going to Muscatatuck next week to test the robots in the field, and we still had some things to resolve in the communications system. To make things more interesting, I left Friday morning and Ryan was gone Thursday and Friday. So the pressure was on to get things working. The robots communicate using LCM the code we've written creates a tun interface that listens for lcm messages that are tagged to go over the radio. When the tun interface sees such a message it sends the message to the radio which then transmits the message over the air. The message is then received by the other radios who then determine whether or not their laptop is the destination, and if so they forward the message to the laptop who then processes the lcm message. This worked well for sending messages to specific computers, but sending broadcast messages posed a problem because the tun interface would see that it was a broadcast message and rather than passing the message to the radio, it would keep the message to itself rather than passing it on. I was finally able to hack together a temporary solution by having the computer temporarily change the address of its tun interface when it publishes a lcm message. While not ideal, this solution doesn't seem to interfere with performance so I'm happy with the fix! I'm a little sad that I didn't get to go with the rest of the lab, but I had a blast working with everyone.