After getting my computer accounts set up and working and getting settled into an office, I read four published papers about anycast routing algorithms. Reading the papers involved a lot of time spent on the internet looking up terminology, since I had very little knowledge about networks or routing. I also read a manuscript my mentor is working on which describes a new alogorithm.
Making a Plan
Next, I created a game plan for the summer. My mentor is away this week, but I met with him before he left to discuss the project. Ge Xia, one of his PhD students, has been helping me. He has also let me observe weekly meetings that he has with some other graduate students, where they discuss their ongoing research. I enjoy these meetings because they expose me to a variety of topics.
As a requirement for the Distributed Mentor Project, I have to make a website documenting my ongoing research. I learned HTML, with a little help from two generator programs: Quanta and AceHTML. I also learned some CSS, which seems very useful. I tested the website design in Internet Explorer, Netscape, Mozilla, and Konqueror. I also validated the HTML here.
There was short crisis when I had some problems with the Linux partition on my laptop. I am fairly new to Linux and don't always know what I'm doing. Apparently, I was born without the little voice inside my head that tells me to leave things alone when I don't know what they are for... but in the end everything worked out okay. I can even see my X Windows display again.
In order to test our routing algorithms, I am planning to write a simulation of a network. Fortunately, Ge Xia knew about a set of libraries called CSIM which is used for discrete event modelling, so I won't have to start from scratch. I just have to figure out how to map various parts of my network to the elements in a CSIM simulation. Another question that arises is the generation of the network topology. If I want answers about the performance of the algorithms on the internet, for example, I have to choose a network topology that is representative of the internet. People have done studies on the topology of the internet, and one of the things they figured out is that the distribution of the degrees of nodes follows a "power-law", which means that there are a few nodes of very high degree and many nodes with low degree. I downloaded a random graph generator called BRITE, which allows me to specify this property and several other properties of the graph I want to generate.
I have begun writing a simulation of a network using the CSIM libraries. The parts of the CSIM simulation are processes, facilities, and storages. I implemented requests as processes, routers as facilities, and bandwidth of links as storage. Although storage does not seem like an intuitive way to represent bandwidth, the way it is used by the simulation is appropriate. Currently, the simulation represents the arrival of a request as a Poisson process, meaning that the time between requests is based on an exponential distribution. CSIM has several random stream generators based on various statistical distributions that will be helpful as I progress.
I have represented the bandwidths between nodes as an adjacency matrix, such that the integer at position (i,j) represents the bandwidth of the directed edge that starts at i and ends at j. A bandwidth of zero means there is no edge. Storage capacity is defined based on initial bandwidthh, and built-in CSIM functions keep track of how much bandwidth has been allocated or is still available. At each request, the current amount of available bandwidth is translated to an adjacency matrix, then links that do not meet the bandwidth requirements are assigned bandwidth zero.
Algorithms and Data Structures
Next, an implementation of Dijkstra's algorithm is used to find the shortest path from each node to each of the anycast destination nodes. Currently, the nodes are represented as a linear list. Performance of Dijkstra's algorithm could be greatly improved by storing the nodes and their associated data in a Fibonacci heap instead.
Logical versus Physical
With anycast routing, the node that initiates the request does not neccessarily know that the destination it requested is a member of an anycast recipient group. In order to mirror this in the simulation, and to make manipulation of destination information more clear, I defined logical nodes to represent destinations. Each logical node maps to one or more physical nodes.
Several elements in the simulation need to be generated randomly with each request. Currently, I have written functions that use CSIM random number generators to define the source node, destination node, and the interval of time between requests. The generation is simple, with destination nodes being picked from the set of logical destinations, each logical node having equal probability of being picked.
Picking nodes from the logical destinations means that an entire group of anycast physical nodes is treated as one node. For example, if there is one node representing the Yahoo site, one representing MSN, and three nodes representing mirrors of the Google site, there will be equal chance of picking Yahoo, MSN, or Google. When Google is picked, however, there will be three possible ways to access it. It may be neccessary to change this scheme later in development if, for instance, we want to test certain percentages of anycast requests relative to total requests. I have structured the logical nodes in such a way that this will not be difficult to implement. I have also designed the logical nodes in such a way that it will be possible to define one or many sets of anycast recipients, since I have treated a unicast destination as a special case of anycast group with size equal to one.
A Simple Simulator
So far I have only tested the simulator on network topology that I have hard-coded into the program. My next step will be to access topology information from a random graph generator such as BRITE. I have not fully implemented the algorithm we are testing yet. In the current version of the simulator, nodes and edges not meeting the QoS requirements are removed and then the anycast destination is chosen based on a shortest-shortest path method. I implemented TABLEs and QTABLEs, which are features of the CSIM package, to keep track of elements of the network, and I use CSIM features to reserve resources on the network during transfer of a packet and to simulate the passage of time. Tables and
reports are printed to the screen to show the workings of the simulation. Testing so far has yielded expected results, with a larger traffic load causing increased number of requests failing due to insufficient bandwidth.
Right now, the generation of requests on the network is simulated with the exponential distribution (representing a Poisson process). Some studies have shown that internet traffic often has a bursty or self-similar behavior. It is not yet clear to me whether it will be neccessary to simulate this specific behavior in order to adequately test the algorithms. Regardless of whether I use this information in the project, it is an interesting phenomenon.
This week, I modified the simulator so that it accepts BRITE fiels as input and extracts elements of the network topology from each file. A list of input file names is saved in a text file and the simulator reruns itself until an output has been generated for each file in the list. The output files contain information about each request, such as its source, destination, QoS requirements, and whether the request was fulfilled successfully. If it was successful, the simulator outputs the path. If it was unsuccessful, the simulator indicates what caused it to fail. Some causes of failure are no path short enough exists, or not enough bandwidth is available. At the end of each output file, the simulator prints statistical information about the values which were randomly generated, and information about the usage of routers and bandwidth.
I generated several BRITE files with number of nodes varying between 10 and 100 and observed the output files created by the simulator for each of these. I noticed that long paths were not being rejected when their size exceeded the QoS requirements, and I fixed the problem. I also changed the program to treat all edges as undirected edges, since these are what BRITE outputs. To make the program more readable, I moved all input and output functions into a separate file. I also changed several constants to global variables so that values such as number of requests or number of logical destinations can be specified by the user without requiring that the program be recompiled.
My next step will be to specify a number of specific algorithms that will be used to choose the routing path. There is great disparity in the way different anycast routing algorithms work, so I may need to modify some parts of the simulation when dealing with particular algorithms. I will also have to make sure that the comparisons I make between algorithms are valid.
Never Stop Learning
The better part of this week was spent studying mathematical ideas and routing concepts that I will use in the remainder of the project. My mentor introduced some new ideas about what makes one anycast routing algorithm better than another, and described the benefits of application layer versus network layer algorithms. He described a way to treat the problem of choosing a path as an optimization problem. I spent most of the week studying optimization topics, including the traveling salesman problem, maximization of network flows, and linear programming.
Random Algorithms and Safety Based Routing
Our algorithm uses weighted random selection when choosing a path through the network. This method has been shown to help balance network traffic load. I began adding a feature to the source code of the algorithm that will allow selection of a next hop based on an empirical distribution, where the empirical values will be the weighted probability of choosing each link. I researched safety-based routing and the update triggering policies that are used to determine when link state information needs to be updated.
Inaccurate Link State Information
This week I added a feature to the simulator that allows each router to have a "view" of the network that may or may not be the same as the actual state of the network. When the status of a link changes, the node connected tests to see whether it is a significant change. All changes are significant at this time, but the function can be edited to reflect various update policies. If the change is significant, the node floods the network with an advertisement about the new state of the link. The advertisement is spread using breadth-first search, with a delay occurring at each level.
More About Randomized Algorithms
I have begun reading a book called Randomized Algorithms by Motwani and Raghavan. The book was loaned to me by Ge Xia after I expressed an interest in the subject. The book describes advantages of using randomness in certain situations and also discusses techniques for analyzing and applying randomized algorithms.
Testing and Debugging
This week I spent time testing and debugging the simulation by running it with the shortest-shortest path algorithm and determining whether the results matched what was expected. I ran the simulation with several BRITE graphs of different sizes and with various parameter values. I observed that the larger the number of requests, the more consistent the results. I think it will be a good idea to use statistical methods to determine what number of requests is necessary to produce an acceptable p-value. I modified the simulation to loop with the same input file, incrementing the time interval between requests with each loop. The results are output to a file in a form that will be easy to graph later using gnuplot or another utility.
In addition to the shortest-shortest path algorithm used for determining an anycast path in the simulation, I worked on two other algorithms: Random Based on Hops and Random Based on Hops and Safety. Both these use a weighted probability to choose an anycast destination and use another weighted probability to choose the next hop based on destination. I am using a CSIM random stream drawn from an empirical distribution to implement these. I expect to have both algorithms finished by the end of next week. In writing the algorithms, I am seeking to make them as modular as possible, so that certain pieces may be reused among algorithms.
This week I prepared a poster outlining the main ideas of the project. I graphed the results from several runs of the simulation using gnuplot. I also created flow charts to illustrate some main ideas. I designed the 40" by 30" poster using Open Office, then saved it as a Power Point file and printed it on the department's plotter. On Friday, there was a presentation. Everyone stood by their poster and gave a 10 minute presentation to each of two judges.
I spent the tenth week writing up a final report about the project. I ran the simulation on several different graphs of different sizes and plotted the results using gnuplot. I recorded some background info, a description of the project, the results, and my interpretation in a paper. My final step will be to post the report on my website and move all the HTML files to the DMP server.