The CRA Distributed Mentor Project '01
In the first week I just got comfortable with the SFBP. I read introductory papers (including Amy's ICML paper and Brian's initial paper). I also ran some small tests in which each agent went to the bar with a fixed probability (0.6 in this case). Here I got results with a mean of 60 people at the bar and a variance of ~24. This was all groundwork for the actual problem and did not include any learning.
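This baseline is just independent coin flips, so attendance is binomially distributed. A minimal sketch, assuming the standard 100-agent SFBP population (consistent with the reported mean of 60 and variance of 24, since 100 × 0.6 × 0.4 = 24):

```python
import random

def simulate(n_agents=100, p=0.6, rounds=1000, seed=0):
    """Each agent independently attends the bar with probability p;
    returns the mean and variance of attendance over all rounds."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(n_agents))
              for _ in range(rounds)]
    mean = sum(counts) / rounds
    var = sum((c - mean) ** 2 for c in counts) / rounds
    return mean, var

mean, var = simulate()
# Binomial(100, 0.6): expected mean 60, expected variance 100 * 0.6 * 0.4 = 24
```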
In the second week of my research I followed up on the work Amy had done in her ICML paper. I read about boosting and implemented it as the agents' learning algorithm. Boosting proved fairly unsuccessful in that the results were always unfair, with some agents attending the bar every day and others never going. In the second week I also ran tests with predicted equilibrium changes within the SFBP for several reward schemes: a home reward r(h) of 0 with a bar reward r(b) of +/-1; r(h) of -/+1 with r(b) of +/-1; r(h) as a function of the number of people at the bar plus a fee for attending; and r(h) as a ratio reward reflecting the number of people attending over the capacity of the bar. I also began to learn LaTeX (crazy!).
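The reward variants above can be encoded as small functions. This is only an illustrative sketch; the exact functional forms used that summer are not recorded here, and the function names and the 60-person capacity threshold are assumptions:

```python
CAPACITY = 60  # standard SFBP bar capacity (assumed)

def r_bar_basic(attendance):
    # r(b) = +/-1: bar rewards +1 under capacity, -1 over
    return 1 if attendance <= CAPACITY else -1

def r_home_flipped(attendance):
    # r(h) = -/+1: home is penalized when the bar is under capacity
    return -1 if attendance <= CAPACITY else 1

def r_home_ratio(attendance):
    # ratio reward: number attending over the capacity of the bar
    return attendance / CAPACITY
```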
In week three I began to focus on more complex learning algorithms with a little more potential (or so it seemed). I implemented hedge-boost, which exponentially reinforces the rewards gained or lost for taking an action. Unfortunately we found that Hedge also converged to an unfair solution, though if you play with the base factor (in the exponent for the rewards) you can slow that convergence so the results appear fair for part of the time. This threw me off for a while before I figured out that the results were really not as good as I had hoped. But on to bigger and better weeks.
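The core Hedge update, with the base factor mentioned above, looks something like this. A sketch under assumed details (two actions, loss-based reweighting; the class and method names are mine, not from the original code):

```python
import random

class HedgeAgent:
    """Hedge over two actions (0 = home, 1 = bar).

    beta is the base factor in the exponent: values closer to 1
    reweight more gently and so slow down convergence, as noted above."""

    def __init__(self, beta=0.9, seed=None):
        self.beta = beta
        self.weights = [1.0, 1.0]
        self.rng = random.Random(seed)

    def act(self):
        # Sample an action with probability proportional to its weight.
        total = sum(self.weights)
        return 0 if self.rng.random() < self.weights[0] / total else 1

    def update(self, losses):
        # Exponential reweighting: a higher loss shrinks the weight faster.
        for i, loss in enumerate(losses):
            self.weights[i] *= self.beta ** loss
```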
In the fourth week we introduced Q-learning, which would dominate the research for the rest of the summer. I had wanted to encourage coordination between agents (it seemed that was a major missing piece in the puzzle of the SFBP) and Amy suggested Q-learning. With Q-learning I could control the number of states the agents saw: there could be a bar state and a home state, but there might also be a wait state, thus encouraging the agents to maximize their overall utility (and the utility of the game). When I first worked with Q-learning, I had my agents completely explore their state space for 20% of the rounds and then exploit it. Looking back, this seems a little odd, as they are exploring off of each other's random actions and only really seem to learn at the beginning of the exploitation time (though with too short an exploration time, results are worse). The agents were converging to a solution, but some went to the bar every other day, which suggested the ability to learn coordination. At this point I thought I really might have a solution: my results were incredibly encouraging, and it looked like a little tweaking might push my agents to a fair solution that maximized the overall utility. But I couldn't seem to push them there. So I stepped back and pointed myself in a new direction.
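The discrete explore-then-exploit scheme described here can be sketched as a tabular Q-learner. The learning rate and discount values are illustrative assumptions, not the ones actually used:

```python
import random

class QLearner:
    """Tabular Q-learner that explores uniformly at random for the first
    explore_frac of the rounds, then exploits greedily. alpha (learning
    rate) and gamma (discount) are illustrative defaults."""

    def __init__(self, n_states, n_actions, total_rounds,
                 explore_frac=0.2, alpha=0.1, gamma=0.9, seed=None):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.explore_until = int(explore_frac * total_rounds)
        self.alpha, self.gamma = alpha, gamma
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state, t):
        if t < self.explore_until:           # exploration phase
            return self.rng.randrange(self.n_actions)
        row = self.q[state]                  # exploitation phase: greedy
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update toward reward + discounted
        # value of the best next action.
        best_next = max(self.q[next_state])
        self.q[state][action] += self.alpha * (
            reward + self.gamma * best_next - self.q[state][action])
```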
After talking with Amy about results from the discrete stages of exploration and exploitation, we decided to make the change (from exploring to exploiting) a continuously decaying one. Amy suggested a decay equation, which was tweaked and adjusted many times but worked well in the end, and I was off on another task. The continuous learners (as they are hereafter dubbed) did not seem to converge to an unfair solution; in fact, the agents didn't seem to converge at all. But the unconverged results were so good that we ended up using them as an actual solution. In other words, while Q-learning is generally run until convergence and then tested, we tested while it was converging and came out with positive findings. The agents, though not maximizing their utilities, were acting fairly and coordinating. Good, good stuff.
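The decay equation Amy suggested isn't reproduced in this report, so the following is only a stand-in: exponential decay is one common way to make the exploration probability fall off continuously with the round number. The decay rate and floor are assumed parameters:

```python
import math

def explore_prob(t, decay=0.001, floor=0.0):
    """Probability of taking a random (exploratory) action at round t.
    Starts at 1 and decays continuously; the actual equation used that
    summer was different and was tuned repeatedly."""
    return max(floor, math.exp(-decay * t))
```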
At this point both Amy and I decided that we needed to start condensing and organizing the work I had done. Though it seemed like a lot of good things had come out of my research, it was so disorganized that no one besides us could see that. So, I dove back into my adventures with LaTeX and began to write up results. I also learned how to use gnuplot and started outputting lots of pretty graphs. Up until this point I had struggled with jplot and gnuplot, but once I was forced to sit down and graph results, it all became second nature. I should also mention that twice a week, every week, Amy had organized a reading group of about six of us in which we read papers on different learning algorithms, game theory, and economics. I liked the setup of this group because I got to learn about things outside of my direct research but pertinent to my work and, more importantly, got to meet with, talk to, and hang out with others working on AI and economics with Amy.
In week 7 I returned to working with my Q-learners and pretty much tried to coerce them into learning fair and maximized solutions. I played with the different constants, the number of states, and the rewards in search of the best outcome. I made some progress this week, but only in small steps and nothing really worth noting besides an added comfort level with my code and a real feel for what my agents were doing. Some of the more interesting things tried this week were the addition of more than two or three states and the changing of rewards for agents at home. Unfortunately, like I said before, these changes did not really lead anywhere new.
Because our Q-learners seemed fairly stuck in their current state, we decided to add a fee for attending the bar. Amy suggested charging agents who go to the bar a fee and then redistributing it among those who stayed home, again encouraging coordination. The fee, in this sense, was successful. Depending on the value of the fee (note that we only really used a fee with the continuous learners) we could easily influence the number of agents who attended the bar. This was a promising path to follow, but we were hard-coding the fee, which seemed a lot like a hack; thus, week 9.
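The fee-and-redistribution mechanism is simple bookkeeping: attendees each pay the fee, and the pooled amount is split evenly among the agents at home, so total reward is conserved. A sketch with hypothetical names (the report doesn't give the actual code):

```python
def apply_fee(rewards, at_bar, fee):
    """Charge each bar attendee a fixed fee and split the proceeds evenly
    among the agents who stayed home.

    rewards: dict mapping agent id -> base reward this round
    at_bar:  set of agent ids who attended the bar
    """
    home = [a for a in rewards if a not in at_bar]
    refund = fee * len(at_bar) / len(home) if home else 0.0
    return {a: (r - fee if a in at_bar else r + refund)
            for a, r in rewards.items()}
```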
I talked to Amy about my concerns over hard-coding the fee, because it seemed to me that, like the agents' own probabilities, the fee should in some sense be learned. I had seen other people in my office (also working with Amy) trying out a derivative-following algorithm that made small changes to the price of a product they were trying to sell (decreasing the increment each time it reversed direction) until the price converged to a good result. I decided to implement the same algorithm for the changing fee. Thus, the fee began at zero and, depending on the bar attendance (i.e., over/under capacity), went up or down to encourage movement toward maximizing utility. This turned out to be a great idea. At first I tried to get 60 agents (the standard SFBP capacity) to the bar every round and adjust the fee around that, but then, after talking with Amy about results, began working on maximizing utility (over segments of 1,000-100,000 rounds). We had very good results from this. Our agents were learning probabilities of going to the bar that maximized their utility, and the fee was being learned by what we dubbed the mayor (who sought to maximize all the agents' happiness).
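The mayor's derivative follower can be sketched as below: keep stepping the fee in the same direction while measured utility improves, and reverse with a smaller step when it drops. The class name, step sizes, and shrink factor are illustrative assumptions:

```python
class Mayor:
    """Derivative follower for the bar fee. Starts the fee at zero and
    nudges it each segment; when utility falls, it reverses direction
    and shrinks the increment, so the fee homes in on a good value."""

    def __init__(self, step=0.1, shrink=0.5):
        self.fee = 0.0
        self.step = step          # signed increment applied each segment
        self.shrink = shrink     # how much to shrink on each reversal
        self.last_utility = None

    def adjust(self, utility):
        """Update the fee given the utility measured over the last segment."""
        if self.last_utility is not None and utility < self.last_utility:
            self.step *= -self.shrink   # utility fell: reverse, smaller step
        self.last_utility = utility
        self.fee = max(0.0, self.fee + self.step)
        return self.fee
```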
This was the cleanup week. I finished running tests and wrote up the results. Of course, in the last two days Amy had crazy (in a good way, of course) ideas about reward functions that were not +/-1 but instead varied according to a sort of bell-like equation. I implemented this final idea to see how agents would react if they had different reward functions (i.e., some only want to be at the bar with 25 people, others 50, and a third group 75). It all worked out very well, with interesting results on how agents respond to these differing reward functions. On the last day I was torn between generating new results, writing them up, and, at the end, figuring out how to use the evil monster known as BibTeX. All in all, I have to say, it was an awesome summer. Amy was fantastic to work with: intelligent, helpful, nice, and fun. The people around me (undergrad and grad students) were all really cool. And I straight-up enjoyed being able to have an idea and just make it happen. Good times!
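One natural reading of "a sort of bell-like equation" is a Gaussian reward peaking at each agent's preferred crowd size; the exact equation isn't recorded here, and the width parameter is an assumption:

```python
import math

def bell_reward(attendance, preferred, width=15.0):
    """Bell-shaped reward that peaks at the agent's preferred attendance
    and falls off smoothly on either side (a hypothetical Gaussian form)."""
    return math.exp(-((attendance - preferred) ** 2) / (2 * width ** 2))

# Three groups with different preferred crowd sizes, as described above:
# agents preferring 25, 50, and 75 people each evaluate the same attendance
# differently.
rewards_at_50 = {p: bell_reward(50, p) for p in (25, 50, 75)}
```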