Summer 2004 - Research Project: Reinforcement Learning

This week I am still running tests. After 35000 episodes, the agents are still not producing an optimal path. I am increasing the number of episodes to 70000 and running the tests on 12 of the Computer Science Lab computers. Hopefully the agents will perform better after such a long run.

Unfortunately the ML lab was shut down while my tests were running. I was warned, but my tests had already been running for a few days, so there was not much I could do about it. I still have the optimal paths file, but no rewards file. It doesn't really matter, since I will get the reward files from the 70000-episode tests.

After nearly a week of running, my first tests are done! I am using the 30x30 world and Sarsa learning with Boltzmann selection at temperature T=10. I am also using eligibility traces, with lambda L=0.1, L=0.2, and L=0.3. Since these tests are done, I'll try higher values of lambda with the same temperature. The results are not so encouraging. The agent has not learned anything more than it had after 35000 episodes, and the rewards it receives average around -4000. If it were finding the goal quickly, it should receive around 350-400. The value of lambda makes very little difference, except for the speed at which the reward peaks, and even there the difference is almost insignificant. The graphs below show the rewards of these tests for episodes 1 to 1000, and 1000 to 70000.

T=10, L=0.1 vs. L=0.2 vs. L=0.3 on 30x30 world - Episodes 1-1000

T=10, L=0.1 vs. L=0.2 vs. L=0.3 on 30x30 world - Episodes 1000-70000
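
For anyone who hasn't seen these methods, here is a minimal Python sketch of what Boltzmann selection and a Sarsa update with eligibility traces look like. It is an illustration only, not the code used in these tests: the function names, the dictionary-based Q and trace tables, the choice of accumulating traces, and the alpha and gamma parameters are all assumptions.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Boltzmann (softmax) probabilities: P(a) is proportional to exp(Q(a) / T)."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def sarsa_lambda_step(Q, E, s, a, r, s2, a2, alpha, gamma, lam, terminal=False):
    """One Sarsa(lambda) update with accumulating traces (an assumed variant).

    Q and E are dicts mapping (state, action) to a float; missing entries count as 0.
    """
    # TD error for the transition (s, a) -> r -> (s2, a2).
    target = r if terminal else r + gamma * Q.get((s2, a2), 0.0)
    delta = target - Q.get((s, a), 0.0)

    # Mark the pair just visited, then update every traced pair and decay its trace.
    E[(s, a)] = E.get((s, a), 0.0) + 1.0
    for key in list(E):
        Q[key] = Q.get(key, 0.0) + alpha * delta * E[key]
        E[key] *= gamma * lam
    return delta
```

The temperature sets how strongly the selection favors higher-valued actions. For example, with T=10, two actions whose values differ by 10 are picked in a ratio of about e to 1 (roughly 2.7 to 1), and the closer the values are relative to T, the closer the choice gets to uniform. Lambda controls how far back along the path each update reaches: with lambda near 0, almost all of the update goes to the most recent state-action pair.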

While the tests were running, I decided to make my website a bit more palatable, and the result is the design you see now! Unfortunately you can't compare it to the old one anymore, but you can trust me that this is much better. For one thing, it's not just text. I made the logo myself using Adobe Illustrator (which I learned how to use just for this purpose), which was a lot of fun. I also had to go all over the net figuring out how to do just about everything I wanted to do in HTML. I'm pretty happy now: I learned a lot and ended up with a design I actually like! I still have a bit of tweaking to do to make navigation easier, but that will be for next week.