What I am working on is pricebots. Pricebots are agents that set the price a good is sold at in order to maximize profit. Right now I am working with four different pricebot algorithms for setting the price:

-gt: game-theoretic strategy. This strategy calculates the prices that would give a mixed-strategy Nash equilibrium and sets its price accordingly.

-my: myopic optimal strategy. This strategy looks at the prices of all the other sellers and finds the minimum of those prices. It then calculates the possible profits given that minimum price (having the minimum price means the price-sensitive buyers will buy from you, which increases your profits), and sets its price to the price that gives the maximum profit.

-df: derivative following strategy. This strategy compares its current profit to its previous profit. If profit has increased, the df keeps changing its price in the same direction; if profit has decreased, it switches the direction in which it changes its price.

-ql: Q-learning strategy. This strategy uses reinforcement learning to set its price. It maintains a table of Q-values indexed by state and action; the state can be anything (we are using the minimum competitor price) and the actions are prices. Each generation, the Q-values are updated with an equation that takes into account the profit earned by the chosen price. Since the next price is usually the one with the maximum Q-value, a price that earns more profit has its Q-value pushed up, and so that price gets chosen more often.

(I put rough sketches of the my, df, and ql update rules below, mostly to keep myself straight while debugging.)

I have some code written by a student of Amy's that implements these algorithms, and I have been debugging it, trying to make sure the algorithms are implemented properly and are giving the correct output.

Once we know the algorithms are all correct, there is another aspect of the code that we have kind of been ignoring: the part that evolves the proportions of the sellers. Say we start out with 100 sellers (pricebots) evenly divided among the four algorithms, so 25 sellers use the game-theoretic strategy to set prices, 25 use the myopic optimal strategy, 25 use the derivative following strategy, and 25 use the Q-learning strategy. That is the starting point. Then, for each generation, we increase or decrease the proportions of the sellers depending on how much profit they make. So if the myopic optimal sellers make the most profit in, say, the 5th generation, then the number of myopic optimal sellers increases relative to how much profit they made, and the number of sellers using an algorithm that did not do as well decreases. We always maintain 100 sellers in total; only the proportions assigned to each algorithm change. (There is a rough sketch of this proportional update at the end, after the strategy sketches.)
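Here are the strategy sketches. Everything in them is my own simplification, not the student's code: the toy demand model (a baseline number of buyers plus a block of bargain hunters who go to the lowest-priced seller), the price grid, the constants, and the function names are all assumptions made just so the sketch runs. I left gt out because it depends on the equilibrium calculation, which I have not worked through here.

    import random

    # Toy demand model, just for illustration (not the model in the real code):
    # every seller gets a baseline number of buyers, and the seller with the
    # strictly lowest price also captures the bargain-hunting buyers.
    COST = 0.5                                               # assumed unit cost
    PRICES = [round(0.50 + 0.01 * i, 2) for i in range(51)]  # assumed price grid 0.50..1.00
    BASELINE_BUYERS = 10
    BARGAIN_BUYERS = 40

    def profit(my_price, min_other_price):
        """Profit for one generation under the toy demand model."""
        buyers = BASELINE_BUYERS
        if my_price < min_other_price:
            buyers += BARGAIN_BUYERS
        return (my_price - COST) * buyers

    def myopic_optimal_price(other_prices):
        """my: find the minimum competitor price, then choose the grid price
        that maximizes profit against that minimum."""
        min_other = min(other_prices)
        return max(PRICES, key=lambda p: profit(p, min_other))

    def derivative_following_price(price, step, prev_profit, cur_profit):
        """df: keep moving the price in the same direction while profit rises;
        reverse direction when profit falls."""
        if cur_profit < prev_profit:
            step = -step
        new_price = min(max(price + step, PRICES[0]), PRICES[-1])  # stay in range
        return new_price, step

    def q_learning_price(q_table, prev_state, prev_action, reward, new_state,
                         alpha=0.1, gamma=0.9, epsilon=0.1):
        """ql: one-step Q-value update.  States are the observed minimum
        competitor price, actions are candidate prices; the next price is
        usually the one with the highest Q-value, with occasional random
        exploration."""
        best_next = max(q_table[new_state].values())
        old = q_table[prev_state][prev_action]
        q_table[prev_state][prev_action] = old + alpha * (reward + gamma * best_next - old)
        if random.random() < epsilon:
            return random.choice(PRICES)                             # explore
        return max(q_table[new_state], key=q_table[new_state].get)   # exploit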
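And here is a rough sketch of the proportional update for evolving the seller populations. Again, this is my own guess at the rule rather than the actual code: I assume sellers are reassigned in proportion to the total profit each strategy earned, with rounding handled so the population stays at exactly 100.

    def evolve_population(counts, avg_profits, total_sellers=100):
        """Reassign sellers to strategies in proportion to the total profit each
        strategy earned this generation, keeping the population size fixed.
        (A simple profit-proportional rule; the real code's update may differ.)"""
        total_profit = sum(counts[s] * avg_profits[s] for s in counts)
        shares = {s: counts[s] * avg_profits[s] / total_profit for s in counts}
        new_counts = {s: int(shares[s] * total_sellers) for s in counts}
        # Hand the sellers lost to rounding back to the strategies with the
        # largest fractional remainders, so the total stays at exactly 100.
        leftover = total_sellers - sum(new_counts.values())
        by_remainder = sorted(counts, reverse=True,
                              key=lambda s: shares[s] * total_sellers - new_counts[s])
        for s in by_remainder[:leftover]:
            new_counts[s] += 1
        return new_counts

    # Example: 25 sellers of each type; if my earns the most per seller in a
    # generation, its count grows and the weaker strategies shrink.
    counts = {"gt": 25, "my": 25, "df": 25, "ql": 25}
    avg_profits = {"gt": 1.0, "my": 1.6, "df": 0.8, "ql": 1.2}
    print(evolve_population(counts, avg_profits))
    # -> {'gt': 22, 'my': 35, 'df': 17, 'ql': 26}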