What I am working on is pricebots. Pricebots are agents that set the price a good is sold at in order to maximize profit. Right now I am working with four different pricebot algorithms for setting the price:

-gt: game-theoretic strategy. This strategy calculates the prices that would give a mixed-strategy Nash equilibrium and sets its price accordingly.

-my: myopic optimal strategy. This strategy looks at the prices of all the other sellers and finds the minimum of those prices. It then calculates the possible profits given that minimum price (having the minimum price means the price-sensitive buyers will buy from you, which increases your profits), and sets its price to the price that gives the maximum profit.

-df: derivative following strategy. This strategy compares its current profit to its previous profit. If profit has increased, the df keeps changing its price in the same direction; if profit has decreased, it switches the direction in which it changes its price.

-ql: Q-learning strategy. This strategy uses reinforcement learning to set its price. It maintains a table of Q-values indexed by state and action; the state can be anything (we are using the minimum competitor price) and the actions are prices. Each generation, the Q-values are updated with an equation that takes into account the profit earned by the chosen price. Since the next price is usually the one with the maximum Q-value, a price that earns more profit has its Q-value pushed up, and so that price gets chosen more often.

(I put rough sketches of the my, df, and ql update rules below, mostly to keep myself straight while debugging.)

I have some code written by a student of Amy's that implements these algorithms, and I have been debugging it, trying to make sure the algorithms are implemented properly and are giving the correct output.

Once we know the algorithms are all correct, there is another aspect of the code that we have kind of been ignoring: the part that evolves the proportions of the sellers. Say we start out with 100 sellers (pricebots) evenly divided among the four algorithms, so 25 sellers use the game-theoretic strategy to set prices, 25 use the myopic optimal strategy, 25 use the derivative following strategy, and 25 use the Q-learning strategy. That is the starting point. Then, for each generation, we increase or decrease the proportions of the sellers depending on how much profit they make. So if the myopic optimal sellers make the most profit in, say, the 5th generation, then the number of myopic optimal sellers increases relative to how much profit they made, and the number of sellers using an algorithm that did not do as well decreases. We always maintain 100 sellers in total; only the proportions assigned to each algorithm change. (There is a rough sketch of this proportional update at the end, after the strategy sketches.)
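Here are the strategy sketches. Everything in them is my own simplification, not the student's code: the toy demand model (a baseline number of buyers plus a block of bargain hunters who go to the lowest-priced seller), the price grid, the constants, and the function names are all assumptions made just so the sketch runs. I left gt out because it depends on the equilibrium calculation, which I have not worked through here.

    import random

    # Toy demand model, just for illustration (not the model in the real code):
    # every seller gets a baseline number of buyers, and the seller with the
    # strictly lowest price also captures the bargain-hunting buyers.
    COST = 0.5                                               # assumed unit cost
    PRICES = [round(0.50 + 0.01 * i, 2) for i in range(51)]  # assumed price grid 0.50..1.00
    BASELINE_BUYERS = 10
    BARGAIN_BUYERS = 40

    def profit(my_price, min_other_price):
        """Profit for one generation under the toy demand model."""
        buyers = BASELINE_BUYERS
        if my_price < min_other_price:
            buyers += BARGAIN_BUYERS
        return (my_price - COST) * buyers

    def myopic_optimal_price(other_prices):
        """my: find the minimum competitor price, then choose the grid price
        that maximizes profit against that minimum."""
        min_other = min(other_prices)
        return max(PRICES, key=lambda p: profit(p, min_other))

    def derivative_following_price(price, step, prev_profit, cur_profit):
        """df: keep moving the price in the same direction while profit rises;
        reverse direction when profit falls."""
        if cur_profit < prev_profit:
            step = -step
        new_price = min(max(price + step, PRICES[0]), PRICES[-1])  # stay in range
        return new_price, step

    def q_learning_price(q_table, prev_state, prev_action, reward, new_state,
                         alpha=0.1, gamma=0.9, epsilon=0.1):
        """ql: one-step Q-value update.  States are the observed minimum
        competitor price, actions are candidate prices; the next price is
        usually the one with the highest Q-value, with occasional random
        exploration."""
        best_next = max(q_table[new_state].values())
        old = q_table[prev_state][prev_action]
        q_table[prev_state][prev_action] = old + alpha * (reward + gamma * best_next - old)
        if random.random() < epsilon:
            return random.choice(PRICES)                             # explore
        return max(q_table[new_state], key=q_table[new_state].get)   # exploit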
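And here is a rough sketch of the proportional update for evolving the seller populations. Again, this is my own guess at the rule rather than the actual code: I assume sellers are reassigned in proportion to the total profit each strategy earned, with rounding handled so the population stays at exactly 100.

    def evolve_population(counts, avg_profits, total_sellers=100):
        """Reassign sellers to strategies in proportion to the total profit each
        strategy earned this generation, keeping the population size fixed.
        (A simple profit-proportional rule; the real code's update may differ.)"""
        total_profit = sum(counts[s] * avg_profits[s] for s in counts)
        shares = {s: counts[s] * avg_profits[s] / total_profit for s in counts}
        new_counts = {s: int(shares[s] * total_sellers) for s in counts}
        # Hand the sellers lost to rounding back to the strategies with the
        # largest fractional remainders, so the total stays at exactly 100.
        leftover = total_sellers - sum(new_counts.values())
        by_remainder = sorted(counts, reverse=True,
                              key=lambda s: shares[s] * total_sellers - new_counts[s])
        for s in by_remainder[:leftover]:
            new_counts[s] += 1
        return new_counts

    # Example: 25 sellers of each type; if my earns the most per seller in a
    # generation, its count grows and the weaker strategies shrink.
    counts = {"gt": 25, "my": 25, "df": 25, "ql": 25}
    avg_profits = {"gt": 1.0, "my": 1.6, "df": 0.8, "ql": 1.2}
    print(evolve_population(counts, avg_profits))
    # -> {'gt': 22, 'my': 35, 'df': 17, 'ql': 26}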