
Q-learning: Simultaneous and Sequential

This section and Sections 6.2 through 6.5 examine QL when there are two pricebots, six states, and six actions. The prices, and therefore the actions, range from 0.5 to 1.0 in increments of 0.1. Both the MY and QL pricebots can learn either simultaneously or sequentially, and the two methods of learning lead to different outcomes and equilibria. Simultaneous QL works in the following way (a sketch follows the list):

save all prices
update all prices
compute all profits
learn all prices
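
A minimal Python sketch of this simultaneous loop for the two-pricebot case is given below. The learning rate, discount factor, and epsilon-greedy exploration are illustrative assumptions, and the payoff function is one rule that reproduces Table 3; none of these details are quoted from this section.

    # Hypothetical sketch of the simultaneous loop (not the author's code).
    # Assumed: epsilon-greedy exploration, learning rate ALPHA, discount GAMMA,
    # and a payoff rule chosen to reproduce Table 3 (cost 0.5, wA = 0.25 loyal
    # buyers split at random, wB = 0.75 bargain hunters buy from the cheaper seller).
    import random

    PRICES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]        # six prices = six states = six actions
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # assumed parameters

    def profit(p, other):
        margin = p - 0.5                           # assumed production cost of 0.5
        if p < other:
            return margin * (0.75 + 0.25 / 2)      # all of wB plus half of wA
        if p > other:
            return margin * (0.25 / 2)             # only half of wA
        return margin * 0.5                        # equal prices: split the whole market

    def choose(q_row):
        """Epsilon-greedy pick of an action index from one row of a Q-table."""
        if random.random() < EPSILON:
            return random.randrange(len(PRICES))
        return max(range(len(PRICES)), key=lambda a: q_row[a])

    def simultaneous_step(Q, price_idx):
        """One simultaneous iteration; Q[i][s][a] is pricebot i's value of charging
        price a when the other pricebot last charged price s."""
        saved = list(price_idx)                                  # save all prices
        states = [saved[1], saved[0]]                            # each bot's state: the other's price
        actions = [choose(Q[i][states[i]]) for i in (0, 1)]      # update all prices
        profits = [profit(PRICES[actions[i]], PRICES[actions[1 - i]])
                   for i in (0, 1)]                              # compute all profits
        for i in (0, 1):                                         # learn all prices
            s, a, s_next = states[i], actions[i], actions[1 - i]
            Q[i][s][a] += ALPHA * (profits[i] + GAMMA * max(Q[i][s_next]) - Q[i][s][a])
        return actions

    # Usage: start both pricebots at 0.5 and run a number of iterations.
    Q = [[[0.0] * 6 for _ in range(6)] for _ in range(2)]
    idx = [0, 0]
    for _ in range(1000):
        idx = simultaneous_step(Q, idx)
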
Sequential QL, in contrast, works this way (a sketch follows the list):
pricebot 0 learns
update pricebot 0's price
compute all profits
 
pricebot 1 learns
update pricebot 1's price
compute all profits
...
 
pricebot n learns
update pricebot n's price
compute all profits
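
A corresponding sketch of the sequential loop is shown below, reusing PRICES, ALPHA, GAMMA, profit(), and choose() from the simultaneous sketch above. The exact placement of the Q-update within each pricebot's turn follows one reasonable reading of the ordering above and is an assumption.

    # Hypothetical sketch of the sequential loop (not the author's code).
    def sequential_pass(Q, price_idx, memory):
        """One sequential pass over the pricebots: each bot first learns from its
        previous turn, then reprices, then all profits are recomputed.
        memory[i] holds (state, action, profit) from bot i's last turn, or None
        on the first pass."""
        n = len(price_idx)
        for i in range(n):
            other = (i + 1) % n                    # with two pricebots, the lone opponent
            s_now = price_idx[other]               # opponent's current price index
            if memory[i] is not None:              # pricebot i learns
                s, a, r = memory[i]
                Q[i][s][a] += ALPHA * (r + GAMMA * max(Q[i][s_now]) - Q[i][s][a])
            a = choose(Q[i][s_now])                # update pricebot i's price
            price_idx[i] = a
            profits = [profit(PRICES[price_idx[j]], PRICES[price_idx[(j + 1) % n]])
                       for j in range(n)]          # compute all profits
            memory[i] = (s_now, a, profits[i])     # remembered for i's next turn
        return price_idx

    # Usage: Q and PRICES as in the simultaneous sketch; start both bots at 0.5.
    Q = [[[0.0] * 6 for _ in range(6)] for _ in range(2)]
    idx, memory = [0, 0], [None, None]
    for _ in range(1000):
        idx = sequential_pass(Q, idx, memory)
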

The profit for each possible (state, action) pair is given in Table 3. The same price and profit stipulations given in Section 2 still apply.

state \ action    0.5    0.6      0.7     0.8      0.9    1.0
0.5               0      0.0125   0.025   0.0375   0.05   0.0625
0.6               0      0.05     0.025   0.0375   0.05   0.0625
0.7               0      0.0875   0.1     0.0375   0.05   0.0625
0.8               0      0.0875   0.175   0.15     0.05   0.0625
0.9               0      0.0875   0.175   0.2625   0.2    0.0625
1.0               0      0.0875   0.175   0.2625   0.35   0.25

Table 3. Payoff table for wA = 0.25 and wB = 0.75.
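
As a consistency check, the assumed payoff rule used in the sketches above regenerates the table entries, with the state indexing the row (read here as the other pricebot's price) and the action indexing the column. The rule is inferred from the table values, not quoted from Section 2.

    # Regenerate Table 3 from the assumed payoff rule (profit() and PRICES as above).
    for state in PRICES:
        print(state, [round(profit(action, state), 4) for action in PRICES])
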

