
Q-learning: Simultaneous and Sequential

This section and Sections 6.2 through 6.5 examine QL when there are two pricebots, six states, and six actions. The prices, and therefore the actions, range from 0.5 to 1.0 in increments of 0.1. Both the MY and QL pricebots can learn either simultaneously or sequentially, and the two methods of learning lead to different outcomes and equilibria. Simultaneous QL works in the following way (a sketch follows the list):

save all prices
update all prices
compute all profits
learn all prices
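
A minimal Python sketch of this simultaneous loop for the two-pricebot case is given below. The learning rate, discount factor, and epsilon-greedy exploration are illustrative assumptions, and the payoff function is one rule that reproduces Table 3; none of these details are quoted from this section.

    # Hypothetical sketch of the simultaneous loop (not the author's code).
    # Assumed: epsilon-greedy exploration, learning rate ALPHA, discount GAMMA,
    # and a payoff rule chosen to reproduce Table 3 (cost 0.5, wA = 0.25 loyal
    # buyers split at random, wB = 0.75 bargain hunters buy from the cheaper seller).
    import random

    PRICES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]        # six prices = six states = six actions
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # assumed parameters

    def profit(p, other):
        margin = p - 0.5                           # assumed production cost of 0.5
        if p < other:
            return margin * (0.75 + 0.25 / 2)      # all of wB plus half of wA
        if p > other:
            return margin * (0.25 / 2)             # only half of wA
        return margin * 0.5                        # equal prices: split the whole market

    def choose(q_row):
        """Epsilon-greedy pick of an action index from one row of a Q-table."""
        if random.random() < EPSILON:
            return random.randrange(len(PRICES))
        return max(range(len(PRICES)), key=lambda a: q_row[a])

    def simultaneous_step(Q, price_idx):
        """One simultaneous iteration; Q[i][s][a] is pricebot i's value of charging
        price a when the other pricebot last charged price s."""
        saved = list(price_idx)                                  # save all prices
        states = [saved[1], saved[0]]                            # each bot's state: the other's price
        actions = [choose(Q[i][states[i]]) for i in (0, 1)]      # update all prices
        profits = [profit(PRICES[actions[i]], PRICES[actions[1 - i]])
                   for i in (0, 1)]                              # compute all profits
        for i in (0, 1):                                         # learn all prices
            s, a, s_next = states[i], actions[i], actions[1 - i]
            Q[i][s][a] += ALPHA * (profits[i] + GAMMA * max(Q[i][s_next]) - Q[i][s][a])
        return actions

    # Usage: start both pricebots at 0.5 and run a number of iterations.
    Q = [[[0.0] * 6 for _ in range(6)] for _ in range(2)]
    idx = [0, 0]
    for _ in range(1000):
        idx = simultaneous_step(Q, idx)
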
Sequential QL, in contrast, works this way (a sketch follows the list):
pricebot 0 learns
update pricebot 0's price
compute all profits
 
pricebot 1 learns
update pricebot 1's price
compute all profits
...
 
pricebot n learns
update pricebot n's price
compute all profits
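
A corresponding sketch of the sequential loop is shown below, reusing PRICES, ALPHA, GAMMA, profit(), and choose() from the simultaneous sketch above. The exact placement of the Q-update within each pricebot's turn follows one reasonable reading of the ordering above and is an assumption.

    # Hypothetical sketch of the sequential loop (not the author's code).
    def sequential_pass(Q, price_idx, memory):
        """One sequential pass over the pricebots: each bot first learns from its
        previous turn, then reprices, then all profits are recomputed.
        memory[i] holds (state, action, profit) from bot i's last turn, or None
        on the first pass."""
        n = len(price_idx)
        for i in range(n):
            other = (i + 1) % n                    # with two pricebots, the lone opponent
            s_now = price_idx[other]               # opponent's current price index
            if memory[i] is not None:              # pricebot i learns
                s, a, r = memory[i]
                Q[i][s][a] += ALPHA * (r + GAMMA * max(Q[i][s_now]) - Q[i][s][a])
            a = choose(Q[i][s_now])                # update pricebot i's price
            price_idx[i] = a
            profits = [profit(PRICES[price_idx[j]], PRICES[price_idx[(j + 1) % n]])
                       for j in range(n)]          # compute all profits
            memory[i] = (s_now, a, profits[i])     # remembered for i's next turn
        return price_idx

    # Usage: Q and PRICES as in the simultaneous sketch; start both bots at 0.5.
    Q = [[[0.0] * 6 for _ in range(6)] for _ in range(2)]
    idx, memory = [0, 0], [None, None]
    for _ in range(1000):
        idx = sequential_pass(Q, idx, memory)
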

The profit for each possible (state, action) pair is given in Table 3. The same price and profit stipulations given in Section 2 still apply.

state \ action    0.5    0.6      0.7     0.8      0.9    1.0
0.5               0      0.0125   0.025   0.0375   0.05   0.0625
0.6               0      0.05     0.025   0.0375   0.05   0.0625
0.7               0      0.0875   0.1     0.0375   0.05   0.0625
0.8               0      0.0875   0.175   0.15     0.05   0.0625
0.9               0      0.0875   0.175   0.2625   0.2    0.0625
1.0               0      0.0875   0.175   0.2625   0.35   0.25

Table 3. Payoff table for wA = 0.25 and wB = 0.75.
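
As a consistency check, the assumed payoff rule used in the sketches above regenerates the table entries, with the state indexing the row (read here as the other pricebot's price) and the action indexing the column. The rule is inferred from the table values, not quoted from Section 2.

    # Regenerate Table 3 from the assumed payoff rule (profit() and PRICES as above).
    for state in PRICES:
        print(state, [round(profit(action, state), 4) for action in PRICES])
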

