This section and Sections 6.2 to 6.5 look at QL with two pricebots, six states, and six actions. The prices, and therefore the actions, run from 0.5 to 1.0 in increments of 0.1. Both the MY and QL pricebots can learn either simultaneously or sequentially, and the outcome (equilibrium) reached depends on the method of learning. The simultaneous QL works in the following way:
save all prices
update all prices
compute all profits
learn all prices

The sequential QL instead works in the following way:

pricebot 0 learns
update pricebot 0's price
compute all profits
...
pricebot n learns
update pricebot n's price
compute all profits
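The two step orderings above can be sketched in code. The following is a minimal illustration of the scheduling only, not the implementation used for the experiments. The callbacks `choose`, `compute_profits`, and `learn`, along with their signatures, are hypothetical placeholders for the price selection, profit, and Q-value update rules. Which prices each learning step receives is also an assumption.

```python
from typing import Callable, List, Tuple

Prices = List[float]
Profits = List[float]
Choose = Callable[[int, Prices], float]            # hypothetical: bot i picks a price
Compute = Callable[[Prices], Profits]              # hypothetical: profits for all bots
Learn = Callable[[int, Prices, Prices, Profits], None]  # hypothetical: bot i updates Q


def simultaneous_round(prices: Prices, choose: Choose,
                       compute_profits: Compute, learn: Learn) -> Tuple[Prices, Profits]:
    """One simultaneous round: every pricebot reacts to the same saved prices."""
    saved = list(prices)                                        # save all prices
    updated = [choose(i, saved) for i in range(len(saved))]     # update all prices
    profits = compute_profits(updated)                          # compute all profits
    for i in range(len(saved)):                                 # learn all prices
        learn(i, saved, updated, profits)
    return updated, profits


def sequential_round(prices: Prices, profits: Profits, choose: Choose,
                     compute_profits: Compute, learn: Learn) -> Tuple[Prices, Profits]:
    """One sequential round: pricebots take turns and see earlier moves."""
    current = list(prices)
    previous = list(prices)  # assumption: prices before the most recent update
    for i in range(len(current)):
        learn(i, previous, current, profits)   # pricebot i learns
        previous = list(current)
        current[i] = choose(i, current)        # update pricebot i's price
        profits = compute_profits(current)     # compute all profits
    return current, profits
```

The design difference is that in the simultaneous round no pricebot sees another's new price before choosing its own, whereas in the sequential round pricebot i chooses after pricebots 0 to i-1 have already moved and profits have been recomputed.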
The profits for each possible (state, action) pair are given in Table 3, with states as rows and actions as columns. The price and profit stipulations from Section 2 still apply.
state/action | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
0.5 | 0 | 0.0125 | 0.025 | 0.0375 | 0.05 | 0.0625 |
0.6 | 0 | 0.05 | 0.025 | 0.0375 | 0.05 | 0.0625 |
0.7 | 0 | 0.0875 | 0.1 | 0.0375 | 0.05 | 0.0625 |
0.8 | 0 | 0.0875 | 0.175 | 0.15 | 0.05 | 0.0625 |
0.9 | 0 | 0.0875 | 0.175 | 0.2625 | 0.2 | 0.0625 |
1.0 | 0 | 0.0875 | 0.175 | 0.2625 | 0.35 | 0.25 |
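As a check on Table 3, its entries can be regenerated from a simple rule. The sketch below assumes that the state is the rival's price and the action is the pricebot's own price, that the unit cost is 0.5, and that the pricebot serves 7/8 of demand when it undercuts, 1/2 when it ties, and 1/8 when it is more expensive. These values are inferred from the pattern of the table entries and are not quoted from Section 2. The function and constant names are hypothetical.

```python
# Prices are held as integer tenths (5 means 0.5) so comparisons are exact.
PRICE_TENTHS = [5, 6, 7, 8, 9, 10]
COST_TENTHS = 5  # assumed unit cost of 0.5, inferred from the zero-profit column

# Assumed demand shares in eighths, inferred from the table entries.
SHARE_UNDERCUT = 7  # own price below the rival's price
SHARE_TIE = 4       # own price equal to the rival's price
SHARE_ABOVE = 1     # own price above the rival's price


def profit(state_tenths: int, action_tenths: int) -> float:
    """Profit for choosing price `action` when the rival's price is `state`."""
    if action_tenths < state_tenths:
        share = SHARE_UNDERCUT
    elif action_tenths == state_tenths:
        share = SHARE_TIE
    else:
        share = SHARE_ABOVE
    margin = action_tenths - COST_TENTHS
    # (margin / 10) * (share / 8), computed with a single division
    return margin * share / 80


def profit_table() -> list:
    """Rows are states, columns are actions, in the order of PRICE_TENTHS."""
    return [[profit(s, a) for a in PRICE_TENTHS] for s in PRICE_TENTHS]


if __name__ == "__main__":
    for s, row in zip(PRICE_TENTHS, profit_table()):
        print(s / 10, row)
```

Under these assumptions the best response to a rival price above 0.6 is to undercut by one increment, while undercutting a rival at 0.6 yields zero profit, which is what makes pricing at 1.0 the better reply to low rival prices in the table.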