The IBM Environment

In the traditional IBM setup, there are 11 buyers and 11 sellers. The limit prices for the agents are symmetric and they are uniformly distributed in increments of 25 from 75 to 325. The maximum price in the market is 400, and the minimum is 0.
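As a minimal sketch of the price grid just described, the limit prices can be generated as follows. The constant names are hypothetical, and the code assumes that "symmetric" means buyers and sellers draw from the same set of values, with one value per agent on each side.

```python
# Values taken from the text; names are illustrative only.
MIN_PRICE = 0
MAX_PRICE = 400
NUM_BUYERS = 11
NUM_SELLERS = 11

# 75, 100, ..., 325: eleven evenly spaced limit prices.
LIMIT_PRICES = list(range(75, 325 + 1, 25))

# Assumption: both sides of the market use the same grid, one price per agent.
buyer_limits = list(LIMIT_PRICES)
seller_limits = list(LIMIT_PRICES)
```

With increments of 25, the grid contains exactly as many values as there are agents on each side.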
One experiment is made up of 10 trading periods, and each trading period is made up of 100 clock ticks.
At the beginning of each EXPERIMENT, the agents' internal characteristics, such as their roles (buyer or seller), their limit prices, and their learning algorithms, are initialized. The trade history, order history, and order numbering are cleared.
At the beginning of each TRADING PERIOD, each buyer starts with zero units of the good and cash equal to the sum of its limit prices, and each seller starts with the quantity of the good specified by the experiment and zero cash. The list of outstanding orders is cleared, but the order and trade history remain intact, as do the agents' internal learned prices. In other words, at the beginning of a trading period, agents start over with the trading process, but they still have their learning from the past periods to rely on.
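The per-period reset of holdings can be sketched as below. The function name, the dictionary layout, and the seller_units parameter are hypothetical; the text only says that the seller's quantity is specified by the experiment.

```python
def initial_holdings(role, limit_prices, seller_units=0):
    """Return an agent's holdings at the start of a trading period.

    role         -- "buyer" or "seller"
    limit_prices -- the agent's limit prices (one per unit it may trade)
    seller_units -- quantity given to a seller (set by the experiment)
    """
    if role == "buyer":
        # Buyers hold no goods and enough cash to pay every limit price.
        return {"units": 0, "cash": sum(limit_prices)}
    if role == "seller":
        # Sellers hold the specified quantity of the good and no cash.
        return {"units": seller_units, "cash": 0}
    raise ValueError(f"unknown role: {role!r}")
```

Learned prices and the trade and order histories are deliberately not touched here, since they carry over between periods.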
At the beginning of each CLOCK TICK, the agents are randomly placed in a queue. In sequence, they are each given the opportunity to place an order. They may place an order if they are active (each agent has a 0.35 probability of being active on any given clock tick), if they have no open orders, and if they have not traded their maximum quantity of the good. If these conditions hold, the agent submits a price and order type (bid/ask). After each agent has had the opportunity to place an order, the new orders are matched with old ones from the list of outstanding orders. If there is a match, a trade is executed at the price dictated by the earlier of the two matching orders; otherwise, the new order is appended to the list of outstanding orders. All histories are updated accordingly. After orders are matched, outstanding orders older than 8 clock ticks expire. Finally, agents learn in their respective ways from the market activity that occurred during the clock tick.
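The eligibility check and the matching step can be sketched as follows. This is an illustration under stated assumptions, not the simulator's actual code: the Order class and both function names are hypothetical, and the text does not say which outstanding order is chosen when several could match, so the sketch assumes the best-priced one, with ties going to the earliest.

```python
import random
from dataclasses import dataclass

ACTIVE_PROB = 0.35  # probability of being active on a clock tick (from the text)

@dataclass
class Order:
    agent_id: int
    side: str   # "bid" or "ask"
    price: int
    tick: int   # clock tick on which the order was placed

def may_place_order(has_open_order, units_traded, max_units, rng=random):
    """Active this tick, no open order, and trading quota not yet exhausted."""
    is_active = rng.random() < ACTIVE_PROB
    return is_active and not has_open_order and units_traded < max_units

def match_order(new, book):
    """Try to match a new order against the outstanding orders in `book`.

    Returns the trade price, or None if the order was added to the book.
    The trade price is that of the earlier (outstanding) order.
    """
    if new.side == "bid":
        crossing = [o for o in book if o.side == "ask" and o.price <= new.price]
        # Assumption: pick the lowest ask, earliest first on ties.
        resting = min(crossing, key=lambda o: (o.price, o.tick), default=None)
    else:
        crossing = [o for o in book if o.side == "bid" and o.price >= new.price]
        # Assumption: pick the highest bid, earliest first on ties.
        resting = max(crossing, key=lambda o: (o.price, -o.tick), default=None)
    if resting is None:
        book.append(new)   # no match: the order becomes outstanding
        return None
    book.remove(resting)   # match: the outstanding order is consumed
    return resting.price
```

For example, a bid at 250 arriving against an outstanding ask at 200 trades at 200, because the ask was placed first.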
Some explanation of several aspects of this market may be helpful. The 0.35 probability of being active on any given clock tick is intended to model the timing of the orders. The clock ticks represent the operational time scale of the institution, that is, the time it takes to receive an order or execute a trade and announce it to all participants. The assumption is that both the institution and the agents are very fast, but that the institution is faster than the agents. The goal in this modeling process is to minimize the number of events that appear simultaneous to the institution.
The expiration policy of removing outstanding orders older than 8 clock ticks is implemented to ensure that agents with bad bids or asks are not locked out of trading for the rest of the period. Because an agent with an open order may not place a new one, an order at an unreasonable price would otherwise block that agent until the period ends. At the very beginning of the experiment, when there is little market activity to learn from, some orders may be placed at unreasonable prices. The idea is that these agents should be allowed to revise their orders later, rather than being punished for a bad guess throughout the rest of the trading period.
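A minimal sketch of the expiration step is shown below. The function name and the dictionary layout of an order are hypothetical, and the sketch assumes that an order's age is the current tick minus the tick on which it was placed.

```python
MAX_ORDER_AGE = 8  # orders older than this many clock ticks expire (from the text)

def expire_orders(book, current_tick):
    """Return the outstanding orders that are still valid at `current_tick`.

    Each order is a dict with at least a "tick" key holding the clock tick
    on which it was placed. Orders with age greater than MAX_ORDER_AGE are dropped.
    """
    return [o for o in book if current_tick - o["tick"] <= MAX_ORDER_AGE]
```

Once an agent's order has been dropped this way, the agent again satisfies the "no open orders" condition and can submit a revised price.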
It should also be noted that the simulator that captures this market structure also has the capacity to enforce the spread reduction rule. The spread reduction rule states that new orders must improve on existing orders. That is, new bids may only be submitted at prices higher than the current outstanding bid. Similarly, new asks may only be submitted at prices lower than the current outstanding ask.
(Das and Tesauro, 2000)
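The spread reduction rule described above can be sketched as a simple acceptance check. The function name and signature are hypothetical, and the sketch assumes that any price is acceptable when no order is outstanding on the same side, a case the text does not address.

```python
def satisfies_spread_reduction(side, price, best_bid=None, best_ask=None):
    """Check whether a new order improves on the outstanding order on its side.

    side     -- "bid" or "ask"
    price    -- price of the new order
    best_bid -- current outstanding bid, or None if there is none
    best_ask -- current outstanding ask, or None if there is none
    """
    if side == "bid":
        # A new bid must be strictly higher than the outstanding bid.
        return best_bid is None or price > best_bid
    if side == "ask":
        # A new ask must be strictly lower than the outstanding ask.
        return best_ask is None or price < best_ask
    raise ValueError(f"unknown side: {side!r}")
```

With an outstanding bid of 150 and an outstanding ask of 200, a new bid at 160 would be accepted and a new bid at 150 would be rejected.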