My project this summer deals with reinforcement learning agents. These agents learn by trial-and-error, collecting rewards
or penalties, and subsequently choosing whichever action seems best according to past experience. A certain percentage of
the time they explore instead, picking an action they believe to be suboptimal but which may lead to something better in the end.
We are interested in coming up with more sophisticated strategies for action selection: for example, some strategies
might involve selecting actions that have not been tried in a long time, or avoiding actions that have proven disastrous
in past experience. I will be looking at various heuristics of this sort and comparing them, mainly in simple
navigation tasks in a grid world.
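
To make the idea concrete, here is a minimal Python sketch of what such an action-selection rule might look like. It is an illustration under stated assumptions, not the method used in the project: the function name select_action, the parameters recency_weight and disaster_threshold, and the linear recency bonus are all hypothetical choices made for this example.

```python
import random


def select_action(q_values, last_tried, step, epsilon=0.1,
                  recency_weight=0.0, disaster_threshold=None, rng=random):
    """Return the index of the action to take (illustrative sketch only).

    q_values           -- estimated value of each action from past experience
    last_tried         -- time step at which each action was last taken
    step               -- current time step
    epsilon            -- probability of exploring with a random action
    recency_weight     -- hypothetical bonus per step since an action was tried
    disaster_threshold -- hypothetical cutoff; actions valued at or below it
                          are avoided whenever a safer action exists
    """
    actions = list(range(len(q_values)))

    # Heuristic 1: avoid actions that have proven disastrous in the past.
    if disaster_threshold is not None:
        safe = [a for a in actions if q_values[a] > disaster_threshold]
        if safe:  # fall back to all actions if none is considered safe
            actions = safe

    # Basic exploration: with probability epsilon, pick a random action.
    if rng.random() < epsilon:
        return rng.choice(actions)

    # Heuristic 2: favor actions that have not been tried in a long time
    # by adding a bonus proportional to the time since they were last taken.
    def score(a):
        return q_values[a] + recency_weight * (step - last_tried[a])

    return max(actions, key=score)
```

With recency_weight=0 and disaster_threshold=None this reduces to the plain explore-or-exploit rule described above; turning on either parameter gives one of the heuristics to be compared.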