Marc Métivier

Research

Reinforcement Learning and Imitation with Learning Classifier Systems.

The Reinforcement Learning framework considers adaptive agents that interact with an environment and must adapt their behavior to maximize an objective function specific to that environment. Such an agent is a decision system. It interacts with the environment by performing actions, and the sequence of these actions defines its behavior. The information it has for choosing actions is the reinforcement: an occasional reward, received as a consequence of certain actions, that evaluates the relevance of its recent decisions. The agent's aim is then to find the optimal behavior: the one whose actions maximize long-term reinforcement.
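The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corridor environment, the two policies, and the discount factor `gamma` are hypothetical choices made for the example, and discounting is one common way (not the only one) to formalize "long-term reinforcement".

```python
def run_episode(policy, length=5, max_steps=20):
    """Run one episode in a toy corridor and return the rewards received.

    The agent starts at position 0 and is rewarded only on reaching
    position `length` (a hypothetical environment for illustration).
    """
    position = 0
    rewards = []
    for _ in range(max_steps):
        action = policy(position)              # +1 moves right, -1 moves left
        position = max(0, position + action)   # the corridor has a wall at 0
        reached_goal = position == length
        rewards.append(1.0 if reached_goal else 0.0)  # occasional reward
        if reached_goal:
            break
    return rewards


def discounted_return(rewards, gamma=0.9):
    """Long-term reinforcement as a discounted sum (gamma is an assumption)."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def always_right(position):
    """Behavior that heads straight for the goal."""
    return 1


def dithering(position):
    """Behavior that oscillates between positions 0 and 1."""
    return 1 if position % 2 == 0 else -1


if __name__ == "__main__":
    for policy in (always_right, dithering):
        rewards = run_episode(policy)
        print(policy.__name__, discounted_return(rewards))
```

Here the behavior that reaches the goal obtains a positive return, while the oscillating behavior never receives a reward. A learning agent would have to discover the first behavior from the rewards alone.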

Among reinforcement learning methods, I am particularly interested in classifier systems. These are rule-based systems initially developed in the field of evolutionary methods. In these systems, perceptions of the environment are associated with actions in the form of condition-action rules, called classifiers, using mechanisms that enable generalization across regularities in perceptions. Their general principle lies in generating and manipulating rules induced by interactions with the environment, and in using different mechanisms, often stochastic, to explore the set of classifiers. In particular, the classic search method treats the set of rules as a population of individuals and applies a genetic algorithm (Goldberg 1989) to this population. In this context, the fitness of a rule is based on the rewards that the rule makes it possible to obtain from the environment.
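To make the notion of a classifier more concrete, here is a minimal Python sketch of condition-action rules with generalization. It assumes the common convention in which perceptions are bit strings and a `#` symbol in a condition means "don't care". The class name, the example population, and the greedy choice of the fittest matching rule are illustrative assumptions, and the genetic algorithm and the fitness updates are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str   # one symbol per perception bit: '0', '1' or '#'
    action: int
    fitness: float   # in a real system, derived from obtained rewards

    def matches(self, perception: str) -> bool:
        """A '#' matches any bit, so one rule can cover several perceptions."""
        return len(self.condition) == len(perception) and all(
            c == "#" or c == p for c, p in zip(self.condition, perception)
        )


def match_set(population, perception):
    """All classifiers whose condition matches the current perception."""
    return [cl for cl in population if cl.matches(perception)]


def best_action(population, perception):
    """Action of the fittest matching classifier, or None if nothing matches."""
    matching = match_set(population, perception)
    if not matching:
        return None
    return max(matching, key=lambda cl: cl.fitness).action


# Hypothetical population used only for this example.
population = [
    Classifier("1#0", action=1, fitness=0.8),  # covers "100" and "110"
    Classifier("0##", action=0, fitness=0.5),  # covers every "0.." perception
    Classifier("110", action=2, fitness=0.3),  # fully specific rule
]

if __name__ == "__main__":
    for perception in ("110", "011", "111"):
        print(perception, best_action(population, perception))
```

The rule `1#0` shows the generalization mechanism: it applies to both `100` and `110`, so a single classifier captures a regularity shared by several perceptions.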

My second research interest is imitation for reinforcement learning. Imitation may be seen as a way to improve learning. In reinforcement learning, it may allow an agent to improve its ability to maximize a reward function by observing another agent's behavior. Any imitation process presupposes an observer agent and one or more mentor agents. The mentor is the agent whose behavior is to be reproduced. The observer is the agent trying to learn some elements of the mentor's behavior.

In my thesis, I studied the imitation mechanisms by which an agent controlled by a classifier system can improve its behavior by observing the behavior of a mentor. A specific feature of this work was to consider that mentors are not teachers. More precisely, I considered that a mentor may not be able, or may not want, to adapt its behavior to help the observer learn. I also considered mentors whose objectives may differ from the observer's. The hypothesis of this work is that, in many cases, observing the behavior of such a mentor can nevertheless provide information useful to the observer. Several methods were studied and tested in different classic environments used in the classifier system field. These studies used the three classifier systems currently regarded as the reference systems for the three major families of classifier systems: ZCS for strength-based systems, XCS for accuracy-based systems, and ACS for anticipatory classifier systems. The experiments highlighted the importance of using a behavioral model of the mentor, and of taking this model into account through a specific internal action.