Autonomous Abstraction Of Policies Based On Policy Homomorphism

Date

2009-09-16T18:20:47Z

Publisher

Computer Science & Engineering

Abstract

A lifelong learning agent operating in a complex and dynamic environment needs the ability to learn increasingly complex tasks over time. Over its lifetime, such an agent has to learn new tasks, adapt the policies of tasks it has already learned, and extract and reuse the knowledge gained to learn new, more complex tasks. To do this, it needs methods that allow it to autonomously extract knowledge from already learned policy instances and reuse that knowledge to learn related tasks in novel environments.
This dissertation presents a novel approach that enables an agent to autonomously abstract reusable skills and concepts from policy instances of a similar task type and to use the resulting abstractions to learn related tasks in novel situations. To achieve this, the work formalizes a novel notion of policy homomorphism, which allows autonomous extraction of general policies for task types. Each extracted general policy is an abstract policy that is homomorphic to the set of specific policy instances of the task type from which it is derived. It is made up of abstract states, which identify the situations in which the policy is applicable, and abstract actions, which identify the actions to be performed in those situations. Once extracted, the general policies are reused in new contexts to address related tasks by adding them as higher-level actions that the agent can choose to perform. To abstract a policy for a given task type from a set of policies, the agent must first identify and categorize the policies for various tasks into different task types. To achieve this, the policy generalization approach presented here employs a utility-based criterion that enables the agent to autonomously categorize and generalize a set of situation-specific policies of different task types into a set of general policies, one for each identified task type, using the policy homomorphism framework.
To demonstrate the policy generalization method, we show the abstraction of a general policy for a specific task type using two sets of policies of different task types in a grid world domain, and further show how the abstracted general policies can be used to learn related tasks in novel grid world environments. To demonstrate the utility-based criterion for identifying task types and autonomously abstracting general policies for them, we show the abstraction of general policies, using the utility criterion, from a set of situation-specific policies of different task types in a grid world domain.
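
The abstract does not give the formal definition of policy homomorphism, so the following is only an illustrative sketch of the general idea, not the dissertation's method. It assumes that a policy is a mapping from states to actions, and that a hand-written state mapping (the hypothetical `goal_direction`) sends each concrete state to an abstract state. Under that assumption, a set of policy instances admits a general policy when every concrete state that maps to the same abstract state prescribes the same abstract action. All names here (`abstract_policy`, `goal_direction`, the one-dimensional corridor) are invented for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional

Policy = Dict[Hashable, Hashable]


def abstract_policy(
    policies: Iterable[Policy],
    state_map: Callable[[Hashable], Hashable],
    action_map: Callable[[Hashable], Hashable] = lambda a: a,
) -> Optional[Policy]:
    """Merge policy instances into one abstract policy.

    Returns None if two concrete states that share an abstract state
    prescribe different abstract actions, i.e. the assumed mappings do
    not preserve the structure of the policies.
    """
    general: Policy = {}
    for policy in policies:
        for state, action in policy.items():
            a_state = state_map(state)
            a_action = action_map(action)
            # setdefault stores the first action seen for this abstract state
            # and returns the stored one on later visits.
            if general.setdefault(a_state, a_action) != a_action:
                return None
    return general


def goal_direction(state):
    """Hypothetical state abstraction: where the goal lies relative to the agent."""
    position, goal = state
    if goal > position:
        return "goal_right"
    if goal < position:
        return "goal_left"
    return "at_goal"


# Two situation-specific policies for a "reach the goal" task in a
# one-dimensional corridor with cells 0..3; states are (position, goal).
policy_a = {(p, 3): "right" for p in range(3)}     # goal at the right end
policy_b = {(p, 0): "left" for p in range(1, 4)}   # goal at the left end

general = abstract_policy([policy_a, policy_b], goal_direction)
print(general)
```

In this toy setting both instances collapse to a single two-entry policy keyed by the direction of the goal, which is the sense in which a general policy could be reused in a corridor of a different size. How the dissertation actually constructs the state and action mappings, and how the utility-based criterion groups policies into task types, is not specified in this abstract.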