Control of a nonlinear multivariable system with adaptive critic designs

Visnevski, Nikita A.

Date

1997-05

Authors

Visnevski, Nikita A.

Publisher

Texas Tech University

Abstract

A family of Adaptive Critic Designs (ACD) was proposed by Werbos (1992) as a new optimization technique combining concepts of reinforcement learning and backpropagation. The goal of each design is to approximate the cost-to-go function from the Bellman equation of dynamic programming, or some function related to it, and then to find the optimal solution of the problem by applying a reinforcement learning technique. An ACD uses two networks, called critic and action ("action" is the ACD literature's name for the controller): the critic supplies the approximation of the cost-to-go, and the action network tries to minimize it. There are three basic implementations of ACD: Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and Globalized Dual Heuristic Programming (GDHP) (Prokhorov & Wunsch, to appear).
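
For orientation, the cost-to-go and the quantities the three designs approximate can be written out as below. This is a sketch in the notation commonly used in the ACD literature, not notation taken from this record: the symbols $U$ (one-step utility or cost), $\gamma$ (discount factor), $x(t)$ (plant state), and the hatted critic outputs are assumptions.

```latex
% Assumed notation: U = one-step utility (cost), \gamma = discount factor,
% x(t) = plant state at time t. None of these symbols appear in the abstract.
\begin{align}
  J(t) &= \sum_{k=0}^{\infty} \gamma^{k}\, U(t+k), \qquad 0 < \gamma < 1
        && \text{(cost-to-go)} \\
  J(t) &= U(t) + \gamma\, J(t+1)
        && \text{(Bellman recursion)}
\end{align}

% What the critic network is commonly trained to output in each design:
\begin{align}
  \text{HDP:}  \quad & \hat{J}\bigl(x(t)\bigr) \approx J(t) \\
  \text{DHP:}  \quad & \hat{\lambda}\bigl(x(t)\bigr) \approx
                       \frac{\partial J(t)}{\partial x(t)} \\
  \text{GDHP:} \quad & \hat{J}\bigl(x(t)\bigr) \ \text{and} \
                       \frac{\partial \hat{J}(t)}{\partial x(t)}
                       \ \text{together}
\end{align}
```

In this reading, "some function related to" the cost-to-go corresponds to its derivatives with respect to the state, which is what DHP and GDHP work with.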