|
Description:
|
A family of Adaptive Critic Designs (ACD) was proposed by Werbos (1992) as a new optimization technique combining concepts of reinforcement learning and backpropagation. The goal of each design is to approximate the cost-to-go function from the Bellman equation of dynamic programming, or some function related to it, and then to find the optimal solution of the problem by applying a reinforcement learning technique. An ACD uses two networks, called critic and action ("action" being the ACD literature's substitute name for "controller"); the critic supplies the approximation of the cost-to-go, and the action network is trained to minimize it. There are three basic implementations of ACD: Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and Globalized Dual Heuristic Programming (GDHP) (Prokhorov & Wunsch, to appear). |
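
To make the critic's role concrete, the following is a minimal sketch of HDP-style critic training, not the authors' implementation. It relies on the Bellman recursion J(t) = U(t) + gamma * J(t+1), where U is the one-step cost and gamma a discount factor. Everything else is an assumption chosen for illustration: a scalar linear plant x(t+1) = a*x(t) + b*u(t), a fixed linear action rule u = -k*x standing in for the action network, a quadratic cost U = x^2 + r*u^2, a one-parameter critic J(x) = w*x^2, and a semi-gradient update in which the target U(t) + gamma * J(t+1) is held fixed. The function names and parameter values are hypothetical. Training the action network itself is not shown.

```python
# Hypothetical HDP-style critic sketch: all names and constants are illustrative.

def train_hdp_critic(a=0.9, b=0.5, k=0.4, r=0.1, gamma=0.95, lr=0.1, epochs=500):
    """Fit the critic weight w in J(x) = w * x**2 for a fixed action rule u = -k*x."""
    w = 0.0
    # Deterministic grid of training states in [-1, 1] (an assumption, not from the source).
    states = [i / 10.0 for i in range(-10, 11)]
    for _ in range(epochs):
        for x in states:
            u = -k * x                       # fixed stand-in for the action network
            x_next = a * x + b * u           # assumed plant model
            utility = x * x + r * u * u      # one-step cost U(t)
            # Bellman target U(t) + gamma * J(t+1), treated as a constant.
            target = utility + gamma * w * x_next * x_next
            error = w * x * x - target       # critic error J(t) - target
            w -= lr * error * x * x          # gradient step on 0.5 * error**2 w.r.t. w
    return w


def exact_weight(a=0.9, b=0.5, k=0.4, r=0.1, gamma=0.95):
    """Closed-form solution of w = (1 + r*k^2) + gamma * w * (a - b*k)^2."""
    c = a - b * k
    return (1.0 + r * k * k) / (1.0 - gamma * c * c)


if __name__ == "__main__":
    print("learned w:", train_hdp_critic())
    print("exact   w:", exact_weight())
```

Because the critic here is linear in its single parameter and the closed loop is stable, the learned weight can be checked against the closed-form value. DHP would instead train the critic to approximate the derivative of J with respect to the state, and GDHP would approximate both J and its derivative.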