This video shows the learning process of a pure-exploitation system. The robot repeatedly gets trapped in dead loops and requires human intervention (to move the robot out of the loop and restart learning). The learned rule base contains a large number of blank rules and cannot be used for collision-free navigation.
Reinforcement Learning (RL) is promising for learning obstacle avoidance. In this research, a fuzzy system is used for obstacle avoidance and its fuzzy rules are tuned by RL. One problem in such a system is the conflict between exploration (i.e., the desire to explore the environment so as to improve the rule base) and exploitation (i.e., the desire to use the rule base already learned). Existing methods are pure-exploitation methods and may result in an insufficiently learned rule base. To overcome this drawback while keeping the learning efficient, a learning mechanism with stochastic perturbation is proposed to maintain a tradeoff between exploration and exploitation. Such a learning system provides enough exploration strength to allow sufficient learning of each rule while the learning still converges, as sketched below.
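The following is a minimal sketch of this idea, not the authors' exact algorithm: a fuzzy rule base maps a discretized sensor state to a steering command, RL tunes the rule consequents, and a zero-mean stochastic perturbation is added to the exploited action so that every rule keeps receiving exploratory updates. The learning rate, perturbation strength, decay factor, and reward values are all illustrative assumptions.

```python
import random

ALPHA = 0.1      # learning rate (assumed value)
SIGMA_0 = 0.3    # initial perturbation strength (assumed value)
DECAY = 0.999    # per-step decay of the perturbation (assumed value)

rules = {}       # state -> consequent steering value; "blank" rules are absent
sigma = SIGMA_0

def select_action(state):
    """Exploit the current rule, then perturb it to force exploration."""
    base = rules.get(state, 0.0)            # blank rules default to 0 steering
    return base + random.gauss(0.0, sigma)  # stochastic perturbation

def update_rule(state, action, reward):
    """Reinforce the perturbed action in proportion to the received reward."""
    base = rules.get(state, 0.0)
    # Move the consequent toward the perturbed action on positive reward,
    # away from it when the robot is punished (e.g., a near-collision).
    rules[state] = base + ALPHA * reward * (action - base)

# One learning step: sense, act with perturbation, observe reward, update.
state = (2, 0, 1)             # e.g., discretized left/front/right range readings
action = select_action(state)
reward = 1.0                  # assumed: +1 for safe motion, negative on collision
update_rule(state, action, reward)
sigma *= DECAY                # shrink the exploration so learning converges
```

Because the perturbation magnitude decays over time, the controller explores heavily early on (filling in blank rules) and gradually reverts to pure exploitation of the tuned rule base.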
The proposed learning method
This video shows the learning process of the proposed system. The stochastic perturbation prevents the robot from becoming trapped in loops, so the learning requires no human intervention.
The learned rule base is then used as an obstacle avoidance behavior and is integrated with a behavior-based navigator, as sketched below. The navigator successfully guides the robot. Click here to see a video of the navigation.
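A minimal sketch of one possible integration, assuming a simple priority-based arbiter in which the learned obstacle-avoidance behavior overrides goal seeking whenever an obstacle is close. The names (`goal_seek`, `navigate`) and the 0.5 m threshold are illustrative assumptions, not the authors' actual architecture.

```python
rules = {}       # RL-tuned fuzzy rule base: state -> steering command
SAFE_DIST = 0.5  # metres; assumed range at which avoidance takes over

def obstacle_avoidance(state):
    """Steering command from the learned obstacle-avoidance behavior."""
    return rules.get(state, 0.0)

def goal_seek(heading_error):
    """Proportional steering toward the goal (illustrative behavior)."""
    return 0.8 * heading_error

def navigate(state, min_range, heading_error):
    """Arbitrate: avoidance has priority near obstacles, goal seeking otherwise."""
    if min_range < SAFE_DIST:
        return obstacle_avoidance(state)
    return goal_seek(heading_error)

# Example step: an obstacle 0.3 m away triggers the avoidance behavior.
steer = navigate(state=(2, 0, 1), min_range=0.3, heading_error=0.2)
```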