The cooperative Q-learning approach allows multiple learners to learn independently and then share their Q-values with each other using a Q-value sharing strategy. A main problem with this approach is that the learners' solutions may not converge to optimality, because the optimal Q-values may not be found. Another problem is that some cooperative algorithms perform very well on single-task problems but quite poorly on multi-task problems. This paper proposes a new cooperative Q-learning algorithm, called the Bat Q-learning algorithm (BQ-learning), that implements a Q-value sharing strategy based on the Bat algorithm. The Bat algorithm is a powerful optimization algorithm that increases the possibility of finding the optimal Q-values by balancing the exploration and exploitation of actions through tuning of its parameters. The BQ-learning algorithm was tested on two problems: the shortest path problem (a single-task problem) and the taxi problem (a multi-task problem). The experimental results suggest that BQ-learning performs better than single-agent Q-learning and some well-known cooperative Q-learning algorithms.
Keywords: Q-learning, Bat Algorithm, Optimization, Cooperative Reinforcement
