The goal of this project was to program a robot battle tank to fight other robots on the interactive virtual Robocode battlefield in Java. I applied various Artificial Intelligence (AI) concepts to give my battle tank intelligence: reinforcement learning (the Q-learning algorithm) and a neural network trained with back-propagation were implemented on my agent. Progressive learning was observed in my robot, evaluated by the number of battle rounds won and by its survival time. It was a fun project!
The images below show my robot, Barfi, battling the enemy robot, Tracker, on the Robocode platform and successfully beating it. The game can also be extended to multiple robots fighting on the battlefield.
Reinforcement learning, here the Q-learning algorithm, is used whenever there is uncertainty about the transition model of the environment. In this type of learning, we have sequences of actions and state transitions, with rewards attached at some points along those sequences, and from these we try to learn the optimal policy. Reinforcement learning guided my agent towards the optimal policy even though the agent knew nothing about the rewards when starting out.
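For reference, the tabular form of the Q-learning update I relied on looks roughly like the sketch below. The class and field names (QTable, alpha, gamma) and the table sizes are illustrative choices, not the exact values used by Barfi.

```java
/** Minimal sketch of tabular Q-learning; sizes and hyperparameters are illustrative. */
public class QTable {
    private static final int NUM_STATES = 64;   // assumed size after state-space reduction
    private static final int NUM_ACTIONS = 6;   // assumed number of discrete actions
    private final double[][] q = new double[NUM_STATES][NUM_ACTIONS];
    private final double alpha = 0.1;  // learning rate
    private final double gamma = 0.9;  // discount factor

    /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
    public void update(int s, int a, double reward, int sNext) {
        double maxNext = q[sNext][0];
        for (int i = 1; i < NUM_ACTIONS; i++) {
            maxNext = Math.max(maxNext, q[sNext][i]);
        }
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }

    /** Greedy action for a state (exploration, e.g. epsilon-greedy, omitted here). */
    public int bestAction(int s) {
        int best = 0;
        for (int i = 1; i < NUM_ACTIONS; i++) {
            if (q[s][i] > q[s][best]) best = i;
        }
        return best;
    }
}
```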




Initially, a look-up table (LUT) was used to store the Q-values for state-action pairs.
State parameters obtained from the Robocode environment:
- Game time of current round
- X, Y position of the robot
- Distance to the enemy robot
- Relative energy
Actions defined for my robot:
- Move towards/away from the enemy robot
- Fire/not fire
- Power level of the bullet fire
Rewards and penalties defined for my robot:
- Reward if the robot wins the battle
- Penalty if the robot loses the battle
- Penalty when the robot collides with another robot
- Penalty if the robot hits a wall of the battlefield
- Reward if the robot's bullet hits the enemy robot
- Penalty if the robot's bullet misses the enemy
- Penalty if the robot is hit by the enemy's bullet
The following metrics were used to track learning progress:
- Number of rounds of battles that my robot wins against the enemy robot
- Survival time of my robot in the battlefield
- Points/rewards won by my robot
The look-up table was defined in a Java class, and the various parameters were stored in matrices. I used several functions from the Robocode library (by inheriting from AdvancedRobot), such as getBulletHitEvents(), setFireBullet(), getVelocity(), and getEnergy().
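A rough sketch of how such a robot can be wired together is shown below, assuming a QTable class like the one sketched earlier. The event handlers, reward values, and state discretisation here are illustrative and simplified compared to Barfi's actual implementation.

```java
import robocode.AdvancedRobot;
import robocode.BulletHitEvent;
import robocode.BulletMissedEvent;
import robocode.HitByBulletEvent;
import robocode.HitWallEvent;
import robocode.ScannedRobotEvent;

/** Skeleton of an AdvancedRobot that learns from a look-up table (illustrative only). */
public class LearningTank extends AdvancedRobot {
    private final QTable q = new QTable();   // tabular Q-values (see sketch above)
    private int state, action;
    private double reward;

    @Override
    public void run() {
        while (true) {
            setTurnRadarRight(360);          // keep scanning for the enemy
            execute();
        }
    }

    @Override
    public void onScannedRobot(ScannedRobotEvent e) {
        // Learn from the outcome of the previous action
        int nextState = discretise(e.getDistance(), getEnergy() - e.getEnergy());
        q.update(state, action, reward, nextState);
        reward = 0;

        // Choose and perform the next action (move or fire with some power level)
        state = nextState;
        action = q.bestAction(state);
        if (action == 0) setAhead(100);            // move towards the enemy
        else if (action == 1) setBack(100);        // move away
        else setFireBullet(1 + action % 3);        // fire with power 1..3
    }

    // Rewards and penalties, as listed above (values are illustrative)
    @Override public void onBulletHit(BulletHitEvent e)      { reward += 5; }
    @Override public void onBulletMissed(BulletMissedEvent e) { reward -= 1; }
    @Override public void onHitByBullet(HitByBulletEvent e)  { reward -= 3; }
    @Override public void onHitWall(HitWallEvent e)          { reward -= 2; }

    /** Map continuous measurements to a small number of discrete states. */
    private int discretise(double distance, double relativeEnergy) {
        int d = (int) Math.min(distance / 200, 3);   // 4 distance buckets
        int en = relativeEnergy >= 0 ? 1 : 0;        // ahead or behind on energy
        return d * 2 + en;
    }
}
```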
Next, I replaced the LUT with a neural network and applied the error back-propagation method.
Three state parameters were used:
- Relative energy of the robot
- Distance to enemy robot
- Gun heat
Three action parameters were used:
- Move towards/away from the enemy robot
- Fire/not fire
- Power level of the bullet
The neural-network-based approach was found to be better than the LUT approach: learning was more stable and convergence occurred at an earlier stage. Since the look-up table can grow very large depending on the number of state-action pairs, state-space reduction was required; with the neural network, there was no such constraint. Hence, using neural networks yielded more accurate results.
Java code for the neural network implementation of the XOR training set can be found here.
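As an illustration of the back-propagation method, here is a minimal sketch of a small network trained on the XOR patterns; the layer sizes, learning rate, and epoch count are my own illustrative choices rather than the linked code. For the robot itself, the same structure, presumably with six inputs (the three state and three action parameters) and a single Q-value output, replaced the look-up table.

```java
import java.util.Random;

/** Minimal two-layer back-propagation network trained on XOR (sketch only). */
public class XorBackprop {
    static final int IN = 2, HID = 4, EPOCHS = 20000;
    static final double RATE = 0.5;
    static final Random rnd = new Random(42);

    // weights, with one extra slot per layer for the bias input
    static double[][] wHidden = new double[HID][IN + 1];
    static double[] wOut = new double[HID + 1];

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double[] hiddenOut(double[] in) {
        double[] hidden = new double[HID];
        for (int h = 0; h < HID; h++) {
            double sum = wHidden[h][IN];                     // bias
            for (int i = 0; i < IN; i++) sum += wHidden[h][i] * in[i];
            hidden[h] = sigmoid(sum);
        }
        return hidden;
    }

    static double output(double[] hidden) {
        double out = wOut[HID];                              // bias
        for (int h = 0; h < HID; h++) out += wOut[h] * hidden[h];
        return sigmoid(out);
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 1, 1, 0};

        // small random initial weights
        for (double[] row : wHidden)
            for (int i = 0; i < row.length; i++) row[i] = rnd.nextDouble() - 0.5;
        for (int i = 0; i < wOut.length; i++) wOut[i] = rnd.nextDouble() - 0.5;

        for (int epoch = 0; epoch < EPOCHS; epoch++) {
            for (int p = 0; p < inputs.length; p++) {
                // forward pass
                double[] hidden = hiddenOut(inputs[p]);
                double out = output(hidden);

                // backward pass: delta rule for squared error with sigmoid units
                double deltaOut = (targets[p] - out) * out * (1 - out);
                for (int h = 0; h < HID; h++) {
                    double deltaHid = deltaOut * wOut[h] * hidden[h] * (1 - hidden[h]);
                    wHidden[h][IN] += RATE * deltaHid;       // hidden bias weight
                    for (int i = 0; i < IN; i++) wHidden[h][i] += RATE * deltaHid * inputs[p][i];
                    wOut[h] += RATE * deltaOut * hidden[h];
                }
                wOut[HID] += RATE * deltaOut;                // output bias weight
            }
        }

        // report the trained network's output for each XOR pattern
        for (double[] in : inputs) {
            System.out.printf("%.0f XOR %.0f -> %.3f%n", in[0], in[1], output(hiddenOut(in)));
        }
    }
}
```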