The goal of this project was to program a robot battle tank to fight other robots on the interactive virtual Robocode battlefield in Java. I applied various Artificial Intelligence (AI) concepts to give my battle tank intelligence: reinforcement learning (the Q-learning algorithm) and a neural network trained with back-propagation were implemented on my agent. Progressive learning was observed in my robot, evaluated by the number of battle rounds won and by its survival time. It was a fun project!
The images below show my robot, Barfi, battling the enemy robot, Tracker, on the Robocode platform and successfully beating it. The game can also be extended to multiple robots fighting on the battlefield.
Reinforcement learning, here the Q-learning algorithm, is used whenever there is uncertainty about the transition model of the environment. In this type of learning, we have sequences of actions and state transitions, with rewards attached at some points along those sequences, and from these we try to learn the optimal policy. Reinforcement learning guided my agent towards the optimal policy even though the agent knew nothing about the rewards when starting out.
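For reference, the tabular form of the Q-learning update I relied on looks roughly like the sketch below. The class and field names (QTable, alpha, gamma) and the table sizes are illustrative choices, not the exact values used by Barfi.

```java
/** Minimal sketch of tabular Q-learning; sizes and hyperparameters are illustrative. */
public class QTable {
    private static final int NUM_STATES = 64;   // assumed size after state-space reduction
    private static final int NUM_ACTIONS = 6;   // assumed number of discrete actions
    private final double[][] q = new double[NUM_STATES][NUM_ACTIONS];
    private final double alpha = 0.1;  // learning rate
    private final double gamma = 0.9;  // discount factor

    /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
    public void update(int s, int a, double reward, int sNext) {
        double maxNext = q[sNext][0];
        for (int i = 1; i < NUM_ACTIONS; i++) {
            maxNext = Math.max(maxNext, q[sNext][i]);
        }
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }

    /** Greedy action for a state (exploration, e.g. epsilon-greedy, omitted here). */
    public int bestAction(int s) {
        int best = 0;
        for (int i = 1; i < NUM_ACTIONS; i++) {
            if (q[s][i] > q[s][best]) best = i;
        }
        return best;
    }
}
```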




Initially, a look-up table (LUT) was used to store the Q-values for state-action pairs.
State parameters obtained from the Robocode environment:
- Game time of current round
- X, Y position of the robot
- Distance to the enemy robot
- Relative energy
Actions defined for my robot:
- Move towards/away from the enemy robot
- Fire/not fire
- Power level of the bullet fire
Rewards and penalties defined for my robot:
- Reward if the robot wins the battle
- Penalty if the robot loses the battle
- Penalty when the robot collides with another robot
- Penalty if the robot hits a wall of the battlefield
- Reward if the robot's bullet hits the enemy robot
- Penalty if the robot's bullet misses the enemy
- Penalty if the robot is hit by the enemy's bullet
The following metrics were used to track learning progress:
- Number of rounds of battles that my robot wins against the enemy robot
- Survival time of my robot in the battlefield
- Points/rewards won by my robot
The look-up table was defined in a Java class, and the various parameters were stored in matrices. I used several functions from the Robocode library (by inheriting from AdvancedRobot), such as getBulletHitEvents(), setFireBullet(), getVelocity(), and getEnergy().
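A rough sketch of how such a robot can be wired together is shown below, assuming a QTable class like the one sketched earlier. The event handlers, reward values, and state discretisation here are illustrative and simplified compared to Barfi's actual implementation.

```java
import robocode.AdvancedRobot;
import robocode.BulletHitEvent;
import robocode.BulletMissedEvent;
import robocode.HitByBulletEvent;
import robocode.HitWallEvent;
import robocode.ScannedRobotEvent;

/** Skeleton of an AdvancedRobot that learns from a look-up table (illustrative only). */
public class LearningTank extends AdvancedRobot {
    private final QTable q = new QTable();   // tabular Q-values (see sketch above)
    private int state, action;
    private double reward;

    @Override
    public void run() {
        while (true) {
            setTurnRadarRight(360);          // keep scanning for the enemy
            execute();
        }
    }

    @Override
    public void onScannedRobot(ScannedRobotEvent e) {
        // Learn from the outcome of the previous action
        int nextState = discretise(e.getDistance(), getEnergy() - e.getEnergy());
        q.update(state, action, reward, nextState);
        reward = 0;

        // Choose and perform the next action (move or fire with some power level)
        state = nextState;
        action = q.bestAction(state);
        if (action == 0) setAhead(100);            // move towards the enemy
        else if (action == 1) setBack(100);        // move away
        else setFireBullet(1 + action % 3);        // fire with power 1..3
    }

    // Rewards and penalties, as listed above (values are illustrative)
    @Override public void onBulletHit(BulletHitEvent e)      { reward += 5; }
    @Override public void onBulletMissed(BulletMissedEvent e) { reward -= 1; }
    @Override public void onHitByBullet(HitByBulletEvent e)  { reward -= 3; }
    @Override public void onHitWall(HitWallEvent e)          { reward -= 2; }

    /** Map continuous measurements to a small number of discrete states. */
    private int discretise(double distance, double relativeEnergy) {
        int d = (int) Math.min(distance / 200, 3);   // 4 distance buckets
        int en = relativeEnergy >= 0 ? 1 : 0;        // ahead or behind on energy
        return d * 2 + en;
    }
}
```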
Next, I replaced the LUT with a neural network and applied the error back-propagation method.
Three state parameters were used:
- Relative energy of the robot
- Distance to enemy robot
- Gun heat
Three action parameters were used:
- Move towards/away from the enemy robot
- Fire/not fire
- Power level of the bullet
The neural-network-based approach was found to be better than the LUT approach: learning was more stable and convergence occurred at an earlier stage. Since the look-up table can grow very large depending on the number of state-action pairs, state-space reduction was required; with the neural network, there was no such constraint. Hence, using neural networks yielded more accurate results.
Java code for the neural network implementation of the XOR training set can be found here.
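As an illustration of the back-propagation method, here is a minimal sketch of a small network trained on the XOR patterns; the layer sizes, learning rate, and epoch count are my own illustrative choices rather than the linked code. For the robot itself, the same structure, presumably with six inputs (the three state and three action parameters) and a single Q-value output, replaced the look-up table.

```java
import java.util.Random;

/** Minimal two-layer back-propagation network trained on XOR (sketch only). */
public class XorBackprop {
    static final int IN = 2, HID = 4, EPOCHS = 20000;
    static final double RATE = 0.5;
    static final Random rnd = new Random(42);

    // weights, with one extra slot per layer for the bias input
    static double[][] wHidden = new double[HID][IN + 1];
    static double[] wOut = new double[HID + 1];

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double[] hiddenOut(double[] in) {
        double[] hidden = new double[HID];
        for (int h = 0; h < HID; h++) {
            double sum = wHidden[h][IN];                     // bias
            for (int i = 0; i < IN; i++) sum += wHidden[h][i] * in[i];
            hidden[h] = sigmoid(sum);
        }
        return hidden;
    }

    static double output(double[] hidden) {
        double out = wOut[HID];                              // bias
        for (int h = 0; h < HID; h++) out += wOut[h] * hidden[h];
        return sigmoid(out);
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 1, 1, 0};

        // small random initial weights
        for (double[] row : wHidden)
            for (int i = 0; i < row.length; i++) row[i] = rnd.nextDouble() - 0.5;
        for (int i = 0; i < wOut.length; i++) wOut[i] = rnd.nextDouble() - 0.5;

        for (int epoch = 0; epoch < EPOCHS; epoch++) {
            for (int p = 0; p < inputs.length; p++) {
                // forward pass
                double[] hidden = hiddenOut(inputs[p]);
                double out = output(hidden);

                // backward pass: delta rule for squared error with sigmoid units
                double deltaOut = (targets[p] - out) * out * (1 - out);
                for (int h = 0; h < HID; h++) {
                    double deltaHid = deltaOut * wOut[h] * hidden[h] * (1 - hidden[h]);
                    wHidden[h][IN] += RATE * deltaHid;       // hidden bias weight
                    for (int i = 0; i < IN; i++) wHidden[h][i] += RATE * deltaHid * inputs[p][i];
                    wOut[h] += RATE * deltaOut * hidden[h];
                }
                wOut[HID] += RATE * deltaOut;                // output bias weight
            }
        }

        // report the trained network's output for each XOR pattern
        for (double[] in : inputs) {
            System.out.printf("%.0f XOR %.0f -> %.3f%n", in[0], in[1], output(hiddenOut(in)));
        }
    }
}
```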