This project was done as part of my B.Sc. thesis under the supervision of Dr. Mehdi Sedighi, Computer Engineering Department, Tehran Polytechnic. The aim was to implement a reinforcement learning algorithm that trains a pathfinding agent to follow its intended path. The project is defined as follows:
The first step was to design a simulation environment, for which the Kivy framework is used. An agent with seven sensors (the number is adjustable) starts at a specific position on a white field. The user can draw lines on the field, and the agent perceives those lines with its sensors.
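As a rough illustration of how a sensor could perceive a drawn line, the sketch below treats the field as a grid of pixels and returns the fraction of drawn pixels under the sensor. The grid representation, the function name `sensor_reading`, and the averaging rule are assumptions made for this example, not the project's actual sensor code.

```python
def sensor_reading(field, cx, cy, size=15):
    """Fraction of drawn pixels in a size x size patch centred on (cx, cy).

    `field` is a hypothetical 2D list: 0 = white background, 1 = drawn line.
    """
    half = size // 2
    rows = field[max(0, cy - half): cy + half + 1]
    patch = [row[max(0, cx - half): cx + half + 1] for row in rows]
    total = sum(len(row) for row in patch)
    return sum(map(sum, patch)) / total if total else 0.0


# 20 x 20 white field with a vertical line drawn at column 10.
field = [[1 if x == 10 else 0 for x in range(20)] for y in range(20)]

on_line = sensor_reading(field, cx=10, cy=10, size=3)   # 3 of 9 pixels are drawn
off_line = sensor_reading(field, cx=3, cy=10, size=3)   # no drawn pixels here
```

One such reading per sensor would give the network its input vector (three or seven values, depending on the sensor count).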
The PyTorch library is used for a high-level implementation of DQN. For better exploration, the epsilon-greedy method is used.
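For reference, epsilon-greedy action selection can be sketched in a few lines. This is a minimal, framework-free illustration; the function name `epsilon_greedy` and the example Q-values are hypothetical and are not taken from the project code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.1, 0.7, 0.2]                              # hypothetical Q-values for 3 actions
greedy_action = epsilon_greedy(q, epsilon=0.0)   # never explores
random_action = epsilon_greedy(q, epsilon=1.0)   # always explores
```

A larger epsilon means more random actions, so lowering it over time moves the agent from exploring to exploiting what it has learned.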
Hyperparameters were obtained as follows:
The implementation is done entirely in NumPy, with an object-oriented design.
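To give an idea of what an object-oriented NumPy network of this size might look like, here is a minimal sketch of a forward pass through one ReLU hidden layer. The class name `TwoLayerMLP`, the weight initialisation, and the default sizes (3 inputs, 10 hidden neurons, 3 outputs) are assumptions for illustration; the project's own classes may be organised differently.

```python
import numpy as np


class TwoLayerMLP:
    """Hypothetical minimal network: inputs -> ReLU hidden layer -> Q-values."""

    def __init__(self, n_inputs=3, n_neurons=10, n_outputs=3, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; biases start at zero (an assumed initialisation).
        self.W1 = rng.normal(0.0, 0.1, size=(n_inputs, n_neurons))
        self.b1 = np.zeros(n_neurons)
        self.W2 = rng.normal(0.0, 0.1, size=(n_neurons, n_outputs))
        self.b2 = np.zeros(n_outputs)

    def forward(self, x):
        """Return one row of Q-values per input row."""
        x = np.atleast_2d(x)
        hidden = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU activation
        return hidden @ self.W2 + self.b2


net = TwoLayerMLP()
q_values = net.forward([0.0, 1.0, 0.0])  # one state in, shape (1, 3) out
```

Training would add a backward pass and a weight update on top of this; only the forward computation is shown here.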
A config.py file was added to control the simulation. It looks as follows:
learningCoreSettings = {
    # Number of hidden-layer neurons (for three inputs, 8 to 16 neurons work well)
    "nNeurons": 10,
    # Discount factor (somewhere between 0.8 and 0.9 is fine)
    "gamma": 0.9,
    # Replay memory capacity (10000 is more than enough)
    "memoryCapacity": 10000,
    # Learning rate (somewhere between 0.0001 and 0.005 is fine)
    "learningRate": 0.001,
    # Batch size: number of samples taken from replay memory in each learning iteration
    "batchSize": 25,
    # Non-linear activation function for neurons; ReLU is used in this project, but you may implement others
    "activationFunction": "relu",
    # Number of outputs, can be set to 3 (it's not generic yet)
    "nOutputs": 3,
    # Number of inputs, can be set to 3 or 7 (it's not generic yet)
    "nInputs": 3,
    # Regularization factor; for now it's only implemented in the manual design
    "reg": 0,
    # AI backend, can be set to "manual" or "pytorch"
    "backend": "manual",
    # Amount of reward given in the DQN algorithm
    "rewardAmount": 0.1,
    # Punishment: amount of negative reward given in the DQN algorithm
    "punishAmount": -1,
    # Softmax temperature, used in the softmax function implementation
    "softmaxTemperature": 10,
    # Number of iterations in the learning phase
    "learningIterations": 2500,
    # Number of iterations in the prediction phase
    "predictionIterations": 2500,
}
environmentSettings = {
    "sensorSize": 15,
    "agentWidth": 96,
    "agentLength": 120,
    "rotationDegree": 3,
    "agentVelocity": 5,
    "sensorsRotationalDistance": 15,
    "sensorSensitivity": 8,
    "buttonWidth": 230,
    "environmentWidth": 800,
    "environmentHeight": 600,
}
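To show how settings such as memoryCapacity, batchSize, and gamma typically enter a DQN, the sketch below pairs a fixed-capacity replay memory with the one-step Q-learning target. The names `ReplayMemory` and `td_target` are hypothetical and the code is a generic illustration, not the project's implementation.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transition is dropped when full."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=25):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, gamma=0.9):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    return reward + gamma * max(next_q_values)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=[step], action=0, reward=0.1, next_state=[step + 1])
# Only the 3 most recent transitions remain in the buffer.

batch = memory.sample(batch_size=2)
target = td_target(reward=-1, next_q_values=[0.0, 0.5, 0.2])  # -1 + 0.9 * 0.5
```

Sampling random batches instead of learning from consecutive steps reduces the correlation between training samples, which is the usual motivation for a replay memory.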
The robot behaved as expected. We ran it for 5000 iterations: 2500 iterations for the exploration phase (based on the epsilon-greedy method) and another 2500 iterations for the "prediction phase", in which learning was stopped and the MLP weights were fixed. A video showing its functionality is provided.
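The two-phase schedule described above can be sketched as a single loop that switches learning off after the first phase. The `agent.step(explore=..., update_weights=...)` interface and the `CountingAgent` stand-in are hypothetical, introduced only to make the schedule concrete.

```python
def run(agent, learning_iterations=2500, prediction_iterations=2500):
    """Run a learning phase, then a prediction phase with frozen weights."""
    for i in range(learning_iterations + prediction_iterations):
        learning = i < learning_iterations
        # Hypothetical interface: explore and update weights only while learning.
        agent.step(explore=learning, update_weights=learning)


class CountingAgent:
    """Stand-in agent that only counts calls, to show the schedule."""

    def __init__(self):
        self.steps = 0
        self.updates = 0

    def step(self, explore, update_weights):
        self.steps += 1
        if update_weights:
            self.updates += 1


agent = CountingAgent()
run(agent)  # 5000 steps in total, weight updates only in the first 2500
```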
This is an experimental project suited to testing and working with small-scale reinforcement learning tasks. It can be used to experiment with different algorithms for autonomous systems.
A Simple RealTime PathFinding Robot Based on Implementation of DQN Algorithm on Xilinx Zynq ARM Cortex-A Hard Processor (My B.Sc. Thesis, Phase 2/3)
Register Transfer Level Acceleration of FeedForward Propagation in 2-Layer Perceptron Networks (My B.Sc. Thesis, Phase 3/3)