Turtlebot3 Navigation Using Deep Reinforcement Learning
C++, ROS2, Open Robotics Turtlebot3 Burger, 2D LiDAR, OpenAI Gymnasium, Deep Reinforcement Learning
Description
In this project, I designed and implemented an OpenAI Gymnasium environment for training a Deep Reinforcement Learning model to enable autonomous navigation of a Turtlebot3 Burger through complex hallway environments. The system leverages 2D LiDAR sensors for perception and employs state-of-the-art Deep RL algorithms to learn collision-free navigation policies.
Project Overview
graph TB
subgraph Environment["Simulation Environment"]
TB3[Turtlebot3 Burger]
LIDAR[2D LiDAR Sensor]
WORLD[Hallway Environment]
end
subgraph RL["Reinforcement Learning"]
GYM[OpenAI Gymnasium<br/>Custom Environment]
AGENT[Deep RL Agent]
POLICY[Navigation Policy]
end
subgraph ROS["ROS2 Integration"]
ODOM[Odometry Node]
CTRL[Control Node]
SIM[Gazebo Simulation]
end
TB3 --> LIDAR
LIDAR --> GYM
WORLD --> GYM
GYM --> AGENT
AGENT --> POLICY
POLICY --> CTRL
CTRL --> TB3
ODOM --> GYM
style TB3 fill:#e1f5ff
style GYM fill:#fff4e1
style AGENT fill:#d4edda
Demo Video
The result of this project can be seen in the video below:
Odometry Model
Since the focus of this project is the control of the Turtlebot3 rather than localization, I assume perfect odometry.
To compute the odometry from the wheel angles and velocities, I applied the kinematic model from Modern Robotics for the forward and inverse kinematics of the robot.
The geometric model of the robot can be represented by the \(H(\theta)\) matrix.
\[\begin{align*} H(\theta) &= \begin{bmatrix} h_1(\theta) \\ h_2(\theta) \end{bmatrix} \\ h_i(\theta) &= \frac{1}{r_i \cos{\gamma_i}} \begin{bmatrix} x_i \sin{\left (\beta_i + \gamma_i \right )} - y_i \cos{\left (\beta_i + \gamma_i \right)} \\ \cos{\left ( \beta_i + \gamma_i + \theta \right )} \\ \sin{\left ( \beta_i + \gamma_i + \theta \right )} \end{bmatrix}^\intercal \end{align*}\]
For the Turtlebot3, a diff-drive mobile robot, the parameters are \(r_i = r\), \(\gamma_i = 0\), \(\beta_i = 0\), and wheel positions \((x_1, y_1) = (0, d)\), \((x_2, y_2) = (0, -d)\).
So \(H(0)\) can be calculated as
\[H(0)=\frac{1}{r}\begin{bmatrix} -d & 1 & 0 \\ d & 1 & 0 \end{bmatrix}\]
where \(d\) is one-half of the distance between the wheels and \(r\) is the radius of each wheel.
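As a check (my own substitution, following the Modern Robotics convention), plugging these parameters into \(h_i(0)\) reproduces the rows of \(H(0)\):
\[h_1(0) = \frac{1}{r\cos 0}\begin{bmatrix} 0 \cdot \sin 0 - d\cos 0 \\ \cos 0 \\ \sin 0 \end{bmatrix}^\intercal = \frac{1}{r}\begin{bmatrix} -d & 1 & 0 \end{bmatrix}, \qquad h_2(0) = \frac{1}{r}\begin{bmatrix} d & 1 & 0 \end{bmatrix}\]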
Hence the \(F\) matrix can be calculated as
\[F=H(0)^\dagger = r \begin{bmatrix} -\frac{1}{2d} & \frac{1}{2d} \\ \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix}\]
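As a sanity check (not part of the project packages), \(F\) can be reproduced numerically as the Moore-Penrose pseudoinverse of \(H(0)\); the wheel radius and half wheel-track below are nominal Turtlebot3 Burger values used only for illustration.

```python
# Numerical check: pinv(H(0)) should equal the closed-form F matrix above.
import numpy as np

r = 0.033  # wheel radius [m], nominal Turtlebot3 Burger value (illustrative)
d = 0.080  # half of the distance between the wheels [m] (illustrative)

H0 = (1.0 / r) * np.array([[-d, 1.0, 0.0],
                           [ d, 1.0, 0.0]])

F = np.linalg.pinv(H0)  # Moore-Penrose pseudoinverse of H(0)
F_closed_form = r * np.array([[-1.0 / (2.0 * d), 1.0 / (2.0 * d)],
                              [0.5, 0.5],
                              [0.0, 0.0]])

print(np.allclose(F, F_closed_form))  # True
```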
Forward Kinematics
The forward kinematics finds the transform from the current state to the next state \(T_{bb'}\) given the change in wheel positions \(\left( \Delta\phi_l, \Delta\phi_r\right)\).
- Find the body twist \({\cal V}_b\) from the body frame of the current state to the next state: \({\cal V}_b = F \begin{bmatrix} \Delta\phi_l \\ \Delta\phi_r \end{bmatrix}\)
- Construct the 3-dimensional twist \({\cal V}_b = \begin{bmatrix} \omega_z & v_x & v_y \end{bmatrix}^\intercal\) from the result
- Then, the transform from the current state to the next state \(T_{bb'}\) can be calculated by the matrix exponential \(T_{bb'} = e^{\left[{\cal V}_b\right]}\) (a short code sketch of this update follows the list)
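Below is a minimal Python sketch of this odometry update. It assumes the planar twist ordering \((\omega_z, v_x, v_y)\) used above and the closed-form SE(2) exponential; it is an illustration, not the project's C++ turtlelib implementation.

```python
# Sketch of the forward-kinematics odometry update for a diff-drive robot.
import numpy as np

def body_twist(dphi_l: float, dphi_r: float, r: float, d: float) -> np.ndarray:
    """V_b = F @ [dphi_l, dphi_r], with F = H(0)^+ for a diff-drive robot."""
    F = r * np.array([[-1.0 / (2.0 * d), 1.0 / (2.0 * d)],
                      [0.5, 0.5],
                      [0.0, 0.0]])
    return F @ np.array([dphi_l, dphi_r])

def integrate_pose(pose: np.ndarray, Vb: np.ndarray) -> np.ndarray:
    """Apply T_bb' = exp([V_b]) to the pose (x, y, theta) in the world frame."""
    x, y, theta = pose
    wz, vx, vy = Vb
    if abs(wz) < 1e-9:                  # pure translation
        dxb, dyb, dthb = vx, vy, 0.0
    else:                               # rotation plus translation
        dthb = wz
        dxb = (vx * np.sin(wz) + vy * (np.cos(wz) - 1.0)) / wz
        dyb = (vy * np.sin(wz) + vx * (1.0 - np.cos(wz))) / wz
    # Rotate the body-frame displacement into the world frame.
    dx = np.cos(theta) * dxb - np.sin(theta) * dyb
    dy = np.sin(theta) * dxb + np.cos(theta) * dyb
    return np.array([x + dx, y + dy, theta + dthb])

# Example: both wheels advance by 0.1 rad -> the robot drives straight ahead.
Vb = body_twist(0.1, 0.1, r=0.033, d=0.080)
print(integrate_pose(np.array([0.0, 0.0, 0.0]), Vb))
```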
Inverse Kinematics
Inverse kinematics finds the wheel velocities \(\left(\dot{\phi}_l, \dot{\phi}_r \right)\) given the body twist \({\cal V}_b\).
The wheel velocities can be calculated as
\[\begin{bmatrix} \dot{\phi}_l \\ \dot{\phi}_r \end{bmatrix} = H(0){\cal V}_b = \frac{1}{r}\begin{bmatrix} -d & 1 & 0 \\ d & 1 & 0 \end{bmatrix} \begin{bmatrix} \omega_z \\ v_x \\ v_y \end{bmatrix}\]
Since the robot is non-holonomic, it cannot move sideways, so \(v_y=0\). The wheel velocities \(\left(\dot{\phi}_l, \dot{\phi}_r \right)\) are then:
\[\begin{align*} \dot{\phi}_l &= \frac{1}{r} \left ( -d \omega_z + v_x\right) \\ \dot{\phi}_r &= \frac{1}{r} \left ( d \omega_z + v_x\right) \\ \end{align*}\]
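A corresponding sketch of this conversion in Python, again illustrative rather than the project's own code; the function name and parameters are my own.

```python
# Inverse kinematics: commanded body twist (v_x, omega_z) -> wheel velocities.
def wheel_velocities(v_x: float, omega_z: float, r: float, d: float) -> tuple[float, float]:
    """phi_dot_l, phi_dot_r from the diff-drive inverse kinematics (v_y = 0)."""
    phi_dot_l = (-d * omega_z + v_x) / r
    phi_dot_r = ( d * omega_z + v_x) / r
    return phi_dot_l, phi_dot_r

# Example: drive forward at 0.2 m/s while turning at 0.5 rad/s.
print(wheel_velocities(0.2, 0.5, r=0.033, d=0.080))
```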
OpenAI Gymnasium
To start the training pipeline for the turtlebot3, I wrapped the ROS2 turtlebot3 simulation within an OpenAI Gymnasium environment, creating a standardized interface for Deep RL algorithms.
flowchart LR
subgraph Gymnasium["OpenAI Gymnasium Environment"]
OBS[Observation Space<br/>x, y, θ]
ACT[Action Space<br/>v_x, ω_z]
REW[Reward Function]
end
subgraph Robot["Turtlebot3 Simulation"]
STATE[Robot State]
SENSORS[LiDAR Sensors]
ACTUATORS[Wheel Actuators]
end
STATE --> OBS
SENSORS --> OBS
ACT --> ACTUATORS
ACTUATORS --> STATE
STATE --> REW
style OBS fill:#e1f5ff
style ACT fill:#fff4e1
style REW fill:#d4edda
Observation Space
The observation space for the environment can be defined as:
\[\begin{align*} x &\in [-1,\ 4] \text{ meters (world x-coordinate)} \\ y &\in [-0.5,\ 4.5] \text{ meters (world y-coordinate)} \\ \theta &\in [-\pi,\ \pi] \text{ radians (orientation)} \end{align*}\]
Action Space
The action space can be defined as:
\[\begin{align*} \dot{x} &\in [0.0,\ 0.2] \text{ m/s (linear velocity)} \\ \dot{\theta} &\in [-0.5,\ 0.5] \text{ rad/s (angular velocity)} \end{align*}\]
Reward Function
The reward structure encourages collision-free goal-reaching behavior:
- Collision penalty: -1 (when robot hits a wall)
- Goal reward: +1 (when robot reaches the target)
- This sparse reward signal requires the agent to explore extensively before learning a successful policy (a sketch of the environment follows this list)
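A minimal sketch of how these spaces and rewards map onto a custom Gymnasium environment; the class name and placeholder bodies are illustrative, and the ROS2/Gazebo plumbing (publishing velocity commands, reading odometry and LiDAR) is omitted.

```python
# Sketch of the custom Gymnasium environment for hallway navigation.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class TurtlebotHallwayEnv(gym.Env):
    """Hallway navigation task for the Turtlebot3 Burger (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        # Observation: (x, y, theta) within the hallway bounds defined above.
        self.observation_space = spaces.Box(
            low=np.array([-1.0, -0.5, -np.pi], dtype=np.float32),
            high=np.array([4.0, 4.5, np.pi], dtype=np.float32),
        )
        # Action: (linear velocity v_x, angular velocity omega_z).
        self.action_space = spaces.Box(
            low=np.array([0.0, -0.5], dtype=np.float32),
            high=np.array([0.2, 0.5], dtype=np.float32),
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(3, dtype=np.float32)  # placeholder: spawn pose
        return obs, {}

    def step(self, action):
        # Placeholder dynamics: the real environment publishes `action` as a
        # velocity command and reads the next state back from the simulation.
        obs = np.zeros(3, dtype=np.float32)
        collided, reached_goal = False, False  # set from LiDAR / goal check
        reward = -1.0 if collided else (1.0 if reached_goal else 0.0)
        terminated = collided or reached_goal
        return obs, reward, terminated, False, {}
```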
Training Process
The Deep Reinforcement Learning training follows an episodic approach where the agent learns through trial and error.
stateDiagram-v2
[*] --> Initialize: Episode Start
Initialize --> Observe: Reset to start position
Observe --> SelectAction: Get current state
SelectAction --> Execute: Choose action (RL policy)
Execute --> GetReward: Apply action to robot
GetReward --> CheckTerminal: Evaluate outcome
CheckTerminal --> Observe: Continue (no collision, no goal)
CheckTerminal --> Learn: Terminal state reached
Learn --> UpdatePolicy: Collision (-1) or Goal (+1)
UpdatePolicy --> Initialize: Next episode
note right of CheckTerminal
Terminal conditions:
- Wall collision (-1)
- Goal reached (+1)
- Max steps exceeded
end note
note right of UpdatePolicy
Policy updated using:
- Experience replay
- Neural network backprop
- TD-learning
end note
The training process is shown in the figure below; the robot is reset to its starting position whenever it hits a wall.

Training Algorithm
The agent learns through the following cycle (a minimal sketch of this loop appears after the list):
- Initialization: Robot spawns at the start position
- Observation: Collect state information (position, orientation, LiDAR data)
- Action Selection: Neural network policy outputs velocity commands
- Execution: Robot moves according to the selected action
- Reward Evaluation: System checks for collision or goal achievement
- Learning: Update policy based on reward signal
- Reset: Upon terminal state, reset environment and repeat
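A minimal sketch of this episodic loop; the agent interface (select_action / record / update) and the RandomAgent placeholder are illustrative stand-ins for whichever Deep RL algorithm is plugged in, not the project's actual trainer.

```python
# Sketch of the episodic training loop over the Gymnasium environment.
import gymnasium as gym

class RandomAgent:
    """Placeholder agent: samples random actions (useful for testing the env)."""

    def __init__(self, env: gym.Env):
        self.env = env

    def select_action(self, obs):
        return self.env.action_space.sample()

    def record(self, obs, action, reward, next_obs, terminated):
        pass  # a real agent would store the transition in a replay buffer

    def update(self):
        pass  # a real agent would update its policy network here

def train(env: gym.Env, agent, num_episodes: int = 1000, max_steps: int = 500):
    for episode in range(num_episodes):
        obs, _ = env.reset()                               # 1-2: spawn and observe
        for _ in range(max_steps):                         # truncate long episodes
            action = agent.select_action(obs)              # 3: policy output
            next_obs, reward, terminated, truncated, _ = env.step(action)  # 4-5
            agent.record(obs, action, reward, next_obs, terminated)
            agent.update()                                 # 6: learn from experience
            obs = next_obs
            if terminated or truncated:                    # collision, goal, or timeout
                break                                      # 7: reset and repeat
```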
Packages
- nuturtle_description: Visualizes the pose of the turtlebot3 burger.
- turtlelib: Custom library functions for kinematics and SLAM calculations.
- nusim: Simulates the turtlebot in the real world.
- nuturtle_control: Performs the turtlebot kinematics calculations and parses the commands sent to the turtlebot3.
- nuslam: Performs the SLAM algorithm for correcting the odometry error.
- nuturtle_interfaces: All message interfaces for communication between the nodes.
- nuturtle_slam: Integrates slam_toolbox for setting up the learning environment.
- nuturtle_deeprl: Wraps the simulation with OpenAI Gymnasium as the Deep Reinforcement Learning training environment.