Turtlebot3 Navigation Using Deep Reinforcement Learning

Custom Gymnasium environment for hallway navigation training

C++, ROS2, Open Robotics Turtlebot3 Burger, 2D LiDAR, OpenAI Gymnasium, Deep Reinforcement Learning

View This Project on GitHub

Description

In this project, I designed and implemented an OpenAI Gymnasium environment for training a Deep Reinforcement Learning model to enable autonomous navigation of a Turtlebot3 Burger through complex hallway environments. The system leverages 2D LiDAR sensors for perception and employs state-of-the-art Deep RL algorithms to learn collision-free navigation policies.

Project Overview

graph TB
    subgraph Environment["Simulation Environment"]
        TB3[Turtlebot3 Burger]
        LIDAR[2D LiDAR Sensor]
        WORLD[Hallway Environment]
    end

    subgraph RL["Reinforcement Learning"]
        GYM[OpenAI Gymnasium<br/>Custom Environment]
        AGENT[Deep RL Agent]
        POLICY[Navigation Policy]
    end

    subgraph ROS["ROS2 Integration"]
        ODOM[Odometry Node]
        CTRL[Control Node]
        SIM[Gazebo Simulation]
    end

    TB3 --> LIDAR
    LIDAR --> GYM
    WORLD --> GYM
    GYM --> AGENT
    AGENT --> POLICY
    POLICY --> CTRL
    CTRL --> TB3
    ODOM --> GYM

    style TB3 fill:#e1f5ff
    style GYM fill:#fff4e1
    style AGENT fill:#d4edda

Demo Video

The result of this project can be seen in the video below:

Odometry Model

Since the focus of this project is the control of the Turtlebot3 rather than localization, perfect odometry is assumed.

To obtain the odometry from the wheel angles and velocities, the kinematics model from Modern Robotics is applied to compute the forward and inverse kinematics of the robot.

The geometry model of the robot can be represented using the \(H\) matrix.

\[\begin{align*} H(\theta) &= \begin{bmatrix} h_1(\theta) \\ h_2(\theta) \end{bmatrix} \\ h_i(\theta) &= \frac{1}{r_i \cos{\gamma_i}} \begin{bmatrix} x_i \sin{\left (\beta_i + \gamma_i \right )} - y_i \cos{\left (\beta_i + \gamma_i \right)} \\ \cos{\left ( \beta_i + \gamma_i + \theta \right )} \\ \sin{\left ( \beta_i + \gamma_i + \theta \right )} \end{bmatrix}^\intercal \end{align*}\]

For the Turtlebot3, which is a diff-drive mobile robot, the parameters are defined as

\[\begin{align*} x_1 &= 0 \\ x_2 &= 0 \\ y_1 &= d \\ y_2 &= -d \\ \beta_1 &= 0 \\ \beta_2 &= 0 \\ \gamma_1 &= 0 \\ \gamma_2 &= 0 \\ r_1 &= r \\ r_2 &= r \end{align*}\]

So that \(H(0)\) can be calculated as

\[H(0)=\frac{1}{r}\begin{bmatrix} -d & 1 & 0 \\ d & 1 & 0 \end{bmatrix}\]

Where \(d\) is one-half of the distance between the wheels and \(r\) is the radius of each wheel.

Hence the \(F\) matrix can be calculated as the pseudoinverse of \(H(0)\):

\[F=H(0)^\dagger = r \begin{bmatrix} -\frac{1}{2d} & \frac{1}{2d} \\ \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix}\]
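As a sanity check, the pseudoinverse can be computed numerically; the values of `d` and `r` below are placeholders, not the actual Turtlebot3 geometry.

```python
# Verify that the pseudoinverse of H(0) matches the closed form of F above.
import numpy as np

d, r = 0.08, 0.033  # placeholder half-track and wheel radius

H0 = (1.0 / r) * np.array([[-d, 1.0, 0.0],
                           [ d, 1.0, 0.0]])

F = np.linalg.pinv(H0)                       # Moore-Penrose pseudoinverse
F_closed = r * np.array([[-1/(2*d), 1/(2*d)],
                         [ 0.5,     0.5   ],
                         [ 0.0,     0.0   ]])

assert np.allclose(F, F_closed)              # matches the closed form
assert np.allclose(H0 @ F, np.eye(2))        # F is a right inverse of H(0)
```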

Forward Kinematics

The forward kinematics finds the transform from the current state to the next state \(T_{bb'}\) given the change in wheel positions \(\left( \Delta\phi_l, \Delta\phi_r\right)\).

  • Find the body twist \({\cal V}_b\) from the body frame of the current state to next state:
\[{\cal V}_b = F \Delta \phi = r \begin{bmatrix} -\frac{1}{2d} & \frac{1}{2d} \\ \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \Delta \phi_l \\ \Delta \phi_r \end{bmatrix}\]
  • Construct the six-dimensional twist by
\[{\cal V}_{b6} = \begin{bmatrix} 0 \\ 0 \\ {\cal V}_b \\ 0 \end{bmatrix}\]
  • Then, the transform from current state to next state \(T_{bb'}\) can be calculated by
\[T_{bb'}=e^{[{\cal V}_{b6}]}\]
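The three steps above can be sketched in Python. For a planar twist the matrix exponential \(e^{[{\cal V}_{b6}]}\) reduces to a closed form, so no general 6D exponential is needed; `d` and `r` are placeholder values, not the actual Turtlebot3 geometry.

```python
# Sketch of the forward-kinematics steps, using the closed-form SE(2)
# exponential of a constant body twist integrated over unit time.
import math

d, r = 0.08, 0.033  # placeholder half-track and wheel radius

def forward_kinematics(dphi_l, dphi_r):
    """Return (dtheta, dx, dy): the body-frame displacement T_bb'
    produced by wheel increments (dphi_l, dphi_r)."""
    # Step 1: V_b = F * dphi (third row of F is zero, so v_y = 0)
    w  = r * (dphi_r - dphi_l) / (2.0 * d)   # omega_z
    vx = r * (dphi_r + dphi_l) / 2.0         # v_x
    if abs(w) < 1e-9:                        # pure translation
        return 0.0, vx, 0.0
    # Steps 2-3: exp([V_b6]) in closed form for a planar twist
    dx = vx * math.sin(w) / w
    dy = vx * (1.0 - math.cos(w)) / w
    return w, dx, dy

# equal wheel increments -> straight-line motion along the body x-axis
print(forward_kinematics(1.0, 1.0))  # (0.0, 0.033, 0.0)
```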

Inverse Kinematics

The inverse kinematics finds the wheel velocities \(\left(\dot{\phi_l}, \dot{\phi_r} \right)\) given the body twist \({\cal V}_b\).

The wheel velocities can be calculated as

\[\begin{bmatrix} \dot{\phi}_l \\ \dot{\phi}_r \end{bmatrix} = H(0){\cal V}_b = \frac{1}{r}\begin{bmatrix} -d & 1 & 0 \\ d & 1 & 0 \end{bmatrix} \begin{bmatrix} \omega_z \\ v_x \\ v_y \end{bmatrix}\]

Since this robot is non-holonomic, it cannot move sideways; hence \(v_y=0\), and the wheel velocities \(\left(\dot{\phi_l}, \dot{\phi_r} \right)\) are:

\[\begin{align*} \dot{\phi}_l &= \frac{1}{r} \left ( -d \omega_z + v_x\right) \\ \dot{\phi}_r &= \frac{1}{r} \left ( d \omega_z + v_x\right) \\ \end{align*}\]
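These two expressions can be sketched directly (`d` and `r` are again placeholder values):

```python
# Inverse kinematics: wheel velocities from a commanded body twist
# (v_y = 0 for this diff-drive robot).
def inverse_kinematics(w_z, v_x, d=0.08, r=0.033):
    phidot_l = (-d * w_z + v_x) / r
    phidot_r = ( d * w_z + v_x) / r
    return phidot_l, phidot_r

# driving straight: both wheels spin at the same rate
print(inverse_kinematics(0.0, 0.2))
# turning in place: wheels spin at equal and opposite rates
print(inverse_kinematics(1.0, 0.0))
```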

OpenAI Gymnasium

To build the training pipeline for the Turtlebot3, I wrapped the ROS2 Turtlebot3 simulation within an OpenAI Gymnasium environment, creating a standardized interface for Deep RL algorithms.

flowchart LR
    subgraph Gymnasium["OpenAI Gymnasium Environment"]
        OBS[Observation Space<br/>x, y, θ]
        ACT[Action Space<br/>v_x, ω_z]
        REW[Reward Function]
    end

    subgraph Robot["Turtlebot3 Simulation"]
        STATE[Robot State]
        SENSORS[LiDAR Sensors]
        ACTUATORS[Wheel Actuators]
    end

    STATE --> OBS
    SENSORS --> OBS
    ACT --> ACTUATORS
    ACTUATORS --> STATE
    STATE --> REW

    style OBS fill:#e1f5ff
    style ACT fill:#fff4e1
    style REW fill:#d4edda

Observation Space

The observation space for the environment can be defined as:

\[\begin{align*} x &= -1 \rightarrow 4 \text{ meters (world x-coordinate)} \\ y &= -0.5 \rightarrow 4.5 \text{ meters (world y-coordinate)} \\ \theta &= -\pi \rightarrow \pi \text{ radians (orientation)} \end{align*}\]

Action Space

The action space can be defined as:

\[\begin{align*} \dot{x} &= 0.0 \rightarrow 0.2 \text{ m/s (linear velocity)} \\ \dot{\theta} &= -0.5 \rightarrow 0.5 \text{ rad/s (angular velocity)} \end{align*}\]

Reward Function

The reward structure encourages collision-free goal-reaching behavior:

  • Collision penalty: -1 (when robot hits a wall)
  • Goal reward: +1 (when robot reaches the target)
  • This sparse reward signal requires the agent to learn exploratory behavior

Training Process

The Deep Reinforcement Learning training follows an episodic approach where the agent learns through trial and error.

stateDiagram-v2
    [*] --> Initialize: Episode Start
    Initialize --> Observe: Reset to start position
    Observe --> SelectAction: Get current state
    SelectAction --> Execute: Choose action (RL policy)
    Execute --> GetReward: Apply action to robot

    GetReward --> CheckTerminal: Evaluate outcome

    CheckTerminal --> Observe: Continue (no collision, no goal)
    CheckTerminal --> Learn: Terminal state reached

    Learn --> UpdatePolicy: Collision (-1) or Goal (+1)
    UpdatePolicy --> Initialize: Next episode

    note right of CheckTerminal
        Terminal conditions:
        - Wall collision (-1)
        - Goal reached (+1)
        - Max steps exceeded
    end note

    note right of UpdatePolicy
        Policy updated using:
        - Experience replay
        - Neural network backprop
        - TD-learning
    end note

The training process is shown in the figure below; the robot is reset to its starting position whenever it hits a wall.

Training Algorithm

The agent learns through the following cycle:

  1. Initialization: Robot spawns at the start position
  2. Observation: Collect state information (position, orientation, LiDAR data)
  3. Action Selection: Neural network policy outputs velocity commands
  4. Execution: Robot moves according to the selected action
  5. Reward Evaluation: System checks for collision or goal achievement
  6. Learning: Update policy based on reward signal
  7. Reset: Upon terminal state, reset environment and repeat
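The cycle above can be sketched schematically. The random policy, toy 1D dynamics, and no-op update below are stand-ins for the neural-network policy, the Gazebo simulation, and the TD update respectively; only the loop structure reflects the real pipeline.

```python
# Schematic episodic training loop: reset, act, step, store, update, repeat.
import random
from collections import deque

replay = deque(maxlen=10_000)            # experience replay buffer

def policy(state):
    return random.uniform(0.0, 0.2)      # stand-in for the NN policy

def env_step(state, action):
    nxt = state + action                 # toy 1D "hallway"
    if nxt >= 4.0:
        return nxt, +1.0, True           # goal reached
    return nxt, 0.0, False

def update_policy(batch):
    pass                                 # TD update / backprop would go here

for episode in range(5):                 # 1. initialization
    state, done, steps = 0.0, False, 0
    while not done and steps < 200:      # 7. reset on terminal state
        action = policy(state)           # 2-3. observe, select action
        nxt, reward, done = env_step(state, action)  # 4-5. execute, reward
        replay.append((state, action, reward, nxt, done))
        if len(replay) >= 32:            # 6. learn from a sampled batch
            update_policy(random.sample(replay, 32))
        state, steps = nxt, steps + 1
```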

Packages

  • nuturtle_description: Visualizes the pose of the Turtlebot3 Burger
  • turtlelib: Custom library functions for kinematics and SLAM calculations
  • nusim: Simulates the turtlebot and its world
  • nuturtle_control: Performs the turtlebot kinematics calculations and parses the commands sent to the Turtlebot3
  • nuslam: Performs the SLAM algorithm for correcting odometry error
  • nuturtle_interfaces: Message interfaces for communication between the nodes
  • nuturtle_slam: Integrates slam_toolbox to set up the learning environment
  • nuturtle_deeprl: Wraps the simulation with OpenAI Gymnasium as the Deep Reinforcement Learning training environment
