u/Few-Blueberry-6125

I'm working on a DRL project for autonomous navigation with a TurtleBot3 in ROS 2 Gazebo, and I would like to share what I'm building and ask for some advice.

The goal is dynamic obstacle avoidance in an arena environment using DreamerV3. My implementation is based on this repo:
https://github.com/DrunkJin/dreamer-from-scratch

The main idea I'm experimenting with is to avoid feeding raw 1D LiDAR scans directly to the agent. Instead, I convert LiDAR hits into a Bird's-Eye-View (BEV) representation accumulated over a sliding time window. The intuition is that this gives the world model a more spatial representation of the environment, so the agent can observe where obstacles have been, not only where they are at the current timestep.
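
For anyone who wants something concrete to comment on, here is a minimal sketch of the BEV idea, not the code from my repo. It assumes a robot-centred grid, a stack of the last few scans as channels, and no ego-motion compensation between scans. `GRID_SIZE`, `RESOLUTION`, `WINDOW`, `MAX_RANGE`, `scan_to_bev` and `BevAccumulator` are hypothetical names and values chosen for illustration; the `ranges`, `angle_min` and `angle_increment` arguments mirror the fields of a ROS 2 `sensor_msgs/LaserScan` message.

```python
from collections import deque

import numpy as np

GRID_SIZE = 64     # cells per side (assumed value)
RESOLUTION = 0.1   # metres per cell (assumed value)
WINDOW = 4         # number of past scans kept (assumed value)
MAX_RANGE = 3.5    # metres; readings beyond this are dropped (assumed value)


def scan_to_bev(ranges, angle_min, angle_increment):
    """Rasterise one LiDAR scan into a robot-centred binary occupancy grid."""
    ranges = np.asarray(ranges, dtype=np.float64)
    angles = angle_min + angle_increment * np.arange(len(ranges))
    # Drop inf/NaN, zero and out-of-range readings.
    valid = np.isfinite(ranges) & (ranges > 0.0) & (ranges <= MAX_RANGE)
    x = ranges[valid] * np.cos(angles[valid])  # forward of the robot
    y = ranges[valid] * np.sin(angles[valid])  # left of the robot
    half = GRID_SIZE // 2
    rows = np.floor(x / RESOLUTION).astype(int) + half
    cols = np.floor(y / RESOLUTION).astype(int) + half
    inside = (rows >= 0) & (rows < GRID_SIZE) & (cols >= 0) & (cols < GRID_SIZE)
    grid = np.zeros((GRID_SIZE, GRID_SIZE), dtype=np.float32)
    grid[rows[inside], cols[inside]] = 1.0
    return grid


class BevAccumulator:
    """Keeps the last WINDOW grids and returns them stacked as channels."""

    def __init__(self):
        self.frames = deque(maxlen=WINDOW)

    def update(self, ranges, angle_min, angle_increment):
        self.frames.append(scan_to_bev(ranges, angle_min, angle_increment))
        # Zero-pad at the start of an episode so the shape stays fixed.
        pad = [np.zeros_like(self.frames[0])] * (WINDOW - len(self.frames))
        return np.stack(pad + list(self.frames))  # (WINDOW, GRID_SIZE, GRID_SIZE)
```

Stacking the frames as channels, oldest first, keeps the ordering of the hits visible to the encoder. Because the robot moves between scans, older frames in this sketch are not aligned with the current pose; aligning them would need the odometry transform between frames.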

However, during training the robot tends to spin in place instead of navigating toward the goal. After debugging, I think one possible root cause is the resolution of the two-hot encoding in DreamerV3's reward prediction.

In my setup, terminal rewards are ±2000 and REWARD_RANGE = 2600 with 255 bins, so each bin is roughly 20 reward units wide (5200 / 254 ≈ 20.5 for bins spanning -2600 to +2600). My original angular velocity penalty was:

-0.3 * w^2

where w can be up to 2.0 rad/s. The maximum spinning penalty was therefore only about -1.2 per step (-0.3 * 2.0^2), which is under 6% of one bin width. As a result, the world model could barely distinguish between "spinning" and "not spinning" in its reward predictions.
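
The arithmetic above can be checked with a short sketch. It assumes the 255 bins are linearly spaced over [-REWARD_RANGE, +REWARD_RANGE], which matches the "roughly 20 units per bin" figure; if an implementation applies a symlog transform before binning, as the DreamerV3 paper describes, the spacing near zero would be different. `two_hot` is a hypothetical helper written for illustration, not a function from either repo.

```python
REWARD_RANGE = 2600.0  # bins assumed to span [-2600, +2600]
NUM_BINS = 255
W_MAX = 2.0            # rad/s

# Width of one bin under the linear-spacing assumption.
bin_width = 2.0 * REWARD_RANGE / (NUM_BINS - 1)

# Largest magnitude of the original penalty, -0.3 * w^2 at w = W_MAX.
max_penalty = -0.3 * W_MAX ** 2

# Fraction of one bin width covered by that penalty.
fraction = abs(max_penalty) / bin_width


def two_hot(value):
    """Return (lower_index, lower_weight, upper_weight) for a scalar reward."""
    clipped = min(max(value, -REWARD_RANGE), REWARD_RANGE)
    # Continuous bin position in [0, NUM_BINS - 1].
    pos = (clipped + REWARD_RANGE) * (NUM_BINS - 1) / (2.0 * REWARD_RANGE)
    lower = min(int(pos), NUM_BINS - 2)
    upper_weight = pos - lower
    return lower, 1.0 - upper_weight, upper_weight


print(f"bin width      = {bin_width:.2f}")
print(f"max penalty    = {max_penalty:.2f}")
print(f"fraction       = {fraction:.4f}")
print(f"two_hot(0.0)   = {two_hot(0.0)}")
print(f"two_hot(-1.2)  = {two_hot(max_penalty)}")
```

Under these assumptions, a zero reward puts all of its weight on the centre bin, while the maximum spin penalty only moves a few percent of that weight to the neighbouring bin. That small shift is what the reward head has to learn to tell the two cases apart.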

I tried to address this by normalizing the angular velocity by the maximum angular speed and increasing the penalty coefficient so that the penalty becomes visible over the imagination horizon.
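
A minimal sketch of that kind of penalty, to make the scale explicit. The coefficient `SPIN_COEF = 5.0` and the horizon `HORIZON = 15` are placeholder assumptions for illustration, not the values from my training runs, and the bin width again assumes linearly spaced bins over [-2600, +2600].

```python
W_MAX = 2.0        # rad/s, maximum angular speed
SPIN_COEF = 5.0    # hypothetical coefficient, for illustration only
HORIZON = 15       # assumed imagination horizon in steps
BIN_WIDTH = 2.0 * 2600.0 / (255 - 1)  # linear-bin assumption, about 20.5


def spin_penalty(w):
    """Penalty on angular velocity normalised to [-1, 1]."""
    w_norm = max(-1.0, min(1.0, w / W_MAX))
    return -SPIN_COEF * w_norm ** 2


def horizon_penalty(w, per_step=spin_penalty):
    """Undiscounted penalty accumulated by spinning at w for HORIZON steps."""
    return HORIZON * per_step(w)


def old_penalty(w):
    """The original penalty, -0.3 * w^2, for comparison."""
    return -0.3 * w ** 2


new_bins = abs(horizon_penalty(W_MAX)) / BIN_WIDTH
old_bins = abs(horizon_penalty(W_MAX, old_penalty)) / BIN_WIDTH
print(f"new penalty over horizon: {new_bins:.2f} bin widths")
print(f"old penalty over horizon: {old_bins:.2f} bin widths")
```

With these placeholder numbers, spinning at full speed for the whole horizon accumulates several bin widths of penalty, whereas the original formulation stays below one bin width over the same number of steps.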

This is the repo I am using for my implementation:
https://github.com/dugngyn293/turtlebot3_auto

I would really appreciate any advice from people who have worked with DreamerV3, world models, or DRL for robot navigation.

[D] Implementing DreamerV3 for a dynamic obstacle avoidance problem