Vessel Reinforcement Learning

Introduction

In the Python Client of ASVSim, we have implemented a simple reinforcement learning (RL) example that enables researchers and developers to train autonomous vessel navigation agents. The RL environment provides a simulation for training agents to navigate vessels to target destinations while avoiding obstacles and collisions.

As the generalizability of RL methods is highly dependent on variety during training, we have implemented a procedural generation system that allows you to randomize the port environment during training. See Procedural Generation for more details. Furthermore, you can spawn static and dynamic obstacles in the environment using the simAddObstacle() function. See Vessel API for more details.

The reinforcement learning system is located in PythonClient/Vessel/.

Environment Overview (Shipsim_gym.py)

Environment Specifications

The ShippingSim environment implements a continuous control task in which an agent learns to navigate a vessel to a target location while avoiding obstacles. The example agent was trained in a slightly modified version of the LakeEnv environment. Note that this is only an example: neither the environment nor the reward, action, and observation spaces are optimized for RL training.

Specification        Details
Action Space         Box(2,) - [thrust, rudder]
Action Range         thrust: [0, 1], rudder: [0.4, 0.6]
Observation Space    Box(57,) - vessel state + LiDAR data
Episode Length       Maximum 200 timesteps
Success Condition    Reach within 10 meters of goal
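
The episode length and success condition listed above can be expressed as a small check. This is a minimal sketch, not the code used in Shipsim_gym.py: the function name, the argument names, and the choice of an inclusive 10 m radius are assumptions, and collision handling is left out because its details are not described here.

```python
import math

MAX_STEPS = 200          # maximum episode length (timesteps)
GOAL_TOLERANCE_M = 10.0  # success radius around the goal (meters)


def episode_status(x, y, goal_x, goal_y, step_count):
    """Return (terminated, truncated) for the current step.

    terminated: the vessel is within GOAL_TOLERANCE_M of the goal.
    truncated:  the step budget is used up without reaching the goal.
    """
    distance = math.hypot(goal_x - x, goal_y - y)
    terminated = distance <= GOAL_TOLERANCE_M
    truncated = (not terminated) and step_count >= MAX_STEPS
    return terminated, truncated
```

Separating the two flags follows the Gymnasium convention, where reaching the goal ends the task while hitting the step limit only cuts the episode short.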

Observation Space Details

The observation vector contains 57 elements: 12 vessel-state values followed by 45 LiDAR distance measurements.

obs = [
    distance_to_goal_x,           # Current X distance to goal
    distance_to_goal_y,           # Current Y distance to goal  
    prev_distance_to_goal_x,      # Previous X distance to goal
    prev_distance_to_goal_y,      # Previous Y distance to goal
    heading,                      # Vessel heading (radians)
    linear_velocity_x,            # X-axis linear velocity
    linear_velocity_y,            # Y-axis linear velocity
    linear_acceleration_x,        # X-axis linear acceleration
    linear_acceleration_y,        # Y-axis linear acceleration
    angular_acceleration_z,       # Z-axis angular acceleration
    prev_thrust,                  # Previous thrust action
    prev_rudder,                  # Previous rudder action
    *lidar_distances[0:45],       # 45 LiDAR distance measurements (unpacked)
]
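
To make the layout concrete, the sketch below assembles a 57-element observation from its parts. It is illustrative only: the function names are hypothetical, and the way a full LiDAR scan is reduced to 45 values is an assumption (evenly spaced sampling), since the reduction used by the environment is not described here.

```python
N_LIDAR = 45
OBS_SIZE = 12 + N_LIDAR  # 12 vessel-state entries + 45 LiDAR ranges = 57


def downsample_lidar(ranges, n_out=N_LIDAR):
    """Keep n_out evenly spaced readings from a longer scan (assumed scheme)."""
    if len(ranges) < n_out:
        raise ValueError("scan has fewer readings than requested")
    stride = len(ranges) / n_out
    return [ranges[int(i * stride)] for i in range(n_out)]


def build_observation(goal_dx, goal_dy, prev_goal_dx, prev_goal_dy, heading,
                      vel_x, vel_y, acc_x, acc_y, ang_acc_z,
                      prev_thrust, prev_rudder, lidar_ranges):
    """Concatenate the vessel state and LiDAR ranges in the documented order."""
    obs = [goal_dx, goal_dy, prev_goal_dx, prev_goal_dy, heading,
           vel_x, vel_y, acc_x, acc_y, ang_acc_z, prev_thrust, prev_rudder]
    obs.extend(downsample_lidar(lidar_ranges))
    assert len(obs) == OBS_SIZE
    return obs
```

With a scan of 450 readings, this sampling keeps every tenth reading, so the LiDAR part of the observation starts at index 12 of the vector.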

Action Space Details

Action    Range         Description
thrust    [0, 1]        Forward propulsion control
rudder    [0.4, 0.6]    Steering control (0.5 = straight)
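
Because the rudder range is centered on 0.5 rather than 0, it can help to think of steering as a symmetric command that is shifted into the valid range. The helpers below are a hedged sketch with hypothetical names; they are not part of the ASVSim client, and which sign corresponds to port or starboard is not specified in this document.

```python
THRUST_RANGE = (0.0, 1.0)
RUDDER_RANGE = (0.4, 0.6)
RUDDER_STRAIGHT = 0.5


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clip_action(thrust, rudder):
    """Force a (thrust, rudder) pair into the documented action ranges."""
    return clamp(thrust, *THRUST_RANGE), clamp(rudder, *RUDDER_RANGE)


def steer_to_rudder(steer):
    """Map a steering command in [-1, 1] to a rudder value.

    -1 maps to 0.4, 0 maps to 0.5 (straight), +1 maps to 0.6.
    """
    steer = clamp(steer, -1.0, 1.0)
    half_span = (RUDDER_RANGE[1] - RUDDER_RANGE[0]) / 2
    return RUDDER_STRAIGHT + steer * half_span
```

For example, `clip_action(1.5, 0.2)` returns `(1.0, 0.4)`, and `steer_to_rudder(0)` returns `0.5`.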

SAC Training (sac_example.py)

Overview

The example uses Soft Actor-Critic (SAC), an off-policy actor-critic algorithm for continuous control, through its Stable-Baselines3 implementation. SAC suits this task because it reuses past experience from a replay buffer, which makes it sample-efficient, and because it trains stably in continuous action spaces.

Basic Usage

import gymnasium as gym
from stable_baselines3 import SAC
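# Imported so that the environment module is loaded before gym.make() is called
# (assumption: loading the module registers "ship-sim-v0").
from Vessel.envs.Shipsim_gym import ShippingSim  # noqa: F401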

# Create environment
env = gym.make("ship-sim-v0")

# Initialize SAC agent
model = SAC(
    "MlpPolicy", 
    env, 
    verbose=1,
    tensorboard_log="./sac_ship_sim_tb/",
    batch_size=32,
    buffer_size=4000,
    learning_starts=500,
    train_freq=1,
    tau=0.010,
    target_entropy=-2,
    stats_window_size=10
)

# Train the agent
model.learn(total_timesteps=25000, log_interval=1)

# Save the trained model
model.save("sac_ship_sim_v0")
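
After training, the saved model can be evaluated by rolling out episodes. The helper below is a minimal sketch that works with any environment following the Gymnasium reset/step API and any callable policy; the name `run_episode` and its return values are hypothetical and not part of sac_example.py.

```python
def run_episode(env, policy, max_steps=200):
    """Roll out one episode and return (total_reward, steps, terminated).

    env:    any object following the Gymnasium reset/step API
    policy: callable mapping an observation to an action
    """
    obs, _info = env.reset()
    total_reward = 0.0
    terminated = False
    steps = 0
    for steps in range(1, max_steps + 1):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward, steps, terminated
```

With Stable-Baselines3, a policy for this helper could be built from a loaded model, for example `model = SAC.load("sac_ship_sim_v0")` followed by `policy = lambda obs: model.predict(obs, deterministic=True)[0]`.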

Environment Configuration

AirSim Settings

Ensure that your settings.json includes a vessel configuration with a LiDAR sensor, such as the following:

{
  "SimMode": "Vessel",
  "Vehicles": {
    "Drone1": {
      "VehicleType": "MilliAmpere",
      "HydroDynamics": {
        "hydrodynamics_engine": "FossenCurrent"
      },
      "PawnPath": "DefaultVessel",
      "AutoCreate": true,
      "RC": {
        "RemoteControlID": 0
      },
      "Sensors": {
        "lidar1": {
          "SensorType": 6,
          "Enabled": true,
          "NumberOfChannels": 1,
          "RotationsPerSecond": 1,
          "MeasurementsPerCycle": 450,
          "range": 100000,
          "X": 0,
          "Y": 0,
          "Z": -3.2,
          "Roll": 0,
          "Pitch": 0,
          "Yaw": 0,
          "VerticalFOVUpper": -2,
          "VerticalFOVLower": -3,
          "GenerateNoise": false,
          "DrawDebugPoints": false,
          "HorizontalFOVStart": -180,
          "HorizontalFOVEnd": 180
        }
      }
    }
  }
}
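
Before starting a long training run, it can be useful to check that the settings file actually contains the keys the environment relies on. The following is a hedged sketch using only the standard library; the function names are hypothetical, and the checks cover only what this page shows (the "Vessel" SimMode and an enabled sensor with SensorType 6).

```python
import json


def check_vessel_settings(settings):
    """Return a list of problems found in a parsed settings.json dict."""
    problems = []
    if settings.get("SimMode") != "Vessel":
        problems.append('SimMode should be "Vessel"')
    vehicles = settings.get("Vehicles", {})
    if not vehicles:
        problems.append("no vehicles defined")
    for name, vehicle in vehicles.items():
        # SensorType 6 is the LiDAR sensor used in the example configuration.
        lidars = [s for s in vehicle.get("Sensors", {}).values()
                  if s.get("SensorType") == 6 and s.get("Enabled", False)]
        if not lidars:
            problems.append(f"{name}: no enabled LiDAR (SensorType 6)")
    return problems


def load_and_check(path):
    """Parse the settings file at path and run the checks on it."""
    with open(path, encoding="utf-8") as fh:
        return check_vessel_settings(json.load(fh))
```

An empty list means none of the checked problems were found. Because `json.load` raises an error on malformed input, the same call also catches unbalanced braces in the file.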

Custom Goal Positions

To change the goal position, edit goal_x and goal_y in the environment's constructor:

# Excerpt from Shipsim_gym.py
import gymnasium as gym


class ShippingSim(gym.Env):
    def __init__(self, options=None):
        # ... existing initialization ...
        self.goal_x = -60  # Modify target X coordinate
        self.goal_y = -10  # Modify target Y coordinate
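
Instead of editing the constructor for every experiment, the goal could be read from the options argument. The sketch below shows one way to do this; the "goal_x" and "goal_y" keys and the `resolve_goal` helper are hypothetical and are not read by ShippingSim as shipped.

```python
DEFAULT_GOAL = (-60.0, -10.0)  # the values shown in the snippet above


def resolve_goal(options=None, default=DEFAULT_GOAL):
    """Return (goal_x, goal_y), preferring values supplied in options.

    The "goal_x" and "goal_y" keys are hypothetical; they illustrate the
    idea and are not part of the existing environment.
    """
    options = options or {}
    return (float(options.get("goal_x", default[0])),
            float(options.get("goal_y", default[1])))
```

Inside the constructor, this could be used as `self.goal_x, self.goal_y = resolve_goal(options)`, which keeps the current defaults when no options are passed.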

For additional examples and advanced usage, see the Vessel API documentation and AirSim API reference.