Tau LLM Unity ML Agents Project

Welcome to the Tau LLM Unity ML Agents Project repository! This project focuses on training reinforcement learning agents using Unity ML-Agents and the PPO algorithm. Our goal is to optimize the performance of the agents through various configurations and training runs.

Project Overview

This repository contains the code and configurations for training agents in a Unity environment with the Proximal Policy Optimization (PPO) algorithm. The agents learn from interaction with their environment, and their policy improves over the course of training.
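For readers new to PPO, the core idea is a clipped surrogate objective that limits how far a single update can move the policy. The snippet below is a minimal per-sample sketch of that standard objective, not the ML-Agents implementation (ML-Agents handles this internally). The function name `clipped_surrogate` is hypothetical, and the default `epsilon=0.2` is taken from the configuration in this README.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- advantage estimate for that action
    epsilon   -- clip range (0.2 in this README's configuration)
    """
    # Restrict the probability ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, the objective stops rewarding ratios above 1 + epsilon; with a negative advantage, it stops rewarding ratios below 1 - epsilon.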

Key Features

  • Reinforcement Learning: Utilizes the PPO algorithm for training agents.
  • Unity ML-Agents: Integrates with Unity ML-Agents for a seamless training experience.
  • Custom Reward Functions: Implements gradient-based reward functions for nuanced feedback.
  • Memory Networks: Incorporates memory networks to handle temporal dependencies.
  • TensorBoard Integration: Monitors training progress and performance using TensorBoard.

Configuration

Below is the configuration used for training the agents:

behaviors:
  TauAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 4096
      learning_rate: 0.00003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 10
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 256
        num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.995
        strength: 0.1
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 4
          learning_rate: 0.00003
    keep_checkpoints: 10
    checkpoint_interval: 100000
    threaded: true
    max_steps: 3000000
    time_horizon: 256
    summary_freq: 10000
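
A few quantities follow directly from the configuration above. The sketch below works them out in plain Python. It assumes each policy update splits the buffer into `buffer_size / batch_size` minibatches per epoch, and it models the `linear` schedule as a straight-line decay to zero at `max_steps`. The actual ML-Agents trainer may differ in detail (for example, it may decay to a small floor value rather than exactly zero). The helper `linear_lr` is hypothetical.

```python
# Values copied from the configuration above.
batch_size = 256
buffer_size = 4096
num_epoch = 10
learning_rate = 0.00003
max_steps = 3_000_000
checkpoint_interval = 100_000
keep_checkpoints = 10

# Assumption: the buffer is split into equal minibatches on every epoch.
minibatches_per_epoch = buffer_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * num_epoch

# One checkpoint every checkpoint_interval steps; only the most recent
# keep_checkpoints of them are retained on disk.
checkpoints_written = max_steps // checkpoint_interval


def linear_lr(step: int, initial: float = learning_rate, total: int = max_steps) -> float:
    """Simplified linear decay from `initial` at step 0 to 0 at `total` steps."""
    progress = min(step, total) / total
    return initial * (1.0 - progress)


print(minibatches_per_epoch, gradient_steps_per_update, checkpoints_written)
print(linear_lr(0), linear_lr(max_steps // 2), linear_lr(max_steps))
```

Under these assumptions, each update performs 16 minibatch passes per epoch (160 gradient steps in total), and a full run writes 30 checkpoints, of which the last 10 are kept.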

Model Naming Convention

The models in this repository follow the naming convention Tau_<series>_<max_steps>, which makes it easy to identify the series and the number of training steps for each model.
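
As an illustration of the convention, the following sketch builds and parses such names. Both helpers (`model_name`, `parse_model_name`) are hypothetical and not part of the repository. The sketch assumes the series label contains no underscores, and the example values `A0` and `3M` are made up for demonstration.

```python
def model_name(series: str, max_steps: str) -> str:
    """Build a name of the form Tau_<series>_<max_steps>."""
    return f"Tau_{series}_{max_steps}"


def parse_model_name(name: str) -> tuple[str, str]:
    """Split a model name into (series, max_steps).

    Assumes the series label itself contains no underscores.
    """
    parts = name.split("_", 2)
    if len(parts) != 3 or parts[0] != "Tau":
        raise ValueError(f"not a Tau model name: {name!r}")
    return parts[1], parts[2]


# Hypothetical example values, for illustration only.
print(model_name("A0", "3M"))
print(parse_model_name("Tau_A0_3M"))
```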

Getting Started

Prerequisites

  • Unity 6
  • Unity ML-Agents Toolkit
  • Python 3.10.11
  • PyTorch
  • Transformers

Installation

  1. Clone the repository:

    git clone https://github.com/p3nGu1nZz/Tau.git
    cd Tau\MLAgentsProject
    
  2. Install the required Python packages:

    pip install -r requirements.txt
    
  3. Open the Unity project:

    • Launch Unity Hub and open the project folder.

Training the Agent

To start training the agent, run the following command:

mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --force

Note: The preferred workflow is to create a new Unity build in the Build directory, which the --env argument in the command above points to.
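
If you launch training from a script rather than by hand, the command above can be assembled as an argument list. This is a minimal sketch: the helper `build_train_command` is hypothetical, and its defaults simply mirror the flags shown above. It only builds the list and does not start training.

```python
def build_train_command(
    config: str,
    run_id: str,
    env: str = r".\Build",
    device: str = "cuda",
    timeout_wait: int = 300,
    force: bool = True,
) -> list[str]:
    """Assemble the mlagents-learn invocation shown above as an argument list."""
    cmd = [
        "mlagents-learn",
        config,
        f"--run-id={run_id}",
        "--env", env,
        "--torch-device", device,
        "--timeout-wait", str(timeout_wait),
    ]
    if force:
        # --force overwrites existing results stored under the same run id.
        cmd.append("--force")
    return cmd


cmd = build_train_command(r".\config\tau_agent_ppo_c.yaml", "tau_agent_ppo_A0")
print(" ".join(cmd))
# To launch training: import subprocess; subprocess.run(cmd, check=True)
```

Passing `force=False` leaves out `--force`, which is the safer choice when you do not want to overwrite the results of an earlier run with the same run id.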

Monitoring Training

You can monitor the training progress using TensorBoard:

tensorboard --logdir results

Results

The training results, including the average reward and cumulative reward, can be visualized using TensorBoard. The graphs below show the performance of the agent over time:

[Figures: Average Reward, Cumulative Reward]

Citation

If you use this project in your research, please cite it as follows:

@misc{Tau,
  author = {K. Rawson},
  title = {Tau LLM Unity ML Agents Project},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p3nGu1nZz/Tau}},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Unity ML-Agents Toolkit
  • TensorFlow and PyTorch communities
  • Hugging Face for hosting the model repository