FutureMotion

Common repo for our ongoing research on motion forecasting in self-driving vehicles.

Setup

Clone this repo, then initialize the external submodules with:

git submodule update --init --recursive

Create a conda environment named "future-motion" with:

conda env create -f conda_env.yml

Prepare Waymo Open Motion and Argoverse 2 Forecasting datasets by following the instructions in src/external_submodules/hptr/README.md.

Our methods

RedMotion: Motion Prediction via Redundancy Reduction

Our RedMotion model consists of two encoders. The trajectory encoder generates an embedding for the past trajectory of the current agent. The road environment encoder generates sets of local and global road environment embeddings as context. We use two redundancy reduction mechanisms, (a) architecture-induced and (b) self-supervised, to learn rich representations of road environments. All embeddings are fused via cross-attention to yield trajectory proposals per agent.
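As a rough illustration of what a self-supervised redundancy reduction objective can look like, the sketch below computes a cross-correlation loss in the style of Barlow Twins on two batches of embeddings (e.g., from two augmented views of the same road environments). That the loss takes this form, as well as the function names and the `off_diag_weight` value, are assumptions made for illustration and are not taken from this repo's code.

```python
import math


def standardize(batch):
    """Return a copy of `batch` with zero mean and unit variance per dimension."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        std = math.sqrt(var) + 1e-8  # small constant avoids division by zero
        for i in range(n):
            out[i][d] = (batch[i][d] - mean) / std
    return out


def redundancy_reduction_loss(z_a, z_b, off_diag_weight=0.005):
    """Cross-correlation loss between two batches of embeddings.

    Diagonal entries of the cross-correlation matrix are pushed towards 1
    (the two views agree), off-diagonal entries towards 0 (dimensions are
    decorrelated, i.e., less redundant). `off_diag_weight` is a hypothetical
    default chosen for this sketch.
    """
    n, dims = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(dims):
        for j in range(dims):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2
            else:
                loss += off_diag_weight * c_ij ** 2
    return loss


# Toy batches: the first has decorrelated dimensions, the second has two
# identical (fully redundant) dimensions.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
loss_decorrelated = redundancy_reduction_loss(decorrelated, decorrelated)
loss_redundant = redundancy_reduction_loss(redundant, redundant)
```

In this toy setting the batch with duplicated dimensions receives the higher loss, which is the behavior a redundancy reduction objective is meant to encourage.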

More details

This repo contains the refactored implementation of RedMotion; the original implementation is available here.

The Waymo Motion Prediction Challenge doesn't allow sharing the weights used in the challenge. However, we provide a Colab notebook for a model with a shorter prediction horizon (5s vs. 8s) as a demo.

Training

To train a RedMotion model (tra-dec config) from scratch, adapt the global variables in train.sh according to your setup (Weights & Biases, local paths, batch size and visible GPUs). The default batch size is set for A6000 GPUs with 48GB VRAM. Then start the training run with:

bash train.sh ac_red_motion

For reference, this wandb plot shows the validation mAP scores for epochs 23 to 129 (default config, trained on 4 A6000 GPUs for ~100h).

Reference

@article{
    wagner2024redmotion,
    title={RedMotion: Motion Prediction via Redundancy Reduction},
    author={Royden Wagner and Omer Sahin Tas and Marvin Klemp and Carlos Fernandez and Christoph Stiller},
    journal={Transactions on Machine Learning Research},
    year={2024},
}

Words in Motion: Representation Engineering for Motion Forecasting

(a) We use natural language to quantize motion features in an interpretable way. (b) The corresponding direction, speed, and acceleration classes are highlighted in blue. (c) To reverse engineer motion forecasting models, we measure the degree to which these features are embedded in their hidden states H with linear probes. Furthermore, we use our discrete motion features to fit control vectors V that allow for controlling motion forecasts during inference.

More details

Gradio demos

Use this Colab notebook to start Gradio demos for our speed control vectors.

In contrast to the qualitative results in our paper, we show the motion forecasts for the focal agent and 8 other agents in a scene. Press the submit button with the default temperature = 0 to visualize the default (non-controlled) forecasts, then change the temperature and resubmit to visualize the changes. The example is from the Waymo Open dataset and shows motion forecasts for vehicles and a pedestrian (top center).

For very low control temperatures (e.g., -100), almost all agents become static. For very high control temperatures (e.g., 85), even the static agents (shown in grey) begin to move, and the pedestrian no longer moves any faster. We hypothesize that the model has learned a reasonable upper bound for the speed of a pedestrian.
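The role of the control temperature can be sketched in a few lines. The example below assumes that a control vector is fitted as the difference between the mean hidden states of two opposing motion classes and that it is added to a hidden state scaled by the temperature. Both the fitting rule and the function names `fit_control_vector` and `apply_control` are assumptions for illustration; the demo notebook may implement this differently.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def fit_control_vector(hidden_states, labels, low_class, high_class):
    """Fit a control vector as the difference of class means (assumed rule)."""
    low = [h for h, y in zip(hidden_states, labels) if y == low_class]
    high = [h for h, y in zip(hidden_states, labels) if y == high_class]
    mu_low, mu_high = mean_vector(low), mean_vector(high)
    return [hi - lo for hi, lo in zip(mu_high, mu_low)]


def apply_control(hidden_state, control_vector, temperature):
    """Shift a hidden state along the control vector, scaled by temperature."""
    return [h + temperature * v for h, v in zip(hidden_state, control_vector)]


# Toy hidden states labeled with hypothetical speed classes.
hidden_states = [[0.0, 1.0], [0.0, 3.0], [2.0, 1.0], [2.0, 3.0]]
labels = ["low", "low", "high", "high"]
speed_vector = fit_control_vector(hidden_states, labels, "low", "high")

unchanged = apply_control([1.0, 1.0], speed_vector, temperature=0)
slowed = apply_control([1.0, 1.0], speed_vector, temperature=-0.5)
```

With a temperature of 0 the hidden state is returned unchanged, which matches the non-controlled default described above; negative temperatures move the state towards the low-speed class and positive ones towards the high-speed class.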

Training

Soon to be released.

SceneMotion: From Agent-centric Embeddings to Scene-wide Forecasts

Our attention-based motion forecasting model is composed of stacked encoder and decoder modules. Variable-sized agent-centric views $V_i$ are reduced to fixed-sized agent-centric embeddings $E_i$ via cross-attention with road environment descriptor (RED) tokens $R_j$. Afterwards, we concatenate the agent-centric embeddings with global reference tokens $G_i$ and rearrange them to form a scene-wide embedding. Our latent context module then learns global context and our motion decoder transforms learned anchors $A_k$ into scene-wide forecasts.
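The reduction from variable-sized views to fixed-sized embeddings follows from how cross-attention works: the output has one row per query token, regardless of how many input tokens are attended to. The sketch below shows this with a single-head dot-product cross-attention without learned projections. The names `red_tokens`, `small_view`, and `large_view` and all values are hypothetical, and the actual model uses learned, multi-layer attention modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, inputs):
    """Single-head scaled dot-product cross-attention (no learned weights).

    Each query attends to all input tokens; the result has len(queries) rows.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ti for qi, ti in zip(q, token)) / math.sqrt(dim)
            for token in inputs
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * token[d] for w, token in zip(weights, inputs)) for d in range(dim)]
        )
    return outputs


# Two query tokens standing in for RED tokens (hypothetical values).
red_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Two agent-centric views with different numbers of tokens.
small_view = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
large_view = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3], [0.8, 0.8], [0.0, 0.4]]

small_embedding = cross_attention(red_tokens, small_view)
large_embedding = cross_attention(red_tokens, large_view)
```

Both embeddings have the same shape (two tokens of dimension two) even though the views contain three and five tokens, which is what allows the embeddings to be rearranged into a scene-wide embedding of known size.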

More details

Soon to be released.

Other methods

Wayformer: Motion Forecasting via Simple & Efficient Attention Networks

Wayformer models take multimodal scene data as input, project it into a homogeneous (i.e., same dim) token format, and transform learned seeds (i.e., trajectory anchors) into multimodal distributions of trajectories.

We provide an open-source implementation of the Wayformer model with an early fusion scene encoder and multi-axis latent query attention. Our implementation is a refactored version of the AgentCentricGlobal model from the HPTR repo with improved performance (higher mAP scores).

The hyperparameters defined in configs/model/ac_wayformer.yaml follow the ones in Table 4 (see Appendix D in the Wayformer paper) except the number of decoders is 1 instead of 3.

More details

We use the polyline representation of MPA (Konev, 2022) as input and the non-maximum suppression (NMS) algorithm of MTR (Shi et al., 2023) to generate 6 trajectories from the 64 predicted trajectories.
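The selection step can be illustrated with a minimal greedy NMS over trajectory endpoints. This is a sketch of the general technique, not the MTR implementation: the function name `endpoint_nms`, the use of endpoint distance as the suppression criterion, and the 2.5 m threshold are assumptions, and this version may return fewer than `num_keep` indices if too many candidates are suppressed.

```python
import math


def endpoint_nms(endpoints, scores, num_keep=6, dist_threshold=2.5):
    """Greedy non-maximum suppression on trajectory endpoints.

    endpoints: list of (x, y) final positions, one per predicted trajectory.
    scores: list of confidence scores, same length as endpoints.
    Returns the indices of the kept trajectories, highest score first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for idx in order:
        if len(kept) == num_keep:
            break
        x, y = endpoints[idx]
        # Keep a candidate only if it is far enough from every kept endpoint.
        if all(
            math.hypot(x - endpoints[k][0], y - endpoints[k][1]) > dist_threshold
            for k in kept
        ):
            kept.append(idx)
    return kept


# Five toy candidates: two near x=0, two near x=10, one at x=20.
endpoints = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.2, 0.0), (20.0, 0.0)]
scores = [0.9, 0.8, 0.7, 0.95, 0.1]

selected = endpoint_nms(endpoints, scores)
top_two = endpoint_nms(endpoints, scores, num_keep=2)
```

In the toy example, only the highest-scoring candidate of each cluster survives, so the kept set covers distinct modes instead of near-duplicate trajectories.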

Adapt the paths and accounts in sbatch/train_wayformer_juwels.sh to your setup to train a Wayformer model on a Juwels-like cluster with a Slurm system and at least 2 nodes with 4 A100 GPUs each. The training is configured for the Waymo Open Motion dataset and takes roughly 24h.

Reference

@inproceedings{nayakanti2023wayformer,
  title={Wayformer: Motion forecasting via simple \& efficient attention networks},
  author={Nayakanti, Nigamaa and Al-Rfou, Rami and Zhou, Aurick and Goel, Kratarth and Refaat, Khaled S and Sapp, Benjamin},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2023},
}

Acknowledgements

This repo builds upon the great work HPTR by @zhejz.
