Overview
A PyTorch implementation of “Genie: Generative Interactive Environments” (Bruce et al., 2024). It formalizes a discrete codebook of latent actions for interpretable agent control.
Architecture
The LatentAction model encodes control signals into a small, discrete codebook. This yields interpretable actions such as MOVE_RIGHT, JUMP, and INTERACT rather than continuous control vectors.
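To make the codebook idea concrete, here is a minimal sketch of nearest-neighbor quantization, the usual way a continuous latent is snapped to one of a few discrete codes. It is written in dependency-free Python for readability, not as the repository's tensor code. The codebook values, the 2-D latent size, and the helper names (`quantize`, `squared_distance`) are assumptions made up for illustration. The action names come from the text above and are attached by hand here; a learned codebook only provides indices.

```python
# Toy vector quantization: map a continuous latent to its nearest codebook entry.
# All numeric values below are hypothetical, chosen only to illustrate the lookup.

ACTION_NAMES = ["MOVE_RIGHT", "JUMP", "INTERACT"]  # illustrative labels, one per code

# Hypothetical codebook: one 2-D vector per discrete latent action.
CODEBOOK = [
    (1.0, 0.0),    # index 0
    (0.0, 1.0),    # index 1
    (-1.0, -1.0),  # index 2
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent, codebook=CODEBOOK):
    """Return (index, code vector) of the codebook entry closest to `latent`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(latent, codebook[i]),
    )
    return index, codebook[index]


# A continuous latent near the first code is assigned discrete action 0.
index, code = quantize((0.9, 0.2))
print(index, ACTION_NAMES[index], code)
```

Because downstream components only see the index, the number of codebook entries bounds the number of distinct actions, which is what keeps the action space small enough to inspect.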
Implementation
Built on existing open-source work:
- MagViT implementation (lucidrains)
- MaskGIT implementation (valeoai)
- Forked from Open-Genie
Requirements
- PyTorch 2.3.0
- CUDA 12.1
- Conda/pip installation
- Separate requirements_osx.txt for macOS
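A quick way to check an environment against the PyTorch version listed above is sketched below. It uses only the standard library, so it runs even when PyTorch is missing. Treating 2.3.0 as an exact pin and ignoring a local build suffix such as `+cu121` are assumptions. The helper names (`matches_required`, `installed_torch_version`) are hypothetical and not part of the repository.

```python
from importlib import metadata

REQUIRED_TORCH = "2.3.0"  # version taken from the requirements list above


def matches_required(installed, required=REQUIRED_TORCH):
    """True if `installed` equals `required`, ignoring a local suffix like '+cu121'."""
    return installed.split("+")[0] == required


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
if version is None:
    print("torch is not installed")
elif matches_required(version):
    print(f"torch {version} matches the required {REQUIRED_TORCH}")
else:
    print(f"torch {version} found, but {REQUIRED_TORCH} is listed as required")
```

The check reads package metadata rather than importing torch, so it stays fast and does not initialize CUDA.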
Key Features
- Discrete latent action space
- Interpretable control primitives
- Video generation conditioned on actions
- Action discovery from unlabeled video
Applications
Enables learning interactive environment models from video data without explicit action labels. Applicable to robotics, game AI, and embodied agent training.