TermiGen-32B
TermiGen-32B achieves 31.3% pass@1 on TerminalBench 1.0, establishing a new open-weight state-of-the-art and surpassing proprietary models like o4-mini with Codex CLI (20.0%).
📄 Paper: TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
💻 Environments: https://github.com/ucsb-mlsec/terminal-bench-env
🧪 Benchmark: https://github.com/laude-institute/terminal-bench
Model Description
This model is fine-tuned from Qwen2.5-Coder-32B-Instruct using the TermiGen pipeline, which synthesizes high-fidelity training data through two phases:
Phase I: Environment Synthesis
- Multi-agent system generates 3,500+ verified Docker environments
- Tasks span 11 categories: system administration, security forensics, scientific computing, MLOps, etc.
- 420 unique command-line tools across 16 functional domains
- Automated unit test validation ensures task solvability
Phase II: Error-Correction Trajectory Collection
- Generator-Critic framework with 20% error injection rate
- Teaches error → diagnosis → recovery cycles
- 3,291 trajectories (avg. 25.5 turns, 8,722 tokens each)
- Teacher model: Claude-4.5-Sonnet
Training Details
Training Hyperparameters:
- Base Model: Qwen2.5-Coder-32B-Instruct
- Learning Rate: 5e-6 (cosine schedule, 10% warmup)
- Batch Size: 32 (8 GPUs × 4 gradient accumulation)
- Sequence Length: 20,000 tokens
- Epochs: 5
- Precision: BF16 with DeepSpeed ZeRO-3
- Hardware: 8× AMD MI325X GPUs
Dataset Statistics:
- 3,500+ verified environments across 11 task categories
- 3,291 training trajectories
- Tool diversity: 420 unique CLI tools
- Average trajectory: 25.5 turns, 8,722 tokens
Evaluation Results
TerminalBench Performance
| Benchmark | Pass@1 |
|---|---|
| TerminalBench 1.0 | 31.3% |
| TerminalBench 2.0 | 18.0% |
Usage
We implemented a minimal BashAgent framework based on TerminalBench for agentic terminal execution. The agent interacts with Docker containers via bash shell, generating ReAct-style responses at each turn.
For detailed usage and integration examples, please refer to our GitHub repository.
Citation
@article{zhu2026termigen,
title={TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents},
author={Zhu, Kaijie and Nie, Yuzhou and Li, Yijiang and Huang, Yiming and Wu, Jialian and Liu, Jiang and Sun, Ximeng and Yin, Zhenfei and Wang, Lun and Liu, Zicheng and Barsoum, Emad and Wang, William Yang and Guo, Wenbo},
journal={arXiv preprint arXiv:2602.07274},
url={https://arxiv.org/abs/2602.07274},
year={2026}
}
License
Apache 2.0 (inherited from Qwen2.5-Coder base model)
Contact: Kaijie Zhu ([email protected])
- Downloads last month
- 55
Model tree for UCSB-SURFI/TermiGen-32B
Base model
Qwen/Qwen2.5-32B