LingBot-World NF4 Quantized
Pre-quantized NF4 weights for the LingBot-World video generation model. This is a complete, self-contained package - no additional model downloads are required.
Features
- 4-bit NF4 quantization via bitsandbytes - fits in 32GB VRAM
- Pre-quantized weights - no runtime quantization overhead
- Complete package - includes T5 encoder, VAE, and diffusion models
Quick Start
# Clone the repo
git clone https://huggingface.co/cahlen/lingbot-world-base-cam-nf4
cd lingbot-world-base-cam-nf4
# Install dependencies
pip install -r requirements.txt
# Generate a video
python generate_prequant.py \
--image your_image.jpg \
--prompt "A cinematic video of the scene" \
--frame_num 81 \
--output output.mp4
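If you prefer to launch generation from Python (for example, to loop over several images), the command above can be assembled as an argument list. This is a minimal sketch, not part of the repository: `build_command` and `run_generation` are hypothetical helper names, and only the flags shown in the Quick Start command are used.

```python
import shlex
import subprocess
import sys


def build_command(image, prompt, frame_num=81, output="output.mp4"):
    """Build the argv list for generate_prequant.py from the Quick Start flags."""
    return [
        sys.executable, "generate_prequant.py",
        "--image", str(image),
        "--prompt", prompt,
        "--frame_num", str(frame_num),
        "--output", str(output),
    ]


def run_generation(cmd):
    """Run the command from the repository root; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


# Print the command in shell-quoted form without executing it.
print(shlex.join(build_command("your_image.jpg", "A cinematic video of the scene")))
```

Passing the arguments as a list avoids shell-quoting problems when a prompt contains spaces or quotation marks.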
Model Contents
| File | Size | Description |
|---|---|---|
| high_noise_model_bnb_nf4/model.safetensors | ~9.6GB | NF4 quantized diffusion model (high noise) |
| low_noise_model_bnb_nf4/model.safetensors | ~9.6GB | NF4 quantized diffusion model (low noise) |
| models_t5_umt5-xxl-enc-bf16.pth | ~10.6GB | T5-XXL text encoder |
| Wan2.1_VAE.pth | ~485MB | VAE encoder/decoder |
Total size: ~30GB (vs. ~85GB for the full-precision models)
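After cloning, it can be worth confirming that all four weight files are present and roughly the expected size, since a clone made without Git LFS may contain small pointer files instead of the weights. The following is a minimal sketch, not part of the repository: `summarize_package` is a hypothetical helper, and the file list is copied from the table above.

```python
from pathlib import Path

# Relative paths copied from the Model Contents table.
EXPECTED_FILES = [
    "high_noise_model_bnb_nf4/model.safetensors",
    "low_noise_model_bnb_nf4/model.safetensors",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
]


def summarize_package(root):
    """Return (sizes in bytes keyed by path, list of missing paths) under root."""
    root = Path(root)
    sizes, missing = {}, []
    for rel in EXPECTED_FILES:
        path = root / rel
        if path.is_file():
            sizes[rel] = path.stat().st_size
        else:
            missing.append(rel)
    return sizes, missing


if __name__ == "__main__":
    sizes, missing = summarize_package(".")
    for rel, size in sizes.items():
        print(f"{rel}: {size / 1e9:.2f} GB")
    print(f"total: {sum(sizes.values()) / 1e9:.2f} GB")
    if missing:
        print("missing:", ", ".join(missing))
```

Run it from the repository root; a file that reports only a few hundred bytes is most likely an LFS pointer rather than the actual weights.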
Usage
Basic Generation
python generate_prequant.py \
--image input.jpg \
--prompt "Your prompt here" \
--frame_num 81 \
--size "480*832" \
--output output.mp4
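The `--frame_num` and `--size` values translate directly into clip length and output dimensions. The sketch below shows the arithmetic, assuming the 16 fps frame rate and the height*width ordering given in the Parameters table; `parse_size` and `duration_seconds` are illustrative helpers, not functions from the repository.

```python
def parse_size(size):
    """Split a 'height*width' string such as '480*832' into two integers."""
    height, width = size.split("*")
    return int(height), int(width)


def duration_seconds(frame_num, fps=16):
    """Clip length in seconds for a given frame count at the given frame rate."""
    return frame_num / fps


height, width = parse_size("480*832")
print(f"{height} px high, {width} px wide")
print(f"{duration_seconds(81):.2f} s for 81 frames")
```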
Parameters
| Parameter | Default | Description |
|---|---|---|
| --image | required | Input image path |
| --prompt | required | Text prompt describing the video |
| --frame_num | 81 | Number of frames (81 = ~5 seconds at 16fps) |
| --size | "480*832" | Output resolution (height*width) |
| --sampling_steps | 40 | Diffusion sampling steps |
| --guide_scale | 5.0 | Classifier-free guidance scale |
| --seed | -1 | Random seed (-1 for random) |
| --output | "output.mp4" | Output video path |
With Camera Control
python generate_prequant.py \
--image input.jpg \
--prompt "Your prompt" \
--action_path /path/to/camera_poses/ \
--frame_num 81
The camera pose directory should contain:
- poses.npy: shape [num_frames, 4, 4], camera transformation matrices
- intrinsics.npy: shape [num_frames, 4], with each row holding [fx, fy, cx, cy]
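As an illustration of the expected layout, the following sketch writes a pose directory for a simple straight-line camera move. It only demonstrates the array shapes listed above. The pose convention (for example camera-to-world versus world-to-camera), the use of pixel units for the intrinsics, and the need for `num_frames` to match `--frame_num` are all assumptions that this README does not specify. `write_camera_dir` is a hypothetical helper.

```python
from pathlib import Path

import numpy as np


def write_camera_dir(out_dir, num_frames=81, height=480, width=832, step=0.05):
    """Write poses.npy and intrinsics.npy with the documented shapes."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One 4x4 identity matrix per frame, then translate along z a little more each frame.
    poses = np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1))
    poses[:, 2, 3] = step * np.arange(num_frames, dtype=np.float32)

    # One [fx, fy, cx, cy] row per frame. Assumed: pixel units, principal point
    # at the image center, focal length equal to the image width.
    row = np.array([width, width, width / 2, height / 2], dtype=np.float32)
    intrinsics = np.tile(row, (num_frames, 1))

    np.save(out_dir / "poses.npy", poses)
    np.save(out_dir / "intrinsics.npy", intrinsics)
    return poses, intrinsics


if __name__ == "__main__":
    poses, intrinsics = write_camera_dir("camera_poses")
    print(poses.shape, intrinsics.shape)
```

The resulting directory can then be passed to `--action_path`, after checking the conventions against the upstream LingBot-World documentation.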
Requirements
- Python 3.10+
- CUDA 11.8+ (tested with CUDA 12.x)
- ~32GB VRAM (RTX 5090, A100, etc.; note that the RTX 4090 has 24GB, which is below this figure)
Quantization Details
The diffusion models are quantized using bitsandbytes NF4 with double quantization:
{
"format": "bnb_nf4",
"double_quant": true,
"compute_dtype": "bfloat16",
"blocksize": 64
}
This achieves ~3.9x compression of the diffusion model weights while maintaining generation quality.
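One way to see where a figure close to 3.9x can come from is to count the bits stored per weight. The sketch below is a rough accounting, not a measurement. It assumes the original weights are 16-bit and that every weight is quantized to 4 bits. It also assumes that double quantization stores an 8-bit scale per block of 64 weights plus a 32-bit second-level constant per 256 blocks, following the scheme described for NF4 in QLoRA. The value 256 is an assumption and is not stated in this README. The function names are illustrative.

```python
def nf4_bits_per_param(blocksize=64, double_quant=True, dq_blocksize=256):
    """Approximate storage cost per weight, in bits, for 4-bit block quantization."""
    if double_quant:
        # 8-bit scale per block, plus a 32-bit constant shared by dq_blocksize blocks.
        overhead = 8 / blocksize + 32 / (blocksize * dq_blocksize)
    else:
        # A single 32-bit scale per block.
        overhead = 32 / blocksize
    return 4 + overhead


def compression_ratio(source_bits=16, **kwargs):
    """Ratio of the source precision to the quantized storage cost per weight."""
    return source_bits / nf4_bits_per_param(**kwargs)


print(f"{nf4_bits_per_param():.3f} bits per weight")
print(f"{compression_ratio():.2f}x versus 16-bit weights")
```

In practice some layers may be left unquantized, so the ratio measured on disk can differ slightly from this estimate.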
License
This model is based on LingBot-World and follows its license terms.
Citation
@misc{lingbot-world-nf4,
title={LingBot-World NF4 Quantized},
year={2025},
url={https://huggingface.co/cahlen/lingbot-world-base-cam-nf4}
}