VIBE: Visual Instruction Based Editor

VIBE

🌐 Project Page | 📜 Paper on arXiv | GitHub | 🤗 Space | 🤗 VIBE-Image-Edit-DistilledCFG

VIBE is an open-source framework for text-guided image editing. It pairs the efficient Sana1.5-1.6B diffusion model with the visual understanding of Qwen3-VL-2B-Instruct to provide fast, high-quality, instruction-based image manipulation.

We also provide a faster, CFG-distilled version of this model available at VIBE-Image-Edit-DistilledCFG.

Model Details

  • Name: VIBE
  • Task: Text-Guided Image Editing
  • Architecture:
    • Diffusion Backbone: Sana1.5 (1.6B parameters) with Linear Attention.
    • Condition Encoder: Qwen3-VL (2B parameters) for multimodal understanding.
  • Framework: Built on diffusers and transformers.
  • Model precision: torch.bfloat16 (BF16)
  • Model resolution: Designed to edit images up to 2048px, with multi-scale height and width.
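
The 2048px ceiling above suggests pre-sizing large inputs before editing. Below is a minimal sketch of that step. The helper name `target_size` and the snap-to-32 rule are illustrative assumptions; the card does not state a required size multiple, and this helper is not part of the `vibe` API.

```python
MAX_SIDE = 2048  # upper bound stated in the model card
MULTIPLE = 32    # assumption: not stated in the card, chosen for illustration


def target_size(width, height, max_side=MAX_SIDE, multiple=MULTIPLE):
    """Return (width, height) scaled to fit max_side and snapped down to a multiple."""
    # Never upscale: only shrink when the longer side exceeds the limit.
    scale = min(1.0, max_side / max(width, height))
    new_w = max(multiple, round(width * scale) // multiple * multiple)
    new_h = max(multiple, round(height * scale) // multiple * multiple)
    return new_w, new_h
```

With Pillow, the result could be applied as `image = image.resize(target_size(*image.size))`. Snapping down to a multiple can shift the aspect ratio by a few pixels, which is usually negligible at these sizes.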

Features

  • Text-Guided Editing: Edit images using natural language instructions (e.g., "Add a cat on the sofa").
  • Compact & Efficient: Combines a 1.6B parameter diffusion model with a 2B parameter encoder for a lightweight footprint.
  • High-Speed Inference: Utilizes Sana1.5's linear attention mechanism for rapid generation.
  • Multimodal Understanding: Qwen3-VL ensures strong alignment between visual content and text instructions.
  • Text-to-Image Support: Can also generate images from a text prompt alone (see T2I Examples).

Inference Requirements

  • vibe library
pip install git+https://github.com/ai-forever/VIBE
  • requirements for vibe library:
pip install transformers==4.57.1 torchvision==0.21.0 torch==2.6.0 diffusers==0.33.1 loguru==0.7.3
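
Because the requirements above are pinned to exact versions, it can help to confirm what is actually installed before loading the model. The following is a small sketch using only the standard library. The `check_pins` helper is hypothetical (not part of `vibe`), and treating local build tags such as `+cu124` as a match is an assumption of this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip command in this card.
PINNED = {
    "transformers": "4.57.1",
    "torchvision": "0.21.0",
    "torch": "2.6.0",
    "diffusers": "0.33.1",
    "loguru": "0.7.3",
}


def check_pins(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags, e.g. "2.6.0+cu124" counts as "2.6.0".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```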

Quick start

from PIL import Image
import requests
from io import BytesIO
from huggingface_hub import snapshot_download

from vibe.editor import ImageEditor

# Download model
model_path = snapshot_download(
    repo_id="iitolstykh/VIBE-Image-Edit",
    repo_type="model",
)

# Load model
editor = ImageEditor(
    checkpoint_path=model_path,
    image_guidance_scale=1.2,
    guidance_scale=4.5,
    num_inference_steps=20,
    device="cuda:0",
)

# Download test image
resp = requests.get(
    "https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/3f58a82a-b4b4-40c3-a318-43f9350fcd02/original=true,quality=90/115610275.jpeg",
    timeout=30,
)
resp.raise_for_status()  # fail early if the download did not succeed
image = Image.open(BytesIO(resp.content))

# Generate edited image
edited_image = editor.generate_edited_image(
    instruction="let this case swim in the river",
    conditioning_image=image,
    num_images_per_prompt=1,
)[0]

edited_image.save("edited_image.jpg", quality=100)
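
To try several instructions on the same source image, the quick start call can be wrapped in a loop. This is a sketch under stated assumptions: `edit_with_instructions` is a hypothetical helper, and it relies only on the `generate_edited_image` keyword arguments shown in the quick start and on that call returning an indexable sequence of images.

```python
def edit_with_instructions(editor, image, instructions, images_per_prompt=1):
    """Apply each instruction to the same image; return {instruction: outputs}."""
    results = {}
    for instruction in instructions:
        # Same keyword arguments as the quick start call.
        results[instruction] = editor.generate_edited_image(
            instruction=instruction,
            conditioning_image=image,
            num_images_per_prompt=images_per_prompt,
        )
    return results
```

With the `editor` and `image` objects from the quick start, `edit_with_instructions(editor, image, ["make it snowy", "add a red hat"])` would return one sequence of outputs per instruction, each of which can be saved as in the quick start. Reusing one loaded `editor` avoids reloading the checkpoint for every edit.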

T2I Examples

(Seed: 234) Prompt: View through the clouds at Earth from a plane

Image 1

(Seed: 2) Prompt: Medieval castle at sunset surrounded by dense forest and mist

Image 7

(Seed: 666) Prompt: Portrait of an old wise man with a long white beard surrounded by books and candles

Image 4

(Seed: 9513) Prompt: Night urban street with wet asphalt reflections and neon signs

Image 5

(Seed: 142) Prompt: Futuristic sports car racing in the desert

Image 2

(Seed: 1325) Prompt: Pirate boat in ocean

Image 3

(Seed: 4241) Prompt: Davy Jones portrait

Image 6

(Seed: 142) Prompt: Epic cosmic scene with a huge space station and distant stars

Image 8

(Seed: 42) Prompt: Cherry blossom park in spring with petals falling to the ground

Image 9

Comparison with SANA1.5_1.6B_1024px

Prompt: Generate an interior of a rustic cabin workshop during winter evening. The viewpoint is from the doorway, showing a workbench with tools, wood shavings on the floor, and a cast-iron stove glowing softly. Place shelves with jars of nails, coils of rope, and folded blankets. Through a small window, show snow falling and pine trees in the twilight. Add warm lamplight creating soft gradients and a gentle vignette. Include a person in a thick sweater sanding a wooden object at the bench, but keep the person small in frame

VIBE (Seed: 4411)
SANA1.5_1.6B_1024px (Seed: 1521)

Prompt: Generate an ancient jungle temple ruin partially covered in moss and vines, with a waterfall cascading nearby into a shallow pool. Show broken stone steps, carved patterns that are abstract, and damp surfaces with realistic moss detail. Add mist, shafts of sunlight through leaves, and small floating insects. Include a human explorer in the mid-ground, small in frame, wearing a backpack. Lush, cinematic realism.

VIBE (Seed: 1995)
SANA1.5_1.6B_1024px (Seed: 9842)

Prompt: Create a science-fiction interior of a space greenhouse module with hydroponic racks, glowing grow lights, and condensation on transparent walls. Plants include leafy greens and flowering specimens. Tools and tablets have UI elements. Add soft floating dust or microgravity droplets. Clean, detailed, plausible sci-fi aesthetic.

VIBE (Seed: 2203)
SANA1.5_1.6B_1024px (Seed: 143)

Prompt: Beautiful tropical beach with guinea pig swimming in the water and human drinking wine

VIBE (Seed: 132142)
SANA1.5_1.6B_1024px (Seed: 132142)

Prompt: Create a cinematic, rainy night scene in a narrow backstreet of an old downtown area. The camera is at street level, slightly tilted upward, emphasizing wet cobblestones reflecting neon-like colored lights without readable text. Show a small ramen stall with steam rising from pots, hanging paper lanterns that are blank or patterned (no letters), and acouple of stools under a simple awning. Add puddles, scattered trash like crumpled paper, and subtle mist. Include a passerby in the mid-ground seen from behind wearing a hooded jacket and carrying an umbrella, face not visible. Use a moody color palette of deep blues and warm oranges, with soft bokeh highlights and realistic rain streaks

VIBE (Seed: 1003)
SANA1.5_1.6B_1024px (Seed: 3114)

Prompt: Depict a volcanic lava field at twilight with cooled black rock, glowing cracks of magma in the distance, and heat shimmer. The sky is darkening with faint stars emerging. Add thin smoke plumes and red-orange reflections on nearby rocks. Cinematic realism, dramatic contrast

VIBE (Seed: 1520)
SANA1.5_1.6B_1024px (Seed: 1267)

Prompt: Portrait from back of a young woman dressed in Victorian attire standing in an ancient library filled with mirrors and stained glass windows, softly illuminated by sunlight streaming through

VIBE (Seed: 4152)
SANA1.5_1.6B_1024px (Seed: 6742)

License

This project is built upon SANA. Please refer to the original SANA license for usage terms: SANA License

Citation

If you use this model in your research or applications, please acknowledge the original projects:

@misc{vibe2026,
  Author = {Grigorii Alekseenko and Aleksandr Gordeev and Irina Tolstykh and Bulat Suleimanov and Vladimir Dokholyan and Georgii Fedorov and Sergey Yakubson and Aleksandra Tsybina and Mikhail Chernyshov and Maksim Kuprashevich},
  Title = {VIBE: Visual Instruction Based Editor},
  Year = {2026},
  Eprint = {arXiv:2601.02242},
}