WHATUSEE - LTX-Video LoRA
Trigger Word: WHATUSEE
This LoRA creates a specific transition effect built around the idea of "What do you see?": the camera first zooms into the person's eye, then pushes deep inside it, and "lands" in the view seen from their eyes. This theme has fascinated me for a while, and now that the tools are there, it's fun to pursue. You can leave the first-person view unprompted, prompt for it explicitly, or use a last frame to determine it. In short: the camera captures a subject, zooms violently into their eye/pupil, and dissolves into a first-person perspective (POV) of what that subject is looking at.
Note: While this model was primarily trained for Text-to-Video (T2V) and Image-to-Video (I2V), testing demonstrates that it also functions surprisingly well with FFLF (First-Frame-Last-Frame) conditioning, allowing for controlled start and end points.
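For standard I2V use, loading this LoRA into a diffusers-style LTX image-to-video pipeline looks roughly like the sketch below. Treat it as a minimal sketch, not a verified workflow: the base repo ID, the LoRA weight filename, and the generation settings are assumptions on my part (the LoRA was trained against the LTX-2 19B base listed under Base Model at the bottom of this card, so make sure your pipeline actually matches that checkpoint). In ComfyUI, the equivalent is simply a LoRA loader placed in front of the LTX sampler.

```python
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Hypothetical IDs/filenames -- replace with the base model and LoRA file you actually use.
BASE_REPO = "Lightricks/LTX-Video"           # assumption: a diffusers-compatible LTX base
LORA_REPO = "o-8-o/WHATUSEE_LTX-2-19B_LoRA"  # this repository
LORA_FILE = "WHATUSEE_step2000.safetensors"  # hypothetical name for the recommended checkpoint

pipe = LTXImageToVideoPipeline.from_pretrained(BASE_REPO, torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights(LORA_REPO, weight_name=LORA_FILE, adapter_name="whatusee")

# Start frame: the subject whose eye the camera will dive into.
start = load_image("portrait.png")

# The prompt uses the trigger word and follows the three-part caption structure described below.
prompt = (
    "WHATUSEE: A woman in a red coat stands on a rainy street, looking directly at the camera. "
    "The camera zooms violently into her eye and pierces through the pupil, "
    "dissolving into her first-person view of the neon-lit crosswalk ahead of her."
)

video = pipe(
    image=start,
    prompt=prompt,
    width=768,
    height=512,
    num_frames=121,
    num_inference_steps=40,
).frames[0]
export_to_video(video, "whatusee_i2v.mp4", fps=24)
```

If the stronger step-2750 checkpoint feels overcooked, lowering the LoRA influence (e.g. `pipe.set_adapters(["whatusee"], adapter_weights=[0.8])`) is a reasonable knob to try.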
Training Details & Dataset
This model was trained on a fully synthetic dataset that I created entirely from scratch.
To achieve the consistent start images and transition effects required for this concept, I utilized a workflow involving:
- Qwen Image 2512 and Qwen Image Edit 2511
- Nano Banana Pro
- NebSH's "EyesIn" LoRA and Oumoumad's "Deepzoom" LoRA as well as my own "EarthZoomOut" LoRA
These tools, combined with specific 2D transition techniques, allowed me to build a consistent effect across a dataset of 12 videos. The model was trained for 5,000 steps total.
Checkpoints
- Step 2000: The recommended step count. The effect is present and stable.
- Step 2750: A "stronger" version that may be slightly overfitted but provides a more intense effect if needed.
FFLF Capabilities (First-Frame / Last-Frame)
Although the model was trained as T2V/I2V, these examples demonstrate its ability to bridge a start image and an end state using FFLF. The images below correspond to the generated video outputs; a rough code sketch for FFLF conditioning follows the examples.
Example 1
Input Image:

Result:
Example 2
Input Image:

Result:
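For FFLF conditioning specifically, the diffusers LTX condition pipeline accepts per-frame image conditions, which is one way to pin both the starting portrait and the final POV frame. Again, this is a hedged sketch rather than a tested recipe: the repo IDs, filenames, and frame indices are assumptions, and the pipeline shown targets the open LTX-Video weights rather than the LTX-2 base this LoRA was trained on.

```python
import torch
from diffusers import LTXConditionPipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_image

# Hypothetical IDs/filenames -- adjust to your setup.
pipe = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.7-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("o-8-o/WHATUSEE_LTX-2-19B_LoRA", weight_name="WHATUSEE_step2000.safetensors")

num_frames = 121
# Pin the first frame to the subject and the last frame to the revealed POV.
first = LTXVideoCondition(image=load_image("portrait.png"), frame_index=0)
last = LTXVideoCondition(image=load_image("pov_target.png"), frame_index=num_frames - 1)

prompt = (
    "WHATUSEE: A chess player stares at the viewer across the board. The camera zooms violently "
    "into his eye and dissolves into his first-person view of the board and his opponent's hands."
)

video = pipe(
    conditions=[first, last],
    prompt=prompt,
    width=768,
    height=512,
    num_frames=num_frames,
    num_inference_steps=40,
).frames[0]
export_to_video(video, "whatusee_fflf.mp4", fps=24)
```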
Standard Image-to-Video (Step Comparison)
These videos demonstrate the standard I2V functionality at different training stages.
| Step Count | Video |
|---|---|
| Step 2000 (Recommended) | |
| Step 2750 (Strong/Overfit) | |
Prompting Guide
The dataset was captioned using a specific structure: Describe the subject, describe the zoom action, and describe the revealed POV.
Sample Prompt:
WHATUSEE: A middle-aged man with neatly combed graying hair sits upright in a beige metal-framed chair against a plain white wall, dressed sharply in a dark blazer over a light blue button-down shirt; his hands clasped calmly on his lap as he gazes directly at the viewer with an expression of quiet confidence. The camera then abruptly accelerates inward—zooming violently toward his right eye until it pierces through the pupil like a lens flare tunnel—and instantly dissolves into the first-person perspective from within his gaze: revealing not just emptiness but instead the precise view across a sterile room where another identical empty chair stands to his left...
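If you want to keep your prompts aligned with this caption structure, assembling them programmatically in the same order can help. The helper below is purely illustrative; the function name and the example text are mine, not drawn from the dataset.

```python
def build_whatusee_prompt(subject: str, zoom: str, pov: str) -> str:
    """Assemble a prompt in the dataset's caption order:
    trigger word -> subject description -> zoom action -> revealed first-person view."""
    return f"WHATUSEE: {subject} {zoom} {pov}"


prompt = build_whatusee_prompt(
    subject=(
        "An elderly fisherman in a yellow raincoat sits at the end of a wooden pier, "
        "hands resting on his knees, looking straight at the viewer."
    ),
    zoom=(
        "The camera abruptly accelerates inward, zooming violently toward his left eye "
        "until it pierces through the pupil,"
    ),
    pov=(
        "and dissolves into his first-person view of the grey sea and a small red boat "
        "rocking on the waves ahead of him."
    ),
)
print(prompt)
```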
Base Model
- Kijai/LTXV2_comfy