My notification - a nithin12342 Collection

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 15 days ago • 20

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs

Paper • 2601.17058 • Published 14 days ago • 183

Less is More: Optimizing Function Calling for LLM Execution on Edge Devices

Paper • 2411.15399 • Published Nov 23, 2024 • 1

nvidia/personaplex-7b-v1

Audio-to-Audio • Updated 8 days ago • 162k • 1.66k

Qwen/Qwen3-ASR-0.6B

Automatic Speech Recognition • 0.9B • Updated 7 days ago • 26.8k • 168

Qwen3-ASR Technical Report

Paper • 2601.21337 • Published 7 days ago • 32

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 9 days ago • 23

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Paper • 2601.22153 • Published 7 days ago • 68

Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

Paper • 2601.20354 • Published 8 days ago • 109

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

Paper • 2601.21406 • Published 7 days ago • 4

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published 9 days ago • 7

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published 7 days ago • 41

SERA: Soft-Verified Efficient Repository Agents

Paper • 2601.20789 • Published 8 days ago • 11

moonshotai/Kimi-K2.5

Image-Text-to-Text • 171B • Updated about 12 hours ago • 203k • 1.72k

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Paper • 2601.22904 • Published 6 days ago • 13

Phr00t/LTX2-Rapid-Merges

Image-Text-to-Video • Updated about 18 hours ago • 261

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Paper • 2601.23184 • Published 6 days ago • 31

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Paper • 2602.02092 • Published 3 days ago • 17

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Paper • 2602.02493 • Published 3 days ago • 37

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published 6 days ago • 32

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Paper • 2602.02488 • Published 3 days ago • 29

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Paper • 2602.02185 • Published 3 days ago • 122

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Paper • 2601.21358 • Published 7 days ago • 6

Balancing Understanding and Generation in Discrete Diffusion Models

Paper • 2602.01362 • Published 4 days ago • 13

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published 2 days ago • 48

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Paper • 2602.01785 • Published 3 days ago • 86

LIVE: Long-horizon Interactive Video World Modeling

Paper • 2602.03747 • Published 2 days ago • 12