[models] GTX 1660 Super 6GB
The best little card under 100 euros. Full precision vs. quants not benchmarked. This card is much better at running inference than you might realize.
Text Generation • 8B
- Quant: Q4_K_M (full GPU offload)
- Reasoning: yes
- Context: 2048 tok (4096 too big for 1660 VRAM, just start a new chat anyway lmao)
- Throughput: 14.44 tok/sec, 2796 tokens, 0.55 s to first token
- Thinking duration: ~2 min
- Benchmark prompt: "How can I use { ModelName } to automate my office's intranet?"
- Note: the reasoning step ran nearer 30 tok/sec; response throughput was measurably slower. Very verbose (blew its context limit in one reply).
unsloth/Qwen3-4B-Instruct-2507-GGUF
4B
- Quant: Q8_K_XL (full GPU offload)
- Reasoning: no
- Context: 4096 tok
- Throughput: 14.78 tok/sec, 546 tokens, 0.62 s to first token
- Benchmark prompt: "How can I use { ModelName } to automate my office's intranet?"
- Note: fast and concise.
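For reference, the throughput figures in these entries (time to first token, then tok/sec over the decode phase) can be reconstructed from per-token arrival timestamps. This is a minimal sketch, not the tool that produced the numbers above; the sample timings are synthetic, built from the Qwen3-4B figures:

```python
def stream_stats(start: float, token_times: list[float]) -> dict:
    """Compute time-to-first-token and decode throughput from a
    request start time and per-token arrival timestamps."""
    ttft = token_times[0] - start
    n = len(token_times)
    # Measure decode rate from the first token onward, so a slow
    # prompt/prefill phase doesn't drag the tok/sec figure down.
    decode_time = token_times[-1] - token_times[0]
    tps = (n - 1) / decode_time if decode_time > 0 else float("inf")
    return {"tokens": n, "ttft_s": round(ttft, 2), "tok_per_s": round(tps, 2)}

# Synthetic example: 546 tokens arriving at a steady 14.78 tok/sec
# after a 0.62 s wait for the first token.
start = 0.0
times = [0.62 + i / 14.78 for i in range(546)]
print(stream_stats(start, times))
```

Note that averaging over the whole response is why a reasoning model can report ~30 tok/sec during thinking but a lower overall figure.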
unsloth/SmolLM3-3B-128K-GGUF
3B
- Quant: 128K-UD-Q8_K_XL (full GPU offload)
- Reasoning: yes
- Context: 4096 tok
- Throughput: 34.13 tok/sec, 1774 tokens, 1.60 s to first token
- Thinking duration: 25 sec
- Benchmark prompt: "How can I use { ModelName } to automate my office's intranet?"
- Note: a reliable LLM for generating synthetic data on small GPUs. Ran a full 24 h cycle generating and reviewing manacaster art prompts with reasonably good results for a tiny model!
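That kind of unattended generate-and-review cycle can be sketched as a deadline-bounded loop. This is an illustrative skeleton only, assuming a local OpenAI-compatible server (e.g. llama.cpp's); `ask` is a stub here so the structure is runnable without a model:

```python
import time

def ask(prompt: str) -> str:
    # Stub stand-in for a call to a local inference server's
    # /v1/chat/completions endpoint; swap in a real HTTP call.
    return f"draft for: {prompt}"

def run_cycle(seeds, deadline_s, clock=time.monotonic):
    """Generate a draft for each seed topic, then have the model
    review its own output. Stops when the time budget is exhausted."""
    out = []
    stop = clock() + deadline_s
    for seed in seeds:
        if clock() >= stop:
            break
        draft = ask(f"Write an art prompt about {seed}.")
        review = ask(f"Review this art prompt and rate it 1-10: {draft}")
        out.append({"seed": seed, "draft": draft, "review": review})
    return out

print(len(run_cycle(["storm mage", "clockwork golem"], deadline_s=5.0)))
```

For a real 24 h run you would pass `deadline_s=24 * 3600` and cycle over the seed list repeatedly, logging each draft/review pair to disk.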
unsloth/Ministral-3-3B-Reasoning-2512-GGUF
3B
- Quant: Q6_K_XL
- Note: blazing fast; quality seems fine. Image/multimodal input not tested yet.
-
mradermacher/SERA-8B-GA-i1-GGUF
8B
-
eaddario/Olmo-3-7B-Think-GGUF
Text Generation • 7B