Sherry-1.25bit
A collection of 1.25-bit models quantized with Sherry, a 3:4 sparse ternary quantization method.
A reproduced 1.25-bit 1B model based on LLaMA-3.2, quantized with the Sherry method from the paper:
Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification
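The 3:4 pattern also explains the bit count: each group of four weights stores one zero position (2 bits) plus three signs (3 bits), i.e. 5 bits per group, or 1.25 bits per weight. Below is a minimal sketch of such a quantizer, assuming the simplest variant: the smallest-magnitude weight in each group of four is zeroed and the survivors are ternarized with a single per-tensor scale (the per-channel and per-group variants in the table would compute one scale per output channel or per group instead). Names like `sherry_quantize` are illustrative, not taken from the paper's code.

```python
import torch

def sherry_quantize(w: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Sketch of 3:4 sparse ternary quantization (illustrative, not the
    paper's reference code). Assumes w.numel() is divisible by group_size."""
    g = w.reshape(-1, group_size)
    # 3:4 sparsity: zero out the single smallest-|w| entry in each group.
    drop = g.abs().argmin(dim=1, keepdim=True)
    mask = torch.ones_like(g).scatter_(1, drop, 0.0)
    # Ternarize survivors to {-alpha, 0, +alpha}; one per-tensor scale is
    # used here for brevity (mean |w| over the kept weights).
    alpha = (g * mask).abs().sum() / mask.sum()
    q = torch.sign(g) * mask  # ternary codes in {-1, 0, +1}
    return (q * alpha).reshape(w.shape)
```

Applied to eight weights, this produces vectors of the same shape as the example below: one zero per group of four, with all surviving entries sharing a single magnitude.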
Example of Sherry-quantized weights (ternary values with exactly one zero in every group of four):
[0.0315, 0.0000, 0.0315, -0.0315, -0.0315, -0.0315, 0.0000, 0.0315]

| Model | Size | Quant type | ARC-C | ARC-E | HellaSwag | PIQA | WinoGrande | Avg |
|---|---|---|---|---|---|---|---|---|
| LLaMA3.2 | 1B | - | 0.313 | 0.654 | 0.477 | 0.742 | 0.603 | 0.558 |
| Sherry (reported in paper) | 1B | per-channel | - | - | - | - | - | 0.513 |
| Sherry (reproduced) | 1B | per-channel | 0.297 | 0.618 | 0.386 | 0.707 | 0.560 | 0.514 |
| Sherry (reported in paper) | 1B | per-group | 0.309 | 0.647 | 0.388 | 0.699 | 0.550 | 0.519 |
| Sherry (reproduced) | 1B | per-group | 0.292 | 0.648 | 0.390 | 0.706 | 0.556 | 0.518 |
| LLaMA3.2 | 3B | - | 0.422 | 0.745 | 0.552 | 0.768 | 0.691 | 0.636 |
| Sherry (reported in paper) | 3B | per-group | 0.364 | 0.688 | 0.452 | 0.736 | 0.593 | 0.567 |
| Sherry (reproduced) | 3B | per-channel | 0.360 | 0.713 | 0.445 | 0.732 | 0.596 | 0.569 |
Base model: meta-llama/Llama-3.2-1B
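For reference, here is a sketch of how the table could be re-run with EleutherAI's lm-evaluation-harness, assuming its v0.4 Python API. The `pretrained=` id is the base model from this card; a quantized checkpoint path would be substituted to score the Sherry models. Whether the paper reports `acc` or `acc_norm` for each task is not stated here, so both are printed.

```python
# Sketch: reproduce the benchmark table with lm-evaluation-harness (v0.4 API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Base model from this card; swap in a quantized checkpoint to score Sherry.
    model_args="pretrained=meta-llama/Llama-3.2-1B,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"), metrics.get("acc_norm,none"))
```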