---
license: llama3.2
base_model:
- meta-llama/Llama-3.2-1B
---

# Sherry-1B-1.25bit-per-channel

A **reproduced** 1.25-bit 1B model based on LLaMA-3.2, quantized with the **Sherry** method from the paper: [Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification](https://arxiv.org/pdf/2601.07892)

## Quantization Details

- **Format**: 3:4 sparse ternary weights
  - For every 4 weights: 3 non-zero values + 1 zero
  - Average bitwidth: **1.25 bits**
- **Quantization type**: per-channel quantization
- **Storage**: weights stored in bf16 format after scaling
- **Example weight pattern**: `[0.0315, 0.0000, 0.0315, -0.0315, -0.0315, -0.0315, 0.0000, 0.0315]`

## Performance Comparison

Accuracy on ARC-Challenge (ARC_C), ARC-Easy (ARC_E), HellaSwag (HelS), PIQA, and WinoGrande (WinG); AVG is the mean across tasks.

| Model | Size | Quant type | ARC_C | ARC_E | HelS | PIQA | WinG | AVG |
|---|---|---|---|---|---|---|---|---|
| LLaMA3.2 | 1B | - | 0.313 | 0.654 | 0.477 | 0.742 | 0.603 | 0.558 |
| Sherry (reported in paper) | 1B | per-channel | - | - | - | - | - | 0.513 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-1B-1.25bit-per-channel) | 1B | per-channel | 0.297 | 0.618 | 0.386 | 0.707 | 0.560 | 0.514 |
| Sherry (reported in paper) | 1B | per-group | 0.309 | 0.647 | 0.388 | 0.699 | 0.550 | 0.519 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-1B-1.25bit-per-group) | 1B | per-group | 0.292 | 0.648 | 0.390 | 0.706 | 0.556 | 0.518 |
| LLaMA3.2 | 3B | - | 0.422 | 0.745 | 0.552 | 0.768 | 0.691 | 0.636 |
| Sherry (reported in paper) | 3B | per-group | 0.364 | 0.688 | 0.452 | 0.736 | 0.593 | 0.567 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-3B-1.25bit-per-channel) | 3B | per-channel | 0.360 | 0.713 | 0.445 | 0.732 | 0.596 | 0.569 |
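## How the 3:4 sparse ternary format works

The sketch below illustrates the weight format described above: in every group of 4 weights, the smallest-magnitude weight is zeroed, and the 3 survivors are snapped to `{-s, 0, +s}` with one scale `s` per output channel. This yields sign bits for 3 of 4 positions plus a 2-bit zero index, i.e. 5 bits per 4 weights = 1.25 bits on average. This is an illustrative approximation of the idea, not the paper's exact fitting procedure; the function name and the mean-magnitude scale choice are assumptions for the example.

```python
import numpy as np

def sparse_ternary_quantize(w: np.ndarray) -> np.ndarray:
    """Illustrative 3:4 sparse ternary quantization, one scale per row (channel).

    NOTE: a sketch of the format, not the Sherry paper's exact algorithm.
    """
    rows, cols = w.shape
    assert cols % 4 == 0, "channel length must be a multiple of 4"
    q = np.zeros_like(w, dtype=np.float32)
    for r in range(rows):
        groups = w[r].reshape(-1, 4)
        # 3:4 sparsity: zero the smallest-magnitude weight in each group of 4
        mask = np.ones_like(groups)
        mask[np.arange(groups.shape[0]), np.abs(groups).argmin(axis=1)] = 0.0
        sparse = groups * mask
        # per-channel scale: mean magnitude of the surviving weights
        # (assumed here; the paper may fit the scale differently)
        s = np.abs(sparse).sum() / max(mask.sum(), 1.0)
        # ternarize: survivors become +/- s, pruned positions stay 0
        q[r] = (np.sign(sparse) * s).reshape(-1)
    return q

w = np.array([[0.03, -0.004, 0.05, -0.02, -0.04, -0.03, 0.001, 0.02]],
             dtype=np.float32)
print(sparse_ternary_quantize(w))
```

For this input the output has exactly one zero per group of 4 and all non-zeros share a single magnitude, matching the example weight pattern shown above.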