---
license: llama3.2
base_model:
- meta-llama/Llama-3.2-1B
---

# Sherry-1B-1.25bit-per-channel

A **reproduced** 1.25-bit 1B model based on LLaMA-3.2, quantized with the **Sherry** method from the paper: [Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification](https://arxiv.org/pdf/2601.07892)

## Quantization Details

- **Format**: 3:4 sparse ternary weights
  - For every 4 weights: 3 non-zero values + 1 zero
  - Average bitwidth: **1.25 bits**
- **Quantization type**: per-channel quantization
- **Storage**: weights stored in bf16 format after scaling
- **Example weight pattern**: `[0.0315, 0.0000, 0.0315, -0.0315, -0.0315, -0.0315, 0.0000, 0.0315]`

## Performance Comparison

Accuracy on ARC-Challenge (ARC_C), ARC-Easy (ARC_E), HellaSwag (HelS), PIQA, and WinoGrande (WinG); AVG is the mean across tasks.

| Model | Size | Quant type | ARC_C | ARC_E | HelS | PIQA | WinG | AVG |
|---|---|---|---|---|---|---|---|---|
| LLaMA3.2 | 1B | - | 0.313 | 0.654 | 0.477 | 0.742 | 0.603 | 0.558 |
| Sherry (reported in paper) | 1B | per-channel | - | - | - | - | - | 0.513 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-1B-1.25bit-per-channel) | 1B | per-channel | 0.297 | 0.618 | 0.386 | 0.707 | 0.560 | 0.514 |
| Sherry (reported in paper) | 1B | per-group | 0.309 | 0.647 | 0.388 | 0.699 | 0.550 | 0.519 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-1B-1.25bit-per-group) | 1B | per-group | 0.292 | 0.648 | 0.390 | 0.706 | 0.556 | 0.518 |
| LLaMA3.2 | 3B | - | 0.422 | 0.745 | 0.552 | 0.768 | 0.691 | 0.636 |
| Sherry (reported in paper) | 3B | per-group | 0.364 | 0.688 | 0.452 | 0.736 | 0.593 | 0.567 |
| [Sherry (reproduced)](https://huggingface.co/MoraxGeo/Sherry-3B-1.25bit-per-channel) | 3B | per-channel | 0.360 | 0.713 | 0.445 | 0.732 | 0.596 | 0.569 |
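## How the 3:4 sparse ternary format works

The sketch below illustrates the weight format described above: in every group of 4 weights, the smallest-magnitude weight is zeroed, and the 3 survivors are snapped to `{-s, 0, +s}` with one scale `s` per output channel. This yields sign bits for 3 of 4 positions plus a 2-bit zero index, i.e. 5 bits per 4 weights = 1.25 bits on average. This is an illustrative approximation of the idea, not the paper's exact fitting procedure; the function name and the mean-magnitude scale choice are assumptions for the example.

```python
import numpy as np

def sparse_ternary_quantize(w: np.ndarray) -> np.ndarray:
    """Illustrative 3:4 sparse ternary quantization, one scale per row (channel).

    NOTE: a sketch of the format, not the Sherry paper's exact algorithm.
    """
    rows, cols = w.shape
    assert cols % 4 == 0, "channel length must be a multiple of 4"
    q = np.zeros_like(w, dtype=np.float32)
    for r in range(rows):
        groups = w[r].reshape(-1, 4)
        # 3:4 sparsity: zero the smallest-magnitude weight in each group of 4
        mask = np.ones_like(groups)
        mask[np.arange(groups.shape[0]), np.abs(groups).argmin(axis=1)] = 0.0
        sparse = groups * mask
        # per-channel scale: mean magnitude of the surviving weights
        # (assumed here; the paper may fit the scale differently)
        s = np.abs(sparse).sum() / max(mask.sum(), 1.0)
        # ternarize: survivors become +/- s, pruned positions stay 0
        q[r] = (np.sign(sparse) * s).reshape(-1)
    return q

w = np.array([[0.03, -0.004, 0.05, -0.02, -0.04, -0.03, 0.001, 0.02]],
             dtype=np.float32)
print(sparse_ternary_quantize(w))
```

For this input the output has exactly one zero per group of 4 and all non-zeros share a single magnitude, matching the example weight pattern shown above.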