# Sherry-1B-1.25bit-per-channel

A reproduced 1.25-bit 1B model based on LLaMA-3.2, quantized with the Sherry method from the paper
*Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification*.
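
Since the ternarized weights are stored as dense bf16 tensors (see Quantization Details below), the checkpoint loads like any LLaMA-3.2 model. A minimal usage sketch with the standard transformers API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "MoraxGeo/Sherry-1B-1.25bit-per-channel"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```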

## Quantization Details

- **Format:** 3:4 sparse ternary weights
  - For every 4 weights: 3 non-zero ternary values + 1 zero
  - Average bitwidth: 1.25 bits (per 4-weight group, 2 bits index the zero's position and 3 sign bits cover the non-zeros: 5 bits / 4 weights = 1.25 bits/weight)
- **Quantization type:** per-channel (one scale per output channel)
- **Storage:** weights stored in bf16 format after scaling
- **Example weight pattern:** `[0.0315, 0.0000, 0.0315, -0.0315, -0.0315, -0.0315, 0.0000, 0.0315]` — two 3:4 groups sharing a single channel scale (see the sketch below)
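
The quantization step can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the paper's reference code: it prunes the smallest-magnitude weight in each group of 4 and ternarizes the survivors with one mean-absolute-value scale per output channel (`sherry_quantize_per_channel` is a hypothetical helper name).

```python
import torch

def sherry_quantize_per_channel(w: torch.Tensor) -> torch.Tensor:
    """Hypothetical 3:4 sparse ternary quantizer (illustration only).

    For every group of 4 weights along the input dimension, zero the
    smallest-magnitude weight, then map the 3 survivors to
    {-scale, 0, +scale} with one scale per output channel.
    """
    out_ch, in_ch = w.shape
    assert in_ch % 4 == 0, "input dim must be divisible by 4 for 3:4 groups"

    # 3:4 sparsification: drop the smallest-|w| element in each group of 4.
    groups = w.reshape(out_ch, in_ch // 4, 4)
    drop = groups.abs().argmin(dim=-1, keepdim=True)
    mask = torch.ones_like(groups).scatter_(-1, drop, 0.0)
    sparse = (groups * mask).reshape(out_ch, in_ch)

    # Per-channel scale: mean |w| over the surviving entries of each row.
    nnz = mask.reshape(out_ch, in_ch).sum(dim=1, keepdim=True)
    scale = sparse.abs().sum(dim=1, keepdim=True) / nnz

    # Ternarize: sign in {-1, 0, +1} times the channel scale, stored as bf16.
    return (sparse.sign() * scale).to(torch.bfloat16)
```

Applied to a weight matrix, this produces rows like the example pattern above: exactly one zero per 4-weight group and a single magnitude per output channel.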

## Performance Comparison

| Model | Size | Quant type | ARC-C | ARC-E | HellaSwag | PIQA | WinoGrande | Avg |
|---|---|---|---|---|---|---|---|---|
| LLaMA3.2 | 1B | - | 0.313 | 0.654 | 0.477 | 0.742 | 0.603 | 0.558 |
| Sherry (reported in paper) | 1B | per-channel | - | - | - | - | - | 0.513 |
| Sherry (reproduced) | 1B | per-channel | 0.297 | 0.618 | 0.386 | 0.707 | 0.560 | 0.514 |
| Sherry (reported in paper) | 1B | per-group | 0.309 | 0.647 | 0.388 | 0.699 | 0.550 | 0.519 |
| Sherry (reproduced) | 1B | per-group | 0.292 | 0.648 | 0.390 | 0.706 | 0.556 | 0.518 |
| LLaMA3.2 | 3B | - | 0.422 | 0.745 | 0.552 | 0.768 | 0.691 | 0.636 |
| Sherry (reported in paper) | 3B | per-group | 0.364 | 0.688 | 0.452 | 0.736 | 0.593 | 0.567 |
| Sherry (reproduced) | 3B | per-channel | 0.360 | 0.713 | 0.445 | 0.732 | 0.596 | 0.569 |
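
For reference, zero-shot scores of this kind are commonly obtained with EleutherAI's lm-evaluation-harness. The snippet below is a hedged sketch using the harness's standard task names; it is not necessarily the exact configuration behind the table above.

```python
# Sketch: zero-shot evaluation with lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MoraxGeo/Sherry-1B-1.25bit-per-channel,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task accuracies
```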