Sherry-1.25bit
A collection of 1.25-bit models quantized with Sherry, a 3:4 sparse ternary quantization method.
A reproduced 1.25-bit 1B model based on LLaMA-3.2, quantized with the Sherry method from the paper:
Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification
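The 3:4 pattern also explains the bit count: each group of four weights stores one zero position (2 bits) plus three signs (3 bits), i.e. 5 bits per group, or 1.25 bits per weight. Below is a minimal sketch of such a quantizer, assuming the simplest variant: the smallest-magnitude weight in each group of four is zeroed and the survivors are ternarized with a single per-tensor scale (the per-channel and per-group variants in the table would compute one scale per output channel or per group instead). Names like `sherry_quantize` are illustrative, not taken from the paper's code.

```python
import torch

def sherry_quantize(w: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Sketch of 3:4 sparse ternary quantization (illustrative, not the
    paper's reference code). Assumes w.numel() is divisible by group_size."""
    g = w.reshape(-1, group_size)
    # 3:4 sparsity: zero out the single smallest-|w| entry in each group.
    drop = g.abs().argmin(dim=1, keepdim=True)
    mask = torch.ones_like(g).scatter_(1, drop, 0.0)
    # Ternarize survivors to {-alpha, 0, +alpha}; one per-tensor scale is
    # used here for brevity (mean |w| over the kept weights).
    alpha = (g * mask).abs().sum() / mask.sum()
    q = torch.sign(g) * mask  # ternary codes in {-1, 0, +1}
    return (q * alpha).reshape(w.shape)
```

Applied to eight weights, this produces vectors of the same shape as the example below: one zero per group of four, with all surviving entries sharing a single magnitude.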
Example of Sherry-quantized weights (ternary values with exactly one zero in every group of four):
[0.0315, 0.0000, 0.0315, -0.0315, -0.0315, -0.0315, 0.0000, 0.0315]

| Model | Size | Quant type | ARC-C | ARC-E | HellaSwag | PIQA | WinoGrande | Avg |
|---|---|---|---|---|---|---|---|---|
| LLaMA3.2 | 1B | - | 0.313 | 0.654 | 0.477 | 0.742 | 0.603 | 0.558 |
| Sherry (reported in paper) | 1B | per-channel | - | - | - | - | - | 0.513 |
| Sherry (reproduced) | 1B | per-channel | 0.297 | 0.618 | 0.386 | 0.707 | 0.560 | 0.514 |
| Sherry (reported in paper) | 1B | per-group | 0.309 | 0.647 | 0.388 | 0.699 | 0.550 | 0.519 |
| Sherry (reproduced) | 1B | per-group | 0.292 | 0.648 | 0.390 | 0.706 | 0.556 | 0.518 |
| LLaMA3.2 | 3B | - | 0.422 | 0.745 | 0.552 | 0.768 | 0.691 | 0.636 |
| Sherry (reported in paper) | 3B | per-group | 0.364 | 0.688 | 0.452 | 0.736 | 0.593 | 0.567 |
| Sherry (reproduced) | 3B | per-channel | 0.360 | 0.713 | 0.445 | 0.732 | 0.596 | 0.569 |
Base model: meta-llama/Llama-3.2-1B
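For reference, here is a sketch of how the table could be re-run with EleutherAI's lm-evaluation-harness, assuming its v0.4 Python API. The `pretrained=` id is the base model from this card; a quantized checkpoint path would be substituted to score the Sherry models. Whether the paper reports `acc` or `acc_norm` for each task is not stated here, so both are printed.

```python
# Sketch: reproduce the benchmark table with lm-evaluation-harness (v0.4 API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Base model from this card; swap in a quantized checkpoint to score Sherry.
    model_args="pretrained=meta-llama/Llama-3.2-1B,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"), metrics.get("acc_norm,none"))
```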