llama.cpp - Neural Digest

Sign in Subscribe

llama.cpp

A collection of 1 post

How Much Quality Is Lost When Quantizing LLMs? A Data-Driven Analysis of Q4_K_M vs FP16

How Much Quality Is Lost When Quantizing LLMs? A Data-Driven Analysis of Q4_K_M vs FP16

Quantization makes local LLMs accessible, but how much quality do you actually lose? We analyzed benchmark data from MMLU, GSM8K, and HellaSwag to compare Q4_K_M, Q8_0, and FP16 performance.