Quantization - Neural Digest

Quantization

A collection of 3 posts

What GPU Do I Need to Run Local LLMs? A Complete Hardware Guide for 2026

VRAM is the single most important spec for local LLMs. This guide breaks down exactly which GPU you need, from a $250 Intel Arc to an RTX 4090, with real benchmarks for Llama 4, DeepSeek R1, and more.

How Much Quality Is Lost When Quantizing LLMs? A Data-Driven Analysis of Q4_K_M vs FP16

Quantization makes local LLMs accessible, but how much quality do you actually lose? We analyzed benchmark data from MMLU, GSM8K, and HellaSwag to compare Q4_K_M, Q8_0, and FP16 performance.

Microsoft Open-Sources BitNet: The 1-Bit LLM Inference Framework That Runs 100B Models on Your CPU

Microsoft open-sources BitNet (bitnet.cpp), an inference framework that runs 100B-parameter 1-bit LLMs on consumer CPUs with up to 6x speedup and 82% energy reduction.