Quantization
1 post tagged with this.
TurboQuant compresses LLM key-value caches to 3 bits with no accuracy loss — 8x throughput on H100 GPUs and zero training required.