LLM
5 posts tagged with this.
DeepSeek released V4-Pro and V4-Flash on April 24, 2026 — both open-source, MIT licensed, with 1M token context windows. V4-Pro tops LiveCodeBench at 93.5% and costs 7x less than Claude Opus. V4-Flash undercuts GPT-5.4 Nano on price. Legacy models retire July 24.
OpenAI launched GPT-5.5 on April 23, 2026 — six weeks after GPT-5.4. The model delivers better results with fewer tokens, a 400K context window in Codex, and meaningful gains on scientific research and agentic coding. API pricing is 2x that of GPT-5.4, with API access coming soon.
Anthropic shipped Claude Opus 4.7 today with real benchmark data against GPT-5.4 and Gemini 3.1 Pro. 87.6% on SWE-bench Verified, 80.6% on document reasoning, 3.75MP vision, and a new xhigh effort level.
A deep dive into Key-Value (KV) Cache in large language models — what it is, how attention uses it, when it activates, and how it reduces latency and API costs.
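The mechanism that post covers can be sketched in a few lines: during decoding, the keys and values of past tokens are cached so each new step only projects and attends for the newest token instead of recomputing the whole prefix. This is a minimal single-head illustration, not code from the post; `KVCache` and all names here are made up for the sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Toy single-head KV cache: keep K/V for past tokens so each
    decoding step does O(1) new projections instead of O(t)."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def step(self, q_new, k_new, v_new):
        # Append this token's key/value; past entries are reused, not recomputed.
        self.keys = np.vstack([self.keys, k_new])
        self.values = np.vstack([self.values, v_new])
        # Scaled dot-product attention for the new token over the whole cache.
        scores = q_new @ self.keys.T / np.sqrt(self.keys.shape[1])
        return softmax(scores) @ self.values

cache = KVCache(d_model=4)
rng = np.random.default_rng(0)
for t in range(3):  # three decoding steps
    q = k = v = rng.standard_normal((1, 4))
    out = cache.step(q, k, v)
print(cache.keys.shape)  # after 3 steps the cache holds keys for 3 tokens
```

The latency win comes from the cache growing by one row per token while per-step compute stays proportional to the prefix length, rather than quadratic recomputation of all past projections.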
TurboQuant compresses LLM key-value caches to 3 bits with no accuracy loss — 8x throughput on H100 GPUs and zero training required.
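For intuition on what "3 bits" means for a KV cache, here is a generic uniform 3-bit quantize/dequantize round trip — 8 representable levels per tensor. This is an illustrative baseline only; it is not TurboQuant's actual algorithm, and the function names are invented for the sketch.

```python
import numpy as np

def quantize_3bit(x):
    """Toy per-tensor uniform 3-bit quantization: map floats to codes 0..7."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 7 or 1.0  # 2**3 - 1 steps span the range; avoid div-by-zero
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_3bit(codes, lo, scale):
    # Reconstruct approximate floats from the 3-bit codes.
    return codes.astype(np.float64) * scale + lo

x = np.linspace(-1.0, 1.0, 16)
codes, lo, scale = quantize_3bit(x)
x_hat = dequantize_3bit(codes, lo, scale)
max_err = np.abs(x - x_hat).max()  # bounded by half a quantization step
```

A naive scheme like this loses precision; the point of the post is that TurboQuant reaches 3 bits without that accuracy loss, and without any training.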