LLM

5 post tagged with this.

2026-04-24DeepSeek V4 Released: 1M Context, Frontier Coding, at a Fraction of the Price

DeepSeek released V4-Pro and V4-Flash on April 24, 2026 — both open-source, MIT licensed, with 1M token context windows. V4-Pro tops LiveCodeBench at 93.5% and costs 7x less than Claude Opus. V4-Flash undercuts GPT-5.4 Nano on price. Legacy models retire July 24.

2026-04-23GPT-5.5 Released: OpenAI's Fastest Reasoning Model Yet, Optimized for Agentic Work

OpenAI launched GPT-5.5 on April 23, 2026 — six weeks after GPT-5.4. The model delivers better results with fewer tokens, a 400K context window in Codex, and meaningful gains on scientific research and agentic coding. API pricing is 2x GPT-5.4. API access is coming soon.

2026-04-16Claude Opus 4.7 Released: Benchmarks, What's New, and How It Compares to GPT-5.4

Anthropic shipped Claude Opus 4.7 today with real benchmark data against GPT-5.4 and Gemini 3.1 Pro. 87.6% on SWE-bench Verified, 80.6% on document reasoning, 3.75MP vision, and a new xhigh effort level.

2026-03-28KV Cache in LLMs: What It Is and Why It Matters

A deep dive into Key-Value (KV) Cache in large language models — what it is, how attention uses it, when it activates, and how it reduces latency and API costs.

2026-03-25TurboQuant: Google's Algorithm That Compresses LLMs to 3 Bits

TurboQuant compresses LLM key-value caches to 3 bits with no accuracy loss — 8x throughput on H100 GPUs and zero training required.

← All posts