Nvidia researchers developed Dynamic Memory Sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be ...
A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...
CHATSWORTH, Calif., July 18, 2025 — DDN today unveiled performance benchmarks that the company said demonstrate how its AI-optimized DDN Infinia platform eliminates GPU waste and delivers the fastest ...
Interactive LLMs (chat, copilots, agents) with strict latency targets. Long‑context reasoning (codebases, research, video) with massive KV (key-value) cache footprints. Ranking and recommendation models ...
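A quick way to see why long-context workloads carry massive KV cache footprints is to compute the cache size directly from the standard per-layer key/value tensor shapes. The model configuration below is a hypothetical example, not taken from any of the articles above:

```python
# Back-of-envelope KV cache footprint for a transformer decoder.
# The cache stores two tensors (K and V) per layer, each of shape
# [batch, num_kv_heads, seq_len, head_dim].

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Total KV cache size in bytes (bytes_per_elem=2 assumes fp16/bf16)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 32-layer model with 8 KV heads of dim 128, serving a
# single 131072-token (128k) context in fp16.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=131072, batch=1)
print(f"{size / 2**30:.1f} GiB")  # → 16.0 GiB for one request
```

At this scale, even a single long-context request consumes tens of gigabytes of accelerator memory, which is why techniques that compress or relocate the KV cache matter for serving cost.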