Articles
Latest news and updates from PySpur
Guide
DeepSeek's Multi-Head Latent Attention and Other KV Cache Tricks
How a Key-Value (KV) cache reduces Transformer inference time by trading memory for computation
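The memory-for-computation trade is easy to see in code: rather than re-projecting keys and values for the entire prefix at every decoding step, the model caches them and computes projections only for the newest token. Below is a minimal single-head PyTorch sketch; the shapes, names, and random stand-in projections are illustrative assumptions, not PySpur's or DeepSeek's implementation.

```python
import torch

def attend(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

class KVCache:
    # Stores past keys/values so each decode step only projects the newest token.
    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=0)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=0)
        return self.k, self.v

cache = KVCache()
head_dim = 64
for step in range(4):
    # In a real model these come from projecting the current token's hidden state;
    # random tensors stand in for that projection here.
    q_new = torch.randn(1, head_dim)
    k_new = torch.randn(1, head_dim)
    v_new = torch.randn(1, head_dim)
    k_all, v_all = cache.append(k_new, v_new)  # cache memory grows with sequence length...
    out = attend(q_new, k_all, v_all)          # ...but each step attends without recomputing old K/V
```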
Guide
Splitting the KV Cache Across Multiple GPUs
Distributing KV cache tensors across multiple GPUs helps keep memory usage per GPU within acceptable limits.
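To make the per-GPU memory arithmetic concrete, here is a minimal sketch of one common scheme: sharding the cache along the attention-head dimension so each GPU holds keys and values for only a subset of heads. The head-wise split, shapes, and CPU fallback are assumptions for illustration, not necessarily the exact scheme the article describes.

```python
import torch

def shard_kv_by_head(k, v, devices):
    # Split (num_heads, seq_len, head_dim) KV tensors into one contiguous
    # chunk of heads per device, tensor-parallel style.
    k_chunks = torch.chunk(k, len(devices), dim=0)
    v_chunks = torch.chunk(v, len(devices), dim=0)
    return [(kc.to(d), vc.to(d)) for kc, vc, d in zip(k_chunks, v_chunks, devices)]

num_heads, seq_len, head_dim = 8, 4096, 128
k = torch.randn(num_heads, seq_len, head_dim)
v = torch.randn(num_heads, seq_len, head_dim)

# Use however many GPUs are visible; fall back to two CPU "devices" for the demo.
devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())] or ["cpu", "cpu"]
shards = shard_kv_by_head(k, v, devices)
for (k_shard, _), d in zip(shards, devices):
    print(d, tuple(k_shard.shape))  # each device holds num_heads / len(devices) heads
```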
LLM Release • Open-Source
DeepSeek-V3: The Cheapest Frontier Model So Far
DeepSeek-V3 is an open-source LLM with a reported training cost of roughly $6M, yet it is competitive with state-of-the-art frontier models.