LLM INFERENCE SYSTEMS / AI INFRASTRUCTURE

Python Engineer → LLM Inference Systems

I'm a Python engineer with 6+ years of backend experience, building a learning trail in LLM inference systems: KV cache, serving optimization, cache-aware routing, and benchmark-driven engineering.

Current work

01 Notes

writing

Prefill / Decode / KV Cache

02 Experiments

planned

Tiny Transformer KV cache from scratch

03 Benchmarks

planned

TTFT / ITL / Throughput methodology
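As a minimal sketch of the metric definitions (one common convention; benchmark tools differ in details such as whether throughput also counts input tokens), TTFT, ITL, and output throughput can all be derived from per-request timestamps. `RequestTrace` and the helper functions are hypothetical names, not part of any benchmark tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    # Hypothetical record of one request's timing, all values in seconds.
    send_time: float          # when the request was sent
    token_times: list[float]  # arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: first token arrival minus send time."""
    return trace.token_times[0] - trace.send_time


def itls(trace: RequestTrace) -> list[float]:
    """Inter-token latencies: gaps between consecutive output tokens."""
    t = trace.token_times
    return [later - earlier for earlier, later in zip(t, t[1:])]


def output_throughput(traces: list[RequestTrace]) -> float:
    """Output tokens per second over the whole benchmark window."""
    start = min(tr.send_time for tr in traces)
    end = max(tr.token_times[-1] for tr in traces)
    total_tokens = sum(len(tr.token_times) for tr in traces)
    return total_tokens / (end - start)


if __name__ == "__main__":
    traces = [
        RequestTrace(0.0, [0.5, 0.75, 1.0, 1.25]),
        RequestTrace(0.0, [1.0, 1.5, 2.0]),
    ]
    print("TTFT:", [ttft(tr) for tr in traces])
    print("mean ITL:", mean(x for tr in traces for x in itls(tr)))
    print("output tok/s:", output_throughput(traces))
```

Pooling every gap before averaging ITL weights long responses more heavily than averaging per request first, so a benchmark report should state which convention it uses.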

Recent notes

Jun 21, 2026 1 min read

Prefill vs Decode: The Two Phases of LLM Inference

Understanding the difference between prefill and decode in LLM inference, and why prefill is better suited to batching.

#llm-inference #prefill #decode #batching
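The prefill/decode split can be sketched with a toy single-head attention layer in pure Python (the `ToyAttention` class is hypothetical; no real model, multiple heads, layers, or batching). Prefill projects K/V for every prompt token in one pass, since all of them are known up front; each decode step projects only the new token and reuses the cached K/V.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q: Vector, keys: list[Vector], values: list[Vector]) -> Vector:
    """Scaled dot-product attention of one query over all cached positions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


class ToyAttention:
    """Hypothetical single-head attention layer with a KV cache."""

    def __init__(self, d: int, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix() -> Matrix:
            return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

        self.Wq, self.Wk, self.Wv = rand_matrix(), rand_matrix(), rand_matrix()
        self.k_cache: list[Vector] = []
        self.v_cache: list[Vector] = []

    def prefill(self, prompt: list[Vector]) -> Vector:
        # All prompt tokens are known, so their K/V are computed in one pass.
        self.k_cache = [matvec(self.Wk, x) for x in prompt]
        self.v_cache = [matvec(self.Wv, x) for x in prompt]
        # In this one-layer toy, only the last position's output is needed.
        return attend(matvec(self.Wq, prompt[-1]), self.k_cache, self.v_cache)

    def decode_step(self, x: Vector) -> Vector:
        # One new token: compute its K/V once and reuse everything cached.
        self.k_cache.append(matvec(self.Wk, x))
        self.v_cache.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), self.k_cache, self.v_cache)

    def full_recompute(self, seq: list[Vector]) -> Vector:
        # Reference without a cache: re-project K/V for the whole sequence.
        keys = [matvec(self.Wk, x) for x in seq]
        values = [matvec(self.Wv, x) for x in seq]
        return attend(matvec(self.Wq, seq[-1]), keys, values)
```

Each `decode_step` result matches `full_recompute` on the same sequence; the cache only removes the repeated K/V projections, leaving each decode step with one token's worth of projection work.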

Jun 20, 2026 1 min read

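Why Does the KV Cache Eat So Much GPU Memory?

A walkthrough of the KV cache's data structure, how to estimate its memory footprint, and why long contexts amplify the problem.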

#kv-cache #memory #llm-inference #gpu
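The usual back-of-the-envelope estimate multiplies out the cached K and V tensors: 2 × layers × KV heads × head dim × sequence length × batch size × bytes per element. The sketch below uses an assumed 7B-class configuration (32 layers, 32 KV heads, head_dim 128, fp16) chosen for illustration, and ignores allocator overhead and paging granularity.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # 2 bytes for fp16/bf16
) -> int:
    # Factor 2: every layer caches one K tensor and one V tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GiB = 1024 ** 3

if __name__ == "__main__":
    # Assumed 7B-class config, for illustration only.
    cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    print(kv_cache_bytes(**cfg, seq_len=1) / 2**20, "MiB per token")                  # 0.5
    print(kv_cache_bytes(**cfg, seq_len=4096) / GiB, "GiB per 4K sequence")           # 2.0
    print(kv_cache_bytes(**cfg, seq_len=4096, batch_size=8) / GiB, "GiB for 8 x 4K")  # 16.0
```

Memory grows linearly in both sequence length and batch size, so long contexts and high concurrency compete for the same GPU memory. Grouped-query attention shrinks the estimate by lowering `num_kv_heads` (8 KV heads instead of 32 is a 4× reduction).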

Jun 19, 2026 1 min read

How Does the Prefix Cache Hit Rate Affect TTFT?

Analyzes how prefix cache hits and misses affect time to first token (TTFT), and records the plan for follow-up benchmarks.

#prefix-cache #ttft #benchmark #llm-inference
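A minimal sketch of the mechanism, loosely modeled on block-based prefix caching (not vLLM's or SGLang's actual data structures): a prompt hits only on whole blocks whose entire preceding prefix matches, and only the uncached suffix pays prefill cost. `PrefixCache`, `BLOCK_SIZE`, and the linear TTFT model with its `prefill_tok_per_s` / `overhead_s` parameters are assumptions for illustration; real prefill cost also grows with attention over the cached context.

```python
from collections.abc import Sequence

BLOCK_SIZE = 16  # tokens per cache block (assumed value)


def block_keys(tokens: Sequence[int], block_size: int) -> list[tuple[int, ...]]:
    """Key each *full* block by the whole prefix ending at that block, so a
    block can only hit if every token before it also matches."""
    n_full = len(tokens) // block_size
    return [tuple(tokens[: (i + 1) * block_size]) for i in range(n_full)]


class PrefixCache:
    """Hypothetical block-level prefix cache that only tracks which blocks exist."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: set[tuple[int, ...]] = set()

    def cached_prefix_len(self, tokens: Sequence[int]) -> int:
        hit = 0
        for key in block_keys(tokens, self.block_size):
            if key not in self.blocks:
                break  # the first miss ends the reusable prefix
            hit += self.block_size
        return hit

    def insert(self, tokens: Sequence[int]) -> None:
        self.blocks.update(block_keys(tokens, self.block_size))


def estimate_ttft(prompt_len: int, cached_len: int,
                  prefill_tok_per_s: float, overhead_s: float = 0.0) -> float:
    # Assumed linear model: only uncached tokens cost prefill time, and at
    # least one token must still run to produce logits for the first output.
    uncached = max(prompt_len - cached_len, 1)
    return overhead_s + uncached / prefill_tok_per_s


if __name__ == "__main__":
    system_prompt = list(range(64))                       # shared prefix
    req1 = system_prompt + [1000 + i for i in range(40)]  # 104 tokens
    req2 = system_prompt + [2000 + i for i in range(40)]  # same prefix, new query
    cache = PrefixCache()
    cache.insert(req1)
    hit = cache.cached_prefix_len(req2)                   # 64 tokens
    print("cold TTFT:", estimate_ttft(len(req2), 0, prefill_tok_per_s=1000.0))
    print("warm TTFT:", estimate_ttft(len(req2), hit, prefill_tok_per_s=1000.0))
```

Because only full blocks are keyed, a shared prefix that ends mid-block loses its partial block, which is one reason a token-level hit rate can come out lower than the textual overlap between prompts suggests.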

Featured projects

LLM Inference Lab

In Progress

A long-term learning and experiment lab for LLM inference systems.

Phase: 01
Focus: KV cache from scratch
Outputs: notes + experiments + benchmark reports

Python · PyTorch · vLLM · SGLang

#llm-inference #kv-cache #vllm #sglang

Personal Technical Site

In Progress

A personal technical site built with Astro and Markdown to document my learning process, technical notes, experiment reports, and portfolio.

Stack: Astro + Markdown + Docker
Purpose: learning notes and portfolio

Astro · TypeScript · Tailwind CSS · Markdown

#astro #typescript #markdown #portfolio

Technical tags

#KV Cache #vLLM #SGLang #LMCache #Benchmark #Serving Systems