LLM Foundations

Everything a software engineer needs to understand about large language models: how transformers work, the model landscape, prompt engineering as a discipline, and when and how to fine-tune.

Tokenization Deep Dive

How LLMs break text into tokens, why BPE is the dominant algorithm, and the practical implications for cost, context limits, and multilingual performance. Includes hands-on token counting with tiktoken and cross-model comparisons.

beginner · 10 min
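
A taste of the hands-on exercise, as a minimal sketch: counting tokens with tiktoken. The encoding name is an assumption here; cl100k_base matches GPT-3.5/4-era OpenAI models, and other vendors use different tokenizers, so counts do not transfer across model families.

```python
# Minimal token counting with tiktoken (pip install tiktoken).
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

print(count_tokens("Hello, world!"))   # a handful of tokens
print(count_tokens("你好，世界"))       # non-Latin text often costs more tokens per character
```
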
The Inference Pipeline

What actually happens when you call an LLM API, from prompt tokenization through logit computation to output sampling. Understand KV caching, sampling strategies (temperature, top-p, top-k), batching, and how these choices affect output quality and latency.

intermediate · 11 min
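
To make the sampling step concrete, here is a toy temperature-plus-top-p (nucleus) sampler over a small logit vector. Production inference runs this on the GPU over the full vocabulary; the shapes and numbers below are illustrative only.

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 1.0, top_p: float = 0.9) -> int:
    z = logits / temperature
    z -= z.max()                              # stabilize before exponentiating
    probs = np.exp(z)
    probs /= probs.sum()                      # softmax with temperature
    order = np.argsort(probs)[::-1]           # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]                     # smallest set covering top_p mass
    kept = probs[keep] / probs[keep].sum()    # renormalize the nucleus
    return int(np.random.choice(keep, p=kept))

token_id = sample(np.array([2.0, 1.0, 0.5, -1.0]), temperature=0.7, top_p=0.9)
```
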
Context Windows & Attention

What context windows really mean, why the 'lost in the middle' problem plagues long-context models, how attention patterns change at different positions, and practical strategies for working within context limits.

intermediate · 10 min
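
One of the simpler strategies, sketched: budget long inputs by token count, keeping the head and tail of the document, since models tend to attend most reliably near the start and end of the context. The budget heuristic and encoding name are assumptions for illustration.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(text: str, budget: int) -> str:
    tokens = enc.encode(text)
    if len(tokens) <= budget:
        return text
    half = budget // 2
    # Keep the start and end, where long-context models attend most reliably.
    return enc.decode(tokens[:half]) + "\n...\n" + enc.decode(tokens[-half:])
```
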
Why LLMs Hallucinate

LLMs hallucinate because they are statistical pattern matchers, not knowledge databases. Understand the types of hallucination, when they are most likely, practical mitigation strategies, and why designing around hallucination is more realistic than eliminating it.

intermediate · 9 min
Model Families Compared

A comprehensive comparison of the major LLM families: GPT (OpenAI), Claude (Anthropic), Gemini (Google), and leading open models (Llama, Mistral, Qwen). Pricing, capabilities, context windows, and when to use each.

beginner · 12 min
Open vs Closed Models

The trade-offs between closed-source API models (GPT-4, Claude) and open-weight models (Llama, Mistral). When self-hosting makes economic sense, licensing traps to avoid, and a decision framework for choosing between them.

intermediate · 10 min
Model Selection Framework

A systematic framework for choosing the right LLM for your use case across four dimensions: capability, cost, latency, and privacy. Includes model scorecards, multi-model strategies, fallback chains, and a working model router implementation.

intermediate · 11 min
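
A minimal flavor of such a router, as a sketch: send cheap-and-easy requests to a small model and escalate otherwise. Model names, prices, and thresholds are placeholders, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    usd_per_1m_tokens: float
    strong_reasoning: bool

CHEAP = Model("small-model", 0.15, strong_reasoning=False)
STRONG = Model("frontier-model", 5.00, strong_reasoning=True)

def route(prompt: str, needs_reasoning: bool) -> Model:
    # Escalate to the stronger (pricier) model only when the task demands it.
    if needs_reasoning or len(prompt) > 4000:
        return STRONG
    return CHEAP
```
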
Reading Benchmarks Critically

How to interpret LLM benchmarks without being misled. Covers the major benchmarks (MMLU, HumanEval, MATH, Arena Elo), what they actually test, benchmark contamination, and how to build a task-specific benchmark that reflects your own workload.

intermediate · 9 min
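
The skeleton of a task-specific benchmark is small: golden input/expected pairs plus an accuracy loop. `call_model` is a hypothetical stand-in for your LLM call, and exact-match scoring is the simplest possible grader.

```python
# Bare-bones task-specific benchmark: golden examples in, accuracy out.
golden = [
    {"input": "2 + 2 = ?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

def evaluate(call_model) -> float:
    # call_model is a hypothetical callable wrapping your LLM of choice.
    hits = sum(call_model(ex["input"]).strip() == ex["expected"] for ex in golden)
    return hits / len(golden)
```
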
Multimodal Models

How modern LLMs process images, audio, and video alongside text. Covers vision capabilities (image understanding, OCR, diagram analysis), audio features, current limitations, practical use cases, and working code examples for extracting structured data from images.

intermediate · 10 min
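
A hedged sketch of the image-extraction pattern, using OpenAI's Python SDK shape for vision inputs. The model name and file path are illustrative, and vision APIs change, so check current provider docs.

```python
import base64
from openai import OpenAI

client = OpenAI()
with open("invoice.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Return the invoice total and date as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```
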
Prompt Anatomy

The structural components of an LLM prompt: system messages, user messages, and assistant messages. How each part influences model behavior, why system prompts are privileged, and practical demonstrations of how prompt structure transforms output quality.

beginner · 10 min
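
The three roles in miniature, using the OpenAI-style message shape as an example. The system message carries privileged, persistent instructions; user and assistant turns carry the conversation. The prompt contents are invented for illustration.

```python
messages = [
    {"role": "system", "content": "You are a terse SQL assistant. Answer with SQL only."},
    {"role": "user", "content": "Count orders per customer."},
    {"role": "assistant", "content": "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;"},
    {"role": "user", "content": "Now only for 2024."},
]
```
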
Techniques That Work

Evidence-based prompt engineering techniques: chain-of-thought reasoning, self-consistency, role prompting, and step-by-step decomposition. When each technique helps, when it hurts, and how to measure the improvement.

intermediate · 11 min
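
Self-consistency in a few lines, as a sketch: sample several chain-of-thought completions at nonzero temperature and majority-vote the final answers. `ask_with_cot` is a hypothetical callable wrapping your LLM call.

```python
from collections import Counter

def self_consistent_answer(question: str, ask_with_cot, n: int = 5) -> str:
    # ask_with_cot(question, temperature=...) should return the final answer
    # extracted from one chain-of-thought completion.
    answers = [ask_with_cot(question, temperature=0.8) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```
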
Structured Output Techniques

Getting reliable JSON, structured data, and type-safe outputs from LLMs. Covers JSON mode, function calling, constrained decoding, Pydantic validation, and handling partial/malformed output in streaming scenarios.

intermediate · 10 min
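
The validation half of that pattern, sketched with Pydantic v2: parse the model's JSON against a schema and catch anything malformed. `raw` stands in for a response requested in JSON mode.

```python
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    total: float
    currency: str

raw = '{"total": 42.5, "currency": "EUR"}'  # stand-in for an LLM response
try:
    invoice = Invoice.model_validate_json(raw)
except ValidationError as e:
    # Malformed output: retry, repair, or fall back.
    print(e)
```
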
Systematic Prompt Iteration

How to version-control prompts, A/B test with statistical significance, build prompt test suites with golden examples, and run regression tests to ensure new prompts don't break old cases. A disciplined engineering approach to prompt development.

advanced · 10 min
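
A regression test over golden examples can be an ordinary pytest file, as sketched below. `PROMPT_V2`, `run_prompt`, and the golden cases are hypothetical; wire `run_prompt` to your actual stack.

```python
import pytest

PROMPT_V2 = "Classify this support ticket as one word: refund, shipping, or other."

def run_prompt(prompt: str, ticket: str) -> str:
    raise NotImplementedError("replace with your LLM call")

GOLDEN = [
    ("I want my money back for order #123.", "refund"),
    ("My package is two weeks late.", "shipping"),
]

@pytest.mark.parametrize("ticket,expected", GOLDEN)
def test_new_prompt_keeps_old_cases_passing(ticket, expected):
    assert run_prompt(PROMPT_V2, ticket) == expected
```
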
When Prompting Isn't Enough

How to recognize when prompt engineering has hit its ceiling and which escalation path to take. A decision framework: improve the prompt -> add context (RAG) -> fine-tune -> change models, with cost-benefit analysis and real examples.

advanced · 9 min
When to Fine-Tune

A decision framework for choosing between prompt engineering, RAG, and fine-tuning. When fine-tuning is the right investment, when it is a waste of time, cost analysis comparing approaches, and the use cases where fine-tuning delivers the most value.

intermediate · 11 min
Training Data Engineering

How to prepare high-quality training data for LLM fine-tuning. Covers data formats, quality-over-quantity principles, data cleaning and deduplication, synthetic data generation, and a complete data preparation pipeline.

advanced · 12 min
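
The deduplication step in miniature, as a sketch: normalize whitespace and case, hash, keep the first occurrence. The `messages` field follows the common chat fine-tuning JSONL format, which is an assumption about your data.

```python
import hashlib
import json

def dedupe_jsonl(path_in: str, path_out: str) -> None:
    seen = set()
    with open(path_in) as fin, open(path_out, "w") as fout:
        for line in fin:
            record = json.loads(line)
            # Normalize before hashing so trivial formatting differences collide.
            normalized = " ".join(json.dumps(record["messages"]).lower().split())
            key = hashlib.sha256(normalized.encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                fout.write(line)
```
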
LoRA & QLoRA

Parameter-efficient fine-tuning with LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA). How they work intuitively, why they need only 1-10% of the memory of full fine-tuning, how to choose hyperparameters (rank, alpha, target modules), and a complete configuration example with Hugging Face PEFT.

advanced · 11 min
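
A hedged LoRA configuration sketch with Hugging Face PEFT. The rank, alpha, and target modules are typical starting points rather than universal answers, target module names vary by architecture, and the base model name is illustrative.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor (often 2x rank)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # typically ~1% of base parameters
```
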
End-to-End Fine-Tuning Pipeline

Complete fine-tuning pipelines for three approaches: OpenAI (simplest), Hugging Face + PEFT (most control), and cloud-managed (Vertex AI, Bedrock). Includes training monitoring, loss curve interpretation, overfitting detection, and a full working Hugging Face training script.

advanced · 12 min
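
The simplest of the three paths, sketched with the OpenAI Python SDK: upload a JSONL training file, then start a fine-tuning job. The base model name is illustrative; check current docs for available models.

```python
from openai import OpenAI

client = OpenAI()
file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative base model
)
print(job.id, job.status)
```
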
Evaluating Fine-Tuned Models

How to rigorously evaluate fine-tuned LLMs: train/validation/test splitting for LLMs, detecting overfitting and benchmark contamination, A/B testing fine-tuned vs base models with real users, and a complete evaluation harness implementation.

advanced · 10 min
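
The splitting step in miniature, as a sketch: deduplicate first (near-duplicates that straddle splits inflate validation and test scores), then shuffle with a fixed seed and slice. The 80/10/10 ratio is a common convention, not a rule.

```python
import random

def split(examples: list, seed: int = 42) -> tuple[list, list, list]:
    # Assumes `examples` has already been deduplicated.
    rng = random.Random(seed)
    data = examples[:]           # don't mutate the caller's list
    rng.shuffle(data)
    n = len(data)
    return (
        data[: int(0.8 * n)],
        data[int(0.8 * n): int(0.9 * n)],
        data[int(0.9 * n):],
    )
```
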