AGENTS.md vs CLAUDE.md vs Cursor Rules Compared
Where AI coding agent instructions actually live in Codex, Claude Code, and Cursor. A full comparison.
Learn how Best-of-N sampling helps coding agents solve harder tasks by exploring multiple candidate solutions.
Design a multi-agent PR review workflow with explorer, reviewer, and docs-research subagents.
Opinionated review of what Cursor Bugbot catches well and what still needs human reviewers in your PR workflow.
Learn how Claude Code hooks enforce real engineering workflows with PreToolUse, PostToolUse, and SessionStart.
Learn what belongs in CLAUDE.md vs auto memory in Claude Code to build a reliable, persistent coding setup.
When to use Codex skills, Cursor rules, or Claude subagents. A practical guide to AI workflow design.
Compare Codex and Claude Code subagents on context isolation, cost, parallelism, and when they're worth it.
Compare Cursor background agents and Codex cloud tasks for async coding: remote execution, branch isolation, and more.
Compare how Cursor memories and Claude auto memory store preferences and affect AI coding output quality.
Learn how to write scoped, testable AGENTS.md instructions that Codex reliably follows instead of ignoring.
Learn how Docs MCP servers and instruction files like AGENTS.md force AI coding agents to cite real documentation.
Learn how Model Context Protocol powers coding agents like Claude Code, Cursor, and Codex with tools and data.
How to configure Claude Code, Cursor, and Codex for monorepo AI coding with nested instructions and scoped rules.
How to give coding agents internet access safely with scoped permissions, MCP hardening, and network policies.
Two protocols now define how AI agents connect to the world. They solve different problems and are often confused for competitors.
The complete stack delivers a production RAG system that outperforms naive implementations on retrieval recall, answer accuracy, and end-to-end pipeline cost.
SWE-bench established a high bar for software engineering agents in 2023 and became the dominant leaderboard, but it measures only one type of agent task.
When they are not, agents confidently repeat mistakes made three sessions ago and forget critical user preferences the moment a conversation ends.
That is the LLM forward pass. I implemented every step in C for EdgeLM, our lightweight inference engine built to run transformer models without a GPU.
The math behind that claim is straightforward. When every weight is constrained to {-1, 0, 1}, matrix multiplication reduces to additions and subtractions.
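A minimal sketch of that claim (pure Python, the function name and shapes are illustrative, not from any real inference engine): with weights restricted to {-1, 0, 1}, every dot product collapses into additions and subtractions of activations, with zero weights skipped entirely.

```python
# Multiply-free matrix-vector product for ternary weights.
# Each dot product is a running sum of added or subtracted activations.

def ternary_matvec(W, x):
    """W: rows of weights drawn from {-1, 0, 1}; x: input activations."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the activation
            elif w == -1:
                acc -= xi      # weight -1: subtract it
            # weight 0: skip entirely (free sparsity)
        out.append(acc)
    return out

W = [[1, 0, -1], [-1, 1, 1]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-2.0, 5.0]
```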
Step-by-step guide to building a production MCP server in Python: tool registration, input validation, error handling, authentication, and deployment.
Nobody writes about building browser extensions anymore. The market moved to standalone apps, Electron wrappers, and web-based SaaS.
How FlashAttention works technically: GPU memory hierarchy, tiling for SRAM, the online softmax trick, and FlashAttention-2 warp partitioning.
Understanding its internals is a prerequisite to understanding why inference is expensive, why context length matters, and what the industry is doing about it.
The second genuinely needs a frontier model at $0.02. Using GPT-4 or Claude Opus for everything means overpaying by 20x on simple queries.
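The back-of-envelope math, with per-query prices as stated (the 90/10 traffic split is an illustrative assumption, not a measured figure):

```python
# Routing arithmetic: frontier model vs. small model per-query cost.
frontier_cost = 0.02   # $ per query (frontier model)
small_cost = 0.001     # $ per query (small model, assumed)

overpay_factor = frontier_cost / small_cost
print(round(overpay_factor))  # 20

# Assumed split: 90% of queries are simple enough for the small model.
queries = 10_000
all_frontier = queries * frontier_cost
blended = 0.9 * queries * small_cost + 0.1 * queries * frontier_cost
print(round(all_frontier, 2), round(blended, 2))  # 200.0 29.0
```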
MCP security threats explained: tool poisoning, tool shadowing, rug pulls, and OAuth token theft, with concrete detection and mitigation code for builders.
FLOPS (floating-point operations per second) measure how fast a chip can compute. For autoregressive LLM inference, compute is not the bottleneck.
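A worked estimate of why memory bandwidth dominates (the hardware numbers and model size here are illustrative assumptions, roughly H100-class and a 7B fp16 model): decoding one token reads every weight once, so the token rate is capped by bandwidth divided by model size, regardless of FLOPS.

```python
# Illustrative bandwidth-bound decoding ceiling (all specs assumed).
bandwidth_gb_s = 2000   # HBM bandwidth, ~2 TB/s (assumption)
params_b = 7            # 7B-parameter model (assumption)
bytes_per_param = 2     # fp16

model_gb = params_b * bytes_per_param          # 14 GB of weights
max_tokens_per_s = bandwidth_gb_s / model_gb   # every token re-reads all weights
print(model_gb, round(max_tokens_per_s))       # 14 143
```

At that rate the chip's arithmetic units sit mostly idle: the ceiling moves only if you shrink the bytes read per token (quantization, batching, MoE) or raise bandwidth.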
How sparse MoE models work: expert routing, activation patterns, memory layout, and inference optimization. Mixtral 47B activates only 13B parameters per token.
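The routing idea can be sketched in a few lines (pure Python, no frameworks; the dot-product gate and toy experts are illustrative assumptions, not Mixtral's implementation): score all experts, run only the top two, and mix their outputs by softmax-normalized gate scores. Everything outside the top-k never executes, which is why active parameters are a fraction of total parameters.

```python
# Toy top-2 expert routing: only the two selected experts run per token.
import math

def top2_route(token, gate_weights, experts):
    # Gate: score each expert for this token (toy dot-product gate).
    scores = [sum(t * w for t, w in zip(token, gw)) for gw in gate_weights]
    # Select the two highest-scoring experts.
    top2 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
    # Softmax-normalize the winners' scores into mixing weights.
    exps = [math.exp(scores[i]) for i in top2]
    probs = [e / sum(exps) for e in exps]
    # Run ONLY the chosen experts and blend their outputs.
    outs = [experts[i](token) for i in top2]
    return [sum(p * o[d] for p, o in zip(probs, outs)) for d in range(len(token))]

experts = [lambda t, k=k: [x * k for x in t] for k in (1, 2, 3, 4)]
gate_weights = [[1, 0], [0, 1], [2, 0], [-1, 0]]
out = top2_route([1.0, 0.0], gate_weights, experts)
print(round(out[0], 3), out[1])  # 2.462 0.0
```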
When a provider's inference server has already computed the key-value (KV) cache for a sequence of tokens, it can reuse that computation instead of redoing it.
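A toy sketch of that reuse (the dict-backed cache and string "KV states" are stand-ins for illustration, not any provider's real API): on each request, find the longest already-cached token prefix, reuse its stored state, and compute only the new suffix.

```python
# Toy prefix caching: reuse KV state for any previously seen token prefix.
kv_cache = {}  # maps a token-prefix tuple -> its (fake) KV state

def compute_kv(tokens, start_state=None):
    # Stand-in for running the transformer over `tokens`.
    state = list(start_state or [])
    state.extend(f"kv({t})" for t in tokens)
    return state

def prefill(tokens):
    # Find the longest cached prefix of this request.
    for cut in range(len(tokens), 0, -1):
        prefix = tuple(tokens[:cut])
        if prefix in kv_cache:
            state = compute_kv(tokens[cut:], start_state=kv_cache[prefix])
            kv_cache[tuple(tokens)] = state
            return state, cut  # cut = tokens reused, not recomputed
    state = compute_kv(tokens)  # cold cache: compute everything
    kv_cache[tuple(tokens)] = state
    return state, 0

_, reused = prefill(["sys", "a", "b"])
print(reused)  # 0  (cold cache)
_, reused = prefill(["sys", "a", "b", "c"])
print(reused)  # 3  (shared prefix reused, only "c" computed)
```

This is why providers discount cached input tokens: the expensive prefill work for a shared system prompt happens once.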
For simple tasks like summarization, translation, factual lookup, and basic formatting, reasoning models provide no benefit over standard models at higher cost.
Error taxonomy, retry strategies, circuit breakers, idempotent tool design, human-in-the-loop escalation gates, observability, and testing patterns.
The boundary is enforced by sandboxing: isolate code execution in an environment with minimal capabilities and controlled resource limits.
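A minimal sketch of that principle on a POSIX host (an illustrative subset only; real sandboxes layer namespaces, seccomp, and network isolation on top of rlimits): run untrusted code in a child interpreter with CPU-time and address-space caps applied before it starts.

```python
# Minimal sandbox sketch: child process with rlimits and a wall-clock timeout.
import resource
import subprocess
import sys

def limit_resources():
    # Applied in the child just before exec: cap CPU seconds and memory.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))             # 2 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)  # 512 MB addr space

def run_untrusted(code: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env/site
        preexec_fn=limit_resources,          # POSIX-only hook
        capture_output=True, text=True, timeout=5,
    )

result = run_untrusted("print(1 + 1)")
print(result.stdout.strip())  # 2
```

An infinite loop in the submitted code now hits the CPU rlimit or the 5-second timeout instead of pinning the host.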
Speculative decoding achieves 2–3x LLM inference speedup by having a small draft model guess ahead and a large target model verify in parallel.
The practical result: 100% format validity, at the cost of some computational overhead and occasional semantic degradation when the format constraint is tight.
On AIME 2024 mathematics problems, o1 solved 83% compared to GPT-4o's 13%. On PhD-level science questions, o1 matched or exceeded PhD expert performance.
Cursor, Windsurf, and AI-native IDEs are impressive. After six months, I switched back to VS Code with targeted tools. Here's what the AI IDE market gets wrong.
Autonomous SEO means AI agents that research, write, and optimize content without prompting. What I learned building Authos and why this category matters.
Build content clusters that get cited in ChatGPT, Perplexity, and Google AI Overviews, not just indexed. With cluster templates and measurement strategies.
Research shows GEO tactics boost AI search visibility by up to 40%. Here's exactly what makes content get cited by ChatGPT, Perplexity, and Google AI Overviews.