Hierarchical Context Loading: Why Progressive Disclosure Beats Monolithic Prompts
A three-tier architecture that loads AI context progressively instead of all at once. Using /create-skill as a real example of how navigation → skill → execution prevents token waste and enables session persistence.
Your AI assistant can do more than you think. It just needs better scaffolding.
Every session starts from zero. You explain context, re-establish workflows, reload templates. The AI has the capability; it just lacks the architecture to use it consistently across sessions.
That's the problem hierarchical context loading solves. Instead of dumping 8,000 lines of context into every session, you load three tiers progressively: navigation (what exists), skills (how to execute), and agents (who does it).
As Anthropic's research on building effective agents argues, the solution is better orchestration, not bigger models.
The Context Overload Problem
Most AI frameworks do this:
User: "Create a new skill"
↓
AI loads EVERYTHING:
├── All agents (800+ lines)
├── All skills (5,000+ lines)
├── All templates (2,000+ lines)
├── All workflows (3,000+ lines)
└── Total: 10,000+ lines loaded
Problems:
- Wastes tokens on irrelevant context
- Slow session startup
- Hard to maintain (updating one thing breaks others)
- Can't fit large reference materials in context
- The AI gets lost in the noise
The Solution: Three-Tier Progressive Loading
Instead of loading everything, load three tiers on-demand:
┌─────────────────────────────────────────────────────────┐
│ Tier 1: CLAUDE.md (~315 lines) │
│ "Where do I go? What exists?" │
│ │
│ ├─ Agent registry (security, writer, engineer) │
│ ├─ Skill directory (what skills exist) │
│ ├─ Routing rules (which agent handles what) │
│ └─ Global preferences │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Tier 2: skills/[skill-name]/SKILL.md (~300-500 lines) │
│ "How do I execute this?" │
│ │
│ ├─ Skill identity and purpose │
│ ├─ 5-phase workflow structure │
│ ├─ Tool and script inventory │
│ ├─ Pointers to supporting docs │
│ └─ Success criteria │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Tier 3: Execution Context (on-demand) │
│ "What do I need right now?" │
│ │
│ Phase 1 → requirements-questions.md (~200 lines) │
│ Phase 2 → naming-conventions.md │
│ Phase 3 → templates/ (loaded one at a time) │
│ Phase 4 → quality-checklist.md │
│ Phase 5 → (no extra docs needed) │
└─────────────────────────────────────────────────────────┘
Result: Dramatic token reduction while maintaining full capability.
Real Example: The /create-skill Workflow
Let me show you how this works using the actual /create-skill command: the skill that creates new skills.
When you type /create-skill, here's what loads:
Tier 1: Navigation (CLAUDE.md)
File: CLAUDE.md (315 lines)
Purpose: "What skills exist and where do I find them?"
## Available Skills
- create-skill → Interactive skill scaffolding wizard
## Agent Routing
- Skill creation → engineer agent (for complex workflows)
- Simple templates → Base Claude (no agent needed)
That's it. Just a pointer to the skill. No workflows, no templates, no methodology.
Tier 2: Skill Execution (SKILL.md)
File: skills/create-skill/SKILL.md (458 lines)
Purpose: "How do I create a skill? What's the process?"
Now the skill loads its own context:
# Create-Skill Workflow
## 5-Phase Process:
1. DISCOVER - Gather requirements
2. DESIGN - Plan structure
3. GENERATE - Create files from templates
4. VALIDATE - Run quality checks
5. HANDOFF - Guide user to customize
## Templates Used:
- skills/create-skill/templates/SKILL-TEMPLATE.md
- skills/create-skill/templates/README-TEMPLATE.md
- skills/create-skill/templates/VERIFY-TEMPLATE.md
- skills/create-skill/templates/phases/PHASE-TEMPLATE.md
## Supporting Docs (load on-demand):
- docs/requirements-questions.md
- docs/naming-conventions.md
- docs/skill-structure-standards.md
Notice: The SKILL.md doesn't contain the templates themselves. It just points to them.
Tier 3: On-Demand Reference
Only when needed, the skill loads specific supporting docs:
Phase 1 (DISCOVER) → Load requirements-questions.md
Phase 2 (DESIGN) → Load naming-conventions.md
Phase 3 (GENERATE) → Load templates one at a time
Phase 4 (VALIDATE) → Load quality checklist
Phase 5 (HANDOFF) → Nothing extra needed
Total loaded during Phase 1:
- CLAUDE.md: 315 lines
- SKILL.md: 458 lines
- requirements-questions.md: ~200 lines
- Total: ~973 lines
The traditional approach would load 10,000+ lines (everything). That's an order-of-magnitude reduction while maintaining full capability.
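The phase-to-document routing above amounts to a literal lookup table. Here's a hypothetical sketch of it in code; in the framework itself this mapping lives in SKILL.md markdown, not Python:

```python
# Phase-to-document map for /create-skill, as listed above.
# Only the doc for the current phase is read into context.
PHASE_DOCS = {
    1: "docs/requirements-questions.md",   # DISCOVER
    2: "docs/naming-conventions.md",       # DESIGN
    3: None,  # GENERATE: templates are loaded one at a time instead
    4: "docs/quality-checklist.md",        # VALIDATE
    5: None,  # HANDOFF: no extra docs needed
}

def docs_for_phase(phase: int) -> list[str]:
    """Return the supporting docs to load for a given phase."""
    doc = PHASE_DOCS.get(phase)
    return [doc] if doc else []
```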
How This Enables Session Persistence
Here's where it gets powerful.
Traditional approach:
Session 1: User starts creating skill
Session 2 (next day): AI has no memory
User re-explains everything
AI reloads all context
Hierarchical approach with checkpointing:
┌──────────────────────────────────────────────────────────┐
│ SESSION 1: Initial Work │
└──────────────────────────────────────────────────────────┘
User: "Create new skill for API testing"
↓
AI loads context:
├─ CLAUDE.md (315 lines)
├─ skills/create-skill/SKILL.md (458 lines)
└─ requirements-questions.md (Phase 1)
↓
Work completes Phase 2 (DESIGN)
↓
AI creates checkpoint:
sessions/2026-01-29-api-testing-skill.md
├─ Skill name: api-testing
├─ Phase completed: 2 (DESIGN)
├─ Decisions: REST focus, TypeScript, oauth support
└─ Files created: SKILL.md draft, README.md draft
┌──────────────────────────────────────────────────────────┐
│ SESSION 2: Resume Work (next day) │
└──────────────────────────────────────────────────────────┘
User: "Continue the API testing skill"
↓
AI loads same context:
├─ CLAUDE.md (315 lines)
├─ skills/create-skill/SKILL.md (458 lines)
└─ sessions/2026-01-29-api-testing-skill.md (checkpoint)
↓
AI reads checkpoint and knows:
├─ Current phase: 3 (GENERATE)
├─ Context: REST API testing, TypeScript, oauth
└─ Next action: Load templates, generate files
↓
Continues exactly where the previous session left off
The checkpoint file stores:
- Which skill was being created
- Which phase was completed
- Decisions already made
- Files already created
Combined with hierarchical loading, the AI can resume work across sessions because it knows:
- Where it is (checkpoint file)
- What to do (SKILL.md workflow)
- How to do it (on-demand templates)
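The checkpoint mechanics are simple enough to sketch. The framework stores checkpoints as markdown in sessions/; this illustration uses JSON instead for brevity, and the field names are assumptions based on the example above:

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    """Persist session state so a later session can resume mid-workflow."""
    path.write_text(json.dumps(state, indent=2))

def resume(path: Path) -> dict:
    """Load a checkpoint; the caller then jumps to the recorded phase."""
    return json.loads(path.read_text())
```

A session-1 checkpoint for the example above might be `save_checkpoint(path, {"skill": "api-testing", "phase_completed": 2, "decisions": ["REST focus", "TypeScript", "oauth support"]})`; session 2 calls `resume(path)` and continues at phase 3.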
The Three Context Tiers Explained
Tier 1: Organization (CLAUDE.md)
Size: ~315 lines
Purpose: Navigation layer
Contains:
- Agent registry (security, writer, engineer, advisor, legal)
- Skill directory (what skills exist)
- Routing rules (which agent handles what)
- Critical global requirements
Does NOT contain:
- Workflows
- Methodologies
- Templates
- Tool documentation
- Implementation details
Update when:
- New skill added
- New agent created
- Routing rules change
Tier 2: Skills (skills/*/SKILL.md)
Size: 300-500 lines per skill
Purpose: Complete skill context with progressive loading
Contains:
- Skill identity and purpose
- 5-phase workflow structure
- Tool and script inventory
- Pointers to supporting docs
- Success criteria
- Output specifications
Progressive pattern:
SKILL.md acts as navigation to:
├── docs/ - Reference documentation
├── templates/ - Output templates
├── workflows/ - Detailed procedures
└── phases/ - Step-by-step execution
Update when:
- Workflow changes
- New tools added
- Methodology refined
Tier 3: Agents (agents/*.md)
Size: ~170-190 lines per agent
Purpose: Agent identity and specialized behavior
Contains:
- Agent role definition
- Communication style
- Context loading instructions
- Skill routing (which skills this agent uses)
Example:
# Engineer Agent
## Core Identity
Implementation specialist. Infrastructure, remediation, deployment.
## Skills Used
- create-skill (skill scaffolding)
- infrastructure-ops (deployment automation)
- remediation (fix security findings)
## Context Loading
1. Read CLAUDE.md
2. Load skill based on user request
3. Read session checkpoint (if exists)
4. Execute skill workflow
Update when:
- Agent behavior changes
- New skills become available
- Communication style evolves
Catalog-Based Discovery
To prevent even the navigation layer from becoming bloated, the framework uses catalog files:
library/catalogs/COMMANDS.md:
Complete list of all slash commands
├── /create-skill → create-skill skill (public)
├── /pentest → security skill (private)
├── /career → career skill (public)
└── [47 total commands]
library/catalogs/TOOL-CATALOG.md:
Complete API client and utility inventory
├── Ghost CMS client (authenticated)
├── OpenAI client (authenticated)
└── [Tool status and authentication requirements]
Why catalogs?
- CLAUDE.md stays concise (points to catalog)
- Catalog can grow without bloating navigation
- Easy to search and reference
- Automated filtering for public/private split
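A catalog in this shape is trivially machine-readable, which is what makes the public/private filtering automatable. Here's a hypothetical parser assuming entries of the form `/cmd → target (public|private)` as in the excerpt above; the real catalog format may differ:

```python
def parse_catalog(text: str) -> dict[str, dict]:
    """Parse a COMMANDS.md-style catalog into {command: {target, visibility}}."""
    entries = {}
    for line in text.splitlines():
        if "→" not in line:
            continue  # headings and tree-drawing lines carry no entry
        left, right = line.split("→", 1)
        cmd = left.strip("├──└│ \t")          # drop the ASCII-tree prefix
        target, _, vis = right.strip().rpartition("(")
        entries[cmd.strip()] = {"target": target.strip(),
                                "visibility": vis.rstrip(")")}
    return entries
```

With this in hand, the public/private split is a one-line filter over `entries`.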
Enforcing the Architecture
Pre-commit hooks enforce size limits:
# Validation before every commit
hooks/validate-context-sizes.sh
Checks:
✓ CLAUDE.md < 400 lines (currently 315)
✓ agents/*.md < 200 lines (currently ~186)
✓ skills/*/SKILL.md < 500 lines (currently ~300-458)
If violated → Commit blocked
Why strict limits?
Without enforcement, entropy wins. Files bloat. Context becomes noise. The architecture degrades.
Hard limits force:
- Separation of concerns
- Progressive disclosure
- Catalog-based discovery
- On-demand loading
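The actual hook is a shell script (hooks/validate-context-sizes.sh); an equivalent check can be sketched in a few lines of Python. The limits come from the article; the function name and glob-based structure are assumptions:

```python
from pathlib import Path

# Per-tier line limits, as enforced by the pre-commit hook described above.
LIMITS = {"CLAUDE.md": 400, "agents/*.md": 200, "skills/*/SKILL.md": 500}

def violations(root: Path) -> list[str]:
    """Return files exceeding their tier's line limit; empty list = commit OK."""
    bad = []
    for pattern, limit in LIMITS.items():
        for f in root.glob(pattern):
            lines = f.read_text().count("\n") + 1
            if lines > limit:
                bad.append(f"{f}: {lines} lines (limit {limit})")
    return bad
```

Wired into pre-commit, a non-empty return blocks the commit, which is what keeps entropy from winning.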
Benefits in Practice
Token efficiency:
- Load only what's needed for current task
- Avoid dumping entire framework into every session
- Enables larger reference materials by not front-loading them
Maintainability:
- Update one skill file, not monolithic config
- Clear separation of what goes where
- Easy to find and modify specific workflows
Scalability:
- Add new skills without bloating navigation
- Skills self-contained (input/, output/, scripts/)
- Catalog grows independently of core files
Session persistence:
- Checkpoint files store session state
- Hierarchical loading brings back exact context
- Resume multi-day work exactly where you left off
Try It Yourself
The Intelligence Adjacent framework implements this architecture:
# Clone
git clone https://github.com/notchrisgroves/ia-framework.git
# Install (creates ~/.claude symlink)
./setup/install.sh
# Try creating a skill
/create-skill
# Observe hierarchical loading
# 1. CLAUDE.md loads first (navigation)
# 2. create-skill/SKILL.md loads (execution)
# 3. Supporting docs load on-demand (templates)
Watch the context load progressively. Notice:
- Fast startup (small nav file)
- Clear workflow (SKILL.md)
- On-demand templates (only when needed)
The Philosophy
Intelligence Adjacent means AI working alongside human intelligence—not replacing it.
Hierarchical context loading is scaffolding. It gives AI:
- Context without overload
- Methodology without rigidity
- Memory across sessions
- Capability without complexity
The AI handles execution. You handle judgment.
Orchestration over intelligence. Architecture over scale. Augmentation over automation.
That's hierarchical context loading in the IA framework.
Sources
Framework & Architecture
- IA Framework Repository - Full source code and documentation
- Building Effective Agents - Anthropic - Prompt chaining and workflow orchestration patterns
- Context Management Patterns - Anthropic guidance on managing large context windows
Implementation References
- Create-Skill Source - Complete /create-skill implementation
- Hierarchical Context Loading Documentation - Architecture specification
- Agent Format Standards - Size limits and structure enforcement