Hierarchical Context Loading: Why Progressive Disclosure Beats Monolithic Prompts

A three-tier architecture that loads AI context progressively instead of all at once. Using /create-skill as a real example of how navigation → skill → execution prevents token waste and enables session persistence.

Cyberpunk anime: A figure navigates hierarchical tiers of floating knowledge - navigation, agents, and skills - pulling only needed context from the void

Your AI assistant can do more than you think. It just needs better scaffolding.

Every session starts from zero. You explain context, re-establish workflows, reload templates. The AI has capability—it just lacks architecture to use it consistently across sessions.

That's the problem hierarchical context loading solves. Instead of dumping 8,000 lines of context into every session, you load three tiers progressively: navigation (what exists), skills (how to execute), and execution context (what's needed right now).

As Anthropic's research on building effective agents suggests, the solution is better orchestration, not bigger models.

The Context Overload Problem

Most AI frameworks do this:

User: "Create a new skill"
     ↓
AI loads EVERYTHING:
     ├── All agents (800+ lines)
     ├── All skills (5,000+ lines)
     ├── All templates (2,000+ lines)
     ├── All workflows (3,000+ lines)
     └── Total: 10,000+ lines loaded

Problems:

  • Wastes tokens on irrelevant context
  • Slow session startup
  • Hard to maintain (update one thing, breaks everywhere)
  • Can't fit large reference materials in context
  • AI gets lost in noise

The Solution: Three-Tier Progressive Loading

Instead of loading everything, load three tiers on-demand:

┌─────────────────────────────────────────────────────────┐
│  Tier 1: CLAUDE.md (~315 lines)                         │
│  "Where do I go? What exists?"                          │
│                                                          │
│  ├─ Agent registry (security, writer, engineer)         │
│  ├─ Skill directory (what skills exist)                 │
│  ├─ Routing rules (which agent handles what)            │
│  └─ Global preferences                                  │
└─────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────┐
│  Tier 2: skills/[skill-name]/SKILL.md (~300-500 lines)  │
│  "How do I execute this?"                               │
│                                                          │
│  ├─ Skill identity and purpose                          │
│  ├─ 5-phase workflow structure                          │
│  ├─ Tool and script inventory                           │
│  ├─ Pointers to supporting docs                         │
│  └─ Success criteria                                    │
└─────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────┐
│  Tier 3: Execution Context (on-demand)                  │
│  "What do I need right now?"                            │
│                                                          │
│  Phase 1 → requirements-questions.md (~200 lines)       │
│  Phase 2 → naming-conventions.md                        │
│  Phase 3 → templates/ (loaded one at a time)            │
│  Phase 4 → quality-checklist.md                         │
│  Phase 5 → (no extra docs needed)                       │
└─────────────────────────────────────────────────────────┘

Result: roughly a tenfold reduction in loaded context while maintaining full capability.
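The tier-by-tier loading order can be sketched as a small loader. This is an illustration only, not framework code: the file paths and the `load` helper are assumptions based on the layout shown above.

```python
from pathlib import Path

def load(path: str) -> str:
    """Read one context file; return an empty string if it does not exist."""
    p = Path(path)
    return p.read_text() if p.exists() else ""

def build_context(skill: str, phase_doc: str = "") -> list[str]:
    """Assemble context tier by tier instead of loading everything at once."""
    context = [load("CLAUDE.md")]                     # Tier 1: navigation
    context.append(load(f"skills/{skill}/SKILL.md"))  # Tier 2: execution workflow
    if phase_doc:                                     # Tier 3: only the doc this phase needs
        context.append(load(f"skills/{skill}/docs/{phase_doc}"))
    return [c for c in context if c]
```

The key property is that Tier 3 is a parameter: each phase passes in the one supporting doc it needs, and nothing else enters the context.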

Real Example: The /create-skill Workflow

Let me show you how this works using the actual /create-skill command—the skill that creates new skills.

When you type /create-skill, here's what loads:

Tier 1: Navigation (CLAUDE.md)

File: CLAUDE.md (315 lines)
Purpose: "What skills exist and where do I find them?"

## Available Skills
- create-skill → Interactive skill scaffolding wizard

## Agent Routing
- Skill creation → engineer agent (for complex workflows)
- Simple templates → Base Claude (no agent needed)

That's it. Just a pointer to the skill. No workflows, no templates, no methodology.

Tier 2: Skill Execution (SKILL.md)

File: skills/create-skill/SKILL.md (458 lines)
Purpose: "How do I create a skill? What's the process?"

Now the skill loads its own context:

# Create-Skill Workflow

## 5-Phase Process:
1. DISCOVER - Gather requirements
2. DESIGN - Plan structure
3. GENERATE - Create files from templates
4. VALIDATE - Run quality checks
5. HANDOFF - Guide user to customize

## Templates Used:
- skills/create-skill/templates/SKILL-TEMPLATE.md
- skills/create-skill/templates/README-TEMPLATE.md
- skills/create-skill/templates/VERIFY-TEMPLATE.md
- skills/create-skill/templates/phases/PHASE-TEMPLATE.md

## Supporting Docs (load on-demand):
- docs/requirements-questions.md
- docs/naming-conventions.md
- docs/skill-structure-standards.md

Notice: The SKILL.md doesn't contain the templates themselves. It just points to them.

Tier 3: On-Demand Reference

Only when needed, the skill loads specific supporting docs:

Phase 1 (DISCOVER) → Load requirements-questions.md
Phase 2 (DESIGN) → Load naming-conventions.md
Phase 3 (GENERATE) → Load templates one at a time
Phase 4 (VALIDATE) → Load quality checklist
Phase 5 (HANDOFF) → Nothing extra needed
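The phase-to-document mapping above is just a lookup table. The file names below mirror the list and are illustrative, not confirmed paths:

```python
# Each phase loads at most one supporting doc; None means nothing extra.
PHASE_DOCS = {
    1: "docs/requirements-questions.md",  # DISCOVER
    2: "docs/naming-conventions.md",      # DESIGN
    3: None,  # GENERATE: templates are loaded one at a time instead
    4: "docs/quality-checklist.md",       # VALIDATE
    5: None,  # HANDOFF: nothing extra needed
}

def doc_for_phase(phase: int):
    """Return the supporting doc to load for a phase, or None."""
    return PHASE_DOCS.get(phase)
```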

Total loaded during Phase 1:

  • CLAUDE.md: 315 lines
  • SKILL.md: 458 lines
  • requirements-questions.md: ~200 lines
  • Total: ~973 lines

Traditional approach would load: 10,000+ lines (everything)

That's an order of magnitude reduction while maintaining full capability.

How This Enables Session Persistence

Here's where it gets powerful.

Traditional approach:

Session 1: User starts creating skill
Session 2 (next day): AI has no memory
           User re-explains everything
           AI reloads all context

Hierarchical approach with checkpointing:

┌──────────────────────────────────────────────────────────┐
│  SESSION 1: Initial Work                                 │
└──────────────────────────────────────────────────────────┘
    User: "Create new skill for API testing"
      ↓
    AI loads context:
      ├─ CLAUDE.md (315 lines)
      ├─ skills/create-skill/SKILL.md (458 lines)
      └─ requirements-questions.md (Phase 1)
      ↓
    Work completes Phase 2 (DESIGN)
      ↓
    AI creates checkpoint:
      sessions/2026-01-29-api-testing-skill.md
      ├─ Skill name: api-testing
      ├─ Phase completed: 2 (DESIGN)
      ├─ Decisions: REST focus, TypeScript, OAuth support
      └─ Files created: SKILL.md draft, README.md draft

┌──────────────────────────────────────────────────────────┐
│  SESSION 2: Resume Work (next day)                       │
└──────────────────────────────────────────────────────────┘
    User: "Continue the API testing skill"
      ↓
    AI loads same context:
      ├─ CLAUDE.md (315 lines)
      ├─ skills/create-skill/SKILL.md (458 lines)
      └─ sessions/2026-01-29-api-testing-skill.md (checkpoint)
      ↓
    AI reads checkpoint and knows:
      ├─ Current phase: 3 (GENERATE)
      ├─ Context: REST API testing, TypeScript, OAuth
      └─ Next action: Load templates, generate files
      ↓
    Continues exactly where it left off

The checkpoint file stores:

  • Which skill was being created
  • Which phase we completed
  • Decisions already made
  • Files already created

Combined with hierarchical loading, the AI can resume work across sessions because it knows:

  1. Where it is (checkpoint file)
  2. What to do (SKILL.md workflow)
  3. How to do it (on-demand templates)
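A checkpoint only needs to round-trip a handful of fields. The sketch below uses plain `key: value` lines; the real checkpoint files are markdown, so the exact format here is an assumption.

```python
from pathlib import Path

def write_checkpoint(path, skill, phase, decisions, files):
    """Persist session state as simple 'key: value' lines (illustrative format)."""
    lines = [
        f"skill: {skill}",
        f"phase_completed: {phase}",
        f"decisions: {', '.join(decisions)}",
        f"files_created: {', '.join(files)}",
    ]
    Path(path).write_text("\n".join(lines) + "\n")

def read_checkpoint(path):
    """Parse the checkpoint back into a dict so the next session can resume."""
    state = {}
    for line in Path(path).read_text().splitlines():
        key, _, value = line.partition(": ")
        state[key] = value
    return state
```

On resume, the AI reads the dict, sees `phase_completed: 2`, and knows the next action is Phase 3 with the design decisions already in hand.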

The Three Context Tiers Explained

Tier 1: Organization (CLAUDE.md)

Size: ~315 lines
Purpose: Navigation layer

Contains:

  • Agent registry (security, writer, engineer, advisor, legal)
  • Skill directory (what skills exist)
  • Routing rules (which agent handles what)
  • Critical global requirements

Does NOT contain:

  • Workflows
  • Methodologies
  • Templates
  • Tool documentation
  • Implementation details

Update when:

  • New skill added
  • New agent created
  • Routing rules change

Tier 2: Skills (skills/*/SKILL.md)

Size: 300-500 lines per skill
Purpose: Complete skill context with progressive loading

Contains:

  • Skill identity and purpose
  • 5-phase workflow structure
  • Tool and script inventory
  • Pointers to supporting docs
  • Success criteria
  • Output specifications

Progressive pattern:

SKILL.md acts as navigation to:
├── docs/ - Reference documentation
├── templates/ - Output templates
├── workflows/ - Detailed procedures
└── phases/ - Step-by-step execution

Update when:

  • Workflow changes
  • New tools added
  • Methodology refined

Tier 3: Agents (agents/*.md)

Size: ~170-190 lines per agent
Purpose: Agent identity and specialized behavior

Contains:

  • Agent role definition
  • Communication style
  • Context loading instructions
  • Skill routing (which skills this agent uses)

Example:

# Engineer Agent

## Core Identity
Implementation specialist. Infrastructure, remediation, deployment.

## Skills Used
- create-skill (skill scaffolding)
- infrastructure-ops (deployment automation)
- remediation (fix security findings)

## Context Loading
1. Read CLAUDE.md
2. Load skill based on user request
3. Read session checkpoint (if exists)
4. Execute skill workflow

Update when:

  • Agent behavior changes
  • New skills become available
  • Communication style evolves

Catalog-Based Discovery

To prevent even the navigation layer from becoming bloated, the framework uses catalog files:

library/catalogs/COMMANDS.md:

Complete list of all slash commands
├── /create-skill → create-skill skill (public)
├── /pentest → security skill (private)
├── /career → career skill (public)
└── [47 total commands]

library/catalogs/TOOL-CATALOG.md:

Complete API client and utility inventory
├── Ghost CMS client (authenticated)
├── OpenAI client (authenticated)
└── [Tool status and authentication requirements]

Why catalogs?

  • CLAUDE.md stays concise (points to catalog)
  • Catalog can grow without bloating navigation
  • Easy to search and reference
  • Automated filtering for public/private split
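The public/private filtering can be as simple as a visibility tag on each catalog entry. The tuple format below is an assumption based on the catalog excerpt, not the framework's actual data model:

```python
# (command, target skill, visibility) — mirrors the COMMANDS.md entries above.
ENTRIES = [
    ("/create-skill", "create-skill skill", "public"),
    ("/pentest", "security skill", "private"),
    ("/career", "career skill", "public"),
]

def public_commands(entries):
    """Keep only the entries safe to publish in the open-source split."""
    return [cmd for cmd, _, vis in entries if vis == "public"]
```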

Enforcing the Architecture

Pre-commit hooks enforce size limits:

# Validation before every commit
hooks/validate-context-sizes.sh

Checks:
✓ CLAUDE.md < 400 lines (currently 315)
✓ agents/*.md < 200 lines (currently ~186)
✓ skills/*/SKILL.md < 500 lines (currently ~300-458)

If violated → Commit blocked

Why strict limits?

Without enforcement, entropy wins. Files bloat. Context becomes noise. The architecture degrades.

Hard limits force:

  • Separation of concerns
  • Progressive disclosure
  • Catalog-based discovery
  • On-demand loading
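The size check itself is a few lines of code. The actual hook is a shell script; this is a Python approximation using the limits stated above, with glob patterns that are assumptions about the repo layout:

```python
from pathlib import Path

# File pattern → maximum allowed line count (limits from the hook above).
LIMITS = {
    "CLAUDE.md": 400,
    "agents/*.md": 200,
    "skills/*/SKILL.md": 500,
}

def check_sizes(root="."):
    """Return a list of (file, lines, limit) violations; empty means commit OK."""
    violations = []
    for pattern, limit in LIMITS.items():
        for f in Path(root).glob(pattern):
            n = len(f.read_text().splitlines())
            if n > limit:
                violations.append((str(f), n, limit))
    return violations
```

Wired into a pre-commit hook, a non-empty result blocks the commit, which is what keeps the navigation layer from silently bloating over time.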

Benefits in Practice

Token efficiency:

  • Load only what's needed for current task
  • Avoid dumping entire framework into every session
  • Enables larger reference materials by not front-loading them

Maintainability:

  • Update one skill file, not monolithic config
  • Clear separation of what goes where
  • Easy to find and modify specific workflows

Scalability:

  • Add new skills without bloating navigation
  • Skills self-contained (input/, output/, scripts/)
  • Catalog grows independently of core files

Session persistence:

  • Checkpoint files store session state
  • Hierarchical loading brings back exact context
  • Resume multi-day work exactly where you left off

Try It Yourself

The Intelligence Adjacent framework implements this architecture:

# Clone
git clone https://github.com/notchrisgroves/ia-framework.git

# Install (creates ~/.claude symlink)
./setup/install.sh

# Try creating a skill
/create-skill

# Observe hierarchical loading
# 1. CLAUDE.md loads first (navigation)
# 2. create-skill/SKILL.md loads (execution)
# 3. Supporting docs load on-demand (templates)

Watch the context load progressively. Notice:

  • Fast startup (small nav file)
  • Clear workflow (SKILL.md)
  • On-demand templates (only when needed)

The Philosophy

Intelligence Adjacent means AI working alongside human intelligence—not replacing it.

Hierarchical context loading is scaffolding. It gives AI:

  • Context without overload
  • Methodology without rigidity
  • Memory across sessions
  • Capability without complexity

The AI handles execution. You handle judgment.

Orchestration over intelligence. Architecture over scale. Augmentation over automation.

That's hierarchical context loading in the IA framework.


Stay Updated

Subscribe for free to get weekly posts on AI systems, framework architecture, and building capability without gatekeeping.
