You've been tinkering. You've written a CLAUDE.md file. Maybe you've set up some skill markdown files and a few custom slash commands. Things sort of work for simple stuff. But every time you throw a real, multi-step task at your agent, it falls apart. It picks the wrong tool. It doesn't ask when it should. It charges ahead when it shouldn't. It produces something that technically meets the brief but completely misses the point.
I've been there. And here's what I eventually figured out: the problem wasn't my tools. It was that I had a bag of parts and no blueprint.
This post is about building that blueprint — the mental model and configuration patterns that turn scattered agent configs into agentic workflows that actually behave like autonomous systems.
Before we get into files and folders, you need to internalize this. Every agentic workflow — whether it's a coding assistant, a research pipeline, or a report generator — runs the same fundamental loop:
Objective → Plan → Act → Observe → Reflect → (repeat or finish)

This isn't abstract AI theory from a whitepaper. It maps directly to what your configuration files should encode. When your agent breaks down mid-task, it's almost always because one stage in this loop is weak or missing entirely. The agent acts without planning. It observes output but doesn't verify whether the output is actually correct. It loops forever without knowing when to stop.
Here's the thing that changed everything for me: every agent markdown file, every skill file, and every custom command you write should map to a specific part of this loop. Once you see that, configuration stops feeling like guesswork.
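To make the mapping concrete, here's a minimal sketch of that loop as plain Python. The hook names (`plan_fn`, `act_fn`, `observe_fn`, `reflect_fn`) are hypothetical stand-ins for model calls and tool dispatch, not any real framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    objective: str
    plan: list = field(default_factory=list)
    observations: list = field(default_factory=list)
    done: bool = False

def run_loop(state, plan_fn, act_fn, observe_fn, reflect_fn, max_iters=10):
    """Drive Objective -> Plan -> Act -> Observe -> Reflect until done."""
    for _ in range(max_iters):
        state.plan = plan_fn(state)                     # Plan: decide next steps
        result = act_fn(state)                          # Act: execute one step
        state.observations.append(observe_fn(result))   # Observe: capture output
        state.done = reflect_fn(state)                  # Reflect: check completion
        if state.done:
            break
    return state
```

Every configuration pattern in this post strengthens one of those four hooks: your agent file shapes `reflect_fn`, your skills shape `act_fn`, your commands shape `plan_fn`.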
Think of your entire agent setup as three concentric layers. Most developers build the middle layer — capabilities — and neglect the other two. That's why their agents work for demos but fall apart on real tasks.
This is your CLAUDE.md or equivalent agent configuration file, and it's the most misunderstood piece of the entire stack.
Most people fill it with personality directives and tone guidelines. "Be helpful and thorough." "Use a professional tone." That's the least important thing this file can contain.
Your agent markdown file is where you encode decision-making policy — the rules that govern how the agent navigates ambiguity, risk, and autonomy. Three things matter here more than anything else.
When should the agent proceed on its own, and when should it stop and ask? This is the single most consequential judgment call an agent makes, and most configurations say nothing about it.
Here's what a weak version looks like:
```
## Guidelines
- Be helpful and thorough
- Ask clarifying questions when needed
```

And here's a version that actually works:
```
## Autonomy Protocol

Proceed without confirmation when:
- The task affects a single file that the user explicitly named
- The change is additive (new file, new function) rather than destructive
- You have a skill file that directly matches the task

Stop and confirm when:
- The task requires modifying more than 3 existing files
- You're choosing between two architecturally different approaches
- The task involves deletion of any existing code or data
- Your confidence in the correct approach is below ~70%

When confirming, present your intended plan with reasoning,
not just a yes/no question.
```

The difference is enormous. The first version leaves the agent guessing on every decision. The second gives it a clear decision tree it can actually follow. And because the instructions are specific and few, they stay within the range of what frontier models can reliably attend to — something that matters a lot when you remember the system prompt is already eating into your instruction budget.
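To see why the second version is followable, here's the same protocol rendered as an explicit decision tree in Python. The `task` fields are illustrative labels for this sketch, not a real schema:

```python
def autonomy_decision(task):
    """Sketch of the autonomy protocol as an explicit decision tree.

    `task` is a hypothetical dict describing the request; the keys
    are illustrative, not part of any real API.
    """
    # Hard stops: always confirm before destructive or wide-reaching work
    if task.get("deletes_code_or_data"):
        return "confirm"
    if task.get("files_modified", 0) > 3:
        return "confirm"
    if task.get("architectural_fork"):
        return "confirm"
    if task.get("confidence", 1.0) < 0.70:
        return "confirm"
    # Green lights: narrow, additive, or covered by a known-good skill
    if task.get("single_named_file") or task.get("additive") or task.get("matching_skill"):
        return "proceed"
    # Default to asking when no rule matches
    return "confirm"
```

Notice the ordering: stop conditions are checked before green lights, so a destructive task never slips through just because it also touches a single named file.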
Here's the failure mode nobody talks about: what happens when a tool call fails? When a file doesn't exist where expected? When test output doesn't match expectations?
Most agent configurations are totally silent on this. Which means the agent either retries the same broken thing blindly, hallucinates a fix, or just gives up. You need to tell it what to do.
```
## Error Recovery

When a command or tool call fails:
1. Read the full error output before attempting a fix
2. If the error is a missing dependency, install it and retry once
3. If the error is in code you just wrote, diagnose the root cause
   before editing — do not patch symptoms
4. If you've attempted the same fix twice with no progress,
   stop and report what you've tried and what you've observed
5. Never silently swallow errors or claim success
   when output contains warnings
```

This is the kind of instruction that feels too obvious to write down. But agents don't have the common sense you do. They need it spelled out.
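The same policy can be sketched as a control loop. Here, `run_step`, `diagnose`, and `apply_fix` are hypothetical hooks standing in for the model's tool calls and reasoning:

```python
def run_with_recovery(run_step, diagnose, apply_fix, max_attempts=3):
    """Sketch of the error-recovery policy above.

    run_step() returns (ok, output); diagnose() and apply_fix() are
    placeholders for the model's root-cause analysis and repair steps.
    """
    attempted_fixes = []
    output = ""
    for _ in range(max_attempts):
        ok, output = run_step()
        if ok:
            if "warning" in output.lower():
                return ("needs_review", output)  # never claim clean success on warnings
            return ("success", output)
        fix = diagnose(output)                   # read the full error before fixing
        if attempted_fixes.count(fix) >= 2:      # same fix twice with no progress
            return ("stuck", {"tried": attempted_fixes, "last_error": output})
        attempted_fixes.append(fix)
        apply_fix(fix)
    return ("stuck", {"tried": attempted_fixes, "last_error": output})
```

The key property is the `attempted_fixes` memory: without it, a failing step gets the same broken "fix" forever, which is exactly the blind-retry behavior you see from unconfigured agents.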
How does the agent know when to stop? Without explicit completion criteria, agents either under-deliver — stopping at the first thing that compiles — or over-deliver, endlessly refactoring code nobody asked them to touch.
```
## Completion Criteria

A task is complete when:
- All explicitly requested changes are implemented
- The code runs without errors (verify by executing, not by reading)
- Output files are in the location the user specified or expects
- You have provided a brief summary of what changed and why

A task is NOT complete just because code was written.
Verify your work before presenting it.
```

Notice something about all three of these sections? They're short. They're specific. They contain actual conditions, not vibes. This matters because LLM instruction-following quality degrades as instruction count increases. Your agent file should be lean and high-signal — not a 500-line constitution that the model tunes out halfway through.
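Because these criteria are conditions rather than vibes, they're mechanically checkable. A sketch, with illustrative parameters (`run_cmd` is whatever command exercises the change; `expected_outputs` are the paths the user expects):

```python
import pathlib
import subprocess

def task_complete(changes_done, run_cmd, expected_outputs, summary):
    """Sketch of the completion criteria above; parameters are illustrative."""
    if not changes_done:
        return False
    # Verify by executing, not by reading
    result = subprocess.run(run_cmd, capture_output=True)
    if result.returncode != 0:
        return False
    # Output files must be where the user specified or expects
    if not all(pathlib.Path(p).exists() for p in expected_outputs):
        return False
    # A summary of what changed and why must accompany the work
    return bool(summary and summary.strip())
```

An agent that runs a check like this before presenting its work is doing exactly what the config asks: verifying, not just writing.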
This is the layer most people start with, and for good reason — it's the most tangible. Skill files are the library of verified recipes your agent can draw from. You probably already have some organized by output type: one for creating Word docs, one for spreadsheets, one for PDFs.
That's a fine starting point. But here's the mental shift that makes skills actually useful in an agentic context: think of skills not as "things the agent can do" but as verified paths that collapse decision space.
An agent without a skill file for creating presentations will spend cycles figuring out which Python library to use, what the API looks like, and what patterns produce clean output. Every one of those decisions is a place where it can go wrong. A good skill file eliminates all of that exploration. It says: here's the one known-good path, take it and don't improvise.
To make skills work within agentic design patterns, each one needs three additions most people skip.
```
## When to Use This Skill

Use this skill when:
- The user requests a `.pptx` file or mentions "slides,"
  "presentation," or "deck"
- The task involves converting content into presentation format
- An existing `.pptx` file needs to be read, edited, or extended

Do NOT use this skill when:
- The user wants a PDF or document (see: pdf skill, docx skill)
- The output is a single-page visual (consider: HTML or SVG instead)
```

```
## Verification

After generating the presentation:
- Open the file programmatically and confirm slide count
  matches expectations
- Verify that no slide contains placeholder or template text
- Confirm the file is saved to the correct output directory
- Check file size is reasonable (a 0KB file means something failed)
```

```
## Common Failures

- If python-pptx throws a PackageNotFoundError,
  the template path is wrong — verify it exists before retrying
- If images fail to insert, check that the image path is absolute,
  not relative
- If text overflows a placeholder, split content across
  multiple slides rather than shrinking font below 14pt
```

With these three additions, your skills become something the agent can reason about. It can look at a task, check pre-conditions across multiple skills, select the right one, execute it, verify via post-conditions, and recover using documented failure modes. That's a real agentic loop — plan, act, observe, reflect — not just a tool call.
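As a concrete instance of the verification section, here's a sketch of a post-condition check. It uses only the standard library, leaning on the fact that a `.pptx` file is a zip archive with one XML entry per slide under `ppt/slides/`; a fuller check would open the file with python-pptx and also scan for placeholder text:

```python
import os
import zipfile

def verify_pptx(path, expected_slides):
    """Post-condition sketch for the pptx skill above (stdlib only)."""
    # A 0KB or missing file means generation failed somewhere
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    # A .pptx is an Office Open XML package, i.e. a zip archive
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as z:
        slides = [n for n in z.namelist()
                  if n.startswith("ppt/slides/slide") and n.endswith(".xml")]
    # Slide count must match expectations
    return len(slides) == expected_slides
```

The point isn't this particular check — it's that every post-condition in the skill file should be something the agent can actually execute, not a sentiment.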
Beyond individual skill files, this is where agentic development gets really interesting. Consider creating higher-level workflow skills that chain capabilities together:
```
# Skill: Deliver Analytical Report

## Overview
Produces a polished analytical report from raw data.
Chains multiple sub-skills in sequence.

## Workflow
1. **Ingest** — Read the source file (CSV, Excel, JSON).
   Use the xlsx skill if the source is a spreadsheet.
2. **Analyze** — Compute summary statistics, identify trends,
   flag anomalies. Use Python with pandas.
3. **Visualize** — Generate 2-4 charts supporting the key findings.
   Save as PNG files.
4. **Compose** — Assemble findings into a Word document
   using the docx skill. Embed charts inline.
5. **Verify** — Open the final document, confirm all sections
   are populated, charts render, and page count is reasonable.

## Decision Points
- If the data has fewer than 50 rows, skip statistical analysis
  and focus on descriptive summary
- If the user specified PDF as the output format,
  use the pdf skill in step 4 instead of docx
- If chart generation fails, deliver the report with data tables
  instead — do not block the entire workflow on visualization
```

This kind of skill composition gives the agent a plan template — a known-good sequence with built-in branching logic. It bridges the gap between individual tool capabilities and the kind of coherent, multi-step workflows that actually deliver results.
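The decision points above amount to branching in a driver function, which can be sketched like this. The step names are illustrative labels, not real skill identifiers:

```python
def run_report_workflow(rows, output_format="docx", make_charts=None):
    """Sketch of the report workflow; returns the sequence of steps taken.

    `make_charts` stands in for the visualize sub-skill and may fail.
    """
    steps = []
    # Decision point: small datasets get a descriptive summary only
    if len(rows) < 50:
        steps.append("descriptive_summary")
    else:
        steps.append("statistical_analysis")
    # Decision point: chart failure degrades gracefully to tables
    try:
        charts = make_charts(rows) if make_charts else []
        steps.append("charts" if charts else "tables")
    except Exception:
        steps.append("tables")  # do not block the workflow on visualization
    # Decision point: honor the user's requested output format
    steps.append("pdf_skill" if output_format == "pdf" else "docx_skill")
    steps.append("verify")
    return steps
```

Each branch corresponds to a line in the skill's Decision Points section, which is exactly what makes the skill a plan template rather than prose the agent has to reinterpret on every run.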
Custom commands are the entry points that kick off agentic workflows. They're where user intent meets agent execution. And most of them fall into one of two traps.
Trap 1: Too granular. The command is basically just an alias for a single tool call. No added value.
Trap 2: Too vague. The command just restates the user's request without adding any structure or workflow logic.
Neither is useful for agentic coding or development. A good command encodes a workflow skeleton: the predictable phases of a task, the decision points between them, and the contract for what gets delivered at the end.
Here's a vague command:
```
# /generate-report
Generate a report based on the user's data.
```

And here's one that actually drives an agentic workflow:
```
# /generate-report

## Trigger
User wants an analytical report from a data file.

## Input Requirements
- At least one data file (CSV, XLSX, or JSON)
- Optional: specific questions or focus areas

## Workflow Phases

### Phase 1: Understand
- Read the data file and profile its structure
- Identify column types, row count, data quality issues
- If missing values exceed 30% or columns are ambiguous,
  report findings and ask for guidance before proceeding

### Phase 2: Analyze
- Compute summary statistics for numeric columns
- Identify top 3-5 trends or patterns
- If the user specified focus areas, prioritize those

### Phase 3: Build
- Default to docx unless the user specified otherwise
- Use the "deliver-report" skill to assemble the document
- Include: executive summary, key findings with charts,
  data appendix

### Phase 4: Deliver
- Save the final file to the output directory
- Present a 2-3 sentence summary of key findings
- Provide the file for download

## Guardrails
- Do not alter the source data
- Do not make causal claims from correlational data
- If the dataset contains PII, flag it and ask before
  including raw data in the report
```

This command gives the agent everything it needs: a clear sequence of phases, explicit decision points, a defined output contract, and boundaries it shouldn't cross. It's a workflow skeleton that provides structure while still leaving room for the agent to adapt within each phase.
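Phase 1's data-quality gate is easy to make concrete. This sketch uses the stdlib `csv` module (a real implementation would more likely profile with pandas); the 30% threshold matches the command above:

```python
import csv
import io

def profile_missing(csv_text, threshold=0.30):
    """Sketch of the Phase 1 data-quality gate from the command above."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {"proceed": False, "flagged": {}, "reason": "empty dataset"}
    flagged = {}
    for col in rows[0]:
        missing = sum(1 for r in rows if not (r[col] or "").strip())
        ratio = missing / len(rows)
        if ratio > threshold:
            flagged[col] = round(ratio, 2)
    # Any column over the threshold means: report findings and ask first
    return {"proceed": not flagged, "flagged": flagged}
```

This is the check the agent runs in the walkthrough below when it discovers columns with 40% missing values and stops to ask.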
Theory is nice. Let's see what a well-configured agentic workflow actually looks like end to end. Imagine a user says: "Here's our Q3 sales data. Build me a report with insights on regional performance."
Step 1 — Orchestration kicks in. The system matches this to the /generate-report command. The agent now has a workflow skeleton to follow, not a blank slate.
Step 2 — Judgment at the first decision point. The agent reads the data, discovers some columns have 40% missing values. It checks its autonomy protocol: missing values above 30% means stop and ask. So it reports the issue and asks how the user wants to handle it.
Step 3 — The user decides. They say drop incomplete rows. The agent proceeds to analysis. The capability layer provides the right skill files: pandas for analysis, matplotlib for charts, the docx skill for assembly.
Step 4 — Error recovery in action. Chart generation fails due to a matplotlib version issue. The agent checks its error recovery protocol in both the skill file (which says fall back to data tables) and the agent file (which says try fixing dependencies first). It installs the right version, retries, succeeds.
Step 5 — Verification and delivery. The agent assembles the report using the docx skill, checks the post-conditions (page count, no placeholder text, charts embedded), and delivers the file with a brief summary.
At every stage, the agent drew on specific parts of the configuration. Nothing was left to improvisation. It had structure when it needed it and flexibility within that structure.
How do you know your agentic setup is actually working? Hand the agent a moderately complex task and watch for these four behaviors:
It plans before it acts. Instead of immediately writing code or calling tools, it states what it intends to do and in what order. This comes from having workflow skeletons in your commands and composite skill files.
It selects the right capability without being told. You say "build me a presentation" and it reaches for the pptx skill — not because you pointed it there, but because the pre-conditions in the skill file matched the request. This comes from having clear, specific pre-conditions on every skill.
It recovers from failure without your intervention. A dependency is missing, a file path is wrong, an API call errors out. The agent diagnoses the issue, applies its recovery protocol, and continues. This comes from having error recovery policies in your agent file and failure modes documented in your skills.
It knows when it's done. It doesn't just stop — it verifies its output against the completion criteria and delivers a coherent result. This comes from having post-conditions and a definition of done.
If any of these four things break down, the fix is almost always in one of the three configuration layers described above — not in adding more tools or capabilities.
If you're refactoring an existing agentic setup, don't try to overhaul everything at once. Start with three high-leverage changes:
First, add an autonomy protocol to your agent markdown file. This single addition will eliminate the most common failure mode: the agent charging ahead when it should have asked, or asking for permission when it should have just done the thing. Keep it under 20 lines. Be specific about conditions, not vibes.
Second, add pre-conditions and failure modes to your two most-used skills. This teaches the agent to reason about which skill to select and what to do when things go sideways — both critical capabilities for agentic workflows that span multiple steps.
Third, take your most common complex task and write a proper command for it — with phases, decision points, and an output contract. This gives the agent the workflow structure it needs to move from "I was told to do something" to "I know how to break this down and execute it step by step."
Then watch. Watch where the agent still stumbles, and let that guide your next iteration. Agentic development is itself an iterative loop — observe what broke, reflect on why, improve the config, repeat.
The agents that work best aren't the ones with the most tools. They're the ones whose developers kept tightening the feedback loop between what the agent did and what the configuration told it to do.
Building agentic workflows? I'd love to hear what configuration patterns are working for you — and which ones you're still figuring out. Drop me a message or share your approach.

Hi, I'm a full-stack developer. I'm passionate about JavaScript and find myself working on a lot of React-based projects.