Agentic Data Analysis with Claude Code
I started my career as a lowly analyst, scurrying around data warehouses and dredging up what was needed to support the mandate. "The mandate must be supported!" the CEO would proclaim. And how did we support it? With data-driven hypotheses, of course. I was a team player, so I drove that data. I drove that data so hard it hypothesized. Evenings were spent constructing "If this, then this, because this" sentences. It was like mad libs, except I was being paid to hate it. I look back at that young version of myself, starry-eyed with big dreams of supporting mandates by driving data, and I smile. Because now there is a better way.
Since that data analyst job, I've also been a scientist of data as well as an engineer. This family of roles broadly exists to serve a core "Ask a Question -> Get an Answer" cycle that I've gone through several thousand times. It decomposes into the following elements:
- Form a hypothesis that explains some observation
- Find relevant (SQL) tables
- Write queries that surface data from those tables
- Analyze this data
- Update the hypothesis, rinse and repeat until the question is answered
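The cycle above can be sketched as a loop. Every callable passed in below is a hypothetical stand-in for analyst (or model) judgment; none of these names come from any real system.

```python
# The "Ask a Question -> Get an Answer" cycle as a loop. All helpers are
# injected as callables, since each step is a judgment call in practice.
from dataclasses import dataclass, field

@dataclass
class Investigation:
    question: str
    hypothesis: str
    findings: list = field(default_factory=list)

def run_cycle(inv, find_tables, write_queries, run_query,
              analyze, update_hypothesis, is_answered, max_rounds=5):
    for _ in range(max_rounds):
        tables = find_tables(inv.hypothesis)                 # find relevant tables
        for query in write_queries(tables, inv.hypothesis):  # write queries
            inv.findings.append(analyze(run_query(query)))   # surface and analyze data
        inv.hypothesis = update_hypothesis(inv.hypothesis, inv.findings)
        if is_answered(inv):                                 # rinse and repeat
            break
    return inv
```

The `max_rounds` cap matters: without it, a hypothesis that never converges would loop forever, which is also a real failure mode for agents.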
Pre-LLMs you'd ask a data analyst your question and they'd scurry through some version of this workflow before coming back with their answer. Frontier models have progressed to the point where they can handle most of these elements themselves. There are some prerequisites — they need to be able to access the data environment and it should be decently documented. Once those are in place you can put together a multi-agent system that is surprisingly capable of doing the scurrying and delivering answers.
Show, Don't Tell
This piece is about a multi-agent system that can act as a data analyst. It:
- Takes a question about a dataset
- Analyzes that dataset
- Generates an interactive report that attempts to answer that question.
You can find an example of a generated report here that answers a question about the StackOverflow public dataset. You can ask your own questions by cloning the repo here. It costs ~$20 in API tokens or ~75% of your five hour limit on the 5x Max Plan.
A Caveat
In its current form, this is not going to replace the work of a skilled data analyst. Yet. But it's not that far away. The main issues are with hypothesis generation and general data intuition. Given where we are now and where we are trying to go, overcoming these issues requires incremental improvements, not revolutionary ones. Especially at the current pace of model improvement. And what the models can do now is damn cool.
Methodology
The system is initiated via the /initial-analysis skill. If you don't pass a question when you invoke the skill you'll be prompted for one. From there Claude goes brrrrrrrr and 15-25 minutes later delivers you a data analysis that's nicely packaged into a React web app. Here's what's happening under the hood (or rather, in the data center):
1. The `table-finder` subagent takes your question and scans your data warehouse for relevant tables. It returns a list of 4-6 tables.
2. Each table is passed to a `table-reader` subagent in parallel. These subagents are responsible for:
   - Setting up the `WORKING_DIRECTORY` for the table
   - Accessing table metadata (schema, column descriptions, etc)
   - Conducting the initial Research Loop™:
     - Generating a query and saving it to `{N}_query.sql`
     - Running the query and saving the output to `{N}_result.csv`
     - Doing an initial analysis with Python and saving it to `{N}_analysis.json`
   - This research loop runs three times. Results from previous iterations inform the direction of subsequent iterations.
   - The findings get consolidated and saved to `summary.md`
3. The separate findings from the `table-reader` subagents are consolidated into an `initial-analysis.md` file.
4. For each table analyzed in step 2, a `table-report` subagent is launched in parallel. For each research loop from step 2, these:
   - Load the outputs into context (`{N}_query.sql` / `{N}_result.csv` / `{N}_analysis.json`)
   - Design 2-4 charts per query exploring different aspects of the data
   - Generate TypeScript for each output and chart using pre-defined components
5. The `report-orchestrator` subagent consolidates everything into a final React report that gets served locally.
6. As a final step, `chart-qa` subagents are launched in parallel. These iterate through the charts generated for a given table and:
   - Download an image of the chart
   - Identify any obvious issues in the chart
   - Attempt to fix them
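One iteration of the Research Loop's three-file output can be sketched like this. The sqlite table, query, and `workdir` path are toy stand-ins, and `research_loop_iteration` is my own name for it; in the real system the subagents generate the queries and analyses themselves.

```python
# A sketch of one Research Loop iteration: save the generated query,
# run it, save the result as CSV, then save a small analysis as JSON.
import csv
import json
import sqlite3
from pathlib import Path

def research_loop_iteration(conn: sqlite3.Connection, workdir: Path,
                            n: int, query: str) -> dict:
    workdir.mkdir(parents=True, exist_ok=True)

    # 1. Save the generated query to {N}_query.sql
    (workdir / f"{n}_query.sql").write_text(query)

    # 2. Run the query and save the output to {N}_result.csv
    cur = conn.execute(query)
    columns = [c[0] for c in cur.description]
    rows = cur.fetchall()
    with open(workdir / f"{n}_result.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)

    # 3. Do a (trivial) initial analysis and save it to {N}_analysis.json
    analysis = {"columns": columns, "row_count": len(rows)}
    (workdir / f"{n}_analysis.json").write_text(json.dumps(analysis, indent=2))
    return analysis

# Toy usage: an in-memory warehouse with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, 10), (2, 5), (3, 7)])
result = research_loop_iteration(
    conn, Path("workdir"), 1,
    "SELECT id, score FROM posts ORDER BY score DESC",
)
```

Writing each artifact to disk is what lets later iterations, and the downstream `table-report` subagents, pick up exactly where an earlier step left off.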
In chart form:
Slop-pocalypse Now
"I love the smell of tokens in the morning"
— Kilgore, Apocalypse Now (probably)
One of the first sections in this piece, A Caveat, would have been better suited as our conclusion. I included it where I did because I wanted to set expectations early and give an honest account of LLM capabilities.
Unfortunately, we are in the midst of a slop-pocalypse. From AI-generated content that has begun to rot every social media platform to clickbait that is patently false and yet eagerly and widely shared, it's become extremely difficult to separate the signal (what are these models actually capable of?) from the noise (Google engineer says Claude rebuilt their entire system in an hour).
I hope reading this has given you some signal, some understanding of where these models are and where their true capabilities land. And maybe you'll use them to drive some data yourself.
Over the course of designing, implementing and refining this system, I've run more than 100 different versions of it. There are learnings, lots of learnings. But these are deserving of their own writeup (coming soon) :)