blakedemarest/bedrot-data-ecosystem

View CLAUDE.md
complex projects Updated 2026-02-10

Analysis

Category: Complex Projects Source: blakedemarest/bedrot-data-ecosystem CLAUDE.md: View Original License: MIT Stars: 1

Why This Example

Despite having just one star, this CLAUDE.md is a remarkably thorough document for a production data pipeline ecosystem. It covers a three-component architecture (Data Lake, Data Warehouse, Dashboard) with multi-zone ETL processing, semi-manual authentication design philosophy, service-specific implementation status tables, and detailed error resolution guides. It demonstrates how a CLAUDE.md can serve as both a development guide and an operations manual for a complex data engineering system.

Key Features That Make This Exemplary

1. Multi-Zone Data Lake Architecture

The document describes a six-zone data pipeline: Landing (raw ingestion), Raw (validated immutable copies), Staging (cleaned and transformed), Curated (business-ready), Archive (7+ year retention), and Automated Cronjob (orchestration). Each zone has a clear purpose and data flow direction, providing a complete mental model of the ETL process.

2. Semi-Manual Authentication Philosophy

A dedicated section explains the deliberate choice of semi-manual authentication for 2FA compliance and security. This is rare and valuable because it reframes what might appear as a limitation into a principled design decision, and explicitly tells AI assistants how to interpret authentication failures: "This is EXPECTED behavior" and "Not a critical error."

3. Service Implementation Status Matrix

A comprehensive table tracks each data source (Spotify, TikTok, DistroKid, Linktree, MetaAds, etc.) with columns for extractor status, cleaner status, priority level, and critical notes. This gives an AI assistant immediate visibility into what is built, what is missing, and what needs attention.

4. Critical Data Caveat Documentation

The document includes a prominently marked warning about Spotify stream double-counting between two data sources (tidy_daily_streams.csv and spotify_audience_curated_*.csv). This kind of domain-specific data quality caveat prevents AI assistants from making costly analytical mistakes.

5. Cookie Authentication Lifecycle Table

Service-specific cookie expiration times, refresh intervals, authentication strategies, and 2FA requirements are documented in a structured table. This operational knowledge is critical for maintaining the pipeline and enables an AI assistant to diagnose authentication issues without guesswork.

6. Error Resolution Guide with Severity Classification

Common error patterns are mapped to likely causes, resolutions, and severity levels (Low for cookie/auth issues, High for system issues, Medium for data issues). This structured troubleshooting guide enables AI assistants to provide appropriate responses based on error severity.

Key Takeaways

  1. Document Design Philosophy, Not Just Implementation - Explaining why a system uses semi-manual authentication (compliance, security) prevents AI assistants from "fixing" what is actually an intentional design decision.
  2. Include Data Quality Caveats - For data-intensive projects, documenting known data overlap or double-counting risks in the CLAUDE.md prevents AI assistants from producing incorrect analyses.
  3. Classify Errors by Severity - Providing a severity framework for common errors helps AI assistants prioritize responses and avoid escalating normal operational events as critical failures.

Attribution