AI agent configuration files from top Data Engineering repositories on GitHub.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
Apache Spark - A unified analytics engine for large-scale data processing.
AGENTS.md for Apache Spark — unified analytics engine. Covers RDD, DataFrame API, and distributed execution patterns.
AGENTS.md for Apache Airflow — workflow orchestration. Covers DAG definition, operators, and scheduler architecture.
AGENTS.md for Apache Kafka — distributed event streaming platform. Covers broker architecture, partitioning, and Java patterns.
CLAUDE.md for dbt — data build tool for analytics. Covers model dependencies, Jinja macros, and adapter patterns.
AGENTS.md for Datacoves — dbt + DuckDB analytics platform. Covers data pipeline patterns, YAML conventions, and SQL standards.