“Testing an Airflow DAG” is the kind of task most teams keep punting on, with reasons like:
- “It needs webserver + scheduler — too heavy.”
- “The DAG calls GCP; I don’t have local credentials.”
- “Even if I test it, the only thing I can check is whether it imports.”
This project uses a lightweight pattern that removes all three. CI runs it in under a second and covers every non-external code path in the DAG module.
The key trick: stub Airflow before importing the DAG
os.environ.setdefault("GCP_PROJECT_ID", "test-project")
What this gives you:
- No need to pip install apache-airflow (fast CI)
- No GCP credentials needed (GCP_PROJECT_ID is any string)
- PythonOperator and BigQueryInsertJobOperator are MagicMocks, but the callables inside the DAG are still real Python
That lets you unit-test the actual logic of every task function.
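As a rough illustration of the stubbing idea, here is a minimal sketch, not this project's actual test file. It assumes a hypothetical DAG module called example_dag with a made-up build_query helper, written to a temp directory so the snippet is self-contained. The Airflow import paths are the real ones; everything else is invented for the example.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest.mock import MagicMock

# Hypothetical DAG module; a real project would import its own DAG file instead.
DAG_SOURCE = textwrap.dedent('''
    import os
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    PROJECT_ID = os.environ["GCP_PROJECT_ID"]

    def build_query(table):
        return f"SELECT COUNT(*) FROM {PROJECT_ID}.{table}"

    with DAG(dag_id="example") as dag:
        PythonOperator(task_id="build", python_callable=build_query)
        BigQueryInsertJobOperator(
            task_id="count",
            configuration={"query": {"query": build_query("trips")}},
        )
''')

# 1. Satisfy module-level environment lookups with a dummy value.
os.environ.setdefault("GCP_PROJECT_ID", "test-project")

# 2. Register a MagicMock for every Airflow module path the DAG imports,
#    so the import statements resolve without Airflow being installed.
for name in (
    "airflow",
    "airflow.operators",
    "airflow.operators.python",
    "airflow.providers",
    "airflow.providers.google",
    "airflow.providers.google.cloud",
    "airflow.providers.google.cloud.operators",
    "airflow.providers.google.cloud.operators.bigquery",
):
    sys.modules[name] = MagicMock()

# 3. Only now import the DAG module; its own functions are ordinary Python.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "example_dag.py").write_text(DAG_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
example_dag = importlib.import_module("example_dag")

print(example_dag.build_query("trips"))
```

The order matters: the stubs have to be in sys.modules before the DAG module is imported, which is why this kind of setup usually sits at the top of the test file or in a conftest.py.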
Three test categories that earn their keep
Category 1: helper functions
def test_year_month_valid(self):
_year_month is at the center of params validation — one bad character in the month string and the whole pipeline misfires. Pure function, cheapest tests, biggest leverage.
Category 2: side-effecting logic (mock the network/IO)
A single test of the download step checks two things at once:
- Request count = number of months (no skips, no duplicates)
- Files land at the right paths (tempdir replaces
DOWNLOAD_ROOT)
And the negative case is usually the one people skip:
def test_raises_on_tiny_file(self, mock_get):
This exercises the “small file = failure” guard. Without a test, the guard is effectively placeholder code: nothing shows that it ever fires.
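Both the happy path and the negative case can be sketched as follows. This is an assumption-heavy illustration: download_months, MIN_BYTES, the URL, and the file naming are all hypothetical. It also swaps in urllib.request from the standard library where the real DAG appears to use requests, and passes the root directory as an argument instead of patching DOWNLOAD_ROOT, so that the snippet runs with no extra packages.

```python
import tempfile
import unittest
import urllib.request
from pathlib import Path
from unittest.mock import patch

MIN_BYTES = 1024  # hypothetical "too small to be real data" threshold


def download_months(months, root):
    """Hypothetical stand-in for the DAG's download callable."""
    for ym in months:
        # example.invalid is a placeholder host; the tests never hit the network.
        url = f"https://example.invalid/data_{ym}.parquet"
        body = urllib.request.urlopen(url).read()
        if len(body) < MIN_BYTES:
            raise ValueError(f"{ym}: only {len(body)} bytes downloaded")
        (Path(root) / f"data_{ym}.parquet").write_bytes(body)


class TestDownload(unittest.TestCase):
    @patch("urllib.request.urlopen")
    def test_downloads_every_month(self, mock_urlopen):
        mock_urlopen.return_value.read.return_value = b"x" * MIN_BYTES
        months = ["2024-01", "2024-02", "2024-03"]
        with tempfile.TemporaryDirectory() as root:
            download_months(months, root)
            written = sorted(p.name for p in Path(root).iterdir())
        # One request per month, and one file per month at the expected path.
        self.assertEqual(mock_urlopen.call_count, len(months))
        self.assertEqual(written, [f"data_{ym}.parquet" for ym in months])

    @patch("urllib.request.urlopen")
    def test_raises_on_tiny_file(self, mock_urlopen):
        mock_urlopen.return_value.read.return_value = b"tiny"
        with tempfile.TemporaryDirectory() as root:
            with self.assertRaises(ValueError):
                download_months(["2024-01"], root)
            # The guard fires before anything is written to disk.
            self.assertEqual(list(Path(root).iterdir()), [])
```

With requests the shape is the same: patch requests.get where the DAG module looks it up, and set the mocked response's content to the bytes you want.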
Category 3: configuration / static data shape
def test_resource_dict_shape(self):
EXTERNAL_TABLE_RESOURCE is a hand-rolled nested dict — the kind that loves to lose a key during a copy-paste. Shape tests don’t catch business bugs, but they catch the “wrong key name in a PR” class instantly.
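A shape test along these lines might look like the sketch below. The dict contents here are hypothetical (the real EXTERNAL_TABLE_RESOURCE lives in the DAG module and may carry different values); the key names follow the BigQuery table resource format.

```python
import unittest

# Hypothetical contents, for illustration only.
EXTERNAL_TABLE_RESOURCE = {
    "tableReference": {
        "projectId": "test-project",
        "datasetId": "raw",
        "tableId": "trips_external",
    },
    "externalDataConfiguration": {
        "sourceFormat": "PARQUET",
        "sourceUris": ["gs://example-bucket/trips/*.parquet"],
    },
}


class TestResourceShape(unittest.TestCase):
    def test_resource_dict_shape(self):
        # Top-level keys: a typo such as "tableRef" fails here immediately.
        self.assertEqual(
            set(EXTERNAL_TABLE_RESOURCE),
            {"tableReference", "externalDataConfiguration"},
        )
        self.assertEqual(
            set(EXTERNAL_TABLE_RESOURCE["tableReference"]),
            {"projectId", "datasetId", "tableId"},
        )
        config = EXTERNAL_TABLE_RESOURCE["externalDataConfiguration"]
        self.assertIn("sourceFormat", config)
        # sourceUris must be a non-empty list of gs:// URIs, not a bare string.
        self.assertIsInstance(config["sourceUris"], list)
        self.assertTrue(config["sourceUris"])
        for uri in config["sourceUris"]:
            self.assertTrue(uri.startswith("gs://"))
```

Asserting on exact key sets is deliberately strict: adding a key forces a test update, which is a cheap price for catching a renamed or dropped key in review.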
Bonus: trivial CI
- run: pip install ruff pytest pandas pyarrow requests
No apache-airflow in dependencies. Full config at .github/workflows/ci.yml.
One-liner
Don’t try to test Airflow’s scheduling behavior — that’s Airflow’s job. Test the pure Python that you wrote in the DAG file. Once you stub the imports, it’s no different from testing any other module.