How to run Airflow DAGs for a specified date in the past?
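Reference: Airflow Command Line Interface Reference. The commands below use the Airflow 1.x CLI. In Airflow 2.x the command is airflow dags backfill, and the reset flag is spelled --reset-dagruns.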
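If the data you need to reprocess is partitioned by date, and those dates are earlier than when the DAG was created, you have to run the DAG manually from the command line.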
Using backfill to run a DAG
airflow backfill -s START_DATE -e END_DATE DAG_ID
# Example
./venv/bin/airflow backfill -s 2020-01-01 -e 2020-02-02 test_dag
Dates must be in YYYY-MM-DD format.
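The date format rule can be enforced by a small wrapper before Airflow ever sees the dates. This is a minimal sketch, assuming bash and the same ./venv/bin/airflow path as the example. The function name run_backfill and the AIRFLOW_BIN and DRY_RUN variables are hypothetical and are not part of Airflow. The wrapper only checks the shape of each date, so an impossible date such as 2020-13-45 is still passed through to Airflow.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `airflow backfill` that rejects dates not written
# as YYYY-MM-DD. AIRFLOW_BIN and DRY_RUN are assumptions of this sketch.
AIRFLOW_BIN="${AIRFLOW_BIN:-./venv/bin/airflow}"

run_backfill() {
  if [[ $# -ne 3 ]]; then
    echo "usage: run_backfill START_DATE END_DATE DAG_ID" >&2
    return 2
  fi
  local start="$1" end="$2" dag_id="$3"
  local date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'

  # Shape check only: a value such as 2020-13-45 still passes.
  if [[ ! $start =~ $date_re || ! $end =~ $date_re ]]; then
    echo "dates must be in YYYY-MM-DD format" >&2
    return 1
  fi

  local cmd=("$AIRFLOW_BIN" backfill -s "$start" -e "$end" "$dag_id")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_backfill 2020-01-01 2020-02-02 test_dag prints the example command without running it. Dropping DRY_RUN executes the command.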
What about re-running completed DAG runs?
The backfill command does not re-run DAG runs that have already completed within the given period unless you explicitly tell it to. So if a date in that range (for example 2020-02-01) already has a DAG run and I want to repeat it, I have to add --reset_dagruns to the airflow backfill command.
./venv/bin/airflow backfill -s 2020-01-01 -e 2020-02-02 --reset_dagruns test_dag
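Because --reset_dagruns clears existing runs, one option is to make the reset opt-in so it is never added by accident. This is a hedged sketch. The backfill_range function and the AIRFLOW_BIN and DRY_RUN variables are hypothetical. Depending on the Airflow version, --reset_dagruns may also ask for confirmation before clearing anything, so check airflow backfill --help on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: adds --reset_dagruns only when called with --reset.
# AIRFLOW_BIN and DRY_RUN are assumptions of this sketch.
AIRFLOW_BIN="${AIRFLOW_BIN:-./venv/bin/airflow}"

backfill_range() {
  local reset=0
  if [[ ${1:-} == --reset ]]; then
    reset=1
    shift
  fi
  if [[ $# -ne 3 ]]; then
    echo "usage: backfill_range [--reset] START_DATE END_DATE DAG_ID" >&2
    return 2
  fi

  local cmd=("$AIRFLOW_BIN" backfill -s "$1" -e "$2")
  # Without --reset_dagruns, dates that already have a completed run are skipped.
  if (( reset )); then
    cmd+=(--reset_dagruns)
  fi
  cmd+=("$3")

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

With DRY_RUN=1, backfill_range --reset 2020-01-01 2020-02-02 test_dag prints the same command as the re-run example. Without --reset, it prints the plain backfill command.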
Why use screen?
airflow backfill does not schedule all DAG runs at once. It starts the first one, waits until it finishes, and only then schedules the next one.
Because of that, I need to keep the SSH connection to the Airflow server open until backfill has scheduled the last DAG run. If I get disconnected before that, the currently running DAG run finishes, but the remaining ones are never scheduled.
To survive a dropped connection, I suggest using the screen application to start a persistent terminal session on the Airflow server and running the backfill command inside that session.
./venv/bin/airflow backfill -s 2020-05-10 -e 2020-05-11 --reset_dagruns mars_ngx_raw_log_athena
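One variant of this approach starts the backfill in a detached, named screen session instead of typing it into an attached one. This is a minimal sketch. start_in_screen and DRY_RUN are hypothetical names. The -d -m options start the session detached, -S names it, and -L logs the window's output to screenlog.0 in the current directory, so the output survives after the command exits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a command inside a detached, named screen session
# so it keeps running if the SSH connection drops. DRY_RUN is an assumption.
start_in_screen() {
  if [[ $# -lt 2 ]]; then
    echo "usage: start_in_screen SESSION_NAME COMMAND [ARGS...]" >&2
    return 2
  fi
  local session="$1"
  shift
  # -L: log output to screenlog.0; -d -m: start detached; -S: session name.
  local cmd=(screen -L -dmS "$session" "$@")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, start_in_screen backfill ./venv/bin/airflow backfill -s 2020-05-10 -e 2020-05-11 --reset_dagruns mars_ngx_raw_log_athena starts the job. Use screen -ls to list sessions and screen -r backfill to reattach, for instance to answer a confirmation prompt. If you prefer the interactive route, screen -S backfill opens a named session, Ctrl-a d detaches from it, and screen -r backfill reattaches later.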