Airflow 1.10.10 was released on April 9th and in this blog post I'd like to point out several interesting features. Airflow 1.10.10 can be installed as usual with:
pip install apache-airflow==1.10.10
The Airflow 1.10.10 release features a large number of bug fixes, UI improvements, performance improvements, new features, and more. For me personally, the main highlights of this release are: a toggle for the default example connections, an official production Docker image, pluggable secrets backends, a webserver that no longer needs access to the DAG files, and the ability to trigger a DAG with a configuration from the UI.
First, a bit of setup for looking around: the RBAC web interface can be enabled with:
export AIRFLOW__WEBSERVER__RBAC=True
An empty Airflow installation holds no default users. To add one for just looking around, you can create a user with admin rights:
airflow create_user --role Admin --username airflow --password airflow -e airflow@airflow.com -f airflow -l airflow
This creates a user with username airflow and password airflow.
The example DAGs can already be disabled with AIRFLOW__CORE__LOAD_EXAMPLES. However, there was no such toggle for the example connections until now: AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS.
The default value is True, so you'll still have to set it to False explicitly. I think this was a much-requested feature, and the default connections have led to unexpected behaviour more than once, so I'm happy to see this toggle.
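In env-var form, keeping a fresh installation free of the example connections comes down to:
export AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False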
Another highlight is the official Airflow Docker image. The default image comes with Python 3.6, but other versions are also available. Running
docker run apache/airflow:1.10.10
will display the Airflow CLI help. The image does not come with a convenience function for running both the webserver and scheduler inside a single container for looking around; the images are intended to be used in a production setup, e.g. with a Postgres database, one container running the webserver, and one running the scheduler. For more information on how to use these Docker images, refer to https://github.com/apache/airflow/blob/master/IMAGES.rst#using-the-images.
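As a rough sketch of what such a setup could look like: the webserver and scheduler run as separate containers sharing one Postgres metastore. Everything below (network and container names, credentials, the Postgres version) is made up for illustration, it assumes the image's entrypoint forwards these arguments to the airflow command (as the CLI-help behaviour above suggests), and a real deployment would also mount a DAGs folder:
# Hypothetical example: one Postgres metastore, one webserver container, one scheduler container.
docker network create airflow
docker run -d --name airflow-db --network airflow -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow -e POSTGRES_DB=airflow postgres:11
# Point Airflow at the Postgres database and initialize the metastore once.
export SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@airflow-db:5432/airflow
docker run --rm --network airflow -e AIRFLOW__CORE__SQL_ALCHEMY_CONN=$SQL_ALCHEMY_CONN apache/airflow:1.10.10 initdb
# Webserver and scheduler as separate containers against the same database.
docker run -d --name airflow-webserver --network airflow -p 8080:8080 -e AIRFLOW__CORE__SQL_ALCHEMY_CONN=$SQL_ALCHEMY_CONN -e AIRFLOW__CORE__EXECUTOR=LocalExecutor apache/airflow:1.10.10 webserver
docker run -d --name airflow-scheduler --network airflow -e AIRFLOW__CORE__SQL_ALCHEMY_CONN=$SQL_ALCHEMY_CONN -e AIRFLOW__CORE__EXECUTOR=LocalExecutor apache/airflow:1.10.10 scheduler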
Next up: secrets backends. Until now, a common way to supply secrets such as connections was via environment variables, for example:
AIRFLOW_CONN_SUPERSECRET=[AWS SSM secret inserted]
The environment variable secrets were easily readable and required lots of configuration, so this was not ideal. In Airflow 1.10.10, a generic interface for communicating with secrets providers was added, along with several implementations. For example, the AWS SSM Parameter Store backend is configured with:
export AIRFLOW__SECRETS__BACKEND=airflow.contrib.secrets.aws_systems_manager.SystemsManagerParameterStoreBackend
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name": "aws_profile_name_in_config"}'
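Variables can be fetched through the secrets backend as well, which is what the variables_prefix above is for. A quick sketch (the variable name foo is made up):
from airflow.models import Variable

# With the SSM backend configured above, Airflow looks for a parameter named /airflow/variables/foo.
my_value = Variable.get("foo")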
Only one single secrets backend can be configured. When looking up a connection, the search order is: the configured secrets backend first, then environment variables, then the metastore. One current limitation is that encrypted values are not decrypted in the get_parameter call to AWS, but I expect this to soon be fixed. While I still think it's a very useful feature, currently the only way (specific to AWS SSM) is to store an unencrypted string value. A connection stored this way can then be fetched as usual:
from airflow.hooks.base_hook import BaseHook
myconn = BaseHook.get_connection("test")
print(myconn.host)
# myhost
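For completeness, storing such an unencrypted connection on the AWS side could look roughly like this with the AWS CLI; the parameter name test and the connection URI are made up to match the example above:
# Store a connection URI as a plain (unencrypted) String parameter under the configured connections prefix.
aws ssm put-parameter \
  --name /airflow/connections/test \
  --type String \
  --value "postgres://user:password@myhost:5432/mydb"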
More details here: https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html
Another big improvement: the webserver no longer needs access to the DAG files. Set AIRFLOW__CORE__STORE_DAG_CODE=True (plus AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True to serialize the DAGs as JSON in the database). Now, the webserver reads everything from the metastore and does not require access to the DAGs folder anymore. One note to mention: the AIRFLOW__CORE__STORE_DAG_CODE value is interpolated from AIRFLOW__CORE__STORE_SERIALIZED_DAGS. However, the interpolation fails when using environment variables, so make sure to set AIRFLOW__CORE__STORE_DAG_CODE explicitly.
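In other words, to be on the safe side, set both values explicitly:
export AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
export AIRFLOW__CORE__STORE_DAG_CODE=True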
A bug report for the interpolation issue has been filed: https://github.com/apache/airflow/issues/8255.
Lastly, the UI now lets you provide a configuration when triggering a DAG. The given values are accessible in your tasks via dag_run.conf, which can be a convenient way to create parameterized DAGs.
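As a small sketch of what that can look like (the DAG, task, and parameter names below are made up), a templated field can reference the configuration supplied at trigger time:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# Hypothetical DAG: trigger it with a configuration such as {"name": "world"}
# and the BashOperator below will echo "hello world".
dag = DAG(dag_id="example_trigger_with_conf", schedule_interval=None, start_date=days_ago(1))

say_hello = BashOperator(
    task_id="say_hello",
    bash_command='echo "hello {{ dag_run.conf[\'name\'] }}"',
    dag=dag,
)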
Those are the highlights for this release. Feel free to contact us about anything Airflow!