def transform_large_data(**kwargs): # Pull the file path (small metadata) s3_path = kwargs['ti'].xcom_pull(task_ids='extract', key='s3_path') # Read and process the large file from S3 df = pd.read_parquet(s3_path) # Process and write results back
XComs allow tasks to "push" and "pull" metadata or small results. They are stored in the Airflow metadata database and are keyed by: dag_id : The specific workflow. task_id : The originating task. run_id : The specific execution instance. key : A custom identifier (defaults to return_value ). 🔒 Implementing "Exclusive" Scoping
By migrating away from implicit, database-heavy XCom patterns and adopting an exclusive data-sharing architecture, you guarantee that your Apache Airflow environment remains scalable, performant, and secure. airflow xcom exclusive
While any task can technically xcom_pull data from any other task, designing your XComs to be exclusive enhances pipeline stability. How to Implement Exclusive XComs in Airflow
: Every XCom is uniquely identified by its dag_id , task_id , run_id , and a specific key . run_id : The specific execution instance
For more control, you can explicitly push and pull values within a task instance, allowing for custom keys.
def task2(**kwargs): # Retrieve data from XCom customer_data = kwargs['ti'].xcom_pull(key='customer_data') if customer_data: # Process customer data print(customer_data) While any task can technically xcom_pull data from
"Airflow XCom exclusive" refers to the practice of pushing XCom data targeted specifically for one or more downstream tasks, ensuring no other tasks mistakenly consume or rely on that data. It is a best practice for maintaining modularity and preventing unintended dependencies between tasks.
export AIRFLOW__COMMON_IO__XCOM_OBJECTSTORAGE_THRESHOLD=1048576 # 1 MB