Documentation Propagation Automation
Introduction
Documentation Propagation is an automation that automatically propagates column and asset (coming soon) descriptions based on downstream column-level lineage and sibling relationships. It simplifies metadata management by keeping descriptions consistent and reducing the manual effort required to document data assets, which supports Data Governance & Compliance as well as Data Discovery.
This feature is enabled by default in Open Source DataHub.
Capabilities
Open Source
- Column-Level Docs Propagation: Automatically propagate documentation to downstream columns and sibling columns that are derived from or depend on the source column.
- (Coming Soon) Asset-Level Docs Propagation: Propagate descriptions to sibling assets.
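The column-level behavior described above can be pictured as a breadth-first walk over lineage and sibling edges. The following is a minimal sketch only, not DataHub's implementation: the `propagate` function, the `PropagatedDoc` record, the column names, and the rule that columns with a hand-written description are skipped and not traversed further are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PropagatedDoc:
    text: str
    origin: str  # column the description was originally written on


def propagate(source, text, edges, authored):
    """Push `text` from `source` to every column reachable through `edges`.

    `edges` maps a column to its downstream and sibling columns.
    `authored` holds columns that already carry a hand-written description;
    they are skipped and not traversed further (an assumption of this sketch).
    """
    result = {}
    queue = deque([source])
    seen = {source}
    while queue:
        column = queue.popleft()
        for neighbor in edges.get(column, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor in authored:
                continue
            result[neighbor] = PropagatedDoc(text, source)
            queue.append(neighbor)
    return result


# Hypothetical lineage: raw -> staging -> (mart, sibling view); mart -> report.
edges = {
    "raw.orders.id": ["staging.orders.id"],
    "staging.orders.id": ["mart.orders.order_id", "staging_view.orders.id"],
    "mart.orders.order_id": ["report.sales.order_id"],
}
docs = propagate(
    "raw.orders.id",
    "Unique order identifier",
    edges,
    authored={"mart.orders.order_id"},
)
for column, doc in docs.items():
    print(f"{column}: {doc.text!r} (from {doc.origin})")
```

In this example only `staging.orders.id` and `staging_view.orders.id` receive the description, because the hand-documented `mart.orders.order_id` blocks the walk toward `report.sales.order_id`.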
DataHub Cloud (Acryl)
- Includes all the features of Open Source.
- Propagation Rollback (Undo): Offers the ability to undo any propagation changes, providing a safety net against accidental updates.
- Historical Backfilling: Automatically backfills historical data for newly documented columns to maintain consistency across time.
Comparison of Features
Feature | Open Source | DataHub Cloud |
---|---|---|
Column-Level Docs Propagation | ✔️ | ✔️ |
Asset-Level Docs Propagation (coming soon) | ✔️ | ✔️ |
Downstream Lineage + Siblings | ✔️ | ✔️ |
Propagation Rollback (Undo) | ❌ | ✔️ |
Historical Backfilling | ❌ | ✔️ |
Enabling Documentation Propagation
In Open Source
Note that the user must have the Manage Ingestion permission to view and enable the feature.
- Navigate to Settings: Click on the 'Settings' gear in the top navigation bar.
- Navigate to Features: Click on the 'Features' tab in the left-hand navigation bar.
- Enable Documentation Propagation: Locate the 'Documentation Propagation' section and toggle the feature on. Currently, only column-level propagation is supported; asset-level propagation is coming soon.
In DataHub Cloud
- Navigate to Automations: Click on 'Govern' > 'Automations' in the navigation bar.
- Create An Automation: Click on 'Create' and select 'Column Documentation Propagation'.
- Configure Automation: Fill in the required fields, such as the name, description, and category. Finally, click 'Save and Run' to start the automation.
Propagating for Existing Assets (DataHub Cloud Only)
In DataHub Cloud, you can back-fill historical data for existing assets so that all existing column descriptions are propagated to downstream columns when you start the automation. Note that the initial back-filling process may take some time to complete, depending on the number of assets and the complexity of your lineage.
To do this, navigate to the Automation you created in Step 3 above, click the 3-dot "More" menu, and then click "Initialize".
This one-time step will kick off the back-filling process for existing descriptions. If you only want to begin propagating descriptions going forward, you can skip this step.
Rolling Back Propagated Descriptions (DataHub Cloud Only)
In DataHub Cloud, you can roll back all descriptions that have been propagated historically.
This feature allows you to "clean up" or "undo" any accidental propagation that occurred automatically, in case you no longer want propagated descriptions to be visible.
To do this, navigate to the Automation you created in Step 3 above, click the 3-dot "More" menu, and then click "Rollback".
This one-time step will remove all propagated descriptions. To simply stop propagating new descriptions, you can disable the automation.
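Conceptually, a rollback removes only what the automation wrote and leaves hand-authored documentation alone. The sketch below illustrates that idea under stated assumptions: the `Description` record, its `propagated` flag, and the `rollback` helper are hypothetical and do not reflect how DataHub Cloud stores or removes descriptions.

```python
from dataclasses import dataclass


@dataclass
class Description:
    text: str
    propagated: bool  # True if written by the automation, False if authored by hand


def rollback(descriptions):
    """Return a copy of `descriptions` with every propagated entry removed."""
    return {col: d for col, d in descriptions.items() if not d.propagated}


# Hypothetical state: one authored description and two propagated copies.
descriptions = {
    "raw.orders.id": Description("Unique order identifier", propagated=False),
    "staging.orders.id": Description("Unique order identifier", propagated=True),
    "mart.orders.order_id": Description("Unique order identifier", propagated=True),
}
remaining = rollback(descriptions)
print(sorted(remaining))
```

Because the helper returns a new mapping instead of mutating its input, the original state stays available for comparison in this sketch.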
Viewing Propagated Descriptions
Once the automation is enabled, you'll be able to recognize propagated descriptions by the thunderbolt icon shown next to them.
The tooltip will provide additional information, including where the description originated and any intermediate hops that were used to propagate the description.
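The origin and intermediate hops mentioned above amount to a path through the lineage graph. As a rough illustration, the sketch below reconstructs such a path with a breadth-first search and formats it as a one-line message. The `hop_path` and `tooltip` helpers, the message wording, and the column names are assumptions for this example and are not taken from the DataHub UI.

```python
from collections import deque


def hop_path(origin, target, edges):
    """Return the columns from `origin` to `target` (inclusive), or None if unreachable."""
    parents = {origin: None}
    queue = deque([origin])
    while queue:
        column = queue.popleft()
        if column == target:
            # Walk parent pointers back to the origin, then reverse.
            path = []
            while column is not None:
                path.append(column)
                column = parents[column]
            return path[::-1]
        for neighbor in edges.get(column, ()):
            if neighbor not in parents:
                parents[neighbor] = column
                queue.append(neighbor)
    return None


def tooltip(origin, target, edges):
    """Describe where the description on `target` came from."""
    path = hop_path(origin, target, edges)
    if path is None:
        return f"No lineage path from {origin} to {target}"
    hops = path[1:-1]  # columns strictly between origin and target
    via = f" via {', '.join(hops)}" if hops else ""
    return f"Propagated from {origin}{via}"


# Hypothetical two-hop lineage.
edges = {
    "raw.orders.id": ["staging.orders.id"],
    "staging.orders.id": ["mart.orders.order_id"],
}
print(tooltip("raw.orders.id", "mart.orders.order_id", edges))
```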