DCM Projects for data pipelines
DCM Projects provide a full-lifecycle developer experience that includes capabilities tailored to managing data pipelines.
The pipeline-specific commands don’t apply to all object types. They extend the core commands for the following pipeline use cases:
REFRESH command for dynamic tables managed by a DCM project.
TEST command for data quality expectations attached to managed objects.
PREVIEW command for checking sample output from a dynamic table, view, or table before deploying.
REFRESH command for dynamic tables
After you deploy a pipeline definition change, you can refresh the dynamic tables inside the pipeline project before testing data quality expectations, so that any new transformation logic is applied end to end.
You can refresh all dynamic tables managed by the DCM project, together with their required upstream dynamic tables, with one command. The command applies only to dynamic tables that are deployed and managed by the referenced project. It acts on the deployed state, independent of any definition files. Other object types, such as tasks, are not affected.
See TEST command for data quality expectations for usage examples that combine REFRESH and TEST.
The command runs until all dynamic table refreshes are complete and returns a summary of the row changes or errors for each dynamic table.
To run the REFRESH command:
For the REFRESH ALL output format, including the JSON schema and examples, see the REFRESH ALL output section of the EXECUTE DCM PROJECT command reference.
TEST command for data quality expectations
You can set data quality expectations as quality gates at every stage of your data transformation:
Attach expectations to raw data in your bronze layer landing tables to ensure your raw input meets expectations and does not cause errors during transformation.
Attach expectations to your silver layer as quality gates, so that checkpoints at different transformation stages make data issues easier to debug.
Attach expectations to your gold layer to ensure the output quality of your data product.
Attach expectations from downstream consumers of your data product to your gold layer so you can validate those expectations before deploying breaking changes.
See Data metric function for how to attach expectations in DCM projects.
You can test all data quality expectations attached to tables, dynamic tables, or views that are managed by the DCM project with one command.
Data metric functions that are attached without expectations are not checked.
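The rule above (only metrics with an attached expectation are checked) can be illustrated with a small Python sketch. This is a conceptual model only, not how DCM Projects evaluates expectations: `AttachedMetric`, `check_expectations`, and the object and metric names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttachedMetric:
    """Hypothetical stand-in for a data metric function attached to an object."""
    object_name: str
    metric_name: str
    value: float                                    # measured metric value
    expectation: Optional[Callable[[float], bool]]  # None means no expectation attached


def check_expectations(metrics: list[AttachedMetric]) -> dict[str, list[str]]:
    """Check every metric that has an expectation; skip metrics that have none."""
    result: dict[str, list[str]] = {"passed": [], "failed": [], "skipped": []}
    for m in metrics:
        label = f"{m.object_name}.{m.metric_name}"
        if m.expectation is None:
            result["skipped"].append(label)   # attached without an expectation
        elif m.expectation(m.value):
            result["passed"].append(label)
        else:
            result["failed"].append(label)
    return result
```

In this model, a metric attached to a gold table with `expectation=None` lands in `skipped` and never affects the outcome of the gate.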
You can use the CLI commands to set up automated testing as part of your CI/CD workflow. For example, if you have production-like data on a QA, test, or staging environment, you can follow these steps:
PLAN against QA to verify the expected project definition changes.
DEPLOY to QA.
REFRESH ALL dynamic tables on QA to update data based on any new transformation logic and updated definitions, so that expectations are not tested against outdated data.
TEST ALL data quality expectations attached to table objects on the QA environment to verify that the newly deployed logic works as expected and has no negative side effects on the expected shape of your data output.
If all expectations are met on QA, continue with PLAN and DEPLOY to your production environment.
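The steps above amount to a gated sequence: each step runs only if the previous one succeeded, and production is touched only after TEST ALL passes on QA. The following Python sketch shows that control flow under stated assumptions. `run_release` and the `run_step` callback are hypothetical; how a step is actually executed (a CLI call or a SQL statement) is deliberately left to the caller, because the exact command syntax is not shown here.

```python
from typing import Callable

# Step names follow the workflow described above. How each step is executed
# is left to the caller-supplied runner (hypothetical).
QA_STEPS = ["PLAN", "DEPLOY", "REFRESH ALL", "TEST ALL"]
PROD_STEPS = ["PLAN", "DEPLOY"]


def run_release(run_step: Callable[[str, str], bool],
                qa_env: str = "QA",
                prod_env: str = "PROD") -> list[tuple[str, str]]:
    """Run the QA steps in order, then the production steps only if QA succeeded.

    run_step(env, step) is a hypothetical callback that returns True on success.
    Returns the (env, step) pairs that were actually executed.
    """
    executed: list[tuple[str, str]] = []
    for env, steps in ((qa_env, QA_STEPS), (prod_env, PROD_STEPS)):
        for step in steps:
            executed.append((env, step))
            if not run_step(env, step):
                return executed  # stop at the first failing step
    return executed
```

A CI job could pass a `run_step` that shells out to the CLI and returns whether the exit code was zero. If TEST ALL fails on QA, the returned list ends with `("QA", "TEST ALL")` and no production step is attempted.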
To run the TEST command:
For the TEST ALL output format, including the JSON schema and examples, see the TEST ALL output section of the EXECUTE DCM PROJECT command reference.
PREVIEW command
When you write or alter the SELECT statement of a dynamic table or view, a sample output helps validate the shape of the data. For complex lineage graphs with multiple transformation steps, you can check the output of a downstream view or dynamic table when making changes further upstream.
To validate that the transformation in your code results in the expected data output before deploying, run the PREVIEW command.
The PREVIEW command runs PLAN to compile the current definitions, independent of any deployed state, and then returns a data sample for a specified dynamic table, view, or regular table.
Keep the following requirements and considerations in mind:
The PREVIEW command must reference the fully qualified name of a table object, and the name must not contain Jinja variables.
PREVIEW returns sample data only if data is already available in the source tables.
PREVIEW runs the SELECT statements of all referenced dynamic tables and views, but it does not run tasks or CREATE TABLE AS SELECT statements.
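A script that wraps PREVIEW could check the first requirement before issuing the command. The Python sketch below is one possible pre-check, not part of DCM Projects. It assumes that "fully qualified" means a three-part `database.schema.object` name and that Jinja variables can be detected by their `{{` or `{%` delimiters. `is_valid_preview_target` is a hypothetical helper.

```python
import re

# One identifier part: unquoted (letters, digits, _, $) or double-quoted.
_PART = r'(?:[A-Za-z_][A-Za-z0-9_$]*|"(?:[^"]|"")+")'
# Assumption: a fully qualified name has exactly three parts.
_FQN = re.compile(rf"{_PART}\.{_PART}\.{_PART}")


def is_valid_preview_target(name: str) -> bool:
    """Return True for a database.schema.object name with no Jinja delimiters."""
    if "{{" in name or "{%" in name:
        return False  # templated names cannot be passed to PREVIEW
    return _FQN.fullmatch(name) is not None
```

The check is intentionally conservative: it rejects any name containing Jinja delimiters, even inside a quoted identifier.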
To run the PREVIEW command: