dbt Quicktip: Using deprecation_date to improve your model governance

07 Nov, 2023

Introduction

In a previous article, we discussed some of the new dbt features related to model governance, Data Mesh and multi-project deployments (Data contracts and schema enforcement with dbt). However, one parameter that can make a real difference to multi-project communication was left out: deprecation_date. Setting it on your project's models makes change management easier, because plans and timelines for long-term support and maintenance become visible to all consumers.

How does it work?

To use deprecation_date, all we have to do is set the parameter in the .yml file that holds the model definitions. The value should be formatted as a date and may also include an offset from UTC.

Once it is set, if a project references a model (or model version) that is scheduled for deprecation, or whose deprecation date has already passed, a warning will be generated. If a newer version of the model is available, the warning will mention that as well. Pretty easy, right?

Hands-on

We will use this repository as an example.

There, we have a simple DAG with one staging model, one intermediate model and a few marts. In the next sections we will explore a few possible setups, the warnings each one produces, and how to make sure they are not ignored by turning the warnings into errors.

Past deprecation_date

To begin with, the simplest possible configuration is to set the deprecation_date directly at the model level, without model versioning.

Let’s say we want to set a deprecation_date for our intermediate model. This is what it would look like:
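As a minimal sketch of this model-level setup, the properties file could look roughly like the block below. The file path, the model name int_orders and the date are hypothetical placeholders, not values taken from the example repository.

```yaml
# models/intermediate/_int_models.yml  (hypothetical path)
version: 2

models:
  - name: int_orders            # hypothetical name for the intermediate model
    description: Intermediate model scheduled for removal.
    # A date, optionally with a time and an offset from UTC.
    # Assumed here to be earlier than "today" (2023-11-03),
    # so the model already counts as deprecated.
    deprecation_date: 2023-10-01 00:00:00.00+00:00
```

Since the offset is optional, a plain date such as 2023-10-01 should work as well.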

This date is in the past (considering today is 2023-11-03), so once it is set and we run our pipeline again, we would get two types of warnings: one saying that the model is deprecated, and the second one saying that other models in our project depend on deprecated models.

To fix it, we would have to disable the deprecated model and change the refs. However, since we have only one intermediate model, we will leave it as it is.

Future deprecation_date with a new model version

Now let’s move to a slightly more complex setup: we will create a new version of our intermediate model and set a deprecation_date in the future, at the version level.
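A sketch of this version-level variant, again with int_orders and the date as hypothetical placeholders. It also assumes that latest_version is kept at 1 while consumers migrate, so that unpinned refs still resolve to the version being retired.

```yaml
version: 2

models:
  - name: int_orders            # hypothetical model name
    latest_version: 1           # assumption: v1 stays the default during migration
    versions:
      - v: 1
        # Hypothetical date, in the future relative to 2023-11-03.
        deprecation_date: 2024-01-01 00:00:00.00+00:00
      - v: 2                    # the new version consumers should move to
```

By default dbt looks for each version's SQL in a file named <model_name>_v<v>.sql, so this sketch presumes that int_orders_v1.sql and int_orders_v2.sql both exist.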

In this case, we would get a different type of warning, saying that the model has a deprecation_date set and a new version available. It also highlights how to reference the new version.

Past deprecation_date with a new model version

Finally, let’s assume that the date from the previous setup has passed, so the deprecation_date is now in the past and a newer version is available. The warning would then change to a more emphatic one.

To fix it, we have to change two different things:

1 – Disable the old version of the intermediate model and bump the latest version.

2 – Bump the versions of the refs in the marts.
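Step 1 could be sketched as follows, continuing with the hypothetical int_orders model. Disabling the old version through a version-level config block is an assumption about how one might do it; removing the v1 entry and its SQL file altogether is another option.

```yaml
version: 2

models:
  - name: int_orders            # hypothetical model name
    latest_version: 2           # bumped: unpinned refs now resolve to v2
    versions:
      - v: 1
        deprecation_date: 2024-01-01 00:00:00.00+00:00   # hypothetical, now in the past
        config:
          enabled: false        # assumption: switch off the retired version
      - v: 2
```

Step 2 happens in the marts' SQL rather than in YAML: a ref pinned to the old version, such as ref('int_orders', v=1), would be changed to v=2 or left unpinned so that it follows latest_version.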

Bumping the Warnings into Errors

We went through a few possible configurations and the warnings they produce. The issue, however, is that only warnings are raised during the run – and we all know that warnings tend to be ignored for as long as possible, until something actually stops the pipeline from running. To make sure this doesn’t happen in our project, we can promote specific types of warnings to errors, so that the deprecation_date can’t be ignored.

To do so, the easiest way is to set the DBT_WARN_ERROR_OPTIONS environment variable with the correct warning types. You can find more information about the available options here.

DBT_WARN_ERROR_OPTIONS = {"include": ["DeprecatedModel","DeprecatedReference"]}

Note: there are other ways to set this, either through the command line or the profiles.yml file. Especially if you are using dbt Core, you might set this in a different way.
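For dbt Core, the profiles.yml route might look like the sketch below. It assumes a dbt Core release from around the time of writing, in which global configs sit under a top-level config key in profiles.yml. Later releases changed where these settings live, so check the documentation for your version.

```yaml
# profiles.yml (assumed location of global configs for this dbt Core version)
config:
  warn_error_options:
    include:
      - DeprecatedModel        # same warning types as in the environment variable above
      - DeprecatedReference
```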

Now the warnings will become errors, making sure the deprecation_date is not ignored.

Conclusion

We explored a few possible scenarios and setups using dbt's deprecation_date and model versioning, but of course things can get much more complex, with versioned dependent models and multi-project dependencies.

As dbt moves towards Data Mesh, the sooner you start using the new features and internalizing the concepts, the sooner you will reap the benefits of decentralized data management when making data-driven decisions.

Lucas Ortiz