dbt, amongst other functionalities, allows developers to create packages, offering a structured way to organize and share code, models, tests, and macros. They improve code organization, promote collaboration, and enable the reuse of pre-built and tested code from the dbt community, making data transformation projects more efficient and maintainable.
However, there are cases where a package works perfectly in your project, except for one little piece that might have a bug or a behavior you would like differently. Sometimes, such scenarios prevent the development and deployment of the whole project, and waiting for a new package version might take longer than you can afford.
This is where overriding global and packages macros comes into play. This functionality allows you to customize the behavior of macros, enabling quick-fixes and fine-tuning to your specific needs.
How does it work?
The concept of this functionality is pretty simple: you create your own macro, and tell dbt to look into your project for the macro, before looking into the package (or global macro), using dispatch.
The same applies to packages: if you are using an adapter other than the built-in ones, you may have to use a package to make the macros ‘compatible’ - tsql-utils, for example.
You can read all the details in the official documentation.
This is a real example that happened to me recently.
In a project I’m currently working on, I use the dbt-sqlserver adapter to connect dbt with SQL Server. I also use tsql-utils to make the macros compatible. It worked fine for some months, but I started to get the following error in my pipeline.
After further investigation, I found out that the macro that was causing this error was generate_surrogate_key - which I used in some of the models.
As you can see from the image below, the last step from the macro is a concat function. Sadly, SQL Server throws an error when using concat with only one column (now the importance of the compatibility packages should be clear).
The quick fix for this is to add an if statement, checking if there is one column or more. To quickly implement it, I created a file named generate_surrogate_key.sql, defining the macro.
As you can see from the image below, I prefixed the macro with sqlserver__. The purpose of this is to override this macro only if the adapter is sqlserver, which is our case. If I were using a different adapter or if the change should be applied to all adapters, I would not use this prefix.
Finally, I added the following code to my dbt_project.yml file, to make sure dbt will search for the macro in the correct place and order (don’t forget to replace the first item with your project’s name).
dispatch: - macro_namespace: dbt_utils search_order: ['my_project', 'tsql_utils', 'dbt_utils']
That’s it! Once I committed, pushed and reran the pipeline, it worked smoothly!
Now, I can peacefully wait for the tsql_utils package to be updated, then delete my own macro and use only the package’s.
By leveraging the feature to override global and packages macros, you can save hours of debugging and implement quick-fixes, making you less dependent on the release of new package versions.
Of course, if you find a bug and can fix it yourself by overriding a package macro, I encourage you to contribute back to the community by fixing it in the respective package, creating a Pull Request. I’ve recently done so in the dbt-sqlserver adapter and tsql-utils package, and it was a great experience.