
Introduction
In the world of data modeling, one of the challenges organizations face is managing tables with business keys (BKs) that are re-used over time. This typically happens when a company, due to technical limitations in the source system, reuses primary keys or "unique identifiers" over time. The practice can lead to ambiguity, especially when historical data is needed for analytical purposes. This blog post explores how to use Data Vault 2.0 to resolve re-used BKs by leveraging the Business Vault.
Understanding the Data Vault 2.0 Architecture
Data Vault 2.0 is often seen as a modeling technique, but it’s a full solution that covers methodology, architecture, implementation, and modeling.
Let’s have a look at the below architecture:

- The first step is to extract the data from the source systems and load it, in its raw format, into a Staging Area.
- Next, you can apply hard rules if necessary, such as data type changes.
- Then you load the data into the Raw Vault, modeling it into Hubs, Links, Satellites, etc.
- If you need to apply business logic to your data across more than one use case, you can extend the Raw Vault with a Business Vault.
- The last step is to build the Information Marts (IMs) used for reporting and dashboarding. The IM tables or views are built from the Raw Vault/Business Vault and can be modeled using, for example, Kimball dimensional modeling. See this blog post for reference: https://xebia.com/blog/kimball-dimensional-modelling/.
Let’s look at an example
Suppose we are an e-commerce business, and we want to analyze our order data. The order code is re-used over time. In this scenario, we typically have one source table for the order header, and another for the order items.
order header:
order_code | order_date | customer_code | ship_to_city |
1111 | 10/10/2024 | A | Almere |
1112 | 11/10/2024 | B | Amsterdam |
1111 | 12/11/2024 | D | Breda |
order items:
order_code | line_code | item_code | quantity | amount |
1111 | 1 | AB | 1 | 32 |
1111 | 2 | BC | 2 | 34.99 |
1111 | 3 | DF | 10 | 56 |
1112 | 1 | BC | 5 | 70 |
1112 | 2 | XS | 1 | 5 |
1111 | 1 | DE | 20 | 251.95 |
The expectation of the Business Intelligence (BI) team is that they should have two Information Mart tables, one for the order header and another one for the order items, and they need to be able to join them for reporting purposes. They also want to keep track of the full historical changes.
To determine which order items belong to each order header, we would typically follow the order_code foreign key in the order items table, allowing us to reconstruct the complete order.
However, if we examine the order header table closely, we notice that the order_code “1111” was reused on a different date. This introduces the complexity of determining which item codes are associated with each order_code over time.
The logical solution in this case is to create a primary key (PK) in the order header table that uniquely identifies each order over time. This same value should be inserted as a foreign key (FK) in the order items table. By including the order_date as part of the primary key, we can ensure uniqueness and achieve a table with the expected results, as shown below:
im_order_header:
pk_order_code | order_code | order_date | customer_code | ship_to_city |
1111 || 10/10/2024 | 1111 | 10/10/2024 | A | Almere |
1112 || 11/10/2024 | 1112 | 11/10/2024 | B | Amsterdam |
1111 || 12/11/2024 | 1111 | 12/11/2024 | D | Breda |
im_order_items:
pk_order_line_code | fk_order_code | order_code | line_code | item_code | quantity | amount |
1111 || 10/10/2024 || 1 | 1111 || 10/10/2024 | 1111 | 1 | AB | 1 | 32 |
1111 || 10/10/2024 || 2 | 1111 || 10/10/2024 | 1111 | 2 | BC | 2 | 34.99 |
1111 || 10/10/2024 || 3 | 1111 || 10/10/2024 | 1111 | 3 | DF | 10 | 56 |
1112 || 11/10/2024 || 1 | 1112 || 11/10/2024 | 1112 | 1 | BC | 5 | 70 |
1112 || 11/10/2024 || 2 | 1112 || 11/10/2024 | 1112 | 2 | XS | 1 | 5 |
1111 || 12/11/2024 || 1 | 1111 || 12/11/2024 | 1111 | 1 | DE | 20 | 251.95 |
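The effect of the composite key can be sketched in Python. This is illustrative data only, and it assumes the order date has already been attached to each item row (the rest of this post shows how to achieve that):

```python
# Illustrative header and item rows, mirroring the tables above.
headers = [
    {"order_code": "1111", "order_date": "10/10/2024", "customer_code": "A"},
    {"order_code": "1112", "order_date": "11/10/2024", "customer_code": "B"},
    {"order_code": "1111", "order_date": "12/11/2024", "customer_code": "D"},
]
items = [
    {"order_code": "1111", "order_date": "10/10/2024", "line_code": "1"},
    {"order_code": "1111", "order_date": "12/11/2024", "line_code": "1"},
]

def composite_pk(order_code, order_date):
    # order_code alone is ambiguous once it is re-used; the date disambiguates it.
    return f"{order_code} || {order_date}"

header_by_pk = {composite_pk(h["order_code"], h["order_date"]): h for h in headers}

# Each item now resolves to exactly one header, even though "1111" is re-used.
resolved = [
    header_by_pk[composite_pk(i["order_code"], i["order_date"])]["customer_code"]
    for i in items
]
print(resolved)  # the two "1111" items map to customers A and D respectively
```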
In the next sections, we cover how you can use Data Vault 2.0 to obtain this output.
The Raw Vault
Let’s assume that your data is already extracted and available in the Staging Area. The next step is to load the data into the Raw Vault. In this blog, we will not cover the details of the different structures that can be created in the Data Warehouse following the Data Vault 2.0 principles, such as hubs, links, and satellites. Below, you can find one possible solution to model the order data.

After creating your satellites, you will see the information below. In Data Vault 2.0, best practice is to use Hash Keys (HK) to uniquely identify records and ensure consistent joins between tables. Hash keys provide fixed-length join columns, which typically perform better than joining on long text-based business keys.
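As a sketch, such a hash key can be derived in Python with hashlib. The delimiter and the normalization rules (trimming, upper-casing) are assumptions here; conventions vary per implementation:

```python
import hashlib

def hash_key(*business_key_parts, delimiter="||"):
    """Derive a deterministic hash key from one or more business key parts."""
    # Trimming and upper-casing before hashing is a common Data Vault
    # convention so that "1111" and " 1111 " produce the same key.
    normalized = delimiter.join(str(p).strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

hk_order = hash_key("1111")            # hk_order in rv_sat__order_header
hk_order_line = hash_key("1111", "1")  # hk_order_line in rv_sat__order_items
```

Note that hashing the re-used business key alone yields the same hash for both “1111” orders, which is exactly the ambiguity discussed below.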
rv_sat__order_header:
hk_order | order_code | order_date | customer_code | ship_to_city | load_date | end_date |
MD5(1111) | 1111 | 10/10/2024 | A | Almere | 10/10/2024 | 12/11/2024 |
MD5(1112) | 1112 | 11/10/2024 | B | Amsterdam | 11/10/2024 | NULL |
MD5(1111) | 1111 | 12/11/2024 | D | Breda | 12/11/2024 | NULL |
rv_sat__order_items:
hk_order | hk_order_line | order_code | line_code | item_code | quantity | amount | load_date | end_date |
MD5(1111) | MD5(1111 || 1) | 1111 | 1 | AB | 1 | 32 | 10/10/2024 | 12/11/2024 |
MD5(1111) | MD5(1111 || 2) | 1111 | 2 | BC | 2 | 34.99 | 10/10/2024 | 12/11/2024 |
MD5(1111) | MD5(1111 || 3) | 1111 | 3 | DF | 10 | 56 | 10/10/2024 | 12/11/2024 |
MD5(1112) | MD5(1112 || 1) | 1112 | 1 | BC | 5 | 70 | 11/10/2024 | NULL |
MD5(1112) | MD5(1112 || 2) | 1112 | 2 | XS | 1 | 5 | 11/10/2024 | NULL |
MD5(1111) | MD5(1111 || 1) | 1111 | 1 | DE | 20 | 251.95 | 12/11/2024 | NULL |
So now, we need to consider the following:
- The same order code is only active once in the satellite (with end_date equal to NULL).
- If you use these satellites to create your Information Marts tables or views, it’s essential to include only the active records and exclude historical data. Failing to filter for active records will result in non-unique primary keys, making it impossible to join the header and items models correctly.
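The active-record filter described above can be demonstrated with SQLite via Python (a minimal sketch; the table name follows the example, and ISO dates are used so they sort correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table rv_sat__order_header (order_code text, customer_code text, end_date text);
insert into rv_sat__order_header values
    ('1111', 'A', '2024-11-12'),  -- closed when order code 1111 was re-used
    ('1112', 'B', null),
    ('1111', 'D', null);
""")

# Without the filter, order_code is not unique: '1111' appears twice.
all_codes = [r[0] for r in conn.execute("select order_code from rv_sat__order_header")]

# Filtering on end_date is null keeps only active records, restoring
# uniqueness -- but at the cost of losing the historical version.
active = [r[0] for r in conn.execute(
    "select order_code from rv_sat__order_header where end_date is null"
)]
print(all_codes, active)
```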
The business requires historical data to make informed decisions. Looking at the expected result, we can see that making the primary key unique involves applying business logic: both tables need a combination of order_code and order_date as the unique identifier. However, the raw order items data lacks any date information. In this scenario, we have the following options to consider:
- Ask the data provider to include the date information in the order items source table. This can be very time-consuming and is sometimes not possible.
- After loading the data into the Raw Vault, create business logic that attaches the order code + order date combination to the items. This can be done by leveraging the Business Vault.
- Joining the header and items tables before loading the data into the Raw Vault is not an option, because it would violate the Data Vault 2.0 standard that only hard rules are applied in the Staging Layer.
- Joining the header and items data in the Information Mart, after loading into the Raw Vault, is also not ideal because it adds complexity and redundancy to the reporting layer. The business logic to combine order code + order date would need to be recreated for every use case, leading to inconsistencies.
The Business Vault
The Business Vault is an extension of the Raw Vault, particularly useful when you need to apply business logic that serves several use cases. Here, it needs to be implemented so that the order header and the order items share the same surrogate key.
bv__order_header:
The bv__order_header is an extension of the rv_sat__order_header. To create it, you need to generate a surrogate key and recalculate the end_date based on the next load_date for each surrogate key. This approach allows you to handle cases where two order_code values, such as “1111”, are valid simultaneously. Below is an example of the code and how the Business Vault order header table would look:
with
unique_key as (
    select
        order_code,
        order_date,
        concat(order_code, ' || ', order_date) as bk_order_code,
        customer_code,
        ship_to_city,
        load_date
    from
        rv_sat__order_header
),
end_date_calculation as (
    select
        *,
        lead(load_date) over (
            partition by
                bk_order_code
            order by
                load_date
        ) as end_date
    from
        unique_key
)
select
    order_code,
    order_date,
    bk_order_code,
    customer_code,
    ship_to_city,
    load_date,
    coalesce(end_date, '9999-12-31') as end_date
from
    end_date_calculation
order_code | order_date | bk_order_code | customer_code | ship_to_city | load_date | end_date |
1111 | 10/10/2024 | 1111 || 10/10/2024 | A | Almere | 10/10/2024 | 31/12/9999 |
1112 | 11/10/2024 | 1112 || 11/10/2024 | B | Amsterdam | 11/10/2024 | 31/12/9999 |
1111 | 12/11/2024 | 1111 || 12/11/2024 | D | Breda | 12/11/2024 | 31/12/9999 |
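The end-date recalculation can be exercised end to end with SQLite via Python (a minimal sketch; window functions require SQLite ≥ 3.25, and ISO dates are used so they sort correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table rv_sat__order_header (
    order_code text, order_date text, customer_code text,
    ship_to_city text, load_date text
);
insert into rv_sat__order_header values
    ('1111', '2024-10-10', 'A', 'Almere',    '2024-10-10'),
    ('1112', '2024-10-11', 'B', 'Amsterdam', '2024-10-11'),
    ('1111', '2024-11-12', 'D', 'Breda',     '2024-11-12');
""")
rows = conn.execute("""
with unique_key as (
    select *, order_code || ' || ' || order_date as bk_order_code
    from rv_sat__order_header
)
select
    bk_order_code,
    load_date,
    -- lead() finds the next load for the SAME composite key; open records get 9999-12-31
    coalesce(lead(load_date) over (
        partition by bk_order_code order by load_date
    ), '9999-12-31') as end_date
from unique_key
order by load_date
""").fetchall()
for r in rows:
    print(r)
```

Because each composite key occurs only once here, all three records stay open: the re-used “1111” no longer closes its predecessor.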
bv__order_items:
A similar logic applies when creating the Business Vault table for rv_sat__order_items, but first you need to attach the surrogate key from the Business Vault order header created previously. Again, a code example and how the Business Vault order items table would look:
with
sat_order_items as (
    select
        order_code,
        line_code,
        item_code,
        quantity,
        amount,
        load_date
    from
        rv_sat__order_items
),
bv_order_header as (
    select
        order_code,
        bk_order_code,
        load_date,
        -- a header version is valid until the next load of the same
        -- order_code (exclusive), so each item matches exactly one version
        coalesce(
            lead(load_date) over (
                partition by
                    order_code
                order by
                    load_date
            ),
            '9999-12-31'
        ) as valid_to
    from
        bv__order_header
),
add_unique_key as (
    select
        sat_order_items.*,
        bv_order_header.bk_order_code,
        concat(
            bv_order_header.bk_order_code,
            ' || ',
            sat_order_items.line_code
        ) as bk_order_line_code
    from
        sat_order_items
        left join bv_order_header
            on sat_order_items.order_code = bv_order_header.order_code
            and sat_order_items.load_date >= bv_order_header.load_date
            and sat_order_items.load_date < bv_order_header.valid_to
),
end_date_calculation as (
    select
        *,
        lead(load_date) over (
            partition by
                bk_order_line_code
            order by
                load_date
        ) as end_date
    from
        add_unique_key
)
select
    order_code,
    bk_order_code,
    bk_order_line_code,
    line_code,
    item_code,
    quantity,
    amount,
    load_date,
    coalesce(end_date, '9999-12-31') as end_date
from
    end_date_calculation
order_code | bk_order_code | bk_order_line_code | line_code | item_code | quantity | amount | load_date | end_date |
1111 | 1111 || 10/10/2024 | 1111 || 10/10/2024 || 1 | 1 | AB | 1 | 32 | 10/10/2024 | 31/12/9999 |
1111 | 1111 || 10/10/2024 | 1111 || 10/10/2024 || 2 | 2 | BC | 2 | 34.99 | 10/10/2024 | 31/12/9999 |
1111 | 1111 || 10/10/2024 | 1111 || 10/10/2024 || 3 | 3 | DF | 10 | 56 | 10/10/2024 | 31/12/9999 |
1112 | 1112 || 11/10/2024 | 1112 || 11/10/2024 || 1 | 1 | BC | 5 | 70 | 11/10/2024 | 31/12/9999 |
1112 | 1112 || 11/10/2024 | 1112 || 11/10/2024 || 2 | 2 | XS | 1 | 5 | 11/10/2024 | 31/12/9999 |
1111 | 1111 || 12/11/2024 | 1111 || 12/11/2024 || 1 | 1 | DE | 20 | 251.95 | 12/11/2024 | 31/12/9999 |
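The key-attachment join can likewise be exercised with SQLite via Python. One hedged detail: to keep the join unambiguous when recalculated end dates leave several versions of the same order_code open, this sketch bounds each header version by the load date of the next re-use of the same order_code (an assumption, not necessarily how your warehouse end-dates records):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table bv__order_header (order_code text, order_date text, load_date text);
insert into bv__order_header values
    ('1111', '2024-10-10', '2024-10-10'),
    ('1112', '2024-10-11', '2024-10-11'),
    ('1111', '2024-11-12', '2024-11-12');
create table rv_sat__order_items (order_code text, line_code text, load_date text);
insert into rv_sat__order_items values
    ('1111', '1', '2024-10-10'),
    ('1111', '1', '2024-11-12');
""")
rows = conn.execute("""
with header_versions as (
    select
        order_code,
        order_code || ' || ' || order_date as bk_order_code,
        load_date,
        -- a version is valid until the next load of the SAME order_code (exclusive)
        coalesce(lead(load_date) over (
            partition by order_code order by load_date
        ), '9999-12-31') as valid_to
    from bv__order_header
)
select
    i.order_code, i.line_code, i.load_date,
    h.bk_order_code || ' || ' || i.line_code as bk_order_line_code
from rv_sat__order_items i
join header_versions h
  on i.order_code = h.order_code
 and i.load_date >= h.load_date
 and i.load_date < h.valid_to
order by i.load_date
""").fetchall()
for r in rows:
    print(r)
```

The two item records for order “1111” now resolve to different composite line keys, one per re-use of the code.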
With the Business Vault in place, you can now create Information Marts that fully comply with the BI requirements.
By materializing the Information Mart as a view that pulls directly from the Business Vault, you can avoid duplicating data across multiple layers and reduce the need for computations at the Information Mart layer. This approach significantly improves the speed at which BI tools can access and analyze the data.
It’s important to note that, since a new business key was created by applying business logic, you will also need to reconstruct the Order Hub and establish links between the new business key and the Customer and Item Hubs within the Business Vault. This ensures consistent integration and traceability across the model.
Conclusion
Managing re-used business keys in a Data Vault 2.0 implementation can be challenging, but the Business Vault provides a robust solution to maintain historical integrity and ensure data uniqueness.
Do you need help tackling complex Data Vault 2.0 challenges? Our Data Vault 2.0 certified analytics engineers at Xebia are here to assist you in leveraging your data vault effectively. Contact us today to unlock the full potential of your data!