Populating Slowly Changing Dimensions with SQL Server 2008 MERGE statement
MERGE statement
MERGE is a new feature in SQL Server 2008 that provides an efficient way to perform multiple DML operations. Using MERGE statement INSERT, UPDATE or DELETE can be performed on a target table based on the result of a join with a source table.
The MERGE statement has three WHEN clause, each performs a specific DML operation on a given row in the result set
WHEN MATCHED allows you to UPDATE or DELETE the given row in the target table for every row that exists in both source and target table
WHEN NOT MATCHED [BY TARGET] allows you to INSERT a row into the target table for every row that exists in the source table but not in the target
WHEN NOT MATCHED BY SOURCE allows you to UPDATE or DELETE the given row in the target table for every row that exists in target table but not in source table
You can also specify a search condition with each of the WHEN clause to choose the DML operation to be performed on the selected row
OUTPUT clause of the MERGE statement includes a virtual column $action which is used to identify the DML action performed in each row of the result set
Slowly Changing Dimensions
The data in the Slowly Changing Dimensions (SCD) changes periodically. The SCD may be a customer dimension, whereas any new customers can be added or the contact information of the existing customers may be changed.
Earlier this type of dimensions can be handled using multiple DML statements. With MERGE statement in SQL Server 2008 populating SCD can be achieved using a single statement.
Let us take DimCustomer as a dimension table which tracks the customer contact information (phone) for each customer of a particular retail store. Every day thecustomers table in the transactional system is updated with changes to the existing customer or inserting a new customer record. At the end of each month the changes in the customer table are to be applied in the DimCustomer table. A row has to be inserted for each new customer. Existing customers whose contact information changed need to be updated with IsCurrent as 0 and a new record for the existing customer has to be inserted with current contact information.
CustomerKey
|
CustomerName
|
Phone
|
IsCurrent
|
5011 | John Smith | 5650 578 632 |
1
|
5012 | Patrick Brown | 7451 241 584 |
1
|
Table 1: DimCustomer
CustomerKey
|
CustomerName
|
Phone
|
5011 | John Smith | 8845 001 863 |
5013 | Jimmy | 7554 846 563 |
Table 2: Customer
The following Transact-SQL statement updates the Phone for the customer John Smith and adds a new record for Jimmy in the DimCustomer table.
INSERT INTO DimCustomer(CustomerKey, CustomerName, Phone, IsCurrent)
SELECT CustomerKey, CustomerName, Phone, 1
FROM
(
MERGE DimCustomer as trgCustomerDim
USING Customer AS CustSrc
ON (trgCustomerDim. CustomerKey = CustSrc. CustomerKey and trgCustomerDim.IsCurrent = 1)
WHEN MATCHED THEN
UPDATE SET trgCustomerDim.IsCurrent = 0
WHEN NOT MATCHED THEN
INSERT VALUES (CustSrc. CustomerKey, CustSrc. CustomerName, CustSrc. Phone, 1)
OUTPUT $action, CustSrc. CustomerKey, CustSrc. CustomerName, CustSrc. Phone
) AS Changes(action, CustomerKey, CustomerName, Phone)
WHERE action = ‘UPDATE’;
The output of MERGE statement is
$action
|
CustomerKey
|
CustomerName
|
Phone
|
UPDATE | 5011 | John Smith | 8845 001 863 |
INSERT | 5013 | Jimmy | 7554 846 563 |
The OUTPUT of the MERGE statement is filtered with the action ‘Update’ to insert the new record for the changes in existing customer details. This will make the current information as available and also maintains the old information of the customer.
The DimCustomer table after processing the MERGE statement would be like this
CustomerKey
|
CustomerName
|
Phone
|
IsCurrent
|
5011 | John Smith | 5650 578 632 |
0
|
5012 | Patrick Brown | 7451 241 584 |
1
|
5013 | Jimmy | 7554 846 563 |
1
|
5011 | John Smith | 8845 001 863 |
1
|