Tuesday, November 26, 2013

Slowly Changing Dimension and SSIS

Slowly Changing Dimension and SSIS

Concept: Slowly changing Dimension (SCD) is a concept in which data (columns) values changes over time due to change in business scenirio. Let's say we have a product with a price tag of $10 as of today date. Let's assume that the price of this product is increased by 10 percent tomorrow. So the new price will be $11. This is an example where data has changed and its called Slowly Changing Dimension.

Types of SCD: There are three type of SCD. Type 1, Type 2, and Type 3.

Type 1. In our example, we talked about price increase from $10 to $11. Let's see how we can make these changes in our database table to handle Type1 change.

Let's say we have a table called product.

ProductID ProductName ProductDescr Price DateAdded
1 iPhone Apple iPhone 1 10 01/01/2000

Now as the price of iPhone is increased from $10 to $11, all we have to do is update the price column. This is Type 1 SCD.

ProductID ProductName ProductDescr        Price         DateAdded
1 iPhone Apple iPhone 1       11 01/01/2000

To accomplish this, all you have to do is update the column and set with new price.

Let's take a look sql statement to do this.

Update product
Set Price = 11
Where ProductID = 1

Another way to do this:

Merger INTO TargetTable As Target
USING Sourcetable AS Source
ON Target.ProductID = Source.ProductID
When Matched THEN
Update
Set Price = Source.Price
When Not Matched THEN
INSERT
(ProductID, ProductName, ProductDescr, Price, DateAdd)
Values
(Source.ProductID, Source.ProductName, Source.ProductDescr, Source.DateAdd)

Type 2. Type 2 SCD is where we keep old record (row in this case) and add a new row. But before we do that, let's think for a moment, when we display, both rows will appear in our result. Inorder to accomplish type 2 SCD, we will make some changes to our table. We will add a new column called IsActive and RetiredDate. Also remember that when we build our product table, we kept ProductID as primary key. So when we add another row with the same productID, it's a primary key voliation. So we have to change our index by including ProductID and IsActive column as a clustered index.

With this we can add another row of data.

ProductID ProductName ProductDescr Price       DateAdded RetiredDate IsActive
1 iPhone Apple iPhone 1 10 01/01/2000 01/01/2013 0
1 iPhone Apple iPhone 1 11 01/02/2013 1

In this way, we can see how and when values has changed over time.

Type 3. In Type 3 SCD, instead of add a new row, we add a new column OLD PRICE and set this column to price and update price column to new value

ProductID ProductName ProductDescr Price OLD PRICE DateAdded
1 iPhone Apple iPhone 1 11 10 01/01/2000


However, as you can see that in Type 3 SCD, if we increase our price to $12 and update our data, it will look like this.

ProductID ProductName ProductDescr Price OLD PRICE DateAdded
1 iPhone Apple iPhone 1 12 11 01/01/2000

Now the old price is $11 which is actually last price and so the $10 price data is lost forever.

In real world, most company implement Type2 so that it can be used to reporting and analysis purpose.


This is work in Progress, I will be adding more information to this blog on SCD.

No comments:

Post a Comment