Slowly Changing Dimension and SSIS
Concept: Slowly changing Dimension (SCD) is a concept in which data (columns) values changes over time due to change in business scenirio. Let's say we have a product with a price tag of $10 as of today date. Let's assume that the price of this product is increased by 10 percent tomorrow. So the new price will be $11. This is an example where data has changed and its called Slowly Changing Dimension.
Types of SCD: There are three type of SCD. Type 1, Type 2, and Type 3.
Type 1. In our example, we talked about price increase from $10 to $11. Let's see how we can make these changes in our database table to handle Type1 change.
Let's say we have a table called product.
ProductID ProductName ProductDescr Price DateAdded
1 iPhone Apple iPhone 1 10 01/01/2000
Now as the price of iPhone is increased from $10 to $11, all we have to do is update the price column. This is Type 1 SCD.
ProductID ProductName ProductDescr Price DateAdded
1 iPhone Apple iPhone 1 11 01/01/2000
To accomplish this, all you have to do is update the column and set with new price.
Let's take a look sql statement to do this.
Update product
Set Price = 11
Where ProductID = 1
Another way to do this:
Merger INTO TargetTable As Target
USING Sourcetable AS Source
ON Target.ProductID = Source.ProductID
When Matched THEN
Update
Set Price = Source.Price
When Not Matched THEN
INSERT
(ProductID, ProductName, ProductDescr, Price, DateAdd)
Values
(Source.ProductID, Source.ProductName, Source.ProductDescr, Source.DateAdd)
Type 2. Type 2 SCD is where we keep old record (row in this case) and add a new row. But before we do that, let's think for a moment, when we display, both rows will appear in our result. Inorder to accomplish type 2 SCD, we will make some changes to our table. We will add a new column called IsActive and RetiredDate. Also remember that when we build our product table, we kept ProductID as primary key. So when we add another row with the same productID, it's a primary key voliation. So we have to change our index by including ProductID and IsActive column as a clustered index.
With this we can add another row of data.
ProductID ProductName ProductDescr Price DateAdded RetiredDate IsActive
1 iPhone Apple iPhone 1 10 01/01/2000 01/01/2013 0
1 iPhone Apple iPhone 1 11 01/02/2013 1
In this way, we can see how and when values has changed over time.
Type 3. In Type 3 SCD, instead of add a new row, we add a new column OLD PRICE and set this column to price and update price column to new value
ProductID ProductName ProductDescr Price OLD PRICE DateAdded
1 iPhone Apple iPhone 1 11 10 01/01/2000
However, as you can see that in Type 3 SCD, if we increase our price to $12 and update our data, it will look like this.
ProductID ProductName ProductDescr Price OLD PRICE DateAdded
1 iPhone Apple iPhone 1 12 11 01/01/2000
Now the old price is $11 which is actually last price and so the $10 price data is lost forever.
In real world, most company implement Type2 so that it can be used to reporting and analysis purpose.
This is work in Progress, I will be adding more information to this blog on SCD.
Tuesday, November 26, 2013
Slowly Changing Dimension and SSIS
Labels:
SCD,
Slowly Changing Dimension and SSIS,
sql server,
SSIS,
Type 1,
Type 2,
Type 3
Thursday, November 21, 2013
Shrinking Log file of Database in Production Environment
Let's take a look at how we shrink our log files in sql server environment.
Any database on sql server has two type of files.
1. .mdf files
2. ldf files
MDF files are transitional data files. There is one primary mdf files and it can have many secondary files. Similarly ldf are log transitional files with one primary and many secondary).
Step 1. Lets check all the log space for all the database on a server. To do this run
DBCC SQLPERF (logspace);
What this return is DatabaseName, LogSize ( this is combined logsize if you have multiple log file for a particular database), LogSpace used, and status.
Step 2. Find the database whose log file you want to shrink. let's say I want to shrink tempdb datbase log file (not the best idea but for explanation purpose, its good enough).
So we run
SP_HELPDB DatabaseName;
SP_HELPDB tempdb
So this will return you all the mdf, ldf and ndf file information, where it is located, whats the size (in KB), etc. You want to make sure to check the size of log files in GB or MB. To do so run Select "size of the file/1024" to give in MB and "size of the file/1024/1024" in GB
--Convert KB to GB to check size of log files
SELECT 111111111/1024/1024
Step 3.
To your database from Object Explorer, select the database where you want to shrink log file, Right click Task, then shrink and then files.
Select file type as log and select "Released unused space".
You can click Ok or generate a script.
USE [tempdb]
GO
DBCC SHRINKFILE (N'Templog' , 0, TRUNCATEONLY)
GO
This is how you shrink log files
Any database on sql server has two type of files.
1. .mdf files
2. ldf files
MDF files are transitional data files. There is one primary mdf files and it can have many secondary files. Similarly ldf are log transitional files with one primary and many secondary).
Step 1. Lets check all the log space for all the database on a server. To do this run
DBCC SQLPERF (logspace);
What this return is DatabaseName, LogSize ( this is combined logsize if you have multiple log file for a particular database), LogSpace used, and status.
Step 2. Find the database whose log file you want to shrink. let's say I want to shrink tempdb datbase log file (not the best idea but for explanation purpose, its good enough).
So we run
SP_HELPDB DatabaseName;
SP_HELPDB tempdb
So this will return you all the mdf, ldf and ndf file information, where it is located, whats the size (in KB), etc. You want to make sure to check the size of log files in GB or MB. To do so run Select "size of the file/1024" to give in MB and "size of the file/1024/1024" in GB
--Convert KB to GB to check size of log files
SELECT 111111111/1024/1024
Step 3.
To your database from Object Explorer, select the database where you want to shrink log file, Right click Task, then shrink and then files.
Select file type as log and select "Released unused space".
You can click Ok or generate a script.
USE [tempdb]
GO
DBCC SHRINKFILE (N'Templog' , 0, TRUNCATEONLY)
GO
This is how you shrink log files
Wednesday, November 20, 2013
Passing WildCard inside a parameter in Stored Procedure
While writing a stored procedure, sometime we are asked to return all the row if we do not provide a parameter.
For example.
Let's say we have a table called [State] which hold values like this.
ID StateFullName StateCode
1 Alaska AK
2. Alabama AL
.
.
.
.
.
and like
So we want to write a stored procedure where sometime we are providing code to get state full name and sometime not.
In that case our SP will be something like this.
Create Proc as usp_getStateCode
(@statecode varchar(10) = NULL)
AS
Select
StateFullName, StateCode
From dbo.[State]
Where stateCode = ISNULL(@statecode,StateCode)
Now let's say we have a situation where we want to find a name of database and we do not know fullname of database.
Let's say we have a table which list all my database name and instance name.
DatabaseInstance table with following structure
(instancename varchar(128)
databasename varchar(128)
Now we want to write a stored procedure which should give me all the values if I am not passing any value or if I am passing a values which is not complete, it should compare with wildcard match and return me those matching values.
Create Proc as usp_getDatebaseName
(@DatabaseName varchar(128) = NULL
AS
Select
InstanceName, DatabaseName
From DatabaseInstance
Where DatabaseName Like ISNULL('%' + @databaseName + '%',DatabaseName)
This stored procedure has 70% cost as it is doing full scan on table for database name as we are passing wildcard
This stored procedure will return all the values if none is provided and it will return those matching values if provided.
Another way to write this is:
Create PROCEDURE [dbo].[usp_getDatebaseName]
@databaseName VARCHAR(128) = NULL
AS
IF @databasename <> ''
SET @databaseName = '%' + @databaseName + '%'
IF @databaseName IS NULL
SET @databasename = '%'
Select
InstanceName, DatabaseName
From DatabaseInstance
Where DatabaseName Like ISNULL(@databaseName,DatabaseName)
This stored procedure take 30% scan as it is doing seek scan.
Tuesday, November 5, 2013
SQL Server: How to modify existing table structure
Suppose you have a table in a production database and you need to modify table structures.
Let's say we have a table with following columns and properties
TableA
Columns:
ID int
FName varchar(50)
LName varchar(50)
version nvarchar(10)
file ntext
City char(10)
say we have primary key on ID
Now someone asked to change version nvarchar data type to varchar datatype and file ntext to text datatype. Also remember that we have data in this table.
So how would we modify the structure of table without change any values.
Here's How:
--Step 1. Drop any primary key if there is one.
--Step 2. Drop any index if it exists (may not have to do if step cover it).
--Step 3. Alter table--change column, insert or remove.
--step 4. Recreate primary key
--step 5. Recreate index.
Let's say we have a table with following columns and properties
TableA
Columns:
ID int
FName varchar(50)
LName varchar(50)
version nvarchar(10)
file ntext
City char(10)
say we have primary key on ID
Now someone asked to change version nvarchar data type to varchar datatype and file ntext to text datatype. Also remember that we have data in this table.
So how would we modify the structure of table without change any values.
Here's How:
--Step 1. Drop any primary key if there is one.
--Step 2. Drop any index if it exists (may not have to do if step cover it).
--Step 3. Alter table--change column, insert or remove.
--step 4. Recreate primary key
--step 5. Recreate index.
--Step 1. Drop any primary key if there is one.
ALTER TABLE [dbo].[TableA]
DROP CONSTRAINT constraintname_PK
GO
;
--Step 3. Alter table--change column, insert or remove.
ALTER TABLE [dbo].[TableA]
ALTER COLUMN [version ] [varchar](10) ;
when you try to convert ntext to text using above statement, it will give an error. To do that first convert to varchar(max) and then to text data type.
ALTER TABLE TableA
ALTER COLUMN file VARCHAR(MAX)
ALTER TABLE TableA
ALTER COLUMN file text
ALTER TABLE [dbo].[TableA] ADD CONSTRAINT [constraintname_PK] PRIMARY KEY NONCLUSTERED
(
[ID ] ASC,
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Subscribe to:
Posts (Atom)