Friday, February 28, 2014

SSIS: Script task to check for File Exsits or not and send email notification

Sometime within SSIS package, we have to check for a particular file. If the file exists, then we have to do whole
lot of Data Flow and transformations. So if this is the case, we should in first step check for file existence.
If file does not exists, we want to send a email notification to business user and let them know that the file is missing.

To do this, let's get started.

Step 1
======
First we need to define following variables.
varFileExists Boolean datatype default value False
varFullSourceFilePath string datatype value will be location of the file with fully qualified. So if your file
is some server (most cases, it should be something like this \\myserver.com\Sourcefolder\myfile.txt). In our sample
case we will point to C drive. C:\tmp\a.xlsx








Step 2
======

Let's drag and drop a script task to control flow and open it. Add varFullSourcePathFileName in ReadOnlyVariable and varFileExists in readwritevariables as shown in the picture below.
















Now open script task in visual basic. You can also do this in C#. But for now let's do in Visual Basic 2008.

We need to add Imports System.IO and then write following code in Public Sub Main () routine.

Dts.Variables("varFileExists").Value = File.Exists(Dts.Variables("varFullSourcePathFileName").Value)

As shown in picture below.

















Step 3
=====

Let's configure for send mail.

Add a data flow or sequence container to go to next step if File exists. If not send a mail. Here' we are interested in sending mail if file is missing.

So add a send mail task. connect script task to it. Change Evaluation Operation to Expression and Constraint. Value should be Success (be careful not to use Failure here) and in Expression write
@varFileExists==False.

This mean that file is missing and it will send mail notification.



And for your dataflow or sequence container, make @varFileExists==True and proceed.

Wednesday, February 26, 2014

SQL: Get Current Financial Year dynamically in Select Statement.

Sometime, we are asked to write a query where we have to get a result set based on current financial year. Let's say my financial year begin 1st of November ever year and end 31st October.

And we have to do a select between these two date ranges.

Here's how we do it.

Select * from mytable
where myfinancialdate >= CASE WHEN month(getdate()) > 11 
THEN convert(datetime, cast(year(getdate()) as varchar) + '-11-1')
ELSE convert(datetime, cast(year(getdate()) - 1 as varchar) + '-11-1') end 
AND (myfinancialdate <= CASE WHEN month(getdate()) > 11 
THEN convert(datetime, cast(year(getdate()) + 1 as varchar) + '-10-31')
ELSE convert(datetime, cast(year(getdate()) as varchar) + '-10-31')  end);


This will result set only following between current financial year. For example in 2014 year, the result will bring value from 11-1-2013 to 10-31-2014.

Let say that we want to retrieve all the data from last fiscal year and current fiscal year. To do this, we can use this approach:

Let's declare some variable.

--Fiscal year variables
, @Now datetime = getdate()
, @CurrentFiscalYear_BeginDate date
, @PastFiscalYear_BeginDate date
, @CurrentFiscalYear_EndDate date

--Set Fiscal year values
select @CurrentFiscalYear_BeginDate = '11/1/' + cast(case when month(@Now) > 11 then year(@Now) else year(@Now)-1 end as varchar(4))

select
 @PastFiscalYear_BeginDate = dateadd(year, -1, @CurrentFiscalYear_BeginDate) --decrement year of current FY begin date to get past FY begin date
, @CurrentFiscalYear_EndDate = dateadd(day, -1, dateadd(year, 1, @CurrentFiscalYear_BeginDate)) --increment year of current FY begin date to get next FY begin date and then subtract one day to get end date of current FY
       

Now we can use these variable to get data.

Select * from mytable
Where myfinancialdate 
between @PastFiscalYear_BeginDate AND @CurrentFiscalYear_EndDate


This will retrieve all the row between last fiscal year and current fiscal year.

Thursday, February 20, 2014

How to find indexes and statics on a particular table in SQL Server

How to find indexes and statics on a particular table in SQL Server.

Generally we create indexes on any table in a database to increase data retrieval process. So when we create an index (Clustered Indexes or Non-Clustered indexes) we think that this is all we need to do.
However as data are inserted or deleted as time flies by, these indexes will be out of sync based on new set of data.

As a developer or dba, we should keep in mind that we should periodically check all the indexes on our database  and see when was last time it was updated. This can help a lot in performance improvement in production environment.

So let's get into some technical aspect of indexes.

How we create indexes?-- You can find many resources online
Clustered vs. Non-Clustered indexes.-- You can find many resources online

So after creating al these indexes, how do we go and check that what all indexes exists on a particular table and when was it last updated?

To begin with, we will have to look at the table called sys.indexes which hold all indexes information. Table name in sys.indexes is hidden in Object_ID values which we will have to convert to read.

So begin with wild select on this table.

Select * From Sys.Indexes;

Some important columns are following:
object_id = your table ID
name = Name of the index
index_id = IndexID
is_unique
is_primary_key = does it have a primary key as part of index
is_unique_constraint
fill_factor = default value is 80 percent
has_filter = does your index have filter or not


Now we want to know what all indexes exist on my table. Let's say I have a table 'dbo.Mytable'


To find this out let's run this query.


SELECT name AS index_name
 , STATS_DATE(OBJECT_ID, index_id)  AS StatsUpdated 
 FROM sys.indexes 
 WHERE OBJECT_ID = OBJECT_ID('dbo.Mytable');

 This query will return two columns Index_Name and statsUpdated (date when it was updated last time). Based on this, you can find out when was your indexes last built and if you need to rebuild them or not.

Update:

This sql will provide you the name of index on your table, type of index and percent of fragmentation.

SELECT  a.index_id, name, a.index_type_desc, a.page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'Schema.TableName'),
     NULL, NULL, NULL) AS a

    JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;

Tuesday, February 11, 2014

SSIS: Logging total number of rows inserted in multiple tables.

Suppose you are inserting multiple rows in different staging tables in a single package. And you have to log total number of rows inserted into a logging table for audit purpose. How do we do it.

Let's see how we can accomplish this inside a ssis package.

 Say we are loading data in 3 tables.

 Table A,
 Table B,
 and Table C.

 Let's create a variable "varSrcRowCountAL" int data type.

 Add a SQL Execute task and set Result Set to "Single Row". In SQL Statement Add following sql statement

 Select ( (Select count(*) From TableA) + (Select count(*) From TableB) +  (Select count(*) From [TableC));

 In Result Set tab of SQL Execute task, Add variable like this
 Result Name = 0
 Variable Name = User::varSrcRowCountAL

  This will save total row count in our variable.
 
  Now add another SQL Execute Task to truncate TableA, TableB and TableC
 
  Add a sequence container, add three data flow task to load your data to 3 three table.
 
  Outside your sequence container, add another SQL Execute Task.
 
  This time, define a variable called "varDestRowCountAL" int datatype. In SQL Statement Add following sql statement.
  Select ( (Select count(*) From TableA) + (Select count(*) From TableB) +  (Select count(*) From [TableC));
 
  This value we will save in our destination variable varDestRowCountAL. Do the same as you did to count to your source count.

Now we can use this variable to update our logging table.

Monday, February 10, 2014

Derived Columns with CTEs

Derived Columns with CTEs.


Let's say we have a simple query selecting few columns from some table. Now we are asked do some derived columns from this queries. However, some of these derived columns
are based another derived columns.

One way to do this is to create a view and get a first set of derived columns and than create another view on the first view to derive another sets of columns.

However there is another approach to do this. Take the initial query results and put into a common table expression, use another common table expression to get a first derived columns and
use it derive next set of columns. Theoretically this can be done n-times but to illustrate this, let's try to limit this to 3 CTEs.



This is simple table I have created to demonstrate derived columns from CTEs. Not a perfect but just to illustrate the concept.



Create table Student_Grade
(ID int IDENTITY,
FirstName varchar(50),
MiddleName varchar(10),
LastName varchar(50), 
Maths float,
Biology float,
Chemistry float,
Physics float,
English float,
ForeignLanguage float,
Physical_Education float
)

--Let's enter some data in our sample tables.

Insert into Student_Grade
VALUES
('Abby', 'M', 'Smith',40,67,38,25,67,71,34),
('Bbby', 'M', 'Smith',50,77,48,35,57,72,44),
('Cbby', 'M', 'Smith',68,87,58,45,67,73,54),
('Dbby', 'M', 'Smith',78,97,68,55,27,74,64),
('Ebby', 'M', 'Smith',88,57,78,65,17,75,74)

--Let's see if we have our data in the table or not.

Select * from Student_Grade;



-- Let's say we are asked to find what student made in total_science_score, combined average in Science subjects (Biology, physics, and Chemistry) and English and ForeignLanguage.

With FirstSelect AS
(
SELECT  [ID]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
      ,[Maths]
      ,[Biology]
      ,[Chemistry]
      ,[Physics]
      ,[English]
      ,[ForeignLanguage]
      ,[Physical_Education]


  FROM [dbo].[Student_Grade]
  ),

  --Here for simple purpose I am deriving two columns -Total_Science_Score and Total_language_Score
  SecondSelect AS
  ( Select [ID]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
 ,[Maths]
      ,[Biology]
      ,[Chemistry]
      ,[Physics]
      ,[English]
      ,[ForeignLanguage]
      ,[Physical_Education]
 ,[Biology]+[Chemistry]+ [Physics] As Total_Science_Score
  ,[English]+[ForeignLanguage] As Total_language_Score

 From FirstSelect
 ),

--Here I am deriving another columns which was derived in previous statement

ThirdSelect AS
(
 Select
 [ID]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
 ,[Maths]
      ,[Biology]
      ,[Chemistry]
      ,[Physics]
      ,[English]
      ,[ForeignLanguage]
      ,[Physical_Education]
 ,Total_Science_Score
 ,Total_language_Score
 , Total_Science_Score + Total_language_Score AS Total_Science_Language_Score
  from SecondSelect
 )

 Select * from ThirdSelect