Thursday, 18 September 2014

Replication, Data Mirroring, Change Data Capture or AlwaysOn Availability Groups?

There are scenarios where we need a secondary database, for example for reporting, data backup, or disaster recovery.
People often get confused among the different strategies, and it is hard to say which one fits which purpose without proper information about all of them. I recently faced a scenario where we needed a separate database for reporting.
I went through multiple resources on the internet and decided to write this blog to help people choose the appropriate strategy for their requirement.
Let's talk about them one by one. I have also given the reference links I used, which can give you further details if you need them.

There are three different replication strategies:

1. Snapshot Replication:

Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. When synchronization occurs, the entire snapshot is generated and sent to Subscribers.

This could be a good approach when:

·         Data does not change frequently.

·         It is acceptable to have stale data (out of date, not real time) for a period of time.

·         You are replicating a small amount of data.
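As a rough sketch, a snapshot publication can be set up with the replication stored procedures; SalesDB, SnapshotPub, and Customers below are placeholder names, and the Distributor/Subscriber configuration is omitted:

```sql
-- Allow the database to publish (placeholder names throughout)
EXEC sp_replicationdboption @dbname = N'SalesDB', @optname = N'publish', @value = N'true';

-- Create a snapshot publication: the whole snapshot is regenerated each time
EXEC sp_addpublication @publication = N'SnapshotPub', @repl_freq = N'snapshot', @status = N'active';

-- Publish one table as an article
EXEC sp_addarticle @publication = N'SnapshotPub', @article = N'Customers', @source_object = N'Customers';
```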



2. Merge Replication:

Merge replication is typically used in server-to-client environments and most suitable for the following situations:

·         Multiple Subscribers might update the data and propagate those changes to the Publisher and other Subscribers.

·         Subscribers might change data while offline and later synchronize with the Publisher and other Subscribers.
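A comparable sketch for merge replication uses the merge-specific procedures (again, the database and object names are placeholders):

```sql
-- Allow the database to merge-publish (placeholder names)
EXEC sp_replicationdboption @dbname = N'SalesDB', @optname = N'merge publish', @value = N'true';

-- Create the merge publication
EXEC sp_addmergepublication @publication = N'MergePub';

-- Add a table whose changes will be merged from all Subscribers
EXEC sp_addmergearticle @publication = N'MergePub', @article = N'Customers', @source_object = N'Customers';
```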


3. Transactional Replication:

Transactional replication is typically used in server-to-server environments and most suitable for the following situations:

·         Incremental changes need to be propagated to Subscribers.

·         Near real-time data with low latency is needed.

·         The Publisher has a very high volume of insert, update, and delete activity.

·         It can also be used with non-SQL Server databases.

Note: Transactional replication is not possible if a table does not have a primary key.
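A quick way to find tables that would block transactional replication for this reason is to check the TableHasPrimaryKey property (a sketch; run it in the database you plan to publish):

```sql
-- Tables without a primary key cannot be published for transactional replication
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0;
```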



Data Mirroring:

            Database mirroring is a solution for increasing the availability of a SQL Server database. Mirroring is implemented on a per-database basis and works only with databases that use the full recovery model.

Database mirroring is the best-suited option in the following situation:

·         You need a copy of the primary database to ensure data availability in case the primary database fails.

The secondary (mirror) database cannot be used for reporting. Note that database mirroring is a deprecated feature and will be removed in a future version of SQL Server.
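For reference, once mirroring endpoints exist and the full backup/log restore has been applied on the mirror, the session is established by pointing each partner at the other (server names and the port below are placeholders):

```sql
-- Mirroring requires the full recovery model
ALTER DATABASE SalesDB SET RECOVERY FULL;

-- Run on the MIRROR server first, pointing at the principal
ALTER DATABASE SalesDB SET PARTNER = 'TCP://principal.contoso.com:5022';

-- Then run on the PRINCIPAL server, pointing at the mirror
ALTER DATABASE SalesDB SET PARTNER = 'TCP://mirror.contoso.com:5022';
```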


Change Data Capture:

Change data capture tracks inserts, updates, and deletes in the transaction log (similar to replication) and inserts the changes into "change tables" associated with the tracked tables. The change tables are automatically created in the same database.

It is recommended only for databases with very minimal change activity, and otherwise it is not recommended, for the following reasons:

·         For every change in the production database, CDC performs an additional insert into the corresponding change table in the same database, which can cause a noticeable performance hit on the production DB.

·         You need to write your own polling mechanism to pull changes from the change tables.

·         Data in the change tables needs to be cleaned up periodically, and the cleanup itself is logged, which can cause a further performance hit on the production DB.

Note: Due to the extra logging caused by the internal change tables (and their cleanup), the need to write custom code, and the limitations around schema changes, most people say that transactional replication beats change data capture.

Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.
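As a minimal sketch (SalesDB and dbo.Customers are placeholder names), enabling CDC and polling the change data looks like this:

```sql
USE SalesDB;
EXEC sys.sp_cdc_enable_db;          -- enable CDC for the database

EXEC sys.sp_cdc_enable_table        -- enable CDC for one table
    @source_schema = N'dbo',
    @source_name   = N'Customers',
    @role_name     = NULL;          -- no gating role

-- Your own polling code then reads the changes by LSN range
-- (dbo_Customers is the default capture-instance name)
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Customers');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');
```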


AlwaysOn Availability Groups:

This feature was introduced in SQL Server 2012 and allows reads to be scaled out across multiple SQL Server instances. AlwaysOn Availability Groups maximize the availability of a set of user databases for an enterprise and work similarly to database mirroring.

Disadvantages of using AlwaysOn compared to transactional replication:

·         AlwaysOn works like database mirroring and mirrors everything in the database, whereas transactional replication allows you to exclude triggers, constraints, and other definitions that are not required for reporting purposes.

·         AlwaysOn is supported by SQL Server 2012 and later versions only.

·         With transactional replication, we can add extra indexes, tables, stored procedures, views, etc. on the replicated database alone, as required; with AlwaysOn, all of these must be added to the primary database itself.

·         With transactional replication, the replicated database can be maintained independently, for example with its own security configuration.
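For completeness, here is a minimal sketch of an availability group with a readable secondary for reporting (node names and endpoint URLs are placeholders; the WSFC cluster and HADR enablement steps are omitted):

```sql
CREATE AVAILABILITY GROUP ReportingAG
FOR DATABASE SalesDB
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)); -- readable secondary for reporting
```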


Achieve High Availability for SQL Server

Thank you for reading. Don't forget to like it, and feel free to give your feedback/comments.

Wednesday, 16 July 2014

NEWID() vs ROW_NUMBER() to fetch random records in SQL Server

There are scenarios where the requirement is to fetch random records from a database table. This can be achieved using the NEWID() or ROW_NUMBER() functions in SQL Server.

Let's see what these functions do. NEWID() creates a GUID for each row and then sorts on it, whereas ROW_NUMBER() creates an incrementing integer value (e.g. 1, 2, 3, 4...) for each row by sorting on a specified column.

Since NEWID() generates a GUID for every row, it can be costly for two reasons:
              a) extra disk I/O;
              b) poor sorting performance on GUID values.

ROW_NUMBER(), on the other hand, could be a good choice, but in some scenarios it may not be. To check the performance I did some research, and the results below were a bit surprising to me.

I did performance testing for both approaches on a table with 17 columns and 4,100,911 (4 million+) rows; below are the results:

Using NEWID()
1.) Fetch 10 random records.
     Query: select top 10 * from [dbo].[MyTable] order by NEWID()
     Execution Time: 01:23 minutes
2.) Fetch 10 random records but only specified column
     Query: select top 10 col1 from [dbo].[MyTable] order by NEWID()
     Execution Time: 03:05 minutes.

Using ROW_NUMBER()
1.) Fetch 10 random records.
     Query: WITH tblrows AS (SELECT *, ROW_NUMBER() OVER (ORDER BY col1) AS row FROM [dbo].[MyTable])
SELECT * FROM tblrows WHERE row IN (5, 12, 345, 3456, 37890, 56778, 222222, 123456, 222222, 333333)

     Execution Time: 16:35 minutes
2.) Fetch 10 random records but only specified column
     Query: WITH tblrows AS (SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS row FROM [dbo].[MyTable])
SELECT col1 FROM tblrows WHERE row IN (5, 12, 345, 3456, 37890, 56778, 222222, 123456, 222222, 333333)

     Execution Time: 00:17 minutes.

After looking at the results above, it might be tricky to decide between NEWID() and ROW_NUMBER().

My purpose with this blog was to make you aware of the behaviour of these two functions.

If you are testing query execution time, use the scripts below to clean up SQL Server's buffers and caches.
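The standard commands for this are the following (do not run them on a production server, as they flush the caches for the whole instance):

```sql
CHECKPOINT;             -- write dirty pages to disk so the next command clears everything
DBCC DROPCLEANBUFFERS;  -- clear the buffer pool (data cache)
DBCC FREEPROCCACHE;     -- clear the plan cache
```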

Feel free to give your feedback. Thank you for reading.

Wednesday, 9 July 2014

Create an SSIS package programmatically using the SSIS API in C#

Hi friends,

Today I am going to show you how you can create an SSIS package programmatically in C#.
Before that, let's first discuss the problem statement, i.e. the scenario where we might need this kind of solution.

Problem Statement: I need to extract data from multiple tables in a database and save the extracted data to a separate database for read-only use (for example, reporting). We need to do this because we want to avoid load on the OLTP system. The requirement is to automate the process, so that at any time I can create a new table (as new requirements come in) and load/update data in that table without writing code each time.

Solution: First, I created a table programmatically in the database (with the schema below); the next step is to load data into this table.

CREATE TABLE [dbo].[TestTableDest](
    [ProductID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [varchar](50) NOT NULL,
    [ProductNumber] [varchar](50) NOT NULL,
    PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

Now let's create the SSIS package that loads data into this table programmatically using C#. The requirement is to insert a row if it is new, or update the existing row if it already exists, based on a unique key value.
To do this we need the following assembly references in our project.

1. Microsoft.SqlServer.ADONETSrc
         Location: C:\Program Files (x86)\Microsoft SQL Server\100\DTS\PipelineComponents\Microsoft.SqlServer.ADONETSrc.dll
2. Microsoft.SqlServer.DTSPipelineWrap
         Location: C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.DTSPipelineWrap.dll
3. Microsoft.SqlServer.DTSRuntimeWrap
         Location: C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SQLServer.DTSRuntimeWrap.dll
4. Microsoft.SqlServer.ManagedDTS
         Location: C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SQLServer.ManagedDTS.dll
5. Microsoft.SqlServer.PipelineHost
         Location: C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.PipelineHost.dll

And below is the code that performs the task as described above.

using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;

namespace IDS.DynamicSSIS
{
    class Program3
    {
        static void Main(string[] args)
        {
            Application app = new Application();
            Package package = new Package();

            #region Add Data Flow Task
            Executable dataFlowTask = package.Executables.Add("STOCK:PipelineTask");

            // Set the name (otherwise it will be a random GUID value)
            TaskHost taskHost = dataFlowTask as TaskHost;
            taskHost.Name = "Data Flow Task";

            // We need a reference to the InnerObject to add items to the data flow
            MainPipe pipeline = taskHost.InnerObject as MainPipe;
            #endregion Add Data Flow Task

            #region Add connection manager
            ConnectionManager connection = package.Connections.Add("OLEDB");
            connection.Name = "localhost";
            connection.ConnectionString = "Data Source=localhost;Initial Catalog=TestDB;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;";
            #endregion Add connection manager

            #region Add OLE DB Source Data Flow
            // Add OLEDB Source
            IDTSComponentMetaData100 srcComponent = pipeline.ComponentMetaDataCollection.New();
            srcComponent.ComponentClassID = "DTSAdapter.OleDbSource";
            srcComponent.ValidateExternalMetadata = true;
            IDTSDesigntimeComponent100 srcDesignTimeComponent = srcComponent.Instantiate();
            srcComponent.Name = "OleDb Source";

            // Configure it to read from the given table
            srcDesignTimeComponent.SetComponentProperty("AccessMode", 2); // 2 - SQL Command
            srcDesignTimeComponent.SetComponentProperty("SqlCommand", "Select * from dbo.TestTableSource");

            // Set the connection manager
            srcComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
            srcComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;

            // Retrieve the column metadata
            IDTSOutputColumnCollection100 sourceColumns = srcComponent.OutputCollection[0].OutputColumnCollection;
            #endregion Add OLE DB Source Data Flow

            #region lookup transform
            // Add transform
            IDTSComponentMetaData100 lookupComponent = pipeline.ComponentMetaDataCollection.New();
            lookupComponent.ComponentClassID = "DTSTransform.Lookup";
            lookupComponent.Name = "Lookup";

            CManagedComponentWrapper lookupWrapper = lookupComponent.Instantiate();

            // Connect the source and the transform
            IDTSPath100 lookUpPath = pipeline.PathCollection.New();
            lookUpPath.AttachPathAndPropagateNotifications(srcComponent.OutputCollection[0], lookupComponent.InputCollection[0]);

            // Set the connection manager
            lookupComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
            lookupComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;

            // Cache Type - Full = 0, Partial = 1, None = 2
            lookupWrapper.SetComponentProperty("CacheType", 0);
            lookupWrapper.SetComponentProperty("NoMatchBehavior", 1); // 1 = Redirect rows to No Match output
            lookupWrapper.SetComponentProperty("SqlCommand", "select Name, ProductNumber from [dbo].[TestTableDest]");

            // Mark the columns we are joining on
            IDTSInput100 lookupInput = lookupComponent.InputCollection[0];
            IDTSInputColumnCollection100 lookupInputColumns = lookupInput.InputColumnCollection;
            IDTSVirtualInput100 lookupVirtualInput = lookupInput.GetVirtualInput();
            IDTSVirtualInputColumnCollection100 lookupVirtualInputColumns = lookupVirtualInput.VirtualInputColumnCollection;

            // Note: join columns should be marked as READONLY
            var joinColumns = new string[] { "Name" };
            foreach (string columnName in joinColumns)
            {
                IDTSVirtualInputColumn100 virtualColumn = lookupVirtualInputColumns[columnName];
                IDTSInputColumn100 inputColumn = lookupWrapper.SetUsageType(lookupInput.ID, lookupVirtualInput, virtualColumn.LineageID, DTSUsageType.UT_READONLY);
                lookupWrapper.SetInputColumnProperty(lookupInput.ID, inputColumn.ID, "JoinToReferenceColumn", columnName);
            }

            // First output is the Match output
            IDTSOutput100 lookupMatchOutput = lookupComponent.OutputCollection[0];

            // Second output is the No Match output
            IDTSOutput100 lookupNoMatchOutput = lookupComponent.OutputCollection[1];
            #endregion lookup transform

            #region Add OLE DB Destination Data Flow for No Match Output
            // Add OLEDB Destination
            IDTSComponentMetaData100 destNoMatchComponent = pipeline.ComponentMetaDataCollection.New();
            destNoMatchComponent.ComponentClassID = "DTSAdapter.OleDbDestination";
            destNoMatchComponent.ValidateExternalMetadata = true;

            IDTSDesigntimeComponent100 destNoMatchDesignTimeComponent = destNoMatchComponent.Instantiate();
            destNoMatchComponent.Name = "OleDb Destination";

            // Configure it to write to the given table
            destNoMatchDesignTimeComponent.SetComponentProperty("AccessMode", 3); // 3 - OpenRowset
            destNoMatchDesignTimeComponent.SetComponentProperty("OpenRowset", "[dbo].[TestTableDest]");

            // Set the connection manager
            destNoMatchComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
            destNoMatchComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;
            #endregion Add OLE DB Destination Data Flow for No Match Output

            #region Add OLE DB Command Data Flow for Matching Output
            // Add OLEDB Command transform
            IDTSComponentMetaData100 destMatchComponent = pipeline.ComponentMetaDataCollection.New();
            destMatchComponent.ComponentClassID = "DTSTransform.OLEDBCommand";
            destMatchComponent.ValidateExternalMetadata = true;

            IDTSDesigntimeComponent100 destMatchDesignTimeComponent = destMatchComponent.Instantiate();
            destMatchComponent.Name = "OleDb Command";

            // Configure the update statement to run for each matched row
            destMatchDesignTimeComponent.SetComponentProperty("CommandTimeout", 0);
            destMatchDesignTimeComponent.SetComponentProperty("SqlCommand", "UPDATE [dbo].[TestTableDest] SET [ProductNumber] = ? WHERE [Name] = ?");

            // Set the connection manager
            destMatchComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
            destMatchComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;
            #endregion Add OLE DB Command Data Flow for Matching Output

            #region Connect source and destination for No Match Outputs
            IDTSPath100 noMatchPath = pipeline.PathCollection.New();
            noMatchPath.AttachPathAndPropagateNotifications(lookupNoMatchOutput, destNoMatchComponent.InputCollection[0]);

            // Configure the destination
            IDTSInput100 destNoMatchInput = destNoMatchComponent.InputCollection[0];
            IDTSVirtualInput100 destNoMatchVirInput = destNoMatchInput.GetVirtualInput();
            IDTSInputColumnCollection100 destNoMatchInputCols = destNoMatchInput.InputColumnCollection;
            IDTSExternalMetadataColumnCollection100 destNoMatchExtCols = destNoMatchInput.ExternalMetadataColumnCollection;

            // The OLEDB destination requires you to hook up the external columns
            foreach (IDTSOutputColumn100 outputCol in sourceColumns)
            {
                // Get the external column id
                IDTSExternalMetadataColumn100 extCol = (IDTSExternalMetadataColumn100)destNoMatchExtCols[outputCol.Name];
                if (extCol != null)
                {
                    // Create an input column from an output col of the previous component
                    destNoMatchVirInput.SetUsageType(outputCol.ID, DTSUsageType.UT_READONLY);
                    IDTSInputColumn100 inputCol = destNoMatchInputCols.GetInputColumnByLineageID(outputCol.ID);
                    if (inputCol != null)
                    {
                        // Map the input column to an external metadata column
                        destNoMatchDesignTimeComponent.MapInputColumn(destNoMatchInput.ID, inputCol.ID, extCol.ID);
                    }
                }
            }
            #endregion Connect source and destination for No Match Outputs

            #region Connect source and destination for Matching Outputs
            IDTSPath100 matchPath = pipeline.PathCollection.New();
            matchPath.AttachPathAndPropagateNotifications(lookupMatchOutput, destMatchComponent.InputCollection[0]);

            // Configure the destination
            IDTSInput100 destMatchInput = destMatchComponent.InputCollection[0];
            IDTSVirtualInput100 destMatchVirInput = destMatchInput.GetVirtualInput();
            IDTSInputColumnCollection100 destMatchInputCols = destMatchInput.InputColumnCollection;
            IDTSExternalMetadataColumnCollection100 destMatchExtCols = destMatchInput.ExternalMetadataColumnCollection;

            // Map the source columns to the OLE DB Command parameters (the ? placeholders)
            Dictionary<string, string> parameters = new Dictionary<string, string>();
            parameters.Add("Param_0", "ProductNumber");
            parameters.Add("Param_1", "Name");

            foreach (KeyValuePair<string, string> paramValue in parameters)
            {
                // Get the external column id
                IDTSExternalMetadataColumn100 extCol = (IDTSExternalMetadataColumn100)destMatchExtCols[paramValue.Key];
                if (extCol != null)
                {
                    foreach (IDTSOutputColumn100 outputCol in sourceColumns)
                    {
                        if (outputCol.Name == paramValue.Value)
                        {
                            // Create an input column from an output col of the previous component
                            destMatchVirInput.SetUsageType(outputCol.ID, DTSUsageType.UT_READONLY);
                            IDTSInputColumn100 inputCol = destMatchInputCols.GetInputColumnByLineageID(outputCol.ID);
                            if (inputCol != null)
                            {
                                // Map the input column to an external metadata column
                                destMatchDesignTimeComponent.MapInputColumn(destMatchInput.ID, inputCol.ID, extCol.ID);
                            }
                        }
                    }
                }
            }
            #endregion Connect source and destination for Matching Outputs

            #region save package as xml
            string packageXML = @"C:\ORCC\POC\pkg1.dtsx";
            XmlDocument myPkgDocument = new XmlDocument();
            package.SaveToXML(ref myPkgDocument, null, null);
            app.SaveToXml(packageXML, package, null);
            #endregion save package as xml
        }
    }
}