in

SQLServerCentral.com

The largest free SQL Server community.

Ken Kaufman

May 2008 - Posts

  • Performance Effects on Column Type Changes

     

    The other day I was given a request to change a field from varchar(6) to varchar(10).  Not thinking it was a big deal I said just give me the checked in code and I'll make the change.  One of our sharper dev's (Bill Carlson) was concerned with the lock caused by this statement.  I'd never actually took a deep dive on how this would occur so I decided to run a quick test to see what would happen.  Making this change on a 2.5 million row table took 10 ms.  What this means is SQL is smart enough to understand that since it's growing a varchar field from 6 to 10 it didn't need to go to the data to verify this statement would succeed.  It simply had to make the meta data changes in the system tables.  However when I reverted this back from 10 to 6 it took 32 seconds.  In this case SQL had to verify each value of each record to ensure it was less then 7 characters. 

     

    This brought on the realization of how SQL Server verifies data during inserts and updates.  If you look at the structure of a record with variable lengths there is no meta data on the length of variable fields it simply knows to create offsets within the row. Checking to determine whether a field is too long occurs at a higher level as data comes in.  Once it passes this check data streams to the data page without any additional verification.

     

    After wetting my interest on varchars the next logical choice was to look at int's.  Starting out with an unfragmented table I changed a field from int to bigint, then back again from bigint to int.  Moving from an int to a bigint was almost 7x longer then the reverse.  This is was due to page splitting and allocation which reared it's head in perfmon.  The simple explanation here is that unlike varchar items, every value takes a defined space 4 bytes/8 bytes for int/bigint respectively.  Going from int to bigint added 4 bytes to each record, and forced pages to be allocated and split.  In my case the initial int was 4 bytes of a 13 byte row almost 30%,  growing it to 8 bytes made it almost 50%.  In terms of page size it went from one page holding 620 rows to 474.  I guess the moral of this is whenever you're looking at running ddl statements you need to consider what the real effects on the table are, and have someone bright looking over your shoulder.

     

  • Log Shipping with Security

     

     

    Many DBA's struggle with maintaining their security infrastructure while providing the ability for dev to access the delivery environment.   A common approach is to setup a NRT (Near Real Time) environment on a warm server.  There are several approaches to this.  If you're looking just to push data SSIS or replication seems to fit the needs.  The problem with these approaches is that it's not a true representation of delivery.  Since you're only shipping data it often leaves out schema items such as stored sprocs and views that might have changed in the last release.  In order to ensure development gets an actual representation of the DB the best method is to ship the entire database.  This requires replicating all transactions both dml and ddl in the db.

     

    This need leaves you with one of two choices, log shipping or mirroring with a snapshot.  Either approach requires maintaining a read-only copy of the db, with occasional refreshes.  For example in log shipping you might restore every hour in norecovery standby mode.  Or in mirroring every hour you peal off another snapshot. 

     

    Although moving the database provides users with most of their needs, there is the issue of security.  SQL Server has two levels of security the first is in the login where the user authenticates at the server level.  When a user authenticates against the server there corresponding SID in the syslogins table is cached.  It is this SID that gives the user access to the DB's on the server.  If a user's login SID at the server level, is identical to a user SID in the database, the user has the permissions of that user inside the DB.  If you create a user called "DEV", that user is assigned a random SID, on the server it was created on.   When you give that user access to a database the login SID is added to the sysusers table in the database.  It's the SID that's tracked not the user name, hence why you can have a different user name then login name.    If you log ship or mirror to another server the database in which the user Dev was created, requires an identical SID on that server.  This is because the user (DB) SID, resides inside the DB.  Having a database with users not corresponding to Login SIDs or users with a different SIDs  is referred to as orphaning a user.

     

    There are a few ways to sync the user and login SID.  The first is to execute the sp_change_users_login system stored procedure.  This is well documented in Books Online, however the issue here is that it changes the SID inside the DB, and since log shipping or snapshots are read-only, you can't write, even to a system table.  A different approach is to create the user on both servers with the same SID.  For example if on your source server you create a user called "orphan", the server will generate a SID at the server and database level that match.  To see this run the following commands.

     

     

    CREATE LOGIN orphan WITH PASSWORD = 'Password123456';

    go

    use Adventureworks

    go

    CREATE USER orphan FOR LOGIN orphan;

    select sid from sys.sysusers where name = 'orphan';

    select sid from sys.syslogins where name = 'orphan';

     

    Once created on the source server take the SID you queried for and create the same user on your destination server.

     

    CREATE LOGIN orphan WITH PASSWORD = 'Password123456',

    sid = 0x61BECB8A1CB0394FB5FC7393F6C0C3A2

     

     

    Now any database moved from the source server to secondary server that contains this user can be accessed by this account.  However this isn't the full fix quite yet.  If the user's SID and password is the same on both servers, then anyone that knows the password can access both servers.  The simple change is to have different passwords on each server.  This is the nice part of this two phase authentication, you're not tied to a password but a sid.  So on the second server run the following command.

     

    CREATE LOGIN orphan WITH PASSWORD = 'NewPassword123456',

    sid = 0x61BECB8A1CB0394FB5FC7393F6C0C3A2

     

    With this in place data can be logged shipped or mirror/snapshot without any security issues in place.  One other note this is based on sql server authentication.  Using windows authentication presents a different problem, in that the SID is the NT SID, and hence no matter what server you're on changing the password is not a possibility. 

     

     

  • Transferring Data In Memory using SQLBulkCopy Class

    In a SQL Server enterprise there are several ways to move data, choosing the best method, is often more of a challenge then actually implementing.  The key factors involved are performance, Dev time, and ease of implementation.  A nice utility in the .Net framework is the SqlBulkCopy class.  This leverages the sturdy dll's of the command line tool BCP, with one major advantage, it's done in memory.  Using BCP to import data requires a disk location.  If you want to use BCP to move data from one server to the next, you must drop to disk from the source, then import from disk into the destination.  Until solid state drives become a reality, this method comes with a heavy overhead tax.  This is where the bulk copy class comes into play.  It eliminates this bottleneck, by working within memory.  This satisfies the performance requirement.  Leveraging Sql Server's support for assemblies, gives the ability to execute from within a SQL Stored Sproc, and satisfies the ease of implementation.  

     

    The core under workings of using this class is attaching to the source using ADO with a command reader to the data, then transferring this data to the SqlBulkCopy class which pushes to the destination.  At its simplest implementation you could have a working assembly in less the ten lines of code, which satisfies Dev Time requirements.  In the technical world every alternative solution has its own cost.  Using this method transfers your data buffering to a more expensive storage in the form of memory.  This is not as big an issue as one would expect.  Using the BCP command line tool requires transferring the entire data set to disk, then picking it up again.  Using a data reader allows for a smaller foot print, were data is fetched and inserted in a pipelined operation. Because it's pipelined you can easily transfer a GB of data with less then 20 MB of memory. 

     

    There are several samples of this on the net, and in Visual Studio documentation of how to do this.  I've included a recent code snippet I used.  As with most code it has it's own flair, essentially focusing on data verification, by counting rows at the source and destination and allowing for the ability to either sync (truncate the destination) or append data. 

     

     

    Managed Code

    public static void pushData(string sourceTable, string destServer,

                string destDB, string destTable, int batchSize, int tearDown)

            {

             

                    //Declare Variables

                    string _destServer;

                    string _destDB;

                    string _query;

                    string _destTable;

                    int _batchSize;

                    string destConnString;

                    string _cntDestTable;

                    string _cntSourceTable;

                    int _cntDestResult;

                    int _cntPriorDestResult;

                    int _cntDestReturn;

                    int _cntSourceResult;

                    string _cntError;

                   

     

                    //Set Variables

                    _destDB = destDB;

                    _destServer = destServer;

                    _destTable = destTable;

                    _batchSize = batchSize;

                    //Source Query

                    _query = "select * from " + sourceTable;

                    //Verify Queries

                    _cntDestTable = "select count(*) from " + destTable;

                    _cntSourceTable = "select count(*) from " + sourceTable;

                    _cntError = "RAISERROR ('Source and Dest Counts Dont Match', 16, 1)";

                

                    //Build Destination ConnString

                    destConnString = "Data Source=" + _destServer + ";Initial Catalog=" +

                    _destDB + ";Integrated Security=True";

     

                    //connect Locally get data

                    SqlConnection oConn = new SqlConnection("context connection=true");

                    oConn.Open();

     

                    //Gets Count of Source used to verify at end of assembly

                    SqlCommand cmdCntSource = new SqlCommand(_cntSourceTable, oConn);

                    SqlDataReader sourceCntReaderPrior = cmdCntSource.ExecuteReader();

                    sourceCntReaderPrior.Read();

                    _cntSourceResult = (int)sourceCntReaderPrior[0];

                    sourceCntReaderPrior.Close();

     

                    //Get Rows into reader to push   

                    SqlCommand cmd = new SqlCommand(_query, oConn);

                    //pull into reader

                    SqlDataReader reader = cmd.ExecuteReader();

     

     

                    //Open Connection to Destination

                    SqlConnection destConn = new SqlConnection(destConnString);

                    destConn.Open();

     

                    //Truncate Destination if needed else Get Existing row count

                    if (tearDown == 1)

                    {

                        string tearQuery = "truncate table " + _destTable;

                        SqlCommand cmdTear = new SqlCommand(tearQuery,destConn);

                        cmdTear.ExecuteNonQuery();

                        //Set seed count to 0

                        _cntPriorDestResult = 0;

                     }

                    else

                    {

                       //Get Seed Count in Dest table

                        SqlCommand cmdCntDest =new SqlCommand(_cntDestTable, destConn);

                        SqlDataReader destCntReaderPrior = cmdCntDest.ExecuteReader();

                        destCntReaderPrior.Read();

                        _cntPriorDestResult = (int)destCntReaderPrior[0];

                        destCntReaderPrior.Close();

                      }

                               

                    //Move Data to desting

                    SqlBulkCopy sbc = new SqlBulkCopy(destConn);

                    sbc.DestinationTableName = _destTable;

                    sbc.BatchSize = _batchSize;

                    sbc.WriteToServer(reader);

                    reader.Close();

     

                    //Get count of Destination after insert

                    SqlCommand dstCmd = new SqlCommand(_cntDestTable, destConn);

                    SqlDataReader destCntReader = dstCmd.ExecuteReader();

                    destCntReader.Read();

                    _cntDestResult = (int)destCntReader[0];

                    destCntReader.Close();

                   

                    //Calculate Rows Copied

                    _cntDestReturn = _cntDestResult - _cntPriorDestResult;

     

                    //Compare Source/Dest for Accuracy If they don't match send error

                    if (_cntSourceResult != _cntDestReturn)

                    {

                       

                        SqlCommand cmdCntError = new SqlCommand(_cntError, oConn);

                        SqlContext.Pipe.ExecuteAndSend(cmdCntError);

                       

                    }

                    else

                    {

                        //Send results and Cleanup

                        SqlContext.Pipe.Send(_cntSourceResult.ToString() + " Rows sent :)");

                        oConn.Close();

                        destConn.Close();

                    }

                  

     

                

            }

     

     

    The stored sproc to call this would look like this:

     

    CREATE PROCEDURE [dbo].[sp_BCP]

          @sourceTable [nvarchar](50),

          @destServer [nvarchar](50),

          @destDB [nvarchar](50),

          @destTable [nvarchar](50),

          @batchSize [int],

          @tearDown [int]

    WITH EXECUTE AS CALLER

    AS

    EXTERNAL NAME [bcpTest].[BCPData].[pushData]

    Posted May 08 2008, 05:21 PM by Ken Kaufman with no comments
    Filed under: ,
Copyright Red Gate Software
Powered by Community Server (Commercial Edition), by Telligent Systems