Thanks to everyone in the New England Visual Basic Professionals group for coming tonight!  I had a great time talking about some of the new features in ADO.NET 2.0.

Slides and the demo posted here.

I also fixed the bug in the demo.  Turned out that it was a repaint issue on the DataGridView.  Solution: After removing the row in the SqlDependency event handler, I call the Invalidate() method on the DataGridView.  That forces a full re-paint.  Annoying that it didn't do that automatically!


Just a reminder that tomorrow night I'll be doing a talk on ADO.NET 2.0 enhancements, geared mostly towards those that can be used in conjuction with SQL Server 2005.  This will be at the New England Visual Basic Professionals User Group.

The group meets at Microsoft's Waltham offices, 201 Jones Road, 6th floor.  The talk will start at around 6:15, and I expect that it will go around 90 minutes.

Following is the abstract:


The new versions of ADO.NET and SQL Server can more seamlessly interact than any previous combination of DBMS and data access framework. The combination of the two allows for much easier development of a wide variety of database-centric applications. In this session you'll learn to use such features as Query Notifications, MARS, asynchronous commands, and bulk copy.  These features provide a powerful new foundation for creating heavily data-driven applications.

I'm finishing up the talk now--and also trying to learn enough VB.NET so that I can say something intelligent to the group on the topic instead of my usual ranting and raving about how much more readable I think C# is. (It is.  Really.  I don't know what kind of carpal tunnel-minded masochists would choose to program in VB.NET.  Seriously.)

So if you're going to be there and have any last-minute topic requests let me know, and I'll see if I can integrate them into the talk!

Back again!  Fourth post for the month of February, making this my best posting month in, well, months.  Expect this trend to continue.

After yesterday's post on running sums and the evils of cursors, Jamie Thompson came up with a faster solution than the curser I posted.  Alas, Jamie's solution uses an undocumented form of UPDATE syntax, and I am really not comfortable using it.  So I set out to find still another solution.  As promised at the end of my last post, SQLCLR is where I looked.  And my instinct proved correct.

Jamie's solution runs in 4 seconds on my laptop.  Compared to 14 seconds for the cursor I posted, that's a great enhancement.  But I knew that we could do better still, and without undocumented syntax and temp tables.

The answer?  A SQLCLR stored procedure.  Same logic as the cursor: Pull back the data in order, then loop over the rows and maintain the running sum in a variable.  But thanks to the SqlPipe's SendResults methods, we don't need a temporary table for this one -- the results can be sent back one row at a time, and will still show up on the client as a single result set.

Here's how I did it:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void TransactionHistoryRunningSum()
    {
        using (SqlConnection conn = new SqlConnection("context connection=true;"))
        {
            SqlCommand comm = new SqlCommand();
            comm.Connection = conn;
            comm.CommandText = @"" +
                "SELECT TransactionID, ActualCost " +
                "FROM Production.TransactionHistory " +
                "ORDER BY TransactionID";

            SqlMetaData[] columns = new SqlMetaData[3];
            columns[0] = new SqlMetaData("TransactionID", SqlDbType.Int);
            columns[1] = new SqlMetaData("ActualCost", SqlDbType.Money);
            columns[2] = new SqlMetaData("RunningTotal", SqlDbType.Money);

            decimal RunningSum = 0;

            SqlDataRecord record = new SqlDataRecord(columns);

            SqlContext.Pipe.SendResultsStart(record);

            conn.Open();

            SqlDataReader reader = comm.ExecuteReader();

            while (reader.Read())
            {
                decimal ActualCost = (decimal)reader[1];
                RunningSum += ActualCost;

                record.SetInt32(0, (int)reader[0]);
                record.SetDecimal(1, ActualCost);              
                record.SetDecimal(2, RunningSum);

                SqlContext.Pipe.SendResultsRow(record);
            }

            SqlContext.Pipe.SendResultsEnd();
        }
    }
};

Results?  3 seconds on my laptop.  25% better than the previous best solution.  Not bad!

And, I even did better than that.  By creating a CLR table-valued user-defined function that uses a custom class wrapping a SqlDataReader, I was able to get this operation down to 2 seconds on my laptop.  However, due to restrictions related to passing around context connections, that solution can not use a context connection and is therefore highly suboptimal.  Until I figure out how to pass around a context connection (or if it's even possible), I'll keep that one off the blog.

So to recap: We certainly have not gotten rid of the cursor.  This SQLCLR solution is really just a cursor in disguise.  But we've built a better cursor, because this one doesn't require temporary tables.  And that I can live with--for now.

Thanks to Jamie Thompson for prompting me to not wait several weeks before following up as I usually do with these posts!

Siddhartha Gautama, the Buddha, taught us to understand that the key to enlightenment is following the Middle Path.  And today I learned a valuable lesson in extremes.  You can file this one in the "Doh!  Wrong again!" category...

A fairly common question on SQL Server forums is, "how can I get the running sum of the data in this column?"  Being the fan of set-based queries that I am, I always answer the exact same way.  I show the person asking the question how to do a self-join on the grouped column, getting all of the "previous" values to create a running sum.  The following example shows how you might do this against the AdventureWorks Production.TransactionHistory table:

SELECT
    TH1.TransactionID,
    TH1.ActualCost,
    SUM(TH2.ActualCost) AS RunningTotal
FROM Production.TransactionHistory TH1
JOIN Production.TransactionHistory TH2 ON TH2.TransactionID <= TH1.TransactionID
GROUP BY TH1.TransactionID, TH1.ActualCost
ORDER BY TH1.TransactionID

Pretty simple query.  For each row of the "TH1" alias, every row with a lesser-or-equal TransactionID will be summed.  Thereby creating a running total for every row of the table.  Note, I've used the IDENTITY column instead of the date column.  I'd generally suggest not doing so because, e.g., you might need to insert some post-dated rows at some point and relying on the IDENTITY for a time sequence will thereby not work.  But in this case it's a lazy solution because the TransactionDate column isn't indexed, and it's also not unique.  I want to test a lot of rows (TransactionHistory has around 113,000), but I don't want to skew the test by forcing a table scan on every iteration!

But I digress.  The point is, I've given this answer more than a few times and, well, I'd like to apologize.  Just now I went ahead and ran this query on my powerful test server--err, my laptop. 

As you might guess, since I'm performance-minded I also happen to be extremely impatient--so I went ahead and killed the query at the five-minute mark.  SSMS's result grid showed the first 26,568 rows, so obviously there was a long way to go to hit that 113,000 mark.  And with an estimated cost of 38,086 for the query, I can't say I'm surprised.

A few moments of head scratching and the following re-write was issued:

SELECT
    TH1.TransactionID,
    TH1.ActualCost,
    (
        SELECT SUM(TH2.ActualCost)
        FROM Production.TransactionHistory TH2
        WHERE TH2.TransactionID <= TH1.TransactionID
    ) AS RunningTotal
FROM Production.TransactionHistory TH1
ORDER BY TH1.TransactionID

With an estimated cost of only 6,630, I had high hopes for this one.  Alas, once again I was forced to cancel the query at the five-minute mark.  27,683 rows.  Not much better, I'm afraid.  And, as an aside, I'm starting to wonder about these estimated costs.  But that's another post for another day.

So where am I going with all of this?  Well, there's a reason I haven't given any indication up until this point in the post.  You see, it's utterly painful to write this, but...

In this case, a cursor is faster.

Yes, I said it.  That evil construct which we as database developers despise, the cursor.  Thanks to Paul Nielsen, who revealed this ugly fact to me in a conversation today, I was forced to test this for myself (hoping to prove him wrong, of course).  Which is why I started playing around with the solution that I've given so many times on forums.  Unfortunately, he is correct.

My next test query, using the first cursor I've written in several years:

DECLARE RunningTotalCursor
CURSOR LOCAL FAST_FORWARD FOR
    SELECT TransactionID, ActualCost
    FROM Production.TransactionHistory
    ORDER BY TransactionID

OPEN RunningTotalCursor

DECLARE @TransactionID INT
DECLARE @ActualCost MONEY

DECLARE @RunningTotal MONEY
SET @RunningTotal = 0

DECLARE @Results TABLE
(
    TransactionID INT NOT NULL PRIMARY KEY,
    ActualCost MONEY,
    RunningTotal MONEY
)

FETCH NEXT FROM RunningTotalCursor
INTO @TransactionID, @ActualCost

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @ActualCost

    INSERT @Results
    VALUES (@TransactionID, @ActualCost, @RunningTotal)

    FETCH NEXT FROM RunningTotalCursor
    INTO @TransactionID, @ActualCost
END

CLOSE RunningTotalCursor

DEALLOCATE RunningTotalCursor

SELECT *
FROM @Results
ORDER BY TransactionID

What's really unfortunate about the cursor approach is that you need to use a temporary table if you want to return a single result set to the client. I figured the additional I/O due to the temp table would balance any improvement gains from the cursor approach, thereby rendering my forum responses correct, and Paul wrong.  Well, 14 seconds and 113,443 rows later, SSMS and my laptop declared Paul the undisputed Champion of the Cursor.

This cursor makes a lot of sense in this case.  The set-based query works by looping over each row of the table, taking a sum of every previous row.  So for the 10th row, 10 previous rows also need to be visited.  For the 1000th row, 1000 previous rows need to be visited.  And so on.  The larger the set gets, the worse performance will be--and that's not going to be a merely linear decrease in performance.  Think about this:  Using the set-based method to find the running sum over a set of 100 rows, 5050 total rows need to be visited.  For a set of 200 rows, the query processor needs to visit 20100 total rows -- a four-fold increase in the amount of work that must be done to satisfy the query (O((N^2)/2), for those who are a bit more algorithmically minded.)

The cursor, on the other hand, needs to visit each row exactly once (O(N)). By maintaining the running count in a variable, there is no need to re-visit previous rows.  And as my laptop was so happy to show me, the I/O cost due to the temp table does not overshadow the performance improvement of having to visit so many less rows.

So what have we learned today?  In my set-based singlemindedness I failed to realize that the cursor does, indeed, have utility.  Everything in moderation.

Next steps?  I get the feeling that this can be made even faster by employing a CLR routine.  Pull the data into a DataReader and loop over that instead, which will completely eliminate the need for a temporary table.  Watch for that experiment coming to this space soon.

And next time you hear someone mention how horrible cursors are, remind that person that there is a time and place for everything (and it's called college).

Jeff Smith of SQLTeam brings us a great blog post about data in tables vs. data in code.

I especially like his fourth example, which involves "lookup" tables and what I like to call "magic keys" (generally referred to as "magic numbers" in other types of code, but who said that keys have to be numbers?)  How often do you see particular surrogate key values hardcoded into SQL or application code?  I see this practice all the time in my work, and I completely agree with Jeff: It's a VERY dangerous habit to get into.

One client I recently did some work for has hundreds of different lookup codes, each with both a text-based representation (usually in the format, "domain.code" where domain is the business vertical and code is the actual code within that vertical) and a surrogate integer key (IDENTITY).  The text-based codes, I don't have that much of a problem with, but in many cases developers chose instead to hardcode the integer keys -- from the development database they happened to be working on!  And no one ever stopped this practice; instead, the database people supported the effort by creating data load scripts to ensure that the integer keys in the development database matched those in QA and production systems.

So now you have a bunch of code hanging around that looks like:

WHERE SomeId IN (44332, 45084, 59005)

Of course, there are no comments anywhere to help someone understand what a key such as 44332 might represent.  And as a result, there are people there who actually have big chunks of these codes memorized, because it's faster to remember than to look them up every time while maintaining the scripts.

This problem is certainly not restricted to that shop.  I've seen it all over the place both in my work and in forums.  It's a very common anti-pattern and I think it highlights the need for developers to take a step back from coding and think just for a moment.  Ponder maintainability.  Do you really think that the next developer who looks at your code will know what these numbers represent?  Do you think you will know, when you look at your code again six months or a year down the road?  Furthermore, do you really want to have to memorize codes?  Don't you have better things to do with your time and brain power?

Thanks, Jeff, for bringing up such an important issue!
An extremely common question in forums is, "How can I use the GETDATE() function in a UDF?"  Because the GETDATE() function is nondeterministic, it is not allowed in SQL Server 2000 UDFs.

To date, I've always made it a point to answer the question the same way:  "Add a  DATETIME parameter to the UDF and pass it in."  Yes, you can create a view that selects GETDATE() and select from the view in your UDF to work around the restriction, but that really seems like a dirty hack to me.  I'm not sure why, but passing in the date just feels cleaner...

But after answering this question the same way so many times, I was shocked to read a blog post by Louis Davidson in which he points out that the restriction has been lifted in SQL Server 2005.  You can now use GETDATE() within a UDF; no need to either pass it in or create the view. 

Great job spotting that change, Louis!  I never would have thought to look for a modification of that behavior.  This is one of those little tiny annoyances that crops up every once in a while (always at a very bad time), and it's good to know that we no longer have to be concerned with how to work around this issue.


... For I have won Phil Phactor's Impenetrable Code contest!

I am still writing my acceptance speech for the prestigious awards ceremony, but I have to say, it's a great feeling to know that I can write thoroughly unmaintainable SQL.  It's something that I think every developer should aspire to--if they value job security above all else!  In today's economy, you just never know when you'll be let go.  So I say, give yourself that little edge.  Make yourself invaluable the old fashioned way: make it painful to lose you and your "skills".  And even if they do fire you for incompetence, they'll still need to bring you back as a consultant six months down the road when they need to maintain your sad excuse for code.  And that, my friends, is the Good Life in a nutshell.

My winning entry in this fabulous test of coding skill:


SELECT
UPPER(RIGHT(LEFT(@@VERSION,2),1))+SUBSTRING(MiR,1,15%6)+RIGHT(LEFT
(CONVERT(VARCHAR,[c6[[447]),4),1)+CHAR(CAST(STUFF(ZuB,2,0,REPLICATE('0',
1))AS INT))+(SELECT(SUBSTRING(name,3,1))FROM[master]..[sysdatabases]
WHERE(dbid)=(SELECT(MIN(dbid))FROM[master]..[sysdatabases]))+COALESCE(
SUBSTRING([c6[[447],0^1034,((15-10)/5)-1),SPACE(1))+LEFT(SPACE(1)+RIGHT(
[rD9 005],LEN([rD9 005])-(ASCII('9')-ASCII('0'))),2)+SUBSTRING([556 X10.]
,2,1)+RIGHT(REVERSE(rNNNNNNoPZZ),1)+SUBSTRING([rD9 005],7,1)+[64]+CHAR
(ASCII(RIGHT(REVERSE(MiR),1))-14)+RIGHT(MiR,1)+CHAR(ASCII(LEFT(MiR,1))
-(ASCII(LEFT(REVERSE(MiR),1))-ASCII(LEFT(MiR,1))))+CHAR(ASCII(LEFT(
(SELECT(SUBSTRING(REVERSE(rNNNNNNoPZZ),3,2))x00R),1))-4)+RIGHT(LEFT
([c6[[447],6),2)+RIGHT([556 X10.],1)+('?')FROM(SELECT(STUFF(SUBSTRING(
DB_NAME(1),3,2),2,0,CHAR(32))),CONVERT(VARCHAR,low+high-1)FROM[master]
..[spt_values][xR234]WHERE((xR234.type)='P')AND(xR234.number)=105)
[x4401C6](MiR,ZuB),(SELECT TOP 1(0x72723668626C6C4D78785A3437),name,
(TYPE_NAME)FROM[master]..sysobjects,master..[spt_datatype_info]WHERE
CONVERT(varbinary,name)=(0x730070005F004D00530072006500740072006900)
+(0X6500760065005F007000)AND(ss_dtype)=(34)ORDER BY(id))[889R U]([c6[[447],
[rD9 005],[556 X10.]),(SELECT(MIN(SUBSTRING(name,6,5))), MAX(RIGHT(name,
1))FROM[master]..[syscolumns]WHERE(name)=CONVERT(varchar,0x646174615F707265636973696F6E
))MxxxxxB01(rNNNNNNoPZZ,[64])


... Quality is job #1!
... Yet another attempt to keep my blog from feeling like I've completely abandoned it!  Here are some things I've been working on recently:

  • New article published in SQL Server Professional magazine: Debugging with Macros and Assertions. The article talks about debugging using TSQLMacro and TSQLAssert.
  • To go along with that article, I've put out a service release of TSQLMacro with a couple of bug fixes (no new features yet--coming soon!)
  • Took beta exams 71-431 and 71-441.  I felt that while both exams were better than 228 and 229, neither were perfect.  There is still a lot of ambiguity in many of the questions.  I left a lot of comments for both exams, so hopefully MS Learning will read them.  I'm taking 71-443 and 71-444 this week.  It should be interesting to see how those stack up!
  • I was featured as Database Geek of the Week a couple of weeks ago.

Tom has already blogged about it, and so has Louis.

But for me, it wasn't real until Friday night when UPS showed up and delivered my copies.  I have to say, seeing my name on the cover of a book for the first time was a great feeling.  I put a lot of work into my chapters, and I'm happy with the outcome.  I think the entire book came out quite well.

I wrote chapters 4, 5, and 6:  "T-SQL for DBAs", ".NET Integration", and "Programming Assemblies".  Chapter 4 includes material on topics like online indexing, partitioning, SNAPSHOT isolation (still my favorite SQL Server 2005 feature), etc.  Chapters 5 and 6 are dedicated to CLR integration, as you can probably tell by their titles.  Chapter 5 is basically an in-depth look at ADO.NET integration and writing a CLR stored procedure.  Chapter 6 delves into UDTs, UDFs, and UDAs, and triggers.

So please check it out if you're looking for a SQL Server 2005 book!  And of course, send me any comments on the material I wrote!

Thank you to everyone who joined us for yesterday's Mini Code Camp on SQL Server programming!  We had a great turnout, with lots of good questions.

Thanks to my co-speaker, Andrew Novick, and especially big thanks to Thom Robbins for making this happen.

Slides and demos are posted here: http://www.datamanipulation.net/SQLServerCodeCamp/

Even if you weren't there, check it out!  There is a lot of great stuff in there.



PASS 2005 Community Summit coverage by Universal Thread.

The Professional Association for SQL Server 2005 was a great success. The conference was covered by the community site Universal Thread.  The final report includes the recap of day 3, a video of several interviews with key speakers, a list of blog of each speaker which have been interviewed and more then 100 pictures. Thanks to Daniel LeClair, Dave Bernard and Jean-RenĂ© Roy who coordinated their effort to deliver another excellent report.

You can find the report at this link: http://www.utcoverage.com/Pass/2005/

The pictures: http://www.utcoverage.com/Pass/2005/Picture.asp

And the 7 min. video: http://www.utcoverage.com/Pass/2005#6

Yet another post about PASS.

In essence:  I had a great time.

Among others, and in no particular order, I met up with...

Kathi Kellenberger, Steve Jones, Brian Knight, Andy Warren, Steve Kass, Erland Sommarskog, Peter DeBetta, Hilary Cotter, Kent Tegels, Michael Rys, Ken Henderson, Stephen Dybing, Ben Miller, Linchi Shea, Louis Davidson, Bill Graziano (the DBA, yo!), Robyn Lorusso, Aaron Bertrand, Kevin Kline, Roman Rehak, Jean-Rene Roy, Rick Heiges, Eladio Rincon, Fernando Guerrero, Brian Moran, Kalen Delaney, Bob Beauchemin, Greg Low, Joe Celko, Chris Hedgate, Jamie Thompson, Andrew Kelly, Alejandro Leguizamo, Dejan Sarka, Randy Dyess, Tara Duggan, Nigel Rivett, Kirk Haselden, Vineet Rao, Itzik Ben-Gan, (The Doctor) Tom Moreau, Darren Green, Allan Mitchell, Don Kiely ...

That's about all I can remember right now, as I've just returned from the horrible hotel sports bar, the Texan Station.  If I've forgotten you and you care, drop me a line and I'll consider issuing a public apology.  Yes, consider is the key word.  You'll definitely have to grovel!

My session on Thursday went pretty well, except for a slight demo SNAFU at the end.  To anyone who was there and is looking for the code, I have now uploaded it to the PASS site, and I figured out what went wrong:  I forgot to run the second-to-last script!  So the code does work, it just doesn't work automatically.  Please download and check it out for yourself.

Aside from that, definitely fun.  I spent a lot of time in the Supplementary Question Lounge dodging questions and even answering one or two.  I talked to a lot of people (see above).  I drank too many beers in the Texan Station.  And that's about it.  Until next year...
I've been incredibly busy as of late, but hopefully things will settle down a bit in the next couple of weeks.

Today is my second-to-last day with GetConnected.  I have decided to become an independent consultant.  So if you're reading and you need some work done, contact me through the blog!  As for GetConnected, I had a great 15 months there.  If you're reading this and need a job, check them out.  They're hiring for several software development postitions at the moment, including both some reporting people and a Senior DBA.  Tell them I sent you.  Maybe that will help... Or maybe it will cost you the job.  But hey, you won't know until you try!

Saturday morning, I'm speaking at Code Camp 4.  I'll be talking about T-SQL enhancements in SQL Server 2005.  Should be a good talk.  My deck is close to being finished, and I'm extremely happy with it.  I think it might be the best Powerpoint I've written to date.  Hopefully the code samples will come together nicely, as well.

Next week I'm headed to the PASS Community Summit in Grapevine, TX (between Dallas and Fort Worth, I'm told.)  I'll be speaking Thursday afternoon, doing my talk on "structured" SQL Server development, involving exception handling, unit testing, and programmatic debugging.  I'll also be hanging out in the "lounge" that PASS is setting up, answering questions on SQL Server.  I'll be there from 1:00 - 3:00 on Wednesday, so stop by!

Two things announced today by Microsoft:

1. The September CTP is available on MSDN.  Go get it!

2. Database Mirroring will not be available for production use in the RTM release.  It will be available some time in the first half of 2006

The feature will still be in the product, but will be disabled by default.  To enable the feature, DBAs can turn on Trace Flag 1400.  Note that this feature will not be supported for production use until Microsoft gives it a thumbs-up.

Mirroring is still turned on by default in the September CTP.
I spoke at the Beantown .NET user group meeting tonight, on the topic of SQLCLR in SQL Server 2005.

One of the questions that came up during the UDT part of the talk was whether static properties are supported.  Unfortunately, I had no answer at the time--it's not something I'd yet thought to try.

The answer, as it turns out, is yes: they are supported.  But they must be defined as readonly, e.g.:
public static readonly int foo;
As it turns out, this means that they can also only be initialized from a static constructor:
static myType()
{
    foo = 1;
}
... which means that value will stick around from the time the type is first used, until the AppDomain is reset (for example, if SQL Server is restarted).

In my opinion, this greatly limits many use cases.  One might, I suppose, have some expensive, yet rarely-modified data to initialize the member with, and get that data on the first pass only. However, if the data does chang, I'm not sure that it would be easy to reset the AppDomain.  Do you really want to restart SQL Server in production environments to update static members?

Another use case I can think of is logging.  Perhaps there are situations in which you'd want to log the first time a type is used.  But that doesn't seem incredibly interesting.

If someone else reading has a more compelling use case, I'd be interested in hearing it!
More Posts Next page »