Stop Using OPENXML (Please...)

Published 20 January 08 08:33 PM | Mike C

As we all know by now, the xml data type is one of the new features introduced in SQL Server 2005. The xml data type has a handy feature for shredding XML data, the nodes() method; but based on newsgroup postings and various articles on SQL Server 2005 XML it seems that developers are still stuck in an OPENXML state of mind. OPENXML is a rowset provider that was sort of thrown onto the tail end of SQL Server 2000, along with a couple of XML-specific system stored procedures, to implement XML document shredding. OPENXML shredding, circa 2000, looked like this:

DECLARE @xml VARCHAR(8000);

-- Character representation of the XML
SET @xml = '<capitals>
<state name="Alabama"
  abbreviation="AL"
 
 capital="Montgomery"
 
flag="AL.gif"
  date="December 14, 1819"
  fact="Rosa Parks refused to give up her seat on a Montgomery bus in 1955. The Montgomery Bus Boycott kicked off the Civil Rights era a few days later."
 
address="600 Dexter Ave"
  zip="36130"
  long="-86.301963"
  lat="32.377189" />
<state name="Alaska"
  abbreviation="AK"
  capital="Juneau"
  flag="AK.gif"
  date="January 3, 1959"
  fact="In 1867 United States Secretary of State William H. Seward offered Russia $7,200,000, or two cents per acre, for Alaska."
  address="120 4th Street"
  zip="99801"
  long="-134.410699"
  lat="58.301072" />
<state name="Arizona"
  abbreviation="AZ"
  capital="Phoenix"
  flag="AZ.gif"
  date="February 14, 1912"
  fact="Arizona is home of the Grand Canyon National Park."
  address="1700 West Washington St"
  zip="85007"
  long="-112.095704"
  lat="33.448543" />
</capitals>'
;

-- Internal handle for the XML DOM document
DECLARE @iDoc INT;

-- Create an XML DOM representation of the document
EXEC sp_xml_preparedocument @iDoc OUTPUT, @xml;

-- Shred the document with OPENXML
SELECT *
FROM OPENXML(@iDoc, '/capitals/state')
WITH (Name VARCHAR(100) '@name',
  Abbreviation VARCHAR(2) '@abbreviation',
  Capital VARCHAR(100) '@capital',
  Flag VARCHAR(100) '@flag',
  Date VARCHAR(100) '@date',
  Fact VARCHAR(500) '@fact',
  Address VARCHAR(100) '@address',
  ZIP VARCHAR(5) '@zip',
  Long FLOAT '@long',
  Lat FLOAT '@lat');

-- You have to manually remove the DOM document from memory
EXEC sp_xml_removedocument @iDoc;

The XML document used here is borrowed from a sample I created for an article a while back. Apart from the complexity involved with the OPENXML syntax, on SQL 2000 you also had to manually manage memory allocation. The sp_xml_preparedocument procedure invoked MSXML through COM, and automatically allocated 1/8th of the server's total memory to the XML cache. If your server has 2 GB of RAM, the short sample above will allocate 250 MB (!) to process the 1,200 byte document. At the end of processing, you have to call sp_xml_removedocument to free up the allocated memory. Dealing with XML documents larger than 8,000 bytes on SQL 2000 was a seriously convoluted undertaking. The xml data type nodes() method eliminates the sp_xml_preparedocument and sp_xml_removedocument steps, while easily handling very large documents (up to 2.1 GB). The xml data type nodes() method is shown below:

DECLARE @xml XML;

-- Populate the XML variable
SET @xml = '<capitals>
<state name="Alabama"
  abbreviation="AL"
 
 capital="Montgomery"
 
flag="AL.gif"
  date="December 14, 1819"
  fact="Rosa Parks refused to give up her seat on a Montgomery bus in 1955. The Montgomery Bus Boycott kicked off the Civil Rights era a few days later."
 
address="600 Dexter Ave"
  zip="36130"
  long="-86.301963"
  lat="32.377189" />
<state name="Alaska"
  abbreviation="AK"
  capital="Juneau"
  flag="AK.gif"
  date="January 3, 1959"
  fact="In 1867 United States Secretary of State William H. Seward offered Russia $7,200,000, or two cents per acre, for Alaska."
  address="120 4th Street"
  zip="99801"
  long="-134.410699"
  lat="58.301072" />
<state name="Arizona"
  abbreviation="AZ"
  capital="Phoenix"
  flag="AZ.gif"
  date="February 14, 1912"
  fact="Arizona is home of the Grand Canyon National Park."
  address="1700 West Washington St"
  zip="85007"
  long="-112.095704"
  lat="33.448543" />
</capitals>'
;

SELECT Node.value('@name', 'VARCHAR(100)') AS Name,
  Node.value('@abbreviation', 'VARCHAR(2)') AS Abbreviation,
 
Node.value('@capital', 'VARCHAR(100)') AS Capital,
  Node.value('@flag', 'VARCHAR(100)') AS Flag,
  Node.value('@date', 'VARCHAR(100)') AS Date,
  Node.value('@address', 'VARCHAR(100)') AS Address,
  Node.value('@zip', 'VARCHAR(5)') AS ZIP,
  Node.value('@long', 'FLOAT') AS Long,
  Node.value('@lat', 'FLOAT') AS Lat
FROM @xml.nodes('/capitals/state') TempXML (Node);

This produces the exact same results as the OPENXML method, but the syntax is much cleaner and the code is more manageable. You also don't have to worry about COM and the manual memory management.

The nodes() method simply generates a tabular result set that contains a row for each XML node that matches the path expression. In this example it returns a row for each '/capitals/state' node. Then the value() method is used to retrieve scalar values for each attribute in the row. You could also use the query() method to retrieve nodes instead of scalar values.

As for speed, the difference between using the xml data type nodes() method and OPENXML is negligible. If you're shredding XML data stored in a table, however, a primary XML index cuts the processing time down by nearly 50%.

For more manageable code and more robust and scalable applications, use the nodes() method to shred your XML instead of OPENXML.

Filed under: , , ,

Comments

# Mike DiRenzo said on January 21, 2008 05:24 AM:

EXCELLENT point.  I have been enlightening others as well using this techinique over the former.  Where did you find your info about the memory usage using the older technique.  I have always known it was memory beast but 1/8 of total server memory? Wow.

Thank you very much for a great article.

-Mike

# Mike C said on January 21, 2008 07:55 AM:

Hi Mike,

The 1/8 memory issue is documented in SQL 2005 BOL at msdn2.microsoft.com/.../ms187367.aspx, the entry on "sp_xml_preparedocument".  It was also an issue on SQL 2000. They also advise there's a chance you could run out of memory using OPENXML.

Thanks,

Mike C.