I've decided to kick this blog off with a double-feature. In the last post I gave an example of shredding XML query plans with the nodes() method. In this one we'll discuss the nodes() method in a little more detail.
In a previous post on the Pseudo-Random blog, I recommended that everyone who's still using OPENXML should switch over to the xml data type's nodes() method. The problems I mentioned included:
-
The
1/8th memory problem - the sp_xml_preparedocument system stored procedure allocates 1/8th of the total server memory to the XML cache when you invoke it to turn your text-based XML into a DOM document for further querying and manipulation.
-
COM. COM. COM. - OPENXML relies on MSXML, which it invokes using COM. Besides being on my TTMMU shortlist ("Things That Make Me Unhappy"), running COM objects within the SQL Server process space is inherently
dangerous. Of course if you invoke a COM object from within SQL Server, but keep it outside of the SQL Server process space, then SQL Server has no control over that COM object. This was the same reason we had to wait for SQLCLR to invoke .NET code from within SQL Server.
-
Handles - OPENXML forced us to pass a "handle", essentially an index ID # or pointer to an MSXML DOM instance. To get the DOM instance, of course, you have to first call sp_xml_preparedocument. And if for some reason you don't call sp_xml_removedocument? "Instant memory leak."
-
Ease-of-use - the syntax for OPENXML is a bit kludgy and very non-intuitive, especially if you are dealing with namespaces in your XML document.
-
And the list goes on...
I don't want to use up this entire blog entry complaining about OPENXML, so I'll cut it short here. From the above you get the idea that OPENXML is an inferior method of shredding XML data. So what's the alternative? Glad you asked. The xml data type includes several built-in methods, including one method designed just for shredding XML data: the nodes() method. Basically this method works like this:
-
The nodes() method accepts an XQuery path expression
-
Every XML node in your xml data type instance that matches the path expression is returned as a relational row. Each row is an xml data type instance, but a restricted version (more on this later)
The sample query below demonstrates the nodes() method. In the sample we shred the XML document into relational rows, and query the context node ('.') to return each relational row.
DECLARE @xml XML;
-- Character representation of the XML
SET @xml = '<capitals>
<state name="Alabama"
abbreviation="AL"
capital="Montgomery" />
<state name="Alaska"
abbreviation="AK"
capital="Juneau" />
<state name="Arizona"
abbreviation="AZ"
capital="Phoenix" />
</capitals>';
-- Retrieve each individual node from the relational result set
SELECT Node.query('.')
FROM @xml.nodes('/capitals/state') TempXML (Node);
One thing to notice is that both the virtual table name (TempXML) and the virtual column name (Node) are defined as aliases behind the nodes() method. This is mandatory. The result of this query shows that each /capitals/state node is shredded into a single row of the result set.

Once the xml data type instance is shredded into relational rows of the xml data type, you can use the other xml data type methods like query() and value() to extract nodes and scalar values from the rows. As I previously said, the xml rows returned by the nodes() method are restricted in their functionality. You can't cast them to other data types; you can only use the xml data type methods on them to extract their contents. We can use the value() method in our example to extract each state's name, abbreviation, and capital from the nodes() result set, as shown below.
DECLARE @xml XML;
-- Character representation of the XML
SET @xml = '<capitals>
<state name="Alabama"
abbreviation="AL"
capital="Montgomery" />
<state name="Alaska"
abbreviation="AK"
capital="Juneau" />
<state name="Arizona"
abbreviation="AZ"
capital="Phoenix" />
</capitals>';
-- Retrieve scalar values from the relational result set
SELECT Node.value('@name', 'varchar(100)') AS Name,
Node.value('@abbreviation', 'varchar(2)') AS Abbreviation,
Node.value('@capital', 'varchar(100)') AS Capital
FROM @xml.nodes('/capitals/state') TempXML (Node);
As you can see, this syntax is fairly intuitive, especially when compared to the equivalent OPENXML syntax. The results are shown below.

In the next post we'll talk about loading XML data from files from the file system directly into SQL Server.