in

SQLServerCentral.com

The largest free SQL Server community.

Pro SQL Server XML

SQL Server + XML Development Tutorials, Tips, and Tricks
  • XML Puzzle Contest Winners

    In the last entry I promised to share the answer to the XML puzzle today.  Here's a quick recap of the question -

    SQL 2005 BOL gives the following example of a full-text search XML thesaurus file:

    <XML ID="Microsoft Search Thesaurus">
      <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics = false/>
        <expansion>
          <sub>Internet Explorer</sub>
          <sub>IE</sub>
          <sub>IE5</sub>
        </expansion>
        <replacement>
          <pat>NT5</pat>
          <pat>W2K</pat>
          <sub>Windows 2000</sub>
        </replacement>
        <expansion>
          <sub>run</sub>
          <sub>jog</sub>
        </expansion>
      </thesaurus>
    </XML>
     

    According to the XML 1.0 Recommendation what are two things that are wrong with this file?

    I said there were two things wrong with this XML document, but in fact I accepted any combination of any two of the following three issues:

    1. <diacritics = false> is an XML tag that attempts to assign the value "false" directly to the XML tag; there's no attribute.
    2. Also in the <diacritics = false> tag there are no quotes (single or double) around the value "false".  This would cause any standard XML parser to throw an exception.
    3. The last issue is not as obvious, and won't necessarily cause most currently available XML parsers to error out. According to the standard, element names cannot begin with the string 'xml' or any combination of letters that match the regular expression (('X'|'x') ('M'|'m') ('L'|'l')).  Tag names that start with 'xml' are reserved for future use by the standard.  This means you can't have an element named 'xml', 'Xml', or 'XML'; the root element of this example document is named 'XML'.

    I received several correct answers, but had to limit the contest to two correct answers selected at random: Darshan Singh and Saggi Neumann will both be receiving copies of the Apress book Pro SQL Server 2008 XML.  Thanks to everyone who participated, and congratulations to the winners!

  • XML Puzzle

    How well do you know XML?  Try your hand at the following puzzle and email me your solution at michaelcoREPLACE_THIS_WITH_THE_AT_SIGNoptonline.com. I'll randomly select two correct solutions to receive a copy of Pro SQL Server 2008 XML. I'll post the correct answer to this blog at the same time. Be sure to include your name and complete "snail mail" mailing address with your answer.

    And with no further ado, here's the puzzle:

    While doing some research on SQL Server 2005 full-text search a while back I ran across a sample XML full-text thesaurus file in Books Online:

    <XML ID="Microsoft Search Thesaurus">
      <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics = false/>
        <expansion>
          <sub>Internet Explorer</sub>
          <sub>IE</sub>
          <sub>IE5</sub>
        </expansion>
        <replacement>
          <pat>NT5</pat>
          <pat>W2K</pat>
          <sub>Windows 2000</sub>
        </replacement>
        <expansion>
          <sub>run</sub>
          <sub>jog</sub>
        </expansion>
      </thesaurus>
    </XML>

    According to the XML 1.0 Recommendation there are two problems with this XML. Can you spot the two problems?

    For reference you can view the original BOL entry with this XML in it here http://msdn.microsoft.com/en-us/library/ms345186.aspx. The latest version of the XML 1.0 Recommendation is available at http://www.w3.org/TR/REC-xml/.

    I'll accept contest submissions until June 20.

  • Grabbing Node Names and Values Pt. 2

    As a follow-up to my last post on grabbing node names and values, I was asked about grabbing URIs from more complex XML documents. The following query is the result of that discussion:

    DECLARE @x xml;SET @x = N'<?xml version = "1.0"?>
      <root xmlns:n = "uri:components:name"
        xmlns:p = "uri:components:people">
        <p:person>
          <name>
            <n:first-name>Joe</n:first-name>
            <n:last-name>Johnson</n:last-name>
          </name>
          <name>
            <n:first-name>Lennox</n:first-name>
            <n:last-name>Lewis</n:last-name>
          </name>
          <name>
            <n:first-name>Louie</n:first-name>
            <n:last-name>McDougal</n:last-name>
          </name>
        </p:person>
      </root>'
    ;

    SELECT CASE NodeUri WHEN '' THEN ''
      ELSE '{' + NodeUri + '}' END + NodeName AS [Name],
      NodeUri,
      NodeName,
      NodeValue
    FROM
    (
      SELECT node.value('fn:namespace-uri(.[1])', 'varchar(1000)') AS NodeUri,
        node.value('fn:local-name(.[1])', 'varchar(1000)') AS NodeName,
        node.query('./text()') AS NodeValue
      FROM @x.nodes(N'//*') T(node)
    )
    sub;

    The result looks like this:

    Grabbing Node URIs, Names, and Values

    The first column, Name, represents the fully-expanded URI + local name in the {URI}local-name format. This is what XQuery processors, like the one in SQL Server, see once the namespace prefixes are expanded and prepended to local names. Notice that nodes with no namespace don't have anything prepended. The NodeURI column is generated with a call to the fn:namespace-uri() function, and it represents the URI portion of the namespace. The NodeName and NodeValue columns contain the local name of each node and the value, respectively, just like in the last post.

    XQuery supports several functions and operators like these specifically for nodes, and SQL Server supports most of them.

    Add to Technorati Favorites 

  • Grabbing Node Names and Values

    I got a question the other day about grabbing node names and values from an XML document. The person asking was using string parsing with the SUBSTRING function to split out names from content based on '>' characters in the XML string. Not a fullproof solution, to be sure, but quick and easy based on his predictable XML data.  So the question then becomes is there an XML-based method to grab this information? 

    DECLARE @x xml;
    SET @x = N'<?xml version = "1.0"?>
    <root>
      <person>
        <name>
          <first-name>Joe</first-name>
          <last-name>Johnson</last-name>
        </name>
        <name>
          <first-name>Lennox</first-name>
          <last-name>Lewis</last-name>
        </name>
        <name>
          <first-name>Louie</first-name>
          <last-name>McDougal</last-name>
        </name>
      </person>
    </root>'
    ;

    SELECT
    node.query('fn:local-name(.)') AS NodeName, node.query('./text()') AS NodeValue
    FROM @x.nodes(N'//*') T(node);

    The fn:local-name() function retrieves the local name portion of XML node names, and the text() node test retrieves the node text, as shown below.

    Result of fn:local-name() query

    Add to Technorati Favorites 

    Posted Mar 07 2008, 11:51 AM by Mike C with no comments
    Filed under: , ,
  • Unmatched Nodes in XQuery

    In a previous post I talked about performing "inner joins" in XQuery. The basic idea is that the inner join is simply a special case of a Cartesian product, or "cross join".  In this post I'm going to briefly look at another common SQL-style join condition that can be duplicated to some degree in XQuery. Specifically I'm going to give an example of a retrieving nodes from a tuple stream that don't match the nodes in a second tuple stream.

    In this post we'll duplicate about 50% of the SQL "left outer join" functionality. For those who aren't familiar with SQL left outer joins, they can be thought of as an inner join between two tables unioned together with the rows of the left-hand table that have no corresponding rows in the right-hand table.

    For this example I'm going to borrow some slightly modified XML data from the previous post:

    DECLARE @xml xml;
    -- Create sample XML document
    SET @xml = N'<authors>
      <author id = "1">Fabio Claudio Ferracchiati</author>
      <author id = "2">Hugo Kornelis</author>
      <author id = "3">Rob Walters</author>
      <author id = "4">Lara Rubbelke</author>
      <author id = "5">Adam Machanic</author>
      <author id = "6">Michael Coles</author>
      <author id = "7">Robin Dewson</author>
      <author id = "8">Jan D. Narkiewicz</author>
      <author id = "9">Robert Rae</author>
    </authors>
    <books>
      <book title = "Pro T-SQL 2005 Programmer&apos;s Guide">
        <isbn>159059794X</isbn>
        <author>6</author>
      </book>
      <book title = "Accelerated SQL Server 2008">
        <isbn>1590599691</isbn>
        <author>3</author>
        <author>6</author>
        <author>7</author>
        <author>1</author>
        <author>8</author>
        <author>9</author>
      </book>
      <book title = "Pro SQL Server 2008 XML">
        <isbn>1590599837</isbn>
        <author>6</author>
      </book>
    </books>'
    ;
    -- Perform outer join
    SELECT @xml.query('for $author in /authors/author
      where fn:empty($author[@id = /books/book/author])
      return <author> { $author } </author>'
    );

    This sample is available as a download file here.

    The XML data consists of a list of authors and a list of books. The query uses a FLWOR expression to retrieve all authors who don't have a corresponding book node. Here's how it works:

    1. The for clause binds the tuple stream /authors/author to the $author variable.  By "binding the tuple stream", I mean that every author node is assigned to the $author variable in turn. For those from a C# background you can think of this as a foreach loop over the XML nodes specified by the path expression.
    2. The where clause uses the fn:empty() function to determine if the id attribute of the current $author node exists in the /books/book/author node sequence. Basically we're checking to see if the current author's ID # is assigned to any given book or not. If not, the fn:empty() function returns true.
    3. The return clause uses XML construction to build an <author> element for every author that meets the where clause condition (e.g., no matching book).

    The results are shown below:

    <author>
     
    <author id="2">Hugo Kornelis</author>
    </
    author>
    <
    author>
      <
    author id="4">Lara Rubbelke</author>
    </author>
    <
    author>
     
    <author id="5">Adam Machanic</author>
    </author>

    As you can see, the three authors with id's of (2, 4, 5) were returned because they have no corresponding book nodes in the XML data. This technique, combined with the previously discussed inner join technique, can be used to simulate SQL style outer joins in XQuery.

    Add to Technorati Favorites 

  • Retrieving Recursive Hierarchical Data

    Part of the promise of XML is that it makes modeling hierarchical data relatively painless. The hierarchical structure of XML automatically provides context to your data. Consider the following simple example:

    <General name="Grant">
      <
    Colonel name="Sanders">
        <
    Major name="Payne">
          <
    Captain name="Calamity">
            <
    Sergeant name="Bilko">
              <
    Private name="Benjamin" />
            </
    Sergeant>
          </
    Captain>
        </
    Major>
      </
    Colonel>
    </
    General>

    As you can see, the hierarchical structure is implicit in the structure of the XML. The General is above the Colonel is above the Major, and so on. Organizational charts, military chains of command, and other similar structures are standard fare for hierarchical modeling. Needless to say, the AdventureWorks database has a standard organizational hierarchy modeled as relational data which you'll probably see used in a whole lot of samples all over the place.

    But AdventureWorks also includes a more detailed, but often-overlooked, real-world example of hierarchy. The AdventureWorks Bill of Materials (BOM) hierarchy is modeled as an adjacency list hierarchy; one in which the child nodes maintain links to their parent nodes in the same table. You can query the Bill of Materials using the dbo.uspGetBillOfMaterials procedure like this:

    EXEC dbo.uspGetBillOfMaterials 749, '2001-01-01'

    The results are a "flat" relational listing of the raw materials used to build the "Road-150 Red, 62" bike.

    Result of uspGetBillOfMaterials

    The results do provide the parent and child component ID numbers, as well as the level of the component in the BOM. But they do not convey the hierarchical information without some sort of reconstruction on your part - probably performed in the UI for display purposes. All of the information required to reconstruct the hierarchical structure of the BOM is available in this result set, but it's not readily apparent. You can use FOR XML PATH and the xml data type to reconstruct the hierarchical structure of the BOM with a simple recursive function like this:

    CREATE FUNCTION dbo.GetRecursiveBom(@ProductID int, @level int = 0)
    RETURNS xml
    AS
    BEGIN

      DECLARE
    @x xml;

      SET @x = (
        SELECT bom.ComponentID AS "@id",
          p.ProductNumber AS "@number",
          @level AS "@level",
          p.Name AS "name",
          p.Color AS "color",
          p.ListPrice AS "list-price",
          bom.PerAssemblyQty AS "quantity",
          p.Size AS "size",
          p.SizeUnitMeasureCode AS "unit-of-measure",
          ( 
            SELECT CASE
             
    WHEN @level <THEN dbo.GetRecursiveBom(bom.ComponentID, @level + 1)
              ELSE ''
              END
          )
        FROM Production.BillOfMaterials bom
        INNER JOIN Production.Product p
          ON bom.ComponentID = p.ProductID
        WHERE @ProductID = CASE WHEN @level = 1 THEN bom.ComponentId
          ELSE bom.ProductAssemblyID END
        FOR XML PATH (N'item'), TYPE
      );

      RETURN @x;

    END
    GO

    This function calls itself recursively to retrieve a BOM (up to 5 levels deep) in XML format. You can call the function like this:

    SELECT dbo.GetRecursiveBom (749, 0);

    And the resulting XML looks like the image below:

    XML Recursive BOM

    As you can see the item nodes are nested within each other, with each component acting as a container for its constitutent components. The code for this function is available as a download file here: GetRecursiveBom.sql.txt

    In a future post we'll look at how to create a "recursive" XML schema to validate the XML produced by this function.

  • Inner Joins in XQuery

    One of the more interesting features of XQuery is FLWOR expressions (FLWOR stands for "for, let, where, order by, return", the keywords in the FLWOR expression vocabulary). FLWOR expressions have several capabilities, many of which are well-documented, but some of which get little attention.

    One of the lesser-publicized features I found interesting is the "inner join" capability. XQuery has the ability to generate the Cartesian product ("cross join") of two path expressions using a for clause format like this:

    for $var1 in path1, $var2 in path2 

    Technically speaking, this for clause "binds the tuple stream" of path1 to $var1, binds the tuple stream of path2 to $var2, and generates the Cartesian product of the two variables.  If path1 generated the tuple stream (tom, joe, larry), and path2 generated the tuple stream (johnson, washington), then the result of the cross join would be:

    tom johnson
    tom washington
    joe johnson
    joe washington
    larry johnson
    larry washington

    Every possible combination of the two tuple streams is generated. Now I promised you an inner join, didn't I? That's actually pretty simple when you consider the inner join to be a special case of the cross join. For instance, consider the old-style SQL syntax (pre-SQL-92) for inner joins. It looked something like this:

    SELECT bookName
    FROM book, author
    WHERE book.AuthorId = author.Id

    This statement performs a cross join between the book and author tables, and then restricts the results with a WHERE clause. This is the essence of the inner join. We can do the same thing with the FLWOR expression in XQuery, as shown below:

    DECLARE @xml xml;
    -- Create sample XML document

    SET
    @xml = N'<authors>
      <author id = "1">Fabio Claudio Ferracchiati</author>
      <author id = "2">Hugo Kornelis</author>
      <author id = "3">Rob Walters</author>
      <author id = "4">Lara Rubbelke</author>
      <author id = "5">Adam Machanic</author>
      <author id = "6">Michael Coles</author>
      <author id = "7">Robin Dewson</author>
      <author id = "8">Jan D. Narkiewicz</author>
      <author id = "9">Robert Rae</author>
    </authors>
    <books>
      <book title = "Pro T-SQL 2005 Programmer&apos;s Guide">
        <isbn>159059794X</isbn>
        <author>6</author>
      </book>
      <book title = "Expert SQL Server 2005 Development">
        <isbn>159059729X</isbn>
        <author>5</author>
        <author>4</author>
        <author>2</author>
      </book>
      <book title = "Linq for Visual C# 2005">
        <isbn>1590598261</isbn>
        <author>1</author>
      </book>
      <book title = "Accelerated SQL Server 2008">
        <isbn>1590599691</isbn>
        <author>3</author>
        <author>6</author>
        <author>7</author>
        <author>1</author>
        <author>8</author>
        <author>9</author>
      </book>
      <book title = "Pro SQL Server 2008 XML">
        <isbn>1590599837</isbn>
        <author>6</author>
      </book>
    </books>'
    ;

    -- Perform inner join
    SELECT @xml.query('for $author in /authors/author, $book in /books/book
      where $author/@id = $book/author
      and $author eq "Fabio Claudio Ferracchiati"
      return <book> { $book/@title } </book>'
    );

    Here's how this FLWOR expression works, step-by-step:

    1. The query binds the tuple stream from the /authors/author path to the variable $author.
    2. Then it binds the tuple stream /books/book to the $book variable.
    3. It generates the cross join of the two tuple streams.
    4. The where clause limits the results to the ones where the author's id attribute matches one of the book's author elements.
    5. Then the results are restricted again by the second half of the compound predicate: the final results will only include those where the author's name is "Fabio Claudio Ferracchiati".
    6. Finally, node construction is used to return the title of the books this author has written as XML nodes.

    The result of this XQuery FLWOR expression is shown below:

    <book title="Linq for Visual C# 2005" />
    <
    book title="Accelerated SQL Server 2008" />

    As you can see FLWOR expressions make inner joins, as a subset of cross joins, possible in XQuery.

  • Loading XML From the File System

    SQL Server has a handy option for loading XML data directly from the file system, the OPENROWSET function.  To load XML files into SQL Server, simply declare an xml data type variable and assign it the value of the OPENROWSET function with the BULK option.  Here's a quick example that uses the state-list.xml file (attached to this post, or available here):

    DECLARE @xml XML;

    -- Use OPENROWSET to read an XML file from the file system
    SELECT @xml = BulkColumn
    FROM OPENROWSET(BULK 'c:\state-list.xml', SINGLE_BLOB) TempXML

    -- View the result
    SELECT @xml;

    The first OPENROWSET parameter is the filename to load, preceded by the BULK keyword.  The second parameter defines the type of the file.  Valid values are SINGLE_CLOB for character large object, SINGLE_NCLOB for Unicode character large objects, and SINGLE_BLOB for binary large object.  Microsoft recommends always setting this parameter to SINGLE_BLOB.  Also make sure that you alias the OPENROWSET (in this example I used "TempXML" as an alias).  The single column returned is named BulkColumn.

    The only real caveats here are:

    1. Make sure that your file actually exists where you tell it, or you'll obviously raise an exception,
    2. Make sure your XML file is actually XML data if you're loading it into an xml data type variable, and
    3. Make sure your SQL Server service account has access to the directory you want to read from.  On Windows Server installations the SQL Server service account may not have access to the root of the c: drive that I used in this example.

    Another "little thing": the path to the filename is from the perspective of the server. So if you're logged into a remote server from a client computer, "C:\" is the root directory on the server not the client's local C: drive.

    Also note that you cannot use a variable in place of the filename.  You have to give OPENROWSET a string literal.  To get around this, you can use dynamic SQL to construct your OPENROWSET function call if necessary.  **Always be cautious when using dynamic SQL!***

    In the next post we'll discuss some xml data type-related functionality.

  • Create a Dynamic Logging Trigger With XML

    One of the more common uses of triggers is to create data manipulation language (DML) logging functionality.  Essentially you can intercept and log inserts, updates, and deletes to tables. In the simplest case it's enough to know that one of these operations has occurred on a table, the date/time that it occurred, and some sort of identifying information for the user that performed the operation. At the other end of the requirements spectrum is the need to identify all information, including data indicating what exactly has changed.

     

    What you usually find is that people who need to log DML operations will write one trigger and modify the same basic code for every other table that needs to be logged. I know one person who even wrote a client-side utility to automatically generate custom trigger code for hundreds of tables that he had to log. One problem with this method, apart from the sheer boredom of modifying the same trigger, again and again, is maintenance. If the underlying table structure changes your trigger is suddenly shot. In this post we're going to use FOR XML and SQL Server catalog views to create a dynamic trigger that will work on just about any table, and will automatically adjust it's output if the table structure changes.

     

    To start with, we'll create a log table called dbo.AuditDml in the AdventureWorks database: 

    CREATE TABLE dbo.AuditDml (Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
     
    SchemaName VARCHAR(100) NOT NULL,
      TableName VARCHAR(100) NOT NULL,
      TriggerName VARCHAR(100) NOT NULL,
      LogTime DATETIME NOT NULL,
      UserName VARCHAR(100) NULL,
      SPID INT NOT NULL,
      Changes XML);
    GO

    Next we’ll create a dynamic trigger.  By dynamic I mean that you can run this script against any table and it will create a trigger that automatically detects its parent table and schema and logs all DML actions against the table properly, regardless of table structure.  The example below creates the dynamic trigger on the AdventureWorks HumanResources.Shift table.

    -- Change the schema and table name to match any
    -- existing table in your database

    CREATE TRIGGER HumanResources.AuditDml_Shift
    ON HumanResources.Shift
    FOR UPDATE, INSERT, DELETE
    AS
    BEGIN
     
    DECLARE @Changes XML;
      SELECT @Changes = COALESCE((
          SELECT *
         
    FROM deleted
         
    FOR XML AUTO), '') +
       
    COALESCE((
         
    SELECT *
         
    FROM inserted
         
    FOR XML AUTO), '') 

      DECLARE
    @TriggerID INT;
     
    DECLARE @TableID INT;
      DECLARE @SchemaID INT;
      DECLARE @SchemaName VARCHAR(100);

      SET
    @TriggerID = @@PROCID;

     
    SELECT @TableID = parent_id
      FROM sys.triggers
      WHERE object_id = @TriggerID;

      SELECT
    @SchemaID = t.schema_id,
        @SchemaName = s.name
      FROM sys.tables t
      INNER JOIN sys.schemas s
        ON t.schema_id = s.schema_id
      WHERE t.object_id = @TableID;

      INSERT INTO dbo.AuditDml (SchemaName,
        TableName
    ,
        TriggerName
    ,
        LogTime
    ,
        UserName
    ,
        SPID
    ,
        Changes
    )
      SELECT @SchemaName,
        OBJECT_NAME(@TableID),
        OBJECT_NAME(@TriggerID),
        GETDATE(),
        USER_NAME(),
        @@SPID,
        @Changes
    ;
    END
    GO

    The dynamic trigger automatically determines it's schema and parent table name based on its own ID; so it doesn't matter what table this trigger is created on. It will always automatically detect this information with no special action on the part of the developer. The actual data changes are grabbed from the inserted and deleted virtual tables, which are dynamically structured using the FOR XML AUTO clause. FOR XML AUTO structures the XML data automatically based on the columns in the inserted and deleted virtual tables. Again, you don't have to do anything special, the trigger will automatically adjust its output based on the structure of the table - even if the table structure changes in the future. Now that we have a dynamic trigger created on the HumanResources.Shift table, we'll run a few DML statements to test it.

    INSERT INTO HumanResources.Shift (Name, StartTime, EndTime)
    VALUES ('Noon', '12:00:00', '20:00:00');

    UPDATE HumanResources.Shift SET Name = 'Afternoon' WHERE Name = 'Noon';

    DELETE
    FROM HumanResources.Shift WHERE ShiftId > 3;

    These statements perform an insert, an update, and a subsequent delete of a new example shift I call the "Noon" or "Afternoon" shift. The actions are logged, as shown in the image below.

    sample dml log entries

    Each entry has information like the schema name, table name, date/time, user name, etc. The Changes column is an XML data type column with the contents of the inserted and deleted virtual tables in XML format. The image below shows the results of the sample UPDATE statement above.

    xml changes log entry for update statement

     

    This type of dynamic logging is especially useful when you have to log DML activity for several tables in a transactional system. A couple of caveats:  This trigger may require some changes if your table contains LOB data type columns. Also always take care when using triggers on high-DML activity tables, as triggers of any kind can affect performance.

     

    SQL Server Standard magazine will be publishing more of these tips for getting the most out of SQL Server XML in a future issue.

  • Xml Data Type: nodes() Method

    I've decided to kick this blog off with a double-feature. In the last post I gave an example of shredding XML query plans with the nodes() method. In this one we'll discuss the nodes() method in a little more detail.

    In a previous post on the Pseudo-Random blog, I recommended that everyone who's still using OPENXML should switch over to the xml data type's nodes() method. The problems I mentioned included:

    • The 1/8th memory problem - the sp_xml_preparedocument system stored procedure allocates 1/8th of the total server memory to the XML cache when you invoke it to turn your text-based XML into a DOM document for further querying and manipulation.
    • COM. COM. COM. - OPENXML relies on MSXML, which it invokes using COM. Besides being on my TTMMU shortlist ("Things That Make Me Unhappy"), running COM objects within the SQL Server process space is inherently dangerous.  Of course if you invoke a COM object from within SQL Server, but keep it outside of the SQL Server process space, then SQL Server has no control over that COM object. This was the same reason we had to wait for SQLCLR to invoke .NET code from within SQL Server.
    • Handles - OPENXML forced us to pass a "handle", essentially an index ID # or pointer to an MSXML DOM instance. To get the DOM instance, of course, you have to first call sp_xml_preparedocument. And if for some reason you don't call sp_xml_removedocument? "Instant memory leak."
    • Ease-of-use - the syntax for OPENXML is a bit kludgy and very non-intuitive, especially if you are dealing with namespaces in your XML document.
    • And the list goes on...

    I don't want to use up this entire blog entry complaining about OPENXML, so I'll cut it short here. From the above you get the idea that OPENXML is an inferior method of shredding XML data. So what's the alternative?  Glad you asked.  The xml data type includes several built-in methods, including one method designed just for shredding XML data: the nodes() method.  Basically this method works like this:

    • The nodes() method accepts an XQuery path expression
    • Every XML node in your xml data type instance that matches the path expression is returned as a relational row.  Each row is an xml data type instance, but a restricted version (more on this later)

    The sample query below demonstrates the nodes() method. In the sample we shred the XML document into relational rows, and query the context node ('.') to return each relational row.

    DECLARE @xml XML;

    -- Character representation of the XML
    SET @xml = '<capitals>
      <state name="Alabama"
        abbreviation="AL"
        capital="Montgomery" />
      <state name="Alaska"
        abbreviation="AK"
        capital="Juneau" />
      <state name="Arizona"
        abbreviation="AZ"
        capital="Phoenix" />
    </capitals>'
    ;

    -- Retrieve each individual node from the relational result set
    SELECT
    Node.query('.')
    FROM @xml.nodes('/capitals/state') TempXML (Node);

    One thing to notice is that both the virtual table name (TempXML) and the virtual column name (Node) are defined as aliases behind the nodes() method.  This is mandatory.  The result of this query shows that each /capitals/state node is shredded into a single row of the result set.

    nodes() method turns XML data into relational rows

    Once the xml data type instance is shredded into relational rows of the xml data type, you can use the other xml data type methods like query() and value() to extract nodes and scalar values from the rows.  As I previously said, the xml rows returned by the nodes() method are restricted in their functionality.  You can't cast them to other data types; you can only use the xml data type methods on them to extract their contents.  We can use the value() method in our example to extract each state's name, abbreviation, and capital from the nodes() result set, as shown below.

    DECLARE @xml XML;

    -- Character representation of the XML
    SET @xml = '<capitals>
      <state name="Alabama"
        abbreviation="AL"
        capital="Montgomery" />
      <state name="Alaska"
        abbreviation="AK"
        capital="Juneau" />
      <state name="Arizona"
        abbreviation="AZ"
        capital="Phoenix" />
    </capitals>'
    ;

    -- Retrieve scalar values from the relational result set
    SELECT Node.value('@name', 'varchar(100)') AS Name,
      Node.value('@abbreviation', 'varchar(2)') AS Abbreviation,
      Node.value('@capital', 'varchar(100)') AS Capital
    FROM @xml.nodes('/capitals/state') TempXML (Node);

    As you can see, this syntax is fairly intuitive, especially when compared to the equivalent OPENXML syntax.  The results are shown below.

    result of shredding operation

    In the next post we'll talk about loading XML data from files from the file system directly into SQL Server.

  • Welcome to Pro SQL Server XML

    Hi everyone, in this blog I plan to discuss the new SQL Server 2005 and SQL Server 2008 XML features. While I have several topics I plan to cover already, if you have specific SQL Server + XML related questions, feel free to contact me via this blog and I'll try to answer, or at least help you get a little closer to the answer.

    I decided to kick this off with a simple example to demonstrate the utility of XML in SQL Server. SQL Server 2008 and 2005 store cached XML query plans that are accessible via the sys.dm_exec_query_plan dynamic management function. This function takes a query plan handle as an argument and returns the XML query plan, along with some other metadata.

    While I was at the PASS Conference in Denver, I threw together a quick example to demonstrate shredding the cached XML query plans on SQL 2005. I expanded it a little bit to include some other information from the sys.dm_exec_sql_text dynamic management function and the sys.dm_exec_cached_plans dynamic management view.

    WITH Plans(nodeid, physicalop, estimated_cost, plan_handle, text, query_plan, cacheobjtype, objtype)
    AS
    (
      SELECT RelOp.op.value('declare default element namespace
          "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
          @NodeId'
    , 'int'),
       
    RelOp.op.value('declare default element namespace
          "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
          @PhysicalOp'
    , 'varchar(50)'),
        RelOp.op.value('declare default element namespace
          "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
          @EstimatedTotalSubtreeCost '
    , 'float'),
       
    cp.plan_handle,
        st
    .text,
        qp
    .query_plan,
        cp
    .cacheobjtype,
        cp
    .objtype
      FROM sys.dm_exec_cached_plans cp
      CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
      CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
      CROSS APPLY qp.query_plan.nodes('declare default element namespace
        "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
        //RelOp'
    ) RelOp (op)
    )
    SELECT ROW_NUMBER() OVER (PARTITION BY p.plan_handle ORDER BY p.NodeId)
    AS Operation_Num,
      p.physicalop,
      p.text,
      p
    .cacheobjtype,
      p
    .objtype,
      p.estimated_cost
    FROM Plans p
    WHERE p.cacheobjtype = 'Compiled Plan';

    Sample results of running this query on my local server are shown in the image below.

    Sample xml query plan shredding

    This sample query uses a few of the new features available starting with SQL 2005: the CROSS APPLY operator, Common Table Expressions (CTEs), dynamic management views/functions, and the ROW_NUMBER() windowing function.

    The most important feature we're using here is the xml data type and its nodes() and value() methods.  Since the XML query plans are stored as xml data type data, we are using the nodes() method to shred the XML data nodes into rows. Then we apply the value() method to each of these new rows to extract individual query plan operators and operator information from the plan. In this case we pull the physical operator name ("Nested Loops", "Filter", etc.) and the estimated subtree cost for each operator. As a bonus we are grabbing the initial SQL statement that is the basis for the plan from the sys.dm_exec_sql_text dynamic management view. That way we can easily relate the query plan, and its individual operators, back to the source SQL statement.

    This is just one example of the capabilities of the xml data type.  In upcoming posts I'll describe more precisely how the xml data type methods and XQuery work.

Copyright Red Gate Software
Powered by Community Server (Commercial Edition), by Telligent Systems