Wednesday, October 31, 2007

DOMCategory

If you do any XML processing, I suggest you try DOMCategory in Groovy. A 'category' is a feature of the Groovy language, borrowed from Objective-C. The short of it is that you can use 'category' classes to wrap a given piece of your code, and the objects in the wrapped code then gain new abilities! So what about DOMCategory (DOMcat for short)? This category lets you work with DOM objects using a simple syntax. An example will help: let me first show a simple XML document, then I will give a short script that parses some values from it.
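Before getting to the XML, here is a minimal sketch of what a category looks like in general. The ShoutCategory class and its shout() method are hypothetical names made up for this illustration; they are not part of Groovy or of DOMCategory.

```groovy
// Hypothetical category class (made up for this sketch): each static
// method's first parameter is the type that gains the new method.
class ShoutCategory {
    static String shout(String self) {
        self.toUpperCase() + '!'
    }
}

def shouted = null
use(ShoutCategory) {
    // Inside the use block, String objects respond to shout()
    shouted = 'hello'.shout()
}
println shouted   // prints HELLO!
```

Outside the use block, strings go back to normal and calling shout() on one would fail. DOMCategory works the same way, only the new abilities are added to DOM objects.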

Here is a real piece of XML I have dealt with. You can see it is more than a basic document. It declares the namespace prefix 'soap', which maps to http://www.w3.org/2003/05/soap-envelope, and there are also default (unprefixed) namespaces with the values urn:zimbraAdmin and urn:zimbra. For the sake of ease, let's say this file is named "zimbra.xml" and is found at the root ('/') of the file system.


<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<context xmlns="urn:zimbra">
<sessionId id="467" type="admin">467</sessionId>
<change token="8"/>
</context>
</soap:Header>
<soap:Body>
<GetAllDistributionListsResponse xmlns="urn:zimbraAdmin">
<dl id="b5d362c5-ef95-4fce-a5bc-1f7b06ece7f6" name="testing2@west.ora.com">
<a n="uid">testing2</a>
<a n="mail">testing2@west.ora.com</a>
<a n="zimbraMailStatus">enabled</a>
<a n="cn">testing2</a>
<a n="description">Testing DL for SOAP</a>
<a n="zimbraId">b5d362c5-ef95-4fce-a5bc-1f7b06ece7f6</a>
<a n="objectClass">zimbraDistributionList</a>
<a n="objectClass">zimbraMailRecipient</a>
<a n="displayName">testing2</a>
<a n="zimbraMailAlias">testing2@west.ora.com</a>
</dl>
<dl id="45c9cd14-811f-426b-888f-42ec05f3d7cc" name="testing3@west.ora.com">
<a n="uid">testing3</a>
<a n="mail">testing3@west.ora.com</a>
<a n="zimbraMailStatus">enabled</a>
<a n="cn">testing3</a>
<a n="description">for testing Zimbra SOAP services</a>
<a n="zimbraId">45c9cd14-811f-426b-888f-42ec05f3d7cc</a>
<a n="objectClass">zimbraDistributionList</a>
<a n="objectClass">zimbraMailRecipient</a>
<a n="displayName">testing3</a>
<a n="zimbraMailAlias">testing3@west.ora.com</a>
</dl>
<dl id="ddcec243-29fd-4aea-b443-912c69b30544" name="testing@west.ora.com">
<a n="uid">testing</a>
<a n="mail">testing@west.ora.com</a>
<a n="zimbraMailStatus">enabled</a>
<a n="cn">Testing</a>
<a n="description">Testing DL via SOAP</a>
<a n="zimbraId">ddcec243-29fd-4aea-b443-912c69b30544</a>
<a n="objectClass">zimbraDistributionList</a>
<a n="objectClass">zimbraMailRecipient</a>
<a n="displayName">Testing</a>
<a n="zimbraMailAlias">testing@west.ora.com</a>
</dl>
</GetAllDistributionListsResponse>
</soap:Body>
</soap:Envelope>

Now let's look at parsing this document from disk and examining some of its data. Here is a code snippet that creates a DOM object, which we will use later with DOMCategory.


import groovy.xml.DOMBuilder

// Parse the file into a DOM and grab the root <soap:Envelope> element
def reader = new FileReader(new File("/zimbra.xml"))
def doc = DOMBuilder.parse(reader)
def soapdom = doc.documentElement

Now that we have a DOM, we can bust out DOMCategory so that parsing is a snap!


use(groovy.xml.dom.DOMCategory) {
    soapdom.'soap:Body'[0].'GetAllDistributionListsResponse'[0].'dl'.each { node ->
        println "I am ${node.'@name'}"
    }
}

A simple example, but there is a lot to explain here. First, you can see that you use indexes to select a given node. Without an index, the lookup is applied to every matching node. In this case our XML document has only one 'soap:Body' node and only one 'GetAllDistributionListsResponse' node, so we could actually omit the [0] indexes:


use(groovy.xml.dom.DOMCategory) {
    soapdom.'soap:Body'.'GetAllDistributionListsResponse'.'dl'.each { node ->
        println "I am ${node.'@name'}"
    }
}
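If you want to try both forms without creating /zimbra.xml, here is a minimal sketch that parses a trimmed-down copy of the document from a string. The shortened XML and the variable names withIndexes and withoutIndexes are my own assumptions for illustration; the navigation expressions are the same ones used above.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Trimmed-down copy of the document, inlined so the sketch is self-contained
def xml = '''<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetAllDistributionListsResponse xmlns="urn:zimbraAdmin">
      <dl name="testing2@west.ora.com"/>
      <dl name="testing3@west.ora.com"/>
    </GetAllDistributionListsResponse>
  </soap:Body>
</soap:Envelope>'''

def soapdom = DOMBuilder.parse(new StringReader(xml)).documentElement

def withIndexes = []
def withoutIndexes = []
use(DOMCategory) {
    // Explicit [0] picks the single Body and response element
    soapdom.'soap:Body'[0].'GetAllDistributionListsResponse'[0].'dl'.each { node ->
        withIndexes << node.'@name'
    }
    // No indexes: the lookup runs against every matching node
    soapdom.'soap:Body'.'GetAllDistributionListsResponse'.'dl'.each { node ->
        withoutIndexes << node.'@name'
    }
}
println withIndexes
println withoutIndexes
```

Both lists should hold the same two names here. The forms would only differ for a document with more than one matching parent element.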

Notice how we select an attribute value on a given element: simply prefix the name of the attribute with an '@' symbol. See also that fully qualified names are used to select some elements. Since the document names the element 'soap:Body', it is selected with that exact name. The elements that fall in the 'urn:zimbra' and 'urn:zimbraAdmin' namespaces, however, are not prefixed, and therefore we do not use a prefix to select them (as seen in selecting 'GetAllDistributionListsResponse' and 'dl'). Keep this idea in mind when selecting qualified attributes too!
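To make the point about qualified attributes concrete, here is a small sketch. Note the assumption: the soap:mustUnderstand attribute does not appear in zimbra.xml; I have added it to a cut-down header purely to have a prefixed attribute to select.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Illustrative fragment only: soap:mustUnderstand is an assumption added
// for this sketch, it is not in the zimbra.xml shown earlier
def xml = '''<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
    <context xmlns="urn:zimbra" soap:mustUnderstand="true">
      <sessionId id="467" type="admin">467</sessionId>
    </context>
  </soap:Header>
</soap:Envelope>'''

def soapdom = DOMBuilder.parse(new StringReader(xml)).documentElement

def sessionType = null
def mustUnderstand = null
use(DOMCategory) {
    def ctx = soapdom.'soap:Header'[0].'context'[0]
    // Unprefixed attribute: select it by its plain name
    sessionType = ctx.'sessionId'[0].'@type'
    // Prefixed attribute: use the same prefix the document uses
    mustUnderstand = ctx.'@soap:mustUnderstand'
}
println "type=${sessionType}, mustUnderstand=${mustUnderstand}"
```

The rule of thumb is the same for elements and attributes: select by the name exactly as it is written in the document, prefix included.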