XMLStarlet: A gentle introduction into XSLT

September 29, 2004

In my work as a BEA Tech Support engineer, I spend a lot of time dealing with XML content. In the past I tried to grep through the files, but found that element order and random newlines make that quite hard. I have since found a solution.

Everybody who works with XML probably knows that XSLT/XPATH is supposed to be a good way to manipulate XML files. Unfortunately, XSLT is very verbose and sometimes quite obscure, especially when it comes to plain text output. I have tried using it a couple of times and never got beyond a simple HelloWorld.

Then I discovered [XMLStarlet][1]. It is built on libxml2 and libxslt and is provided for several platforms as an all-in-one executable.

XMLStarlet has a number of options; among them are transformation, edit, canonicalization and many others. The one I find most useful is `select`. With select, XMLStarlet works something like an in-file unix find. You provide command line switches which describe what you are searching for and what you want to display.

And the best part of all is that the proprietary syntax translates into 100% correct XSLT, so when you have outgrown the tool or need more specific tweaking, you take the generated XSLT and start editing that directly.

Let me give you an example. I often have to work with WebLogic's config.xml file, which defines the configuration for a whole domain. It describes servers, clusters, JMSQueues, SecurityRealms and many more aspects. And, as config.xml was designed for the software's convenience rather than the support engineer's, it groups elements together in a less than intuitive fashion.

The following is a very small part of the config.xml structure (derived from a sample config.xml file using XMLStarlet's `xml el -a config.xml` command):

<pre>Domain
Domain/@Name
....
Domain/Server
Domain/Server/@ListenAddress
Domain/Server/@ListenPort
Domain/Server/@Name
Domain/Server/@Cluster
...
Domain/Server/WebServer
...</pre>
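To try the same listing on a file of your own, here is a minimal sketch. The sample file below is hypothetical and heavily trimmed (its server and cluster names are invented), and the executable is assumed to be on the PATH as either `xmlstarlet` or `xml`, depending on how it was packaged.

```shell
# Hypothetical, heavily trimmed config.xml; all names and ports are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Domain Name="mydomain">
  <Server Name="admin" ListenAddress="localhost" ListenPort="7001"/>
  <Server Name="ms1" ListenAddress="localhost" ListenPort="7003" Cluster="clusterA">
    <WebServer Name="ms1"/>
  </Server>
</Domain>
EOF

# Some packages install the tool as "xmlstarlet" rather than "xml".
XML=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$XML" ]; then
  # el lists the element structure; -a also lists the attributes.
  structure=$("$XML" el -a "$sample")
  printf '%s\n' "$structure"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

Paths repeat once per matching element, so a real config.xml produces a much longer listing than the excerpt above.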

A task I need to do often is to check how many servers a client's config.xml has, what their names are and how they are distributed between clusters. Manually, it means searching for the `<Server` tag and then eyeballing the Name attribute.

With XMLStarlet it becomes:
  
`xml sel -t -m //Server -v @Name -n config.xml`
  
with the output of
  
`` 

Here, `-t` starts a template, `-m` is match and `-v` prints a value. `-n` is a newline, not something that is obvious in XSLT itself. The parameters are XPath expressions.

Do this five times and the time invested in learning the command syntax has paid for itself.
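As a self-contained sketch of that command, the script below runs it against a tiny stand-in file. The file contents are hypothetical (invented server and cluster names), and the executable is assumed to be installed as either `xmlstarlet` or `xml`.

```shell
# Hypothetical, heavily trimmed config.xml; all names and ports are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Domain Name="mydomain">
  <Server Name="admin" ListenPort="7001"/>
  <Server Name="ms1" ListenPort="7003" Cluster="clusterA"/>
  <Server Name="ms2" ListenPort="7005" Cluster="clusterA"/>
</Domain>
EOF

# Some packages install the tool as "xmlstarlet" rather than "xml".
XML=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$XML" ]; then
  # -m iterates over every Server element, -v prints its Name attribute,
  # and -n ends each match with a newline.
  names=$("$XML" sel -t -m "//Server" -v "@Name" -n "$sample")
  printf '%s\n' "$names"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

With this sample the three names come out one per line, in document order.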

Now, let's try to do something more complex. Specifically, I want to check which cluster each server belongs to. And to cut this post short(er), let's say I want it sorted by the cluster.

The command would be:
  
`xml sel -t -m //Server -s A:T:L @Cluster -v @Cluster -o " - " -v @Name -n config.xml`
  
with the output of

We can immediately see that there is one non-clustered (admin) server, 4 servers in one cluster and 2 servers each in the other two clusters. In the command above, `-s` sorts by the Cluster attribute and `-o` writes a literal string, used here for spacing. Instead of `-v -o -v`, I could use `-v "concat(XXX,yyy,ZZZ)"`, but that required a greater knowledge of XSLT than I had the first time I used the tool.
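A minimal sketch comparing the two forms follows. It assumes the sort operand is written in the `X:Y:Z` form (`A:T:L` for ascending, text, lower-first), and it uses a hypothetical sample file with invented server and cluster names. The executable may be installed as `xmlstarlet` or `xml`.

```shell
# Hypothetical sample, deliberately not in cluster order; all names are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Domain Name="mydomain">
  <Server Name="ms3" Cluster="clusterB"/>
  <Server Name="admin"/>
  <Server Name="ms1" Cluster="clusterA"/>
</Domain>
EOF

# Some packages install the tool as "xmlstarlet" rather than "xml".
XML=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$XML" ]; then
  # Form 1: two -v values with a literal separator written by -o.
  with_o=$("$XML" sel -t -m "//Server" -s A:T:L "@Cluster" \
    -v "@Cluster" -o " - " -v "@Name" -n "$sample")
  # Form 2: a single -v whose XPath expression builds the same line.
  with_concat=$("$XML" sel -t -m "//Server" -s A:T:L "@Cluster" \
    -v "concat(@Cluster, ' - ', @Name)" -n "$sample")
  printf '%s\n' "$with_o"
  if [ "$with_o" = "$with_concat" ]; then
    echo "both forms agree"
  fi
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

The `concat()` form keeps the whole line in one expression, while the `-v -o -v` form needs no XPath function knowledge. Which reads better is a matter of taste.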

And just to show you the XSLT that the above command really corresponds to, it is (after removing long header and footer):

As you can probably imagine, it takes quite a bit longer for an XSLT beginner to write the code above from scratch.

I hope this was useful and will convince you to have a look at XMLStarlet.
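To see the stylesheet behind any `sel` command on your own machine, the `-C` (`--comp`) switch prints the generated XSLT instead of running it. The sketch below assumes the executable is installed as `xmlstarlet` or `xml`, and that the sort operand is written as `A:T:L`. The exact stylesheet text may differ between versions.

```shell
# Some packages install the tool as "xmlstarlet" rather than "xml".
XML=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$XML" ]; then
  # -C prints the generated stylesheet, so no input file is given here;
  # stdin is redirected from /dev/null purely as a precaution.
  xslt=$("$XML" sel -C -t -m "//Server" -s A:T:L "@Cluster" \
    -v "@Cluster" -o " - " -v "@Name" -n </dev/null)
  printf '%s\n' "$xslt"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

Redirecting that output to a file gives a starting point for hand editing once the command-line switches are no longer enough.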

BloggicBlogger Over and Out