Simple XML in Erlang

2nd of May 2008

xmerl is Erlang's native XML library. It has powerful XPath and XSLT functionality and is fairly complete. However, when dealing with small XML snippets it can be quite cumbersome, and the native format (the xmlElement record) can be hard to read in shell output.

Luckily, xmerl_lib has a handy but undocumented function, simplify_element/1, that makes working with XML quite pleasant:

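simplexml_read_string(Str) ->
  %% Collapse runs of whitespace to a single space, and assume UTF-8
  %% when the document itself does not declare an encoding.
  Options = [{space,normalize},{encoding,"utf-8"}],
  {XML,_Rest} = xmerl_scan:string(Str,Options),
  %% Convert the #xmlElement{} record into {Tag, Attributes, Content} tuples
  xmerl_lib:simplify_element(XML).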

xmerl_lib:simplify_element/1 takes the standard xmlElement record and strips it down to a simple nested tuple representation, {Tag, Attributes, Content}:

1> Xml = "<root foo=\"bar\"><first>hello</first><second>world</second></root>".

2> X1 = simplexml_read_string(Xml).
{root,[{foo,"bar"}], [
  {first,[],["hello"]},
  {second,[],["world"]}]}
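To make the transformation concrete, here is a rough sketch of what simplify_element/1 does for the common, non-namespaced case. It is not the library's code: the module simplify_sketch and the function my_simplify/1 are hypothetical, and the real xmerl_lib function may treat other cases (namespaced names, comments, processing instructions) differently.

```erlang
-module(simplify_sketch).
-export([my_simplify/1]).

%% Record definitions for #xmlElement{}, #xmlAttribute{} and #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Element: keep only the tag name, {Name, Value} attribute pairs and
%% the recursively simplified children.
my_simplify(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    {Name,
     [{A#xmlAttribute.name, A#xmlAttribute.value} || A <- Attrs],
     [my_simplify(C) || C <- Content]};
%% Text node: keep just the character data.
my_simplify(#xmlText{value = Value}) ->
    Value;
%% Any other node type is left unchanged by this sketch.
my_simplify(Other) ->
    Other.
```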

While the native xmlElement is certainly more powerful, I find a stripped-down representation easier to deal with for small snippets of XML: it's easier to pattern match against and more readable in debug output. The function is undocumented, which means its interface may change in future versions. However, that seems unlikely, and if it does you can always copy the function out of the old xmerl source code.
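As an illustration of the pattern-matching point, the hypothetical function greeting/1 below pulls the foo attribute and both text values out of the example document in a single function head. It assumes exactly the shape produced in the shell session earlier (a root element with first and second children, each holding one text node); any other shape fails with a function_clause error.

```erlang
-module(simplexml_match).
-export([greeting/1]).

%% Match the whole simplified document at once; the attribute list is
%% searched with lists:keyfind/3 so extra attributes are tolerated.
greeting({root, Attrs, [{first, [], [First]}, {second, [], [Second]}]}) ->
    {foo, Foo} = lists:keyfind(foo, 1, Attrs),
    {Foo, First ++ " " ++ Second}.
```

Doing the same against the raw records would mean nesting #xmlElement{} and #xmlText{} patterns, which is considerably noisier.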

To produce XML from this simple format, use xmerl:export_simple/3 (which is documented):

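simplexml_output_string(Xml,Prolog) ->
  %% Prolog is written verbatim before the root element; [] means none.
  %% export_simple/3 returns a deep list, so flatten it into a string.
  lists:flatten(xmerl:export_simple([Xml],xmerl_xml,[{prolog,Prolog}])).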

3> simplexml_output_string(X1,[]).
"<root foo=\"bar\"><first>hello</first><second>world</second></root>"
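The simple form doesn't have to come from the parser. Below is a minimal sketch of building a document by hand and serialising it, using a hypothetical person_xml/2 helper. No prolog option is passed here, so xmerl_xml writes its default XML declaration in front of the document.

```erlang
-module(simplexml_build).
-export([person_xml/2]).

%% Build a document directly as {Tag, Attributes, Content} tuples and
%% export it with the xmerl_xml callback module. Without a prolog
%% option the default <?xml version="1.0"?> declaration is emitted.
person_xml(Id, Name) ->
    Doc = {person, [{id, Id}], [{name, [], [Name]}]},
    lists:flatten(xmerl:export_simple([Doc], xmerl_xml)).
```

For example, simplexml_build:person_xml("1", "Joe") returns "<?xml version=\"1.0\"?><person id=\"1\"><name>Joe</name></person>".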

One common problem when pattern matching against this tuple format is whitespace:

"<a><b>c</b> <b>c</b></a>"
isn't the same as
"<a><b>c</b><b>c</b></a>"

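So here's a quick function that strips whitespace-only text nodes from an XML document. Because the reader uses {space,normalize}, every run of whitespace between elements arrives as a single " ", so that is the only value it needs to filter. It doesn't distinguish significant from insignificant whitespace, though, so "<a> </a>" will become {a,[],[]} and not {a,[],[" "]}.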

%% Element: drop the " " text nodes left by {space,normalize},
%% then strip the remaining children recursively.
strip_whitespace({El,Attr,Children}) ->
  Ch = [strip_whitespace(X) || X <- Children, X =/= " "],
  {El,Attr,Ch};

%% Just a plain text value: return it unchanged
strip_whitespace(String) -> String.

4> simplexml_read_string("<a><b>c</b> <b>c</b></a>").
{a,[],[
  {b,[],["c"]},
  " ",
  {b,[],["c"]}]}

5> strip_whitespace(simplexml_read_string("<a><b>c</b> <b>c</b></a>")).
{a,[],[
  {b,[],["c"]},
  {b,[],["c"]}]}
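strip_whitespace/1 relies on {space,normalize} having collapsed every whitespace run to " ". If a document is parsed some other way (for example with xmerl_scan's default {space,preserve}), text nodes such as "\n  " survive. One possible variant, sketched here with the hypothetical names strip_blank/1 and is_blank/1, drops any text node made up only of spaces, tabs, carriage returns and newlines.

```erlang
-module(simplexml_ws).
-export([strip_blank/1]).

%% Element: remove whitespace-only text children, recurse into the rest.
strip_blank({El, Attr, Children}) ->
    {El, Attr, [strip_blank(C) || C <- Children, not is_blank(C)]};
%% Plain text value: return it unchanged.
strip_blank(Text) ->
    Text.

%% True for a text node (a string) consisting solely of XML whitespace
%% characters; element tuples are never considered blank.
is_blank(Text) when is_list(Text) ->
    lists:all(fun(Ch) -> lists:member(Ch, " \t\r\n") end, Text);
is_blank(_) ->
    false.
```

The trade-off is the same as before: whitespace that is actually meaningful to the application is removed too.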
