The following is a transcript of a ghci session I had tonight (minus a handful of bloopers).
I'm not a Haskell expert, so expect less-than-stellar Haskell code. My main purpose in writing this is to give others a jumping-off point into HaXml.
First I started ghci:
tim@laptop:~$ ghci
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Then I declared a String that held my XML document text (formatted to fit your screen):
Prelude> let xmlText = "<?xml version=\"1.0\"?><order>
<part number=\"101\">Hammer</part>
<part number="\">Nail</part>
</order>"
Before I can parse the XML, I'll need to import some of HaXml's libraries. The :m + command adds a module to the current scope:
Prelude> :m +Text.XML.HaXml
Prelude Text.XML.HaXml> :m +Text.XML.HaXml.Parse
Now I can parse the XML and extract just the root of the document:
Prelude Text.XML.HaXml Text.XML.HaXml.Parse> let (Document _ _ root _) = xmlParse "(No Document)" xmlText
Actually, no parsing has taken place yet. xmlParse is lazy and will only parse the XML text when necessary.
The first argument to xmlParse is "(No Document)". That's just a dummy value that's used by HaXml for error reporting purposes. If I had parsed XML from a file, I would have substituted the file name for the dummy value.
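For the file case, here is a minimal sketch of how I would expect it to look as a standalone program. The names rootOf and childCount, and the order.xml file in the usage comment, are my own inventions for illustration and are not part of HaXml; the imports assume the same module layout used in this session.

```haskell
import System.Environment   (getArgs)
import Text.XML.HaXml.Types (Document(..), Element(..))
import Text.XML.HaXml.Parse (xmlParse)
import Text.XML.HaXml.Posn  (Posn)

-- Parse XML text and return the root element. The label is only used by
-- HaXml when it reports errors, so a file name is a natural choice.
rootOf :: String -> String -> Element Posn
rootOf label text = root
  where Document _ _ root _ = xmlParse label text

-- Count the direct children of an element (child elements, text, and so on).
childCount :: Element i -> Int
childCount (Elem _ _ children) = length children

main :: IO ()
main = do
  [path] <- getArgs             -- hypothetical usage: runghc Root.hs order.xml
  text   <- readFile path
  print (childCount (rootOf path text))
```

Passing the path as the first argument to xmlParse means any parse error should mention the file it came from.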
Let's see what type the root has:
Prelude Text.XML.HaXml Text.XML.HaXml.Parse> :t root
root :: Element Text.XML.HaXml.Posn.Posn
The combinators I'll be using expect a Content value, not an Element. So let's create a Content value from this Element. But first we need to load another module:
Prelude Text.XML.HaXml Text.XML.HaXml.Parse> :m +Text.XML.HaXml.Posn
Having all these modules in the prompt is getting annoying. Let's remove them:
Prelude Text.XML.HaXml Text.XML.HaXml.Parse Text.XML.HaXml.Posn> :set prompt "> "
Now our prompt will just be the > character followed by a space.
And now we can wrap the Element value in a Content value:
> let rootElem = CElem root noPos
> :t rootElem
rootElem :: Content Posn
To select nodes in the XML tree we can use the tag function:
> :t tag
tag :: String -> Content i -> [Content i]
It takes a String and a Content value (our document root) and returns potentially multiple Content values (each node whose tag name matched the supplied String).
Let's see the type of value we get when we supply tag with a String:
> :t tag "order"
tag "order" :: Content i -> [Content i]
No surprise there if you're familiar with currying.
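If currying is new to you, here is a toy sketch of the same idea that needs nothing beyond the Prelude. The function keep is a made-up stand-in with the same shape as tag (a String, then a value, giving back a list); it is not part of HaXml.

```haskell
-- A toy filter with the same shape as tag: a String, then a value,
-- giving back zero or more values. It is not part of HaXml.
keep :: String -> String -> [String]
keep wanted name = [name | name == wanted]

-- Supplying only the first argument yields a new function that is
-- still waiting for the second one.
keepOrder :: String -> [String]
keepOrder = keep "order"

main :: IO ()
main = do
  print (keepOrder "order")  -- the name matches, so it is kept
  print (keepOrder "part")   -- no match, so the result is empty
```

In the same way, tag "order" is a function that is still waiting for its Content value.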
And let's see the type of the value returned from tag when both a String and a Content value are supplied:
> :t tag "order" rootElem
tag "order" rootElem :: [Content Posn]
How many element names matched "order"?
> length $ tag "order" rootElem
1
You can search for tags within tags using the /> function. Notice that the chained functions below have the same type as tag applied to a String:
> :t tag "order" /> tag "part"
tag "order" /> tag "part" :: Content i -> [Content i]
Let's search for "part" tags within the "order" tag and see how many nodes we get (it should be 2 because we have 2 parts in our XML text):
> length $ tag "order" /> tag "part" $ rootElem
2
Let's grab just the first "part" element and examine its type:
> let firstPart = (tag "order" /> tag "part" $ rootElem) !! 0
> :t firstPart
firstPart :: Content Posn
Great. We now have a single XML element in firstPart.
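For anyone who would rather compile than type into ghci, the steps so far could be collected into a small program along these lines. This is only a sketch: the name parts is mine, the XML is squeezed onto one line, and the second part number ("102") is a placeholder I made up for the example.

```haskell
import Text.XML.HaXml.Types       (Document(..), Content(..))
import Text.XML.HaXml.Parse       (xmlParse)
import Text.XML.HaXml.Posn        (Posn, noPos)
import Text.XML.HaXml.Combinators (tag, (/>))

-- The order document from the session, on a single line.
-- The "102" part number is a placeholder chosen for this sketch.
xmlText :: String
xmlText = "<?xml version=\"1.0\"?><order>"
       ++ "<part number=\"101\">Hammer</part>"
       ++ "<part number=\"102\">Nail</part>"
       ++ "</order>"

-- Every "part" element found directly inside the "order" element.
parts :: [Content Posn]
parts = (tag "order" /> tag "part") rootElem
  where
    Document _ _ root _ = xmlParse "(No Document)" xmlText
    rootElem            = CElem root noPos  -- wrap the Element as Content

main :: IO ()
main = print (length parts)
```

Running it should report the same count of "part" nodes that the session showed.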
Let's poke around the internals of firstPart. To do that we'll pattern match on firstPart:
> let (CElem (Elem name attributes _) _) = firstPart
Now we can look at the part element's tag name:
> name
"part"
And see how the attributes are stored:
> :t attributes
attributes :: [Attribute]
That makes sense since an element can have multiple attributes.
We know that this node just has one attribute so let's grab it:
> let (attrName, _) = attributes !! 0
> attrName
"number"
The second part of an Attribute is an AttValue, which is a list of "Either String Reference" values. I'm not sure why this is. I thought that a single attribute could only have one value. Perhaps not?
Let's grab the first attribute's AttValue:
> let (_, attrValue) = attributes !! 0
> :t attrValue
attrValue :: AttValue
And now we'll grab the list of Either values and store the first one in "firstAttrValue":
> let (AttValue (firstAttrValue:_)) = attrValue
> :t firstAttrValue
firstAttrValue :: Either String Reference
Now we'll try to access the attribute String:
> let (Left value) = firstAttrValue
> value
"101"
Sure enough, the first "part" has a part number of "101".