This project started from my frustration that I could not find any simple,
cross-platform XML parser for use inside my tools. Let's look at the well-known
Xerces C++ library: the complete library is 53 MB! (12.1 MB compressed in a
zip file). I am currently developing many small tools. I use XML as the standard
for all my input/output configuration and data files. The source code of my
small tools is usually around 600 KB. In these conditions, don't you think that
53 MB to be able to read an XML file is a little bit "too much"? So
I created my own XML parser. My XML parser "library" is composed of
only 2 files: a .cpp file and a .h file. The total size is 63 KB.
Here is how it works: the XML parser loads a full XML file into memory, parses
it, and generates a tree structure representing the XML file. Of course,
you can also parse XML data that you have already stored yourself in a memory
buffer. Thereafter, you can easily "explore" the tree to get your
data. You can also regenerate a formatted XML string from a subtree. Memory
management is totally transparent through the use of smart pointers: in other
words, you will never have to call new, delete, malloc or free for the XMLNode
objects yourself ("smart pointers" are a primitive version of the garbage
collector in Java).
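The tree-plus-smart-pointers idea can be sketched in a few lines. This is only an illustration and not the library's code: it uses std::shared_ptr from the C++ standard library as a stand-in for the reference counting, and the names Node, makeNode and countNodes are hypothetical.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node: NOT the library's XMLNode, just an illustration.
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};

// Builds a node; the caller never calls new or delete directly.
std::shared_ptr<Node> makeNode(const std::string& name) {
    auto n = std::make_shared<Node>();
    n->name = name;
    return n;
}

// Counts the nodes of a subtree by walking it recursively.
int countNodes(const std::shared_ptr<Node>& root) {
    if (!root) return 0;
    int total = 1;
    for (const auto& child : root->children) total += countNodes(child);
    return total;
}
```

In this sketch a node is destroyed automatically once the last handle pointing to it disappears; a handle to a subtree keeps that subtree alive even after the handle to the root is gone.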
If you want to link to this page from your website, you can use this URL: http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
Here are the characteristics of the XMLParser library:
Escape sequence | Character | Description |
&lt; | < | less than |
&gt; | > | greater than |
&amp; | & | ampersand |
&apos; | ' | apostrophe |
&quot; | " | quotation mark |
<?xml version="1.0" encoding="ISO-8859-1"?>
<PMML version="3.0"
  xmlns="http://www.dmg.org/PMML-3-0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance" >
  <Header copyright="VADIS"> Hello World!
    <Application name="RANK For &lt;you&gt;" version="1.99beta" />
  </Header>
  <Extension extender="RANK" name="keys"> <Key name="urn"> </Key> </Extension>
  <DataDictionary>
    <DataField name="persfam" optype="continuous" dataType="double">
      <Extension extender="RANK" name="isAge" value="0" />
      <Value value="9.900000e+001" property="missing" />
    </DataField>
    <DataField name="prov" optype="continuous" dataType="double" />
    <DataField name="urb" optype="continuous" dataType="double" />
    <DataField name="ses" optype="continuous" dataType="double" />
  </DataDictionary>
  <RegressionModel functionName="regression" modelType="linearRegression">
    <RegressionTable intercept="0.00796037">
      <NumericPredictor name="persfam" coefficient="-0.00275951" />
      <NumericPredictor name="prov" coefficient="0.000319433" />
      <NumericPredictor name="ses" coefficient="-0.000454307" />
      <NONNumericPredictor name="testXmlExample" />
    </RegressionTable>
  </RegressionModel>
</PMML>
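An attribute value such as "RANK For <you>" has to be written in the file with escape sequences (&lt;you&gt;), and the parser turns them back into plain characters. The replacement of the five predefined XML entities (&lt;, &gt;, &amp;, &apos;, &quot;) can be sketched as below. This is a minimal sketch, not the library's implementation: the function name decodeEntities is hypothetical, and numeric character references such as &#60; are deliberately not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of xmlParser): replaces the five
// predefined XML entities by the characters they stand for.
std::string decodeEntities(const std::string& in) {
    static const struct { const char* entity; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&apos;", '\''}, {"&quot;", '"'}
    };
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& e : table) {
                const std::string ent(e.entity);
                if (in.compare(i, ent.size(), ent) == 0) {
                    out += e.ch;          // emit the decoded character
                    i += ent.size();      // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += in[i++];     // copy everything else unchanged
    }
    return out;
}
```

The input is scanned in a single pass, so the "&" produced by decoding "&amp;" is never decoded a second time, and unknown sequences are copied through untouched.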
Let's analyse line by line the following small example program:
#include <stdio.h>    // to get the "printf" function
#include <stdlib.h>   // to get the "free" and "atof" functions
#include "xmlParser.h"

int main(int argc, char **argv)
{
  // this opens and parses the XML file:
  XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml","PMML");

  // this prints "RANK For <you>":
  XMLNode xNode=xMainNode.getChildNode("Header");
  printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));

  // this prints "Hello World!":
  printf("Text inside Header tag is :'%s'\n", xNode.getText());

  // this gets the number of "NumericPredictor" tags:
  xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");
  int n=xNode.nChildNode("NumericPredictor");

  // this prints the "coefficient" value for all the "NumericPredictor" tags:
  int iterator=0;
  for (int i=0; i<n; i++)
    printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",&iterator).getAttribute("coefficient")));

  // this prints a formatted output based on the content of the first "Extension" tag of the XML file:
  char *t=xMainNode.getChildNode("Extension").createXMLString(true);
  printf("%s\n",t);
  free(t);
  return 0;
}
To manipulate the data contained inside the XML file, the first operation is to get an instance of the class XMLNode that represents the XML file in memory. You can use:
XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml","PMML");
or, if you use the UNICODE Windows version of the library:
XMLNode xMainNode=XMLNode::openFileHelper("PMMLModel.xml",_T("PMML"));
or, if the XML document is already in a memory buffer pointed to by the variable "char *xmlDoc":
XMLNode xMainNode=XMLNode::parseString(xmlDoc,"PMML");
This will create an object called xMainNode that represents the first tag named PMML found inside the XML document. This object is the top of the tree structure representing the XML file in memory. The following command creates a new object called xNode that represents the "Header" tag inside the "PMML" tag:
XMLNode xNode=xMainNode.getChildNode("Header");
The following command prints on the screen "RANK For <you>" (note that the "&lt;" escape sequence has been replaced by "<"):
printf("Application Name is: '%s'\n", xNode.getChildNode("Application").getAttribute("name"));
The following command prints on the screen "Hello World!":
printf("Text inside Header tag is :'%s'\n", xNode.getText());
Let's assume you want to "go to" the tag named "RegressionTable":
xNode=xMainNode.getChildNode("RegressionModel").getChildNode("RegressionTable");
Note that the previous value of the object named xNode has been "garbage collected" so that no memory leak occurs. If you want to know how many tags named "NumericPredictor" are contained inside the tag named "RegressionTable":
int n=xNode.nChildNode("NumericPredictor");
The variable n now contains the value 3. If you want to print the value of the coefficient attribute for all the NumericPredictor tags:
for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",i).getAttribute("coefficient")));
Or equivalently, but faster at runtime:
int iterator=0;
for (int i=0; i<n; i++)
  printf("coeff %i=%f\n",i+1,atof(xNode.getChildNode("NumericPredictor",&iterator).getAttribute("coefficient")));
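Why the second loop is faster can be illustrated with a small model. The sketch below assumes that the indexed call finds the i-th matching child by rescanning the children from the first one on every call, while the call taking a pointer resumes from the position stored in the iterator. The type Children and the functions findNext and findKth are hypothetical stand-ins, not the library's API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a node's list of child tag names.
using Children = std::vector<std::string>;

// Resumable search: starts at *pos and leaves *pos just past the match.
// Returns the index of the match, or -1 if there is none.
int findNext(const Children& c, const std::string& name, int* pos) {
    for (int i = *pos; i < static_cast<int>(c.size()); ++i) {
        if (c[i] == name) { *pos = i + 1; return i; }
    }
    *pos = static_cast<int>(c.size());
    return -1;
}

// Indexed search: finds the k-th (0-based) child called name by
// rescanning from the beginning on every call.
int findKth(const Children& c, const std::string& name, int k) {
    int pos = 0, idx = -1;
    for (int j = 0; j <= k; ++j) {
        idx = findNext(c, name, &pos);
        if (idx < 0) return -1;
    }
    return idx;
}
```

Under this assumption, looping over all n matches with findKth walks the child list again for every i, so the total work grows roughly with the square of n, whereas repeated calls to findNext with the same position variable visit each child only once.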
If you want to generate and print on the screen the following XML formatted text:
<Extension extender="RANK" name="keys">
  <Key name="urn" />
</Extension>
You can use:
char *t=xMainNode.getChildNode("Extension").createXMLString(true);
printf("%s\n",t);
free(t);
Note that you must free the memory yourself (using the "free(t);" command): only the XMLNode objects and their contents are "garbage collected". The parameter true passed to the function createXMLString means that we want formatted output.
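If you would rather not remember the "free(t);" call, a malloc-allocated string can be handed to a std::unique_ptr with a custom deleter. The sketch below is one possible way to do this and is not part of the library: FreeDeleter, CStringPtr, makeMallocString and toStdString are hypothetical names, and makeMallocString merely stands in for any function that, like createXMLString, returns a buffer the caller must release with free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Deleter that releases a malloc-allocated buffer with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Hypothetical stand-in for a function that returns a malloc-allocated
// C string which the caller is responsible for freeing.
char* makeMallocString(const char* text) {
    std::size_t len = std::strlen(text) + 1;
    char* buf = static_cast<char*>(std::malloc(len));
    if (buf) std::memcpy(buf, text, len);
    return buf;
}

// Copies the text into a std::string; the temporary buffer is freed
// automatically when the CStringPtr goes out of scope.
std::string toStdString(const char* text) {
    CStringPtr owned(makeMallocString(text));
    return owned ? std::string(owned.get()) : std::string();
}
```

With the real library, the same pattern would presumably be written as CStringPtr t(xMainNode.getChildNode("Extension").createXMLString(true)); followed by printf("%s\n", t.get());, with no explicit call to free.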
The XML Parser library contains many other small, useful methods that are
not described here (the zip file contains some additional examples that explain
other functionalities). These methods allow you to: