ConveX is a lightweight XML parser written in C. It performs a reasonable amount of well-formedness checking but, in particular, does not parse the DTD section of a document. It is suited to conversational XML applications rather than mainstream database transactions. "Conversational" here means transactions that are not necessarily retained after processing, or that do not require a high level of integrity; in other words, they do not need strict XML well-formedness or validity checking. For a more sophisticated and conformant treatment of XML, use one of the widely available parsers. Expat is popular and is available as C source.
The current incarnation supports ASCII (not really UTF-8) and UTF-16 processing.
ConveX uses a nested state machine driven by a push-down stack. It has some properties of a hierarchical state machine (HSM), although much of the usual HSM semantics is missing. "Init", "Run", and "Done" are the messages a machine may process. The machine as a whole is better described as a collection of flat FSMs that can call other FSMs. That is, the system is a stack of states in which each state has one additional discriminating state. This inner state has none of the entry/exit semantics; in fact, at this level it is a C case statement.
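The arrangement described above can be sketched roughly as follows. This is a minimal illustration and not the ConveX source: the names `Msg`, `Frame`, `Stack`, `push`, `pop`, `feed`, `tag_machine`, `content_machine` and `count_tags` are all assumptions made for the example. Each stack frame holds one flat machine plus its inner discriminating state, and that inner state is dispatched with a plain C switch.

```c
#include <assert.h>
#include <stddef.h>

/* The three messages named in the text. */
typedef enum { MSG_INIT, MSG_RUN, MSG_DONE } Msg;
/* What a machine asks the driver to do after a Run message. */
typedef enum { KEEP, POP } Action;

struct Stack;
/* A machine is a plain function; its inner state lives in its stack frame. */
typedef Action (*Machine)(struct Stack *s, int *state, Msg msg, int ch);

typedef struct { Machine fn; int state; } Frame;

#define MAX_DEPTH 16
typedef struct Stack {
    Frame  frames[MAX_DEPTH];
    size_t depth;
    int    tags_seen;              /* demo result only */
} Stack;

/* Push a machine and send it Init. */
static void push(Stack *s, Machine fn) {
    assert(s->depth < MAX_DEPTH);
    Frame *f = &s->frames[s->depth++];
    f->fn = fn;
    f->state = 0;
    fn(s, &f->state, MSG_INIT, 0);
}

/* Send Done to the top machine, then remove it. */
static void pop(Stack *s) {
    assert(s->depth > 0);
    Frame *f = &s->frames[s->depth - 1];
    f->fn(s, &f->state, MSG_DONE, 0);
    s->depth--;
}

/* Consumes characters up to the closing '>' of a tag. */
static Action tag_machine(Stack *s, int *state, Msg msg, int ch) {
    switch (msg) {
    case MSG_INIT: *state = 0;     return KEEP;
    case MSG_DONE: s->tags_seen++; return KEEP;
    case MSG_RUN:
        switch (*state) {          /* the inner, flat discriminator */
        case 0:                    /* first character after '<' */
            if (ch == '>') return POP;
            *state = 1;
            return KEEP;
        case 1:                    /* inside the tag */
            return ch == '>' ? POP : KEEP;
        }
    }
    return KEEP;
}

/* Top-level machine: "calls" the tag machine when it sees '<'. */
static Action content_machine(Stack *s, int *state, Msg msg, int ch) {
    (void)state;
    if (msg == MSG_RUN && ch == '<') push(s, tag_machine);
    return KEEP;
}

/* Deliver one character to whichever machine is on top of the stack. */
static void feed(Stack *s, int ch) {
    assert(s->depth > 0);
    Frame *top = &s->frames[s->depth - 1];
    if (top->fn(s, &top->state, MSG_RUN, ch) == POP) pop(s);
}

/* Count the completed tags in a string. */
static int count_tags(const char *text) {
    Stack s = { .depth = 0, .tags_seen = 0 };
    push(&s, content_machine);
    for (; *text; ++text) feed(&s, (unsigned char)*text);
    return s.tags_seen;
}
```

Calling `count_tags` on a short string from a `main` function exercises the sketch. Note that nothing here runs automatically when a state is entered or left, beyond the explicit Init and Done messages, which matches the remark that the usual entry/exit semantics are absent.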
There are 4 C source modules:
Fetches the next input character and collects the characters into a lexeme that can be extracted when the scanner is in an accepting state.
Manages the push-down stack of state machines used by the scanner module.
Uses the state machine to scan and parse lexemes according to the XML production rules. There are no separate lexical analyser and parser stages; the two are combined in the scanner.
Manages the other modules, and exposes a user API to handle different XML sections.
Provides support for dynamic strings that can expand when required and, possibly in future, a managed pool for better memory usage.
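As an illustration of the dynamic-string idea in the last module, here is a minimal sketch of a buffer that grows on demand. It is not the library's DStr type: `GrowBuf`, `growbuf_append` and `growbuf_free` are hypothetical names, and the doubling policy is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; always NUL-terminated once non-empty. */
typedef struct {
    char  *data;
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* bytes allocated */
} GrowBuf;

/* Append one character, doubling the allocation when it is full.
   Returns 0 on success, -1 if memory could not be obtained. */
static int growbuf_append(GrowBuf *b, char ch) {
    assert(b != NULL);
    if (b->len + 1 >= b->cap) {            /* need room for ch and '\0' */
        size_t ncap = b->cap ? b->cap * 2 : 16;
        char *p = realloc(b->data, ncap);
        if (p == NULL) return -1;          /* old buffer is still valid */
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = ch;
    b->data[b->len] = '\0';
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void growbuf_free(GrowBuf *b) {
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

A zero-initialised `GrowBuf` is a valid empty buffer, so a caller can declare `GrowBuf b = {0};`, append characters as lexemes are collected, and call `growbuf_free` when done.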
The interface found in xml.h is shown below:
extern XMLError XML_Create( XML** );
extern XMLError XML_SetTextHandler( XML*, TextNotify* );
extern XMLError XML_SetStartTagHandler( XML*, StartTagNotify* );
extern XMLError XML_SetEndTagHandler( XML*, EndTagNotify* );
extern XMLError XML_SetCommentHandler( XML*, CommentNotify* );
extern XMLError XML_SetPITagHandler( XML*, PITagNotify* );
extern XMLError XML_SetParseErrorHandler( XML*, ParseErrorNotify* );
extern void XML_SetUserData( XML*, void * );
extern void* XML_GetUserData( XML* );
extern XMLError XML_Destroy( XML** );
extern XMLError XML_SetMaxTextChunk( XML *, size_t );
extern XMLError XML_ParseBlock( XML *, XMLChar*, size_t );

typedef void (TextNotify)( void *UserData, DStr * text );
typedef void (StartTagNotify)( void *UserData, DStr* name, AttribList* attribs );
typedef void (EndTagNotify)( void *UserData, DStr *name );
typedef void (CommentNotify)( void *UserData, DStr *comment );
typedef void (PITagNotify)( void *UserData, DStr *name, AttribList* attribs );
typedef void (ParseErrorNotify)( void *UserData, unsigned long line, unsigned long column, XMLError err );
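The handler typedefs all take a UserData pointer as their first argument, which lets handlers share state without globals. The sketch below shows a pair of handlers that track element nesting depth. It is written to compile on its own, so DStr and AttribList are declared here as opaque stand-ins and the two typedefs are copied from the listing; `DepthStats`, `on_start` and `on_end` are hypothetical names. In a real program the handlers would presumably be registered with XML_SetStartTagHandler and XML_SetEndTagHandler, and the state passed through XML_SetUserData.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins so this sketch compiles alone; the real
   definitions come from the library headers. */
typedef struct DStr DStr;
typedef struct AttribList AttribList;

/* Copied from the interface listing. */
typedef void (StartTagNotify)( void *UserData, DStr* name, AttribList* attribs );
typedef void (EndTagNotify)( void *UserData, DStr *name );

/* Hypothetical per-parse state reached through the UserData pointer. */
typedef struct {
    int depth;       /* current nesting level */
    int max_depth;   /* deepest level seen so far */
    int elements;    /* number of start tags seen */
} DepthStats;

/* Start-tag handler: one level deeper, one more element. */
static void on_start(void *UserData, DStr *name, AttribList *attribs) {
    DepthStats *st = UserData;
    (void)name;
    (void)attribs;
    assert(st != NULL);
    st->elements++;
    st->depth++;
    if (st->depth > st->max_depth) st->max_depth = st->depth;
}

/* End-tag handler: one level shallower, never below zero. */
static void on_end(void *UserData, DStr *name) {
    DepthStats *st = UserData;
    (void)name;
    assert(st != NULL);
    if (st->depth > 0) st->depth--;
}
```

Because the handlers only touch the structure behind UserData, several parsers can run side by side, each with its own DepthStats.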
<?xml version="1.0" standalone="yes" ?> <!DOCTYPE world [ <!ENTITY jason "jason"> ] > <?PI ?> <wo~rld> <room name = "start>" > 
ude; &jason; The parser doesn't process the prolog, so named entities will not be recognised. <![CDATA[ [[ CDATA TEXT HERE ]] ]]> <attrib name1="1" name2="2" name3="" name4="4" name5="5" name6="6" name7="7" name8="8" name9="9" name10="10" /> <!--- -This is the starting room --> <!-- --This is an illegal comment since, there is a double dash in the comment stream --> <exit name="north" destination="newbie" /> <long-description>This is the long description for the start room.</long-description> <script> <7his/> <!-- is a bad tag, which the parser will pick up --> <once-only language="mud command" fish='jason "was" here'> tell [name] hello :-> </once-only> This is script data to go with the script command </script> </room> <room name="newbie" description="This is the newbie area. Welcome"> <!--newbie area--> </room> <test/> <!-- &jason; Since the parser doesn't process the DTD, it can't process internal named entities --> @ hex @ symbol @ decimal @ symbol & " < > ' </wo~rld>
Opening ../../../world.xml #xml version="1.0" standalone="yes"# #PI# (wo) { [PARSE ERROR near Line 10, column 4, `Unexpected spurious characters`] (room name="start>") { [PARSE ERROR near Line 17, column 8, `Expected integer was malformed`] [PARSE ERROR near Line 17, column 15, `Undefined entity id`] The parser doesn't process the prolog, so named entities will not be recognised. [[ CDATA TEXT HERE ]] (attrib name1="1" name2="2" name3="" name4="4" name5="5" name6="6" name7="7" name8="8" name9="9" name10="10") {} (/attrib) /* - -This is the starting room */ [PARSE ERROR near Line 31, column 13, `The sequence "--" is not allowed in the middle of a comment`] [PARSE ERROR near Line 31, column 17, `Unexpected spurious characters`] (exit name="north" destination="newbie") {} (/exit) (long-description) {This is the long description for the start room.} (/long-description) (script) { [PARSE ERROR near Line 37, column 6, `Expected Identifier was malformed`] [PARSE ERROR near Line 37, column 6, `Unexpected spurious characters`] /* is a bad tag, which the parser will pick up */ (once-only language="mud command" fish="jason "was" here") { tell [name] hello :-> } (/once-only) This is script data to go with the script command } (/script) } (/room) (room name="newbie" description="This is the newbie area. Welcome") { /* newbie area */ } (/room) (test) {} (/test) /* &jason; Since the parser doesn't process the DTD, it can't process internal named entities */ @ hex @ symbol @ decimal @ symbol & " < > ' } (/wo) [PARSE ERROR near Line 59, column 5, `Unexpected spurious characters`] Finished with exit code = 0