We had to take a timecard in MS Word 2003 format and create XML from it. An example of the Word document is as follows.

The red box marks text that is hidden in the document; I have made it visible for clarity. The program reads through the document, extracts the text from the fields we are interested in, and converts it to XML as follows:

<?xml version="1.0" encoding="utf-8"?>
<ns0:file xmlns:ns0="http://schemas.stottis.com/thomson/3e/proforma/detail">
  <record>
    <date>Jun 04 08</date>
    <narrative>Time Entry Narrative</narrative>
    <hours>5.00</hours>
    <timecard>3330372</timecard>
  </record>
  <record>
    <date>Jun 05 08</date>
    <narrative>Time Entry Narrative</narrative>
    <hours>2.00</hours>
    <timecard>3330371</timecard>
  </record>
  <record>
    <date>Jun 07 08</date>
    <narrative>Time Entry Narrative</narrative>
    <hours>5.00</hours>
    <timecard>3330370</timecard>
  </record>
  <record>
    <date>Jun 08 08</date>
    <narrative>Time Entry Narrative</narrative>
    <hours>3.00</hours>
    <timecard>3330369</timecard>
  </record>
</ns0:file>
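
The conversion step that produces this XML could be sketched roughly as below. This is a minimal Python illustration and not the original program: the `TimeEntry` class and the `build_timecard_xml` function are hypothetical names, and the sketch assumes the field values have already been extracted from the Word document, so reading the document itself is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace URI taken from the sample output above.
NS = "http://schemas.stottis.com/thomson/3e/proforma/detail"


@dataclass
class TimeEntry:
    """Hypothetical container for one row extracted from the timecard."""
    date: str
    narrative: str
    hours: float
    timecard: str


def build_timecard_xml(entries):
    """Hypothetical helper: serialize entries into the <ns0:file> layout."""
    # Only the root element is namespace-qualified. ElementTree assigns the
    # prefix "ns0" on its own to the first namespace it has no prefix for.
    root = ET.Element(f"{{{NS}}}file")
    for entry in entries:
        # Child elements are unqualified, matching the sample output.
        record = ET.SubElement(root, "record")
        ET.SubElement(record, "date").text = entry.date
        ET.SubElement(record, "narrative").text = entry.narrative
        # Two decimal places, as in the sample (5.0 becomes "5.00").
        ET.SubElement(record, "hours").text = f"{entry.hours:.2f}"
        ET.SubElement(record, "timecard").text = entry.timecard
    # The xml_declaration argument requires Python 3.8 or later.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = [
        TimeEntry("Jun 04 08", "Time Entry Narrative", 5.0, "3330372"),
        TimeEntry("Jun 05 08", "Time Entry Narrative", 2.0, "3330371"),
    ]
    print(build_timecard_xml(sample).decode("utf-8"))
```

The output is written on a single line apart from the declaration, and the declaration itself may differ cosmetically from the sample (for example in its quoting style); the element structure and namespace are the same.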