
Optimisation Notes


This section offers some ideas for optimising the performance of Expat4D. We are in the process of identifying further optimisations which can be made in the plug-in, although for most applications Expat4D is already very fast. That said, there are certain things to bear in mind when writing your processors in 4D which will help you get the most from it.

For very small documents, or for applications which don't require break-neck performance, these considerations may not be necessary, and in many circumstances they may turn out to be micro-optimisations. In testing Expat4D, however, these are the things we have found to make a real difference to performance.

The demonstration database has not been written with particular optimisations in mind except in a couple of instances. This was done on purpose to make it easier for new users to understand what is going on in the code.

Calling Handler Methods

If this parser were written in 4D native code we would almost certainly have to use the EXECUTE command in order to call the handler routines. The EXECUTE command is far too slow for this type of application, but from a plug-in there are other, far more efficient options for calling 4D methods. Expat4D uses the PA_ExecuteMethodByID call when calling 4D methods, which is extremely fast in comparison. The method IDs are determined when you set the handlers, so that during parsing the plug-in doesn't have to look up the method ID again.

That said, calling a 4D method is much slower than calling a function in C, so if you do not need to implement all the handler routines, don't do so. For example, if you are just looking for the text within a specific element in a document, you need only the Element handlers, and the CDATA handler. The rest can be left off.
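The idea of registering only the handlers you need can be sketched outside 4D. The example below is an illustration only: it uses Python's standard xml.parsers.expat module (which wraps the same Expat library) rather than Expat4D's own commands, and the function name extract_text and the sample document are hypothetical. It sets just the two element handlers and the character data handler, and nothing else.

```python
import xml.parsers.expat


def extract_text(xml_bytes, wanted):
    """Return the text of every <wanted> element, using three handlers only."""
    results = []  # one finished string per matching element
    pieces = []   # text fragments of the element currently open
    depth = 0     # greater than zero while inside a <wanted> element

    def start(name, attrs):
        nonlocal depth
        if name == wanted:
            depth += 1

    def end(name):
        nonlocal depth
        if name == wanted:
            depth -= 1
            if depth == 0:
                results.append("".join(pieces))
                pieces.clear()

    def chars(data):
        # Character data may arrive in several pieces, so collect and join.
        if depth:
            pieces.append(data)

    parser = xml.parsers.expat.ParserCreate()
    # Only the element and character data handlers are set. Comments,
    # processing instructions and so on trigger no callbacks at all.
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
    return results


doc = b"<order><id>17</id><note>rush</note><id>18</id></order>"
print(extract_text(doc, "id"))  # prints ['17', '18']
```

Every handler left unset is one fewer crossing from the parser into the host language for each matching piece of the document.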

The "Default" Handler

The default handler is called for every piece of XML data in the document which is not handled by the handlers you have set up. Unless you really need to get everything from the document, you should avoid using the default handler as it can cause a large number of callbacks to your 4D handler method.
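To get a feel for how many extra callbacks a default handler can generate, the following sketch counts them. It again uses Python's standard xml.parsers.expat module as a stand-in for Expat4D, and the function name count_callbacks and the sample document are hypothetical. The exact counts depend on how Expat splits the input, so the example prints them rather than assuming particular values.

```python
import xml.parsers.expat


def count_callbacks(xml_bytes, use_default):
    """Parse xml_bytes and count how often each handler is called."""
    counts = {"element": 0, "default": 0}

    def start(name, attrs):
        counts["element"] += 1

    def default(data):
        # Receives everything no other handler claims: the XML declaration,
        # comments, end tags, character data, whitespace and so on.
        counts["default"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    if use_default:
        parser.DefaultHandler = default
    parser.Parse(xml_bytes, True)
    return counts


doc = b"""<?xml version="1.0"?>
<!-- a small sample document -->
<list>
  <item>one</item>
  <item>two</item>
</list>"""

print("without default handler:", count_callbacks(doc, False))
print("with default handler:   ", count_callbacks(doc, True))
```

In this sample the start-element handler fires three times either way; every additional callback comes from the default handler.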

OutputBuffer

When translating XML documents into other formats, there are different options for what to do with the translated version during parsing.

If your requirement is to simply output the data to a file, you could SET CHANNEL and SEND PACKET to simply send the translated document to the file as you are parsing the document. This has a big overhead, especially if you're only sending small packets to the file at a time.

A better way is to build the output in a text or BLOB variable, then output that at specific intervals or, if memory permits, build the entire output in the BLOB and write it to disk with the BLOB TO DOCUMENT command.
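The "collect, then output at intervals" approach can be sketched as follows. This is a Python illustration under stated assumptions, not 4D code: the class name BatchedWriter, the write_calls counter and the 1024-character threshold are all hypothetical, and an in-memory stream stands in for the output file.

```python
import io


class BatchedWriter:
    """Collect small text fragments and write them out in larger batches."""

    def __init__(self, stream, flush_threshold=32 * 1024):
        self._stream = stream
        self._threshold = flush_threshold
        self._pending = []       # fragments not yet written
        self._pending_size = 0   # total length of the pending fragments
        self.write_calls = 0     # how many times the stream was written to

    def append(self, text):
        self._pending.append(text)
        self._pending_size += len(text)
        if self._pending_size >= self._threshold:
            self.flush()

    def flush(self):
        # Write everything collected so far with a single call.
        if self._pending:
            self._stream.write("".join(self._pending))
            self.write_calls += 1
            self._pending.clear()
            self._pending_size = 0


out = io.StringIO()  # stands in for the output file
writer = BatchedWriter(out, flush_threshold=1024)
for i in range(1000):
    writer.append("<row>%d</row>\n" % i)
writer.flush()  # write whatever is left at the end of parsing
print(writer.write_calls, "writes for 1000 fragments")
```

The final flush matters: without it, the last partial batch would never reach the file.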

To get the best performance from the BLOB, pre-size it to roughly the size of the expected output rather than incrementing its size for each additional piece of text added. This saves many memory moves.

This can all be easily done in 4D, but to make it easier still there is an additional feature in Expat4D which allows you to append text to a BLOB managed by the plug-in, then receive the contents into a BLOB variable whenever you want, without duplicating the BLOB in memory. By default the BLOB is resized in increments of 32K, although this can be changed on a per parser basis, as can the initial size of the BLOB. See the Output Buffer Functions section of the command reference for more details on using this feature.
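The combination of an initial size and a fixed growth increment can be pictured with a small sketch. This is not the plug-in's implementation: the class name OutputBuffer, its parameters and the resizes counter are hypothetical and exist only to show why a generous initial size and increment mean fewer resizes. Unlike the plug-in feature described above, contents() here returns a copy.

```python
class OutputBuffer:
    """A byte buffer that is pre-sized and then grows in fixed increments."""

    def __init__(self, initial_size=32 * 1024, increment=32 * 1024):
        if increment <= 0:
            raise ValueError("increment must be positive")
        self._data = bytearray(initial_size)  # allocated once, zero-filled
        self._used = 0                        # number of bytes actually written
        self._increment = increment
        self.resizes = 0                      # how often the buffer had to grow

    def append(self, chunk):
        end = self._used + len(chunk)
        # Grow only when the chunk does not fit, one increment at a time.
        while end > len(self._data):
            self._data.extend(bytes(self._increment))
            self.resizes += 1
        self._data[self._used:end] = chunk
        self._used = end

    def contents(self):
        # Return a copy of the bytes written so far, without the unused tail.
        return bytes(self._data[:self._used])


buf = OutputBuffer(initial_size=64, increment=32)
for i in range(20):
    buf.append(b"<row/>")  # 6 bytes each, 120 bytes in total
print(len(buf.contents()), "bytes written,", buf.resizes, "resizes")
# prints: 120 bytes written, 2 resizes
```

With an increment of one chunk the same loop would need a resize for almost every append, which is the cost that pre-sizing avoids.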

Possible Future Optimisations

During the development of Expat4D, we have had various ideas for speeding up certain processing operations by adding features to the plug-in. We have listed these features in the order in which they are most likely to be added, but if you feel that a particular feature should be higher in the list, or is missing altogether, please let us know and we will see if it's possible to add it.

  • Ignore CDATA Whitespace Option - Currently, the parser will call the CDATA handler whenever character data is found, even if the data consists only of whitespace characters. This means you get a callback for something which you're actually not interested in, but you don't know this until the callback has been made and the whitespace CDATA passed to the handler. In some applications the whitespace, or at least some of it, may be significant, so you would want the callback. One of our ideas is to add an option so that, on a per parser basis, you can have Expat4D ignore whitespace at the start or end of CDATA sections. This means that less data is passed to the handler and the handler is called fewer times. In certain applications this could give a significant performance increase. There could also be a similar option for the Default handler.
  • Ability to switch Handlers on after parsing has started. Currently, you can change the handler routines used at any time during parsing, but if you want to set up a handler during parsing which wasn't set up when you first called xml_Parse or xml_ParseBLOB, it will not work. We may add this facility in the next major release.
  • Ensuring CDATA handler is passed complete block of CDATA in one go rather than splitting the block up, unless the data is greater than 32K.
  • Option to allow CDATA to be sent directly to the OutputBuffer without calling the CDATA handler in 4D. This could be useful in certain translation applications.
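The effect of the whitespace idea in the list above can be approximated with a thin filtering layer in front of the character data handler. The sketch below is an illustration only: it uses Python's standard xml.parsers.expat module, the names text_callbacks, user_handler and filtering_layer are hypothetical, and it simply drops whitespace-only data rather than trimming leading and trailing whitespace as the proposed option might.

```python
import xml.parsers.expat


def text_callbacks(xml_bytes, ignore_whitespace):
    """Return the character data that actually reaches the user's handler."""
    received = []

    def user_handler(data):
        # Stands in for the (comparatively expensive) handler method.
        received.append(data)

    def filtering_layer(data):
        # Stands in for the proposed parser-side option: whitespace-only
        # data is dropped before the user's handler is ever called.
        if ignore_whitespace and data.strip() == "":
            return
        user_handler(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = filtering_layer
    parser.Parse(xml_bytes, True)
    return received


doc = b"<list>\n  <item>one</item>\n  <item>two</item>\n</list>"
print(len(text_callbacks(doc, False)), "calls without the option")
print(len(text_callbacks(doc, True)), "calls with the option")
```

For an indented document like this one, most character data callbacks carry nothing but the line breaks and indentation between tags, which is where the saving would come from.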




Last Modified: 19th April 2001 at 11:11 PM