Stumbling into generative programming: A document processing framework

Software companies frequently end up using generative techniques almost by accident, driven by the need to create different versions for different customers, or by the sheer size of their - perhaps international - applications, which become harder to develop with every release.

A company has created an application for forms processing - both electronic and paper-based. Its customers are banks, insurance companies, and the like.

The application needs to become configurable. A configuration file is created which allows some processing steps to be parameterized per customer.

The configuration file grows and soon contains many different things.

The software is modularized and processing functions are now tied to configuration items. The validity of the forms-processing workflow now depends on the validity of the configuration file.

The configuration file is parsed at application startup. Errors in the file are extremely hard to find: parsing is hard-coded and no explicit grammar exists.
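To see what is missing here, consider a minimal sketch of a parser that at least reports errors with line numbers. The format (`key = value` lines with `#` comments) and all names are invented for illustration; the product's actual file format is not described in this text.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one "key = value" entry from a configuration file.
struct ConfigEntry {
    std::string key;
    std::string value;
};

// Strip leading/trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse "key = value" lines and collect errors with line numbers --
// precisely the diagnostics a hard-coded parser with no grammar
// typically fails to give.
std::vector<ConfigEntry> parseConfig(std::istream& in,
                                     std::vector<std::string>& errors) {
    std::vector<ConfigEntry> entries;
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;        // blank or comment
        std::size_t eq = t.find('=');
        if (eq == std::string::npos || trim(t.substr(0, eq)).empty()) {
            errors.push_back("line " + std::to_string(lineNo) +
                             ": expected 'key = value'");
            continue;
        }
        entries.push_back({trim(t.substr(0, eq)), trim(t.substr(eq + 1))});
    }
    return entries;
}
```

Even this toy version makes the point: once the accepted syntax is written down explicitly, error positions fall out for free.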

The system grows and needs to support different scanning hardware. This is done by changing and adapting the configuration file: a customer with the same workflow but a different scanner gets a different configuration file, adapted using copy-and-paste techniques.

The company realizes that it cannot perform in-field updates to the software because all configuration files are different, containing both company defaults and individual customizations made by customers.

The product sells well and creates a big problem precisely because it sells well: every new installation at a customer site needs service and support, which is increasingly difficult to provide. Just thinking about new releases gives management nightmares. The service team grows...

This is the moment when the CTO starts planning a re-engineered product. The goals are not only a more serviceable and maintainable software product, but also support for new business areas, e.g. document processing.

Several types of analysis are performed: the new domain is analysed with respect to commonalities and variations, core features, etc. This results in a new conceptual business model, expressed in a switch in terminology from "form" to "document". From this, hot spots and cold spots are derived and turned into requirements for the new software.

A software-internal analysis uncovers weak spots (e.g. no grammar for the config file). The central configuration file is cut into pieces reflecting different aspects of the software: workflow (clearing functions), document structures, and hardware. SGML is used to describe the structure of the configuration file, but SGML DTDs turn out to be too weak to express the semantics.
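A hypothetical DTD fragment illustrates the limitation. Everything below is invented for illustration, not taken from the product, but it shows the kind of structure a DTD can enforce and the kind of rule it cannot:

```sgml
<!-- A DTD can say that a workflow consists of steps, each with
     a required name and optional parameters... -->
<!ELEMENT workflow  (step+)>
<!ELEMENT step      (param*)>
<!ATTLIST step      id      ID     #REQUIRED
                    name    CDATA  #REQUIRED
                    next    IDREF  #IMPLIED>
<!ELEMENT param     (#PCDATA)>
<!ATTLIST param     name    CDATA  #REQUIRED>
<!-- ...but it cannot say that a "scan" step must precede every
     "recognize" step, or that a scanner parameter must name a
     supported device: the semantic rules the developers found
     DTDs too weak to express. -->
```

Structure, yes; semantics, no - which is exactly the gap the team ran into.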

OO-based modularization of the software results in a CORBA-based framework which dynamically loads all necessary classes at startup. The software is now structured into core, branch, and customer-specific areas, which are reflected both in the source code control system and in the configuration files. The configuration system slowly turns into a repository for implementation classes and workflow commands.
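The pattern behind "implementation classes named in the configuration" can be sketched without CORBA: a registry maps class names, as read from the configuration, to factories. All names here are invented; the real system resolved classes through its CORBA infrastructure.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class for a configurable processing step.
struct ProcessingStep {
    virtual ~ProcessingStep() = default;
    virtual std::string name() const = 0;
};

// Registry mapping class names (from the configuration repository)
// to factory functions.
class StepRegistry {
public:
    using Factory = std::function<std::unique_ptr<ProcessingStep>()>;

    void add(const std::string& className, Factory f) {
        factories_[className] = std::move(f);
    }

    // Returns nullptr for unknown names, so a bad configuration entry
    // is detected at startup instead of crashing mid-workflow.
    std::unique_ptr<ProcessingStep> create(const std::string& className) const {
        auto it = factories_.find(className);
        return it == factories_.end() ? nullptr : it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example implementation class a customer configuration might select.
struct ScanStep : ProcessingStep {
    std::string name() const override { return "scan"; }
};
```

The interesting property is the one the text describes: the validity of the workflow now depends on the configuration, because the class names live there rather than in the code.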

Framework classes are full of system-internal coding and naming conventions. A tool is created to generate class templates for the different software domains.
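Such a template generator can be very small. The sketch below emits a class skeleton that already follows a (here invented) naming convention - domain prefix on the class name, matching base class - so that developers cannot get the conventions wrong by hand:

```cpp
#include <cassert>
#include <string>

// Hypothetical generator: emit a C++ class skeleton following the
// framework's conventions. Prefixes and base-class names are invented
// for illustration, not taken from the product.
std::string generateClassSkeleton(const std::string& domain,
                                  const std::string& name) {
    std::string cls = domain + name;   // e.g. "WfClearingStep"
    return
        "class " + cls + " : public " + domain + "Base {\n"
        "public:\n"
        "    " + cls + "();\n"
        "    virtual ~" + cls + "();\n"
        "    // TODO: domain-specific operations\n"
        "};\n";
}
```

Generating the skeleton is a first, modest form of generative programming: the conventions move out of developers' heads and into a tool.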

The software is ported from OS/2 to Unix and NT. Many adaptations are made by using the C++ preprocessor to generate the platform-dependent code.
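The preprocessor approach looks like this in miniature: one source tree, with platform differences selected at compile time. The example below is a generic sketch (the predefined macro `_WIN32` is standard on NT compilers; the port's actual macro set is not described in this text):

```cpp
#include <cassert>
#include <string>

// One source file, three platforms: the preprocessor picks the
// platform-dependent code at compile time. (OS/2, like NT, used
// backslashes; Unix uses forward slashes.)
#if defined(_WIN32)
#  define PATH_SEPARATOR '\\'
#else
#  define PATH_SEPARATOR '/'
#endif

std::string joinPath(const std::string& dir, const std::string& file) {
    return dir + PATH_SEPARATOR + file;
}
```

This is generative programming at its crudest: textual code selection with no model behind it, which is exactly why the team later moved toward explicit meta-data.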

More and more meta-data is extracted from the software and put into the meta-data layer. Slowly the developers understand that extracting those data is also a step toward description and abstraction: there is no need for the syntax of the meta-data to resemble C++.

At this point the following ideas and problems come up:

  1. The software is written in C++, which has almost no runtime type information. After the experience with some simple forms of code generation, the development team thinks about generating an extension to the C++ system to provide runtime type information.

  2. The developers realize that they have a complete model of the documents but do not use it to generate the database structures. Instead, the whole information is replicated and the tables are created manually. This results in possible mismatches between the database and the data layer.

  3. While they are at it, the developers realize that they could also generate a lot of the end-user GUI layout from their workflow and document model. And on top of that: a document editor could be turned into a document structure editor as well.

  4. It is unclear whether they should put more effort into making the runtime system more dynamic or whether they should generate more customer specific code from the beginning.

  5. The software becomes more and more complex. A logging sub-framework is implemented and tied to the class generator. Still, developers need to insert the calls to the logging system manually.

  6. The developers realize that their software is composed of many different aspects which are somehow mingled in their code. Wouldn't it be nice to generate logging calls completely automatically? Call parameters and returns are known to the system, so why can't it do that? This again ends in complaints about the lack of meta-programming features in C++.

  7. The model and meta-data cause headaches as well: a lot of things are not expressed in meta-models, and source code needs to understand many details of the model. Should model and meta-model information also be available at runtime, or just be used at generation time? Should the configuration turn into a domain-specific language?
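Item 1 above - generating runtime type information - can be sketched with a macro standing in for the generator. Everything here is invented for illustration; the point is only that a generator can inject per-class type methods that the C++ of that era did not provide in usable form:

```cpp
#include <cassert>
#include <string>

// Hypothetical: the class generator injects type-information methods
// into every framework class. A macro simulates the generated code.
#define GENERATED_TYPE_INFO(ClassName)                                 \
    static const char* staticTypeName() { return #ClassName; }         \
    virtual const char* typeName() const { return #ClassName; }

struct Document {
    GENERATED_TYPE_INFO(Document)
    virtual ~Document() = default;
};

struct InsuranceForm : Document {
    GENERATED_TYPE_INFO(InsuranceForm)   // overrides typeName()
};
```

Because `typeName()` is virtual, a `Document*` reports its dynamic type - the minimal runtime type information the team wanted to generate.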
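Item 2 above - deriving the database schema from the document model instead of maintaining tables by hand - can also be sketched briefly. The model structure and the type mapping below are invented; the real model is not shown in this text:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical document model: named fields with model-level types.
struct Field   { std::string name; std::string type; };
struct DocType { std::string name; std::vector<Field> fields; };

// Map model-level types to SQL types (an assumed, illustrative mapping).
std::string toSqlType(const std::string& modelType) {
    if (modelType == "string") return "VARCHAR(255)";
    if (modelType == "int")    return "INTEGER";
    if (modelType == "date")   return "DATE";
    return "TEXT";   // fallback for unknown model types
}

// Generate the CREATE TABLE statement from the model, so database and
// data layer can no longer drift apart.
std::string generateCreateTable(const DocType& doc) {
    std::string sql = "CREATE TABLE " + doc.name + " (";
    for (std::size_t i = 0; i < doc.fields.size(); ++i) {
        if (i) sql += ", ";
        sql += doc.fields[i].name + " " + toSqlType(doc.fields[i].type);
    }
    return sql + ");";
}
```

Once the model is the single source, the "possible mismatches between the database and the data layer" disappear by construction.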
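Item 6 above points at aspect weaving: if parameters and returns are known to the system, entry/exit logging can be added mechanically. A wrapper template can stand in for such generated code; the names and the vector-based log sink are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log sink; generated logging calls would write here.
std::vector<std::string> logLines;

// Stand-in for generated code: wrap a one-argument function and log
// its parameter and return value automatically.
template <typename F, typename Arg>
auto logged(const std::string& name, F f, Arg arg) -> decltype(f(arg)) {
    logLines.push_back("enter " + name + "(" + std::to_string(arg) + ")");
    auto result = f(arg);
    logLines.push_back("exit " + name + " -> " + std::to_string(result));
    return result;
}

// An ordinary function with no logging code of its own.
int doubleIt(int x) { return 2 * x; }
```

The business function stays free of logging calls - which is exactly the separation of aspects the developers were wishing for, and what they found C++ too weak in meta-programming to provide directly.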

This course will try to make the options and problems clear. At the end you should know when to use which kind of generative technology.