3.2.1 WSS -- WAFER-STATE-SERVER ASCII File Format

The WSS file format was designed to be human readable and was therefore implemented as a simple ASCII file format. It allows an easy generation of simple Wafer structures with a text editor. Both Reader and Writer interfaces are implemented. Thus this file format can be used as a native file format. WSS was implemented using the object-oriented parser generator ANTLR [32].

Let us briefly discuss the decision making process which finally led to the use of ANTLR in favor of other parser generators. A parsing process is a means by which data stored in a file in a certain language (defined by the grammar of that language) is copied into a data structure in computer memory. Generally spoken any kind of parsing process can be done in exactly two different ways as to who keeps control over the data acquisition. One possible approach is to let the parser build the target data structures by the use of so-called action code or callback functions. This in turn implies a detailed view of the target data structures from within the parser program. The parser cannot be fully decoupled from the application. In programming language terms this means that the parser needs to include the files containing the data structure declarations of the target data structures. Obviously, such a parser can only be used to build one certain data structure. This shortcoming can not be overcome.

A different approach is to let the application program (here: the WSS-READER) choose which data it would like to acquire and also when the data will be collected. Thereby the parser can be used by several applications (e.g. to implement a Reader and a visualization program) simply because it has no "knowledge" of the underlying data structure it is used to build. Of course, compared to the callback method, the application is now responsible to drive the parsing process, that is, it needs to know the exact protocol how to retrieve the data from the parser.

It turned out that the widely used parser generator program pair LEX and YACC [33] is not at all suited for the second approach. A parser based on LEX and YACC has only one entry point into the program code. Thus, it must use the action code to either directly build the target data structures or to build an intermediate data structure. This intermediate structure must then -- in a second step -- be queried by the application. This results at best in a performance penalty (copying the data between memory locations).

On the other hand, ANTLR presents a clean object-oriented approach and directly supports the implementation of a WAFER-STATE-SERVER Reader interface by realizing each parser rule as a separate method of a parser class. For some Reader interface methods (e.g. nextPoint) the parser methods can directly be used without any extra code. Also, the specification of the grammar is considerably shorter compared to LEX and YACC grammars. An implemented LEX scanner that builds intermediate data structures and is capable of parsing only one attribute per program invocation, has approximately the same number of lines of code as the ANTLR version of the full parser. This is attributable to the object-oriented concept, the support of semantical and lexical predicates, and an arbitrary number of look-ahead tokens.

A lexical predicate allows for choosing a rule depending on an arbitrary number of look-ahead tokens on the input stream. In the WSS-READER lexical predicates are used to distinguish points of different dimensions. This means that the lexical scanner is able to recognize the dimension of a point, and return an appropriate token. This allows for useful error reports and recovery strategies. In case of a point of a wrong dimension encountered, a warning message is printed and the point is ignored3.6. Fig. 3.7 depicts the scanning of a point by means of a syntactical predicate.

Figure 3.7: Scanning of points with different dimension. In the ANTLR language a syntactical predicate is introduced by the sequence =>. In this example points of different dimensions are scanned. The main rule is called POINTND. POINT1D, POINT2D and POINT3D are sub-rules. The main rule decides which sub-rule is to use and returns the corresponding token. The actual decision about the dimension is based on the occurrence of the carriage return (CR) character. The parser checks whether the dimension of the scanned point is correct, i.e. corresponds to the dimension of the file. If the parser encounters a point of wrong dimension a warning message is issued and the point is skipped.
\begin{figure}\hrulefill\par\
\par\vspace*{-0.7cm}
\scriptsize\begin{verbatim}...
...= POINT3D; }\end{verbatim}\par\
\par\vspace*{-0.7cm}
\hrulefill
\par\end{figure}

A semantic predicate allows to choose a rule based on the run time evaluation of an expression. In the WSS-READER semantical predicates are used to switch between segment grid and boundary grid scanning. As the two grids differ only in the dimension of the grid elements it is desirable to re-use the rules that are used to scan grid elements of segments. To achieve this a variable that holds the dimension of the next element to scan is used. Fig. 3.8 illustrates the usage of a semantical predicate.

Figure 3.8: Scanning of dimension dependent grid elements using a semantical predicate. The variable scanDim is used to switch between scanning of grid elements of different dimension. The semantical predicate is indicated by the syntax { predicate }?. The object abst is needed to hide the syntax differences between JAVA and C++ versions of the parser. The differences mainly arise from the C++ dereference operator (->). Since JAVA lacks the notion of pointers the differentiation syntax must be avoided in the C++ version. This is achieved by using a global object (abst). The shown rule will directly be translated into a method named grid_el that returns an instance of a WafGridEl object to the caller. Depending on the dimension of the grid element, $ 2$ to $ 4$ references of the returned object are valid. The references are returned in the variables r11 to r34 by the according UINT tokens. The depicted code snippet is used in the WSS Reader to scan boundary and attribute grid elements with the same rules. Fig. 3.9 depicts the definition of the WafGridEl struct.
\begin{figure}\hrulefill\par\
\par\vspace*{-0.7cm}
\scriptsize\begin{verbatim}...
... grel.nr = 4;
}\end{verbatim}\par\
\par\vspace*{-0.7cm}
\hrulefill
\end{figure}

Figure 3.9: Definition of WafGridEl struct. An object of type WafGridEl contains an array of four unsigned integer variables and a counter (nr) to indicate how many of these variables contain valid point references. This mechanism allows keeping the rule for the parsing process (Fig. 3.8) dimension independent.
\begin{figure}\hrulefill
{\footnotesize\begin{verbatim}struct WafGridEl
{
u...
...gned int refs[4];
};\end{verbatim}}
\vspace*{-0.3cm}\hrulefill
\par\end{figure}

Furthermore, ANTLR supports three different target programming languages (at the time of the writing of this work these are C++, JAVA [34] and SATHER, two of which (C++ and JAVA) are used in the WAFER-STATE-SERVER framework. A full description of the WSS grammar is given in Appendix B.

2003-03-27