Sunday, July 12, 2009


OMeta is an object-oriented language for pattern matching based on Parsing Expression Grammars (PEG). It makes some valuable extensions to standard PEGs which I want to summarize here. My main source of information is the paper OMeta: an Object-Oriented Language for Pattern Matching by Alessandro Warth and Ian Piumatra.

OMeta is a domain specific language for pattern matching which runs on top of a host language. Currently there is an implementation for cola and for Squeak Smalltalk. The coupling to the host language is very tight as semantic actions and semantic predicates are written in it. Semantic actions are for example used to construct the abstract syntax tree and semantic predicates are additional constraints for a rule to match, such as a (parsed) integer is greater than some threshold.

A first valuable extension to PEG is the generalization of pattern matching. As opposed to PEG OMeta can not only work on a stream of characters but on a stream of arbitrary objects and structures. Object hereby means whatever the term means in the host language. So OMeta can not only be used for lexing and parsing but also for model to model transformations and code generation. But note, that the transformation is done by the semantic actions and is thus rather limited in its capabilities. For example side effects of semantic actions are not undone in case of backtracking.

A second extension which I appreciate are parametrized productions. For example the following rule matches a user defined interval of characters:

cRange x y ::= <char>:c ? (> c x ) ? (<= c y) => c;

cRange is the name of the production which has two parameters x and y. ::= is the production delimiter. <char>:c calls the production char which matches any character and binds the result to the variable c. After that there are two semantic predicates each introduced by a leading ?. The expressions after the ? are in cola syntax and check if the character is greater than x and smaller or equal than y. => starts the semantic action for the production which in this case is just the matched character. Finally the ; marks the end of the definition of the production.

Production parameters can themselves be productions, thus higher order productions are possible. The passed production can be brought into action by the special production apply. Consider the following example:

listOf p ::= <apply p> {',' <apply p>}*;

If expr is a production for expressions and name is a production for names, then the above production can be used to recognize both a list of expressions through <listOf 'expr> and a list of names through <listOf 'name>. Parametrized productions can also be used to implement high-order production like repeat which applies n times a production p.

repeat n p ::= ? [n = '0] 
           | <apply p> <repeat [n - 1] p>;

This example reveals a OMeta feature which - as far as I can see - is not explicitly mentioned in the paper. Besides semantic predicates it is possible to embed host language code directly into productions.

OMeta is object-oriented, which means that grammars are defined within the context of classes which can inherit from each other. This introduces some very handy features. For example a language can be extended by simply deriving a new grammar class and overwriting a single production. The original behavior can be brought into action by the special production super. The following example is a sketch for a new foreveryother feature in Java introduced by a new grammar called Java++.

meta Java++ <: Java {
  stmt ::= <space>* 'foreveryother' <space>* '(' <expr>:x <space>* ')' <stmt>:s => ...
         | <super stmt>;

If you consider a production as a parse function then OMeta functions are all virtual. Thus many techniques from the object-oriented world can be applied. For instance language extensions are applications of the decorator pattern.

Modularization and reusability is even more improved by the foreign production invocation mechanism. The special production foreign lets you call an arbitrary production from an arbitrary parser. So other object-oriented patterns like the delegate pattern are possible with OMeta.

Multiple inheritance is not supported. I think the main reason for this is to keep things simple, especially for the application of the super production. I am currently not sure how important it would be to have this feature. If you need the semantics of multiple inheritance you can workaround with a delegate.

OMeta grammars can have any number of instance variables. This enables stateful pattern matching which is needed for Python's offside rule for example. It also makes it possible to implement a complete interpreter in OMeta.

A programming language which is parsed by an OMeta implementation can support lexically-scoped syntax extensions. The paper calls this an instance of a mood-specific language whose scope is to make part of a program easier to write and to read. This works by embedding OMeta productions into the source code which extend the syntax within the scope in which they are defined.

OMeta is used in its own implementation, for instance to optimize the abstract syntax tree of a OMeta program. This is good evidence for the maturity of the language. A second strong evidence is the fact that there is a nearly complete Javascript implementation in OMeta.

An interesting information concerning PEGs is that the authors claim that their "experiments with PEGs indicate that the overhead of memoization may outweigh its benefits for the common case, where backtracking is limited". Unfortunately they don't say if this was with stateful parsing or not, because states increase the overhead significantly. Maybe it is possible to derive from a given grammar if memoization pays out or not, because backtracking leads to exponential complexity only if it causes work to be redone. And if this happens or not depends on the grammar and its productions.

I find OMeta quite interesting. If I used it, however, I would constrain semantic actions to create only the syntax tree. Anything else like interpretation of transformation should be done by other tools which are made for that.

Update: I just saw, that OMeta is also covered by Wrath's dissertation Experimenting with Programming Languages. Besides what is mentioned in the paper chapter 2.4 gives a definition of the operational semantics of OMeta.

No comments:

Post a Comment