Stratego/XT is the first meta programming system I had a closer look at. The project defines itself as "a language and tool set for program transformation". By program transformation they mean "programming tasks using some form of automatic program generation or transformation, such as code generation from a domain-specific language, aspect weaving, optimization, or specialization of a generic program to a particular context." So Stratego/XT follows a more generalized approach then what
MDE needs.
Here is an extraction of major ideas, features and insights I got from the project homepage, the
manual, the article on
Stratego/XT 0.17 in the
elsevier journal
Science Of Computer Programming and the technical report
Stratego/XT 0.17. A Language and Toolset for Program Transformation
ATerm
The Stratego tools work on document trees which are in the Annotated Term (
ATerm) format. ATerms are nested constructor applications with integers, strings, lists and tuples as parameters. Additionally an ATerm can be annotated to provide additional semantic information. ATerms implement
maximal sharing which means that common sub trees are stored once and referred to multiple times. There are text based and binary ATerm representations, tools for converting, e.g from and to XML, and libraries to work with ATerms.
SDF
The syntax definition formalism (
SDF) is a language for defining syntax and is implemented by scannerless generalized-LR parsing (
SGLR).
Semantic actions in SDF are limited to the specification of one ATerm constructor per production. There is no support for performing any action on the parse tree while parsing. This is correct for meta-programming systems because the same parser must be able to feed input into different transformers, which is where the semantics are defined.
As far as I can see, SDF has the following major advantages:
- modules: SDF supports modules, so different aspects of the grammar can be specified separately and such parts can be reused by other languages.
- integration of lexical and context-free syntax: as opposed to lex and yacc SDF does no distinguish much between lexical and context-free syntax. The only difference is that between the symbols of a context-free production layout is allowed. Hereby layout is meant to be white spaces and comments and can be arbitrarily defined.
- disambiguation: SGLR grammars can be ambiguous. Therefore SDF provides the following means to narrow the set of valid parse trees down.
- specification of associativity of productions
- specification of priorities between productions
- lexical restrictions, which basically are negative look-aheads on terminals
- reject productions: single productions can be rejected explicitly. For example this is used to avoid that keywords are parsed as identifiers and to specify floating point literals in a "all but" fashion.
For me major disadvantages are:
lexical syntax
"/*" CommentPart* "*/" -> BlockComment
~[\/\*] -> CommentPart
Asterisk -> CommentPart
Slash -> CommentPart
BlockComment -> CommentPart
[\/] -> Slash
[\*] -> Asterisk
lexical restrictions
Asterisk -/- [\/]
Slash -/- [\*]
Parse-unit
Parse-unit is a unit testing tool for SDF. A single Parse-unit test specifies an input string and the expected resulting ATerm. A test suite is a collection of such tests and can executed from the parse-unit command line tool. The output is along the lines of all the *Unit tools out there.
The Stratego language
Stratego is the transformation language of the Stratego/XT system. A Stratego program operates on an ATerm and applies user defined rewrite rules according to a user defined strategy to it.
A rewrite rule consists of a pattern and a substitution term with common variables. If the pattern matches the so called subject term the variables are bound to the matching values. The substitution term is then expanded and used as a replacement for the matched subject term. If the pattern does not match then the rewrite rule fails.
Stratego supports the
embedding of the object language which makes writing rewrite rules quite comfortable. Instead of specifying pattern and substitution as ATerm trees you can state the corresponding code from the object language. In order to make this work you have to
define the meta-variables that you use in the patterns.
Single rewrite rules can be combined by strategies into more complex transformations. The
sequential composition operator
s1 ; s2
first applies strategy
s1
and if in case of success then
s2
. If either
s1
or
s2
fails the sequence fails as well. The
deterministic choice operator
s1 <+ s2
first tries to apply
s1
and if that succeeds the choice succeeds. However, if the application of
s1
fails,
s2
is applied. The
guarded choice operator
s1 < s2 + s3
first applies
s1
. If that succeeds then
s2
, else
s3
is applied. If
s2
fails the complete expression fails and no backtracking to
s3
takes place. The strategy
id
always succeeds and the strategy
fail
always fails, both without altering the subject term.
The two basic building blocks of the Stratego language are
building terms and
matching terms. The building term
!p
replaces the subject term with the instantiation of the pattern
p
using the bindings from the environment to the variables occurring in
p
. Matching terms come in three different flavors. First, the
literal term ?t
matches the subject term against the term
t
. Second, the
term pattern ?p
compares the subject term to the pattern
p
and adds bindings for the variables in pattern
p
to the environment. Third, the
match strategy ?x
binds the variable
x
to the subject term in the environment. The
(explicit) scope {x1, x2 : s}
provides unique instances of the variables
x1
and
x2
during the execution of the strategy
s
.
On top of these building blocks a number of syntactic sugar is provided.
- anonymous rewrite rule:
(p1 -> p2)
is sugar for ?p1 ; !p2
- where:
where(s)
is sugar for {x : ?x; s; !x}
s is applied without changing the subject term. Other side effects like changing variable bindings are still possible, though.
p1 -> p2 where s
is sugar for ?p1; where(s); !p2
providing additional conditions for the rule to apply.
- lambda rules:
\ p1 -> p2 where s \
is sugar for {x1, x2 : (p1 -> p2 where s)}
with x1
and x2
being variables of p1
. This allows you to provide rules on the fly.
- applying a strategy:
<s> p
is sugar for !p; s
s => p
is sugar for s; ?p
<s> p1 => p2
is sugar for !p1; s; ?p
- if-then-else:
if s1 then s2 else s3 end
is sugar for where(s1) < s2 + s3
if s1 then s2 end
is sugar for where(s1) < s2 + id
- switch: C-style switch which is desugerated rather complex. Have a look at the manual
Dynamic rules are templates for rules which are instantiated by other rules with information obtained from parsing. This adds the additional power of context sensitive rules which is necessary because sometimes you know how a rule must look like only if you have already read parts of the document.
There is a
dynamic rule intersection operator
s1 /R\ s2
"which applies the strategies s1 and s2 sequentially to the current term, each with a copy of the dynamic rule set for R. After applying both strategies, the resulting rulesets for R are intersected." To be honest I didn't really understand what this means. A much clearer feature are
overlays which are parametrized abbreviations for ATerms and can help you to save a lot of typing.
Stratego provides a set of
traversal strategies like
topdown
,
bottomup
,
innermost
and
all
with which you can control how the ATerm tree is traversed in order to apply the rewrite strategies. Additionally Stratego provides a
standard library with arithmetic operations, lists, strings, hash tables, sets, I/O functions and support for command line options - all in form of strategies. This is weird. Finally it is possible to make native C calls through the application of strategies.
Format checking
In order to detect bugs in the transformers it is a good idea to do format checking on the output ATerm. Therefore XT provides tools for deriving a so called
regular tree grammar from a syntax definition. A format checker then checks a given ATerm against this grammar.
Pretty printing
After transformation the ATerm must be turned into text. Stratego/XT calls this step pretty printing which is done by the Generic Pretty Printer (
GPP) package. Pretty printing is done in two steps. First an abstract syntax tree in
AsFix format is converted to a declarative layout in the
Box language. This step is configured by a pretty-printing table which is derived from the grammar specification. The second step is to format the layout description into text, HTML, LaTeX or other display formats. The official
GPP homepage has a
graphical overview of the GPP process.
Pretty-printing is not trivial. For example the printer must decide which parentheses are important for the semantics and which ones are optional and should be omitted to make the result readable for humans. A second problem is the preservation of whitespaces and comments as abstract syntax trees normally do not include them. Furthermore it is an unsolved problem how to automatically generate text which is considered to be nice by humans. That's why GPP supports additional pretty-printing tables which overwrite the generated behavior so that manual configuration of the printing result is possible.
Transformation Tool Composition (XTC)
XTC implements the XT component model and provides support for creating compositions of XT components. It is also used to register components in an XTC repository which is used to find the installation location of a tool without needing to know all the installation paths.
XTC is not a build system, though. Instead make, autoconf and automake is used, supported by a tool called
Auto XT.
Stratego Shell
The Stratego project provides an
interactive shell which supports the complete Stratego language. This is very handy for developing because you can experiment with rules and strategies on arbitrary ATerms. Furthermore the shell is claimed to be usable as debugger, but I don't know how that works. Finally there is an
XTC integration so that you can import and invoke XT components from the shell.
Stratego/XT
The project comes along with a very extensive and good
documentation. The manual though is in unstable state and partially incomplete. It might happen that some of the links to the manual will break soon because of that. We'll see.
Stratego/XT can be considered mature because is is used in its own implementation. For example the Stratego compiler which is a Stratego to C transformer is implemented in Stratego. Implementations of many other Stratego/XT components make heavy use of SDF and Stratego as well.
Stratego/XT uses the
pipes and filters pattern, which is a good idea in my opinion. Among other things this means that the components are exchangeable. For example you can feed a stratego program with an ATerm obtained from a
PEG parser, for example.
So using SDF is optional, but nevertheless highly recommended. Stratego/XT is grammar centric. Many of the components of a concrete pipes and filters setup are either themselfs generated or use configuration which is generated from a corresponding SDF grammar. If you replace SDF you have to provide new generators for that purpose, too.
A disadvantage of the pipes and filters pattern is the inefficiency of constantly dumping, copying and reading the data at the pipes. To overcome this the currently preferred approach of Stratego/XT is that every component forms a shared library which can be linked into a single application.
Conclusion
Investigating Stratego has been very illuminating for me. You can see of which components a meta programming system consists, which problems must be solved and how they can be solved.
But in my opinion the overall Stratego/XT system in general and most of its components in particular are too complicated. For example have a look at the
tool reference which lists 43 different tools which operate on some of the various different formats and languages. The Stratego languages itself is also very complex and hard to read. Maybe this is because I am new to it, but I constantly have to run a Stratego interpreter in my brain in order to understand, what that program does. And often enough I don't get it.
I don't say that Stratego/XT could be done leaner. But I think that the crucial argument for MPE is that specifying a new DSL and implementing proper transformers must be easy. The whole advantage of the approach is gone if introducing DSLs is as cumbersome as writing the to-be-generated code manually. I think the approach of Stratego/XT for program transformation is too generic to be a good tool for MSE.
The Stratego project itself has a different opinion. On this year's
code generation conference there has been a hands-on tutorial entitled
Creating Domain-Specific Languages with Stratego/XT. So maybe I am wrong. I will see if other meta-programming systems are easier.
No comments:
Post a Comment