Thursday, August 20, 2009

Language Oriented Programming: The Next Programming Paradigm

As mentioned previously, the term language-oriented programming (LOP) today mainly refers to the paradigm underlying JetBrains' Meta Programming System. This is due to a publication by Sergey Dmitriev entitled Language Oriented Programming: The Next Programming Paradigm. In it he proclaims the next technology revolution, one which leads us from the Stone Age to "a new age of invention and an explosion of new technologies". The paper is written in a very stirring tone, and while that makes me smile, I agree with him on many points. Here is what I consider his main statements.

First of all, he says that the next programming paradigm still goes by different names, such as intentional programming, MDA, and generative programming, and he suggests uniting all these approaches under the term LOP. The idea of LOP is to relieve the programmer from being forced to think like the computer.

In theory, programmers have the freedom to do what they want. In practice, however, they are bound by the limitations of the languages and environments they use. Although everything could in theory be altered and adapted, doing so often takes so much time that it is hardly considered an option. So, in order to gain freedom, the dependency on those languages and environments must be reduced, which means it must be easy to "create, reuse, and modify languages and environments".

Given some problem, a developer first forms a conceptual model in his head of how to solve it. Next he is forced to perform a complex mapping from what he means to what is technically required to make it happen. With LOP, the developer instead invents a new domain-specific language (DSL) or takes an existing one if appropriate. The goal is to have a DSL which is as close to the mental model as possible by having the concepts and notions of the problem domain at hand. This makes the mapping from the in-brain solution to an instance of the DSL straightforward. Turning this instance into executable code is finally done by a suitable compiler.

Dmitriev names three major problems with today's mainstream programming. First, there is the "time delay to implement ideas", which addresses the frustrating effort it takes to communicate a solution to the computer using low-level abstractions. Next, there is the problem of "understanding and maintaining code", which points out that not only the writer of a program has to perform a complex mapping, but so does the reader, just the other way round. In contrast, LOP tries to make code self-documenting. Last, there is the problem of the "domain learning curve", which goes along the lines of the previous problem. Common libraries provide an interface which uses low-level abstractions such as classes and methods in order to provide high-level concepts such as GUI components. So it takes a huge effort for the developer to learn exactly how to use the low-level abstractions in order to get things done. Swing is a famous example of a class library that is a wannabe DSL.

For Dmitriev, a program is not necessarily a set of instructions. Instead, "a program is any unambiguous solution to a problem". Turning this solution into a set of instructions for Turing machines is a cumbersome task which should be automated. So, in order to enable programmers to write programs, it must be easy to specify and implement proper languages. Dmitriev goes even further: for him, programs don't necessarily have to be in text form. As compilers work on abstract syntax trees, any input format which translates into such a tree is fine. A text representation is even a bad choice, because every language extension introduces ambiguities which have to be circumvented with new keywords or new bracket types. So Dmitriev suggests DSL-specific editors which know about the tree structure of the document and thus provide a convenient way to edit a program. For example, there can be text, table, diagram, or tree based editors, even co-existing for a single DSL to support different purposes such as viewing and editing. "The best representation depends on how we think about the problem domain".
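To make the "one tree, several representations" idea concrete, here is a minimal sketch in Python. It is not MPS code: the `Node` class, the two render functions, and the little rule language are all hypothetical names invented for this illustration. The point is only that the text view and the table view are both derived from the same tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the program tree (hypothetical, not an MPS class)."""
    concept: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def render_text(rule_set: Node) -> str:
    """Project the tree as text, one rule per line."""
    return "\n".join(
        f"when {r.properties['condition']} then {r.properties['action']}"
        for r in rule_set.children
    )


def render_table(rule_set: Node) -> list:
    """Project the very same tree as table rows, header row first."""
    rows = [("condition", "action")]
    rows += [
        (r.properties["condition"], r.properties["action"])
        for r in rule_set.children
    ]
    return rows


# The program itself is the tree; neither view is "the source".
rules = Node("RuleSet", children=[
    Node("Rule", {"condition": "stock < 10", "action": "reorder"}),
    Node("Rule", {"condition": "stock == 0", "action": "alert"}),
])
```

Calling `render_text(rules)` and `render_table(rules)` gives two views of one program. Since nothing is ever parsed back from text, adding a new concept to the tree cannot create a syntactic ambiguity.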

"In LOP, a language is defined by three main things: Structure, editor, and semantics. Its structure defines its abstract syntax, what concepts are supported and how they can be arranged. Its editor defines its concrete syntax, how it should be rendered and edited. Its semantics define its behavior, how it should be interpreted and/or how it should be transformed into executable code". The Meta Programming System (MPS) is a LOP implementation which supports you in specifying these three aspects of new languages. The idea of MPS is to apply LOP to itself, which means providing proper DSLs to implement DSLs. MPS therefore comes with three different DSLs, one for each of the three aspects: the Structure Language, the Editor Language, and the Transformation Language. Furthermore, MPS separates the required work into two levels, the meta level and the program level. The meta level is for defining languages and the program level is for writing programs.

Each node in the program level has a type which is a link to a corresponding node in the meta level. "The meta level type node defines what relationships its instances can have and also what properties they will have". Enumerating all meta level nodes and their properties is the purpose of the Structure Language. Relationships between program level nodes may be aggregations or simple links with named roles, cardinalities, and target types on both ends. A language can be interlinked with other languages by means of concept inheritance and nesting. An instantiation of a language is done by means of an editor.
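The relation between the two levels can be sketched roughly as follows. This is an illustration in Python under my own assumptions, not the Structure Language itself: `Concept`, `Link`, `Node`, and `check` are hypothetical names, and concept inheritance is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Link:
    """Meta level: a named role pointing to a target concept."""
    role: str
    target: str                    # name of the target concept
    min_count: int = 0
    max_count: Optional[int] = 1   # None means unbounded


@dataclass
class Concept:
    """Meta level: declares the properties and links of its instances."""
    name: str
    properties: tuple = ()
    links: tuple = ()


@dataclass
class Node:
    """Program level: every node is typed by a link to its concept."""
    concept: Concept
    properties: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)   # role -> list of Node


def check(node: Node) -> list:
    """Return the structural errors of a node and its subtree."""
    errors = []
    for name in node.properties:
        if name not in node.concept.properties:
            errors.append(f"{node.concept.name}: unknown property '{name}'")
    for link in node.concept.links:
        targets = node.children.get(link.role, [])
        too_many = link.max_count is not None and len(targets) > link.max_count
        if len(targets) < link.min_count or too_many:
            errors.append(f"{node.concept.name}: bad cardinality for role '{link.role}'")
        for child in targets:
            if child.concept.name != link.target:
                errors.append(f"{node.concept.name}: role '{link.role}' expects {link.target}")
            errors.extend(check(child))
    return errors


# Meta level: a tiny state machine language.
STATE = Concept("State", properties=("name",))
MACHINE = Concept("StateMachine", properties=("name",),
                  links=(Link("states", "State", min_count=1, max_count=None),))

# Program level: one valid and one invalid instance.
ok = Node(MACHINE, {"name": "door"}, {"states": [Node(STATE, {"name": "open"})]})
bad = Node(MACHINE, {"name": "door"})   # violates the 1..* cardinality
```

Here `check(ok)` returns an empty list, while `check(bad)` reports the missing states. An editor that knows the meta level can refuse to build such a tree in the first place.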

By default, MPS editors work on cells, which are considered the most adequate representation of program trees. Cells can be nested and can contain arbitrary things like text, math symbols, charts, or vector graphics. Some parts are constant and simply provide structure to the editor, while other parts are variable and carry the important information provided by the programmer. All this can be specified by means of the Editor Language. "The Editor Language also helps you add powerful features to your own editors, like auto-complete, refactoring, browsing, syntax highlighting, error highlighting, and anything else you can think of".
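The split into constant and variable cells might look like this in a toy model. Again, this is only a sketch in Python with hypothetical names (`ConstantCell`, `PropertyCell`, `Row`, `render`), not the Editor Language: constant cells carry fixed text, property cells read and write the underlying node, and a row nests other cells.

```python
from dataclasses import dataclass


@dataclass
class ConstantCell:
    """Fixed text that only gives the editor its structure."""
    text: str


@dataclass
class PropertyCell:
    """Variable part: shows and edits one property of a program node."""
    node: dict
    key: str

    def get(self) -> str:
        return str(self.node[self.key])

    def set(self, value: str) -> None:
        # Editing the cell changes the node, not some text buffer.
        self.node[self.key] = value


@dataclass
class Row:
    """A cell that nests other cells horizontally."""
    cells: list


def render(cell) -> str:
    """Flatten a cell tree into one line of text for display."""
    if isinstance(cell, ConstantCell):
        return cell.text
    if isinstance(cell, PropertyCell):
        return cell.get()
    return " ".join(render(c) for c in cell.cells)


# A program node and the editor built for it.
rule = {"condition": "stock < 10", "action": "reorder"}
action_cell = PropertyCell(rule, "action")
editor = Row([
    ConstantCell("when"), PropertyCell(rule, "condition"),
    ConstantCell("then"), action_cell,
])
```

Calling `action_cell.set("alert")` updates `rule` directly, and the next `render(editor)` reflects the change. The user can never destroy the "when" and "then" parts, because they are not part of the program at all.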

The semantics of a DSL is not formalized. Instead, it is implicitly given by the implementation of an interpreter or transformer. For the latter case MPS provides the Transformation Language. An MPS transformer maps source code from a source language to a target language. Both languages must be defined by means of the Structure Language, and for the target language there must be a simple one-to-one translation to the final result. For the transformation itself there are three different approaches, namely the iterative, the template and macro based, and the search pattern based approach.

With the iterative approach "you enumerate all the nodes in the source model, inspect each one, and based on that information generate some resulting target nodes in the target model". The template approach works like Velocity or XSLT: there is code in the target language, enriched by macros which are evaluated at runtime. The Model Query Language is, like XPath, a language for selecting subtrees of the input node tree, and it is used by the macros to select the proper nodes and perform the desired substitutions. The language in which transformations are written is automatically generated from the input language. Similarly, a pattern language is generated for the search pattern based approach, which lets you perform powerful search-and-replace actions on the input node tree.
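The difference between the first two approaches can be illustrated with a small sketch. It is written in plain Python and uses the standard library's `string.Template` as a stand-in for a real template engine; the model layout and the function names are assumptions made for this example and have nothing to do with the actual Transformation Language.

```python
from string import Template

# Source model: a flat list of nodes of a hypothetical rule language.
source_model = [
    {"concept": "Rule", "condition": "stock < 10", "action": "reorder"},
    {"concept": "Rule", "condition": "stock == 0", "action": "alert"},
]


def transform_iteratively(model: list) -> list:
    """Iterative approach: visit each source node, emit target nodes."""
    target = []
    for node in model:
        if node["concept"] == "Rule":
            target.append({
                "concept": "IfStatement",
                "test": node["condition"],
                "body": {"concept": "Call", "function": node["action"]},
            })
    return target


# Template approach: target-language code with macros ($condition, $action).
RULE_TEMPLATE = Template("if $condition:\n    $action()")


def transform_with_template(model: list) -> str:
    """Fill the template once per Rule node and join the results."""
    return "\n".join(
        RULE_TEMPLATE.substitute(condition=n["condition"], action=n["action"])
        for n in model
        if n["concept"] == "Rule"
    )
```

The iterative version produces target nodes and leaves their rendering to a later step, whereas the template version reads almost like the generated code itself, which is why templates are usually easier to write and review.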

MPS provides more than just the bare tools for LOP. It comes with a rich framework of predefined languages. First, there is the Base Language, which is a general-purpose imperative programming language with "nearly-universal language features [such] as arithmetic, conditionals, loops, functions, variables, and so on". It is a good starting point for creating other languages by extension, and its individual concepts can be referenced in programs. There are also various generators from the Base Language to Java, C++, etc., so the Base Language can serve as the target language of transformations. Second, there is the Collection Language, which accommodates the ubiquitous demand for collections in one place. Third, there is the User Interface Language, which is a "full-fledged language for graphical user interfaces" and is meant as a replacement for wannabe-DSL APIs like Swing.

A seamless switch to LOP is possible because, once the target code of a transformation has been generated, it behaves as if it had been written manually in the first place. This means you can try LOP with one module of an existing project and leave the rest untouched. To this end, JetBrains provides a plugin for IntelliJ IDEA which integrates MPS with classic development. Dmitriev suggests giving MPS a try by introducing a DSL for the configuration of an application, as opposed to using some scripting language for that.
