Gentle introduction to the main concepts of the Rascal language.


Rascal is based on a dozen concepts and having a global understanding of them will help to grasp the language more quickly. Here we will informally explain and illustrate these concepts and often we will refer to the Rascal Language Reference for further details. In other words, we are providing here a friendly front-end for the actual language description.

Most language concepts are described separately but some features we just mention here:

  • Rascal programs consist of modules that are organized in packages.

  • Modules can import other modules.

  • The visibility of entities declared in modules can be controlled using public/private modifiers.

  • Data structures may have annotations that can be explicitly used and modified.

  • There is an extensive library for built-in datatypes, input/output, fact extraction from Java source code, visualization, and more.

Here are the concepts to be discussed:

1. Static Typing


Static type checking.

type lattice
Figure 1. Type Lattice

Rascal has a static and a dynamic type system, which interact with eachother. The static type system is used by a type checker (not yet released) to predict errors and give warnings where possibly slipups have been made. The dynamic type system ensures well-formedness of data structures and plays an important role while pattern matching, since many algorithms dispatch on the types of values.

Rascal’s static type system does not ensure that all functions will go right: * functions may throw exceptions. * functions may not be defined for the specific pattern which occur on the call site.

However, the static type system will produce an error when a function will certainly throw an exception, or when it is certainly not defined for a certain case. Also it catches some logical tautologies and contradictions which would lead to dead code.

The Rascal types are ordered in a so-called type lattice shown in the figure above.

The arrows describe a subtype-of relation between types. The type void is the smallest type and is included in all other types and the type value is the largest type that includes all other types. We also see that rel is a subtype of set and that each ADT is a subtype of node. A special role is played by the datatype Tree that is the generic type of syntax trees. Syntax trees for specific languages are all subtypes of Tree. As a result, syntax trees can be addressed at two levels:

  • in a generic fashion as Tree and,

  • in a specific fashion as a more precisely typed syntax tree. Finally, each alias is structurally equivalent to one or more specific other types.

Rascal does not provide an explicit casting mechanism (as in Java), but pattern matching can play that role.

The language provides higher-order, parametric polymorphism. A type aliasing mechanism allows documenting specific uses of a type. Built-in operators are heavily overloaded. For instance, the operator + is used for addition on integers and reals but also for list concatenation, set union and the like.


Some example can illustrate the above.

rascal>int I = 3;
int: 3

Since I is declared as type int, we cannot assign a real value to it:

rascal>I = 3.5;
|prompt:///|(4,3,<1,4>,<1,7>): Expected int, but got real
Advice: |http://tutor.rascal-mpl.org/Errors/Static/UnexpectedType/UnexpectedType.html|
rascal>num N = 3;
num: 3

Since N is declared as type num, we can assign both int and real values to it:

rascal>N = 3.5;
num: 3.5

Since all types are a subtype of type value, one can assign values of any type to a variable declared as value:

rascal>value V = 3;
value: 3
rascal>V = "abc";
value: "abc"
rascal>V = false;
value: false

We can use pattern matching to classify the actual type of a value:

rascal>str classify(value V){
>>>>>>>  switch(V){
>>>>>>>    case str S: return "A string";
>>>>>>>    case bool B: return "A Boolean";
>>>>>>>    default: return "Another type";
>>>>>>>  }
str (value): function(|prompt:///|(0,150,<1,0>,<7,1>))
str: "A Boolean"
rascal>V = 3.5;
value: 3.5
str: "Another type"

In addition to these standard examples, it is interesting that all Algebraic Data Types are subtypes of type node. Let’s introduce a simple Color data type:

rascal>data Color = red(int level) | blue(int level);

Unsurprisingly, we have:

rascal>Color C = red(3);
Color: red(3)

Due to subtyping, we can also have:

rascal>node ND = red(3);
node: red(3)

One example of the actual application of subtypes can be found in Count Constructors.

2. Datatypes


Built-in and user-defined datatypes.


Rascal provides a rich set of datatypes:

  • Boolean (bool).

  • Infinite precision Integer (int), Real (real), and Number (num).

  • Strings (str) that can act as templates with embedded expressions and statements.

  • Source code Locations (loc) based on an extension of Universal Resource Identifiers (URI) that allow precise description of text areas in local and remote files.

  • Date and time values (DateTime, datetime).

  • List (list).

  • Tuple (tuple).

  • Set (set).

  • Map (map)

  • Relation (rel).

  • Untyped tree structures (Node, node).

  • User-defined algebraic datatypes (Algebraic Data Type, data) allow the introduction of problem-specific types and are a subtype of node. This makes it possible to have typed and untyped views on the same data. A special case are syntax trees that are the result of parsing source files are represented as datatypes (Tree).

There is a wealth of built-in operators and library functions available on the standard datatypes.

These built-in datatypes are closely related to each other:

  • In a list all elements have the same static type and the order of elements matters. A list may contain the same value more than once.

  • In a set all elements have the same static type and the order of elements does not matter. A set contains an element only once. In other words, duplicate elements are eliminated and no matter how many times an element is added to a set, it will occur in it only once.

  • In a tuple all elements (may) have a different static type. Each element of a tuple may have a label that can be used to select that element of the tuple.

  • A relation is a set of tuples which all have the same static tuple type.

  • A map is an associative table of (key, value) pairs. Key and value (may) have different static type and a key can only be associated with a value once.


Here are some examples of the built-in data types:

Type Examples


true, false


11, 101, 1-11, 1123456789


1.01, 11.0232e201, 1-25.5


"abc", "first\nnext", "result: <X>"






<1,2>, <"john", 43, true>


[], [1], [1,2,3], [true, 2, "abc"]


{}, {1,2,3,5,7}, {"john", 4.0}


{<1,2>,<2,3>,<1,3>}, {<1,10,100>, <2,20,200>}

map[T, U]

(), (1:true, 2:false), ("a":1, "b":2)


f(), add(x,y), g("abc", [2,3,4])

A fragment of the datatype that defines the abstract syntax for statements (assignment, if, while) in a programming language would look as follows:

data STAT = asgStat(Id name, EXP exp)
          | ifStat(EXP exp,list[STAT] thenpart,
                           list[STAT] elsepart)
          | whileStat(EXP exp, list[STAT] body)

Here are some examples how Rascal responds to values of the above built-in datatypes:

bool: true
int: 101
real: 3.14
str: "Rascal"
loc: |file:///etc/passwd|
datetime: $2101-09-05$
rascal>[30, 20, 10];
list[int]: [30,20,10]
rascal><"Rascal", 100000>;
tuple[str,int]: <"Rascal",100000>
rascal>{"apples", "oranges", "bananas"};
set[str]: {"oranges","bananas","apples"}
rascal>{<"apples", 10, 15>, <"oranges", 5, 7>, <"bananas", 9, 11>};
rel[str,int,int]: {
rascal>("apples" : 100, "oranges": 150, "bananas": 75);
map[str, int]: ("oranges":150,"bananas":75,"apples":100)
rascal>"abc"(1, 2, 3);
node: "abc"(1,2,3)

3. Immutable Values


Immutable values.


Values are the basic building blocks of a language and the type of values determines how they may be used.

Rascal is a value-oriented language. This means that values are immutable and are always freshly constructed from existing parts. For instance, replacing an element in a list does not modify the original list but produces a new list that only differs from the original one in the modified position.

The language also provides variables. A value can be associated with a variable as the result of an explicit assignment statement: during the lifetime of a variable different (immutable) values may be assignment to it. Other ways to associate a value with a variable is by way of function calls (binding of formal parameters to actual values) and as the result of a successful pattern match.

The approach that values are immutable and that variables can be associated with different immutable values during their lifetime avoids sharing and aliasing problems that exist in many languages.


First we, create a list value and assign it to two variables L and M.

rascal>L = [1, 2, 3];
list[int]: [1,2,3]
rascal>M = L;
list[int]: [1,2,3]

Next we assign a new value to the first element of the list. The effect is that a new list value [10, 2, 3] is constructed.

rascal>L[0] = 10;
list[int]: [10,2,3]

L is now associated with this new value:

list[int]: [10,2,3]

But M is still associated with the original, unmodified, value.

list[int]: [1,2,3]

In pointer-based languages and in object-oriented languages the change to the original value of L would also be visible via M.

String values are, like all other values, also immutable. Let’s experiment with the replaceAll function:

rascal>import String;
rascal>replaceAll("abracadabra", "a", "A");
str: "AbrAcAdAbrA"

Now assign to variables S and T the string "abracadabra" and let’s see what happens:

rascal>S = "abracadabra";
str: "abracadabra"
rascal>T = S;
str: "abracadabra"
rascal>S = replaceAll("abracadabra", "a", "A");
str: "AbrAcAdAbrA"
str: "AbrAcAdAbrA"
str: "abracadabra"

To summarize: all values are immutable and variables can during their lifetime be associated with different immutable values.

  • Immutable values contribute to referential transparence.

  • Immutable values maybe less efficient than mutable ones.

4. Comprehensions


Comprehensions for generating values.


Comprehensions are a notation inspired by mathematical set-builder notation and list comprehensions that help to write succinct definitions of lists and sets. They are also inspired by queries as found in a language like SQL.

Rascal generalizes comprehensions in various ways. Comprehensions exist for lists, sets and maps. A comprehension consists of an expression that determines the successive elements to be included in the result and a list of enumerators and tests (boolean expressions). The enumerators produce values and the tests filter them.


A standard example is

rascal>{ x * x | int x <- [1 .. 10], x % 3 == 0 }
set[int]: {9,81,36}

i.e., the squares of the integers in the range [ 1 .. 10 ] that are divisible by 3. A more intriguing example (that we do not give in full detail) is

{name | /asgStat(Id name, _) <- P}

which traverses program P (using the descendant match operator /, see Patterns) and constructs a set of all identifiers that occur on the left hand side of assignment statements in P.

5. Pattern Matching


Pattern matching.


Pattern matching determines whether a given pattern matches a given value. The outcome can be false (no match) or true (a match). A pattern match that succeeds may bind values to variables.

Pattern matching is the mechanism for case distinction (Switch statement) and search (Visit statement) in Rascal. Patterns can also be used in an explicit match operator := and can then be part of larger boolean expressions. Since a pattern match may have more than one solution, local backtracking over the alternatives of a match is provided. Patterns can also be used in Enumeratorss and control structures like For and While statement.

A very rich pattern language is provided that includes string matching based on regular expressions, matching of abstract patterns, and matching of concrete syntax patterns. Some of the features that are provided are list (associative) matching, set (associative, commutative, idempotent) matching, and deep matching of descendant patterns. All these forms of matching can be used in a single pattern and can be nested. Patterns may contain variables that are bound when the match is successful. Anonymous (don’t care) positions are indicated by the underscore (_). See Patterns for more details.


Here is a regular expression that matches a line of text, finds the first alphanumeric word in it, and extracts the word itself as well as the before and after it (\W matches all non-word characters; \w matches all word characters):


Regular expressions follow the Java regular expression syntax with one exception: instead of using numbered groups to refer to parts of the subject string that have been matched by a part of the regular expression we use the notation:


If RegularExpression matches, the matched substring is assigned to string variable Name.

The following abstract pattern matches the abstract syntax of a while statement defined earlier:

whileStat(EXP Exp, list[STAT] Stats)

Variables in a pattern are either explicitly declared in the pattern itself---as done in the example---or they may be declared in the context in which the pattern occurs. So-called multi-variables in list and set patterns are declared by a suffix: X is thus an abbreviation for list[…​] X or set[…​] X, where the precise element type depends on the context. The above pattern can then be written as

whileStat(EXP Exp, Stats*)

or, if you are not interested in the actual value of the statements as

whileStat(EXP Exp, _*)

When there is a grammar for this example language, we can also write concrete patterns as described in Concrete Patterns.

6. Control Structures


Success-directed control structures.


The flow of Rascal program execution is completely explicit. Boolean expressions determine choices that drive the control structures. Only local backtracking is provided in the context of boolean expressions and pattern matching.

Control structures like If, While and For statement are driven by Boolean expressions. Actually, combinations of generators and Boolean expressions can be used to drive the control structures. In the latter case, the Boolean expression is executed for each generated value.


A classical if statement:

if(N <= 0)
     return 1;
     return N * fac(N - 1);

A combination of a generator and a test:

for(/asgStat(Id name, _) <- P, size(name) > 10){

This statement prints all identifiers in assignment statements (asgStat) that consist of more than 10 characters.

7. Case Distinction


Case distinction via pattern matching.


The switch statement as known from C and Java is generalized: the subject value to switch on may be an arbitrary value and the cases are arbitrary patterns followed by a statement. Each case is comparable to a transaction: when the pattern succeeds and the following statement is executed successfully, all changes to variables made by the statement are committed and thus become permanent. The variables bound by the pattern are always local to the statement associated with the case.

See Switch,Visit and Pattern With Action for more details.


We use the ColoredTrees datatype as example and use a switch to distinguish between red and black nodes:

rascal>data ColoredTree =
>>>>>>>      leaf(int N)
>>>>>>>    | red(ColoredTree left, ColoredTree right)
>>>>>>>    | black(ColoredTree left, ColoredTree right);
rascal>ColoredTree CT = red(black(leaf(1), red(leaf(2),leaf(3))), black(leaf(3), leaf(4)));
ColoredTree: red(
rascal>import IO;
rascal>switch (CT){
>>>>>>>case red(, _):
>>>>>>>     println("A red root node");
>>>>>>>case black(, _):
>>>>>>>     println("A black root node");
A red root node

8. Visiting


Visiting tree structures and arbitrary values.


Visiting the elements of a data structure is one of the most common operations in our domain and the visitor design pattern is a solution known to every software engineer. Given a tree-like data structure we want to perform an operation on some (or all) nodes of the tree. The purpose of the visitor design pattern is to decouple the logistics of visiting each node from the actual operation on each node. In Rascal the logistics of visiting is completely automated.

Visiting is achieved by way of visit expressions that resemble the switch statement. A visit expression traverses an arbitrarily complex subject value and applies a number of cases to all its subtrees. All the elements of the subject are visited. When one of the cases matches the statements associated with that case are executed. These cases may:

  • cause some side effect, i.e., assign a value to local or global variables;

  • execute an Insert statement that replaces the current element;

  • execute a Fail statement that causes the match for the current case to fail.

The value of a visit expression is the original subject value with all replacements made as dictated by matching cases. The traversal order in a visit expressions can be explicitly defined by the programmer.


Examples of visiting are, for instance, given in the Recipes ColoredTrees and Derivative.

9. Functions


Functions and pattern-directed invocation.


Functions allow the definition of frequently used operations. They have a name and formal parameters. They are explicitly declared and are fully typed. Functions can also be used as values thus enabling higher-order functions. Rascal is a higher-order language in which functions are first-class values.

See Function Declaration for details.


Here is an example of a function that counts the number of assignment statements in a program:

int countAssignments(PROGRAM P){
    int n = 0;
    visit (P){
    case asgStat(_, _):
         n += 1;
    return n;

Consider the following use of higher-order functions:

int double(int x) { return 2 * x; }

int triple(int x) { return 3 * x; }

int f(int x, int (int) multi){ return multi(x); }

The functions double and triple simply multiply their argument with a constant. Function f is, however, more interesting. It takes an integer x and a function multi (with integer argument and integer result) as argument and applies multi to its own argument. f(5, triple) will hence return 15. Function values can also be created anonymously as illustrated by the following, alternative, manner of writing this same call to f:

f(5, int (int y){return 3 * y;});

Here the second argument of f is an anonymous function.

10. Syntax Definition and Parsing


Syntax definition and parser generation for new languages.


All source code analysis projects need to extract information directly from the source code. There are two main approaches to this:

  • Lexical information: Use regular expressions to extract useful, but somewhat superficial, flat, information. This can be achieved using regular expression patterns, see Regular Expression Patterns.

  • Structured information: Use syntax analysis to extract the complete, nested, structure of the source code in the form of a syntax tree. Rascal can directly manipulate the parse trees, but it also enables user-defined mappings from parse tree to abstract syntax tree.

Using Syntax Definitions you can define the syntax of any (programming) language. Then Rascal:

  • will generate the parser, and

  • will provide pattern matching and pattern construction on parse trees and abstract syntax trees, see Abstract Patterns and Concrete Patterns.


Let’s use the Exp language as example. It contains the following elements:

  • Integer constants, e.g., 123.

  • A multiplication operator, e.g., 3*4.

  • An addition operator, e.g., 3+4.

  • Multiplication is left-associative and has precedence over addition.

  • Addition is left-associative.

  • Parentheses can be used to override the precedence of the operators.

Here are some examples:

  • 123

  • 2+3+4

  • 2+3*4

  • (2+3)*4

The EXP language can be defined as follows:

module demo::lang::Exp::Concrete::WithLayout::Syntax

layout Whitespace = [\t-\n\r\ ]*; (1)

lexical IntegerLiteral = [0-9]+;

start syntax Exp
  = IntegerLiteral
  | bracket "(" Exp ")"
  > left Exp "*" Exp
  > left Exp "+" Exp

Now you may parse and manipulate programs in the EXP language. Let’s demonstrate parsing an expression:

rascal>import demo::lang::Exp::Concrete::WithLayout::Syntax;
rascal>import ParseTree;
rascal>parse(#start[Exp], "2+3*4");
start[Exp]: (start[Exp]) `2+3*4`

First we import the syntax definition and the ParseTree module that provides the parsing functionality. Finally, we parse 2+3*4 using the start symbol Exp.

Don’t be worried, we are just showing the resulting parse tree here. It intended for programs and not for humans. The points we want to make are:

  • Rascal grammars are relatively easy to read and write (unfortunately, writing grammars will never become simple).

  • Parser generation is completely implicit.

  • Given a syntax definition, it can be used immediately for parsing.

See Recipes for a more extensive presentation of the EXP language and Languages for other language examples.

11. IDE Construction


Extend an IDE with interactive, language-specific, features (Eclipse only).


Meta-programs become most useful, when they are integrated with an Interactive Development Environment (IDE).

A Rascal program running inside Eclipse can get access to many of the services provided by Eclipse such as syntax highlighting, outlining, documentation hovering and much more.

Rascal uses the services of the IDE Meta-tooling Platform, or IMP for short, a collection of API and tools to support constructing IDEs for programming languages and domain specific languages. Rascal is also part of the collection of IMP tools and (will be) hosted shortly on eclipse.org.

Using the IDE library, you can instantiate the services that IMP provides for any language implemented in Rascal.

To instantiate an IDE for a language implemented using Rascal, use the following steps:

  • Define the grammar for the language.

  • Define a parse function for the language.

  • Register the language.

The following IDE features are available

12. Code Models


Code models are abstract representations of source code


You can use any of Rascal’s Values to represent facts about source code. For example, Algebraic Data Types can be used to define abstract syntax trees and [Values-Relation] are used to represent call graphs. We consistently use Locations to refer to source code artifacts, either physically (|file:///tmp/HelloWorld.java|) or logically (|java+class://java/lang/Object|).

Specifically we have standardized a set of models to represent source code which are ready for computing metrics: #/Libraries#analysis-m3[M3]. This M3 model consists of:

  • an open (extensible) set of Relations between source code artifacts.

  • a number of extensible Algebraic Data Types for representing abstract syntax trees.

The core language independent model can be found here: analysis::m3.

Extensions for representing facts about specific languages:

13. Enumerating


Enumerating values.


Enumerating regards the enumeration of the values in a given (finite) domain, be it the elements in a list, the substrings of a string, or all the nodes in a tree. Each value that is enumerated is first matched against a pattern before it can possibly contribute to the result of the enumerator. An enumerator yields true as long as it has generated a new value, and false otherwise.

See Enumerator for details.

int x <- { 1, 3, 5, 7, 11 }
int x <- [ 1 .. 10 ]
/asgStat(Id name, _) <- P

The first two produce the integer elements of a set of integers, respectively, a range of integers. Observe that the left-hand side of an enumerator is a pattern, of which int x is a specific instance. The use of more general patterns is illustrated by the third enumerator that does a deep traversal (as denoted by the descendant operator /) of the complete program P (that is assumed to have a PROGRAM as value) and only yields statements that match the assignment pattern (asgStat). Note the use of an anonymous variable at the EXP position in the pattern.

Let’s practice some of these examples.

rascal>int x <- {};
bool: false

The enumerator does not produce any value and returns false.

rascal>int x <- {1, 3, 5, 7, 11 };
bool: true
|prompt:///|(0,1,<1,0>,<1,1>): Undeclared variable: x
Advice: |http://tutor.rascal-mpl.org/Errors/Static/UndeclaredVariable/UndeclaredVariable.html|

Well, this is a disappointing experience. The generator returned true since it did produce a value. Apparently, we cannot inspect the value of the variable x that was bound.

Another example that results in an error:

rascal>str x <- {1, 3, 5, 7, 11 };
|prompt:///|(22,2,<1,22>,<1,24>): Expected int, but got str
Advice: |http://tutor.rascal-mpl.org/Errors/Static/UnexpectedType/UnexpectedType.html|

Here, the enumerator produces its first integer value, an attempt is made to assign this to variable x that is declared as string, and an error results.

A more satisfying use is as follows:

rascal>{ x * x | int x <- {1, 3, 5, 7, 11 }};
set[int]: {121,1,9,49,25}

When used inside Comprehensions, or For, Do, or While statement, all values of the generator will be produced and used. The variables that are introduced by a enumerator are local to the construct in which the enumerator is used. Here is a similar example:

rascal>import IO;
rascal>for(int x <- {1, 3, 5, 7, 11 })
>>>>>>>    println("x = <x>");
x = 5
x = 7
x = 1
x = 3
x = 11
list[void]: []

The variables that are bound by an enumerator are local to the statement in which the enumerator is used.

14. Equation Solving


Solving equations by fixed-point iteration.


Many problems can be solved by forms of constraint solving. This is a declarative way of programming: specify the constraints that a problem solution should satisfy and how potential solutions can be generated. The actual solution (if any) is found by enumerating solutions and testing their compliance with the constraints.

Rascal provides a Solve statement that helps writing constraint solvers.


A typical example is data flow analysis where the propagation of values through a program can be described by a set of equations. Their solution can be found with the solve statement.

add links

15. Rewriting


Rewriting using pattern-directed invocation


A rewrite rule is a recipe on how to simplify values. Remember: (a + b)2 = a2 + 2 ab + b2? A rewrite rule has a pattern as left-hand side — here: (a + b)2 — and a replacement as right-hand side — here: a2 + 2 ab + b2. Given a value and a set of rewrite rules the patterns are tried on every subpart of the value and replacements are made if a match is successful. This is repeated as long as some pattern matches.

Rascal has ancestors, notably ASF+SDF, where rewriting was the most important computation mechanism. In Rascal, rewriting can be achieved using pattern-directed invocation, see Function Declaration, possibly combined with a Visit statement.


In a package for symbolic differentiation it is desirable to keep expressions in simplified form in order to avoid intermediate results like add(product(con(1), x), mul(con(0), y)) that can be simplified to x. The following definitions achieve this:

Exp simp(add(con(n), con(m))) = con(n + m);   (1)
Exp simp(mul(con(n), con(m))) = con(n * m);

Exp simp(mul(con(1), Exp e))  = e;
Exp simp(mul(Exp e, con(1)))  = e;
Exp simp(mul(con(0), Exp e))  = con(0);
Exp simp(mul(Exp e, con(0)))  = con(0);

Exp simp(add(con(0), Exp e))  = e;
Exp simp(add(Exp e, con(0)))  = e;

default Exp simp(Exp e)       = e;            (2)

Exp simplify(Exp e){                          (3)
  return bottom-up visit(e){
           case Exp e1 => simp(e1)
1 Definitions of the function simp are given with different patterns as formal argument. Each definition is responsible for one particular simplification (here is where the similarity with rewrite rules surfaces).
2 A default for simp is given: if no other definition applies, the default one is used.
3 The actual simplify function: it performs a bottom up visit of the expression, replacing each subexpression by a simplified version.

See Derivative for a full explanation of this example.