Copyright © 2005, 2010 Martin Jambon. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the file fdl.txt. The source code of this document is the file extend-ocaml-syntax.html.mlx.
This tutorial is an individual initiative to provide additional documentation for Camlp5, a sophisticated tool for Objective Caml programmers.
2010 revision: This document was updated to reflect the name change of the legacy Camlp4, now called Camlp5. The following table clarifies name issues:
Period | Name of "legacy Camlp4" | Name of "new Camlp4" |
---|---|---|
before 2007 | Camlp4 | - |
from 2007 (OCaml 3.10) | Camlp5 | Camlp4 |
The examples of this tutorial will not work with the new Camlp4 starting with OCaml 3.10.
We are talking about truly modifying the syntax of OCaml. Yes, in theory anyone could modify the syntax of this programming language, without rewriting a whole dedicated parser. Camlp5 is the tool that lets you do this. And many syntax enhancements can be performed in relatively few lines of code.
However, there are quite a few things to know before starting to write your own syntax extension which will implement exactly what you want. This tutorial is meant to address the common difficulties that people encounter when they start using Camlp5 for this purpose. It is essentially based on my recent experience in integrating a dedicated syntax for regular expressions in OCaml and mixing this form of pattern matching with the traditional pattern matching of OCaml.
Camlp5 lets you do amazing things that have no equivalent in most other programming languages. You might want to define a domain-specific language (DSL) without wasting your time developing one more interpreter of poor quality which is not reusable at all. With Camlp5, you may create syntactic shortcuts for the most common operations that your DSL requires and at the same time benefit from all the qualities of OCaml: automatic type inference, static typing, early detection of errors, precise location of mistakes in your code, most of the advantages of text-editor modes for OCaml, interface with other languages, an interactive interpreter and of course the generation of high performance native code.
Camlp5 is an excellent solution if you want to add a syntax which is a shortcut for something that is obviously, mechanically expandable into standard OCaml without having to resort to the type information. Camlp5 lets you work on the abstract syntax tree (AST), which does not contain information on the actual type of the object being manipulated.
If the syntax extension you want to provide requires the knowledge of the type of each object, you can still design a dedicated embedded language that will be compiled into OCaml. Camlp5 provides a convenient mechanism for producing OCaml code, therefore compiling any given language to OCaml is an excellent choice in many cases even if the parsing facilities of Camlp5 are not used.
This tutorial is about learning how to develop your own syntax extensions but you might want to have a look at existing syntax extensions that are available on the web. The main sources for finding Camlp5 extensions are: the Caml hump on the official Caml site, the OCaml Link Database (under "OCaml syntax extension") and P4ck.
If you are interested in modifying the syntax of OCaml and you are a bit confused by the official tutorial and manual for Camlp5, then I hope this document will help you get started.
In order to start discovering Camlp5, you need to be fluent in OCaml since it will be our main language (with variants) for doing everything. You need to be familiar with the higher order functions (HOF) of OCaml such as List.map, List.fold_right and List.fold_left since we will use them a lot for manipulating syntax trees. So if you are not familiar with these and functional programming in general, practice a little first.
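As a warm-up, here is a minimal sketch of the style of recursion that comes back throughout this tutorial. The type expr and the functions rename and count_vars are invented for this illustration and have nothing to do with the real Camlp5 types; only List.map and List.fold_left come from the standard library.

```ocaml
(* A toy expression tree; hypothetical, not the Camlp5 AST. *)
type expr =
  | Int of int
  | Var of string
  | App of string * expr list   (* function name and its arguments *)

(* Rename every variable, rebuilding the tree with List.map. *)
let rec rename f = function
  | Int n -> Int n
  | Var x -> Var (f x)
  | App (name, args) -> App (name, List.map (rename f) args)

(* Count the variables, accumulating with List.fold_left. *)
let rec count_vars = function
  | Int _ -> 0
  | Var _ -> 1
  | App (_, args) ->
      List.fold_left (fun acc e -> acc + count_vars e) 0 args

let example = App ("add", [Var "x"; App ("mul", [Int 2; Var "y"])])
let renamed = rename String.uppercase_ascii example
```

If this kind of code reads naturally to you, the tree rewriting done later with quotations should not be a surprise.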
You will need a standard installation of OCaml, which should include the OCaml compilers (ocaml, ocamlc, ocamlopt) and the Camlp5 system (camlp5, camlp5o).
As usual for editing and testing OCaml code, you will need a good editing mode for OCaml in your favorite text editor, but we assume you know all about this.
You also need some way to compile your code automatically. I use make with OCamlMakefile. The good thing about make is that if you don't want to be too subtle, you can just write one target and the hardcoded sequence of commands that recompiles everything. This might still be the best choice in many cases. Anyway, don't waste your time with such things.
All the files that are given as examples throughout this document can be browsed from http://martin.jambon.free.fr/camlp5 or downloaded as a compressed archive.
One of the main sources of confusion for beginners is the presence of multiple languages that are used for different things but which all look more or less like OCaml. We will see the details of all of this later, but just keep in mind that we will use the following languages:
- EXTEND ... END constructs for defining syntax extensions (we use a predefined syntax extension in order to define our own syntax extensions)
- Quotations, written <:ident< ... >> or just << ... >>. Quotations are a generic syntax extension which is supported in any OCaml code which is preprocessed by Camlp5. The contents of the quotation will be expanded in place according to ident, which has to be known by the preprocessor.
- Antiquotations, written $this$ inside a quotation. They are arbitrary OCaml code written in the regular syntax which lets you insert automatically generated nodes into the syntax tree being defined by the current quotation.
Now that you are completely confused, you understand why some Camlp5 syntax extensions that may seem natural, simple and readable may be very discouraging when you are a beginner who is trying to use existing code as a template.
We have two kinds of files: those which define a syntax or modify an existing syntax, and the files that are written in this syntax.
Warning: the syntax of a file is never defined within the file itself, but in a separate file.
An OCaml file written in the standard syntax does not need to be preprocessed. It is directly compiled into bytecode or native code by one of the OCaml compilers (ocamlc, ocamlopt or ocaml).
Extending the syntax of OCaml simply means that we will use a modified syntax for writing our programs and this non-standard syntax is not understood by the OCaml compilers. Therefore our programs need to be preprocessed by a converter from our exotic syntax into plain old OCaml. Camlp5 provides us with tools for writing our specific preprocessor.
The command-line tool which will serve as a base for building our specific preprocessor is camlp5. camlp5 alone does nothing very interesting for us: we need to feed it with our definition of how to convert our syntax into regular OCaml. This is done by passing object files (.cmo) to camlp5.
As a convention, we will name these files according to their role:
- pa_ (as in parsing) is used as a prefix for the files that define or modify a grammar, i.e. how the input file should be converted into an abstract syntax tree (AST)
- q_ (as in quotation) is used as a prefix for the files that define how to expand the contents of quotations. Quotations are single, predefined tokens in the OCaml syntax but are meant to be expanded into some normal OCaml expression or pattern using arbitrary lexing and parsing rules. These files are not named with the pa_ prefix since technically they do not add rules in the general grammar that we may extend.
- pr_ (as in printing) is used as a prefix for the files that define how to export the AST.
Camlp5 provides a file named pa_o.cmo which parses the standard syntax of OCaml (with only one addition, the quotations: see later). It provides a file named pr_o.cmo which converts an OCaml AST into the concrete syntax of OCaml, i.e. a valid source file for the OCaml compilers. Thus the command camlp5 pa_o.cmo pr_o.cmo should read a standard OCaml source file and reprint an equivalent program from the point of view of the compilers:

    $ cat hello.ml
    print_endline "Hello World!";;
    $ camlp5 pa_o.cmo pr_o.cmo hello.ml
    let _ = print_endline "Hello World!"

Another useful printing file is pr_dump.cmo. If you try it instead of pr_o.cmo, you will get an unreadable output. This is a binary representation of the AST which can be read back efficiently by the OCaml compilers and, more importantly, without losing track of the location of the original tokens in the source file. We will therefore reserve the usage of pr_o.cmo for reviewing the generated OCaml code, and not use its output for further compilation.
camlp5o is a predefined shortcut for camlp5 pa_o.cmo pa_op.cmo pr_dump.cmo: it parses the regular syntax of OCaml and outputs a compiler-friendly representation of the AST. The additional file pa_op.cmo is a predefined syntax extension of OCaml. It actually implements an experimental syntax for streams and parsers which was used in earlier versions of OCaml.
The interesting thing to notice here is that we load two files starting with pa_: pa_o.cmo defines the grammar of OCaml from scratch while pa_op.cmo only adds syntax rules to this grammar. This is really a syntax extension: we load different files which will successively create and modify the concrete syntax that is understood by the preprocessor.
Since we are interested in parsing a syntax which is a modified OCaml and converting it into an OCaml AST, we will always use camlp5o, and tell it to load our pa_*.cmo files.
Warning: don't get confused: the files that define syntax extensions themselves use a modified syntax of OCaml and therefore have to be preprocessed with camlp5o loaded with the adequate files (usually pa_extend.cmo and q_MLast.cmo).
The most important point to remember for now is that the center of everything is the abstract syntax tree. The type of the nodes of this tree is fixed and is the only one which can be understood by the OCaml compilers.
The next step is to see how to add new syntactic constructs to the grammar of OCaml and how to expand them into the intended AST.
For testing our example, we are going to use a Makefile which is merely a sequence of commands. We are going to write two programs: pa_tryfinally.ml defines our syntax extension and prog.ml is a simple test program.
Here is the Makefile:

    NAME = tryfinally

    all:
            camlp5o pa_extend.cmo q_MLast.cmo pr_o.cmo pa_$(NAME).ml \
              -o pa_$(NAME).ppo -loc loc
            camlp5o pa_extend.cmo q_MLast.cmo pa_$(NAME).ml \
              -o pa_$(NAME).ast -loc loc
            ocamlc -c -I +camlp5 -pp 'camlp5o pa_extend.cmo q_MLast.cmo -loc loc' \
              -dtypes pa_$(NAME).ml
            camlp5o -I . pr_o.cmo pa_$(NAME).cmo prog.ml -o prog.ppo
            camlp5o -I . pr_r.cmo pa_$(NAME).cmo prog.ml -o prog.ppr
            ocamlopt -dtypes -o prog -pp 'camlp5o -I . pa_$(NAME).cmo' prog.ml
            caml2html -t -ln pa_$(NAME).ml
            caml2html -t -ln prog.ml

    clean:
            rm -f prog *.ppo *.ppr *.cmo *.cmi *.o *.cmx *.ast *~ *.ml.html

We want to add a try ... finally construct whose behavior is illustrated by this example:

    let _ =
      try failwith "this is not an error"
      finally print_endline "OK"

should be converted into the following program written in the standard syntax of OCaml:
File expected.ml:

    let _ =
      let __finally1 =
        try failwith "this is not an error"; None
        with exn -> Some exn in
      print_endline "OK";
      match __finally1 with
          None -> ()
        | Some exn -> raise exn

This new syntactic construct is formed by two keywords and two expressions: try is already a keyword in OCaml and finally is a new keyword. The two expressions are preserved during the conversion to standard OCaml and some auxiliary code is added around them in order to achieve the desired effect. The desired effect consists in evaluating an expression e1 first, and then evaluating an expression e2, even if the evaluation of e1 raised an exception, in which case this exception is re-raised after the evaluation of e2.
In real programs it is often useful for closing an open file at the end of its manipulation even if an error occurred. Please note that a more useful version of this syntax extension exists, but it's not the point here.
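The intended behavior can also be written as an ordinary function, which may help to check one's understanding before looking at the syntax extension. This is only a sketch: the names try_finally and first_line are made up here, and unlike the expansion above this version also returns the value of the body. Recent versions of the standard library offer Fun.protect for a similar purpose.

```ocaml
(* Evaluate body (), then cleanup (); if body raised an exception,
   re-raise it once cleanup has run. Same logic as expected.ml. *)
let try_finally body cleanup =
  let result = try Ok (body ()) with exn -> Error exn in
  cleanup ();
  match result with
  | Ok v -> v
  | Error exn -> raise exn

(* Hypothetical usage: read the first line of a file and close the
   channel whether or not input_line fails. *)
let first_line path =
  let ic = open_in path in
  try_finally (fun () -> input_line ic) (fun () -> close_in ic)
```

The syntax extension gives the same effect without having to wrap both expressions in fun () -> ... by hand.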
During the transformation of our program, we introduced new identifiers at three different places. One of them, __finally1, must have a name that is unlikely to interfere with existing names. We had to decide that identifiers starting with __finally are reserved for the syntax expander and should not be used directly. The two other identifiers are named exn at two different places and are not visible in the user-defined code (the two original expressions). Therefore it is perfectly safe to use canonical names such as exn, x, s or whatever we like.
Now we will see how to implement this transformation. Here is a solution:
File pa_tryfinally.ml:

    (* The function that returns unique identifiers *)
    let new_id =
      let counter = ref 0 in
      fun () ->
        incr counter;
        "__finally" ^ string_of_int !counter

    (* The function that converts our syntax into a single OCaml expression,
       i.e. an "expr" node of the syntax tree *)
    let expand loc e1 e2 =
      let id = new_id () in
      let id_patt = <:patt< $lid:id$ >> in
      let id_expr = <:expr< $lid:id$ >> in
      <:expr<
        let $id_patt$ =
          try do { $e1$; None }
          with [ exn -> Some exn ] in
        do { $e2$;
             match $id_expr$ with
             [ None -> ()
             | Some exn -> raise exn ] } >>

    (* The statement that extends the default grammar,
       i.e. the regular syntax of OCaml if we use camlp5o
       or the revised syntax if we use camlp5r *)
    EXTEND
      Pcaml.expr: LEVEL "expr1" [
        [ "try"; e1 = Pcaml.expr; "finally"; e2 = Pcaml.expr ->
            expand loc e1 e2 ]
      ];
    END;;

This program is written with a strange syntax: it uses three quotations which start with something of the form <:name< where name is the name of a predefined quotation expander and are terminated by >>. Here we use two different quotation expanders: expr and patt. These quotation expanders are loaded from the file q_MLast.cmo (q_ means quotations, and the rest means ML AST = OCaml abstract syntax tree).
The contents of these quotations look very much like OCaml code but not exactly: they are actually expanded into a representation of the AST using concrete types. Have a look at the program after preprocessing, pa_tryfinally.ppo, in order to see the effect of the quotation expanders.
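To get a feeling for what a quotation saves you, here is a sketch on a deliberately tiny syntax tree. The types patt and expr below and the function let_and_return are invented for this illustration; the real MLast types carry locations and have many more constructors. Building a node by hand means naming every constructor, which is roughly the kind of code that a quotation is expanded into.

```ocaml
(* Hypothetical miniature AST, much simpler than MLast. *)
type patt = PLid of string
type expr =
  | ELid of string
  | EInt of string
  | ELet of patt * expr * expr      (* let p = e1 in e2 *)

(* Hand-written counterpart, on this toy tree, of a quotation such as
   <:expr< let $lid:id$ = $e1$ in $lid:id$ >> *)
let let_and_return id e1 =
  ELet (PLid id, e1, ELid id)

(* A small printer, to inspect the result as concrete syntax. *)
let rec to_string = function
  | ELid s | EInt s -> s
  | ELet (PLid x, e1, e2) ->
      Printf.sprintf "let %s = %s in %s" x (to_string e1) (to_string e2)

let node = let_and_return "__finally1" (EInt "42")
```

Note how the same identifier has to be wrapped once as a pattern (PLid) and once as an expression (ELid), just like id_patt and id_expr in pa_tryfinally.ml.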
Warning: The quotations which serve as shortcuts for building nodes of the OCaml AST do not use the usual syntax of OCaml, but must be written in the revised syntax. Unfortunately you will have to learn this new syntax. One way is to read the reference manual of Camlp5. Another way is to convert your own programs to this syntax with camlp5o pr_r.cmo and compare the output with the input.
Warning: The contents of these quotations are written in the revised syntax of OCaml, with the exception of the pieces which appear between dollars ($...$). They are called antiquotations and are a way to insert nodes of the syntax tree which have been defined previously.
In the example, id_patt and id_expr are two simple nodes of the AST which are respectively of the types MLast.patt and MLast.expr. They both stand for a lowercase identifier, but once in a pattern and once in an expression. We just said that antiquotations are a pair of dollars containing an OCaml expression which stands for a predefined node of the AST. Actually, in addition we can use labels such as lid: in this portion of our example:

    let id_patt = <:patt< $lid:id$ >> in ...

It means that the actual contents of the antiquotation is a string which represents a lowercase identifier (lid). Here id has to represent a valid lowercase identifier, which is the case (id = "__finally1").
Using labels in antiquotations is required to convert a basic type to a node of the syntax tree. It is also important for disambiguation since a string can represent a lowercase identifier, but also an uppercase identifier, an escaped string literal or an escaped character literal. They all have a corresponding label (see the reference manual for details).
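The need for labels can be pictured with a toy model in which each label corresponds to one constructor. The type node and the function print are hypothetical and only mimic the idea; they are not how q_MLast.cmo is implemented.

```ocaml
(* Hypothetical node type: each constructor plays the role of one
   antiquotation label applied to a string. *)
type node =
  | Lid of string   (* lowercase identifier, in the spirit of $lid:s$ *)
  | Uid of string   (* uppercase identifier, in the spirit of $uid:s$ *)
  | Str of string   (* string literal,       in the spirit of $str:s$ *)

(* Print a node back as concrete syntax (no escaping is done here). *)
let print = function
  | Lid s | Uid s -> s
  | Str s -> "\"" ^ s ^ "\""

(* The same OCaml string gives two different nodes. *)
let s = "none"
let as_lid = print (Lid s)
let as_str = print (Str s)
```

Without the label, nothing would tell the expander whether s is meant to be the identifier none or the string literal "none".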
An important feature is to keep track in the AST of the location of the original source code. Therefore, a location is associated to each node of the AST. When manipulating the AST with quotations, the quotation expander uses a predefined name for the locations. This name is by default loc in the versions of Camlp5 up to 3.08.2 and _loc in the following versions. The best way to avoid trouble is to pass the -loc loc option to camlp5o and use loc. So a location must be available under the name loc when building the AST with quotations. In return, when destructuring the AST with pattern-matching using quotations, a variable named loc is automatically defined. The same thing happens in grammar rules of EXTEND statements, which explains the availability of a loc object which seems to come from nowhere in our EXTEND statement.
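The convention can be mimicked on a toy tree where every node stores its location explicitly. The types loc and expr and the function swap are made up for this sketch; with real quotations the loc variable is bound and used implicitly instead of being written out.

```ocaml
(* Hypothetical location: first and last character offsets. *)
type loc = { first : int; last : int }

(* Every node of this toy tree carries a location. *)
type expr =
  | Int of loc * int
  | Add of loc * expr * expr

let loc_of = function Int (l, _) | Add (l, _, _) -> l

(* Swap the operands of every addition. The pattern binds loc and the
   constructor on the right-hand side reuses it, so the new node still
   points to the original source text. *)
let rec swap = function
  | Add (loc, e1, e2) -> Add (loc, swap e2, swap e1)
  | Int _ as e -> e
```

Keeping the original location on rewritten nodes is what lets the compiler report errors at the right place in the user's file.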
Now you should really take the time to understand completely the system of quotations. You must realize that they are used for building nodes of the AST whose types are defined in the MLast module of the Camlp5 library. These quotations can also be used in pattern matching if you need to extract some information from an existing AST or if you want to substitute it.
Let's now have a look at the EXTEND statement. What we extend is the default grammar. The default grammar has been set by pa_o.cmo which is implicitly loaded by camlp5o. This is the grammar of the regular syntax of OCaml. We will not extend the set of possible tokens or how they are recognized, but only their meaning according to their sequential arrangement.
An EXTEND statement contains a list of grammar entries that will be extended. Each grammar entry consists of a collection of rules. The entries can be predefined or newly defined. They can be made visible and therefore extensible by other syntax extensions or not. Here we just extend the Pcaml.expr entry which defines the syntax of an OCaml expression as its name indicates.
A rule within a given entry is made of a pattern (the part before the arrow) which is associated with an OCaml expression which defines a syntax node (the part after the arrow):

    EXTEND
      Pcaml.expr: LEVEL "expr1" [
        [ "try"; e1 = Pcaml.expr; "finally"; e2 = Pcaml.expr ->
            expand loc e1 e2 ]
      ];
    END;;

The patterns are matched according to precedence levels. Here we know that a level named expr1 exists, and we know its meaning and relative priority with respect to existing syntactic constructs. We know this from the file pa_o.ml of the Camlp5 library. So we insert a rule exactly in this level; no new level is created, which would have been the case if we had not used a LEVEL annotation.
The rest is self explanatory: try and finally are implicitly made reserved keywords of the language if not already, and we extract e1 and e2 which are two expression nodes (Pcaml.expr is a grammar entry which returns objects of type MLast.expr). Our new rule itself must return a node of type MLast.expr. This is the role of our expand function.
After compilation of the syntax extension, we use it to rewrite our program in the regular OCaml syntax:
File prog.ppo:

    let _ =
      let __finally1 =
        try failwith "this is not an error"; None with exn -> Some exn
      in
      print_endline "OK";
      match __finally1 with
          None -> ()
        | Some exn -> raise exn

and in the revised syntax, which is closer to what we wrote in the quotations of our file pa_tryfinally.ml:
File prog.ppr:

    do {
      let __finally1 =
        try do { failwith "this is not an error"; None } with exn -> Some exn
      in
      print_endline "OK";
      match __finally1 with
      [ None -> ()
      | Some exn -> raise exn ]
    };

And the program prog runs as expected:

    $ ./prog
    OK
    Fatal error: exception Failure("this is not an error")

We will rearrange the source code of our try ... finally syntax extension in order to see better which element is responsible for which effect and learn more about Camlp5.
First, we might use only one quotation to represent the node of the AST which is returned by the try ... finally rule. We are talking of the expand function, which was defined like this:

    let expand loc e1 e2 =
      let id = new_id () in
      let id_patt = <:patt< $lid:id$ >> in
      let id_expr = <:expr< $lid:id$ >> in
      <:expr<
        let $id_patt$ =
          try do { $e1$; None }
          with [ exn -> Some exn ] in
        do { $e2$;
             match $id_expr$ with
             [ None -> ()
             | Some exn -> raise exn ] } >>

So, we can inline the definitions of id_patt and id_expr, which here simplifies our source code:

    let expand loc e1 e2 =
      let id = new_id () in
      <:expr<
        let $lid:id$ =
          try do { $e1$; None }
          with [ exn -> Some exn ] in
        do { $e2$;
             match $lid:id$ with
             [ None -> ()
             | Some exn -> raise exn ] } >>

The first occurrence in the quotation of our newly created identifier id is a pattern according to the Camlp5 terminology, and the second occurrence is an expression. This is inferred simply by the context: let patt = in the first case and match expr with in the second case.
Here is some dummy OCaml code where some patterns and expressions are pointed out in comments:

    let x = "abc"                     (* x: pattern, "abc": expression *)
    let _ =
      let z2 =                        (* z2: pattern *)
        let z = 5 * 3 in z * z in     (* z: pattern, 5 * 3 and z * z: expressions *)
      print_int z2;
      match x, z2, Some true with     (* x, z2, Some true: expression *)
          "a", _, None -> ()          (* left of the arrow: pattern *)
        | ("abc" | "ab"), 0, Some false -> print_endline "something"
        | _ when z2 < 10 -> ()        (* the guard z2 < 10 is an expression *)
        | _ -> ()

"Toplevel expressions" such as let x = 2 or type t = A | B of string are actually not expressions, but declarations which are elements of the implementation of the current module. In Camlp5, these are called str_item (str is reminiscent of the struct keyword which introduces submodule implementations). There is a quotation expander for str_items, as well as for other families of syntactic elements that we have not encountered yet.
Let's now remove the unnecessary identifier which has a reserved prefix. It spares the user of our syntax extension from having to remember that __finally is a forbidden prefix. And there is unfortunately no way of generating identifiers in a reserved Camlp5 namespace.
We completely rewrite the quotation so that all the identifiers we introduce are not accessible by user-defined expressions. Here is one solution:

    let expand loc e1 e2 =
      <:expr<
        let f1 () = $e1$
        and f2 () = $e2$ in
        do { (try f1 () with exn -> do { f2 (); raise exn });
             f2 () } >>

There are several reasons why we have to write such twisted code:
- f1 and f2 play their role well since they both record the environment before any new binding is added (and they don't see each other);
- wrapping e2 in a function lets us call it from two places without duplicating the code of e2 (e2 itself might contain try ... finally statements).
And we hope that the compiler handles the closures efficiently.
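Here is a hand-expanded instance of this scheme in plain OCaml, as a sketch of why the closures protect user code. The names log, exn and demo are chosen for this illustration: the user happens to own a variable called exn and mentions it in the finally part.

```ocaml
let log = Buffer.create 16

(* A user-defined value that happens to be called exn. *)
let exn = "user value"

(* Hand expansion of:
     try Buffer.add_string log "body;"
     finally Buffer.add_string log exn *)
let demo () =
  let f1 () = Buffer.add_string log "body;"
  and f2 () = Buffer.add_string log exn in   (* still the user's exn *)
  (try f1 () with exn -> f2 (); raise exn);  (* this exn is the exception *)
  f2 ()
```

Had the second expression been pasted directly into the exception handler, its exn would have referred to the caught exception and the program would not even type-check.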
In general, a good approach would be to implement the initial solution which is more natural, and choose our automatically-generated identifiers so that there is no clash with the user-defined identifiers. There is no simple generic solution for doing this (yet) since it requires a lexical analysis of whole subtrees with a lot of different kinds of nodes. We will see an example in which we actually do something like this later.
We want to add another syntax for expressing the same thing as try ... finally. We want the following:

    try e1 finally e2

to be also writable as:

    before e2 try e1

We will insert a rule for this syntax in the same priority level as we did previously for try ... finally. For now, just notice that we place a vertical bar directly between the rules within the innermost pair of square brackets which represent the extension of the same level:

    EXTEND
      Pcaml.expr: LEVEL "expr1" [
        [ "try"; e1 = Pcaml.expr; "finally"; e2 = Pcaml.expr ->
            expand loc e1 e2
        | "before"; e2 = Pcaml.expr; "try"; e1 = Pcaml.expr ->
            expand loc e1 e2 ]
      ];
    END;;

Understanding the system of levels is the object of a dedicated section of this tutorial.
Using the EXTEND statement, we are able to add or replace grammar rules. We have seen that a rule consists in building a syntax node for the OCaml AST.
Earlier we defined a rule like this:

    "try"; e1 = Pcaml.expr; "finally"; e2 = Pcaml.expr -> expand loc e1 e2

e1 and e2 are two expressions, i.e. two nodes of type MLast.expr. From these expressions, we build a syntax node which is itself an expression. This is the role of our expand function. In that case, we don't have to transform e1 or e2.
However, things are not always so simple. Consider the following problem: in order to make the code for numeric computations easier to read, we want to read ints as floats and their operators (+ - * /) as the equivalent operators over floats (+. -. *. /.). However we don't want this to be applied everywhere in the file, but only in expressions that are introduced by a new FLOAT keyword, since it makes it almost impossible to use ints within the new syntax:

    let x = FLOAT 3/2 - sqrt (1/3)
    let i = 1 + 2 + 3

would be converted into:

    let x = 3. /. 2. -. sqrt (1. /. 3.)
    let i = 1 + 2 + 3

which is less pleasant for the eye.
Using an EXTEND statement, we could relatively easily add rules that interpret int constants as their float equivalent, interpret + as +. and so on (if this is not obvious to you, implement it as an exercise using the knowledge introduced in the previous sections and the file pa_o.ml of the distribution). This would however interpret any occurrence of 2 as the float 2.0 for instance, which is not satisfactory. On the other hand, we want to benefit from the full OCaml syntax within our "FLOAT" expressions (which by the way do not have to be of type float).
One solution to this problem is to define a quotation expander which uses a globally-modified OCaml syntax. In other words, our example would look like this:
let x = <:float< 3/2 - sqrt (1/3) >> let i = 1 + 2 + 3
But this is not exactly what we want and I don't know how to do this. Moreover it might not be so simple since we would have to manipulate two variants of the OCaml grammar at the same time, not only the default one (maybe a look at the implementation of HereDoc could help).
The solution we will adopt is extremely inelegant, but it works after all and does not require as much effort as it seems at first sight.
For testing our syntax extension, we are going to use this Makefile. We will perform the tests over the following program:

    let f x = FLOAT
      let pi = acos (-1) in
      x/(2*pi) - x**(2/3)

    let _ =
      let x = 2.5 in
      Printf.printf "%g -> %g\n" x (f x)

And it should be transformed into this:
File expected.ml:

    let f x =
      let pi = acos (-1.) in
      x /. (2. *. pi) -. x ** (2. /. 3.)

    let _ =
      let x = 2.5 in
      Printf.printf "%g -> %g\n" x (f x)

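Before reading the real thing, the rewriting can be pictured on a toy tree. This is only a sketch: the type expr and the function to_float are invented here, with literals stored as strings and operators as plain identifiers, which is an assumption chosen to resemble the situation described in this tutorial rather than the actual MLast definitions.

```ocaml
(* Hypothetical miniature AST. *)
type expr =
  | Int of string
  | Flo of string
  | Lid of string
  | App of expr * expr

(* Replace int constants and the four basic operators by their float
   counterparts, and recurse through applications. *)
let rec to_float = function
  | Int s -> Flo (string_of_float (float_of_int (int_of_string s)))
  | Lid ("+" | "-" | "*" | "/" as op) -> Lid (op ^ ".")
  | App (f, x) -> App (to_float f, to_float x)
  | (Flo _ | Lid _) as e -> e

(* 1 / 3, represented as the curried application ((/) 1) 3 *)
let one_third = App (App (Lid "/", Int "1"), Int "3")
let converted = to_float one_third
```

Only the first two cases do something specific; everything else is plumbing that walks down the tree, and this proportion gets worse as the number of node kinds grows.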
Here comes our syntax extension. We use predefined quotations for recursively destructuring the syntax tree, as well as for reconstructing it. Only the two cases marked with a comment (int constants and basic operators) are actually specific; the rest is very repetitive and can be reused in other programs that need to rewrite expr nodes.
File pa_float.ml:

    (* The following function takes an expr syntax node and replaces all
       occurrences of int constants and operators by their float equivalent.
       The code is directly derived from the section on the quotations for
       manipulating OCaml syntax trees in the reference manual.
       This code can be easily reused by copy-pasting. *)
    let rec subst_float expr =
      let loc = MLast.loc_of_expr expr in
      let se = subst_float in
      let sel = List.map subst_float in
      let spwel = List.map (fun (p, w, e) -> (p, w, se e)) in
      match expr with
          <:expr< $e1$ . $e2$ >> -> <:expr< $se e1$ . $se e2$ >>
        | <:expr< $anti:e$ >> -> <:expr< $anti:se e$ >>
        | <:expr< $e1$ $e2$ >> -> <:expr< $se e1$ $se e2$ >>
        | <:expr< $e1$ .( $e2$ ) >> -> <:expr< $se e1$ .( $se e2$ ) >>
        | <:expr< [| $list:el$ |] >> -> <:expr< [| $list:sel el$ |] >>
        | <:expr< $e1$ := $e2$ >> -> <:expr< $se e1$ := $se e2$ >>
        | <:expr< $chr:c$ >> -> expr
        | <:expr< ($e$ :> $t$) >> -> <:expr< ($se e$ :> $t$) >>
        | <:expr< ($e$ : $t1$ :> $t2$) >> -> <:expr< ($se e$ : $t1$ :> $t2$) >>
        | <:expr< $flo:s$ >> -> expr
        | <:expr< for $s$ = $e1$ $to:b$ $e2$ do { $list:el$ } >> ->
            <:expr< for $s$ = $se e1$ $to:b$ $se e2$ do { $list:sel el$ } >>
        | <:expr< fun [ $list:pwel$ ] >> -> <:expr< fun [ $list:spwel pwel$ ] >>
        | <:expr< if $e1$ then $e2$ else $e3$ >> ->
            <:expr< if $se e1$ then $se e2$ else $se e3$ >>
        | <:expr< $int:s$ >> ->
            (* we change the int constants into floats *)
            let x = string_of_float (float (int_of_string s)) in
            <:expr< $flo:x$ >>
        | <:expr< ~ $i$ : $e$ >> -> <:expr< ~ $i$ : $se e$ >>
        | <:expr< lazy $e$ >> -> <:expr< lazy $se e$ >>
        | <:expr< let $opt:b$ $list:pel$ in $e$ >> ->
            let pel' = List.map (fun (p, e) -> (p, se e)) pel in
            <:expr< let $opt:b$ $list:pel'$ in $se e$ >>
        | <:expr< $lid:s$ >> ->
            (* we override the basic operators + - * / *)
            (match s with
                 "+" | "-" | "*" | "/" -> <:expr< $lid: s ^ "."$ >>
               | _ -> expr)
        | <:expr< match $e$ with [ $list:pwel$ ] >> ->
            <:expr< match $se e$ with [ $list:spwel pwel$ ] >>
        | <:expr< { $list:pel$ } >> ->
            let pel' = List.map (fun (p, e) -> (p, se e)) pel in
            <:expr< { $list:pel'$ } >>
        | <:expr< do { $list:el$ } >> -> <:expr< do { $list:sel el$ } >>
        | <:expr< $e1$ .[ $e2$ ] >> -> <:expr< $se e1$ .[ $se e2$ ] >>
        | <:expr< $str:s$ >> -> expr
        | <:expr< try $e$ with [ $list:pwel$ ] >> ->
            (* the protected expression is rewritten too *)
            <:expr< try $se e$ with [ $list:spwel pwel$ ] >>
        | <:expr< ( $list:el$ ) >> -> <:expr< ( $list:sel el$ ) >>
        | <:expr< ( $e$ : $t$ ) >> -> <:expr< ( $se e$ : $t$ ) >>
        | <:expr< $uid:s$ >> -> expr
        | <:expr< while $e$ do { $list:el$ } >> ->
            <:expr< while $se e$ do { $list:sel el$ } >>
        | _ ->
            Stdpp.raise_with_loc loc
              (Failure "syntax not supported due to the \
                        lack of Camlp5 documentation")

    EXTEND
      Pcaml.expr: LEVEL "expr1" [
        [ "FLOAT"; e = Pcaml.expr -> subst_float e ]
      ];
    END;;

And the program prog runs nicely:

    $ ./prog
    2.5 -> -1.44413

You can check the result of the preprocessing of our test program in the standard syntax (prog.ppo) or in the revised syntax (prog.ppr).
Nicer solutions to this kind of problem exist in theory, such as generic tree-traversal functions that could be defined automatically from type definitions. But they have yet to be written.
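To make the idea concrete, here is a sketch of how such a traversal could separate the plumbing from the specific rule, again on an invented toy tree. The names map_children, rewrite and float_rule are hypothetical; nothing like this is claimed to exist in the Camlp5 library, and here map_children is written by hand rather than generated.

```ocaml
(* Hypothetical miniature AST. *)
type expr =
  | Int of string
  | Flo of string
  | Lid of string
  | App of expr * expr
  | Tuple of expr list

(* Apply f to the direct children only. This is the repetitive part,
   written once per tree type. *)
let map_children f = function
  | App (e1, e2) -> App (f e1, f e2)
  | Tuple el -> Tuple (List.map f el)
  | (Int _ | Flo _ | Lid _) as leaf -> leaf

(* Bottom-up rewriting: children first, then the node itself. *)
let rec rewrite rule e = rule (map_children (rewrite rule) e)

(* The only part that is specific to the FLOAT extension. *)
let float_rule = function
  | Int s -> Flo (string_of_float (float_of_int (int_of_string s)))
  | Lid ("+" | "-" | "*" | "/" as op) -> Lid (op ^ ".")
  | e -> e

let result = rewrite float_rule (Tuple [Int "2"; App (Lid "-", Int "1")])
```

With this split, a new rewriting such as renaming identifiers only needs a new rule function; the traversal is shared.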
It may be useful to insert some open directives or a few definitions that are used by our runtime system. One solution consists in changing the global function which parses the stream of characters and returns the list of str_items (.ml files) or sig_items (.mli files). This parsing function can be interrupted and reloaded because of directives that might modify the syntax. This is why we must check that the insertion of the initial code is made only once.

    let _ = Printf.printf "Version: %s\n" version

    let insert_this () =
      let loc = Token.dummy_loc in
      (<:str_item< value version = "1.2.3" >>, loc)

    let _ =
      let first = ref true in
      let parse strm =
        let (l, stopped) = Grammar.Entry.parse Pcaml.implem strm in
        let l' =
          if !first then begin
            (* make sure that the insertion happens only once *)
            first := false;
            insert_this () :: l
          end
          else l in
        (l', stopped) in
      Pcaml.parse_implem := parse

It seems that the pretty-printer is confused by this hack, and the output looks strange:
File prog.ppo:
let version = "1.2.3"let _ = Printf.printf "Version: %s\n" version
Nevertheless, the AST in binary format is correct since the program is correctly compiled and executed when pr_dump.cmo is used (always loaded implicitly by camlp5o) instead of pr_o.cmo:

    $ ./prog
    Version: 1.2.3

You can also get the Makefile for this example.
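The "only once" logic does not depend on Camlp5 and can be sketched on its own. In the code below, with_prelude and parse_words are hypothetical names, and the parser is a stand-in that splits text into words; it only illustrates wrapping a function that may be called several times on successive chunks of the same input.

```ocaml
(* Wrap a chunk parser so that prelude is prepended to the result of
   the first call only. parse stands for any function returning a list
   of items; it is a placeholder, not the Camlp5 entry point. *)
let with_prelude prelude parse =
  let first = ref true in
  fun input ->
    let items = parse input in
    if !first then begin
      first := false;
      prelude :: items
    end else
      items

(* Toy parser: split a chunk of text into words. *)
let parse_words s = String.split_on_char ' ' s

let parse = with_prelude "PRELUDE" parse_words
let chunk1 = parse "a b"
let chunk2 = parse "c d"
```

The flag lives in the closure, so it is shared by all the calls made through parse and invisible to everything else.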
In the case of expanding the str_item grammar entry with a new rule, often we want to insert several str_item nodes of the OCaml abstract syntax tree, or sometimes not at all. However, we have to return exactly one node. In this case, we use the declare ... end construct of the revised syntax to group an arbitrary number of str_items:

    <:str_item< declare $x$; $y$; end >>

Or:

    <:str_item< declare $list: list_of_str_items$ end >>

See the section on customized record types for a meaningful example.
The problem is the following: a syntax extension needs to use some data, such as a cache, that has to be used repeatedly but is initialized only once. Moreover we don't want to expose this definition in the module interface since it will be used transparently and locally.
For instance we can create a count
keyword which counts
how many times the execution of the program goes through
this point, and displays the result when the program terminates:
let f l =
  print_string "That's a nice list of items:\n";
  List.iter (fun x -> count; print_endline x) l
That could be expanded into something like this:
let f =
  let __count1 = ref 0 in
  at_exit (fun () ->
             Printf.printf "File \"test_count.ml\", line 3, characters 22-26:\n\
                            count = %i\n"
               !__count1);
  fun l ->
    print_string "That's a nice list of items:\n";
    List.iter (fun x -> incr __count1; print_endline x) l
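The expansion shown here relies on a general OCaml idiom: state created in a let ... in that wraps the function is initialized once and stays invisible from outside. The following plain-OCaml sketch isolates that idiom. The read_count accessor is hypothetical and exists only so that the effect can be observed; the at_exit report is left out.

```ocaml
(* The counter is allocated once, when [f] is defined, and is captured by
   the closure; nothing but [f] (and the demo accessor) can reach it. *)
let f, read_count =
  let count1 = ref 0 in
  let f l = List.iter (fun x -> incr count1; print_endline x) l in
  (* [read_count] is hypothetical: it only exists to observe the counter *)
  (f, fun () -> !count1)

let () =
  f ["a"; "b"; "c"];
  f ["d"]
```

After the two calls, the hidden counter has been incremented once per list element.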
Although there is no built-in functionality for doing this,
you can use Yutaka Oiwa's Declare_once library which is included
in the distribution of his
regexp-pp
package. Once compiled, the
Declare_once
module can be used as follows:
let create_some_ast_node some_param =
  ...
  let expr = ... in
  let name_for_my_expr = ... in
  Declare_once.declare
    ~package:"my_package" name_for_my_expr (Declare_once.Expr expr);
  ...
It works by adding a pair (name, expr) to a list of pending declarations. When the value of the current str_item is computed, this list of declarations is inserted in a way which is similar to our example, so that these declarations are not visible in the module interface but are computed only once.
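To make the idea of pending declarations more concrete, here is a toy model in plain OCaml. It is not the implementation of Declare_once: the names pending, declare and flush are assumptions made for this sketch, and code is represented as strings rather than AST nodes.

```ocaml
(* Toy model: declarations are (name, code) pairs kept as strings.
   None of these names belong to the real Declare_once library. *)
let pending : (string * string) list ref = ref []

(* Register a declaration unless one with the same name is already pending *)
let declare name code =
  if not (List.mem_assoc name !pending) then
    pending := (name, code) :: !pending

(* Wrap [body] in the pending declarations, oldest first, then reset *)
let flush body =
  let decls = List.rev !pending in
  pending := [];
  List.fold_right
    (fun (name, code) acc -> Printf.sprintf "let %s = %s in %s" name code acc)
    decls body

let () =
  declare "cache" "Hashtbl.create 16";
  declare "cache" "Hashtbl.create 16";   (* ignored: already pending *)
  declare "count" "ref 0"

let wrapped = flush "fun x -> x"
```

Each declaration ends up in front of the function body, so it is evaluated once and does not appear in the module interface.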
This section is best illustrated with the pa_o.ml
file
of the Camlp5 distribution. It is time for you to retrieve it
and keep a copy somewhere, if you haven't already done so.
First, look at the expr
entry of the grammar.
The first occurrence of expr:
in the file defines
what is commented as the "core expressions". It defines
many rules, and these rules are grouped into different precedence
levels,
and many of them are named explicitly:
"top"
,
"expr1"
,
":="
,
"||"
,
"&&"
,
"apply"
,
"simple"
, etc.
Later in the EXTEND
statement, the expr
entry
is extended further with other rules.
Some of these additional rules can be inserted in already existing
levels. This extends the "expr1"
level
of the expr
entry with
an additional rule:
expr: LEVEL "expr1" [ [ "fun"; p = labeled_patt; e = fun_def -> <:expr< fun $p$ -> $e$ >> ] ] ;
The innermost brackets define a level or, as here, an extension of an existing level. A level may contain several rules, separated by vertical bars. Lists of levels also use the vertical bar as a separator, so be careful not to confuse the two. Please do not do this:
(* 2 levels, 2 rules, 1 level to extend:
   Which level is extended? Which level is inserted? Where? *)
entry: LEVEL "some level" [
  [ some rule ]
| [ some other rule ]
];
which is different from:
(* extending 1 level with 2 rules: this is clear *)
entry: LEVEL "some level" [
  [ some rule
  | some other rule ]
];
Levels have this property: when the parser is looking for a given syntax entry, it starts at a given level (by default the first one) and looks for rules that can be satisfied. If no rule can be satisfied in the current level, it goes to the next level, and repeats the same process. The practical consequences are that:
pa_o.ml
).
Viewed like this, addition has a higher priority than multiplication.
As stated in the reference manual, only the LEVEL
instruction
can be used to extend an existing level. Other instructions
that specify where a given level must be inserted are available:
FIRST
, LAST
, AFTER
some level,
BEFORE
some level. These positions refer to the order
in which they are written, which is the order in which the parser tries to
match the rules.
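The way levels are tried in order can be imitated by hand in plain OCaml. The sketch below is not Camlp5 code: it is a small recursive-descent evaluator whose three functions play the role of three levels of a hypothetical entry, written in the order "+", "*", "simple". All names are assumptions made for this illustration.

```ocaml
type token = Int of int | Plus | Times | LParen | RParen

(* First level: rules for "+"; operands are parsed at the next level *)
let rec level_add toks =
  let (lhs, rest) = level_mul toks in
  add_tail lhs rest
and add_tail lhs = function
  | Plus :: rest ->
      let (rhs, rest') = level_mul rest in
      add_tail (lhs + rhs) rest'          (* left-associative *)
  | rest -> (lhs, rest)
(* Second level: rules for "*"; operands are parsed at the last level *)
and level_mul toks =
  let (lhs, rest) = level_simple toks in
  mul_tail lhs rest
and mul_tail lhs = function
  | Times :: rest ->
      let (rhs, rest') = level_simple rest in
      mul_tail (lhs * rhs) rest'
  | rest -> (lhs, rest)
(* Last level: literals and parentheses, which restart at the first level *)
and level_simple = function
  | Int n :: rest -> (n, rest)
  | LParen :: rest ->
      (match level_add rest with
       | (v, RParen :: rest') -> (v, rest')
       | _ -> failwith "expected ')'")
  | _ -> failwith "expected an integer or '('"

let eval toks =
  match level_add toks with
  | (v, []) -> v
  | _ -> failwith "trailing tokens"
```

Because "*" lives in a later level than "+", its operands are grouped first: 1 + 2 * 3 is read as 1 + (2 * 3).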
Suggested exercise: implement and test a syntax extension which supports
a where
construct.
For instance,
a + b where a = 1 and b = 2
means
let a = 1 and b = 2 in a + b
We decide that
let a = 1 in a where a = 2
should be read as
let a = 1 in (a where a = 2) (* returns 2 *)
and not
(let a = 1 in a) where a = 2 (* returns 1 *)
Also, the where
construct is right-associative:
x + y where x = y where y = 1
means
x + y where x = (y where y = 1) (* depends on an external y *)
and not
(x + y where x = y) where y = 1 (* returns 2 *)
You are encouraged to reuse the
let_binding
grammar entry
(Pcaml.let_binding
). Right-associativity
must be specified with RIGHTA
since the default is
left-associativity (LEFTA
); you can find examples
of these specifications in the pa_o.ml
file.
After completion of this exercise, you should be able to:
This section gives hints on how to parse some blocks using a custom parser. We will not give too much detail here, since the recommended way of doing this is by using quotations. Make sure you understand the rest of this document before reading this.
When the language extension that must be parsed locally cannot be parsed using the Camlp5 grammar system, we would normally use quotations. Consider the following example where a graph is represented using ASCII art:
"Node 1"---B---D
   |        \ /
   +---------C
The graph should be expanded into the following type definitions:
type node_1 = [ `B of b | `C of c ]
and b = [ `Node_1 of node_1 | `C of c | `D of d ]
and c = [ `Node_1 of node_1 | `B of b | `D of d ]
and d = [ `B of b | `C of c ]
Actually, this graph should be included in an OCaml program, so we would
create a quotation expander named graph
, and our piece of program
should be written like this:
<:graph<
"Node 1"---B---D
   |        \ /
   +---------C
>>
However, one limitation of quotations is that they must be expanded into either
an expr
or a patt
, but not into a type definition,
which is a str_item
. So this will not be accepted as-is
by the parser.
Solution 1: instead of using a quotation, just create a
GRAPH
keyword which
will be followed by a string literal. This can be expanded into a
str_item
without specific difficulties, given a function
which will parse the string. The problem here is that double-quotes
must be protected by backslashes, which may be inconvenient.
The program would look like this, which is now quite unreadable
unless we avoid double-quoted labels:
GRAPH "
\"Node 1\"---B---D
   |        \ /
   +---------C
"
Solution 2: same as solution 1, but in addition we define a quotation
expander named string
which just lets us write a string
literal using the quotation syntax. In this case,
only the >>
sequences
would have to be protected by backslashes.
The example becomes:
GRAPH <:string<
"Node 1"---B---D
   |        \ /
   +---------C
>>
Now, if the token stream returned by the lexer is satisfactory, but your grammar needs to scan the stream first without consuming it, you can do that. You can actually hook any external parser at this point. It will operate on the token stream, with its limitations (whitespace is discarded, tokens may not be recognized the way you want in your sublanguage, ...).
The easiest way of generating error messages that indicate a location in the source file is the following:
Stdpp.raise_with_loc _loc (Failure "this is an error message")
It displays the location by indicating file, line number and character offsets, as usual in OCaml. Under Emacs with tuareg-mode, this makes it possible to jump directly to this location in the source file. However, this raises an exception, which is not always wanted.
A similar error message can be produced using the following function:
open Printf
open Lexing

(* works only if done immediately, since the file name can change
   when a #line or #use directive is encountered *)
let string_of_loc _loc =
  let start, stop = _loc in
  let char1 = start.pos_cnum - start.pos_bol in
  let char2 = char1 + stop.pos_cnum - start.pos_cnum - 1 in
  sprintf "File %S, line %i, characters %i-%i:\n"
    !Pcaml.input_file (* should be: start.pos_fname *)
    start.pos_lnum char1 char2
Beware that there has been a bug which caused the pos_fname
record field to not be set correctly
(bug report 3886).
This is why we don't use it, although it should be a better solution
since it does not depend on any external state.
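The character offsets computed by string_of_loc can be checked without Camlp5. The sketch below is a variant that takes the file name as an explicit argument (an assumption made here to avoid depending on Pcaml.input_file) and builds two Lexing.position values by hand; the positions are invented for the example.

```ocaml
open Lexing

(* Same arithmetic as string_of_loc, with the file name passed explicitly *)
let string_of_loc file (start, stop) =
  let char1 = start.pos_cnum - start.pos_bol in
  let char2 = char1 + stop.pos_cnum - start.pos_cnum - 1 in
  Printf.sprintf "File %S, line %i, characters %i-%i:\n"
    file start.pos_lnum char1 char2

(* Hypothetical location: line 3 starts at offset 40 in the file,
   and the token spans offsets 62 (included) to 67 (excluded) *)
let example =
  let start =
    { pos_fname = "test_count.ml"; pos_lnum = 3; pos_bol = 40; pos_cnum = 62 } in
  let stop = { start with pos_cnum = 67 } in
  string_of_loc "test_count.ml" (start, stop)
```

The start column is the offset from the beginning of the line (62 - 40 = 22), and the end column is the last character included in the token (22 + 5 - 1 = 26).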
Of course, the user of the syntax extension must not load
pr_o.cmo
(conversion to OCaml source file in standard syntax)
when preprocessing a source file with camlp5o
,
since it does not preserve the original location indicators.
The default output format should be used. It is provided by the
pr_dump.cmo
file which is preloaded in
camlp5o
or camlp5r
. This format is a
binary representation of the abstract syntax tree,
with locations that match the source code.
See also loc vs. _loc and why you should
always use the -loc
option when preprocessing
a syntax extension.
These are guidelines which should make it easier for programmers to actually use the syntax extensions that you may have written.
If possible, do not override existing rules: this might be fine if only your extension is being used, but if another extension does the same, only one of these extensions can be used at a time. Sometimes, deleting a rule and rewriting an extended version of it is the only way to "extend" existing syntax constructs, but using other keywords instead is always possible.
EXTEND statements are expressions and they can be parametrized by some runtime parameters. It is a good idea to provide an option that lets users choose a keyword other than the default one. For instance, instead of this:
(* file pa_eval.ml *)
...
EXTEND
  Pcaml.expr: [
    [ "eval"; e = Pcaml.expr -> ... ]
  ]
END
we would write the following:
(* file pa_eval.ml *)
...
let extend opt =
  let kw = !opt in
  EXTEND
    Pcaml.expr: [
      [ $kw$; e = Pcaml.expr -> ... ]
    ]
  END

let _ =
  let eval = ref "eval" in
  Pcaml.add_option "-eval-kw" (Arg.Set_string eval)
    "<kw> use another keyword than \"eval\"";
Now the users of the syntax extension can load it
with camlp5o pa_eval.cmo -eval-kw EVAL
if they want the new keyword
to be EVAL
instead of eval
.
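The option itself is an ordinary specification from the standard Arg module. The following sketch shows its effect in isolation; it uses Arg.parse_argv from the standard library on a simulated command line instead of Pcaml.add_option, so that it can run without Camlp5. The program name pa_eval in the argument array is only a placeholder.

```ocaml
(* Default keyword, overridden by the -eval-kw option *)
let eval_kw = ref "eval"

let options =
  [ ("-eval-kw", Arg.Set_string eval_kw,
     "<kw> use another keyword than \"eval\"") ]

(* Simulate the command line "pa_eval -eval-kw EVAL" *)
let () =
  Arg.parse_argv ~current:(ref 0)
    [| "pa_eval"; "-eval-kw"; "EVAL" |]
    options
    (fun _ -> ())                 (* ignore anonymous arguments *)
    "usage: pa_eval [options]"
```

Once the arguments are parsed, the reference holds the user's keyword, which the extension can read before installing its grammar rules.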
pa_eval.cmo
)
pr_eval.cmo
)
q_eval.cmo
)
-eval-kw
).
__pa_eval1234
).
Please note that many existing extensions do not respect all of these (new, unofficial) guidelines, but if you follow them it means less trouble for you in the future.
let try name = expr1 in expr2 with exception-handler
Sometimes, it is useful to restrict the scope of an exception handler.
The regular try
... with
lets us do this:
let rec cat () =
  try
    let c = input_char stdin in
    print_char c;
    cat ()
  with End_of_file -> ()

let _ = cat ()
but it catches exceptions that might be raised not only during the call to
input_char
but also print_char
and cat
itself.
That is problematic for several reasons that we don't want to discuss here.
In order to catch the exceptions that are raised during the call to
input_char
, it can be quite difficult to keep the code
simple and readable. Here is one solution which is relatively natural:
let rec cat () =
  match try Some (input_char stdin) with End_of_file -> None with
      Some c -> print_char c; cat ()
    | None -> ()

let _ = cat ()
Another solution, which is hard to read but simple to implement mechanically is the following:
let rec cat () =
  (try
     let c = input_char stdin in
     fun () -> print_char c; cat ()
   with End_of_file -> fun () -> ()) ()

let _ = cat ()
This is the solution we choose here to implement a new let-try-in-with construct which was suggested by Don Syme in a message to caml-list. It looks like this:
let rec cat () =
  let try c = input_char stdin in
  print_char c;
  cat ()
  with End_of_file -> ()

let _ = cat ()
Note that we just inverted the let
and
try
keywords with respect to the original program.
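The closure trick used by this construct can be tested in plain OCaml. In the sketch below, run is a hypothetical helper written for this illustration: the handler covers only the evaluation of the bound expression, because the body is returned as a closure and applied outside of the try block.

```ocaml
exception Stop

(* [run next body] evaluates [next ()] under the handler, but runs [body]
   outside of it, by returning a closure from the try block. *)
let run next body =
  (try
     let x = next () in
     fun () -> body x
   with Stop -> fun () -> "handled") ()

(* Exception raised by the bound expression: caught *)
let caught = run (fun () -> raise Stop) (fun _ -> "body")

(* Exception raised by the body: not caught by the handler above *)
let escaped =
  try ignore (run (fun () -> 1) (fun _ -> raise Stop)); false
  with Stop -> true
```

The second case is the whole point of the construct: an exception raised after the binding escapes the handler instead of being silently absorbed.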
The syntax extension is pretty straightforward and reuses
some entries of the grammar of OCaml: Pcaml.let_binding
,
Pcaml.expr
and Pcaml.patt
.
You can notice that these entries are defined in the Pcaml
module, not in Pa_o
(file pa_o.ml
).
The reason is that
the grammar for the revised syntax of OCaml (file pa_r.ml
)
shares the same public entries.
This leaves the possibility of writing syntax extensions of the
regular syntax (as we do in this tutorial)
which also work to extend the revised syntax.
Unfortunately, many entries found in pa_o.ml
that we would like to modify are not visible from outside.
In this example, we create a new entry lettry_case
which is very similar to the match_case
entry found
in pa_o.ml
:
File pa_lettry.ml [html]:
EXTEND
  GLOBAL: Pcaml.expr;

  Pcaml.expr: LEVEL "expr1" [
    [ "let"; "try"; o = OPT "rec"; l = LIST1 Pcaml.let_binding SEP "and";
      "in"; e = Pcaml.expr;
      "with"; pwel = LIST1 lettry_case SEP "|" ->
        <:expr< (try let $opt: o <> None$ $list:l$ in fun () -> $e$
                 with [ $list:pwel$ ]) () >> ]
  ];

  lettry_case: [
    [ p = Pcaml.patt; w = OPT [ "when"; e = Pcaml.expr -> e ];
      "->"; e = Pcaml.expr ->
        (p, Ploc.VaVal w, <:expr< fun () -> $e$ >>) ]
  ];
END;;
When a GLOBAL
statement is present, it means that
any new entry will be created automatically and will not be
visible outside of the EXTEND
block.
To make the lettry_case entry visible, we would proceed as follows:
let lettry_case = Grammar.Entry.create Pcaml.gram "lettry_case";;

EXTEND
  (* no GLOBAL statement *)
  Pcaml.expr: ... ;
  lettry_case: ... ;
END;;
Our program in the new syntax is successfully transformed into this one:
File prog.ppo:
let rec cat () =
  (try
     let c = input_char stdin in
     fun () -> print_char c; cat ()
   with End_of_file -> fun () -> ()) ()

let _ = cat ()
The program prints on stdout the characters read from stdin:
$ echo Hello | ./prog
Hello
Warning: we also should extend the Pcaml.str_item
entry,
using the same code as for Pcaml.expr
, just
like for the standard let-in construct found in pa_o.ml
.
Alternate syntax:
we might prefer a syntax where the with
is internal.
It makes it easier to realize that the recursive call to our
cat
function is a tail-call.
This was suggested by
Daniel
de Rauglaudre. It goes like this:
let rec cat () =
  let try c = input_char stdin
  with End_of_file -> () in
  print_char c;
  cat ()

let _ = cat ()
Implementing this is left as an exercise for the reader.
1/2
as 1. /. 2.
, but only locally
A full solution to this problem is given earlier, in that section.
Although it is not very easy to extend the existing syntax for type definitions, we can easily add alternative syntaxes.
Here we will create a record
keyword that we
will use for the definition of records where some fields
are defined with default values.
A function with labeled arguments will be generated automatically
and should be used by the user for creating these records.
This is our test program:
record bob = { foo : string = "Hello";
               bar : string;
               mutable n : int = 1 }

record weird = { x : weird option = (Some (create_weird ~x:None ())) }

let _ =
  let x = create_bob ~bar:"World" () in
  x.n <- x.n + 1;
  Printf.printf "%s %s %i\n" x.foo x.bar x.n
There is no big difficulty since we chose not to extend
the type
syntax for type definitions but to create
a new one, just for records.
Note that the expressions that are given as default values
for record fields are parsed from the "simple"
precedence level. It means that unless parentheses are placed
around the expression, the semicolon will be interpreted as a
separator between two record fields, not between two expressions.
File pa_records.ml [html]:
let make_record_expr loc l =
  let fields =
    List.map
      (fun ((loc, name, mut, t), default) ->
         (<:patt< $lid:name$ >>, <:expr< $lid:name$ >>))
      l in
  <:expr< { $list:fields$ } >>

let expand_record loc type_name l =
  let type_def =
    let fields = List.map fst l in
    <:str_item< type $lid:type_name$ = { $list:fields$ } >> in
  let expr_def =
    let record_expr = make_record_expr loc l in
    let f =
      List.fold_right
        (fun ((loc, name, mut, t), default) e ->
           match default with
               None -> <:expr< fun ~ $Ploc.VaVal name$ -> $e$ >>
             | Some x -> <:expr< fun ? ($lid:name$ = $x$) -> $e$ >>)
        l
        <:expr< fun () -> $record_expr$ >> in
    <:str_item< value rec $lid: "create_" ^ type_name$ = $f$ >> in
  <:str_item< declare $type_def$; $expr_def$; end >>

EXTEND
  GLOBAL: Pcaml.str_item;

  Pcaml.str_item: LEVEL "top" [
    [ "record"; type_name = LIDENT; "=";
      "{"; l = LIST1 field_decl SEP ";"; "}" ->
        expand_record loc type_name l ]
  ];

  field_decl: [
    [ mut = OPT "mutable"; name = LIDENT; ":"; t = Pcaml.ctyp;
      default = OPT [ "="; e = Pcaml.expr LEVEL "simple" -> e ] ->
        ((loc, name, (mut <> None), t), default) ]
  ];
END;;
Our program prog.ml has been converted into prog.ppo and works as expected:
$ ./prog
Hello World 2
You can download the Makefile.
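For readers who want to see what the record keyword stands for, here is a hand-written approximation, in regular OCaml, of the code generated for bob. It is a sketch of the idea rather than the exact output of the preprocessor, which builds the function with nested fun expressions and let rec.

```ocaml
type bob = { foo : string; bar : string; mutable n : int }

(* Fields with a default become optional arguments, the others become
   labeled arguments; the final unit lets the defaults apply. *)
let create_bob ?(foo = "Hello") ~bar ?(n = 1) () = { foo; bar; n }

let message =
  let x = create_bob ~bar:"World" () in
  x.n <- x.n + 1;
  Printf.sprintf "%s %s %i" x.foo x.bar x.n
```

The final unit argument is what allows the optional arguments to be omitted at the call site.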
This is left as an exercise for the reader:
we decide that the rec
keyword preceding a function makes this function available under
the name self
throughout its definition.
For instance, the following:
List.map (rec function 0 -> 1 | n -> n * self (n - 1)) [1;2;3;4;5]
would be transcribed into:
List.map (let rec self = function 0 -> 1 | n -> n * self (n - 1) in self) [1;2;3;4;5]
Hint: some expressions other than functions can be defined recursively. How would you define the following list in our new syntax?
(* This is a circular list *)
let rec circ = 1 :: 2 :: circ
Extending the syntax of OCaml consists in adding or replacing rules
in the grammar.
However the terminal rules, i.e. the tokens returned by the lexer such
as LIDENT
, STRING
or INT
, cannot be
extended.
Consider the following syntax extension where we create a one
keyword which is simply replaced by 1
in expressions and in
patterns:
EXTEND
  Pcaml.expr: LEVEL "simple" [ [ "one" -> <:expr< 1 >> ] ];
  Pcaml.patt: LEVEL "simple" [ [ "one" -> <:patt< 1 >> ] ];
END;;
This will not replace every occurrence of one
by 1
,
but only where one
appears as a lowercase identifier
as defined by the lexer, as an expression or a pattern.
So one_apple
and "one + 2"
will
remain unchanged.
If you need for instance to parametrize the name of an identifier by adding a suffix such as a version number, you can't do it by defining grammar rules. In that case, one solution is to use a simple preprocessor which simply ignores the context, or to define your own quotation expander.
Quotations behave as one single token, which will be expanded into a node of the OCaml syntax tree which is either an expression (expr) or a pattern (patt). Quotations are a good way to introduce a syntax which is radically different from OCaml. All you have to do is define a syntax expander, i.e. a function which builds an expression or a pattern from a raw string. For this you can use any technique you like such as Camlp5 (lexer + grammar), Ocamllex + Ocamlyacc, regular expressions, etc. See the Camlp5 manual for the details on how to define a quotation expander.
End of lines that separate tokens and comments are eliminated by the lexer. This is why nothing can be done to solve this problem with extensible grammars, although it should be relatively easy to adapt the lexer for this task.
Adding customized delimiters for string literals cannot be done by extending the grammar.
One alternative is to define a quotation expander whose job is to
transform the contents of the quotation into a valid OCaml string.
In this case, instead of escaping
the double-quotes ("
), we would have
to escape the end-of-quotation delimiters (>>
).
The code which would be compiled and loaded by the preprocessor should look
like this (not tested):
let _ =
  (* we define a very simple quotation expander *)
  let expander is_expr quotation_contents =
    (* addition of double-quotes around the string
       and backslashes where necessary *)
    let s = Printf.sprintf "%S" quotation_contents in
    (* the result is plain-text OCaml code (concrete syntax) *)
    Quotation.ExStr s in
  Quotation.add "string" expander;
  (* we decide that `string' will be the default quotation expander *)
  Quotation.default := "string"
Now, in a program which is preprocessed with this, the three following notations are equivalent:
<:string< I don't want to escape this: """""""""" >>
<< I don't want to escape this: """""""""" >>
" I don't want to escape this: \"\"\"\"\"\"\"\"\"\" "
The syntax expander can also return a node of the AST, but it is more complicated to implement and we lose the location of the quotation, which can make debugging quite unpleasant (again, not tested):
let _ =
  (* we define a very simple quotation expander *)
  let quote_string s =
    (* no double-quotes around the strings in AST nodes! *)
    String.escaped s in
  let loc = Token.dummy_loc (* avoid doing this whenever you can *) in
  (* here the result is a pair of functions that return the appropriate
     node of the syntax tree (abstract syntax) *)
  let expand_expr quotation_contents =
    let s = quote_string quotation_contents in
    <:expr< $str:s$ >>
  and expand_patt quotation_contents =
    let s = quote_string quotation_contents in
    <:patt< $str:s$ >> in
  let expander = Quotation.ExAst (expand_expr, expand_patt) in
  Quotation.add "string" expander;
  Quotation.default := "string"
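The two expanders differ in how they quote the contents: the "%S" format produces a complete string literal, while String.escaped produces only the escaped contents. This difference can be checked in plain OCaml; the sample string is arbitrary.

```ocaml
let raw = "say \"hi\""

(* %S adds the surrounding quotes and the escapes: concrete syntax *)
let as_source = Printf.sprintf "%S" raw

(* String.escaped adds only the escapes: suitable inside an AST node *)
let as_ast_payload = String.escaped raw
```

In other words, the first result is the second one wrapped in double quotes.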
f `map` list
The backquote symbol (`
) is already in use
as a prefix operator for constructors of polymorphic variants and in
the Camlp5 extension for stream parsers.
Other notations could be used though.
Maybe using &
is not possible due to priority issues,
but we would have something like this:
let add a b = (2 * a) + b
let c = 1 &add 2
which means:
let c = add 1 2
and not:
let c = (add 2) 1
That makes a good exercise for the reader! I don't know if there is an acceptable solution, so let me know if you find one.
Hint: we have to define an infix operator
which is accepted by Camlp5 and available (or that can be overridden),
and has a stronger precedence than function application
("apply"
level) just like .
or
#
.
This problem is: how to define a function which returns the nth element of a tuple of any size?
Unfortunately, Camlp5 cannot help much here since it doesn't know the type of the expressions it manipulates.
But if we agree to specify how many fields the tuple has, it becomes feasible. We would have to define a syntax which would be close to this:
let x = (1, "abc", None)
let third_field = x.3 3
which would mean:
let x = (1, "abc", None)
let third_field = (match x with (_, _, field) -> field)
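Once the position and the size of the tuple are known, producing the projection is mechanical. The following sketch generates the expansion as source text rather than as AST nodes, which keeps it independent of Camlp5; tuple_field_pattern and projection are hypothetical names chosen for this illustration.

```ocaml
(* Build the source text of a pattern selecting field [k] (1-based)
   of a tuple of size [n], e.g. "(_, _, field)" for k = 3, n = 3. *)
let tuple_field_pattern ~k ~n =
  if k < 1 || k > n then invalid_arg "tuple_field_pattern";
  let item i = if i = k then "field" else "_" in
  "(" ^ String.concat ", " (List.init n (fun i -> item (i + 1))) ^ ")"

(* Build the source text of the whole projection expression *)
let projection ~k ~n subject =
  Printf.sprintf "(match %s with %s -> field)"
    subject (tuple_field_pattern ~k ~n)
```

A real syntax extension would build the same pattern with quotations, one wildcard node per ignored field.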
As often, the difficulty is to find a nice syntax which does not create ambiguities and is accepted by Camlp5.
<:expr< [ $list:my_list$ ] >>
In the syntax tree, there is a node for each node of a list, and there is no predefined function that will create all these AST nodes automatically.
Let's say we want to create a notation for lists without semicolons between the elements. A program using this notation would look like this:
let _ =
  let a = [| 123; 456 |] in
  List.iter
    (fun i -> print_int i; print_newline ())
    (LIST 1 2 3 a.(1))
The syntax extension is rather short, and easy if you understand the system of levels:
File pa_lists.ml [html]:
let expr_list loc l =
  List.fold_right
    (fun head tail -> <:expr< [ $head$ :: $tail$ ] >>)
    l
    <:expr< [] >>

EXTEND
  Pcaml.expr: [
    [ "LIST"; l = LIST0 Pcaml.expr LEVEL "." -> expr_list loc l ]
  ];
END;;
As announced, we need to build the nodes of the AST that represent the
nodes of the list. This is the purpose of the expr_list
function.
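The same fold can be tried on a toy syntax tree, without Camlp5. In this sketch, the expr type is a stand-in invented for the illustration: Nil and Cons play the role of the AST nodes for [] and ::.

```ocaml
(* Toy AST: only what is needed to represent list expressions *)
type expr = Int of int | Nil | Cons of expr * expr

(* One Cons node per element, ending with Nil, built from the right *)
let expr_list l =
  List.fold_right (fun head tail -> Cons (head, tail)) l Nil

let example = expr_list [Int 1; Int 2; Int 3]
```

Folding from the right is what makes the first element end up at the top of the tree.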
The output of the program is the following:
$ ./prog
1
2
3
456
You can also download the following files for this example: the Makefile and the program after conversion to regular OCaml prog.ppo.
<:expr< let f $list:args$ = $e$ >>
Functions as represented in the AST only take one argument. So this:
let f x y z = x + y + z
is represented in the AST as:
let f = (fun x -> (fun y -> (fun z -> x + y + z)))
Such definitions have to be built using higher-order functions
such as List.fold_right
or List.fold_left
(see previous section).
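As a sketch of this construction, here is the same fold on a toy syntax tree written in plain OCaml. The expr type and the make_fun helper are assumptions made for the illustration; with Camlp5, the Fun node would be built with a quotation instead.

```ocaml
(* Toy AST: variables, addition, and one-argument functions *)
type expr = Var of string | Add of expr * expr | Fun of string * expr

(* Wrap [body] in one Fun node per argument, the first argument outermost *)
let make_fun args body =
  List.fold_right (fun arg e -> Fun (arg, e)) args body

(* let f x y z = x + y + z *)
let f_def =
  make_fun ["x"; "y"; "z"] (Add (Add (Var "x", Var "y"), Var "z"))
```

The result has the nested shape shown above for fun x -> (fun y -> (fun z -> ...)).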
Camlp5 sometimes returns incorrect locations in error messages. Camlp5 3.08.1 was particularly difficult to use for this reason, so if you are using OCaml 3.08.1, you should upgrade your OCaml system.
_loc
(or loc
)
Between the release of OCaml 3.08.2 and 3.08.3, the default identifier
for locations used in syntax extensions silently
changed from loc
to _loc
.
For compatibility reasons, pass the -loc _loc
option
(or -loc loc
) to camlp5o
as we did in the Makefiles
of this tutorial.
<:expr< f ~$lid:labelname$ >>
doesn't work
Labels of function arguments are a special kind of node of the syntax tree
which is simply represented using the string
type
and only include lowercase identifiers.
Instead of writing this:
let label = "x" in <:expr< f ~$lid:label$ >>
one should simply write this:
let label = "x" in <:expr< f ~$label$ >>
Not_found
is raised during the preprocessing
If the Not_found
exception is raised during the preprocessing
phase (typically while running camlp5o
or starting a custom toplevel), the reason may be that
a DELETE_RULE
statement tries to delete a rule which
does not exist. Some rules may be slightly changed from one version of Camlp5
to another or they might move to other grammar entries.
For the sake of compatibility, it seems to be a good practice to catch
and ignore any Not_found
exception that might be raised
by a DELETE_RULE
statement, which is simply an expression.
For instance, this will fail with some older versions of Camlp5:
DELETE_RULE Pcaml.patt: LIDENT END
But that should be a much better compromise:
(try DELETE_RULE Pcaml.patt: LIDENT END with Not_found -> ())
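When several rules must be deleted, the same pattern can be factored into a small helper. The sketch below shows the idea in plain OCaml; ignore_not_found is a hypothetical helper, and the hash table with its delete_rule function is only a stand-in for a grammar entry, since DELETE_RULE itself requires Camlp5.

```ocaml
(* Run [f] and ignore Not_found, which here stands for
   "the rule to delete does not exist in this version" *)
let ignore_not_found f = try f () with Not_found -> ()

(* Stand-in for a rule table; real rules live in Camlp5 grammar entries *)
let rules = Hashtbl.create 8

let delete_rule name =
  if Hashtbl.mem rules name then Hashtbl.remove rules name
  else raise Not_found

let () =
  Hashtbl.replace rules "patt: LIDENT" ();
  ignore_not_found (fun () -> delete_rule "patt: LIDENT");
  ignore_not_found (fun () -> delete_rule "patt: LIDENT")  (* no failure *)
```

The second deletion finds nothing to delete, and the helper turns that into a no-op instead of a failure of the preprocessor.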