
2009-08-21

Parse::Marpa - trying to read the manuals

The usage of Parse::Marpa is very different from that of any other parser or parser generator I've seen so far. Nevertheless, it is a promising tool: it is an Earley parser and is therefore able to parse all context-free languages. Note that Perl 5.10 is required (I installed a fresh 5.11 just for testing the module).
The grammar is given in a language called "MDL" (Marpa Demonstration Language). MDL is apparently not the only possibility: using the low-level interface, users should be able to write their own grammar specification language. I cover only the most elementary features needed to write a grammar and omit almost all of the cool stuff. For comparison, the PostScript version of the most relevant manual chapters is about 70 pages; I created it with
perldoc -u "$1" | pod2man | groff -man -Tps
where $1 is the name of the manual page.

A minimal grammar specification looks like this:
start symbol is englishsentence.
semantics are perl5. version is 1.004000.
default lex prefix is /\s+|\A/.
concatenate lines is q{ (scalar @_) ? (join "-", (grep { $_ } @_)) : undef; }.
default action is concatenate lines.

englishsentence: subject, verb, conjunction, object.

englishsentence: subject, verb, object.

specializernoun: noun. q{ "specializernoun($_[0])" }.

ordinarynoun: noun. q{ "ordinarynoun($_[0])" }.

subject: specializernoun, ordinarynoun. q{ "spsub($_[0]+$_[1])" }.

subject: noun. q{ "subject($_[0])" }.

noun: nounlex.

nounlex matches /fruit|banana|time|arrow|flies/.

verb: verblex. q{ "verb($_[0])" }.

verblex matches /like|flies/.

object: preposition, noun. q{ "ob(prep($_[0])+n($_[1]))" }.

conjunction: /like/. q{ "conjunction($_[0])" }.

preposition: prepositionlex.

prepositionlex matches /a\b|an/.
Main principles:

A grammar specification consists of paragraphs (separated by empty lines).
A paragraph consists of sentences (terminated with a dot).

At the beginning of the grammar, a few sentences about the start symbol, the semantics and the version are mandatory. My entry for the default lex prefix is the simplest one I was able to create: whitespace before each word, unless we are at the beginning of the text. The "concatenate lines"/"default action" part is copied from the manual (and I do not really understand it yet). All those sentences contain the word "is" or "are"; the two are considered synonyms.
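Since I copied the "concatenate lines" code without fully understanding it, it helps to run its body as an ordinary Perl subroutine. This is only a sketch, assuming that Marpa calls the action with the values of the right-hand-side symbols in @_; the sub name concatenate_lines is my own invention, not something Parse::Marpa provides.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the body of the MDL "concatenate lines" action,
# turned into an ordinary Perl sub so it can be called directly.
sub concatenate_lines {
    # With no values at all, the result is undef. Otherwise every false
    # value (undef, "" and "0") is dropped and the rest are joined with "-".
    (scalar @_) ? (join "-", (grep { $_ } @_)) : undef;
}

print concatenate_lines("subject(time)", "verb(flies)"), "\n";   # subject(time)-verb(flies)
print concatenate_lines("a", undef, "", "b"), "\n";               # a-b
my $none = concatenate_lines();
print defined $none ? "defined\n" : "undef\n";                    # undef
```

So the default action glues the values of all right-hand-side symbols together with "-". Note that grep { $_ } throws away every false value, which includes the string "0" as well as undef and the empty string, and that a call in which every value is false returns the empty string rather than undef.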

Production sentences have the form
nonterminalsymbol colon comma-separated-list-of-symbols dot

They can be followed by an action of the form

q{ dosomethingwith($_[0],$_[1]); }.

The manual says that one should not use "return" inside an action. Inside an action, the elements of @_ are the values of the symbols in the comma-separated list, in the same order.
To specify alternatives, just give multiple rules with the same left-hand side and different right-hand sides.
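The actions of the example grammar can be tried out the same way. In this sketch each action body becomes a plain Perl sub (the names subject_action and object_action are made up), and the arguments are passed in the order of the right-hand-side symbols, which is how I read the manual. The value of the last evaluated expression is the result, so no return is needed.

```perl
use strict;
use warnings;

# Hypothetical subs holding the action bodies of two rules from the grammar.
# @_ carries the values of the right-hand-side symbols, in rule order.

# subject: specializernoun, ordinarynoun. q{ "spsub($_[0]+$_[1])" }.
sub subject_action { "spsub($_[0]+$_[1])" }

# object: preposition, noun. q{ "ob(prep($_[0])+n($_[1]))" }.
sub object_action { "ob(prep($_[0])+n($_[1]))" }

# No explicit return: the last evaluated expression is the sub's value.
print subject_action("specializernoun(time)", "ordinarynoun(flies)"), "\n";
# spsub(specializernoun(time)+ordinarynoun(flies))
print object_action("an", "arrow"), "\n";
# ob(prep(an)+n(arrow))
```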

Literal elements on the right-hand side of a rule can be specified only as regexps, as in conjunction: /like/. Literal strings did not work for me, despite my reading of perldoc Parse::Marpa::Doc::MDL.

Terminal symbols can be defined using so-called terminal sentences of the form

terminalsymbolname matches /regexpbody/.

Terminal sentences cannot have actions; their value is always the matched input. This is not really a problem, just write
foo: foolex. q{ here_goes_my_action($_[0]); }.

foolex matches /foo/.
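To see which terminals can match at a given point of the input, the regexes from the example grammar can be tried by hand. The helper terminals_at below is hypothetical and is not how Parse::Marpa lexes internally; it only assumes that the default lex prefix is matched first and the terminal regex directly after it. The name like_lit is made up and stands for the anonymous /like/ literal in the conjunction rule.

```perl
use strict;
use warnings;

# Terminal regexes copied from the example grammar; like_lit stands for the
# anonymous /like/ literal used in the conjunction rule.
my %terminals = (
    nounlex        => qr/fruit|banana|time|arrow|flies/,
    verblex        => qr/like|flies/,
    like_lit       => qr/like/,
    prepositionlex => qr/a\b|an/,
);
my $prefix = qr/\s+|\A/;    # the default lex prefix from the grammar

# Hypothetical helper, not part of Parse::Marpa: starting at character
# position $pos, match the prefix and then each terminal regex, and return
# "name=matched text" for every terminal that succeeds there.
sub terminals_at {
    my ($text, $pos) = @_;
    my @found;
    for my $name (sort keys %terminals) {
        my $re = $terminals{$name};
        pos($text) = $pos;
        push @found, "$name=$1" if $text =~ /\G(?:$prefix)($re)/gc;
    }
    return @found;
}

my $input = "time flies like an arrow";
print join(" ", terminals_at($input, 0)), "\n";    # nounlex=time
print join(" ", terminals_at($input, 4)), "\n";    # nounlex=flies verblex=flies
print join(" ", terminals_at($input, 15)), "\n";   # prepositionlex=an
```

At position 4 both nounlex and verblex accept "flies", which is exactly the kind of ambiguity an Earley parser has to carry along. Also note that the terminal regexes have no trailing \b, so in this sketch nounlex would also accept the "fruit" at the start of "fruits".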

Finally, in the simplest case the grammar is used as follows:
use Parse::Marpa;
my @values = Parse::Marpa::mdl(\$grammar_description, \$data1);
Afterwards, @values contains one result for each parse of the input; there can be more than one because the grammar may be ambiguous.
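To illustrate why mdl returns a list, here is a toy that enumerates every parse of the example grammar by naive backtracking. It is not an Earley parser and not Parse::Marpa; it only works because this grammar has no left recursion. As simplifications, the input is split on whitespace and every terminal regex must match a whole token. The action strings are copied from the grammar, with $concat playing the role of the "concatenate lines" default action, but I have not checked that Parse::Marpa produces exactly these strings. The names derive, all_values and like_lit are made up.

```perl
use strict;
use warnings;

# The "concatenate lines" default action from the grammar.
my $concat = sub { (scalar @_) ? (join "-", (grep { $_ } @_)) : undef };

# Terminals, anchored to whole tokens (a simplification of the lex prefix).
my %terminals = (
    nounlex        => qr/^(?:fruit|banana|time|arrow|flies)$/,
    verblex        => qr/^(?:like|flies)$/,
    like_lit       => qr/^like$/,    # the /like/ literal of the conjunction rule
    prepositionlex => qr/^(?:a|an)$/,
);

# Each nonterminal maps to its alternatives: [ [rhs symbols], action ].
# Rules without an explicit action in the grammar use $concat.
my %rules = (
    englishsentence => [
        [ [qw(subject verb conjunction object)], $concat ],
        [ [qw(subject verb object)],             $concat ],
    ],
    specializernoun => [ [ ['noun'], sub { "specializernoun($_[0])" } ] ],
    ordinarynoun    => [ [ ['noun'], sub { "ordinarynoun($_[0])" } ] ],
    subject => [
        [ [qw(specializernoun ordinarynoun)], sub { "spsub($_[0]+$_[1])" } ],
        [ ['noun'], sub { "subject($_[0])" } ],
    ],
    noun        => [ [ ['nounlex'], $concat ] ],
    verb        => [ [ ['verblex'], sub { "verb($_[0])" } ] ],
    object      => [ [ [qw(preposition noun)], sub { "ob(prep($_[0])+n($_[1]))" } ] ],
    conjunction => [ [ ['like_lit'], sub { "conjunction($_[0])" } ] ],
    preposition => [ [ ['prepositionlex'], $concat ] ],
);

# Every way $symbol can derive tokens starting at index $i,
# returned as a list of [ value, next_index ] pairs.
sub derive {
    my ($symbol, $tokens, $i) = @_;
    if (my $re = $terminals{$symbol}) {
        return () if $i >= @$tokens || $tokens->[$i] !~ $re;
        return ([ $tokens->[$i], $i + 1 ]);
    }
    my @results;
    for my $alt (@{ $rules{$symbol} }) {
        my ($rhs, $action) = @$alt;
        my @partial = ([ [], $i ]);    # [ values so far, current index ]
        for my $sym (@$rhs) {
            @partial = map {
                my ($vals, $pos) = @$_;
                map { my $d = $_; [ [ @$vals, $d->[0] ], $d->[1] ] }
                    derive($sym, $tokens, $pos);
            } @partial;
        }
        push @results,
            map { my $p = $_; [ scalar($action->(@{ $p->[0] })), $p->[1] ] } @partial;
    }
    return @results;
}

# Values of all derivations of the start symbol that consume every token.
sub all_values {
    my ($text) = @_;
    my @tokens = split ' ', $text;
    return map { $_->[0] }
        grep { $_->[1] == @tokens } derive('englishsentence', \@tokens, 0);
}

print "$_\n" for all_values("time flies like an arrow");
# subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))
# spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))
```

Naive backtracking like this re-derives the same sub-phrases over and over and can take exponential time in general; an Earley parser recognizes any context-free input in at most cubic time, although listing every parse of a highly ambiguous input can still be expensive.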
