Ralfs anfallender Unsinn: Playing with perl6 grammars

2009-08-28

Playing with perl6 grammars

Since the grammar feature in perl6 is documented in rather sophistication-requiring ways (including Synopsis 5, Apocalypse 5 and Exegesis 5 (the latter being the most accessible one)), I decided to look for short intros to get quick cheap results.

The simplest description I found is the Wikipedia article about Perl6 rules. My point is (ab)using grammar as a tool for describing a grammar (in the Chomsky sense), not just as a named pair of braces to stick regexen into.

I won't repeat the Wikipedia stuff here, just mention a few facts:

The grammar entered has to be a PEG, i.e. it has some features that are not expressible with a CFG but requires grammar formulations (or manual transformations) similar to those required for a LL(1) parser (i.e. eliminating left-recursion using left-factorization, which makes the grammar much less human-readable)

The keywords "token" and "rule" are not what one would expect simple-mindedly. They are special cases of "regex" (which is what one should use unless one understands the details) with suppressed backtracking (and in addition, "rule" is whitespace-sensitive, which is unexpected in a grammar description language).

Actions can be included by having a class living in parallel to the grammar whose methods are called like the nonterminals of the grammar and by adding the string "{*}" after the production part (separately for each alternative production).

Alternatives can be tagged by adding something that looks like a comment, but consists of the string "#= " followed by the tag name.

Untagged productions call their action with one argument, tagged ones with two arguments (the second being the tag). The first argument is the matched text. Multi-methods in the class can be used to tell both cases apart.

More stuff will come once I've understood and tested it.

Here is an example which does not work (only the grammar, the rest of the program is identical to the working one). The language consists of all non-empty expressions containing balanced parentheses and the letter 'a'.


#!/usr/local/bin/perl6
# broken (left-recursive grammar)
grammar Foo {
    regex TOP { ^<A>$ {*}  #= top
    };
    regex A {
        ''                 {*} #= epsilon
        |<B>               {*} #= simple
        |'('<A>')'         {*} #= paren
        |<A><A>            {*} #= double
    };
    regex B { 'a' };
};

Unfortunately, with the bad grammar the program does not signal an error at compile time but goes into an endless recursion, maybe eventually falling victim to exceeding recursion depth (or running forever).

The following version has been left-factored, thus becoming unreadable for humans and readable for perl6.


grammar Foo {
    regex TOP { ^<A>$ {*}  #= top
    };
    regex A {
        <B><C>            {*} #= bc
        |'('<A>')'<C>     {*} #= paren
    };
    regex C {
        ''                {*} #= empty
        | <A><C>          {*} #= ac
    };
    regex B { 'a' };
};

class Fooa {
   method TOP($m,$t) { say "TOP m=$m t=$t"; };
   method A($m,$t) { say "A m=$m t=$t"; };
};

my Str @s=( 'aa',
           'a',
           '(a(a)a)a',
           'a(a(a)a)',
           'a(a(a)a)a(a(a)a)'
   );

for @s -> $x { say "look at $x";
               say Foo.parse($x,:action(Fooa.new));
               say '=================';
};

Keine Kommentare:

Kommentar veröffentlichen

Pages

2009-08-28

Playing with perl6 grammars

Keine Kommentare: