2009-08-21

Parse::Marpa - first results

I just found the parser generator mentioned above on CPAN and played with it (some usage rules I've found so far will be mentioned in the next post). The following program is intended to decompose the well-known ambiguous English sentence "time flies like an arrow"; its output is appended below. Marpa does this, but finds 8 results instead of 2: each of the two interpretations is found 4 times. Since this is my first attempt, I believe the fault is in my program.

#!/opt/bin/perl

use 5.011;
use strict;
use warnings;
use English qw(-no_match_vars);
use Parse::Marpa;

my $grammar_description='
start symbol is englishsentence.
semantics are perl5. version is 1.004000.
default lex prefix is /\s+|\A/.
concatenate lines is q{ (scalar @_) ? (join "-", (grep { $_ } @_)) : undef; }.
default action is concatenate lines.

englishsentence: subject, verb, conjunction, object.

englishsentence: subject, verb, object.

specializernoun: noun. q{ "specializernoun($_[0])" }.

ordinarynoun: noun. q{ "ordinarynoun($_[0])" }.

subject: specializernoun, ordinarynoun. q{ "spsub($_[0]+$_[1])" }.

subject: noun. q{ "subject($_[0])" }.

noun: nounlex.

nounlex matches /fruit|banana|time|arrow|flies/.

verb: verblex. q{ "verb($_[0])" }.

verblex matches /like|flies/.

object: preposition, noun. q{ "ob(prep($_[0])+n($_[1]))" }.

conjunction: /like/. q{ "conjunction($_[0])" }.

preposition: prepositionlex.

prepositionlex matches /a\b|an/.

';

my $data1='time like an arrow.';
my @values1=Parse::Marpa::mdl(\$grammar_description,\$data1);
for my $i(@values1) { say $$i; }

my $data2='time flies like an arrow.';
my @values2=Parse::Marpa::mdl(\$grammar_description,\$data2);
for my $i(@values2) { say $$i; }

Output:

subject(time)-verb(like)-ob(prep(an)+n(arrow))
subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))
subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))
subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))
subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))
spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))
spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))
spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))
spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))

PS: If I replace the conjunction definition (conjunction: qr{like}. q{ "conjunction($_[0])" }.) with a terminal (conjunction matches /like/.), the quadruplication disappears and I get exactly one result for each of the two interpretations, but I lose the action on the conjunction.

Using

conjunction: conjunctionlex. q{ "conjunction($_[0])" }.

conjunctionlex matches /like/.

I get the unwanted quadruplication back.
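
As a stopgap, identical results can be collapsed after parsing. The following is only a minimal sketch: it assumes the values come back as references to strings, as in the program above, and it uses hardcoded sample strings copied from the output instead of calling Parse::Marpa. The names unique_parses, $reading1 and $reading2 are made up for this illustration.

```perl
use strict;
use warnings;

# Sample data copied from the output above, so that the snippet runs
# without Parse::Marpa installed (the variable names are hypothetical).
my $reading1 = 'subject(time)-verb(flies)-conjunction(like)-ob(prep(an)+n(arrow))';
my $reading2 = 'spsub(specializernoun(time)+ordinarynoun(flies))-verb(like)-ob(prep(an)+n(arrow))';

# Stand-in for the list returned by Parse::Marpa::mdl:
# four references to each of the two result strings.
my @values = map { my $copy = $_; \$copy } ($reading1) x 4, ($reading2) x 4;

# Keep only the first occurrence of each distinct result string,
# preserving the order in which the results were found.
sub unique_parses {
    my @results = @_;
    my %seen;
    return grep { !$seen{ ${$_} }++ } @results;
}

my @unique = unique_parses(@values);
print ${$_}, "\n" for @unique;    # prints one line per interpretation
```

This only hides the duplicates. It compares the action output as text, so two genuinely different parses whose actions happen to produce the same string would be merged as well.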

1 comment:

Jeffrey said…

Thanks for your interest in Parse::Marpa. The quadruplication of parses is a (mis)feature of the Aycock-Horspool algorithm. That algorithm uses a finite automaton to group rules into what I'll call Aycock-Horspool states (AH-states), similar to LR states. Parsing is by AH-state. A state can represent several grammar rules, and the same grammar rule can occur in several states.

In my evaluator, I make sure that every rule in every state counts as a unique parse. If you regard tree traversals involving different AH-states as different "parses", the results you are getting are correct. Two rules each occur in two different AH-states, so each parse that is unique from the grammar-rule perspective appears 4 times (2 × 2) in the AH-state-based counting of parses.
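
The factor of four can be sketched as a simple enumeration. The rule names, state names and the rule-to-state mapping below are made up for illustration and are not taken from Marpa's actual automaton; the only assumption carried over from the explanation above is that two rules each occur in two AH-states.

```perl
use strict;
use warnings;

# Hypothetical mapping: each of two rules occurs in two AH-states.
my %states_for_rule = (
    rule_A => [ 'S3', 'S7' ],
    rule_B => [ 'S4', 'S9' ],
);

# Build every combination of (rule, state) choices. Under state-based
# counting, each combination is reported as a separate "parse".
my @combos = ( [] );
for my $rule ( sort keys %states_for_rule ) {
    @combos = map {
        my $prefix = $_;
        map { [ @{$prefix}, "$rule\@$_" ] } @{ $states_for_rule{$rule} };
    } @combos;
}

print scalar(@combos), " state-level traversals for one rule-level parse\n";
print join( ' ', @{$_} ), "\n" for @combos;
```

With two rules and two states each, the enumeration yields 2 × 2 = 4 combinations, which matches the four copies of each interpretation in the output.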

However, from the grammar writer's point of view, the AH-states are arbitrary. The grammar-rule perspective is the one the grammar writer expects and can use.

I am working on a new version of this parser, simply called Marpa. Marpa will replace Parse::Marpa. Marpa has a completely new evaluator, and will enumerate parses in the more natural way.