[Ometa] Direct left recursion with specific ending
Alessandro Warth
alexwarth at gmail.com
Wed May 25 20:00:01 PDT 2011
Hi Justin,
On Wed, May 25, 2011 at 6:43 PM, Justin Chase <justin.m.chase at gmail.com>wro=
te:
> Side question about this syntax:
> ident =3D <letter (letter | digit)*>
>
> I am working on a C# tool that is roughly equivalent to ometa/js and I'm
> curious about the above syntax. It's my understanding that < > basically
> means that the production of the rule is whatever is inside that block and
> also implies that there are no variables declared. So the following would
> not be legal:
> ident =3D <letter:x (letter | digit)*>
> ident =3D <letter (letter | digit)*> -> ...
>
> Correct?
No. There is no restriction to what you can do inside the pointy brackets
(<>s). If you want to bind the result of a rule application to a variable so
you can use it later, that's perfectly fine.
And there is no "flattening" taking place either. <pat1 ... patn> simply
means is "the part of the input that was consumed by pat1 ... patn.
A while ago, I sent a
message<http://vpri.org/pipermail/ometa/2010-April/000242.html>to this
list that explains this feature and a few others. Here are the
relevant bits:
*Consumed-By*:
Pretty often, when you're writing a lexical analyzer or scannerless parser,
you end up writing rules like this:
identifier =3D letter:x (letter | digit)*:xs -> ...
... and in the semantic action, you have to put these characters back
together into a string. It would be a lot nicer if you could just say "give
me the substring of the input that was consumed by this expression", and
that's exactly what the consumed-by operator (<...>) does. So instead of the
above, you could write
identifier =3D <letter (letter | digit)*>
which is more declarative, and also runs a little bit faster* (at least in
Squeak -- I haven't profiled the JavaScript version, but I expect it will be
faster too).
Note that consumed-by doesn't only work when the input is a string. If it's
an array, the value of <...> will be a sub-array whose elements will be the
elements that were consumed by the expression inside the pointy brackets.
(In Smalltalk parlance, the value of <...> is always of the same "species"
as the data that you're working on.)
Hope this helps!
Cheers,
Alex
> But what's interesting to me is that inside of this block is a ( )* which,
> I believe, will yield an inner array. Meaning the string "abc" would yiel=
d:
> ['a', ['b', 'c']] Correct? I'm just wondering if or when "flattening"
> occurs? Does the < > also imply flattening or is flattening always done or
> is that just not part of this example? Am I making sense? What's the gene=
ral
> strategy for flattening
>
> In my world I am allowing arbitrary expressions in the productions and so
> it looks like this:
> ident =3D i:(letter (letter | digit)*) -> Parser.Stringify(i);
>
> Thanks.
>
>
>
> On Wed, May 25, 2011 at 6:55 PM, Justin Chase <justin.m.chase at gmail.com>w=
rote:
>
>> I see, thank you. I'm still finding thinking in pattern matching tricky!
>> It's one thing to learn all of the individual operators but it's almost =
as
>> if more complex composite patterns emerge after a while. It reminds me of
>> circuit logic in that way.
>>
>>
>> On Wed, May 25, 2011 at 6:51 PM, Alessandro Warth <alexwarth at gmail.com>w=
rote:
>>
>>>
>>>
>>> On Wed, May 25, 2011 at 4:31 PM, Justin Chase <justin.m.chase at gmail.com=
>wrote:
>>>
>>>> Very interesting! I did not realize that you could pass the productions
>>>> of previous rules into other rules like in M2, that is very cool. I li=
ke
>>>> that solution very much. Can you give me an example of a right recursi=
ve
>>>> solution, despite the resulting tree? I'm having a hard time envisioni=
ng it.
>>>
>>>
>>> Sure, here's what I meant:
>>>
>>> ometa M3 <: Parser {
>>> ident =3D <letter (letter | digit)*>,
>>> expr =3D ident validEnding,
>>> validEnding =3D "." ident validEnding
>>> | "." ident
>>> | "(" ")" validEnding,
>>> consumedByExpr =3D <expr>
>>> }
>>>
>>> M3.matchAll('x.y.z()', "consumedByExpr") --> x.y.z
>>> M3.matchAll('x.y().z', "consumedByExpr") --> x.y().z
>>> M3.matchAll('x().y.z', "consumedByExpr") --> x().y.z
>>>
>>> The consumedByExpr rule above doesn't fail on any of your examples, but
>>> you'll see that it doesn't consume the ending ()s of your first example=
--
>>> which in this case (say, if you were using that rule in a parser) means=
that
>>> that input wasn't valid.
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>>>
>>>>
>>>> On Wed, May 25, 2011 at 4:22 PM, Alessandro Warth <alexwarth at gmail.com=
>wrote:
>>>>
>>>>> Hi Justin,
>>>>>
>>>>> That's an interesting question... the kind of property you're trying =
to
>>>>> express would be easy to enforce using a right-recursive rule, but th=
at
>>>>> would screw up the shape of your parse tree.
>>>>>
>>>>> The simplest solution I can think of off the top of my head is to
>>>>> consume the entire expression with a left-recursive rule, without wor=
rying
>>>>> about which pattern shows up last, and *then* inspect the resulting p=
arse
>>>>> tree to see if it obeys the property you care about. Here's how you m=
ight do
>>>>> that in OMeta/JS:
>>>>>
>>>>> ometa M <: Parser {
>>>>> ident =3D <letter (letter | digit)*>,
>>>>> expr =3D expr:x "." ident:f -> ['FieldAccess', x, f]
>>>>> | expr:x "(" ")" -> ['Call', x]
>>>>> | ident:x -> ['Var', x]
>>>>> }
>>>>>
>>>>> M.matchAll('x.y.z()', "expr")
>>>>> M.matchAll('x.y().z', "expr")
>>>>> M.matchAll('x().y.z', "expr")
>>>>>
>>>>>
>>>>> ometa M2 <: M {
>>>>> expr2 =3D expr:e validate(e) -> e,
>>>>> validate =3D ['FieldAccess' :x :f]
>>>>> }
>>>>>
>>>>> M2.matchAll('x.y.z()', "expr2")
>>>>> M2.matchAll('x.y().z', "expr2")
>>>>> M2.matchAll('x().y.z', "expr2")
>>>>>
>>>>> M's expr rule matches all three inputs, whereas M2's expr2 rule only
>>>>> matches the last two. You can try this for yourself here:
>>>>> http://www.tinlizzie.org/ometa-js/#justin
>>>>>
>>>>> Cheers,
>>>>> Alex
>>>>>
>>>>>
>>>>> On Wed, May 25, 2011 at 6:15 AM, Justin Chase <
>>>>> justin.m.chase at gmail.com> wrote:
>>>>>
>>>>>> Suppose I'm using direct left recursion to parse expression trees.
>>>>>> I'll provide some pseudo patterns:
>>>>>>
>>>>>> ReferenceExpression =3D "a".."z"+
>>>>>> InvokeExpression =3D ReferenceExpression "(" ")"
>>>>>> CallableExpression =3D InvokeExpression | ReferenceExpression
>>>>>>
>>>>>> Expression =3D Expression "." CallableExpression | CallableExpression
>>>>>>
>>>>>> This will parse all of the following:
>>>>>> "x.y.z()"
>>>>>> "x.y().z"
>>>>>> "x().y.z"
>>>>>> etc.
>>>>>>
>>>>>> But the trouble I'm having is in trying to specify exactly which
>>>>>> pattern should be last. Suppose you wanted an expression that is req=
uired to
>>>>>> end with a ReferenceExpression so that only these would parse:
>>>>>> "x.y.z"
>>>>>> "x.y().z"
>>>>>> "x().y.z"
>>>>>> etc.
>>>>>>
>>>>>> I tried experimenting with something like this:
>>>>>>
>>>>>> Expression(end) =3D Expression(end) "." end | Expression(end) "."
>>>>>> Expression(end) | CallableExpression
>>>>>> ReferenceExpressionWithTarget =3D Expression(ReferenceExpression)
>>>>>>
>>>>>> But that doesn't work. It seems that the Grow LR function will
>>>>>> essentially cause the 2nd clause to consume all of the input and ret=
urn
>>>>>> successfully.
>>>>>>
>>>>>> I also tried:
>>>>>>
>>>>>> Expression(end) =3D (Expression(end) "." CallableExpression |
>>>>>> CallableExpression) "." end
>>>>>> ReferenceExpressionWithTarget =3D Expression(ReferenceExpression)
>>>>>>
>>>>>> But that also doesn't work because the first part will also consume
>>>>>> all of the input and the "." end will have nothing left to read. Wha=
t is the
>>>>>> proper way to work with the Grow LR such that it ends with a specific
>>>>>> pattern?
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Justin Chase
>>>>>> http://www.justinmchase.com
>>>>>>
>>>>>> _______________________________________________
>>>>>> OMeta mailing list
>>>>>> OMeta at vpri.org
>>>>>> http://vpri.org/mailman/listinfo/ometa
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Justin Chase
>>>> http://www.justinmchase.com
>>>>
>>>
>>>
>>
>>
>> --
>> Justin Chase
>> http://www.justinmchase.com
>>
>
>
>
> --
> Justin Chase
> http://www.justinmchase.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://vpri.org/pipermail/ometa/attachments/20110525/2e512575/attachme=
nt.htm
More information about the OMeta
mailing list