
Comments (3)

spookylukey commented on September 27, 2024

Global variables are not exactly what you want, because they make your code very brittle when you come to run a parser multiple times, possibly from different threads, etc.

The first approach I would take is not to use any global state when parsing. Try to structure your parsing phase so that it doesn't need this state. Then, do a second pass over the objects produced by parsing, where you can add extra information such as numbering things in order.

If you really can't do that, then you can avoid global variables by instead:

  • define a function that will run your parser
  • define state variables inside that function
  • put parsers that use state inside that function
  • use closures within that function which use the nonlocal keyword to bind the shared state variables.
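Stripped of parsy, the closure mechanism in these steps is plain Python. Here is a minimal sketch with a hypothetical `make_numberer` function (not part of parsy): each call creates a new `count`, and `nonlocal` lets the inner function update it, so the state is shared within one call but never across calls.

```python
def make_numberer():
    """Return a function that hands out 0, 1, 2, ... on successive calls."""
    count = 0  # state local to this particular call of make_numberer()

    def next_number():
        nonlocal count  # rebind the enclosing variable instead of creating a new local
        number = count
        count += 1
        return number

    return next_number


first = make_numberer()
print(first(), first())   # 0 1
second = make_numberer()  # a new call gets its own, independent count
print(second(), first())  # 0 2
```

The parser example uses exactly this shape, with a parsy parser in place of `next_number`.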

Example - we want to parse text like:

* An item
* Another item

Into:

[NumberedItem(text='An item', number=0),
 NumberedItem(text='Another item', number=1)]

Code:

import parsy as P
from dataclasses import dataclass

@dataclass
class NumberedItem:
    text: str
    number: int

# This parser uses no state
item = P.string("* ") >> P.regex(r"[^\n]*") << P.string("\n")

def numbered_items_parser():
    current_item_number = 0

    # This parser uses state, so it is defined as a closure that binds current_item_number
    @P.generate
    def numbered_item():
        nonlocal current_item_number
        text = yield item
        returnval = NumberedItem(text=text, number=current_item_number)
        current_item_number += 1
        return returnval

    return numbered_item.many()

# To use, do something like this (note the trailing newline, which `item` requires):
#  numbered_items_parser().parse("* my item\n")

Notice that you have to call numbered_items_parser() each time you want a parser with a fresh, independent copy of the state. If you call it just once and then reuse the result, you'll reuse the state too. To make this safer, you might want to do this instead:

def parse_numbered_items(input_str: str):
    current_item_number = 0

    @P.generate
    def numbered_item():
        nonlocal current_item_number
        text = yield item
        returnval = NumberedItem(text=text, number=current_item_number)
        current_item_number += 1
        return returnval

    return numbered_item.many().parse(input_str)

That way you can't misuse the numbered_item parser.

But I still think the preferred approach, if possible, would be a first pass that does the basic parsing and returns a list/tree of objects, followed by a second pass over that tree that adds the numbering.
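A minimal sketch of that second pass, assuming the stateless first pass returns a plain list of strings (for instance from `item.many()`); `number_items` is a hypothetical name. Because each number comes from `enumerate`, there is no counter to reset and nothing shared between runs or threads:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumberedItem:
    text: str
    number: int


def number_items(texts: List[str]) -> List[NumberedItem]:
    """Second pass: derive each item's number from its position in the list."""
    return [NumberedItem(text=text, number=i) for i, text in enumerate(texts)]


# Stand-in for the result of the stateless first pass, e.g. item.many().parse(...)
parsed = ["An item", "Another item"]
print(number_items(parsed))
# [NumberedItem(text='An item', number=0), NumberedItem(text='Another item', number=1)]
```

With parsy, the same function could be attached to the first-pass parser with `.map(number_items)`, so the combined parser stays stateless and can be reused freely.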


spookylukey commented on September 27, 2024

BTW, if you have a good example where you really do need the state technique, it would be helpful to share it; I could possibly add it to the docs.


y1450 commented on September 27, 2024

Thank you very much for such a detailed example. I watched a PyCon talk on decorators last night, and the closure approach certainly looks promising.
I am currently trying to implement a CommonMark parser, with added functionality to embed markdown documents using links.
One quirk is that documents can link to each other recursively, so parsing could go on infinitely.
To break the loop, one has to either limit the recursion depth or detect cycles in the visited links.
E.g. Doc A has a link to Doc B,
and Doc B has a link to Doc A.
Thinking about your comment, I now realize I might be able to do it as post-processing (a second pass) on the AST,
but again I would have to maintain state to break the recursion during the second pass. Off the top of my head, I think it is simpler to implement using closures than as a second pass on the AST.
Thanks again for the detailed example.
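For comparison, a minimal sketch of cycle detection done as a second pass. It assumes a hypothetical AST in which embed links are `Embed(target)` nodes and documents live in a dict keyed by name (`Text`, `Embed`, `expand` and `docs` are illustrative, not from parsy or this thread). Rather than a closure, it passes the recursion state down as an argument: `path` holds the documents currently being expanded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Embed:
    target: str  # name of the document this link embeds


Node = Union[Text, Embed]


def expand(name: str, docs: Dict[str, List[Node]], path: Tuple[str, ...] = ()) -> List[Text]:
    """Second pass: replace Embed nodes with the (recursively expanded) target document."""
    if name in path:
        # `name` is already being expanded further up the call stack: a cycle, so stop.
        return [Text(f"[cycle: {' -> '.join(path + (name,))}]")]
    result: List[Text] = []
    for node in docs[name]:
        if isinstance(node, Embed):
            # Pass an extended copy of the path down instead of mutating shared state.
            result.extend(expand(node.target, docs, path + (name,)))
        else:
            result.append(node)
    return result


docs = {
    "A": [Text("start of A"), Embed("B")],
    "B": [Text("start of B"), Embed("A")],
}
print(expand("A", docs))
# [Text(content='start of A'), Text(content='start of B'), Text(content='[cycle: A -> B -> A]')]
```

Because `path` only tracks the current chain of embeds, a document embedded twice side by side (not recursively) is still expanded both times. A recursion-depth limit would be a variant that checks `len(path)` instead of membership.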
