Comments (3)
Global variables are not really what you want, because they make your code very brittle once you run a parser multiple times, possibly from different threads.
The first approach I would try is to avoid global state entirely when parsing. Try to structure your parsing phase so that it doesn't need this state. Then have a second pass over the objects produced by parsing, where you can add extra information such as numbering things in order.
If you really can't do that, then you can avoid global variables by instead:
- define a function that will run your parser
- define state variables inside that function
- put parsers that use state inside that function,
- use closures within that function which use the nonlocal keyword to bind the shared state variables.
Example - we want to parse text like:
* An item
* Another item
Into:
[NumberedItem(text='An item', number=0),
NumberedItem(text='Another item', number=1)]
Code:
import parsy as P
from dataclasses import dataclass

@dataclass
class NumberedItem:
    text: str
    number: int

# This parser uses no state
item = P.string("* ") >> P.regex(r"[^\n]*") << P.string("\n")

def numbered_items_parser():
    current_item_number = 0

    # this parser uses state, so is defined as a closure that binds current_item_number
    @P.generate
    def numbered_item():
        nonlocal current_item_number
        text = yield item
        returnval = NumberedItem(text=text, number=current_item_number)
        current_item_number += 1
        return returnval

    return numbered_item.many()

# To use, do something like:
# numbered_items_parser().parse("* my item\n")
# (each item must end with a newline, because `item` requires one)
Notice how you have to call numbered_items_parser to build the parser, so that you get a fresh, independent copy of the state each time. If you call it just once and then re-use the result, you'll re-use the state too. To make this safer, you might want to do this instead:
def parse_numbered_items(input_str: str):
    current_item_number = 0

    @P.generate
    def numbered_item():
        nonlocal current_item_number
        text = yield item
        returnval = NumberedItem(text=text, number=current_item_number)
        current_item_number += 1
        return returnval

    return numbered_item.many().parse(input_str)
That way you can't misuse the numbered_item parser.
But, I still think the preferred approach if possible would be to have a first pass that did the basic parsing, returning a list/tree of objects, then go over that tree in a second pass and add the numbering.
BTW if you have a good example where you really do need the state technique here, it would be helpful to share, I could possibly add it to the docs.
Thank you very much for such a detailed example. I watched a PyCon talk on decorators last night, and the closure approach certainly looks promising.
I am currently trying to implement a common-markdown parser, with added functionality to embed markdown documents using links.
One quirk is that documents can link to each other recursively, so parsing could go on forever.
To break the loop, one has to either limit the recursion depth or detect cycles in the visited links.
e.g. Doc A has a link to Doc B
Doc B has a link to Doc A
Thinking about your comment, I now realize I might be able to do this as post-processing (a second pass) on the AST.
But again, I would have to maintain state to break the recursion during that second pass. Off the top of my head, I think it is simpler to implement using closures than with a second pass on the AST.
Thanks again for the detailed example.
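For illustration, the two ways of breaking the recursion mentioned above (a depth limit and cycle detection) might be sketched like this. This is not a markdown parser: it assumes, purely hypothetically, that the links of each document have already been extracted into a dict mapping a document name to the names it embeds, and expand_links, docs and max_depth are invented names.

```python
def expand_links(docs, root, max_depth=10):
    """Return document names in the order they would be embedded.

    `docs` is a hypothetical mapping: document name -> list of linked names.
    A link is skipped if it points back to a document that is already being
    embedded (a cycle) or if it lies deeper than `max_depth`.
    """
    visiting = set()  # documents on the current embed path
    order = []        # result, in embed order

    # The state lives in the enclosing function, so every call to
    # expand_links gets its own fresh `visiting` and `order`.
    def visit(name, depth):
        if name in visiting or depth > max_depth:
            return
        visiting.add(name)
        order.append(name)
        for linked in docs.get(name, []):
            visit(linked, depth + 1)
        visiting.discard(name)

    visit(root, 0)
    return order


# Doc A links to Doc B, and Doc B links back to Doc A.
cyclic = {"A": ["B"], "B": ["A"]}
print(expand_links(cyclic, "A"))
```

The same closure-held state could be used either during parsing or in a second pass over the AST; the sketch only shows the bookkeeping.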