It is worth stepping back and trying to establish what the user wants with Unicode support in the context of expanding a regex.
Even something fairly constrained like \w{3} in Unicode mode becomes a very large sequence, putting pressure on memory if the developer isn't careful. I suspect the use cases for that are quite limited, and the developer is likely going to need to aggressively thin out the results. They would rather declare the thinning up front and get pre-thinned results, so IMO usable Unicode support is highly dependent on #2. Technically supporting Unicode without good thinning mechanisms seems self-defeating, given that Unicode support is likely to hurt CI build times and development time significantly, and may even cause headaches for future enhancements. All those costs are worth the pain only if users end up with a usable feature.
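To put a rough number on that, here is a stdlib-only sketch. It does not use sre_yield. It counts how many code points \w matches with and without re.ASCII, then cubes the result, which gives the number of strings a fixed \w{3} has to enumerate. The helper names are made up for illustration. The exact Unicode count depends on the Unicode database bundled with your Python version.

```python
import re
import sys


def class_size(pattern: str, flags: int = 0) -> int:
    """Count the code points that a single-character pattern such as \\w fully matches."""
    compiled = re.compile(pattern, flags)
    return sum(1 for cp in range(sys.maxunicode + 1) if compiled.fullmatch(chr(cp)))


def expansion_size(pattern: str, repeat: int, flags: int = 0) -> int:
    """Number of strings a fixed repetition like \\w{3} expands to: |class| ** repeat."""
    return class_size(pattern, flags) ** repeat


ascii_words = expansion_size(r"\w", 3, re.ASCII)  # 63 ** 3 = 250047
unicode_words = expansion_size(r"\w", 3)          # str patterns are Unicode-aware by default
print(f"ASCII \\w{{3}}:   {ascii_words:,}")
print(f"Unicode \\w{{3}}: {unicode_words:,}")
```

Scanning every code point takes a moment. The interesting part is the gap between the two sizes, which is what a thinning mechanism would have to close.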
Likewise, \d will often want to be expanded to only one numeral system.
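One way to express "only one numeral system" is to replace \d with an explicit character class. The sketch below relies on a Unicode guarantee: each system's decimal digits are encoded contiguously, in order from 0 to 9. The helpers digit_class and numeral_system_zeros are hypothetical and are not part of sre_yield.

```python
import re
import sys
import unicodedata
from typing import List


def numeral_system_zeros() -> List[str]:
    """Zero digit of every decimal numeral system in Python's Unicode database."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd" and unicodedata.decimal(chr(cp), None) == 0
    ]


def digit_class(zero: str) -> str:
    """Character class for the ten digits starting at `zero`.

    Unicode encodes each decimal numeral system as a contiguous run 0..9,
    so the class is simply [zero-(zero+9)].
    """
    if unicodedata.decimal(zero, None) != 0:
        raise ValueError(f"{zero!r} is not a decimal zero digit")
    return f"[{zero}-{chr(ord(zero) + 9)}]"


devanagari = digit_class("\u0966")  # DEVANAGARI DIGIT ZERO .. NINE
print(len(numeral_system_zeros()), "decimal numeral systems; \\d would mix them all")
print(re.fullmatch(devanagari + "{2}", "\u0967\u0968") is not None)  # True
print(re.fullmatch(devanagari + "{2}", "12") is not None)            # False
```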
In pywikibot there is quite a lot of code, and many use cases, for switching languages and numeral systems. This is especially true when working with different calendar systems and when natural-language and numeric strings must be mixed. One example is building generators of all century names (e.g. "3rd century BCE") for each Wikipedia locale, then building algorithms on top that work for any Wikipedia subdomain. Those generators could easily be built using sre_yield, and I recall we had a few hacky attempts at similar approaches in the codebase. The need to mix numbers and natural language lends itself to mappings of expandable regexes keyed by language code.
No doubt I am forgetting some use case for mixing almost random parts of Unicode in a single algorithm, but all the ones that come to mind are like that: generic algorithms that sit on swappable, predefined segments of Unicode.
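Here is a rough sketch of that shape. It combines a table of per-language segments, keyed by language code, with generic code that builds a pattern and the matching strings for any entry. The SEGMENTS table, its templates, and all helper names are hypothetical placeholders, not pywikibot's real data. The sketch only builds the pattern and does not call sre_yield to expand it.

```python
import re
from typing import Dict, List

# Hypothetical per-language segments; the templates are placeholders, not
# pywikibot's real century formats.
SEGMENTS: Dict[str, Dict[str, str]] = {
    "en": {"zero": "0", "template": "century {n}"},
    # Persian "قرن" (century) with Extended Arabic-Indic digits starting at U+06F0.
    "fa": {"zero": "\u06f0", "template": "\u0642\u0631\u0646 {n}"},
}


def digits_pattern(zero: str) -> str:
    """One- or two-digit number in the numeral system starting at `zero`."""
    return f"[{zero}-{chr(ord(zero) + 9)}]{{1,2}}"


def century_pattern(lang: str) -> str:
    """Swap the language's number segment into its template, escaping the literal text."""
    seg = SEGMENTS[lang]
    prefix, suffix = seg["template"].split("{n}")
    return re.escape(prefix) + digits_pattern(seg["zero"]) + re.escape(suffix)


def localize(number: int, zero: str) -> str:
    """Render a non-negative int with the digits of another numeral system."""
    shift = ord(zero) - ord("0")
    return "".join(chr(ord(d) + shift) for d in str(number))


def century_names(lang: str, count: int) -> List[str]:
    """Generic generator: the same code works for any language in SEGMENTS."""
    seg = SEGMENTS[lang]
    return [seg["template"].format(n=localize(i, seg["zero"])) for i in range(1, count + 1)]


for lang in SEGMENTS:
    pattern = century_pattern(lang)
    names = century_names(lang, 21)
    # Every generated name should be matched by its own language's pattern.
    assert all(re.fullmatch(pattern, name) for name in names)
    print(lang, pattern, names[2])
```

The {1,2} repetition also admits forms such as "00". Expanding century_pattern directly would therefore need exactly the kind of declarative thinning described earlier.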