trieregex's People

Contributors

ermanh

trieregex's Issues

RecursionError: maximum recursion depth exceeded while getting the repr of an object

I get the following error when using this library with a list of 10K celebrity names.

Traceback (most recent call last):
  File "/home/hhh/ratings-api/clean.py", line 74, in <module>
    logger.debug(trie.regex())
  File "/home/hhh/.local/lib/python3.9/site-packages/trieregex/memoizer.py", line 20, in __call__
    self.cache[stringed] = self.func(*args)
  File "/home/hhh/.local/lib/python3.9/site-packages/trieregex/trieregex.py", line 111, in regex
    return f'{escape(key)}{self.regex(trie[key], False)}'
  File "/home/hhh/.local/lib/python3.9/site-packages/trieregex/memoizer.py", line 18, in __call__
    stringed = str(args)
RecursionError: maximum recursion depth exceeded while getting the repr of an object
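
One thing that might be worth trying, assuming the failure comes from legitimately deep recursion rather than from malformed input, is to raise the interpreter's recursion limit temporarily around the call. The sketch below uses only the standard library; `recursion_limit` and `nested_depth` are hypothetical helpers, and `nested_depth` merely stands in for a deeply recursive call such as `trie.regex()`. Whether this resolves the error in trieregex itself has not been verified.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring it afterwards."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def nested_depth(n):
    """Hypothetical stand-in for a deeply recursive call."""
    return 0 if n == 0 else 1 + nested_depth(n - 1)


# Under the default limit (usually 1000) this call would raise RecursionError.
with recursion_limit(4000):
    result = nested_depth(2000)

print(result)
```

Checking the length of the longest entry in the input list may also help, since the depth of a trie grows with the longest string it stores.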

Using trieregex for a list not of words, but of regular expressions?

Thanks for making this. Tries are extremely powerful, and your module makes them easy to use.

I'm wondering if I can use it also to optimize a list of regular expressions instead of a list of words.

By regular expressions I mean anything I would feed into re.search(regex, string), for example: inline flags like (?mi), non-capturing groups, etc.

I tried doing that, but your module simply escaped my regular expressions, i.e. treated them as words to be matched verbatim. So it seems the algorithm you are using to generate the trie only works for words, or strings to be matched verbatim, and not for regular expressions.

Am I overlooking something? Do you think it's even possible to do for a list of regular expressions what you've done for lists of words here? Basically, a general regex optimizer.

Thanks!
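
To make the difference concrete, here is a small sketch using only the standard `re` module, not trieregex. It contrasts joining entries as regular expressions with joining them as escaped literals, which is the behaviour described above. The three patterns are made-up examples, and the plain alternation does no prefix factoring, so it is not an optimizer.

```python
import re

patterns = [r"colou?r", r"gr[ae]y", r"\d{3}-\d{4}"]

# Treat each entry as a regular expression: wrap it in a non-capturing
# group so that "|" cannot leak across entries.
as_regexes = re.compile("|".join(f"(?:{p})" for p in patterns))

# Treat each entry as a literal string, which is what escaping does.
as_literals = re.compile("|".join(re.escape(p) for p in patterns))

print(bool(as_regexes.fullmatch("color")))     # True
print(bool(as_literals.fullmatch("color")))    # False
print(bool(as_literals.fullmatch("colou?r")))  # True
```

Factoring common prefixes out of arbitrary regular expressions is a harder problem than doing so for literal strings, because a prefix such as `colou?` is itself a pattern rather than a fixed sequence of characters.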

The regex method sometimes returns an empty string inside a loop or a Spark UDF

Hi,
TrieRegEx has really helped me improve my regex performance.
However, when I use TrieRegEx inside a Spark UDF, it sometimes returns an empty string.

To simplify, the same behaviour appears if I use TrieRegEx inside a loop:

# Without del
from trieregex import TrieRegEx

i = 0
VALUES = ["REGEX1", "REGEX2"]
while i < 20:
    # A fresh trie is built on every iteration
    TRIE_VALUES = TrieRegEx(*VALUES)
    i = i + 1
    if len(TRIE_VALUES.regex()) < 1:
        print(f"ERROR on loop i:{i}")
        print(f"TRIE_VALUES: '{TRIE_VALUES.regex()}' (len: {len(TRIE_VALUES.regex())})")
        break

I get:

ERROR on loop i:5 # the number varies from run to run
TRIE_VALUES: '' (len: 0)

My workaround for this case is to add a del like this:

# With del
from trieregex import TrieRegEx

i = 0
VALUES = ["REGEX1", "REGEX2"]
while i < 20:
    TRIE_VALUES = TrieRegEx(*VALUES)
    i = i + 1
    if len(TRIE_VALUES.regex()) < 1:
        print(f"ERROR on loop i:{i}")
        print(f"TRIE_VALUES: '{TRIE_VALUES.regex()}' (len: {len(TRIE_VALUES.regex())})")
        break
    # Explicitly drop the trie before the next iteration
    del TRIE_VALUES

With the code above it works well.
However, if I use TrieRegEx inside a pandas UDF, I get the same bug.

My pandas UDF looks something like this:

def trieregex_udf(df):
    # Read source (read_values() stands for whatever loads the values)
    values = read_values()
    trie = TrieRegEx(*values)
    regex = trie.regex()
    # Apply regex to DF (details elided)
    output = ...
    return output

output = df.groupby("id").applyInPandas(trieregex_udf, schema="v string").toPandas()

Sometimes trie.regex() returns an empty string.
The problem seems to occur only when TrieRegEx is instantiated inside the UDF; if I build the regex outside and pass the resulting string in, everything works.
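
The workaround described in the last sentence can be sketched as follows: build the pattern string once, up front, and let the per-group function close over that string only. This is a minimal illustration under stated assumptions, not a Spark example: the hard-coded `regex` string stands in for the output of `trie.regex()`, plain lists stand in for pandas groups, and `make_group_fn` is a hypothetical helper.

```python
import re


def make_group_fn(pattern):
    """Return a per-group function that closes over a prebuilt pattern string."""

    def apply_to_group(rows):
        # Only the plain string is captured; no trie is built in here.
        compiled = re.compile(pattern)
        return [bool(compiled.search(row)) for row in rows]

    return apply_to_group


# Stand-in for the string returned by trie.regex(), built once up front.
regex = r"\b(?:REGEX1|REGEX2)\b"
group_fn = make_group_fn(regex)

# Plain lists stand in for the grouped data.
groups = {"a": ["has REGEX1 here", "nothing"], "b": ["REGEX2"]}
results = {key: group_fn(rows) for key, rows in groups.items()}
print(results)
```

Because the function carries only a string, each call does not depend on any state held by a trie object, which is consistent with the observation that passing the finished regex avoids the empty result.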
