Comments (5)

emphasize commented on July 17, 2024

You shouldn't remove list items while iterating over them in a for loop. With this change it should be solved:

# Empties the list entry (if necessary) inside the loop and removes it afterwards.
# For single letters, it searches backwards for the nearest non-empty entry and appends the letter to it.
# str.title() capitalizes the first letter (and lowercases the rest).

    if only_nouns and results:        
        results[0] = results[0].title()
        for ri in range(len(results) - 1):
            if results[ri].islower():
                merged = results[ri] + results[ri + 1].lower()
                if ahocs.exists(merged):   # does ahocs.exists() disregard capitalization?
                    results[ri] = merged.title()
                    results[ri + 1] = ""
                else:
                    if len(results[ri]) == 1:
                        artifact_single_letter = results[ri]
                        for i in range(1, ri+1):
                            if results[ri - i]:
                                results[ri - i] += artifact_single_letter
                                break
                        results[ri] = ""

    results = list(filter(None, results))
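
To illustrate why entries are blanked inside the loop and filtered out afterwards, here is a small standalone sketch. The fragment list and the simplified rule "a single letter is an artifact, drop it" are made up for illustration; the actual fix above appends the letter to the previous fragment instead of dropping it.

```python
def naive_remove(fragments):
    """Anti-pattern: removes items from the list while a for loop iterates over it."""
    fragments = list(fragments)  # work on a copy so the caller's list is untouched
    for frag in fragments:
        if len(frag) == 1:
            # Removing shifts the following item into the current position,
            # so the loop never visits it.
            fragments.remove(frag)
    return fragments


def blank_then_filter(fragments):
    """Pattern used in the fix: blank entries in the loop, drop them afterwards."""
    fragments = list(fragments)
    for i in range(len(fragments)):
        if len(fragments[i]) == 1:
            fragments[i] = ""  # mark for removal; indices stay stable
    return list(filter(None, fragments))  # filter(None, ...) drops empty strings


print(naive_remove(["Haus", "s", "t", "Tür"]))       # ['Haus', 't', 'Tür']
print(blank_then_filter(["Haus", "s", "t", "Tür"]))  # ['Haus', 'Tür']
```

The naive version silently keeps "t" because it slid into the slot the iterator had already passed; blanking keeps every index stable until the single filter pass at the end.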

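Regarding the question in the code comment: assuming ahocs is a pyahocorasick Automaton, its exists() method is, as far as I know, an exact key lookup, so it does not ignore capitalization. Whether that matters depends on how the dictionary was loaded into ahocs, which is not shown in this thread. If a case-insensitive check is wanted, a small wrapper could try a few casings. exists_any_case and SetLookup are hypothetical names; SetLookup only stands in for the automaton so the sketch runs without pyahocorasick.

```python
def exists_any_case(automaton, word):
    """Return True if word exists in the automaton as given, lowercased, or title-cased.

    Hypothetical helper; assumes automaton.exists(key) is an exact, case-sensitive match.
    """
    for candidate in (word, word.lower(), word.title()):
        if automaton.exists(candidate):
            return True
    return False


class SetLookup:
    """Stand-in for an automaton: exact, case-sensitive membership on a set."""

    def __init__(self, words):
        self._words = set(words)

    def exists(self, key):
        return key in self._words


lookup = SetLookup({"Haustür"})
print(lookup.exists("haustür"))            # False: exact match only
print(exists_any_case(lookup, "haustür"))  # True: the title-case variant matches
```

In the fix above, merged is built as results[ri] + results[ri + 1].lower(), so the lowercase form is what gets looked up first.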
from german_compound_splitter.

repodiac commented on July 17, 2024

Thanks @danilyef - I will look into it. In any case, you are more than welcome to provide a PR :-)

PythonJDoe commented on July 17, 2024

I'm not sure if this is the right place to post, but I couldn't find a forum for this, so I'm posting here. I'm having a problem working with german_compound_splitter. I have a large list of German texts that I want to split and use for text mining. The list texts contains the German texts I want to split, and I store the results in another list, text. So I wrote the following code:

text=list()
for i in range(length):
    s=comp_split.merge_fractions(comp_split.dissect(texts[i], ahocs, make_singular=True))
    text.append(s)

But I'm getting the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_483/1238906536.py in <module>
      1 text=list()
      2 for i in range(length):
----> 3     s=comp_split.merge_fractions(comp_split.dissect(texts[i], ahocs, make_singular=True))
      4     text.append(s)

/opt/conda/lib/python3.9/site-packages/german_compound_splitter/comp_split.py in dissect(compound, ahocs, only_nouns, make_singular, mask_unknown)
    246     if only_nouns:
    247         # workaround to prevent unwanted behaviour (only nouns are eligible)
--> 248         results[0] = results[0][0].upper() + results[0][1:]
    249         for ri in range(len(results) - 1):
    250             if results[ri].islower():

IndexError: list index out of range

Could you please help me resolve this error?
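
The traceback shows that results is already empty when dissect() reaches results[0]; the "if only_nouns and results:" guard in emphasize's patch addresses exactly that inside the library. Until a fixed release is available, a caller-side workaround could look like the sketch below. Because the first parameter of dissect() is named compound, it presumably expects a single word, so if the entries of texts are whole sentences, the sketch splits each text into words first. split_texts, WORD_RE, fake_split and the keep-the-unsplit-word fallback are my own assumptions, not part of german_compound_splitter.

```python
import re
from typing import Callable, List

# Runs of Unicode letters (includes umlauts and ß); digits and punctuation are dropped.
WORD_RE = re.compile(r"[^\W\d_]+")


def split_texts(texts: List[str], split_word: Callable[[str], object]) -> List[list]:
    """Run split_word on each word of each text; return one result list per text.

    Hypothetical helper. split_word is meant to wrap the library call, e.g.
    lambda w: comp_split.merge_fractions(
        comp_split.dissect(w, ahocs, make_singular=True))
    """
    results = []
    for text in texts:
        per_text = []
        for word in WORD_RE.findall(text or ""):  # empty/None texts yield no words
            try:
                per_text.append(split_word(word))
            except IndexError:
                # The IndexError from the traceback: keep the unsplit word
                # instead of aborting the whole run.
                per_text.append(word)
        results.append(per_text)
    return results


def fake_split(word):
    """Stand-in for the real library call, for demonstration only."""
    if word == "Haustür":
        return "Haus Tür"
    if len(word) < 3:
        raise IndexError("list index out of range")
    return word


print(split_texts(["Die Haustür ist zu", "", "ab"], fake_split))
# [['Die', 'Haus Tür', 'ist', 'zu'], [], ['ab']]
```

The fallback stores the raw word string; if merge_fractions returns a list rather than a string, the fallback should be adapted to match so all entries have the same type.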

repodiac commented on July 17, 2024

Hi @PythonJDoe - thanks for your inquiry. Sorry to hear you ran into this error. My time is currently limited, but I will look into it and get back to you asap. It seems to be the same (or a similar) error, and you are the second person to report it, so I agree it should be addressed.

repodiac commented on July 17, 2024

Thanks @emphasize - I appreciate your efforts. I haven't had the time yet to look further into this issue, I am sorry. I'll try to check it asap.
