retriever's People

Contributors

kharrigian

retriever's Issues

Bug in PRAW fallback in retrieve_submission_comments

There's a bug in retrieve_submission_comments that the general except Exception handler hid, which is a good reason not to catch exceptions that broadly. The PRAW fallback block didn't turn the list of results into a DataFrame, so the call to df.sort_values threw an error. The handler treated that code error as a request error, so the program kept retrying the request and failing.

Here's the modified code:

                ## Fall Back to PRAW
                if hasattr(self, "_init_praw") and self._init_praw and len(df) == 0:
                    temp_dfs = []
                    for s in submissions_clean:
                        temp_dfs.append(self._retrieve_submission_comments_praw(submission_id=s))
                    df = pd.concat(temp_dfs)
                ## Sort
                if len(df) > 0:
                    df = df.sort_values("created_utc", ascending=True)
                    df = df.reset_index(drop=True)
                return df
            except Exception as e:
                self.logger.warning(f"{e}")
                sleep(backoff)
                backoff = 2 ** backoff
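
The point about the broad except can be illustrated with a minimal sketch. It is not the package's actual code: TransientRequestError and call_with_retries are hypothetical names standing in for whatever request-level errors and retry loop the package uses. The sketch also doubles the wait on each retry, which differs from the 2 ** backoff update in the snippet above.

```python
import logging
import time

logger = logging.getLogger("retriever_sketch")


class TransientRequestError(Exception):
    """Hypothetical stand-in for a network or rate-limit failure."""


def call_with_retries(fetch, max_retries=3, backoff=1.0, sleep=time.sleep):
    """Retry `fetch` only when it raises TransientRequestError.

    Any other exception (for example an AttributeError from calling
    sort_values on a list) propagates immediately instead of being retried.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except TransientRequestError as e:
            logger.warning("Request failed (attempt %d): %s", attempt + 1, e)
            sleep(backoff)
            backoff *= 2  # assumption: double the wait after each failure
    raise RuntimeError(f"Giving up after {max_retries} attempts")
```

With a handler this narrow, a missing DataFrame conversion would surface as a traceback on the first call rather than as a series of retried requests.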

Configure verbosity of logging

When retrieving Reddit data via retrieve_subreddit_submissions (and possibly other methods), a log line is emitted for every 100-post request submitted to the PushShift API. It would be nice to have a toggle that controls how much is logged.

[Image: example log output]
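
One way such a toggle might look, sketched with the standard logging module. The configure_verbosity helper and its verbose flag are hypothetical and not part of the package; the sketch assumes the per-request lines are logged at INFO level.

```python
import logging


def configure_verbosity(logger, verbose=False):
    """Hypothetical toggle: show INFO lines only when verbose is True."""
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger


logger = logging.getLogger("retriever_sketch")
logger.addHandler(logging.StreamHandler())

configure_verbosity(logger, verbose=False)
logger.info("Requesting posts 0-100")  # suppressed at WARNING level
logger.warning("Rate limit hit")       # still emitted
```

Warnings and errors stay visible in the quiet mode, so failures are not hidden along with the per-request noise.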

Find Alternative for PSAW

PSAW is no longer maintained and has a number of bugs that lead to sub-optimal query performance. Either find an alternative package or fork PSAW and update it.

Complete re-factor

PSAW + Reddit APIs appear to have changed, making much of the functionality in this package obsolete. A complete re-factor is probably the best approach for addressing these issues and making things functional again.
