cutil's People

Contributors

xtream1101

Forkers

whugue tat2133

cutil's Issues

Catch http 503 errors

If they are not caught, those links will be skipped and their content will not be saved.

Solution:

  1. If a 503 error occurs, go into a waiting state for x minutes, then try again
  2. Stop the scrape and print why it stopped. It will pick up where it left off on the next run
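
A minimal sketch of the wait-and-retry idea, not cutil's actual code. The names `fetch_with_503_retry` and `ServiceUnavailable`, and the `fetch` callable that returns a `(status_code, body)` tuple, are all hypothetical and only there to keep the example self-contained:

```python
import time


class ServiceUnavailable(Exception):
    """Raised when the server still answers 503 after every attempt."""


def fetch_with_503_retry(fetch, url, wait_seconds=300, max_attempts=3,
                         sleep=time.sleep):
    """Call fetch(url); while it returns 503, wait and then try again.

    fetch is an assumed callable returning (status_code, body).
    sleep is injectable so the waiting state can be skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 503:
            return status, body
        if attempt < max_attempts:
            sleep(wait_seconds)  # waiting state before the next attempt
    raise ServiceUnavailable(
        f"{url} still returned 503 after {max_attempts} attempts")
```

A caller could catch `ServiceUnavailable`, print the message, and stop the scrape, which would cover the second point as long as progress is saved elsewhere.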

Auto change proxy

On HTTP 403 errors, automatically change the proxy and retry the request.
Allow passing auto_change=False to not switch proxies on error.
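
One way this could look, as a sketch only. The function name `request_with_proxy_rotation` and the `fetch(url, proxy)` callable are assumptions for illustration; only the `auto_change` flag comes from the issue itself:

```python
def request_with_proxy_rotation(fetch, url, proxies, auto_change=True):
    """Try url through each proxy in turn until one does not return 403.

    fetch(url, proxy) is an assumed callable returning (status_code, body).
    With auto_change=False only the first proxy is tried.
    Returns (status_code, body, proxy_used) for the last attempt made.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    candidates = proxies if auto_change else proxies[:1]
    for proxy in candidates:
        status, body = fetch(url, proxy)
        if status != 403:
            break  # this proxy was accepted, stop rotating
    return status, body, proxy
```

If every proxy returns 403, the caller gets the last 403 back and can decide what to do next.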

Exit if ran out of usable proxies

Right now it keeps rotating through the proxy list, trying the same ones over and over. If all proxies have been tried, exit and let the user know that all proxies are dead/banned.
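
A possible shape for this, assuming a small bookkeeping class. `ProxyPool` and `NoUsableProxies` are hypothetical names and not part of cutil:

```python
class NoUsableProxies(Exception):
    """Raised once every proxy in the pool has been marked dead."""


class ProxyPool:
    """Remembers which proxies failed so they are never handed out again."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._dead = set()

    def mark_dead(self, proxy):
        self._dead.add(proxy)

    def next_proxy(self):
        # Return the first proxy that has not been marked dead.
        for proxy in self._proxies:
            if proxy not in self._dead:
                return proxy
        raise NoUsableProxies(
            f"all {len(self._proxies)} proxies are dead or banned")
```

The scraper could catch `NoUsableProxies` at the top level, print the message, and exit instead of looping forever.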

Database upsert fails to insert data into a json field

Postgres has a json type.
When using the upsert function with a value that is a dict/json, it fails with the error:

psycopg2.ProgrammingError: column "foo1" is of type json but expression is of type text
HINT:  You will need to rewrite or cast the expression.
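
The hint suggests casting the expression. A rough sketch of that idea follows: serialize dict values to text and add an explicit `::json` cast to the placeholder. The helper `adapt_row` and its `json_columns` argument are hypothetical, and how cutil builds its upsert SQL is not shown here, so this only illustrates the cast:

```python
import json


def adapt_row(row, json_columns):
    """Return (placeholders, params) for one row.

    Columns listed in json_columns get a '%s::json' placeholder and their
    value is serialized with json.dumps; all other values pass through.
    """
    placeholders, params = [], []
    for column, value in row.items():
        if column in json_columns:
            placeholders.append("%s::json")
            params.append(None if value is None else json.dumps(value))
        else:
            placeholders.append("%s")
            params.append(value)
    return ", ".join(placeholders), params
```

psycopg2 also provides a `Json` wrapper in `psycopg2.extras` that may be a better fit, depending on how the query is built.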

Database upsert fails when a value is `None` for each row

When using the upsert function with a single dict in which a key has a None value (or with several dicts that all have None for the same key), the upsert fails with the error:
psycopg2.ProgrammingError: could not determine polymorphic type because input has type "unknown"

This is an issue with using unnest when the array only contains null values.

Examples:

fails_1 = [{'id': 1,
            'val1': None,
            'val2': 'bar',
            }]
fails_2 = [{'id': 1,
            'val1': None,
            'val2': 'bar',
            },
           {'id': 2,
            'val1': None,
            'val2': 'bar',
            }]

# The whole array is not None for a single field
success_1 = [{'id': 1,
              'val1': None,
              'val2': 'bar',
              },
             {'id': 2,
              'val1': 'foo',
              'val2': 'bar',
              }]
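
One possible workaround is to give every array an explicit type so Postgres does not have to infer it from the values. The sketch below assumes the caller can supply a column-to-type mapping. `unnest_expressions` and `column_types` are hypothetical, and the real query in cutil may be structured differently:

```python
def unnest_expressions(rows, column_types):
    """Build one unnest(...) expression per column with an explicit cast.

    rows is a non-empty list of dicts sharing the same keys.
    column_types maps column name -> Postgres type name (an assumption:
    the caller has to know the types up front).
    """
    columns = list(rows[0].keys())
    expressions, params = [], []
    for column in columns:
        pg_type = column_types[column]
        # The cast makes the array type known even when all values are NULL.
        expressions.append(f"unnest(%s::{pg_type}[])")
        params.append([row[column] for row in rows])
    return columns, ", ".join(expressions), params
```

With the `fails_1` rows above and types `integer`, `text`, `text`, the `val1` array would be sent as an all-NULL list with an explicit `::text[]` cast.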

Try another proxy if current one does not work

HTTPConnectionPool(host='<host>', port=<port>): Max retries exceeded with url: <url> (Caused by ProxyError('Cannot connect to proxy.', gaierror(11001, 'getaddrinfo failed')))

Try another proxy if one is available; otherwise exit the script and let the user know that the proxy does not work.
