tranco-python-package's People

Contributors

christopher-david-smith, maxzhenzhera, victorlep, zerosum24

tranco-python-package's Issues

Is Tranco dead?

The lists are currently unavailable to download via the website and, having just installed the Python package, I am getting errors which seem to imply it can't connect to or read from the online lists.

The website link for the up-to-date list gives a Cloudflare 502 error.

The permalink to https://tranco-list.eu/top-1m.csv.zip DOES work but I'm concerned it may not actually be up to date since it's not timestamped.

Old lists not downloadable

Hi,

When I try to download the list from 2019-01-01, I get a BadZipFile("File is not a zip file") error.
I can, however, directly download the list (.csv file) from the website.

The code to download the list is the following:

from tranco import Tranco

t = Tranco(cache=True, cache_dir='.tranco')
list_jan = t.list('2019-01-01')

The same error occurs for the list of 2019-02-01, while retrieving 2019-01-01 (and any list after that) works as expected.

Is there a workaround to fix this, like downloading the csv file directly through the Tranco module?

Thanks!
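
One possible workaround, sketched under assumptions: since the website serves some old lists as plain .csv rather than as a zip archive, the downloaded bytes could be checked before unzipping. The helper name parse_list_payload is hypothetical and not part of the tranco package; the sketch also assumes each line has the form "rank,domain" and that a zip archive holds a single CSV file.

```python
import io
import zipfile
from typing import List


def parse_list_payload(payload: bytes) -> List[str]:
    """Return the domains from a downloaded list, whether zipped or plain CSV.

    Hypothetical helper, not part of the tranco package.
    """
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # Assumption: the archive contains exactly one CSV file.
            name = archive.namelist()[0]
            text = archive.read(name).decode("utf-8")
    else:
        # Not a zip archive: treat the bytes as the CSV itself.
        text = payload.decode("utf-8")

    domains = []
    for line in text.splitlines():
        # Assumption: each line looks like "rank,domain".
        _rank, separator, domain = line.partition(",")
        if separator and domain.strip():
            domains.append(domain.strip())
    return domains
```

With this approach a BadZipFile error would no longer be raised for payloads that are already plain CSV; whether the package can adopt it depends on how it fetches the file.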

list() in tranco.py returns empty list since 2019-07-17

from tranco import Tranco
t = Tranco(cache=True, cache_dir='.tranco')
latest_list = t.list()

latest_list in the above code is an empty list ([]). This has been happening since 2019-07-17.
I tried the following dates; both returned an empty list.
date_list = t.list(date='2019-07-17')
date_list = t.list(date='2019-07-18')

As of 2019-07-18, the file for list ID 3NLL (tranco_3NLL.csv) is empty, whereas top-1m.csv from the same date is normal.

Has the file format changed recently?

def rank(self, domain) implementation is pretty slow

The rank() function uses list.index() to get the rank of any passed-in domain. This causes a linear scan through the list until it finds the domain, which can take a long time when looking up ranks for a large set of domains.

Instead you might build a dict[domain:rank] to hold the domain ranks. This way each rank lookup is a simple hashtable lookup.

If you want I can create a PR for you to do this.
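
A minimal sketch of the dict idea, to make the proposal concrete. The names build_rank_index and rank are hypothetical stand-ins, not the package's actual code, and the sketch assumes ranks are 1-based and that -1 is an acceptable value for a domain that is not in the list.

```python
from typing import Dict, List


def build_rank_index(domains: List[str]) -> Dict[str, int]:
    """Map each domain to its 1-based position in the ranked list."""
    index: Dict[str, int] = {}
    for position, domain in enumerate(domains, start=1):
        # setdefault keeps the first (best) rank if a domain repeats.
        index.setdefault(domain, position)
    return index


def rank(index: Dict[str, int], domain: str) -> int:
    """Look up a rank in constant time on average; -1 means not ranked."""
    return index.get(domain, -1)
```

Building the index costs one pass over the list, after which each lookup is a hash-table access instead of a linear scan:

```python
index = build_rank_index(["example.com", "example.org", "example.net"])
rank(index, "example.org")   # 2
rank(index, "missing.test")  # -1
```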

Confusing error when Tranco lists are down

Looks like the lists are down at this time (I get "502 Bad Gateway" from the download button on https://tranco-list.eu/).

One then gets a pretty confusing stack trace from the Tranco package:

>>> from tranco import Tranco
>>> t = Tranco(cache=False)
>>> tranco_list = t.list()
Traceback (most recent call last):
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 685, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 689, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 560, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/tranco/tranco.py", line 56, in list
    top_list_text = self._download_zip_file(list_id)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/tranco/tranco.py", line 81, in _download_zip_file
    r2 = requests.get(download_url)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/sessions.py", line 686, in send
    r.content
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 828, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
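
One way the package could make this failure easier to read is to catch the low-level network error and re-raise it with a message that names the likely cause. The sketch below is only an illustration: ListUnavailableError and fetch_with_clear_error are hypothetical names, and the download step is passed in as a callable so the example stays independent of any HTTP library. It relies on the fact that requests' exceptions derive from IOError (an alias of OSError in Python 3).

```python
import http.client
from typing import Callable


class ListUnavailableError(RuntimeError):
    """Hypothetical error raised when the list cannot be downloaded."""


def fetch_with_clear_error(download: Callable[[], bytes], source: str) -> bytes:
    """Run a download callable and translate network failures.

    `download` is any zero-argument function returning the raw bytes,
    for example a lambda wrapping requests.get(url).content.
    """
    try:
        return download()
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers requests' exception hierarchy and socket errors;
        # HTTPException covers http.client errors such as IncompleteRead.
        raise ListUnavailableError(
            "Could not download the Tranco list from {}; "
            "the service may be temporarily down ({!r}).".format(source, exc)
        ) from exc
```

The original exception stays attached as __cause__, so the full stack trace remains available for debugging while the final line of the traceback explains what went wrong.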
