distrinet / tranco-python-package
Python package to access the Tranco list
License: MIT License
AttributeError: The daily list for this date is currently unavailable.
Suspicious, a similar error occurred exactly 366 days prior...
The lists are currently unavailable to download via the website and, having just installed the Python package, I am getting errors which seem to imply it can't connect to or read from the online lists.
The website link for the up to date list gives a Cloudflare 502 error.
The permalink to https://tranco-list.eu/top-1m.csv.zip DOES work but I'm concerned it may not actually be up to date since it's not timestamped.
The code here does not strip newlines when the list is only partially loaded:
tranco-python-package/tranco/tranco.py
Lines 129 to 133 in f29ef30
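The referenced lines are not reproduced in this report, so the following is only a sketch of the kind of fix being suggested. It assumes the list is read line by line in `rank,domain` form and that only the first `n` entries are wanted. The helper name `read_top_domains` is hypothetical and is not part of the package.

```python
from itertools import islice
from typing import Iterable, List


def read_top_domains(lines: Iterable[str], n: int) -> List[str]:
    """Return the domains from the first n 'rank,domain' lines (hypothetical helper)."""
    domains = []
    for line in islice(lines, n):
        # Iterating over a file keeps the trailing "\n" or "\r\n"; strip it explicitly.
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumed line format: "<rank>,<domain>".
        _, _, domain = line.partition(",")
        domains.append(domain)
    return domains
```

Stripping inside the loop means a partial read returns the same clean strings as a full read would.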
Hi,
When I try to download the list from 01 Jan. 2019, I get a BadZipFile("File is not a zip file") error.
I can, however, directly download the list (.csv file) from the website.
The code to download the list is the following:
from tranco import Tranco
t = Tranco(cache=True, cache_dir='.tranco')
list_jan = t.list('2019-01-01')
The same error occurs for the list of 2019-02-01, while retrieving 2019-01-01 (and any list after that) works as expected.
Is there a workaround to fix this, like downloading the csv file directly through the Tranco module?
Thanks!
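One possible workaround, sketched here under the assumption that the server sometimes returns the plain CSV instead of a zip archive, is to check the downloaded bytes before unzipping. The function name `extract_list_text` is hypothetical and not part of the Tranco package. It uses only the standard library `zipfile` and `io` modules.

```python
import io
import zipfile


def extract_list_text(payload: bytes) -> str:
    """Return CSV text from a payload that may be a zip archive or plain CSV."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # Assumption: the archive holds a single CSV member, so read the first one.
            first_member = archive.namelist()[0]
            return archive.read(first_member).decode("utf-8")
    # Not a zip archive: treat the bytes as the CSV itself.
    return payload.decode("utf-8")
```

This would avoid the BadZipFile error when the response is valid CSV. It would not help if the response is an error page, so checking the HTTP status first is still advisable.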
from tranco import Tranco
t = Tranco(cache=True, cache_dir='.tranco')
latest_list = t.list()
latest_list in the above code is an empty list ([]). This has been happening since 2019-07-17.
I tried the following dates; both returned an empty list.
date_list = t.list(date='2019-07-17')
date_list = t.list(date='2019-07-18')
As of 2019-07-18, the file with ID 3NLL (tranco_3NLL.csv) is empty, whereas top-1m.csv on the same date is normal.
Has the file format changed recently?
The rank() function uses list.index() to get the rank of any passed-in domain. This causes a linear scan through the list until it finds the domain, which can take a long time when looking up ranks for a large set of domains.
Instead, you could build a dict mapping each domain to its rank. Each rank lookup then becomes a single hash table lookup.
If you want, I can create a PR for you to do this.
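A minimal sketch of the dict-based lookup proposed above, assuming the list is held as a Python list of domains ordered by rank. The names `build_rank_index` and `rank`, and the choice of -1 for a missing domain, are illustrative assumptions and not the package's actual API.

```python
from typing import Dict, List


def build_rank_index(domains: List[str]) -> Dict[str, int]:
    """Map each domain to its 1-based rank, keeping the first occurrence."""
    index: Dict[str, int] = {}
    for position, domain in enumerate(domains, start=1):
        # setdefault keeps the best (lowest) rank if a domain appears twice.
        index.setdefault(domain, position)
    return index


def rank(index: Dict[str, int], domain: str) -> int:
    """Return the rank of domain, or -1 if it is not in the list (assumed convention)."""
    return index.get(domain, -1)
```

Building the index is a single pass over the list, after which each lookup takes constant time on average instead of a scan of up to a million entries.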
Looks like the lists are down at this time (I get "502 Bad Gateway" from the download button on https://tranco-list.eu/).
One then gets a pretty confusing stack trace from the Tranco package:
>>> from tranco import Tranco
>>> t = Tranco(cache=False)
>>> tranco_list = t.list()
Traceback (most recent call last):
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 685, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 425, in _error_catcher
yield
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 752, in read_chunked
self._update_chunk_length()
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 689, in _update_chunk_length
raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 750, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 560, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 781, in read_chunked
self._original_response.close()
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/urllib3/response.py", line 443, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/tranco/tranco.py", line 56, in list
top_list_text = self._download_zip_file(list_id)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/tranco/tranco.py", line 81, in _download_zip_file
r2 = requests.get(download_url)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/sessions.py", line 686, in send
r.content
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 828, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/home/USER/.virtualenvs/VENV/lib/python3.5/site-packages/requests/models.py", line 753, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
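One way the package could make this failure easier to read is to catch the low-level error and re-raise a single descriptive exception. The sketch below is only an illustration: `ListUnavailableError` and `download_or_explain` are hypothetical names, and the download itself is passed in as a callable so the example stays free of network access. It relies on the fact that the exceptions raised by requests derive from IOError, which is an alias of OSError in Python 3.

```python
from typing import Callable


class ListUnavailableError(RuntimeError):
    """Raised when the list cannot be downloaded (hypothetical name)."""


def download_or_explain(fetch: Callable[[], bytes]) -> bytes:
    """Run fetch() and turn connection-level failures into one clear error."""
    try:
        return fetch()
    except OSError as exc:
        # requests.exceptions.RequestException subclasses IOError (OSError),
        # so ChunkedEncodingError and similar failures end up here.
        raise ListUnavailableError(
            "The Tranco list could not be downloaded; "
            "the server may be temporarily unavailable."
        ) from exc
```

Using `raise ... from exc` keeps the original exception available as `__cause__` for debugging, while the last line of the traceback states the actual problem.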