
parfive's Introduction

Parfive


A parallel file downloader using asyncio. parfive can handle downloading multiple files in parallel as well as downloading each file in a number of chunks.

Usage

asciicast demo of parfive

parfive works by creating a downloader object, appending files to it, and then running the download. parfive has a synchronous API but uses asyncio to parallelise downloading the files.

A simple example is:

from parfive import Downloader
dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
files = dl.download()

Parfive also bundles a CLI. The following example will download the two files concurrently:

$ parfive 'http://212.183.159.230/5MB.zip' 'http://212.183.159.230/10MB.zip'
$ parfive --help
usage: parfive [-h] [--max-conn MAX_CONN] [--overwrite] [--no-file-progress]
              [--directory DIRECTORY] [--print-filenames]
              URLS [URLS ...]

Parfive, the python asyncio based downloader

positional arguments:
  URLS                  URLs of files to be downloaded.

optional arguments:
  -h, --help            show this help message and exit
  --max-conn MAX_CONN   Number of maximum connections.
  --overwrite           Overwrite if the file exists.
  --no-file-progress    Hide the progress bar for each file.
  --directory DIRECTORY
                        Directory to which downloaded files are saved.
  --print-filenames     Print the names of successfully downloaded files to stdout.

Results

parfive.Downloader.download returns a parfive.Results object, which is a list of the filenames that have been downloaded. It also tracks any files which failed to download.

Handling Errors

If files fail to download, the URLs and the responses from the server are stored in the Results object returned by parfive.Downloader.download. These can be used to inform users about the errors. (Note that the progress bar will finish in an incomplete state if a download fails, i.e. it will show 4/5 Files Downloaded.)

The Results object is a list with an extra errors attribute. This property returns a list of named tuples, each containing the .url and the .response, which is an aiohttp.ClientResponse or an aiohttp.ClientError object.
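
For example, the failures can be reported like this (a minimal sketch using the .url and .response attributes described above; the URL is illustrative):

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/does-not-exist.txt", path="./")
results = dl.download()
# Each entry in .errors carries the failed URL and the server response or error.
for error in results.errors:
    print(error.url, error.response)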

Installation

parfive is available on PyPI; you can install it with pip:

pip install parfive

or if you want to use FTP downloads:

pip install parfive[ftp]

Requirements

  • Python 3.7 or above
  • aiohttp
  • tqdm
  • aioftp (for downloads over FTP)

Licence

MIT Licensed

Authors

parfive was written by Stuart Mumford.


parfive's Issues

Doc build broken

The doc build is broken, with the rather cryptic error:

Theme error:
An error happened in rendering the page api/parfive.Downloader.
Reason: UndefinedError("'str object' has no attribute 'items'")

I've deduced this is due to a recent change in https://github.com/sunpy/sunpy-sphinx-theme.

Can't download NOAA SRS FTP file

import parfive
print(parfive.__version__)

from parfive import Downloader
dl = Downloader()
dl.enqueue_file("ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/solar_region_summaries/2015/01/20150101SRS.txt", path="./")
files = dl.download()

raises

1.5.1

Files Downloaded:   0%|                                                                                            | 0/1 [00:00<?, ?file/s]Failed to get size of FTP file
Traceback (most recent call last):
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/parfive/utils.py", line 69, in get_ftp_size
    size = await client.stat(filepath)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 820, in stat
    code, info = await self.command("MLST " + str(path), "2xx")
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 272, in command
    self.check_codes(expected_codes, code, info)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 224, in check_codes
    raise errors.StatusCodeError(expected_codes, received_code, info)
aioftp.errors.StatusCodeError: Waiting for ('2xx',) but got 550 [' Permission denied.']
Files Downloaded:   0%|                                                                                            | 0/1 [00:01<?, ?file/s]
20150101SRS.txt: 0.00B [00:00, ?B/s]%    

I can download this file fine using Cyberduck.

Python 3.6 with parfive 1.0.x and aioftp

E AttributeError: module 'contextlib' has no attribute 'asynccontextmanager' on the sunpy 2.0.X conda (py36) build.

Looks like the conda build uses parfive 1.0.2 and this has errored.

../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/data/__init__.py:5: in <module>
    from sunpy.data._sample import download_sample_data
../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/data/_sample.py:7: in <module>
    from sunpy.util.parfive_helpers import Downloader
../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/util/parfive_helpers.py:5: in <module>
    import parfive
../../.tox/py36-conda/lib/python3.6/site-packages/parfive/__init__.py:5: in <module>
    from .downloader import Downloader
../../.tox/py36-conda/lib/python3.6/site-packages/parfive/downloader.py:17: in <module>
    import aioftp
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/__init__.py:4: in <module>
    from .client import *
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/client.py:553: in <module>
    class Client(BaseClient):
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/client.py:1164: in Client
    @contextlib.asynccontextmanager
E   AttributeError: module 'contextlib' has no attribute 'asynccontextmanager'

Way to find number of queued files

It would be nice if Downloader could have an attribute giving the number of files queued for download. (Just starting to play with this for HelioPy.)
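
A hypothetical sketch of what this could look like (the queued_downloads name is an assumption, not existing API):

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
print(dl.queued_downloads)  # hypothetical attribute; would print 1 here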

Command line interface?

It might be useful to create a command line interface, so people can just fire off requests without having to write any code to use the library.

As a basic interface, just give it a file with one URL per line, or maybe a lot of URLs as arguments (although that could hit up against argument-length limits).

If you wanted to be fancy, have it be able to read a tab-delimited or CSV file and specify which field it should look at for URLs. (Yes, people could use the unix cut command to extract the URLs first, but not everyone's on a POSIX-type system.)

Add ability to resume downloads

Downloads should be resumable if they encounter a network problem or similar and get cancelled.

This would only work if the server supports the Range request header.

Implementation needs discussion.
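
As a sketch of the mechanics (not parfive code), resuming with aiohttp could look like this, assuming the server honours Range:

import os
import aiohttp

async def resume_download(url, path):
    # Ask the server for everything after the bytes we already have.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers={"Range": f"bytes={start}-"}) as resp:
            if resp.status != 206:  # 206 Partial Content means Range was honoured
                raise RuntimeError("Server ignored the Range header")
            with open(path, "ab") as f:
                async for chunk in resp.content.iter_chunked(8192):
                    f.write(chunk)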

More robust content-disposition header parsing

It seems that some of the VSO / DRMS requests return invalid content-disposition headers (sunpy/sunpy#3372). We could make the way we parse this header more robust (by only extracting things from within the quotes if the cgi.parse_header function fails because of an invalid header).

I am of two minds about this, it's an easy enough fix, but I am concerned that we could accidentally introduce incorrect behaviour by deviating from the standard parsing library.
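
A sketch of the proposed fallback (illustrative, not a final implementation):

import cgi
import re

def robust_filename(header):
    # Prefer the standard library parser...
    _, params = cgi.parse_header(header)
    if params.get("filename"):
        return params["filename"]
    # ...and only pull text out of the quotes if it comes up empty.
    match = re.search(r'"([^"]+)"', header)
    return match.group(1) if match else None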

Queue Full Error with 1.2.0rc1

../../.tox/py38-online/lib/python3.8/site-packages/sunpy/net/fido_factory.py:369: in fetch
    results = downloader.download()
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:273: in download
    return self._run_in_loop(self.run_download(timeouts))
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:193: in _run_in_loop
    return asyncio.run(coro)
/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/runners.py:43: in run
    return loop.run_until_complete(main)
/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
    return future.result()
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:226: in run_download
    done.update(await self._run_http_download(main_pb, timeouts))
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:324: in _run_http_download
    self.http_tokens.generate_queue(maxsize=self.max_conn),
../../.tox/py38-online/lib/python3.8/site-packages/parfive/utils.py:171: in generate_queue
    queue.put_nowait(item)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Queue at 0x7fd43f044160 maxsize=4 _queue=[<parfive.utils.Token object at 0x7fd43efaff70>n = 1, <parfive.utils.Token o...= 2, <parfive.utils.Token object at 0x7fd43efafbe0>n = 3, <parfive.utils.Token object at 0x7fd43efafa30>n = 4] tasks=4>
item = <parfive.utils.Token object at 0x7fd43efafd00>n = 5

    def put_nowait(self, item):
        """Put an item into the queue without blocking.
    
        If no free slot is immediately available, raise QueueFull.
        """
        if self.full():
>           raise QueueFull
E           asyncio.queues.QueueFull

/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/queues.py:148: QueueFull

https://dev.azure.com/sunpy/sunpy/_build/results?buildId=9500&view=logs&j=a7b3aa55-7d57-562f-3433-7f6b2d4252da&t=c806977a-3b27-51ee-41ed-46d1d189e419&l=960

Add a class method for quick download

If you have an iterable of URLs it should be easy to download all of them in one go; at the moment you have to instantiate Downloader, call enqueue_file for each URL, and then call download.

It would be nice to add a classmethod which looks like this:

@classmethod
def quick_download(cls, *urls, path=None, overwrite=None):
    dl = cls()
    for url in urls:
        dl.enqueue_file(url, path=path, overwrite=overwrite)
    return dl.download()

This would be a quick and easy way to download a bunch of files. If you wanted more control over the kwargs, you could use the longer API.
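
Usage would then be a one-liner (hypothetical until the classmethod lands):

from parfive import Downloader

files = Downloader.quick_download(
    "http://212.183.159.230/5MB.zip",
    "http://212.183.159.230/10MB.zip",
    path="./",
)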

Support HTTP Basic Auth

This involves a way of passing arguments to aiohttp.ClientSession, which means doing it at construct time (or via properties), as we only use one session per call to download().
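
A sketch of what that could look like (the session_kwargs name is an assumption, not parfive's current API):

import aiohttp
from parfive import Downloader

# Hypothetical: forward keyword arguments to aiohttp.ClientSession at
# construct time, so every request in the session is authenticated.
dl = Downloader(session_kwargs={"auth": aiohttp.BasicAuth("user", "password")})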

Support max connections per server

Given we can now open many connections per server (n files × n splits), it would be useful to be able to specify limits for servers which cannot handle (or even block) too many connections. This would mean that we could prioritise downloads from other servers to maintain the parallelism while putting less load on any one server.
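
One possible shape for this, sketched with a per-host asyncio.Semaphore (names and limits are illustrative):

import asyncio
from urllib.parse import urlsplit

host_semaphores = {}

def semaphore_for(url, max_conn_per_host=2):
    # One semaphore per hostname caps concurrent connections to that server.
    host = urlsplit(url).netloc
    if host not in host_semaphores:
        host_semaphores[host] = asyncio.Semaphore(max_conn_per_host)
    return host_semaphores[host]

async def limited_fetch(url):
    async with semaphore_for(url):
        ...  # run the actual chunk download here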

Kill the event loop when appropriate

The main event loop keeps running in its own little thread even if the main sync download function gets killed.

We should somehow pass a KeyboardInterrupt in the sync download function through to the underlying event loop.

Need to improve the download error handling.

From SunPy Travis build:

test_norh.py::test_get_url_for_time_range[timerange1-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121201-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121202] 
../.tox/py37-online/lib/python3.7/site-packages/sunpy/net/dataretriever/tests/test_norh.py::test_get_url_for_time_range[timerange1-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121201-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121202] Task was destroyed but it is pending!
task: <Task pending coro=<Downloader._get_ftp() running at /home/travis/build/sunpy/sunpy/.tox/py37-online/lib/python3.7/site-packages/parfive/downloader.py:415> wait_for=<Future finished exception=TimeoutError(110, "Connect call failed ('140.90.33.174', 58394)")> cb=[Downloader._run_from_queue.<locals>.callback(<parfive.util...d1170e80>n = 1, main_pb=Files Downloa...21<?, ?file/s])() at /home/travis/build/sunpy/sunpy/.tox/py37-online/lib/python3.7/site-packages/parfive/downloader.py:267, _wait.<locals>._on_completion() at /home/travis/miniconda/lib/python3.7/asyncio/tasks.py:440]>
Exception ignored in: <coroutine object Downloader._get_ftp at 0x7f19d2484648>
RuntimeError: coroutine ignored GeneratorExit
Future exception was never retrieved
future: <Future finished exception=TimeoutError(110, "Connect call failed ('140.90.33.174', 58394)")>
Traceback (most recent call last):
  File "/home/travis/miniconda/lib/python3.7/asyncio/selector_events.py", line 505, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 110] Connect call failed ('140.90.33.174', 58394)

We need to make sure we are catching and handling all the errors inside the _get_http and _get_ftp methods.

keyboard interrupt does not kill the download loop cleanly

In a jupyter notebook, because the loop is running in a threadpool, this means that the download isn't cancelled at all. In a regular session it causes all the coroutines to fail with a KeyboardInterrupt exception.

Thoughts on implementing this:

  • We can react to the signal with event_loop.add_signal_handler (see the sketch after this list).
  • The event handler should cleanly terminate all pending and running tasks by calling Task.cancel (addressing #61 should mean we always have a Task object to cancel).
  • This should involve removing any part-finished files from disk so that there aren't corrupt files left lying around.
  • When a Task is cancelled it throws a CancelledError. This would need to be caught in _get_http and _get_ftp to cleanly terminate the task.
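
A sketch of the handler described above (illustrative; parfive's internals differ):

import asyncio
import signal

def install_interrupt_handler(loop):
    def cancel_everything():
        # Cancelling a task raises CancelledError inside it, which
        # _get_http/_get_ftp would catch to clean up partial files.
        for task in asyncio.all_tasks(loop):
            task.cancel()
    loop.add_signal_handler(signal.SIGINT, cancel_everything)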

Downloads fail when behind proxy

Any downloads behind a proxy hang indefinitely.

I realize this is difficult to test so I'm happy to be the guinea pig for any proposed fixes! Of course, all proxies may be slightly different as well...

MWE

>>> from parfive import Downloader
>>> dl = Downloader()
>>> dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
>>> files = dl.download()  # <-- this hangs forever

Also true when downloading sample data in sunpy or doing any query with sunpy.net.Fido.
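
For what it's worth, aiohttp only reads the HTTP_PROXY/HTTPS_PROXY environment variables when the session is created with trust_env=True, so a fix might involve something like this (sketch, assuming parfive exposed the session construction):

import aiohttp

session = aiohttp.ClientSession(trust_env=True)  # honour proxy env vars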

Timeout options sourced from env variables should be cast to `int`

When reading the timeouts from the environment variables,

timeouts = timeouts or {
    "total": os.environ.get("PARFIVE_TOTAL_TIMEOUT", 5 * 60),
    "sock_read": os.environ.get("PARFIVE_SOCK_READ_TIMEOUT", 90),
}

the resulting values should be cast to int. Otherwise, this results in a TypeError due to the attempted comparison between a str and an int. I'm guessing this is happening somewhere in aiohttp.
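
A sketch of the suggested fix, wrapped in a helper for clarity:

import os

def get_timeouts(timeouts=None):
    # Cast the env values, which arrive as strings when set, so later
    # comparisons in aiohttp see ints rather than a str/int mix.
    return timeouts or {
        "total": int(os.environ.get("PARFIVE_TOTAL_TIMEOUT", 5 * 60)),
        "sock_read": int(os.environ.get("PARFIVE_SOCK_READ_TIMEOUT", 90)),
    }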

Pytest 6.2.5: Unraisable Exception Warning

Hi,

  • Python 3.9
  • Pytest 6.2.5

During packing your project for Guix I've got the warning during check phase:

=============================== warnings summary ===============================
parfive/tests/test_downloader.py::test_async_download[True]
parfive/tests/test_downloader.py::test_download_unique
parfive/tests/test_downloader.py::test_custom_user_agent
  /gnu/store/7frqm5ijy66f81hr8i1j6791k84lds9w-python-pytest-6.2.5/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function BaseEventLoop.__del__ at 0x7ffff65cd280>

  Traceback (most recent call last):
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/base_events.py", line 683, in __del__
      self.close()
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/unix_events.py", line 61, in close
      self.remove_signal_handler(sig)
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/unix_events.py", line 150, in remove_signal_handler
      signal.signal(sig, handler)
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/signal.py", line 47, in signal
      handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
  ValueError: signal only works in main thread of the main interpreter

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================= 53 passed, 3 warnings in 12.43s ========================

`progress=False` and `overwrite=True` are ignored by `simple_download()`

With the following code

from parfive import Downloader
files = [
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T00_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T01_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T02_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T03_05.tar',
]
dl = Downloader(progress=False, overwrite=True)
print('With enqueue_file() and download():')
for f in files: dl.enqueue_file(f, path='result')
dl.download()
print('Done')
print('With simple_download():')
dl = Downloader(progress=False, overwrite=True)
dl.simple_download(files, path='result')
print('Done')

the resulting terminal output is

With enqueue_file() and download():
Done
With simple_download():
Files Downloaded: 100%|███████████████████████████████████████████████| 4/4 [00:00<00:00, 31.12file/s]
Done

Then:

  • enqueue_file() then download() takes progress=False and overwrite=True into account
  • a progress bar is shown at the second download, so progress=False is ignored by simple_download()
  • the download speed indicates that overwrite=True is also ignored by simple_download(); removing the files before the second download results in a much slower (and realistic) download speed
Files Downloaded: 100%|███████████████████████████████████████████████| 4/4 [00:21<00:00,  5.28s/file]

Expected behavior: progress=False and overwrite=True should not be ignored by simple_download()

Specifying overwrite=True in the call to simple_download() works, but the expectation is that overwrite from Downloader is inherited by default when calling simple_download() without specifying overwrite.

User specified headers are ignored if User-Agent isn't specified

Maybe I'm missing something, but I am trying to pass in headers like this:

dl = Downloader(
    headers=dict(one=1, two=2, three=3)
)

But they do not seem to be getting set, and in dl.headers all I see is:
{'User-Agent': 'parfive/1.5.1 aiohttp/3.8.1 python/3.8.1'}

Hope someone can shed some light on what I am doing wrong...

Thanks

Failed or interrupted downloads leave incomplete files on disk

Downloads that fail, or that are interrupted by the user, are left as half-downloaded files. This is inconvenient: one has to go through and manually delete any half-downloaded files before running a download again, because parfive will see the filename and think the file has already been downloaded.

It would be good if files could be removed if they are not completely downloaded when a download is interrupted or fails.

Add option to create temporary files for partial downloads

Currently, if the process is killed it's impossible to find out whether a file was completely downloaded. Thus it would be great to add an option so that files are created with a prefix or suffix and renamed to the actual filename only once they are complete.
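
A sketch of the pattern (the .part suffix is illustrative):

import os

def finish_download(partial_path):
    # Atomically promote the temporary file once the download completes,
    # so an interrupted run never leaves a partial file under its final name.
    assert partial_path.endswith(".part")
    os.replace(partial_path, partial_path[:-len(".part")])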

Ipython terminal breaks

from sunpy.net import Fido, attrs as a
import astropy.units as u
from os import mkdir, path

results = Fido.search(a.jsoc.Time('2019/01/01', '2019/02/01'), a.jsoc.Notify('[email protected]'), a.jsoc.Series('aia.lev1_euv_12s'), a.jsoc.Wavelength(304 * u.angstrom),a.Sample(2 * u.hour))

downloaded_file = Fido.fetch(results, path= path.abspath('AIA304')+'/{file}')

The downloads start, the progress bar shows up and then it kind of dies.


Adding this as a workaround:

num_missing = 100
while num_missing > 0:
    downloaded_file = Fido.fetch(results, path= path.abspath('AIA304')+'/{file}')
    num_missing = len(downloaded_file.errors)


Review and improve documentation of arguments

Currently there are two main parts of the API: the constructor to Downloader and the enqueue_file method. Both of these accept many more kwargs than are in the function signature or docstring, because they get passed down the tree into the partial functions that get executed in the loop etc. We should make sure that we are using clear keyword argument names and that it's possible to easily see all the possible arguments in the docs (and, even better, in function introspection).

Download fails with 400, message='Can not decode content-encoding: gzip'

When trying to download a (seemingly) plain text file, the download fails and I'm getting the following error,

[<parfive.results.Error object at 0x1078a66d0>
https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt
 400, message='Can not decode content-encoding: gzip']

The code to reproduce this is,

import parfive
dl = parfive.Downloader()
dl.enqueue_file('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt', path='.')
foo = dl.download()

parfive version: 2.0.2
Python version: 3.9
OS: macOS 12.6.2


Solution detailed here: #121 (comment)

Inform user if >0 downloads fail

I was recently downloading a bunch of files (parfive is awesome!) when parfive stopped downloading after 12/40 files, and the output in my terminal was this:

Downloading 40 files
Files Downloaded:  30%|█████████████████████████████████████████████                                                                                                         | 12/40 [00:13<00:31,  1.14s/file]

I got very confused by this, and didn't understand why parfive was stopping - if I ran the download again on the same 40 files, it would just download 12 more that it hadn't downloaded before.

Eventually I thought to check if any download errors were present, and found lots of

"Waiting for ('220',) but got 421 [' There are too many connections from your internet address.'

which explains what happened, but it took me a while to get there. I think it would be helpful if parfive issued some sort of warning if any of the downloads failed, perhaps something like "At least one download failed, see Result.errors for a list of the download errors".
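
A user-side version of that check, as a sketch (dl is a Downloader with files already enqueued):

import warnings

results = dl.download()
if results.errors:
    warnings.warn("At least one download failed, see results.errors for a list of the download errors.")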

python 3.8 deprecation warning

I'm seeing this with python 3.8 and parfive:

/home/vsts/work/1/s/.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:101: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
    self.http_tokens = asyncio.Queue(maxsize=self.max_conn, loop=self.loop)
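
The fix is presumably to stop passing loop, which asyncio.Queue picks up from the running loop on Python 3.8+ (sketch):

import asyncio

http_tokens = asyncio.Queue(maxsize=4)  # maxsize illustrative; no loop kwarg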

Task was destroyed but still pending

Task was destroyed but it is pending!
task: <Task pending name='Task-119' coro=<Downloader._write_worker() running at /home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/site-packages/parfive/downloader.py:469> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7fdd519a08e0>()]>>
Exception ignored in: <coroutine object Downloader._write_worker at 0x7fdd51988440>
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/site-packages/parfive/downloader.py", line 469, in _write_worker
    offset, chunk = await queue.get()
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/queues.py", line 165, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/base_events.py", line 719, in call_soon
    self._check_closed()
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

Callback on download completion

It would be great to be able to pass a callback on download completion, ideally with the status of completion (successful/error, source url, target path). That would, for example, enable me to implement #122 myself.
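
A hypothetical shape for the API (the done_callback name is an assumption):

def on_done(url, path, error=None):
    # Called once per file with the outcome; error is None on success.
    status = "ok" if error is None else f"failed: {error}"
    print(f"{url} -> {path} [{status}]")

# dl.enqueue_file(url, path="./", done_callback=on_done)  # hypothetical kwarg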

Large downloads from the Solar Orbiter Archive timeout

This code attempts to download a 928 MB file:

from parfive import Downloader
dl = Downloader(max_conn=1)
url = 'http://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=LAST_PRODUCT&product_type=SCIENCE&data_item_id=solo_L2_swa-pas-vdf_20201026'
dl.enqueue_file(url, path="./", max_splits=1)
files = dl.download()
print(files)

But after ~600-700 MB it reliably times out with:

Files Downloaded:   0%|                                                                                  | 0/1 [05:00<?, ?file/s]
[]lo_L2_swa-pas-vdf_20201026_V01.cdf:  73%|█████████████████████████████████████▏             | 715M/982M [05:00<02:01, 2.20MB/s]
Errors:
(error(filepath_partial=functools.partial(<function default_name at 0x7f82a960f8b0>, PosixPath('.')), url='http://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=LAST_PRODUCT&product_type=SCIENCE&data_item_id=solo_L2_swa-pas-vdf_20201026', exception=TimeoutError()))

Downloading this file from the archive via the web interface in Firefox works fine.
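
The failure at the 05:00 mark matches the five-minute default total timeout seen elsewhere in this tracker, so raising PARFIVE_TOTAL_TIMEOUT before downloading may be a workaround (untested sketch):

import os

os.environ["PARFIVE_TOTAL_TIMEOUT"] = str(60 * 60)  # allow up to an hour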

Add debug logging and env vars

It would be really useful when debugging both request and asyncio issues to know how far each of the coroutines got through its processing. Along the same lines, it would be nice if you could disable multiple chunks etc. with env vars.
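
A debug mode could build on standard-library logging (sketch; the "parfive" logger name is an assumption):

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("parfive").debug("how far did each coroutine get?")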

Add automatic retry option

It would be great if you could add a flag to configure the number of retries in case a download fails.
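
Until such a flag exists, a user-side retry loop can be built from Results.errors (sketch):

from parfive import Downloader

def download_with_retries(urls, path="./", attempts=3):
    pending = list(urls)
    for _ in range(attempts):
        dl = Downloader()
        for url in pending:
            dl.enqueue_file(url, path=path)
        results = dl.download()
        # Each error entry carries the URL that failed; queue it again.
        pending = [err.url for err in results.errors]
        if not pending:
            break
    return results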

ResourceWarning in downloader

I'm seeing the following warning in the pfsspy tests. Not entirely sure what's causing it, but thought I would leave this here so there's a paper trail if anyone wants to investigate further:

 _______________________ ERROR at setup of test_adapt_map _______________________

cls = <class '_pytest.runner.CallInfo'>
func = <function call_runtest_hook.<locals>.<lambda> at 0x7f500bc548b0>
when = 'setup'
reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

    @classmethod
    def from_call(
        cls,
        func: "Callable[[], TResult]",
        when: "Literal['collect', 'setup', 'call', 'teardown']",
        reraise: Optional[
            Union[Type[BaseException], Tuple[Type[BaseException], ...]]
        ] = None,
    ) -> "CallInfo[TResult]":
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: Optional[TResult] = func()

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/runner.py:311: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/runner.py:255: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/unraisableexception.py:83: in pytest_runtest_setup
    yield from unraisable_exception_runtest_hook()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def unraisable_exception_runtest_hook() -> Generator[None, None, None]:
        with catch_unraisable_exception() as cm:
            yield
            if cm.unraisable:
                if cm.unraisable.err_msg is not None:
                    err_msg = cm.unraisable.err_msg
                else:
                    err_msg = "Exception ignored in"
                msg = f"{err_msg}: {cm.unraisable.object!r}\n\n"
                msg += "".join(
                    traceback.format_exception(
                        cm.unraisable.exc_type,
                        cm.unraisable.exc_value,
                        cm.unraisable.exc_traceback,
                    )
                )
>               warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
E               pytest.PytestUnraisableExceptionWarning: Exception ignored in: <socket.socket fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>
E               
E               Traceback (most recent call last):
E                 File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/parfive/downloader.py", line 527, in _http_download_worker
E                   await queue.put((offset, chunk))
E               ResourceWarning: unclosed <socket.socket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.1.0.4', 55242), raddr=('146.5.21.61', 443)>

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning
