
parfive's Introduction

Parfive


A parallel file downloader using asyncio. parfive can handle downloading multiple files in parallel as well as downloading each file in a number of chunks.

Usage

asciicast demo of parfive

parfive works by creating a downloader object, appending files to it, and then running the download. parfive has a synchronous API but uses asyncio to parallelise downloading the files.

A simple example is:

from parfive import Downloader
dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
files = dl.download()

Parfive also bundles a CLI. The following example will download the two files concurrently:

$ parfive 'http://212.183.159.230/5MB.zip' 'http://212.183.159.230/10MB.zip'
$ parfive --help
usage: parfive [-h] [--max-conn MAX_CONN] [--overwrite] [--no-file-progress]
              [--directory DIRECTORY] [--print-filenames]
              URLS [URLS ...]

Parfive, the python asyncio based downloader

positional arguments:
  URLS                  URLs of files to be downloaded.

optional arguments:
  -h, --help            show this help message and exit
  --max-conn MAX_CONN   Number of maximum connections.
  --overwrite           Overwrite if the file exists.
  --no-file-progress    Hide the progress bar for each file.
  --directory DIRECTORY
                        Directory to which downloaded files are saved.
  --print-filenames     Print the names of successfully downloaded files to stdout.

Results

parfive.Downloader.download returns a parfive.Results object, which is a list of the filenames that have been downloaded. It also tracks any files which failed to download.

Handling Errors

If files fail to download, the URLs and the responses from the server are stored in the Results object returned by parfive.Downloader.download. These can be used to inform users about the errors. (Note that the progress bar will finish in an incomplete state if a download fails, i.e. it will show 4/5 Files Downloaded.)

The Results object is a list with an extra errors attribute. This property returns a list of named tuples, each containing the .url and the .response, which is an aiohttp.ClientResponse or an aiohttp.ClientError object.
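
For example, the failures can be reported like this (a minimal sketch using the .url and .response attributes described above; the URL is illustrative):

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/does-not-exist.txt", path="./")
results = dl.download()
# Each entry in .errors carries the failed URL and the server response or error.
for error in results.errors:
    print(error.url, error.response)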

Installation

parfive is available on PyPI; you can install it with pip:

pip install parfive

or if you want to use FTP downloads:

pip install parfive[ftp]

Requirements

  • Python 3.7 or above
  • aiohttp
  • tqdm
  • aioftp (for downloads over FTP)

Licence

MIT Licensed

Authors

parfive was written by Stuart Mumford.


parfive's Issues

Doc build broken

The doc build is broken, with the rather cryptic error:

Theme error:
An error happened in rendering the page api/parfive.Downloader.
Reason: UndefinedError("'str object' has no attribute 'items'")

I've deduced this is due to a recent change in https://github.com/sunpy/sunpy-sphinx-theme.

Can't download NOAA SRS FTP file

import parfive
print(parfive.__version__)

from parfive import Downloader
dl = Downloader()
dl.enqueue_file("ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/solar_region_summaries/2015/01/20150101SRS.txt", path="./")
files = dl.download()

raises

1.5.1

Files Downloaded:   0%|                                                                                            | 0/1 [00:00<?, ?file/s]Failed to get size of FTP file
Traceback (most recent call last):
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/parfive/utils.py", line 69, in get_ftp_size
    size = await client.stat(filepath)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 820, in stat
    code, info = await self.command("MLST " + str(path), "2xx")
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 272, in command
    self.check_codes(expected_codes, code, info)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/aioftp/client.py", line 224, in check_codes
    raise errors.StatusCodeError(expected_codes, received_code, info)
aioftp.errors.StatusCodeError: Waiting for ('2xx',) but got 550 [' Permission denied.']
Files Downloaded:   0%|                                                                                            | 0/1 [00:01<?, ?file/s]
20150101SRS.txt: 0.00B [00:00, ?B/s]%    

I can download this file fine using Cyberduck.

Python 3.6 with parfive 1.0.x and aioftp

E AttributeError: module 'contextlib' has no attribute 'asynccontextmanager' on the sunpy 2.0.X conda (py36) build.

Looks like the conda build uses parfive 1.0.2 and this has errored.

../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/data/__init__.py:5: in <module>
    from sunpy.data._sample import download_sample_data
../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/data/_sample.py:7: in <module>
    from sunpy.util.parfive_helpers import Downloader
../../.tox/py36-conda/lib/python3.6/site-packages/sunpy/util/parfive_helpers.py:5: in <module>
    import parfive
../../.tox/py36-conda/lib/python3.6/site-packages/parfive/__init__.py:5: in <module>
    from .downloader import Downloader
../../.tox/py36-conda/lib/python3.6/site-packages/parfive/downloader.py:17: in <module>
    import aioftp
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/__init__.py:4: in <module>
    from .client import *
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/client.py:553: in <module>
    class Client(BaseClient):
../../.tox/py36-conda/lib/python3.6/site-packages/aioftp/client.py:1164: in Client
    @contextlib.asynccontextmanager
E   AttributeError: module 'contextlib' has no attribute 'asynccontextmanager'

Way to find number of queued files

It would be nice if Downloader could have an attribute giving the number of files queued for download. (Just starting to play with this for HelioPy.)
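
A hypothetical sketch of what this could look like (the queued_downloads name is an assumption, not existing API):

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
print(dl.queued_downloads)  # hypothetical attribute; would print 1 here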

Command line interface?

It might be useful to create a command line interface, so people can just fire off requests without having to write any code to use the library.

As a basic interface, just give it a file with one URL per line, or maybe a lot of URLs as arguments (although that could hit up against argument-length limits).

If you wanted to be fancy, have it be able to read a tab-delimited or CSV file and specify which field it should look at for URLs. (Yes, people could use the unix cut command to extract the URLs first, but not everyone's on a POSIX-type system.)

Add ability to resume downloads

Downloads should be resumable if they encounter a network problem or similar and get cancelled.

This would only work if the server supports the Range request header.

Implementation needs discussion.
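
As a sketch of the mechanics (not parfive code), resuming with aiohttp could look like this, assuming the server honours Range:

import os
import aiohttp

async def resume_download(url, path):
    # Ask the server for everything after the bytes we already have.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers={"Range": f"bytes={start}-"}) as resp:
            if resp.status != 206:  # 206 Partial Content means Range was honoured
                raise RuntimeError("Server ignored the Range header")
            with open(path, "ab") as f:
                async for chunk in resp.content.iter_chunked(8192):
                    f.write(chunk)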

More robust content-disposition header parsing

It seems that some of the VSO / DRMS requests return invalid content-disposition headers (sunpy/sunpy#3372). We could make the way we parse this header more robust (by only extracting things from within the quotes if the cgi.parse_header function fails because of an invalid header).

I am of two minds about this, it's an easy enough fix, but I am concerned that we could accidentally introduce incorrect behaviour by deviating from the standard parsing library.
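
A sketch of the proposed fallback (illustrative, not a final implementation):

import cgi
import re

def robust_filename(header):
    # Prefer the standard library parser...
    _, params = cgi.parse_header(header)
    if params.get("filename"):
        return params["filename"]
    # ...and only pull text out of the quotes if it comes up empty.
    match = re.search(r'"([^"]+)"', header)
    return match.group(1) if match else None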

Queue Full Error with 1.2.0rc1

../../.tox/py38-online/lib/python3.8/site-packages/sunpy/net/fido_factory.py:369: in fetch
    results = downloader.download()
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:273: in download
    return self._run_in_loop(self.run_download(timeouts))
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:193: in _run_in_loop
    return asyncio.run(coro)
/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/runners.py:43: in run
    return loop.run_until_complete(main)
/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
    return future.result()
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:226: in run_download
    done.update(await self._run_http_download(main_pb, timeouts))
../../.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:324: in _run_http_download
    self.http_tokens.generate_queue(maxsize=self.max_conn),
../../.tox/py38-online/lib/python3.8/site-packages/parfive/utils.py:171: in generate_queue
    queue.put_nowait(item)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Queue at 0x7fd43f044160 maxsize=4 _queue=[<parfive.utils.Token object at 0x7fd43efaff70>n = 1, <parfive.utils.Token o...= 2, <parfive.utils.Token object at 0x7fd43efafbe0>n = 3, <parfive.utils.Token object at 0x7fd43efafa30>n = 4] tasks=4>
item = <parfive.utils.Token object at 0x7fd43efafd00>n = 5

    def put_nowait(self, item):
        """Put an item into the queue without blocking.
    
        If no free slot is immediately available, raise QueueFull.
        """
        if self.full():
>           raise QueueFull
E           asyncio.queues.QueueFull

/opt/hostedtoolcache/Python/3.8.5/x64/lib/python3.8/asyncio/queues.py:148: QueueFull

https://dev.azure.com/sunpy/sunpy/_build/results?buildId=9500&view=logs&j=a7b3aa55-7d57-562f-3433-7f6b2d4252da&t=c806977a-3b27-51ee-41ed-46d1d189e419&l=960

Add a class method for quick download

If you have an iterable of URLs it should be easy to download all of them in one go; at the moment you have to instantiate Downloader, call enqueue_file for each URL, and then call download.

It would be nice to add a classmethod which looks like this:

@classmethod
def quick_download(cls, *urls, path=None, overwrite=None):
    dl = cls()
    for url in urls:
        dl.enqueue_file(url, path=path, overwrite=overwrite)
    return dl.download()

This would be a quick and easy way to download a bunch of files. If you wanted more control over the kwargs, you could use the longer API.
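
Usage would then be a one-liner (hypothetical until the classmethod lands):

from parfive import Downloader

files = Downloader.quick_download(
    "http://212.183.159.230/5MB.zip",
    "http://212.183.159.230/10MB.zip",
    path="./",
)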

Support HTTP Basic Auth

This involves a way of passing arguments to aiohttp.ClientSession, which means doing it at construct time (or via properties), as we only use one session per call to download().
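
A sketch of what that could look like (the session_kwargs name is an assumption, not parfive's current API):

import aiohttp
from parfive import Downloader

# Hypothetical: forward keyword arguments to aiohttp.ClientSession at
# construct time, so every request in the session is authenticated.
dl = Downloader(session_kwargs={"auth": aiohttp.BasicAuth("user", "password")})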

Support max connections per server

Given we can now open many connections per server (n files × n splits), it would be useful to be able to specify limits for servers which cannot handle (or even block) too many connections. This would mean that we could prioritise downloads from other servers to maintain the parallelism while putting less load on any one server.
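
One possible shape for this, sketched with a per-host asyncio.Semaphore (names and limits are illustrative):

import asyncio
from urllib.parse import urlsplit

host_semaphores = {}

def semaphore_for(url, max_conn_per_host=2):
    # One semaphore per hostname caps concurrent connections to that server.
    host = urlsplit(url).netloc
    if host not in host_semaphores:
        host_semaphores[host] = asyncio.Semaphore(max_conn_per_host)
    return host_semaphores[host]

async def limited_fetch(url):
    async with semaphore_for(url):
        ...  # run the actual chunk download here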

Kill the event loop when appropriate

The main event loop keeps running in its own little thread even if the main sync download function gets killed.

We should somehow pass a KeyboardInterrupt in the sync download function through to the underlying event loop.

Need to improve the download error handling.

From SunPy Travis build:

test_norh.py::test_get_url_for_time_range[timerange1-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121201-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121202] 
../.tox/py37-online/lib/python3.7/site-packages/sunpy/net/dataretriever/tests/test_norh.py::test_get_url_for_time_range[timerange1-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121201-ftp:/anonymous:[email protected]@solar-pub.nao.ac.jp/pub/nsro/norh/data/tcx/2012/12/tca121202] Task was destroyed but it is pending!
task: <Task pending coro=<Downloader._get_ftp() running at /home/travis/build/sunpy/sunpy/.tox/py37-online/lib/python3.7/site-packages/parfive/downloader.py:415> wait_for=<Future finished exception=TimeoutError(110, "Connect call failed ('140.90.33.174', 58394)")> cb=[Downloader._run_from_queue.<locals>.callback(<parfive.util...d1170e80>n = 1, main_pb=Files Downloa...21<?, ?file/s])() at /home/travis/build/sunpy/sunpy/.tox/py37-online/lib/python3.7/site-packages/parfive/downloader.py:267, _wait.<locals>._on_completion() at /home/travis/miniconda/lib/python3.7/asyncio/tasks.py:440]>
Exception ignored in: <coroutine object Downloader._get_ftp at 0x7f19d2484648>
RuntimeError: coroutine ignored GeneratorExit
Future exception was never retrieved
future: <Future finished exception=TimeoutError(110, "Connect call failed ('140.90.33.174', 58394)")>
Traceback (most recent call last):
  File "/home/travis/miniconda/lib/python3.7/asyncio/selector_events.py", line 505, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 110] Connect call failed ('140.90.33.174', 58394)

We need to make sure we are catching and handling all the errors inside the _get_http and _get_ftp methods.

keyboard interrupt does not kill the download loop cleanly

In a jupyter notebook, because the loop is running in a threadpool, this means that the download isn't cancelled at all. In a regular session it causes all the coroutines to fail with a KeyboardInterrupt exception.

Thoughts on implementing this:

  • We can react to the signal with event_loop.add_signal_handler (see the sketch after this list).
  • The event handler should cleanly terminate all pending and running tasks by calling Task.cancel (addressing #61 should mean we always have a Task object to cancel).
  • This should involve removing any part-finished files from disk so that there aren't corrupt files left lying around.
  • When a Task is cancelled it throws a CancelledError. This would need to be caught in _get_http and _get_ftp to cleanly terminate the task.
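
A sketch of the handler described above (illustrative; parfive's internals differ):

import asyncio
import signal

def install_interrupt_handler(loop):
    def cancel_everything():
        # Cancelling a task raises CancelledError inside it, which
        # _get_http/_get_ftp would catch to clean up partial files.
        for task in asyncio.all_tasks(loop):
            task.cancel()
    loop.add_signal_handler(signal.SIGINT, cancel_everything)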

Downloads fail when behind proxy

Any downloads behind a proxy hang indefinitely.

I realize this is difficult to test so I'm happy to be the guinea pig for any proposed fixes! Of course, all proxies may be slightly different as well...

MWE

>>> from parfive import Downloader
>>> dl = Downloader()
>>> dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
>>> files = dl.download()  # <-- this hangs forever

Also true when downloading sample data in sunpy or doing any query with sunpy.net.Fido.
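
For what it's worth, aiohttp only reads the HTTP_PROXY/HTTPS_PROXY environment variables when the session is created with trust_env=True, so a fix might involve something like this (sketch, assuming parfive exposed the session construction):

import aiohttp

session = aiohttp.ClientSession(trust_env=True)  # honour proxy env vars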

Timeout options sourced from env variables should be cast to `int`

When reading the timeouts from the environment variables,

timeouts = timeouts or {
    "total": os.environ.get("PARFIVE_TOTAL_TIMEOUT", 5 * 60),
    "sock_read": os.environ.get("PARFIVE_SOCK_READ_TIMEOUT", 90),
}

the resulting values should be cast to int. Otherwise, this results in a TypeError due to the attempted comparison between a str and an int. I'm guessing this is happening somewhere in aiohttp.
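
A sketch of the suggested fix, wrapped in a helper for clarity:

import os

def get_timeouts(timeouts=None):
    # Cast the env values, which arrive as strings when set, so later
    # comparisons in aiohttp see ints rather than a str/int mix.
    return timeouts or {
        "total": int(os.environ.get("PARFIVE_TOTAL_TIMEOUT", 5 * 60)),
        "sock_read": int(os.environ.get("PARFIVE_SOCK_READ_TIMEOUT", 90)),
    }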

Pytest 6.2.5: Unraisable Exception Warning

Hi,

  • Python 3.9
  • Pytest 6.2.5

During packing your project for Guix I've got the warning during check phase:

=============================== warnings summary ===============================
parfive/tests/test_downloader.py::test_async_download[True]
parfive/tests/test_downloader.py::test_download_unique
parfive/tests/test_downloader.py::test_custom_user_agent
  /gnu/store/7frqm5ijy66f81hr8i1j6791k84lds9w-python-pytest-6.2.5/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function BaseEventLoop.__del__ at 0x7ffff65cd280>

  Traceback (most recent call last):
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/base_events.py", line 683, in __del__
      self.close()
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/unix_events.py", line 61, in close
      self.remove_signal_handler(sig)
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/asyncio/unix_events.py", line 150, in remove_signal_handler
      signal.signal(sig, handler)
    File "/gnu/store/65i3nhcwmz0p8rqbg48gaavyky4g4hwk-python-3.9.9/lib/python3.9/signal.py", line 47, in signal
      handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
  ValueError: signal only works in main thread of the main interpreter

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================= 53 passed, 3 warnings in 12.43s ========================

`progress=False` and `overwrite=True` are ignored by `simple_download()`

With the following code

from parfive import Downloader
files = [
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T00_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T01_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T02_05.tar',
    'https://idoc-regards-data.ias.u-psud.fr/SDO_DEM/2012/08/DEM_aia_2012-08-10T03_05.tar',
]
dl = Downloader(progress=False, overwrite=True)
print('With enqueue_file() and download():')
for f in files: dl.enqueue_file(f, path='result')
dl.download()
print('Done')
print('With simple_download():')
dl = Downloader(progress=False, overwrite=True)
dl.simple_download(files, path='result')
print('Done')

the resulting terminal output is

With enqueue_file() and download():
Done
With simple_download():
Files Downloaded: 100%|███████████████████████████████████████████████| 4/4 [00:00<00:00, 31.12file/s]
Done

Then:

  • enqueue_file() then download() takes progress=False and overwrite=True into account
  • a progress bar is shown at the second download, so progress=False is ignored by simple_download()
  • the download speed indicates that overwrite=True is also ignored by simple_download(); removing the files before the second download results in a much slower (and realistic) download speed
Files Downloaded: 100%|███████████████████████████████████████████████| 4/4 [00:21<00:00,  5.28s/file]

Expected behavior: progress=False and overwrite=True should not be ignored by simple_download()

Specifying overwrite=True in the call to simple_download() works, but the expectation is that overwrite from Downloader is inherited by default when calling simple_download() without specifying overwrite.

User specified headers are ignored if User-Agent isn't specified

Maybe I'm missing something, but I am trying to pass in headers like this:

dl = Downloader(
    headers=dict(one=1, two=2, three=3)
)

But they do not seem to be getting set, and in dl.headers all I see is:
{'User-Agent': 'parfive/1.5.1 aiohttp/3.8.1 python/3.8.1'}

Hope someone can shed some light on what I am doing wrong...

Thanks

Failed or interrupted downloads leave incomplete files on disk

Downloads that fail, or that are interrupted by the user, are left as half-downloaded files. This is inconvenient: one has to go through and manually delete any half-downloaded files before running a download again, because parfive will see the filename and think the file has already been downloaded.

It would be good if files could be removed if they are not completely downloaded when a download is interrupted or fails.

Add option to create temporary files for partial downloads

Currently, if the process is killed it's impossible to find out whether a file was completely downloaded. Thus it would be great to add an option so that files are created with a prefix or suffix and renamed to the actual filename only once they are complete.
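
A sketch of the pattern (the .part suffix is illustrative):

import os

def finish_download(partial_path):
    # Atomically promote the temporary file once the download completes,
    # so an interrupted run never leaves a partial file under its final name.
    assert partial_path.endswith(".part")
    os.replace(partial_path, partial_path[:-len(".part")])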

Ipython terminal breaks

from sunpy.net import Fido, attrs as a
import astropy.units as u
from os import mkdir, path

results = Fido.search(a.jsoc.Time('2019/01/01', '2019/02/01'), a.jsoc.Notify('[email protected]'), a.jsoc.Series('aia.lev1_euv_12s'), a.jsoc.Wavelength(304 * u.angstrom),a.Sample(2 * u.hour))

downloaded_file = Fido.fetch(results, path= path.abspath('AIA304')+'/{file}')

The downloads start, the progress bar shows up and then it kind of dies.


Adding this as a workaround:

num_missing = 100
while num_missing > 0:
    downloaded_file = Fido.fetch(results, path= path.abspath('AIA304')+'/{file}')
    num_missing = len(downloaded_file.errors)


Review and improve documentation of arguments

Currently there are two main parts of the API: the constructor to Downloader and the enqueue_file method. Both of these accept many more kwargs than are in the function signature or docstring, because they get passed down the tree into the partial functions that get executed in the loop etc. We should make sure that we are using clear keyword argument names and that it's possible to easily see all the possible arguments in the docs (and, even better, in function introspection).

Download fails with 400, message='Can not decode content-encoding: gzip'

When trying to download a (seemingly) plain text file, the download fails and I'm getting the following error,

[<parfive.results.Error object at 0x1078a66d0>
https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt
 400, message='Can not decode content-encoding: gzip']

The code to reproduce this is,

import parfive
dl = parfive.Downloader()
dl.enqueue_file('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt', path='.')
foo = dl.download()

parfive version: 2.0.2
Python version: 3.9
OS: macOS 12.6.2


Solution detailed here: #121 (comment)

Inform user if >0 downloads fail

I was recently downloading a bunch of files (parfive is awesome!) when parfive stopped downloading after 12/40 files, and the output in my terminal was this:

Downloading 40 files
Files Downloaded:  30%|█████████████████████████████████████████████                                                                                                         | 12/40 [00:13<00:31,  1.14s/file]

I got very confused by this, and didn't understand why parfive was stopping - if I ran the download again on the same 40 files, it would just download 12 more that it hadn't downloaded before.

Eventually I thought to check if any download errors were present, and found lots of

"Waiting for ('220',) but got 421 [' There are too many connections from your internet address.'

which explains what happened, but it took me a while to get there. I think it would be helpful if parfive issued some sort of warning if any of the downloads failed, perhaps something like "At least one download failed, see Result.errors for a list of the download errors".
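
A user-side version of that check, as a sketch (dl is a Downloader with files already enqueued):

import warnings

results = dl.download()
if results.errors:
    warnings.warn("At least one download failed, see results.errors for a list of the download errors.")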

python 3.8 deprecation warning

I'm seeing this with python 3.8 and parfive:

/home/vsts/work/1/s/.tox/py38-online/lib/python3.8/site-packages/parfive/downloader.py:101: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
    self.http_tokens = asyncio.Queue(maxsize=self.max_conn, loop=self.loop)
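
The fix is presumably to stop passing loop, which asyncio.Queue picks up from the running loop on Python 3.8+ (sketch):

import asyncio

http_tokens = asyncio.Queue(maxsize=4)  # maxsize illustrative; no loop kwarg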

Task was destroyed but still pending

Task was destroyed but it is pending!
task: <Task pending name='Task-119' coro=<Downloader._write_worker() running at /home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/site-packages/parfive/downloader.py:469> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7fdd519a08e0>()]>>
Exception ignored in: <coroutine object Downloader._write_worker at 0x7fdd51988440>
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/site-packages/parfive/downloader.py", line 469, in _write_worker
    offset, chunk = await queue.get()
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/queues.py", line 165, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/base_events.py", line 719, in call_soon
    self._check_closed()
  File "/home/docs/checkouts/readthedocs.org/user_builds/sunpy/conda/4315/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

Callback on download completion

It would be great to be able to pass a callback on download completion, ideally with the status of completion (successful/error, source url, target path). That would, for example, enable me to implement #122 myself.
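
A hypothetical shape for the API (the done_callback name is an assumption):

def on_done(url, path, error=None):
    # Called once per file with the outcome; error is None on success.
    status = "ok" if error is None else f"failed: {error}"
    print(f"{url} -> {path} [{status}]")

# dl.enqueue_file(url, path="./", done_callback=on_done)  # hypothetical kwarg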

Large downloads from the Solar Orbiter Archive timeout

This code attempts to download a 928 MB file:

from parfive import Downloader
dl = Downloader(max_conn=1)
url = 'http://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=LAST_PRODUCT&product_type=SCIENCE&data_item_id=solo_L2_swa-pas-vdf_20201026'
dl.enqueue_file(url, path="./", max_splits=1)
files = dl.download()
print(files)

But after ~600-700 MB it reliably times out with:

Files Downloaded:   0%|                                                                                  | 0/1 [05:00<?, ?file/s]
[]lo_L2_swa-pas-vdf_20201026_V01.cdf:  73%|█████████████████████████████████████▏             | 715M/982M [05:00<02:01, 2.20MB/s]
Errors:
(error(filepath_partial=functools.partial(<function default_name at 0x7f82a960f8b0>, PosixPath('.')), url='http://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=LAST_PRODUCT&product_type=SCIENCE&data_item_id=solo_L2_swa-pas-vdf_20201026', exception=TimeoutError()))

Downloading this file from the archive via the web interface in Firefox works fine.
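
The failure at the 05:00 mark matches the five-minute default total timeout seen elsewhere in this tracker, so raising PARFIVE_TOTAL_TIMEOUT before downloading may be a workaround (untested sketch):

import os

os.environ["PARFIVE_TOTAL_TIMEOUT"] = str(60 * 60)  # allow up to an hour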

Add debug logging and env vars

It would be really useful when debugging both request and asyncio issues to know how far each of the coroutines got through its processing. Along the same lines, it would be nice if you could disable multiple chunks etc. with env vars.
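
A debug mode could build on standard-library logging (sketch; the "parfive" logger name is an assumption):

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("parfive").debug("how far did each coroutine get?")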

Add automatic retry option

It would be great if you could add a flag to configure the number of retries in case a download fails.
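
Until such a flag exists, a user-side retry loop can be built from Results.errors (sketch):

from parfive import Downloader

def download_with_retries(urls, path="./", attempts=3):
    pending = list(urls)
    for _ in range(attempts):
        dl = Downloader()
        for url in pending:
            dl.enqueue_file(url, path=path)
        results = dl.download()
        # Each error entry carries the URL that failed; queue it again.
        pending = [err.url for err in results.errors]
        if not pending:
            break
    return results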

ResourceWarning in downloader

I'm seeing the following warning in the pfsspy tests. Not entirely sure what's causing it, but thought I would leave this here so there's a paper trail if anyone wants to investigate further:

 _______________________ ERROR at setup of test_adapt_map _______________________

cls = <class '_pytest.runner.CallInfo'>
func = <function call_runtest_hook.<locals>.<lambda> at 0x7f500bc548b0>
when = 'setup'
reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

    @classmethod
    def from_call(
        cls,
        func: "Callable[[], TResult]",
        when: "Literal['collect', 'setup', 'call', 'teardown']",
        reraise: Optional[
            Union[Type[BaseException], Tuple[Type[BaseException], ...]]
        ] = None,
    ) -> "CallInfo[TResult]":
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: Optional[TResult] = func()

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/runner.py:311: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/runner.py:255: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/unraisableexception.py:83: in pytest_runtest_setup
    yield from unraisable_exception_runtest_hook()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def unraisable_exception_runtest_hook() -> Generator[None, None, None]:
        with catch_unraisable_exception() as cm:
            yield
            if cm.unraisable:
                if cm.unraisable.err_msg is not None:
                    err_msg = cm.unraisable.err_msg
                else:
                    err_msg = "Exception ignored in"
                msg = f"{err_msg}: {cm.unraisable.object!r}\n\n"
                msg += "".join(
                    traceback.format_exception(
                        cm.unraisable.exc_type,
                        cm.unraisable.exc_value,
                        cm.unraisable.exc_traceback,
                    )
                )
>               warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
E               pytest.PytestUnraisableExceptionWarning: Exception ignored in: <socket.socket fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>
E               
E               Traceback (most recent call last):
E                 File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/parfive/downloader.py", line 527, in _http_download_worker
E                   await queue.put((offset, chunk))
E               ResourceWarning: unclosed <socket.socket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.1.0.4', 55242), raddr=('146.5.21.61', 443)>

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning
