
httpio's Introduction

httpio

HTTP resources as random-access file-like objects

httpio is a small Python library that allows you to access files served over HTTP as file-like objects (which is to say that they support the interface of the standard library's BufferedIOBase class). It differs from libraries like urllib and requests in that it supports seek() (which moves an internal pointer), and that read() makes a request with the Range header set. It also supports caching of contents using a configurable block size, and will reuse TCP connections where possible.
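The mapping from seek() and read() to Range requests can be sketched as follows. This is a minimal illustration of the idea, not httpio's actual implementation: the names RangeBackedFile, range_header and fake_fetch are hypothetical, and the network is replaced by an in-memory payload so the example runs without a server. Caching and connection reuse are left out.

```python
import io


def range_header(start, end):
    """Build a Range header for bytes [start, end); the header's last byte is inclusive."""
    return {"Range": "bytes=%d-%d" % (start, end - 1)}


class RangeBackedFile(io.RawIOBase):
    """Hypothetical file-like object whose reads are served by ranged fetches."""

    def __init__(self, length, fetch):
        super().__init__()
        self._length = length  # total size of the remote resource
        self._fetch = fetch    # callable taking headers and returning bytes
        self._pos = 0          # internal pointer moved by seek() and read()

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # seek() only moves the pointer; no request is made
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._length + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, size=-1):
        # read() turns the pointer and the size into a single ranged fetch
        if size is None or size < 0:
            end = self._length
        else:
            end = min(self._pos + size, self._length)
        if end <= self._pos:
            return b""
        data = self._fetch(range_header(self._pos, end))
        self._pos += len(data)
        return data


# Stand-in for an HTTP client: serves ranges from an in-memory payload
payload = bytes(range(256))
requests_seen = []


def fake_fetch(headers):
    requests_seen.append(headers["Range"])
    first, last = headers["Range"][len("bytes="):].split("-")
    return payload[int(first):int(last) + 1]


fp = RangeBackedFile(len(payload), fake_fetch)
fp.seek(100)
chunk = fp.read(4)
```

Here the seek() call issues no request, and the read(4) that follows asks only for bytes 100 to 103.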

Installation

Use pip to install httpio:

$ pip install httpio

Usage

import zipfile
import httpio

url = "http://some/large/file.zip"
with httpio.open(url) as fp:
    zf = zipfile.ZipFile(fp)
    print(zf.namelist())

Unit Tests

Unit tests are provided for the standard behaviours implemented by the library. They can be run with

$ python -m unittest discover -s tests

Alternatively, a tox.ini file is provided so that the tests can be run in virtual environments using the tox tool:

$ tox

httpio's People

Contributors

barneygale, jamesba, z4ce


httpio's Issues

Support pluggable IO with anyio and a pluggable HTTP client strategy

I'd like to be able to use httpio with httpx on trio. This would be possible if all HTTP access went through a pluggable session strategy:

import contextlib

import aiohttp
import attr


@attr.s(frozen=True)
class AIOHttpFactory:
    _session = attr.ib()
    _kwargs = attr.ib()

    async def length(self, url) -> int:
        # HEAD request to discover the size of the resource
        async with self._session.head(url, **self._kwargs) as response:
            response.raise_for_status()
            # Raises KeyError if the server omits Content-Length
            return int(response.headers["content-length"])

    async def get(self, url, start, end) -> bytes:
        # Fetch bytes [start, end); the last position in a Range header is inclusive
        headers = {
            "Range": "bytes=%d-%d" % (start, end - 1),
            **self._kwargs.get("headers", {})
        }
        kwargs = {**self._kwargs, "headers": headers}
        async with self._session.get(url, **kwargs) as response:
            response.raise_for_status()
            return await response.read()

    @classmethod
    @contextlib.asynccontextmanager
    async def session(cls, kwargs):
        # Own the client session for the lifetime of the context manager
        async with aiohttp.ClientSession() as s:
            yield cls(s, kwargs)

# AsyncHTTPIOFile.__init__ would then take the factory as an argument:
    def __init__(self, url, block_size=-1, session_factory=None, **kwargs):
        """
        :param url: The URL of the file to open
        :param block_size: The cache block size, or `-1` to disable caching.
        :param session_factory: The session strategy to use (defaults to aiohttp)
        :param kwargs: Additional arguments to pass to `session.get`
        """
        super(AsyncHTTPIOFile, self).__init__()
        self.url = url
        self.block_size = block_size

        if session_factory is None:
            from . import aiohttp_strategy
            session_factory = aiohttp_strategy.AIOHttpFactory

        self._session_factory = session_factory
        ...

"httpio.HTTPIOError: Server does not accept 'Range' headers" when server does accept Range

For some reason, it seems Amazon S3 may return the "accept-ranges: bytes" header more than once. requests concatenates the values, so the header value becomes "bytes, bytes", and then:

if response.headers.get('Accept-Ranges', '').lower() != 'bytes':

This test triggers, and we can't open the file.

I'd guess a better test would be "bytes" in response.headers.get(...)

Also, it looks like the Accept-Ranges header is not required:
https://tools.ietf.org/html/rfc7233#section-2.3

An origin server that supports byte-range requests for a given target
resource MAY send Accept-Ranges: bytes to indicate what range units are supported.
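A more tolerant check along these lines could look like the sketch below. The function name accepts_byte_ranges is hypothetical and not part of httpio. It assumes the header value may be a comma-separated list (as produced when repeated headers are joined), and it returns None when the header is absent, since the RFC text above makes the header optional and its absence does not prove that ranges are unsupported.

```python
def accepts_byte_ranges(header_value):
    """Interpret an Accept-Ranges header value.

    Returns True if "bytes" is among the listed range units, False if the
    header is present without it (for example "none"), and None if the
    header is missing, in which case range support is simply unknown.
    """
    if header_value is None:
        return None
    # Repeated headers are typically joined as "bytes, bytes"
    units = [unit.strip().lower() for unit in header_value.split(",")]
    return "bytes" in units
```

With a requests response this would be called as accepts_byte_ranges(response.headers.get('Accept-Ranges')). A caller that gets None back could still attempt a ranged request and inspect whether the server answers with status 206.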

Here's a URL you can try for reproduction:
https://files.pythonhosted.org/packages/44/e8/4ae9ef3d455f4ce5aa22259cb6e40c69b29ef6b02d49c5cdfa265f7fc821/Django-3.0.1.tar.gz

Requirement aiohttp is missing

>>> import httpio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/frafra/.cache/pypoetry/virtualenvs/draft-2Uym0jAL-py3.9/lib64/python3.9/site-packages/httpio/__init__.py", line 229, in <module>
    from .asyncio import AsyncHTTPIOFileContextManagerMixin
  File "/home/frafra/.cache/pypoetry/virtualenvs/draft-2Uym0jAL-py3.9/lib64/python3.9/site-packages/httpio/asyncio.py", line 4, in <module>
    from httpio_async import AsyncHTTPIOFile, AsyncHTTPIOFileContextManagerMixin  # noqa: F401
  File "/home/frafra/.cache/pypoetry/virtualenvs/draft-2Uym0jAL-py3.9/lib64/python3.9/site-packages/httpio_async/__init__.py", line 6, in <module>
    import aiohttp
ModuleNotFoundError: No module named 'aiohttp'

httpio/setup.py

Lines 7 to 10 in 18c1b98

install_requires = [
    'requests >= 2.10.0',
    'six'
]
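The traceback shows the package importing aiohttp as soon as httpio is imported, while install_requires lists only requests and six. One way to avoid the failure, assuming async support is meant to be optional (the report does not say so), would be to guard the import. The helper optional_import and the flag HAS_ASYNC_SUPPORT below are hypothetical names used for illustration only; the other obvious fix is to add aiohttp to the requirements.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


# The async classes would only be exposed when aiohttp is importable
aiohttp = optional_import("aiohttp")
HAS_ASYNC_SUPPORT = aiohttp is not None
```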
