anypath's Introduction

AnyPath

AnyPath makes it trivial to fetch remote resources and work with them locally. It provides a normalized interface over different resources so that handling them is always consistent.

with AnyPath('sftp://jane@host:/home/jane') as path:
    path.joinpath('somefile.txt').open().read()

Here AnyPath copies the directory /home/jane from a remote host via SSH to a local temporary directory, where you can work with the files. After you are done, the temporary files are deleted. AnyPath is therefore useful if you want to fetch, for example, some config files or a small project directory from a remote location and work with it locally.

Installation

To install, run:

pip install anypath

Dependencies

By default AnyPath does not install the dependencies for the different providers. You should install them as needed:

Provider    Dependencies
git         local git installation
mercurial   local mercurial installation
http        pip install requests
sftp        pip install paramiko
local       None

Basic Usage

AnyPath uses path providers to handle different remote resources. The resources are fetched to a new temporary directory where you can work with them. The newly fetched resources are wrapped in a pathlib.Path.

from anypath.anypath import AnyPath, path_provider
from anypath.pathprovider.http import HttpPath

path_provider.add(HttpPath)

with AnyPath('http://example.org') as path:
    path.open().read()

First, register all the providers that you want to use (note: remember to install the dependencies for each provider). You can then open any URI whose scheme is known to one of the registered providers.

Provider Schemes
git
  • git+http://
  • git+https://
  • git://
mercurial
  • hg+http://
  • hg+https://
http
  • http://
  • https://
sftp
  • sftp://
  • ssh://
local
  • file://
  • /
  • ./
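
The scheme lookup can be pictured as a prefix match over the registered patterns. The following is a minimal sketch under that assumption, not AnyPath's actual implementation: REGISTRY and find_provider are hypothetical names, and providers are represented by plain strings.

```python
# Hypothetical registry mapping URI prefixes to provider names.
# The prefixes mirror the scheme table above; the lookup logic is an assumption.
REGISTRY = {
    "git+http://": "git",
    "git+https://": "git",
    "git://": "git",
    "hg+http://": "mercurial",
    "hg+https://": "mercurial",
    "http://": "http",
    "https://": "http",
    "sftp://": "sftp",
    "ssh://": "sftp",
    "file://": "local",
    "/": "local",
    "./": "local",
}


def find_provider(uri):
    """Return the provider whose longest registered prefix matches the URI."""
    matches = [prefix for prefix in REGISTRY if uri.startswith(prefix)]
    if not matches:
        raise ValueError("no registered provider for {!r}".format(uri))
    return REGISTRY[max(matches, key=len)]


print(find_provider("git+https://example.org/repo.git"))  # git
print(find_provider("./some/dir"))                        # local
```

A URI with an unregistered scheme, such as ftp://example.org, raises ValueError in this sketch.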

You can use AnyPath either as a context manager (with AnyPath ...) or directly by calling fetch(). Beware that when not using the context manager, you have to call close() manually to clean up the temporary files.

path_provider.add(HttpPath)
ap = AnyPath('http://example.org')
path = ap.fetch() # type: Path
path.open().read()
ap.close()
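
The two usage styles follow the usual Python resource pattern. The sketch below uses a hypothetical TempResource class (not part of AnyPath) that creates a temporary directory in fetch() and removes it in close(), to show why close() has to be called when the with statement is not used.

```python
import shutil
import tempfile
from pathlib import Path


class TempResource:
    """Hypothetical stand-in for an object that owns a temporary directory."""

    def __init__(self):
        self._tmp = None

    def fetch(self):
        # Create the temporary directory and hand it back as a pathlib.Path.
        self._tmp = Path(tempfile.mkdtemp())
        (self._tmp / "somefile.txt").write_text("hello")
        return self._tmp

    def close(self):
        # Remove the temporary directory and everything in it.
        if self._tmp is not None:
            shutil.rmtree(self._tmp, ignore_errors=True)
            self._tmp = None

    def __enter__(self):
        return self.fetch()

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Context manager: cleanup happens automatically when the block exits.
with TempResource() as path:
    managed = path
    content = (path / "somefile.txt").read_text()

# Manual use: cleanup only happens once close() is called.
res = TempResource()
manual = res.fetch()
exists_before_close = manual.exists()
res.close()
```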

Persistence

The examples so far are useful if you are only interested in the content of a fetched resource. The files are created in a temporary folder, where you can work with them, and are deleted afterwards. Sometimes, however, you may want to persist the remote resource outside of a temporary location.

path_provider.add(HttpPath)

with AnyPath('http://example.org', persist_dir='/your/local/path') as path:
    path.open().read()

Instead of copying the files manually, you can specify a persist_dir when creating the AnyPath. The temporary resources are then copied to that location. As a result, you get the persist_dir wrapped as a pathlib.Path instead of the temporary location and can work with it directly.
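
Conceptually, persisting amounts to copying the fetched temporary tree to the target directory and returning the target instead. A minimal sketch under that assumption follows. persist is a hypothetical helper, and how AnyPath treats an already existing persist_dir is not stated above, so the choice made here is only illustrative.

```python
import shutil
import tempfile
from pathlib import Path


def persist(tmp_dir, persist_dir):
    """Copy a fetched temporary tree to persist_dir and return the new location."""
    target = Path(persist_dir)
    # dirs_exist_ok requires Python 3.8+; it allows the target to exist already.
    shutil.copytree(tmp_dir, target, dirs_exist_ok=True)
    return target


# Simulate a fetched resource in a temporary directory.
with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as keep:
    Path(tmp, "index.html").write_text("<html></html>")
    result = persist(tmp, Path(keep) / "site")
    persisted_text = (result / "index.html").read_text()
```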

Providers and options

While the defaults for fetching resources might be fine for many use cases, there are situations where you might want to pass options to a provider. For example, you might want to do a POST with an HttpPath or pass credentials to a GitPath.

Options are always passed as keyword arguments. The following sections list all providers and their available options.

Http

The options are passed to a requests.Request object; they behave the same and are named accordingly.

AnyPath('http://example.org', method='GET', data=None, headers=None, params=None)
Option    Default   Description
method    'GET'     Specifies the HTTP method to be used as a string, e.g. POST, DELETE, PUT.
data      None      The body to attach to the request. If a dictionary is provided, form-encoding will take place.
headers   None      A dictionary of headers to send in the request.
params    None      A dictionary of URL parameters to append to the URL.
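
To make the data and params options concrete, the sketch below uses only the standard library to show what form-encoding a dictionary and appending URL parameters produce. It calls neither requests nor AnyPath, and the URL and values are made up for illustration.

```python
from urllib.parse import urlencode

url = "http://example.org/search"
params = {"q": "anypath", "page": 2}
data = {"name": "jane", "role": "admin"}

# params end up in the query string of the URL.
full_url = url + "?" + urlencode(params)

# A dictionary passed as data is form-encoded into the request body.
body = urlencode(data)

print(full_url)  # http://example.org/search?q=anypath&page=2
print(body)      # name=jane&role=admin
```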

Sftp

The path for Sftp is expected to be in the format sftp://user@host:/path/on/host. Additional options can be set via keyword arguments.

AnyPath('sftp://user@localhost:/path/on/host', password=None, private_key=None, port=22)
Option        Default   Description
password      None      The password for the user. Also used if private_key is given and the key requires a password.
private_key   None      The path to the local private key (as a string), if one is used to log in.
port          22        The SSH port to be used.
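
The expected URI shape can be illustrated with a small parser. This is a sketch only: parse_sftp_uri and its regular expression are assumptions for illustration, not how AnyPath parses the URI internally.

```python
import re

# Matches sftp://user@host:/path/on/host (ssh:// is listed as an alternative scheme).
SFTP_URI = re.compile(
    r"^(?:sftp|ssh)://(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>/.*)$"
)


def parse_sftp_uri(uri):
    """Split an SFTP URI into user, host and remote path."""
    match = SFTP_URI.match(uri)
    if match is None:
        raise ValueError(
            "expected sftp://user@host:/path/on/host, got {!r}".format(uri)
        )
    return match.group("user"), match.group("host"), match.group("path")


print(parse_sftp_uri("sftp://jane@host:/home/jane"))  # ('jane', 'host', '/home/jane')
```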

Git

None

Mercurial

None

Local

None

Checking for dependencies

By default, dependencies are only checked right before the appropriate PathProvider is called, i.e., at the moment the remote resources are fetched. It is also possible to check for dependencies as soon as all PathProviders are registered. There are two methods for this: get_requirements() and check_requirements(). get_requirements() only returns a dictionary of all dependencies (modules and executables) that would be needed, while check_requirements() verifies that all dependencies are present and raises an exception if they are not:

>>> path_provider.add(HttpPath, SftpPath, GitPath)
>>> path_provider.get_requirements()
{'modules': ['requests', 'paramiko'], 'executables': ['git']}

If the requirements for HttpPath (the requests module) are not met, calling check_requirements() raises an exception:

>>> path_provider.add(HttpPath)
>>> path_provider.check_requirements()
...anypath.dependencies.NotInstalledError: Python module requests is not installed.
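
The shape of the dictionary returned by get_requirements() suggests that the declared requirements of all registered providers are merged. The sketch below shows one way such a merge could work. The three classes and collect_requirements are hypothetical stand-ins, and the modules and executables attributes are assumptions, not AnyPath's real internals.

```python
class HttpLike:
    modules = ["requests"]
    executables = []


class SftpLike:
    modules = ["paramiko"]
    executables = []


class GitLike:
    modules = []
    executables = ["git"]


def collect_requirements(providers):
    """Merge the declared requirements of all providers into one dictionary."""
    requirements = {"modules": [], "executables": []}
    for provider in providers:
        for kind in requirements:
            for name in getattr(provider, kind):
                # Keep each requirement once, in first-seen order.
                if name not in requirements[kind]:
                    requirements[kind].append(name)
    return requirements


print(collect_requirements([HttpLike, SftpLike, GitLike]))
# {'modules': ['requests', 'paramiko'], 'executables': ['git']}
```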

Limitations

You might not want to use AnyPath if you are working with a huge remote resource. Everything is fetched to your local machine, which might take some time and use a lot of space if, for example, you try to work with the whole filesystem of a remote host. It is also not intended for updating the remote resource, since there is no mechanism to write changes back to the remote.

Contributing

You can contribute in any of the following areas, no matter if it is your first OSS contribution or your thousandth. Contributions are welcome, for example:

  • If you find any issue or bug when using AnyPath
  • If you want to add to the documentation or fix incorrect or missing documentation.
  • If you want to add features or work on the codebase in general

Just file an issue in the tracker first, describing what you would like to do, and then create a pull request.

Writing a new PathProvider

Creating a new PathProvider requires writing a new class; using it requires registering it via path_provider.add().

The basic structure of a PathProvider looks like this:

@pattern('protocol://')
@required_executables('some_executable')
@dependencies('some_py_module')
class MyPath(BasePath):
    def __init__(self, protocol, path, persist_dir, some_option='default'):
        super().__init__(protocol, path, persist_dir)
        ...

    @BasePath.wrapped
    def fetch(self):
        ...

Here a PathProvider MyPath is created. It registers a protocol that it can handle, declares some requirements, and has a fetch method which does the actual work. The PathProvider must always inherit from anypath.BasePath.

Decorators

There are three class decorators available for a PathProvider:

pattern('')

This decorator is required. It lists all patterns a path can start with on which the PathProvider can act. In the example, MyPath registers the pattern protocol://, meaning that whenever a path starts with protocol://, MyPath will be called.

A PathProvider can register more than one pattern; each pattern is passed as a single argument to the pattern decorator. The HttpPath, for example, registers http:// and https:// using @pattern('http://', 'https://').
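
A class decorator that accepts several patterns can be sketched as follows. The pattern function here is a hypothetical reimplementation written for illustration: it stores the prefixes on the class as a patterns attribute, which is an assumption about how such a decorator could work and not AnyPath's actual code.

```python
def pattern(*patterns):
    """Hypothetical class decorator that records the prefixes a class handles."""
    def decorate(cls):
        cls.patterns = tuple(patterns)
        return cls
    return decorate


@pattern("http://", "https://")
class HttpLike:
    @classmethod
    def handles(cls, uri):
        # str.startswith accepts a tuple of prefixes.
        return uri.startswith(cls.patterns)


print(HttpLike.patterns)                        # ('http://', 'https://')
print(HttpLike.handles("https://example.org"))  # True
print(HttpLike.handles("sftp://jane@host:/"))   # False
```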

required_executables('')

This decorator is optional. It lists all required executables which must be available on the system to perform the tasks of the PathProvider. In the example, MyPath specifies that some_executable must exist and be callable from within the program.

AnyPath checks for existence by testing whether shutil.which(executable) is None, where executable is the executable specified in required_executables().

dependencies('')

This decorator is optional. It lists all required Python modules which must be importable to perform the tasks of the PathProvider. In the example, MyPath specifies that some_py_module must exist and be importable.

AnyPath will import the module via importlib.import_module(module) where module is the module specified in dependencies().
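
The two checks described above rely on standard-library calls, shutil.which and importlib.import_module. The sketch below shows how they behave for present and missing requirements. missing_executables and missing_modules are hypothetical helper names, and the deliberately nonexistent names are placeholders.

```python
import importlib
import shutil


def missing_executables(executables):
    """Return the executables that shutil.which cannot find on PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


def missing_modules(modules):
    """Return the modules that cannot be imported."""
    missing = []
    for module in modules:
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(module)
    return missing


# json ships with Python; the other names are assumed not to exist.
print(missing_modules(["json", "surely_not_a_real_module_xyz"]))
print(missing_executables(["surely-not-a-real-executable-xyz"]))
```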

The fetch method

The class must include a fetch method which will be called to fetch the remote resources. The fetch method must have the method decorator @BasePath.wrapped. The decorator's main purpose is to run actions before and after fetching the resources: creating a temporary directory beforehand and persisting the temporary files afterwards if needed.
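
The idea of wrapping fetch with pre and post actions can be sketched with an ordinary method decorator. Everything below is hypothetical: wrapped is a free function rather than BasePath.wrapped, and the attribute names tmp_dir and persist_dir are assumptions chosen for the example.

```python
import functools
import shutil
import tempfile
from pathlib import Path


def wrapped(fetch):
    """Hypothetical method decorator running pre and post actions around fetch."""
    @functools.wraps(fetch)
    def wrapper(self):
        # Pre action: create the temporary directory the fetch writes into.
        self.tmp_dir = Path(tempfile.mkdtemp())
        fetch(self)
        # Post action: copy to persist_dir if one was requested (Python 3.8+).
        if self.persist_dir is not None:
            shutil.copytree(self.tmp_dir, self.persist_dir, dirs_exist_ok=True)
            return Path(self.persist_dir)
        return self.tmp_dir
    return wrapper


class DemoPath:
    def __init__(self, persist_dir=None):
        self.persist_dir = persist_dir
        self.tmp_dir = None

    @wrapped
    def fetch(self):
        # The provider-specific work: here it just writes one file.
        (self.tmp_dir / "data.txt").write_text("fetched")

    def close(self):
        shutil.rmtree(self.tmp_dir, ignore_errors=True)


demo = DemoPath()
path = demo.fetch()
fetched_text = (path / "data.txt").read_text()
demo.close()
```

With this split, a provider's fetch only has to write into the prepared directory, while directory creation and persistence stay in one place.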

License

AnyPath is licensed under "Mozilla Public License Version 2.0". See LICENSE.txt for the full license.

anypath's People

Contributors

vaubarth

Forkers

johncf

anypath's Issues

Should anypath include a "VirtualPath"?

In its current form anypath always copies the remote file to a local temporary directory. This is not feasible for large remotes.
It might be interesting to implement a VirtualPath that works on the remote directly whenever possible. However this might be out-of-scope for anypath and might be rather put into another project.

Make dependencies inspectable

Dependencies of pathproviders should be inspectable so that they can be verified before a fetch is done.

Most likely solution is using decorators.

Extract dependencies from fetch

Currently the PathProviders check for dependencies or import them in the fetch method.
Implement a dependencies function to do that.
