
iterpath's Introduction

Project Status: Active. The project has reached a stable, usable state and is being actively developed. License: MIT.

GitHub | PyPI | Issues | Changelog

iterpath lets you iterate over a file tree as a single iterator of pathlib.Path objects, eliminating the need to combine lists returned by os.walk() or recursively call Path.iterdir() or os.scandir(). Besides the standard os.walk() options, the library also includes options for sorting & filtering entries.

Installation

iterpath requires Python 3.8 or higher. Just use pip for Python 3 (You have pip, right?) to install it:

python3 -m pip install iterpath

Example

Iterate over this library's source repository, skipping the .git and test/data folders:

>>> import os.path
>>> from iterpath import iterpath
>>> def filterer(dir_entry):
...     if dir_entry.name == ".git":
...         return False
...     elif dir_entry.path == os.path.join(".", "test", "data"):
...         return False
...     else:
...         return True
...
>>> with iterpath(".", sort=True, filter_dirs=filterer) as ip:
...     for p in ip:
...         print(p)
...
.github
.github/workflows
.github/workflows/test.yml
.gitignore
LICENSE
MANIFEST.in
README.rst
TODO.md
pyproject.toml
setup.cfg
src
src/iterpath
src/iterpath/__init__.py
src/iterpath/__pycache__
src/iterpath/__pycache__/__init__.cpython-39.pyc
src/iterpath/py.typed
test
test/test_iterpath.py
tox.ini

API

The iterpath module provides a single function, also named iterpath:

iterpath(dirpath: AnyStr | os.PathLike[AnyStr] = os.curdir, **kwargs) -> Iterpath[AnyStr]

Iterate through the file tree rooted at the directory dirpath (by default, the current directory) in depth-first order, yielding the files & directories within as pathlib.Path instances.

The return value is both an iterator and a context manager. In order to ensure that the internal os.scandir() iterators are closed properly, either call the close() method when done or else use it as a context manager like so:

with iterpath(...) as ip:
    for path in ip:
        ...
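When a with block is inconvenient, the same cleanup can be done with an explicit close() call. The following is a minimal sketch, assuming iterpath is installed; list_tree is a hypothetical helper name, and the import is deferred into the function so that the snippet loads even without the package:

```python
from pathlib import Path
from typing import List


def list_tree(root: str) -> List[Path]:
    """Collect every path under ``root``, closing the iterator explicitly."""
    # Deferred import (an assumption of this sketch): lets the module load
    # even where the third-party iterpath package is not installed.
    from iterpath import iterpath

    ip = iterpath(root, sort=True)
    try:
        return list(ip)
    finally:
        # Releases the internal os.scandir() iterators even if iteration fails.
        ip.close()
```

The try/finally form is equivalent in intent to the with statement shown above; the context manager is usually the shorter option.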

If return_relative is true, the generated Path objects will be relative to dirpath. If return_relative is false (the default) and dirpath is an absolute path, the generated Path objects will be absolute; otherwise, if dirpath is a relative path, the Path objects will be relative and will have dirpath as a prefix.
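The three cases can be illustrated with plain pathlib arithmetic. This is only a sketch of the rule described above, not iterpath's implementation; yielded_form is a hypothetical helper, and PurePosixPath is used so the output is the same on every platform:

```python
from pathlib import PurePosixPath


def yielded_form(
    dirpath: str, entry_relpath: str, return_relative: bool = False
) -> PurePosixPath:
    """Model the shape of a yielded path for an entry found under dirpath."""
    if return_relative:
        # Relative to dirpath: the dirpath prefix is dropped entirely.
        return PurePosixPath(entry_relpath)
    # Otherwise the result is absolute or relative exactly as dirpath is.
    return PurePosixPath(dirpath) / entry_relpath


print(yielded_form("/srv/data", "a/b.txt"))        # /srv/data/a/b.txt
print(yielded_form("data", "a/b.txt"))             # data/a/b.txt
print(yielded_form("/srv/data", "a/b.txt", True))  # a/b.txt
print(yielded_form(".", "src"))                    # src
```

The last line matches the example output earlier in this document, where iterating over "." prints src rather than ./src.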

Note that, although iterpath() yields pathlib.Path objects, it operates internally on os.DirEntry objects, and so any function supplied as the sort_key parameter or as a filter/exclude parameter must accept os.DirEntry instances.
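To make the os.DirEntry requirement concrete, here is a small sketch of a predicate and a key function written against os.DirEntry. The names is_python_source and by_size are made up for this example, and the functions are exercised with os.scandir() directly so the snippet needs only the standard library:

```python
import os
import tempfile


def is_python_source(entry: os.DirEntry) -> bool:
    """Predicate: true only for regular files whose names end in .py."""
    return entry.is_file() and entry.name.endswith(".py")


def by_size(entry: os.DirEntry) -> int:
    """Sort key: the entry's size in bytes."""
    return entry.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    samples = [("big.py", "x = 1\n" * 10), ("small.py", "x = 1\n"), ("notes.txt", "hi")]
    for name, body in samples:
        with open(os.path.join(tmp, name), "w") as fp:
            fp.write(body)
    with os.scandir(tmp) as it:
        entries = list(it)
    # Apply the predicate and key the same way a directory walker would.
    kept = sorted(filter(is_python_source, entries), key=by_size)
    names = [e.name for e in kept]

print(names)  # ['small.py', 'big.py']
```

Functions of this shape could then be passed as, for example, iterpath(dirpath, sort=True, sort_key=by_size, filter_files=is_python_source).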

Keyword arguments:

dirs: bool = True
Whether to include directories in the output
topdown: bool = True
Whether to yield each directory before (True) or after (False) its contents
include_root: bool = False
Whether to include the dirpath argument passed to iterpath() in the output
followlinks: bool = False
Whether to treat a symlink to a directory as a directory
return_relative: bool = False
If true, the generated paths will be relative to dirpath
onerror: Optional[Callable[[OSError], Any]] = None
Specify a function to be called whenever an OSError is encountered while iterating over a directory. If the function reraises the exception, iterpath() aborts; otherwise, it continues with the next directory. By default, OSError exceptions are ignored.
sort: bool = False
Sort the entries in each directory. When False, entries are yielded in the order returned by os.scandir(). When True, entries are sorted, by default, by filename in ascending order, but this can be changed via the sort_key and sort_reverse arguments.
sort_key: Optional[Callable[[os.DirEntry[AnyStr]], _typeshed.SupportsLessThan]] = None
Specify a custom key function for sorting directory entries. Only has an effect when sort is True.
sort_reverse: bool = False
Sort directory entries in reverse order. Only has an effect when sort is True.
filter: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None

Specify a predicate to be applied to all files & directories encountered; only those for which the predicate returns a true value will be yielded (and, for directories, descended into).

If filter is specified, it is an error to also specify filter_dirs or filter_files.

filter_dirs: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None
Specify a predicate to be applied to all directories encountered; only those for which the predicate returns a true value will be yielded & descended into
filter_files: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None
Specify a predicate to be applied to all files encountered; only those for which the predicate returns a true value will be yielded
exclude: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None

Specify a predicate to be applied to all files & directories encountered; only those for which the predicate returns a false value will be yielded (and, for directories, descended into).

If exclude is specified, it is an error to also specify exclude_dirs or exclude_files.

exclude_dirs: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None
Specify a predicate to be applied to all directories encountered; only those for which the predicate returns a false value will be yielded & descended into
exclude_files: Optional[Callable[[os.DirEntry[AnyStr]], Any]] = None
Specify a predicate to be applied to all files encountered; only those for which the predicate returns a false value will be yielded

If both filter and exclude are set, a given entry will only be included if filter returns true and exclude returns false (that is, exclusions take priority over inclusions), and likewise for the directory- and file-specific arguments.
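The combination rule can be written out as a short function. This is a sketch of the rule as stated, not the library's code: is_included, wants_text, and is_hidden are hypothetical names, and SimpleNamespace objects stand in for os.DirEntry instances:

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional

Predicate = Callable[[Any], Any]


def is_included(
    entry: Any,
    filter_pred: Optional[Predicate] = None,
    exclude_pred: Optional[Predicate] = None,
) -> bool:
    """An entry is kept only if the filter accepts it and the exclude does not."""
    if filter_pred is not None and not filter_pred(entry):
        return False
    if exclude_pred is not None and exclude_pred(entry):
        return False  # exclusions take priority over inclusions
    return True


def wants_text(entry: Any) -> bool:
    return entry.name.endswith(".txt")


def is_hidden(entry: Any) -> bool:
    return entry.name.startswith(".")


# Stand-ins for os.DirEntry objects; only the .name attribute is modelled.
entries = [SimpleNamespace(name=n) for n in ("a.txt", ".secret.txt", "b.png")]
included = [e.name for e in entries if is_included(e, wants_text, is_hidden)]
print(included)  # ['a.txt']
```

Here .secret.txt passes the filter but is dropped because the exclude predicate also matches it.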

Warnings:

  • If dirpath is a relative path, changing the working directory while iterpath() is in progress will lead to errors, or at least inaccurate results.
  • Setting followlinks to True can result in infinite recursion if a symlink points to a parent directory of itself.
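One way to sidestep the first warning, assuming absolute paths in the output are acceptable, is to anchor dirpath to the working directory once, before iteration starts. A minimal sketch using only the standard library (absolute_root is a hypothetical helper name):

```python
import os
from pathlib import Path


def absolute_root(dirpath: str) -> Path:
    """Pin a possibly relative directory to the current working directory."""
    # os.path.abspath() does not require the directory to exist.
    return Path(os.path.abspath(dirpath))


root = absolute_root("src")
print(root.is_absolute())  # True
```

Passing such a root as dirpath means the yielded paths are absolute as well, unless return_relative is set.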

Selectors

New in version 0.3.0

iterpath also provides a selection of "selector" classes & constants for easy construction of filter and exclude arguments. Selectors are callables that return true for DirEntry objects whose (base) names match given criteria.

Selectors can even be combined using the | operator:

# This only returns entries whose names end in ".txt" or equal "foo.png" or
# ".hidden":
iterpath(
    dirpath,
    filter=SelectGlob("*.txt") | SelectNames("foo.png", ".hidden")
)

# Exclude all dot-directories and VCS directories:
iterpath(dirpath, exclude_dirs=SELECT_DOTS | SELECT_VCS_DIRS)
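For intuition about what | does, here is a rough sketch of how a combinable selector could be built. It is not iterpath's actual implementation: Selector, glob, and names are hypothetical stand-ins, glob matching via fnmatch is an assumption, and SimpleNamespace objects take the place of os.DirEntry instances:

```python
import fnmatch
from types import SimpleNamespace
from typing import Any, Callable


class Selector:
    """Hypothetical stand-in: wraps a test on a name and supports ``|``."""

    def __init__(self, test: Callable[[str], bool]) -> None:
        self.test = test

    def __call__(self, entry: Any) -> bool:
        return self.test(entry.name)

    def __or__(self, other: "Selector") -> "Selector":
        # The union matches whenever either operand matches.
        return Selector(lambda name: self.test(name) or other.test(name))


def glob(pattern: str) -> Selector:
    return Selector(lambda name: fnmatch.fnmatchcase(name, pattern))


def names(*wanted: str) -> Selector:
    return Selector(lambda name: name in wanted)


select = glob("*.txt") | names("foo.png", ".hidden")
entries = [SimpleNamespace(name=n) for n in ("a.txt", "foo.png", ".hidden", "bar.png")]
selected = [e.name for e in entries if select(e)]
print(selected)  # ['a.txt', 'foo.png', '.hidden']
```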

The selectors:

class SelectNames(*names: AnyStr, case_sensitive: bool = True)

Selects DirEntry objects whose names are one of names. If case_sensitive is False, the check is performed case-insensitively.

class SelectGlob(pattern: AnyStr)

Selects DirEntry objects whose names match the given fileglob pattern

class SelectRegex(pattern: AnyStr | re.Pattern[AnyStr])

Selects DirEntry objects whose names match (using re.search()) the given regular expression

SELECT_DOTS

Selects DirEntry objects whose names begin with a period

SELECT_VCS

Selects DirEntry objects matched by either SELECT_VCS_DIRS or SELECT_VCS_FILES (see below)

SELECT_VCS_DIRS

Selects the following names of version-control directories: .git, .hg, _darcs, .bzr, .svn, _svn, CVS, RCS

SELECT_VCS_FILES

Selects the following names of version-control-specific files: .gitattributes, .gitignore, .gitmodules, .mailmap, .hgignore, .hgsigs, .hgtags, .binaries, .boring, .bzrignore, and all nonempty filenames that end in ,v
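Because SelectRegex matches with re.search(), a pattern can match anywhere in a name unless it is anchored explicitly. The following sketch uses only the standard re module to show the difference; the file names are made up for illustration:

```python
import re

unanchored = re.compile(r"test")
anchored = re.compile(r"^test_.*\.py$")

# re.search() finds the pattern anywhere in the name ...
found_anywhere = bool(unanchored.search("my_test.py"))
# ... whereas re.match() only succeeds at the start of the name.
found_at_start = bool(unanchored.match("my_test.py"))

# Anchors restrict a search to whole-name matches.
whole_name_hit = bool(anchored.search("test_iterpath.py"))
whole_name_miss = bool(anchored.search("my_test_x.py"))

print(found_anywhere, found_at_start)   # True False
print(whole_name_hit, whole_name_miss)  # True False
```

With a selector, that would mean writing something like SelectRegex(r"^test_.*\.py$") when only whole-name matches are wanted.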


iterpath's Issues

Replace all filtering parameters with one parameter that returns an enum

(Note: All new identifiers in this issue are working titles only.)

Eliminate iterpath()'s filter* and exclude* arguments and replace them with a single argument select: Callable[[os.DirEntry], Operation] | None = None. Operation is here an enum with the following variants:

  • INCLUDE: Yield the entry and, if it's a directory, descend into it
  • EXCLUDE: Do not yield the entry and, if it's a directory, do not descend into it
  • PRUNE: Yield the entry, but (if it's a directory) do not descend into it (cf. #4)
  • DESCEND_ONLY: Do not yield the entry, but (if it's a directory) still descend into it (cf. #4)

Problem: How would having more than two return values work with applying logical operators to selectors?
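As a rough illustration of the proposal, the enum and a select callback might look like the sketch below. Everything here is hypothetical: the names are the issue's working titles, the yields and descends properties are an added assumption, and SimpleNamespace objects stand in for os.DirEntry instances:

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any


class Operation(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    PRUNE = "prune"
    DESCEND_ONLY = "descend_only"

    @property
    def yields(self) -> bool:
        """Whether the entry itself appears in the output."""
        return self in (Operation.INCLUDE, Operation.PRUNE)

    @property
    def descends(self) -> bool:
        """Whether a directory's contents are visited."""
        return self in (Operation.INCLUDE, Operation.DESCEND_ONLY)


def select(entry: Any) -> Operation:
    """Example callback: hide .git entirely, list node_modules without entering it."""
    if entry.name == ".git":
        return Operation.EXCLUDE
    if entry.name == "node_modules":
        return Operation.PRUNE
    return Operation.INCLUDE


decision = select(SimpleNamespace(name="node_modules"))
print(decision.yields, decision.descends)  # True False
```

Splitting each variant into two booleans is one possible starting point for the open question above, since logical operators are well defined on each boolean separately.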

Pass an object with various path-related attributes to filter & sort functions

Make filter & sort functions take as an argument a single instance of a custom class (named PathInfo?) with the following attributes:

  • entry — the os.DirEntry object
  • path — equal to entry.path converted to a Path
  • name — the entry's basename as a str
  • relpath — the entry's Path relative to dirpath
  • dirpath (root? root_dirpath?) — the original dirpath passed to iterpath()
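A possible shape for such a class, sketched as a frozen dataclass. This is hypothetical: PathInfo is the issue's working name, deriving the other attributes from entry and dirpath is a design assumption, and names are assumed to be str:

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PathInfo:
    entry: os.DirEntry  # the raw os.DirEntry object
    dirpath: Path       # the original dirpath passed to iterpath()

    @property
    def path(self) -> Path:
        return Path(self.entry.path)

    @property
    def name(self) -> str:
        return self.entry.name

    @property
    def relpath(self) -> Path:
        # The entry's location relative to the traversal root.
        return self.path.relative_to(self.dirpath)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg"))
    open(os.path.join(tmp, "pkg", "mod.py"), "w").close()
    with os.scandir(os.path.join(tmp, "pkg")) as it:
        info = PathInfo(entry=next(it), dirpath=Path(tmp))
    summary = (info.name, info.relpath.as_posix())

print(summary)  # ('mod.py', 'pkg/mod.py')
```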

Support honoring `.gitignore` and `.ignore` files

Possible API: Give iterpath() an ignore: bool = False (working name) argument; when true, .gitignore and .ignore files are honored during traversal.

  • Alternatively, ignore can be set to an iterable of strings giving the names of files to treat as (git)ignore-style files to honor.

  • Is it necessary to honor ignore files in parent directories of the starting directory?

  • How exactly should this work when dirs is True? Should anything be done about directories that don't contain any matching files? What about directories that don't match any patterns?

  • Add an option for also honoring the $HOME-wide gitignore file?

Work solely with `os.DirEntry[str]` internally

If iterpath() is passed a bytes value for dirpath, convert it to str immediately and only work with str variants of os.DirEntry internally and when passing os.DirEntry objects to filter & sort functions.
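The issue does not say how the conversion would be done. One candidate from the standard library is os.fsdecode(), which accepts str, bytes, and path-like objects and always returns str. A minimal sketch, with as_str_dirpath as a hypothetical helper name:

```python
import os
from pathlib import Path
from typing import Union


def as_str_dirpath(dirpath: Union[str, bytes, "os.PathLike"]) -> str:
    """Normalize any supported dirpath value to str using the filesystem encoding."""
    return os.fsdecode(dirpath)


print(as_str_dirpath(b"src/iterpath"))  # src/iterpath
print(as_str_dirpath("src"))            # src
print(as_str_dirpath(Path("src")))      # src
```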

Add a selector for filtering based on custom gitignore patterns

Add a SelectGitignore selector that can be constructed from either an iterable of gitignore patterns or a dict mapping relative subdirs to sequences of gitignore patterns. Unlike #7, this does not honor gitignore files on the file system; instead, filtering is done using gitignore rules specified in the code.

  • Also take an optional case_sensitive: bool = True argument?

Support yielding XOR descending into certain directories

Add the ability for a user-supplied filter to indicate that a directory should be (a) yielded but not descended into or (b) descended into but not yielded.

  • Implement this by letting filter_dirs return an enum with PRUNE and DESCEND_ONLY(?)/INCLUDE_FILES(?) values?
