olson-sean-k / wax

Opinionated and portable globs that can be matched against paths and directory trees.

Home Page: https://glob.guide

License: MIT License

Rust 100.00%

file-system glob pattern rust

wax's Issues

Implement explicit repetitions of sub-globs.

Globs tend to be easier to read than regular expressions and Wax attempts to straddle the goals of a simple and familiar syntax suited for paths and an expressive and flexible syntax. Unix-like glob syntax has some severe limitations. Globs do not use the more flexible pattern-repetition form that many regular expression engines support and instead conflate these ideas into concepts like the zero-or-more wildcard *, which matches a pattern of anything zero or more times. This is simple and handles most common use cases, but is inflexible.

Notably, Wax supports a limited form of character classes, but these patterns are rendered mostly useless by always using an implicit exactly-one repetition. Also note that some expressions are simply impossible, such as rejecting a variable number of directories that are prefixed with a dot (.).

As an escape hatch, Wax could provide an explicit repetition mechanism for more advanced usage. It would function somewhat like alternatives and allow arbitrary nesting, but would importantly allow crossing arbitrary component boundaries. In fact, it could be the ultimate representation of a tree token, which is a specific instance of such a pattern.

One possible syntax could use < and > as delimiters around a sub-glob paired with an optional repetition also delimited by < and >. For example, <[!.]*/><0,> would match zero or more directories that do not begin with a dot (.). This could be shortened with a lack of a repetition specification defaulting to zero-or-more: <[!.]*/>. A complete specification could include an upper bound, where <[0-9]><1,3> would match between one and three instances of the digits zero through nine. Today, this can only be expressed as {[0-9],[0-9][0-9],[0-9][0-9][0-9]}. Yikes.
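To make the equivalence above concrete, here is a minimal sketch that expands a bounded repetition into the alternative that is needed today. The function name `expand_repetition` is hypothetical and is not part of Wax; it only manipulates strings and does no glob parsing.

```rust
/// Expands `sub` repeated between `min` and `max` times (inclusive) into an
/// equivalent alternative expression. Hypothetical helper, not a Wax API.
fn expand_repetition(sub: &str, min: usize, max: usize) -> String {
    // One branch per permitted repetition count.
    let branches: Vec<String> = (min..=max).map(|n| sub.repeat(n)).collect();
    // `{{` and `}}` are escaped braces in a format string.
    format!("{{{}}}", branches.join(","))
}

fn main() {
    // The proposed `<[0-9]><1,3>` written as an alternative.
    println!("{}", expand_repetition("[0-9]", 1, 3));
}
```

This only works when an upper bound is given, and the output grows quadratically with that bound, which is part of why an explicit repetition token is attractive.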

As with alternatives, care would be needed to detect and reject adjacent boundary tokens and zero-or-more wildcards. Additionally, it could be useful to detect and reject nonsense sub-globs, such as singular * and **. Such nonsense patterns are unfortunate, but it may be worth the rough edges since this would greatly increase the expressive power of globs while, for most use cases, keeping things mostly familiar and simple.

Semantic literals are not properly detected.

There is at least one case of false positive and numerous false negatives in Glob::has_semantic_literals and report::diagnostics. Semantic literals are detected via the token::literals function, but that function's behavior is a bit unusual and is insufficient in some cases. At the time of writing, it adapts a token sequence into a sequence of component and literal sequence pairs. To do this, it first uses the token::components function and, if a component is not itself a literal sequence, it recurses into any and all group tokens (alternatives and repetitions). Functions that detect semantic literals simply iterate over the sequence of pairs to find any literals of note, in particular the . and .. components.

This causes false positives in glob expressions like **/a{b,..}c, because token::literals does not consider tokens adjacent to groups within a component and emits .. in its output. Note that in that example, it is not possible for .. to occur alone in the final component, because a and c are always present. Similarly, glob expressions like **/{a,.}{b,.} result in false negatives, because the possible match .. in the final component is not considered at all.

The token::literals function could be reworked, but I think its behavior is already a bit murky and that, instead, a bespoke detection function is needed. Perhaps the machinery in the rule module could be refactored and reused, as it already considers tokens that are adjacent to groups within a component and something similar is needed here.
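As a rough illustration of what a bespoke detection function would have to consider, the sketch below enumerates every complete text that a single component can take and only then looks for . and ... The `Part` type and both functions are hypothetical and far simpler than Wax's token tree (no wildcards, classes, or nested groups); it is a sketch of the idea, not a proposed implementation.

```rust
/// One piece of a single path component. Hypothetical, simplified model.
enum Part<'a> {
    Literal(&'a str),
    Alternative(Vec<&'a str>),
}

/// Enumerates every complete text that the component can take.
fn expansions(parts: &[Part<'_>]) -> Vec<String> {
    let mut out = vec![String::new()];
    for part in parts {
        let branches: Vec<&str> = match part {
            Part::Literal(text) => vec![*text],
            Part::Alternative(branches) => branches.clone(),
        };
        // Append every branch to every prefix built so far.
        let mut next = Vec::new();
        for prefix in &out {
            for branch in &branches {
                next.push(format!("{}{}", prefix, branch));
            }
        }
        out = next;
    }
    out
}

/// True only if the whole component can be exactly `.` or `..`.
fn has_semantic_literal(parts: &[Part<'_>]) -> bool {
    expansions(parts)
        .iter()
        .any(|text| matches!(text.as_str(), "." | ".."))
}

fn main() {
    // Final component of `**/a{b,..}c`: `..` never occurs alone.
    let first = [
        Part::Literal("a"),
        Part::Alternative(vec!["b", ".."]),
        Part::Literal("c"),
    ];
    // Final component of `**/{a,.}{b,.}`: `..` is a possible match.
    let second = [
        Part::Alternative(vec!["a", "."]),
        Part::Alternative(vec!["b", "."]),
    ];
    println!("{} {}", has_semantic_literal(&first), has_semantic_literal(&second));
}
```

Enumerating expansions is exponential in the number of groups, so a real implementation would more likely track adjacency within the component, as the rule module is described as doing.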

Single wildcard tree never matches

Hello,

I might have noticed a bug.

It is stated in the documentation that "If a glob expression consists solely of a tree wildcard, then it matches any and all paths and the complete contents of any and all directory trees, including the root".

I have the following file structure in /tmp/files:

├── dir1
│   └── file.txt
├── dir2
│   └── file.txt
└── dir3
    └── file.txt

with the following Rust code:

fn main() {
    let paths = wax::Glob::new("**").unwrap()
        .walk(std::path::Path::new("/tmp/files"))
        .not::<[&str; 0]>([]).unwrap()
        .map(|entry| entry.unwrap().path().to_owned())
        .collect::<Vec<_>>();

    println!("{:?}", paths);
}

The output is:

[]

When I remove the negation:

fn main() {
    let paths = wax::Glob::new("**").unwrap()
        .walk(std::path::Path::new("/tmp/files"))
        .map(|entry| entry.unwrap().path().to_owned())
        .collect::<Vec<_>>();

    println!("{:?}", paths);
}

it works:

["/tmp/files", "/tmp/files/dir3", "/tmp/files/dir3/file.txt", "/tmp/files/dir2", "/tmp/files/dir2/file.txt", "/tmp/files/dir1", "/tmp/files/dir1/file.txt"]

Did I miss anything?

Wax should allow unescaped ',' and ':' outside Alternatives and Repetitions

Wax restricts literals containing ',' and ':' because these characters are used as separators in alternatives and repetitions, which recurse into the glob grammar. However, the restriction applies everywhere, so literals are restricted unnecessarily even when no alternative or repetition is in use.

I believe that the literal parser should be parameterized so that these characters are restricted only when the enclosing expression uses them, making escaping optional everywhere else.

Today a literal like this will break without escaping:

Glob::new("extra:dots.txt").unwrap();

Causing the following error:

panicked at 'called `Result::unwrap()` on an `Err` value: BuildError { kind: Parse(ParseError { expression: "extra:dots.txt", locations: [ErrorEntry { fragment: ":dots.txt", location: 5, kind: Nom(Eof) }] }) }'
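The context-dependent rule suggested above can be sketched as a small scanner that treats ',' as special only directly inside an alternative and ':' only directly inside a repetition. This is a hypothetical illustration and not Wax's parser: `separator_offsets` and `Group` are made-up names, and character classes and repetition bounds are ignored.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Group {
    Alternative,
    Repetition,
}

/// Returns the byte offsets of `,` and `:` characters that act as separators.
/// Everything else could be read as a plain literal without escaping.
fn separator_offsets(expression: &str) -> Vec<usize> {
    let mut stack: Vec<Group> = Vec::new();
    let mut offsets = Vec::new();
    let mut chars = expression.char_indices();
    while let Some((offset, c)) = chars.next() {
        match c {
            // Skip the character that follows an escape.
            '\\' => {
                chars.next();
            }
            '{' => stack.push(Group::Alternative),
            '<' => stack.push(Group::Repetition),
            '}' | '>' => {
                stack.pop();
            }
            // Special only when the innermost group uses the character.
            ',' if stack.last() == Some(&Group::Alternative) => offsets.push(offset),
            ':' if stack.last() == Some(&Group::Repetition) => offsets.push(offset),
            _ => {}
        }
    }
    offsets
}

fn main() {
    // No groups are open, so the colon is an ordinary literal character.
    println!("{:?}", separator_offsets("extra:dots.txt"));
    println!("{:?}", separator_offsets("{a,b}:c"));
}
```

Under this rule, extra:dots.txt contains no separators at all, so it would parse as a single literal.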

Support more options?

It would be nice to support some glob options, like matching hidden directory or not.

The nu-glob crate (a fork of the glob crate with some additions) supports something like this: https://github.com/nushell/nushell/blob/main/crates/nu-glob/src/lib.rs#L870-L892

/// Configuration options to modify the behaviour of `Pattern::matches_with(..)`.
#[allow(missing_copy_implementations)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct MatchOptions {
    /// Whether or not patterns should be matched in a case-sensitive manner.
    /// This currently only considers upper/lower case relationships between
    /// ASCII characters, but in future this might be extended to work with
    /// Unicode.
    pub case_sensitive: bool,

    /// Whether or not path-component separator characters (e.g. `/` on
    /// Posix) must be matched by a literal `/`, rather than by `*` or `?` or
    /// `[...]`.
    pub require_literal_separator: bool,

    /// Whether or not paths that contain components that start with a `.`
    /// will require that `.` appears literally in the pattern; `*`, `?`, `**`,
    /// or `[...]` will not match. This is useful because such files are
    /// conventionally considered hidden on Unix systems and it might be
    /// desirable to skip them when listing files.
    pub require_literal_leading_dot: bool,

    /// if given pattern contains `**`, this flag check if `**` matches hidden directory.
    /// For example: if true, `**` will match `.abcdef/ghi`.
    pub recursive_match_hidden_dir: bool,
}

Still maintained?

Hello @olson-sean-k 👋

I'm a big fan of this crate, I believe it to be the best glob crate that currently exists.

However, I've noticed there's been no updates in over a year, and was wondering if this is still maintained. I'm well aware of how much time/cost maintaining projects takes, so understandable if you're busy.

At minimum, it would be great to at least get dependency upgrades (like miette). Would you be open to collaborators?

Cheers!

Facilities for writing custom `Patterns`

Hi! Great library!

I would like to be able to write custom combinators such as InclusiveEmptyAny which matches any path in the event that the list of globs is empty. I tried implementing Pattern and Compose myself to do so but it seems like I need to implement From<Checked> to do so, which is private. The types that the library uses to do so are private also, so it seems that this is not possible as-is.

Similarly, it is impossible to implement Pattern::matched correctly since MatchedText and friends are currently private also.

Any pointers?

//! A simple `wax` combinator that unconditionally matches if the set of globs
//! is empty.

use std::convert::Infallible;

use wax::{Any, BuildError, CandidatePath, Compose, MatchedText, Pattern, Variance};

pub struct InclusiveEmptyAny<'a>(Option<Any<'a>>);

impl<'a> InclusiveEmptyAny<'a> {
    pub fn new<I>(patterns: I) -> Result<Self, BuildError>
    where
        I: IntoIterator,
        I::Item: Compose<'a>,
    {
        let iter = patterns.into_iter().collect::<Vec<_>>();
        if iter.len() == 0 {
            Ok(Self(None))
        } else {
            Ok(Self(Some(wax::any(iter)?)))
        }
    }
}

impl<'t> Compose<'t> for InclusiveEmptyAny<'t> {
    type Tokens = (); // what should this be?
    type Error = Infallible;
}

impl<'t> Pattern<'t> for InclusiveEmptyAny<'t> {
    fn is_match<'p>(&self, path: impl Into<CandidatePath<'p>>) -> bool {
        match self.0 {
            Some(ref any) => any.is_match(path),
            None => true,
        }
    }

    fn matched<'p>(&self, path: &'p CandidatePath<'_>) -> Option<MatchedText<'p>> {
        match self.0 {
            Some(ref any) => any.matched(path),
            None => Some(path.into()), // is this ok?
        }
    }

    fn variance(&self) -> Variance {
        match self.0 {
            Some(ref any) => any.variance(),
            None => Variance::Variant,
        }
    }

    fn is_exhaustive(&self) -> bool {
        match self.0 {
            Some(ref any) => any.is_exhaustive(),
            None => true,
        }
    }
}

error[E0277]: the trait bound `wax::rule::Checked<()>: From<InclusiveEmptyAny<'t>>` is not satisfied
   --> crates/globwalk/src/empty_glob.rs:27:18
    |
27  |     type Error = Infallible;
    |                  ^^^^^^^^^^ the trait `From<InclusiveEmptyAny<'t>>` is not implemented for `wax::rule::Checked<()>`
    |
    = help: the following other types implement trait `From<T>`:
              <wax::rule::Checked<wax::token::Token<'t, ()>> as From<wax::Any<'t>>>
              <wax::rule::Checked<wax::token::Tokenized<'t>> as From<Glob<'t>>>
    = note: required for `InclusiveEmptyAny<'t>` to implement `Into<wax::rule::Checked<()>>`
    = note: required for `wax::rule::Checked<()>` to implement `TryFrom<InclusiveEmptyAny<'t>>`
note: required by a bound in `Compose`
   --> /Users/arlyon/Programming/wax/src/lib.rs:314:36
    |
314 |     TryInto<Checked<Self::Tokens>, Error = <Self as Compose<'t>>::Error>
    |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Compose`

error[E0277]: the trait bound `wax::rule::Checked<()>: From<InclusiveEmptyAny<'t>>` is not satisfied
   --> crates/globwalk/src/empty_glob.rs:30:10
    |
30  | impl<'t> Pattern<'t> for InclusiveEmptyAny<'t> {
    |          ^^^^^^^^^^^ the trait `From<InclusiveEmptyAny<'t>>` is not implemented for `wax::rule::Checked<()>`
    |
    = help: the following other types implement trait `From<T>`:
              <wax::rule::Checked<wax::token::Token<'t, ()>> as From<wax::Any<'t>>>
              <wax::rule::Checked<wax::token::Tokenized<'t>> as From<Glob<'t>>>
    = note: required for `InclusiveEmptyAny<'t>` to implement `Into<wax::rule::Checked<()>>`
    = note: required for `wax::rule::Checked<()>` to implement `TryFrom<InclusiveEmptyAny<'t>>`
    = note: required for `InclusiveEmptyAny<'t>` to implement `Compose<'t>`
note: required by a bound in `wax::Pattern`
   --> /Users/arlyon/Programming/wax/src/lib.rs:276:24
    |
276 | pub trait Pattern<'t>: Compose<'t, Error = Infallible> {
    |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Pattern`

`Glob::partition` documentation incorrectly claims that `Glob` outputs are always unrooted.

At time of writing, the documentation for Glob::partition says:

Partitioned Globs are never rooted. If the glob expression has a root component, then it is always included in the invariant PathBuf prefix.

But this isn't true! There is at least one exception I'm aware of: expressions beginning with a rooted repetition. For example, </root:1,>.

let (path, glob) = Glob::new("</root:1,>").unwrap().partition();
assert!(glob.has_root()); // OK. `glob` has been partitioned but is still rooted!

`Glob::partition` panics if the `Glob` is empty.

Calling Glob::partition on an empty Glob panics! This function should never panic and in this case should probably return an empty PathBuf and the empty Glob as is.

This bug raises some additional concerns that are worth pointing out. The notion of an empty glob (as with empty paths and especially empty path components) is a bit strange and annoying. Moreover, the API of Glob::partition is a victim of these empty values, because it returns (PathBuf, Glob<'_>)! There is no (sane) way for clients to determine if either part is empty.
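One way to let clients distinguish empty parts is to return an Option for each half. The sketch below shows that shape on plain strings. It is a toy under loud assumptions: this `partition` is a hypothetical free function, not Glob::partition, it treats any component containing *, ?, [, { or < as variant, and it ignores escapes.

```rust
use std::path::PathBuf;

/// Characters that make a component variant in this toy model.
const META: &[char] = &['*', '?', '[', '{', '<'];

/// Splits an expression into an invariant prefix and a variant remainder.
/// Either part is `None` when it is empty. Hypothetical, not a Wax API.
fn partition(expression: &str) -> (Option<PathBuf>, Option<String>) {
    let components: Vec<&str> = expression.split('/').collect();
    // Index of the first component that contains a glob metacharacter.
    let split = components
        .iter()
        .position(|component| component.contains(META))
        .unwrap_or(components.len());
    let mut prefix = components[..split].join("/");
    // A lone leading separator means the prefix is just the root.
    if prefix.is_empty() && expression.starts_with('/') {
        prefix.push('/');
    }
    let rest = components[split..].join("/");
    (
        if prefix.is_empty() { None } else { Some(PathBuf::from(prefix)) },
        if rest.is_empty() { None } else { Some(rest) },
    )
}

fn main() {
    println!("{:?}", partition(""));
    println!("{:?}", partition("foo/*.bar"));
    println!("{:?}", partition("/mnt/media/movie.mp4"));
}
```

With this shape an empty expression yields (None, None) instead of panicking, and an invariant expression yields a prefix with no glob.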

Create a website for documentation.

The wax README is much too long and currently provides the only detailed documentation for glob expression syntax. I imagine that the most common way to figure out how to compose a glob expression today is to browse to the repository page on GitHub. This is a bit awkward and quite noisy, I think, especially from the perspective of someone using some software that incidentally uses wax for globbing.

I've used mkdocs to create a website for plexus at plexus.rs and I'm mostly happy with the results. I think something similar for wax could work and should include detailed documentation and examples of glob expression syntax without too much clutter. Ideally, there should be a good landing page for scenarios like using a CLI tool and following a link to learn more about how to write a glob to match some files.

I think the README needs some work and should only briefly mention what glob expressions support without going into too much detail. Other technical details should be moved into the documentation website too.

Suboptimal performance when using Walk::not

let glob = make_glob("crates/**/*.rs");
let walk = glob.walk("/path/to/dir").not(vec![make_glob("crates/**/target")]);

Consider the code below

wax/src/walk.rs

Lines 749 to 767 in 1afcda8

if let Some(result) = self.input.next() {
    if let Ok(entry) = result.as_ref() {
        match (self.f)(entry) {
            None => {
                return Some(result);
            },
            Some(FilterTarget::File) => {
                continue;
            },
            Some(FilterTarget::Tree) => {
                if entry.file_type().is_dir() {
                    self.input.skip_tree();
                }
                continue;
            },
        }
    }
    return Some(result);
}

Given the globs above, the walker will end up traversing into every single target folder and never trigger a skip. The 'skip tree' optimisation never triggers because it can only skip a folder after that folder is yielded to the negation via self.input.next(), and most target folders will a) not have Rust files to yield and b) if they do, will not match the exclude case because the yielded entry is a file and not the folder itself. That means that we will not correctly detect when we can call TreeIterator::skip_tree. It seems like we need to invert the order here so that the FilterTree is wrapped by the Walk and may reject items first. Alternatively, we could modify the Walk pattern to include the excluded globs, so that they are yielded to the FilterTree for exclusion, but that seems a little inelegant...

I believe that this will have no impact on the behaviour, but it will make walking with negations much more efficient.
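The effect of the ordering can be shown with a toy in-memory tree. Everything here is hypothetical (the `Node` type, the `walk` function, and the hard-coded checks that stand in for crates/**/*.rs and crates/**/target); it only counts how many entries are read when the exclusion is applied to directories before descending rather than to yielded matches afterwards.

```rust
/// A toy in-memory directory tree. `children` is `None` for files.
struct Node {
    name: &'static str,
    children: Option<Vec<Node>>,
}

fn file(name: &'static str) -> Node {
    Node { name, children: None }
}

fn dir(name: &'static str, children: Vec<Node>) -> Node {
    Node { name, children: Some(children) }
}

/// Walks depth-first, collects paths ending in `.rs`, and returns the number
/// of entries read. With `prune`, `target` directories are never entered.
fn walk(node: &Node, parent: &str, prune: bool, matches: &mut Vec<String>) -> usize {
    let path = if parent.is_empty() {
        node.name.to_string()
    } else {
        format!("{}/{}", parent, node.name)
    };
    let mut read = 1;
    match &node.children {
        Some(children) => {
            // Early rejection: the directory itself is tested before descent.
            if prune && node.name == "target" {
                return read;
            }
            for child in children {
                read += walk(child, &path, prune, matches);
            }
        }
        None => {
            if path.ends_with(".rs") {
                matches.push(path);
            }
        }
    }
    read
}

fn sample_tree() -> Node {
    dir("crates", vec![dir("a", vec![
        dir("src", vec![file("lib.rs")]),
        dir("target", vec![dir("debug", vec![file("a.o"), file("b.o")])]),
    ])])
}

fn main() {
    let tree = sample_tree();
    let (mut late, mut early) = (Vec::new(), Vec::new());
    let read_late = walk(&tree, "", false, &mut late);
    let read_early = walk(&tree, "", true, &mut early);
    println!("late: {} reads, early: {} reads", read_late, read_early);
}
```

In this sample tree both runs collect the same single match, but the pruning run never reads anything below target.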

Files with leading dots and wildcard globs

Hello

Wax currently matches "hidden" files with a wildcard glob, which differs from traditional Unix shell globs:

$ mkdir test
$ touch test/.x
$ cat src/main.rs 
fn main() {
    let glob = wax::Glob::new("*").unwrap();
    for entry in glob.walk_with_behavior("test", 1) {
        println!("> {:?}", entry);
    }
}
$ cargo run -q
> Ok(WalkEntry { entry: DirEntry("test/.x"), matched: MatchedText { inner: Owned(OwnedText { matched: ".x", ranges: [Some((0, 2))] }) } })

Is this by design? Do you think a behavior field could be added to make it work like shell globs?
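Until something like that exists, one possible workaround is to filter walk results after the fact. The sketch below uses only the standard library; `has_hidden_component` is a hypothetical helper, and it is stricter than a shell because it also rejects paths whose dot-prefixed component was written literally in the pattern.

```rust
use std::path::{Component, Path};

/// True if any normal component of the path starts with a dot.
/// `.` and `..` are `CurDir` and `ParentDir` components and are ignored.
fn has_hidden_component(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => name.to_string_lossy().starts_with('.'),
        _ => false,
    })
}

fn main() {
    for candidate in ["test/.x", "test/x", "./x"] {
        println!("{}: {}", candidate, has_hidden_component(Path::new(candidate)));
    }
}
```

Applied to the walk above, entries could be dropped with a filter on the entry path before printing.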

Miette integration seems not to be completed and other understandable behaviors

With Cargo file like this:

[package]
name = "wax-proto"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
miette = { version = "5.3.0", features = ["fancy"] }
wax = { version = "0.5.0", features = ["diagnostics", "diagnostics-report", "miette"] }

Note that, the Cargo file can be like this too, the problem will be the same:

[package]
name = "wax-proto"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
miette = { version = "5.3.0", features = ["fancy"] }
wax = { version = "0.5.0", features = ["diagnostics"] }

With the file, main.rs:

use miette::Result;
use wax::Glob;

fn main() -> Result<()> {
    Glob::new("**//*.txt")?;
    Ok(())
}

When we run the command cargo run:

I found this result a little odd. With activated features in Cargo.toml, and https://github.com/olson-sean-k/wax/blob/master/src/lib.rs#L401, the response is understandable.

And that, even if we add the Diagnostic trait:

use miette::Diagnostic;
use miette::Result;
use wax::Glob;

fn main() -> Result<()> {
    Glob::new("**//*.txt")?;
    Ok(())
}

Plus, with that version of main.rs:

use miette::Diagnostic;
use wax::{Glob, GlobError};

fn main() -> Result<(), GlobError<'static>> {
    Glob::new("**//*.txt")?;
    Ok(())
}

We have this response:

We don't have the expected fancy printing from Miette.

We noticed that in the Cargo.toml of Wax, here: https://github.com/olson-sean-k/wax/blob/master/Cargo.toml#L42, the fancy feature is not activated. Is this expected?

Is something missing in our prototype project that is needed to make good use of the Miette integration?

Provide more comprehensive exhaustiveness queries for `Program`s and matches.

Exhaustiveness describes the extent of a match in a directory tree. A Program (glob) is exhaustive if a match indicates that any path in the sub-tree must also match the pattern. Today, this is provided in the public API by Program::is_exhaustive.

Program::is_exhaustive returns a bool, but this should really be a trivalent logic type. Some patterns have indeterminate exhaustiveness in the absence of a specific candidate path due to alternatives. For example, {**/*.txt,foo/**} is indeterminate: it is nonexhaustive when matching the path bar/baz.txt but is exhaustive when matching the path foo. I plan to introduce a more appropriate type to represent this like the following:

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub enum When {
    Never,
    Sometimes,
    Always,
}

let glob = Glob::new(...).unwrap();
if glob.exhaustiveness().is_always() { ... }
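To illustrate how such a type might combine across the branches of an alternative, here is a minimal sketch. The `across_branches` function and the choice of When::Never for an empty set of branches are assumptions made for illustration; the issue does not specify either, and computing the per-branch values is the hard part that this sketch takes as given.

```rust
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub enum When {
    Never,
    Sometimes,
    Always,
}

impl When {
    pub fn is_always(self) -> bool {
        self == When::Always
    }

    pub fn is_never(self) -> bool {
        self == When::Never
    }

    /// Combines the exhaustiveness of each branch of an alternative.
    /// The result is definite only if every branch agrees.
    /// Hypothetical combinator, not part of the proposed API.
    pub fn across_branches(branches: impl IntoIterator<Item = When>) -> When {
        let mut branches = branches.into_iter();
        match branches.next() {
            // Assumption: no branches means nothing is exhaustive.
            None => When::Never,
            Some(first) => {
                if branches.all(|branch| branch == first) {
                    first
                } else {
                    When::Sometimes
                }
            }
        }
    }
}

fn main() {
    // `{**/*.txt,foo/**}`: the first branch is never exhaustive and the
    // second always is, so the alternative as a whole is indeterminate.
    let combined = When::across_branches([When::Never, When::Always]);
    println!("{:?}", combined);
}
```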

Given a candidate path, it should be possible to definitively determine the exhaustiveness of a match, so a bool can be used in that context: the match is either exhaustive or not. This could be provided by Program::matched:

let candidate = CandidatePath::from(...);
let glob = Glob::new(...).unwrap();
if let Some(matched) = glob.matched(&candidate) {
    let text = matched.text(); // Gets the `MatchedText`. This is returned directly today.
    if matched.is_exhaustive() { ... }
}

Unfortunately, there are some big challenges in implementing this.

- I don't believe there is a way to query a compiled regular expression for which branch of an alternative matched a given input. Barring that, it could be necessary to compile regular expressions for each branch of an alternative along with some way to pair them with components in the matched path. Yikes.
- Exhaustiveness queries are still inaccurate and need improvement. There are many false negatives today (note that false positives are serious bugs). I've done a lot of experimental work on this outside of this repository, but nothing has landed here yet.
- Despite the progress I've made on variance and exhaustiveness queries, they still don't support an indeterminate result (When::Sometimes) and I'm not yet sure how to accomplish that.

So why do this? 😄 Well, not only would it provide a more correct API (though I realize that the overwhelming majority of users don't care about these APIs), but it would also remove the need to partition patterns into exhaustive and nonexhaustive patterns in negations. In particular, the FileIterator::not combinator would no longer need to accept a sequence of patterns and could instead accept a single pattern. This could of course be an any combinator, so it would still be possible to use multiple patterns in a single not.

let glob = Glob::new(...).unwrap();
for entry in glob.walk(".").not(Glob::new(...).unwrap()) { ... }

This removes the need for not to handle building programs and emitting errors. More importantly, it also prevents poor performance caused by pathological inputs. Building both nonexhaustive and exhaustive patterns into an any combinator can cause a negation to consider all matches nonexhaustive and therefore read directory trees when it may not be necessary. I also plan to introduce additional combinators that would also benefit from this for the same reasons. For example, a not_except combinator:

let glob = Glob::new(...).unwrap();
for entry in glob
    .walk(".")
    .not_except(
        wax::any([..., ...]).unwrap(), // Negation.
        Glob::new(...).unwrap(), // Override. Allow this pattern despite the negation.
    )
{
    ...
}

I prefer these APIs, but they cannot efficiently filter without knowing about exhaustiveness. Given only a single pattern, these APIs would rely on exhaustiveness information in Program::matched rather than Program::exhaustiveness.

Question: `miette` integration

Could you give some sample code on how to get the output shown in the README, e.g.:

Error: glob::rule

  x malformed glob expression: adjacent zero-or-more wildcards `*` or `$`
   ,----
 1 | doc/**/*{.md,.tex,*.txt}
   :        |^^^^^^^^|^^^^^^^
   :        |        | `-- here
   :        |        `-- in this alternative
   :        `-- here
   `----

I added the resp. feature flag to wax but the error message I get doesn't contain the diagnostics. I am probably missing something obvious.

build-fs-tree dependency out of date

Please consider updating build-fs-tree from 0.3.0 to 0.6.0:

https://crates.io/crates/build-fs-tree/

Unfortunately upstream does not seem to provide a changelog so it's hard to say how much code modification is needed.

performance

I've been working on ways to make globbing with wax a little faster with rayon and I keep coming up empty. I'm beginning to wonder if it needs to be built in somewhere so that it globs a directory and then all the directories inside get their own thread to glob their contents, up to some limit. Any thoughts on this?

is case-insensitive the default now with 0.6.0?

It seems like case sensitivity has changed between 0.5.0 and 0.6.0. I'm fairly confident that before in nushell, using our glob command that uses wax globbing, we could do glob "c*" and get only the lowercase c matches, but now we get all upper and lower case matches. In fact, I have a special example in the glob command's help text that shows how to make a case-insensitive match by doing glob '(?i)c*'. Just wondering if I missed some updates that explain this or if this is a bug. Thanks!

I'm testing on Windows, if that's helpful. It looks like case sensitivity may be turned off for Windows paths. I'm wondering if that's the issue. BTW, Windows can have case-sensitive paths but it's pretty rare.

The outputs of `Glob::partitioned` don't interact well with `Glob::walk`.

I need to confirm some of this, but I believe that when Glob::partitioned returns an empty Glob that Glob::walk will never yield any results, even if the prefix is used to derive the root directory and refers to an existing file. For example:

let path = Path::new(".");
// The glob expression is invariant, so `prefix` is the complete path and `glob` is empty.
let (prefix, glob) = Glob::partitioned("/mnt/media/movie.mp4").unwrap();
for entry in glob.walk(path.join(prefix), usize::MAX) {
    // Should yield `/mnt/media/movie.mp4` if it exists, but does not.
}

In the code generated by the walk! macro, the case where there are no remaining path components and also no component regular expressions is not considered. The for loop is never entered and so no entries are yielded. Again, I haven't had a chance to test this just yet, but I wanted to open an issue since I'm fairly certain this does not work as expected and I don't want to forget. 😅

I believe matching a path using the prefix as seen in this example works correctly: stripping the prefix will yield an empty path and an empty glob will only match an empty candidate path.

Remove lifetime generics from `GlobError`?

First off, thank you for this library. The glob situation in Rust is... not ideal, and this library is amazing.

With that being said, it would be nice if GlobError and all its children did not use lifetime generics. Wanting to return Result<..., GlobError> from my methods forces me down this rabbit hole of adding lifetimes to all my methods, to any necessary arguments, and in the end just overly complicates everything.

It then puts me in situations like this:

Which are sometimes really difficult to work around. Right now my only path forward is to use unwrap(), but that has its own problems.

Trouble getting `not()` to build

Me again. I recently updated to 0.5.0 and was looking forward to using not() for negation. My code changes are here: https://github.com/moonrepo/moon/pull/130/files#diff-59f9b6fb165685b88f87aed332e7f4f396bb795b2ad3f302f7d7944a36aa7cf6L123

For context, the negations come from a config file and are a Vec<String>, but Rust fails to build when I pass this type to not().

error[E0271]: type mismatch resolving `<std::string::String as TryInto<Glob<'_>>>::Error == BuildError<'_>`
   --> crates/utils/src/glob.rs:126:14
    |
126 |             .not(negations)?
    |              ^^^ expected struct `BuildError`, found enum `Infallible`
    |

I then tried references with &Vec<String> and even Vec<&str> with no luck. Both throw the same error above.

I've even tried converting the strings to Globs and passing Vec<Glob>, but that still fails to build also.

error[E0271]: type mismatch resolving `<Glob<'_> as TryInto<Glob<'_>>>::Error == BuildError<'_>`
   --> crates/utils/src/glob.rs:126:14
    |
126 |             .not(
    |              ^^^ expected struct `BuildError`, found enum `Infallible`
    |

The only thing that does work is explicitly passing [&str], like .not(["test/**/*"]), but this isn't an option for me.

Basically everything I've tried has failed, and I'm curious if you have any suggestions here.

How to represent "don't match"

@olson-sean-k

I'm trying to convert some JS globs to an equivalent wax glob and they support a concept of "matching anything but this / don't match this pattern" using the syntax !(pat): https://github.com/micromatch/picomatch#extglobs

For the other ext globs, I can use <pat:x,x> but the don't match doesn't seem possible, since <pat:0,0> is not allowed? Is there a way to achieve this in wax?

Optimize based on predicted maximum depth

Globs can sometimes know how deep they will have to traverse. * will never have to go deeper than one level, and */*/* will never have to go deeper than three.

You can't predict how deep a tree wildcard ** can go, so you have to bail out on any heuristic that encounters it. Same goes for repetitions.

Such a heuristic could probably make a definite prediction on maximum depth for patterns that don't include any of these, but the important thing would be to never underestimate how many segments a pattern may match. So it would probably be smart to only apply it when the user hasn't provided a WalkBehavior.depth. Unfortunately, it's not an Option, so maybe you'd have to compare it to usize::MAX, or change WalkBehavior and introduce a breaking change.
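A minimal sketch of such a heuristic on the raw expression text follows. It is hypothetical and deliberately conservative: `max_depth` is a made-up name, it works on strings rather than Wax's tokens, it ignores escapes, and it also bails out on alternatives (an extra assumption beyond what is described above) because they may contain separators.

```rust
/// Predicts the maximum number of path components an expression can match,
/// or `None` when no bound can be given. Hypothetical helper, not a Wax API.
fn max_depth(expression: &str) -> Option<usize> {
    let mut depth = 0;
    for component in expression.split('/') {
        // Tree wildcards and repetitions are unbounded; alternatives are
        // treated as unknown so that the result never underestimates.
        if component.contains("**") || component.contains('<') || component.contains('{') {
            return None;
        }
        // Empty components come from a root or a trailing separator.
        if !component.is_empty() {
            depth += 1;
        }
    }
    Some(depth)
}

fn main() {
    for expression in ["*", "*/*/*", "**/*.rs", "<a/:1,3>"] {
        println!("{} -> {:?}", expression, max_depth(expression));
    }
}
```

A None result would simply leave the configured walk depth untouched.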

Is this a reasonable thing to expect from the library? I would like for it to be in this library and not in, say, nym, if implemented.

bug(0.2.0): valid globs are refused

Using wax = "0.2.0", some globs are not properly parsed:

fn test_glob(s: &str) {
    match wax::Glob::new(s) {
        Err(e) => println!("{:?}", e),
        Ok(_) => println!("-"),
    };
}

fn main() {
    test_glob("**/**a/*.rs");
    test_glob("**a/*.rs");
}

$ cargo run
Parse(ParseError { expression: "**/**a/*.rs", kind: Eof })
Parse(ParseError { expression: "**a/*.rs", kind: Tag })

Is this intended? The EOF one is pretty weird too.

Case sensitivity is not considered in `Glob::partition`.

I plan to expose flags for controlling case sensitivity in glob expressions. For example, **/*.(?i){jpg,jpeg} would match the extensions jpg or jpeg without case sensitivity (a la typical regular expressions). While working on this, it became clear that case sensitivity must be considered in Glob::partition.

Unix file systems do not consider case or related character classes at all, but Windows file systems do (specifically the Win32 API) and are effectively case insensitive. This means that glob matching may disagree with the resolution of paths done by the target platform and this (along with any future case sensitivity flags) must be reflected by Glob::partition and related APIs.

For example, the glob foo/*.bar is split into the path foo and glob *.bar by Glob::partitioned. However, globs are currently case sensitive, so on Windows this may lead to inconsistent behavior between Glob::new with Glob::is_match versus Glob::partitioned with Glob::walk:

let glob = Glob::new("foo/*.bar").unwrap();
// This is not a match, regardless of platform. Globs are case sensitive everywhere.
assert!(!glob.is_match("FOO/qux.bar"));

let (prefix, glob) = Glob::partitioned("foo/*.bar").unwrap();
// On Windows, this will descend into `./FOO`, which disagrees with `is_match` above.
for entry in glob.walk(Path::new(".").join(prefix), usize::MAX) { /*...*/ }

Compile failed in rustc 1.62.0-nightly (e85edd9a8 2022-04-28)

error[E0106]: missing lifetime specifier
    --> src/token.rs:1447:61
     |
1447 |     ) -> impl FnMut(Input<'i>) -> ParseResult<'i, TokenKind<Annotation>> {
     |                                                             ^ expected named lifetime parameter
     |
     = help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'i` lifetime
     |
1447 |     ) -> impl FnMut(Input<'i>) -> ParseResult<'i, TokenKind<'i, Annotation>> {
     |                                                             +++

For more information about this error, try `rustc --explain E0106`.
error: could not compile `wax` due to previous error

Support non-utf8 path?

Refer to: nushell/nushell#2987
If we create a file from the code:

  use std::ffi::OsString;
  use std::fs;
  use std::io;
  use std::os::unix::ffi::OsStringExt;
  
  fn main() -> io::Result<()> {
      let chars: Vec<u8> = (1..=46)
          .into_iter()
          .chain(48..=70)
          .chain(150..200)
          .collect();
  
      fs::create_dir(OsString::from_vec(chars.clone()))
  }

And run glob walk with the following code:

use wax::Glob as WaxGlob;

fn main() {
    let glob = WaxGlob::new("*").unwrap();
    let glob_results: Vec<_> = glob.walk(".").collect();
}

This makes the program panic with the message: unexpected encoding: Utf8Error
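For comparison, the standard library can convert such a name to text without panicking by substituting the replacement character. The sketch below is Unix only and uses only std; `non_utf8_name` and `candidate_text` are hypothetical helpers, and this is not a claim about how wax converts paths internally.

```rust
use std::borrow::Cow;
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;
use std::path::Path;

/// Builds a file name that is not valid UTF-8 (Unix only).
fn non_utf8_name() -> OsString {
    OsString::from_vec(vec![b'f', b'o', 0xFF])
}

/// Converts a path to text, replacing invalid sequences with U+FFFD
/// instead of failing.
fn candidate_text(path: &Path) -> Cow<'_, str> {
    path.to_string_lossy()
}

fn main() {
    let name = non_utf8_name();
    let path = Path::new(&name);
    // The strict conversion fails for this name.
    assert!(path.to_str().is_none());
    println!("{}", candidate_text(path));
}
```

The lossy text cannot be mapped back to the original bytes, so a walker would still need to keep the original path alongside it.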

Traverses directory boundaries when performing negative matching

Hi!

I have noticed surprising behaviour when running a set of integration tests that I put together against micromatch (a javascript glob algorithm). I am on board with almost all of the changes, except one, and I wanted to make sure it was a bug before diving into the code to 'fix' it.

When presented with the glob a[!b]c, wax will happily match path separators in place of b which is odd. So, a/c passes that test case.

    #[test]
    fn negative_match_does_not_traverse_folders() {
        let glob = Glob::new("a[!b]c").unwrap();
        assert!(glob.is_match(Path::new("adc")));
        assert!(!glob.is_match(Path::new("a/c")));
    }

Compare this to a glob playground:

https://www.digitalocean.com/community/tools/glob?comments=true&glob=a%5B%21b%5Dc&matches=false&tests=%2F%2F%20expected&tests=adc&tests=%2F%2F%20wax%20fails%20here&tests=a%2Fc
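The expected behavior can be sketched with a toy character class that refuses separators regardless of negation. The `Class` type and `matches_a_not_b_c` are hypothetical and unrelated to wax's implementation, which compiles globs to regular expressions.

```rust
const SEPARATOR: char = '/';

/// A toy character class: a set of characters, optionally negated.
struct Class<'a> {
    chars: &'a str,
    negated: bool,
}

impl<'a> Class<'a> {
    /// A class matches one character within a component, so a separator is
    /// rejected first, whether or not the class is negated.
    fn is_match(&self, c: char) -> bool {
        c != SEPARATOR && (self.chars.contains(c) != self.negated)
    }
}

/// Hand-written matcher for the expression `a[!b]c`.
fn matches_a_not_b_c(path: &str) -> bool {
    let class = Class { chars: "b", negated: true };
    let chars: Vec<char> = path.chars().collect();
    chars.len() == 3 && chars[0] == 'a' && class.is_match(chars[1]) && chars[2] == 'c'
}

fn main() {
    for path in ["adc", "abc", "a/c"] {
        println!("{}: {}", path, matches_a_not_b_c(path));
    }
}
```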

Drive letter on Windows?

Hi,

I'm using nushell which relies on this crate for the glob feature.
Unfortunately, it fails to resolve Windows drive letters. After reading wax's README, I guess it has to do with the repetition token?

Is there a way to escape these?

Here are a few non working samples (in nu):
nushell/#7125

I'm a bit short on time lately but if you have pointers to solve this I can also propose a PR at some point

Thanks
