Git Product home page Git Product logo

Comments (18)

jonschlinkert avatar jonschlinkert commented on May 27, 2024 1

closing since this isn't necessarily and issue, and I think the important parts have been split into other issues. @MartinKolarik feel free to open another issue if necessary

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

maybe try this:

  • get the glob parent, which in your example is aaa/
  • join it to cwd and make it an absolute path to get the starting point for fs.readdir
  • expand brace patterns in the glob (you could use micromatch or the braces library if you want)
  • split the rest of the pattern by slashes (make sure braces are expanded first, since they might contain slashes)
  • each time you recurse into a directory
    1. shift off a pattern segment and join it to the current directory path to create an "absolute pattern"
    2. in the files loop, use the "absolute pattern" to match against directory paths in the current directory (e.g. if .isDirectory() and it matches the pattern, keep recursing)
    3. use a recurse boolean to determine whether or not to continue recursing. if a pattern segment is **, set recurse to true
    4. test any non-directory file paths against the full original (absolute) glob pattern to determine if they should be added to the results array.

edit: just a thought. as you've noted, hitting the file system is by far the most expensive part of the operation, so a little extra parsing to create better patterns will have a negligible impact on performance compared to hitting the fs.

from micromatch.

es128 avatar es128 commented on May 27, 2024

@MartinKolarik chokidar carries code that splits apart and compares the segments of the pattern and the dir paths in order to decide whether to traverse into a directory: https://github.com/paulmillr/chokidar/blob/master/index.js#L409-L419

Maybe there would be some value in extracting this determination into a new method here or a micro-lib that determines whether or not there is any possibility of a path having children that matching the pattern.

I do not think we should overload the existing boolean response for this regarding whether a path matches the pattern or not like the original suggestion. If implemented this should be its own operation.

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

@es128 unfortunately that code is very simple and fails in certain situation, as it relies on splitting the pattern by /. E.g. {bbb,xxx/yyy} will be incorrectly split into [ '{bbb,xxx', 'yyy}' ]. It might be an edge case, but I'm looking into adding this optimization to micromatch without restricting any of its functionality.

Doing some regexp transformation magic I got to a point where it worked with everything except double star, but I couldn't get past that limitation and it was a very hacky solution. That's when I thought it might be a good idea to ask if there isn't a better way.

do not think we should overload the existing boolean response for this regarding whether a path matches the pattern or not like the original suggestion. If implemented this should be its own operation.

It should definitely be a different function in the public API, but all the public functions are implemented using matcher() under the hood, so that's when I was trying to add it.

from micromatch.

es128 avatar es128 commented on May 27, 2024

@MartinKolaric ok, that makes sense

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

unfortunately that code is very simple and fails in certain situation, as it relies on splitting the pattern by /. E.g. {bbb,xxx/yyy} will be incorrectly split into [ '{bbb,xxx', 'yyy}' ].

yeah this is why I suggested expanding brace patterns first. that said, I don't think this belong here. Specifically because in order to do it correctly, we would need to use fs methods.

I think creating a microlib like @es128 suggested would be a good solution. feel free to ask questions here still if you want feedback

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

Specifically because in order to do it correctly, we would need to use fs methods.

Why's that? I'm talking about a function that takes a path and a pattern and determines whether it's possible for the path to have children matching the pattern. Whether any children or the path itself exist doesn't matter.

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

Why's that? I'm talking about a function that takes a path and a pattern and determines whether it's possible for the path to have children matching the pattern.

Okay, then I assume you would have all of the file paths up front already? Even so, you would need to know the difference between a non-directory path and a directory path in order to predict which might possibly have child directories. Meaning that foo/readme.md can't have child directories because it's a file. I don't think there is a reliable way to do that without using fs methods - again, unless you already have all of the paths to match against, in which case there might still be benefits from what you're trying to achieve, but the benefits would be minuscule compared to preventing hitting the fs in the first place

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

Even so, you would need to know the difference between a non-directory path and a directory path in order to predict which might possibly have child directories. Meaning that foo/readme.md can't have child directories because it's a file.

Yes, but that should be handled in the glob implementation, not in the matcher itself. Usage example:

  1. list content of a directory
  2. call mightMatch() for each entry
  3. discard all entries that won't ever match the pattern
  4. for each remaining entry, check whether it's a directory or a file
    • if directory, go to step 1
    • if file, check if it matches the exact pattern and do whatever you want to do with matching files

We are only concerned about the second step here.

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

discard all entries that won't ever match the pattern

This is the part that I think I might be stuck on. In order to do this with any confidence, we need to figure out which paths are directories and which are files first, then we need to use two different matchers: one that is used to determine if we should continue recursing into a directory, and one that is used to match a full filepath if it's determined that it's not a directory.

For example, given the pattern foo/**/bar/*.js, how would we use that pattern to qualify a match against the directory foo/a/b/c/bar to determine if we should continue to recurse? We can't. It would return false, so we need to know that foo/a/b/c/bar is a directory so we can use foo/**/bar to qualify a match.

In case it helps us figure out how to achieve what you want, here is a complete glob implementation using micromatch. This uses fs methods, but hopefully we can brainstorm on how to do it without fs methods - I might be too stuck on something that isn't relevant.

(fwiw I'm sure there are edge cases that this won't work on, but I tried this alongside glob and globby, and it returned identical results in all cases I tested):

var fs = require('fs');
var path = require('path');
var parent = require('glob-parent');
var extend = require('extend-shallow');
var mm = require('micromatch');
var patternCache = {};

function lookup(pattern, options, res) {
  if (typeof options === 'string') {
    options = { cwd: options };
  }

  var opts = extend({}, options);
  var orig = path.resolve(opts.cwd);
  var base = parent(pattern);

  var cwd = path.join(orig, base);
  pattern = toPattern(pattern, base);

  if (~pattern.indexOf('**')) {
    opts.recurse = true;
  }

  var arr = fs.readdirSync(cwd);
  var len = arr.length;
  var idx = -1;
  res = res || [];
  var isMatch = mm.matcher(pattern, opts);

  while (++idx < len) {
    var name = arr[idx];
    if (name === '.DS_Store') continue;

    var fp = path.resolve(cwd, name);
    var file = new File({ path: fp, cwd: cwd, base: opts.origCwd });
    file.stat = fs.statSync(fp);
    file.isDir = file.stat.isDirectory();

    if (file.isDir && isMatch(name)) {
      opts.cwd = path.resolve(file.path);
      res.cwd = opts.cwd;
      lookup(pattern, opts, res);
    }
    if (!file.isDir && opts.matchDir) {
      continue;
    }
    if (matchFile(file.path, opts.pat, opts)) {
      res.push(file.relative);
    }
  }
  return res;
}

function File(file, opts) {
  this.path = path.resolve(file.path);
  this.cwd = path.resolve(file.cwd);
  this.base = path.resolve(file.base);
  this.relative = path.relative(this.base, this.path);
}

function glob(pattern, options) {
  var opts = extend({cwd: process.cwd()}, options);
  var segs = pattern.split('/');
  var len = segs.length;
  var idx = -1;
  var files = [];

  opts.pat = path.resolve(opts.cwd, pattern);
  opts.matchDir = matchDir(pattern);
  opts.origCwd = opts.cwd;

  while (++idx < len) {
    var seg = segs[idx];
    if (!seg) break;

    opts.recurse = true;
    var res = lookup(seg, opts);
    opts.cwd = res.cwd || opts.cwd;
    files.push.apply(files, res);
  }
  return files;
}

function matchFile(filepath, pattern, opts) {
  var abs = path.resolve(opts.cwd, pattern);
  var dir = parent(abs);

  if (filepath.indexOf(dir) !== 0) {
    return false;
  }

  pattern = abs.slice(dir.length + 1);
  var name = filepath.slice(dir.length + 1);
  return mm.isMatch(name, pattern, opts);
}

function matchDir(pattern) {
  return pattern[pattern.length - 1] === '/';
}

function toPattern(pattern, parent) {
  var key = parent + pattern;
  if (patternCache[key]) return patternCache[key];
  if (parent === '.') {
    return pattern;
  }
  var re = new RegExp('^\\/?' + parent.split('/').join('\\/'));
  pattern = pattern.replace(re, '');
  pattern = pattern.replace(/^\/|\/$/g, '');
  patternCache[key] = pattern;
  return pattern;
}

var a = glob('**/*.js', {cwd: 'test'});
var b = glob('a/**/*/', {cwd: 'test/fixtures'});
var c = glob('**/b/**', {cwd: 'test/fixtures'});

console.log(a);
console.log('matches:', a.length);
console.log(b);
console.log('matches:', b.length);
console.log(c);
console.log('matches:', c.length);

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

This is the part that I think I might be stuck on. In order to do this with any confidence, we need to figure out which paths are directories and which are files first, then we need to use two different matchers: one that is used to determine if we should continue recursing into a directory, and one that is used to match a full filepath if it's determined that it's not a directory.

That's what we don't agree on, so first, let's clarify how mightMatch() should work:

Given a path and a pattern, it determines whether it's possible for the path to have children (however deep) matching the pattern. Again, we don't care if it's actually possible on FS level.

There are several ways to implement such function. One is generating a RegExp that accounts for all possible valid paths. E.g., for a pattern aaa/bbb/*.js, a normal RegExp would be /^(?:aaa\/bbb\/(?!\.)(?=.)[^\/]*?\.js)$/, and the one for mightMatch() could be ^(?:aaa(?:\/bbb(?:\/(?!\.)(?=.)[^\/]*?\.js)?)?)$ - it accepts aaa, aaa/bbb, and aaa/bbb/*.js.

One important thing about mightMatch() is, that it should return true for any path isMatch() would return true for. That is the reason we don't need to know if something is a file or not. If mightMatch() returns false, it simply won't match.

If mightMatch() returns true, then we need to check whether it's a file or directory. For files, we can then perform second check using isMatch(), to determine whether the file should be included in the results.

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

OK, it seems I was able to find a reliable solution. I'll play with it a little more and if it works well, I'll post the details here.

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

So, I was able to write a library that takes a pattern and transforms it into a list of patterns, which can then be fed to micromatch/minimatch/etc. producing a final pattern that matches all possible parent directories. The usage is something like...

var finalRegExp = new RegExp('^(?:\\/$|(?:' + gpp(pattern, options).map(function (p) {
    return micromatch.makeRe(p, options).toString().slice(1, -1);
}).join('|') + '))$');

... and it works with all micromatch features.

That said, it'd be great to have this matching mode directly in micromatch (available via mm.mightMatch()/whatever name you think would be best), so if you think this is something other users might benefit from, let me know, and I'll be happy to submit a PR.

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

Well done! Since the method would be used for matching parent directories, what if we choose a method name that has something to do with directories or parent?

I think this would be useful in micromatch. @es128 any thoughts?

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

what if we choose a method name that has something to do with directories or parent?

Maybe isParent() or possibleParent() then? I'd go with possibleParent(), as there's one catch to how this method works - in a few specific cases, it'd be very hard/impossible to determine if the parent really should match or not. In those cases, it returns true to make sure you don't miss any directory. In other words, calling this function doesn't give you 100% guarantee the path is a parent (that's OK for the designed usage, but should be reflected in the name).

from micromatch.

jonschlinkert avatar jonschlinkert commented on May 27, 2024

in a few specific cases, it'd be very hard/impossible to determine if the parent really should match or not.

can you give some examples? I'm mostly just curious. thx

from micromatch.

MartinKolarik avatar MartinKolarik commented on May 27, 2024

Situations when a / might, or might not be there, or might be there multiple times - any extglob pattern with / inside, and [:punct:], [![:alnum:]], [![:alpha:]], [![:blank:]], [![:digit:]], [![:lower:]], [![:upper:]], [![:word:]], and [![:xdigit:]] classes.

If a path part contains any of these, it is replaced with **. E.g., aaa/bbb/cc+(d/e) is transformed to aaa/bbb/**.

from micromatch.

trusktr avatar trusktr commented on May 27, 2024

In case someone stumbles here, you can do this in Node to get an exact path at the beginning:

        __dirname + '/src/**/*.js',

rather than trying to use only a pure glob, like

'**/src/**/*.js'

because then, even though there may be only one src folder in your project, it isn't a guarantee.

from micromatch.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.