jprichardson / node-klaw
A Node.js file system walker with a Readable stream interface. Extracted from fs-extra.
License: MIT License
Klaw does not actually follow symlinks, at least on Windows.
If I include a symlink in the srcFolder structure, the link is only returned as a symlink and is not traversed.
Example code:
const folders = []
const files = []
const links = []
klaw(srcFolder, { preserveSymlinks: true })
  .on('data', item => {
    if (item.stats.isDirectory()) {
      folders.push(item.path)
    } else if (item.stats.isSymbolicLink()) {
      links.push(item.path)
    } else if (item.stats.isFile()) {
      files.push(item.path)
    }
  })
  .on('end', () => {
    console.log({ folders, files, links })
  })
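One possible explanation, offered as an assumption rather than a confirmed diagnosis: if preserveSymlinks makes the walker call lstat instead of stat, a link to a directory is reported as a symbolic link and is never read as a directory. The sketch below uses only Node's fs module, not klaw itself, to show the difference between the two calls. The temporary directory and the names real and link are made up for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Throwaway layout: <tmp>/real is a directory, <tmp>/link points at it.
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'symlink-demo-'))
const realDir = path.join(base, 'real')
const linkPath = path.join(base, 'link')
fs.mkdirSync(realDir)
fs.symlinkSync(realDir, linkPath, 'dir')

// stat() follows the link and describes its target.
const followed = fs.statSync(linkPath)
// lstat() describes the link itself.
const notFollowed = fs.lstatSync(linkPath)

console.log('stat :', followed.isDirectory(), followed.isSymbolicLink())
console.log('lstat:', notFollowed.isDirectory(), notFollowed.isSymbolicLink())

// Clean up the temporary layout.
fs.rmSync(base, { recursive: true, force: true })
```

If that assumption holds, dropping preserveSymlinks (or setting it to false) would be the first thing to try.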
I am using klaw to find files, but I need to be able to stop walking the directories after a certain limit. I don't think there is currently an official way to do this. Here is my workaround:
function getFilesToProcess(directory, maxElements) {
  return new Promise((resolve, reject) => {
    const items = [];
    const walkStream = fs.walk(directory, {queueMethod: 'pop'});
    // Workaround: emptying the internal queue leaves the walker nothing more to visit.
    walkStream.close = function () {
      this.paths = [];
    };
    walkStream
      // Regular functions (not arrows) so that `this` is the stream.
      .on('data', function (item) {
        if (item.stats.isFile()) {
          items.push(item.path);
          if (maxElements > 0 && items.length >= maxElements) {
            this.close();
          }
        }
      })
      .on('end', () => resolve(items))
      .on('error', function (err) {
        this.close();
        reject(err);
      });
  });
}
I don't like having to rely on an implementation detail to achieve this. What do you think about adding an official close method?
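Since the walker is a Readable stream, the standard destroy() method may already cover this without touching internals. The sketch below is written against any object-mode Readable. collectUpTo is a hypothetical helper name, and whether klaw stops reading promptly once destroyed is an assumption that has not been checked here.

```javascript
// Collect at most `maxItems` items from an object-mode Readable, then stop it.
// A `maxItems` of 0 or less means "no limit".
function collectUpTo (stream, maxItems) {
  return new Promise((resolve, reject) => {
    const items = []
    stream
      .on('data', item => {
        // Ignore anything that was already buffered when we stopped.
        if (stream.destroyed) return
        items.push(item)
        if (maxItems > 0 && items.length >= maxItems) {
          // destroy() is part of the standard Readable API; 'end' will not
          // fire afterwards, so resolve here.
          stream.destroy()
          resolve(items)
        }
      })
      .on('end', () => resolve(items))
      .on('error', reject)
  })
}
```

With klaw this would be called as collectUpTo(klaw(directory), 100), filtering for files either before or after collection.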
RangeError: Maximum call stack size exceeded
at node_modules/klaw/src/index.js:45:23
at go$readdir$cb (node_modules/graceful-fs/graceful-fs.js:187:14)
at FSReqWrap.oncomplete (fs.js:135:15)
This is with version 2.1.1, but nothing in the newer version seems related to this.
I have a folder with 200k+ files in it that crashes klaw.
const crawl = async (subdir: string, cb) => {
  const done = resolvable();
  klaw(path.join(dir, subdir), {
    filter: (item) => {
      return item.endsWith(".job");
    }
  }).on('data', (item) => {
    cb(item.path);
  }).on('end', () => done.resolve());
  await done;
};
Would you please add a LICENSE file to your packages which includes your copyright information and the text of the MIT license? The MIT license states that the license text must accompany the source code. This also makes it easier for people like myself to package up your modules for Linux distributions.
There is an fs option on walk's options object, so the try-catch around graceful-fs is unnecessary.
The main problem with it is its performance overhead: with it, the library's loading time is around 20ms, and without it, it is 3ms on average.
Would you accept a PR?
Would you like to have "only file" and "only directory" functionalities?
That is, providing convenience functions that emit only files or only directories (excluding the root path) by piping a simple PassThrough stream to the walk function, or any other preferred approach?
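One way the proposal could look, sketched with Node's stream module only. keepWhere, onlyFiles and onlyDirectories are hypothetical names, not part of klaw; the only assumption is that the walker emits { path, stats } items as in the other examples on this page.

```javascript
const path = require('path')
const { Transform } = require('stream')

// Build an object-mode Transform that passes along only the items
// for which `predicate` returns true.
function keepWhere (predicate) {
  return new Transform({
    objectMode: true,
    transform (item, encoding, callback) {
      if (predicate(item)) this.push(item)
      callback()
    }
  })
}

// Emit regular files only.
const onlyFiles = () => keepWhere(item => item.stats.isFile())

// Emit directories only, leaving out the root that was walked.
const onlyDirectories = root => {
  const rootPath = path.resolve(root)
  return keepWhere(item =>
    item.stats.isDirectory() && path.resolve(item.path) !== rootPath)
}
```

Usage would be along the lines of klaw(dir).pipe(onlyFiles()) or klaw(dir).pipe(onlyDirectories(dir)).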
Hi,
Imagine that I want to walk through the directory contents and remove all the directories there.
Here is my directory:
tmp
├── dir1
│ └── hello.txt
├── dir2
├── dir3
└── hello.txt
3 directories, 2 files
Here is my code:
'use strict';
const klaw = require('klaw');
const through2 = require('through2');
const fs = require('fs-extra');
fs.walk('./tmp')
  .pipe(through2.obj((function (item, enc, next) {
    if (item.stats.isDirectory()) {
      fs.remove(item.path)
    }
    next();
  })))
  /*.on('data', function (item) {
  })*/
  .on('end', function () {
    console.log('end of everything');
  })
Here is the error I have during the execution:
After the execution, the directory looks like this:
tmp
└── dir1
└── hello.txt
1 directory, 1 file
I have tried different variants, including marking the items as removed, but I haven't found anything that works yet.
Could you please clarify whether it's possible to clean up the directory while "klawling" through it?
Regards,
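A common way around this kind of race is to collect the directories during the walk and delete them only after the walk has ended, so nothing disappears while the walker is still reading it. The sketch below assumes the walker is an object-mode Readable emitting { path, stats } items. removeDirsAfterWalk is a hypothetical helper, and it uses Node's own fs.rmSync rather than fs-extra.

```javascript
const fs = require('fs')
const path = require('path')

// Walk first, delete afterwards. Resolves with the directories removed.
function removeDirsAfterWalk (walker, root) {
  return new Promise((resolve, reject) => {
    const rootPath = path.resolve(root)
    const dirs = []
    walker
      .on('data', item => {
        // Remember every directory except the root itself.
        if (item.stats.isDirectory() && path.resolve(item.path) !== rootPath) {
          dirs.push(item.path)
        }
      })
      .on('error', reject)
      .on('end', () => {
        try {
          // `force` ignores paths already gone because a parent was removed.
          for (const dir of dirs) {
            fs.rmSync(dir, { recursive: true, force: true })
          }
          resolve(dirs)
        } catch (err) {
          reject(err)
        }
      })
  })
}
```

For the layout above this would be called as removeDirsAfterWalk(klaw('./tmp'), './tmp'), leaving only tmp/hello.txt behind.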
There's shift and pop, but how do they change the behavior?
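The difference is easiest to see on a toy model. The sketch below is not klaw's code: it assumes, as the this.paths workaround in an earlier report suggests, that pending paths sit in an array, that the next path is taken with the named array method, and that a directory's children are appended to the end. The tree object is an invented in-memory stand-in for a directory structure.

```javascript
// In-memory stand-in for a directory tree: each key lists its children.
const tree = {
  tmp: ['tmp/a', 'tmp/b'],
  'tmp/a': ['tmp/a/1.txt'],
  'tmp/b': ['tmp/b/2.txt']
}

// Take the next path with `method` ('shift' or 'pop'), record it,
// then append its children to the end of the queue.
function visitOrder (root, method) {
  const queue = [root]
  const order = []
  while (queue.length > 0) {
    const current = queue[method]()
    order.push(current)
    queue.push(...(tree[current] || []))
  }
  return order
}

console.log(visitOrder('tmp', 'shift'))
// [ 'tmp', 'tmp/a', 'tmp/b', 'tmp/a/1.txt', 'tmp/b/2.txt' ]
console.log(visitOrder('tmp', 'pop'))
// [ 'tmp', 'tmp/b', 'tmp/b/2.txt', 'tmp/a', 'tmp/a/1.txt' ]
```

Under that model, shift behaves like a first-in-first-out queue and visits level by level, while pop behaves like a stack and finishes one branch before moving to the next.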
I was playing with #11 to see if I could find a solution, and I noticed another issue when a filter function is applied. It differs from #11, where the root directory is part of the result array even though a filter function is applied. This issue is about subdirectories not being traversed at all when certain filter functions are applied, as in the following example.
Imagine we have
tmp
|_dir1
| |_foo.md
|_bar.md
|_baz.md
and we want to get only .md files. If we use it like
var filterFunc = function (item) {
  return path.extname(item) === '.md'
}
var items = []
klaw('tmp', {filter: filterFunc}).on('data', function (item) {
  items.push(item.path)
}).on('end', function () {
  console.dir(items)
})
the result array is ['tmp', 'bar.md', 'baz.md'].
I checked the code again and, as far as I understand, this happens because when a filter function is passed, all contents of the root directory go through it. Since path.extname(item) === '.md' is false for every subdirectory, none of them is read, so only items in the root directory itself are returned. Please correct me if I am wrong.
Edit
However, if we run the same example using pipe for filtering, everything is just fine. Apparently, the problem only arises when filter function is used.
var filter = through2.obj(function (item, enc, next) {
  if (path.extname(item.path) === '.md') this.push(item)
  next()
})
var items = []
klaw('tmp')
  .pipe(filter)
  .on('data', function (item) {
    items.push(item.path)
  })
  .on('end', function () {
    console.dir(items) // => ['bar.md', 'baz.md', 'dir1/foo.md']
  })
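If the analysis above is right, a filter that also accepts directories should let the walk descend while still rejecting unwanted files. The following is a minimal sketch under that assumption. markdownOrDirectory is a hypothetical name, and the extra statSync call per path is a cost worth keeping in mind on large trees.

```javascript
const fs = require('fs')
const path = require('path')

// Accept every directory (so it can still be read) and, among
// everything else, only Markdown files.
function markdownOrDirectory (itemPath) {
  let stats
  try {
    stats = fs.statSync(itemPath)
  } catch (err) {
    return false // unreadable or vanished paths are skipped
  }
  return stats.isDirectory() || path.extname(itemPath) === '.md'
}
```

Passed as { filter: markdownOrDirectory }, this would still let directories through to the 'data' handler, so they would have to be dropped there, for example with an item.stats.isFile() check.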
Spent a while trying to work out why my directory tree wasn't being traversed -- my root path was a symlink which isn't traversed.
It would be helpful to document explicitly that symlinks aren't traversed, or provide an option to permit this.
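Until this is documented or configurable, one workaround is to resolve the root before handing it to the walker. The sketch below relies only on fs.realpathSync from Node's fs module. resolveRoot is a hypothetical helper name, and symlinks found deeper in the tree are not affected by it.

```javascript
const fs = require('fs')

// Return the canonical path of `root` with every symlink in it resolved,
// so the walker starts from a real directory.
function resolveRoot (root) {
  return fs.realpathSync(root)
}
```

The walk would then start with klaw(resolveRoot(myRoot)).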
It would be great if klaw could be compatible with async/await like fs-extra is. For now I am using klaw-sync.
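Readable streams in recent Node.js versions are async iterable, so a walker that is a Readable can in principle be consumed with for await without any change to the library. The sketch below shows the pattern on any async-iterable source. collectItems is a hypothetical helper, and using it with klaw assumes the stream behaves like a standard Readable.

```javascript
// Collect every item from any async-iterable source, such as a Readable stream.
async function collectItems (source) {
  const items = []
  for await (const item of source) {
    items.push(item)
  }
  return items
}
```

Inside an async function this reads as const items = await collectItems(klaw(dir)). Errors emitted by the stream surface as a rejected promise.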
The Node.js fs module supports file URLs wherever file paths are supported. It would be nice if this package accepted those too.
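In the meantime, a caller can convert a file URL to a path before starting the walk. The sketch below uses fileURLToPath and pathToFileURL from Node's url module. toFsPath is a hypothetical helper name, not something klaw provides.

```javascript
const path = require('path')
const { fileURLToPath, pathToFileURL } = require('url')

// Accept a plain path string, a file: URL string, or a URL object,
// and always return a plain file system path.
function toFsPath (input) {
  const isFileUrlString = typeof input === 'string' && input.startsWith('file:')
  if (input instanceof URL || isFileUrlString) {
    return fileURLToPath(input)
  }
  return input
}

// Round trip: an absolute path turned into a file URL and back again.
const asUrl = pathToFileURL(path.resolve('tmp'))
console.log(asUrl.href, '->', toFsPath(asUrl))
```

The walk would then be started with klaw(toFsPath(input)).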
This link is broken:
If you're not sure of the differences on Node.js streams 1, 2, 3 then I'd recommend this resource as a good starting point: https://strongloop.com/strongblog/whats-new-io-js-beta-streams3/.
Hi,
it looks easy to upgrade mkdirp dependency:
--- a/tests/_test.js
+++ b/tests/_test.js
@@ -10,17 +10,16 @@
var testDir = path.join(os.tmpdir(), 'klaw-tests')
rimraf(testDir, function (err) {
if (err) return t.end(err)
- mkdirp(testDir, function (err) {
- if (err) return t.end(err)
-
+ mkdirp(testDir).then(() => {
var oldEnd = t.end
t.end = function () {
rimraf(testDir, function (err) {
err ? oldEnd.apply(t, [err]) : oldEnd.apply(t, arguments)
})
}
-
testFn(t, testDir)
+ }).catch((err) => {
+ return t.end(err)
})
})
})
Is there a way to get klaw to continue walking through a directory even if an error occurs, by just ignoring the file/directory it encountered the error on? I want to walk through an entire /home/, but when I encounter a permission error the walk ends without emitting end. Thanks
Hello,
I really appreciate this awesome package, but I cannot configure it to continue on error.
If a directory access error occurs (not sure about file errors), klaw just emits error and stops. end isn't called.
I'd like to continue seeking through directories.
Currently I cannot use klaw to scan the whole readable filesystem, because it surely won't have access to some directories.
Any solutions?
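One idea, untested against klaw itself: since another report on this page mentions an fs option on the options object, a wrapped fs whose readdir treats permission errors as an empty directory might let the walk carry on. The sketch below assumes klaw calls fs.readdir(path, callback) on the object it is given. tolerantFs and the list of ignored error codes are hypothetical, and errors raised by stat or lstat are not covered.

```javascript
const fs = require('fs')

// Error codes treated as "skip this directory" rather than fatal.
const IGNORED_CODES = new Set(['EACCES', 'EPERM'])

// Wrap an fs-like object: every method is inherited unchanged except
// readdir, which reports an empty directory when access is denied.
function tolerantFs (baseFs = fs) {
  return Object.assign(Object.create(baseFs), {
    readdir (dir, ...args) {
      // The callback is always the last argument; options may precede it.
      const callback = args.pop()
      baseFs.readdir(dir, ...args, (err, entries) => {
        if (err && IGNORED_CODES.has(err.code)) return callback(null, [])
        callback(err, entries)
      })
    }
  })
}
```

It would be passed as klaw('/home', { fs: tolerantFs() }). Other errors, such as ENOENT, are still reported as before.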
I am looking for a way to skip directories, meaning: do not walk a specific directory's files or subfolders.
The example on the main page does not skip a folder, it simply filters out files.
I'd expect it to give only the files under the current folder.
So if I have
+ root
+------- some-dir
+----------some-file.txt
+------ another-file.txt
I'd expect the output to contain only another-file.txt, with some-dir skipped.
However, I see that some-file.txt is also added to items.
Is there a way to actually skip a directory?
I tried not calling next(), but that seems to stop the entire process.
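Another report on this page observes that a path rejected by the filter option is never read, which suggests the filter option (rather than a piped stream) is the place to skip a directory. The sketch below builds such a filter. skipDirectories is a hypothetical helper, and it matches on the final path segment only, so a file with the same name would be skipped as well.

```javascript
const path = require('path')

// Build a filter that rejects any path whose final segment is in `names`.
function skipDirectories (names) {
  const skipped = new Set(names)
  return itemPath => !skipped.has(path.basename(itemPath))
}

const filter = skipDirectories(['some-dir'])
console.log(filter(path.join('root', 'some-dir')))         // false
console.log(filter(path.join('root', 'another-file.txt'))) // true
```

Passed as klaw('root', { filter }), this should keep some-dir and everything beneath it out of the results, assuming the filter is applied before a directory is read.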
I am walking a directory, ./source, and even though I have specified a filter to include only markdown files via path.extname(), I am still receiving the root directory as an item in my final array.
let filterFn = function(item) {
  return path.extname(item) === ".md";
}
return new Promise((resolve, reject) => {
  let items = [];
  fs.walk('./source', { filter: filterFn }).on('data', item => {
    items.push(item);
  }).on('end', () => {
    return resolve(items);
  });
}).then(items => {
  items.forEach(item => {
    console.log(item.path); // ['foo.md', 'bar.md', 'baz.md', 'source'];
  });
});
Expected:
['foo.md', 'bar.md', 'baz.md'];
Actual:
['foo.md', 'bar.md', 'baz.md', 'source'];
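Until the filter covers the root as well, the root item can be dropped after the walk. The following is a minimal sketch. withoutRoot is a hypothetical helper, and comparing resolved paths is a precaution in case the walker reports absolute paths.

```javascript
const path = require('path')

// Return the walked items minus the one that is the root directory itself.
function withoutRoot (items, root) {
  const rootPath = path.resolve(root)
  return items.filter(item => path.resolve(item.path) !== rootPath)
}
```

In the promise chain above, this would become .then(items => withoutRoot(items, './source')).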
Does this module scale? If I have a directory with millions (or tens of millions) of files, will it handle them gracefully as it iterates, or does it have to read the entire directory into memory?