
godirwalk's Introduction

godirwalk

godirwalk is a library for traversing a directory tree on a file system.


In short, why did I create this library?

  1. It's faster than filepath.Walk.
  2. It's more correct on Windows than filepath.Walk.
  3. It's easier to use than filepath.Walk.
  4. It's more flexible than filepath.Walk.

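Depending on your specific circumstances, you might no longer need a library for file walking in Go: Go 1.16 added filepath.WalkDir and os.ReadDir, which expose each entry's type without an extra os.Stat call.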

Usage Example

Additional examples are provided in the examples/ subdirectory.

This library normalizes the provided top-level directory name for the OS-specific path separator by calling filepath.Clean on its first argument. Either way, it always invokes the provided callback function with pathnames built using the correct OS-specific path separator.

    package main

    import (
        "fmt"
        "os"
        "strings"

        "github.com/karrick/godirwalk"
    )

    func main() {
        dirname := "some/directory/root"
        err := godirwalk.Walk(dirname, &godirwalk.Options{
            Callback: func(osPathname string, de *godirwalk.Dirent) error {
                // This string operation is not the most performant approach,
                // but it is common enough to warrant a simple example here.
                if strings.Contains(osPathname, ".git") {
                    return godirwalk.SkipThis
                }
                fmt.Printf("%s %s\n", de.ModeType(), osPathname)
                return nil
            },
            Unsorted: true, // (optional) set true for faster yet non-deterministic enumeration (see godoc)
        })
        if err != nil {
            fmt.Fprintf(os.Stderr, "%s\n", err)
            os.Exit(1)
        }
    }

This library provides functions not only for traversing a file system directory tree, but also for obtaining a list of the immediate descendants of a particular directory, typically much more quickly than os.ReadDir or (*os.File).Readdirnames.

Description

Here's why I use godirwalk in preference to filepath.Walk, os.ReadDir, and (*os.File).Readdirnames.

It's faster than filepath.Walk

When compared against filepath.Walk in benchmarks, it has been observed to run between five and ten times faster on darwin, at speeds comparable to those of the unix find utility; about twice as fast on linux; and about four times as fast on Windows.

How does it obtain this performance boost? It does less work to give you nearly the same output. This library calls the same syscall functions to do the work, but it makes fewer calls, does not throw away information that it might need, and creates less memory churn along the way by reusing the same scratch buffer for reading from a directory rather than reallocating a new buffer every time it reads file system entry data from the operating system.

While traversing a file system directory tree, filepath.Walk obtains the list of immediate descendants of a directory and throws away the node type information that the operating system returns alongside each node's name. Then, immediately before invoking the callback function, filepath.Walk calls os.Lstat on each node and passes the returned os.FileInfo to the callback.

While the os.FileInfo returned by os.Lstat is extremely helpful (it even includes the os.FileMode data), providing it requires an additional system call for each node.

Because most callbacks only care about the node type, this library does not throw the type information away, but instead provides it to the callback function as an os.FileMode value. Note that this os.FileMode value contains only the node type information; it does not have the permission bits, sticky bits, or other information from the file's mode. If the callback does care about a particular node's entire os.FileInfo data structure, the callback can easily invoke os.Stat when needed, and only when needed.

Benchmarks

macOS

    $ go test -bench=. -benchmem
    goos: darwin
    goarch: amd64
    pkg: github.com/karrick/godirwalk
    BenchmarkReadDirnamesStandardLibrary-12   50000       26250  ns/op       10360  B/op       16  allocs/op
    BenchmarkReadDirnamesThisLibrary-12       50000       24372  ns/op        5064  B/op       20  allocs/op
    BenchmarkFilepathWalk-12                      1  1099524875  ns/op   228415912  B/op   416952  allocs/op
    BenchmarkGodirwalk-12                         2   526754589  ns/op   103110464  B/op   451442  allocs/op
    BenchmarkGodirwalkUnsorted-12                 3   509219296  ns/op   100751400  B/op   378800  allocs/op
    BenchmarkFlameGraphFilepathWalk-12            1  7478618820  ns/op  2284138176  B/op  4169453  allocs/op
    BenchmarkFlameGraphGodirwalk-12               1  4977264058  ns/op  1031105328  B/op  4514423  allocs/op
    PASS
    ok      github.com/karrick/godirwalk    21.219s

Linux

    $ go test -bench=. -benchmem
    goos: linux
    goarch: amd64
    pkg: github.com/karrick/godirwalk
    BenchmarkReadDirnamesStandardLibrary-12  100000       15458  ns/op       10360  B/op       16  allocs/op
    BenchmarkReadDirnamesThisLibrary-12      100000       14646  ns/op        5064  B/op       20  allocs/op
    BenchmarkFilepathWalk-12                      2   631034745  ns/op   228210216  B/op   416939  allocs/op
    BenchmarkGodirwalk-12                         3   358714883  ns/op   102988664  B/op   451437  allocs/op
    BenchmarkGodirwalkUnsorted-12                 3   355363915  ns/op   100629234  B/op   378796  allocs/op
    BenchmarkFlameGraphFilepathWalk-12            1  6086913991  ns/op  2282104720  B/op  4169417  allocs/op
    BenchmarkFlameGraphGodirwalk-12               1  3456398824  ns/op  1029886400  B/op  4514373  allocs/op
    PASS
    ok      github.com/karrick/godirwalk    19.179s

It's more correct on Windows than filepath.Walk

I did not previously care about this either, but humor me. We all love how we can write once and run everywhere. It is essential for the language's adoption, growth, and success that the software we create can run unmodified on all architectures and operating systems supported by Go.

When the traversed file system has a logical loop caused by symbolic links to directories, on unix filepath.Walk ignores symbolic links and traverses the entire directory tree without error. On Windows, however, filepath.Walk continues following directory symbolic links, even though it is not supposed to, eventually terminating early with an error when the pathname grows too long from concatenating endless loops of symbolic links onto it. This error comes from Windows, passes through filepath.Walk, and reaches the upstream client running filepath.Walk.

The takeaway is that behavior differs depending on which platform filepath.Walk runs on. While this is clearly not intentional, until it is fixed in the standard library, it presents a compatibility problem.

This library fixes the above problem so that it never follows logical file system loops on either unix or Windows. Furthermore, it only follows symbolic links when FollowSymbolicLinks is set to true. Behavior on Windows and other operating systems is identical.

It's easier to use than filepath.Walk

While this library strives to mimic the behavior of the incredibly well-written filepath.Walk standard library, there are places where it deviates a bit in order to provide an easier, more intuitive caller interface.

Callback interface does not send you an error to check

Since this library does not invoke os.Lstat on every file system node it encounters, there is no such error for the callback function to filter on. The third argument of the filepath.WalkFunc signature, which passes that error to the callback function, is no longer necessary and has been eliminated from this library's callback signature.

Furthermore, this slight interface difference between filepath.WalkFunc and this library's WalkFunc eliminates the boilerplate code that callback handlers must write when they use filepath.Walk. Rather than every callback function needing to check the error value passed into it and branch accordingly, users of this library do not even have an error value to check immediately upon entry into the callback function. This is an improvement both in runtime performance and code clarity.

Callback function is invoked with OS specific file system path separator

On every OS platform filepath.Walk invokes the callback function with a solidus (/) delimited pathname. By contrast this library invokes the callback with the os-specific pathname separator, obviating a call to filepath.Clean in the callback function for each node prior to actually using the provided pathname.

In other words, even on Windows, filepath.Walk will invoke the callback with some/path/to/foo.txt, requiring well-written clients to normalize the pathname of every file before working with it. This is a hidden boilerplate requirement for creating truly OS-agnostic callback functions. In truth, many clients developed on unix and not tested on Windows neglect this subtlety, which results in software bugs when someone tries to run that software on Windows.

This library invokes the callback function with some\path\to\foo.txt for the same file when running on Windows, eliminating the need for the client to normalize the pathname and lessening the likelihood that a client will work on unix but not on Windows.

This enhancement eliminates the need for more boilerplate code in callback functions while improving the runtime performance of this library.

godirwalk.SkipThis is more intuitive to use than filepath.SkipDir

One arguably confusing aspect of the filepath.WalkFunc interface that this library must emulate is how a caller tells the Walk function to skip file system entries. With both filepath.Walk and this library's Walk, when a callback function wants to skip a directory and not descend into its children, it returns filepath.SkipDir. If the callback function returns filepath.SkipDir for a non-directory, filepath.Walk and this library will stop processing any more entries in the current directory. This is not necessarily what most developers want or expect. If you want to simply skip a particular non-directory entry but continue processing entries in the directory, the callback function must return nil.

The implication of this interface design is that when you want to walk a file system hierarchy and skip an entry, you have to return a different value based on the type of that file system entry. To skip an entry, if the entry is a directory, you must return filepath.SkipDir, and if the entry is not a directory, you must return nil. This is an unfortunate hurdle I have observed many developers struggle with, simply because it is not an intuitive interface.

Here is an example callback function that adheres to the filepath.WalkFunc interface and skips any file system entry whose full pathname includes a particular substring, optSkip. Note that this library still supports the identical behavior of filepath.Walk when the callback function returns filepath.SkipDir.

    func callback1(osPathname string, de *godirwalk.Dirent) error {
        if optSkip != "" && strings.Contains(osPathname, optSkip) {
            if isDir, err := de.IsDirOrSymlinkToDir(); err == nil && isDir {
                return filepath.SkipDir
            }
            return nil
        }
        // Process file like normal...
        return nil
    }

This library attempts to eliminate some of that boilerplate logic in callback functions by providing a new token error value, SkipThis, which a callback function may return to skip the current file system entry regardless of its type. If the current entry is a directory, its children will not be enumerated, exactly as if the callback had returned filepath.SkipDir. If the current entry is a non-directory, the next file system entry in the current directory will be enumerated, exactly as if the callback had returned nil. The following example callback function behaves identically to the previous one, but has less boilerplate and, admittedly, logic that I find simpler to follow.

    func callback2(osPathname string, de *godirwalk.Dirent) error {
        if optSkip != "" && strings.Contains(osPathname, optSkip) {
            return godirwalk.SkipThis
        }
        // Process file like normal...
        return nil
    }

It's more flexible than filepath.Walk

Configurable Handling of Symbolic Links

The default behavior of this library is to ignore symbolic links to directories when walking a directory tree, just like filepath.Walk does. However, it does invoke the callback function with each node it finds, including symbolic links. If a particular use case requires following symbolic links when traversing a directory tree, this library can be configured to do so by setting the FollowSymbolicLinks option to true.

Configurable Sorting of Directory Children

The default behavior of this library is to always sort the immediate descendants of a directory prior to visiting each node, just like filepath.Walk does. This is usually the desired behavior. However, it comes at slight performance and memory penalties when a directory node has many entries, because the names must be sorted. Additionally, when the caller sets Unsorted in the configuration, directories are read lazily as the caller consumes entries. If a particular use case does not require sorting the directory's immediate descendants prior to visiting its nodes, this library will skip the sorting step when the Unsorted parameter is set to true.

Here's an interesting read on the potential hazards of traversing a file system hierarchy in a non-deterministic order. If you know the problem you are solving is not affected by the order in which files are visited, then I encourage you to use Unsorted. Otherwise, leave this option unset.

Researchers find bug in Python script may have affected hundreds of studies

Configurable Post Children Callback

This library provides upstream code with the ability to specify a callback function to be invoked for each directory after its children are processed. This has been used to efficiently delete empty directories recursively after traversing the file system. See the examples/clean-empties directory for an example of this usage.

Configurable Error Callback

This library provides upstream code with the ability to specify a callback to be invoked for errors that the operating system returns, allowing the upstream code to determine the next course of action to take, whether to halt walking the hierarchy, as it would do were no error callback provided, or skip the node that caused the error. See the examples/walk-fast directory for an example of this usage.

godirwalk's Issues

inquiry on the skipNode functionality in regards to file name filtering.

Hello, I have been looking for a good package to walk directories for a while and I stumbled upon this one today. I really love the effort you have put into solving the issue with the path/filepath package provided by the Go team.

However, I have one small issue. I was trying to implement a file name filter and it kind of felt a bit awkward. Here is an example:

func main() {
	_ = godirwalk.Walk(".", &godirwalk.Options{
		Callback: func(osPathname string, de *godirwalk.Dirent) error {
			if strings.Contains(osPathname, ".git") {
				return errors.New("skipping: " + osPathname)
			}
			return nil
		},
	})
}

In the code sample above, I have implemented a word filter for the file path, but if I keep the implementation like this, it will stop on the first match and not continue walking the directory.

However if I add an error callback like so:

func main() {
	_ = godirwalk.Walk("./../..", &godirwalk.Options{
		Callback: func(osPathname string, de *godirwalk.Dirent) error {
			if strings.Contains(osPathname, ".git") {
				return errors.New("skipping: " + osPathname)
			}
			return nil
		},
		ErrorCallback: func(osPathname string, err error) godirwalk.ErrorAction {
			return godirwalk.SkipNode
		},
	})
}

We get the proper functionality where the walk only ignores the directory I returned an error for.

This feels a bit awkward because now I am ignoring all errors in the error callback, and I don't want to do that by default. If I want to know which errors are filter (word match) errors, I need to do another string check in the error callback function.

func main() {
	_ = godirwalk.Walk("./../..", &godirwalk.Options{
		Callback: func(osPathname string, de *godirwalk.Dirent) error {
			if strings.Contains(osPathname, ".git") {
				return errors.New("path filter")
			}
			return nil
		},
		ErrorCallback: func(osPathname string, err error) godirwalk.ErrorAction {
			if err.Error() == "path filter" {
				return godirwalk.SkipNode
			}
			return godirwalk.Halt
		},
	})
}

Is there a better way to implement a file name/path filter that I am not seeing? If not, can I take a shot at it and make a PR?

Regards,

Possible memory leak?

Hi,
I am using this library to traverse a bunch of directories containing 70TB worth of files of about 2MB each.

I noticed that my software is using more memory than it should and I profiled it to try and understand what's going on.

The library is only used on startup. I made sure it returns and waited 1 hour before using pprof to analyze the heap and this is what I noticed:

 2670.24MB 33.01% 66.04%  2670.24MB 33.01%  0000000000b5786b github.com/karrick/godirwalk.(*Scanner).Scan /home/travis/gopath/pkg/mod/github.com/karrick/godirwalk@…/scandir_unix.go:163
         0     0% 66.04%  2670.24MB 33.01%  0000000000b57f45 github.com/karrick/godirwalk.Walk /home/travis/gopath/pkg/mod/github.com/karrick/godirwalk@…/walk.go:258
         0     0% 66.04%  2670.24MB 33.01%  0000000000b585ba github.com/karrick/godirwalk.walk /home/travis/gopath/pkg/mod/github.com/karrick/godirwalk@…/walk.go:329
         0     0% 66.04%  2670.24MB 33.01%  0000000000b59e2e github.com/lbryio/reflector.go/store/speedwalk.AllFiles.func2 /home/travis/gopath/src/github.com/lbryio/reflector.go/store/speedwalk/speedwalk.go:64

This is unusual because the function returned and the files were processed further later; there is no reason godirwalk should still be holding memory.

I tried looking into the code and noticed that Scan() in scandir_unix.go calls s.done() before returning false, which in turn calls s.dh.Close(). However, as the done() doc comment says:

// done is called when directory scanner unable to continue, with either the
// triggering error, or nil when there are simply no more entries to read from
// the directory.

The function is only called when the scanner is unable to continue, which leads me to think that if something interrupts the walk before it finishes, the memory may never be freed.

This could happen when we return in the for loop here:

	for ds.Scan() {
		deChild, err := ds.Dirent()
		osChildname := filepath.Join(osPathname, deChild.name)
		if err != nil {
			if action := options.ErrorCallback(osChildname, err); action == SkipNode {
				return nil
			}
			return err
		}

This would not close the file that the scanner opened, leaving the scanner uncollectable by the GC.

If this happens enough times, this could potentially add up to the amount of memory I noticed while profiling (?)

The directory I'm walking has 256 subdirectories and about 115k files in each subdirectory.

Let me know what you think.

Regards,

Niko

Modification time and size?

Why are these attributes not preserved from os.FileInfo into godirwalk.Dirent? Instead of selecting fields from FileInfo to bring into Dirent, couldn't we just keep FileInfo as a field in Dirent and expose its methods?

I'm very impressed by the performance of godirwalk, but I'm developing a file tree diff algorithm and both the modification time and the size of files are crucial for it.

Failed to build on Windows

When building the latest version (v1.8.1) on Windows, the following error occurred:

# github.com/karrick/godirwalk
..\Go\pkg\mod\github.com\karrick\godirwalk@…\follow_windows.go:10:6: evalSymlinksHelper redeclared in this block
        previous declaration at ..\Go\pkg\mod\github.com\karrick\godirwalk@…\follow_unix.go:4:51

IMHO, a _unix filename suffix is not recognized as a build constraint, as per golang/go#20322. Should build tags be added for Unix platforms?

macOS: Unsorted scan is incompatible with removing files from the directory

The following code for recursive directory removal used to work until 74f3b4a. Since that change, it skips files under macOS (this is probably file-system-specific and might be triggered under Linux too) and fails in PostChildrenCallback with ENOTEMPTY.

I am not sure this is expected behaviour, so either fixing or documenting it would be useful.

func RemoveIfExists(f string) error {
	if err := os.Remove(f); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

func Rmr(dir string) error {
	err := godirwalk.Walk(dir, &godirwalk.Options{
		Callback: func(path string, de *godirwalk.Dirent) error {
			if de.IsDir() {
				// Directory itself will be removed in PostChildrenCallback
				return nil
			}
			return RemoveIfExists(path)
		},
		PostChildrenCallback: func(path string, de *godirwalk.Dirent) error {
			return RemoveIfExists(path)
		},
		Unsorted: true,
	})
	if os.IsNotExist(err) {
		return nil
	}
	return err
}

Symlink loops cause infinite recursion

When symlinks are followed godirwalk may enter an infinite loop. A standard Linux installation can have these types of symlinks that create a directory loop, my Linux Mint machine certainly contains them out of the box.

As a result godirwalk will never end on some systems if FollowSymbolicLinks is true.

Can we send a stop request to Walk

My use case is a file system walker that runs continuously until someone decides to stop the agent, in which case I need to intercept SIGTERM and stop the Walk() function before closing other tasks. I didn't see anything in the godirwalk library for stopping the walking process. Is there a way?

Get back os.Stat data on Callback

Hi, I'm writing a very performant application that calculates file sizes.
For this reason, I would like to re-use the os.Stat return value that happens on every file, as this may lead to a small performance issue in my code.
Is there a way to get the stat struct back from the Walk function?
Or will I have to run os.Stat again?

Continuation: inquiry on the skipNode functionality in regards to file name filtering

Hello again,

This is a continuation of a previous issue, since I couldn't reopen the old one as instructed.

I took a look at the find-fast example and it is indeed what I was looking for. However, I was apparently not detailed enough when creating the issue.

I had already tried using filepath.SkipDir, but it seems to break the walk if I return that error for a file instead of a directory.

I did some debugging on the walk function and you can view the video here:
https://www.twitch.tv/videos/708174351

I find the (possible) issue @ 56:30 in the video and then I implement a solution that works for me.

However, I obviously don't know all the logic behind this package, and my solution might break something elsewhere, so I would love it if you could check it out.

Regards,

cannot walk non-directory

The directory I want to walk is actually a symbolic link to the real directory. But with the latest release I am getting:

failed to walk: cannot Walk non-directory: /path/to/symlink

stat result

If I understand correctly, you do a stat for each file; why not pass the result in Dirent?

Feature Request: Walk Parents

It would be great to have an option to walk up through parent directories until a file is found or some other exit condition is met. This would simulate, for instance, running git status in a directory below the .git directory. In other words, the purpose would be to determine the project root for a CLI.

Issue with symbolic links with absolute path

I have something like this

root@loc-v-gcdn-edge-1 ~/go/src/xxx/mapi $ tree insales-cache/
insales-cache/
├── 3
│   └── test_sample
└── test
    ├── 1
    │   └── assets3-insales-ru-728ddd54_sample
    ├── 2
    │   ├── 30cdb40e687398728e13ac3589c233e9
    │   └── 957ecf0fb64bde07fa85a657ffc333e9
    └── 3 -> /root/go/src/xxx/mapi/insales-cache/3

And I have the following error from walk function (boolean after path is a returned value from IsRegular function):

=== RUN   TestXXX
/root/go/src/xxx/mapi/insales-cache/test false
/root/go/src/xxx/mapi/insales-cache/test/1 false
/root/go/src/xxx/insales-cache/test/1/assets3-insales-ru-728ddd54_sample true
/root/go/src/xxx/mapi/insales-cache/test/2 false
/root/go/src/xxx/mapi/insales-cache/test/2/30cdb40e687398728e13ac3589c233e9 true
/root/go/src/xxx/mapi/insales-cache/test/2/957ecf0fb64bde07fa85a657ffc333e9 true
/root/go/src/xxx/mapi/insales-cache/test/3 false
--- FAIL: TestXXX (0.01s)
	command_test.go:35: cannot Stat: stat /root/go/src/xxx/mapi/insales-cache/test/root/go/src/xxx/mapi/insales-cache/3: no such file or directory

I see pull request here: #4
Please merge this :)

go get fails: checksum mismatch

On Windows 10 with Go 1.11 I ran go get github.com/UnnoTed/fileb0x, which has this as a dependency, and I get this error:

go: verifying github.com/karrick/godirwalk@…: checksum mismatch
        downloaded: h1:e5iv87oxunQtG7S9MB10jrINLmF7HecFSjiTYKO7P2c=
        go.sum:     h1:UP4CfXf1LfNwXrX6vqWf1DOhuiFRn2hXsqtRAQlQOUQ=

This also happens on CircleCI, but it doesn't happen on OS X.

I'm wondering whether there are some CRLF or filename case-sensitivity issues. Any thoughts?

Access is denied: maybe need an option to skip failures

	err := godirwalk.Walk("D:/", &godirwalk.Options{
		Callback: func(osPathname string, de *godirwalk.Dirent) error {
			fmt.Printf("%s %s\n", de.ModeType(), osPathname)
			return nil
		},
	})
	fmt.Println(err)
d--------- D:\$RECYCLE.BIN\S-1-5-21-2045673143-4246043559-2053418641-500
cannot ReadDirents: cannot Open: open D:\$RECYCLE.BIN\S-1-5-21-2045673143-4246043559-2053418641-500: Access is denied.

Consider providing an iterator to enumerate file system entry names

Rather than only providing a function that returns a slice of children entries, or an error, provide an iterator interface that can be used to scan through the child entries in a directory one by one, without allocating large amounts of RAM for large directories.

Additionally, for some operating systems the file system node type is not known without calling an operating system call like stat or lstat, and the present implementation of walk could fail when performing that lstat call to obtain file system node information merely when iterating through the contents of the children in the directory. Once an iterator is available, update the walk implementation to enumerate through the children by using the iterator, which will allow the client software to receive an error for calling lstat for a given file, but allow the iteration of the directory contents to proceed.

Add travis build

Maybe add a Travis build; it would be useful for seeing the stability of the project.

ModeType() doesn't show permissions and modes on macOS

go version: go version go1.16.3 darwin/amd64
godirwalk version: github.com/karrick/godirwalk v1.16.1
code used:

godirwalk.Walk(startingDir, &godirwalk.Options{
	Callback: func(osPathname string, de *godirwalk.Dirent) error {
		fmt.Printf("%s %s\n", de.ModeType(), osPathname)
		return nil
	},
	Unsorted:            true,
	FollowSymbolicLinks: false,
})

Observed Output:

d--------- /Library/Caches
---------- /Library/Caches/.DS_Store
d--------- /Library/Caches/ColorSync
---------- /Library/Caches/ColorSync/com.apple.colorsync.devices
d--------- /Library/Caches/com.apple.cloudkit
---------- /Library/Caches/com.apple.cloudkit/com.apple.cloudkit.launchservices.hostnames.plist
d--------- /Library/Caches/com.apple.iconservices.store

Expected Output: based on filewalker and os.FileInfo.Mode()

dtrwxrwxrwx /Library/Caches
-rw-r--r-- /Library/Caches/.DS_Store
drwxr-xr-x /Library/Caches/ColorSync
-rw-r--r-- /Library/Caches/ColorSync/com.apple.colorsync.devices
drwxr-xr-x /Library/Caches/com.apple.cloudkit
-rw-r--r-- /Library/Caches/com.apple.cloudkit/com.apple.cloudkit.launchservices.hostnames.plist
drwx--x--x /Library/Caches/com.apple.iconservices.store

Possible cause: walk.go line 244, where ModeType is ANDed (&) with os.ModeType.

Is this intentional?

Broken on illumos?

Trying to track down some caching problems with navidrome (navidrome/navidrome#1048), I found that pulling down master and running "go test" fails:

$ go version
go version go1.16.3 illumos/amd64
$ uname -a
SunOS pergamum 5.11 omnios-r151030-5bd7739fe4 i86pc i386 i86pc illumos
$ git clone https://github.com/karrick/godirwalk.git
Cloning into 'godirwalk'...
remote: Enumerating objects: 1141, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (21/21), done.
Receiving objects:  99% (1130/1141)remote: Total 1141 (delta 13), reused 21 (delta 9), pack-reused 1111
Receiving objects: 100% (1141/1141), 255.23 KiB | 4.12 MiB/s, done.
Resolving deltas: 100% (617/617), done.
$ pushd godirwalk/
~/navidrome/godirwalk ~/navidrome
$ go test
--- FAIL: TestReadDirents (0.00s)
    --- FAIL: TestReadDirents/without_symlinks (0.00s)
        readdir_test.go:14: GOT: lstat /tmp/godirwalk-378882479/d0/aaaaaa: no such file or directory; WANT: []
    --- FAIL: TestReadDirents/with_symlinks (0.00s)
        readdir_test.go:51: GOT: lstat /tmp/godirwalk-378882479/d0/symlinks/nothin: no such file or directory; WANT: []
--- FAIL: TestScanner (0.00s)
    --- FAIL: TestScanner/collect_names (0.00s)
        scandir_test.go:22: GOT: "aaaaaa\x03" (extra)
        scandir_test.go:22: WANT: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (missing)
        scandir_test.go:22: GOT: "symlin\x03" (extra)
        scandir_test.go:22: WANT: "symlinks" (missing)
    --- FAIL: TestScanner/collect_dirents (0.00s)
        scandir_test.go:35: GOT: lstat /tmp/godirwalk-378882479/d0/aaaaaa: no such file or directory; WANT: []
--- FAIL: TestWalkCompatibleWithFilepathWalk (0.00s)
    --- FAIL: TestWalkCompatibleWithFilepathWalk/test_root (0.00s)
        walk_test.go:79: GOT: lstat /tmp/godirwalk-378882479/d0/aaaaaa: no such file or directory; WANT: []
--- FAIL: TestWalkSkipThis (0.00s)
    --- FAIL: TestWalkSkipThis/SkipThis (0.00s)
        walk_test.go:154: GOT: lstat /tmp/godirwalk-378882479/d0/aaaaaa: no such file or directory; WANT: []
--- FAIL: TestWalkFollowSymbolicLinks (0.00s)
    walk_test.go:196: GOT: lstat /tmp/godirwalk-378882479/d0/symlinks/nothin: no such file or directory; WANT: []
--- FAIL: TestErrorCallback (0.00s)
    --- FAIL: TestErrorCallback/halt (0.00s)
        walk_test.go:239: unexpected error callback for /tmp/godirwalk-378882479/d0/symlinks: lstat /tmp/godirwalk-378882479/d0/symlinks/nothin: no such file or directory
    --- FAIL: TestErrorCallback/skipnode (0.00s)
        walk_test.go:271: unexpected error callback for /tmp/godirwalk-378882479/d0/symlinks: lstat /tmp/godirwalk-378882479/d0/symlinks/nothin: no such file or directory
--- FAIL: TestPostChildrenCallback (0.00s)
    walk_test.go:299: GOT: lstat /tmp/godirwalk-378882479/d0/aaaaaa: no such file or directory; WANT: []
FAIL
drwx------
drwxrwxr-x /d0
-rwxrwxr-x /d0/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
drwxrwxr-x /d0/d1
-rwxrwxr-x /d0/d1/f2
-rwxrwxr-x /d0/f1
drwxrwxr-x /d0/skips
drwxrwxr-x /d0/skips/d2
-rwxrwxr-x /d0/skips/d2/f3
-rwxrwxr-x /d0/skips/d2/skip
-rwxrwxr-x /d0/skips/d2/z1
drwxrwxr-x /d0/skips/d3
-rwxrwxr-x /d0/skips/d3/f4
drwxrwxr-x /d0/skips/d3/skip
-rwxrwxr-x /d0/skips/d3/skip/f5
-rwxrwxr-x /d0/skips/d3/z2
drwxrwxr-x /d0/symlinks
drwxrwxr-x /d0/symlinks/d4
Lrwxrwxrwx /d0/symlinks/d4/toSD1 -> ../toD1
Lrwxrwxrwx /d0/symlinks/d4/toSF1 -> ../toF1
Lrwxrwxrwx /d0/symlinks/nothing -> ../f0
Lrwxrwxrwx /d0/symlinks/toAbs -> /tmp/godirwalk-378882479/d0/f1
Lrwxrwxrwx /d0/symlinks/toD1 -> ../d1
Lrwxrwxrwx /d0/symlinks/toF1 -> ../f1
exit status 1
FAIL    github.com/karrick/godirwalk    0.023s

That weird truncation of "aaaaaa" is exactly what we're seeing with navidrome

Dirent.IsDir() returns false for symlinks to directories

Hi, thank you for this project.

I'm wondering how I can tell whether a symlink to a directory is a directory, since in my case

dirent.IsDir() = false
dirent.IsSymlink() = true

I think I saw an isSymlinkToDirectory, but it wasn't public.

thanks!

Linux manjaro 5.4.13-3-MANJARO #1 SMP PREEMPT Mon Jan 20 18:17:25 UTC 2020 x86_64 GNU/Linux

Path is not resolving when passing

Not sure what is happening and dunno how to resolve it.

var dirname = "/usr/sap/MRS/HDB00/*/trace"
// dirname := "."
if len(os.Args) > 1 {
	dirname = os.Args[1]
}

var fileList []string
err := godirwalk.Walk(dirname, &godirwalk.Options{
	Unsorted: true, // set true for faster yet non-deterministic enumeration (see godoc)
	Callback: func(osPathname string, de *godirwalk.Dirent) error {
		// fmt.Printf("%s %s\n", de.ModeType(), osPathname)
		fmt.Printf("%s\n", osPathname)
		if de.IsRegular() {
			fileList = append(fileList, osPathname)
		}

		return nil
	},
})
if err != nil {
	fmt.Fprintln(os.Stderr, err)
}

I see this:

$ ./main
cannot Lstat: lstat /usr/sap/MRS/HDB00/*/trace: no such file or directory

$ ./main  /usr/sap/MRS/HDB00/*/trace | head -1
 /usr/sap/MRS/HDB00/apollo/trace

walk EOF error on Windows

Summary

It appears that v1.17.0 caused an EOF error to get returned from the Walk function on Windows platforms.

This was reported to Telegraf in influxdata/telegraf#11823 as updating the godirwalk library seemed to break Windows users.

Reproducer

Use the walk-fast example:

$ go run . --verbose C:/Users
d--------- C:\Users
L--------- C:\Users\All Users
d--------- C:\Users\Default
d--------- C:\Users\Default\AppData
d--------- C:\Users\Default\AppData\Local
L--------- C:\Users\Default\AppData\Local\Application Data
L--------- C:\Users\Default\AppData\Local\History
d--------- C:\Users\Default\AppData\Local\Microsoft
d--------- C:\Users\Default\AppData\Local\Microsoft\InputPersonalization
d--------- C:\Users\Default\AppData\Local\Microsoft\InputPersonalization\TrainedDataStore
EOF
exit status 1

This same behavior does not happen on Linux:

❯ go run . --verbose /home/powersj/test
d--------- /home/powersj/test
---------- /home/powersj/test/main.go
---------- /home/powersj/test/go.mod
---------- /home/powersj/test/go.sum
~/test via 🐹 v1.19.3 
❯ echo $?
0

System Info

Windows 10: go1.19.3 linux/amd64
Linux (Arch): go1.19.3 linux/amd64

Walk files of folder before recursing into sub-folders

Is there a way with this library to first invoke the callback on all direct files under the current folder before recursing into the sub-folders?

Example:

package main

import (
	"fmt"
	"os"

	"github.com/karrick/godirwalk"
)

func main() {
	err := godirwalk.Walk("/tmp/test", &godirwalk.Options{
		Callback: func(osPathname string, directoryEntry *godirwalk.Dirent) error {
			fmt.Printf("%s\n", osPathname)
			return nil
		},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

currently results in:

/tmp/test
/tmp/test/afile
/tmp/test/dir1
/tmp/test/dir1/file2
/tmp/test/file

and the ask is to return:

/tmp/test
/tmp/test/afile
/tmp/test/file
/tmp/test/dir1
/tmp/test/dir1/file2

This way the results remain deterministic: all files in the current folder are sorted, and the sub-folders are also recursed into in sorted order.

Thanks!

Error wrapping in ErrorCallback function

Hi,

In the docs ErrorCallback is described like this

When non-nil, this user supplied function is invoked with the OS pathname of the file system object that caused the error along with the error that took place

and WalkFunc like this

If an ErrorCallback function is provided, then it is invoked with the OS pathname of the node that caused the error along along with the error.

This suggests that errors are passed as-is to ErrorCallback.

However, due to the error wrapping in walk.go (the same line appears in two places)

err = errCallback(err.Error()) // wrap potential errors returned by callback

the type of the error passed to ErrorCallback functions is different from the original error type, making comparisons like if err == MyErr fail (see https://play.golang.org/p/HlMFGvRepw0 for an example).

Is the error wrapping behavior necessary?

Thank you for your time.

Infinite recursion on symbolic link loops, README incorrect?

The README says the following:

This library fixes the above problem such that it will never follow logical file sytem [sic] loops on either unix or Windows.

I think this is false. I have a directory structure like the following:

walk-links/
└── loop
    └── loop-link-upwards -> ../../walk-links/

When calling godirwalk.Walk on walk-links, it descends infinitely when FollowSymbolicLinks is set to true. I know that this is an adversarial example, but unless godirwalk is somehow keeping track of all paths that have been visited, I don't think the README is accurate.

Feature Request: Return Dirent's internal base directory via Dir() (and possibly more...)

Currently, within the scope of a walk, we can access each Dirent via its associated methods. While they are mostly intended for querying whether we're traversing a directory, a symbolic link, etc., there is also the ability to retrieve the base name of the file via the func (de Dirent) Name() string method (the same applies to the mode bits; we get a method for that one as well).

However, internally, the Dirent also stores the directory (i.e. the path minus the base, which, as said, is returned via Name()). It would be awesome if we could get a method for that as well, e.g.

func (de Dirent) Dir() string { return de.path }

Why is this useful? Well, in my personal use-case, I use godirwalk to quickly retrieve a set of files with a specified list of extensions, which can be arbitrarily deep, and which then gets displayed on a Go template as a structured tree. I'm currently lazily using a struct that merely includes the whole path — which already gets retrieved via the Dirent mechanism.

I suppose that an alternative to that one-liner method is for my own algorithm to track the current directory better :)

Feature request: Being able to

Hello again!

Yet again I find myself using godirwalk, just love this module <3

I was wondering if you could implement a way to detect if the current runtime does not have permissions to enter a folder.

Right now the Walk method just throws an error if I don't have permissions:

$ goto [DIRECTORY]
2021/03/29 08:37:55 trying: .
2021/03/29 08:37:55 trying: $Recycle.Bin
2021/03/29 08:37:55 trying: $Recycle.Bin/S-1-5-18
2021/03/29 08:37:55 open $Recycle.Bin/S-1-5-18: permission denied

Here is my suggestion:

Add a function called "CanAccess", or something similar, that lets you check for permissions inside the Walk method itself. This way you can combine CanAccess with SkipThis to quickly bypass closed-off directories and/or files.

Example:

func main() {
	err := godirwalk.Walk("./", &godirwalk.Options{
		Callback: func(osPathname string, info *godirwalk.Dirent) error {
			log.Println("trying:", osPathname)

			// CanAccess is the proposed (not yet existing) method; skip entries we cannot access
			if !info.CanAccess() {
				return godirwalk.SkipThis
			}

			if info.IsDir() {
				if strings.Contains(osPathname, os.Args[1]) {
					fmt.Println(osPathname)
					os.Exit(0)
				}
			}
			return nil
		},
		Unsorted: true,
	})

	if err != nil {
		// Permissions error happens here, outside the walk
		log.Println(err)
	}

	fmt.Print("no directory found")
}

I figured since you're all up in this code, you might be able to do it in a few minutes but it might take me hours :S

Walking paths with symlinks when FollowSymbolicLinks is set, can fail unexpectedly

Construct the following dirs and links:

mkdir a b
touch z
ln -s ../b a/x
ln -s ../z b/y

Walking over the directories (starting at 'a') results in the following error:

cannot Stat: stat a/z: no such file or directory

The walked pathname a/x/y, which really just points to file z, should work perfectly fine, but the walk() func misconstructs the pathname when attempting to use Readlink() to build its own direct path to z in order to Stat() it.

This occurs due to the following sequence of Joins for the Dir() and Readlink() referent:

Join(Dir("a/x"), "../b") => Join("a", "../b") => "b"
Join(Dir("a/x/y"), "../z") => Join("a/x", "../z") => "a/z"  # non-existent

The code is attempting to construct a new path, called 'osp' rather than just using the osPathname (or osChildname). Rather than constructing a new path, the code could just os.Stat() on the original path containing the symlinks, since Stat() resolves those paths to reach the resulting file or directory. Doing it this way (deleting all the code to build the 'osp' path), results in the normal tests all passing, and the walk over the constructed dirs/links described above to succeed (on Unix).

It seems the code originally worked this way, but it was changed in commit 3464826, which attributes the change to allowing proper walking of symlinks on Windows. But the change seems to cause a regression for Unix.

Go's filepath package has the EvalSymlinks() func which appears to rewrite paths containing symlinks to a direct path without symlinks, presumably for situations like this. The EvalSymlinks() code (which is a thin wrapper around walkSymlinks()), is reasonably complex, and itself calls Lstat() on each component of the path to determine which are symlinks.

Presuming there is no simple change to the 'osp' path construction that fixes this problem, the easiest "fix" for Unix appears to be to revert to the original code which simply calls Stat() on the original path to determine if the endpoint is a directory. Using EvalSymlinks() for Windows might be the alternative way to fix it there (I'm unfamiliar with what the original problem was that this change was fixing). There is some code duplication in the walk() function which could be factored out into a helper function, and then the helper function could be implemented appropriately for Unix and Windows.

Although the EvalSymlinks() func potentially does a lot of extra Lstats, it would only be used when FollowSymbolicLinks is set, and then only on the paths (and subpaths) that end in a symlink, so it may not have a huge impact overall (and it would only be on Windows, w/ the helper function approach).

I should be able to provide a WIP PR shortly which provides the failing testcase above, along with a version of the possible solution mentioned.

Extending `Dirent`: why some things aren't currently possible

Hello again,

There are probably better ways to tackle this issue, but bear with me, I'll try to give an example of what I need...

I'm working on a reasonably simple web-based file manager for music files. Essentially, I walk through a hierarchy of media directories (depth is of no concern at this stage, but with the usual artist/album/song layout, in most scenarios, there will just be 3 or possibly 4 levels), grab the name of each entry (i.e. osPathname as defined by the godirwalk callback, using an alphabetical sorting), check if it has a valid audio extension (in the future I might do more sophisticated checks) and save it on a one-dimensional array (with the full pathname). Anything which is not an audio file (such as lyrics or .DS_Store — macOS 'garbage' left all over the place) gets skipped.

So far, so good. My original plan was simply to use an array of Dirents, selecting those that matched and skipping over those that don't match. Then everything gets pushed into a template, which reads the array of Dirents using a range, and just adds some HTML & CSS around it. Simple.

Currently, Dirent lacks some fields that would make this very simple (#74). You do get the filename, and the ability to check for the file's mode, but you do not get things like the file size or time of last modification.

Therefore, I've extended the Dirent with my own type, something like this:

type PlayListItem struct {
	de godirwalk.Dirent	// directory entry data retrieved from godirwalk.

	fullPath string		`validate:"filepath"`			// full path for the directory where this file is.
	cover string		`validate:"filepath,omitempty"`	// path to album cover image for this file.
	modTime time.Time	`validate:"datetime"`			// last modified date (at least on Unix-like systems).
	size int64			// filesize in bytes, as reported by the system.
	checked bool		// file checkbox enabled; eventually this will add the file to the playlist.
}

and also added the methods that Dirent has (such as Name(), IsDir(), etc.), essentially by calling the 'parent' functions directly.

There are a few catches, though; for instance, Dirent has a reset() method to 'clean up' after itself; unfortunately, this method isn't exported, and, as a consequence, it means that it cannot be called from a type that extends Dirent. That, in turn, means more work for the garbage collector or memory leaks (I've not tested this thoroughly).

Now here is my issue: consider the artist/album/song directory layout mentioned before. Assume, for a moment, that there is an image inside the album directory which corresponds to the album's cover. On the template I wish to display the cover together with the songs for that album.

Now this could be trivially done by going through the directory twice; first to see if there is any image present and save it; and secondly to extract each filename and assign the cover's path to it in the associated PlayListItem struct for that file.

The problem, in this case, is that we don't have a callback for the start of a new directory traversal; we only have one for the end (that is, we have PostChildrenCallback but not the equivalent PreChildrenCallback). The best we can currently do is, inside the callback, check if the current file we're analysing is a directory, and, if it is, check first if there is an image file inside it, and then cache it.

This has two problems.

First, it seems to be a waste of resources/time, since you'll be traversing all files in the album directory anyway, looking for audio files only, but ultimately hitting the same image file as before; this time, however, discarding it.

Secondly, what if there is more than one image file inside it? You can only use the first image you find with whatever method you use, because once you find a second image during the directory traversal, you can only update the entries for the files from that point onwards, not the ones that came before the second image was found.

One might argue that getting two, three, or even more image files in what is supposed to be an audio folder is unlikely, but that's not true. In my personal case, my audio library is shared among several different devices and applications — from iTunes to Plex and even Kodi. While all of those, generally speaking, will not touch the audio files themselves (I mean, unless that request is made explicitly), they are fond of adding lots of extra files inside, which get used on different scenarios. For instance, on a web page displaying the content of an album, there might be two images, one with a larger size than the other, used on different places; or there might be an image with the same content (and size) but stored under a completely different name, used in a different context. Or, even more likely, each application/device has its own way to retrieve metadata — including album covers — which are mutually incompatible and therefore get stored separately as different files.

During a directory traversal, therefore, once an image file is found, there might be some extra work to be done, such as checking for size, format, compression level, etc., and picking the 'best-so-far' alternative. If a new file is found deeper in the directory listing, then the same checks are performed and compared to the 'best-so-far' alternative; if the new file is 'better' (according to whatever criterion has been used), then it gets used instead.

The only solution I have for this scenario is to store all found files (audio and images) on a temporary stack, and read it back when PostChildrenCallback is called. After whatever heuristics are applied to the images, one is selected, and all the audio files on that stack get assigned the selected image. The stack is then emptied, PostChildrenCallback returns, and the stack starts to get filled again from the next directory to be traversed, according to godirwalk's choice.

(Note that for this solution, BFS or LFS are irrelevant, so long as PostChildrenCallback is called at the end of each directory).

This works, especially on my own use-case, since it's unlikely that an album has a million tracks. Most will have a dozen or so. A few — originally multiple-CD bundles ripped to hard disk — might have a few dozens. I have recently read a review of a "complete collection" of CDs for a 20th-century composer, which included 60 CDs total — still, that would be less than a thousand entries on a single directory. As such, the stack solution would work — there would be no reason to believe that memory might get exhausted in this case.

It's also ugly, but, alas, that's the current choice I've got.

But suppose you have a completely different scenario, say a tool that will go through all your files in your Developer folder (whatever it might be called in your system), and saves metadata based on what files it finds there. Imagine that you want to 'group' together header files in C with the corresponding code; these might even be found on separate directories; which means that you have no other choice but to walk through the entire tree at least twice, one for figuring out all those .c files, and the other to gather the corresponding .h files, in order to produce metadata that links one to the other. And some applications have very deep directories, crammed full of source code — especially if there is a developer using Emacs, meaning a plethora of files ending with ~, which have to be read (even if they get discarded afterwards); in such a scenario, the stack may grow beyond a reasonable amount of available system memory. Or the code to deal with this becomes huge, too complex, and nastily convoluted.

How could this tree traversal be done only once?

P. S. For the sake of completeness: I personally gave up on coming up with a clever data structure to deal with my own use-case; I simply noticed that, in general, almost all directories have at least one image file, named Folder.jpg, which usually is sized at 200x200 — enough for my own purposes! As such, when first entering a directory, I just check for the existence of Folder.jpg using os.Stat(), store its path (if present)... and that's pretty much it (I don't need anything else). Then, for all audio files found on that directory, I just assign the path to the current Folder.jpg to each file, skipping over the rest of the non-audio files. When PostChildrenCallback is called, I just clean the stored path — the next time godirwalk enters a directory, it will do the same check for os.Stat("Folder.jpg") again, and the path to that will be assigned to the files on this new album directory — and so forth. If no "Folder.jpg" is present, but rather a differently-named image file for the cover... well, tough luck, I simply ignore that.

I suppose that I could do a full search for all image files (as opposed to just checking for os.Stat("Folder.jpg")), pick one according to heuristics, store its path, and apply it to all subsequent audio files in that specific album's directory. It's just that this requires a "search within a search", therefore making the code more complex (and more I/O-heavy). As such, I stick with just the existence or absence of a well-known filename. It's a very limited way of implementing the overall concept, with the trade-off that it's very simple to code (and test the code!).

Build error - s.sde.Reclen undefined

There is no d_reclen field in struct dirent in POSIX and some OSes (like DragonFly BSD) might not implement it.

# github.com/karrick/godirwalk
./scandir_unix.go:146:37: s.sde.Reclen undefined (type *syscall.Dirent has no field or method Reclen)

I'm wondering whether godirwalk could use the ReadDirent/ParseDirent combination instead of accessing Reclen directly. Since speed is one of the key points of godirwalk, maybe ParseDirent is just slower and thus not used?

Missing directories

TL;DR
Disabling sorting fixes my issue, if it's sorted, directories go missing from the results.

Summary:
I cannot, for the life of me, figure out what's going on, but there is an issue where directories are not appearing properly when sort is enabled.

I have a folder /code in my top level directory that when sort is enabled does not get returned in the results, but when sort is commented out or disabled via options it magically appears again.

Looping over pre/post sort shows that the folder exists, but as soon as the next range takes place, it's just .... gone ....

So here is what the code looks like (I just added some simple prints to walk.go):

walk.go:176


	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:presort")
		}
	}
	if !options.Unsorted {
		sort.Sort(deChildren) // sort children entries unless upstream says to leave unsorted
	}

	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:postsort")
		}
	}

	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:innerrange")
		}
		osChildname := filepath.Join(osPathname, deChild.name)

And the output:

09:02 PM ✘ kcmerrill  Desktop ] cat block.txt | grep debug
debug:presort
debug:postsort
09:02 PM ✔ kcmerrill  Desktop ]

I was expecting to see debug:innerrange as I wouldn't think simply sorting would make a difference.

Ok, so now lets try again, but this time lets comment out sort.Sort(deChildren).

	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:presort")
		}
	}
	if !options.Unsorted {
		//sort.Sort(deChildren) // sort children entries unless upstream says to leave unsorted
	}

	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:postsort")
		}
	}

	for _, deChild := range deChildren {
		if deChild.name == "code" {
			fmt.Println("debug:innerrange")
		}
		osChildname := filepath.Join(osPathname, deChild.name)

And here are the results:

09:03 PM ✔ kcmerrill  Desktop ] cat block.txt | grep debug
debug:presort
debug:postsort
debug:innerrange
09:03 PM ✔ kcmerrill  Desktop ]

This is the strangest thing... when comparing the lengths of both slices, they are equal (as you can tell, because both are visible post-sort).

I'm on a Mac (latest and greatest) and this is my Go version:

09:07 PM ✔ kcmerrill  Desktop ] go version
go version go1.9.2 darwin/amd64
09:07 PM ✔ kcmerrill  Desktop ]

Test error due to non-deletable 'noaccess' subdir

Because the 'noaccess' directory has all permissions removed, it can cause test failures in the scaffolding_test.go teardown() function. In particular, this is occurring on my Mac. But even on Linux, an error will occur if the 'noaccess' directory has any entries in it (since the recursive delete can neither read the directory entries nor write to the non-empty directory to remove its contents).

Here is an example of the test failure messages on macOS:

--- FAIL: TestReadDirents (0.01s)
    scaffoling_test.go:90: openat noaccess: permission denied

A solution is to set 0700 permissions on the 'noaccess' dir in the teardown function.

Don't ship testdata with the repo

It would be nice if dirwalk did not ship the test scaffolding symlinks as part of the repo, but instead generated them on the fly (or with a script).

This causes minor headaches, as I will occasionally check for broken symlinks in my $HOME folder, and my local clone of scc (which depends on dirwalk) regularly turns up false positives, which I have to adjust my tools to deal with.

Should work with Solaris

Building on Solaris fails.

karrick/godirwalk/readdir.go:20:9: undefined: readdirents
karrick/godirwalk/readdir.go:46:9: undefined: readdirnames

Adding solaris to the build tags in readdir_unix.go results in:

karrick/godirwalk/readdir_unix.go:41:7: undefined: inoFromDirent
karrick/godirwalk/readdir_unix.go:55:13: de.Type undefined (type *syscall.Dirent has no field or method Type)
karrick/godirwalk/readdir_unix.go:56:9: undefined: syscall.DT_REG

godirwalk causes runtime panic with -checkptr on Go 1.14

Steps to reproduce:

# go should be at least 1.14
go test -race -bench . .

Result:

fatal error: checkptr: unsafe pointer conversion

goroutine 25 [running]:
runtime.throw(0x6233ac, 0x23)
	$GOROOT/src/runtime/panic.go:1112 +0x72 fp=0xc000121220 sp=0xc0001211f0 pc=0x45fd42
runtime.checkptrAlignment(0xc0006d2f00, 0x607380, 0x1)
	$GOROOT/src/runtime/checkptr.go:18 +0xb7 fp=0xc000121250 sp=0xc000121220 pc=0x433507
github.com/karrick/godirwalk.readDirents(0xc0006eeb90, 0x50, 0xc0006d2000, 0x1000, 0x1000, 0x1620, 0x162, 0x200, 0xc0006f1620, 0xc0006f0000)
	$GOPATH/src/github.com/karrick/godirwalk/readdir_unix.go:54 +0x18d fp=0xc0001213e0 sp=0xc000121250 pc=0x5ba05d
github.com/karrick/godirwalk.ReadDirents(...)
	$GOPATH/src/github.com/karrick/godirwalk/readdir.go:23
github.com/karrick/godirwalk.newSortedScanner(0xc0006eeb90, 0x50, 0xc0006d2000, 0x1000, 0x1000, 0x4f, 0x1, 0x49)
	$GOPATH/src/github.com/karrick/godirwalk/scanner.go:21 +0x86 fp=0xc000121470 sp=0xc0001213e0 pc=0x5bc6e6
github.com/karrick/godirwalk.walk(0xc0006eeb90, 0x50, 0xc0006f6840, 0xc000121d88, 0x50, 0x0)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:261 +0xb5d fp=0xc0001215c0 sp=0xc000121470 pc=0x5bde7d
github.com/karrick/godirwalk.walk(0xc0006ee820, 0x48, 0xc0006f65d0, 0xc000121d88, 0x48, 0xc000128060)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:279 +0x51c fp=0xc000121710 sp=0xc0001215c0 pc=0x5bd83c
github.com/karrick/godirwalk.walk(0xc0006ee780, 0x43, 0xc0006f65a0, 0xc000121d88, 0x43, 0xc00000ffc0)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:279 +0x51c fp=0xc000121860 sp=0xc000121710 pc=0x5bd83c
github.com/karrick/godirwalk.walk(0xc00001e540, 0x3b, 0xc0006f64b0, 0xc000121d88, 0x3b, 0xc00000ff80)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:279 +0x51c fp=0xc0001219b0 sp=0xc000121860 pc=0x5bd83c
github.com/karrick/godirwalk.walk(0xc00001e4c0, 0x31, 0xc000114330, 0xc000121d88, 0x31, 0x0)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:279 +0x51c fp=0xc000121b00 sp=0xc0001219b0 pc=0x5bd83c
github.com/karrick/godirwalk.walk(0xc00012e060, 0x26, 0xc0001142d0, 0xc000121d88, 0x0, 0x0)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:279 +0x51c fp=0xc000121c50 sp=0xc000121b00 pc=0x5bd83c
github.com/karrick/godirwalk.Walk(0xc00012e060, 0x26, 0xc000053d88, 0x76235c, 0x1)
	$GOPATH/src/github.com/karrick/godirwalk/walk.go:204 +0x36b fp=0xc000121d20 sp=0xc000121c50 pc=0x5bcf6b
github.com/karrick/godirwalk.godirwalkWalk(0x655620, 0xc0001f8380, 0xc00012e060, 0x26, 0x54487e, 0xc0001f84c9, 0x18)
	$GOPATH/src/github.com/karrick/godirwalk/walk_test.go:30 +0x15d fp=0xc000121dd8 sp=0xc000121d20 pc=0x5c2d0d
github.com/karrick/godirwalk.BenchmarkGodirwalk(0xc0001f8380)
	$GOPATH/src/github.com/karrick/godirwalk/walk_test.go:295 +0x102 fp=0xc000121e48 sp=0xc000121dd8 pc=0x5c4be2
testing.(*B).runN(0xc0001f8380, 0x1)
	$GOROOT/src/testing/benchmark.go:191 +0x1b5 fp=0xc000121f68 sp=0xc000121e48 pc=0x5451b5
testing.(*B).run1.func1(0xc0001f8380)
	$GOROOT/src/testing/benchmark.go:231 +0x76 fp=0xc000121fd8 sp=0xc000121f68 pc=0x555476
runtime.goexit()
	$GOROOT/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000121fe0 sp=0xc000121fd8 pc=0x491511
created by testing.(*B).run1
	$GOROOT/src/testing/benchmark.go:224 +0x8f

Since Go 1.14, -d=checkptr is enabled automatically with -race.

With asynchronous preemption, a recent Go addition, the rules for unsafe became stricter.

The 1.14 release notes mention this:

When converting unsafe.Pointer to *T, the resulting pointer must be aligned appropriately for T.

I believe this is the root cause of the panic here.

godirwalk/scandir_unix.go

Lines 147 to 148 in 28c3d94

s.sde = (*syscall.Dirent)(unsafe.Pointer(&s.workBuffer[0])) // point entry to first syscall.Dirent in buffer
s.workBuffer = s.workBuffer[reclen(s.sde):] // advance buffer for next iteration through loop

See golang/go#34964

If we don't fix this, users can't run their tests/apps in -race mode if they're using godirwalk.

Benchmarks fail for other reasons even without -race:

Benchmark2ReadDirentsGodirwalk
    Benchmark2ReadDirentsGodirwalk: benchmark_test.go:24: open /mnt/ram_disk/src/linkedin/dashboards: no such file or directory
--- FAIL: Benchmark2ReadDirentsGodirwalk
Benchmark2ReadDirnamesGodirwalk
    Benchmark2ReadDirnamesGodirwalk: benchmark_test.go:38: open /mnt/ram_disk/src/linkedin/dashboards: no such file or directory
--- FAIL: Benchmark2ReadDirnamesGodirwalk
Benchmark2GodirwalkSorted
    Benchmark2GodirwalkSorted: benchmark_test.go:60: GOT: lstat /mnt/ram_disk/src: no such file or directory; WANT: nil
--- FAIL: Benchmark2GodirwalkSorted
Benchmark2GodirwalkUnsorted
    Benchmark2GodirwalkUnsorted: benchmark_test.go:81: GOT: lstat /mnt/ram_disk/src: no such file or directory; WANT: nil

CI Testing on Windows and UNIX before merge.

@chadnetzer, thanks for your many contributions to this project. While the library remains small in spirit, it definitely has a lot of users, and I do not want to break their workflow because I'm too lazy to boot up my Windows box every time we want to test out some changes.

I intend to go ahead and configure this project to run integration tests on PRs both on Windows and on UNIX platforms prior to accepting those PRs. This issue is to remind me to do this.

Do you have any recommendations or experience with various CI integration tools using GitHub? There are quite a few programs that already integrate with Windows, and while I'm tempted to use Travis CI because I happen to see it everywhere else, I would not object to a different platform if you have feelings one way or the other.

https://github.com/marketplace/category/continuous-integration

scandir_windows.go should use faster code

HELP WANTED

If anyone has the knowledge and the cycles to figure out how to more quickly read a file system directory in Windows, I'd love to hear from you, whether it's a PR, or a link to some document I can read to learn more.

Thanks: inquiry on the skipNode functionality in regards to file name filtering

Sorry for the late reply, I'm timing my replies with the classes I have.

The flag you created does exactly what we hoped it would.

And yes, I was a bit surprised about the loop being broken on a file, but then again you are right about the documentation stating that it should.

Thank you for updating this so quickly <3
