complete rewrite for performance and io/fs support
bmatcuk committed Apr 24, 2021
1 parent 1e1bcf7 commit d9a3ae0
Showing 13 changed files with 1,566 additions and 818 deletions.
4 changes: 1 addition & 3 deletions .travis.yml
@@ -1,9 +1,7 @@
language: go

go:
- 1.13
- 1.14
- 1.15
- 1.16

os:
- linux
183 changes: 140 additions & 43 deletions README.md
@@ -2,18 +2,17 @@

Path pattern matching and globbing supporting `doublestar` (`**`) patterns.

[![PkgGoDev](https://pkg.go.dev/badge/github.com/bmatcuk/doublestar)](https://pkg.go.dev/github.com/bmatcuk/doublestar/v2)
[![PkgGoDev](https://pkg.go.dev/badge/github.com/bmatcuk/doublestar)](https://pkg.go.dev/github.com/bmatcuk/doublestar/v4)
[![Release](https://img.shields.io/github/release/bmatcuk/doublestar.svg?branch=master)](https://github.com/bmatcuk/doublestar/releases)
[![Build Status](https://travis-ci.com/bmatcuk/doublestar.svg?branch=master)](https://travis-ci.com/bmatcuk/doublestar)
[![codecov.io](https://img.shields.io/codecov/c/github/bmatcuk/doublestar.svg?branch=master)](https://codecov.io/github/bmatcuk/doublestar?branch=master)

## About

#### [Upgrading to v2? To v3?](UPGRADING.md)
#### [Upgrading?](UPGRADING.md)

**doublestar** is a [golang](http://golang.org/) implementation of path pattern
matching and globbing with support for "doublestar" (aka globstar: `**`)
patterns.
**doublestar** is a [golang] implementation of path pattern matching and
globbing with support for "doublestar" (aka globstar: `**`) patterns.

doublestar patterns match files and directories recursively. For example, if
you had the following directory structure:
@@ -36,18 +35,22 @@ such as `/path**` is invalid and will be treated the same as `/path*`, but
match all directories and files under the path directory, but `/path/**/` will
only match directories.

v4 is a complete rewrite with a focus on performance. Additionally,
[doublestar] has been updated to use the new [io/fs] package for filesystem
access. As a result, it is only supported by [golang] v1.16+.

## Installation

**doublestar** can be installed via `go get`:

```bash
go get github.com/bmatcuk/doublestar/v2
go get github.com/bmatcuk/doublestar/v4
```

To use it in your code, you must import it:

```go
import "github.com/bmatcuk/doublestar/v2"
import "github.com/bmatcuk/doublestar/v4"
```

## Usage
@@ -58,14 +61,18 @@ import "github.com/bmatcuk/doublestar/v2"
func Match(pattern, name string) (bool, error)
```

Match returns true if `name` matches the file name `pattern`
([see below](#patterns)). `name` and `pattern` are split on forward slash (`/`)
characters and may be relative or absolute.
Match returns true if `name` matches the file name `pattern` ([see
"patterns"]). `name` and `pattern` are split on forward slash (`/`) characters
and may be relative or absolute.

Match requires pattern to match all of name, not just a substring. The only
possible returned error is ErrBadPattern, when pattern is malformed.

Note: `Match()` is meant to be a drop-in replacement for `path.Match()`. As
such, it always uses `/` as the path separator. If you are writing code that
will run on systems where `/` is not the path separator (such as Windows), you
want to use `PathMatch()` (below) instead.
Note: this is meant as a drop-in replacement for `path.Match()` which always
uses `'/'` as the path separator. If you want to support systems which use a
different path separator (such as Windows), what you want is `PathMatch()`.
Alternatively, you can run `filepath.ToSlash()` on both pattern and name and
then use this function.
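
The normalize-then-match approach from the note above can be sketched as follows. This is only an illustration: `matchPortable` and `isBadPattern` are hypothetical helpers, and the standard library's `path.Match` stands in for `Match()` so the example runs without the dependency. Since `Match()` is described as a drop-in replacement, swapping it in should behave the same for patterns without `**` or `{alts}`.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"path/filepath"
)

// matchPortable is a hypothetical helper: it converts both arguments to
// forward slashes and then matches them. path.Match is a stand-in for
// doublestar's Match so that this sketch has no third-party dependency.
func matchPortable(pattern, name string) (bool, error) {
	// Caveat: on Windows, ToSlash also rewrites backslashes that were
	// intended as escape characters in the pattern.
	return path.Match(filepath.ToSlash(pattern), filepath.ToSlash(name))
}

// isBadPattern reports whether err signals a malformed pattern.
func isBadPattern(err error) bool {
	return errors.Is(err, path.ErrBadPattern)
}

func main() {
	ok, err := matchPortable("usr/*/bin/ed", "usr/local/bin/ed")
	fmt.Println(ok, err)

	// The pattern must match all of name, not just a prefix.
	ok, err = matchPortable("usr/*", "usr/local/bin/ed")
	fmt.Println(ok, err)

	// An unterminated character class is a malformed pattern.
	_, err = matchPortable("[", "a")
	fmt.Println(isBadPattern(err))
}
```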


### PathMatch
@@ -74,24 +81,97 @@ want to use `PathMatch()` (below) instead.
func PathMatch(pattern, name string) (bool, error)
```

PathMatch returns true if `name` matches the file name `pattern`
([see below](#patterns)). The difference between Match and PathMatch is that
PathMatch will automatically use your system's path separator to split `name`
and `pattern`.
PathMatch returns true if `name` matches the file name `pattern` ([see
"patterns"]). The difference between Match and PathMatch is that PathMatch will
automatically use your system's path separator to split `name` and `pattern`.
On systems where the path separator is `'\'`, escaping will be disabled.

`PathMatch()` is meant to be a drop-in replacement for `filepath.Match()`.
Note: this is meant as a drop-in replacement for `filepath.Match()`. It assumes
that both `pattern` and `name` are using the system's path separator. If you
can't be sure of that, use `filepath.ToSlash()` on both `pattern` and `name`,
and then use the `Match()` function instead.

### Glob

```go
func Glob(pattern string) ([]string, error)
func Glob(fsys fs.FS, pattern string) ([]string, error)
```

Glob returns the names of all files matching pattern or nil if there is no
matching file. The syntax of patterns is the same as in `Match()`. The pattern
may describe hierarchical names such as `usr/*/bin/ed`.

Glob ignores file system errors such as I/O errors reading directories. The
only possible returned error is ErrBadPattern, reporting that the pattern is
malformed.

Note: this is meant as a drop-in replacement for `io/fs.Glob()`. Like
`io/fs.Glob()`, this function assumes that your pattern uses `/` as the path
separator even if that's not correct for your OS (like Windows). If you aren't
sure if that's the case, you can use `filepath.ToSlash()` on your pattern
before calling `Glob()`.
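
A minimal sketch of the `fs.FS`-based calling convention, using the standard library's `fs.Glob` as a stand-in (it has the same signature shown above, but does not understand `**` or `{alts}`). The in-memory tree, `demoFS`, and `globSorted` are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// demoFS builds a small in-memory tree; the file names are made up.
func demoFS() fs.FS {
	return fstest.MapFS{
		"usr/local/bin/ed":     &fstest.MapFile{},
		"usr/share/bin/ed":     &fstest.MapFile{},
		"usr/local/lib/libc.a": &fstest.MapFile{},
	}
}

// globSorted runs fs.Glob (a stand-in for doublestar's Glob) and sorts
// the result so the output order does not depend on the filesystem.
func globSorted(fsys fs.FS, pattern string) ([]string, error) {
	matches, err := fs.Glob(fsys, pattern)
	if err != nil {
		return nil, err
	}
	sort.Strings(matches)
	return matches, nil
}

func main() {
	// Patterns always use "/" and are relative to the root of fsys.
	matches, err := globSorted(demoFS(), "usr/*/bin/ed")
	fmt.Println(matches, err)

	// No matching file yields an empty result and a nil error.
	matches, err = globSorted(demoFS(), "opt/*")
	fmt.Println(len(matches), err)
}
```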

### GlobWalk

```go
type GlobWalkFunc func(path string, d fs.DirEntry) error

func GlobWalk(fsys fs.FS, pattern string, fn GlobWalkFunc) error
```

GlobWalk calls the callback function `fn` for every file matching pattern. The
syntax of pattern is the same as in `Match()`. The pattern may describe
hierarchical names such as `usr/*/bin/ed`.

GlobWalk may have a small performance benefit over Glob if you do not need a
slice of matches because it can avoid allocating memory for the matches.
Additionally, GlobWalk gives you access to the `fs.DirEntry` objects for each
match, and lets you quit early by returning a non-nil error from your callback
function.

GlobWalk ignores file system errors such as I/O errors reading directories.
GlobWalk may return ErrBadPattern, reporting that the pattern is malformed.
Additionally, if the callback function `fn` returns an error, GlobWalk will
exit immediately and return that error.

Like Glob(), this function assumes that your pattern uses `/` as the path
separator even if that's not correct for your OS (like Windows). If you aren't
sure if that's the case, you can use filepath.ToSlash() on your pattern before
calling GlobWalk().
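
The callback contract described above (one call per match, access to the `fs.DirEntry`, early exit by returning an error) can be illustrated with a standard-library sketch. This is not how GlobWalk is implemented: `walkMatches`, `firstMatch`, `countMatches`, and `errStop` are hypothetical names, and the sketch simply walks the whole tree with `fs.WalkDir` and filters with `path.Match`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// walkFunc mirrors the shape of GlobWalkFunc from the signature above.
type walkFunc func(p string, d fs.DirEntry) error

// errStop is a sentinel a callback can return to quit early.
var errStop = errors.New("stop walking")

// walkMatches calls fn for every path in fsys that matches pattern.
func walkMatches(fsys fs.FS, pattern string, fn walkFunc) error {
	// Validate the pattern up front so a malformed one is reported.
	if _, err := path.Match(pattern, ""); err != nil {
		return err
	}
	return fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // ignore filesystem errors, as described above
		}
		if ok, _ := path.Match(pattern, p); ok {
			return fn(p, d) // a non-nil error stops the walk
		}
		return nil
	})
}

// firstMatch stops at the first match by returning errStop.
func firstMatch(fsys fs.FS, pattern string) (string, error) {
	var found string
	err := walkMatches(fsys, pattern, func(p string, d fs.DirEntry) error {
		found = p
		return errStop
	})
	if errors.Is(err, errStop) {
		err = nil
	}
	return found, err
}

// countMatches visits every match without building a slice.
func countMatches(fsys fs.FS, pattern string) (int, error) {
	n := 0
	err := walkMatches(fsys, pattern, func(p string, d fs.DirEntry) error {
		n++
		return nil
	})
	return n, err
}

// demoTree is a made-up in-memory filesystem for the example.
func demoTree() fs.FS {
	return fstest.MapFS{
		"src/a.go":      &fstest.MapFile{},
		"src/b.go":      &fstest.MapFile{},
		"src/README.md": &fstest.MapFile{},
		"src/sub/c.go":  &fstest.MapFile{},
	}
}

func main() {
	fmt.Println(firstMatch(demoTree(), "src/*.go"))
	fmt.Println(countMatches(demoTree(), "src/*.go"))
}
```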

### SplitPattern

```go
func SplitPattern(p string) (base, pattern string)
```

SplitPattern is a utility function. Given a pattern, SplitPattern will return
two strings: the first string is everything up to the last slash (`/`) that
appears _before_ any unescaped "meta" characters (i.e., `*?[{`). The second
string is everything after that slash. For example, given the pattern:

```
../../path/to/meta*/**
             ^----------- split here
```

SplitPattern returns "../../path/to" and "meta*/**". This is useful for
initializing os.DirFS() to call Glob() because Glob() will silently fail if
your pattern includes `/./` or `/../`. For example:

```go
base, pattern := SplitPattern("../../path/to/meta*/**")
fsys := os.DirFS(base)
matches, err := Glob(fsys, pattern)
```
Glob finds all files and directories in the filesystem that match `pattern`
([see below](#patterns)). `pattern` may be relative (to the current working
directory), or absolute.
If SplitPattern cannot find somewhere to split the pattern (for example,
`meta*/**`), it will return "." and the unaltered pattern (`meta*/**` in this
example).
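
To make the splitting rule concrete, here is a small re-implementation written from the description above. It is a sketch, not the library's code: `splitPattern` is a hypothetical name, and the handling of a pattern whose only usable slash is the leading one (returning `/` as the base) is an assumption that the text above does not cover.

```go
package main

import "fmt"

// splitPattern returns everything before the last "/" that precedes the
// first unescaped meta character (*?[{), and everything after that slash.
func splitPattern(p string) (base, pattern string) {
	split := -1
scan:
	for i := 0; i < len(p); i++ {
		switch p[i] {
		case '\\':
			i++ // skip the escaped character
		case '/':
			split = i
		case '*', '?', '[', '{':
			break scan
		}
	}
	if split < 0 {
		return ".", p // nowhere to split
	}
	if split == 0 {
		return "/", p[1:] // assumption: keep the root as the base
	}
	return p[:split], p[split+1:]
}

func main() {
	fmt.Println(splitPattern("../../path/to/meta*/**"))
	fmt.Println(splitPattern("meta*/**"))
	fmt.Println(splitPattern(`a\*b/c*`)) // the escaped * is not a meta character
}
```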

`Glob()` is meant to be a drop-in replacement for `filepath.Glob()`.
Of course, it is your responsibility to decide if the returned base path is
"safe" in the context of your application. Perhaps you could use Match() to
validate against a list of approved base directories?

### Patterns

@@ -100,13 +180,14 @@ directory), or absolute.
Special Terms | Meaning
------------- | -------
`*` | matches any sequence of non-path-separators
`**` | matches any sequence of characters, including path separators
`/**/` | matches zero or more directories
`?` | matches any single non-path-separator character
`[class]` | matches any single non-path-separator character against a class of characters ([see below](#character-classes))
`[class]` | matches any single non-path-separator character against a class of characters ([see "character classes"])
`{alt1,...}` | matches a sequence of characters if one of the comma-separated alternatives matches

Any character with a special meaning can be escaped with a backslash (`\`).

A doublestar (`**`) should appear surrounded by path separators such as `/**/`.
A mid-pattern doublestar (`**`) behaves like bash's globstar option: a pattern
such as `path/to/**.txt` would return the same results as `path/to/*.txt`. The
pattern you're looking for is `path/to/**/*.txt`.
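
The difference between `path/to/**.txt` and `path/to/**/*.txt` can be seen in a toy matcher. This is a simplified sketch and not the library's algorithm: `matchSketch` and `matchSegments` are hypothetical names, `**` is only treated specially when it is a whole path segment, each remaining segment is handed to the standard library's `path.Match`, and `{alts}` are not supported.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSketch splits pattern and name on "/" and matches them segment by
// segment, letting a "**" segment stand for zero or more directories.
func matchSketch(pattern, name string) (bool, error) {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func matchSegments(pat, name []string) (bool, error) {
	for len(pat) > 0 {
		if pat[0] == "**" {
			// Try the rest of the pattern against every suffix of name.
			for i := 0; i <= len(name); i++ {
				ok, err := matchSegments(pat[1:], name[i:])
				if ok || err != nil {
					return ok, err
				}
			}
			return false, nil
		}
		if len(name) == 0 {
			return false, nil
		}
		// Inside a segment, "**" is just two stars to path.Match, so it
		// behaves like "*" and never crosses a path separator.
		ok, err := path.Match(pat[0], name[0])
		if !ok || err != nil {
			return false, err
		}
		pat, name = pat[1:], name[1:]
	}
	return len(name) == 0, nil
}

func main() {
	fmt.Println(matchSketch("path/to/**/*.txt", "path/to/x/y/a.txt"))
	fmt.Println(matchSketch("path/to/**/*.txt", "path/to/a.txt"))
	fmt.Println(matchSketch("path/to/**.txt", "path/to/x/a.txt"))
	fmt.Println(matchSketch("path/to/**.txt", "path/to/a.txt"))
}
```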
@@ -120,28 +201,44 @@ Class | Meaning
`[abc]` | matches any single character within the set
`[a-z]` | matches any single character in the range
`[^class]` | matches any single character which does *not* match the class
`[!class]` | same as `^`: negates the class

### Abstracting the `os` package
## Performance

**doublestar** by default uses the `Open`, `Stat`, and `Lstat`, functions and
`PathSeparator` value from the standard library's `os` package. To abstract
this, for example to be able to perform tests of Windows paths on Linux, or to
interoperate with your own filesystem code, it includes the functions `GlobOS`
and `PathMatchOS` which are identical to `Glob` and `PathMatch` except that they
operate on an `OS` interface:
```go
type OS interface {
Lstat(name string) (os.FileInfo, error)
Open(name string) (*os.File, error)
PathSeparator() rune
Stat(name string) (os.FileInfo, error)
}
```
goos: darwin
goarch: amd64
pkg: github.com/bmatcuk/doublestar/v4
cpu: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
BenchmarkMatch-8 285639 3868 ns/op 0 B/op 0 allocs/op
BenchmarkGoMatch-8 286945 3726 ns/op 0 B/op 0 allocs/op
BenchmarkPathMatch-8 320511 3493 ns/op 0 B/op 0 allocs/op
BenchmarkGoPathMatch-8 304236 3434 ns/op 0 B/op 0 allocs/op
BenchmarkGlob-8 466 2501123 ns/op 190225 B/op 2849 allocs/op
BenchmarkGlobWalk-8 476 2536293 ns/op 184017 B/op 2750 allocs/op
BenchmarkGoGlob-8 463 2574836 ns/op 194249 B/op 2929 allocs/op
```

These benchmarks (in `doublestar_test.go`) compare Match() to path.Match(),
PathMatch() to filepath.Match(), and Glob() + GlobWalk() to io/fs.Glob(). They
only run patterns that the standard Go packages can understand as well (so, no
`{alts}` or `**`) for a fair comparison. Of course, alts and doublestars will
be less performant than the other pattern meta characters.

`StandardOS` is a value that implements this interface by calling functions in
the standard library's `os` package.
Alts are essentially like running multiple patterns, the number of which can
get large if your pattern has alts nested inside alts. This affects both
matching (i.e., Match()) and globbing (Glob()).
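
To see why the number of effective patterns grows, the sketch below expands alts into the plain patterns they are equivalent to. It only illustrates the counting argument and makes no claim about how the library handles alts internally; `expandAlts` is a hypothetical helper that ignores backslash escapes.

```go
package main

import (
	"fmt"
	"strings"
)

// expandAlts rewrites the first {a,b,...} group into one pattern per
// alternative and recurses, so nested groups are expanded as well.
func expandAlts(p string) []string {
	start := strings.IndexByte(p, '{')
	if start < 0 {
		return []string{p}
	}
	depth, last, end := 0, start+1, -1
	var parts []string
	for i := start; i < len(p) && end < 0; i++ {
		switch p[i] {
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				parts = append(parts, p[last:i])
				end = i
			}
		case ',':
			if depth == 1 { // only split on top-level commas
				parts = append(parts, p[last:i])
				last = i + 1
			}
		}
	}
	if end < 0 {
		return []string{p} // unbalanced braces: leave the pattern alone
	}
	var out []string
	for _, part := range parts {
		out = append(out, expandAlts(p[:start]+part+p[end+1:])...)
	}
	return out
}

func main() {
	// Two alternatives times three (one of them nested) gives six patterns.
	for _, p := range expandAlts("{a,b}/{c,{d,e}}.go") {
		fmt.Println(p)
	}
}
```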

`**` performance in matching is actually pretty similar to a regular `*`, but
can cause a large number of reads when globbing as it will need to recursively
traverse your filesystem.

## License

[MIT License](LICENSE)

[doublestar]: https://github.com/bmatcuk/doublestar
[golang]: http://golang.org/
[io/fs]: https://golang.org/pkg/io/fs/
[see "character classes"]: #character-classes
[see "patterns"]: #patterns
50 changes: 46 additions & 4 deletions UPGRADING.md
@@ -1,3 +1,40 @@
# Upgrading from v3 to v4

v4 is a complete rewrite with a focus on performance. Additionally,
[doublestar] has been updated to use the new [io/fs] package for filesystem
access. As a result, it is only supported by [golang] v1.16+.

`Match()` and `PathMatch()` mostly did not change, besides big performance
improvements. Their API is the same. However, note the following corner cases:

* In previous versions of [doublestar], `PathMatch()` could accept patterns
that used either platform-specific path separators, or `/`. This was
undocumented and didn't match `filepath.Match()`. In v4, both `pattern` and
`name` must be using appropriate path separators for the platform. You can
use `filepath.FromSlash()` to change `/` to platform-specific separators if
you aren't sure.
* In previous versions of [doublestar], a pattern such as `path/to/a/**` would
_not_ match `path/to/a`. In v4, this pattern _will_ match because if `a` was
a directory, `Glob()` would return it. In other words, the following returns
true: `Match("path/to/a/**", "path/to/a")`

`Glob()` changed from using a [doublestar]-specific filesystem abstraction (the
`OS` interface) to the [io/fs] package. As a result, it now takes a `fs.FS` as
its first argument. This change has a couple of ramifications:

* Like `io/fs.Glob`, `pattern` must use a `/` as path separator, even on
platforms that use something else. You can use `filepath.ToSlash()` on your
patterns if you aren't sure.
* Patterns that contain `/./` or `/../` are invalid. The [io/fs] package
rejects them, returning an IO error. Since `Glob()` ignores IO errors, it'll
end up being silently rejected. You can run `path.Clean()` to ensure they are
removed from the pattern.
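
A short sketch of the cleanup step mentioned in the list above. `cleanPattern` is a hypothetical helper; it combines `path.Clean()` with `fs.ValidPath()` as a pre-check, on the assumption that a pattern with `.` or `..` elements will run into the [io/fs] restriction. Note that `path.Clean()` cannot remove a leading `../`, so such patterns still fail the check.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
)

// cleanPattern removes "." and resolvable ".." elements from a
// slash-separated pattern and reports whether the result is free of the
// elements that io/fs rejects.
func cleanPattern(pattern string) (string, bool) {
	cleaned := path.Clean(pattern)
	return cleaned, fs.ValidPath(cleaned)
}

func main() {
	// "/./" and "/../" in the middle of a pattern are cleaned away.
	fmt.Println(cleanPattern("a/./b/../c/*.go"))

	// A leading "../" survives path.Clean and is still not valid.
	fmt.Println(cleanPattern("../x/*.go"))
}
```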

v4 also added a `GlobWalk()` function that is slightly more performant than
`Glob()` if you just need to iterate over the results and don't need a string
slice. You also get `fs.DirEntry` objects for each result, and can quit early
if your callback returns an error.

# Upgrading from v2 to v3

v3 introduced using `!` to negate character classes, in addition to `^`. If any
@@ -12,10 +49,15 @@ The change from v1 to v2 was fairly minor: the return type of the `Open` method
on the `OS` interface was changed from `*os.File` to `File`, a new interface
exported by doublestar. The new `File` interface only defines the functionality
doublestar actually needs (`io.Closer` and `Readdir`), making it easier to use
doublestar with [go-billy](https://github.com/src-d/go-billy),
[afero](https://github.com/spf13/afero), or something similar. If you were
using this functionality, updating should be as easy as updating `Open`'s
return type, since `os.File` already implements `doublestar.File`.
doublestar with [go-billy], [afero], or something similar. If you were using
this functionality, updating should be as easy as updating `Open`'s return
type, since `os.File` already implements `doublestar.File`.

If you weren't using this functionality, updating should be as easy as changing
your dependencies to point to v2.

[afero]: https://github.com/spf13/afero
[doublestar]: https://github.com/bmatcuk/doublestar
[go-billy]: https://github.com/src-d/go-billy
[golang]: http://golang.org/
[io/fs]: https://golang.org/pkg/io/fs/