Skip to content
/ locar Public
forked from tigrawap/locar

Multi threaded alternative for find, much more efficient to work with distributed/remote filesystems

License

Notifications You must be signed in to change notification settings

weka/locar

 
 

Repository files navigation

The goal behind locar is to make search parallel.

utilities like find, du, ls -R are all single threaded This is usually not a problem for local filesystem, as whole filesystem metadata might be cached, but with remote/distributed file systems this might be a problem and paralellising of readdirs might yield huge gain in performance.

As an example:

>> time find trinity/ -type f | wc -l
14099914

real 15m36.272s
user 0m7.460s
sys 0m12.848s

>> time locar trinity/ --type file | wc -l
14099914

real 0m27.501s
user 1m7.076s
sys 0m39.272s

Find Files Newer than 72h Using find

>> time find /scratch/img -type f -ctime -3 -mtime -3 -atime -3 | wc -l
495702

real	2m56.041s
user	0m1.390s
sys	0m8.526s

Find Files Newer than 72h Using locar

>> time ./locar -t file --atime-newer=72h --mtime-newer=72h --ctime-newer=72h /scratch/img | wc -l
495702

real	0m14.686s
user	0m2.349s
sys	0m4.508s

I.e ~almost X35 speedup

CLI still subject to change as locar evolves

locar --help

Usage:
  locar [OPTIONS] [directories...]

Application Options:
      --resilient      DEPRECATED and ignored, resilient is a default, use --stop-on-error if it is undesired behaviour
      --stop-on-error  Aborts scan on any error
      --inodes         Output inodes (decimal) along with filenames
      --inodes-hex     Output inodes (hexadecimal) along with filenames
      --raw            Output filenames as escaped strings
  -j, --jobs=          Number of jobs(threads) (default: 128)
      --with-size      Output file sizes along with filenames
      --with-times     Output file with atime, mtime, ctime along with filenames
      --atime-older=   Filter files by access time older than this duration (e.g., 24h5m25s) (default: 0s)
      --atime-newer=   Filter files by access time newer than this duration (e.g., 24h5m25s) (default: 0s)
      --mtime-older=   Filter files by modification time older than this duration (e.g., 24h5m25s) (default: 0s)
      --mtime-newer=   Filter files by modification time newer than this duration (e.g., 24h5m25s) (default: 0s)
      --ctime-older=   Filter files by change time older than this duration (e.g., 24h5m25s) (default: 0s)
      --ctime-newer=   Filter files by change time newer than this duration (e.g., 24h5m25s) (default: 0s)
      --result-jobs=   Number of jobs for processing results, like doing stats to get file sizes (default: 128)
      --delete         Delete found files. Non empty directories will be ignored
      --delete-all     Delete found files. Non empty directories will be removed with ALL their contents!!!
  -v, --version        Show version
  -x, --exclude=       Patterns to exclude. Can be specified multiple times
  -f, --filter=        Patterns to filter by. Can be specified multiple times
  -t, --type=          Search entries of specific type
                       Possible values: file, dir, link, socket, all. Can be specified multiple times (default: file, dir, link, socket)
      --timeout=       Timeout for readdir operations. Error will be reported, but os thread will be kept hanging (default: 5m)

Help Options:
  -h, --help           Show this help message

Arguments:
  directories:         Directories to search, using current directory if missing

About

Multi threaded alternative for find, much more efficient to work with distributed/remote filesystems

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 94.0%
  • Makefile 6.0%