Add checkError parameter to os.walkDir #13011

demotomohiro · 2020-01-02T19:33:35Z

Fix #10309 without breaking change.
Related to https://forum.nim-lang.org/t/4546

I think this is a better way than #12960.

Sample code:

import os

let dir = if paramCount() > 0: paramStr(1) else: "."
for kind, path in walkDir(dir, checkError = true):
  echo kind, ": ", path

Output on Linux:

colab@67c26b208c9e:~$ ./test aa
/home/colab/Nim/lib/pure/os.nim(2066) test
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: No such file or directory [OSError]
colab@67c26b208c9e:~$ ./test /root
/home/colab/Nim/lib/pure/os.nim(2066) test
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied [OSError]

Output on Windows 8.1:

f:\temp>test.exe c:\aaaaa
f:\project\demotomohiro\Nim\lib\pure\os.nim(2029) test
f:\project\demotomohiro\Nim\lib\pure\includes\oserr.nim(94) raiseOSError
Error: unhandled exception: 指定されたパスが見つかりません。
 [OSError]

"指定されたパスが見つかりません。" means "The specified path can not be found.".

timotheecour · 2020-01-02T20:43:52Z

lib/pure/os.nim

@@ -2021,6 +2026,8 @@ iterator walkDir*(dir: string; relative=false): tuple[kind: PathComponent, path:
            let errCode = getLastError()
            if errCode == ERROR_NO_MORE_FILES: break
            else: raiseOSError(errCode.OSErrorCode)
+      elif checkError:
+        raiseOSError(osLastError())


raiseOSError(osLastError())
=>
raiseOSError(osLastError(), dir)

to show useful runtime info

ditto with other instance of raiseOSError

Araq · 2020-01-03T15:32:52Z

I like it but what's so special about walkDir? There are other iterators like it, shouldn't they all offer the same?

timotheecour · 2020-01-03T19:42:56Z

IMO we should default to checkError=true instead of checkError=false; throwing errors instead of silently swallowing them prevents bugs
(The workaround for code that starts failing is then to add checkError=false; maybe a compatibility flag --compat:oldWalkDir can be used if needed to change that default )

Araq · 2020-01-03T23:53:47Z

Dunno, when I walk over a directory inaccessible dirs might as well not exist so I don't consider it "error prone". At least not more error prone than anything else related to path, file and directory handling (it's a dirty mess).

timotheecour · 2020-01-04T00:08:06Z

but right now this doesn't even throw when the root dir doesn't exist / is empty etc

import std/os
for a in walkDir("/hoome/"): # oups, mis-spellt /home/ but doesn't even throw
  echo a
for a in walkDir(getEnv("nonexistant")): # oups, getEnv("nonexistant") is empty but doesn't even throw
  echo a

the only thing that would make sense to ignore is in-accessibility of NESTED dirs (for walkDirRec), not top-level dir; everything else should IMO throw as there is no valid use case of it. In particular:

if top-dir doesn't exist: throw
if top-dir exist but is inaccessible: throw
if a nested dir is not accessible (only makes sense for walkDirRec IIRC): throw if checkError=true; and use checkError=false as default

Araq · 2020-01-04T09:59:21Z

In particular

Good points, I agree.

disruptek · 2020-01-04T23:35:45Z

Maybe I'm the idiot, but it's still not obvious to me what happens with readable versus executable directory entries. Can you rename checkError to either check: set[Something] = {} or make it a little more explicit?

It might be nice to have this thing throw two varieties of exception; one soft, one hard.

timotheecour · 2020-01-05T00:39:05Z

no strong opinion on this but if you're gonna offer such control might as well offer the more flexible callback-way, using FilterDescend + FilterYield, see https://citycide.github.io/glob/latest/glob.html#FilterDescend ; it allows you full control on what you recurse into and what you yield. Perhaps it's good enough as a separate nimble package though.

demotomohiro · 2020-01-07T08:50:41Z

@disruptek
List of cases each iterators raise exception when checkError is true.
(This list maybe incomplete)
walkDir:

invalid path (e.g. path includes invalid charactor) (windows only? On ubuntu, even "../../../../../../" works on home dir)
Specified directory doesn't exist.
Access to the specified directory denied.

walkDirRec:

invalid path (e.g. path includes invalid charactor) (windows only?)
Specified directory doesn't exist.
Access to the specified directory denied.
Access to one of sub directory denied.
(Exception can be raised after yielding other files in specified dir)

walkPattern:

invalid path (e.g. path includes invalid charactor) (windows only?)
Specified path doesn't exist.
No match (On Linux, osLastError() doesn't tell why it failed)
Accesss to the directory in the pattern denied

I tested following testwalkdir.nim and testwalkpattern.nim on Ubuntu 18.04.3 LTS.
testwalkdir.nim

import os

let input = if paramCount() > 0: paramStr(1) else: "."
for i in walkDir(input, checkError = true):
  echo i

testwalkpattern.nim

import os

let input = if paramCount() > 0: paramStr(1) else: "."
for i in walkPattern(input, checkError = true):
  echo i

$ mkdir testdir
$ mkdir testdir/testsub
$ touch testdir/foo
$ touch testdir/testsub/foo
# remove executable permission
$ chmod 666 testdir

$ nim c testwalkdir.nim
$ ./testwalkdir
(kind: pcFile, path: "./testwalkdir")
(kind: pcFile, path: "./testwalkdir.nim")
(kind: pcDir, path: "./testdir")
$ ./testwalkdir testdir
(kind: pcDir, path: "testdir/testsub")
(kind: pcFile, path: "testdir/foo")
$ ./testwalkdir testdir/testsub
/home/colab/Nim/lib/pure/os.nim(2082) testwalkdir
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub" [OSError]

$ nim c testwalkpattern.nim
$ ./testwalkpattern \*
testdir
testwalkdir.nim
testwalkpattern
testwalkpattern.nim
$ ./testwalkpattern testdir/\*
testdir/foo
testdir/testsub
$ ./testwalkpattern testdir/testsub/\*
/home/colab/Nim/lib/pure/os.nim(1845) testwalkpattern
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub/*" [OSError]
$ ./testwalkpattern testdir/testsub
/home/colab/Nim/lib/pure/os.nim(1845) testwalkpattern
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub" [OSError]

So I can use os.walkDir to list all files of a directory without execute permission.
But walkDir to the subdirectory in a directory without execute permission fail.

See also:
https://unix.stackexchange.com/questions/21251/execute-vs-read-bit-how-do-directory-permissions-in-linux-work

demotomohiro · 2020-01-07T09:31:38Z

@Araq @timotheecour
I agree that default to checkError=true is better than default to checkError=false.
But it is breaking change.
Should I change the default value to true and add a following flag for old code?

const defaultCheckErrorVal = not defined(nimWalkDirCheckErrorCmpat)
...
iterator walkDir*(dir: string;
                  relative=false;
                  checkError=defaultCheckErrorVal): tuple[kind: PathComponent, path: string] {.

Araq · 2020-01-07T09:58:07Z

Should I change the default value to true and add a following flag for old code?

Yeah, I think so. Also enable this for the 1.0 emulation like so:

when (NimMajor, NimMinor) <= (1, 0):
  const defaultCheckErrorVal = false
else:
  const defaultCheckErrorVal = not defined(nimWalkDirCheckErrorCmpat)

(Yes I know there are shorter ways to write the same, but this is most readable to me.)

Araq · 2020-01-07T16:07:25Z

Er, if our very own code breaks with this switch turned on it's time to reconsider, sorry. Checking shouldn't be the default then.

a-mr · 2020-01-10T08:42:11Z

walkDirRec will throw an exception in the middle of traversal, which is usually not desired. Imagine you are doing a full-text search (like grep) through files, you want to be notified about inaccessible directories but you still want to find as much matches as possible, not just all that happened to be found prior the exception.

One can distinguish between "fatal" errors, which are always raised, and "non-fatal" ones. I suggest to return the list of non-fatal errors via an additional parameter like nonFatal: ref seq[OSError] = nil. If nil is passed then the errors will be silently ignored as they are now. If ref to seq is passed, then it will be filled by all the non-fatal errors that happened during execution of the iterator.

The same logic can be applied to some procs like removeDir.

Araq · 2020-01-10T09:59:18Z

You can also use your own iteration scheme and error handling based on what os.nim gives to you. No need to burden the stdlib with every minor use case.

a-mr · 2020-01-15T22:14:39Z

OK, you are right, we should avoid bloat.

However the proposed exception-based walkDir is not completely correct in e.g. a subtle case of symbolic links. If a link points to file or directory inside an inaccessible directory, then it can not be determined that the link is to file or directory. Current implementation just shows it as kind: pcLinkToFile, no error reported. And it obviously can not be reported by exception without throwing away other directory entries.

I believe this problem can not be solved with exception-based implementations. I propose to add a new iterator walkDirErr in #13163.

timotheecour · 2020-01-15T22:37:53Z

@a-mr @Araq there's a subtle point we haven't discussed in this PR (also affects #13163):
raising inside an iterator prevents continuing iteration; so if you'll abort on first error even if you try/catch it as in

try:
  for i in walkDir(...):
    echo i
except:
 echo "too late; can't resume"

I see only 2 ways:

pass a callback handleError(...) to the iterator, the callback is called by the iterator instead of throwing
(requires compiler change?) provide a more general mechanism to allow resuming an iterator of a try/catch

while true:
  try:
    process(myiter(arg))
  except:
    discard # caught but allows resuming
  if finished(myiter(arg)): break

but it's ugly; maybe this could be supported?

for i in myiter(arg):
except: # applies to each iteration
  # iteration continues
  break # breaks iteration if user wants
finally:
  # optional block

Araq · 2020-01-15T22:43:15Z

Why not simply use the bare bones iterator and do the logic on your own? An iterator taking a callback means the iterator tries to do too much and fails at it...

timotheecour · 2020-01-15T22:48:35Z

well even with a barbones iterator we need a wayt to let user know about unusual conditions such as inaccessible directory during a recursive traversal; simply throwing doesn't work as I just said (unrecoverable even with try/except); and simply ignoring isn't helpful either (no way to tell if something went wrong); the simplest is a callback

demotomohiro · 2020-01-16T20:15:17Z

Continuing iteration after raising inside an iterator is possible with closure iterator.
But closure iterator is not available at compile time and js backend.

type TestException = ref object of Exception
  x: int

iterator mod3(): int {.closure.} =
  var i = 0
  while i != 12:
    inc i
    if i mod 3 == 0:
      raise TestException(x: i)
    yield i

when false:
  #Doesn't work
  #Keep raising exception after first exception.
  iterator mod3(): int {.closure.} =
    for i in 1..11:
      if i mod 3 == 0:
        raise TestException(x: i)
      yield i

var myiter = mod3

while not finished(myiter):
  try:
    for i in myiter():
      echo i
  except TestException as e:
    if e.x mod 9 == 0:
      echo "Stop iteration"
      break
    else:
      echo "Ignore Exception: ", e.x

…lkDirs

disruptek · 2020-01-16T22:25:12Z

No, the simplest solution would be to yield the data that the user needs to evaluate the iteration. The user is gonna have to retrieve that anyway -- and you'd have to pass it to the callback. Why bother? We have a block in the form of the body of the loop and data in the form of loop vars...

Araq · 2020-01-17T09:45:31Z

No, the simplest solution would be to yield the data that the user needs to evaluate the iteration. The user is gonna have to retrieve that anyway -- and you'd have to pass it to the callback. Why bother? We have a block in the form of the body of the loop and data in the form of loop vars...

Exactly my thoughts.

Araq · 2020-01-17T10:01:40Z

Closing in favor of #13163, sorry!

timotheecour reviewed Jan 2, 2020

View reviewed changes

demotomohiro force-pushed the walkdircheckerror branch from 190402d to f62fdaf Compare January 7, 2020 18:10

demotomohiro force-pushed the walkdircheckerror branch from f62fdaf to f8772ab Compare January 10, 2020 11:00

a-mr mentioned this pull request Jan 15, 2020

add error-checked version of walkDir #13163

Closed

demotomohiro added 4 commits January 17, 2020 06:15

Add checkError parameter to os.walkDir

567af5d

Add dir to raiseOSError in os.walkDir

4e204d7

Add checkError parameter to os.walkDirRec

81c293e

Add checkError parameter to os.walkPattern, os.walkFiles and os.os.wa…

127963e

…lkDirs

demotomohiro force-pushed the walkdircheckerror branch from f8772ab to 127963e Compare January 16, 2020 21:18

Fix rebase mistake

fbaa294

Araq closed this Jan 17, 2020

demotomohiro deleted the walkdircheckerror branch January 18, 2020 03:07

timotheecour mentioned this pull request Mar 13, 2020

[RFC] 'walkDir' now has a new 'checkDir' flag, to mimic behaviour of other languages #13642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add checkError parameter to os.walkDir #13011

Add checkError parameter to os.walkDir #13011

demotomohiro commented Jan 2, 2020

timotheecour Jan 2, 2020

demotomohiro Jan 3, 2020

Araq commented Jan 3, 2020

timotheecour commented Jan 3, 2020

Araq commented Jan 3, 2020

timotheecour commented Jan 4, 2020 •

edited

Loading

Araq commented Jan 4, 2020

disruptek commented Jan 4, 2020

timotheecour commented Jan 5, 2020 •

edited

Loading

demotomohiro commented Jan 7, 2020

demotomohiro commented Jan 7, 2020

Araq commented Jan 7, 2020

Araq commented Jan 7, 2020

a-mr commented Jan 10, 2020

Araq commented Jan 10, 2020

a-mr commented Jan 15, 2020

timotheecour commented Jan 15, 2020 •

edited

Loading

Araq commented Jan 15, 2020

timotheecour commented Jan 15, 2020

demotomohiro commented Jan 16, 2020

disruptek commented Jan 16, 2020

Araq commented Jan 17, 2020

Araq commented Jan 17, 2020

Add checkError parameter to os.walkDir #13011

Add checkError parameter to os.walkDir #13011

Conversation

demotomohiro commented Jan 2, 2020

timotheecour Jan 2, 2020

Choose a reason for hiding this comment

demotomohiro Jan 3, 2020

Choose a reason for hiding this comment

Araq commented Jan 3, 2020

timotheecour commented Jan 3, 2020

Araq commented Jan 3, 2020

timotheecour commented Jan 4, 2020 • edited Loading

Araq commented Jan 4, 2020

disruptek commented Jan 4, 2020

timotheecour commented Jan 5, 2020 • edited Loading

demotomohiro commented Jan 7, 2020

demotomohiro commented Jan 7, 2020

Araq commented Jan 7, 2020

Araq commented Jan 7, 2020

a-mr commented Jan 10, 2020

Araq commented Jan 10, 2020

a-mr commented Jan 15, 2020

timotheecour commented Jan 15, 2020 • edited Loading

Araq commented Jan 15, 2020

timotheecour commented Jan 15, 2020

demotomohiro commented Jan 16, 2020

disruptek commented Jan 16, 2020

Araq commented Jan 17, 2020

Araq commented Jan 17, 2020

timotheecour commented Jan 4, 2020 •

edited

Loading

timotheecour commented Jan 5, 2020 •

edited

Loading

timotheecour commented Jan 15, 2020 •

edited

Loading