Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkError parameter to os.walkDir #13011

Closed
wants to merge 5 commits into from

Conversation

demotomohiro
Copy link
Contributor

Fix #10309 without breaking change.
Related to https://forum.nim-lang.org/t/4546

I think this is a better way than #12960.

Sample code:

import os

let dir = if paramCount() > 0: paramStr(1) else: "."
for kind, path in walkDir(dir, checkError = true):
  echo kind, ": ", path

Output on Linux:

colab@67c26b208c9e:~$ ./test aa
/home/colab/Nim/lib/pure/os.nim(2066) test
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: No such file or directory [OSError]
colab@67c26b208c9e:~$ ./test /root
/home/colab/Nim/lib/pure/os.nim(2066) test
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied [OSError]

Output on Windows 8.1:

f:\temp>test.exe c:\aaaaa
f:\project\demotomohiro\Nim\lib\pure\os.nim(2029) test
f:\project\demotomohiro\Nim\lib\pure\includes\oserr.nim(94) raiseOSError
Error: unhandled exception: 指定されたパスが見つかりません。
 [OSError]

"指定されたパスが見つかりません。" means "The specified path can not be found.".

lib/pure/os.nim Outdated
@@ -2021,6 +2026,8 @@ iterator walkDir*(dir: string; relative=false): tuple[kind: PathComponent, path:
let errCode = getLastError()
if errCode == ERROR_NO_MORE_FILES: break
else: raiseOSError(errCode.OSErrorCode)
elif checkError:
raiseOSError(osLastError())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raiseOSError(osLastError())
=>
raiseOSError(osLastError(), dir)

to show useful runtime info

ditto with other instance of raiseOSError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@Araq
Copy link
Member

Araq commented Jan 3, 2020

I like it but what's so special about walkDir? There are other iterators like it, shouldn't they all offer the same?

@timotheecour
Copy link
Member

IMO we should default to checkError=true instead of checkError=false; throwing errors instead of silently swallowing them prevents bugs
(The workaround for code that starts failing is then to add checkError=false; maybe a compatibility flag --compat:oldWalkDir can be used if needed to change that default )

@Araq
Copy link
Member

Araq commented Jan 3, 2020

Dunno, when I walk over a directory inaccessible dirs might as well not exist so I don't consider it "error prone". At least not more error prone than anything else related to path, file and directory handling (it's a dirty mess).

@timotheecour
Copy link
Member

timotheecour commented Jan 4, 2020

but right now this doesn't even throw when the root dir doesn't exist / is empty etc

import std/os
for a in walkDir("/hoome/"): # oups, mis-spellt /home/ but doesn't even throw
  echo a
for a in walkDir(getEnv("nonexistant")): # oups, getEnv("nonexistant") is empty but doesn't even throw
  echo a

the only thing that would make sense to ignore is in-accessibility of NESTED dirs (for walkDirRec), not top-level dir; everything else should IMO throw as there is no valid use case of it. In particular:

  • if top-dir doesn't exist: throw
  • if top-dir exist but is inaccessible: throw
  • if a nested dir is not accessible (only makes sense for walkDirRec IIRC): throw if checkError=true; and use checkError=false as default

@Araq
Copy link
Member

Araq commented Jan 4, 2020

In particular

Good points, I agree.

@disruptek
Copy link
Contributor

Maybe I'm the idiot, but it's still not obvious to me what happens with readable versus executable directory entries. Can you rename checkError to either check: set[Something] = {} or make it a little more explicit?

It might be nice to have this thing throw two varieties of exception; one soft, one hard.

@timotheecour
Copy link
Member

timotheecour commented Jan 5, 2020

no strong opinion on this but if you're gonna offer such control might as well offer the more flexible callback-way, using FilterDescend + FilterYield, see https://citycide.github.io/glob/latest/glob.html#FilterDescend ; it allows you full control on what you recurse into and what you yield. Perhaps it's good enough as a separate nimble package though.

@demotomohiro
Copy link
Contributor Author

@disruptek
List of cases each iterators raise exception when checkError is true.
(This list maybe incomplete)
walkDir:

  • invalid path (e.g. path includes invalid charactor) (windows only? On ubuntu, even "../../../../../../" works on home dir)
  • Specified directory doesn't exist.
  • Access to the specified directory denied.

walkDirRec:

  • invalid path (e.g. path includes invalid charactor) (windows only?)
  • Specified directory doesn't exist.
  • Access to the specified directory denied.
  • Access to one of sub directory denied.
    (Exception can be raised after yielding other files in specified dir)

walkPattern:

  • invalid path (e.g. path includes invalid charactor) (windows only?)
  • Specified path doesn't exist.
  • No match (On Linux, osLastError() doesn't tell why it failed)
  • Accesss to the directory in the pattern denied

I tested following testwalkdir.nim and testwalkpattern.nim on Ubuntu 18.04.3 LTS.
testwalkdir.nim

import os

let input = if paramCount() > 0: paramStr(1) else: "."
for i in walkDir(input, checkError = true):
  echo i

testwalkpattern.nim

import os

let input = if paramCount() > 0: paramStr(1) else: "."
for i in walkPattern(input, checkError = true):
  echo i
$ mkdir testdir
$ mkdir testdir/testsub
$ touch testdir/foo
$ touch testdir/testsub/foo
# remove executable permission
$ chmod 666 testdir
$ nim c testwalkdir.nim
$ ./testwalkdir
(kind: pcFile, path: "./testwalkdir")
(kind: pcFile, path: "./testwalkdir.nim")
(kind: pcDir, path: "./testdir")
$ ./testwalkdir testdir
(kind: pcDir, path: "testdir/testsub")
(kind: pcFile, path: "testdir/foo")
$ ./testwalkdir testdir/testsub
/home/colab/Nim/lib/pure/os.nim(2082) testwalkdir
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub" [OSError]
$ nim c testwalkpattern.nim
$ ./testwalkpattern \*
testdir
testwalkdir.nim
testwalkpattern
testwalkpattern.nim
$ ./testwalkpattern testdir/\*
testdir/foo
testdir/testsub
$ ./testwalkpattern testdir/testsub/\*
/home/colab/Nim/lib/pure/os.nim(1845) testwalkpattern
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub/*" [OSError]
$ ./testwalkpattern testdir/testsub
/home/colab/Nim/lib/pure/os.nim(1845) testwalkpattern
/home/colab/Nim/lib/pure/includes/oserr.nim(94) raiseOSError
Error: unhandled exception: Permission denied
Additional info: "testdir/testsub" [OSError]

So I can use os.walkDir to list all files of a directory without execute permission.
But walkDir to the subdirectory in a directory without execute permission fail.

See also:
https://unix.stackexchange.com/questions/21251/execute-vs-read-bit-how-do-directory-permissions-in-linux-work

@demotomohiro
Copy link
Contributor Author

@Araq @timotheecour
I agree that default to checkError=true is better than default to checkError=false.
But it is breaking change.
Should I change the default value to true and add a following flag for old code?

const defaultCheckErrorVal = not defined(nimWalkDirCheckErrorCmpat)
...
iterator walkDir*(dir: string;
                  relative=false;
                  checkError=defaultCheckErrorVal): tuple[kind: PathComponent, path: string] {.

@Araq
Copy link
Member

Araq commented Jan 7, 2020

Should I change the default value to true and add a following flag for old code?

Yeah, I think so. Also enable this for the 1.0 emulation like so:

when (NimMajor, NimMinor) <= (1, 0):
  const defaultCheckErrorVal = false
else:
  const defaultCheckErrorVal = not defined(nimWalkDirCheckErrorCmpat)

(Yes I know there are shorter ways to write the same, but this is most readable to me.)

@Araq
Copy link
Member

Araq commented Jan 7, 2020

Er, if our very own code breaks with this switch turned on it's time to reconsider, sorry. Checking shouldn't be the default then.

@a-mr
Copy link
Contributor

a-mr commented Jan 10, 2020

walkDirRec will throw an exception in the middle of traversal, which is usually not desired. Imagine you are doing a full-text search (like grep) through files, you want to be notified about inaccessible directories but you still want to find as much matches as possible, not just all that happened to be found prior the exception.

One can distinguish between "fatal" errors, which are always raised, and "non-fatal" ones. I suggest to return the list of non-fatal errors via an additional parameter like nonFatal: ref seq[OSError] = nil. If nil is passed then the errors will be silently ignored as they are now. If ref to seq is passed, then it will be filled by all the non-fatal errors that happened during execution of the iterator.

The same logic can be applied to some procs like removeDir.

@Araq
Copy link
Member

Araq commented Jan 10, 2020

You can also use your own iteration scheme and error handling based on what os.nim gives to you. No need to burden the stdlib with every minor use case.

@a-mr
Copy link
Contributor

a-mr commented Jan 15, 2020

OK, you are right, we should avoid bloat.

However the proposed exception-based walkDir is not completely correct in e.g. a subtle case of symbolic links. If a link points to file or directory inside an inaccessible directory, then it can not be determined that the link is to file or directory. Current implementation just shows it as kind: pcLinkToFile, no error reported. And it obviously can not be reported by exception without throwing away other directory entries.

I believe this problem can not be solved with exception-based implementations. I propose to add a new iterator walkDirErr in #13163.

@timotheecour
Copy link
Member

timotheecour commented Jan 15, 2020

@a-mr @Araq there's a subtle point we haven't discussed in this PR (also affects #13163):
raising inside an iterator prevents continuing iteration; so if you'll abort on first error even if you try/catch it as in

try:
  for i in walkDir(...):
    echo i
except:
 echo "too late; can't resume"

I see only 2 ways:

  • pass a callback handleError(...) to the iterator, the callback is called by the iterator instead of throwing
  • (requires compiler change?) provide a more general mechanism to allow resuming an iterator of a try/catch
while true:
  try:
    process(myiter(arg))
  except:
    discard # caught but allows resuming
  if finished(myiter(arg)): break

but it's ugly; maybe this could be supported?

for i in myiter(arg):
except: # applies to each iteration
  # iteration continues
  break # breaks iteration if user wants
finally:
  # optional block

@Araq
Copy link
Member

Araq commented Jan 15, 2020

Why not simply use the bare bones iterator and do the logic on your own? An iterator taking a callback means the iterator tries to do too much and fails at it...

@timotheecour
Copy link
Member

well even with a barbones iterator we need a wayt to let user know about unusual conditions such as inaccessible directory during a recursive traversal; simply throwing doesn't work as I just said (unrecoverable even with try/except); and simply ignoring isn't helpful either (no way to tell if something went wrong); the simplest is a callback

@demotomohiro
Copy link
Contributor Author

Continuing iteration after raising inside an iterator is possible with closure iterator.
But closure iterator is not available at compile time and js backend.

type TestException = ref object of Exception
  x: int

iterator mod3(): int {.closure.} =
  var i = 0
  while i != 12:
    inc i
    if i mod 3 == 0:
      raise TestException(x: i)
    yield i

when false:
  #Doesn't work
  #Keep raising exception after first exception.
  iterator mod3(): int {.closure.} =
    for i in 1..11:
      if i mod 3 == 0:
        raise TestException(x: i)
      yield i

var myiter = mod3

while not finished(myiter):
  try:
    for i in myiter():
      echo i
  except TestException as e:
    if e.x mod 9 == 0:
      echo "Stop iteration"
      break
    else:
      echo "Ignore Exception: ", e.x

@disruptek
Copy link
Contributor

No, the simplest solution would be to yield the data that the user needs to evaluate the iteration. The user is gonna have to retrieve that anyway -- and you'd have to pass it to the callback. Why bother? We have a block in the form of the body of the loop and data in the form of loop vars...

@Araq
Copy link
Member

Araq commented Jan 17, 2020

No, the simplest solution would be to yield the data that the user needs to evaluate the iteration. The user is gonna have to retrieve that anyway -- and you'd have to pass it to the callback. Why bother? We have a block in the form of the body of the loop and data in the form of loop vars...

Exactly my thoughts.

@Araq
Copy link
Member

Araq commented Jan 17, 2020

Closing in favor of #13163, sorry!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

walkDir ignores errors reported by C
5 participants