Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add error-checked version of walkDir #13163

Closed
wants to merge 26 commits into from
Closed

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Jan 15, 2020

Alternative solution w.r.t. #13011, #12960.
New iterator walkDirErr with error-checking, use like this:

for kind, path, error in walkDirErr("dirA"):
  if error == OSErrorCode(0):
    echo path & " of kind " & $kind
  else:
    echo path & " GET INFO FAILED: " & osErrorMsg(error)

Other iterators remain unchecked.
Tested manually on Linux & Windows.

@juancarlospaco
Copy link
Collaborator

Being Nim a typed lang with overload and all, I dont think we need to add functionErrorname.
if err != nil: ... feels too Go lang.

@Araq
Copy link
Member

Araq commented Jan 17, 2020

I like this idea much better than #13011

@Araq
Copy link
Member

Araq commented Jan 17, 2020

Needs a .since annotation and a changelog entry.

@juancarlospaco
Copy link
Collaborator

What about tryWalkDir name to be consistent with the rest of the std lib ?.

@timotheecour
Copy link
Member

will walkDirRec get similar treatment?

@Araq
Copy link
Member

Araq commented Jan 18, 2020

Indeed, please rename it to tryWalkDir.

@demotomohiro
Copy link
Contributor

When findFirstFile or opendir failed (because dir is invalid path or no permission), walkDirErr yield dir.
But when there is no error, it doesn't yield dir.
How about to add pcInputPath to PathComponent enum and yield it in that case?
I can check if opening input dir is failed by comparing path and dir, but just checking error and kind is faster.

Can you write signle yield per when branch code?
https://nim-lang.org/docs/manual.html#iterators-and-the-for-statement-first-class-iterators

the body of a for loop over an inline iterator is inlined into each yield statement appearing in the iterator code, so ideally the code should be refactored to contain a single yield when possible to avoid code bloat.

@Araq
Copy link
Member

Araq commented Jan 19, 2020

How about to add pcInputPath to PathComponent enum and yield it in that case?

That would break code, code can contain exhaustive case statements, in fact, I generally encourage people to do that.

will walkDirRec get similar treatment?

No, we only need one non-recursive iterator as the building block for custom walks that can live outside the stdlib.

@a-mr
Copy link
Contributor Author

a-mr commented Jan 19, 2020

For discussion: an updated version, slightly more generalized + some fixes:

for step in tryWalkDir(dir):
  case step.kind
  of wiOpenDir:
    if step.openStatus != odOpenOk:
      echo "directory " & dir & " could not be opened " &
           $step.openStatus & " / " & osErrorMsg(step.code)
  of wiEntryOk: echo "found " & step.path & " of kind " & $step.entryType
  of wiEntryBad: echo "bad symlink " & step.path & " with error " &
                      osErrorMsg(step.code)
  of wiInterrupted:
    echo "traversal interrupted with error: " & osErrorMsg(step.code)

@a-mr a-mr changed the title add error-checked version of walkDir WIP: add error-checked version of walkDir Jan 19, 2020
lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
@a-mr a-mr changed the title WIP: add error-checked version of walkDir add error-checked version of walkDir Jan 27, 2020
Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, but I don't think this should be accepted into the stdlib for various reasons:

  • I don't see this seeing enough use to be included in the stdlib
  • The API for this should be modelled via something more generic, like a Either[T, E] type or something instead of this custom WalkStep object.
  • The code being introduced in this PR is convoluted and hard to read, which will make it hard to maintain

lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
@a-mr
Copy link
Contributor Author

a-mr commented Jan 28, 2020

The solution of using closure iterators would be much simpler and more elegant (return tuple[OpenDirStatus, iterator]), but I'm not yet sure how to manage resources for them.
It seems that directory handles, captured by closure iterators are never freed.
A naive attempt to define a hook for the handles does not even pass compilation:

proc `=destroy`(m: var ptr DIR) =
  echo "Destruct"

Error: signature for '=destroy' must be proc[T: object](x: var T)

I can try using wrapper object for the handle.
There is a recent discussion on topic but it's without conclusion: Will --gc:arc address finalizers for closure iterator variables?

@Araq
Copy link
Member

Araq commented Jan 30, 2020

I don't see this seeing enough use to be included in the stdlib

I've asked for an implementation like this because our current means are insufficient for handling errors.

@a-mr
Copy link
Contributor Author

a-mr commented Feb 4, 2020

It looks a bit bulky but there is a bright side: recursive version is straightforward for implementation, because WalkStep values can be passed upstream easily. Here is an analogue for walkDirRec from stdlib:

iterator tryWalkDirRec*(dir: string,
                        yieldFilter = {pcFile}, followFilter = {pcDir},
                        relative = false):
                       WalkStep {.tags: [ReadDirEffect].} =
  var stack = @[""]
  while stack.len > 0:
    let d = stack.pop()
    for s in tryWalkDir(dir / d, relative = true):
      let rel = d / s.path
      case s.kind
      of wsEntryOk:
        if s.entryType in {pcDir, pcLinkToDir} and
           s.entryType in followFilter:
          stack.add rel
        if s.entryType in yieldFilter:
          var step = s
          step.path = if relative: rel else: dir / rel
          yield step
      of wsEntryBad, wsInterrupted, wsOpenDir:
        var step = s
        step.path = if relative: rel else: dir / rel
        yield step

@timotheecour
Copy link
Member

timotheecour commented Feb 4, 2020

multiple yield statements in inline iterators cause code bloat and must be avoided, as explained here: #10553

see #10557 (which was closed but is relevant) for an example of how to convert multiple yield into a single yield

both current PR and #13163 (comment) suffer from this

One possibility would be to use a 1st class iterator instead of inline iterator, given cost of file system operations, the overhead of 1st class iterators would be negligible

@a-mr
Copy link
Contributor Author

a-mr commented Feb 5, 2020

it was rewritten to use single yield. I'm using multiple yield only in examples for the sake of clarity.

lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
lib/pure/os.nim Outdated Show resolved Hide resolved
@Araq
Copy link
Member

Araq commented Feb 5, 2020

This definitely goes in the right direction, but:

  1. The control flow is unstructured. Too many break and continue statements make it hard to reason about. I personally also hate defer.
  2. No less than 4 different enums for iterating over a dir? OpenDirStatus, WalkStepKind, OSErrorCode , PathComponent?

I understand that iterating over directories properly in 2020 involves some black magic, but this seems overly excessive.

lib/pure/os.nim Outdated Show resolved Hide resolved
@a-mr
Copy link
Contributor Author

a-mr commented Feb 8, 2020

I combined OpenDirStatus into WalkStepKind. And I tried to simplify the code further, with limited success though. It's hard to do that with inline iterator using single yield statement.
It seems I've reached my limit here :-/

@stale
Copy link

stale bot commented Feb 22, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Feb 22, 2021
@stale stale bot closed this Mar 25, 2021
@ringabout ringabout reopened this Mar 30, 2021
@stale stale bot removed the stale Staled PR/issues; remove the label after fixing them label Mar 30, 2021
@timotheecour
Copy link
Member

timotheecour commented Mar 30, 2021

I combined OpenDirStatus into WalkStepKind. And I tried to simplify the code further, with limited success though. It's hard to do that with inline iterator using single yield statement.
It seems I've reached my limit here :-/

filewalks (nim-lang/fusion#32) does this; IMO filewalks belongs squarely in stdlib.

@a-mr
Copy link
Contributor Author

a-mr commented Jun 17, 2021

IMHO nim-results should be a part of stdlib for the tasks like this PR.

@stale
Copy link

stale bot commented Jun 18, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Jun 18, 2022
@stale stale bot closed this Jul 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants