adds the `nth` function for iterables #56580

ghyatzo · 2024-11-16T12:17:36Z

Hi,

I've turned the open ended issue #54454 into an actual PR.
Tangentially related to #10092 ?

This PR introduces the nth(itr, n) function for iterators, giving getindex-like behaviour.
I've tried to optimize as much as possible by specializing on different types of iterators.
In the spirit of iterators, any out-of-bounds access returns nothing. (edit: that is, instead of throwing an error, similar to how first(itr, n) and last(itr, n) do not throw.)

Here is a comparison of running the test suite (~22 different iterators) using the generic nth and the specialized nth:

@btime begin
    for (itr, n, _) in $testset
        _fallback_nth(itr, n)
    end
end
117.750 μs (366 allocations: 17.88 KiB)

@btime begin
    for (itr, n, _) in $testset
        nth(itr, n)
    end
end
24.250 μs (341 allocations: 16.70 KiB)

jakobnissen · 2024-11-16T13:22:25Z

base/iterators.jl

+"""
+    nth(itr, n::Integer)
+
+Get the `n`th element of an iterable collection. Return `nothing` if not existing.


Returning nothing makes it impossible to distinguish between "the nth element was nothing", and "there was no nth element". Perhaps return Union{Nothing, Some}?

Fair point.
Should it be Union{Nothing, Some} even in those cases where we know there can't be a nothing value in the iterator (for the sake of a uniform API)? E.g. a Count iterator, a Repeated iterator (with an element other than nothing), or AbstractRanges.

I think it should, otherwise it would be too confusing.

I would just throw an error if there is no nth element. There could also be a default argument as in get, where a user can pass a value that should be returned if no nth element exists.

I don't really follow the logic that the spirit of iterators is to return nothing in such cases?

Agree nothing is weird, your iterator can produce that. Some seems a bit technical & unfriendly? An error seems fine. Matches what first([]) does.

I suppose it can't literally be a method of get since it goes by enumeration not keys:

julia> first(Dict('a':'z' .=> 'A':'Z'), 3)
3-element Vector{Pair{Char, Char}}:
 'n' => 'N'
 'f' => 'F'
 'w' => 'W'

julia> nth(Dict('a':'z' .=> 'A':'Z'), 3)
'w' => 'W'
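
To make the ambiguity concrete, here is a minimal sketch of the Union{Nothing, Some} idea. The name nth_some is hypothetical and not part of the PR, and it uses a plain enumerate loop rather than the PR's specialized methods.

```julia
# Hypothetical helper, not the PR's implementation: wrap hits in `Some`
# so that a stored `nothing` differs from "there is no nth element".
function nth_some(itr, n::Integer)
    for (i, x) in enumerate(itr)
        i == n && return Some(x)
    end
    return nothing  # ran out of elements (or n < 1)
end

hit  = nth_some([1, nothing, 3], 2)   # the element exists and is `nothing`
miss = nth_some([1, nothing, 3], 5)   # there is no fifth element
```

With a bare nothing return, hit and miss would be indistinguishable; here the caller can unwrap a hit with something(hit) or test the result with isnothing.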

jakobnissen · 2024-11-16T13:23:21Z

base/iterators.jl

+```
+"""
+nth(itr, n::Integer) = _nth(IteratorSize(itr), itr, n)
+nth(itr::AbstractArray, n::Integer) = n > length(itr) ? nothing : itr[n]


This assumes one-based indexing. Perhaps do itr[begin + n - 1].

You are absolutely correct.
Would something like getindex(itr, nth(eachindex(IndexLinear(), itr), n)) be overkill?
And adding a specialization with nth(itr::AbstractRange, n::Integer) = getindex(itr, n)?

I went with the probably-overkill approach; if it's too much I'll revert to your suggestion.

AbstractRanges are not always one-based either, so that approach runs into the same issue

From what I could gather, that is already handled in getindex, since it ends up calling

unsafe_getindex(v::AbstractRange{T}, i::Integer) where T = convert(T, first(v) + (i - oneunit(i))*step_hp(v))

which should be pretty much the same as [begin + n - 1],
unless I'm missing the point completely?

The line nth(itr::AbstractRange, n) = getindex(itr, n) will for sure fail on the axes of an OffsetArray. (In fact, it will first be ambiguous, as n::Any is less specific.)

I was overthinking it, I'll just stick with [begin + n - 1]. Sorry.
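
For readers following along, a small sketch of why begin + n - 1 matters. ZeroBased and nth_begin are hypothetical names made up for this illustration; the struct is a stand-in for an offset array so the example does not depend on OffsetArrays.jl.

```julia
# Hypothetical zero-based vector, only here to exercise non-1-based axes.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBased) = size(v.data)
Base.axes(v::ZeroBased) = (Base.IdentityUnitRange(0:length(v.data)-1),)
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]

# Count positions from the first valid index instead of assuming 1.
nth_begin(a::AbstractArray, n::Integer) = a[begin + n - 1]

z = ZeroBased([10, 20, 30])
first_by_position = nth_begin(z, 1)   # element stored at index 0
first_by_index    = z[1]              # index 1 is the *second* element here
```

The same nth_begin works unchanged on an ordinary Vector, where begin is 1.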

adienes · 2024-11-16T13:29:29Z

how would this compare to a more naive implementation like

nth(itr, n) = first(Iterators.Rest(itr, n))

?

ghyatzo · 2024-11-16T14:13:05Z

how would this compare to a more naive implementation like
nth(itr, n) = first(Iterators.Rest(itr, n))
?

Rest requires knowing the state used by the iterator, which is often considered an implementation detail and is hard to pick automatically (unless I am missing something!).
If the state were known, first(Rest(itr, n)) would probably be the fastest, since you always do at most one iteration.
But knowing the correct state for n-1 means that you can most likely calculate the nth state directly.
In that case a specialization would be even better!
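
A rough illustration of the difference, assuming an iterable with at least three elements. The function names are hypothetical; the point is only that Rest takes an iteration state, while drop takes a count.

```julia
# Illustration only: `Rest` needs the iterator's internal state, which we
# obtain here by stepping through `iterate` by hand.
function third_via_rest(itr)
    _, s = iterate(itr)        # state after element 1
    _, s = iterate(itr, s)     # state after element 2
    return first(Iterators.Rest(itr, s))
end

# `drop` hides the state handling and works for any iterable.
third_via_drop(itr) = first(Iterators.drop(itr, 2))

letters = ['a', 'b', 'c', 'd']
squares = (x^2 for x in 1:5)
```

Both give the same element, but third_via_rest already had to walk the first two elements just to learn the state it passes to Rest.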

mcabbott · 2024-11-17T02:02:13Z

Seems like a lot of code.

I reproduced the above benchmark here:
https://gist.github.com/mcabbott/fe2e0821e9bfe5cc7643bb15adf445d0
I get 75.625 μs for first(Iterators.drop(itr, n-1)) vs 1.558 μs for the PR. However, this is entirely driven by one case, Cycle{Vector{Int64}}. Some other cases are faster, some slower (String). Maybe they ought to be discussed in follow-up PRs?

No strong position on whether this needs a name or not, but perhaps this first PR can focus on that, and let the implementation be just:

nth(itr, n::Integer) = first(Iterators.drop(itr, n-1))
nth(itr::AbstractArray, n::Integer) = itr[begin-1+n]

ghyatzo · 2024-11-17T08:21:34Z

A lot of the code is for optimizing out-of-bounds checking. If we go with davidanthoff's suggestion of letting nth just error on an out-of-bounds n, most of the actual code can be scrapped while retaining the speed.

jakobnissen · 2024-11-17T08:52:24Z

I disagree with throwing an error. In cases where you don't know if an nth element exists, that forces a try-catch which is both slow and brittle. I would imagine that most ordered iterators with a known length support indexing, so this would probably mostly be used precisely when the length is unknown.

davidanthoff · 2024-11-17T19:58:53Z

I think another consideration here is consistency: the other functions we have that take an individual element from an iterator are first and last, and both throw an error if you ask them for something that doesn't exist (i.e. when you call them on an empty source). In my mind, first, nth and last should have the same kind of design, so that also speaks in favor of throwing an error.

I agree with @jakobnissen that in some situations being able to handle this without an exception would be nice, but on the flip side, I can also see scenarios where an error seems much better, in particular in interactive sessions where I might be playing around with some data and this function could be very useful. And especially in an interactive scenario it would be super inconvenient if Some was used...

Maybe the best design would be to allow for both scenarios. Say something like

nth(itr, n, nothrow=false)

So the default would be that an exception is thrown if the nth element doesn't exist, but when nothrow=true then nothing is returned if the element doesn't exist, and on success things are wrapped in Some.
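
A minimal sketch of how such a switch could behave, written here as a keyword argument and assuming n >= 1. The name nth_sketch and the error message are hypothetical; this is not the PR's code.

```julia
# Hypothetical sketch of the proposed `nothrow` switch.
function nth_sketch(itr, n::Integer; nothrow::Bool=false)
    y = iterate(Iterators.drop(itr, n - 1))
    if y === nothing
        nothrow && return nothing
        throw(ArgumentError("iterator has fewer than $n elements"))
    end
    # Wrap only in the non-throwing mode, so `nothing` elements stay distinguishable.
    return nothrow ? Some(y[1]) : y[1]
end

plain   = nth_sketch(1:3, 2)                        # bare element
missing_elt = nth_sketch(1:3, 5; nothrow=true)      # no such element
wrapped = nth_sketch([nothing], 1; nothrow=true)    # element that is `nothing`
```

One trade-off of this design is that the return type depends on the value of nothrow, which generic callers would need to keep in mind.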

ghyatzo · 2024-11-18T13:45:37Z

We could also opt for relying on the IteratorEltype trait to check if an iterator can contain nothing elements.
With ::HasEltype() we can dispatch on whether eltype(itr) admits nothing, i.e. whether Nothing <: eltype(itr).
In that case nth can return Some(nothing), while otherwise it can just return nothing, since that is unambiguous for collections that do not have nothing in them. Of course, wrapping with Some would be the default for an ::EltypeUnknown iterator.

Some is already expected in workflows with Union{Nothing, T} so that wouldn't introduce any extra complexity.

Although I see the similarity with first and last, I'd be more inclined to liken nth(itr, n) to their siblings first(itr, n) and last(itr, n). I would also argue that first is a bit of an outlier in throwing on an empty collection since, for example,
according to the documentation (and code), last never really throws an error:

"Return the end point of an AbstractRange even if it is empty."

The error in last([]) comes from getindex, which receives a 0 from

lastindex(a::AbstractArray) = (@inline; last(eachindex(IndexLinear(), a))) # equals to last(OneTo(0))

Similarly, both first(itr, n) and last(itr, n) rely on the take(itr, n) iterator, which simply returns nothing when the underlying iterator runs out of elements.

Hence my idea that, in principle, iterators are non-throwing by default: any throwing should be done one level higher and not at the iterator level itself (like how getindex and last interact). Iteration is a "low level" interface and I believe it should give the user the choice on how to handle "end states" of the iteration.
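
For reference, the behaviours mentioned in this comment can be checked directly in a session. This is only an illustration of existing Base functions on a small array, not code from the PR.

```julia
short = [1, 2]

a = first(short, 5)                     # fewer than 5 elements, no error
b = last(short, 5)                      # same for the trailing elements
c = collect(Iterators.take(short, 5))   # `take` simply stops early
d = last(1:0)                           # end point of an empty range

# The single-element form throws on an empty array.
threw = try
    first(Int[])
    false
catch
    true
end
```

So the multi-element forms shrink their result when the collection is too short, while first on an empty array raises an error.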

davidanthoff · 2024-11-18T20:15:46Z

We could also opt for relying on the IteratorEltype trait

I have to admit, I think that is the option I like least of all of the proposed options so far :) It would make it very tricky to write generic code that uses the nth function, essentially now I would have to check the trait every time I call nth on something to be able to correctly interpret the return value from nth.

I'd be more inclined to liken nth(itr, n) to their siblings first(itr, n) and last(itr, n)

To me nth(itr, n) is conceptually way closer to first(itr) and last(itr) because all of these produce single values, rather than streams of values. Whenever a function produces a stream of values there is a simple, natural way to return no value: namely an empty stream. But that is exactly not possible for functions that are supposed to return just one value.

Hence my idea that, in principle, iterators are non-throwing by default

Agreed, but the whole difference between first(itr) and first(itr, n) is that one does not produce an iterator, while the other does.

I still think that my proposal with an argument like nothrow would be the cleanest solution here :) Are there things that you think are problematic about it?

mcabbott · 2024-11-18T20:39:21Z

Is there any precedent for a nothrow keyword?

We could also follow get(A, key, default), as you suggested earlier.

It seems a little confusing that this goes by enumeration not indexing, so maybe it shouldn't be called Iterators.get, but is there a good suggestive name? getnth(iter, number, default) or getcount or something? Somehow using nth(iter, n, default) seems a bit at odds with first, last, at least to me.
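
A minimal sketch of the get-style variant, assuming n >= 1. The name nth_default is hypothetical (none of the names floated above has been settled on), and the body is just the drop-based fallback with a caller-supplied default.

```julia
# Hypothetical get-style variant: the caller picks the fallback value,
# mirroring get(collection, key, default).
function nth_default(itr, n::Integer, default)
    y = iterate(Iterators.drop(itr, n - 1))
    return y === nothing ? default : y[1]
end

d = Dict(:a => 1)
by_key      = get(d, :b, 0)                 # lookup by key, falls back to 0
by_position = nth_default(1:3, 5, missing)  # lookup by position, falls back to missing
present     = nth_default("héllo", 2, '?')  # counts characters, not byte indices
```

Because the caller chooses the default, a sentinel that cannot occur in the data avoids the nothing ambiguity without needing Some.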

ghyatzo added 3 commits November 16, 2024 13:00

adds nth function to iterators plus tests

6a0b4bd

removed useless module annotations

1de4a8f

added nth to export list

Merge branch 'master' into nth-api

11ee27a

jakobnissen reviewed Nov 16, 2024

LilithHafner added the triage, iteration, and feature labels Nov 16, 2024

fix one based indexing, more generic

3870e3e

add docs explaining interaction with Stateful iterators change test to be Any vectors instead of tuples (actually way faster as well)

fix: simpler generic-based indexing

8bdd4a0
