List of things to do concerning the transition to a more `Iterator`-based style.

Getting rid of inference:

- `collect_columns` (WIP: collect without inference #135)
- port `map` (map no longer relies on inference #137)
- port `groupreduce` and `groupby` (Define and collect Columns{Pair} #140, move group algorithms to collect_columns technique #150)
- `collect_columns` while flattening (WIP/RFC: collect_columns_flattened #155)
- port `join` to `collect_columns` (implement join as iterator and remove reliance on inference #225)
- port `broadcast` to `collect_columns` (Move broadcast to iteration method #226)
- remove other uses of `_promote_op` that I'm missing (Remove all use of _promote_op #227)

Iterator compatibility / better composability:

- Add method `ndsparse(x)` where `x` is an iterator that iterates pairs (Add method to collect iterator of pairs as ndsparse #157)
- Adapt `flatten` to also work when fed an iterator (so that when calling `groupby` with `flatten=true` we can skip one `collect_columns` step)
- Add option `lazy=true` to `groupby`, `groupreduce`, etc. to return the iterator rather than the collected version
- See which functions can take an iterator as input. Decide what to do for those that cannot (for example, `groupby` can't really be done on an iterator as we need the result of `sortperm`, but we could maybe add a trait to iterators to see if `sortperm` is known somehow, some sort of `HasSortPerm()`)

Bugfixes:

- Clarify situation for iterators of nested tuples/pairs
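
To make the `HasSortPerm()` idea a bit more concrete, here is a rough sketch of how such a trait could look, modeled on the way `Base.IteratorSize` works. Everything except `HasSortPerm` is a made-up name for illustration (`SortPermStyle`, `NoSortPerm`, `PresortedRows`, `known_sortperm`); none of it exists in the package, and the real design may well end up different.

```julia
# Hypothetical trait: does an iterator already carry a sorting permutation?
abstract type SortPermStyle end
struct HasSortPerm <: SortPermStyle end
struct NoSortPerm <: SortPermStyle end

# Default: a generic iterator knows nothing about its ordering.
SortPermStyle(itr) = SortPermStyle(typeof(itr))
SortPermStyle(::Type) = NoSortPerm()

# Hypothetical iterator that stores its rows together with a known permutation.
struct PresortedRows{T, P<:AbstractVector{Int}}
    rows::Vector{T}
    perm::P
end
SortPermStyle(::Type{<:PresortedRows}) = HasSortPerm()

Base.iterate(p::PresortedRows, state...) = iterate(p.rows, state...)
Base.length(p::PresortedRows) = length(p.rows)
Base.eltype(::Type{<:PresortedRows{T}}) where {T} = T

# Dispatch on the trait: reuse the permutation when it is known,
# otherwise materialize the iterator and sort it.
known_sortperm(itr) = known_sortperm(SortPermStyle(itr), itr)
known_sortperm(::HasSortPerm, itr) = itr.perm
known_sortperm(::NoSortPerm, itr) = sortperm(collect(itr))

data = [3, 1, 2]
presorted = PresortedRows(data, [2, 3, 1])
perm_known = known_sortperm(presorted)            # reuses the stored permutation
perm_computed = known_sortperm(x for x in data)   # has to collect and sort
```

With something like this, a grouping function could accept any iterator and only fall back to collecting when the trait says the permutation is not available.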
A concern is type piracy. For example, I would like to filter lazily and `reduce` in one go, but we can't add a `reduce` method for a general iterator as that's type piracy, so maybe we should define our own `AbstractRowIterator` for dispatch purposes. Then all lazy algorithms would return a subtype of `AbstractRowIterator`. Then maybe `RowIterator(t, select)` would iterate on `Rows(t, select)`.

We also need to think hard as to what this would mean for the distributed case. I don't have a strong intuition about that.
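
As a rough illustration of the `AbstractRowIterator` idea, the sketch below defines a package-owned abstract type and hangs `filter` and `reduce` methods on it, which is not piracy because the argument type is ours. This is only a sketch under assumptions: `LazyFilter` is a made-up name, and `RowIterator` here is a simplified one-argument stand-in that wraps a plain collection rather than the `RowIterator(t, select)` form mentioned above.

```julia
# Abstract type owned by the package, so extending Base functions on it is not piracy.
abstract type AbstractRowIterator end

# Simplified stand-in: wraps any collection of rows.
struct RowIterator{I} <: AbstractRowIterator
    rows::I
end
Base.iterate(r::RowIterator, state...) = iterate(r.rows, state...)
Base.length(r::RowIterator) = length(r.rows)
Base.eltype(::Type{RowIterator{I}}) where {I} = eltype(I)

# Hypothetical lazy filter: stores the predicate, does no work until iterated.
struct LazyFilter{F, I} <: AbstractRowIterator
    pred::F
    rows::I
end

function Base.iterate(f::LazyFilter, state...)
    next = iterate(f.rows, state...)
    while next !== nothing
        row, st = next
        f.pred(row) && return (row, st)   # first row that passes the predicate
        next = iterate(f.rows, st)
    end
    return nothing
end
Base.IteratorSize(::Type{<:LazyFilter}) = Base.SizeUnknown()
Base.eltype(::Type{<:LazyFilter{F, I}}) where {F, I} = eltype(I)

# Methods on our own abstract type: lazy algorithms return another AbstractRowIterator.
Base.filter(pred, itr::AbstractRowIterator) = LazyFilter(pred, itr)
Base.reduce(op, itr::AbstractRowIterator; kwargs...) = foldl(op, itr; kwargs...)

rows = RowIterator([(a = 1, b = 10), (a = 2, b = 20), (a = 3, b = 30)])
selected = filter(r -> r.a > 1, rows)                      # still lazy
total = reduce((acc, r) -> acc + r.b, selected; init = 0)  # single pass, gives 50
```

The filter and the reduction run in one pass over the rows, with no intermediate table being collected.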