Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use an Interface Instead of Channels for Enumerators #21

Open
marstr opened this issue Dec 21, 2021 · 1 comment
Open

Use an Interface Instead of Channels for Enumerators #21

marstr opened this issue Dec 21, 2021 · 1 comment
Milestone

Comments

@marstr
Copy link
Owner

marstr commented Dec 21, 2021

I wrote this library a few weeks after having written my first line of Go. I was very excited to play around with channels, and to use them to build concurrent pipelines that would allow people to easily write fast, organized, parallelized code. Right away, I used it to make a bunch of build tools for the Azure SDK for Go, and saw huge success with the approach. The time it took to build all of our packages from Open API specs dropped more than 50%, and I knew I was onto something. But almost as quickly, the flaws of this approach started showing themselves. Within a week, I was laying in bed one night and thought, "wait, what happens when I call collection.Any(...)? It starts an enumeration but doesn't finish it. How does the publishing goroutine know to stop?" Spoiler, it didn't. I went into the office, wrote a program that called collection.Any in a loop and it crashed my machine. Remember, these were my first days with Go (Go1.5 timeframe), and the context.Context pattern wasn't really around/popular yet. It was still early days, and I was confident I was the only one using this library, so I fixed the problem by making a breaking change by adding a <-chan struct{} to provide a cancellation signal and kept moving.

A little while later, a new problem presented itself: how do you handle errors? You could add an error to Enumerate(...)'s return signature. But that only has the capacity to indicate errors that happened right away as you are establishing the enumerator, not errors the crop up while the pipeline is processing. One could add a new struct like type EnumerationEntry struct { Payload interface{}, Err error} and use it in place of everywhere interface{} was being used. But then the whole thing is starting to feel sloppy and unwieldy. At the same time, the places where it would matter were becoming clear. What if you were wrapping a Web API that returns results in pages, and wanted to hide that complexity with an Enumerator? Every time you cross a page boundary you really are fairly likely to encounter a terminal error. But none of the places I was actually using enumerators were prone to this problem. And hugely benefited from the parallelism that my approach offered. Building pipelines to process data that's already in memory, or even just spinning off local processes was not prone to failure. Additionally, it was becoming less clear that I was the only user, as I was seeing a lot of clones of this repository (which I still see, thank you.) Without go modules, or even popular adoption of forebearers like godep, glide, etc I didn't feel like I could make a breaking changes. So I just decided I didn't have the tools to make a real change, and kept with what I had.

The world of Go has changed a lot since the initial development of this library. With go modules, I have the ability to occasionally publish breaking changes in a fairly responsible way. People I respect a lot, like Brian Ketelsen, have affirmed the problems above by saying to my face, "never return channels." So, now that Go is adding generics it's felt like a great opportunity to release a v2 and clean up some of the things that have been dogging this library, and me.

So the question is:

In V2, should the current Enumerator type:

type Enumerator[T any] <-chan T

be updated to something like:

type Enumerator[T any] interface{
    Next(ctx context.Context) (T, error)
}

The big pros:

  • Eliminates casual memory leaks.
  • Much improved error handling.

The cons:

  • Concurrent pipelines are hard again, pretty fundamentally changing the value of the code in query.go.
  • Enumerator would no longer work with the range operator.
  • Fluent pattern basically goes away. (Fluent isn't really idiomatic Go. So this one doesn't bother me, but I want to acknowledge it.)

Are there ways we could mitigate some of the cons by adding helper functions?

@marstr marstr added this to the v2.0.0 milestone Dec 21, 2021
@marstr
Copy link
Owner Author

marstr commented Dec 21, 2021

For v2.0.0, this question should at least be answered; knowing that if the current channel based approach is kept, there shouldn't be a v3 published for years.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant