Support for querying by a set of tags #651

Open
PerWiklander opened this issue Mar 27, 2022 · 11 comments

Comments

@PerWiklander

Short description

It would be nice to be able to query on more than one tag at a time.

Let's say I'm creating a projection that is interested in events tagged with "apple", "banana" and "orange".
I want to see them together in the order they were persisted.

Details

The proposed API is

readJournal.eventsByTags(
  Set(
    "apple",
    "banana",
    "orange",
  ),
  0L
)

and

readJournal.currentEventsByTags(
  Set(
    "apple",
    "banana",
    "orange",
  ),
  0L
)
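
To make the requested semantics concrete, here is a minimal in-memory sketch of the behaviour described above (an event matches if it carries any of the given tags, and results keep persistence order). It is not Akka code: `TaggedEvent`, the `journal` vector and this `currentEventsByTags` are hypothetical stand-ins, and the offset is assumed to be a simple global sequence number with an exclusive lower bound.

```scala
// Hypothetical in-memory model; not the Akka Persistence Query API.
final case class TaggedEvent(offset: Long, persistenceId: String, tags: Set[String], payload: String)

object EventsByTagsSketch {
  // Events in the order they were persisted (offset is assumed to be a global sequence number).
  val journal: Vector[TaggedEvent] = Vector(
    TaggedEvent(1L, "apple-1", Set("apple"), "AppleAdded"),
    TaggedEvent(2L, "pear-1", Set("pear"), "PearAdded"),
    TaggedEvent(3L, "banana-1", Set("banana"), "BananaAdded"),
    TaggedEvent(4L, "orange-1", Set("orange"), "OrangeAdded"),
    TaggedEvent(5L, "apple-1", Set("apple"), "AppleEaten")
  )

  // Keep an event if it carries at least one requested tag and its offset
  // is strictly greater than fromOffset; return matches in offset order.
  def currentEventsByTags(tags: Set[String], fromOffset: Long): Vector[TaggedEvent] =
    journal
      .filter(e => e.offset > fromOffset && e.tags.exists(tags.contains))
      .sortBy(_.offset)
}
```

Calling `EventsByTagsSketch.currentEventsByTags(Set("apple", "banana", "orange"), 0L)` returns the apple, banana and orange events interleaved in persistence order and skips the pear event.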

Alternatively, please inform me on how I am totally wrong and how a projection would never need to handle events from multiple sources :-)

@octonato
Member

I guess you want to consume events from different kinds of entities with a single projection in order to build an aggregated view. Is that correct?

I can also imagine that you want an OR, not an AND. Correct?
If you need an AND, I would simply use a distinct tag instead of three.

But back to the idea of having eventsByTags and currentEventsByTags: I think we don't have them for historical reasons. I may be wrong, but I guess the plugin API evolved from the Cassandra plugin, and such a query would mean that we could potentially have to query many Cassandra nodes.

Ultimately, that could be added to the API and we could decide that not all plugin implementations are required to implement it. Actually, that's the case already. We have the APIs and plugins implement them according to the capacity of the underlying database.
However, that would represent a non-trivial effort, and it is probably not worth it if there are other means to achieve the same result.

I'm not sure about your specific use case, but if I inferred it correctly, you have different kinds of entities, all producing events with different tags, and you want a projection capable of consuming all of them.

If that's the case, they all belong to the same application and I don't think it would be a bad design if they shared the same tag. At the end of the day, tags are here to allow us to build read-side views. If I needed to build a Fruit Basket View with different fruits, I would simply tag all my fruits with a fruit-basket tag. For example:

  • Apple entity tags with apple and fruit-basket
  • Banana entity tags with banana and fruit-basket
  • Orange entity tags with orange and fruit-basket

You could consume events from each fruit independently, but also consume all the fruit-basket events. Would that make sense for your use case?
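
A minimal sketch of that tagging scheme, using plain Scala collections rather than a real journal. `FruitEvent`, `tagsFor` and `withTag` are hypothetical names for illustration; in Akka Persistence Typed a function of this shape would typically be passed to `EventSourcedBehavior.withTagger`, which is not shown here.

```scala
object SharedTagSketch {
  final case class FruitEvent(entityType: String, payload: String)

  // Hypothetical tagger: every event keeps its entity-specific tag
  // and additionally gets the shared view tag.
  def tagsFor(event: FruitEvent): Set[String] =
    Set(event.entityType, "fruit-basket")

  val events: Vector[FruitEvent] = Vector(
    FruitEvent("apple", "AppleAdded"),
    FruitEvent("banana", "BananaAdded"),
    FruitEvent("orange", "OrangeAdded")
  )

  // Stand-in for a single-tag query: keep the events that carry the given tag.
  def withTag(tag: String): Vector[FruitEvent] =
    events.filter(e => tagsFor(e).contains(tag))
}
```

Here `SharedTagSketch.withTag("apple")` yields only the apple event, while `SharedTagSketch.withTag("fruit-basket")` yields all three in their original order.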

@PerWiklander
Author

The FruitBasket is indeed my intended use case. It would work to do as in your example.
The problem lies in the fact that projections come and go while the entities are more stable (at least in our application). Adding new tags for every potential combination of events that a new projection might be interested in is less stable. On top of that, old events need to be retagged accordingly, e.g. when 10 000 events have already been persisted before the FruitBasket concept is conceived.

Then there's the "need to know" factor. My fruits do not need to know they are part of an imagined basket.

@octonato
Member

octonato commented Apr 1, 2022

I understand your concern about entities not needing to know about each other.

On the other hand, if they need to be consumed preserving the ordering, then it sounds to me that the transactional boundary is not in the right place. It looks like, if you were using CRUD, you would have some joins and the different tables would be updated in the same transaction. But because of event sourcing and the split transaction boundary, the updates now happen in different transactions, and the ordering has to be reconstructed afterwards.

Or maybe the ordering is not that important and the requirement could be relaxed?
If you can relax it, you could have three projections writing to the same view. The view will be eventually consistent. It will have to accumulate data, but not show it until the view becomes 'presentable'.
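
As a rough illustration of why relaxing the ordering can work, the sketch below models a view whose updates commute, so the interleaving of the three projections does not affect the final state. Everything in it is an assumption made for the example: the `FruitAdded` event, the per-fruit totals used as the view, and the `presentable` rule (the view is shown once every expected fruit has reported).

```scala
object SharedViewSketch {
  final case class FruitAdded(fruit: String, quantity: Int)

  // The view only accumulates per-fruit totals, so updates are order-insensitive.
  type View = Map[String, Int]

  def applyEvent(view: View, event: FruitAdded): View =
    view.updated(event.fruit, view.getOrElse(event.fruit, 0) + event.quantity)

  // One event stream per projection.
  val appleEvents: Vector[FruitAdded] = Vector(FruitAdded("apple", 2), FruitAdded("apple", 1))
  val bananaEvents: Vector[FruitAdded] = Vector(FruitAdded("banana", 5))
  val orangeEvents: Vector[FruitAdded] = Vector(FruitAdded("orange", 3))

  // Two different interleavings of the three projections.
  val oneOrder: View =
    (appleEvents ++ bananaEvents ++ orangeEvents).foldLeft(Map.empty[String, Int])(applyEvent)
  val otherOrder: View =
    (orangeEvents ++ appleEvents.reverse ++ bananaEvents).foldLeft(Map.empty[String, Int])(applyEvent)

  // Hypothetical rule: the view is shown only once every fruit has contributed.
  def presentable(view: View): Boolean =
    Set("apple", "banana", "orange").subsetOf(view.keySet)
}
```

Both interleavings produce the same totals. If the view's updates do not commute (for example, "last event wins" fields), this approach does not apply as-is.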

Back to your idea of introducing search-by-tags, I think this will be non-trivial and maybe it will introduce a feature that's more of an edge case.

If you want to have such a query, I think the first option would be to try to get something alongside the Akka APIs that query your journal in the way you need. Like an alternative Query plugin. Then of course, once you have a working solution, we could discuss if this is something to be integrated back into Akka or not. WDYT?

@PerWiklander
Author

Thanks for the clarification. Our actions going forwards will be:

  1. Try to do

     "Or maybe the ordering is not that important and the requirement could be relaxed?
     If you can relax it, you could have three projections writing to the same view. The view will be eventually consistent. It will have to accumulate data, but not showing until some point where the view would become 'presentable'."

     I think this will work if we just analyse the transactions in more detail and find out what ACTUALLY happens. That usually does the trick :-)

  2. If we really need to, we will

     "try to get something alongside the Akka APIs that query your journal in the way you need. Like an alternative Query plugin."

     and of course that would be communicated back here, or to Akka proper.

IMHO this issue can be closed.

@patriknw
Member

patriknw commented Apr 1, 2022

@PerWiklander I'm just curious, what database do you use?

@PerWiklander
Author

For the projections? Postgres at the moment, but we will probably use Neo4j for most future views. We also have some purely in-memory views, for cases where the number of events per entity is low and there is no need to see the data more than a few minutes after it happened.

@patriknw
Member

patriknw commented Apr 1, 2022

If you use Postgres for the write side as well, I would recommend that you take a look at https://doc.akka.io/docs/akka-persistence-r2dbc/current/index.html

There we take a different approach to how to read events for Projections. Instead of tags we use something we call slices. You can read more about that in the documentation. It doesn't solve the feature request you asked for here, but the design might become more like a firehose with a stream of all events and then in memory filtering and routing when consuming those to build different views.
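
The firehose idea can be sketched with plain Scala collections: one ordered stream of all events, with each view selecting what it cares about in memory. This is only an illustration, not the akka-persistence-r2dbc API; the event types, the `firehose` vector and the two views are hypothetical.

```scala
object FirehoseSketch {
  sealed trait Event
  final case class AppleAdded(id: String) extends Event
  final case class BananaAdded(id: String) extends Event
  final case class InvoiceSent(id: String) extends Event

  // A single ordered stream of all events (stand-in for one query over the journal).
  val firehose: Vector[Event] = Vector(
    AppleAdded("a1"),
    InvoiceSent("i1"),
    BananaAdded("b1"),
    AppleAdded("a2")
  )

  // In-memory routing: each view declares the events it handles as a partial function.
  val fruitBasketView: PartialFunction[Event, String] = {
    case AppleAdded(id)  => s"apple:$id"
    case BananaAdded(id) => s"banana:$id"
  }

  val invoiceView: PartialFunction[Event, String] = {
    case InvoiceSent(id) => s"invoice:$id"
  }

  // collect keeps the stream order and drops events a view does not handle.
  val fruitBasket: Vector[String] = firehose.collect(fruitBasketView)
  val invoices: Vector[String] = firehose.collect(invoiceView)
}
```

Because every view reads the same ordered stream, the fruit basket view sees apple and banana events in the order they were persisted without needing a combined tag.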

@PerWiklander
Author

Oh! So more like getting an event stream and filtering in the client (the code I write) than to have n number of projections poll the database for "has anything happened on tag x since last time I checked?". That sounds... much better :-)

@PerWiklander
Author

But it looks like I misunderstood. currentEventsBySlices[MyEvent] is still per event type and takes an entityType: String (which looks like a tag), so the performance issue that is solved here is when compared to using tags for sharding, right?

Our use case is so small that we don't see a need for sharding at the moment. On the other hand we have many types of entities so we run many projections that mostly sit idle and poll for "has anything happened on this tag yet?".

@PerWiklander
Author

One more question, now that we are on the topic of akka-persistence-r2dbc:

Publish events for lower latency of eventsBySlices
...
The events must still be retrieved from the database, but at a lower polling frequency, because delivery of published messages are not guaranteed.

Does this imply that I risk missing events, or that the polling (that still happens but less often) would catch missed published events?

@patriknw
Member

patriknw commented Apr 1, 2022

You would typically have one entityType for each type of entity and that concept is indeed coupled to Cluster Sharding. That said, you can use one single entityType in your system. It's just the first part of the string persistenceId. See entityTypeHint in akka.persistence.typed.PersistenceId.apply.

Does this imply that I risk missing events, or that the polling (that still happens but less often) would catch missed published events?

The latter. It will not miss events. Duplicates are automatically filtered out before they are delivered.
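
A simplified model of how a lossy publish path and a slower but complete polling path can be combined without losing or repeating events. This is not the plugin's actual implementation; `Envelope`, `deliver` and the rule "accept only the next expected sequence number per persistence id" are assumptions made for this sketch.

```scala
object DedupSketch {
  final case class Envelope(persistenceId: String, seqNr: Long, payload: String)

  // Accept an envelope only if it is the next expected sequence number for its
  // persistence id. Lower numbers are duplicates; higher numbers reveal a gap
  // and are dropped, since the polling query returns them again later.
  def deliver(incoming: Seq[Envelope]): Vector[Envelope] = {
    val (_, delivered) =
      incoming.foldLeft((Map.empty[String, Long], Vector.empty[Envelope])) {
        case ((seen, out), env) =>
          val expected = seen.getOrElse(env.persistenceId, 0L) + 1L
          if (env.seqNr == expected) (seen.updated(env.persistenceId, env.seqNr), out :+ env)
          else (seen, out)
      }
    delivered
  }

  // Published event 2 is lost in transit; the later poll returns 1, 2 and 3.
  val arrivals: Vector[Envelope] = Vector(
    Envelope("apple-1", 1L, "AppleAdded"),  // published
    Envelope("apple-1", 3L, "AppleEaten"),  // published, arrives before 2 is known
    Envelope("apple-1", 1L, "AppleAdded"),  // polled (duplicate)
    Envelope("apple-1", 2L, "AppleWashed"), // polled (fills the gap)
    Envelope("apple-1", 3L, "AppleEaten")   // polled
  )
}
```

In this model `DedupSketch.deliver(DedupSketch.arrivals)` hands each of the three events to the consumer exactly once and in sequence order, even though one published message was lost.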
