Add Restate architecture documentation #100

tillrohrmann · 2023-08-02T17:16:00Z

This fixes #92.

netlify · 2023-08-02T17:16:05Z

✅ Deploy Preview for docsrestatedev ready!

Name	Link
🔨 Latest commit	`9ffae9f`
🔍 Latest deploy log	https://app.netlify.com/sites/docsrestatedev/deploys/64d0ecf3d8fb100008298dfd
😎 Deploy Preview	https://deploy-preview-100--docsrestatedev.netlify.app

igalshilman

Looks good, thanks @tillrohrmann !

gvdongen

Thank you Till! I think there are a few topics that we could still discuss here:

Journal: I think the journal warrants a dedicated section that explains how it enables suspension and replay, and that it logs not only invocations but also context calls.
How invocations work: the request goes to the ingress, state is eagerly attached together with the journal, the runtime knows where the service is running (service registry) and sets up the connection... Mention suspensions.
Service registry: the Metas keep a service registry based on discovery, so services no longer need to do this themselves. Their requests just go via the runtime.
Just some thoughts...
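
To make the journal point concrete, here is a minimal sketch of replay via journaling. It is only an illustration: `Context`, `side_effect`, and `run` are hypothetical names and not the Restate SDK API, and the journal is a plain in-memory list standing in for whatever the runtime stores durably.

```python
from typing import Any, Callable, List


class Context:
    """Hypothetical journaling context (an assumption, not the Restate SDK)."""

    def __init__(self, journal: List[Any]):
        self.journal = journal  # results of context calls that already completed
        self.position = 0       # index of the next journal entry to replay

    def side_effect(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.journal):
            # Replay: reuse the recorded result instead of running fn again.
            result = self.journal[self.position]
        else:
            # First execution: run fn and record its result in the journal.
            result = fn()
            self.journal.append(result)
        self.position += 1
        return result


def run(handler: Callable[[Context], Any], journal: List[Any]) -> Any:
    """Invoke the handler from the top, replaying whatever the journal holds."""
    return handler(Context(journal))


calls: List[str] = []


def charge() -> str:
    calls.append("charge")
    return "receipt-1"


def flaky_ship() -> str:
    calls.append("ship")
    if calls.count("ship") == 1:
        raise RuntimeError("simulated crash")
    return "shipped"


def handler(ctx: Context):
    receipt = ctx.side_effect(charge)
    status = ctx.side_effect(flaky_ship)
    return receipt, status


journal: List[Any] = []
try:
    run(handler, journal)          # first attempt crashes after charging
except RuntimeError:
    pass
outcome = run(handler, journal)    # retry replays the charge, then ships
```

In this sketch the retry starts the handler from the beginning, but `charge` is not executed a second time because its result is already in the journal. Resuming after a suspension would follow the same pattern: the handler stops, and a later invocation replays the journal up to the point where it left off.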

gvdongen · 2023-08-03T07:28:42Z

docs/restate/architecture.md

+The *Metas* are responsible for managing the service meta information and coordinating the *Workers*.
+
+The *Workers* are responsible for invoking services, storing their journal and service state as well as maintaining processing order.


We haven't introduced the term journal yet. It would probably be better to describe the responsibility from a higher-level perspective and then say that we accomplish it via a central journal.

I think the journal approach in general deserves a section here, to explain how we do all our magic: resiliency etc.

tillrohrmann · 2023-08-03T10:12:58Z

Thanks for the feedback @gvdongen. I will add sections for the journal and service invocation process.

tillrohrmann · 2023-08-03T16:01:08Z

I've pushed another commit including the description of durable execution via journaling, the service registry and the service invocation flow @gvdongen.

gvdongen

Thank you for adding the new sections @tillrohrmann
I think this adds a lot of useful information for the user!
I think the reading flow could be improved by shuffling some sections around... I would propose changing the order of the sections to

Durable execution via journaling (because from the user perspective this is the most important building block that he needs to understand)
Service invocation flow (includes service registry section... not sure if that would improve the reading flow)
Scalability
Consistency & fault tolerance (although in the mental model this belongs together with the journal for durable execution for me...)
State storage (include state queries into the section or maybe skip that for this page... For me this is more like a feature than an architecture component...)
What do you think?

gvdongen · 2023-08-04T07:39:55Z

docs/restate/architecture.md

+## Service registry
+
+All servie meta information is maintained by the *Metas* via the service registry.


Suggested change

-All servie meta information is maintained by the *Metas* via the service registry.
+All service meta information is maintained by the *Metas* via the service registry.

tillrohrmann · 2023-08-04T09:44:24Z

It seems that you have different expectations for the architecture page than I did, @gvdongen. My understanding was that this page describes the runtime's architecture (basic principles and design ideas) in order to give credibility to what we are doing (for example, that the runtime is built with scalability, consistency and fault tolerance in mind). Maybe you had Restate as a whole in mind (what are the basic concepts you as a user need to understand, how do things work end-to-end from a higher level)?

> Durable execution via journaling (because from the user perspective this is the most important building block that he needs to understand)

I am wondering whether durable execution belongs on the architecture page that describes how the runtime works, or whether it should sit closer to the "Services" section. Technically speaking, one could also implement durable execution by taking a memory snapshot, or it could be a pure SDK concept. What matters from the runtime's perspective is that the service endpoint can durably store bytes (not 100% correct, because the runtime also needs to understand a few commands such as calls or sleeps). Also, my description talks more about the service endpoint than about the runtime, which might be an indicator.

> Service invocation flow (includes service registry section... not sure if that would improve the reading flow)

Moving the service invocation flow up would mean that the definition of partitions and partition processors would only come later. It might also not be clear why one needs to route the invocation to the right Worker running a specific partition processor at this point.
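
To illustrate why routing depends on partitions, here is a rough sketch of mapping an invocation key to the Worker that runs its partition processor. The hash-based partitioning, the function names, and the `assignment` table are assumptions made for illustration only; the thread does not say how the runtime actually assigns keys to partitions.

```python
import hashlib
from typing import Dict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def worker_for(key: str, num_partitions: int, assignment: Dict[int, str]) -> str:
    """Look up the Worker that currently runs the key's partition processor."""
    return assignment[partition_for(key, num_partitions)]


# Hypothetical assignment of four partitions to three Workers.
assignment = {0: "worker-a", 1: "worker-b", 2: "worker-a", 3: "worker-c"}
target = worker_for("user-42", 4, assignment)
```

Because the mapping is deterministic, every invocation for the same key reaches the same partition processor, which is what lets a single processor maintain the processing order for that key.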

> Scalability
>
> Consistency & fault tolerance (although in the mental model this belongs together with the journal for durable execution for me...)

For me these are two separate concerns. What I want to describe here is how the runtime achieves consistency and fault tolerance (by running replicated state machines using Raft). What is built on top of it (durable execution via journaling) is certainly related, but it is just one way to achieve durable execution. If we could take a memory snapshot of the service endpoint, then storing these bytes would work equally well.

> State storage (include state queries into the section or maybe skip that for this page... For me this is more like a feature than an architecture component...)

I would like to keep the state query part because for me it is a major architectural component (exposing internal state via a SQL interface by running a SQL execution engine), and it is technically independent of the actual state storage.

tillrohrmann · 2023-08-04T10:11:48Z

I've pushed a commit that groups scalability and consistency & fault tolerance under principles and state storage, state query and service registry under components. Not sure whether this makes the reading experience easier.

gvdongen · 2023-08-04T16:22:53Z

First of all, sorry for being so difficult there... I think all the content here is good so feel free to merge it. I don't want to block this...

Besides that, it does seem that we have slightly different views of the scope of this page. I mainly saw it as a page where we describe how Restate makes sure that it can do what it does, so that it doesn't just seem like magic to users. I think what you have written so far is definitely content that should be covered there. But I saw this page as slightly broader: the architecture of Restate as a larger product, rather than one focused only on the runtime.

The way I saw it, was to have:

an overview page which is the docs landing page which basically lists the main features of Restate, how Restate sits in your stack and the key use cases (I am working on that)
an architecture page which gives more insight into how those features are accomplished. I know that from an implementation perspective some things sit closer to the SDK. But I think the user will see the split differently, namely as "my application logic" vs. Restate. From that perspective it is not important whether a feature is enabled by the SDK or by the runtime. That's why I saw this page as one that gives a bit more info on the key things which make Restate possible (central log, distributed runtime, RocksDB state store, consistency, and all the other topics you discussed)

Anyway, let's not block this on my feedback here. Because this page contains a lot of useful information for the user and we can always iterate to improve the story that we are telling across the docs 👍

tillrohrmann · 2023-08-04T16:52:27Z

I'll try to give it another pass to improve the overall reading experience by highlighting what matters most to users. If I don't manage to improve it, then we'll iterate on what we have in this PR here.

tillrohrmann · 2023-08-07T13:11:00Z

I've re-arranged the sections into:

1. Durable execution via journaling
2. Service invocation flow
3. Runtime
   a. Scalability
   b. Consistency
   c. Storage
   d. State query
   e. Service registry

tillrohrmann requested review from igalshilman and gvdongen August 2, 2023 17:16

igalshilman approved these changes Aug 2, 2023

gvdongen reviewed Aug 3, 2023

tillrohrmann requested a review from gvdongen August 3, 2023 16:00

gvdongen reviewed Aug 4, 2023

tillrohrmann added 3 commits August 7, 2023 15:08

Add Restate architecture documentation

06f326f

This fixes #92.

Add description for durable execution, service registry and service invocation flow

cc31ae7

Restructure architecture section

9ffae9f

This commit restructures the architecture section to start with durable execution and the service invocation flow. The runtime specific sections are now grouped under "Runtime".

tillrohrmann merged commit 9ffae9f into restatedev:main Aug 7, 2023
5 checks passed

tillrohrmann deleted the issue#92 branch August 7, 2023 13:11