Reinvention of Triggers - Signals #71912
Thanks for this thoughtful write-up!
Both of these use cases, and the ideas you describe about Streaming SQL APIs, sound very much like CockroachDB's CDC (Change Data Capture) feature, delivered via changefeeds. See the docs: https://www.cockroachlabs.com/docs/stable/stream-data-out-of-cockroachdb-using-changefeeds.html
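For concreteness, creating a changefeed looks roughly like this (table and sink names are placeholders):

```sql
-- Emit every INSERT/UPDATE/DELETE on mydb.sales to a Kafka sink.
-- `updated` attaches the MVCC commit timestamp to each message;
-- `resolved` emits periodic watermark messages.
CREATE CHANGEFEED FOR TABLE mydb.sales
  INTO 'kafka://broker:9092'
  WITH updated, resolved;
```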
Instead of implementing the SQL Trigger API first, I suggest that reinventing the underlying "data event" mechanisms should happen first, with a new API; the legacy SQL Trigger API can then be built on top of that.
Triggers AND external-queue systems (like RabbitMQ) needed a rethink anyway. Programmers want to use a real programming language to do this work, not procedural SQL. Given that CockroachDB is a distributed system, this is a good opportunity to re-evaluate the needs that SQL triggers and external-queue systems are satisfying, and to derive the underlying reasons and requirements.
[1] could be better handled with built-in column types for such cases. If the timestamp is already captured for each change in the engine (which I think it is), then such an UpdatedAt column only needs to surface that information. It would naturally be a read-only column. Nice. For UpdatedByID, this should also be handled internally by the engine; both the database user ID and an application-asserted ID should be recorded.
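As a sketch of the first half: recent CockroachDB versions can already maintain such a column inside the engine, and the MVCC commit timestamp is exposed as a hidden column (table and column names below are placeholders):

```sql
-- An engine-maintained last-modified column (CockroachDB's ON UPDATE
-- column expression); no trigger involved.
CREATE TABLE sales (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sale_amount DECIMAL NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now() ON UPDATE now()
);

-- The engine already records a commit timestamp per row version,
-- visible through the hidden crdb_internal_mvcc_timestamp column.
SELECT id, updated_at, crdb_internal_mvcc_timestamp FROM sales;
```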
[2] should also be handled by the system; it is essentially a simple log of transactions. But an API should be presented that enables digestion of that log into different forms, e.g. as sketched below.
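For instance, digestion into an audit-history form might look like this (hypothetical syntax; system_change_log is not a real CockroachDB function):

```sql
-- Hypothetical: digest the engine's transaction log for one table
-- into an audit-history relation.
SELECT txn_id, commit_ts, op, old_value, new_value
FROM system_change_log('mydb.sales')
WHERE commit_ts > now() - INTERVAL '1 day'
ORDER BY commit_ts;
```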
[3] The first step toward an "after" trigger should be limited only to "signalling": the CockroachDB cluster network communicates just the availability of new data for a particular table or partition (by row index). Any consideration of relaying the actual data should be postponed, because when a process is signalled, it should instead query to get the new data (according to defined JOINs and WHERE filters). The pattern is sketched below.
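A sketch of that signal-then-query pattern, using hypothetical subscription syntax (none of this exists in CockroachDB today; $last_seen_timestamp is a client-held placeholder):

```sql
-- Hypothetical: subscribe to a bare "new data exists" signal.
-- The signal carries no row data, only the fact of a change.
SUBSCRIBE SIGNAL Inserted ON mydb.Sales;

-- On wake-up, the consumer issues an ordinary query for the new rows.
SELECT s.*
FROM mydb.Sales AS s
WHERE s.SaleAmount > 10000
  AND s.UpdatedAt > $last_seen_timestamp;
```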
Signalling foundation
Even better, there should be a Streaming SQL API for this:

```sql
select * from StreamView_Sales_Inserted where SaleAmount > 10000;
```
In which case, the view is a system-defined stream-view that uses system-defined signals. That is, `StreamView_Sales_Inserted` would rely on the `Inserted` signal on the `mydb.Sales` table. The user should be able to directly use multiple system-defined stream-views to create custom composite stream-views, as sketched below. (But also, the user should be able to create stream-views from scratch, where the signals are referenced explicitly by the user (TBD).)
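A composite stream-view might look like this (hypothetical CREATE STREAM VIEW syntax; the Customers table and its columns are made up):

```sql
-- Hypothetical: a custom stream-view composed from a system-defined
-- stream-view, joined and filtered like an ordinary view.
CREATE STREAM VIEW BigRegionalSales AS
    SELECT s.*, c.Region
    FROM StreamView_Sales_Inserted AS s
    JOIN mydb.Customers AS c ON c.ID = s.CustomerID
    WHERE s.SaleAmount > 10000;
```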
Example usage:
Such a streaming SQL query should be able to use the same database connector API. For example, in C#, the `Read()` function would block until a new record was available from the server. The server would store the SQL query and other metadata about the results; upon a signal, it would re-run the query and send only the new records to the DB connection client. A simplistic timestamp mechanism could be used for windowing of results, which would also allow older records to be resent if they were updated; see the sketch below.
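A sketch of that server-side re-run under timestamp windowing (the hidden MVCC column is real in CockroachDB, but using it as the window key like this is speculative; $stream_high_water_mark is a server-held placeholder):

```sql
-- On each signal, re-run the registered query, restricted to row
-- versions newer than the stream's high-water mark. An updated row
-- gets a fresh timestamp, so it is naturally resent.
SELECT *, crdb_internal_mvcc_timestamp AS change_ts
FROM Sales
WHERE SaleAmount > 10000
  AND crdb_internal_mvcc_timestamp > $stream_high_water_mark;
```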
Conclusions
I still think that traditional SQL triggers would need to be supported; there will still be cases where such procedures must run upon INSERT/UPDATE/DELETE on the node executing the command. But I believe the bulk of the requirements can be shifted to better APIs and mechanisms, given what we know today about building software systems.
What do you think? Would this be an improvement on the traditional SQL trigger foundation?