[scd] properly cleanup implicit sub on oir update #1057

Shastick · 2024-07-18T20:40:02Z

This is a first stab at #1056, and covers two corner cases that currently result in dangling implicit subscriptions:

when an OIR previously linked to an implicit subscription is updated with a user-specified subscription
when an OIR previously linked to an implicit subscription is updated in such a way that a new implicit subscription is created.

The OIR upsert logic now check the following:

if a subscription was previously attached to the updated OIR
if the previous subscription was an implicit one
if the previous implicit subscription has no more dependent OIRs

We now delete subscriptions that meet all the conditions above.

Notes

This PR chooses the approach to maintain the re-creation of an implicit subscription when, in a request to update an OIR, parameters for an implicit subscription are provided.

Resolves #1056

Test plan: with this PR, the DSS is passing the test scenario check through which I discovered the problem, but which has not yet been added to the qualifier. (It will be available once interuss/monitoring#720 is merged)

barroco

Thank you @Shastick for taking care of it. This looks good to me since it only cleans up orphan subscription without changing the current response to USSP.
Let's open another issue to discuss if creating a new implicit subscription is the expected behaviour.
You will find minor comments inline.

pkg/scd/operational_intents_handler.go

barroco · 2024-07-23T14:40:51Z

pkg/scd/operational_intents_handler.go

+					return stacktrace.Propagate(err, "Could not determine if previous Subscription can be removed")
+				}
+
+				// We _could_ reuse the previous subscription ID if it was an implicit subscription,


Please add a reference to the ticket to keep a link with the design discussion/decision.

The comment does not entirely reflect my intention, I will reword it.

in short: when an update is done and the parameters for the creation of an implicit subscription are passed, my understanding is that we actually want to create a new implicit subscription.

Clients are allowed to change the subscription attached to an OIR, and from my understanding, I think it is valid to give the option to replace an existing subscription with an implicit one.

Clients that wish to reuse an existing subscription (whether or not it is an implicit one) should specify the existing subscription ID in the call parameters.

To be more precise: there was no design discussion about this – this is my interpretation of how the API should work.

If you think this needs discussing/clarifying we can either open an issue, discuss here or have a chat at the next weekly meeting.

when an update is done and the parameters for the creation of an implicit subscription are passed, my understanding is that we actually want to create a new implicit subscription.

Yes, when new_subscription is specified, a new implicit subscription should be created.

I think it is valid to give the option to replace an existing subscription with an implicit one.

I don't know of an actual use case for this, but I agree this is what should happen if someone specifies new_subscription on an op intent with an existing subscription. And, if that existing subscription was implicit and no longer has any dependent operational intents, then I would expect the existing subscription to be automatically deleted by the DSS since it was implicit (managed by DSS).

Clients that wish to reuse an existing subscription (whether or not it is an implicit one) should specify the existing subscription ID in the call parameters.

Agreed.

If you think this needs discussing/clarifying we can either open an issue, discuss here or have a chat at the next weekly meeting.

I don't think there is any ambiguity here so I don't think there's need for further discussion. However, if someone thinks otherwise then we can certainly have that discussion.

BenjaminPelletier · 2024-08-12T19:43:22Z

pkg/scd/operational_intents_handler.go

+			if len(dependentOps) == 0 {
+				return false, stacktrace.NewError("An implicit Subscription had no dependent OperationalIntents")
+			} else if len(dependentOps) == 1 {
+				return true, nil


We're assuming the one dependent operational intent reference is the one the caller is interested in, but that assumption is neither documented nor enforced. If we're just checking a particular subscription, then the answer shouldn't be "yes" or "no", it should be "only when removing this particular operational intent reference" (returning an optionally-nil pointer to an operational intent ID). Or, if we're checking whether a particular subscription can be removed after removing the dependency of an operational intent reference on that subscription, then the ID of the operational intent should be passed into this function and validated that the single dependent operational intent is the operational intent passed.

The operational intent ID was not needed previously because the context in which the code used to be located had just identified this subscription as the one the operational intent was using.

BenjaminPelletier · 2024-08-12T19:54:30Z

pkg/scd/operational_intents_handler.go

@@ -16,6 +16,42 @@ import (
 	"github.com/interuss/stacktrace"
 )

+// subscriptionCanBeRemoved will check if:
+// - a previous subscription was attached,


Attached to what? Neither this documentation nor the function signature mentions an operational intent. I think we should add an op intent ID to the function signature (see other comment) and then this will make more sense (a previous subscription was attached to the specified operational intent).

mickmis · 2024-08-22T14:24:15Z

pkg/scd/operational_intents_handler.go

@@ -490,6 +511,14 @@ func (a *Server) upsertOperationalIntentReference(ctx context.Context, authorize
 					}
 				}

+				removePreviousImplicitSubscription, err = subscriptionIsOnlyAttachedToOIR(ctx, r, &id, previousSubscriptionID)


Any reason why not deleting the subscription in this block rather than setting a flag? That would simplify the control flow.

At this point in the transaction the subscription cannot be deleted, because it is referenced by the operational intent.

The previous implicit subscription can only be removed once the OIR has been updated to point to another subscription.

If you delete it before the r.UpsertOperationalIntent(ctx, op) statement at line 719, you get an error.

Note to self: will need to confirm the above again, as I might have been confused by another error.

My current understanding is that foreign keys are enforced immediately in cockroach DB, and not at the time when the transaction is committed, which is why the subscription cleanup must happen after the OIR was updated. The doc seems to concur, though I have not checked if we set a non-default foreign key action on the relevant field.

The relevant extract:

If there are any existing references to the key being deleted, the transaction will fail at the end of the statement

mickmis · 2024-08-22T14:34:47Z

pkg/scd/operational_intents_handler.go

+		if err != nil {
+			return false, stacktrace.Propagate(err, "Unable to get OperationalIntent's Subscription from repo")
+		}
+		if sub == nil {


We retrieve the subscription ID from the operational intent out of the store. IIUC it is then not realistic for the subscription to not exist because the store will guarantee that to not happen.
IMO this also highlights that invoking GetSubscription here is not appropriatea: this should be done when previousSubscriptionID is retrieved, and then pass the subscription around and not just its ID.

We retrieve the subscription ID from the operational intent out of the store. IIUC it is then not realistic for the subscription to not exist because the store will guarantee that to not happen.

~~Operational intents are not guaranteed to have a subscription attached: OIR's in the 'ACCEPTED' state may well be created without a subscription.~~ Sorry, had to put myself back into the context of the PR.

At this point the sub should not be nil, but I think we should still check for this nil and report a clean error if it happens versus running into a nil pointer, in case we have a bug?

This check is taken verbatim from the old deletion logic that this method replaces.

IMO this also highlights that invoking GetSubscription here is not appropriatea: this should be done when previousSubscriptionID is retrieved, and then pass the subscription around and not just its ID.

I disagree: there are circumstances where we won't need to fetch the subscription. For example, when updating an OIR, if the request specifies the same subscription ID as before, there is no need to get it from the store.

mickmis · 2024-08-22T14:37:47Z

pkg/scd/operational_intents_handler.go

+				return false, stacktrace.Propagate(err, "Could not find dependent OperationalIntents")
+			}
+			if len(dependentOps) == 0 {
+				return false, stacktrace.NewError("An implicit Subscription had no dependent OperationalIntents")


Not sure if that's OK: when this new logic will be introduced, there will exist in the store some dangling implicit subscriptions.

That is correct: although, I essentially kept the logic as it existed before being factored into this method.

I'd rather fix this in a separate PR, as there might be some discussion about whether or not we should opportunistically fix such inconsistencies when we encounter them vs. noisily failing so we become aware of them.

mickmis · 2024-08-22T14:38:45Z

pkg/scd/operational_intents_handler.go

+			return false, stacktrace.NewError("OperationalIntent's Subscription missing from repo")
+		}
+
+		if sub.ImplicitSubscription {


Better use an early return if the subscription is explicit.

mickmis · 2024-08-23T13:23:22Z

I tried to build an overview of what should happen wrt subscription in the upsert.

Is the subscription ID provided in the request parameters the same as the subscription ID attached to the existing operational intent? (in both cases the subscription ID may be none)
If yes: no need to do anything.
If no we have the following cases:

	implicit old sub	explicit old sub	no old sub
implicit sub in params	delete old if no dependent OI	-	-
explicit sub in params	delete old if no dependent OI	-	-
no sub in params	delete old if no dependent OI	-	-
new implicit sub in params	create new implicit sub delete old if no dependent OI	create new implicit sub	create new implicit sub

At the moment we have a bit of a complex control flow for handling subscriptions updates, based first on whether the subscription ID is empty, but if the above is correct, the control flow should be as simple as (and also extensible to the deletion of operational intents):

if params.sub_id != old_oi.sub_id:
    if old_sub.implicit:
        delete_if_no_dependent_ois()
    if params.new_sub:
        create_new_implicit_sub()

If I'm correct a small refactoring could be appropriate to factor away this logic IMO. WDYT?

Shastick · 2024-08-23T13:43:10Z

Taking to draft as I need to handle another corner case.

Shastick · 2024-08-23T14:31:45Z

If I'm correct a small refactoring could be appropriate to factor away this logic IMO. WDYT?

Yes, it's actually possible to move the check out of the if block. The result does not yet feel entirely clean, but I'm not sure we can do better without doing a deeper refactoring

Shastick · 2024-08-29T12:18:30Z

@mickmis where do we stand on the need to refactor the UpdateOperationalIntentReference method? If needed, the refactor should certainly happen in another PR: should we do it before or after merging this one? How urgent is this with respect to the current open issues that are caused by the dangling OIRs?

Once I'm done with handling the OVN check PRs and corresponding scenarios I can take a look at the refactor, but I'm wondering if it is aligned with our current priorities?

mickmis

where do we stand on the need to refactor the UpdateOperationalIntentReference method? If needed, the refactor should certainly happen in another PR: should we do it before or after merging this one? How urgent is this with respect to the current open issues that are caused by the dangling OIRs?

Once I'm done with handling the OVN check PRs and corresponding scenarios I can take a look at the refactor, but I'm wondering if it is aligned with our current priorities?

After looking at it one more time, I do believe the change in this PR does what it is supposed to do. But I think it can and should be simplified more.

If not bound by any other constraint, IMO this function should be refactored before making such a change. Its control flow is already more complex than it should, and this change make that situation even worse. This makes it hard to identify tricky corner-cases, both at implementation and review. Also, the logic in this function is both one of the most complex and most central in the DSS.
Now, I'm not sure where we stand in terms of constraints and priorities. Do you have an opinion @BenjaminPelletier ?

pkg/scd/operational_intents_handler.go

mickmis · 2024-08-28T13:19:23Z

pkg/scd/operational_intents_handler.go

+			// A subscription has been specified:
+			// If it is different from the previous subscription, we need to fetch it from the store
+			// in order to ensure it correctly covers the OIR.
+			if effectiveSubscription == nil || previousSubIsBeingReplaced {


effectiveSubscription == nil will always be true here.

Actually now that I look at it again, I'm also not completely sure why this change here is needed? Was there previously an issue that is fixed by this change? Or is it just an optimization to avoid fetching twice the same subscription?

effectiveSubscription == nil will always be true here.

Thanks for catching. A previous revision was pre-setting the value and that ended up being even more confusing than it is now.

Actually now that I look at it again, I'm also not completely sure why this change here is needed? Was there previously an issue that is fixed by this change? Or is it just an optimization to avoid fetching twice the same subscription?

This is to avoid re-fetching the same subscription already fetched at line 496 in the if old.SubscriptionID != nil { block if that subscription is not being replaced. In cases where it is being replaced, we need to fetch the subscription that will eventually be attached to the OIR to verify that it properly covers the OIR.

It indeed ensures that we don't do unnecessary roundtrips.

Thanks for catching. A previous revision was pre-setting the value and that ended up being even more confusing than it is now.

(Well, as a good illustration of how convoluted the control logic got, my 'simplification' was too much and needs a rework)

This is to avoid re-fetching the same subscription already fetched at line 496 in the if old.SubscriptionID != nil { block if that subscription is not being replaced. In cases where it is being replaced, we need to fetch the subscription that will eventually be attached to the OIR to verify that it properly covers the OIR.

It indeed ensures that we don't do unnecessary roundtrips.

Could you add a comment mentioning that? That clarifies that this is not actually a separate case, but an optimization. Ideally that could be hidden behind a function but not sure if that's doable or makes sense here.

Shastick · 2024-08-29T19:15:14Z

After looking at it one more time, I do believe the change in this PR does what it is supposed to do. But I think it can and should be simplified more.

Do you mean that this PR can and should be simplified, or that the method should be simplified?

If not bound by any other constraint, IMO this function should be refactored before making such a change. Its control flow is already more complex than it should, and this change make that situation even worse. This makes it hard to identify tricky corner-cases, both at implementation and review. Also, the logic in this function is both one of the most complex and most central in the DSS. Now, I'm not sure where we stand in terms of constraints and priorities. Do you have an opinion @BenjaminPelletier ?

I personally see this as two separate issues:

one of correctness: is the code doing what it is supposed to do? So far we have relied on tests to convince ourselves of that, and we should continue to do so here. Furthermore we know that the DSS is currently not behaving as it should, and we have some evidence that this PR would address that. If we don't trust our tests enough to tell us if the DSS behaves as expected, we should probably start by adding tests until we reach that confidence.
one of maintainability: how long does it take to understand the code and to modify it, and how hard is it to add a feature or fix something without unintended consequences.

As such, I would see no issue in proceeding with adding something as long as it is correct (when we apply our usual standard for correctness) even if it worsens maintainability a little bit, if focusing on maintainability first means delaying the fix by an unknown amount of time.

I see three reasons for not waiting on a refactor to proceed with the fix:

I don't see a good path to simplifying things yet (we could have separate paths for OIR creation and update, maybe moving things to submethods will make the logic more tractable, maybe there's a major simplification that is possible and that we don't see, maybe we will disagree on the approach and need to align), so I expect that this might take a while
we plan to inline or push down some logic into the queries themselves rather than doing round-trips between the database and the Go logic. This in itself will deeply impact the relevant methods, and may cause us to remove or rework the simplifications we did to improve maintainability: any effort spent building the 'perfect' method now risks being thrown away if we don't combine it with pushing down the logic to the database.
longer term plans may involve moving away from an SQL/relational-based store, which will have an even deeper impact

I do however agree that the logic herein is pretty central to what the DSS does, and that it would deserve to be made as maintainable as possible as that will benefit us in the long term.

BenjaminPelletier · 2024-08-30T00:59:09Z

I do think fixing the correctness problem from #1056 (which I think this PR does) is important; it has impact in a number of areas (e.g., this table from #1074 which probably means #1074 wouldn't have been filed if this issue had been fixed earlier).

I have not reviewed the full content of this PR in particular because Committer review time is limited and theoretically we should all have fairly similar overall thoughts (when we don't, that's a good candidate for additional documentation in the TSC repo). I (and/or @barroco) can take a more holistic look to weigh in on the inherently-gray tradeoffs between correctness, maintainability, and timeliness, though I think in an ideal world (which indeed we don't live in, unfortunately) a good result could be worked out with the reviewer(s) already most familiar with the content.

I don't know if this general thought is applicable to this specific case, but often complexity can be managed by factoring more sections of a large function into their own functions with their own documentation, signatures, and encapsulated behavior (as has already been done once in this PR). This lets the core of the complex function look more like brief pseudocode, and therefore be more understandable at a glance (like @mickmis's pseudocode).

We don't need this fix tomorrow, so we can still iterate on the most maintainable version, but I agree we don't want that to take forever. If anyone thinks we could do a better job at picking tradeoffs, that would be a great topic for the Tuesday meeting, and probably worth giving a heads up on Slack 1-2 days beforehand that we'll want to discuss that so anyone who wants to can pre-read the background info (e.g., this PR) to be better prepared to discuss real time.

Shastick · 2024-08-30T06:38:25Z

Given that

We don't need this fix tomorrow, so we can still iterate on the most maintainable version, but I agree we don't want that to take forever.

I'm happy to start with encapsulating things into submethods and see what kind of results I can come up with today.

edit: created #1088 so we can discuss the refactor-specific things there.

mickmis · 2024-08-30T11:07:35Z

I personally see this as two separate issues:

one of correctness: is the code doing what it is supposed to do? So far we have relied on tests to convince ourselves of that, and we should continue to do so here. Furthermore we know that the DSS is currently not behaving as it should, and we have some evidence that this PR would address that. If we don't trust our tests enough to tell us if the DSS behaves as expected, we should probably start by adding tests until we reach that confidence.

one of maintainability: how long does it take to understand the code and to modify it, and how hard is it to add a feature or fix something without unintended consequences.

As such, I would see no issue in proceeding with adding something as long as it is correct (when we apply our usual standard for correctness) even if it worsens maintainability a little bit, if focusing on maintainability first means delaying the fix by an unknown amount of time.

You make great points here, thanks for the response, I'd just add that for correctness we need understanding in addition to testing (which is what I struggled to reach).

All in all I think that what you ended up doing @Shastick (i.e. timeboxing some enhancements) is a good compromise.

(thanks for the feedback @BenjaminPelletier)

Shastick · 2024-09-06T15:16:07Z

Closing this PR to open a new clean one, since so much of original base branch has changed.

See #1104 for the continuation

This is a rework of interuss#1057 after most changes for interuss#1088 have landed. The DSS built from this PR successfully passed the new qualifier's implicit subscription handling scenario, although that scenario still needs to be extended to cover all corner cases. The PR is open for review, as it addresses interuss#1056

This is a rework of #1057 after most changes for #1088 have landed. The DSS built from this PR successfully passed the new qualifier's implicit subscription handling scenario, although that scenario still needs to be extended to cover all corner cases. The PR is open for review, as it addresses #1056

Shastick mentioned this pull request Jul 19, 2024

[uss_qualifier] check that implicit SCD subscriptions are properly managed interuss/monitoring#720

Merged

barroco approved these changes Jul 23, 2024

View reviewed changes

This was referenced Jul 25, 2024

[scd] push down implicit subscription cleanup to Database, if possible #1059

Open

[scd] inline cleanup of implicit subscription into CRDB call when removing OIR #1060

Closed

BenjaminPelletier reviewed Aug 12, 2024

View reviewed changes

Shastick force-pushed the fix-implicit-sub-oir-update branch 2 times, most recently from ddf25c7 to 67d531c Compare August 13, 2024 07:22

Shastick requested a review from BenjaminPelletier August 13, 2024 08:26

mickmis reviewed Aug 22, 2024

View reviewed changes

BenjaminPelletier mentioned this pull request Aug 22, 2024

Implement TTLs on ended SCD subscriptions to prevent unbounded growth and latency increases #1074

Closed

Shastick force-pushed the fix-implicit-sub-oir-update branch from 67d531c to cfc4895 Compare August 23, 2024 12:20

Shastick marked this pull request as draft August 23, 2024 13:42

Shastick force-pushed the fix-implicit-sub-oir-update branch 2 times, most recently from ee0fdf1 to a917152 Compare August 23, 2024 14:30

Shastick force-pushed the fix-implicit-sub-oir-update branch 3 times, most recently from dbf7c86 to f4948b8 Compare August 23, 2024 14:47

Shastick marked this pull request as ready for review August 23, 2024 14:48

Shastick marked this pull request as draft August 23, 2024 14:53

Shastick force-pushed the fix-implicit-sub-oir-update branch 2 times, most recently from cbc2b0f to c917f3a Compare August 23, 2024 16:09

Shastick marked this pull request as ready for review August 23, 2024 16:09

Shastick force-pushed the fix-implicit-sub-oir-update branch 2 times, most recently from 229c4a2 to 7bf4b55 Compare August 27, 2024 12:27

mickmis reviewed Aug 29, 2024

View reviewed changes

Shastick force-pushed the fix-implicit-sub-oir-update branch from 7bf4b55 to 5f0dab6 Compare August 29, 2024 19:15

Shastick added 3 commits August 29, 2024 21:19

[scd] properly cleanup implicit sub on oir update

6731893

Comments

e2dc466

rename and make subscriptionCanBeRemoved more specific

efbb391

Shastick force-pushed the fix-implicit-sub-oir-update branch 3 times, most recently from 6584b7a to 1c4e1ae Compare August 29, 2024 19:32

Shastick marked this pull request as draft August 29, 2024 19:34

second round of comments, handle another corner case

9ca5e54

Shastick force-pushed the fix-implicit-sub-oir-update branch from 1c4e1ae to 9ca5e54 Compare August 29, 2024 19:44

Shastick marked this pull request as ready for review August 29, 2024 19:46

Shastick mentioned this pull request Aug 30, 2024

[maintenance] scd improve readability of OIR upsertion logic and OIR subscription handling #1088

Open

13 tasks

Shastick marked this pull request as draft August 30, 2024 14:14

Shastick closed this Sep 6, 2024

Shastick mentioned this pull request Sep 6, 2024

[scd] properly cleanup implicit sub on oir update #1104

Merged

Shastick mentioned this pull request Sep 9, 2024

[scd] properly cleanup implicit sub on oir update #1109

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[scd] properly cleanup implicit sub on oir update #1057

[scd] properly cleanup implicit sub on oir update #1057

Shastick commented Jul 18, 2024 •

edited

Loading

barroco left a comment •

edited

Loading

barroco Jul 23, 2024

Shastick Jul 25, 2024

Shastick Jul 25, 2024

BenjaminPelletier Aug 12, 2024

BenjaminPelletier Aug 12, 2024

BenjaminPelletier Aug 12, 2024

mickmis Aug 22, 2024

Shastick Aug 23, 2024 •

edited

Loading

Shastick Aug 30, 2024 •

edited

Loading

mickmis Aug 22, 2024

Shastick Aug 23, 2024 •

edited

Loading

Shastick Aug 23, 2024

mickmis Aug 22, 2024

Shastick Aug 23, 2024

mickmis Aug 22, 2024

mickmis commented Aug 23, 2024

Shastick commented Aug 23, 2024

Shastick commented Aug 23, 2024

Shastick commented Aug 29, 2024

mickmis left a comment

mickmis Aug 28, 2024

mickmis Aug 29, 2024

Shastick Aug 29, 2024

Shastick Aug 29, 2024

mickmis Aug 30, 2024

Shastick commented Aug 29, 2024

BenjaminPelletier commented Aug 30, 2024

Shastick commented Aug 30, 2024 •

edited

Loading

mickmis commented Aug 30, 2024

Shastick commented Sep 6, 2024 •

edited

Loading

[scd] properly cleanup implicit sub on oir update #1057

[scd] properly cleanup implicit sub on oir update #1057

Conversation

Shastick commented Jul 18, 2024 • edited Loading

Notes

barroco left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shastick Aug 23, 2024 • edited Loading

Choose a reason for hiding this comment

Shastick Aug 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shastick Aug 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mickmis commented Aug 23, 2024

Shastick commented Aug 23, 2024

Shastick commented Aug 23, 2024

Shastick commented Aug 29, 2024

mickmis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shastick commented Aug 29, 2024

BenjaminPelletier commented Aug 30, 2024

Shastick commented Aug 30, 2024 • edited Loading

mickmis commented Aug 30, 2024

Shastick commented Sep 6, 2024 • edited Loading

Shastick commented Jul 18, 2024 •

edited

Loading

barroco left a comment •

edited

Loading

Shastick Aug 23, 2024 •

edited

Loading

Shastick Aug 30, 2024 •

edited

Loading

Shastick Aug 23, 2024 •

edited

Loading

Shastick commented Aug 30, 2024 •

edited

Loading

Shastick commented Sep 6, 2024 •

edited

Loading