Entity deregistration API proposals #2779

albinsuresh · 2024-03-14T11:31:31Z

Proposed changes

Entity deregistration API proposals with pros and cons of each approach.

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

didier-wenzek

For the following reason, I prefer a solution where the user triggers the de-registration by clearing retained messages and thin-edge takes care of cascading the de-registration to avoid orphans.

Symmetry. Child devices, services and capabilities are registered by publishing retained messages. The same mechanism should be used to de-register these entities. At least to trigger the de-registration. What is done by the proposed tedge deregister command can be done by the mapper or the agent on reception of an empty message on te/device/child01//.
Completeness. A new tedge deregister command is call for a tedge register command. Similarly, if we opt for an HTTP DELETE method to de-register an entity, I would expect an HTTP PUT for entity registration. And even an HTTP GET to see the current state.
Robustness. As retained message are used to store entities metadata and relationship, thin-edge must handle properly clearing retained messages. i.e. cascading clearing messages from a child devices to all its metadata, services and capabilities.

There are open questions though:

As highlighted, in the 3rd proposal, it might be better that the agent and not the mapper triggers the appropriate cascade of de-registration. However, currently, the agent has no view on the entities.
Cascading the removal of all grand-child devices on child-device de-registration might be error prone. Can these grand-child devices attached to the parent of the child-device that has been removed?

didier-wenzek · 2024-03-14T17:28:58Z

design/decisions/0004-entity-dregister-api.md

+
+**Cons**
+
+* Deregistrations can't be done remotely, unless `tedge` is installed on those remote device as well.


Why? As far I understand the proposal, this tedge deregister command uses MQTT to fetch and clear the relevant topics and the mapper does the clean-up on disk and on the cloud.

What I meant is that, if the API to deregister the entities is provided only as a tedge command, then it can only be invoked from a device where tedge is installed, even though the command internally uses MQTT only. That's why it is still possible to do this remotely as well, as long as tedge is installed on that remote device as well.

I still see value for having a user friendly tedge deregister command, however that should just use the MQTT or HTTP API that we provide to deregister an entity. The command should not have to subscribe to messages itself.

design/decisions/0004-entity-dregister-api.md

didier-wenzek · 2024-03-14T17:34:34Z

design/decisions/0004-entity-dregister-api.md

+mosquitto_sub --remove-retained -t "te/device/child01/+/+/#"
+```
+
+The mappers must also be updated to remove entities from their entity store on receipt of these clear messages.


Don't the mappers have to do that in any case?

Currently, the mapper just ignores empty registration messages. So, this mapper change is a must-have regardless of the solution that we pick.

Since the mappers are responsible for the interactions with the clouds, they should also be configurable when removing the device from the cloud as sometimes you only want to delete the local data and want to keep the data in the cloud (e.g. measurements from a decommissioned device), and since not deleting the data from the cloud is less destructive, then that should be the default (to avoid unexpected data loss).

didier-wenzek · 2024-03-14T17:41:02Z

design/decisions/0004-entity-dregister-api.md

+The responsibility of issuing the clear messages is left with the agent instead of the mapper,
+which already has its own entity store, to prevent multiple mappers from doing the same simultaneously.


That's an interesting point. It would be indeed better to avoid duplicates of clean-up messages when two mappers are running. That said would this be an issue? Clearing a retained message is idempotent as well as all the clearing operations on-disk and in the cloud.

Yeah, multiple mappers issuing clear messages is not really a blocker issue or anything, considering their idempotent nature.

There is also the ownership aspect of that responsibility. If it's with tedge-agent, then we provide it out of the box irrespective of the mapper used. But, if we leave it to the mapper, then we need to replicate it in every mapper (of course we can reuse the logic with our own mappers), but even third-party mappers will have to support the same from their end. That's why I was more in favour of our own component owning that responsibility, and the mapper just being a mapper of those clear messages.

There is also the ownership aspect of that responsibility. If it's with tedge-agent, then we provide it out of the box irrespective of the mapper used. But, if we leave it to the mapper, then we need to replicate it in every mapper (of course we can reuse the logic with our own mappers), but even third-party mappers will have to support the same from their end. That's why I was more in favour of our own component owning that responsibility, and the mapper just being a mapper of those clear messages.

This makes sense. I wonder if this could be then pushed a bit further by also moving the auto-registration logic into the agent. The latter will be then responsible of maintaining a consistent view of the device entities.

Maybe we could use more explicit "Death" messages (sparkplug b spec has both BIRTH and DEATH messages to communicate when an entity is being added or removed), then the mapper could react to the explicit "death" message (to unregister the appropriate entity)

Some examples of the message provided by sparkplug B

albinsuresh · 2024-03-15T06:30:59Z

For the following reason, I prefer a solution where the user triggers the de-registration by clearing retained messages and thin-edge takes care of cascading the de-registration to avoid orphans.

I'm completely aligned with that thought on symmetry, and using empty registration message on the same topic id was the first idea that I started with as well.

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

If we change it from an empty retained message to a non-retained message with the filtering options as well, then it kinda beats the symmetry aspect and complicates the non-empty payload handling as well (parse and then determine if the payload is a registration message or a deregister message).

The second reason why I modelled it specifically as a req was to open up the possibility of a counterpart res where we can send the final status of the deregister operation. That also would have been weird, if we reused the same device topic id for deregistration as well, as the status would have to be sent to a completely independent topic anyway.

Similarly, if we opt for an HTTP DELETE method to de-register an entity, I would expect an HTTP PUT for entity registration. And even an HTTP GET to see the current state.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

Cascading the removal of all grand-child devices on child-device de-registration might be error prone. Can these grand-child devices attached to the parent of the child-device that has been removed?

But, isn't that the most common use-case? When I delete a child device from C8y, I almost always want its entire child hierarchy also gone with it, as cleanup is usually my primary goal. So, I definitely would want that as the default behaviour as I can't really imagine the use-case for attaching a child device's sensors (nested child devices) or services to a parent device and what would that even mean, at a physical deployment level. On the cloud also, esp C8y, migrating parents is going to be a complex operation. But, open to explore those potential use-cases and provide this as an extra option.

didier-wenzek · 2024-03-15T08:18:01Z

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

I appreciate the benefits but wonder if it's worth it. Indeed, this will add quite a lot of complexity (for the user and in the code), for something that is not a daily use-case.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

You are correct. See thin-edge/thin-edge.io-roadmap#75. However, I would take a different approach: building the HTTP API on top of the MQTT API, exposing all the te topics, using PUT and GET for retained messages, and POST to published telemetry data.

reubenmiller

In general I would be in favour of the MQTT interface being the primary way to deregister entities, and the other two could just be lightweight interfaces to the MQTT API (that we could implement piece by piece after the MQTT API is implemented).

reubenmiller · 2024-03-18T09:08:34Z

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

I appreciate the benefits but wonder if it's worth it. Indeed, this will add quite a lot of complexity (for the user and in the code), for something that is not a daily use-case.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

You are correct. See thin-edge/thin-edge.io-roadmap#75. However, I would take a different approach: building the HTTP API on top of the MQTT API, exposing all the te topics, using PUT and GET for retained messages, and POST to published telemetry data.

We will need to have options regarding the deregistration in the near future so that users can control whether the device should also be deleted from the cloud or not, so designing around this criteria will be useful.

Using the <topic_id>/cmd/deregister/local-1234 topic will be beneficial as well it will ensure that the deregistration can be resumed if it is interrupted half-way through (as the cmd use the retained flag), and also have a clear status mechanism that could also be tracked by interested clients.

didier-wenzek

I'm not really happy with the asymmetry introduced here:

registration done using declarative retained messages (describing a view over the entities),
deregistration done using procedural commands (using retained messages, sure, but for a different purpose).

This will be a source of confusion for the users but also for the implementers. This is already proven to be a source of problems with the mix of auto-registration and user-defined MQTT schema.

Can we:

use empty retain messages as the main mechanism to prune an entity and all its sub-entities
add a deregister command to cherry pick the entities to be removed and publish the appropriate empty retain messages.
deprecate auto-registration in favor of a register command similar/symmetric to the proposed deregister command.

design/decisions/0004-entity-dregister-api.md

didier-wenzek · 2024-04-03T08:40:40Z

design/decisions/0004-entity-dregister-api.md

+
+This message will result in the deregistration of that target entity and all the nested entities (services and child devices) linked to it.
+
+Different `tedge` components will react to the deregistration messages as follows:


So, if I understand correctly:

The user emits a single MQTT message to prune a hierarchy of entities.

This clearing message is an empty retained message published on the root topic of the pruned hierarchy

The mappers propagate entity removal to the cloud, pruning all attached entities (services, child devices, twin metadata, capabilities, ...)

The agent has to responsibility to clear the local view of the entities. It cascades clearing messages along the pruned hierarchy ensuring these sub-entities will not be re-created on mapper restart.

This is okay, but we need to be aware that combined with auto-registration this might lead to strange hierarchy if the device is restarted before the agent gets a chance to prune all the sub-entities. Some will be re-created by the mapper but using the auto-registration logic.

didier-wenzek

Approved.

However, as said in this comment we will have to re-think auto-registration in combination with deregistration.

github-actions · 2024-04-25T05:49:35Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %	⏱️ Duration
429	0	3	429	100	0s

albinsuresh requested review from reubenmiller and didier-wenzek as code owners March 14, 2024 11:31

albinsuresh had a problem deploying to Test Pull Request March 14, 2024 11:31 — with GitHub Actions Failure

didier-wenzek reviewed Mar 14, 2024

View reviewed changes

reubenmiller reviewed Mar 15, 2024

View reviewed changes

reubenmiller added theme:mqtt Theme: mqtt and mosquitto related topics theme:registration Theme: Device registration and device certificate related topics labels Mar 15, 2024

albinsuresh had a problem deploying to Test Pull Request March 22, 2024 06:18 — with GitHub Actions Failure

albinsuresh had a problem deploying to Test Pull Request March 22, 2024 06:23 — with GitHub Actions Failure

albinsuresh requested review from reubenmiller and didier-wenzek March 22, 2024 06:32

didier-wenzek reviewed Mar 25, 2024

View reviewed changes

design/decisions/0004-entity-dregister-api.md Show resolved Hide resolved

design/decisions/0004-entity-dregister-api.md Outdated Show resolved Hide resolved

albinsuresh had a problem deploying to Test Pull Request March 29, 2024 12:07 — with GitHub Actions Failure

didier-wenzek reviewed Apr 3, 2024

View reviewed changes

albinsuresh force-pushed the feat/delete-entity-api branch from 61ac7e0 to cfa79cb Compare April 22, 2024 19:31

albinsuresh had a problem deploying to Test Pull Request April 22, 2024 19:31 — with GitHub Actions Failure

didier-wenzek approved these changes Apr 23, 2024

View reviewed changes

albinsuresh force-pushed the feat/delete-entity-api branch from cfa79cb to bc13827 Compare April 25, 2024 05:22

albinsuresh had a problem deploying to Test Pull Request April 25, 2024 05:22 — with GitHub Actions Failure

Entity deregistration API proposals

3d60970

albinsuresh force-pushed the feat/delete-entity-api branch from bc13827 to 3d60970 Compare April 25, 2024 05:23

albinsuresh temporarily deployed to Test Pull Request April 25, 2024 05:23 — with GitHub Actions Inactive

albinsuresh temporarily deployed to Test Auto April 25, 2024 05:39 — with GitHub Actions Inactive

albinsuresh enabled auto-merge April 25, 2024 05:46

albinsuresh added this pull request to the merge queue Apr 25, 2024

Merged via the queue into thin-edge:main with commit 329fedd Apr 25, 2024
30 checks passed

albinsuresh deleted the feat/delete-entity-api branch April 25, 2024 18:11

reubenmiller mentioned this pull request Sep 30, 2024

How to clear deleted containers? thin-edge/tedge-container-plugin#49

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Entity deregistration API proposals #2779

Entity deregistration API proposals #2779

albinsuresh commented Mar 14, 2024 •

edited

Loading

didier-wenzek left a comment

didier-wenzek Mar 14, 2024

albinsuresh Mar 15, 2024

reubenmiller Mar 15, 2024

didier-wenzek Mar 14, 2024

albinsuresh Mar 15, 2024

reubenmiller Mar 15, 2024

didier-wenzek Mar 14, 2024

albinsuresh Mar 15, 2024

didier-wenzek Mar 15, 2024

reubenmiller Mar 15, 2024

albinsuresh commented Mar 15, 2024

didier-wenzek commented Mar 15, 2024

reubenmiller left a comment

reubenmiller commented Mar 18, 2024

didier-wenzek left a comment

didier-wenzek Apr 3, 2024

didier-wenzek left a comment

github-actions bot commented Apr 25, 2024


		Cons

		* Deregistrations can't be done remotely, unless `tedge` is installed on those remote device as well.

		The responsibility of issuing the clear messages is left with the agent instead of the mapper,
		which already has its own entity store, to prevent multiple mappers from doing the same simultaneously.


		This message will result in the deregistration of that target entity and all the nested entities (services and child devices) linked to it.

		Different `tedge` components will react to the deregistration messages as follows:

Entity deregistration API proposals #2779

Entity deregistration API proposals #2779

Conversation

albinsuresh commented Mar 14, 2024 • edited Loading

Proposed changes

Types of changes

Paste Link to the issue

Checklist

Further comments

didier-wenzek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

albinsuresh commented Mar 15, 2024

didier-wenzek commented Mar 15, 2024

reubenmiller left a comment

Choose a reason for hiding this comment

reubenmiller commented Mar 18, 2024

didier-wenzek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

github-actions bot commented Apr 25, 2024

Robot Results

albinsuresh commented Mar 14, 2024 •

edited

Loading