Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entity deregistration API proposals #2779

Merged
merged 1 commit into from
Apr 25, 2024

Conversation

albinsuresh
Copy link
Contributor

@albinsuresh albinsuresh commented Mar 14, 2024

Proposed changes

Entity deregistration API proposals with pros and cons of each approach.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the following reason, I prefer a solution where the user triggers the de-registration by clearing retained messages and thin-edge takes care of cascading the de-registration to avoid orphans.

  1. Symmetry. Child devices, services and capabilities are registered by publishing retained messages. The same mechanism should be used to de-register these entities. At least to trigger the de-registration. What is done by the proposed tedge deregister command can be done by the mapper or the agent on reception of an empty message on te/device/child01//.
  2. Completeness. A new tedge deregister command is call for a tedge register command. Similarly, if we opt for an HTTP DELETE method to de-register an entity, I would expect an HTTP PUT for entity registration. And even an HTTP GET to see the current state.
  3. Robustness. As retained message are used to store entities metadata and relationship, thin-edge must handle properly clearing retained messages. i.e. cascading clearing messages from a child devices to all its metadata, services and capabilities.

There are open questions though:

  1. As highlighted, in the 3rd proposal, it might be better that the agent and not the mapper triggers the appropriate cascade of de-registration. However, currently, the agent has no view on the entities.
  2. Cascading the removal of all grand-child devices on child-device de-registration might be error prone. Can these grand-child devices attached to the parent of the child-device that has been removed?


**Cons**

* Deregistrations can't be done remotely, unless `tedge` is installed on those remote device as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? As far I understand the proposal, this tedge deregister command uses MQTT to fetch and clear the relevant topics and the mapper does the clean-up on disk and on the cloud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that, if the API to deregister the entities is provided only as a tedge command, then it can only be invoked from a device where tedge is installed, even though the command internally uses MQTT only. That's why it is still possible to do this remotely as well, as long as tedge is installed on that remote device as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see value for having a user friendly tedge deregister command, however that should just use the MQTT or HTTP API that we provide to deregister an entity. The command should not have to subscribe to messages itself.

design/decisions/0004-entity-dregister-api.md Outdated Show resolved Hide resolved
mosquitto_sub --remove-retained -t "te/device/child01/+/+/#"
```

The mappers must also be updated to remove entities from their entity store on receipt of these clear messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the mappers have to do that in any case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the mapper just ignores empty registration messages. So, this mapper change is a must-have regardless of the solution that we pick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the mappers are responsible for the interactions with the clouds, they should also be configurable when removing the device from the cloud as sometimes you only want to delete the local data and want to keep the data in the cloud (e.g. measurements from a decommissioned device), and since not deleting the data from the cloud is less destructive, then that should be the default (to avoid unexpected data loss).

Comment on lines +100 to +154
The responsibility of issuing the clear messages is left with the agent instead of the mapper,
which already has its own entity store, to prevent multiple mappers from doing the same simultaneously.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting point. It would be indeed better to avoid duplicates of clean-up messages when two mappers are running. That said would this be an issue? Clearing a retained message is idempotent as well as all the clearing operations on-disk and in the cloud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, multiple mappers issuing clear messages is not really a blocker issue or anything, considering their idempotent nature.

There is also the ownership aspect of that responsibility. If it's with tedge-agent, then we provide it out of the box irrespective of the mapper used. But, if we leave it to the mapper, then we need to replicate it in every mapper (of course we can reuse the logic with our own mappers), but even third-party mappers will have to support the same from their end. That's why I was more in favour of our own component owning that responsibility, and the mapper just being a mapper of those clear messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the ownership aspect of that responsibility. If it's with tedge-agent, then we provide it out of the box irrespective of the mapper used. But, if we leave it to the mapper, then we need to replicate it in every mapper (of course we can reuse the logic with our own mappers), but even third-party mappers will have to support the same from their end. That's why I was more in favour of our own component owning that responsibility, and the mapper just being a mapper of those clear messages.

This makes sense. I wonder if this could be then pushed a bit further by also moving the auto-registration logic into the agent. The latter will be then responsible of maintaining a consistent view of the device entities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use more explicit "Death" messages (sparkplug b spec has both BIRTH and DEATH messages to communicate when an entity is being added or removed), then the mapper could react to the explicit "death" message (to unregister the appropriate entity)

Some examples of the message provided by sparkplug B

@albinsuresh
Copy link
Contributor Author

For the following reason, I prefer a solution where the user triggers the de-registration by clearing retained messages and thin-edge takes care of cascading the de-registration to avoid orphans.

I'm completely aligned with that thought on symmetry, and using empty registration message on the same topic id was the first idea that I started with as well.

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

If we change it from an empty retained message to a non-retained message with the filtering options as well, then it kinda beats the symmetry aspect and complicates the non-empty payload handling as well (parse and then determine if the payload is a registration message or a deregister message).

The second reason why I modelled it specifically as a req was to open up the possibility of a counterpart res where we can send the final status of the deregister operation. That also would have been weird, if we reused the same device topic id for deregistration as well, as the status would have to be sent to a completely independent topic anyway.

Similarly, if we opt for an HTTP DELETE method to de-register an entity, I would expect an HTTP PUT for entity registration. And even an HTTP GET to see the current state.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

Cascading the removal of all grand-child devices on child-device de-registration might be error prone. Can these grand-child devices attached to the parent of the child-device that has been removed?

But, isn't that the most common use-case? When I delete a child device from C8y, I almost always want its entire child hierarchy also gone with it, as cleanup is usually my primary goal. So, I definitely would want that as the default behaviour as I can't really imagine the use-case for attaching a child device's sensors (nested child devices) or services to a parent device and what would that even mean, at a physical deployment level. On the cloud also, esp C8y, migrating parents is going to be a complex operation. But, open to explore those potential use-cases and provide this as an extra option.

@didier-wenzek
Copy link
Contributor

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

I appreciate the benefits but wonder if it's worth it. Indeed, this will add quite a lot of complexity (for the user and in the code), for something that is not a daily use-case.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

You are correct. See thin-edge/thin-edge.io-roadmap#75. However, I would take a different approach: building the HTTP API on top of the MQTT API, exposing all the te topics, using PUT and GET for retained messages, and POST to published telemetry data.

Copy link
Contributor

@reubenmiller reubenmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would be in favour of the MQTT interface being the primary way to deregister entities, and the other two could just be lightweight interfaces to the MQTT API (that we could implement piece by piece after the MQTT API is implemented).

@reubenmiller reubenmiller added theme:mqtt Theme: mqtt and mosquitto related topics theme:registration Theme: Device registration and device certificate related topics labels Mar 15, 2024
@reubenmiller
Copy link
Contributor

Even though it works for the default case where the device and all its linked entities are deregisterd, it was the desire to support those additional targeting/filtering capabilities, like removing only the services of that target device or only the children of that target device, that pushed me towards a completely independent API. I really feel that a single API to remove all the linked entities of a device would come in really handy, especially for the main device. And this wouldn't have been possible if we used empty retained messages as the cue for deregistration.

I appreciate the benefits but wonder if it's worth it. Indeed, this will add quite a lot of complexity (for the user and in the code), for something that is not a daily use-case.

I thought those HTTP endpoints are already in the roadmap, and I assumed this task would be the first step towards that goal.

You are correct. See thin-edge/thin-edge.io-roadmap#75. However, I would take a different approach: building the HTTP API on top of the MQTT API, exposing all the te topics, using PUT and GET for retained messages, and POST to published telemetry data.

We will need to have options regarding the deregistration in the near future so that users can control whether the device should also be deleted from the cloud or not, so designing around this criteria will be useful.

Using the <topic_id>/cmd/deregister/local-1234 topic will be beneficial as well it will ensure that the deregistration can be resumed if it is interrupted half-way through (as the cmd use the retained flag), and also have a clear status mechanism that could also be tracked by interested clients.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really happy with the asymmetry introduced here:

  • registration done using declarative retained messages (describing a view over the entities),
  • deregistration done using procedural commands (using retained messages, sure, but for a different purpose).

This will be a source of confusion for the users but also for the implementers. This is already proven to be a source of problems with the mix of auto-registration and user-defined MQTT schema.

Can we:

  • use empty retain messages as the main mechanism to prune an entity and all its sub-entities
  • add a deregister command to cherry pick the entities to be removed and publish the appropriate empty retain messages.
  • deprecate auto-registration in favor of a register command similar/symmetric to the proposed deregister command.

design/decisions/0004-entity-dregister-api.md Show resolved Hide resolved
design/decisions/0004-entity-dregister-api.md Outdated Show resolved Hide resolved

This message will result in the deregistration of that target entity and all the nested entities (services and child devices) linked to it.

Different `tedge` components will react to the deregistration messages as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I understand correctly:

  • The user emits a single MQTT message to prune a hierarchy of entities.
  • This clearing message is an empty retained message published on the root topic of the pruned hierarchy
  • The mappers propagate entity removal to the cloud, pruning all attached entities (services, child devices, twin metadata, capabilities, ...)
  • The agent has to responsibility to clear the local view of the entities. It cascades clearing messages along the pruned hierarchy ensuring these sub-entities will not be re-created on mapper restart.

This is okay, but we need to be aware that combined with auto-registration this might lead to strange hierarchy if the device is restarted before the agent gets a chance to prune all the sub-entities. Some will be re-created by the mapper but using the auto-registration logic.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

However, as said in this comment we will have to re-think auto-registration in combination with deregistration.

Copy link
Contributor

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
429 0 3 429 100 0s

@albinsuresh albinsuresh added this pull request to the merge queue Apr 25, 2024
Merged via the queue into thin-edge:main with commit 329fedd Apr 25, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme:mqtt Theme: mqtt and mosquitto related topics theme:registration Theme: Device registration and device certificate related topics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants