Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial follow up tasks for built-in bridge #2810

Conversation

jarhodes314
Copy link
Contributor

@jarhodes314 jarhodes314 commented Apr 3, 2024

Proposed changes

These are some of the changes discussed in #2794, specifically:

  • Ensure custom templates work (this was just a check I performed, no code changes were made)
  • Modifying the bridge healthcheck message to use thin-edge json, rather than 0/1 like mosquitto
  • Fix connectivity check in tedge connect c8y so new devices connect successfully

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@@ -341,6 +341,10 @@ impl CumulocityConverter {
let ancestors_external_ids =
self.entity_store.ancestors_external_ids(entity_topic_id)?;

if entity_topic_id.is_bridge_health_topic() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very confused as to why I needed to make this change twice to actually get the bridge to not appear in c8y. Both converter.rs and service_monitor.rs appear to publish status messages for services, with converter.rs doing it relatively cleanly, while service_monitor.rs is publishing both down & up messages when a service starts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purposes are different:

  • service_monitor.rs is really about telling to c8y that a service is up or down.
  • converter.rs register the services in c8y before any message is sent about these services. They are initialized with an up status - but this is somehow unrelated.

Now that the status messages sent for the bridge are correctly parsed as for any other service, we have to explicitly add an exception to not register the bridge as a service.
Before your change the bridge status message were not correctly parsed leading to an error lost in the log with .context("Could not create service creation message")?;.

Copy link

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 49.29577% with 36 lines in your changes are missing coverage. Please review.

Project coverage is 76.6%. Comparing base (b006a79) to head (2ada77d).
Report is 12 commits behind head on main.

Additional details and impacted files
Files Coverage Δ
...rates/core/tedge/src/cli/certificate/create_csr.rs 87.1% <100.0%> (+0.5%) ⬆️
crates/core/tedge/src/cli/certificate/renew.rs 86.7% <100.0%> (+0.4%) ⬆️
crates/core/tedge_api/src/mqtt_topics.rs 87.5% <100.0%> (+<0.1%) ⬆️
...s/extensions/c8y_mapper_ext/src/service_monitor.rs 96.3% <100.0%> (-0.1%) ⬇️
crates/core/tedge/src/cli/certificate/create.rs 83.0% <90.0%> (+<0.1%) ⬆️
crates/core/tedge_mapper/src/c8y/mapper.rs 0.0% <0.0%> (ø)
crates/extensions/c8y_mapper_ext/src/config.rs 46.9% <0.0%> (ø)
crates/extensions/c8y_mapper_ext/src/converter.rs 83.2% <66.6%> (-0.1%) ⬇️
crates/core/c8y_api/src/utils.rs 68.9% <75.0%> (-9.5%) ⬇️
crates/extensions/c8y_mapper_ext/src/actor.rs 80.6% <60.0%> (-0.3%) ⬇️
... and 4 more

... and 6 files with indirect coverage changes

Copy link
Contributor

github-actions bot commented Apr 3, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
428 0 3 428 100 0s

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with these changes except for the added timeout causing the mapper to start even if the bridge is not established.

Comment on lines +12 to +28
&& (payload == MQTT_BRIDGE_UP_PAYLOAD
|| payload == MQTT_BRIDGE_DOWN_PAYLOAD
|| is_valid_status_payload(payload))
}
Err(_err) => false,
}
}

#[derive(serde::Deserialize)]
struct HealthStatus<'a> {
status: &'a str,
}

fn is_valid_status_payload(payload: &str) -> bool {
serde_json::from_str::<HealthStatus>(payload)
.map_or(false, |h| h.status == "up" || h.status == "down")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] I would move into tedge_api this logic to accept 0 | 1 | status = up | status = down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm removing the check for bridge health, then this more complicated logic isn't required, so unless we need it for some other reason you know of, I'll undo this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking that the bridge has been initialized before sending registration messages to c8y is required for a very new thin-edge device. If the c8y mapper starts before the bridge and even before the c8y.url is provided (as this is the case for a generic image of thin-edge), then the mapper must wait for the bridge to be initialized - otherwise all the registration messages are lost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this logic is required - but can be simplified by simply checking that at least one message has been received on this topic - whatever the payload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw 4ef0fcd and I'm okay with it.

Comment on lines 96 to 97
log_if_bridge_not_detected.abort();
info!("Cumulocity bridge is established, mapper is starting");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a background error printing task, that is aborted on the happy path, is more than unusual!

But that's not my main point.
I wonder, why introducing a timeout that was not present before?

Actually:

  • with the mosquitto bridge, the c8y mapper must wait for the bridge to be established (up or down it doesn't matter) otherwise all the init messages sent on the c8y/# topics will be lost (as there is no persisted subscriber).
  • with the built-in bridge, the c8y mapper can directly send the init messages, since there is one subscriber to the c8y/# topics (the mapper itself).

To make things simple, I would only take care of the case where awaiting the bridge is required.

Copy link
Contributor Author

@jarhodes314 jarhodes314 Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout doesn't change the behaviour of the waiting logic; it exists purely so the mapper can log that it's totally not functional because the bridge doesn't exist (and I couldn't think of a better way to notify than just "if nothing has happened after a certain amount of time, something is broken"). I didn't consider it too deeply as I was mostly concerned with "this bit of code that's critically breaking the mapper didn't tell me it existed" and that taking a while to debug. I agree that this isn't necessary though, and I should just skip it if c8y.bridge.built_in is true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't consider it too deeply as I was mostly concerned with "this bit of code that's critically breaking the mapper didn't tell me it existed" and that taking a while to debug.

That's true that some logging is required as the mapper is doing nothing this the bride is connected.

I should just skip it if c8y.bridge.built_in is true.

Yes, if this is actually simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw 4ef0fcd and I'm okay with it.

@@ -341,6 +341,10 @@ impl CumulocityConverter {
let ancestors_external_ids =
self.entity_store.ancestors_external_ids(entity_topic_id)?;

if entity_topic_id.is_bridge_health_topic() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purposes are different:

  • service_monitor.rs is really about telling to c8y that a service is up or down.
  • converter.rs register the services in c8y before any message is sent about these services. They are initialized with an up status - but this is somehow unrelated.

Now that the status messages sent for the bridge are correctly parsed as for any other service, we have to explicitly add an exception to not register the bridge as a service.
Before your change the bridge status message were not correctly parsed leading to an error lost in the log with .context("Could not create service creation message")?;.

Comment on lines +394 to +398
// FIXME: can also match "device/bridge//" or "/device/main/service/my_custom_bridge"
// should match ONLY the single mapper bridge
pub fn is_bridge_health_topic(&self) -> bool {
self.as_str().contains("bridge")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR might be the opportunity to resolve this FIXME.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

This allows `tedge cert create` to assign the correct certificate owner, allowing new devices to use the built-in bridge.

Signed-off-by: James Rhodes <[email protected]>
@jarhodes314 jarhodes314 force-pushed the 2794-follow-up-tasks-for-built-in-bridge branch from 449ebb7 to 2ada77d Compare April 8, 2024 14:40
@jarhodes314 jarhodes314 changed the title 2794 follow up tasks for built in bridge Initial follow up tasks for built-in bridge Apr 8, 2024
@jarhodes314 jarhodes314 added this pull request to the merge queue Apr 8, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2024
@jarhodes314 jarhodes314 added this pull request to the merge queue Apr 8, 2024
Merged via the queue into thin-edge:main with commit 046d6f4 Apr 8, 2024
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants