Add watchdog tutorial. #456

Merged
merged 4 commits into from
Nov 23, 2023
1 change: 1 addition & 0 deletions docs/_images/watchdog.svg
11 changes: 11 additions & 0 deletions docs/tutorials/index.mdx
@@ -30,6 +30,7 @@ import Supabase from "../_images/supabase_logo.svg";
import AwsIotCore from "../_images/aws_iot_core.png";
import Ota from "../_images/ota.png";
import CellTower from "../_images/cellular_tower.svg";
import Watchdog from "../_images/watchdog.svg";

# Tutorials

@@ -475,6 +476,16 @@ Learn how to work with secrets in your Toit project.
Learn how to update your device's firmware over the air.
</Box>

<Box title="Watchdog" to="misc/watchdog">
<NonZoomableImage
src={Watchdog}
alt="An icon of a dog."
width="40%"
/>

Use a watchdog to ensure that your device never gets stuck and keeps running.
</Box>

</Boxes>

## Starter projects
175 changes: 175 additions & 0 deletions docs/tutorials/misc/watchdog.mdx
@@ -0,0 +1,175 @@
# Watchdogs
This tutorial will show you how to use watchdogs to monitor the system and reset it if it hangs.

Watchdogs are hardware timers that can be used to reset the system if it doesn't respond
the way it should. Users set up an interval at which the watchdog should be fed. If the
watchdog is not fed within that interval, the system resets. The hope is that the system
will be able to recover from whatever caused it to hang by doing a hard reset.

## Prerequisites
We assume that you have set up your development environment as described
in [the IDE tutorial](../../setup/ide).

We also assume that you have flashed your device with Jaguar and that
you are familiar with running Toit programs on it.
If not, have a look at the [Hello world](../../setup/firstprogram) tutorial.

Watchdogs use services. While not necessary, you may want to read the
[services](../../containers/services) tutorial to learn more about them.

## Packages
While the system watchdog is part of the core libraries, the nicer
high-level abstraction of watchdogs needs to be installed as a package.
See the [packages](../../setup/packages) tutorial for more information.

To install the [watchdog](https://pkg.toit.io/package/github.com%2Ftoitware%2Ftoit-watchdog@v1)
package, run the following command:

``` shell
jag pkg install github.com/toitware/toit-watchdog@v1
```

## Code
Start a new Toit program `watchdog.toit` and watch it with Jaguar.
Be aware that the following program has some side effects, and will likely
force your device to reset.

``` toit
import watchdog.provider
import watchdog show WatchdogServiceClient

main:
  // Start the watchdog provider.
  provider.main

  // Create a watchdog client that connects to the provider.
  client := WatchdogServiceClient
  // Connect to the provider that has been started earlier.
  client.open

  // Create a watchdog.
  dog := client.create "docs.toit.io/tutorial/my-dog"

  // Require a feeding every 60 seconds.
  dog.start --s=60

  // Feed it:
  dog.feed

  // Stop it, if not necessary:
  dog.stop

  // When stopped, close it.
  dog.close

  print "done"
```

The watchdog provider is a service that runs in the background and
provides watchdogs to clients. In our case the provider is started by
the `provider.main` function.

The client connects to the provider and creates a watchdog. The string
that is passed to `create` identifies the watchdog. Even if
a program crashes, it can avoid a system reset if it restarts fast
enough and feeds a watchdog with the same ID in time. See
the "Recommendations" section below for more information.

We then show how to start, feed, stop and close the watchdog.

Note that the program prints "done" but does not exit. This is because
the provider is still running in the background. As explained in the note
below, it is recommended to run the watchdog provider in its own container.


<Note>

It is critical that the watchdog functionality isn't shut down accidentally.
Unlike other resources, watchdogs are therefore not cleaned up automatically
when the program exits (cleanly or not). Instead, the underlying system
watchdog keeps running, which guarantees that the system resets if the
watchdog is not fed in time.

In our case a `jag run` (or `jag watch`) might stop the program while the
watchdog provider is active. Even after a new version of the program has
been installed, the old watchdog provider could still be running and then
force a reset.

This issue is mostly avoided if the provider
is installed in its own container (see the next section). In that case
reinstalling the program doesn't affect the container that contains
the watchdog provider.

Jaguar might still abort user programs, but a newly installed version can
recover: watchdogs are identified by their IDs, so the new version can
create a watchdog with the same ID and feed it in time.

</Note>

## Running the watchdog provider in a container
To run the watchdog provider in a container, we need to create a
simple entrypoint script that starts the provider. Create a new file
`watchdog-provider.toit` with the following content:

``` toit
import watchdog.provider

main:
  provider.main
```

Install it on your device with the following command:

``` shell
jag container install watchdog watchdog-provider.toit
```

This installs and starts the watchdog provider in a container named
`watchdog`. See the [container](../../containers) tutorial for more
information about containers.

Once installed, other containers can connect to the watchdog provider
without having to start it themselves.

## Using the shared watchdog provider
We can now modify our watchdog program to use the shared watchdog provider:

``` toit
import watchdog show WatchdogServiceClient

main:
  client := WatchdogServiceClient
  client.open  // Now connects to the shared watchdog provider.

  dog := client.create "docs.toit.io/tutorial/my-dog"
  ...
```

Note that multiple containers can connect to the same watchdog provider.

Try to reduce the feeding interval and remove the `stop`/`close` to see how
the system reboots.
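
The experiment suggested above could look like the following minimal sketch.
It assumes that the watchdog provider is already running in its own
container. The 5-second interval and the 30-second sleep are arbitrary values
chosen for illustration, and the reset may not happen at exactly the
configured interval.

``` toit
import watchdog show WatchdogServiceClient

main:
  client := WatchdogServiceClient
  client.open  // Connects to the shared watchdog provider.

  dog := client.create "docs.toit.io/tutorial/my-dog"

  // A short interval (illustrative value), so that the reset happens quickly.
  dog.start --s=5
  dog.feed

  // From here on the watchdog is neither fed, stopped, nor closed.
  // Once the interval has elapsed without a feeding, the device is
  // expected to reset.
  print "waiting for the watchdog"
  sleep --ms=30_000
  // Should not be reached if the watchdog triggers as described.
  print "still alive"
```

After the reset, remember that the watchdog is not cleaned up automatically.
Install a version of the program that stops and closes the watchdog again
once you are done experimenting.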

## Recommendations
Watchdogs are a powerful tool to make sure that the system doesn't hang. In
this section we give some recommendations on how best to use them in Toit.

1. Give watchdogs enough time, or disable them when appropriate. For example,
a system update might disable other programs. If these other programs had
a watchdog timer, then the update process could be interrupted by the
stale watchdog of the stopped program.
2. Feed when important actions happen. For example, feed the dog when data
has been uploaded, or when a sensor has been read and processed.
Contrary to the examples, do *not*
feed a watchdog right after it has been created. If your application is stuck
in a crash loop, it could start up, create the watchdog, feed it, and then die
immediately afterwards, so the watchdog would never trigger.
3. Use different watchdogs. Feel free to have a watchdog for uploading (for
example every 30 minutes), and another that is fed every 3 minutes, when
the device expects a ping from a server.
4. Make sure to clean up *after* the watchdog has reset your system. If
your board has external sensors/peripherals, they might not be in a clean
state. You can use the
[reset-reason](https://libs.toit.io/esp32/library-summary#reset-reason(0%2C0%2C0%2C))
function to determine why the system has booted. If the reason wasn't a
wake-up from deep sleep, then you have to assume that the external
peripherals are in an unknown state.
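
The recommendations above can be combined as in the following sketch. It
assumes that the shared watchdog provider is installed in its own container.
The functions `read-sensor` and `upload`, the watchdog IDs, and the intervals
are hypothetical placeholders. The constant `esp32.ESP-RST-DEEPSLEEP` is
assumed to be the value that `reset-reason` returns after a wake-up from deep
sleep. Check the `esp32` library documentation of your SDK version before
relying on it.

``` toit
import esp32
import watchdog show WatchdogServiceClient

// Hypothetical placeholder for reading and processing a sensor value.
read-sensor -> float:
  return 21.5

// Hypothetical placeholder for uploading a value to a server.
upload value/float -> none:
  print "uploading $value"

main:
  // Recommendation 4: find out why the device booted.
  // Assumption: ESP-RST-DEEPSLEEP marks a wake-up from deep sleep.
  if esp32.reset-reason != esp32.ESP-RST-DEEPSLEEP:
    print "Not a deep-sleep wake-up: peripherals need to be reinitialized."

  client := WatchdogServiceClient
  client.open

  // Recommendation 3: one watchdog per concern, each with its own interval.
  sensor-dog := client.create "docs.toit.io/tutorial/sensor"
  upload-dog := client.create "docs.toit.io/tutorial/upload"
  sensor-dog.start --s=(3 * 60)
  upload-dog.start --s=(30 * 60)

  while true:
    value := read-sensor
    // Recommendation 2: feed only after the work has been done,
    // not right after creating the watchdog.
    sensor-dog.feed
    upload value
    upload-dog.feed
    sleep --ms=60_000
```

Neither watchdog is fed directly after `start`, so a program that crashes
before its first successful sensor reading or upload does not keep the
watchdogs quiet.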
1 change: 0 additions & 1 deletion docs/tutorials/network/cellular.mdx
@@ -57,7 +57,6 @@ jag pkg install github.com/toitware/toit-cert-roots@v1
```

## Code

Start a new Toit program `walter.toit` and watch it with Jaguar.

``` toit
8 changes: 7 additions & 1 deletion tools/package.lock
@@ -1,4 +1,4 @@
sdk: ^2.0.0-alpha.108
sdk: ^2.0.0-alpha.118
prefixes:
  bme280: bme280-driver
  cellular: cellular
@@ -20,6 +20,7 @@ prefixes:
  ssd1306: toit-ssd1306
  supabase: toit-supabase
  telegram: toit-telegram
  watchdog: toit-watchdog
packages:
  bme280-driver:
    url: github.com/toitware/bme280-driver
@@ -131,3 +132,8 @@ packages:
    prefixes:
      certificate_roots: toit-cert-roots
      http: pkg-http
  toit-watchdog:
    url: github.com/toitware/toit-watchdog
    name: watchdog
    version: 1.2.0
    hash: 61e70151f3623e60464e6bab8aa00825fd41f320
3 changes: 3 additions & 0 deletions tools/package.yaml
@@ -59,3 +59,6 @@ dependencies:
  telegram:
    url: github.com/floitsch/toit-telegram
    version: ^0.5.2
  watchdog:
    url: github.com/toitware/toit-watchdog
    version: ^1.2.0
1 change: 1 addition & 0 deletions tools/run_snippets.toit
@@ -41,6 +41,7 @@ THINGS-THAT-WONT-RUN-ON-SERVER ::= [
  "import esp32",
  "import system.containers",
  "import net.cellular",
  "import watchdog",
]

main args -> none:
9 changes: 9 additions & 0 deletions tutorial_code/watchdog/package.lock
@@ -0,0 +1,9 @@
sdk: ^2.0.0-alpha.118
prefixes:
  watchdog: toit-watchdog
packages:
  toit-watchdog:
    url: github.com/toitware/toit-watchdog
    name: watchdog
    version: 1.2.0
    hash: 61e70151f3623e60464e6bab8aa00825fd41f320
4 changes: 4 additions & 0 deletions tutorial_code/watchdog/package.yaml
@@ -0,0 +1,4 @@
dependencies:
  watchdog:
    url: github.com/toitware/toit-watchdog
    version: ^1.2.0
8 changes: 8 additions & 0 deletions tutorial_code/watchdog/watchdog-provider.toit
@@ -0,0 +1,8 @@
// Copyright (C) 2023 Toitware ApS.
// Use of this source code is governed by a Zero-Clause BSD license that can
// be found in the LICENSE_BSD0 file.

import watchdog.provider

main:
  provider.main
32 changes: 32 additions & 0 deletions tutorial_code/watchdog/watchdog_v1.toit
@@ -0,0 +1,32 @@
// Copyright (C) 2023 Toitware ApS.
// Use of this source code is governed by a Zero-Clause BSD license that can
// be found in the LICENSE_BSD0 file.

import watchdog.provider
import watchdog show WatchdogServiceClient

main:
  // Start the watchdog provider.
  provider.main

  // Create a watchdog client that connects to the provider.
  client := WatchdogServiceClient
  // Connect to the provider that has been started earlier.
  client.open

  // Create a watchdog.
  dog := client.create "docs.toit.io/tutorial/my-dog"

  // Require a feeding every 60 seconds.
  dog.start --s=60

  // Feed it:
  dog.feed

  // Stop it, if not necessary:
  dog.stop

  // When stopped, close it.
  dog.close

  print "done"
28 changes: 28 additions & 0 deletions tutorial_code/watchdog/watchdog_v2.toit
@@ -0,0 +1,28 @@
// Copyright (C) 2023 Toitware ApS.
// Use of this source code is governed by a Zero-Clause BSD license that can
// be found in the LICENSE_BSD0 file.

import watchdog show WatchdogServiceClient

main:
  // Create a watchdog client that connects to the provider.
  client := WatchdogServiceClient
  // Connect to the provider that has been started earlier.
  client.open

  // Create a watchdog.
  dog := client.create "docs.toit.io/tutorial/my-dog"

  // Require a feeding every 60 seconds.
  dog.start --s=60

  // Feed it:
  dog.feed

  // Stop it, if not necessary:
  dog.stop

  // When stopped, close it.
  dog.close

  print "done"