🔧 Provide an option to suspend data collection #1104

Open
shankari opened this issue Jan 20, 2025 · 8 comments

@shankari
Contributor

We just received a request from @mattwigway to add support for suspending data collection. They are running a panel study, so they want to suspend between multiple rounds of data collection.

The principled way to support this is to have the phone stop tracking. We do currently have the option to turn off tracking (profile -> tracking) but it needs to be done by the user at a per-phone level. This issue tracks the changes required to support this new functionality.

@asiripanich since you run panel studies as well

@shankari
Contributor Author

There are a few potential options to change this in the short term:

  1. Ask the users to manually turn off tracking
  2. Switch to a different server for the second round
  3. Ignore uploaded data if tracking is suspended

The first requires no code changes, but relies on each individual user to turn off tracking, which they will forget.

The second option would work as follows: People who do not turn off tracking will continue to accumulate data on their phones, but it won’t be accessible to anybody else while the server is offline.

Once the new server comes back online:

However, this also requires users to log off and log on with a new opcode, which increases user burden.
And it will still lead to additional storage on the phone when tracking is suspended.

I think that (3) solves both these issues, so I will expand on it in the next comment.

@Abby-Wheelis for visibility

@shankari
Contributor Author

shankari commented Jan 20, 2025

Ideally, we would have an option that we would set once, at the project level. If it is set, then users will not have to do anything, but their data will magically not show up on the server.

As outlined earlier, the principled approach to make this happen is to have the phone stop tracking based on the config setting. But this is trickier to implement, since we currently only download the config manually, and we don't store the tracking state separately; it is reflected directly in the FSM.

A quicker-to-implement hack would be to have the server ignore any data that is received from the phone while tracking is suspended.

This would involve changing usercache/put to exit early if the program was suspended.

This means that:

  • users can have the app installed with the same opcode
  • when the program is suspended, they can launch the app and see their previous trips
  • the app will continue collecting data and sending it to the server
  • however, the uploaded data will simply vanish; it will not be stored anywhere
    • since the upload is "successful", the data will be deleted from the phone app
    • since the data was never stored on the server, it will not be visible there either

There may be a bit of a boundary issue, particularly if the user has a backlog of data on their phone. For example, if the program is resumed on Jan 25 but a user has a backlog of data from Jan 20, then when that data is finally uploaded, it will be stored, although it was collected before the resume date.

A fairly simple workaround to that would be to have the suspended flag be a timestamp (or a range) instead of a boolean. usercache/put would then ignore all entries with a write_ts before the suspended timestamp (or within the range). This is not as simple as the giant hammer of the early return, but it will avoid any boundary issues and is not much more complicated to implement.
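To make the range-based idea concrete, here is a minimal sketch, not the actual server code. It assumes each uploaded entry is a dict with its `write_ts` under `metadata`, and that the suspension is expressed as a list of `(start_ts, end_ts)` tuples in epoch seconds, with `end_ts = None` meaning "still suspended". The names `filter_suspended_entries`, `suspended_ranges` and `utc_ts` are hypothetical.

```python
from datetime import datetime, timezone


def filter_suspended_entries(entries, suspended_ranges):
    """Return only the entries whose write_ts is outside every suspended range.

    suspended_ranges is a list of (start_ts, end_ts) tuples in epoch seconds;
    end_ts may be None for a suspension that has not ended yet (assumption).
    """
    def is_suspended(write_ts):
        return any(
            start_ts <= write_ts and (end_ts is None or write_ts < end_ts)
            for start_ts, end_ts in suspended_ranges
        )

    return [e for e in entries if not is_suspended(e["metadata"]["write_ts"])]


def utc_ts(year, month, day):
    """Epoch seconds for midnight UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Program suspended from Jan 1 and resumed on Jan 25 (illustrative dates)
suspended_ranges = [(utc_ts(2025, 1, 1), utc_ts(2025, 1, 25))]
backlog = [
    {"metadata": {"key": "background/location", "write_ts": utc_ts(2025, 1, 20)}},
    {"metadata": {"key": "background/location", "write_ts": utc_ts(2025, 1, 26)}},
]
kept = filter_suspended_entries(backlog, suspended_ranges)
print(len(kept))  # only the Jan 26 entry survives
```

With this shape, the Jan 20 backlog entry is dropped even if it is uploaded after the resume date, which is the boundary case described above.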

@mattwigway thoughts on this approach and whether it supports your use case?
@asiripanich thoughts on whether this would work for your panel studies?

@mattwigway

mattwigway commented Jan 21, 2025

I think the only use case this doesn't cover is re-consenting users for Wave 2 (which is pretty common in academic research, I think). Unless I'm mistaken, the process above would resume data collection for everyone (who still has the app and hasn't manually turned off tracking) when data collection resumes, rather than requiring users to opt back in to collection.

@JGreenlee

> I think the only use case this doesn't cover is re-consenting users for Wave 2 (which is pretty common in academic research, I think).

Good point.

In cases where Wave 2 deviates from the original Wave 1 consent agreement, I think it may make more sense to assign new opcodes and put them under a new subgroup for Wave 2.
We should be able to pause/resume on the server at the subgroup level.

To link the Wave 1 and Wave 2 data, we'd have to establish a mapping, or have a way to transfer data from Wave 1 opcodes' data to Wave 2 opcodes (related to what we are discussing in #1103)

@shankari
Contributor Author

shankari commented Jan 21, 2025

> To link the Wave 1 and Wave 2 data, we'd have to establish a mapping, or have a way to transfer data from Wave 1 opcodes' data to Wave 2 opcodes (related to what we are discussing in #1103)

@mattwigway is it a requirement, or even a nice-to-have that users in wave 2 would expect to see their trips from wave 1?
wrt re-consenting, what is the desired flow?

Is it: user opens app in wave 2; user is prompted to consent to wave 2; tracking starts?

How is that substantially different from the approach of asking them to turn tracking off and on manually in the app?
I thought that the benefit of approach (3) is that you wouldn't need to rely on users to do anything, which would reduce user burden.

@mattwigway

I think we need to be able to turn tracking off across the board, because people haven't consented to wave 2 yet.

For turning it back on, it's more flexible; a prompt in the app would be the ideal situation but I'm not expecting that to be possible. A way to blanket turn off tracking, so that folks could just go into the app and turn it back on, would work, but it sounds like the tech stack isn't really built to enable that. I guess that leaves us with generating new opcodes.

Let me double check what the consent documents actually said and check with my co-PI to see if there might be any other options.

@asiripanich
Member

Hi @AlirezaRa94 and @MaliheTabasi - If this feature is something we might need, it would be beneficial for us to contribute to the discussion as well.

@mattwigway

I think our preference will be to do as @JGreenlee described—if it is possible to turn off tracking by opcode groups, we will turn off tracking for Wave 1 participants and create a new opcode group for Wave 2. We will have an opcode-to-participant mapping locally that we can use to link Wave 1 and Wave 2 participants.

What's the timeline for implementation of suspension of tracking?

shankari added a commit to e-mission/nrel-openpath-deploy-configs that referenced this issue Feb 3, 2025
This helps test e-mission/e-mission-docs#1104

I am not able to load local configs; it fails with the error

```
DEBUG:Received request to join dev-emulator-study
DEBUG: Received request to join dev-emulator-study
DEBUG:Running in dev environment, checking for locally hosted config
DEBUG: Running in dev environment, checking for locally hosted config
DEBUG:About to connect to http://localhost:9090/configs/dev-emulator-study.nrel-op.json
DEBUG: About to connect to http://localhost:9090/configs/dev-emulator-study.nrel-op.json
DEBUG:Local config not found TypeError: Load failed
DEBUG: Local config not found TypeError: Load failed
```

And I can't connect via the debugger (using either Safari or Safari Technology Preview) to see what is going on.
shankari added a commit to shankari/e-mission-server that referenced this issue Feb 3, 2025
This implements a hack for e-mission/e-mission-docs#1104
The request in the issue was to implement the ability to suspend and resume
data collection.

As a first step for implementing this, we:
- add a new section to the `opcode` in the dynamic config that lists the
  suspended subgroups, and
- skip all calls to `usercache/put` for users that are in a suspended subgroup

This is a hack because the real fix is to stop data collection on the phone.

But even with this implementation:
- I have copied over the implementation of `getSubgroupFromToken` from the
  phone and manually ported it over to python. Instead, we should have all the
  implementations for parsing the opcode in `e-mission-common` and just use them here
- ideally, we would check for the subgroup being suspended in the
  `putIntoCache` method. But, by design, we only see the opcode in the auth
  module; the rest of the code works only with UUIDs. We should figure out a
  way to return the non-random parts of the opcode (program and subgroup) as a
  context so we can do additional validation downstream if needed. We can
  return this as a context variable, or store it into the profile and have the
  code check it from there.

Testing done:

After changing the default config

```
STUDY_CONFIG = os.getenv('STUDY_CONFIG', "dev-emulator-study")
```

and changing the `dev-emulator-study` config to have the new-style opcode with
a suspended subgroup

```
    "opcode": {
        "autogen": true,
        "subgroups": [
          "test",
          "default"
        ],
        "suspended_subgroups": ["default"]
    },
```
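
For readers following along, the check against `suspended_subgroups` might look roughly like the sketch below. This is not the code from the commit. It assumes opcodes have the form `nrelop_<program>_<subgroup>_<random>` (inferred from the token that appears in the test logs) and that the program name contains no underscores. The function names are hypothetical stand-ins for the ported `getSubgroupFromToken` logic.

```python
def get_subgroup_from_token(token, config):
    """Hypothetical parser: extract the subgroup from an opcode.

    Assumes the form nrelop_<program>_<subgroup>_<random>; the real
    parsing rules may differ.
    """
    parts = token.split("_")
    if len(parts) < 3:
        raise ValueError(f"opcode {token!r} has too few parts")
    subgroup = parts[2]
    allowed = config.get("opcode", {}).get("subgroups")
    if allowed is not None and subgroup not in allowed:
        raise ValueError(f"subgroup {subgroup!r} not found in list {allowed}")
    return subgroup


def is_suspended(token, config):
    """True if the opcode's subgroup is listed in suspended_subgroups."""
    suspended = config.get("opcode", {}).get("suspended_subgroups", [])
    return get_subgroup_from_token(token, config) in suspended


config = {
    "opcode": {
        "autogen": True,
        "subgroups": ["test", "default"],
        "suspended_subgroups": ["default"],
    }
}
print(is_suspended("nrelop_dev-emulator-study_default_123", config))
print(is_suspended("nrelop_dev-emulator-study_test_123", config))
```

A missing `suspended_subgroups` key is treated as "nothing suspended", so existing configs keep their current behavior.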

The suspended subgroup skipped all put messages

```
2025-02-02 18:57:35,723:DEBUG:6363951104:START POST /usercache/put
2025-02-02 18:57:35,724:DEBUG:6363951104:Called userCache.put
2025-02-02 18:57:35,742:DEBUG:6363951104:subgroup default found in list ['test', 'default']
2025-02-02 18:57:35,743:INFO:6363951104:Received put message for subgroup default in suspended_subgroups=['default'], returning uuid = None
2025-02-02 18:57:35,743:DEBUG:6363951104:retUUID = None
2025-02-02 18:57:35,743:DEBUG:6380777472:START POST /usercache/get
2025-02-02 18:57:35,743:DEBUG:6363951104:END POST /usercache/put  0.023092985153198242
```

The non-suspended subgroup used the default implementation of put

```
2025-02-02 19:15:42,723:DEBUG:13035925504:START POST /usercache/put
2025-02-02 19:15:42,726:DEBUG:13035925504:Called userCache.put
2025-02-02 19:15:42,750:DEBUG:13035925504:subgroup test found in list ['test', 'default']
2025-02-02 19:15:42,759:DEBUG:13035925504:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738551626.711804 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a034de46756f8244ede842'), 'ok': 1.0, 'updatedExisting': False}
2025-02-02 19:15:42,761:DEBUG:13035925504:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738551626.71355 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a034de46756f8244ede844'), 'ok': 1.0, 'updatedExisting': False}
```

After modifying the config to remove suspended_subgroups (default case)

```
    "opcode": {
        "autogen": true,
        "subgroups": [
          "test",
          "default"
        ]
    },
```

```
2025-02-02 21:08:26,082:DEBUG:13052751872:START POST /usercache/put
2025-02-02 21:08:26,082:DEBUG:13052751872:Called userCache.put
2025-02-02 21:08:26,088:DEBUG:13052751872:subgroup default found in list ['test', 'default']
2025-02-02 21:08:26,088:DEBUG:13052751872:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2025-02-02 21:08:26,088:DEBUG:13052751872:Using the skip method to verify id token nrelop_dev-emulator-study_default_123 of length 37

2025-02-02 21:08:26,110:DEBUG:13052751872:Updated result for user = eb4a7aae-f2d4-4480-ba85-c568a45591b5, key = background/location, write_ts = 1738559116.0185199 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a04f4a46756f8244ee3a88'), 'ok': 1.0, 'updatedExisting': False}
2025-02-02 21:08:26,112:DEBUG:13052751872:Updated result for user = eb4a7aae-f2d4-4480-ba85-c568a45591b5, key = background/filtered_location, write_ts = 1738559116.020091 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a04f4a46756f8244ee3a8a'), 'ok': 1.0, 'updatedExisting': False}
...
2025-02-02 21:08:26,930:DEBUG:13052751872:Updated result for user = eb4a7aae-f2d4-4480-ba85-c568a45591b5, key = stats/client_nav_event, write_ts = 1738559306.0461679 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a04f4a46756f8244ee3da8'), 'ok': 1.0, 'updatedExisting': False}
2025-02-02 21:08:26,930:DEBUG:13052751872:END POST /usercache/put eb4a7aae-f2d4-4480-ba85-c568a45591b5 0.849830150604248
```