Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeout arg #330

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

add timeout arg #330

wants to merge 1 commit into from

Conversation

MateoLostanlen
Copy link
Member

My internet router is very buggy at the moment and the connection jumps very frequently. It's very unpleasant, but I practically live on one of our stations, which allows me to pick up bugs that can happen in the field.
For example, I had relatively variable prediction times without much reason, so I looked at the details and the main thing that changed was the call to the api due to the bad connection:


2024-06-06 17:06:27,504 | INFO: Camera '169.254.40.2' - No wildfire (confidence: 3.18%)
2024-06-06 17:06:27,504 | INFO: Heartbeat time: 4.098053 seconds
2024-06-06 17:06:27,504 | INFO: Resize time: 0.137569 seconds
2024-06-06 17:06:27,505 | INFO: Inference time: 2.505296 seconds
2024-06-06 17:06:27,505 | INFO: Update states time: 0.000756 seconds
2024-06-06 17:06:27,505 | INFO: Alert time: 0.000002 seconds
2024-06-06 17:06:27,505 | INFO: Total prediction time: 6.742037 seconds
2024-06-06 17:06:27,505 | INFO: Time taken to analyze stream from camera 169.254.40.2: 6.74 seconds
169.254.40.1 (3840, 2160)
2024-06-06 17:06:31,112 | INFO: Camera '169.254.40.1' - No wildfire (confidence: 27.97%)
2024-06-06 17:06:31,113 | INFO: Heartbeat time: 0.223382 seconds
2024-06-06 17:06:31,113 | INFO: Resize time: 0.112550 seconds
2024-06-06 17:06:31,114 | INFO: Inference time: 3.079912 seconds
2024-06-06 17:06:31,114 | INFO: Update states time: 0.001111 seconds
2024-06-06 17:06:31,114 | INFO: Alert time: 0.000003 seconds
2024-06-06 17:06:31,114 | INFO: Total prediction time: 3.417658 seconds
2024-06-06 17:06:31,114 | INFO: Time taken to analyze stream from camera 169.254.40.1: 3.42 seconds

I propose this PR to be able to change the heartbeat timeout by passing an argument. It's not really a problem if the heartbeat fails from time to time, but it mustn't block our loop. I'll probably even change the way I call it in engine

@MateoLostanlen MateoLostanlen self-assigned this Jun 6, 2024
@MateoLostanlen MateoLostanlen added type: improvement New feature or request and removed ext: client labels Jun 6, 2024
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.02%. Comparing base (3fa4f3f) to head (0973fc1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   88.00%   88.02%   +0.01%     
==========================================
  Files          28       28              
  Lines         642      643       +1     
==========================================
+ Hits          565      566       +1     
  Misses         77       77              
Flag Coverage Δ
backend 87.47% <ø> (ø)
client 95.45% <100.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@RonanMorgan
Copy link
Collaborator

why don't you initialize the Client with an other value of timeout ?

@MateoLostanlen
Copy link
Member Author

Hello @RonanMorgan, I was thinking maybe we need a bigger timeout to send a file for example no ?

@frgfm
Copy link
Member

frgfm commented Jun 23, 2024

Hey there 👋

Trying to give the overall picture before we make a choice:

  • if we allow a more flexible timeout, it means we'll potentially hold the Python process on the Raspberry Pi for longer
  • the heartbeat is happening often for this reason, and if a device can't complete the request is time (which is supposedly the lightest request to the API), I doubt it will be able to send images as you mentioned

I think we should instead think about the max internet latency allowed to consider the device reachable. The client allows a default timeout, so I'd argue that if we have to set one differently it should only be for the request duration outlier, which would most likely be image upload (detection creation), what do you think?

In short, what I suggest:

  • set a small timeout when instantiating the client, which will propagate to all HTTP requests
  • optionally override it for the image upload with a value we set in the constructor (dynamically with a default value)

@MateoLostanlen
Copy link
Member Author

Hi FG, I understand what you're saying, but what you have to take into account is that on a station the internet can fluctuate quite rapidly. Very slow one minute and fast again. A bit like the example above, we go from 4s to 0.2s for the heartbeat very quickly. The purpose of this timeout is to avoid getting stuck in this case.

There's absolutely no problem if you get even one heartbeat out of two.

For images, if the connection is too slow, engine will try again next iteration.

I guess the timeout can be 3s for images and 0.5 for hearbeat

@frgfm
Copy link
Member

frgfm commented Jun 23, 2024

But then aren't we thinking the same thing?

  • applying 0.5sec as default client timeout
  • overriding timeout for image upload to 3s?

(my point is that if connection fluctuates for heartbeat, it certainly also does for the other routes which are all more intensive) Apologies if I misunderstood 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: improvement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants