Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make number of concurrent downloads configurable #972

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Oppen
Copy link
Contributor

@Oppen Oppen commented Aug 27, 2024

Adds a new environment variable SHIORI_MAX_PAR_DL to limit the number
of concurrent downloads on the update command (and other, if necessary).
It defaults to the number of logical CPUs reported by the Go runtime.

Fixes #966

@Oppen Oppen force-pushed the fix/configurable_concurrency_limits branch 4 times, most recently from 4210c93 to 34e656e Compare August 28, 2024 23:58
@Oppen Oppen marked this pull request as draft August 29, 2024 21:02
@Oppen
Copy link
Contributor Author

Oppen commented Aug 29, 2024

I'm converting to draft to apply the change to a few other changes where it's relevant before merging. It'll be ready soon.

@Oppen Oppen force-pushed the fix/configurable_concurrency_limits branch from 34e656e to 555936b Compare September 14, 2024 06:53
@Oppen Oppen marked this pull request as ready for review September 14, 2024 06:53
@Oppen
Copy link
Contributor Author

Oppen commented Sep 14, 2024

I totally forgot about this. Now it's ready.

@Oppen
Copy link
Contributor Author

Oppen commented Sep 14, 2024

I'm thinking of an improvement that could come later on, which is keeping around exactly the number of goroutines we need and sending tasks via channels. That should reduce both the (small) CPU overhead of setting the goroutine up and the (somewhat larger, 4kB per goroutine as a baseline) memory overhead of having too many goroutines lying around, even tho only a few are active at any given time.
To be clear, I would rather have this feat merged before, and maybe implement the other thing later.

@fmartingr
Copy link
Member

I'm thinking of an improvement that could come later on, which is keeping around exactly the number of goroutines we need and sending tasks via channels. That should reduce both the (small) CPU overhead of setting the goroutine up and the (somewhat larger, 4kB per goroutine as a baseline) memory overhead of having too many goroutines lying around, even tho only a few are active at any given time. To be clear, I would rather have this feat merged before, and maybe implement the other thing later.

One thing I wanted to delve at some point is a background job system for shiori, were all these background task would be sent (download, import, export, etc) and being able to see the status of them easily rather than just firing goroutines and wait for them. Would be pretty useful for the UI/Integrations.

Copy link
Member

@fmartingr fmartingr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @Oppen, just a question regarding the configuration.

internal/config/config.go Outdated Show resolved Hide resolved
docs/Configuration.md Outdated Show resolved Hide resolved
internal/config/config.go Outdated Show resolved Hide resolved
internal/config/config.go Outdated Show resolved Hide resolved
internal/config/config.go Outdated Show resolved Hide resolved
@@ -86,6 +87,7 @@ type DatabaseConfig struct {

type StorageConfig struct {
DataDir string `env:"DIR"` // Using DIR to be backwards compatible with the old config
MaxParDl int `env:"MAX_PAR_DL"`
Copy link
Collaborator

@Monirzadeh Monirzadeh Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi.
thanks for PR. maybe i miss undrtand this part.
you handle if Max_PAR_DL be 0 but what exacly happen if user not set Max_PAR_DL at all (or empty) are you sure it define as 0 by default?
can have a unittest for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it defaults to whatever zero-value the type has, in this case integer 0:

noinit - do not initialize struct fields unless environment variables were provided. The default behavior is to deeply initialize all fields to their default (zero) value.
https://github.com/sethvargo/go-envconfig

Emphasis mine.

Copy link
Collaborator

@Monirzadeh Monirzadeh Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can have a unittest for that?
Set different value with environment variable and be sure set expected configuration cfg.Storage.MaxParDl.
Specifically for Max_PAR_DL=""
Max_PAR_DL=-1 , unset Max_PAR_DL etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Go support manipulation of env vars for unit tests?

Copy link
Collaborator

@Monirzadeh Monirzadeh Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use this example
https://gobyexample.com/environment-variables
And
https://go-cloud-native.com/golang/test-environment-variables-in-go
And
https://beta.pkg.go.dev/testing@master#T.Setenv
Not forget to cleanup environment variable when the test done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try on the weekend. I'm a bit wary of races testing like this tho, doesn't tests run in parallel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind was something that acted more like mocking at the runtime level or something like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try on the weekend. I'm a bit wary of races testing like this tho, doesn't tests run in parallel?

Only if you mark them with t.Parallel(), so it depends on the test.

Copy link
Collaborator

@Monirzadeh Monirzadeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi
Thanks for this PR. i left one comment that not clear to me and not bad if we have a unit test for that.
to avoid something like #981 happen again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shiori update doesn't clear cache after it's finished, maxing out /tmp/ dir.
3 participants