Implement a download API within Bioloop core API, make usage of secure_download configurable #323

Open
ri-pandey opened this issue Oct 15, 2024 · 3 comments

@ri-pandey
Contributor

ri-pandey commented Oct 15, 2024

Currently, the secure_download API routes are deployed in a separate Docker container that is not one of the application's own containers (ui, api, postgres, etc.). As a result, the Slate-Scratch filesystem has to be mounted into the secure_download container, and every API request to secure_download has to carry an additional Bearer token.

Now that Slate-Scratch is available on the Bioloop service host via samba mounts, this architecture can be simplified. We can use the mounted filesystem to make upload/download requests to the core Bioloop API directly.

We will still want to retain the download API within secure_download so open source consumers of Bioloop can leverage that federated architecture.
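
A minimal sketch of how usage of secure_download could be made configurable, written in Python for brevity. Everything named here is hypothetical: `DownloadConfig`, `use_secure_download`, the base URLs, and the `/datasets/download/<id>` path are assumptions for illustration, not actual Bioloop settings or routes. Only the `/download/<id>` shape and the Bearer token requirement come from the current setup.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DownloadConfig:
    # All field names are hypothetical, not actual Bioloop settings.
    use_secure_download: bool
    core_api_base: str
    secure_download_base: str


def build_download_request(
    cfg: DownloadConfig, dataset_id: int, token: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair for downloading a dataset."""
    if cfg.use_secure_download:
        # Federated setup: separate container, extra Bearer token required.
        if not token:
            raise ValueError("secure_download requires a Bearer token")
        url = f"{cfg.secure_download_base.rstrip('/')}/download/{dataset_id}"
        return url, {"Authorization": f"Bearer {token}"}
    # Simplified setup: the core API serves the file from its own mount.
    # The route below is an assumed path, not a confirmed one.
    url = f"{cfg.core_api_base.rstrip('/')}/datasets/download/{dataset_id}"
    return url, {}
```

With a switch like this, deployments that keep the federated architecture would set the flag to true, and deployments with the filesystem mounted on the service host could turn it off.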

@ri-pandey ri-pandey self-assigned this Oct 15, 2024
@ri-pandey
Contributor Author

ri-pandey commented Oct 30, 2024

There are two potential directions this can go in:

  1. making the /download/:id HTTP endpoint part of the core Bioloop API.
  2. retaining the /download/:id HTTP endpoint in the secure_download container, but hosting the container on a machine that we have sudo access to, instead of colo25, where most developers don't.
  • Option 1 is better for simplifying architecture, but can cause scalability problems, on account of the app API being hosted on the same machine that is serving (potentially large) downloaded files.
  • Option 2 can be explored for use cases where we don't want to copy the /download endpoint into the Bioloop API, but still want less restrictive access control on the secure_download container (useful for developer-initiated restarts of secure_download).

In either option, scalability may become a problem. The machine that currently hosts the Bioloop service APIs offers 2 threads for processing.

  • I talked to Ray about this, and he pointed out that we have new high-performance machines (96 threads) for hosting our Prod infrastructure. We could look at leveraging these machines. If we want to give the download API more memory to work with, either of the above options could be hosted on one of these machines.
  • Another option would be to just add more threads on the current Bioloop service machines.

In either case, we do want to retain secure_download code in the repo, so that the federated download architecture can be leveraged by open source consumers of Bioloop.
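
Whichever option is chosen, one way to keep per-download memory bounded is to read and send files in fixed-size chunks instead of loading them whole. A minimal Python sketch, assuming a hypothetical helper name (`iter_file_chunks`) and an arbitrary 1 MiB default chunk size. It is not taken from the Bioloop codebase, and it does nothing about the thread limit itself.

```python
from pathlib import Path
from typing import Iterator, Union


def iter_file_chunks(
    path: Union[str, Path], chunk_size: int = 1024 * 1024
) -> Iterator[bytes]:
    """Yield the file's contents in chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file.
                break
            yield chunk
```

A web framework's streaming response can usually consume a generator like this directly, so only one chunk per active download is held in memory at a time.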

@ri-pandey
Contributor Author

ri-pandey commented Oct 30, 2024

There is some work-in-progress for this on branch feature-323. I copied the /download/:id endpoint to Bioloop's /datasets route, and mounted scadev's slate-scratch space to the Bioloop API container, so it can read and serve files from slate-scratch. Currently, this config downloads a static asset instead of the expected file.
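
One check that might help narrow this down is to confirm that the requested path actually resolves to an existing file under the mounted slate-scratch root before serving it, so that a bad path fails loudly. The sketch below is in Python and uses a hypothetical helper (`resolve_under_root`); the actual cause of the static-asset response is not known, so this is a debugging aid under that assumption, not a fix.

```python
from pathlib import Path
from typing import Union


def resolve_under_root(root: Union[str, Path], relative: str) -> Path:
    """Resolve `relative` against `root`, refusing paths that escape the root
    or that do not point at an existing regular file."""
    root_path = Path(root).resolve()
    candidate = (root_path / relative).resolve()
    # After resolving symlinks and "..", the file must still live under root.
    if root_path not in candidate.parents:
        raise PermissionError(f"{relative!r} resolves outside {root_path}")
    if not candidate.is_file():
        raise FileNotFoundError(f"{candidate} is not an existing file")
    return candidate
```

If this raises inside the container for a file that exists on the host, the bind mount or the configured root is the first thing to look at.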

@ri-pandey ri-pandey changed the title from "Move secure_download routes to Bioloop core API" to "Implement a download API within Bioloop core API, make usage of secure_download configurable" on Oct 30, 2024
@ri-pandey
Contributor Author

Another consideration to keep in mind would be the upload API, which is also hosted within the secure_download container.

From my efforts so far, it seems unlikely that we will be able to implement the upload API in the core Bioloop API. The reason is that the slate-scratch filesystem is mounted onto the Bioloop service host via an SMB mount. To enable uploading files via the Bioloop API instead of the secure_download API, this SMB-mounted filesystem would need to be further mounted into the Bioloop API Docker container via a bind mount. While this works for reading data from slate-scratch, writing data to slate-scratch through this chain of mounts is not performant enough, so uploads would be very slow if we take this route.
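
To put a number on the write slowdown, a rough throughput check along these lines could be run against both a local directory and the bind-mounted path. This is a minimal Python sketch with a hypothetical function name and arbitrary default sizes, not a rigorous benchmark. It measures one sequential write with a final fsync and ignores caching effects.

```python
import os
import tempfile
import time


def measure_write_throughput(
    directory: str,
    total_bytes: int = 64 * 1024 * 1024,
    block_size: int = 1024 * 1024,
) -> float:
    """Write total_bytes to a temporary file in `directory`; return MiB/s."""
    block = b"\0" * block_size
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < total_bytes:
                n = min(block_size, total_bytes - written)
                f.write(block[:n])
                written += n
            # Force the data out of process and OS buffers before timing ends.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the temporary file.
        os.remove(tmp_path)
    return (total_bytes / (1024 * 1024)) / max(elapsed, 1e-9)
```

Running it once on the host against the SMB mount and once inside the API container against the bind mount would show how much of the slowdown comes from each layer.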

In either case, both the download and upload APIs should continue to be a part of secure_download for the benefit of open source consumers of Bioloop.

The integration between Bioloop API and secure_download API is explained here: https://github.com/IUSCA/bioloop/blob/99-dataset-upload-2/docs/upload.md?plain=1#L64

Note that the upload API is not a part of the Production secure_download at this point.
