Use streaming for raw data requests. #19

Merged Jun 5, 2024 (4 commits)

@aaronweeden commented Nov 22, 2023

Description

For raw data requests, this PR handles portals that are configured to stream the response back as a JSON text sequence (as XDMoD 11.0 will be after ubccr/xdmod#1792 and ubccr/xdmod#1858): the streamed records are iterated over and stored in the data frame.
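As a minimal sketch of iterating over a JSON text sequence, assuming the RFC 7464 framing (each record starts with the record separator byte 0x1E and ends with a newline): the function name `iter_json_text_sequence` and the idea of feeding it an iterable of byte chunks are illustrative assumptions, not the code in this PR.

```python
import json

RS = b"\x1e"  # record separator byte defined by RFC 7464


def iter_json_text_sequence(chunks):
    """Yield parsed JSON values from an iterable of byte chunks.

    Hypothetical helper: `chunks` stands in for whatever the HTTP
    library yields while streaming the response body.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last RS is a complete record;
        # the tail may still be partial, so keep it buffered.
        *complete, buffer = buffer.split(RS)
        for record in complete:
            if record.strip():
                yield json.loads(record)
    # The final record has no RS after it, so parse what is left.
    if buffer.strip():
        yield json.loads(buffer)


# Records may be split across chunk boundaries.
example_chunks = [b'\x1e["a", 1]\n\x1e["b"', b', 2]\n']
rows = list(iter_json_text_sequence(example_chunks))
```

The resulting list of rows could then be handed to a data frame constructor. Buffering on bytes rather than decoded text means a multi-byte character split across two chunks is not decoded until its record is complete.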

To determine whether the portal supports streaming, the code first makes a request to the rest/warehouse/raw-data/limit endpoint. If the response status code is 404 (as it will be for 11.0 based on changes in ubccr/xdmod#1792), the code runs the streaming algorithm. Otherwise, the portal has the rest/warehouse/raw-data/limit endpoint (i.e., it is running XDMoD 10.5), and the code runs the old algorithm of iteratively requesting 10,000 rows at a time (or whatever the portal has as its configured limit).
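The decision described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `probe_limit`, `stream_rows`, and `fetch_page` are hypothetical callables standing in for the real HTTP requests, and stopping on a short page is an assumed termination rule for the old algorithm.

```python
def fetch_all_rows(probe_limit, stream_rows, fetch_page):
    """Pick the streaming or the paged algorithm based on the limit endpoint.

    Hypothetical callables (not part of any real API):
      probe_limit()  -> (status_code, limit) from the raw-data/limit endpoint
      stream_rows()  -> iterable of rows from the streaming response
      fetch_page(offset, limit) -> list of at most `limit` rows
    """
    status_code, limit = probe_limit()
    if status_code == 404:
        # No limit endpoint: assume the portal streams its raw data.
        return list(stream_rows())

    # Limit endpoint exists: request `limit` rows at a time.
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:  # assumed: a short page means no more data
            return rows
        offset += limit


# Fake portal data to exercise both branches without a network.
data = list(range(25))
paged = fetch_all_rows(
    lambda: (200, 10),
    lambda: iter([]),
    lambda offset, limit: data[offset:offset + limit],
)
streamed = fetch_all_rows(
    lambda: (404, None),
    lambda: iter(data),
    lambda offset, limit: [],
)
```

Passing the requests in as callables keeps the branch logic testable on its own, and makes the paged branch easy to delete once XDMoD 10.5 is no longer supported.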

Once XDMoD 10.5 is no longer supported, we can remove the old algorithm.

Motivation and Context

ubccr/xdmod#1792 improves the performance of requests for raw data in the Jobs realm.

Tests performed

In addition to running the automated tests against the existing XDMoD portal, which is running 10.5, I edited the automated tests to point at my port on xdmod-dev, which has the changes from ubccr/xdmod#1792, and ran them. All passed except the test_get_raw_data regression test; on closer inspection, it failed only because the rows of the data frame were in a different order, which is acceptable.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • CHANGELOG.md has been updated
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request
  • Running the automated tests (see docs/developing.md) produces no errors
  • Updates have been made to the xdmod-notebooks repository as necessary, and the notebooks all run successfully

@aaronweeden merged commit ac99530 into ubccr:main on Jun 5, 2024
1 check passed
@aaronweeden deleted the stream-raw-data branch on June 5, 2024 18:30
Labels: refactor (Refactor of existing functionality)