Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Private extractor support #991

Merged
merged 38 commits into from
Sep 16, 2024
Merged

Private extractor support #991

merged 38 commits into from
Sep 16, 2024

Conversation

max-zilla
Copy link
Contributor

@max-zilla max-zilla commented Apr 5, 2024

Adds optional support for an owner in the extractor heartbeat.

To test this, run an extractor with the following ENV VARs:

export [email protected]
export EXTRACTOR_KEY=secret_test

When the extractor starts, the email will be assigned as owner. That user should be able to see the extractor in lists as normal, but other users cannot. There are endpoints to grant other permissions:

POST/DELETE http://localhost:8000/api/v2/listeners/123/users/[email protected]
POST/DELETE http://localhost:8000/api/v2/listeners/123/groups/456
POST/DELETE http://localhost:8000/api/v2/listeners/123/datasets/456

...where 123 is extractor ID, and 456 is group ID. Only owner has permission to do this currently.

BASIC RULES.

If you are looking at a file or dataset Analysis tab, you will see extractor listed...

  • if extractor's access field is null
  • if you are access.owner, listed in access.users, or in a group listed in access.groups
  • if the dataset or the file's dataset are listed in access.datasets

If you are looking at Extraction History, same rules but dataset rule is ignored. NOTE: if the user has submitted a job to an extractor, they should be able to see it here... need to work out permissions on that.

Changed the logic slightly for the process filter so if the extractor has process set to {} for file or dataset (i.e. no rule specified) it will be included in those lists.

@max-zilla max-zilla marked this pull request as ready for review April 15, 2024 16:10
@tcnichol
Copy link
Contributor

tcnichol commented May 2, 2024

I'm running this but getting an error in the heartbeat:

    raise exc
  File "/Users/helium/ncsa/clowder2/backend/heartbeat_listener.py", line 34, in callback
    owner = msg["owner"]
KeyError: 'owner'

I checked in the extractor I am running and the environment variables are set.

Do I need a specific version of pyclowder? Here's my environment variables as I've set th
Screenshot 2024-05-02 at 11 55 37 AM
em.

Copy link
Contributor

@tcnichol tcnichol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this with the corresponding pyclowder branch. Private extractor registers, only visible to the user.

@lmarini
Copy link
Member

lmarini commented May 13, 2024

Does the pyclowder branch need to be merged before this one?

@lmarini
Copy link
Member

lmarini commented Jun 6, 2024

Sorry can you resolve conflicts, thanks!

Copy link
Member

@lmarini lmarini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be an issue. I added some extractors with this branch running. I enabled them. When I go to the dataset page and click on the Analysis tab I get a blank page and errors in backend and frontend. Guessing it's not able to serialize the something in the new extractor json.

INFO: 127.0.0.1:65327 - "GET /api/v2/listeners?skip=0&limit=5&heartbeat_interval=0&category=&label=&alive_only=false&process=dataset&all=665df0aa6cdde48fda0fde9e&dataset_id=false HTTP/1.1" 422 Unprocessable Entity

Uncaught Error: Objects are not valid as a React child (found: object with keys {loc, msg, type}). If you meant to render a collection of children, use an array instead.
React 16
unstable_runWithPriority scheduler.development.js:468
React 4
Redux 4
s (index):3
middleware Redux
immutableStateInvariantMiddleware Immutable
dispatch Redux
fetchListenerCategories listeners.js:147
promise callback*then CancelablePromise.ts:81
fetchListenerCategories listeners.js:146
middleware Redux
immutableStateInvariantMiddleware Immutable
dispatch (index):6
listAvailableCategories Listeners.tsx:72
Listeners Listeners.tsx:101
React 5
unstable_runWithPriority scheduler.development.js:468
React 4
unstable_runWithPriority scheduler.development.js:468
React 17
tsx index.tsx:14
Webpack 3
react-dom.development.js:13231
The above error occurred in the

component:

p
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Typography@http://localhost:3000/main.bundle.js:70947:87
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
DialogContentText@http://localhost:3000/main.bundle.js:49627:83
div
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
DialogContent@http://localhost:3000/main.bundle.js:49498:82
div
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Paper@http://localhost:3000/main.bundle.js:60706:83
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
div
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Transition@http://localhost:3000/main.bundle.js:238854:30
Fade@http://localhost:3000/main.bundle.js:50560:77
FocusTrap@http://localhost:3000/main.bundle.js:37082:7
div
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Portal@http://localhost:3000/main.bundle.js:37815:7
Modal@http://localhost:3000/main.bundle.js:58736:82
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Dialog@http://localhost:3000/main.bundle.js:48993:83
ActionModal@http://localhost:3000/main.bundle.js:124340:20
ErrorModal@http://localhost:3000/main.bundle.js:124383:19
main
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
div
./node_modules/@emotion/react/dist/emotion-element-43c6fea0.browser.esm.js/withEmotionCache/<@http://localhost:3000/main.bundle.js:13186:66
Box@http://localhost:3000/main.bundle.js:78283:72
PersistentDrawerLeft@http://localhost:3000/main.bundle.js:119552:74
Dataset@http://localhost:3000/main.bundle.js:121627:80
PrivateRoute@http://localhost:3000/main.bundle.js:151156:18
RenderedRoute@http://localhost:3000/main.bundle.js:237340:7
Routes@http://localhost:3000/main.bundle.js:238029:7
Router@http://localhost:3000/main.bundle.js:237968:7
BrowserRouter@http://localhost:3000/main.bundle.js:235921:7
AppRoutes
Provider@http://localhost:3000/main.bundle.js:233513:15
RtlProvider@http://localhost:3000/main.bundle.js:77228:9
ThemeProvider@http://localhost:3000/main.bundle.js:74294:7
ThemeProvider@http://localhost:3000/main.bundle.js:77517:7
ThemeProvider@http://localhost:3000/main.bundle.js:72567:9

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://reactjs.org/link/error-boundaries to learn more about error boundaries. localhost:3000:12714:25

@lmarini
Copy link
Member

lmarini commented Jun 11, 2024

I think the issue above is also reflected in the failing pytest:

FAILED app/tests/test_feeds.py::test_feeds - assert 422 == 200

  • where 422 = <Response [422 Unprocessable Entity]>.status_code

@max-zilla
Copy link
Contributor Author

I think the issue above is also reflected in the failing pytest:

FAILED app/tests/test_feeds.py::test_feeds - assert 422 == 200

  • where 422 = <Response [422 Unprocessable Entity]>.status_code

This should be resolved. It was due to the ListenerAuthorization dependency injection I added using listener_id in most places, but in this particular call we are POSTing a small object with listener_id and automatic so the validation was failing. I separated the call to ListenerAuthorization outside of dependency on this call only, for lack of a better idea

@max-zilla max-zilla requested a review from lmarini June 17, 2024 19:27
@ddey2
Copy link
Member

ddey2 commented Jun 24, 2024

Something is not quite right here. On Extractor page, users should be able to see all the extractors(probably not the private extractor) irrespective of admin mode.

Here on this branch, I can see them in admin mode.

  1. When I click on 'Drop admin mode', the page doesn't refresh on it's own.
  2. After explicitly refreshing the page, it shows empty list.
Screenshot 2024-06-24 at 2 02 54 PM Screenshot 2024-06-24 at 2 03 10 PM 3. When I am a diff regular user (other than the owner of private extractor), I can see no extracator.

I think all these might be related to the ListnerAuth class.

@tcnichol
Copy link
Contributor

@max-zilla
I am seeing this error

INFO: 127.0.0.1:54745 - "GET /api/v2/listeners?skip=0&limit=5&heartbeat_interval=0&category=&label=&alive_only=false&dataset_id=true&all=false&enable_admin=false HTTP/1.1" 500 Internal Server Error ERROR: Exception in ASGI application Traceback (most recent call last): File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi result = await app( # type: ignore[func-returns-value] File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__ return await self.app(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__ await super().__call__(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__ await self.middleware_stack(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__ raise exc File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__ await self.app(scope, receive, _send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__ await self.simple_response(scope, receive, send, request_headers=headers) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response await self.app(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__ raise exc File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__ await self.app(scope, receive, sender) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__ raise e File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__ await self.app(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__ await route.handle(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle await self.app(scope, receive, send) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/starlette/routing.py", line 66, in app response = await func(request) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app raw_response = await run_endpoint_function( File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function return await dependant.call(**values) File "/Users/helium/ncsa/clowder2/backend/app/routers/listeners.py", line 437, in get_listeners EventListenerDB.access.datasets == PydanticObjectId(dataset_id), File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/bson/objectid.py", line 103, in __init__ self.__validate(oid) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/bson/objectid.py", line 201, in __validate _raise_invalid_id(oid) File "/Users/helium/.virtualenvs/backend-qaHHe5b_/lib/python3.9/site-packages/bson/objectid.py", line 35, in _raise_invalid_id raise InvalidId( bson.errors.InvalidId: 'true' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string

@tcnichol
Copy link
Contributor

@max-zilla I see that error when I am on the page and am not in superAdmin mode. And no extractors are visible at all until I reload the page with superAdmin.

I was thinking that they should show up, and we just need superAdmin mode to enable/disable?

dataset_id was being sent in by the component as 'true' when it should be 'null' and then the 'true' should have gone after
@tcnichol
Copy link
Contributor

I think I fixed the error that was causing no extractors to be visible. The component was sending in 'true' as a dataset_id as a default value. It should be 'null' and then the next one for 'all' should be 'true.' Committed this.

@tcnichol
Copy link
Contributor

Something is not quite right here. On Extractor page, users should be able to see all the extractors(probably not the private extractor) irrespective of admin mode.

Here on this branch, I can see them in admin mode.

  1. When I click on 'Drop admin mode', the page doesn't refresh on it's own.
  2. After explicitly refreshing the page, it shows empty list.

Screenshot 2024-06-24 at 2 02 54 PM Screenshot 2024-06-24 at 2 03 10 PM 3. When I am a diff regular user (other than the owner of private extractor), I can see no extracator.
I think all these might be related to the ListnerAuth class.

This might be fixed by the commit I made.
I would like someone else to check that it is working as intended.
See comment at end.

@tcnichol
Copy link
Contributor

With the commit I made, I think things work as intended for the admin user, but I don't think it is working for other users.

I registered the wordcount extractor NOT as a private one, and even if it's running and enabled, it still only shows up for my admin user, and not for any others.

@tcnichol
Copy link
Contributor

I found what was going wrong here.

When extractors are picked up by the heartbeat, the owner is "" but the script checks if they are None. I added that and now extractors that are not private are showing up like they should.

I did create an issue and a branch, but since this is such a tiny fix I just put it here.

To test, kill the heartbeat_listener docker container, and run the script in pycharm. Access will be None for public extractors registered.

@tcnichol tcnichol linked an issue Sep 1, 2024 that may be closed by this pull request
@max-zilla
Copy link
Contributor Author

thanks for the fix, @tcnichol !

… query was not matching a user since it was an `equality` query on an `array` and not a `in` query.
…ue and it broke the query. Not sure what happened.
@lmarini lmarini merged commit 7d054e0 into main Sep 16, 2024
6 checks passed
@lmarini lmarini deleted the private-extractors branch September 16, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

listeners not showing up on private extractor branch
4 participants