Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/worksheet nlq beta #1639

Open
wants to merge 52 commits into
base: main
Choose a base branch
from
Open

Feat/worksheet nlq beta #1639

wants to merge 52 commits into from

Conversation

noah-paige
Copy link
Contributor

@noah-paige noah-paige commented Oct 14, 2024

Feature or Bugfix

  • Feature

Detail

  • Add Natural Language Querying Feature in data.all
    • Text To SQL Generation
    • Text Document Analysis

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

client2, prompt, persistent_env1.environmentUri, worksheet1.worksheetUri, ''
).contains('UnauthorizedOperation', 'RUN_ATHENA_QUERY')


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code. I am not sure what we should do with this code block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left it here to hear others opinions - not sure what is best approach is, will think on this and get back tomorrow with some thoughts

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and tested frontend and comments from yesterday

@noah-paige
Copy link
Contributor Author

Some example views from the UI:

Text To SQL:

Screenshot 2024-10-24 at 11 55 50 AM

Document Analyzer:

Screenshot 2024-10-24 at 11 55 33 AM

UserGuide Information in this PR:
documentation/userguide/docs/worksheets.md - changes in this PR

cc: @zsaltys @TejasRGitHub @anushka-singh

@noah-paige noah-paige linked an issue Oct 24, 2024 that may be closed by this pull request
@noah-paige
Copy link
Contributor Author

@dlpzx - added additional tests and believe I have addressed all comments

The one pending item is when testing the happy paths for both TextToSQL and analyzeTextDocument GQL operations - often the integration tests will fail with the following error:

An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.

If you retry the test in isolation / after some time it will resolve itself. Looking at the service quota limits, Claude 3.5 Sonnet has a fairly low max invoke count per account per minute of 50 (note: we are still far under that per minute threshold but I think that or similar could be causing this throttling exception - other Claude 3 models have a much higher per minute tolerance of invocations)

Not sure if we should opt for a Claude 3 version for the time being to remediate this testing case and also to improve UX for a user using these features, thoughts?

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but needs to resolve conflicts

def __init__(self):
self._session = SessionHelper.get_session()
self._client = self._session.client('bedrock-runtime')
model_id = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am facing errors of the type: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-3-5-sonnet-20240620-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model. I think it is related to the enforced cross-region inference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you have faced something similar when using this model

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Natural Language Querying (NLQ) using genAI
3 participants