-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: generativeaionvertexai_embedding_batch #3913
Conversation
b5e809a
to
fc64882
Compare
Here is the summary of changes. You are about to add 1 region tag.
This comment is generated by snippet-bot.
|
fc64882
to
35afe3e
Compare
35afe3e
to
32eb6c9
Compare
@gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request introduces a new snippet and its corresponding test for creating batch embeddings using the Vertex AI API. The code is well-structured and follows the project's contribution guidelines. However, there are a few areas that could be improved for clarity, maintainability, and best practices.
/** | ||
* TODO(developer): Uncomment/update these variables before running the sample. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment asks the developer to uncomment/update these variables. It would be helpful to provide more specific instructions or examples of how these variables should be updated, including valid formats and constraints. For instance, specify the expected format for inputUri
and outputUri
, and whether jobName
can contain special characters.
/** | |
* TODO(developer): Uncomment/update these variables before running the sample. | |
*/ | |
/** | |
* TODO(developer): Uncomment and replace the following placeholders before running the sample. | |
* - projectId: Replace with your Google Cloud project ID. | |
* - inputUri: Replace with the URI of the input dataset. This can be a BigQuery table or a Google Cloud Storage file. | |
* Examples: "gs://[BUCKET]/[DATASET].jsonl" or "bq://[PROJECT].[DATASET].[TABLE]". | |
* - outputUri: Replace with the URI where the output will be stored. This can be a BigQuery table or a Google Cloud Storage file. | |
* Examples: "gs://[BUCKET]/[OUTPUT].jsonl" or "bq://[PROJECT].[DATASET].[TABLE]". | |
* - jobName: Replace with a descriptive name for the batch embedding job. Avoid special characters. | |
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow our style guidelines and would make sample management unwieldy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concern about style guidelines. Could you please clarify which specific aspects of the suggested changes violate your style guide? Knowing the specific rules will help me refine the suggestion to better align with your team's preferences. For example, are there length restrictions on comments, preferred ways to format URIs, or specific naming conventions for variables that are being violated?
const textEmbeddingModel = 'text-embedding-005'; | ||
const location = 'us-central1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Create batch prediction job request | ||
const [response] = await jobServiceClient.createBatchPredictionJob(request); | ||
|
||
console.log('Raw response: ', JSON.stringify(response, null, 2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just logging the raw response, consider logging specific fields that are relevant to the user, such as the job ID, status, and output URI. This would make it easier for the user to track the job's progress and access the results.
console.log('Raw response: ', JSON.stringify(response, null, 2)); | |
console.log('Batch prediction job created:', response.name); | |
console.log('Job ID:', response.name.split('/').pop()); | |
console.log('Output URI:', response.outputConfig.gcsDestination.outputUriPrefix); |
const location = 'us-central1'; | ||
const inputUri = | ||
'gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl'; | ||
let outputUri = 'gs://ucaip-samples-test-output/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
before(async () => { | ||
const bucketName = `test-bucket-${uuid()}`; | ||
// Create a Google Cloud Storage bucket for UsageReports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this suggestion is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the suggestion is valid. The original comment incorrectly referred to UsageReports
. The updated comment accurately reflects that the bucket is for embedding output, aligning with the test's purpose. The change improves clarity and accuracy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m-strzelczyk fixed
|
||
return jobServiceClient.deleteBatchPredictionJob(deleteRequest); | ||
}); | ||
// Delete the Google Cloud Storage bucket created for usage reports. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this suggestion is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion is valid and improves clarity. The comment now accurately reflects the purpose of deleting the bucket, which is to clean up resources after the test completes. This is good practice for resource management and prevents unnecessary costs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m-strzelczyk fixed
111148a
to
d66ec9d
Compare
1c00638
to
0ad638a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the comments about usageReport bucket, LGTM after that
|
||
before(async () => { | ||
const bucketName = `test-bucket-${uuid()}`; | ||
// Create a Google Cloud Storage bucket for UsageReports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this suggestion is valid?
|
||
return jobServiceClient.deleteBatchPredictionJob(deleteRequest); | ||
}); | ||
// Delete the Google Cloud Storage bucket created for usage reports. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this suggestion is valid?
0ad638a
to
831a1d1
Compare
Description
Fixes #
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
npm test
(see Testing)npm run lint
(see Style)GoogleCloudPlatform/nodejs-docs-samples
. Not a fork.