Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: generativeaionvertexai_embedding_batch #3913

Merged
merged 3 commits into from
Dec 5, 2024

Conversation

gryczj
Copy link
Contributor

@gryczj gryczj commented Oct 31, 2024

Description

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

  • I have followed guidelines from CONTRIBUTING.MD and Samples Style Guide
  • Tests pass: npm test (see Testing)
  • Lint pass: npm run lint (see Style)
  • These samples need a new API enabled in testing projects to pass (let us know which ones)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
  • This pull request is from a branch created directly off of GoogleCloudPlatform/nodejs-docs-samples. Not a fork.
  • This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • This sample adds a new sample directory, and I created GitHub Actions workflow for this sample
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@gryczj gryczj requested review from a team as code owners October 31, 2024 14:59
@product-auto-label product-auto-label bot added samples Issues that are directly related to samples. api: aiplatform Issues related to the AI Platform API. labels Oct 31, 2024
@gryczj gryczj added kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. labels Oct 31, 2024
@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch from b5e809a to fc64882 Compare October 31, 2024 15:00
Copy link

snippet-bot bot commented Oct 31, 2024

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch from fc64882 to 35afe3e Compare November 12, 2024 15:23
@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch from 35afe3e to 32eb6c9 Compare November 20, 2024 12:33
@iennae
Copy link
Contributor

iennae commented Nov 21, 2024

@gemini review

Copy link
Contributor

@code-review-assist-experimental code-review-assist-experimental bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request introduces a new snippet and its corresponding test for creating batch embeddings using the Vertex AI API. The code is well-structured and follows the project's contribution guidelines. However, there are a few areas that could be improved for clarity, maintainability, and best practices.

Comment on lines +25 to +27
/**
* TODO(developer): Uncomment/update these variables before running the sample.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The TODO comment asks the developer to uncomment/update these variables. It would be helpful to provide more specific instructions or examples of how these variables should be updated, including valid formats and constraints. For instance, specify the expected format for inputUri and outputUri, and whether jobName can contain special characters.

Suggested change
/**
* TODO(developer): Uncomment/update these variables before running the sample.
*/
/**
* TODO(developer): Uncomment and replace the following placeholders before running the sample.
* - projectId: Replace with your Google Cloud project ID.
* - inputUri: Replace with the URI of the input dataset. This can be a BigQuery table or a Google Cloud Storage file.
* Examples: "gs://[BUCKET]/[DATASET].jsonl" or "bq://[PROJECT].[DATASET].[TABLE]".
* - outputUri: Replace with the URI where the output will be stored. This can be a BigQuery table or a Google Cloud Storage file.
* Examples: "gs://[BUCKET]/[OUTPUT].jsonl" or "bq://[PROJECT].[DATASET].[TABLE]".
* - jobName: Replace with a descriptive name for the batch embedding job. Avoid special characters.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't follow our style guidelines and would make sample management unwieldy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concern about style guidelines. Could you please clarify which specific aspects of the suggested changes violate your style guide? Knowing the specific rules will help me refine the suggestion to better align with your team's preferences. For example, are there length restrictions on comments, preferred ways to format URIs, or specific naming conventions for variables that are being violated?

ai-platform/snippets/create-batch-embedding.js Outdated Show resolved Hide resolved
Comment on lines +44 to +45
const textEmbeddingModel = 'text-embedding-005';
const location = 'us-central1';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Consider making textEmbeddingModel and location configurable parameters passed to the main function. This would make the snippet more flexible and reusable.

// Create batch prediction job request
const [response] = await jobServiceClient.createBatchPredictionJob(request);

console.log('Raw response: ', JSON.stringify(response, null, 2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Instead of just logging the raw response, consider logging specific fields that are relevant to the user, such as the job ID, status, and output URI. This would make it easier for the user to track the job's progress and access the results.

Suggested change
console.log('Raw response: ', JSON.stringify(response, null, 2));
console.log('Batch prediction job created:', response.name);
console.log('Job ID:', response.name.split('/').pop());
console.log('Output URI:', response.outputConfig.gcsDestination.outputUriPrefix);

const location = 'us-central1';
const inputUri =
'gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl';
let outputUri = 'gs://ucaip-samples-test-output/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to use a dedicated test project for integration tests to avoid conflicts with other resources. Consider using a separate project for testing and setting the project ID explicitly in the test configuration.


before(async () => {
const bucketName = `test-bucket-${uuid()}`;
// Create a Google Cloud Storage bucket for UsageReports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The comment mentions UsageReports, but this test creates a bucket for embedding output. Update the comment to reflect the actual purpose of the bucket.

Suggested change
// Create a Google Cloud Storage bucket for UsageReports
// Create a Google Cloud Storage bucket for embedding output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the suggestion is valid. The original comment incorrectly referred to UsageReports. The updated comment accurately reflects that the bucket is for embedding output, aligning with the test's purpose. The change improves clarity and accuracy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


return jobServiceClient.deleteBatchPredictionJob(deleteRequest);
});
// Delete the Google Cloud Storage bucket created for usage reports.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Similar to the previous comment, this one also mentions usage reports. Update it to reflect the purpose of deleting the bucket.

Suggested change
// Delete the Google Cloud Storage bucket created for usage reports.
// Delete the Google Cloud Storage bucket created for embedding output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is valid and improves clarity. The comment now accurately reflects the purpose of deleting the bucket, which is to clean up resources after the test completes. This is good practice for resource management and prevents unnecessary costs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch from 111148a to d66ec9d Compare November 21, 2024 10:16
@gryczj gryczj requested a review from iennae November 22, 2024 09:11
@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch 3 times, most recently from 1c00638 to 0ad638a Compare December 3, 2024 08:28
Copy link
Contributor

@m-strzelczyk m-strzelczyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the comments about usageReport bucket, LGTM after that


before(async () => {
const bucketName = `test-bucket-${uuid()}`;
// Create a Google Cloud Storage bucket for UsageReports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is valid?


return jobServiceClient.deleteBatchPredictionJob(deleteRequest);
});
// Delete the Google Cloud Storage bucket created for usage reports.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is valid?

@gryczj gryczj force-pushed the generativeaionvertexai_embedding_batch branch from 0ad638a to 831a1d1 Compare December 4, 2024 14:01
@gryczj gryczj requested a review from m-strzelczyk December 4, 2024 14:01
@BigBlackWolf BigBlackWolf self-requested a review December 5, 2024 11:55
@iennae iennae merged commit 936d05b into main Dec 5, 2024
133 of 134 checks passed
@iennae iennae deleted the generativeaionvertexai_embedding_batch branch December 5, 2024 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: aiplatform Issues related to the AI Platform API. kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants