Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename onebox telemetry metadata fields #5770

Merged
merged 1 commit into from
Oct 3, 2024

Conversation

thenamankumar
Copy link
Member

@thenamankumar thenamankumar commented Oct 1, 2024

This PR renames a bunch of OneBox telemetry events metadata fields and make them camelcase.

This PR also removes the check isDotCom from the promptText private metadata field passed with the chat-question:executed event. None of the private metadata fields are sent to SG through the telemetery pipeline, so that data is still private, similar to the other present privateMetadata field. The only change is that the promptText will be recorded in the event_logs on that sourcegraph instance.

We need to remove the check so that we can selectively collect that field for the S2 instance, configured by a env var here: https://github.com/sourcegraph/cloud/pull/15143

Test plan

Make sure all the updated events are still recorded correctly.

Changelog

@thenamankumar thenamankumar requested review from camdencheek, akalia25 and a team October 1, 2024 16:06
@akalia25
Copy link
Contributor

akalia25 commented Oct 1, 2024

Hey @thenamankumar I thought we were only interested in collecting query (or now promptText) for the event onebox.intentCorrection, curious why the scope has changed to now include cody.chat-question?

// 🚨 SECURITY: chat transcripts are to be included only for DotCom users AND for V2 telemetry
// V2 telemetry exports privateMetadata only for DotCom users
// the condition below is an additional safeguard measure
promptText:
isDotCom(authStatus) &&
(await truncatePromptString(inputText, CHAT_INPUT_TOKEN_BUDGET)),
promptText: await truncatePromptString(inputText, CHAT_INPUT_TOKEN_BUDGET),
Copy link
Contributor

@akalia25 akalia25 Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been really strict on how we collect sensitive data like this, to ensure, it is never collected from customers. I do understand that privateMetadata is only collected for dotcom instances, but to relax this additional dotcom check for transcripts, I will need to confirm with the team we are okay with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @akalia25 , I have tagged the @sourcegraph/security-code-review to get approvals. We can do according to the suggestions.

To confirm again, @chenkc805 do we really need promptText for the chat-question:executed events on S2? Along with the detectedIntent, userSpecifiedIntent and intentScores?

context: https://linear.app/sourcegraph/issue/SRCH-1028/log-telemetry-of-the-output-of-the-intent-detection-model-with-each

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @thenamankumar could we add an isS2() clause (similar to isDotCom()), that would allow for transcripts promptText/responseText to be stored for both dotcom instance and S2.

This would help avoid collecting this data on customer's local PG instance and keeping it extra secure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like this: isDotCom(authStatus) || isS2(authStatus))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this change for now from this PR.

@RXminuS
Copy link
Contributor

RXminuS commented Oct 2, 2024

@thenamankumar we should maybe sync on this tomorrow because I just removed the field all together after was told it was unused. #5558 Also there's a new way of invoking and testing these now, happy to walk you through it or have a look in lib/shared/src/telemetry-v2/events/at-mention.ts and vscode/e2e/telemetry/at-mention.test.ts

@thenamankumar thenamankumar force-pushed the naman/onebox-telemetry-renames branch 2 times, most recently from 07c50ad to 82494e2 Compare October 3, 2024 08:30
Copy link

github-actions bot commented Oct 3, 2024

‼️ Hey @sourcegraph/cody-security, please review this PR carefully as it introduces the usage of an unsafe_ function or abuses PromptString.

@thenamankumar thenamankumar dismissed akalia25’s stale review October 3, 2024 08:38

Removed then requested changes, unblocking to merge the rest renames.

@thenamankumar thenamankumar merged commit 81fb2e9 into main Oct 3, 2024
20 checks passed
@thenamankumar thenamankumar deleted the naman/onebox-telemetry-renames branch October 3, 2024 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants