-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename onebox telemetry metadata fields #5770
Conversation
Hey @thenamankumar I thought we were only interested in collecting |
// 🚨 SECURITY: chat transcripts are to be included only for DotCom users AND for V2 telemetry | ||
// V2 telemetry exports privateMetadata only for DotCom users | ||
// the condition below is an additional safeguard measure | ||
promptText: | ||
isDotCom(authStatus) && | ||
(await truncatePromptString(inputText, CHAT_INPUT_TOKEN_BUDGET)), | ||
promptText: await truncatePromptString(inputText, CHAT_INPUT_TOKEN_BUDGET), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been really strict on how we collect sensitive data like this, to ensure, it is never collected from customers. I do understand that privateMetadata
is only collected for dotcom instances, but to relax this additional dotcom check for transcripts, I will need to confirm with the team we are okay with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @akalia25 , I have tagged the @sourcegraph/security-code-review to get approvals. We can do according to the suggestions.
To confirm again, @chenkc805 do we really need promptText for the chat-question:executed
events on S2? Along with the detectedIntent, userSpecifiedIntent and intentScores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @thenamankumar could we add an isS2()
clause (similar to isDotCom()
), that would allow for transcripts promptText
/responseText
to be stored for both dotcom instance and S2.
This would help avoid collecting this data on customer's local PG instance and keeping it extra secure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this: isDotCom(authStatus) || isS2(authStatus))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this change for now from this PR.
@thenamankumar we should maybe sync on this tomorrow because I just removed the field all together after was told it was unused. #5558 Also there's a new way of invoking and testing these now, happy to walk you through it or have a look in |
07c50ad
to
82494e2
Compare
|
Removed then requested changes, unblocking to merge the rest renames.
This PR renames a bunch of OneBox telemetry events metadata fields and make them camelcase.
This PR also removes the checkisDotCom
from thepromptText
private metadata field passed with thechat-question:executed
event. None of the private metadata fields are sent to SG through the telemetery pipeline, so that data is still private, similar to the other present privateMetadata field. The only change is that thepromptText
will be recorded in the event_logs on that sourcegraph instance.We need to remove the check so that we can selectively collect that field for the S2 instance, configured by a env var here: https://github.com/sourcegraph/cloud/pull/15143
Test plan
Make sure all the updated events are still recorded correctly.
Changelog