Change 'metadata.cloud.project.*' for GCP, clarify other fields #822

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

trentm

@trentm trentm commented Aug 11, 2023

This PR changes the spec for two cloud.project.* metadata fields for GCP to be consistent with how Beats collect the same metadata. It was motivated by a user who is unable to correlate documents from APM agents and Beats on cloud.project.id.

It also clarifies a couple of things about GCP metadata collection. Finally, there is an open question about adding a new field.

details

A Google cloud project has three identifiers. The CLI lists them as:

% gcloud projects list
PROJECT_ID    NAME          PROJECT_NUMBER
my-project    My Project    123456789012

and the GCP docs describe them here: https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin

proposed changes

  1. Change cloud.project.id to use GCE metadata project.projectId, instead of project.numericProjectId (aka "project number"). This will match what Beats do. As well, the projectId field is a more visible identifier for the user -- it is more prominently shown in the Google Cloud console.
  2. Stop collecting cloud.project.name. What Google describes as the "Project name" is not available in the GCE metadata response. The APM agents should not be attempting to make GCP API requests to get the project name.
  3. Note: The instance.id from the GCE metadata is a big int. Your JSON parser could be surprised by it -- the JavaScript one gets it wrong, the Python one is fine. Please check.
    > JSON.parse('{"id": 7774572792595385001}')
    { id: 7774572792595385000 } // oops
  4. Note: The previous description for how to get the cloud.region was wrong. (The Python implementation linked to earlier in the metadata.md doc is correct.)
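
As a rough illustration of points 1 to 3, here is a minimal sketch of mapping a raw GCE metadata response body to the cloud.* fields. The function name `toCloudMetadata`, the sample body, and the regex-based quoting of the id are assumptions made for this example and are not taken from any agent implementation; the response shape is inferred from the field names mentioned above.

```javascript
// Sketch only: `toCloudMetadata` and the sample body below are hypothetical.
function toCloudMetadata(rawBody) {
  // Quote the bare integer value of the first "id" key before parsing,
  // so JSON.parse keeps every digit instead of rounding to a double.
  // (Assumption: the instance id is the first key named exactly "id".)
  const quoted = rawBody.replace(/("id"\s*:\s*)(\d+)/, '$1"$2"');
  const data = JSON.parse(quoted);
  return {
    instance: { id: String(data.instance.id) },
    // Use projectId, not numericProjectId. cloud.project.name is
    // intentionally not set: it is not in the GCE metadata response.
    project: { id: data.project.projectId },
  };
}

const rawBody =
  '{"instance": {"id": 7774572792595385001},' +
  ' "project": {"projectId": "my-project", "numericProjectId": 123456789012}}';

const cloud = toCloudMetadata(rawBody);
console.log(cloud.instance.id); // the id as a string, all digits intact
console.log(cloud.project.id);  // my-project
```

Rewriting the body with a regex is fragile and is used here only to keep the sketch dependency-free; an agent may prefer a JSON parser with big-integer support.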

open question

(This is for discussion. I have not proposed changing this in this PR.)

Should we also set cloud.account.id to the <GCE metadata>.project.projectId?

Beats do.

ECS says:

The cloud account or organization id used to identify different entities in a multi-tenant environment.

Examples: AWS account id, Google Cloud ORG Id, or other unique identifier.

example: 666777888999

The "Examples: Google Cloud ORG Id" seems to indicate using projectId would be inappropriate. However, I think GCP uses the project (not some concept of the account) to "identify" resources.
Some examples from the GCE metadata (the globally unique identifier there is the numericProjectId):

  machineType: 'projects/123456789012/machineTypes/e2-micro',
  network: 'projects/123456789012/networks/default',
  email: '[email protected]',

For comparison, the OTel cloud semconv do not have cloud.project.* (perhaps because they are concepts particular to GCP and Azure). They do have cloud.account.id, and at least the OTel JS SDK sets that to the GCE metadata project-id value.

I am curious if/where the ECS + OTel semconv merge lands on this.

checklist

  • May the instrumentation collect sensitive information, such as secrets or PII (ex. in headers)?
  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Approved by at least 2 agents + PM (if relevant)
  • Merge after 7 days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.
  • Create implementation issues through the meta issue template (this will automate issue creation for individual agents)
  • If this spec adds a new dynamic config option, add it to central config.

/schedule 2023-09-05

@trentm trentm self-assigned this Aug 11, 2023

trentm commented Aug 11, 2023

See also this Java APM agent issue: elastic/apm-agent-java#3250

@gregkalapos gregkalapos self-requested a review August 17, 2023 10:55

@gregkalapos gregkalapos left a comment


LGTM.

Should we also set cloud.account.id to the <GCE metadata>.project.projectId?

I'd wait and see where the ECS + OTel merge settles on it, so I have the same thinking:

I am curious if/where the ECS + OTel semconv merge lands on this.

I suggest we not add it yet; depending on the outcome of the merge, we may come back to this. No strong opinion though.

@trentm trentm marked this pull request as ready for review August 29, 2023 19:01
@trentm trentm requested review from a team as code owners August 29, 2023 19:01
@trentm trentm removed request for a team August 29, 2023 19:02
@github-actions github-actions bot merged commit 21964a6 into main Sep 5, 2023
6 checks passed

5 participants