Terraform module that deploys the Sysdig Secure for Cloud stack in Google Cloud.
Provides unified threat-detection, compliance, forensics and analysis through these major components:
- **Threat Detection**: Tracks abnormal and suspicious activities in your cloud environment based on Falco language. Managed through the `cloud-connector` module.
- **Compliance**: Enables the evaluation of standard compliance frameworks. Requires both the `cloud-connector` and `cloud-bench` modules.
- **Image Scanning**: Automatically scans all container images pushed to the registry (GCR) and the images that run on the GCP workload (currently CloudRun). Managed through the `cloud-connector` module. Disabled by default; can be enabled through the `deploy_scanning` input variable parameters.
For other Cloud providers check: AWS, Azure
There are several ways to deploy Secure for Cloud in your GCP infrastructure. Check `/examples` for the most common scenarios:
- Single Project
- Single Project with a pre-existing Kubernetes Cluster
- Organizational

In many modules, examples and use-cases, we provide ways to re-use existing resources (as optionals) in your infrastructure. Check the input summary on each example/module.

Check `/use-cases` for self-baked, customer-specific alternative scenarios.
Find specific overall service architecture diagrams attached to each example/use-case.
In the long term our purpose is to evaluate those use-cases and, if they're common enough, convert them into examples to make their usage easier.
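For orientation, here is a minimal sketch of wiring up the single-project example. The module source path, the `sysdig_secure_api_token` variable name and all placeholder values are assumptions; check the chosen example's input summary for the real arguments:

```
provider "google" {
  project = "<PROJECT_ID>"
  region  = "<REGION>" # a region, not a location or zone
}

# hypothetical wiring; adapt to the example you pick
module "secure-for-cloud_example_single-project" {
  source = "sysdiglabs/secure-for-cloud/google//examples/single-project"

  sysdig_secure_api_token = "<SYSDIG_SECURE_API_TOKEN>"

  # Image Scanning is disabled by default; opt in explicitly
  deploy_scanning = true
}
```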
If you're unsure about what/how to use this module, please fill in the questionnaire report as an issue and let us know your context; we will be happy to help.
- Use one of the available GCP regions. Do not confuse the required `region` with a GCP location or zone; see Identifying a region or zone.
- All Sysdig Secure for Cloud features but Image Scanning are enabled by default. You can enable it through the `deploy_scanning` input variable parameter of each example.
- For free subscription users, beware that organizational examples may not deploy properly due to the 1 cloud-account limitation. Open an Issue so we can help you here!
- This example will create resources that cost money. Run `terraform destroy` when you don't need them anymore.
  - For a normal load, it should be less than $150/month approximately.
  - The Cloud Logging API is activated by default, so there is no extra cost here.
  - The Cloud Run instance is the most expensive service: default cpu/memory specs, for an ingestion of 35M events/hour, with 2 instances at 24x7 usage.
  - Cloud Run ingests events from a Pub/Sub topic with no retention. Its cost is quite negligible, but you can check with the calculator, based on the events shown in the Log Explorer console and approximately 4KB of size per event.
Beware that the logs we consume are scoped to the projects, and we exclude kubernetes events:

```
logName=~"^projects/SCOPED_PROJECT_OR_ORG/logs/cloudaudit.googleapis.com"
```
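To eyeball what that scope matches (and roughly size your ingestion), you can query it from the CLI; a sketch, assuming a placeholder project ID:

```
# preview a few audit-log entries that match the consumed scope
$ gcloud logging read \
    'logName=~"^projects/<PROJECT_ID>/logs/cloudaudit.googleapis.com"' \
    --project=<PROJECT_ID> --limit=5
```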
Your user must have the following roles in your GCP credentials:
- Owner
- Organization Admin (organizational usage only)
  - Required for org-wide roles, both for image scanning and compliance. Also, some queries are performed to dig into the org domain, folders and projects.
To authorize the cloud CLI to be used by Terraform, check the following Terraform Google Provider docs.
Instead of using a user, you can also deploy the module using a Service Account (SA). In order to create an SA for the organization, go to one of your organization projects and create an SA there. This SA must be granted the Organization Admin role. Additionally, you should allow your user to impersonate this SA.
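For reference, one possible way to provision such an SA from the CLI follows; the SA name, organization ID and user email are placeholders, not values mandated by the module:

```
# create the SA in one of the organization projects
$ gcloud iam service-accounts create sysdig-sfc --project=<PROJECT_ID>

# grant it Organization Admin at the organization level
$ gcloud organizations add-iam-policy-binding <ORG_ID> \
    --member="serviceAccount:sysdig-sfc@<PROJECT_ID>.iam.gserviceaccount.com" \
    --role="roles/resourcemanager.organizationAdmin"

# allow your user to create tokens for (impersonate) the SA
$ gcloud iam service-accounts add-iam-policy-binding \
    sysdig-sfc@<PROJECT_ID>.iam.gserviceaccount.com \
    --member="user:<YOUR_USER_EMAIL>" \
    --role="roles/iam.serviceAccountTokenCreator"
```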
SA role | SA user permissions
--- | ---
Besides, the following GCP APIs must be enabled (how do I check it?), depending on the desired feature:
- Cloud Pub/Sub API
- Cloud Run API
- Eventarc API
- Secret Manager API
- Cloud Build API
- Identity and Access Management API
- IAM Service Account Credentials API
- Cloud Resource Manager API
- Security Token Service API
- Cloud Asset API
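If any of these are missing, they can be enabled in one shot; the service IDs below map one-to-one to the list above (trim it to the features you deploy):

```
$ gcloud services enable \
    pubsub.googleapis.com run.googleapis.com eventarc.googleapis.com \
    secretmanager.googleapis.com cloudbuild.googleapis.com \
    iam.googleapis.com iamcredentials.googleapis.com \
    cloudresourcemanager.googleapis.com sts.googleapis.com \
    cloudasset.googleapis.com
```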
Check the official documentation: Secure for Cloud - GCP, Confirm the Services are working.
Choose one of the rules contained in an activated Runtime Policy for GCP, such as the Sysdig GCP Activity Logs policy, and execute it in your GCP account.
ex.: Create an alert (Monitoring > Alerting > Create policy). Delete it to prompt the event.
Remember that in case you add new rules to the policy, you need to give it time to propagate the changes.
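If you prefer the CLI over the console for this test, something along these lines should fire the same event; the policy file and name are placeholders, and the `gcloud alpha monitoring` surface may change between SDK releases:

```
# create a throwaway alert policy from a local JSON definition, then delete it
$ gcloud alpha monitoring policies create --policy-from-file=policy.json
$ gcloud alpha monitoring policies list --format="value(name)"
$ gcloud alpha monitoring policies delete <POLICY_NAME>
```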
In the `cloud-connector` logs you should see entries similar to this:

```
An alert has been deleted (requesting user=..., requesting IP=..., resource name=projects/test/alertPolicies/3771445340801051512)
```
In `Secure > Events` you should see the event coming through, but beware that you may need to activate specific severity levels, such as `Info`, depending on the rule you're firing.
Alternatively, a Terraform example module to trigger a GCP Update, Disable or Delete Sink event can be found in `examples/trigger-events`.
- For repository image scanning, upload an image to a new repository in an Artifact Registry. Follow the repository Setup Instructions provided by GCP:

  ```
  $ docker tag IMAGE:VERSION REPO_REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY/IMAGE:latest
  $ docker push REPO_REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY/IMAGE:latest
  ```

- For CloudRun image scanning, deploy a runner.
It may take some time, but in the `cloud-connector` logs you should see entries detecting the new image, similar to these:

```
An image has been pushed to GCR registry (project=..., tag=europe-west2-docker.pkg.dev/test-repo/alpine/alpine:latest, digest=europe-west2-docker.pkg.dev/test-repo/alpine/alpine@sha256:be9bdc0ef8e96dbc428dc189b31e2e3b05523d96d12ed627c37aa2936653258c)
Starting GCR scanning for 'europe-west2-docker.pkg.dev/test-repo/alpine/alpine:latest'
```

You should also see a CloudBuild being launched successfully.
A: Verify you're using the project ID, and not the project name or number. https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin
A: On your Google Cloud account, search for "APIs & Services > Enabled APIs & Services" or run the following command:

```
$ gcloud services list --enabled
```
A: This may happen because permissions are not sufficient, API services were not correctly enabled, or you're not correctly authenticated for the Terraform Google provider.
S: Verify permissions, API services, and that the Terraform Google Provider authentication has been correctly set up.
You can also apply the following Terraform manifest to check whether you're authenticated as the identity you expect:

```
data "google_client_openid_userinfo" "me" {
}

output "me" {
  value = data.google_client_openid_userinfo.me
}
```
As of April 2023, organizations with projects under organizational unit folders are supported with the organizational compliance example.
S: If you want to target specific projects, you can still use the `benchmark_project_ids` parameter to explicitly define the projects where the compliance role is to be deployed.
You can use the `fetch-gcp-projects.sh` utility to list organization member projects.
Let us know if this workaround isn't enough, and we will work on implementing a solution.
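A hypothetical sketch of that targeting, with made-up project IDs and the remaining example arguments elided:

```
module "secure-for-cloud_example_organization" {
  source = "sysdiglabs/secure-for-cloud/google//examples/organization"

  # ... other example arguments ...

  # deploy the compliance role only on these member projects
  benchmark_project_ids = ["org-child-project-1", "org-child-project-2"]
}
```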
A: On your GCP infrastructure, for each project where Compliance has been set up, check the following points:
- There is a Workload Identity Pool and an associated Workload Identity Pool Provider configured, which must have an ID of `sysdigcloud` (the display name doesn't matter).
- The pool should have a connected service account with the name `sfcsysdigcloudbench`, with the email `[email protected]`.
- This service account should allow access to the following principalSet: `principalSet://iam.googleapis.com/projects/<PROJECTID>/locations/global/workloadIdentityPools/sysdigcloud/attribute.aws_role/arn:aws:sts::***:assumed-role/***`
- The service account should have the Viewer role on the target project, as well as a custom role containing the "storage.buckets.getIamPolicy", "bigquery.tables.list", "cloudasset.assets.listIamPolicy" and "cloudasset.assets.listResource" permissions.
- The pool provider should allow access to Sysdig's trusted identity, which can be retrieved through:
```
$ curl https://<SYSDIG_SECURE_URL>/api/cloud/v2/gcp/trustedIdentity \
    --header 'Authorization: Bearer <SYSDIG_SECURE_API_TOKEN>'
```
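To inspect those bindings on a given project, you can dump the service account's IAM policy; the email below is a placeholder following the naming above:

```
$ gcloud iam service-accounts get-iam-policy \
    sfcsysdigcloudbench@<PROJECT_ID>.iam.gserviceaccount.com
```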
Q: Getting "Error creating Service: googleapi: got HTTP response code 404" "The requested URL /serving.knative.dev/v1/namespaces/***/services was not found on this server"
"module.secure-for-cloud_example_organization.module.cloud_connector.goo
gle_cloud_run_service.cloud_connector" error: Error creating Service: googleapi: got HTTP response code 404 with
…
<p><b>404.</b> <ins>That’s an error.</ins>
<p>The requested URL <code>/apis/serving.knative.dev/v1/namespaces/****/services</code> was not found on this server. <ins>That’s all we know.</ins>
A: This error is given by the Terraform GCP provider when an invalid region is used.
S: Use one of the available GCP regions. Do not confuse the required `region` with a GCP location or zone; see Identifying a region or zone.
Q: Getting an error because the address "https://-run.googleapis.com/apis/serving.knative.dev" cannot be resolved
A: The GCP region was not provided in the provider block.
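A minimal provider block that avoids this looks like the following; project and region values are placeholders:

```
provider "google" {
  project = "<PROJECT_ID>"
  region  = "us-central1" # any valid region; not a location or zone
}
```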
A: Some resources we use, such as the `google_iam_workload_identity_pool_provider`, are only available in the beta version of the Google provider.
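S: Declare and configure the `google-beta` provider alongside the regular one. A minimal sketch, with placeholder values:

```
terraform {
  required_providers {
    google      = { source = "hashicorp/google" }
    google-beta = { source = "hashicorp/google-beta" }
  }
}

# beta-only resources, such as google_iam_workload_identity_pool_provider,
# are created through this provider
provider "google-beta" {
  project = "<PROJECT_ID>"
  region  = "<REGION>"
}
```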
Q: Getting "Error creating WorkloadIdentityPool: googleapi: Error 409: Requested entity already exists"
A: Currently the Sysdig Backend does not support dynamic WorkloadPools and its name is fixed to `sysdigcloud`.
Moreover, Google only performs a soft-deletion of this resource:
https://cloud.google.com/iam/docs/manage-workload-identity-pools-providers#delete-pool
> You can undelete a pool for up to 30 days after deletion. After 30 days, deletion is permanent. Until a pool is permanently deleted, you cannot reuse its name when creating a new workload identity pool.
S: For the moment, the federation workload identity pool+provider have a fixed name. There are several options here:
- For single-account setups, in case you want to reuse it, you can make use of the `reuse_workload_identity_pool` attribute available in some examples.
- For organizational setups, you can make use of a single workload-identity for the whole organization, with the `/organization-org_compliance` example.
- Alternatively, you can reactivate and import it into your terraform state manually:
```
# re-activate pool and provider
$ gcloud iam workload-identity-pools undelete sysdigcloud --location=global
$ gcloud iam workload-identity-pools providers undelete sysdigcloud --workload-identity-pool="sysdigcloud" --location=global

# import to terraform state
# for this you have to adapt the import resource to your specific usage
# ex.: for single-project, input your project-id
$ terraform import 'module.secure-for-cloud_example_single-project.module.cloud_bench[0].module.trust_relationship["<PROJECT_ID>"].google_iam_workload_identity_pool.pool' <PROJECT_ID>/sysdigcloud
$ terraform import 'module.secure-for-cloud_example_single-project.module.cloud_bench[0].module.trust_relationship["<PROJECT_ID>"].google_iam_workload_identity_pool_provider.pool_provider' <PROJECT_ID>/sysdigcloud/sysdigcloud

# ex.: for organization example you should change its reference too, per project
$ terraform import 'module.secure-for-cloud_example_organization.module.cloud_bench[0].module.trust_relationship["<PROJECT_ID>"].google_iam_workload_identity_pool.pool' <PROJECT_ID>/sysdigcloud
$ terraform import 'module.secure-for-cloud_example_organization.module.cloud_bench[0].module.trust_relationship["<PROJECT_ID>"].google_iam_workload_identity_pool_provider.pool_provider' <PROJECT_ID>/sysdigcloud/sysdigcloud
```
The import resource to use is the one pointed out in your terraform plan/apply error message:

```
-- for
Error: Error creating WorkloadIdentityPool: googleapi: Error 409: Requested entity already exists
  with module.secure-for-cloud_example_organization.module.cloud_bench[0].module.trust_relationship["org-child-project-1"].google_iam_workload_identity_pool.pool,
  on .... in resource "google_iam_workload_identity_pool" "pool":
  resource "google_iam_workload_identity_pool" "pool" {

-- use 'module.secure-for-cloud_example_organization.module.cloud_bench[0].module.trust_relationship["org-child-project-1"].google_iam_workload_identity_pool.pool' as your import resource
-- such as
$ terraform import 'module.secure-for-cloud_example_organization.module.cloud_bench[0].module.trust_relationship["org-child-project-1"].google_iam_workload_identity_pool.pool' 'org-child-project-1/sysdigcloud'
```
Note: if you're using terragrunt, run `terragrunt import`.
Q: Getting "Error creating Topic: googleapi: Error 409: Resource already exists in the project (resource=gcr)"
```
│ Error: Error creating Topic: googleapi: Error 409: Resource already exists in the project (resource=gcr).
│
│   with module.sfc_example_single_project.module.pubsub_http_subscription.google_pubsub_topic.topic[0],
│   on ../../../modules/infrastructure/pubsub_push_http_subscription/main.tf line 10, in resource "google_pubsub_topic" "topic":
│   10: resource "google_pubsub_topic" "topic" {
```
A: This error happens due to a GCP limitation where only a single topic named `gcr` can exist. This name is hardcoded by GCP and is the one we use to detect images pushed to the registry.
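You can quickly check whether the topic is already there before applying; a `NOT_FOUND` answer means Terraform can create it safely:

```
$ gcloud pubsub topics describe gcr
```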
S: If the topic already exists, you can import it into your terraform state, BUT BEWARE that once you run destroy it will be removed.

```
$ terraform import 'module.sfc_example_single_project.module.pubsub_http_subscription.google_pubsub_topic.topic[0]' gcr
```
Contact us to develop a workaround for this where the topic name can be reused.
Note: if you're using terragrunt, run `terragrunt import`.
Q: Getting "Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable."
A: If the cloud-connector Cloud Run module cannot start, it will give this error. The error is raised by the health-check system; it's not specific to the PORT per se.
S: Check for possible logs before the deployment crashes. It could be a limitation due to the Sysdig license (an expired trial subscription, or free-tier usage where the cloud-account limit has been surpassed).
Q: Getting "message: Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable"
A: Contrary to AWS, Terraform Google deployment requires the just-started workload to come up in a healthy status; if this does not happen, it will fail.
S: Check your workload service (Cloud Run) logs to see what really failed. One common cause is a wrong Sysdig Secure API Token.
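One way to pull those logs from the CLI; the service name is a placeholder for whatever the example created:

```
$ gcloud logging read \
    'resource.type="cloud_run_revision" AND resource.labels.service_name="<CLOUD_RUN_SERVICE>"' \
    --limit=50
```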
```
error starting scan runner for image ****: rpc error: code = PermissionDenied desc = Cloud Build API has not been used in project *** before or it is disabled.
Enable it by visiting https://console.developers.google.com/apis/api/cloudbuild.googleapis.com/overview?project=*** then retry.
If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry
```
A: Do as the error says and activate the Cloud Build API. Check the list of all the required APIs that need to be activated per feature module.
A: Verify that the `gcr` topic exists. If `create_gcr_topic` is set to false and the `gcr` topic is not found, GCR scanning is omitted and won't be deployed. For more info, see GCR PubSub topic.
- Uninstall the previous deployment's resources before upgrading:

  ```
  $ terraform destroy
  ```

- Upgrade the full terraform example with:

  ```
  $ terraform init -upgrade
  $ terraform plan
  $ terraform apply
  ```
- If the event-source is created through SFC, some events may get lost while upgrading with this approach. However, if the cloudtrail is re-used (the normal production setup), events will be recovered once the ingestion resumes.
- If required, you can upgrade the cloud-connector component by restarting the task (stop task). Because it's not pinned to a specific version, it will download the `latest` one.
Module is maintained and supported by Sysdig.
Apache 2 Licensed. See LICENSE for full details.