Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dramatic increase in CPU and memory consumption / performance issues #325

Closed
mike-larson opened this issue Dec 19, 2022 · 36 comments
Closed
Assignees
Labels
bug Something isn't working customer impact:high

Comments

@mike-larson
Copy link

mike-larson commented Dec 19, 2022

What happened?

Tried to create 75 resources of the kinds Subnet as well as RouteTableAssociation, as well as 3 RouteTables and Routes. All mentioned resources from apiVersion ec2.aws.upbound.io/v1beta1

The resources were created using a composition.

This results in the provider-aws pod (version 0.23.0) cannot process all the requests, increases dramatically in CPU and memory usage, and takes 24 hours to create these resources, until dedicating an entire node with 4 CPU's to Upbound provider-aws - after this, resource creation is faster, but -

After finishing creation, the resource use is permanently heightened to 4000 milicore and 3000Mi memory. It stays this way after all resources are ready and synced.

Without these resources, the resource use fluctuates between 800-2000milicore, and around 200Mi .

This is a dramatic and unsustainable increase in resource use upon adding additional resources.

When looking into the activity inside the aws-provider pod, it's visible that its constantly running terraform plan/apply on seemingly all resources. A very inefficient process

Additionally, tried to increase the replicas in the controller-config to 5, both with and without leader election. With leader election, only 1 of the pods was doing any practical work, and without, all the pods were competing to consume the same amount of resources as currently with just 1. In both cases, the same resource use.


As per requested, output of
kubectl get composite
image

kubectl get managed
image
image


An optional side-note which is not critical to read - based on my own analysis the Upbound AWS provider seems to do the following:

  1. Create 1 subnet
  2. Tries to reconcile ALL routetableassociations in separate Terraform plans
  3. 1 routetableassociation created that matches the subnet
  4. Repeat.

Using a Composition to create the resources had marginal benefits in relation to how quickly it was discovered that the subnet had been created, as there was a hard reference in status and patching, but otherwise no difference what so ever.

What I thought using a Composition would do is that it would gather all the resources in the same TF plan (so all three subnets and associations) to promote efficiency, but it doesn't do that.

How can we reproduce it?

I've put the compositeresourcedefinitions and a sample Helm template used to generate the 75 subnets in a public GitHub repository (simply increase the number of entries in the values file sample-values-format.yaml to created desired number of resources)
https://github.com/mike-larson/upbound-provider-aws-issue

What environment did it happen in?

  • Universal Crossplane Version: 1.10.1-up.1
  • Provider Version: 0.23.0
  • Kubernetes version (use kubectl version): 1.22
  • Kubernetes distribution (e.g. Tectonic, GKE, OpenShift): EKS
  • OS (e.g. from /etc/os-release): Bottlerocket OS 1.8.0 (aws-k8s-1.22)
@mike-larson mike-larson added the bug Something isn't working label Dec 19, 2022
@bhavinkotak
Copy link

@mike-larson - did you get a chance to retest it using updated crossplane version - 1.10.2? It seems to have been fixed.

@blakebarnett
Copy link

I'm not sure if there was a regression, but we are seeing crazy increases in memory usage on startup for provider-aws after version v0.26
Screenshot 2023-02-09 at 11 45 41 AM

It just repeatedly gets OOMKilled / Evicted. We're using m6i.large instance types with 8GB memory and very little running on the nodes. Is this expected?

By comparison v0.26 gets up to around 4.5GB RSS at startup and the steady state is ~1GB. We'd really like to upgrade, there are fixes in v0.29 we need.

@jessebye
Copy link

jessebye commented Feb 10, 2023

We are using Crossplane v1.11.0 and still observe this issue, both for v0.26 and later versions of this provider.

I can replicate it as follows:

  1. Create an SNS TopicSubscription resource, where the topic doesn't exist.
  2. Observe in the debug provider logs that it is perpetually refreshing and re-applying the change.
  3. Observe the CPU/memory use for the provider pod is very high. The more resources that are stuck in the perpetual reconcile state, the higher the resource use.
  4. If the topic is created, then the apply succeeds and the CPU/memory drop to normal levels immediately.

Snippet of the logs when the problem is occuring:

1.6760075247863972e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.676007528990535e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:44.897515Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.972231Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.973199Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.982686Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.982769Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.985783Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:48.985827Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075289907613e+09    DEBUG    provider-aws    Waiting for external resource existence to be confirmed    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434314", "external-name": ""}
1.6760075289909427e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434314"}, "reason": "PendingExternalResource", "message": "Waiting for external resource existence to be confirmed"}
1.676007536991595e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075408924448e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:38:57.094438Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.861201Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.862139Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.869631Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.869719Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.881364Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.881523Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075409446359e+09    DEBUG    provider-aws    Successfully requested creation of external resource    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434314", "external-name": "", "external-name": ""}
1.6760075409451208e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434662"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}
1.6760075409539986e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.676007541004167e+09    DEBUG    provider-aws    External resource is up to date    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434662", "external-name": "", "requeue-after": 1676008141.0041614}
1.676007541016518e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075410937684e+09    DEBUG    provider-aws    External resource is up to date    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434666", "external-name": "", "requeue-after": 1676008141.0937617}
2.6760075465461829e+09    DEBUG    provider-aws    apply async ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:00.961583Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Plan to create\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:05.404350Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"planned_change\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 1 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:05.404442Z\",\"changes\":{\"add\":1,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Creating...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:06.459684Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"create\"},\"type\":\"apply_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Creation errored after 0s\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:06.496166Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"create\",\"elapsed_seconds\":0},\"type\":\"apply_errored\"}\n{\"@level\":\"error\",\"@message\":\"Error: error creating SNS Topic Subscription: NotFound: Topic does not exist\\n\\tstatus code: 404, request id: 71173a0b-ac0a-51ad-87dc-665476b0dfe9\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:06.505200Z\",\"diagnostic\":{\"severity\":\"error\",\"summary\":\"error creating SNS Topic Subscription: NotFound: Topic does not exist\\n\\tstatus code: 404, request id: 71173a0b-ac0a-51ad-87dc-665476b0dfe9\",\"detail\":\"\",\"address\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"range\":{\"filename\":\"main.tf.json\",\"start\":{\"line\":1,\"column\":1717,\"byte\":1716},\"end\":{\"line\":1,\"column\":1718,\"byte\":1717}},\"snippet\":{\"context\":\"resource.aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"code\":\"{\\\"provider\\\":{\\\"aws\\\":{\\\"access_key\\\":\\\"ASIAT25H6KLSHGOTM2FB\\\",\\\"region\\\":\\\"REDACTED\\\",\\\"secret_key\\\":\\\"OjuvgbGG3psybXOpT+J1ukKBSgCcLNvFdWEvqAU0\\\",\\\"token\\\":\\\"IQoJb3JpZ2luX2VjEBYaCXVzLWVhc3QtMiJIMEYCIQD19vYAXSx6hC1+YAuuF8XRym30Xltl1RA5Uub2h1DkEQIhAPYfnCkDlpEu2L4muDir/3tgsuf7WJS8pBqVHZUTokeyKpYFCJ///////////wEQBBoMMjYzOTU1ODI5NDc2Igz7Od3nG6LC8tOyVC4q6gRan+rrYZGJzGUpdVgYkGc5rJLrogudor9V+96DfsZU5K54PxhZzlCaWIRRy4bNzrADVT6bR9yBrJ1kyMoryLg93CfzvGL4tUobgeUPaUANHSV6eecM8HBIS0paicAar9A2tY42LRLvVRLNoxWb+KCx9okq55OgjCQUFQ7eNyQEOWgVLoZNx4zwaW6js7WlryUy5jkFxAyiG2xU+Jg4vRUFV47t9vI+Jp/rFVAM46OqfzfpNB6RTzqwE4CKc1QwPOfhR3wkr51MPPPTXWJDUOqPbtLAFAp7UfZChlULceoZOM/Uw9Bs6UlcfRFx2B/g2AOW+GNMeej2mi15sW81GF6OMAl+/KPaiKX1MoXF+p9Spyw9SfGVag6JJntYuwuP/0CpecZSVkcDqiGDi8Lf0A4pn7qgW3dlSNghWAMz0xcwatMteNFavec2IecLMN9lhjsNvSqGnWPWisQwE0BBW334bPTqbxA+wKhpcjp1ms5NPFgprQn9UoqjFr2fllmNHilArYPPr5v+EhcvsEMFpSSP38qb4G7wI0OxzjXkyIJHr5IlzYswegIH21m47buSLWnB1d2iHR0vdIe6l1MF11vqLKMd6XZt60G4bnY286PPrSAk+6pGdDpo9jAiFteWwlae8MYXXWBLutph60tn8r7VO3n43yiECMIVviKTFJPrfEspTIXUS8/48iafZXeBX06qOZXI7LhPZ88sv0VbnJ1Fa3RA0kNff2pnCwYa3utcP25bP51kBmbwoxCHpAxINwSOD2ApyQxTB9oiwjq/3YuMp9KeRkUDW6CDw+Kh500AoHQ2WhNX/uFNnxgw8bCXnwY6mgEDZ+BkhM9UDC1103WJh59vsdgXG+K2V1nI+uzOPwFZYT6t4NesI0LeTUqWW19sq8WuA8UlLTsCE/b8hz2P8SXqKokP5zuLyIqb5O5UXT6FOwvQnbTMbo4I50nv7R01Ve8nawTV5xNC0uCrW4ln7b17asjf+N7jwTFJ64VBczGjqTBf4o17N1EksR9krcsd7A/ePTPneKdTroh3\\\"}},\\\"resource\\\":{\\\"aws_sns_topic_subscription\\\":{\\\"test-api-trading-activation-target-service-trading-activated-event\\\":{\\\"endpoint\\\":\\\"arn:aws:sqs:REDACTED:REDACTED:test-api-trading-activation\\\",\\\"lifecycle\\\":{\\\"prevent_destroy\\\":true},\\\"protocol\\\":\\\"sqs\\\",\\\"topic_arn\\\":\\\"arn:aws:sns:REDACTED:REDACTED:test-target-service-trading-activated-event\\\"}}},\\\"terraform\\\":{\\\"required_providers\\\":{\\\"aws\\\":{\\\"source\\\":\\\"hashicorp/aws\\\",\\\"version\\\":\\\"4.15.1\\\"}}}}\",\"start_line\":1,\"highlight_start_offset\":1716,\"highlight_end_offset\":1717,\"values\":[]}},\"type\":\"diagnostic\"}\n"}
1.6760075465570364e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075501069496e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:06.665676Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.089382Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.090361Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.096908Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.096996Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.102838Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:10.102892Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075501072614e+09    DEBUG    provider-aws    Waiting for external resource existence to be confirmed    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434722", "external-name": ""}
1.676007550107764e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434722"}, "reason": "PendingExternalResource", "message": "Waiting for external resource existence to be confirmed"}
1.6760075511080823e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.676007554811499e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:11.258118Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.780536Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.781669Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.799238Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.799306Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.806332Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:14.806416Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075548118138e+09    DEBUG    provider-aws    Waiting for external resource existence to be confirmed    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434722", "external-name": ""}
1.6760075548124452e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434722"}, "reason": "PendingExternalResource", "message": "Waiting for external resource existence to be confirmed"}
1.6760075568129787e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075603342373e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:16.917954Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.300299Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.302372Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.321666Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.321743Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.327852Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:20.327905Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.676007560334608e+09    DEBUG    provider-aws    Waiting for external resource existence to be confirmed    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434722", "external-name": ""}
1.6760075603346975e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434722"}, "reason": "PendingExternalResource", "message": "Waiting for external resource existence to be confirmed"}
1.6760075643358264e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.676007568899769e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:24.559425Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.885372Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.886147Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.892513Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.892605Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.895363Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:28.895401Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075689001238e+09    DEBUG    provider-aws    Waiting for external resource existence to be confirmed    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434722", "external-name": ""}
1.6760075689002964e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434722"}, "reason": "PendingExternalResource", "message": "Waiting for external resource existence to be confirmed"}
1.676007576901035e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075807071602e+09    DEBUG    provider-aws    refresh ended    {"workspace": "/tmp/e7e6ee39-3dcb-40d7-b869-a6494c569188", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:37.014541Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refreshing state...\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.682887Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"id_key\":\"id\"},\"type\":\"refresh_start\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Refresh complete\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.683784Z\",\"hook\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null}},\"type\":\"refresh_complete\"}\n{\"@level\":\"info\",\"@message\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event: Drift detected (delete)\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.697958Z\",\"change\":{\"resource\":{\"addr\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"module\":\"\",\"resource\":\"aws_sns_topic_subscription.test-api-trading-activation-target-service-trading-activated-event\",\"implied_provider\":\"aws\",\"resource_type\":\"aws_sns_topic_subscription\",\"resource_name\":\"test-api-trading-activation-target-service-trading-activated-event\",\"resource_key\":null},\"action\":\"delete\"},\"type\":\"resource_drift\"}\n{\"@level\":\"info\",\"@message\":\"Plan: 0 to add, 0 to change, 0 to destroy.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.698039Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"plan\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Apply complete! Resources: 0 added, 0 changed, 0 destroyed.\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.701863Z\",\"changes\":{\"add\":0,\"change\":0,\"remove\":0,\"operation\":\"apply\"},\"type\":\"change_summary\"}\n{\"@level\":\"info\",\"@message\":\"Outputs: 0\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-10T05:39:40.701917Z\",\"outputs\":{},\"type\":\"outputs\"}\n"}
1.6760075807370524e+09    DEBUG    provider-aws    Successfully requested creation of external resource    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521434722", "external-name": "", "external-name": ""}
1.6760075807374654e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521435068"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}
1.6760075807451665e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.676007580813221e+09    DEBUG    provider-aws    External resource is up to date    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521435068", "external-name": "", "requeue-after": 1676008180.813216}
1.67600758082477e+09    DEBUG    provider-aws    Reconciling    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event"}
1.6760075809034948e+09    DEBUG    provider-aws    External resource is up to date    {"controller": "managed/sns.aws.upbound.io/v1beta1, kind=topicsubscription", "request": "/test-api-trading-activation-target-service-trading-activated-event", "uid": "e7e6ee39-3dcb-40d7-b869-a6494c569188", "version": "521435071", "external-name": "", "requeue-after": 1676008180.9034905}

It would be wonderful if we could at least adjust the reconcile retry rate, since in our case, we want it to keep retrying (since the topic will eventually get created), just at a much slower pace (exponential backoff would be nice).

@blakebarnett
Copy link

That explains the constant high CPU usage we've been seeing. But there is also a dramatic jump in memory usage when starting up for versions above v0.26, we can't get v0.29 running at all currently but v0.26 calms down much more quickly, I'm not sure if it's purely the increase in CRDs or something else going on.

@rozcietrzewiacz
Copy link

I'm seeing high CPU usage by this provider, though in a slightly different scenario, which I will describe in a separate issue. I just wanted to drop a suggestion here, for those who are in a position to use it:

If possible, try and use the native aws provider for the resources that are supported by it. I realize this can complicate the platform definition if you need to add another provider to your setup, so this might not be a solution for everyone.

@jessebye
Copy link

@rozcietrzewiacz confirmed, the CPU usage is much lower with the community AWS provider. We've switched over to that and it's much more performant. Perhaps because it uses the AWS SDK instead of Terraform?

@mmclane
Copy link

mmclane commented Feb 13, 2023

I also see high CPU and memory usage.

We haven't used this to build too much so far. Only 12 cognito user pools. I do see that they Reconcile constantly. I do see a Reconsciler error in the debug logs.

1.6763083302780247e+09 ERROR Reconciler error {"controller": "managed/wafv2.aws.upbound.io/v1beta1, kind=rulegroup", "controllerGroup": "wafv2.aws.upbound.io", "controllerKind": "RuleGroup", "userPool": {"name":"matt-user-pool-jwll2-b7j52"}, "namespace": "", "name": "matt-user-pool-jwll2-b7j52", "reconcileID": "8aa0442b-060f-4fb7-871c-dee3b587199c", "error": "cannot update managed resource status: Operation cannot be fulfilled on userpools.cognitoidp.aws.upbound.io "matt-user-pool-jwll2-b7j52": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

I don't know that the resources was actually modified outside of the provider. I don't think this one has ever actually been used.

@jbw976
Copy link
Member

jbw976 commented Feb 13, 2023

Thank you everyone for the discussion here and sharing your observations and data! That is definitely helpful and appreciated. I think we should take a deeper look into this issue and see if there's a recent regression or some other root cause that is not simply related to the typical performance issues we've seen from a large number of CRDs installed.

Please do keep sharing your findings as we dig into this more! 🙇 🙏

@jbw976
Copy link
Member

jbw976 commented Feb 13, 2023

Just a quick note that is does indeed look like the timestamps for the reconcile events are happening more often than I would expect. Timestamps grabbed from #325 (comment) and then filtered with:

grep -F 'Reconciling' | grep -F -i '/test-api-trading-activation-target-service-trading-activated-event' | awk '{print $1}'

They're in a non human friendly scientific notation epoch format (tracked in crossplane/crossplane-runtime#373), but converted the timestamps look like below:

1.6760075247863972e+09 - Fri Feb 10 2023 05:38:44 GMT+0000
1.676007536991595e+09 - Fri Feb 10 2023 05:38:56 GMT+0000
1.6760075409539986e+09 - Fri Feb 10 2023 05:39:00 GMT+0000
1.676007541016518e+09 - Fri Feb 10 2023 05:39:01 GMT+0000
1.6760075465570364e+09 - Fri Feb 10 2023 05:39:06 GMT+0000
1.6760075511080823e+09 - Fri Feb 10 2023 05:39:11 GMT+0000
1.6760075568129787e+09 - Fri Feb 10 2023 05:39:16 GMT+0000
1.6760075643358264e+09 - Fri Feb 10 2023 05:39:24 GMT+0000
1.676007576901035e+09 - Fri Feb 10 2023 05:39:36 GMT+0000
1.6760075807451665e+09 - Fri Feb 10 2023 05:39:40 GMT+0000
1.67600758082477e+09 - Fri Feb 10 2023 05:39:40 GMT+0000

That's 11 reconcile events within about ~1 minute - I would think we'd see less than that if they were doing exponential backoff. Perhaps the condition for this SNS topic that is causing us to requeue another reconcile isn't an error and isn't subject to exponential backoff?

Also, we see 2 events for Successfully requested creation of external resource, about 40 seconds apart for the same object/uuid. Is the provider trying to create the object multiple times?

grep -F 'events' | grep -F 'Successfully requested creation of external resource'
1.6760075409451208e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521434662"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}
1.6760075807374654e+09    DEBUG    events    Normal    {"object": {"kind":"TopicSubscription","name":"test-api-trading-activation-target-service-trading-activated-event","uid":"e7e6ee39-3dcb-40d7-b869-a6494c569188","apiVersion":"sns.aws.upbound.io/v1beta1","resourceVersion":"521435068"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}

Timestamps translated:

  • Fri Feb 10 2023 05:39:00 GMT+0000
  • Fri Feb 10 2023 05:39:40 GMT+0000

So the double creation and the frequency of reconciling here does look fishy to me 🎣 🤔

@blakebarnett
Copy link

I verified that the huge spike in memory doesn't occur when all managed resources in the cluster are Synced and in Ready state, we were able to upgrade to v0.29 normally in this state, the memory usage didn't go up at all, in fact it dropped by ~70%.

The bad news is, when the cluster was in this state (pods getting Evicted constantly) it was creating duplicate resources for apparently no reason which made the situation far worse. We went from 32 managed resources to 380+. Once provider-aws recovered it was able to successfully remove most of them, a handful required manual deletion and patching to remove the finalizer.

@mmclane
Copy link

mmclane commented Feb 14, 2023

I have crossplane installed on two clusters hub and dev. On the dev cluster I have it managing 12 cognito user pools. On the hub cluster it is only installed and isn't managing anything. The resource utilization between the two is vastly different. On the dev cluster where it is managing resources, we are experiencing high CPU and memory on my upbound-provider-aws pod. Both clusters are running the latest version of crossplane and the upbound-provider-aws containers. Let me provide some numbers.

Hub cluster CPU Utilization
image

Hub cluster memory usage:
image

Dev cluster CPU Utilization
image

Dev cluster memory:
image

Its regularly using over 5 gig of memory and sometimes over 12.

The real issue is the pod is causing Memory Node pressure on my dev cluster and the pods gets evicted. The pod is regularly causing node memory pressure and gets evicted. In the past 24 hours its been evicted 164 times because the node is under memory pressure.

I have tried the following

  • Upgrading everything
  • disabling debug mode

My next steps

I am going to try to set pod resource requests and limits on the upbound-provider-aws pod to see if I can keep it from continually getting evicted. I am also going to consider using the community provider instead of the upbound version.

Sample of my logs

What I think I am seeing is that its constantly trying to reconcile the environments. I also see it getting errors on xray which is weird because I am not building anything with xray.

https://gist.github.com/mmclane/b44b933cb831d26d520169f507d96c13

@mmclane
Copy link

mmclane commented Feb 14, 2023

All of my resources are synced and ready
image

@mmclane
Copy link

mmclane commented Feb 14, 2023

I was able to set resource limits but when I had it limited to 5.5GB of ram I was seeing regular restarts of the pod due to it running out of memory. Setting it to 8G limit also results in restarts.

@mmclane
Copy link

mmclane commented Feb 15, 2023

Even at a memory limit of 14G, this pod is restarted due to OOMKill 10 times in 5 hours

@mmclane
Copy link

mmclane commented Feb 16, 2023

One thing I have noticed this morning is that while my userpools show that they are synced and ready, the underlying resources in the composition are not showing has being in sync.

image

If I look at the details on those I see that the plan is failing. I don't know why this would be and the AWS resource have all been created. I am wondering if this is what is causing all the CPU and Memory usage as it tries to rerun a plan for everything all the time.

image

Furthermore, I had crossplane create an RDS database today (as I am working on an XRD around that) and it isn't able to delete the database it just created with the same errors.

I am going to try to figure out why I am getting these errors if I can....

Here are some log snippets I have found

1.6765624171694622e+09	DEBUG	provider-aws	Cannot observe external resource	{"controller": "managed/iam.aws.upbound.io/v1beta1, kind=rolepolicyattachment", "request": "/amanda-user-pool-zpfll-k5wbv", "uid": "6a35601f-9454-438f-a263-b1908e460c7f", "version": "208584678", "external-name": "amanda-user-pool-zpfll-4dw7q-20230109205315743000000001", "error": "cannot run plan: plan failed: ", "errorVerbose": "plan failed: \ncannot run plan\ngithub.com/upbound/upjet/pkg/controller.(*external).Observe\n\tgithub.com/upbound/[email protected]/pkg/controller/external.go:201\ngithub.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile\n\tgithub.com/crossplane/[email protected]/pkg/reconciler/managed/reconciler.go:780\ngithub.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile\n\tgithub.com/crossplane/[email protected]/pkg/ratelimiter/reconciler.go:54\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234\nruntime.goexit\n\truntime/asm_amd64.s:1594"}
1.6765610174906583e+09	DEBUG	provider-aws	Cannot observe external resource	{"controller": "managed/rds.aws.upbound.io/v1beta1, kind=instance", "request": "/m3test-mmsjx-sdfh5", "uid": "f823b85a-b675-4dd7-a119-05771d846e69", "version": "208575439", "external-name": "m3test-mmsjx-sdfh5", "error": "cannot run refresh: refresh failed: ", "errorVerbose": "refresh failed: \ncannot run refresh\ngithub.com/upbound/upjet/pkg/controller.(*external).Observe\n\tgithub.com/upbound/[email protected]/pkg/controller/external.go:113\ngithub.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile\n\tgithub.com/crossplane/[email protected]/pkg/reconciler/managed/reconciler.go:780\ngithub.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile\n\tgithub.com/crossplane/[email protected]/pkg/ratelimiter/reconciler.go:54\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234\nruntime.goexit\n\truntime/asm_amd64.s:1594"}
1.6765610174910061e+09	DEBUG	events	Warning	{"object": {"kind":"Instance","name":"m3test-mmsjx-sdfh5","uid":"f823b85a-b675-4dd7-a119-05771d846e69","apiVersion":"rds.aws.upbound.io/v1beta1","resourceVersion":"208575439"}, "reason": "CannotObserveExternalResource", "message": "cannot run refresh: refresh failed: "}
1.6765610174938695e+09	DEBUG	provider-aws	refresh ended	{"workspace": "/tmp/71d81084-f4b3-415f-8b1c-63e18c078dde", "out": "{\"@level\":\"info\",\"@message\":\"Terraform 1.2.1\",\"@module\":\"terraform.ui\",\"@timestamp\":\"2023-02-16T15:20:51.277766Z\",\"terraform\":\"1.2.1\",\"type\":\"version\",\"ui\":\"1.0\"}\n"}

@mmclane
Copy link

mmclane commented Feb 16, 2023

Increasing the memory limit on the provider pod to 20G forced the pod to update. The new pod was able to run the plans successfully and most resources are now showing as synced.

@ulucinar
Copy link
Collaborator

Hi @mmclane and all,
What's your Kubernetes version?

@blakebarnett
Copy link

v1.24.8-eks-ffeb93d for us

@adamhouse
Copy link

We're running v1.23.15-eks-49d8fe8.

@mmclane
Copy link

mmclane commented Feb 17, 2023

v1.24.8-eks-ffeb93d

@ulucinar
Copy link
Collaborator

ulucinar commented Feb 17, 2023

Thank you all for the version infos. I also did some tests yesterday on a local kind cluster to understand what's going on here. Need to do some further tests & observations on the issue, but currently it looks like to me that the concurrent TF calls and Terraform provider forks are causing the memory and CPU spikes we are observing.

As @blakebarnett mentioned above, if the managed resources get to the Ready & Synced state, then the resource utilization should stay low until at least the next poll interval.

I assume in the discussions above, the provider is running with the default values of the poll and the max-reconcile-rate parameters. Did anyone override these parameters?

@mmclane
Copy link

mmclane commented Feb 20, 2023

@ulucinar

I've started using the community provider instead for my current package, at least until this gets sorted out. One thing I've noticed is that the community provider isn't constantly giving errors related to xray like I see on the upbound provider.

1.6769002726951194e+09	ERROR	Reconciler error	{"controller": "managed/xray.aws.upbound.io/v1beta1, kind=samplingrule", "controllerGroup": "xray.aws.upbound.io", "controllerKind": "SamplingRule", "userPool": {"name":"matt-user-pool-jwll2-b7j52"}, "namespace": "", "name": "matt-user-pool-jwll2-b7j52", "reconcileID": "3057b9d8-32ff-4a5b-830a-abb5b3339334", "error": "cannot update managed resource status: Operation cannot be fulfilled on userpools.cognitoidp.aws.upbound.io \"matt-user-pool-jwll2-b7j52\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234

I know that package isn't building anything with xray!

If I restart the pod I see it starting a lot of workers. Or at least ~ 2400 lines that include starting workers and reference xray.

1.6769004390014818e+09	INFO	Starting workers	{"controller": "managed/xray.aws.upbound.io/v1beta1, kind=samplingrule", "controllerGroup": "xray.aws.upbound.io", "controllerKind": "SamplingRule", "worker count": 10}
1.6769004390016065e+09	INFO	Starting workers	{"controller": "managed/xray.aws.upbound.io/v1beta1, kind=samplingrule", "controllerGroup": "xray.aws.upbound.io", "controllerKind": "SamplingRule", "worker count": 10}
1.676900439001845e+09	INFO	Starting workers	{"controller": "managed/xray.aws.upbound.io/v1beta1, kind=samplingrule", "controllerGroup": "xray.aws.upbound.io", "controllerKind": "SamplingRule", "worker count": 10}
1.676900439001968e+09	INFO	Starting workers	{"controller": "managed/xray.aws.upbound.io/v1beta1, kind=samplingrule", "controllerGroup": "xray.aws.upbound.io", "controllerKind": "SamplingRule", "worker count": 10}

I am wondering if this is related.

Here is a gist with my fill startup log: https://gist.github.com/mmclane/b2c7199bd4bb7eac20853fbaaf4ece4b

@mmclane
Copy link

mmclane commented Feb 20, 2023

As for your question, I have not overwritten those settings.

Here is my current ControllerConfig for the provider.

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "2"
    argocd.argoproj.io/tracking-id: dev-crossplane-config:pkg.crossplane.io/ControllerConfig:upbound-system/upbound-provider-aws-config
    eks.amazonaws.com/role-arn: arn
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"pkg.crossplane.io/v1alpha1","kind":"ControllerConfig","metadata":{"annotations":{"argocd.argoproj.io/sync-wave":"2","argocd.argoproj.io/tracking-id":"dev-crossplane-config:pkg.crossplane.io/ControllerConfig:upbound-system/upbound-provider-aws-config","eks.amazonaws.com/role-arn":"arn"},"labels":{"argocd.argoproj.io/instance":"dev-crossplane-config"},"name":"upbound-provider-aws-config"},"spec":{"env":[{"name":"AWS_REGION","value":"us-east-1"}],"podSecurityContext":{"fsGroup":2000}}}
  creationTimestamp: "2022-12-14T20:41:16Z"
  generation: 11
  labels:
    argocd.argoproj.io/instance: dev-crossplane-config
  name: upbound-provider-aws-config
  resourceVersion: "208592737"
  uid: 438c8b8a-83a2-4748-bd53-378b71ef6615
spec:
  args:
  - --debug
  env:
  - name: AWS_REGION
    value: us-east-1
  podSecurityContext:
    fsGroup: 2000
  resources:
    limits:
      cpu: 4
      memory: 20G
    requests:
      cpu: 4
      memory: 2G

I have not seen resource utilization get low even when everything is ready and synced. At least, not nearly as low as second cluster or the community provisioner. I have seen the provider seem to get stuck and the resources get marked as out of sync until I restart the pod.

@bcavarun
Copy link

Faced an issue with Upbound AWS provider version 0.29.0, Pods were getting evicted since today morning.
AWS Provider pod is using too much memory and CPU and getting evicted.
Below are the only logs pod is showing.
k logs -f upbound-provider-aws-d1299bb1cbf0-5fd5b48898-p8dst I0222 04:17:46.183324 1 request.go:601] Waited for 1.011150074s due to client-side throttling, not priority and fairness, request: GET:[https://10.0.0.1:443/apis/waf.aws.upbound.io/v1beta1?timeout=32s](https://10.0.0.1/apis/waf.aws.upbound.io/v1beta1?timeout=32s)

Pod was running on 4C 8GB instance and getting evicted continuously.

All of the resources were out of Snyn, when I exec in the pod and checked the process, it was running multiple terraform processes at once. May be trying to sync all of the resources in parallel and that explains why it was trying to use too much CPU and Memory. I increased the instance size to 16C and 64GB memory, It worked fine.

Screenshot 2023-02-22 at 9 54 57 AM

Screenshot 2023-02-22 at 10 13 42 AM

Screenshot 2023-02-22 at 12 15 10 PM

@jeanduplessis
Copy link
Collaborator

To keep everyone up to date: We are treating this as our top priority and are busy investigating the root cause and possible solutions. We'll keep folks updated here as we gather information and form plans to address it.

@jeanduplessis
Copy link
Collaborator

We are temporarily moving communication updates and requests for assistance and feedback to the following Crossplane community Slack workspace channel: #sig-provider-aws-resource-utilization.

We plan to release two images of the provider tomorrow for testing and ask those who can try it to share their findings there.

@jeanduplessis
Copy link
Collaborator

jeanduplessis commented Mar 2, 2023

In the next provider release, v0.31.0, coming out early next week, we plan to do the following:

  • Include a fix that addresses some concurrency problems we identified.
  • Better instrumentation support to provide more information about the provider’s state to help with debugging.
  • Some high-level guidance on expected resource requirements (we will still work on a proper sizing guide) to help set expectations on what type of node you will need to run the providers. This will help users coming from the community provider, where it might be sufficient to use a 512MB memory instance, which won’t work for the Upbound provider.

None of the above is a definitive fix for the larger issue yet, but it’s small iterative steps towards a better state.

In the meantime, we continue to prioritize this work, and our team's focus is fully on improving the performance and making further quality enhancements which we will share more about in the coming weeks.

@fernandezcuesta
Copy link

With v0.31.0 memory consumption still skyrockets from time to time. Moreover, since the deployment for provider-aws does not contain any resource requests/limits on its spec, pods get continuously evicted.
Having at least resource requests would be helpful for kube-scheduler not to have crossplane's provider-aws pod running on busy nodes.

@jeanduplessis
Copy link
Collaborator

@fernandezcuesta The 0.31.0 release did not contain the cpu/memory optimizations yet, which is only coming in the next release. We are actively testing the work with the community in #sig-provider-aws-resource-utilization and once all issues are ironed out we will release a new version of the providers.

@mmclane
Copy link

mmclane commented Mar 17, 2023

@fernandezcuesta

you can set resource request and limits on the provider pod.

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  labels:
    name: upbound-provider-aws-config

spec:
  args:
  - --debug
  env:
  - name: AWS_REGION
    value: us-east-1
  podSecurityContext:
    fsGroup: 2000
  resources:
    limits:
      cpu: 4
      memory: 20G
    requests:
      cpu: 4
      memory: 2G

@nce
Copy link

nce commented Mar 20, 2023

We are experiencing node terminations because of memory exhaustion on nodes running upbound-provider-aws:v0.30.0 k8s-v1.24.10 with 32Gi Memory.

Our fix: The Controller is running without any resource configurations and we added them (ControllerConfig) now as a shortterm fix (which is now Crashlooping bc/ of OOMKills)

The Pod starts with about 2Gi and dies (with the whole node) at around 22-28Gi (just for reference) in about 30mins.
Over this timeframe we see a constant increase of:

  • terraform-provider-aws processes (at the end around 196)
  • Zombies to probably that parent ([terraform-provi] <defunct>) (at the end about 170)

Each consuming about 130Mi:
2000 117136 3.0 0.4 1112204 156128 ? Sl 12:56 0:00 .terraform/providers/registry.terraform.io/hashicorp/aws/4.52.0/linux_amd64/terraform-provider-aws_v4.52.0_x5

Maybe relevant log informations:

1.6793136566111515e+09    DEBUG    events    Warning    {"object": {"kind":"Record","name":"ch-prod-public-eu-central-1-vrcn9-lt2nd","uid":"1f643217-3dc0-4f60-aa46-dc3f0808b21
e","apiVersion":"route53.aws.upbound.io/v1beta1","resourceVersion":"269174845"}, "reason": "CannotConnectToProvider", "message": "cannot get a terraform workspace for resource
: cannot init workspace: \n\u001b[0m\u001b[1mInitializing the backend...\u001b[0m\n\n\u001b[0m\u001b[1mInitializing provider plugins...\u001b[0m\n- Finding hashicorp/aws versi
ons matching \"4.52.0\"...\n- Installing hashicorp/aws v4.52.0...\n: signal: killed"}

As we've lost this node, i can't tell if this was killed by the kernel or from the provider itself (?)

This is probably not the most useful information. Would there be anything more helpful to you?

@jeanduplessis
Copy link
Collaborator

@nce thank you for the information. It would be helpful if you could try the test images we referenced in the XP community Slack, https://crossplane.slack.com/archives/C04QLETDJGN/p1679072821157379, and report on how they work for you.

The test images are:

ulucinar/provider-aws-amd64:v0.31.0-08222dae
ulucinar/provider-aws-arm64:v0.31.0-08222dae

@nce
Copy link

nce commented Mar 22, 2023

I've tested it and already mentioned it in slack, reposting it here for visibility.

With the new image we get the following logs:

.... a lot of client-throttling
I0321 14:24:09.951286       1 request.go:601] Waited for 4.176600064s due to client-side throttling, not priority
and fairness, request: GET:[https://172.31.0.1:443/apis/appintegrations.aws.upbound.io/v1beta1?timeout=32s](https://172.31.0.1/apis/appintegrations.aws.upbound.io/v1beta1?timeout=32s)
I0321 14:24:19.951457       1 request.go:601] Waited for 4.376066321s due to client-side throttling, not priority
and fairness, request: GET:[https://172.31.0.1:443/apis/ec2.aws.upbound.io/v1beta1?timeout=32s](https://172.31.0.1/apis/ec2.aws.upbound.io/v1beta1?timeout=32s)
provider: error: Cannot start controller manager: [failed to wait for managed/ec2.aws.upbound.io/v1beta1, kind=ins
tancestate caches to sync: timed out waiting for cache to be synced, failed waiting for all runnables to end withi
n grace period of 30s: context deadline exceeded]

which results in a pod restart.

The released v0.30 image starts just fine without any/a lot client-throttling.

    Image:          xpkg.upbound.io/upbound/provider-aws:v0.30.0
    State:          Running
      Started:      Wed, 22 Mar 2023 16:44:58 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 22 Mar 2023 16:16:56 +0100
      Finished:     Wed, 22 Mar 2023 16:44:57 +0100
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     2
      memory:  10G
    Requests:
      cpu:     1
      memory:  8G

Is there something we should've configured otherwise for v0.31?

@jeanduplessis
Copy link
Collaborator

jeanduplessis commented Mar 28, 2023

An update for everyone:

We have release candidate images for the big three providers available for testing. These images contain the latest improvements in the context of the providers’ performances. Some load and correctness tests were done, and they were reported in this issue #576.

For more context about the improvements, please see this PR: crossplane/upjet#178

xpkg.upbound.io/upbound/provider-aws:v0.32.0-rc.1
xpkg.upbound.io/upbound/provider-azure:v0.30.0-rc.1
xpkg.upbound.io/upbound/provider-gcp:v0.30.0-rc.1

We recommend trying these RC images outside of production environments. The main purpose is to collect pre-release feedback and to confirm there are no blockers to releasing new versions of the providers at the end of the week.

@jeanduplessis
Copy link
Collaborator

jeanduplessis commented Mar 31, 2023

The new provider releases are available:

In addition to the performance improvements, we also have two additional resources for users to reference:

  • A guide on monitoring the Upjet runtime.
  • An initial sizing guide for the Upjet-based official providers.

These improvements are only the start of our continued focus on improved performance and quality.

@huynhsontung
Copy link
Contributor

Are there any updates on why with leader election, only one pod does the actual work? This is a big concern in terms of scalability for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working customer impact:high
Projects
None yet
Development

No branches or pull requests