This repository has been archived by the owner on Apr 3, 2020. It is now read-only.

Timeout when trying to provision new workers on GCP #106

Open
ashea-code opened this issue Jun 3, 2019 · 2 comments

Comments

@ashea-code

I know this repo is becoming out of date, but I'm trying to re-run the concourse-up command on an existing deployment in GCP. I have exported WORKERS=2 to add an extra worker to the pool.

However, I get this error:

Task 1175

Task 1175 | 23:02:09 | Preparing deployment: Preparing deployment (00:02:16)
                    L Error: worker/ed0abfe3-0867-49e1-9092-d242f833bd74: Timed out sending 'get_state' to instance: 'worker/ed0abfe3-0867-49e1-9092-d242f833bd74', agent-id: 'be09322d-de63-4f1a-9d55-54925a64270a' after 45 seconds
Task 1175 | 23:04:26 | Error: worker/ed0abfe3-0867-49e1-9092-d242f833bd74: Timed out sending 'get_state' to instance: 'worker/ed0abfe3-0867-49e1-9092-d242f833bd74', agent-id: 'be09322d-de63-4f1a-9d55-54925a64270a' after 45 seconds

Task 1175 Started  Mon Jun  3 23:02:09 UTC 2019
Task 1175 Finished Mon Jun  3 23:04:26 UTC 2019
Task 1175 Duration 00:02:17
Task 1175 error

Updating deployment:
  Expected task '1175' to succeed but state is 'error'

Exit code 1

Any idea what is timing out here? GCP is known to be a bit slow at provisioning.

@DanielJonesEB
Contributor

Hmm, I'm not sure. We've seen the GCP CPI time out regularly and intermittently (we need to bump the version of the CPI to fix it) but normally with different errors:

Task 10 | 12:28:32 | Error: CPI error 'Bosh::Clouds::CloudError' with message 'Creating vm: Failed to find Google Image 'stemcell-e5d99deb-c5b4-4f5f-53ad-87ef7e71d15a': Get https://www.googleapis.com/compute/v1/projects/ps-amcginlay/global/images/stemcell-e5d99deb-c5b4-4f5f-53ad-87ef7e71d15a?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 108.177.111.84:443: i/o timeout' in 'create_vm' CPI method (CPI request ID: 'cpi-653711')

Is the issue intermittent for you?
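
To tell the two failure modes in this thread apart (the agent timing out on 'get_state' versus the CPI failing inside 'create_vm'), a rough sketch like the one below could scan the task output. The category names and the `classify` helper are made up for illustration; the patterns are copied from the error lines quoted above and are not an exhaustive list of BOSH errors.

```python
import re

# Hypothetical category names; the patterns come from error lines quoted in this thread.
PATTERNS = {
    "agent_timeout": re.compile(r"Timed out sending '\w+' to instance"),
    "cpi_error": re.compile(r"CPI error '[^']+'"),
    "job_not_running": re.compile(r"is not running after update"),
}


def classify(line):
    """Return the first matching category for a task log line, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return "other"


if __name__ == "__main__":
    sample = (
        "Task 1175 | 23:04:26 | Error: worker/ed0abfe3-0867-49e1-9092-d242f833bd74: "
        "Timed out sending 'get_state' to instance: "
        "'worker/ed0abfe3-0867-49e1-9092-d242f833bd74'"
    )
    print(classify(sample))
```

If every failing run lands in the same category, that suggests the problem is not the intermittent CPI timeout described above.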

@ashea-code
Author

This issue isn't intermittent, and trying to make a fresh install also presents me with:

Task 10

Task 10 | 21:46:23 | Preparing deployment: Preparing deployment (00:00:01)
Task 10 | 21:46:24 | Preparing deployment: Rendering templates (00:00:02)
Task 10 | 21:46:26 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 10 | 21:46:26 | Creating missing vms: web/ecade835-5d6e-416e-830c-fb8d648e99ef (0)
Task 10 | 21:46:26 | Creating missing vms: worker/1115b42a-0163-487f-8938-0e23fd19f6c8 (0)
Task 10 | 21:46:26 | Creating missing vms: worker/83502149-1c15-41eb-8838-0af800d0a49f (2)
Task 10 | 21:46:26 | Creating missing vms: worker/dc3fdd8f-26d6-4cbf-88a5-678e21a5dddf (1)
Task 10 | 21:47:28 | Creating missing vms: web/ecade835-5d6e-416e-830c-fb8d648e99ef (0) (00:01:02)
Task 10 | 21:47:49 | Creating missing vms: worker/1115b42a-0163-487f-8938-0e23fd19f6c8 (0) (00:01:23)
Task 10 | 21:47:50 | Creating missing vms: worker/dc3fdd8f-26d6-4cbf-88a5-678e21a5dddf (1) (00:01:24)
Task 10 | 21:47:50 | Creating missing vms: worker/83502149-1c15-41eb-8838-0af800d0a49f (2) (00:01:24)
Task 10 | 21:47:51 | Updating instance web: web/ecade835-5d6e-416e-830c-fb8d648e99ef (0) (canary)
Task 10 | 21:47:51 | Updating instance worker: worker/1115b42a-0163-487f-8938-0e23fd19f6c8 (0) (canary) (00:01:07)
Task 10 | 21:48:58 | Updating instance worker: worker/83502149-1c15-41eb-8838-0af800d0a49f (2)
Task 10 | 21:48:58 | Updating instance worker: worker/dc3fdd8f-26d6-4cbf-88a5-678e21a5dddf (1) (00:01:02)
Task 10 | 21:50:01 | Updating instance worker: worker/83502149-1c15-41eb-8838-0af800d0a49f (2) (00:01:03)
Task 10 | 21:59:49 | Updating instance web: web/ecade835-5d6e-416e-830c-fb8d648e99ef (0) (canary) (00:11:58)
                   L Error: 'web/ecade835-5d6e-416e-830c-fb8d648e99ef (0)' is not running after update. Review logs for failed jobs: atc, grafana
Task 10 | 21:59:49 | Error: 'web/ecade835-5d6e-416e-830c-fb8d648e99ef (0)' is not running after update. Review logs for failed jobs: atc, grafana

Task 10 Started  Tue Jun  4 21:46:23 UTC 2019
Task 10 Finished Tue Jun  4 21:59:49 UTC 2019
Task 10 Duration 00:13:26
Task 10 error

Updating deployment:
  Expected task '10' to succeed but state is 'error'

Exit code 1

Has something changed on GCP?
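
For anyone triaging output like the above, here is a minimal sketch that pulls the instance and the failed job names out of the "is not running after update" line, so it is clear which job logs to read first. The `failed_jobs` helper is hypothetical, and the regular expression assumes the message format is exactly as it appears in this log.

```python
import re

# Assumes the message format seen in the task output above.
FAILED_JOBS = re.compile(
    r"Error: '(?P<instance>[^']+)' is not running after update\. "
    r"Review logs for failed jobs: (?P<jobs>.+)$"
)


def failed_jobs(log_text):
    """Map each failing instance to the list of job names reported for it."""
    result = {}
    for line in log_text.splitlines():
        match = FAILED_JOBS.search(line)
        if match:
            jobs = [job.strip() for job in match.group("jobs").split(",")]
            result[match.group("instance")] = jobs
    return result


if __name__ == "__main__":
    log = (
        "Task 10 | 21:59:49 | Error: 'web/ecade835-5d6e-416e-830c-fb8d648e99ef (0)' "
        "is not running after update. Review logs for failed jobs: atc, grafana"
    )
    print(failed_jobs(log))
```

In this case it would point at the atc and grafana jobs on the web instance, rather than at the workers.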
