Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replication Not Working #424

Open
scefali opened this issue Jan 13, 2024 · 8 comments
Open

Replication Not Working #424

scefali opened this issue Jan 13, 2024 · 8 comments

Comments

@scefali
Copy link

scefali commented Jan 13, 2024

I have an app deployed on 2 machines on Fly.io. Replication between the nodes isn't working properly. When SSH in to each one and run SQL queries, I get different results. Here's my Dockerfile.

@scefali
Copy link
Author

scefali commented Jan 13, 2024

@kentcdodds This is forked version of EpicStack, have you ever seen anything like this before?

@kentcdodds
Copy link

I've not seen this before. Very odd. I would suggest scaling down to 1, then cloning the machine again.

@scefali
Copy link
Author

scefali commented Jan 13, 2024

@kentcdodds good idea, I removed the machine and re-added it and it looks ok now. Hopefully this split brain issue doesn't come back!

@scefali scefali closed this as completed Jan 13, 2024
@kentcdodds
Copy link

🤞

@scefali
Copy link
Author

scefali commented Jan 14, 2024

@kentcdodds and it came back after a few deployments :(

@scefali scefali reopened this Jan 14, 2024
@benbjohnson
Copy link
Collaborator

@scefali Can you provide some logs for the two instances when they start up? Are they showing errors connecting to one another?

@scefali
Copy link
Author

scefali commented Jan 15, 2024

@benbjohnson Here you go:

2024-01-15T21:57:14.036 app[56833295f19048] sjc [info] 2024/01/15 21:57:14 listening on [fdaa:0:5d71:a7b:24b:b2ec:af:2]:22 (DNS: [fdaa::3]:53)

2024-01-15T21:57:14.059 app[56833295f19048] sjc [info] config file read from /etc/litefs.yml

2024-01-15T21:57:14.059 app[56833295f19048] sjc [info] LiteFS v0.5.10, commit=1dc7fcfa7360313fe573c2ff7ad01ef58c05c146

2024-01-15T21:57:14.059 app[56833295f19048] sjc [info] level=INFO msg="host environment detected" type=fly.io

2024-01-15T21:57:14.059 app[56833295f19048] sjc [info] level=INFO msg="no backup client configured, skipping"

2024-01-15T21:57:14.060 app[56833295f19048] sjc [info] level=INFO msg="Using Consul to determine primary"

2024-01-15T21:57:14.283 app[56833295f19048] sjc [info] level=INFO msg="initializing consul: key=litefs/contribution-analyzer url=https://:[email protected]/contribution-analyzer-pxovqy2evgy1j2k6/ hostname=56833295f19048 advertise-url=http://56833295f19048.vm.contribution-analyzer.internal:20202"

2024-01-15T21:57:14.284 app[56833295f19048] sjc [info] level=INFO msg="using existing cluster id: \"LFSC48A9E18A2737D2E0\""

2024-01-15T21:57:14.287 app[56833295f19048] sjc [info] level=INFO msg="LiteFS mounted to: /litefs/data"

2024-01-15T21:57:14.287 app[56833295f19048] sjc [info] level=INFO msg="http server listening on: http://localhost:20202"

2024-01-15T21:57:14.287 app[56833295f19048] sjc [info] level=INFO msg="waiting to connect to cluster"

2024-01-15T21:57:14.580 app[56833295f19048] sjc [info] level=INFO msg="D3D9F5BC92A0DC5D: existing primary found (5683d266b62008), connecting as replica to \"http://5683d266b62008.vm.contribution-analyzer.internal:20202\""

2024-01-15T21:57:14.586 app[56833295f19048] sjc [info] level=INFO msg="connected to cluster, ready"

2024-01-15T21:57:14.586 app[56833295f19048] sjc [info] level=INFO msg="node is a candidate, automatically promoting to primary"

2024-01-15T21:57:14.585 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: stream connected ([fdaa:0:5d71:a7b:24b:b2ec:af:2]:55468)"

2024-01-15T21:57:14.667 app[56833295f19048] sjc [info] level=INFO msg="D3D9F5BC92A0DC5D: disconnected from primary, retrying"

2024-01-15T21:57:14.665 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: exiting primary, preserving lease for handoff"

2024-01-15T21:57:14.665 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: stream disconnected ([fdaa:0:5d71:a7b:24b:b2ec:af:2]:55468)"

2024-01-15T21:57:14.741 app[56833295f19048] sjc [info] level=INFO msg="D3D9F5BC92A0DC5D: acquiring existing lease from handoff"

2024-01-15T21:57:14.814 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: existing primary found (5683d266b62008), connecting as replica to \"http://5683d266b62008.vm.contribution-analyzer.internal:20202\""

2024-01-15T21:57:14.815 app[5683d266b62008] sjc [info] level=INFO msg="http: POST /stream: error: cannot connect to self"

2024-01-15T21:57:14.815 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: disconnected from primary with error, retrying: connect to primary: invalid response: code=400 ('http://5683d266b62008.vm.contribution-analyzer.internal:20202')"

2024-01-15T21:57:14.892 app[56833295f19048] sjc [info] level=INFO msg="D3D9F5BC92A0DC5D: primary lease acquired, advertising as http://56833295f19048.vm.contribution-analyzer.internal:20202"

2024-01-15T21:57:14.967 app[56833295f19048] sjc [info] level=INFO msg="proxy server listening on: http://localhost:8080"

2024-01-15T21:57:14.967 app[56833295f19048] sjc [info] level=INFO msg="executing command: node [./other/setup-swap.js]"

2024-01-15T21:57:15.159 app[56833295f19048] sjc [info] setting up swapfile...

2024-01-15T21:57:15.192 app[56833295f19048] sjc [info] swapfile setup complete

2024-01-15T21:57:15.197 app[56833295f19048] sjc [info] level=INFO msg="executing command: npx [prisma migrate deploy]"

2024-01-15T21:57:15.974 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: existing primary found (56833295f19048), connecting as replica to \"http://56833295f19048.vm.contribution-analyzer.internal:20202\""

2024-01-15T21:57:15.976 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: disconnected from primary with error, retrying: connect to primary: Post \"http://56833295f19048.vm.contribution-analyzer.internal:20202/stream\": read tcp [fdaa:0:5d71:a7b:138:6263:1e5f:2]:53966->[fdaa:0:5d71:a7b:24b:b2ec:af:2]:20202: read: connection reset by peer ('http://56833295f19048.vm.contribution-analyzer.internal:20202')"

2024-01-15T21:57:16.307 app[56833295f19048] sjc [info] Prisma schema loaded from prisma/schema.prisma

2024-01-15T21:57:16.315 app[56833295f19048] sjc [info] Datasource "db": SQLite database "data.db" at "file:./data.db?connection_limit=1"

2024-01-15T21:57:16.347 app[56833295f19048] sjc [info] SQLite database data.db created at file:./data.db?connection_limit=1

2024-01-15T21:57:16.356 app[56833295f19048] sjc [info] 1 migration found in prisma/migrations

2024-01-15T21:57:16.362 app[56833295f19048] sjc [info] Applying migration `20240106184210_init`

2024-01-15T21:57:16.369 app[56833295f19048] sjc [info] The following migration have been applied:

2024-01-15T21:57:16.369 app[56833295f19048] sjc [info] migrations/

2024-01-15T21:57:16.369 app[56833295f19048] sjc [info] └─ 20240106184210_init/

2024-01-15T21:57:16.369 app[56833295f19048] sjc [info] └─ migration.sql

2024-01-15T21:57:16.369 app[56833295f19048] sjc [info] All migrations have been successfully applied.

2024-01-15T21:57:16.424 app[56833295f19048] sjc [info] level=INFO msg="starting background subprocess: npm [start]"

2024-01-15T21:57:16.424 app[56833295f19048] sjc [info] waiting for signal or subprocess to exit

2024-01-15T21:57:16.749 app[56833295f19048] sjc [info] > start

2024-01-15T21:57:16.749 app[56833295f19048] sjc [info] > cross-env NODE_ENV=production node .

2024-01-15T21:57:17.126 app[5683d266b62008] sjc [info] level=INFO msg="6EC604529F5D5E05: existing primary found (56833295f19048), connecting as replica to \"http://56833295f19048.vm.contribution-analyzer.internal:20202\""

2024-01-15T21:57:17.130 app[56833295f19048] sjc [info] level=INFO msg="D3D9F5BC92A0DC5D: stream connected ([fdaa:0:5d71:a7b:138:6263:1e5f:2]:46138)"

2024-01-15T21:57:18.305 app[56833295f19048] sjc [info] 🚀 We have liftoff!

2024-01-15T21:57:18.305 app[56833295f19048] sjc [info] Local: http://localhost:8081

2024-01-15T21:57:18.305 app[56833295f19048] sjc [info] On Your Network: http://172.19.7.106:8081

2024-01-15T21:57:18.305 app[56833295f19048] sjc [info] Press Ctrl+C to stop

2024-01-15T21:57:25.500 runner[5683d266b62008] sjc [info] Pulling container image registry.fly.io/contribution-analyzer:deployment-01HM7HYG149DE9CXN71V5EKH3X

@scefali
Copy link
Author

scefali commented Jan 21, 2024

@benbjohnson Any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants