This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

PriFi non-restart bug located #192

Open
lbarman opened this issue Jul 20, 2018 · 5 comments

Comments

@lbarman
Collaborator

lbarman commented Jul 20, 2018

After a client goes into airplane mode, everything gets stuck.
Reason: the relay and trustees believe the protocol is still running.

E : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1:  41) - MAX_NUMBER_OF_CONSECUTIVE_FAILED_ROUNDS ( 3 ) reached, killing protocol.
3 : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1:  44) - Stopping experiment, if any.
E : (                         services.(*ServiceState).NetworkErrorHappened: 109) - NetworkErrorHappened, dumping trace
goroutine 677 [running]:
runtime/debug.Stack(0x1, 0xc42020cc50, 0x5422c8)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened(0xc4200a5540, 0x0)
	/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:110 +0x74
github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout(0xc4200a5540, 0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
	/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:101 +0x34
github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm(0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
	/home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90 +0x66
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout(0xc420276000, 0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
	/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158 +0x250
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm(0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
	/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118 +0x66
github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1(0xc42017c880, 0xc400000222)
	/home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46 +0x589
created by github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).downstreamPhase1_openRoundAndSendData
	/home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/relay.go:602 +0x760
E : (                         services.(*ServiceState).NetworkErrorHappened: 121) - A network error occurred with node <nil> , warning other clients.
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 175) - Stopping PriFi protocol
I : (                            services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
I : (                            services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
@lbarman
Collaborator Author

lbarman commented Jul 20, 2018

Update: it's a concurrency issue.
Thread 1 is still processing a message when the timeout fires on thread 2, which then tries to call Stop() on thread 1:

1 @ 0x545ee8 0x545cf0 0x542834 0x819c3a 0x81cd5a 0x817f0d 0x81976f 0x819634 0x81caf6 0x8026a0 0x805e26 0x7f9c59 0x459871
#	0x545ee7	runtime/pprof.writeRuntimeProfile+0x97											/usr/local/go/src/runtime/pprof/pprof.go:679
#	0x545cef	runtime/pprof.writeGoroutine+0x9f											/usr/local/go/src/runtime/pprof/pprof.go:641
#	0x542833	runtime/pprof.(*Profile).WriteTo+0x3e3											/usr/local/go/src/runtime/pprof/pprof.go:310
#	0x819c39	github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol+0xb9					/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:181
#	0x81cd59	github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol-fm+0x29				/home/prifi/go/src/github.com/dedis/prifi/sda/services/service.go:149
#	0x817f0c	github.com/dedis/prifi/sda/services.(*churnHandler).handleUnknownDisconnection+0xbc					/home/prifi/go/src/github.com/dedis/prifi/sda/services/churn.go:257
#	0x81976e	github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened+0x11e						/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:127
#	0x819633	github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout+0x33							/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:106
#	0x81caf5	github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm+0x65		/home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90
#	0x80269f	github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout+0x24f						/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158
#	0x805e25	github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm+0x65	/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118
#	0x7f9c58	github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1+0x588		/home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46

I : (                       services.(*ServiceState).IsPriFiProtocolRunning:  43) - IsPriFiProtocolRunning not nil, has stopped false
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 187) - PriFi Running, continuing stop attempt...
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 191) - PriFi not nil, calling stop...

That call to Stop() is then put on hold at:

p.relayState.processingLock.Lock()
defer p.relayState.processingLock.Unlock()

We need to find another way to stop the protocol, one that does not require it to handle a message.
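
For illustration, one possible direction is a stop signal that never touches the processing lock. The sketch below is not PriFi's code: the `relay` type, `handle`, and `stopWhileBusy` are hypothetical names, and it assumes the handler can check a flag once it holds the lock. Stop() only closes a channel, so it returns even while another goroutine is still inside a handler.

```go
package main

import (
	"fmt"
	"sync"
)

// relay is a hypothetical stand-in for the relay state, not PriFi's actual type.
type relay struct {
	processingLock sync.Mutex    // held while a message is being handled
	stopOnce       sync.Once     // makes Stop safe to call more than once
	stopCh         chan struct{} // closed to signal shutdown
}

func newRelay() *relay { return &relay{stopCh: make(chan struct{})} }

// Stop signals shutdown without acquiring processingLock,
// so it never waits for a running handler.
func (r *relay) Stop() { r.stopOnce.Do(func() { close(r.stopCh) }) }

// Stopped reports whether Stop has been called.
func (r *relay) Stopped() bool {
	select {
	case <-r.stopCh:
		return true
	default:
		return false
	}
}

// handle runs work under the lock and returns false if the relay was
// already stopped when the lock was acquired.
func (r *relay) handle(work func()) bool {
	r.processingLock.Lock()
	defer r.processingLock.Unlock()
	if r.Stopped() {
		return false
	}
	work()
	return true
}

// stopWhileBusy keeps one handler busy, calls Stop from another goroutine,
// and reports whether Stop took effect while the handler still held the lock
// and whether later messages are refused.
func stopWhileBusy() bool {
	r := newRelay()
	busy := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		r.handle(func() { close(busy); <-release })
		close(done)
	}()
	<-busy   // the handler now holds processingLock
	r.Stop() // would block here if Stop needed processingLock
	stoppedWhileBusy := r.Stopped()
	close(release)
	<-done
	return stoppedWhileBusy && !r.handle(func() {})
}

func main() {
	fmt.Println("stopped while handler busy:", stopWhileBusy())
}
```

The handler that is already running still finishes its current message; only messages arriving after Stop() are refused. Whether that is acceptable here depends on what the real handler does while holding the lock.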

lbarman pushed a commit that referenced this issue Jul 20, 2018
@lbarman
Collaborator Author

lbarman commented Jul 20, 2018

Please test 8222438 and let me know the status! @italod

lbarman pushed a commit that referenced this issue Jul 30, 2018
@lbarman
Collaborator Author

lbarman commented Jul 30, 2018

Bug fixed. I had to rewrite the Git history (my mistake, this should not happen).
If git pull fails and tells you to merge, don't do it. Instead, run:

git fetch --all
git reset --hard origin/prifi-android

(Careful: this does not touch your untracked files, but it resets all modified tracked files.)

junchen108 self-assigned this Aug 1, 2018
@junchen108
Contributor

I am working on the client side.
The client doesn't reconnect after connectivity is restored.

junchen108 pushed a commit that referenced this issue Aug 2, 2018
@junchen108
Contributor

junchen108 commented Aug 2, 2018

Please test f03dde8; it should fix the bug on the client side.
