
DP not up (sometimes) when reloading config through SIGHUP #4568

Open
TVKain opened this issue Sep 12, 2024 · 5 comments

Comments

TVKain commented Sep 12, 2024


  • When reloading the config through SIGHUP, faucet sometimes logs "DP not up" and the new flows are not sent down to the switch
  • This behavior is intermittent: the same reload sometimes succeeds

Here is the capture of the traffic between the switch and the controller when "DP not up"
[screenshot: packet capture between the switch and the controller]

Faucet version: 1.10.11

@gizmoguy (Member)

Would need a bit more information to debug this. I notice your capture was started after the log message, so any change to the TCP state of the control channel will be missing.

But does the switch eventually recover and have the correct flows programmed? "DP not up" isn't necessarily a problem; faucet is just saying the switch reset its control channel state.


TVKain commented Sep 19, 2024

Steps to reproduce

  1. A process sends SIGHUP to the faucet controller every 5 seconds
  2. The faucet controller is running and listening on port 6653
  3. One Open vSwitch switch is connected to the faucet controller:
     ovs-vsctl set-controller br-f1 tcp:127.0.0.1:6653
  4. The faucet config file contains 5 VLANs, each with 3 ACL rules
  5. The new flows are sent down to OVS
  6. Populate the faucet config file with 3000 VLANs, each with 3 ACL rules
  7. The faucet log shows "DP down" and the new flows aren't sent down to the OVS switch
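To make the reproduction concrete, here is a small sketch that generates a config like the one used in the steps above. It follows the standard faucet.yaml layout of top-level `vlans:` and `acls:` sections; the ACL names and the specific rule bodies below are made up for illustration, not taken from the reporter's config.

```python
# Sketch: generate a faucet.yaml with N VLANs, each attached to an ACL
# containing 3 rules. The "vlans"/"acls" top-level layout follows the
# faucet config format; the rule contents are illustrative only.

def make_config(num_vlans: int) -> str:
    lines = ["acls:"]
    for v in range(1, num_vlans + 1):
        lines.append(f"  acl-{v}:")
        # Three example rules per ACL: TCP, UDP, ICMP (nw_proto 6/17/1).
        for proto in (6, 17, 1):
            lines.append("    - rule:")
            lines.append("        dl_type: 0x800")
            lines.append(f"        nw_proto: {proto}")
            lines.append("        actions:")
            lines.append("          allow: 1")
    lines.append("vlans:")
    for v in range(1, num_vlans + 1):
        lines.append(f"  vlan-{v}:")
        lines.append(f"    vid: {v}")
        lines.append(f"    acls_in: [acl-{v}]")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Write the large config used to trigger the "DP not up" symptom.
    with open("faucet.yaml", "w") as f:
        f.write(make_config(3000))
```

Switching between `make_config(5)` and `make_config(3000)` before each SIGHUP reproduces the small-config/large-config transition described above.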

[screenshot: faucet log showing DP down]

PCAP files

The pcap files contain the captured packets, starting from the moment the config file has 5 VLANs and the flows for those are sent down to the switch (everything was fine up to this point), and ending after the faucet log shows "DP down".
faucet.zip

Versions

  • Faucet 1.10.11
  • Open vSwitch 3.3.0


gizmoguy commented Sep 25, 2024

Thanks for the additional information.

This will be caused by Open vSwitch's default OpenFlow hello timers being too low for the number of flow rules you want to push, so Open vSwitch times out the connection.

You need to tune the following ovsdb options:

  • inactivity_probe
  • controller_rate_limit
  • controller_burst_limit

There is some documentation here on how to do that:

https://bugs.launchpad.net/neutron/+bug/1817022

Also note there was a bug in certain versions of OVS (introduced in v2.12.0 and fixed in v3.3.0) where these configuration values weren't always honored, so make sure you aren't running an affected version, see details on this mailing list thread:

https://mail.openvswitch.org/pipermail/ovs-dev/2023-September/408205.html
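For anyone else hitting this, the three options above live in the OVS Controller table and are set with `ovs-vsctl set Controller <bridge> ...`, as shown in the linked Neutron bug. A small helper that just assembles those commands (the bridge name `br-f1` comes from this thread; the numeric values are illustrative, not recommendations):

```python
# Build the ovs-vsctl invocations that raise the OpenFlow connection
# timers/limits mentioned above. Values are illustrative; tune them for
# your flow count. Run each command with subprocess.run() on the OVS host.

def ovs_tuning_cmds(bridge: str,
                    inactivity_probe_ms: int = 60000,
                    rate_limit: int = 10000,
                    burst_limit: int = 10000) -> list[list[str]]:
    settings = {
        "inactivity_probe": inactivity_probe_ms,
        "controller_rate_limit": rate_limit,
        "controller_burst_limit": burst_limit,
    }
    return [
        ["ovs-vsctl", "set", "Controller", bridge, f"{key}={value}"]
        for key, value in settings.items()
    ]

if __name__ == "__main__":
    for cmd in ovs_tuning_cmds("br-f1"):
        print(" ".join(cmd))
```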


TVKain commented Sep 26, 2024

Thank you for the reply, I will try it ASAP.

Though I do have an additional question. I have not dug too deeply into the source code yet, but I notice that whenever there are changes to a VLAN or Port, faucet "cold" starts, while in other situations, like changes to ACLs, faucet "warm" starts. Could you tell me why that is?

Also, could you clarify the behavior of "cold" starting vs "warm" starting?

Sidenote:

  • I have tried setting inactivity_probe to 3000000 and the error still persists
  • I tried editing out the part that I believe causes faucet to "cold" start; the error seems to disappear and flows are sent down to the switch
  • It happens even with few VLANs

File: valve.py
Function: _apply_config_changes(self, new_dp, changes, valves=None)

        # # If pipeline or all ports changed, default to cold start.
        # if self._pipeline_change(new_dp):
        #     self.dp_init(new_dp, valves)
        #     return restart_type, ofmsgs
        #
        # if all_ports_changed:
        #     self.logger.info("all ports changed")
        #     self.dp_init(new_dp, valves)
        #     return restart_type, ofmsgs
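For readers following along, the commented-out excerpt above boils down to a decision like the sketch below. This is a simplified paraphrase, not faucet's actual code, and it assumes the usual meaning of the terms in this thread: a cold start reinitializes the datapath and reinstalls all flows, while a warm start only patches the affected flows.

```python
# Simplified paraphrase of the cold/warm decision visible in the
# valve.py excerpt: a pipeline change, or a change to every port,
# forces a cold start (dp_init + full flow reinstall); anything else
# can be applied as a warm start.

def restart_type(pipeline_changed: bool, all_ports_changed: bool) -> str:
    if pipeline_changed or all_ports_changed:
        return "cold"   # dp_init(): wipe and reinstall all flows
    return "warm"       # incremental flow updates only
```

This also suggests why commenting those branches out hides the symptom: the reload is forced down the warm path, so the large one-shot flow push that trips the OVS connection timers never happens.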


TVKain commented Sep 30, 2024

Another follow-up to this.

This is the osken-manager log when the incident happened:
[screenshot: osken-manager log during the failure]

This is the osken-manager log when a "cold" reload works normally:
[screenshot: osken-manager log for a successful reload]

These are the Open vSwitch logs in both cases:
[screenshot: Open vSwitch logs for both cases]

From the logs, I see that the error happens because an event is missing:

connected socket:<eventlet.greenio.base.GreenSocket....

Could this be the reason?
