v1.5-beta: problem with reloading #518
Comments
Network traffic is passing, but QoS policies are not applied to it.
The NICs were tested with older LibreQoS versions on Ubuntu 22.04, where they worked like a charm. First look: |
I actually saw something like this for the first time last night. A It doesn't want to recreate on my local setup, which is going to make this a harder one to debug. |
I can give you my Proxmox VM backup if you suspect that I did something wrong. The most interesting thing is that if I redirect network traffic away, the reload completes successfully after a while. This bug doesn't occur on LibreQoS when no network traffic is passing. |
I wonder if Proxmox is the common factor here? Mine was also in Proxmox, passing about 1 Gbps at the time. It did eventually complete. I'll dive into this as soon as the coffee has done something. |
I was carrying ~6 Gbps of aggregate throughput on the interface when I ran ./LibreQoS.py from the console. |
This one is definitely going to be tricky. It's early morning and our traffic is pretty low (~ 400 mbps) and it ran without hiccups on the live box. (It also ran on my local system with about a gigabit of From timing the parts, it seemed like the longest delays were in:
(Not terrible, but enough that I was surprised to see it waiting - it didn't used to slow down there). Will investigate further. Update: Running it again shows that there's a really big delay there that didn't used to be there. So at least now I have a candidate to examine. |
I've identified the issue. The "hot cache" was being invalidated after every single IP mapping change, rather than once at the end (you have to invalidate it for changes to appear). So I'm in the process of changing the workflow slightly to explicitly flush at the end. My local test (hacked together rather than nice, shareable code) saw a MASSIVE improvement in reload times doing this. |
The "add ip mapping" call was flushing the XDP 'hot cache'. That was fine in testing, and not working well at scale:
* Each table wipe locks the associated underlying hash map.
* Running 1200+ clears means waiting for 1200 occasions in which packets don't already have a hold on the lock (eBPF doesn't expose the locking mechanisms, so we can't change that behavior).
* The result is that under load, it can take a REALLY long time to reload the shaping stack without pausing traffic.

Instead, add_ip_mapping no longer clears the hot cache. An explicit invalidation call has been added to the bus, and added to the end of the batched IP map updates. This reduces the number of table locks from 1200+ to 2 (once for clearing the IP map, once for clearing the cache).
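To make the lock arithmetic above concrete, here is a minimal Python sketch. It is not LibreQoS code: `CountingTable`, `reload_eager`, `reload_batched` and `total_clears` are hypothetical names, and the "tables" are plain dictionaries that only count how often they are wiped. It assumes one whole-table clear corresponds to one wait on the map lock.

```python
from dataclasses import dataclass, field


@dataclass
class CountingTable:
    """Stand-in for an eBPF hash map; counts whole-table clears."""
    entries: dict = field(default_factory=dict)
    clears: int = 0

    def clear(self) -> None:
        # In the real system, each clear would have to wait for the map lock.
        self.entries.clear()
        self.clears += 1


def reload_eager(mappings, ip_map, hot_cache):
    """Old behaviour: every added mapping also invalidates the hot cache."""
    ip_map.clear()
    for ip, circuit in mappings:
        ip_map.entries[ip] = circuit
        hot_cache.clear()


def reload_batched(mappings, ip_map, hot_cache):
    """New behaviour: add all mappings, then invalidate the cache once."""
    ip_map.clear()
    for ip, circuit in mappings:
        ip_map.entries[ip] = circuit
    hot_cache.clear()


def total_clears(reload_fn, n_mappings):
    """Run one simulated reload and return the number of table clears."""
    ip_map, hot_cache = CountingTable(), CountingTable()
    mappings = [(f"10.0.{i // 256}.{i % 256}", i) for i in range(n_mappings)]
    reload_fn(mappings, ip_map, hot_cache)
    return ip_map.clears + hot_cache.clears
```

With 1200 mappings, `total_clears(reload_eager, 1200)` counts one clear per mapping plus the initial IP map wipe, while `total_clears(reload_batched, 1200)` counts only the two clears described above.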
…not sure if anyone actually uses that tool, but now it's supported.
Did you try loading 50K circuits? |
It's not so important. More important is "no packet loss during reload". |
I did. Nothing strange to see there - it even creates the tc classes and tc qdiscs. |
Tested - now it doesn't hang. But I have a message:
What can I do about it? |
Double check that you didn't put anything in a parent node that shouldn't be there; I'll be glad to take a look otherwise (I have a "flat" test setup, but don't touch it often - none of my networks are even remotely flat!). If you want, fire up the |
I don't understand.
I submitted a dump with lqos_support_tool. |
|
Thanks for the support dump (I love that new tool!). I don't see anything jumping out in the Shaped Devices list - so I'm going to assume that there's a bug to chase down in the flat network handler in (I'm assuming "KOMENTARZ" means comment?) |
Yes - KOMENTARZ means comment, on production this is replaced with some circuit description/identity |
Got bad news. This is unfixed even with #520 |
@thebracket could You add more info to this error? |
Checked again - this bug exists also in newly released
When I stop passing network traffic, it finishes reloading very quickly. |
I found that in my logs |
That's been there since lqosd existed - it just means the queues haven't
been made yet, and there's nothing useful to read from a pfifo queue that's
there by default.
Is the "hangs" still an issue?
…On Mon, Aug 5, 2024, 3:15 AM Jarosław Kłopotek - INTERDUO < ***@***.***> wrote:
Aug 05 10:12:55 libreqos-beta lqosd[985]: [2024-08-05T08:12:55Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:12:56 libreqos-beta lqosd[985]: [2024-08-05T08:12:56Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:12:57 libreqos-beta lqosd[985]: [2024-08-05T08:12:57Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:12:58 libreqos-beta lqosd[985]: [2024-08-05T08:12:58Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:12:59 libreqos-beta lqosd[985]: [2024-08-05T08:12:59Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:00 libreqos-beta lqosd[985]: [2024-08-05T08:13:00Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:01 libreqos-beta lqosd[985]: [2024-08-05T08:13:01Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:02 libreqos-beta lqosd[985]: [2024-08-05T08:13:02Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:03 libreqos-beta lqosd[985]: [2024-08-05T08:13:03Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:04 libreqos-beta lqosd[985]: [2024-08-05T08:13:04Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:05 libreqos-beta lqosd[985]: [2024-08-05T08:13:05Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:06 libreqos-beta lqosd[985]: [2024-08-05T08:13:06Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:07 libreqos-beta lqosd[985]: [2024-08-05T08:13:07Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:08 libreqos-beta lqosd[985]: [2024-08-05T08:13:08Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:09 libreqos-beta lqosd[985]: [2024-08-05T08:13:09Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:10 libreqos-beta lqosd[985]: [2024-08-05T08:13:10Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:11 libreqos-beta lqosd[985]: [2024-08-05T08:13:11Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:12 libreqos-beta lqosd[985]: [2024-08-05T08:13:12Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
Aug 05 10:13:13 libreqos-beta lqosd[985]: [2024-08-05T08:13:13Z WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast
I found that in my logs
|
Yes, but only when heavy traffic is going through LibreQoS |
No - in v1.4 I don't get this warning message, and lqosd was there too. To me, "don't know how to parse" means "the script doesn't understand" - maybe we should make the message a little more precise? I tried reloading without traffic - it goes OK, but the traffic is not shaped. I am really confused and don't know what else to check. |
@rchac I recorded video: |
…d use a timeout/expiration to clear the cache gracefully. This allowed the map to be unpinned, and never accessed from userspace. Should fix the reload delays, and still give accurate mappings after a complete map rebuild.
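As a rough illustration of the timeout/expiration idea in the (truncated) message above, here is a small Python sketch. It is only an analogy for the eBPF-side change, which is not shown in this thread: `ExpiringCache`, its `ttl_seconds` parameter and the injectable `clock` are all hypothetical. The point it shows is that entries age out on their own, so nothing outside the cache ever has to flush it.

```python
import time


class ExpiringCache:
    """Toy lookup cache whose entries expire instead of being flushed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._entries = {}  # key -> (value, time the value was stored)

    def get(self, key, slow_lookup):
        """Return the cached value, refreshing it once it is too old."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # fresh entry: serve from the cache
        value = slow_lookup(key)  # stale or missing: ask the source of truth
        self._entries[key] = (value, now)
        return value
```

In this sketch, a rebuilt mapping table becomes visible to callers no later than one TTL after the rebuild, at the cost of serving the old value until then.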
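One problem is solved (reloading during heavy network traffic) after #545. The second one is still an issue (no traffic shaping, and the message "WARN lqos_queue_tracker::queue_types] I don't know how to parse qdisc type pfifo_fast"). I got this message every reload in my |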
For the 1,000,000,000th time, that warning message isn't a bug.
|
OK - but I checked again, and in v1.4 (on the working VM) there was no such warning, contrary to what you said earlier. What should I check then? |
The reason you're seeing that message is that when it polls the queues,
it's finding pfifo and not Cake - so the message isn't the issue. The
question is, why don't you have any queues?
I'd start by going into your config, and changing this line back to `0`,
the default (or removing it):
```toml
override_available_queues = 26 # This can be omitted and be 0 for Python
```
* Does `lsmod | grep cake` show `sch_cake` loaded? (If it doesn't, that'd
indicate that Cake isn't installed.)
* What do you get for `sudo tc -s qdisc show dev (ifname)` (replace
`ifname` with your interface; `ens16np0` and `ens17np0`
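
The qdisc check above can also be scripted. The following is a minimal sketch, not part of LibreQoS: `count_qdisc_kinds` and `read_qdiscs` are hypothetical helpers, and the sketch assumes only that each qdisc in the `tc` output starts a line of the form `qdisc <kind> <handle> ...`. It tallies qdisc kinds so it is easy to see whether only `pfifo_fast` is present.

```python
import re
import subprocess
from collections import Counter

# A qdisc entry starts at the beginning of a line: "qdisc <kind> <handle> ..."
# Indented statistics lines (printed with -s) are skipped by the anchor.
QDISC_LINE = re.compile(r"^qdisc\s+(\S+)\s+(\S+)", re.MULTILINE)


def count_qdisc_kinds(tc_output: str) -> Counter:
    """Count qdisc kinds in the text printed by `tc qdisc show dev <ifname>`."""
    return Counter(kind for kind, _handle in QDISC_LINE.findall(tc_output))


def read_qdiscs(ifname: str) -> str:
    """Run tc for one interface and return its output (may need root)."""
    result = subprocess.run(
        ["tc", "-s", "qdisc", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `count_qdisc_kinds(read_qdiscs("ens16np0"))` returns a `Counter` keyed by qdisc kind; if `cake` is missing from it, the shaping queues have not been created on that interface.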
|
I have the queues loaded (qdiscs and classes are on the interface), but it looks like the XDP filters are not working. How can I list them? |
I:
Testing, and I get a lot of packet loss :( I collected some data for diagnosis: http://kłopotek.pl/lqos_beta_problem/ The hardware is good: on a second VM (with v1.4 and Ubuntu 22.04) I use the same passthrough NICs and it works well. The maximum network throughput I could reach with LibreQoS v1.5-beta2 was: On an almost idle VM, lqosd takes 26% of a CPU core - is that normal? |
Tried to set I will check tomorrow whether those problems exist on the older Ubuntu LTS (22.04) with the newest LibreQoS. |
I tried to install the develop branch on Ubuntu 22.04. On the older Ubuntu 22.04 there was: When compiling:
Is it important? Tried to run lqosd:
I am trying to run lqosd on VirtIO NICs on the older Ubuntu 22.04 for testing, and will then test whether it works on passthrough NICs. Strace: http://kłopotek.pl/lqos_beta_problem/strace_loading_lqosd_on_olderubuntu |
Is harmless. I just can't stop Linux from emitting it. Do you have the hot-cache PR applied? I wonder if I exceeded the older instruction limit (the intent was not to require the newer kernel). I'm still hoping for a better solution than the one in that PR (which is why I haven't merged it). |
I tested on develop + patch-33 + cherry-pick commit from #545 testbed1: ubuntu 24.04 |
OK - on Ubuntu 24.04 there are no warnings during compilation. |
Remind me - Patch-33? |
#505 |
Tested again with PR https://github.com/LibreQoE/LibreQoS/pull/547/commits. Reloading is OK (not hanging). |
Did You come back @thebracket? |
I did, and straight to the world of the unwell (Daughter got a stomach bug, now I'm out with it) |
I hope she is better now. Just give me a ping when you have time. |
I don't know whether it is related to this bug, so I created a new issue, #549. |
#547 is merged so closing this. |
I installed LibreQoS
Running ./LibreQoS.py hangs at:
checking deeper:
Running
./LibreQoS.py --debug
shows that: [it hangs here]
strace -p 1677
If I take network traffic off the LibreQoS network interfaces (by shutting down the VLAN facing the internet), the reload continues and I can see: