Bug: maintenance node issues #106

Closed
glitchvern opened this issue Oct 16, 2024 · 1 comment · Fixed by #111
General

Crash when a maintenance node is set to an actual host in the cluster.
VMs bounce between nodes because sorted returns all VMs instead of only the VMs on the maintenance node, and because the maintenance node is still included in the rebalance comparison (see the sketch below).
A maintenance node specified on the command line is not treated the same as one specified in the config file, because proxlb_config['vm_maintenance_nodes'] is passed into get_node_statistics while app_args.maintenance is not.
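
A minimal sketch (not ProxLB's actual implementation; the field names 'node_parent', 'maintenance', 'ignore' and 'memory_free' are taken from the statistics in the logs below, the function names are made up) of how the maintenance pass could avoid both problems: filter the VMs by their parent node instead of relying on the sort key, and drop maintenance/ignored nodes from the set of rebalance targets.

# Hypothetical sketch only, not ProxLB's code.
def vms_on_maintenance_nodes(vm_statistics, maintenance_nodes):
    # Keep only the VMs that actually run on a maintenance node.
    return {
        name: vm for name, vm in vm_statistics.items()
        if vm.get('node_parent') in maintenance_nodes
    }

def pick_target_node(node_statistics):
    # Exclude maintenance and ignored nodes from the rebalance comparison,
    # then take the node with the most free memory (criterion is illustrative).
    candidates = {
        name: node for name, node in node_statistics.items()
        if not node.get('maintenance') and not node.get('ignore')
    }
    return max(candidates, key=lambda name: candidates[name]['memory_free'])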

Weighting

Score: 5

Config

[proxmox]
api_host: 172.16.11.25,172.16.11.36,172.16.11.50
verify_ssl: 0
[vm_balancing]
enable: 1
method: cpu
mode: used
maintenance_nodes: dummynode03,dummynode04,avh-proxmox08
mode_option: percent
balanciness: 15
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[storage_balancing]
enable: 0
[update_service]
enable: 0
[api]
enable: 0
[service]
daemon: 1
schedule: 2
log_verbosity: DEBUG
config_version: 3

Log (Crash)

<6> ProxLB: Info: [logger]: Logger verbosity got updated to: DEBUG.
<4> ProxLB: Warning: [api-connection]: API connection does not verify SSL certificate.
<6> ProxLB: Info: [api-connect-get-host]: Multiple hosts for API connection are given. Testing hosts for further usage.
<6> ProxLB: Info: [api-connect-get-host]: Testing host 172.16.11.25 on port tcp/8006.
<6> ProxLB: Info: [api-connect-test-host]: Timeout for host 172.16.11.25 is set to 2 seconds.
<6> ProxLB: Info: [api-connect-test-host]: Host 172.16.11.25 is reachable on port tcp/8006.
<7> ProxLB: Starting new HTTPS connection (1): 172.16.11.25:8006
<7> ProxLB: https://172.16.11.25:8006 "POST /api2/json/access/ticket HTTP/11" 200 758
<6> ProxLB: Info: [api-connection]: API connection succeeded to host: 172.16.11.25.
<6> ProxLB: Info: [only-on-master-executor]: No master only rebalancing is defined. Skipping validation.
<7> ProxLB: Starting new HTTPS connection (1): 172.16.11.25:8006
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes HTTP/11" 200 511
<6> ProxLB: Info: [node-statistics]: Added node avh-proxmox10.
<6> ProxLB: Info: [node-statistics]: Added node avh-proxmox09.
<6> ProxLB: Info: [node-statistics]: Added node avh-proxmox08.
<6> ProxLB: Info: [node-statistics]: Maintenance mode: avh-proxmox08 is set to maintenance mode.
<6> ProxLB: Info: [node-statistics]: Created node statistics.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes HTTP/11" 200 514
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu HTTP/11" 200 760
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/101/config HTTP/11" 200 662
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark03 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/105/config HTTP/11" 200 1192
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM debian-12.6.0-template from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/115/config HTTP/11" 200 1173
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-proxlb from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/115/config HTTP/11" 200 1179
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-proxlb.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-proxlb found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-proxlb found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-proxlb found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-proxlb.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/113/config HTTP/11" 200 1186
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark14 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/113/config HTTP/11" 200 1182
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark14.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark14 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark14 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark14 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark14.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/109/config HTTP/11" 200 1183
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark10 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/109/config HTTP/11" 200 1178
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark10.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark10 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark10 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark10 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark10.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/103/config HTTP/11" 200 650
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark05 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/103/config HTTP/11" 200 650
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark05.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark05 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark05 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark05.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/107/config HTTP/11" 200 1181
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark08 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/107/config HTTP/11" 200 1186
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark08.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark08 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark08 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark08 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark08.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/110/config HTTP/11" 200 1182
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark11 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/110/config HTTP/11" 200 1182
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark11.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark11 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark11 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark11 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark11.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/100/config HTTP/11" 200 648
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark01 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/106/config HTTP/11" 200 1179
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark07 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/106/config HTTP/11" 200 1180
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark07.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark07 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark07 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark07 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark07.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/108/config HTTP/11" 200 1178
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark09 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/108/config HTTP/11" 200 1182
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark09.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark09 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark09 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark09 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark09.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/112/config HTTP/11" 200 1183
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark13 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/112/config HTTP/11" 200 1177
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark13.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark13 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark13 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark13 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark13.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/102/config HTTP/11" 200 650
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark04 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/102/config HTTP/11" 200 650
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark04.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark04 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark04 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark04.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/114/config HTTP/11" 200 1176
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark15 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/114/config HTTP/11" 200 1178
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark15.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark15 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark15 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark15 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark15.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/104/config HTTP/11" 200 650
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark06 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox10/qemu/104/config HTTP/11" 200 650
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark06.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark06 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark06 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark06.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox09/qemu HTTP/11" 200 11
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox08/qemu HTTP/11" 200 242
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox08/qemu/111/config HTTP/11" 200 1184
<6> ProxLB: Info: [api-get-vm-tags]: Got no VM/CT tag for VM avh-benchmark12 from API.
<7> ProxLB: https://172.16.11.25:8006 "GET /api2/json/nodes/avh-proxmox08/qemu/111/config HTTP/11" 200 1183
<6> ProxLB: Info: [vm-statistics]: Getting disk information for vm avh-benchmark12.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark12 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark12 found.
<6> ProxLB: Info: [vm-statistics]: No (or unsupported) disk(s) for avh-benchmark12 found.
<6> ProxLB: Info: [vm-statistics]: Added vm avh-benchmark12.
<6> ProxLB: Info: [vm-statistics]: Created VM statistics.
<6> ProxLB: Info: [node-update-statistics]: Updated node resource assignments by all VMs.
<7> ProxLB: node_statistics
<6> ProxLB: Info: [balancing-method-validation]: Valid balancing method: cpu
<6> ProxLB: Info: [balancing-mode-validation]: Valid balancing method: used
<7> ProxLB: Info: [balanciness-validation]: Node: avh-proxmox10 with values: {'maintenance': False, 'ignore': False, 'cpu_total': 32, 'cpu_assigned': 24, 'cpu_assigned_percent': 75.0, 'cpu_assigned_percent_last_run': 0, 'cpu_used': 0.0250568775748632, 'cpu_free': 31.198179917604378, 'cpu_free_percent': 97, 'cpu_free_percent_last_run': 97, 'memory_total': 405647290368, 'memory_assigned': 25769803776, 'memory_assigned_percent': 6.352761225798363, 'memory_assigned_percent_last_run': 0, 'memory_used': 25695907840, 'memory_free': 379951382528, 'memory_free_percent': 93, 'memory_free_percent_last_run': 0, 'disk_total': 457868050432, 'disk_assigned': 412316860416, 'disk_assigned_percent': 90.0514591544393, 'disk_assigned_percent_last_run': 0, 'disk_used': 3131834368, 'disk_free': 454736216064, 'disk_free_percent': 99, 'disk_free_percent_last_run': 0, 'cpu_free_percent_match': False}
<7> ProxLB: Info: [balanciness-validation]: Node: avh-proxmox09 with values: {'maintenance': False, 'ignore': False, 'cpu_total': 8, 'cpu_assigned': 0, 'cpu_assigned_percent': 0, 'cpu_assigned_percent_last_run': 0, 'cpu_used': 0.0195373366894403, 'cpu_free': 7.843701306484478, 'cpu_free_percent': 98, 'cpu_free_percent_last_run': 98, 'memory_total': 270383333376, 'memory_assigned': 0, 'memory_assigned_percent': 0, 'memory_assigned_percent_last_run': 0, 'memory_used': 8328151040, 'memory_free': 262055182336, 'memory_free_percent': 96, 'memory_free_percent_last_run': 0, 'disk_total': 461265305600, 'disk_assigned': 0, 'disk_assigned_percent': 0, 'disk_assigned_percent_last_run': 0, 'disk_used': 3120693248, 'disk_free': 458144612352, 'disk_free_percent': 99, 'disk_free_percent_last_run': 0, 'cpu_free_percent_match': False}
<7> ProxLB: Info: [balanciness-validation]: Node: avh-proxmox08 with values: {'maintenance': True, 'ignore': False, 'cpu_total': 8, 'cpu_assigned': 2, 'cpu_assigned_percent': 25.0, 'cpu_assigned_percent_last_run': 0, 'cpu_used': 0.0264090964665607, 'cpu_free': 7.788727228267515, 'cpu_free_percent': 97, 'cpu_free_percent_last_run': 97, 'memory_total': 270381445120, 'memory_assigned': 2147483648, 'memory_assigned_percent': 0.7942422406415165, 'memory_assigned_percent_last_run': 0, 'memory_used': 9287348224, 'memory_free': 261094096896, 'memory_free_percent': 96, 'memory_free_percent_last_run': 0, 'disk_total': 461268582400, 'disk_assigned': 34359738368, 'disk_assigned_percent': 7.448965674016822, 'disk_assigned_percent_last_run': 0, 'disk_used': 2777677824, 'disk_free': 458490904576, 'disk_free_percent': 99, 'disk_free_percent_last_run': 0, 'cpu_free_percent_match': False}
<6> ProxLB: Info: [balanciness-validation]: Rebalancing for cpu is not needed. Highest usage: 98% | Lowest usage: 97%.
<6> ProxLB: Info: [rebalancing-vm-calculator]: Balancing calculations done.
<6> ProxLB: Info: [rebalancing-maintenance-vm-calculator]: Maintenance mode for the following hosts defined: {'avh-proxmox08'}
Traceback (most recent call last):
  File "/app/proxlb", line 1571, in <module>
    main()
  File "/app/proxlb", line 1550, in main
    node_statistics, vm_statistics = balancing_vm_maintenance(proxlb_config, app_args, node_statistics, vm_statistics)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/proxlb", line 885, in balancing_vm_maintenance
    node_vms = sorted(vm_statistics.items(), key=lambda item: item[0] if item[1]['node_parent'] == node_name else [])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'str' and 'list'
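
The TypeError comes from the key function returning a str for VMs on the maintenance node and an empty list for every other VM; while sorting, Python has to compare those keys with <, and str and list are not comparable. A stripped-down reproduction (the VM and node names are taken from this cluster, the rest is illustrative):

vm_statistics = {
    'avh-proxlb': {'node_parent': 'avh-proxmox10'},
    'avh-benchmark12': {'node_parent': 'avh-proxmox08'},
}
node_name = 'avh-proxmox08'

# Mixed return types in the sort key: str for VMs on the maintenance node, list otherwise.
sorted(vm_statistics.items(),
       key=lambda item: item[0] if item[1]['node_parent'] == node_name else [])
# -> TypeError: '<' not supported between instances of 'str' and 'list'

Changing [] to "" (as in the VM-bouncing run below) makes the key types consistent, so the sort no longer crashes, but it still returns every VM rather than only those on the maintenance node.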

Log (VM Bouncing)

For this run: log_verbosity was changed to INFO, maintenance_nodes was changed to avh-proxmox10, and the [] in the sorted call above was changed to "".

vm-bouncing.txt

Meta

Version: 1.0.4
Installed from: github source into a docker container
Running as: Container

Patch

Fixes the crash and the VMs bouncing between nodes.
Does not fix the command-line issue.
maintenance.patch.txt
I'm not sure how you want to fix the command-line issue, since you might want something general that combines the command-line and config-file options before entering the main loop.
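
One possible general approach (a sketch only, merge_options is a hypothetical helper, not a proposal of ProxLB's final design) is to merge the parsed command-line arguments over the config-file values once, before the main loop, so that get_node_statistics and the balancing functions only ever read one merged dict:

def merge_options(proxlb_config, app_args):
    # Sketch: CLI values win over config-file values when they are set.
    # 'maintenance' -> 'vm_maintenance_nodes' is the one mapping mentioned in
    # this issue; a real implementation would map every overridable option.
    merged = dict(proxlb_config)
    if getattr(app_args, 'maintenance', None):
        # Assuming the CLI uses the same comma-separated format as the config file.
        merged['vm_maintenance_nodes'] = app_args.maintenance
    return merged

With that, get_node_statistics would see the same maintenance nodes whether they came from the command line or from the config file.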

glitchvern added the bug and needs-analysis labels on Oct 16, 2024
gyptazy added this to the Release 1.0.5 milestone on Oct 17, 2024
gyptazy self-assigned this on Oct 17, 2024
gyptazy added a commit that referenced this issue Oct 17, 2024
…ype objects

  - Fix node (and its objects) evaluation when not reachable (e.g., maintenance).
  - Fix evaluation of maintenance mode where comparing list & string resulted in a crash (by @glitchvern).
  - Set ProxLB version to 1.0.5b

Fixes: #105
Fixes: #106
Contributed-by: @glitchvern

gyptazy commented Oct 17, 2024

Hey @glitchvern,

thanks for reporting and also for providing this fix!
I created PR #111, which should fix this and also the bug mentioned in #107.

While the code looks good, I still need to validate the output of https://github.com/gyptazy/ProxLB/pull/111/files#diff-4d47e7584181ff92b3c3f57588b89e4fb11158ac22f3d50066588c07267e5a86R888.

Will update this later.

Thanks,
gyptazy
