This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Enhancements - Performance with one thousand of hosts #12

Open
ellis2323 opened this issue Nov 21, 2016 · 6 comments

@ellis2323

Hello,

I'm trying to use Snap and your plugin to collect metrics into InfluxDB. I have succeeded in getting the few metrics I need from one host, but now I need to collect 500 metrics on a thousand nodes.
My current approach is to create one task per host. I have tuned snapd.conf to allow several hundred plugins to be loaded, but after 150 hosts Snap crashed. I suspect this is not the right way to do it.
I suspect the right approach is to modify this plugin to accept an array of hosts.

Best regards,

@ellis2323
Author

In my case, I have a few groups of hosts that share the same metrics. My idea is to create one task per group. So, for my use case, I would implement something like an array for the "snmp_agent_address" key.
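To make the idea above concrete, here is a minimal sketch in Go of how one config value could carry a whole group of hosts. It is not the plugin's actual code: the helper name parseAgentAddresses and the comma-separated format for "snmp_agent_address" are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAgentAddresses is a hypothetical helper (not part of the plugin).
// It assumes the "snmp_agent_address" value holds a comma-separated
// list of host:port entries and returns them as a slice, skipping
// empty entries and trimming surrounding whitespace.
func parseAgentAddresses(raw string) []string {
	var hosts []string
	for _, part := range strings.Split(raw, ",") {
		if h := strings.TrimSpace(part); h != "" {
			hosts = append(hosts, h)
		}
	}
	return hosts
}

func main() {
	// Example value for one group of hosts sharing the same metrics.
	hosts := parseAgentAddresses("10.0.0.1:161, 10.0.0.2:161,,10.0.0.3:161")
	fmt.Println(len(hosts), hosts)
}
```

Running this prints `3 [10.0.0.1:161 10.0.0.2:161 10.0.0.3:161]`. With something like this, one task per group would iterate over the slice instead of requiring one task per host.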

@kindermoumoute

Hi @ellis2323,

> now to collect 500 metrics on a thousand nodes.

Can you give more detail on this part? Do you actually want 500 metrics * 1000 nodes?

> My current approach is to create a task by host.

Are you using Tribe?

> I have boosted snapd.conf to allow many hundred of plugin loading...

Do you load hundreds of plugins on one node?

@ellis2323
Author

ellis2323 commented Nov 29, 2016

Hello,

I'm working for an operator with thousands of routers, switches, and optical devices. There are no real computers, and I can't install anything on them. I'm currently testing several solutions like LibreNMS, Shinken, ... So when I discovered Snap Telemetry, I was hoping to use it to collect SNMP state from all of this equipment.
Also, Tribe is not the solution. I tried to load 300 instances of the snmp + influx plugins, but my VM used too much memory and crashed.

My current solution is to use Telegraf, which works well with many nodes.

@otsuarez

otsuarez commented Dec 9, 2016

Hi,
I'm having the same issue. Is there a roadmap for a solution to this scenario?
Best,

@ellis2323
Author

I've read the code, and it should be possible to solve this with a few lines of code. The main difficulty is error handling. Today, when an SNMP target doesn't respond (after the timeout and retries), the task stops. In my scenario we don't want this behaviour, but I'm not sure whether that fits the philosophy of this tool.

@nanliu
Contributor

nanliu commented Dec 12, 2016

@ellis2323, you can disable this behavior by setting max-failures: -1 in the task configuration, per:
https://github.com/intelsdi-x/snap/blob/master/docs/TASKS.md#max-failures

What are the other issues you've seen?
