prometheus-exporter-collector only collects the last node when scraping multiple nodes #14

Open
chengjiahua opened this issue Dec 2, 2020 · 3 comments

Comments

@chengjiahua

chengjiahua commented Dec 2, 2020

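The problem has been located: it is in the Gather() function in gather.go under the collector directory.
I tested with four nodes and with two nodes, and both show the problem. The two-node case is shown below.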

```go
func Gather() []*dataobj.MetricValue {
	var wg sync.WaitGroup
	var res []*dataobj.MetricValue

	cfg := config.Get()
	metricChan := make(chan *dataobj.MetricValue)
	done := make(chan struct{}, 1)

	go func() {
		defer func() { done <- struct{}{} }()
		for m := range metricChan {
			res = append(res, m)
		}
	}()

	for _, url := range cfg.ExporterUrls {
		fmt.Println("out :", url)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if metrics, err := gatherExporter(url); err == nil {
				fmt.Println("in :", url)
				for _, m := range metrics {
					if typ, exists := cfg.MetricType[m.Metric]; exists {
						m.CounterType = typ
					}

					if cfg.MetricPrefix != "" {
						m.Metric = cfg.MetricPrefix + m.Metric
					}
					metricChan <- m
				}
			}
		}()
		// time.Sleep(2 * time.Second)
	}
	wg.Wait()
	close(metricChan)

	<-done

	return res
}
```

This is the test run. I added three lines: fmt.Println("out :", url), fmt.Println("in :", url), and // time.Sleep(2 * time.Second).

```
[root@master prometheus-exporter-collector]# cat plugin.test.json | ./prometheus-exporter-collector
out : http://192.168.84.13:9001/metrics
in : http://192.168.84.13:9001/metrics
out : http://192.168.84.9:9001/metrics
in : http://192.168.84.9:9001/metrics
[root@master prometheus-exporter-collector]#
[root@master prometheus-exporter-collector]#
[root@master prometheus-exporter-collector]# cat plugin.test.json | ./prometheus-exporter-collector
out : http://192.168.84.13:9001/metrics
out : http://192.168.84.9:9001/metrics
in : http://192.168.84.9:9001/metrics
in : http://192.168.84.9:9001/metrics
[root@master prometheus-exporter-collector]#
```

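Test results: the first run above is with the two-second delay enabled and is the result we expect; the second run is without the delay, and only the last node was collected. Probably creating the thread takes time so it was skipped over, or perhaps the interpreter optimized it away. Either way, both threads end up with the same argument, so they only fetch data from the same node.
Workarounds I have verified:
1. Add a delay (poor performance; 1s did not work in my tests, mainly because my test environment has high latency).
2. Change the mechanism: instead of using one thread per node to fetch monitoring data, fetch the data for all nodes in a single process.
Concretely, pass the array directly to gatherExporter() and loop over the nodes inside it.
3. Use multiple plugins, so that each plugin monitors only one node.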
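Workaround 2 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `MetricValue` is a simplified stand-in for `dataobj.MetricValue`, and `gatherSequential` and `fakeFetch` are hypothetical names, with `fakeFetch` playing the role of `gatherExporter`. Because every URL is visited in one goroutine, no closure captures the loop variable, at the cost of scraping the nodes one after another.

```go
package main

import "fmt"

// MetricValue is a simplified, hypothetical stand-in for dataobj.MetricValue.
type MetricValue struct {
	Metric   string
	Endpoint string
}

// gatherSequential scrapes every exporter URL in turn, in a single goroutine.
// fetch is a hypothetical stand-in for gatherExporter.
func gatherSequential(urls []string, fetch func(string) ([]*MetricValue, error)) []*MetricValue {
	var res []*MetricValue
	for _, url := range urls {
		metrics, err := fetch(url)
		if err != nil {
			// Skip an exporter that fails, as the original Gather() does.
			continue
		}
		res = append(res, metrics...)
	}
	return res
}

// fakeFetch pretends to scrape one exporter and tags the metric with its URL.
func fakeFetch(url string) ([]*MetricValue, error) {
	return []*MetricValue{{Metric: "up", Endpoint: url}}, nil
}

func main() {
	urls := []string{
		"http://192.168.84.13:9001/metrics",
		"http://192.168.84.9:9001/metrics",
	}
	for _, m := range gatherSequential(urls, fakeFetch) {
		fmt.Println(m.Metric, m.Endpoint)
	}
}
```

The total scrape time becomes the sum of the per-node times, which may matter in a high-latency environment like the one described above.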

@UlricQin
Member

UlricQin commented Dec 3, 2020

I'll contact the author and ask them to take a look.

@lts120784620
Collaborator

lts120784620 commented Dec 3, 2020

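@chengjiahua @UlricQin This is most likely caused by how the argument reaches the anonymous goroutine: it uses the url variable from the enclosing loop (which is really just the range copy; the variable name was chosen carelessly..)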
#15
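For readers following along, one common way to avoid this kind of capture is to pass the loop variable to the goroutine as an argument, so each goroutine works on its own copy. The sketch below is an illustration under assumptions and not necessarily what #15 does: `MetricValue`, `gatherConcurrently`, and `fakeFetch` are hypothetical names, and a mutex replaces the channel from the original Gather() only to keep the example short. In Go versions before 1.22 the `range` variable is shared across iterations, which is why the original closure could see only the last URL.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// MetricValue is a simplified, hypothetical stand-in for dataobj.MetricValue.
type MetricValue struct {
	Metric   string
	Endpoint string
}

// gatherConcurrently scrapes all URLs in parallel.
// fetch is a hypothetical stand-in for gatherExporter.
func gatherConcurrently(urls []string, fetch func(string) ([]*MetricValue, error)) []*MetricValue {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		res []*MetricValue
	)
	for _, url := range urls {
		wg.Add(1)
		// url is passed as an argument, so u is a per-goroutine copy
		// and does not change when the loop moves on.
		go func(u string) {
			defer wg.Done()
			metrics, err := fetch(u)
			if err != nil {
				return
			}
			mu.Lock()
			res = append(res, metrics...)
			mu.Unlock()
		}(url)
	}
	wg.Wait()
	return res
}

// fakeFetch pretends to scrape one exporter and tags the metric with its URL.
func fakeFetch(url string) ([]*MetricValue, error) {
	return []*MetricValue{{Metric: "up", Endpoint: url}}, nil
}

func main() {
	urls := []string{
		"http://192.168.84.13:9001/metrics",
		"http://192.168.84.9:9001/metrics",
	}
	got := gatherConcurrently(urls, fakeFetch)
	// Goroutines finish in no fixed order, so sort before printing.
	sort.Slice(got, func(i, j int) bool { return got[i].Endpoint < got[j].Endpoint })
	for _, m := range got {
		fmt.Println(m.Metric, m.Endpoint)
	}
}
```

Shadowing the variable inside the loop body (`url := url`) before starting the goroutine has the same effect.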

@chengjiahua
Author

chengjiahua commented Dec 3, 2020

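@UlricQin @lts120784620 I found another bug in n9e. With email alerting, when there are multiple recipients and even one recipient's parameter (email address) is wrong, none of the recipients receive the alert. Ideally only the misconfigured recipient would miss the alert, and the correctly configured recipients should still receive it. I have not tested the other alerting methods yet. The problem is quick and easy to locate, so I won't go into more detail here.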
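The desired behavior described here could look something like the sketch below. It assumes, without confirmation from this thread, that the failure comes from sending one message to all recipients at once; `notifyEach`, `sendFunc`, and `fakeSend` are hypothetical names and are not part of n9e. Each recipient is handled independently, and failures are collected instead of aborting the whole batch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sendFunc is a hypothetical hook that delivers one message to one address.
type sendFunc func(to, subject, body string) error

// notifyEach sends to every recipient separately and returns the failures,
// keyed by address. One bad address does not stop delivery to the others.
func notifyEach(recipients []string, subject, body string, send sendFunc) map[string]error {
	failed := make(map[string]error)
	for _, to := range recipients {
		if err := send(to, subject, body); err != nil {
			failed[to] = err
		}
	}
	return failed
}

// fakeSend rejects addresses without an "@" and accepts everything else.
func fakeSend(to, subject, body string) error {
	if !strings.Contains(to, "@") {
		return errors.New("invalid address: " + to)
	}
	return nil
}

func main() {
	recipients := []string{"ops@example.com", "not-an-address", "dev@example.com"}
	failed := notifyEach(recipients, "alert", "cpu high", fakeSend)
	for to, err := range failed {
		fmt.Println("failed:", to, err)
	}
}
```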
