Thursday, 2017-08-31

00:01 *** edmondsw has quit IRC
00:30 *** tovin07_ has joined #openstack-telemetry
00:31 *** thorst_afk has joined #openstack-telemetry
00:35 *** catintheroof has quit IRC
00:49 *** zhurong has joined #openstack-telemetry
01:34 *** Tom_ has joined #openstack-telemetry
01:34 *** Tom_ has quit IRC
01:41 *** thorst_afk has quit IRC
01:59 *** lhx__ has joined #openstack-telemetry
02:00 *** thorst_afk has joined #openstack-telemetry
02:00 *** jmlowe has quit IRC
02:01 *** jmlowe has joined #openstack-telemetry
02:03 *** catintheroof has joined #openstack-telemetry
02:41 *** Tommy_Tom has joined #openstack-telemetry
02:45 *** thorst_afk has quit IRC
02:46 *** thorst_afk has joined #openstack-telemetry
02:48 *** Tom_ has joined #openstack-telemetry
02:50 *** thorst_afk has quit IRC
02:51 *** Tommy_Tom has quit IRC
03:07 *** catintheroof has quit IRC
03:11 *** ddyer has joined #openstack-telemetry
03:13 *** jmlowe has quit IRC
03:14 *** jmlowe has joined #openstack-telemetry
03:31 <openstackgerrit> SU, HAO-CHEN proposed openstack/panko master: Remove operator checking when using filter  https://review.openstack.org/499435
03:33 *** ddyer has quit IRC
03:39 *** links has joined #openstack-telemetry
03:43 *** psachin has joined #openstack-telemetry
03:47 *** thorst_afk has joined #openstack-telemetry
03:50 <openstackgerrit> SU, HAO-CHEN proposed openstack/panko master: Remove operator checking when using filter  https://review.openstack.org/499435
03:51 *** thorst_afk has quit IRC
04:01 <openstackgerrit> SU, HAO-CHEN proposed openstack/panko master: Remove operator checking when using filter  https://review.openstack.org/499435
04:06 <openstackgerrit> SU, HAO-CHEN proposed openstack/panko master: Remove operator checking when using filter  https://review.openstack.org/499435
04:23 *** Tom_ has quit IRC
04:23 *** Tom has joined #openstack-telemetry
04:27 *** Tom has quit IRC
04:29 *** zhurong has quit IRC
04:47 *** thorst_afk has joined #openstack-telemetry
04:48 *** zhurong has joined #openstack-telemetry
04:52 *** thorst_afk has quit IRC
04:53 *** vint_bra has joined #openstack-telemetry
04:58 *** vint_bra has quit IRC
05:01 *** yprokule has joined #openstack-telemetry
05:15 *** iranzo has joined #openstack-telemetry
05:48 *** thorst_afk has joined #openstack-telemetry
05:52 *** edmondsw has joined #openstack-telemetry
05:53 *** thorst_afk has quit IRC
06:10 *** psachin has quit IRC
06:14 *** psachin has joined #openstack-telemetry
06:15 *** links has quit IRC
06:16 *** psachin has quit IRC
06:16 *** psachin has joined #openstack-telemetry
06:18 *** Tom has joined #openstack-telemetry
06:22 *** pcaruana has joined #openstack-telemetry
06:40 *** rcernin has joined #openstack-telemetry
06:41 *** vint_bra has joined #openstack-telemetry
06:46 *** vint_bra has quit IRC
06:49 *** thorst_afk has joined #openstack-telemetry
06:54 *** thorst_afk has quit IRC
06:56 <openstackgerrit> liuwei proposed openstack/ceilometer master: Modify the unit conversion error, add the meter item description in the document  https://review.openstack.org/499476
06:58 <openstackgerrit> liuwei proposed openstack/ceilometer master: Modify the unit conversion error, add the meter item description in the document  https://review.openstack.org/499476
07:03 *** flg_ has joined #openstack-telemetry
07:04 *** hoonetorg has quit IRC
07:07 *** lhx__ has quit IRC
07:08 *** lhx__ has joined #openstack-telemetry
07:11 <openstackgerrit> liuwei proposed openstack/ceilometer master: Modify the unit conversion error, add the meter item description in the document  https://review.openstack.org/499480
07:16 *** hoonetorg has joined #openstack-telemetry
07:18 *** lhx__ has quit IRC
07:18 *** lhx__ has joined #openstack-telemetry
07:20 *** iranzo has quit IRC
07:23 *** iranzo has joined #openstack-telemetry
07:23 *** iranzo has quit IRC
07:23 *** iranzo has joined #openstack-telemetry
07:25 *** Tom has quit IRC
07:25 *** Tom has joined #openstack-telemetry
07:29 *** links has joined #openstack-telemetry
07:32 *** tesseract has joined #openstack-telemetry
07:35 *** Tom has quit IRC
07:36 *** Tom has joined #openstack-telemetry
07:40 *** Tom has quit IRC
07:50 *** thorst_afk has joined #openstack-telemetry
07:55 *** thorst_afk has quit IRC
08:00 <openstackgerrit> SU, HAO-CHEN proposed openstack/python-pankoclient master: Fix help message of event list option 'filter'  https://review.openstack.org/499495
08:07 *** Tom has joined #openstack-telemetry
08:08 *** edmondsw has quit IRC
08:12 *** Tom has quit IRC
08:15 *** Tom has joined #openstack-telemetry
08:17 *** openstackgerrit has quit IRC
08:19 *** Tommy_Tom has joined #openstack-telemetry
08:23 *** efoley has joined #openstack-telemetry
08:24 *** efoley_ has joined #openstack-telemetry
08:24 *** efoley has quit IRC
08:25 *** efoley_ has quit IRC
08:25 *** efoley has joined #openstack-telemetry
08:30 *** vint_bra has joined #openstack-telemetry
08:33 *** flg_ has quit IRC
08:34 *** vint_bra has quit IRC
08:42 *** liusheng has quit IRC
08:47 *** flg_ has joined #openstack-telemetry
08:51 *** thorst_afk has joined #openstack-telemetry
08:55 *** thorst_afk has quit IRC
08:57 *** flg_ has quit IRC
09:04 *** Tommy_Tom has quit IRC
09:07 *** liusheng has joined #openstack-telemetry
09:26 *** Tom has quit IRC
09:27 *** Tom has joined #openstack-telemetry
09:27 *** Tom has quit IRC
09:36 *** hoonetorg has quit IRC
09:45 *** psachin has quit IRC
09:51 *** thorst_afk has joined #openstack-telemetry
09:53 *** hoonetorg has joined #openstack-telemetry
09:56 *** thorst_afk has quit IRC
09:58 *** psachin has joined #openstack-telemetry
09:59 *** liusheng has quit IRC
10:09 *** liusheng has joined #openstack-telemetry
10:11 *** tovin07_ has quit IRC
10:18 *** vint_bra has joined #openstack-telemetry
10:22 *** vint_bra has quit IRC
10:25 *** Tom has joined #openstack-telemetry
10:29 *** Tom has quit IRC
10:39 *** Tom has joined #openstack-telemetry
10:44 *** Tom has quit IRC
10:49 *** Tom has joined #openstack-telemetry
10:52 *** raissa has joined #openstack-telemetry
10:52 *** thorst_afk has joined #openstack-telemetry
10:57 *** thorst_afk has quit IRC
11:08 *** psachin has quit IRC
11:12 *** yassine has quit IRC
11:18 *** dave-mccowan has joined #openstack-telemetry
11:19 *** psachin has joined #openstack-telemetry
11:19 *** yassine has joined #openstack-telemetry
11:40 *** thorst_afk has joined #openstack-telemetry
11:50 *** donghao has joined #openstack-telemetry
12:01 *** nijaba has quit IRC
12:06 *** vint_bra has joined #openstack-telemetry
12:08 *** alexchadin has joined #openstack-telemetry
12:11 *** vint_bra has quit IRC
12:30 *** catintheroof has joined #openstack-telemetry
12:42 *** Tom has quit IRC
12:48 *** gordc has joined #openstack-telemetry
12:52 *** raissa has quit IRC
12:56 *** lhx__ has quit IRC
12:57 *** leitan has joined #openstack-telemetry
12:57 *** dave-mccowan has quit IRC
12:58 *** raissa has joined #openstack-telemetry
13:01 *** Tom has joined #openstack-telemetry
13:05 *** Tom has quit IRC
13:05 <gordc> theglo
13:05 <gordc> ca
13:06 <gordc> dammit. so tired i thought irc was google
13:06 *** dave-mccowan has joined #openstack-telemetry
13:09 *** dave-mcc_ has joined #openstack-telemetry
13:11 *** dave-mccowan has quit IRC
13:14 *** zhurong has quit IRC
13:15 *** pradk has joined #openstack-telemetry
13:17 *** raissa has left #openstack-telemetry
13:22 *** alexchadin has quit IRC
13:48 <dims> gordc: LOL
13:51 *** edmondsw has joined #openstack-telemetry
14:04 *** psachin has quit IRC
14:05 *** cristicalin has joined #openstack-telemetry
14:06 *** jobewan has joined #openstack-telemetry
14:07 <cristicalin> hello, is workload partitioning supported for the ceilometer-agent-notification in ocata?
14:08 <cristicalin> when I activate it I see ceilometer creating a _lot_ of queues in rabbitmq, and it starts accumulating messages
14:08 <cristicalin> most of them are fetched but not acknowledged, which leads to the broker filling up and the whole thing crashing down, as I share the same rabbit with the rest of openstack
14:13 <gordc> cristicalin: yes, it's been active for a few cycles now. how many pipeline_processing_queues do you have set?
14:14 <cristicalin> gordc, I don't set pipeline_processing_queues, so it probably defaults to 10
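
(For context: the options under discussion live in the [notification] section of ceilometer.conf. The option names below are real; the values merely illustrate the setup described in this exchange, not recommendations.)

    [notification]
    # Turn on queue-based IPC so multiple notification agent workers
    # can coordinate pipeline transformations.
    workload_partitioning = True
    # Number of ceilometer-pipe-* IPC queues to shard work across;
    # 10 is the default gordc asks about.
    pipeline_processing_queues = 10
    # Number of notification agent workers (the value cristicalin runs).
    workers = 4
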
14:14 <cristicalin> the problem is the messages don't get ACKed
14:14 <cristicalin> so they linger in the queues; I see consumers on the queues and messages are delivered to the consumers, but the consumers don't ACK them
14:15 <gordc> cristicalin: do you see errors in the notification agent?
14:15 <cristicalin> what could be causing this?
14:15 <cristicalin> it complains about metrics out of order
14:15 <cristicalin> and about the broker blocking due to memory pressure
14:15 <gordc> oslo.messaging i believe defaults to ACK on completion (unless something changed in oslo.messaging)
14:15 <cristicalin> but outside of that, nothing of relevance
14:16 <cristicalin> hmm, let me check my version
14:16 <gordc> metrics out of order is fine (or a known thing). you can handle it a bit better by enabling batching.
14:17 <cristicalin> gordc, how do I do that?
14:17 <cristicalin> gordc, my oslo.messaging is 5.17.2
14:17 <gordc> https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L70 although it's enabled by default
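
(For reference, batching is controlled by two further [notification] options in ceilometer.conf. The option names are real; the values here are illustrative:)

    [notification]
    # Maximum number of samples a pipe listener gathers before
    # processing them as one batch; batching is active when > 1.
    batch_size = 100
    # Seconds to wait for a full batch before processing whatever
    # has arrived so far.
    batch_timeout = 5
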
14:19 <gordc> hmmm. i don't recall anything in oslo.messaging regarding ACK/REQUEUE changes
14:19 <gordc> what's the memory pressure error?
14:19 <cristicalin> gordc, "The broker has blocked the connection: low on memory"
14:19 <cristicalin> my brokers have 8GB each
14:21 <cristicalin> gordc, looking at that source file, maybe this could be a problem? https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L51
14:22 <gordc> did you change the default?
14:22 <cristicalin> no
14:23 <cristicalin> my [notification] section only sets workload_partitioning to True and workers to 4
14:23 <gordc> then it should ACK regardless.
14:23 <gordc> that seems fine
14:23 <gordc> i think the issue is rabbit can't recover at the current memory load.
14:24 <gordc> if you want, you could disable the collector if you have it, and configure the notification agent to publish straight to the db (default behaviour in pike, not sure about the default in ocata)
14:25 <cristicalin> that's what I did
14:25 <cristicalin> I only run the notification agent at the moment
14:25 <cristicalin> i publish to gnocchi:// and direct://?dispatcher=panko
14:26 <cristicalin> as panko:// is only supported in pike
14:26 <cristicalin> and I did not backport the dispatcher patch from pike
14:26 <gordc> yep. hmm... i'm guessing it's related to the memory issue. (i haven't really seen your problem so i can only guess)
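
(The collector-less setup cristicalin describes corresponds roughly to a pipeline.yaml sink like the one below. The two publisher URLs are the ones quoted above; the sink name and surrounding structure follow the stock pipeline file and are illustrative:)

    sinks:
        - name: meter_sink
          transformers:
          publishers:
              - gnocchi://
              - direct://?dispatcher=panko
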
14:28 *** lhx_ has joined #openstack-telemetry
14:30 <cristicalin> I can try draining the queues; I've set a low TTL (5 minutes) so they should drain quite fast
14:31 *** donghao has quit IRC
14:31 *** vint_bra has joined #openstack-telemetry
14:34 <cristicalin> gordc, is workload partitioning needed in a multi-agent setup though?
14:36 *** lhx__ has joined #openstack-telemetry
14:36 <gordc> cristicalin: it's only needed if you have multiple notification agents AND you have transformations in your pipeline
14:36 *** lhx_ has quit IRC
14:37 <cristicalin> i guess the cpu_util transformation counts, right?
14:38 <gordc> yeah
14:38 <gordc> but if you're using gnocchi, you can in theory compute that in post (if you're using master)
14:38 <cristicalin> so for these messages the emitter and consumer are both the notification agent
14:39 <gordc> cpu_util?
14:39 <cristicalin> i'm using gnocchi stable/4.0
14:39 *** aagate has joined #openstack-telemetry
14:39 <cristicalin> I mean all messages passing through the queue for ceilometer.*sample
14:39 <gordc> i don't think the rate functionality is in v4.0... sileht added it in, but i see he's not in this channel anymore
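
(The cpu_util transformation referred to here is the stock rate_of_change transformer from ceilometer's default pipeline.yaml, shown below. It is exactly the kind of stateful step that requires workload partitioning, since consecutive cpu samples for a resource must reach the same agent for the rate to be computed. The publisher line is adapted to cristicalin's gnocchi setup:)

    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
      publishers:
          - gnocchi://
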
14:41 <cristicalin> so I'm stuck with workload partitioning for now
14:41 <gordc> the pipeline-* queues are IPC for the notification agents
14:41 <gordc> i'm not sure what the polling agents use, but i think they also push to *.sample queues, which the notification agent picks up and redirects
14:42 <cristicalin> redirects to where?
14:42 *** spilla has joined #openstack-telemetry
14:42 <gordc> gnocchi/panko in your case... whatever is in your publisher
14:42 <cristicalin> oh
14:43 <cristicalin> the publishing part is done by the notification agent, no?
14:43 <gordc> right
14:44 <cristicalin> i see a preference for publishing over consumption (on the queue). is there a way to tip the scales and force consumption?
14:45 <gordc> sorry, i don't really understand what you typed
14:47 *** iranzo has quit IRC
14:47 <cristicalin> my understanding is that the notification agent processes the notification.info queue, splits up the work into the *-pipe-* queues, and then also consumes those queues
14:47 <cristicalin> is this correct?
14:48 <gordc> right
14:49 *** iranzo has joined #openstack-telemetry
14:49 *** iranzo has quit IRC
14:49 *** iranzo has joined #openstack-telemetry
14:51 <cristicalin> so this means that the agent acts as the publisher (emitter) to the *-pipe-* queues but also as the consumer
14:51 <cristicalin> is this behavior split between its workers?
14:52 <cristicalin> how can I make it allocate more resources to consuming the *-pipe-* queues so it catches up with the workload?
14:54 <gordc> in your case, you have 4 consumers listening to the notifications.info queue. all 4 notification agents can also publish to all pipe-* queues. each pipe-* queue is listened to by one of the 4 notification agents.
14:55 *** yprokule has quit IRC
14:56 <gordc> you can't have more than one notification agent listening to a pipe-* queue, because that's the purpose of the multiple pipes... it groups related points into a pipe so any transformation that needs to be done is guaranteed to have access to the right data.
14:56 <gordc> if more than one consumer listened to a pipe, agentA might handle a message that agentB needs
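
(The grouping gordc describes is essentially sharding by resource: all samples for one resource hash to the same pipe queue, so the single agent listening there always sees the complete series. A minimal Python sketch of the idea, purely illustrative and not ceilometer's actual implementation:)

    import hashlib

    PIPELINE_PROCESSING_QUEUES = 10  # the [notification] option discussed above

    def pipe_queue_for(resource_id):
        """Pick the IPC queue for a sample so that every sample for a
        given resource lands on the same queue, and thus the same agent."""
        digest = hashlib.md5(resource_id.encode()).hexdigest()
        shard = int(digest, 16) % PIPELINE_PROCESSING_QUEUES
        return 'ceilometer-pipe-%d' % shard

    # Samples for the same instance always map to the same queue:
    print(pipe_queue_for('instance-0001'))  # e.g. ceilometer-pipe-3
    print(pipe_queue_for('instance-0001'))  # same queue every time
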
14:58 *** jobewan is now known as jobewan_away
15:00 <cristicalin> ok, so when does the agent decide to consume the queue?
15:01 <cristicalin> I mean fetch a message and (eventually) ACK it
15:01 <gordc> it works the same as all other work queues. it just polls the queue for work.
15:02 <gordc> the main queue listener will poll and take action right away on individual messages.
15:02 <gordc> the pipe listeners will poll and, depending on your batch settings, wait x time or for y messages before proceeding
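
(The primitive behind that wait-for-x-time-or-y-messages behaviour is oslo.messaging's batch notification listener, which the ceilometer code linked earlier wraps. get_batch_notification_listener is a real oslo.messaging API; the endpoint, topic, and handler below are simplified for illustration:)

    from oslo_config import cfg
    import oslo_messaging

    def process(payload):
        # Stand-in for the real pipeline work (transform + publish).
        print(payload)

    class PipeEndpoint(object):
        """Batch endpoints receive a list of messages rather than one."""
        def sample(self, messages):
            # Invoked with up to batch_size messages, or with whatever
            # arrived within batch_timeout seconds. Returning normally
            # ACKs them; messages that are not handled stay un-ACKed.
            for msg in messages:
                process(msg['payload'])

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='ceilometer-pipe-0')]
    listener = oslo_messaging.get_batch_notification_listener(
        transport, targets, [PipeEndpoint()],
        batch_size=100, batch_timeout=5)
    listener.start()
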
15:04 <cristicalin> ok, I understand
15:05 <gordc> yeah, it's not pretty :(. probably something more elegant could be done with kafka... or with gnocchi master
15:08 *** cristicalin has quit IRC
15:10 *** cristicalin has joined #openstack-telemetry
15:13 *** cristicalin has quit IRC
15:22 *** iranzo has quit IRC
15:32 *** pcaruana has quit IRC
15:40 *** edmondsw has quit IRC
15:40 *** jobewan_away is now known as jobewan
15:41 *** ddyer has joined #openstack-telemetry
15:51 *** edmondsw has joined #openstack-telemetry
15:51 *** lhx__ has quit IRC
15:52 *** lhx__ has joined #openstack-telemetry
15:56 *** edmondsw has quit IRC
16:10 *** lhx_ has joined #openstack-telemetry
16:10 *** lhx__ has quit IRC
16:18 *** hoonetorg has quit IRC
16:24 *** jobewan has quit IRC
16:24 *** lhx_ has quit IRC
16:36 *** flg_ has joined #openstack-telemetry
17:01 *** dave-mcc_ is now known as dave-mccowan
17:11 *** psachin has joined #openstack-telemetry
17:12 *** rcernin has quit IRC
17:14 *** tesseract has quit IRC
17:42 *** edmondsw has joined #openstack-telemetry
17:43 *** Tom has joined #openstack-telemetry
17:44 *** edmondsw_ has joined #openstack-telemetry
17:46 *** edmondsw has quit IRC
17:47 *** Tom has quit IRC
17:48 *** edmondsw_ has quit IRC
17:51 *** efoley has quit IRC
17:52 *** psachin has quit IRC
17:52 *** links has quit IRC
18:00 *** rwsu has quit IRC
18:07 *** dave-mccowan has quit IRC
18:09 *** Tom has joined #openstack-telemetry
18:12 *** edmondsw has joined #openstack-telemetry
18:15 *** dave-mccowan has joined #openstack-telemetry
18:16 *** rwsu has joined #openstack-telemetry
18:17 *** edmondsw has quit IRC
18:17 *** edmondsw has joined #openstack-telemetry
18:21 *** edmondsw has quit IRC
18:25 *** edmondsw has joined #openstack-telemetry
18:29 *** edmondsw has quit IRC
18:35 *** flg_ has quit IRC
18:37 *** ddyer has quit IRC
18:39 *** flg_ has joined #openstack-telemetry
19:01 *** rcernin has joined #openstack-telemetry
19:06 *** ddyer has joined #openstack-telemetry
19:07 *** edmondsw has joined #openstack-telemetry
19:11 *** iranzo has joined #openstack-telemetry
19:11 *** iranzo has joined #openstack-telemetry
19:12 *** edmondsw has quit IRC
19:12 *** spilla has quit IRC
19:14 *** hoonetorg has joined #openstack-telemetry
19:19 *** iranzo has quit IRC
19:22 *** Tom has quit IRC
19:28 *** edmondsw has joined #openstack-telemetry
19:32 *** edmondsw has quit IRC
20:09 *** cristicalin has joined #openstack-telemetry
20:24 <cristicalin> gordc, are you still around?
20:24 <cristicalin> it seems that in ocata, at least, batching is broken
20:24 <cristicalin> my issue earlier with un-ACKed messages in the queue went away after I set batch_size=1
20:24 <cristicalin> as the batches are processed by a single thread
20:25 <gordc> yep
20:25 <gordc> hmmm... i'm not entirely sure batch_size actually changes the threads
20:25 <cristicalin> https://github.com/openstack/ceilometer/blob/stable/ocata/ceilometer/notification.py#L293-L296
20:25 <cristicalin> that's your comment there
20:26 <cristicalin> anyway, with batch_size=1 my test system is able to chug along with metrics every minute
20:26 <gordc> yeah. but that code says if batch_size == 1, don't override the thread count
20:26 <cristicalin> now ... production ... just 100 times larger
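
(The workaround cristicalin landed on, as a ceilometer.conf fragment. The option names are real; whether this holds up at production scale is exactly the open question in this exchange:)

    [notification]
    workload_partitioning = True
    workers = 4
    # Effectively disables batching on the pipe listeners. This made
    # the un-ACKed messages drain, at the cost of per-message overhead.
    batch_size = 1
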
20:27 <gordc> there might be an issue in oslo.messaging... might need to dig through the bugs there.
20:27 <gordc> when your system was crashing, which queues were large?
20:28 <gordc> the pipe-* queues or the notifications.* queue?
20:29 <cristicalin> the ceilometer-pipe-*
20:29 <cristicalin> actually they accrued about 8 times the size of the notification.* queue
20:29 <cristicalin> which was completely puzzling
20:30 <cristicalin> judging by the code, get_batch_notification_listener is just a proxy to the oslo.messaging part
20:30 <cristicalin> so yes, probably an oslo issue at this point
20:30 <cristicalin> I'll dig through their releases for ocata
20:31 <gordc> you're using the threading executor in oslo.messaging?
20:32 <gordc> nm. i guess that's not actually configurable anyway
20:32 <cristicalin> erm, I did not set any config related to that, so probably the default
20:32 <cristicalin> it seems hardcoded to threading
20:34 <cristicalin> one other issue: I'm running the rest of the cloud on mitaka
20:34 <cristicalin> just the telemetry part is ocata
20:34 <cristicalin> I spotted a problem with the new collection method on the compute nodes
20:34 <cristicalin> the one that queries libvirt for the metadata instead of nova-api
20:35 <cristicalin> not all instances have a user_id and project_id; can't really figure out why
20:35 <cristicalin> shouldn't the code fall back to the old method if the new one doesn't work?
20:36 <cristicalin> today I just hacked the code to return 'UNKNOWN' for the missing part, but I guess that breaks somewhere down the line
20:36 <cristicalin> and switching back to naive is not really something I'm looking forward to; one of the reasons for my upgrade to ocata was to alleviate the load on nova-api
20:37 <gordc> i don't know how viable it is to fall back (not sure what part of the process it fails at)
20:37 <gordc> but yeah, you'll need to switch back to naive if it doesn't work in mitaka
20:38 <gordc> you're welcome to contribute fallback code (not sure we have the resources to do that, to be honest)
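
(One possible shape for the fallback being discussed: take user_id/project_id from the libvirt instance metadata when present, and fall back to a per-instance nova-api lookup only when they are missing. A sketch only; get_libvirt_metadata and nova_client are hypothetical stand-ins, not ceilometer's actual helpers:)

    def instance_ownership(instance_uuid):
        """Prefer libvirt's nova metadata (cheap and local); query
        nova-api only for instances where the fields are absent."""
        meta = get_libvirt_metadata(instance_uuid)  # hypothetical helper
        owner = meta.get('owner', {})
        user_id = owner.get('user_id')
        project_id = owner.get('project_id')
        if user_id and project_id:
            return user_id, project_id
        # Fallback: one nova-api call per affected instance, far less
        # load than the blanket polling of the "naive" discovery method.
        server = nova_client.servers.get(instance_uuid)  # hypothetical client
        return server.user_id, server.tenant_id
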
20:38 *** ddyer has quit IRC
20:39 <cristicalin> I'll try to propose a patch; I guess my quick and dirty solution is not something that would be accepted upstream
20:39 *** ddyer has joined #openstack-telemetry
20:40 <gordc> probably not... or just don't let anyone know it's a quick/dirty solution.
20:40 <cristicalin> I'll look into adding the fallback, as I guess that should still decrease the load on nova-api
20:40 <cristicalin> also not sure why nova instances would be missing that info; I haven't yet identified a pattern in which instances are missing it
20:41 <gordc> i think when we implemented it, all the information we needed had been there for some time. i guess we were wrong
20:42 <cristicalin> let's poke #openstack-nova, maybe they know
21:02 <cristicalin> tracked that capability down to something nova added in juno, so in theory anything created post-juno should have the user and project id
21:03 <cristicalin> in my case this env started with juno, so ... probably a bug in there
21:04 <cristicalin> maybe your assumption was valid that the data should always be in there
21:04 <gordc> cristicalin: cool cool. good to know in theory we support "from juno"
21:05 <gordc> i guess now that mitaka is eol, we just need to ensure everything newer works. :p
21:05 <cristicalin> I'll look into the nova problem; if it turns out there is actually a valid case for that info to be missing, i'll propose the fallback code
21:05 <gordc> cristicalin: kk, works for me
21:08 *** thorst_afk has quit IRC
21:20 *** dave-mccowan has quit IRC
21:27 *** ddyer has quit IRC
21:29 *** leitan has quit IRC
21:30 *** prodriguez83 has joined #openstack-telemetry
21:31 *** catintheroof has quit IRC
21:34 *** thorst_afk has joined #openstack-telemetry
21:36 *** thorst_afk has quit IRC
21:37 *** flg_ has quit IRC
21:39 *** thorst_afk has joined #openstack-telemetry
21:56 *** ddyer has joined #openstack-telemetry
22:10 *** prodriguez83 has quit IRC
22:12 *** ddyer has quit IRC
22:13 *** pradk has quit IRC
22:16 *** ddyer has joined #openstack-telemetry
22:19 *** cristicalin has quit IRC
22:30 *** edmondsw has joined #openstack-telemetry
22:30 *** rcernin has quit IRC
22:31 *** gordc has quit IRC
23:00 *** ddyer has quit IRC
23:39 *** thorst_afk has quit IRC
23:48 *** edmondsw has quit IRC
23:55 *** catintheroof has joined #openstack-telemetry
23:59 *** edmondsw has joined #openstack-telemetry
