Thursday, 2021-07-01

*** ysandeep|away is now known as ysandeep01:09
*** fzzf is now known as Guest107102:57
*** fzzf1 is now known as fzzf02:57
*** ykarel|away is now known as ykarel04:39
*** bhagyashris is now known as bhagyashris|ruck05:22
*** chandankumar is now known as chkumar|rover05:31
fricklerbhagyashris|ruck: chkumar|rover: apart from all the nick changes being pretty noisy, can you explain to a non-native speaker what this ruck/rover thing is supposed to mean?06:04
bhagyashris|ruckfrickler, hey you will get info here https://docs.openstack.org/tripleo-docs/latest/ci/ruck_rover_primer.html06:07
bhagyashris|ruckthe major responsibilities are 1. ensuring gate queues are green to keep TripleO patches merging. 2. ensuring promotion jobs are green to keep TripleO up to date with the rest of OpenStack and everything else that isn’t TripleO! Target is bugs filed + escalated + fixed for promotion at least once a week.06:08
*** akekane_ is now known as abhishekk06:32
*** jpena|off is now known as jpena07:34
zbrclarkb: i think it does07:41
*** ysandeep is now known as ysandeep|lunch07:48
*** slaweq_ is now known as slaweq08:29
*** ykarel is now known as ykarel|lunch08:44
*** ysandeep|lunch is now known as ysandeep08:45
*** sshnaidm is now known as sshnaidm|afk09:24
*** Guest651 is now known as aluria09:26
*** ykarel|lunch is now known as ykarel09:59
*** sshnaidm|afk is now known as sshnaidm10:47
*** jpena is now known as jpena|lunch11:47
*** jpena|lunch is now known as jpena12:45
clarkbzbr: I'll start to wind down those processes after some breakfast. It is easy enough to turn them on again if we start to fall behind on the queue14:22
zbrokey14:22
*** ykarel is now known as ykarel|away14:25
elodilleshi, could someone approve this? https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/79723515:19
elodillesthe PTLs have approved already15:19
clarkbelodilles: done15:21
elodillesand as usual, I'm planning to run the script to delete some more eol'd branches (as there is another batch that could be deleted)15:21
elodillesclarkb: thanks!15:21
fungisounds good, thanks elodilles!15:28
opendevreviewMerged openstack/openstack-zuul-jobs master: Remove ocata from periodic job template of neutron and ceilometer  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/79723515:32
clarkb#status log Stopped log workers and logstash daemons on logstash-worker11-20 to collect up to date data on how many indexer workers are necessary15:39
opendevstatusclarkb: finished logging15:39
clarkbhttps://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1 we will want to monitor this set of metrics as well as what e-r reports its indexing delta is15:39
clarkbthis is a 50% reduction so we can do a binary search towards what seems necessary15:40
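The halving approach clarkb describes amounts to a binary search over the worker count. A minimal sketch of that idea, with a hypothetical keeps_up() probe standing in for "enable N workers and watch the backlog for a while":

    def minimum_workers(total, keeps_up):
        """Smallest worker count for which keeps_up(count) returns True."""
        low, high = 1, total          # the full fleet (e.g. 20) is known to keep up
        while low < high:
            mid = (low + high) // 2   # first probe: half the fleet, i.e. the 50% cut
            if keeps_up(mid):
                high = mid            # backlog stayed bounded: try fewer workers
            else:
                low = mid + 1         # queue grew without recovering: need more
        return low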
*** sshnaidm is now known as sshnaidm|afk15:46
wolsen[m]clarkb: I see you commented on https://review.opendev.org/c/openstack/charm-deployment-guide/+/798273 that a recheck won't help with the promotion failure. Do you think the best course of action is to submit another patch and let the next promotion effort push it up?15:49
clarkbwolsen[m]: that is usually what we recommend if it isn't too much hassle. Otherwise a zuul admin has to queue the jobs up again by hand15:50
wolsen[m]clarkb: ack, that's fairly straightforward and we can go with that route. Thanks for the confirmation :-) just wanted a sanity check before going down a futile path15:51
fungiwolsen[m]: yeah, if you don't have anything else worth approving soon, just let us know what needs to be run and i or others can take care of it15:51
wolsen[m]thx fungi, I'll circle back if we need it15:52
*** ysandeep is now known as ysandeep|dinner16:01
*** jpena is now known as jpena|off16:36
*** ysandeep|dinner is now known as ysandeep16:44
clarkbthe indexer queue has grown to about 1k entries. I'll continue to check it periodically. I don't think that is catastrophically high, but if it indicates the start of a "we can't keep up" trend then we should start the logstash and log worker processes again16:49
*** ysandeep is now known as ysandeep|out17:24
elodillesjust to have it here as well: these ocata branches were deleted: http://paste.openstack.org/show/807112/17:29
*** gfidente is now known as gfidente|afk18:09
clarkbthe indexing queue seems fairly stable at ~1.5k entries for the last little bit.18:49
clarkbindicating that at least so far this hasn't been a runaway backlog which is good18:49
clarkband now we are up to 3.5k :/19:57
melwittI didn't realize the indexer was falling behind so much :(  I have been using this page to see whether indexing was behind http://status.openstack.org/elastic-recheck/20:57
fungimelwitt: it's part of an experiment to see how many workers we need, sorry about that20:57
fungiwe're attempting to determine the number of them we can safely turn down without significant prolonged impact to indexing throughput, so as to reduce the maintenance burden and resource consumption however much we can20:58
melwittoh ok, so it's not that something is catastrophically wrong. that's good heh20:58
melwittmakes sense, thanks20:58
clarkbya not an emergency or anything20:59
clarkbwe did end up super behind for a bit because the cluster had crashed20:59
fungiwell, i mean, something is catastrophically wrong, we don't have sufficient people to upgrade and keep this system running long term and desperately need someone to build and run a replacement if it's still useful20:59
clarkbwe've also got that error where we have log entries from centuries in the future still happening so you have to look closely at e-r's graph page to see how actually up to date it is20:59
fungiyeah, a sane indexer implementation would discard loglines in the future probably21:00
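A minimal sketch of the safeguard fungi suggests, in Python rather than actual logstash configuration: discard (or re-stamp) any event whose parsed timestamp lies meaningfully in the future.

    from datetime import datetime, timedelta, timezone

    MAX_FUTURE_SKEW = timedelta(minutes=10)  # tolerate small clock drift

    def accept_event(parsed_timestamp):
        """Return True if the event's timestamp is plausible (not future-dated)."""
        now = datetime.now(timezone.utc)
        return parsed_timestamp <= now + MAX_FUTURE_SKEW

A line mis-parsed to the year 2831 would then be skipped or re-stamped with the ingest time instead of creating a bogus far-future index.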
clarkbwe are down to 2.5k files to index now and trending in the right direction21:02
clarkbit's possible that 50% is just enough for a normal day, as indicated by the short backlog when busy and then catching up later in the day. We can keep an eye on it for a few days before committing to that reduction in size21:02
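For watching the backlog over those next few days, the queue depth can also be polled programmatically. A hedged sketch using the gearman admin protocol's "status" command; the host and port are assumptions (4730 is gearman's default):

    import socket

    def gearman_queue_depth(host="localhost", port=4730):
        """Total jobs (queued + running) reported by gearman's 'status' command."""
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(b"status\n")
            data = b""
            while not data.endswith(b".\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
        total = 0
        for line in data.decode().splitlines():
            if line == ".":
                break
            # each line: <function>\t<total jobs>\t<running>\t<available workers>
            parts = line.split("\t")
            if len(parts) >= 2:
                total += int(parts[1])
        return total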
melwittso something/some service is logging future dates?21:02
melwittI can look into the indexer to discard loglines in the future21:02
clarkbmelwitt: either that or we are parsing something that looks like future dates improperly21:03
melwittack, ok21:03
clarkblet me see if I can convince elasticsearch to tell me what some of those are21:03
clarkb"message":"ubun7496ionic | 2021-01-15 12:52:25,572 zuul.Pipeline.tenant-one.post    DEBUG    Finished queue processor: post (changed: False)" in a job-output.txt file got parsed to "@timestamp":"2021-11-15T12:52:42.301Z"21:07
clarkbthat's not a great example because we cannot look at the original file, it's too old /me looks for a better one21:08
melwittah yeah, so it looks like you're right it's a parsing problem. that makes a lot more sense than something logging dates in the future 😝21:09
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7a0/795357/1/check/openstack-tox-py36/7a0cdaa/job-output.txt is one that caused it to parse a timestamp of "@timestamp":"2831-06-08T15:13:41.769Z" from "message":"ubuntu-biu-bionic |   warnings.77d41ber and system_scope:all) of 3at2h06-08 s6msg)"21:14
clarkbthat file is 132MB large21:14
clarkbI wonder if we're causing logstash some sort of buffer alignment issue when we jam it full of data like that21:15
clarkbalso why are ironic unittests exploding with data like that21:15
melwittthere are a ton of deprecation messages up in there21:16
melwitthm, it's the policy deprecation thing, that got fixed a long time ago though. hm21:17
clarkblooks like the change merged with that behavior too (not sure if that change introduced the behavior or not)21:17
clarkbthat is definitely something that ironic should be looking at cleaning up if it persists21:17
clarkbno one wants that much output to their console when running unittests21:17
clarkbOne thing we've done with Zuul is to only attach the extra debug strings (logs and other output) to the subunit stream when the test is failing21:18
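One way to get the behaviour clarkb describes with the testtools/fixtures stack OpenStack unit tests already use (an illustration, not necessarily how Zuul's own base test class is written):

    import fixtures
    import testtools
    from testtools.content import text_content

    class BaseTestCase(testtools.TestCase):
        def setUp(self):
            super().setUp()
            # Capture log output in memory instead of writing it to the console.
            self.logger = self.useFixture(fixtures.FakeLogger())
            # Attach the captured logs to the subunit result only on failure.
            self.addOnException(self._attach_logs)

        def _attach_logs(self, exc_info):
            self.addDetail('logging', text_content(self.logger.output))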
melwittyeah.. I'm looking at it, we had the same thing happen in nova and gmann fixed it a long time ago. I'm looking to see if it was something in nova only rather than a global change elsewhere21:18
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8f9/797055/21/check/openstack-tox-py38/8f938e9/ is an example from today and shows the issue seems to persist21:19
clarkb(that was a successful run too so not a weird error causing it)21:20
melwittguh, yeah it was only in nova https://review.opendev.org/c/openstack/nova/+/676670 I'll upload something similar for ironic21:20
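The shape of that fix, sketched and hedged: a test policy fixture that flips oslo.policy's suppress_deprecation_warnings flag on the enforcer. The policy module and its init()/reset() helpers are hypothetical stand-ins for whatever the project actually uses.

    import fixtures

    from myproject import policy  # hypothetical project policy module

    class PolicyFixture(fixtures.Fixture):
        def _setUp(self):
            enforcer = policy.init()
            # The per-rule deprecation warnings are operator-facing noise in
            # unit tests and can bloat job-output.txt enormously.
            enforcer.suppress_deprecation_warnings = True
            self.addCleanup(policy.reset)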
melwitt(if someone else hasn't already)21:20
clarkb`curl -X GET http://localhost:9200/_cat/indices?pretty=true` is how you get the full list of indices. They have timestamps in their names so you can pretty easily identify those that aren't from the last week21:26
clarkbThen `curl -X GET http://localhost:9200/logstash-2831.06.08/_search` dumps all records for a specific index21:27
clarkbI think the _cat/ url may need to be run on localhost only but the search url works anonymously from the internet if you replace localhost with elasticsearch02.openstack.org21:27
clarkbquick notes in case further debugging needs to happen and I'm not around21:27
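A rough Python equivalent of those curl commands, useful for spotting indices whose date component is implausibly far in the future; the host is an assumption and, as noted above, the _cat API may only be reachable from localhost:

    from datetime import datetime
    import requests

    ES = "http://localhost:9200"

    def suspicious_indices():
        names = requests.get(ES + "/_cat/indices?h=index").text.split()
        now = datetime.utcnow()
        for name in names:
            if not name.startswith("logstash-"):
                continue
            try:
                when = datetime.strptime(name, "logstash-%Y.%m.%d")
            except ValueError:
                continue
            if when.year > now.year:          # e.g. logstash-2831.06.08
                yield name

    for index in suspicious_indices():
        print(requests.get(ES + "/" + index + "/_search").json())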
*** rlandy is now known as rlandy|bbl22:25
opendevreviewJeremy Stanley proposed openstack/project-config master: Drop use of track-upstream  https://review.opendev.org/c/openstack/project-config/+/79912322:47
opendevreviewJeremy Stanley proposed openstack/project-config master: Drop use of track-upstream  https://review.opendev.org/c/openstack/project-config/+/79912323:14
gmannmelwitt: we could disable the warning on the oslo.policy side by default, as I can see that can happen in more projects while they implement the new RBAC23:45
fungithrottling warnings is a common approach in other applications23:46
gmannor at least for the time during migration to new RBAC23:46
fungilog it once, then shut up about it23:46
gmannwe can do that. Currently it is added per rule check, but we can just add a general warning at init time and only once.23:47
fungiunfortunately that requires some sort of global registry when the warning is coming from a separate module/library, to track which warnings have already been emitted somehow23:47
fungiahh, yeah sometimes you can find a compromise like that23:48
gmannbut that would help, as each test class initializes the oslo module23:48
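A sketch of the "warn once" idea fungi and gmann are discussing: a module-level registry of already-emitted warnings so that repeated enforcer initializations (one per test class) do not repeat the message. Illustration only, not oslo.policy's actual implementation.

    import warnings

    _already_warned = set()

    def warn_once(key, message):
        """Emit a deprecation warning for `key` at most once per process."""
        if key in _already_warned:
            return
        _already_warned.add(key)
        warnings.warn(message, DeprecationWarning)

    # e.g. warn_once(rule_name, "Policy %s has a deprecated default" % rule_name)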
melwittgmann: yeah, that would probably make sense being that we're repeating the same thing per project23:48
melwittgmann: I proposed this for ironic, it's just the exact same thing you did in nova https://review.opendev.org/c/openstack/ironic/+/79912023:48
gmannlet me check oslo policy and see what we can do. definitely a ton of warnings for new RBAC is not going to help operators, so they are not meaningful23:49
gmann+123:49
gmannmelwitt: thanks23:49

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!