Wednesday, 2018-12-05

*** pabelanger has joined #openstack-infra00:00
pabelangerfungi: mnaser: thinking out loud about a 20 stack-patchseries for nova, and have idea how it would look, but if zuul knows the 20 patches are submitted together, maybe the relative priority also does. So, if there is the 20 stack nova patch, and 2nd, and 3rd nova patch behind it.  Then nodepool allocates nodes to 2nd and 3rd nova change before 4th patch in the nova 20 stack... if that makes sense00:06
clarkbhttp://logs.openstack.org/12/615612/3/gate/neutron-grenade/bcd2c51/logs/grenade.sh.txt.gz#_2018-12-04_23_01_10_429 the grenade job failed on ovs/q-agt timing out00:06
pabelangers/idea/no idea/00:06
pabelangerthat way, other users in nova also get feedback outside the mass patch series00:06
openstackgerritClark Boylan proposed openstack-infra/opendev-website master: Add some initial content thoughtso  https://review.openstack.org/62262400:13
clarkbthat is super rough00:13
clarkband I intend to take faq/q&a content from the email we sent and incorporate it in00:13
clarkbbut figured getting a draft or even an outline going would probably help get the ball rolling00:14
*** kjackal has quit IRC00:14
clarkbI'm going to have to pop out soonish as a contractor is coming over to tell me how much a new wall costs but will search out any feedback when I am able00:14
clarkbinteresting the tempest change failed on the same thing as grenade00:16
clarkbmnaser: ^ we likely have a systemic bug there since the change was from glance and not neutron00:16
clarkbhrm looks like both happened on bhs1 and the devstack run took about an hour before it failed00:18
clarkbfungi: amorin ^ maybe there is some other issue affecting bhs1?00:18
*** jcoufal has joined #openstack-infra00:30
*** gyee has quit IRC00:33
clarkblooking at dstat for the job there is a spike in writes to ~46MBps00:39
clarkbbut its pretty consistently closer to 1MBps for the job run00:39
clarkbthere is also a period of almost persistent 10% cpu wai which could be related (they do overlap somewhat)00:39
clarkbpossible we are still our own noisy neighbor here?00:40
clarkblooking at the devstack log that roughly correlates with when neutron mysql migrations were run00:42
clarkbmordred: perhaps this is a silly idea, but can we run mysql in unsafe mode where it is eatmydata-y?00:42
clarkbif we are our own noisy neighbor that sort of thing may help00:43
clarkbThe other thing is whether or not kvm is waiting for these writes to succeed before completing them in the VM00:43
clarkbwe turned that off in infracloud to get more io throughput00:43
clarkbamorin: ^ that last question is likely a question for you00:43
*** wolverineav has quit IRC00:45
clarkbya then the second 10% ish cpu wai block lines up with nova db migrations00:46
pabelangerclarkb: yah, I might end up trying eatmydata for some DIB testing I am doing, I'm seeing a large number of failures now with ovh00:47
*** wolverineav has joined #openstack-infra00:48
*** rlandy has quit IRC00:59
*** jcoufal has quit IRC01:00
*** sthussey has quit IRC01:01
clarkbI've launched clarkb-test nodes in vexxhost sjc1 and ovh bhs1 (and working on gra1) so far I've run sysbench on bhs1 and sjc1 and they look really similar01:05
clarkbboth about 40MB/sec and 2600 requests per second01:06
clarkbmakes me wonder if it's a bad or misconfigured hypervisor01:06
clarkbso some subset of the jobs hit it there01:06
clarkbansible facts don't seem to capture if we are emulated or virt01:08
clarkbbut the machines I've logged into are definitely kvm according to systemd-detect-virt01:08
clarkbamorin: fungi: d73a3a4c-b31c-41a7-b8a3-226cdae0d558 and 7bb83b73-e279-4c85-af31-f7870e2714fc are two bhs1 instances that exhibited the weird slowness. Maybe we can zero in on the hypervisor(s) that ran those and see if there is something wrong there? virt not enabled (so using qemu emulation), slow disk, unhappy disk, etc01:11
clarkbba25071f-edaf-417d-8564-fab75676116e is my test instance that seems to be fine in that region if you need to compare (or check if it is running on the same hypervisor)01:12
*** jamesmcarthur has joined #openstack-infra01:13
clarkbgra1 actually has much slower random reads and writes to disk according to sysbench. About 1/10 the others01:15
clarkblittle more than that, but not much01:15
clarkbbut we don't see the issues in gra1 right?01:15
clarkband now I must go sort out dinner, fungi ^ you may want to try and duplicate my results. Feel free to use the clarkb-test instances in those regions if you do so01:16
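For anyone trying to duplicate those numbers, a run along these lines is roughly what a sysbench fileio benchmark looks like; the exact flags and sizes clarkb used aren't in the log, so the values below are placeholders.

```python
#!/usr/bin/env python3
"""Rough sketch of a sysbench fileio benchmark like the one discussed above.

Assumes sysbench 1.0+ is installed on the test node; file size and runtime
are guesses, not the values actually used.
"""
import subprocess

COMMON = ["sysbench", "fileio", "--file-total-size=2G"]

def run(args):
    # Echo the command and let sysbench print its throughput/requests-per-second
    # summary so results can be compared across regions (bhs1, sjc1, gra1).
    print("+", " ".join(args))
    subprocess.run(args, check=True)

run(COMMON + ["prepare"])                                      # create the test files
run(COMMON + ["--file-test-mode=rndrw", "--time=60", "run"])   # 60s of random read/write
run(COMMON + ["cleanup"])                                      # remove the test files
```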
*** wolverineav has quit IRC01:24
fungiyeah, at least mining logstash for job timeouts i was seeing disproportionately more in bhs1 than graphene01:24
fungier, than gra101:25
fungisorry graphene, my tab key is next to my 101:25
*** wolverineav has joined #openstack-infra01:25
pabelangeryes, I can confirm, IO heavy jobs on ovh-bhs1 are timing out here01:26
*** wolverineav has quit IRC01:30
openstackgerritMerged openstack/os-testr master: Update the home-page URL  https://review.openstack.org/62242701:31
*** wolverineav has joined #openstack-infra01:31
*** studarus has joined #openstack-infra01:34
*** bobh has joined #openstack-infra01:35
*** hongbin has joined #openstack-infra01:36
fungimaybe we let amorin reproduce/investigate with load on it01:36
mordredclarkb: there's some stuff that could be done like eatmydata - also some my.cnf settings to turn down data durability - although it'll ultimately only usually affect fsync - if we're saturating throughput it wouldn't help a ton01:38
mordredbut it's totally worth trying eatmydata01:38
*** jamesdenton has joined #openstack-infra01:46
*** witek has quit IRC02:00
*** witek has joined #openstack-infra02:00
*** jamesmcarthur has quit IRC02:03
*** sthussey has joined #openstack-infra02:04
*** wolverineav has quit IRC02:07
*** dklyle has quit IRC02:12
*** dklyle has joined #openstack-infra02:13
*** mrsoul has joined #openstack-infra02:14
*** wolverineav has joined #openstack-infra02:14
openstackgerritMarcH proposed openstack-infra/git-review master: tests/__init__.py: ssh-keygen -m PEM for bouncycastle  https://review.openstack.org/62263602:18
clarkbfungi pabelanger more and more I'm suspecting a specific hypervisor02:20
clarkbsince sjc1 and bhs1 are basically the same for io in limited testing02:20
*** wolverineav has quit IRC02:21
*** eernst has joined #openstack-infra02:25
*** bobh has quit IRC02:33
*** studarus has quit IRC02:35
*** larainema has joined #openstack-infra02:37
mnaserfyi we do throttle iops per gb02:39
mnaser30 iops per gb so @ 80 gb volumes => 240 iops02:40
mnaserer02:40
mnaser2400.02:40
mnaserand 0.5MB/s per GB for SSD volumes, hence the 40MB/s02:41
mnaser(nice to know our qos works well, lol)02:41
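Spelled out, the arithmetic behind those numbers (using the per-GB ratios mnaser gives above) is just:

```python
volume_gb = 80
iops_limit = 30 * volume_gb       # 30 IOPS per GB  -> 2400 IOPS
mbps_limit = 0.5 * volume_gb      # 0.5 MB/s per GB -> 40.0 MB/s, matching the dstat spike
print(iops_limit, mbps_limit)     # 2400 40.0
```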
pabelangerclarkb: I guess there is no way to track the hypervisor from guest OS02:43
*** psachin has joined #openstack-infra02:44
mnaserpabelanger, clarkb: https://review.openstack.org/#/c/577933/ there is and i tried :D02:50
mnaserbut you can check nova show and look at hostId from the API02:50
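A minimal sketch of doing that with openstacksdk, assuming a clouds.yaml entry for the region; hostId is an opaque per-project hash, so it only shows whether two instances landed on the same hypervisor, not which one.

```python
#!/usr/bin/env python3
"""Sketch: compare hostId across the suspect instances and the test node.

Assumes an 'ovh-bhs1' entry in clouds.yaml; openstacksdk exposes the nova
hostId field as the host_id attribute on the server object.
"""
import openstack

conn = openstack.connect(cloud="ovh-bhs1")

servers = [
    "d73a3a4c-b31c-41a7-b8a3-226cdae0d558",  # slow instance
    "7bb83b73-e279-4c85-af31-f7870e2714fc",  # slow instance
    "ba25071f-edaf-417d-8564-fab75676116e",  # clarkb's test node, for comparison
]

for server_id in servers:
    server = conn.compute.get_server(server_id)
    print(server_id, server.host_id)
```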
*** imacdonn has quit IRC02:52
*** imacdonn has joined #openstack-infra02:53
*** jd_ has quit IRC02:54
*** bhavikdbavishi has joined #openstack-infra02:56
*** bhavikdbavishi has quit IRC03:01
*** jd_ has joined #openstack-infra03:02
*** hongbin has quit IRC03:07
*** auristor has quit IRC03:08
*** eernst has quit IRC03:09
*** eernst has joined #openstack-infra03:09
clarkbya unfortunately the nodes are gone by the time we can look03:11
clarkbso either we log that with nodepool or ask the cloud03:12
openstackgerritmelissaml proposed openstack/os-testr master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62269803:14
*** apetrich has quit IRC03:15
*** bobh has joined #openstack-infra03:18
*** bhavikdbavishi has joined #openstack-infra03:18
*** ykarel|away has joined #openstack-infra03:21
*** graphene has quit IRC03:23
*** ramishra has joined #openstack-infra03:23
*** agopi has joined #openstack-infra03:23
pabelangermnaser: it seems 577933 might just need documentation updates at this point, but not sure what other reviewers say03:32
*** auristor has joined #openstack-infra03:32
pabelangeragree it would be helpful from job POV to collect that info03:32
*** bobh has quit IRC03:36
*** yamamoto has joined #openstack-infra03:38
*** yamamoto has quit IRC03:42
*** bobh has joined #openstack-infra03:45
*** jamesmcarthur has joined #openstack-infra04:03
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210104:04
*** bobh has quit IRC04:04
*** wolverineav has joined #openstack-infra04:08
*** jamesmcarthur has quit IRC04:08
*** wolverineav has quit IRC04:12
*** wolverineav has joined #openstack-infra04:19
*** hongbin has joined #openstack-infra04:24
*** eernst has quit IRC04:26
*** yamamoto has joined #openstack-infra04:32
*** wolverineav has quit IRC04:37
*** janki has joined #openstack-infra04:40
*** sthussey has quit IRC04:51
*** mordred has quit IRC04:57
*** mordred has joined #openstack-infra04:57
*** hongbin has quit IRC05:19
*** yamamoto has quit IRC05:34
openstackgerritVieri proposed openstack/gertty master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62285005:37
*** ahosam has joined #openstack-infra05:37
*** stevebaker has quit IRC05:45
*** dmellado has quit IRC05:46
*** gouthamr has quit IRC05:46
openstackgerritMerged openstack-infra/nodepool master: Add cleanup routine to delete empty nodes  https://review.openstack.org/62261605:47
*** dmellado has joined #openstack-infra05:48
*** diablo_rojo has quit IRC05:51
*** stevebaker has joined #openstack-infra05:51
*** gouthamr has joined #openstack-infra05:53
*** yamamoto has joined #openstack-infra05:54
*** dmellado has quit IRC05:55
*** wolverineav has joined #openstack-infra05:56
*** dmellado has joined #openstack-infra05:57
*** stevebaker has quit IRC05:59
*** wolverineav has quit IRC06:00
*** dmellado has quit IRC06:02
*** stevebaker has joined #openstack-infra06:06
*** dmellado has joined #openstack-infra06:14
*** gouthamr has quit IRC06:15
*** stevebaker has quit IRC06:15
*** gouthamr has joined #openstack-infra06:18
*** stevebaker has joined #openstack-infra06:19
*** ykarel|away has quit IRC06:21
*** diablo_rojo has joined #openstack-infra06:24
*** stevebaker has quit IRC06:25
*** gouthamr has quit IRC06:29
*** stevebaker has joined #openstack-infra06:29
*** gouthamr has joined #openstack-infra06:32
*** stevebaker has quit IRC06:41
*** stevebaker has joined #openstack-infra06:42
*** ykarel|away has joined #openstack-infra06:45
*** stevebaker has quit IRC06:51
*** ykarel|away is now known as ykarel06:54
*** stevebaker has joined #openstack-infra06:55
*** kjackal has joined #openstack-infra06:57
*** gouthamr has quit IRC06:58
*** ralonsoh has joined #openstack-infra06:58
*** stevebaker has quit IRC07:03
*** stevebaker has joined #openstack-infra07:06
*** gouthamr has joined #openstack-infra07:07
*** bhavikdbavishi has quit IRC07:09
*** bhavikdbavishi1 has joined #openstack-infra07:09
*** pcaruana has joined #openstack-infra07:10
*** bhavikdbavishi1 is now known as bhavikdbavishi07:12
*** ahosam has quit IRC07:15
*** stevebaker has quit IRC07:15
openstackgerritQuique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217507:17
*** quiquell|off is now known as quiquell07:17
*** stevebaker has joined #openstack-infra07:19
*** takamatsu has joined #openstack-infra07:21
*** stevebaker has quit IRC07:27
*** florianf has joined #openstack-infra07:27
*** stevebaker has joined #openstack-infra07:30
*** yboaron has joined #openstack-infra07:33
*** stevebaker has quit IRC07:37
*** stevebaker has joined #openstack-infra07:40
*** dpawlik has joined #openstack-infra07:40
*** gouthamr has quit IRC07:40
*** apetrich has joined #openstack-infra07:42
*** gouthamr has joined #openstack-infra07:42
*** kjackal has quit IRC07:45
*** kjackal has joined #openstack-infra07:45
*** stevebaker has quit IRC07:46
*** slaweq has joined #openstack-infra07:48
*** quiquell is now known as quiquell|brb07:48
*** stevebaker has joined #openstack-infra07:50
amorinmordred: fungi clarkb I am on the host that was hosting d73a3a4c-b31c-41a7-b8a3-226cdae0d558 and 7bb83b73-e279-4c85-af31-f7870e2714fc07:54
amorinit seems to be slower than others07:54
amorindd zero on host itself is slower than on another07:54
amorinbut there are some instances on it07:54
amorinI will disable it to test without any load07:54
*** stevebaker has quit IRC07:55
*** stevebaker has joined #openstack-infra08:00
*** ykarel is now known as ykarel|lunch08:00
openstackgerritArnaud Morin proposed openstack-infra/project-config master: Reduce a little number of instances on BHS1  https://review.openstack.org/62287608:02
*** ahosam has joined #openstack-infra08:02
*** ginopc has joined #openstack-infra08:03
*** e0ne has joined #openstack-infra08:05
*** stevebaker has quit IRC08:05
AJaegeramorin: do you need this quickly? ^08:07
AJaegerinfra-root, ^08:08
*** stevebaker has joined #openstack-infra08:09
amorinAJaeger: nop08:11
amorinit can wait until the afternoon08:11
amorin(I am in europe tz)08:11
amorinI know that most of the guys are in US08:11
amorinit can wait08:11
frickleramorin: just an idea: are you using deadline of cfq scheduler on the hypervisors? we found that cfq can cause issues in an iops throttled environment and that effect increased a lot from 4.13 to 4.15 kernel08:14
fricklers/of/or/08:14
amorinI have no idea, but I can check08:14
*** stevebaker has quit IRC08:17
*** florianf has quit IRC08:19
*** priteau has joined #openstack-infra08:21
*** stevebaker has joined #openstack-infra08:21
amorinfrickler: we are using cfq on hypervisors08:21
*** quiquell|brb is now known as quiquell08:22
frickleramorin: o.k., so for our setup, we resolved the issue by changing to deadline.08:25
amorinI was told that for SSD disks, it's better to use noop instead08:25
amorinbut I'm not an expert on that part08:26
amorinI'll see if I can apply that on the whole aggregate for the OSF08:26
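For reference, the scheduler can be inspected and switched per block device through sysfs; a minimal sketch (device names and the available scheduler list depend on the kernel in use):

```python
#!/usr/bin/env python3
"""Sketch: show and optionally change a block device's I/O scheduler.

The sysfs file lists the available schedulers with the active one in
brackets, e.g. "noop [deadline] cfq"; writing a listed name switches it.
"""
import sys
from pathlib import Path

def scheduler_file(dev):
    return Path(f"/sys/block/{dev}/queue/scheduler")

def show(dev):
    print(dev, scheduler_file(dev).read_text().strip())

def set_scheduler(dev, name):
    scheduler_file(dev).write_text(name)  # requires root
    show(dev)

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    show(dev)
    # set_scheduler(dev, "deadline")  # uncomment to switch, as was done for BHS1
```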
frickleramorin: thx, I'll be offline for a bit, will check back later. your node reduction should merge any moment08:28
openstackgerritMerged openstack-infra/project-config master: Reduce a little number of instances on BHS1  https://review.openstack.org/62287608:30
*** stevebaker has quit IRC08:30
*** stevebaker has joined #openstack-infra08:31
*** gfidente has joined #openstack-infra08:31
*** hrubi has joined #openstack-infra08:33
*** bhavikdbavishi has quit IRC08:33
amorinok08:33
*** jpena|off is now known as jpena08:39
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210108:44
*** florianf_ has joined #openstack-infra08:44
*** tosky has joined #openstack-infra08:50
*** fresta has quit IRC08:50
*** fresta has joined #openstack-infra08:51
openstackgerritNatal Ngétal proposed openstack/diskimage-builder master: [Core] Change openstack-dev to openstack-discuss.  https://review.openstack.org/62289508:52
*** aojea has joined #openstack-infra08:54
*** ykarel|lunch is now known as ykarel09:00
*** shardy has joined #openstack-infra09:00
*** verdurin has quit IRC09:04
*** dpawlik has quit IRC09:04
*** jpich has joined #openstack-infra09:05
*** wolverineav has joined #openstack-infra09:05
*** dpawlik has joined #openstack-infra09:05
*** bhavikdbavishi has joined #openstack-infra09:06
*** bhavikdbavishi has quit IRC09:07
*** bhavikdbavishi has joined #openstack-infra09:07
*** verdurin has joined #openstack-infra09:07
*** wolverineav has quit IRC09:09
*** ccamacho has joined #openstack-infra09:20
*** zhangfei has joined #openstack-infra09:21
*** xek has joined #openstack-infra09:21
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210109:30
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290609:30
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290609:32
*** zhangfei has quit IRC09:34
*** rcernin has quit IRC09:38
*** agopi is now known as agopi|brb09:40
*** bhavikdbavishi has quit IRC09:40
chandan_kumarhello09:44
chandan_kumarIs there a way to set to set basepython=python3 when in zuul job run or post_run is called to run a ansible playbook?09:45
*** larainema has quit IRC09:47
*** zhangfei has joined #openstack-infra09:47
*** derekh has joined #openstack-infra09:49
*** witek has quit IRC09:50
*** zhangfei has quit IRC09:50
*** zhangfei has joined #openstack-infra10:09
*** diablo_rojo has quit IRC10:12
amorinAJaeger frickler, I configured the whole OSF aggregate to deadline instead of CFQ on hypervisors in BHS1, we'll see if it's giving better results on your side10:17
*** ahosam has quit IRC10:20
*** quiquell is now known as quiquell|brb10:21
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size  https://review.openstack.org/62201010:22
*** agopi|brb is now known as agopi10:29
sshnaidmfungi, clarkb not sure what happened, but my email in gerrit turned out to be "unverified" and I can't submit patches10:42
*** alexchadin has joined #openstack-infra10:49
*** dtantsur|afk is now known as dtantsur10:50
*** quiquell|brb is now known as quiquell10:52
stephenfinWhich repo configures the projects that the openstackgerrit IRC bot monitors?10:55
*** graphene has joined #openstack-infra10:57
openstackgerritMerged openstack-infra/git-review master: Use six for cross python compatibility  https://review.openstack.org/61668810:57
*** bhavikdbavishi has joined #openstack-infra10:58
*** bhavikdbavishi has quit IRC11:09
*** bhavikdbavishi has joined #openstack-infra11:09
*** xek has quit IRC11:12
openstackgerritMerged openstack-infra/git-review master: Avoid UnicodeEncodeError on python 2  https://review.openstack.org/58353511:15
*** bhavikdbavishi has quit IRC11:16
*** bhavikdbavishi has joined #openstack-infra11:16
*** bhavikdbavishi has quit IRC11:20
*** zhangfei has quit IRC11:22
*** xek has joined #openstack-infra11:26
cmurphystephenfin: http://git.openstack.org/cgit/openstack-infra/project-config/tree/gerritbot/channels.yaml11:27
stephenfinta11:27
*** tpsilva has joined #openstack-infra11:31
*** xek has quit IRC11:38
fungisshnaidm: when was the last time it worked? could you have switched accounts when we deactivated one of your duplicate gerrit accounts a couple of weeks ago?11:50
*** ahosam has joined #openstack-infra11:50
fungisshnaidm: is it your pullusum@ or einarum@ address?11:53
*** yamamoto has quit IRC11:53
*** yamamoto has joined #openstack-infra11:53
*** yamamoto has quit IRC11:57
*** bhavikdbavishi has joined #openstack-infra12:02
openstackgerritMerged openstack-infra/zuul master: Fix "reverse" Depends-On detection with new Gerrit URL schema  https://review.openstack.org/62083812:04
sshnaidmfungi, it started today, yesterday I submitted patches12:04
sshnaidmfungi, I had another email there "sshnaidm@redhat.com" and set it as "preferred"12:05
alexchadinhi, where can I find some grenade job examples which were adapted for zuul?12:05
openstackgerritBrendan proposed openstack-infra/zuul master: Fix urllib imports in Gerrit HTTP form auth code  https://review.openstack.org/62294212:05
sshnaidmfungi, and now it seems like gerrit removed it, I'm trying to add it again, but no verification mail so far..12:05
fungisshnaidm: indeed, i can't find that address associated with any account in gerrit's database12:06
fungii'll see if i can tell when it sent verification e-mails for it12:06
fungisshnaidm: i see it sent messages to that address as recently as 11:00:48 utc, so barely an hour ago and at least 15 minutes after you pinged me in here12:11
fungiwas that when you attempted to re-add the address?12:11
*** pbourke has quit IRC12:14
*** kashyap has joined #openstack-infra12:15
sshnaidmfungi, yeah, I think so12:25
*** jpena is now known as jpena|lunch12:25
*** ahosam has quit IRC12:25
sshnaidmfungi, found this mail finally12:26
sshnaidmfungi, yay! can submit patches again :)12:27
*** e0ne has quit IRC12:27
*** e0ne has joined #openstack-infra12:29
fungisshnaidm: great! glad nothing seems to be broken on our end anyway12:39
*** pbourke has joined #openstack-infra12:40
kashyap[OT] Some folks here might appreciate this: https://www.qemu-advent-calendar.org/12:40
kashyapIf you're wondering WTH it is:12:41
kashyap[quote]12:41
kashyapThe QEMU Advent Calendar 2018 features a QEMU disk image each day of December until Christmas. Each day a new package becomes available for download [...]  The disk images contain interesting operating systems and software that run under the QEMU emulator. Some of them are well-known or not-so-well-known operating systems, old and new, others are custom demos and neat algorithms.12:41
kashyap[/quote]12:41
fungii recall they've done that in previous years too12:41
fungiit's neat12:41
kashyapfungi: We didn't do it last year :-)12:41
kashyapLast time was in 2016.  (And before that was in 2014)12:41
fungiin even years then? ;)12:42
kashyapHeh, yeah12:45
kashyapIt's just too much damn work.12:45
kashyapThis year, I only part-volunteered; in 2016 I spent a lot more time on it.12:46
*** ramishra has quit IRC12:51
*** yamamoto has joined #openstack-infra12:54
*** kjackal has quit IRC12:54
*** kjackal has joined #openstack-infra12:55
*** bobh has joined #openstack-infra12:58
*** kjackal has quit IRC12:59
*** aojea has quit IRC13:01
*** ykarel is now known as ykarel|afk13:03
*** kjackal has joined #openstack-infra13:04
*** ramishra has joined #openstack-infra13:05
*** yamamoto has quit IRC13:06
*** janki has quit IRC13:08
*** ykarel|afk has quit IRC13:10
*** bobh has quit IRC13:22
*** udesale has joined #openstack-infra13:26
*** jpena|lunch is now known as jpena13:33
*** ykarel|afk has joined #openstack-infra13:35
*** rlandy has joined #openstack-infra13:36
*** ykarel|afk is now known as ykarel13:36
*** dtantsur is now known as dtantsur|brb13:39
mordredmorning fungi - how's things?13:40
*** dpawlik has quit IRC13:41
*** dpawlik has joined #openstack-infra13:44
fungii have contractors ripping out and rebuilding my previously-flooded downstairs entry (finally)13:44
fungiit's sort of like my usual industrial music but with less shouting13:44
*** jcoufal has joined #openstack-infra13:44
mordredhrm. maybe you could figure out some way to get the contractors to yell more?13:45
*** agopi has quit IRC13:45
fungii could flip this breaker back on, i suppose13:45
*** jcoufal_ has joined #openstack-infra13:46
openstackgerritMonty Taylor proposed openstack-infra/system-config master: Add a script to generate the static inventory  https://review.openstack.org/62296413:47
mordredfungi: that might just result in muffled thuds and then less noise13:48
fungifair point13:48
*** jcoufal has quit IRC13:48
mordredfungi: there's a stab at an inventory generation script. in writing the commit message for it, it occurred to me that in most cases it should be completely unneeded, as all somebody needs to do is add some ips to a yaml file after running launch-node13:49
*** dpawlik has quit IRC13:49
mordredfungi: maybe we should just have launch-node print a little yaml snippet that could be copy-pasted into the inventory file?13:49
*** agopi has joined #openstack-infra13:49
*** priteau has quit IRC13:50
*** dpawlik has joined #openstack-infra13:50
*** mriedem has joined #openstack-infra13:52
funginot a bad idea. we do something similar for dns currently and will likely be needing to print a snippet to add to a commit to one of our zone repos soon13:53
fungimaybe both can be wrapped up together13:54
fungi(i mean, snippets for two commits since they go in different repos, but the same routine could spit out both)13:54
*** jamesmcarthur has joined #openstack-infra13:55
fungithat also gives us a start on having automation directly propose those patches in the future, should we wish it13:55
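A sketch of what the "print a snippet" idea could amount to; the inventory keys and zone-file layout below are assumptions for illustration, not the actual system-config formats.

```python
#!/usr/bin/env python3
"""Sketch: emit copy-pastable inventory and DNS snippets for a new node.

Hypothetical helper for the idea discussed above; field names and record
layout are illustrative guesses, not the real repo formats.
"""

def inventory_snippet(fqdn, ipv4, ipv6):
    return (
        f"{fqdn}:\n"
        f"  ansible_host: {ipv4}\n"
        f"  public_v4: {ipv4}\n"
        f"  public_v6: {ipv6}\n"
    )

def dns_snippet(fqdn, ipv4, ipv6, ttl=300):
    short = fqdn.split(".")[0]
    return (
        f"{short}  {ttl}  IN  A     {ipv4}\n"
        f"{short}  {ttl}  IN  AAAA  {ipv6}\n"
    )

if __name__ == "__main__":
    args = ("example01.opendev.org", "203.0.113.10", "2001:db8::10")  # placeholder values
    print("# paste into the static inventory:")
    print(inventory_snippet(*args))
    print("# paste into the zone repo:")
    print(dns_snippet(*args))
```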
*** sthussey has joined #openstack-infra14:00
*** kgiusti has joined #openstack-infra14:03
mordred++14:04
*** priteau has joined #openstack-infra14:05
*** florianf_ is now known as florianf14:05
openstackgerritMatt Riedemann proposed openstack-infra/elastic-recheck master: Add query for n-api/g-api startup timeout bug 1806912  https://review.openstack.org/62296614:06
openstackbug 1806912 in OpenStack-Gate "devstack timeout because n-api/g-api takes longer than 60 seconds to start" [Undecided,Confirmed] https://launchpad.net/bugs/180691214:06
*** sshnaidm has quit IRC14:06
*** quiquell is now known as quiquell|off14:11
*** sshnaidm has joined #openstack-infra14:14
mordredinfra-root: I'm afk for the next few hours - giving a talk about zuul today14:24
fungimordred: ooh! g'luck!14:25
pabelanger+114:25
mordredthanks! this one will be fun - these humans have zero background in openstack at all, so it's a complete blank canvas (I'm sure I'll be completely forgetting some important context :) )14:26
*** psachin has quit IRC14:27
fungithey have a background in ci systems at least?14:27
mordredwho knows!14:27
fungisounds like it'll be a blast14:28
fungi"openstack: sort of like aws without selling your soul"14:29
mordredwait - I didn't have to sell my soul?14:29
mordredI knew I was doing something wrong14:29
fungii can get you a receipt14:29
*** janki has joined #openstack-infra14:33
openstackgerritMerged openstack-infra/elastic-recheck master: Add query for n-api/g-api startup timeout bug 1806912  https://review.openstack.org/62296614:38
openstackbug 1806912 in OpenStack-Gate "devstack timeout because n-api/g-api takes longer than 60 seconds to start" [Undecided,Confirmed] https://launchpad.net/bugs/180691214:38
mriedemovh-bhs1 nodes must be slow14:39
mriedemseeing lots of slow-node related timeout failures in e-r on those nodes14:39
*** jamesmcarthur has quit IRC14:45
*** boden has joined #openstack-infra14:48
*** eharney has joined #openstack-infra14:48
fungimriedem: yes, we think it's disk write performance, amorin is attempting to troubleshoot14:50
mriedemok14:50
fungiit may be just some of the hosts in our dedicated aggregate in that region14:50
fungiwhich is making it tough to pin down14:51
pabelangermriedem: fungi: last evening mnaser linked https://review.openstack.org/577933/ as maybe a way we could help track hostid from the guest VM when clarkb was trying to get more info from jobs.14:53
openstackgerritStephen Finucane proposed openstack-infra/project-config master: Remove openstack/osc-placement from #openstack-nova  https://review.openstack.org/62298714:54
mriedempabelanger: no updates since june so i forgot about that one14:55
*** anteaya has joined #openstack-infra14:55
mriedemprobably needs a helping hand14:55
fungiyeah, with that we could collect the hostid along with other instance metadata and even expose it as a column in logstash/kibana once our providers start supporting it14:56
pabelangeryah, would defer to mnaser, but looked like just needs doc updates14:56
*** zul has quit IRC14:57
*** sshnaidm has quit IRC14:57
fungigetting nodepool to request it from the api and pass that all the way through zuul to ansible somehow is probably possible but seems like it would be rather complicated to instrument14:57
*** quiquell|off has quit IRC14:59
pabelangerfungi: interesting idea, maybe something to ask about in #zuul.14:59
openstackgerritMerged openstack-infra/system-config master: Retire the interop-wg mailing list  https://review.openstack.org/61905615:10
*** lpetrut has joined #openstack-infra15:12
*** dtantsur|brb is now known as dtantsur15:13
*** sshnaidm has joined #openstack-infra15:18
*** hwoarang has joined #openstack-infra15:27
logan-reviews on https://review.openstack.org/#/q/starredby:logan2211%2540gmail.com+status:open+project:%255Eopenstack/openstack-ansible-.* will be appreciated15:30
logan-er wrong channel, sorry15:30
*** jamesmcarthur has joined #openstack-infra15:33
corvusfungi, pabelanger, mnaser: looks like we get the hostId from nova already in nodepool, we just don't do anything with it; it would be pretty easy to plumb that through to zuul15:34
fungioh, really?15:34
openstackgerritmelissaml proposed openstack/ansible-role-cloud-launcher master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62300815:34
*** zul has joined #openstack-infra15:34
fungicorvus: like, by returning it through gearman?15:35
corvusfungi: yeah, i *think* it should be there when we're done with server creation15:35
*** jamesmcarthur has quit IRC15:35
corvusfungi: via zookeeper (to scheduler) and gearman (to executor), yes15:35
*** jamesmcarthur has joined #openstack-infra15:35
*** sshnaidm is now known as sshnaidm|afk15:36
fungioh, right, i keep forgetting launcher<->scheduler communication is zk15:36
fungiyour definition of "pretty easy" differs a bit from mine ;)15:37
*** jesusaur has quit IRC15:37
corvusfungi: heh, it's 2 changes to 3 components, but it's basically just adding data to structures that already exist15:37
*** kashyap has left #openstack-infra15:38
pabelangercool, so patches welcome then :)15:40
pabelangermight look at it more this afternoon15:41
Linkidhi15:41
*** jesusaur has joined #openstack-infra15:42
LinkidI don't know if I'm on the right channel15:42
LinkidI would like some info about resources available at the OSF15:42
Linkidlike disk space15:43
Linkidbecause I would like to suggest a tool for hosting sonething and I don't know if it is possible15:43
Linkid(before I suggest something impossible)15:44
Linkid*something15:44
fungiLinkid: the openstack foundation doesn't maintain community services, but you've found the channel for a community of people who collaborate on providing services to people who work on building free/libre open source projects15:45
fungier, building services for15:46
fungiLinkid: what tool are you thinking about?15:46
Linkidfungi: I was thinking about installing peertube to host OpenStack Summit (among other) videos (in addition to Youtube)15:48
*** lpetrut has quit IRC15:49
Linkidand I would be happy to help :)15:49
openstackgerritStephen Finucane proposed openstack-infra/project-config master: Add openstack/os-api-ref to #openstack-doc  https://review.openstack.org/62301315:49
Linkidhttps://joinpeertube.org15:49
fungian interesting idea. i wasn't aware of peertube until just now15:50
fungicurious what drove them to choose the agpl15:50
Linkidthis is a libre project maintained by a French orga, and it is a really good alternative to youtube :)15:50
fungilooks like last year they switched their codebase from gplv3 to agplv3 but the commit doesn't indicate why15:51
LinkidIf there are enough resources, I would be happy to install it and to make scripts to add video metadata to it15:51
Linkidthey made a presentation at FOSDEM 2018, if you want15:52
Linkid(but it was in french, if I remember it well)15:52
fungii think we'd likely need to get some sort of copyright agreement in place with the openstack foundation since i'm not sure what the copyright situation is with the summit recordings15:52
Linkidah15:53
fungibut this is compelling since i discovered not long ago that people in mainland china can't watch our conference session recordings on youtube so there's already a need to host those videos in multiple places15:53
LinkidI thought summit recordings were the property of the foundation (or CC0)15:53
Linkid:)15:54
funginot to mention, it's always rubbed me the wrong way that we espouse free software at out conferences but then expect people who want to watch recordings of the sessions from them to do so through proprietary video hosting services15:55
fungier, at our conferences15:55
corvusthis sounds like a great idea; i agree we should clarify the licensing status (i really hope they are CC licensed, and if they aren't we should see about getting that changed)15:56
Linkidfungi:  yes, Framasoft (the orga which promotes peertube) thought the same ^^15:57
Linkid(about other conferences)15:57
corvusit can be run in docker: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/docker.md15:57
Linkiddo you want me to write a mail for the suggestion?15:57
Linkidcorvus: yep :)15:58
corvusLinkid: assuming we get all the pre-requisites worked out, we're working on moving our infrastructure to be mostly driven by ansible + containers, so that's what most of the work to get it running would entail.15:58
fungiLinkid: sure, starting a discussion on the openstack-infra@lists.openstack.org mailing list might be an easier way for us to also point osf staff who handle the event logistics and video production stuff involved15:59
*** jamesmcarthur has quit IRC15:59
corvusmaybe this could be an opendev branded service?  we could probably offload the current static video hosting we have for zuul-ci.org to it.16:00
pabelangerTIL: https://joinpeertube.org/ looks very cool16:00
pabelangercorvus: +116:00
Linkidfungi: ok :). I'll do it in 2 hours, when I'll be able to write the mail, then :)16:01
fungiLinkid: awesome, thanks! i'm digging into the copyright situation for summit session videos now, so hopefully will have a better answer about them by then16:01
*** slaweq has quit IRC16:02
*** haleyb has joined #openstack-infra16:12
openstackgerritJeremy Stanley proposed openstack-infra/zuul master: Add instructions for reporting vulnerabilities  https://review.openstack.org/55435216:12
*** jamesmcarthur has joined #openstack-infra16:13
*** graphene has quit IRC16:13
anteayaLinkid: are you affiliated with peertube?16:14
openstackgerritMerged openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210116:14
openstackgerritMerged openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290616:14
*** graphene has joined #openstack-infra16:15
*** sshnaidm|afk has quit IRC16:15
anteayaLinkid: I'm wondering if the peertube folks want to stay with the use of the word 'spy' in these descriptions: https://framatube.org/about/peertube or would rather go with the word 'view'?16:15
*** jamesmcarthur has quit IRC16:18
Linkidanteaya: I have contacts with Framasoft (the orga which promotes peertube)16:18
LinkidI could ask them, if you want16:18
*** pcaruana has quit IRC16:18
fungii have a feeling the content on that page was translated, and the translator didn't realize that word could have negative connotations16:19
anteayafungi: that is my feeling as well16:20
*** bobh has joined #openstack-infra16:21
fungii do wonder how well bittorrent protocol works through the great firewall of china (if at all)16:21
corvusit's possible they used the word intentionally since it's describing a potentially privacy invading action -- learning which videos someone else is watching16:21
anteayaLinkid: thank you, they can contact me in this channel or via pm using my nick or email me at mynick@mynick.info16:21
anteayacorvus: agreed, I'm curious as well16:21
fungicorvus: i thought that too at first, but had a hard time parsing the sentence to be sure16:21
anteayaif it is intentional, then great, I'm just not sure16:22
anteayaI do like their transparency16:22
corvusthe third use ("worst-case scenario") makes me lean towards the intentional interpretation16:22
corvusanteaya: yes, they're very up-front about what it does and how it works16:23
fungiyeah, if it's just missing some prepositions then i can see how it might have been intended that way16:23
anteayaI like that, gives me a good feeling inside16:23
Linkidyes, they do want transparency to show that this tool is great16:25
clarkbcool those bhs1 test nodes were on the same hypervisor16:25
clarkbfungi ^ any sense yet if amorins changes have made bhs1 more reliable?16:25
clarkbre videos confs like lca upload to youtube then separately host the videos in free format(s) on a browseable index16:26
fungiclarkb: no, he's looking into potential performance issues in bhs1 again today16:26
clarkbswift may make somethibg like lcas setup easy for us16:26
Linkidand they started a campaign some months ago to translate everything into english (and in other languages). Maybe there are still some mistakes16:26
clarkbfungi ya looks like a scheduler change was applied16:27
clarkb(reading sb)16:27
fungioh, right, i saw that. no idea but i'll take a dig into logstash16:28
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: import git-os-job source repo  https://review.openstack.org/62302316:30
anteayaLinkid: ah okay, thank you, I don't like to start off a relationship by assuming someone else is incorrect, I like to ask first, sometimes I'm the one who has misunderstood16:30
*** janki has quit IRC16:30
*** sshnaidm|afk has joined #openstack-infra16:30
*** gyee has joined #openstack-infra16:34
anteayalooks like one thing framasoft does is aggregate free software tools, rebrand them as frama-* and host instances16:35
anteayatheir collaborative editing tool for instance is etherpad-lite16:36
anteayalooks like an awesome service for end users looking to use free software via the browser16:37
anteayalooks like their targets are schools and small businesses16:38
*** studarus has joined #openstack-infra16:39
fungibtw, i can't find content licensing or copyright details for summit session videos anywhere so i've asked some osf staff who ought to know (and am also recommending they publish information about that one way or another)16:41
anteayafungi: I'm surprised it hasn't come up before16:42
*** dpawlik has quit IRC16:42
anteayaLinkid: I'm looking for the code for the https://framacolibri.org/ service, so far I can't seem to find that16:42
anteayaLinkid: it looks really awesome16:43
*** bobh has quit IRC16:43
anteayaso far it appears it is a combination of twitter, bug tracker and linkedin16:43
*** dpawlik has joined #openstack-infra16:44
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Query: [primary] Waiting for logger  https://review.openstack.org/62221016:45
*** udesale has quit IRC16:46
*** udesale has joined #openstack-infra16:47
ssbarnea|rovermriedem: can you please help with open elastic-recheck CRs? https://review.openstack.org/#/q/project:openstack-infra/elastic-recheck+status:open16:51
ssbarnea|roverclarkb: fungi : i found one log file from yesterday which I cannot search on logstash: http://logs.openstack.org/20/619520/12/check/tripleo-ci-centos-7-scenario004-standalone/0edbd40/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-12-04_20_08_2016:56
*** jamesmcarthur has joined #openstack-infra16:57
ssbarnea|roveri searched for that auth.docker.io error line and zero hits, but based on the file name it should have been indexed, right?16:57
ssbarnea|roveri did not see any post failure, which makes me believe it should have been found.16:58
fungiclarkb: amorin: frickler: not a huge sample size (35 occurrences registered in the past 6 hours) but the proportion of job timeouts occurring in ovh-bhs1 is roughly 2x what a random distribution would give based on its proportion of overall quota16:58
fungiwhich isn't great, but also not nearly so bad as what we were seeing last week16:58
*** priteau has quit IRC16:59
fungiovh-bhs1 accounts for 15% of our quota and is where we saw 34% of timeouts occur in the past 6 hours17:00
*** bhavikdbavishi has quit IRC17:02
*** kjackal has quit IRC17:02
*** bhavikdbavishi has joined #openstack-infra17:02
fungion the other hand, ovh-gra1 accounts for only 8% of our quota and is where 20% of the timeouts occurred over the past 6 hours, so bhs1 is actually doing considerably better than gra1 in that regard17:03
mriedemssbarnea|rover: looking17:03
fungiit'll take a while to accumulate a large enough sample size to be sure this is representative though17:03
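The kind of logstash mining behind those percentages might look roughly like this; the endpoint, index pattern, and field names are assumptions about the CI logstash schema rather than verified values.

```python
#!/usr/bin/env python3
"""Sketch: count job timeouts per provider/region over the last six hours.

Field names (node_provider, message) and the index pattern are assumptions
about the logstash schema, not checked against the real cluster.
"""
from collections import Counter

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://logstash.example.org:9200"])  # placeholder endpoint

query = {
    "query": {
        "bool": {
            "must": [
                {"match_phrase": {"message": "Timeout exceeded"}},
                {"range": {"@timestamp": {"gte": "now-6h"}}},
            ]
        }
    },
    "size": 1000,
}

hits = es.search(index="logstash-*", body=query)["hits"]["hits"]
per_region = Counter(h["_source"].get("node_provider", "unknown") for h in hits)
total = sum(per_region.values()) or 1
for region, count in per_region.most_common():
    print(f"{region}: {count} ({100 * count / total:.0f}%)")
```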
*** yamamoto has joined #openstack-infra17:04
clarkbfungi: interesting. Not sure if you saw my notes last night but vexxhost sjc1 and bhs1 had roughly the same io throughput according to sysbench random read and write benchmarking. That is what made me think there was maybe an unhappy hypervisor17:05
clarkband amorin seems to have confirmed that a couple of the unhappy jobs ran on the same hypervisor which was slower so at least some progress there17:05
*** jamesmcarthur has quit IRC17:05
*** priteau has joined #openstack-infra17:06
*** jamesmcarthur has joined #openstack-infra17:07
*** slaweq has joined #openstack-infra17:08
*** yamamoto has quit IRC17:08
*** ginopc has quit IRC17:08
*** priteau has quit IRC17:08
*** priteau has joined #openstack-infra17:09
*** rlandy is now known as rlandy|brb17:10
Linkidanteaya : yes, Framasoft promotes free software for people to use alternatives :).17:11
*** shardy has quit IRC17:11
openstackgerritMerged openstack-infra/elastic-recheck master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232617:12
LinkidAnd to show that they are alternatives, they offer access to those ones with their frama* services :)17:12
*** shardy has joined #openstack-infra17:13
clarkbssbarnea|rover: re https://bugs.launchpad.net/openstack-gate/+bug/1806655 that should only happen if the logger daemon is stopped for some reason. One reason may be if your job reboots the test nodes (then you have to restart the logger daemon in the job)17:13
openstackLaunchpad bug 1806655 in OpenStack-Gate "Zuul console spam: [primary] Waiting for logger" [Undecided,New]17:13
clarkbssbarnea|rover: looking at logstash it appears to be a tripleo job specific issue?17:14
ssbarnea|roverclarkb: ok. i have another case where some errors from mistral/api.log.txt.gz are not to be found on logstash. is this file indexed too or not?17:15
ssbarnea|roverclarkb: yep, i suspect is specific to tripleo.17:15
anteayaLinkid: thanks, lots to read on the various links17:16
*** jpich has quit IRC17:17
*** wolverineav has joined #openstack-infra17:17
clarkbssbarnea|rover: https://git.openstack.org/cgit/openstack-infra/project-config/tree/roles/submit-logstash-jobs/defaults/main.yaml is the list of stuff we index. I don't think api.log matches anything there17:18
clarkbhrm I was trying to catch up on email but now there is more than when I started17:19
*** slaweq has quit IRC17:20
ssbarnea|roverclarkb: thanks for the link, now i don't know what to do about the mistral log, it is huge and indexing the whole thing would sound like a pretty bad idea, but indexing errors and warnings from it would not sound like a bad idea to me.17:20
ssbarnea|roverdo we have a way to do selective indexing?17:20
clarkbssbarnea|rover: not with the current logstash config. It will index everything >=INFO level from the oslo.logging format logs17:21
*** slaweq has joined #openstack-infra17:21
clarkbssbarnea|rover: in the past we have worked with various projects to improve their logging to make it more usable by logstash/us and consequently ops17:21
*** e0ne has quit IRC17:22
ssbarnea|roverahh, that is ok 95% of the spam is debug so extra load should not be an issue.17:22
*** jamesmcarthur has quit IRC17:22
clarkbssbarnea|rover: can you link me to an example file? I'm curious to see how big it is17:22
ssbarnea|roveri am really glad to hear that we drop DEBUG lines17:22
mriedemssbarnea|rover: i've gone over several of your newer e-r queries and -1ed several of them17:22
clarkbwe actually rely on os-loganalyze for that iirc. so we should double check it is filtering mistral logs properly too (which I expect it isn't for a file named just api.log)17:23
ssbarnea|rovermriedem: thanks, looking now at them.17:23
*** jamesmcarthur has joined #openstack-infra17:23
*** zul has quit IRC17:23
fungissbarnea|rover: if we didn't drop debug level loglines our logstash retention would probably be more like a day instead of 1017:24
*** bobh has joined #openstack-infra17:25
fungiand it takes a 6tb elasticsearch cluster just to house that much17:25
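The level filtering described above boils down to something like the following on oslo.logging-formatted lines (the real pipeline does this in os-loganalyze/logstash; this is just the shape of it):

```python
import re
import sys

# oslo.logging lines look roughly like:
# 2018-12-04 20:08:20.123 12345 DEBUG nova.compute [-] some message
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (?:\d+ )?(?P<level>[A-Z]+) "
)

def keep(line):
    m = LEVEL_RE.match(line)
    # lines that don't parse (tracebacks, continuations) are kept with their context
    return m is None or m.group("level") != "DEBUG"

for line in sys.stdin:
    if keep(line):
        sys.stdout.write(line)
```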
openstackgerritMerged openstack-infra/zuul master: Add instructions for reporting vulnerabilities  https://review.openstack.org/55435217:25
openstackgerritMerged openstack-infra/elastic-recheck master: Categorize missing /etc/heat/policy.json  https://review.openstack.org/62112817:27
clarkbfungi: we would also likely take two days to index that one day of logs17:28
fungior need many times more indexing workers17:28
mnaserclarkb: have you seen anything about systemd-python failing to install on centos 7.6?17:28
mnaser"Cannot find libsystemd or libsystemd-journal"17:29
fungiefried: nice plug for git-restack in your howto!17:29
clarkbmnaser: nope that one is new to me17:30
mnaser:( okay, i'm a bit lost on why it's happening17:30
mnaserhttp://logs.openstack.org/51/620651/5/gate/openstack-ansible-deploy-aio_lxc-centos-7/01ca2fd/logs/openstack/aio1_keystone_container-c3bb8d22/python_venv_build.log.txt.gz17:30
fungiefried: makes me wonder if git-restack should grow a --continue convenience option which just invokes git-rebase --continue17:31
mnaserhttp://logs.openstack.org/51/620651/5/gate/openstack-ansible-deploy-aio_lxc-centos-7/01ca2fd/logs/ara-report/result/ad4ef8f9-31c4-4b5d-886d-b692fcd5e1d0/ and we install systemd-devel17:31
openstackgerritDuc Truong proposed openstack-infra/irc-meetings master: Change Senlin meeting to different biweekly times  https://review.openstack.org/62303117:33
*** bobh has quit IRC17:34
*** jmorgan1 has quit IRC17:36
*** jmorgan1 has joined #openstack-infra17:40
*** mriedem is now known as mriedem_away17:40
*** bobh has joined #openstack-infra17:42
*** rlandy|brb is now known as rlandy17:42
*** jpena is now known as jpena|off17:42
*** florianf is now known as florianf|afk17:43
ssbarnea|rovermriedem_away: clarkb : read last comment on https://review.openstack.org/#/c/621004/1 -- mainly the query to look for ssh failures is currently broken.17:44
openstackgerritJames E. Blair proposed openstack-infra/infra-specs master: Add opendev Gerrit spec  https://review.openstack.org/62303317:45
corvusclarkb, fungi, mordred, pabelanger, tobiash: ^ can you take a look at that?  i'd like to get it into shape as quickly as possible17:46
ssbarnea|roverbecause ansible output changed with the improved json logging there are lots of queries that may have broken because they assumed that two strings are on the same log line, which is no longer true. See https://github.com/openstack-infra/elastic-recheck/blob/master/queries/1721093.yaml17:47
anteayathis looks like the catalog of services without the frama-* branding: https://chatons.org/en/find though the page is yet to be translated into english, I welcome corrections on my assumptions from French colleagues17:48
*** mrhillsman has quit IRC17:51
*** mrhillsman has joined #openstack-infra17:51
*** sthussey has quit IRC17:52
*** jiapei has quit IRC17:52
*** adrianreza has quit IRC17:52
*** jiapei has joined #openstack-infra17:53
clarkbcorvus: next on my list after getting through email backlog17:53
*** sthussey has joined #openstack-infra17:55
*** adrianreza has joined #openstack-infra17:55
*** jamesmcarthur has quit IRC17:55
anteayalooks like framasoft uses a feature on etherpads that allows them to be sorted by name or date, subscribed to, and made private or public: https://mypads.framapad.org/mypads/?/mypads/group/framalang-7l3ibkl0/view17:57
clarkbanteaya: ya there are plugins to do that. For the private vs public you have to set up auth iirc. I'm more of the opinion that "etherpads" shouldn't be permanent records (since they can be changed after all)17:59
anteayaah, thank you18:00
*** bobh has quit IRC18:00
fungian interesting correlation, it looks like our average job uses fairly close to 1 node-hour? zuul seems to be averaging ~1kjph at capacity and we have 1030 nodes in our aggregate quota18:01
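That correlation is just the ratio of quota to job throughput:

```python
nodes_in_quota = 1030
jobs_per_hour = 1000                      # ~1kjph at capacity
print(nodes_in_quota / jobs_per_hour)     # ~1.03, i.e. the average job burns about one node-hour
```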
anteayabnemec: I didn't know you had your own channel18:01
anteayayou have your broadcast on in the background, you are very soothing18:01
fungibnn: the ben nemec network18:02
clarkbhrm apparently I don't understand what "a reboot is required to complete this package upgrade" means18:02
clarkbbecause ns2 is still complaining18:02
anteayafungi: I do recommend18:02
fungiclarkb: yeah, i saw the cron e-mail again this morning18:02
* bnemec trademarks bnn :-)18:02
fungiclarkb: puzzling to say the least18:02
clarkbfungi: is that a nice way of saying that unattended upgrades wants a human to do the upgrade?18:02
clarkbfungi: also looking more closely it's mad about lxd and friends. Maybe we uninstall those18:03
clarkb(I thought we were uninstalling those fwiw, so maybe start by double checking that)18:03
*** derekh has quit IRC18:03
clarkbcorvus: did you see https://review.openstack.org/#/c/622624/ for website content? I'm hoping to pull that back up again and refine it after lunch today. So reviews there much appreciated as well18:05
fungiclarkb: i didn't hold onto the cronspam from ns2 so not entirely sure to be honest18:06
fungiclarkb: oh, these are new packages, that's why18:07
*** jamesmcarthur has joined #openstack-infra18:07
corvusclarkb: oh nice i'll review now18:07
fungisee /var/log/dpkg.log for entries from today18:07
bnemecanteaya: I don't think of it so much as a channel as a dumping ground for videos I want other people to see. :-)18:07
bnemecThere's also about a hundred videos of hurdle races, but they're not public because I'm not subjecting other people's kids to YouTube commenters.18:07
*** jamesmcarthur has quit IRC18:07
clarkbfungi: will do thanks18:07
bnemecanteaya: But I'm glad you like it.18:07
fungiclarkb: there was a new kernel (again)18:07
*** jamesmcarthur has joined #openstack-infra18:07
clarkbfungi: ah so the lxd being held back is just noise?18:07
fungii think so18:08
clarkbwhy doesn't ns1 send the same spam?18:08
clarkbor maybe my filters are broken18:08
fungilikely not configured the same18:08
anteayabnemec: ha ha ha, thanks for the explaination18:09
*** dave-mccowan has joined #openstack-infra18:09
anteayabnemec: and yes, I do like it, like background talk radio without the death18:10
bnemec:-)18:10
anteayawould listen to more ben radio anytime18:10
anteayaand yeah, +1 for protecting the younglings18:11
fungiclarkb: yeah, ns1 isn't auto-updating at all18:11
fungithough unattended-upgrades is installed and the checksum on /etc/apt/apt.conf.d/50unattended-upgrades matches between them18:12
Shrewsinfra-root: I'd like to restart the nodepool launchers to pick up some fixes. Any objections to doing that now?18:13
corvusShrews: ++18:13
fungiShrews: sounds fine to me18:13
Shrewsok. i'll begin the preparations for it18:14
corvusi added infra-core to opendev-website-core18:14
tobiashcorvus: lgtm and ++ for not using cgit :)18:15
fungiclarkb: 2018-12-05 06:50:20,794 ERROR Cache has broken packages, exiting18:17
fungiseen in /var/log/unattended-upgrades/unattended-upgrades.log on ns118:17
clarkbthe one "feature" cgit has that few others seem to do is the ability to look at source files in a semi rendered state so that you can link to lines in eg rst files18:17
clarkbfungi: huh18:17
clarkbfungi: can we apt-get autoclean and see if that fixes it?18:18
Shrewscorvus: oh, your new addition for the DELETED node state is going to force a total shutdown i think18:18
Shrewscorvus: if we do 1-at-a-time, some launchers may see that as an invalid state and bork18:18
fungiclarkb: https://bugs.debian.org/89860718:18
openstackDebian bug 898607 in unattended-upgrades "unattended-upgrades: send mail on all ERROR conditions" [Important,Fixed]18:18
fungiclarkb: i think it's to do with the lxd packages18:20
clarkbfungi: so maybe ensure those are removed and do an autoclean?18:20
*** ralonsoh has quit IRC18:21
tobiashShrews: so we should add a release note before we do a release then18:21
Shrewsclarkb: i forget, is a pip install necessary for launcher upgrade?18:24
fungiShrews: is `pbr freeze` not listing the latest commit?18:28
clarkbShrews: puppet should do it for us18:28
clarkbbut ya running python3 $(which pbr) freeze should show you the sha118:28
clarkbto confirm18:29
openstackgerritClark Boylan proposed openstack-infra/system-config master: Don't install lxd on our servers  https://review.openstack.org/62304018:30
openstackgerritClark Boylan proposed openstack-infra/system-config master: Configure packages on ubuntu arm servers  https://review.openstack.org/62304118:30
clarkbfungi: ianw ^ package fixes including for arm18:30
Shrewsfungi: clarkb: k, cool. i was doing 'pip freeze' which was less than helpful18:30
fungiclarkb: aha, i'm noticing that one of the packages which is behind in updates on ns1 compared to ns2 is unattended-upgrades itself18:31
mnaserdoes anyone know if new images have been uploaded to all clouds?18:31
mnaseri'm still seeing centos 7.5 machines go up, even as of 2 hours ago18:32
fungiclarkb: though the package changelog for the newer one doesn't seem to indicate it includes the fix for debian bug 898607 so i'm still at a loss18:32
openstackDebian bug 898607 in unattended-upgrades "unattended-upgrades: send mail on all ERROR conditions" [Important,Fixed] http://bugs.debian.org/89860718:32
fricklercorvus: seems https://review.openstack.org/621639 would need to be applied manually before the check can pass?18:33
clarkbmnaser: looks like our image builds are broken for centos for ~one week18:33
*** wolverineav has quit IRC18:34
corvusfrickler: does that check not run as the openstackinfra user?18:34
mnaserclarkb: can i help fix that? i'm noticing some servers get centos 7.6 and others 7.518:34
Shrewsok, stopping all launchers18:34
clarkbmnaser: I don't think any of our test nodes should be 7.6, they will update to 7.6 when running though18:34
corvusShrews: you can do one at a time for slightly less disruption18:35
clarkbmnaser: it appears to be systemic as only arm images on nb03 are successfully building right now?18:35
Shrewscorvus: see my previous query to you18:36
mnaserclarkb: ok yeah it looks like now our containers are 7.6 and our hosts are 7.518:36
Shrewscorvus: new DELETED node state18:36
Shrewsinfra-root: all nodepool launchers restarted now18:36
mnaserclarkb: if there are any logs, i can look into them18:36
clarkbmnaser: looks like we've got dib processes running since November 14 and November 28 on the two non arm builders. So the pipeline jammed up there18:36
*** wolverineav has joined #openstack-infra18:37
fungias in dib was hung and still trying to build something?18:37
openstackgerritSalvador Fuentes Garcia proposed openstack-infra/project-config master: kata-containers: re-enable Fedora job  https://review.openstack.org/62304318:37
clarkbmnaser: https://nb02.openstack.org/ubuntu-bionic-0000000042.log and https://nb01.openstack.org/opensuse-423-0000000025.log18:37
fricklercorvus: I was assuming there is a difference between access to the channel and access to listing permissions from chanserv. but maybe permissions still aren't set up correctly in the first place, yes18:37
clarkbfungi: yup exactly and those logs seem to line up with that18:37
corvusShrews: ah missed that18:37
corvusfrickler: the bot has access to the channel18:38
clarkbin this case a root likely needs to strace to understand what is going on there18:38
fungiclarkb: and not hung on the same image types either. nb01 is on opensuse and nb02 is on ubuntu18:38
corvusfrickler: my question is whether that check runs authenticated18:38
clarkbbecause those logs just end as if they weren't running anymore but ps says otherwise18:38
clarkbfungi: they are in different clouds too iirc18:38
*** priteau has quit IRC18:38
mnaserlooks like no builds have been happening since the 27th?18:39
mnaserhttps://nb02.openstack.org18:39
clarkbmnaser: 28th18:39
clarkbmnaser: I linked the last log file above18:39
clarkbboth disk-image-create processes are blocking on a wait4() so thats not super helpfu18:39
fungiclarkb: ps suggests the hung child process on nb02 is a child of `pip install os-testr` in service of the venv-os-testr element18:40
clarkbin this case we may just want to take this as an opportunity to reboot the servers (apply package updates and clear out any system resources dib may have leaked)18:40
clarkbfungi: yup that lines up with the log above too18:40
*** ccamacho has quit IRC18:40
clarkb2018-11-28 00:35:02.023 |   Downloading https://files.pythonhosted.org/packages/2a/fd/2a8b894ee3451704cf8525a6a94b87d5ba24747b7bbd3d2f7059189ad79f/stestr-2.1.1.tar.gz (104kB) is the last logged line18:40
clarkbthat pip install is blocking on a read18:41
fricklercorvus: I was assuming it does, because it takes the nickname as parameter, but looking closer in fact it does seem to18:41
fungiclarkb: though on nb01 it's hung running a git clone of openstack/charm-interface-barbican-secrets18:41
fricklerdoes not*18:41
fungiso didn't even hang doing the same things18:41
*** wolverineav has quit IRC18:41
clarkbthe git operation is blocking on a read to 018:42
clarkb(stdin)18:42
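For reference, the kind of /proc poking described above can be scripted; a minimal sketch follows, assuming you already know the stuck child's pid (passed as the first argument) and have root. It prints the process state, swap usage, the kernel function it is sleeping in, and what each open fd points at, which is how a "blocked on a read from fd 5" observation gets mapped to an actual pipe, socket, or file.

    import os
    import sys

    pid = int(sys.argv[1])
    proc = "/proc/%d" % pid

    # process state plus resident/swapped memory from the status file
    with open(proc + "/status") as f:
        for line in f:
            if line.startswith(("State:", "VmRSS:", "VmSwap:")):
                print(line.strip())

    # kernel function the task is sleeping in (e.g. a wait or pipe read)
    with open(proc + "/wchan") as f:
        print("wchan:", f.read().strip())

    # resolve every open fd so blocked reads can be tied to a real target
    for fd in sorted(os.listdir(proc + "/fd"), key=int):
        try:
            print("fd", fd, "->", os.readlink("%s/fd/%s" % (proc, fd)))
        except OSError:
            pass  # fd closed while we were looking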
*** wolverineav has joined #openstack-infra18:43
corvusfrickler: any idea what i need to tell chanserv to make it so the script can run?18:43
clarkbplenty of disk on that server too18:43
clarkbpossibly some condition was set up by an earlier build such that the subsequent ones were unhappy? weird though that it wouldn't be the same issue in both spots18:43
Shrews#status log Nodepool launchers restarted and now running with commit ee8ca083a23d5684d62b6a9709f068c59d7383e018:44
openstackstatusShrews: finished logging18:44
fricklercorvus: "set #channel private off" might help18:45
*** dpawlik has quit IRC18:45
fungiclarkb: and the pip install child process on nb02 is blocking on a read from fd 518:45
clarkbfungi: yup18:45
*** diablo_rojo has joined #openstack-infra18:45
clarkbfungi: I'm somewhat inclined to just reboot them both and watch them for similar behavior in the future18:45
*** dpawlik has joined #openstack-infra18:46
fungii concur, there's not much more i can learn from these unless someone else wants to take a stab18:46
clarkbI guess we can look at the build just prior to the most recent ones to see if there is anything obvious18:46
clarkbhttps://nb02.openstack.org/debian-stretch-0000000037.log completed successfully on nb0218:46
*** mriedem_away is now known as mriedem18:46
*** manjeets_ is now known as manjeets18:47
clarkbhttps://nb01.openstack.org/opensuse-423-0000000024.log failed on nb01 so nothing clear there18:47
clarkbfungi: do you want to reboot or should I?18:47
clarkbI'll go ahead and do it18:49
*** shardy has quit IRC18:49
fungiahh, yep sorry, trying to do too many things at once18:49
mriedemssbarnea|rover: replied on https://review.openstack.org/#/c/621004/18:49
clarkbmnaser: fungi https://nb01.openstack.org/ubuntu-trusty-0000000040.log is the running build on nb01 after a reboot18:50
ssbarnea|rovermriedem: thanks. that was my plan, after getting  confirmation. probably after this we should check existing queries for similar issues.18:50
*** ccamacho has joined #openstack-infra18:50
clarkbwe can watch that to see if builds work now18:50
*** eharney has quit IRC18:52
*** harlowja has joined #openstack-infra18:52
clarkbmnaser: fungi https://nb02.openstack.org/ubuntu-xenial-0000000038.log for nb02. Assuming those builds are now working, they should get around to doing centos7 in the near future18:52
clarkbthen we should keep our eyes open for new 7.6 failures18:52
mnaserclarkb: great, thank you18:52
fricklercorvus: confirmed on a different channel that setting the private flag is what is hiding the access list18:53
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add an upgrade release note for schema change  https://review.openstack.org/62304618:53
corvusfrickler: flag removed18:56
corvusclarkb, dhellmann, tobiash: great comments on 623033, thx; i replied to all19:00
*** ccamacho has quit IRC19:00
*** wolverineav has quit IRC19:01
*** betherly has joined #openstack-infra19:01
*** eernst has joined #openstack-infra19:01
corvusdhellmann: also in my reply to you, i tried to both provide technical answers, and also my own feedback.  hopefully the two can be disentangled as necessary.  :)19:04
*** betherly has quit IRC19:06
clarkbssbarnea|rover: seems like "2018-12-04 04:14:47,233 ERROR:dlrn:Known error building packages for openstack-tripleo-heat-templates, will retry later" may be the root cause of one of those waiting on logger failures.19:06
clarkbssbarnea|rover: dlrn probably shouldn't have a retry later option for CI?19:06
*** therve has left #openstack-infra19:07
clarkblooks like delorean failed to open a local repomd.xml file. That's odd19:09
ssbarnea|roverclarkb: please comment on https://bugs.launchpad.net/tripleo/+bug/1714202 ticket, so other can see it (and hopefully reply).19:10
openstackLaunchpad bug 1714202 in tripleo "DLRN builds fail intermittently (network errors)" [Medium,Incomplete]19:10
dhellmanncorvus : yep. I suspect a rename of a bunch of the cruft we have now to some neutral prefix to start combined with a lenient policy about using the openstack/ prefix later is going to be where we end up19:10
ssbarnea|roverit is late here and i do not know all the details. to me it looks like an infra issue.19:11
clarkbssbarnea|rover: done19:12
clarkbssbarnea|rover: well in this case it's not the network because it's a local file. So either the file doesn't exist or the permissions don't allow reads or the filesystem is corrupt19:12
clarkbwe should rule out the first two things first :)19:12
ssbarnea|roverclarkb: true, but corrupted filesystem counts as infra too. different cause. i am too tired now to look at it.19:13
clarkbit does but it should be a far less common issue and the other two things are much easier to check. Just add a sudo ls -l there19:14
*** bhavikdbavishi has quit IRC19:15
*** ykarel has quit IRC19:16
*** wolverineav has joined #openstack-infra19:16
*** wolverineav has quit IRC19:16
*** wolverineav has joined #openstack-infra19:17
clarkbssbarnea|rover: for the other url on your bug we do proxy cache centos buildlogs https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb#n17219:19
*** gfidente has quit IRC19:21
*** kgiusti has left #openstack-infra19:29
*** wolverineav has quit IRC19:29
*** kgiusti has joined #openstack-infra19:30
*** wolverineav has joined #openstack-infra19:31
*** wolverineav has quit IRC19:39
*** wolverineav has joined #openstack-infra19:43
mnaserclarkb: so far xenial built successfully, opensuse-423 building now19:45
mnaserso good progress so far19:45
*** wolverineav has quit IRC19:46
*** wolverineav has joined #openstack-infra19:46
openstackgerritMerged openstack-infra/project-config master: Add #openstack-designate to accessbot  https://review.openstack.org/62163919:46
fungiclarkb: okay, so a bit of spelunking i think reveals that the reason unattended-upgrades doesn't want to run is that the new versions of lxd and lxd-client want to also install libuv1 which was not previously installed, and u-a is configured to only upgrade packages which are already installed but not add any new packages19:48
clarkbaha19:48
fungiso as suspected, removing lxd and lxd-client should, i think, get it back on track19:48
fungi(one exception to the no new packages rule is kernels)19:49
*** studarus has quit IRC19:53
*** jamesmcarthur has quit IRC19:55
fungiclarkb: ahh, nope, lxd looks like it may be unrelated after all19:57
clarkbhttp://logs.openstack.org/62/621562/2/gate/tripleo-ci-centos-7-containers-multinode/0fa88fd/logs/undercloud/var/log/extra/dstat.html.gz in that job from 18:57 ish to 19:20ish it appears its just validating heat yamls?19:57
*** jamesmcarthur has joined #openstack-infra19:57
clarkbat 19:14 tripleoclient times out on some websocket error (unfortunately its not clear what it is talking to from that traceback)19:58
clarkbthen at 19:20 the stack is actually deployed19:58
clarkbthis did run on a bhs1 node but looking at the dstat it actually looks a lot healthier than the dstats from bhs1 I was looking at yesterday19:58
fungiclarkb: because the unattended-upgrades log on both ns1 and ns2 complains that it's not going to upgrade those packages so either there's an actual different corrupt package in the cache and it's not specifying which one, or it's a behavior difference between the versions of unattended-upgrades on those two servers19:58
clarkbfungi: time to try the autoclean?19:59
clarkbreading that dstat disk writes spike to over 300mbps19:59
clarkband are in the 30 range while validating things19:59
fungijust a `sudo apt clean` ought to wipe out the package cache, but i want to try and figure out what's corrupt first19:59
clarkbwhat I do notice is that all the memory is used up (mostly)19:59
clarkband there are load spikes to 29 and 1220:00
clarkband lots of paging20:00
clarkbwhat this is telling me is that bhs1 itself isn't bad in that specific case, but rather the job is demanding quite a lot from the test node and things timed out (probably due to swapping?)20:01
clarkbthinking about disk throughput lots of swapping could have a snowballing effect on disk performance20:02
clarkbwhere tripleo jobs end up being the noisy neighbors spinning the disks a bunch (or at least moving electrons around on ssds)20:02
*** jaosorior has joined #openstack-infra20:03
*** eharney has joined #openstack-infra20:05
*** dtantsur is now known as dtantsur|afk20:06
*** wolverineav has quit IRC20:06
*** wolverineav has joined #openstack-infra20:07
fungi#status log removed lxd and lxd-client packages from ns1 and ns2.opendev.org, autoremoved, upgraded and rebooted20:09
openstackstatusfungi: finished logging20:09
fungiclarkb: interestingly, i had to manually start nsd on ns2 (per your earlier observation) but not ns120:09
clarkbfungi: could be a race in the startup. The existing unit does allow for it to start after networking is fully up but doesn't require it20:11
clarkbcomparing dstat to a run of the same job in inap there are some differences. Inap has slightly more memory available to the VM20:16
clarkb8000MB vs 8096MB maybe?20:16
fungii wonder if ns2 consistently loses the race and ns1 consistently wins it, but don't feel like rebooting them even more to find out20:16
clarkbfungi: ya my fix should solve it long term I think20:17
clarkbthe paging is actually worse by count in inap20:18
clarkbimplying that maybe that is the big difference (we are able to page in stuff that isn't cached due to no memory more effectively)20:18
*** wolverineav has quit IRC20:18
clarkbmwhahaha: ssbarnea|rover re ^ I would be curious if there are any easy hacks we can do to alleviate the memory pressure20:19
clarkbdoes tripleo enable kernel same page merging?20:19
clarkbthat may help20:19
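KSM is toggled through sysfs, so checking or enabling it is just a couple of file operations; a rough sketch, assuming a kernel built with CONFIG_KSM and root access. One caveat: KSM only merges pages a process has marked MADV_MERGEABLE (qemu does this for guest RAM; ordinary python processes in containers generally do not), so how much it would help the tripleo case is an open question.

    # check current KSM state and how much is actually being merged
    KSM = "/sys/kernel/mm/ksm"

    def read_ksm(name):
        with open("%s/%s" % (KSM, name)) as f:
            return f.read().strip()

    print("run:", read_ksm("run"))                      # 0 = off, 1 = scanning
    print("pages_sharing:", read_ksm("pages_sharing"))  # merged page count

    # enable scanning (normally done by ksmtuned or a deploy step, not ad hoc)
    with open(KSM + "/run", "w") as f:
        f.write("1")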
clarkbperhaps we can also reduce the number of webservers if we have overprovisioned them ? (have no idea if that is a thing just remember it being something we did with devstack to reduce its overhead)20:20
clarkbalso this dstat data is great, thank you for adding that. I do notice it doesn't seem to be in the standalone job though?20:20
mwhahahashould be20:20
mwhahahait uses the same ci bits20:20
mwhahaha(dstat that is)20:20
clarkbmwhahaha: hrm I can't find it in some semi random jobs I pulled out to check20:21
clarkbhttp://logs.openstack.org/62/621562/1/check/tripleo-ci-centos-7-standalone/1c762fc/logs/undercloud/var/log/extra/ doesn't have a dstat file20:21
*** kjackal has joined #openstack-infra20:22
mwhahahano looks like we don't run that part20:22
mwhahahaprobably because it's in the undercloud setup20:22
mwhahahawhich this is not, we can add it tho20:22
clarkbI think that would be helpful especially if we do find its something like memory pressure causing the fallout, then we'll be able to measure if we've improved that or not20:23
*** kjackal has quit IRC20:23
mwhahahathe standalone is less memory intensive20:23
mwhahahaby default it fits in like 6 or 7g20:23
clarkboh nice20:23
mwhahahabut when tempest goes it might increase20:24
*** kjackal has joined #openstack-infra20:24
clarkbin the failure above we seem to hit the pressure just before running the overcloud stack create20:24
clarkband then it sticks around until the end of the job for the most part (it gets slightly better near the end)20:25
clarkbin any case, that's my current thought on where these are having a sad. Memory pressure reduces IO throughput because lots of paging is happening (which itself spins the disks which could cause noisy neighbor issues)20:26
clarkbreducing memory pressure should reduce the need for paging which should improve IO throughput and then hopefully things are happier20:26
*** jamesmcarthur has quit IRC20:28
mwhahahait would seem that just running the undercloud with nothing else means all the ram is taken up without doing anything20:31
pabelangerShrews: thanks for restarting nodepool-launchers!20:32
*** Swami has joined #openstack-infra20:32
mwhahahaso the memory all starts getting gobbled up during step4 of the undercloud install which would be when we actually turn on the openstack services (other than keystone)20:33
pabelangerinteresting enought, I think we might need another zuul-executor or 2. Our executor queue backlog has been above 0 the last 6 hours: http://grafana.openstack.org/dashboard/db/zuul-status20:33
mwhahahaso it's likely that it's the openstack service containers and the lack of sharedness from running python in containers20:33
pabelangerenough*20:33
pabelangerI am not really sure why that is however20:35
pabelangerhttp://grafana.openstack.org/dashboard/db/zuul-status?from=now-90d&to=now20:38
pabelangerwe are starting more builds in a given period, since 2018-11-2920:39
pabelangerWhich makes me think that the relative priority change has caused us to run more jobs in a given hour20:39
pabelangerand more IO on executors, where they are running up against governors20:40
pabelangerclarkb: corvus: tobiash: ^something that might be of interest when you have spare cycles20:40
corvuspabelanger: that's a fascinating theory20:42
openstackgerritKendall Nelson proposed openstack-infra/infra-specs master: StoryBoard Story Attachments  https://review.openstack.org/60737720:42
tobiashpabelanger: in my eyes this makes sense as now the smaller less active projects have a higher relative priority to the big resource consumers. Often the big resource consumers have long running jobs that block the same resources for a long time while smaller projects typically have faster jobs. Thus in the end you can run more jobs per hour than before.20:43
corvusmakes sense20:43
tobiashI see the same thing in our deployment too20:44
*** dpawlik has quit IRC20:44
tobiashthe resource hogs are always the projects that have slow jobs20:44
fungibut over the long term it's still the same number and type of jobs completed in the same amount of time20:44
*** jamesmcarthur has joined #openstack-infra20:44
fungiwe're just changing the order in which they get started20:44
tobiashyes, but during peak load the throughput is higher20:44
corvus...theoretically, we could actually end up taking *longer* to run jobs overall -- because right now we have node capacity sitting idle waiting for executors, whereas we did not before....20:45
pabelangerright20:45
fungiahh, and the projects with longer-running jobs which are also the projects with more contention for changes (due to their longer-running jobs) end up waiting until later into the window when activity has died off20:45
corvusso we may need to add capacity not only to make things faster now (we will complete the jobs we're supposed to be running right now faster), but also to get our daily aggregate back to the overall time we had before.20:46
tobiashI'd guess you need at least two more executors20:47
fungiso basically with more executors we'd be able to sustain a higher jph throughput at peak and then have a lower job rate off-peak than otherwise20:47
tobiashjudging from these stats20:47
pabelangertobiash: yah, I would agree with 220:47
corvuswhy 2?20:48
*** slaweq has quit IRC20:49
fungiif there's a metric for calculating how many executors we need that would be awesome to feed into our future elastic executor autoprovisioning ;)20:49
pabelangerexecutors look to be capped at about 65 running jobs, but executor queue seems to be ~12020:49
pabelangerso, guessing that 2 would help bring that down20:49
pabelangerbut, yah, just a guess :)20:50
efriedfungi: catching up...20:51
efriedI realized later that I should probably not have put commit --amend in my instructions, as that can f ya when there's a merge conflict, and rebase --continue automatically does commit --amend when it's necessary (and not when it's not).20:51
efriedBut yeah, if we had a git restack --continue that served both purposes, it would be simpler to explain to noobs, probably.20:51
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Add ze12.openstack.org  https://review.openstack.org/62306720:52
corvusefried, fungi: oh, we talking about 'git restack'?  can you catch me up?20:52
corvusfungi, pabelanger: ^ that change will fail until the host exists, but if we're ready to add it, i can go ahead and create it.20:52
fungicorvus: efried sent an awesome howto to the openstack-discuss ml about workflow for stacks of patches, and plugged git-restack heavily20:52
corvussweet, reading now20:53
tobiashcorvus: it's an educated guess. I see that there are times when all executors deregistered and 1 more would be a less than 10% increase while 2 more would probably give a little more headroom.20:53
tobiashbut as pabelanger said, it's just a guess20:53
corvustobiash, pabelanger: i agree, but also, i think the 'jobs starting' governor is a big part of this, so that may change things a bit.  i lean toward adding 1 and re-evaluating for now...20:54
pabelangerstarting with one wfm20:55
pabelangerI also agree jobs starting is related20:55
tobiashcorvus: that's fine, I just looked at these graphs already one or two weeks ago and thought that one more would make sense, so my first guess was two20:55
*** wolverineav has joined #openstack-infra20:56
fungieven with all the executor backoff, we're averaging roughly 1kjph which is pretty awesome20:57
corvustobiash, pabelanger: yeah, you're probably right.  i mostly just don't want to delete servers :)20:57
corvusefried, fungi: adding 'git restack --continue' makes sense to me20:57
pabelangercorvus: ++20:57
efriedcorvus: Cool20:57
*** wolverineav has quit IRC20:58
*** wolverineav has joined #openstack-infra20:58
*** kjackal has quit IRC20:59
*** dpawlik has joined #openstack-infra20:59
*** dpawlik has quit IRC21:03
*** jtomasek_ has quit IRC21:05
*** rkukura_ has joined #openstack-infra21:08
tobiashcorvus: ah, so you're targeting an exact fit21:11
*** rkukura has quit IRC21:11
*** rkukura_ is now known as rkukura21:11
*** jamesmcarthur has quit IRC21:11
openstackgerritMerged openstack-infra/nodepool master: Add an upgrade release note for schema change  https://review.openstack.org/62304621:17
fungigonna go find some food now that the workmen seem to be done here for the day; i should be back soon though21:25
*** jamesmcarthur has joined #openstack-infra21:28
*** e0ne has joined #openstack-infra21:30
*** sambetts|afk has quit IRC21:31
*** priteau has joined #openstack-infra21:33
*** yboaron has quit IRC21:36
corvushrm, launch-node doesn't seem to be detecting that the server is up21:37
corvusShrews, mordred: ^ more potential fallout from openstacksdk?21:38
corvusi believe launch-node uses the internal wait/timeout feature21:39
*** jcoufal_ has quit IRC21:42
*** graphene has quit IRC21:42
corvusShrews, mordred: yes, it seems so -- running launch-node with 0.19.0 works, 0.20.0 fails21:43
*** graphene has joined #openstack-infra21:44
corvusthis is all i got: http://paste.openstack.org/show/736724/21:45
*** jamesmcarthur has quit IRC21:46
*** priteau has quit IRC21:50
*** e0ne has quit IRC21:53
corvushow do i find the documentation for 0.19.0?21:57
*** kgiusti has left #openstack-infra21:58
*** jamesmcarthur has joined #openstack-infra21:58
corvusthe server is built, but dns.py isn't working because it says: AttributeError: 'Proxy' object has no attribute 'get_server'21:59
*** e0ne has joined #openstack-infra21:59
corvusbut according to the docs under the 0.19 tag, the compute proxy is supposed to have a get_server method: http://git.openstack.org/cgit/openstack/openstacksdk/tree/doc/source/user/proxies/compute.rst?h=0.19.021:59
*** rcernin has joined #openstack-infra22:03
corvusokay, i have to go back to openstacksdk 0.17.2 for that to work22:03
mordredcorvus: crappit22:04
mordredcorvus: lemme look real quick and see if I can be more immediately helpful22:04
*** graphene has quit IRC22:04
corvusmordred: ok; i've just got past the immediate needs -- (i have a server and i have the dns commands).  so at this point it's just what next to make launch-node.py and dns.py work again22:04
*** graphene has joined #openstack-infra22:05
mordredcorvus: kk. I think that's likely a yukky break in 0.20.0 ... I'm kinda tempted to revert the patch that's causing all of this and then re-revert it with a bunch more testing since there's clearly some issues here22:05
*** betherly has joined #openstack-infra22:06
corvusmordred: ok.  note two problems: the server wait (introduced in 0.20) and the server_get proxy issue (introduced in 0.18)22:07
*** jamesmcarthur has quit IRC22:07
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Add ze12.openstack.org  https://review.openstack.org/62306722:08
corvusclarkb, mordred, fungi, pabelanger: ^ that should be ready to merge22:08
*** e0ne has quit IRC22:08
mordredcorvus: so .. I think the get_server issue is something else - which I think is something I need to come up with a better logging/error for22:09
pabelangercorvus: +222:09
mordredcloud.compute not having a get_server is more likely to mean it wasn't able to construct a compute Adapter for some reason - but if that's the case it's a TERRIBLE error message for it22:10
mordredI *think* what it's saying is "this Proxy, which is a bare Proxy and not an openstack.compute.v2._proxy.Proxy, does not have a get_server method"22:11
*** betherly has quit IRC22:11
mordredbut I'll dig in to both things22:11
corvusmordred: i think so -- here's a repr() and dict() from when the error was happening: http://paste.openstack.org/show/736725/22:11
corvusthat's a <openstack.compute.v2._proxy.Proxy object at 0x7f24784b94e0>  when it's working22:12
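A small reproduction of the two symptoms being discussed, assuming a clouds.yaml entry named "mycloud" and placeholder image/flavor/server names (all made up here; this is not the real launch-node.py or dns.py code): when version discovery falls over, conn.compute ends up as a bare openstack.proxy.Proxy with no get_server, and the cloud-layer wait loop never sees the server go ACTIVE.

    import openstack

    conn = openstack.connect(cloud="mycloud")

    # healthy SDK: openstack.compute.v2._proxy.Proxy
    # broken discovery: bare openstack.proxy.Proxy -> AttributeError below
    print(type(conn.compute))
    server = conn.compute.get_server("some-server-uuid")  # placeholder id
    print(server.status)

    # the launch-node style call; with the broken SDK this spins until the
    # timeout even though the server actually built fine
    server = conn.create_server(
        name="clarkb-test",
        image="Ubuntu 18.04",      # placeholder image name
        flavor="8GB Standard",     # placeholder flavor name
        wait=True,
        timeout=600,
    )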
clarkbanyone want me to keep my three io test nodes around? if not I'll be deleting them shortly (one in bhs1, one in gra1 and one in sjc1)22:15
clarkbmnaser: centos7 image just started building a few minutes ago22:16
clarkbon nb0122:16
*** efried is now known as efried_out_til_j22:17
corvusthe "big project backlog" today is not so great.  neutron and tht at 6h, nova at 3.22:17
*** efried_out_til_j is now known as efried_cya_jan22:17
corvusby "not so great" i mean not large22:17
mordredcorvus: yah. that's what it SHOULD be22:18
corvusyeah, it seems like a reasonable amount of time.22:18
mordredcorvus: do you have any way to see what it is when it's broken without too much effort?22:18
clarkbcorvus: ya it's looking much better today than yesterday. Likely depending on whether or not long chains of changes are coming in for $project? at least that seemed to be the case with nova yesterday22:19
*** tpsilva has quit IRC22:19
*** jamesmcarthur has joined #openstack-infra22:19
corvusmordred: i defer to clarkb on that :)22:20
clarkbaroo?22:21
clarkbwhat the Proxy value is when it doesn't work?22:21
bkeroShush Agnew22:21
mordredclarkb: yah - my hypothesis is that it's an openstack.proxy.Proxy when it doesn't work22:22
mordredoh - wait. yes - of course it is22:22
corvusmordred: oh ha i crossed streams22:22
mordredthat is a bug we actually already have a workaround for in keystoneauth22:22
corvusi thought you were asking about backlogs22:23
corvusmordred: yes, it's a compute_v2 proxy when it works22:23
mordredhttps://review.openstack.org/#/c/621257/22:23
*** bobh has joined #openstack-infra22:23
mordredthat will fix the launch_node problem - it's the rackspace version discovery thing22:23
mordredor - it'll fix the dns.py problem22:23
mordredI do not yet know what the wait problem is22:23
mordredI was hoping we could get the rate limiting patch landed - but I think we just need to cut a new keystoneauth with that fix in22:24
*** wolverineav has quit IRC22:24
*** wolverineav has joined #openstack-infra22:24
mordredlbragstad, kmalloc: ^^ mind if I push up a ksa release patch?22:25
kmallocmordred: do eet22:25
lbragstadsure22:25
kmallocmordred: also...22:25
*** udesale has quit IRC22:25
kmallocmordred: +2 on rate limit22:25
*** eernst has quit IRC22:26
kmallocmordred: as long as we commit to getting an in-ksa functional test beyond the SDK case.22:26
kmallocmordred: but the SDK case is a good start22:26
* kmalloc has to go get car... it's finally repaired!22:26
mordredkmalloc, lbragstad: remote:   https://review.openstack.org/623090 Release 3.11.2 of keystoneauth22:27
mordredkmalloc: sweet. and yes - actually, before we land it, let's get the sdk consume patch fixed up and green22:27
mordredthat way we can at least have that and see that it works22:27
kmallocmordred: ++22:27
kmallocmordred: please mark the KSA -1 Workflow to hold until it's green or at least comment as much on it22:28
mordred++22:28
kmallocbut the code as of now, looks about right sans that functional test22:28
mordredkmalloc: done22:28
mordredwoot22:28
*** boden has quit IRC22:30
clarkbfungi: interesting thing I've noticed about e-r. A bunch of bug graphs have holes. At first I was worried that we broke indexing, then I realized after opening up those bug graphs that the hole is when we disabled bhs122:32
clarkbunfortunately I see that most of them are still recurring :/22:33
clarkbhttp://status.openstack.org/elastic-recheck/#1805176 http://status.openstack.org/elastic-recheck/#1763070 http://status.openstack.org/elastic-recheck/#1802640 http://status.openstack.org/elastic-recheck/#1745168 http://status.openstack.org/elastic-recheck/#1793364 http://status.openstack.org/elastic-recheck/#1806912 and probably others22:34
*** bobh has quit IRC22:34
*** dave-mccowan has quit IRC22:35
*** wolverineav has quit IRC22:36
corvusthat's pretty standout22:36
*** bobh has joined #openstack-infra22:37
*** e0ne has joined #openstack-infra22:37
corvusin a bit of irony, all of the jobs for the ze12 change have node assignments but are waiting for an executor to start22:38
*** e0ne has quit IRC22:38
pabelangeryah, I also noticed that22:39
pabelangersomething happened at 22:30 (gate reset maybe?) and now backlog in executor queue is growing again22:39
pabelangeryah, I think integrated queue22:40
clarkbpabelanger: ya tripleo had a couple failures, one was ntp sync and the other a tempest unittest failure22:40
clarkbintegrated queue may have also reset, not sure22:40
*** owalsh_ has joined #openstack-infra22:40
clarkbfwiw the tripleo ntp sync issue shows up pretty evenly distributed across the clouds so that is one I don't think is bhs1 related22:41
clarkbfungi: ^ re bhs1 we think we ruled out cpu time?22:41
mordredcorvus: well - I haven't found the wait_for_server issue yet - but I have found that we're apparently not doing discovery cache properly and are re-running discovery needlessly :(22:41
clarkbhttp://logs.openstack.org/01/619701/5/gate/tempest-slow/2bb461b/controller/logs/screen-n-api.txt.gz is a recent example of a bhs1 is "slow" failure22:41
mordredkmalloc: ^^ just fyi - I haven't diagnosed the issue yet22:41
clarkbthat was nova starting up taking 64 seconds which is longer than the devstack timeout allows for22:42
*** owalsh has quit IRC22:42
*** bobh has quit IRC22:42
clarkbin that particular case there is only one period of cpu wai of ~10% for a couple seconds, otherwise it actually looks pretty happy overall. Low load, no paging, no memory pressure and so on22:44
clarkbmriedem: if you look at http://logs.openstack.org/01/619701/5/gate/tempest-slow/2bb461b/controller/logs/screen-n-api.txt.gz any hunches to what is slow there? its actually pretty slow (there is a gap) right at the start22:44
clarkbmriedem: maybe if we can identify what is particularly slow there we can use that as a bread crumb to go further. (and maybe it is just disk IO though my test instance shows that is fine, so possibly more unhappy hypervisors?)22:45
mriedemplease hold dear caller, currently slitting wrists in nova22:46
clarkbmaybe I need to stay up late one of these evenings and try to catch amorin for paired debugging22:46
clarkbmriedem: I'm sorry, I hope that it gets better22:46
*** wolverineav has joined #openstack-infra22:47
pabelangerouch, another gate reset for integrated22:49
clarkbthe good news is our classification rate is reasonably high, so figuring this out should have a pretty big impact overall22:50
clarkbthe bad news is that means it is causing problems :?22:50
*** jamesmcarthur has quit IRC22:51
clarkbpabelanger: not a bhs1 failure though22:51
*** wolverineav has quit IRC22:51
pabelangeryah, but it just spiked our executor backlog again, 273 waiting now.22:51
pabelangermaybe we should enqueue 623067 to help get ahead of it22:52
clarkbya mostly just calling it out because addressing the oddity in bhs1 won't solve all the problems22:52
*** rcernin has quit IRC22:52
mordredcorvus: so - unfortunately, I cannot reproduce create_server(wait=True) breaking - oh - except ...22:52
*** rcernin has joined #openstack-infra22:52
mordredcorvus: that same bug would actually cause the wait loop to spin until it times out22:52
mordredcorvus: so - other than having provided you with absolutely atrocious error messages, this whole thing is fixed with that keystoneauth patch - which I have now submitted a release request for22:53
*** wolverineav has joined #openstack-infra22:53
corvusmordred: feel free to create ze13 if you want to test :)  we can defer adding it into the cluster until we're sure we want it22:53
mordredcorvus: I created a mttest already ... and it worked (this is with the patched ksa)22:54
corvusok good22:54
clarkbI'm going to compile a list of test nodes in the last 6 hours or so that seem to have failed in bhs1 where we have those gaps and maybe that can help debug if it is unhappy hypervisors22:54
mriedemclarkb: well we appear to be doing this an ass ton22:56
mriedemnova.scheduler.client.report._create_client22:56
mriedemDec 05 20:14:22.572362 ubuntu-xenial-ovh-bhs1-0000959981 devstack@n-api.service[23459]: DEBUG nova.compute.rpcapi [None req-87b2e9bd-e752-452f-9933-005e21b6841b None None] Not caching compute RPC version_cap, because min service_version is 0. Please ensure a nova-compute service has been started. Defaulting to current version. {{(pid=23463) _determine_version_cap /opt/stack/nova/nova/compute/rpcapi.py:397}} Dec 05 20:14:22.522:56
mriedem0 ubuntu-xenial-ovh-bhs1-0000959981 devstack@n-api.service[23459]: DEBUG oslo_concurrency.lockutils [None req-87b2e9bd-e752-452f-9933-005e21b6841b None None] Lock "placement_client" acquired by "nova.scheduler.client.report._create_client" :: waited 0.000s {{(pid=23463) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:327}}22:56
mriedemyikes22:56
mriedemrmine_version_cap called over 1k times on startup22:57
mriedem*determine_version_cap22:57
mriedemdansmith: ^22:57
openstackgerritMarcH proposed openstack-infra/git-review master: test_uploads_with_nondefault_rebase: fix git screen scraping  https://review.openstack.org/62309622:57
mriedemmy guess is,22:58
mriedemnova.compute.api.API init per extension, per api worker22:58
dansmithmriedem: instantiating a compute rpc somewhere that gets called a lot?22:58
mriedemso at least 2 workers22:58
mriedem~50+ extensions22:58
*** bobh has joined #openstack-infra22:59
dansmiththe value is cached per worker, and the warning is calling out a pretty bad sitch22:59
dansmithI guess we could try to only log once per process or something22:59
mriedemthere just aren't any compute services started before the api in a devstack run22:59
mriedemalso...23:00
mriedemthis would be looking in cell0...23:00
mriedemwhere we'll never have computes23:00
clarkbmriedem: ok so devstack is detecting it as nova-api hasn't started, but root causing it, it's something deeper23:00
mriedemdevstack times out after 60 seconds23:00
clarkbya and it took 64 seconds to start23:00
mriedemthis is taking 6423:00
mriedemDec 05 20:14:27.967716 ubuntu-xenial-ovh-bhs1-0000959981 devstack@n-api.service[23459]: WSGI app 0 (mountpoint='') ready in 64 seconds on interpreter 0x27018c0 pid: 23462 (default app)23:00
clarkbmriedem: mostly I'm trying to characterize the slowness there so that we can debug it better23:01
clarkbas it may be cloud provider side23:01
clarkbthe good news is (if we ignore the tripleo failures that show the gap because it's convenient) there are only a small number of failures since amorin made changes this morning23:01
mriedemthis is one of the slow ovh-bhs1 nodes23:01
dansmithmriedem: ah, well, the cell0 thing is legit23:02
mriedemso my point is, we're doing this db lookup api workers * extensions for something that will never return anything because23:02
mriedemconnection = mysql+pymysql://root:secretdatabase@127.0.0.1/nova_cell0?charset=utf823:02
*** jamesmcarthur has joined #openstack-infra23:02
dansmithyeah, that's fair23:02
dansmithbut23:02
mriedemand we don't have nova-compute services in cell023:02
dansmithit also means we probably have a bug there23:02
clarkbmriedem: ok so it's two things then, yes slow bhs1 node, but maybe it illustrates a cell0 bug?23:03
dansmithbecause we should be scattering and gathering, even though that won't fix the time zero problem23:03
mriedemdansmith: but if we cached?23:03
clarkb(I'm mostly interested in figuring out the slow bhs1 nodes, but happy to help on the other thing too)23:03
mriedemwe're not caching b/c 023:03
fungiclarkb: yes, amorin confirmed cpu oversub in bhs1 was 2:1 and analysis of logstash records after we halved max-servers there still showed 20x as many job timeouts as gra123:03
*** bobh has quit IRC23:03
dansmithmriedem: but we'll continue looking at cell0 I imagine23:03
mriedemwhen n-api starts in devstack we won't have any computes, so this would just always be 0 on first startup23:03
mriedembut yeah we are not iterating the cells23:04
mriedemshould be calling get_minimum_version_all_cells yeah?23:04
clarkbfungi: cool fwiw looking at dstat there is significant cpu idle time when things time out so not surprising it isn't cpu (or at least doesn't appear to be)23:04
dansmithwe have service version get all cells23:04
dansmithfor a reason23:04
dansmithyeah that23:04
dansmithmriedem: we probably need a signal that none were found and avoid logging until there are computes that are old, not just missing23:05
dansmithalthough that could be a legit misconfig too I guess23:05
*** wolverineav has quit IRC23:05
dansmithmriedem: if you file a bug and/or remind me tomorrow I could probably take a look at that23:05
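Purely illustrative (this is not nova code): the shape of the "cache per worker, warn once" behaviour being discussed, where the expensive cross-cell minimum-version lookup is only repeated until it returns something useful and the scary warning is logged once per process instead of once per API extension.

    import logging

    LOG = logging.getLogger(__name__)

    _version_cap = None
    _warned_no_computes = False

    def lookup_min_compute_version():
        # stand-in for the cross-cell service version query
        # (the real helper mriedem names is get_minimum_version_all_cells)
        return 0

    def determine_version_cap():
        global _version_cap, _warned_no_computes
        if _version_cap:
            return _version_cap
        minimum = lookup_min_compute_version()
        if minimum == 0:
            # no computes registered yet: don't cache, and only warn once
            if not _warned_no_computes:
                LOG.warning("No nova-compute services found; defaulting "
                            "version cap to current version")
                _warned_no_computes = True
            return None
        _version_cap = minimum
        return _version_cap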
*** jamesmcarthur has quit IRC23:05
pabelangeranother gate reset, this time cinder and constraints job23:05
mriedemdansmith: will do23:06
*** wolverineav has joined #openstack-infra23:12
pabelangercorvus: clarkb: we now have about the same number of jobs running as are queued in the executor backlog, starting builds seems to be our hurdle right now. And with the last few gate resets in the last hour, executors cannot get ahead. At this rate, i think we are still another hour out for 623067 to land.23:12
corvuspabelanger: seems likely.  tripleo is about to reset again too.23:14
corvus(that job at the top timed out)23:14
clarkbinfra-root I've started to summarize on https://etherpad.openstack.org/p/bhs1-test-node-slowness23:14
corvus(it was on bhs1)23:14
clarkbin the case of tripleo and bhs1 I think their jobs put a lot more demand on the test nodes than eg tempest23:15
*** larainema has joined #openstack-infra23:15
clarkband tripleo may be our noisy neighbor there possibly compounded by some external slowness23:15
mriedemclarkb: dansmith: https://bugs.launchpad.net/nova/+bug/180704423:16
openstackLaunchpad bug 1807044 in OpenStack Compute (nova) "nova-api startup does not scan cells looking for minimum nova-compute service version" [Medium,Confirmed]23:16
clarkbfrom what I can tell looking at things that are not tripleo we have far fewer failures on bhs1 since amorin made changes this morning23:16
clarkbwe need more data to say for sure, but I think those changes did improve the situation for us.23:16
clarkbmriedem: thank you23:17
clarkbmwhahaha: ssbarnea|rover http://logs.openstack.org/99/616199/4/gate/tripleo-buildimage-overcloud-full-centos-7/c027629/job-output.txt.gz#_2018-12-05_22_56_12_383902 a case of broken ansible?23:21
mwhahahahmm that's odd23:22
mwhahahawe set environment_type in job defs (i think), maybe it got missed23:22
corvusit takes longer to start a big integration job for a change deep in the integrated gate, compounding the problem with the executors.23:23
mwhahahayea it inherits from tripleo-ci-base which does not have the environment_type set23:23
* mwhahaha fixes23:23
corvusclarkb, pabelanger: we also seem to be using swap a lot23:24
clarkbcorvus: ya particularly in the tripleo jobs I think. I noticed there is a ton of paging on some of those jobs23:25
*** rlandy is now known as rlandy|bbl23:25
*** owalsh_ is now known as owalsh23:25
corvusclarkb: i mean on the executors23:25
clarkboh23:25
clarkbinteresting23:25
notmynameI was going to try to make a patch for zuul's dashboard, but then I realized I don't know anything about react23:26
corvushttp://cacti.openstack.org/cacti/graph.php?local_graph_id=64005&rra_id=all23:26
corvusnotmyname: https://www.softwarefactory-project.io/react-for-python-developers.html23:26
clarkbcorvus: there is free memory available on that executor too23:26
mwhahahaclarkb: hmm in poking at these base definitions, where is the 'multinode' base job defined? is that in zuul somewhere23:26
notmynamecorvus: you certainly had that link available quickly :-)23:27
mwhahahanm i found it in zuul-jobs/zuul.yaml23:27
corvusnotmyname: :)  https://www.google.com/search?q=tristan+react+python23:27
*** wolverineav has quit IRC23:27
corvusat least, that works in my google bubble23:27
notmynameheh23:27
notmynamewho's tristan?23:28
corvusnotmyname: tristanC, he wrote most of the current dashboard23:28
notmynameah, ok23:28
clarkbmwhahaha: it is defined in openstack-infra/zuul-jobs/zuul.yaml23:29
clarkbmwhahaha: oh you found it :)23:29
corvusclarkb: yeah, and there are no memory hogs, so i'm kinda curious what's getting swapped out23:29
clarkbcorvus: ya top says ~4% for executor then a bunch of 1%ish ansible playbooks23:30
mordrednotmyname: https://zuul-ci.org/docs/zuul/developer/javascript.html also might be helpful, toolchain-wise23:30
clarkbcorvus: apparently /proc can tell us /me looks23:30
corvusclarkb: oh, tell me about this magic when you get a chance :)23:31
*** mriedem has quit IRC23:31
mwhahahaclarkb: actually this is caused by https://review.openstack.org/#/c/618669/10/playbooks/tripleo-ci/post.yaml in the gate23:31
clarkbcorvus: VmSwap: 2663556 kB for the executor says /proc. Uh its grep VmSwap in /proc/$pid/status23:31
* fungi throws rip torn style glitter in the air and shouts "kernel magic!"23:32
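The same VmSwap field can be collected across all processes to see what is actually holding the swap; a quick sketch that ranks the top offenders (run as root so every pid's status file is readable):

    import glob

    usage = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited while we were reading
        swap = fields.get("VmSwap", "0 kB").strip()
        kb = int(swap.split()[0]) if swap.endswith("kB") else 0
        if kb:
            usage.append((kb, fields["Name"].strip(), path.split("/")[2]))

    # largest swap users first
    for kb, name, pid in sorted(usage, reverse=True)[:10]:
        print("%10d kB  %s (pid %s)" % (kb, name, pid))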
clarkbmwhahaha: ah, so I got ahead of myself with the debugging. Though not sure if that change will fail on its own23:32
mwhahahait won't23:32
mwhahahabut it didn't fail that job either23:32
clarkbcorvus: I think that's a big chunk of it there, odd that its resident size is comparatively tiny23:33
mwhahahaunless i'm missing something, http://logs.openstack.org/99/616199/4/gate/tripleo-buildimage-overcloud-full-centos-7/c027629/job-output.txt.gz#_2018-12-05_22_56_13_434974 seems to show that the error was just ignored23:34
clarkbmwhahaha: it resulted in a post-failure result for the job. I think the "normal" there means that we didn't short circuit or abandon the play, it finished "normally" with error23:34
mwhahahaok i'll kick the other change out of the gate23:35
corvusclarkb: i don't see much of significance that has changed in the executor code over the past few months23:35
clarkbcorvus: I wonder if this is a side effect of how python does dicts and data structures in general. basically it grows them and they never shrink (granted releasing memory back to the OS is not easy outside of python either)23:36
clarkbcorvus: at some point the executor may have allocated a bunch of memory for "something"23:37
clarkband now it's not needed so we see the delta between rss and vmswap23:37
fungiyes, it's not uncommon to see a python process gobbling lots of memory but with a small resident size23:37
pabelangeroh, I didn't even think to check cacti.o.o. Yah, wonder what is going on there23:37
fungiwe likely need some sort of python memory profiler imported and logging sizes of (used and cleared) structures to know where it's all going23:38
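The stdlib tracemalloc module is one way to get that kind of data; where to hook it into the executor is the open question, but the mechanics look roughly like this (the workload line is just a stand-in allocation):

    import tracemalloc

    tracemalloc.start(25)  # keep 25 stack frames per allocation

    # ... run the suspect code here ...
    data = [("x" * 100) for _ in range(100000)]  # stand-in allocation

    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:10]:
        print(stat)  # file:line, total size, allocation count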
corvushttp://cacti.openstack.org/cacti/graph_view.php?action=preview&host_id=0&rows=100&columns=-1&graph_template_id=41&thumbnails=true&filter=ze23:38
clarkbstarted ~week and a half ago? likely unrelated to the relative priority changes then23:39
clarkbbut they all started doing it together23:39
clarkb(for the most part)23:39
fungize02 is either getting overwhelmed or is suffering unrelated packet loss from cacti.o.o (ongoing rackspace ipv6 issues?)23:40
clarkbfungi: it has hadthose issues in the past23:40
fungi~1.5 weeks ago is when a lot of our contributors decided their holiday vacation following the summit was over23:40
clarkbmaybe we should just rebuild it23:40
corvusclarkb: the only restart i have in the logs is from nov 28.23:41
corvuswhich seems right around the beginning of the increase23:41
corvusthat was 3.3.1.dev61  (whatever that is)23:42
corvusi don't know what it was running before23:42
pabelangerthat's when we did the security fix, right?23:42
corvusyep23:42
fungiso could be that a change between the 2018-11-28 restart and the one before it introduced a memory leak, or just a memory inefficiency maybe23:42
fungidoesn't look especially leaky as it's not really climbing23:43
*** graphene has quit IRC23:43
clarkbcorvus: 8715505e6d38c092257179b8a089a2a560df5e58 that should be 3.3.1.dev61 according to git describe23:44
clarkbthat was the restart for the security fix23:44
*** _alastor_ has quit IRC23:44
corvusclarkb: makes sense23:44
*** wolverineav has joined #openstack-infra23:45
clarkblooking at commits before that, the semaphore fix, the "python3 can't do unbuffered binary io" fix, and possibly "Include executor variables in initial setup call" stand out to me reading the log23:47
clarkbah we don't run the console logger thing under python3 currently so that binary io one should be a noop23:48
clarkbhowever, that does make me wonder if the place where we might be allocating large buffers then not using them much of the time is the log streaming23:49
clarkbthat goes through the executor right?23:49
*** jamesmcarthur has joined #openstack-infra23:50
*** bobh has joined #openstack-infra23:51
corvusclarkb: zuul_console runs on the remote node, so shouldn't affect things.  yes, the executor has a subprocess which handles log streaming23:51
corvusclarkb: it's not the one that's eating up all that swap23:51
corvusit's an order of magnitude smaller23:51
corvuson ze07: 2018-11-15 23:13:16,178 DEBUG zuul.Executor: Configured logging: 3.3.1.dev3123:52
corvusit must have rebooted or something23:53
corvusits graphs don't look like the rest23:53
clarkbb31b866fbca35939204bb69c447cd68412efa9ce dev31 is that commit I think23:54
*** jamesmcarthur has quit IRC23:54
corvusand 2018-11-06 19:28:34,195 DEBUG zuul.Executor: Configured logging: 3.3.1.dev1523:55
corvuson ze0923:55
corvusalso has different-looking graphs23:55
clarkbbetween 31 and 61 in the executor server file there are changes to job output parsing for the line-based commenting23:56
corvusze07's increased usage starts shortly after that restart23:56
clarkbother than that its just winrm fixes23:56
corvusze09 looks suspicious after its restart as well23:57
clarkbactually maybe it's not per-line commenting. But it is something that iterates through the log file then modifies it if an error condition is met23:58
clarkb5e9f77326 is the commit that changes things in that part of the code23:59
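Whether that commit really reads the whole job-output file into memory is unverified here, but the usual way to keep a post-run error scan from ballooning the process is to walk the file lazily, line by line; a minimal illustrative sketch (the marker strings are placeholders, not zuul's actual matching logic):

    def find_failed_tasks(path):
        # stream the log instead of slurping it; only matching lines are kept
        matches = []
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if "RESULT_TIMED_OUT" in line or "fatal:" in line:
                    matches.append((lineno, line.rstrip()))
        return matches

    for lineno, line in find_failed_tasks("job-output.txt"):
        print("%d: %s" % (lineno, line))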
