Tuesday, 2019-06-25

*** smarcet has quit IRC00:06
sgwclarkb: thanks, I knew that part I will play around some more.00:07
fungisgw: yeah, pbr versions have a specification which is basically a mashup of the semver spec and pep 440. pbr has convenience functions to convert its versions to deb and rpm version equivalents (particularly crucial for those ecosystems' "sort before" operators for pre-release versions)00:10
fungibut it doesn't have tools to do the reverse00:11
*** mattw4 has quit IRC00:12
*** sgw has quit IRC00:13
*** igordc has quit IRC00:13
clarkbianw: there were job failures in the kafs stack changes that got enqueued to the gate00:15
clarkbI'm not likely to be able to look at those until tomorrow as I'll need to make dinner soonish00:15
ianwclarkb: ok, will look ... just looking at some tcpdumps on the iad mirror00:16
*** smarcet has joined #openstack-infra00:16
ianw2001:4802:7802:104:be76:4eff:fe20:4b35 > 2a04:4e42::223: [icmp6 sum ok] ICMP6, destination unreachable,  unreachable prohibited 2001:4802:7802:104:be76:4eff:fe20:4b3500:17
*** smarcet has quit IRC00:18
ianwthat seems to suggest that our firewall (4b35) is rejecting stuff from fastly ... i wonder if we have some sort of connection tracking issue in the firewall rules00:18
openstackgerritMerged opendev/system-config master: Use systemd-timesyncd on Bionic  https://review.opendev.org/66526900:19
*** rfolco has joined #openstack-infra00:19
fungiianw: but is the stuff from fastly related to connections we're initiating?00:21
ianwfungi: i think yes ... that's what's so weird00:21
fungiare you able to find where we initiated the connection?00:22
fungior seeing *any* tcp6 outbound to that address for that matter?00:22
ianw 00:18:53.491978 IP6 (flowlabel 0x0403d, hlim 64, next-header TCP (6) payload length: 752) 2001:4802:7802:104:be76:4eff:fe20:4b35.43778 > 2a04:4e42::223.https: Flags [P.], cksum 0xc42c (correct), seq 3202292652:3202293372, ack 1673101527, win 42, options [nop,nop,TS val 4150605139 ecr 1505017146], length 72000:23
fungiokay, yep, that does look like it would be part of a connection we initiated in that case00:23
ianwyou can "telnet -6 2a04:4e42::223 80" too00:23
fungineat, so there are at least *some* v6 addresses with which it can have bi-directional communication00:24
ianwhttp://paste.openstack.org/show/753333/ ... i don't know what that means, with the same flow label each time00:25
fungiyeah, that does indeed suggest something about the packets we're receiving don't match expected state for established connections we've made00:27
*** ekultails has quit IRC00:30
*** gregoryo has joined #openstack-infra00:34
openstackgerritMerged opendev/storyboard master: Correct team iterator lists in worklist creation  https://review.opendev.org/66724800:45
fungineat, the coreboot project is running gerrit 3! https://review.coreboot.org/00:46
fungiabout to build their flashrom project from source on openbsd (don't ask)00:46
*** dchen has quit IRC00:55
*** dchen has joined #openstack-infra00:55
*** dayou_ has quit IRC00:56
*** rfolco has quit IRC01:33
*** dayou_ has joined #openstack-infra01:36
*** bhavikdbavishi has joined #openstack-infra01:47
*** rcernin has quit IRC01:48
*** rcernin has joined #openstack-infra01:48
*** bhavikdbavishi1 has joined #openstack-infra01:50
*** bhavikdbavishi has quit IRC01:51
*** bhavikdbavishi1 is now known as bhavikdbavishi01:51
*** lathiat has quit IRC01:54
*** smarcet has joined #openstack-infra01:56
*** apetrich has quit IRC01:58
*** hongbin has joined #openstack-infra02:04
*** sshnaidm has quit IRC02:05
*** whoami-rajat has joined #openstack-infra02:07
*** bhavikdbavishi has quit IRC02:08
*** diablo_rojo has quit IRC02:17
*** auristor has quit IRC02:37
*** auristor has joined #openstack-infra02:39
*** rcernin has quit IRC02:42
*** rcernin has joined #openstack-infra02:42
*** armstrong has quit IRC02:45
*** yamamoto has joined #openstack-infra02:50
*** lathiat has joined #openstack-infra02:52
*** aedc has quit IRC02:54
*** yamamoto_ has joined #openstack-infra02:55
*** yamamoto_ has quit IRC02:56
*** yamamoto has quit IRC02:56
*** yamamoto has joined #openstack-infra02:59
*** yamamoto has quit IRC02:59
*** rcernin has quit IRC03:04
*** rcernin has joined #openstack-infra03:04
*** bhavikdbavishi has joined #openstack-infra03:07
*** npochet has quit IRC03:19
*** npochet has joined #openstack-infra03:19
*** mnaser has quit IRC03:19
*** mnaser has joined #openstack-infra03:20
*** gagehugo has quit IRC03:22
*** gagehugo has joined #openstack-infra03:22
*** gagehugo has quit IRC03:25
*** gagehugo has joined #openstack-infra03:28
*** ykarel|afk has joined #openstack-infra03:30
*** rcernin has quit IRC03:30
*** rcernin has joined #openstack-infra03:31
*** psachin has joined #openstack-infra03:39
*** virendra-sharma has joined #openstack-infra03:46
*** sweston has quit IRC03:51
*** sweston has joined #openstack-infra03:51
*** udesale has joined #openstack-infra03:53
*** ykarel|afk is now known as ykarel04:23
*** hongbin has quit IRC04:25
*** virendra-sharma has quit IRC04:26
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Expand documentation of test-setup role  https://review.opendev.org/66724404:27
*** virendra-sharma has joined #openstack-infra04:28
*** ykarel has quit IRC04:32
openstackgerritMerged openstack/project-config master: Revert "Revert "Revert "Disable provider limestone"""  https://review.opendev.org/66725004:32
*** sgw has joined #openstack-infra04:34
*** yamamoto has joined #openstack-infra04:34
*** yamamoto has quit IRC04:39
openstackgerritMerged openstack/project-config master: New project request: airship/docs  https://review.opendev.org/66619004:39
*** sgw has quit IRC04:41
*** hongbin has joined #openstack-infra04:42
*** ykarel has joined #openstack-infra04:48
*** hongbin has quit IRC04:49
*** smarcet has quit IRC04:52
*** kjackal has joined #openstack-infra05:03
adriantI've got zuul suddenly failing with no code changes with an error for: EnvironmentError: mysql_config not found05:39
adriantwhich is usually a lack of related mysql libs installed by the OS package manager05:39
adrianthas our zuul base image changed recently?05:39
AJaegeradriant: yes, let me find a link for you...05:43
AJaegeradriant: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html05:43
AJaegeradriant: create a bindep.txt file and add mysql to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it05:44
adriantAJaeger: ty!05:50
*** jtomasek has joined #openstack-infra05:51
adriantAJaeger: should I mostly be worrying about [platform:dpkg] as that's what zuul is built on, or should i try and cover most of the bases?05:54
*** pcaruana has joined #openstack-infra05:56
*** pcaruana has quit IRC05:57
*** pcaruana has joined #openstack-infra05:57
openstackgerritMerged zuul/zuul-jobs master: Add install-devstack role  https://review.opendev.org/66715705:58
openstackgerritMerged zuul/zuul-jobs master: Expand documentation of test-setup role  https://review.opendev.org/66724405:58
AJaegeradriant: I would copy gentoo, rpms as well - better now to have it in than hunt down a problem later06:00
*** lpetrut has joined #openstack-infra06:01
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/66726806:08
openstackgerritMerged opendev/system-config master: Role integration-tests : use a group match for openafs  https://review.opendev.org/66558506:19
*** pgaxatte has joined #openstack-infra06:26
openstackgerritMerged opendev/system-config master: Use openstack-ci-core PPA for openafs 1.8.3  https://review.opendev.org/66532006:29
*** e0ne has joined #openstack-infra06:29
openstackgerritMerged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/66726806:35
*** slaweq has joined #openstack-infra06:38
*** rcernin has quit IRC06:41
*** altlogbot_0 has quit IRC06:46
openstackgerritMerged opendev/system-config master: Separate openafs CI mirror  https://review.opendev.org/66556806:47
*** altlogbot_1 has joined #openstack-infra06:49
*** altlogbot_1 has quit IRC06:50
*** dpawlik has joined #openstack-infra06:55
*** altlogbot_2 has joined #openstack-infra06:55
*** e0ne has quit IRC06:57
openstackgerritAdam Coldrick proposed opendev/storyboard master: Correct team iterator lists in board creation  https://review.opendev.org/66727506:58
*** jpich has joined #openstack-infra07:07
*** tesseract has joined #openstack-infra07:11
*** ginopc has joined #openstack-infra07:14
yoctozeptowe have issues with Zuul atm - http://zuul.openstack.org/builds?project=openstack%2Fkolla&project=openstack%2Fkolla-ansible&result=retry_limit07:14
yoctozeptoalmost everything is in retry_limit either telling us to finger or that connections to all slave nodes were lost :/07:15
yoctozeptothis is for different branches and pipelines07:16
AJaegeryoctozepto: when did this start?07:18
AJaegeryoctozepto: might be that a binary is missing, please read http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html07:19
yoctozeptoAJaeger: seems like during the night07:19
AJaegeryoctozepto: the changes mentioned merged after 19:00 UTC last night...07:20
yoctozeptoAJaeger: https://review.opendev.org/663151 ?07:21
AJaegerdo you use fetch zuul-cloner?07:22
AJaegerI was thinking about https://review.opendev.org/#/c/656195/07:22
*** xek has joined #openstack-infra07:22
AJaegeryoctozepto: did you follow the log files and were able to catch content?07:23
yoctozeptoAJaeger: yeah, thought about it, now I am connected to some that I believe would fail07:24
yoctozeptorunning fine so far07:24
*** hrw has joined #openstack-infra07:25
*** yboaron_ has joined #openstack-infra07:26
yoctozeptosee, some jobs still passed during that time07:26
*** iurygregory has joined #openstack-infra07:27
AJaegercould also be another problem...07:28
*** tosky has joined #openstack-infra07:28
*** jtomasek has quit IRC07:31
*** jtomasek has joined #openstack-infra07:32
*** Emine has joined #openstack-infra07:33
*** hrw has left #openstack-infra07:35
*** yboaron_ has quit IRC07:36
*** yboaron_ has joined #openstack-infra07:36
*** iurygregory has quit IRC07:37
*** virendra-sharma has quit IRC07:38
*** ykarel is now known as ykarel|lunch07:41
*** ccamacho has joined #openstack-infra07:41
*** ccamacho has quit IRC07:41
*** sshnaidm has joined #openstack-infra07:42
*** ccamacho has joined #openstack-infra07:43
*** jpena|off is now known as jpena07:48
*** jpena is now known as jpena|mtg07:48
*** apetrich has joined #openstack-infra07:54
*** udesale has quit IRC08:01
*** e0ne has joined #openstack-infra08:01
*** udesale has joined #openstack-infra08:03
*** udesale has quit IRC08:03
*** udesale has joined #openstack-infra08:03
*** ricolin has joined #openstack-infra08:05
*** yboaron_ has quit IRC08:07
yoctozeptoAJaeger: all the jobs I watched completed fine, seems to no longer be a problem, but nevertheless it failed many of those night jobs08:07
*** dchen has quit IRC08:09
*** lucasagomes has joined #openstack-infra08:11
*** pkopec has joined #openstack-infra08:12
*** ricolin has quit IRC08:13
*** yboaron_ has joined #openstack-infra08:21
*** ricolin has joined #openstack-infra08:22
*** tkajinam has quit IRC08:27
*** tkajinam has joined #openstack-infra08:28
*** ralonsoh has joined #openstack-infra08:28
*** yboaron_ has quit IRC08:28
*** yboaron_ has joined #openstack-infra08:29
*** tkajinam has quit IRC08:29
*** noama has joined #openstack-infra08:30
AJaegeryoctozepto: strange ;/08:32
*** gregoryo has quit IRC08:33
*** ykarel|lunch is now known as ykarel08:40
*** imacdonn has quit IRC08:42
*** imacdonn has joined #openstack-infra08:42
*** priteau has joined #openstack-infra08:56
*** jaosorior has joined #openstack-infra09:03
*** yolanda has joined #openstack-infra09:07
*** gfidente has joined #openstack-infra09:09
*** jaosorior has quit IRC09:11
yoctozeptoAJaeger: and here it comes again, Zuul again tells us to finger :P09:12
*** dciabrin__ has joined #openstack-infra09:23
*** dciabrin_ has quit IRC09:27
openstackgerritjacky06 proposed openstack/os-testr master: Replace git.openstack.org URLs with opendev.org URLs  https://review.opendev.org/65506209:33
*** kobis1 has joined #openstack-infra09:36
*** jangutter_ has joined #openstack-infra09:43
*** noonedeadpunk has quit IRC09:44
*** bhavikdbavishi has quit IRC09:47
*** panda has quit IRC09:55
*** gmann has quit IRC09:57
*** salv-orlando has joined #openstack-infra09:58
*** panda has joined #openstack-infra10:00
*** Lucas_Gray has joined #openstack-infra10:02
*** ociuhandu has joined #openstack-infra10:13
*** yamamoto has joined #openstack-infra10:15
*** ociuhandu_ has joined #openstack-infra10:16
*** ociuhandu has quit IRC10:16
*** yamamoto has quit IRC10:17
*** virendra-sharma has joined #openstack-infra10:22
*** tdasilva_ has quit IRC10:24
*** Lucas_Gray has quit IRC10:27
*** kobis1 has quit IRC10:28
*** Lucas_Gray has joined #openstack-infra10:29
*** ykarel is now known as ykarel|meeting10:35
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213410:36
*** gfidente has quit IRC10:37
*** kobis1 has joined #openstack-infra10:42
*** lucasagomes has quit IRC10:56
*** lucasagomes has joined #openstack-infra10:57
*** gmann has joined #openstack-infra10:58
*** ykarel_ has joined #openstack-infra10:59
*** jaosorior has joined #openstack-infra10:59
*** ykarel_ has quit IRC11:00
*** ykarel has joined #openstack-infra11:01
*** ykarel|meeting has quit IRC11:01
*** jaosorior has quit IRC11:02
*** jaosorior has joined #openstack-infra11:03
*** e0ne has quit IRC11:05
*** e0ne has joined #openstack-infra11:05
*** udesale has quit IRC11:06
*** udesale has joined #openstack-infra11:08
*** ykarel_ has joined #openstack-infra11:09
*** Lucas_Gray has quit IRC11:10
*** Wryhder has joined #openstack-infra11:10
*** Wryhder is now known as Lucas_Gray11:11
AJaegeryoctozepto: this needs an infra-root to dig into - and more information probably11:11
*** ykarel has quit IRC11:12
*** dciabrin_ has joined #openstack-infra11:13
*** iurygregory has joined #openstack-infra11:15
*** dciabrin__ has quit IRC11:17
*** gfidente has joined #openstack-infra11:23
*** tobiash has joined #openstack-infra11:25
*** hwoarang has quit IRC11:29
*** hwoarang has joined #openstack-infra11:30
openstackgerritMerged opendev/storyboard master: Correct team iterator lists in board creation  https://review.opendev.org/66727511:36
*** priteau has quit IRC11:43
*** tdasilva has joined #openstack-infra11:46
*** _erlon_ has joined #openstack-infra11:48
*** psachin has quit IRC11:54
*** jamesdenton has quit IRC11:54
zbr|ruckso does anyone have an idea about what happens with the RETRY_LIMIT?11:55
*** bhavikdbavishi has joined #openstack-infra11:56
pabelangerzbr|ruck: https://zuul-ci.org/docs/zuul/user/jobs.html#build-status11:56
*** bhavikdbavishi1 has joined #openstack-infra11:59
*** goldyfruit has joined #openstack-infra12:00
yoctozeptopabelanger: good one, made me laugh12:00
*** bhavikdbavishi has quit IRC12:00
*** bhavikdbavishi1 is now known as bhavikdbavishi12:00
openstackgerritAndy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute  https://review.opendev.org/66734212:00
yoctozeptozbr|ruck probably means about the current state of Zuul12:00
pabelangeryoctozepto: do you have a log file?12:00
yoctozeptowhich discards most jobs with it12:00
pabelangeroh, let me check12:01
openstackgerritAndy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute  https://review.opendev.org/66734212:01
yoctozeptoand shows us a finger url12:01
pabelangerjust coming online, with no coffee :)12:01
yoctozeptoprobably the middle one12:01
yoctozeptonasty piece of software12:01
yoctozepto:D12:01
pabelangerlooking now12:01
yoctozeptopabelanger: thanks12:02
fungii'm not really around yet either, but i think we need to take limestone back offline: http://cacti.openstack.org/cacti/graph_view.php12:03
fungilogan-: ^12:03
fungii'll get a revert of the revert of the revert of... pushed up now12:03
pabelangerso https://zuul.openstack.org/config-errors is the first issue12:04
pabelangerthere are a few config errors12:04
pabelangerbut, that doesn't seem to be the source of the issues12:04
*** Lucas_Gray has quit IRC12:04
pabelangerif somebody wants to look into that12:04
openstackgerritAndy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute  https://review.opendev.org/66734212:04
openstackgerritJeremy Stanley proposed openstack/project-config master: Revert "Revert "Revert "Revert "Disable provider limestone""""  https://review.opendev.org/66734312:05
*** goldyfruit has quit IRC12:05
pabelangerhttp://logs.openstack.org/09/665909/1/gate/openstack-tox-pylint/fc556d6/job-output.txt.gz#_2019-06-25_11_35_32_61431212:05
fungiand i'll bypass zuul for that since there's a good chance it won't merge12:05
pabelangerlooks like fall out of bindep being removed^12:05
AJaegercloudnull: please finish the retirement of   openstack/ansible-role-tripleo-cookiecutter, I commented on https://review.opendev.org/#/c/66489512:05
pabelangerthat is trove12:05
AJaegerfungi, +212:06
openstackgerritMerged openstack/project-config master: Revert "Revert "Revert "Revert "Disable provider limestone""""  https://review.opendev.org/66734312:07
AJaegerpabelanger: indeed, trove has no bindep.txt file....12:07
fungiwell, having no bindep.txt should be fine as long as your jobs install the right packages already. if you need any additional system packages not installed by your jobs, then a bindep.txt is likely warranted (and i'm guessing that's trove's situation)12:08
AJaegerfungi: yes, exactly - pabelanger linked to a line that uses mysql which is not installed by default12:08
AJaegerzuul.openstack.org looks broken to me - stays in "Fetching info..." with no output12:09
pabelangerHmm, is zuul.o.o not working for anybody else?12:09
pabelangerAJaeger: yah, I see that too12:10
AJaegerpabelanger: yes, me12:10
pabelangerzuul.o.o is swapping12:10
*** Lucas_Gray has joined #openstack-infra12:10
pabelangerhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64794&rra_id=all12:10
pabelangerlooks like a memory leak12:11
pabelangerhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all12:11
AJaegerthat could explain some retries as well, wouldn't it?12:11
pabelangerinfra-root: I won't be able to deal with zuul.o.o outage, sadly. But looks like we need to restart12:11
AJaegerfungi, the RETRY_LIMIT might come from misbehaving Zuul as well, won't they?12:12
elodhi, I've also noticed that a lot of periodic jobs failed today with missing 'mysqladmin' and 'dot' commands. Is a bindep.txt with mysql-client and/or graphviz a good solution for these issues? what do you suggest?12:12
*** rfolco has joined #openstack-infra12:12
AJaegerelod: let me paste backscroll...12:13
AJaegerelod: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html12:13
AJaegerelod: create a bindep.txt file and add mysql and graphviz to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it12:13
AJaegerSo, yes, this is exactly the required solution, elod12:14
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213412:15
elodAJaeger: thanks, i've read that, just wanted to reassure12:15
elodAJaeger: then I'll start fixing and testing12:16
*** virendra-sharma has quit IRC12:16
AJaegerthanks, elod !12:18
*** salv-orlando has quit IRC12:19
*** tdasilva has quit IRC12:20
*** tdasilva has joined #openstack-infra12:25
openstackgerritAndreas Jaeger proposed openstack/project-config master: Fix retirement of ansible repos  https://review.opendev.org/66734612:26
AJaegerpabelanger: this should fix the Zuul errors you noticed ^12:26
*** rlandy has joined #openstack-infra12:29
*** goldyfruit has joined #openstack-infra12:33
openstackgerritFabien Boucher proposed zuul/zuul master: Remove non working tests/base.py ZuulTestCase.getPipeline method  https://review.opendev.org/66735112:40
*** ykarel_ is now known as ykarel|away12:48
*** jcoufal has joined #openstack-infra12:50
*** mriedem has joined #openstack-infra12:50
openstackgerritAndreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove job legacy-puppet-beaker-rspec  https://review.opendev.org/66735712:50
*** jangutter_ is now known as jangutter12:51
portdirecthey - we have been having some issues with our docs jobs over the last day12:54
portdirecteg: http://logs.openstack.org/24/667224/2/check/openstack-tox-docs/fa79d3d/job-output.txt.gz#_2019-06-24_20_50_55_52489912:55
portdirectlooks like gettext is missing?12:55
AJaegerportdirect, this could be a fallout from http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html12:55
AJaegerSo, create a bindep.txt file and add gettext to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it12:56
portdirectawesome - thanks AJaeger12:57
*** ykarel|away has quit IRC13:01
*** smarcet has joined #openstack-infra13:01
*** ekultails has joined #openstack-infra13:03
*** dave-mccowan has joined #openstack-infra13:06
*** aaronsheffield has joined #openstack-infra13:09
*** lseki has joined #openstack-infra13:10
*** sthussey has joined #openstack-infra13:10
*** kjackal has quit IRC13:14
*** kjackal has joined #openstack-infra13:14
*** salv-orlando has joined #openstack-infra13:16
*** whoami-rajat has quit IRC13:16
*** dave-mccowan has quit IRC13:18
*** bdodd has quit IRC13:22
zbr|ruckpabelanger: fungi : https://review.opendev.org/#/c/667346/ zuul config error fix, ok to merge? ... 30min to lint it, less cool.13:29
fungiyeah, i approved it13:30
fungii'm mostly trying to understand the memory profiling mechanism corvus restarted the zuul scheduler onto so i can determine whether (and how) stats should be extracted from it before performing an emergency restart13:31
fungisince i expect this is the exact memory leak we've been trying to catch13:31
fungiunfortunately searching past discussions for "repl" is challenging since it's also a substring of words like "replication"13:35
*** mriedem is now known as mriedem_afk13:38
*** brett-soric has joined #openstack-infra13:39
*** bhavikdbavishi has quit IRC13:39
fungilooks like it's been incorporated by cherry-picking https://review.opendev.org/57996213:39
fungiand then turned on with the rpc client by running `zuul repl`13:40
*** Goneri has joined #openstack-infra13:40
fungiand listens on localhost:3000/tcp13:40
fungiwhich is currently listening13:41
openstackgerritMerged openstack/project-config master: Fix retirement of ansible repos  https://review.opendev.org/66734613:41
fungiso i take that to mean it's active13:41
*** munimeha1 has joined #openstack-infra13:42
fungithis unfortunately doesn't tell me how he was hoping to use it to diagnose the memory leak, so still digging13:42
*** eharney has joined #openstack-infra13:44
fungimention in #zuul a while back of using it to call objgraph.show_backrefs() on various objects (though i don't see him say which ones)13:45
fungianother mention there of "enter python commands to inspect the memory state and use objgraph to dump it to a file"13:46
*** bdodd has joined #openstack-infra13:46
fungiat this point corvus will likely be around soon, so probably best if we can just hold off restarting the scheduler until he's on hand since i've pulled about all i can from past discussions and docs13:47
fungiodds are he already has at least some suspicions as to which objects are candidates for bloat13:48
*** openstackgerrit has quit IRC13:48
cloudnullAJaeger on it13:51
AJaegerthanks, cloudnull13:51
*** openstackgerrit has joined #openstack-infra13:53
openstackgerritSimon Westphahl proposed zuul/nodepool master: wip: Allow proceeding with requests on quota exceeded  https://review.opendev.org/66737113:53
openstackgerritSimon Westphahl proposed zuul/nodepool master: wip: Allow proceeding with requests on quota exceeded  https://review.opendev.org/66737113:55
*** yamamoto has joined #openstack-infra13:55
*** kjackal has quit IRC13:57
*** kjackal has joined #openstack-infra13:57
*** aedc has joined #openstack-infra14:01
*** Goneri has quit IRC14:03
*** ykarel has joined #openstack-infra14:06
*** michael-beaver has joined #openstack-infra14:10
clarkbfungi: likely the configuration objects, but I couldnt tell you how to get at those with objgraph14:12
roman_gHello team. Is there a cached golang distro repository somewhere in OpenInfra I can re-use? Or may be fresh golang installed onto some of the images I could utilize via Zuul?14:13
AJaegermordred: want to abandon https://review.opendev.org/641474? We should merge https://review.opendev.org/667228 instead...14:14
roman_gNeed golang v1.12. Ubuntu provides older versions.14:14
clarkbroman_g: there isnt14:15
roman_gclarkb: 123MB. Is this fine?14:16
fungihave you checked to see what versions fedora-30 or opensuse-tumbleweed provide?14:16
clarkbfine in what context?14:16
roman_gIf I'd download it on each gate :)14:16
*** Goneri has joined #openstack-infra14:16
roman_gfungi: will check, good idea14:16
fungiwe only have one gate. guessing you mean in each build14:17
roman_gyes, sorry14:17
fungi(zuul is our gate)14:17
roman_gfungi: fedora 30 has 1.12. Thank you!14:19
corvusclarkb, pabelanger, fungi: it's probably too late to get any useful info from zuul14:19
corvusyou really need to do that before it starts swapping14:19
corvusotherwise you'll just spend 10 hours waiting for it to swap all the memory back in to scan for objects14:19
fungiunfortunately it looks like it started swapping well after i was asleep14:19
fungiokay, so just dump copies of the queues and restart? i'll give #openstack-release a heads up to pause approvals14:20
*** mriedem_afk is now known as mriedem14:20
corvuslooking at the graph, probably anytime after june 20th would have been interesting14:20
fungii let them know14:22
corvuswho is taking point on debugging this?14:22
fungiso what commit should we install before restarting?14:22
fungii can set myself a daily reminder to check the memory graph for early signs of the leak14:22
fungithough i'll be offline most of next week14:22
corvusi'll go ahead and see if i can get a rough object count at least14:23
corvusbut i kind of thought someone else was taking this one14:24
fungii can, just not sure what i'm looking for (or at)14:24
roman_gfedora-30 image isn't yet available here, I think https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/nodesets.yaml14:25
roman_gfedora-29 is available14:25
corvusi don't either ... i start from first principles every time :)14:25
fungifair14:25
*** sgw has joined #openstack-infra14:25
fungionce we've got it restarted i'll familiarize myself with how to list objects and what the object graph output looks like14:25
fungii could probably stand to learn a bit about python's memory management anyway14:26
corvusthis is where i learned most of what i know: https://mg.pov.lt/objgraph/14:26
fungithanks, that looks like an excellent resource14:27
*** brett-soric has left #openstack-infra14:27
*** dpawlik has quit IRC14:27
corvusfungi: i started a screen on zuul0114:28
corvusas root14:28
fungiand i've joined it14:28
fungiahh, neat, so it can list object types by frequency of presence in memory?14:29
corvusyep14:29
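The "object types by frequency" listing discussed above can be approximated with the standard library alone. This is a minimal sketch, not the session that was actually run on the scheduler: it uses `gc.get_objects()` and `collections.Counter` instead of objgraph, and the function name `most_common_types` is chosen here only for illustration. It sees only objects tracked by the garbage collector, so the counts are a rough guide.

```python
import gc
from collections import Counter


def most_common_types(limit=10):
    """Return the `limit` most frequent type names among gc-tracked objects."""
    # gc.get_objects() lists every object the collector tracks (mostly
    # containers and class instances); untracked atoms such as ints are absent.
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    # most_common() sorts by descending count.
    return counts.most_common(limit)
```

Calling `most_common_types(5)` returns a list of `(type_name, count)` pairs. Taking one such snapshot shortly after a restart and another a few days later gives the kind of baseline comparison mentioned in the discussion.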
corvusthat's going to cause some swapping, but maybe it'll finish relatively fast... like <30m?14:29
fungiand chances are even this may not return in any reasonable amount of time due to memory pressure?14:30
fungiyeah, figured14:30
roman_gfedora-29 has golang 1.11 only, not 1.12.14:30
corvusyeah, it's possible.  we should think about how much time we should give it before we give up14:30
corvusthe more complex stuff, like actually tracing object links would certainly take many hours at this point14:31
roman_gWhat is the process to add fedora-30 image to the Zuul?14:31
fungiroman_g: i saw someone mention working on adding fedora-30, not sure who it was or if that's close to done yet though... have you checked opensuse-tumbleweed? that should be present (and current)14:31
fungicorvus: how about until now? ;)14:31
*** salv-orlando has quit IRC14:31
corvusfungi: heh, i would have given it a few more mins even :)14:32
clarkbroman_g: fungi another option is to use a golang docker container and we'll cache the layer objects14:32
fungiindeed, i was originally about to say 14:4514:32
fungiso we have 35239933 mappingproxy objects?14:32
funginot familiar with that object type14:32
roman_gclarkb: yes, that's also a good option.14:33
corvusfungi: it's a read-only dict14:33
roman_gthank you, clarkb14:33
fungiahh, so doesn't really tell us much14:33
fungiused for internal representation of things like class attributes14:33
corvusfungi: but we mostly use it in zuul's config, so it's an indication that, somehow, zuul configuration objects are involved.14:33
fungioh, good to know14:34
fungianyway, if nothing else, we have a baseline object count to compare against after we reach a steady state following the restart (for example tomorrow)?14:34
corvuslet's see if that returns in any reasonable time14:35
fungii'll get all this into a paste for comparison against the coming days14:36
fungior do we already have an etherpad going for the memory leak investigation prior to today?14:36
*** Goneri has quit IRC14:36
*** aedc has quit IRC14:37
corvusfungi: not that i'm aware of, but here are all my notes from previous investigations:14:37
corvushttps://etherpad.openstack.org/p/zuul-memory-leak14:37
corvusi just dumped them in there14:37
fungioh, that wfm14:38
corvusunfortunately, it's context free, but there are a bunch of potentially useful functions14:38
fungii see that14:38
*** lpetrut has quit IRC14:38
corvusi'm stepping through that first function now14:38
corvusi'm going to rework that for multi-tenancy real quick14:40
fungiso we don't really have a lot of layouts in flight, doesn't look like?14:40
corvusright14:40
corvushighly suggestive of leaked layouts14:40
corvusi want to get the numbers per tenant to see if it's consistent across tenants14:41
fungiahh14:41
corvusmaybe our use of multi-tenancy is the behavior change which triggered this14:41
*** panda has quit IRC14:41
clarkbthat may also explain why we are seeing it when others don't seem to be14:43
*** panda has joined #openstack-infra14:43
fungian interesting theory14:43
*** yamamoto has quit IRC14:45
*** ykarel is now known as ykarel|afk14:45
*** yamamoto has joined #openstack-infra14:45
clarkbunrelated, but one of the things on my catch up todo list is to clean up the nodepool control plane image build stuff further. Any idea if mordred will be around at some point (as his eyeballs on that cleanup would be helpful)14:45
corvuszero seems an unusually small number of layouts to have for a tenant14:45
fungiyeah14:46
corvusclarkb: his email on the subject said he didn't know what this week would be like, but he thought he would be working.  so, i think it's "maybe this week, should be back by next"14:46
fungiso it's saying *all* the layouts are in the openstack tenant?14:46
corvusyeah, that seems unright14:46
clarkbcorvus: thanks14:46
corvusfungi: oh, i see14:46
corvusfungi: we're only looking for layouts for enqueued items, so it just means there's nothing running in the other tenants14:47
*** yamamoto has quit IRC14:47
corvuspresumably each tenant also still has its own currently running layout, we're just not counting it there14:47
fungigot it. this is not the non-speculative layouts14:47
fungibecause we're iterating over items in queues there to get that count14:48
corvusso we've leaked 350 layouts (or, really, maybe 346)...14:48
fungihow much memory would we expect those to consume?14:48
*** igordc has joined #openstack-infra14:49
corvushandful of mbytes each14:49
corvusmaybe more than a handful14:49
corvusi forgot to say "print(tenant)"14:50
*** Goneri has joined #openstack-infra14:50
corvusi'm not sure what it's doing now14:50
fungieep14:50
fungiloading the tenant object before it can try to execute it (and find out it's not callable)?14:50
corvusyeah, i would have thought it would still be in memory14:50
*** pgaxatte has quit IRC14:51
corvuswe're heavily swapping now14:53
fungioof14:53
corvusfungi: i replaced the method in the etherpad with the more tenant-aware version14:54
fungimaybe that's why repl has paused on us, and it's unrelated to the "tenant" function-not-function14:54
corvusyeah could be14:54
corvusi also stuck in a thing to add in the non-speculative layouts for each tenant so they are correctly accounted for14:55
corvusthe idea is that at the end of this function, we have a handle to all of the leaked layout objects14:55
fungiooh, thanks!14:55
*** udesale has quit IRC14:55
corvusin the past, at that point, i've picked one at random, and started having objgraph render graphs around it to try to figure out who's holding on to it14:55
corvusapparently by "at random" i mean "the first one in the list"14:56
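
[Editor's sketch] The approach corvus describes (end up with a handle to every layout object that is still alive but no longer accounted for) can be illustrated roughly as below. This is a minimal sketch and not the function from the etherpad: the Layout class, the in_use list, and the forgotten_cache list are hypothetical stand-ins, and the real scheduler objects are considerably more involved.

```python
import gc


class Layout:
    """Hypothetical stand-in for a scheduler layout object."""

    def __init__(self, name):
        self.name = name


def find_leaked(cls, expected):
    """Return live instances of cls that are not in the expected collection."""
    expected_ids = {id(obj) for obj in expected}
    gc.collect()  # drop anything that is merely awaiting collection
    return [
        obj for obj in gc.get_objects()
        if type(obj) is cls and id(obj) not in expected_ids
    ]


# Layouts that are legitimately held (per-tenant and per-queue-item).
in_use = [Layout("tenant-a"), Layout("item-1")]
# Something else quietly keeps references to layouts that should be gone.
forgotten_cache = [Layout("stale-1"), Layout("stale-2")]

leaked = find_leaked(Layout, in_use)
print(sorted(layout.name for layout in leaked))
```

Once such a list exists, one of its members can be handed to a tool like objgraph (for example its show_backrefs function) to see what is still holding the reference.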
fungiwhat are the odds we're still going to be able to dump copies of the gate/check pipeline contents at this point? probably have to wait for the paging to calm down again?14:56
*** rakhmerov has joined #openstack-infra14:56
clarkbfungi: we should have the "historical" recorded versions of the status14:56
fungioh, right14:56
corvusfungi: yes... now that you mention it, it looks like lines 1-29 are basically a procedure for this situation14:56
fungithe ones we dump to disk14:57
clarkbfungi: ya those14:57
rakhmerovhi, just joined the channel and want to make sure that you're aware of issues with CI14:57
rakhmerova lot of RETRY_LIMIT statuses14:57
corvusfungi: and some of those lines say "save the queues"14:58
fungirakhmerov: yep, we're about to restart the zuul scheduler for that reason14:58
rakhmerovok, thanks14:58
fungi(or at least we assume the RETRY_LIMIT results are related to heavy swapping from this memory leak)14:58
*** munimeha1 has quit IRC14:59
AJaegerfungi: so, want to re-enable limestone?14:59
AJaeger(after some time...)14:59
fungiAJaeger: no, i expect the network connectivity issues in limestone are unrelated to this15:00
fungii saw lots of packet loss between cacti and the mirror there15:00
corvusfungi: what's our budget for waiting around on this?  think we should restart now?15:01
fungiprobably time to cut our losses on it, yes15:02
fungioh! and it returned15:02
corvusha it returned15:02
fungi;)15:02
fungiokay, so same result15:03
corvusyeah, i put the non-dynamic layouts in the sched_layouts variable only though15:04
fungiso we have 3 of those i guess?15:04
corvusso we're down to 348 leaked layouts15:04
*** yboaron_ has quit IRC15:04
corvusi saved the queues15:05
openstackgerritHervé Beraud proposed openstack/pbr master: Fix parsing on egg names with dashes from git URLs  https://review.opendev.org/64872715:05
corvusfungi: if we're lucky, we'll get a graph out of that one15:05
corvusfungi: or it could take forever due to swapping15:05
corvusfungi: i'll let you decide how long to wait :)15:06
*** bexelbie has joined #openstack-infra15:07
*** jpena|mtg is now known as jpena|off15:11
fungii'm also on a conference call now, so trying to juggle that15:11
fungiokay, my updates are out of the way so can focus on this more again15:12
fungisorry about that15:12
fungistill trying to record the useful values in a paste as well for comparison to post-restart values in the coming weeks15:14
openstackgerritHervé Beraud proposed openstack/pbr master: Fix parsing on egg names with dashes from git URLs  https://review.opendev.org/64872715:16
cloudnullHey all - is there a way to limit concurrency for a set of jobs?15:16
cloudnullWe're adding jobs to tripleo-ansible for the roles we add to the repo [https://github.com/openstack/tripleo-ansible/blob/master/zuul.d/molecule.yaml#L4-L18]15:16
clarkbcloudnull: you can have jobs wait on the results of other job(s)15:16
cloudnullTo make sure we're not running everything all the time we have the files filter set [https://github.com/openstack/tripleo-ansible/blob/master/zuul.d/molecule.yaml#L37-L38] which results in us limiting what runs when things change.15:17
*** lpetrut has joined #openstack-infra15:17
cloudnullHowever, we also want to make sure we're not creating a situation where we have N roles and someone makes some change that hits all of them resulting in N+ jobs, all trying to schedule/run at once.15:17
cloudnullSo, is there a way that we could add some kind of a guard to ensure we're only ever running something like 5 jobs at a time?15:17
corvuscloudnull: if a change touches N roles, why not run all the jobs?15:19
cloudnullcorvus I'd want it to run all of them, just not all at once15:19
corvuscloudnull: oh, you mean just to be nice to the rest of the system?15:20
* cloudnull trying to be a good zuul-izen 15:20
corvuscloudnull: i wouldn't fret too much -- the scheduling algorithm is relatively fair15:20
cloudnullfair enough.15:21
cloudnullclarkb do you have a link to docs on setting up waits, I think we should do that anyway.15:21
*** tdasilva has quit IRC15:21
corvus(and jobs which have been waiting on other jobs actually get a bonus in the scheduler, so that they get a node faster, so trying to stage them out like that could be counter-productive)15:22
*** ccamacho has quit IRC15:22
*** yamamoto has joined #openstack-infra15:23
corvus(the idea is that if a job waited on the completion of another job, it shouldn't have to wait again -- it's already served its time waiting, so it jumps to the head of the queue)15:23
*** lpetrut has quit IRC15:23
cloudnullTIL - thanks corvus!15:24
clarkbcloudnull: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.requires to block on resources https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies to dep on jobs directly15:24
*** kobis1 has quit IRC15:25
openstackgerritFabien Boucher proposed zuul/zuul master: URLTrigger driver time based  https://review.opendev.org/63556715:26
*** xek has quit IRC15:26
*** e0ne has quit IRC15:26
cloudnullTyvm15:28
*** yamamoto has quit IRC15:28
corvusfungi: should we give up?15:29
fungia perennial question15:29
fungii guess we're at ~1 hr of poking, so probably a good time to put a pin in it here and see what we can get next round when things aren't slowed by swapping15:30
fungiso should we restart on tip of master with the repl change cherry-picked again?15:30
corvusfungi: yes, then +3 https://review.opendev.org/579962 :)15:31
corvusfungi: we could probably just re-install from my last checkout15:31
fungiyep, i was wanting to take a closer look at it and then approve15:31
corvusi don't think we need anything newer really15:31
fungiwhat was your last checkout? i couldn't find a reference in the channel log15:32
corvus~corvus/zuul15:32
funginot in /opt/zuul presumably15:32
*** igordc has quit IRC15:32
fungiaha, thanks15:32
fungishall i pip install that now?15:32
corvusjust a 'sudo pip3 install ~corvus/zuul" should do it.  yeah15:32
clarkband python3 /usr/local/bin/pbr freeze | grep zuul to confirm15:33
fungidone15:33
zbr|ruckcorvus: fungi: regarding job dependencies, i was trying to use them but apparently i cannot practically implement it due to lack of wildcards.15:33
fungiyep, zuul==3.8.2.dev142  # git sha a90f1c315:33
*** iurygregory has quit IRC15:33
clarkbzbr|ruck: the requires attribute I linked above may be more flexible15:34
zbr|rucki ended up creating two feature requests in zuul: https://storyboard.openstack.org/#!/story/2005952 and https://storyboard.openstack.org/#!/story/2005951 -- not sure how hard to implement.15:34
fungishould we try to dump the pipeline contents again, or just use the one already on disk?15:34
*** xek has joined #openstack-infra15:34
clarkbzbr|ruck: then jobs can provide some "resource" that may just be abstract and jobs can depend on that. But I think it requires a bit more work to get that in place15:34
zbr|ruckmainly I want to split jobs into 3 waves: fast, medium, slow.15:35
corvusfungi: i have one from 30m ago in ~root, or we could see if there's something in the archive15:35
clarkbcorvus: fungi I'm guessing the one from 30 minutes ago is probably up to date enough based on my inability to get a status15:35
fungicorvus: how long did it take to return when you ran it?15:36
corvuszbr|ruck, clarkb: is this the thing where people say "i want to run the fast jobs first, then the slow ones only if they succeed"?  that's been asked a thousand times and every time we've said "please don't do that"15:36
fungibut yeah, we can just use that one i guess15:36
corvusfungi: was fast15:36
clarkbcorvus: yes I think so15:36
fungioh, if it was fast... trying to do another dump now as check_new.sh and gate_new.sh15:36
mwhahahais zuul.openstack.org unresponsive for anyone else?15:36
corvusclarkb: provides/requires is for inter-change communication, it won't create dependencies between jobs within a change.15:37
fungii'll give it a minute and if it doesn't return we can use the existing check.sh and gate.sh15:37
corvusfungi: ++15:37
clarkbcorvus: oh I thought it was used for waiting for our images to build within a change too15:37
corvusclarkb: nope, still need job.dependencies for that15:37
clarkbcorvus: the whole run image build and registry, then pause, and start the other jobs. Gotcha15:37
zbr|ruckcorvus: but why? i hear people here complaining about tripleo using lots of resources, and I see zuul waiting 4h to report a failure on linting, .... while also running deployments. It makes no sense for me to waste resources this way.15:37
*** jaosorior has quit IRC15:38
clarkbzbr|ruck: because we tried it and it resulted in a lot more churn15:38
*** jaosorior has joined #openstack-infra15:38
clarkbzbr|ruck: instead of one patchset to fix errors, it's now a patchset for each class of error15:39
fungiokay, status dumps are not returning yet. i'll cancel these and do the service restart15:39
clarkband in aggregate that consumes more resources.15:39
corvuszbr|ruck: it wastes less developer time -- you can see all the things that are wrong with a change at once15:39
*** Emine has quit IRC15:39
yoctozeptoguys, how are RETRY_LIMITs feeling? :D15:40
fungicorvus: i have a feeling i'm going to need to kill the scheduler while the call in the repl is still going15:40
fungiany objection?15:40
corvusfungi: nope15:40
clarkbfungi: you may have to remove the pidfile if you do that (but no objection)15:40
zbr|ruckcorvus: i really doubt it saves developer time if the developer has to wait an extra 2h to get the response from zuul that the linting jobs failed 1h 50m before sending the response.15:40
clarkbzbr|ruck: they can also run tox locally or check zuul status directly15:40
clarkbzbr|ruck: it doesn't actually take 4 hours to discover that data15:41
corvuszbr|ruck: yeah, i mean, presumably they have something else to do during that time :)15:41
fungiokay, cleanly stopped15:41
fungistarting again now15:41
fungiwell, not *cleanly* stopped but i was able to introduce a segfault in the child and parent15:42
zbr|ruck..yeah, everyone should install greasemonkey scripts or other tools/tricks in order to read CI status, not sure if that counts as best UX.15:42
fungisince they were ignoring sigterm15:42
corvuszbr|ruck: if the author has nothing better to do, then, yes, watch the status page.  if the author has something better to do, then in 4 hours they will find out everything that's wrong with the patch.15:43
clarkbzbr|ruck: I check it just fine in the existing ui... what is greasemonkey providing (or in another way what do you think is missing)15:43
yoctozeptooh, come on, don't ignore me, I have something even better: if you could take a look at http://logs.openstack.org/08/665908/1/check/kolla-build-ubuntu-binary/886f200/ we have discrepancy between what ara says and what is in the txt log as if Zuul ran two instances in parallel?15:43
*** xek has quit IRC15:43
corvusyoctozepto: i don't think anyone meant to ignore you15:44
corvusit's just that we're in the middle of a service restart15:44
fungiokay, scheduler has started again15:44
yoctozeptocorvus: it's ok, I seem unable to express the joyful tone in my messages ;D15:44
fungii had to manually remove the pidfile since the way in which it died did not remove it15:44
fungicorvus: do i need to wait for the cat jobs to complete before reenqueuing?15:45
corvusfungi: best do, yes, otherwise it may timeout15:45
clarkbfungi: yes15:45
fungicool, just making sure15:45
clarkbwhat happens if you don't wait is all the enqueue commands that run before the config update fail15:46
corvusyoctozepto: what's the discrepancy?15:46
*** kjackal has quit IRC15:47
*** kjackal has joined #openstack-infra15:47
zbr|ruckclarkb: the script is showing me the build status before zuul reports them (xx%, which one failed), so the gerrit user is not forced to visit another webpage to figure out what is happening with his change. I am referring to that script https://github.com/openstack/coats/blob/master/coats/openstack_gerrit_zuul_status.user.js15:47
fungilooks like the cat jobs just completed15:47
corvusyoctozepto: oh, "sudo mkdir /opt/kolla_registry" ?15:48
yoctozeptocorvus: yeah15:48
clarkbzbr|ruck: so there is actually (disabled) code in our gerrit js to do that for you. It required some updates to the zuul api to request change specific statuses which I think exists now. So if someone wanted to work on reenabling that code path we could do that15:48
fungiokay, reenqueuing now15:49
yoctozeptoit worked just fine in txt15:49
yoctozeptoand ara shows failure15:49
yoctozeptoas if it ran once more15:49
clarkbzbr|ruck: the last time we did it you had to pull the entire status json blob for all changes for each change and that overwhelmed the zuul server15:49
yoctozepto(concurrently probably)15:49
corvusyoctozepto: this is "fun"15:49
* clarkb finds a link15:49
*** kobis1 has joined #openstack-infra15:49
yoctozeptocorvus: the world of software is15:50
yoctozeptoI like good mysteries15:50
yoctozeptoAgatha Christie and stuff but this is much more :P15:50
*** kobis1 has quit IRC15:51
zbr|ruckmaybe I am dreaming but my impression is that a developer wants a fail-fast strategy on checks, so he is informed about the first failure, this being more important than knowing about getting all failures. maybe I need to make a survey to get a better picture about expectations.15:51
corvusyoctozepto: the log url (which includes part of the build uuid) is the same in the txt and ara...15:51
clarkbzbr|ruck: https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/files/gerrit/hideci.js#L386-L38815:52
clarkbzbr|ruck: the zuul_inline var is hardcoded to false currently15:52
*** mattw4 has joined #openstack-infra15:52
yoctozeptocorvus: yup15:52
yoctozeptofrom my perspective there is nothing I can do about it :D15:53
corvuszbr|ruck: fyi, zuul has a fail-fast option for pipelines.  we have chosen not to enable it for openstack.15:53
clarkbfor me at least iterative work is what I do locally15:53
clarkbthen CI produces a set of results for the author and their reviewers15:53
*** hamzy has quit IRC15:54
clarkbgiving as complete a picture as possible so that the code can be improved15:54
zbr|ruckcorvus: out of curiosity, could it be enabled for tripleo (if people would want)?15:55
corvuszbr|ruck: if you would like to experiment with that in one project, i think that's worth a discussion (please don't enable it without getting some agreement with the opendev team first).15:55
fungiit's per-pipeline15:55
*** rajinir has joined #openstack-infra15:55
fungiright?15:55
corvusfungi: actually, i think it's per project-pipeline15:55
fungiahh15:55
corvushttps://zuul-ci.org/docs/zuul/user/config.html#attr-project.%3Cpipeline%3E.fail-fast15:55
fungiyep, so it is15:56
zbr|ruckto be clear: i am asking all these because I am trying to lower the amount of resources we use.15:56
fungii don't object to a structured experiment in that direction, as long as it's not just turned on with little discussion and then forgotten15:56
fungiand with a plan in place to figure out how to measure the impact15:56
corvusyeah, i think that's key15:57
fungimaking changes arbitrarily without knowing how we expect to measure the result is what i would rather not see15:57
corvuswe've studied this long and hard over the years, and the last time we looked at it, we came to the conclusion that fail-fast was counter-productive for our community15:57
clarkbwe can measure ci resource usage fairly easily per job/project/logical project. The issue is going to be month over month there are many other variables that we'd have to account for to make that useful15:58
corvusso a controlled experiment to determine if that's changed could be a good idea15:58
fungiand increased resource consumption the last time we tried, if memory serves15:58
corvuszbr|ruck: so maybe write up an experiment proposal, with how you'll measure the outcomes?15:59
fungiit may also be that some projects in openstack have a culture which makes this option more or less useful, so we should consider the results with that in mind15:59
zbr|ruckfungi: ok, I believe you. in that case probably the dependency approach would be a better fit, right?15:59
*** ginopc has quit IRC15:59
zbr|ruckif I make some heavy jobs to start only after the basic ones fail, it should be easier to measure the outcomes.16:00
fungizbr|ruck: well, with the dependency approach you're going to want to measure what the overall increase in result delays ends up being, in addition to the number of times jobs end up getting rerun16:00
clarkbI think if we honestly feel linters are a huge concern we should consider a culture change to run linters locally before pushing16:00
clarkbit is easy to do and if the impact is large that seems like a no brainer16:00
clarkb(to take that specific example)16:01
fungiokay, all recorded changes have been reenqueued in gate and check. i'll give #openstack-release the all-clear signal16:01
zbr|ruckis not only about linters, when I say linters I usually mean cheap-jobs (all those that usually take <10-25m to run).16:02
*** udesale has joined #openstack-infra16:03
*** igordc has joined #openstack-infra16:04
fungi#status log restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)16:05
zbr|ruckmy guesstimate is that with a *little* bit of deps we could reduce the T to get reply from gerrit, even if the effective run of the chain could be a bit longer. The benefit would be that jobs should wait less time in queue before they effectively start. (less time because overall resource usage will be less).16:05
openstackstatusfungi: finished logging16:05
*** roman_g has quit IRC16:06
clarkbzbr|ruck: related to throughput time, fixing flakiness in jobs will always have a major impact due to how the gate pipelining works16:07
*** Lucas_Gray has quit IRC16:07
*** diablo_rojo has joined #openstack-infra16:07
clarkbfungi: you restarted the executors and mergers too?16:07
fungioh, no i did not16:08
fungi#status log CORRECTION: restarted zuul scheduler on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)16:08
corvusclarkb: sphinx problem affected old glean change too; i'll do that patch16:08
openstackstatusfungi: finished logging16:08
clarkbcorvus: ok16:08
fungiclarkb: thanks for pointing that out16:09
fungithat's what i get for copying from channel history16:09
openstackgerritJames E. Blair proposed opendev/glean master: Add .zuul.yaml  https://review.opendev.org/66721116:11
openstackgerritJames E. Blair proposed opendev/glean master: Replace nodepool func jobs  https://review.opendev.org/66722516:11
openstackgerritJames E. Blair proposed opendev/glean master: Pin sphinx  https://review.opendev.org/66739816:11
*** chandankumar is now known as raukadah16:13
fungiinfra-root: for the record, these are the numbers we got out of the repl today prior to the scheduler restart: http://paste.openstack.org/show/753375/16:15
corvusclarkb: can you take a look at the 2 failing jobs in https://review.opendev.org/667221 ?  trusty and bionic failed to ssh into the node, but xenial succeeded (along with centos and fedora)... should i recheck and see if that's a fluke, or do you think there could be something to that?16:15
clarkbI'll look16:16
corvusfungi: cool, so i think next time, i would just run through the etherpad, lines 3-31; ideally when there's been an increase in memory usage but before swapping16:17
corvusand compare numbers with what you have there16:17
fungiyep, will do. thanks again!16:18
fungialso i approved the repl addition16:18
fungii had already basically dissected that change to figure out how it works16:18
corvus(and, honestly, if the number of "leaked" layouts is <5, i'd say the situation is suspect -- like, that could just be a natural delta if we caught it during a gate reset or something.  i'm certain with 349 leaked layouts there would be a problem visible to us, eventually)16:18
*** tdasilva has joined #openstack-infra16:19
clarkbcorvus: they each managed to build the ubuntu images successfully according to the logs16:19
corvusclarkb: yeah, i can't guess why ssh might not be answering :/16:20
fungicorvus: one thing i'm noticing... starting the scheduler with 579962 cherry-picked leaves the repl listening on 3000 by default. is that intended even without invoking the `zuul repl` rpc client command?16:21
clarkbcorvus: could be the boot time is too slow for our timeout16:21
clarkbcorvus: qemu being slow16:21
corvusfungi: i think i cherry-picked an old version16:21
fungioh, that makes sense16:21
clarkbboot timeout is set to 600 seconds which I would hope is plenty16:21
clarkbbut ya maybe bump that to 15 minutes and see if it becomes more reliable? Also we could grab the qemu logs16:22
*** tdasilva has quit IRC16:22
corvusclarkb: ooh, i like the idea of getting more logs... any idea how to do that?16:22
clarkbone sec16:23
fungicorvus: confirmed, looks like you fetched refs/changes/62/579962/3 last, so that's before the toggle was added16:23
clarkbcorvus: so grabbing the stuff out of the journal like devstack jobs do may be useful (though I don't think it will help with this particular issue)16:24
clarkband then the recursive contents of /var/log/libvirt will give us the qemu instance logs16:25
corvusclarkb: yeah, if there's a boot log for any vms, that would be useful (but we're not debugging openstack, so i don't think i want all the openstack service logs)16:25
corvuscool, i'll grab /var/log/libvirt16:25
clarkbcorvus: ya from journald we probably want the equivalent of the kernel log and syslog16:25
corvusclarkb: got a command handy for that?16:27
corvusclarkb: well, this is on bionic which still writes syslog, right?16:27
clarkbif rsyslog is installed (which I think it is) then yes16:28
clarkbbut I can link to the journalctl command too one sec16:28
clarkbhttps://opendev.org/openstack/devstack/src/branch/master/roles/export-devstack-journal/tasks/main.yaml#L20-L3216:28
*** lucasagomes has quit IRC16:32
*** ricolin has quit IRC16:33
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502316:35
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715616:35
corvusclarkb: can you take a look at the post playbook in 665023 real quick?16:35
clarkbcorvus: another thing to check, what user is nodepool attempting to use? I see we use the dib devuser element to set up a user which defaults to a username of 'devuser'16:35
corvusmake sure that looks right before we wait for the whole run :)16:35
*** tesseract has quit IRC16:35
clarkbcorvus: you might need a become: true for the libvirt log sync16:36
*** tesseract has joined #openstack-infra16:36
*** tdasilva has joined #openstack-infra16:37
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502316:37
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715616:37
corvusclarkb: good catch, thx16:37
clarkbthat looks good with latest ps16:38
clarkbnow to track down the ssh user used16:38
corvusclarkb: i think nodepool doesn't use a user, it just connects to the port and gets the host key16:39
corvusthe check.sh script (which the job runs after the node is up) logs in as root16:39
clarkbgotcha16:39
clarkband glean is expected to set that root user ssh key so that makes sense (to cover that works)16:40
clarkband config drive is set to true so that all looks good16:40
*** pkopec has quit IRC16:42
clarkbzbr|ruck: gate resets like those caused by http://logs.openstack.org/47/666747/1/gate/tripleo-ci-centos-7-containers-multinode/4c17ceb/logs/undercloud/home/zuul/undercloud_install.log.txt.gz likely consume more resources than any other situation16:45
openstackgerritJames E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs  https://review.opendev.org/66722116:45
clarkbthat caused jobs for ~18 changes to be discarded and restarted16:45
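
[Editor's sketch] The cost of a gate reset that clarkb points at can be put into rough numbers as below. This is a simplified model and not Zuul code: it assumes every change queued behind the failing one has all of its in-flight jobs discarded and restarted, and the figure of 10 jobs per change is a made-up illustration value, not something taken from the log.

```python
def builds_discarded(queue, failed_change, jobs_per_change):
    """Count in-flight job runs thrown away when failed_change fails.

    In a dependent (gate) queue, every change behind the failing one was
    tested on top of it, so those runs are discarded and restarted.
    """
    position = queue.index(failed_change)
    behind = queue[position + 1:]
    return len(behind) * jobs_per_change


# Hypothetical queue: one failing change at the head, 18 changes behind it.
queue = ["failing-change"] + [f"change-{n}" for n in range(1, 19)]

print(builds_discarded(queue, "failing-change", jobs_per_change=10))
```

Under these assumptions a single flaky failure at the head of the queue throws away 180 job runs, which is why fixing flaky jobs tends to save more than reordering cheap ones.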
openstackgerritJames E. Blair proposed opendev/glean master: Replace nodepool func jobs  https://review.opendev.org/66722516:45
*** ramishra has quit IRC16:46
*** panda has quit IRC16:46
*** panda has joined #openstack-infra16:48
openstackgerritJames E. Blair proposed opendev/glean master: Replace nodepool func jobs  https://review.opendev.org/66722516:48
clarkbhttp://logs.openstack.org/72/666672/2/gate/puppet-openstack-lint/3cc866e/job-output.txt.gz#_2019-06-25_16_45_08_545046 too, that will need to be reparented to legacy-base16:50
clarkbzbr|ruck: fungi ^ fyi16:50
clarkb(or alternative to legacy-base is stop using zuul-cloner)16:51
*** sthussey has quit IRC16:52
*** smarcet has quit IRC16:52
*** aedc has joined #openstack-infra16:53
*** jpich has quit IRC16:56
*** igordc has quit IRC16:56
*** tesseract has quit IRC16:59
*** igordc has joined #openstack-infra16:59
fungiyep, i agree17:01
*** yamamoto has joined #openstack-infra17:06
openstackgerritAlex Schultz proposed zuul/zuul master: Additional note about branches for implied-branches  https://review.opendev.org/66741517:07
*** udesale has quit IRC17:07
openstackgerritMerged zuul/zuul master: Add command processor to zuul-web  https://review.opendev.org/66630717:09
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213417:10
*** raukadah is now known as chandankumar17:11
*** hamzy has joined #openstack-infra17:12
*** chandankumar is now known as raukadah17:13
*** smarcet has joined #openstack-infra17:15
*** ralonsoh has quit IRC17:20
*** jtomasek has quit IRC17:21
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos  https://review.opendev.org/66741817:21
cloudnullAJaeger - https://review.opendev.org/#/q/topic:retire-role+(status:open) - I think that does it. Mind giving it a once over to make sure i'm not missing anything? -cc mwhahaha17:22
openstackgerritMerged zuul/zuul master: Add repl server for debug purposes  https://review.opendev.org/57996217:22
AJaegercloudnull: commented - you need some more changes...17:25
openstackgerritMerged opendev/system-config master: Remove dead link from 'paste' documentation  https://review.opendev.org/66647417:28
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos  https://review.opendev.org/66741817:35
*** mattw4 has quit IRC17:38
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos  https://review.opendev.org/66741817:39
*** ociuhandu has joined #openstack-infra17:41
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos  https://review.opendev.org/66741817:41
cloudnullsorry about the spam.17:41
cloudnullAJaeger I think that's all of it?17:43
*** ociuhandu_ has quit IRC17:44
*** mattw4 has joined #openstack-infra17:45
*** ociuhandu has quit IRC17:45
*** eernst has joined #openstack-infra17:45
*** ykarel|afk has quit IRC17:45
*** smarcet has quit IRC17:49
*** gfidente has quit IRC17:59
*** guimaluf has joined #openstack-infra18:10
*** sgw has left #openstack-infra18:13
*** smarcet has joined #openstack-infra18:13
*** smarcet has left #openstack-infra18:15
AJaegercloudnull: one more cleanup request...18:21
fungilogan-: not sure if you saw in scrollback earlier, but the network issue in limestone seems to have returned as of 09:00 utc today: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=6493418:35
openstackgerritTobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded  https://review.opendev.org/66666418:37
openstackgerritTobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded  https://review.opendev.org/66666418:39
mwhahahawhat's the remediation for missing zuul-cloner?  i'm reading the email but not sure what the replacement action was supposed to be. looks like the puppet-openstack stable jobs might be broken because we were still using it18:39
clarkbmwhahaha: either reparent to legacy-base job (instead of default "base") or use the repos in /home/zuul/src that are already cloned there by zuul18:40
mwhahahak18:40
clarkbthe legacy-base job installs the zuul-cloner shim18:40
clarkbas compat for devstack-gate and other tools that may need it and we don't want to update18:40
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos  https://review.opendev.org/66741818:41
* mwhahaha hopes this isn't terrible to patch across all the table branches18:41
openstackgerritKevin Carter (cloudnull) proposed openstack/project-config master: Removed unused ACL  https://review.opendev.org/66743718:42
fungiand just wait until you get to the chair branches18:44
fungi;)18:44
* mwhahaha flips tables18:44
* mwhahaha throws chairs18:44
fungithat's how i usually deal with things18:45
*** ianychoi_ has quit IRC18:49
cloudnullthanks AJaeger!18:49
openstackgerritTobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded  https://review.opendev.org/66666418:50
*** ianychoi_ has joined #openstack-infra18:50
*** aedc has quit IRC18:54
clarkb5 minutes to our meeting over in #openstack-meeting18:55
*** kjackal has quit IRC19:04
*** e0ne has joined #openstack-infra19:07
*** mriedem has quit IRC19:15
*** mriedem has joined #openstack-infra19:16
*** mattw4 has quit IRC19:18
*** lpetrut has joined #openstack-infra19:18
*** jtomasek has joined #openstack-infra19:19
*** lpetrut has quit IRC19:23
*** tdasilva has quit IRC19:31
*** e0ne has quit IRC19:31
*** pcaruana has quit IRC19:38
*** factor has joined #openstack-infra19:39
*** icarusfactor has quit IRC19:40
*** tosky has quit IRC19:40
clarkbfungi: SotK the storyboardclient issue is a name mismatch19:47
clarkbthe module installs as python-storyboardclient but we lookup the version for storyboardclient19:47
fungiahh19:47
fungiso we need an explicit string passed in for the module name19:47
fungigood eye19:47
clarkbya that already exists, it just doesn't reflect the reality of what is installed so pbr can't find the metadata outside of a git repo19:48
fungiwe do that in other projects that have similar situations19:48
pabelangerianw: just reading backscroll on reboot issue, where is the playbook that does the reboot on the production server?20:00
clarkbfungi: https://opendev.org/opendev/python-storyboardclient/src/branch/master/storyboardclient/__init__.py#L19 is where it breaks. If you change that string to python-storyboardclient it works20:00
ianwfungi: the trick is for the systemd/daemon idea getting https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/library/zuul_console.py into a daemon on the host.  i guess we'd have to install it so ansible can find it, and then the systemd job would be an ansible-playbook to localhost?20:01
clarkbfungi: I think https://opendev.org/opendev/python-storyboardclient/src/branch/master/setup.cfg#L2 determines the name20:01
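The name mismatch described above can be illustrated with a minimal sketch. It uses the standard library's importlib.metadata rather than pbr itself, and both helper names are hypothetical. The point is only that installed package metadata is keyed by the distribution name (the name set in setup.cfg), not by the import name.

```python
from importlib import metadata


def candidate_dist_names(module_name):
    """Return distribution names to try for an import name (hypothetical helper)."""
    return [module_name, "python-" + module_name]


def find_installed_version(module_name):
    """Return (dist_name, version) for the first candidate that has
    installed metadata, or None if no candidate is installed."""
    for dist_name in candidate_dist_names(module_name):
        try:
            return dist_name, metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue  # no metadata under this name, try the next one
    return None
```

A lookup for "storyboardclient" alone would miss a distribution installed as "python-storyboardclient"; the fix discussed in the channel is to pass the correct distribution name explicitly rather than to guess.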
fungiclarkb: are you pushing a change? if not, i'll do that now20:01
clarkbfungi: I haven't yet and lunch calls so you should feel free to20:02
ianwpabelanger: using it in https://review.opendev.org/#/c/665057/ ... in this case, we don't do anything after the reboot so nothing is "lost" as such, but you do get a flood of messages about the streamer20:02
fungiianw: oh, does ansible on the executor copy it over to the node?20:02
ianwfungi: yes, essentially ansible ships libraries like that over to run remotely20:03
fungiclarkb: thanks, will do20:04
fungiianw: and i guess if it copies that into /tmp then it may no longer be there after a reboot20:04
ianwfungi: how it does it is pretty magic involving temp directories and zip files i think; I wouldn't want to rely on any sort of behaviour of it (even assuming I understood it :)20:05
fungiyeah, definitely qualifies as an ansible implementation detail20:06
ianwclarkb: i wasn't quite following how the base job would restart it?  i guess the problem is that the "outer" ansible is basically sitting waiting for a "shell: ansible-playbook ... <ci-playbook>" task to finish, so there's no way to insert a "zuul_console:" call20:07
ianwwe'd have to install zuul to get the zuul_console: library task on our CI bridge.openstack.org, and then i guess configure ansible there to look at the right paths to bring in that library20:08
openstackgerritJeremy Stanley proposed opendev/python-storyboardclient master: Correct the distname in PBR version discovery  https://review.opendev.org/66745520:09
fungiclarkb: ^20:09
fungiianw: yeah, the systemd unit is also compelling because we get back the console log stream before the next ansible task is executed20:10
pabelangerianw: ack, I did leave an unrelated comment about the reboot20:11
ianwpabelanger: ahh, i thought the reboot: command incorporated the waiting?20:11
pabelangerOh, right, there is a new reboot task20:12
clarkbianw I was thinking base could install a unit then start it via that unit. Then in reboot it would be started automatically20:12
pabelangerianw: ignore me20:12
clarkbrather than starting it however it starts it now20:12
*** jcoufal has quit IRC20:14
*** mattw4 has joined #openstack-infra20:14
fungiclarkb: challenge being, as ianw points out, is that it also needs to install the zuul_console.py and any dependencies20:14
fungisince right now we rely on ansible magic from the executor to put that in place (somewhere) on the node and start it20:15
ianwfungi / clarkb : yeah ... or the guts of https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/library/zuul_console.py is pulled out into something more stand-aloney20:15
ianwwhich could be just "python3 hacky_zuul_console.py"-ed in an at-boot oneshot20:16
fungii'm a gonna go cook dinner, then come back20:16
ianwit is a lot of fiddling; is this likely to be generically useful?  i'm struggling to think where it might be used in a similar fashion anywhere else right now20:17
*** smarcet has joined #openstack-infra20:17
*** kjackal has joined #openstack-infra20:18
*** e0ne has joined #openstack-infra20:18
corvusianw, clarkb, fungi: it's also worth keeping in mind that we still have the plan to replace that with ssh-forwarded unix socket python-logging... i think that will handle this use case without any special tooling.  so we probably don't want to invest too much into making zuul-console standalone20:19
*** factor has quit IRC20:20
ianwoh cool, yeah i'm onboard with "put up with it until something better" ... it's a corner case inside a corner case :)20:21
*** factor has joined #openstack-infra20:21
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502320:23
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715620:23
fungicorvus: ianw: agreed, i didn't mean to suggest loss of the console log in that job as a blocker to approving it20:24
fungi(i think i even gave it a +2 anyway?)20:24
fungiseems like we could live with it20:25
corvusianw: the suse-src jobs on glean seem to reliably fail -- is that known/expected? (also gentoo, but it's non-voting)20:25
corvusianw: (see https://review.opendev.org/667398 and children for recent examples)20:25
ianwhrm, looks like 42.3 failing to build in http://logs.openstack.org/98/667398/1/check/nodepool-functional-py35-suse-src/b4aebff/controller/logs/builds/ ... something happened with that recently20:27
corvusianw: http://logs.openstack.org/11/667211/2/check/nodepool-functional-py35-suse-src/794c3c9/controller/logs/builds/opensuse-423-0000000001_log.txt.gz#_2019-06-25_18_14_41_97120:27
corvusianw: that looks like the culprit line20:27
ianwthat's right, 42.2 was removed https://review.opendev.org/#/c/660137/ so 42.3 should be the latest.  that may be a red herring, although i'm not sure20:30
ianwand the gate build seems to be failing in a similar, but different way https://nb01.openstack.org/opensuse-423-0000051674.log20:31
corvusAJaeger, fungi: can you look at merging this today?  https://review.opendev.org/66722820:31
*** factor has quit IRC20:31
ianwmaybe that's a red herring too...20:32
ianw2019-06-25 18:54:33.717 | > Problem: systemd-logger-228-71.1.x86_64 conflicts with namespace:otherproviders(syslog) provided by rsyslog-8.24.0-2.13.1.x86_6420:32
ianw2019-06-25 18:54:33.717 | >  Solution 1: deinstallation of systemd-logger-228-71.1.x86_6420:32
ianw2019-06-25 18:54:33.717 | >  Solution 2: do not install rsyslog-8.24.0-2.13.1.x86_6420:32
*** factor has joined #openstack-infra20:32
*** e0ne has quit IRC20:32
ianwwho could have guessed systemd would be involved!20:32
corvusianw: heh... i'd like to make this non-voting for a bit to get all these fixes and moves in, ok?20:32
ianwyeah, i think this will require suse experts to look into20:33
*** hamzy has quit IRC20:33
openstackgerritJames E. Blair proposed openstack/project-config master: Make glean opensuse job non-voting  https://review.opendev.org/66745920:34
openstackgerritJames E. Blair proposed opendev/glean master: Pin sphinx  https://review.opendev.org/66739820:35
openstackgerritJames E. Blair proposed opendev/glean master: Add .zuul.yaml  https://review.opendev.org/66721120:35
openstackgerritJames E. Blair proposed opendev/glean master: Replace nodepool func jobs  https://review.opendev.org/66722520:35
corvusianw, clarkb: the other 8 nodepool-func changes are still in flux, but it would be good to start trying to land the ones that are ready (due to the amount of time it may take) -- can you go ahead and review/approve https://review.opendev.org/667212 and https://review.opendev.org/667220 ?20:37
corvus(they probably should have been squashed, and if they flake out in the gate, i will squash them, but i think they're okay to land separately)20:37
clarkbcorvus: ya I'll take a look as soon as I get this new gitea06 building20:38
clarkbI'm using the same flavor but with an 80GB instead of 30GB boot from volume root disk20:39
clarkband I didn't forget --config-drive thankfully20:40
clarkbbut I didn't specify the network name ... /me tries again20:43
clarkbmordred: corvus I can't boot a new instance in sjc1 as I've hit the instance quota. There is a mttest and jeblairtest any idea if those can be removed? I think my other option is to delete the old gitea06 prior to booting a new one20:45
corvusclarkb: you can kill mine; i used it for building gitea images when we were bootstrapping this; i don't need it anymore20:46
clarkbthanks I'll delete jeblairtest then20:46
corvusclarkb: https://review.opendev.org/667459 is ready for a +320:47
clarkbdone20:48
corvusthat should let us get the glean side of things moving20:48
*** Goneri has quit IRC20:49
clarkbhrm ping6 not installed. This may be fun20:50
clarkbI'm going to use a local checkout of system-config to make launch node script edits20:50
*** factor has quit IRC20:52
*** factor has joined #openstack-infra20:52
*** kjackal has quit IRC20:57
clarkbok new issue is ansible fails to connect I think possibly because we specify the actual inventory file on launch node and not just the one off inventory that includes this node and there is a gitea06.opendev.org collision maybe?20:59
*** ianychoi_ has quit IRC20:59
*** factor has quit IRC20:59
*** slaweq has quit IRC20:59
clarkbyup the ip it failed to connect to is the old gitea06 server21:00
openstackgerritMerged openstack/project-config master: Add zuul-operator project  https://review.opendev.org/66722821:00
openstackgerritMerged openstack/project-config master: Make glean opensuse job non-voting  https://review.opendev.org/66745921:00
*** ianychoi_ has joined #openstack-infra21:00
clarkbso we can't replace servers if they exist in our inventory21:01
*** smarcet has quit IRC21:01
clarkb(we have to load the global inventory stuff for host vars data like sysadmins list according to git log)21:02
corvuswhy is the real inventory file involved?21:02
corvusah21:02
clarkbhttps://review.opendev.org/#/c/642096/ is the change that introduced that21:02
corvusclarkb: i reckon we can delete current gitea0621:02
clarkbI'll get a change up to do that21:03
fungion hand to quick-approve that now that dinner is out of the wok21:03
openstackgerritClark Boylan proposed opendev/system-config master: Remove gitea06 from our inventory file  https://review.opendev.org/66746521:04
clarkbI think that is sufficient since all the other things we could remove it from will be happy once we make the new server21:04
*** ianychoi_ has quit IRC21:05
*** raissa has joined #openstack-infra21:06
*** raissa has quit IRC21:06
clarkbfungi: ^21:07
*** ianychoi_ has joined #openstack-infra21:09
fungii concur21:10
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715621:10
openstackgerritJames E. Blair proposed zuul/nodepool master: Add Zypper to openstack func job  https://review.opendev.org/66746621:10
*** slaweq has joined #openstack-infra21:11
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502321:12
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715621:12
*** slaweq has quit IRC21:15
*** openstackgerrit has quit IRC21:18
*** ianychoi_ is now known as ianychoi21:18
*** aedc has joined #openstack-infra21:20
*** jtomasek has quit IRC21:24
clarkbas a general fyi we do leak the boot from volume root volume when launch node has errors (I've manually cleaned up the two I've leaked so far)21:24
*** slaweq has joined #openstack-infra21:26
clarkbthe way you can double check that the volumes that are not attached are leaked is by checking their updated at timestamps and the image they were based on21:26
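The check described above could be sketched roughly as follows. The record fields (attachments, updated_at, image_name) and the function name are assumptions made for illustration, not the output format of any particular OpenStack client.

```python
from datetime import datetime


def likely_leaked_volumes(volumes, image_name, launch_started, launch_finished):
    """Return unattached volumes built from image_name whose last update
    falls inside the window of a failed launch attempt."""
    leaked = []
    for vol in volumes:
        if vol["attachments"]:
            continue  # still attached to a server, so not leaked
        if vol["image_name"] != image_name:
            continue  # based on a different image
        updated = datetime.fromisoformat(vol["updated_at"])
        if launch_started <= updated <= launch_finished:
            leaked.append(vol)
    return leaked
```

Anything this returns is only a candidate for cleanup; it is still worth confirming by hand before deleting a volume.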
*** slaweq has quit IRC21:31
*** aedc has quit IRC21:37
*** factor has joined #openstack-infra21:37
corvuser, was zypper removed from ubuntu?21:44
*** openstackgerrit has joined #openstack-infra21:44
openstackgerritMerged openstack/diskimage-builder master: Move Zuul config in-repo  https://review.opendev.org/66721221:44
clarkbya I think bionic may have pulled it?21:44
corvusnice21:45
corvusit's missing in bionic, but present in xenial, cosmic, disco, eoan21:45
*** Emine has joined #openstack-infra21:45
corvusokay... that's going to require some reshuffling21:46
clarkbI wonder if that is the sort of thing we can convince them to put in universe if it is in every other release21:47
*** georgk has quit IRC21:48
*** georgk has joined #openstack-infra21:49
*** tobberydberg has quit IRC21:49
*** tobberydberg has joined #openstack-infra21:51
fungihow did we end up dealing with that in dib?21:52
fungiwe build opensuse images... do we invoke zypper from inside the chroot?21:52
clarkbfungi: we have xenial builders21:52
fungioic21:52
fungiwe "support" it by not upgrading ;)21:52
*** mriedem has quit IRC21:52
corvusyeah, i'm trying to (mostly) upgrade us to bionic, so i'll reshuffle this to try to make it work21:53
corvusi'll make the extra packages per-job configurable, and then in the suse job, add zypper and set the node to xenial21:53
corvusso only the one job will still be running on xenial21:53
fungithe post-bionic packages may be installable on bionic too21:54
fungibut that could require some extra apt pinning work21:54
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502321:54
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715621:55
openstackgerritMerged opendev/system-config master: Remove gitea06 from our inventory file  https://review.opendev.org/66746521:55
fungiodds are it was temporarily evicted from debian/testing and so just happened to be absent when bionic imported and froze it21:55
fungi(the problem with trying to create a distro from debian/testing any time other than when debian creates one)21:55
fungi((which will hopefully finally happen any day now))21:57
pabelangerhttps://tracker.debian.org/news/719494/zypper-removed-from-testing/21:58
pabelangerthat was from fungi last time :)21:58
fungihah21:59
fungii drank away those brain cells21:59
openstackgerritJames E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs  https://review.opendev.org/66722122:00
fungibut yeah, more generally https://tracker.debian.org/pkg/zypper shows the overall timeline22:00
pabelangerhttps://bugs.launchpad.net/ubuntu/+source/zypper/+bug/1808230 is also from ianw22:01
openstackLaunchpad bug 1808230 in zypper (Ubuntu) "Zypper unavailable on bionic" [Undecided,New]22:01
corvusokay, that offloads a bit more of the job into variables, so more of the specifics of package installation are in the dib repo instead of nodepool22:01
fungiso removed from testing 2017-01-29, bionic imports testing, migrated to testing again 2018-06-1622:01
*** tdasilva has joined #openstack-infra22:02
*** tdasilva_ has joined #openstack-infra22:02
*** tdasilva_ has quit IRC22:02
fungias a result it also skipped debian/stretch (was present in jessie and will almost certainly be in buster when it releases in a little over a week)22:03
clarkbI've noticed that /var/log/syslog is also missing on this bionic image22:03
clarkbthese things are super minimal22:03
fungido we not install rsyslog maybe?22:04
clarkbnot sure, I'll check when it is done (assuming it doesn't fail this time)22:04
corvusclarkb, fungi: https://review.opendev.org/667398 is ready for review/approval22:05
corvusclarkb, fungi: as is its child: https://review.opendev.org/66721122:05
*** ekultails has quit IRC22:06
corvusthose n-v jobs are expected unrelated failures22:06
*** Emine has quit IRC22:07
*** mnencia has quit IRC22:09
*** mnencia has joined #openstack-infra22:11
clarkbok /var/log/syslog exists after the ansible install of package and the reboot (also it does run the unattended upgrade script prior to the reboot so we do update packages properly)22:12
*** _erlon_ has quit IRC22:18
openstackgerritClark Boylan proposed opendev/system-config master: Enroll new gitea06 into ansible inventory  https://review.opendev.org/66747422:24
clarkbsomething like ^ for the next step in spinning up a new host22:24
*** smarcet has joined #openstack-infra22:26
*** smarcet has quit IRC22:32
openstackgerritMerged openstack/diskimage-builder master: Add DIB_UBUNTU_KERNEL to ubuntu-minimal  https://review.opendev.org/66606322:48
clarkbianw: fungi: https://review.opendev.org/#/c/667474/ is +1 from zuul now. I've got a family dinner thing so probably don't want to approve that today unless someone else wants to watch it. But if you can review it I'm happy to approve in the morning when I can watch it22:54
*** diablo_rojo has quit IRC22:54
*** weifan has joined #openstack-infra22:55
*** tkajinam has joined #openstack-infra22:56
openstackgerritJames E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job  https://review.opendev.org/66502323:03
openstackgerritJames E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs  https://review.opendev.org/66715623:03
*** rcernin has joined #openstack-infra23:05
openstackgerritJames E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs  https://review.opendev.org/66722123:05
*** eernst has quit IRC23:14
logan-fungi: ack, thanks. will look into it some more.23:20
*** auristor has quit IRC23:21
fungicool, just wanted to be sure you're aware23:22
*** auristor has joined #openstack-infra23:23
*** lseki has quit IRC23:26
*** dchen has joined #openstack-infra23:27
*** weifan has quit IRC23:29
*** weifan has joined #openstack-infra23:29
*** weifan has quit IRC23:29
*** eernst has joined #openstack-infra23:29
*** eernst_ has joined #openstack-infra23:30
*** eernst has quit IRC23:30
*** diablo_rojo has joined #openstack-infra23:31
openstackgerritMerged opendev/glean master: Pin sphinx  https://review.opendev.org/66739823:35
*** eernst_ has quit IRC23:35
*** yamamoto has quit IRC23:36
*** mattw4 has quit IRC23:37
openstackgerritMerged opendev/glean master: Add .zuul.yaml  https://review.opendev.org/66721123:41
*** aaronsheffield has quit IRC23:49
*** rh-jelabarre has quit IRC23:51
*** rlandy has quit IRC23:59
