Tuesday, 2019-07-09

[00:00] <donnyd> nice. Air handler is all fixed up
[00:24] <ianw> clarkb: it doesn't seem like the fedora vm is booting in ci :/
[00:28] <donnyd> clarkb: you want to take the quota up to 20?
[00:28] <donnyd> need to see how many reasonably fit on each hypervisor
[00:38] <ianw> clarkb: i've managed to grab the .qcow2 and console log for the test vm
[00:39] <ianw> debugging, but eventually this will time out
[00:41] <donnyd> ianw: anything you need from the backend please lmk
[00:41] <ianw> donnyd: thanks; yeah this is in our CI test of the changes to modify the ordering of NetworkManager startup
[00:50] <ianw> even after rebooting the node that nodepool created, it still doesn't come up with networking
[01:18] <ianw> corvus: i feel like we used to get the console logs of the attempted boot of the test vm in nodepool.log; http://logs.openstack.org/73/669773/1/check/dib-nodepool-functional-openstack-fedora-29-src/1c412cb/nodepool/nodepool.log
[01:20] <ianw> clarkb: i was, however, on the test host, and dumped the console, which looks like the ordering is just fine -> http://paste.openstack.org/show/754185/
[01:29] <clarkb> ya the ordering there looks good
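
For context, the glean fix under test hooks into systemd's network-pre.target, the standard ordering point for units that must write network configuration before a network manager starts. A minimal sketch of that kind of ordering (the unit it lives in is assumed here, not quoted from the change):

    [Unit]
    # a unit that must run before networking pulls the passive
    # network-pre.target in via Wants= and orders itself Before= it
    Before=network-pre.target
    Wants=network-pre.target
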
[01:36] <openstackgerrit> Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging  https://review.opendev.org/669780
[01:37] <ianw> clarkb: i can test the .qcow2 i captured from the run on my cloud; just need to upload it over what pretends to be upload bandwidth in .au
[01:38] <ianw> i think a) getting the debug logging back in the job so it dumps the console, and b) setting systemd.journald.forward_to_console=1 would be *really* helpful, just in general ... i'm not sure how to get that into the image
[01:39] <ianw> but if we were getting syslog on the console, and that was captured in logs ... well that would be nice
[01:40] <clarkb> ++
[01:40] <clarkb> There is ansible that splats down dib elements
[01:40] <clarkb> I think you could modify that
[01:42] <ianw> yeah, it's more getting it into the grub command line effectively
[01:46] <ianw> oh i guess it could just be set in journald.conf
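
For context, both routes ianw floats are sketched below; neither snippet is the eventual element, and the exact variable handling is an assumption:

    # option a: add the argument to the image's kernel command line via
    # diskimage-builder's bootloader knob (a real element would need to
    # append to the existing default args rather than replace them)
    export DIB_BOOTLOADER_DEFAULT_CMDLINE="systemd.journald.forward_to_console=1"

    # option b: a journald drop-in, e.g. from an element's install.d script
    mkdir -p /etc/systemd/journald.conf.d
    cat > /etc/systemd/journald.conf.d/99-forward-to-console.conf <<EOF
    [Journal]
    ForwardToConsole=yes
    EOF
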
[01:55] <prometheanfire> building a new gentoo systemd image to test on
[01:55] <prometheanfire> is there a bug I can read up on?
[02:22] <clarkb> prometheanfire: no, I'd only just discovered the problem before needing to do dinner
[02:22] <clarkb> prometheanfire: basically ipv6 isn't configured
[02:23] <clarkb> so the gentoo image isn't working on the ipv6-only cloud
[02:32] <prometheanfire> huh
[02:32] <prometheanfire> only the gentoo image?
[02:39] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console  https://review.opendev.org/669784
[03:04] <openstackgerrit> Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging  https://review.opendev.org/669780
[03:04] <openstackgerrit> Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/669787
[03:10] <clarkb> prometheanfire: ya, centos and fedora seem to have an unrelated network manager race
[03:10] <prometheanfire> ok
[04:15] <openstackgerrit> Ian Wienand proposed opendev/glean master: network-manager: add network-pre dependencies  https://review.opendev.org/669773
[04:22] <ianw> clarkb: exactly the same image (as in ./test-image-0000000002.qcow2 taken from the CI host) boots fine in my cloud environment.  console logs look the same ... but yet CI hosts don't get network :/
[04:25] <sjfjqjfkd> Hi all, anybody know what I should append to local.conf for using gnocchi-api?
[04:25] <sjfjqjfkd> enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git stable/rocky
[04:25] <sjfjqjfkd> CEILOMETER_BACKEND=gnocchi
[04:25] <sjfjqjfkd> enable_service gnocchi-api,gnocchi-metricd
[04:26] <sjfjqjfkd> it doesn't work..
[04:35] <ianw> sjfjqjfkd: probably not much help in this channel, #openstack-telemetry will have more help, however, you might like to use something like http://codesearch.openstack.org/?q=CEILOMETER_BACKEND.*gnocchi&i=nope&files=&repos= to see what other things might do similar and how they do it
[04:37] <openstackgerrit> Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging  https://review.opendev.org/669780
[04:37] <openstackgerrit> Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/669787
[04:49] <AJaeger> config-core, please review https://review.opendev.org/669727 https://review.opendev.org/667949 https://review.opendev.org/665910 and https://review.opendev.org/668708
[05:06] <sshnaidm|afk> Is there a problem with the ovh BHS1 cloud? Our last 7 gate failures were on it only, all of them failing to download containers
[05:09] <ianw> sshnaidm|afk: not that i'm aware of.  was that via the proxies on the mirror?  note if the remote end is having issues the proxy doesn't help :)
[05:10] <sshnaidm|afk> ianw, yeah, we use proxies..
[05:21] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console  https://review.opendev.org/669784
[05:39] <ianw> clarkb: dropped a comment in your change https://review.opendev.org/#/c/669772/ on what i'm thinking for moving forward ...
[05:57] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console  https://review.opendev.org/669784
[07:15] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console  https://review.opendev.org/669784
[07:30] <openstackgerrit> Merged openstack/project-config master: Retire tempest-tripleo-ui  https://review.opendev.org/667949
[07:32] <openstackgerrit> Merged openstack/project-config master: Adding CI for ironic-prometheus-exporter  https://review.opendev.org/665910
[07:34] <openstackgerrit> Merged openstack/openstack-zuul-jobs master: Remove base role integration testing  https://review.opendev.org/669727
[08:17] <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Improve error reporting for zuul dequeue  https://review.opendev.org/669813
[09:39] <openstackgerrit> Merged openstack/project-config master: Mark networking-ovn-tempest-dsvm-ovs-release as voting  https://review.opendev.org/668708
[09:39] <openstackgerrit> Merged opendev/irc-meetings master: Create a meeting for Networking OVN project  https://review.opendev.org/668013
[11:39] <openstackgerrit> Monty Taylor proposed zuul/zuul master: Spec: Add a Kubernetes Operator for Zuul  https://review.opendev.org/659180
[12:17] <icey> hey - would it be possible to get zuul resources for a multi-node job where temporary OpenStack credentials are granted rather than a set of pre-built nodes?
[12:20] <mnaser> infra-root: it looks like there are a bunch of extra images on sjc1... can we clean those up if possible?
[12:23] <fungi> icey: you can pass those as secrets to post-review jobs if you have an openstack cloud you want a job to talk to, but cleanup (especially on jobs which get aborted prematurely) is probably the hardest problem to solve there
[12:23] <fungi> mnaser: i'll take a look and see if nodepool is aware of them
[12:24] <icey> fungi: yeah - I was wondering, more, if it was possible to get the nodepool resources that are allocated in a multi-node job given via the OS API rather than the predefined list in the ansible playbooks ;-) It seems like it's not really doable (for now?) but it would be cool
[12:25] <mordred> icey: yeah - I agree with fungi, cleanup is the hard part. we'd need to implement a nodepool driver to do the full thing you're talking about (we have one like it for kubernetes namespaces) - but we haven't implemented one yet because cleaning up resources in a domain or project is a lot of work
[12:26] <mordred> (although this hardness is a thing we discussed at the last PTG and there is work underway to make such a thing better)
[12:26] <fungi> icey: probably the only way to do it sanely is if it's a cloud you control, so that you can allocate one or more new projects per build and then (somehow) delete all resources associated with the project when the job is complete
[12:26] <icey> mordred: I entirely understand - and even though I might be an upstanding citizen who works to clean up after myself nicely, that's not a safe assumption to make generally
[12:26] <mordred> exactly
[12:27] <fungi> doing that in a public cloud where you aren't the cloud admin or can't make nodepool a cloud admin would be really tough, i think
[12:27] <mordred> in the meantime though, you could make a job such as fungi mentioned that used a predefined secret that had the ability to create projects, freeze that job, then pass the created credentials/project info to a child job using zuul_return, then have the parent job delete the project when it's unfrozen
[12:27] <icey> regarding that: allocate a new project per build, that's exactly what I'm thinking :)
[12:28] <icey> mordred: in the meantime, I'll just keep using my own Jenkins that talks to my cloud :-P
[12:28] <mordred> we do a similar pattern in the docker image build jobs - have the first job create a docker registry, then freeze, then pass the registry info to the child jobs
[12:28] <icey> well, maybe ;-)
[12:29] <mordred> so, it's, you know - not super terrible to do in zuul in general - but doing it in the openstack zuul is a bit harder due to our lack of such credentials in our existing cloud providers
[12:29] <icey> yea :-/
[12:29] <mordred> so if you wanted to do it in a zuul against a cloud(s) you controlled, I believe all of the building blocks are there and it wouldn't be terrible
[12:30] <mordred> and would be a pretty cool thing to show
[12:31] <fungi> mnaser: nodepool is only aware of 26 ready images in vexxhost-sjc1 (current and previous for each of the 13 we build there) plus one it's been trying to delete for several months (f670e6be-953b-4d4b-a931-6cbb5b568410)
[12:32] <mordred> fungi, icey: actually - we could probably write jobs like that in openstack zuul with an additional first step of a job that ran devstack or something to create a cloud, then pass those creds to the "create a project" job - might be a neat set of jobs to have in the library
[12:33] <mordred> it would not be a FAST job - but we could show the idea functionally working, and then for people with different contexts, they could use them for real
[12:33] <icey> mordred: considering, right now, I have one of my jobs that just topped out at 6 hours...
[12:33] <icey> mordred: (well, 6 independent jobs, but run sequentially)
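
For context, a rough shape of the per-build project pattern being discussed, as plain openstackclient calls run with credentials able to create projects; the naming and the reliance on `openstack project purge` for teardown are illustrative assumptions:

    # parent job: allocate a throwaway project keyed to the build
    # (BUILD_UUID is an illustrative stand-in for the zuul build id)
    openstack project create "zuul-build-${BUILD_UUID}"
    # ...freeze, hand scoped credentials for it to child jobs via
    # zuul_return, and let them create whatever they need inside it...
    # parent job after unfreezing: tear the whole project down
    openstack project purge --project "zuul-build-${BUILD_UUID}"
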
[12:34] <fungi> mnaser: openstackclient says we have 67 active images in there though, so presumably we've leaked ~40. i'll try to filter out the ones nodepool is actually using and delete the rest here shortly
[12:34] <mnaser> fungi: thank you!
[12:35] <fungi> thank you for letting us know!
[12:35] <fungi> and apologies for the inconvenience
[12:52] <openstackgerrit> Natal Ngétal proposed openstack/diskimage-builder master: [Configuration] Switch to stestr.  https://review.opendev.org/629414
[12:53] <Shrews> fungi: we should try to see if there is any info in nodepool logs for those leaked images.
[12:59] <fungi> i'll try to take a look in a moment, sure
[12:59] <Shrews> fungi: if you want to paste me the image ids i can check the logs. seems like a doable one-handed task
[13:00] <fungi> my pastebinit has stopped working with paste.o.o so that'll take a bit too ;)
[13:00] * fungi is still trying to morning, and isn't doing a very good job of it
[13:05] <Shrews> fungi: no rush. everything i do now takes 10x as long
[13:12] <fungi> i should hook you up with a one-handed chording keyboard
[13:18] <fungi> Shrews: http://paste.openstack.org/show/754218/
[13:19] <fungi> i'm first going to see if i get any interesting errors trying to delete f670e6be-953b-4d4b-a931-6cbb5b568410 since nodepool can't seem to
[13:21] <Shrews> fungi: happen to know which nb* builds vexxhost offhand?
[13:22] <fungi> Shrews: looks like it's nb01
[13:23] <fungi> oh, or maybe not. i think our macro expansion includes all the providers
[13:25] <fungi> both nb01 and nb02 have provider entries for it
[13:26] <fungi> d'oh
[13:26] <fungi> i guess they both do it (whoever gets to it first)
[13:26] <fungi> not like launchers where they get assigned to specific providers
[13:26] <fungi> Shrews: you and your trick questions
[13:27] <fungi> anyway, the reason f670e6be-953b-4d4b-a931-6cbb5b568410 isn't getting deleted is that glance says that image is in use
[13:28] <fungi> maybe we have a held node which was booted from it... checking now
[13:28] <Shrews> oh, i didn't realize we split it up like that
[13:29] <Shrews> i'm having problems finding any logs for the leaked ones
[13:30] <fungi> they may be older than our log retention?
[13:31] <openstackgerrit> Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Enables molecute template to use depends-on tripleo-repos  https://review.opendev.org/669871
[13:31] <fungi> so, the only node in sjc1 older than a day is 62e01c16-019d-46b8-b92b-b3e409161e97 which is a fedora-29-vexxhost instance in ready state. doesn't correspond to our stuck ubuntu-xenial image deletion
[13:31] <fungi> mnaser: any way you can determine what has f670e6be-953b-4d4b-a931-6cbb5b568410 in use for over two months?
[13:32] <fungi> (that's an image uuid in sjc1, for context)
[13:34] <openstackgerrit> Nate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation  https://review.opendev.org/669775
[13:41] <AJaeger> config-core, are you fine with adding tripleo-repos as required-projects to the molecule template? See https://review.opendev.org ...
[13:48] <Shrews> fungi: i think we need the image metadata from those leaks to map to a build id that we can grep the logs for
[13:49] <fungi> ahh, yeah i guess they're no longer embedded in the image names
[13:49] <Shrews> the number in the image name is a timestamp  :(
[13:50] <fungi> looks like `openstack image show` has it in the properties list... i should be able to parse it out of there
[14:03] <fungi> i've got a loop going to assemble that list now
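
For context, the loop is presumably something of this shape; which property nodepool stamps its build id into is an assumption here, so the property names would need checking against a real upload:

    # dump each image's properties so the nodepool build id can be
    # matched against builder logs
    for img in $(openstack image list -f value -c ID); do
        echo "$img: $(openstack image show "$img" -f value -c properties)"
    done
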
[14:13] <fungi> Shrews: i've appended it to the bottom of http://paste.openstack.org/show/754221/
[14:15] <Shrews> fungi: thx
[14:25] <AJaeger> mwhahaha: tempest-tripleo-ui still needs removal from project-config, could you take care of that, please?
[14:25] <mwhahaha> AJaeger: yea gimme a few
[14:26] <AJaeger> sure, mwhahaha . Thanks!
[14:38] <clarkb> Shrews: fungi: builders build an image then upload it to all clouds. So it comes down to which builder grabs the job first (the arm64 builder is special because there is only one of them, but if we had two of those it would work that way too)
[14:38] <clarkb> and ya launchers are split up roughly by total max-servers count since it's a thread per launch and we try to distribute the thread cost
[14:40] <clarkb> Shrews: fungi: as for image leaks, a lot of them seem to happen due to failed uploads
[14:41] <Shrews> failed uploads that somehow succeed? neat
[14:42] <fungi> ahh, so a builder tries to upload an image, "fails" for $reasons, then doesn't clean it up because it thinks it shouldn't exist?
[14:42] <clarkb> Shrews: I haven't looked at the current crop of leaked images but no I don't think they succeeded. They remain stuck in a "saving" state
[14:42] <clarkb> fungi: yes exactly
[14:42] <clarkb> basically every time I've looked at these leaked images the bulk of them are still in a saving state
[14:43] <clarkb> so I think sdk is failing somewhere and not cleaning up after itself
[14:43] <Shrews> fungi's paste shows them as active
[14:44] <openstackgerrit> Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui  https://review.opendev.org/669883
[14:44] <mwhahaha> AJaeger: -^
[14:44] <fungi> i wonder if the builders could do something like the launchers do to identify leaked resources... add unique metadata identifying the nodepool deployment (for cases of multiple nodepools in the same project) and then delete any images which have the deployment's id but are not known in the image-list?
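
For context, fungi's idea might look like this from the command line; the nodepool_deployment property is purely hypothetical, nothing sets it today:

    # stamp an upload with an identifier for this nodepool deployment
    openstack image set --property nodepool_deployment=opendev "$IMAGE_ID"
    # later, anything carrying the stamp but missing from nodepool's
    # own records is a leak candidate
    openstack image list --property nodepool_deployment=opendev -f value -c ID
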
[14:45] <clarkb> corvus: see comment on https://review.opendev.org/#/c/669780/3
[14:46] <fungi> i think we can't ever entirely trust cloud apis to be correct about when something has failed, and should probably just assume that sometimes they provide misleading answers
[14:46] <Shrews> fungi: i think they could (and should if it can't be corrected in sdk)
[14:47] <fungi> i have a feeling the sdk won't be a complete solution because it's not persistent
[14:47] <fungi> for example, if the upload fails but the image which shouldn't exist doesn't show up in the list immediately due to some sort of update lag, or refuses to be deleted because there's background conversion underway, or whatever
[14:49] <fungi> the nodepool builders on the other hand maintain state and can know in the future whether there's suddenly something there which shouldn't be
[14:50] <Shrews> fungi: the problem will be in getting an image ID to delete. if it failed, nodepool won't have it
[14:50] <Shrews> so.... not sure what the solution is here
[14:50] <fungi> ahh, you mean short of iterating through the image list in the provider
[14:51] <fungi> i agree that's probably expensive (depending on the number of images involved)
[14:51] <clarkb> ya the sdk/shade has a porcelain method that is basically "do all the steps to upload an image, of which there are many"
[14:51] <clarkb> nodepool only gets the image info if that succeeds
[14:51] <clarkb> (I think)
[14:51] <clarkb> so any errors in that series of steps must be handled at the sdk layer, or we need to stop using porcelain methods in nodepool and basically rewrite shade in nodepool
[14:52] <corvus> clarkb: i don't understand what 669780 has to do with the journal-to-console element
[14:52] <fungi> nodepool can still ask the api for a list of image uuids and then ask the api for the properties for each of those and look in the metadata from them to compare to what's in zk, right?
[14:52] <fungi> s/api/sdk/
[14:53] <fungi> i mean, it's inefficient for sure, and i don't know how much of that can be effectively cached by the sdk layer
[14:53] <clarkb> corvus: 669780 logs debug logs which include instance console logs when instances fail to boot. the journal-to-console element includes the journal contents in the instance console log
[14:54] <clarkb> corvus: I linked an example log for you on 669780
[14:54] <fungi> i guess at the moment there's not a deleter thread for builders like there is for launchers?
[14:54] <clarkb> corvus: this is all about improving the debuggability of those jobs. I don't see these changes as fundamentally a problem other than we may wish to use setup simplifiers like the -d option
[14:54] <clarkb> fungi: correct
[14:55] <Shrews> fungi: there is. it cleans up deleted, failed, and unlocked uploading zk records. for deleted, it also deletes any associated upload
[14:56] <fungi> oh, convenient
[14:56] <Shrews> fungi: we'd have to change the logic to not delete failed zk records w/o first waiting a period to check for the weirdness we see here
[14:56] <clarkb> Shrews: fungi: but it only uses the zk records there, not the api records iirc
[14:56] <Shrews> clarkb: correct
[14:57] <clarkb> whereas with the launchers we actually check the api for leaks of things like floating ips and instances
[14:57] <clarkb> which is different
[14:57] <fungi> i think fundamentally the problem is likely the same... we tell clouds to create resources, sometimes those clouds lie about whether a resource was created, so we can't trust them to always clean up those resources they lied to us and said failed to be created
[14:57] <Shrews> right, but it can be added
[14:58] <clarkb> fungi: well in this case I don't think the cloud is lying
[14:58] <fungi> the sdk can maybe double-check behind the upload failure and then issue an immediate delete for things which shouldn't be there
[14:59] <clarkb> fungi: in this case I think sdk is not taking the appropriate steps on failure to undo the many steps of image uploading
[14:59] <clarkb> (in some cases)
[14:59] <clarkb> fungi: image upload is like 4 or 5 steps
[14:59] <clarkb> and the sdk must undo each successful step when the current step fails
[14:59] <Shrews> mordred: you may find the convo above entertaining
[14:59] <mordred> well - it must undo _some_ of the successful steps
[14:59] <fungi> and what creates the image in glance isn't the last of those steps?
[15:00] <mordred> fungi: nope
[15:00] <clarkb> fungi: nope
[15:00] <mordred> fungi: that would be too convenient
[15:00] <clarkb> it's the opposite actually
[15:00] <clarkb> the first step creates the image
[15:00] <clarkb> with no image content
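
For context, the first two steps clarkb describes map onto two glance v2 api calls, sketched here with curl; $GLANCE and $TOKEN are placeholders, and the sdk's real flow has more steps (conversion, task-based import on some clouds):

    # step 1: create the image record; it now exists in glance with
    # status "queued" and no data behind it
    IMAGE_ID=$(curl -s -X POST "$GLANCE/v2/images" \
        -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
        -d '{"name": "test", "disk_format": "qcow2", "container_format": "bare"}' |
        python3 -c 'import json,sys; print(json.load(sys.stdin)["id"])')
    # step 2: upload the bits; if this fails, the record from step 1
    # lingers unless something remembers to delete it
    curl -X PUT "$GLANCE/v2/images/$IMAGE_ID/file" \
        -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/octet-stream" \
        --data-binary @test.qcow2
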
[15:01] <corvus> clarkb: do you, in your heart of hearts, really believe that these patches are needed for *nodepool*?  or do you think they are needed to debug operational issues with clouds and dib elements?
[15:01] <mordred> yah - well, that's the PUT method. in the rackspace task-import method, it's a bit more like what fungi expects
[15:01] <fungi> oh! so there can be contentless image records in glance? that's fun
[15:01] <mordred> fungi: yah
[15:02] <fungi> i guess the ones where you specify a durable image location must work that way
[15:02] <clarkb> corvus: I guess I see those as sort of the same problem? the only way to know if it is a cloud or dib etc problem and not a dib problem (dib didn't tell cloud to use networking properly or something) is to record the data so that we can inspect it
[15:02] <clarkb> corvus: you can't rule any of them in or out with the black box we currently have
[15:02] <mordred> Shrews, clarkb, fungi: maybe we should make more use of properties on the image records like we do for servers
[15:02] <corvus> clarkb: right.  but either way, it's not a nodepool problem.  the way i see it is, yes, these would be handy if we ever broke the ability to upload our basic centos image to devstack.  but the testing on dib is supposed to prevent that from happening.
[15:03] <mordred> so that we can do a better job 2pcing between image records and zk records - then do what Shrews was saying and wait to remove the zk node until we're more sure all the bits have been cleaned up
[15:03] <corvus> clarkb: so *dib* needs this, and *opendev* needs this, but *nodepool* doesn't.  so we're throwing the changes at the wrong repo.
[15:03] <fungi> mordred: right, that's what i was wondering as well, like whether we could stash a deployment identifier in all of them so nodepool can recognize image records it created even if it lost track of them or thought they shouldn't exist
[15:03] <corvus> clarkb: i'm trying really hard to stop alienating nodepool developers by asking them to review a bunch of changes that opendev needs to support its images in new clouds.
[15:04] <clarkb> corvus: is that actually alienating or just something that they will ignore?
[15:04] <mordred> fungi: yah - I haven't thought through it fully yet - there may be some issues with doing that we'd need to think about
[15:04] <clarkb> personally I see this as useful debugging for the jobs
[15:04] <clarkb> regardless of the repo in question
[15:04] <mordred> and we might need to add some helper methods in sdk that nodepool can call for this sort of thing
[15:04] <clarkb> because if the job fails (for whatever unknown reason) that information will help us understand why
[15:04] <mordred> because the mechanism of cleaning up will vary by cloud upload mechanism, so it shouldn't be nodepool trying to do this directly
[15:04] <corvus> clarkb: i do not think we should have to ask them to ignore changes like this.  that seems rude to me.
[15:06] <clarkb> corvus: ok so you are ok with nodepool logging at a debug level (that at least seems ok for nodepool tests). Then set additional elements in jobs for other repos (like dib/sdk) as necessary to debug the functionality of those tools specifically on failure?
[15:06] <clarkb> Mostly for me this is about having knowledge about why something failed and not about assigning blame to one group or another
[15:08] <corvus> clarkb: i'm not trying to assign blame either.  i'm trying (and the weeks of work i did to move all the dib jobs *out* of the nodepool repo is in service of this) to move these jobs to the right place.  we've been just assuming that everyone working on nodepool is enthralled by the idea of getting $distro images working on $cloud.  it's just not the case.  it's important, but it's important to a different set of people, so let's get everyone the tools they need.
[15:09] <corvus> clarkb: honestly, i'm not thrilled by the idea of debug logging by default.  i like the idea of running them in a more production-like setting.  so maybe that should be an option too.
[15:09] <clarkb> corvus: yup I agree. I see this as not conflicting with that though. The two things that are done here seem generically useful to debugging nodepool's job failures. Those two things are 1) log at debug level 2) include logs from the test instance to understand why it might fail
[15:11] <corvus> clarkb: so you think the journald-to-console element is something we should always have in nodepool jobs, in case we break something with a change in nodepool itself?
[15:11] <clarkb> corvus: yes, as it allows you to trace network connectivity and the ssh connection testing in particular, which we can break on the nodepool side
[15:15] <corvus> clarkb: okay, i think i'm convinced on that one.  maybe for the debug change we should make that a job variable, so we can set it or not in nodepool as needed, and the other job users can do the same?
[15:15] <clarkb> corvus: ya that works. Then if we end up with a consistent failure we can toggle that flag and get all the data on the next run
[15:18] <corvus> clarkb: ok, review comments adjusted accordingly, thanks :)
[15:20] <fungi> so on the image leaks topic, i agree it's probably good to make sure the sdk is really doing what it should to unwind image record creation after the api reports an upload failure before we go deciding we need more logic in nodepool to hunt for leaked images which sneak past the sdk's cleanup, though ultimately the latter is probably still useful
[15:22] <mordred> fungi: yes - I agree
[15:25] <clarkb> fungi: I've also tried for a while now to convince the glance team that the api is too complicated for the use case, but I think I mostly failed there when they rewrote the api and added steps instead of removing them
[15:26] * fungi needs to head to an appointment, but will be back shortly
[15:26] <clarkb> seems like the plan to store more than just disk images at one time is part of the complication there?
[15:36] <openstackgerrit> Donny Davis proposed openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing  https://review.opendev.org/669893
[15:38] <clarkb> config-core infra-root ^ care to be second reviewer on that? I think we'll turn it off while we work through improving our testing then fixing the images
[15:38] <Shrews> clarkb: corvus: where will the journald output be found using that role? directly in the nodepool logs?
[15:38] <clarkb> I think donnyd wants to do some hardware changes too
[15:39] <clarkb> Shrews: yes, see http://logs.openstack.org/73/669773/2/check/dib-nodepool-functional-openstack-centos-7-src/00f6345/nodepool/nodepool-launcher.log as an example
[15:39] <corvus> Shrews: yes, iff we log at debug level (thus the parent change)
[15:39] <donnyd> yes. the mirror will be offline for about 10-15 minutes
[15:41] <Shrews> ah ok. i just didn't scroll down far enough in that log file to see it
[15:41] <Shrews> kinda ugly output
[15:42] <Shrews> can't it be sent to syslog?
[15:42] <corvus> Shrews: we don't use syslog at all in nodepool...
[15:42] <clarkb> and the issue is that if networking is broken there is no good way to get syslog
[15:43] <clarkb> the console log is retrieved through the magic of qemu so is the most accessible logging in that case
[15:43] <Shrews> corvus: yet we pull the file: http://logs.openstack.org/87/669787/2/check/nodepool-functional-openstack/44716e5/
[15:43] <corvus> Shrews: (but, technically, yes, you could write a logging config to send it to syslog)...
[15:43] <clarkb> Shrews: that's the syslog of the hypervisor, not the guest
[15:43] <Shrews> ah
[15:44] <Shrews> sorry, still trying to catch up on these test changes
[15:59] <openstackgerrit> Merged openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing  https://review.opendev.org/669893
[15:59] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/669787
[16:00] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: Add shared nodepool_opts variable to openstack func test  https://review.opendev.org/669899
[16:01] <donnyd> just a data point - FN will still be up, the only outage will be for the mirror
[16:01] <donnyd> clarkb:
[16:01] <donnyd> clarkb: is there a way to make the mirror ha?
[16:02] <corvus> donnyd: yes, but it's usually not worth the effort since we have other clouds
[16:02] <clarkb> donnyd: we could run two mirrors and an haproxy, but I think we've largely decided our redundancy is by having more than one cloud region, and then we don't have to overcomplicate each cloud region's setup
[16:03] <donnyd> yea that makes sense to me
[16:03] <openstackgerrit> Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies  https://review.opendev.org/669773
[16:03] <donnyd> just curious
[16:04] <clarkb> corvus: ^ ok I think those three changes do what you suggest. Have a moment to quickly check if that is better? (note I didn't update ianw's change for the debug thing as the approaches were quite a bit different; instead I pushed a different change)
[16:04] <corvus> clarkb: yeah, left a comment on '99
[16:04] <clarkb> thanks
[16:05] <corvus> clarkb: also, i was originally thinking of an explicit job var just for debug, but i'm okay with this if that's what you prefer.
[16:05] <clarkb> corvus: is there a jinja filter for "if var is true then this value instead"?
[16:05] <clarkb> Mostly this was simpler to write and maybe more flexible so I did that
[16:06] * clarkb finds ansible jinja filter docs
[16:06] <corvus> clarkb: maybe the usual python "and or" thing would work?
[16:07] <corvus> clarkb: apparently: http://jinja.pocoo.org/docs/2.10/templates/#if-expression
[16:07] <clarkb> {{ nodepool_debug and '-d' or '' }} ?
[16:07] <corvus> that's what i was thinking, but docs say: {{ '-d' if nodepool_debug else '' }}
[16:08] * clarkb updates with docs this time
[16:10] <donnyd> Is there a way for me to check what build was running on a particular node? I would like to do everything possible to reduce build times.
[16:13] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs  https://review.opendev.org/669899
[16:13] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/669787
[16:13] <corvus> donnyd: you can probably query logstash for what you need for that... you can get build results and runtimes and node name, and you can query by region.  it won't have the instance uuid though.
[16:13] <corvus> donnyd: (it will have the ip address and time, so you can xref that way)
[16:14] <corvus> donnyd: http://logstash.openstack.org/#/dashboard/file/logstash.json
[16:14] <openstackgerrit> Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies  https://review.opendev.org/669773
[16:14] <clarkb> corvus: ok I think that addresses the comments. 669773 should confirm it works as expected
[16:17] <clarkb> corvus: donnyd: ya you can filter by node_provider:"fortnebula-regionone" AND build_name:"some-job-name" AND message:"stuff" and so on
[16:17] <clarkb> that should allow you to narrow down jobs then dig in via their log links (which can be found by clicking on the event results logstash returns; it's part of the event details)
[16:20] <clarkb> comparing the boot-up logs we captured to those of a host in fortnebula, the fn node's networkmanager logs show it doing dhcp
[16:20] <clarkb> the test node doesn't seem to dhcp. This makes me wonder if there is a bug in ordering where we don't enable dhcp on the interface? That's a stretch; need more data
[16:21] <clarkb> I think what I'll do next is build an image locally then start testing with that
[16:21] <clarkb> that will take some time for me to bootstrap though as I've let my dib VM rot
[16:21] <clarkb> actually I could just edit the centos 7 image that dib has already built via nodepool. Except that is a large image
[16:22] <clarkb> maybe I'll start with that since networking tends to be cheap-ish
[16:24] <openstackgerrit> Graham Hayes proposed zuul/nodepool master: Implement an Azure driver  https://review.opendev.org/554432
[16:27] <clarkb> alright I'm going to pop out for a bike ride while I wait for test results and consider quicker rtt methods for testing the centos problem
[16:32] <logan-> clarkb: regarding your jinja question.. ternary filter
[16:32] <sshnaidm|ruck> How do I configure this - I'd like to run the same periodic job on different branches; apparently this doesn't work: https://github.com/openstack/tripleo-ci/blob/master/zuul.d/periodic.yaml#L4
[16:32] <sshnaidm|ruck> it runs on master only
[16:47] <AJaeger> sshnaidm|ruck: tripleo-ci has only a master branch, so it will run only on that one...
[16:48] <sshnaidm|ruck> AJaeger, I see, so if I configure the same on a branched repo it'll work?
[16:49] <AJaeger> sshnaidm|ruck: if you add it in project-config, it will work
[16:49] <AJaeger> sshnaidm|ruck: if you add it to another repo, it will only run on the branch you add it on...
[16:49] <AJaeger> so, you would need to add it to *all* branches
[16:50] <sshnaidm|ruck> AJaeger, ok, thanks
[16:50] <AJaeger> sshnaidm|ruck: unless there's a magic zuul variable to apply this on all branches, best ask corvus
[16:50] <AJaeger> or check docs
[16:50] <sshnaidm|ruck> corvus, do we have such? ^^
[16:57] <AJaeger> jamespage: could you check https://review.opendev.org/615273, please? Is there progress happening?
[16:59] <AJaeger> sshnaidm|afk, check: https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers
[16:59] <donnyd> clarkb: is your test node using just SLAAC or dhcpv6-stateless? Wonder if it even makes a difference
[17:05] <sshnaidm|afk> AJaeger, not sure I understand this well and how I can use it..
[17:05] <sshnaidm|afk> will ask in #zuul
[17:05] <fungi> okay, back and caught up on scrollback here
[17:06] <fungi> clarkb: donnyd: we also have hostids indexed now in logstash, if that helps
[17:14] <openstackgerrit> Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui  https://review.opendev.org/669883
[17:29] <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Retire docs-specs  https://review.opendev.org/668854
[17:30] <AJaeger> config-core, two project retirements for review, please: https://review.opendev.org/#/c/668854 and https://review.opendev.org/#/c/669883/
[17:38] <donnyd> thanks for all of the pointers, I should be able to get what I'm looking for to make the tuning optimal for this workload
[17:43] <clarkb> donnyd: I think NM may try both
[17:43] <clarkb> donnyd: it's not actually completely clear to me what choices it is making :/
[17:43] <clarkb> donnyd: and the behavior seems to change if the old sysconfig stuff is being used
[17:49] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs  https://review.opendev.org/669899
[17:49] <openstackgerrit> Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/669787
[17:51] <clarkb> I think I've decided the easiest way to debug this is to hold the centos functional jobs for https://review.opendev.org/#/c/669773/4 so I'm doing that now
[18:02] <fungi> Shrews: is there any other detail we should collect before i delete the leaked images from sjc1?
[18:02] <Shrews> fungi: i don't believe so
[18:02] <fungi> okay, i'll put together the list and start cleaning up
[18:02] <zbr> clarkb: fungi AJaeger corvus: please have a look at https://review.opendev.org/#/c/669871/ and let me know which path I should take (check my last comment).
[18:04] <clarkb> I thought depends-on did end up as implicit required-projects?
[18:06] <clarkb> zbr: I think what I would do is have an openstack-tox-molecule job that sets the success and failure urls, then inherit from that in a tripleo-tox-molecule job and set required-projects there. Then have two different templates that should never need to change
[18:10] <zbr> ok, i guess you meant a tripleo-tox-molecule template, but that's ok. i will do it that way.
[18:10] <clarkb> no, jobs
[18:11] <clarkb> first new job sets success and failure urls, second new job sets the tripleo required-projects list
[18:11] <clarkb> then you can either put those in templates or not
[18:12] <zbr> ok, i think i got it
[18:12] <clarkb> that way if you make fixes to the other jobs the tripleo child job gets all of those fixes
[18:12] <clarkb> and you don't have to double-account those changes
[18:21] <zbr> clarkb: can templates have the same names as jobs?
[18:22] <clarkb> zbr: I believe they can as they are applied in different contexts
[18:23] <zbr> ok. that is good as it would be useful to have both. in most cases people would refer to the template.
[18:31] <openstackgerrit> Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it  https://review.opendev.org/669871
[18:33] <openstackgerrit> Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it  https://review.opendev.org/669871
[18:38] <clarkb> corvus: apparently if you set -d with nodepool it does not daemonize
[18:39] <clarkb> (this causes the ansible task that starts nodepool-builder to never end, and we never start the launcher)
[18:40] <clarkb> that would be why ianw wrote the logging config, I guess
[18:47] <fungi> mnaser: Shrews: in addition to the one image which nodepool is failing to delete due to it being mysteriously in use after more than two months (f670e6be-953b-4d4b-a931-6cbb5b568410), a couple of the images nodepool has lost track of are also refusing to delete for the same reason: 8ea352da-2907-41c9-9b49-d0028f661fd7, c2f138b5-cfe5-43e8-8a11-251d71033628, 233dd054-dae8-4b12-a3de-fe2aa7671f37, e2f7361d-9d04-4b93-b2fa-f4bad095499c
[18:48] <clarkb> fungi: are those all test images (e.g. not control plane images)?
[18:48] <fungi> test images, all
[18:48] <fungi> so no clue what instances would still be using them
[18:48] <clarkb> could be held nodes
[18:48] <fungi> in the case of f670e6be-953b-4d4b-a931-6cbb5b568410 the only running instance anywhere near that old is for a different image type entirely
[18:49] <fungi> there are currently no held nodes in sjc1
[18:49] <clarkb> cinder and glance and nova must be getting confused then
[18:51] <clarkb> donnyd: am I good to boot test nodes in fn?
[18:51] <fungi> i wouldn't be surprised if certain sorts of boot failures leave bfv references behind somewhere
[18:53] <fungi> #status log manually deleted 30 leaked nodepool images from vexxhost-sjc1
[18:53] <openstackstatus> fungi: finished logging
[18:55] <clarkb> ~5 minutes to our weekly meeting. I expect this one to be quick considering many of us have only worked ~2 days since the last one
[18:55] <donnyd> clarkb: yea you should be good to go
[18:56] <fungi> clarkb: i call those folks "efficient"
[18:57] <fungi> i wonder if these delete failures point to leaked volumes i can find in cinder?
[18:58] <fungi> heading down that route next
[18:58] <clarkb> fungi: oh interesting, ya, possibly the instance delete left a volume hanging around
[18:59] <fungi> not all of them i guess. i see three marked "available" in openstack volume list
[18:59] <fungi> all 80gb so must be associated with test nodes
[18:59] <clarkb> if you volume show on them and they were booted from our images you should get the image uuids out
[18:59] <clarkb> maybe even the names too
[19:01] <fungi> all 3 available volumes were created on 2019-06-21
[19:10] <fungi> oh, in addition to the 3 "available" volumes there are also 6 "error_deleting" volumes
[19:15] <fungi> okay, all 9 leaked volumes are built from one of two images: c2f138b5-cfe5-43e8-8a11-251d71033628 and e2f7361d-9d04-4b93-b2fa-f4bad095499c
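
For context, cinder records a volume's source image in its volume_image_metadata, which is presumably how that mapping was assembled; a sketch:

    # list each volume's source-image metadata; leaked volumes point
    # back at the glance images they are pinning in place
    for vol in $(openstack volume list -f value -c ID); do
        openstack volume show "$vol" -f json -c volume_image_metadata
    done
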
[19:16] <corvus> clarkb: sigh.  i guess we should put ianw's logging config change behind your flag?
[19:16] <fungi> that covers 2 of our 4 undeletable images at least
[19:17] <clarkb> corvus: ya that's sort of what I'm thinking unless we want to change nodepool behavior
[19:19] <corvus> clarkb: i kinda do, but i think we've done as much as we can easily, and the rest is going to be a slower process, so i don't think we should depend on that here.
[19:19] <corvus> (further work on that will entail more breaking behavior changes i think)
[19:27] <openstackgerrit> Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it  https://review.opendev.org/669871
[19:28] <AJaeger> sshnaidm|afk: fixed the syntax for you ^
zbrAJaeger: thanks! i was away to pick up something and when I am back i see it fixed, that's a really nice experience.19:30
AJaeger;)19:31
AJaegerzbr: you can now do the tripleo-ci part of that change ;)19:32
*** pkopec has joined #openstack-infra19:32
*** eernst has joined #openstack-infra19:33
*** pkopec has quit IRC19:36
*** ttx has quit IRC19:49
*** ttx has joined #openstack-infra19:50
clarkbdonnyd: fyi getting {'message': 'No valid host was found. There are not enough hosts available.', 'code': 500, 'created': '2019-07-09T19:49:38Z'} from nova trying to create new instances19:51
donnyd60 seconds19:51
donnydthey are just coming back up19:52
donnydswapped out my core switch for 40G while I was tinkering.. so I won't have to do it later19:52
clarkbk I'll delete and try again19:52
*** eernst has quit IRC19:52
AJaegerclarkb: was https://review.opendev.org/669871 and https://review.opendev.org/669871 what you had in mind for tox-molecule? I think you did, and voted those changes by zbr up19:54
donnydshould be good to go now19:55
clarkbAJaeger: that's the same change twice but looking19:55
*** ijw has joined #openstack-infra19:55
clarkbAJaeger: ya that is what I had in mind for the first bit. Then tripleo can inherit from that job and set the required projects19:55
clarkbapproved19:55
corvusclarkb: want me to do the nodepool debug change?19:56
*** ijw has joined #openstack-infra19:56
clarkbcorvus: if you are able that would be great19:56
corvuson it19:56
* clarkb boots some new instances in fn19:56
clarkbI've got my held node to poke around on once I'm done double checking on fn19:57
donnydyour other test instances should be back up and good to go as well19:57
clarkbok just confirmed that the image that fails in nodepool's func test boots properly on fortnebula with working ipv620:00
clarkbwaiting for the old unit image to boot to confirm that does not work. Then I think that confirms we have a good fix and just need to sort out why it breaks in testing20:00
donnydglance speeds are next on my hit list.. taking forever to pull down the image from the controller20:01
fungiokay, so of the 9 leaked volumes in vexxhost-sjc1 (all were from the same date btw, 2019-06-21), cinder let me delete the 3 marked as available but not the 6 in an error_deleting state (not terribly surprising)20:06
fungithose are accounting for 2 of the 5 total undeletable images in that provider, so i still have no explanation for the remaining 320:07
openstackgerritMerged openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it  https://review.opendev.org/66987120:07
funginext i guess i can scrape the image relationships for the in-use volumes and see if any turn up there20:08
*** eernst has joined #openstack-infra20:08
zbrthanks!!20:08
clarkbdonnyd: any idea why clarkb-test-centos-old-unit and clarkb-test-centos-old-unit2 seem to be stuck in BUILD state? Makes me wonder if I edited that image file improperly20:12
donnydits not the image20:12
donnydtry one more time20:13
openstackgerritJames E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs  https://review.opendev.org/66993920:14
openstackgerritJames E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/66978720:15
clarkbdonnyd: looks like clarkb-test-centos-old-unit2 ended up booting and it has the wrong ipv6 address and working ipv420:15
clarkbok so I think that confirms our suspicions of this being a fix. Which means I have to direct my focus on making the test job work20:15
corvusclarkb: i rechecked 669773; i think that's everything, but you may want to give it a quick once-over before we wait an hour for nothing :)20:16
fungibingo! all the undeletable images are also associated with in-use volumes in cinder20:16
clarkbcorvus: looking20:16
funginow to try and identify whether they're really still used by server instances. smart money says they're not20:17
fungiand it's the cinder metadata which is the problem20:17
clarkbcorvus: yup that looks about right20:17
clarkbthanks20:17
clarkbcomparing console log from the held test node against the images I uploaded to fn from that test node, the major difference seems to be that there are no dhcp actions logged on the failed test node side20:19
clarkbI think that is the thread to pull on20:20
Shrewsfungi: your detective skills are impressive20:20
fungiShrews: impressively slow20:21
*** xek has quit IRC20:21
clarkbI'm gonna pop out for a few then dig into that20:21
fungimy detective kit consists primarily of grep and for loops in shell20:22
*** Lucas_Gray has joined #openstack-infra20:24
*** irclogbot_1 has joined #openstack-infra20:24
openstackgerritNate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation  https://review.opendev.org/66977520:26
*** irclogbot_1 has quit IRC20:27
fungioof, 14 volumes in-use referencing those undeletable images20:31
fungiand yeah, looks like these are also all from 2019-06-2120:32
fungisomething must have happened to leave this mess behind20:32
Shrewsdon't blame me. i was under general anesthesia that day20:33
Shrewsor maybe i did do it and can't remember20:33
fungihah20:34
Shrews:)20:34
fungiand as i suspected, spot checks indicate the server-id referenced for each of these attachments, when fed into openstack server show, comes back with a "No server with a name or ID...exists" error20:34
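(Those spot checks are straightforward to automate; a sketch under the same clouds.yaml assumption as above, cross-checking each in-use volume's attachment against nova:)

    # Sketch: flag volumes whose attachments reference servers that no
    # longer exist. find_server with ignore_missing=True returns None
    # instead of raising when the server is gone.
    import openstack

    conn = openstack.connect(cloud='vexxhost-sjc1')
    for volume in conn.block_storage.volumes(details=True):
        if volume.status != 'in-use':
            continue
        for attachment in volume.attachments or []:
            server_id = attachment.get('server_id')
            if conn.compute.find_server(server_id, ignore_missing=True) is None:
                print('orphaned attachment:', volume.id, '->', server_id)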
fungialso, i mistyped, it's 24 volumes not 1420:36
*** hamzy has quit IRC20:37
*** eharney has quit IRC20:37
*** jamesmcarthur has quit IRC20:40
*** jamesmcarthur has joined #openstack-infra20:41
funginot finding any mention in irc channel logs of something happening on that date which might explain this20:41
fungii was trying to troubleshoot nodepool prematurely deleting network ports in ovh20:41
*** strigazi has quit IRC20:42
*** Fidde has joined #openstack-infra20:42
fungiShrews: that reminds me, did you catch that conversation later?20:42
Shrewsnope20:42
*** joeguo has joined #openstack-infra20:42
fungiupshot was https://review.opendev.org/666852 Increase port cleanup interval20:43
*** strigazi has joined #openstack-infra20:43
Shrewsah, i did see that change20:43
fungibasically the port gets allocated, nova takes a while scheduling the server instance, nodepool thinks the port is leaked and deletes it, instance is eventually scheduled and then the boot fails20:44
fungithe lag was prevalent enough in ovh that it was creating a fairly steady stream of boot failures for us20:45
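(Illustrative only, not nodepool's actual code: the shape of the race is a periodic task that deletes DOWN ports older than some threshold, so if nova takes longer than that to bind the port to the scheduled instance, the cleanup wins and the boot later fails; the fix widened that window:)

    # Hypothetical sketch of the cleanup pattern involved; names and the
    # threshold value are made up for illustration.
    import time

    PORT_CLEANUP_THRESHOLD = 600  # seconds a DOWN port may linger

    def cleanup_leaked_ports(conn, first_seen):
        now = time.time()
        for port in conn.network.ports(status='DOWN'):
            first_seen.setdefault(port.id, now)
            if now - first_seen[port.id] > PORT_CLEANUP_THRESHOLD:
                # if nova is still scheduling the instance this port was
                # created for, deleting it here dooms the boot
                conn.network.delete_port(port)
                del first_seen[port.id]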
*** jamesmcarthur has quit IRC20:48
fungimnaser: executive summary of the remainder of the leak in sjc1... something seems to have happened there on 2019-06-21 which left 24 orphaned cinder volumes in sjc1 claiming to be in-use attached to server instances which no longer exist per nova. we've also got another 6 cinder volumes from that date which are stuck in an error_deleting state and i can't seem to delete those either. there are 520:52
fungiglance images which we should be able to delete once the volumes are gone. do you want that list of 30 volume uuids?20:52
smcginnisWhen the volume lifecycle is managed by nova, I've seen cases where Nova hits an error and never tells Cinder to delete them. Maybe the case this time.20:53
*** bobh has joined #openstack-infra20:54
fungiyeah, kinda like the port leaks we see with nova/neutron coordination in some providers20:58
*** bobh has quit IRC21:01
*** pkopec has joined #openstack-infra21:01
mordredfungi: are the volumes identifiable enough, do you think, that we could delete them in the nodepool/sdk cleanup method? I'm a little more conservative about volumes, because I don't want to delete something that someone needs though ...21:04
openstackgerritMerged zuul/zuul master: Additional note about branches for implied-branches  https://review.opendev.org/66741521:06
fungimordred: in this case cinder seems to not be letting me delete these21:06
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2  https://review.opendev.org/66994821:06
fungii think they need admin override21:07
mordredah - well then21:07
fungicinder thinks they're still in use by servers nova says don't exist21:07
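(Volumes stuck in error_deleting generally need an admin-side status reset before a delete will go through; a sketch using cinder's os-reset_status action, assuming an admin-scoped clouds.yaml entry, with placeholder volume ids:)

    # Sketch: reset a stuck volume's status as admin, then delete it.
    # The cloud name and volume ids are placeholders.
    import openstack

    conn = openstack.connect(cloud='vexxhost-sjc1-admin')
    for vol_id in ['<stuck-volume-uuid>']:
        conn.block_storage.post(
            '/volumes/%s/action' % vol_id,
            json={'os-reset_status': {'status': 'available',
                                      'attach_status': 'detached'}})
        conn.block_storage.delete_volume(vol_id)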
corvusmordred: ^ when you have a sec, can you give 669948 a really quick once-over and see if that's the general approach you were thinking about?21:07
corvus(or, if it's compatible with what you were thinking)21:07
corvusmordred: (only look at the readme, the code is wrong; it's just a copy of the old role right now)21:08
corvusalso, hah, my example doesn't match my docs, but you get the idea21:08
mordredcorvus: ooh - yeah. that seems great - like the keys can be a good api and stuff21:08
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2  https://review.opendev.org/66994821:09
mordredcorvus: since this is docware at the moment - should this account for per-region things with some defined substitution vars?21:09
mordredcorvus: like: mirror_info.pypi: https://mirror.{{ nodepool.region }}.{{ nodepool.cloud }}.example.com/pypi/simple21:10
corvusmordred: yeah, i'm expecting we can do that more or less the way we do now21:10
mordred(for instance)21:10
mordredcool21:10
corvusi could put that in the readme as a more complex example?  or i could push up a DNM change to project-config with what it would look like for us21:11
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2  https://review.opendev.org/66994821:11
mordredcorvus: I think it's worth including in the README - since we're defining an API for other people to be able to use as well21:11
openstackgerritMerged zuul/zuul master: Run jobs when their own config changes  https://review.opendev.org/66975221:11
mordredI mean, ansible variables work like they work, so it's not actually documenting role-specific behavior - but it might not occur to someone that they can do that21:12
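(The structure under discussion, rendered here as a hypothetical Python dict purely to show the shape; the real definition was still WIP docware in Ansible vars, and every name below is an assumption:)

    # Hypothetical: mirror_info maps well-known keys to URLs, with
    # per-region values produced by substituting nodepool facts.
    nodepool = {'region': 'sjc1', 'cloud': 'vexxhost'}  # supplied by nodepool

    mirror_info = {
        'pypi': 'https://mirror.{region}.{cloud}.example.com/pypi/simple'
                .format(**nodepool),
    }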
*** ekultails has quit IRC21:12
*** irclogbot_0 has joined #openstack-infra21:14
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2  https://review.opendev.org/66994821:15
corvusmordred: ya, good point.21:15
mordred++21:15
mordredthat looks great21:15
corvusmordred: i think that's probably at a point i can start an email thread on zuul-discuss to get design review on it, yeah?21:15
mordredyah21:15
corvusi'll fire off an email now21:16
mordredcorvus: I think it's going to wind up being nice for opendev too - since there will be a nice easy list of what mirrors exist21:16
corvus++21:16
mordredincidentally - I think in the new scheme we should define 'docker' as a mirror type and have it mean the new v2 style mirror - my gut says to leave the v1 style out because all of the clients are supposed to be able to deal with v2 mirrors now - but if we need to still support it have its key be dockerv1 or olddocker or something like that21:17
corvussounds reasonable21:18
mordredcorvus: or - should we pick a more generic - 'container-registry' - and maybe paint it chartreuse21:19
*** irclogbot_0 has quit IRC21:19
*** pcaruana has quit IRC21:19
corvushrm, that's an interesting question.21:19
*** joeguo has quit IRC21:20
*** goldyfruit has quit IRC21:20
*** weifan has quit IRC21:21
*** weifan has joined #openstack-infra21:21
*** Lucas_Gray has quit IRC21:22
clarkbseems like docker is its own protocol and maybe being specific there is good21:22
clarkbalso if anyone is wondering, hand editing /etc/shadow in order to log in on the console is error prone21:23
clarkb(I have no idea what I've done wrong at this point /me starts from fresh copy of original image)21:24
*** weifan has quit IRC21:26
mordredclarkb: well - yeah - that was my first thought - but I think the container registry protocol is more generic (with things like quay.io and gcr.io)21:26
clarkbmordred: right but it is the docker container registry protocol because they don't http properly :/21:27
fungiclarkb: can you pass init=/bin/sh on the kernel command-line and then remount / rw and use passwd to set a rootpw?21:27
clarkbother services just http iirc21:27
clarkbfungi: this is the broken centos test image so I can just copy the original and edit it again21:27
mordredreally? jesus21:27
fungiahh, easier then21:28
mordredclarkb: so it might be more accurate then to call it "dockerhub"21:28
fungichroot into it and use passwd there21:28
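(One way to take the error-prone part out of the hand edit: generate a well-formed SHA-512 hash with Python's stdlib crypt module and paste only that into the shadow entry; the password and the days-since-epoch field below are placeholders:)

    # Sketch: produce a shadow-compatible password field instead of
    # hand-typing salt and hash into /etc/shadow.
    import crypt

    hashed = crypt.crypt('temporary-root-pw', crypt.mksalt(crypt.METHOD_SHA512))
    print('root:%s:17900:0:99999:7:::' % hashed)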
clarkbmordred: ya it doesn't support ipv6 address literals for example21:28
mordredactually - it's more accurate to call it dockerhub anyway - since it's not a generic registry, it's specifically dockerhub we're mirroring here21:28
mordredand if we were to mirror quay.io, I'd expect a second entry called "quay" or "quay.io" or something21:29
clarkbya21:30
*** Fidde has quit IRC21:30
corvusi just realized this doc section shouldn't be the doc for configure-mirrors2, it should be a top-level doc since it's about a variable that any role might use.21:31
corvusso i'm doing a quick re-org before i send out the mail21:31
mordredcorvus: ++21:32
*** irclogbot_1 has joined #openstack-infra21:38
*** eernst has quit IRC21:41
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror_info documentation  https://review.opendev.org/66994821:42
*** irclogbot_1 has quit IRC21:43
*** donnyd is now known as donnyd_pto21:43
donnyd_ptoI will be around, just not at my desk all day21:44
donnyd_ptoping me if I am needed21:44
clarkbdonnyd_pto: did you see I think I confirmed the fix is a good fix? just have to understand why it fails in testing now21:46
clarkbfungi: so I did the chroot and chpasswd, which was fine; boot actually started dbus because root existed, but then I couldn't log in, I think due to selinux contexts, so fixing that now :/21:47
fungiyuck21:47
clarkbeventually I'll be able to examine the running system without working networking :/21:48
* fungi misses the days when you could log into a terminal without a message bus getting involved21:48
clarkbfungi: back when you could put unencrypted passwords in the passwd file21:48
donnyd_ptoclarkb: would it be helpful to get some actual public ipv4 addresses so you could ssh in and see whats the what?21:49
corvusshadow has been around for a long time21:49
fungiclarkb: back when most servers had a passwordless "guest" account21:49
clarkbdonnyd_pto: no this is on one of our test nodes where I'm reproducing the test failures of the fix so that I can figure out how to fix the test for the fix21:49
fungithat's so meta21:50
donnyd_ptoclarkb: Could you repeat that in terms that actual humans can understand21:50
donnyd_ptojk... lmk if you need anything from me21:50
clarkband ya now I can log in as root finally. Also restorecon doesn't just work in a chroot if anyone is wondering. Thankfully dib has examples of how to make it work21:51
*** irclogbot_0 has joined #openstack-infra21:54
clarkbok I have managed to confirm /var/lib/dhclient is an empty dir21:54
clarkbimplying it is never requesting the dhcp leases as suspected21:55
*** altlogbot_3 has joined #openstack-infra21:55
*** altlogbot_3 has quit IRC21:55
clarkbnow trying a reboot to see if it works on second boot as it did on fn21:59
*** irclogbot_0 has quit IRC21:59
*** igordc has quit IRC22:00
clarkband it works after a reboot22:02
clarkbit's almost like the ifcfg file isn't synced to disk by the time network manager runs?22:02
clarkbsince that is what tells network manager to run a dhclient22:02
*** aaronsheffield has quit IRC22:03
*** weifan has joined #openstack-infra22:10
*** rcernin has joined #openstack-infra22:10
*** tosky has quit IRC22:18
*** weifan has quit IRC22:19
*** jamesmcarthur has joined #openstack-infra22:21
*** pkopec_ has joined #openstack-infra22:24
fungimore like not written to the file... they should both be working off a coherent fs cache even if its writes haven't yet synchronized to the underlying block device22:24
clarkbya adding a sync changed nothing. I've now figured out how to have network manager logging verbosity go way up so rerunning with that hoping for clues22:25
fungiyou should never need to call sync for one process to read another's file off the same copy of a filesystem. it may be that the writing process hasn't called fsync yet and still has its fd writes buffered22:26
*** pkopec has quit IRC22:26
fungier, hasn't flushed22:26
clarkbwhile in general I agree I've definitely found cases where that isn't true like testing zuul's disk usage monitor on btrfs22:28
clarkbit failed reliably for me on btrfs locally until we added some flushes22:28
*** goldyfruit has joined #openstack-infra22:28
clarkber I guess that is what you said22:28
fungiflushes of the buffered writes for the process, yes22:28
fungisync to the block device should never be necessary if both processes are going through the same filesystem interface though22:29
fungi(now if one is reading straight from the block device and the other is writing through the fs layer, sure)22:30
fungibut say you have a process writing to an open file descriptor, it's almost certainly buffering writes in process memory and flushing them to the fs only under certain conditions (say, when the buffer gets full or flush() is explicitly called)22:33
fungi(or when close() happens)22:33
fungiand yeah, with a persistent daemon writing small amounts of data, it nearly always comes down to where the author included flush() calls22:34
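(A quick demonstration of that distinction: both reads go through the same filesystem, no sync() anywhere, and the data appears as soon as the writer flushes its userspace buffer:)

    # Demonstration: the reader sees nothing until the writer's buffered
    # fd is flushed; no block-device sync is involved.
    f = open('/tmp/flush-demo', 'w')
    f.write('hello')                             # sits in the process's buffer
    print(repr(open('/tmp/flush-demo').read()))  # likely '' at this point
    f.flush()                                    # hand it to the fs cache
    print(repr(open('/tmp/flush-demo').read()))  # 'hello'
    f.close()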
*** lathiat has quit IRC22:35
*** rh-jelabarre has quit IRC22:36
*** pkopec_ has quit IRC22:37
*** armax has quit IRC22:40
*** altlogbot_2 has joined #openstack-infra22:44
*** weifan has joined #openstack-infra22:45
*** altlogbot_2 has quit IRC22:49
*** tkajinam has joined #openstack-infra22:52
*** bobh has joined #openstack-infra22:52
*** slaweq has quit IRC22:54
clarkbok via nmcli and networkmanager logs I think I've figured out that there are two logical eth0 interfaces being managed by NM: "eth0" and "System eth0". The second one has ipv4.method set to auto; the first is set to disabled22:55
clarkbso it must be confusing those two for some reason22:56
*** bobh has quit IRC22:57
corvusi'm going to restart zuul to pick up the self-config change23:01
clarkbhttps://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#114 getting closer I think23:05
openstackDebian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open]23:05
clarkbtrying https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#331 next23:08
openstackDebian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open]23:08
fungithat does sound likely23:10
clarkbif that fixes it, the race must be that glean doesn't need to run on second boot, so NM starts sooner, prior to the kernel autoconfing?23:12
*** goldyfruit has quit IRC23:14
corvus#status log restarted all of zuul at commit 86f071464dafd584c995790dce30e2e3ca98f5ac23:15
openstackstatuscorvus: finished logging23:15
corvusfungi, clarkb: https://review.opendev.org/669762 is a neato change :)23:15
clarkbcorvus: +223:17
*** weifan has quit IRC23:17
fungii suppose it's self-testing since the restart, except that we wouldn't be able to tell the difference so i guess there's little point in rechecking?23:18
fungi(it should trigger testing for all the jobs we just removed the file matcher for it from)23:19
corvuswe *could* recheck it and we should see more jobs in check, but i vote we just +W it and see them in gate.23:20
clarkbwow I have an ip address now23:20
clarkbso Network Manager doesn't know what to do if the kernel assigns an IP to an interface23:20
clarkband of course this is totally fine on our ipv4 only clouds because there are no ipv6 advertisements23:21
*** jamesmcarthur has quit IRC23:21
corvusfungi: well, it looks like it's *not* triggering those jobs... maybe changing file matchers doesn't trip the detection.23:21
clarkbso now I need to have a think about the best way to handle this. We can't assume the interface names on boot because of biosdevname. We could update sysctl if NM is used to disable autoconf and RAs by default on all interfaces?23:22
*** weifan has joined #openstack-infra23:22
clarkbThen it is NetworkManager's job not the kernel's to figure it out?23:22
clarkbthis seems like such a huge NM bug23:22
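(For reference, the sysctl settings under discussion; a sketch of what a dib element might write into the image, where the drop-in path is arbitrary and whether NetworkManager then handles v6 itself was exactly the open question:)

    # Sketch: stop the kernel from autoconfiguring IPv6 before
    # NetworkManager starts. The sysctl keys are real; the file path and
    # the idea of writing it from an element are assumptions.
    settings = [
        'net.ipv6.conf.all.autoconf = 0',
        'net.ipv6.conf.all.accept_ra = 0',
        'net.ipv6.conf.default.autoconf = 0',
        'net.ipv6.conf.default.accept_ra = 0',
    ]
    with open('/etc/sysctl.d/99-nm-owns-ipv6.conf', 'w') as f:
        f.write('\n'.join(settings) + '\n')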
openstackgerritJames E. Blair proposed opendev/system-config master: DNM: test jobs running on config update  https://review.opendev.org/66995723:24
*** dchen has joined #openstack-infra23:25
corvusfungi: ^ that seems to be doing what we expect (it's running the eavesdrop job)23:25
clarkbfungi: corvus ^ any concerns with doing a blanket sysctl disable of autoconf and accept_ra if NM is used in dib? I'm worried we'll end up breaking ipv6 somehow doing that (but end up with working ipv4 which is great except where we want ipv6 too)23:25
corvusif changes to file matchers don't trigger job runs, i'm okay with that :)23:25
corvusclarkb: i think it would take quite a bit of time for me to come up to speed on this problem enough to render an opinion.23:28
clarkbI'll try to summarize since it is probably worthwhile anyway23:28
clarkbOn boot if NetworkManager sees that an interface already exists it creates a dummy record internally so that it knows about that interface but then assumes something else is managing that interface and doesn't touch it further. If you have kernel/sysctl settings for autoconfing ipv6 on interfaces it is possible for the kernel to assign ipv6 addrs to an interface prior to NM starting. In that case NM23:30
fungiclarkb: yeah, i'm worried disabling kernel handling of autoconf will simply break ipv6 since nm likely relies on at least some of that (if nm reimplemented that, wow, crazypants?)23:30
clarkbdoesn't touch the interface and you get no dhcp for working ipv423:30
clarkbfungi: I think NM did reimplement that23:30
clarkbhrm except I don't have ipv6 addr on interface that neutron says should be there so maybe not23:31
clarkbugh23:31
fungiremind me again why we switched to nm for these images? fedora depends on it and it was hard to untangle the centos elements from fedora's?23:32
clarkbfungi: rhel 8 is NM only aiui23:32
clarkbso we implemented it to ease transition to centos 823:33
clarkb(it is the default on centos 7/rhel 7 but the old sysconfig only stuff exists without NM)23:33
fungii have a hard time believing rhel8 strips ipv6 autoconf out of the kernel23:33
fungibut... maybe?23:33
clarkbit's possible rhel8 uses a much newer NM that isn't broken23:33
clarkbor that multi year bug in debian shows that the distros mostly ignore the problem23:34
fungii guess they've made more surprising choices before23:34
clarkbthis does certainly seem like a fundamental flaw in using NM as network configurator when expecting working ipv623:35
clarkbthinking on the race more I think the problem we have is our boot is many minutes long (because qemu) and neutron sends RAs more frequently than that23:37
clarkbso we have a high chance of accepting an RA before network manager has started23:37
clarkbwhereas in the real world that ordering is far less likely23:37
clarkb(still possible though)23:37
*** armax has joined #openstack-infra23:39
clarkbI'll have to make a new image without those sysctl settings, boot it again, and check if ipv6 is configured (if it is, that basically determines that ipv6 is not working via NM)23:39
openstackgerritMerged opendev/system-config master: Remove .zuul.yaml file matchers  https://review.opendev.org/66976223:40
*** jamesmcarthur has joined #openstack-infra23:42
openstackgerritJames E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs  https://review.opendev.org/66993923:43
openstackgerritJames E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/66978723:43
clarkbok confirmed if I do that we don't get working ipv623:45
clarkbbut if I remove the sysctl entries then we have the ipv6 addr that is expected23:46
*** sthussey has quit IRC23:46
*** jamesmcarthur has quit IRC23:46
clarkbmaybe that's the hack around this: have the functional job use ipv6 (that likely requires we create an interface on the host node with an ipv6 addr that can route to the test node ipv6 addr(s))23:46
*** tdasilva has quit IRC23:47
clarkbthis is the sort of problem that deserves a stewing23:49
corvusi made brunswick stew on sunday23:54
clarkbhrm now I want stew for dinner23:55
*** mattw4 has quit IRC23:55
*** bobh has joined #openstack-infra23:56
*** hamzy has joined #openstack-infra23:59
