Tuesday, 2019-07-09

donnyd	nice. Air handler is all fixed up	00:00
*** aaronsheffield has quit IRC		00:02
*** Lucas_Gray has quit IRC		00:12
*** nicolasbock has quit IRC		00:17
ianw	clarkb: it doesn't seem like the fedora vm is booting in ci :/	00:24
*** ijw has quit IRC		00:28
donnyd	clarkb: you want to take the quota up to 20?	00:28
donnyd	need to see how many reasonably fit on each hypervisor	00:28
*** tjgresha has quit IRC		00:30
*** tjgresha has joined #openstack-infra		00:30
*** pkopec_ has quit IRC		00:36
ianw	clarkb: i've managed to grab the .qcow2 and console log for the test vm	00:38
ianw	debugging, but eventually this will timeout	00:39
donnyd	ianw: anything you need from the backend please lmk	00:41
*** ijw has joined #openstack-infra		00:41
ianw	donnyd: thanks; yeah this is in our CI test of the changes to modify the ordering of networkmanager startup	00:41
*** ijw has quit IRC		00:43
*** tdasilva has quit IRC		00:48
ianw	even after rebooting the node that nodepool created, it still doesn't come up with networking	00:50
*** igordc has quit IRC		00:59
*** hongbin has joined #openstack-infra		01:03
*** uberjay has quit IRC		01:06
*** bobh has joined #openstack-infra		01:07
*** uberjay has joined #openstack-infra		01:08
*** armax has quit IRC		01:09
*** bobh has quit IRC		01:12
*** imacdonn has quit IRC		01:13
*** imacdonn has joined #openstack-infra		01:14
*** happyhemant has quit IRC		01:17
ianw	corvus: i feel like we used to get the console logs of the attempted boot of the test vm in nodepool.log; http://logs.openstack.org/73/669773/1/check/dib-nodepool-functional-openstack-fedora-29-src/1c412cb/nodepool/nodepool.log	01:18
ianw	clarkb: i was, however, on the test host, and dumped the console, which looks like the ordering is just fine -> http://paste.openstack.org/show/754185/	01:20
*** ianychoi_ has joined #openstack-infra		01:27
clarkb	ya the ordering there looks good	01:29
*** bauzas_ has joined #openstack-infra		01:29
*** nickv1985_ has joined #openstack-infra		01:29
*** ianychoi has quit IRC		01:30
*** altlogbot_0 has quit IRC		01:30
*** bauzas has quit IRC		01:30
*** nickv1985 has quit IRC		01:30
*** bauzas_ is now known as bauzas		01:30
*** nickv1985_ is now known as nickv1985		01:30
*** diablo_rojo has quit IRC		01:31
*** altlogbot_2 has joined #openstack-infra		01:31
*** jamesdenton has quit IRC		01:32
*** hemna_ has quit IRC		01:34
*** adriant has joined #openstack-infra		01:36
openstackgerrit	Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780	01:36
ianw	clarkb: i can test the .qcow2 i captured from the run on my cloud; just need to upload it over what pretends to be upload bandwidth in .au	01:37
*** hemna_ has joined #openstack-infra		01:38
ianw	i think a) getting the debug logging back in the job so it dumps the console, and then secondly setting systemd.journald.forward_to_console=1 would be really helpful, just in general ... i'm not sure how to get that into the image	01:38
ianw	but if we were getting syslog on the console, and that was captured in logs ... well that would be nice	01:39
clarkb	++	01:40
clarkb	There is ansible that splats down dib elements	01:40
clarkb	I think you could modify that	01:40
ianw	yeah, it's more getting it into the grub command line effectively	01:42
*** diablo_rojo has joined #openstack-infra		01:45
ianw	oh i guess it could just be set in journald.conf	01:46
prometheanfire	building a new gentoo systemd image to test on	01:55
prometheanfire	is there a bug I can read up on?	01:55
*** hongbin has quit IRC		01:57
*** jamesmcarthur has quit IRC		02:00
*** yamamoto has joined #openstack-infra		02:03
*** bobh has joined #openstack-infra		02:09
*** diablo_rojo has quit IRC		02:22
clarkb	prometheanfire: no, I'd only just discovered the problem before needing to do dinner	02:22
clarkb	prometheanfire: basically ipv6 isn't configured	02:22
clarkb	so the gentoo image isn't working on an ipv6-only cloud	02:23
*** ijw has joined #openstack-infra		02:30
prometheanfire	huh	02:32
prometheanfire	only the gentoo image?	02:32
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784	02:39
*** hongbin has joined #openstack-infra		02:57
*** icarusfactor has quit IRC		03:03
*** yamamoto has quit IRC		03:03
openstackgerrit	Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780	03:04
openstackgerrit	Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	03:04
clarkb	prometheanfire: ya, centos and fedora seem to have an unrelated network manager race	03:10
prometheanfire	ok	03:10
*** whoami-rajat has joined #openstack-infra		03:12
*** rfolco has quit IRC		03:19
*** jamesmcarthur has joined #openstack-infra		03:20
*** bobh has quit IRC		03:28
*** jamesmcarthur has quit IRC		03:33
*** ykarel|away has joined #openstack-infra		03:49
*** ykarel|away is now known as ykarel		03:53
*** jamesmcarthur has joined #openstack-infra		04:01
*** hongbin has quit IRC		04:04
*** ykarel has quit IRC		04:05
*** jamesmcarthur has quit IRC		04:05
*** ijw has joined #openstack-infra		04:06
*** mandu_kim has joined #openstack-infra		04:08
*** udesale has joined #openstack-infra		04:10
*** sjfjqjfkd has joined #openstack-infra		04:13
*** ijw has quit IRC		04:13
openstackgerrit	Ian Wienand proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773	04:15
*** factor has joined #openstack-infra		04:16
*** jamesmcarthur has joined #openstack-infra		04:19
ianw	clarkb: exactly the same image (as in ./test-image-0000000002.qcow2 taken from the CI host) boots fine in my cloud environment. console logs look the same ... but yet CI hosts don't get network :/	04:22
sjfjqjfkd	Hi all, does anybody know what I should append to local.conf to use gnocchi-api?	04:25
sjfjqjfkd	enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git stable/rocky ; CEILOMETER_BACKEND=gnocchi ; enable_service gnocchi-api,gnocchi-metricd	04:25
sjfjqjfkd	it doesn't work	04:26
*** jamesmcarthur has quit IRC		04:29
*** rcernin has quit IRC		04:30
*** rcernin has joined #openstack-infra		04:31
*** jamesmcarthur has joined #openstack-infra		04:32
*** raukadah is now known as chandankumar		04:35
*** pcaruana has joined #openstack-infra		04:35
ianw	sjfjqjfkd: probably not much help in this channel, #openstack-telemetry will have more help, however, you might like to use something like http://codesearch.openstack.org/?q=CEILOMETER_BACKEND.*gnocchi&i=nope&files=&repos= to see what other things might do similar and how they do it	04:35
openstackgerrit	Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780	04:37
openstackgerrit	Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	04:37
*** ykarel has joined #openstack-infra		04:38
*** pcaruana has quit IRC		04:38
*** ijw has joined #openstack-infra		04:41
*** ijw has quit IRC		04:47
AJaeger	config-core, please review https://review.opendev.org/669727 https://review.opendev.org/667949 https://review.opendev.org/665910 and https://review.opendev.org/668708	04:49
*** kjackal has joined #openstack-infra		04:51
*** gyee has quit IRC		04:59
*** ricolin has joined #openstack-infra		05:03
*** jamesmcarthur has quit IRC		05:05
*** ccamacho has quit IRC		05:06
sshnaidm|afk	Is there a problem with the ovh BHS1 cloud? Our last 7 gate failures were all on it, and all of them failed to download containers	05:06
ianw	sshnaidm|afk: not that i'm aware of. was that via the proxies on the mirror? note if the remote end is having issues the proxy doesn't help :)	05:09
sshnaidm|afk	ianw, yeah, we use proxies..	05:10
*** ijw has joined #openstack-infra		05:16
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784	05:21
*** ijw has quit IRC		05:23
*** rh-jelabarre has quit IRC		05:24
*** ykarel_ has joined #openstack-infra		05:39
ianw	clarkb: dropped a comment in your change https://review.opendev.org/#/c/669772/ on what i'm thinking for moving forward ...	05:39
*** ykarel has quit IRC		05:40
*** ykarel_ has quit IRC		05:42
*** kjackal has quit IRC		05:43
*** jamesmcarthur has joined #openstack-infra		05:44
*** ykarel has joined #openstack-infra		05:46
*** udesale has quit IRC		05:48
*** ijw has joined #openstack-infra		05:49
*** jamesmcarthur has quit IRC		05:51
*** ccamacho has joined #openstack-infra		05:53
*** ijw has quit IRC		05:54
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784	05:57
*** ijw has joined #openstack-infra		05:57
*** rcernin has quit IRC		05:57
*** kjackal has joined #openstack-infra		06:00
*** udesale has joined #openstack-infra		06:06
*** ijw has quit IRC		06:06
*** udesale has quit IRC		06:09
*** udesale has joined #openstack-infra		06:10
*** ykarel is now known as ykarel|meetig		06:16
*** ykarel|meetig is now known as ykarel|meeting		06:16
*** dpawlik has joined #openstack-infra		06:19
*** jamesmcarthur has joined #openstack-infra		06:24
*** jamesmcarthur has quit IRC		06:28
*** ccamacho has quit IRC		06:29
*** ccamacho has joined #openstack-infra		06:29
*** pgaxatte has joined #openstack-infra		06:33
*** ijw has joined #openstack-infra		06:34
*** ijw has quit IRC		06:40
*** udesale has quit IRC		06:40
*** ociuhandu has joined #openstack-infra		06:46
*** ociuhandu has quit IRC		06:52
*** bhavikdbavishi has joined #openstack-infra		06:54
*** rpittau|afk is now known as rpittau		06:58
*** jamesmcarthur has joined #openstack-infra		06:59
*** slaweq has joined #openstack-infra		07:02
*** pcaruana has joined #openstack-infra		07:03
*** jamesmcarthur has quit IRC		07:05
*** ginopc has joined #openstack-infra		07:11
*** piotrowskim has joined #openstack-infra		07:12
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784	07:15
*** ianw is now known as ianw_pto		07:16
*** udesale has joined #openstack-infra		07:24
*** tosky has joined #openstack-infra		07:28
*** ralonsoh has joined #openstack-infra		07:30
openstackgerrit	Merged openstack/project-config master: Retire tempest-tripleo-ui https://review.opendev.org/667949	07:30
openstackgerrit	Merged openstack/project-config master: Adding CI for ironic-prometheus-exporter https://review.opendev.org/665910	07:32
openstackgerrit	Merged openstack/openstack-zuul-jobs master: Remove base role integration testing https://review.opendev.org/669727	07:34
*** lucasagomes has joined #openstack-infra		07:40
*** witek has joined #openstack-infra		07:40
*** iurygregory has joined #openstack-infra		07:41
*** priteau has joined #openstack-infra		07:44
*** bhavikdbavishi has quit IRC		07:46
*** ykarel_ has joined #openstack-infra		07:50
*** ykarel_ is now known as ykarel		07:52
*** ykarel|meeting has quit IRC		07:52
*** ykarel is now known as ykarel|lunch		07:53
*** sjfjqjfkd has quit IRC		07:56
*** dtantsur|afk is now known as dtantsur		07:57
*** ijw has joined #openstack-infra		07:58
*** rcernin has joined #openstack-infra		08:02
*** ijw has quit IRC		08:03
*** rcernin has quit IRC		08:11
*** ociuhandu has joined #openstack-infra		08:17
openstackgerrit	Tobias Henkel proposed zuul/zuul master: Improve error reporting for zuul dequeue https://review.opendev.org/669813	08:17
*** panda has quit IRC		08:22
*** panda has joined #openstack-infra		08:24
*** rcernin has joined #openstack-infra		08:27
*** pkopec has joined #openstack-infra		08:30
*** bhavikdbavishi has joined #openstack-infra		08:38
*** ijw has joined #openstack-infra		08:38
*** ykarel|lunch is now known as ykarel		08:42
*** tkajinam has quit IRC		08:42
*** ijw has quit IRC		08:44
*** priteau has quit IRC		08:47
*** priteau has joined #openstack-infra		08:48
*** sshnaidm|afk is now known as sshnaidm|ruck		08:54
*** Fidde has joined #openstack-infra		08:55
*** jamesmcarthur has joined #openstack-infra		09:01
*** jamesmcarthur has quit IRC		09:05
*** jamesmcarthur has joined #openstack-infra		09:32
*** jamesmcarthur has quit IRC		09:37
openstackgerrit	Merged openstack/project-config master: Mark networking-ovn-tempest-dsvm-ovs-release as voting https://review.opendev.org/668708	09:39
openstackgerrit	Merged opendev/irc-meetings master: Create a meeting for Networking OVN project https://review.opendev.org/668013	09:39
*** pkopec has quit IRC		09:40
*** pkopec has joined #openstack-infra		09:45
*** bhavikdbavishi has quit IRC		09:47
*** pcaruana has quit IRC		09:48
*** derekh has joined #openstack-infra		10:00
*** priteau has quit IRC		10:03
*** nicolasbock has joined #openstack-infra		10:10
*** ricolin has quit IRC		10:14
*** ricolin has joined #openstack-infra		10:22
*** ricolin has quit IRC		10:23
*** ijw has joined #openstack-infra		10:30
*** xek_ is now known as xek		10:32
*** udesale has quit IRC		10:34
*** dchen has quit IRC		10:35
*** kjackal has quit IRC		10:48
*** jamesdenton has joined #openstack-infra		10:50
*** yamamoto has joined #openstack-infra		11:03
*** dosaboy has quit IRC		11:05
*** nicolasbock has quit IRC		11:06
*** yamamoto has quit IRC		11:08
*** roman_g has joined #openstack-infra		11:08
*** dosaboy has joined #openstack-infra		11:09
*** ykarel is now known as ykarel|afk		11:10
*** tesseract has joined #openstack-infra		11:11
*** tesseract has quit IRC		11:13
*** tesseract has joined #openstack-infra		11:15
*** altlogbot_2 has quit IRC		11:19
*** irclogbot_0 has quit IRC		11:19
*** altlogbot_2 has joined #openstack-infra		11:20
*** zbr is now known as zbr|lunch		11:22
*** tesseract has quit IRC		11:23
*** apetrich has joined #openstack-infra		11:25
*** altlogbot_2 has quit IRC		11:25
*** kjackal has joined #openstack-infra		11:27
*** raissa has joined #openstack-infra		11:29
*** jcoufal has joined #openstack-infra		11:29
*** raissa has quit IRC		11:30
*** bhavikdbavishi has joined #openstack-infra		11:30
*** jamesmcarthur has joined #openstack-infra		11:34
*** ijw has quit IRC		11:34
*** jamesmcarthur has quit IRC		11:39
openstackgerrit	Monty Taylor proposed zuul/zuul master: Spec: Add a Kubernetes Operator for Zuul https://review.opendev.org/659180	11:39
*** ricolin has joined #openstack-infra		11:44
*** jcoufal has quit IRC		11:45
*** jcoufal has joined #openstack-infra		11:46
*** odyssey4me_ has joined #openstack-infra		11:49
*** jistr_ has joined #openstack-infra		11:51
*** niceplace_ has joined #openstack-infra		11:52
*** ykarel|afk has quit IRC		11:53
*** kinrui has joined #openstack-infra		11:53
*** electrofelix has joined #openstack-infra		11:54
*** udesale has joined #openstack-infra		11:55
*** jistr has quit IRC		11:55
*** mnencia has quit IRC		11:55
*** npochet has quit IRC		11:55
*** markmcclain has quit IRC		11:55
*** cyberpear has quit IRC		11:55
*** aprice has quit IRC		11:55
*** sparkycollier has quit IRC		11:55
*** niceplace has quit IRC		11:55
*** hogepodge has quit IRC		11:55
*** mordred has quit IRC		11:55
*** guilhermesp has quit IRC		11:55
*** seyeongkim has quit IRC		11:55
*** dougwig has quit IRC		11:55
*** mnasiadka has quit IRC		11:55
*** jamespage has quit IRC		11:55
*** kmalloc has quit IRC		11:55
*** TheJulia has quit IRC		11:55
*** clayg has quit IRC		11:55
*** dustinc has quit IRC		11:55
*** jrosser has quit IRC		11:55
*** fungi has quit IRC		11:55
*** odyssey4me has quit IRC		11:56
*** melwitt has quit IRC		11:56
*** asettle-PTO has quit IRC		11:56
*** odyssey4me_ is now known as odyssey4me		11:56
*** andreykurilin has quit IRC		11:58
*** zbr|lunch is now known as zbr		11:59
*** andreykurilin has joined #openstack-infra		11:59
*** rcernin has quit IRC		11:59
*** irclogbot_3 has joined #openstack-infra		12:00
*** mnencia has joined #openstack-infra		12:01
*** npochet has joined #openstack-infra		12:01
*** cyberpear has joined #openstack-infra		12:01
*** aprice has joined #openstack-infra		12:01
*** sparkycollier has joined #openstack-infra		12:01
*** hogepodge has joined #openstack-infra		12:01
*** mordred has joined #openstack-infra		12:01
*** guilhermesp has joined #openstack-infra		12:01
*** seyeongkim has joined #openstack-infra		12:01
*** dougwig has joined #openstack-infra		12:01
*** mnasiadka has joined #openstack-infra		12:01
*** jamespage has joined #openstack-infra		12:01
*** kmalloc has joined #openstack-infra		12:01
*** TheJulia has joined #openstack-infra		12:01
*** clayg has joined #openstack-infra		12:01
*** dustinc has joined #openstack-infra		12:01
*** jrosser has joined #openstack-infra		12:01
*** weshay_PTO is now known as weshay		12:01
*** altlogbot_2 has joined #openstack-infra		12:02
*** altlogbot_2 has quit IRC		12:05
*** irclogbot_3 has quit IRC		12:05
*** jamesmcarthur has joined #openstack-infra		12:07
*** altlogbot_1 has joined #openstack-infra		12:08
*** ykarel|afk has joined #openstack-infra		12:08
*** pcaruana has joined #openstack-infra		12:10
*** altlogbot_1 has quit IRC		12:11
*** jamesmcarthur has quit IRC		12:11
*** rh-jelabarre has joined #openstack-infra		12:14
*** strigazi has quit IRC		12:14
*** strigazi has joined #openstack-infra		12:15
icey	hey - would it be possible to get zuul resources for a multi-node job where temporary OpenStack credentials are granted rather than a set of pre-built nodes?	12:17
mnaser	infra-root: it looks like there is a bunch of extra images on sjc1... can we clean those up if possible?	12:20
*** kinrui is now known as fungi		12:21
fungi	icey: you can pass those as secrets to post-review jobs if you have an openstack cloud you want a job to talk to, but cleanup (especially on jobs which get aborted prematurely) is probably the hardest problem to solve there	12:23
fungi	mnaser: i'll take a look and see if nodepool is aware of them	12:23
icey	fungi: yeah - I was wondering, more, if it was possible to get the nodepool resources that are allocated in a multi-node job given via the OS API rather than the predefined list in the ansible playbooks ;-) It seems like it's not really doable (for now?) but it would be cool	12:24
*** rlandy has joined #openstack-infra		12:24
mordred	icey: yeah - I agree with fungi, cleanup is the hard part. we'd need to implement a nodepool driver to do the full thing you're talking about (We have one like it for kubernetes namespaces) - but we haven't implemented one yet because cleaning up resources in a domain or project is a lot of work	12:25
mordred	(although this hardness is a thing we discussed at the last PTG and there is work underway to make such a thing better)	12:26
fungi	icey: probably the only way to do it sanely is if it's a cloud you control, so that you can allocate one or more new projects per build and then (somehow) delete all resources associated with the project when the job is complete	12:26
icey	mordred: I entirely understand - and even though I might be an upstanding citizen who works to clean up after myself nicely, that's not a safe assumption to make generally	12:26
mordred	exactly	12:26
fungi	doing that in a public cloud where you aren't the cloud admin or can't make nodepool a cloud admin would be really tough, i think	12:27
mordred	in the meantime though, you could make a job such as fungi mentioned that used a predefined secret that had the ability to create projects, freeze that job, then pass the created credentials/project info to a child job using zuul_return, then have the parent job delete the project when it's unfrozen	12:27
icey	regarding that: allocate a new project per build, that's exactly what I'm thinking :)	12:27
icey	mordred: in the mean time, I'll just keep using my own Jenkins that talks to my cloud :-P	12:28
mordred	we do a similar pattern in the docker image build jobs - have the first job create a docker registry, then freeze, then pass the registry info to the child jobs	12:28
icey	well, maybe ;-)	12:28
mordred	so, it's, you know - not super terrible to do in zuul in general - but doing it in the openstack zuul is a bit harder due to our lack of such credentials in our existing cloud providers	12:29
icey	yea :-/	12:29
mordred	so if you wanted to do it in a zuul against a cloud(s) you controlled, I believe all of the building blocks are there and it wouldn't be terrible	12:29
mordred	and would be a pretty cool thing to show	12:30
fungi	mnaser: nodepool is only aware of 26 ready images in vexxhost-sjc1 (current and previous for each of the 13 we build there) plus one it's been trying to delete for several months (f670e6be-953b-4d4b-a931-6cbb5b568410)	12:31
*** smarcet has joined #openstack-infra		12:32
mordred	fungi, icey: actually - we could probably write jobs like that in openstack zuul with an additional first step of a job that ran devstack or something to create a cloud, then pass those creds to the "create a project" job - might be a neat set of jobs to have in the library	12:32
*** gtema has joined #openstack-infra		12:32
mordred	it would not be a FAST job - but we could show the idea functionally working, and then for people with different contexts, they could use them for real	12:33
*** jistr_ is now known as jistr		12:33
icey	mordred: considering, right now, I have one of my jobs that just topped out at 6 hours...	12:33
icey	mordred: (well, 6 independent jobs, but run sequentially)	12:33
fungi	mnaser: openstackclient says we have 67 active images in there though, so presumably we've leaked ~40. i'll try to filter out the ones nodepool is actually using and delete the rest here shortly	12:34
mnaser	fungi: thank you!	12:34
fungi	thank you for letting us know!	12:35
fungi	and apologies for the inconvenience	12:35
*** jcoufal has quit IRC		12:38
*** smarcet has left #openstack-infra		12:42
*** jamesmcarthur has joined #openstack-infra		12:45
*** rfarr_ has joined #openstack-infra		12:50
*** rfarr has quit IRC		12:51
*** jcoufal has joined #openstack-infra		12:52
openstackgerrit	Natal Ngétal proposed openstack/diskimage-builder master: [Configuration] Switch to stestr. https://review.opendev.org/629414	12:52
Shrews	fungi: we should try to see if there is any info in nodepool logs for those leaked images.	12:53
*** altlogbot_2 has joined #openstack-infra		12:54
*** bdodd has joined #openstack-infra		12:54
*** aaronsheffield has joined #openstack-infra		12:54
*** ekultails has joined #openstack-infra		12:55
*** altlogbot_2 has quit IRC		12:57
fungi	i'll try to take a look in a moment, sure	12:59
Shrews	fungi: if you want to paste me the image ids i can check the logs. seems like a doable 1 handed task	12:59
*** gtema has left #openstack-infra		13:00
fungi	my pastebinit has stopped working with paste.o.o so that'll take a bit too ;)	13:00
*** jcoufal has quit IRC		13:00
* fungi is still trying to morning, and isn't doing a very good job of it		13:00
*** jcoufal has joined #openstack-infra		13:00
*** jcoufal_ has joined #openstack-infra		13:02
Shrews	fungi: no rush. everything i do now takes 10x as long	13:05
*** zul has joined #openstack-infra		13:05
*** jcoufal has quit IRC		13:06
*** sthussey has joined #openstack-infra		13:06
*** jcoufal has joined #openstack-infra		13:07
*** jcoufal has quit IRC		13:07
*** jcoufal has joined #openstack-infra		13:08
*** rlandy_ has joined #openstack-infra		13:09
*** rlandy_ has quit IRC		13:09
*** jcoufal_ has quit IRC		13:10
fungi	i should hook you up with a one-handed chording keyboard	13:12
*** mriedem has joined #openstack-infra		13:13
*** priteau has joined #openstack-infra		13:14
fungi	Shrews: http://paste.openstack.org/show/754218/	13:18
fungi	i'm first going to see if i get any interesting errors trying to delete f670e6be-953b-4d4b-a931-6cbb5b568410 since nodepool can't seem to	13:19
*** goldyfruit has joined #openstack-infra		13:19
Shrews	fungi: happen to know which nb* builds vexxhost offhand?	13:21
*** pkopec has quit IRC		13:21
*** pkopec has joined #openstack-infra		13:22
fungi	Shrews: looks like it's nb01	13:22
*** guimaluf has joined #openstack-infra		13:22
fungi	oh, or maybe not. i think our macro expansion includes all the providers	13:23
*** gtema has joined #openstack-infra		13:24
fungi	both nb01 and nb02 have provider entries for it	13:25
fungi	d'oh	13:26
fungi	i guess they both do it (whoever gets to it first)	13:26
fungi	not like launchers where they get assigned to specific providers	13:26
fungi	Shrews: you and your trick questions	13:26
fungi	anyway, the reason f670e6be-953b-4d4b-a931-6cbb5b568410 isn't getting deleted is that glance says that image is in use	13:27
fungi	maybe we have a held node which was booted from it... checking now	13:28
Shrews	oh, i didn't realize we split it up like that	13:28
Shrews	i'm having problems finding any logs for the leaked ones	13:29
fungi	they may be older than our log retention?	13:30
openstackgerrit	Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Enables molecute template to use depends-on tripleo-repos https://review.opendev.org/669871	13:31
fungi	so, the only node in sjc1 older than a day is 62e01c16-019d-46b8-b92b-b3e409161e97 which is a fedora-29-vexxhost instance in ready state. doesn't correspond to our stuck ubuntu-xenial image deletion	13:31
fungi	mnaser: any way you can determine what has f670e6be-953b-4d4b-a931-6cbb5b568410 in use for over two months?	13:31
fungi	(that's an image uuid in sjc1, for context)	13:32
openstackgerrit	Nate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation https://review.opendev.org/669775	13:34
*** michael-beaver has joined #openstack-infra		13:34
*** pkopec has quit IRC		13:36
*** pkopec has joined #openstack-infra		13:38
AJaeger	config-core, are you fine with adding tripleo-repos as required-repos to the molecule template? See https://review.opendev.org ...	13:41
*** slaweq has quit IRC		13:43
Shrews	fungi: i think we need the image metadata from those leaks to map to a build id that we can grep the logs for	13:48
*** jamesmcarthur has quit IRC		13:49
*** slaweq has joined #openstack-infra		13:49
fungi	ahh, yeah i guess they're no longer embedded in the image names	13:49
Shrews	the number on the image name is a timestamp :(	13:49
fungi	looks like `openstack image show` has it in the properties list... i should be able to parse it out of there	13:50
*** ijw has joined #openstack-infra		14:00
fungi	i've got a loop going to assemble that list now	14:03
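A minimal sketch of the kind of loop fungi describes, assuming each image has already been dumped with something like `openstack image show -f json <uuid>` and that the build id lives in the image properties under a key such as `nodepool_build_id`. The key name and the helper names are assumptions for illustration, not taken from this log.

```python
import json


def build_id_from_image(image_json, key="nodepool_build_id"):
    """Return the build id stored in an image's properties, or None.

    `key` is an assumed property name; adjust it to whatever the
    properties list of the real images actually contains.
    """
    image = json.loads(image_json)
    properties = image.get("properties") or {}
    return properties.get(key)


def map_images_to_builds(image_docs, key="nodepool_build_id"):
    """Map image uuid -> build id for a list of JSON documents."""
    mapping = {}
    for doc in image_docs:
        image = json.loads(doc)
        mapping[image["id"]] = build_id_from_image(doc, key)
    return mapping
```

The resulting mapping gives build ids that can be grepped for in the builder logs, which is what Shrews asked for above.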
*** ijw has quit IRC		14:05
fungi	Shrews: i've appended it to the bottom of http://paste.openstack.org/show/754221/	14:13
Shrews	fungi: thx	14:15
*** jamesmcarthur has joined #openstack-infra		14:17
*** whoami-rajat has quit IRC		14:18
AJaeger	mwhahaha: tempest-tripleo-ui still needs removal from project-config, could you take care of that, please?	14:25
mwhahaha	AJaeger: yea gimme a few	14:25
AJaeger	sure, mwhahaha . Thanks!	14:26
*** dpawlik has quit IRC		14:34
*** jamesmcarthur has quit IRC		14:34
*** ijw has joined #openstack-infra		14:34
*** jamesmcarthur has joined #openstack-infra		14:35
*** njohnston has joined #openstack-infra		14:37
clarkb	Shrews: fungi builders build an image then upload it to all clouds. So it comes down to which builder grabs the job first (the arm64 builder is special because there is only one of them but if we had two of those it would work that way too)	14:38
clarkb	and ya launchers are split up roughly by total max-servers count since its a thread per launch and we try to distribute the thread cost	14:38
clarkb	Shrews: fungi as for image leaks a lot of them seem to happen due to failed uploads	14:40
Shrews	failed uploads that somehow succeed? neat	14:41
fungi	ahh, so a builder tries to upload an image, "fails" for $reasons, then doesn't clean it up because it thinks it shouldn't exist?	14:42
clarkb	Shrews: I haven't looked at the current crop of leaked images but no I don't think they succeeded. They remain stuck in a "saving" state	14:42
clarkb	fungi: yes exactly	14:42
*** pgaxatte has quit IRC		14:42
clarkb	basically every time I've looked at these leaked images the bulk of them are still in a saving state	14:42
clarkb	so I think sdk is failing somewhere and not cleaning up after itself	14:43
Shrews	fungi's paste shows them as active	14:43
openstackgerrit	Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui https://review.opendev.org/669883	14:44
mwhahaha	AJaeger: -^	14:44
fungi	i wonder if the builders could do something like the launchers do to identify leaked resources... add unique metadata identifying the nodepool deployment (for cases of multiple nodepools in the same project) and then delete any images which have the deployment's id but are not known in the image-list?	14:44
clarkb	corvus: see comment on https://review.opendev.org/#/c/669780/3	14:45
fungi	i think we can't ever entirely trust cloud apis to be correct about when something has failed, and should probably just assume that sometimes they provide misleading answers	14:46
Shrews	fungi: i think they could (and should if it cant be corrected in sdk)	14:46
fungi	i have a feeling the sdk won't be a complete solution because it's not persistent	14:47
*** ijw has quit IRC		14:47
fungi	for example, if the upload fails but the image which shouldn't exist doesn't show up in the list immediately due to some sort of update lag, or refuses to be deleted because there's background conversion underway, or whatever	14:47
fungi	the nodepool builders on the other hand maintain state and can know in the future whether there's suddenly something there which shouldn't be	14:49
*** armax has joined #openstack-infra		14:49
Shrews	fungi: the problem will be in getting an image ID to delete. if it failed, nodepool won't have it	14:50
Shrews	so.... not sure what the solution is here	14:50
fungi	ahh, you mean short of iterating through the image list in the provider	14:50
fungi	i agree that's probably expensive (depending on the number of images involved)	14:51
clarkb	ya the sdk/shade has a porcelain method that is basically "do all the steps to upload an image of which there are many"	14:51
clarkb	nodepool only gets the image info if that succeeds	14:51
clarkb	(I think)	14:51
*** Fidde has quit IRC		14:51
clarkb	so any errors in that series of steps must be handled at the sdk layer or we need to stop using porcelain methods in nodepool and basically rewrite shade in nodepool	14:51
*** ykarel|afk is now known as ykarel		14:52
corvus	clarkb: i don't understand what 669780 has to do with the journal-to-console element	14:52
fungi	nodepool can still ask the api for a list of image uuids and then ask the api for the properties for each of those and look in the metadata from them to compare to what's in zk, right?	14:52
fungi	s/api/sdk/	14:52
fungi	i mean, it's inefficient for sure, and i don't know how much of that can be effectively cached by the sdk layer	14:53
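The comparison fungi proposes (tag uploads with the deployment's id, then treat any tagged image that is not in the known list as leaked) could be sketched as below. This is only an illustration of the set logic: the property names `deployment_id` and `upload_id` are hypothetical, and the image list is assumed to have already been fetched from the cloud as plain dicts.

```python
def find_leaked_images(cloud_images, known_upload_ids, deployment_id):
    """Return ids of images tagged for this deployment but unknown to it.

    cloud_images: iterable of dicts with "id" and "properties" keys.
    known_upload_ids: upload ids the deployment still has records for.
    deployment_id: unique marker this deployment stamps on its uploads.
    """
    known = set(known_upload_ids)
    leaked = []
    for image in cloud_images:
        props = image.get("properties") or {}
        # Skip images owned by someone else (or another deployment).
        if props.get("deployment_id") != deployment_id:
            continue
        # Skip images the deployment still knows about.
        if props.get("upload_id") in known:
            continue
        leaked.append(image["id"])
    return leaked
```

Because images without the deployment marker are never returned, a cleanup built on this would leave other tenants' or other nodepools' images in the same project alone.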
clarkb	corvus: 669780 logs debug logs which include instance console logs when instances fail to boot. the journal-to-console element includes the journal contents in the instance console log	14:53
clarkb	corvus: I linked an example log for you on 669780	14:54
fungi	i guess at the moment there's not a deleter thread for builders like there is for launchers?	14:54
clarkb	corvus: this is all about improving the debuggability of those jobs. I don't see these changes as fundamentally a problem other than we may wish to use setup simplifiers like the -d option	14:54
clarkb	fungi: correct	14:54
Shrews	fungi: there is. it cleans up deleted, failed, and unlocked uploading zk records. for deleted, it also deletes any associated upload	14:55
fungi	oh, convenient	14:56
Shrews	fungi: we'd have to change logic to not delete failed zk records w/o first waiting a period to check for the weirdness we see here	14:56
clarkb	Shrews: fungi but it only uses the zk records there not the api records iirc	14:56
Shrews	clarkb: correct	14:56
clarkb	whereas with the launchers we actually check the api for leaks of things like floating ips and instances	14:57
clarkb	which is different	14:57
fungi	i think fundamentally the problem is likely the same... we tell clouds to create resources, sometimes those clouds lie about whether a resource was created, so we can't trust them to always clean up those resources they lied to us and said failed to be created	14:57
Shrews	right, but it can be added	14:57
clarkb	fungi: well in this case I don't think the cloud is lying	14:58
fungi	the sdk can maybe double-check behind the upload failure and then issue an immediate delete for things which shouldn't be there	14:58
clarkb	fungi: in this case I think sdk is not taking the appropriate steps on failure to undo the many steps of image uploading	14:59
clarkb	(in some cases)	14:59
clarkb	fungi: image upload is like 4 or 5 steps	14:59
clarkb	and the sdk must undo each successful step when the current step fails	14:59
Shrews	mordred: you may find convo above entertaining	14:59
mordred	well - it must undo _some_ of the successful steps	14:59
fungi	and what creates the image in glance isn't the last of those steps?	14:59
mordred	fungi: nope	15:00
clarkb	fungi: nope	15:00
mordred	fungi: that would be too convenient	15:00
clarkb	its the opposite actually	15:00
clarkb	the first step creates the image	15:00
clarkb	with no image content	15:00
corvus	clarkb: do you, in your heart of hearts, really believe that these patches are needed for nodepool? or do you think they are needed to debug operational issues with clouds and dib elements?	15:01
mordred	yah - well, that's the PUT method. in the rackspace task-import method, it's a bit more like what fungi expects	15:01
fungi	oh! so there can be contentless image records in glance? that's fun	15:01
mordred	fungi: yah	15:01
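The failure mode discussed above (a first step creates an empty image record, a later step fails, and the record is left behind) can be sketched as an undo stack. This is a minimal illustration only, not the sdk's actual upload code: the step names, the `cloud` dict and the `img-1` id are all hypothetical.

```python
def run_steps_with_rollback(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse.

    `undo` may be None for steps that leave nothing behind.
    """
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            if undo is not None:
                undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        raise


# Hypothetical stand-in for a cloud's image store: image id -> content.
cloud = {}


def create_record():
    # Step 1: the image record exists, but has no content yet.
    cloud["img-1"] = None


def delete_record():
    cloud.pop("img-1", None)


def upload_content():
    # Step 2: simulate the upload failing part way through.
    raise RuntimeError("upload failed")


try:
    run_steps_with_rollback([(create_record, delete_record), (upload_content, None)])
except RuntimeError as exc:
    failure = str(exc)
```

After the failed run the contentless record has been removed again, which is the cleanup the conversation expects the sdk layer to own.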
fungi	i guess the ones where you specify a durable image location must work that way	15:02
clarkb	corvus: I guess I see those as sort of the same problem? the only way to know if it is a cloud or dib etc problem and not a dib problem (dib didn't tell cloud to use networking properly or something) is to record the data so that we can inspect it	15:02
mordred	Shrews, clarkb, fungi: maybe we should make more use of properties on the image records like we do for servers	15:02
clarkb	corvus: you can't rule any of them in or out with the black box we currently have	15:02
corvus	clarkb: right. but either way, it's not a nodepool problem. the way i see it, is, yes, these would be handy if we ever broke the ability to upload our basic centos image to devstack. but the testing on dib is supposed to prevent that from happening.	15:02
mordred	so that we can do a better job 2pcing between image records and zk records - then do what Shrews was saying and wait to remove the zk node until we're more sure all the bits have been cleaned up	15:03
corvus	clarkb: so dib needs this, and opendev needs this, but nodepool doesn't. so we're throwing the changes at the wrong repo.	15:03
*** ijw has joined #openstack-infra		15:03
fungi	mordred: right, that's what i was wondering as well, like whether we could stash a deployment identifier in all of them so nodepool can recognize image records it created even if it lost track of them or thought they shouldn't exist	15:03
corvus	clarkb: i'm trying really hard to stop alienating nodepool developers by asking them to review a bunch of changes that opendev needs to support its images in new clouds.	15:03
clarkb	corvus: is that actually alienating or just something that they will ignore?	15:04
mordred	fungi: yah- I haven't thought through it fully yet - there may be some issues with doing that we'd need to think about	15:04
clarkb	personally I see this as useful debugging for the jobs	15:04
clarkb	regardless of the repo in question	15:04
mordred	and we might need to add some helper methods in sdk that nodepool can call for this sort of thing	15:04
clarkb	because if the job fails (for whatever unknown reason) that information will help us understand why	15:04
mordred	because the mechanism of cleaning up will vary by cloud upload mechanism, so it shouldn't be nodepool trying to do this directly	15:04
corvus	clarkb: i do not think we should have to ask them to ignore changes like this. that seems rude to me.	15:04
clarkb	corvus: ok so you are ok with nodepool logging at a debug level (that at least seems ok for nodepool tests). Then set additional elements in jobs for other repos (like dib/sdk) as necessary to debug the functionality of those tools specifically on failure?	15:06
clarkb	Mostly for me this is about having knowledge about why something failed and not about assigning blame to one group or another	15:06
*** ijw has quit IRC		15:08
corvus	clarkb: i'm not trying to assign blame either. i'm trying to (and the weeks of work i did to move all the dib jobs out of the nodepool repo is in service of this) to move these jobs to the right place. we've been just assuming that everyone working on nodepool is enthralled by the idea of getting $distro images working on $cloud. it's just not the case. it's important, but it's important to	15:08
corvus	a different set of people, so let's get everyone the tools they need.	15:08
corvus	clarkb: honestly, i'm not thrilled by the idea of debug logging by default. i like the idea of running them in a more production-like setting. so maybe that should be an option too.	15:09
clarkb	corvus: yup I agree. I see this as not conflicting with that though. The two things that are done here seem generically useful to debugging nodepool's job failures. Those two things are 1) log at debug level 2) include logs from test instance to understand why it might fail	15:09
corvus	clarkb: so you think the journald-to-console element is something we should always have in nodepool jobs, in case we break something with a change in nodepool itself?	15:11
clarkb	corvus: yes, as it allows you to trace network connectivity and the ssh connection testing in particular which we can break on the nodepool side	15:11
*** altlogbot_0 has joined #openstack-infra		15:12
corvus	clarkb: okay, i think i'm convinced on that one. maybe for the debug change we should make that a job variable, so we can set it or not in nodepool as needed, and the other job users can do the same?	15:15
*** bobh has joined #openstack-infra		15:15
clarkb	corvus: ya that works. Then if we end up with a consistent failure we can toggle that flag and get all the data on the next run	15:15
*** altlogbot_0 has quit IRC		15:17
corvus	clarkb: ok, review comments adjusted accordingly, thanks :)	15:18
fungi	so on the image leaks topic, i agree it's probably good to make sure the sdk is really doing what it should to unwind image record creation after the api reports an upload failure before we go deciding we need more logic in nodepool to hunt for leaked images which sneak past the sdk's cleanup, though ultimately the latter is probably still useful	15:20
mordred	fungi: yes - I agree	15:22
clarkb	fungi: I've also tried for a while now to convince the glance team that the api is too complicated for the use case, but I think I mostly failed there when they rewrote the api and added steps instead of removing them	15:25
* fungi needs to head to an appointment, but will be back shortly		15:26
clarkb	seems like the plan to store more than just disk images at one time is part of the complication there?	15:26
*** bobh has quit IRC		15:27
*** chandankumar is now known as raukadah		15:32
*** hamzy has quit IRC		15:33
openstackgerrit	Donny Davis proposed openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing https://review.opendev.org/669893	15:36
clarkb	config-core infra-root ^ care to be second reviewer on that? I think we'll turn it off while we work through improving our testing then fixing the images	15:38
Shrews	clarkb: corvus: where will the journald output be found using that role? directly in the nodepool logs?	15:38
clarkb	I think donnyd wants to do some hardware changes too	15:38
clarkb	Shrews: yes, see http://logs.openstack.org/73/669773/2/check/dib-nodepool-functional-openstack-centos-7-src/00f6345/nodepool/nodepool-launcher.log as an example	15:39
corvus	Shrews: yes, iff we log at debug level (thus the parent change)	15:39
donnyd	yes. the mirror will be offline for about 10-15 minutes	15:39
Shrews	ah ok. i just didnt scroll down far enough in that log file to see it	15:41
Shrews	kinda ugly output	15:41
*** altlogbot_0 has joined #openstack-infra		15:42
Shrews	cant it be sent to syslog?	15:42
corvus	Shrews: we don't use syslog at all in nodepool...	15:42
clarkb	and the issue if networking is broken is there is no good way to get syslog	15:42
*** ijw has joined #openstack-infra		15:43
clarkb	the console log is retrieved through the magic of qemu so is the most accessible logging in that case	15:43
Shrews	corvus: yet we pull the file: http://logs.openstack.org/87/669787/2/check/nodepool-functional-openstack/44716e5/	15:43
corvus	Shrews: (but, technically, yes, you could write a logging config to send it to syslog)...	15:43
clarkb	Shrews: thats the syslog of the hypervisor not the guest	15:43
Shrews	ah	15:43
*** derekh has quit IRC		15:44
Shrews	sorry, still trying to catch up on these test changes	15:44
*** altlogbot_0 has quit IRC		15:46
*** ijw has quit IRC		15:48
*** diablo_rojo has joined #openstack-infra		15:48
*** priteau has quit IRC		15:52
*** igordc has joined #openstack-infra		15:56
openstackgerrit	Merged openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing https://review.opendev.org/669893	15:59
openstackgerrit	Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	15:59
openstackgerrit	Clark Boylan proposed zuul/nodepool master: Add shared nodepool_opts variable to openstack func test https://review.opendev.org/669899	16:00
donnyd	just a data point - FN will still be up, the only outage will be for the mirror	16:01
donnyd	clarkb:	16:01
donnyd	clarkb: is there a way to make the mirror ha?	16:01
corvus	donnyd: yes, but it's usually not worth the effort since we have other clouds	16:02
*** hamzy has joined #openstack-infra		16:02
clarkb	donnyd: we could run two mirrors and an haproxy but I think we've largely decided our redundancy is by having more than one cloud region and then we don't have to overcomplicate each cloud region's setup	16:02
donnyd	yea that makes sense to me	16:03
openstackgerrit	Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773	16:03
donnyd	just curious	16:03
clarkb	corvus: ^ ok I think those three changes do what you suggest. Have a moment to quickly check if that is better? (note I didn't update ianw's change for the debug thing as the approaches were quite a bit different instead pushed a different change)	16:04
corvus	clarkb: yeah, left a comment on '99	16:04
*** icarusfactor has joined #openstack-infra		16:04
*** bobh has joined #openstack-infra		16:04
clarkb	thanks	16:04
corvus	clarkb: also, i was originally thinking of an explicit job var just for debug, but i'm okay with this if that's what you prefer.	16:05
clarkb	corvus: is there a jinja filter for if var is true then this value instead?	16:05
clarkb	Mostly this was simpler to write and maybe more flexible so did that	16:05
* clarkb finds ansible jinja filter docs		16:06
*** factor has quit IRC		16:06
*** factor__ has joined #openstack-infra		16:06
corvus	clarkb: maybe the usual python "and or" thing would work?	16:06
*** bobh has quit IRC		16:06
corvus	clarkb: apparently: http://jinja.pocoo.org/docs/2.10/templates/#if-expression	16:07
clarkb	{{ nodepool_debug and '-d' or '' }} ?	16:07
corvus	that's what i was thinking, but docs say: {{ '-d' if nodepool_debug else '' }}	16:07
* clarkb updates with docs this time		16:08
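The two spellings discussed above behave the same for '-d', but the "and or" idiom has a known pitfall. The sketch below is plain Python, not Jinja, and the helper names are made up for illustration; it only shows why the inline-if form is the safer habit.

```python
def and_or(flag, when_true, when_false):
    # Old-style idiom: silently falls through to when_false
    # whenever when_true is itself falsy (e.g. an empty string).
    return flag and when_true or when_false


def conditional(flag, when_true, when_false):
    # Conditional expression: picks purely on flag.
    return when_true if flag else when_false


# Equivalent for the debug flag case, because '-d' is truthy.
same_when_on = and_or(True, "-d", "") == conditional(True, "-d", "")
same_when_off = and_or(False, "-d", "") == conditional(False, "-d", "")

# Diverges as soon as the "true" value is empty.
surprising = and_or(True, "", "fallback")
expected = conditional(True, "", "fallback")
```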
*** altlogbot_1 has joined #openstack-infra		16:08
*** icarusfactor has quit IRC		16:08
*** eernst has joined #openstack-infra		16:09
donnyd	Is there a way for me to check what build was running on a particular node. I would like to do everything possible to reduce build times.	16:10
*** witek has quit IRC		16:12
openstackgerrit	Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669899	16:13
openstackgerrit	Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	16:13
*** altlogbot_1 has quit IRC		16:13
corvus	donnyd: you can probably query logstash for what you need for that... you can get build results and runtimes and node name, and you can query by region. it won't have the instance uuid though.	16:13
corvus	donnyd: (it will have the ip address and time, so you can xref that way)	16:13
corvus	donnyd: http://logstash.openstack.org/#/dashboard/file/logstash.json	16:14
openstackgerrit	Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773	16:14
clarkb	corvus: ok I think that addresses the comments. 669773 should confirm it works as expected	16:14
*** ykarel is now known as ykarel\|away		16:14
*** rpioso is now known as rpioso\|afk		16:15
*** mattw4 has joined #openstack-infra		16:16
clarkb	corvus: donnyd ya you can filter by node_provider:"fortnebula-regionone" AND build_name:"some-job-name" AND message:"stuff" and so on	16:17
clarkb	that should allow you to narrow down jobs then dig in via their log links (which can be found by clicking on the event results logstash returns its part of the event details)	16:17
*** lucasagomes has quit IRC		16:17
*** tdasilva has joined #openstack-infra		16:19
*** altlogbot_3 has joined #openstack-infra		16:20
clarkb	comparing the boot up logs we captured to those of a host in fortnebula the fn node networkmanager logs show it doing dhcp	16:20
*** whoami-rajat has joined #openstack-infra		16:20
clarkb	the test node doesn't seem to dhcp. This makes me wonder if there is a bug in ordering where we don't enable dhcp on the interface? Thats a stretch need more data	16:20
clarkb	I think what I'll do next is build an image locally then start testing with that	16:21
*** eernst has quit IRC		16:21
clarkb	that will take some time for me to bootstrap though as I've let my dib VM rot	16:21
clarkb	actually I could just edit the centos 7 image that dib has already built via nodepool. Except that is a large image	16:21
clarkb	maybe I'll start with that since networking tends to be cheap ish	16:22
*** altlogbot_3 has quit IRC		16:23
openstackgerrit	Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432	16:24
*** irclogbot_1 has joined #openstack-infra		16:24
*** irclogbot_1 has quit IRC		16:27
clarkb	alright I'm going to pop out for a bike ride while I wait for test results and consider quicker rtt methods for testing the centos problem	16:27
*** factor__ has quit IRC		16:29
*** factor__ has joined #openstack-infra		16:30
logan-	clarkb: regarding your jinja question.. ternary filter	16:32
sshnaidm\|ruck	How to configure this - I'd like to run the same periodic job on different branches, apparently this doesn't work: https://github.com/openstack/tripleo-ci/blob/master/zuul.d/periodic.yaml#L4	16:32
sshnaidm\|ruck	it runs on master only	16:32
*** dtantsur is now known as dtantsur\|afk		16:34
*** goldyfruit has quit IRC		16:35
*** rpittau is now known as rpittau\|afk		16:37
*** bobh has joined #openstack-infra		16:38
*** bobh has quit IRC		16:38
*** ccamacho has quit IRC		16:38
*** eharney has joined #openstack-infra		16:41
*** factor__ has quit IRC		16:44
*** mgoddard has quit IRC		16:45
AJaeger	sshnaidm\|ruck: tripleo-ci has only a master branch, so it will run only on that one...	16:47
sshnaidm\|ruck	AJaeger, I see, so if I configure the same on branched repo it'll work?	16:48
*** mgoddard has joined #openstack-infra		16:48
AJaeger	sshnaidm\|ruck: if you add it on project-config, it will work	16:49
*** dpawlik has joined #openstack-infra		16:49
AJaeger	sshnaidm\|ruck: if you add it to another repo, it will only run on the branch you add it on...	16:49
AJaeger	so, you would need to add it to all branches	16:49
sshnaidm\|ruck	AJaeger, ok, thanks	16:50
AJaeger	sshnaidm\|ruck: unless there's a magic zuul variable to apply this on all branches, best ask corvus	16:50
AJaeger	or check docs	16:50
sshnaidm\|ruck	corvus, do we have such? ^^	16:50
*** gtema has quit IRC		16:52
*** gtema has joined #openstack-infra		16:52
*** ijw has joined #openstack-infra		16:52
*** sshnaidm\|ruck is now known as sshnaidm\|afk		16:57
AJaeger	jamespage: could you check https://review.opendev.org/615273, please? Is there progress happening?	16:57
AJaeger	sshnaidm\|afk, check: https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers	16:59
donnyd	clarkb: is your test node using just SLAAC or dhcpv6-stateless? Wonder if it even makes a difference	16:59
*** udesale has quit IRC		17:00
*** armax has quit IRC		17:00
*** altlogbot_3 has joined #openstack-infra		17:00
*** ykarel\|away has quit IRC		17:00
*** gtema has quit IRC		17:03
*** gtema has joined #openstack-infra		17:04
*** kjackal has quit IRC		17:04
sshnaidm\|afk	AJaeger, not sure I understand this well and how I can use it..	17:05
sshnaidm\|afk	will ask in #zuul	17:05
*** altlogbot_3 has quit IRC		17:05
fungi	okay, back and caught up on scrollback here	17:05
fungi	clarkb: donnyd: we also have hostids indexed now in logstash, if that helps	17:06
*** irclogbot_2 has joined #openstack-infra		17:10
*** Lucas_Gray has joined #openstack-infra		17:11
*** irclogbot_2 has quit IRC		17:13
openstackgerrit	Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui https://review.opendev.org/669883	17:14
*** dpawlik has quit IRC		17:20
*** Lucas_Gray has quit IRC		17:24
*** smarcet has joined #openstack-infra		17:26
openstackgerrit	Andreas Jaeger proposed openstack/project-config master: Retire docs-specs https://review.opendev.org/668854	17:29
*** igordc has quit IRC		17:29
AJaeger	config-core, two project retirements for review, please: https://review.opendev.org/#/c/668854 and https://review.opendev.org/#/c/669883/	17:30
*** ralonsoh has quit IRC		17:36
donnyd	thanks for all of the pointers, I should be able to get what I am looking for to get the tuning optimal for this workload	17:38
*** gyee has joined #openstack-infra		17:39
*** tesseract has joined #openstack-infra		17:40
clarkb	donnyd: I think NM may try both	17:43
clarkb	donnyd: its not actually completely clear to me what choices it is making :/	17:43
clarkb	donnyd: and the behavior seems to change if the old sysconfig stuff is being used	17:43
*** hamzy has quit IRC		17:47
openstackgerrit	Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669899	17:49
openstackgerrit	Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	17:49
clarkb	I think I've decided the easiest way to debug this is to hold the centos functional jobs for https://review.opendev.org/#/c/669773/4 so I'm doing that now	17:51
*** hamzy has joined #openstack-infra		17:52
*** rpioso\|afk is now known as rpioso		17:58
*** jcoufal_ has joined #openstack-infra		18:00
fungi	Shrews: is there any other detail we should collect before i delete the leaked nodes from sjc1?	18:02
*** jcoufal has quit IRC		18:02
Shrews	fungi: i dont believe so	18:02
*** smarcet has quit IRC		18:02
fungi	okay, i'll put together the list and start cleaning up	18:02
zbr	clarkb: fungi AJaeger corvus: please have a look at https://review.opendev.org/#/c/669871/ and let me know which path should I take (check my last comment).	18:02
clarkb	I thought depends on did end up as implicit required projects?	18:04
*** bhavikdbavishi has quit IRC		18:04
clarkb	zbr: I think what I would do is have an openstack-tox-molecule job that sets the success and failure urls, then inherit from that in tripleo-tox-molecule job and set required project there. Then have two different templates that should never need to change	18:06
*** electrofelix has quit IRC		18:08
zbr	ok, i guess you wanted to say an tripleo-tox-molecule template, but is ok. i will do it that way.	18:10
clarkb	no jobs	18:10
clarkb	first new job sets success and failure urls, second new job sets the tripleo required projects list	18:11
clarkb	then you can either put those in templates or not	18:11
zbr	ok, i think i got it	18:12
clarkb	that way if you make fixes to the other jobs the tripleo child job gets all of those fixes	18:12
clarkb	and you don't have to double account those changes	18:12
*** eharney has quit IRC		18:17
*** armax has joined #openstack-infra		18:20
*** jcoufal_ has quit IRC		18:20
*** jamesmcarthur has quit IRC		18:21
zbr	clarkb: can templates have the same names as jobs?	18:21
clarkb	zbr: I believe they can as they are applied in different contexts	18:22
zbr	ok. that is good as it would be useful to have both. in most cases people would refer to the template.	18:23
*** eharney has joined #openstack-infra		18:30
*** whoami-rajat has quit IRC		18:30
openstackgerrit	Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871	18:31
openstackgerrit	Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871	18:33
*** jamesmcarthur has joined #openstack-infra		18:34
*** melwitt has joined #openstack-infra		18:34
*** jcoufal has joined #openstack-infra		18:34
*** bobh has joined #openstack-infra		18:35
*** irclogbot_0 has joined #openstack-infra		18:36
*** bobh has quit IRC		18:36
clarkb	corvus: apparently if you set -d with nodepool it does not daemonize	18:38
*** tesseract has quit IRC		18:39
*** irclogbot_0 has quit IRC		18:39
clarkb	(this causes the ansible task to start nodepool-builder to never end and we never start the launcher)	18:39
clarkb	that would be why ianw wrote the logging config I guess	18:40
*** panda has quit IRC		18:43
*** eharney has quit IRC		18:44
*** eharney has joined #openstack-infra		18:46
fungi	mnaser: Shrews: in addition to the one image which nodepool is failing to delete due to it being mysteriously in use after more than two months (f670e6be-953b-4d4b-a931-6cbb5b568410), a couple of the images nodepool has lost track of are also refusing to delete for the same reason: 8ea352da-2907-41c9-9b49-d0028f661fd7, c2f138b5-cfe5-43e8-8a11-251d71033628, 233dd054-dae8-4b12-a3de-fe2aa7671f37,	18:47
fungi	e2f7361d-9d04-4b93-b2fa-f4bad095499c	18:47
clarkb	fungi: are those all test images (eg not control plane images?)	18:48
fungi	test images, all	18:48
fungi	so no clue what instances would still be using them	18:48
clarkb	could be held nodes	18:48
fungi	in the case of f670e6be-953b-4d4b-a931-6cbb5b568410 the only running instance anywhere near that old is for a different image type entirely	18:48
*** hamzy has quit IRC		18:49
*** hamzy has joined #openstack-infra		18:49
fungi	there are currently no held nodes in sjc1	18:49
clarkb	cinder and glance and nova must be getting confused then	18:49
clarkb	donnyd: am I good to boot test nodes in fn?	18:51
fungi	i wouldn't be surprised if certain sorts of boot failures leave bfv references behind somewhere	18:51
*** jcoufal has quit IRC		18:52
*** jcoufal has joined #openstack-infra		18:53
fungi	#status log manually deleted 30 leaked nodepool images from vexxhost-sjc1	18:53
openstackstatus	fungi: finished logging	18:53
clarkb	~5 minutes to our weekly meeting. I expect this one to be quick considering many of us have only worked ~2 days since the last one	18:55
*** igordc has joined #openstack-infra		18:55
donnyd	clarkb: yea you should be good to go	18:55
*** rlandy has quit IRC		18:56
fungi	clarkb: i call those folks "efficient"	18:56
fungi	i wonder if these delete failures point to leaked volumes i can find in cinder?	18:57
*** jcoufal has quit IRC		18:57
fungi	heading down that route next	18:58
clarkb	fungi: oh interesting ya possible the instance delete left a volume hanging around	18:58
fungi	not all of them i guess. i see three marked "available" in openstack volume list	18:59
fungi	all 80gb so must be associated with test nodes	18:59
clarkb	if you volume show on them and they were booted from our images you should get the image uuids out	18:59
clarkb	maybe even the names too	18:59
*** rlandy has joined #openstack-infra		19:00
fungi	all 3 available volumes were created on 2019-06-21	19:01
*** ociuhandu has quit IRC		19:02
*** pkopec has quit IRC		19:02
*** jcoufal has joined #openstack-infra		19:03
*** crodriguez has joined #openstack-infra		19:04
*** factor has joined #openstack-infra		19:06
fungi	oh, in addition to the 3 "available" volumes there are also 6 "error_deleting" volumes	19:10
fungi	okay, all 9 leaked volumes are built from one of two images: c2f138b5-cfe5-43e8-8a11-251d71033628 and e2f7361d-9d04-4b93-b2fa-f4bad095499c	19:15
corvus	clarkb: sigh. i guess we should put ianw's logging config change behind your flag?	19:16
*** weifan has joined #openstack-infra		19:16
fungi	that covers 2 of our 4 undeletable images at least	19:16
clarkb	corvus: ya thats sort of what I'm thinking unless we want to change nodepool behavior	19:17
corvus	clarkb: i kinda do, but i think we've done as much as we can easily, and the rest is going to be a slower process, so i don't think we should depend on that here.	19:19
corvus	(further work on that will entail more breaking behavior changes i think)	19:19
*** panda has joined #openstack-infra		19:19
*** jcoufal_ has joined #openstack-infra		19:21
*** ijw has quit IRC		19:22
*** michael-beaver has quit IRC		19:24
*** jcoufal has quit IRC		19:24
*** goldyfruit has joined #openstack-infra		19:25
openstackgerrit	Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871	19:27
AJaeger	sshnaidm\|afk: fixed the syntax for you ^	19:28
*** jcoufal_ has quit IRC		19:28
zbr	AJaeger: thanks! i was away to pick up something and when I am back i see it fixed, that's a really nice experience.	19:30
AJaeger	;)	19:31
AJaeger	zbr: you can do now the tripleo-ci part of that change ;)	19:32
*** pkopec has joined #openstack-infra		19:32
*** eernst has joined #openstack-infra		19:33
*** pkopec has quit IRC		19:36
*** ttx has quit IRC		19:49
*** ttx has joined #openstack-infra		19:50
clarkb	donnyd: fyi getting {'message': 'No valid host was found. There are not enough hosts available.', 'code': 500, 'created': '2019-07-09T19:49:38Z'} from nova trying to create new instances	19:51
donnyd	60 seconds	19:51
donnyd	they are just coming back up	19:52
donnyd	swapped out my core switch for 40G while I was tinkering.. so I won't have to do it later	19:52
clarkb	k I'll delete and try again	19:52
*** eernst has quit IRC		19:52
AJaeger	clarkb: was https://review.opendev.org/669871 and https://review.opendev.org/669871 what you had in mind for tox-molecule? I think you did and voted those by zbr up	19:54
donnyd	should be good to go now	19:55
clarkb	AJaeger: thats the same change twice but looking	19:55
*** ijw has joined #openstack-infra		19:55
clarkb	AJaeger: ya that is what I had in mind for the first bit. Then tripleo can inherit from that job and set the required projects	19:55
clarkb	approved	19:55
corvus	clarkb: want me to do the nodepool debug change?	19:56
*** ijw has joined #openstack-infra		19:56
clarkb	corvus: if you are able that would be great	19:56
corvus	on it	19:56
* clarkb boots some new instances in fn		19:56
clarkb	I've got my held node to poke around on once I'm done double checking on fn	19:57
donnyd	your other test instances should be back up and good to go as well	19:57
clarkb	ok just confirmed that the image that fails in nodepool's func test boots properly on fortnebula with working ipv6	20:00
clarkb	waiting for the old unit image to boot to confirm that does not work. Then I think that confirms we have a good fix and just need to sort out why it breaks in testing	20:00
donnyd	glance speeds are next on my hit list.. taking forever to pull down the image from the controller	20:01
fungi	okay, so of the 9 leaked volumes in vexxhost-sjc1 (all were from the same date btw, 2019-06-21), cinder let me delete the 3 marked as available but not the 6 in an error_deleting state (not terribly surprising)	20:06
fungi	those are accounting for 2 of the 5 total undeletable images in that provider, so i still have no explanation for the remaining 3	20:07
openstackgerrit	Merged openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871	20:07
fungi	next i guess i can scrape the image relationships for the in-use volumes and see if any turn up there	20:08
*** eernst has joined #openstack-infra		20:08
zbr	thanks!!	20:08
clarkb	donnyd: any idea why clarkb-test-centos-old-unit and clarkb-test-centos-old-unit2 seem to be stuck in BUILD state? Makes me wonder if I edited that image file improperly	20:12
donnyd	its not the image	20:12
donnyd	try one more time	20:13
openstackgerrit	James E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669939	20:14
openstackgerrit	James E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	20:15
clarkb	donnyd: looks like clarkb-test-centos-old-unit2 ended up booting and it has the wrong ipv6 address and working ipv4	20:15
clarkb	ok so I think that confirms our suspicions of this being a fix. Which means I have to direct my focus on making the test job work	20:15
corvus	clarkb: i rechecked 669773; i think that's everything, but you may want to give it a quick once-over before we wait an hour for nothing :)	20:16
fungi	bingo! all the undeletable images are also associated with in-use volumes in cinder	20:16
clarkb	corvus: looking	20:16
fungi	now to try and identify whether they're really still used by server instances. smart money says they're not	20:17
fungi	and it's the cinder metadata which is the problem	20:17
clarkb	corvus: yup that looks about right	20:17
clarkb	thanks	20:17
clarkb	comparing console log from held test node against the images I uploaded to fn from that test node the major difference seems to be that there are no dhcp actions logged in the failed test node side	20:19
clarkb	I think that is the thread to pull on	20:20
Shrews	fungi: your detective skills are impressive	20:20
fungi	Shrews: impressively slow	20:21
*** xek has quit IRC		20:21
clarkb	I'm gonna pop out for a few then dig into that	20:21
fungi	my detective kit consists primarily of grep and for loops in shell	20:22
*** Lucas_Gray has joined #openstack-infra		20:24
*** irclogbot_1 has joined #openstack-infra		20:24
openstackgerrit	Nate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation https://review.opendev.org/669775	20:26
*** irclogbot_1 has quit IRC		20:27
fungi	oof, 14 volumes in-use referencing those undeletable images	20:31
fungi	and yeah, looks like these are also all from 2019-06-21	20:32
fungi	something must have happened to leave this mess behind	20:32
Shrews	don't blame me. i was under general anesthesia that day	20:33
Shrews	or maybe i did do it and can't remember	20:33
fungi	hah	20:34
Shrews	:)	20:34
fungi	and as i suspected, spot checks indicate the server-id referenced for each of these attachments, when fed into openstack server show, comes back with a "No server with a name or ID...exists" error	20:34
fungi	also, i mistyped, it's 24 volumes not 14	20:36
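The cross-check described above (in-use volumes whose attached server no longer exists) amounts to a set comparison. A minimal sketch, assuming the volume and server listings have already been exported into plain Python structures; the field names and ids here are hypothetical, not the real API output.

```python
def orphaned_volumes(volumes, existing_server_ids):
    """Return ids of in-use volumes none of whose attached servers still exist."""
    existing = set(existing_server_ids)
    orphans = []
    for volume in volumes:
        if volume["status"] != "in-use":
            continue
        if not any(server_id in existing for server_id in volume["server_ids"]):
            orphans.append(volume["id"])
    return orphans


# Hypothetical sample data.
volumes = [
    {"id": "vol-a", "status": "in-use", "server_ids": ["srv-gone"]},
    {"id": "vol-b", "status": "in-use", "server_ids": ["srv-live"]},
    {"id": "vol-c", "status": "available", "server_ids": []},
]
orphans = orphaned_volumes(volumes, ["srv-live"])
```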
*** hamzy has quit IRC		20:37
*** eharney has quit IRC		20:37
*** jamesmcarthur has quit IRC		20:40
*** jamesmcarthur has joined #openstack-infra		20:41
fungi	not finding any mention in irc channel logs of something happening on that date which might explain this	20:41
fungi	i was trying to troubleshoot nodepool prematurely deleting network ports in ovh	20:41
*** strigazi has quit IRC		20:42
*** Fidde has joined #openstack-infra		20:42
fungi	Shrews: that reminds me, did you catch that conversation later?	20:42
Shrews	nope	20:42
*** joeguo has joined #openstack-infra		20:42
fungi	upshot was https://review.opendev.org/666852 Increase port cleanup interval	20:43
*** strigazi has joined #openstack-infra		20:43
Shrews	ah, i did see that change	20:43
fungi	basically the port gets allocated, nova takes a while scheduling the server instance, nodepool thinks the port is leaked and deletes it, instance is eventually scheduled and then the boot fails	20:44
fungi	the lag was prevalent enough in ovh that it was creating a fairly steady stream of boot failures for us	20:45
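The race fungi describes above can be sketched roughly as follows. This is an illustration only, not nodepool's actual cleanup code: the function name `looks_leaked`, the `device_id` check, and the interval values are all assumptions made up for the example.

```python
from datetime import datetime, timedelta


def looks_leaked(created_at, device_id, now, cleanup_interval):
    """Hypothetical check: treat an unattached port as leaked once it is
    older than the cleanup interval."""
    if device_id:
        # The port is attached to something, so it is in use.
        return False
    return now - created_at > cleanup_interval


# A port that was allocated five minutes ago while nova is still
# scheduling the server instance (so nothing is attached yet).
created = datetime(2019, 6, 21, 12, 0, 0)
now = created + timedelta(minutes=5)

# With a short interval the port is judged leaked and would be deleted,
# so the eventual boot fails; a longer interval leaves it alone.
short_interval_result = looks_leaked(created, "", now, timedelta(minutes=3))
long_interval_result = looks_leaked(created, "", now, timedelta(minutes=10))
```

Under these assumptions, lengthening the interval only narrows the window; a sufficiently slow scheduler could still lose the race.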
*** jamesmcarthur has quit IRC		20:48
fungi	mnaser: executive summary of the remainder of the leak in sjc1... something seems to have happened there on 2019-06-21 which left 24 orphaned cinder volumes in sjc1 claiming to be in-use attached to server instances which no longer exist per nova. we've also got another 6 cinder volumes from that date which are stuck in an error_deleting state and i can't seem to delete those either. there are 5	20:52
fungi	glance images which we should be able to delete once the volumes are gone. do you want that list of 30 volume uuids?	20:52
smcginnis	When the volume lifecycle is managed by nova, I've seen cases where Nova hits an error and never tells Cinder to delete them. Maybe the case this time.	20:53
*** bobh has joined #openstack-infra		20:54
fungi	yeah, kinda like the port leaks we see with nova/neutron coordination in some providers	20:58
*** bobh has quit IRC		21:01
*** pkopec has joined #openstack-infra		21:01
mordred	fungi: are the volumes identifiable enough, do you think, that we could delete them in the nodepool/sdk cleanup method? I'm a little more conservative about volumes, because I don't want to delete something that someone needs though ...	21:04
openstackgerrit	Merged zuul/zuul master: Additional note about branches for implied-branches https://review.opendev.org/667415	21:06
fungi	mordred: in this case cinder seems to not be letting me delete these	21:06
openstackgerrit	James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948	21:06
fungi	i think they need admin override	21:07
mordred	ah - well then	21:07
fungi	cinder thinks they're still in use by servers nova says don't exist	21:07
corvus	mordred: ^ when you have a sec, can you give 669948 a really quick once-over and see if that's the general approach you were thinking about?	21:07
corvus	(or, if it's compatible with what you were thinking)	21:07
corvus	mordred: (only look at the readme, the code is wrong; it's just a copy of the old role right now)	21:08
corvus	also, hah, my example doesn't match my docs, but you get the idea	21:08
mordred	corvus: ooh - yeah. that seems great - like the keys can be a good api and stuff	21:08
openstackgerrit	James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948	21:09
mordred	corvus: since this is docware at the moment - should this account for per-region things with some defined substitution vars?	21:09
mordred	corvus: like: mirror_info.pypi: https://mirror.{{ nodepool.region }}.{{ nodepool.cloud }}.example.com/pypi/simple	21:10
corvus	mordred: yeah, i'm expecting we can do that more or less the way we do now	21:10
mordred	(for instance)	21:10
mordred	cool	21:10
corvus	i could put that in the readme as a more complex example? or i could push up a DNM change to project-config with what it would look like for us	21:11
openstackgerrit	James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948	21:11
mordred	corvus: I think it's worth including in the README - since we're defining an API for other people to be able to use as well	21:11
openstackgerrit	Merged zuul/zuul master: Run jobs when their own config changes https://review.opendev.org/669752	21:11
mordred	I mean, ansible variables work like they work, so it's not actually documenting role-specific behavior - but it might not occur to someone that they can do that	21:12
*** ekultails has quit IRC		21:12
*** irclogbot_0 has joined #openstack-infra		21:14
openstackgerrit	James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948	21:15
corvus	mordred: ya, good point.	21:15
mordred	++	21:15
mordred	that looks great	21:15
corvus	mordred: i think that's probably at a point i can start an email thread on zuul-discuss to get design review on it, yeah?	21:15
mordred	yah	21:15
corvus	i'll fire off an email now	21:16
mordred	corvus: I think it's going to wind up being nice for opendev too - since there will be a nice easy list of what mirrors exist	21:16
corvus	++	21:16
mordred	incidentally - I think in the new scheme we should define 'docker' as a mirror type and have it mean the new v2 style mirror - my gut says to leave the v1 style out because all of the clients are supposed to be able to deal with v2 mirrors now - but if we need to still support it have its key be dockerv1 or olddocker or something like that	21:17
corvus	sounds reasonable	21:18
mordred	corvus: or - should we pick a more generic - 'container-registry' - and maybe paint it chartreuse	21:19
*** irclogbot_0 has quit IRC		21:19
*** pcaruana has quit IRC		21:19
corvus	hrm, that's an interesting question.	21:19
*** joeguo has quit IRC		21:20
*** goldyfruit has quit IRC		21:20
*** weifan has quit IRC		21:21
*** weifan has joined #openstack-infra		21:21
*** Lucas_Gray has quit IRC		21:22
clarkb	seems like docker is its own protocol and maybe being specific there is good	21:22
clarkb	also if anyone is wondering, hand editing /etc/shadow in order to log in on the console is error prone	21:23
clarkb	(I have no idea what I've done wrong at this point /me starts from fresh copy of original image)	21:24
*** weifan has quit IRC		21:26
mordred	clarkb: well - yeah - that was my first thought- but I think the container registry protocol is more generic (with things like quay.io and gcr.io)	21:26
clarkb	mordred: right but it is the docker container registry protocol because they don't http properly :/	21:27
fungi	clarkb: can you pass init=/bin/sh on the kernel command-line and then remount / rw and use passwd to set a rootpw?	21:27
clarkb	other services just http iirc	21:27
clarkb	fungi: this is the broken centos test image so I can just copy the original and edit it again	21:27
mordred	really? jesus	21:27
fungi	ahh, easier then	21:28
mordred	clarkb: so it might be more accurate then to call it "dockerhub"	21:28
fungi	chroot into it and use passwd there	21:28
clarkb	mordred: ya it doesn't support ipv6 address literals for example	21:28
mordred	actually - it's more accurate to call it dockerhub anyway - since it's not a generic registry, it's specifically dockerhub we're mirroring here	21:28
mordred	and if we were to mirror quay.io, I'd expect a second entry called "quay" or "quay.io" or something	21:29
clarkb	ya	21:30
*** Fidde has quit IRC		21:30
corvus	i just realized this doc section shouldn't be the doc for configure-mirrors2, it should be a top-level doc since it's about a variable that any role might use.	21:31
corvus	so i'm doing a quick re-org before i send out the mail	21:31
mordred	corvus: ++	21:32
*** irclogbot_1 has joined #openstack-infra		21:38
*** eernst has quit IRC		21:41
openstackgerrit	James E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror_info documentation https://review.opendev.org/669948	21:42
*** irclogbot_1 has quit IRC		21:43
*** donnyd is now known as donnyd_pto		21:43
donnyd_pto	I will be around, just not at my desk all day	21:44
donnyd_pto	ping me if I am needed	21:44
clarkb	donnyd_pto: did you see I think I confirmed the fix is a good fix? just have to understand why it fails in testing now	21:46
clarkb	fungi: so I did the chroot and chpasswd, which was fine. boot actually started dbus because root existed, but then I couldn't log in, I think due to selinux contexts, so fixing that now :/	21:47
fungi	yuck	21:47
clarkb	eventually I'll be able to examine the running system without working networking :/	21:48
* fungi misses the days when you could log into a terminal without a message bus getting involved		21:48
clarkb	fungi: back when you could put unencrypted passwords in the passwd file	21:48
donnyd_pto	clarkb: would it be helpful to get some actual public ipv4 addresses so you could ssh in and see whats the what?	21:49
corvus	shadow has been around for a long time	21:49
fungi	clarkb: back when most servers had a passwordless "guest" account	21:49
clarkb	donnyd_pto: no this is on one of our test nodes where I'm reproducing the test failures of the fix so that I can figure out how to fix the test for the fix	21:49
fungi	that's so meta	21:50
donnyd_pto	clarkb: Could you repeat that in terms that actual humans can understand	21:50
donnyd_pto	jk... lmk if you need anything from me	21:50
clarkb	and ya now I can login as root finally. Also restorecon doesn't just work in a chroot if anyone is wondering. Thankfully dib has examples of how to make it work	21:51
*** irclogbot_0 has joined #openstack-infra		21:54
clarkb	ok I have managed to confirm /var/lib/dhclient is an empty dir	21:54
clarkb	implying it is never requesting the dhcp leases as suspected	21:55
*** altlogbot_3 has joined #openstack-infra		21:55
*** altlogbot_3 has quit IRC		21:55
clarkb	now trying a reboot to see if it works on second boot as it did on fn	21:59
*** irclogbot_0 has quit IRC		21:59
*** igordc has quit IRC		22:00
clarkb	and it works after a reboot	22:02
clarkb	it's almost like the ifcfg file isn't synced to disk by the time network manager runs?	22:02
clarkb	since that is what tells network manager to run a dhclient	22:02
*** aaronsheffield has quit IRC		22:03
*** weifan has joined #openstack-infra		22:10
*** rcernin has joined #openstack-infra		22:10
*** tosky has quit IRC		22:18
*** weifan has quit IRC		22:19
*** jamesmcarthur has joined #openstack-infra		22:21
*** pkopec_ has joined #openstack-infra		22:24
fungi	more like not written to the file... they should both be working off a coherent fs cache even if its writes haven't yet synchronized to the underlying block device	22:24
clarkb	ya adding a sync changed nothing. I've now figured out how to have network manager logging verbosity go way up so rerunning with that hoping for clues	22:25
fungi	you should never need to call sync for one process to read another's file off the same copy of a filesystem. it may be that the writing process hasn't called fsync yet and still has its fd writes buffered	22:26
*** pkopec has quit IRC		22:26
fungi	er, hasn't flushed	22:26
clarkb	while in general I agree I've definitely found cases where that isn't true like testing zuul's disk usage monitor on btrfs	22:28
clarkb	it failed reliably for me on btrfs locally until we added some flushes	22:28
*** goldyfruit has joined #openstack-infra		22:28
clarkb	er I guess that is what you said	22:28
fungi	flushes of the buffered writes for the process, yes	22:28
fungi	sync to the block device should never be necessary if both processes are going through the same filesystem interface though	22:29
fungi	(now if one is reading straight from the block device and the other is writing through the fs layer, sure)	22:30
fungi	but say you have a process writing to an open file descriptor, it's almost certainly buffering writes in process memory and flushing them to the fs only under certain conditions (say, when the buffer gets full or flush() is explicitly called)	22:33
fungi	(or when close() happens)	22:33
fungi	and yeah, with a persistent daemon writing small amounts of data, it nearly always comes down to where the author included flush() calls	22:34
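fungi's point about process-level buffering versus syncing to the block device can be reproduced with a small sketch. The file contents and the buffer size below are arbitrary choices for the demonstration, and a second file handle in the same process stands in for the second process.

```python
import os
import tempfile


def visible_bytes(path):
    """Read the file through the normal filesystem interface, the way a
    second process would."""
    with open(path, "rb") as reader:
        return reader.read()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Writer with a userspace buffer far larger than what we write.
        writer = open(path, "wb", buffering=8192)
        writer.write(b"BOOTPROTO=dhcp\n")
        # The data still sits in the writer's own buffer.
        before = visible_bytes(path)
        # flush() hands the data to the filesystem; no fsync or sync call.
        writer.flush()
        after = visible_bytes(path)
        writer.close()
        return before, after
    finally:
        os.remove(path)


before, after = demo()
```

Here `before` is empty and `after` holds the full line, even though nothing was ever forced out to the underlying block device.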
*** lathiat has quit IRC		22:35
*** rh-jelabarre has quit IRC		22:36
*** pkopec_ has quit IRC		22:37
*** armax has quit IRC		22:40
*** altlogbot_2 has joined #openstack-infra		22:44
*** weifan has joined #openstack-infra		22:45
*** altlogbot_2 has quit IRC		22:49
*** tkajinam has joined #openstack-infra		22:52
*** bobh has joined #openstack-infra		22:52
*** slaweq has quit IRC		22:54
clarkb	ok via nmcli and networkmanager logs I think I've figured out that there are two logical eth0 interfaces being managed by NM: "eth0" and "System eth0". the second one is the one that has ipv4.method set to auto, the first is set to disabled	22:55
clarkb	so it must be confusing those two for some reason	22:56
*** bobh has quit IRC		22:57
corvus	i'm going to restart zuul to pick up the self-config change	23:01
clarkb	https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#114 getting closer I think	23:05
openstack	Debian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open]	23:05
clarkb	trying https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#331 next	23:08
openstack	Debian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open]	23:08
fungi	that does sound likely	23:10
clarkb	if that fixes it the race must be glean not needing to run on second boot so NM starts sooner prior to kernel autoconfing?	23:12
*** goldyfruit has quit IRC		23:14
corvus	#status log restarted all of zuul at commit 86f071464dafd584c995790dce30e2e3ca98f5ac	23:15
openstackstatus	corvus: finished logging	23:15
corvus	fungi, clarkb: https://review.opendev.org/669762 is a neato change :)	23:15
clarkb	corvus: +2	23:17
*** weifan has quit IRC		23:17
fungi	i suppose it's self-testing since the restart, except that we wouldn't be able to tell the difference so i guess there's little point in rechecking?	23:18
fungi	(it should trigger testing for all the jobs we just removed the file matcher for it from)	23:19
corvus	we could recheck it and we should see more jobs in check, but i vote we just +W it and see them in gate.	23:20
clarkb	wow I have an ip address now	23:20
clarkb	so Network Manager doesn't know what to do if the kernel assigns an IP to an interface	23:20
clarkb	and of course this is totally fine on our ipv4 only clouds because there are no ipv6 advertisements	23:21
*** jamesmcarthur has quit IRC		23:21
corvus	fungi: well, it looks like it's not triggering those jobs... maybe changing file matchers doesn't trip the detection.	23:21
clarkb	so now I need to have a think about the best way to handle this. We can't assume the interface names on boot because of biosdevname. We could update sysctl if NM is used to disable autoconf and RAs by default on all interfaces?	23:22
*** weifan has joined #openstack-infra		23:22
clarkb	Then it is NetworkManager's job not the kernel's to figure it out?	23:22
clarkb	this seems like such a huge NM bug	23:22
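For reference, the blanket sysctl idea clarkb floats above would amount to settings along these lines. This is only a sketch of the proposal, not a recommendation and not what dib does; the helper name `sysctl_lines` is made up, and applying it to both the `all` and `default` scopes is an assumption about what "all interfaces" would mean in practice.

```python
def sysctl_lines(scopes=("all", "default")):
    """Build sysctl.d-style lines that turn off kernel IPv6 autoconf and
    router advertisement acceptance for the given scopes."""
    keys = ("autoconf", "accept_ra")
    return [
        f"net.ipv6.conf.{scope}.{key} = 0"
        for scope in scopes
        for key in keys
    ]


# Print the lines as they would appear in a sysctl.d drop-in file.
for line in sysctl_lines():
    print(line)
```

Whether the kernel or NetworkManager then ends up configuring ipv6 is exactly the open question in the discussion that follows.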
openstackgerrit	James E. Blair proposed opendev/system-config master: DNM: test jobs running on config update https://review.opendev.org/669957	23:24
*** dchen has joined #openstack-infra		23:25
corvus	fungi: ^ that seems to be doing what we expect (it's running the eavesdrop job)	23:25
clarkb	fungi: corvus ^ any concerns with doing a blanket sysctl disable of autoconf and accept_ra if NM is used in dib? I'm worried we'll end up breaking ipv6 somehow doing that (but end up with working ipv4 which is great except where we want ipv6 too)	23:25
corvus	if changes to file matchers don't trigger job runs, i'm okay with that :)	23:25
corvus	clarkb: i think it would take quite a bit of time for me to come up to speed on this problem enough to render an opinion.	23:28
clarkb	I'll try to summarize since it is probably worthwhile anyway	23:28
clarkb	On boot if NetworkManager sees that an interface already exists it creates a dummy record internally so that it knows about that interface but then assumes something else is managing that interface and doesn't touch it further. If you have kernel/sysctl settings for autoconfing ipv6 on interfaces it is possible for the kernel to assign ipv6 addrs to an interface prior to NM starting. In that case NM	23:30
fungi	clarkb: yeah, i'm worried disabling kernel handling of autoconf will simply break ipv6 since nm likely relies on at least some of that (if nm reimplemented that, wow, crazypants?)	23:30
clarkb	doesn't touch the interface and you get no dhcp for working ipv4	23:30
clarkb	fungi: I think NM did reimplement that	23:30
clarkb	hrm except I don't have the ipv6 addr on the interface that neutron says should be there so maybe not	23:31
clarkb	ugh	23:31
fungi	remind me again why we switched to nm for these images? fedora depends on it and it was hard to untangle the centos elements from fedora's?	23:32
clarkb	fungi: rhel 8 is NM only aiui	23:32
clarkb	so we implemented it to ease transition to centos 8	23:33
clarkb	(it is the default on centos 7/rhel 7 but the old sysconfig only stuff exists without NM)	23:33
fungi	i have a hard time believing rhel8 strips ipv6 autoconf out of the kernel	23:33
fungi	but... maybe?	23:33
clarkb	its possible rhel8 uses a much newer NM that isn't broken	23:33
clarkb	or that multi year bug in debian shows that the distros mostly ignore the problem	23:34
fungi	i guess they've made more surprising choices before	23:34
clarkb	this does certainly seem like a fundamental flaw in using NM as network configurator when expecting working ipv6	23:35
clarkb	thinking on the race more I think the problem we have is our boot is many minutes long (because qemu) and neutron sends RAs more frequently than that	23:37
clarkb	so we have a high chance of accepting an RA before network manager has started	23:37
clarkb	whereas in the real world that ordering is far less likely	23:37
clarkb	(still possible though)	23:37
*** armax has joined #openstack-infra		23:39
clarkb	I'll have to make a new image without those sysctl settings, boot it again and check if ipv6 is configured (basically determines that ipv6 is not working with NM then)	23:39
openstackgerrit	Merged opendev/system-config master: Remove .zuul.yaml file matchers https://review.opendev.org/669762	23:40
*** jamesmcarthur has joined #openstack-infra		23:42
openstackgerrit	James E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669939	23:43
openstackgerrit	James E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787	23:43
clarkb	ok confirmed if I do that we don't get working ipv6	23:45
clarkb	but if I remove the sysctl entries then we have the ipv6 addr that is expected	23:46
*** sthussey has quit IRC		23:46
*** jamesmcarthur has quit IRC		23:46
clarkb	maybe that's the hack around this: have the functional job use ipv6 (that likely requires we create an interface on the host node with an ipv6 addr that can route to the test node ipv6 addr(s))	23:46
*** tdasilva has quit IRC		23:47
clarkb	this is the sort of problem that deserves a stewing	23:49
corvus	i made brunswick stew on sunday	23:54
clarkb	hrm now I want stew for dinner	23:55
*** mattw4 has quit IRC		23:55
*** bobh has joined #openstack-infra		23:56
*** hamzy has joined #openstack-infra		23:59
