donnyd | nice. Air handler is all fixed up | 00:00 |
---|---|---|
*** aaronsheffield has quit IRC | 00:02 | |
*** Lucas_Gray has quit IRC | 00:12 | |
*** nicolasbock has quit IRC | 00:17 | |
ianw | clarkb: it doesn't seem like the fedora vm is booting in ci :/ | 00:24 |
*** ijw has quit IRC | 00:28 | |
donnyd | clarkb: you want to take the quota up to 20? | 00:28 |
donnyd | need to see how many reasonably fit on each hypervisor | 00:28 |
*** tjgresha has quit IRC | 00:30 | |
*** tjgresha has joined #openstack-infra | 00:30 | |
*** pkopec_ has quit IRC | 00:36 | |
ianw | clarkb: i've managed to grab the .qcow2 and console log for the test vm | 00:38 |
ianw | debugging, but eventually this will timeout | 00:39 |
donnyd | ianw: anything you need from the backend please lmk | 00:41 |
*** ijw has joined #openstack-infra | 00:41 | |
ianw | donnyd: thanks; yeah this is in our CI test of the changes to modify the ordering of networkmanager startup | 00:41 |
*** ijw has quit IRC | 00:43 | |
*** tdasilva has quit IRC | 00:48 | |
ianw | even after rebooting the node that nodepool created, it still doesn't come up with networking | 00:50 |
*** igordc has quit IRC | 00:59 | |
*** hongbin has joined #openstack-infra | 01:03 | |
*** uberjay has quit IRC | 01:06 | |
*** bobh has joined #openstack-infra | 01:07 | |
*** uberjay has joined #openstack-infra | 01:08 | |
*** armax has quit IRC | 01:09 | |
*** bobh has quit IRC | 01:12 | |
*** imacdonn has quit IRC | 01:13 | |
*** imacdonn has joined #openstack-infra | 01:14 | |
*** happyhemant has quit IRC | 01:17 | |
ianw | corvus: i feel like we used to get the console logs of the attempted boot of the test vm in nodepool.log; http://logs.openstack.org/73/669773/1/check/dib-nodepool-functional-openstack-fedora-29-src/1c412cb/nodepool/nodepool.log | 01:18 |
ianw | clarkb: i was, however, on the test host, and dumped the console, which looks like the ordering is just fine -> http://paste.openstack.org/show/754185/ | 01:20 |
*** ianychoi_ has joined #openstack-infra | 01:27 | |
clarkb | ya the ordering there looks good | 01:29 |
*** bauzas_ has joined #openstack-infra | 01:29 | |
*** nickv1985_ has joined #openstack-infra | 01:29 | |
*** ianychoi has quit IRC | 01:30 | |
*** altlogbot_0 has quit IRC | 01:30 | |
*** bauzas has quit IRC | 01:30 | |
*** nickv1985 has quit IRC | 01:30 | |
*** bauzas_ is now known as bauzas | 01:30 | |
*** nickv1985_ is now known as nickv1985 | 01:30 | |
*** diablo_rojo has quit IRC | 01:31 | |
*** altlogbot_2 has joined #openstack-infra | 01:31 | |
*** jamesdenton has quit IRC | 01:32 | |
*** hemna_ has quit IRC | 01:34 | |
*** adriant has joined #openstack-infra | 01:36 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780 | 01:36 |
ianw | clarkb: i can test the .qcow2 i captured from the run on my cloud; just need to upload it over what pretends to be upload bandwidth in .au | 01:37 |
*** hemna_ has joined #openstack-infra | 01:38 | |
ianw | i think a) getting the debug logging back in the job so it dumps the console, and then b) setting systemd.journald.forward_to_console=1 would be *really* helpful, just in general ... i'm not sure how to get that into the image | 01:38 |
ianw | but if we were getting syslog on the console, and that was captured in logs ... well that would be nice | 01:39 |
clarkb | ++ | 01:40 |
clarkb | There is ansible that splats down dib elements | 01:40 |
clarkb | I think you could modify that | 01:40 |
ianw | yeah, it's more getting it into the grub command line effectively | 01:42 |
*** diablo_rojo has joined #openstack-infra | 01:45 | |
ianw | oh i guess it could just be set in journald.conf | 01:46 |
prometheanfire | building a new gentoo systemd image to test on | 01:55 |
prometheanfire | is there a bug I can read up on? | 01:55 |
*** hongbin has quit IRC | 01:57 | |
*** jamesmcarthur has quit IRC | 02:00 | |
*** yamamoto has joined #openstack-infra | 02:03 | |
*** bobh has joined #openstack-infra | 02:09 | |
*** diablo_rojo has quit IRC | 02:22 | |
clarkb | prometheanfire: no, I'd only just discovered the problem before needing to do dinner | 02:22 |
clarkb | prometheanfire: basically ipv6 isn't configured | 02:22 |
clarkb | so the gentoo image isn't working on the ipv6-only cloud | 02:23 |
*** ijw has joined #openstack-infra | 02:30 | |
prometheanfire | huh | 02:32 |
prometheanfire | only the gentoo image? | 02:32 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784 | 02:39 |
*** hongbin has joined #openstack-infra | 02:57 | |
*** icarusfactor has quit IRC | 03:03 | |
*** yamamoto has quit IRC | 03:03 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780 | 03:04 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 03:04 |
clarkb | prometheanfire: ya, centos and fedora seem to have an unrelated network manager race | 03:10 |
prometheanfire | ok | 03:10 |
*** whoami-rajat has joined #openstack-infra | 03:12 | |
*** rfolco has quit IRC | 03:19 | |
*** jamesmcarthur has joined #openstack-infra | 03:20 | |
*** bobh has quit IRC | 03:28 | |
*** jamesmcarthur has quit IRC | 03:33 | |
*** ykarel|away has joined #openstack-infra | 03:49 | |
*** ykarel|away is now known as ykarel | 03:53 | |
*** jamesmcarthur has joined #openstack-infra | 04:01 | |
*** hongbin has quit IRC | 04:04 | |
*** ykarel has quit IRC | 04:05 | |
*** jamesmcarthur has quit IRC | 04:05 | |
*** ijw has joined #openstack-infra | 04:06 | |
*** mandu_kim has joined #openstack-infra | 04:08 | |
*** udesale has joined #openstack-infra | 04:10 | |
*** sjfjqjfkd has joined #openstack-infra | 04:13 | |
*** ijw has quit IRC | 04:13 | |
openstackgerrit | Ian Wienand proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773 | 04:15 |
*** factor has joined #openstack-infra | 04:16 | |
*** jamesmcarthur has joined #openstack-infra | 04:19 | |
ianw | clarkb: exactly the same image (as in ./test-image-0000000002.qcow2 taken from the CI host) boots fine in my cloud environment. console logs look the same ... but yet CI hosts don't get network :/ | 04:22 |
sjfjqjfkd | Hi all, does anybody know what I should append to local.conf to use gnocchi-api? | 04:25 |
sjfjqjfkd | `enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer.git stable/rocky` `CEILOMETER_BACKEND=gnocchi` `enable_service gnocchi-api,gnocchi-metricd` | 04:25 |
sjfjqjfkd | it doesn't work,, | 04:26 |
*** jamesmcarthur has quit IRC | 04:29 | |
*** rcernin has quit IRC | 04:30 | |
*** rcernin has joined #openstack-infra | 04:31 | |
*** jamesmcarthur has joined #openstack-infra | 04:32 | |
*** raukadah is now known as chandankumar | 04:35 | |
*** pcaruana has joined #openstack-infra | 04:35 | |
ianw | sjfjqjfkd: probably not much help in this channel, #openstack-telemetry will have more help, however, you might like to use something like http://codesearch.openstack.org/?q=CEILOMETER_BACKEND.*gnocchi&i=nope&files=&repos= to see what other things might do similar and how they do it | 04:35 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: nodepool-functional-openstack: add debug logging https://review.opendev.org/669780 | 04:37 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 04:37 |
*** ykarel has joined #openstack-infra | 04:38 | |
*** pcaruana has quit IRC | 04:38 | |
*** ijw has joined #openstack-infra | 04:41 | |
*** ijw has quit IRC | 04:47 | |
AJaeger | config-core, please review https://review.opendev.org/669727 https://review.opendev.org/667949 https://review.opendev.org/665910 and https://review.opendev.org/668708 | 04:49 |
*** kjackal has joined #openstack-infra | 04:51 | |
*** gyee has quit IRC | 04:59 | |
*** ricolin has joined #openstack-infra | 05:03 | |
*** jamesmcarthur has quit IRC | 05:05 | |
*** ccamacho has quit IRC | 05:06 | |
sshnaidm|afk | Is there a problem with the ovh BHS1 cloud? Our last 7 gate failures were all on it, and all of them fail to download containers | 05:06 |
ianw | sshnaidm|afk: not that i'm aware of. was that via the proxies on the mirror? note if the remote end is having issues the proxy doesn't help :) | 05:09 |
sshnaidm|afk | ianw, yeah, we use proxies.. | 05:10 |
*** ijw has joined #openstack-infra | 05:16 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784 | 05:21 |
*** ijw has quit IRC | 05:23 | |
*** rh-jelabarre has quit IRC | 05:24 | |
*** ykarel_ has joined #openstack-infra | 05:39 | |
ianw | clarkb: dropped a comment in your change https://review.opendev.org/#/c/669772/ on what i'm thinking for moving forward ... | 05:39 |
*** ykarel has quit IRC | 05:40 | |
*** ykarel_ has quit IRC | 05:42 | |
*** kjackal has quit IRC | 05:43 | |
*** jamesmcarthur has joined #openstack-infra | 05:44 | |
*** ykarel has joined #openstack-infra | 05:46 | |
*** udesale has quit IRC | 05:48 | |
*** ijw has joined #openstack-infra | 05:49 | |
*** jamesmcarthur has quit IRC | 05:51 | |
*** ccamacho has joined #openstack-infra | 05:53 | |
*** ijw has quit IRC | 05:54 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784 | 05:57 |
*** ijw has joined #openstack-infra | 05:57 | |
*** rcernin has quit IRC | 05:57 | |
*** kjackal has joined #openstack-infra | 06:00 | |
*** udesale has joined #openstack-infra | 06:06 | |
*** ijw has quit IRC | 06:06 | |
*** udesale has quit IRC | 06:09 | |
*** udesale has joined #openstack-infra | 06:10 | |
*** ykarel is now known as ykarel|meetig | 06:16 | |
*** ykarel|meetig is now known as ykarel|meeting | 06:16 | |
*** dpawlik has joined #openstack-infra | 06:19 | |
*** jamesmcarthur has joined #openstack-infra | 06:24 | |
*** jamesmcarthur has quit IRC | 06:28 | |
*** ccamacho has quit IRC | 06:29 | |
*** ccamacho has joined #openstack-infra | 06:29 | |
*** pgaxatte has joined #openstack-infra | 06:33 | |
*** ijw has joined #openstack-infra | 06:34 | |
*** ijw has quit IRC | 06:40 | |
*** udesale has quit IRC | 06:40 | |
*** ociuhandu has joined #openstack-infra | 06:46 | |
*** ociuhandu has quit IRC | 06:52 | |
*** bhavikdbavishi has joined #openstack-infra | 06:54 | |
*** rpittau|afk is now known as rpittau | 06:58 | |
*** jamesmcarthur has joined #openstack-infra | 06:59 | |
*** slaweq has joined #openstack-infra | 07:02 | |
*** pcaruana has joined #openstack-infra | 07:03 | |
*** jamesmcarthur has quit IRC | 07:05 | |
*** ginopc has joined #openstack-infra | 07:11 | |
*** piotrowskim has joined #openstack-infra | 07:12 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] element to send journal logs to console https://review.opendev.org/669784 | 07:15 |
*** ianw is now known as ianw_pto | 07:16 | |
*** udesale has joined #openstack-infra | 07:24 | |
*** tosky has joined #openstack-infra | 07:28 | |
*** ralonsoh has joined #openstack-infra | 07:30 | |
openstackgerrit | Merged openstack/project-config master: Retire tempest-tripleo-ui https://review.opendev.org/667949 | 07:30 |
openstackgerrit | Merged openstack/project-config master: Adding CI for ironic-prometheus-exporter https://review.opendev.org/665910 | 07:32 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Remove base role integration testing https://review.opendev.org/669727 | 07:34 |
*** lucasagomes has joined #openstack-infra | 07:40 | |
*** witek has joined #openstack-infra | 07:40 | |
*** iurygregory has joined #openstack-infra | 07:41 | |
*** priteau has joined #openstack-infra | 07:44 | |
*** bhavikdbavishi has quit IRC | 07:46 | |
*** ykarel_ has joined #openstack-infra | 07:50 | |
*** ykarel_ is now known as ykarel | 07:52 | |
*** ykarel|meeting has quit IRC | 07:52 | |
*** ykarel is now known as ykarel|lunch | 07:53 | |
*** sjfjqjfkd has quit IRC | 07:56 | |
*** dtantsur|afk is now known as dtantsur | 07:57 | |
*** ijw has joined #openstack-infra | 07:58 | |
*** rcernin has joined #openstack-infra | 08:02 | |
*** ijw has quit IRC | 08:03 | |
*** rcernin has quit IRC | 08:11 | |
*** ociuhandu has joined #openstack-infra | 08:17 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Improve error reporting for zuul dequeue https://review.opendev.org/669813 | 08:17 |
*** panda has quit IRC | 08:22 | |
*** panda has joined #openstack-infra | 08:24 | |
*** rcernin has joined #openstack-infra | 08:27 | |
*** pkopec has joined #openstack-infra | 08:30 | |
*** bhavikdbavishi has joined #openstack-infra | 08:38 | |
*** ijw has joined #openstack-infra | 08:38 | |
*** ykarel|lunch is now known as ykarel | 08:42 | |
*** tkajinam has quit IRC | 08:42 | |
*** ijw has quit IRC | 08:44 | |
*** priteau has quit IRC | 08:47 | |
*** priteau has joined #openstack-infra | 08:48 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 08:54 | |
*** Fidde has joined #openstack-infra | 08:55 | |
*** jamesmcarthur has joined #openstack-infra | 09:01 | |
*** jamesmcarthur has quit IRC | 09:05 | |
*** jamesmcarthur has joined #openstack-infra | 09:32 | |
*** jamesmcarthur has quit IRC | 09:37 | |
openstackgerrit | Merged openstack/project-config master: Mark networking-ovn-tempest-dsvm-ovs-release as voting https://review.opendev.org/668708 | 09:39 |
openstackgerrit | Merged opendev/irc-meetings master: Create a meeting for Networking OVN project https://review.opendev.org/668013 | 09:39 |
*** pkopec has quit IRC | 09:40 | |
*** pkopec has joined #openstack-infra | 09:45 | |
*** bhavikdbavishi has quit IRC | 09:47 | |
*** pcaruana has quit IRC | 09:48 | |
*** derekh has joined #openstack-infra | 10:00 | |
*** priteau has quit IRC | 10:03 | |
*** nicolasbock has joined #openstack-infra | 10:10 | |
*** ricolin has quit IRC | 10:14 | |
*** ricolin has joined #openstack-infra | 10:22 | |
*** ricolin has quit IRC | 10:23 | |
*** ijw has joined #openstack-infra | 10:30 | |
*** xek_ is now known as xek | 10:32 | |
*** udesale has quit IRC | 10:34 | |
*** dchen has quit IRC | 10:35 | |
*** kjackal has quit IRC | 10:48 | |
*** jamesdenton has joined #openstack-infra | 10:50 | |
*** yamamoto has joined #openstack-infra | 11:03 | |
*** dosaboy has quit IRC | 11:05 | |
*** nicolasbock has quit IRC | 11:06 | |
*** yamamoto has quit IRC | 11:08 | |
*** roman_g has joined #openstack-infra | 11:08 | |
*** dosaboy has joined #openstack-infra | 11:09 | |
*** ykarel is now known as ykarel|afk | 11:10 | |
*** tesseract has joined #openstack-infra | 11:11 | |
*** tesseract has quit IRC | 11:13 | |
*** tesseract has joined #openstack-infra | 11:15 | |
*** altlogbot_2 has quit IRC | 11:19 | |
*** irclogbot_0 has quit IRC | 11:19 | |
*** altlogbot_2 has joined #openstack-infra | 11:20 | |
*** zbr is now known as zbr|lunch | 11:22 | |
*** tesseract has quit IRC | 11:23 | |
*** apetrich has joined #openstack-infra | 11:25 | |
*** altlogbot_2 has quit IRC | 11:25 | |
*** kjackal has joined #openstack-infra | 11:27 | |
*** raissa has joined #openstack-infra | 11:29 | |
*** jcoufal has joined #openstack-infra | 11:29 | |
*** raissa has quit IRC | 11:30 | |
*** bhavikdbavishi has joined #openstack-infra | 11:30 | |
*** jamesmcarthur has joined #openstack-infra | 11:34 | |
*** ijw has quit IRC | 11:34 | |
*** jamesmcarthur has quit IRC | 11:39 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Spec: Add a Kubernetes Operator for Zuul https://review.opendev.org/659180 | 11:39 |
*** ricolin has joined #openstack-infra | 11:44 | |
*** jcoufal has quit IRC | 11:45 | |
*** jcoufal has joined #openstack-infra | 11:46 | |
*** odyssey4me_ has joined #openstack-infra | 11:49 | |
*** jistr_ has joined #openstack-infra | 11:51 | |
*** niceplace_ has joined #openstack-infra | 11:52 | |
*** ykarel|afk has quit IRC | 11:53 | |
*** kinrui has joined #openstack-infra | 11:53 | |
*** electrofelix has joined #openstack-infra | 11:54 | |
*** udesale has joined #openstack-infra | 11:55 | |
*** jistr has quit IRC | 11:55 | |
*** mnencia has quit IRC | 11:55 | |
*** npochet has quit IRC | 11:55 | |
*** markmcclain has quit IRC | 11:55 | |
*** cyberpear has quit IRC | 11:55 | |
*** aprice has quit IRC | 11:55 | |
*** sparkycollier has quit IRC | 11:55 | |
*** niceplace has quit IRC | 11:55 | |
*** hogepodge has quit IRC | 11:55 | |
*** mordred has quit IRC | 11:55 | |
*** guilhermesp has quit IRC | 11:55 | |
*** seyeongkim has quit IRC | 11:55 | |
*** dougwig has quit IRC | 11:55 | |
*** mnasiadka has quit IRC | 11:55 | |
*** jamespage has quit IRC | 11:55 | |
*** kmalloc has quit IRC | 11:55 | |
*** TheJulia has quit IRC | 11:55 | |
*** clayg has quit IRC | 11:55 | |
*** dustinc has quit IRC | 11:55 | |
*** jrosser has quit IRC | 11:55 | |
*** fungi has quit IRC | 11:55 | |
*** odyssey4me has quit IRC | 11:56 | |
*** melwitt has quit IRC | 11:56 | |
*** asettle-PTO has quit IRC | 11:56 | |
*** odyssey4me_ is now known as odyssey4me | 11:56 | |
*** andreykurilin has quit IRC | 11:58 | |
*** zbr|lunch is now known as zbr | 11:59 | |
*** andreykurilin has joined #openstack-infra | 11:59 | |
*** rcernin has quit IRC | 11:59 | |
*** irclogbot_3 has joined #openstack-infra | 12:00 | |
*** mnencia has joined #openstack-infra | 12:01 | |
*** npochet has joined #openstack-infra | 12:01 | |
*** cyberpear has joined #openstack-infra | 12:01 | |
*** aprice has joined #openstack-infra | 12:01 | |
*** sparkycollier has joined #openstack-infra | 12:01 | |
*** hogepodge has joined #openstack-infra | 12:01 | |
*** mordred has joined #openstack-infra | 12:01 | |
*** guilhermesp has joined #openstack-infra | 12:01 | |
*** seyeongkim has joined #openstack-infra | 12:01 | |
*** dougwig has joined #openstack-infra | 12:01 | |
*** mnasiadka has joined #openstack-infra | 12:01 | |
*** jamespage has joined #openstack-infra | 12:01 | |
*** kmalloc has joined #openstack-infra | 12:01 | |
*** TheJulia has joined #openstack-infra | 12:01 | |
*** clayg has joined #openstack-infra | 12:01 | |
*** dustinc has joined #openstack-infra | 12:01 | |
*** jrosser has joined #openstack-infra | 12:01 | |
*** weshay_PTO is now known as weshay | 12:01 | |
*** altlogbot_2 has joined #openstack-infra | 12:02 | |
*** altlogbot_2 has quit IRC | 12:05 | |
*** irclogbot_3 has quit IRC | 12:05 | |
*** jamesmcarthur has joined #openstack-infra | 12:07 | |
*** altlogbot_1 has joined #openstack-infra | 12:08 | |
*** ykarel|afk has joined #openstack-infra | 12:08 | |
*** pcaruana has joined #openstack-infra | 12:10 | |
*** altlogbot_1 has quit IRC | 12:11 | |
*** jamesmcarthur has quit IRC | 12:11 | |
*** rh-jelabarre has joined #openstack-infra | 12:14 | |
*** strigazi has quit IRC | 12:14 | |
*** strigazi has joined #openstack-infra | 12:15 | |
icey | hey - would it be possible to get zuul resources for a multi-node job where temporary OpenStack credentials are granted rather than a set of pre-built nodes? | 12:17 |
mnaser | infra-root: it looks like there is a bunch of extra images on sjc1... can we clean those up if possible? | 12:20 |
*** kinrui is now known as fungi | 12:21 | |
fungi | icey: you can pass those as secrets to post-review jobs if you have an openstack cloud you want a job to talk to, but cleanup (especially on jobs which get aborted prematurely) is probably the hardest problem to solve there | 12:23 |
fungi | mnaser: i'll take a look and see if nodepool is aware of them | 12:23 |
icey | fungi: yeah - I was wondering, more, if it was possible to get the nodepool resources that are allocated in a multi-node job given via the OS API rather than the predefined list in the ansible playbooks ;-) It seems like it's not really doable (for now?) but it would be cool | 12:24 |
*** rlandy has joined #openstack-infra | 12:24 | |
mordred | icey: yeah - I agree with fungi, cleanup is the hard part. we'd need to implement a nodepool driver to do the full thing you're talking about (We have one like it for kubernetes namespaces) - but we haven't implemented one yet because cleaning up resources in a domain or project is a lot of work | 12:25 |
mordred | (although this hardness is a thing we discussed at the last PTG and there is work underway to make such a thing better) | 12:26 |
fungi | icey: probably the only way to do it sanely is if it's a cloud you control, so that you can allocate one or more new projects per build and then (somehow) delete all resources associated with the project when the job is complete | 12:26 |
icey | mordred: I entirely understand - and even though I might be an upstanding citizen who works to clean up after myself nicely, that's not a safe assumption to make generally | 12:26 |
mordred | exactly | 12:26 |
fungi | doing that in a public cloud where you aren't the cloud admin or can't make nodepool a cloud admin would be really tough, i think | 12:27 |
mordred | in the meantime though, you could make a job such as fungi mentioned that used a predefined secret that had the ability to create projects, freeze that job, then pass the created credentials/project info to a child job using zuul_return, then have the parent job delete the project when it's unfrozen | 12:27 |
icey | regarding that: allocate a new project per build, that's exactly what I'm thinking :) | 12:27 |
icey | mordred: in the mean time, I'll just keep using my own Jenkins that talks to my cloud :-P | 12:28 |
mordred | we do a similar pattern in the docker image build jobs - have the first job create a docker registry, then freeze, then pass the registry info to the child jobs | 12:28 |
icey | well, maybe ;-) | 12:28 |
mordred | so, it's, you know - not super terrible to do in zuul in general - but doing it in the openstack zuul is a bit harder due to our lack of such credentials in our existing cloud providers | 12:29 |
icey | yea :-/ | 12:29 |
mordred | so if you wanted to do it in a zuul against a cloud(s) you controlled, I believe all of the building blocks are there and it wouldn't be terrible | 12:29 |
mordred | and would be a pretty cool thing to show | 12:30 |
fungi | mnaser: nodepool is only aware of 26 ready images in vexxhost-sjc1 (current and previous for each of the 13 we build there) plus one it's been trying to delete for several months (f670e6be-953b-4d4b-a931-6cbb5b568410) | 12:31 |
*** smarcet has joined #openstack-infra | 12:32 | |
mordred | fungi, icey: actually - we could probably write jobs like that in openstack zuul with an additional first step of a job that ran devstack or something to create a cloud, then pass those creds to the "create a project" job - might be a neat set of jobs to have in the library | 12:32 |
*** gtema has joined #openstack-infra | 12:32 | |
mordred | it would not be a FAST job - but we could show the idea functionally working, and then for people with different contexts, they could use them for real | 12:33 |
*** jistr_ is now known as jistr | 12:33 | |
icey | mordred: considering, right now, I have one of my jobs that just topped out at 6 hours... | 12:33 |
icey | mordred: (well, 6 independent jobs, but run sequentially) | 12:33 |
fungi | mnaser: openstackclient says we have 67 active images in there though, so presumably we've leaked ~40. i'll try to filter out the ones nodepool is actually using and delete the rest here shortly | 12:34 |
mnaser | fungi: thank you! | 12:34 |
fungi | thank you for letting us know! | 12:35 |
fungi | and apologies for the inconvenience | 12:35 |
*** jcoufal has quit IRC | 12:38 | |
*** smarcet has left #openstack-infra | 12:42 | |
*** jamesmcarthur has joined #openstack-infra | 12:45 | |
*** rfarr_ has joined #openstack-infra | 12:50 | |
*** rfarr has quit IRC | 12:51 | |
*** jcoufal has joined #openstack-infra | 12:52 | |
openstackgerrit | Natal Ngétal proposed openstack/diskimage-builder master: [Configuration] Switch to stestr. https://review.opendev.org/629414 | 12:52 |
Shrews | fungi: we should try to see if there is any info in nodepool logs for those leaked images. | 12:53 |
*** altlogbot_2 has joined #openstack-infra | 12:54 | |
*** bdodd has joined #openstack-infra | 12:54 | |
*** aaronsheffield has joined #openstack-infra | 12:54 | |
*** ekultails has joined #openstack-infra | 12:55 | |
*** altlogbot_2 has quit IRC | 12:57 | |
fungi | i'll try to take a look in a moment, sure | 12:59 |
Shrews | fungi: if you want to paste me the image ids i can check the logs. seems like a doable 1 handed task | 12:59 |
*** gtema has left #openstack-infra | 13:00 | |
fungi | my pastebinit has stopped working with paste.o.o so that'll take a bit too ;) | 13:00 |
*** jcoufal has quit IRC | 13:00 | |
* fungi is still trying to morning, and isn't doing a very good job of it | 13:00 | |
*** jcoufal has joined #openstack-infra | 13:00 | |
*** jcoufal_ has joined #openstack-infra | 13:02 | |
Shrews | fungi: no rush. everything i do now takes 10x as long | 13:05 |
*** zul has joined #openstack-infra | 13:05 | |
*** jcoufal has quit IRC | 13:06 | |
*** sthussey has joined #openstack-infra | 13:06 | |
*** jcoufal has joined #openstack-infra | 13:07 | |
*** jcoufal has quit IRC | 13:07 | |
*** jcoufal has joined #openstack-infra | 13:08 | |
*** rlandy_ has joined #openstack-infra | 13:09 | |
*** rlandy_ has quit IRC | 13:09 | |
*** jcoufal_ has quit IRC | 13:10 | |
fungi | i should hook you up with a one-handed chording keyboard | 13:12 |
*** mriedem has joined #openstack-infra | 13:13 | |
*** priteau has joined #openstack-infra | 13:14 | |
fungi | Shrews: http://paste.openstack.org/show/754218/ | 13:18 |
fungi | i'm first going to see if i get any interesting errors trying to delete f670e6be-953b-4d4b-a931-6cbb5b568410 since nodepool can't seem to | 13:19 |
*** goldyfruit has joined #openstack-infra | 13:19 | |
Shrews | fungi: happen to know which nb* builds vexxhost offhand? | 13:21 |
*** pkopec has quit IRC | 13:21 | |
*** pkopec has joined #openstack-infra | 13:22 | |
fungi | Shrews: looks like it's nb01 | 13:22 |
*** guimaluf has joined #openstack-infra | 13:22 | |
fungi | oh, or maybe not. i think our macro expansion includes all the providers | 13:23 |
*** gtema has joined #openstack-infra | 13:24 | |
fungi | both nb01 and nb02 have provider entries for it | 13:25 |
fungi | d'oh | 13:26 |
fungi | i guess they both do it (whoever gets to it first) | 13:26 |
fungi | not like launchers where they get assigned to specific providers | 13:26 |
fungi | Shrews: you and your trick questions | 13:26 |
fungi | anyway, the reason f670e6be-953b-4d4b-a931-6cbb5b568410 isn't getting deleted is that glance says that image is in use | 13:27 |
fungi | maybe we have a held node which was booted from it... checking now | 13:28 |
Shrews | oh, i didn't realize we split it up like that | 13:28 |
Shrews | i'm having problems finding any logs for the leaked ones | 13:29 |
fungi | they may be older than our log retention? | 13:30 |
openstackgerrit | Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Enables molecute template to use depends-on tripleo-repos https://review.opendev.org/669871 | 13:31 |
fungi | so, the only node in sjc1 older than a day is 62e01c16-019d-46b8-b92b-b3e409161e97 which is a fedora-29-vexxhost instance in ready state. doesn't correspond to our stuck ubuntu-xenial image deletion | 13:31 |
fungi | mnaser: any way you can determine what has f670e6be-953b-4d4b-a931-6cbb5b568410 in use for over two months? | 13:31 |
fungi | (that's an image uuid in sjc1, for context) | 13:32 |
openstackgerrit | Nate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation https://review.opendev.org/669775 | 13:34 |
*** michael-beaver has joined #openstack-infra | 13:34 | |
*** pkopec has quit IRC | 13:36 | |
*** pkopec has joined #openstack-infra | 13:38 | |
AJaeger | config-core, are you fine with adding tripleo-repos as required-repos to the molecule template? See https://review.opendev.org ... | 13:41 |
*** slaweq has quit IRC | 13:43 | |
Shrews | fungi: i think we need the image metadata from those leaks to map to a build id that we can grep the logs for | 13:48 |
*** jamesmcarthur has quit IRC | 13:49 | |
*** slaweq has joined #openstack-infra | 13:49 | |
fungi | ahh, yeah i guess they're no longer embedded in the image names | 13:49 |
Shrews | the number on the image name is a timestamp :( | 13:49 |
fungi | looks like `openstack image show` has it in the properties list... i should be able to parse it out of there | 13:50 |
*** ijw has joined #openstack-infra | 14:00 | |
fungi | i've got a loop going to assemble that list now | 14:03 |
*** ijw has quit IRC | 14:05 | |
fungi | Shrews: i've appended it to the bottom of http://paste.openstack.org/show/754221/ | 14:13 |
Shrews | fungi: thx | 14:15 |
*** jamesmcarthur has joined #openstack-infra | 14:17 | |
*** whoami-rajat has quit IRC | 14:18 | |
AJaeger | mwhahaha: tempest-tripleo-ui still needs removal from project-config, could you take care of that, please? | 14:25 |
mwhahaha | AJaeger: yea gimme a few | 14:25 |
AJaeger | sure, mwhahaha . Thanks! | 14:26 |
*** dpawlik has quit IRC | 14:34 | |
*** jamesmcarthur has quit IRC | 14:34 | |
*** ijw has joined #openstack-infra | 14:34 | |
*** jamesmcarthur has joined #openstack-infra | 14:35 | |
*** njohnston has joined #openstack-infra | 14:37 | |
clarkb | Shrews: fungi builders build an image then upload it to all clouds. So it comes down to which builder grabs the job first (the arm64 builder is special because there is only one of them but if we had two of those it would work that way too) | 14:38 |
clarkb | and ya launchers are split up roughly by total max-servers count since it's a thread per launch and we try to distribute the thread cost | 14:38 |
clarkb | Shrews: fungi as for image leaks a lot of them seem to happen due to failed uploads | 14:40 |
Shrews | failed uploads that somehow succeed? neat | 14:41 |
fungi | ahh, so a builder tries to upload an image, "fails" for $reasons, then doesn't clean it up because it thinks it shouldn't exist? | 14:42 |
clarkb | Shrews: I haven't looked at the current crop of leaked images but no I don't think they succeeded. They remain stuck in a "saving" state | 14:42 |
clarkb | fungi: yes exactly | 14:42 |
*** pgaxatte has quit IRC | 14:42 | |
clarkb | basically every time I've looked at these leaked images the bulk of them are still in a saving state | 14:42 |
clarkb | so I think sdk is failing somewhere and not cleaning up after itself | 14:43 |
Shrews | fungi's paste shows them as active | 14:43 |
openstackgerrit | Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui https://review.opendev.org/669883 | 14:44 |
mwhahaha | AJaeger: -^ | 14:44 |
fungi | i wonder if the builders could do something like the launchers do to identify leaked resources... add unique metadata identifying the nodepool deployment (for cases of multiple nodepools in the same project) and then delete any images which have the deployment's id but are not known in the image-list? | 14:44 |
clarkb | corvus: see comment on https://review.opendev.org/#/c/669780/3 | 14:45 |
fungi | i think we can't ever entirely trust cloud apis to be correct about when something has failed, and should probably just assume that sometimes they provide misleading answers | 14:46 |
Shrews | fungi: i think they could (and should if it cant be corrected in sdk) | 14:46 |
fungi | i have a feeling the sdk won't be a complete solution because it's not persistent | 14:47 |
*** ijw has quit IRC | 14:47 | |
fungi | for example, if the upload fails but the image which shouldn't exist doesn't show up in the list immediately due to some sort of update lag, or refuses to be deleted because there's background conversion underway, or whatever | 14:47 |
fungi | the nodepool builders on the other hand maintain state and can know in the future whether there's suddenly something there which shouldn't be | 14:49 |
*** armax has joined #openstack-infra | 14:49 | |
Shrews | fungi: the problem will be in getting an image ID to delete. if it failed, nodepool won't have it | 14:50 |
Shrews | so.... not sure what the solution is here | 14:50 |
fungi | ahh, you mean short of iterating through the image list in the provider | 14:50 |
fungi | i agree that's probably expensive (depending on the number of images involved) | 14:51 |
clarkb | ya the sdk/shade has a porcelain method that is basically "do all the steps to upload an image of which there are many" | 14:51 |
clarkb | nodepool only gets the image info if that succeeds | 14:51 |
clarkb | (I think) | 14:51 |
*** Fidde has quit IRC | 14:51 | |
clarkb | so any errors in that series of steps must be handled at the sdk layer or we need to stop using porcelain methods in nodepool and basically rewrite shade in nodepool | 14:51 |
*** ykarel|afk is now known as ykarel | 14:52 | |
corvus | clarkb: i don't understand what 669780 has to do with the journal-to-console element | 14:52 |
fungi | nodepool can still ask the api for a list of image uuids and then ask the api for the properties for each of those and look in the metadata from them to compare to what's in zk, right? | 14:52 |
fungi | s/api/sdk/ | 14:52 |
fungi | i mean, it's inefficient for sure, and i don't know how much of that can be effectively cached by the sdk layer | 14:53 |
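The leak check fungi describes (list the provider's images, read each image's metadata, compare against what is recorded in zk) could be sketched roughly as below. This is an illustration only, not nodepool or sdk code: the `nodepool_deployment_id` property name, the dict layout of the image records, and the `find_leaked_images` helper are all hypothetical.

```python
def find_leaked_images(provider_images, known_upload_ids, deployment_id):
    """Return ids of images tagged with our deployment id but absent from our records.

    provider_images: list of dicts as a cloud image listing might return them
                     (hypothetical shape: {"id": ..., "properties": {...}}).
    known_upload_ids: set of image ids the builder still tracks (e.g. from zk).
    deployment_id: unique marker this deployment stamps on images it uploads.
    """
    leaked = []
    for image in provider_images:
        props = image.get("properties", {})
        # Skip images owned by someone else, or by another nodepool deployment.
        if props.get("nodepool_deployment_id") != deployment_id:
            continue
        # Ours, but we have no record of it: treat it as leaked.
        if image["id"] not in known_upload_ids:
            leaked.append(image["id"])
    return leaked


provider_images = [
    {"id": "img-1", "properties": {"nodepool_deployment_id": "np-a"}},
    {"id": "img-2", "properties": {"nodepool_deployment_id": "np-a"}},
    {"id": "img-3", "properties": {"nodepool_deployment_id": "np-b"}},
    {"id": "img-4", "properties": {}},
]
known_upload_ids = {"img-1"}
leaked = find_leaked_images(provider_images, known_upload_ids, "np-a")
```

Only images carrying this deployment's marker are candidates, so a second nodepool sharing the same project would not have its images flagged.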
clarkb | corvus: 669780 logs debug logs which include instance console logs when instances fail to boot. the journal-to-console element includes the journal contents in the instance console log | 14:53 |
clarkb | corvus: I linked an example log for you on 669780 | 14:54 |
fungi | i guess at the moment there's not a deleter thread for builders like there is for launchers? | 14:54 |
clarkb | corvus: this is all about improving the debuggability of those jobs. I don't see these changes as fundamentally a problem other than we may wish to use setup simplifiers like the -d option | 14:54 |
clarkb | fungi: correct | 14:54 |
Shrews | fungi: there is. it cleans up deleted, failed, and unlocked uploading zk records. for deleted, it also deletes any associated upload | 14:55 |
fungi | oh, convenient | 14:56 |
Shrews | fungi: we'd have to change logic to not delete failed zk records w/o first waiting a period to check for the weirdness we see here | 14:56 |
clarkb | Shrews: fungi but it only uses the zk records there not the api records iirc | 14:56 |
Shrews | clarkb: correct | 14:56 |
clarkb | whereas with the launchers we actually check the api for leaks of things like floating ips and instances | 14:57 |
clarkb | which is different | 14:57 |
fungi | i think fundamentally the problem is likely the same... we tell clouds to create resources, sometimes those clouds lie about whether a resource was created, so we can't trust them to always clean up those resources they lied to us and said failed to be created | 14:57 |
Shrews | right, but it can be added | 14:57 |
clarkb | fungi: well in this case I don't think the cloud is lying | 14:58 |
fungi | the sdk can maybe double-check behind the upload failure and then issue an immediate delete for things which shouldn't be there | 14:58 |
clarkb | fungi: in this case I think sdk is not taking the appropriate steps on failure to undo the many steps of image uploading | 14:59 |
clarkb | (in some cases) | 14:59 |
clarkb | fungi: image upload is like 4 or 5 steps | 14:59 |
clarkb | and the sdk must undo each successful step when the current step fails | 14:59 |
Shrews | mordred: you may find convo above entertaining | 14:59 |
mordred | well - it must undo _some_ of the successful steps | 14:59 |
fungi | and what creates the image in glance isn't the last of those steps? | 14:59 |
mordred | fungi: nope | 15:00 |
clarkb | fungi: nope | 15:00 |
mordred | fungi: that would be too convenient | 15:00 |
clarkb | its the opposite actually | 15:00 |
clarkb | the first step creates the image | 15:00 |
clarkb | with no image content | 15:00 |
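The unwinding clarkb and mordred describe, where the image record is created first and later steps can fail, could be sketched as below. It is a generic pattern under stated assumptions, not the sdk's actual upload code: `upload_with_rollback` and the step functions are hypothetical, and steps that need no cleanup pass `None` as their undo.

```python
def upload_with_rollback(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse.

    undo may be None for steps that leave nothing behind to clean up.
    """
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            if undo is not None:
                undo_stack.append(undo)
    except Exception:
        # Unwind only the steps that completed, newest first, then re-raise.
        for undo in reversed(undo_stack):
            undo()
        raise


log = []


def create_record():
    log.append("create image record")


def delete_record():
    log.append("delete image record")


def upload_content():
    raise RuntimeError("upload failed")


steps = [(create_record, delete_record), (upload_content, None)]
try:
    upload_with_rollback(steps)
except RuntimeError:
    log.append("upload reported as failed")
```

Because the record-creating step comes first, a failure in any later step still triggers its undo, which is the case that otherwise leaves a contentless image behind.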
corvus | clarkb: do you, in your heart of hearts, really believe that these patches are needed for *nodepool*? or do you think they are needed to debug operational issues with clouds and dib elements? | 15:01 |
mordred | yah - well, that's the PUT method. in the rackspace task-import method, it's a bit more like what fungi expects | 15:01 |
fungi | oh! so there can be contentless image records in glance? that's fun | 15:01 |
mordred | fungi: yah | 15:01 |
fungi | i guess the ones where you specify a durable image location must work that way | 15:02 |
clarkb | corvus: I guess I see those as sort of the same problem? the only way to know if it is a cloud or dib etc problem and not a dib problem (dib didn't tell cloud to use networking properly or something) is to record the data so that we can inspect it | 15:02 |
mordred | Shrews, clarkb, fungi: maybe we should make more use of properties on the image records like we do for servers | 15:02 |
clarkb | corvus: you can't rule any of them in or out with the black box we currently have | 15:02 |
corvus | clarkb: right. but either way, it's not a nodepool problem. the way i see it, is, yes, these would be handy if we ever broke the ability to upload our basic centos image to devstack. but the testing on dib is supposed to prevent that from happening. | 15:02 |
mordred | so that we can do a better job 2pcing between image records and zk records - then do what Shrews was saying and wait to remove the zk node until we're more sure all the bits have been cleaned up | 15:03 |
corvus | clarkb: so *dib* needs this, and *opendev* needs this, but *nodepool* doesn't. so we're throwing the changes at the wrong repo. | 15:03 |
*** ijw has joined #openstack-infra | 15:03 | |
fungi | mordred: right, that's what i was wondering as well, like whether we could stash a deployment identifier in all of them so nodepool can recognize image records it created even if it lost track of them or thought they shouldn't exist | 15:03 |
corvus | clarkb: i'm trying really hard to stop alienating nodepool developers by asking them to review a bunch of changes that opendev needs to support its images in new clouds. | 15:03 |
clarkb | corvus: is that actually alienating or just something that they will ignore? | 15:04 |
mordred | fungi: yah- I haven't thought through it fully yet - there may be some issues with doing that we'd need to think about | 15:04 |
clarkb | personally I see this as useful debugging for the jobs | 15:04 |
clarkb | regardless of the repo in question | 15:04 |
mordred | and we might need to add some helper methods in sdk that nodepool can call for this sort of thing | 15:04 |
clarkb | because if the job fails (for whatever unknown reason) that information will help us understand why | 15:04 |
mordred | because the mechanism of cleaning up will vary by cloud upload mechanism, so it shouldn't be nodepool trying to do this directly | 15:04 |
corvus | clarkb: i do not think we should have to ask them to ignore changes like this. that seems rude to me. | 15:04 |
clarkb | corvus: ok so you are ok with nodepool logging at a debug level (that at least seems ok for nodepool tests). Then set additional elements in jobs for other repos (like dib/sdk) as necessary to debug the functionality of those tools specifically on failure? | 15:06 |
clarkb | Mostly for me this is about having knowledge about why something failed and not about assigning blame to one group or another | 15:06 |
*** ijw has quit IRC | 15:08 | |
corvus | clarkb: i'm not trying to assign blame either. i'm trying to (and the weeks of work i did to move all the dib jobs *out* of the nodepool repo is in service of this) to move these jobs to the right place. we've been just assuming that everyone working on nodepool is enthralled by the idea of getting $distro images working on $cloud. it's just not the case. it's important, but it's important to | 15:08 |
corvus | a different set of people, so let's get everyone the tools they need. | 15:08 |
corvus | clarkb: honestly, i'm not thrilled by the idea of debug logging by default. i like the idea of running them in a more production-like setting. so maybe that should be an option too. | 15:09 |
clarkb | corvus: yup I agree. I see this as not conflicting with that though. The two things that are done here seem generically useful to debugging nodepool's job failures. Those two things are 1) log at debug level 2) include logs from test instance to understand why it might fail | 15:09 |
corvus | clarkb: so you think the journald-to-console element is something we should always have in nodepool jobs, in case we break something with a change in nodepool itself? | 15:11 |
clarkb | corvus: yes, as it allows you to trace network connectivity and the ssh connection testing in particular which we can break on the nodepool side | 15:11 |
*** altlogbot_0 has joined #openstack-infra | 15:12 | |
corvus | clarkb: okay, i think i'm convinced on that one. maybe for the debug change we should make that a job variable, so we can set it or not in nodepool as needed, and the other job users can do the same? | 15:15 |
*** bobh has joined #openstack-infra | 15:15 | |
clarkb | corvus: ya that works. Then if we end up with a consistent failure we can toggle that flag and get all the data on the next run | 15:15 |
*** altlogbot_0 has quit IRC | 15:17 | |
corvus | clarkb: ok, review comments adjusted accordingly, thanks :) | 15:18 |
fungi | so on the image leaks topic, i agree it's probably good to make sure the sdk is really doing what it should to unwind image record creation after the api reports an upload failure before we go deciding we need more logic in nodepool to hunt for leaked images which sneak past the sdk's cleanup, though ultimately the latter is probably still useful | 15:20 |
mordred | fungi: yes - I agree | 15:22 |
clarkb | fungi: I've also tried for a while now to convince the glance team that the api is too complicated for the use case, but I think I mostly failed there when they rewrote the api and added steps instead of removing them | 15:25 |
* fungi needs to head to an appointment, but will be back shortly | 15:26 | |
clarkb | seems like the plan to store more than just disk images at one time is part of the complication there? | 15:26 |
*** bobh has quit IRC | 15:27 | |
*** chandankumar is now known as raukadah | 15:32 | |
*** hamzy has quit IRC | 15:33 | |
openstackgerrit | Donny Davis proposed openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing https://review.opendev.org/669893 | 15:36 |
clarkb | config-core infra-root ^ care to be second reviewer on that? I think we'll turn it off while we work through improving our testing then fixing the images | 15:38 |
Shrews | clarkb: corvus: where will the journald output be found using that role? directly in the nodepool logs? | 15:38 |
clarkb | I think donnyd wants to do some hardware changes too | 15:38 |
clarkb | Shrews: yes, see http://logs.openstack.org/73/669773/2/check/dib-nodepool-functional-openstack-centos-7-src/00f6345/nodepool/nodepool-launcher.log as an example | 15:39 |
corvus | Shrews: yes, iff we log at debug level (thus the parent change) | 15:39 |
donnyd | yes. the mirror will be offline for about 10-15 minutes | 15:39 |
Shrews | ah ok. i just didnt scroll down far enough in that log file to see it | 15:41 |
Shrews | kinda ugly output | 15:41 |
*** altlogbot_0 has joined #openstack-infra | 15:42 | |
Shrews | cant it be sent to syslog? | 15:42 |
corvus | Shrews: we don't use syslog at all in nodepool... | 15:42 |
clarkb | and the issue if networking is broken is there is no good way to get syslog | 15:42 |
*** ijw has joined #openstack-infra | 15:43 | |
clarkb | the console log is retrieved through the magic of qemu so is the most accessible logging in that case | 15:43 |
Shrews | corvus: yet we pull the file: http://logs.openstack.org/87/669787/2/check/nodepool-functional-openstack/44716e5/ | 15:43 |
corvus | Shrews: (but, technically, yes, you could write a logging config to send it to syslog)... | 15:43 |
clarkb | Shrews: thats the syslog of the hypervisor not the guest | 15:43 |
Shrews | ah | 15:43 |
*** derekh has quit IRC | 15:44 | |
Shrews | sorry, still trying to catch up on these test changes | 15:44 |
*** altlogbot_0 has quit IRC | 15:46 | |
*** ijw has quit IRC | 15:48 | |
*** diablo_rojo has joined #openstack-infra | 15:48 | |
*** priteau has quit IRC | 15:52 | |
*** igordc has joined #openstack-infra | 15:56 | |
openstackgerrit | Merged openstack/project-config master: Scaling FN cloud back down to zero for ipv6 testing https://review.opendev.org/669893 | 15:59 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 15:59 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Add shared nodepool_opts variable to openstack func test https://review.opendev.org/669899 | 16:00 |
donnyd | just a data point - FN will still be up, the only outage will be for the mirror | 16:01 |
donnyd | clarkb: | 16:01 |
donnyd | clarkb: is there a way to make the mirror ha? | 16:01 |
corvus | donnyd: yes, but it's usually not worth the effort since we have other clouds | 16:02 |
*** hamzy has joined #openstack-infra | 16:02 | |
clarkb | donnyd: we could run two mirrors and an haproxy but I think we've largely decided our redundancy is by having more than one cloud region and then we don't have to overcomplicate each cloud region's setup | 16:02 |
donnyd | yea that makes sense to me | 16:03 |
openstackgerrit | Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773 | 16:03 |
donnyd | just curious | 16:03 |
clarkb | corvus: ^ ok I think those three changes do what you suggest. Have a moment to quickly check if that is better? (note I didn't update ianw's change for the debug thing as the approaches were quite a bit different; instead pushed a different change) | 16:04 |
corvus | clarkb: yeah, left a comment on '99 | 16:04 |
*** icarusfactor has joined #openstack-infra | 16:04 | |
*** bobh has joined #openstack-infra | 16:04 | |
clarkb | thanks | 16:04 |
corvus | clarkb: also, i was originally thinking of an explicit job var just for debug, but i'm okay with this if that's what you prefer. | 16:05 |
clarkb | corvus: is there a jinja filter for if var is true then this value instead? | 16:05 |
clarkb | Mostly this was simpler to write and maybe more flexible so did that | 16:05 |
* clarkb finds ansible jinja filter docs | 16:06 | |
*** factor has quit IRC | 16:06 | |
*** factor__ has joined #openstack-infra | 16:06 | |
corvus | clarkb: maybe the usual python "and or" thing would work? | 16:06 |
*** bobh has quit IRC | 16:06 | |
corvus | clarkb: apparently: http://jinja.pocoo.org/docs/2.10/templates/#if-expression | 16:07 |
clarkb | {{ nodepool_debug and '-d' or '' }} ? | 16:07 |
corvus | that's what i was thinking, but docs say: {{ '-d' if nodepool_debug else '' }} | 16:07 |
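The two spellings being compared behave the same for this flag, which can be checked in plain Python, whose conditional expression the Jinja inline `if` mirrors. This is only a sketch: the function names are made up for illustration, and it does not render a Jinja template.

```python
def debug_flag_and_or(nodepool_debug):
    # The older "and/or" idiom: works only while the middle value is truthy.
    return nodepool_debug and "-d" or ""


def debug_flag_conditional(nodepool_debug):
    # The conditional expression, same shape as the form in the Jinja docs.
    return "-d" if nodepool_debug else ""


results = [
    (debug_flag_and_or(flag), debug_flag_conditional(flag))
    for flag in (True, False)
]

# Where the idioms diverge: a falsy "true" value falls through with and/or.
and_or_pitfall = True and "" or "-d"
conditional_ok = "" if True else "-d"
```

Both forms agree here because `'-d'` is a non-empty string. The conditional form is the safer habit since it does not depend on that.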
* clarkb updates with docs this time | 16:08 | |
*** altlogbot_1 has joined #openstack-infra | 16:08 | |
*** icarusfactor has quit IRC | 16:08 | |
*** eernst has joined #openstack-infra | 16:09 | |
donnyd | Is there a way for me to check what build was running on a particular node. I would like to do everything possible to reduce build times. | 16:10 |
*** witek has quit IRC | 16:12 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669899 | 16:13 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 16:13 |
*** altlogbot_1 has quit IRC | 16:13 | |
corvus | donnyd: you can probably query logstash for what you need for that... you can get build results and runtimes and node name, and you can query by region. it won't have the instance uuid though. | 16:13 |
corvus | donnyd: (it will have the ip address and time, so you can xref that way) | 16:13 |
corvus | donnyd: http://logstash.openstack.org/#/dashboard/file/logstash.json | 16:14 |
openstackgerrit | Clark Boylan proposed opendev/glean master: network-manager: add network-pre dependencies https://review.opendev.org/669773 | 16:14 |
clarkb | corvus: ok I think that addresses the comments. 669773 should confirm it works as expected | 16:14 |
*** ykarel is now known as ykarel|away | 16:14 | |
*** rpioso is now known as rpioso|afk | 16:15 | |
*** mattw4 has joined #openstack-infra | 16:16 | |
clarkb | corvus: donnyd ya you can filter by node_provider:"fortnebula-regionone" AND build_name:"some-job-name" AND message:"stuff" and so on | 16:17 |
clarkb | that should allow you to narrow down jobs then dig in via their log links (which can be found by clicking on the event results logstash returns; it's part of the event details) | 16:17 |
*** lucasagomes has quit IRC | 16:17 | |
*** tdasilva has joined #openstack-infra | 16:19 | |
*** altlogbot_3 has joined #openstack-infra | 16:20 | |
clarkb | comparing the boot up logs we captured to those of a host in fortnebula the fn node networkmanager logs show it doing dhcp | 16:20 |
*** whoami-rajat has joined #openstack-infra | 16:20 | |
clarkb | the test node doesn't seem to dhcp. This makes me wonder if there is a bug in ordering where we don't enable dhcp on the interface? That's a stretch; need more data | 16:20 |
clarkb | I think what I'll do next is build an image locally then start testing with that | 16:21 |
*** eernst has quit IRC | 16:21 | |
clarkb | that will take some time for me to bootstrap though as I've let my dib VM rot | 16:21 |
clarkb | actually I could just edit the centos 7 image that dib has already built via nodepool. Except that is a large image | 16:21 |
clarkb | maybe I'll start with that since networking tends to be cheap ish | 16:22 |
*** altlogbot_3 has quit IRC | 16:23 | |
openstackgerrit | Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432 | 16:24 |
*** irclogbot_1 has joined #openstack-infra | 16:24 | |
*** irclogbot_1 has quit IRC | 16:27 | |
clarkb | alright I'm going to pop out for a bike ride while I wait for test results and consider quicker rtt methods for testing the centos problem | 16:27 |
*** factor__ has quit IRC | 16:29 | |
*** factor__ has joined #openstack-infra | 16:30 | |
logan- | clarkb: regarding your jinja question.. ternary filter | 16:32 |
sshnaidm|ruck | How to configure this - I'd like to run the same periodic job on different branches, apparently this doesn't work: https://github.com/openstack/tripleo-ci/blob/master/zuul.d/periodic.yaml#L4 | 16:32 |
sshnaidm|ruck | it runs on master only | 16:32 |
*** dtantsur is now known as dtantsur|afk | 16:34 | |
*** goldyfruit has quit IRC | 16:35 | |
*** rpittau is now known as rpittau|afk | 16:37 | |
*** bobh has joined #openstack-infra | 16:38 | |
*** bobh has quit IRC | 16:38 | |
*** ccamacho has quit IRC | 16:38 | |
*** eharney has joined #openstack-infra | 16:41 | |
*** factor__ has quit IRC | 16:44 | |
*** mgoddard has quit IRC | 16:45 | |
AJaeger | sshnaidm|ruck: tripleo-ci has only a master branch, so it will run only on that one... | 16:47 |
sshnaidm|ruck | AJaeger, I see, so if I configure the same on branched repo it'll work? | 16:48 |
*** mgoddard has joined #openstack-infra | 16:48 | |
AJaeger | sshnaidm|ruck: if you add it on project-config, it will work | 16:49 |
*** dpawlik has joined #openstack-infra | 16:49 | |
AJaeger | sshnaidm|ruck: if you add it to another repo, it will only run on the branch you add it on... | 16:49 |
AJaeger | so, you would need to add it to *all* branches | 16:49 |
sshnaidm|ruck | AJaeger, ok, thanks | 16:50 |
AJaeger | sshnaidm|ruck: unless there's a magic zuul variable to apply this on all branches, best ask corvus | 16:50 |
AJaeger | or check docs | 16:50 |
sshnaidm|ruck | corvus, do we have such? ^^ | 16:50 |
*** gtema has quit IRC | 16:52 | |
*** gtema has joined #openstack-infra | 16:52 | |
*** ijw has joined #openstack-infra | 16:52 | |
*** sshnaidm|ruck is now known as sshnaidm|afk | 16:57 | |
AJaeger | jamespage: could you check https://review.opendev.org/615273, please? Is there progress happening? | 16:57 |
AJaeger | sshnaidm|afk, check: https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers | 16:59 |
donnyd | clarkb: is your test node using just SLAAC or dhcpv6-stateless? Wonder if it even makes a difference | 16:59 |
*** udesale has quit IRC | 17:00 | |
*** armax has quit IRC | 17:00 | |
*** altlogbot_3 has joined #openstack-infra | 17:00 | |
*** ykarel|away has quit IRC | 17:00 | |
*** gtema has quit IRC | 17:03 | |
*** gtema has joined #openstack-infra | 17:04 | |
*** kjackal has quit IRC | 17:04 | |
sshnaidm|afk | AJaeger, not sure I understand this well and how I can use it.. | 17:05 |
sshnaidm|afk | will ask in #zuul | 17:05 |
*** altlogbot_3 has quit IRC | 17:05 | |
fungi | okay, back and caught up on scrollback here | 17:05 |
fungi | clarkb: donnyd: we also have hostids indexed now in logstash, if that helps | 17:06 |
*** irclogbot_2 has joined #openstack-infra | 17:10 | |
*** Lucas_Gray has joined #openstack-infra | 17:11 | |
*** irclogbot_2 has quit IRC | 17:13 | |
openstackgerrit | Alex Schultz proposed openstack/project-config master: Drop tempest-tripleo-ui https://review.opendev.org/669883 | 17:14 |
*** dpawlik has quit IRC | 17:20 | |
*** Lucas_Gray has quit IRC | 17:24 | |
*** smarcet has joined #openstack-infra | 17:26 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Retire docs-specs https://review.opendev.org/668854 | 17:29 |
*** igordc has quit IRC | 17:29 | |
AJaeger | config-core, two project retirements for review, please: https://review.opendev.org/#/c/668854 and https://review.opendev.org/#/c/669883/ | 17:30 |
*** ralonsoh has quit IRC | 17:36 | |
donnyd | thanks for all of the pointers, I should be able to get what I am looking for to get the tuning optimal for this workload | 17:38 |
*** gyee has joined #openstack-infra | 17:39 | |
*** tesseract has joined #openstack-infra | 17:40 | |
clarkb | donnyd: I think NM may try both | 17:43 |
clarkb | donnyd: its not actually completely clear to me what choices it is making :/ | 17:43 |
clarkb | donnyd: and the behavior seems to change if the old sysconfig stuff is being used | 17:43 |
*** hamzy has quit IRC | 17:47 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669899 | 17:49 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 17:49 |
clarkb | I think I've decided the easiest way to debug this is to hold the centos functional jobs for https://review.opendev.org/#/c/669773/4 so I'm doing that now | 17:51 |
*** hamzy has joined #openstack-infra | 17:52 | |
*** rpioso|afk is now known as rpioso | 17:58 | |
*** jcoufal_ has joined #openstack-infra | 18:00 | |
fungi | Shrews: is there any other detail we should collect before i delete the leaked nodes from sjc1? | 18:02 |
*** jcoufal has quit IRC | 18:02 | |
Shrews | fungi: i dont believe so | 18:02 |
*** smarcet has quit IRC | 18:02 | |
fungi | okay, i'll put together the list and start cleaning up | 18:02 |
zbr | clarkb: fungi AJaeger corvus: please have a look at https://review.opendev.org/#/c/669871/ and let me know which path should I take (check my last comment). | 18:02 |
clarkb | I thought depends on did end up as implicit required projects? | 18:04 |
*** bhavikdbavishi has quit IRC | 18:04 | |
clarkb | zbr: I think what I would do is have an openstack-tox-molecule job that sets the success and failure urls, then inherit from that in tripleo-tox-molecule job and set required project there. Then have two different templates that should never need to change | 18:06 |
*** electrofelix has quit IRC | 18:08 | |
zbr | ok, i guess you wanted to say an tripleo-tox-molecule template, but is ok. i will do it that way. | 18:10 |
clarkb | no jobs | 18:10 |
clarkb | first new job sets success and failure urls, second new job sets the tripleo required projects list | 18:11 |
clarkb | then you can either put those in templates or not | 18:11 |
zbr | ok, i think i got it | 18:12 |
clarkb | that way if you make fixes to the other jobs the tripleo child job gets all of those fixes | 18:12 |
clarkb | and you don't have to double account those changes | 18:12 |
*** eharney has quit IRC | 18:17 | |
*** armax has joined #openstack-infra | 18:20 | |
*** jcoufal_ has quit IRC | 18:20 | |
*** jamesmcarthur has quit IRC | 18:21 | |
zbr | clarkb: can templates have the same names as jobs? | 18:21 |
clarkb | zbr: I believe they can as they are applied in different contexts | 18:22 |
zbr | ok. that is good as it would be useful to have both. in most cases people would refer to the template. | 18:23 |
*** eharney has joined #openstack-infra | 18:30 | |
*** whoami-rajat has quit IRC | 18:30 | |
openstackgerrit | Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871 | 18:31 |
openstackgerrit | Sorin Sbarnea proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871 | 18:33 |
*** jamesmcarthur has joined #openstack-infra | 18:34 | |
*** melwitt has joined #openstack-infra | 18:34 | |
*** jcoufal has joined #openstack-infra | 18:34 | |
*** bobh has joined #openstack-infra | 18:35 | |
*** irclogbot_0 has joined #openstack-infra | 18:36 | |
*** bobh has quit IRC | 18:36 | |
clarkb | corvus: apparently if you set -d with nodepool it does not daemonize | 18:38 |
*** tesseract has quit IRC | 18:39 | |
*** irclogbot_0 has quit IRC | 18:39 | |
clarkb | (this causes the ansible task to start nodepool-builder to never end and we never start the launcher) | 18:39 |
clarkb | that would be why ianw wrote the logging config I guess | 18:40 |
*** panda has quit IRC | 18:43 | |
*** eharney has quit IRC | 18:44 | |
*** eharney has joined #openstack-infra | 18:46 | |
fungi | mnaser: Shrews: in addition to the one image which nodepool is failing to delete due to it being mysteriously in use after more than two months (f670e6be-953b-4d4b-a931-6cbb5b568410), a couple of the images nodepool has lost track of are also refusing to delete for the same reason: 8ea352da-2907-41c9-9b49-d0028f661fd7, c2f138b5-cfe5-43e8-8a11-251d71033628, 233dd054-dae8-4b12-a3de-fe2aa7671f37, | 18:47 |
fungi | e2f7361d-9d04-4b93-b2fa-f4bad095499c | 18:47 |
clarkb | fungi: are those all test images (eg not control plane images?) | 18:48 |
fungi | test images, all | 18:48 |
fungi | so no clue what instances would still be using them | 18:48 |
clarkb | could be held nodes | 18:48 |
fungi | in the case of f670e6be-953b-4d4b-a931-6cbb5b568410 the only running instance anywhere near that old is for a different image type entirely | 18:48 |
*** hamzy has quit IRC | 18:49 | |
*** hamzy has joined #openstack-infra | 18:49 | |
fungi | there are currently no held nodes in sjc1 | 18:49 |
clarkb | cinder and glance and nova must be getting confused then | 18:49 |
clarkb | donnyd: am I good to boot test nodes in fn? | 18:51 |
fungi | i wouldn't be surprised if certain sorts of boot failures leave bfv references behind somewhere | 18:51 |
*** jcoufal has quit IRC | 18:52 | |
*** jcoufal has joined #openstack-infra | 18:53 | |
fungi | #status log manually deleted 30 leaked nodepool images from vexxhost-sjc1 | 18:53 |
openstackstatus | fungi: finished logging | 18:53 |
clarkb | ~5 minutes to our weekly meeting. I expect this one to be quick considering many of us have only worked ~2 days since the last one | 18:55 |
*** igordc has joined #openstack-infra | 18:55 | |
donnyd | clarkb: yea you should be good to go | 18:55 |
*** rlandy has quit IRC | 18:56 | |
fungi | clarkb: i call those folks "efficient" | 18:56 |
fungi | i wonder if these delete failures point to leaked volumes i can find in cinder? | 18:57 |
*** jcoufal has quit IRC | 18:57 | |
fungi | heading down that route next | 18:58 |
clarkb | fungi: oh interesting ya possible the instance delete left a volume hanging around | 18:58 |
fungi | not all of them i guess. i see three marked "available" in openstack volume list | 18:59 |
fungi | all 80gb so must be associated with test nodes | 18:59 |
clarkb | if you volume show on them and they were booted from our images you should get the image uuids out | 18:59 |
clarkb | maybe even the names too | 18:59 |
*** rlandy has joined #openstack-infra | 19:00 | |
fungi | all 3 available volumes were created on 2019-06-21 | 19:01 |
*** ociuhandu has quit IRC | 19:02 | |
*** pkopec has quit IRC | 19:02 | |
*** jcoufal has joined #openstack-infra | 19:03 | |
*** crodriguez has joined #openstack-infra | 19:04 | |
*** factor has joined #openstack-infra | 19:06 | |
fungi | oh, in addition to the 3 "available" volumes there are also 6 "error_deleting" volumes | 19:10 |
fungi | okay, all 9 leaked volumes are built from one of two images: c2f138b5-cfe5-43e8-8a11-251d71033628 and e2f7361d-9d04-4b93-b2fa-f4bad095499c | 19:15 |
corvus | clarkb: sigh. i guess we should put ianw's logging config change behind your flag? | 19:16 |
*** weifan has joined #openstack-infra | 19:16 | |
fungi | that covers 2 of our 4 undeletable images at least | 19:16 |
clarkb | corvus: ya thats sort of what I'm thinking unless we want to change nodepool behavior | 19:17 |
corvus | clarkb: i kinda do, but i think we've done as much as we can easily, and the rest is going to be a slower process, so i don't think we should depend on that here. | 19:19 |
corvus | (further work on that will entail more breaking behavior changes i think) | 19:19 |
*** panda has joined #openstack-infra | 19:19 | |
*** jcoufal_ has joined #openstack-infra | 19:21 | |
*** ijw has quit IRC | 19:22 | |
*** michael-beaver has quit IRC | 19:24 | |
*** jcoufal has quit IRC | 19:24 | |
*** goldyfruit has joined #openstack-infra | 19:25 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871 | 19:27 |
AJaeger | sshnaidm|afk: fixed the syntax for you ^ | 19:28 |
*** jcoufal_ has quit IRC | 19:28 | |
zbr | AJaeger: thanks! i was away to pick up something and when I am back i see it fixed, that's a really nice experience. | 19:30 |
AJaeger | ;) | 19:31 |
AJaeger | zbr: you can do now the tripleo-ci part of that change ;) | 19:32 |
*** pkopec has joined #openstack-infra | 19:32 | |
*** eernst has joined #openstack-infra | 19:33 | |
*** pkopec has quit IRC | 19:36 | |
*** ttx has quit IRC | 19:49 | |
*** ttx has joined #openstack-infra | 19:50 | |
clarkb | donnyd: fyi getting {'message': 'No valid host was found. There are not enough hosts available.', 'code': 500, 'created': '2019-07-09T19:49:38Z'} from nova trying to create new instances | 19:51 |
donnyd | 60 seconds | 19:51 |
donnyd | they are just coming back up | 19:52 |
donnyd | swapped out my core switch for 40G while I was tinkering.. so I won't have to do it later | 19:52 |
clarkb | k I'll delete and try again | 19:52 |
*** eernst has quit IRC | 19:52 | |
AJaeger | clarkb: was https://review.opendev.org/669871 and https://review.opendev.org/669871 what you had in mind for tox-molecule? I think you did and voted those by zbr up | 19:54 |
donnyd | should be good to go now | 19:55 |
clarkb | AJaeger: that's the same change twice but looking | 19:55 |
*** ijw has joined #openstack-infra | 19:55 | |
clarkb | AJaeger: ya that is what I had in mind for the first bit. Then tripleo can inherit from that job and set the required projects | 19:55 |
clarkb | approved | 19:55 |
corvus | clarkb: want me to do the nodepool debug change? | 19:56 |
*** ijw has joined #openstack-infra | 19:56 | |
clarkb | corvus: if you are able that would be great | 19:56 |
corvus | on it | 19:56 |
* clarkb boots some new instances in fn | 19:56 | |
clarkb | I've got my held node to poke around on once I'm done double checking on fn | 19:57 |
donnyd | your other test instances should be back up and good to go as well | 19:57 |
clarkb | ok just confirmed that the image that fails in nodepool's func test boots properly on fortnebula with working ipv6 | 20:00 |
clarkb | waiting for the old unit image to boot to confirm that does not work. Then I think that confirms we have a good fix and just need to sort out why it breaks in testing | 20:00 |
donnyd | glance speeds are next on my hit list.. taking forever to pull down the image from the controller | 20:01 |
fungi | okay, so of the 9 leaked volumes in vexxhost-sjc1 (all were from the same date btw, 2019-06-21), cinder let me delete the 3 marked as available but not the 6 in an error_deleting state (not terribly surprising) | 20:06 |
fungi | those are accounting for 2 of the 5 total undeletable images in that provider, so i still have no explanation for the remaining 3 | 20:07 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Adds openstack-tox-molecule job so we can inherit it https://review.opendev.org/669871 | 20:07 |
fungi | next i guess i can scrape the image relationships for the in-use volumes and see if any turn up there | 20:08 |
*** eernst has joined #openstack-infra | 20:08 | |
zbr | thanks!! | 20:08 |
clarkb | donnyd: any idea why clarkb-test-centos-old-unit and clarkb-test-centos-old-unit2 seem to be stuck in BUILD state? Makes me wonder if I edited that image file improperly | 20:12 |
donnyd | its not the image | 20:12 |
donnyd | try one more time | 20:13 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669939 | 20:14 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 20:15 |
clarkb | donnyd: looks like clarkb-test-centos-old-unit2 ended up booting and it has the wrong ipv6 address and working ipv4 | 20:15 |
clarkb | ok so I think that confirms our suspicions of this being a fix. Which means I have to direct my focus on making the test job work | 20:15 |
corvus | clarkb: i rechecked 669773; i think that's everything, but you may want to give it a quick once-over before we wait an hour for nothing :) | 20:16 |
fungi | bingo! all the undeletable images are also associated with in-use volumes in cinder | 20:16 |
clarkb | corvus: looking | 20:16 |
fungi | now to try and identify whether they're really still used by server instances. smart money says they're not | 20:17 |
fungi | and it's the cinder metadata which is the problem | 20:17 |
clarkb | corvus: yup that looks about right | 20:17 |
clarkb | thanks | 20:17 |
clarkb | comparing console log from held test node against the images I uploaded to fn from that test node the major difference seems to be that there are no dhcp actions logged in the failed test node side | 20:19 |
clarkb | I think that is the thread to pull on | 20:20 |
Shrews | fungi: your detective skills are impressive | 20:20 |
fungi | Shrews: impressively slow | 20:21 |
*** xek has quit IRC | 20:21 | |
clarkb | I'm gonna pop out for a few then dig into that | 20:21 |
fungi | my detective kit consists primarily of grep and for loops in shell | 20:22 |
*** Lucas_Gray has joined #openstack-infra | 20:24 | |
*** irclogbot_1 has joined #openstack-infra | 20:24 | |
openstackgerrit | Nate Johnston proposed opendev/irc-meetings master: Capture artifacts from ical generation https://review.opendev.org/669775 | 20:26 |
*** irclogbot_1 has quit IRC | 20:27 | |
fungi | oof, 14 volumes in-use referencing those undeletable images | 20:31 |
fungi | and yeah, looks like these are also all from 2019-06-21 | 20:32 |
fungi | something must have happened to leave this mess behind | 20:32 |
Shrews | don't blame me. i was under general anesthesia that day | 20:33 |
Shrews | or maybe i did do it and can't remember | 20:33 |
fungi | hah | 20:34 |
Shrews | :) | 20:34 |
fungi | and as i suspected, spot checks indicate the server-id referenced for each of these attachments, when fed into openstack server show, comes back with a "No server with a name or ID...exists" error | 20:34 |
fungi | also, i mistyped, it's 24 volumes not 14 | 20:36 |
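The cross-check fungi describes (volumes that cinder reports as in-use, attached to servers that nova says do not exist) can be sketched in Python. This is a minimal illustration, not the shell loops actually used in channel: the dict layout (`id`, `status`, `attachments`, `server_id`) and the `server_exists` callback are assumptions standing in for real cinder and nova lookups.

```python
from typing import Callable, Dict, Iterable, List


def find_orphaned_volumes(
    volumes: Iterable[Dict],
    server_exists: Callable[[str], bool],
) -> List[str]:
    """Return IDs of in-use volumes whose attached servers are all gone.

    `volumes` is a hypothetical list of dicts shaped like
    {"id": ..., "status": ..., "attachments": [{"server_id": ...}]}.
    `server_exists` is a hypothetical lookup (e.g. wrapping a nova query).
    """
    orphaned = []
    for vol in volumes:
        if vol.get("status") != "in-use":
            continue  # available / error_deleting volumes are a separate case
        server_ids = [a["server_id"] for a in vol.get("attachments", [])]
        # Orphaned only if it claims attachments and none of them resolve.
        if server_ids and not any(server_exists(s) for s in server_ids):
            orphaned.append(vol["id"])
    return orphaned


# Made-up sample data for illustration only.
sample = [
    {"id": "vol-a", "status": "in-use", "attachments": [{"server_id": "srv-live"}]},
    {"id": "vol-b", "status": "in-use", "attachments": [{"server_id": "srv-gone"}]},
    {"id": "vol-c", "status": "error_deleting", "attachments": []},
]
live_servers = {"srv-live"}
print(find_orphaned_volumes(sample, lambda s: s in live_servers))
```

Volumes stuck in error_deleting are deliberately skipped, since in the log those are tracked separately from the in-use ones.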
*** hamzy has quit IRC | 20:37 | |
*** eharney has quit IRC | 20:37 | |
*** jamesmcarthur has quit IRC | 20:40 | |
*** jamesmcarthur has joined #openstack-infra | 20:41 | |
fungi | not finding any mention in irc channel logs of something happening on that date which might explain this | 20:41 |
fungi | i was trying to troubleshoot nodepool prematurely deleting network ports in ovh | 20:41 |
*** strigazi has quit IRC | 20:42 | |
*** Fidde has joined #openstack-infra | 20:42 | |
fungi | Shrews: that reminds me, did you catch that conversation later? | 20:42 |
Shrews | nope | 20:42 |
*** joeguo has joined #openstack-infra | 20:42 | |
fungi | upshot was https://review.opendev.org/666852 Increase port cleanup interval | 20:43 |
*** strigazi has joined #openstack-infra | 20:43 | |
Shrews | ah, i did see that change | 20:43 |
fungi | basically the port gets allocated, nova takes a while scheduling the server instance, nodepool thinks the port is leaked and deletes it, instance is eventually scheduled and then the boot fails | 20:44 |
fungi | the lag was prevalent enough in ovh that it was creating a fairly steady stream of boot failures for us | 20:45 |
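The race fungi describes comes down to an age threshold: a cleanup pass that treats a port as leaked once it has sat unattached for longer than some interval will also catch ports whose server is merely slow to schedule. The sketch below is a generic illustration of that trade-off, not nodepool's actual implementation; the function name, the `down_since` mapping, and the timings are all hypothetical.

```python
from typing import Dict, List


def ports_to_delete(
    down_since: Dict[str, float],
    now: float,
    cleanup_interval_s: float,
) -> List[str]:
    """Return port IDs that have been unattached for at least the interval.

    `down_since` maps a port ID to the time (seconds) it was first seen
    unattached. All names and values here are illustrative assumptions.
    """
    return sorted(
        port_id
        for port_id, first_seen in down_since.items()
        if now - first_seen >= cleanup_interval_s
    )


# port-b belongs to a server that nova is still scheduling (100s so far).
observed = {"port-a": 0.0, "port-b": 500.0}
print(ports_to_delete(observed, now=600.0, cleanup_interval_s=60.0))
print(ports_to_delete(observed, now=600.0, cleanup_interval_s=180.0))
```

With the shorter interval the still-pending port-b is deleted out from under its server; lengthening the interval spares it at the cost of genuinely leaked ports lingering longer.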
*** jamesmcarthur has quit IRC | 20:48 | |
fungi | mnaser: executive summary of the remainder of the leak in sjc1... something seems to have happened there on 2019-06-21 which left 24 orphaned cinder volumes in sjc1 claiming to be in-use attached to server instances which no longer exist per nova. we've also got another 6 cinder volumes from that date which are stuck in an error_deleting state and i can't seem to delete those either. there are 5 | 20:52 |
fungi | glance images which we should be able to delete once the volumes are gone. do you want that list of 30 volume uuids? | 20:52 |
smcginnis | When the volume lifecycle is managed by nova, I've seen cases where Nova hits an error and never tells Cinder to delete them. Maybe the case this time. | 20:53 |
*** bobh has joined #openstack-infra | 20:54 | |
fungi | yeah, kinda like the port leaks we see with nova/neutron coordination in some providers | 20:58 |
*** bobh has quit IRC | 21:01 | |
*** pkopec has joined #openstack-infra | 21:01 | |
mordred | fungi: are the volumes identifiable enough, do you think, that we could delete them in the nodepool/sdk cleanup method? I'm a little more conservative about volumes, because I don't want to delete something that someone needs though ... | 21:04 |
openstackgerrit | Merged zuul/zuul master: Additional note about branches for implied-branches https://review.opendev.org/667415 | 21:06 |
fungi | mordred: in this case cinder seems to not be letting me delete these | 21:06 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948 | 21:06 |
fungi | i think they need admin override | 21:07 |
mordred | ah - well then | 21:07 |
fungi | cinder thinks they're still in use by servers nova says don't exist | 21:07 |
corvus | mordred: ^ when you have a sec, can you give 669948 a really quick once-over and see if that's the general approach you were thinking about? | 21:07 |
corvus | (or, if it's compatible with what you were thinking) | 21:07 |
corvus | mordred: (only look at the readme; the code is wrong, it's just a copy of the old role right now) | 21:08 |
corvus | also, hah, my example doesn't match my docs, but you get the idea | 21:08 |
mordred | corvus: ooh - yeah. that seems great - like the keys can be a good api and stuff | 21:08 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948 | 21:09 |
mordred | corvus: since this is docware at the moment - should this account for per-region things with some defined substitution vars? | 21:09 |
mordred | corvus: like: mirror_info.pypi: https://mirror.{{ nodepool.region }}.{{ nodepool.cloud }}.example.com/pypi/simple | 21:10 |
corvus | mordred: yeah, i'm expecting we can do that more or less the way we do now | 21:10 |
mordred | (for instance) | 21:10 |
mordred | cool | 21:10 |
corvus | i could put that in the readme as a more complex example? or i could push up a DNM change to project-config with what it would look like for us | 21:11 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948 | 21:11 |
mordred | corvus: I think it's worth including in the README - since we're defining an API for other people to be able to use as well | 21:11 |
openstackgerrit | Merged zuul/zuul master: Run jobs when their own config changes https://review.opendev.org/669752 | 21:11 |
mordred | I mean, ansible variables work like they work, so it's not actually documenting role-specific behavior - but it might not occur to someone that they can do that | 21:12 |
*** ekultails has quit IRC | 21:12 | |
*** irclogbot_0 has joined #openstack-infra | 21:14 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: add configure-mirrors2 https://review.opendev.org/669948 | 21:15 |
corvus | mordred: ya, good point. | 21:15 |
mordred | ++ | 21:15 |
mordred | that looks great | 21:15 |
corvus | mordred: i think that's probably at a point i can start an email thread on zuul-discuss to get design review on it, yeah? | 21:15 |
mordred | yah | 21:15 |
corvus | i'll fire off an email now | 21:16 |
mordred | corvus: I think it's going to wind up being nice for opendev too - since there will be a nice easy list of what mirrors exist | 21:16 |
corvus | ++ | 21:16 |
mordred | incidentally - I think in the new scheme we should define 'docker' as a mirror type and have it mean the new v2 style mirror - my gut says to leave the v1 style out because all of the clients are supposed to be able to deal with v2 mirrors now - but if we need to still support it have its key be dockerv1 or olddocker or something like that | 21:17 |
corvus | sounds reasonable | 21:18 |
mordred | corvus: or - should we pick a more generic - 'container-registry' - and maybe paint it chartreuse | 21:19 |
*** irclogbot_0 has quit IRC | 21:19 | |
*** pcaruana has quit IRC | 21:19 | |
corvus | hrm, that's an interesting question. | 21:19 |
*** joeguo has quit IRC | 21:20 | |
*** goldyfruit has quit IRC | 21:20 | |
*** weifan has quit IRC | 21:21 | |
*** weifan has joined #openstack-infra | 21:21 | |
*** Lucas_Gray has quit IRC | 21:22 | |
clarkb | seems like docker is its own protocol and maybe being specific there is good | 21:22 |
clarkb | also if anyone is wondering, hand editing /etc/shadow in order to log in on the console is error prone | 21:23 |
clarkb | (I have no idea what I've done wrong at this point /me starts from fresh copy of original image) | 21:24 |
*** weifan has quit IRC | 21:26 | |
mordred | clarkb: well - yeah - that was my first thought- but I think the container registry protocol is more generic (with things like quay.io and gcr.io) | 21:26 |
clarkb | mordred: right but it is the docker container registry protocol because they don't http properly :/ | 21:27 |
fungi | clarkb: can you pass init=/bin/sh on the kernel command-line and then remount / rw and use passwd to set a rootpw? | 21:27 |
clarkb | other services just http iirc | 21:27 |
clarkb | fungi: this is the broken centos test image so I can just copy the original and edit it again | 21:27 |
mordred | really? jesus | 21:27 |
fungi | ahh, easier then | 21:28 |
mordred | clarkb: so it might be more accurate then to call it "dockerhub" | 21:28 |
fungi | chroot into it and use passwd there | 21:28 |
clarkb | mordred: ya it doesn't support ipv6 address literals for example | 21:28 |
mordred | actually - it's more accurate to call it dockerhub anyway - since it's not a generic registry, it's specifically dockerhub we're mirroring here | 21:28 |
mordred | and if we were to mirror quay.io, I'd expect a second entry called "quay" or "quay.io" or something | 21:29 |
clarkb | ya | 21:30 |
*** Fidde has quit IRC | 21:30 | |
corvus | i just realized this doc section shouldn't be the doc for configure-mirrors2, it should be a top-level doc since it's about a variable that any role might use. | 21:31 |
corvus | so i'm doing a quick re-org before i send out the mail | 21:31 |
mordred | corvus: ++ | 21:32 |
*** irclogbot_1 has joined #openstack-infra | 21:38 | |
*** eernst has quit IRC | 21:41 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror_info documentation https://review.opendev.org/669948 | 21:42 |
*** irclogbot_1 has quit IRC | 21:43 | |
*** donnyd is now known as donnyd_pto | 21:43 | |
donnyd_pto | I will be around, just not at my desk all day | 21:44 |
donnyd_pto | ping me if I am needed | 21:44 |
clarkb | donnyd_pto: did you see I think I confirmed the fix is a good fix? just have to understand why it fails in testing now | 21:46 |
clarkb | fungi: so I did the chroot and chpasswd, which was fine; boot actually started dbus because root existed, but then I couldn't log in, I think due to selinux contexts, so fixing that now :/ | 21:47 |
fungi | yuck | 21:47 |
clarkb | eventually I'll be able to examine the running system without working networking :/ | 21:48 |
* fungi misses the days when you could log into a terminal without a message bus getting involved | 21:48 | |
clarkb | fungi: back when you could put unencrypted passwords in the passwd file | 21:48 |
donnyd_pto | clarkb: would it be helpful to get some actual public ipv4 addresses so you could ssh in and see whats the what? | 21:49 |
corvus | shadow has been around for a long time | 21:49 |
fungi | clarkb: back when most servers had a passwordless "guest" account | 21:49 |
clarkb | donnyd_pto: no this is on one of our test nodes where I'm reproducing the test failures of the fix so that I can figure out how to fix the test for the fix | 21:49 |
fungi | that's so meta | 21:50 |
donnyd_pto | clarkb: Could you repeat that in terms that actual humans can understand | 21:50 |
donnyd_pto | jk... lmk if you need anything from me | 21:50 |
clarkb | and ya now I can login as root finally. Also restorecon doesn't just work in a chroot if anyone is wondering. Thankfully dib has examples of how to make it work | 21:51 |
*** irclogbot_0 has joined #openstack-infra | 21:54 | |
clarkb | ok I have managed to confirm /var/lib/dhclient is an empty dir | 21:54 |
clarkb | implying it is never requesting the dhcp leases as suspected | 21:55 |
*** altlogbot_3 has joined #openstack-infra | 21:55 | |
*** altlogbot_3 has quit IRC | 21:55 | |
clarkb | now trying a reboot to see if it works on second boot as it did on fn | 21:59 |
*** irclogbot_0 has quit IRC | 21:59 | |
*** igordc has quit IRC | 22:00 | |
clarkb | and it works after a reboot | 22:02 |
clarkb | its almost like the ifcfg file isn't synced to disk by the time network manager runs? | 22:02 |
clarkb | since that is what tells network manager to run a dhclient | 22:02 |
*** aaronsheffield has quit IRC | 22:03 | |
*** weifan has joined #openstack-infra | 22:10 | |
*** rcernin has joined #openstack-infra | 22:10 | |
*** tosky has quit IRC | 22:18 | |
*** weifan has quit IRC | 22:19 | |
*** jamesmcarthur has joined #openstack-infra | 22:21 | |
*** pkopec_ has joined #openstack-infra | 22:24 | |
fungi | more like not written to the file... they should both be working off a coherent fs cache even if its writes haven't yet synchronized to the underlying block device | 22:24 |
clarkb | ya adding a sync changed nothing. I've now figured out how to have network manager logging verbosity go way up so rerunning with that hoping for clues | 22:25 |
fungi | you should never need to call sync for one process to read another's file off the same copy of a filesystem. it may be that the writing process hasn't called fsync yet and still has its fd writes buffered | 22:26 |
*** pkopec has quit IRC | 22:26 | |
fungi | er, hasn't flushed | 22:26 |
clarkb | while in general I agree I've definitely found cases where that isn't true like testing zuul's disk usage monitor on btrfs | 22:28 |
clarkb | it failed reliably for me on btrfs locally until we added some flushes | 22:28 |
*** goldyfruit has joined #openstack-infra | 22:28 | |
clarkb | er I guess that is what you said | 22:28 |
fungi | flushes of the buffered writes for the process, yes | 22:28 |
fungi | sync to the block device should never be necessary if both processes are going through the same filesystem interface though | 22:29 |
fungi | (now if one is reading straight from the block device and the other is writing through the fs layer, sure) | 22:30 |
fungi | but say you have a process writing to an open file descriptor, it's almost certainly buffering writes in process memory and flushing them to the fs only under certain conditions (say, when the buffer gets full or flush() is explicitly called) | 22:33 |
fungi | (or when close() happens) | 22:33 |
fungi | and yeah, with a persistent daemon writing small amounts of data, it nearly always comes down to where the author included flush() calls | 22:34 |
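fungi's point about process-level buffering can be demonstrated with a short Python sketch: a small write sits in the writer's own buffer and is invisible to a second reader of the same file until flush() (or close()) hands it to the filesystem, with no sync to the block device involved. The file contents below are an illustrative placeholder, not taken from the image under test.

```python
import os
import tempfile


def read_back(path: str) -> bytes:
    """Open the file independently, as a second process would, and read it."""
    with open(path, "rb") as reader:
        return reader.read()


def demo_buffered_write() -> tuple:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Binary mode uses a buffered writer by default; a write this small
        # stays in the process's buffer rather than reaching the filesystem.
        writer = open(path, "wb")
        writer.write(b"BOOTPROTO=dhcp\n")  # placeholder payload
        before_flush = read_back(path)

        # flush() pushes the buffer to the kernel; no fsync()/sync needed
        # for another reader on the same filesystem to see the data.
        writer.flush()
        after_flush = read_back(path)

        writer.close()
        return before_flush, after_flush
    finally:
        os.remove(path)


before, after = demo_buffered_write()
print(repr(before), repr(after))
```

This is why adding a sync changed nothing in the test above: sync only affects when data reaches the block device, not whether a writer has flushed its own buffer.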
*** lathiat has quit IRC | 22:35 | |
*** rh-jelabarre has quit IRC | 22:36 | |
*** pkopec_ has quit IRC | 22:37 | |
*** armax has quit IRC | 22:40 | |
*** altlogbot_2 has joined #openstack-infra | 22:44 | |
*** weifan has joined #openstack-infra | 22:45 | |
*** altlogbot_2 has quit IRC | 22:49 | |
*** tkajinam has joined #openstack-infra | 22:52 | |
*** bobh has joined #openstack-infra | 22:52 | |
*** slaweq has quit IRC | 22:54 | |
clarkb | ok via nmcli and networkmanager logs I think I've figured out that there are two logical eth0 interfaces being managed by NM: "eth0" and "System eth0". The second one has ipv4.method set to auto; the first is set to disabled | 22:55 |
clarkb | so it must be confusing those two for some reason | 22:56 |
*** bobh has quit IRC | 22:57 | |
corvus | i'm going to restart zuul to pick up the self-config change | 23:01 |
clarkb | https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#114 getting closer I think | 23:05 |
openstack | Debian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open] | 23:05 |
clarkb | trying https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202#331 next | 23:08 |
openstack | Debian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open] | 23:08 |
fungi | that does sound likely | 23:10 |
clarkb | if that fixes it the race must be glean not needing to run on second boot so NM starts sooner prior to kernel autoconfing? | 23:12 |
*** goldyfruit has quit IRC | 23:14 | |
corvus | #status log restarted all of zuul at commit 86f071464dafd584c995790dce30e2e3ca98f5ac | 23:15 |
openstackstatus | corvus: finished logging | 23:15 |
corvus | fungi, clarkb: https://review.opendev.org/669762 is a neato change :) | 23:15 |
clarkb | corvus: +2 | 23:17 |
*** weifan has quit IRC | 23:17 | |
fungi | i suppose it's self-testing since the restart, except that we wouldn't be able to tell the difference so i guess there's little point in rechecking? | 23:18 |
fungi | (it should trigger testing for all the jobs we just removed the file matcher for it from) | 23:19 |
corvus | we *could* recheck it and we should see more jobs in check, but i vote we just +W it and see them in gate. | 23:20 |
clarkb | wow I have an ip address now | 23:20 |
clarkb | so Network Manager doesn't know what to do if the kernel assigns an IP to an interface | 23:20 |
clarkb | and of course this is totally fine on our ipv4 only clouds because there are no ipv6 advertisements | 23:21 |
*** jamesmcarthur has quit IRC | 23:21 | |
corvus | fungi: well, it looks like it's *not* triggering those jobs... maybe changing file matchers doesn't trip the detection. | 23:21 |
clarkb | so now I need to have a think about the best way to handle this. We can't assume the interface names on boot because of biosdevname. We could update sysctl, if NM is used, to disable autoconf and RAs by default on all interfaces? | 23:22 |
*** weifan has joined #openstack-infra | 23:22 | |
clarkb | Then it is NetworkManager's job not the kernel's to figure it out? | 23:22 |
clarkb | this seems like such a huge NM bug | 23:22 |
openstackgerrit | James E. Blair proposed opendev/system-config master: DNM: test jobs running on config update https://review.opendev.org/669957 | 23:24 |
*** dchen has joined #openstack-infra | 23:25 | |
corvus | fungi: ^ that seems to be doing what we expect (it's running the eavesdrop job) | 23:25 |
clarkb | fungi: corvus ^ any concerns with doing a blanket sysctl disable of autoconf and accept_ra if NM is used in dib? I'm worried we'll end up breaking ipv6 somehow doing that (but end up with working ipv4 which is great except where we want ipv6 too) | 23:25 |
corvus | if changes to file matchers don't trigger job runs, i'm okay with that :) | 23:25 |
corvus | clarkb: i think it would take quite a bit of time for me to come up to speed on this problem enough to render an opinion. | 23:28 |
clarkb | I'll try to summarize since it is probably worthwhile anyway | 23:28 |
clarkb | On boot if NetworkManager sees that an interface already exists it creates a dummy record internally so that it knows about that interface but then assumes something else is managing that interface and doesn't touch it further. If you have kernel/sysctl settings for autoconfing ipv6 on interfaces it is possible for the kernel to assign ipv6 addrs to an interface prior to NM starting. In that case NM | 23:30 |
fungi | clarkb: yeah, i'm worried disabling kernel handling of autoconf will simply break ipv6 since nm likely relies on at least some of that (if nm reimplemented that, wow, crazypants?) | 23:30 |
clarkb | doesn't touch the interface and you get no dhcp for working ipv4 | 23:30 |
clarkb | fungi: I think NM did reimplement that | 23:30 |
clarkb | hrm except I don't have ipv6 addr on interface that neutron says should be there so maybe not | 23:31 |
clarkb | ugh | 23:31 |
fungi | remind me again why we switched to nm for these images? fedora depends on it and it was hard to untangle the centos elements from fedora's? | 23:32 |
clarkb | fungi: rhel 8 is NM only aiui | 23:32 |
clarkb | so we implemented it to ease transition to centos 8 | 23:33 |
clarkb | (it is the default on centos 7/rhel 7 but the old sysconfig only stuff exists without NM) | 23:33 |
fungi | i have a hard time believing rhel8 strips ipv6 autoconf out of the kernel | 23:33 |
fungi | but... maybe? | 23:33 |
clarkb | its possible rhel8 uses a much newer NM that isn't broken | 23:33 |
clarkb | or that multi year bug in debian shows that the distros mostly ignore the problem | 23:34 |
fungi | i guess they've made more surprising choices before | 23:34 |
clarkb | this does certainly seem like a fundamental flaw in using NM as network configurator when expecting working ipv6 | 23:35 |
clarkb | thinking on the race more I think the problem we have is our boot is many minutes long (because qemu) and neutron sends RAs more frequently than that | 23:37 |
clarkb | so we have a high chance of accepting an RA before network manager has started | 23:37 |
clarkb | whereas in the real world that ordering is far less likely | 23:37 |
clarkb | (still possible though) | 23:37 |
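clarkb's timing argument can be put as rough arithmetic. Under a deliberately simplified model (an assumption, not a measurement): RAs arrive at a fixed interval with a random phase, and the kernel can accept one from the moment the interface is up until NetworkManager starts. The chance that at least one RA lands in that window is then the window length divided by the RA interval, capped at 1. The model ignores solicited RAs and randomized advertisement timers, and the numbers below are hypothetical.

```python
def ra_before_nm_probability(nm_start_delay_s: float, ra_interval_s: float) -> float:
    """Probability that a periodic RA arrives before NetworkManager starts.

    Simplified model: fixed RA interval, uniformly random phase, and the
    kernel accepts RAs for `nm_start_delay_s` seconds before NM is running.
    """
    if ra_interval_s <= 0:
        raise ValueError("ra_interval_s must be positive")
    if nm_start_delay_s <= 0:
        return 0.0
    return min(1.0, nm_start_delay_s / ra_interval_s)


# Hypothetical numbers: a slow emulated boot versus a fast bare-metal one,
# both against an RA sent once a minute.
print(ra_before_nm_probability(nm_start_delay_s=300, ra_interval_s=60))
print(ra_before_nm_probability(nm_start_delay_s=5, ra_interval_s=60))
```

Under this model a boot lasting several RA intervals makes the race a near certainty, while a fast boot only hits it occasionally, which matches the observation that the failure shows up in the qemu-based test and not elsewhere.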
*** armax has joined #openstack-infra | 23:39 | |
clarkb | I'll have to make a new image without that sysctl settings boot it again and check if ipv6 is configured (basically determines that ipv6 is not working with NM then) | 23:39 |
openstackgerrit | Merged opendev/system-config master: Remove .zuul.yaml file matchers https://review.opendev.org/669762 | 23:40 |
*** jamesmcarthur has joined #openstack-infra | 23:42 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669939 | 23:43 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 23:43 |
clarkb | ok confirmed if I do that we don't get working ipv6 | 23:45 |
clarkb | but if I remove the sysctl entries then we have the ipv6 addr that is expected | 23:46 |
*** sthussey has quit IRC | 23:46 | |
*** jamesmcarthur has quit IRC | 23:46 | |
clarkb | maybe that's the hack around this: have the functional job use ipv6 (that likely requires we create an interface on the host node with an ipv6 addr that can route to the test node ipv6 addr(s)) | 23:46 |
*** tdasilva has quit IRC | 23:47 | |
clarkb | this is the sort of problem that deserves a stewing | 23:49 |
corvus | i made brunswick stew on sunday | 23:54 |
clarkb | hrm now I want stew for dinner | 23:55 |
*** mattw4 has quit IRC | 23:55 | |
*** bobh has joined #openstack-infra | 23:56 | |
*** hamzy has joined #openstack-infra | 23:59 |