Wednesday, 2018-10-31

openstackgerritIan Wienand proposed openstack/diskimage-builder master: Update test coverage for openSUSE-minial to 15.0  https://review.openstack.org/61043600:01
openstackgerritMerged openstack/diskimage-builder master: Turn on quiet mode when logfile specified  https://review.openstack.org/61286500:15
*** pall is now known as pabelanger00:18
pabelangerclarkb: mwhahaha: weshay: rfolco|rover: re: docker downloads, there are some tripleo jobs that are not set up to use the reverse proxy for docker, so they will be directly downloading from docker.io each time. For example: http://logs.openstack.org/61/613661/4/gate/tripleo-puppet-ci-centos-7-undercloud-containers/bf1c2d8/logs/undercloud/etc/docker/daemon.json.txt.gz00:19
pabelangernot sure what other jobs are affected, but somebody should audit them all and confirm they are set up properly00:19
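A minimal sketch of the audit suggested above, assuming each job's collected daemon.json is available as text. The `registry-mirrors` key is Docker's daemon option for pull-through mirrors; the helper name and the sample configs (including the mirror URL) are hypothetical and only illustrate the check.

```python
import json


def has_registry_mirror(daemon_json_text):
    """Return True if a Docker daemon.json lists at least one registry mirror."""
    # An empty file is treated like an empty config (no mirror configured).
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    return bool(config.get("registry-mirrors", []))


# Hypothetical sample configs, not taken from any real job.
with_mirror = '{"registry-mirrors": ["http://mirror.example.org:8082/"]}'
without_mirror = '{"debug": true}'

for name, text in [("with mirror", with_mirror), ("without mirror", without_mirror)]:
    status = "uses a mirror" if has_registry_mirror(text) else "pulls directly from docker.io"
    print(name, "->", status)
```

A check like this only covers jobs that use docker; a job that pulls images with another tool would need its own mirror config inspected.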
pabelangeralso, last time I looked into docker caching of images, they would expire after 4 hours. Which meant our miss rate in apache logs was very high.  I did reach out to somebody at docker to see why it was so low, but had a hard time syncing with them.00:20
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Update devstack test to Fedora 28  https://review.openstack.org/61437500:21
clarkbpabelanger: right but we'd verify the image hasn't updated, not redownload it, after the 4 hour expiry00:23
clarkbso that's not great but not terrible00:23
pabelangeryah, I cannot remember everything about the last time I debugged, but there were some images that would redownload multiple times a day00:24
pabelangerI was at the point of starting to modify expire headers for testing, but that of course would break http standards00:25
pabelangerThe other option somebody could try would be to use the new zuul_return for pausing a job: https://zuul-ci.org/docs/zuul/user/jobs.html?highlight=zuul_return#pausing-the-job and have a top-level job first download all images once, then child jobs pull from it. However, that should be no different than our current reverse proxy setup00:26
pabelangerthat also means, job redesign00:26
clarkbbefore we go making drastic changes I think we need to be able to measure this stuff00:28
clarkbhow long downloads take, how large images are, what the theoretical best network throughput is for that size of file and how we compare (assume gigabit networking probably)00:28
pabelangerOh, yah. Images in tripleo are large too, something like 2.5GB of data for each job00:29
pabelangerfat containers for sure00:29
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Update devstack test to Fedora 28  https://review.openstack.org/61437500:30
mwhahahaclarkb: k I'll look into how that got dropped. Thanks00:30
pabelangeranyways, wanted to share some info from the last time we had this issue. But do think an audit of jobs will be helpful00:30
clarkbright and if we are already near the best case for network transfer of that size we aren't going to speed up much without changing images. But we need the data00:30
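The best-case comparison described above can be worked out with simple arithmetic. This sketch assumes the figures mentioned in the discussion (about 2.5GB of image data per job, a gigabit link) and ignores protocol overhead; the function names are hypothetical and no measured timings are implied.

```python
def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second


def efficiency(measured_seconds, size_bytes, link_bits_per_second):
    """Fraction of the theoretical best achieved by a measured download."""
    return ideal_transfer_seconds(size_bytes, link_bits_per_second) / measured_seconds


GB = 10**9                 # decimal gigabyte
GIGABIT = 10**9            # bits per second
image_bytes = 2.5 * GB     # rough per-job figure quoted in the discussion

best = ideal_transfer_seconds(image_bytes, GIGABIT)
print(f"best case at 1 Gbit/s: {best:.0f} s")
```

Dividing that lower bound by a job's actual download time (for example via `efficiency`) would show how much headroom remains before image size itself becomes the limit.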
*** pabelanger is now known as pall00:33
*** gyee has quit IRC00:36
*** markvoelker has quit IRC00:49
*** markvoelker has joined #openstack-infra00:50
openstackgerritMerged openstack-infra/irc-meetings master: Fix meeting ID for Cyborg  https://review.openstack.org/61267600:51
*** longkb has joined #openstack-infra00:52
*** anteaya has quit IRC00:52
*** ssbarnea has quit IRC00:52
*** markvoelker has quit IRC00:55
openstackgerritMerged openstack/diskimage-builder master: Remove python3 legacy jobs  https://review.openstack.org/61404701:18
mwhahahaclarkb: heads up, that job isn't using docker which is why that docker config file is not configured with a mirror. I will have to track down where the podman mirror config is tho01:24
*** ccamacho has quit IRC01:24
*** ccamacho has joined #openstack-infra01:25
clarkbthat would certainly explain it01:27
*** eernst has joined #openstack-infra01:50
*** eernst has quit IRC01:59
*** hongbin has joined #openstack-infra02:02
*** diablo_rojo has quit IRC02:02
*** vtapia has joined #openstack-infra02:06
*** erlon has quit IRC02:33
*** ykarel has joined #openstack-infra02:34
*** rh-jelabarre has quit IRC02:39
*** ramishra has quit IRC02:43
*** ramishra has joined #openstack-infra02:49
*** markvoelker has joined #openstack-infra02:51
*** psachin has joined #openstack-infra02:52
*** mrsoul has joined #openstack-infra02:53
*** carl_cai has joined #openstack-infra03:02
*** markvoelker has quit IRC03:24
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Add ubuntu-systemd-container operating-system element  https://review.openstack.org/56374803:29
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Add systemd-containers functional tests  https://review.openstack.org/61405103:29
*** bhavikdbavishi has joined #openstack-infra03:31
*** bhavikdbavishi has quit IRC03:31
*** bhavikdbavishi has joined #openstack-infra03:33
*** udesale has joined #openstack-infra03:47
*** janki has joined #openstack-infra03:47
*** hongbin has quit IRC04:01
*** dave-mccowan has quit IRC04:18
*** markvoelker has joined #openstack-infra04:21
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing  https://review.openstack.org/61440204:48
*** markvoelker has quit IRC04:54
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing  https://review.openstack.org/61440205:05
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Add prepare-workspace-git role  https://review.openstack.org/61303605:11
openstackgerritMerged openstack-infra/zuul-jobs master: Updated bindep to cover for MacOS requirements  https://review.openstack.org/61372705:19
*** ykarel has quit IRC05:21
*** ykarel has joined #openstack-infra05:21
*** noama has joined #openstack-infra05:22
*** armax has joined #openstack-infra05:28
*** pcaruana|elisa| has joined #openstack-infra05:29
*** pcaruana|elisa| has quit IRC05:37
*** janki has quit IRC05:46
*** janki has joined #openstack-infra05:49
*** markvoelker has joined #openstack-infra05:51
*** ykarel has quit IRC05:51
*** bhavikdbavishi has quit IRC05:56
*** bhavikdbavishi has joined #openstack-infra06:08
*** carl_cai has quit IRC06:10
*** e0ne has joined #openstack-infra06:16
*** e0ne has quit IRC06:18
*** janki has quit IRC06:18
*** markvoelker has quit IRC06:26
*** dpawlik has joined #openstack-infra06:27
*** dpawlik has quit IRC06:28
*** dpawlik has joined #openstack-infra06:28
*** quiquell|off is now known as quiquell06:37
*** janki has joined #openstack-infra06:37
*** bhavikdbavishi has quit IRC06:38
*** bhavikdbavishi has joined #openstack-infra06:39
*** armax_ has joined #openstack-infra06:42
*** armax has quit IRC06:42
*** armax_ is now known as armax06:42
*** ifat_afek has joined #openstack-infra06:49
*** ifat_afek has quit IRC06:55
*** ifat_afek has joined #openstack-infra06:55
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing  https://review.openstack.org/61440207:03
*** chkumar|off is now known as chandankumar07:13
*** ccamacho has quit IRC07:15
*** ccamacho has joined #openstack-infra07:15
*** markvoelker has joined #openstack-infra07:24
openstackgerritMerged openstack-infra/zuul master: Collect docker logs after quick-start run  https://review.openstack.org/61302707:25
*** xek has joined #openstack-infra07:38
*** ykarel has joined #openstack-infra07:42
*** pcaruana|elisa| has joined #openstack-infra07:45
*** shardy has joined #openstack-infra07:45
*** shardy_ has joined #openstack-infra07:46
*** ykarel has quit IRC07:49
*** ykarel has joined #openstack-infra07:50
*** ykarel has quit IRC07:56
*** markvoelker has quit IRC07:57
*** ykarel has joined #openstack-infra08:02
*** sshnaidm|off is now known as sshnaidm|ruck08:03
*** florianf|afk is now known as florianf08:05
*** ykarel has quit IRC08:10
*** e0ne has joined #openstack-infra08:12
*** bhavikdbavishi has quit IRC08:16
*** jtomasek has joined #openstack-infra08:18
*** ykarel has joined #openstack-infra08:27
*** bhavikdbavishi has joined #openstack-infra08:28
*** jpich has joined #openstack-infra08:28
*** ykarel has quit IRC08:28
*** ykarel has joined #openstack-infra08:29
*** larainema has quit IRC08:30
*** ralonsoh has joined #openstack-infra08:31
*** ykarel_ has joined #openstack-infra08:31
*** ykarel has quit IRC08:34
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin  https://review.openstack.org/60238508:34
*** ykarel has joined #openstack-infra08:34
ianwmordred, infra-root: ^ i think that should be CI happy now; i had to add an extra groups file08:38
*** ykarel_ has quit IRC08:38
*** ykarel has quit IRC08:40
*** ssbarnea has joined #openstack-infra08:40
*** derekh has joined #openstack-infra08:51
*** jpena|off is now known as jpena08:52
openstackgerritTakashi NATSUME proposed openstack-infra/project-config master: Remove placement-api-ref jobs  https://review.openstack.org/61443508:54
*** markvoelker has joined #openstack-infra08:54
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Logs stats for nodepool automated cleanup  https://review.openstack.org/61407408:54
*** panda|off is now known as panda09:01
openstackgerritTakashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition  https://review.openstack.org/61444009:02
*** d0ugal has joined #openstack-infra09:02
*** hasharAway is now known as hashar09:03
openstackgerritTakashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition  https://review.openstack.org/61444009:03
*** mordred[m] has quit IRC09:13
*** pcaruana|elisa| has quit IRC09:13
*** niedbalski has quit IRC09:15
*** pcaruana|elisa| has joined #openstack-infra09:15
*** bhavikdbavishi has quit IRC09:15
*** Qiming has quit IRC09:16
*** e0ne has quit IRC09:16
*** bhavikdbavishi has joined #openstack-infra09:16
*** GDPR has quit IRC09:18
*** Qiming has joined #openstack-infra09:19
*** GDPR has joined #openstack-infra09:19
openstackgerritTakashi NATSUME proposed openstack-infra/project-config master: Replace placement-api-ref jobs for nova project  https://review.openstack.org/61443509:23
*** e0ne has joined #openstack-infra09:25
*** markvoelker has quit IRC09:27
*** gfidente has joined #openstack-infra09:37
*** bhavikdbavishi has quit IRC09:47
*** bhavikdbavishi has joined #openstack-infra09:48
*** lennyb has quit IRC09:48
*** lennyb has joined #openstack-infra09:49
*** ianychoi has quit IRC09:51
*** ianychoi has joined #openstack-infra09:52
*** dpawlik_ has joined #openstack-infra09:55
*** dpawlik_ has quit IRC09:56
*** dpawlik has quit IRC09:57
*** dpawlik has joined #openstack-infra09:57
*** xinliang has joined #openstack-infra10:00
*** kjackal has joined #openstack-infra10:01
xinliangianw: Do you have time to look at this bug: https://bugs.linaro.org/show_bug.cgi?id=403510:02
openstackbugs.linaro.org bug 4035 in Default "[uk cloud] Wget fetching from mirror.london.linaro-london.openstack.org is more slower and unstable than deb.debian.org" [Enhancement,Unconfirmed] - Assigned to gema.gomez-solano10:02
xinliangianw: london cloud's mirror repo is slower and unstable10:02
*** kopecmartin|off is now known as kopecmartin10:05
*** electrofelix has joined #openstack-infra10:07
*** bhavikdbavishi has quit IRC10:11
*** apetrich has quit IRC10:12
*** markvoelker has joined #openstack-infra10:24
*** apetrich has joined #openstack-infra10:27
*** rossella_s has joined #openstack-infra10:31
*** yamamoto has quit IRC10:34
*** yamamoto has joined #openstack-infra10:34
*** yamamoto has quit IRC10:39
*** yamamoto has joined #openstack-infra10:40
*** yamamoto has quit IRC10:41
*** shrasool has joined #openstack-infra10:42
*** dtantsur|afk is now known as dtantsur10:49
*** markvoelker has quit IRC10:58
*** e0ne has quit IRC11:15
*** e0ne has joined #openstack-infra11:15
*** jtomasek has quit IRC11:16
*** e0ne has quit IRC11:27
*** yamamoto has joined #openstack-infra11:28
*** roman_g has joined #openstack-infra11:28
*** ramishra has quit IRC11:28
*** hashar is now known as hasharLunch11:28
*** udesale has quit IRC11:32
*** yamamoto has quit IRC11:35
*** yamamoto has joined #openstack-infra11:35
*** carl_cai has joined #openstack-infra11:36
*** jtomasek has joined #openstack-infra11:38
*** ramishra has joined #openstack-infra11:40
*** rossella_s has quit IRC11:40
*** longkb has quit IRC11:41
*** bhavikdbavishi has joined #openstack-infra11:43
*** rh-jelabarre has joined #openstack-infra11:46
*** markmcd has quit IRC11:49
*** ramishra has quit IRC11:49
openstackgerritMerged openstack-infra/project-config master: Replace placement-api-ref jobs for nova project  https://review.openstack.org/61443511:51
*** udesale has joined #openstack-infra11:54
*** markvoelker has joined #openstack-infra11:54
*** rossella_s has joined #openstack-infra11:55
*** ramishra has joined #openstack-infra11:56
*** trown|outtypewww is now known as trown12:05
*** ramishra has quit IRC12:07
*** pbourke has quit IRC12:09
*** pbourke has joined #openstack-infra12:10
*** ramishra has joined #openstack-infra12:10
*** markvoelker has quit IRC12:13
*** panda is now known as panda|lunch12:15
*** jchhatbar has joined #openstack-infra12:28
*** jchhatbar has quit IRC12:28
*** janki has quit IRC12:29
*** janki has joined #openstack-infra12:29
*** kgiusti has joined #openstack-infra12:31
*** ansmith has joined #openstack-infra12:34
*** niedbalski has joined #openstack-infra12:36
*** rlandy has joined #openstack-infra12:38
*** tpsilva has joined #openstack-infra12:38
openstackgerritTakashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition  https://review.openstack.org/61444012:41
*** boden has joined #openstack-infra12:41
*** yamamoto has quit IRC12:44
*** jpena is now known as jpena|lunch12:46
*** sshnaidm|ruck is now known as sshnaidm|bbl12:50
*** quiquell is now known as quiquell|lunch12:51
*** ifat_afek has quit IRC12:56
*** bhavikdbavishi has quit IRC12:57
*** panda|lunch is now known as panda13:02
*** yamamoto has joined #openstack-infra13:04
*** e0ne has joined #openstack-infra13:06
*** erlon has joined #openstack-infra13:08
*** lennyb has quit IRC13:10
*** lennyb has joined #openstack-infra13:12
*** eharney has joined #openstack-infra13:13
*** mriedem has joined #openstack-infra13:14
*** markmcd has joined #openstack-infra13:15
*** mriedem is now known as ash_williams13:15
*** auristor has quit IRC13:15
*** auristor has joined #openstack-infra13:16
*** agopi is now known as agopi|brb13:18
fricklerinfra-root: https://docs.openstack.org/infra/system-config/sysadmin.html#accessing-clouds doesn't work for me because I'm not a member of the admin group. is this a bug in the documentation or in the group setup?13:19
fricklerxinliang: I tested from an ubuntu node in london and accessing the mirror seems pretty fast to me. do you happen to know whether the issue is affecting only debian nodes?13:20
odyssey4meHi folks - I'd like to understand how many nodepool-launcher/nodepool-builder hosts are implemented for OpenStack-Infra, and whether they're implemented per provider, etc - is this documented somewhere?13:22
*** janki has quit IRC13:22
*** janki has joined #openstack-infra13:23
*** agopi|brb has quit IRC13:24
fricklerodyssey4me: these should be the configurations: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool13:26
fricklerinfra-root: I'm also stumbling over the lack of osc being installed globally on bridge, does everyone manage that in their own local venv?13:27
*** sambetts|afk is now known as sambetts13:28
*** slaweq has quit IRC13:29
*** eharney_ has joined #openstack-infra13:30
odyssey4mefrickler ah, thanks - that's the info I'm after13:30
*** jamesmcarthur has joined #openstack-infra13:31
fricklerinfra-root: even creating a venv fails, maybe I'm missing some very basic understanding here ... http://paste.openstack.org/show/733698/13:32
fungifrickler: i have a python3-built ~/launch-env with openstackclient installed in it, yes13:34
*** eharney_ has quit IRC13:35
fungiand now i don't remember how i bootstrapped it, because as you note there's no global installation of virtualenv and even `python3 -m venv` is broken because ubuntu strips that out to a separate package we haven't installed13:36
*** yamamoto has quit IRC13:37
*** yamamoto has joined #openstack-infra13:38
*** yamamoto has quit IRC13:38
Shrewsfrickler: fungi: as root, python3 /usr/lib/python3/dist-packages/virtualenv.py --python=python3 ~/my-venv13:39
Shrewsit's weird13:39
Shrewsi don't understand why we don't just install python-venv13:40
fricklerShrews: thanks, that worked. and yes, it's weird indeed. unless mordred or clarkb have some reason not to do this, I'll propose an update13:41
SpamapSHey, isn't there a way to make a particular zuul change jump to the top of the queue?13:43
ShrewsSpamapS: zuul promote, iirc13:44
Shrewsnever used it myself13:45
fungiShrews: strange that virtualenv is installed globally in dist-packages but there's no entrypoint in the standard path13:45
Shrewsfungi: yes, that is part of said weirdness13:46
* Shrews shakes his head and rolls eyes at python things13:46
fungiyeah, `zuul promote --tenant=xyzzy --pipeline=plugh --changes=3133,7`13:46
*** jpena|lunch is now known as jpena13:48
*** agopi has joined #openstack-infra13:48
*** fuentess has joined #openstack-infra13:50
SpamapSShrews: ty13:51
openstackgerritIvan Kolodyazhny proposed openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard  https://review.openstack.org/61450713:51
dhellmannis anyone else seeing issues with etherpad right now? it was a little flakey yesterday but I don't know if that's just a safari issue or if it's something else. right now it's *very* slow and I got some sort of JS kernel error on one page load13:51
dhellmannspecifically with https://etherpad.openstack.org/p/tc-topics-jlm-stein-berlin13:51
*** ifat_afek has joined #openstack-infra13:52
*** jamesmcarthur has quit IRC13:54
fricklerdhellmann: I can confirm that it looks stuck for me now, too. was working fine 2h ago13:54
*** jamesmcarthur has joined #openstack-infra13:55
dhellmannI just got: TimeoutError: The operation timed out. in https://etherpad.openstack.org/static/js/require-kernel.js13:55
dhellmannso not a "kernel error" (I hit reload too fast earlier) just a timeout13:55
*** kaiokmo has quit IRC14:00
*** quiquell|lunch is now known as quiquell14:00
*** yamamoto has joined #openstack-infra14:00
*** jamesmcarthur has quit IRC14:01
*** eernst has joined #openstack-infra14:01
*** sthussey has joined #openstack-infra14:05
hwoaranggood day infra-core. Could anyone reserve an opensuse150 node from the openstack-ansible-functional-opensuse-150 job on https://review.openstack.org/#/c/570543/ ? thank you14:09
Shrewsdhellmann: fungi: etherpad apache error log is reporting "scoreboard is full, not at MaxRequestWorkers"14:09
Shrewsnot quite sure what that means, but maybe restarting apache will help?14:09
fungidhellmann: looks like memory pressure again. upgrading to xenial seems to have possibly resulted in etherpad consuming a lot more cache memory14:09
fungii was going to see about building a replacement with an 8gb flavor14:10
fungiguess i'll start on that here shortly14:11
Shrewsfungi: is there a temp fix?14:11
mordredShrews, fungi: maybe restarting will help short term?14:12
Shrewsi'll restart apache....14:13
fungirestarting etherpad fixes it briefly from what i can tell, probably because everyone gets disconnected and not all of them reconnect14:13
*** shrasool has quit IRC14:13
fungithough yes, apache also seems to be consuming a fair amount of virtual memory14:14
*** d0ugal has quit IRC14:14
Shrewsapache restarted. fungi, is there a separate process for restarting etherpad service?14:15
fungiand we have a stray abiword process. how did we go about disabling abiword on the old server? anyone remember?14:15
fungiShrews: yes, etherpad-lite14:15
Shrewsdhellmann: try now14:16
*** hasharLunch is now known as hashar14:17
*** e0ne has quit IRC14:19
mordredfungi: hrm. I think maybe just uninstalling abiword?14:19
*** ifat_afek has quit IRC14:26
*** e0ne has joined #openstack-infra14:26
*** d0ugal has joined #openstack-infra14:27
*** dpawlik has quit IRC14:29
*** dpawlik has joined #openstack-infra14:30
*** d0ugal has quit IRC14:32
*** dpawlik has quit IRC14:34
*** ifat_afek has joined #openstack-infra14:34
*** dayou has quit IRC14:36
fungiit's been a while, but i think we had to do something uglier because etherpad refused to start if abiword wasn't present or something like that14:38
*** kaiokmo has joined #openstack-infra14:40
*** e0ne has quit IRC14:41
*** slaweq has joined #openstack-infra14:41
*** e0ne has joined #openstack-infra14:44
openstackgerritPierre Riteau proposed openstack/diskimage-builder master: Increase size of EFI system partition  https://review.openstack.org/61452614:45
*** quiquell is now known as quiquell|brb14:48
openstackgerritIvan Kolodyazhny proposed openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard  https://review.openstack.org/61450714:51
fungithe etherpad-lite restart seems to have freed a fair amount of cache memory: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=117&rra_id=all14:53
fungihowever the system cpu activity is persisting: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=114&rra_id=all14:53
clarkbfungi: can we try the hwe kernel first?14:54
fungioh, right, we wanted to try the hwe kernel14:54
fungishould that be a manual install or puppeted?14:54
clarkbfungi: I believe we puppet it for the zuul executors14:54
clarkbbut the resource usage and memory issues remind me a lot of the zuul executors14:54
clarkbhowever if we just want to one off test it we can probably do that by hand14:55
clarkbthen add to puppet if it fixes or leave as is and rebuild bigger if not14:55
*** d0ugal has joined #openstack-infra14:55
clarkbas for abiword I believe its a part of the etherpad config file (you set a path to the abiword executable)14:55
*** quiquell|brb is now known as quiquell14:56
clarkbhttps://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n930 puppet for hwe kernel15:00
*** ianychoi has quit IRC15:00
fungii'm looking to see whether they eventually made it possible to just disable abiword integration completely15:00
clarkbfungi: https://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/templates/etherpad-lite_settings.json.erb#n48 says we set that value to null to disable abiword15:00
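A minimal sketch of the config change described above, assuming the settings are already loaded into a dict. Setting the `abiword` key to null is what the linked template comment describes; the helper name and the sample settings are hypothetical, and a real settings file may contain comments that plain `json.loads` would reject.

```python
import json


def disable_abiword(settings):
    """Return a copy of the settings with abiword integration turned off."""
    updated = dict(settings)
    updated["abiword"] = None  # serialized as JSON null
    return updated


# Hypothetical settings fragment for illustration only.
settings = {"title": "Etherpad", "abiword": "/usr/bin/abiword"}
print(json.dumps(disable_abiword(settings), indent=2))
```

The etherpad service would still need a restart afterwards for the change to take effect.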
*** gyee has joined #openstack-infra15:01
*** slaweq has quit IRC15:01
mordredfungi, clarkb: maybe we should uninstall abiword and then symlink /usr/bin/abiword to /bin/false15:04
clarkbor just disable it in the config?15:04
*** rossella_s has quit IRC15:04
clarkbI don't know how many people use the import/export functionality15:04
mordredoh - duh. I can't read15:05
*** jistr is now known as jistr|call15:05
*** rossella_s has joined #openstack-infra15:07
*** rpioso|afk is now known as rpioso15:11
*** rossella_s has quit IRC15:12
*** ccamacho has quit IRC15:13
*** dave-mccowan has joined #openstack-infra15:19
*** ash_williams has left #openstack-infra15:21
*** ash_williams has joined #openstack-infra15:21
openstackgerritJens Harbott (frickler) proposed openstack-infra/system-config master: Make the pip3 role really install something  https://review.openstack.org/61454515:23
fricklerShrews: fungi: ^^ I think this should fix the venv issue15:23
fricklerthis is called from https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/bridge.yaml15:24
*** dave-mccowan has quit IRC15:25
*** dayou has joined #openstack-infra15:25
*** bhavikdbavishi has joined #openstack-infra15:26
*** rossella_s has joined #openstack-infra15:27
*** slaweq has joined #openstack-infra15:30
lennybHi, I see that my nodepool process takes a lot of CPU. Is there any config I can tune for that?15:34
*** jistr|call is now known as jistr15:35
fungilennyb: which process specifically? nodepool consists of several daemons15:38
*** quiquell is now known as quiquell|off15:38
lennybfungi: nodepoold. http://paste.openstack.org/show/733706/  I see it takes a lot of CPU resource during delete and building15:43
fungilennyb: ahh, i suspect this is because it's polling for deletion and build. there are a lot of idle cycles in those threads though so the load measurements aren't necessarily a reliable indicator of cpu cycles consumed by them15:44
lennybfungi: I see. Thanks.15:46
fungibasically they wake up periodically to check and see if nodes need to be built, if nodes have been built, if nodes need to be deleted, and if nodes have been deleted and then take corresponding actions based on those queries, and go back to sleep again15:47
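The wake, check, act, sleep cycle described above can be sketched as a generic polling loop. This is not nodepool's actual code; every name here is hypothetical, and the cloud queries are stand-in callables.

```python
import time


def poll_once(building, deleting, is_ready, is_gone):
    """One wake-up: drop nodes whose build or delete has completed."""
    ready = {n for n in building if is_ready(n)}
    gone = {n for n in deleting if is_gone(n)}
    return building - ready, deleting - gone


def run_worker(building, deleting, is_ready, is_gone,
               interval=1.0, sleep=time.sleep, max_cycles=10):
    """Poll until nothing is pending (or max_cycles is hit); return cycles used."""
    cycles = 0
    while (building or deleting) and cycles < max_cycles:
        building, deleting = poll_once(building, deleting, is_ready, is_gone)
        cycles += 1
        if building or deleting:
            sleep(interval)  # idle between checks; little real CPU work here
    return cycles


# Simulated cloud: the single node becomes ready on the second check.
checks = {"count": 0}


def ready_on_second_check(node):
    checks["count"] += 1
    return checks["count"] >= 2


used = run_worker({"node-1"}, set(), ready_on_second_check, lambda n: True,
                  sleep=lambda seconds: None)  # no-op sleep keeps the demo fast
print("cycles used:", used)
```

Most of the wall-clock time in such a loop is spent sleeping, which is why a load figure for the thread is a poor proxy for CPU cycles actually consumed.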
*** kopecmartin is now known as kopecmartin|off15:49
lennybfungi: I have a lot of short jobs ( 5-10min) that are deleting nodes and create them again. When there are too many commits nodepool takes too much time to delete and add a new node15:51
lennybI meant nodepool is deleting nodes after 5-10mins of job run15:51
*** markvoelker has joined #openstack-infra15:52
funginode deletion can be fairly resource-intensive on your cloud, so it's not surprising that it might take some time for deletion requests to be fulfilled. does the debug log indicate nodepool is requesting deletion right away once the job completes?15:54
fungialso, what version of nodepool is this? modern nodepool would have a separate nodepool-launcher daemon to handle that activity15:55
fungiand there's been a lot of work put into making it more efficient so it doesn't block other threads15:55
*** carl_cai has quit IRC15:56
*** eharney has quit IRC15:59
clarkbif it's nodepoold, it's land-before-time nodepool16:03
clarkband that nodepool was less efficient with threads because it was harder to split it up into many processes16:03
clarkbmordred: fungi Shrews ok I'm going to do breakfast things now that my meeting is over, but when I get back I'd like to see if we can make the etherpad changes to start getting to a happier place maybe16:04
mordredclarkb: ++16:04
*** agopi is now known as agopi|food16:04
*** jamesmcarthur has joined #openstack-infra16:06
corvusclarkb, fungi, mordred: opendev nameservers -- https://review.openstack.org/61006616:06
mordredcorvus: you're an opendev nameserver16:08
mordred(+2)16:08
corvusno -- 104.239.140.165 is an opendev nameserver!16:09
fungiclarkb: we want to apt install linux-image-virtual-hwe-16.04 from the look of it?16:11
fungii can do that now and then reboot the server and see what it does to the cacti graphs16:12
clarkbya that looks right16:12
fungialso, noted that killing abiword causes nodepool to immediately respawn it16:12
corvusetherpad?16:13
fungiyes, sorry, etherpad16:13
funginodepool on the brainz16:13
corvusit *also* respawns things that are killed16:13
mordredcorvus: oh - while you were out, we added abiword support to nodepool16:13
*** janki has quit IRC16:13
mordredcorvus: if you don't know what that means, I can't explain it :)16:13
clarkbthen get a bunch of people to type on a pad or three and if not fixed disable abiword?16:13
fungino, while you were out we added pdf export functionality to nodepool with an abiword backend ;)16:13
mordredfungi: I enjoy pdf-exporting my vm images16:14
corvusmordred: figures -- you know i'd only approve adding libreoffice if i were around.16:14
AJaegermordred: let's use svg, please ;)16:14
corvusmordred: what are you going to do if we need a spreadsheet?  huh?16:14
mordredcorvus: panic16:14
fungiAJaeger: could we do xbm maybe?16:15
fungii sort of miss it16:15
corvushow many people in channel are thinking about whether we can have nodepool automatically update an ethercalc spreadsheet with its node list?16:15
*** ifat_afek has quit IRC16:15
fungii'd need a few more drinks first16:15
*** ifat_afek has joined #openstack-infra16:16
corvusspreadsheets are an underutilized user interface paradigm.16:16
AJaegerfungi: let's wait until next vacation of corvus, ok?16:17
fungiwfm16:17
fungii'll just go back to gorging on candy corn16:17
clarkbcorvus is going to get hired to write vssheet16:18
*** bhavikdbavishi has quit IRC16:18
*** bhavikdbavishi has joined #openstack-infra16:19
fungi#status log manually installed linux-image-virtual-hwe-16.04 on etherpad01.openstack.org to test out theory about cache memory and system cpu utilization16:19
AJaegerclarkb: is that an IBM product?16:19
openstackstatusfungi: finished logging16:19
fungishould i #status notice the etherpad server reboot? or jdi?16:21
clarkbAJaeger: I was thinking of visual studio -> vsscode, excel -> vssheet16:21
AJaegerah16:21
clarkbfungi I think I just did the upgrade16:22
fungia'ight... here goes nothing (and hopefully here comes something)16:22
fungiit's pinging again16:23
fungii can ssh into it16:23
fungiWelcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-38-generic x86_64)16:23
fungithat looks like the kernel we wanted16:24
fungietherpad nodejs process is running16:24
fungii've attempted to refresh some etherpads i had up before the reboot and they all load16:25
fungikeeping an eye on cacti graphs to see if this made any noticeable difference16:25
*** hashar is now known as hasharAway16:27
*** kencjohnston has quit IRC16:27
clarkbas soon as tea is made I'll add a firefox and chrome client to some pads16:27
*** ifat_afek has quit IRC16:29
*** kencjohnston has joined #openstack-infra16:29
*** ifat_afek has joined #openstack-infra16:30
clarkbhttps://etherpad.openstack.org/p/clarkb-test is my long running test pad if anyone else wants to join there and add a bunch of writers/readers16:34
openstackgerritMerged openstack-infra/system-config master: Add opendev nameservers (2/2)  https://review.openstack.org/61006616:35
clarkbfungi: looks like cpu usage on etherpad01 hasn't changed16:37
*** pcaruana|elisa| has quit IRC16:37
fungii concur16:37
openstackgerritMerged openstack-infra/zuul master: Filter file comments for existing files  https://review.openstack.org/61316116:37
fungiand cache memory use may have even gone up a little16:37
clarkbthat said my test etherpad doesn't feel slow16:37
clarkbfungi: I would disable abiword as the next thing. Maybe put etherpad01.openstack.org in the emergency file and update the config by hand and restart the service?16:39
clarkb(debugging quickly over waiting for changes to go through the gate then we forget)16:39
fungii think slowness might correspond to the (granted minuscule) bumps in the swap usage graph: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=119&rra_id=all16:39
clarkbah16:39
clarkbso cpu itself likely not the indicator. That would make sense. Should we let it run on hwe for a bit then and see if it swaps?16:40
fungii wonder if it's somehow paging out something it shouldn't16:40
*** trown is now known as trown|lunch16:40
*** gfidente is now known as gfidente|afk16:40
clarkbiirc part of the old kernel problem was some issue with memory management16:40
fungiyeah, let's give it a couple days16:40
clarkbit didn't consider portions of freeable memory freeable iirc leading to more swapping16:41
fungibut ideally we'll upgrade to a bigger server before berlin if we decide it's warranted16:41
clarkb++16:41
*** Dobroslaw has quit IRC16:41
fungilast thing i want is to return to the bad old days of spending summit week troubleshooting/upgrading etherpad16:41
clarkbsmcginnis: ^ fyi we've made a change to the etherpad server. Please let us know if you still see the slow behavior that you've seen recently16:42
clarkbsmcginnis: will be valuable info16:42
*** eernst has quit IRC16:42
smcginnisOK, great. Will do.16:42
fungidhellmann: ^16:42
fungiinfra-root: looks like we have a bunch of +2 votes on 607699 at this point. mind if i self-approve so it's in place by the next time we decide to do a gerrit restart?16:45
fungigranted gerrit restarts aren't as common an occurrence for us any longer. last one was nearly 3 months ago now!16:45
clarkbfungi: I say go for it16:45
clarkbfungi: and we can probably restart gerrit next week while everyone is on a plane (if not sooner)16:46
fungiyeah, i could do it in the middle of tonight if i remember16:46
smcginnisclarkb: Still seeing that slow behavior. Or it's died.16:46
smcginnisOh, finally loaded.16:46
fungiyeah, getting sluggish for me too16:47
clarkbya my test pad worked for my writer but my reader had to refresh to see the data16:48
clarkbfungi: stop abiword next?16:48
fungiyeah, need to figure out how to go about making that happen16:49
clarkbnote memory use is minimal16:49
clarkbimplying it is a cpu not memory issue16:49
fungikill it and it just comes right back. hallowe'en zombie abiword16:49
clarkbfungi: you need to update the config (see link above) then restart etherpad16:49
*** gyee has quit IRC16:50
fungicurious why we didn't set it to null in our config already...16:52
*** rh-jelabarre has quit IRC16:52
clarkbfungi: because it provides functionality16:52
clarkbwe probably don't need that functionality if that is the source of the bug16:52
fungiwell, i thought we'd done something to prevent abiword from spawning in the past. did we undo that at some point i guess?16:53
*** chandankumar is now known as chkumar|off16:53
erlon@all, has anyone had problems SSHing into a VM with public keys? The keys are just not being uploaded to the instance after it boots. I have checked and the key_data is in the logs when the instance is being created16:53
clarkbI don't recall16:53
clarkberlon: it is up to the VM image to run software of some sort like cloud-init or glean to write the keys to disk for whatever users you have16:54
clarkberlon: what images are you using?16:54
erlonclarkb, tried both last cirros and last ubuntu16:54
erlonboth have cloud init16:54
fungiokay, i've set "abiword" : null in /opt/etherpad-lite/etherpad-lite/settings.json and restarted the etherpad-lite service16:55
erlonthe cirros one, I could log using a password and create .ssh/authorized_keys manually, then it worked16:55
dhellmannclarkb , fungi : thanks. I'm still seeing reconnections on the etherpads I have open :-/16:55
fungithe node process is running again but no abiword process this time16:55
fungidhellmann: well, i just this moment restarted it, so you will16:55
dhellmannyeah, duh, I just made that connection16:55
clarkberlon: for ubuntu you are ssh'ing as the ubuntu user?16:56
clarkberlon: and this is the ubuntu image published by ubuntu?16:56
clarkbso many connections!16:56
dhellmannfungi: one day I'll learn to read all the way to the bottom of the buffer before replying16:56
fungiheh16:56
fungino worries!16:56
*** diablo_rojo has joined #openstack-infra16:56
*** shardy has quit IRC16:56
*** shardy_ has quit IRC16:56
*** bhavikdbavishi1 has joined #openstack-infra16:57
erlonclarkb, let me check16:57
fungii reloaded half a dozen pads i had open in my browser tabs and they all came up, though did take a little time to load16:58
*** ash_williams is now known as ash_sawing16:59
*** rh-jelabarre has joined #openstack-infra16:59
*** agopi|food is now known as agopi16:59
fungiclarkb: cache memory usage has dropped drastically so far!16:59
*** bhavikdbavishi has quit IRC16:59
*** bhavikdbavishi1 is now known as bhavikdbavishi16:59
*** jpich has quit IRC16:59
clarkbfungi: great, fwiw we use the same version of nodejs on trusty and xenial so the kernel or abiword (basically external to node and etherpad itself) seem the most likely causes17:00
fungithe kernel seems to be the cache memory explanation17:00
erlonclarkb, yeap, used ubuntu. I downloaded it from the official repo17:00
erlonsame with cirros17:00
fungithat hasn't come back up since ~16:25z17:00
clarkberlon: ok you need to login as the ubuntu user, but it should use cloud init17:00
fungiwhich corresponds to the reboot with the hwe kernel17:01
clarkbfungi: gotcha17:01
fungiwill need a few minutes to be able to see if dropping abiword has impacted system cpu utilization17:01
erlonclarkb, any idea how cloud-init works, how it is triggered, and who does that? I might need to dig deeper17:01
fungithough looking at top there's still a fair bit17:01
erlonI'm installing a fresh devstack env as well to see if the problem still happens17:02
clarkberlon: it runs as a system init service, reads metadata and config drive then configures the host. That is the other thing to check. Do you have metadata server enabled? If not make sure config drive is enabled17:03
*** udesale has quit IRC17:03
erlonclarkb, hmmm, good point to check! I know that nova has this option enabled: enabled_apis = osapi_compute,metadata17:05
erlonclarkb, ill check for the other things, thanks a lot for now17:06
dhellmannfungi : fwiw, I'm still seeing reconnections. it sounds like you're still monitoring and it wasn't clear if you were expecting a change in behavior, yet.17:07
clarkbdhellmann: I've not seen a reconnection on my end, but I am multitasking17:07
clarkbfungi: considering it is reconnections. Maybe it has to do with apache and connection number defaults?17:07
fungidhellmann: if it's on attempting to use a tab you had open from before the 16:55z service restart then you might see that17:08
clarkbI thought we configured apache properly for that already, but maybe things have changed sufficiently between releases that we aren't17:08
*** e0ne has quit IRC17:08
fungipossible those options changed between the apache versions on trusty and xenial?17:09
clarkbya or maybe default worker type so we configure the wrong one?17:10
clarkbthat said I noticed the slowdown when smcginnis mentioned it but haven't noticed any since abiword was disabled17:10
*** eharney has joined #openstack-infra17:10
clarkbcacti graphs say no change in cpu though17:10
fungii concur17:11
fungithat could be a red herring, but it looks like it picked up ~2 weeks ago17:11
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=115&rra_id=all shows some failed connections recently17:11
*** Swami has joined #openstack-infra17:12
clarkbthere are ~ 469 connections according to netstat17:13
fungithe established tcp connections graph looks fairly consistent with before the upgrade to xenial though17:13
clarkbya but the TCP Open Stats are different17:14
clarkbthere is a jump from less than one per second to ~4 on average?17:14
clarkb[Wed Oct 31 17:13:56.918371 2018] [mpm_event:error] [pid 1651:tid 140460829304704] AH00485: scoreboard is full, not at MaxRequestWorkers17:15
clarkbI don't know what that means, but it looks very suspicious17:15
fungiahh, yeah tcp open does jump drastically after the xenial upgrade17:15
fungii find discussions of AH00485 going all the way back to 2015-09-08 17:14:07 in the channel log17:16
fungimost recently 2018-02-06 18:55:5417:16
fungii need to pop out to lunch before the sb meeting, but can look into this some more when i get back17:17
clarkbya I think this is the issue that led us to restart apache prior to summits/PTGs17:17
clarkbbasically apache doesn't always recycle workers properly. It's possible that newer apache on xenial is worse?17:17
fungivery well could be17:17
clarkbfungi: one specific suggestion on the internets is to set maxconnectionsperchild to 0 so that it never tries to recycle workers17:17
fungiokay, headed out, will be back in an hour-ish17:17
clarkbok I see at least part of the bug17:19
clarkb/etc/apache2/conf-enabled/connection-tuning is a broken symlink17:19
clarkbalso it should have a .conf after it17:19
clarkband we set maxrequestsperchild to 0 there already17:20
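A minimal sketch of the check described above, in Python. It assumes the stock Debian/Ubuntu layout, where Apache only includes `conf-enabled/*.conf`, so an entry that is a dangling symlink or lacks the `.conf` suffix is never applied. The function names are hypothetical and this is not the puppet fix itself.

```python
import os
import tempfile


def is_broken_symlink(path):
    """Return True if path is a symlink whose target does not exist."""
    # os.path.islink inspects the link itself; os.path.exists follows it.
    return os.path.islink(path) and not os.path.exists(path)


def enabled_conf_problems(conf_dir):
    """List (name, problem) pairs for entries that would not be loaded."""
    problems = []
    for name in sorted(os.listdir(conf_dir)):
        full = os.path.join(conf_dir, name)
        if is_broken_symlink(full):
            problems.append((name, "broken symlink"))
        if not name.endswith(".conf"):
            problems.append((name, "missing .conf suffix"))
    return problems
```

Running `enabled_conf_problems("/etc/apache2/conf-enabled")` on the affected host would be expected to flag the `connection-tuning` entry for both reasons.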
* clarkb goes to the puppet to see how to fix17:20
*** xek has quit IRC17:20
*** shrasool has joined #openstack-infra17:22
openstackgerritClark Boylan proposed openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration  https://review.openstack.org/61459517:23
clarkblet's see if ^ makes this a happier server17:23
clarkbthings we have learned, apache connection tuning is broken. New kernel makes memory usage look saner. Abiword was maybe completely unrelated17:24
AJaegerconfig-core, two small cleanups for review, please: https://review.openstack.org/614440 and https://review.openstack.org/61450717:25
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects  https://review.openstack.org/61314317:26
*** ginopc has quit IRC17:29
openstackgerritDoug Hellmann proposed openstack-infra/yaml2ical master: add monthly recurrence options  https://review.openstack.org/60868017:29
corvusi single-core approved the etherpad symlink fix; it's pretty evident it's typo-broken17:30
clarkbcorvus: thanks17:30
corvusi have manually installed links on etherpad so i can hit http://localhost/server-status17:30
corvusand yeah, the number of workers is tiny.17:31
*** shrasool has quit IRC17:33
corvuswe're using the event mpm, and we're maxing out at 128 concurrent requests.  when the correct tuning config goes into place, we'll have 4096 slots.17:33
corvus(concurrent requests == concurrent etherpad users because of websocket)17:33
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition  https://review.openstack.org/61444017:36
corvusmnaser: may i please trouble you for a rdns update when you have a moment?  162.253.55.16 and 2604:e100:1:0:f816:3eff:fe2c:7447 should point to ns2.opendev.org17:37
*** electrofelix has quit IRC17:38
*** florianf is now known as florianf|afk17:38
*** aojea has quit IRC17:40
openstackgerritMerged openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard  https://review.openstack.org/61450717:40
*** harlowja has joined #openstack-infra17:44
*** gyee has joined #openstack-infra17:44
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Set ansible python version for opendev nameservers  https://review.openstack.org/61460717:46
*** sambetts is now known as sambetts|afk17:46
corvusdo we have puppet working on bionic yet?17:46
clarkbfungi: ^17:47
clarkbcorvus: I am not aware of it if we do. The puppet 4 work is far enough along that if puppet 4 works on bionic (it was 5 though iirc) that should all just work17:47
clarkbiirc fungi was installing puppet 3 on bionic to mixed success. The thing to try is installing puppet 4 next probably17:48
corvuswhat's the state of running containers?17:48
clarkbcorvus: aiui the missing piece for that is image builds for various services. If we already have images being built (eg nodepool and zuul) then the ansible is ready to run containers instead of puppet17:49
*** ifat_afek has quit IRC17:49
clarkbbut I don't think anyone has done that for a service yet17:50
corvusclarkb: we need https://review.openstack.org/605585 first?17:50
clarkbianw has some changes in progress for graphite though17:50
clarkbcorvus: ya we likely need some of the things ianw has been working on17:50
corvusit looks like we might be a couple of months out from having working opendev nameservers.17:51
clarkbcorvus: why?17:51
corvusclarkb: we still can't manage bionic machines17:52
clarkbcorvus: can we deploy on xenial?17:52
corvusclarkb: we could.  i'm not willing to.17:52
corvusif someone else wants to pick up the work, i'm happy to hand it over.17:52
clarkbok. Are you willing to help fungi or ianw with the various bits of work that make bionic viable?17:52
corvusclarkb: i thought i have been :)17:52
*** panda is now known as panda|off17:53
clarkbok I just wasn't sure how ready to go you wanted bionic (or deployments in general) to be17:53
corvusi'm not sure what that means17:54
clarkbjust that part of the effort here includes making xenial + puppet viable medium term so that we don't have to do an immediate cutover of all services at once (this is the puppet 4 portion of the spec)17:55
clarkband if that doesn't interest you I am asking how much of the other aspects of the spec interest you (sounds like you are happy to help on the things that make bionic viable which is mostly ansible + containers)17:56
corvusright, i'm happy to use either puppet or docker, whichever works on bionic.  i spent several days trying to get puppet to work on bionic, and was not successful.  you may remember such changes as 564898 and 564891.17:57
clarkbre ansible + containers in particular ianw's work with graphite is the place to start I think. That includes improving the startup time of ansible with inventory group generation so that ansible can be run more often without a multi minute startup time17:57
clarkb(ianw updated mordred's inventory plugin change and intends on merging that soonish and watching it I think)17:58
corvusclarkb: i'd be happy with a multi-minute startup time.  though i have also done work on that, having identified the error which caused that.17:58
corvusclarkb: i've found much more success in the ansible area, having implemented most of the testing stack which is now being used to show the error in http://logs.openstack.org/85/605585/4/check/system-config-run-docker/8dc36c0/job-output.txt.gz17:59
corvusclarkb: it seems to be *that* is the blocker17:59
corvuser, seems to 'me'17:59
clarkbya interestingly it almost looks like a package bug (or maybe we are installing the wrong package)18:01
clarkbthe docker daemon failed to start after it was installed by apt18:01
corvusis there a reason we're installing from upstream docker?18:01
*** derekh has quit IRC18:01
corvuswell, the spec says so :)18:01
corvusbut it doesn't say why18:01
clarkbcorvus: that is the default for mordred's install-docker role (it also supports installing from distro). I imagine it is because docker moves relatively quickly compared to the distros18:02
clarkband there are features/bugfixes that we may want through that channel18:02
corvuswell, on that note, the test results are a month old, i'll recheck :)18:02
clarkbseems reasonable; if it is a package bug hopefully they have fixed it18:03
*** jpena is now known as jpena|off18:04
*** trown|lunch is now known as trown18:04
corvusclarkb: weirdly, the other job failure for that change is the base ansible tests, which failed to install puppet on every node type *except* bionic.18:05
clarkbhuh maybe something in it is breaking apt?18:06
clarkbbah now it is in merge conflict18:06
*** ash_sawing has quit IRC18:07
clarkblooks like the parent finally passes tests though https://review.openstack.org/#/c/602385/12 so maybe a rebase is in order (it depends on old ps of the yamlgroup change)18:07
clarkbI'll see how difficult that rebase is18:09
*** mriedem has joined #openstack-infra18:11
*** mriedem is now known as ash_williams18:11
openstackgerritClark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role  https://review.openstack.org/60558518:12
clarkbI think that should get it tested again18:13
*** kjackal has quit IRC18:16
*** ralonsoh has quit IRC18:17
*** dtantsur is now known as dtantsur|afk18:17
Shrewsclarkb: uh, don't we already have an install-docker role?18:22
clarkbShrews: yes in zuul-jobs. This is for infra deployments from bridge.openstack.org18:22
Shrewsoh, looked over the repo part18:23
*** noama has quit IRC18:25
clarkbcorvus: mordred ianw I left a comment on the yamlgroup change https://review.openstack.org/#/c/602385/12 I think there is a class of bug in the groups.yaml conversion that we need to fix18:30
*** bhavikdbavishi has quit IRC18:30
clarkbif someone else can review that too and make sure I am not wrong that would be appreciated18:30
*** e0ne has joined #openstack-infra18:31
fungiokay, burrito conquered, back for more troubleshooting18:32
fungionce i catch up on scrollback anyway18:32
fungireviewing etherpad apache tuning fix18:32
*** e0ne has quit IRC18:32
*** psachin has quit IRC18:36
fungiyeah, i concur with the analysis there18:39
openstackgerritEmilien Macchi proposed openstack-infra/project-config master: Add publish jobs for ansible-role-openstack-operations  https://review.openstack.org/61461618:40
fungiwe can likely roll back the abiword disablement and return to the default xenial kernel if that's the only fix we end up needing18:40
EmilienMhey infra, just a bit of an update on tripleo ci18:40
clarkbya, understanding the kernel memory stuff a bit better would probably be good though18:40
EmilienMwe recently switched a bunch of our CI jobs to use podman instead of docker18:40
EmilienMand a regression in podman caused our gate to be very unstable18:41
EmilienMwe figured that out last night18:41
EmilienMwe are reverting these jobs to docker now18:41
clarkbEmilienM: out of curiosity do you know when? (so that we can correlate things to graphite and elasticsearch, etc)18:41
EmilienMso the situation should be better18:41
EmilienMclarkb: error started on Oct 26th18:41
clarkbEmilienM: also re podman, don't forget it wasn't using the infra caching mirror either18:41
clarkb(so there were likely layers of failures there)18:41
EmilienMso for that problem we have retries, that mitigated the problem18:42
EmilienMbut right now we hit an actual bug inside podman (with selinux)18:42
clarkbexcept that makes jobs run longer18:42
clarkbthen they timeout18:42
EmilienMright, we reported the bug too...18:42
clarkbwe shouldn't go back to podman without fixing that item too18:42
EmilienMI agree very much18:42
clarkbEmilienM: when was the podman switch? was that the 26th too?18:43
EmilienMclarkb: before18:44
EmilienMclarkb: we think it's a regression in a recent version18:44
EmilienMhttps://github.com/containers/libpod/issues/173918:44
clarkbEmilienM: can we get that date too (since podman not using the mirrors would also likely lead to failures, its more data we can compare against our stats and logs with)18:44
EmilienMhttps://review.openstack.org/#/c/614537/18:44
EmilienMthere are 3 patches that enabled podman in CI18:45
EmilienMand all 3 are being reverted now (squashed)18:45
clarkblooks like october 12 for the first podman switch18:45
clarkbEmilienM: what is fs010 ?18:47
EmilienMclarkb: container-multinode18:47
EmilienMour most popular/run job18:47
EmilienMtripleo-ci-centos-7-containers-multinode18:47
clarkbok so its shorthand for a job?18:47
clarkbgot it18:47
EmilienMclarkb: https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)18:49
EmilienMwe are taking drastic measures, we really understand the trouble we make here18:49
ssbarneamordred : if you have few minutes, I could use some hints on adding the docker job. please have a look at https://review.openstack.org/#/c/613672/ -- mainly I am doing something stupid in playbooks/molecule.yml (run)18:51
fungiEmilienM: it's really and truly appreciated18:51
clarkbEmilienM: yup, I'm just trying to make sure I understand what the various moving pieces are and fs010 stood out to me as possibly important and I didn't understand it :)18:52
*** jamesmcarthur has quit IRC18:52
fungiwe were a bit stressed initially about publishing that data at all without making sure we didn't accidentally raise angry mobs coming after the tripleo team18:52
*** jamesmcarthur has joined #openstack-infra18:52
fungiwe know you've got a lot going on, and are glad you're able to work on improving this18:52
AJaegerEmilienM: please see my -1 on https://review.openstack.org/#/c/614570/118:54
EmilienMAJaeger: ack18:55
clarkbssbarnea: left a couple notes for you18:55
EmilienMAJaeger: no need since there is no gate section anymore18:55
EmilienMAJaeger: right?18:55
AJaegerEmilienM: https://review.openstack.org/#/c/614593/118:55
AJaegerEmilienM: you still have gates via the template18:56
AJaegerSo, if you want these in the same queue, those two lines (gate/queue) are needed.18:56
*** sshnaidm|bbl is now known as sshnaidm|ruck18:57
EmilienMmwhahaha: ^18:57
EmilienMAJaeger: ack, thx18:57
*** jamesmcarthur has quit IRC18:57
mwhahahaAJaeger: got it, thanks18:57
corvusAJaeger: weren't we asking folks to keep queue lines in project-config?18:57
AJaegercorvus: only integrated18:57
AJaegercorvus: others are optional...18:58
AJaegercorvus: https://docs.openstack.org/infra/manual/creators.html#shared-queues-for-cross-project-testing18:58
corvusok18:59
corvusi guess the tripleo queue contains only tripleo projects18:59
AJaegercorvus: yep19:00
AJaegercorvus: hope so ;)19:00
AJaegermwhahaha: LGTM now.19:00
ssbarneaclarkb: comments made of gold. thanks!19:00
AJaegermwhahaha, EmilienM , those are indeed drastic measures...19:00
AJaegermwhahaha, EmilienM, you could use templates for these jobs - makes it easier to change in a single place for next time...19:01
mwhahahaAJaeger: we do have them, but we were using specific file rules in the other projects19:01
*** jamesmcarthur has joined #openstack-infra19:01
mwhahahaAJaeger: so they are inheriting from the template but we have to tweak it19:01
mwhahahai guess we could merge the file sections up into the template however19:02
AJaegermwhahaha: ah - yes, you could merge the files into the template19:02
*** jamesmcarthur has quit IRC19:08
*** jamesmcarthur has joined #openstack-infra19:09
*** erlon has quit IRC19:13
openstackgerritMerged openstack-infra/system-config master: Hyperlink task footers  https://review.openstack.org/60769919:13
*** nicolasbock has joined #openstack-infra19:19
clarkbcorvus: https://zuul.openstack.org/stream/3bb43f13f58e4b9f9e813d7bb38dbca0?logfile=console.log is the system-config docker test19:19
*** jcoufal has joined #openstack-infra19:21
*** ash_williams is now known as mriedem_away19:25
*** diablo_rojo has quit IRC19:26
*** erlon has joined #openstack-infra19:27
*** erlon has quit IRC19:35
openstackgerritClark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role  https://review.openstack.org/60558519:39
Shrewsanyone familiar enough with ansible-lint to understand what's happening here? http://logs.openstack.org/23/605823/6/check/openstack-zuul-jobs-linters/6bd2ec9/job-output.txt.gz#_2018-10-31_19_02_58_63994319:40
clarkbShrews: http://logs.openstack.org/23/605823/6/check/openstack-zuul-jobs-linters/6bd2ec9/job-output.txt.gz#_2018-10-31_19_02_55_359983 the error is a few lines above19:41
*** jamesmcarthur has quit IRC19:44
*** e0ne has joined #openstack-infra19:44
*** betherly has joined #openstack-infra19:45
*** jamesmcarthur has joined #openstack-infra19:45
*** e0ne has quit IRC19:45
*** diablo_rojo has joined #openstack-infra19:45
openstackgerritClark Boylan proposed openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration  https://review.openstack.org/61459519:46
clarkbcorvus: fungi ^ yay for testing19:46
*** jcoufal has quit IRC19:48
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul-jobs master: Add role to install kubernetes  https://review.openstack.org/60582319:49
*** betherly has quit IRC19:49
*** apetrich has quit IRC19:50
mordredcorvus, clarkb: sorry- was afk earlier - the reason I was pushing for installing from docker upstream and not from distro is, yes, upstream moves a bit quicker than distro19:50
*** erlon has joined #openstack-infra19:51
mordredcorvus, clarkb: but also, the dockerhub urls and mirroring are different between xenial docker and upstream docker - so I figured since docker was new to us, go ahead and start with current state of the art rather than start with an old version19:51
clarkboh right that is why we have two different caching mirror endpoints for docker19:52
mordredyeah19:52
clarkbI do like the idea of not needing to change that19:52
mordredprobably should have included things like that in the spec19:52
clarkbmordred: fwiw maybe you can look at https://review.openstack.org/605585 failures and see if my latest patchset makes sense? I noticed that apt-utils missing was in the list of things where it failed so added that19:53
clarkbmordred: might need to add that to the z-j role too if that fixes stuff19:53
*** lujinluo has joined #openstack-infra19:53
mordredlooking19:54
mordredclarkb: what is apt-utils for? (just out of curiosity)19:55
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951519:55
*** lujinluo has quit IRC19:56
clarkbmordred: https://packages.debian.org/sid/apt-utils19:56
*** lujinluo has joined #openstack-infra19:56
clarkbmordred: my hunch is that docker-ce uses one of those tools, dpkg delays because no apt-utils then fails to start the daemon because some config is missing19:56
mordrednod.19:56
mordredworth a stab - although we haven't had the issue with the install-docker role in zuul jobs :(19:57
clarkbmordred: I wonder if its a bionic thing? (I'm guessing that job is bionic only)19:57
corvusclarkb: was it the same error?19:57
clarkbcorvus: yes same error again19:58
corvusclarkb: yes, that job runs on bionic19:58
mordredweird - cause the pbrx jobs run on bionic too19:58
clarkbmordred: maybe bindep pulls in something on those jobs we don't pull in on these?19:59
clarkbI imagine that distro packages that need apt-utils just dep on them but docker-ce doesn't19:59
clarkb?19:59
mordredclarkb: we don't run bindep in the pbrx job ... this is weird20:01
fungipbrx relies on bindep to tell it what else to include in the image though, right?20:01
* clarkb wanders off to grab lunch now that mordred is looking at it20:03
mordredfungi: it does- but it does that inside of a container - so it doesn't bindep on the host itself20:03
*** apetrich has joined #openstack-infra20:04
fungijust making sure20:04
corvusclarkb, mordred: http://logs.openstack.org/26/610726/3/check/pbrx-build-zuul-containers/7372a00/ara-report/result/bcba7357-f2d6-4527-9b8f-68cb2e34e17e/20:04
mordredyeah. its weird20:04
ianwclarkb: hrm, i think you're right on the name matching.  it would be great if we could unit test this ...20:04
*** dmsimard has quit IRC20:05
corvusclarkb, mordred: that makes me think apt-utils is a red herring20:06
ianwclarkb / amorin : hrm, checking in on our port clearing screen, I'm seeing "HttpException: 500: Server Error for url: https://network.compute.gra1.cloud.ovh.net/v2.0/ports, {"NeutronError": {"message": "Request Failed: internal server error while processing your request.","20:06
mordredcorvus: I think so too20:06
ianwand yeah, ovh graphs not looking promising .... http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 ... do we know about this?20:07
mordredcorvus: looking through the output, it says the control process died and we should run systemctl status docker.service to see what's up20:07
mordredcorvus: maybe we should toss in a systemctl status docker.service line to get some logs?20:07
mordredoh - wait a sec20:07
clarkbianw: news to me20:08
mordredwe're installing an empty daemon.json file20:08
mordredhttps://review.openstack.org/#/c/605585/6/playbooks/roles/install-docker/templates/daemon.json.j220:09
mordredcompared to20:09
mordredhttp://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/install-docker/templates/daemon.json.j220:09
corvusthat is a difference between that and zuul-jobs20:09
mordredso maybe instead of an empty file - we should just not write one out at all20:09
clarkbmordred: ++20:10
corvusit's also not actually empty20:10
mordredyeah20:10
corvusit has some things that don't look like valid json20:10
mordredlet me make that change real quick20:10
clarkbbut json doesn't have comments20:10
clarkbso ya20:10
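To illustrate the point about JSON having no comment syntax, here is a minimal Python sketch. The sample strings are hypothetical and are not the contents of the actual daemon.json.j2 template; the helper name is also made up for illustration.

```python
import json


def is_valid_json(text):
    """Return True if text parses as a JSON document, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical inputs: a comment line before the object, and an empty file.
commented = "# managed by ansible\n{}\n"
empty = ""
plain = '{"registry-mirrors": []}'
```

With a strict parser such as Python's `json` module, both the commented and the empty input are rejected, while the plain object parses. Whether the docker daemon itself tolerates such input is not something this sketch establishes.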
corvusregarding the nameservers -- maybe we don't want to use docker for those anyway?  maybe we want to just use os packages with ansible?20:11
clarkbcorvus: possibly. The big win with containers is greater control over versions of software. But for dns servers maybe we want to be conservative about that20:12
fungii'm unsure what containers buy us in the case where the software is already well-established and stably packaged on our distro of preference20:12
mordredyeah. I think just ansible on bionic for nameservers is probably fine20:13
*** ramishra has quit IRC20:13
openstackgerritMonty Taylor proposed openstack-infra/system-config master: Initial port of install-docker role  https://review.openstack.org/60558520:13
clarkbfungi: one other thing is collocation of services. But I doubt we want to collocate dns servers anyway20:13
mordredclarkb, corvus, fungi, ianw: removed the config file, and also re-removed apt-utils20:13
fungicompare to mm3 where getting the dependencies right is a beast even on relatively recent distro versions and there are bugs even then which are fixed upstream but not yet packaged20:13
mordredclarkb: ++20:13
corvusi'll see how fast i can port this puppet to ansible20:14
mordredI think in this case the thing we want to install is very straight forward- and containers will just make it more complex20:14
*** apetrich has quit IRC20:14
fungieven i, as not a container fan, see how deploying mm3 from the upstream-provided container images is a win20:14
mordredfungi: ++20:14
mordredfungi: there are times when it's an excellent format to allow upstreams to provide 'packaging' :)20:14
clarkbianw: looks to be gra1 specific?20:16
fungii'm very confused by gerrit... can anybody see why https://review.openstack.org/600472 claims to have been last updated 24 hours ago?20:16
ianwclarkb: yeah, maybe it's on the way back20:17
ianwfungi: yeah ... i had one of those the other day too, couldn't see any ci comments or anything20:18
clarkbfungi: nothing stands out to me. I think the one thing that might not add a log item on the change itself is hitting the little x next to a vote value?20:18
*** ansmith has quit IRC20:18
ianwclarkb: it could be on the way back up.  https://review.openstack.org/#/c/613196/ would help in the nodepool logs matching things up :)20:19
clarkbianw: ok I'm going to finish lunch then take a look at ^ as well as make sure that etherpad fix gets in20:20
*** hasharAway has quit IRC20:20
*** jamesmcarthur has quit IRC20:24
*** trown is now known as trown|outtypewww20:29
*** e0ne has joined #openstack-infra20:30
*** imacdonn has quit IRC20:34
*** imacdonn has joined #openstack-infra20:34
*** jamesmcarthur has joined #openstack-infra20:38
*** jamesmcarthur has quit IRC20:43
*** e0ne has quit IRC20:50
*** kgiusti has left #openstack-infra20:50
openstackgerritMerged openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration  https://review.openstack.org/61459520:55
*** shrasool has joined #openstack-infra20:58
fungitime for me to get presentable and head to a party. happy hallowe'en all! (except for the aussies who got to celebrate it while i was asleep)20:58
clarkbfungi: enjoy! and thank you for helping with the etherpad stuff20:58
clarkbdon't scare too many small children20:59
funginp, i see the fix hasn't landed yet but i'll try to check back in on it later20:59
clarkbfungi: it just merged above, I'll make sure it gets applied20:59
*** lujinluo has quit IRC21:02
*** boden has quit IRC21:03
*** lujinluo has joined #openstack-infra21:03
mordredsimilar to fungi - I have hit the point in the day where I'm focusing on dealing with halloween - although in my case it's getting prepared for my friend ben to terrify the children21:04
clarkbmordred: we should be able to clean up the openstack sdk stuff tomorrow?21:06
clarkbmordred: thats an item I'd like to get off my list so let me know when that is ready21:06
*** shrasool has quit IRC21:07
*** lujinluo has quit IRC21:07
*** lujinluo has joined #openstack-infra21:08
*** betherly has joined #openstack-infra21:27
*** betherly has quit IRC21:32
clarkbsmcginnis: dhellmann apache should've just restarted with the connection tuning fix21:35
clarkbsmcginnis: dhellmann: if you can watch out for reconnections/slowness from this point forward that will be helpful21:35
smcginnisWill do!21:36
clarkbactually we've leaked the old symlink so I am going to delete that and restart apache again just to be double sure21:36
clarkband done21:37
mordredclarkb: yes - that's all ready to go for tomorrow21:41
*** erlon has quit IRC21:53
dhellmannclarkb : thanks, I'll let you know21:54
clarkbmordred: ianw thinking about the yamlgroup thing a bit more, maybe we want it to use regexes rather than unix shell globs?21:54
clarkbwe'd still need to rewrite the yaml file but at least python regexes are more common with our user base?21:54
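[Editor's sketch] To make the glob-versus-regex question above concrete, here is a minimal comparison of the two matching styles on hostnames. The host list and patterns are hypothetical, and this is not the yamlgroup plugin's code; it only shows how the same selection might be written each way with the Python standard library.

```python
import fnmatch
import re

# Hypothetical inventory hostnames, for illustration only.
HOSTS = [
    "adns1.opendev.org",
    "etherpad01.openstack.org",
    "etherpad-dev01.openstack.org",
]


def match_glob(pattern, hosts):
    """Return the hosts that match a unix shell glob (case-sensitive)."""
    return [h for h in hosts if fnmatch.fnmatchcase(h, pattern)]


def match_regex(pattern, hosts):
    """Return the hosts that a Python regex matches in full."""
    compiled = re.compile(pattern)
    return [h for h in hosts if compiled.fullmatch(h)]


# A broad glob picks up both etherpad hosts.
print(match_glob("etherpad*", HOSTS))
# Narrowing to production hosts, written as a glob and as a regex.
print(match_glob("etherpad[0-9]*.openstack.org", HOSTS))
print(match_regex(r"etherpad\d+\.openstack\.org", HOSTS))
```

The regex form has to escape the dots, which is the main cost of switching; the glob form cannot express "one or more digits" exactly, only "a digit followed by anything".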
*** eharney has quit IRC21:55
clarkbalso the system config docker change still fails but now it fails because docker adds iptables rules we need to account for in the tests. I'll take a look at that shortly if it helps21:58
*** apetrich has joined #openstack-infra21:58
*** kjackal has joined #openstack-infra22:02
imacdonnjust had a check job fail, seemingly due to an ssh host key change ... do I just recheck it, or is something b0rked ?22:03
imacdonn2018-10-31 19:53:21.337694 | ubuntu-xenial |   "msg": "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe22:03
imacdonnfingerprint for the ED25519 key sent by the remote host is\nSHA256:qy9kk9BhbbsgVtMOsJqSUJ9Tb4yBjdGg+xwO90qFj9s.\r\nPlease contact your system administrator.\r\nAdd correct host key in /var/lib/zuul/builds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts to get rid of this message.\r\nOffending ED25519 key in /var/lib/zuul/builds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts:1\r\n  remove with:\r\n  ssh-keygen -f \"/var/lib/zuul/bui22:03
imacdonnlds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts\" -R 104.130.222.138\r\nED25519 host key for 104.130.222.138 has changed and you have requested strict checking.\r\nHost key verification failed.\r\nrsync: connection unexpectedly closed (0 bytes received so far) [Receiver]\nrsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]\n",22:03
*** agopi is now known as agopi|brb22:04
clarkbimacdonn: can you link to the log file?22:06
imacdonnclarkb: http://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/22:06
openstackgerritClark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role  https://review.openstack.org/60558522:07
*** agopi|brb has quit IRC22:08
clarkbok bindep logs don't indicate anything would restart the sshd (possibly picking up a new host key)22:12
imacdonnCould two hosts be trying to use the same IP address?22:13
clarkbimacdonn: it is theoretically possible. We'd need someone from rackspace to tell us if that was the case22:14
clarkbthe main ssh portion of the job (the run.yaml bit) uses a controlpersist managed connection that is started early in the job22:14
clarkbthe rsync file synchronizations do not so will start a new connection22:14
clarkbIf the daemon has indeed changed its key we would notice this way; another host with the same IP is another possibility, though I would expect problems with the main controlpersist connection in that case22:15
clarkbis it possible the ssl-cert package triggered a refresh of the sshd host keys?22:16
*** jamesmcarthur has joined #openstack-infra22:16
clarkbthat would surprise me, but maybe it will do that for improving security reasons?22:16
imacdonnit could be an ARP race ... and the cache expired while the main part of the job was running .. just an idea22:16
clarkbimacdonn: ya22:16
imacdonnit'd surprise me too, if a package update replaced host keys ... and not in a good way  ;)22:17
clarkbimacdonn: if this persists maybe the thing to check is have the tox collect logs role run an always block that cats the host key?22:17
clarkbsince that should go over the existing ssh connection in theory it would work22:17
clarkband if that doesn't show anything wrong with the server then escalate with rax22:17
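[Editor's sketch] If the host key were dumped as suggested above, it could be compared against the fingerprint printed in the ssh warning (SHA256:qy9kk9BhbbsgVtMOsJqSUJ9Tb4yBjdGg+xwO90qFj9s). The helper below is a minimal sketch, assuming the dumped text is a standard OpenSSH public key line of the form "keytype base64blob [comment]"; the function name is made up for illustration.

```python
import base64
import hashlib


def sha256_fingerprint(pubkey_line):
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Assumes the line looks like "ssh-ed25519 AAAA... comment", i.e. the
    second whitespace-separated field is the base64 key blob.
    """
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected 'keytype base64blob [comment]'")
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

A fingerprint computed from the test node's own key that differs from the one in the warning would suggest the rsync connection reached a different machine, which fits the duplicate-IP theory better than a regenerated key.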
clarkbcorvus: ^ could we use the control persist connection for rsync too? should make it more reliable overall22:18
*** diablo_rojo has quit IRC22:18
*** gyee has quit IRC22:19
clarkbwe revoke sudo in that job so the cinder unittests themselves shouldn't be able to break ssh22:20
clarkb(we might have a hard time dumping the host key file in that case too :(22:20
ianwmordred / clarkb : could do regexes.  i'm just writing up something that hopefully does a "unit" type test on it, so we can at least stress some edge cases22:20
corvuscatching up22:20
*** jamesmcarthur has quit IRC22:21
clarkbcompletely unrelated but apparently rhel 7.6 released todayish. So we should be on the lookout for centos 7.6 weirdness in the next few days (whenever that ends up being available)22:24
clarkbhttp://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/ara-report/file/9580cb64-f384-4351-a50e-fb2a1dae3968/#line-34 that synchronize runs before the fatal synchronize22:26
*** rlandy is now known as rlandy|bbl22:26
clarkbit succeeds because failed_when is set to false. Side effect of some tox runs not creating a venv dir. Pointing it out because the first failure is earlier than the obvious later failure22:26
imacdonnah, interesting22:27
corvusclarkb, imacdonn: i believe the ansible synchronize module does not use the ssh control connection22:28
corvusclarkb, imacdonn: https://github.com/ansible/ansible/issues/8473 is interesting....22:28
corvusclarkb, imacdonn: apparently someone wrote a patch to support that: https://github.com/cognifloyd/ansible/tree/synchronize_control_path22:29
corvusi don't know if they made a pr out of that22:29
imacdonnThat may be a good idea ... but still, there's some underlying issue if the host key apparently changed, right ?22:30
clarkbreading http://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/ara-report/result/48249e3d-f90f-481f-b2de-965582c1443c/ I think it does successfully get the subunit file then fails on the html file22:30
clarkbwhich is really weird22:30
corvusimacdonn: yes; i think clarkb's suggestions for debugging that are warranted :)22:30
clarkbI think if ^ is true that lends some weight to the idea that there is an arp fight happening22:30
clarkbbecause it goes rsync fail (tox logs), rsync success (subunit), rsync fail (html)22:31
clarkbcloudnull: if you aren't trick or treating, ^ may be just enough evidence of fight over IP addresses in rackspace? I'd be curious to hear your thoughts on that22:32
*** gfidente|afk has quit IRC22:36
clarkbEmilienM: mwhahaha do you know who we can get to review https://review.openstack.org/#/c/614305/ a low hanging fruit change on the job cleanup front22:38
clarkbwill remove non voting jobs from puppet gates22:38
mwhahahai can22:40
EmilienMclarkb: looking now22:40
*** agopi|brb has joined #openstack-infra22:44
*** mriedem_away has quit IRC22:44
*** diablo_rojo has joined #openstack-infra22:50
clarkbok https://review.openstack.org/#/c/605585/ to install docker on control plane servers passes now22:52
ianwclarkb: i'm just seeing if some version of the unit-testing infrastructure we have for ansible roles in zuul-jobs makes sense translated to system-config for testing this yaml matching plugin22:55
*** erlon has joined #openstack-infra22:59
mwhahahaalright, i'm confused why the test-release-openstack-python3 jobs are still running on https://review.openstack.org/#/c/613621/ since I thought https://review.openstack.org/#/c/614245/ should stop them from running. thoughts?22:59
clarkbmwhahaha: reading the inheritance path at http://logs.openstack.org/21/613621/6/check/test-release-openstack-python3/6b814f2/zuul-info/inventory.yaml it seems the template definition path for that job on that branch/repo is winning out over the override there23:03
clarkbmwhahaha: you may need to drop that template then define an equivalent for instack that only runs on master23:03
clarkbit is also possible that this represents a bug in zuul job config parsing23:03
*** erlon has quit IRC23:04
clarkbas a human I would agree that the explicit job list should override the slightly more implicit list provided by the template23:04
openstackgerritJames E. Blair proposed openstack-infra/system-config master: WIP: configure adns1.opendev.org via ansible  https://review.openstack.org/61464823:16
corvusclarkb: ^ how's that for a start?23:16
*** erlon has joined #openstack-infra23:19
*** tpsilva has quit IRC23:22
corvusmwhahaha, clarkb: once a job is selected to run, you can't un-select it.  the configuration in 614245 is applying a (null) branch variant to the job already selected by the template.23:25
corvusmwhahaha, clarkb: if a template runs a job you don't want to run, don't use the template :)23:25
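[Editor's sketch] The behaviour corvus describes can be pictured with a toy model. This is not Zuul's implementation; it assumes only that job entries from the template and from the project's own list are pooled per job name, and that a job runs if any pooled entry matches the branch. The dict layout and the branch names are hypothetical; the job name is the one from the discussion above.

```python
def selected_jobs(template_jobs, project_jobs, branch):
    """Toy model: return the names of jobs that would run on a branch.

    Each job entry is a dict with a "name" and an optional "branches"
    list; an entry without "branches" matches every branch.
    """
    variants = {}
    for source in (template_jobs, project_jobs):
        for entry in source:
            variants.setdefault(entry["name"], []).append(entry)

    def matches(entry):
        branches = entry.get("branches")
        return branches is None or branch in branches

    # A job runs if ANY entry for it matches; a later, narrower entry
    # adds a variant but cannot cancel the template's entry.
    return sorted(name for name, vs in variants.items() if any(matches(v) for v in vs))


template = [{"name": "test-release-openstack-python3"}]
project = [{"name": "test-release-openstack-python3", "branches": ["master"]}]

print(selected_jobs(template, project, "stable/rocky"))  # still selected via the template
print(selected_jobs([], project, "stable/rocky"))        # not selected once the template is dropped
```

In this model the only way to stop the job on the stable branch is the one corvus gives: stop using the template and list the wanted jobs explicitly.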
clarkbcorvus: re DNS left a couple comments where the ansible might not quite do what we want. Otherwise that looks pretty straight forward23:25
clarkbcorvus: thank you for confirming the template job config thing23:26
corvusclarkb: good catches23:29
*** kjackal has quit IRC23:31
*** kjackal has joined #openstack-infra23:31
*** ianychoi has joined #openstack-infra23:35
openstackgerritAmy Marrich (spotz) proposed openstack-infra/irc-meetings master: Remove WoO meeting  https://review.openstack.org/61464923:35
*** kjackal has quit IRC23:39
mwhahahaAh23:43
mwhahahaThanks I'll fix tomorrow23:43
*** diablo_rojo has quit IRC23:45
*** Swami has quit IRC23:48
