openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Update test coverage for openSUSE-minial to 15.0 https://review.openstack.org/610436 | 00:01 |
---|---|---|
openstackgerrit | Merged openstack/diskimage-builder master: Turn on quiet mode when logfile specified https://review.openstack.org/612865 | 00:15 |
*** pall is now known as pabelanger | 00:18 | |
pabelanger | clarkb: mwhahaha: weshay: rfolco|rover: re: docker downloads, there are some tripleo jobs that are not set up to use the reverse proxy for docker, so they will be directly downloading from docker.io each time. For example: http://logs.openstack.org/61/613661/4/gate/tripleo-puppet-ci-centos-7-undercloud-containers/bf1c2d8/logs/undercloud/etc/docker/daemon.json.txt.gz | 00:19 |
pabelanger | not sure what other jobs are affected, but somebody should audit them all and confirm they are set up properly | 00:19 |
pabelanger | also, last time I looked into docker caching of images, they would expire after 4 hours. Which meant our miss rate in apache logs was very high. I did reach out to somebody at docker to see why it was so low, but had a hard time syncing with them. | 00:20 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Update devstack test to Fedora 28 https://review.openstack.org/614375 | 00:21 |
clarkb | pabelanger: right, but we'd verify the image hasn't updated, not redownload it, after the 4 hour expiry | 00:23 |
clarkb | so that's not great but not terrible | 00:23 |
pabelanger | yah, I cannot remember everything about the last time I debugged, but there were some images that would redownload multiple times a day | 00:24 |
pabelanger | I was at the point of starting to modify expire headers for testing, but that of course would break http standards | 00:25 |
pabelanger | The other option somebody could try would be to use the new zuul_return for pausing a job: https://zuul-ci.org/docs/zuul/user/jobs.html?highlight=zuul_return#pausing-the-job and have a top-level job first download all images once, then child jobs pull from it. However, that should be no different than our current reverse proxy setup | 00:26 |
pabelanger | that also means, job redesign | 00:26 |
clarkb | before we go making drastic changes I think we need to be able to measure this stuff | 00:28 |
clarkb | how long downloads take, how large images are, what the theoretical best network throughput is for that size of file and how we compare (assume gigabit networking probably) | 00:28 |
pabelanger | Oh, yah. Images in tripleo are large too, something like 2.5GB of data for each job | 00:29 |
pabelanger | fat containers for sure | 00:29 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Update devstack test to Fedora 28 https://review.openstack.org/614375 | 00:30 |
mwhahaha | clarkb: k I'll look into how that got dropped. Thanks | 00:30 |
pabelanger | anyways, wanted to share some info from the last time we had this issue. But I do think an audit of jobs will be helpful | 00:30 |
clarkb | right and if we are already near the best case for network transfer of that size we aren't going to speed up much without changing images. But we need the data | 00:30 |
*** pabelanger is now known as pall | 00:33 | |
*** gyee has quit IRC | 00:36 | |
*** markvoelker has quit IRC | 00:49 | |
*** markvoelker has joined #openstack-infra | 00:50 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Fix meeting ID for Cyborg https://review.openstack.org/612676 | 00:51 |
*** longkb has joined #openstack-infra | 00:52 | |
*** anteaya has quit IRC | 00:52 | |
*** ssbarnea has quit IRC | 00:52 | |
*** markvoelker has quit IRC | 00:55 | |
openstackgerrit | Merged openstack/diskimage-builder master: Remove python3 legacy jobs https://review.openstack.org/614047 | 01:18 |
mwhahaha | clarkb: heads up, that job isn't using docker which is why that docker config file is not configured with a mirror. I will have to track down where the podman mirror config is tho | 01:24 |
*** ccamacho has quit IRC | 01:24 | |
*** ccamacho has joined #openstack-infra | 01:25 | |
clarkb | that would certainly explain it | 01:27 |
*** eernst has joined #openstack-infra | 01:50 | |
*** eernst has quit IRC | 01:59 | |
*** hongbin has joined #openstack-infra | 02:02 | |
*** diablo_rojo has quit IRC | 02:02 | |
*** vtapia has joined #openstack-infra | 02:06 | |
*** erlon has quit IRC | 02:33 | |
*** ykarel has joined #openstack-infra | 02:34 | |
*** rh-jelabarre has quit IRC | 02:39 | |
*** ramishra has quit IRC | 02:43 | |
*** ramishra has joined #openstack-infra | 02:49 | |
*** markvoelker has joined #openstack-infra | 02:51 | |
*** psachin has joined #openstack-infra | 02:52 | |
*** mrsoul has joined #openstack-infra | 02:53 | |
*** carl_cai has joined #openstack-infra | 03:02 | |
*** markvoelker has quit IRC | 03:24 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Add ubuntu-systemd-container operating-system element https://review.openstack.org/563748 | 03:29 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Add systemd-containers functional tests https://review.openstack.org/614051 | 03:29 |
*** bhavikdbavishi has joined #openstack-infra | 03:31 | |
*** bhavikdbavishi has quit IRC | 03:31 | |
*** bhavikdbavishi has joined #openstack-infra | 03:33 | |
*** udesale has joined #openstack-infra | 03:47 | |
*** janki has joined #openstack-infra | 03:47 | |
*** hongbin has quit IRC | 04:01 | |
*** dave-mccowan has quit IRC | 04:18 | |
*** markvoelker has joined #openstack-infra | 04:21 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing https://review.openstack.org/614402 | 04:48 |
*** markvoelker has quit IRC | 04:54 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing https://review.openstack.org/614402 | 05:05 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: Add prepare-workspace-git role https://review.openstack.org/613036 | 05:11 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Updated bindep to cover for MacOS requirements https://review.openstack.org/613727 | 05:19 |
*** ykarel has quit IRC | 05:21 | |
*** ykarel has joined #openstack-infra | 05:21 | |
*** noama has joined #openstack-infra | 05:22 | |
*** armax has joined #openstack-infra | 05:28 | |
*** pcaruana|elisa| has joined #openstack-infra | 05:29 | |
*** pcaruana|elisa| has quit IRC | 05:37 | |
*** janki has quit IRC | 05:46 | |
*** janki has joined #openstack-infra | 05:49 | |
*** markvoelker has joined #openstack-infra | 05:51 | |
*** ykarel has quit IRC | 05:51 | |
*** bhavikdbavishi has quit IRC | 05:56 | |
*** bhavikdbavishi has joined #openstack-infra | 06:08 | |
*** carl_cai has quit IRC | 06:10 | |
*** e0ne has joined #openstack-infra | 06:16 | |
*** e0ne has quit IRC | 06:18 | |
*** janki has quit IRC | 06:18 | |
*** markvoelker has quit IRC | 06:26 | |
*** dpawlik has joined #openstack-infra | 06:27 | |
*** dpawlik has quit IRC | 06:28 | |
*** dpawlik has joined #openstack-infra | 06:28 | |
*** quiquell|off is now known as quiquell | 06:37 | |
*** janki has joined #openstack-infra | 06:37 | |
*** bhavikdbavishi has quit IRC | 06:38 | |
*** bhavikdbavishi has joined #openstack-infra | 06:39 | |
*** armax_ has joined #openstack-infra | 06:42 | |
*** armax has quit IRC | 06:42 | |
*** armax_ is now known as armax | 06:42 | |
*** ifat_afek has joined #openstack-infra | 06:49 | |
*** ifat_afek has quit IRC | 06:55 | |
*** ifat_afek has joined #openstack-infra | 06:55 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] put test nodes into groups for testing https://review.openstack.org/614402 | 07:03 |
*** chkumar|off is now known as chandankumar | 07:13 | |
*** ccamacho has quit IRC | 07:15 | |
*** ccamacho has joined #openstack-infra | 07:15 | |
*** markvoelker has joined #openstack-infra | 07:24 | |
openstackgerrit | Merged openstack-infra/zuul master: Collect docker logs after quick-start run https://review.openstack.org/613027 | 07:25 |
*** xek has joined #openstack-infra | 07:38 | |
*** ykarel has joined #openstack-infra | 07:42 | |
*** pcaruana|elisa| has joined #openstack-infra | 07:45 | |
*** shardy has joined #openstack-infra | 07:45 | |
*** shardy_ has joined #openstack-infra | 07:46 | |
*** ykarel has quit IRC | 07:49 | |
*** ykarel has joined #openstack-infra | 07:50 | |
*** ykarel has quit IRC | 07:56 | |
*** markvoelker has quit IRC | 07:57 | |
*** ykarel has joined #openstack-infra | 08:02 | |
*** sshnaidm|off is now known as sshnaidm|ruck | 08:03 | |
*** florianf|afk is now known as florianf | 08:05 | |
*** ykarel has quit IRC | 08:10 | |
*** e0ne has joined #openstack-infra | 08:12 | |
*** bhavikdbavishi has quit IRC | 08:16 | |
*** jtomasek has joined #openstack-infra | 08:18 | |
*** ykarel has joined #openstack-infra | 08:27 | |
*** bhavikdbavishi has joined #openstack-infra | 08:28 | |
*** jpich has joined #openstack-infra | 08:28 | |
*** ykarel has quit IRC | 08:28 | |
*** ykarel has joined #openstack-infra | 08:29 | |
*** larainema has quit IRC | 08:30 | |
*** ralonsoh has joined #openstack-infra | 08:31 | |
*** ykarel_ has joined #openstack-infra | 08:31 | |
*** ykarel has quit IRC | 08:34 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin https://review.openstack.org/602385 | 08:34 |
*** ykarel has joined #openstack-infra | 08:34 | |
ianw | mordred, infra-root: ^ i think that should be CI happy now; i had to add an extra groups file | 08:38 |
*** ykarel_ has quit IRC | 08:38 | |
*** ykarel has quit IRC | 08:40 | |
*** ssbarnea has joined #openstack-infra | 08:40 | |
*** derekh has joined #openstack-infra | 08:51 | |
*** jpena|off is now known as jpena | 08:52 | |
openstackgerrit | Takashi NATSUME proposed openstack-infra/project-config master: Remove placement-api-ref jobs https://review.openstack.org/614435 | 08:54 |
*** markvoelker has joined #openstack-infra | 08:54 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Logs stats for nodepool automated cleanup https://review.openstack.org/614074 | 08:54 |
*** panda|off is now known as panda | 09:01 | |
openstackgerrit | Takashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition https://review.openstack.org/614440 | 09:02 |
*** d0ugal has joined #openstack-infra | 09:02 | |
*** hasharAway is now known as hashar | 09:03 | |
openstackgerrit | Takashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition https://review.openstack.org/614440 | 09:03 |
*** mordred[m] has quit IRC | 09:13 | |
*** pcaruana|elisa| has quit IRC | 09:13 | |
*** niedbalski has quit IRC | 09:15 | |
*** pcaruana|elisa| has joined #openstack-infra | 09:15 | |
*** bhavikdbavishi has quit IRC | 09:15 | |
*** Qiming has quit IRC | 09:16 | |
*** e0ne has quit IRC | 09:16 | |
*** bhavikdbavishi has joined #openstack-infra | 09:16 | |
*** GDPR has quit IRC | 09:18 | |
*** Qiming has joined #openstack-infra | 09:19 | |
*** GDPR has joined #openstack-infra | 09:19 | |
openstackgerrit | Takashi NATSUME proposed openstack-infra/project-config master: Replace placement-api-ref jobs for nova project https://review.openstack.org/614435 | 09:23 |
*** e0ne has joined #openstack-infra | 09:25 | |
*** markvoelker has quit IRC | 09:27 | |
*** gfidente has joined #openstack-infra | 09:37 | |
*** bhavikdbavishi has quit IRC | 09:47 | |
*** bhavikdbavishi has joined #openstack-infra | 09:48 | |
*** lennyb has quit IRC | 09:48 | |
*** lennyb has joined #openstack-infra | 09:49 | |
*** ianychoi has quit IRC | 09:51 | |
*** ianychoi has joined #openstack-infra | 09:52 | |
*** dpawlik_ has joined #openstack-infra | 09:55 | |
*** dpawlik_ has quit IRC | 09:56 | |
*** dpawlik has quit IRC | 09:57 | |
*** dpawlik has joined #openstack-infra | 09:57 | |
*** xinliang has joined #openstack-infra | 10:00 | |
*** kjackal has joined #openstack-infra | 10:01 | |
xinliang | ianw: Do you have time to look at this bug: https://bugs.linaro.org/show_bug.cgi?id=4035 | 10:02 |
openstack | bugs.linaro.org bug 4035 in Default "[uk cloud] Wget fetching from mirror.london.linaro-london.openstack.org is more slower and unstable than deb.debian.org" [Enhancement,Unconfirmed] - Assigned to gema.gomez-solano | 10:02 |
xinliang | ianw: london cloud's mirror repo is slower and unstable | 10:02 |
*** kopecmartin|off is now known as kopecmartin | 10:05 | |
*** electrofelix has joined #openstack-infra | 10:07 | |
*** bhavikdbavishi has quit IRC | 10:11 | |
*** apetrich has quit IRC | 10:12 | |
*** markvoelker has joined #openstack-infra | 10:24 | |
*** apetrich has joined #openstack-infra | 10:27 | |
*** rossella_s has joined #openstack-infra | 10:31 | |
*** yamamoto has quit IRC | 10:34 | |
*** yamamoto has joined #openstack-infra | 10:34 | |
*** yamamoto has quit IRC | 10:39 | |
*** yamamoto has joined #openstack-infra | 10:40 | |
*** yamamoto has quit IRC | 10:41 | |
*** shrasool has joined #openstack-infra | 10:42 | |
*** dtantsur|afk is now known as dtantsur | 10:49 | |
*** markvoelker has quit IRC | 10:58 | |
*** e0ne has quit IRC | 11:15 | |
*** e0ne has joined #openstack-infra | 11:15 | |
*** jtomasek has quit IRC | 11:16 | |
*** e0ne has quit IRC | 11:27 | |
*** yamamoto has joined #openstack-infra | 11:28 | |
*** roman_g has joined #openstack-infra | 11:28 | |
*** ramishra has quit IRC | 11:28 | |
*** hashar is now known as hasharLunch | 11:28 | |
*** udesale has quit IRC | 11:32 | |
*** yamamoto has quit IRC | 11:35 | |
*** yamamoto has joined #openstack-infra | 11:35 | |
*** carl_cai has joined #openstack-infra | 11:36 | |
*** jtomasek has joined #openstack-infra | 11:38 | |
*** ramishra has joined #openstack-infra | 11:40 | |
*** rossella_s has quit IRC | 11:40 | |
*** longkb has quit IRC | 11:41 | |
*** bhavikdbavishi has joined #openstack-infra | 11:43 | |
*** rh-jelabarre has joined #openstack-infra | 11:46 | |
*** markmcd has quit IRC | 11:49 | |
*** ramishra has quit IRC | 11:49 | |
openstackgerrit | Merged openstack-infra/project-config master: Replace placement-api-ref jobs for nova project https://review.openstack.org/614435 | 11:51 |
*** udesale has joined #openstack-infra | 11:54 | |
*** markvoelker has joined #openstack-infra | 11:54 | |
*** rossella_s has joined #openstack-infra | 11:55 | |
*** ramishra has joined #openstack-infra | 11:56 | |
*** trown|outtypewww is now known as trown | 12:05 | |
*** ramishra has quit IRC | 12:07 | |
*** pbourke has quit IRC | 12:09 | |
*** pbourke has joined #openstack-infra | 12:10 | |
*** ramishra has joined #openstack-infra | 12:10 | |
*** markvoelker has quit IRC | 12:13 | |
*** panda is now known as panda|lunch | 12:15 | |
*** jchhatbar has joined #openstack-infra | 12:28 | |
*** jchhatbar has quit IRC | 12:28 | |
*** janki has quit IRC | 12:29 | |
*** janki has joined #openstack-infra | 12:29 | |
*** kgiusti has joined #openstack-infra | 12:31 | |
*** ansmith has joined #openstack-infra | 12:34 | |
*** niedbalski has joined #openstack-infra | 12:36 | |
*** rlandy has joined #openstack-infra | 12:38 | |
*** tpsilva has joined #openstack-infra | 12:38 | |
openstackgerrit | Takashi NATSUME proposed openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition https://review.openstack.org/614440 | 12:41 |
*** boden has joined #openstack-infra | 12:41 | |
*** yamamoto has quit IRC | 12:44 | |
*** jpena is now known as jpena|lunch | 12:46 | |
*** sshnaidm|ruck is now known as sshnaidm|bbl | 12:50 | |
*** quiquell is now known as quiquell|lunch | 12:51 | |
*** ifat_afek has quit IRC | 12:56 | |
*** bhavikdbavishi has quit IRC | 12:57 | |
*** panda|lunch is now known as panda | 13:02 | |
*** yamamoto has joined #openstack-infra | 13:04 | |
*** e0ne has joined #openstack-infra | 13:06 | |
*** erlon has joined #openstack-infra | 13:08 | |
*** lennyb has quit IRC | 13:10 | |
*** lennyb has joined #openstack-infra | 13:12 | |
*** eharney has joined #openstack-infra | 13:13 | |
*** mriedem has joined #openstack-infra | 13:14 | |
*** markmcd has joined #openstack-infra | 13:15 | |
*** mriedem is now known as ash_williams | 13:15 | |
*** auristor has quit IRC | 13:15 | |
*** auristor has joined #openstack-infra | 13:16 | |
*** agopi is now known as agopi|brb | 13:18 | |
frickler | infra-root: https://docs.openstack.org/infra/system-config/sysadmin.html#accessing-clouds doesn't work for me because I'm not a member of the admin group. is this a bug in the documentation or in the group setup? | 13:19 |
frickler | xinliang: I tested from an ubuntu node in london and accessing the mirror seems pretty fast to me. do you happen to know whether the issue is affecting only debian nodes? | 13:20 |
odyssey4me | Hi folks - I'd like to understand how many nodepool-launcher/nodepool-builder hosts are implemented for OpenStack-Infra, and whether they're implemented per provider, etc - is this documented somewhere? | 13:22 |
*** janki has quit IRC | 13:22 | |
*** janki has joined #openstack-infra | 13:23 | |
*** agopi|brb has quit IRC | 13:24 | |
frickler | odyssey4me: these should be the configurations: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool | 13:26 |
frickler | infra-root: I'm also stumbling over the lack of osc being installed globally on bridge, does everyone manage that in their own local venv? | 13:27 |
*** sambetts|afk is now known as sambetts | 13:28 | |
*** slaweq has quit IRC | 13:29 | |
*** eharney_ has joined #openstack-infra | 13:30 | |
odyssey4me | frickler ah, thanks - that's the info I'm after | 13:30 |
*** jamesmcarthur has joined #openstack-infra | 13:31 | |
frickler | infra-root: even creating a venv fails, maybe I'm missing some very basic understanding here ... http://paste.openstack.org/show/733698/ | 13:32 |
fungi | frickler: i have a python3-built ~/launch-env with openstackclient installed in it, yes | 13:34 |
*** eharney_ has quit IRC | 13:35 | |
fungi | and now i don't remember how i bootstrapped it, because as you note there's no global installation of virtualenv and even `python3 -m venv` is broken because ubuntu strips that out to a separate package we haven't installed | 13:36 |
*** yamamoto has quit IRC | 13:37 | |
*** yamamoto has joined #openstack-infra | 13:38 | |
*** yamamoto has quit IRC | 13:38 | |
Shrews | frickler: fungi: as root, python3 /usr/lib/python3/dist-packages/virtualenv.py --python=python3 ~/my-venv | 13:39 |
Shrews | it's weird | 13:39 |
Shrews | i don't understand why we don't just install python-venv | 13:40 |
frickler | Shrews: thanks, that worked. and yes, it's weird indeed. unless mordred or clarkb have some reason not to do this, I'll propose an update | 13:41 |
SpamapS | Hey, isn't there a way to make a particular zuul change jump to the top of the queue? | 13:43 |
Shrews | SpamapS: zuul promote, iirc | 13:44 |
Shrews | never used it myself | 13:45 |
fungi | Shrews: strange that virtualenv is installed globally in dist-packages but there's no entrypoint in the standard path | 13:45 |
Shrews | fungi: yes, that is part of said weirdness | 13:46 |
* Shrews shakes his head and rolls eyes at python things | 13:46 | |
fungi | yeah, `zuul promote --tenant=xyzzy --pipeline=plugh --changes=3133,7` | 13:46 |
*** jpena|lunch is now known as jpena | 13:48 | |
*** agopi has joined #openstack-infra | 13:48 | |
*** fuentess has joined #openstack-infra | 13:50 | |
SpamapS | Shrews: ty | 13:51 |
openstackgerrit | Ivan Kolodyazhny proposed openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard https://review.openstack.org/614507 | 13:51 |
dhellmann | is anyone else seeing issues with etherpad right now? it was a little flakey yesterday but I don't know if that's just a safari issue or if it's something else. right now it's *very* slow and I got some sort of JS kernel error on one page load | 13:51 |
dhellmann | specifically with https://etherpad.openstack.org/p/tc-topics-jlm-stein-berlin | 13:51 |
*** ifat_afek has joined #openstack-infra | 13:52 | |
*** jamesmcarthur has quit IRC | 13:54 | |
frickler | dhellmann: I can confirm that it looks stuck for me now, too. was working fine 2h ago | 13:54 |
*** jamesmcarthur has joined #openstack-infra | 13:55 | |
dhellmann | I just got: TimeoutError: The operation timed out. in https://etherpad.openstack.org/static/js/require-kernel.js | 13:55 |
dhellmann | so not a "kernel error" (I hit reload too fast earlier) just a timeout | 13:55 |
*** kaiokmo has quit IRC | 14:00 | |
*** quiquell|lunch is now known as quiquell | 14:00 | |
*** yamamoto has joined #openstack-infra | 14:00 | |
*** jamesmcarthur has quit IRC | 14:01 | |
*** eernst has joined #openstack-infra | 14:01 | |
*** sthussey has joined #openstack-infra | 14:05 | |
hwoarang | good day infra-core. Could anyone reserve an opensuse150 node from the openstack-ansible-functional-opensuse-150 job on https://review.openstack.org/#/c/570543/ ? thank you | 14:09 |
Shrews | dhellmann: fungi: etherpad apache error log is reporting "scoreboard is full, not at MaxRequestWorkers" | 14:09 |
Shrews | not quite sure what that means, but maybe restarting apache will help? | 14:09 |
fungi | dhellmann: looks like memory pressure again. upgrading to xenial seems to have possibly resulted in etherpad consuming a lot more cache memory | 14:09 |
fungi | i was going to see about building a replacement with an 8gb flavor | 14:10 |
fungi | guess i'll start on that here shortly | 14:11 |
Shrews | fungi: is there a temp fix? | 14:11 |
mordred | Shrews, fungi: maybe restarting will help short term? | 14:12 |
Shrews | i'll restart apache.... | 14:13 |
fungi | restarting etherpad fixes it briefly from what i can tell, probably because everyone gets disconnected and not all of them reconnect | 14:13 |
*** shrasool has quit IRC | 14:13 | |
fungi | though yes, apache also seems to be consuming a fair amount of virtual memory | 14:14 |
*** d0ugal has quit IRC | 14:14 | |
Shrews | apache restarted. fungi, is there a separate process for restarting etherpad service? | 14:15 |
fungi | and we have a stray abiword process. how did we go about disabling abiword on the old server? anyone remember? | 14:15 |
fungi | Shrews: yes, etherpad-lite | 14:15 |
Shrews | dhellmann: try now | 14:16 |
*** hasharLunch is now known as hashar | 14:17 | |
*** e0ne has quit IRC | 14:19 | |
mordred | fungi: hrm. I think maybe just uninstalling abiword? | 14:19 |
*** ifat_afek has quit IRC | 14:26 | |
*** e0ne has joined #openstack-infra | 14:26 | |
*** d0ugal has joined #openstack-infra | 14:27 | |
*** dpawlik has quit IRC | 14:29 | |
*** dpawlik has joined #openstack-infra | 14:30 | |
*** d0ugal has quit IRC | 14:32 | |
*** dpawlik has quit IRC | 14:34 | |
*** ifat_afek has joined #openstack-infra | 14:34 | |
*** dayou has quit IRC | 14:36 | |
fungi | it's been a while, but i think we had to do something uglier because etherpad refused to start if abiword wasn't present or something like that | 14:38 |
*** kaiokmo has joined #openstack-infra | 14:40 | |
*** e0ne has quit IRC | 14:41 | |
*** slaweq has joined #openstack-infra | 14:41 | |
*** e0ne has joined #openstack-infra | 14:44 | |
openstackgerrit | Pierre Riteau proposed openstack/diskimage-builder master: Increase size of EFI system partition https://review.openstack.org/614526 | 14:45 |
*** quiquell is now known as quiquell|brb | 14:48 | |
openstackgerrit | Ivan Kolodyazhny proposed openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard https://review.openstack.org/614507 | 14:51 |
fungi | the etherpad-lite restart seems to have freed a fair amount of cache memory: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=117&rra_id=all | 14:53 |
fungi | however the system cpu activity is persisting: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=114&rra_id=all | 14:53 |
clarkb | fungi: can we try the hwe kernel first? | 14:54 |
fungi | oh, right, we wanted to try the hwe kernel | 14:54 |
fungi | should that be a manual install or puppeted? | 14:54 |
clarkb | fungi: I believe we puppet it for the zuul executors | 14:54 |
clarkb | but the resource usage and memory issues remind me a lot of the zuul executors | 14:54 |
clarkb | however if we just want to one off test it we can probably do that by hand | 14:55 |
clarkb | then add to puppet if it fixes or leave as is and rebuild bigger if not | 14:55 |
*** d0ugal has joined #openstack-infra | 14:55 | |
clarkb | as for abiword I believe it's a part of the etherpad config file (you set a path to the abiword executable) | 14:55 |
*** quiquell|brb is now known as quiquell | 14:56 | |
clarkb | https://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n930 puppet for hwe kernel | 15:00 |
*** ianychoi has quit IRC | 15:00 | |
fungi | i'm looking to see whether they eventually made it possible to just disable abiword integration completely | 15:00 |
clarkb | fungi: https://git.openstack.org/cgit/openstack-infra/puppet-etherpad_lite/tree/templates/etherpad-lite_settings.json.erb#n48 says we set that value to null to disable abiword | 15:00 |
*** gyee has joined #openstack-infra | 15:01 | |
*** slaweq has quit IRC | 15:01 | |
mordred | fungi, clarkb: maybe we should uninstall abiword and then symlink /usr/bin/abiword to /bin/false | 15:04 |
clarkb | or just disable it in the config? | 15:04 |
*** rossella_s has quit IRC | 15:04 | |
clarkb | I don't know how many people use the import/export functionality | 15:04 |
mordred | oh - duh. I can't read | 15:05 |
*** jistr is now known as jistr|call | 15:05 | |
*** rossella_s has joined #openstack-infra | 15:07 | |
*** rpioso|afk is now known as rpioso | 15:11 | |
*** rossella_s has quit IRC | 15:12 | |
*** ccamacho has quit IRC | 15:13 | |
*** dave-mccowan has joined #openstack-infra | 15:19 | |
*** ash_williams has left #openstack-infra | 15:21 | |
*** ash_williams has joined #openstack-infra | 15:21 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/system-config master: Make the pip3 role really install something https://review.openstack.org/614545 | 15:23 |
frickler | Shrews: fungi: ^^ I think this should fix the venv issue | 15:23 |
frickler | this is called from https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/bridge.yaml | 15:24 |
*** dave-mccowan has quit IRC | 15:25 | |
*** dayou has joined #openstack-infra | 15:25 | |
*** bhavikdbavishi has joined #openstack-infra | 15:26 | |
*** rossella_s has joined #openstack-infra | 15:27 | |
*** slaweq has joined #openstack-infra | 15:30 | |
lennyb | Hi, I see that my nodepool process takes a lot of CPU. Is there any config I can tune for it? | 15:34 |
*** jistr|call is now known as jistr | 15:35 | |
fungi | lennyb: which process specifically? nodepool consists of several daemons | 15:38 |
*** quiquell is now known as quiquell|off | 15:38 | |
lennyb | fungi: nodepoold. http://paste.openstack.org/show/733706/ I see it takes a lot of CPU resource during delete and building | 15:43 |
fungi | lennyb: ahh, i suspect this is because it's polling for deletion and build. there are a lot of idle cycles in those threads though so the load measurements aren't necessarily a reliable indicator of cpu cycles consumed by them | 15:44 |
lennyb | fungi: I see. Thanks. | 15:46 |
fungi | basically they wake up periodically to check and see if nodes need to be built, if nodes have been built, if nodes need to be deleted, and if nodes have been deleted and then take corresponding actions based on those queries, and go back to sleep again | 15:47 |
*** kopecmartin is now known as kopecmartin|off | 15:49 | |
lennyb | fungi: I have a lot of short jobs (5-10 min) that delete nodes and create them again. When there are too many commits, nodepool takes too much time to delete and add a new node | 15:51 |
lennyb | I meant nodepool is deleting nodes after 5-10mins of job run | 15:51 |
*** markvoelker has joined #openstack-infra | 15:52 | |
fungi | node deletion can be fairly resource-intensive on your cloud, so it's not surprising that it might take some time for deletion requests to be fulfilled. does the debug log indicate nodepool is requesting deletion right away once the job completes? | 15:54 |
fungi | also, what version of nodepool is this? modern nodepool would have a separate nodepool-launcher daemon to handle that activity | 15:55 |
fungi | and there's been a lot of work put into making it more efficient so it doesn't block other threads | 15:55 |
*** carl_cai has quit IRC | 15:56 | |
*** eharney has quit IRC | 15:59 | |
clarkb | if it's nodepoold, it's land-before-time nodepool | 16:03 |
clarkb | and that nodepool was less efficient with threads because it was harder to split it up into many processes | 16:03 |
clarkb | mordred: fungi Shrews ok I'm going to do breakfast things now that my meeting is over, but when I get back I'd like to see if we can make the etherpad changes to start getting to a happier place maybe | 16:04 |
mordred | clarkb: ++ | 16:04 |
*** agopi is now known as agopi|food | 16:04 | |
*** jamesmcarthur has joined #openstack-infra | 16:06 | |
corvus | clarkb, fungi, mordred: opendev nameservers -- https://review.openstack.org/610066 | 16:06 |
mordred | corvus: you're an opendev nameserver | 16:08 |
mordred | (+2) | 16:08 |
corvus | no -- 104.239.140.165 is an opendev nameserver! | 16:09 |
fungi | clarkb: we want to apt install linux-image-virtual-hwe-16.04 from the look of it? | 16:11 |
fungi | i can do that now and then reboot the server and see what it does to the cacti graphs | 16:12 |
clarkb | ya that looks right | 16:12 |
fungi | also, noted that killing abiword causes nodepool to immediately respawn it | 16:12 |
corvus | etherpad? | 16:13 |
fungi | yes, sorry, etherpad | 16:13 |
fungi | nodepool on the brainz | 16:13 |
corvus | it *also* respawns things that are killed | 16:13 |
mordred | corvus: oh - while you were out, we added abiword support to nodepool | 16:13 |
*** janki has quit IRC | 16:13 | |
mordred | corvus: if you don't know what that means, I can't explain it :) | 16:13 |
clarkb | then get a bunch of people to type on a pad or three and if not fixed disable abiword? | 16:13 |
fungi | no, while you were out we added pdf export functionality to nodepool with an abiword backend ;) | 16:13 |
mordred | fungi: I enjoy pdf-exporting my vm images | 16:14 |
corvus | mordred: figures -- you know i'd only approve adding libreoffice if i were around. | 16:14 |
AJaeger | mordred: let's use svg, please ;) | 16:14 |
corvus | mordred: what are you going to do if we need a spreadsheet? huh? | 16:14 |
mordred | corvus: panic | 16:14 |
fungi | AJaeger: could we do xbm maybe? | 16:15 |
fungi | i sort of miss it | 16:15 |
corvus | how many people in channel are thinking about whether we can have nodepool automatically update an ethercalc spreadsheet with its node list? | 16:15 |
*** ifat_afek has quit IRC | 16:15 | |
fungi | i'd need a few more drinks first | 16:15 |
*** ifat_afek has joined #openstack-infra | 16:16 | |
corvus | spreadsheets are an underutilized user interface paradigm. | 16:16 |
AJaeger | fungi: let's wait until next vacation of corvus, ok? | 16:17 |
fungi | wfm | 16:17 |
fungi | i'll just go back to gorging on candy corn | 16:17 |
clarkb | corvus is going to get hired to write vssheet | 16:18 |
*** bhavikdbavishi has quit IRC | 16:18 | |
*** bhavikdbavishi has joined #openstack-infra | 16:19 | |
fungi | #status log manually installed linux-image-virtual-hwe-16.04 on etherpad01.openstack.org to test out theory about cache memory and system cpu utilization | 16:19 |
AJaeger | clarkb: is that an IBM product? | 16:19 |
openstackstatus | fungi: finished logging | 16:19 |
fungi | should i #status notice the etherpad server reboot? or jdi? | 16:21 |
clarkb | AJaeger: I was thinking of visual studio -> vsscode, excel -> vssheet | 16:21 |
AJaeger | ah | 16:21 |
clarkb | fungi I think I just did the upgrade | 16:22 |
fungi | a'ight... here goes nothing (and hopefully here comes something) | 16:22 |
fungi | it's pinging again | 16:23 |
fungi | i can ssh into it | 16:23 |
fungi | Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-38-generic x86_64) | 16:23 |
fungi | that looks like the kernel we wanted | 16:24 |
fungi | etherpad nodejs process is running | 16:24 |
fungi | i've attempted to refresh some etherpads i had up before the reboot and they all load | 16:25 |
fungi | keeping an eye on cacti graphs to see if this made any noticeable difference | 16:25 |
*** hashar is now known as hasharAway | 16:27 | |
*** kencjohnston has quit IRC | 16:27 | |
clarkb | as soon as tea is made I'll add a firefox and chrome client to some pads | 16:27 |
*** ifat_afek has quit IRC | 16:29 | |
*** kencjohnston has joined #openstack-infra | 16:29 | |
*** ifat_afek has joined #openstack-infra | 16:30 | |
clarkb | https://etherpad.openstack.org/p/clarkb-test is my long running test pad if anyone else wants to join there and add a bunch of writers/readers | 16:34 |
openstackgerrit | Merged openstack-infra/system-config master: Add opendev nameservers (2/2) https://review.openstack.org/610066 | 16:35 |
clarkb | fungi: looks like cpu usage on etherpad01 hasn't changed | 16:37 |
*** pcaruana|elisa| has quit IRC | 16:37 | |
fungi | i concur | 16:37 |
openstackgerrit | Merged openstack-infra/zuul master: Filter file comments for existing files https://review.openstack.org/613161 | 16:37 |
fungi | and cache memory use may have even gone up a little | 16:37 |
clarkb | that said my test etherpad doesn't feel slow | 16:37 |
clarkb | fungi: I would disable abiword as the next thing. Maybe put etherpad01.openstack.org in the emergency file and update the config by hand and restart the service? | 16:39 |
clarkb | (debugging quickly over waiting for changes to go through the gate then we forget) | 16:39 |
fungi | i think slowness might correspond to the (granted minuscule) bumps in the swap usage graph: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=119&rra_id=all | 16:39 |
clarkb | ah | 16:39 |
clarkb | so cpu itself likely not the indicator. That would make sense. Should we let it run on hwe for a bit then and see if it swaps? | 16:40 |
fungi | i wonder if it's somehow paging out something it shouldn't | 16:40 |
*** trown is now known as trown|lunch | 16:40 | |
*** gfidente is now known as gfidente|afk | 16:40 | |
clarkb | iirc part of the old kernel problem was some issue with memory management | 16:40 |
fungi | yeah, let's give it a couple days | 16:40 |
clarkb | it didn't consider portions of freeable memory freeable iirc leading to more swapping | 16:41 |
fungi | but ideally we'll upgrade to a bigger server before berlin if we decide it's warranted | 16:41 |
clarkb | ++ | 16:41 |
*** Dobroslaw has quit IRC | 16:41 | |
fungi | last thing i want is to return to the bad old days of spending summit week troubleshooting/upgrading etherpad | 16:41 |
clarkb | smcginnis: ^ fyi we've made a change to the etherpad server. Please let us know if you still see that slowing behavior that you've seen recently | 16:42 |
clarkb | smcginnis: will be valuable info | 16:42 |
*** eernst has quit IRC | 16:42 | |
smcginnis | OK, great. Will do. | 16:42 |
fungi | dhellmann: ^ | 16:42 |
fungi | infra-root: looks like we have a bunch of +2 votes on 607699 at this point. mind if i self-approve so it's in place by the next time we decide to do a gerrit restart? | 16:45 |
fungi | granted gerrit restarts aren't as common an occurrence for us any longer. last one was nearly 3 months ago now! | 16:45 |
clarkb | fungi: I say go for it | 16:45 |
clarkb | fungi: and we can probably restart gerrit next week while everyone is on a plane (if not sooner) | 16:46 |
fungi | yeah, i could do it in the middle of tonight if i remember | 16:46 |
smcginnis | clarkb: Still seeing that slow behavior. Or it's died. | 16:46 |
smcginnis | Oh, finally loaded. | 16:46 |
fungi | yeah, getting sluggish for me too | 16:47 |
clarkb | ya my test pad worked for my writer but my reader had to refresh to see the data | 16:48 |
clarkb | fungi: stop abiword next? | 16:48 |
fungi | yeah, need to figure out how to go about making that happen | 16:49 |
clarkb | note memory use is minimal | 16:49 |
clarkb | implying it is a cpu not memory issue | 16:49 |
fungi | kill it and it just comes right back. hallowe'en zombie abiword | 16:49 |
clarkb | fungi: you need to update the config (see link above) then restart etherpad | 16:49 |
*** gyee has quit IRC | 16:50 | |
fungi | curious why we didn't set it to null in our config already... | 16:52 |
*** rh-jelabarre has quit IRC | 16:52 | |
clarkb | fungi: because it provides functionality | 16:52 |
clarkb | we probably don't need that functionality if that is the source of the bug | 16:52 |
fungi | well, i thought we'd done something to prevent abiword from spawning in the past. did we undo that at some point i guess? | 16:53 |
*** chandankumar is now known as chkumar|off | 16:53 | |
erlon | @all, has anyone had problems SSHing into a VM with public keys? The keys are just not being uploaded to the instance after it boots. I have checked and the key_data is in the logs when the instance is being created | 16:53 |
clarkb | I don't recall | 16:53 |
clarkb | erlon: it is up to the VM image to run software of some sort like cloud-init or glean to write the keys to disk for whatever users you have | 16:54 |
clarkb | erlon: what images are you using? | 16:54 |
erlon | clarkb, tried both last cirros and last ubuntu | 16:54 |
erlon | both have cloud init | 16:54 |
fungi | okay, i've set "abiword" : null in /opt/etherpad-lite/etherpad-lite/settings.json and restarted the etherpad-lite service | 16:55 |
erlon | the cirros one, I could log using a password and create .ssh/authorized_keys manually, then it worked | 16:55 |
dhellmann | clarkb , fungi : thanks. I'm still seeing reconnections on the etherpads I have open :-/ | 16:55 |
fungi | the node process is running again but no abiword process this time | 16:55 |
fungi | dhellmann: well, i just this moment restarted it, so you will | 16:55 |
dhellmann | yeah, duh, I just made that connection | 16:55 |
clarkb | erlon: for ubuntu you are ssh'ing as the ubuntu user? | 16:56 |
clarkb | erlon: and this is the ubuntu image published by ubuntu? | 16:56 |
clarkb | so many connections! | 16:56 |
dhellmann | fungi: one day I'll learn to read all the way to the bottom of the buffer before replying | 16:56 |
fungi | heh | 16:56 |
fungi | no worries! | 16:56 |
*** diablo_rojo has joined #openstack-infra | 16:56 | |
*** shardy has quit IRC | 16:56 | |
*** shardy_ has quit IRC | 16:56 | |
*** bhavikdbavishi1 has joined #openstack-infra | 16:57 | |
erlon | clarkb, let me check | 16:57 |
fungi | i reloaded half a dozen pads i had open in my browser tabs and they all came up, though did take a little time to load | 16:58 |
*** ash_williams is now known as ash_sawing | 16:59 | |
*** rh-jelabarre has joined #openstack-infra | 16:59 | |
*** agopi|food is now known as agopi | 16:59 | |
fungi | clarkb: cache memory usage has dropped drastically so far! | 16:59 |
*** bhavikdbavishi has quit IRC | 16:59 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 16:59 | |
*** jpich has quit IRC | 16:59 | |
clarkb | fungi: great, fwiw we use the same version of nodejs on trusty and xenial so the kernel or abiword (basically external to node and etherpad itself) seem the most likely causes | 17:00 |
fungi | the kernel seems to be the cache memory explanation | 17:00 |
erlon | clarkb, yeap, used ubuntu. I downloaded it from the official repo | 17:00 |
erlon | same with cirros | 17:00 |
fungi | that hasn't come back up since ~16:25z | 17:00 |
clarkb | erlon: ok you need to login as the ubuntu user, but it should use cloud init | 17:00 |
fungi | which corresponds to the reboot with the hwe kernel | 17:01 |
clarkb | fungi: gotcha | 17:01 |
fungi | will need a few minutes to be able to see if dropping abiword has impacted system cpu utilization | 17:01 |
erlon | clarkb, any idea how cloud-init works, how it is triggered and what triggers it? I might need to dig a bit deeper | 17:01 |
fungi | though looking at top there's still a fair bit | 17:01 |
erlon | Im installing a fresh devstack env as well to see if the problem still happens | 17:02 |
clarkb | erlon: it runs as a system init service, reads metadata and config drive then configures the host. That is the other thing to check. Do you have metadata server enabled? If not make sure config drive is enabled | 17:03 |
*** udesale has quit IRC | 17:03 | |
erlon | clarkb, hmmm, good point to check! I know that nova has this option enabled: enabled_apis = osapi_compute,metadata | 17:05 |
erlon | clarkb, ill check for the other things, thanks a lot for now | 17:06 |
dhellmann | fungi : fwiw, I'm still seeing reconnections. it sounds like you're still monitoring and it wasn't clear if you were expecting a change in behavior, yet. | 17:07 |
clarkb | dhellmann: I've not seen a reconnection on my end, but I am multitasking | 17:07 |
clarkb | fungi: considering it is reconnections. Maybe it has to do with apache and connection number defaults? | 17:07 |
fungi | dhellmann: if it's on attempting to use a tab you had open from before the 16:55z service restart then you might see that | 17:08 |
clarkb | I thought we configured apache properly for that already, but maybe things have changed sufficiently between releases that we aren't | 17:08 |
*** e0ne has quit IRC | 17:08 | |
fungi | possible those options changed between the apache versions on trusty and xenial? | 17:09 |
clarkb | ya or maybe default worker type so we configure the wrong one? | 17:10 |
clarkb | that said I noticed the slowdown when smcginnis mentioned it but haven't noticed any since abiword was disabled | 17:10 |
*** eharney has joined #openstack-infra | 17:10 | |
clarkb | cacti graphs say no change in cpu though | 17:10 |
fungi | i concur | 17:11 |
fungi | that could be a red herring, but it looks like it picked up ~2 weeks ago | 17:11 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=115&rra_id=all shows some failed connections recently | 17:11 |
*** Swami has joined #openstack-infra | 17:12 | |
clarkb | there are ~ 469 connections according to netstat | 17:13 |
fungi | the established tcp connections graph looks fairly consistent with before the upgrade to xenial though | 17:13 |
clarkb | ya but the TCP Open Stats are different | 17:14 |
clarkb | there is a jump from less than one per second to ~4 on average? | 17:14 |
clarkb | [Wed Oct 31 17:13:56.918371 2018] [mpm_event:error] [pid 1651:tid 140460829304704] AH00485: scoreboard is full, not at MaxRequestWorkers | 17:15 |
clarkb | I don't know what that means, but it looks very suspicious | 17:15 |
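
Editorial sketch, not part of the original conversation: one way to see how often a message like the AH00485 line quoted above shows up is to tally error codes from the Apache error log. The regular expression below is written against the single line pasted in the channel and assumes other lines follow the same shape; the helper name `count_codes` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one error-log line quoted in the channel.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<module>[^:\]]+):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+):tid (?P<tid>\d+)\] "
    r"(?P<code>AH\d+): (?P<message>.*)$"
)


def count_codes(lines):
    """Count how often each AH error code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("code")] += 1
    return counts


# The line pasted by clarkb, split only to keep the source narrow.
sample = (
    "[Wed Oct 31 17:13:56.918371 2018] [mpm_event:error] "
    "[pid 1651:tid 140460829304704] AH00485: scoreboard is full, "
    "not at MaxRequestWorkers"
)

if __name__ == "__main__":
    print(count_codes([sample]))
```

Lines that do not match the pattern are skipped rather than treated as errors, so the tally is a lower bound if the log mixes formats.
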
fungi | ahh, yeah tcp open does jump drastically after the xenial upgrade | 17:15 |
fungi | i find discussions of AH00485 going all the way back to 2015-09-08 17:14:07 in the channel log | 17:16 |
fungi | most recently 2018-02-06 18:55:54 | 17:16 |
fungi | i need to pop out to lunch before the sb meeting, but can look into this some more when i get back | 17:17 |
clarkb | ya I think this is the issue that led us to restart apache prior to summits/PTGs | 17:17 |
clarkb | basically apache doesn't always recycle workers properly. It's possible that newer apache on xenial is worse? | 17:17 |
fungi | very well could be | 17:17 |
clarkb | fungi: one specific suggestion on the internets is to set maxconnectionsperchild to 0 so that it never tries to recycle workers | 17:17 |
fungi | okay, headed out, will be back in an hour-ish | 17:17 |
clarkb | ok I see at least part of the bug | 17:19 |
clarkb | /etc/apache2/conf-enabled/connection-tuning is a broken symlink | 17:19 |
clarkb | also it should have a .conf after it | 17:19 |
clarkb | and we set maxrequestsperchild to 0 there already | 17:20 |
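
Editorial sketch, not part of the original conversation: the two problems clarkb describes (a dangling symlink, and a name without a `.conf` suffix) can be checked mechanically. This assumes the stock Debian/Ubuntu Apache layout, where the main config only includes `conf-enabled/*.conf`, so an entry without that suffix would be ignored even if its target existed. The function name `audit_conf_enabled` is hypothetical.

```python
import os


def audit_conf_enabled(directory):
    """Return (broken_symlinks, missing_conf_suffix) for entries in directory."""
    broken, unsuffixed = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
        # Assumption: only *.conf entries are included by the main config.
        if not name.endswith(".conf"):
            unsuffixed.append(name)
    return broken, unsuffixed


if __name__ == "__main__":
    # Path taken from the conversation; adjust for other layouts.
    print(audit_conf_enabled("/etc/apache2/conf-enabled"))
```

An entry like the `connection-tuning` link mentioned above would appear in both returned lists.
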
* clarkb goes to the puppet to see how to fix | 17:20 | |
*** xek has quit IRC | 17:20 | |
*** shrasool has joined #openstack-infra | 17:22 | |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration https://review.openstack.org/614595 | 17:23 |
clarkb | lets see if ^ makes this a happier server | 17:23 |
clarkb | things we have learned, apache connection tuning is broken. New kernel makes memory usage look saner. Abiword was maybe completely unrelated | 17:24 |
AJaeger | config-core, two small cleanups for review, please: https://review.openstack.org/614440 and https://review.openstack.org/614507 | 17:25 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects https://review.openstack.org/613143 | 17:26 |
*** ginopc has quit IRC | 17:29 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/yaml2ical master: add monthly recurrence options https://review.openstack.org/608680 | 17:29 |
corvus | i single-core approved the etherpad symlink fix; it's pretty evident it's typo-broken | 17:30 |
clarkb | corvus: thanks | 17:30 |
corvus | i have manually installed links on etherpad so i can hit http://localhost/server-status | 17:30 |
corvus | and yeah, the number of workers is tiny. | 17:31 |
*** shrasool has quit IRC | 17:33 | |
corvus | we're using the event mpm, and we're maxing out at 128 concurrent requests. when the correct tuning config goes into place, we'll have 4096 slots. | 17:33 |
corvus | (concurrent requests == concurrent etherpad users because of websocket) | 17:33 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove the build-placement-api-ref job definition https://review.openstack.org/614440 | 17:36 |
corvus | mnaser: may i please trouble you for a rdns update when you have a moment? 162.253.55.16 and 2604:e100:1:0:f816:3eff:fe2c:7447 should point to ns2.opendev.org | 17:37 |
*** electrofelix has quit IRC | 17:38 | |
*** florianf is now known as florianf|afk | 17:38 | |
*** aojea has quit IRC | 17:40 | |
openstackgerrit | Merged openstack-infra/project-config master: Fix tooltip for 'Horizon Failure Rate' dashboard https://review.openstack.org/614507 | 17:40 |
*** harlowja has joined #openstack-infra | 17:44 | |
*** gyee has joined #openstack-infra | 17:44 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Set ansible python version for opendev nameservers https://review.openstack.org/614607 | 17:46 |
*** sambetts is now known as sambetts|afk | 17:46 | |
corvus | do we have puppet working on bionic yet? | 17:46 |
clarkb | fungi: ^ | 17:47 |
clarkb | corvus: I am not aware of it if we do. The puppet 4 work is far enough along that if puppet 4 works on bionic (it was 5 though iirc) that should all just work | 17:47 |
clarkb | iirc fungi was installing puppet 3 on bionic to mixed success. The thing to try is installing puppet 4 next probably | 17:48 |
corvus | what's the state of running containers? | 17:48 |
clarkb | corvus: aiui the missing piece for that is image builds for various services. If we already have images being built (eg nodepool and zuul) then the ansible is ready to run containers instead of puppet | 17:49 |
*** ifat_afek has quit IRC | 17:49 | |
clarkb | but I don't think anyone has done that for a service yet | 17:50 |
corvus | clarkb: we need https://review.openstack.org/605585 first? | 17:50 |
clarkb | ianw has some changes in progress for graphite though | 17:50 |
clarkb | corvus: ya we likely need some of the things ianw has been working on | 17:50 |
corvus | it looks like we might be a couple of months out from having working opendev nameservers. | 17:51 |
clarkb | corvus: why? | 17:51 |
corvus | clarkb: we still can't manage bionic machines | 17:52 |
clarkb | corvus: can we deploy on xenial? | 17:52 |
corvus | clarkb: we could. i'm not willing to. | 17:52 |
corvus | if someone else wants to pick up the work, i'm happy to hand it over. | 17:52 |
clarkb | ok. Are you willing to help fungi or ianw with the various bits of work that make bionic viable? | 17:52 |
corvus | clarkb: i thought i have been :) | 17:52 |
*** panda is now known as panda|off | 17:53 | |
clarkb | ok I just wasn't sure how ready to go you wanted bionic (or deployments in general) to be | 17:53 |
corvus | i'm not sure what that means | 17:54 |
clarkb | just that part of the effort here includes making xenial + puppet viable medium term so that we don't have to do an immediate cutover of all services at once (this is the puppet 4 portion of the spec) | 17:55 |
clarkb | and if that doesn't interest you I am asking how much of the other aspects of the spec interest you (sounds like you are happy to help on the things that make bionic viable which is mostly ansible + containers) | 17:56 |
corvus | right, i'm happy to use either puppet or docker, whichever works on bionic. i spent several days trying to get puppet to work on bionic, and was not successful. you may remember such changes as 564898 and 564891. | 17:57 |
clarkb | re ansible + containers in particular ianw's work with graphite is the place to start I think. That includes improving the startup time of ansible with inventory group generation so that ansible can be run more often without a multi minute startup time | 17:57 |
clarkb | (ianw updated mordred's inventory plugin change and intends on merging that soonish and watching it I think) | 17:58 |
corvus | clarkb: i'd be happy with a multi-minute startup time. though i have also done work on that, having identified the error which caused that. | 17:58 |
corvus | clarkb: i've found much more success in the ansible area, having implemented most of the testing stack which is now being used to show the error in http://logs.openstack.org/85/605585/4/check/system-config-run-docker/8dc36c0/job-output.txt.gz | 17:59 |
corvus | clarkb: it seems to be *that* is the blocker | 17:59 |
corvus | er, seems to 'me' | 17:59 |
clarkb | ya interestingly it almost looks like a package bug (or maybe we are installing the wrong package) | 18:01 |
clarkb | the docker daemon failed to start after it was installed by apt | 18:01 |
corvus | is there a reason we're installing from upstream docker? | 18:01 |
*** derekh has quit IRC | 18:01 | |
corvus | well, the spec says so :) | 18:01 |
corvus | but it doesn't say why | 18:01 |
clarkb | corvus: that is the default for mordred's install-docker role (it also supports installing from distro). I imagine it is because docker moves relatively quickly compared to the distros | 18:02 |
clarkb | and there are features/bugfixes that we may want through that channel | 18:02 |
corvus | well, on that note, the test results are a month old, i'll recheck :) | 18:02 |
clarkb | seems reasonable. if it is a package bug hopefully they have fixed it | 18:03 |
*** jpena is now known as jpena|off | 18:04 | |
*** trown|lunch is now known as trown | 18:04 | |
corvus | clarkb: weirdly, the other job failure for that change is the base ansible tests, which failed to install puppet on every node type *except* bionic. | 18:05 |
clarkb | huh maybe something in it is breaking apt? | 18:06 |
clarkb | bah now it is in merge conflict | 18:06 |
*** ash_sawing has quit IRC | 18:07 | |
clarkb | looks like the parent finally passes tests though https://review.openstack.org/#/c/602385/12 so maybe a rebase is in order (it depends on old ps of the yamlgroup change) | 18:07 |
clarkb | I'll see how difficult that rebase is | 18:09 |
*** mriedem has joined #openstack-infra | 18:11 | |
*** mriedem is now known as ash_williams | 18:11 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role https://review.openstack.org/605585 | 18:12 |
clarkb | I think that should get it tested again | 18:13 |
*** kjackal has quit IRC | 18:16 | |
*** ralonsoh has quit IRC | 18:17 | |
*** dtantsur is now known as dtantsur|afk | 18:17 | |
Shrews | clarkb: uh, don't we already have an install-docker role? | 18:22 |
clarkb | Shrews: yes in zuul-jobs. This is for infra deployments from bridge.openstack.org | 18:22 |
Shrews | oh, looked over the repo part | 18:23 |
*** noama has quit IRC | 18:25 | |
clarkb | corvus: mordred ianw I left a comment on the yamlgroup change https://review.openstack.org/#/c/602385/12 I think there is a class of bug in the groups.yaml conversion that we need to fix | 18:30 |
*** bhavikdbavishi has quit IRC | 18:30 | |
clarkb | if someone else can review that too and make sure I am not wrong that would be appreciated | 18:30 |
*** e0ne has joined #openstack-infra | 18:31 | |
fungi | okay, burrito conquered, back for more troubleshooting | 18:32 |
fungi | once i catch up on scrollback anyway | 18:32 |
fungi | reviewing etherpad apache tuning fix | 18:32 |
*** e0ne has quit IRC | 18:32 | |
*** psachin has quit IRC | 18:36 | |
fungi | yeah, i concur with the analysis there | 18:39 |
openstackgerrit | Emilien Macchi proposed openstack-infra/project-config master: Add publish jobs for ansible-role-openstack-operations https://review.openstack.org/614616 | 18:40 |
fungi | we can likely roll back the abiword disablement and return to the default xenial kernel if that's the only fix we end up needing | 18:40 |
EmilienM | hey infra, just a bit of an update on tripleo ci | 18:40 |
clarkb | ya, understanding the kernel memory stuff a bit better would probably be good though | 18:40 |
EmilienM | we recently switched a bunch of our CI jobs to use podman instead of docker | 18:40 |
EmilienM | and a regression in podman caused our gate to be very unstable | 18:41 |
EmilienM | we figured that out last night | 18:41 |
EmilienM | we are reverting these jobs to docker now | 18:41 |
clarkb | EmilienM: out of curiosity do you know when? (so that we can correlate things to graphite and elasticsearch, etc) | 18:41 |
EmilienM | so the situation should be better | 18:41 |
EmilienM | clarkb: error started on Oct 26th | 18:41 |
clarkb | EmilienM: also re podman, don't forget it wasn't using the infra caching mirror either | 18:41 |
clarkb | (so there were likely layers of failures there) | 18:41 |
EmilienM | so for that problem we have retries, that mitigated the problem | 18:42 |
EmilienM | but right now we hit an actual bug inside podman (with selinux) | 18:42 |
clarkb | except that makes jobs run longer | 18:42 |
clarkb | then they timeout | 18:42 |
EmilienM | right, we reported the bug too... | 18:42 |
clarkb | we shouldn't go back to podman without fixing that item too | 18:42 |
EmilienM | I agree very much | 18:42 |
clarkb | EmilienM: when was the podman switch? was that the 26th too? | 18:43 |
EmilienM | clarkb: before | 18:44 |
EmilienM | clarkb: we think it's a regression in a recent version | 18:44 |
EmilienM | https://github.com/containers/libpod/issues/1739 | 18:44 |
clarkb | EmilienM: can we get that date too (since podman not using the mirrors would also likely lead to failures, its more data we can compare against our stats and logs with) | 18:44 |
EmilienM | https://review.openstack.org/#/c/614537/ | 18:44 |
EmilienM | there are 3 patches that enabled podman in CI | 18:45 |
EmilienM | and all 3 are being reverted now (squashed) | 18:45 |
clarkb | looks like october 12 for the first podman switch | 18:45 |
clarkb | EmilienM: what is fs010 ? | 18:47 |
EmilienM | clarkb: container-multinode | 18:47 |
EmilienM | our most popular/run job | 18:47 |
EmilienM | tripleo-ci-centos-7-containers-multinode | 18:47 |
clarkb | ok so its shorthand for a job? | 18:47 |
clarkb | got it | 18:47 |
EmilienM | clarkb: https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged) | 18:49 |
EmilienM | we are taking drastic measures, we really understand the trouble we make here | 18:49 |
ssbarnea | mordred : if you have few minutes, I could use some hints on adding the docker job. please have a look at https://review.openstack.org/#/c/613672/ -- mainly I am doing something stupid in playbooks/molecule.yml (run) | 18:51 |
fungi | EmilienM: it's really and truly appreciated | 18:51 |
clarkb | EmilienM: yup, I'm just trying to make sure I understand what the various moving pieces are and fs010 stood out to me as possibly important and I didn't understand it :) | 18:52 |
*** jamesmcarthur has quit IRC | 18:52 | |
fungi | we were a bit stressed initially about publishing that data at all without making sure we didn't accidentally raise angry mobs coming after the tripleo team | 18:52 |
*** jamesmcarthur has joined #openstack-infra | 18:52 | |
fungi | we know you've got a lot going on, and are glad you're able to work on improving this | 18:52 |
AJaeger | EmilienM: please see my -1 on https://review.openstack.org/#/c/614570/1 | 18:54 |
EmilienM | AJaeger: ack | 18:55 |
clarkb | ssbarnea: left a couple notes for you | 18:55 |
EmilienM | AJaeger: no need since there is no gate section anymore | 18:55 |
EmilienM | AJaeger: right? | 18:55 |
AJaeger | EmilienM: https://review.openstack.org/#/c/614593/1 | 18:55 |
AJaeger | EmilienM: you still have gates via the template | 18:56 |
AJaeger | So, if you want these in the same queue, those two lines (gate/queue) are needed. | 18:56 |
*** sshnaidm|bbl is now known as sshnaidm|ruck | 18:57 | |
EmilienM | mwhahaha: ^ | 18:57 |
EmilienM | AJaeger: ack, thx | 18:57 |
*** jamesmcarthur has quit IRC | 18:57 | |
mwhahaha | AJaeger: got it, thanks | 18:57 |
corvus | AJaeger: weren't we asking folks to keep queue lines in project-config? | 18:57 |
AJaeger | corvus: only integrated | 18:57 |
AJaeger | corvus: others are optional... | 18:58 |
AJaeger | corvus: https://docs.openstack.org/infra/manual/creators.html#shared-queues-for-cross-project-testing | 18:58 |
corvus | ok | 18:59 |
corvus | i guess the tripleo queue contains only tripleo projects | 18:59 |
AJaeger | corvus: yep | 19:00 |
AJaeger | corvus: hope so ;) | 19:00 |
AJaeger | mwhahaha: LGTM now. | 19:00 |
ssbarnea | clarkb: comments made of gold. thanks! | 19:00 |
AJaeger | mwhahaha, EmilienM , those are indeed drastic measures... | 19:00 |
AJaeger | mwhahaha, EmilienM, you could use templates for these jobs - makes it easier to change in a single place for next time... | 19:01 |
mwhahaha | AJaeger: we do have them, but we were using specific file rules in the other projects | 19:01 |
*** jamesmcarthur has joined #openstack-infra | 19:01 | |
mwhahaha | AJaeger: so they are inheriting from the template but we have to tweak it | 19:01 |
mwhahaha | i guess we could merge the file sections up into the template however | 19:02 |
AJaeger | mwhahaha: ah - yes, you could merge the files into the template | 19:02 |
*** jamesmcarthur has quit IRC | 19:08 | |
*** jamesmcarthur has joined #openstack-infra | 19:09 | |
*** erlon has quit IRC | 19:13 | |
openstackgerrit | Merged openstack-infra/system-config master: Hyperlink task footers https://review.openstack.org/607699 | 19:13 |
*** nicolasbock has joined #openstack-infra | 19:19 | |
clarkb | corvus: https://zuul.openstack.org/stream/3bb43f13f58e4b9f9e813d7bb38dbca0?logfile=console.log is the system-config docker test | 19:19 |
*** jcoufal has joined #openstack-infra | 19:21 | |
*** ash_williams is now known as mriedem_away | 19:25 | |
*** diablo_rojo has quit IRC | 19:26 | |
*** erlon has joined #openstack-infra | 19:27 | |
*** erlon has quit IRC | 19:35 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role https://review.openstack.org/605585 | 19:39 |
Shrews | anyone familiar enough with ansible-lint to understand what's happening here? http://logs.openstack.org/23/605823/6/check/openstack-zuul-jobs-linters/6bd2ec9/job-output.txt.gz#_2018-10-31_19_02_58_639943 | 19:40 |
clarkb | Shrews: http://logs.openstack.org/23/605823/6/check/openstack-zuul-jobs-linters/6bd2ec9/job-output.txt.gz#_2018-10-31_19_02_55_359983 the error is a few lines above | 19:41 |
*** jamesmcarthur has quit IRC | 19:44 | |
*** e0ne has joined #openstack-infra | 19:44 | |
*** betherly has joined #openstack-infra | 19:45 | |
*** jamesmcarthur has joined #openstack-infra | 19:45 | |
*** e0ne has quit IRC | 19:45 | |
*** diablo_rojo has joined #openstack-infra | 19:45 | |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration https://review.openstack.org/614595 | 19:46 |
clarkb | corvus: fungi ^ yay for testing | 19:46 |
*** jcoufal has quit IRC | 19:48 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul-jobs master: Add role to install kubernetes https://review.openstack.org/605823 | 19:49 |
*** betherly has quit IRC | 19:49 | |
*** apetrich has quit IRC | 19:50 | |
mordred | corvus, clarkb: sorry- was afk earlier - the reason I was pushing for installing from docker upstream and not from distro is, yes, upstream moves a bit quicker than distro | 19:50 |
*** erlon has joined #openstack-infra | 19:51 | |
mordred | corvus, clarkb: but also, the dockerhub urls and mirroring are different between xenial docker and upstream docker - so I figured since docker was new to us, go ahead and start with current state of the art rather than start with an old version | 19:51 |
clarkb | oh right that is why we have two different caching mirror endpoints for docker | 19:52 |
mordred | yeah | 19:52 |
clarkb | I do like the idea of not needing to change that | 19:52 |
mordred | probably should have included things like that in the spec | 19:52 |
clarkb | mordred: fwiw maybe you can look at https://review.openstack.org/605585 failures and see if my latest patchset makes sense? I noticed that apt-utils missing was in the list of things where it failed so added that | 19:53 |
clarkb | mordred: might need to add that to the z-j role too if that fixes stuff | 19:53 |
*** lujinluo has joined #openstack-infra | 19:53 | |
mordred | looking | 19:54 |
mordred | clarkb: what is apt-utils for? (just out of curiosity) | 19:55 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 19:55 |
*** lujinluo has quit IRC | 19:56 | |
clarkb | mordred: https://packages.debian.org/sid/apt-utils | 19:56 |
*** lujinluo has joined #openstack-infra | 19:56 | |
clarkb | mordred: my hunch is that docker-ce uses one of those tools, dpkg delays because no apt-utils then fails to start the daemon because some config is missing | 19:56 |
mordred | nod. | 19:56 |
mordred | worth a stab - although we haven't had the issue with the install-docker role in zuul jobs :( | 19:57 |
clarkb | mordred: I wonder if its a bionic thing? (I'm guessing that job is bionic only) | 19:57 |
corvus | clarkb: was it the same error? | 19:57 |
clarkb | corvus: yes same error again | 19:58 |
corvus | clarkb: yes, that job runs on bionic | 19:58 |
mordred | weird - cause the pbrx jobs run on bionic too | 19:58 |
clarkb | mordred: maybe bindep pulls in something on those jobs we don't pull in on these? | 19:59 |
clarkb | I imagine that distro packages that need apt-utils just dep on them but docker-ce doesn't | 19:59 |
clarkb | ? | 19:59 |
mordred | clarkb: we don't run bindep in the pbrx job ... this is weird | 20:01 |
fungi | pbrx relies on bindep to tell it what else to include in the image though, right? | 20:01 |
* clarkb wanders off to grab lunch now that mordred is looking at it | 20:03 | |
mordred | fungi: it does- but it does that inside of a container - so it doesn't bindep on the host itself | 20:03 |
*** apetrich has joined #openstack-infra | 20:04 | |
fungi | just making sure | 20:04 |
corvus | clarkb, mordred: http://logs.openstack.org/26/610726/3/check/pbrx-build-zuul-containers/7372a00/ara-report/result/bcba7357-f2d6-4527-9b8f-68cb2e34e17e/ | 20:04 |
mordred | yeah. it's weird | 20:04 |
ianw | clarkb: hrm, i think you're right on the name matching. it would be great if we could unit test this ... | 20:04 |
*** dmsimard has quit IRC | 20:05 | |
corvus | clarkb, mordred: that makes me think apt-utils is a red herring | 20:06 |
ianw | clarkb / amorin : hrm, checking in on our port clearing screen, I'm seeing "HttpException: 500: Server Error for url: https://network.compute.gra1.cloud.ovh.net/v2.0/ports, {"NeutronError": {"message": "Request Failed: internal server error while processing your request."," | 20:06 |
mordred | corvus: I think so too | 20:06 |
ianw | and yeah, ovh graphs not looking promising .... http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 ... do we know about this? | 20:07 |
mordred | corvus: looking through the output, it says the control process died and we should run systemctl status docker.service to see what's up | 20:07 |
mordred | corvus: maybe we should toss in a systemctl status docker.service line to get some logs? | 20:07 |
mordred | oh - wait a sec | 20:07 |
clarkb | ianw: news to me | 20:08 |
mordred | we're installing an empty daemon.json file | 20:08 |
mordred | https://review.openstack.org/#/c/605585/6/playbooks/roles/install-docker/templates/daemon.json.j2 | 20:09 |
mordred | compared to | 20:09 |
mordred | http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/install-docker/templates/daemon.json.j2 | 20:09 |
corvus | that is a difference between that and zuul-jobs | 20:09 |
mordred | so maybe instead of an empty file - we should just not write one out at all | 20:09 |
clarkb | mordred: ++ | 20:10 |
corvus | it's also not actually empty | 20:10 |
mordred | yeah | 20:10 |
corvus | it has some things that don't look like valid json | 20:10 |
mordred | let me make that change real quick | 20:10 |
clarkb | but json doesn't have comments | 20:10 |
clarkb | so ya | 20:10 |
corvus | regarding the nameservers -- maybe we don't want to use docker for those anyway? maybe we want to just use os packages with ansible? | 20:11 |
clarkb | corvus: possibly. The big win with containers is greater control over versions of software. But for dns servers maybe we want to be conservative about that | 20:12 |
fungi | i'm unsure what containers buy us in the case where the software is already well-established and stably packaged on our distro of preference | 20:12 |
mordred | yeah. I think just ansible on bionic for nameservers is probably fine | 20:13 |
*** ramishra has quit IRC | 20:13 | |
openstackgerrit | Monty Taylor proposed openstack-infra/system-config master: Initial port of install-docker role https://review.openstack.org/605585 | 20:13 |
clarkb | fungi: one other thing is collocation of services. But I doubt we want to collocate dns servers anyway | 20:13 |
mordred | clarkb, corvus, fungi, ianw: removed the config file, and also re-removed apt-utils | 20:13 |
fungi | compare to mm3 where getting the dependencies right is a beast even on relatively recent distro versions and there are bugs even then which are fixed upstream but not yet packaged | 20:13 |
mordred | clarkb: ++ | 20:13 |
corvus | i'll see how fast i can port this puppet to ansible | 20:14 |
mordred | I think in this case the thing we want to install is very straight forward- and containers will just make it more complex | 20:14 |
*** apetrich has quit IRC | 20:14 | |
fungi | even i, as not a container fan, see how deploying mm3 from the upstream-provided container images is a win | 20:14 |
mordred | fungi: ++ | 20:14 |
mordred | fungi: there are times when it's an excellent format to allow upstreams to provide 'packaging' :) | 20:14 |
clarkb | ianw: looks to be gra1 specific? | 20:16 |
fungi | i'm very confused by gerrit... can anybody see why https://review.openstack.org/600472 claims to have been last updated 24 hours ago? | 20:16 |
ianw | clarkb: yeah, maybe it's on the way back | 20:17 |
ianw | fungi: yeah ... i had one of those the other day too, couldn't see any ci comments or anything | 20:18 |
clarkb | fungi: nothing stands out to me. I think the one thing that might not add a log item on the change itself is hitting the little x next to a vote value? | 20:18 |
*** ansmith has quit IRC | 20:18 | |
ianw | clarkb: it could be on the way back up. https://review.openstack.org/#/c/613196/ would help in the nodepool logs matching things up :) | 20:19 |
clarkb | ianw: ok I'm going to finish lunch then take a look at ^ as well as make sure that etherpad fix gets in | 20:20 |
*** hasharAway has quit IRC | 20:20 | |
*** jamesmcarthur has quit IRC | 20:24 | |
*** trown is now known as trown|outtypewww | 20:29 | |
*** e0ne has joined #openstack-infra | 20:30 | |
*** imacdonn has quit IRC | 20:34 | |
*** imacdonn has joined #openstack-infra | 20:34 | |
*** jamesmcarthur has joined #openstack-infra | 20:38 | |
*** jamesmcarthur has quit IRC | 20:43 | |
*** e0ne has quit IRC | 20:50 | |
*** kgiusti has left #openstack-infra | 20:50 | |
openstackgerrit | Merged openstack-infra/puppet-etherpad_lite master: Actually use connection-tuning configuration https://review.openstack.org/614595 | 20:55 |
*** shrasool has joined #openstack-infra | 20:58 | |
fungi | time for me to get presentable and head to a party. happy hallowe'en all! (except for the aussies who got to celebrate it while i was asleep) | 20:58 |
clarkb | fungi: enjoy! and thank you for helping with the etherpad stuff | 20:58 |
clarkb | don't scare too many small children | 20:59 |
fungi | np, i see the fix hasn't landed yet but i'll try to check back in on it later | 20:59 |
clarkb | fungi: it just merged above, I'll make sure it gets applied | 20:59 |
*** lujinluo has quit IRC | 21:02 | |
*** boden has quit IRC | 21:03 | |
*** lujinluo has joined #openstack-infra | 21:03 | |
mordred | similar to fungi - I have hit the point in the day where I'm focusing on dealing with halloween - although in my case it's getting prepared for my friend ben to terrify the children | 21:04 |
clarkb | mordred: we should be able to clean up the openstack sdk stuff tomorrow? | 21:06 |
clarkb | mordred: that's an item I'd like to get off my list so let me know when that is ready | 21:06 |
*** shrasool has quit IRC | 21:07 | |
*** lujinluo has quit IRC | 21:07 | |
*** lujinluo has joined #openstack-infra | 21:08 | |
*** betherly has joined #openstack-infra | 21:27 | |
*** betherly has quit IRC | 21:32 | |
clarkb | smcginnis: dhellmann apache should've just restarted with the connection tuning fix | 21:35 |
clarkb | smcginnis: dhellmann: if you can watch out for reconnections/slowness from this point forward that will be helpful | 21:35 |
smcginnis | Will do! | 21:36 |
clarkb | actually we've leaked the old symlink so I am going to delete that and restart apache again just to be double sure | 21:36 |
clarkb | and done | 21:37 |
mordred | clarkb: yes - that's all ready to go for tomorrow | 21:41 |
*** erlon has quit IRC | 21:53 | |
dhellmann | clarkb : thanks, I'll let you know | 21:54 |
clarkb | mordred: ianw thinking about the yamlgroup thing a bit more, maybe we want it to use regexes rather than unix shell globs? | 21:54 |
clarkb | we'd still need to rewrite the yaml file but at least python regexes are more common with our user base? | 21:54 |
*** eharney has quit IRC | 21:55 | |
clarkb | also the system config docker change still fails but now it fails because docker adds iptables rules we need to account for in the tests. I'll take a look at that shortly if it helps | 21:58 |
*** apetrich has joined #openstack-infra | 21:58 | |
*** kjackal has joined #openstack-infra | 22:02 | |
imacdonn | just had a check job fail, seemingly due to an ssh host key change ... do I just recheck it, or is something b0rked ? | 22:03 |
imacdonn | 2018-10-31 19:53:21.337694 | ubuntu-xenial | "msg": "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe | 22:03 |
imacdonn | fingerprint for the ED25519 key sent by the remote host is\nSHA256:qy9kk9BhbbsgVtMOsJqSUJ9Tb4yBjdGg+xwO90qFj9s.\r\nPlease contact your system administrator.\r\nAdd correct host key in /var/lib/zuul/builds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts to get rid of this message.\r\nOffending ED25519 key in /var/lib/zuul/builds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts:1\r\n remove with:\r\n ssh-keygen -f \"/var/lib/zuul/bui | 22:03 |
imacdonn | lds/6e7e81acfec64acaad97b06534903262/work/.ssh/known_hosts\" -R 104.130.222.138\r\nED25519 host key for 104.130.222.138 has changed and you have requested strict checking.\r\nHost key verification failed.\r\nrsync: connection unexpectedly closed (0 bytes received so far) [Receiver]\nrsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]\n", | 22:03 |
*** agopi is now known as agopi|brb | 22:04 | |
clarkb | imacdonn: can you link to the log file? | 22:06 |
imacdonn | clarkb: http://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/ | 22:06 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Initial port of install-docker role https://review.openstack.org/605585 | 22:07 |
*** agopi|brb has quit IRC | 22:08 | |
clarkb | ok bindep logs don't indicate anything would restart the sshd (possibly picking up a new host key) | 22:12 |
imacdonn | Could two hosts be trying to use the same IP address? | 22:13 |
clarkb | imacdonn: it is theoretically possible. We'd need someone from rackspace to tell us if that was the case | 22:14 |
clarkb | the main ssh portion of the job (the run.yaml bit) uses a controlpersist managed connection that is started early in the job | 22:14 |
clarkb | the rsync file synchronizations do not so will start a new connection | 22:14 |
clarkb | If the daemon has indeed changed its key we would notice this way, another host with the same IP is another possibility though I would expect problems with the main controlpersist connection in that case | 22:15 |
clarkb | is it possible the ssl-cert package triggered a refresh of the sshd host keys? | 22:16 |
*** jamesmcarthur has joined #openstack-infra | 22:16 | |
clarkb | that would surprise me, but maybe it will do that for improving security reasons? | 22:16 |
imacdonn | it could be an ARP race ... and the cache expired while the main part of the job was running .. just an idea | 22:16 |
clarkb | imacdonn: ya | 22:16 |
imacdonn | it'd surprise me too, if a package update replaced host keys ... and not in a good way ;) | 22:17 |
clarkb | imacdonn: if this persists maybe the thing to check is have the tox collect logs role run an always block that cats the host key? | 22:17 |
clarkb | since that should go over the existing ssh connection in theory it would work | 22:17 |
clarkb | and if that doesn't show anything wrong with the server then escalate with rax | 22:17 |
clarkb | corvus: ^ could we use the control persist connection for rsync too? should make it more reliable overall | 22:18 |
*** diablo_rojo has quit IRC | 22:18 | |
*** gyee has quit IRC | 22:19 | |
clarkb | we revoke sudo in that job so the cinder unittests themselves shouldn't be able to break ssh | 22:20 |
clarkb | (we might have a hard time dumping the host key file in that case too :( | 22:20 |
ianw | mordred / clarkb : could do regexes. i'm just writing up something that hopefully does a "unit" type test on it, so we can at least stress some edge cases | 22:20 |
corvus | catching up | 22:20 |
*** jamesmcarthur has quit IRC | 22:21 | |
clarkb | completely unrelated but apparently rhel 7.6 released todayish. So we should be on the lookout for centos 7.6 weirdness in the next few days (whenever that ends up being available) | 22:24 |
clarkb | http://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/ara-report/file/9580cb64-f384-4351-a50e-fb2a1dae3968/#line-34 that synchronize runs before the fatal synchronize | 22:26 |
*** rlandy is now known as rlandy|bbl | 22:26 | |
clarkb | it succeeds because failed_when is set to false. Side effect of some tox runs not creating a venv dir. Pointing it out because the first failure is earlier than the obvious later failure | 22:26 |
imacdonn | ah, interesting | 22:27 |
corvus | clarkb, imacdonn: i believe the ansible synchronize module does not use the ssh control connection | 22:28 |
corvus | clarkb, imacdonn: https://github.com/ansible/ansible/issues/8473 is interesting.... | 22:28 |
corvus | clarkb, imacdonn: apparently someone wrote a patch to support that: https://github.com/cognifloyd/ansible/tree/synchronize_control_path | 22:29 |
corvus | i don't know if they made a pr out of that | 22:29 |
imacdonn | That may be a good idea ... but still, there's some underlying issue if the host key apparently changed, right ? | 22:30 |
clarkb | reading http://logs.openstack.org/17/614617/1/check/openstack-tox-py35/6e7e81a/ara-report/result/48249e3d-f90f-481f-b2de-965582c1443c/ I think it does successfully get the subunit file then fails on the html file | 22:30 |
clarkb | which is really weird | 22:30 |
corvus | imacdonn: yes; i think clarkb's suggestions for debugging that are warranted :) | 22:30 |
clarkb | I think if ^ is true that lends some weight to the idea that there is an arp fight happening | 22:30 |
clarkb | because it goes rsync fail (tox logs), rsync success (subunit), rsync fail (html) | 22:31 |
clarkb | cloudnull: if you aren't trick or treating, ^ may be just enough evidence of fight over IP addresses in rackspace? I'd be curious to hear your thoughts on that | 22:32 |
*** gfidente|afk has quit IRC | 22:36 | |
clarkb | EmilienM: mwhahaha do you know who we can get to review https://review.openstack.org/#/c/614305/ a low hanging fruit change on the job cleanup front | 22:38 |
clarkb | will remove non voting jobs from puppet gates | 22:38 |
mwhahaha | i can | 22:40 |
EmilienM | clarkb: looking now | 22:40 |
*** agopi|brb has joined #openstack-infra | 22:44 | |
*** mriedem_away has quit IRC | 22:44 | |
*** diablo_rojo has joined #openstack-infra | 22:50 | |
clarkb | ok https://review.openstack.org/#/c/605585/ to install docker on control plane servers passes now | 22:52 |
ianw | clarkb: i'm just seeing if some version of the unit-testing infrastructure we have for ansible roles in zuul-jobs makes sense translated to system-config for testing this yaml matching plugin | 22:55 |
*** erlon has joined #openstack-infra | 22:59 | |
mwhahaha | alright, i'm confused why the test-release-openstack-python3 jobs are still running on https://review.openstack.org/#/c/613621/ since I thought https://review.openstack.org/#/c/614245/ should stop them from running. thoughts? | 22:59 |
clarkb | mwhahaha: reading the inheritance path at http://logs.openstack.org/21/613621/6/check/test-release-openstack-python3/6b814f2/zuul-info/inventory.yaml it seems the template definition path for that job on that branch/repo is winning out over the override there | 23:03 |
clarkb | mwhahaha: you may need to drop that template then define an equivalent for instack that only runs on master | 23:03 |
clarkb | it is also possible that this represents a bug in zuul job config parsing | 23:03 |
*** erlon has quit IRC | 23:04 | |
clarkb | as a human I would agree that the explicit job list should override the slightly more implicit list provided by the template | 23:04 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: WIP: configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 23:16 |
corvus | clarkb: ^ how's that for a start? | 23:16 |
*** erlon has joined #openstack-infra | 23:19 | |
*** tpsilva has quit IRC | 23:22 | |
corvus | mwhahaha, clarkb: once a job is selected to run, you can't un-select it. the configuration in 614245 is applying a (null) branch variant to the job already selected by the template. | 23:25 |
corvus | mwhahaha, clarkb: if a template runs a job you don't want to run, don't use the template :) | 23:25 |
clarkb | corvus: re DNS left a couple comments where the ansible might not quite do what we want. Otherwise that looks pretty straight forward | 23:25 |
clarkb | corvus: thank you for confirming the template job config thing | 23:26 |
corvus | clarkb: good catches | 23:29 |
*** kjackal has quit IRC | 23:31 | |
*** kjackal has joined #openstack-infra | 23:31 | |
*** ianychoi has joined #openstack-infra | 23:35 | |
openstackgerrit | Amy Marrich (spotz) proposed openstack-infra/irc-meetings master: Remove WoO meeting https://review.openstack.org/614649 | 23:35 |
*** kjackal has quit IRC | 23:39 | |
mwhahaha | Ah | 23:43 |
mwhahaha | Thanks I'll fix tomorrow | 23:43 |
*** diablo_rojo has quit IRC | 23:45 | |
*** Swami has quit IRC | 23:48 |
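
Editor's note on the 20:08 to 20:10 exchange about daemon.json: the point that "json doesn't have comments" can be checked with a strict parser before a file is ever written out. What follows is only a minimal sketch, not the check used in system-config or zuul-jobs. The function name `is_valid_daemon_json` and the sample strings are hypothetical; `registry-mirrors` is a real daemon.json key, but the mirror URL is a made-up placeholder.

```python
import json


def is_valid_daemon_json(text: str) -> bool:
    """Return True only if text parses as strict JSON with an object at the top level.

    Hypothetical helper for illustration; it says nothing about whether
    the keys inside the object are ones dockerd understands.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Comments, empty input, and trailing garbage all end up here.
        return False
    return isinstance(parsed, dict)


# Hypothetical rendered templates, not the real file contents from either repo.
commented = "# mirror settings are managed elsewhere\n{}\n"
empty = ""
populated = '{"registry-mirrors": ["http://mirror.example.org:8082"]}'

for name, candidate in [("commented", commented), ("empty", empty), ("populated", populated)]:
    print(name, is_valid_daemon_json(candidate))
```

A check along these lines could run in a test before the role installs the file. It fits with the choice made in the channel: when there is nothing to configure, writing no file at all avoids the problem entirely.
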