openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Fix instack jobs (again) https://review.openstack.org/614651 | 00:00 |
mwhahaha | Or today | 00:00 |
*** markvoelker has quit IRC | 00:01 | |
*** slaweq has quit IRC | 00:15 | |
*** ssbarnea has quit IRC | 00:29 | |
*** erlon has quit IRC | 00:30 | |
fungi | trick-or-treat activity has wound down. catching up now on what i missed | 00:30 |
fungi | imacdonn: clarkb: we've definitely seen arp fights in other providers where a leaked instance held onto an ip address which got reassigned to a new instance, though i don't personally recall seeing it in rackspace | 00:33 |
*** betherly has joined #openstack-infra | 00:35 | |
imacdonn | fungi: any way to easily tell if this is happening repeatedly? | 00:35 |
fungi | not really other than logstash searches on that error string | 00:35 |
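A logstash search like the one fungi describes boils down to an Elasticsearch query matching the exact error string over a recent time window. A minimal sketch in Python, assuming generic logstash field names (`message`, `@timestamp`) rather than the actual openstack-infra schema:

```python
def build_query(message: str, hours: int = 24) -> dict:
    """Build an Elasticsearch query body matching an exact error string
    within the last `hours` hours of indexed job logs."""
    return {
        "query": {"bool": {"must": [
            {"match_phrase": {"message": message}},
            {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        ]}}
    }
```

Counting hits across successive windows is enough to tell a one-off failure from a recurring one.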
imacdonn | I recheck'ed my change, but it failed in a different way .. urgh :/ | 00:36 |
fungi | imacdonn: openstack has bugs? color me surprised ;) | 00:37 |
imacdonn | fungi: hah, well ... it failed tests that it passed the first time, but the first time it hit that host key change in another job | 00:37 |
imacdonn | speaking of which ... is there any way to ask Zuul to recheck only the job that failed ? | 00:38 |
*** betherly has quit IRC | 00:39 | |
*** erlon has joined #openstack-infra | 00:43 | |
*** fuentess has quit IRC | 00:49 | |
*** longkb has joined #openstack-infra | 00:51 | |
fungi | intentionally not, by design | 00:53 |
fungi | otherwise it becomes waaaay to easy to pin-and-tumbler lockpick nondeterministic failures into merging | 00:54 |
fungi | er, too easy | 00:54 |
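fungi's pin-and-tumbler point can be put in numbers: if Zuul allowed retrying only the failed job, even a change that almost always fails would eventually get each job to pass once and merge. A quick illustrative calculation (the failure rate here is an assumption, not Zuul data):

```python
def p_merge_with_retries(p_fail: float, retries: int) -> float:
    """Chance that at least one of `retries` independent attempts of a
    single flaky job passes, if each attempt fails with probability p_fail."""
    return 1 - p_fail ** retries

# A job failing 90% of the time still sneaks through within 10 solo
# rechecks about 65% of the time:
print(round(p_merge_with_retries(0.9, 10), 3))  # → 0.651
```

Requiring the whole buildset to pass together makes it much harder for a nondeterministically broken change to merge.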
*** betherly has joined #openstack-infra | 00:56 | |
*** betherly has quit IRC | 01:01 | |
*** xinliang has quit IRC | 01:06 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove WoO meeting https://review.openstack.org/614649 | 01:08 |
*** jamesmcarthur has joined #openstack-infra | 01:24 | |
*** hongbin has joined #openstack-infra | 01:24 | |
*** jamesmcarthur has quit IRC | 01:28 | |
imacdonn | fungi: yeah, I kinda figured ... OK ... recheck #2, I guess :/ | 01:28 |
fungi | imacdonn: there is usually value in identifying the failures you encounter, at least so that others might find ways to fix them | 01:29 |
imacdonn | fungi: sure, and that's why I asked about the host key change thing .... but a lot of the failures are just timeouts | 01:33 |
imacdonn | fungi: although this one isn't ... some DB weirdness | 01:34 |
imacdonn | http://logs.openstack.org/17/614617/1/check/openstack-tox-py27/0040196/ if there's any interest ... not sure what would cause this ... maybe some concurrency thing, I guess | 01:34 |
*** dayou has quit IRC | 01:38 | |
fungi | yeah, two cinder db migration failures? | 01:40 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: updated caldav client lib https://review.openstack.org/614666 | 01:40 |
fungi | sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically. | 01:41 |
imacdonn | yeah... not sure if they caused each other to fail, or both failed for a common reason | 01:41 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: updated caldav client lib https://review.openstack.org/614666 | 01:41 |
fungi | could certainly be a concurrency-related bug in how the tests are written there | 01:41 |
imacdonn | maybe they're both faking rows in the same table, and happened to both delete them at the same time | 01:41 |
*** tdasilva has quit IRC | 01:42 | |
imacdonn | or something | 01:42 |
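One common fix for the kind of collision imacdonn is guessing at is to give every test worker its own database, so parallel migration tests never share tables. A hypothetical sketch (the `TEST_WORKER_ID` variable and naming scheme are illustrative, not Cinder's actual test fixtures):

```python
import os
import uuid

def worker_db_name(base: str = "cinder_test") -> str:
    """Derive a per-worker database name so concurrent test workers
    cannot delete each other's rows."""
    # Some test runners export a worker id; fall back to a random
    # suffix so parallel workers still never collide.
    worker = os.environ.get("TEST_WORKER_ID") or uuid.uuid4().hex[:8]
    return f"{base}_{worker}"
```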
fungi | long shot, but i wonder how much smcginnis knows about the db tests in cinder's unit testing (if he's around this fine hallowe'en) | 01:43 |
clarkb | I think there may be an e-r bug for migration failures | 01:44 |
imacdonn | I know there are concurrency issues with some of the cinder unit tests ... they fail somewhat consistently on my 48-vCPU dev box, but pass fairly consistently with '--concurrency=12' | 01:47 |
imacdonn | but I've not seen this particular failure before | 01:47 |
*** betherly has joined #openstack-infra | 01:48 | |
fungi | yeah, high concurrency runs are great for teasing out those sorts of problems | 01:48 |
fungi | we just don't generally have the resources to test that with any frequency | 01:48 |
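imacdonn's '--concurrency=12' workaround amounts to capping the number of parallel test workers below the machine's CPU count. A tiny helper sketching that (the cap of 12 is just the value that happened to work on his box):

```python
import os

def capped_concurrency(cap: int = 12) -> int:
    """Number of test workers to use: the CPU count, but never above `cap`."""
    return max(1, min(os.cpu_count() or 1, cap))

# e.g. pass the result to `stestr run --concurrency=N`
```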
*** betherly has quit IRC | 01:52 | |
*** erlon has quit IRC | 01:58 | |
imacdonn | Hmm, interesting ... those two tests are skipped when I run tox, because "Backend 'mysql+pymysql' is unavailable: Could not connect" | 01:59 |
*** rlandy|bbl is now known as rlandy | 01:59 | |
*** markvoelker has joined #openstack-infra | 02:03 | |
*** lujinluo has quit IRC | 02:03 | |
*** lujinluo has joined #openstack-infra | 02:04 | |
*** lujinluo has quit IRC | 02:09 | |
*** apetrich has quit IRC | 02:16 | |
*** rh-jelabarre has quit IRC | 02:34 | |
*** markvoelker has quit IRC | 02:35 | |
imacdonn | with mysql installed and populated by test-setup.sh, I can't get those tests to fail .... ho hum ... back to recheck :P | 02:38 |
*** lujinluo has joined #openstack-infra | 02:49 | |
*** mrsoul has quit IRC | 02:55 | |
*** lujinluo has quit IRC | 02:55 | |
*** xek has joined #openstack-infra | 03:09 | |
*** psachin has joined #openstack-infra | 03:10 | |
*** bhavikdbavishi has joined #openstack-infra | 03:10 | |
*** icey has quit IRC | 03:18 | |
*** betherly has joined #openstack-infra | 03:21 | |
*** icey has joined #openstack-infra | 03:23 | |
*** betherly has quit IRC | 03:26 | |
*** markvoelker has joined #openstack-infra | 03:32 | |
*** armax has quit IRC | 03:49 | |
*** udesale has joined #openstack-infra | 03:50 | |
*** betherly has joined #openstack-infra | 03:53 | |
*** ramishra has joined #openstack-infra | 03:56 | |
*** betherly has quit IRC | 03:57 | |
*** dayou has joined #openstack-infra | 03:59 | |
*** markvoelker has quit IRC | 04:06 | |
*** betherly has joined #openstack-infra | 04:24 | |
*** betherly has quit IRC | 04:29 | |
*** hongbin has quit IRC | 04:37 | |
*** xek has quit IRC | 04:45 | |
*** lujinluo has joined #openstack-infra | 05:01 | |
*** markvoelker has joined #openstack-infra | 05:02 | |
*** lujinluo has quit IRC | 05:05 | |
*** lujinluo has joined #openstack-infra | 05:06 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin https://review.openstack.org/614693 | 05:09 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin https://review.openstack.org/614694 | 05:09 |
*** annp has quit IRC | 05:10 | |
*** betherly has joined #openstack-infra | 05:16 | |
*** betherly has quit IRC | 05:21 | |
*** rlandy has quit IRC | 05:30 | |
*** markvoelker has quit IRC | 05:36 | |
*** betherly has joined #openstack-infra | 05:37 | |
*** betherly has quit IRC | 05:42 | |
*** roman_g has quit IRC | 05:42 | |
*** betherly has joined #openstack-infra | 05:57 | |
*** betherly has quit IRC | 06:02 | |
*** lujinluo has quit IRC | 06:19 | |
*** chkumar|off is now known as chandankumar | 06:21 | |
*** markvoelker has joined #openstack-infra | 06:33 | |
*** xinliang has joined #openstack-infra | 06:45 | |
xinliang | frickler: I found this issue on a debian arm64 node. | 06:45 |
xinliang | ianw: Could you have time to see this bug: https://bugs.linaro.org/show_bug.cgi?id=4035, arm64 debian mirror repo is really slow | 07:01 |
openstack | bugs.linaro.org bug 4035 in Default "[uk cloud] Wget fetching from mirror.london.linaro-london.openstack.org is more slower and unstable than deb.debian.org" [Enhancement,Unconfirmed] - Assigned to gema.gomez-solano | 07:01 |
*** bhavikdbavishi has quit IRC | 07:02 | |
*** apetrich has joined #openstack-infra | 07:03 | |
ianw | xinliang: hrm, i just logged into the host, it doesn't look like it's doing anything that would slow itself down | 07:04 |
*** icey has quit IRC | 07:04 | |
ianw | nothing really in dmesg | 07:05 |
*** markvoelker has quit IRC | 07:05 | |
ianw | /dev/mapper/main-afscache 99G 2.0G 92G 3% /var/cache/openafs | 07:05 |
ianw | that's interesting, it's not using much afs cache at all | 07:05 |
xinliang | ianw: yeah, it's odd. The cloud internal bandwidth is very high, say ~4 G/s | 07:06 |
xinliang | ianw: any firewall rules on the repo web server?? | 07:07 |
ianw | xinliang: no, we have nothing like that | 07:08 |
ianw | root@mirror01:/etc/openafs# fs getcacheparms | 07:08 |
ianw | AFS using 1781837 of the cache's available 50000000 1K byte blocks. | 07:08 |
xinliang | ianw: so the mirror is stored on top of afs, right? | 07:08 |
ianw | so it seems like it has plenty of cache. the debs you're using are coming from AFS, so ultimately we bring that across from our servers in dfw to the mirror node | 07:08 |
ianw | for /debian, yep; we have some reverse proxies | 07:08 |
xinliang | So it's fetching from the remote server then? | 07:09 |
ianw | but it may be that running a few times primes the cache more and helps. any initial access can be slowish | 07:09 |
xinliang | I ran wget several times, is that enough to use the cache? | 07:10 |
*** icey_ has joined #openstack-infra | 07:12 | |
ianw | yeah, that would hopefully prime it better | 07:12 |
ianw | now i'm thinking about it though ... i think the cache gets invalidated on releases of the volumes | 07:12 |
ianw | which happens when we sync the mirrors | 07:12 |
ianw | now for very busy mirrors, the cost of re-fetching is amortised quickly across many jobs | 07:13 |
ianw | but for this region, where we have only a few jobs running occasionally ... that may be a bad case for it | 07:13 |
*** icey_ is now known as icey | 07:14 | |
xinliang | ianw: seems like the mirror sync is real-time. I thought it just synced several times a day | 07:15 |
ianw | all the debian-ish things run reprepro every few hours | 07:16 |
ianw | http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 should show the last release | 07:17 |
*** lujinluo has joined #openstack-infra | 07:18 | |
xinliang | But what I don't understand is why it gets things from the remote server and not from the local mirror server, although we specifically fetch using the local mirror server dns name | 07:19 |
ianw | because afs is a remote/distributed file system | 07:22 |
*** lujinluo has quit IRC | 07:23 | |
ianw | (e.g. nfs) ... so if the data isn't cached locally, it comes across remotely | 07:23 |
xinliang | ianw: I see, so the first time it gets things from remote, then the second time it will be local, right? | 07:25 |
ianw | xinliang: yes, second time it should be in the afs cache | 07:25 |
*** lpetrut has joined #openstack-infra | 07:26 | |
xinliang | ianw: ok, let me check if the second time is faster than the first. If so, then no problem. thanks ianw. | 07:27 |
ianw | xinliang: yeah, that's worth a try at first. if we're having problems with the cache being invalidated too frequently ... umm ... i dunno, have to think about that | 07:31 |
xinliang | ianw: yes, I checked that the second time is much faster than the first. http://paste.openstack.org/show/733755/ | 07:32 |
ianw | auristor would know exactly what "vos release" invalidates and might have ideas about what we could do for less frequently accessed mirrors to try and keep them fresh | 07:32 |
ianw | xinliang: ok, good, that's what we expect as a first pass | 07:33 |
xinliang | ianw: in any case, will it be invalidated too frequently? or is there a setting for this? | 07:33 |
xinliang | in what case? | 07:34 |
*** pcaruana|elisa| has joined #openstack-infra | 07:40 | |
*** ramishra has quit IRC | 07:45 | |
ianw | xinliang: i'm not sure there are settings we can fiddle with. i'm a little concerned that with the frequency we refresh the volume, and the low frequency of jobs, it might not be the best case for this | 07:45 |
*** ramishra has joined #openstack-infra | 07:46 | |
xinliang | ianw: i see. thanks ianw | 07:46 |
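The behaviour ianw describes (slow first fetch, fast second) is plain read-through caching. A deliberately simplified toy model, not the OpenAFS client, but it shows why an invalidation after each volume release hurts quiet regions most:

```python
class ReadThroughCache:
    """Toy read-through cache: cold reads go to the remote fileserver,
    warm reads are served locally until an invalidation."""

    def __init__(self, fetch_remote):
        self._fetch = fetch_remote
        self._cache = {}

    def read(self, path):
        if path not in self._cache:            # cold: fetch remotely (slow)
            self._cache[path] = self._fetch(path)
        return self._cache[path]               # warm: served locally (fast)

    def invalidate(self):
        # crude stand-in for what a "vos release" does to cached entries
        self._cache.clear()
```

A busy mirror re-warms the cache almost immediately after each release; a region running only occasional jobs pays the cold-fetch cost on nearly every access.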
*** ifat_afek has joined #openstack-infra | 07:47 | |
*** lujinluo has joined #openstack-infra | 07:57 | |
*** slaweq has joined #openstack-infra | 07:58 | |
*** pcaruana|elisa| has quit IRC | 07:59 | |
*** imacdonn has quit IRC | 08:00 | |
*** bhavikdbavishi has joined #openstack-infra | 08:01 | |
*** udesale has quit IRC | 08:02 | |
*** lujinluo has quit IRC | 08:02 | |
*** markvoelker has joined #openstack-infra | 08:03 | |
*** pcaruana has joined #openstack-infra | 08:05 | |
*** shardy has joined #openstack-infra | 08:08 | |
*** shardy_ has joined #openstack-infra | 08:08 | |
*** bhavikdbavishi has quit IRC | 08:09 | |
*** bhavikdbavishi has joined #openstack-infra | 08:09 | |
*** shardy_ has quit IRC | 08:10 | |
*** shardy_ has joined #openstack-infra | 08:11 | |
*** lpetrut has quit IRC | 08:12 | |
*** ykarel has joined #openstack-infra | 08:16 | |
*** slaweq has quit IRC | 08:18 | |
*** ralonsoh has joined #openstack-infra | 08:22 | |
*** ralonsoh has joined #openstack-infra | 08:23 | |
*** jamesmcarthur has joined #openstack-infra | 08:25 | |
*** jamesmcarthur has quit IRC | 08:30 | |
*** florianf|afk is now known as florianf | 08:32 | |
*** markvoelker has quit IRC | 08:36 | |
*** xek has joined #openstack-infra | 08:49 | |
*** ssbarnea has joined #openstack-infra | 08:55 | |
*** jpich has joined #openstack-infra | 09:00 | |
*** rabel has quit IRC | 09:02 | |
*** kjackal has joined #openstack-infra | 09:09 | |
*** derekh has joined #openstack-infra | 09:16 | |
*** udesale has joined #openstack-infra | 09:16 | |
*** shrasool has joined #openstack-infra | 09:17 | |
*** rabel has joined #openstack-infra | 09:28 | |
*** masayukig[m] has joined #openstack-infra | 09:28 | |
*** electrofelix has joined #openstack-infra | 09:32 | |
*** markvoelker has joined #openstack-infra | 09:33 | |
*** ykarel is now known as ykarel|lunch | 09:38 | |
frickler | xinliang: ianw: interesting, that would explain why all my subsequent tests yesterday didn't show any issue | 09:42 |
frickler | xinliang: were there any real errors like job timeouts due to this or just longer runtimes? | 09:42 |
*** e0ne has joined #openstack-infra | 09:43 | |
xinliang | frickler: haven't found yet | 09:44 |
*** lpetrut has joined #openstack-infra | 09:49 | |
frickler | xinliang: so if the mirror is just slow on the first access, I think we could live with that. if we see failures, we would need to consider further measures like setting up a periodic job that would ensure the cache is kept warm | 09:52 |
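frickler's periodic cache-warming job could be as simple as re-fetching a list of hot paths after each mirror release. A hedged sketch with the fetcher injected (a real job would wget URLs on the mirror; the path list here is illustrative):

```python
def warm_cache(fetch, paths):
    """Fetch each path once to prime the cache, returning
    (warmed, failed) lists."""
    warmed, failed = [], []
    for path in paths:
        try:
            fetch(path)
            warmed.append(path)
        except OSError:
            failed.append(path)
    return warmed, failed
```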
xinliang | frickler: yeah, sounds like a good idea. | 09:53 |
*** panda|off is now known as panda | 09:56 | |
*** shrasool has quit IRC | 09:57 | |
*** ralonsoh has quit IRC | 10:02 | |
*** ralonsoh has joined #openstack-infra | 10:03 | |
*** markvoelker has quit IRC | 10:07 | |
*** chandankumar has quit IRC | 10:09 | |
*** apetrich has quit IRC | 10:12 | |
*** jaosorior has quit IRC | 10:23 | |
*** apetrich has joined #openstack-infra | 10:25 | |
*** chandankumar has joined #openstack-infra | 10:30 | |
*** lujinluo has joined #openstack-infra | 10:32 | |
*** e0ne has quit IRC | 10:32 | |
*** e0ne has joined #openstack-infra | 10:36 | |
*** lujinluo has quit IRC | 10:36 | |
*** ansmith has joined #openstack-infra | 10:42 | |
*** jtomasek has quit IRC | 10:43 | |
*** dave-mccowan has joined #openstack-infra | 10:45 | |
*** jtomasek has joined #openstack-infra | 10:45 | |
*** ansmith has quit IRC | 10:51 | |
*** markvoelker has joined #openstack-infra | 11:04 | |
*** pcaruana has quit IRC | 11:05 | |
*** bhavikdbavishi has quit IRC | 11:16 | |
*** xek_ has joined #openstack-infra | 11:22 | |
*** udesale has quit IRC | 11:24 | |
*** xek has quit IRC | 11:25 | |
*** xek__ has joined #openstack-infra | 11:25 | |
*** zul has joined #openstack-infra | 11:26 | |
*** xek_ has quit IRC | 11:27 | |
*** longkb has quit IRC | 11:34 | |
*** jaosorior has joined #openstack-infra | 11:35 | |
*** markvoelker has quit IRC | 11:36 | |
*** pcaruana has joined #openstack-infra | 11:52 | |
*** bhavikdbavishi has joined #openstack-infra | 11:54 | |
*** erlon has joined #openstack-infra | 11:57 | |
*** kjackal has quit IRC | 12:01 | |
*** pbourke has quit IRC | 12:09 | |
*** pbourke has joined #openstack-infra | 12:10 | |
*** lpetrut has quit IRC | 12:13 | |
*** rh-jelabarre has joined #openstack-infra | 12:15 | |
*** boden has joined #openstack-infra | 12:18 | |
*** trown|outtypewww is now known as trown | 12:18 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 12:25 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 12:26 |
*** udesale has joined #openstack-infra | 12:32 | |
*** ykarel_ has joined #openstack-infra | 12:32 | |
*** ykarel|lunch has quit IRC | 12:35 | |
*** kjackal has joined #openstack-infra | 12:36 | |
*** rlandy has joined #openstack-infra | 12:41 | |
*** ansmith has joined #openstack-infra | 12:48 | |
*** agopi|brb is now known as agopi | 12:49 | |
*** zul has quit IRC | 12:52 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/openstack-zuul-jobs master: remove the tag pipeline entry for release-notes-jobs https://review.openstack.org/614758 | 12:52 |
*** e0ne has quit IRC | 12:55 | |
*** zul has joined #openstack-infra | 12:55 | |
*** yamamoto has quit IRC | 12:56 | |
*** udesale has quit IRC | 12:56 | |
*** ykarel_ is now known as ykarel | 12:57 | |
*** eharney has joined #openstack-infra | 13:01 | |
*** udesale has joined #openstack-infra | 13:01 | |
*** GDPR is now known as emerson | 13:01 | |
*** hashar has joined #openstack-infra | 13:02 | |
*** jcoufal has joined #openstack-infra | 13:04 | |
*** zul has quit IRC | 13:04 | |
*** zul has joined #openstack-infra | 13:05 | |
*** kgiusti has joined #openstack-infra | 13:06 | |
*** bobh has joined #openstack-infra | 13:10 | |
*** carl_cai has joined #openstack-infra | 13:12 | |
*** yamamoto has joined #openstack-infra | 13:14 | |
*** noama has joined #openstack-infra | 13:19 | |
*** mriedem has joined #openstack-infra | 13:28 | |
ralonsoh | smcginnis, tonyb: hi, we have a problem in os-vif: the 1.12.0 version is buggy and it's blocking all CI work in neutron | 13:36 |
ralonsoh | smcginnis, tonyb : https://review.openstack.org/#/c/614764/ | 13:36 |
ralonsoh | smcginnis, tonyb: can you prioritize this one? thank you in advance | 13:36 |
frickler | ralonsoh: that patch is currently failing zuul, see http://logs.openstack.org/64/614764/1/check/openstack-tox-validate/79b2afe/job-output.txt.gz#_2018-11-01_13_32_43_862412 | 13:39 |
*** ykarel_ has joined #openstack-infra | 13:40 | |
*** e0ne has joined #openstack-infra | 13:40 | |
frickler | ralonsoh: once the patch is approved you can also ask infra-root to promote it in the gate if needed | 13:41 |
*** ykarel has quit IRC | 13:42 | |
*** ykarel_ is now known as ykarel | 13:43 | |
ralonsoh | frickler: thanks! | 13:45 |
*** hashar has quit IRC | 13:49 | |
*** tpsilva has joined #openstack-infra | 13:55 | |
*** bhavikdbavishi has quit IRC | 13:58 | |
auristor | ianw xinliang: let me know if you would like a deep dive on openafs volume transactions. | 14:00 |
*** ramishra has quit IRC | 14:02 | |
*** fuentess has joined #openstack-infra | 14:03 | |
*** e0ne_ has joined #openstack-infra | 14:09 | |
*** e0ne has quit IRC | 14:10 | |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Fix instack jobs (again) https://review.openstack.org/614651 | 14:20 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 14:33 |
AJaeger | dhellmann: regarding https://review.openstack.org/614758 - won't we have the same problem once we branch of stein? I think we need to move the release notes template back to project-config... | 14:37 |
fungi | clarkb: i'm starting to see very long load times on etherpads again. did apache need restarting now that the config is in place? | 14:37 |
*** armax has joined #openstack-infra | 14:51 | |
*** Swami has joined #openstack-infra | 14:52 | |
dhellmann | AJaeger : after we branch stein we will have 2 branches adding the same job, so zuul will only run it 1 time | 15:05 |
*** kukacz has quit IRC | 15:12 | |
*** psachin has quit IRC | 15:13 | |
smcginnis | I noticed a couple delays with etherpads this morning. Not consistently though. | 15:13 |
*** ykarel_ has joined #openstack-infra | 15:13 | |
*** ykarel has quit IRC | 15:16 | |
*** ykarel_ is now known as ykarel | 15:16 | |
fungi | reloading https://etherpad.openstack.org/p/tc-chair-responsibilities is taking a minute or more for my browser to pull up the content | 15:16 |
*** apetrich has quit IRC | 15:18 | |
*** kukacz has joined #openstack-infra | 15:19 | |
*** ykarel is now known as ykarel|away | 15:22 | |
fungi | clarkb: hrm, not getting any more AH00485 in the apache error log though, and it does appear apache was restarted after the config got updated | 15:23 |
fungi | so maybe this is unrelated | 15:23 |
fungi | and now refreshing that tab is fairly quick again | 15:24 |
*** anteaya has joined #openstack-infra | 15:27 | |
clarkb | fungi: the config shouldve been applied properly yesterday | 15:29 |
fungi | yeah, i don't see any indication that it hasn't been | 15:31 |
*** gyee has joined #openstack-infra | 15:33 | |
*** e0ne_ has quit IRC | 15:34 | |
clarkb | I even cleaned out the old symlink and restarted again to make sure the desired config worked | 15:34 |
*** kjackal_v2 has joined #openstack-infra | 15:34 | |
*** e0ne has joined #openstack-infra | 15:34 | |
clarkb | after I did that the cpu usage fell off according to cacti. I think a lot of that cpu was apache trying to handle new connections. What does cacti look like more recently? | 15:34 |
*** kjackal has quit IRC | 15:35 | |
*** apetrich has joined #openstack-infra | 15:36 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove Daisycloud meeting https://review.openstack.org/612692 | 15:37 |
fungi | tcp open is still high following the xenial upgrade | 15:37 |
fungi | cache memory usage is almost back to where it was yesterday before we started trying things | 15:37 |
fungi | system cpu utilization did die off though and hasn't returned | 15:38 |
fungi | other than a brief spike around the daily cronjob window | 15:38 |
clarkb | I wonder if we have two issues: the forced reconnects, and general slowness | 15:39 |
clarkb | the apache fix should fix the first | 15:39 |
clarkb | but possibly the second was independently happening at overlapping times, making them seem related | 15:40 |
fungi | perhaps | 15:40 |
mwhahaha | hey are you folks aware of the issues with mirror.sjc1.vexxhost.openstack.org/pypi? http://logs.openstack.org/68/614168/1/gate/tripleo-ci-centos-7-undercloud-containers/a7d8e31/job-output.txt.gz#_2018-11-01_11_55_04_274859 | 15:41 |
mwhahaha | i scrolled up but didn't see it metioned (i may be blind) | 15:41 |
fungi | there was a release job failure earlier today where that mirror failed to return the pypi simple index page for pbr | 15:42 |
fungi | http://logs.openstack.org/7f/7ff373d1b8f38666fd61adac798318877e14477d/release/release-openstack-python3/ae53d6b/job-output.txt.gz#_2018-11-01_13_34_56_385017 | 15:42 |
fungi | i wonder if it's intermittently having trouble reaching the fastly cdn pypi uses | 15:43 |
clarkb | fungi: mwhahaha ya the indexes are not cached or have very short ttls | 15:44 |
clarkb | so if pypi cdn is unhappy we may notice it | 15:44 |
mwhahaha | i'm not sure how the pypi mirror works, but if it's a timeout wouldn't we get a 503 or something | 15:45 |
fungi | not if pip's internal timeout happened first | 15:45 |
clarkb | mwhahaha: not necessarily, iirc apache can timeout on the front end as can pip | 15:45 |
fungi | looks like pip only waits 15 seconds | 15:45 |
mwhahaha | i see | 15:45 |
clarkb | oh right apache would 500, ya probably pip | 15:45 |
*** agopi is now known as agopi|brb | 15:46 | |
fungi | judging from those job logs | 15:46 |
mwhahaha | it does seem to just be vexxhost tho | 15:46 |
mwhahaha | this was the 2nd time i've seen that today and this one was in the gate | 15:46 |
clarkb | mwhahaha: until a couple months ago we ran a proper mirror of pypi, but the size of pypi has increased dramatically in the last year or so due to daily package releases of machine learning tools that link against cuda | 15:46 |
mwhahaha | pesky machine learning | 15:46 |
clarkb | mwhahaha: so pypi went from a few megabytes per day of new content to a few gigabytes per day of new content | 15:46 |
fungi | entirely possible there's a problem on the internet between vexxhost-sjc1's backbone providers and whoever is hosting the nearest fastly cdn nodes | 15:46 |
mwhahaha | fair enough, i'll keep an eye on things | 15:47 |
clarkb | we can probably configure apache to ignore the ttl on those indexes and have some custom ttl set | 15:48 |
fungi | looks like mirror.sjc1.vexxhost.openstack.org gets a round-robin of 4 different aaaa records for pypi.org | 15:48 |
clarkb | as is I think its like a 30 second or 300 second ttl if cached at all | 15:49 |
fungi | oh, actually we don't have ipv6 on that mirror, so it'll be the round-robin of four a records | 15:49 |
fungi | out of curiosity, did we disable ipv6 on that server for a reason? | 15:50 |
clarkb | fungi: there wasn't (maybe still isn't) ipv6 in vexxhost sjc1 | 15:50 |
*** agopi|brb has quit IRC | 15:50 | |
fungi | huh. i could have sworn i had ipv6 on my instances in sjc1 though i don't have any built at the moment | 15:52 |
clarkb | fungi: I know it was on mnaser's todo list, but the mirror node was one of the first things we spun up, so it predates ipv6. It is possible the region does ipv6 now | 15:53 |
clarkb | in that case we'd want to build a new mirror with ipv6 and delete the old one | 15:53 |
clarkb | ianw: xinliang frickler re AFS invalidating the cache aiui it doesn't discard the data, it only refreshes metadata which tells it if any of the cached data is invalid. There is a definite hit to performance particularly as rtt increases (due to afs window sizing) | 15:54 |
clarkb | all that to say that hitting the mirror and fetching the data you want should help even if we invalidate the metadata later | 15:54 |
EmilienM | hello folks :-) can someone approve this easy one: https://review.openstack.org/#/c/614616/ | 15:55 |
EmilienM | so we can release - thanks! | 15:56 |
clarkb | EmilienM: that project isn't on pypi yet? fungi do our jobs create the project in pypi now if necessary? | 15:57 |
fungi | clarkb: yes, they do | 15:59 |
clarkb | neat! | 15:59 |
fungi | rather, the _recommended_ way to create a project on pypi these days is to upload an sdist/wheel with twine | 15:59 |
clarkb | (I thought I remembered you talking about it at some point) | 15:59 |
fungi | there is no separate project registration step in warehouse | 15:59 |
EmilienM | thx clarkb | 15:59 |
clarkb | that makes so much sense | 15:59 |
EmilienM | jaosorior: you'll be able to release soon ^ | 15:59 |
mwhahaha | i had chatted w/ the release folks about creating a new job to replace the old openstack-server release job, one that does the python3 build but doesn't publish to pypi | 16:00 |
fungi | they intentionally wanted to make it hard/impossible to register a project without an actual sdist/wheel so as to discourage squatting project names | 16:00 |
mwhahaha | but haven't had time to look at it | 16:00 |
mwhahaha | because things like the ansible roles don't make sense published via pypi | 16:01 |
fungi | do ansible roles even warrant tarball/wheel builds? | 16:01 |
mwhahaha | not really but we use the tarball bits for packaging | 16:02 |
fungi | seems like for ansible roles you at most want a tag and release announcement | 16:02 |
mwhahaha | it really should be a publish to galaxy | 16:02 |
fungi | oh, yeah i wonder if anyone's working on an ansible galaxy publisher for release artifacts yet | 16:02 |
fungi | mordred: ^ that seems like something you would know about | 16:02 |
clarkb | I think the struggle with galaxy is that it assumes github | 16:03 |
clarkb | and the way we run our github orgs being push-only mirrors from gerrit prevents individuals from doing the github tasks to release on galaxy? I don't recall all the details but it's something like that | 16:03 |
fungi | traceroute from sjc1 suggests that the current fastly endpoints for pypi.org are all via cogent directly or cogent->ntt | 16:06 |
fungi | i suppose if there's flakiness at the cogent/ntt peering point that could be one possible explanation for the timeouts | 16:06 |
clarkb | fungi: fwiw `journalctl -u etherpad-lite` doesn't show any major errors in etherpad. There are a small number of client errors on various pads | 16:08 |
mordred | clarkb, fungi: yes - there is work underway to update galaxy to allow uploading things - and to not assume github | 16:09 |
clarkb | TypeErrors and ReferenceErrors from the js I think | 16:09 |
fungi | yeah, it's also possible the long wait for loading some pads i was seeing was a problem on my end | 16:09 |
mordred | I don't know the status of it - but it was definitely discussed at fest ... I think it's also tied in with the upcoming work on mazer | 16:10 |
clarkb | I did have to reconnect to the clarkb-test etherpad when I sat down this morning but other tabs in browser indicated I probably had a local networking disconnect | 16:10 |
fungi | i'm seeing no icmp packet loss and reasonable rtt from mirror.sjc1.vexxhost.openstack.org to each of the 4 fastly nodes listed for pypi.org at the moment, but this sort of problem tends to come and go at random anyway so that's not surprising | 16:10 |
clarkb | fungi: it could potentially have been a problem between local fastly and backend servers too? | 16:11 |
clarkb | and pip has a shorter timeout than fastly does | 16:11 |
*** agopi has joined #openstack-infra | 16:12 | |
*** jpich has quit IRC | 16:13 | |
*** imacdonn has joined #openstack-infra | 16:15 | |
openstackgerrit | Merged openstack-infra/project-config master: Add publish jobs for ansible-role-openstack-operations https://review.openstack.org/614616 | 16:16 |
AJaeger | config-core, could you review https://review.openstack.org/614651 and https://review.openstack.org/614758 , please? | 16:18 |
*** ykarel|away has quit IRC | 16:21 | |
fungi | clarkb: also entirely possible, yes. we're basically looking at two layers of caching proxies in this case (ours and fastly's) | 16:21 |
fungi | and so maybe it's only affecting one or more of the fastly nodes hit by vexxhost-sjc1 and not the fastly nodes our other mirrors are resolving to | 16:22 |
fungi | for $reason | 16:22 |
EmilienM | AJaeger: test-release-openstack-python3 is still running on https://review.openstack.org/#/c/613621/ I'm confused | 16:22 |
EmilienM | I thought mwhahaha disabled it in project-config | 16:23 |
mwhahaha | EmilienM: there's a patch | 16:23 |
EmilienM | ah I missed that | 16:23 |
mwhahaha | turns out you can't define the template and the job | 16:23 |
mwhahaha | cause the template wins | 16:23 |
mwhahaha | https://review.openstack.org/#/c/614651/ | 16:23 |
*** pall is now known as pabelanger | 16:24 | |
pabelanger | clarkb: mwhahaha: you don't really publish to galaxy today, it is more about triggering a hook in galaxy to import. So, which some work you could write a job in post pipeline to ensure github.com is first mirrored (from gerrit). But you are correct, today it is pinned to github. Shortly, ansible roles will be using mazer (new tool), which will allow you to create a tarball then push that into | 16:26 |
pabelanger | galaxy. | 16:26 |
pabelanger | s/which/with | 16:26 |
*** e0ne has quit IRC | 16:26 | |
mwhahaha | interesting | 16:27 |
pabelanger | next week, I'll be starting to test mazer more and publishing to galaxy-qa from zuul, just to better understand how it all works. But so far, it seems to be much like how a puppet module would be released today. There is a manifest file inside a tarball | 16:28 |
pabelanger | https://github.com/ansible-network/sandbox/pull/24 so far is my POC, with console log for zuul job (creating tarball): https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/24/24/732ca4af4313a479ec0ba9ab6277a165700abf7d/check/build-ansible-role-tarball/1e0ac9d/job-output.html | 16:29 |
*** slaweq has joined #openstack-infra | 16:29 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Set ansible python version for opendev nameservers https://review.openstack.org/614607 | 16:32 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 16:32 |
corvus | clarkb, fungi, mordred: ^ there's a lot of infrastructure in there, so fingers crossed the tests all work, but i think that's ready to go | 16:33 |
corvus | oops, one quick fix needed | 16:34 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 16:34 |
*** jamesmcarthur has joined #openstack-infra | 16:35 | |
corvus | i accidentally checked in my local test playbook :) that's fixed now | 16:35 |
AJaeger | EmilienM: I thought the patch merged to fix it - but that was another one. So, my recheck was too early... | 16:36 |
mordred | corvus: I'm guessing that once we have a story for per-service playbooks, we'd move https://review.openstack.org/#/c/614648/3/playbooks/base.yaml into a adns.yaml playbook or something? | 16:36 |
corvus | mordred: maybe...? probably? but as long as we have the giant ansible wheel, what's in that change makes a lot of sense. | 16:37 |
corvus | mordred: we could go ahead and start splitting it out and doing include_playbook maybe? | 16:38 |
fungi | what's with the "start_services=true" addition? | 16:39 |
mordred | corvus: oh - totally - I think it makes perfect sense as things are now | 16:39 |
corvus | fungi: i wanted the testinfra job to verify that bind can start, but i didn't want to have the playbook start it in production. that's my attempt to have cake and eat it too. | 16:40 |
fungi | oh, i see where it's called from the master-nameserver playbook now | 16:40 |
corvus | if we like that approach, we can use it in other services where we don't necessarily want ansible starting services behind our backs | 16:41 |
fungi | i missed that playbooks/zuul/run-production-playbook.yaml was testing-only | 16:41 |
corvus | fungi: oh, hrm. i think i may have put that in the wrong place... | 16:43 |
*** pabelanger is now known as pall | 16:43 | |
corvus | yep, working on a new revision | 16:45 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 16:46 |
corvus | fungi: good catch. i think that ^ is much more sane too :) | 16:46 |
*** sshnaidm|ruck is now known as sshnaidm|afk | 16:48 | |
clarkb | corvus: lgtm, I approved the python3 base change since that should be a noop until we start trying to deploy stuff with the followup change | 16:48 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove various unused Neutron meetings https://review.openstack.org/612698 | 16:51 |
*** Swami has quit IRC | 16:55 | |
mordred | corvus: left a second +2 on the adns change - leaving it un +Ad assuming you might want to watch it / interact with it | 16:56 |
*** rfolco|rover is now known as rfolco|ruck | 16:58 | |
*** carl_cai has quit IRC | 17:02 | |
corvus | yeah, we can wait for tests and fungi | 17:02 |
fungi | yeah, i wanted to look at the test logs to better understand how this all works together | 17:02 |
fungi | sorry, still feel a little out of my depth reviewing ansible stuff | 17:02 |
*** jamesmcarthur has quit IRC | 17:04 | |
corvus | fungi: i'm happy to answer questions and/or add comments in followup changes | 17:04 |
clarkb | I'm also happy for it to not be perfect on the first pass given this is the second set of ansible only host stuff (bridge was first) | 17:05 |
clarkb | related ish to ^ is the yamlgroup change. ianw has written some test framework for that which I think would be good to review (I've reviewed it already) | 17:06 |
*** udesale has quit IRC | 17:06 | |
*** jamesmcarthur has joined #openstack-infra | 17:07 | |
fungi | yeah, i'm not super worried about it breaking something, more just hesitating to vote on it until i have some grasp of what it's doing ;) | 17:09 |
*** trown is now known as trown|lunch | 17:12 | |
*** jamesmcarthur has quit IRC | 17:16 | |
*** jamesmcarthur has joined #openstack-infra | 17:17 | |
*** abishop has joined #openstack-infra | 17:21 | |
*** diablo_rojo has joined #openstack-infra | 17:22 | |
*** abishop has left #openstack-infra | 17:23 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it https://review.openstack.org/614814 | 17:24 |
clarkb | last bit of nodepool cleanup ^ | 17:24 |
*** Swami has joined #openstack-infra | 17:26 | |
*** slaweq has quit IRC | 17:27 | |
*** eernst has joined #openstack-infra | 17:28 | |
*** e0ne has joined #openstack-infra | 17:31 | |
EmilienM | can someone approve this one please: https://review.openstack.org/#/c/614651/ | 17:31 |
EmilienM | so we can retire some old tripleo projects | 17:31 |
*** eernst has quit IRC | 17:31 | |
clarkb | ya I've got most of that paged in, will take a look | 17:32 |
clarkb | actually simpler than I expected | 17:33 |
fungi | ahh, we double-approved | 17:33 |
EmilienM | thx :D | 17:35 |
*** jpich has joined #openstack-infra | 17:38 | |
fungi | corvus: looks like system-config-run-dns is running for that change now | 17:41 |
fungi | just about to the good parts of the deployment | 17:41 |
*** jamesmcarthur has quit IRC | 17:42 | |
fungi | though system-config-run-eavesdrop just failed for it, and tox-docs seems to have as well | 17:42 |
*** bobh has quit IRC | 17:42 | |
openstackgerrit | Merged openstack-infra/system-config master: Set ansible python version for opendev nameservers https://review.openstack.org/614607 | 17:44 |
openstackgerrit | Merged openstack-infra/project-config master: Fix instack jobs (again) https://review.openstack.org/614651 | 17:46 |
*** e0ne has quit IRC | 17:47 | |
fungi | corvus: apparently it's still trying to use puppet | 17:47 |
fungi | fatal: [adns1.opendev.org]: FAILED! => {"changed": false, "msg": "Unsupported puppet version '3' on this platform"} | 17:47 |
*** yamamoto has quit IRC | 17:49 | |
*** mriedem has left #openstack-infra | 17:56 | |
*** mriedem has joined #openstack-infra | 17:56 | |
*** jamesmcarthur has joined #openstack-infra | 17:59 | |
*** e0ne has joined #openstack-infra | 17:59 | |
*** derekh has quit IRC | 18:00 | |
*** sshnaidm|afk is now known as sshnaidm|off | 18:00 | |
*** e0ne has quit IRC | 18:02 | |
*** ifat_afek has quit IRC | 18:03 | |
*** e0ne has joined #openstack-infra | 18:05 | |
*** anteaya has quit IRC | 18:05 | |
mwhahaha | meh now getting timed out for pypi from iad.rax http://logs.openstack.org/71/614571/2/gate/tripleo-ci-centos-7-undercloud-containers/f6332aa/job-output.txt.gz#_2018-11-01_15_02_28_462326 | 18:06 |
clarkb | likely a pypi issue if it is affecting clouds on different sides of the country | 18:07 |
*** e0ne has quit IRC | 18:07 | |
mwhahaha | yea | 18:07 |
clarkb | mwhahaha: https://status.python.org/ shows that pypi considers itself to be functional but there is a growing number of errors listed for pypi.org if you scroll down a bit | 18:10 |
clarkb | mordred: are you around? I'm going to look at nodepool launchers and builders and getting their software updated now. To double check: 0.19.0 is the sdk version we want, which includes the port listing fix and the image task upload fix? | 18:12 |
clarkb | mordred: and do you think we should restart services with 0.19.0 in place or just ensure it gets installed as expected? | 18:12 |
clarkb | (I'm not sure how well tested those fixes ended up being in sdk) | 18:13 |
*** ralonsoh has quit IRC | 18:13 | |
clarkb | mordred: https://docs.openstack.org/releasenotes/openstacksdk/unreleased.html#relnotes-0-19-0 shows the image upload fix but not the port listing fix? | 18:15 |
clarkb | I'll start with the builders in this case | 18:16 |
clarkb | nb01-nb03 are removed from the emergency file. I'll check that puppet updates things there with new sdk version and restart builder daemons once that looks good | 18:17 |
*** yamamoto has joined #openstack-infra | 18:17 | |
*** yamamoto has quit IRC | 18:17 | |
openstackgerrit | Ryan Beisner proposed openstack-infra/openstack-zuul-jobs master: Add py3 jobs with an unspecified minor version https://review.openstack.org/614823 | 18:18 |
corvus | Failed to read build info file: ValueError("build info file is broken: IndexError('list index out of range',)",) | 18:21 |
corvus | anyone know what that means in the context of a sphinx build? (specifically 'tox -re docs' in system-config) | 18:22 |
corvus | oh, maybe i should rm -rf docs/build ? | 18:22 |
*** kjackal_v2 has quit IRC | 18:23 | |
corvus | yes. apparently "build info file is broken" is a typo for "please run rm -rf docs/build" | 18:23 |
mordred | clarkb: yes - 0.19.0 is the one | 18:23 |
clarkb | mordred: and that has the ports fix too? | 18:24 |
mordred | clarkb: yah - should do - lemme verify | 18:24 |
mordred | clarkb: yes - looks like we didn't do a release note because we're bad people | 18:25 |
clarkb | mordred: ok I'll do the launchers after the builders. Should I go ahead and restart launchers too? probably a reasonable idea | 18:26 |
mordred | clarkb: yah - to pick up the change - although, just to be safe, maybe let's pick one launcher and restart it and then make sure it didn't delete the world | 18:26 |
clarkb | ok | 18:26 |
clarkb | I think we still reverted the change in nodepool right? | 18:27 |
mordred | yah. oh - right - so no need to do that just right now | 18:27 |
mordred | we want to do that once we land the nodepool change | 18:27 |
mordred | should be safe to update to 0.19.0 | 18:27 |
clarkb | mostly I want to get the hosts out of the emergency file so they are no longer special | 18:27 |
clarkb | (also so that normal config updates work again) | 18:28 |
*** jcoufal has quit IRC | 18:28 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 18:28 |
*** jcoufal has joined #openstack-infra | 18:28 | |
corvus | clarkb, fungi, mordred: ^ i think that fixes all the bugs shown by the first round of testing (yay for running all the jobs :) | 18:28 |
mordred | corvus: changes look reasonable | 18:31 |
mordred | clarkb: ++ | 18:31 |
openstackgerrit | Ryan Beisner proposed openstack-infra/openstack-zuul-jobs master: Add py3 jobs with an unspecified minor version https://review.openstack.org/614823 | 18:32 |
*** jcoufal has quit IRC | 18:32 | |
*** jcoufal has joined #openstack-infra | 18:33 | |
*** jcoufal has quit IRC | 18:36 | |
*** jcoufal has joined #openstack-infra | 18:36 | |
*** diablo_rojo has quit IRC | 18:47 | |
*** florianf is now known as florianf|afk | 18:48 | |
*** diablo_rojo has joined #openstack-infra | 18:48 | |
AJaeger | config-core, do we want this in openstack-zuul-jobs or should we ask to make this charm specific in a charm repo? ^ | 18:48 |
clarkb | for some reason nb03 didn't update openstacksdk when it updated nodepool. nb01 and nb02 did | 18:49 |
clarkb | different versions of pip3 may explain it. I am going to manually update to 0.19.0 on nb03 and restart there so all the builders run the same code | 18:50 |
clarkb | ah the reason is we don't require newer sdk in nodepool | 18:51 |
*** yamamoto has joined #openstack-infra | 18:53 | |
*** rfolco has joined #openstack-infra | 18:54 | |
*** rfolco|ruck has quit IRC | 18:56 | |
*** apetrich has quit IRC | 18:56 | |
*** diablo_rojo has quit IRC | 18:57 | |
clarkb | nb01-nb03 are restarted and running with openstacksdk 0.19.0 now. I have removed nl01-nl04 from the emergency file and will restart them once updated | 18:57 |
*** diablo_rojo has joined #openstack-infra | 18:59 | |
*** jpich has quit IRC | 18:59 | |
*** yamamoto has quit IRC | 19:00 | |
*** pcaruana has quit IRC | 19:05 | |
corvus | mordred: i need the ipv4 and ipv6 addresses of the host in ansible -- the equivalent of ::ipaddress and ::ipaddress6 in puppet. do you think that's ansible_default_ipv4.address ansible_default_ipv6.address ? | 19:06 |
corvus | (i believe the adns ansible change is going to need one more revision because of that -- we really don't want named listening on all addresses, only the public ones, because unbound should be listening on the local addresses) | 19:07 |
*** apetrich has joined #openstack-infra | 19:08 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it https://review.openstack.org/614814 | 19:09 |
clarkb | infra-root ^ should be a fairly easy cleanup change. Thank you pabelanger for the earlier review | 19:10 |
*** bobh has joined #openstack-infra | 19:13 | |
*** trown|lunch is now known as trown | 19:14 | |
*** bobh has quit IRC | 19:18 | |
*** panda is now known as panda|off | 19:20 | |
mtreinish | infra-root: it looks like there's something up with the mysql proxy on logstash.o.o again. It just seems to hang on trying to establish a connection to the database for me | 19:24 |
mtreinish | at 2min and counting... | 19:24 |
*** electrofelix has quit IRC | 19:28 | |
*** diablo_rojo has quit IRC | 19:31 | |
mnaser | could we create/maintain our own groups within gerrit somehow? | 19:32 |
*** diablo_rojo has joined #openstack-infra | 19:33 | |
clarkb | mtreinish: ok I'll see what I can find after lunch | 19:36 |
clarkb | mnaser: all of that is driven by the acls file config. We create groups that show up in the acl files | 19:36 |
clarkb | the launchers did not update openstacksdk, because they already had up to date nodepool. I'll have to update openstacksdk on them by hand :/ | 19:37 |
mtreinish | clarkb: ok thanks | 19:39 |
*** kjackal has joined #openstack-infra | 19:41 | |
AJaeger | mnaser: and once a group is created and the first person added, you can maintain it yourself... | 19:42 |
*** jamesmcarthur has quit IRC | 19:43 | |
*** sthussey has quit IRC | 19:51 | |
*** jamesmcarthur has joined #openstack-infra | 19:58 | |
ianw | clarkb: empirically "from . yamlgroup import InventoryModule" works, but you're right, i wonder how | 19:58 |
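ianw's puzzlement above is actually settled by Python's grammar: whitespace between the relative-import dot and the module name is insignificant, so `from . yamlgroup import X` parses to exactly the same thing as `from .yamlgroup import X`. A quick standard-library check:

```python
import ast

# Both spellings produce an identical AST: ImportFrom(module='yamlgroup',
# level=1), i.e. "import yamlgroup from the current package".
a = ast.dump(ast.parse("from . yamlgroup import InventoryModule"))
b = ast.dump(ast.parse("from .yamlgroup import InventoryModule"))
print(a == b)  # True
```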
*** jamesmcarthur_ has joined #openstack-infra | 19:58 | |
*** jamesmcarthur has quit IRC | 20:02 | |
openstackgerrit | Kendall Nelson proposed openstack-infra/infra-specs master: StoryBoard Story Attachments https://review.openstack.org/607377 | 20:05 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin https://review.openstack.org/614693 | 20:05 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin https://review.openstack.org/614694 | 20:05 |
*** eharney has quit IRC | 20:08 | |
*** e0ne has joined #openstack-infra | 20:11 | |
ianw | clarkb / amorin: it looks like ovh-gra1 is out again. do we want to remove it? i guess it slows things down as we waste time trying to create nodes, although ultimately it isn't a correctness issue as the nodes eventually go somewhere else | 20:12 |
ianw | infra-core: would be great if someone else can pass an eye over the host list in https://review.openstack.org/#/c/614693/2/inventory/groups.yaml and just look for anomalies; otherwise i'll rebase & squish it into the base change when clarkb's nodepool.o.o removal stuff goes in | 20:15 |
*** jamesmcarthur_ has quit IRC | 20:15 | |
clarkb | ianw: ya we probably should remove gra1. Let me restart nl04 to pick up the openstacksdk update (this launcher runs ovh) then we can update the config from there? | 20:17 |
clarkb | I'll restart the other three launchers if ovh bhs1 continues to look happy | 20:17 |
clarkb | restart on nl04 is done | 20:18 |
ianw | clarkb: ++ sounds like a plan, good idea to stage the rollout :) | 20:18 |
*** eernst has joined #openstack-infra | 20:19 | |
clarkb | mtreinish: simpleproxy has >7k processes running. My hunch is that it has run out of processes or fds or both | 20:19 |
mtreinish | lol, ok | 20:19 |
clarkb | mtreinish: we upgraded this server from trusty to xenial not too long ago. I wonder if this is a bug in simpleproxy | 20:19 |
mtreinish | yeah, that sounds like a bug | 20:20 |
mtreinish | or a dos :p | 20:20 |
clarkb | mtreinish: I'm going to restart simple proxy and we can monitor to see if it happens again | 20:20 |
*** eernst has quit IRC | 20:20 | |
mnaser | clarkb: AJaeger my apologies, i meant to say was project groups that is | 20:20 |
*** eernst has joined #openstack-infra | 20:20 | |
clarkb | mnaser: you'll need to use more words I think for me to understand what you mean by that | 20:21 |
clarkb | mtreinish: service stop totally didn't stop the service :) | 20:21 |
mnaser | clarkb: something like https://review.openstack.org/#/admin/projects/API-Projects? | 20:21 |
mnaser | so we can watch directly on that | 20:21 |
mnaser | (and have all sorts of projects be children of it, that way, we can just set "openstack-ansible" as watched project rather than listing them all) | 20:22 |
mtreinish | clarkb: hmm, it looks like it's the same version (just a package revision) between trusty and xenial: https://packages.ubuntu.com/search?keywords=simpleproxy | 20:22 |
clarkb | mnaser: do project watches work that way? the API-Projects project is for common acls isn't it? | 20:23 |
mtreinish | heh, it's not like it's a super active project either: https://github.com/vzaliva/simpleproxy :) | 20:23 |
clarkb | mnaser: I think if you want to see events from individual projects you have to subscribe to them but I've never tested it | 20:23 |
clarkb | mtreinish: ok I manually cleaned up the process and restarted the service. Can you test it now | 20:23 |
mnaser | clarkb: well i bring this up because when i wanted to watch projects, i realized that adding all the openstack ansible stuff would take forever | 20:24 |
clarkb | mtreinish: there are already 7.9k processes again | 20:24 |
mnaser | and would fall out of sync | 20:24 |
clarkb | mnaser: sure, I'm just not sure that the proposed solution solves the problem either :/ | 20:24 |
mnaser | ah i see | 20:24 |
*** eernst has quit IRC | 20:25 | |
*** jamesmcarthur has joined #openstack-infra | 20:25 | |
mnaser | is there a common way that folks have been adding a lot of projects under a team or something like that? | 20:25 |
mtreinish | clarkb: yeah, still the same thing, it just hangs trying to connect | 20:25 |
*** jamesmcarthur has quit IRC | 20:26 | |
clarkb | mtreinish: my hunch reading the code is that maybe it's a systemd interaction | 20:26 |
clarkb | they have inetd handling but probably don't do the right thing with systemd maybe? | 20:26 |
*** eernst has joined #openstack-infra | 20:26 | |
*** rlandy is now known as rlandy|brb | 20:26 | |
*** e0ne has quit IRC | 20:28 | |
*** jamesmcarthur has joined #openstack-infra | 20:28 | |
clarkb | Nov 01 20:20:25 logstash01 simpleproxy-mysql[26071]: start-stop-daemon: need at least one of --exec, --pid, --ppid, --pidfile, --user or --name | 20:29 |
clarkb | mnaser: I think some people have scripted the subscriptions? I do the opposite and don't watch anything via gerrit because its email is already too noisy | 20:30 |
mtreinish | clarkb: oh, that might be on us. I think we wrote an init file in the puppet-simpleproxy | 20:30 |
mtreinish | and on xenial we might have to update it for systemd | 20:30 |
mtreinish | http://git.openstack.org/cgit/openstack-infra/puppet-simpleproxy/tree/templates/simpleproxy-mysql.init.erb | 20:30 |
*** eernst has quit IRC | 20:31 | |
*** ansmith has quit IRC | 20:31 | |
*** kgiusti has left #openstack-infra | 20:32 | |
clarkb | mtreinish: using an init script is fine, we just need to tell start-stop-daemon how to identify the service | 20:32 |
*** dave-mccowan has quit IRC | 20:32 | |
clarkb | looks like simple-proxy takes a -p argument for the pidfile | 20:32 |
*** eernst has joined #openstack-infra | 20:32 | |
clarkb | so we can do that or use --exec to point to the binary | 20:32 |
clarkb | mnaser: reading the gerrit docs All-Projects is special for all projects visiable to the user, which is all our projects. | 20:33 |
clarkb | mnaser: that behavior appears built in and isn't configurable on a subset from what I can see | 20:34 |
clarkb | mnaser: what I think you can do is subscribe to All-Projects then define the search string in the only-if to match osa with a regex maybe? | 20:34 |
clarkb | "Changes occurring in 'PROJECT'. If 'PROJECT' starts with ^ it matches project names by regular expression. The dk.brics.automaton library is used for evaluation of such patterns." | 20:35 |
clarkb | mnaser: ^ that is what I would try. Something like subscribe to All-Projects then in onlyif box: project:^openstack/openstack-ansible.* | 20:35 |
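Gerrit evaluates those ^-prefixed watch patterns with the dk.brics.automaton library, so Python's `re` is only a stand-in here, but for a simple prefix pattern like the one clarkb suggests the behavior lines up. The project names below are illustrative:

```python
import re

# Rough equivalent of entering "project:^openstack/openstack-ansible.*"
# in the "Only If" field of a watch on All-Projects; re.match anchors at
# the start of the string, playing the role of the leading ^.
pattern = re.compile(r"openstack/openstack-ansible.*")

projects = [
    "openstack/openstack-ansible",          # umbrella repo
    "openstack/openstack-ansible-os_nova",  # one of the role repos
    "openstack/nova",                       # unrelated project
]

matched = [p for p in projects if pattern.match(p)]
print(matched)
```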
clarkb | ianw: looks like the gra1 errors are due to instance quota? | 20:36 |
clarkb | ianw: I think our quota must've gotten out of sync there | 20:36 |
openstackgerrit | Merged openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it https://review.openstack.org/614814 | 20:36 |
*** eernst has quit IRC | 20:37 | |
*** eernst has joined #openstack-infra | 20:38 | |
*** eernst has quit IRC | 20:42 | |
clarkb | mtreinish: I've done what I think start-stop-daemon want. Can you take a look again? lets see if that fixes it. If so I can push a change up | 20:44 |
mtreinish | clarkb: hmm, still looks like it's hanging | 20:46 |
clarkb | mtreinish: ok it seems to forkbomb when you connect (which makes sense since it's trying to do process-per-connection I think) | 20:47 |
clarkb | mtreinish: my dumb telnet didn't induce it to do this | 20:47 |
mtreinish | clarkb: fwiw, I"m just running 'mysql -u query --password=query -h logstash.openstack.org subunit2sql' | 20:50 |
*** slaweq has joined #openstack-infra | 20:50 | |
*** eernst has joined #openstack-infra | 20:50 | |
*** bobh has joined #openstack-infra | 20:52 | |
clarkb | mtreinish: and you must've closed the connection? because I see no more processes now | 20:52 |
mtreinish | yeah I did | 20:53 |
mtreinish | didn't see the point in having it sit there | 20:53 |
ianw | clarkb: maybe ... last errors in the cleanup were neutron timeouts | 20:53 |
clarkb | ya ok I can reproduce that locally now. Start a single connection and it forkbombs. Kill the connection, no more forks | 20:54 |
*** bobh has quit IRC | 20:56 | |
*** bobh has joined #openstack-infra | 20:56 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin https://review.openstack.org/602385 | 20:56 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin https://review.openstack.org/614693 | 20:56 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin https://review.openstack.org/614694 | 20:56 |
clarkb | first attempt at strace failed. But this is reproducible so I should be able to get there | 20:58 |
*** eernst has quit IRC | 20:59 | |
*** xarses_ has quit IRC | 21:04 | |
*** erlon has quit IRC | 21:05 | |
*** bobh has quit IRC | 21:05 | |
*** eernst has joined #openstack-infra | 21:05 | |
clarkb | mtreinish: the strace makes it look like it is getting many connections. the sin_port changes for my localhost connection via telnet, it then forks (well really clones) a bunch of subprocesses | 21:06 |
clarkb | mtreinish: https://github.com/vzaliva/simpleproxy/blob/master/simpleproxy.c#L419 that returns way more than I would expect it basically | 21:07 |
*** eernst has quit IRC | 21:08 | |
clarkb | ianw: looks like we continue to function on nl04, should be good to restart the other three launchers with new sdk now ya? | 21:08 |
mordred | clarkb: yay! I always like it when things continue to function | 21:10 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 21:11 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Remove puppet config for opendev nameservers https://review.openstack.org/614869 | 21:11 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 21:11 |
*** trown is now known as trown|outtypewww | 21:11 | |
corvus | clarkb, mordred, fungi: ^ removed unneeded puppet stuff, fixed up an error i missed last time in adns1, reworked the group vars slightly, and added the stuff for the other nameservers | 21:12 |
corvus | the whole stack is there now, and should be ready to go modulo lingering bugs | 21:12 |
corvus | now i need to look at ianw and mordred's group stuff because they conflict | 21:12 |
corvus | clarkb, ianw: yamlgroups uses regexes, right? | 21:13 |
clarkb | corvus: current implementation is fnmatch in python which is unix shell globbing | 21:13 |
corvus | oh ok | 21:14 |
corvus | i should be able to rework based on that | 21:14 |
corvus | are we more or less ready to put yamlgroup in? ie, should i rebase the nameserver stack on it? | 21:14 |
ianw | corvus: i think so; at least i'm happy that if we have made a mistake, we have some testing to avoid making the same mistake twice :) | 21:15 |
clarkb | mtreinish: ok I figured it out http://git.openstack.org/cgit/openstack-infra/puppet-simpleproxy/tree/templates/simpleproxy-mysql.init.erb#n19 db_host isn't being populated for some reason | 21:15 |
clarkb | mtreinish: the default is to connect to localhost. So what is happening is first connection comes in then proxy connects to itself | 21:16 |
clarkb | mtreinish: then that creates a recursive talk to myself loop | 21:16 |
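A minimal sketch of the failure mode clarkb describes, with a hypothetical helper and port values (this is not simpleproxy's actual code): when a proxy listening on a port is told to forward to localhost on that same port, every accepted connection opens a new connection back to the proxy itself, and a fork-per-connection design turns that into a fork bomb.

```python
import ipaddress
import socket

def is_self_loop(listen_port, upstream_host, upstream_port):
    """Would a proxy listening on all interfaces connect back to itself?"""
    if listen_port != upstream_port:
        return False
    # Resolve the upstream so "localhost" and "127.0.0.1" both count.
    addrs = {info[4][0] for info in socket.getaddrinfo(upstream_host, None)}
    return any(ipaddress.ip_address(a).is_loopback for a in addrs)

# db_host unset -> upstream falls back to localhost on the same port: loop.
print(is_self_loop(3306, "localhost", 3306))   # True: recursive loop
print(is_self_loop(3306, "192.0.2.10", 3306))  # False: remote upstream
```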
corvus | ianw: do i understand correctly that yamlgroup is added in ansible 2.8, which is not released yet, but when it is, we can drop our local copy? | 21:16 |
ianw | oh, i'm not sure on the upstreaming plans | 21:17 |
ianw | it doesn't seem to be in my devel branch checkout | 21:18 |
corvus | ianw: oh, mordred wrote that from scratch? i thought it was a backport. | 21:18 |
clarkb | I'm taking the hiera/ansible hostvar lock | 21:18 |
corvus | neat. | 21:18 |
ianw | ok, since you and clarkb have looked over the groups.yaml now, i'll squash it down | 21:19 |
corvus | so, rather, yamlgroup is original to mordred and may be upstreamed when he gets around to it :) | 21:19 |
*** jamesmcarthur has quit IRC | 21:19 | |
clarkb | and done | 21:19 |
mordred | yeah - it's original work - haven't put much thought/energy into upstreaming yet | 21:19 |
mtreinish | clarkb: ooh fun | 21:20 |
corvus | ianw, mordred: in that case -- real quick -- are we sure we want fnmatch and not regex? :) | 21:20 |
clarkb | mtreinish: I think this should sort itself out on the next puppet run | 21:20 |
clarkb | mtreinish: I had to update some hiera because the hostname changed from logstash.o.o to logstash01.o.o | 21:20 |
mordred | corvus: nope! not sure at all - I did fnmatch because all of the escaping of \. in things ... but I don't have strong feelings | 21:20 |
corvus | ianw, mordred: i'm asking on behalf of the regex "(ad)?ns\d+\.opendev\.org". i'm sure we can just split that into two lines and it'll be fine. | 21:20 |
ianw | corvus: i don't mind ... and that was why i wrote the unit-test framework so switching it could be more like test-driven development :) | 21:21 |
corvus | mordred: yeah. i think i can live with two fnmatches for (ad)ns in exchange for not typing "\." all the time. | 21:21 |
corvus | just thought i'd check. | 21:21 |
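The trade-off corvus and mordred are weighing can be sketched directly: the single regex covers both nameserver patterns but needs `\.` escaping, while fnmatch-style globbing (what the yamlgroup plugin uses) needs two patterns and none. The hostnames are examples taken from the discussion:

```python
import fnmatch
import re

hosts = ["ns1.opendev.org", "adns1.opendev.org", "mirror01.ovh.openstack.org"]

# One regex covers both nameserver flavors, at the cost of escaping dots.
regex = re.compile(r"(ad)?ns\d+\.opendev\.org")
by_regex = [h for h in hosts if regex.fullmatch(h)]

# Shell-style globbing splits the pattern in two, but needs no escaping.
globs = ["ns*.opendev.org", "adns*.opendev.org"]
by_glob = [h for h in hosts if any(fnmatch.fnmatchcase(h, g) for g in globs)]

print(by_regex)             # ['ns1.opendev.org', 'adns1.opendev.org']
print(by_regex == by_glob)  # True
```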
clarkb | ok with logstash sorted I'm going to restart the other three nodepool launchers | 21:21 |
*** rlandy|brb is now known as rlandy | 21:21 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin https://review.openstack.org/602385 | 21:21 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin https://review.openstack.org/614694 | 21:21 |
mordred | corvus: we're getting too agreeable in our old age | 21:22 |
corvus | mordred: no we're not | 21:22 |
corvus | i'm rebasing the adns/ns changes on yamlgroup now | 21:22 |
mordred | corvus: yes we ARE! | 21:23 |
mordred | corvus: I left a couple of nitpick reviews on those | 21:23 |
mordred | corvus: so if you're respinning, might be worth looking at | 21:23 |
corvus | mordred: ah thanks, i'll grab them in this round | 21:23 |
clarkb | #status log openstacksdk 0.19.0 installed on nl01-04 and nb01-03 and all nodepool launchers and builders have been restarted | 21:24 |
openstackstatus | clarkb: finished logging | 21:24 |
clarkb | mordred: ^ fyi | 21:25 |
mordred | \o/ | 21:26 |
clarkb | mnaser: when your meeting is over I'd be curious to learn if your subscription change as I described above works or not | 21:26 |
corvus | wow... i just realized... we're probably about 10 lines of code away from being able to have the 'dns' test job *actually* serve a copy of the opendev zone and verify the master and authoritative servers are serving the data. | 21:26 |
mnaser | clarkb: i'll give it a shot in a little bit, middle of an upgrade :< | 21:27 |
mordred | corvus: that's super cool | 21:27 |
mnaser | famous last words, 10 lines which end up taking 3 weeks to get working | 21:27 |
mnaser | :P | 21:27 |
*** jamesmcarthur has joined #openstack-infra | 21:27 | |
corvus | mnaser: absolutely! | 21:27 |
mordred | mnaser: ++ | 21:32 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Remove puppet config for opendev nameservers https://review.openstack.org/614869 | 21:33 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 21:33 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 21:33 |
clarkb | mtreinish: ok try it now | 21:33 |
corvus | clarkb, fungi, mordred, ianw: okay that's rebased, fixed, and ready to go | 21:33 |
*** ansmith has joined #openstack-infra | 21:34 | |
clarkb | corvus: ok I'll be taking a look at the whole stack shortly | 21:35 |
ianw | corvus: should we babysit through the ansible update with https://review.openstack.org/#/c/609556/ and get that out of the way? | 21:35 |
TheJulia | is etherpad grumpy today or is it just my browser? | 21:35 |
mtreinish | clarkb: it works! \o/ | 21:35 |
mtreinish | clarkb: thanks, now I can generate a new graph :) | 21:36 |
*** slaweq has quit IRC | 21:36 | |
clarkb | mtreinish: yay, and sorry for that, but that was totally my derp when we deployed the upgraded thing. | 21:36 |
*** agopi is now known as agopi|pto | 21:36 | |
*** xek__ has quit IRC | 21:36 | |
clarkb | TheJulia: we've been trying to track down "slowness". There was a webserver configuration bug that we fixed yesterday. Any more data since then would be helpful to track down additional problems | 21:37 |
clarkb | fungi: re ^ I'm beginning to suspect the database since the server itself seems happy now memory and cpu wise | 21:37 |
clarkb | maybe mordred can take a look at the database? | 21:37 |
TheJulia | I just closed some of my browser tabs and it seems happier, but I kept suddenly losing connection like the socket was closing out behind it, but I didn't manage to capture anything like tcpdumps of it | 21:38 |
clarkb | TheJulia: did that happen recently? the webserver config bug from yesterday would've caused that | 21:38 |
clarkb | TheJulia: but we now believe that to be fixed | 21:38 |
TheJulia | clarkb: multiple times with-in the last 5 minutes | 21:38 |
clarkb | TheJulia: also if you haven't looked at etherpad since yesterday we had to restart things to pick up the config change | 21:38 |
corvus | ianw: yes let's. | 21:38 |
mordred | corvus: hrm. I may have lied to you | 21:39 |
corvus | ruh roh | 21:39 |
corvus | mordred: i find that very disagreeable :) | 21:39 |
mordred | corvus: I agree! | 21:39 |
*** carl_cai has joined #openstack-infra | 21:40 | |
mordred | corvus: ok. it's not ansible.default_ipv4 - it's ansible_facts.default_ipv4 ... | 21:40 |
mordred | corvus: and it's not ansible_facts.default_ipv4, it's ansible_facts.default_ipv4.address | 21:41 |
*** agopi|pto has quit IRC | 21:41 | |
mordred | corvus: http://paste.openstack.org/show/733922/ | 21:41 |
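[Editor's note: the fact path mordred converges on can be sketched as a small template fragment. This is a sketch only; the surrounding template and the key it feeds are assumptions, not the actual system-config template.]

```
{# Sketch of the fact path discussed above. It is
   ansible_facts.default_ipv4.address -- not ansible.default_ipv4,
   and not ansible_facts.default_ipv4 alone, because default_ipv4
   is a dict whose "address" key holds the IP string. #}
ip-address: {{ ansible_facts.default_ipv4.address }}
```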
clarkb | TheJulia: were those pads in use before the reconnect but sometime today (so after the restarted webserver yesterday) | 21:41 |
TheJulia | clarkb: brand new pads | 21:42 |
TheJulia | One time it just spun on loading the boilerplate new-pad message | 21:42 |
fungi | clarkb: mtreinish: i'm still catching up on scrollback, but is it possible the proxy is connecting to itself spawning another proxy recursively? | 21:43 |
fungi | oh, as i read further, that's precisely what happened | 21:43 |
*** bobh has joined #openstack-infra | 21:43 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 21:43 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 21:43 |
corvus | mordred: ^ | 21:43 |
clarkb | fungi: yup! | 21:44 |
clarkb | TheJulia: ok reading the etherpad lite logs I don't see any errors recently other than for stx-* etherpads (which seem older than brand new) | 21:44 |
clarkb | TheJulia: which implies it's probably another webserver problem | 21:44 |
mordred | corvus: that nsd.conf looks so weird with ip-address given twice | 21:44 |
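[Editor's note: the "ip-address given twice" shape that corvus confirms is correct can be sketched as below. Addresses are placeholders from documentation ranges, not the real nameserver addresses; unlike YAML, nsd.conf allows a key to repeat, and repeating ip-address is how nsd listens on multiple addresses.]

```
# Hypothetical nsd.conf fragment -- repeated keys are valid here,
# which is why it "looks weird" if you read it as YAML.
server:
    ip-address: 203.0.113.10
    ip-address: 2001:db8::10
```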
TheJulia | clarkb: well, since I closed a bunch of tabs, it has seemed better behaved, some of those windows had disconnected etherpads | 21:45 |
clarkb | I wonder if it is reusing connections and not noticing they have been closed | 21:46 |
corvus | mordred: it is correct though. :) not actually a yaml file :) | 21:46 |
TheJulia | it shouldn't.... considering the max lifetime in a browser is supposed to only be 2 minutes 30 seconds if not actively doing something | 21:47 |
TheJulia | or a websocket | 21:47 |
clarkb | TheJulia: ya it uses websockets | 21:47 |
TheJulia | obscure browser behavior knowledge \o/ | 21:47 |
TheJulia | yeah, it could have been really confused then, but those shouldn't be reused across windows... | 21:47 |
TheJulia | feels like a browser bug | 21:48 |
clarkb | well, let us know if you learn anything else about the behavior so we can continue to try and track down the weirdness. | 21:48 |
*** jamesmcarthur_ has joined #openstack-infra | 21:48 | |
clarkb | fwiw I've not personally noticed etherpad issues since we fixed the server config | 21:48 |
*** bobh has quit IRC | 21:48 | |
TheJulia | will do, if I see it again I'll start packet capturing and such | 21:48 |
clarkb | double checking the server status page we have tons of free slots for connections. There are only 194 connections and we allow for 4k | 21:50 |
*** bobh has joined #openstack-infra | 21:51 | |
*** jamesmcarthur has quit IRC | 21:52 | |
clarkb | fungi: smcginnis dhellmann ^ have any other data to add in the last ~12 hours or so? | 21:52 |
TheJulia | hmmmm... "Loading..." | 21:52 |
clarkb | TheJulia: this is on a newly opened pad? | 21:53 |
* TheJulia goes and opens up tcpdump | 21:53 | |
TheJulia | just tried to create a new pad | 21:53 |
clarkb | ok I was able to click the new pad button and get one. /me tries by direct url | 21:53 |
clarkb | ya that works too. So it's not consistent :( | 21:54 |
clarkb | fwiw I am running firefox 63.0b14 | 21:54 |
TheJulia | hmmmm | 21:54 |
TheJulia | I have a theory | 21:54 |
TheJulia | actually having to install tcpdump now | 21:54 |
clarkb | ok. Chrome 70.0.stuff seems to work too | 21:55 |
TheJulia | yeah, seems my browser is rolling over on six connections and it is trying to re-use :\ | 21:57 |
TheJulia | weird | 21:57 |
TheJulia | and it tries resetting the one it is killing and... wow | 21:58 |
* TheJulia switches browsers | 21:58 | |
smcginnis | I just tried 4 different etherpad links and they all opened in a decent amount of time. | 21:59 |
smcginnis | Chrome on my side | 21:59 |
* TheJulia tries to repopulate the pad again :( | 21:59 | |
clarkb | TheJulia: fwiw http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=115&rra_id=all shows a spike in new connections to etherpad since we upgraded the server for it. This is what pointed at a bad server config (our connection tuning stuff to allow more connections wasn't working), but it hasn't gone away since we fixed that | 22:00 |
clarkb | perhaps newer apache with websockets tickles some bug in $browser? | 22:00 |
mordred | clarkb: oh good | 22:01 |
TheJulia | Maybe, this is a relatively new desktop build and I've just been using the default firefox debian installed, so far chrome seems to be behaving | 22:01 |
*** jcoufal has quit IRC | 22:02 | |
mordred | TheJulia: my favorite part of modern life is how chrome works for a while, then stops, at which point firefox is the right choice, until something breaks it and it turns out chrome works best | 22:04 |
TheJulia | mordred: sadly that doesn't come with tequila | 22:04 |
mordred | TheJulia: anything can come with tequila | 22:05 |
mordred | TheJulia: it's a simple solution - just add tequila | 22:05 |
TheJulia | :) | 22:05 |
clarkb | ya chrome websocket network monitoring thing looks pretty happy here too | 22:07 |
clarkb | firefox's is unfortunately far less readable | 22:08 |
clarkb | oh now that is curious though | 22:09 |
clarkb | it almost looks like firefox is not using web sockets | 22:09 |
clarkb | but chrome is | 22:09 |
openstackgerrit | Merged openstack-infra/system-config master: Install current ansible https://review.openstack.org/609556 | 22:10 |
clarkb | or maybe even chrome isn't? | 22:11 |
clarkb | I'm going to do a thing | 22:15 |
clarkb | mod proxy wstunnel was not enabled. I have enabled it and restarted apache | 22:16 |
clarkb | now to test if my browsers act differently | 22:16 |
*** threestrands has joined #openstack-infra | 22:17 | |
clarkb | I think this is the fix | 22:18 |
clarkb | etherpad was falling back on the non ws method and polling | 22:18 |
clarkb | TheJulia: ^ want to see if it is happier for you now? | 22:18 |
* clarkb writes a change to make this actually puppeted | 22:18 | |
TheJulia | ack | 22:18 |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-etherpad_lite master: Enable mod_proxy_wstunnel https://review.openstack.org/614883 | 22:20 |
clarkb | infra-root ^ that may fix the etherpad struggles | 22:20 |
clarkb | I enabled it manually already though because having TheJulia test seems like a good idea | 22:20 |
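[Editor's note: a sketch of why the missing module mattered. The port, paths, and directives below are assumptions about a typical etherpad-lite reverse proxy, not the actual vhost from puppet-etherpad_lite; the module itself is enabled on Debian/Ubuntu with `a2enmod proxy_wstunnel` followed by an apache restart.]

```
# Sketch only. Apache can only honor a ws:// ProxyPass when
# mod_proxy_wstunnel is loaded; without it the WebSocket upgrade
# fails and etherpad's socket.io client silently falls back to
# long-polling, which matches the connection-churn seen in cacti.
ProxyPass        "/socket.io/" "ws://127.0.0.1:9001/socket.io/"
ProxyPass        "/"           "http://127.0.0.1:9001/"
ProxyPassReverse "/"           "http://127.0.0.1:9001/"
```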
TheJulia | no issues so far | 22:21 |
TheJulia | seems to be rocking along | 22:21 |
clarkb | alright I need a short break then will review the ansible dns and yamlgroup stack | 22:22 |
*** mriedem has quit IRC | 22:23 | |
*** bobh has quit IRC | 22:24 | |
*** boden has quit IRC | 22:29 | |
dhellmann | clarkb : I have not been experiencing any issues with etherpad this afternoon. | 22:34 |
dhellmann | TheJulia : I've seen issues like what you describe when I had a bad JS file cache for the etherpad server. | 22:34 |
dhellmann | clearing my cache fixed it | 22:34 |
TheJulia | it has been rock solid since clarkb's change and the restart | 22:36 |
TheJulia | But, I have too seen something like that in the past where I needed to update my cache | 22:36 |
TheJulia | Anyway, I think I'm done creating etherpads today. | 22:36 |
openstackgerrit | Merged openstack-infra/yaml2ical master: add monthly recurrence options https://review.openstack.org/608680 | 22:49 |
*** jamesmcarthur_ has quit IRC | 22:50 | |
*** fuentess has quit IRC | 22:54 | |
clarkb | corvus: test failure on https://review.openstack.org/#/c/614648/8 I've identified what I believe is the issue | 22:55 |
clarkb | corvus: I'm happy to push a new ps if that helps too | 22:55 |
corvus | clarkb: what do you think it is? | 22:56 |
clarkb | corvus: sorry I left comment on the change. We check if named is running on adns but should check if bind9 is running | 22:57 |
clarkb | also there is a docs bug on the child change I'm trying to understand now | 22:57 |
corvus | clarkb: oh i see | 22:57 |
*** yamamoto has joined #openstack-infra | 22:57 | |
clarkb | I think our :zuul:rolevar: sphinx thing may only look in the current role? | 22:58 |
corvus | clarkb: i'll fix both | 22:58 |
clarkb | ok reading the master-nameserver README I don't see the bind_repos rolevar, which is likely the issue | 22:59 |
corvus | yeah, bad copy pasta | 22:59 |
corvus | clarkb: replied to comment | 22:59 |
clarkb | ah | 23:00 |
clarkb | so ps vs systemctl | 23:00 |
corvus | ya | 23:00 |
corvus | like apache/httpd :) | 23:00 |
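[Editor's note: the fix under discussion can be sketched as an Ansible task. The task name and module use are assumptions; the point is that on Debian/Ubuntu the service unit is "bind9" even though the process it runs is "named", exactly like the apache2/httpd split corvus mentions, so service-level checks must use the unit name.]

```
# Hypothetical sketch -- not the actual system-config test task.
- name: Ensure the bind9 service is running
  service:
    name: bind9      # the unit name; "named" is only the process name
    state: started
```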
*** yamamoto has quit IRC | 23:02 | |
corvus | hrm. so for the docs bug -- should i just drop the 'source' attribute from dns_zones in the nameserver role, or fix it? the source attribute is only used by the master nameserver, but it's sort of expected that you'd use the same data structure for both | 23:02 |
corvus | i'll fix it. it's a nice cross-reference. | 23:03 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 23:03 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 23:03 |
*** tpsilva has quit IRC | 23:04 | |
openstackgerrit | Merged openstack-infra/puppet-etherpad_lite master: Enable mod_proxy_wstunnel https://review.openstack.org/614883 | 23:08 |
openstackgerrit | Merged openstack/diskimage-builder master: Add ubuntu-systemd-container operating-system element https://review.openstack.org/563748 | 23:09 |
*** agopi|pto has joined #openstack-infra | 23:10 | |
clarkb | corvus: why did we change from listening on any interface to the specific ip addresses? | 23:10 |
mordred | clarkb: because unbound is going to listen on the local addresses | 23:12 |
clarkb | ah | 23:12 |
*** noama has quit IRC | 23:14 | |
*** owalsh_ has joined #openstack-infra | 23:14 | |
*** owalsh has quit IRC | 23:15 | |
clarkb | corvus: ok pointed out one thing between the two changes that may be a bug in the group vars | 23:16 |
clarkb | specifically we set the group ns to be ns1 and ns2.openstack.org but then in the ns.opendev.org change we use group_vars/ns when testing | 23:16 |
*** kjackal has quit IRC | 23:19 | |
corvus | clarkb: thanks, replied | 23:27 |
clarkb | corvus: so we do need a new patchset to add ns1.opendev.org and ns2.opendev.org to ns group? | 23:28 |
corvus | clarkb: yeah, i'm revising the adns1 patch now | 23:28 |
clarkb | ok | 23:29 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 23:29 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 23:29 |
corvus | clarkb: how's that look? | 23:29 |
clarkb | corvus: +2 to both changes now. Thanks | 23:30 |
corvus | ianw: i still see 2.7.0rc1... maybe we do need latest. | 23:31 |
*** jamesmcarthur has joined #openstack-infra | 23:31 | |
ianw | corvus: yeah, i wasn't sure but i think the "ensure: latest" is probably required or it's just happy it's there at all | 23:34 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Really install current ansible on bridge https://review.openstack.org/614889 | 23:34 |
corvus | ianw: ^ | 23:34 |
clarkb | hrm do we want latest though? | 23:34 |
clarkb | like 2.8 is likely to break us because every release breaks the users of the last one ... | 23:35 |
corvus | well, there's a patch upon which people can pontificate | 23:35 |
clarkb | can we set a specific version /me reads docs | 23:35 |
corvus | clarkb: yes, we previously set version 2.7.0rc1 | 23:35 |
*** jamesmcarthur has quit IRC | 23:35 | |
corvus | clarkb: see https://review.openstack.org/609556 | 23:35 |
ianw | is it better to break quickly and force us to fix it (or take action), or wait until we need to update then have the pain of any intermediate bit rot ... | 23:36 |
clarkb | my vote would be to set the version to 2.7.0 | 23:36 |
corvus | so if we want, we can just switch to 2.7.0. | 23:36 |
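[Editor's note: pinning an exact release instead of `ensure => latest` can be sketched in Puppet as below. The resource title and pip provider are assumptions about how the bridge host installs ansible, not a quote from system-config.]

```
# Sketch, assuming ansible is installed via pip on the bridge host.
# 'latest' would pull in 2.8 (and its breaking changes) automatically;
# an explicit version only moves when deliberately bumped, and a bump
# exercises all the system-config jobs before it lands.
package { 'ansible':
  ensure   => '2.7.0',
  provider => 'pip3',
}
```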
ianw | an eternal question | 23:36 |
clarkb | ianw: my concern would be for ansible to apply the wrong state remotely | 23:36 |
clarkb | ianw: rather than safely failing | 23:36 |
corvus | if the failure mode is "ansible stops working" i'm fine with bleeding edge. if it's that^ then... | 23:37 |
ianw | how much do you test ansible's ci? :) | 23:37 |
ianw | s/test/trust/ | 23:37 |
ianw | or maybe i mean test too :) | 23:37 |
clarkb | ha | 23:37 |
corvus | if we go with clarkb's approach, we do get the benefit of our own testing (if we run all of the system-config jobs on a version bump) | 23:38 |
ianw | actually future versions does seem like something we can at least minimally (in fact more than minimally) test with our jobs? | 23:39 |
ianw | yeah, the trick is just having something testing it. we could even do a "ansible git master" job? | 23:40 |
clarkb | ianw: oh thats a neat idea, I think we could do that | 23:40 |
ianw | why do i keep saying testing. i mean something updating it! | 23:40 |
ianw | $ git describe --tags --abbrev=0 | 23:42 |
ianw | v2.7.0.a1 | 23:42 |
ianw | would possibly be the way to test "the latest release"? | 23:42 |
clarkb | ya or even just point at master | 23:42 |
clarkb | er devel | 23:42 |
ianw | yep, as long as it doesn't fail so much that nobody ever bothers to look at it | 23:43 |
corvus | clarkb: be sure to leave a -1 on 614889 :) | 23:43 |
clarkb | corvus: yup | 23:43 |
corvus | looks like adns is passing, but ns is not | 23:47 |
*** carl_cai has quit IRC | 23:50 | |
corvus | though it failed on the adns portion | 23:50 |
corvus | i think it's because there's no ipv6 address | 23:51 |
corvus | we can assume there will be one in production, but not in tests | 23:51 |
corvus | (likewise, we shouldn't assume there will be an ipv4 address in tests either) | 23:51 |
clarkb | should be able to wrap those sections with jinja that test if the ipv* vars are set and set them then? | 23:52 |
mordred | corvus: is it working add an {% if ... yeah - that ^^ | 23:52 |
corvus | yeah i think that'll work for now | 23:52 |
corvus | those 10 lines to make a functioning test system with notifies are going to grow to 20... but that's later :) | 23:52 |
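[Editor's note: the guard clarkb and mordred describe might look like this in the nsd.conf template. Variable names and the empty-dict check are assumptions; when a host has no default route for a family, the corresponding fact is typically an empty dict rather than undefined, so both conditions are worth testing.]

```
{# Only emit listen addresses that actually exist; test nodes may
   lack an IPv6 (or, in principle, an IPv4) address. #}
{% if ansible_facts.default_ipv4 is defined and ansible_facts.default_ipv4 %}
ip-address: {{ ansible_facts.default_ipv4.address }}
{% endif %}
{% if ansible_facts.default_ipv6 is defined and ansible_facts.default_ipv6 %}
ip-address: {{ ansible_facts.default_ipv6.address }}
{% endif %}
```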
*** Swami has quit IRC | 23:53 | |
mordred | haha | 23:54 |
*** gyee has quit IRC | 23:58 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible https://review.openstack.org/614648 | 23:58 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible https://review.openstack.org/614870 | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!