*** dpawlik has joined #openstack-infra | 00:02 | |
*** dpawlik has quit IRC | 00:06 | |
ianw | # ping www.google.com | 00:07 |
---|---|---|
ianw | ping: www.google.com: Name or service not known | 00:07 |
ianw | i guess that's good | 00:07 |
clarkb | ianw: is there an ip address? (I'm curious to see too) | 00:08 |
ianw | 23.253.218.241 | 00:08 |
ianw | unbound is running | 00:08 |
clarkb | host google.com returns an address | 00:09 |
ianw | # host www.google.com | 00:10 |
ianw | www.google.com has address 216.58.192.164 | 00:10 |
ianw | Host www.google.com not found: 2(SERVFAIL) | 00:10 |
ianw | ? | 00:10 |
ianw | # host www.google.com | 00:10 |
ianw | www.google.com has address 216.58.192.164 | 00:10 |
ianw | www.google.com has IPv6 address 2607:f8b0:4009:80f::2004 | 00:10 |
clarkb | I dropped the www | 00:10 |
ianw | now it works ...? | 00:10 |
*** jamesmcarthur has joined #openstack-infra | 00:10 | |
ianw | clarkb: did you restart anything? it certainly seems like it started out borked, but now looks ok | 00:11 |
*** slaweq has joined #openstack-infra | 00:11 | |
clarkb | Sep 27 00:07:29 unbound[1355:1] debug: sending to target: <.> 2620:0:ccc::2#53 then Sep 27 00:07:29 unbound[1355:1] notice: sendto failed: Network is unreachable | 00:11 |
clarkb | ianw: I did not restart anything | 00:11 |
clarkb | I wonder if it is round robining and failing on some of the resolvers | 00:11 |
ianw | ntpd[822]: error resolving pool 2.fedora.pool.ntp.org: Name or service not known (-2) | 00:12 |
clarkb | I wonder if the issue is using ipv6 and relying on RAs to configure the interface | 00:13 |
clarkb | we statically configure the ipv4 address with glean but rely on RAs for ipv6 aiui | 00:13 |
clarkb | so there could be a lag after boot where ipv6 isn't working | 00:13 |
*** ijw has joined #openstack-infra | 00:14 | |
clarkb | ianw: oh we don't have ipv6 configured at all on that host | 00:15 |
*** slaweq has quit IRC | 00:15 | |
*** jamesmcarthur has quit IRC | 00:15 | |
ianw | yeah ... only link local addrs | 00:15 |
clarkb | ok so we don't have working dns on boot because we don't have working ipv6 on boot like we expect | 00:16 |
clarkb | but some clouds never have working ipv6 | 00:16 |
pabelanger | glean doesn't support ipv6 for fedora / centos | 00:16 |
pabelanger | just ubuntu | 00:16 |
clarkb | ah maybe we do statically configure ipv6 on debuntu then | 00:16 |
pabelanger | yah, only clouds with dhcp ipv6 will be good | 00:16 |
pabelanger | which I think is inap? | 00:16 |
clarkb | vexxhost RAs, I don't think inap does ipv6 | 00:17 |
pabelanger | kk | 00:17 |
clarkb | ovh has ipv6 but the instances don't know about it | 00:17 |
pabelanger | okay, so that might be why ovh is failing on fedora | 00:17 |
pabelanger | if you look in forwarding.conf for unbound, we setup ipv6 | 00:17 |
pabelanger | but when we run configure-unbound role, we check for ipv6 / ipv4 first | 00:18 |
pabelanger | then drop in right forwarding.conf | 00:18 |
clarkb | ya I think we assumed that unbound wouldn't use ipv6 addrs if it couldn't do ipv6 | 00:18 |
ianw | in theory it should just ignore that? it seems to come good after a little bit, however ... maybe there's a timeout | 00:18 |
pabelanger | clarkb: yah, maybe that changed in fedora recently | 00:18 |
ianw | clarkb: ok if i reboot and see if dns is borked right at boot again? | 00:18 |
pabelanger | because centos still works | 00:18 |
clarkb | ianw: well I think it round robins the different resolvers and half of them are ipv4 | 00:18 |
clarkb | ianw: ya I'm off of the host | 00:18 |
*** mriedem_away has quit IRC | 00:18 | |
clarkb | pabelanger: oh could be | 00:19 |
pabelanger | that would explain why fedora-28 just started randomly failing too, when fedora-27 worked fine | 00:19 |
pabelanger | I can check unbound changelog and see if anything pops up | 00:20 |
*** felipemonteiro has joined #openstack-infra | 00:22 | |
ianw | yep ... | 00:22 |
ianw | 1028 sendto(42, "\200T\1\0\0\1\0\0\0\0\0\1\3www\3abc\3net\2au\0\0\1\0\1"..., 43, 0, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:4860:4860::8888", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 ENETUNREACH | 00:22 |
ianw | (Network is unreachable) | 00:22 |
ianw | that was stracing unbound | 00:23 |
pabelanger | so, guess we need to rework DIB then | 00:26 |
clarkb | we could do ipv4 only by default since that works in ipv6 "only" clouds | 00:26 |
clarkb | it is just less reliable because NAT | 00:26 |
pabelanger | https://www.nlnetlabs.nl/svn/unbound/tags/release-1.8.0/doc/Changelog | 00:27 |
ianw | sigh, seems *every* part is unreliable | 00:27 |
pabelanger | haven't found anything yet related to forwarding | 00:27 |
ianw | if it only has link-local ipv6 addresses, why would it be trying this ... time to git pull ... | 00:28 |
ianw | oh don't tell me it's in svn. | 00:29 |
pabelanger | oldschool | 00:29 |
*** jamesmcarthur has joined #openstack-infra | 00:31 | |
*** jamesmcarthur has quit IRC | 00:35 | |
*** ansmith has quit IRC | 00:42 | |
ianw | http://paste.openstack.org/show/730975/ | 00:50 |
ianw | so it seems unbound has a rather complicated algorithm for choosing the forwarding server to talk to based on ping times etc, and there is some caching period for this | 00:50 |
*** felipemonteiro has quit IRC | 00:51 | |
ianw | my inclination here is that *something* is slightly different about fedora start and maybe when unbound queries ipv4 it times out, etc ... basically it ends up looking as good/as bad as ipv6 | 00:51 |
*** jamesmcarthur has joined #openstack-infra | 00:52 | |
ianw | so the ipv6 servers get added into the mix ... at least until networking is fully 100% and enough time has passed that unbound starts to requery its performance considerations | 00:52 |
ianw | at which point it now notices the ipv4 servers are really fast and the ipv6 servers don't exist | 00:52 |
* ianw is waving hands madly on this ... | 00:53 | |
*** Emine has quit IRC | 00:55 | |
*** jamesmcarthur has quit IRC | 00:56 | |
ianw | yeah, for example | 01:00 |
ianw | Sep 27 00:58:15 ianw-test ntpd[593]: error resolving pool 2.fedora.pool.ntp.org: Name or service not known (-2) | 01:00 |
ianw | Sep 27 00:58:19 ianw-test network[663]: Bringing up interface eth0: [ OK ] | 01:00 |
ianw | Sep 27 00:58:23 ianw-test network[663]: Bringing up interface eth1: [ OK ] | 01:00 |
ianw | Sep 27 00:58:24 ianw-test systemd[1]: Starting Unbound recursive Domain Name Server... | 01:03 |
ianw | i dunno, even more confused. unbound is starting after eth1 | 01:03 |
*** adriancz has quit IRC | 01:07 | |
*** longkb has joined #openstack-infra | 01:13 | |
*** hongbin has joined #openstack-infra | 01:19 | |
*** rlandy has quit IRC | 01:31 | |
*** openstackgerrit has joined #openstack-infra | 01:34 | |
openstackgerrit | Merged openstack-infra/system-config master: Only replicate gtest-org and kdc https://review.openstack.org/605490 | 01:34 |
*** ijw has quit IRC | 01:41 | |
*** mrsoul has joined #openstack-infra | 01:54 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Uncap cherrypy https://review.openstack.org/601136 | 02:13 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: rewrite interface in react https://review.openstack.org/591604 | 02:14 |
*** Bhujay has joined #openstack-infra | 02:19 | |
*** Bhujay has quit IRC | 02:19 | |
*** Bhujay has joined #openstack-infra | 02:20 | |
*** armax has quit IRC | 02:22 | |
*** annp has joined #openstack-infra | 02:23 | |
ianw | ok, i've filed https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4188 , i'm sort of assuming the response, if any, will be "well durr, don't do that" | 02:25 |
openstack | www.nlnetlabs.nl bug 4188 in server "IPv6 forwarders without ipv6 result in SERVFAIL" [Enhancement,New] - Assigned to unbound-team | 02:25 |
*** graphene has quit IRC | 02:33 | |
*** diablo_rojo has quit IRC | 02:33 | |
*** apetrich has quit IRC | 02:40 | |
*** psachin has joined #openstack-infra | 02:48 | |
*** imacdonn has quit IRC | 02:50 | |
*** markvoelker has joined #openstack-infra | 02:50 | |
*** imacdonn has joined #openstack-infra | 02:51 | |
*** rfolco has quit IRC | 02:54 | |
*** rkukura has quit IRC | 02:55 | |
*** Bhujay has quit IRC | 02:59 | |
openstackgerrit | Sergey Vilgelm proposed openstack-dev/pbr master: Special case long_description_content_type https://review.openstack.org/565177 | 03:06 |
*** felipemonteiro has joined #openstack-infra | 03:15 | |
*** harlowja has quit IRC | 03:20 | |
*** ramishra has joined #openstack-infra | 03:27 | |
*** jiapei has joined #openstack-infra | 03:42 | |
*** rcernin_ has quit IRC | 03:42 | |
*** rcernin has joined #openstack-infra | 03:43 | |
*** dave-mccowan has quit IRC | 03:46 | |
*** hongbin has quit IRC | 03:47 | |
*** dpawlik has joined #openstack-infra | 04:02 | |
*** haleyb has quit IRC | 04:07 | |
*** dpawlik has quit IRC | 04:07 | |
*** vivsoni_ has joined #openstack-infra | 04:13 | |
*** vivsoni has quit IRC | 04:15 | |
*** felipemonteiro has quit IRC | 04:16 | |
*** udesale has joined #openstack-infra | 04:18 | |
*** ijw has joined #openstack-infra | 04:23 | |
*** ijw has quit IRC | 04:27 | |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: elements/ndoepool-base: only initially populate ipv4 nameservers https://review.openstack.org/605583 | 04:27 |
openstackgerrit | Merged openstack-infra/project-config master: Added twine check functionality to python-tarball playbook https://review.openstack.org/605096 | 04:32 |
*** slaweq has joined #openstack-infra | 05:11 | |
*** slaweq has quit IRC | 05:15 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role https://review.openstack.org/605585 | 05:16 |
*** apetrich has joined #openstack-infra | 05:27 | |
*** quique|off is now known as quiquell | 05:31 | |
*** Bhujay has joined #openstack-infra | 05:35 | |
openstackgerrit | Ian Wienand proposed openstack-dev/pbr master: Special case long_description_content_type https://review.openstack.org/565177 | 05:39 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix unreachable nodes detection https://review.openstack.org/602829 | 05:39 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Also retry the job if a post job failed with unreachable https://review.openstack.org/602830 | 05:39 |
*** bnemec has quit IRC | 05:39 | |
*** ijw has joined #openstack-infra | 05:42 | |
*** rkukura has joined #openstack-infra | 05:44 | |
quiquell | Any infra-root here? we need to prioritize a promotion blocker in the zuul queue | 05:44 |
*** ijw has quit IRC | 05:47 | |
*** rkukura has quit IRC | 05:49 | |
ianw | quiquell: what do you need done? | 05:49 |
quiquell | ianw: this one https://review.openstack.org/#/c/605039/ | 05:50 |
quiquell | ianw: Can we wait, I have doubts about one thing | 05:53 |
ianw | well yeah, there's a lot ahead of it running ... i know you realise it will reset the queue | 05:53 |
quiquell | ianw: Yes... maybe it's more hurt than gain | 05:54 |
quiquell | ianw: people have been waiting to merge stuff for a long time | 05:54 |
quiquell | ianw: nah leave it | 05:54 |
quiquell | ianw: sorry about the noise | 05:54 |
quiquell | We don't have timeouts now | 05:54 |
*** gfidente has joined #openstack-infra | 06:00 | |
*** roman_g has quit IRC | 06:03 | |
AJaeger | config-core, two small cleanups for your consideration, please: https://review.openstack.org/605077 and https://review.openstack.org/605344 . Also, https://review.openstack.org/605128 is ready | 06:05 |
*** kopecmartin|off is now known as kopecmartin|ruck | 06:08 | |
*** ijw has joined #openstack-infra | 06:09 | |
*** longkb has quit IRC | 06:13 | |
*** njohnston has quit IRC | 06:14 | |
*** longkb has joined #openstack-infra | 06:14 | |
*** ijw has quit IRC | 06:14 | |
*** AJaeger has quit IRC | 06:15 | |
*** njohnston has joined #openstack-infra | 06:15 | |
*** adriancz has joined #openstack-infra | 06:17 | |
*** AJaeger has joined #openstack-infra | 06:18 | |
*** slaweq has joined #openstack-infra | 06:23 | |
*** dpawlik has joined #openstack-infra | 06:23 | |
*** pcaruana has joined #openstack-infra | 06:33 | |
*** jamesdenton has quit IRC | 06:35 | |
*** graphene has joined #openstack-infra | 06:39 | |
*** aojea has joined #openstack-infra | 06:40 | |
mnasiadka | Hi - I'm trying to fetch periodic jobs failure metrics from graphite.openstack.org - and it seems the ones under stats_counts.zuul.tenant.pipeline.periodic.project.* are empty, but the all_jobs and total_changes are populated - should I look somewhere else in the tree to get those? | 06:46 |
*** diablo_rojo has joined #openstack-infra | 06:55 | |
*** quiquell is now known as quiquell|brb | 06:55 | |
*** ginopc has joined #openstack-infra | 07:07 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add build page https://review.openstack.org/597024 | 07:10 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add job page https://review.openstack.org/597048 | 07:10 |
*** graphene has quit IRC | 07:10 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add config-errors notifications drawer https://review.openstack.org/597147 | 07:10 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 07:11 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add project page https://review.openstack.org/604266 | 07:11 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add labels page https://review.openstack.org/604682 | 07:11 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add nodes page https://review.openstack.org/604683 | 07:11 |
*** rcernin has quit IRC | 07:12 | |
*** graphene has joined #openstack-infra | 07:12 | |
*** chkumar|off is now known as chandankumar | 07:18 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement a Kubernetes driver https://review.openstack.org/535557 | 07:19 |
*** shardy has joined #openstack-infra | 07:21 | |
*** diablo_rojo has quit IRC | 07:27 | |
*** graphene has quit IRC | 07:30 | |
*** graphene has joined #openstack-infra | 07:31 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role https://review.openstack.org/605585 | 07:35 |
*** quiquell|brb is now known as quiquell | 07:38 | |
*** florianf|afk is now known as florianf | 07:40 | |
*** longkb has quit IRC | 07:41 | |
*** longkb has joined #openstack-infra | 07:42 | |
*** tosky has joined #openstack-infra | 07:45 | |
*** jpich has joined #openstack-infra | 07:47 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove tricircle dsvm jobs https://review.openstack.org/605344 | 07:49 |
mnasiadka | ok, found those in stats.zuul.tenant... - thanks for help ;) | 07:55 |
*** jpena|off is now known as jpena | 08:01 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: web: add tenant and project scoped, JWT-protected actions https://review.openstack.org/576907 | 08:04 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: CLI: add create-web-token command https://review.openstack.org/605386 | 08:09 |
*** e0ne has joined #openstack-infra | 08:12 | |
*** alexchadin has joined #openstack-infra | 08:21 | |
*** kashyap has joined #openstack-infra | 08:22 | |
kashyap | cmurphy: Morning | 08:22 |
cmurphy | morning kashyap | 08:23 |
kashyap | cmurphy: When you get a moment, can you update the FIXME for SLES here: https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix | 08:23 |
cmurphy | kashyap: will do, thanks for the reminder | 08:23 |
kashyap | Thanks! | 08:23 |
openstackgerrit | Dmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces https://review.openstack.org/391787 | 08:26 |
openstackgerrit | Filippo Inzaghi proposed openstack-infra/bindep master: fix tox python3 overrides https://review.openstack.org/605613 | 08:26 |
*** xinliang has joined #openstack-infra | 08:27 | |
*** janki has joined #openstack-infra | 08:27 | |
*** jiapei has quit IRC | 08:31 | |
*** lpetrut has joined #openstack-infra | 08:33 | |
openstackgerrit | Dmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces https://review.openstack.org/391787 | 08:35 |
*** verdurin has quit IRC | 08:36 | |
*** dtantsur|afk is now known as dtantsur | 08:36 | |
openstackgerrit | Filippo Inzaghi proposed openstack-infra/elastic-recheck master: fix tox python3 overrides https://review.openstack.org/574494 | 08:39 |
*** verdurin has joined #openstack-infra | 08:39 | |
openstackgerrit | Filippo Inzaghi proposed openstack-infra/elastic-recheck master: fix tox python3 overrides https://review.openstack.org/605618 | 08:41 |
*** derekh has joined #openstack-infra | 08:43 | |
*** kashyap has left #openstack-infra | 08:47 | |
*** roman_g has joined #openstack-infra | 08:48 | |
*** panda|off is now known as panda | 09:00 | |
*** ykarel has joined #openstack-infra | 09:04 | |
*** electrofelix has joined #openstack-infra | 09:05 | |
*** janki has quit IRC | 09:13 | |
*** Bhujay has quit IRC | 09:19 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: add Gentoo jobs and vars and also fix install test https://review.openstack.org/602439 | 09:20 |
*** pbourke has quit IRC | 09:24 | |
*** pbourke has joined #openstack-infra | 09:24 | |
*** Emine has joined #openstack-infra | 09:24 | |
*** Bhujay has joined #openstack-infra | 09:25 | |
*** longkb has quit IRC | 09:38 | |
*** longkb has joined #openstack-infra | 09:38 | |
*** alexchadin has quit IRC | 09:42 | |
*** calbers has quit IRC | 10:00 | |
*** calbers has joined #openstack-infra | 10:03 | |
*** longkb has quit IRC | 10:03 | |
*** yamamoto has quit IRC | 10:17 | |
*** shardy is now known as shardy_mtg | 10:18 | |
*** yamamoto has joined #openstack-infra | 10:25 | |
*** yamamoto has quit IRC | 10:46 | |
*** yamamoto has joined #openstack-infra | 10:48 | |
*** yamamoto has quit IRC | 10:49 | |
openstackgerrit | Miguel Angel Ajo proposed openstack-infra/project-config master: Enable storyboard for os-log-merger https://review.openstack.org/605644 | 10:52 |
*** udesale has quit IRC | 10:54 | |
ajo | not sure if that's the right way to do it though ^ | 11:00 |
*** jpena is now known as jpena|lunch | 11:06 | |
*** alexchadin has joined #openstack-infra | 11:22 | |
fungi | just a heads up, i'm on the road until probably ~19:00z, but will try to catch up once i'm back at the computer again | 11:22 |
*** yamamoto has joined #openstack-infra | 11:23 | |
*** joabdearaujo has joined #openstack-infra | 11:26 | |
*** quiquell is now known as quiquell|lunch | 11:27 | |
*** felipemonteiro has joined #openstack-infra | 11:30 | |
*** ykarel_ has joined #openstack-infra | 11:37 | |
*** ssbarnea|bkp has joined #openstack-infra | 11:37 | |
*** ykarel has quit IRC | 11:39 | |
*** quiquell|lunch is now known as quiquell | 11:43 | |
*** tosky__ has joined #openstack-infra | 11:44 | |
*** tosky has quit IRC | 11:44 | |
*** tosky has joined #openstack-infra | 11:45 | |
*** olivierb has joined #openstack-infra | 11:47 | |
*** pcaruana has quit IRC | 11:50 | |
*** Bhujay has quit IRC | 11:51 | |
*** Bhujay has joined #openstack-infra | 11:52 | |
*** Bhujay has quit IRC | 11:53 | |
*** Bhujay has joined #openstack-infra | 11:53 | |
*** tpsilva has joined #openstack-infra | 11:54 | |
*** Bhujay has quit IRC | 11:54 | |
*** rfolco has joined #openstack-infra | 11:56 | |
*** olivierb has quit IRC | 11:58 | |
*** olivierb has joined #openstack-infra | 11:59 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use merger to get list of files for pull-request https://review.openstack.org/603287 | 12:00 |
*** annp has quit IRC | 12:03 | |
*** felipemonteiro has quit IRC | 12:04 | |
frickler | slaweq: hi, how are you doing with debugging the dvr job? do you still need the held nodes? | 12:05 |
frickler | pabelanger: there's also a node held for you, 19d old | 12:05 |
*** trown|outtypewww is now known as trown | 12:06 | |
*** e0ne has quit IRC | 12:09 | |
*** e0ne has joined #openstack-infra | 12:09 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use merger to get list of files for pull-request https://review.openstack.org/603287 | 12:13 |
*** shardy_mtg is now known as shardy | 12:14 | |
*** ijw has joined #openstack-infra | 12:14 | |
*** yamamoto has quit IRC | 12:14 | |
*** tosky__ has quit IRC | 12:15 | |
*** yamamoto has joined #openstack-infra | 12:16 | |
*** jamesdenton has joined #openstack-infra | 12:16 | |
*** ijw has quit IRC | 12:18 | |
mordred | morning humans! | 12:19 |
slaweq | frickler: hi, no I don't need them | 12:21 |
slaweq | frickler: sorry, I thought that they would be removed after a few hours | 12:21 |
*** mriedem has joined #openstack-infra | 12:22 | |
*** rlandy has joined #openstack-infra | 12:22 | |
frickler | slaweq: no, they are kept until someone deletes them manually. will clean up now, thanks for your feedback. | 12:23 |
slaweq | frickler: so please remove them now :) and once again sorry for keeping it so long | 12:24 |
slaweq | I will remember that now :) | 12:24 |
mordred | frickler: oh - sorry - that's my bad | 12:31 |
mordred | frickler: I, like slaweq, thought they had a timeout associated with themselves | 12:31 |
mordred | so didn't delete them manually when he said he was done with them | 12:31 |
slaweq | mordred: I thought that because you told me that actually :) | 12:31 |
*** mriedem has quit IRC | 12:34 | |
*** jpena|lunch is now known as jpena | 12:34 | |
mordred | slaweq: well, I guess now you learn to not listen to me :) | 12:35 |
slaweq | mordred: LOL | 12:36 |
*** pcaruana has joined #openstack-infra | 12:39 | |
*** alexchadin has quit IRC | 12:43 | |
*** ykarel_ is now known as ykarel | 12:44 | |
*** ansmith has joined #openstack-infra | 12:44 | |
openstackgerrit | Monty Taylor proposed openstack-infra/system-config master: Remove snapd from servers https://review.openstack.org/605676 | 12:46 |
*** mriedem has joined #openstack-infra | 12:49 | |
mordred | infra-root: I just restarted gerrit on review-dev to test the update to the replication rules | 12:53 |
mordred | BUT | 12:53 |
mordred | infra-root: gerrit does not like the regexes we have in the config file that contain # characters | 12:54 |
*** ramishra has quit IRC | 12:54 | |
mordred | I'm not sure what happened / why this has stopped working | 12:54 |
mordred | but gerrit will not start unless they're removed | 12:54 |
mordred | you can replicate the error just by running git config --file=/home/gerrit2/review_site/etc/gerrit.config --list | 12:54 |
pabelanger | frickler: that can be deleted if you like | 12:56 |
*** agopi has quit IRC | 12:58 | |
*** bobh has joined #openstack-infra | 12:59 | |
*** kgiusti has joined #openstack-infra | 13:00 | |
*** haleyb has joined #openstack-infra | 13:03 | |
*** boden has joined #openstack-infra | 13:08 | |
*** dhill_ has quit IRC | 13:13 | |
*** scroll has joined #openstack-infra | 13:13 | |
*** dhill_ has joined #openstack-infra | 13:14 | |
*** njohnston has quit IRC | 13:15 | |
*** njohnston has joined #openstack-infra | 13:16 | |
openstackgerrit | Monty Taylor proposed openstack-infra/system-config master: Double the gerrit regex backslashes https://review.openstack.org/605697 | 13:17 |
mordred | infra-root, cmurphy: ^^ | 13:18 |
mordred | cmurphy: I'm *thinking* that's futureparser related, as that's really the only thing that has changed in the gerrit puppet | 13:18 |
*** bnemec has joined #openstack-infra | 13:18 | |
pabelanger | mordred: Can never have enough backslashes | 13:19 |
cmurphy | mordred: hmm | 13:19 |
*** agopi has joined #openstack-infra | 13:20 | |
*** ramishra has joined #openstack-infra | 13:22 | |
*** e0ne has quit IRC | 13:22 | |
cmurphy | mordred: here's where that gets passed down to http://git.openstack.org/cgit/openstack-infra/puppet-gerrit/tree/templates/gerrit.config.erb#n165 there's no reason that should be different with the future parser | 13:22 |
*** aidin has joined #openstack-infra | 13:22 | |
cmurphy | mordred: iirc clarkb said when we turned the future parser on there was one minor change in the config file and it wasn't this | 13:23 |
mordred | cmurphy: spectacular | 13:24 |
mordred | cmurphy: I am 100% dumbfounded | 13:24 |
mordred | cmurphy: fwiw - I tested that by applying it on review-dev via puppet and it made the error go away | 13:24 |
cmurphy | mordred: well okay then that's good enough | 13:25 |
mordred | cmurphy: well - except - what the heck happened? | 13:25 |
*** e0ne has joined #openstack-infra | 13:26 | |
cmurphy | gremlins | 13:27 |
mordred | cmurphy: don't feed them after midnight | 13:27 |
cmurphy | mordred: i think i saw something similar in how the command parameter of an exec was being interpolated but the problem magically went away after clarkb asked about it and now i don't know which patchset of which change it was in | 13:30 |
*** yamamoto has quit IRC | 13:31 | |
*** yamamoto has joined #openstack-infra | 13:31 | |
mordred | \o/ | 13:36 |
*** psachin has quit IRC | 13:36 | |
*** yumiriam has quit IRC | 13:41 | |
*** rh-jelabarre has joined #openstack-infra | 13:43 | |
*** aidin has quit IRC | 13:53 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: switch all official python projects to python3 publishing job https://review.openstack.org/598323 | 13:55 |
*** ykarel has quit IRC | 13:59 | |
*** fuentess has joined #openstack-infra | 14:08 | |
openstackgerrit | Merged openstack-infra/project-config master: Add zone-opendev.org project https://review.openstack.org/605095 | 14:30 |
*** jistr is now known as jistr|call | 14:31 | |
*** yamamoto has quit IRC | 14:32 | |
*** yamamoto has joined #openstack-infra | 14:33 | |
*** yamamoto has quit IRC | 14:33 | |
*** yamamoto has joined #openstack-infra | 14:34 | |
*** lpetrut has quit IRC | 14:36 | |
*** lpetrut has joined #openstack-infra | 14:36 | |
*** yamamoto has quit IRC | 14:38 | |
*** smarcet has joined #openstack-infra | 14:38 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Fix on CFP presentation submission proccess https://review.openstack.org/605760 | 14:43 |
*** yamamoto has joined #openstack-infra | 14:43 | |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Fix on CFP presentation submission proccess https://review.openstack.org/605760 | 14:44 |
*** evrardjp has quit IRC | 14:45 | |
corvus | mordred, fungi: can you review https://review.openstack.org/605092 when you have a moment? | 14:45 |
*** evrardjp has joined #openstack-infra | 14:47 | |
*** HenryG has quit IRC | 14:49 | |
*** armax has joined #openstack-infra | 14:52 | |
*** yamamoto has quit IRC | 14:52 | |
*** HenryG has joined #openstack-infra | 14:55 | |
*** rkukura has joined #openstack-infra | 14:57 | |
*** _erlon_ has joined #openstack-infra | 14:58 | |
*** sthussey has joined #openstack-infra | 14:58 | |
*** evrardjp has quit IRC | 14:58 | |
*** Swami has joined #openstack-infra | 14:59 | |
*** hamzy_ has quit IRC | 15:00 | |
*** shardy is now known as shardy_afk | 15:00 | |
openstackgerrit | sebastian marcet proposed openstack-infra/system-config master: OpenStackId production release 1.0.25 https://review.openstack.org/605770 | 15:06 |
*** shardy_afk is now known as shardy | 15:07 | |
openstackgerrit | Salvador Fuentes Garcia proposed openstack-infra/openstack-zuul-jobs master: Add zuul envar for kata tests https://review.openstack.org/605773 | 15:09 |
mordred | corvus: done | 15:14 |
*** jbadiapa has quit IRC | 15:20 | |
*** jamesmcarthur has joined #openstack-infra | 15:21 | |
pabelanger | corvus: mordred: clarkb: fungi: is there any interest in having manage-projects in jeepyb grow support for github only things? This might be more in the context of zuul users using github than openstack-infra. I'm just starting to look at how we can use some of the concepts here in openstack-infra but for ansible-network. Rather than going out and building something one off | 15:22 |
mordred | pabelanger: I don't personally have any interest in that - but I can understand how it would be desirable for someone in github land | 15:25 |
*** hamzy_ has joined #openstack-infra | 15:26 | |
*** shardy has quit IRC | 15:26 | |
pabelanger | I guess the question is more, will people -2 it, versus just not reviewing it | 15:26 |
*** jbadiapa has joined #openstack-infra | 15:27 | |
AJaeger | dhellmann: comment on https://review.openstack.org/598323 - are we ready with PyPi? | 15:27 |
*** quiquell is now known as quiquell|off | 15:28 | |
*** yamamoto has joined #openstack-infra | 15:28 | |
mordred | pabelanger: I would not -2 - but I don't believe I have context to review. I wonder - we have a base class in jeepyb now for reading the config - I wonder if we could do a little more work on the codebase so that you could make a jeepyb-github project that depends on jeepyb but doesn't need to add logic directly to jeepyb itself? | 15:29 |
*** hamzy_ has quit IRC | 15:30 | |
pabelanger | mordred: yes, that would work also. And I think a fine solution | 15:30 |
*** kopecmartin|ruck is now known as kopecmartin|off | 15:31 | |
pabelanger | it is more the existing config layout / github bits that are interesting, rather than duplicating that all into something else | 15:31 |
pabelanger | and, imagine say somebody like kata might be interested in using it, to say help sync things like labels across all projects | 15:32 |
pabelanger | that's what I am really looking for right now | 15:32 |
pabelanger | automating the creation of repos, is icing on the cake :) | 15:32 |
*** eernst has joined #openstack-infra | 15:34 | |
pabelanger | the other idea I had, was maybe just move this into a zuul-job and improve ansible modules for github | 15:34 |
pabelanger | kinda like we use ansible-role-cloud-launcher today | 15:34 |
*** graphene has quit IRC | 15:35 | |
mordred | pabelanger: I actually think that sounds like a better approach - more generally manageable | 15:36 |
*** graphene has joined #openstack-infra | 15:36 | |
*** dave-mccowan has joined #openstack-infra | 15:41 | |
dhellmann | AJaeger : You're right. I think we can work those out as we find them. | 15:45 |
*** shardy has joined #openstack-infra | 15:46 | |
*** chandankumar is now known as chkumar|off | 15:46 | |
*** jistr|call is now known as jistr | 15:46 | |
*** e0ne has quit IRC | 15:51 | |
AJaeger | mordred, dhellmann, smcginnis, please look at https://review.openstack.org/531825 and https://review.openstack.org/598323 - should we merge 598323 (and abandon 531825) - and work these out? (see dhellmann 's comment above as well) | 15:52 |
dhellmann | AJaeger , mordred : yeah, I think we want to just switch to the new job instead | 15:53 |
smcginnis | ++ | 15:53 |
*** gyee has joined #openstack-infra | 15:53 | |
*** rpittau has quit IRC | 15:54 | |
*** gyee has quit IRC | 15:54 | |
dhellmann | I'll go through the list of projects and register os-$foo names for the ones where we don't own the regular name | 15:54 |
*** dpawlik has quit IRC | 15:54 | |
smcginnis | I seem to recall some discussion in the past of trying to get control of at least one or two of those names as they appeared to be abandoned. | 15:55 |
AJaeger | dhellmann: I remember somebody was trying to claim keystone (and others). Let's see what mordred remembers... | 15:55 |
smcginnis | Though I also recall hearing that can be a lengthy process. | 15:55 |
AJaeger | smcginnis: yes, so did I - let's figure status out on that one. | 15:55 |
AJaeger | dhellmann's change leaves one occurrence of the release-openstack-server template, should we remove it completely? | 15:57 |
dhellmann | which one did I miss? | 15:57 |
AJaeger | mordred, clarkb, fungi, I'd like you to chime in and have alignment ^ | 15:57 |
*** gyee has joined #openstack-infra | 15:57 | |
AJaeger | dhellmann: openstack/ansible-role-tripleo-congress | 15:57 |
*** dpawlik has joined #openstack-infra | 15:58 | |
dhellmann | why are we using that job for ansible roles? | 15:58 |
AJaeger | no idea ;( Might just be wrong... | 15:58 |
*** graphene has quit IRC | 15:59 | |
*** dpawlik has quit IRC | 15:59 | |
mordred | AJaeger, dhellmann: there is a person with a project published to keystone which seems abandoned - I contacted the author several times to see if he'd transfer with no response | 15:59 |
AJaeger | sorry, need to step out for a bit, will read later | 15:59 |
*** dpawlik has joined #openstack-infra | 15:59 | |
mordred | I think the next step in that process was to contact the pypi folks and see if they'd do it for us | 15:59 |
mordred | after having shown good-faith effort to contact the existing owner | 16:00 |
*** graphene has joined #openstack-infra | 16:00 | |
dhellmann | ok, do you still want to do that? | 16:00 |
mordred | I contacted the maintainer, Dan Crosta on 11/17/2017 and again on 01/08/2018 - the second time i copied dhellmann, ttx, fungi and smcginnis | 16:02 |
*** yamamoto has quit IRC | 16:02 | |
mordred | dhellmann: I can - unless you feel like it's a more appropriate request coming from the release team | 16:02 |
dhellmann | I'm trying to figure out if they have documented the process | 16:03 |
*** ykarel has joined #openstack-infra | 16:04 | |
mordred | dhellmann: https://www.python.org/dev/peps/pep-0541/ | 16:04 |
mordred | linked from an issue on gh I found: https://github.com/pypa/pypi-legacy/issues/682 | 16:05 |
*** ginopc has quit IRC | 16:05 | |
dhellmann | mordred : ok, I'll see about doing that today | 16:05 |
mordred | dhellmann: cool - let me know if I can help in any way | 16:07 |
*** dave-mccowan has quit IRC | 16:08 | |
dhellmann | mordred : there are currently 28 open requests to have package maintainership changed https://github.com/pypa/warehouse/issues?q=is%3Aissue+is%3Aopen+label%3A%22PEP+541%22 | 16:09 |
dhellmann | some as old as january | 16:09 |
dhellmann | I'm not sure we want to block on having preferred names | 16:10 |
*** efried has quit IRC | 16:10 | |
*** efried has joined #openstack-infra | 16:10 | |
pabelanger | https://github.com/pypa/warehouse/issues/4610 looks to have been transferred in a day | 16:11 |
pabelanger | not sure why so fast vs others | 16:12 |
*** jpich has quit IRC | 16:12 | |
*** graphene has quit IRC | 16:14 | |
dhellmann | https://github.com/pypa/warehouse/issues/4770 | 16:14 |
dhellmann | I don't appear to have permission to tag it | 16:15 |
*** graphene has joined #openstack-infra | 16:15 | |
pabelanger | I've found recently, you need to have special permissions on github repo to tag labels | 16:15 |
dhellmann | that seems likely | 16:16 |
pabelanger | woah, somebody tagged it | 16:16 |
*** diablo_rojo has joined #openstack-infra | 16:17 | |
mordred | woot! responses | 16:18 |
corvus | mordred: hey look what dcrosta's been working on: https://github.com/tox-dev/tox-docker | 16:19 |
mordred | corvus: yah - I think I looked at that last time I looked at him :) | 16:19 |
*** ykarel_ has joined #openstack-infra | 16:19 | |
*** ykarel has quit IRC | 16:22 | |
*** florianf is now known as florianf|afk | 16:25 | |
AJaeger | dhellmann: will you do the process for https://pypi.org/project/magnum/ and https://pypi.org/project/congress as well? | 16:26 |
*** dpawlik has quit IRC | 16:26 | |
dhellmann | AJaeger : I'm making a list so I can do them all at once | 16:26 |
AJaeger | dhellmann: thanks! | 16:26 |
clarkb | I'm having a slow start this morning. Haven't seen a response on the BHS1 email yet | 16:28 |
AJaeger | mordred: will you abandon https://review.openstack.org/#/c/531825/ then? | 16:29 |
AJaeger | mordred: with the two stacked on top of it? | 16:29 |
mordred | AJaeger: abandoned | 16:30 |
*** dpawlik has joined #openstack-infra | 16:30 | |
* mordred afks for a bit | 16:30 | |
AJaeger | thanks, mordred | 16:30 |
openstackgerrit | Merged openstack-infra/zuul master: Uncap cherrypy https://review.openstack.org/601136 | 16:33 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: WIP: Playing with k8s install https://review.openstack.org/605803 | 16:34 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove release-openstack-server template https://review.openstack.org/531830 | 16:34 |
clarkb | corvus: on https://review.openstack.org/#/c/605092/1/manifests/site.pp the node comment annotation is for bionic which means our puppet tests won't run there (also not sure we can puppet those at all?) is that supposed to be xenial like the existing dns servers? | 16:34 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Remove release-openstack-python-without-pypi https://review.openstack.org/531829 | 16:35 |
corvus | clarkb: oh, for some reason i thought we were able to use bionic now | 16:35 |
corvus | i guess i need to change it | 16:35 |
AJaeger | dhellmann, mordred, those are the cleanups we can do once dhellmann's change is in ^ | 16:35 |
clarkb | corvus: and the other question I have is do we need to update iptables or is that going to happen in a followup change? | 16:36 |
*** bobh has quit IRC | 16:37 | |
corvus | clarkb: followup. bootstrapping here is a bit messy. i'll probably do at least one manual iptables thing to get it going. | 16:37 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Add opendev nameservers https://review.openstack.org/605092 | 16:37 |
clarkb | ok | 16:37 |
corvus | mordred: ^ can you re-review pls? | 16:37 |
dmsimard | Why would pip/tox apparently fail on a 404 for a wheel ? Should it not attempt to retrieve the package if a wheel isn't found ? | 16:38 |
AJaeger | dhellmann: So, I'll +2 your change once you update all repos - including the missed one. | 16:39 |
clarkb | dmsimard: it is likely 404ing on a file that is in the index | 16:39 |
clarkb | dmsimard: in which case it has already decided which file is most appropriate to install from | 16:39 |
dmsimard | clarkb: I have three jobs which failed on two separate mirrors but the package is there and it works locally :/ | 16:40 |
dmsimard | http://paste.openstack.org/show/731034/ | 16:40 |
clarkb | dmsimard: our mirrors are proxy caches now for pypi because our disks can't keep up with how quickly machine learning produces pypi packages | 16:40 |
dmsimard | lol | 16:41 |
dmsimard | no more bandersnatch ? | 16:41 |
dmsimard | I've been quite a bit disconnected :/ | 16:41 |
clarkb | dmsimard: no more bandersnatch, however you 404'd on our wheel mirrors | 16:41 |
*** bobh has joined #openstack-infra | 16:42 | |
clarkb | http://mirror.dfw.rax.openstack.org/wheel/ubuntu-16.04-x86_64/black/ which is a proper 404 | 16:42 |
dmsimard | right | 16:42 |
clarkb | it should install from the pypi/simple/black path instead in that case | 16:42 |
dmsimard | but it sort of gives up | 16:42 |
dmsimard | hands in the air and all that | 16:42 |
*** Swami has quit IRC | 16:46 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove migrated legacy-glare-dsvm job https://review.openstack.org/605077 | 16:47 |
*** mdbooth has quit IRC | 16:49 | |
dmsimard | I'll do a recheck to see what happens... wanted to avoid doing one blindly considering the current state of the zuul queues | 16:50 |
clarkb | dmsimard: I wonder if the reason it failed is it requires python>=3.6 and pip is smart enough to check that | 16:51 |
*** e0ne has joined #openstack-infra | 16:52 | |
clarkb | xenial doesn't have 3.6, it has 3.5 and I'm not sure what that tox target is running under either | 16:52 |
clarkb | but that is my best guess for why it couldn't find a package to install at this point | 16:52 |
dmsimard | oh, does it ? | 16:53 |
dmsimard | Well "It requires Python 3.6.0+ to run" | 16:53 |
dmsimard | huh, okay | 16:54 |
clarkb | https://pypi.org/project/black/#history and pypi is aware of this on the left panel | 16:54 |
*** dtantsur is now known as dtantsur|afk | 16:54 | |
*** e0ne has quit IRC | 16:54 | |
dhellmann | AJaeger : "update all repos"? | 16:54 |
openstackgerrit | Michael McCune proposed openstack-infra/irc-meetings master: update api-sig meeting times https://review.openstack.org/605808 | 16:57 |
*** dpawlik has quit IRC | 16:58 | |
*** ykarel_ is now known as ykarel | 16:58 | |
*** gfidente has quit IRC | 16:59 | |
*** dpawlik has joined #openstack-infra | 16:59 | |
dmsimard | clarkb: TIL | 17:01 |
dmsimard | thanks for the second pair of eyes :) | 17:01 |
dmsimard | btw is there anything in specific causing the zuul queues backlog right now ? | 17:01 |
*** gfidente has joined #openstack-infra | 17:02 | |
clarkb | dmsimard: same story as last week. Down a cloud region and tripleo gate resets and openstack gate resets eating up large amounts of capacity | 17:03 |
dmsimard | yeah nodepool is running at max capacity for sure | 17:04 |
*** derekh has quit IRC | 17:04 | |
*** diablo_rojo has quit IRC | 17:05 | |
dmsimard | Packethost is the one that is down ? | 17:05 |
*** jamesmcarthur has quit IRC | 17:06 | |
clarkb | well it may be down too (I haven't checked if the port situation there is happier) but no it is bhs1 in ovh | 17:07 |
clarkb | we're still trying to sort out the fallout of their upgrade there | 17:07 |
dmsimard | packethost has net zero VMs, everything is failing | 17:07 |
dmsimard | well, according to http://grafana.openstack.org/d/U462abNik/nodepool-packethost?orgId=1 | 17:07 |
clarkb | our VM images boot with working networking now, but we can't reliably boot them due to nova having trouble talking to the neutron ports api | 17:07 |
*** _erlon_ has quit IRC | 17:08 | |
*** jpena is now known as jpena|off | 17:11 | |
dmsimard | mordred, dhellmann: interesting bug in pbr https://github.com/openstack-dev/pbr/blob/master/pbr/packaging.py#L104-L110 | 17:11 |
dmsimard | I have a package "ara-clients" so the egg name was "ara-clients" (perhaps should have been ara_clients?) but anyway, it led to the installation of ara itself and >=clients as the version | 17:12 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Add zuul user to bridge.openstack.org https://review.openstack.org/604925 | 17:12 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Manage user ssh keys from urls https://review.openstack.org/604932 | 17:12 |
*** trown is now known as trown|lunch | 17:12 | |
clarkb | mordred: corvus ^ I think I have testing sorted out for that now so won't bother WIPing the second change there | 17:12 |
*** ramishra has quit IRC | 17:13 | |
*** agopi_ has joined #openstack-infra | 17:13 | |
clarkb | I'm going to clean up bhs1 ports now then increase max servers to 8 in that region again to see if it works. That seemed to be ok last time; it wasn't until we went to 80 that it had a sad | 17:13 |
clarkb | also mnaser did you see my paste of weird volumes in sjc1 ? I think they leaked but wanted to make sure you had an opportunity to look at them if you want before I delete them | 17:15 |
*** e0ne has joined #openstack-infra | 17:16 | |
*** agopi has quit IRC | 17:16 | |
clarkb | infra-root config-core for talking about opendev messaging with the foundation I'm looking at picking a time next week. I know many of us are going to be at ansiblefest which may be a good or bad thing for trying to talk about this (as it is in austin where many of the foundation folk are) Would we rather try for Friday next week after we return? | 17:18 |
clarkb | I'm thinking that an IRC meeting/discussion like we had for the naming discussion would work well? | 17:19 |
mnaser | clarkb: can you delete them please? | 17:19 |
clarkb | mnaser: I can | 17:19 |
*** agopi_ is now known as agopi | 17:21 | |
*** Emine has quit IRC | 17:21 | |
clarkb | mnaser: actually I can't because volume delete says the volume is attached but the server it is attached to no longer exists so server remove volume fails. volume set can change the state to error or similar but that is admin only | 17:25 |
mnaser | clarkb: i can do that, if you have the IDs handy so i dont break $world ? | 17:25 |
clarkb | mnaser: http://paste.openstack.org/show/730964/ volumes are on the left and servers are on the right | 17:26 |
*** harlowja has joined #openstack-infra | 17:27 | |
*** rkukura has quit IRC | 17:28 | |
*** aojea has quit IRC | 17:28 | |
dhellmann | dmsimard : that looks like a case where we want to be using rpartition instead of a regex | 17:32 |
dhellmann | dmsimard : do you have time to work on a patch? | 17:32 |
*** Emine has joined #openstack-infra | 17:34 | |
*** dpawlik has quit IRC | 17:36 | |
*** dpawlik has joined #openstack-infra | 17:39 | |
*** dpawlik has quit IRC | 17:39 | |
*** dpawlik has joined #openstack-infra | 17:40 | |
*** dpawlik has quit IRC | 17:40 | |
dmsimard | dhellmann: we've worked around it for now by using an underscore instead which doesn't trigger the issue -- I don't have much bandwidth at all right now but I can document the bug in a storyboard story ? | 17:42 |
dhellmann | dmsimard : that would be good, thanks! | 17:42 |
mnaser | clarkb: they should be goner | 17:44 |
*** jamesdenton has quit IRC | 17:44 | |
clarkb | mnaser: yup thanks | 17:45 |
mnaser | btw i noticed not all graphs are 'multiregion' | 17:45 |
mnaser | i.e.: http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1&from=now-3h&to=now | 17:45 |
clarkb | mnaser: I think ianw was working on converting them? | 17:46 |
clarkb | but ya not all have been done iirc | 17:46 |
*** panda is now known as panda|off | 17:46 | |
mnaser | still looks awesome though | 17:46 |
mnaser | clarkb: i see error node attempts is zero now for the past few minutes so | 17:51 |
mnaser | i think we're good! | 17:51 |
mnaser | sorry for that | 17:51 |
clarkb | no problem | 17:52 |
AJaeger | dhellmann: the missing openstack/ansible-role-tripleo-congress - you changed all other roles... | 17:55 |
dhellmann | ok | 17:55 |
AJaeger | dhellmann: see my comment in https://review.openstack.org/#/c/598323/ | 17:55 |
*** diablo_rojo has joined #openstack-infra | 17:56 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul-jobs master: WIP: Add role to install kubernetes https://review.openstack.org/605823 | 17:59 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Implement a Kubernetes driver https://review.openstack.org/535557 | 18:00 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: WIP: Playing with k8s install https://review.openstack.org/605803 | 18:00 |
*** yamamoto has joined #openstack-infra | 18:00 | |
*** sshnaidm is now known as sshnaidm|off | 18:03 | |
*** dims_ is now known as dims | 18:05 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: switch all official python projects to python3 publishing job https://review.openstack.org/598323 | 18:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix unreachable nodes detection https://review.openstack.org/602829 | 18:07 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Also retry the job if a post job failed with unreachable https://review.openstack.org/602830 | 18:07 |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: Remove release-openstack-python-without-pypi https://review.openstack.org/531829 | 18:08 |
*** ykarel_ has joined #openstack-infra | 18:08 | |
clarkb | bhs1 looks good with max servers set to 8 | 18:09 |
dhellmann | AJaeger , mordred : it looks like we need to resolve the name problem for congress, keystone, magnum, and heat. I have filed tickets for the first 2, and contacted the owners of the other 2 | 18:09 |
clarkb | I'll let it run there for another hour or two then try bumping it up to say 13 (for 10 total instances) after | 18:09 |
clarkb | see if we can find where it starts to fall over | 18:09 |
*** Swami has joined #openstack-infra | 18:10 | |
*** ykarel has quit IRC | 18:11 | |
*** bobh has quit IRC | 18:12 | |
*** olivierb has quit IRC | 18:15 | |
*** olivierb has joined #openstack-infra | 18:16 | |
*** lpetrut has quit IRC | 18:21 | |
openstackgerrit | Merged openstack-infra/system-config master: Double the gerrit regex backslashes https://review.openstack.org/605697 | 18:31 |
*** trown|lunch is now known as trown | 18:31 | |
*** jamesdenton has joined #openstack-infra | 18:31 | |
*** gfidente has quit IRC | 18:32 | |
clarkb | infra-root can we get a non foundation review on https://review.openstack.org/#/c/605212/ ? | 18:32 |
*** graphene has quit IRC | 18:32 | |
clarkb | mordred: would you be willing to review my debuntu glean refactor stack at https://review.openstack.org/#/c/604225/2 I think that will help with general readability and understanding | 18:33 |
* fungi is home now, trying to catch up on scrollback | 18:34 | |
*** graphene has joined #openstack-infra | 18:34 | |
*** graphene has quit IRC | 18:34 | |
cmurphy | clarkb: speaking of that kata change https://review.openstack.org/601831 | 18:35 |
clarkb | that would appear to be a prereq | 18:36 |
*** graphene has joined #openstack-infra | 18:36 | |
*** eernst has quit IRC | 18:37 | |
fungi | pabelanger: on jeepyb manage-projects and github things, i'm hoping in the very near future we can at least disable our use of github-related features in manage-projects entirely so it might end up even more un-tested/un-exercised thereafter. wondering if a separate utility might make more sense for that anyway | 18:38 |
clarkb | infra-root yall ok with me approving the futureparser change for static.o.o? | 18:38 |
clarkb | actually is elections stuff hosted there? | 18:38 |
clarkb | if so maybe we should wait for after the TC election just to avoid any potential unhappiness with that ending today | 18:38 |
clarkb | ya governance.o.o is a vhost on static.o.o | 18:39 |
fungi | smcginnis: dhellmann: mordred was working on following https://www.python.org/dev/peps/pep-0541/#removal-of-an-abandoned-project for keystone, not sure where it got left off (i think he tried e-mail and leaving a github issue as contact attempts) | 18:39 |
fungi | oh, i see mordred replied already, ignore me | 18:39 |
smcginnis | :) | 18:39 |
clarkb | cmurphy: are erb entries like <%= scope['openstack_project::static::cert_file'] %> expected to work with futureparser? | 18:40 |
clarkb | cmurphy: I'm not sure how the scope array differs from scope.lookupvar | 18:40 |
clarkb | (maybe they are aliases for each other?) | 18:41 |
cmurphy | clarkb: they're the same thing | 18:41 |
cmurphy | it will still work | 18:41 |
clarkb | in that case I see no issues with switching static.o.o to futureparser | 18:42 |
pabelanger | fungi: Yup, that is a fair statement. I think adding support into ansible, might be the separate thing. Maybe better is I sync with kata folks, since they depend on github also | 18:42 |
clarkb | persia and diablo_rojo ^ I'd like to update how puppet runs on the server serving the elections/governance content. Would you prefer we wait for after the TC election to do that? | 18:42 |
*** olivierb has quit IRC | 18:43 | |
*** olivierb has joined #openstack-infra | 18:43 | |
*** quite has quit IRC | 18:44 | |
*** bobh has joined #openstack-infra | 18:45 | |
*** jistr has quit IRC | 18:47 | |
*** quite has joined #openstack-infra | 18:48 | |
*** jistr has joined #openstack-infra | 18:49 | |
mordred | corvus: https://review.openstack.org/#/c/605092 has 3x +2 | 18:50 |
fungi | clarkb: persia: diablo_rojo: tonyb: also be aware the post pipeline backlog means that when the change to close out the tc election does (eventually) merge it will likely still be a while before the election page gets updated | 18:51 |
AJaeger | dhellmann: regarding pypi, I think we might have additional repos - like the ansible-role-tripleo ones - that are not set up at pypi. But most might need just creation. These are repos that were created after mordred did his change earlier this year | 18:51 |
fungi | (we're up to about 81 hours for the oldest changes still in post) | 18:52 |
dhellmann | AJaeger : we'll get those for free when we tag a release, right? | 18:52 |
dhellmann | we can no longer just register a name, we have to have something to upload | 18:52 |
pabelanger | clarkb: re: opendev, next week is okay for meeting for me | 18:52 |
dhellmann | I did that for a few regular projects that I found, but the ansible roles are unlikely to have name collisions | 18:53 |
AJaeger | dhellmann: should be fine... | 18:53 |
fungi | yeah, now that twine upload autoregisters projects on pypi people don't necessarily need to precreate them unless they want additional permissions on the entry or are afraid someone might squat the name | 18:53 |
dhellmann | right, that's why I went ahead and created qinling, manila, and zaqar-ui | 18:53 |
clarkb | pabelanger: would friday be preferable due to ansiblefest? | 18:53 |
mordred | clarkb: +2 on 605212 - left off the +a- not sure if we're waiting on anything | 18:54 |
fungi | and also, precreating projects on pypi is a pain since you need an actual sdist/wheel/egg to upload now | 18:54 |
clarkb | mordred: cmurphy pointed out https://review.openstack.org/601831 should go in first | 18:54 |
AJaeger | dhellmann: just wanted to point out that we know about those repos (keystone, magnum, congress,...) but that research was done 9 months ago and new repos created in the meantime. Thanks for registering the new repos. | 18:54 |
corvus | mordred: thx, self-approved | 18:55 |
dhellmann | AJaeger : yeah, I went through all of the repos that are slated to be part of stein so far and we had a total of 7 with issues, 4 of which are still being dealt with | 18:56 |
dhellmann | I am OK with moving ahead and letting the first release attempt force the issue | 18:56 |
dhellmann | teams can either change the sdist name in their setup.cfg or we'll have the pypi acls by then | 18:57 |
dhellmann | we keep talking about doing this, and blocking on someone having time to deal with the names, and I think we should stop blocking | 18:57 |
mordred | clarkb: whole glean stack looks great | 18:57 |
mordred | clarkb: cool - +A on 601831 | 18:58 |
diablo_rojo | clarkb, fungi I am fine with it happening now, but I'll let persia share an opinion. | 18:58 |
mordred | clarkb, corvus, fungi: if you're bored and want an easy one: https://review.openstack.org/#/c/605676/ - removes snapd from our servers | 18:58 |
fungi | heh. bored? ;) | 18:58 |
mordred | fungi: ikr? | 18:58 |
mordred | just pointing it out because I accidentally noticed it this morning, but it's not tied to any other efforts | 18:59 |
AJaeger | dhellmann, smcginnis, can we really change publish-xstatic-to-pypi to publish-to-pypi-python3? See http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/project-templates.yaml#n307 and http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/jobs.yaml#n776 - don't we need the xstatic check? | 19:00 |
*** yamamoto has quit IRC | 19:01 | |
dhellmann | AJaeger : that check is now performed as part of requesting the release tag | 19:01 |
*** smarcet has quit IRC | 19:02 | |
AJaeger | great, so I'll +2 https://review.openstack.org/#/c/598323 | 19:03 |
dhellmann | cool, thanks for doing such a careful review | 19:05 |
*** ykarel_ is now known as ykarel|away | 19:05 | |
*** graphene has quit IRC | 19:07 | |
*** graphene has joined #openstack-infra | 19:08 | |
*** jistr has quit IRC | 19:08 | |
*** jistr has joined #openstack-infra | 19:08 | |
*** dayou has quit IRC | 19:10 | |
*** ykarel|away has quit IRC | 19:12 | |
dhellmann | would it make sense to have the priorities of pipelines later in the process get higher, so that we're not sending more jobs to a pipeline than it's going to be able to process? | 19:17 |
dhellmann | basically, prioritize finishing things over starting things | 19:18 |
dhellmann | it looks like we have the post queue precedence set to low, so we're going to keep sending more jobs at it and starving it of resources | 19:20 |
fungi | dhellmann: at least zuul now uses a supercedent pipeline manager for post, so it will only enqueue up to two builds of jobs for any project+branch | 19:21 |
dhellmann | sure, that's helpful too | 19:21 |
dhellmann | I'm trying to remember what little of queuing theory I ever studied | 19:21 |
fungi | subsequent merges to a project don't effectively increase the load there while there is already one waiting | 19:21 |
clarkb | dhellmann: ya I've been mulling changing the priority on post | 19:22 |
dhellmann | one project can't starve another, but we're starving them all right now | 19:22 |
clarkb | I think we can probably do it now that it is supercedent so the cost is low | 19:22 |
clarkb | dhellmann: however | 19:22 |
clarkb | the gate is super unhealthy for tripleo and openstack | 19:22 |
clarkb | and fixing those issues would probably be more globally beneficial | 19:22 |
dhellmann | if check was low, gate was medium, and post was high and we were fully loaded then we would completely finish with 1 patch before starting another | 19:22 |
dhellmann | I'm sure | 19:22 |
dhellmann | I know the tripleo team has been working to address that | 19:23 |
dhellmann | I don't know about openstack per se | 19:23 |
fungi | so we would basically leave release/pre-release/tag, release-post and gate at highest precedence, check and post at normal precedence, and periodic/experimental at lowest precedence? | 19:23 |
pabelanger | clarkb: I'll be traveling on friday most of the day | 19:23 |
*** jistr has quit IRC | 19:23 | |
dhellmann | fungi : I think we want post higher than gate | 19:23 |
openstackgerrit | Merged openstack-infra/system-config master: Redefine $listdomain for kata lists https://review.openstack.org/601831 | 19:23 |
dhellmann | because we're not actually done with the patch until the post queue is done with it | 19:24 |
fungi | dhellmann: one up-side to making post highest precedence, i suppose, is that we could get rid of release-post | 19:24 |
dhellmann | we're "done enough" to start using it in other tests, I guess | 19:24 |
dhellmann | yeah, true | 19:24 |
dhellmann | that would simplify things, too | 19:24 |
AJaeger | will this kind of high load continue? Normally the priorities are fine and two weeks ago I would have said: Don't change anything ;) | 19:24 |
dhellmann | AJaeger : I have ~300 more patches to propose to fix tox settings :-) | 19:25 |
dhellmann | I'm still going to trickle those in, but it's what made me look at the status this afternoon | 19:25 |
AJaeger | fungi, dhellmann if post is higher than gate, then supercedent is not really needed, we would never "merge" post jobs like we do today | 19:25 |
dhellmann | AJaeger : we do still need it, because we might have 3 things pass the gate queue at the same time | 19:25 |
AJaeger | dhellmann: it's so bad already, 300 more or less won't change anything ;/ | 19:26 |
dhellmann | from the same project -- we see that with releases pretty often | 19:26 |
dhellmann | AJaeger : yes, well, I'm not going to add any more jobs today, I was just saying that's what made me go look at the queue depth | 19:26 |
clarkb | if gate and post had the same priority I think that would work | 19:26 |
AJaeger | dhellmann: question for me is what is normal. With current load, I agree, we need to give post higher prio. With "normal" load, I would not change it. | 19:26 |
*** jistr has joined #openstack-infra | 19:26 | |
dhellmann | clarkb : no, I really think we need to treat the series of pipelines as a queue of its own, and throttle things going from one stage to the next if it's going to mean the work can't actually complete | 19:27 |
clarkb | dhellmann: when they are the same though they would compete at roughly fair distribution of resources | 19:27 |
dhellmann | ideally we would have a large check queue, a reasonably deep gate, and anything that makes it through the gate would be home free in post so that would be a tiny queue | 19:28 |
*** jistr has quit IRC | 19:28 | |
dhellmann | but we don't want gate and post competing, that's my point | 19:28 |
dhellmann | we view the patch as "done" when it leaves the gate and merges, but we often don't see the build artifacts until after the post job is done so the patch isn't really completely done after it merges | 19:29 |
dhellmann | the post jobs don't tend to be expensive but we need them to run | 19:29 |
fungi | i think we have different work reflected in those different pipelines though, to some extent | 19:29 |
*** jistr has joined #openstack-infra | 19:29 | |
dhellmann | yes, that's true, we do | 19:29 |
dhellmann | I think we've undervalued the post job work | 19:29 |
clarkb | some post jobs are expensive, but probably not tripleo 3 node container test that is going to timeout after 3 hours expensive | 19:29 |
clarkb | dhellmann: I agree | 19:29 |
dhellmann | for example, we merge a doc patch but don't see new documentation live until after post. that's going to be several days from now it seems | 19:30 |
*** bobh has quit IRC | 19:30 | |
fungi | activity in the check pipeline is likely to be the result of developers pushing and iterating on changes, while gate pipeline activity is a result of reviewers approving changes and post is a result of automation performing actions based on changes which were able to survive gating | 19:30 |
corvus | when we're able to make the promote pipeline, i think we can make it higher precedence fairly easily because jobs in there will be very low impact. | 19:30 |
*** bobh has joined #openstack-infra | 19:30 | |
openstackgerrit | Merged openstack-infra/system-config master: Remove snapd from servers https://review.openstack.org/605676 | 19:30 |
mordred | corvus: aroo? we already have a promote pipeline? | 19:31 |
dhellmann | clarkb : how hard is it to get stats on which repos are consuming nodes in each pipeline, and failure rates? | 19:31 |
corvus | (and we'll be able to use promote for tarballs and docs) but we have a bit more work to do there. | 19:32 |
fungi | it felt to me like we basically optimized our pipelines based on which activity we encourage more: highest is people reviewing and approving changes, next is people pushing new stuff that is of perhaps dubious quality, and last is automation (the fact that people are seeking near-realtime feedback from the automation in the low-priority pipeline is something i hadn't considered however) | 19:32 |
dhellmann | if I have numbers I can try to express the issues the tripleo situation may be causing for us | 19:32 |
corvus | mordred: we need some more support in zuul before we can reliably use it (especially if we do swift logs) | 19:32 |
clarkb | dhellmann: the hardest part is figuring out how many nodes per job are consumed since I don't think we directly record that in the data processing | 19:32 |
mordred | corvus: ah - gotcha | 19:32 |
clarkb | corvus: ^ does the zuul sql db have nodeset info like that? | 19:32 |
dhellmann | fungi : I agree, that's what we did. I think that made sense at one time. Maybe it's worth revisiting? | 19:33 |
fungi | i always welcome an opportunity to revisit assumptions | 19:33 |
clarkb | dhellmann: we record job runtimes and status in the zuul sql db and in graphite. | 19:33 |
clarkb | I think the one piece missing is what number of resources that job locks during that time | 19:34 |
corvus | clarkb: no it doesn't | 19:34 |
dhellmann | how many jobs use more than 1 node? | 19:34 |
clarkb | dhellmann: in tripleo? most of them | 19:34 |
*** jistr has quit IRC | 19:34 | |
dhellmann | ok, if you give me numbers of each job, I can turn that into numbers of nodes | 19:34 |
clarkb | top of tripleo gate is 6/9 are multinode | 19:35 |
dhellmann | failure rates would be good, too | 19:35 |
dhellmann | I don't know how to collect the data, but if you help with that I can go argue internally to ease back a bit | 19:36 |
corvus | i do know the tripleo folks are trying to move some tests into single-node jobs | 19:37 |
mordred | dhellmann: or alternately to provide some node resources | 19:37 |
dhellmann | corvus : yeah | 19:37 |
*** jistr has joined #openstack-infra | 19:37 | |
dhellmann | mordred : or that | 19:37 |
clarkb | dhellmann: the place I end up looking a lot is http://status.openstack.org/elastic-recheck/data/others.html that shows 137 fails in two tripleo multinode container jobs over the last 10 days in the gate | 19:37 |
dhellmann | I guess the point is I'll do that part but I need data to make the argument | 19:37 |
mordred | ++ | 19:37 |
*** dayou has joined #openstack-infra | 19:37 | |
*** jamesmcarthur has joined #openstack-infra | 19:37 | |
clarkb | that doesn't give us a complete view, but it is the "world is on fire" check with rough sense of how on fire | 19:38 |
*** hamzy has joined #openstack-infra | 19:38 | |
dhellmann | clarkb : I don't look at that page often enough to know how to read it, which numbers are you looking at? I see 82 fails for tripleo-ci-centos-7-containers-multinode right at the top | 19:38 |
clarkb | for curating global values the neutron team has a dashboard like http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1 and the tripleo team could put something similar together using grafyaml | 19:39 |
dhellmann | oh, and I guess the next job down has 55 for 137 total | 19:39 |
clarkb | dhellmann: ya I added those two together | 19:39 |
dhellmann | k, yeah, I worked it out | 19:39 |
*** jistr has quit IRC | 19:39 | |
dhellmann | build-openstack-sphinx-docs is failing pretty often, too | 19:40 |
clarkb | fwiw openstack is super flaky right now too | 19:40 |
clarkb | it just consumes a smaller portion of resources on reset due to smaller queue | 19:41 |
dhellmann | yeah | 19:41 |
dhellmann | though I wonder how much that contributes to the tripleo failures, too | 19:41 |
mordred | I'm having fairly consistently inconsistent devstack behavior in sdk functional tests | 19:41 |
*** e0ne has quit IRC | 19:42 | |
mordred | on any given patch I'm rather likely to have something fail server-side and need to recheck - so I'd bet the tripleo folks are getting hit by that as well to some degree | 19:42 |
clarkb | fwiw I've tried to read the tripleo logs to understand how things are failing but get lost in the ansible drives mistral which drives ansible which runs puppet in docker | 19:43 |
dhellmann | clarkb : yeah, don't even get me started | 19:43 |
AJaeger | dhellmann: build-openstack-sphinx-docs was broken on many stable branches, a fallout from the way we changed the jobs. On master I'm not aware of anything | 19:44 |
*** hamzy has quit IRC | 19:44 | |
dhellmann | AJaeger : ok, I didn't look at any of the failures, I just noticed it was 3rd on that page clarkb linked | 19:47 |
*** e0ne has joined #openstack-infra | 19:47 | |
*** jistr has joined #openstack-infra | 19:49 | |
*** e0ne has quit IRC | 19:50 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Provide some accounting of node usage in logs https://review.openstack.org/605856 | 19:52 |
corvus | 2018-09-27 12:51:53,271 zuul.nodepool INFO Nodeset <NodeSet [<Node 0000000001 ('controller',):label1>]> with 1 nodes was in use for 0.006144285202026367 seconds for build <Build 5eae74f9be32448880a78994d0281de2 of project-test1 on <Worker localhost6.localdomain6>> | 19:52 |
corvus | clarkb, dhellmann: ^ that change will cause zuul to emit log lines like that | 19:52 |
clarkb | neat | 19:52 |
corvus | maybe that would help. if we can ever land the change and restart. ;) | 19:52 |
dhellmann | heh | 19:53 |
dhellmann | thanks, corvus | 19:53 |
corvus | oh we're missing the project name | 19:54 |
corvus | (project-test1 is a job name -- for hysterical raisins) | 19:54 |
*** pcaruana has quit IRC | 19:54 | |
clarkb | dhellmann: it might also be helpful to prioritize bug fixes rather than feature additions? | 19:56 |
dhellmann | clarkb : I know at one point juan kicked everything out of the tripleo gate to clear it up and get some stabilizing fixes through, but I don't know if there has been a real policy on that | 19:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Provide some accounting of node usage in logs https://review.openstack.org/605856 | 19:57 |
dhellmann | how many jobs are we running non-voting? | 19:58 |
clarkb | there were a bunch in the tripleo gate but I think those did get cleaned up. There are a few outliers I've noticed but not like before | 19:59 |
clarkb | or do you mean in general? | 19:59 |
dhellmann | in general | 19:59 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Indicate whether a build is voting in the logs https://review.openstack.org/605857 | 19:59 |
dhellmann | I'm just thinking of classes of jobs that we could kill or something. I guess there aren't likely to be many. | 19:59 |
openstackgerrit | Merged openstack-infra/glean master: Use common function for debian bond mode https://review.openstack.org/604221 | 20:00 |
openstackgerrit | Merged openstack-infra/glean master: Check same debian interface path everywhere https://review.openstack.org/604222 | 20:00 |
clarkb | ya I'm actually not sure how to answer that question with distributed zuul job configs now | 20:00 |
corvus | the sql db *does* include whether it's voting | 20:00 |
clarkb | ah maybe ^ is how to answer that question then | 20:00 |
corvus | so we can semi-easily find out how many we've run (and for what projects) | 20:00 |
dhellmann | corvus: what do you think about my theory about setting pipeline priorities so they get progressively higher? | 20:00 |
dhellmann | yeah, that would be useful to have, too. we could ask teams to turn off non-voting jobs for the time being | 20:01 |
*** bobh has quit IRC | 20:01 | |
openstackgerrit | Merged openstack-infra/glean master: Manage the debian interface header in one place https://review.openstack.org/604223 | 20:02 |
corvus | dhellmann: when we have promote, i'm all for it. but right now, we have our historical mix of important + unimportant stuff in post (i think), so it's a harder call. for example, we still have a lot of coverage jobs, which we threw in post because they're not important. but they're heavy resource users. | 20:03 |
dhellmann | hmm, yes, that's a good point | 20:03 |
clarkb | AJaeger has been working to clean those up fwiw | 20:03 |
fungi | by moving them to check instead | 20:03 |
fungi | yes | 20:03 |
corvus | maybe it's as simple as removing all the coverage jobs and then there's nothing unimportant left in post? | 20:04 |
clarkb | ya but check is lower priority so probably ok | 20:04 |
clarkb | corvus: that is probably the vast majority of unimportant stuff in post | 20:04 |
corvus | at any rate, that's my main concern -- basically if we're shifting the importance of post, we should make sure what's in it is aligned with the new importance. | 20:04 |
corvus | skimming http://zuul.openstack.org/builds.html?pipeline=post looks pretty tame | 20:06 |
corvus | maybe we're close enough to make it worth doing now | 20:06 |
*** smarcet has joined #openstack-infra | 20:07 | |
*** e0ne has joined #openstack-infra | 20:08 | |
*** diablo_rojo has quit IRC | 20:08 | |
openstackgerrit | Merged openstack-infra/glean master: Consistent debian interface control flow https://review.openstack.org/604224 | 20:09 |
openstackgerrit | Merged openstack-infra/glean master: Debian interface config set bond once https://review.openstack.org/604225 | 20:09 |
*** hamzy has joined #openstack-infra | 20:09 | |
*** diablo_rojo has joined #openstack-infra | 20:11 | |
*** hamzy has quit IRC | 20:20 | |
*** hamzy has joined #openstack-infra | 20:20 | |
clarkb | using the data in the zuul status.json for the current gate queue and the heuristic of any job with multinode in its name is worth 2 nodes and all other jobs are 1 node we get ~234 nodes for the current tripleo active window | 20:22 |
clarkb | there is only one change in the openstack integrated gate so it won't give us an idea of how painful it was when those were in a long queue | 20:25 |
*** hamzy has quit IRC | 20:25 | |
*** diablo_rojo has quit IRC | 20:26 | |
*** Emine has quit IRC | 20:27 | |
*** smarcet has quit IRC | 20:27 | |
clarkb | bhs1 seems happy with a max servers of 13 | 20:28 |
*** Emine has joined #openstack-infra | 20:28 | |
clarkb | I wonder if the issue is more to do with thundering herds | 20:28 |
*** hamzy has joined #openstack-infra | 20:28 | |
mordred | clarkb: that would make sense if the underlying issues are that nova is having issues talking to neutron when it happens | 20:29 |
ianw | clarkb / pabelanger: not sure if you saw https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4188 but it looks like we tickled an unbound bug with the unconfigured ipv6 | 20:32 |
openstack | www.nlnetlabs.nl bug 4188 in server "IPv6 forwarders without ipv6 result in SERVFAIL" [Enhancement,Resolved: fixed] - Assigned to unbound-team | 20:32 |
ianw | clarkb / mnaser : graphs should be generated by a script, let me see | 20:32 |
*** hamzy has quit IRC | 20:34 | |
pabelanger | ianw: woot, yah for CIing the internet | 20:34 |
ianw | mnaser: http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1&from=now-3h&to=now is looking about right to me? up top left there is a region drop down box? | 20:34 |
*** hamzy has joined #openstack-infra | 20:34 | |
pabelanger | ianw: we should see about backporting fix in fedora | 20:34 |
ianw | pabelanger: yep, i only finished up late last night with it ... but i'll file a bug. given this, i think a smaller workaround than the one i proposed (https://review.openstack.org/605583) that we can revert when it's fixed is appropriate | 20:35 |
*** dayou has quit IRC | 20:38 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add zuul envar for kata tests https://review.openstack.org/605773 | 20:39 |
mordred | fuentess: ^^ woot | 20:40 |
*** smarcet has joined #openstack-infra | 20:40 | |
*** dpawlik has joined #openstack-infra | 20:42 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-sphinx master: Add attr_overview directive https://review.openstack.org/604980 | 20:43 |
*** kgiusti has left #openstack-infra | 20:43 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role https://review.openstack.org/605585 | 20:43 |
*** hamzy has quit IRC | 20:43 | |
*** dpawlik has quit IRC | 20:46 | |
*** jamesmcarthur has quit IRC | 20:47 | |
*** ansmith has quit IRC | 20:48 | |
clarkb | http://paste.openstack.org/show/731045/ is a quick and dirty script to get an idea of what the node request distribution is | 20:48 |
*** anteaya has joined #openstack-infra | 20:51 | |
ianw | mordred : ^^ i wasn't sure how we want to do the matching for docker hosts. my initial thought there is to add a subdomain for docker hosts | 20:51 |
ianw | and i'm determined to get "quay" used in openstack after it was rejected for the Q series name ;) | 20:52 |
*** smarcet has quit IRC | 20:52 | |
clarkb | what that breakdown doesn't show is the node requests * time consumed per request | 20:53 |
clarkb | so still not the whole picture | 20:54 |
notmyname | there are some swift jobs in the periodic section and periodic-stable that have been there for 60+ hours. not sure what they are, but maybe they could be cleaned up if they'll be automatically retriggered | 20:55 |
clarkb | notmyname: those jobs are all just queued and zuul isn't queuing new ones every 24 hours so I think we are good to let them sit like that | 20:56 |
notmyname | ok | 20:56 |
mordred | ianw: well - I think once I get the new groups plugin finished, we can just list them in a yaml list | 20:58 |
mordred | ianw: I don't think the subdomain will work out - because we might eventually have docker installed on all of our hosts | 20:58 |
mordred | ianw: (although I grok your desire :) ) | 20:59 |
*** trown is now known as trown|outtypewww | 20:59 | |
*** yamamoto has joined #openstack-infra | 20:59 | |
mordred | in fact, lemme go update that to not try to do the disabled bit | 21:00 |
notmyname | clarkb: I updated your pastebin'd script (and put the changes in github's gist because I can do multiple files there): https://gist.github.com/notmyname/8bf3dbcb7195250eb76f2a1a8996fb00 | 21:05 |
notmyname | clarkb: you'll need to install https://pypi.org/project/ascii_graph/ | 21:05 |
clarkb | neat | 21:06 |
*** dayou has joined #openstack-infra | 21:06 | |
notmyname | clarkb: *super* useful for a quick visual check of what you're showing :-) | 21:06 |
clarkb | ya that graph makes it way easier to understand | 21:06 |
notmyname | those are some big numbers at the end! and that's a conservative estimate since the "multinode" check just adds 2, regardless of how many nodes are used | 21:08 |
clarkb | yup | 21:08 |
openstackgerrit | Monty Taylor proposed openstack-infra/system-config master: Add yamlgroup inventory plugin https://review.openstack.org/602385 | 21:09 |
mordred | ianw, clarkb: ^^ there it is, without trying to do the disabled magic | 21:09 |
*** rh-jelabarre has quit IRC | 21:10 | |
notmyname | clarkb: any way to see what a "normal" multinode job uses? eg I know swift's one multinode job uses 5 nodes | 21:12 |
ianw | mordred: what's with the GPL bits there? it's from somewhere else? | 21:12 |
clarkb | notmyname: not easily, that is what we were discussing earlier. The major piece of data we don't record after jobs finish is how many nodes they used | 21:13 |
clarkb | notmyname: what we could do is build a giant lookup table but that's likely a lot of work | 21:13 |
clarkb | notmyname: corvus also started work to have zuul record some of this so hopefully we'll be able to answer that in the future | 21:13 |
clarkb | ianw: its from ansible | 21:13 |
*** e0ne has quit IRC | 21:14 | |
dhellmann | clarkb : isn't the zuul configuration already the lookup table? can't we get the job definitions programmatically? | 21:14 |
clarkb | dhellmann: if you load the entire zuul config like zuul does | 21:15 |
dhellmann | I thought that API was there, but maybe not | 21:15 |
notmyname | clarkb: looks like there's 66 multinode jobs defined on http://zuul.openstack.org/jobs.html (ie a search for "multinode-"). is there any way from zuul to load the config and check the node set or something? | 21:15 |
mordred | ianw: it's an inventory plugin that subclasses GPL code - ansible plugins do not get the gpl exception that ansible modules get | 21:15 |
*** jamesdenton has quit IRC | 21:15 | |
ianw | ahh | 21:16 |
mordred | notmyname: we've got patches in flight that add the ability to get the job data for individual jobs | 21:16 |
clarkb | let me check if the api has that info in it yet | 21:16 |
clarkb | some of the job config data is exposed | 21:16 |
notmyname | mordred: it will only be 2, I mean 8, I mean 27 hours until they land! ;-) | 21:16 |
mordred | notmyname: heh | 21:17 |
mordred | oh - actually- I think the API bit landed, it's just the dashboard page outstanding | 21:17 |
*** jamesmcarthur has joined #openstack-infra | 21:17 | |
mriedem | so uh, any ideas on resolving this? http://status.openstack.org/elastic-recheck/gate.html#1449136 | 21:18 |
*** bobh has joined #openstack-infra | 21:18 | |
mriedem | 899 fails in 10 days | 21:18 |
mordred | http://logs.openstack.org/48/597048/7/check/zuul-build-dashboard/44bca93/npm/html is a draft copy of the updated dashboard - if you click to 'jobs' then to 'devstack-multinode' you can see the nodes | 21:18 |
clarkb | mriedem: i think we did solve that one | 21:18 |
*** bobh has quit IRC | 21:18 | |
clarkb | mriedem: the first blip was limestone mirror crashing the second blip was ovh gra1 mirror crashing | 21:18 |
mriedem | get right out of town | 21:18 |
mriedem | i guess we'll know when indexing is happening again | 21:18 |
clarkb | mriedem: limestone fixed their issue and a reboot in gra1 seemed to have fixed that one | 21:18 |
mordred | notmyname: http://zuul.openstack.org/api/job/devstack-multinode is the api call | 21:19 |
notmyname | anyone ever played with the idea of zuul tracking usage to do "billing" tracking per project? maybe that's a dangerous idea with the wrong sort of incentives... | 21:19 |
clarkb | mordred: problem is we get the nodeset name in some cases http://zuul.openstack.org/api/job/tripleo-ci-centos-7-containers-multinode | 21:19 |
clarkb | oh wait no that's just a variable, likely have to look at the parent and recurse until nodeset info is found | 21:20 |
mordred | clarkb: we might not have it attached to a job though - it might be a top-level defined nodeset | 21:20 |
clarkb | notmyname: ya I don't know that we want to do that as much as encourage teams to be good stewards of the resources? like if your testing doesn't work at all then stop approving things that aren't bug fixes | 21:20 |
mordred | notmyname: not in any way that would be even more than brainstorming ... for the reasons you would imagine and that clarkb mentions - but in considering multitenant zuul we've definitely pondered ideas like per-tenant build node resources and/or accounting | 21:21 |
mordred | but I think that would be a dangerous amount of siloing to introduce into openstack itself | 21:22 |
*** dhill_ has quit IRC | 21:22 | |
notmyname | my employer has a customer that does a huge amount of pipelined data processing (DNA sequences). they also need to do chargeback for the different groups that are running jobs, using space, etc. zuul happens to be a pretty good pipeline job handler. having some sort of ability to track what "project" uses what resources would make it potentially useful in cases like this | 21:24 |
*** agopi_ has joined #openstack-infra | 21:24 | |
clarkb | another good steward action is to remove all the non voting jobs from the gate queue | 21:25 |
mordred | notmyname: yah - that's precisely the sort of use case that has made me ponder such accounting before | 21:25 |
clarkb | I think it would be useful to track it more explicitly even if we aren't doing charging/billing/etc based on it too | 21:25 |
mordred | clarkb: I think we'd need to add a /nodesets endpoint to get the top-level defined nodeset objects - and then either recurse through parents - or if we get a named nodeset grab it from /nodesets | 21:26 |
clarkb | if your month over month usage skyrockets maybe that indicates a previously unknown issue | 21:26 |
mordred | clarkb: indeed | 21:26 |
*** agopi has quit IRC | 21:27 | |
mordred | clarkb: alternately, perhaps we should consider expanding that server-side - so that everything looks like devstack-multinode even if it's referencing a nodeset object - although we'd want to keep the name: field if it is referencing one | 21:27 |
clarkb | mordred: expanding it server side so the client gets the raw data would probably be the most friendly api wise | 21:28 |
mordred | yah | 21:28 |
clarkb | fwiw I also don't think the infra team wants to be in the role of policing openstack | 21:28 |
*** anteaya has quit IRC | 21:28 | |
mordred | nope | 21:28 |
clarkb | unfortunately we've had this role in various ways for various reasons | 21:28 |
mordred | BUT - I do think providing data so that self-policing is possible is a thing we like doing | 21:28 |
clarkb | ++ | 21:29 |
mordred | such as the grafana stuff | 21:29 |
mordred | clarkb, corvus: the react patch and the puppet patch to go with it are both 2x+2 ... perhaps we should talk about rollout strategy? | 21:30 |
mordred | like, should we maybe put zuul.o.o in the emergency list - land both, then do a puppet run that should update the html and the apache at the same time? | 21:31 |
clarkb | mordred: I've not completely tracked that, but we need to update apache in order for the new js to work? | 21:31 |
*** anteaya has joined #openstack-infra | 21:31 | |
clarkb | if that is the case then ya something like what you describe is probably ideal | 21:31 |
mordred | clarkb: yah - the rewrite rules are different | 21:31 |
mordred | clarkb: https://review.openstack.org/#/c/604251/ is the puppet patch | 21:32 |
*** agopi_ is now known as agopi | 21:33 | |
clarkb | mordred: can the "should be removed in a few weeks" section be removed to simplify things? | 21:33 |
mordred | clarkb: yah - probably so :) | 21:34 |
clarkb | and we must have a singular index.html that does all the things but api ? | 21:34 |
*** jamesmcarthur has quit IRC | 21:35 | |
mordred | although not ALL of it wants to be removed - it's really just that one line | 21:35 |
clarkb | what will make this data tracking much better is getting the nodeset information into the zuul sql db | 21:36 |
clarkb | because then we can do time * resources and show actual usage | 21:37 |
*** jamesmcarthur has joined #openstack-infra | 21:37 | |
clarkb | right now we are sort of integrating over node requests to hand wave around what that number is | 21:37 |
*** jamesmcarthur has quit IRC | 21:39 | |
*** ansmith has joined #openstack-infra | 21:40 | |
clarkb | I've bumped bhs1 up to 20 instances as it continues to not leak nodes there | 21:41 |
*** jamesmcarthur has joined #openstack-infra | 21:42 | |
*** felipemonteiro has joined #openstack-infra | 21:43 | |
mnaser | ianw: the stuff under api operation seems to have ca-ymq-1 only | 21:46 |
*** yamamoto has quit IRC | 21:48 | |
clarkb | https://review.openstack.org/#/c/589068/ just failed and caused a reset of the entire gate | 21:48 |
clarkb | jaosorior: ^ http://logs.openstack.org/68/589068/29/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/fc4a5fe/job-output.txt.gz#_2018-09-27_21_46_40_464305 the post run timed out | 21:49 |
openstackgerrit | Merged openstack-infra/system-config master: Creates 'embargo-notice' list https://review.openstack.org/605212 | 21:54 |
clarkb | jaosorior: it would be interesting to profile what the cost in collecting logs is | 21:54 |
clarkb | jaosorior: we can't do that on that change because I don't think we got all the logs needed to debug that but something to dig into | 21:55 |
clarkb | maybe we can trim the list of things logged again | 21:55 |
*** mriedem has quit IRC | 21:55 | |
mordred | clarkb: ugh. I *REALLY* need to git the log collecting stuff reworked | 21:56 |
mordred | s/git/get/ | 21:56 |
mordred | that's like - 9 months behind schedule | 21:56 |
ianw | mnaser: interesting ... i think it's a problem with the actual data for sjc ... | 21:56 |
*** diablo_rojo has joined #openstack-infra | 21:57 | |
clarkb | mordred: sure but one wonders also if http://logs.openstack.org/68/589068/29/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/cec555e/logs/undercloud/home/zuul/tripleo-heat-installer-templates/ is actually necessary | 21:57 |
clarkb | thats all stuff in git somewhere I'm sure | 21:57 |
mordred | clarkb: yah. good point | 21:58 |
*** rlandy is now known as rlandy|biab | 22:03 | |
*** smarcet has joined #openstack-infra | 22:04 | |
ianw | mordred: remember anything about ComputePostServers possibly changing stats reporting somehow? | 22:04 |
*** eernst has joined #openstack-infra | 22:05 | |
ianw | stats.timers.nodepool.task.vexxhost-ca-ymq-1.ComputePostServersDetail exists, but stats.timers.nodepool.task.vexxhost-sjc1.ComputePostServersDetail doesn't | 22:05 |
*** mriedem has joined #openstack-infra | 22:07 | |
*** boden has quit IRC | 22:08 | |
*** scarab_ has joined #openstack-infra | 22:09 | |
*** scarab_ has quit IRC | 22:11 | |
*** agopi is now known as agopi|brb | 22:11 | |
mordred | ianw: it shouldn't have changed | 22:12 |
mordred | stats.timers.nodepool.task.vexxhost-sjc1.ComputePostServersDetail seems right to me | 22:12 |
ianw | mordred: where are you seeing that? unless i'm blind that isn't in graphite | 22:15 |
*** agopi|brb has quit IRC | 22:16 | |
*** graphene has quit IRC | 22:16 | |
*** graphene has joined #openstack-infra | 22:17 | |
*** fuentess has quit IRC | 22:19 | |
ianw | it doesn't seem like the ComputePostServersDetail task gets run on nl03, which launches sjc nodes. i wonder what's different | 22:20 |
ianw | (from grepping the logs for it) | 22:20 |
ianw | doesn't seem like any of the launchers are putting that stat in ... | 22:24 |
ianw | hang on, no Detail ... | 22:28 |
ianw | "Manager vexxhost-ca-ymq-1 running task ComputePostServers (queue 0)" runs all the time | 22:28 |
ianw | but never for sjc1 ... weird | 22:28 |
*** rcernin has joined #openstack-infra | 22:29 | |
ianw | but we have ComputePostOs-volumes_boot | 22:30 |
ianw | ... because, i guess, they're all marked as "boot-from-volume" | 22:33 |
ianw | so a) the graph should capture that and b) i think the nodepool task->stat normalisation might want to fix that up too | 22:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add nodesets API route https://review.openstack.org/605877 | 22:34 |
*** jtomasek has quit IRC | 22:35 | |
*** jamesmcarthur has quit IRC | 22:37 | |
mordred | ianw: hrm. so - actually, ComputePostServersDetail doesn't make much sense ... it should be ComputePostServers - which is POST {compute}/servers - which is the rest call, where {compute} is the endpoint of the 'compute' service | 22:37 |
mordred | ianw: the other, ComputePostOs-volumes - you're right - is ugly and should get fixed | 22:38 |
*** agopi|brb has joined #openstack-infra | 22:38 | |
mordred | but it's the (current) translation of POST {compute}/os-volumes_boot - which I kid you not is the endpoint you submit something to to boot from volume | 22:38 |
mordred | ianw: the only place we have Detail is ComputeGetServersDetail - because we run GET {compute}/servers/detail - as GET {compute}/servers is almost useless | 22:39 |
mordred | ianw: do you think we should update the transformation to be to ComputePostOsVolumesBoot ? | 22:40 |
*** tpsilva has quit IRC | 22:42 | |
*** dpawlik has joined #openstack-infra | 22:42 | |
*** anteaya has quit IRC | 22:43 | |
clarkb | ok I think 20 to 25 may have tipped us over into bhs1 is less happy | 22:43 |
clarkb | if these additional instances do fail to boot I will dial it back to 20 (better than nothing) and we can see if it stabilizes | 22:43 |
fungi | unfortunate the problem seems to be persisting | 22:44 |
clarkb | actually I take that back these nodes all flipped to ready | 22:45 |
clarkb | they seemed to take longer to boot though and their ports showed as DOWN, I guess they go DOWN to UP to DOWN again through the lifetime of the instance | 22:46 |
clarkb | I'll keep it at 25 and see how it does | 22:46 |
*** dpawlik has quit IRC | 22:47 | |
*** anteaya has joined #openstack-infra | 22:48 | |
clarkb | fungi: I'm beginning to think it is a thundering herd type problem | 22:51 |
*** jamesmcarthur has joined #openstack-infra | 22:53 | |
*** tosky has quit IRC | 22:53 | |
corvus | mordred, clarkb: re react, i'm inclined to do what mordred suggests (or, perhaps depending on the time, merging both without adding zuul to emergency) when there's a little more predictability to merge times. so... maybe friday or saturday? | 22:55 |
fungi | comparatively small herd | 22:55 |
fungi | fri/sat sounds like a reasonable time for that transition | 22:56 |
*** jamesmcarthur has quit IRC | 22:57 | |
clarkb | wfm | 22:58 |
*** smarcet has quit IRC | 22:59 | |
ianw | mordred: yep, i think the transforms will help, i'll take a look | 23:02 |
*** jamesmcarthur has joined #openstack-infra | 23:07 | |
*** olivierb has quit IRC | 23:08 | |
*** olivierb has joined #openstack-infra | 23:11 | |
*** jamesmcarthur has quit IRC | 23:12 | |
*** dpawlik has joined #openstack-infra | 23:20 | |
*** dpawlik has quit IRC | 23:25 | |
*** mriedem has quit IRC | 23:26 | |
*** bobh has joined #openstack-infra | 23:28 | |
*** olivierb has quit IRC | 23:29 | |
*** anteaya has quit IRC | 23:32 | |
*** anteaya has joined #openstack-infra | 23:39 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Grafana: set zuul node requests yaxis min https://review.openstack.org/605886 | 23:40 |
*** Swami has quit IRC | 23:45 | |
pabelanger | +3 | 23:46 |
pabelanger | :) | 23:46 |
pabelanger | clarkb: notmyname: script is neat, going to play with it | 23:46 |
ianw | memories of my thesis supervisor and the huge red marks he'd put on things if you used a misleading graph axis | 23:47 |
ianw | you can make anything look good by fiddling the axes :) | 23:47 |
corvus | yay! we're almost through the backlog! oh. nope. | 23:48 |
pabelanger | seems we always have about 100 nodes deleting, over the last 6 hours | 23:50 |
pabelanger | oh | 23:51 |
pabelanger | http://grafana.openstack.org/dashboard/db/nodepool-packethost | 23:51 |
clarkb | yes thats been a known issue since the ptg I think | 23:51 |
clarkb | port leaks | 23:51 |
pabelanger | ah | 23:51 |
clarkb | studarus occasionally fiddles with it or upgrades a piece of the cloud so haven't turned it off | 23:51 |
pabelanger | yah, doesn't look like many nodes used there last 6 hours | 23:52 |
*** jamesmcarthur has joined #openstack-infra | 23:52 | |
clarkb | the problem is basically that ports leak then boots fail. If you delete the ports boots work again for a time | 23:53 |
clarkb | not every port is leaked | 23:53 |
clarkb | it is weird | 23:53 |
pabelanger | is it just studarus working on the cloud? | 23:53 |
pabelanger | yah, last 30days doesn't look healthy | 23:54 |
pabelanger | http://grafana.openstack.org/d/U462abNik/nodepool-packethost?orgId=1&from=now-30d&to=now | 23:54 |
*** felipemonteiro has quit IRC | 23:55 | |
clarkb | platform9 supports the cloud distro thing too | 23:55 |
clarkb | and ya we looked at it at the ptg a bit | 23:56 |
clarkb | but haven't tracked down where neutron is losing those ports | 23:56 |
clarkb | did learn you can't rely on explicit quotas with nova though as you will lag quota updates | 23:56 |
clarkb | so apparently clouds like osic gave us a 10% or so buffer on our quotas | 23:56 |
*** rlandy|biab is now known as rlandy | 23:57 |
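
A minimal sketch of the node-count heuristic clarkb describes at 20:22: any job with "multinode" in its name counts as 2 nodes, every other job as 1. The function name, the flat list of job names and the two `openstack-tox-*` entries are assumptions for illustration. The layout of zuul's status.json is not shown in the log, so fetching and walking it is left out here.

```python
def estimate_nodes(job_names, multinode_weight=2):
    """Estimate node usage from job names alone.

    Heuristic from the discussion: a job with "multinode" in its name
    is counted as 2 nodes, any other job as 1. As notmyname points out,
    this is conservative: a multinode job that really uses 5 nodes is
    still counted as 2.
    """
    total = 0
    for name in job_names:
        total += multinode_weight if "multinode" in name else 1
    return total


# Hypothetical set of jobs for one change in a gate queue.
sample_jobs = [
    "tripleo-ci-centos-7-containers-multinode",
    "tripleo-ci-centos-7-scenario002-multinode-oooq-container",
    "openstack-tox-pep8",   # illustrative single-node job name
    "openstack-tox-py27",   # illustrative single-node job name
]
print(estimate_nodes(sample_jobs))
```

As noted at 20:53, a count like this says nothing about how long each node is held, so it is only a rough view of queue pressure.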
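
A sketch of the task-to-stat-name transform that mordred and ianw discuss between 22:37 and 22:40, where `POST {compute}/os-volumes_boot` currently shows up as `ComputePostOs-volumes_boot` and the proposal is `ComputePostOsVolumesBoot`. The function name and signature are hypothetical. This is not the actual nodepool or SDK code, only one way to reproduce the stat names quoted in the log.

```python
import re


def stat_name(service, method, path):
    """Build a CamelCase stat key from a REST call.

    The path is split on "/", "-" and "_" so that "os-volumes_boot"
    becomes "OsVolumesBoot" instead of keeping its punctuation.
    """
    parts = [p for p in re.split(r"[/_-]", path) if p]
    return (
        service.capitalize()
        + method.capitalize()
        + "".join(p.capitalize() for p in parts)
    )


print(stat_name("compute", "POST", "/servers"))
print(stat_name("compute", "GET", "/servers/detail"))
print(stat_name("compute", "POST", "/os-volumes_boot"))
```

Splitting on punctuation rather than special-casing `os-volumes_boot` keeps the key free of characters that are awkward in graphite metric paths, which is the concern ianw raises about the normalisation.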