*** diablo_rojo has joined #opendev-meeting | 14:49 | |
*** hamalq has joined #opendev-meeting | 16:42 | |
*** hashar has joined #opendev-meeting | 18:50 | |
fungi | ahoy mateys! | 19:01 |
corvus | ahoy hoy | 19:01 |
clarkb | hello! | 19:01 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Sep 8 19:01:28 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-September/000082.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
ianw | o/ | 19:01 |
clarkb | I didn't have any formal announcements. But yesterday and today Oregon decided to catch on fire, so I'm semi-distracted by that. We should be ok, though a nearby field decided it wanted to be a fire instead | 19:02 |
clarkb | anyone else have anything to announce? | 19:02 |
clarkb | (oh also power outages have been a problem so I may drop out due to that too though haven't lost power yet) | 19:03 |
fungi | nothing which tops that, no ;) | 19:03 |
fungi | :/ | 19:03 |
clarkb | really I expect the worst bit will be the smoke when the winds shift again. So I should just be happy right now :) | 19:03 |
clarkb | #topic Actions from last meeting | 19:04 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:04 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-01-19.01.txt minutes from last meeting | 19:04 |
clarkb | There were no actions from last meeting. Let's just dive into this one then | 19:04 |
clarkb | #topic Priority Efforts | 19:04 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:04 | |
clarkb | #topic Update Config Management | 19:04 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:04 | |
clarkb | I've booted a new nb03.opendev.org to run nodepool-builder with docker for arm64 image builds | 19:05 |
clarkb | That has been enrolled into our inventory but has a problem installing things because there aren't wheels for arm64 :) | 19:05 |
clarkb | #link https://review.opendev.org/750472 Add build deps for docker-compose on nb03 | 19:05 |
clarkb | that should fix it, and once that's done everything should be handled by docker, so it should work | 19:05 |
clarkb | one thing that came up as part of this is that we don't seem to have ansible using sshfp records yet? or maybe we do and the issue I had was specific to having a stale known_hosts entry for a reused IP? | 19:06 |
clarkb | ianw: fungi ^ any updates on that? | 19:06 |
fungi | we have usable sshfp records for at least some hosts | 19:06 |
ianw | umm, i think that the stale known_hosts overrides the sshfp | 19:06 |
fungi | yes, if there is an existing known_hosts entry that will be used instead | 19:07 |
clarkb | gotcha, that was likely the issue here then | 19:07 |
ianw | it might be a bit of a corner case with linaro | 19:07 |
clarkb | do we expect sshfp to work otherwise? | 19:07 |
ianw | where we have a) few ip's and b) have rebuilt the mirror a lot | 19:07 |
fungi | though i also don't think bridge.o.o is configured to do VerifyHostKeyDNS=yes is it? | 19:08 |
clarkb | https://review.opendev.org/#/c/744821/ <- reviewing and landing that would be good if we expect sshfp to work now | 19:08 |
ianw | my understanding is yes, since it is using unbound and the dns records are trusted | 19:08 |
fungi | i thought VerifyHostKeyDNS=ask was the default | 19:08 |
fungi | and i couldn't find anywhere we'd overridden it | 19:09 |
fungi | ahh, ssh_config manpage on bridge.o.o claims VerifyHostKeyDNS=no is the default actually | 19:10 |
clarkb | ok we don't have to solve this in the meeting but wanted to call it out as a question that came up | 19:10 |
fungi | yeah, i'm not certain we've actually started using sshfp records for ansible runs from bridge yet | 19:11 |
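(A minimal sketch, not from the meeting, of the check being discussed: querying whether SSHFP records are actually published for a host, assuming dnspython is available; the hostname is a placeholder. Even with records published, OpenSSH only consults them when VerifyHostKeyDNS is enabled in ssh_config and the zone validates under DNSSEC.)

```python
# Hypothetical sketch: list the SSHFP records published in DNS for a host.
import dns.resolver


def sshfp_records(hostname):
    """Return published SSHFP rdata for hostname, or an empty list."""
    try:
        answer = dns.resolver.resolve(hostname, "SSHFP")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []
    return [rdata.to_text() for rdata in answer]


if __name__ == "__main__":
    # Placeholder hostname; OpenSSH additionally needs VerifyHostKeyDNS set
    # to "yes" or "ask" before it will trust these records.
    for record in sshfp_records("mirror.example.org"):
        print(record)
```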
clarkb | Are there other config management updates to call out? | 19:11 |
fungi | also worth noting, glibc 2.31 breaks dnssec (there are nondefault workarounds), so we need to be mindful of that when we eventually upgrade bridge.o.o, or for our own systems | 19:12 |
clarkb | fungi: is 2.31 or newer in focal? | 19:12 |
fungi | as that will also prevent openssh from relying on sshfp records | 19:12 |
fungi | yeah, focal | 19:13 |
fungi | 2.31-0ubuntu9 | 19:13 |
clarkb | sounds like that may be it for config management and sshfp | 19:14 |
clarkb | #topic OpenDev | 19:15 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:15 | |
ianw | we could also move back to the patch that just puts the fingerprints into known_hosts | 19:15 |
ianw | as sshfp seems like it is a nice idea, but ... perhaps more trouble than it's worth tbh | 19:15 |
clarkb | ianw: something to consider for sure | 19:15 |
clarkb | #link https://review.opendev.org/#/c/748263/ Update opendev.org front page | 19:15 |
clarkb | Thank you ianw for reviewing this one | 19:15 |
clarkb | Looks like we've got a couple +2s now. corvus do you want to review it before we approve it? | 19:16 |
clarkb | I should re-review it, but in trying to follow the comments it's all made sense to me so far, so I doubt I'll have major concerns | 19:16 |
fungi | i've left some comments there for things i'm happy to address in a follow-up patch | 19:17 |
fungi | so as not to drag this one out unnecessarily | 19:17 |
fungi | it's already a significant improvement over what's on the site now, in my opinion | 19:18 |
clarkb | frickler: ^ you may be interested as well | 19:18 |
clarkb | maybe fungi can approve it first thing tomorrow if there are no further objections between now and then? | 19:18 |
clarkb | because ya I agree a big improvement | 19:18 |
fungi | sure, i'll push up my suggestions as a second change when doing so | 19:19 |
clarkb | On the gerrit upgrade testing side of things I've not had time to push on that since my last email to luca. I'm hoping that I'll have time this week for more testing | 19:19 |
clarkb | Any other opendev topics others would like to call out before we move on? | 19:20 |
corvus | clarkb: i will +3 front page | 19:20 |
clarkb | corvus: k | 19:20 |
fungi | i finished the critical volume replacements in rax-dfw last week | 19:20 |
fungi | and have been poking at replacing the less critical ones in the background as time allows | 19:21 |
clarkb | fungi: other than the problem where old volumes sometimes don't delete, were there issues? | 19:21 |
fungi | ahh, yeah, looks like wiki.o.o will need special attention. i expect it's because it's booted from a snapshot of a legacy flavor instance, but i can't attach a new volume to it | 19:21 |
fungi | may need to rsync its content over to another instance booted from a modern flavor | 19:22 |
clarkb | "fun" | 19:22 |
fungi | the api accepts the volume add, but then the volume immediately returns to available and the instance never sees it | 19:22 |
fungi | oh, and also i discovered that something about osc is causing it not to be able to refer to volumes by name | 19:23 |
fungi | and it gives an empty name column in the volume list output too | 19:23 |
fungi | i've resorted to using cinderclient for now to get a volume listing with names included | 19:24 |
fungi | i suspect it's something to do with using cinder v1 api, or maybe a rackspace-specific problem | 19:24 |
fungi | just something worth keeping in mind if anybody needs something similar | 19:24 |
fungi | i haven't really had time to take it up with the sdk/cli folks yet | 19:24 |
clarkb | Thank you for taking care of that | 19:25 |
fungi | no problem | 19:25 |
clarkb | #topic General Topics | 19:26 |
*** openstack changes topic to "General Topics (Meeting topic: infra)" | 19:26 | |
clarkb | #topic Vexxhost Mirror IPv6 Problems | 19:26 |
*** openstack changes topic to "Vexxhost Mirror IPv6 Problems (Meeting topic: infra)" | 19:26 | |
clarkb | With this issue it seems we get rogue router advertisements which add bogus IPs to our instance. When that happens we basically break IPv6 routing on the host | 19:27 |
clarkb | This is likely a neutron bug but needs more cloud side involvement to debug | 19:27 |
fungi | note we've seen it (at least) once in limestone too. based on the prefixes getting added we suspect it's coming from a job node in another tenant | 19:27 |
clarkb | frickler has brought up that we should try and mitigate this better. Perhaps via assigning the IP details statically. I looked at this and it should be possible with the new netplan tooling but its a new thing we'll need to figure out | 19:28 |
clarkb | I wrote up an etherpad that I can't find anymore with a potential example config | 19:28 |
clarkb | another thought I had was maybe we can filter RAs by origin mac? | 19:28 |
clarkb | is that something iptables can be convinced to do ? | 19:28 |
fungi | i'm not absolutely sure iptables can block that | 19:29 |
fungi | if it's handled like arp, the kernel may be listening to a bpf on the interface | 19:29 |
fungi | so will see and act on it before it ever reaches iptables | 19:29 |
fungi | (dhcp has similar challenges in that regard) | 19:29 |
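(A sketch, under stated assumptions, of the rule clarkb asks about above: dropping ICMPv6 router advertisements whose source MAC is not the known gateway. Whether the kernel would act on an RA before netfilter sees it is exactly fungi's caveat, so this would need testing; the MAC address and interface name are placeholders.)

```python
# Hypothetical sketch: install an ip6tables rule dropping router
# advertisements that do not originate from the expected router MAC.
import subprocess

GATEWAY_MAC = "fa:16:3e:00:00:01"  # placeholder for the provider's router MAC
INTERFACE = "ens3"                 # placeholder interface name

rule = [
    "ip6tables", "-A", "INPUT",
    "-i", INTERFACE,
    "-p", "icmpv6", "--icmpv6-type", "router-advertisement",
    "-m", "mac", "!", "--mac-source", GATEWAY_MAC,
    "-j", "DROP",
]
subprocess.run(rule, check=True)
```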
clarkb | my concern with the netplan idea is if we get it wrong we may have to build a new server. At least with iptables we can test the rule and if we get it wrong reboot | 19:29 |
ianw | clarkb: you could always set a console root password for a bit? | 19:30 |
clarkb | ianw: does remote console access work with vexxhost? (I'm not sure, but if it does that would be a reasonable compromise) | 19:30 |
ianw | oh, i'm assuming it would, yeah | 19:31 |
clarkb | Also totally open to other ideas here :) | 19:31 |
ianw | it seems like this is something you have to stop, like a rogue dhcp server | 19:32 |
fungi | statically configuring ipv6 and configuring the kernel not to do autoconf is probably the safest workaround | 19:32 |
clarkb | ya, its basically the same issue just with different IP protocols | 19:32 |
clarkb | I'll try harder to dig out the netplan etherpad after the meeting | 19:33 |
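(A minimal sketch of the static-addressing approach frickler suggested: render a netplan config that pins the v6 address and disables router-advertisement autoconf. Every value below (interface name, addresses, gateway) is a placeholder, not the mirror's real configuration; the load-bearing pieces are accept-ra: false plus static addresses.)

```python
# Hypothetical sketch: generate a netplan snippet with a pinned IPv6
# address and RA autoconf disabled. All values are placeholders.
import yaml

netplan_config = {
    "network": {
        "version": 2,
        "ethernets": {
            "ens3": {                    # placeholder interface name
                "dhcp4": True,
                "accept-ra": False,      # ignore router advertisements
                "addresses": ["2001:db8::10/64"],
                "gateway6": "2001:db8::1",
            }
        }
    }
}

print(yaml.safe_dump(netplan_config, default_flow_style=False))
```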
ianw | yeah, so i'm wondering what best practice others use is ... ? | 19:33 |
ianw | oh, it's ipv6 | 19:33 |
ianw | of course there's a rfc | 19:33 |
ianw | https://tools.ietf.org/html/rfc6104 | 19:33 |
fungi | ianw: generally it's to rely on autoconf and hope there's no bug in neutron leaking them between tenants | 19:33 |
clarkb | manual configuration is the first item on that rfc | 19:34 |
ianw | just 15 pages of options | 19:34 |
clarkb | so maybe we start there as frickler suggests | 19:34 |
clarkb | but if any of the other options there look preferable to you I'm happy to try others instead :) | 19:34 |
ianw | is it neutron leaking ra's ... or devstack doing something to the underlying nic maybe? | 19:35 |
clarkb | ianw: we believe it is neutron running in test jobs on the other tenant (we split mirror and test nodes into different tenants) | 19:36 |
fungi | devstack in a vm altering the host's nic would be... even more troubling | 19:36 |
clarkb | and neutron in the base cloud (vexxhost) is expected to block those RAs | 19:36 |
clarkb | per the bug we filed when limestone had this issue | 19:36 |
fungi | in which case it would point to a likely bug in qemu i guess | 19:36 |
ianw | that seems like a DOS attack :/ | 19:36 |
clarkb | ianw: yes I originally filed it as a security bug a year ago or whatever it was | 19:37 |
clarkb | but it largely got ignored as cannot reproduce and then disclosed (so now we can talk about it freely) | 19:37 |
fungi | ianw: yep. neutron has protections which are supposed to prevent exactly this, but sometimes those aren't effective apparently | 19:37 |
clarkb | its possible that because we open up our security groups we're the only ones that notice | 19:37 |
clarkb | (we could try using security groups to block them too maybe?) | 19:37 |
fungi | however we haven't worked out the sequence to reliably recreate the problem, only observed it cropping up with some frequency, so it's hard to pin down the exact circumstances which lead to it | 19:38 |
fungi | the open bug on neutron is still basically a dead end without a reproducer | 19:38 |
clarkb | yup also we don't run the clouds so we don't really see the underlying network behavior | 19:38 |
clarkb | anyway we don't have to solve this here, let's just not forget to work around it this time :) I can help with this once nb03 is in a good spot | 19:39 |
clarkb | #topic Bup and Borg Backups | 19:40 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:40 | |
clarkb | ianw anything new on this? and if not should we drop it from the agenda until we start enrolling servers with borg? | 19:40 |
ianw | sorry i've just had my head in rhel and efi stuff | 19:40 |
clarkb | (I've kept it on because I think backups are important but bup seems to be working well enough for now so borg isn't urgent) | 19:40 |
ianw | it is right at the top of my todo list though | 19:40 |
ianw | we can keep it for now, and i'll try to get at least an initial host done asap | 19:41 |
clarkb | ok and thank you | 19:41 |
clarkb | #topic PTG Planning | 19:41 |
*** openstack changes topic to "PTG Planning (Meeting topic: infra)" | 19:41 | |
clarkb | #topic https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here | 19:42 |
*** openstack changes topic to "https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here (Meeting topic: infra)" | 19:42 | |
clarkb | er | 19:42 |
clarkb | #undo | 19:42 |
openstack | Removing item from minutes: #topic https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here | 19:42 |
clarkb | #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here | 19:42 |
clarkb | October is fast approaching and I really do intend to add some content to that etherpad | 19:42 |
clarkb | as always others should feel free to add their own content | 19:43 |
clarkb | #topic Docker Hub Rate Limits | 19:43 |
*** openstack changes topic to "Docker Hub Rate Limits (Meeting topic: infra)" | 19:43 | |
clarkb | This wasn't on the agenda I sent out this morning, as it occurred to me that it may be worth talking about after looking at emails in openstack-discuss | 19:43 |
clarkb | Long story short docker hub is changing/has changed how they apply rate limits to image pulls. In the past limits were applied to layer blobs, which we do cache in our mirrors. Now limits are applied to manifest fetches, not blob layers. We don't cache manifests because getting those requires auth (even as an anonymous user you get an auth token) | 19:44 |
clarkb | This is unfortunate because it means our caching strategy is no longer effective for docker hub | 19:45 |
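(To illustrate why the proxy can't help here: a sketch, assuming the requests library, of an anonymous docker hub manifest fetch. Even anonymous pulls go through a bearer-token handshake, and the manifest request is where the rate-limit accounting happens; the image name is arbitrary and the specific ratelimit-* header names are an assumption, not taken from the meeting.)

```python
# Sketch: anonymous token + manifest HEAD against Docker Hub. The image
# name is arbitrary; the ratelimit-* headers are where Docker Hub reports
# the remaining allowance.
import requests

IMAGE = "library/python"

token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io",
            "scope": f"repository:{IMAGE}:pull"},
).json()["token"]

resp = requests.head(
    f"https://registry-1.docker.io/v2/{IMAGE}/manifests/latest",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.docker.distribution.manifest.v2+json",
    },
)
print(resp.headers.get("ratelimit-limit"),
      resp.headers.get("ratelimit-remaining"))
```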
clarkb | On the plus side projects like zuul and nodepool and system-config haven't appeared to be affected yet. But others like tripleo have | 19:45 |
clarkb | docker has promised they'll write a blog post with suggestions for CI operators, which I haven't seen published yet /me waits patiently | 19:45 |
clarkb | If our users struggle with this in the meantime I think their best bet may be to stop using our mirrors, because then they will make anonymous requests from IPs that will generally be unique enough to avoid issues | 19:46 |
clarkb | Other ideas I've seen include building images rather than fetching them (tripleo is doing this) as well as using other registries like quay | 19:47 |
fungi | there are certainly multiple solutions available to us, but i've been trying to remind users that dockerhub has promised to publish guidance and we should wait for that | 19:47 |
fungi | at least before we invest effort in building an alternative solution | 19:47 |
clarkb | ++ I mostly want people to be aware there is an issue and workarounds from the source should be published at some point | 19:48 |
clarkb | and there are "easy" workarounds that can be used between now and then like not using our mirrors | 19:48 |
fungi | (such as running our own proxy registry, or switching to a different web proxy which might be more flexible than apache mod_proxy) | 19:48 |
fungi | there was also some repeated confusion i've tried my best to correct around zuul-registry and its presumed use in proxying docker images for jobs | 19:49 |
clarkb | oh ya a couple people were confused by that | 19:50 |
clarkb | not realizing it's a temporary staging ground, not a canonical source/cache | 19:50 |
ianw | didn't github also announce a competing registry too? | 19:50 |
clarkb | ianw: yes | 19:50 |
clarkb | and google has one | 19:50 |
fungi | yes, but who knows if it will have similar (or worse) rate limits. we've been bitted by github rate limits pretty often as it is | 19:50 |
fungi | man, my typing is atrocious today | 19:51 |
ianw | yeah, just thinking that's sure to become something to mix in as well | 19:51 |
clarkb | #topic Open Discussion | 19:53 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:53 | |
clarkb | Anything else to bring up in our last 7 minutes? | 19:53 |
fungi | oh, yeah | 19:53 |
fungi | pynotedb | 19:53 |
fungi | a few years ago, zara started work on a python library to interface with gerrit notedb databases | 19:54 |
fungi | but didn't get all that far with it | 19:54 |
fungi | we have the package name on pypi and a repo in our (opendev's) namespace on opendev, but that's mostly just a cookie-cutter commit | 19:54 |
hashar | :-\ | 19:54 |
fungi | more recently softwarefactory needed something to be able to interface with notedb from python and started writing a module for that | 19:55 |
fungi | they (ironically) picked the same name without checking whether it was taken | 19:55 |
fungi | now they're asking if we can hand over the pypi project so they can publish their library under that name | 19:55 |
clarkb | for the name in pypi was anything released to it? | 19:56 |
clarkb | if yes, then we may want to quickly double check nothing is using it (I think pypi exposes that somehow) but if not I have no objections to that idea | 19:56 |
fungi | a couple of dev releases several years ago, looks like | 19:56 |
fungi | also SotK has confirmed that the original authors are okay with letting it go | 19:57 |
fungi | and probably just using tristanC's thing instead once they're ready | 19:57 |
clarkb | works for me | 19:58 |
clarkb | particularly if the original authors are happy with the plan | 19:58 |
diablo_rojo | Seems reasonable | 19:58 |
fungi | ahh, looks like the "releases" for it on pypi have no files anyway | 19:59 |
fungi | evidenced from the lack of "download files" at https://pypi.org/project/pynotedb/ | 19:59 |
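(A sketch of the same check via the PyPI JSON API, assuming the requests library; it confirms whether any files were ever uploaded for the pynotedb releases fungi mentions.)

```python
# Sketch: list pynotedb releases on PyPI and how many files each shipped.
import requests

data = requests.get("https://pypi.org/pypi/pynotedb/json").json()

for version, files in sorted(data["releases"].items()):
    # An empty file list means the release was registered on PyPI but
    # nothing was actually uploaded for it.
    print(version, "->", len(files), "file(s)")
```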
hashar | there is no tag in the repo apparently | 20:00 |
fungi | so the two dev releases on there are just empty, no packages | 20:00 |
clarkb | that makes things easy | 20:00 |
diablo_rojo | Nice | 20:00 |
clarkb | and we are at time | 20:00 |
clarkb | Thank you everyone! | 20:00 |
fungi | thanks clarkb! | 20:00 |
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
openstack | Meeting ended Tue Sep 8 20:00:31 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.log.html | 20:00 |
diablo_rojo | Thanks clarkb! | 20:01 |
hashar | clarkb: note that pynotedb seems to have had changes pending in Gerrit and none ended up merged. https://opendev.org/opendev/pynotedb/graph hints at some changes that proposed the basic implementation | 20:01 |
clarkb | hashar: that shouldn't be a problem for the pypi name especially since the original authors are ok with the switch | 20:02 |
hashar | sure | 20:02 |
hashar | looks like the implementation was in the open change https://review.opendev.org/#/c/449590/ , untouched since 2017 | 20:02 |
*** hashar has quit IRC | 20:38 | |