Tuesday, 2020-09-08

*** diablo_rojo has joined #opendev-meeting14:49
*** hamalq has joined #opendev-meeting16:42
*** hashar has joined #opendev-meeting18:50
fungiahoy mateys!19:01
corvusahoy hoy19:01
clarkbhello!19:01
clarkb#startmeeting infra19:01
openstackMeeting started Tue Sep  8 19:01:28 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-September/000082.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
ianwo/19:01
clarkbI didn't have any formal announcements. But yesterday and today Oregon decided to catch on fire so I'm semi distracted by that. We should be ok though a nearby field decided it wanted to be a fire instead19:02
clarkbanyone else have anything to announce?19:02
clarkb(oh also power outages have been a problem so I may drop out due to that too though haven't lost power yet)19:03
funginothing which tops that, no ;)19:03
fungi:/19:03
clarkbreally I expect the worst bit will be the smoke when the winds shift again. So I should just be happy right now :)19:03
clarkb#topic Actions from last meeting19:04
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:04
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-01-19.01.txt minutes from last meeting19:04
clarkbThere were no actions from last meeting. Let's just dive into this one then19:04
clarkb#topic Priority Efforts19:04
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:04
clarkb#topic Update Config Management19:04
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:04
clarkbI've booted a new nb03.opendev.org to run nodepool-builder with docker for arm64 image builds19:05
clarkbThat has been enrolled into our inventory but has a problem installing things because there aren't wheels for arm64 :)19:05
clarkb#link https://review.opendev.org/750472 Add build deps for docker-compose on nb0319:05
clarkbthat should fix it and once that's done everything should be handled by docker so should work19:05
clarkbone thing that came up as part of this is that we don't seem to have ansible using sshfp records yet? or maybe we do and the issue I had was specific to having a stale known_hosts entry for a reused IP?19:06
clarkbianw: fungi ^ any updates on that?19:06
fungiwe have usable sshfp records for at least some hosts19:06
ianwumm, i think that the stale known_hosts overrides the sshfp19:06
fungiyes, if there is an existing known_hosts entry that will be used instead19:07
clarkbgotcha, that was likely the issue here then19:07
ianwit might be a bit of a corner case with linaro19:07
clarkbdo we expect sshfp to work otherwise?19:07
ianwwhere we have a) few ip's and b) have rebuilt the mirror a lot19:07
fungithough i also don't think bridge.o.o is configured to do VerifyHostKeyDNS=yes is it?19:08
clarkbhttps://review.opendev.org/#/c/744821/ <- reviewing and landing that would be good if we expect sshfp to work now19:08
ianwmy understanding is yes, since it is using unbound and the dns records are trusted19:08
fungii thought VerifyHostKeyDNS=ask was the default19:08
fungiand i couldn't find anywhere we'd overridden it19:09
fungiahh, ssh_config manpage on bridge.o.o claims VerifyHostKeyDNS=no is the default actually19:10
clarkbok we don't have to solve this in the meeting but wanted to call it out as a question that came up19:10
fungiyeah, i'm not certain we've actually started using sshfp records for ansible runs from bridge yet19:11
clarkbAre there other config management updates to call out?19:11
fungialso worth noting, glibc 2.31 breaks dnssec (there are nondefault workarounds), so we need to be mindful of that when we eventually upgrade bridge.o.o, or for our own systems19:12
clarkbfungi: is 2.31 or newer in focal?19:12
fungias that will also prevent openssh from relying on sshfp records19:12
fungiyeah, focal19:13
fungi2.31-0ubuntu919:13
clarkbsounds like that may be it for config management and sshfp19:14
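[Editor's note] The SSHFP discussion above turns on what a fingerprint record actually contains. The following is a minimal sketch, not the tooling used on bridge.o.o: it derives SHA-256 SSHFP record data from one line of a public host key file. The function name `sshfp_rdata` is hypothetical; the algorithm numbers are the registered SSHFP values, and fingerprint type 2 means SHA-256.

```python
import base64
import hashlib

# Registered SSHFP algorithm numbers for common host key types.
SSHFP_ALGORITHMS = {
    "ssh-rsa": 1,
    "ssh-dss": 2,
    "ecdsa-sha2-nistp256": 3,
    "ecdsa-sha2-nistp384": 3,
    "ecdsa-sha2-nistp521": 3,
    "ssh-ed25519": 4,
}


def sshfp_rdata(public_key_line):
    """Build SSHFP record data from a line such as 'ssh-ed25519 AAAA... host'.

    Hypothetical helper for illustration only. The fingerprint is the
    SHA-256 digest of the base64-decoded key blob (fingerprint type 2).
    """
    key_type, key_b64 = public_key_line.split()[:2]
    blob = base64.b64decode(key_b64)
    digest = hashlib.sha256(blob).hexdigest()
    return "{} 2 {}".format(SSHFP_ALGORITHMS[key_type], digest)
```

On a real host, `ssh-keygen -r <hostname>` prints records of the same shape. As noted in the discussion, an existing known_hosts entry is used in preference to DNS, so a stale entry for a reused IP would need to be removed (for example with `ssh-keygen -R <host>`) before a lookup like this matters.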
clarkb#topic OpenDev19:15
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:15
ianwwe could also move back to the patch that just puts the fingerprints into known_hosts19:15
ianwas sshfp seems like it is a nice idea, but ... perhaps more trouble than it's worth tbh19:15
clarkbianw: something to consider for sure19:15
clarkb#link https://review.opendev.org/#/c/748263/ Update opendev.org front page19:15
clarkbThank you ianw for reviewing this one19:15
clarkbLooks like we've got a couple +2s now. corvus do you want to review it before we approve it?19:16
clarkbI should rereview it, but in trying to follow the comments it's all made sense to me so far so I doubt I'll have major concerns19:16
fungii've left some comments there for things i'm happy to address in a follow-up patch19:17
fungiso as not to drag this one out unnecessarily19:17
fungiit's already a significant improvement over what's on the site now, in my opinion19:18
clarkbfrickler: ^ you may be interested as well19:18
clarkbmaybe fungi can approve it first thing tomorrow if there are no further objections between now and then?19:18
clarkbbecause ya I agree a big improvement19:18
fungisure, i'll push up my suggestions as a second change when doing so19:19
clarkbOn the gerrit upgrade testing side of things I've not had time to push on that since my last email to luca. I'm hoping that I'll have time this week for more testing19:19
clarkbAny other opendev topics others would like to call out before we move on?19:20
corvusclarkb: i will +3 front page19:20
clarkbcorvus: k19:20
fungii finished the critical volume replacements in rax-dfw last week19:20
fungiand have been poking at replacing the less critical ones in the background as time allows19:21
clarkbfungi: other than the sometimes old volumes don't delete problem were there issues?19:21
fungiahh, yeah, looks like wiki.o.o will need special attention. i expect it's because it's booted from a snapshot of a legacy flavor instance, but i can't attach a new volume to it19:21
fungimay need to rsync its content over to another instance booted from a modern flavor19:22
clarkb"fun"19:22
fungithe api accepts the volume add, but then the volume immediately returns to available and the instance never sees it19:22
fungioh, and also i discovered that something about osc is causing it not to be able to refer to volumes by name19:23
fungiand it gives an empty name column in the volume list output too19:23
fungii've resorted to using cinderclient for now to get a volume listing with names included19:24
fungii suspect it's something to do with using cinder v1 api, or maybe a rackspace-specific problem19:24
fungijust something worth keeping in mind if anybody needs something similar19:24
fungii haven't really had time to take it up with the sdk/cli folks yet19:24
clarkbThank you for taking care of that19:25
fungino problem19:25
clarkb#topic General Topics19:26
*** openstack changes topic to "General Topics (Meeting topic: infra)"19:26
clarkb#topic Vexxhost Mirror IPv6 Problems19:26
*** openstack changes topic to "Vexxhost Mirror IPv6 Problems (Meeting topic: infra)"19:26
clarkbWith this issue it seems we get rogue router advertisements which add bogus IPs to our instance. When that happens we basically break IPv6 routing on the host19:27
clarkbThis is likely a neutron bug but needs more cloud side involvement to debug19:27
funginote we've seen it (at least) once in limestone too. based on the prefixes getting added we suspect it's coming from a job node in another tenant19:27
clarkbfrickler has brought up that we should try and mitigate this better. Perhaps via assigning the IP details statically. I looked at this and it should be possible with the new netplan tooling but it's a new thing we'll need to figure out19:28
clarkbI wrote up an etherpad that I can't find anymore with a potential example config19:28
clarkbanother thought I had was maybe we can filter RAs by origin mac?19:28
clarkbis that something iptables can be convinced to do?19:28
fungii'm not absolutely sure iptables can block that19:29
fungiif it's handled like arp, the kernel may be listening to a bpf on the interface19:29
fungiso will see and act on it before it ever reaches iptables19:29
fungi(dhcp has similar challenges in that regard)19:29
clarkbmy concern with the netplan idea is if we get it wrong we may have to build a new server. At least with iptables we can test the rule and if we get it wrong reboot19:29
ianwclarkb: you could always set a console root password for a bit?19:30
clarkbianw: does remote console access work with vexxhost (I'm not sure but if it does that would be a reasonable compromise)19:30
ianwoh, i'm assuming it would, yeah19:31
clarkbAlso totally open to other ideas here :)19:31
ianwit seems like this is something you have to stop, like a rogue dhcp server19:32
fungistatically configuring ipv6 and configuring the kernel not to do autoconf is probably the safest workaround19:32
clarkbya, it's basically the same issue just with different IP protocols19:32
clarkbI'll try harder to dig out the netplan etherpad after the meeting19:33
ianwyeah, so i'm wondering what best practice others use is ... ?19:33
ianwoh, it's ipv619:33
ianwof course there's a rfc19:33
ianwhttps://tools.ietf.org/html/rfc610419:33
fungiianw: generally it's to rely on autoconf and hope there's no bug in neutron leaking them between tenants19:33
clarkbmanual configuration is the first item on that rfc19:34
ianwjust 15 pages of options19:34
clarkbso maybe we start there as frickler suggests19:34
clarkbbut if any of the other options there look preferable to you I'm happy to try others instead :)19:34
ianwis it neutron leaking ra's ... or devstack doing something to the underlying nic maybe?19:35
clarkbianw: we believe it is neutron running in test jobs on the other tenant (we split mirror and test nodes into different tenants)19:36
fungidevstack in a vm altering the host's nic would be... even more troubling19:36
clarkband neutron in the base cloud (vexxhost) is expected to block those RAs19:36
clarkbper the bug we filed when limestone had this issue19:36
fungiin which case it would point to a likely bug in qemu i guess19:36
ianwthat seems like a DOS attack :/19:36
clarkbianw: yes I originally filed it as a security bug a year ago or whatever it was19:37
clarkbbut it largely got ignored as cannot reproduce and then disclosed (so now we can talk about it freely)19:37
fungiianw: yep. neutron has protections which are supposed to prevent exactly this, but sometimes those aren't effective apparently19:37
clarkbit's possible that because we open up our security groups we're the only ones that notice19:37
clarkb(we could try using security groups to block them too maybe?)19:37
fungihowever we haven't worked out the sequence to reliably recreate the problem, only observed it cropping up with some frequency, so it's hard to pin down the exact circumstances which lead to it19:38
fungithe open bug on neutron is still basically a dead end without a reproducer19:38
clarkbyup also we don't run the clouds so we don't really see the underlying network behavior19:38
clarkbanyway we don't have to solve this here, let's just not forget to work around it this time :) I can help with this once nb03 is in a good spot19:39
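[Editor's note] Until a workaround such as static configuration is in place, the symptom described above (bogus IPs added by rogue router advertisements) can at least be detected. The sketch below is an illustration only, not something proposed in the meeting: given the addresses currently on an interface and the prefix the cloud is expected to assign, it lists global IPv6 addresses that fall outside that prefix. The function name and the documentation-range addresses used with it are assumptions.

```python
import ipaddress


def unexpected_addresses(addresses, expected_prefix):
    """Return IPv6 addresses that are not inside the expected prefix.

    addresses: strings like '2001:db8:1::5/64' (as shown by 'ip addr').
    expected_prefix: the prefix the provider should be advertising.
    Link-local and IPv4 addresses are ignored.
    """
    network = ipaddress.ip_network(expected_prefix)
    rogue = []
    for text in addresses:
        addr = ipaddress.ip_interface(text).ip
        if addr.version != 6 or addr.is_link_local:
            continue
        if addr not in network:
            rogue.append(str(addr))
    return rogue
```

A check like this only reports the problem; it does not stop the kernel from accepting the advertisement in the first place, which is what the static configuration option discussed above would address.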
clarkb#topic Bup and Borg Backups19:40
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:40
clarkbianw anything new on this? and if not should we drop it from the agenda until we start enrolling servers with borg?19:40
ianwsorry i've just had my head in rhel and efi stuff19:40
clarkb(I've kept it on because I think backups are important but bup seems to be working well enough for now so borg isn't urgent)19:40
ianwit is right at the top of my todo list though19:40
ianwwe can keep it for now, and i'll try to get at least an initial host done asap19:41
clarkbok and thank you19:41
clarkb#topic PTG Planning19:41
*** openstack changes topic to "PTG Planning (Meeting topic: infra)"19:41
clarkb#topic https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here19:42
*** openstack changes topic to "https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here (Meeting topic: infra)"19:42
clarkber19:42
clarkb#undo19:42
openstackRemoving item from minutes: #topic https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here19:42
clarkb#link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning starts here19:42
clarkbOctober is fast approaching and I really do intend to add some content to that etherpad19:42
clarkbas always others should feel free to add their own content19:43
clarkb#topic Docker Hub Rate Limits19:43
*** openstack changes topic to "Docker Hub Rate Limits (Meeting topic: infra)"19:43
clarkbThis wasn't on the agenda I sent out this morning as it occurred to me that it may be worth talking about after looking at emails in openstack-discuss19:43
clarkbLong story short docker hub is changing/has changed how they apply rate limits to image pulls. In the past limits were applied to layer blobs which we do cache in our mirrors. Now limits are applied to manifest fetches not blob layers. We don't cache manifests because getting those requires auth (even as an anonymous user you get an auth token)19:44
clarkbThis is unfortunate because it means our caching strategy is no longer effective for docker hub19:45
clarkbOn the plus side projects like zuul and nodepool and system-config haven't appeared to be affected yet. But others like tripleo have19:45
clarkbdocker has promised they'll write a blog post on suggestions for CI operators which I haven't seen being published yet /me waits patiently19:45
clarkbIf our users struggle with this in the meantime I think their best bet may be to stop using our mirrors because then they will make anonymous requests from IPs that will generally be unique enough to avoid issues19:46
clarkbOther ideas I've seen include building images rather than fetching them (tripleo is doing this) as well as using other registries like quay19:47
fungithere are certainly multiple solutions available to us, but i've been trying to remind users that dockerhub has promised to publish guidance and we should wait for that19:47
fungiat least before we invest effort in building an alternative solution19:47
clarkb++ I mostly want people to be aware there is an issue and workarounds from the source should be published at some point19:48
clarkband there are "easy" workarounds that can be used between now and then like not using our mirrors19:48
fungi(such as running ourt own proxy registry, or switching to a different web proxy which might be more flexible than apache mod_proxy)19:48
fungithere was also some repeated confusion i've tried by best to correct around zuul-registry and its presumed use in proxying docker images for jobs19:49
clarkboh ya a couple people were confused by that19:50
clarkbnot realizing it's a temporary staging ground not a canonical source/cache19:50
ianwdidn't github also announce a competing registry too?19:50
clarkbianw: yes19:50
clarkband google has one19:50
fungiyes, but who knows if it will have similar (or worse) rate limits. we've been bitted by github rate limits pretty often as it is19:50
fungiman, my typing is atrocious today19:51
ianwyeah, just thinking that's sure to become something to mix in as well19:51
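[Editor's note] The caching problem described above comes down to which registry requests are counted. As a rough sketch, assuming the standard registry v2 URL layout (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`), the helper below classifies a request path. The function name is hypothetical, and it says nothing about how Docker Hub itself counts requests beyond what is stated in the log.

```python
import re

# Registry v2 paths: /v2/<name>/manifests/<reference> or /v2/<name>/blobs/<digest>
_REGISTRY_PATH = re.compile(
    r"^/v2/(?P<name>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$"
)


def classify_registry_request(path):
    """Return 'manifest', 'blob', or 'other' for a registry request path."""
    match = _REGISTRY_PATH.match(path)
    if match is None:
        return "other"
    return "manifest" if match.group("kind") == "manifests" else "blob"
```

Per the description in the meeting, a caching proxy can serve the `blob` requests from cache, but each `manifest` request still goes to Docker Hub with an auth token, and those are the requests the new limits apply to.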
clarkb#topic Open Discussion19:53
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:53
clarkbAnything else to bring up in our last 7 minutes?19:53
fungioh, yeah19:53
fungipynotedb19:53
fungia few years ago, zara started work on a python library to interface with gerrit notedb databases19:54
fungibut didn't get all that far with it19:54
fungiwe have the package name on pypi and a repo in our (opendev's) namespace on opendev but that's mostly just a cookie-cutter commit19:54
hashar:-\19:54
fungimore recently softwarefactory needed something to be able to interface with notedb from python and started writing a module for that19:55
fungithey (ironically) picked the same name without checking whether it was taken19:55
funginow they're asking if we can hand over the pypi project so they can publish their library under that name19:55
clarkbfor the name in pypi was anything released to it?19:56
clarkbif yes, then we may want to quickly double check nothing is using it (I think pypi exposes that somehow) but if not I have no objections to that idea19:56
fungia couple of dev releases several years ago, looks like19:56
fungialso SotK has confirmed that the original authors are okay with letting it go19:57
fungiand probably just using tristanC's thing instead once they're ready19:57
clarkbworks for me19:58
clarkbparticularly if the original authors are happy with the plan19:58
diablo_rojoSeems reasonable19:58
fungiahh, looks like the "releases" for it on pypi have no files anyway19:59
fungievidenced from the lack of "download files" at https://pypi.org/project/pynotedb/19:59
hasharthere is no tag in the repo apparently20:00
fungiso the two dev releases on there are just empty, no packages20:00
clarkbthat makes things easy20:00
diablo_rojoNice20:00
clarkband we are at time20:00
clarkbThank you everyone!20:00
fungithanks clarkb!20:00
clarkb#endmeeting20:00
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:00
openstackMeeting ended Tue Sep  8 20:00:31 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.html20:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.txt20:00
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.log.html20:00
diablo_rojoThanks clarkb!20:01
hasharclarkb: note that pynotedb seems to have had changes pending in some Gerrit and none ended up merged.  https://opendev.org/opendev/pynotedb/graph  hints at some changes that proposed the basic implementation20:01
clarkbhashar: that shouldn't be a problem for the pypi name especially since the original authors are ok with the switch20:02
hasharsure20:02
hasharlooks like the implementation was in the open change https://review.opendev.org/#/c/449590/  , untouched since 201720:02
*** hashar has quit IRC20:38
