Tuesday, 2020-07-28

*** hamalq has quit IRC00:41
*** weshay|ruck has joined #opendev-meeting12:54
weshay|ruckzbr, o/12:56
zbro/13:25
fungithe weekly meeting isn't for another 5.5 hours, but get here early for a good seat ;)13:26
zbrfungi: i may fall asleep before it starts, already tired15:45
fungiluckily it's logged15:47
*** hamalq has joined #opendev-meeting16:46
corvus13.5 + 5.5 equals.....18:59
corvuso/18:59
clarkbhello19:00
clarkbwe'll get started shortly19:00
fungii already started19:00
ianwo/19:00
fungioh, you meant the meeting, not uncontrolled drinking19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Jul 28 19:01:04 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-July/000059.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbI have no announcements19:01
clarkb#topic Actions from last meeting19:01
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:02
fungiwe now have dates for the (no surprise) virtual open infrastructure summit19:02
clarkb#undo19:02
openstackRemoving item from minutes: #topic Actions from last meeting19:02
fungiand there's a survey which has gone out to pick dates for the ptg19:02
clarkb#link https://www.openstack.org/summit Info on now virtual open infrastructure summit19:02
mordredo/19:03
clarkb#link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016098.html PTG date selection survey19:03
fungiptg date options seem to span from a couple weeks before the summit to several weeks after19:03
clarkbfungi: thank you for the reminders19:03
fungimy personal preference is to not interfere with hallowe'en ;)19:04
mordredfungi: I dunno - interfering with halloween is a long-standing summit tradition :)19:04
fungiand an unfortunate one in my opinion19:04
fungianyway, you can proceed with the meeting, sorry for derailment19:05
clarkb#topic Actions from last meeting19:05
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:05
mordredI don't disagree with you19:05
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-21-19.01.txt minutes from last meeting19:05
clarkbThere were no actions19:05
clarkb#topic Specs approval19:06
*** openstack changes topic to "Specs approval (Meeting topic: infra)"19:06
clarkb#link https://review.opendev.org/#/c/731838/ Authentication broker service19:06
clarkbThis just got a new patchset. Probably still not ready for approval but definitely worthy of review19:06
fungiyeah, i think we at least have direction for a poc19:06
clarkbfungi: anything you want to add to that re new patchset or $other19:06
funginew patchset is primarily thanks to your inspiration, more comments welcome of course19:07
fungibasically noting that we're agreed on keycloak, not much new otherwise19:08
fungialso, you can move on, nothing else from me19:09
clarkb#topic Priority Efforts19:09
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:09
clarkb#topic Update Config Management19:09
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:09
clarkbA note that we went about 5 days without any of our infra-prod jobs running due to the zuul security bug fix and our playbook tripping over that. Since that has been fixed things seem happy19:10
clarkbIf you find things are lagging or otherwise not up to date this could be why19:10
clarkbSemi related to that are the changes to run zuul executors under containers. This gets all of our zuul and nodepool services except nb03's builder on containers19:11
fungiworkarounds were mostly trivialish19:11
clarkbtwo things to make note of there. nb03 needs an arm64 image which has hit a speed bump due to the slowness of building python wheels under buildx. Zuul executors need to start docker after afs (a fix for this has landed) to ensure bind mounts work properly for afs19:12
clarkbfor nb03 I think it is using the plaintext zk connection still. Is that correct corvus ?19:12
corvusclarkb: unclear, i'll check19:13
corvusyes19:13
mordredclarkb: have we checked to see if building the wheels on native arm64 for the wheels in question is also slow?19:13
clarkbmordred: that's a thing that ianw has been investigating with upstream python cryptography and I think it is significantly faster19:14
fungimordred: i think it's more that far fewer of our dependencies publish arm manylinux wheels so we have to build more of them than for x8619:14
mordrednod19:14
clarkbfungi: it is that, but also we had an hour time out on the job and they would hit that compiling any one of cryptography, pynacl and bcrypt19:14
clarkbfungi: so the compiles there are also really really slow19:15
fungiand yeah, building on native arm64 is apparently far faster than qemu emulated arm64 on amd6419:15
corvussolutions include: 1) help upstreams build arm wheels; 2) build nodepool images on a native arm builder; 3) add in a new layer to the nodepool image stack that builds and installs dependencies so that we can rebuild this layer separately and less often.19:15
fungii like the idea of helping upstreams, and there's a topic on the agenda for that19:15
corvusianw started on #1;  #2 and #3 are not in play yet, just for discussion19:16
corvusi'm still not keen on #219:16
mordredme either19:16
fungiphilosophically, #1 is preferable19:16
corvus(because i don't think it's okay not to be able to merge nodepool changes if we lose that cloud)19:16
mordredI think failing 1 we should try 3 before going to 219:16
ianw#3 doesn't seem like it precludes #1 either19:17
corvusyeah, we could go ahead and start on #3 if we think it's not a terrible idea19:17
fungiyeah, #3 is doable independently and will likely help19:17
mordredyah. we could also do them not as another image layer, but just as a local version of #1 that publishes to a zuul-specific wheel mirror similar to the openstack specific wheel mirror19:17
corvusbasically, i'm imagining an image that we build only on requirements.txt changes and nightly19:17
fungibut if #1 is practical, then it will probably not be a huge gain19:17
corvusfungi: it's also a fallback for the next requirement we have that doesn't have a wheel19:18
fungier, if #1 is practical then #3 will probably not be a huge gain19:18
corvusit sort of scales up and down as needed :)19:18
mordredso we treat "make sure we have wheels of X, Y and Z" as a task separate from "build an image containing nodepool and X, Y and Z"19:18
fungiyeah, i agree it's a useful backstop either way19:18
mordredcorvus: ++19:18
corvusmordred, ianw: do you think we can generalize the make-a-wheel-mirror thing like that?19:18
ianwcorvus: yes, it's just a script :)19:19
corvusis that a service we'd like opendev to provide?19:19
mordredmight not be a terrible general service to offer19:19
fungiif the new layer for #3 is part of the same job though, then it can still easily time out whenever there's something new to buid19:19
ianwthe only thing is that the arm64 wheel build already runs quite long, such that we've restricted it to the latest 2 branches19:19
fungibuild19:19
corvusianw: but also a vhost?  or would we just have the mirror host be a superset of all the wheels for all the projects?19:19
corvusfungi: same job as what job?19:20
fungisame as the job building the other layers19:20
fungithe job currently timing out19:20
corvusfungi: yes there would be no point in that :)  my suggestion is a separately tagged image layer built by a separate job19:20
fungido we have a separate job per layer? if so then less of a problem i guess19:20
clarkbcorvus: I think superset19:20
corvusfungi: if the word 'layer' is tripping you up, just ignore it and call it an 'image' :)19:21
clarkbcorvus: because requirements and constraints or lock files should control what projects actually use19:21
clarkbif they rely on packages being available (or not) in our cache to select versions that is a bug imo19:21
ianw(on the wheel job) but this is something that is open to more parallelism.  we already run it under "parallel" and I think using a couple of nodes could really speed it up by letting the longest running jobs sit on one node, while the smaller things zoom by19:21
fungii do like the superset idea, if we have a good way for projects to chuck in the package+version they want a wheel of19:21
ianwfungi: i was probably thinking we run requirements.txt, just without a cap19:21
corvusyeah, a wheel mirror sounds way easier than a new layer19:22
corvusso let's call that #4, and execute #1 and #4, and leave ideas #2 and #3 on the shelf?19:22
ianwi can take an action item to look at the wheel mirror if we like19:22
fungithere is a slight gain from not aprallelizing, in that if projects with common dependencies are split between concurrent jobs then that dependency will be built twice, but in practice that's probably not a huge concern19:22
fungis/aprallelizing/parallelizing/19:23
fungimy hands are typing at incompatible speeds today19:23
mordredfungi: to one another?19:24
fungiat least, yes19:24
ianwone other thing about wheels is that some of the caveats in https://review.opendev.org/#/c/703916/ still apply ... we append wheels and never delete currently19:25
clarkbianw: thankfully these packages tend to be fairly small (unlike say the cuda pypi packages)19:25
ianwbasically the cache grows unbounded, which is one thing with capped requirements but if we're chasing upstream might need thinking about19:25
fungitrue, if we shard and build in parallel then deleting becomes much harder too19:25
fungithough if we're worried about space on our current mirror, i'd say the first course of action is to delete any wheels which are also present on pypi. we used to copy them (unnecessarily) into the mirror19:27
fungiwe no longer do that, but since we only append...19:27
clarkbalso due to using afs we could clear the contents, rebuild, then publish the result periodically if we want to prune right?19:28
clarkbthe RO volume will serve the old contents until we switch19:28
clarkbs/switch/vos release/19:28
fungiyeah, pruning as a separate step could be atomic, at least19:28
ianwthis is true, but i think that all the jobs would timeout trying to refresh the cache19:28
fungithough we'd need to block additions while pruning19:28
ianwso it would be very manual19:29
clarkbgotcha19:29
clarkbok we've got time in a bit to talk further about #1, anything else to bring up on #4 before we move on?19:30
ianwdo you want to give me an action item on it?  or is anyone else super keen?19:30
clarkbif you're interested I would say go for it19:31
corvusianw: if you can get started and tag me for assistance/reviews that'd be great19:31
ianw++ will do19:31
clarkb#action ianw Work on incorporating non OpenStack requirements into our python wheel caches. corvus willing to assist19:31
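The pruning idea raised above (delete any wheels in the local mirror that are also present on PyPI) can be sketched as a simple set comparison. This is only an illustration, not the actual mirror tooling: the function name `wheels_to_prune` and both inputs are hypothetical, and how the list of PyPI filenames is obtained is left out.

```python
from typing import Iterable, List, Set


def wheels_to_prune(local_wheels: Iterable[str], pypi_files: Set[str]) -> List[str]:
    """Return local wheel filenames that PyPI already serves under the same name.

    Both arguments are assumptions for this sketch: `local_wheels` is a
    listing of the mirror directory, `pypi_files` is a set of filenames
    known to be published upstream.
    """
    return sorted(
        name
        for name in local_wheels
        if name.endswith(".whl") and name in pypi_files
    )
```

Anything the function returns is redundant in the local cache; wheels that exist only locally (for example, arm64 builds that upstream does not publish) are kept.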
clarkb#topic OpenDev19:32
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:32
clarkbI announced deprecation for gerrit /p/ mirrors19:32
clarkb#link http://lists.opendev.org/pipermail/service-announce/2020-July/000007.html Gerrit /p/ mirror deprecation.19:32
clarkb#link https://review.opendev.org/#/c/743324/ Implement Gerrit /p/ mirror "removal"19:33
clarkband pushed that change to make it happen. I said I would do this friday19:33
clarkbif we think the plan above is a bad one let me know and we can modify as necessary19:33
fungijust to confirm (i also left a review comment), is the idea to remove that line from the config when we upgrade?19:33
clarkbThe reasons for it are that gerrit needs that url for something else on newer versions which we plan to upgrade to and I've slowly been trying to update things to manage git branches and not having another mirror (that is going away anyway) makes that simpler19:34
clarkbfungi: yes19:34
fungior will it not break polygerrit dashboards?19:34
fungiahh, okay, thanks19:34
clarkbthere will also need to be follow-on cleanup to stop syncing to the local git repos and all that19:34
clarkbbut I was going to wait on that to be sure we don't revert for some reason19:34
mordred++19:35
clarkbThat was what I had for opendev subjects.19:35
clarkb#topic General topics19:36
*** openstack changes topic to "General topics (Meeting topic: infra)"19:36
clarkb#topic Bup and Borg backups19:36
*** openstack changes topic to "Bup and Borg backups (Meeting topic: infra)"19:36
clarkb#link https://review.opendev.org/741366 Adds borg backup support and needs review19:36
ianwwe'd also discussed the old backups and i checked on them19:37
clarkbI've reviewed ^ and think it makes sense. Allows us to have bup and borg side by side too19:37
clarkbianw: they still happy even after the index cleanup?19:37
ianwreview seems fine; i did a full extraction and here's some important bits compared : http://paste.openstack.org/show/796367/19:37
ianwzuul also seems ok, but i did notice we've missed adding that to the "new" server19:37
ianwhttps://review.opendev.org/#/c/743445/19:37
ianw#link https://review.opendev.org/#/c/743445/19:38
ianwif i can get reviews on the borg backup roles, i'll start a new server and we can try it with something19:38
clarkbgreat, thank you for putting that together19:39
ianw(as a tangent on starting a new server)19:39
ianw#link https://review.opendev.org/74346119:39
ianwand19:39
ianw#link https://review.opendev.org/74347019:39
ianwadds sshfp records for our hosts, and makes the launch script print them too :)19:39
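For readers unfamiliar with SSHFP records: each one publishes a hash of a host's SSH public key in DNS. The sketch below shows how such a record can be derived from a line of a `.pub` file, using the standard algorithm numbers and the SHA-256 fingerprint type. It is not the launch script from the changes linked above; the function name `sshfp_record` and the output layout are assumptions made for illustration.

```python
import base64
import hashlib

# SSHFP algorithm numbers as registered for each SSH key type.
ALGORITHMS = {
    "ssh-rsa": 1,
    "ssh-dss": 2,
    "ecdsa-sha2-nistp256": 3,
    "ecdsa-sha2-nistp384": 3,
    "ecdsa-sha2-nistp521": 3,
    "ssh-ed25519": 4,
}

SHA256_FINGERPRINT_TYPE = 2


def sshfp_record(hostname: str, pubkey_line: str) -> str:
    """Build one SSHFP record from a '<type> <base64 blob> [comment]' line."""
    keytype, blob_b64 = pubkey_line.split()[:2]
    # The fingerprint is the hash of the decoded key blob, not of the base64 text.
    blob = base64.b64decode(blob_b64)
    digest = hashlib.sha256(blob).hexdigest()
    return f"{hostname} IN SSHFP {ALGORITHMS[keytype]} {SHA256_FINGERPRINT_TYPE} {digest}"
```

An unknown key type raises `KeyError`, which seems preferable to silently emitting a record with a wrong algorithm number.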
clarkb#topic GitHub 3rd Party CI19:40
*** openstack changes topic to "GitHub 3rd Party CI (Meeting topic: infra)"19:40
clarkbas promised earlier we can talk about this a bit more19:40
clarkbBasically python cryptography doesn't have arm64 wheels. ianw filed an issue with them to start a conversation on whether or not we could help with that19:41
clarkb#link https://github.com/pyca/cryptography/issues/533919:41
clarkbwe also talked about it a bit ourselves and it is something we'd like to do if we can make all the pieces fit together such that cryptography is happy and all that19:41
clarkbTo do this we'll need to spin up a tenant for them19:42
clarkb#link https://review.opendev.org/#/q/topic:opendev-3pci19:42
fungithough in short, the main reason they don't have arm64 manylinux1 wheels is that travis takes too long to test/build them because it uses emulation?19:42
clarkbyes I believe travis is doing something very similar to our buildx nodepool image builds for docker19:42
ianwi'm not sure on the emulation part; i'm pretty sure they run on AWS19:42
ianwbut the slow part, yes19:42
corvusi was thinking the tenant would be a 'cryptography' tenant19:43
corvusthe main issue prompting giving them their own tenant is that they won't be coordinating with anyone else19:43
fungiyeah, or at most a pyca tenant19:43
corvusthat seems reasonable too19:43
ianwyeah, that was something to discuss, i made the tenant deliberately a bit more generic thinking that it may be a home for a collection of things we might need/be interested in19:44
corvus(so a generic 3pci tenant seems weird since we'd never want to add anyone else there)19:44
fungii concur19:44
fungigranularity is good here19:44
ianwalready i've had mentions from people that lxml would like something similar19:44
fungitenants would, ideally, mirror the authorities for them19:44
clarkbit's possible other third party ci groups would be happy with a more generic pool but cryptography in particular seemed to want to be very much their own thing19:45
clarkbso at least to start I agree with fungi and corvus19:45
clarkbthen evaluate if other python deps do similar19:45
ianwok, i will rework it to a pyca tenant, that sounds like a good level19:46
corvus++19:46
clarkbAnything else to add?19:47
* mordred thinks this is neat19:47
ianwnot really, if we get that up, we can start to report on pull requests19:47
fungii, too, think this is neat19:47
clarkb#topic Gerrit project CI rework from Google19:48
*** openstack changes topic to "Gerrit project CI rework from Google (Meeting topic: infra)"19:48
clarkbcorvus: ^ want to fill us in19:48
corvusthe google folks recently decided to abandon work on the checks plugin for gerrit19:48
corvus#link https://docs.google.com/document/d/1v2ETifhRXpuYlahtnfIK-1KL3zStERvbnEa8sS1pTwk/edit#19:48
corvusi think they mostly have an idea of where they want to go19:49
fungiwould it still be a plugin, or something more directly integrated?19:49
corvusbut they did ask a few folks to help them gather requirements, so i hopped on a video conference with ben and patrick and told them about us19:49
corvusfungi: possibly both and multiple plugins19:50
fungiahh19:50
clarkbcorvus: this is driven by googles needs right? not necessarily a change of need for upstream itself?19:50
corvusyes, theoretically the checks plugin could live on19:51
corvusbut google is driving most of the work in this area19:51
corvusthe biggest driving force for the change is that internal google ci systems didn't want to adapt to the new checks api19:51
mordredcorvus: did you suggest the internal google ci systems just migrate to zuul which already supports the new api? ;)19:52
* mordred hides19:52
corvusanyway, i told them about how we do third-party ci, stream-events, firewalls, really big firewalls, polling, etc19:52
corvusmordred: yes actually, but that would be a change too19:52
mordredgood point19:52
corvusi showed them hideci19:53
corvus(they said we should upgrade)19:53
clarkbhave they documented how to upgrade yet ;)19:53
corvusclarkb: yes?19:53
clarkbcorvus: I just remember all the undocumented info you and mordred brought back from gerrit user summit last year19:53
fungistep 1: dump all your existing data ;)19:53
corvusthis is starting to derail, but i'm pretty sure since we're the last folks in the world running on 2.13 and everyone else has upgraded, that's pretty solid19:54
mordrednah - upgrade should be fairly straightforward19:54
mordredwe just need to practice run it19:54
corvusyep19:54
clarkbk19:54
fungiyeah, even my sso spec update assumes we're running newer gerrit19:54
clarkbI mean I'm all onboard with upgrading, it's just still not entirely clear to me that we have a process (but I don't mean to derail from the CI discussion)19:55
corvuslet me be perfectly clear: the fact that we're running 2.13 and have not upgraded is entirely our doing and has nothing to do with the willingness of the gerrit community to help us do it.19:55
mordredcorvus: how did our use case resonate?19:55
fungii didn't list it as a dependency because i figure we're very, very close19:55
mordredcases19:55
clarkbalso do they intend on disabling features like stream events?19:55
corvusif we have any problems upgrading, they really really want us to let them help us19:56
corvusanyway, sorry, i'll try to get back on track.  where were we?19:56
mordredcorvus: you told them about how we do things19:56
mordredand showed them hideci19:56
corvusand i emphasized the ui features of hideci, and what we've learned from that19:57
corvusabout showing multiple ci systems, re-runs, etc19:57
fungithe idea that a change might have dozens of ci systems reporting on it19:57
corvuswhat summary information is important, why you would want to show successful runs, etc.19:57
corvussome of this was new to them, so i think it was a useful exercise.19:57
corvusi can't say for sure what will come out of it.  i don't think google is in a position to commit to implementing things specifically to make our lives easier19:58
corvusbut i did stress that to the extent that they can make use cases like these simple to deal with, it makes gerrit a better product19:58
clarkbassuming that the checks plugin dies and the google thing doesn't work we're basically in a similar spot to where we are today using stream events and commits (though they can be labeled as robot comments) right?19:59
clarkbs/commits/comments/19:59
corvusi think they at least understand what we're doing, why, how, and what works and doesn't work, so hopefully that will inform their work and they'll end up with something we can use without too much difficulty19:59
clarkbI guess my biggest concern is that we don't regress where even the simple thing we do today is no longer functional (hence my question about stream events)20:00
clarkbsomething richer would be great, but we can probably make do if that doesn't pan out20:00
corvusclarkb: it's the reporting/ui thing that i think is the biggest issue20:00
corvusremind me what's our story for replacing hideci when we upgrade?20:00
mordredwe don't have 100% of one yet20:00
mordredbut ...20:00
clarkbI think we had assumed checks was going to handle it for first party ci20:00
mordredour friends at wikimedia have a thing very similar to hideci20:00
fungichecks api having subchecks integration in the polygerrit ui20:01
mordredthat's implemented as a proper polygerrit plugin20:01
mordredno - checks api has never been the plan for the upgrade, sorry, that might not have been clear20:01
corvusokay, so if that new plugin works, then yeah, we're probably good for a while20:01
corvusfor me, the goal is still getting all this data out of the comment stream20:02
clarkbcorvus: ya that would be nice particularly to help reviewers not be distracted by all the CI activity20:02
clarkbalso we are at time now. I don't mind going for a few more minutes but we should probably wrap up20:03
clarkbthough feel free to continue discussion in #opendev20:03
corvusanyway, that's the gist -- i think they're interested in talking to us more as their plans solidify20:03
corvusi'm sure we can invite others to the next meeting if folks want20:03
corvus[eot]20:04
clarkbcool thanks for your time everyone. As mentioned feel free to continue the discussion in #opendev20:05
clarkb#endmeeting20:05
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:05
openstackMeeting ended Tue Jul 28 20:05:15 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:05
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.html20:05
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.txt20:05
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.log.html20:05
*** mordred has quit IRC20:05
fungithanks clarkb!20:05
*** mordred has joined #opendev-meeting20:18
*** hamalq has quit IRC23:48
