*** hamalq has quit IRC | 00:41 | |
*** weshay|ruck has joined #opendev-meeting | 12:54 | |
weshay|ruck | zbr, 0/ | 12:56 |
---|---|---|
zbr | o/ | 13:25 |
fungi | the weekly meeting isn't for another 5.5 hours, but get here early for a good seat ;) | 13:26 |
zbr | fungi: i may fall asleep before it starts, already tired | 15:45 |
fungi | luckily it's logged | 15:47 |
*** hamalq has joined #opendev-meeting | 16:46 | |
corvus | 13.5 + 5.5 equals..... | 18:59 |
corvus | o/ | 18:59 |
clarkb | hello | 19:00 |
clarkb | we'll get started shortly | 19:00 |
fungi | i already started | 19:00 |
ianw | o/ | 19:00 |
fungi | oh, you meant the meeting, not uncontrolled drinking | 19:00 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Jul 28 19:01:04 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-July/000059.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
clarkb | I have no announcements | 19:01 |
clarkb | #topic Actions from last meeting | 19:01 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:02 | |
fungi | we now have dates for the (no surprise) virtual open infrastructure summit | 19:02 |
clarkb | #undo | 19:02 |
openstack | Removing item from minutes: #topic Actions from last meeting | 19:02 |
fungi | and there's a survey which has gone out to pick dates for the ptg | 19:02 |
clarkb | #link https://www.openstack.org/summit Info on the now-virtual open infrastructure summit | 19:02 |
mordred | o/ | 19:03 |
clarkb | #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016098.html PTG date selection survey | 19:03 |
fungi | ptg date options seem to span from a couple weeks before the summit to several weeks after | 19:03 |
clarkb | fungi: thank you for the reminders | 19:03 |
fungi | my personal preference is to not interfere with hallowe'en ;) | 19:04 |
mordred | fungi: I dunno - interfering with halloween is a long-standing summit tradition :) | 19:04 |
fungi | and an unfortunate one in my opinion | 19:04 |
fungi | anyway, you can proceed with the meeting, sorry for derailment | 19:05 |
clarkb | #topic Actions from last meeting | 19:05 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:05 | |
mordred | I don't disagree with you | 19:05 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-21-19.01.txt minutes from last meeting | 19:05 |
clarkb | There were no actions | 19:05 |
clarkb | #topic Specs approval | 19:06 |
*** openstack changes topic to "Specs approval (Meeting topic: infra)" | 19:06 | |
clarkb | #link https://review.opendev.org/#/c/731838/ Authentication broker service | 19:06 |
clarkb | This just got a new patchset. Probably still not ready for approval but definitely worthy of review | 19:06 |
fungi | yeah, i think we at least have direction for a poc | 19:06 |
clarkb | fungi: anything you want to add to that re new patchset or $other | 19:06 |
fungi | new patchset is primarily thanks to your inspiration, more comments welcome of course | 19:07 |
fungi | basically noting that we're agreed on keycloak, not much new otherwise | 19:08 |
fungi | also, you can move on, nothing else from me | 19:09 |
clarkb | #topic Priority Efforts | 19:09 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:09 | |
clarkb | #topic Update Config Management | 19:09 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:09 | |
clarkb | A note that we went about 5 days without any of our infra-prod jobs running due to the zuul security bug fix and our playbook tripping over that. Since that has been fixed things seem happy | 19:10 |
clarkb | If you find things are lagging or otherwise not up to date this could be why | 19:10 |
clarkb | Semi related to that are the changes to run zuul executors under containers. This gets all of our zuul and nodepool services except nb03's builder on containers | 19:11 |
fungi | workarounds were mostly trivialish | 19:11 |
clarkb | two things to make note of there. nb03 needs an arm64 image which has hit a speed bump due to the slowness of building python wheels under buildx. Zuul executors need to start docker after afs (a fix for this has landed) to ensure bind mounts work properly for afs | 19:12 |
clarkb | for nb03 I think it is using the plaintext zk connection still. Is that correct corvus ? | 19:12 |
corvus | clarkb: unclear, i'll check | 19:13 |
corvus | yes | 19:13 |
mordred | clarkb: have we checked to see if building the wheels on native arm64 for the wheels in question is also slow? | 19:13 |
clarkb | mordred: that's a thing that ianw has been investigating with upstream python cryptography and I think it is significantly faster | 19:14 |
fungi | mordred: i think it's more that far fewer of our dependencies publish arm manylinux wheels so we have to build more of them than for x86 | 19:14 |
mordred | nod | 19:14 |
clarkb | fungi: it is that, but also we had a one-hour timeout on the job and they would hit it compiling any one of cryptography, pynacl or bcrypt | 19:14 |
clarkb | fungi: so the compiles there are also really really slow | 19:15 |
fungi | and yeah, building on native arm64 is apparently far faster than qemu emulated arm64 on amd64 | 19:15 |
corvus | solutions include: 1) help upstreams build arm wheels; 2) build nodepool images on a native arm builder; 3) add in a new layer to the nodepool image stack that builds and installs dependencies so that we can rebuild this layer separately and less often. | 19:15 |
fungi | i like the idea of helping upstreams, and there's a topic on the agenda for that | 19:15 |
corvus | ianw started on #1; #2 and #3 are not in play yet, just for discussion | 19:16 |
corvus | i'm still not keen on #2 | 19:16 |
mordred | me either | 19:16 |
fungi | philosophically, #1 is preferable | 19:16 |
corvus | (because i don't think it's okay not to be able to merge nodepool changes if we lose that cloud) | 19:16 |
mordred | I think failing 1 we should try 3 before going to 2 | 19:16 |
ianw | #3 doesn't seem like it precludes #1 either | 19:17 |
corvus | yeah, we could go ahead and start on #3 if we think it's not a terrible idea | 19:17 |
fungi | yeah, #3 is doable independently and will likely help | 19:17 |
mordred | yah. we could also do them not as another image layer, but just as a local version of #1 that publishes to a zuul-specific wheel mirror similar to the openstack specific wheel mirror | 19:17 |
corvus | basically, i'm imagining an image that we build only on requirements.txt changes and nightly | 19:17 |
fungi | but if #1 is practical, then it will probably not be a huge gain | 19:17 |
corvus | fungi: it's also a fallback for the next requirement we have that doesn't have a wheel | 19:18 |
fungi | er, if #1 is practical then #3 will probably not be a huge gain | 19:18 |
corvus | it sort of scales up and down as needed :) | 19:18 |
mordred | so we treat "make sure we have wheels of X, Y and Z" as a task separate from "build an image containing nodepool and X, Y and Z" | 19:18 |
fungi | yeah, i agree it's a useful backstop either way | 19:18 |
mordred | corvus: ++ | 19:18 |
corvus | mordred, ianw: do you think we can generalize the make-a-wheel-mirror thing like that? | 19:18 |
ianw | corvus: yes, it's just a script :) | 19:19 |
corvus | is that a service we'd like opendev to provide? | 19:19 |
mordred | might not be a terrible general service to offer | 19:19 |
fungi | if the new layer for #3 is part of the same job though, then it can still easily time out whenever there's something new to buid | 19:19 |
ianw | the only thing is that the arm64 wheel build already runs quite long, such that we've restricted it to the latest 2 branches | 19:19 |
fungi | build | 19:19 |
corvus | ianw: but also a vhost? or would we just have the mirror host be a superset of all the wheels for all the projects? | 19:19 |
corvus | fungi: same job as what job? | 19:20 |
fungi | same as the job building the other layers | 19:20 |
fungi | the job currently timing out | 19:20 |
corvus | fungi: yes there would be no point in that :) my suggestion is a separately tagged image layer built by a separate job | 19:20 |
fungi | do we have a separate job per layer? if so then less of a problem i guess | 19:20 |
clarkb | corvus: I think superset | 19:20 |
corvus | fungi: if the word 'layer' is tripping you up, just ignore it and call it an 'image' :) | 19:21 |
clarkb | corvus: because requirements and constraints or lock files should control what projects actually use | 19:21 |
clarkb | if they rely on packages being available (or not) in our cache to select versions that is a bug imo | 19:21 |
ianw | (on the wheel job) but this is something that is open to more parallelism. we already run it under "parallel" and I think using a couple of nodes could really speed it up by letting the longest running jobs sit on one node, while the smaller things zoom by | 19:21 |
fungi | i do like the superset idea, if we have a good way for projects to chuck in the package+version they want a wheel of | 19:21 |
ianw | fungi: i was probably thinking we run requirements.txt, just without a cap | 19:21 |
corvus | yeah, a wheel mirror sounds way easier than a new layer | 19:22 |
corvus | so let's call that #4, and execute #1 and #4, and leave ideas #2 and #3 on the shelf? | 19:22 |
ianw | i can take an action item to look at the wheel mirror if we like | 19:22 |
fungi | there is a slight gain from not aprallelizing, in that if projects with common dependencies are split between concurrent jobs then that dependency will be built twice, but in practice that's probably not a huge concern | 19:22 |
fungi | s/aprallelizing/parallelizing/ | 19:23 |
fungi | my hands are typing at incompatible speeds today | 19:23 |
mordred | fungi: to one another? | 19:24 |
fungi | at least, yes | 19:24 |
ianw | one other thing about wheels is that some of the caveats in https://review.opendev.org/#/c/703916/ still apply ... we currently append wheels and never delete them | 19:25 |
clarkb | ianw: thankfully these packages tend to be fairly small (unlike say the cuda pypi packages) | 19:25 |
ianw | basically the cache grows unbounded, which is one thing with capped requirements but if we're chasing upstream might need thinking about | 19:25 |
fungi | true, if we shard and build in parallel then deleting becomes much harder too | 19:25 |
fungi | though if we're worried about space on our current mirror, i'd say the first course of action is to delete any wheels which are also present on pypi. we used to copy them (unnecessarily) into the mirror | 19:27 |
fungi | we no longer do that, but since we only append... | 19:27 |
clarkb | also due to using afs we could clear the contents, rebuild, then publish the result periodically if we want to prune right? | 19:28 |
clarkb | the RO volume will serve the old contents until we switch | 19:28 |
clarkb | s/switch/vos release/ | 19:28 |
fungi | yeah, pruning as a separate step could be atomic, at least | 19:28 |
ianw | this is true, but i think that all the jobs would timeout trying to refresh the cache | 19:28 |
fungi | though we'd need to block additions while pruning | 19:28 |
ianw | so it would be very manual | 19:29 |
clarkb | gotcha | 19:29 |
clarkb | ok we've got time in a bit to talk further about #1, anything else to bring up on #4 before we move on? | 19:30 |
ianw | do you want to give me an action item on it? or is anyone else super keen? | 19:30 |
clarkb | if you're interested I would say go for it | 19:31 |
corvus | ianw: if you can get started and tag me for assistance/reviews that'd be great | 19:31 |
ianw | ++ will do | 19:31 |
clarkb | #action ianw Work on incorporating non OpenStack requirements into our python wheel caches. corvus willing to assist | 19:31 |
clarkb | #topic OpenDev | 19:32 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:32 | |
clarkb | I announced deprecation for gerrit /p/ mirrors | 19:32 |
clarkb | #link http://lists.opendev.org/pipermail/service-announce/2020-July/000007.html Gerrit /p/ mirror deprecation. | 19:32 |
clarkb | #link https://review.opendev.org/#/c/743324/ Implement Gerrit /p/ mirror "removal" | 19:33 |
clarkb | and pushed that change to make it happen. I said I would do this friday | 19:33 |
clarkb | if we think the plan above is a bad one let me know and we can modify as necessary | 19:33 |
fungi | just to confirm (i also left a review comment), is the idea to remove that line from the config when we upgrade? | 19:33 |
clarkb | The reasons for it are that gerrit needs that url for something else on newer versions which we plan to upgrade to and I've slowly been trying to update things to manage git branches and not having another mirror (that is going away anyway) makes that simpler | 19:34 |
clarkb | fungi: yes | 19:34 |
fungi | or will it not break polygerrit dashboards? | 19:34 |
fungi | ahh, okay, thanks | 19:34 |
clarkb | there will also need to be followon cleanup to stop syncing to the local git repos and all that | 19:34 |
clarkb | but I was going to wait on that to be sure we don't revert for some reason | 19:34 |
mordred | ++ | 19:35 |
clarkb | That was what I had for opendev subjects. | 19:35 |
clarkb | #topic General topics | 19:36 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:36 | |
clarkb | #topic Bup and Borg backups | 19:36 |
*** openstack changes topic to "Bup and Borg backups (Meeting topic: infra)" | 19:36 | |
clarkb | #link https://review.opendev.org/741366 Adds borg backup support and needs review | 19:36 |
ianw | we'd also discussed the old backups and i checked on them | 19:37 |
clarkb | I've reviewed ^ and think it makes sense. Allows us to have bup and borg side by side too | 19:37 |
clarkb | ianw: they still happy even after the index cleanup? | 19:37 |
ianw | review seems fine; i did a full extraction and here's some important bits compared : http://paste.openstack.org/show/796367/ | 19:37 |
ianw | zuul also seems ok, but i did notice we've missed adding that to the "new" server | 19:37 |
ianw | https://review.opendev.org/#/c/743445/ | 19:37 |
ianw | #link https://review.opendev.org/#/c/743445/ | 19:38 |
ianw | if i can get reviews on the borg backup roles, i'll start a new server and we can try it with something | 19:38 |
clarkb | great, thank you for putting that together | 19:39 |
ianw | (as a tangent on starting a new server) | 19:39 |
ianw | #link https://review.opendev.org/743461 | 19:39 |
ianw | and | 19:39 |
ianw | #link https://review.opendev.org/743470 | 19:39 |
ianw | adds sshfp records for our hosts, and makes the launch script print them too :) | 19:39 |
clarkb | #topic GitHub 3rd Party CI | 19:40 |
*** openstack changes topic to "GitHub 3rd Party CI (Meeting topic: infra)" | 19:40 | |
clarkb | as promised earlier we can talk about this a bit more | 19:40 |
clarkb | Basically python cryptography doesn't have arm64 wheels. ianw filed an issue with them to start a conversation on whether or not we could help with that | 19:41 |
clarkb | #link https://github.com/pyca/cryptography/issues/5339 | 19:41 |
clarkb | we also talked about it a bit ourselves and it is something we'd like to do if we can make all the pieces fit together such that cryptography is happy and all that | 19:41 |
clarkb | To do this we'll need to spin up a tenant for them | 19:42 |
clarkb | #link https://review.opendev.org/#/q/topic:opendev-3pci | 19:42 |
fungi | though in short, the main reason they don't have arm64 manylinux1 wheels is that travis takes too long to test/build them because it uses emulation? | 19:42 |
clarkb | yes I believe travis is doing something very similar to our buildx nodepool image builds for docker | 19:42 |
ianw | i'm not sure on the emulation part; i'm pretty sure they run on AWS | 19:42 |
ianw | but the slow part, yes | 19:42 |
corvus | i was thinking the tenant would be a 'cryptography' tenant | 19:43 |
corvus | the main issue prompting giving them their own tenant is that they won't be coordinating with anyone else | 19:43 |
fungi | yeah, or at most a pyca tenant | 19:43 |
corvus | that seems reasonable too | 19:43 |
ianw | yeah, that was something to discuss, i made the tenant deliberately a bit more generic thinking that it may be a home for a collection of things we might need/be interested in | 19:44 |
corvus | (so a generic 3pci tenant seems weird since we'd never want to add anyone else there) | 19:44 |
fungi | i concur | 19:44 |
fungi | granularity is good here | 19:44 |
ianw | already i've had mentions from people that lxml would like something similar | 19:44 |
fungi | tenants would, ideally, mirror the authorities for them | 19:44 |
clarkb | it's possible other third party ci groups would be happy with a more generic pool but cryptography in particular seemed to want to be very much their own thing | 19:45 |
clarkb | so at least to start I agree with fungi and corvus | 19:45 |
clarkb | then evaluate if other python deps do similar | 19:45 |
ianw | ok, i will rework it to a pyca tenant, that sounds like a good level | 19:46 |
corvus | ++ | 19:46 |
clarkb | Anything else to add? | 19:47 |
* mordred thinks this is neat | 19:47 | |
ianw | not really, if we get that up, we can start to report on pull requests | 19:47 |
fungi | i, too, think this is neat | 19:47 |
clarkb | #topic Gerrit project CI rework from Google | 19:48 |
*** openstack changes topic to "Gerrit project CI rework from Google (Meeting topic: infra)" | 19:48 | |
clarkb | corvus: ^ want to fill us in | 19:48 |
corvus | the google folks recently decided to abandon work on the checks plugin for gerrit | 19:48 |
corvus | #link https://docs.google.com/document/d/1v2ETifhRXpuYlahtnfIK-1KL3zStERvbnEa8sS1pTwk/edit# | 19:48 |
corvus | i think they mostly have an idea of where they want to go | 19:49 |
fungi | would it still be a plugin, or something more directly integrated? | 19:49 |
corvus | but they did ask a few folks to help them gather requirements, so i hopped on a video conference with ben and patrick and told them about us | 19:49 |
corvus | fungi: possibly both and multiple plugins | 19:50 |
fungi | ahh | 19:50 |
clarkb | corvus: this is driven by googles needs right? not necessarily a change of need for upstream itself? | 19:50 |
corvus | yes, theoretically the checks plugin could live on | 19:51 |
corvus | but google is driving most of the work in this area | 19:51 |
corvus | the biggest driving force for the change is that internal google ci systems didn't want to adapt to the new checks api | 19:51 |
mordred | corvus: did you suggest the internal google ci systems just migrate to zuul which already supports the new api? ;) | 19:52 |
* mordred hides | 19:52 | |
corvus | anyway, i told them about how we do third-party ci, stream-events, firewalls, really big firewalls, polling, etc | 19:52 |
corvus | mordred: yes actually, but that would be a change too | 19:52 |
mordred | good point | 19:52 |
corvus | i showed them hideci | 19:53 |
corvus | (they said we should upgrade) | 19:53 |
clarkb | have they documented how to upgrade yet ;) | 19:53 |
corvus | clarkb: yes? | 19:53 |
clarkb | corvus: I just remember all the undocumented info you and mordred brought back from gerrit user summit last year | 19:53 |
fungi | step 1: dump all your existing data ;) | 19:53 |
corvus | this is starting to derail, but i'm pretty sure since we're the last folks in the world running on 2.13 and everyone else has upgraded, that's pretty solid | 19:54 |
mordred | nah - upgrade should be fairly straightforward | 19:54 |
mordred | we just need to practice run it | 19:54 |
corvus | yep | 19:54 |
clarkb | k | 19:54 |
fungi | yeah, even my sso spec update assumes we're running newer gerrit | 19:54 |
clarkb | I mean I'm all onboard with upgrading its just still not entirely clear to me that we have a process (but I don't mean to derail from the CI discussion) | 19:55 |
corvus | let me be perfectly clear: the fact that we're running 2.13 and have not upgraded is entirely our doing and has nothing to do with the willingness of the gerrit community to help us do it. | 19:55 |
mordred | corvus: how did our use case resonate? | 19:55 |
fungi | i didn't list it as a dependency because i figure we're very, very close | 19:55 |
mordred | cases | 19:55 |
clarkb | also do they intend on disabling features like stream events? | 19:55 |
corvus | if we have any problems upgrading, they really really want us to let them help us | 19:56 |
corvus | anyway, sorry, i'll try to get back on track. where were we? | 19:56 |
mordred | corvus: you told them about how we do things | 19:56 |
mordred | and showed them hideci | 19:56 |
corvus | and i emphasized the ui features of hideci, and what we've learned from that | 19:57 |
corvus | about showing multiple ci systems, re-runs, etc | 19:57 |
fungi | the idea that a change might have dozens of ci systems reporting on it | 19:57 |
corvus | what summary information is important, why you would want to show successful runs, etc. | 19:57 |
corvus | some of this was new to them, so i think it was a useful exercise. | 19:57 |
corvus | i can't say for sure what will come out of it. i don't think google is in a position to commit to implementing things specifically to make our lives easier | 19:58 |
corvus | but i did stress that to the extent that they can make use cases like these simple to deal with, it makes gerrit a better product | 19:58 |
clarkb | assuming that the checks plugin dies and the google thing doesn't work we're basically in a similar spot to where we are today using stream events and commits (though they can be labeled as robot comments) right? | 19:59 |
clarkb | s/commits/comments/ | 19:59 |
corvus | i think they at least understand what we're doing, why, how, and what works and doesn't work, so hopefully that will inform their work and they'll end up with something we can use without too much difficulty | 19:59 |
clarkb | I guess my biggest concern is that we don't regress where even the simple thing we do today is no longer functional (hence my question about stream events) | 20:00 |
clarkb | something richer would be great, but we can probably make do if that doesn't pan out | 20:00 |
corvus | clarkb: it's the reporting/ui thing that i think is the biggest issue | 20:00 |
corvus | remind me what's our story for replacing hideci when we upgrade? | 20:00 |
mordred | we don't have 100% of one yet | 20:00 |
mordred | but ... | 20:00 |
clarkb | I think we had assumed checks was going to handle it for first party ci | 20:00 |
mordred | our friends at wikimedia have a thing very similar to hideci | 20:00 |
fungi | checks api having subchecks integration in the polygerrit ui | 20:01 |
mordred | that's implemented as a proper polygerrit plugin | 20:01 |
mordred | no - checks api has never been the plan for the upgrade, sorry, that might not have been clear | 20:01 |
corvus | okay, so if that new plugin works, then yeah, we're probably good for a while | 20:01 |
corvus | for me, the goal is still getting all this data out of the comment stream | 20:02 |
clarkb | corvus: ya that would be nice particularly to help reviewers not be distracted by all the CI activity | 20:02 |
clarkb | also we are at time now. I don't mind going for a few more minutes but we should probably wrap up | 20:03 |
clarkb | though feel free to continue discussion in #opendev | 20:03 |
corvus | anyway, that's the gist -- i think they're interested in talking to us more as their plans solidify | 20:03 |
corvus | i'm sure we can invite others to the next meeting if folks want | 20:03 |
corvus | [eot] | 20:04 |
clarkb | cool thanks for your time everyone. As mentioned feel free to continue the discussion in #opendev | 20:05 |
clarkb | #endmeeting | 20:05 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:05 | |
openstack | Meeting ended Tue Jul 28 20:05:15 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:05 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.html | 20:05 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.txt | 20:05 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.log.html | 20:05 |
*** mordred has quit IRC | 20:05 | |
fungi | thanks clarkb! | 20:05 |
*** mordred has joined #opendev-meeting | 20:18 | |
*** hamalq has quit IRC | 23:48 |
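During the wheel mirror discussion, ianw suggested building wheels from each project's requirements.txt "just without a cap", and clarkb suggested that the mirror hold a superset of wheels for all projects. The following is a minimal Python sketch of that preprocessing step. It assumes simple one-requirement-per-line files. The helper names `uncap` and `superset` are hypothetical and not part of any existing OpenDev script. Only `<` and `<=` clauses are treated as caps; `==` pins are left alone.

```python
import re
from typing import Iterable, List

# Hypothetical helpers for illustration; not an existing OpenDev script.
# Matches a project name (with optional [extras]) and whatever follows it.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?)\s*(.*)$")


def uncap(line: str) -> str:
    """Drop '<' and '<=' clauses from one requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
    if not line:
        return ""
    req, _, marker = line.partition(";")  # keep environment markers intact
    m = _REQ.match(req.strip())
    if m is None:
        return ""  # e.g. "-r other.txt"; this sketch does not follow includes
    name, spec = m.group(1), m.group(2)
    kept = [
        clause.strip()
        for clause in spec.split(",")
        if clause.strip() and not clause.strip().startswith("<")
    ]
    out = name + ",".join(kept)
    if marker.strip():
        out += "; " + marker.strip()
    return out


def superset(*requirement_files: Iterable[str]) -> List[str]:
    """Union of uncapped requirements across projects, sorted case-insensitively."""
    merged = {uncap(line) for lines in requirement_files for line in lines}
    merged.discard("")
    return sorted(merged, key=str.lower)


zuul = ["cryptography>=2.8,<3.0", "PyYAML>=3.1.0  # config parsing"]
nodepool = ["cryptography>=2.8,<3.0", "bcrypt!=3.1.5,<=3.2", "-r extra.txt"]
print(superset(zuul, nodepool))
# ['bcrypt!=3.1.5', 'cryptography>=2.8', 'PyYAML>=3.1.0']
```

The merged list could then be written to a file and passed to `pip wheel -r <file> -w <wheel-dir>`. This sketch does not reconcile different specifiers for the same package across projects. It also skips `-r`/`-c` include lines instead of following them.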
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!