*** kopecmartin|sick is now known as kopecmartin | 08:08 | |
clarkb | meeting time | 19:00 |
---|---|---|
ianw | o/ | 19:00 |
fungi | ohai | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Oct 11 19:01:04 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
frickler | \o | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000364.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The PTG is happening next week. Be aware of that as we make changes (best to avoid updating meetpad and etherpad for example) | 19:01 |
clarkb | Also encourage people that have problems to reach out to us directly. In the past we've gotten reports of trouble via a game of telephone and that has been difficult | 19:02 |
clarkb | Also, this morning I spilled a glass of water on my keyboard so I've been in disarray all day. Thankfully it was the desktop keyboard and not the laptop and I've got a spare Model M that has been plugged in | 19:03 |
clarkb | but still it's really weird to type on a new keyboard after having that one for a decade | 19:03 |
fungi | the zuul default ansible version changed to 6 across all of our tenants as of late last week, and we'll be dropping ansible 5 support "real soon" so fixing related problems is time-sensitive | 19:03 |
fungi | probably warrants another announcement if we can nail down a timeline for the ansible 5 removal change | 19:04 |
clarkb | I did at least warn of the ansible 5 removal in the email announcing the switch to 6 by default | 19:04 |
fungi | yep | 19:05 |
clarkb | #topic Topics | 19:06 |
clarkb | #topic Bastion Host Updates | 19:06 |
clarkb | The changes to stop writing console log files on bridge landed yesterday. Looks like there was a small issue getting the flag name correct. ianw do we have an idea yet if that is working as expected? | 19:07 |
fungi | and we still need a similar one for static.o.o right? or is that already up? | 19:08 |
ianw | i just checked and i think so. there hasn't been a log file written since Oct 11 06:09 (UTC) which is about when all the periodic jobs cleared out | 19:08 |
ianw | static has been done the same way, so it looks good too | 19:09 |
clarkb | awesome. That's one less thing to worry about now :) | 19:09 |
ianw | heh yes, thank you for reviews. i think it was good to reach a bit more of a generic solution | 19:10 |
ianw | the "tunnel console things via a socket and the ssh connection" changes are another option that is still on my todo list, and seems like a great thing to look into as well | 19:10 |
ianw | one day ... :) | 19:11 |
clarkb | ya though I think we don't want to expose that on bridge or static due to how the protocol works | 19:11 |
clarkb | since it could be used to read other files? | 19:11 |
clarkb | I think what we've done not exposing it is correct for us | 19:11 |
fungi | is intended for reading additional files | 19:11 |
fungi | designed with that in mind anyway | 19:11 |
clarkb | The other stack of changes in flight here has to do with ansible in a venv | 19:12 |
clarkb | #link https://review.opendev.org/q/topic:bridge-ansible-venv | 19:12 |
corvus | er... the current protocol shouldn't be able to read other files? | 19:12 |
clarkb | corvus: ya I don't think it does, but that was always part of the intention iirc. And I don't want to have to think about undoing any opening of things if that changes | 19:13 |
fungi | i know it was at least a theoretical future use case | 19:13 |
corvus | clarkb: yes, it was originally designed for that, but we haven't implemented it because we haven't figured out how to do it safely | 19:13 |
clarkb | gotcha | 19:13 |
corvus | so if that's the concern -- just future-proofing cool. but if there was a thought that we had a current vulnerability... then i would like to explore that more. :) | 19:14 |
clarkb | oh no I don't think we have a current vulnerability. We're set up to avoid it should the zuul behavior change to what was (I thought anyway) the intended behavior | 19:14 |
corvus | (and there's really 2 protocols here -- there's the websocket/finger protocol of user -> zuul, and the internal protocol of executor -> node; the former is the one we designed to allow reading other files in the future, and the second is what we just changed) | 19:15 |
corvus | (though support for the former probably would need changes like the latter) | 19:16 |
corvus | okay, so i think we're all on the same page that currently there is not the ability to read arbitrary files, but that we like the status quo of explicitly disabling log streaming on bridge because, among other things, that future-proofs us against eventually adding that feature. ya? | 19:16 |
clarkb | ++ | 19:17 |
corvus | cool, thx and sorry for the diversion. just wanted to make sure we didn't open something we didn't intend to. :) | 19:17 |
clarkb | ianw: for ansible in a venv did you manage to sort out using the first member of a singleton group as the hosts specification? | 19:18 |
ianw | (specifically i was talking about https://review.opendev.org/c/zuul/zuul/+/542469 but let's not go into that further now :) | 19:18 |
ianw | clarkb: thanks ... one step back i just approved the change reviewed by yourself and fungi to move the production ansible into a venv on the current bridge. so i'll watch that in today. that's the "venv" bit of it really | 19:19 |
fungi | and that preps us for being able to use newer ansible, right? | 19:20 |
clarkb | ya and in theory that should just switch over due to symlinking the venv install over to ansible | 19:20 |
clarkb | (that was my thought during review anyway) | 19:20 |
ianw | yep, *in theory* it's a noop :) | 19:20 |
clarkb | fungi: sort of, we need to upgrade the python installation too (which is where the replacement node comes in and why the other group work is related) | 19:20 |
ianw | the bits on top now are about upgrading to jammy, and abstracting the way we address the bastion host so we can switch the host more easily -- in this case to probably bridge01.opendev.org | 19:20 |
ianw | anyway, i did establish that as a playbook matcher "groupname[0]" does seem to work to address the first member of a group | 19:21 |
corvus | like `- hosts: bridgegroup[0]` means this is a play that runs on the first host in the bridge group? | 19:22 |
corvus | (er, in the group named "bridgegroup"; i was trying to be clear and may have failed :) | 19:22 |
fungi | and group member ordering is guaranteed deterministic (uses the order in which the members are added i guess) right? | 19:22 |
clarkb | ya the idea being we can control what the bridge is in a single place (the bridgegroup group) but then only ever have a single entry in that group | 19:22 |
ianw | yep -- https://review.opendev.org/c/opendev/system-config/+/858476 | 19:22 |
clarkb | fungi: the idea is that it would be a singleton group | 19:22 |
clarkb | but to enforce that we would take the first entry everywhere | 19:23 |
fungi | i see | 19:23 |
corvus | why not just let it run on the whole group of 1? | 19:23 |
clarkb | the reason I was concerned with that is it makes the ansible really confusing when you need to address a specific node | 19:23 |
clarkb | like when grabbing the CA files | 19:23 |
clarkb | the ansible you express becomes "create a different CA on every member of the bridge group, but only distribute the CA files for the first group member" | 19:24 |
clarkb | if others prefer that I'm ok with that too, but I found it a bit confusing to read when I reviewed it | 19:24 |
ianw | corvus: one problem i haven't dealt with yet is playbooks/bootstrap-bridge.yaml. that runs both under zuul, where the inventory is set up via the job, and in infra-prod, where the inventory is set up by opendev/base-jobs | 19:25 |
corvus | i'm not sure whether or not i would have the same confusion, but i certainly see your point, and the solution seems good. now that i know the reasoning, i can be on board with that. | 19:25 |
ianw | so basically both have to agree on the name/group. this is a bit annoying for clarkb's note of trying to use a different group name for the initial setup bastion host, and the production version | 19:26 |
ianw | sorry, that wasn't intended for corvus: ... :) | 19:26 |
clarkb | oh hrm if using distinct groups for the top level ansible and nested ansible in CI is problematic I think we can just not do that | 19:26 |
corvus | oh whew cause that's a hard question and i was struggling with that. glad i'm off the hook. :) | 19:27 |
clarkb | it was an idea I had when trying to sort out why the job needed to redefine the group | 19:27 |
ianw | yeah, it is mostly explained in the comment at https://review.opendev.org/c/opendev/system-config/+/858476/9/zuul.d/system-config-run.yaml | 19:27 |
ianw | anyway -- i will keep at it and see what we can come up with; i don't think we need a solution now | 19:28 |
corvus | intuitively, having the group name be the same makes sense to me... so if that's a workable/livable option i would be in favor of that. | 19:28 |
ianw | i think that's where i'm coming back to as well ... | 19:29 |
corvus | and maybe keep a version of that comment explaining that we're using that as a group for the zuul playbook | 19:30 |
fungi | sounds good to me | 19:30 |
clarkb | wfm | 19:30 |
ianw | yes i will definitely do my usual probably-too-verbose commenting on all this :) | 19:30 |
ianw | anyway, I think it's quite likely by this time next meeting we'll have a fully updated bridge, and an easier path when we want to rotate it out next time as well | 19:32 |
clarkb | sounds good. Thank you for working through all the little details of this | 19:32 |
clarkb | #topic Upgrading Bionic Servers | 19:32 |
clarkb | The expected fix for removing the ubuntu user has landed. Now just need to try booting a jammy control plane server again. I'm hoping to give that a go sometime this week. | 19:33 |
clarkb | Sounds like ianw may also give it a go | 19:33 |
clarkb | But other than that I didn't have any new updates here | 19:33 |
fungi | we'll want it before we boot the new listserv at the very least | 19:33 |
clarkb | yup I was thinking I'd find something easy to replace as a guinea pig like a mirror maybe | 19:35 |
clarkb | but probably not until the end of this week | 19:35 |
clarkb | Let's keep moving as the last topic on the agenda is one that deserves discussion before we run out of time | 19:36 |
clarkb | #topic Mailman 3 | 19:36 |
clarkb | fungi has edited the extra long strings on the production mailman2 site and has begun the process of copying data for reattempting the mm3 migration on a newly held test node with our forked images | 19:36 |
fungi | new held node for this is 149.202.168.204, built from your container image fork | 19:37 |
fungi | will hopefully kick off a new scripted import on it within the next hour or so | 19:37 |
fungi | depending on how much longer the rsync runs | 19:37 |
clarkb | corvus: we noticed that a child change of https://review.opendev.org/c/opendev/system-config/+/860157 doesn't find the images that change builds. And were wondering if we got the bits wrong for telling zuul about the image | 19:37 |
clarkb | corvus: maybe if you get some time you can take a look at how the new image build jobs and system-config-run-mailman3 job are hooked up with the buildset registry and provides/requires and dependencies | 19:37 |
clarkb | we've worked around it by forcing the node hold change to rebuild the images itself | 19:38 |
clarkb | fungi: anything else you need from the rest of us? I expect it is largely just a wait for test results though | 19:38 |
fungi | we've knocked out about all the remaining todo items, so we're probably ready to talk scheduling for lists.opendev.org and lists.zuul-ci.org production migrations | 19:39 |
corvus | clarkb: let's continue that in #opendev | 19:39 |
fungi | i did want to check a few more urls for possible easy/convenient redirects (things like list description pages which people tend to link in various places) | 19:39 |
fungi | stuff not covered by keeping a copy of the pipermail archives hosted from the new server | 19:40 |
clarkb | corvus: yup don't need to solve that here | 19:40 |
clarkb | fungi: good idea, the existing redirects are probably not much help though as they redirect to content on disk but you probably want url redirects to mm3 urls for those | 19:40 |
fungi | right. i think the list description pages are probably the only thing we really care about redirects for | 19:41 |
fungi | the list indexes for the sites are just served from the root url of each vhost anyway | 19:42 |
fungi | and i'm not too worried about redirecting old admin and moderator interface urls | 19:42 |
clarkb | makes sense | 19:42 |
clarkb | anything else on mm3 before we continue? | 19:42 |
fungi | we should probably also confirm whether we want local logins for users or whether there's a desire to hold this for keycloak integration in order to avoid local credentials in mailman | 19:43 |
fungi | i'm assuming we'd rather get the mm3 migration done and then look at keycloak integration after the fact, but just want to be sure everyone's on the same page there | 19:44 |
clarkb | you can subscribe to lists without creating a user (I did this with upstream mm3) | 19:44 |
fungi | correct | 19:44 |
clarkb | we might even encourage users to do that if they never want to use the web ui for responding to things | 19:44 |
clarkb | but ya I wasn't too worried about a future switch over | 19:44 |
ianw | just off the top of my head, it feels like if we allow local logins and then move to a more generic keycloak, we then have the problem of having to merge the local users too? | 19:44 |
fungi | list admins/moderators will need accounts though, and if someone wants to adjust their subscription preferences they'll need a login | 19:44 |
clarkb | ianw: yes we'd likely need to do that. The good thing is we should have email on both sides to align them at least | 19:45 |
fungi | ianw: we'll have that either way. subscribers technically all have accounts, they just don't necessarily have login info for them unless they go through the password reset | 19:45 |
ianw | ahh ok | 19:46 |
frickler | is the login per list or per site or per installation? for mm2 it was per list iiuc | 19:46 |
clarkb | frickler: it's per installation | 19:46 |
fungi | frickler: right, for mm3 it's system-wide | 19:47 |
fungi | so not just all lists on a given site, but all mailman sites on that server | 19:47 |
fungi | convenient for folks who interact with a lot of lists, especially across multiple domains on the sam ehost | 19:47 |
fungi | same host | 19:48 |
frickler | so if this is needed to set e.g. digest mode, I think we cannot delay it into the future | 19:49 |
fungi | anyway, i didn't have anything else. we can mull that over, i expect we'll start doing migration scheduling after the ptg | 19:49 |
fungi | frickler: correct | 19:49 |
fungi | basically the options are 1. wait to migrate lists to mm3 until we have keycloak in production the way we want, or 2. migrate to mm3 and then integrate keycloak later and make sure accounts can be linked/merged as needed | 19:50 |
clarkb | right, I think some users will still need to create accounts, but a good chunk of them shouldn't need to which helps simplify things if we want to try and keep them simple like that | 19:50 |
clarkb | I'm fine with 2 | 19:50 |
frickler | ack | 19:50 |
fungi | well, to reiterate, the accounts are precreated, whether the users have login info for them or not | 19:50 |
clarkb | fungi: for all users? | 19:51 |
clarkb | I guess the migration doesn't stick to not creating an account if it doesn't need to | 19:51 |
fungi | if they're referenced in a config (admin, mod, existing subscription) then the import process creates their accounts. if they subscribe later an account is created the first time they do so | 19:51 |
clarkb | anyway I think it's fine to migrate them later since in this case we should have the info needed to make associations | 19:51 |
clarkb | also the mailing list is the sort of thing that can probably safely not have single sign on forever | 19:52 |
clarkb | we are running out of time and I do want to get to the last item on the agenda | 19:53 |
clarkb | we can return to this in #opendev if necessary | 19:53 |
fungi | please do | 19:53 |
clarkb | #topic Updating OpenDev's base job nodeset to Jammy | 19:53 |
clarkb | It has been pointed out that OpenDev's base job nodeset is still Focal. Jammy has been out for about half a year now and has a .1 release. It should be stable enough for our jobs | 19:53 |
clarkb | But that opens questions about how we want to communicate and schedule the switch | 19:54 |
frickler | yes, I came across that while looking to upgrade devstack jobs | 19:54 |
clarkb | I was thinking that we should avoid changing it before the PTG since that will just add a distraction during PTG week. But maybe we can do it the week after ish? Basically do a 2 week notice to service-announce and then swap? | 19:54 |
fungi | openstack is actively switching from focal to jammy for testing now that their zed release is done | 19:54 |
frickler | I think we'd want to run some tests with base-test before discussing details of scheduling? | 19:54 |
clarkb | frickler: in the past we've done that (when the infra team managed this all for openstack) and the problem with that is it sets the expectation that we are responsible for making it work for every job | 19:55 |
clarkb | It was the xenial switch or maybe trusty switch that made me never want to do that again. | 19:55 |
clarkb | I think people should test what they are interested in and be explicit where they know they need to be (say for specific versions of python). | 19:56 |
frickler | still we'd need to change base-test in order to allow for that? | 19:56 |
frickler | #link https://review.opendev.org/c/opendev/base-jobs/+/860686 would be the change for that | 19:56 |
clarkb | frickler: no, any job can select the jammy nodeset | 19:56 |
fungi | anything inheriting from our default nodeset which breaks when we change it has the option of overriding the nodeset it uses to the earlier value anyway | 19:56 |
fungi | just as it can be adjusted to use the new value before our planned transition date | 19:57 |
frickler | hmm, true that | 19:57 |
clarkb | I think updating base-test is a good idea to keep it in sync with base. But I don't think that is the method for testing this. base-test is for testing the roles in base | 19:57 |
clarkb | we know they work on jammy because projects like zuul already use jammy | 19:57 |
clarkb | so we don | 19:57 |
clarkb | er we don't need to test that base functionality | 19:58 |
corvus | i agree i don't think this needs a base-test cycle since we know that the change won't break all jobs (because we can and have made the change explicitly elsewhere, and zuul performs syntax validation on the change) | 19:58 |
fungi | in my mind, the main questions are when do we plan to switch it/how much advance notice do we want to provide users | 19:58 |
clarkb | fungi: ++ | 19:58 |
clarkb | I think we should wait for after the PTG at the very least | 19:58 |
fungi | wait for after the ptg to announce it, or for actually changing it? | 19:59 |
clarkb | actually changing it. Ideally we should announce whatever we decide on real soon now | 19:59 |
frickler | 2 week notice should be fine then. announce now | 19:59 |
fungi | sounds good to me | 19:59 |
ianw | ++ | 19:59 |
clarkb | cool. I can work on a draft for service-announce after lunch today | 20:00 |
clarkb | (I'm happy to send that as I think most others get moderated) | 20:00 |
fungi | however, we should be mindful of the zuul dropping ansible 5 situation as well, and whether we want those to coincide, or be announced together, or not compete | 20:00 |
clarkb | dropping ansible 5 has already been announced but without a hard date. I think it was a week or so from today that zuul had planned to drop ansible 5 | 20:01 |
frickler | agree, having a couple of days between them will help in distinguishing failure causes | 20:01 |
clarkb | we will need to manually restart zuul to pick up that change quicker than our weekly restarts. But that is easy to do | 20:02 |
clarkb | (also I don't think anything is using ansible 5 so should be an easy switch) | 20:02 |
clarkb | I'll work on a draft email for all that in a bit | 20:02 |
fungi | thanks! | 20:02 |
clarkb | and we are at time | 20:02 |
ianw | for mine i think it probably gets confusing to combine them as a single change, as they're not really related as such, so agree with doing separately | 20:02 |
clarkb | thanks everyone | 20:02 |
corvus | thanks clarkb | 20:02 |
clarkb | feel free to continue discussion over in #opendev | 20:03 |
clarkb | #endmeeting | 20:03 |
opendevmeet | Meeting ended Tue Oct 11 20:03:05 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.html | 20:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.txt | 20:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.log.html | 20:03 |
fungi | thanks clarkb! | 20:03 |
clarkb | I'm also trying to decide how hard I should try and rescue that keyboard | 20:03 |
clarkb | I have to rip rubber feet off the bottom to get at the screws then somehow work around some clips.... | 20:03 |
clarkb | but one thing at a time | 20:03 |
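
For reference, the singleton-group addressing discussed under "Bastion Host Updates" could look roughly like the play below. This is only a sketch, not the content of the change under review: `bridgegroup` is the placeholder group name used in the discussion, and the debug task is a hypothetical stand-in for the real bastion tasks. It relies on Ansible's host pattern subscripts, where `groupname[0]` selects the first host of a group.

```yaml
# Hypothetical sketch; the group name and task are illustrative assumptions.
# "bridgegroup" is expected to contain exactly one host (the bastion).
- hosts: bridgegroup[0]   # pattern subscript: first host in the group
  tasks:
    - name: Show which host the pattern selected
      ansible.builtin.debug:
        msg: "Running on {{ inventory_hostname }}"
```

With this shape, rotating the bastion would in principle only require changing the group's single member in the inventory, and plays never fan out if a second host is added to the group by mistake.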