Tuesday, 2021-11-09

clarkbanyone else here for the meeting?19:00
diablo_rojoo/19:00
fungiohai19:01
clarkbHello19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Nov  9 19:01:09 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-November/000295.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI was hoping I could link to gerrit user summit stuff but I can't find any details on that yet. They must be running into planning issues. I can sympathize with that19:02
ianwo/19:02
clarkb#topic Actions from last meeting19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-11-02-19.01.txt minutes from last meeting19:03
clarkbWe didn't record any new actions and the prior actions are all done or in progress. \o/19:03
clarkb#topic Specs19:03
clarkbJust a quick note here that I approved fungi's mailman3 spec after a quick respin to address input from frickler. The updates were minor and didn't seem like anything that needed to be completely rereviewed19:03
clarkb#topic Topics19:05
clarkb#topic Improving OpenDev's CD throughput19:05
clarkbthis is still on my todo list. All the zuul and container fun has been distracting me :/19:05
clarkbianw: have you had a chance to look at why the child changes are failing CI? (I'm mostly just curious)19:05
ianwno sorry, i've managed to be distracted on other things19:06
clarkbI think we've all been in that boat recently19:06
clarkb#topic Gerrit Account Cleanups19:06
clarkbI was mostly going to skip this over except fungi found a story where someone had trouble with @ in their username. fungi has asked them to clarify if they were trying to use an email address as a username or if their username actually has an @ in it19:06
clarkbNoting that here for the possibility there is another sort of cleanup we'll need to do in normalizing usernames19:07
clarkbNot really actionable at this point but an interesting possibility19:07
clarkb#topic Zuul Multi Scheduler Setup19:08
corvusthere are 2 schedulers19:08
clarkbZuul has made great progress on supporting multiple schedulers (removing the last remaining spof for a zuul install). Our OpenDev zuul is running two schedulers. One on zuul01.o.o and the other on zuul02.o.o. Zuul02 is the "primary"19:08
clarkbWhat makes zuul02 the primary for us is that all web traffic hits it first19:08
corvusand gearman19:09
corvusand actually there's no web on zuul0119:09
clarkbIf things go really sideways I think we can stop zuul01's scheduler and restart the scheduler on zuul02 only19:09
corvus(cause we don't have a load balancer for it)19:09
corvus++19:09
fungiand, if necessary, clear out zk19:10
corvusand if things go really badly, clearing the zk state would be a good idea19:10
clarkbIt is worth noting that we have run into problems but we've been trying to work through them as they show up. corvus has been a great help with that.19:10
clarkbSo far we've had issues with retried jobs not being handled properly, nodepool requests getting stuck in a perpetually waiting state, and config errors not serializing properly19:10
fungii'm thrilled that we haven't needed to downgrade again19:10
corvusi think we're at the point where the problems that have been cropping up have been minimal enough we can roll forward19:10
clarkbThere are a few more new issues showing up today that deserve followup after this meeting. Specifically johnsom's designate change error and elodilles' lack of a zuul.tag var on release jobs. There is also a job runtime estimate problem that corvus has a fix up for19:11
clarkbcorvus: ^ fyi details for both of those other things are in #opendev19:11
clarkbPlease do report any weird behavior. So far a lot of weird behavior I have noticed has been tracked back to the multi scheduler setup and reporting those things has been very helpful because now they are fixed :)19:13
clarkbAnd ya the recovery at this point can probably be achieved by simply restarting onto one scheduler using zuul02 since the code seems quite stable in a single scheduler setup19:13
clarkbAnything else to add to the zuul topic? or questions about the setup?19:14
clarkb#topic User management on our systems19:15
clarkbLast week I started pulling on a thread and noticed there were some improvements we could make to how we manage our users and uids19:16
clarkbthank you to those who helped me make sense of it all19:16
clarkbA number of changes have come out of that. Some potentially impactful if we aren't careful. I'd like to ask infra-root take a look over these changes so that we can start landing them when we are confident they are safe and are able to watch them go in19:16
clarkb#link https://review.opendev.org/c/opendev/system-config/+/816869/ Be explicit about uid/gid ranges19:16
clarkbThis first change adjusts our configured ranges in adduser.conf and login.defs which different tools refer to when creating users19:17
clarkbThe rough layout is: 0-999 system, 1000-1999 unallocated, 2000-2999 for infra-root users, 3000-9999 host level users, 10k - 64k container users that need uids on the host as well for bind mounts.19:17
fungi#link https://review.opendev.org/816869 Lower UID/GID range max to make way for containers19:17
fungialso that one19:17
clarkbThats the same one :)19:17
fungier, yep sorry. i guess it had a different title19:18
clarkbthe idea there is we've put a number of container services on high uids like 10001 but then when we create say the letsencrypt group it gets created as gid 1000219:18
clarkbfungi and I are thinking it would be better to not assign specific values to stuff that actually belongs to the system like our users and letsencrypt group and so on so we cap those at 9999 then we can be explicit about container uids/gids and ensure those are non overlapping in the >10k space19:19
fungiwell, our users are already statically assigned uids and gids19:19
clarkbyup19:19
fungi"our" personal accounts i mean19:19
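As an illustration of the rough layout described above, here is a minimal Python sketch that labels a uid/gid by range. The helper name classify_id is hypothetical, and the upper bound of 64000 is an assumption standing in for the "64k" mentioned in the meeting; the actual values live in change 816869.

```python
# Rough uid/gid layout as described in the meeting.
# NOTE: 64000 is an assumed stand-in for "64k"; see change 816869 for real values.
RANGES = [
    (0, 999, "system"),
    (1000, 1999, "unallocated"),
    (2000, 2999, "infra-root users"),
    (3000, 9999, "host level users"),
    (10000, 64000, "container users"),
]


def classify_id(n):
    """Return the label of the range containing n (hypothetical helper)."""
    for low, high, label in RANGES:
        if low <= n <= high:
            return label
    return "out of range"


if __name__ == "__main__":
    # 10001 is the kind of high uid given to container services; 10002 is
    # what the letsencrypt group ended up with before the ranges were capped.
    for n in (999, 2500, 9999, 10001, 10002):
        print(n, classify_id(n))
```

With host level allocations capped at 9999, a group created by the system can no longer land in the space reserved for explicitly assigned container ids.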
clarkb#link https://review.opendev.org/c/opendev/system-config/+/816771 Clean up unused bootstrapping users19:19
clarkbIs another user related change. This time to clean up users that I think we don't need.19:20
clarkbThis is mostly a belts and suspenders spring cleaning move and one that scares me slightly since we might've accidentally used one of those users for something functional but as far as I can tell this isn't the case19:20
fungiwell, it's also a security concern19:20
clarkbright19:20
clarkbI think it is important, but one we should take care with and review carefully19:21
fungias on some systems those accounts come with provider-supplied authorized keys19:21
clarkb#link https://review.opendev.org/c/opendev/system-config/+/816769/ Give gerritbot and matrix-gerritbot a shared user19:21
clarkbThis is a followon to 816869, and I'd like to have us work our way through giving most of our other containers similar treatment19:21
clarkblodgeit, refstack, hound, some other irc bots, etc19:22
clarkbBut I figure start slow make sure we've got things laid out the way we want first before we do a bunch of work that needs updating again later19:22
clarkbThe gerritbots seemed like a good example case19:22
clarkbThen when that has been worked through we should also look into updating the uid for our mariadb containers to something other than 99919:23
clarkbThe mariadb uid was what sparked this whole thing off in the first place and will likely be the last thing to be addressed :)19:23
clarkbI think for mariadb we can probably run it as the user for the services it supports in most cases. For review run it as the gerrit user, for etherpad run it as the etherpad user and so on19:23
clarkbSince we don't run shared mariadbs between services we can do that safely19:24
clarkbIf y'all can review with a very critical eye I would appreciate it. I'm happy to do additional testing (we already did some manual testing before settling on 816869)19:24
fungiyeah, it *seems* to work as hoped19:25
fungipart of the problem is that adduser and useradd rely on entirely different configs19:25
fungiand package maintscripts and ansible roles use who knows which one19:26
clarkbI think I found some evidence that package scripts do use both. Or rather one package uses one and another package uses the other19:26
fungiright19:26
clarkbthe evidence for this is that one will do gidmax-1 and the other will do gidmin+119:26
fungiso at least having the two of them in sync should help19:26
clarkband we see evidence of both on our systems19:26
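To illustrate why ids show up at both ends of a range, here is a simplified Python sketch of two allocation directions: one picks the lowest free id above the minimum, the other the highest free id below the maximum. This is only an illustration of the observed "gidmin+1" and "gidmax-1" pattern, not the actual algorithm of adduser or useradd; the function names and sample ids are hypothetical.

```python
def next_id_ascending(used, id_min, id_max):
    """Return the lowest free id in [id_min, id_max], or None if full."""
    for candidate in range(id_min, id_max + 1):
        if candidate not in used:
            return candidate
    return None


def next_id_descending(used, id_min, id_max):
    """Return the highest free id in [id_min, id_max], or None if full."""
    for candidate in range(id_max, id_min - 1, -1):
        if candidate not in used:
            return candidate
    return None


if __name__ == "__main__":
    # Assume both ends of the range are already taken (hypothetical data).
    used = {3000, 9999}
    print(next_id_ascending(used, 3000, 9999))
    print(next_id_descending(used, 3000, 9999))
```

If the two tools read different min/max settings, these two directions can hand out ids in ranges nobody intended, which is why keeping the two configs in sync helps.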
clarkbAny other questions or concerns to bring up on this topic?19:28
ianwthanks for digging into it and explaining, i'll take a look at the changes too19:28
clarkb#topic Open Discussion19:30
clarkbI wasn't sure which other topics we would want to discuss so decided to trim the agenda down and let this portion of the meeting cover anything else19:30
clarkbGerrit3.4 upgrade stuff and fedora 35 work seem to be progressing, but not sure there is anything to share yet19:31
clarkbI think there is a dib change I should review for containerfile stuff that I haven't been able to get to19:31
clarkb#link https://review.opendev.org/c/openstack/diskimage-builder/+/817139 dib handle containerfile errors better19:31
ianwyeah, that was working but last night centos-9 mirrors were broken19:31
ianwspeaking of19:31
ianw#link https://review.opendev.org/c/opendev/system-config/+/81713619:31
fungii could use some help on fixing bitrot in the storyboard-webclient builds, if anyone has tips for how to update a yarn.lock19:32
ianwadds centos 9-stream mirrors, but i don't think we have space.  my plan is to continue to remove debian-stretch 19:32
fungi#link https://review.opendev.org/814053 [opendev/storyboard-webclient] Bindep cleanup19:32
clarkbianw: just left a comment on the centos-9-stream change19:32
clarkbfungi: to update a yarn.lock you remove the lock and then reinstall iirc. That produces a new lock file and if testing succeeds with that you can merge it19:33
fungireinstall what?19:33
clarkbreinstall the javascript stuff using yarn19:34
ianwi've always just done "yarn upgrade" i think19:34
clarkbhttps://classic.yarnpkg.com/en/docs/cli/install/ I think19:34
fungioh, i can give that a try, thanks19:34
fungioh, also i've had this up for a while, to hopefully make our system-config jobs a little more robust...19:34
clarkbianw: ah that might be the better method then. I guess upgrade ignores the lock and writes a new one?19:34
fungi#link https://review.opendev.org/813880 [opendev/system-config] Retry acme.sh cloning19:34
ianwbut either way, the *real* problem is going to be every javascript library that has maintained the same name but rewritten itself completely (see prior discussions in #opendev yesterday :)19:34
fungiyeah, in this case the reason i need to do it is because one of the js packages needs updating to support python319:35
fungibut i assume that will involve updates to a lot of other dependencies19:36
clarkbya you might have to fiddle with the requirements file equivalent to find something that produces a working set19:36
clarkbwhen I've done this for zuul before it is a fun exercise19:36
fungiand this for making the rejects in our iptables rules slightly more expressive...19:36
fungi#link https://review.opendev.org/810013 [opendev/system-config] Switch IPv4 rejects from host-prohibit to admin19:36
clarkbfungi: I've +2'd that one so you can approve it when you are able to watch it. Like the user changes, it has potential for widespread pain if somehow it goes wrong (though again I don't expect any issues)19:38
fungiyep19:39
fungii did some spot testing and confirmed it works as expected, at least19:39
ianwoh, i approved it in between, but yeah, i'll be around19:40
clarkbcool19:40
clarkbI'll give this meeting a few more minutes to bring anything else up19:40
clarkbBut then I need to review some zuul fixes and eat lunch :)19:40
fungii'll be around too, of course19:40
ianw#link https://review.opendev.org/c/opendev/system-config/+/81676619:40
ianwis a minor one to expose the db in gerrit testing19:41
ianwi don't think we were noticing gerrit wasn't actually talking to the db correctly19:41
fungioh, right, i meant to look at that one, thanks19:41
clarkb++ thats a good update to our testing for gerrit19:41
clarkbianw: you might consider toggling the state too since we had the issue with the non unique keys thing in the past that was only hit on the change of an already created row19:42
clarkbI approved it as is as this is clearly better than what we had before19:42
ianwthat's a good idea, can do that, just the same request with a DELETE19:43
ianwI guess a loop of PUT DELETE PUT might work19:43
ianwi wonder if a template to method: works 19:44
fungiseems like it should? it's just a string, right?19:45
clarkbSounds like that is it. Thanks everyone!19:48
clarkb#endmeeting19:48
opendevmeetMeeting ended Tue Nov  9 19:48:09 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:48
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.html19:48
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.txt19:48
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.log.html19:48
fungithanks clarkb!19:49
