clarkb | anyone else here for the meeting? | 19:00 |
---|---|---|
diablo_rojo | o/ | 19:00 |
fungi | ohai | 19:01 |
clarkb | Hello | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Nov 9 19:01:09 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000295.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I was hoping I could link to gerrit user summit stuff but I can't find any details on that yet. They must be running into planning issues. I can sympathize with that | 19:02 |
ianw | o/ | 19:02 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-11-02-19.01.txt minutes from last meeting | 19:03 |
clarkb | We didn't record any new actions and the prior actions are all done or in progress. \o/ | 19:03 |
clarkb | #topic Specs | 19:03 |
clarkb | Just a quick note here that I approved fungi's mailman3 spec after a quick respin to address input from frickler. The updates were minor and didn't seem like anything that needed to be completely rereviewed | 19:03 |
clarkb | #topic Topics | 19:05 |
clarkb | #topic Improving OpenDev's CD throughput | 19:05 |
clarkb | this is still on my todo list. All the zuul and container fun has been distracting me :/ | 19:05 |
clarkb | ianw: have you had a chance to look at why the child changes are failing CI? (I'm mostly just curious) | 19:05 |
ianw | no sorry, i've managed to be distracted on other things | 19:06 |
clarkb | I think we've all been in that boat recently | 19:06 |
clarkb | #topic Gerrit Account Cleanups | 19:06 |
clarkb | I was mostly going to skip this over except fungi found a story where someone had trouble with @ in their username. fungi has asked them to clarify if they were trying to use an email address as a username or if their username actually has an @ in it | 19:06 |
clarkb | Noting that here for the possibility there is another sort of cleanup we'll need to do in normalizing usernames | 19:07 |
clarkb | Not really actionable at this point but an interesting possibility | 19:07 |
clarkb | #topic Zuul Multi Scheduler Setup | 19:08 |
corvus | there are 2 schedulers | 19:08 |
clarkb | Zuul has made great progress on supporting multiple schedulers (removing the last remaining spof for a zuul install). Our OpenDev zuul is running two schedulers. One on zuul01.o.o and the other on zuul02.o.o. Zuul02 is the "primary" | 19:08 |
clarkb | What makes zuul02 the primary for us is that all web traffic hits it first | 19:08 |
corvus | and gearman | 19:09 |
corvus | and actually there's no web on zuul01 | 19:09 |
clarkb | If things go really sideways I think we can stop zuul01's scheduler and restart the scheduler on zuul02 only | 19:09 |
corvus | (cause we don't have a load balancer for it) | 19:09 |
corvus | ++ | 19:09 |
fungi | and, if necessary, clear out zk | 19:10 |
corvus | and if things go really badly, clearing the zk state would be a good idea | 19:10 |
clarkb | It is worth noting that we have run into problems but we've been trying to work through them as they show up. corvus has been a great help with that. | 19:10 |
clarkb | So far we've had issues with retried jobs not being handled properly, nodepool requests getting stuck in a perpetually waiting state, and config errors not serializing properly | 19:10 |
fungi | i'm thrilled that we haven't needed to downgrade again | 19:10 |
corvus | i think we're at the point where the problems that have been cropping up have been minimal enough we can roll forward | 19:10 |
clarkb | There are a few more new issues showing up today that deserve followup after this meeting. Specifically johnsom's designate change error and elodilles' lack of a zuul.tag var on release jobs. There is also a job runtime estimate problem that corvus has a fix up for | 19:11 |
clarkb | corvus: ^ fyi details for both of those other things are in #opendev | 19:11 |
clarkb | Please do report any weird behavior. So far a lot of weird behavior I have noticed has been tracked back to the multi scheduler setup and reporting those things has been very helpful because now they are fixed :) | 19:13 |
clarkb | And ya the recovery at this point can probably be achieved by simply restarting onto one scheduler using zuul02 since the code seems quite stable in a single scheduler setup | 19:13 |
clarkb | Anything else to add to the zuul topic? or questions about the setup? | 19:14 |
clarkb | #topic User management on our systems | 19:15 |
clarkb | Last week I started pulling on a thread and noticed there were some improvements we could make to how we manage our users and uids | 19:16 |
clarkb | thank you to those who helped me make sense of it all | 19:16 |
clarkb | A number of changes have come out of that. Some potentially impactful if we aren't careful. I'd like to ask infra-root take a look over these changes so that we can start landing them when we are confident they are safe and are able to watch them go in | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/816869/ Be explicit about uid/gid ranges | 19:16 |
clarkb | This first change adjusts our configured ranges in adduser.conf and login.defs which different tools refer to when creating users | 19:17 |
clarkb | The rough layout is: 0-999 system, 1000-1999 unallocated, 2000-2999 for infra-root users, 3000-9999 host level users, 10k - 64k container users that need uids on the host as well for bind mounts. | 19:17 |
fungi | #link https://review.opendev.org/816869 Lower UID/GID range max to make way for containers | 19:17 |
fungi | also that one | 19:17 |
clarkb | That's the same one :) | 19:17 |
fungi | er, yep sorry. i guess it had a different title | 19:18 |
clarkb | the idea there is we've put a number of container services on high uids like 10001 but then when we create say the letsencrypt group it gets created as gid 10002 | 19:18 |
clarkb | fungi and I are thinking it would be better to not assign specific values to stuff that actually belongs to the system like our users and letsencrypt group and so on so we cap those at 9999 then we can be explicit about container uids/gids and ensure those are non overlapping in the >10k space | 19:19 |
fungi | well, our users are already statically assigned uids and gids | 19:19 |
clarkb | yup | 19:19 |
fungi | "our" personal accounts i mean | 19:19 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/816771 Clean up unused bootstrapping users | 19:19 |
clarkb | Is another user related change. This time to clean up users that I think we don't need. | 19:20 |
clarkb | This is mostly a belts and suspenders spring cleaning move and one that scares me slightly since we might've accidentally used one of those users for something functional but as far as I can tell this isn't the case | 19:20 |
fungi | well, it's also a security concern | 19:20 |
clarkb | right | 19:20 |
clarkb | I think it is important, but one we should take care with and review carefully | 19:21 |
fungi | as on some systems those accounts come with provider-supplied authorized keys | 19:21 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/816769/ Give gerritbot and matrix-gerritbot a shared user | 19:21 |
clarkb | This is a followon to 816869, and I'd like to have us work our way through giving most of our other containers similar treatment | 19:21 |
clarkb | lodgeit, refstack, hound, some other irc bots, etc | 19:22 |
clarkb | But I figure start slow make sure we've got things laid out the way we want first before we do a bunch of work that needs updating again later | 19:22 |
clarkb | The gerritbots seemed like a good example case | 19:22 |
clarkb | Then when that has been worked through we should also look into updating the uid for our mariadb containers to something other than 999 | 19:23 |
clarkb | The mariadb uid was what sparked this whole thing off in the first place and will likely be the last thing to be addressed :) | 19:23 |
clarkb | I think for mariadb we can probably run it as the user for the services it supports in most cases. For review run it as the gerrit user, for etherpad run it as the etherpad user and so on | 19:23 |
clarkb | Since we don't run shared mariadbs between services we can do that safely | 19:24 |
clarkb | If y'all can review with a very critical eye I would appreciate it. I'm happy to do additional testing (we already did some manual testing before settling on 816869) | 19:24 |
fungi | yeah, it *seems* to work as hoped | 19:25 |
fungi | part of the problem is that adduser and useradd rely on entirely different configs | 19:25 |
fungi | and package maintscripts and ansible roles use who knows which one | 19:26 |
clarkb | I think I found some evidence that package scripts do use both. Or rather one package uses one and another package uses the other | 19:26 |
fungi | right | 19:26 |
clarkb | the evidence for this is that one will do gidmax-1 and the other will do gidmin+1 | 19:26 |
fungi | so at least having the two of them in sync should help | 19:26 |
clarkb | and we see evidence of both on our systems | 19:26 |
clarkb | Any other questions or concerns to bring up on this topic? | 19:28 |
ianw | thanks for digging into it and explaining, i'll take a look at the changes too | 19:28 |
clarkb | #topic Open Discussion | 19:30 |
clarkb | I wasn't sure which other topics we would want to discuss so decided to trim the agenda down and let this portion of the meeting cover anything else | 19:30 |
clarkb | Gerrit3.4 upgrade stuff and fedora 35 work seem to be progressing, but not sure there is anything to share yet | 19:31 |
clarkb | I think there is a dib change I should review for containerfile stuff that I haven't been able to get to | 19:31 |
clarkb | #link https://review.opendev.org/c/openstack/diskimage-builder/+/817139 dib handle containerfile errors better | 19:31 |
ianw | yeah, that was working but last night centos-9 mirrors were broken | 19:31 |
ianw | speaking of | 19:31 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/817136 | 19:31 |
fungi | i could use some help on fixing bitrot in the storyboard-webclient builds, if anyone has tips for how to update a yarn.lock | 19:32 |
ianw | adds centos 9-stream mirrors, but i don't think we have space. my plan is to continue to remove debian-stretch | 19:32 |
fungi | #link https://review.opendev.org/814053 [opendev/storyboard-webclient] Bindep cleanup | 19:32 |
clarkb | ianw: just left a comment on the centos-9-stream change | 19:32 |
clarkb | fungi: to update a yarn.lock you remove the lock and then reinstall iirc. That produces a new lock file and if testing succeeds with that you can merge it | 19:33 |
fungi | reinstall what? | 19:33 |
clarkb | reinstall the javascript stuff using yarn | 19:34 |
ianw | i've always just done "yarn upgrade" i think | 19:34 |
clarkb | https://classic.yarnpkg.com/en/docs/cli/install/ I think | 19:34 |
fungi | oh, i can give that a try, thanks | 19:34 |
fungi | oh, also i've had this up for a while, to hopefully make our system-config jobs a little more robust... | 19:34 |
clarkb | ianw: ah that might be the better method then. I guess upgrade ignores the lock and writes a new one? | 19:34 |
fungi | #link https://review.opendev.org/813880 [opendev/system-config] Retry acme.sh cloning | 19:34 |
ianw | but either way, the *real* problem is going to be every javascript library that has maintained the same name but rewritten itself completely (see prior discussions in #opendev yesterday :) | 19:34 |
fungi | yeah, in this case the reason i need to do it is because one of the js packages needs updating to support python3 | 19:35 |
fungi | but i assume that will involve updates to a lot of other dependencies | 19:36 |
clarkb | ya you might have to fiddle with the requirements file equivalent to find something that produces a working set | 19:36 |
clarkb | when I've done this for zuul before it is a fun exercise | 19:36 |
fungi | and this for making the rejects in our iptables rules slightly more expressive... | 19:36 |
fungi | #link https://review.opendev.org/810013 [opendev/system-config] Switch IPv4 rejects from host-prohibit to admin | 19:36 |
clarkb | fungi: I've +2'd that one so you can approve it when you are able to watch it. Like the user changes it has potential for widespread pain if somehow it goes wrong (though again I don't expect any issues) | 19:38 |
fungi | yep | 19:39 |
fungi | i did some spot testing and confirmed it works as expected, at least | 19:39 |
ianw | oh, i approved it in between, but yeah, i'll be around | 19:40 |
clarkb | cool | 19:40 |
clarkb | I'll give this meeting a few more minutes to bring anything else up | 19:40 |
clarkb | But then I need to review some zuul fixes and eat lunch :) | 19:40 |
fungi | i'll be around too, of course | 19:40 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/816766 | 19:40 |
ianw | is a minor one to expose the db in gerrit testing | 19:41 |
ianw | i don't think we were noticing gerrit wasn't actually talking to the db correctly | 19:41 |
fungi | oh, right, i meant to look at that one, thanks | 19:41 |
clarkb | ++ that's a good update to our testing for gerrit | 19:41 |
clarkb | ianw: you might consider toggling the state too since we had the issue with the non unique keys thing in the past that was only hit on the change of an already created row | 19:42 |
clarkb | I approved it as is as this is clearly better than what we had before | 19:42 |
ianw | that's a good idea, can do that, just the same request with a DELETE | 19:43 |
ianw | I guess a loop of PUT DELETE PUT might work | 19:43 |
ianw | i wonder if a template to method: works | 19:44 |
fungi | seems like it should? it's just a string, right? | 19:45 |
clarkb | Sounds like that is it. Thanks everyone! | 19:48 |
clarkb | #endmeeting | 19:48 |
opendevmeet | Meeting ended Tue Nov 9 19:48:09 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:48 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.html | 19:48 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.txt | 19:48 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-09-19.01.log.html | 19:48 |
fungi | thanks clarkb! | 19:49 |
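
The uid/gid layout clarkb describes at 19:17 can be sketched as a small lookup. This is only an illustration and not code from system-config: the `UID_RANGES` table and `classify_uid` helper are hypothetical names, and the upper bound of 64000 is an assumption standing in for the "64k" mentioned in the log.

```python
# Hypothetical sketch of the uid/gid layout discussed in the meeting.
# Range boundaries come from the 19:17 message; the 64000 upper bound
# is an assumed reading of "64k", not a value taken from the change.
UID_RANGES = [
    (0, 999, "system"),
    (1000, 1999, "unallocated"),
    (2000, 2999, "infra-root users"),
    (3000, 9999, "host level users"),
    (10000, 64000, "container users"),
]


def classify_uid(uid: int) -> str:
    """Return the label of the range a uid (or gid) falls into."""
    for low, high, label in UID_RANGES:
        if low <= uid <= high:
            return label
    return "outside the described layout"


if __name__ == "__main__":
    # 10001 is the container uid mentioned in the log; 999 is the
    # mariadb uid that started the discussion.
    for uid in (999, 2500, 9999, 10001):
        print(uid, classify_uid(uid))
```

Under this layout the letsencrypt group gid of 10002 mentioned at 19:18 would land in the container range, which is the overlap the change aims to avoid by capping host-level allocation at 9999.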