Tuesday, 2021-09-07

clarkbmeeting time!19:00
clarkbwe will get started momentarily19:00
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Sep  7 19:01:10 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-September/000281.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI had nothing to announce19:01
fungiml upgrade19:01
clarkboh yup that is on the topic list but worth calling out here if people read the announcements and not the rest of the log.19:02
clarkblists.openstack.org will have its operating system upgraded September 12 beginning at 15:00UTC19:02
fungi#link http://lists.opendev.org/pipermail/service-discuss/2021-September/000280.html Mailing lists offline 2021-09-12 for server upgrade19:03
fungii also sent a copy to the main discuss lists for each of the different mailman sites we host on that server19:03
clarkbthe lists.katacontainers.io upgrade seemed to go well and we've tested this on zuul test nodes as well as a snapshot of that server19:04
clarkbshould hopefully be a matter of answering questions for the upgrade system and checking things are happy after19:04
clarkb#topic Actions from last meeting19:05
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-08-31-19.01.txt minutes from last meeting19:05
clarkbThere were no actions recorded19:05
clarkb#topic Specs19:05
clarkb#link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement19:05
clarkbcorvus: fungi: ianw: can I get reviews on this spec? I think it is fairly straightforward and approvable but wanted to make sure I got the details as others expected them19:06
clarkbthank you tristanC and frickler for the reviews19:06
fungithanks for the reminder, i've starred it19:06
clarkb#topic Topics19:07
clarkb#topic lists.o.o operating system upgrade19:07
clarkbas mentioned previously this is happening on September 12 at 15:00UTC19:07
clarkbThis upgrade will affect lists for openstack, opendev, airship, starlingx and zuul19:07
fungii also did some preliminary calculations on memory consumption for the lists.katacontainers.io server post-upgrade and it seems like it's not going to present any significant additional memory pressure at least19:08
clarkbthank you for checking that. I plan to be around for the upgrade as well19:08
fungiunfortunately i didn't check memory utilization pre-upgrade and we don't have that server in cacti, so no trending19:08
fungihowever i'm not super concerned that the lists.o.o server will be under-sized for the upgraded state19:09
clarkbit is bigger than I had thought previously too which gives us more headroom than I expected :)19:09
fungiafter the upgrade is concluded, the openinfra foundation is interested in adding a lists.openinfra.dev site and moving a number of foundation-specific lists to that, so i'll pay close attention to the memory utilization post-upgrade to make sure that addition won't pose a resource problem19:10
fungi(for those who aren't aware, our current deployment model uses 9 python processes for the various queue runners for each site)19:11
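A rough sketch of the process arithmetic behind that concern, assuming the five list domains named earlier (openstack, opendev, airship, starlingx, zuul) each count as one mailman site on this server. The site labels and the `runner_count` helper are illustrative only; per-process memory is not stated in the discussion, so only process counts are computed.

```python
# Nine queue-runner processes per site is the figure quoted in the discussion.
QUEUE_RUNNERS_PER_SITE = 9

# Assumption: one mailman site per list domain mentioned for this server.
current_sites = ["openstack", "opendev", "airship", "starlingx", "zuul"]


def runner_count(sites):
    """Total queue-runner processes for the given list of sites."""
    return QUEUE_RUNNERS_PER_SITE * len(sites)


before = runner_count(current_sites)
# Hypothetical label for the proposed lists.openinfra.dev site.
after = runner_count(current_sites + ["openinfra"])
print(f"queue runners now: {before}, with one more site: {after}")
```

Under these assumptions each additional site adds nine long-running Python processes, which is why post-upgrade memory use per process matters before a new site is added.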
clarkbI think that is about it for the lists upgrade. be aware of it and fungi and I will keep everyone updated as we go through the process19:11
clarkb#topic Improving OpenDev's CD Throughput19:12
fungialso once the ubuntu upgrade is done, i think we can start planning more seriously for containerized mailman 319:12
fungioops, sorry19:12
clarkbfungi: ++19:12
clarkbno worries I think that is the next step for the mailman services19:12
clarkbI haven't had time to dig into our jobs yet. Too many things kept popping up19:12
clarkb#link https://review.opendev.org/c/opendev/system-config/+/807672/ starts to sketch this out.19:12
clarkbBut ianw took a look yesterday19:12
clarkbianw: can you give us the high level overview of this change? It seems you've modified a few jobs then started working on pipeline updates? Looks like you're sketching stuff out and this isn't quite ready yet19:13
fungii've been sort of paying attention to what jobs are running on system-config changes now, and it still seems sane19:13
ianwyeah i was going to draw graphs and things but i noticed a few things19:14
ianwfirstly the system-config-run and infra-prod stages are fairly different; in that for system-config-run you just include the letsencrypt playbook, while for prod you need to run the job first19:14
ianwin short, i think we really just need to make sure things depend on either the base job, or the letsencrypt job, or their relevant parent (but there's only a handful of cases like that)19:15
ianwi don't see why they can't run in parallel after that19:16
clarkbcool. I also noticed you change how manage-projects runs a little bit. I believe we're primarily driving that from project-config today, but this has it run out of system-config more often?19:16
ianwyeah, manage-projects all i did was put the file matchers into the job: rather than in the projects19:17
clarkbianw: ok, I think it is done that way because we run it from openstack/project-config and file matchers are different there?19:18
ianwand also i think that should probably depend on infra-prod-review?  as in if we've rolled out any changes to review we'd want them to merge before projects19:18
clarkbthat might need a little bit of extra investigating to understand how zuul handles that and whether it is appropriate for manage-projects19:18
clarkbianw: ++19:18
ianwoh; that could be, yep.  something that probably wants a comment :)19:18
ianwthen i think infra-prod-bridge was another one i wasn't sure of in the build hierarchy19:19
clarkbthat helps me understand some of what is going on there. I can leave some comments after the meeting19:19
ianwthat pulls an updated system-config onto bridge; but i don't think that matters?  everything runs on bridge, but via zuul-checkout?19:19
clarkbinfra-prod-bridge also configures other things on bridge like the ansible version iirc19:20
ianwyeah, it was mostly a sketch, i see it syntax errored.  but it suggested to me that we can probably tackle the issue with mostly just thinking about it and formatting things nicely in the file19:20
fungiforcibly updating the checkout on bridge seems like the most sensible way to prevent accidental rollbacks from races in different pipelines too19:21
clarkbI think that each job is using the checkout associated with its triggering change19:21
clarkbthere is an escape hatch in that task that checks if it is running in a periodic pipeline in which case it uses master instead19:22
clarkbdefinitely seems unnecessary to do the checkout in a prior job19:22
fungiahh, okay, so we still need some mitigation if mutex prioritization is implemented (did that ever land?)19:22
clarkbya I'm still not sure if we decided if that was necessary or not. Going from change to periodic should be fine, but periodic to change may not be?19:23
clarkbthough if we prioritize the change pipeline then periodic to change would only happen when a new change arrives and should be safe19:23
fungioh, when you say "checks if it is running in a periodic pipeline in which case it uses master instead" you mean explicitly updates the checkout when the build starts rather than using the master branch state zuul associated with it when enqueued. yeah that should be good enough19:23
clarkbso ya I think we're ok as long as deploy has a higher priority than the periodic pipelines19:24
clarkbfungi: yes, that was my reading of it19:24
fungiyes, i concur19:24
ianwso should "infra-prod-bridge" be the base job?  as in infra-prod-base <- infra-prod-bridge <- infra-prod-letsencrypt <- <most other jobs>19:24
fungii forgot we had already arrived at that conclusion19:24
fungiianw: that sounds great to me19:25
ianwif we're thinking that say updating an ansible version on bridge should affect all following jobs19:25
clarkbianw: yes I think so but less for having system-config updates and more so that ansible and its config update before running more jobs19:25
ianwthese are all soft dependencies19:26
clarkbthat sounds right19:26
clarkbwe don't need -bridge to run if ansible isn't updating19:26
ianwi assume they "pass upwards" correctly.  so basically if there's no changes that match on the base/bridge for the change we're running, then everything will just fire in parallel because it knows we're good19:26
clarkbthat is my understanding of how the soft dependencies should work19:27
ianwwe may uncover deficiencies in our file matchers, but i think we just have to watch what runs and debug that19:27
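The soft-dependency behaviour described above can be sketched as a small model. This is an illustration of the assumption voiced in the discussion (a soft dependency on a job that is not running is skipped, and the ordering "passes upwards" to the next ancestor that is running), not Zuul's implementation. The two `infra-prod-service-*` names are hypothetical stand-ins for "<most other jobs>".

```python
# Soft dependencies following the chain proposed in the discussion.
SOFT_DEPS = {
    "infra-prod-base": [],
    "infra-prod-bridge": ["infra-prod-base"],
    "infra-prod-letsencrypt": ["infra-prod-bridge"],
    "infra-prod-service-gitea": ["infra-prod-letsencrypt"],  # hypothetical
    "infra-prod-service-paste": ["infra-prod-letsencrypt"],  # hypothetical
}


def effective_deps(job, matched):
    """Return the nearest ancestors of `job` that are actually going to run.

    A dependency that was filtered out (for example by file matchers) is
    skipped, and we look through it to its own dependencies instead.
    """
    found = set()
    for dep in SOFT_DEPS[job]:
        if dep in matched:
            found.add(dep)
        else:
            found |= effective_deps(dep, matched)
    return found


def run_waves(matched):
    """Group matched jobs into waves; jobs within a wave can run in parallel."""
    remaining = set(matched)
    done = set()
    waves = []
    while remaining:
        ready = sorted(j for j in remaining if effective_deps(j, matched) <= done)
        waves.append(ready)
        done |= set(ready)
        remaining -= set(ready)
    return waves


# Only service jobs matched: nothing upstream runs, so they start together.
print(run_waves({"infra-prod-service-gitea", "infra-prod-service-paste"}))
# Every job matched: the chain is honoured, then the services run in parallel.
print(run_waves(set(SOFT_DEPS)))
```

In this model a change that touches only service files yields a single wave, while a change that also matches the base or bridge jobs serialises them ahead of everything else. Whether the real scheduler looks through skipped jobs in exactly this way is the part that would need watching in practice.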
clarkbthat all sounds good. I'll try to leave those comments on the change and we can continue to refine this in review.19:29
clarkbAnything else on this subject?19:29
ianwnope, not from me19:29
clarkb#topic Gerrit Account Cleanups19:29
clarkbI finalized the previous batch of conflict cleanups which leaves us with 33 conflicts19:30
clarkbMy intention with these is to find a morning or afternoon where I can start writing down a plan for each one then email the users directly with that proposal19:30
clarkbThen assuming I get acks back I'll go ahead and start committing those fixes in a tmp checkout of All-Users on review02.19:30
fungiis the list of those in your homedir on review.o.o?19:30
clarkbI'll probably give users 2-3 weeks to respond and, if they don't, go ahead with my plan for them as well. Importantly once we commit these last fixes we should be able to fix any account while gerrit is online by adding and removing commits to all-users that pass validations19:31
clarkbfungi: yup all the logs and details are in the typical location including my most recent audit results19:31
clarkbI'll probably reach out if I need help with planning for these users otherwise I'll start emailing people this week hopefully19:33
clarkbis anyone interested in being CC'd on those comms?19:33
ianwsure19:34
clarkbthanks!19:35
clarkb#topic OpenDev Logo Hosting19:35
clarkbThe changes to make the opendevorg/assets image a thing landed this morning and gitea redeployed using those builds19:35
clarkbthank you ianw for working through this19:35
fungiit's awesome19:35
fungitruly19:35
clarkbWe do still need to update gerrit and paste to incorporate the new bits one way or another19:35
clarkbwith gerrit we currently bind mount the static content dir and could put the files in that location and serve them that way19:36
clarkbI'm not sure what the best method for paste would be19:36
clarkbianw: ^ you might have thoughts on those services?19:36
ianwi think the easy approach is pointing them at https://opendev.org/opendev/system-config/assets/ 19:37
ianwi can propose changes for them both19:37
clarkbthat works too, and thanks19:37
clarkbcertainly we can start there and that will be far more static for the gitea 1.15.x upgrade19:37
clarkbOnce this logo effort is done I'd like to see if we're happy enough with the state of things to do that gitea upgrade. I'll bring that up once logos are done19:38
clarkb#topic Rebooting gitea servers for host migrations in vexxhost sjc119:38
fungi08 is already done, yeah? just batching up the rest and then doing the lb?19:39
clarkbThis is a last minute addition as mnaser is asking us to reboot gitea servers to cold migrate them to new hardware19:39
fungidid the gerrit server already get migrated?19:39
clarkbyup 08 is done. 06 and 07 are pulled from haproxy and ready to go. Note that mnaser needs to do the reboot/cold migration on his end as we cannot trigger it ourselves so I'm working with mnaser to turn things off in haproxy and then he can migrate19:40
clarkbfungi: the gerrit server is/was already on new amd hardware and doesn't need this to happen19:40
clarkbI think previously mnaser had asked about doing review not realizing it was on the new amd stuff already19:40
clarkbin any case review wasn't on the list supplied this morning19:40
fungioh, cool, i remember him indicating some weeks back a need to migrate it, but maybe that was old info19:40
clarkbI can double check with him when he gets back to the migrations19:41
clarkbdo we have any opinions on how to do the load balancer? Probably just do it this afternoon (relative to my time) if the gitea backends are happy with the moves?19:41
clarkbthat potentially impacts zuul jobs but zuul tends to be quieter during that time of day19:41
fungiyeah, i mean we could try to pause all running jobs $somehow but honestly we warn projects not to have their jobs pull from gitea or gerrit anyway19:42
ianwdoes zuul need a restart for anything?19:42
clarkbianw: there are a few changes that we could restart zuul for but the one change that I really wanted to get in isn't ready yet or wasn't last I checked19:43
ianwi don't think the fix to the log buttons rolled out, and iirc corvus mentioned maybe we should just roll the whole thing19:43
clarkbhttps://review.opendev.org/c/zuul/zuul/+/807221/ that change19:43
corvusmy bugfix merged with 2 others and has gotten big19:43
clarkbI intend on rereviewing that change this afternoon19:43
corvusi'm not feeling a huge need to restart right now19:43
clarkbok19:44
fungii think a quick reboot for the lb is probably fine whenever19:44
clarkbIn that case probably the easiest thing for the load balancer is to just go for it19:44
fungiagreed19:44
clarkbsounds good I'll continue to coordinate with mnaser on that and get this done19:44
ianwi can do it in my afternoon if we like, make it even quieter19:44
clarkbianw: I think the problem with that is mnaser (or vexxhost person) has to do it19:45
* fungi does not know how to make ianw's afternoon even quieter19:45
ianwoh right, well let me know :)19:45
clarkbI'm mostly managing the impact on our side but mnaser is pushing the cold migrate button19:45
ianwfungi: you could come and cure covid and get my kids out of homeschool, that would help! :)19:45
fungid'oh!19:46
clarkb#topic Open Discussion19:46
clarkbThat was it for the agenda. Anything else worth mentioning?19:46
fungiopendev's testing and deployment is going to be featured in a talk at ansiblefest, for those who missed the announcement in other places19:47
fungi#link https://events.ansiblefest.redhat.com/widget/redhat/ansible21/sessioncatalog/session/16248953812130016Yue 19:49
fungiregistration is "free" for the virtual event, after you fill out 20 pages about how your company might be interested in ansible ;)19:50
corvusi hope it's good :)19:50
corvusi think it's about the coolest thing you can do with ansible so...19:51
fungiit certainly is cool, i'll give you that19:51
corvussessions are also ~20m so hopefully shouldn't be a slog19:52
clarkboh I like that19:52
corvussept 29-3019:52
clarkbI think shorter works better for virtual19:52
corvusyeah, they had some good info for speakers about exactly that19:52
corvuslike, your audience is in a different situation when virtual, so structure the talk a little differently19:53
fungithat's helpful19:53
clarkbSounds like that may be it.19:56
clarkbThank you everyone!19:56
clarkbSee you here next week. Same time and location19:56
fungithanks clarkb!19:56
clarkb#endmeeting19:56
opendevmeetMeeting ended Tue Sep  7 19:56:39 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:56
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-07-19.01.html19:56
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-07-19.01.txt19:56
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-07-19.01.log.html19:56
clarkband now time for some lunch :)19:57
