Tuesday, 2024-08-27

clarkbJust about meeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Aug 27 19:00:21 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/CGWURWK2YK4LLA7VPHS5KXF63I47EOYJ/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbDue to timezones and travel and conference obligations I won't make it to next week's meeting.19:00
clarkb#topic Upgrading old servers19:01
clarkb#link https://review.opendev.org/c/opendev/system-config/+/921321 Wiki replacement ansible stack19:02
clarkbLooks like a couple of us have reviewed that stack since the meeting last week. Overall things look good to me. My main concern was how the ansible is set up to stop and start things on every run. I think we can probably live with that before we do the cutover if we prefer not to fix that up front19:03
clarkbor we can fix it upfront and avoid unnecessary restarts19:03
clarkblooks like frickler found some functional issues that need correcting in the job setup as well19:03
clarkbNot sure if frickler or tonyb are around for the meeting, but are there any questions about the reviews?19:05
clarkbsounds like no at least for now19:06
clarkbseparately tonyb also got some new Noble mirrors running. https://mirror02.sjc1.vexxhost.opendev.org/ I believe that is one of them and it appears to be working19:07
clarkbwe should probably go ahead and cut dns over and start thinking about cleaning up the old servers19:07
fricklerI'm around, but not sure about the question?19:07
clarkbfrickler: I was mostly opening the door for tonyb to provide feedback on our reviews if there was any. I know I ended up writing a number of comments19:07
fricklerok19:08
tonybyup they're very helpful.19:09
clarkbtonyb: any questions or concerns or updates?19:09
tonybI'm working on addressing them.   just slowly due to running up and down a mountain 19:10
tonybnope nothing specific yet19:10
clarkbcool. Thank you for continuing to push this along19:10
clarkb#topic AFS Mirror Cleanups19:10
fungiif it was a sacred mountain, i hope you wore curse-resistant footwear19:10
clarkbI don't have anything new here. I've been distracted by new clouds and summit/travel prep and this is an easy thing to deprioritize...19:10
clarkb#topic Rackspace Flex Cloud19:11
clarkbBut we have info about rackspace's new cloud setup and it sounds very promising19:11
fungiit's ready to be flexed19:11
clarkbbasically they are rolling out a new cloud deployment generation. It's currently still in some sort of pre-release state but they are happy for us to start kicking the tires on it.19:11
clarkbOur existing accounts work with it if we use a different keystone and region. fungi set up clouds.yaml for us and it seems to be working. I think we should treat this as a separate cloud though because it is so different even though the credentials align19:12
clarkbso we have new clouds.yaml entries for it and we'll have separate nodepool providers and so on19:12
clarkb#link https://review.opendev.org/c/opendev/system-config/+/927214 Enroll New cloud region into cloud launcher19:12
fungiyeah, the open change splits the credential vars in our private store, even though they're just copies of the same values at the moment19:13
clarkbI believe this is the next step in rolling out our usage of the flex cloud. Basically configure networking, ssh keys, and security groups19:13
clarkbThen when that is done we can figure out flavors and quotas, deploy a mirror node, then point nodepool at it19:13
clarkbit does appear they have a noble image so we don't have to upload our own like tonyb did with other clouds but can do that too if we want things to be in sync19:14
fungii already identified --flavor=gp.0.4.8 as being equivalent to our standard for job nodes19:14
fungithat's 8gb ram, 4 vcpus, 80gb rootfs19:15
fungialso has a 128gb ephemeral disk19:15
frickleriiuc our standard is 8 vcpus?19:15
fungidepends on how fast they are19:15
clarkbya on osic we did 4vcpus19:15
fungithese are supposedly "very fast"19:15
corvuswe've traditionally considered ram more important19:16
clarkband it sounded like if we have feedback on that they are open to it19:16
clarkbfor example if 4vcpus aren't enough we could probably ask for an 8vcpu flavor19:16
corvusas in, more important to keep consistent across providers19:16
clarkbbut ya they seemed confident these should be much quicker so hopefully we can get away with 4vcpu19:16
fungithe only other 8gb flavor i saw had a smaller rootfs and no ephemeral disk19:17
fricklerI also saw that we have a quota of 50 instances, but only 256GB ram, so that would only be 32 x 8 GB unless I miscalculated19:17
fungithey said it was a starter quota, so we can test it out and then let them know when we want to scale up19:17
clarkbthey also said they may need to build out capacity, but once it's there it should be easy for us to update the max-servers number19:18
fungibut yes, we should check the limits and adjust our initial max-servers accordingly19:18
fricklerah, o.k., so we should limit on the nodepool side for now, fine then19:18
fungiyep19:18
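The quota arithmetic frickler describes can be checked with a short sketch. It assumes the starter quota figures quoted above (50 instances, 256 GB RAM) and the 8 GB gp.0.4.8 flavor; the helper name is hypothetical. The result is only an upper bound for nodepool's max-servers, since any other servers in the same project (a mirror node, for example) would also consume quota.

```python
def max_servers_from_quota(instance_quota: int, ram_quota_gb: int, flavor_ram_gb: int) -> int:
    """Return how many nodes of one flavor fit inside both quota limits.

    Hypothetical helper: the binding limit is whichever quota runs out first.
    """
    by_ram = ram_quota_gb // flavor_ram_gb  # whole nodes that fit in the RAM quota
    return min(instance_quota, by_ram)


# Starter quota figures from the discussion: 50 instances, 256 GB RAM, 8 GB flavor.
print(max_servers_from_quota(50, 256, 8))  # 32
```

Here the RAM quota, not the instance count, is the binding limit, which is why capping max-servers on the nodepool side is the right short-term fix.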
clarkbso ya I think we keep pushing this forward and we should hopefully have a nice shiny cloud to use soon19:19
clarkb#topic Etherpad 2.2.2 Upgrade19:20
clarkbAs a reminder the concern with this upgrade is that 2.2.2 breaks how code is imported into etherpad which appears to break ep_headings plugin that we've used for years. We swapped that out with ep_headings2 in the 2.2.2 image build. fungi then tested a production etherpad dump into the held 2.2.2 node and it looks like ep_headings2 works with existing pads19:21
clarkbI think this means we can go ahead and upgrade (maybe after the summit?) as long as we take a couple of extra precautions. Specifically do a manual db dump prior to the upgrade to make rollbacks easier and maybe also give the current etherpad image a tag other than latest to make rolling back to it easier too19:21
clarkb#link https://review.opendev.org/c/opendev/system-config/+/926078 WIP Change implementing the upgrade19:21
clarkbThat change is WIP only because I was concerned about the compatibility between plugins but maybe I'll keep it WIP until we're comfortable with that upgrade path19:22
fungithe only thing i didn't find time to do was identify a pad which has level 5 or 6 headings to check, since the new plugin only has up to 4 heading levels19:22
clarkband if those break we can probably do a pad export then reimport without the formatting19:22
clarkbannoying but workable19:22
fungialso level 5 and 6 headings were uselessly small, so i doubt they saw much use19:23
fungi(smaller than the normal text size)19:23
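One way to find pads that use level 5 or 6 headings is to scan exported pad HTML for those elements. This is a sketch under an assumption that has not been verified here: that the old plugin's headings show up as `<h5>`/`<h6>` tags in an HTML export. The class and function names are hypothetical; only the standard library `html.parser` is used.

```python
from html.parser import HTMLParser


class DeepHeadingFinder(HTMLParser):
    """Count <h5> and <h6> elements, which a 4-level heading plugin cannot render."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {"h5": 0, "h6": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H5" and "h5" both match here.
        if tag in self.counts:
            self.counts[tag] += 1


def deep_heading_counts(exported_html: str) -> dict:
    finder = DeepHeadingFinder()
    finder.feed(exported_html)
    finder.close()
    return finder.counts


sample = "<h1>Agenda</h1><h5>tiny</h5><p>text</p><h6>tinier</h6><h5>again</h5>"
print(deep_heading_counts(sample))  # {'h5': 2, 'h6': 1}
```

Any pad with nonzero counts would be a candidate for the export/reimport fallback mentioned above.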
clarkbI'll try to plan for that after I return from the summit19:24
clarkbshould go quickly once we actually do it19:24
clarkb#topic Service Coordinator Election19:24
clarkbThe only nomination I saw during the nomination period was the one I sent. Based on our previous meeting I'm not surprised :)19:25
clarkbThat means I'm service coordinator again by default unless I missed any nominations. If there was one that was missed please call that out otherwise I'll consider this election activity done19:25
clarkb#topic OSUOSL ARM Cloud Issues19:27
clarkbThere were two distinct issues that have been noticed in the OSUOSL arm cloud since the linaro cloud shutdown19:27
clarkbthe first is that our nodepool builder for arm images (nb04) had run out of disk. I cleared out /opt/dib_tmp but image builds continued to fail which was due to losetup loopback devices all being consumed. I did a reboot to clear out that state and that seems to have corrected things19:28
clarkbWe have had at least one successful image build since I made those changes. Unfortunately those image builds are very slow (~7 hours). It sounds like some of that slowness may be due to how cinder volumes are implemented there. ramereth says that an ssd backed volume can be used if some cloud changes are made which may help19:29
clarkbThis would be a good improvement but I think we can limp along as is; it will just be slow19:29
clarkbSeparately the kolla team noticed that their container image build jobs are super slow and timing out on osuosl since the linaro shutdown too19:29
clarkbafter some digging it appears that fio shows poor io against the root disk and ephemeral disks in that cloud. Which is good because now we have some concrete measurable problems that we can feedback to osuosl and hopefully improve things with19:30
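fio is the right tool for this. As a rough illustration of what a sequential-write measurement involves, here is a small Python sketch; it is a crude stand-in, not fio's methodology. It writes a fixed amount of data, fsyncs so the page cache cannot hide a slow disk, and reports MiB/s. The function name and its defaults are hypothetical.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory: str, total_mib: int = 64, block_kib: int = 1024) -> float:
    """Crudely measure sequential write throughput (MiB/s) into `directory`.

    Writes roughly `total_mib` MiB in `block_kib` KiB blocks, then fsyncs so the
    data has to reach the device before the timer stops. Not a substitute for fio.
    """
    block = os.urandom(block_kib * 1024)
    blocks = max(1, (total_mib * 1024) // block_kib)
    written_mib = blocks * block_kib / 1024
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # flush dirty pages to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written_mib / max(elapsed, 1e-9)


# Point this at a directory on the disk under test (root disk or ephemeral mount).
print(f"{sequential_write_mib_per_s(tempfile.gettempdir()):.1f} MiB/s")
```

Running it once against the root disk and once against the ephemeral mount would give a quick side-by-side, but the numbers to send upstream should still come from fio.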
clarkbat this point I think we've got what we need to provide feedback so not much more to do19:32
clarkb#topic Updating ansible+ansible-lint versions in our repos19:32
clarkbAfter we updated the default nodeset to ubuntu noble we ran into issues with the versions of ansible + ansible-lint in our linter jobs19:32
clarkbbasically older ansible and old ansible-lint can't run under python3.12. Things work if we update ansible to ansible 8 to match what zuul runs and ansible-lint to latest. But doing so introduces new errors19:33
clarkb#link https://review.opendev.org/c/openstack/project-config/+/92684819:33
clarkb#link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/92697019:33
clarkbI've got these two changes which correct the problem for project-config and openstack-zuul-jobs.19:33
clarkbThe ozj change doesn't pass CI because we update the playbook to build openafs RPMs and that fails on arm64 due to a stale kernel that doesn't match package headers. The fixes to nb04 above should correct that in the next day or two I hope19:34
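A stale image shows up as the running kernel release not matching any installed kernel header package, so a build that compiles against kernel headers (like the openafs one) has nothing to build against. A minimal pre-flight check is sketched below under an assumption: an RPM-based layout where kernel-devel headers live under /usr/src/kernels/<release>, with <release> matching `uname -r`. Function names are hypothetical.

```python
import os
import platform


def matching_kernel_headers(running: str, available: list) -> bool:
    """Return True if headers for the running kernel release are installed."""
    return running in available


def check_host(kernels_dir: str = "/usr/src/kernels") -> bool:
    # Assumption: RPM-based distros install kernel-devel headers under
    # /usr/src/kernels/<release>, where <release> matches `uname -r`.
    running = platform.release()  # same value `uname -r` prints
    try:
        available = os.listdir(kernels_dir)
    except FileNotFoundError:
        available = []  # no headers installed at all
    return matching_kernel_headers(running, available)


print(check_host())
```

A False result on a freshly booted node would point at the image being older than the packages, which is what a successful nb04 rebuild should fix.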
clarkbreviews welcome, most of it is mechanical updates to make the linter happy. I didn't just turn off all the rules because the majority seem to make some sense (like naming plays and using fully qualified paths for action modules)19:36
clarkbI was less happy about capitalizing words and reordering yaml dicts to someone's preference for order19:36
corvuswhat's the yaml order thing?19:37
clarkbtrying to find an example, so many changes19:39
fungiansible-lint now cares what order certain associative array elements appear in19:39
fricklerlike this? https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970/4/playbooks/ansible-role-functional/pre.yaml19:39
clarkbcorvus: but basically they want when to go at the beginning of the block not the end19:39
mordredI'm guessing like this: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970/4/playbooks/ansible-role-functional/pre.yaml ?19:39
clarkbfrickler: that is a variant of it19:39
mordredI'm guessing that's "name should come first" ? 19:39
clarkbya one is name comes first but also when shouldn't be at the end19:39
fungiit's more than just name19:39
mordredjeez19:39
mordredthat's a dumb rule19:40
fungithen "when" conditions also should be second, i think?19:40
fungiyeah, that19:40
fungilike much of style linting, it's "someone has an opinion about this"19:40
mordredI don't know about you - but frequently I think a play reads better when "when" is at the end19:40
fungii'd have been fine with marking that rule skipped (most of the rules, or even the entire job, honestly)19:40
corvusfungi speaks for me19:41
fungii still think at least 95% of the issues ansible-lint catches for us would also be caught by a basic yaml parser, so when you weigh the remaining 5% against the time spent updating style for working code over and over...19:42
clarkbheh happy for followups to refine the ruleset either in followup changes or new patchsets19:43
corvus(and fwiw, i usually put when at the beginning)19:43
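The rule being debated is ansible-lint's key-order check. As a rough illustration only (not ansible-lint's implementation), the ordering described above, `name` first and nothing such as `when` trailing a `block`/`rescue`/`always` key, can be expressed over one parsed task mapping, assuming the YAML loader preserves key order. The function name is hypothetical.

```python
def key_order_problems(task: dict) -> list:
    """Report ordering issues in a single task mapping (insertion order preserved).

    Approximates the two checks discussed: `name` first, and block-style
    keys (block/rescue/always) last so that `when` cannot trail them.
    """
    keys = list(task)
    problems = []
    if "name" in keys and keys[0] != "name":
        problems.append("name should be the first key")
    block_keys = {"block", "rescue", "always"}
    seen_block = False
    for key in keys:
        if key in block_keys:
            seen_block = True
        elif seen_block:
            problems.append(f"{key!r} appears after a block/rescue/always key")
    return problems


task = {"name": "Install things", "block": [], "when": "ansible_os_family == 'Debian'"}
print(key_order_problems(task))  # ["'when' appears after a block/rescue/always key"]
```

The stated rationale for putting block keys last is avoiding mis-indentation between a block's last task and its parent, which is the part of the rule that is more than pure taste.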
clarkbbut this works and it does catch some useful things like the mode thing and being better about using modern names for things19:43
fungii agree the mode check is relevant, because 0644 and '0644' are different data types19:43
fungiand the latter is getting interpreted/cast by ansible as octal 644 rather than decimal 64419:44
fungihence entirely different numbers19:44
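Per Ansible's documentation for the `mode` parameter, a quoted string is converted from octal by Ansible itself, while an integer is passed through as-is. Under YAML 1.1 rules (which Ansible's PyYAML-based parser follows), an unquoted 0644 is already octal, but a bare 644 is decimal. A small sketch of that distinction, with a hypothetical helper name:

```python
def describe_mode(value) -> str:
    """Show which permission bits a mode value actually ends up as."""
    if isinstance(value, str):
        bits = int(value, 8)  # a quoted string like '0644' is read as octal
    else:
        bits = value  # an int is used as-is, whatever base it was written in
    return oct(bits)


print(describe_mode("0644"))  # 0o644
print(describe_mode(0o644))   # 0o644
print(describe_mode(644))     # 0o1204, i.e. not rw-r--r--
```

The decimal case is the one the linter's mode check guards against.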
corvusthat should be in ansible itself19:45
clarkbthat would be nice, unfortunately....19:45
fungiideally yes19:45
clarkbanyway reviews welcome19:45
clarkbThis was a followup to noble stuff so I wanted to ensure it didn't get forgotten19:45
clarkbon the whole though I think the noble default nodeset switch went relatively well. We had some things break but all were fixable in a straightforward manner19:45
corvus++19:46
fungiit was less churn than i anticipated19:46
clarkb#topic Open Discussion19:47
clarkbI wanted to note that zuul-jobs was updated to make prepare-workspace-git faster. This was done by moving the implementation of that role from ansible tasks to ansible library python code19:48
clarkbThis should speed up jobs quite a bit. The impact will be greater the more repos are involved in a job19:48
clarkbbe on the lookout for any issues related to it, though I did some spot checking and it seems to be working and is a speedup19:49
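Why the gain grows with the number of repos can be illustrated with a deliberately simplified cost model: if each Ansible task carries a fixed per-task overhead and the old role ran several tasks per repo, overhead scales with repo count, whereas a single library module pays it roughly once. The function, its parameters, and the numbers below are hypothetical and illustrative, not measurements of the actual role.

```python
def estimated_overhead_s(repos: int, tasks_per_repo: int, per_task_overhead_s: float,
                         single_module: bool) -> float:
    """Rough fixed-overhead model for preparing `repos` git repos.

    Task-based: every task pays the per-task overhead, once per repo.
    Module-based: one module invocation pays that overhead once for all repos.
    Ignores the actual git work, which is the same in both cases.
    """
    if single_module:
        return per_task_overhead_s
    return repos * tasks_per_repo * per_task_overhead_s


# Illustrative numbers only: 20 repos, 5 tasks each, 0.5 s overhead per task.
print(estimated_overhead_s(20, 5, 0.5, single_module=False))  # 50.0
print(estimated_overhead_s(20, 5, 0.5, single_module=True))   # 0.5
```

Real savings depend on how the old tasks looped and on connection latency, so this only shows the shape of the scaling, not its size.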
clarkbSounds like that may be everything?19:53
clarkbThank you for your time. I'll let you work out if you want a meeting next week before next tuesday19:53
clarkb#endmeeting19:53
opendevmeetMeeting ended Tue Aug 27 19:53:48 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:53
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.html19:53
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.txt19:53
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.log.html19:53
fungithanks clarkb!19:56

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!