clarkb | Just about meeting time | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Aug 27 19:00:21 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/CGWURWK2YK4LLA7VPHS5KXF63I47EOYJ/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | Due to timezones and travel and conference obligations I won't make it to next week's meeting. | 19:00 |
clarkb | #topic Upgrading old servers | 19:01 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/921321 Wiki replacement ansible stack | 19:02 |
clarkb | Looks like a couple of us have reviewed that stack since the meeting last week. Overall things look good to me. My main concern was how the ansible is set up to stop and start things on every run. I think we can probably live with that before we do the cut over if we prefer to not fix that upfront | 19:03 |
clarkb | or we can fix it upfront and avoid unnecessary restarts | 19:03 |
clarkb | looks like frickler found some functional issues that need correcting in the job setup as well | 19:03 |
clarkb | Not sure if frickler or tonyb are around for the meeting, but are there any questions about the reviews? | 19:05 |
clarkb | sounds like no at least for now | 19:06 |
clarkb | separately tonyb also got some new Noble mirrors running. https://mirror02.sjc1.vexxhost.opendev.org/ I believe that is one of them and it appears to be working | 19:07 |
clarkb | we should probably go ahead and cut dns over and start thinking about cleaning up the old servers | 19:07 |
frickler | I'm around, but not sure about the question? | 19:07 |
clarkb | frickler: I was mostly opening the door for tonyb to provide feedback on our reviews if there was any. I know I ended up writing a number of comments | 19:07 |
frickler | ok | 19:08 |
tonyb | yup they're very helpful. | 19:09 |
clarkb | tonyb: any questions or concerns or updates? | 19:09 |
tonyb | I'm working on addressing them. just slowly due to running up and down a mountain | 19:10 |
tonyb | nope nothing specific yet | 19:10 |
clarkb | cool. Thank you for continuing to push this along | 19:10 |
clarkb | #topic AFS Mirror Cleanups | 19:10 |
fungi | if it was a sacred mountain, i hope you wore curse-resistant footwear | 19:10 |
clarkb | I don't have anything new here. I've been distracted by new clouds and summit/travel prep and this is an easy thing to deprioritize... | 19:10 |
clarkb | #topic Rackspace Flex Cloud | 19:11 |
clarkb | But we have info about rackspace's new cloud setup and it sounds very promising | 19:11 |
fungi | it's ready to be flexed | 19:11 |
clarkb | basically they are rolling out a new cloud deployment generation. It's currently still in some sort of pre-release state but they are happy for us to start kicking the tires on it. | 19:11 |
clarkb | Our existing accounts work with it if we use a different keystone and region. fungi set up clouds.yaml for us and it seems to be working. I think we should treat this as a separate cloud though because it is so different even though the credentials align | 19:12 |
clarkb | so we have new clouds.yaml entries for it and we'll have separate nodepool providers and so on | 19:12 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/927214 Enroll New cloud region into cloud launcher | 19:12 |
fungi | yeah, the open change splits the credential vars in our private store, even though they're just copies of the same values at the moment | 19:13 |
clarkb | I believe this is the next step in rolling out our usage of the flex cloud. Basically configure networking, ssh keys, and security groups | 19:13 |
clarkb | Then when that is done we can figure out flavors and quotas, deploy a mirror node, then point nodepool at it | 19:13 |
clarkb | it does appear they have a noble image so we don't have to upload our own like tonyb did with other clouds but can do that too if we want things to be in sync | 19:14 |
fungi | i already identified --flavor=gp.0.4.8 as being equivalent to our standard for job nodes | 19:14 |
fungi | that's 8gb ram, 4 vcpus, 80gb rootfs | 19:15 |
fungi | also has a 128gb ephemeral disk | 19:15 |
frickler | iiuc our standard is 8 vcpus? | 19:15 |
fungi | depends on how fast they are | 19:15 |
clarkb | ya on osic we did 4vcpus | 19:15 |
fungi | these are supposedly "very fast" | 19:15 |
corvus | we've traditionally considered ram more important | 19:16 |
clarkb | and it sounded like if we have feedback on that they are open to it | 19:16 |
clarkb | for example if 4vcpus aren't enough we could probably ask for an 8vcpu flavor | 19:16 |
corvus | as in, more important to keep consistent across providers | 19:16 |
clarkb | but ya they seemed confident these should be much quicker so hopefully we can get away with 4vcpu | 19:16 |
fungi | the only other 8gb flavor i saw had a smaller rootfs and no ephemeral disk | 19:17 |
frickler | I also saw that we have a quota of 50 instances, but only 256GB ram, so that would only be 32 x 8 GB unless I miscalculated | 19:17 |
fungi | they said it was a starter quota, so we can test it out and then let them know when we want to scale up | 19:17 |
clarkb | they also said they may need to build out capacity, but once it's there it should be easy for us to update the max-servers number | 19:18 |
fungi | but yes, we should check the limits and adjust our initial max-servers accordingly | 19:18 |
frickler | ah, o.k., so we should limit on the nodepool side for now, fine then | 19:18 |
fungi | yep | 19:18 |
clarkb | so ya I think we keep pushing this forward and we should hopefully have a nice shiny cloud to use soon | 19:19 |
clarkb | #topic Etherpad 2.2.2 Upgrade | 19:20 |
clarkb | As a reminder the concern with this upgrade is that 2.2.2 breaks how code is imported into etherpad which appears to break ep_headings plugin that we've used for years. We swapped that out with ep_headings2 in the 2.2.2 image build. fungi then tested a production etherpad dump into the held 2.2.2 node and it looks like ep_headings2 works with existing pads | 19:21 |
clarkb | I think this means we can go ahead and upgrade (maybe after the summit?) as long as we take a couple of extra precautions. Specifically do a manual db dump prior to the upgrade to make rollbacks easier and maybe also give the current etherpad image a tag other than latest to make rolling back to it easier too | 19:21 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/926078 WIP Change implementing the upgrade | 19:21 |
clarkb | That change is WIP only because I was concerned about the compatibility between plugins but maybe I'll keep it WIP until we're comfortable with that upgrade path | 19:22 |
fungi | the only thing i didn't find time to do was identify a pad which has level 5 or 6 headings to check, since the new plugin only has up to 4 heading levels | 19:22 |
clarkb | and if those break we can probably do a pad export then reimport without the formatting | 19:22 |
clarkb | annoying but workable | 19:22 |
fungi | also level 5 and 6 headings were uselessly small, so i doubt they saw much use | 19:23 |
fungi | (smaller than the normal text size) | 19:23 |
clarkb | I'll try to plan for that after I return from the summit | 19:24 |
clarkb | should go quickly once we actually do it | 19:24 |
clarkb | #topic Service Coordinator Election | 19:24 |
clarkb | The only nomination I saw during the nomination period was the one I sent. Based on our previous meeting I'm not surprised :) | 19:25 |
clarkb | That means I'm service coordinator again by default unless I missed any nominations. If there was one that was missed please call that out otherwise I'll consider this election activity done | 19:25 |
clarkb | #topic OSUOSL ARM Cloud Issues | 19:27 |
clarkb | There were two distinct issues that have been noticed in the OSUOSL arm cloud since the linaro cloud shutdown | 19:27 |
clarkb | the first is that our nodepool builder for arm images (nb04) had run out of disk. I cleared out /opt/dib_tmp but image builds continued to fail which was due to losetup loopback devices all being consumed. I did a reboot to clear out that state and that seems to have corrected things | 19:28 |
clarkb | We have had at least one successful image build since I made those changes. Unfortunately those image builds are very slow (~7 hours). It sounds like some of that slowness may be due to how cinder volumes are implemented there. ramereth says that an ssd backed volume can be used if some cloud changes are made which may help | 19:29 |
clarkb | This would be a good improvement but I think we can limp along as is; it will just be slow | 19:29 |
clarkb | Separately the kolla team noticed that their container image build jobs are super slow and timing out on osuosl since the linaro shutdown too | 19:29 |
clarkb | after some digging it appears that fio shows poor io against the root disk and ephemeral disks in that cloud. Which is good because now we have some concrete measurable problems that we can feedback to osuosl and hopefully improve things with | 19:30 |
clarkb | at this point I think we've got what we need to provide feedback so not much more to do | 19:32 |
clarkb | #topic Updating ansible+ansible-lint versions in our repos | 19:32 |
clarkb | After we updated the default nodeset to ubuntu noble we ran into issues with the versions of ansible + ansible-lint in our linter jobs | 19:32 |
clarkb | basically older ansible and old ansible-lint can't run under python3.12. Things work if we update ansible to ansible 8 to match what zuul runs and ansible-lint to latest. But doing so introduces new errors | 19:33 |
clarkb | #link https://review.opendev.org/c/openstack/project-config/+/926848 | 19:33 |
clarkb | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 | 19:33 |
clarkb | I've got these two changes which correct the problem for project-config and openstack-zuul-jobs. | 19:33 |
clarkb | The ozj change doesn't pass CI because we update the playbook to build openafs RPMs and that fails on arm64 due to a stale kernel that doesn't match package headers. The fixes to nb04 above should correct that in the next day or two I hope | 19:34 |
clarkb | reviews welcome, most of it is mechanical updates to make the linter happy. I didn't just turn off all the rules because the majority seem to make some sense (like naming plays and using fully qualified paths for action modules) | 19:36 |
clarkb | I was less happy about capitalizing words and reordering yaml dicts to someone's preference for oder | 19:36 |
clarkb | *order | 19:36 |
corvus | what's the yaml order thing? | 19:37 |
clarkb | trying to find an example so many changes | 19:39 |
fungi | ansible-link now cares what order certain associative array elements appear in | 19:39 |
frickler | like this? https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970/4/playbooks/ansible-role-functional/pre.yaml | 19:39 |
clarkb | corvus: but basically they want when to go at the beginning of the block not the end | 19:39 |
mordred | I'm guessing like this: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970/4/playbooks/ansible-role-functional/pre.yaml ? | 19:39 |
fungi | er, ansible-lint i mean | 19:39 |
clarkb | frickler: that is a variant of it | 19:39 |
mordred | I'm guessing that's "name should come first" ? | 19:39 |
clarkb | ya one is name comes first but also when shouldn't be at the end | 19:39 |
fungi | it's more than just name | 19:39 |
mordred | jeez | 19:39 |
mordred | that's a dumb rule | 19:40 |
fungi | then "when" conditions also should be second, i think? | 19:40 |
fungi | yeah, that | 19:40 |
fungi | like much of style linting, it's "someone has an opinion about this" | 19:40 |
mordred | I don't know about you - but frequently I think a play reads better when when is at the end | 19:40 |
fungi | i'd have been fine with marking that rule skipped (most of the rules, or even the entire job, honestly) | 19:40 |
corvus | fungi speaks for me | 19:41 |
fungi | i still think at least 95% of the issues ansible-lint catches for us would also be caught by a basic yaml parser, so when you weigh the remaining 5% against the time spent updating style for working code over and over... | 19:42 |
clarkb | heh happy for followups to refine the ruleset either in followup changes or new patchsets | 19:43 |
corvus | (and fwiw, i usually put when at the beginning) | 19:43 |
clarkb | but this works and it does catch some useful things like the mode thing and being better about using modern names for things | 19:43 |
fungi | i agree the mode check is relevant, because 0644 and '0644' are different data types | 19:43 |
fungi | and the latter is getting interpreted/cast by ansible as octal 644 rather than decimal 644 | 19:44 |
fungi | hence entirely different numbers | 19:44 |
corvus | that should be in ansible itself | 19:45 |
clarkb | that would be nice, unfortunately.... | 19:45 |
fungi | ideally yes | 19:45 |
clarkb | anyway reviews welcome | 19:45 |
clarkb | This was a followup to noble stuff so I wanted to ensure it didn't get forgotten | 19:45 |
clarkb | on the whole though I think the noble default nodeset switch went relatively well. We had some things break but all were fixable in a straightforward manner | 19:45 |
corvus | ++ | 19:46 |
fungi | it was less churn than i anticipated | 19:46 |
clarkb | #topic Open Discussion | 19:47 |
clarkb | I wanted to note that zuul-jobs was updated to make prepare-workspace-git faster. This was done by moving the implementation of that role from ansible tasks to ansible library python code | 19:48 |
clarkb | This should speed up jobs quite a bit. The impact will be greater the more repos are involved in a job | 19:48 |
clarkb | be on the lookout for any issues related to it, though I did some spot checking and it seems to be working and is a speedup | 19:49 |
clarkb | Sounds like that may be everything? | 19:53 |
clarkb | Thank you for your time. I'll let you work out if you want a meeting next week before next tuesday | 19:53 |
clarkb | #endmeeting | 19:53 |
opendevmeet | Meeting ended Tue Aug 27 19:53:48 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:53 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.html | 19:53 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.txt | 19:53 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-08-27-19.00.log.html | 19:53 |
fungi | thanks clarkb! | 19:56 |
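
The quota arithmetic from the Rackspace Flex Cloud topic (a 50 instance quota, 256GB of RAM, 8GB per job node) can be sketched roughly as below. The function name `max_servers` and the choice to take the minimum of the two limits are illustrative assumptions; this is not how nodepool itself computes anything, only a way to sanity check an initial max-servers value by hand.

```python
def max_servers(instance_quota: int, ram_quota_gb: int, ram_per_node_gb: int) -> int:
    """Return how many nodes fit under both the instance and the RAM quota.

    Hypothetical helper for illustration only; the real limits should be
    read from the cloud before setting max-servers.
    """
    if ram_per_node_gb <= 0:
        raise ValueError("ram_per_node_gb must be positive")
    # Number of nodes the RAM quota alone allows (whole nodes only).
    by_ram = ram_quota_gb // ram_per_node_gb
    # The tighter of the two limits wins.
    return min(instance_quota, by_ram)


if __name__ == "__main__":
    # Numbers quoted in the meeting: 50 instances, 256GB ram, 8GB flavor.
    print(max_servers(50, 256, 8))
```

With the figures quoted in the meeting the RAM quota is the binding limit, which matches the "32 x 8 GB" estimate given by frickler.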
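
The point made about file modes in the ansible-lint topic, that the digits 644 mean different numbers when read as octal versus decimal, can be illustrated with a small standard-library sketch. It does not call ansible or a YAML parser; `mode_from_string` is a hypothetical helper that assumes a quoted mode string is interpreted as octal, as described in the discussion.

```python
def mode_from_string(mode: str) -> int:
    """Interpret a mode string such as '0644' as an octal number.

    Hypothetical helper: it mimics the octal reading described in the
    meeting and is not ansible's actual implementation.
    """
    return int(mode, 8)


# The string '0644' read as octal gives the usual rw-r--r-- permission bits.
from_string = mode_from_string("0644")

# The bare integer 644 is a decimal number, which is a different set of bits.
from_decimal = 644

print(from_string, oct(from_string))
print(from_decimal, oct(from_decimal))
print(from_string == from_decimal)
```

The two values differ, which is why a linter check on how modes are written can catch real mistakes rather than only style preferences.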