clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Sep 17 19:00:28 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
NeilHanlon | o/ heya | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/OLEKXKOL5LLSYPUH6KMC5KSPZKYR24R6/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | I didn't have this in the email but a reminder that if you are eligible to vote in the openstack TC election you have ~1 day to do so | 19:01 |
NeilHanlon | ty for reminder | 19:01 |
clarkb | #topic Upgrading Old Servers | 19:03 |
clarkb | tonyb: anything new with the wiki changes? I haven't seen any updates since I last reviewed them. Also I suspect you have been busy with election stuff? | 19:04 |
tonyb | not doing election stuff this time. just moving slower than I'd like | 19:05 |
tonyb | I've been addressing your review comments and testing locally | 19:06 |
tonyb | I should have updates later today | 19:06 |
clarkb | cool, looks like there were also some comments from frickler | 19:06 |
clarkb | also would it help to reorganize the meeting so this topic went at the end given the timezone delta? | 19:06 |
tonyb | yup, I'm looking at those as well | 19:07 |
tonyb | it might, I do tend to miss the very beginning of the meeting | 19:07 |
tonyb | good news is that Australia will do its DST transition within a month | 19:08 |
clarkb | still easy enough to change the order up for next time. I'll try to remember to do so | 19:09 |
clarkb | anything else related to new servers? | 19:09 |
tonyb | not from me | 19:09 |
clarkb | #topic AFS Mirror Cleanups | 19:10 |
clarkb | Nothing really new on this topic from me. Other than that I keep finding distractions when it comes to pushing on xenial cleanups. I do think the next step there is removing dead/idle projects from the zuul tenant config so that we can reduce the number of things with xenial references then follow up with xenial removal in what remains | 19:11 |
clarkb | I may take this off the agenda until I'm able to pick that up again | 19:11 |
clarkb | #topic Rackspace Flex Cloud | 19:12 |
clarkb | Wanted to give an update on where we are with Rackspace's new Flex Cloud region but I may drop this from next week's agenda too as I think we're overall in a good spot | 19:13 |
clarkb | We're using the entirety of our quota and most things seem to be working | 19:13 |
clarkb | The small issues we have seen include: this is a floating ip cloud so some jobs have had to adjust to using private ips in their configs instead of public ips (since nodes don't know their public ips) | 19:13 |
clarkb | the mtu on the network interfaces is only 1442 instead of the common 1500. | 19:14 |
clarkb | And we sometimes have slowness scanning ssh keys from nodepool which was causing boot timeouts until we increased the timeout | 19:14 |
clarkb | I do wonder if possibly the mtu thing could cause the slowness ^ there. But it seems like fragmentation should negotiate more quickly than that | 19:14 |
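As context for the private-IP adjustment clarkb mentions, a minimal sketch (assuming Python is available on the node) of the common trick jobs use to discover the node's routable local address, since Flex nodes don't know their own public IPs:

```python
import socket

def local_ip(probe_host, probe_port=80):
    # Connecting a UDP socket sends no packets; it only asks the kernel
    # which source address it would route from for this destination.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((probe_host, probe_port))
        return s.getsockname()[0]

# In a real job you'd probe something off-node (e.g. the cloud API
# endpoint) to learn the private address peers should use in configs.
print(local_ip("127.0.0.1"))  # loopback probe, runnable anywhere
```

This is only a sketch of the idea, not what any particular job does.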
frickler | did we start using swift storage yet? | 19:15 |
clarkb | Continue to be on the lookout for any unexpected behaviors, they have been receptive to our feedback so far and we can continue to feed that back as well | 19:15 |
clarkb | frickler: not yet | 19:15 |
clarkb | we did add this cloud region (and openmetal) to the nested virt labels and johnsom reports they both seem to be working for that | 19:15 |
clarkb | uploading job logs to swift in that region is likely going to be the next good step we take | 19:15 |
fungi | related to my swift cleanup work though, it might be worthwhile long term to migrate from classic rackspace swift to flex swift and then ask them to just delete all the containers in our account once the current log data has expired | 19:16 |
clarkb | as far as setting swift up goes I think the first step is figuring out how auth is supposed to work for that and if our existing auth setup is functional | 19:16 |
clarkb | if it is then I think we can just add this as a new region in the list that we randomly select from. However I half expect we'll need some new settings and setup will be more involved | 19:17 |
frickler | so that would be worth keeping on the agenda then, I'd think | 19:17 |
clarkb | sure we can do that if we want to track the swift effort that way | 19:17 |
frickler | also maybe tracking when they're ready to ramp up quota? | 19:18 |
clarkb | fungi: I don't think you tried swift auth with our swift accounts in the spin up earlier right? | 19:18 |
fungi | i did not, no | 19:18 |
clarkb | ok so we don't have any idea yet on how that works | 19:18 |
clarkb | I'll see if I have time later this week to experiment | 19:18 |
clarkb | frickler: ya though I half expect that to happen in an email response to the feedback thread I started so not sure we need to check in weekly on the quota situation | 19:19 |
clarkb | any other questions or concerns or ideas related to the new cloud region? | 19:20 |
clarkb | sounds like no | 19:21 |
clarkb | #topic Etherpad 2.2.4 Upgrade | 19:21 |
clarkb | So we upgraded and everything seemed happy except for the meetpad integration | 19:21 |
clarkb | it turns out in version 2.2.2 or similar they updated etherpad to assume it is always in the root window for jquery (I may get some of these details wrong because js) | 19:21 |
clarkb | and since meetpad embeds etherpad this broke etherpad | 19:22 |
clarkb | other people using etherpad embedded (including jitsi meet users) noticed and reported the issue which got fixed in the first commit after the 2.2.4 release. Unfortunately there is no 2.2.5 release yet so we went ahead and deployed a new image that checks out the latest commit (by sha) as of the time of writing that change and this has fixed things | 19:22 |
clarkb | Ideally we won't run a random dev commit for very long so I'm still hopeful that 2.2.5 shows up soon. But things seem to work again | 19:23 |
tonyb | makes sense given the ptg is coming up | 19:23 |
fungi | yeah, we didn't want to leave it like that any longer than necessary | 19:24 |
fungi | just glad we remembered to test it once the update was deployed | 19:24 |
clarkb | if you notice any problems with etherpad or meetpad or the integration between the two please say something | 19:24 |
clarkb | but with my admittedly limited in scope and duration testing it seems to be working again | 19:24 |
clarkb | #topic Updating ansible+ansible-lint versions in our repos | 19:25 |
clarkb | I'm selfishly keeping this item on the agenda because I'm having a tough time getting reviews :) | 19:25 |
clarkb | #link https://review.opendev.org/c/openstack/project-config/+/926848 | 19:25 |
clarkb | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 | 19:25 |
clarkb | I'd like to get these landed just as part of the ubuntu noble default nodeset maneuver | 19:26 |
clarkb | I'm happy to address feedback if we feel strongly about any of those ansible rules (eg I can disable them and undo the updates) | 19:26 |
clarkb | s/ansible rules/ansible-lint rules/ | 19:26 |
clarkb | but I think getting that updated will help future proof us for a bit | 19:26 |
frickler | I was looking at those but still undecided whether to just accept it or complain like corvus did | 19:27 |
clarkb | basically don't take this as me advocating for anything in particular other than "run more up to date tools so we can keep up with python releases" | 19:27 |
corvus | i offer my moral support for skipping those rules :) | 19:28 |
clarkb | one upside to using a linter is we can avoid complaining about formatting ourselves. That said I would say that as a group we're pretty good about avoiding review nit picks like that and ansible-lint is extremely opinionated so we're kind of in a weird situation there | 19:28 |
clarkb | I suspect that other projects (nova maybe based on recent mailing list emails) get bigger benefits from just going with what the tool says to do | 19:29 |
frickler | all those "name"/"hosts" reorderings are the top ones I would likely want to not do | 19:30 |
frickler | but I can also see benefit in just following those, similar to python projects just using black and putting an end to all formatting discussions | 19:31 |
clarkb | ya that's the main thing; the easiest path is probably to just accept that someone else had an opinion and then fix it once | 19:32 |
clarkb | anyway if no one feels strongly enough to -1 maybe we should proceed? | 19:32 |
clarkb | we can discuss further in review | 19:33 |
clarkb | #topic Zuul-launcher image builds | 19:33 |
clarkb | The opendev/zuul-jobs project has been created and is hosting these image build configs now | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/zuul-jobs/+/929141 Build a debian bullseye image with dib in a zuul job | 19:33 |
clarkb | this change successfully builds a debian bullseye image and I think it just merged | 19:33 |
clarkb | I think the next step is to upload it to an intermediate location then configure zuul to fetch and upload that to clouds? | 19:34 |
corvus | that is one next step | 19:35 |
frickler | so one question I have about this: we use the cache built into our image to prime the new cache, do I understand this correctly? | 19:35 |
clarkb | corvus: ^ do we need to disable that image in the nodepool builders to prevent conflicts or will they coordinate via zk and it should work out? | 19:35 |
corvus | the other next step, which can also start right now is for someone to run with making more jobs for more images | 19:35 |
clarkb | frickler: correct. It's like we are doing mathematical induction on git caches | 19:35 |
corvus | frickler: yes | 19:35 |
frickler | but can we start the induction in case we lose our existing images? | 19:35 |
corvus | clarkb: it's safe to build duplicate images | 19:36 |
clarkb | frickler: I think the bootstrap process is to use an existing cloud image to run the job then the build will just take much longer to prime the cache essentially | 19:36 |
corvus | frickler: the build should be able to run on an empty cloud node (slowly) | 19:36 |
corvus | yep | 19:36 |
tonyb | I'm keen to look at the "building more images" thing | 19:36 |
clarkb | frickler: if we find that time is too long we could manually snapshot an instance with the git repos pre cloned and use that image | 19:36 |
corvus | we could test that case with a 3 hour job if we want | 19:36 |
corvus | tonyb: ++ | 19:36 |
NeilHanlon | (so I don't get distracted looking at opensearch, I have an update on rocky CI failures w.r.t. "should we mirror rocky") | 19:37 |
NeilHanlon | just tag me when you're ready :D | 19:37 |
clarkb | NeilHanlon: will do | 19:37 |
corvus | after we get uploads to object storage working ... | 19:37 |
clarkb | corvus: are the uploads to intermediate storage then eventually the clouds something you'll be working on? | 19:38 |
corvus | ... the code to have the zuul-launcher actually create cloud images is nearly ready to merge; once that's done, we should have all the pieces in place to watch a zuul-launcher manage a full image build and upload process | 19:38 |
corvus | we will need to add the openstack driver though :) | 19:39 |
clarkb | also are we running a zuul-launcher node? or do we need to do that too | 19:39 |
corvus | https://review.opendev.org/924188 | 19:39 |
corvus | safe to merge any time | 19:39 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/924188/ Run a zuul-launcher | 19:40 |
clarkb | thanks! | 19:40 |
corvus | clarkb: if anyone else wants to do the upload to intermediate storage, i welcome it; otherwise i should be able to get to it in a bit. | 19:40 |
corvus | one open question about that: what intermediate storage do we want? existing log storage? new rax flex container? | 19:40 |
fungi | also because these are huge, we should think carefully about expirations | 19:41 |
clarkb | corvus: due to the size of these images and not needing them to live for 30 days I wonder if we should use a dedicated container | 19:41 |
clarkb | it will just make it easier for humans to grok pruning of the content should we need to | 19:41 |
corvus | (incidentally, one thing we might want to consider if we don't end up liking the process with cloud storage is that we could use a simple opendev fileserver for our intermediate storage; but i like the idea of starting with object storage) | 19:41 |
clarkb | but then we can probably upload to any/all/one of the existing swift locations | 19:42 |
corvus | dedicated container sounds good. and i was thinking an expiration of a couple of days should be okay to start with. maybe we make it longer later, but that should keep the fallout small from any early errors in programming | 19:42 |
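As a sketch of the short expiration corvus suggests, Swift supports a per-object `X-Delete-After` header (seconds until the object is removed); the helper below is hypothetical, but the header itself is standard Swift API:

```python
def expiry_headers(days=2):
    # Swift deletes the object this many seconds after upload, so early
    # programming mistakes self-clean instead of accumulating images.
    return {"X-Delete-After": str(days * 86400)}

# e.g. passed as headers= to python-swiftclient's put_object() when
# uploading an image to the dedicated intermediate container
print(expiry_headers())
```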
clarkb | ++ | 19:42 |
corvus | so if the rax-flex auth question is answered by then, maybe do it there? otherwise... vexxhost? rax-dfw? | 19:43 |
clarkb | probably not vexxhost since we don't use swift there (we made ceph sad when we tried) | 19:44 |
clarkb | but rax-dfw or ovh-bhs1 seem fine | 19:44 |
corvus | dfw will use fewer intertubes | 19:44 |
clarkb | that seems like a good reason to choose it | 19:44 |
corvus | ok, so dedicated container in rax-flex or rax-dfw. sgtm! | 19:44 |
corvus | if someone gets rax flex working, maybe please just go ahead and create an extra container? :) | 19:45 |
fungi | yeah, i noticed that rax classic dfw to rax flex sjc3 communication goes through the internet (but at least they share a common backbone provider) | 19:45 |
clarkb | corvus: will do if I manage that | 19:45 |
corvus | thx! | 19:46 |
clarkb | #topic Mirroring Rocky Linux Packages | 19:46 |
clarkb | NeilHanlon: hello! | 19:46 |
NeilHanlon | hi :) | 19:46 |
NeilHanlon | so.. i can't get opensearch to do what I want, but | 19:46 |
NeilHanlon | https://drop1.neilhanlon.me/irc/uploads/44fb256b36a4f97b/image.png | 19:47 |
clarkb | looks like something keyed off of depsolving? | 19:47 |
NeilHanlon | green is "successful", red is a job which had a "Depsolve Failed" message | 19:47 |
NeilHanlon | yeah | 19:47 |
NeilHanlon | https://drop1.neilhanlon.me/irc/uploads/17b1fdc1dad12d0b/image.png | 19:48 |
NeilHanlon | i can't seem to generate a short URL otherwise I'd link to the viz | 19:48 |
fungi | so indicates builds which hit some sort of package access problem i guess | 19:48 |
NeilHanlon | yeah these I looked into and are almost all because the host got some mirror A for Appstream and mirror B for BaseOS which were not in sync | 19:48 |
NeilHanlon | i'm sure there's others which aren't matching this depsolve message, but the signal was clear for these ones at least | 19:49 |
NeilHanlon | https://paste.opendev.org/show/bHtL7sBLms4vpOIOkxBN/ here.. the opensearch url :D | 19:49 |
clarkb | cool. I think that does point to using our own mirrors would have a benefit | 19:49 |
fungi | which raises a related question then... when we mirror, how can we be sure we keep both of those in sync with each other? | 19:49 |
clarkb | (side note I wonder if the proxies for the upstream mirrors should do some ip stickiness) | 19:49 |
fungi | or are they mirrored as a unit? | 19:50 |
clarkb | fungi: we would be rsyncing from a single source so in theory that source will be in sync with itself | 19:50 |
clarkb | fungi: rather than rsyncing from multiple locations which may be out of sync | 19:50 |
NeilHanlon | right, yeah. using --delay-updates or so | 19:50 |
fungi | and yeah, we do delay deletions | 19:50 |
NeilHanlon | alternatively, I've sometimes set it up so that everything except the metadata is synced first, then the metadata is can be fetched -- but if you're using --delete that wouldn't work | 19:51 |
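A hedged sketch of the rsync invocation such a mirroring script might build, following the `--delay-updates` / delayed-delete discussion above (source and AFS paths here are illustrative placeholders, not the real mirror config):

```python
import subprocess

def rsync_cmd(src, dest, timeout=1800):
    # --delay-updates stages new files and swaps them in at the end,
    # shrinking the window where metadata and packages disagree.
    # --delete-after removes obsolete files only once new ones are live.
    return [
        "rsync", "-rlptDvz",
        "--delay-updates",
        "--delete-after",
        "--timeout", str(timeout),
        src, dest,
    ]

cmd = rsync_cmd("rsync://mirror.example.org/rocky/9/",
                "/afs/.openstack.org/mirror/rocky/9/")
# subprocess.run(cmd, check=True)  # uncomment on a real mirror-update host
print(" ".join(cmd))
```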
clarkb | so ya as mentioned before the next steps would be to ensure we've got enough disk (using centos 9 stream as a stand in I think we decided we do) then write a mirroring script (should look very similar to centos 9 stream and other rsync scripts) then an admin can create the afs volume and merge things and get stuff published | 19:52 |
NeilHanlon | alright, I can work on a mirroring script and open a change for that | 19:52 |
tonyb | Similar to CentOS I'm working on a tool that will ensure that all packages in the repomd are available in a mirror. which we can run after rsync before the vos release | 19:52 |
clarkb | NeilHanlon: that would be great. Then whoever ends up reviewing that can ensure the afs side is ready for it to land too | 19:52 |
tonyb | I don't think that will help with issues where BaseOS and Appstream are out of sync though :( | 19:52 |
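A rough sketch of the kind of repomd consistency check tonyb describes (the helper and the tiny demo repo below are illustrative, not the actual tool): parse `repodata/repomd.xml`, locate the primary metadata, and verify every listed package exists on disk before the vos release.

```python
import gzip, os, tempfile
import xml.etree.ElementTree as ET

NS = {"repo": "http://linux.duke.edu/metadata/repo",
      "common": "http://linux.duke.edu/metadata/common"}

def missing_packages(repo_root):
    """Return package paths listed in primary metadata but absent on disk."""
    repomd = ET.parse(os.path.join(repo_root, "repodata", "repomd.xml"))
    for data in repomd.findall("repo:data", NS):
        if data.get("type") == "primary":
            primary_href = data.find("repo:location", NS).get("href")
    with gzip.open(os.path.join(repo_root, primary_href)) as f:
        primary = ET.parse(f)
    return [pkg.find("common:location", NS).get("href")
            for pkg in primary.findall("common:package", NS)
            if not os.path.exists(
                os.path.join(repo_root, pkg.find("common:location", NS).get("href")))]

# Demo against a minimal synthetic repo: a.rpm exists, b.rpm does not.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "repodata"))
os.makedirs(os.path.join(root, "Packages"))
with gzip.open(os.path.join(root, "repodata", "primary.xml.gz"), "wt") as f:
    f.write('<metadata xmlns="http://linux.duke.edu/metadata/common">'
            '<package type="rpm"><location href="Packages/a.rpm"/></package>'
            '<package type="rpm"><location href="Packages/b.rpm"/></package>'
            '</metadata>')
with open(os.path.join(root, "repodata", "repomd.xml"), "w") as f:
    f.write('<repomd xmlns="http://linux.duke.edu/metadata/repo">'
            '<data type="primary"><location href="repodata/primary.xml.gz"/>'
            '</data></repomd>')
open(os.path.join(root, "Packages", "a.rpm"), "w").close()
missing = missing_packages(root)
print(missing)  # → ['Packages/b.rpm']
```

As tonyb notes, this catches a stale mirror but not BaseOS/Appstream being mutually out of sync.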
clarkb | we can also set the quota on the afs volume such that we don't accidentally sync down too much content | 19:53 |
fungi | yeah, if there's a semi-quick way we can double-check consistency, afs lets us just avoid publishing that state when it's wrong | 19:53 |
clarkb | better to hit a quota limit than completely run out of disk | 19:53 |
NeilHanlon | hear hear | 19:53 |
tonyb | fungi: I ran the tool on an afs node and it was < 1min per repo, which is quick enough for me | 19:54 |
clarkb | tonyb: that is plenty fast compared to how long rsync takes even not syncing any real data | 19:54 |
fungi | yeah, that's quick, especially where afs is concerned | 19:54 |
clarkb | NeilHanlon: and don't hesitate to ask if any questions come up in preparing that script | 19:54 |
clarkb | #topic Open Discussion | 19:55 |
clarkb | we have 5 minutes for anything else before our hour is up | 19:55 |
tonyb | I was thinking so, also very quick if we can avoid a bunch of job failures | 19:55 |
fungi | just a heads up that i won't be around much thursday/friday this week, or over the weekend | 19:55 |
* frickler will also be offline starting thursday, hopefully just a couple of days | 19:56 |
* tonyb will be more around again ... albeit in AU :/ | 19:56 | |
clarkb | thanks for the heads up | 19:57 |
clarkb | sounds like that may be just about everything. Thank you for your time today | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Sep 17 19:57:46 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-17-19.00.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-17-19.00.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-17-19.00.log.html | 19:57 |
clarkb | we can end it there and everyone can go find $meal a couple minutes early | 19:57 |
corvus | thanks! | 19:57 |
fungi | thanks clarkb! | 19:58 |
tonyb | Thanks all | 20:00 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!