Nick | Message | Time |
---|---|---|
cardoe | Yeah that’s great. | 00:01 |
cardoe | I was trying to get OSH to go to gunicorn so we could do uvicorn in the future. | 00:01 |
Clark[m] | Ok the Gerrit meetup page says things happen at 4:45am Pacific time tomorrow | 00:05 |
Clark[m] | So somewhere there was a time zone conversion error | 00:05 |
Clark[m] | "12:00 AM - 1:00 PM GMT" is what the email said. I saw the AM and thought midnight not noon | 00:06 |
Clark[m] | But if you look closer it runs until 1pm | 00:06 |
Clark[m] | Anyway I'm not sure I'll make the 5am edition. fungi: frickler: tonyb: if you happen to be awake you may be interested. They should stream it on the gerritforgetv youtube channel | 00:07 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild our base python container images https://review.opendev.org/c/opendev/system-config/+/944789 | 00:26 |
clarkb | if 2.0.24 doesn't work maybe we should stop building for arm or something | 00:26 |
frickler | nb04 sure is creative ;) new error from the docker-compose cron: "the input device is not a TTY" | 07:58 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Rebuild our base python container images https://review.opendev.org/c/opendev/system-config/+/944789 | 08:10 |
frickler | ^^ just testing a bit, feel free to revert/update. if we only need the uwsgi image for lodgeit, we might not even need both py3.11 and py3.12? | 08:11 |
tweining_ | hi. you probably heard this question before, but I didn't find information about it. How can I change the email of my Gerrit account? Is that possible? | 10:58 |
tweining_ | ah, nevermind. I found it. it is simply the "preferred email" in the settings | 11:04 |
fungi | tweining_: yes, gerrit lets you add multiple addresses to your account, so that you can push changes with any of them as the committer, but whichever you set as your preferred address is the one it shows for your account and the one to which it sends notifications | 13:36 |
tweining_ | thanks | 13:37 |
fungi | clarkb: not sure about the fip mac addrs post, can't remember if that one went through moderation or not. i'll try to keep an eye out for header differences the next time i do though (i approved one just now for exceeding the 40kb message limit, but it was from an existing subscriber) | 14:01 |
*** | dhill is now known as Guest11735 | 14:21 |
fungi | heading out to run some errands, back in about an hour | 14:26 |
clarkb | frickler: oh! the docker-compose issue is because docker-compose and docker invert the -t/-T settings and when I manually tested I had a tty because I was running in a shell | 14:48 |
clarkb | I'll get a patch up for that shortly | 14:48 |
clarkb | frickler: removing arm64 was something that I was considering too. Doing that would prevent us from building any uwsgi images on arm64 because the base images won't match. But we don't have any of those images today and if we wanted to add them I think we would do so with not uWSGI but granian or gunicorn or uvicorn etc | 14:49 |
clarkb | frickler: so I think that solution is fine | 14:49 |
clarkb | fungi: ^ fyi for when you return, can you rereview 944789? I think that solution is ok | 14:49 |
opendevreview | Clark Boylan proposed opendev/system-config master: Fix nodepool image export cron https://review.opendev.org/c/opendev/system-config/+/945016 | 14:55 |
clarkb | I think ^ should fix the tty issue | 14:55 |
clarkb | frickler: fungi: the other idea I had was maybe doing RUN assemble uWSGI || assemble uWSGI and see if a second pass at compiling makes the build happier since it wouldn't be starting from scratch? But then I remembered python tries to do isolated builds now so that may not be the case | 15:01 |
clarkb | and then as for why mailman is ok but we aren't I wonder if this is a glibc vs musl problem | 15:02 |
clarkb | that could also explain why uwsgi builds on developer arm apple laptops (though that's a huge assumption stretch) | 15:02 |
fungi | errands were faster than anticipated, so i'll probably step out again for a quick lunch in a few after i catch back up | 15:15 |
fungi | clarkb: i approved 944799 but frickler has a cleanup question in a comment there, just heads up | 15:16 |
fungi | 944789 lgtm too, approved | 15:16 |
fungi | what were we using the uwsgi arm64 images for anyway? we don't run lodgeit on that arch | 15:18 |
clarkb | fungi: frickler: I think we keep the images in place forever? I don't know that there is a good reason to remove them. I suspect removing them would only potentially cause problems | 15:19 |
clarkb | fungi: we haven't/ don't use uwsgi on arm64 but the base images are built for both arches because it removes a step later if you do end up needing them for that arch | 15:20 |
clarkb | in this case I think uwsgi is definitely a dead end so rather than land a new thing that uses uwsgi on arm I would suggest a different wsgi server | 15:20 |
fungi | makes sense, but also seems fine to clean up | 15:20 |
fungi | okay, gonna grab some lunch while those gate, but shouldn't be too long. bbiab | 15:25 |
opendevreview | Merged opendev/system-config master: Drop python3.10 container image builds https://review.opendev.org/c/opendev/system-config/+/944799 | 15:36 |
fungi | okay, back, sorry for all the disappearances | 16:25 |
clarkb | I had to recheck the container rebuild change; there was a quay.io hiccup pulling the multiarch build deps | 16:27 |
clarkb | I think that is something we can optimize out of the uwsgi job if it is a persistent problem as we aren't doing multiarch anymore | 16:27 |
fungi | i see that | 16:27 |
clarkb | but for now it shouldn't hurt anything I don't think | 16:28 |
fungi | yeah, will keep an eye out for any recurrences | 16:28 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/945016 passes testing now too which should hopefully be the last fix for nodepool builder cron jobs | 16:28 |
clarkb | and now we seem to have hit docker rate limits. Maybe this is a recheck in a few hours situation | 16:36 |
fungi | ugh | 16:45 |
fungi | we're still incrementally getting away from those at least | 16:46 |
clarkb | ya this is the first one that has affected me in a while | 16:46 |
clarkb | I'm starting to look at booting an nb07 noble node in osuosl for arm test image builds | 16:46 |
clarkb | and I think we need a new mirror there too. But one thing at a time | 16:49 |
clarkb | hahahaha server does not support sse4_2. Ok I think we need to make that check conditional to x86 only | 16:57 |
clarkb | I'll work on a patch after I check launch node cleaned up after itself this launch | 16:58 |
fungi | a very astute observation, arm processors do indeed not support intel/amd x86 processor flags | 16:58 |
fungi | but we should make sure to adjust it in such a way that we don't need to revisit in, say, the hopeful future where we add a risc5 builder | 17:05 |
opendevreview | Clark Boylan proposed opendev/system-config master: Only check sse4_2 support on x86_64 https://review.opendev.org/c/opendev/system-config/+/945029 | 17:05 |
clarkb | fungi: yup I think ^ avoids that problem | 17:05 |
clarkb | I'll relaunch when that has landed | 17:06 |
fungi | yeah, lgtm. i assume you haven't tried it with that patch applied, i know we don't do any testing, but happy to approve further adjustments | 17:09 |
clarkb | ya I haven't. I could edit the file in the launcher venv if we want to see it in use | 17:10 |
clarkb | but seems safe enough to land and have ansible update the venv and then try | 17:10 |
fungi | i usually do that when fixing launch-node just because i'm lazy and expect that i'll push bad patches otherwise ;) | 17:10 |
clarkb | I'll do that now then | 17:11 |
fungi | it's not as if there's any production impact | 17:11 |
clarkb | it's running now but will take a few minutes to get to that check | 17:12 |
clarkb | it's doing stuff now so I think that fixed it | 17:15 |
fungi | cool, my approach with manually-invoked tools like launch-node is that it's fine to experiment "in production" and the whole reason to keep them in git is so that once i've worked out a fix nobody else should have to repeat the same research | 17:21 |
fungi | we have these processes to make things easier, not to get in our way | 17:21 |
clarkb | it seems to have succeeded launching too. I'm going to double check the node then will get some changes up | 17:26 |
fungi | awesome | 17:29 |
*** | dhill is now known as Guest11742 | 17:33 |
opendevreview | Clark Boylan proposed openstack/project-config master: Add config for the new nb07 nodepool builder https://review.opendev.org/c/openstack/project-config/+/945034 | 17:35 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Add nb07 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/945035 | 17:36 |
frickler | is there a specific reason for us not to mirror the python images from docker.io, too? | 17:37 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add nb07 to the inventory https://review.opendev.org/c/opendev/system-config/+/945036 | 17:38 |
clarkb | frickler: we do mirror a subset (we've been adding them as we go) | 17:38 |
clarkb | I think those three changes should get nb07 deployed | 17:38 |
fungi | "as we go" means as we move services to ubuntu noble, where we can use a new enough toolchain to support using buildset and intermediate registries with quay | 17:43 |
frickler | is nb03 gone? wondering because 945036 removes it from a list instead of nb04, but there still are e.g. dns records for it | 17:46 |
clarkb | oh yes I meant to make a note of that | 17:46 |
clarkb | it is gone from our inventory. It was in the old linaro cloud | 17:47 |
fungi | right, the entire provider is gone | 17:47 |
fungi | that reference was just missed cleanup from months ago | 17:47 |
clarkb | I posted some comments on that | 17:47 |
clarkb | just to remove confusion for now and the future | 17:47 |
frickler | do you want to clean up dns for that in the same patch or follow-up? | 17:48 |
fungi | agreed, that's yet still more missed cleanup | 17:48 |
clarkb | I'll do a followup | 17:48 |
clarkb | I just approved the existing dns change | 17:48 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove nb03 records https://review.opendev.org/c/opendev/zone-opendev.org/+/945038 | 17:49 |
clarkb | there | 17:49 |
opendevreview | Merged opendev/zone-opendev.org master: Add nb07 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/945035 | 17:56 |
opendevreview | Merged opendev/zone-opendev.org master: Remove nb03 records https://review.opendev.org/c/opendev/zone-opendev.org/+/945038 | 18:00 |
opendevreview | Merged openstack/project-config master: Add config for the new nb07 nodepool builder https://review.opendev.org/c/openstack/project-config/+/945034 | 18:10 |
opendevreview | Merged opendev/system-config master: Only check sse4_2 support on x86_64 https://review.opendev.org/c/opendev/system-config/+/945029 | 18:16 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Update the set-zuul-log-path-fact scheme to prevent huge url https://review.opendev.org/c/zuul/zuul-jobs/+/927582 | 18:16 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Add the build and tenant to the job header https://review.opendev.org/c/zuul/zuul-jobs/+/945042 | 18:16 |
clarkb | deps have merged for https://review.opendev.org/c/opendev/system-config/+/945036 should I approve it now? The alternative is waiting for me to eat lunch first if we are worried about it deploying and causing problems | 18:24 |
fungi | approved it | 18:28 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Update the set-zuul-log-path-fact scheme to prevent huge url https://review.opendev.org/c/zuul/zuul-jobs/+/927582 | 18:35 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add upload-image-s3 role https://review.opendev.org/c/zuul/zuul-jobs/+/944813 | 18:57 |
opendevreview | James E. Blair proposed opendev/system-config master: Add remaining clouds as zuul connections https://review.opendev.org/c/opendev/system-config/+/945049 | 19:18 |
opendevreview | Merged opendev/zuul-providers master: Add the DFW3 region for Rackspace Flex https://review.opendev.org/c/opendev/zuul-providers/+/943104 | 19:20 |
clarkb | fungi: tonyb: have a moment for https://review.opendev.org/c/opendev/system-config/+/945016 ? I'd go ahead and approve it except I've already broken this particular thing so figure an extra set of eyeballs is worthwhile | 20:07 |
fungi | i approve of breaking it again | 20:09 |
fungi | oh, i meant to also workflow +1 that one, sorrt | 20:13 |
fungi | s/sorrt/sorry/ | 20:13 |
clarkb | no problem I got it | 20:14 |
clarkb | maybe tomorrow we can start landing some python312 updates if I can get the base images to update today (I just rechecked the change) | 20:14 |
*** | elibrokeit__ is now known as elibrokeit | 20:15 |
fungi | that would be great, yep | 20:16 |
clarkb | the nb07 change should land in the next little bit. Then we have about half an hour of deploying things. If that goes well I'll shutdown the builder on nb04 and put it in the emergency file then request image builds | 20:32 |
clarkb | that should force nb07 to start building things | 20:32 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Refactor raxflex labels https://review.opendev.org/c/opendev/zuul-providers/+/945052 | 20:33 |
fungi | corvus: that ^ reminds me that i'm missing an essential morsel of understanding about niz... how are the label names mapped to image+flavor combinations in each provider? is there a separate file with the parameters? | 20:38 |
fungi | never mind! it's the labels.yaml file | 20:39 |
corvus | fungi: it's done globally in labels.yaml; so globally we say that the "niz-ubuntu-noble-16GB" label always means the "ubuntu-noble" image and the "16gb" flavor | 20:39 |
fungi | i should have looked harder (or at all) | 20:39 |
corvus | then it's the definitions of what "ubuntu-noble" or "16gb" mean that are different on each provider | 20:39 |
fungi | yep, perfect | 20:39 |
corvus | (and, honestly, not sure where any of this will land with respect to the refactoring in 945052 -- i think we're in the "no wrong answers" phase of figuring this out :) | 20:41 |
fungi | from a model perspective i was just thrown because providers.yaml has the labels, flavors and images, but not the mapping which connects them | 20:41 |
fungi | but that's just an organizational choice really | 20:41 |
corvus | there's also flavors.yaml and images.yaml with the global part of those definitions | 20:42 |
fungi | right, those seem more like declarations of the names with little else, but i suppose they could later include additional parameters | 20:42 |
corvus | so in providers.yaml, on the providers objects, we're still "attaching" those global image and flavor definitions | 20:42 |
corvus | yep | 20:42 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add rackspace classic to zuul-launcher https://review.opendev.org/c/opendev/zuul-providers/+/945055 | 20:44 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add rackspace classic to zuul-launcher https://review.opendev.org/c/opendev/zuul-providers/+/945055 | 20:45 |
opendevreview | Merged opendev/zuul-providers master: Refactor raxflex labels https://review.opendev.org/c/opendev/zuul-providers/+/945052 | 20:46 |
clarkb | the reason we have certcheck complaining about ptg.o.o appears to be stale apache worker processes. I'm going to restart apache on that server to clear those out | 20:46 |
fungi | sgtm | 20:47 |
opendevreview | Merged opendev/system-config master: Add nb07 to the inventory https://review.opendev.org/c/opendev/system-config/+/945036 | 20:47 |
clarkb | that's all done and now I'll pay attention to ^ | 20:49 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add Openmetal provider https://review.opendev.org/c/opendev/zuul-providers/+/945056 | 20:50 |
corvus | openmetal seems to have some opendev-specific flavors, and some generic ones. i believe we are only using the single opendev flavor | 20:51 |
corvus | i'm guessing we should make some extra flavors there? | 20:52 |
clarkb | corvus: ya I think the reason for that was the built in flavors didn't get the ratios right for us. but ya we have admin access in that cloud so can add flavors | 20:52 |
corvus | cool, that seems like a contribution opportunity for anyone who might be interested in that :) | 20:54 |
clarkb | corvus: what disk, memory, and vcpu do you think the 4gb and 16gb flavors should have? | 20:54 |
corvus | if we're not limited by disk, then 80 for all would be good; if we are, i think it'd be okay to do 40/80/80. | 20:55 |
clarkb | I think we're ok on disk. cpu is the major limit there | 20:55 |
corvus | for vcpu, i think 4/8/8 would be okay | 20:55 |
clarkb | ack | 20:56 |
corvus | (and i suppose if we end up not using all the cpu, we could increase the cpu for the 16gb, but it's not important, and i don't see it happening much elsewhere. i'm not sure which of cpu/ram we hit first) | 20:57 |
clarkb | corvus: ok created. I added a new -8GB flavor too just to fit into the naming scheme | 20:59 |
corvus | ooh cool thanks! | 21:00 |
clarkb | for anyone wondering what the process was there: I sshed to openmetal.us-east.opendev.org, then looked in the kolla info for admin login creds, then logged into horizon and used the flavor widget | 21:00 |
clarkb | corvus: I think your key isn't in place on those servers and I seem to recall that is because you weren't interested in admining the cloud. Totally fine if that is the case but I can add you if you like | 21:01 |
corvus | ack thanks, i'll let you know :) | 21:01 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add OVH provider https://review.opendev.org/c/opendev/zuul-providers/+/945057 | 21:03 |
clarkb | corvus: there is a syntax error in the first change in the stack. | 21:05 |
corvus | ha of course there is :) | 21:05 |
clarkb | good opportunity to add the new flavors to openmetal config too :) | 21:05 |
corvus | oh i think that "syntax error" won't get fixed until we restart the schedulers with the new config; i'll do that in a minute | 21:07 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add Openmetal provider https://review.opendev.org/c/opendev/zuul-providers/+/945056 | 21:07 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add OVH provider https://review.opendev.org/c/opendev/zuul-providers/+/945057 | 21:07 |
clarkb | ack keep in mind there is a deployment going on that will eventually attempt to interact with the zuul schedulers (minimally the job will run but it should noop) | 21:07 |
corvus | but i updated anyway | 21:07 |
clarkb | this is the new arm builder deployment. It edited the inventory so all the deployment jobs are running | 21:08 |
corvus | just looking at the job list, are we sure we can't run more in parallel? | 21:09 |
corvus | "can't" is the wrong word | 21:09 |
corvus | i mean effectiveness | 21:09 |
fungi | we could, though when it reaches the letsencrypt job that has to complete before the remainder can start | 21:10 |
clarkb | ya we can. The two big holdups are the beginning with bootstrap-bridge and -base running serially. Then things run in parallel until letsencrypt which is another synchronization point. Then things run in parallel afterwards to a degree but there are some deps between services after le too | 21:10 |
clarkb | happy to bump to 6 and see how that does | 21:10 |
corvus | yeah, it just got past LE, so it's doing the next 20 jobs 4 at a time | 21:10 |
fungi | given its runtime, we've seen the letsencrypt job go for a while by itself after all earlier jobs have completed | 21:10 |
corvus | 22 jobs | 21:11 |
clarkb | some of those can't be run in parallel, but many can. It's like gitea, gerrit, manage projects, zuul? in that order? | 21:11 |
clarkb | but still that's 18 | 21:12 |
fungi | right, increasing could still certainly speed up deploys for changes that touch the inventory, but those are generally a small subset of our deploy buildsets | 21:12 |
corvus | it's rolling through them pretty quickly, so i get that it's not going to save a huge amount of time, but it just doesn't feel right. :) | 21:12 |
corvus | i think i'm leaning slightly more toward "more" now :) | 21:12 |
fungi | i definitely don't object to increasing it, when we upped it to 4 i felt like we probably needed to observe it for a week before increasing further | 21:13 |
opendevreview | Merged opendev/system-config master: Fix nodepool image export cron https://review.opendev.org/c/opendev/system-config/+/945016 | 21:14 |
fungi | having resource consumption graphs to look at for bridge would also be helpful as we raise the parallelism, but spot checks suggest we have plenty of headroom from a memory and load average perspective when these happen | 21:14 |
clarkb | I think you can ssh port forward and look at cacti still | 21:15 |
corvus | looks like the zuul.conf update is waiting on a deploy (maybe this one) so i'm waiting for the job | 21:15 |
clarkb | but I haven't tried | 21:15 |
corvus | oh ha it hasn't been approved/merged | 21:15 |
corvus | https://review.opendev.org/945049 | 21:15 |
fungi | it has now! | 21:16 |
clarkb | double approved | 21:16 |
fungi | double secret probation | 21:17 |
opendevreview | Merged opendev/system-config master: Rebuild our base python container images https://review.opendev.org/c/opendev/system-config/+/944789 | 21:18 |
clarkb | woah it merged | 21:19 |
clarkb | service-nodepool succeeded. nb04 has been put in the emergency file and I stopped its builder service | 21:21 |
clarkb | and nb07 is building debian bookwork | 21:21 |
clarkb | *bookworm | 21:22 |
clarkb | I think shutting down nb04 may have "orphaned" debian-bookworm-arm64-dd297fd35c2e44f2bba8711f6e522ed2 in a building state in the db | 21:23 |
clarkb | but I can clean that up later if it doesn't get auto cleared out as other things build | 21:24 |
clarkb | I'll request a few other images build now too | 21:24 |
clarkb | ubuntu-noble-arm64 and rockylinux-9-arm64 were queued up | 21:25 |
fungi | the inventory change deploy seems to have completed too | 21:31 |
clarkb | yup hourlies are running, then the nodepool cron fix, then image promotion | 21:32 |
clarkb | I kinda wonder if the hourly running now is running with 944789 (because we use master in periodic/hourly) and then when we go back to deploy we'll "revert" to 945016 | 21:34 |
clarkb | I'm looking on bridge to see if I can tell | 21:34 |
clarkb | ya we have fd8241286dbafee3528035997e01b060392dac22 checked out which is rebuild container images | 21:34 |
clarkb | it doesn't matter in this case but something we should be careful about I guess | 21:34 |
clarkb | I guess a fix for that is to stop using master but the git state when the buildset is enqueued? | 21:35 |
clarkb | and then everything should go in lockstep? | 21:35 |
clarkb | importantly none of the hourly jobs are really ones that will have problems with things going back and forth | 21:36 |
clarkb | zuul is maybe the biggest one but it will just do something like add new project, remove new project, add new project which is "fine" | 21:36 |
clarkb | something to think about | 21:36 |
clarkb | oh hrm with 945016 holding the "lock" system-config is still fd8241286dbafee3528035997e01b060392dac22 | 21:38 |
clarkb | looking at the log we don't reset to master in this job. We skip that task so prepare-workspace-git must be doing the work for us? | 21:40 |
clarkb | I think this behavior is safe which is great. I just want to understand how it works | 21:40 |
clarkb | ok 944789 is running without infra-prod-bootstrap-bridge and has infra-prod-service-gitea and infra-prod-service-refstack enqueued | 21:43 |
clarkb | oh never mind I just can't read | 21:44 |
clarkb | hey the system is working, that's great. My skepticism being disproven is also great :) | 21:44 |
clarkb | I made some notes for digging into why things didn't rollback like I expected them to. I don't think I'm going to do that right now since that has brain melt potential. But I think I have enough notes to look back and see the order of operations in logs | 21:46 |
opendevreview | Merged opendev/system-config master: Add remaining clouds as zuul connections https://review.opendev.org/c/opendev/system-config/+/945049 | 22:42 |
clarkb | corvus: ^ that has deployed at this point | 23:02 |
corvus | ah cool, thx; i'll restart the scheds | 23:04 |
clarkb | wow the new nb07 server has already built noble and bookworm | 23:04 |
clarkb | it seems much quicker than before. Looking at cpuinfo it seems like the same hardware. I wonder if kernels and/or ubuntu have just gotten better on arm? | 23:05 |
clarkb | anyway I'm happy with its progress. I'll request it does even more image builds now | 23:05 |
clarkb | everything but openeuler has been requested. Maybe by tomorrow they will have all built and we can clean up the old server too. That would be nice | 23:07 |
corvus | restarted schedulers, web, and launcher with the new config file | 23:31 |
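
The sse4_2 discussion above (the check failing on the arm64 nb07 launch, and fungi's point about not having to revisit it for a future RISC-V builder) can be illustrated with a rough sketch. This is not the code from change 945029; the function names and the use of `/proc/cpuinfo` text as input are assumptions made for illustration. The idea shown is an allowlist: only run the x86 flag check when the machine type is x86_64, so any other architecture skips it without needing its own special case.

```python
# Hypothetical sketch: gate an x86-only CPU flag check on the architecture.
# None of these names come from the launch-node tooling; they are illustrative.

X86_64_MACHINES = ("x86_64", "amd64")


def needs_sse4_2_check(machine: str) -> bool:
    """Return True only for x86_64 hosts, the only ones with x86 CPU flags."""
    return machine.lower() in X86_64_MACHINES


def cpuinfo_has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Look for `flag` in any 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def check_cpu(machine: str, cpuinfo_text: str) -> None:
    """Raise if an x86_64 host lacks sse4_2; do nothing on other arches."""
    if not needs_sse4_2_check(machine):
        # arm64, riscv64, etc. are skipped rather than listed one by one.
        return
    if not cpuinfo_has_flag(cpuinfo_text, "sse4_2"):
        raise RuntimeError("server does not support sse4_2")


def check_local_host() -> None:
    """Convenience wrapper for a Linux host (assumes /proc/cpuinfo exists)."""
    import platform

    with open("/proc/cpuinfo") as f:
        check_cpu(platform.machine(), f.read())
```

Matching on the one architecture that has the flag, instead of excluding arm64 by name, is what keeps the check from needing another edit when a new architecture is added.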