opendevreview | Lajos Katona proposed openstack/project-config master: Neutron: add review priority to ovn-bgp-agent and ovsdbapp https://review.opendev.org/c/openstack/project-config/+/938908 | 09:17 |
opendevreview | Merged openstack/project-config master: Neutron: add review priority to ovn-bgp-agent and ovsdbapp https://review.opendev.org/c/openstack/project-config/+/938908 | 13:41 |
opendevreview | Merged openstack/project-config master: Remove Freezer DR from infra https://review.opendev.org/c/openstack/project-config/+/938184 | 15:16 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run gitea with memcached cache adapter https://review.opendev.org/c/opendev/system-config/+/942650 | 15:58 |
clarkb | that update simply changes the memcached container image location to the new mirrored image | 15:59 |
jamesdenton | @fungi No VMs on PUBLICNET in Flex - we implemented service types a while back on those subnets to only allow Floating IPs and router attachments. This allows for tenants to manage their own network space while only consuming IPs for 1:1 NATs when necessary | 16:12 |
jamesdenton | so, unfortunately, you'll have to do the floating IP dance for each VM if you require ingress | 16:13 |
clarkb | jamesdenton: thank you for confirming | 16:14 |
clarkb | so the next step for us is to update our cloud launcher to build the networking for us | 16:14 |
jamesdenton | sure thing. Apologies for the confusion. We will likely also implement the 'get me a network' function, just aren't there yet | 16:14 |
fungi | makes sense, thanks for clarifying! | 16:15 |
fungi | and yeah, support for nova get-me-a-network would probably require a default network/port/router autoprovisioned with each new project when it's created, i don't think there's any magic glue for those steps (but it's possible i missed that it was added at some point) | 16:16 |
clarkb | fungi: should I push up a change to update cloud launcher or do you intend on doing that? | 16:41 |
fungi | i can pull it together after my next meeting, i expect it'll just be copying entries from the old raxflex-sjc3 config | 16:42 |
fungi | most of the definitions already exist, so just need to add the references | 16:42 |
clarkb | yes shouldn't need to define any new resources just apply them to the new cloud regions | 16:43 |
clarkb | zuul is very busy this morning | 17:03 |
frickler | jamesdenton: fungi: it would be nice to make "openstack network auto allocated topology create" work. afaict for that you'd need to a) set the default flag on PUBLICNET and b) create a public/shared subnet pool, see also https://docs.openstack.org/neutron/latest/admin/config-auto-allocation.html | 17:30 |
jamesdenton | frickler 100% agree - it's in our backlog | 17:30 |
frickler | if that works, you should be able to simply do "server create --net=auto" | 17:30 |
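For context, the admin-side setup frickler describes would look roughly like the following (a sketch based on the linked neutron auto-allocation guide; the network name matches the discussion above, but the prefix and pool name are illustrative):

```shell
# mark the external network as the default for auto-allocation
openstack network set --default PUBLICNET

# create a shared, default subnet pool that tenant subnets get carved from
openstack subnet pool create --share --default \
  --pool-prefix 203.0.113.0/24 --default-prefix-length 26 \
  shared-default

# a tenant can then verify (or pre-create) the topology and boot against it
openstack network auto allocated topology create --or-show
openstack server create --image <image> --flavor <flavor> --nic auto <name>
```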
frickler | jamesdenton: before IPv6 or after it? ;-D | 17:31 |
* jamesdenton runs away | 17:31 |
jamesdenton | before | 17:31 |
jamesdenton | frickler in the IPv6 implementations you've worked with, do they tend to be direct provider network attach or is it tenant-driven w/ subnet pools and dynamic routing? | 17:51 |
clarkb | jamesdenton: currently all of the clouds opendev uses use direct attachment to public networks for ipv4 and ipv6. Then it's a mixture: some are set up to also allow tenant driven networks behind that public net | 17:52 |
clarkb | re zuul being busy one thing I notice is the ready node count seems somewhat high. I wonder if that means that zuul isn't able to assign nodes as quickly as nodepool is building them | 17:53 |
clarkb | there are short drops in executor counts but in general all executors are available so I don't think we're hitting the executor resource limit checks | 17:53 |
jamesdenton | thanks clarkb. I assume those are setup as dual-stack, too? | 17:53 |
clarkb | jamesdenton: yes. We have in the past had some providers that were primarily ipv6 with only NAT'd ipv4 outbound. But we don't currently have any of those | 17:54 |
clarkb | I think the current set are rax classic, vexxhost, and ovh. rax classic and ovh are what I'd call provider networking. You just get an ip on a network and don't get to customize that. Vexxhost is the one that does the other thing iirc | 17:55 |
clarkb | where you can have easy mode or opt into doing the tenant driven network setup for each stack iirc | 17:55 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add networks and routers in new flex tenants https://review.opendev.org/c/opendev/system-config/+/942936 | 17:56 |
fungi | clarkb: ^ | 17:56 |
jamesdenton | cool, thank you very much | 17:56 |
fungi | and then ovh does something weird with little ipv6 /124 networks, i think? | 17:57 |
clarkb | I think ovh does /128 and /32 addrs | 17:58 |
fungi | actually no, looking at a server there, it's provider network with a /56 | 17:58 |
clarkb | oh its only ipv4 that does the small subnet then | 17:58 |
fungi | i guess it eventually changed | 17:58 |
clarkb | pretty sure ipv4 is a /32 | 17:58 |
clarkb | and linux knows to talk to your default route even if it isn't on the same subnet | 17:58 |
fungi | aha, yep ipv4 has a /32 mask | 17:58 |
jamesdenton | that's an interesting setup | 17:59 |
fungi | and yeah, the default v4 route in ovh uses a dev route | 17:59 |
fungi | inet 158.69.69.81/32 metric 100 scope global dynamic ens3 | 17:59 |
fungi | default via 158.69.64.1 dev ens3 proto dhcp src 158.69.69.81 metric 100 | 17:59 |
clarkb | I think it is a way of helping the instance understand it is in a pvlan like setup? | 18:00 |
fungi | 158.69.64.1 dev ens3 proto dhcp scope link src 158.69.69.81 metric 100 | 18:00 |
fungi | that's what adds the connection for the gateway | 18:00 |
clarkb | that way you don't try to ARP neighbors that can never respond. But yes it is different | 18:00 |
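As an illustration of the setup being described, manually reproducing what OVH's DHCP hands out would look something like this (addresses copied from fungi's paste above; a sketch, not how the cloud actually provisions it):

```shell
# assign the address with a /32 mask -- there is no on-link subnet at all
ip addr add 158.69.69.81/32 dev ens3

# add an explicit link-scope (device) route so the gateway is reachable
ip route add 158.69.64.1 dev ens3 scope link src 158.69.69.81

# the default route via that "off subnet" gateway then works, and the
# instance never tries to ARP for neighbors that can't respond
ip route add default via 158.69.64.1 dev ens3 src 158.69.69.81
```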
jamesdenton | we had discussed using subnet pools for IPv6 but i'm not super crazy about needing to depend on BGP agent for that to work, unless there's some other method | 18:00 |
clarkb | fungi: I +2'd your change but didn't approve it. Zuul is sufficiently busy that there is no rush to approve it I don't think. But feel free to do so if no one else manages to review it by the time zuul +1's it | 18:01 |
fungi | well, i'm going to pop out for a quick lunch now that my meetings are over, but can look at it when i get back | 18:01 |
clarkb | looks like a good chunk of the ready/available nodes are in ovh bhs1. I wonder if something about that particular region makes it slower to spin up jobs? | 18:02 |
clarkb | https://grafana.opendev.org/d/2b4dba9e25/nodepool3a-ovh?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all the difference between bhs1 and gra1 is unexpected at least | 18:02 |
frickler | jamesdenton: I've never built something with a provider network, always tenant networks and n-d-r as documented in https://docs.openstack.org/neutron-dynamic-routing/latest/install/usecase-ipv6.html | 18:13 |
frickler | but then I was a network engineer in an earlier life, so using BGP is kind of natural for me ;) | 18:16 |
clarkb | both zuul schedulers have low system loads so I don't think the schedulers are failing to farm out the work to the nodes | 18:30 |
clarkb | I half wonder if we could be having zookeeper contention. Load on the three zk cluster nodes is fine too. But zk is io limited iirc | 18:30 |
clarkb | that said if it was zk related I wouldn't expect bhs1 and gra1 to have such different graphs | 18:31 |
clarkb | zk is central to everything and slowness there would in theory impact everything somewhat consistently | 18:31 |
clarkb | looking at bhs1 node listing there are definitely unlocked ready nodes with several hour long timestamps. I'm picking one of those and will try and trace it back to zuul and see if there is any indicator for why | 18:35 |
clarkb | looks like node 0040009172 belongs to node request 300-0026459610 for openstack/tacker 942762,1 tacker-ft-v2-st-userdata-ccvp which is a queued job from about 6 hours ago. I'm beginning to wonder if the problem is lots of multinode requests not being able to be fulfilled due to general system contention | 18:38 |
clarkb | so not slowness of the processing but bucket filling problems with available cloud resources | 18:38 |
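A rough sketch of the tracing described above, assuming the standard nodepool CLI on a launcher (exact output columns vary by version):

```shell
# list nodes and look for long-lived ready/unlocked entries
nodepool list | grep ready

# find the outstanding request a suspect node was allocated to
nodepool request-list | grep 300-0026459610

# the zuul status page (or the launcher debug logs) then shows which queued
# job and change that request belongs to
```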
clarkb | hrm that node request eventually got declined by nl04's bhs1 provider because some node(s) failed | 18:40 |
clarkb | so these ready nodes are going back into the available bucket and we're not using them in subsequent requests for whatever reason. | 18:40 |
clarkb | meanwhile that job's node request is being processed elsewhere which I will try to track down next | 18:40 |
clarkb | rax-ord declined it due to failed node boots an hour and a half ago | 18:42 |
clarkb | I don't see anyone else picking it up yet (likely due to being at capacity in other regions?) | 18:42 |
corvus | so primary anomaly at this point is why wasn't 0040009172 reassigned? | 18:43 |
clarkb | theory time: as mentioned before tacker is quite demanding in its resource usage. Just a few (even two maybe?) tacker changes are enough to use all of our available quota. If enough of those requests fail in say bhs1 while we are already at quota that region won't be reused by those requests and we end up with large available node counts. Meanwhile every other region is at capacity | 18:44 |
clarkb | due to additional demand and not able to service those requests | 18:44 |
clarkb | I think the thing I'm most confused about is why bhs1 isn't using those nodes now to fulfill other requests to get other changes moving along. But maybe that is due to node request priority (They are handled in fifo order? will that prevent cheaper newer requests from being handled in providers that have already denied the older expensive requests?) | 18:44 |
clarkb | corvus: yup I think that is the main question | 18:44 |
clarkb | I suspect that things will eventually return to normal particularly as the day goes on and overall demand eases. (Some of this must be related to openstack release activity?) | 18:46 |
clarkb | but this illustrates a reason why projects like tacker should not do what they are doing | 18:46 |
clarkb | corvus: I see this in the nl04 launcher debug log: nodepool.PoolWorker.ovh-bhs1-main: Handler is paused, deferring request | 18:48 |
clarkb | corvus: is it possible that the provider poolworker sees we are at quota/capacity and pauses without first checking if it can fulfill the request out of ready unlocked nodes? | 18:49 |
clarkb | I haven't looked at that code in a while but I want to say we check ready unlocked nodes before we look to the cloud and its quota so I don't think that is the case but it would explain what we are seeing | 18:49 |
corvus | this is a bug that is exposed by this unusual situation | 18:52 |
corvus | when we unlock a multinode request when it fails, we don't deallocate the node from the request (this is the main bug). that means that node sits there, still allocated but unlocked. the cleanup method will eventually deallocate it once the request goes away. but that will be a long time with these big requests retrying. | 18:54 |
corvus | i think we can deallocate earlier, gimme a sec. | 18:54 |
clarkb | oh interesting. Is this behavior not the same for a single node request? | 18:57 |
corvus | remote: https://review.opendev.org/c/zuul/nodepool/+/942939 Clear allocation when unlocking failed node requests [NEW] | 18:57 |
corvus | clarkb: if a single node request encounters a failure launching a node, then the only node of interest is a failed one, not a ready one, so no one notices. | 18:58 |
corvus | and an unlocked failed node can always be deleted even if it's allocated | 18:59 |
corvus | so basically, the amount this bug is noticeable scales proportionally to the ratio of successful/failed node launches in a single request -- multiplied by the number of providers that can handle that request. | 18:59 |
corvus | so the current situation is tailor made to bring it to our attention. | 19:00 |
clarkb | got it | 19:01 |
corvus | i don't think there's any (safe) zk surgery we could do to speed this up; pretty much only just yanking node requests or rolling out that fix. | 19:01 |
clarkb | and when we have projects like tacker making many large multinode requests the problem is made worse and probably not noticeable in most other situations | 19:01 |
clarkb | corvus: if we clear the node allocation that would fix the underlying bug and allow the ready unlocked nodes to be reused? | 19:06 |
clarkb | put another way do we expect 942939 to be the solution or merely a mitigation? | 19:07 |
corvus | btw, now is a really good time to look at the status page and appreciate the motivation for making the "zoomed" out view. | 19:07 |
corvus | clarkb: yes and yes | 19:07 |
corvus | or rather yes and "solution" | 19:07 |
clarkb | re the status page the total change count does seem to have caused it to slow down a bit | 19:08 |
clarkb | still usable but not sure if that is worth looking into as well (do the firefox and/or chrome debugging tools have profilers?) | 19:08 |
corvus | it is and they do | 19:08 |
clarkb | ok looking at the nodepool source the node.allocated_to field is a request id and we can't assign the node to a different request with that set | 19:09 |
clarkb | so ya this should fix it up | 19:09 |
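For reference, that allocation can be inspected read-only in zookeeper, assuming the standard /nodepool znode layout (node id taken from the example above; hostname, port, and TLS details depend on the deployment):

```shell
# connect with the stock zookeeper CLI (read-only inspection, no surgery),
# then from its prompt dump the node record:
zkCli.sh -server <zk-host>:2181
# [zk: ...] get /nodepool/nodes/0040009172
#
# the record is JSON; an allocated_to still pointing at a declined or
# retrying request is the symptom the fix above clears on unlock
```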
clarkb | +2 on the change from me. Not something we can really force merge as we want the images to build and publish. We could manually update the images if this doesn't slowly resolve itself I guess | 19:10 |
corvus | i went ahead and put it in gate and promoted it, so we may yet benefit from it today. :) | 19:11 |
corvus | i'll go run the test suite locally | 19:12 |
fungi | catching back up | 19:16 |
fungi | seems like everyone here was a network engineer at some point | 19:16 |
fungi | and yeah, bgp prepends and pads still haunt my dreams | 19:17 |
clarkb | corvus: would dequeuing changes work as a safe workaround? | 19:17 |
corvus | clarkb: yes | 19:18 |
corvus | that would remove the node request | 19:18 |
clarkb | I think I'd be ok with that particularly for the check queue if things don't improve | 19:19 |
fungi | +2 from me on 942939 as well, thanks for the quick rundown and solution! | 19:20 |
clarkb | some changes in check are waiting on only a job or two. I'm half tempted to post to those changes a link to the buildset results for what did run then dequeue to try and improve things. However if they only have a job or two then the impact per change is likely small and I would have to do that for a large number of changes | 19:28 |
clarkb | any thoughts on whether we think that is a worthwhile mitigation to begin nowish? | 19:29 |
fungi | for tacker specifically, or everything? | 19:32 |
fungi | also yikes re: the status page, as corvus mentioned. i guess this is the end of openstack feature freeze week in action | 19:33 |
fungi | quick someone take a screenshot for dansmith ;) | 19:33 |
fungi | 165 items in the check pipeline and 57 in gate is definitely something i appreciate being summarized | 19:34 |
dansmith | it's getting to look like a minecraft kinda game with all the colored tiles | 19:34 |
dansmith | (in collapsed view) | 19:34 |
fungi | or minesweeper for us old folks | 19:34 |
clarkb | fungi: basically everything stuck for hours | 19:35 |
* fungi is afraid to click on any tiles, there might be a bomb | 19:35 |
clarkb | openstacksdk has a few too | 19:35 |
clarkb | kolla-ansible | 19:35 |
clarkb | etc | 19:35 |
fungi | yeah, i mean this is almost like "old times" if you multiply it by a factor of 10, but 8-hour delays on stuff in check is pretty extreme these days | 19:36 |
fungi | almost as long in the integrated pipeline in the gate too | 19:37 |
clarkb | ya the problem is they have jobs with many nodes in them that aren't getting their requests fulfilled and those requests are effectively locking out the quota we have | 19:37 |
clarkb | which causes the problem to snowball. If we remove/dequeue those changes then the locked resources can be freed for use again | 19:37 |
fungi | nice that resource prioritization has ensured that things in the gate are actually returning results faster than in check, on average | 19:37 |
fungi | granted not by much at the moment | 19:38 |
clarkb | of course if I start dequeuing and people immediately recheck we're likely to devolve back to this situation. But if we can get nodepool images updated in the interim that may be fine | 19:38 |
clarkb | but also in theory this will slowly recover as the total number of changes decreases | 19:39 |
clarkb | but to be determined if that will happen | 19:39 |
fungi | i haven't actually heard any complaints, am inclined to say we can just let things take their course and when we get restarted on corvus's bugfix they'll improve | 19:39 |
clarkb | wfm. We should monitor that things don't trend in the worse direction though. Current queue size for openstack is check: 164 and gate: 59 | 19:41 |
fungi | the system isn't particularly misbehaving, so much as certain projects are monopolizing our available quota. though yes we have a bug that's causing some of our quota to get unnecessarily blocked | 19:41 |
clarkb | there are also check-arm64 things queued up but they are separate resources so I don't think they matter for this particular issue | 19:41 |
fungi | right, i was basically ignoring the arm builds, those are best-effort whenever osuosl has room to run them | 19:41 |
clarkb | yes my main concern is that the rate of new changes is greater than our ability to process changes in which case we will never catch up again. but we can monitor that and make decisions later if that appears to be the case | 19:41 |
fungi | we're in the middle of the pst workday, near the end of the busiest week in the openstack development cycle, so all things considered this is not terrible | 19:42 |
fungi | this is probably our peak utilization until ~6 months from now | 19:43 |
fungi | i suspect that within 4 hours the inbound rate will slow and then we'll start catching up | 19:43 |
fungi | per https://grafana.opendev.org/d/21a6e53ea4/zuul-status we're clocking 350 concurrent jobs with 450 nodes in use and a 1.5k node backlog, so not terrible | 19:46 |
clarkb | yup that is possible. It is also possible that we deadlock with all nodes in a ready unlocked state but not useable by any jobs. I think that is unlikely at this point given our ability to continue processing things. I just want us to keep an eye on it | 19:46 |
clarkb | the available node count has fallen slightly from its earlier peak too which is a good sign | 19:47 |
fungi | we reached nearly 600jph at the start of the pst workday and are now down to about half that | 19:47 |
fungi | i don't see that deadlock likely, no, the | 19:47 |
clarkb | 147 appears to be the recent peak and we are down to 130 | 19:47 |
fungi | "available" nodes on the test nodes graph are pretty steady | 19:48 |
fungi | don't seem to be climbing | 19:48 |
clarkb | if you look at the 12 hour graph its been steadily climbing over that period but does appear to be plateauing now | 19:48 |
fungi | good point, i was on the 6-hour view, but yes i agree | 19:49 |
fungi | anyway, it looks to me like we could be using our available quota more efficiently, but nothing's really "broken" just overused in very inefficient ways by some projects | 19:50 |
clarkb | ya the problem is mostly that this is a feedback loop on itself | 19:52 |
clarkb | due to the bug | 19:52 |
clarkb | as long as that loop starts trending in the right direction we're fine | 19:52 |
fungi | i mostly focus on the node requests backlog graph, when that begins to trend downward we're recovering. it doesn't particularly seem to be increasing at this point | 19:53 |
fungi | but the prior increases were choppy over the past few hours, so it's hard to say just yet | 19:54 |
fungi | other than that and the nodes stuck in available status that aren't freed up until the related multinode builds complete, things are looking pretty healthy. executors are all accepting, balance of running builds is consistent across all executors, none of the governors look at risk of kicking in... | 19:56 |
clarkb | ya that is a side effect of the bug too. We aren't running as many jobs as we are capable of | 19:57 |
clarkb | so overall load is relatively low | 19:57 |
clarkb | which is good. Better to have one problem and not two | 19:57 |
fungi | znode count and zk data size are trending steadily upward, but i suspect those will tip back over when we begin to burn down outstanding node requests | 19:57 |
clarkb | yup those are directly related to the node requests I think | 19:57 |
clarkb | fungi: if you get a second can you look at the openstack helm repo merge proposal? I suspect the listed plan won't work (particularly the git review step as that will try to push hundreds/thousands of commits?) | 19:59 |
clarkb | that email just went to openstack-discuss | 19:59 |
fungi | thanks, looking | 19:59 |
clarkb | essentially gerrit wants merge changes to consist of changes that were already in review (not necessarily merged themselves I don't think) so git reviewing would create those changes or maybe just fail if those changes aren't there already? | 20:00 |
clarkb | but I'm not positive of that | 20:00 |
fungi | yeah, i expect we're better off with the traditional way we've done it, push the new repo state in directly rather than as reviews | 20:01 |
clarkb | I need to eat lunch now but can follow up if you don't want to | 20:07 |
clarkb | it will just be after lunch that I get to it | 20:07 |
fungi | already writing a reply | 20:07 |
clarkb | thanks | 20:07 |
corvus | the nodepool tests blowed up; probably needs the grpcio pin from zuul; if anyone wants to port that over that would help; otherwise i will do it when i finish this sandwich | 20:12 |
fungi | working on it | 20:18 |
Clark[m] | In positive news the upgraded grafana seems to be working | 20:19 |
fungi | very well, i might add | 20:20 |
fungi | knock on wood, the node requests backlog does seem to be falling, it's back down to where it was a few hours ago now | 20:25 |
fungi | gate pipeline size is shrinking, albeit slowly | 20:26 |
fungi | today i discovered ietf rfc 3676, and turned it on in my mua/composer | 20:35 |
fungi | only 21 years late | 20:35 |
fungi | https://www.rfc-editor.org/rfc/rfc3676 | 20:36 |
fungi | old dog learns new trick | 20:36 |
corvus | approved and promoted | 20:44 |
corvus | incidentally, in a bit of irony, i keep promoting these changes over the nodepool-in-zuul changes that implement quota handling and intelligent provider selection. (intelligent may be overstating it, but it's an improvement) | 20:56 |
clarkb | Looking at queues more closely I think that gate resets are making things worse | 20:56 |
clarkb | https://zuul.opendev.org/t/openstack/build/a9b6856edca946699ec5596bbb43b28a/log/job-output.txt#507 apparently doc jobs build pdfs in pre-run and those can timeout leading to retries as well | 20:56 |
clarkb | there were some openstacksdk installation issues in some jobs too | 20:57 |
clarkb | that stuff is all more typical for feature freeze / release activity so not necessarily unexpected | 20:57 |
clarkb | but I think it does play into the slow backlog processing | 20:57 |
corvus | doc jobs do what now? | 20:58 |
clarkb | corvus: in that log I pasted a link to above it seems they build the docs in pre-run not in run so when that timed out building pdfs the job went into retry status | 20:58 |
clarkb | the 3rd retry is queued up now | 20:59 |
corvus | i think that's against the rules | 20:59 |
clarkb | https://zuul.opendev.org/t/openstack/build/360525c8ec954a93a35aee77abfcab97 this is the openstacksdk issue. This one didn't cause restarts but did cause a gate reset | 20:59 |
clarkb | I'll run down how/when that changed | 21:01 |
clarkb | that == docs run in pre-run | 21:01 |
clarkb | oh it isn't doing the actual build it is just trying to install the pdf build deps but since this involves latex and similar the package list must be huge? | 21:02 |
corvus | hrm it says its just prereqs | 21:03 |
corvus | yeah | 21:03 |
corvus | i guess it's no_logs though? | 21:03 |
clarkb | so consistently slow nodes are timing out trying to install a big package list | 21:03 |
clarkb | corvus: no it doesn't appear to be but it is a package task not a shell/command task so we don't get streaming logs unless the task completes | 21:03 |
corvus | oh yeah i see that | 21:03 |
corvus | https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/roles/prepare-build-pdf-docs/tasks/main.yaml | 21:04 |
clarkb | ya thats the one | 21:04 |
corvus | annoying :( but doesn't look like anything wrong in the job. maybe worth a timeout bump | 21:04 |
clarkb | agreed. I also wonder if there are leaner toolchains for pdf generation. But I have no idea | 21:04 |
corvus | i mean, that's not that big of a package list | 21:04 |
corvus | i know (believe me i know) that will pull in a lot of deps, but they're all tiny | 21:05 |
clarkb | the latex stuff explodes in size | 21:05 |
clarkb | hrm I don't think I have latex installed anymore but I seem to remember it being several gigs of stuff when I did. But probably worth inspecting directly before assuming | 21:05 |
corvus | oh inkscape is in there :) | 21:06 |
clarkb | this ran on jammy. Let me start a container and apt-get install that list and see what happens | 21:06 |
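Something like this reproduces that check in a throwaway container (the package names here are only a representative subset of what such a job pulls in; the real list lives in the prepare-build-pdf-docs role linked above):

```shell
# answer "no" so apt only prints the download / installed-size summary
docker run --rm ubuntu:22.04 bash -c '
  apt-get update -qq &&
  apt-get install --assume-no \
    texlive-latex-base texlive-latex-extra texlive-fonts-recommended \
    latexmk inkscape fonts-freefont-otf 2>&1 | tail -n 5
'
```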
corvus | 0 upgraded, 415 newly installed, 0 to remove and 33 not upgraded. | 21:06 |
corvus | Need to get 408 MB of archives. | 21:06 |
corvus | After this operation, 1468 MB of additional disk space will be used. | 21:06 |
corvus | clarkb: ^that :) | 21:07 |
clarkb | ok so 1.3 ish gigs not terrible but not small either | 21:07 |
clarkb | slow node networking maybe? | 21:08 |
clarkb | there are successful recent runs of that particular job against that project so probably is related to where it ran | 21:09 |
corvus | yeah, or maybe we're actually pushing our mirrors? | 21:09 |
clarkb | system load on the dfw mirror is currently reasonable. But it could be related to openafs caches too | 21:10 |
clarkb | dfw is closest to the afs fileservers though so should be least impacted by the rtt openafs problems | 21:11 |
clarkb | the available node count fell down to 126ish | 21:13 |
clarkb | it repeaked at 149 recently before doing so. A good sign that in general the growth isn't happening I think | 21:13 |
clarkb | speaking of minesweeper https://danielben.itch.io/dragonsweeper is a fun variant | 22:20 |
clarkb | it runs right in your browser too | 22:20 |
clarkb | https://zuul.opendev.org/t/zuul/build/f985aba9494c4cbcb7367096d4d64d22 the nodepool fix failed on this error. I've not seen it before has anyone else? | 22:28 |
clarkb | 1.32/stable is the latest stable series and it looks like we tried to install 1.31/stable. I do wonder if this is another case where we just need to use the new thing? | 22:28 |
clarkb | https://forum.snapcraft.io/t/launchpad-builds-failing-with-cannot-install-snap-base-core20/45833 this seems to explain the issue | 22:29 |
clarkb | apparently you can `sudo snap install core snapd` to update snap inplace? | 22:30 |
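If going the in-place route from that forum thread, the commands would look roughly like this (a sketch; the snap name for the k8s tooling is a placeholder, and whether an install or a refresh is needed depends on what is already present):

```shell
# show what snapd and the core base snap are currently at
snap list snapd core

# check which channels the snap in question actually offers
snap info <snap-name> | grep -A5 'channels:'

# per the forum thread: pull in / update the core base and snapd in place
sudo snap install core snapd   # if not yet present
sudo snap refresh core snapd   # if already installed
```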
clarkb | oh interesting the job runs on debian bookworm. I was about to say that it is weird for ubuntu to have this kind of incompatibility. Maybe the job should run on Ubuntu? | 22:31 |
clarkb | I'm going to try that since it is an easy fix if it helps | 22:32 |
clarkb | I'll squash it into the existing change since I think both things will be needed | 22:33 |
fungi | sgtm, thx | 22:38 |
clarkb | core20 apparently maps to ubuntu focal, which is what had me confused; that is old enough to be present on a modern ubuntu | 22:38 |
clarkb | but bookworm's snapd must not be as up to date | 22:39 |
clarkb | I want to see that job pass before we approve and promote it again, just because there is enough stuff in the zuul queue that doing so is disruptive if this job is still a problem | 22:44 |
corvus | it's great to see snapd has solved all packaging version problems | 22:49 |
corvus | clarkb: i approved the change but did not promote; that should save time without adding cost | 22:49 |
clarkb | good point | 22:50 |
clarkb | ya so I guess the "core" is a base image that other things build on. And instead of just pulling in a newer core if they need it you break if you don't have it | 22:51 |
clarkb | an odd choice if the idea is to effectively containerize everything but I don't use snap or flatpak enough to know why | 22:51 |
corvus | i recently tried installing plucky on a machine, and the livecd (or usb or whatever you call it now i guess) worked fine, but the installer is a snap and used an older core version which crashed on the hardware i was using. so... cool. i installed noble. | 22:55 |
clarkb | https://zuul.opendev.org/t/zuul/build/096fa9c630ca4bc29f6f258936c21b3a the first attempt failed downloading files from the bhs1 mirror. Clicking on links that failed I am able to download those files now | 23:02 |
clarkb | I wonder if we are missing an apt-get update somewhere | 23:03 |
clarkb | ok the k8s job succeeded. Do we want to promote now or just wait? Things do seem to be moving more quickly at this point but periodics enqueue in about 3 hours | 23:16 |
corvus | i think no promote for now, there are some changes that might merge at the top of the queue, but if we see an opportunity later, sure | 23:19 |
corvus | nm, head just failed; i promoted. win-win. | 23:21 |
tonyb | Is there such a thing as an "OpenDev" slide template? I don't think I can use presentty (sp?). If there's no template, is there a std. set of images somewhere? | 23:25 |
corvus | i think it's been a while since anyone updated it, but there are a lot of presentations in https://opendev.org/opendev/publications | 23:31 |
corvus | also i guess we should merge some changes https://review.opendev.org/q/status:open+-is:wip+project:opendev/publications | 23:32 |
corvus | keep in mind the real action there is on tags and branches, not the master branch | 23:32 |
clarkb | but also I'm not sure we had a consistent template? | 23:38 |
corvus | true. a lot of plain white backgrounds if i recall. :) but i dunno, maybe there's some stuff to mine there. :) | 23:46 |
clarkb | or black backgrounds with presentty :) | 23:46 |
corvus | ++ | 23:46 |
kevko | \o | 23:53 |
kevko | Folks, can I ask here what is with opendev CI / nodes ? Or is it just overloaded because of too many reviews testing at once ? | 23:54 |
kevko | or ? | 23:54 |
kevko | 134 reviews stacked in check pipeline for example ... plenty of them are red ... so I wanted to know if it is overloaded ..or there is something happening in infra ? | 23:56 |