*** dmitriis is now known as Guest4995 | 03:31 | |
opendevreview | Hiroki Narukawa proposed openstack/nova master: libvirt: retry libvirt connection on live_migration_monitor https://review.opendev.org/c/openstack/nova/+/867077 | 03:44 |
---|---|---|
opendevreview | Takashi Kajinami proposed openstack/nova master: Fix wrong description about minimum values https://review.opendev.org/c/openstack/nova/+/874061 | 08:05 |
opendevreview | melanie witt proposed openstack/nova stable/yoga: db: Resolve additional SAWarning warnings https://review.opendev.org/c/openstack/nova/+/874065 | 08:52 |
opendevreview | melanie witt proposed openstack/nova stable/xena: db: Resolve additional SAWarning warnings https://review.opendev.org/c/openstack/nova/+/874066 | 08:57 |
opendevreview | melanie witt proposed openstack/nova stable/wallaby: db: Resolve additional SAWarning warnings https://review.opendev.org/c/openstack/nova/+/874069 | 09:30 |
opendevreview | melanie witt proposed openstack/nova stable/victoria: db: Resolve additional SAWarning warnings https://review.opendev.org/c/openstack/nova/+/874071 | 09:31 |
opendevreview | melanie witt proposed openstack/nova stable/ussuri: db: Resolve additional SAWarning warnings https://review.opendev.org/c/openstack/nova/+/874072 | 09:32 |
elodilles | bauzas gibi : i don't know whether you noticed, but os-traits 2.10.0 is in the upper constraints \o/ | 09:45 |
bauzas | elodilles: clap clap | 09:49 |
bauzas | thanks for the dedication | 09:49 |
elodilles | np | 09:50 |
gibi | thanks folks | 09:52 |
* gibi has another CI system to debug :/ | 09:52 | |
bauzas | gibi: do we need to modify placement to support those traits ? AFAIR yes | 09:52 |
gibi | bauzas: yes | 09:52 |
bauzas | I can work on that | 09:52 |
gibi | feel free to ping me for review | 09:53 |
bauzas | (provided I easily find the pattern) | 09:53 |
gibi | bauzas: I think it is just a bump in requirements.txt nothing more | 09:53 |
gibi | the rest is automatic | 09:53 |
bauzas | aren't we checking the number of traits we support ? | 09:53 |
gibi | placement loads all the trait defs from whatever os-traits lib it finds at startup | 09:53 |
bauzas | I thought so | 09:53 |
gibi | I think we fixed that check | 09:53 |
gibi | https://review.opendev.org/c/openstack/placement/+/851966 | 09:55 |
bauzas | gibi: great | 09:56 |
bauzas | gibi: hah, fun, we forgot to bump the reqs for 2.9.0 https://github.com/openstack/placement/blob/master/requirements.txt#L29 | 10:14 |
bauzas | gibi: but since we don't pin on a release, we should get 2.10 | 10:15 |
bauzas | so technically, it's more about saying that for 2023.1 Placement will always support those traits are bare min | 10:15 |
bauzas | as* bare | 10:15 |
gibi | bauzas: yeah that is still on me to have a limited lower constraint job running on placement and nova to catch these as we only test with upper today | 10:24 |
gibi | I tried to do that after the last PTG but it was non trivial so I never finished it | 10:25 |
bauzas | cool, we haven't branched RC1 yet so we're on time | 10:25 |
* gibi needs time | 10:25 | |
bauzas | for the moment, we need to just make sure we document this | 10:25 |
gibi | I believe more in systems that enforce rules than in documentation that we don't read when we should | 10:27 |
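gibi's point about enforcing the lower bound instead of documenting it can be illustrated with a small check that a CI job could run. This is a minimal sketch, assuming a plain `requirements.txt` with `name>=x.y.z` lines. `min_version` and `check_floor` are hypothetical helpers, not part of the OpenStack requirements tooling, and only dotted numeric versions are handled (no full PEP 440 parsing).

```python
import re

# Matches lines such as "os-traits>=2.9.0  # Apache-2.0"; only the ">=" floor is read.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*>=\s*([0-9][0-9.]*)")


def _as_tuple(version):
    """Turn a dotted numeric version like "2.10.0" into (2, 10, 0)."""
    return tuple(int(part) for part in version.split(".") if part)


def min_version(requirements_text, package):
    """Return the declared >= floor for `package`, or None (hypothetical helper)."""
    wanted = package.lower().replace("_", "-")
    for line in requirements_text.splitlines():
        match = _REQ_RE.match(line)
        if match and match.group(1).lower().replace("_", "-") == wanted:
            return match.group(2)
    return None


def check_floor(requirements_text, package, needed):
    """Raise if the declared floor is lower than the version the code relies on."""
    declared = min_version(requirements_text, package)
    if declared is None:
        raise ValueError(f"{package} is not pinned with a >= floor")
    if _as_tuple(declared) < _as_tuple(needed):
        raise ValueError(f"{package} floor {declared} is below required {needed}")
    return declared


if __name__ == "__main__":
    reqs = "pbr>=2.0.0\nos-traits>=2.9.0  # Apache-2.0\n"
    try:
        check_floor(reqs, "os-traits", "2.10.0")
    except ValueError as exc:
        print(exc)  # os-traits floor 2.9.0 is below required 2.10.0
```

Versions are compared as integer tuples so that 2.10 sorts after 2.9. A real lower-constraints job would instead install the floor versions and run the tests against them; a static check like this only catches a forgotten bump such as the 2.9.0 one mentioned earlier.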
opendevreview | Sylvain Bauza proposed openstack/placement master: Update 2023.1 reqs to support os-traits 2.10 as min version https://review.opendev.org/c/openstack/placement/+/874080 | 10:29 |
bauzas | we have the PTL docs that I personally enforce :) | 10:30 |
bauzas | gibi: sean-k-mooney: time for a placement review https://review.opendev.org/c/openstack/placement/+/874080 | 10:30 |
gibi | bauzas: +@ | 10:30 |
gibi | bauzas: +2 | 10:30 |
bauzas | nova-next is getting me mad | 10:56 |
bauzas | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e4a/821228/7/gate/nova-next/e4ab52f/testr_results.html is legendary | 10:57 |
bauzas | volume timeouts + guest kernel tainting | 10:57 |
opendevreview | Tobias Urdin proposed openstack/nova master: libvirt: update description for live_migration_completion_timeout https://review.opendev.org/c/openstack/nova/+/874083 | 11:02 |
gibi | bauzas: is this with the new cirros version? | 12:21 |
bauzas | gibi: nope, we haven't merged yet the change | 12:23 |
bauzas | I haven't verified which cirros image we were having with the test, but I guess it's still 0.5 | 12:23 |
bauzas | 2023-02-15 20:08:15.854905 | controller | ++ stackrc:source:692 : DEFAULT_IMAGE_NAME=cirros-0.5.2-x86_64-disk 2023-02-15 20:08:15.856966 | controller | ++ stackrc:source:693 : DEFAULT_IMAGE_FILE_NAME=cirros-0.5.2-x86_64-disk.img | 12:24 |
bauzas | indeed | 12:24 |
sean-k-mooney | i said this a few days ago but i do think we should look at going to 0.6.x | 12:26 |
sean-k-mooney | there are a few kernel bugs in the 0.5.2 image that cause occasional kernel panics related to the apic in the guest | 12:27 |
sean-k-mooney | the 0.6.2 image is built on the ubuntu 22.04 kernel | 12:27 |
gibi | bauzas: thanks, then I guess yet another guest kernel bug | 12:29 |
bauzas | sean-k-mooney: I have a change for this | 12:30 |
bauzas | sec. | 12:30 |
bauzas | sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/873934 | 12:30 |
bauzas | we could also use 0.6.2 if we want, I'm not against | 12:30 |
sean-k-mooney | i would prefer to change it in devstack | 12:31 |
sean-k-mooney | we can do it per job | 12:31 |
* bauzas needs to lunch now | 12:31 | |
sean-k-mooney | but it would be better to use the same cirros image in all jobs | 12:31 |
bauzas | sean-k-mooney: but we can test the new cirros image by nova-next and if we see it works, then we could indeed use it for all our jobs | 12:32 |
bauzas | nova-next is also there for testing new stuff | 12:32 |
bauzas | I'm just afraid of changing all our jobs by once without correctly testing first | 12:32 |
sean-k-mooney | i guess, but given we have known kernel issues in the 0.5.2 image and we have been trying to change it for years, i would prefer to do the change in Antelope if we can | 12:33 |
*** dasm|off is now known as dasm | 14:00 | |
opendevreview | Amit Uniyal proposed openstack/nova master: Added a lock_unlock dcorator for instance https://review.opendev.org/c/openstack/nova/+/873648 | 14:02 |
opendevreview | Amit Uniyal proposed openstack/nova master: Added context manager for instance lock https://review.opendev.org/c/openstack/nova/+/873648 | 14:21 |
*** blarnath is now known as d34dh0r53 | 14:26 | |
bauzas | aaaaaand now I see more and more cirros guest segfaulting... https://review.opendev.org/c/openstack/nova/+/821228/7 | 14:42 |
bauzas | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da2/821228/7/check/nova-multi-cell/da2689f/job-output.txt | 14:42 |
opendevreview | Takashi Natsume proposed openstack/nova master: doc: mark the max microversion for 2023.1 Antelope https://review.opendev.org/c/openstack/nova/+/874103 | 14:44 |
dansmith | bauzas: where does that scenario:dhcp_client thing come in? | 14:50 |
bauzas | dansmith: https://review.opendev.org/c/openstack/nova/+/873934 | 14:50 |
bauzas | dansmith: and https://launchpad.net/bugs/2006467 | 14:51 |
dansmith | bauzas: right, what does scenario:dhcp_client affect | 14:51 |
dansmith | does that end up with cirros behaving differently? or does it make tempest do something inside the guest? | 14:51 |
bauzas | dansmith: ralonsoh told me to use that way b/c of https://review.opendev.org/c/openstack/neutron/+/871272/1/zuul.d/tempest-multinode.yaml | 14:51 |
bauzas | and I shamelessly copy/pasted | 14:52 |
* bauzas is a bit dumb after like rechecking 100+ times a day | 14:52 | |
ralonsoh | this is the dhcp client used in cirros 0.6.1 | 14:52 |
ralonsoh | more info here: https://review.opendev.org/c/openstack/neutron/+/871272 | 14:53 |
ralonsoh | (in the commit message) | 14:53 |
dansmith | ralonsoh: but if the cirros client just uses it, what does the tempest config change? | 14:53 |
ralonsoh | dansmith, https://review.opendev.org/c/openstack/tempest/+/871270/2/tempest/common/utils/linux/remote_client.py | 14:54 |
ralonsoh | we choose what is the VM dhcp client | 14:54 |
ralonsoh | --> https://review.opendev.org/c/openstack/tempest/+/871270/2/tempest/config.py | 14:55 |
dansmith | ralonsoh: that's just for a manual renew, but doesn't affect how the guest works on first boot.. is that what you mean? | 14:55 |
ralonsoh | dansmith, no, that should not affect how the VM boots | 14:55 |
ralonsoh | the OS will use the existing dhcp client | 14:56 |
dansmith | ralonsoh: gotcha okay, so I guess most of the issues I see getting an IP seem to be related to initial boot | 14:56 |
ralonsoh | right | 14:56 |
dansmith | okay cool, just making sure I understand | 14:56 |
bauzas | ralonsoh: and to clarify, cirros-0.6.x switched its dhcp client to dhcpcd ? | 15:05 |
ralonsoh | yes | 15:05 |
bauzas | ack | 15:06 |
bauzas | then I understand what I wrote, huzzah :D | 15:06 |
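Per the exchange above, cirros 0.6.x uses dhcpcd, and tempest's `[scenario] dhcp_client` option only tells tempest which client to drive for manual renewals inside the guest; it does not change how the guest boots. A hypothetical helper that derives the option from an image name such as the `DEFAULT_IMAGE_NAME` value seen in stackrc might look like this. Mapping pre-0.6 images to `udhcpc` is an assumption based on tempest's usual default.

```python
import re

# Hypothetical helper: the 0.6.x -> dhcpcd mapping comes from the discussion;
# treating older cirros images as "udhcpc" is an assumption.
_CIRROS_RE = re.compile(r"cirros-(\d+)\.(\d+)\.(\d+)")


def dhcp_client_for_image(image_name):
    """Pick a tempest [scenario] dhcp_client value from a cirros image name."""
    match = _CIRROS_RE.search(image_name)
    if match is None:
        raise ValueError(f"not a cirros image name: {image_name!r}")
    version = tuple(int(part) for part in match.groups())
    return "dhcpcd" if version >= (0, 6, 0) else "udhcpc"


def scenario_section(image_name):
    """Render the matching tempest.conf fragment as text."""
    return f"[scenario]\ndhcp_client = {dhcp_client_for_image(image_name)}\n"
```

For example, `scenario_section("cirros-0.6.1-x86_64-disk")` renders a `[scenario]` block selecting `dhcpcd`, while the 0.5.2 image currently in stackrc maps to `udhcpc`.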
dansmith | ralonsoh: does neutron record an event if the IP gets actually leased? | 15:08 |
dansmith | meaning, on failure can we query to neutron to see if the guest ever pulled its IP, to distinguish between "we can't ssh to the guest because of network problems" vs. "the guest is not alive and never pulled its ip" ? | 15:08 |
ralonsoh | dansmith, let me check, maybe in the syslog | 15:08 |
ralonsoh | understood, let me check | 15:08 |
ralonsoh | dansmith, neutron builds (adds/deletes) the leases file but we don't log this event. This is done by dnsmasq, you should be able to see that in syslog | 15:10 |
bauzas | oh good point | 15:10 |
sean-k-mooney | dansmith: i don't think neutron does but dnsmasq might | 15:10 |
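Since neutron itself does not log the lease, one way to answer dansmith's question on an OVS/dnsmasq job is to scan the dnsmasq log lines for the guest's fixed IP. This does not apply to OVN, where DHCP is answered by OpenFlow rules. The sketch assumes dnsmasq's usual `DHCPACK(<iface>) <ip> <mac>` line shape; the regex and function names are hypothetical.

```python
import re

# Assumed dnsmasq-dhcp line shape, e.g.
#   "... dnsmasq-dhcp[1234]: DHCPACK(tap1) 10.1.0.5 fa:16:3e:00:00:01 guest"
# Lines without an IP (e.g. a bare DHCPDISCOVER) are ignored.
_DHCP_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (DHCP[A-Z]+)\([^)]*\) (\d{1,3}(?:\.\d{1,3}){3})\b"
)


def dhcp_events_for_ip(log_lines, ip):
    """Return the DHCP message types dnsmasq logged for `ip`, in order."""
    events = []
    for line in log_lines:
        match = _DHCP_RE.search(line)
        if match and match.group(2) == ip:
            events.append(match.group(1))
    return events


def lease_verdict(log_lines, ip):
    """Classify what the log says about the guest's DHCP exchange."""
    events = dhcp_events_for_ip(log_lines, ip)
    if "DHCPACK" in events:
        return "leased"
    if events:
        return "seen-but-not-acked"
    return "never-seen"
```

A "never-seen" verdict points at a guest that never asked for an address (for example a crashed kernel), while "seen-but-not-acked" points at the network path.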
bauzas | I forgot to look at dnsmasq, fucking shit | 15:11 |
bauzas | my ops skills become rusty | 15:11 |
dansmith | it's too bad because it might be a nice API to be able to poke that remotely.. i.e. instead of sshing forever, have a reasonably short timeout for the is-it-leased | 15:11 |
sean-k-mooney | bauzas: is this ovn | 15:11 |
fungi | so are the cirros kernel panics similar to one another, or random excuses? | 15:11 |
sean-k-mooney | because if it's ovn we are not using dnsmasq | 15:11 |
dansmith | and for reporting on failure, so we can say what forensics have been done | 15:11 |
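dansmith's idea of a short is-it-leased timeout ahead of the long SSH wait can be sketched as a generic polling helper. Per ralonsoh no such neutron API exists today, so `is_leased` and `try_ssh` are hypothetical callables; the injectable `clock`/`sleep` arguments are only there to make the helper testable.

```python
import time


def wait_until(predicate, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it is truthy or `timeout` seconds pass.

    Returns (succeeded, attempts) so the caller can report what was tried.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if predicate():
            return True, attempts
        now = clock()
        if now >= deadline:
            return False, attempts
        # Never sleep past the deadline.
        sleep(min(interval, deadline - now))


def check_guest(is_leased, try_ssh, lease_timeout=60, ssh_timeout=300,
                clock=time.monotonic, sleep=time.sleep):
    """Short lease check first; the long SSH wait only runs if the guest got an address."""
    leased, tries = wait_until(is_leased, lease_timeout, interval=5, clock=clock, sleep=sleep)
    if not leased:
        return f"guest never leased its IP after {tries} checks; skipping ssh"
    reachable, tries = wait_until(try_ssh, ssh_timeout, interval=10, clock=clock, sleep=sleep)
    if reachable:
        return "ok"
    return f"guest leased its IP but ssh failed after {tries} attempts"
```

The returned strings double as the failure forensics dansmith mentions: the report states which stage failed and how many checks were made.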
sean-k-mooney | this is being handled by openflow rules added by ovn | 15:11 |
dansmith | also, all three of those failed tests are volume-related | 15:12 |
sean-k-mooney | fungi: if they are related to acpi then it's a known issue with the cirros 0.5.2 kernel | 15:12 |
dansmith | so I still wouldn't write-off it being a volume problem | 15:12 |
fungi | sean-k-mooney: sounds like a good reason to switch to 0.6.1 then | 15:13 |
dansmith | fungi: that's what I said, but 0.6.1 bringing other changes could be more destabilizing | 15:13 |
sean-k-mooney | fungi: i started working on an alpine-based image 2 years ago after i found out that the kernel bug was fixed in a later ubuntu kernel and cirros was just not updated | 15:13 |
ralonsoh | dansmith, what is the backend? OVS or OVN? | 15:13 |
dansmith | ralonsoh: I dunno | 15:13 |
ralonsoh | is this nova-next, right? | 15:13 |
dansmith | ralonsoh: yes | 15:13 |
ralonsoh | ok, let me check | 15:14 |
fungi | dansmith: agreed, the devils you know vs the ones you don't | 15:14 |
bauzas | ralonsoh: I've seen the dhcp lease problems in nova-next yes | 15:14 |
bauzas | man, can't I provide a regex when gerrit searching with 'comment' ? | 15:14 |
dansmith | this is one with the cirros bumped: https://44f9259a9cd22acee92d-000061e1666ecf9c52f0643ab3c391ab.ssl.cf1.rackcdn.com/873934/1/check/nova-next/785cc57/testr_results.html | 15:15 |
dansmith | three tests with ssh timeouts in one job is higher than average I'd say | 15:15 |
dansmith | which makes me concerned | 15:15 |
ralonsoh | dansmith, nova-next uses OVS. About this API call, could be something to be implemented, yes | 15:15 |
ralonsoh | but we don't have it now | 15:15 |
dansmith | ralonsoh: ack, it just seems like it would be nice to have | 15:16 |
sean-k-mooney | dansmith: i'm not sure it would be easy to do in all cases | 15:16 |
sean-k-mooney | it would be ml2 driver specific | 15:17 |
dansmith | sean-k-mooney: I'm sure it wouldn't | 15:17 |
sean-k-mooney | i don't think you could do it with ovn currently | 15:17 |
sean-k-mooney | not without changes to ovn | 15:17 |
dansmith | necessity, it's the mother of.. I forget.. something.. :D | 15:18 |
bauzas | don't let me play that mother game | 15:19 |
dansmith | bauzas: you and your language lately.. should probably avoid adding "mother" to things :D | 15:19 |
bauzas | :) | 15:20 |
fungi | bauzas: supposedly you can. "regular expressions can be enabled by starting with ^" https://review.opendev.org/Documentation/user-search.html | 15:27 |
fungi | says it specifically in the entry for the message: expression | 15:28 |
bauzas | ralonsoh: I just discovered some timeout on ovs | 15:28 |
bauzas | ralonsoh: Feb 15 17:21:05.272099 np0033113580 nova-compute[73993]: DEBUG ovsdbapp.backend.ovs_idl.vlog [-] 0-ms timeout {{(pid=73993) __log_wakeup /usr/local/lib/python3.10/dist-packages/ovs/poller.py:248}} | 15:28 |
bauzas | Feb 15 17:21:05.274517 np0033113580 nova-compute[73993]: DEBUG nova.compute.manager [None req-f2621b7f-2ead-4510-9e82-aad93ba9f29d tempest-ListServersNegativeTestJSON-1193978856 tempest-ListServersNegativeTestJSON-1193978856-project] [instance: d68e1732-b508-4f6b-be25-cbae06fde7c2] Build of instance d68e1732-b508-4f6b-be25-cbae06fde7c2 was re-scheduled: Timed out waiting for a reply to message ID c6193dab09f444b796478e321d69e3a2 | 15:28 |
bauzas | {{(pid=73993) _do_build_and_run_instance /opt/stack/nova/nova/compute/manager.py:2450}} | 15:28 |
bauzas | fungi: damn shit, missed that even if I did RTFM | 15:28 |
fungi | bauzas: sorry, i misread what you said. you're searching for comment not message | 15:28 |
bauzas | fungi: yeah that | 15:28 |
ralonsoh | bauzas, what is this job link? | 15:28 |
fungi | the entry for comment: doesn't mention regex | 15:28 |
fungi | just says it's a string match | 15:29 |
bauzas | ralonsoh: something new https://zuul.opendev.org/t/openstack/build/76f29afe3f134f139a48b537de7029dc | 15:29 |
fungi | so you're probably right | 15:29 |
bauzas | fungi: doh | 15:29 |
fungi | sadly | 15:29 |
bauzas | ok, I was wanting to query all the recheck messages I wrote | 15:29 |
bauzas | and since I haven't followed a clear pattern... | 15:29 |
fungi | could script that through the rest api, but it would be a bit of work | 15:30 |
fungi | depends on how much you want it, i guess | 15:30 |
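As fungi notes, `comment:` is a plain string match, but the Gerrit REST API can return change messages that are then filtered locally with a regex. A minimal sketch, assuming the `/changes/` endpoint with `o=MESSAGES` and Gerrit's `)]}'` anti-XSSI prefix on JSON responses; the helper names are hypothetical and pagination (`_more_changes`) is not handled.

```python
import json
import re
from urllib.parse import urlencode

GERRIT_XSSI_PREFIX = ")]}'"


def changes_query_url(base_url, query, limit=100):
    """Build a /changes/ query URL that also asks for each change's messages."""
    params = urlencode([("q", query), ("o", "MESSAGES"), ("n", limit)])
    return f"{base_url.rstrip('/')}/changes/?{params}"


def parse_gerrit_json(body):
    """Strip Gerrit's anti-XSSI prefix and decode the JSON payload."""
    if body.startswith(GERRIT_XSSI_PREFIX):
        body = body[len(GERRIT_XSSI_PREFIX):]
    return json.loads(body)


def matching_messages(changes, pattern, author_name=None):
    """Yield (change number, message text) pairs whose text matches `pattern`."""
    regex = re.compile(pattern)
    for change in changes:
        for msg in change.get("messages", []):
            if author_name and msg.get("author", {}).get("name") != author_name:
                continue
            text = msg.get("message", "")
            if regex.search(text):
                yield change.get("_number"), text
```

Fetching would then be something like `urllib.request.urlopen(changes_query_url("https://review.opendev.org", "project:openstack/nova commentby:<account>"))`, decoding the body and passing it to `parse_gerrit_json`; the `<account>` placeholder is left to the reader.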
bauzas | well, it was just for a quick lool | 15:31 |
bauzas | look* even | 15:31 |
bauzas | nevermind | 15:31 |
ralonsoh | bauzas, I don't know where this message is coming, but this is the OVS local service | 15:32 |
bauzas | yup on n-cpu | 15:32 |
bauzas | but apparently it's serious enough to trigger a reschedule | 15:32 |
bauzas | ... which on an AIO doesn't help | 15:33 |
bauzas | :) | 15:33 |
ralonsoh | bauzas, actually this is something normal, coming from the python ovs bindings | 15:34 |
ralonsoh | def __log_wakeup(self, events): | 15:34 |
ralonsoh | if not events: | 15:34 |
ralonsoh | vlog.dbg("%d-ms timeout" % self.timeout) | 15:34 |
ralonsoh | https://github.com/openvswitch/ovs/blob/master/python/ovs/poller.py#L246# | 15:34 |
bauzas | hmmmm | 15:37 |
bauzas | I'm able to find another patchset that got the exact failure from the same job on the same test tempest.api.compute.servers.test_list_servers_negative.ListServersNegativeTestJSON | 15:37 |
bauzas | https://d2746e36843633ae266c-ad135f72b22a132e11a904324fbc4e60.ssl.cf1.rackcdn.com/872413/3/check/tempest-integrated-compute-enforce-scope-new-defaults/5204f72/job-output.txt | 15:37 |
* bauzas does a bit of logsearch | 15:37 | |
ralonsoh | bauzas, I'm checking https://44f9259a9cd22acee92d-000061e1666ecf9c52f0643ab3c391ab.ssl.cf1.rackcdn.com/873934/1/check/nova-next/785cc57 | 15:41 |
ralonsoh | for example the first test case failing | 15:41 |
ralonsoh | test_attach_scsi_disk_with_config_drive | 15:41 |
ralonsoh | I see the DHCP agent configuring dnsmasq process, I see n-cpu creating the interface, OVS agent receiving this creation event | 15:42 |
ralonsoh | but I see nowhere in the DHCP agent logs the DHCPREQUEST for this fixed IP | 15:42 |
ralonsoh | and the tempest test doesn't retrieve the VM logs | 15:42 |
dansmith | yeah I'm not sure why we're not logging the console in that case | 15:43 |
dansmith | because we should see that, to know if it's a guest crash | 15:43 |
ralonsoh | dansmith, qq, this is about the image resources | 15:46 |
ralonsoh | Feb 15 15:24:26.291000 np0033111254 nova-compute[39923]: DEBUG nova.compute.resource_tracker [None req-a80484d7-0e98-4086-a01f-afda9d54f06b None None] Instance 749a5ad0-e7b8-47f2-8e9c-78896c92b80e actively managed on this compute host and has allocations in placement: {'resources': {'VCPU': 1, 'MEMORY_MB': 128, 'DISK_GB': 1}}. {{(pid=39923) _remove_deleted_instances_allocations /opt/stack/nova/nova/compute/resource_tracker.py:1632}} | 15:46 |
bauzas | besides, none of them show the console log | 15:46 |
ralonsoh | why are you using 128MB? | 15:46 |
ralonsoh | cirros should be using 256, right? | 15:46 |
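The 128 MB figure comes from the allocation dict that the resource tracker logs in the debug line ralonsoh pasted. A small sketch for pulling that dict out of such lines and flagging low-memory guests; the regex assumes the exact message wording quoted in that line, and the 256 MB threshold is only ralonsoh's recollection for cirros, not a documented requirement.

```python
import ast
import re

# Matches the "has allocations in placement: {...}}" fragment of the
# nova-compute resource tracker debug line; the lazy match stops at the
# first "}}", which closes the nested allocation dict.
_ALLOC_RE = re.compile(r"has allocations in placement: (\{.*?\}\})")


def allocations_from_log(line):
    """Return the allocation dict logged for an instance, or None."""
    match = _ALLOC_RE.search(line)
    if match is None:
        return None
    # The dict is logged as a Python literal, so literal_eval can parse it safely.
    return ast.literal_eval(match.group(1))


def memory_below(line, minimum_mb):
    """True if the logged MEMORY_MB allocation is under `minimum_mb`."""
    allocs = allocations_from_log(line)
    if allocs is None:
        return False
    return allocs["resources"].get("MEMORY_MB", 0) < minimum_mb
```

Run over a job's n-cpu log, `memory_below(line, 256)` would flag every instance booted with the 128 MB default flavor (42).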
bauzas | hmmm | 15:47 |
dansmith | ralonsoh: idk, maybe that was the result of a resize? | 15:47 |
bauzas | that's a good catch | 15:47 |
ralonsoh | dansmith, no, this is the VM booting | 15:47 |
bauzas | I haven't checked the flavors that were related to the cirros failures I saw | 15:47 |
ralonsoh | first log in the compute agent | 15:47 |
dansmith | flavors are 42, and 84 | 15:47 |
* dansmith looks | 15:47 | |
ralonsoh | 128 and 192 | 15:48 |
ralonsoh | I'll check neutron CI | 15:48 |
dansmith | 42 is the default, and that's 128M | 15:49 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/873575 | 15:50 |
dansmith | 84 is 192M yeah | 15:50 |
dansmith | so perhaps we're flying too close to the sun with 128M | 15:51 |
ralonsoh | dansmith, I'm checking what we are using in Neutron | 15:51 |
dansmith | AFAIK, these are devstack defaults | 15:52 |
ralonsoh | we use the default values too, 128M | 15:53 |
dansmith | bumping the flavor memory is likely to result in other instabilities, I fear | 15:53 |
ralonsoh | {'resources': {'DISK_GB': 1, 'MEMORY_MB': 128, 'VCPU': 1}} | 15:53 |
dansmith | yeah | 15:53 |
ralonsoh | if you could retrieve the console logs, that could help | 15:54 |
ralonsoh | at least to know that the VM tried to request an IP | 15:54 |
ralonsoh | and, somewhere, this request was dropped | 15:54 |
dansmith | yeah, we should chat with gmann about it. I can go look at the tempest stuff to figure out why, | 15:55 |
dansmith | but he probably knows off the top of his head | 15:55 |
bauzas | ralonsoh: I have a few logs where the guest failed to acquire a lease, sec | 16:04 |
bauzas | and some where the guest panicked | 16:04 |
sean-k-mooney | for what it's worth, the ever-increasing memory requirement for cirros was one of the reasons i looked at moving us to alpine a few years ago | 16:06 |
sean-k-mooney | long term i still think that would be a better approach | 16:06 |
dansmith | is alpine really going to be smaller than cirros? I mean, that seems odd to me | 16:06 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/845748 | 16:07 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/845753 | 16:07 |
dansmith | so I think we get the automatic console stuff when we use specific waiters for servers to be available | 16:07 |
dansmith | so that test must use a different one | 16:07 |
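The pattern dansmith describes is simple: on timeout, fetch the guest console before raising, so the failure report shows whether the guest crashed. A sketch with hypothetical `get_status`/`get_console_output` callables; this is not tempest's waiter API.

```python
import time


class ServerWaitTimeout(Exception):
    """Raised when a server never reaches the wanted status; carries the console."""

    def __init__(self, server_id, status, console):
        super().__init__(f"server {server_id} stuck in {status!r}")
        self.console = console


def wait_for_status(server_id, wanted, get_status, get_console_output,
                    timeout=300, interval=5, clock=time.monotonic, sleep=time.sleep):
    """Poll until `wanted`; on timeout, attach the guest console for forensics."""
    deadline = clock() + timeout
    status = get_status(server_id)
    while status != wanted:
        if clock() >= deadline:
            try:
                console = get_console_output(server_id)
            except Exception as exc:  # fetching the console is best-effort
                console = f"<console unavailable: {exc}>"
            raise ServerWaitTimeout(server_id, status, console)
        sleep(interval)
        status = get_status(server_id)
    return status
```

Keeping the console fetch best-effort means a broken console API does not hide the original timeout.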
bauzas | oh damn, internal meeting | 16:07 |
ralonsoh | bauzas, do you have some links? in any case, this is not Nova nor Neutron fault, I think | 16:08 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/845754 | 16:08 |
ralonsoh | at least in this case | 16:08 |
bauzas | ralonsoh: yup, which kind of failures do you want to see ? for the dhcp query? | 16:08 |
ralonsoh | yes and the kernel panic | 16:09 |
bauzas | ralonsoh: one for segfaults https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da2/821228/7/check/nova-multi-cell/da2689f/job-output.txt | 16:20 |
bauzas | ralonsoh: that one for kernel panicking https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e4a/821228/7/gate/nova-next/e4ab52f/job-output.txt | 16:23 |
bauzas | ralonsoh: that one is an interesting case where the default route is already present and metadata goes into the weeds https://ae59d1e8526fa7671728-240e4b572b6f89b26c1b0e70b1c00c17.ssl.cf1.rackcdn.com/872413/3/check/nova-multi-cell/5e89e48/job-output.txt | 16:24 |
ralonsoh | I'll check this last one | 16:27 |
ralonsoh | bauzas, eh hold on, this could be a problem with the OVN version in Jammy | 16:28 |
ralonsoh | --> https://review.opendev.org/c/openstack/neutron/+/873684 | 16:28 |
ralonsoh | there is an issue with ovn v22.03.0, included in jammy | 16:28 |
ralonsoh | and some missing flows for the metadata | 16:29 |
ralonsoh | Yatin found it and we are skipping those tests | 16:29 |
ralonsoh | actually we are going to test using a compiled version of OVN | 16:29 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/874112/1 | 16:29 |
bauzas | ack ok | 16:30 |
*** dasm is now known as Guest5046 | 16:46 | |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/873575 | 16:51 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/873576 | 16:51 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/873577 | 17:01 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/873577 | 17:09 |
fungi | ralonsoh: any idea if the fixes have been backported to v22 such that ubuntu could do an sru to patch their packages? | 17:25 |
ralonsoh | fungi, Yatin opened a bug today: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2003056 | 17:25 |
fungi | awesome | 17:26 |
ralonsoh | (sorry, not today) | 17:26 |
fungi | i'd hate to see devstack using a bespoke ovn build long-term | 17:26 |
fungi | here's hoping they're able to patch it | 17:26 |
ralonsoh | from https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2003056/comments/4, there should be a new version now (22.04.1, instead of 22.03) | 17:27 |
bauzas | ralonsoh: I guess you don't recommend us to work on nova's zuul jobs to build ovs from source? | 17:29 |
fungi | oh, it didn't dawn on me that those might be date-based versions rather than semver, so yeah here's hoping backporting the fix in ubuntu won't be painful | 17:29 |
bauzas | s/ovs/ovn | 17:29 |
* bauzas is tired | 17:29 | |
ralonsoh | bauzas, no, we should use the OS released version | 17:29 |
bauzas | cool | 17:30 |
ralonsoh | we use compiled version in Neutron for testing only | 17:30 |
bauzas | yup saw the DNM | 17:30 |
ralonsoh | (and we had problems for this, this is why we use both now) | 17:30 |
bauzas | but I was wondering how much of this was actionable on our side | 17:30 |
bauzas | I'm like done rechecking every 2 hours | 17:30 |
bauzas | any actional progress sounds to a sweet spot :) | 17:31 |
bauzas | sounds to me* | 17:31 |
fungi | oh yay, so it ended up in j-p-u yesterday and should be showing up on mirrors at any moment assuming the ubuntu autobuilders aren't clogged | 17:32 |
fungi | we may need to add that repository temporarily to the sources list in affected jobs, until it migrates into a jammy point release | 17:33 |
fungi | https://launchpad.net/ubuntu/+source/ovn/22.03.2-0ubuntu0.22.04.1 indicates the binary packages haven't built yet | 17:35 |
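On fungi's versioning point: OVN's upstream versions are year.month based (22.03 here), and Ubuntu appends a packaging revision. A rough sketch that checks whether an installed package carries at least a given upstream version, treating 22.03.2 as the floor only because it is the upstream part of the package linked above. It compares the upstream component numerically and is not a full dpkg version comparison (no `~` handling, for instance).

```python
def split_debian_version(version):
    """Split "[epoch:]upstream[-revision]" into (epoch, upstream, revision)."""
    epoch, rest = "0", version
    if ":" in version:
        epoch, _, rest = version.partition(":")
    upstream, revision = rest, ""
    if "-" in rest:
        # The Debian revision follows the last hyphen.
        upstream, _, revision = rest.rpartition("-")
    return int(epoch), upstream, revision


def upstream_tuple(version):
    """Numeric upstream version, e.g. "22.03.2-0ubuntu0.22.04.1" -> (22, 3, 2)."""
    _, upstream, _ = split_debian_version(version)
    return tuple(int(part) for part in upstream.split("."))


def has_upstream_at_least(version, floor):
    """Compare only the upstream component against `floor` (a tuple of ints)."""
    return upstream_tuple(version) >= floor
```

The trailing `22.04.1` in the package name is the Ubuntu release suffix of the revision, not the OVN version.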
* bauzas ends his day now | 17:36 | |
bauzas | I hereby declare Nova on Feature Freeze :) | 17:37 |
bauzas | (anyway, all the accepted blueprints were reviewed) | 17:37 |
* bauzas will continue fighting the Beast tomorrow and later so as to make sure we land what we approved | 17:37 | |
bauzas | kbye ;) | 17:37 |
fungi | fnordahl: i'm a little fuzzy on ubuntu's sru flow... is it like proposed-updates in debian where it only makes it into the mainstream indices in periodic point releases and we need to put jammy-proposed in sources.list in the interim? | 17:38 |
gmann | dansmith: bauzas: ralonsoh: not read all the logs but related to cirros image there is patch up to bump it to version 0.6.1. https://review.opendev.org/c/openstack/devstack/+/859773 | 17:56 |
ralonsoh | gmann, thanks. We have seen some seg faults and kernel panics during the VM boot, using this image | 17:59 |
ralonsoh | in https://review.opendev.org/c/openstack/nova/+/873934 | 17:59 |
*** dasm is now known as Guest5052 | 18:10 | |
spatel | sean-k-mooney i saw your post about my question related to the ceph disaster | 19:05 |
spatel | I am trying to do rescue method and stuck here - https://ibb.co/y84BLGY | 19:05 |
sean-k-mooney | hmm ok, have you tried unrescuing and seeing if it fixed enough for the vm to recover itself | 19:08 |
spatel | Let me try now.. | 19:09 |
spatel | no luck - https://ibb.co/4swXjQ1 | 19:10 |
spatel | ceph status showing all PGs are clean and active - https://paste.opendev.org/show/bnG8rvXJydADTZknd2QD/ | 19:11 |
spatel | not sure why i got filesystem corruption. | 19:11 |
mnaser | are there ci jobs that are testing secure rbac across all services? | 19:24 |
mnaser | in an all zed env, enabling it for neutron seems to break new vm deployments | 19:25 |
mnaser | something along these lines: 2023-02-15 22:04:28.241 2935939 ERROR nova.compute.manager [instance: aaac6261-721a-404d-80d3-94cf25bf869b] neutronclient.common.exceptions.PortNotFoundClient: Port 485c1e6e-fdcd-45ca-ae52-c2cdeb95bfdd could not be found. | 19:25 |
sean-k-mooney | i'm not really sure what to do. you could try stopping the vm, mounting the volume on the host and running the filesystem recovery from there | 19:25 |
sean-k-mooney | you might be able to fix the superblock but i'm not sure if that will work | 19:25 |
sean-k-mooney | mnaser: i thought there was a devstack job for this, yes; gmann would know more | 19:27 |
mnaser | i mean enabling it in nova works fine, so nova is happy, but enabling it in neutron makes it unhappy | 19:27 |
mnaser | so it could be a neutron issue, so i don't know if the devstack job enables it for all or just for specific services | 19:27 |
sean-k-mooney | right but i thought we had it enabled for both | 19:27 |
sean-k-mooney | ya its a good question im not sure either | 19:28 |
mnaser | https://github.com/openstack/nova/blob/master/.zuul.yaml#L678-L684 | 19:29 |
mnaser | wonder if its that | 19:29 |
sean-k-mooney | ya that should have it enabled for those 4 services | 19:33 |
sean-k-mooney | mnaser: https://zuul.openstack.org/builds?job_name=tempest-integrated-compute-enforce-scope-new-defaults&skip=0 | 19:33 |
sean-k-mooney | it looks pretty green too | 19:34 |
sean-k-mooney | well, there is at least some green, meaning it should work in general, but i'm not sure how much is covered by that | 19:34 |
mnaser | sean-k-mooney: i wonder if we are getting hit by this since its not in zed yet - https://github.com/openstack/neutron/commit/6d8ada0ac93beed05b45adb9582c3ef23bef49d2 | 19:37 |
mnaser | and the test that failed was actually as an admin | 19:37 |
sean-k-mooney | oh you're trying to do this in zed | 19:38 |
sean-k-mooney | ya ok we only enabled it by default this cycle | 19:38 |
sean-k-mooney | mnaser: i'm not sure that was planned to be backported | 19:39 |
sean-k-mooney | i would ask the neutron folks to backport it if you intend to enable it | 19:40 |
sean-k-mooney | it's kind of feature-ish | 19:41 |
mnaser | sean-k-mooney: yeah i guess one could argue it's a bug too | 19:41 |
sean-k-mooney | it's because of the pivot that happened at the yoga forum/ptg | 19:42 |
sean-k-mooney | when we decided to revert a lot of the work and remove the use of scopes from most apis | 19:42 |
sean-k-mooney | under the original plan admin should not be global admin | 19:43 |
sean-k-mooney | so they adapted to that change in zed | 19:43 |
gmann | mnaser: yes, you found those. during integration testing in this cycle (tempest-full-enforce-scope-new-defaults), we found few bugs in neutron and they got fixed in master | 21:51 |
gmann | mnaser: I think slaweq was planning to backport those to stable/zed or older if needed | 21:51 |
gmann | let me find those patches | 21:51 |
gmann | basically these three bugs https://bugs.launchpad.net/neutron/+bug/1996150 https://bugs.launchpad.net/neutron/+bug/1996836 https://bugs.launchpad.net/neutron/+bug/1997089 | 21:55 |
gmann | mnaser: pinged about these in neutron channel. I was under the impression that they were backported already | 21:57 |
gmann | mnaser: glad to know it worked fine for nova. did you enable scope and new defaults both or just new defaults ? | 22:06 |
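For reference on gmann's question, both switches live in the `[oslo_policy]` section of each service's config file: `enforce_scope` and `enforce_new_defaults`, both oslo.policy options. A minimal sketch that sets them in a config string with `configparser`; it drops comments and is only meant to show which keys are involved. As discussed, on Zed neutron may also need the fixes for the three bugs gmann listed before enabling them.

```python
import configparser
import io


def enable_secure_rbac(conf_text, scope=True, new_defaults=True):
    """Return `conf_text` with the [oslo_policy] enforcement flags set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("oslo_policy"):
        parser.add_section("oslo_policy")
    parser.set("oslo_policy", "enforce_scope", str(scope))
    parser.set("oslo_policy", "enforce_new_defaults", str(new_defaults))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Passing `scope=False` gives the "new defaults only" combination gmann asks about, which is useful for bisecting whether scope checks or the new default rules break a service.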
*** JohnnyW5 is now known as JohnnyW | 23:11 | |
mnaser | we enabled both gmann ! | 23:13 |
gmann | ok | 23:17 |
*** dasm_ is now known as dasm|off | 23:55 |