Thursday, 2023-02-16

*** dmitriis is now known as Guest499503:31
opendevreviewHiroki Narukawa proposed openstack/nova master: libvirt: retry libvirt connection on live_migration_monitor  https://review.opendev.org/c/openstack/nova/+/86707703:44
opendevreviewTakashi Kajinami proposed openstack/nova master: Fix wrong description about minimum values  https://review.opendev.org/c/openstack/nova/+/87406108:05
opendevreviewmelanie witt proposed openstack/nova stable/yoga: db: Resolve additional SAWarning warnings  https://review.opendev.org/c/openstack/nova/+/87406508:52
opendevreviewmelanie witt proposed openstack/nova stable/xena: db: Resolve additional SAWarning warnings  https://review.opendev.org/c/openstack/nova/+/87406608:57
opendevreviewmelanie witt proposed openstack/nova stable/wallaby: db: Resolve additional SAWarning warnings  https://review.opendev.org/c/openstack/nova/+/87406909:30
opendevreviewmelanie witt proposed openstack/nova stable/victoria: db: Resolve additional SAWarning warnings  https://review.opendev.org/c/openstack/nova/+/87407109:31
opendevreviewmelanie witt proposed openstack/nova stable/ussuri: db: Resolve additional SAWarning warnings  https://review.opendev.org/c/openstack/nova/+/87407209:32
elodillesbauzas gibi : i don't know whether you noticed, but os-traits 2.10.0 is in the upper constraints \o/09:45
bauzaselodilles: clap clap09:49
bauzasthanks for the dedication 09:49
elodillesnp09:50
gibithanks folks09:52
* gibi has another CI system to debug :/09:52
bauzasgibi: do we need to modify placement to support those traits ? AFAIR yes09:52
gibibauzas: yes09:52
bauzasI can work on that09:52
gibifeel free to ping me for review09:53
bauzas(provided I easily find the pattern)09:53
gibibauzas: I think it is just a bump in requirements.txt nothing more09:53
gibithe rest is automatic09:53
bauzasaren't we checking the number of traits we support ?09:53
gibiplacement loads all the trait defs from whatever os-traits lib it finds at startup09:53
bauzasI thought so09:53
gibiI think we fixed that check09:53
gibihttps://review.opendev.org/c/openstack/placement/+/85196609:55
bauzasgibi: great09:56
bauzasgibi: hah, fun, we forgot to bump the reqs for 2.9.0 https://github.com/openstack/placement/blob/master/requirements.txt#L2910:14
bauzasgibi: but since we don't pin on a release, we should get 2.10 10:15
bauzasso technically, it's more about saying that for 2023.1 Placement will always support those traits as bare min10:15
gibibauzas: yeah that is still on me to have a limited lower constraint job running on placement and nova to catch these as we only test with upper today10:24
gibiI tried to do that after the last PTG but it was non trivial so I never finished it10:25
bauzascool, we haven't branched RC1 yet so we're on time10:25
* gibi needs time10:25
bauzasfor the moment, we need to just make sure we document this 10:25
gibiI believe more in systems that enforce rules than documentation that we don't read when we should10:27
opendevreviewSylvain Bauza proposed openstack/placement master: Update 2023.1 reqs to support os-traits 2.10 as min version  https://review.opendev.org/c/openstack/placement/+/87408010:29
bauzaswe have the PTL docs that I personally enforce :)10:30
bauzasgibi: sean-k-mooney: time for a placement review https://review.opendev.org/c/openstack/placement/+/87408010:30
gibibauzas: +@10:30
gibibauzas: +210:30
bauzasnova-next is getting me mad10:56
bauzashttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e4a/821228/7/gate/nova-next/e4ab52f/testr_results.html is anthologic10:57
bauzasvolume timeouts + guest kernel tainting10:57
opendevreviewTobias Urdin proposed openstack/nova master: libvirt: update description for live_migration_completion_timeout  https://review.opendev.org/c/openstack/nova/+/87408311:02
gibibauzas: is this with the new cirros version?12:21
bauzasgibi: nope, we haven't merged yet the change12:23
bauzasI haven't verified which cirros image we were having with the test, but I guess it's still 0.512:23
bauzas 2023-02-15 20:08:15.854905 | controller | ++ stackrc:source:692                       :   DEFAULT_IMAGE_NAME=cirros-0.5.2-x86_64-disk 2023-02-15 20:08:15.856966 | controller | ++ stackrc:source:693                       :   DEFAULT_IMAGE_FILE_NAME=cirros-0.5.2-x86_64-disk.img12:24
bauzasindeed12:24
sean-k-mooneyi said this a few days ago but i do think we should look at going to 0.6.x12:26
sean-k-mooneythere are a few kernel bugs in the 0.5.2 image that cause occasional kernel panics related to the apic in the guest12:27
sean-k-mooneythe 0.6.2 image is built on the ubuntu 22.04 kernel12:27
gibibauzas: thanks, then I guess yet another guest kernel bug12:29
bauzassean-k-mooney: I have a change for this 12:30
bauzassec.12:30
bauzassean-k-mooney: https://review.opendev.org/c/openstack/nova/+/87393412:30
bauzaswe could also use 0.6.2 if we want, I'm not against12:30
sean-k-mooneyi would prefer to change it in devstack12:31
sean-k-mooneywe can do it per job12:31
* bauzas needs to lunch now12:31
sean-k-mooneybut it would be better to use the same cirros image in all jobs12:31
bauzassean-k-mooney: but we can test the new cirros image by nova-next and if we see it works, then we could indeed use it for all our jobs12:32
bauzasnova-next is also there for testing new stuff12:32
bauzasI'm just afraid of changing all our jobs at once without correctly testing first12:32
sean-k-mooneyi guess but given we have known kernel issues in the 0.5.2 image and we have been trying to change it for years i would prefer to do the change in Antelope if we can12:33
*** dasm|off is now known as dasm14:00
opendevreviewAmit Uniyal proposed openstack/nova master: Added a lock_unlock dcorator for instance  https://review.opendev.org/c/openstack/nova/+/87364814:02
opendevreviewAmit Uniyal proposed openstack/nova master: Added context manager for instance lock  https://review.opendev.org/c/openstack/nova/+/87364814:21
*** blarnath is now known as d34dh0r5314:26
bauzasaaaaaand now I see more and more cirros guest segfaulting... https://review.opendev.org/c/openstack/nova/+/821228/714:42
bauzashttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da2/821228/7/check/nova-multi-cell/da2689f/job-output.txt14:42
opendevreviewTakashi Natsume proposed openstack/nova master: doc: mark the max microversion for 2023.1 Antelope  https://review.opendev.org/c/openstack/nova/+/87410314:44
dansmithbauzas: where does that scenario:dhcp_client thing come in?14:50
bauzasdansmith: https://review.opendev.org/c/openstack/nova/+/87393414:50
bauzasdansmith: and https://launchpad.net/bugs/200646714:51
dansmithbauzas: right, what does scenario:dhcp_client affect14:51
dansmithdoes that end up with cirros behaving differently? or does it make tempest do something inside the guest?14:51
bauzasdansmith: ralonsoh told me to use that way b/c of https://review.opendev.org/c/openstack/neutron/+/871272/1/zuul.d/tempest-multinode.yaml14:51
bauzasand I shamelessly copy/pasted14:52
* bauzas is a bit dumb after like rechecking 100+ times a day14:52
ralonsohthis is the dhcp client used in cirros 0.6.114:52
ralonsohmore info here: https://review.opendev.org/c/openstack/neutron/+/87127214:53
ralonsoh(in the commit message)14:53
dansmithralonsoh: but if the cirros client just uses it, what does the tempest config change?14:53
ralonsohdansmith, https://review.opendev.org/c/openstack/tempest/+/871270/2/tempest/common/utils/linux/remote_client.py14:54
ralonsohwe choose what is the VM dhcp client14:54
ralonsoh--> https://review.opendev.org/c/openstack/tempest/+/871270/2/tempest/config.py14:55
dansmithralonsoh: that's just for a manual renew, but doesn't affect how the guest works on first boot.. is that what you mean?14:55
ralonsohdansmith, no, that should not affect how the VM boots14:55
ralonsohthe OS will use the existing dhcp client14:56
dansmithralonsoh: gotcha okay, so I guess most of the issues I see getting an IP seem to be related to initial boot14:56
ralonsohright14:56
dansmithokay cool, just making sure I understand14:56
bauzasralonsoh: and to clarify, cirros-0.6.x switched its dhcp client to dhcpcd ?15:05
ralonsohyes15:05
bauzasack15:06
bauzasthen I understand what I wrote, huzzah :D15:06
dansmithralonsoh: does neutron record an event if the IP gets actually leased?15:08
dansmithmeaning, on failure can we query neutron to see if the guest ever pulled its IP, to distinguish between "we can't ssh to the guest because of network problems" vs. "the guest is not alive and never pulled its ip" ?15:08
ralonsohdansmith, let me check, maybe in the syslog15:08
ralonsohunderstood, let me check15:08
ralonsohdansmith, neutron builds (adds/deletes) the leases file but we don't log this event. This is done by dnsmasq, you should be able to see that in syslog15:10
bauzasoh good point15:10
sean-k-mooneydansmith: i dont think neutron does but dnsmasq might15:10
bauzasI forgot to look at dnsmasq, fucking shit15:11
bauzasmy ops skills become rusty15:11
dansmithit's too bad because it might be a nice API to be able to poke that remotely.. i.e. instead of sshing forever, have a reasonably short timeout for the is-it-leased15:11
sean-k-mooneybauzas: is this ovn15:11
fungiso are the cirros kernel panics similar to one another, or random excuses?15:11
sean-k-mooneybecause if its ovn we are not using dnsmasq15:11
dansmithand for reporting on failure, so we can say what forensics have been done15:11
sean-k-mooneythis is being handled by openflow rules added by ovn15:11
dansmithalso, all three of those failed tests are volume-related15:12
sean-k-mooneyfungi: if they are related to acpi then its a known issue with the cirros 0.5.2 kernel15:12
dansmithso I still wouldn't write-off it being a volume problem15:12
fungisean-k-mooney: sounds like a good reason to switch to 0.6.1 then15:13
dansmithfungi: that's what I said, but 0.6.1 bringing other changes could be more destabilizing15:13
sean-k-mooneyfungi: i started working on alpine based image 2 years ago after i found out that the kernel bug was fixed in a later ubuntu kernel and cirros was just not updated15:13
ralonsohdansmith, what is the backend? OVS or OVN?15:13
dansmithralonsoh: I dunno15:13
ralonsohis this nova-next, right?15:13
dansmithralonsoh: yes15:13
ralonsohok, let me check15:14
fungidansmith: agreed, the devils you know vs the ones you don't15:14
bauzasralonsoh: I've seen the dhcp lease problems in nova-next yes15:14
bauzasman, can't I provide a regex when gerrit searching with 'comment' ?15:14
dansmiththis is one with the cirros bumped: https://44f9259a9cd22acee92d-000061e1666ecf9c52f0643ab3c391ab.ssl.cf1.rackcdn.com/873934/1/check/nova-next/785cc57/testr_results.html15:15
dansmiththree tests with ssh timeouts in one job is higher than average I'd say15:15
dansmithwhich makes me concerned15:15
ralonsohdansmith, nova-next uses OVS. About this API call, could be something to be implemented, yes15:15
ralonsohbut we don't have it now15:15
dansmithralonsoh: ack, it just seems like it would be nice to have15:16
sean-k-mooneydansmith: im not sure it would be easy to do in all cases15:16
sean-k-mooneyit would be ml2 driver specific15:17
dansmithsean-k-mooney: I'm sure it wouldn't15:17
sean-k-mooneyi dont think you could do it with ovn currently15:17
sean-k-mooneynot without changes to ovn15:17
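The "short timeout for the is-it-leased" idea discussed above can be sketched as a small polling helper. This is only an illustration: as noted in the discussion, no such Neutron API exists today, so `is_leased` is a hypothetical callable standing in for whatever lease check might eventually be available, and `wait_for_lease` and `classify_failure` are made-up names.

```python
import time
from typing import Callable


def wait_for_lease(is_leased: Callable[[], bool],
                   timeout: float = 60.0,
                   interval: float = 2.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a (hypothetical) lease check until it succeeds or times out.

    Returns True as soon as is_leased() reports a lease, False once the
    deadline has passed without one.
    """
    deadline = clock() + timeout
    while True:
        if is_leased():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def classify_failure(lease_seen: bool) -> str:
    """Turn the lease result into a hint for the failure report."""
    if lease_seen:
        return "guest pulled its IP; suspect the network path to the guest"
    return "guest never pulled its IP; suspect the guest did not boot"
```

A test helper could call `wait_for_lease` before attempting SSH and include the output of `classify_failure` in the failure message, so the report says which forensics were already done.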
dansmithnecessity, it's the mother of.. I forget.. something.. :D15:18
bauzasdon't let me play that mother game15:19
dansmithbauzas: you and your language lately.. should probably avoid adding "mother" to things :D15:19
bauzas:)15:20
fungibauzas: supposedly you can. "regular expressions can be enabled by starting with ^" https://review.opendev.org/Documentation/user-search.html15:27
fungisays it specifically in the entry for the message: expression15:28
bauzasralonsoh: I just discovered some timeout on ovs 15:28
bauzasralonsoh: Feb 15 17:21:05.272099 np0033113580 nova-compute[73993]: DEBUG ovsdbapp.backend.ovs_idl.vlog [-] 0-ms timeout {{(pid=73993) __log_wakeup /usr/local/lib/python3.10/dist-packages/ovs/poller.py:248}}15:28
bauzas Feb 15 17:21:05.274517 np0033113580 nova-compute[73993]: DEBUG nova.compute.manager [None req-f2621b7f-2ead-4510-9e82-aad93ba9f29d tempest-ListServersNegativeTestJSON-1193978856 tempest-ListServersNegativeTestJSON-1193978856-project] [instance: d68e1732-b508-4f6b-be25-cbae06fde7c2] Build of instance d68e1732-b508-4f6b-be25-cbae06fde7c2 was re-scheduled: Timed out waiting for a reply to message ID c6193dab09f444b796478e321d69e3a215:28
bauzas {{(pid=73993) _do_build_and_run_instance /opt/stack/nova/nova/compute/manager.py:2450}}15:28
bauzasfungi: damn shit, missed that even if I did RTFM15:28
fungibauzas: sorry, i misread what you said. you're searching for comment not message15:28
bauzasfungi: yeah that15:28
ralonsohbauzas, what is this job link?15:28
fungithe entry for comment: doesn't mention regex15:28
fungijust says it's a string match15:29
bauzasralonsoh: something new https://zuul.opendev.org/t/openstack/build/76f29afe3f134f139a48b537de7029dc15:29
fungiso you're probably right15:29
bauzasfungi: doh15:29
fungisadly15:29
bauzasok, I was wanting to query all the recheck messages I wrote15:29
bauzasand since I haven't followed a clear pattern...15:29
fungicould script that through the rest api, but it would be a bit of work15:30
fungidepends on how much you want it, i guess15:30
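Scripting the recheck search through the REST API, as suggested above, might look roughly like the sketch below. It assumes the standard Gerrit `/changes/` endpoint with the `MESSAGES` and `DETAILED_ACCOUNTS` options and the `)]}'` prefix Gerrit puts in front of JSON responses; the function names, the query string and the username passed in are placeholders chosen for illustration, not something taken from the discussion.

```python
import json
import re
import urllib.parse
import urllib.request

XSSI_PREFIX = ")]}'"


def decode_gerrit_json(body: str):
    """Drop the )]}' prefix Gerrit prepends to JSON, then parse."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_changes(base_url: str, query: str, limit: int = 50):
    """Query /changes/ and request review messages with author details."""
    params = urllib.parse.urlencode(
        {"q": query, "o": ["MESSAGES", "DETAILED_ACCOUNTS"], "n": limit},
        doseq=True,
    )
    with urllib.request.urlopen(f"{base_url}/changes/?{params}") as resp:
        return decode_gerrit_json(resp.read().decode("utf-8"))


def matching_comments(changes, pattern: str, username: str):
    """Yield (change number, message) for comments by username that match."""
    regex = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
    for change in changes:
        for msg in change.get("messages", []):
            if msg.get("author", {}).get("username") != username:
                continue
            if regex.search(msg.get("message", "")):
                yield change["_number"], msg["message"]
```

Something like `matching_comments(fetch_changes("https://review.opendev.org", "project:openstack/nova status:open"), r"^recheck", "some-user")` would then do the regex filtering client side, which is the part the `comment:` search operator does not offer. Larger result sets would additionally need paging, which this sketch leaves out.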
bauzaswell, it was just for a quick look15:31
bauzasnevermind15:31
ralonsohbauzas, I don't know where this message is coming, but this is the OVS local service15:32
bauzasyup on n-cpu15:32
bauzasbut apparently it's serious enough to do a reschedule15:32
bauzas... which on an AIO doesn't help 15:33
bauzas:)15:33
ralonsohbauzas, actually this is something normal, coming from the python ovs bindings15:34
ralonsoh    def __log_wakeup(self, events):15:34
ralonsoh        if not events:15:34
ralonsoh            vlog.dbg("%d-ms timeout" % self.timeout)15:34
ralonsohhttps://github.com/openvswitch/ovs/blob/master/python/ovs/poller.py#L246#15:34
bauzashmmmm15:37
bauzasI'm able to find another patchset that got the exact failure from the same job on the same test tempest.api.compute.servers.test_list_servers_negative.ListServersNegativeTestJSON15:37
bauzashttps://d2746e36843633ae266c-ad135f72b22a132e11a904324fbc4e60.ssl.cf1.rackcdn.com/872413/3/check/tempest-integrated-compute-enforce-scope-new-defaults/5204f72/job-output.txt15:37
* bauzas does a bit of logsearch15:37
ralonsohbauzas, I'm checking https://44f9259a9cd22acee92d-000061e1666ecf9c52f0643ab3c391ab.ssl.cf1.rackcdn.com/873934/1/check/nova-next/785cc5715:41
ralonsohfor example the first test case failing15:41
ralonsohtest_attach_scsi_disk_with_config_drive15:41
ralonsohI see the DHCP agent configuring dnsmasq process, I see n-cpu creating the interface, OVS agent receiving this creation event15:42
ralonsohbut I see nowhere in the DHCP agent logs the DHCPREQUEST for this fixed IP15:42
ralonsohand the tempest test doesn't retrieve the VM logs15:42
dansmithyeah I'm not sure why we're not logging the console in that case15:43
dansmithbecause we should see that, to know if it's a guest crash15:43
ralonsohdansmith, qq, this is about the image resources15:46
ralonsohFeb 15 15:24:26.291000 np0033111254 nova-compute[39923]: DEBUG nova.compute.resource_tracker [None req-a80484d7-0e98-4086-a01f-afda9d54f06b None None] Instance 749a5ad0-e7b8-47f2-8e9c-78896c92b80e actively managed on this compute host and has allocations in placement: {'resources': {'VCPU': 1, 'MEMORY_MB': 128, 'DISK_GB': 1}}. {{(pid=39923) _remove_deleted_instances_allocations /opt/stack/nova/nova/compute/resource_tracker.py:1632}}15:46
bauzasdespite, none of them show the console log15:46
ralonsohwhy are you using 128MB?15:46
ralonsohcirros should be using 256, right?15:46
bauzashmmm15:47
dansmithralonsoh: idk, maybe that was the result of a resize?15:47
bauzasthat's a good catch15:47
ralonsohdansmith, no, this is the VM booting15:47
bauzasI haven't checked the flavors that were related to the cirros failures I saw15:47
ralonsohfirst log in the compute agent 15:47
dansmithflavors are 42, and 8415:47
* dansmith looks15:47
ralonsoh128 and 19215:48
ralonsohI'll check neutron CI15:48
dansmith42 is the default, and that's 128M15:49
opendevreviewAlexey Stupnikov proposed openstack/nova stable/ussuri: Test aborting queued live migration  https://review.opendev.org/c/openstack/nova/+/87357515:50
dansmith84 is 192M yeah15:50
dansmithso perhaps we're flying too close to the sun with 128M15:51
ralonsohdansmith, I'm checking what we are using in Neutron15:51
dansmithAFAIK, these are devstack defaults15:52
ralonsohwe use the default values too, 128M15:53
dansmithbumping the flavor memory is likely to result in other instabilities, I fear15:53
ralonsoh{'resources': {'DISK_GB': 1, 'MEMORY_MB': 128, 'VCPU': 1}}15:53
dansmithyeah15:53
ralonsohif you could retrieve the console logs, that could help15:54
ralonsohat least to know that the VM tried to request an IP15:54
ralonsohand, somewhere, this request was dropped15:54
dansmithyeah, we should chat with gmann about it. I can go look at the tempest stuff to figure out why,15:55
dansmithbut he probably knows off the top of his head15:55
bauzasralonsoh: I have a few logs where the guest failed to acquire a lease, sec16:04
bauzasand some where the guest panicked16:04
sean-k-mooneyfor what its worth the ever increasing memory requirement for cirros was one of the reasons i looked at moving us to alpine a few years ago16:06
sean-k-mooneylongterm i still think that would be a better approach16:06
dansmithis alpine really going to be smaller than cirros? I mean, that seems odd to me16:06
opendevreviewAlexey Stupnikov proposed openstack/nova stable/victoria: Test aborting queued live migration  https://review.opendev.org/c/openstack/nova/+/84574816:07
opendevreviewAlexey Stupnikov proposed openstack/nova stable/victoria: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/84575316:07
dansmithso I think we get the automatic console stuff when we use specific waiters for servers to be available16:07
dansmithso that test must use a different one16:07
bauzasoh damn, internal meeting16:07
ralonsohbauzas, do you have some links? in any case, this is not Nova nor Neutron fault, I think16:08
opendevreviewAlexey Stupnikov proposed openstack/nova stable/victoria: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/84575416:08
ralonsohat least in this case16:08
bauzasralonsoh: yup, which kind of failures do you want to see  ? for the dhcp query?16:08
ralonsohyes and the kernel panic 16:09
bauzasralonsoh: one for segfaults https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da2/821228/7/check/nova-multi-cell/da2689f/job-output.txt16:20
bauzasralonsoh: that one for kernel panicking https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e4a/821228/7/gate/nova-next/e4ab52f/job-output.txt16:23
bauzasralonsoh: that one is an interesting case where the default route is already present and metadata goes into weeds https://ae59d1e8526fa7671728-240e4b572b6f89b26c1b0e70b1c00c17.ssl.cf1.rackcdn.com/872413/3/check/nova-multi-cell/5e89e48/job-output.txt16:24
ralonsohI'll check this last one16:27
ralonsohbauzas, eh hold on, this could be a problem with the OVN version in Jammy16:28
ralonsoh--> https://review.opendev.org/c/openstack/neutron/+/87368416:28
ralonsohthere is an issue with ovn v22.03.0, included in jammy16:28
ralonsohand some missing flows for the metadata16:29
ralonsohYatin found it and we are skipping those tests16:29
ralonsohactually we are going to test using a compiled version of OVN16:29
ralonsohhttps://review.opendev.org/c/openstack/neutron/+/874112/116:29
bauzasack ok16:30
*** dasm is now known as Guest504616:46
opendevreviewAlexey Stupnikov proposed openstack/nova stable/ussuri: Test aborting queued live migration  https://review.opendev.org/c/openstack/nova/+/87357516:51
opendevreviewAlexey Stupnikov proposed openstack/nova stable/ussuri: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/87357616:51
opendevreviewAlexey Stupnikov proposed openstack/nova stable/ussuri: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/87357717:01
opendevreviewAlexey Stupnikov proposed openstack/nova stable/ussuri: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/87357717:09
fungiralonsoh: any idea if the fixes have been backported to v22 such that ubuntu could do an sru to patch their packages?17:25
ralonsohfungi, Yatin opened a bug today: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/200305617:25
fungiawesome17:26
ralonsoh(sorry, not today)17:26
fungii'd hate to see devstack using a bespoke ovn build long-term17:26
fungihere's hoping they're able to patch it17:26
ralonsohfrom https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2003056/comments/4, there should be a new version now (22.04.1, instead of 22.03)17:27
bauzasralonsoh: I guess you don't recommend us to work on nova's zuul jobs to build ovs from source?17:29
fungioh, it didn't dawn on me that those might be date-based versions rather than semver, so yeah here's hoping backporting the fix in ubuntu won't be painful17:29
bauzass/ovs/ovn17:29
* bauzas is tired17:29
ralonsohbauzas, no, we should use the OS released version17:29
bauzascool17:30
ralonsohwe use compiled version in Neutron for testing only 17:30
bauzasyup saw the DNM17:30
ralonsoh(and we had problems for this, this is why we use both now)17:30
bauzasbut I was wondering how much of this was actionable on our side17:30
bauzasI'm like done rechecking every 2 hours17:30
bauzasany actionable progress sounds to a sweet spot :)17:31
bauzassounds to me*17:31
fungioh yay, so it ended up in j-p-u yesterday and should be showing up on mirrors at any moment assuming the ubuntu autobuilders aren't clogged17:32
fungiwe may need to add that repository temporarily to the sources list in affected jobs, until it migrates into a jammy point release17:33
fungihttps://launchpad.net/ubuntu/+source/ovn/22.03.2-0ubuntu0.22.04.1 indicates the binary packages haven't built yet17:35
* bauzas ends his day now17:36
bauzasI hereby declare Nova on Feature Freeze :)17:37
bauzas(anyway, all the accepted blueprints were reviewed)17:37
* bauzas will continue fighting the Beast tomorrow and later so as to make sure we land what we approved17:37
bauzaskbye ;)17:37
fungifnordahl: i'm a little fuzzy on ubuntu's sru flow... is it like proposed-updates in debian where it only makes it into the mainstream indices in periodic point releases and we need to put jammy-proposed in sources.list in the interim?17:38
gmanndansmith: bauzas: ralonsoh: not read all the logs but related to cirros image there is patch up to bump it to version 0.6.1. https://review.opendev.org/c/openstack/devstack/+/85977317:56
ralonsohgmann, thanks. We have seen some seg faults and kernel panics during the VM boot, using this image17:59
ralonsohin https://review.opendev.org/c/openstack/nova/+/87393417:59
*** dasm is now known as Guest505218:10
spatelsean-k-mooney i saw your post about my question related ceph disaster 19:05
spatelI am trying to do rescue method and stuck here - https://ibb.co/y84BLGY19:05
sean-k-mooneyhum ok have you tried un rescuing and seing if it fixed enough for the vm ot recover its self19:08
spatelLet me try now..19:09
spatelno luck - https://ibb.co/4swXjQ119:10
spatelceph status showing all PGs are clean and active - https://paste.opendev.org/show/bnG8rvXJydADTZknd2QD/19:11
spatelnot sure why i got filesystem corruption. 19:11
mnaserare there ci jobs that are testing secure rbac across all services?19:24
mnaserin an all zed env, enabling it for neutron seems to break new vm deployments19:25
mnasersomething along these lines: 2023-02-15 22:04:28.241 2935939 ERROR nova.compute.manager [instance: aaac6261-721a-404d-80d3-94cf25bf869b] neutronclient.common.exceptions.PortNotFoundClient: Port 485c1e6e-fdcd-45ca-ae52-c2cdeb95bfdd could not be found.19:25
sean-k-mooneyim not really sure what to do. you could try stopping the vm and mounting the volume on the host and running the filesystem recovery from there19:25
sean-k-mooneyyou might be able to fix the superblock but im not sure if that will work19:25
sean-k-mooneymnaser: i thought there was a devstack job for this yes gmann would know more19:27
mnaseri mean enabling it in nova works fine, so nova is happy, but enabling in neutron makes it un happy19:27
mnaserso it could be a neutron issue so i dont know if the devstack job enables it for all or just for specific services19:27
sean-k-mooneyright but i thought we had it enabled for both19:27
sean-k-mooneyya its a good question im not sure either19:28
mnaserhttps://github.com/openstack/nova/blob/master/.zuul.yaml#L678-L68419:29
mnaserwonder if its that19:29
sean-k-mooneyya that should have it enabled for those 4 services19:33
sean-k-mooneymnaser: https://zuul.openstack.org/builds?job_name=tempest-integrated-compute-enforce-scope-new-defaults&skip=019:33
sean-k-mooneyit looks pretty green too19:34
sean-k-mooneywell there is at least some green meaning it should work in general but im not sure how much is covered by that19:34
mnasersean-k-mooney: i wonder if we are getting hit by this since its not in zed yet - https://github.com/openstack/neutron/commit/6d8ada0ac93beed05b45adb9582c3ef23bef49d219:37
mnaserand the test that failed was actually as an admin19:37
sean-k-mooney oh you're trying to do this in zed19:38
sean-k-mooneyya ok we only enabled it by default this cycle19:38
sean-k-mooneymnaser: im not sure that was planned to be backported19:39
sean-k-mooneyi would ask the neutron folk to backport it if you intend to enable it19:40
sean-k-mooneyits kind of feature ish19:41
mnasersean-k-mooney: yeah i guess one could argue its a bug too19:41
sean-k-mooneyits because of the pivot that happened at the yoga forum/ptg19:42
sean-k-mooneywhen we decided to revert a lot of the work and remove the use of scopes from most apis19:42
sean-k-mooneyunder the original plan admin should not be global admin19:43
sean-k-mooneyso they adapted to that change in zed19:43
gmannmnaser: yes, you found those. during integration testing in this cycle (tempest-full-enforce-scope-new-defaults), we found a few bugs in neutron and they got fixed in master 21:51
gmannmnaser: I think slaweq was planning to backport those to stable/zed or older if needed21:51
gmannlet me find those patches21:51
gmannbasically these three bugs https://bugs.launchpad.net/neutron/+bug/1996150  https://bugs.launchpad.net/neutron/+bug/1996836 https://bugs.launchpad.net/neutron/+bug/199708921:55
gmannmnaser: pinged about these in neutron channel. I was under the impression that they were backported already21:57
gmannmnaser: glad to know it worked fine for nova. did you enable scope and new defaults both or just new defaults ?22:06
*** JohnnyW5 is now known as JohnnyW23:11
mnaserwe enabled both gmann !23:13
gmannok23:17
*** dasm_ is now known as dasm|off23:55
