Wednesday, 2021-10-27

*** dpawlik3 is now known as dpawlik08:34
gibigood day nova08:56
* kashyap waves09:09
bauzashola folks09:17
gibio/09:20
gibi-another day in downstream land-09:20
opendevreviewBalazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81341909:34
gibistephenfin: do you recall why you needed to configure both db in a single Database fixture in https://review.opendev.org/c/openstack/nova/+/799526/5/nova/tests/fixtures/nova.py#612 ?10:58
gibiI think our tests are using a separate fixture instance for each db (api, main)10:59
stephenfingibi: I'm not 100% sure but my guess is that I don't, and that was simply for expediency/laziness :) If SESSION_CONFIGURED became a mapping of DB type to "is configured" bool, we probably wouldn't need that11:00
stephenfin*we don't11:00
gibiI see so we had a single global but with two dbs to configure11:01
stephenfinYeah, I think so11:01
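A minimal sketch of the idea stephenfin floats above: replace the single global flag with a mapping from DB type to an "is configured" bool, so that each fixture instance only configures its own database. The dict shape of `SESSION_CONFIGURED`, the helper `configure_once` and the `"main"`/`"api"` keys are illustrative assumptions here, not nova's actual fixture code.

```python
# Hypothetical sketch, not nova's real fixture: one "configured" flag per DB
# type instead of a single module-level boolean.
SESSION_CONFIGURED = {"main": False, "api": False}


def configure_once(database, configure):
    """Run ``configure`` for ``database`` only the first time it is requested.

    Returns True if configuration ran, False if it had already been done.
    """
    if database not in SESSION_CONFIGURED:
        raise ValueError(f"unknown database: {database!r}")
    if SESSION_CONFIGURED[database]:
        return False
    configure()
    SESSION_CONFIGURED[database] = True
    return True


calls = []
configure_once("main", lambda: calls.append("main"))
configure_once("api", lambda: calls.append("api"))
configure_once("main", lambda: calls.append("main"))  # no-op: already configured
```

With per-database flags, the fixture instantiated for "main" no longer has to configure "api" as a side effect.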
gibiOK, if that is the only reason then I think I have a way to remove that global flag (based on melwitt's idea) with patch_factory from oslo_db11:02
gibiit is no pretty confusing that we have to Database fixture instantiated one for main and one for api but the first one configures both db11:03
gibis/no/now11:03
gibi/to/two11:03
gibi /o\11:03
stephenfinyeah, tbc it could be more complicated than that but I really doubt it11:04
gibiyeah, lets see if my idea works11:08
sean-k-mooneystephenfin: since you're about, here's an easy one for you: https://review.opendev.org/c/openstack/nova/+/811947 think we can get that over the line?11:12
stephenfinsure, will look now11:13
sean-k-mooneythanks :)11:14
sean-k-mooneystephenfin: based on the ptg discussion would you mind removing your -2 on https://review.opendev.org/c/openstack/nova/+/804292 I'm going to rebase that and the autopep8 one shortly11:19
fricklerkashyap: I didn't make progress with reproduction without nova yet, so I created https://gitlab.com/qemu-project/qemu/-/issues/693 for now. let me know if you need additional data there13:05
kashyapfrickler: Thanks for the report.  A quick one is: were you using nested setup, or was this DevStack instance on a baremetal host (<shudder>)?13:08
kashyapA thumb-rule is to always explicitly state so if you're using a nested setup13:09
kashyapfrickler: Can you edit the report to state that "deploy DevStack in a VM?"  So that an unsuspecting dev won't run it on their baremetal laptop and wreak havoc...13:09
kashyapI'll add a quick comment there, actually13:10
kashyapDone13:17
kashyapfrickler: I'll check about it w/ a TCG dev13:17
fricklerkashyap: yes, nested is correct, I added that to the description. though I could also duplicate on a baremetal host if you assume that it would behave differently13:25
kashyapfrickler: No, no need for baremetal.  VMs are best.  Can you also post the QEMU command-line of the DevStack VM itself?  (The level-1 VM)13:37
fricklerkashyap: no, I have no admin access to the cloud it is running on. I'm assuming it will essentially look the same, though, just with accel=kvm13:39
kashyapHmm, not sure if it'll be the same w/ accel=kvm.  The details would change quite a bit.  The "host" (guest hypervisor) setup can determine the guest behaviour here a lot.13:43
kashyapThat's one of the questions I'd expect from a TCG dev 13:43
fricklerkashyap: o.k., I'll try running cirros locally without devstack in between, that would give the simplest setup in the end13:45
kashyapfrickler: Sure; yeah, that'd be the best.  The shorter the route to the reproducer, the more likely we can get to the root cause13:47
kashyapfrickler: Thanks for all the testing!  It's a pain, I Know13:47
kashyaps/K/k/13:47
*** kopecmartin is now known as kopecmartin|pto14:00
fricklerkashyap: that went easier than I expected, updated the issue14:14
bauzasgibi: sean-k-mooney: I think we said https://bugs.launchpad.net/nova/+bug/1947753 is valid during our PTG, right?14:19
gibibauzas: I don't remember discussing this14:21
bauzasgibi: sorry you're maybe right14:21
gibiwhat I see is that in the bug they evacuate instances without restarting the compute node14:21
bauzasI originally thought this was about evacuate/evacuateback/evacuate14:21
bauzasadding a comment14:22
gibiso far we said that you can only evacuate if you make sure that the compute is dead 14:23
gibiin the bug case the compute was halted / stuck, the heartbeat was missing so the service was considered down, nova allowed evacuation, then the compute recovered without the nova-compute service restarted14:23
gibinova-compute only cleans up evacuated instance during init_host but does not do it periodically14:24
gibiso in this case the evacuated instance was not cleaned up on the source leading to duplicated instances causing corruption14:24
gibioption a) change nova-compute to clean up evacuated instance in a periodic14:25
gibioption b) change the evac API to only allow evacuation if the compute is forced down (meaning the admin made sure the host is fenced)14:26
gibioption c) declare the current bug as user error as the nova-compute was not restarted as part of the compute node recovery14:27
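A minimal sketch contrasting the behaviour gibi describes (a missing heartbeat is enough for nova to allow evacuation) with option b (require the explicit forced-down state). `ComputeService` and the `may_evacuate_*` helpers are hypothetical stand-ins for illustration, not nova's objects or API code.

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    """Toy stand-in for a compute service record (hypothetical)."""
    host: str
    heartbeat_seen: bool  # False once heartbeats stop arriving
    forced_down: bool     # set explicitly by the admin after fencing the host


def is_down(service):
    # As described in the discussion: a missing heartbeat alone is enough
    # for the service to be considered down.
    return service.forced_down or not service.heartbeat_seen


def may_evacuate_current(service):
    # Behaviour in the bug report: evacuation is allowed once the service
    # is considered down, even if the host was only stuck.
    return is_down(service)


def may_evacuate_option_b(service):
    # Option b: require the explicit forced-down flag, i.e. the admin's
    # assertion that the host has been fenced.
    return service.forced_down


stuck = ComputeService("cmp1", heartbeat_seen=False, forced_down=False)
fenced = ComputeService("cmp2", heartbeat_seen=False, forced_down=True)
```

Under option b the stuck-but-not-fenced host from the bug would be rejected, at the cost of requiring an extra admin step before every evacuation.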
bauzasgibi: I wrote a large comment on the bug14:33
bauzasI think CERN is triggering evacuations before verifying the host status14:33
sean-k-mooneybauzas: i don't think we talked about this either14:34
bauzasas I said, I feel healthchecks can help them getting a better decision-making about whether they need to evacuate or not14:34
sean-k-mooneybauzas: we talked about a related issue with allocations that also impacts evacuate14:34
bauzassean-k-mooney: yeah, my confusion, I originally thought it was about the back-and-forth about evacuate we discussed for pain points14:35
sean-k-mooneye.g. if for any reason we oversubscribe the allocations then we can't evacuate14:35
sean-k-mooneyi have not read it fully but it sounds like they are not properly fencing the node and ensuring the vm is not running14:36
sean-k-mooneybefore evacuating if they are having application data corruption14:36
bauzasthat's literally what I wrote.14:41
bauzasanyway, moving to a new bug.14:41
sean-k-mooneyack as i said have not finished reading the bug description or comments so glad we agree :)14:42
kashyapfrickler: Ah-ha, noted.  Good news: there's already some response from two QEMU devs, with a patch in a newer version :)14:51
fricklerkashyap: yeah I just responded, but I didn't see the patch reference. going from 32M to 1G really sounds a bit excessive, would be good to be able to tune that14:55
kashyap(Well, I don't quite think it's "good news" ...)14:55
kashyapfrickler: Sorry, I was referring to the commit that DanPB pointed out - https://gitlab.com/qemu-project/qemu/-/commit/600e17b2614:55
fricklerkashyap: ah, yes, that seems to be the patch that triggers this, I thought you were referring to a fix in a recent commit14:56
kashyapfrickler: Yeah, that increase is a tad too much.  14:56
kashyapfrickler: Yes, poor phrasing on my part.14:56
opendevreviewBalazs Gibizer proposed openstack/nova master: Remove SESSION_CONFIGURED global from DB fixture  https://review.opendev.org/c/openstack/nova/+/81568914:56
fricklerkashyap: otoh that also is likely to explain why tests seemed to be going faster on Bullseye than on Focal14:57
opendevreviewBalazs Gibizer proposed openstack/nova master: Refactor Database fixture  https://review.opendev.org/c/openstack/nova/+/81569014:58
kashyapfrickler: Interesting; what tests are going faster?14:59
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix interference in db unit test  https://review.opendev.org/c/openstack/nova/+/81473514:59
gibistephenfin: ^^ here is the removal of the global SESSION_CONFIGURED flag from the DB fixture and some extra :D15:00
fricklerkashyap: I didn't check in particular, but the whole tempest-full job with --serial on Debian doesn't take much longer than with the default (-c 4 I think) on Focal15:02
kashyapI see.15:02
gibimelwitt: thanks a lot for the help explaining the global db transaction factory situation. I used your info to actually remove SESSION_CONFIGURED from our fixture along with the unit test fixes15:05
kashyapfrickler: So, it is tunable via command-line, but it's not wired up in libvirt yet, though.15:05
kashyapfrickler: See the option: -accel tcg,tb-size=$value_in_MiB15:05
kashyap"tb-size" in the man page15:06
fricklerkashyap: as long as libvirt doesn't support it, I fear that won't help much. might be good to cap it to something like 50% of the VM memory15:10
kashyapfrickler: Right; libvirt just didn't wire it up ... we can meanwhile do a nasty hack of uploading a QEMU binary to the CI system w/ this param tweaked15:11
kashyapfrickler: Do you have the appetite to file a libvirt upstream RFE?  (Then I can clone it downstream, and get it triaged)15:11
fricklerkashyap: I think I'll do a local test with a reduced default tb-size first in order to be certain that that's the cause. but not before tomorrow15:13
kashyapRight, no rush at all15:13
gibibauzas: replied in https://bugs.launchpad.net/nova/+bug/1947753 I think _destroy_evacuated_instances is not called periodically15:14
kashyapfrickler: So I see that someone else has raised this upstream last year: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg05235.html (TB Cache size grows out of control with qemu 5.0)15:16
bauzasgibi: indeed, only when restarting15:22
bauzasdid I say the other way ?15:22
kashyapfrickler: So, this worked for me:15:22
kashyap -machine q3515:22
kashyap -accel tcg,tb-size=25615:22
kashyap(As an example)15:23
gibibauzas: at least I understood this sentence that way "Either way, if the service continues to run, it verifies the evacuation status periodically and deletes the host."15:25
gibibauzas: btw, about https://bugs.launchpad.net/nova/+bug/1947687 I cannot formulate a logstash signature it seems that this error happens in a lot of cases when no test cases are failing so I get a lot of false positives15:27
bauzasgibi: okay, then my brain fucked15:27
kashyapfrickler: For reference, a minimal command-line:15:27
kashyap$> qemu-kvm -display none -cpu Nehalem -no-user-config \ -machine q35 \ -accel tcg,tb-size=256 \ -nodefaults -m 2048 -serial stdio \ -drive file=/export/vm1.qcow2,format=qcow2,if=virtio15:27
kashyap(Ugh, line-breaks are broken, but you see what I mean)15:27
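A minimal sketch of frickler's suggestion to cap the TCG translation-block cache at something like 50% of the VM memory, producing the `-accel tcg,tb-size=...` argument kashyap shows. The helper names, the 1024 MiB default (taken from the "32M to 1G" remark) and the idea of computing this outside QEMU are assumptions for illustration; neither QEMU nor libvirt is claimed to do this.

```python
def capped_tb_size_mib(vm_memory_mib, requested_mib=1024, max_fraction=0.5):
    """Return a tb-size in MiB, capped to a fraction of the guest's memory."""
    if vm_memory_mib <= 0:
        raise ValueError("vm_memory_mib must be positive")
    cap = int(vm_memory_mib * max_fraction)
    # Never go below 1 MiB, never above the requested size or the cap.
    return max(1, min(requested_mib, cap))


def accel_args(vm_memory_mib, requested_mib=1024):
    """Build the QEMU argv fragment selecting TCG with an explicit tb-size."""
    size = capped_tb_size_mib(vm_memory_mib, requested_mib)
    return ["-accel", f"tcg,tb-size={size}"]
```

For the 2048 MiB guest in the command line above, `accel_args(2048, 256)` keeps the requested 256 MiB, while a 128 MiB guest such as a small test image would be capped to 64 MiB.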
bauzasgibi: ack for the logstash thing, no worries15:28
fricklerkashyap: thx, added a comment to the issue, seems the libvirt path is really the most promising one15:34
kashyapfrickler: Definitely.  Please file the RFE (and post a link to me, Bz Ccs will take me slower to process) when you can15:39
kashyapThanks for the patience :)15:39
melwittbauzas: hi, could you pls take a look at these train backports when you get a chance? someone posted a comment on the top patch yesterday indicating they are awaiting merge of the fixes https://review.opendev.org/q/topic:%2522bug/1927677%2522+branch:stable/train+status:open15:49
opendevreviewBalazs Gibizer proposed openstack/nova stable/pike: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81343715:49
bauzasmelwitt: ack, doing it now15:49
melwittthanks!15:50
bauzasmelwitt: I already reviewed them but forgot to submit, my bad15:51
bauzasnow this is fixed.15:52
melwittbauzas: a-ha, thank you15:59
stephenfingibi: question on https://review.opendev.org/c/openstack/nova/+/81569016:13
stephenfinplease excuse my ignorance16:13
gibilooking16:14
gibistephenfin: you are right something is fishy there16:23
gibiI have to go back and poke that test to understand what is happening16:23
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM:goat  https://review.opendev.org/c/openstack/nova/+/81570516:32
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: goat 2  https://review.opendev.org/c/openstack/nova/+/81570616:32
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: goat3  https://review.opendev.org/c/openstack/nova/+/81570716:32
gibigmann: I did the change you requested in https://review.opendev.org/c/openstack/tempest/+/809168/comment/35477e85_10754ba5/ but I'm wondering why we need that indirection16:34
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: goat 2  https://review.opendev.org/c/openstack/nova/+/81570616:36
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: goat3  https://review.opendev.org/c/openstack/nova/+/81570716:36
em_are there currently issues with xena nova and (debian) cloud images? Neither my ssh keys nor the admin password seems to get applied. Any open bugs (maybe libvirt issues or kernel related?) using 5.10 debian bullseye as host, kolla xena (ubuntu/source) as libvirt17:16
opendevreviewBalazs Gibizer proposed openstack/nova master: Refactor Database fixture  https://review.opendev.org/c/openstack/nova/+/81569017:18
gibistephenfin: you had a valid point, fixed it ^^17:19
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix interference in db unit test  https://review.opendev.org/c/openstack/nova/+/81473517:20
gmanngibi: replied, basically Tempest tests the services with what is configured to test instead of 'test what cloud/service APIs return'17:21
gmannautodetecting service features/extensions to decide what to test can hide the error. 17:22
gibigmann: OK, I think I got it. Does devstack need to be changed to generate the extension name to the tempest config/17:26
gibi?17:26
gmanngibi: we do that, like master tests with 'All' (enable everything) and stable branches are pinned to the extensions list as of the time the stable branch was released. like this - https://review.opendev.org/c/openstack/devstack/+/81148517:28
gmannfor now on master we do not need to do anything in devstack side17:28
gibigmann: ack, thanks for the help and explanation17:30
gmannI will review the tempest patch once gate result is finished 17:32
gmannthanks for update17:32
opendevreviewMerged openstack/nova master: Ensure MAC addresses characters are in the same case  https://review.opendev.org/c/openstack/nova/+/81194718:03
opendevreviewMerged openstack/nova master: Fix instance's image_ref lost on failed unshelving  https://review.opendev.org/c/openstack/nova/+/80755118:29
Zer0Bytehey19:14
Zer0Bytequestion19:14
Zer0ByteI'm using the cinder frontend option to perform QOS at the storage with the spec total_iops_sec_per_gb=319:15
Zer0Byteit is working great19:15
Zer0Bytebut after extending the volume it is not updating the total_iops_sec property on the KVM template19:15
Zer0Byteis that normal19:15
Zer0Byte?19:15
EugenMayerHello. Is anybody else having trouble with (Xena) bootstrapping a debian 11 (generic cloud) or debian 10 (openstack variant) and not being able to pre-deploy a ssh-key or even a root password? Looking at the logs, it always prints that there is no suitable ssh key to deploy. Tried it with an rsa or ed key, no luck. Any hints?19:37
EugenMayerThe boot log looks like this: https://gist.github.com/EugenMayer/452de9229e8f47dad0fadb4f8774d48219:39
clarkbEugenMayer: are you booting it with the proper flag to assign a nova ssh key to the instance?20:21
clarkbAlso if cloud-init can't reach the nova metadata service this might happen. You might try using a config drive if it isn't already20:22
Zer0Byteno one with the issue of refresh the kvm 23:52
Zer0Bytevolume iops23:52
Zer0Byte?23:52
