Tuesday, 2021-11-02

fricklerpriteau: ^^05:56
opendevreviewBalazs Gibizer proposed openstack/nova master: Revert "Temp disable nova-manage placement heal_allocation testing"  https://review.opendev.org/c/openstack/nova/+/81624207:05
bauzasgood morning Nova08:20
bauzasgibi: sent https://review.opendev.org/c/openstack/nova/+/815940 to the gate08:21
gibibauzas: good morning and thanks08:51
bauzasnp08:52
bauzashow did things go these last 2 days?08:52
gibinothing significant for me but I was mostly off yesterday08:52
fricklerkashyap: in case you didn't see it yet: https://gitlab.com/libvirt/libvirt/-/issues/22909:08
kashyapfrickler: Morning09:08
kashyapfrickler: Just back after 2 days away from email.  I indeed didn't see it.  Thanks for filing it!09:08
fricklerkashyap: I also tested with a custom built qemu in https://review.opendev.org/c/openstack/devstack/+/815958 , which essentially has tb-size=64M09:09
fricklerthe failures seem to be unrelated09:09
kashyapfrickler: Oh, cool, so you fetched the file and tested it.  (I see no failures there; Zuul gave a +1)09:14
kashyapfrickler: Did setting it to 64M bring it back to the "previous capacity"?09:14
kashyap(I'm putting it in quotes because, I don't know how many instances you were able to launch before this QEMU change)09:14
fricklerkashyap: the failures were in some of the rechecks. the old failures weren't 100% deterministic, it depends on how tempest with -c4 schedules parallel jobs that all start multiple instances09:20
kashyapfrickler: Right.  Shall we let it run on multiple clouds / setups for a week or so?  To confirm it's really the tb-size?09:21
fricklerI based the 64M on looking at a single instance locally with a 128M flavor, qemu then uses a bit more memory than with 4.2, but not too much hopefully, like ~200M instead of 150M09:21
fricklerkashyap: I intend to do a couple more rechecks, but there seems to be some issue on the neutron side which makes things unstable09:22
fricklerI'm pretty confident by now though that tb-size is the trigger09:23
stephenfinI need to test the 'GET /servers/{server_id}/migrations/{id}' API, meaning I need a way to slow down live migrations so I catch one in the act. Anyone have a suggestion for an easy throttle I can set to do that?09:37
stephenfin(rather than relying on big or busy guests)09:38
gibistephenfin: limiting bandwidth ?09:39
stephenfinI assume there isn't a nova or libvirt config option I can use for that though?09:39
stephenfinThis is a simple DevStack two-node deployment, so I don't have a separate management network :)09:39
gibithere is something in libvirt09:39
gibias there is virsh migrate-setspeed command in virsh09:40
stephenfinoh, I looked and didn't see anything obvious09:40
* stephenfin googles09:40
kashyapfrickler: Do mention the Neutron issue in the change, if / when you get a minute09:41
kashyapfrickler: And probably Cc some folks from Neutron who might be able to debug09:41
stephenfingibi: that's exactly what I wanted. Thanks!09:41
kashyapstephenfin: Yes, migrate-setspeed lets you throttle indeed09:42
gibistephenfin: cool09:42
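For reference, the throttle gibi points stephenfin at can be set per-domain with virsh; a minimal sketch (the domain name `instance-00000001` is a placeholder for the instance's libvirt name, and bandwidth is in MiB/s):

```shell
# Sketch: throttle live-migration bandwidth so an in-progress migration
# is easy to catch via GET /servers/{server_id}/migrations/{id}.
# Run on the source compute host; the domain name is a placeholder.

# Cap migration bandwidth to 1 MiB/s for the running domain
virsh migrate-setspeed instance-00000001 --bandwidth 1

# Confirm the new cap
virsh migrate-getspeed instance-00000001
```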
bauzasmmmm09:51
bauzasjust saw a new "Your Turn" series in Gerrit default dashboard09:51
bauzaswhat's the "attention:self" query ?09:51
bauzashah, nevermind, found https://gerrit-review.googlesource.com/Documentation/user-attention-set.html09:52
bauzasinteresting09:52
gibiI'm still learning the rules described in ^^09:59
opendevreviewBalazs Gibizer proposed openstack/nova master: Reno for qos-minimum-guaranteed-packet-rate  https://review.opendev.org/c/openstack/nova/+/80504610:15
gibibauzas: fyi, this is the final patch (the reno) https://review.opendev.org/c/openstack/nova/+/805046 for the https://blueprints.launchpad.net/openstack/?searchtext=qos-minimum-guaranteed-packet-rate blueprint. So we can close that bp soon \o/10:17
bauzasgibi: wow, this was fast.10:17
gibibauzas: we only missed the nova-manage part of that bp in xena10:18
bauzasyup, I know10:18
bauzasbut still :)10:18
gibiyeah, it is always nice to close out a bp even before M110:18
bauzaswe discussed this at the PTG, I wasn't expecting the nova-manage patch to land that soon :)10:18
gibiit is thanks to stephenfin and melwitt 10:19
opendevreviewBalazs Gibizer proposed openstack/nova master: Reno for qos-minimum-guaranteed-packet-rate  https://review.opendev.org/c/openstack/nova/+/80504610:24
gibibauzas: btw, there is a bug fix for the series (for those part we landed in xena) https://review.opendev.org/c/openstack/nova/+/81139610:26
bauzas+w10:28
opendevreviewFederico Ressi proposed openstack/nova master: Debug Nova APIs call failures  https://review.opendev.org/c/openstack/nova/+/80668310:32
lyarwoodfrickler: just catching up after a few weeks out, excellent work with the QEMU tb-size issue! 10:38
kashyaplyarwood: Yeah, libvirt needs to wire it up now, though10:42
kashyapI'll file a RHEL libvirt RFE - that might get on their triage queue quicker10:43
gibibauzas: awesome, thank you10:44
* bauzas goes off for gym duties10:47
lyarwoodkashyap: yeah, shame we can't hackaround this in the meantime somehow10:50
lyarwoodkashyap: couldn't we pass QEMU args directly through libvirt from Nova in the meantime?10:50
* lyarwood has a look10:51
kashyaplyarwood: Definitely, there's QEMU command-line passthrough...10:51
kashyapFor libvirt XML10:51
lyarwoodsecond day back and I'm already writing another hackaround 10:51
kashyaplyarwood: But wait:10:51
kashyapNova doesn't have XML modelling classes for command-line passthrough (for good reasons) :-(10:51
kashyaplyarwood: The only current hack is to upload a custom QEMU build with that built in 10:52
lyarwoodewww10:52
lyarwoodI'd rather add the logic in Nova with a workaround option tbh10:52
lyarwoodthan build our own custom QEMU10:52
kashyaplyarwood: I agree, it's nasty to do custom builds for the medium term10:52
kashyapThe logic in Nova would require wiring in these bits, BTW: https://libvirt.org/kbase/qemu-passthrough-security.html10:53
kashyap(Including the namespace at the top)10:53
lyarwoodYup that's easy enough10:53
* lyarwood gives it a go now10:54
kashyapAnd still it requires more edits.  I was testing last week10:54
kashyapWhen using `-accel tcg,tb-size=256`, we should remove "accel=tcg" from `-machine q35,accel=tcg`10:54
kashyapOtherwise QEMU fails to launch10:55
kashyap(I think libvirt uses the latter syntax by default: "-machine ... accel=")10:55
kashyap(Yep, it does.  Just verified)10:56
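The passthrough kashyap describes (from the kbase doc linked above) would look roughly like this; a hypothetical sketch of the domain XML, where the extra `xmlns:qemu` namespace on `<domain>` is required for `<qemu:commandline>` to be accepted:

```xml
<!-- Hypothetical sketch of QEMU command-line passthrough via libvirt.
     type='qemu' selects TCG; the namespace declaration is mandatory. -->
<domain type='qemu' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>cvm2</name>
  <!-- ... the usual os/memory/devices elements ... -->
  <qemu:commandline>
    <qemu:arg value='-accel'/>
    <qemu:arg value='tcg,tb-size=256'/>
  </qemu:commandline>
</domain>
```

As the messages that follow show, this sketch alone isn't sufficient: libvirt also emits `-machine ...,accel=` itself, and QEMU rejects the combination of `-accel` and `-machine accel=`.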
ebbexIs there a option/toggle to disable sending numa_topology from nova-compute? (We have some numascale hardware that submits "Data too long for column 'numa_topology')10:56
lyarwoodkashyap: oh fun10:57
kashyaplyarwood: Yeah.  </shameless-plug> For more on the nature of QEMU command-line, see my LWN article: https://lwn.net/SubscriberLink/872321/221e8d48eb609a38/)10:58
kashyap(Especially the "Complexity on the QEMU command line" section)10:59
gibilyarwood: o/ we can revert the temp disable on the heal_allocation in nova-next; the nova-manage support landed during the night. https://review.opendev.org/c/openstack/nova/+/81624211:01
lyarwoodawesome checking 11:01
gibithanks11:01
lyarwood+W'd11:01
gibithanks11:02
lyarwoodkashyap: would you be able to test if we could overwrite the original `-machine q35,accel=tcg` part using <qemu:commandline> via libvirt?11:03
kashyaplyarwood: Let me try11:04
kashyapI think <qemu:commandline> _does_ take precedence11:04
* kashyap will confirm in a few11:04
lyarwoodwould be ace as Nova could do that itself then11:04
kashyaplyarwood: Afraid I was wrong :-(11:17
kashyapI tried this:11:17
* kashyap is getting a paste-bin11:17
kashyaplyarwood: It doesn't overwrite, that was the XML (see line-1 and lines 102-105) https://paste.centos.org/view/1fcbc6a411:19
kashyapWith that, when I start the guest, it gives the familiar:11:19
kashyap$> virsh start cvm211:19
kashyaperror: Failed to start domain 'cvm2'11:19
kashyaperror: internal error: process exited while connecting to monitor: 2021-11-02T11:18:34.543504Z qemu-kvm: The -accel and "-machine accel=" options are incompatible11:19
lyarwoodsorry was just on a call11:21
kashyapNo rush; I don't count on instant responses :-)11:21
lyarwoodkashyap: what if you also define -machine in the XML?11:22
kashyapHmm, lemme try11:22
kashyaplyarwood: Wait, you mean setting -accel and -machine in qemu:commandline explicitly?11:22
lyarwoodkashyap: yes11:23
kashyap(If so, that should fail the same way as above, but lemme double-confirm.  libvirt uses "-machine accel" under the hood, inferred from <domain type='kvm'>)11:24
kashyapYep, it fails the same way.11:24
kashyaplyarwood: Oh, wait.  There might be another hack, based on my chat w/ Paolo last week:11:25
kashyap17:55 < kashyap> bonzini: Hm, how exactly does "-machine accel=kvm -machine accel=tcg" differ from "-accel kvm -accel tcg"?11:25
kashyap17:55 < bonzini> "-machine accel=tcg" overwrites "-machine accel=kvm"11:25
kashyaplyarwood: So, I can specify by "-accel tcg -accel kvm" ... and see if that works :D11:25
* lyarwood tilts head11:26
kashyapGaah, no, ignore me.  I misread the above complexity.11:26
* kashyap taps on the table and thinks ... wonder if DanPB knows a trick11:27
kashyapNo, there isn't a current trick.11:39
kashyaplyarwood: That said, based on last week's chat w/ QEMU folks, libvirt itself should switch to "-accel" as that's recommended over "-machine accel"11:40
* kashyap goes to file an upstream libvirt GitLab ticket for that11:40
lyarwoodkashyap: argh kk, so there's no workaround until that happens11:41
kashyapNo, besides the ugly hack we're both revolted by :D11:41
lyarwoodkashyap: blocking pretty much all upstream Openstack testing using qemu until then11:41
lyarwoodyeah without the custom build11:41
lyarwoodurgh11:41
lyarwoodtbh we need to make a big deal out of this11:41
kashyapYeah, QEMU changed it pretty much w/o considering the management tools :-(11:42
* kashyap --> needs lunch, hangry11:43
opendevreviewLee Yarwood proposed openstack/nova master: nova-manage: Always get BDMs using get_by_volume_and_instance  https://review.opendev.org/c/openstack/nova/+/81171611:47
lyarwoodhttps://review.opendev.org/q/topic:%22bug%252F1943431%22+(status:open%20OR%20status:merged) & https://review.opendev.org/q/topic:%22bug%252F1937084%22+(status:open%20OR%20status:merged) should be ready for reviews if people have time btw, simple bugfixes11:47
EugenMayerWhen trying to rebuild an instance and use --preserve-ephemeral i see `The current driver does not support preserving ephemeral partitions.`12:01
EugenMayerIs this option only available when using a storage like nfs/ceph? But how is that different to volumes then? Currently i use the compute node local storage12:01
EugenMayerI use LVM on my computes with ext4 - do i need zfs/btrfs for that to work?12:03
sean-k-mooney[m]EugenMayer that is only supported on ironic12:04
sean-k-mooney[m]its not supported with libvirt or any other vm or container based driver12:04
sean-k-mooney[m]rebuild is intended to remove all data from the instance by recreating the root disk and any ephemeral disks12:05
sean-k-mooney[m]if you want to use rebuild and preserve data you should store your data in cinder volumes12:05
EugenMayeri thought of ironic as just bare-metal provisioning (via bifrost?) then running libvirt - but that is wrong?12:06
EugenMayerironic means that one does not use any hypervisor at all - the bare metal is the actual instance. That is the point right? sean-k-mooney[m]12:07
sean-k-mooney[m]ironic is the openstack baremetal-as-a-service project and it can be used with nova to provide instances that are physical servers instead of vms12:07
EugenMayerUnderstood - thank you for clarifying12:08
sean-k-mooney[m]bifrost is an installer for ironic written in ansible12:08
sean-k-mooney[m]bifrost installs ironic in standalone mode so it can be used without the rest of openstack to manage your physical hardware12:09
EugenMayerThank you!12:11
sean-k-mooney[m]lyarwood:  given we do not allow the use of qemu arg passthrough in nova i dont see any way for us to address this in nova12:12
EugenMayerOne question to cinder - how do you deal with databases? I mean using NFS or alikes and storing (running) a database on such a storage will heavily impact performance - this was the main reason to use local disk (we yet avoided cinder in our setup idea). How do you deal with that? Are you using a specific cinder backend like ceph/gluster so you12:12
EugenMayeractually have local latency and 'sync' the data to the central storage?12:12
sean-k-mooney[m]for database workloads i think its more common to use a dedicated san and mount the data over iscsi; nfs really is not up to that level of iops. ceph can handle databases but generally you will need to use flash if you have high iops12:14
sean-k-mooney[m]i know that many do use local for dbs12:15
sean-k-mooney[m]e.g. the root disk or ephemeral disk but then you just need to ensure that you do not use rebuild and make backups at the application level12:15
EugenMayerok so this is a common issue12:16
EugenMayersean-k-mooney[m] with flash you mean SSD/NVME drives, right? (we have those only, the latter)12:17
sean-k-mooney[m]yes if you have nvme storage and high speed 25G+ networking you can deploy high iops workloads on ceph but your network will become the bottleneck12:18
EugenMayerwell our network is about 1GB12:19
EugenMayerit's provider based12:19
EugenMayerdo i understand ceph correctly here, that it is actually local access with a 'backend sync' in the background, other than nfs which is transparent access on the network mount with the performance pain 12:20
sean-k-mooney[m]the normal way to deploy databases with local storage is to deploy them in a 3 node ha cluster with local storage and backup to cinder volumes, with updates managed via yum/apt etc inside the vms.12:21
sean-k-mooney[m]no ceph is directly accessed over the network12:21
sean-k-mooney[m]it uses the rbd protocol rather than iscsi but its more similar to iscsi than nfs12:22
EugenMayerthis means that one rather runs central db clusters for each DB variant (5.5,5.6,8 or pg 9.6,10,10) and loses the encapsulation of every app travelling with its database, like we rather use right now (self-contained docker-compose / k8s stacks)12:23
sean-k-mooney[m]if you need to use rebuild because of a higher level orchestrator then effectively the best way to do that with only local storage is to serialize the rebuild: remove 1 instance from the cluster, rebuild it, rejoin it to the cluster, then wait for it to sync with the latest state. then repeat for the rest.12:24
sean-k-mooney[m]well with k8s it changes slightly12:24
sean-k-mooney[m]in that it assumes that you have shared network based storage by default12:25
EugenMayerit can be, also k8s can use ephemeral, which we plan to12:25
sean-k-mooney[m]so it assumes it can just terminate the db container and when its recreated after update it can reconnect to the same storage on any host and get the data back12:25
sean-k-mooney[m]you plan to use the local provider to back your persistent volume claims?12:26
EugenMayeryeah, that is the optional/usual assumed mode in k8s - it can though also use ephemeral storage which then cannot be distributed to other nodes just like that12:27
sean-k-mooney[m]right the local provider is not normally intended for production use12:27
EugenMayeryou plan to use the local provider to back your persistent volume claims? <- not sure i can answer this question / understand it12:27
EugenMayerlocal provider is what you define as 'local disk'12:27
sean-k-mooney[m]it can be used of course but it puts the burden of persisting the storage on the operator to configure the storage to be ha by some means12:28
EugenMayeror no ha at all12:28
EugenMayerdepends on the needs, of course12:28
EugenMayersean-k-mooney[m] you really helped enormously. Thanks. I guess i have to rethink the ideas with all the input given.12:29
sean-k-mooney[m]EugenMayer:  in k8s there are 2 ways to use local storage: 1 dont use a k8s volume and just use the storage in the container fs, 2 configure a local storage provider on each host that can be used to create persistent volumes that are attached to pods via persistent volume claims12:29
EugenMayersean-k-mooney[m] i think we planned the second12:30
sean-k-mooney[m]but yes if you have no ha that works as long as you are careful with the data replication and/or dont need the data to outlive the container12:30
sean-k-mooney[m]i have never done this mind you but i have always wondered how mergerfs would work in production. e.g. could you use it to merge your local storage with remote such that all data is synced to the remote storage in the background.12:33
sean-k-mooney[m]anyway im not sure how much i helped but i have to run o/12:33
opendevreviewMerged openstack/nova master: Fix unit test for oslo.concurrency 4.5  https://review.opendev.org/c/openstack/nova/+/81594012:42
opendevreviewMerged openstack/nova master: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81139612:42
opendevreviewMerged openstack/nova master: Revert "Temp disable nova-manage placement heal_allocation testing"  https://review.opendev.org/c/openstack/nova/+/81624212:42
lyarwoodsean-k-mooney: hey sorry missed your reply above, yeah we could easily add the config classes and only use them with a workarounds configurable but as kashyap highlighted even then thanks to some other QEMU bugs it isn't going to work13:24
EugenMayersean-k-mooney[m] you helped me BIG times.13:25
EugenMayersean-k-mooney[m] i ask myself if working with freezer could be an option for rebuilding, while sticking to local disks13:29
gibilyarwood: could I ask you a favor to look at the backports of https://review.opendev.org/q/topic:bug/1944759 it is a fairly easy patch and clean backport14:31
lyarwoodack, just catching up with some downstream backports first then I'll try to get to these14:32
lyarwoodFWIW I'll be afk for the upstream meeting today, have to fetch my kid from nursery14:32
gibithanks lyarwood14:35
bauzasgibi: hmpff, I need to ask you two things14:37
gibibauzas: sure, shoot14:37
bauzas1/ I'll need to go off at 5.30pm in our TZ, so I could chair the nova meeting for only 30 mins14:38
bauzas2/ I'll be off tomorrow14:38
gibi1/ I will be here to chair things after you left14:38
gibi2/ I will be off tomorrow too :)14:38
bauzasahah ok :)14:40
bauzasthanks :)14:40
gibiI'm not sure what was the request in 2/ :D14:40
bauzasgibi: just to tell you I was off :)14:50
gibiahh OK14:50
artomThat (wait, what excactly?) reminds me - what do we need to do to get https://review.opendev.org/c/openstack/nova/+/796909 moving?15:05
artomThat's only the top change, there's 2 backport series below it, with some dependencies that are... borderline backportable? I mean to me they clearly are15:05
artomBut there's some policy controversy there :)15:05
EugenMayerwhen swift / cinder are not active, and i do a snapshot on a compute, where is the snapshot located?15:09
opendevreviewBalazs Gibizer proposed openstack/nova master: Enable min pps tempest testing in nova-next  https://review.opendev.org/c/openstack/nova/+/81174815:20
bauzaswarning European folks, our nova meeting will start in 28 mins !!!15:32
bauzasdaylight savings change15:32
bauzas1600UTC is now 5pm for CET and 4pm for BST15:33
opendevreviewBalazs Gibizer proposed openstack/nova stable/xena: Reproduce bug 1945310  https://review.opendev.org/c/openstack/nova/+/81140515:36
opendevreviewBalazs Gibizer proposed openstack/nova stable/xena: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81140715:36
EugenMayerrunning kvm on a compute (empty) with 128GB ram, EE nvmes (very fast, raid1 mdadm) with 4 cores on a AMD Ryzen 9 5950X 16-Core Processor, the instance takes 3 minutes to install python3-pip, while it is not the download but the processing. What could be the reason that it underperforms so massively?15:37
EugenMayerdoing the same on the compute host would take < 8s15:38
EugenMayerwait the compute shows as QEMU in the hypervisor overview..how did that happen?15:39
sean-k-mooney[m]EugenMayer:  kvm is an acceleration, not a hypervisor15:40
sean-k-mooney[m]qemu is the hypervisor15:40
sean-k-mooney[m]we show the same regardless of if you use qemu with the tcg backend or kvm15:40
EugenMayeri see15:41
bauzaskashyap: lyarwood: need help with some bug to triage upstream https://bugs.launchpad.net/nova/+bug/194922415:41
* kashyap clicks15:41
EugenMayeri wondered, since the default of nova_compute_virt_type is kvm15:41
EugenMayersean-k-mooney[m] how to be sure KVM is used?15:42
kashyapbauzas: Hmm, I have a vague recall of a similar issue from the past, but I need to reload some context.  Will check tomm15:42
bauzaskashyap: I just asked for qemu/libvirt versions15:42
sean-k-mooney[m]on the compute host you can do a virsh list then a virsh dumpxml on the domain15:42
sean-k-mooney[m]EugenMayer: ^15:42
bauzaskashyap: I'm tempted to say this is unrelated to Nova but rather a libvirt/virt bug15:42
kashyapbauzas: That's a good catch.  We don't test this area all that much15:42
kashyapLet's see if it's a bug yet :)15:43
EugenMayersean-k-mooney[m]: <domain type='kvm' id='1'>15:43
sean-k-mooney[m]yep that should be using kvm then15:44
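The check sean-k-mooney describes can be sketched as follows (the domain name is a placeholder; on a real host, take it from the `virsh list` output):

```shell
# List running domains on the compute host
virsh list

# Inspect the domain type: <domain type='kvm' ...> means KVM
# acceleration is in use; type='qemu' means plain TCG emulation.
virsh dumpxml instance-00000001 | grep "<domain type"
```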
EugenMayerso what could be the actual reason for this massive slowdown? this host easily hosted heavy load with proxmox15:44
sean-k-mooney[m]what do you have the cpu model set to?15:45
sean-k-mooney[m]and mode15:45
sean-k-mooney[m]by default its unset which for the libvirt driver will get treated as if you set host-model15:45
EugenMayersean-k-mooney[m] https://gist.github.com/EugenMayer/cacd5c44ae7dafffa31c9c8025bee3aa15:45
sean-k-mooney[m]i know that there were slowdowns on older qemu versions with amd cpus because they did not have the correct cpu_models defined15:46
sean-k-mooney[m]you should proably try setting [libvirt]/cpu_mode=host-passthrough15:46
EugenMayerthis is bullseye debian - same as for proxmox 15:46
EugenMayeri would assume that it would run the same way. Surely pve uses a different kernel but still surprised15:47
sean-k-mooney[m]if that improves the performance on the ryzen 5950x then it likely means you just need a newer qemu to get better performance out of the box without using host-passthrough15:48
EugenMayerunderstood15:48
EugenMayertrying to up the kernel from the backports and will check what qemu version would be on ubuntu15:48
EugenMayerthank you for the hint!15:48
EugenMayerhow would i actually change the cpu_mode with nova?15:48
sean-k-mooney[m]in the nova.conf set cpu_mode in the libvirt section15:49
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81141615:49
EugenMayersean-k-mooney[m] do i need to redo the instances or just stop/start?15:50
bauzaslast reminder: nova weekly meeting starts in 10 mins now here in this chan :)15:50
EugenMayerbauzas i will shut up then, promised :)15:51
bauzas(spoiler alert, you'll see new meetbot commands :p )15:51
sean-k-mooney[m]just hard reboot after restarting the agent15:51
sean-k-mooney[m]https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.cpu_mode15:51
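The change sean-k-mooney suggests is a one-line edit in the compute node's nova.conf (a sketch; per the advice above, restart the nova-compute agent and hard reboot the instances afterwards):

```ini
# Sketch of the suggested compute-node nova.conf change.
# host-passthrough exposes the host CPU directly to guests: best
# performance, but live migration is only safe between identical CPUs.
[libvirt]
cpu_mode = host-passthrough
```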
EugenMayersean-k-mooney[m] again, thanks!15:52
bauzas(actually, not 'new', since I used them in 2014, but I haven't seen them used for a while now, so I'm all about using them back :D )15:52
sean-k-mooney[m]EugenMayer:  you likely dont need to update the kernel by the way but its worth a shot if that does not help15:52
EugenMayerbauzas you jinxed it, so it might fail! :D15:52
bauzasthe more people joining us, the better it will be :)15:53
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Reproduce bug 1945310  https://review.opendev.org/c/openstack/nova/+/81141415:54
bauzasif I can make jokes to let people chime in, I can try15:54
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81141615:55
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Query ports with admin client to get resource_request  https://review.opendev.org/c/openstack/nova/+/81141615:55
EugenMayersean-k-mooney[m] massive speedup - 17 seconds15:58
sean-k-mooney[m]cool so basically whats happening is qemu does not have a profile that matches your cpu. the simplest way to fix that is to update qemu to one that does. failing that host-passthrough is a good choice if you dont need to live migrate the vms to different cpu models15:59
sean-k-mooney[m]alternatively you can pick one that is close and add flags16:00
sean-k-mooney[m]we can tell you how to do that after the meeting but what you have should be fine for now16:00
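The "pick one that is close and add flags" alternative would look something like this in nova.conf (the model and flag names here are illustrative assumptions, not a recommendation for this particular host):

```ini
# Hypothetical sketch: use a named CPU model close to the host and add
# missing feature flags explicitly, which keeps live migration possible
# across hosts that share this model.
[libvirt]
cpu_mode = custom
cpu_models = EPYC
cpu_model_extra_flags = pcid, ssbd
```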
bauzas3...16:00
bauzas2...16:00
bauzas1...16:00
bauzas#startmeeting nova16:00
opendevmeetMeeting started Tue Nov  2 16:00:51 2021 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:01
bauzasgood day, everyone16:01
dansmitho/16:01
sean-k-mooney[m]o/16:01
gibi\o16:01
bauzasas discussed before, I will exceptionally only be able to chair this meeting for 30 mins16:02
bauzasso I'll let gibi co-chair16:02
bauzas#chair gibi16:02
opendevmeetCurrent chairs: bauzas gibi16:02
* gibi accepts the challenge16:02
bauzaslet's start while people join16:02
bauzas #topic Bugs (stuck/critical) 16:02
bauzas No Critical bug16:02
bauzas #link 20 new untriaged bugs (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New16:03
bauzasthanks to anybody who triaged a few of them16:03
bauzas#help any help is appreciated with bug triage and we have a how-to https://wiki.openstack.org/wiki/Nova/BugTriage#Tags16:04
bauzas 32 open stories (+0 since the last meeting) in Storyboard for Placement #link https://storyboard.openstack.org/#!/project/openstack/placement16:04
bauzasso, maybe we closed one or more stories in Storyboard, but I don't think so16:05
bauzasyeah, last story was written on Oct 21st16:06
bauzasany bug to discuss in particular ?16:06
bauzasok, I guess no, moving on16:07
bauzas #topic Gate status 16:07
bauzas Nova gate bugs #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure16:07
bauzasnothing new16:07
bauzas Placement periodic job status #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly16:07
bauzasno issues so far ^16:07
bauzasjust the usual reminder,16:08
bauzasPlease look at the gate failures, file a bug, and add an elastic-recheck  signature in the opendev/elastic-recheck repo (example: #link https://review.opendev.org/#/c/759967)16:08
bauzasthat's it for gate status16:08
gibithis is a gate fix that needs a second core https://review.opendev.org/c/openstack/nova/+/81403616:08
bauzasoh right16:08
bauzasgibi: I'll look at it while we speak16:08
gibithanks16:09
bauzas(already looked at it, but need one last glance)16:09
bauzasmoving on or any gate failure to mention besides the above one ?16:09
bauzas #topic Release Planning 16:10
bauzas Yoga-1 is due Nova 18th #link https://releases.openstack.org/yoga/schedule.html#y-116:10
bauzas(3 weeks from now)16:11
bauzaserr, 2 weeks16:11
bauzas+2d16:11
bauzaswhich means, typey typey your specs16:11
bauzashttps://review.opendev.org/q/project:openstack/nova-specs+is:open is not that large16:12
bauzaswhich makes me say:16:12
bauzas #startvote Spec review day proposal on Tuesday Nova 16th ? (yes, no)16:12
gibiyes16:13
gibi#yes16:13
gibi(how to vote?)16:13
bauzasdang, the meetbot doesn't tell what to say16:13
bauzas#vote yes16:13
dansmithyou have a space in front16:13
gibi#vote yes16:13
dansmithof the startvote16:13
bauzas#startvote Spec review day proposal on Tuesday Nova 16th ? (yes, no)16:13
opendevmeetBegin voting on: Spec review day proposal on Tuesday Nova 16th ? Valid vote options are , yes, no, .16:13
opendevmeetVote using '#vote OPTION'. Only your last vote counts.16:13
bauzasdansmith: huzzah16:13
gibi#vote yes 16:13
opendevmeetgibi: yes  is not a valid option. Valid options are , yes, no, .16:13
sean-k-mooney[m]yes16:14
gibi#vote yes, ! 16:14
opendevmeetgibi: yes, !  is not a valid option. Valid options are , yes, no, .16:14
gibi#vote yes,16:14
opendevmeetgibi: yes, is not a valid option. Valid options are , yes, no, .16:14
dansmithlol16:14
bauzasoh man16:14
bauzasthis is an absolute fail.16:14
dansmith#vote yes,16:14
opendevmeetdansmith: yes, is not a valid option. Valid options are , yes, no, .16:14
dansmith#vote yes, no16:14
opendevmeetdansmith: yes, no is not a valid option. Valid options are , yes, no, .16:14
gibi#vote ,16:14
opendevmeetgibi: , is not a valid option. Valid options are , yes, no, .16:14
dansmith#vote  yes, no16:14
opendevmeetdansmith: yes, no is not a valid option. Valid options are , yes, no, .16:14
sean-k-mooney[m]lets just assume we are ok with the 16th and move on16:14
* bauzas facepalms16:14
dansmith#vote  yes, no,16:14
opendevmeetdansmith: yes, no, is not a valid option. Valid options are , yes, no, .16:14
yuriysthis is bot abuse!16:14
bauzas#endvote16:14
opendevmeetVoted on "Spec review day proposal on Tuesday Nova 16th ?" Results are16:14
dansmith#vote meh16:14
gibithat is fun 16:14
bauzasok I guess this was epic but we should leave this bot quiet for another 5 years16:15
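For the record, MeetBot parses everything after the `?` in `#startvote` as a comma-separated option list, so the parentheses in the attempt above apparently became part of the option tokens (hence the odd "Valid options are , yes, no, ."). I believe the working form would simply have been:

```
#startvote Spec review day proposal on Tuesday Nov 16th? yes, no
#vote yes
#endvote
```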
bauzasanyway,16:15
bauzas#agreed Spec review day happening on Nov 16th16:15
bauzasvoilĂ 16:15
bauzasmoving on16:15
bauzas #topic Review priorities 16:16
bauzas#link  https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement)+label:Review-Priority%252B116:16
dansmithalso leading space16:16
bauzasdansmith: good catch, the copy/paste makes me mad16:16
bauzas#topic Review priorities 16:16
bauzas#undo16:16
opendevmeetRemoving item from minutes: #topic Review priorities 16:16
bauzasfun, the meetbot isn't telling new topics16:17
bauzasanyway, next point16:17
dansmithit doesn't on oftc I think16:17
bauzas#action bauzas to propose a documentation change by this week as agreed on the PTG16:17
dansmithbut if you don't do it #properly it won't record them either16:17
bauzasfor adding a gerrit ACL to let contributors +1ing16:18
bauzasdidn't have time to formalize it yet16:18
bauzas#topic Stable Branches 16:18
bauzaselodilles: floor is yours16:18
bauzasI guess he's not around16:20
bauzasso I'll paste16:20
bauzasstein and older stable branches are blocked, needs the setuptools pinning patch to unblock: https://review.opendev.org/q/I26b2a14e0b91c0ab77299c3e4fbed5f7916fe8cf16:20
bauzaswe need a second stable core especially on https://review.opendev.org/c/openstack/nova/+/81345116:20
bauzasUssuri Extended Maintenance transition is scheduled to next week (Nov 12)16:21
bauzasthe list of open and unreleased patches: https://etherpad.opendev.org/p/nova-stable-ussuri-em16:21
bauzasI guess we need to make a few efforts before ussuri becomes EM16:22
bauzaselodilles: again, I offer my help if you ping me16:22
bauzaspatches that need one +2 on ussuri: https://review.opendev.org/q/project:openstack/nova+branch:stable/ussuri+is:open+label:Code-Review%253E%253D%252B216:22
bauzas(I'll skim this list)16:23
elodillesoh, sorry, DST :S16:23
bauzaslast but not least: https://review.opendev.org/806629  patch (stable/train) needed 14 rechecks, I was pinged with the question  whether testing should be reduced in train to avoid this amount of  rechecks (mostly volume detach issue)16:23
bauzaselodilles: hah, I warned about it in the channel :p16:23
sean-k-mooney[m]are the detach issues due to the qemu version we have in bionic16:24
sean-k-mooney[m]i assume train is not on focal?16:24
elodillesyes, train is on bionic16:24
elodilles(just like ussuri)16:24
bauzashmmm, technically, Train is EM16:26
sean-k-mooney[m]i'm somewhat tempted to say maybe move it to centos 8 or focal but we could disable the volume tests16:26
sean-k-mooney[m]yes it is16:26
bauzasI'd rather prefer us fixing the gate issues rather than reducing the test coverage, but this depends on any actions we can take16:27
bauzasso, let's be pragmatic16:27
sean-k-mooney[m]well the first question would be does train have gibi's event based waiting patch or is it still using the retry loop16:28
bauzasgibi's patch isn't merged yet, right?16:28
bauzascould it help ?16:28
sean-k-mooney[m]the only options really to fix this are changing the qemu version or backporting gibi's patch16:28
bauzas(I'll have to leave gibi chair in the next 2 mins but dansmith has a point I'm interested in)16:29
dansmithI also have to go sooner16:29
bauzassean-k-mooney[m]: we can try to backport gibi's patch and see whether that helps16:29
dansmithmaybe we could swap open and libvirt?16:29
bauzasdansmith: I'll16:29
gibiI don't think there is anything in the libvirt topic16:29
gibilyarwood is out now16:30
dansmithokay16:30
bauzasokay, elodilles I'll propose to wait for gibi's patch to land in master and then be backported16:30
gibibauzas: it is backported til wallaby16:30
elodillesbauzas: ack16:30
bauzasand punt the decision to reduce the test coverage once we get better ideas16:30
gibiif we are talking about https://review.opendev.org/q/topic:bug/188252116:30
bauzasgibi: then we need to backport it down to train16:31
gibiI don't think it will be a piece of cake to bring that back train16:31
gibi*to train16:31
bauzasgibi: (apologies I confused with the vnic types waiting patch)16:31
bauzasI have to leave, but can we hold this one discussion and go straight to dansmith's point16:31
bauzas?16:32
gibianyhow we can take that outside when lyarwood is back16:32
bauzasso I and dansmith can leave16:32
gibilets go to that16:32
bauzas#topic Sub/related team Highlights 16:32
bauzasnothing to tell16:32
bauzas#topic Open discussion 16:32
bauzasBring default overcommit ratios into sanity (dansmith / yuriys)16:32
dansmithSo, I think we all know the default 16x cpu overcommit is insane16:32
yuriysexciting16:32
dansmithwe've got reports that some operators are USING those defaults because they think we're recommending them16:32
sean-k-mooney[m]yes it is16:32
yuriysI am so used to Slack and Discord for drop a paragraph level of communication, so pardon all the incoming spam! I prewrote stuff.16:33
bauzashah16:33
dansmithyuriys is here to offer guidance and help work on this,16:33
dansmithbut I think we need to move those defaults to something sane, both in code and update the docs16:33
bauzasI guess this can be workload dependent, right?16:33
sean-k-mooney[m]it basically should not be set over 4x16:33
yuriys4:1 for cpu , 1:1 for mem.16:33
sean-k-mooney[m]yep16:33
bauzasI think we started to document things based on workloads16:33
yuriysyes16:33
dansmithbauzas: it's completely workload dependent, but we should not be recommending really anything, and thus I think the default needs to be closer to 1:1 with docs saying why you may increase it (or not)16:33
bauzasbut we never achieved this16:33
yuriysit's VERY use case specific16:34
sean-k-mooney[m]it is16:34
bauzasI'm referring to https://docs.openstack.org/nova/latest/user/feature-classification.html16:34
yuriysIdeally the documentation is restructured on how to scale up these overcommits to match the desired use case, density and performance, and start at sane values/defaults. Engineers can then template out the necessary config values after they've gotten to know system capabilities. Instead of working backwards from chaos and mayhem, giving future admins the opportunity to reach desired state through scaling up from a stable system16:34
yuriysshould be the goal of arch design.16:34
dansmithso can we agree that we'll change the defaults to something that seems reasonable, and modify the docs that just say "these are the defaults" to have big flashing warnings that defaults will never be universal in this case?16:35
sean-k-mooney[m]the 16:1 number originally was assuming webhosting as the main usecase or similar workloads16:35
dansmithspecifically: https://docs.openstack.org/arch-design/design-compute/design-compute-overcommit.html16:35
sean-k-mooney[m]that does not fit with how openstack is typically used16:35
bauzasdansmith: this sounds a reasonable change16:35
dansmithcool, specless bp or bug?16:35
bauzasmy only worries would go on how this is wired at the object level but we can take the opportunity to lift this16:36
sean-k-mooney[m]4:1 for cpu is the highest ratio i would generally consider usable in production16:36
bauzasdansmith: there are a few upgrade concerns with placement as IIRC this is set by the model itself16:36
dansmithI think we already moved towards something better when we moved the defaults to placement right? but we need to do something16:36
sean-k-mooney[m]well we have the initial allocation ratios now16:36
yuriysi said 4 to be reasonable haha, i think i've set about 3:1 with nova-scheduler and placement randomization elements.16:36
dansmithbauzas: yeah I think placement now has explicit defaults right?16:36
bauzasI'm pretty sure we have some default value in the placement DB that says "16"16:37
sean-k-mooney[m]but it still defaults to 1616:37
dansmithsean-k-mooney[m]: right16:37
sean-k-mooney[m]i think we can decrease the initial ratios to 4 for cpu and 1 for memory16:37
dansmithso I think we can just move those and reno that operators who took those defaults years ago should change them likely16:37
sean-k-mooney[m]+116:37
bauzasI don't wanna go procedural16:37
bauzasso a specless BP could work for me but,16:37
bauzaswe need renos16:37
dansmithsean-k-mooney[m]: yeah I think 4:1 CPU and 1:1 memory is fine for a default, we might need to up them for devstack I guess but that's where insane defaults should be :)16:38
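[Editor's note: a minimal sketch of what the defaults discussed here would look like, using nova's `initial_cpu_allocation_ratio` / `initial_ram_allocation_ratio` options (applied only when a resource provider is first created in placement); the host sizes are hypothetical examples, not from the log.]

```shell
# Proposed saner defaults, as a nova.conf fragment (commented out,
# shown for illustration only):
#
# [DEFAULT]
# initial_cpu_allocation_ratio = 4.0
# initial_ram_allocation_ratio = 1.0
#
# Effective schedulable capacity for an example host:
pcpus=64            # physical cores on the hypervisor (hypothetical)
cpu_ratio=4         # 4:1 CPU overcommit
ram_mb=262144       # 256 GiB of RAM (hypothetical)
ram_ratio=1         # 1:1, i.e. no memory overcommit
vcpus=$((pcpus * cpu_ratio))
vram_mb=$((ram_mb * ram_ratio))
echo "schedulable: ${vcpus} vCPUs, ${vram_mb} MB RAM"
# → schedulable: 256 vCPUs, 262144 MB RAM
```

With the old 16:1 default the same host would advertise 1024 vCPUs, which is where the "insane" characterization above comes from.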
sean-k-mooney[m]yep i was thinking the same16:38
bauzas+ we need to ensure we consider the DB impact before16:38
dansmithcool, specless bp and renos.. sounds good16:38
bauzasif that becomes debatable in the reviews, we could go drafting more16:38
sean-k-mooney[m]bauzas:  i dont think there will be any16:38
bauzasbut here, we're talking of changing defaults, not changing existing deployments16:38
sean-k-mooney[m]if we are just changing the initial values it wont affect existing RPs16:39
dansmithyeah I think it'll be straightforward, but we can always revise the plan if needed16:39
dansmithright16:39
bauzasOK, looks to me we have a plan16:39
dansmith#micdrop16:39
bauzas#agreed changing overcommit CPU ratio to <16.0 can be a specless BP16:39
bauzasyuriys: typey typey16:39
bauzasand ping me on IRC once you have the Launchpad BP up so I can approve it16:40
* bauzas needs to drop16:40
gibiOK16:40
gibiis there anything else for today?16:40
yuriysno idea what that means, but sounds good?16:40
yuriysill just coordinate through dan i suppose16:41
gibiyuriys: you need to file a blueprint here https://blueprints.launchpad.net/nova/16:41
gibiso we can track the work16:41
dansmithyuriys: with just an overview of what we said, no big deal16:41
gibiyepp16:41
yuriysAh sounds good. Dang, I thought I was going to have like do a whole speech and everything16:42
yuriysjust to win over votes16:42
yuriyshaha16:42
gibi:)16:42
dansmithyuriys: I told you it wouldn't be a big deal :)16:42
gibiit is not a big deal if dansmith is on your side ;)16:42
yuriysyeah, i think the expandability still needs to be part of that doc btw16:42
yuriysfor dollar reasons16:42
yuriysbut ill throw up a BP and we'll go from there16:43
gibicool16:43
gibiis there anything else for today? I don't see other topics on the agenda16:43
gibiit seems not16:44
gibiso then I have the noble job to close the meeting:)16:44
gibithank you all for joining today16:44
gibi#endmeeting 16:44
opendevmeetMeeting ended Tue Nov  2 16:44:40 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:44
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-02-16.00.html16:44
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-02-16.00.txt16:44
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2021/nova.2021-11-02-16.00.log.html16:44
EugenMayersean-k-mooney[m] what is the 'qemu' version? i guess nova-compute-qemu/stable 2:22.0.1-2 is just the nova management package version, not the actual qemu version16:46
sean-k-mooney[m]try qemu-system-x86_64 --version16:46
EugenMayersean-k-mooney[m] hmm, i use kolla, thus i guess libvirt is inside the docker container16:47
sean-k-mooney[m]ah in that case you can docker exec into the nova_libvirt container16:47
sean-k-mooney[m]it contains the libvirt and qemu binaries that are used16:48
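[Editor's note: the check suggested above, sketched out. The container name `nova_libvirt` is the kolla default mentioned earlier; the sample output string below is a hypothetical example matching the package version EugenMayer reports later, used only to show how to pull the version number out.]

```shell
# On a bare host:
#   qemu-system-x86_64 --version
# Under kolla, exec into the libvirt container instead:
#   docker exec nova_libvirt qemu-system-x86_64 --version
#
# The first line of output typically looks like the sample below;
# extract just the version number from it:
out="QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18)"
ver=$(printf '%s\n' "$out" | sed -n 's/^QEMU emulator version \([0-9.]*\).*/\1/p')
echo "$ver"
# → 4.2.1
```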
EugenMayerbut that means that this is not controlled by me16:48
sean-k-mooney[m]well you could rebuild the container. do you have multiple servers with different cpus?16:49
EugenMayersean-k-mooney[m] qemu-kvm                             1:4.2-3ubuntu6.18                     amd64        QEMU Full virtualization on x86 hardware16:49
EugenMayerall main computes have the same (exact) - all have AMDs (there are smaller AMDs)16:49
sean-k-mooney[m]if they are all exactly the same then there is no downside to using host-passthrough16:50
EugenMayerwell i will have no live migrations16:50
sean-k-mooney[m]it will give you the best performance but the limitation it imposes is you can only live migrate to other hosts with the exact same cpu16:50
EugenMayeri have no live migrations since no shared block storage16:50
EugenMayer(not planing to)16:51
sean-k-mooney[m]you do not need shared storage for live migration16:51
sean-k-mooney[m]it just makes it faster if you do16:51
EugenMayeri guess a non-live migration from AMD-A to AMD-B should be no issue right16:51
EugenMayerIt tells me 'live migration is not available'16:52
sean-k-mooney[m]cold migration has no cpu requirement beyond don't change the architecture i guess16:52
sean-k-mooney[m]so ya you can always fall back to cold migration16:52
EugenMayersure, amd64 they all are, but most are the big ryzen, the others the little brothers16:52
sean-k-mooney[m]so your other option is to find the closest cpu model that your qemu supports and add in additional cpu flags16:53
EugenMayer99% of live migration will happen between the main compute, all AMD Ryzen 9 5950X16:53
sean-k-mooney[m]you could group them in an az16:54
EugenMayerdoesnt host-passthrough also harm the security / encapsulation?16:54
EugenMayeryes, az planned for the big ones, the smaller ones are internal CI servers only (azure agents or concourse CI workers)16:54
sean-k-mooney[m]no it just allows the vm to use all the cpu features16:56
sean-k-mooney[m]host-model still allows the vm to know what model of cpu is used16:56
EugenMayerhmm, all VMs are our VMs, no customers or such. so i know what runs on each of them16:57
sean-k-mooney[m]so from a security point of view it's more or less the same16:57
EugenMayeri guess passthrough and hosting freebsd could be an issue. What about running windows?16:57
EugenMayerThank you so much!16:59
EugenMayerthe only thing i dislike is that i would need to fix every compute and i'm not sure this configuration file is controlled by kolla, but they have overrides and in the end, i have chef, so this will do it in any case17:00
sean-k-mooney[m]it is17:01
sean-k-mooney[m]you can drop an override in /etc/kolla/config/nova.conf or in /etc/kolla/nova/nova-compute.conf i believe17:02
sean-k-mooney[m]https://docs.openstack.org/kolla-ansible/latest/admin/deployment-philosophy.html#kolla-s-solution-to-customization17:03
sean-k-mooney[m]its basically the example they use17:03
sean-k-mooney[m]its one of the reasons i like kolla17:04
sean-k-mooney[m]it makes this type of config simple17:04
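[Editor's note: a minimal sketch of the kolla override being discussed, setting `cpu_mode = host-passthrough` (a real `[libvirt]` option in nova.conf). The fragment is written to a temp dir here just to illustrate the file contents; in a real deployment it would go under /etc/kolla/config/ per the kolla docs linked above.]

```shell
# Write the override fragment (temp dir stands in for
# /etc/kolla/config/nova/ in this sketch):
conf_dir=$(mktemp -d)
cat > "$conf_dir/nova-compute.conf" <<'EOF'
[libvirt]
cpu_mode = host-passthrough
EOF
# kolla-ansible merges such fragments into the container's nova.conf
# on the next "kolla-ansible reconfigure" run.
grep cpu_mode "$conf_dir/nova-compute.conf"
# → cpu_mode = host-passthrough
```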
yuriysthere are way too many reasons to like kolla17:04
EugenMayersean-k-mooney[m] thanks. Did that with global_physical_mtu already17:24
EugenMayeryuriys there also some to not do so. As with everything17:24
EugenMayersean-k-mooney[m] you're right, nova.conf would work https://gist.github.com/EugenMayer/d74ffdc0b15db9c9c1af344bd27accd117:33
Zer0Bytehey @lyarwood are you there?17:49
lyarwood\o evening, yeah17:52
Zer0Bytehey17:52
Zer0Bytehttps://bugs.launchpad.net/bugs/194922417:52
Zer0Byteit's regarding this bug17:52
lyarwoodYeah apologies I just saw your update17:53
lyarwoodand I missed this QoS spec on the cinder side was per GB17:53
Zer0Bytewhat i mean is performing the resize should also update the iops on the kvm configuration, shouldn't it?17:53
lyarwoodI guess this is valid in that case but I'm not sure how we can fix this between Nova and Cinder17:53
lyarwoodif it's a size related QoS spec then yeah I guess it should17:54
lyarwoodtbh I didn't even know these existed17:54
Zer0Byteyeah i'm using it on my backend storage and it works well17:54
* lyarwood reads https://docs.openstack.org/cinder/latest/admin/blockstorage-capacity-based-qos.html17:55
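[Editor's note: the capacity-based QoS specs in the doc above are per-GB, so the limit nova should apply scales with volume size — which is why the resize matters. A sketch of the arithmetic with hypothetical numbers (a `read_iops_sec_per_gb=20` spec on a 100 GB volume resized to 200 GB):]

```shell
# Per-GB QoS: the effective limit is spec * volume size.
read_iops_per_gb=20
size_gb=100
resized_gb=200
old_limit=$((read_iops_per_gb * size_gb))
new_limit=$((read_iops_per_gb * resized_gb))
echo "before resize: ${old_limit} IOPS, after: ${new_limit} IOPS"
# → before resize: 2000 IOPS, after: 4000 IOPS
```

The bug discussed here is that nova only learns the limit from cinder's connection_info at attach time, so after a resize the guest keeps the old 2000 IOPS cap until the attachment is refreshed.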
Zer0Bytethe problem is with nfs this is why in moving to the frontend17:55
Zer0Byteand if i create a new instance with the same volume resized take the new iops specs17:56
lyarwoodhttps://github.com/openstack/nova/blob/82be4652e2c840bd69ec354fd734a2d3f83f395b/nova/virt/libvirt/volume/volume.py#L63-L77 is where Nova is told about the cinder side QoS FWIW17:56
Zer0Bytebut this is only during volume attach ?17:57
lyarwoodYeah there's a missing piece during a resize17:58
lyarwoodA workaround would be shelve and unshelve the instance17:58
Zer0Bytelet me try it17:58
lyarwoodThat should regenerate the connection_info in cinder and have that passed to Nova17:58
Zer0Byteyou are right @lyarwood works if i perform shelve and unshelve18:04
Zer0Bytequestion: does shelve and unshelve change the mac address or any machine configuration?18:05
Zer0Bytelike uuid or serial 18:05
lyarwoodOverall things should remain the same but I'm not entirely sure if we persist the MAC addresses, sean-k-mooney ^ any idea?18:06
Zer0Byteyeah it is running cloud-init again18:07
Zer0Bytechanging the ssh key18:07
lyarwoodcloud-init shouldn't regenerate ssh keys if they already exist right?18:08
Zer0Bytemhmm if the machine id changes18:09
Zer0Bytethat triggers cloud-init to execute again18:09
Zer0Byteand cloud-init performs ssh-keygen18:09
sean-k-mooney[m]lyarwood:  the mac address comes from the neutron port so it won't change18:10
sean-k-mooney[m]the machine id i guess you mean the one shown in dmidecode18:11
sean-k-mooney[m]i think that depends on your config but i think its the vms uuid by default18:11
sean-k-mooney[m]https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sysinfo_serial18:11
sean-k-mooney[m]Zer0Byte:  so it would only change if you had set sysinfo_serial to OS which uses the host /etc/machine-id file, hardware which uses the host uuid from libvirt, or auto which chooses between those18:14
sean-k-mooney[m]Zer0Byte: so with our default config of unique, unshelving should not change the guest serial and cloud-init should not run18:14
sean-k-mooney[m]the mac address should not change either unless you changed it manually in neutron while it was shelved18:15
lyarwoodanyway there's some additional tooling in Xena to refresh connection_info for shutdown instances without the need to shelve and unshelve etc18:17
lyarwoodhttps://docs.openstack.org/nova/latest/cli/nova-manage.html#volume-attachment-commands18:17
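[Editor's note: the shape of the Xena `nova-manage volume_attachment refresh` workaround referenced above, as best I can reconstruct it from the linked docs. The IDs and connector path are placeholders; the command needs a real deployment, so it is only assembled here, not executed.]

```shell
# refresh regenerates the attachment's connection_info from cinder
# (instance must be stopped); get_connector can produce the host
# connector file it expects.
server_id="<server-uuid>"       # placeholder
volume_id="<volume-uuid>"       # placeholder
connector="/tmp/connector.json" # placeholder path
cmd="nova-manage volume_attachment refresh $server_id $volume_id $connector"
echo "$cmd"
```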
lyarwoodthat's another way to workaround it while we try to work something out in-tree18:17
lyarwoodtbh I can't think of a way with the current cinder APIs18:18
* lyarwood brb18:18
Zer0Bytei got to go thanks anyway i will check the option of @sean-k-mooney[m] 18:19
sean-k-mooney[m]no worries i'm officially on pto until tomorrow anyway so not really here today, just saw your question while i was checking over my car insurance18:20
*** ianw_pto is now known as ianw19:00
EugenMayerAnybody in here uses freezer (successfully?)19:01
dansmithbauzas: when you return: https://blueprints.launchpad.net/nova/+spec/nova-change-default-overcommit-values19:16
hyang[m]Can anyone help to take a look this patch https://review.opendev.org/c/openstack/nova/+/811521 thanks in advance!20:10
opendevreviewMerged openstack/nova stable/stein: [stable-only] Pin virtualenv and setuptools  https://review.opendev.org/c/openstack/nova/+/81345122:31

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!