Friday, 2023-05-05

03:13 <auniyal> finally https://review.opendev.org/c/openstack/nova/+/839922 merged, feels like a good morning :D
06:12 <opendevreview> Tobias Urdin proposed openstack/nova master: Fix wrong nova-manage command in upgrade check  https://review.opendev.org/c/openstack/nova/+/880819
06:49 <opendevreview> Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes  https://review.opendev.org/c/openstack/nova/+/881457
06:49 <opendevreview> Amit Uniyal proposed openstack/nova master: WIP: Delete dangling volumes  https://review.opendev.org/c/openstack/nova/+/882284
07:22 * bauzas facepalms: I forgot to say 'recheck' when I wrote my gerrit comment
07:23 <opendevreview> yatin proposed openstack/nova master: Add config option to configure TB cache size  https://review.opendev.org/c/openstack/nova/+/868419
07:30 <sahid> o/
08:27 <opendevreview> Danylo Vodopianov proposed openstack/nova-specs master: Add support for Napatech LinkVirt SmartNICs  https://review.opendev.org/c/openstack/nova-specs/+/859290
08:34 <gibi> bauzas: thanks for the +2 on https://review.opendev.org/c/openstack/nova/+/862687, could you please check the test patch below that as well?
08:34 * bauzas clicks
08:35 <bauzas> gibi: ah I forgot to click 'submit'
08:35 <bauzas> shitty today
08:36 <bauzas> I could be off for 15 mins, the Tesla garage (changing my windshield due to a rock) will pass me a TMX for 10 mins :)
08:57 <gibi> enjoy :)
10:07 <opendevreview> Merged openstack/nova master: Reproduce asym NUMA mixed CPU policy bug  https://review.opendev.org/c/openstack/nova/+/862686
11:43 <sean-k-mooney> I'm off today, but while I remember: I found an issue with nova unit tests yesterday when trying to set up my new laptop
11:43 <sean-k-mooney> some of our unit tests are calling systemctl because of a lack of mocks
11:43 <sean-k-mooney> I noticed this by running the tests in an ubuntu:22.04 container that did not have it
11:44 <sean-k-mooney> I'll try and reproduce on Monday and file a bug or fix it, depending on how much time I have to look at it
11:46 <sean-k-mooney> gibi: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0ae/862687/3/gate/nova-tox-functional-py38/0aea18b/testr_results.html there is an odd db issue in the func test result on your patch
11:47 <sean-k-mooney> that often means there is a sharing of global state somewhere
11:47 <sean-k-mooney> it passed in other runs so we can probably just recheck it and see if it's persistent, but that test might be flaky so we should keep an eye on it
11:48 <sean-k-mooney> the failure is not related to your change
11:54 <gibi> sean-k-mooney: it is tracked in https://bugs.launchpad.net/nova/+bug/2002782
11:54 <sean-k-mooney> oh cool
11:55 <sean-k-mooney> I have seen it once or twice but not often
11:55 <sean-k-mooney> o/
11:56 <gibi> o/
12:14 <opendevreview> Franciszek Przewoźny proposed openstack/placement master: Changed /tmp/migrate-db.rc to /root/migrate-db.rc  https://review.opendev.org/c/openstack/placement/+/882436
12:26 <opendevreview> Franciszek Przewoźny proposed openstack/placement master: Changed /tmp/migrate-db.rc to /root/migrate-db.rc  https://review.opendev.org/c/openstack/placement/+/882436
14:11 <opendevreview> Dan Smith proposed openstack/nova master: Populate ComputeNode.service_id  https://review.opendev.org/c/openstack/nova/+/879904
14:11 <opendevreview> Dan Smith proposed openstack/nova master: Add compute_id columns to instances, migrations  https://review.opendev.org/c/openstack/nova/+/879499
14:11 <opendevreview> Dan Smith proposed openstack/nova master: Add dest_compute_id to Migration object  https://review.opendev.org/c/openstack/nova/+/879682
14:11 <opendevreview> Dan Smith proposed openstack/nova master: Add compute_id to Instance object  https://review.opendev.org/c/openstack/nova/+/879500
14:11 <opendevreview> Dan Smith proposed openstack/nova master: Online migrate missing Instance.compute_id fields  https://review.opendev.org/c/openstack/nova/+/879905
14:12 <opendevreview> Artom Lifshitz proposed openstack/nova stable/wallaby: Reproduce bug 1995153  https://review.opendev.org/c/openstack/nova/+/882321
14:12 <opendevreview> Artom Lifshitz proposed openstack/nova stable/wallaby: Save cell socket correctly when updating host NUMA topology  https://review.opendev.org/c/openstack/nova/+/882322
15:38 <dansmith> kashyap: another kernel crash this morning: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3b1/881764/9/check/cinder-tempest-plugin-basic-zed/3b198f9/testr_results.html
15:39 <dansmith> again, in a test that is attaching volumes, but a very different scenario than the original one.. the former was ceph backed and a full tempest run, this one is non-ceph, small set of tests for just the cinder plugin
15:39 <dansmith> I wonder if it would make sense to just open a kernel bug instead of hitting up the virt team?
15:47 <kashyap> dansmith: Hmm, so it is "fairly intermittently reproducible".  I'm assuming you're meaning an upstream kernel bug?  (I've filed those in the past - depending on the subsystem, they get looked at; or it just rots)
15:47 <kashyap> s/upstream/distro/
15:47 <dansmith> kashyap: well, starting with the ubuntu kernel might be reasonable, yeah
15:47 <kashyap> dansmith: Since the host (L1) is Ubuntu, how about we (I can do it first thing Monday) start from filing an Ubuntu kernel bug?
15:48 <kashyap> Bingo
15:48 <dansmith> https://bugs.launchpad.net/nova/+bug/2018612
15:48 <dansmith> I just filed this for nova ^
15:48 <dansmith> so you can reference that if you want
15:49 <dansmith> kashyap: yeah that'd be great if you can do that
15:49 <dansmith> sean-k-mooney was also going to get an alpine guest to try to see if it ever sees the same
15:49 <dansmith> I wish we had a cirros-bug-rhel-kernel image so we could make it a RHEL thing, but.. not sure we can
15:49 <dansmith> but since this is ubuntu host, ubuntu guest, that's probably a good place to start
15:50 <kashyap> dansmith: Aaaah, nice.  Can we do an "affects kernel" reference here?  /me loos
15:50 <kashyap> s/loos/looks/
15:50 <dansmith> kashyap: idk
15:50 <kashyap> Don't worry; I'll do it
15:50 <dansmith> thanks
15:51 <kashyap> By "cirros-bug-rhel-kernel" image you mean CirrOS with the RHEL kernel on it, right?
15:51 <kashyap> (Yeah, I agree; as there's more kernel-team capacity that can look at it)
15:51 <dansmith> ah, I typo'd
15:51 <dansmith> I meant "cirros-but-with-the-rhel-kernel" :)
15:51 <dansmith> i.e. something that can run in 128MB of ram so we could use it to repro (or not) the problem :)
15:51 <kashyap> Aah, okay, I parsed it right then
15:52 <kashyap> Yeah, I see what you mean.  And _still_ the RHEL folks might decline to debug - RHEL doesn't support TCG (emulation).
15:52 <kashyap> I need to go for a doc appointment; be back later.  And point noted.  I hope this is not grinding the CI env to a halt
15:53 <dansmith> no, but I'm just one guy and have seen this a few times this week
15:53 <dansmith> even CI impact aside, if it can happen to real instances, that's something we should care about
15:53 <kashyap> Fair enough.  "Other guys might see it more times"
15:54 <dansmith> ah, noted about TCG
15:54 <kashyap> What makes this really tricky is the QEMU-on-KVM setup :-( "KVM on KVM" would make it so much more tractable for getting the RHEL kernel/KVM folks to look at
15:55 <dansmith> yeah, well, hard to fix that really
15:55 <dansmith> if it's really TCG related that'd be interesting I guess
15:55 <kashyap> We used to get some nodes from a KVM-on-KVM setup; I forget which cloud it is
15:55 <dansmith> I understand about not debugging issues in unsupported environments, but unless it matters for this, it'd suck to ignore it because of that
15:56 <kashyap> Yeah, if we can reproduce it in an env w/ KVM-on-KVM then it's a "more real issue" (for lack of a better term)
15:56 <kashyap> (I agree)
15:57 * kashyap back later
15:59 <clarkb> we have nested virt flavors that should schedule to clouds that do kvm on kvm
15:59 <clarkb> you could push a change that ran the workload a bunch of times on those flavors to see if it occurs with kvm on kvm
16:00 <clarkb> re emulation not being supported: I always find it interesting when the ability to test your software in the first place is deprioritized. It feels like openshift does this as well, since you can't just run the services in a few containers and test our software integration with it anymore (you could with v3, but v4 effectively killed that; some people are working on it again)
16:00 <dansmith> clarkb: the cinder tempest plugin jobs are pretty volume and compute heavy, so just running those on that flavor might be a thing
16:00 <dansmith> clarkb: can you easy-button show me how to do that?
16:03 <clarkb> dansmith: ya let me pull up the necessary info
16:03 <clarkb> I think devstack's special nodesets may make this slightly more complicated than we would hope, but still doable
16:06 <clarkb> dansmith: you would define a new nodeset like https://opendev.org/openstack/devstack/src/branch/master/.zuul.yaml#L11-L19 changing its name to something unique and modifying its label value(s) to nested-virt labels (the full list of labels can be seen at https://zuul.opendev.org/t/openstack/labels). Then modify your tempest job definitions to use that new nodeset.
16:06 <clarkb> https://opendev.org/openstack/tempest/src/branch/master/zuul.d/integrated-gate.yaml#L254
16:07 <clarkb> then if you make a lot of copies of the job definition, each with a unique name, you can add them to your pipeline definition https://opendev.org/openstack/tempest/src/branch/master/zuul.d/project.yaml#L10 and run the same thing a bunch of times in parallel
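(For reference, a rough sketch of the nodeset copy and job copy described above, following the format of devstack's own .zuul.yaml nodesets. The nodeset and job names here are made up for illustration, and the label should be picked from the labels list linked above:)

```yaml
# Hypothetical copy of a devstack single-node nodeset, renamed and pointed
# at a nested-virt label so the job lands on a cloud that does kvm-on-kvm.
- nodeset:
    name: devstack-single-node-nested-virt
    nodes:
      - name: controller
        label: nested-virt-ubuntu-jammy
    groups:
      # devstack expects the same group layout as the original nodeset
      - name: tempest
        nodes:
          - controller

# One of the uniquely named job copies to run in parallel in the pipeline
- job:
    name: cinder-tempest-plugin-basic-nested-virt-1
    parent: cinder-tempest-plugin-basic
    nodeset: devstack-single-node-nested-virt
```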
16:07 <dansmith> so I Depends-On that devstack change from wherever I need to do that?
16:09 <clarkb> yes, or you can just define the new nodeset directly where you do it
16:09 <clarkb> since you are making a copy, it can live in the same change that you write for this (in tempest or nova or cinder etc)
16:10 <clarkb> It just needs to follow the same basic format as the devstack nodesets, because devstack has expectations about groups. You can change the nodeset name and node labels though
16:10 <dansmith> ah, okay
16:11 <dansmith> would it not be easier to just redefine all or most of our nodesets to be the nested label, and then I could just run regular jobs depends-on that devstack change without needing to modify the nodeset everywhere?
16:12 <opendevreview> Artom Lifshitz proposed openstack/nova stable/wallaby: Reproduce bug 1995153  https://review.opendev.org/c/openstack/nova/+/882321
16:12 <opendevreview> Artom Lifshitz proposed openstack/nova stable/wallaby: Save cell socket correctly when updating host NUMA topology  https://review.opendev.org/c/openstack/nova/+/882322
16:16 <clarkb> yes you could do that too
16:16 <dansmith> okay, lemme try that
16:20 <clarkb> it would probably be worth a note in the commit message that you shouldn't merge that change, because it will severely limit how many nodes are available for running all devstack-based jobs
16:20 <clarkb> I think it is a single cloud currently
16:20 <dansmith> yep, marked as DNM
16:20 <clarkb> dansmith: oh and you need to update devstack to not force emulation by default
16:20 <dansmith> yeah I saw that
16:20 * clarkb looks for where that is done
16:20 <dansmith> I guess I better do that in the devstack patch too
16:21 <dansmith> I found it already
16:21 <clarkb> looks like devstack/.zuul.yaml, grep for LIBVIRT_TYPE and set to kvm
16:21 <clarkb> but you found it
16:22 <clarkb> (it is set twice I think, both need modification)
16:22 <dansmith> yup
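(A sketch of that override done per-job instead of editing devstack itself, using devstack's devstack_localrc job variable; the job and nodeset names here are illustrative:)

```yaml
- job:
    name: cinder-tempest-plugin-basic-nested-virt
    parent: cinder-tempest-plugin-basic
    # hypothetical nodeset pointing at a nested-virt label
    nodeset: devstack-single-node-nested-virt
    vars:
      devstack_localrc:
        # devstack's CI defaults force qemu (TCG); request real kvm instead
        LIBVIRT_TYPE: kvm
```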
16:26 <dansmith> clarkb: https://review.opendev.org/c/openstack/devstack/+/882457
16:26 <dansmith> "does not match definition on master"?
16:27 <clarkb> bah
16:27 <clarkb> I guess you will need to add definitions then, with unique names
16:30 <dansmith> how would one ever change them then?
16:30 <dansmith> define new and delete the old or something?
16:30 <clarkb> yes
16:31 <clarkb> there are alternatives as well. You can define anonymous nodesets in jobs directly (without names they just apply when that job runs), or in a central unbranched repo so there is a single copy of them (project-config could serve this purpose)
16:31 <clarkb> I believe they are defined directly in devstack to make it easier for third party ci systems to use though
16:31 <dansmith> okay so I should be able to just do this in c-t-p's .zuul then
16:31 <clarkb> yes that would work
16:31 <spatel> sean-k-mooney afternoon!
16:32 <clarkb> and then you can flip the LIBVIRT_TYPE var there too I think
16:32 <spatel> If you are around, I have a question related to nova DB cleanup: my machine OOMed and some VMs are stuck in the nova DB, and I'm not sure how to clean them out.
16:48 <dansmith> clarkb: I'm messing up something with the nodeset definition: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/882458
16:48 <dansmith> the error message is less helpful this time
16:50 <dansmith> oh wait
16:51 <dansmith> is it because name isn't indented? must be it
17:03 <dansmith> I believe it's running as expected now, thanks clarkb
17:34 <opendevreview> Balazs Gibizer proposed openstack/nova master: [doc]Clarify devname support in pci.device_spec  https://review.opendev.org/c/openstack/nova/+/882464
18:03 <clarkb> sorry, I stepped out for a bit
18:07 <dansmith> clarkb: no problem, I think I'm good now
18:07 <dansmith> clarkb: semi-related, am I just stupid or is it impossible to get two conditions in the opensearch query?
18:08 <dansmith> any two-condition search I do always returns no results
18:08 <dansmith> and I get a syntax error
18:08 <dansmith> oh I guess I need an "and" operator
18:11 <clarkb> I don't know, I'm not really involved in it
18:12 <dansmith> it seems like I used to be able to stumble myself into a useful query, and since the upgrade I can only get really basic stuff to work
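(For what it's worth, the Lucene-style query syntax used by the opensearch dashboard wants an explicit uppercase AND between conditions; a lowercase "and" is treated as just another search term. The field names below are illustrative, not necessarily the ones this deployment indexes:)

```
message:"Kernel panic" AND build_name:"cinder-tempest-plugin-basic-zed"
```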
18:44 <dansmith> clarkb: so, I guess something is still wrong because lib/nova switches my requested kvm to qemu because /dev/kvm is not accessible
18:44 <dansmith> https://github.com/openstack/devstack/blob/master/lib/nova#L269
18:44 <dansmith> oh, because that job didn't select the nested-jammy label, hrm
19:36 <spatel> Any idea how to clean up orphan VM entries in the nova DB?
19:36 <spatel> I used the virsh destroy command to delete VMs, and now the DB has entries for them but the VMs don't exist
19:40 <dansmith> spatel: virsh destroy does nothing for nova vms, nova will just try to recreate them
19:41 <spatel> Hmm, I did also delete them in openstack, using the nova delete command
19:42 <dansmith> that's the only way, but they remain in the database until you archive (as you noted in your mailing list post)
19:42 <dansmith> archive will only remove them if they're marked as deleted
19:43 <spatel> They don't exist: https://paste.opendev.org/show/b0Caj0S65vx4hBFhAyML/
19:43 <spatel> openstack hypervisor stats shows 91 vms running, but openstack server list shows only a single VM
19:43 <spatel> Definitely the nova DB is out of sync
19:44 <spatel> I looked into the nova instances DB table and there is only a single entry
19:45 <dansmith> then what's the problem?
19:45 <dansmith> just the running_vms count?
19:45 <spatel> Yes...
19:45 <spatel> How do I make everything in sync?
19:46 <spatel> Just curious, from where is the openstack hypervisor command finding 91 vms?
19:47 <dansmith> you need to look at compute_nodes.running_vms to see which one is still reporting instances
19:47 <spatel> let me take a look at that table
19:51 <spatel> I am not able to find that table in the DB
19:51 <spatel> it should be inside nova/instance, correct?
19:52 <dansmith> instance (assuming you meant instances) is a table, compute_nodes is a table, running_vms is a column in the compute_nodes table
19:53 <spatel> found it
19:56 <spatel> Yes, I can see them there on node1 - https://paste.opendev.org/show/bnhl6ZeHXIA3Tjk1ucV6/
19:56 <spatel> do you think just updating those numbers in the table is enough?
19:56 <dansmith> that's three nodes
19:56 <dansmith> no
19:57 <dansmith> I mean, that will make the number change, but it's not the right fix
19:57 <dansmith> you need to select the hostname along with the count to know which is which
19:57 <dansmith> nova-compute should be updating those numbers
19:58 <dansmith> select host,hypervisor_hostname,running_vms from compute_nodes;
19:58 <spatel> https://paste.opendev.org/show/bwohGVNFtteg3J4x5qUY/
19:59 <spatel> at present on the ctrl node there are no VMs running..
19:59 <spatel> at present on the ctrl1 and ctrl3 nodes there are no VMs running..
19:59 <spatel> That entry should be zero, technically
19:59 <dansmith> is nova-compute running on each of those three nodes?
19:59 <dansmith> because it should be updating that number every few minutes
20:00 <spatel> yes, it's running
20:00 <spatel> all services showing fine.. I have restarted them
20:00 <spatel> no nasty logs or errors anywhere
20:02 <dansmith> they should all be iterating over their instances regularly and updating those numbers
20:05 <dansmith> perhaps it's not doing that if there are no instances (although you said there was one, so at least that one should be correct)
20:06 <spatel> out of 3 nodes only node2 has 1 VM running, and the rest are empty
20:07 <dansmith> yeah, so that node should show 1 in the database and doesn't, which to me means something is wrong (unless it hasn't run update_available_resource yet)
20:07 <spatel> Thinking to reboot all 3 nodes to start with fresh troubleshooting
20:08 <spatel> who will run the update_available_resource task? compute nodes, correct?
20:08 <dansmith> nova-compute does it
20:09 <spatel> maybe rabbitMQ is in a zombie state... I have checked cluster_status and it's showing all good, but who knows..
20:09 <dansmith> there should be errors in nova-compute if so, but hard to say after something like an oom
20:11 <spatel> Let me check..
20:12 <spatel> I found this line in nova-compute - AMQP server on 192.168.1.11:5672 is unreachable: timed out. Trying again in 0 seconds.: socket.timeout: timed out
20:13 <spatel> Looks like the issue is related to rabbit.. hmm
20:13 <spatel> but cluster status is green
20:13 <spatel> Let me destroy rabbit and rebuild it to see if it comes up clean
20:27 <spatel> dansmith looks like it was a rabbit issue; after re-building rabbit I can see the correct count in hypervisor stats :)
20:28 <dansmith> spatel: good, that's why I was recommending you not just fix it manually, because it's an indication of something else
20:28 <spatel> Thank you for staying with me :)
20:28 <spatel> I was about to blow up the DB.. haha
20:29 <spatel> dansmith I have one last question: how do I tell nova to limit the number of VMs per kvm host?
20:30 <spatel> I have 3 nodes and just want to stick with a 10-VMs-per-compute-node limit so I don't blow up again
20:31 <dansmith> I don't know that you can, easily.. you might be able to hack that up with placement, a custom resource class, and flavor extra specs, but it would be complicated
20:31 <dansmith> better to set memory overcommit to 1.0, reserve enough memory to run your host services, and then it will limit to whatever will fit without creating too much memory pressure
20:32 <spatel> In my case I have controller and compute on the same node.
20:32 <spatel> This is a very small environment on a small budget.
20:33 <spatel> I like the idea of memory overcommit at 1.0
20:34 <spatel> I thought nova had a config setting per compute node to limit the number of VMs allowed to run.
20:34 <dansmith> not that I know of.. generally such a number would make no sense.. one 32G instance might fit where 16 2GB instances would fit.. "number of instances" is not a very useful number for most people
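(A sketch of dansmith's suggestion as nova.conf settings on each compute node. ram_allocation_ratio and reserved_host_memory_mb are real nova options; the reserved value below is a placeholder you would size to your own co-located controller services:)

```ini
[DEFAULT]
# No RAM overcommit: the scheduler stops placing VMs once physical RAM is spoken for
ram_allocation_ratio = 1.0
# Memory held back for the host OS and, in this setup, the controller services
reserved_host_memory_mb = 8192
```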
20:57 <opendevreview> Dan Smith proposed openstack/nova master: DNM: Test new ceph job configuration with nova  https://review.opendev.org/c/openstack/nova/+/881585

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!