Friday, 2020-11-13

openstackgerritBrin Zhang proposed openstack/nova master: WIP: Cyborg suspend/resume support  https://review.opendev.org/72994500:00
openstackgerritBrin Zhang proposed openstack/nova master: [Trivial] Rename host/node to hostname/nodename in conductor manager  https://review.opendev.org/76249900:00
*** martinkennelly has joined #openstack-nova00:13
*** jangutter has quit IRC00:29
*** LinPeiWen96 has joined #openstack-nova00:30
*** jangutter has joined #openstack-nova00:32
*** rcernin has quit IRC00:42
*** rcernin has joined #openstack-nova00:43
openstackgerritGhanshyam Mann proposed openstack/nova master: [WIP] Migrate nova-grenade-multinode job to zuulv3 native  https://review.opendev.org/74205601:03
*** martinkennelly has quit IRC01:26
*** _mlavalle_1 has quit IRC01:31
*** spatel has joined #openstack-nova01:37
openstackgerritchengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration  https://review.opendev.org/76233001:38
*** spatel has quit IRC01:42
*** songwenping_ has joined #openstack-nova01:53
*** k_mouza has joined #openstack-nova01:54
*** LinPeiWen96 has quit IRC01:54
*** LinPeiWen13 has joined #openstack-nova01:55
*** rcernin has quit IRC01:58
*** k_mouza has quit IRC01:58
*** macz_ has quit IRC02:09
*** tinwood has quit IRC02:10
*** tinwood has joined #openstack-nova02:13
openstackgerritchengsheng proposed openstack/nova master: Modify the default value of the force parameter in live migration  https://review.opendev.org/76245802:18
*** sapd1 has joined #openstack-nova02:31
*** macz_ has joined #openstack-nova02:39
*** macz_ has quit IRC02:45
openstackgerritBrin Zhang proposed openstack/nova master: [Trivial] Rename host/node to hostname/nodename in conductor manager  https://review.opendev.org/76249902:49
*** rcernin has joined #openstack-nova02:57
*** jangutter has quit IRC03:08
*** jangutter has joined #openstack-nova03:09
*** songwenping_ has quit IRC03:22
*** rcernin has quit IRC03:38
*** nweinber has joined #openstack-nova03:38
*** nweinber has quit IRC03:43
*** rcernin has joined #openstack-nova04:05
openstackgerritMerged openstack/nova stable/ussuri: Change default num_retries for glance to 3  https://review.opendev.org/74893604:05
openstackgerritKeigo Noha proposed openstack/nova stable/train: Change default num_retries for glance to 3  https://review.opendev.org/76261004:11
openstackgerritchengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration  https://review.opendev.org/76233004:13
*** mkrai has joined #openstack-nova04:25
*** psachin has joined #openstack-nova04:53
*** rm_work has quit IRC04:58
*** rm_work has joined #openstack-nova04:58
*** gyee has quit IRC05:24
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-nova05:33
*** zzzeek has quit IRC05:34
*** zzzeek has joined #openstack-nova05:36
*** rcernin has quit IRC05:39
*** ociuhandu has joined #openstack-nova05:40
*** rcernin has joined #openstack-nova05:42
brinzhang_gibi: hi, I added cyborg shelve/unshelve support patch to the wallaby runway slot, https://etherpad.opendev.org/p/nova-runways-wallaby05:48
*** ociuhandu has quit IRC06:00
*** JamesBenson has quit IRC06:04
openstackgerritYumengBao proposed openstack/nova-specs master: libvirt supports composing cyborg owned vGPU accelerator into domain XML  https://review.opendev.org/75011606:19
*** tosky has joined #openstack-nova06:24
*** elod has joined #openstack-nova06:45
*** rcernin_ has joined #openstack-nova06:53
*** mkrai has quit IRC06:53
*** rcernin has quit IRC06:54
*** rcernin_ has quit IRC06:59
*** ociuhandu has joined #openstack-nova07:13
*** ociuhandu has quit IRC07:13
*** ralonsoh has joined #openstack-nova07:14
*** ociuhandu has joined #openstack-nova07:14
*** ociuhandu has quit IRC07:17
*** rpittau|afk is now known as rpittau07:20
*** slaweq has joined #openstack-nova07:26
*** jangutter_ has joined #openstack-nova07:35
*** jangutter has quit IRC07:36
*** mkrai has joined #openstack-nova07:40
gibistephenfin: ack08:07
gibibrinzhang_: hi, OK thanks08:07
*** tkajinam has quit IRC08:08
*** andrewbonney has joined #openstack-nova08:10
*** tkajinam has joined #openstack-nova08:12
*** tesseract has joined #openstack-nova08:14
brinzhang_gibi: I updated https://review.opendev.org/#/c/729563/17, but zuul is not stable08:25
brinzhang_gibi: I saw you said nova-live-migration sometimes times out; it also happened on https://review.opendev.org/#/c/762499/508:26
*** tesseract has quit IRC08:29
*** tesseract has joined #openstack-nova08:31
*** david-lyle has joined #openstack-nova08:47
*** dklyle has quit IRC08:47
*** mkrai has quit IRC08:54
*** CeeMac has joined #openstack-nova08:55
*** ociuhandu has joined #openstack-nova08:58
*** david-lyle has quit IRC08:58
*** ociuhandu has quit IRC09:08
bauzasgibi: got tons of failures on the RPC API change, could you please tell me whether we still have CI issues ?09:08
bauzascontext: https://review.opendev.org/#/c/761452/09:09
openstackgerritchengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration  https://review.opendev.org/76233009:15
*** nightmare_unreal has quit IRC09:17
*** k_mouza has joined #openstack-nova09:18
*** jawad_axd has joined #openstack-nova09:22
lyarwoodis there anyone running focal locally who could help me understand why libvirtd restarts even when I've killed the service and associated sockets in systemd? related to the nova-live-migration gate failure https://bugs.launchpad.net/nova/+bug/190397909:23
openstackLaunchpad bug 1903979 in OpenStack Compute (nova) "nova-live-migration job fails during evacuate negative test" [High,Confirmed] - Assigned to Lee Yarwood (lyarwood)09:23
lyarwoodseems something has changed in focal, likely another socket/service that wakes libvirtd back up after 5 seconds09:24
*** kaisers has joined #openstack-nova09:27
*** jangutter has joined #openstack-nova09:31
openstackgerritLee Yarwood proposed openstack/nova master: WIP nova-live-migration: Disable *all* virt services during negative tests  https://review.opendev.org/76262309:31
lyarwood^ blind attempt while I try to get a local focal env to play with09:31
*** jangutter_ has quit IRC09:33
*** derekh has joined #openstack-nova09:39
jawad_axdHi folks,  Facing this http://paste.openstack.org/show/799983/  . Any pointers what might be wrong? Instance gets created at the end but it takes a while, seems like rabbitmq connection is not stable anymore after rebooting controllers for some reason. rabbitmq cluster status is fine.09:39
*** martinkennelly has joined #openstack-nova09:39
jawad_axdusing stable stein.09:41
bauzaseek, just saw http://status.openstack.org/elastic-recheck/#168654209:41
bauzasthe gate is f*** flakey09:42
* bauzas needs to take the red pill09:42
bauzashah, nevermind the above sentence, I apologize09:43
bauzasway more context than just the Matrix movie, unfortunately09:44
kashyapbauzas: "There is no Gate"09:45
*** ociuhandu has joined #openstack-nova09:48
*** whoami-rajat__ has quit IRC09:48
openstackgerritBrin Zhang proposed openstack/nova-specs master: Filter instances by tenant_id  https://review.opendev.org/73724109:50
gibibauzas: I use this to see if there is any total brokenness on the gate https://zuul.opendev.org/t/openstack/builds?project=openstack%2Fnova&pipeline=gate09:55
gibibauzas: also we have 83 unclassified failures on the gate by http://status.openstack.org/elastic-recheck/data/integrated_gate.html09:56
gibiso the answer is we don't know09:57
*** ociuhandu has quit IRC09:58
gibibauzas: I filed https://bugs.launchpad.net/nova/+bug/1903979 and lyarwood is trying to find the root of it09:58
openstackLaunchpad bug 1903979 in OpenStack Compute (nova) "nova-live-migration job fails during evacuate negative test" [High,In progress] - Assigned to Lee Yarwood (lyarwood)09:58
gibidansmith, stephenfin, gmann: filed a doc bug to document the minimal config for each nova services https://bugs.launchpad.net/nova/+bug/190417910:01
openstackLaunchpad bug 1904179 in OpenStack Compute (nova) "[doc] define the minimal mandatory configuration for each nova service" [Medium,Confirmed]10:01
openstackgerritBalazs Gibizer proposed openstack/nova stable/victoria: Warn when starting services with older than N-1 computes  https://review.opendev.org/76192310:12
bauzasgibi: ack thanks10:13
gibibauzas: this is also new for me on master https://1d1ac8c4d7a38514d020-4dcc482f43713c13ecd75f64a0eb3df3.ssl.cf1.rackcdn.com/762319/1/check/tempest-integrated-compute/0b7fa72/controller/logs/screen-n-cpu.txt10:14
gibibauzas: filing a bug now for that too10:14
bauzasgibi: http://status.openstack.org/elastic-recheck/#168654210:14
bauzaslooks like we have problems with the queues10:15
bauzasanyway, /me needs to taxi his kids10:15
gibithen we have multiple problems :)10:15
gibiyey10:15
bauzaslooks like a super fancy Friday10:15
bauzas\o/10:15
gibiit is 13th so I'm not surprised10:16
*** brinzhang_ has quit IRC10:16
*** janno has quit IRC10:19
*** xinranwang has joined #openstack-nova10:23
xinranwanggibi: Hi gibi, I have updated the smartnic spec, please review it when you have time :)10:24
gibixinranwang: thanks10:24
gibiI will try to jump on reviews after I put out the fire on the gate10:24
*** efried has quit IRC10:24
xinranwanggibi: Ah ok,  it's friday evening in my timezone, I will check it next monday. Take your time.10:26
xinranwangThanks in advance10:26
gibixinranwang: there is a good chance that I will not reach your spec today. Have a nice weekend.10:27
*** janno has joined #openstack-nova10:27
gibiFYI another gate failure https://bugs.launchpad.net/nova/+bug/190418110:30
openstackLaunchpad bug 1904181 in OpenStack Compute (nova) "nova-compute fails to start is cell conductor is not running." [High,Confirmed] - Assigned to Balazs Gibizer (balazs-gibizer)10:30
*** efried has joined #openstack-nova10:31
xinranwanggibi:  thanks :)10:32
openstackgerritBalazs Gibizer proposed openstack/nova master: Restore retrying the  RCP connection to conductor  https://review.opendev.org/76263310:53
gibibauzas: when you are back there are two nova gate-fixing patches. One from me ^^ and one from stephenfin https://review.opendev.org/#/c/76254310:54
bauzasI'm here10:55
bauzasack, looking then10:55
gibibauzas: look stephenfin's first I will try to add a test to mine10:55
gibiand thanks10:56
*** jangutter_ has joined #openstack-nova10:56
bauzasgibi: +W'd stephenfin's one10:59
*** jangutter has quit IRC10:59
gibiawesome, thanks10:59
bauzasgibi: for your own change, I have a concern, where are we looking at the conductor ?10:59
gibibauzas:    https://github.com/openstack/nova/blob/eb279e9a5676f4142cce4700c3097ecc14161895/nova/service.py#L11511:00
gibiit is well hidden gem of the Service object11:00
*** jawad_axd has quit IRC11:01
bauzasthanks11:01
bauzaslooking11:01
bauzasa-ha, now I remember11:01
bauzashttps://github.com/openstack/nova/blob/eb279e9a5676f4142cce4700c3097ecc14161895/nova/service.py#L113 is for the computes11:02
gibiyes11:02
*** ociuhandu has joined #openstack-nova11:03
bauzasgibi: okay, then waiting for your test11:03
bauzasgibi: just do a functional test honestly11:03
gibiOK, working on it11:03
gibiyes, it will be a functional one11:03
bauzaswould be simpler and better11:03
bauzascool11:04
bauzasgibi: fwiw, I'll be lunching, but ping me when you're done and I'll look at it when I'm back11:04
gibisure. have a nice one!11:04
*** jangutter has joined #openstack-nova11:24
gibibauzas: bah, in the func test env the compute service does not need a running conductor to start up, I guess something is mocked in the indirection API11:24
*** ociuhandu has quit IRC11:27
*** jangutter_ has quit IRC11:27
openstackgerritBalazs Gibizer proposed openstack/nova master: Restore retrying the RCP connection to conductor  https://review.opendev.org/76263311:31
gibiwith a less good but still covering unit test ^^11:32
*** takamatsu has quit IRC11:33
*** takamatsu has joined #openstack-nova11:33
*** ociuhandu has joined #openstack-nova11:44
*** JamesBenson has joined #openstack-nova12:01
*** tbachman has quit IRC12:02
*** JamesBenson has quit IRC12:04
*** JamesBenson has joined #openstack-nova12:04
openstackgerritLee Yarwood proposed openstack/nova master: nova-live-migration: Disable *all* virt services during negative tests  https://review.opendev.org/76262312:09
openstackgerritchengsheng proposed openstack/nova master: Add hypervisor CPU feature check during live migration  https://review.opendev.org/76233012:15
openstackgerritBalazs Gibizer proposed openstack/nova master: Doc that [database]connection is not for nova-compute  https://review.opendev.org/76264712:25
*** xinranwang has quit IRC12:33
kashyapAnyone else with CPU and live migration interest, have a look at this review, from above, chengsheng ... I'm not comfortable parsing /proc/cpuinfo straight-up (and a couple of other issues in there): https://review.opendev.org/#/c/762330/612:35
kashyapErr, wrong link12:35
kashyapCorrect one: https://review.opendev.org/#/c/762330/12:35
*** k_mouza has quit IRC12:38
sean-k-mooneywe have said no to that in the past12:52
sean-k-mooneyand required the info to be exposed in libvirt12:53
sean-k-mooneykashyap: as written that patch will break upgrades and prevent migrating a vm to newer hardware, so it's a non-starter13:06
*** k_mouza has joined #openstack-nova13:06
sean-k-mooneyit's also not filtering out non-virtualisable features that don't affect the cpu flags exposed to the vm13:07
*** artom has joined #openstack-nova13:15
*** ociuhandu_ has joined #openstack-nova13:21
*** raildo has joined #openstack-nova13:24
*** ociuhandu has quit IRC13:25
*** tbachman has joined #openstack-nova13:27
openstackgerritMerged openstack/nova master: functional: Wait for revert resize to complete  https://review.opendev.org/76254313:40
*** martinkennelly has quit IRC13:41
f0oI'm a bit confused... More and more linux guests can't reboot. Their instances go from running to paused for some reason and need to be hard-rebooted from horizon/api to become alive again. Is a guest no longer allowed to reboot by itself? Am I missing a setting somewhere? (using kvm/qemu)13:44
sean-k-mooneyf0o: they are allowed to, yes13:44
f0owhy are they ending up in "paused" then?13:45
sean-k-mooneythey would only be moved to paused if the state in the db for the vm was paused and you had the sync power state option set13:45
f0owhat would cause the db to think it's paused?13:45
sean-k-mooneyonly an api request13:45
sean-k-mooneydid you check the instance event log13:46
f0othe way I can reproduce it is: create instance (ubuntu lts for instance), log in and do sudo reboot, vm is now stuck in paused13:46
f0ono api-calls made13:46
sean-k-mooneyi'm not directly aware of anything that would cause that13:47
sean-k-mooneyi have not seen it on ussuri at least13:47
sean-k-mooneywhat release are you using13:47
f0oit becomes very annoying for OS that run updates as part of their first boot (like coreos/flatcar) then the vm never becomes alive...13:47
f0oI think I'm still on stein13:47
kashyapsean-k-mooney: Yeah; it's a non-starter for other reasons too13:48
kashyapsean-k-mooney: I've only given it a cursory look :-)  Please comment there13:48
sean-k-mooneykashyap: i did13:49
sean-k-mooneyi was summarising for you :)13:49
kashyapHehe; thanks13:49
*** derekh has quit IRC13:54
f0ohow would I go about debugging this weird paused issue?13:56
f0owhere to start looking?13:56
sean-k-mooneyf0o: you should be looking at the nova-compute logs for the instance in question and seeing what actions cause the instance to be paused13:57
sean-k-mooneye.g. did the compute agent receive an event from libvirt saying the guest is now paused13:57
sean-k-mooneyor did it call libvirt to pause it13:57
f0ook, will do13:58
*** mlavalle has joined #openstack-nova13:58
f0othnkas :)13:58
f0othanks*13:58
*** nweinber has joined #openstack-nova14:01
f0oI have a few entries that have VM Paused (Lifecycle Event) in them. just gonna spawn a new instance and reboot it to see all the logs it generates14:02
*** macz_ has joined #openstack-nova14:17
f0o[instance: 4d131088-4ca6-47bd-ab5c-47b9e0a7c996] VM Paused (Lifecycle Event) & During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor.14:19
f0o[instance: 4d131088-4ca6-47bd-ab5c-47b9e0a7c996] Instance is paused unexpectedly. Ignore.14:19
f0oso those 3 lines happen when I issue reboot inside the instance14:19
f0oand now horizon shows it as Active/Paused and there's no resume action. I need to issue Hard-Reboot to kick it back alive14:20
sean-k-mooneythat looks like libvirt is moving it to paused, so14:20
sean-k-mooneyand then the compute agent is just updating the db to reflect that14:20
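For readers decoding the log lines quoted above: the numbers in "DB power_state (1)" and "vm_power_state from the hypervisor (3)" are nova's power state codes, where 1 is running and 3 is paused. The following is only a minimal sketch of the behaviour described in the log message (the hypervisor's view wins and the DB is updated to match); `sync_power_state` and `STATE_NAMES` are hypothetical names for illustration, not nova's actual implementation.

```python
# Numeric codes matching the values seen in the log lines above.
NOSTATE = 0x00
RUNNING = 0x01
PAUSED = 0x03
SHUTDOWN = 0x04
CRASHED = 0x06
SUSPENDED = 0x07

# Hypothetical lookup table, only used to make the messages readable.
STATE_NAMES = {
    NOSTATE: "pending",
    RUNNING: "running",
    PAUSED: "paused",
    SHUTDOWN: "shutdown",
    CRASHED: "crashed",
    SUSPENDED: "suspended",
}


def sync_power_state(db_state, hypervisor_state):
    """Simplified stand-in for the sync step: the hypervisor's view wins.

    Returns the state to store in the DB and any messages to log.
    """
    messages = []
    if db_state != hypervisor_state:
        messages.append(
            "DB power_state (%d, %s) does not match hypervisor (%d, %s); "
            "updating DB" % (
                db_state, STATE_NAMES[db_state],
                hypervisor_state, STATE_NAMES[hypervisor_state],
            )
        )
        db_state = hypervisor_state
    return db_state, messages


if __name__ == "__main__":
    new_state, log = sync_power_state(RUNNING, PAUSED)
    print(STATE_NAMES[new_state])
    for line in log:
        print(line)
```

In f0o's case the DB said running (1) and libvirt reported paused (3), so the compute agent simply recorded what the hypervisor reported.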
*** macz_ has quit IRC14:22
f0owell /var/log/libvirt/qemu/instance-00000115.log sure is useless lol14:22
openstackgerritRadosław Piliszek proposed openstack/nova master: [docs] Fix a placement client's command  https://review.opendev.org/76266314:26
f0onot sure where to go from here14:29
sean-k-mooneycan you paste the xml for the instance somewhere14:30
f0osure14:30
sean-k-mooneythere is an option to control the reboot action but we don't set it, but i'm just wondering if anything else is there that was generated by libvirt14:30
sean-k-mooneylibvirt modifies the xml we give it and fills in things like pci addresses automatically14:31
f0ohttp://paste.openstack.org/show/oGhApPwoy0lNe5cKmV0G/14:31
sean-k-mooney<on_poweroff>destroy</on_poweroff>14:32
sean-k-mooney  <on_reboot>restart</on_reboot>14:32
sean-k-mooney  <on_crash>destroy</on_crash>14:32
sean-k-mooneyso those are what i expect14:32
sean-k-mooney<on_reboot>restart</on_reboot> is what i was wondering about14:33
f0oso that seems alright then?14:33
f0onova-compute-kvm version 2:20.0.0~rc1-0ubuntu3~cloud0 and qemu-kvm version 1:4.0+dfsg-0ubuntu9~cloud0 (just in case)14:33
sean-k-mooneyyes so what could be happening is a race between the periodic task and the vm reboot14:33
sean-k-mooneyis it happening every time14:34
sean-k-mooneyor just sometimes14:34
f0oit's happening the majority of times14:34
f0othere are some exceptions to it, but by now I'd wager that most times it gets stuck in paused14:34
sean-k-mooneyhum ok the interval is 10 minutes for the update by default14:35
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval14:35
sean-k-mooneyi assume you have not made that run faster14:35
f0onope14:35
f0oconfig is very trivial, most are left as defaults14:35
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events14:36
sean-k-mooneyso you might want to set that to false14:36
sean-k-mooneyit looks like this is a known race14:36
f0ook let's give it a shot14:37
sean-k-mooneyif you have the interval at its default then setting that to false should be fine14:37
sean-k-mooneythat would have to be set on the compute nodes fyi14:38
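The two options linked above live in the `[workarounds]` and `[DEFAULT]` sections of nova.conf on each compute node. As a rough sketch, the fragment being discussed can be rendered with Python's standard `configparser`; the helper name `render_nova_conf_fragment` is made up for this example, and the 600 second interval is an assumption that simply mirrors the 10 minute default mentioned in the conversation.

```python
import configparser
import io


def render_nova_conf_fragment(handle_events=False, sync_interval=600):
    """Render the nova.conf fragment discussed above as text.

    handle_events maps to [workarounds]/handle_virt_lifecycle_events and
    sync_interval (seconds) to [DEFAULT]/sync_power_state_interval.
    """
    config = configparser.ConfigParser()
    config["DEFAULT"]["sync_power_state_interval"] = str(sync_interval)
    config["workarounds"] = {
        "handle_virt_lifecycle_events": str(handle_events),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_nova_conf_fragment())
```

The printed text would still need to be merged into the real nova.conf by hand, and nova-compute restarted afterwards, as f0o does in the next line.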
f0oset and restarting nova-compute14:38
sean-k-mooneycool14:38
f0olet's give it a shot14:38
sean-k-mooneyf0o: by the way the reason we default to handling events, i believe, is for ironic14:41
*** k_mouza has quit IRC14:42
*** ociuhandu_ has quit IRC14:42
*** ociuhandu has joined #openstack-nova14:43
f0omakes sense14:43
f0oat first I thought I was going insane but now that it became more frequent I figured I ask14:44
sean-k-mooneyhopefully that option will help14:47
sean-k-mooneythe main side effect is that if you do poweroff in the guest then it won't be reflected in the api/db until the interval expires14:47
sean-k-mooneye.g. up to 10 mins from power off by default14:48
f0othat should be fine14:48
sean-k-mooneythat is generally an ok tradeoff and you can adjust the interval if you want to14:48
f0oworst case I lower the update interval14:48
sean-k-mooneyyep14:48
*** lpetrut has joined #openstack-nova14:49
*** rpittau is now known as rpittau|afk14:49
*** hemna has quit IRC14:50
*** k_mouza has joined #openstack-nova14:50
*** hemna has joined #openstack-nova14:51
f0o[  652.025665] reboot: Restarting system .... let's hope it comes back up :D14:53
f0oI can see the qemu process running and consuming 30% cpu but nothing seems alive in it14:56
f0ovnc shows guest hasnt initialized display and view logs is only showing that reboot message14:56
f0oDuring _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (3). Updating power_state in the DB to match the hypervisor / Instance is paused unexpectedly. Ignore.15:00
f0oagain :<15:00
*** k_mouza has quit IRC15:01
f0oprocess is still running and eating up 30% cpu tho, not sure what it's actually doing there15:02
*** ociuhandu has quit IRC15:06
*** ociuhandu has joined #openstack-nova15:08
*** lpetrut has quit IRC15:12
*** dopereira has joined #openstack-nova15:12
*** ociuhandu has quit IRC15:24
*** LinPeiWen13 has quit IRC15:29
*** dklyle has joined #openstack-nova15:48
sean-k-mooneyf0o: that is sounding more like a qemu bug than an openstack one15:49
sean-k-mooneyit sounds like it's not actually restarting properly15:49
f0ocool :D15:50
*** tacco has joined #openstack-nova15:51
openstackgerritGhanshyam Mann proposed openstack/nova master: DNM: Testing system scope in tempest  https://review.opendev.org/74012415:51
f0ojust my luck15:51
taccohey everyone. Anyone knows why i only get 64VCPUs on a HV with 256 CPU Cores? over commiting ratio is 1.0 :(15:52
taccoAMD EPYC 7742 64-Core Processor15:52
*** mlavalle has quit IRC15:53
gibidansmith: hi! Do I understand correctly that the service version check at https://review.opendev.org/#/c/729563/17/nova/compute/api.py@4162 can see version 54 and allow the shelve call with accelerators while the RPC can still be manually pinned to < 5.13 and therefore the compute will not the accel_uuids param and therefore not handle the accelerators properly?15:53
gibi* will not get the accel_uuids15:53
dansmithI'll have to look at that decorator, I think that just got added, right?15:54
taccoi see processor: 255 and cpu cores: 64 in /proc/cpuinfo is this something like HT on Intel CPUs? but only can provide the "real" cores to the VM?15:54
taccocause i have 250CPUs in my flavor15:54
*** mlavalle has joined #openstack-nova15:55
taccothe VM then spawns with 64CPus15:55
f0oUhm15:55
f0o7742 shows as 64 cores15:55
f0oHT stuff doesnt really count afaik15:55
dansmithgibi: yikes, that makes an uncached cross-cell db lookup for every single call of that method :(15:55
gibidansmith: not too long ago, but it basically calls get_minimum_version_all_cells15:56
f0otacco: inside the vm you dont see the HT threads?15:56
taccoin the VM at proc/cpuinfo i see only 64 but the vm was spawned with a flavor with 250 vcpus15:57
taccothats kinda strange and cli hypervisor list also shows 256 CPUs15:57
dansmithgibi: commented on that patch, see if that helps15:59
gibithanks15:59
*** ociuhandu has joined #openstack-nova16:02
*** k_mouza has joined #openstack-nova16:03
taccoi see this seems to be two physical CPUs with 64 cores and 128 threads. that makes sense. But no clue why the VM only got 64Cpus if in the flavor are 250 specified. anyway. will have to dive deeper :D16:05
f0oI'm happy to swap issues with you tacoc :D16:08
f0oother than my dyslexia today lol16:08
*** ociuhandu has quit IRC16:09
taccoalways nice to be a usefull person :D16:09
gibidansmith: you confirmed my fears, thanks16:12
dansmithgibi: ack16:12
dansmithgibi: honestly that decorator seems like a bad idea to me.. that's a lot of overhead hidden in a decorator16:12
dansmithit should at least use a cached value, but I'd prefer we did things like I described, which is rely on the rpc version and a raise from rpcapi16:13
sean-k-mooneytacco: there was a libvirt bug related to numa reporting and amd but perhaps there are others16:13
sean-k-mooneytacco: what do you see on the host if you do nproc16:13
gibidansmith: agree. I missed the heavy weightness of that decorator impl in previous reviews.16:14
sean-k-mooneytacco: can you provide the output of virsh capabilities for me in a paste and ill quickly take a look16:14
dansmithgibi: oh actually, I forgot that we cache in the service object itself, so I guess not as bad, but still, it leaks the rpc problem so it'd be better to use that16:14
tacconproc shows me 256 as expected.16:14
*** takamatsu has quit IRC16:15
sean-k-mooneytacco: also the vm xml, if possible. i know some kernels have a processor limit built in, so i just want to check if the xml has 64 or 250 cores16:15
taccoone sec. will do so.16:15
gibidansmith: ack16:16
taccosean-k-mooney: http://paste.openstack.org/show/xs49lPLHXUPh43AeceZs/16:19
taccothats the dumpxml from the VM16:20
sean-k-mooney<nova:vcpus>250</nova:vcpus>16:20
taccovirsh capas on the way. :)16:20
sean-k-mooney<vcpu placement='static'>250</vcpu>16:20
sean-k-mooneyso nova is telling libvirt/qemu to use 25016:20
taccook, but inside the VM i only see 64 when i do cat /proc/cpuinfo hm.. maybe some problems with the image16:20
sean-k-mooney<topology sockets='250' cores='1' threads='1'/> in a really dumb way but this is the default16:20
taccoyes would enable numa pinning later16:21
sean-k-mooneytry adding hw_cpu_sockets=2 hw_cpu_theads=2 to the image16:21
taccowould keep this in the flavor, because i would like to keep this as a seperate aggregate only available for "some" users16:21
sean-k-mooneyactually with 250 that won't work16:22
*** k_mouza has quit IRC16:22
sean-k-mooneytacco: try 248 cores16:22
sean-k-mooneywith those two set16:22
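The reason 250 does not work with 2 sockets and 2 threads per core, while 248 does, is plain divisibility: the vCPU count has to equal sockets x cores x threads. A small sketch of that arithmetic follows; `cores_per_socket` is a hypothetical helper written for this note, not a nova function.

```python
def cores_per_socket(vcpus, sockets, threads_per_core):
    """Return the cores per socket needed for an exact topology, or None.

    A guest topology is only exact when
    vcpus == sockets * cores * threads_per_core.
    """
    group = sockets * threads_per_core
    if vcpus % group != 0:
        return None
    return vcpus // group


if __name__ == "__main__":
    # The default topology seen in the XML above: 250 sockets, 1 core, 1 thread.
    print(cores_per_socket(250, sockets=250, threads_per_core=1))
    # 250 vCPUs cannot be split across 2 sockets with 2 threads per core.
    print(cores_per_socket(250, sockets=2, threads_per_core=2))
    # 248 vCPUs can.
    print(cores_per_socket(248, sockets=2, threads_per_core=2))
```

With 248 vCPUs the guest would get 2 sockets of 62 cores with 2 threads each, instead of 248 single-core sockets.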
taccowill do so. one sec.16:22
*** k_mouza has joined #openstack-nova16:22
sean-k-mooneyi'm guessing the kernel is compiled to only support 64 sockets16:22
gibidansmith: I feel we have other places in the code where we rely purely on the service version to see if something is supported and we ignore the fact that rpc might be pinned.16:23
dansmithgibi: those *should* be places where there isn't a specific rpc version in play,16:23
dansmithin which case the service version is specifically what we want16:23
gibidansmith: yeah, that would be the proper usage of the service version only checks16:23
*** k_mouza has quit IRC16:23
dansmithservice version alone should be used for cases where "I'm not saying anything new over rpc, but I depend on some action being done on the compute"16:23
*** k_mouza has joined #openstack-nova16:24
gibidansmith: I will look through these checks to be sure16:24
taccosean-k-mooney: i guess this should be debian-cloud-image most things kernel related should be default16:27
taccook flavor has now 248vcpus and | properties                 | hw:cpu_sockets='2', hw:cpu_theads='2' |16:28
sean-k-mooneytacco: ya i'm not sure what the default sockets is upstream but 64 sounds like a number people would choose as a default16:29
*** mgoddard has joined #openstack-nova16:29
taccosean-k-mooney: yes this was also in my mind as first :D16:29
taccothats why i asked here.. because if this is known.. you should know it. :)16:30
taccothis is the first time i have a HV with so many CPUs16:30
taccoand i know some people here should have way larger setups and way more experience than i have. :)16:30
sean-k-mooneyi haven't gone over 128 but i always make my flavor mirror the host topology in terms of threads and sockets16:31
taccoanyway. Thanks for your initial help. Now I know this could be related to the image. Will dig around and see what i can find.16:31
gibidansmith: does an ovo always travel through RPC with all its data and only get backlevelled on the receiving side? So if a new field is added to an o.vo that is sent via RPC then we don't need to bump the RPC version and the receiving side gets the new field if the code on the receiving side has the new field definition in its own ovo class, independently of the RPC api version?16:31
sean-k-mooneytacco: did that work by the way16:31
sean-k-mooneythe updated flavor16:31
taccousually you also don't want such huge flavors.16:31
tacconope updated flavor also has only 64cpus in the cm16:31
taccos/cm/vm/16:31
sean-k-mooneyreally16:32
sean-k-mooneythat is odd16:32
sean-k-mooneyum can you quickly check the qemu string just to triple check16:32
taccoand i double checked the xml if that change affected the xml and reflects my change to the flavor16:32
dansmithgibi: we always send the version of the object we have. If the receiving side determines it's too new, it calls to conductor and asks for conductor to backlevel it. Conductor can either return an object that has an older version (i.e. if it _can_ backlevel it) or it can refuse16:32
sean-k-mooneyit's likely a qemu or guest kernel limitation however16:32
taccoyes. Thanks, that's where i would like to dig deeper16:33
gibidansmith: good to know, that in this case there is an extra call back to the conductor16:33
dansmithgibi: but that's just for the object(s), not the rpc signature itself, and the idea was that during an upgrade you have extra conductor load to handle all the backports, but as you upgrade everything that just disappears16:34
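The flow dansmith describes (the sender always sends its own object version, and a receiver that finds it too new asks conductor to backlevel it) can be sketched roughly as below. This is a toy model under stated assumptions: the payload layout, the `FIELD_ADDED_IN` table and the function names are all hypothetical and are not the oslo.versionedobjects or nova API. It also leaves out the case where conductor refuses to backlevel.

```python
# Hypothetical table: the object version in which each field first appeared.
FIELD_ADDED_IN = {"name": (1, 0), "extra": (1, 1)}


def parse_version(text):
    """Turn a version string such as '1.1' into a comparable tuple (1, 1)."""
    return tuple(int(part) for part in text.split("."))


def conductor_backport(payload, target_version):
    """Stand-in for conductor: drop fields newer than the target version."""
    target = parse_version(target_version)
    kept = {
        key: value
        for key, value in payload["fields"].items()
        if FIELD_ADDED_IN[key] <= target
    }
    return {"version": target_version, "fields": kept}


def receive(payload, supported_version):
    """Receiver side: accept as-is, or ask conductor to backlevel if too new."""
    if parse_version(payload["version"]) > parse_version(supported_version):
        payload = conductor_backport(payload, supported_version)
    return payload


if __name__ == "__main__":
    sent = {"version": "1.1", "fields": {"name": "vm1", "extra": "x"}}
    print(receive(sent, "1.0"))  # an old receiver gets a backlevelled object
    print(receive(sent, "1.1"))  # an upgraded receiver gets the full object
```

As noted in the conversation, this only covers the object payload. The RPC method signature itself is still governed by the RPC API version and any pin on it.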
gibidansmith: yes, it is clear that this is only possible for o.vos itself, not for the whole RPC method signature.16:35
*** macz_ has joined #openstack-nova16:36
gibidansmith: but then in a new-field-in-an-ovo case the service version check is enough16:36
gibias if the compute service version is new enough then it will understand a new ovo version with the extra field16:36
sean-k-mooneytacco: https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig#L994-L100516:36
dansmithyeah, and in some cases, it's possible to backlevel the object so we can just deal with it on the receiving end, but not if you require specific behavior16:36
sean-k-mooneytacco: so it should be 512 so probably a qemu issue16:37
*** jdillaman has quit IRC16:38
taccoi see. Thanks.16:38
gibidansmith: thanks again, this make sense now16:39
dansmithcool16:39
sean-k-mooneytacco: it looks like the max cpus depends on the machine type you enable16:40
gibiI think this is a good time to finish my week and let the new understanding solidify :)16:41
gibihave a nice weekend folks o/16:41
lyarwood\o16:41
taccosean-k-mooney: here is also the capa list of virsh http://paste.openstack.org/show/nxpSZUUrICfvvRJBgosL/16:44
taccothis machine type?     <type arch='x86_64' machine='pc-i440fx-4.0'>hvm</type>16:45
sean-k-mooneyya you are using the pc machine type but it should in theory support up to 25616:45
*** takamatsu has joined #openstack-nova16:46
sean-k-mooneytacco: if you look in the output it has the limits16:47
sean-k-mooneyline 119316:47
sean-k-mooney<machine maxCpus='255'>pc-i440fx-4.0</machine>16:47
taccoyes i also found this in there.16:47
taccook. so no "real" limitations, more of a misconfiguration or bug. :)16:47
sean-k-mooneyso this is looking like a guest issue or a bug ya16:48
sean-k-mooneyyou could try with another image16:48
taccoi already struggled about that limitations that you can only have 8 disks inside a qemu vm. :)16:48
taccoyes will do so but in general this was the debian cloud image with only minimal changes. but will test next week.16:48
sean-k-mooneylike a fedora or tumbleweed image16:48
taccook. will do so. but for today im done.16:49
sean-k-mooneycool16:49
sean-k-mooneyenjoy your weekend16:49
taccothanks a lot for your help so far. Will get back to you next week. have a nice weekend as well16:49
*** ociuhandu has joined #openstack-nova16:51
*** hemna has quit IRC16:58
*** hemna has joined #openstack-nova16:59
openstackgerritBalazs Gibizer proposed openstack/nova master: Restore retrying the RPC connection to conductor  https://review.opendev.org/76263317:04
*** hemna has quit IRC17:08
*** hemna has joined #openstack-nova17:09
*** tesseract has quit IRC17:14
*** ociuhandu_ has joined #openstack-nova17:15
dopereiraHi, everyone. I'm a new contributor and just joined the OpenStack community.17:17
*** ociuhandu has quit IRC17:17
dopereiraI recently started to work on a project that develops an OpenStack distribution with commercial support. Hopefully I will be contributing to upstream OpenStack as well.17:18
dopereiraAs part of my ramp-up process, I'm learning how to contribute to OpenStack, and I was asked to take care of a low-hanging-fruit bug.17:18
dopereiraI chose this one: https://bugs.launchpad.net/nova/+bug/1888927 and already have a patch for it: https://review.opendev.org/#/c/762433/417:18
openstackLaunchpad bug 1888927 in OpenStack Compute (nova) "cell_v2 update_cell cell0 get transport_url from config file" [Low,In progress] - Assigned to Daniel de Oliveira Pereira (danielpereira01)17:18
dopereiraCould you guys please take a look and help review it?17:19
*** ociuhandu_ has quit IRC17:19
dopereiraAlso, the VMware NSX CI check failed. How can I start a recheck for it? It does not provide this information17:21
sean-k-mooneydopereira: have you submitted the patch to gerrit17:23
sean-k-mooneyah you have17:23
sean-k-mooneydopereira: when a third party ci fails it leaves a comment telling you how to recheck that specific ci17:24
sean-k-mooneydopereira: you have to click the toggle extra ci button to see it17:24
sean-k-mooneyoh but they dont17:24
sean-k-mooneyit should be listed here17:25
dopereirait seems that's not the case for VMware NSX CI17:25
sean-k-mooneyhttps://wiki.openstack.org/w/index.php?title=ThirdPartySystems17:25
sean-k-mooneyvmware-recheck-patch17:25
sean-k-mooneyaccording to https://wiki.openstack.org/wiki/ThirdPartySystems/VMware_CI17:26
*** ociuhandu has joined #openstack-nova17:26
dopereiraI saw, thanks17:27
sean-k-mooneyso i havent done a full review but i think you need to add a release note for the bug fix17:27
sean-k-mooneylooks like you have added tests17:28
sean-k-mooneyso ya the main thing i think is needed is a release note but im not that familiar with that part of the code, otherwise it looks fine17:29
dopereiracould you point me to some documentation about release notes?17:30
*** ociuhandu has quit IRC17:31
gmanndopereira: here - https://docs.openstack.org/reno/latest/user/usage.html17:31
sean-k-mooneybasically you create a new one from the template via tox. so tox -e venv -- reno new cell_0-transport-url17:31
sean-k-mooneythen you edit the file it creates17:32
*** jangutter has quit IRC17:32
sean-k-mooneyyou can check it's correct with tox -e releasenotes17:32
sean-k-mooneyyou just need to fill out the fixes section in this case and remove the rest17:32
dopereirathanks, will take a look17:34
*** bnemec is now known as beekneemech17:37
*** k_mouza has quit IRC17:43
*** psachin has quit IRC17:44
*** mgoddard has quit IRC17:49
taccosean-k-mooney: couldn't hold myself back and tested with fedora. same same.. 64 vcpus in /proc/cpuinfo..17:54
tacconproc also 64.17:54
*** mlavalle has quit IRC17:58
sean-k-mooneytacco: you could try a dmidecode in the vm17:59
sean-k-mooneyor check dmesg to see if it prints anything on kernel start17:59
sean-k-mooneybut ya nova seems to be doing the right thing so it's a qemu or guest issue18:00
sean-k-mooneylooking more like qemu since it happens on multiple distros18:00
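Editorial note: the in-guest check tacco describes (counting CPUs in /proc/cpuinfo and comparing against what was requested) can be scripted so it is easy to repeat across images. This is a rough sketch, assuming a Linux guest whose /proc/cpuinfo has one `processor : N` line per visible CPU, as on x86. The helper names `count_processors` and `report` are hypothetical, and the expected vCPU count is passed in by the caller rather than assumed here.

```python
import sys


def count_processors(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def report(expected_vcpus, cpuinfo_text):
    """Describe whether the guest sees as many CPUs as were requested."""
    seen = count_processors(cpuinfo_text)
    if seen == expected_vcpus:
        return f"ok: guest sees all {seen} vCPUs"
    return f"mismatch: {expected_vcpus} vCPUs requested, guest sees {seen}"


# Only runs when invoked as: python3 check_vcpus.py <expected_vcpus>
# (the script name is a placeholder).
if __name__ == "__main__" and len(sys.argv) == 2:
    with open("/proc/cpuinfo") as cpuinfo:
        print(report(int(sys.argv[1]), cpuinfo.read()))
```

Running it inside each test image with the flavor's vCPU count as the argument gives a one-line result to compare between distros.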
*** mlavalle has joined #openstack-nova18:05
openstackgerritGhanshyam Mann proposed openstack/nova master: [WIP] Migrate nova-grenade-multinode job to zuulv3 native  https://review.opendev.org/74205618:12
*** ralonsoh has quit IRC18:34
*** andrewbonney has quit IRC18:36
*** nweinber has quit IRC18:36
*** nweinber has joined #openstack-nova18:36
*** sean-k-mooney1 has joined #openstack-nova18:41
*** sean-k-mooney has quit IRC18:42
*** gyee has joined #openstack-nova18:54
openstackgerritDaniel de Oliveira Pereira proposed openstack/nova master: Avoid changing transport_url when updating Cell0  https://review.opendev.org/76243319:05
*** nweinber has quit IRC19:18
*** nweinber has joined #openstack-nova19:19
*** lifeless has quit IRC19:31
*** k_mouza has joined #openstack-nova19:43
*** lifeless has joined #openstack-nova19:47
*** k_mouza has quit IRC19:48
*** dopereira has quit IRC19:50
*** slaweq has quit IRC20:03
openstackgerritMerged openstack/nova stable/train: add [libvirt]/max_queues config option  https://review.opendev.org/74006420:06
openstackgerritMerged openstack/nova stable/victoria: Handle disabled CPU features to fix live migration failures  https://review.opendev.org/75876020:52
*** ociuhandu has joined #openstack-nova21:19
*** rcernin has joined #openstack-nova21:19
openstackgerritMerged openstack/nova stable/ussuri: docs: Resolve issue with deprecated extra specs  https://review.opendev.org/74838621:27
openstackgerritMerged openstack/nova stable/ussuri: replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id"  https://review.opendev.org/74718921:27
*** ociuhandu has quit IRC21:45
*** jmlowe has quit IRC21:49
*** sean-k-mooney2 has joined #openstack-nova21:54
*** sean-k-mooney1 has quit IRC21:56
*** hamalq has joined #openstack-nova22:30
*** nweinber has quit IRC22:46
openstackgerritGhanshyam Mann proposed openstack/nova master: Fix config option default value for sample config file  https://review.opendev.org/76272123:15
openstackgerritGhanshyam Mann proposed openstack/nova master: Fix config option default value for sample config file  https://review.opendev.org/76272123:16
*** hemna has quit IRC23:19
*** hemna has joined #openstack-nova23:20
*** corvus has quit IRC23:22
*** hamalq has quit IRC23:37
*** hamalq has joined #openstack-nova23:38
*** tosky has quit IRC23:43
