Thursday, 2021-10-21

brinzhangbauzas:hi, about the suspend/resume an accelerator instance feature, there is no spec; we completed rebuild/evacuate and shelve/unshelve before, I just want to continue00:22
brinzhangbauzas: today, I am not sure that time (1pm) works for me, but if possible, I think I can, thanks00:24
*** carloss_ is now known as carloss01:09
*** ministry is now known as __ministry04:23
*** bhagyashris_ is now known as bhagyashris07:46
bauzasbrinzhang: OK, then we'll start with your topic, but in case you're not around, I'll punt it until tomorrow08:29
gibimorning Nova08:30
sean-k-mooneyfor what its worth i think it makes sense to proceed with suspend/resume for cyborg without a spec, in the same way i hope to do the same for vdpa10:09
sean-k-mooneyits just the completion of the existing work which does not really require much design work, so i think a specless blueprint and code review should be sufficient in both cases10:10
gibiI'm ok with a specless bp too10:18
bauzasreminder : nova sessions start in 50 mins on https://www.openstack.org/ptg/rooms/newton12:11
bauzassean-k-mooney: gibi: yup I agree with both of you but given brinzhang asked to discuss it, let's wait until we do12:15
gibisure12:31
tbarronbauzas: do you have a window (however large) when you are targeting the virtios topic?  manila folks plan to discuss at 23:15 this morning (I know Lee is out of course)12:37
fricklerwith virt_type=qemu on qemu>=5.2 (bullseye, jammy, centos-stream-8) instances (i.e. rss of the qemu process) seem to use 3x as much ram as the flavor allocates, has anyone seen this behaviour before? it is causing failures in devstack CI with things just going OOM12:37
tbarronbauzas: 13:15 :)12:39
tbarronwhich isn't morning for lots of people, I know12:39
* tbarron is waking up12:39
sean-k-mooneyfrickler: no but i do know that in qemu 6.0 the -m attribute is no longer required when using mem-backing args12:43
sean-k-mooneyfrickler: so in 5.2, where we were seeing libvirt use both ways, that was still required12:44
sean-k-mooneyfrickler: actually that change is in 6.1 https://wiki.qemu.org/ChangeLog/6.1#Memory_backends12:45
sean-k-mooneyfrickler: so the qemu command line generated from the libvirt xml that i looked at before i think was correct, so this sounds like a qemu bug12:46
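For context on what frickler is reporting, here is a minimal, illustrative Python check of the symptom: the qemu process RSS compared against the flavor's RAM allocation. The helper names and the 3x factor mirror the report above, not any nova code.

```python
# Illustrative check of the reported symptom: qemu RSS (from
# /proc/<pid>/status) versus the RAM the flavor allocates.
def rss_kb_from_status(status_text: str) -> int:
    """Parse the VmRSS value (in kB) out of /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")

def looks_bloated(rss_kb: int, flavor_mb: int, factor: float = 3.0) -> bool:
    """True when the qemu RSS exceeds `factor` times the flavor RAM."""
    return rss_kb > flavor_mb * 1024 * factor

# Example: a 512 MB flavor whose qemu process sits at ~1.6 GB RSS.
print(looks_bloated(rss_kb_from_status("VmRSS:\t 1677721 kB"), 512))  # → True
```

In the failing CI runs, the status file of the qemu process for each guest would be the thing to sample before the OOM killer fires.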
bauzastbarron: others, sorry was taking a bit of timeoff before we start12:57
fricklersean-k-mooney: do you have some contact at qemu to push that to?12:57
bauzastbarron: not sure we'll have time to discuss about the Manila spec today12:57
bauzastbarron: we can start to discuss it for tomorrow 1pm (or later if you prefer)12:58
sean-k-mooneyfrickler: good question kashyap:  do you know who could help root cause this12:58
kashyapWhat's the context?  /me catches up.  (In a meeting now)12:59
sean-k-mooneykashyap: context is that qemu on debian seems to be using 2-3x the ram that is allocated to the vm12:59
kashyapWhat version of QEMU?12:59
sean-k-mooney5.212:59
kashyapFairly recent (Dec 2020)12:59
*** redrobot is now known as Guest365612:59
sean-k-mooneykashyap: we are seeing this result in OOM issues in the ci hence frickler's concern12:59
tbarronbauzas: *bit* is right :D.  1300UTC tomorrow would be great, manila sessions don't start till 1400.12:59
stephenfinI'll be 20/30 minutes late to the PTG sessions again today13:00
kashyapfrickler: Very odd; is this something sudden?  I can check w/ the upstream, but before that, I'd need a bug (with proper details)13:00
kashyapfrickler: Oh, wait13:01
kashyapfrickler: virt_type=qemu is already deadly slow: as you might know it's not using any hardware acceleration13:01
kashyapfrickler: What is the host OS?  (I'm assuming this is a nested env; i.e. the "host" is a level-1 guest.  And the baremetal is some cloud-vendor provided)13:02
bauzasnova sessions start by now, people can  join with https://www.openstack.org/ptg/rooms/newton13:02
kashyapfrickler: I'd need these, to start with: QEMU version and the complete QEMU command-line of the guest.  You'll find it here: /var/log/libvirt/qemu/instance-yyyyyyyy.log13:02
* kashyap bbiab; need air, been in back-to-back meetings elsewhere13:04
bauzastbarron: okay then let's discuss this at 1300UTC tomorrow13:04
bauzassean-k-mooney: joing13:05
bauzasjoining ?13:05
sean-k-mooneyyes sorry be right there13:06
bauzasstephenfin: are you able to join us ?13:07
bauzasstephenfin: we would discuss your topics next13:07
stephenfinbauzas: in 10 minutes13:24
bauzasdansmith: fyi, we're discussing the healthcheck proposal (as you provided comments in there)14:16
dansmithbauzas: arg, okay, in the tc right now14:18
bauzasdansmith: I guess we collected your thoughts14:18
dansmithbauzas: I can be there in a few I think14:20
bauzasok we'll continue to discuss the design before you join14:20
bauzasdansmith: tl;dr:14:32
bauzas1/ discuss a library like oslo.healthcheck providing the HT knobs14:32
bauzas2/ having a global cache object knowing the state of the service14:32
bauzas3/ (I can't remember it, sean-k-mooney ? ) 14:33
bauzasoh, the API thing14:33
sean-k-mooney3 was allow /healthcheck at the api to call the health check for that api process14:34
bauzasyeah, I remembered it :)14:35
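A rough sketch of how points 2/ and 3/ above could fit together: a process-global cache holding the service's last known health state, read by a per-process /healthcheck handler. All names here (HealthCache, healthcheck_body) are hypothetical; no oslo.healthcheck library exists as of this discussion.

```python
import json
import threading
import time

class HealthCache:
    """Thread-safe cache of the service's last reported health state
    (idea 2/ from the summary above). Entirely illustrative."""
    def __init__(self, ttl: float = 30.0):
        self._lock = threading.Lock()
        self._ttl = ttl
        self._state = ("unknown", 0.0)

    def report(self, status: str) -> None:
        """Called by the service whenever it knows its own state."""
        with self._lock:
            self._state = (status, time.monotonic())

    def current(self) -> str:
        """Return the cached state, or "stale" if it aged past the TTL."""
        with self._lock:
            status, when = self._state
        if time.monotonic() - when > self._ttl:
            return "stale"
        return status

cache = HealthCache()
cache.report("ok")

def healthcheck_body() -> str:
    """What a per-process /healthcheck handler (idea 3/) might return."""
    return json.dumps({"status": cache.current()})

print(healthcheck_body())  # → {"status": "ok"}
```

The TTL turns a crashed or wedged worker into a "stale" answer rather than a forever-green one, which is the main reason for caching a timestamp alongside the status.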
fricklerkashyap: https://paste.opendev.org/show/810150/ we know that qemu is slow, but we cannot use kvm reliably in CI. happening for different host OSes (i.e. what devstack runs in, no idea what host OS is in place on the cloud side)14:47
kashyapExactly; that's the peril here  w/ nested probs ... not knowing the host OS :-(14:51
clarkbit isn't just a reliability problem; some of our providers don't expose nested virt at all. So it is a two tier problem. In some places we cannot use kvm, and in others we may but with the risk of poor reliability14:59
clarkbkashyap: on the x86_64-v2 thing (^ related because qemu) does qemu not provide a predefined compatible cpu?15:00
kashyapclarkb: Hiya15:00
clarkbseems like it should? I remember when we first set up nova live migration testing one of the things I tried was defining a custom cpu since we have heterogenous resources in the clouds we use (and have no way to request consistent cpus)15:00
clarkbbut that didn't work at all which is how we ended up using the qemu64 model15:01
kashyapYeah, I know :-(15:01
clarkbI want to say at the time we identified bugs in nova and friends and they were getting fixed, so it is possible this is no longer an issue today, but I think that if centos is saying you need this minimum cpu now it is reasonable for there to be that cpu predefined in qemu15:01
kashyapclarkb: Sadly, QEMU does not provide a model that will work on (a) TCG, i.e. plain emulation, *and* KVM; and (b) that works on Intel and AMD15:01
kashyapclarkb: But!15:01
kashyapThere is an option I just tested.  And this works:15:02
kashyapclarkb: Using "Nehalem" satisfies both the above conditions.15:02
kashyapAnd it works on both Intel and AMD.15:02
kashyapNehalem is the oldest compatible model that works with the x86_64-v215:02
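For reference on kashyap's point: x86-64-v2 adds, among others, the CPU flags below on top of plain x86-64, and Nehalem is the oldest Intel model covering them. A quick host-side check against the "flags" line of /proc/cpuinfo might look like this (the flag set is a partial, illustrative list, not the full spec):

```python
# Subset of the extra flags the x86-64-v2 microarchitecture level
# requires over baseline x86-64 (partial list, for illustration).
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"}

def supports_x86_64_v2(flags_line: str) -> bool:
    """Check a /proc/cpuinfo 'flags' value against the v2 subset above."""
    return V2_FLAGS <= set(flags_line.split())

print(supports_x86_64_v2("fpu cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3"))  # → True
print(supports_x86_64_v2("fpu sse sse2 pni"))  # → False
```

This is the kind of test that would explain clarkb's worry below: if a cloud's virtual CPU model omits any of these flags, a Nehalem guest model (or an x86-64-v2 distro) will not boot there.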
clarkbgot it so there is a predefined model that addresses things. That is great news. I guess we should consider defaulting devstack to that then?15:03
kashyapYes!  Indeed15:03
clarkbI can try pushing a devstack change later today that does that15:03
kashyapclarkb: I was going to write an email to openstack-discuss list with this recommendation15:03
clarkbkashyap: that would be great, thank you15:03
kashyapclarkb: I'm drafting as we speak.  So your timing couldn't be perfect :-)15:03
kashyaps/be perfect/be more perfect/15:04
kashyapclarkb: The only assumption, which I think you'll agree is reasonable is:15:04
kashyapclarkb: ... I imagine any hardware older than Nehalem is not capable of running OpenStack.15:04
kashyapSay "yes", please :D15:05
opendevreviewBalazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests  https://review.opendev.org/c/openstack/nova/+/81501715:05
clarkbkashyap: well thats the next thing we need to figure out because we get virtual resources with their own cpu models being managed. Currently we can't boot fedora-34 in half of our clouds and are wondering if this is the same issue15:05
clarkbkashyap: it isn't just real hardware to consider but also virtual hardware15:06
kashyapI don't know if F34 switched to the -v2, lemme check15:06
kashyapclarkb: Wait, pretty sure F34 is switched to it too15:07
kashyapHow do I know?  By inference :D RHEL9 is based on F33/F34, so I presume it did too15:08
kashyapI just need to double-check15:08
clarkbkashyap: ya and half our clouds can't boot it15:08
clarkbso there is possibility that we can't actually switch to Nehalem, but we can make a change and test it15:09
opendevreviewBalazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests  https://review.opendev.org/c/openstack/nova/+/81501715:10
kashyapclarkb: Yeah, I'm actually quickly pushing a test-only patch to DevStack to see where it fails.  If you haven't already done so15:19
clarkbkashyap: remote:   https://review.opendev.org/c/openstack/devstack/+/815020 Use Nehalem CPU model by default I just pushed that to have the CI system check things. But thats because I'm in meetings and don't have a current devstack install anywhere to check with15:19
kashyapAh-ha, thank you15:20
kashyapclarkb: I don't have it either right now.  But:15:20
kashyapclarkb: I have tested today outside of DevStack: Nehalem model works with virt_type=qemu *and* with Intel and AMD15:20
clarkbkashyap: ok, ya I suspect the question becomes whether or not the CPUs we get in the host VMs are new enough to support Nehalem15:21
clarkbsince they too get virtual capabilities to enable live migration in clouds and if they were too conservative we'll have problems. The good news if there is any is that they won't be able to boot centos/rhel 9 either so clouds seem likely to update them15:22
clarkbwe can recheck 815020 if it generally works to get it to run across a bunch of clouds and double check that15:22
kashyapclarkb: Nehalem is nearly 13 years old ... I hope there are no such hosts :15:24
clarkbkashyap: the issue is the clouds provide virtual CPUs with custom models too15:26
clarkbbecause they want to do live migration. And if they were too conservative you have this problem. I don't think the actual CPUs are that old15:26
clarkbbut the CPUs we get in our instances may be15:26
kashyapRight; I see what you mean.  Even if they are conservative w/ the virtual CPUs, I'd be really surprised (wouldn't be the first time) if they're more conservative than Nehalem15:27
kashyapclarkb: Bad me.  I was _wrong_ earlier on my stupid "inference" about Fedora: Fedora *did not* switch the baseline ABI to x86-64-v215:29
kashyapSo we can rule that out15:29
clarkbok good, our problems for booting are different then :)15:30
kashyapYeah; grr, now to find out the actual cause15:32
kashyapDo you have the boot console log?15:32
clarkbkashyap: not right now. It is something I can probably dig into tomorrow morning if you want to dive into it (have ptg stuff now then real world school meeting stuff after and by the time I'm done you should be enjoying your evening)15:33
clarkbkashyap: also ianw is interested in the fedora 34 issue and his timezone overlap might be better? I guess it depends on how much of a morning person you are :)15:34
kashyapNo problem at all15:34
kashyapGo handle what you need to.  This can wait.  And yes, will check with ianw.  I'm in CEST; he's in Australia...so there should be some overlap :)15:34
clarkbsounds good, thanks again!15:35
*** efried1 is now known as efried15:46
gibistephenfin, artom: another way to test is to move heat from novaclient to sdk ;)15:56
artomgibi, you're an evil, evil man15:56
stephenfingibi: or tempest. That uses novaclient under the hood, I assume rather than subprocessing to the shell?15:56
artomstephenfin, no, tempest reimplements all the client stuff from scratch in Python in-tree15:56
artomBy design15:56
stephenfinoh, so it does15:56
artomAll I was saying is - we need to make https://opendev.org/openstack/openstacksdk/src/branch/master/.zuul.yaml#L116 work with 2 nodes so we can test how sdk does the client-side stuff for migrations and such15:58
sean-k-mooneyby the way when i was mentioning tempest i meant recreate the scenario test using sdk in the sdk func tests15:58
gibiartom: neutron just realized that they have to do the same switch in heat 15:58
toskybut in general for non testing stuff gibi's proposed solution is correct (assuming the sdk implementation of orchestration is complete enough)15:58
sean-k-mooneygibi: well heat needs to do it rather than neutron but im sure it will be the same people in either case15:59
gibisean-k-mooney: except if neutron people want heat support for new neutron features where the only client support is in sdk :)15:59
gibianyhow starting the transition does not seem scary https://review.opendev.org/c/openstack/heat/+/81342516:00
sean-k-mooneygibi: but do they want that support :)16:00
stephenfingibi: artom: I hate to be that guy but we also need to move our internal use of cinderclient, glanceclient and neutronclient to SDK at some point :)16:00
stephenfinI think ironicclient was done a few cycles ago16:00
sean-k-mooneygibi: you are assuming neutron care about heat16:00
sean-k-mooneythey proably do but maybe not16:01
stephenfinwait, no, there's still ironicclient. It's just not a mandatory import16:01
stephenfin*requirements16:01
artomstephenfin, yeah...16:01
artomLike, I'm taking part in this debate, but who am I kidding, I won't be the one doing the work - I tried to start, then just ran out of steam16:01
gibistephenfin: ouch, you are right :)16:02
artomOTOH, what's kind of annoying is that the problem that I'm fixing in https://review.opendev.org/c/openstack/openstacksdk/+/741688 is still a problem.16:02
artomimage is *still* optional, when it isn't in the PAI16:02
artom*API16:02
artomAnd it's *still* missing the kwargs things for all the other params that are in the API16:03
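A sketch of the validation artom's patch is about, written as a plain function over the create-server request body. The function name and shape are made up for illustration; the real fix lives in the SDK's create_server. The underlying rule is from the compute API: an image ref is required unless the server boots from a volume.

```python
# Hypothetical stand-in for the SDK-side check artom describes:
# the compute API requires an image ref unless booting from volume,
# but the SDK currently lets callers omit it silently.
def validate_create_server(payload: dict) -> None:
    has_image = bool(payload.get("imageRef"))
    boot_from_volume = any(
        bdm.get("boot_index") in (0, "0")
        for bdm in payload.get("block_device_mapping_v2", []))
    if not (has_image or boot_from_volume):
        raise ValueError("imageRef is required unless booting from volume")

validate_create_server({"imageRef": "f3e4..."})        # accepted
validate_create_server(
    {"block_device_mapping_v2": [{"boot_index": 0}]})  # accepted
```

Failing fast on the client side like this gives a clear error instead of the 400 the API would eventually return.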
stephenfinartom: I should have looked at that. My bad :( If you got time to rebase it onto the feature/r1 branch (which will be merged into master soon enough) I'll review it in the AM16:04
artomstephenfin, I need to fix the unit test below it and rebase16:04
artomstephenfin, it's just such a slog, and I'm lazy and easily distracted16:04
stephenfinyou should try cocaine16:04
artomSpeaking from experience?16:04
stephenfinall the bankers I know swear by it16:04
artomExplains the state of the financial system16:05
gibihappy hours already?16:07
*** bhagyashris_ is now known as bhagyashris16:27
sean-k-mooneyam are we going to call it a day or do we want to do the pain points discussion 16:31
sean-k-mooneysince we got time back16:31
sean-k-mooneybauzas: ^16:31
bauzassean-k-mooney: I prefer to leave early16:31
sean-k-mooneyok 16:32
bauzassean-k-mooney: I feel we can get through all the agenda by tomorrow16:32
bauzaswe have 7 topics left16:33
sean-k-mooneywell we might but there are also tc sessions tomorrow16:33
sean-k-mooneyso we might not have quorum for the full day16:33
sean-k-mooneybut yes we likely can finish tomorrow16:33
bauzassean-k-mooney: you're right, we're constrained by the big TC RBAC thing16:34
bauzassean-k-mooney: but I feel we can postpone a few topics if we really need16:34
sean-k-mooneyas a last resort yes but in general we should try to avoid that16:35
bauzasagreed16:35
bauzasI'll do a timekeeping thing16:35
bauzasand try to not exceed 30 mins per topic16:36
sean-k-mooneyok im going to step away for a few minutes. i was still using my wired headset today since i did not find my wireless one this morning so i have a slight headache anyway.16:37
sean-k-mooneyit went away after 20 mins of not wearing it yesterday so hopefully the same will happen today.16:38
bauzassean-k-mooney: yeah that's also why I wanted to stop earlier16:38
bauzaswe were not in the room16:38
bauzasasking people to rejoin was an effort16:38
bauzasso it would have meant 1 topic to discuss16:38
sean-k-mooneyyep getting momentum back is hard16:39
bauzassean-k-mooney: I see the TC discussion around RBAC occuring at 1:30pm until 3pm16:41
bauzassean-k-mooney: I accordingly flipped topics in the agenda16:41
sean-k-mooneyack16:42
opendevreviewBalazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81341917:09
gibiartom: I've fixed up your comments ^^17:15
artomgibi, *looks* you have some more asserts in the test that are... unrelated? Like I don't know how anal we want to be about this, but it's really only the last one we care about17:17
gibiartom: I can drop the other asserts17:19
gibiI'm also not sure about our strategy in these tests17:19
gibiI admit I copied a previous test and modified that, hence the big scope17:19
artomOur unit tests are overly tied to the implementation and confusing? Say it ain't so ;)17:19
gibiI don't like our unit tests either :)17:20
artomgibi, yeah, I figured that was the case :) I think in this situation, with the code being what it is, what you have is OK17:20
artomWell, minus the extraneous asserts17:20
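To illustrate the review point with a made-up example (FakeDriver and its methods are inventions for this sketch, not nova code): keep only the assert that covers the behaviour under review, instead of re-asserting setup details copied from a neighbouring test.

```python
from unittest import mock

class FakeDriver:
    """Toy driver standing in for the code under review."""
    def __init__(self, virtapi):
        self.virtapi = virtapi

    def reboot(self, instance):
        # the behaviour the patch adds: wait for the vif-plugged
        # event during reboot (guarded by a workaround flag in the
        # real change; omitted here for brevity)
        self.virtapi.wait_for_instance_event(instance, ["vif-plugged"])

# A focused test: one action, one assert on the new behaviour.
virtapi = mock.Mock()
FakeDriver(virtapi).reboot("inst-1")
virtapi.wait_for_instance_event.assert_called_once_with(
    "inst-1", ["vif-plugged"])
print("ok")  # → ok
```

Extraneous asserts on unrelated calls make a test fail for reasons that have nothing to do with the behaviour it names, which is exactly the coupling to implementation gibi and sean-k-mooney complain about below.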
opendevreviewBalazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81341917:29
gibiartom: ^^17:30
artomgibi, cool, thanks for your patience :)17:32
gibiartom: no worries. I do want to have nice unit tests so at least let's have the new ones nicer17:32
sean-k-mooneyi think the unit tests we write as small local tests are nice17:35
sean-k-mooneybut some of them are closer to functional tests than unit17:36
sean-k-mooneyin that they test the behavior of things that are down several calls.17:36
gibiyeah17:37
gibiI finished for today. See you tomorrow17:37
gibio/17:37
sean-k-mooneyo/17:38
*** mdbooth1 is now known as mdbooth21:50
rm_workhey, was there a specific reason that properties/metadata isn't something you can filter by in a server list?23:50
rm_workor, would that be a patch you might accept?23:54
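As rm_work's question implies, the server-list API does not offer a metadata/properties filter, so callers usually filter client-side over the results of GET /servers/detail. A minimal sketch over plain dicts shaped like those results (the helper name is made up):

```python
# Client-side filtering of server records by a metadata key/value,
# as a workaround for the missing server-list filter.
def servers_with_metadata(servers, key, value):
    """servers: iterable of dicts shaped like GET /servers/detail entries."""
    return [s for s in servers
            if s.get("metadata", {}).get(key) == value]

servers = [
    {"name": "web-1", "metadata": {"env": "prod"}},
    {"name": "web-2", "metadata": {"env": "dev"}},
]
print([s["name"] for s in servers_with_metadata(servers, "env", "prod")])  # → ['web-1']
```

The obvious downside, and presumably the motivation for the patch rm_work offers, is that this pulls the full server list across the wire before discarding most of it.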

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!