Thursday, 2021-10-21

brinzhangbauzas:hi, about the suspend/resume an accelerator instance feature, there is no spec; we completed rebuild/evacuate and shelve/unshelve before, I just want to continue00:22
brinzhangbauzas: today, I am not sure that time (1pm) works for me, but if possible, I think I can, thanks00:24
*** carloss_ is now known as carloss01:09
*** ministry is now known as __ministry04:23
*** bhagyashris_ is now known as bhagyashris07:46
bauzasbrinzhang: OK, then we'll start with your topic, but in case you're not around, I'll punt it until tomorrow08:29
gibimorning Nova08:30
sean-k-mooneyfor what its worth i think it makes sense to proceed with suspend/resume for cyborg without a spec, in the same way i hope to do the same for vdpa10:09
sean-k-mooneyits just the completion of the existing work which does not really require much design work, so i think a specless blueprint and code review should be sufficient in both cases10:10
gibiI'm ok with a specless bp too10:18
bauzasreminder : nova sessions start in 50 mins on https://www.openstack.org/ptg/rooms/newton12:11
bauzassean-k-mooney: gibi: yup I agree with both of you but given brinzhang asked to discuss it, let's wait until we do12:15
gibisure12:31
tbarronbauzas: do you have a window (however large) when you are targeting the virtios topic?  manila folks plan to discuss at 23:15 this morning (I know Lee is out of course)12:37
fricklerwith virt_type=qemu on qemu>=5.2 (bullseye, jammy, centos-stream-8) instances (i.e. rss of the qemu process) seem to use 3x as much ram as the flavor allocates, has anyone seen this behaviour before? it is causing failures in devstack CI with things just going OOM12:37
tbarronbauzas: 13:15 :)12:39
tbarronwhich isn't morning for lots of people, I know12:39
* tbarron is waking up12:39
sean-k-mooneyfrickler: no but i do know that in qemu 6.0 the -m attribute is no longer required when using mem-backing args12:43
sean-k-mooneyfrickler: so in 5.2, where we were seeing libvirt use both ways, that was still required12:44
sean-k-mooneyfrickler: actually that change is in 6.1 https://wiki.qemu.org/ChangeLog/6.1#Memory_backends12:45
sean-k-mooneyfrickler: so the qemu command line generated from the libvirt xml that i looked at before i think was correct, so this sounds like a qemu bug12:46
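For context on what frickler is reporting, here is a minimal, illustrative Python check of the symptom: the qemu process RSS compared against the flavor's RAM allocation. The helper names and the 3x factor mirror the report above, not any nova code.

```python
# Illustrative check of the reported symptom: qemu RSS (from
# /proc/<pid>/status) versus the RAM the flavor allocates.
def rss_kb_from_status(status_text: str) -> int:
    """Parse the VmRSS value (in kB) out of /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")

def looks_bloated(rss_kb: int, flavor_mb: int, factor: float = 3.0) -> bool:
    """True when the qemu RSS exceeds `factor` times the flavor RAM."""
    return rss_kb > flavor_mb * 1024 * factor

# Example: a 512 MB flavor whose qemu process sits at ~1.6 GB RSS.
print(looks_bloated(rss_kb_from_status("VmRSS:\t 1677721 kB"), 512))  # → True
```

In the failing CI runs, the status file of the qemu process for each guest would be the thing to sample before the OOM killer fires.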
bauzastbarron: others, sorry was taking a bit of timeoff before we start12:57
fricklersean-k-mooney: do you have some contact at qemu to push that to?12:57
bauzastbarron: not sure we'll have time to discuss about the Manila spec today12:57
bauzastbarron: we can start to discuss it for tomorrow 1pm (or later if you prefer)12:58
sean-k-mooneyfrickler: good question kashyap:  do you know who could help root cause this12:58
kashyapWhat's the context?  /me catches up.  (In a meeting now)12:59
sean-k-mooneykashyap: context is that qemu on debian seems to be using 2-3x the ram that is allocated to the vm12:59
kashyapWhat version of QEMU?12:59
sean-k-mooney5.212:59
kashyapFairly recent (Dec 2020)12:59
*** redrobot is now known as Guest365612:59
sean-k-mooneykashyap: we are seeing this result in OOM issues in the ci hence frickler's concern12:59
tbarronbauzas: *bit* is right :D.  1300UTC tomorrow would be great, manila sessions don't start till 1400.12:59
stephenfinI'll be 20/30 minutes late to the PTG sessions again today13:00
kashyapfrickler: Very odd; is this something sudden?  I can check w/ the upstream, but before that, I'd need a bug (with proper details)13:00
kashyapfrickler: Oh, wait13:01
kashyapfrickler: virt_type=qemu is already deadly slow: as you might know it's not using any hardware acceleration13:01
kashyapfrickler: What is the host OS?  (I'm assuming this is a nested env; i.e. the "host" is a level-1 guest.  And the baremetal is some cloud-vendor provided)13:02
bauzasnova sessions start by now, people can  join with https://www.openstack.org/ptg/rooms/newton13:02
kashyapfrickler: I'd need these, to start with: QEMU version and the complete QEMU command-line of the guest.  You'll find it here: /var/log/libvirt/qemu/instance-yyyyyyyy.log13:02
* kashyap bbiab; need air, been in back-to-back meetings elsewhere13:04
bauzastbarron: okay then let's discuss this at 1300UTC tomorrow13:04
bauzassean-k-mooney: joing13:05
bauzasjoining ?13:05
sean-k-mooneyyes sorry be right there13:06
bauzasstephenfin: are you able to join us ?13:07
bauzasstephenfin: we would discuss your topics next13:07
stephenfinbauzas: in 10 minutes13:24
bauzasdansmith: fyi, we're discussing the healthcheck proposal (as you provided comments in there)14:16
dansmithbauzas: arg, okay, in the tc right now14:18
bauzasdansmith: I guess we collected your thoughts14:18
dansmithbauzas: I can be there in a few I think14:20
bauzasok we'll continue to discuss the design before you join14:20
bauzasdansmith: tl;dr:14:32
bauzas1/ discuss a library like oslo.healthcheck providing the HT knobs14:32
bauzas2/ having a global cache object knowing the state of the service14:32
bauzas3/ (I can't remember it, sean-k-mooney ? ) 14:33
bauzasoh, the API thing14:33
sean-k-mooney3 was allow /healthcheck at the api to call the health check for that api process14:34
bauzasyeah, I remembered it :)14:35
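A rough sketch of how points 2/ and 3/ above could fit together: a process-global cache holding the service's last known health state, read by a per-process /healthcheck handler. All names here (HealthCache, healthcheck_body) are hypothetical; no oslo.healthcheck library exists as of this discussion.

```python
import json
import threading
import time

class HealthCache:
    """Thread-safe cache of the service's last reported health state
    (idea 2/ from the summary above). Entirely illustrative."""
    def __init__(self, ttl: float = 30.0):
        self._lock = threading.Lock()
        self._ttl = ttl
        self._state = ("unknown", 0.0)

    def report(self, status: str) -> None:
        """Called by the service whenever it knows its own state."""
        with self._lock:
            self._state = (status, time.monotonic())

    def current(self) -> str:
        """Return the cached state, or "stale" if it aged past the TTL."""
        with self._lock:
            status, when = self._state
        if time.monotonic() - when > self._ttl:
            return "stale"
        return status

cache = HealthCache()
cache.report("ok")

def healthcheck_body() -> str:
    """What a per-process /healthcheck handler (idea 3/) might return."""
    return json.dumps({"status": cache.current()})

print(healthcheck_body())  # → {"status": "ok"}
```

The TTL turns a crashed or wedged worker into a "stale" answer rather than a forever-green one, which is the main reason for caching a timestamp alongside the status.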
fricklerkashyap: https://paste.opendev.org/show/810150/ we know that qemu is slow, but we cannot use kvm reliably in CI. happening for different host OSes (i.e. what devstack runs in, no idea what host OS is in place on the cloud side)14:47
kashyapExactly; that's the peril here  w/ nested probs ... not knowing the host OS :-(14:51
clarkbit isn't just a reliability problem; some of our providers don't expose nested virt at all. So it is a two tier problem. In some places we cannot use kvm, and in others we may but with the risk of poor reliability14:59
clarkbkashyap: on the x86_64-v2 thing (^ related because qemu) does qemu not provide a predefined compatible cpu?15:00
kashyapclarkb: Hiya15:00
clarkbseems like it should? I remember when we first set up nova live migration testing one of the things I tried was defining a custom cpu since we have heterogenous resources in the clouds we use (and have no way to request consistent cpus)15:00
clarkbbut that didn't work at all which is how we ended up using the qemu64 model15:01
kashyapYeah, I know :-(15:01
clarkbI want to say at the time we identified bugs in nova and friends and they were getting fixed, so it is possible this is no longer an issue today, but I think that if centos is saying you need this minimum cpu now it is reasonable for there to be that cpu predefined in qemu15:01
kashyapclarkb: Sadly, QEMU does not provide a model that will work on (a) TCG, i.e. plain emulation, *and* KVM; and (b) that works on Intel and AMD15:01
kashyapclarkb: But!15:01
kashyapThere is an option I just tested.  And this works:15:02
kashyapclarkb: Using "Nehalem" satisfies both the above conditions.15:02
kashyapAnd it works on both Intel and AMD.15:02
kashyapNehalem is the oldest compatible model that works with the x86_64-v215:02
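For reference on kashyap's point: x86-64-v2 adds, among others, the CPU flags below on top of plain x86-64, and Nehalem is the oldest Intel model covering them. A quick host-side check against the "flags" line of /proc/cpuinfo might look like this (the flag set is a partial, illustrative list, not the full spec):

```python
# Subset of the extra flags the x86-64-v2 microarchitecture level
# requires over baseline x86-64 (partial list, for illustration).
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"}

def supports_x86_64_v2(flags_line: str) -> bool:
    """Check a /proc/cpuinfo 'flags' value against the v2 subset above."""
    return V2_FLAGS <= set(flags_line.split())

print(supports_x86_64_v2("fpu cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3"))  # → True
print(supports_x86_64_v2("fpu sse sse2 pni"))  # → False
```

This is the kind of test that would explain clarkb's worry below: if a cloud's virtual CPU model omits any of these flags, a Nehalem guest model (or an x86-64-v2 distro) will not boot there.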
clarkbgot it so there is a predefined model that addresses things. That is great news. I guess we should consider defaulting devstack to that then?15:03
kashyapYes!  Indeed15:03
clarkbI can try pushing a devstack change later today that does that15:03
kashyapclarkb: I was going to write an email to openstack-discuss list with this recommendation15:03
clarkbkashyap: that would be great, thank you15:03
kashyapclarkb: I'm drafting as we speak.  So your timing couldn't be perfect :-)15:03
kashyaps/be perfect/be more perfect/15:04
kashyapclarkb: The only assumption, which I think you'll agree is reasonable is:15:04
kashyapclarkb: ... I imagine any hardware older than Nehalem is not capable of running OpenStack.15:04
kashyapSay "yes", please :D15:05
opendevreviewBalazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests  https://review.opendev.org/c/openstack/nova/+/81501715:05
clarkbkashyap: well thats the next thing we need to figure out because we get virtual resources with their own cpu models being managed. Currently we can't boot fedora-34 in half of our clouds and are wondering if this is the same issue15:05
clarkbkashyap: it isn't just real hardware to consider but also virtual hardware15:06
kashyapI don't know if F34 switched to the -v2, lemme check15:06
kashyapclarkb: Wait, pretty sure F34 is switched to it too15:07
kashyapHow do I know?  By inference :D RHEL9 is based on F33/F34, so I presume it did too15:08
kashyapI just need to double-check15:08
clarkbkashyap: ya and half our clouds can't boot it15:08
clarkbso there is possibility that we can't actually switch to Nehalem, but we can make a change and test it15:09
opendevreviewBalazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests  https://review.opendev.org/c/openstack/nova/+/81501715:10
kashyapclarkb: Yeah, I'm actually quickly pushing a test-only patch to DevStack to see where it fails.  If you haven't already done so15:19
clarkbkashyap: remote:   https://review.opendev.org/c/openstack/devstack/+/815020 Use Nehalem CPU model by default I just pushed that to have the CI system check things. But thats because I'm in meetings and don't have a current devstack install anywhere to check with15:19
kashyapAh-ha, thank you15:20
kashyapclarkb: I don't have it either right now.  But:15:20
kashyapclarkb: I have tested today outside of DevStack: Nehalem model works with virt_type=qemu *and* with Intel and AMD15:20
clarkbkashyap: ok, ya I suspect the question becomes whether or not the CPUs we get in the host VMs are new enough to support Nehalem15:21
clarkbsince they too get virtual capabilities to enable live migration in clouds and if they were too conservative we'll have problems. The good news if there is any is that they won't be able to boot centos/rhel 9 either so clouds seem likely to update them15:22
clarkbwe can recheck 815020 if it generally works to get it to run across a bunch of clouds and double check that15:22
kashyapclarkb: Nehalem is nearly 13 years old ... I hope there are no such hosts :15:24
clarkbkashyap: the issue is the clouds provide virtual CPUs with custom models too15:26
clarkbbecause they want to do live migration. And if they were too conservative you have this problem. I don't think the actual CPUs are that old15:26
clarkbbut the CPUs we get in our instances may be15:26
kashyapRight; I see what you mean.  Even if they are conservative w/ the virtual CPUs, I'd be really surprised (wouldn't be the first time) if they're more conservative than Nehalem15:27
kashyapclarkb: Bad me.  I was _wrong_ earlier on my stupid "inference" about Fedora: Fedora *did not* switch the baseline ABI to x86-64-v215:29
kashyapSo we can rule that out15:29
clarkbok good, our problems for booting are different then :)15:30
kashyapYeah; grr, now to find out the actual cause15:32
kashyapDo you have the boot console log?15:32
clarkbkashyap: not right now. It is something I can probably dig into tomorrow morning if you want to dive into it (have ptg stuff now then real world school meeting stuff after and by the time I'm done you should be enjoying your evening)15:33
clarkbkashyap: also ianw is interested in the fedora 34 issue and his timezone overlap might be better? I guess it depends on how much of a morning person you are :)15:34
kashyapNo problem at all15:34
kashyapGo handle what you need to.  This can wait.  And yes, will check with ianw.  I'm in CEST; he's in Australia...so there should be some overlap :)15:34
clarkbsounds good, thanks again!15:35
*** efried1 is now known as efried15:46
gibistephenfin, artom: another way to test is to move heat from novaclient to sdk ;)15:56
artomgibi, you're an evil, evil man15:56
stephenfingibi: or tempest. That uses novaclient under the hood, I assume rather than subprocessing to the shell?15:56
artomstephenfin, no, tempest reimplements all the client stuff from scratch in Python in-tree15:56
artomBy design15:56
stephenfinoh, so it does15:56
artomAll I was saying is - we need to make https://opendev.org/openstack/openstacksdk/src/branch/master/.zuul.yaml#L116 work with 2 nodes so we can test how sdk does the client-side stuff for migrations and such15:58
sean-k-mooneyby the way when i was mentioning tempest i meant recreate the scenario test using sdk in the sdk func tests15:58
gibiartom: neutron just realized that they have to do the same switch in heat 15:58
toskybut in general for non testing stuff gibi's proposed solution is correct (assuming the sdk implementation of orchestration is complete enough)15:58
sean-k-mooneygibi: well heat needs to do it rather than neutron but im sure it will be the same people in either case15:59
gibisean-k-mooney: except if neutron people want heat support for new neutron features where the only client support is in sdk :)15:59
gibianyhow starting the transition does not seem scary https://review.opendev.org/c/openstack/heat/+/81342516:00
sean-k-mooneygibi: but do they want that support :)16:00
stephenfingibi: artom: I hate to be that guy but we also need to move our internal use of cinderclient, glanceclient and neutronclient to SDK at some point :)16:00
stephenfinI think ironicclient was done a few cycles ago16:00
sean-k-mooneygibi: you are assuming neutron care about heat16:00
sean-k-mooneythey proably do but maybe not16:01
stephenfinwait, no, there's still ironicclient. It's just not a mandatory import16:01
stephenfin*requirements16:01
artomstephenfin, yeah...16:01
artomLike, I'm taking part in this debate, but who am I kidding, I won't be the one doing the work - I tried to start, then just ran out of steam16:01
gibistephenfin: ouch, you are right :)16:02
artomOTOH, what's kind of annoying is that the problem that I'm fixing in https://review.opendev.org/c/openstack/openstacksdk/+/741688 is still a problem.16:02
artomimage is *still* optional, when it isn't in the PAI16:02
artom*API16:02
artomAnd it's *still* missing the kwargs things for all the other params that are in the API16:03
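A sketch of the validation artom's patch is about, written as a plain function over the create-server request body. The function name and shape are made up for illustration; the real fix lives in the SDK's create_server. The underlying rule is from the compute API: an image ref is required unless the server boots from a volume.

```python
# Hypothetical stand-in for the SDK-side check artom describes:
# the compute API requires an image ref unless booting from volume,
# but the SDK currently lets callers omit it silently.
def validate_create_server(payload: dict) -> None:
    has_image = bool(payload.get("imageRef"))
    boot_from_volume = any(
        bdm.get("boot_index") in (0, "0")
        for bdm in payload.get("block_device_mapping_v2", []))
    if not (has_image or boot_from_volume):
        raise ValueError("imageRef is required unless booting from volume")

validate_create_server({"imageRef": "f3e4..."})        # accepted
validate_create_server(
    {"block_device_mapping_v2": [{"boot_index": 0}]})  # accepted
```

Failing fast on the client side like this gives a clear error instead of the 400 the API would eventually return.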
stephenfinartom: I should have looked at that. My bad :( If you got time to rebase it onto the feature/r1 branch (which will be merged into master soon enough) I'll review it in the AM16:04
artomstephenfin, I need to fix the unit test below it and rebase16:04
artomstephenfin, it's just such a slog, and I'm lazy and easily distracted16:04
stephenfinyou should try cocaine16:04
artomSpeaking from experience?16:04
stephenfinall the bankers I know swear by it16:04
artomExplains the state of the financial system16:05
gibihappy hours already?16:07
*** bhagyashris_ is now known as bhagyashris16:27
sean-k-mooneyam are we going to call it a day or do we want to do the pain points discussion 16:31
sean-k-mooneysince we got time back16:31
sean-k-mooneybauzas: ^16:31
bauzassean-k-mooney: I prefer to leave early16:31
sean-k-mooneyok 16:32
bauzassean-k-mooney: I feel we can get through all the agenda by tomorrow16:32
bauzaswe have 7 topics left16:33
sean-k-mooneywell we might but there are also tc sessions tomorrow16:33
sean-k-mooneyso we might not have quorum for the full day16:33
sean-k-mooneybut yes we likely can finish tomorrow16:33
bauzassean-k-mooney: you're right, we're constrained by the big TC RBAC thing16:34
bauzassean-k-mooney: but I feel we can postpone a few topics if we really need16:34
sean-k-mooneyas a last resort yes but in general we should try to avoid that16:35
bauzasagreed16:35
bauzasI'll do a timekeeping thing16:35
bauzasand try to not exceed 30 mins per topic16:36
sean-k-mooneyok im going to step away for a few minutes. i was still using my wired headset today since i did not find my wireless one this morning so i have a slight headache anyway.16:37
sean-k-mooneyit went away after 20 mins of not wearing it yesterday so hopefully the same will happen today.16:38
bauzassean-k-mooney: yeah that's also why I wanted to stop earlier16:38
bauzaswe were not in the room16:38
bauzasasking people to rejoin was an effort16:38
bauzasso it would have meant 1 topic to discuss16:38
sean-k-mooneyyep getting momentum back is hard16:39
bauzassean-k-mooney: I see the TC discussion around RBAC occuring at 1:30pm until 3pm16:41
bauzassean-k-mooney: I accordingly flipped topics in the agenda16:41
sean-k-mooneyack16:42
opendevreviewBalazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81341917:09
gibiartom: I've fixed up your comments ^^17:15
artomgibi, *looks* you have some more asserts in the test that are... unrelated? Like I don't know how anal we want to be about this, but it's really only the last one we care about17:17
gibiartom: I can drop the other asserts17:19
gibiI'm also not sure about our strategy in these tests17:19
gibiI admit I copied a previous test and modified that, hence the big scope17:19
artomOur unit tests are overly tied to the implementation and confusing? Say it ain't so ;)17:19
gibiI don't like our unit tests either :)17:20
artomgibi, yeah, I figured that was the case :) I think in this situation, with the code being what it is, what you have is OK17:20
artomWell, minus the extraneous asserts17:20
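To illustrate the review point with a made-up example (FakeDriver and its methods are inventions for this sketch, not nova code): keep only the assert that covers the behaviour under review, instead of re-asserting setup details copied from a neighbouring test.

```python
from unittest import mock

class FakeDriver:
    """Toy driver standing in for the code under review."""
    def __init__(self, virtapi):
        self.virtapi = virtapi

    def reboot(self, instance):
        # the behaviour the patch adds: wait for the vif-plugged
        # event during reboot (guarded by a workaround flag in the
        # real change; omitted here for brevity)
        self.virtapi.wait_for_instance_event(instance, ["vif-plugged"])

# A focused test: one action, one assert on the new behaviour.
virtapi = mock.Mock()
FakeDriver(virtapi).reboot("inst-1")
virtapi.wait_for_instance_event.assert_called_once_with(
    "inst-1", ["vif-plugged"])
print("ok")  # → ok
```

Extraneous asserts on unrelated calls make a test fail for reasons that have nothing to do with the behaviour it names, which is exactly the coupling to implementation gibi and sean-k-mooney complain about below.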
opendevreviewBalazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81341917:29
gibiartom: ^^17:30
artomgibi, cool, thanks for your patience :)17:32
gibiartom: no worries. I do want to have nice unit tests so at least let's have the new ones nicer17:32
sean-k-mooneyi think the unit tests we write as small local tests are nice17:35
sean-k-mooneybut some of them are closer to functional tests than unit17:36
sean-k-mooneyin that they test the behavior of things that are down several calls.17:36
gibiyeah17:37
gibiI finished for today. See you tomorrow17:37
gibio/17:37
sean-k-mooneyo/17:38
*** mdbooth1 is now known as mdbooth21:50
rm_workhey, was there a specific reason that properties/metadata isn't something you can filter by in a server list?23:50
rm_workor, would that be a patch you might accept?23:54
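As rm_work's question implies, the server-list API does not offer a metadata/properties filter, so callers usually filter client-side over the results of GET /servers/detail. A minimal sketch over plain dicts shaped like those results (the helper name is made up):

```python
# Client-side filtering of server records by a metadata key/value,
# as a workaround for the missing server-list filter.
def servers_with_metadata(servers, key, value):
    """servers: iterable of dicts shaped like GET /servers/detail entries."""
    return [s for s in servers
            if s.get("metadata", {}).get(key) == value]

servers = [
    {"name": "web-1", "metadata": {"env": "prod"}},
    {"name": "web-2", "metadata": {"env": "dev"}},
]
print([s["name"] for s in servers_with_metadata(servers, "env", "prod")])  # → ['web-1']
```

The obvious downside, and presumably the motivation for the patch rm_work offers, is that this pulls the full server list across the wire before discarding most of it.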

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!