brinzhang | bauzas: hi, about the suspend/resume feature for accelerator instances: there is no spec. We previously completed rebuild/evacuate and shelve/unshelve, and I just want to continue that work | 00:22 |
---|---|---|
brinzhang | bauzas: I am not sure whether that time (1pm) today works for me, but if possible I will join, thanks | 00:24 |
*** carloss_ is now known as carloss | 01:09 | |
*** ministry is now known as __ministry | 04:23 | |
*** bhagyashris_ is now known as bhagyashris | 07:46 | |
bauzas | brinzhang: OK, then we'll start with your topic, but in case you're not around, I'll punt it until tomorrow | 08:29 |
gibi | morning Nova | 08:30 |
sean-k-mooney | for what it's worth i think it makes sense to proceed with suspend/resume for cyborg without a spec, in the same way i hope to do for vdpa | 10:09 |
sean-k-mooney | it's just the completion of the existing work, which does not really require much design work, so i think a specless blueprint and code review should be sufficient in both cases | 10:10 |
gibi | I'm ok with a specless bp too | 10:18 |
bauzas | reminder : nova sessions start in 50 mins on https://www.openstack.org/ptg/rooms/newton | 12:11 |
bauzas | sean-k-mooney: gibi: yup I agree with both of you but given brinzhang asked to discuss for it, let's wait until we do | 12:15 |
gibi | sure | 12:31 |
tbarron | bauzas: do you have a window (however large) when you are targeting the virtios topic? manila folks plan to discuss at 23:15 this morning (I know Lee is out of course) | 12:37 |
frickler | with virt_type=qemu on qemu>=5.2 (bullseye, jammy, centos-stream-8) instances (i.e. rss of the qemu process) seem to use 3x as much ram as the flavor allocates, has anyone seen this behaviour before? it is causing failures in devstack CI with things just going OOM | 12:37 |
tbarron | bauzas: 13:15 :) | 12:39 |
tbarron | which isn't morning for lots of people, I know | 12:39 |
* tbarron is waking up | 12:39 | |
sean-k-mooney | frickler: no, but i do know that in qemu 6.0 the -m attribute is no longer required when using mem-backing args | 12:43 |
sean-k-mooney | frickler: so in 5.2, where we were seeing libvirt use both ways, that was still required | 12:44 |
sean-k-mooney | frickler: actually that change is in 6.1 https://wiki.qemu.org/ChangeLog/6.1#Memory_backends | 12:45 |
sean-k-mooney | frickler: so the qemu command line generated from the libvirt xml that i looked at before was, i think, correct, so this sounds like a qemu bug | 12:46 |
bauzas | tbarron: others, sorry was taking a bit of timeoff before we start | 12:57 |
frickler | sean-k-mooney: do you have some contact at qemu to push that to? | 12:57 |
bauzas | tbarron: not sure we'll have time to discuss about the Manila spec today | 12:57 |
bauzas | tbarron: we can start to discuss it for tomorrow 1pm (or later if you prefer) | 12:58 |
sean-k-mooney | frickler: good question. kashyap: do you know who could help root-cause this? | 12:58 |
kashyap | What's the context? /me catches up. (In a meeting now) | 12:59 |
sean-k-mooney | kashyap: context is that qemu on debian seems to be using 2-3x the ram that is allocated to the vm | 12:59 |
kashyap | What version of QEMU? | 12:59 |
sean-k-mooney | 5.2 | 12:59 |
kashyap | Fairly recent (Dec 2020) | 12:59 |
*** redrobot is now known as Guest3656 | 12:59 | |
sean-k-mooney | kashyap: we are seeing this result in OOM issues in the ci, hence frickler's concern | 12:59 |
tbarron | bauzas: *bit* is right :D. 1300UTC tomorrow would be great, manila sessions don't start till 1400. | 12:59 |
stephenfin | I'll be 20/30 minutes late to the PTG sessions again today | 13:00 |
kashyap | frickler: Very odd; is this something sudden? I can check w/ the upstream, but before that, I'd need a bug (with proper details) | 13:00 |
kashyap | frickler: Oh, wait | 13:01 |
kashyap | frickler: virt_type=qemu is already deadly slow: as you might know it's not using any hardware acceleration | 13:01 |
kashyap | frickler: What is the host OS? (I'm assuming this is a nested env; i.e. the "host" is a level-1 guest. And the baremetal is some cloud-vendor provided) | 13:02 |
bauzas | nova sessions start by now, people can join with https://www.openstack.org/ptg/rooms/newton | 13:02 |
kashyap | frickler: I'd need these, to start with: QEMU version and the complete QEMU command-line of the guest. You'll find it here: /var/log/libvirt/qemu/instance-yyyyyyyy.log | 13:02 |
* kashyap bbiab; need air, been in back-to-back meetings elsewhere | 13:04 | |
bauzas | tbarron: okay then let's discuss this at 1300UTC tomorrow | 13:04 |
bauzas | sean-k-mooney: joing | 13:05 |
bauzas | joining ? | 13:05 |
sean-k-mooney | yes sorry be right there | 13:06 |
bauzas | stephenfin: are you able to join us ? | 13:07 |
bauzas | stephenfin: we would discuss about your topics next | 13:07 |
stephenfin | bauzas: in 10 minutes | 13:24 |
bauzas | dansmith: fyi, we're discussing the healthcheck proposal (as you provided comments in there) | 14:16 |
dansmith | bauzas: arg, okay, in the tc right now | 14:18 |
bauzas | dansmith: I guess we collected your thoughts | 14:18 |
dansmith | bauzas: I can be there in a few I think | 14:20 |
bauzas | ok we'll continue to discuss the design before you join | 14:20 |
bauzas | dansmith: tl;dr: | 14:32 |
bauzas | 1/ discuss a library like oslo.healthcheck providing the HT knobs | 14:32 |
bauzas | 2/ having a global cache object knowing the state of the service | 14:32 |
bauzas | 3/ (I can't remember it, sean-k-mooney ? ) | 14:33 |
bauzas | oh, the API thing | 14:33 |
sean-k-mooney | 3 was to allow /healthcheck at the api to call the health check for that api process | 14:34 |
bauzas | yeah, I remembered it :) | 14:35 |
frickler | kashyap: https://paste.opendev.org/show/810150/ we know that qemu is slow, but we cannot use kvm reliably in CI. happening for different host OSes (i.e. what devstack runs in, no idea what host OS is in place on the cloud side) | 14:47 |
kashyap | Exactly; that's the peril here w/ nested probs ... not knowing the host OS :-( | 14:51 |
clarkb | it isn't just a reliability problem; some of our providers don't expose nested virt at all. So it is a two tier problem. In some places we cannot use kvm. And in others we may, but with the risk of poor reliability | 14:59 |
clarkb | kashyap: on the x86_64-v2 thing (^ related because qemu) does qemu not provide a predefined compatible cpu? | 15:00 |
kashyap | clarkb: Hiya | 15:00 |
clarkb | seems like it should? I remember when we first set up nova live migration testing one of the things I tried was defining a custom cpu since we have heterogenous resources in the clouds we use (and have no way to request consistent cpus) | 15:00 |
clarkb | but that didn't work at all, which is how we ended up using the qemu64 model | 15:01 |
kashyap | Yeah, I know :-( | 15:01 |
clarkb | I want to say at the time we identified bugs in nova and friends and they were getting fixed so it is possible this is no longer an issue today but I think that if centos is saying that you need this minimum cpu now it is reasonable for there to be that cpu predefined in qemu | 15:01 |
kashyap | clarkb: Sadly, QEMU does not provide a model that will work on (a) TCG, i.e. plain emulation, *and* KVM; and (b) that works on Intel and AMD | 15:01 |
kashyap | clarkb: But! | 15:01 |
kashyap | There is an option I just tested. And this works: | 15:02 |
kashyap | clarkb: Using "Nehalem" satisfies both the above conditions. | 15:02 |
kashyap | And it works on both Intel and AMD. | 15:02 |
kashyap | Nehalem is the oldest compatible model that works with x86_64-v2 | 15:02 |
clarkb | got it so there is a predefined model that addresses things. That is great news. I guess we should consider defaulting devstack to that then? | 15:03 |
kashyap | Yes! Indeed | 15:03 |
clarkb | I can try pushing a devstack change later today that does that | 15:03 |
kashyap | clarkb: I was going to write an email to openstack-discuss list with this recommendation | 15:03 |
clarkb | kashyap: that would be great, thank you | 15:03 |
kashyap | clarkb: I'm drafting as we speak. So your timing couldn't be perfect :-) | 15:03 |
kashyap | s/be perfect/be more perfect/ | 15:04 |
kashyap | clarkb: The only assumption, which I think you'll agree is reasonable is: | 15:04 |
kashyap | clarkb: ... I imagine any hardware older than Nehalem is not capable of running OpenStack. | 15:04 |
kashyap | Say "yes", please :D | 15:05 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests https://review.opendev.org/c/openstack/nova/+/815017 | 15:05 |
clarkb | kashyap: well thats the next thing we need to figure out because we get virtual resources with their own cpu models being managed. Currently we can't boot fedora-34 in half of our clouds and are wondering if this is the same issue | 15:05 |
clarkb | kashyap: it isn't just real hardware to consider but also virtual hardware | 15:06 |
kashyap | I don't know if F34 switched to the -v2, lemme check | 15:06 |
kashyap | clarkb: Wait, pretty sure F34 is switched to it too | 15:07 |
kashyap | How do I know? By inference :D RHEL9 is based on F33/F34, so I presume it did too | 15:08 |
kashyap | I just need to double-check | 15:08 |
clarkb | kashyap: ya and half our clouds can't boot it | 15:08 |
clarkb | so there is possibility that we can't actually switch to Nehalem, but we can make a change and test it | 15:09 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Prevent leaked greenlets to interact with later tests https://review.opendev.org/c/openstack/nova/+/815017 | 15:10 |
kashyap | clarkb: Yeah, I'm actually quickly pushing a test-only patch to DevStack to see where it fails. If you haven't already done so | 15:19 |
clarkb | kashyap: remote: https://review.opendev.org/c/openstack/devstack/+/815020 Use Nehalem CPU model by default I just pushed that to have the CI system check things. But thats because I'm in meetings and don't have a current devstack install anywhere to check with | 15:19 |
kashyap | Ah-ha, thank you | 15:20 |
kashyap | clarkb: I don't have it either right now. But: | 15:20 |
kashyap | clarkb: I have tested today outside of DevStack: Nehalem model works with virt_type=qemu *and* with Intel and AMD | 15:20 |
clarkb | kashyap: ok, ya I suspect the question becomes whether or not the CPUs we get in the host VMs are new enough to support Nehalem | 15:21 |
clarkb | since they too get virtual capabilities to enable live migration in clouds, and if they were too conservative we'll have problems. The good news, if there is any, is that they won't be able to boot centos/rhel 9 either, so clouds seem likely to update them | 15:22 |
clarkb | we can recheck 815020 if it generally works to get it to run across a bunch of clouds and double check that | 15:22 |
kashyap | clarkb: Nehalem is nearly 13 years old ... I hope there are no such hosts : | 15:24 |
clarkb | kashyap: the issue is the clouds provide virtual CPUs with custom models too | 15:26 |
clarkb | because they want to do live migration. And if they were too conservative you have this problem. I don't think the actual CPUs are that old | 15:26 |
clarkb | but the CPUs we get in our instances may be | 15:26 |
kashyap | Right; I see what you mean. Even if they are conservative w/ the virtual CPUs, I'd be really surprised (wouldn't be the first time) if they're more conservative than Nehalem | 15:27 |
kashyap | clarkb: Bad me. I was _wrong_ earlier on my stupid "inference" about Fedora: Fedora *did not* switch the baseline ABI to x86-64-v2 | 15:29 |
kashyap | So we can rule that out | 15:29 |
clarkb | ok good, our problems for booting are different then :) | 15:30 |
kashyap | Yeah; grr, now to find out the actual cause | 15:32 |
kashyap | Do you have the boot console log? | 15:32 |
clarkb | kashyap: not right now. It is something I can probably dig into tomorrow morning if you want to dive into it (have ptg stuff now then real world school meeting stuff after and by the time I'm done you should be enjoying your evening) | 15:33 |
clarkb | kashyap: also ianw is interested in the fedora 34 issue and his timezone overlap might be better? I guess it depends on how much of a morning person you are :) | 15:34 |
kashyap | No problem at all | 15:34 |
kashyap | Go handle what you need to. This can wait. And yes, will check with ianw. I'm in CEST; he's in Australia...so there should be some overlap :) | 15:34 |
clarkb | sounds good, thanks again! | 15:35 |
*** efried1 is now known as efried | 15:46 | |
gibi | stephenfin, artom: another way to test is to move heat from novaclient to sdk ;) | 15:56 |
artom | gibi, you're an evil, evil man | 15:56 |
stephenfin | gibi: or tempest. That uses novaclient under the hood, I assume rather than subprocessing to the shell? | 15:56 |
artom | stephenfin, no, tempest reimplements all the client stuff from scratch in Python in-tree | 15:56 |
artom | By design | 15:56 |
stephenfin | oh, so it does | 15:56 |
artom | All I was saying is - we need to make https://opendev.org/openstack/openstacksdk/src/branch/master/.zuul.yaml#L116 work with 2 nodes so we can test how sdk does the client-side stuff for migrations and such | 15:58 |
sean-k-mooney | by the way, when i was mentioning tempest i meant recreate the scenario test using sdk in the sdk func test | 15:58 |
gibi | artom: neutron just realized that they have to do the same switch in heat | 15:58 |
tosky | but in general for non testing stuff gibi's proposed solution is correct (assuming the sdk implementation of orchestration is complete enough) | 15:58 |
sean-k-mooney | gibi: well, heat needs to do it rather than neutron, but i'm sure it will be the same people in either case | 15:59 |
gibi | sean-k-mooney: except if neutron people want heat support for new neutron features where the only client support is in sdk :) | 15:59 |
gibi | anyhow starting the transition does not seem scary https://review.opendev.org/c/openstack/heat/+/813425 | 16:00 |
sean-k-mooney | gibi: but do they want that support :) | 16:00 |
stephenfin | gibi: artom: I hate to be that guy but we also need to move our internal use of cinderclient, glanceclient and neutronclient to SDK at some point :) | 16:00 |
stephenfin | I think ironicclient was done a few cycles ago | 16:00 |
sean-k-mooney | gibi: you are assuming neutron care about heat | 16:00 |
sean-k-mooney | they probably do but maybe not | 16:01 |
stephenfin | wait, no, there's still ironicclient. It's just not a mandatory import | 16:01 |
stephenfin | *requirements | 16:01 |
artom | stephenfin, yeah... | 16:01 |
artom | Like, I'm taking part in this debate, but who am I kidding, I won't be the one doing the work - I tried to start, then just ran out of steam | 16:01 |
gibi | stephenfin: ouch, you are right :) | 16:02 |
artom | OTOH, what's kind of annoying is that the problem that I'm fixing in https://review.opendev.org/c/openstack/openstacksdk/+/741688 is still a problem. | 16:02 |
artom | image is *still* optional, when it isn't in the PAI | 16:02 |
artom | *API | 16:02 |
artom | And it's *still* missing the kwargs things for all the other params that are in the API | 16:03 |
stephenfin | artom: I should have looked at that. My bad :( If you got time to rebase it onto the feature/r1 branch (which will be merged into master soon enough) I'll review it in the AM | 16:04 |
artom | stephenfin, I need to fix the unit test below it and rebase | 16:04 |
artom | stephenfin, it's just such a slog, and I'm lazy and easily distracted | 16:04 |
stephenfin | you should try cocaine | 16:04 |
artom | Speaking from experience? | 16:04 |
stephenfin | all the bankers I know swear by it | 16:04 |
artom | Explains the state of the financial system | 16:05 |
gibi | happy hours already? | 16:07 |
*** bhagyashris_ is now known as bhagyashris | 16:27 | |
sean-k-mooney | um, are we going to call it a day or do we want to do the pain points discussion | 16:31 |
sean-k-mooney | since we got time back | 16:31 |
sean-k-mooney | bauzas: ^ | 16:31 |
bauzas | sean-k-mooney: I prefer to leave early | 16:31 |
sean-k-mooney | ok | 16:32 |
bauzas | sean-k-mooney: I feel we can make all the agenda by tomorrow | 16:32 |
bauzas | we have 7 topics left | 16:33 |
sean-k-mooney | well we might, but there are also tc sessions tomorrow | 16:33 |
sean-k-mooney | so we might not have quorum for the full day | 16:33 |
sean-k-mooney | but yes we likely can finish tomorrow | 16:33 |
bauzas | sean-k-mooney: you're right, we're constrained by the big TC RBAC thing | 16:34 |
bauzas | sean-k-mooney: but I feel we can postpone a few topics if we really need | 16:34 |
sean-k-mooney | as a last resort yes, but in general we should try to avoid that | 16:35 |
bauzas | agreed | 16:35 |
bauzas | I'll do a timekeeping thing | 16:35 |
bauzas | and try to not exceed 30 mins per topic | 16:36 |
sean-k-mooney | ok i'm going to step away for a few minutes. i was still using my wired headset today since i did not find my wireless one this morning, so i have a slight headache anyway. | 16:37 |
sean-k-mooney | it went away after 20 mins of not wearing it yesterday, so hopefully the same will happen today. | 16:38 |
bauzas | sean-k-mooney: yeah that's also why I wanted to stop earlier | 16:38 |
bauzas | we were not in the room | 16:38 |
bauzas | asking people to rejoin was an effort | 16:38 |
bauzas | so it would have meant 1 topic to discuss | 16:38 |
sean-k-mooney | yep getting momentum back is hard | 16:39 |
bauzas | sean-k-mooney: I see the TC discussion around RBAC occurring at 1:30pm until 3pm | 16:41 |
bauzas | sean-k-mooney: I accordingly flipped topics in the agenda | 16:41 |
sean-k-mooney | ack | 16:42 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot https://review.opendev.org/c/openstack/nova/+/813419 | 17:09 |
gibi | artom: I've fixed up your comments ^^ | 17:15 |
artom | gibi, *looks* you have some more asserts in the test that are... unrelated? Like I don't know how anal we want to be about this, but it's really only the last one we care about | 17:17 |
gibi | artom: I can drop the other asserts | 17:19 |
gibi | I'm also not sure about our strategy in these tests | 17:19 |
gibi | I admit I copied a previous test and modified that, hence the big scope | 17:19 |
artom | Our unit tests are overly tied to the implementation and confusing? Say it ain't so ;) | 17:19 |
gibi | I don't like our unit tests either :) | 17:20 |
artom | gibi, yeah, I figured that was the case :) I think in this situation, with the code being what it is, what you have is OK | 17:20 |
artom | Well, minus the extraneous asserts | 17:20 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add a WA flag waiting for vif-plugged event during reboot https://review.opendev.org/c/openstack/nova/+/813419 | 17:29 |
gibi | artom: ^^ | 17:30 |
artom | gibi, cool, thanks for your patience :) | 17:32 |
gibi | artom: no worries. I do want to have nice unit tests, so at least let's have the new ones nicer | 17:32 |
sean-k-mooney | i think the unit tests we write as small local tests are nice | 17:35 |
sean-k-mooney | but some of them are closer to functional tests than unit tests | 17:36 |
sean-k-mooney | in that they test the behavior of things that are down several calls. | 17:36 |
gibi | yeah | 17:37 |
gibi | I finished for today. See you tomorrow | 17:37 |
gibi | o/ | 17:37 |
sean-k-mooney | o/ | 17:38 |
*** mdbooth1 is now known as mdbooth | 21:50 | |
rm_work | hey, was there a specific reason that properties/metadata isn't something you can filter by in a server list? | 23:50 |
rm_work | or, would that be a patch you might accept? | 23:54 |
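
On rm_work's question about filtering a server list by metadata: the following is only a client-side sketch, assuming the servers have already been fetched and are held as plain dicts that each carry a `metadata` mapping. The helper name `filter_servers_by_metadata` and the sample data are hypothetical; this is not a Nova API filter and not part of any client library.

```python
from typing import Any, Dict, Iterable, List


def filter_servers_by_metadata(
    servers: Iterable[Dict[str, Any]],
    wanted: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Return the servers whose metadata contains every wanted key/value pair."""
    matches = []
    for server in servers:
        # Treat a missing or null metadata field as an empty mapping.
        metadata = server.get("metadata") or {}
        if all(metadata.get(key) == value for key, value in wanted.items()):
            matches.append(server)
    return matches


# Hypothetical sample data standing in for an already-fetched server list.
servers = [
    {"name": "web-1", "metadata": {"role": "web", "env": "prod"}},
    {"name": "db-1", "metadata": {"role": "db", "env": "prod"}},
    {"name": "scratch", "metadata": {}},
]

prod_web = filter_servers_by_metadata(servers, {"role": "web", "env": "prod"})
print([s["name"] for s in prod_web])
```

Filtering this way still transfers every server to the client first, which is presumably part of why a server-side filter is being asked about.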