Thursday, 2024-02-15

opendevreviewAmit Uniyal proposed openstack/nova master: Added context manager for instance lock  https://review.opendev.org/c/openstack/nova/+/87364805:43
opendevreviewAmit Uniyal proposed openstack/nova master: Separate OSError with ValueError  https://review.opendev.org/c/openstack/nova/+/90882505:43
opendevreviewAmit Uniyal proposed openstack/nova master: Disconnecting volume from the compute host  https://review.opendev.org/c/openstack/nova/+/87744605:43
opendevreviewAmit Uniyal proposed openstack/nova master: Removed explicit call to delete attachment  https://review.opendev.org/c/openstack/nova/+/89128905:43
opendevreviewAmit Uniyal proposed openstack/nova master: Separate OSError with ValueError  https://review.opendev.org/c/openstack/nova/+/90882508:47
opendevreviewAmit Uniyal proposed openstack/nova master: Disconnecting volume from the compute host  https://review.opendev.org/c/openstack/nova/+/87744608:47
opendevreviewAmit Uniyal proposed openstack/nova master: Removed explicit call to delete attachment  https://review.opendev.org/c/openstack/nova/+/89128908:48
tkajinamsean-k-mooney, hi. thanks for your feedback in libvirt related patches ! I've replied to your comment in https://review.opendev.org/c/openstack/nova/+/908546/ so please check it when you have time08:49
tkajinamin short, swtpm_ioctl has been required since swtpm support was initially added to libvirt.08:50
opendevreviewMerged openstack/nova master: libvirt: Stop unconditionally enabling evmcs  https://review.opendev.org/c/openstack/nova/+/89977610:22
opendevreviewsean mooney proposed openstack/nova stable/2023.2: libvirt: Stop unconditionally enabling evmcs  https://review.opendev.org/c/openstack/nova/+/90908211:12
sean-k-mooney[m]elodilles:  gibi  ^ can we start backporting that now that it finally merged on master. this need to go to 2023.1 to fix windows on amd 11:13
sean-k-mooney[m]there is one other releated patch for live migration but i belive that is still not merged on master11:13
sean-k-mooney[m]it can be backported seperatly11:14
gibisean-k-mooney[m]: thanks for the backport, +211:16
artomsean-k-mooney[m], oh you're doing the enlightenments backports?11:20
sean-k-mooneyartom: well it merged os i just did it via the ui11:52
sean-k-mooneyill let you actully do the other one or respin it11:53
sean-k-mooneyartom: do you have the linke to the live migration patch by the way11:54
artomsean-k-mooney, https://review.opendev.org/c/openstack/nova/+/904183?usp=email11:54
artomsean-k-mooney, so I'm doing from bobcat to antelope, and the downstream cherry-picl?11:54
sean-k-mooneyartom: yes please11:55
sean-k-mooneyand can you backport the other one too11:55
opendevreviewTakashi Kajinami proposed openstack/os-traits master: Add traits for TPM models  https://review.opendev.org/c/openstack/os-traits/+/90910711:57
artomsean-k-mooney, it didn't merge yet11:57
artomBut maybe we don't care, because it has +A11:57
sean-k-mooneyyes i know. we do care for the cherry picked form line i just ment when its ready can you do it and ping me12:00
sean-k-mooneyif you want to prepare them local youc an but hopefuly it will merge later today12:00
elodillessean-k-mooney[m]: i'm not super confident about that backport, so left a question in my comment there12:45
sean-k-mooneyelodilles: ack, right now you cant boot windows vms on amd host becuase of a reguression intoduced by enableing that feature12:46
sean-k-mooneyso we really need to undo that12:46
sean-k-mooneylive migration is also broken for a simialr reason12:46
elodillessean-k-mooney: ACK i understand that, but what about with intel hosts? we might remove a feature that is already used by some operators?12:46
sean-k-mooneyyep that hsoudl only reduce performace in some edge cases where the enlightnemtn was used12:48
elodillesand yepp, probably to fix this to work both AMD and Intel hosts has higher prio, i'm just asking whether this is the right way12:48
sean-k-mooneyto be clear i would have consider this a criticla bug requireing a new RC if we had found it before the release12:48
sean-k-mooneyelodilles: so to reenable this for intel hosts we will need a trait and a new falvor/image property12:49
sean-k-mooneybecause it depend on the hardware vendor of the underlying host12:50
sean-k-mooneywe cant just enable this via the exissting OS_TYPE=windows12:50
sean-k-mooneyit will need to be requested explictly to safely enable and supprot live migration as we will have to schdule on it12:50
elodillesACK, then i also tend to agree on backporting this. but to be on the safe side, i let other stable cores to also share their opinions on the patch o:)12:51
sean-k-mooneyelodilles: what will happen with the backport is existing vms will keep the enlightnement until they are hard rebooted, cold migrated, shelved or rebuilt12:51
elodillesis this already a bug since 2023.1 Antelope? :-o12:55
sean-k-mooneyyep...12:56
sean-k-mooneyi was kind of shocked it took so long for it to be reported12:56
sean-k-mooneywell i gues it didnt take that long to be reported but it did take a long time to get this fixed12:57
sean-k-mooneyi moved it to high the second i realsised the impact and confirmed the limitation12:58
elodillesi guess not many operators have windows guests on AMD hosts :-o12:58
sean-k-mooneythe workaround is just dont set OS_TYPE=windows on the image12:59
sean-k-mooneybut that disables all the enlightenment12:59
sean-k-mooneyand that reduces performance a lot12:59
sean-k-mooneyhaving them eaislly doubles the guest performacne and more in some workloads12:59
sean-k-mooneynot that one specificaly jsut in general13:00
sean-k-mooneyso truning off all the windows enlightenments is really not a compeling choice13:00
sean-k-mooneyelodilles: https://bugs.launchpad.net/nova/+bug/2046549 is more shocking to me13:04
sean-k-mooneyelodilles: i would have expected operators to be screaming at use for breakign live migration with windows guests13:05
elodilles:-o13:05
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/904183 is the fix but that is still in ci13:06
elodillesso that will be also backported together with the above fix i guess then13:08
sean-k-mooneythat would be the intent yes13:09
sean-k-mooneythey can be backported seperatly13:09
sean-k-mooneybut both are a high prioryt i think to backprot13:09
sean-k-mooneyi think that will be compelling ot operators to have espeiclly those who have not upgraded to antelope yet13:10
elodillesyepp, probably the thing is that 'those' operators have noy yet upgraded (not even to 2023.1 Antelope :S)13:16
opendevreviewMerged openstack/nova master: libvirt: stop enabling hyperv feature reenlightenment  https://review.opendev.org/c/openstack/nova/+/90418313:21
*** jph5 is now known as jph13:45
sean-k-mooneyartom: ^ fyi that merged on master so you can start the backport of that too now14:00
opendevreviewMohammed Naser proposed openstack/nova stable/2023.2: libvirt: stop enabling hyperv feature reenlightenment  https://review.opendev.org/c/openstack/nova/+/90908614:20
sean-k-mooneyJayF: for that its worth we have mypy in some files in nova14:31
sean-k-mooneyJayF: so its not all or nothing it can be added gradully14:31
JayFI know, and I think it would be cool, but that email the point was more.. do something rather than telling other people to do something 😀14:32
sean-k-mooneywhen startging new files like for the healthcheck stuff i have inteionally starte with mypy enabeld14:32
sean-k-mooneyyep14:32
sean-k-mooneyi am kind of trying to stay out of that thread14:32
sean-k-mooneypartly because i have been burnt on tryign to do simiear things and i just dont have time right now14:32
sean-k-mooneyi.e. i would like to moderenize some things and pay down tech debt14:33
sean-k-mooneylike i really want to see the SQLA2.0 stuff done in manilla so we can move to it14:34
sean-k-mooneyor get the wsgi/pyproject.yaml stuff doen14:34
sean-k-mooneyjust time...14:34
sean-k-mooneyJayF: thanks for responding :)14:35
JayFI have a friend who works on a major python library who always says (joking) that they're about 100 developers working on infrastructure across all technology.14:36
JayFFinding the quote from that paper about OSS contribution that said 97% of value was created by 5% of people... It's weird to see the gist of your validated by research14:37
JayF**your joke 14:37
sean-k-mooneythe best jokes are based in reality...14:38
* sean-k-mooney like centos stream being as stable as rhel... looks at the libvirt package that was broken for 3 weeks14:38
sean-k-mooneyi was chatting to one of the developer workign on stading up the infra for centos 10 stream yeasterday and they actully suggested that instead of using centos stream a a rhel next proxy we should be using https://docs.fedoraproject.org/en-US/eln/14:40
sean-k-mooneyapprenetly fedora eln is what facebook and amazone use to build adn test there rhel deritivtes 14:41
sean-k-mooneyif we were usign fedroa eln + rocky/almalinux we would ahve better proxies for btoh rhel next and rhel current14:42
sean-k-mooneythen useing stream gives us today14:42
opendevreviewDoug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660"  https://review.opendev.org/c/openstack/nova/+/90912215:35
opendevreviewDoug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660"  https://review.opendev.org/c/openstack/nova/+/90912215:45
dougszuAs promised sean-k-mooney ^. It's entirely clear to me yet. Will keep digging.15:56
dougszu*not15:56
dansmithbauzas: I thought you said you were going to remove the dict and use the guest XMLs to determine persistence in this: https://review.opendev.org/c/openstack/nova/+/90420917:15
dansmithdid I misunderstand or are we still waiting for that revision/17:15
mnaserI'd appreciate reviews on https://review.opendev.org/c/openstack/nova/+/909082 + https://review.opendev.org/c/openstack/nova/+/909086 -- we'll need to cherry-pick it a bit further down so b enice to merge those in17:26
bauzasdansmith : I'm not super opinionated about that, I eventually thought that it wouldn't be a problem if we reserve the mdevs with the dict18:00
dansmithusing the xmls seems better to me, but I know it's more work18:01
dansmithI'm mostly asking because I was expecting to see that change and haven't18:01
bauzasWell, we already look at the XMLs18:01
bauzasSo it should be automatically find the paused guests 18:02
bauzasSo the dict shouldn't be needed18:02
bauzasbut again, in case libvirt would change it, I'm not sure it would continue to work18:03
bauzasI can test it if you want18:03
dansmithI think it would be better to not have a thing that can get out of sync, so if it's reasonable to PoC it, I think that's worth it18:04
sean-k-mooneymnaser: i tought you might be interested in those18:18
sean-k-mooneydansmith: bauzas so im very hesitent to addign even more reliance on the xmls18:18
sean-k-mooneyi have said this  in the past that i woudl not have merged the orginal vgpu code that used the xmls to track what mdevs where in use18:18
sean-k-mooneywith that said i get dans concern about it getting out of sync18:18
sean-k-mooneybauzas: the dict is needed because the pasued guest does not exist in pre-live-migrate18:20
sean-k-mooneyits only created by libvirt when we start the migration18:20
sean-k-mooneyso if we dont have the dict pre precreate the domain in pre live migrate on the dest we woudl have a rcace18:21
sean-k-mooneyso to remove the dict you need to add creating a domin18:21
sean-k-mooneyyou shoudl not create a paused doamin however18:21
sean-k-mooneyjsut define it and create the approate cleanup codded for revert18:22
sean-k-mooneyjust beause i strongly dislike what we do with the xml does not mean im -2 on replacing the dict with pre defineing a domian to reserve the mdev18:22
bauzassean-k-mooney: the target is automatically created by libvirt 18:22
bauzasand then paused18:23
sean-k-mooneyyes but only wehn we call migrate2URL18:23
bauzasdon't ask me how18:23
bauzassean-k-mooney: yup frow what I saw18:23
sean-k-mooneyit wont exist in pre-live migrate18:23
bauzasindeed18:23
sean-k-mooneyso you cant jsut remove the dict18:23
sean-k-mooneywithout intoducting a race condition18:23
bauzasokay, so your preference is to keep the dict ? fine by me, as I said I'm not opiniated18:24
bauzasthere are pros and cons18:24
sean-k-mooneyif you want to remove the dict based on dansmith's feedback then you need to define the domain on the dest instead18:24
bauzasnot sure it may work18:24
sean-k-mooneyim ok with using a domian xml to reserve it as long as we evenutally stop using the xmls for this in the long term18:24
sean-k-mooneybauzas: you can defien a domain without starting it18:24
bauzasmaybe the best would be to persist the mdevs in some internal state18:24
sean-k-mooneyyes it would18:24
sean-k-mooneybut out of scope of this cycle18:25
bauzasso, if dansmith accepts, let's keep it straight as it is18:25
sean-k-mooneywe could also tack on the use of a doamin as a patch on the end18:25
sean-k-mooneyagain depending on dansmith's prefernce to derisk things18:26
bauzassean-k-mooney: we would need to delete the domain as soon as we call migrate18:26
sean-k-mooneyno we would not18:26
sean-k-mooneylibvirt shoudl update it18:26
bauzasand there could be residues too18:26
sean-k-mooneythere would be yes which is why i said you would have to update revert18:26
dansmithI don't want to push us to define the domain before we migrate, I just thought bauzas said it was already there by the time we needed to account for the things18:26
bauzashonestly, the more I think is to keep the existing18:27
sean-k-mooneydansmith: its not we are creating the mdevs and returning the uuid in prelive migrate18:27
bauzasdansmith: sean-k-mooney had a concern if we were removing the patch as the domain is only created by libvirt once we call migrate18:27
dansmithpersonally I don't like that we're keeping an in-memory dict which could get out of sync if we have an mq blip, where restarting compute will just make it work again and for which the operator has basically no visibility18:27
dansmiththe logs help, but...18:27
sean-k-mooneythen we generate the domain on the souce after that18:27
bauzasthere is no easy path here18:28
bauzasas I said, pros and cons for each of the solutions18:28
sean-k-mooneydansmith: ya i would prefer to be trackign mdev in novas db. but i would also prefer to do that next cycle18:28
dansmithyeah clearly not something we can do in the short term18:28
bauzasI'm just saying that a dict that would be emptied as soon as we restart the compute is probably the simpliest approach18:28
sean-k-mooneyyep18:29
dansmithbauzas: yeah it just sucks to make it fragile and restarts to fix things18:29
dansmithbut sounds like that's the option for now18:29
bauzasdansmith: only if the MQ blip happens on abort, right?18:29
bauzasthat's only when you have a residue18:29
dansmithbauzas: any time between when we record it and would commit/clean18:30
dansmithjust saying, it feels fragile, like I've _been_ saying, but I know the argument for it18:30
bauzaswe record it when we pre-livemigrate18:30
bauzasthen we delete it as soon as the migration is complete or aborted18:30
dansmithI understand18:31
bauzasif we have a MQ blip in between, then the live-migration could be impacted as well18:31
bauzasI said "could" because the migration could be run on different NIC18:31
bauzasbut what's important is that the message saying "I'm done, please remove it" or "I was aborted, please remove it" needs to arrive18:32
bauzasdansmith: I think I proposed a periodic, I'll try to add it on top of my series18:33
* bauzas needs to leave, I gonna work on the periodic tomorrow18:36
dansmithack18:37
* bauzas just needs to consider whether we use a libvirt method which is periodically called or if we want to create a specific one 18:37
bauzasbut I think we have some reconciliation methods in libvirt for the guests 18:37
bauzas(I'm sure, at least for the guest states)18:37
sean-k-mooneybauzas: either add a new one or do this as part of update_avaiable_resouces18:46
sean-k-mooneyi dont think there are any other perodics that would make sense18:47
sean-k-mooneyupdate_avaiable_resouces takes a resouce tracker lock and already acocunts for migration ectra18:47
sean-k-mooneyso removign instances form the dict that are not on this host and dont have a migration assocated with the host would be ok i think18:47

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!