Wednesday, 2023-12-06

dpawlikmelwitt: hey, so we have configured retention to 14 days - https://opendev.org/openstack/ci-log-processing/src/branch/master/opensearch-config#create-ilm-index-lifecycle-management - but we have enough space to keep it for 30 days (until debug messages are sent to OpenSearch). Only the subunit index got a 7-day retention.07:33
dpawlikshould we increase?07:35
dpawlikand if you expected the result within 30 days, unfortunately it will only be within 14 days :| 07:37
dpawliklet me know if I should change the policy to 30 days :)07:38
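The 30-day change dpawlik is offering would be a small edit to the index lifecycle policy. A sketch of what such a policy could look like (assuming the OpenSearch ISM policy format; the state names and the description are illustrative, not the actual ci-log-processing config):

```json
{
  "policy": {
    "description": "Delete log indices after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          {"state_name": "delete", "conditions": {"min_index_age": "30d"}}
        ]
      },
      {
        "name": "delete",
        "actions": [{"delete": {}}],
        "transitions": []
      }
    ]
  }
}
```

Bumping retention would only mean changing `min_index_age` from `14d` to `30d` in the deployed policy.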
*** Guest8552 is now known as layer808:37
*** layer8 is now known as Guest939108:37
*** Guest9391 is now known as layer908:39
*** layer9 is now known as l808:45
bauzashappy spec review day everyone09:27
gibi /o\09:33
gibi(thanks for the reminder :D)09:33
opendevreviewStephen Finucane proposed openstack/nova master: Remove nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268609:51
opendevreviewStephen Finucane proposed openstack/nova master: Add new nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268709:51
opendevreviewStephen Finucane proposed openstack/nova master: setup: Remove pbr's wsgi_scripts  https://review.opendev.org/c/openstack/nova/+/90268809:51
bauzasgibi: I'll be honest, I'm currently fixing some problem I had with my machine for our company demo yesterday :)10:22
opendevreviewStephen Finucane proposed openstack/nova master: Add new nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268711:01
opendevreviewStephen Finucane proposed openstack/nova master: setup: Remove pbr's wsgi_scripts  https://review.opendev.org/c/openstack/nova/+/90268811:01
opendevreviewStephen Finucane proposed openstack/nova master: Add new nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268711:07
opendevreviewStephen Finucane proposed openstack/nova master: setup: Remove pbr's wsgi_scripts  https://review.opendev.org/c/openstack/nova/+/90268811:07
stephenfinsean-k-mooney: jfyi https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/5RQGZZMCKKHAVH4OLSJCLBQNIHYDPGZD/11:12
stephenfinI tested it locally. Works fine.11:13
opendevreviewMerged openstack/nova master: Use split kernel/initramfs Cirros UEC image by default in jobs  https://review.opendev.org/c/openstack/nova/+/90221711:13
sean-k-mooney[m]stephenfin: related, i was able to demonstrate that the current support is broken https://review.opendev.org/c/openstack/pbr/+/90072811:18
sean-k-mooney[m]in pbr11:18
sean-k-mooney[m]and i think i know how to fix it but ya i haven't had time to try and implement it since11:18
bauzassean-k-mooney: thanks for your review of my mdev live-mig spec, will look at your points a bit later 12:55
bauzassean-k-mooney: now, about my problem with the CONF object, I checked and basically the cleanups we have in the oslo.config fixture don't delete the dynamically created groups12:55
bauzasI just checked, we pass the same object between tests12:56
sean-k-mooney[m]ack so this is just a gap in the fixture12:56
bauzasfixing it would require our nova fixture to add some cleanup that would delete the dynamically generated groups12:56
sean-k-mooney[m]ya12:56
sean-k-mooney[m]i think that is fine12:56
sean-k-mooney[m]the fixture should be fully resetting the object12:57
bauzasyeah, so after https://github.com/openstack/nova/blob/09b651540f120ad30fa7127e87b27a6363044dcf/nova/tests/fixtures/conf.py#L69 we would add a addCleanup(devices.unregister_dynamic_opts)12:57
sean-k-mooney[m]am, ya i guess that works, that would do the cleanup at the end of the test12:57
bauzassean-k-mooney: I just checked and calling self.conf.reset is useless12:57
sean-k-mooney[m]but we could also just reassign an empty config object to the global12:58
bauzasthat's even already done by the oslo.config fixture itself https://github.com/openstack/oslo.config/blob/master/oslo_config/fixture.py#L4112:58
bauzasso there is no need to duplicate the useless call in the nova fixture12:58
sean-k-mooney[m]that's not what i meant12:58
sean-k-mooney[m]i meant actually constructing a new object and assigning it to the singleton12:58
bauzassean-k-mooney: yeah I thought about recreating a whole object12:58
bauzasbut I don't know how to make it cleanly12:59
sean-k-mooney[m]ack12:59
sean-k-mooney[m]lets try the self.addCleanup approch12:59
bauzasfor the sake of the series, I will remove https://review.opendev.org/c/openstack/nova/+/902084/3/nova/tests/unit/virt/libvirt/test_driver.py#26492 from the patch13:00
bauzasand I'll provide this test in a separate change that would also add the addcleanup fix for the fixture13:00
bauzasI maybe have an idea for recreating a new conf object but I need to test this13:02
bauzas(storing the original object at the beginning of setUp and then assigning it back to self.conf in addCleanup)13:03
bauzasshould work, provided oslo.config allows copying a CONF object13:03
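The cleanup bauzas and sean-k-mooney settle on can be sketched standalone. This is a minimal model of the leak, not nova's actual fixture code: `ConfigOpts`, `register_dynamic_opts` and `ConfFixture` here are simplified stand-ins for the oslo.config singleton and the nova conf fixture discussed above.

```python
class ConfigOpts:
    """Minimal stand-in for oslo.config's global CONF singleton."""

    def __init__(self):
        self._groups = {}  # group name -> opts, including dynamic groups

    def register_dynamic_opts(self, group, opts):
        self._groups[group] = opts

    def unregister_dynamic_opts(self, groups):
        for name in groups:
            self._groups.pop(name, None)


CONF = ConfigOpts()  # shared between tests, like nova's global CONF


class ConfFixture:
    """Models adding the missing cleanup: snapshot the groups at setUp
    and unregister anything dynamically registered after that point."""

    def __init__(self, conf):
        self.conf = conf
        self._cleanups = []

    def addCleanup(self, func):
        self._cleanups.append(func)

    def setUp(self):
        known = set(self.conf._groups)
        # groups registered by the test after this point get removed again
        self.addCleanup(lambda: self.conf.unregister_dynamic_opts(
            set(self.conf._groups) - known))

    def cleanUp(self):
        for func in reversed(self._cleanups):
            func()
```

A test that registers a dynamic group (e.g. an `mdev_*` device group) would then no longer leak it into the next test sharing the same CONF object.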
eanderssonAnyone that can take a look at this minor patch? It just fixes some missing unit test coverage that was being ignored by coverage due to the threading model https://review.opendev.org/c/openstack/nova/+/90011613:05
opendevreviewSylvain Bauza proposed openstack/nova master: vgpu: Allow device_addresses to not be set  https://review.opendev.org/c/openstack/nova/+/90208413:27
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: add coverage for registered dyn opts  https://review.opendev.org/c/openstack/nova/+/90277113:27
gibisean-k-mooney: I'm +2 on the PCI grouping spec. I think the direction there is good and we will anyhow need to touch the implementation to learn the gotchas of the current code13:47
gibisean-k-mooney: I also see we are on a similar page with the PXB spec about opt-in ness 13:48
* gibi stops reviewing specs now, if I need to re-review something the please ping me13:49
sean-k-mooneygibi: ok, i'm more or less ok with merging the pci grouping spec.14:17
sean-k-mooneyi was getting a bit burnt out with the reviews this morning so i wanted to take another skim over it after a break14:17
sean-k-mooneybut assuming i don't see anything on the second pass i'll upgrade to +2w and we can dig into the details more in the implementation review and update the spec accordingly if required14:18
bauzasI need to look at the specs14:23
bauzasETOOMANYTHINGSHERE14:24
bauzassean-k-mooney: do you have time to discuss about https://review.opendev.org/c/openstack/nova-specs/+/900636/comment/2948412c_75d00114/ ?14:28
sean-k-mooneyam, i technically have back-to-back meetings for the next 4 hours, but on irc yes14:30
sean-k-mooneygmeet etc., no14:31
sean-k-mooneybasically i would like to keep the conductor check minimal and just do the main compatibility check in pre_live_migrate14:31
sean-k-mooneyits what we do for sriov live migration https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/libvirt-neutron-sriov-livemigration.html#resource-claims14:33
sean-k-mooneywell not quite14:33
sean-k-mooneywe actually added an rpc in check_can_live_migrate_destination14:33
sean-k-mooneyto do the claim i guess14:33
sean-k-mooneyif you are going to add an rpc to do that i would be fine with doing it in check_can_live_migrate_destination14:34
sean-k-mooneybasically i want to minimise the number of rpc calls we need and reuse the ones we already have14:35
bauzassean-k-mooney: tbc, I tried to do like for other features14:36
opendevreviewMerged openstack/nova-specs master: resubmit per-process-healthchecks spec for 2024.1  https://review.opendev.org/c/openstack/nova-specs/+/89722514:36
bauzasI thought it would be nice to check the types by the conductor14:37
bauzasbecause operators might not use traits or something else14:37
bauzassay custom rcs14:37
sean-k-mooneywell if they are using traits or custom resources14:37
bauzaswe could do this in pre-live-migrate for sure, after the conductor14:37
sean-k-mooneythen the type is checked by placement14:37
bauzassean-k-mooney: yeah indeed, but that's optional14:38
sean-k-mooneyright, i think that should be required14:38
bauzasI don't think so14:38
sean-k-mooneyi know14:38
bauzassome operators just want to provide VGPUs14:38
sean-k-mooneyso i think the conductor check could be done but i'm worried it will be somewhat expensive14:38
bauzasgiven every new GPU has specific types14:38
bauzassean-k-mooney: shouldn't be that expensive as we already have the calls14:39
sean-k-mooneyso what i can do is try and review the code and that section again14:39
sean-k-mooneywhen i got to your spec i was starting to get review burnout14:39
bauzasno worries14:40
bauzasthis is somehow an implementation question14:40
bauzasI mean, testing the types is a design point14:40
bauzasbut the *where* can be something related to the implementation14:41
bauzasI can modify my spec accordingly14:41
sean-k-mooneyso you plan to extend https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L359-L36214:42
sean-k-mooneyif check_can_live_migrate_destination was passed the current mdev type and uuid tuple14:42
sean-k-mooneyand did the claim and populated it in migrate_data14:42
sean-k-mooneyi would be happy with that14:43
sean-k-mooneythat would be like how sriov works14:43
sean-k-mooneywhat i didn't really like about your proposal is it was only checking the types14:43
sean-k-mooneywe currently call into the driver here https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8537-L853914:45
opendevreviewMerged openstack/nova-specs master: Re-propose "Add maxphysaddr support for Libvirt" for 2024.1 Caracal  https://review.opendev.org/c/openstack/nova-specs/+/89513514:45
sean-k-mooneyso you could update the in memory dict and pass back the uuids there14:46
sean-k-mooneyfor sriov we do the claim here https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8563-L8565 but that is not in the driver specific section since the pci_tracker is driver independent14:47
bauzasyeah I saw your optimization point14:47
bauzasand yeah I saw for SRIOV14:47
sean-k-mooneyit's not just optimisation14:47
sean-k-mooneywhat i'm hoping to prevent is racing for the same mdev between boots and live migrations14:47
sean-k-mooneyso if we can atomically allocate the mdev that will be used in check_can_live_migrate_destination14:48
sean-k-mooneythen i think that helps prevent a subset of possible issues14:48
sean-k-mooneyperhaps i'm wrong but that was part of the motivation14:48
bauzasoh14:49
bauzasI see14:49
sean-k-mooneydoing only the type check here seems wasteful but if we also select the mdev here14:49
bauzasI need to remember 14:49
sean-k-mooneythen im all for it14:49
bauzasbecause I cared about the races indeed14:49
bauzasand I wanted to reserve the mdev as close as we claim14:49
sean-k-mooneyfor sriov it's slightly different as we have 3 states: available, claimed and allocated14:50
bauzasthat's why I added the 'claim' in post_claim_migrate_data()14:50
bauzasas it was literally just after the claim14:50
sean-k-mooneythe difference between the latter two is claimed means the vm is not running on the node yet and allocated is after it is14:50
bauzaswe could 'claim' those mdevs a bit earlier14:50
bauzasbut that means we would claim in check_can_live_migrate_destination()14:51
sean-k-mooneyyes14:51
sean-k-mooneyso i would like to do that14:51
bauzaswfm14:51
bauzasso we would return the tuple like you said14:52
bauzassounds good to me14:52
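The race sean-k-mooney wants to prevent comes down to making the mdev pick atomic on the destination host. A minimal sketch, assuming a hypothetical per-host claims helper (none of these names exist in nova; real nova would track this against libvirt's mdev inventory):

```python
import threading


class MdevClaims:
    """Hypothetical helper: atomically reserve a free mdev so a concurrent
    boot and an incoming live migration cannot both pick the same one."""

    def __init__(self, available):
        self._lock = threading.Lock()
        self._free = set(available)   # mdev uuids not yet claimed
        self._claimed = {}            # mdev uuid -> instance uuid

    def claim_for_migration(self, instance_uuid):
        # would be called from check_can_live_migrate_destination on the dest
        with self._lock:
            if not self._free:
                return None           # no free mdev on the dest: abort early
            mdev = self._free.pop()
            self._claimed[mdev] = instance_uuid
            return mdev

    def release(self, mdev):
        # on migration failure/rollback, return the mdev to the free pool
        with self._lock:
            self._claimed.pop(mdev, None)
            self._free.add(mdev)
```

Doing the pop-and-record under one lock is what makes the "claim in check_can_live_migrate_destination" idea close the boot-vs-migration window, instead of only checking types there.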
sean-k-mooneyok care to update that in the spec and im happy to review14:52
bauzasyeah, I'll try14:52
bauzasI'm also packed with meetings all day long :)14:52
sean-k-mooneythe one i was in finished early so i had a break14:52
bauzashappy you14:52
bauzasnext one sounds optional to me14:53
bauzasso I'll probably have time14:53
gibiIs it just me or do we not have vTPM tempest tests? Would it be possible to run a simple server create with vTPM in the gate?15:15
bauzascan't tell15:47
bauzasgibi: I guess we need to use barbican for this, right?15:48
bauzasso it would be a specific job 15:48
bauzasenabling both vTPM and barbican15:49
gibiright it needs barbican15:49
bauzashttps://github.com/openstack/nova/blob/master/.zuul.yaml#L897C37-L897C3715:49
bauzasgibi: ^15:50
bauzasI just see barbican but without vTPM https://opendev.org/openstack/barbican-tempest-plugin/src/branch/master/.zuul.yaml#L23-L5915:53
gibiOK, so our default jobs do not have barbican, but we have a separate job with barbican15:54
bauzascorrect, but we run barbican jobs on our check pipeline15:55
bauzashttps://zuul.opendev.org/t/openstack/builds?job_name=barbican-tempest-plugin-simple-crypto&skip=015:55
bauzas(n-v)15:56
bauzasbut I'm not able to see a single job definition setting vtpm 15:56
gibiI will try to put together a simple tempest test that boots a VM with vtpm, and put it behind a tempest feature flag. Then we can try to enable that flag in our barbican doc16:10
gibis/doc/job/16:10
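A flag-gated test like the one gibi proposes could look roughly like this. The `VTPM_ENABLED` flag and `FakeServersClient` are stand-ins (a real test would use tempest's `CONF` and servers client, and the flag gibi describes does not exist yet); `hw:tpm_version` and `hw:tpm_model` are the actual nova flavor extra specs for vTPM.

```python
import unittest

# stand-in for a tempest compute-feature-enabled flag; off by default so the
# test is skipped in jobs without barbican, as discussed above
VTPM_ENABLED = False


class FakeServersClient:
    """Stand-in for a compute servers client; echoes the create request."""

    def create_server(self, **kwargs):
        return {'server': {'id': 'fake-id', 'status': 'ACTIVE', **kwargs}}


class TestVTPMBoot(unittest.TestCase):

    @unittest.skipUnless(VTPM_ENABLED, 'vTPM feature flag disabled')
    def test_boot_server_with_vtpm(self):
        client = FakeServersClient()
        # flavor extra specs that request a TPM 2.0 device for the guest
        server = client.create_server(
            name='vtpm-guest',
            flavor_extra_specs={'hw:tpm_version': '2.0',
                                'hw:tpm_model': 'tpm-crb'})['server']
        self.assertEqual('ACTIVE', server['status'])
```

With the flag off the test skips cleanly, which is the behaviour wanted for jobs that don't deploy barbican.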
bauzassean-k-mooney: I now remember why I needed to do the claim in post_claim_migrate_data() and not in one lockstep by check_can_live_migrate_destination()16:32
bauzastl;dr: when we call check_can_live_migrate_destination(), we know neither the mdev used by the guest nor its type16:33
bauzaswe just have an instance object16:33
bauzasso we need this kind of tic-tac-toe between dest>src>dest16:34
bauzascall it TCP handshake :p16:34
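The dest>src>dest handshake bauzas describes, as a sketch (hypothetical function and method names; `migrate_data` is just a dict here, not nova's MigrateData object):

```python
def live_migration_handshake(src, dest, instance):
    # step 1: dest - we only have the instance object here; the mdev the
    # guest uses on the source, and its type, are still unknown
    migrate_data = dest.check_can_live_migrate_destination(instance)
    # step 2: src - the source fills in the mdev uuid/type actually in use
    migrate_data = src.check_can_live_migrate_source(instance, migrate_data)
    # step 3: dest again - only now can a matching mdev be claimed
    return dest.claim_mdevs(instance, migrate_data)
```

This is why the claim cannot simply fold into step 1: the information it needs only arrives in step 2.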
artomgibi, we have vTPM tests in whitebox16:43
artomgibi, example run: https://70d77d1b8efad238ddf5-24c1b4d0bad6f112f42872ab6905686e.ssl.cf5.rackcdn.com/824772/32/check/whitebox-devstack-multinode/2a8b9df/testr_results.html16:45
artom(expand the last line)16:45
gibinice16:45
gibithanks16:45
gibinow I just need to figure out if we have whitebox available in our tempest container used in openstack-k8s-operators testing...16:45
artomOut of the box? I suspect not, but hopefully they have the framework in place to add arbitrary tempest plugins16:46
gibiartom: we will see shortly  https://github.com/openstack-k8s-operators/openstack-operator/pull/589 :D16:56
sean-k-mooneybauzas: i see. that's less than ideal but if you can document it in the spec i guess i can live with that for now17:07
sean-k-mooneyi'm not really sure how to address that otherwise17:07
sean-k-mooneyi'm not sure we really need to do the mdev type check at the conductor level17:08
sean-k-mooneybut if you think it helps then ok17:08
bauzassean-k-mooney: I can write something indeed explaining more the ping-pong17:08
sean-k-mooneythanks17:08
bauzassean-k-mooney: we could just pass the source mdevs to the dest and then the dest would claim a type17:09
bauzasso basically removing the first item17:09
bauzasdest would claim a *mdev*17:09
sean-k-mooneyright, that's why i was originally suggesting just doing it in pre_live_migrate17:10
bauzaspre_live_migration is after the claim17:10
bauzasso I think we need to reserve the mdev in the conductor call to compute 17:10
sean-k-mooneybut we don't have the mdev types at that point right17:11
sean-k-mooneyfrom the source node17:11
sean-k-mooneywell i guess17:11
sean-k-mooneywe can claim them17:11
sean-k-mooneybut we won't detect if the types are different17:11
sean-k-mooneyuntil after we call the source node17:11
sean-k-mooneyso we will claim them on the dest17:11
sean-k-mooneythen pass the info to the source with the types17:11
sean-k-mooneyand it can check if they changed and abort17:12
bauzashum17:12
bauzasactually, this is not possible, damn17:13
bauzaswe call the destination first, then we call the source17:13
bauzasand then we claim17:13
sean-k-mooneyright but the call to the dest could do the claim17:17
sean-k-mooneythen pass to the source to confirm if the types match17:17
sean-k-mooneyactually no17:17
sean-k-mooneywe are stuck with the 3 steps i think17:17
sean-k-mooneylet's pick up this conversation tomorrow17:17
gibiartom: we have a chance https://github.com/openstack-k8s-operators/tcib/blob/4aef81881de941f0251aa848515243f666b18672/container-images/tcib/base/os/tempest/tempest-extras/tempest-extras.yaml#L417:32
artomOh wow17:32
artomjparker++17:32
artomI'm assuming he's to blame :D17:32
artom(Not literal `git blame`)17:33
gibialmost :) https://github.com/openstack-k8s-operators/tcib/commit/199178028bba73548e21df5171a620df7981b45117:34
artomJames was pulling the strings17:40
gibiindeed, he is still pulling it https://github.com/openstack-k8s-operators/tcib/pull/11017:41
artomHe's pull requesting them :D17:42
gibiexactly :D17:46
opendevreviewmelanie witt proposed openstack/nova master: Set UEC image vars for jobs not defined in Nova  https://review.opendev.org/c/openstack/nova/+/90280919:04
sean-k-mooneymelwitt: oh right i guess we should do that for those too19:15
sean-k-mooneyill try and take a look at ^ tomorrow19:17
sean-k-mooneyone thing we might want to consider is actually making this change in the base devstack job19:18
sean-k-mooneynova is really the only project that likely cares about testing the full disk image code path19:18
sean-k-mooneyso defaulting to the uec image might be worth considering in the tempest/devstack base job19:18
sean-k-mooneygmann: ^ we should probably consider that after we have run nova on them for a few weeks19:19
gmannsean-k-mooney: melwitt yeah, i am thinking about that; if not default, at least we can do that for most of the general integrated jobs in tempest and keep a few of them, for example tempest-full-py3, on the full image19:20
gmannyes, let's observe it in nova for a few weeks first19:21
melwittsean-k-mooney, gmann: I realized the above (jobs not defined in nova) bc I saw tempest-integrated-compute fail on the follow-up patch and it was the guest kernel panic. at first I was confused how/why that could be if we've got the UEC image, but then I realized it is _not_ using the image bc I had only set the vars for the jobs defined by us in our .zuul.yaml19:34
sean-k-mooneyyep i forgot about the other jobs too19:35
melwittsean-k-mooney: semi related to that, I had been meaning to ask you, why does using the UEC image work to avoid guest kernel panics?19:35
melwittjust a curiosity question19:36
opendevreviewMerged openstack/nova-specs master: Add spec for PCI Groups  https://review.opendev.org/c/openstack/nova-specs/+/89971919:40
sean-k-mooneymelwitt: i don't actually know. i suspect it's taking a slightly different path in both qemu and in the init shell script19:41
sean-k-mooneymelwitt: i know neutron had previously added apci=off or something like that19:41
sean-k-mooneyto disable the apic timer19:41
melwittah ok19:41
sean-k-mooneybut they don't actually do that any more19:41
sean-k-mooneybasically one of the panics suggested doing that, and when you use the split image19:42
sean-k-mooneynova supports passing kernel command line args to qemu19:42
sean-k-mooneyso originally they explored this just to follow the message output, which said add X to the command line to turn this off19:42
sean-k-mooneymelwitt: https://bugs.launchpad.net/openstack-gate/+bug/171316219:45
sean-k-mooneyKernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.19:45
sean-k-mooneyso ya they originally tried setting noapic on the kernel command line19:45
melwittohh I see19:45
sean-k-mooneyi "fixed" this a different way in devstack19:46
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#workarounds.libvirt_disable_apic19:46
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/766043 and https://github.com/openstack/devstack/commit/1e86a25cc28e34d7f73a4c6ccbbc3fc667598d5019:47
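The workaround sean-k-mooney links is a nova.conf setting; enabling it looks roughly like this (the option name matches the docs link above, but the comment is a paraphrase, not its official help text):

```ini
[workarounds]
# Tell the libvirt driver to disable the guest APIC, avoiding the
# cirros IO-APIC timer kernel panic seen in the gate
libvirt_disable_apic = True
```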
sean-k-mooneylee actually enabled it in devstack19:48
melwittoh, so using the split image seems to be a way to alleviate the problem without disabling the apic timer?19:48
sean-k-mooneyso i thought at one point we were adding the noapic thing somewhere19:48
sean-k-mooneyi just don't know where that is exactly19:49
sean-k-mooneymelwitt: but ya apparently the split image is enough, i just can't prove that19:49
melwittwell that's confusing. it's already been set to disabled but it was still panicking? 🤔 19:50
sean-k-mooneyso there are two things19:50
sean-k-mooneythe apic and the ioapic19:50
sean-k-mooneywe are disabling the apic19:50
melwittoh19:50
sean-k-mooneyhttps://bugs.launchpad.net/nova/+bug/193910819:51
sean-k-mooneyi'm not sure if i originally got those mixed up or not19:51
sean-k-mooneyin any case i think that's a slightly different kernel panic19:52
melwitt🤯 19:52
sean-k-mooneyas that specific one was an upstream bug in the cirros image kernel that was fixed in a later release19:52
sean-k-mooneybasically the specific panic in that bug should not happen in 6.x cirros images19:52
melwittI see19:53
sean-k-mooneyfor what it's worth it might be interesting to see if we can disable that workaround once we have moved to the split image19:55
sean-k-mooneywe are using cirros 6.x now so it should not be required19:56
melwittgotcha19:57
sean-k-mooneyso we could perhaps deprecate it this cycle and remove it in a future release19:57
melwittnone of the panics I have seen had the message that specifically names the APIC timer so that seems to track19:57
melwitt(from that bug)19:58
sean-k-mooneyya same19:58
sean-k-mooneyok i'm going to call it a day20:00
sean-k-mooneyi'll check back on your patches tomorrow o/20:00
melwittthanks sean-k-mooney, seeya tomorrow o/20:05
JayFsean-k-mooney: I know you're interested in getting sharding in this cycle; I'm trying to land stephen's chain here https://review.opendev.org/c/openstack/nova/+/659691/38 before re-landing sharding; if you want to take a look I think Ironic cores are happy (Ironic CI passes on the head of that chain)20:15
melwittdpawlik: hi, thanks for confirming! it was just a misunderstanding about swift log retention vs opensearch index retention. the swift log retention is 30 days but opensearch retains less. I think this is ok :)20:24
opendevreviewMerged openstack/nova master: Fix coverage issues with eventlet  https://review.opendev.org/c/openstack/nova/+/90011621:32
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key21:37

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!