Wednesday, 2023-02-08

opendevreviewMerged openstack/nova master: Abort startup if nodename conflict is detected  https://review.opendev.org/c/openstack/nova/+/87243201:14
opendevreviewMerged openstack/nova master: Stable compute uuid functional tests  https://review.opendev.org/c/openstack/nova/+/87244101:14
opendevreviewMaxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted  https://review.opendev.org/c/openstack/nova/+/87238505:58
*** blarnath is now known as d34dh0r5306:52
*** blarnath is now known as d34dh0r5307:02
opendevreviewsean mooney proposed openstack/nova master: introduce global greenpool  https://review.opendev.org/c/openstack/nova/+/87306107:19
opendevreviewsean mooney proposed openstack/nova master: introduce global greenpool  https://review.opendev.org/c/openstack/nova/+/87306107:30
opendevreviewsean mooney proposed openstack/nova master: introduce global greenpool  https://review.opendev.org/c/openstack/nova/+/87306107:39
opendevreviewsean mooney proposed openstack/nova master: introduce global greenpool  https://review.opendev.org/c/openstack/nova/+/87306107:52
opendevreviewPierre Libeau proposed openstack/nova master: Add mechanism to manage snapshot during nc init  https://review.opendev.org/c/openstack/nova/+/87306208:12
plibeau4hello guys, I have push https://review.opendev.org/c/openstack/nova/+/873062 to have your feedback before to start written some tests. I have explain the issue also in this bug: https://bugs.launchpad.net/nova/+bug/200655908:15
sean-k-mooneygibi: bauzas when ye are around take a look at https://review.opendev.org/c/openstack/nova/+/873061 and specificaly the failing tests here https://2f622fb6915ea6772a94-26db3adf591a82ec37c96a7d3180086f.ssl.cf2.rackcdn.com/873061/4/check/openstack-tox-py310/be4df4f/testr_results.html those are the unit tests that are leaking08:31
bauzassean-k-mooney: okay but fwiw the leaked tests for https://launchpad.net/bugs/194633908:35
sean-k-mooneyha ok i also know why08:35
bauzasare functional tests08:35
sean-k-mooneyhttps://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L49708:36
sean-k-mooneythe test is calling   h.initialize()08:36
sean-k-mooneywhich calls init_event which does   utils.spawn(self._dispatch_thread)08:37
sean-k-mooneybauzas: here are the functionla failures https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3e4/873061/4/check/nova-tox-functional-py310/3e45bc4/testr_results.html08:38
sean-k-mooneywe have many more of them08:38
bauzassean-k-mooney: we agreed yesterday on merging it after the RC1, right? 08:40
bauzasfor Bobcat I mean08:40
sean-k-mooneymaybe i created it because i tought it would find all the tests that were broken08:41
sean-k-mooneyi was more concenred with teh test fallout then any impact it woudl nova on nova in production08:43
sean-k-mooneythe code change is pretty low risk08:43
sean-k-mooneyfixign all the broken test could be a lot of work.08:44
sean-k-mooneybauzas: the libvirt event dispatch thread is curently while True08:45
sean-k-mooneyhttps://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L209-L21808:45
sean-k-mooneyto fix that we need a way to stop that thread for the functionl tests at least08:45
sean-k-mooneywell or we need to make sure its mocked out properly08:46
sean-k-mooneyhttps://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L607 is always creating 2 greenthreads that never exit if its called today08:48
sean-k-mooneyso the libvirt fixture need to be enhanced to stub that out08:49
bauzassean-k-mooney: the problem is that given we don't know which test is causing trouble, we won't be able to make sure your change can fix it08:49
sean-k-mooneymy patch show which tests are causing the issue08:49
bauzasI see08:50
sean-k-mooneybauzas: i added a fixture that detects when tests leak green threads https://review.opendev.org/c/openstack/nova/+/873061/4/nova/tests/fixtures/nova.py#113808:50
sean-k-mooneyand prints there name although that part currenlty has a race08:51
sean-k-mooneyfortunetly when we hit the races it still raise an exception and fails the test08:51
bauzasok, so then we could use your change for telling which functests leak out 08:52
sean-k-mooneyyep08:52
bauzasbut for the moment, we should only merge your change at the beginning of Bobcat08:53
sean-k-mooneyi suspect that much of the cases it in the common code08:53
bauzasjust because I want to have time to make sure that if we find some issues, it shouldn't be a time problem08:53
bauzas(I just want to be careful here)08:53
sean-k-mooneyso as i noted above. the host.py initalize currently spanw 2 greenthreads and that is call by the libvirt driver08:54
sean-k-mooneyi dont think that is stubbed by the the libvirt fixture08:55
sean-k-mooneyso today i think all libvirt functional tests are leaking at least 2 greenthreds.08:55
sean-k-mooneythe libvit event dispatcher thread and connection event thread08:55
bauzaslemme verify08:57
sean-k-mooneywe dont currently save the returned GT and they have a while true so currenlty there is no way to stop those but it would no be hard to add one08:57
bauzasat least we know the leaked thread is calling self.live_migration_abort()08:58
bauzasbut I think the leaked thread comes from a RPC call08:58
bauzasnot from a libvirt thread08:58
gibiI have some concerns of the pooling in general and I left comments there08:58
sean-k-mooneyack08:58
bauzashonestly, my concern is more about the time here08:59
gibibauzas, sean-k-mooney: I think the current pooling won't catch the RPC threads as that is created in oslo.messaging08:59
bauzasI see three cores at least looking at this bug while we only have 7 days for merging features08:59
sean-k-mooneygibi: yes it likely wont but those are not created directly by nova08:59
sean-k-mooneygibi: with that said i might be able to make it do that09:00
gibiI will keep rechecking bauzas'09:00
gibiI will keep rechecking bauzas's patch to catch a failure to see where it is coming from09:00
bauzasso, while I think it's important to have a better way to have green threads pooling, I'm just saying that we maybe should try to just find the issue and at least do other stuff09:00
bauzassean-k-mooney: see,that's a problem then09:00
sean-k-mooneybauzas: no its not09:00
bauzassean-k-mooney: as I said, I'm pretty sure that the threads that are leaked and create this libvirt exceptioin are RPC calls09:01
bauzaswhen I say a problem, I mean I'm not sure this change would help then09:01
sean-k-mooneyright but ye did not have a repoducer so i tried to create one and found a bunch of issue09:01
bauzashttps://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html09:01
sean-k-mooneyit may not fix the current issue but all the other tests if found may be flaky09:02
gibisean-k-mooney, bauzas: cool, we have two set of issues to solve then :)09:02
bauzasagreed09:02
bauzasand agreed with gibi09:02
sean-k-mooneyand if im wirte we are also constantly builiding up greenthreas as the test run09:02
bauzassean-k-mooneyI'm not saying "NO" to your change and thanks for having worked on it09:02
sean-k-mooneyyes09:02
sean-k-mooneytwo sets of issues09:02
bauzassean-k-mooney:I'm just saying that I'm afraid this won't help the functest CI failure we have atm09:03
bauzasanyway, I said loudly yesterday that I'll stop looking at the CI failures today and I'll rather review some changes09:04
bauzasos-vif and os-traits first, and then nova features09:04
sean-k-mooneythere arnt any we need for os-vif in this release09:04
bauzasand I'll continue to look at https://review.opendev.org/c/openstack/nova/+/872975 and recheck until we get a -109:04
sean-k-mooneynot sure about os-traits09:04
bauzasthat's what I'll be doing09:04
bauzaspeople are free to do anything09:05
bauzasthey prefer09:05
bauzasI also need to look at the releases we have for os-vif, os-traits and os-rc09:05
sean-k-mooneyi approved the os-vif one yesterday09:06
sean-k-mooneyi did not look at the others09:06
*** amorin_ is now known as amorin09:09
sean-k-mooneygibi: so we do stub out the events thread https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/tests/fixtures/libvirt.py#L919-L94109:09
sean-k-mooneybut not the other one09:09
sean-k-mooneyhttps://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L616-L61909:10
sean-k-mooneyi wonder if we can just more  utils.spawn(self._conn_event_thread) into self._init_events()09:11
gibisean-k-mooney: I won't mix the events part with the connection thread. the events part uses a native thread09:12
bauzaselodilles:  2023-02-07 14:02:46.446989 | compute1 | neutron-openvswitch-agent: no process found on https://review.opendev.org/c/openstack/nova/+/87170209:12
bauzaselodilles: that's the second recheck having the same problem09:13
gibibut sure we can wrap the connection thread spawning and mock it09:13
bauzaselodilles: so, OK, when you say "broken broken", I understand it :)09:13
sean-k-mooneywell these are both related to event handeling09:13
sean-k-mooneyso it feell like it should be also in _init_events09:14
gibibauzas: how do you feel about https://review.opendev.org/c/openstack/nova/+/872975/2/nova/virt/libvirt/driver.py#10064 our current trials haven't hit the true positive case yet, but already hit the half false positives a lot 09:14
gibisean-k-mooney: connection thread is where nova initiate calls to libvirt the event thread is where libvirt initiate call back to nova09:14
sean-k-mooneyok but both of them are related to the libvirt events transmit vs recive09:15
sean-k-mooneyhttps://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L48009:15
sean-k-mooneyand the doc string makes it sound like a good fit09:16
sean-k-mooneyit woudl jsut be adding the spawn to the end fo that function 09:16
sean-k-mooneyso we spawn the dispatche and reciver threads form the same place09:16
sean-k-mooneyand the native thread09:16
sean-k-mooneyanyway ill be back in an hour or two i was woken up at around 6 and started looking at this. so im going to see if i ca rest for a bit although at this point i may have missed that window09:18
gibisean-k-mooney: ack09:20
elodillesbauzas: yepp, it's broken broken this time :]09:23
elodillesbauzas: but this should do the work when merged: https://review.opendev.org/c/openstack/grenade/+/87296909:25
opendevreviewPierre Libeau proposed openstack/nova master: Add mechanism to manage snapshot during nc init  https://review.opendev.org/c/openstack/nova/+/87306209:26
zigosean-k-mooney[m]: Would you know how to easily just forbid *ALL* .vmdk on train and before?09:41
bauzasgibi: sure, let's do it09:43
opendevreviewSylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297509:48
bauzasgibi: ^09:49
gibibauzas: thanks10:10
bauzasfolks, on os-traits, given we only merged https://opendev.org/openstack/os-traits/commit/feb3e28a00eeab0d4dbf097dd801aff3adeb10d6 we don't need a release10:17
bauzasthat being said, there are 2 open changes that are related to some accepted nova blueprint https://review.opendev.org/q/status:open+project:openstack/os-traits10:18
bauzaselodilles: I just approved https://review.opendev.org/c/openstack/os-traits/+/871226 on os-traits10:18
bauzaselodilles: we'll need a release10:19
bauzascores, need a second +2/+W on https://review.opendev.org/c/openstack/os-traits/+/872185 to let Uggla not blocked by this os-trait change10:21
bauzaseven if there are very little chances to have virtiofs-scaphandre series to be mergeable in Antelope10:21
* bauzas needs to disappear until noon-ish UTC time10:25
opendevreviewMerged openstack/os-traits master: Add new 'COMPUTE_ADDRESS_SPACE_*' traits  https://review.opendev.org/c/openstack/os-traits/+/87122610:25
gibibauzas: it is in merge conflict now so I will rebase it and approve it10:30
opendevreviewMerged openstack/os-resource-classes master: Change minversion of tox to 3.18.0  https://review.opendev.org/c/openstack/os-resource-classes/+/79197410:31
opendevreviewBalazs Gibizer proposed openstack/os-traits master: Add 'COMPUTE_SHARE_LOCAL_FS'  https://review.opendev.org/c/openstack/os-traits/+/87218510:33
gibibauzas: done ^^10:34
gibibauzas: we need to bump the os-traits min version in placement once there is an os-traits release for A10:34
elodillesbauzas: ack, let me know when os-traits can be released10:42
gibielodilles: I think we only waiting for https://review.opendev.org/c/openstack/os-traits/+/872185 to land then we are good to go10:43
gibithe rest of the open os-traits reviews seems to be not related to the a release 10:43
gibis/a release/A release/10:44
elodillesgibi: ack, thanks10:46
elodillesas I see that patch will land within 2-3 minutes if everything goes well10:48
opendevreviewMerged openstack/os-traits master: Add 'COMPUTE_SHARE_LOCAL_FS'  https://review.opendev.org/c/openstack/os-traits/+/87218511:02
elodillesand merged ^^^, do you want me to create the release patch?11:11
gibielodilles: if it means you cannot +2 it then I will create the release patch :)11:14
elodillesgibi: then please do o:)11:22
gibion it11:25
opendevreviewJohn Garbutt proposed openstack/nova stable/victoria: Revert "Revert resize: wait for events according to hybrid plug"  https://review.opendev.org/c/openstack/nova/+/85742311:27
gibielodilles, bauzas https://review.opendev.org/c/openstack/releases/+/87310611:30
elodillesthanks! will review when the release jobs have finished11:34
gibielodilles: cool, and bauzas will be back soon I guess to review it too11:37
bauzasjust done11:38
* bauzas is eating a kebab while working :p11:38
elodilles+2'd!11:42
elodillesbauzas: Bon Appétit!11:42
gibibauzas: enjoy 11:45
*** blarnath is now known as d34dh0r5312:14
gibibauzas: I just realized that we don't have debug log enabled in the functional test by default so I will going to change https://review.opendev.org/c/openstack/nova/+/872975 to log a warning instead12:28
bauzasack, ok12:29
sean-k-mooneyzigo: the best you can do is disable the format in glance including disbaling the image conversion. there is no way to fully block it in nova and even then im not sure idsablity in glace will prevnet all ways it could be used.12:30
bauzasor pass OS_DEBUG=1 in tox.ini?12:30
sean-k-mooneyOS_DEBUG=1 can be set on the comand line12:30
sean-k-mooneyno need to modify tox.ini12:30
sean-k-mooneyoh you mean for ci12:30
sean-k-mooneyya you could in the DNM change12:30
sean-k-mooneybut its very verbose on failure i think the pass logs are not too bad12:31
bauzasyeah I mean in the specific change12:31
bauzasbut ok, will change the level12:31
sean-k-mooneywell it might be useful to use debug12:31
opendevreviewBalazs Gibizer proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297512:31
bauzashaha, gibi just did it :)12:32
gibiwe can switch to DEBUG all over for this patch if we want but that will mean a lot bigger subunit file to deal with :)12:35
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (db)  https://review.opendev.org/c/openstack/nova/+/83119312:55
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (objects)  https://review.opendev.org/c/openstack/nova/+/83940112:55
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (manila abstraction)  https://review.opendev.org/c/openstack/nova/+/83119412:55
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (drivers and compute manager part)  https://review.opendev.org/c/openstack/nova/+/83309012:55
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api)  https://review.opendev.org/c/openstack/nova/+/83683012:55
opendevreviewribaudr proposed openstack/nova master: Check shares support  https://review.opendev.org/c/openstack/nova/+/85049912:55
opendevreviewribaudr proposed openstack/nova master: Add metadata for shares  https://review.opendev.org/c/openstack/nova/+/85050012:55
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach notification  https://review.opendev.org/c/openstack/nova/+/85050112:55
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach notification  https://review.opendev.org/c/openstack/nova/+/85102812:55
opendevreviewribaudr proposed openstack/nova master: Add shares to InstancePayload  https://review.opendev.org/c/openstack/nova/+/85102912:55
opendevreviewribaudr proposed openstack/nova master: Add helper methods to attach/detach shares  https://review.opendev.org/c/openstack/nova/+/85208512:55
opendevreviewribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working.  https://review.opendev.org/c/openstack/nova/+/85208612:55
opendevreviewribaudr proposed openstack/nova master: Add virt/libvirt error test cases  https://review.opendev.org/c/openstack/nova/+/85208712:55
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/85482312:55
opendevreviewribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/85482412:55
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach_error notification  https://review.opendev.org/c/openstack/nova/+/86028212:55
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach_error notification  https://review.opendev.org/c/openstack/nova/+/86028312:55
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/86028412:55
opendevreviewribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028512:55
opendevreviewribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares  https://review.opendev.org/c/openstack/nova/+/86028612:55
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part)  https://review.opendev.org/c/openstack/nova/+/86028712:55
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028812:55
opendevreviewribaudr proposed openstack/nova master: Documentation  https://review.opendev.org/c/openstack/nova/+/87164212:55
*** dasm|off is now known as dasm|rover13:51
*** Roamer`_ is now known as Roamer`14:00
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api)  https://review.opendev.org/c/openstack/nova/+/83683014:11
opendevreviewribaudr proposed openstack/nova master: Check shares support  https://review.opendev.org/c/openstack/nova/+/85049914:11
opendevreviewribaudr proposed openstack/nova master: Add metadata for shares  https://review.opendev.org/c/openstack/nova/+/85050014:11
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach notification  https://review.opendev.org/c/openstack/nova/+/85050114:11
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach notification  https://review.opendev.org/c/openstack/nova/+/85102814:11
opendevreviewribaudr proposed openstack/nova master: Add shares to InstancePayload  https://review.opendev.org/c/openstack/nova/+/85102914:11
opendevreviewribaudr proposed openstack/nova master: Add helper methods to attach/detach shares  https://review.opendev.org/c/openstack/nova/+/85208514:11
opendevreviewribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working.  https://review.opendev.org/c/openstack/nova/+/85208614:11
opendevreviewribaudr proposed openstack/nova master: Add virt/libvirt error test cases  https://review.opendev.org/c/openstack/nova/+/85208714:11
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/85482314:11
opendevreviewribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/85482414:11
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach_error notification  https://review.opendev.org/c/openstack/nova/+/86028214:11
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach_error notification  https://review.opendev.org/c/openstack/nova/+/86028314:11
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/86028414:11
opendevreviewribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028514:11
opendevreviewribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares  https://review.opendev.org/c/openstack/nova/+/86028614:11
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part)  https://review.opendev.org/c/openstack/nova/+/86028714:11
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028814:11
opendevreviewribaudr proposed openstack/nova master: Documentation  https://review.opendev.org/c/openstack/nova/+/87164214:11
bauzasUggla: your're next in my review list, can I do it ? ^14:11
Ugglabauzas, sure14:12
bauzasack, grabbing a coffee first14:13
*** tosky_ is now known as tosky14:19
opendevreviewElod Illes proposed openstack/nova stable/train: DNM: CI test  https://review.opendev.org/c/openstack/nova/+/87311614:33
gibi bauzas: we are really unlucky with the DNM patch, it is still passing and not even producing the half false positives14:50
gibianyhow I rechecked again14:50
bauzasyup, I've seen it14:50
bauzasmaybe we would need to merge it for like 2 or 3 days and revert it after14:50
bauzasso more than one change would use it14:50
*** dasm|rover is now known as dasm|afk15:36
bauzasif, while I'm reviewing some series, someone wants to jab my own power management series, it would be loved <3 https://review.opendev.org/q/topic:bp%252Flibvirt-cpu-state-mgmt15:40
bauzasI can hit the bullet.15:40
gibibauzas: on my list to go back to it15:40
bauzasthanks15:41
bauzasI'm on the manila shares one, but a single coffee shot isn't enough15:41
opendevreviewJorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2  https://review.opendev.org/c/openstack/nova/+/87312715:50
gmanngibi: did you get chance to look into the placement RBAC change, updated as per your comment https://review.opendev.org/c/openstack/placement/+/86561816:57
gibigmann: good point. I'm fast tracking your patch now17:00
gibigmann: +2 +A thanks17:02
gibithe check queue getting long in zuul (>2h to get an executor). It almost feels like FF week :)17:03
gmanngibi: thanks17:10
gmannyeah, gate is in bad situation 17:10
opendevreviewDan Smith proposed openstack/nova master: Add docs for stable-compute-uuid behaviors  https://review.opendev.org/c/openstack/nova/+/87297717:36
gibibauzas: you might not like it but I reviewed the power managemenet series18:18
gibibauzas: I'm close to -2 on the tar.gz part ;)18:19
sean-k-mooneygibi: that is proably resolveable18:19
bauzasgibi: thanks for the fish18:19
bauzasgibi: do you have time for discussing about it ?18:19
sean-k-mooneywe started with a large copy of the syfs filesystem but the funcitonl test can be writeen a differnt way18:19
gibibauzas: I have a bit of time yes18:20
sean-k-mooneythe current approch was taken when thsi started as arbiterd18:20
sean-k-mooneyit had a much larger scope18:20
gibimy main concern with the tar.gz is that I cannot review it or diff it in a normal way18:20
gibiso it is hidden information18:20
bauzasgibi: I do understand your point18:20
sean-k-mooneyfor the limite scope in nova i htink we can create a fixture another way18:20
gibiI would do couple of helper function that generates the fs on the fly18:20
bauzasgibi: the problem is, as said by sean-k-mooney, is that the large number of files was creating a problem18:21
sean-k-mooneygibi: ya that was the alternitive18:21
sean-k-mooneybauzas: we dont need a large number of files18:21
gibibauzas: I assume we have two type of files 18:21
bauzasok, I can try to generate those arbitrary files then18:21
sean-k-mooneywe actully only need a small subset18:21
bauzaswe only need a subset of the sysfs18:21
bauzasyep, ok18:21
gibione for the governor and one for the online18:21
bauzassounds a plan then18:21
sean-k-mooneygibi: ya more or less18:21
bauzasyup18:22
gibihuhh, this went better than I expected :)18:22
sean-k-mooneyand then the parent directory structure18:22
bauzasand we can augment the fixture if needed18:22
sean-k-mooneybut thats just a call to "mkdir -p"18:22
bauzasfor other usages, I mean18:22
gibisean-k-mooney: +118:22
sean-k-mooneywithin the tempfs/tempdir18:22
bauzasyeah, I can do it quick18:22
gibiI found one edge case that might be intersting18:23
* bauzas doesn't have the kids at home until end of next week, my productivity is raising the bar18:23
sean-k-mooneyi didnt do that for arbiterd partly becasue it was a quick hack to get fucntional test and i expecet to do much more eventully18:23
gibiif you do reconfiguration from cpu_state to governor strategy or back then the patch might not do the right thing now18:23
sean-k-mooneyoh that is intersting18:23
gibii.e. we set the governor but we never online a cpu that was off before18:24
sean-k-mooneywe coudl disallow that. but that should be resolveable18:24
gibiyeah I think this is resolveable18:24
gibian fairly easily testable I hope18:24
bauzasgibi: ah, good edge case18:24
sean-k-mooneyso the issue is when we start we cant know what it used to be18:24
gibijust need a bit more logic on power_up and down18:24
bauzaswhat would be your preference ?18:24
sean-k-mooneyi actully think i know how to detect it18:25
bauzasto check the core status and online if a governor helper is called ?18:25
sean-k-mooneywhen its set to govoner the libvirt chack that checks the cores should be online should catch this18:25
gibibauzas: yeah18:25
bauzasok, I can augment power_down and up18:25
bauzasgibi: sean-k-mooney: fwiw, I'll be traveling tomorrow back-and-forth to Paris18:26
bauzasso most of my day will be trains18:26
bauzasbut I'll be online18:26
bauzasand on work status18:26
gibimaybe the other direction is trickier. so you have a governor startegy and you set the cpu to low. Then you reconfigure to cpu_state. And you boot a VM that need the cpu so you online it, but it still has the low governor set18:26
sean-k-mooneybauzas: you chould change https://review.opendev.org/c/openstack/nova/+/821228/5/nova/virt/libvirt/host.py#75518:26
sean-k-mooneywe shoudl check if cpu power management is enabled and its set to cpu_state18:27
sean-k-mooneyif its set to govoner then we can detect that cores are offlien and online them18:27
sean-k-mooneyor raise an error and stop the compute18:27
bauzasdamn operators who like to play with config options18:27
sean-k-mooneythis is not really something you should change when there are instance on the host18:28
sean-k-mooneygibi: the other direction wee should not really do anything (state->govoner)18:28
bauzassean-k-mooney: true, I wonder whether it would be rather preferable to detect it at startup and fail18:28
sean-k-mooneysorry (govoner->state)18:29
gibiI'm OK to reject the reconfiguration with instance on the host, BUT we power down CPUs at init_host without an instance 18:29
sean-k-mooneybecause you are allowed to manage the govoner today outside of nova18:29
gibiso you don't need an instance to get a discrepancy18:29
sean-k-mooneyso you may have change the state18:29
bauzasI think we're entering danger zone18:29
sean-k-mooneywith that we could say if you have ONF.libvirt.cpu_power_management=true18:29
bauzashere, we're asking operators to think about what they gonna do18:30
sean-k-mooneythen all govenros in the cpu_dedicated_set must be the same18:30
sean-k-mooneywe could detect that in that case and catch the (govoner->state) change18:30
bauzasI mean, if I'm an operator that already sets governor levels, then I should NOT let nova play with my cores18:30
sean-k-mooneyso (govoner->state) we can detect that there is a govoner differnce on the cpus, (govoner<-state) we can detect the offline cores18:31
bauzassean-k-mooney: I don't want to add logic while I can add documentation about supported features18:31
sean-k-mooneyi agree on documenting that you cant cahnge this with instances on the host18:31
sean-k-mooneyand shoudl reboot if you do change it or something like that18:32
sean-k-mooneybut we can detect and error or fix it fi we want too18:32
sean-k-mooneyif we have CONF.libvirt.cpu_power_management=true i think that shoudl mean you cannot manage the cpus outside of nova18:32
gibiagain, becase we apply the config to the cpus at init_host regardless of any instance on the host you don't need an instance on the host to cause problems18:32
sean-k-mooneygibi: we dont your corect18:33
sean-k-mooneybut we can detect it and document that you should not do this without a host reboot18:33
bauzasI'm confused18:33
gibisean-k-mooney: I agree to document 18:34
gibisean-k-mooney: and detect if possible18:34
sean-k-mooneydetect is possibel but i think bauzas  would prefer to not require that for the feature to merge and maybe to it later18:34
sean-k-mooney(govoner->state) the concer is we leave some cpus in low perfomacne mode18:35
sean-k-mooney(state-)18:35
sean-k-mooney(state->govoner) the concern is we leave some cores offline18:35
sean-k-mooneywe can detect boot and make it a hard error from init_host18:35
sean-k-mooneyin either case a host reboot will resolve it or they operator can fix it18:36
gibiyepp18:37
sean-k-mooneybauzas: does ^ make sense18:38
sean-k-mooneythe offline core check is triival its just https://review.opendev.org/c/openstack/nova/+/821228/5/nova/virt/libvirt/host.py#75518:39
sean-k-mooneythe inconsitent govoner state is also trivial18:39
sean-k-mooneyi dont think we shoudl check if its explictly powersave or anything like that18:39
opendevreviewMerged openstack/placement master: Modify the placement API policies defaults and scope_type  https://review.opendev.org/c/openstack/placement/+/86561818:39
sean-k-mooneyjust in cpu_sate mofe assert the govoner is the same for all cores in cpu_dedicated_set18:40
sean-k-mooneys/mofe/mode/18:41
bauzasI just want to clarify18:41
bauzasare we OK with doing this in init_host() ?18:42
gibiyes18:42
sean-k-mooneyyep18:42
bauzasOK18:42
bauzasand in init_host(), do we agree on detecting that if power strategy is set to governor, all cores should be up ?18:43
sean-k-mooneyi would prefer if you put these in a new fucntions that you invoked form there for simpler testing18:43
sean-k-mooneybauzas: yes18:44
sean-k-mooneyif its set to govoner the old requiremnt that all core must be up should still apply18:44
bauzashttps://review.opendev.org/c/openstack/nova/+/868237/9/nova/virt/libvirt/driver.py#82618:44
bauzasI already call power_down_all_cores()18:44
bauzasin power_down_all_cores() I can do two checks 18:45
bauzasbesides the existing one18:45
sean-k-mooneyyou should not over complicate that function18:45
bauzas1/ if strategy is set to governor, all cpus need to be online18:45
bauzas3/ if strategy is set to cpu_state, then all governors should be identical 18:46
bauzass/3/218:46
sean-k-mooneyi would prefer a new validate_cores() or similr function and only call  libvirt_cpu.power_down_all_dedicated_cpus() if the mode is state18:46
sean-k-mooneybauzas: yes those are the two checks that i think you shoudl do18:47
sean-k-mooneybut not in libvirt_cpu.power_down_all_dedicated_cpus()18:47
sean-k-mooneydo it ins libvirt_cpu.validate_all_cpus()18:47
bauzasI can manage that request18:47
gibifor 2/ if all cores has governor low as nova set it to that before the reconfiguration then we have a problem still18:47
sean-k-mooneygibi: no18:47
sean-k-mooneywe cant check for that18:47
sean-k-mooneybecause the admin might have set it to low intentionally18:48
sean-k-mooneyor to any other value18:48
bauzasthis is why I think docs is important18:48
bauzas(which is missing, but we can write it before RC1)18:48
sean-k-mooneythats why i stated the requiremnt that they should all be the same if the power state is manged by nova18:48
bauzasI'm OK with this18:49
bauzasnova will try to detect18:49
bauzasif the operator explicitely manages all the governors and wants to set nova to turn off cores, that's his choice18:49
gibiI undrestand that we cannot catch that if the operator changed a governor directly. And I agree that we should not be able to catch that. But if an empty compute was configured with governor strategy, then reconfigured to cpu_state strategy then that compute will have all dediceted cpus in low performance mode for ever18:49
bauzashe'll probably end up with problems, but meh, unsupported18:49
sean-k-mooneygibi: that is fixed by the host reboot18:50
sean-k-mooneythat why i think we need to document that if you want to change this you should reboot the host18:50
sean-k-mooneyso we start form a clean state18:50
gibisean-k-mooney: so we will say in the doc that the startegy can only be change via host reboot?18:50
sean-k-mooneythat is what im suggestign yes18:50
gibiOK18:50
gibithat will solve it yes18:50
bauzasgibi: the problem is that we can't assume that governor_high is the default value *before* 18:50
gibibauzas: I know :)18:51
gibihost reboot, it is18:51
bauzasok, so docs18:51
sean-k-mooneydocs and the 2 checks you wrote above please18:51
sean-k-mooneydocs is the most imporant18:51
sean-k-mooneybut i would like to see the check if possibel too18:51
gibiworks for me18:52
gibithanks folks for the discussion18:52
bauzas++18:52
bauzaswill be working on it on the train tomorrow18:52
sean-k-mooneyi will be around for reviews tomorrow and then wednesday18:53
sean-k-mooneyso feel free to ping me18:54
sean-k-mooneyi would normaly take the week of valentines off but since it FF week im going to be here for wednesday-friday18:54
bauzasdon't feel obliged18:57
bauzaswe have already promised more than what we can offer18:58
bauzasand we're short in time18:58
bauzasthe numbers will be terrible, but I can surely explain those18:58
opendevreviewElod Illes proposed openstack/nova stable/train: DNM: CI test  https://review.opendev.org/c/openstack/nova/+/87311620:21
gmannbauzas: with placement change merged (https://review.opendev.org/c/openstack/placement/+/865618) we can mark this BP as completed https://blueprints.launchpad.net/placement/+spec/policy-defaults-improvement21:09
*** dasm|afk is now known as dasm|offline22:05

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!