Thursday, 2021-10-14

00:15 *** tbachman is now known as Guest2804
03:30 *** bhagyashris is now known as bhagyashris|out
06:23 <opendevreview> Lucian Petrut proposed openstack/nova master: api: enable oslo.reports when using uWSGI https://review.opendev.org/c/openstack/nova/+/810922
07:00 <bauzas> good morning Nova
07:19 * kashyap waves
08:32 <opendevreview> alecorps proposed openstack/nova master: VMware: Support volumes backed by VStorageObject https://review.opendev.org/c/openstack/nova/+/808791
08:41 <bauzas> mmm, I'm stuck trying to install devstack on RHEL8.2: I get "openstack: command not found" when creating keystone accounts... https://paste.opendev.org/show/809996/
08:42 <bauzas> anyone hitting it?
08:42 <bauzas> I'm out of ideas
08:47 <kashyap> bauzas: Why are you installing it on RHEL8.2?
08:47 <kashyap> FWIW, I'd suggest picking a latest-1 Fedora (or Debian/Ubuntu, if you're comfy w/ it) :)
08:48 * kashyap crawls back into his cave; need to prepare for a presentation at short notice
08:49 <frickler> bauzas: did you check that there is no earlier error already? also 8.2 afaict isn't supported by devstack anymore
09:01 <opendevreview> Rodolfo Alonso proposed openstack/nova master: Set "cache_ok=True" in "TypeDecorator" inheriting classes https://review.opendev.org/c/openstack/nova/+/807359
09:13 <gibi> bauzas: hi! did you get the moderator info for the PTG from Ashlee? or should I forward it?
09:18 <bauzas> frickler: kashyap: thanks (for some reason, I got no pidgin notification when you highlighted me)
09:18 <bauzas> frickler: kashyap: I'll use RHEL8.4 then, I guess (I need it for testing the nvidia GPUs; Fedora is not supported by their driver)
09:19 <bauzas> gibi: hmmm, by email? if yes, nope
09:19 <bauzas> gibi: thanks
09:19 <gibi> email, so I'll forward it then
09:20 <gibi> done
09:21 <bauzas> gibi: thanks, will look at it!
09:21 <gibi> also fyi, on Monday I will only be available from 15:00 UTC
09:24 <gibi> the rest of the week I will be fully available
09:25 <bauzas> ++
09:35 <kashyap> bauzas: Ah, I see.
09:37 <viks__> hi, I have set `live_migration_completion_timeout=100` and `live_migration_timeout_action=force_complete`, and now I'm stressing the VM with the stress-ng tool, with load going up to 300. But my migration is not completing. Why is the `force_complete` action not kicking in?
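A possible answer to viks__, sketched under an assumption: nova's config reference describes `[libvirt] live_migration_completion_timeout` as seconds per GiB of guest RAM plus disk to be transferred (each counted as at least 2 GiB), not as an absolute timeout, and `force_complete` then either pauses the guest or switches to post-copy if `live_migration_permit_post_copy` allows it. If that reading holds, a value of 100 on an 8 GiB guest only triggers the action after roughly 800 seconds. `effective_completion_timeout` below is a hypothetical helper that mirrors that arithmetic; it is not a nova function.

```python
GIB = 1024 ** 3
MIN_GIB = 2  # assumed lower bound applied to RAM and to each disk


def effective_completion_timeout(completion_timeout_s, ram_mb, disk_sizes_bytes=()):
    """Hypothetical helper: seconds until live_migration_timeout_action fires.

    Assumes the configured value is scaled per GiB of RAM + disk being
    transferred, with RAM and each disk counted as at least MIN_GIB.
    """
    if not completion_timeout_s:
        return None  # 0 disables the timeout entirely
    ram_gb = max(ram_mb / 1024, MIN_GIB)
    disk_gb = sum(max(size / GIB, MIN_GIB) for size in disk_sizes_bytes)
    return int(completion_timeout_s * (ram_gb + disk_gb))


# viks__'s setting on an assumed 8 GiB guest with no local disks to copy
print(effective_completion_timeout(100, ram_mb=8192))
```

Under these assumptions a busy guest just keeps migrating until the scaled deadline passes, so the action may simply not have fired yet rather than being ignored.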
10:02 -opendevstatus- NOTICE: zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued
10:04 <opendevreview> Ilya Popov proposed openstack/nova master: Fix to use NUMA cell with more free memory first https://review.opendev.org/c/openstack/nova/+/805649
12:04 <gibi> sean-k-mooney: you were right, the rpc.NOTIFIER is the global that we facilitate the test case crosstalk https://bugs.launchpad.net/nova/+bug/1946339/comments/7
12:04 <gibi> s/we facilitate/ facilitates/
12:05 <gibi> we reset that global between tests, but nova dynamically gets the global from the rpc module, whatever it is at the moment
12:05 <gibi> so if the global was re-inited, it uses the re-inited global for the next notification
12:09 <gibi> I don't see how to fix this from the rpc.NOTIFIER perspective
12:09 <sean-k-mooney> i see. i looked at that briefly but did not see how that happened, but that is what my gut was telling me had to be happening
12:10 <gibi> you have a good gut :)
12:11 <sean-k-mooney> so this global https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/rpc.py#L53 is really the issue, right?
12:11 <gibi> yes,
12:11 <sean-k-mooney> we need that to be mocked in the setup of the test
12:11 <gibi> and whatever code wants to emit a notification uses that global
12:11 <gibi> sean-k-mooney: that won't work, as the code grabs the global at the point in time when the notification needs to be emitted
12:12 <gibi> so the first tc will grab it 60 seconds after the tc is finished
12:12 <gibi> and at that time it is already restubbed to the current test case
12:12 <gibi> so it grabs the new stubbed version that is connected to the current testcase
12:13 <gibi> hence the crosstalk
12:13 <gibi> if nova grabbed the global at service startup, then yes, stubbing would work
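A minimal, self-contained illustration of the crosstalk gibi describes, using a `types.SimpleNamespace` as a hypothetical stand-in for `nova.rpc` (none of these names are nova's): code that looks the module global up at call time picks up whatever the current test stubbed, while code that captured the notifier at startup keeps the one from its own test.

```python
import types

# Hypothetical stand-in for nova.rpc: a module-like object with a global NOTIFIER.
rpc = types.SimpleNamespace(NOTIFIER=None)


class FakeNotifier:
    def __init__(self, owner):
        self.owner = owner
        self.events = []


def emit_late_bound(event):
    # Looks the global up at call time, as gibi describes nova doing.
    rpc.NOTIFIER.events.append(event)


def make_emitter_bound_at_startup():
    # Alternative: capture the notifier once, at "service startup".
    notifier = rpc.NOTIFIER

    def emit(event):
        notifier.events.append(event)

    return emit


# Test case A stubs the global and starts a service.
notifier_a = FakeNotifier("test_a")
rpc.NOTIFIER = notifier_a
bound_emit = make_emitter_bound_at_startup()

# Test case A "finishes"; test case B restubs the global.
notifier_b = FakeNotifier("test_b")
rpc.NOTIFIER = notifier_b

# A greenlet leaked from test A now emits its late notification.
emit_late_bound("late event from test_a")
bound_emit("late event from test_a (startup-bound)")

print(notifier_b.events)  # the late-bound event landed in test B's notifier
print(notifier_a.events)  # only the startup-bound event stayed with test A
```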
12:14 <sean-k-mooney> damn, ok ya that is annoying
12:15 <gibi> it is really because the test case executor thinks that a tc is finished and moves forward, but the tc still has greenlets running in the background
12:15 <gibi> + the global :)
12:15 <gibi> I tried killing greenlets at the end of the test case, but I think I cannot properly kill them
12:16 <sean-k-mooney> we might be able to wait for the notification in that one test, but this could affect any set of tests
12:16 <gibi> yes, waiting in each test for each build to finish is a way to solve this
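A sketch of the per-test waiting approach gibi mentions, assuming a hypothetical fake notifier with a blocking `wait_for` helper; these names are illustrative and are not nova's fixture API. Plain threads stand in for greenlets, and `instance.create.end` is just an example event type.

```python
import threading
import time


class WaitableNotifier:
    """Hypothetical fake notifier that lets a test block until an event arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self.events = []

    def notify(self, event_type):
        with self._cond:
            self.events.append(event_type)
            self._cond.notify_all()

    def wait_for(self, event_type, timeout=10.0):
        # Block until event_type has been recorded or the timeout expires.
        with self._cond:
            ok = self._cond.wait_for(lambda: event_type in self.events, timeout)
        if not ok:
            raise AssertionError(f"{event_type} not received within {timeout}s")


def fake_build(notifier):
    # Stand-in for a background build that notifies when it is done.
    time.sleep(0.1)
    notifier.notify("instance.create.end")


notifier = WaitableNotifier()
worker = threading.Thread(target=fake_build, args=(notifier,))
worker.start()
# The test only returns after the background work has reported back,
# so nothing is left running to leak into the next test case.
notifier.wait_for("instance.create.end")
worker.join()
print(notifier.events)
```

As sean-k-mooney points out, this only protects the tests that remember to wait; any other test that leaves work running can still leak.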
12:16 <sean-k-mooney> so i think we need a more systematic way of mocking this, but im not sure how to approach that
12:17 <gibi> yeah, probably we need a higher level mock than the stub on rpc.NOTIFIER
12:17 <gibi> I have to think about it
12:18 <sean-k-mooney> we can't just stub out nova.rpc.get_versioned_notifier()?
12:18 <gibi> nope
12:18 <gibi> the module level function is also a global
12:19 <gibi> so when the caller says rpc.get_versioned_notifier, it gets whatever mocked version the module has at the moment
12:19 <gibi> and 60 seconds after the first tc, it will be mocked to the current tc, not to the first tc
12:20 <sean-k-mooney> it does, yes, but i was wondering if we could have a per-test dictionary of notifiers and do a lookup in that
12:20 <gibi> the caller cannot provide the test case id
12:20 <gibi> afaik
12:21 <gibi> or put another way, what would be the key in the lookup table?
12:21 <sean-k-mooney> it can't, but we can
12:21 <sean-k-mooney> in the fixture we can stash that value
12:22 <sean-k-mooney> so the idea i had was to use a dict with setdefault, with the test_id as the key and a new fake notifier as the default
12:22 <sean-k-mooney> then return the result
12:22 <sean-k-mooney> then clear it at the end of a test run
12:23 <sean-k-mooney> if a long running eventlet sends a notification after the test, we will get a new notifier
12:23 <sean-k-mooney> instead of the current one
12:23 <gibi> when the long running eventlet calls nova.rpc.get_versioned_notifier it does not provide any tc id, same as when the currently running tc calls nova.rpc.get_versioned_notifier
12:24 <gibi> from the fixture perspective, both nova.rpc.get_versioned_notifier calls happen at the current tc's time
12:24 <gibi> and provide no id
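sean-k-mooney's per-test `dict.setdefault` idea, sketched with hypothetical names and plain lists standing in for fake notifiers. It also shows gibi's objection: since the caller passes no id, the only key available is the test running at call time, so a late notification from a leaked greenlet still lands in the current test's notifier.

```python
class PerTestNotifiers:
    """Hypothetical registry of fake notifiers keyed by test id."""

    def __init__(self):
        self._by_test = {}
        self.current_test_id = None  # stashed by the fixture in setUp

    def get_notifier(self):
        # The caller passes no id, so the only key available is whatever
        # test is running at call time.
        return self._by_test.setdefault(self.current_test_id, [])

    def clear(self):
        # Called at the end of each test.
        self._by_test.clear()


registry = PerTestNotifiers()

registry.current_test_id = "test_a"
a_events = registry.get_notifier()
registry.clear()  # test_a ends

registry.current_test_id = "test_b"
b_events = registry.get_notifier()

# A greenlet leaked from test_a notifies now: the lookup resolves to test_b.
registry.get_notifier().append("late event from test_a")

print(a_events)
print(b_events)
```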
12:25 <gibi> is there a greenlet-specific storage space like threadlocal?
12:25 <sean-k-mooney> i think we should be able to make it sticky to the greenlet, yes
12:25 <gibi> somehow we need to mark the long running eventlet with a different id than the current eventlets
12:26 <sean-k-mooney> i feel like i have done this before
12:32 <gibi> and we have to store the tc id automatically in each greenlet nova spawns, which is /o\
12:32 <sean-k-mooney> i remember trying to use context managers in the past to create functional tests where each instance of nova-compute had a different nova.conf
12:33 <sean-k-mooney> we did not merge it, but i was able to make each nova-compute have a different view of the global config
12:34 <sean-k-mooney> i have no idea where that is, however. so i think we can spawn the nova-compute service such that the things we have monkey patched are sticky to that instance, but i have no idea if that would outlive the test
12:34 <sean-k-mooney> i think those patchers would likely get torn down when the test function ends
12:35 <sean-k-mooney> leading to the same problem
12:35 <sean-k-mooney> gibi: basically i was hoping we could use functools.partial or something to carry the extra info
12:36 <sean-k-mooney> gibi: there is https://eventlet.net/doc/modules/corolocal.html
12:36 <gibi> for the partial: we need to attach the partial to a thing that is specific to the current test case execution
12:37 <sean-k-mooney> gibi: ya, and i don't really know how to do that
12:37 <gibi> for corolocal, that can be the storage, but then we probably need to patch eventlet.spawn* to fill it
12:37 <gibi> I will play around
12:37 <sean-k-mooney> certainly not in the notification fixture, which is where we really want to do this
12:37 <sean-k-mooney> ya, i might try and play with this too
12:38 <gibi> sean-k-mooney: eventlet patches threading.local to be corolocal.local()
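A sketch of gibi's alternative: keep the test case id in greenlet-local storage and wrap spawn so each child is stamped with the id of the test that spawned it. Since eventlet patches `threading.local` to `corolocal.local()`, the sketch uses `threading.local` with plain threads as a stand-in; `spawn` and `current_test_id` are hypothetical names, not eventlet's or nova's API.

```python
import threading

# Per-thread storage; under eventlet monkey patching this becomes per-greenlet.
_local = threading.local()


def current_test_id():
    return getattr(_local, "test_id", None)


def spawn(func, *args, **kwargs):
    """Hypothetical spawn wrapper: stamp the child with the spawner's test id."""
    parent_test_id = current_test_id()  # captured at spawn time

    def run():
        _local.test_id = parent_test_id
        func(*args, **kwargs)

    thread = threading.Thread(target=run)
    thread.start()
    return thread


release = threading.Event()
seen = []


def leaked_task():
    release.wait()  # keeps running after its test case has "finished"
    seen.append(current_test_id())


_local.test_id = "test_a"  # set by the fixture when test_a starts
t = spawn(leaked_task)
_local.test_id = "test_b"  # the fixture moves on to the next test
release.set()
t.join()
print(seen)  # the leaked task still reports the test that spawned it
```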
12:38 <lajoskatona> gauzas, gibi: Hi, for the rbac discussion, do you think Neutron should join? (see: https://etherpad.opendev.org/p/policy-popup-yoga-ptg )
12:38 <lajoskatona> bauzas ---^ (sorry)
12:39 <gibi> lajoskatona: for the external event discussion it would be good to have somebody from neutron, as the client of that api
12:39 <sean-k-mooney> lajoskatona: i think there is work to be done on making nova capable of calling neutron where neutron is using scope enforcement
12:40 <sean-k-mooney> and ya, the external events are the flip side of that
12:40 <lajoskatona> gibi, sean-k-mooney: thanks, then I'll add it to next week's schedule
12:41 <sean-k-mooney> gibi: i still think we need to create some form of oslo.middleware so that we can discover a service's policy programmatically from the api
12:41 <sean-k-mooney> right now the operator will need to set the correct scopes etc. in our config file
12:42 <lajoskatona> gibi, bauzas, sean-k-mooney: we have an edge session at the same time (1400-1600); let's try to fix that
12:42 <bauzas> sorry, in a meeting
12:42 * gibi lets bauzas agree on the schedule
12:42 <bauzas> can you tl;dr?
12:42 <bauzas> I'm hardly following
12:44 <sean-k-mooney> bauzas: there is a clash between the nova rbac popup session and a neutron? edge session
12:44 <sean-k-mooney> it would be good if we could adjust the schedule to accommodate that. lajoskatona, is that a correct summary?
12:45 <lajoskatona> sean-k-mooney: yes
12:45 <lajoskatona> sean-k-mooney, bauzas, gibi: I'll try to catch ildikov to see if we need both hours for edge...
12:45 <sean-k-mooney> we can also talk about neutron rbac issues in the nova-neutron session if we can't
12:46 <lajoskatona> sean-k-mooney: yeah, worst case
13:00 *** bhagyashris|out is now known as bhagyashris|mtg
13:01 <bauzas> ah ok
13:01 <bauzas> let's wait for ildikov then, but I'm pretty sure we can find other slots
13:30 * bauzas disappears for 30 mins after 1h30 of meetings => haircut
13:30 <bauzas> and then I'm back
15:03 *** bhagyashris|mtg is now known as bhagyashris|away
16:06 <melwitt> bauzas: sorry to bring this up again, but do you think there's a chance you could look at https://review.opendev.org/c/openstack/nova/+/791807 and https://review.opendev.org/c/openstack/nova/+/806629 before the ptg? elodilles is +2 on them and I'm trying to avoid them getting lost
16:08 <bauzas> melwitt: yeah, I remember I had to review your patches, but I wasn't seeing them on my review priority list; adding them
16:08 * bauzas has to disappear now
16:08 <melwitt> thank you bauzas
16:08 <bauzas> melwitt: they will be the first patches I look at tomorrow
16:29 <opendevreview> Balazs Gibizer proposed openstack/nova master: Prevent leaked eventlets to send notifications https://review.opendev.org/c/openstack/nova/+/814036
16:29 <gibi> sean-k-mooney: ^^ this fixes the local reproduction for me
16:34 <melwitt> gibi: omg, I spent some time looking at these failures yesterday and could not figure out how/why we got "no reply on conductor" and DBNonExistentTable. awesome find 🙌
16:34 <gibi> melwitt: it was a hard one; sean-k-mooney and I spent two days figuring it out
16:36 <gibi> the lesson for me is that we probably should not run multiple testcases in sequence in the same process if those testcases use some kind of parallelism
16:36 <gibi> but I have no viable alternative
16:36 <melwitt> I was completely stumped. I'm so happy y'all figured it out
16:37 *** dpawlik5 is now known as dpawlik
16:37 <gibi> ... and I still don't like eventlets ;)
16:40 <melwitt> haha :)
16:41 <melwitt> speaking of eventlet...
16:42 <melwitt> here's a thing I'd appreciate your eyes on https://review.opendev.org/c/openstack/nova/+/813114 to see if this is the right way to solve it or if I'm missing a better way
16:42 <melwitt> gibi ^
16:43 <gibi> melwitt: added to my queue
16:43 <melwitt> danke
17:04 <sean-k-mooney> most of the work was by gibi, but ok, i'll review that. it looks promising
17:04 <sean-k-mooney> it's not that complex at first glance
17:06 <sean-k-mooney> ah, you are intercepting spawn
17:06 <sean-k-mooney> and that is where you're getting the testcase id
17:06 <sean-k-mooney> and propagating it
17:06 <sean-k-mooney> gibi: so the runtime error: will that cause the test to fail?
17:07 <sean-k-mooney> i.e. when https://review.opendev.org/c/openstack/nova/+/814036/1/nova/tests/fixtures/notifications.py#168 is raised
17:07 <sean-k-mooney> which test will fail?
17:08 <sean-k-mooney> or will we catch that and just log it?
17:08 <sean-k-mooney> oh, the runtime error goes to the eventlet that called notify
17:09 <sean-k-mooney> which is the one that is running in the background, and it kills whatever was leaked
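A sketch of the behaviour sean-k-mooney works out for the RuntimeError: the check happens inside `notify`, so the error is raised in the leaked background caller rather than in the currently running test. `GuardedNotifier` and the stamped `_local.test_id` are hypothetical; threads again stand in for greenlets, and the error is caught here only so the sketch can record it (uncaught, it would simply end the leaked worker).

```python
import threading

_local = threading.local()


class GuardedNotifier:
    """Hypothetical fixture-side notifier that rejects leaked callers."""

    def __init__(self):
        self.current_test_id = None
        self.events = []

    def notify(self, event_type):
        caller_test_id = getattr(_local, "test_id", None)
        if caller_test_id != self.current_test_id:
            # Raised in the calling (leaked) thread, not in the running test.
            raise RuntimeError(
                f"notification from {caller_test_id} during {self.current_test_id}")
        self.events.append(event_type)


notifier = GuardedNotifier()
errors = []
release = threading.Event()


def leaked_worker():
    _local.test_id = "test_a"  # would be stamped by a spawn wrapper
    release.wait()
    try:
        notifier.notify("instance.create.end")
    except RuntimeError as exc:
        errors.append(str(exc))


notifier.current_test_id = "test_a"
worker = threading.Thread(target=leaked_worker)
worker.start()

notifier.current_test_id = "test_b"  # test_a is over, test_b is running
_local.test_id = "test_b"
notifier.notify("instance.create.start")  # the current test is unaffected
release.set()
worker.join()

print(notifier.events)
print(errors)
```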
17:12 <sean-k-mooney> gibi: by the way, would stopping the compute service help in this case?
17:14 <sean-k-mooney> gibi: we start the service here https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/tests/functional/integrated_helpers.py#L1125-L1143
17:15 <sean-k-mooney> gibi: if we added a tear_down function implementation that explicitly stops them, would that help clean up any running eventlets?
17:16 <sean-k-mooney> we have the service references, so i feel like we should be able to use them to invoke stop https://github.com/openstack/nova/blob/7b063e4d0518af3e57872bc0288a94edcd33c19d/nova/service.py#L282-L296
17:16 <sean-k-mooney> that will at least stop the rpc server instances
17:18 <sean-k-mooney> hum, ok, i guess we are at least partly doing that https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/test.py#L438-L441
17:20 <sean-k-mooney> actually no, that is not registering it to be invoked automatically; it's patching the stop function
17:22 <sean-k-mooney> the service fixture calls kill as a cleanup function, which in turn calls stop, so we are already stopping the services when the service fixture is disposed of
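A sketch of the cleanup pattern sean-k-mooney finds in the service fixture, using `unittest.TestCase.addCleanup` and a hypothetical `FakeService` whose `kill` calls `stop`. Registered cleanups run after the test method finishes, whether it passes or fails.

```python
import unittest


class FakeService:
    """Hypothetical stand-in for a nova service started by a test."""

    def __init__(self, name):
        self.name = name
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        # In nova this would, among other things, stop the RPC server.
        self.running = False

    def kill(self):
        self.stop()


class ServiceTest(unittest.TestCase):
    def start_service(self, name):
        service = FakeService(name)
        service.start()
        # Registered cleanups run after the test method, even if it fails.
        self.addCleanup(service.kill)
        return service

    def test_compute(self):
        self.compute = self.start_service("compute")
        self.assertTrue(self.compute.running)


test = ServiceTest("test_compute")
result = unittest.TestResult()
test.run(result)
print(result.wasSuccessful(), test.compute.running)
```

Because this kind of cleanup was apparently already in place while the crosstalk still happened, stopping the services at the end of a test does not by itself end greenlets they had already spawned.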
19:14 <opendevreview> Ilya Popov proposed openstack/nova master: Fix to use NUMA cell with more free memory first https://review.opendev.org/c/openstack/nova/+/805649
23:28 <opendevreview> Hang Yang proposed openstack/nova master: Support creating servers with RBAC SGs https://review.opendev.org/c/openstack/nova/+/811521

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!