Tuesday, 2022-05-17

*** gibi_pto is now known as gibi [05:51]
<tobias-urdin> hberaud: i'm trying to figure out why nova-compute is leaking a_inodes allocated for eventpoll after we've upgraded, hopefully i can pick your brain a little since it involves some changes in oslo.messaging [07:20]
<tobias-urdin> the issue is pretty much the same as described and solved with https://review.opendev.org/c/openstack/oslo.messaging/+/386656 a long time ago, in that it also leaks anon_inodes until the NOFILE limit is hit for the process, and it just stops working since there are no available fds [07:21]
<tobias-urdin> this was then reverted by this change, but i don't understand why this line was reverted, is it because threading is already patched by eventlet, so threading would already point to a similar implementation to what that helper class in eventutils does? [07:23]
<tobias-urdin> https://github.com/openstack/oslo.messaging/commit/22f240b82fffbd62be8568a7d0d3369134596ace#diff-ba636bdb71407febb1ff546dee098c4bc45952da2bb4e7f86f1126d53d7ec11fR949 [07:23]
<tobias-urdin> the final change is then the change to using pthreads for heartbeats by default https://github.com/openstack/oslo.messaging/commit/add5ab4ecec090efdb9864bc9385f871f2dd082a [07:24]
<tobias-urdin> so i guess the first thing to try is to disable that and see the impact, however that option is also deprecated so I assume the behavior will be the default in the future and not configurable? [07:25]
<tobias-urdin> I will continue to investigate by seeing if I can actually find something that solves the issue. [07:25]
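One way to watch for the kind of leak described above is to count how many of a process's open file descriptors point at eventpoll anon inodes. The following is a minimal sketch, assuming a Linux host with /proc mounted and permission to read the target process's fd directory; the helper name count_eventpoll_fds is hypothetical and is not part of nova or oslo.messaging.

```python
import os


def count_eventpoll_fds(pid="self"):
    """Count open fds of a process that point at anon_inode:[eventpoll].

    Assumes Linux: each entry in /proc/<pid>/fd is a symlink whose target
    describes the open file; epoll instances show up as
    "anon_inode:[eventpoll]".
    """
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        if target == "anon_inode:[eventpoll]":
            count += 1
    return count
```

Calling this periodically with the pid of nova-compute and comparing the result against the process's NOFILE limit would show whether the count keeps growing over time.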
<jrosser> tobias-urdin: we had that trouble here i think [07:26]
<tobias-urdin> interesting, just for posterity we upgraded oslo.messaging from 12.5.2 to 12.9.3 [07:27]
<jrosser> there is a LP bug that my colleague made, just looking for it [07:27]
<jrosser> https://bugs.launchpad.net/oslo.messaging/+bug/1949964 [07:28]
<tobias-urdin> thanks! i will check it out [07:30]
<jrosser> ah yes, there was a very bad FD leak from the amqp library, and once that was fixed there was an underlying eventpoll FD leak related to threading [07:37]
<jrosser> and it affects more than nova-compute for us, anything not running with uwsgi https://bugs.launchpad.net/openstack-ansible/+bug/1961603 [07:38]
<tobias-urdin> ack, yeah I assume we will start hitting that with more services not running with mod_wsgi when upgrading, from what i understand apps running under mod_wsgi should use pthreads [07:42]
<hberaud> tobias-urdin: Concerning the heartbeat in pthread option, we undeprecated it last year, so this option will remain, and so will the possibility to switch from greenthread to pthread https://review.opendev.org/c/openstack/oslo.messaging/+/800621 [07:50]
<tobias-urdin> ack [07:57]
<damani> sorry for the meeting yesterday [12:15]
<damani> i was at the doctor and then i forgot it, but we will do it next week [12:15]
<damani> /bu/bu44 [12:16]
<sean-k-mooney> hberaud: tobias-urdin yep it was undeprecated at my request because it could cause issues in the nova-compute agent in some cases if i recall correctly [12:35]
<sean-k-mooney> hberaud: tobias-urdin with that said we were recently chatting in #openstack-nova about some eventlet internals [12:36]
<sean-k-mooney> we think that we should perhaps remove the use of eventlet's spawn_n [12:37]
<sean-k-mooney> we think that using spawn_n can lead to leaking greenthreads over time when exceptions are raised or in some other cases [12:38]
<sean-k-mooney> tobias-urdin: so the inode leak could be related to using spawn_n [12:38]
<sean-k-mooney> https://github.com/eventlet/eventlet/issues/731#issuecomment-953761883 [12:39]
<hberaud> Not related to threading but also triggered by monkey patched env, DNS and sockets are also impacted by eventlet issues (https://github.com/celery/py-amqp/commit/98f6d364188215c2973693a79e461c7e9b54daef) (recently fixed) [12:40]
<sean-k-mooney> tobias-urdin: i'm considering doing https://github.com/eventlet/eventlet/issues/731#issuecomment-968135262 eventually. [12:40]
<sean-k-mooney> hberaud: actually that might not be needed anymore [12:41]
<sean-k-mooney> hberaud: https://github.com/openstack/nova/commit/fe1ebe69f358cbed62434da3f1537a94390324bb [12:42]
<sean-k-mooney> hberaud: i turned greendns back on recently [12:42]
<hberaud> oh cool [12:42]
<hberaud> good to know [12:42]
<sean-k-mooney> so on my ever growing todo list i want to see what happens if i globally do "eventlet.spawn_n = eventlet.spawn" [12:43]
<sean-k-mooney> and the rest of that comment in nova: 1) will it break anything and 2) will it help with the leaking of greenlets [12:43]
<sean-k-mooney> if it does i would like to then convert all the uses of spawn_n to spawn in nova and eventually in oslo [12:44]
<hberaud> great [12:44]
<hberaud> maybe I can help you on the oslo side? [12:44]
<sean-k-mooney> https://github.com/eventlet/eventlet/issues/731#issuecomment-968135262 is a bit of a hack but if that works we can put it behind a workaround option and run it in the ci for a while [12:45]
<sean-k-mooney> sure but i want to confirm it does not break the world first [12:45]
<hberaud> sure [12:45]
<sean-k-mooney> when we do this we are going to silently discard the reference to the greenthread initially in the places that are using spawn_n [12:46]
<sean-k-mooney> but that should not matter [12:46]
<sean-k-mooney> the api is otherwise the same [12:46]
<hberaud> I see [12:46]
<sean-k-mooney> but the semantics of using a greenthread vs a freestanding greenlet, we think, will help with the resource leaks, based on the comments from the upstream eventlet maintainer [12:47]
<sean-k-mooney> i'm not sure that we fully comprehended the full implications of what """The same as spawn(), but it’s not possible to know how the function terminated (i.e. no return value or exceptions).""" meant when we first started using spawn_n [12:49]
<sean-k-mooney> if an exception percolates all the way to the top of the call stack for the function we invoke with spawn_n, the greenlet just stays in the background after logging the traceback and i believe nothing handles it today [12:51]
<sean-k-mooney> mnaser triggered a GMR on a nova-compute that had 100s of greenlets in that state. [12:53]
<sean-k-mooney> tobias-urdin: nova has not recently started using spawn_n but the pthread change is relatively recent so maybe the two don't play nicely together [12:56]
<sean-k-mooney> tobias-urdin: if you do find something on the nova side be sure to file a bug and let us know [12:57]
<tobias-urdin> i'm rolled out the heartbeat_in_pthread=false change, will monitor that it doesn't leak for 1-2 days, but that's probably the issue based on the bug https://bugs.launchpad.net/oslo.messaging/+bug/1949964 [13:00]
<tobias-urdin> i've [13:00]
<tobias-urdin> but will do [13:00]
<jrosser> we get log noise from the oslo.cache etcd3gw backend turning into bug reports in openstack-ansible `Could not load 'oslo_cache.etcd3gw': No module named 'etcd3gw'` - are all the backends loaded regardless of whether we only use memcached? [13:27]
<opendevreview> Takashi Kajinami proposed openstack/taskflow master: Remove six https://review.opendev.org/c/openstack/taskflow/+/842114 [13:57]
<opendevreview> Takashi Kajinami proposed openstack/taskflow master: Remove six https://review.opendev.org/c/openstack/taskflow/+/842114 [14:44]
<sean-k-mooney> jrosser: i think the oslo_cache modules are loaded to register config options [14:59]
<sean-k-mooney> and i would guess the etcd one is unconditionally doing an import of etcd3gw [14:59]
<sean-k-mooney> hum the import is in the init actually [15:00]
<sean-k-mooney> https://github.com/openstack/oslo.cache/blob/master/oslo_cache/backends/etcd3gw.py#L43= [15:00]
<sean-k-mooney> ah it was fixed a year ago https://github.com/openstack/oslo.cache/commit/40946a9349407f36a43d5020d991085c11468698 [15:01]
<sean-k-mooney> jrosser: so ya now it won't fail since it will only do the import if you try to use it [15:01]
<sean-k-mooney> https://bugs.launchpad.net/oslo.cache/+bug/1928318 [15:01]
<sean-k-mooney> it's backported back to xena [15:02]
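The general pattern behind that kind of fix, deferring an optional import until the backend is actually used, can be sketched like this. It is an illustration only and is not the oslo.cache code; the class name LazyBackend and its client method are hypothetical.

```python
import importlib


class LazyBackend:
    """Backend wrapper that imports its optional dependency on first use."""

    def __init__(self, module_name):
        # Only remember the module name here, so constructing or
        # registering the backend never fails when the optional
        # dependency is not installed.
        self._module_name = module_name
        self._module = None

    def client(self):
        # Import on first real use; a missing dependency surfaces as
        # ImportError only for deployments that select this backend.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module
```

With this shape, a deployment that only uses memcached never triggers the import of the other backend's dependency, so no "Could not load" message is produced for it.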
<jrosser> sean-k-mooney: ah excellent - we get a steady drip of LP bugs to openstack-ansible about that log message [15:04]
<sean-k-mooney> i was pretty sure that got fixed since i remember seeing it in devstack often enough but i haven't in a while [15:09]
*** andrewbonney_ is now known as andrewbonney [16:29]
*** ricolin_ is now known as ricolin [16:29]
*** dansmith_ is now known as dansmith [16:55]
*** melwitt_ is now known as melwitt [18:08]
