Monday, 2023-05-22

bauzasgood morning07:27
gibio/08:05
ykarelsean-k-mooney[m], gibi can you please check https://bugs.launchpad.net/neutron/+bug/2015065 comment 7/8 08:06
ykarelrandomly one of the nova-api workers just gets stuck when making requests to neutron (not sure yet if the same is seen with any other service)08:07
gibiykarel: quickly looked at the bug. Thanks for collecting all that data. When nova-api is stuck calling neutronclient's show_security_group, do you see that the actual API request to neutron was sent but never received by neutron-server? Or is nova-api stuck on sending the message?08:42
ykarelgibi, i don't see the request received on neutron side, not sure where to check if it's stuck on sending08:49
gibiykarel: ack08:51
gibiykarel: 09:03
gibii feel like we are seeing an interesting interaction between multiple things09:03
gibiI'm trying to follow the stack trace from the latest comment from the bug to see where the neutronclient got stuck09:04
gibithe first interesting point is09:04
gibi/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py:28 in is_connection_dropped09:04
gibihttps://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/connectionpool.py#L27209:04
gibiso urllib3 tries to check if the existing client connection is still usable or got disconnected09:05
gibihttps://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/connection.py#L2809:05
gibiwait_for_read(sock, timeout=0.0)09:05
gibios it checks if it can read from the socket with 0.0 timeout 09:06
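A minimal stdlib-only sketch of the liveness check described above, assuming a plain socket: poll the idle connection with `select.select` and a 0.0 timeout, and treat "readable" as "the peer has probably closed it", since an idle keep-alive connection should have nothing to read. The helper name `looks_dropped` is hypothetical; this is not urllib3's code, which routes the same idea through its own `wait_for_read(sock, timeout=0.0)`.

```python
import select
import socket

def looks_dropped(sock: socket.socket) -> bool:
    """Hypothetical helper: True if an idle socket has become readable.

    On an idle keep-alive connection no data is expected, so readability
    usually means the peer closed it (a recv() would return b"").
    """
    # timeout=0.0 makes select() a pure poll: per the stdlib docs it never blocks.
    readable, _, _ = select.select([sock], [], [], 0.0)
    return bool(readable)

a, b = socket.socketpair()
print(looks_dropped(a))  # False: peer still open, nothing to read
b.close()
print(looks_dropped(a))  # True: peer closed, EOF is readable
a.close()
```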
gibihttps://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/wait.py#L84-L8509:06
gibithat 0.0 timeout is passed to python's select.select 09:06
gibihttps://docs.python.org/3.10/library/select.html#select.select09:07
gibi"The optional timeout argument specifies a time-out as a floating point number in seconds. When the timeout argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks."09:07
gibiso that select.select called with 0.0 should never block09:07
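To illustrate the stdlib contract quoted above, the sketch below times a `select.select` call with a 0.0 timeout on a connected socket that has no pending data. With the unpatched stdlib it returns at once with nothing readable; after eventlet's monkey patching the same call is routed to `eventlet/green/select.py` instead, which is where the traceback in the bug shows it blocking. The 0.5 s bound in the check is an arbitrary, generous assumption, not a measured figure.

```python
import select
import socket
import time

def timed_zero_poll(sock: socket.socket):
    """Poll sock once with timeout=0.0 and report how long select() took."""
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], 0.0)
    return readable, time.monotonic() - start

a, b = socket.socketpair()  # connected pair; nothing has been sent yet
try:
    readable, elapsed = timed_zero_poll(a)
    print(readable)  # []: no data pending, and select() did not wait for any
    print(f"select() returned after {elapsed:.6f}s")
finally:
    a.close()
    b.close()
```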
gibiBUT09:07
gibiin our env the eventlet monkey patching is changing python's select.select09:08
gibi/usr/local/lib/python3.10/dist-packages/eventlet/green/select.py:80 in select09:08
gibiand redirects it to implement the eventlet switching mechanism09:08
gibihttps://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L30-L80C3209:11
gibilooking at that code it seems eventlet sets a timer with the timeout value09:12
gibivia hub.schedule_call_global09:12
gibihere I'm getting lost in the eventlet code but I assume scheduling a timer with a 0.0 timeout in eventlet can be racy09:17
gibibased on the comment in https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L62-L6909:17
gibione could argue that what we see is an eventlet bug as select.select with timeout=0.0 should not ever block but it does block in our case.09:21
opendevreviewsuzhengwei proposed openstack/nova master: rename 'recreate' to 'evacuate'  https://review.opendev.org/c/openstack/nova/+/88381009:28
ykarelThanks gibi for checking. Is there any way the issue can be fixed or worked around on the nova side?10:24
gibiI'm trying to open an issue on eventlet and see if the maintainer agrees with my analysis or not. I don't see any easy workaround right now. Maybe sean-k-mooney or melwitt can see some10:25
gibiykarel: I will update the launchpad bug 10:25
ykarelThanks gibi 10:26
sean-k-mooneygibi: sorry i missed the start of this, what is the issue?10:28
gibinova-api uses neutronclient to call the neutron API but gets stuck forever10:29
gibibased on the stack trace it gets stuck checking if the previous connection is still usable10:30
gibiwe end up in eventlet monkeypatched select.select on a socket10:30
gibiwith a timeout 0.010:30
gibibased on the stdlib doc timeout 0.0 means non blocking but we still block10:31
gibiso I assume eventlet does not properly handle timeout 0.0 in its select impl10:31
sean-k-mooneyi see10:31
sean-k-mooneythat or it's python version specific10:31
sean-k-mooneybut ya sounds like an API compatibility bug10:31
gibidetails are here  https://bugs.launchpad.net/neutron/+bug/201506510:31
sean-k-mooneyChanged in version 3.7: The method no longer toggles SOCK_NONBLOCK flag on socket.type.10:36
sean-k-mooneyhttps://docs.python.org/3/library/socket.html#socket.socket.settimeout10:36
sean-k-mooneyykarel: gibi: it looks like we should not be using 0.0 to make it non-blocking after 3.710:37
sean-k-mooneywe should be using socket.setblocking(False)10:38
gibisean-k-mooney: https://docs.python.org/3.10/library/select.html#select.select for select.select timeout=0.0 still means 10:38
gibinon blocking10:38
gibiso it is not the socket that is set to non blocking mode, it is select called with non blocking timeout10:38
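For reference, the Python socket docs state that `sock.setblocking(False)` is equivalent to `sock.settimeout(0.0)`, and the 3.7 change quoted above only concerns whether `SOCK_NONBLOCK` shows up in `socket.type`. A small sketch comparing the two calls; as gibi points out, neither is what urllib3 does here, since its 0.0 is a `select()` timeout, not a socket mode.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s1, \
        socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s2:
    s1.settimeout(0.0)     # non-blocking via a zero timeout
    s2.setblocking(False)  # non-blocking via the explicit API

    modes = [(s.gettimeout(), s.getblocking()) for s in (s1, s2)]
    print(modes)  # [(0.0, False), (0.0, False)]

    # Since Python 3.7 neither call folds SOCK_NONBLOCK into socket.type.
    print(all(s.type == socket.SOCK_STREAM for s in (s1, s2)))  # True
```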
sean-k-mooneyi feel like this is still a logic bug on our part10:40
sean-k-mooneywe likely should be setting the timeout to a non-zero value and setting the socket to non-blocking10:40
gibithis is urllib3, not our code10:40
sean-k-mooneyfun10:42
sean-k-mooneygibi: as far as i can tell eventlet support for 3.10 is still not fully complete10:44
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: [wip]network: convert usage of neutronclient to openstacksdk  https://review.opendev.org/c/openstack/nova/+/88271410:53
sean-k-mooneygibi: so it does look like it's always a blocking operation https://github.com/eventlet/eventlet/blob/master/eventlet/green/select.py#L30-L3610:53
gibifiled https://github.com/eventlet/eventlet/issues/79810:54
gibisean-k-mooney: as far as I understand the eventlet code they do block but set up a timer to wake up after the given timeout 10:55
sean-k-mooney    if timeout is not None:10:55
gibiso I think that can race and the code can miss the wakeup as it has not reached hub.switch() yet10:55
sean-k-mooney        timers.append(hub.schedule_call_global(timeout, on_timeout))10:55
sean-k-mooneyso yes10:55
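The race gibi suspects is a classic lost wakeup: a notification that fires before the waiter has parked is dropped if the primitive keeps no memory of it. Below is a thread-based analogy, not eventlet's hub code, and it does not confirm the eventlet hypothesis: `threading.Condition.notify()` with no waiter is simply a no-op, while `threading.Event.set()` is remembered. The `limit` safety timeout is an assumption added so the demo cannot hang; the log describes nova-api waiting forever with nothing to end the wait.

```python
import threading

def wait_with_condition(limit: float) -> bool:
    """Early notify on a Condition: the wakeup is lost."""
    cond = threading.Condition()
    # The "timer" fires (think timeout=0.0) before the waiter has parked itself.
    with cond:
        cond.notify()  # no waiter registered yet, so this wakeup goes nowhere
    with cond:
        return cond.wait(timeout=limit)  # False: only the safety limit ended the wait

def wait_with_event(limit: float) -> bool:
    """Same early wakeup on an Event: the state is remembered."""
    ev = threading.Event()
    ev.set()  # fires before anyone waits
    return ev.wait(timeout=limit)  # True: returns immediately

print(wait_with_condition(0.2))  # False
print(wait_with_event(0.2))      # True
```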
sean-k-mooneybut they probably need to instead spawn this in a separate greenthread10:56
sean-k-mooneyalthough based on the assert10:56
sean-k-mooneyhttps://github.com/eventlet/eventlet/blob/master/eventlet/green/select.py#L4010:56
sean-k-mooneythey are expecting you to be spawning this in a thread pool or similar i think10:57
sean-k-mooneynot that we have control over this really10:57
gibiyeah10:58
sean-k-mooneyi guess we will see what they say10:58
sean-k-mooneyfor what its worth this is happening in tempest right10:58
gibialso I don't think I fully understand the logic of the double timer in https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L59-L7210:58
gibibut it feels scary10:58
sean-k-mooneyi'm surprised we are using eventlet in tempest10:58
gibiwe need eventlet in nova-api for the scatter-gather, do we? 10:59
gibidon't we?10:59
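gibi mentions that nova-api needs eventlet for scatter-gather, i.e. fanning one request out to several targets concurrently and collecting whatever answers within a deadline. As a rough illustration of what a pure-threads version could look like, here is a sketch with `concurrent.futures` (Python 3.9+ for `cancel_futures`). It is not nova's implementation; `query_cell`, the cell names and the `TIMED_OUT` marker are hypothetical. A deadline only frees the caller: a worker thread stuck in a hung call, like the one in this bug, keeps running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

TIMED_OUT = "timed out"  # hypothetical marker for targets that missed the deadline

def scatter_gather(targets, fn, timeout):
    """Call fn(target) for every target concurrently and gather the results.

    Targets that have not answered within `timeout` seconds are reported as
    TIMED_OUT; exceptions raised by fn are returned as the result value.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(targets)))
    futures = {pool.submit(fn, t): t for t in targets}
    done, not_done = wait(futures, timeout=timeout)
    results = {}
    for fut in done:
        exc = fut.exception()
        results[futures[fut]] = exc if exc is not None else fut.result()
    for fut in not_done:
        results[futures[fut]] = TIMED_OUT
    # Do not wait for stragglers; their threads still run until fn returns.
    pool.shutdown(wait=False, cancel_futures=True)
    return results

def query_cell(name):
    """Hypothetical per-cell query; cell2 is deliberately slow."""
    time.sleep(0.5 if name == "cell2" else 0.01)
    return f"{name}: ok"

results = scatter_gather(["cell0", "cell1", "cell2"], query_cell, timeout=0.2)
print(sorted(results.items()))
```

With these timings cell0 and cell1 should answer and cell2 should be reported as `TIMED_OUT`.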
sean-k-mooneyoh sorry i thought this was the tempest client that was blocking11:00
sean-k-mooneynot in nova11:00
gibiit is nova-api using neutron client to call neutron API11:00
gibithen nova-api blocks11:01
sean-k-mooneyi see11:01
gibiand therefore tempest call to nova-api timeouts too11:01
sean-k-mooneyyep yep yep 11:01
sean-k-mooneyfollowing now11:01
gibiOK11:01
sean-k-mooneyya thats not good11:01
gibiI will stop here now and move over to golang for another set of challenges :)11:01
gibilet's see if the eventlet maintainers have some ideas11:01
gibianother way would be to ask urllib3 maintainers to change their side11:02
sean-k-mooneywell eventlet is monkeypatching urllib311:02
sean-k-mooneyhttps://github.com/eventlet/eventlet/blob/master/eventlet/green/urllib/request.py11:03
gibi(or we can rip out eventlet from nova and move to pure threads in the next couple of years :))11:03
sean-k-mooneywell nova-api didn't have any direct dependency on eventlet until we added scatter-gather11:03
sean-k-mooneyso if it was just that we could rip that out 11:03
sean-k-mooneybut this could obviously happen for all the other client invocations11:04
sean-k-mooneyin the other services so ya...11:04
gibihm, that seems to only monkey patch urllib, not urllib3 11:05
sean-k-mooneymaybe i didn't look too closely i assumed it was the same11:05
gibiit seems for urllib3 the socket and select monkey patching is considered enough11:05
opendevreviewMerged openstack/os-vif stable/wallaby: Use TCP keepalives for ovsdb connections  https://review.opendev.org/c/openstack/os-vif/+/84177311:07
opendevreviewMerged openstack/os-vif stable/wallaby: only register tables used by os-vif  https://review.opendev.org/c/openstack/os-vif/+/84177411:07
sean-k-mooneyoh, urllib is the standard library11:08
sean-k-mooneyhttps://docs.python.org/3/library/urllib.html?highlight=urllib#module-urllib11:09
opendevreviewAmit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes  https://review.opendev.org/c/openstack/nova/+/88145711:32
opendevreviewAmit Uniyal proposed openstack/nova master: WIP: Delete dangling bdms  https://review.opendev.org/c/openstack/nova/+/88228411:32
*** dmellado90 is now known as dmellado12:48
opendevreviewDan Smith proposed openstack/nova master: Populate ComputeNode.service_id  https://review.opendev.org/c/openstack/nova/+/87990413:37
opendevreviewDan Smith proposed openstack/nova master: Add compute_id columns to instances, migrations  https://review.opendev.org/c/openstack/nova/+/87949913:37
opendevreviewDan Smith proposed openstack/nova master: Add dest_compute_id to Migration object  https://review.opendev.org/c/openstack/nova/+/87968213:37
opendevreviewDan Smith proposed openstack/nova master: Add compute_id to Instance object  https://review.opendev.org/c/openstack/nova/+/87950013:37
opendevreviewDan Smith proposed openstack/nova master: Online migrate missing Instance.compute_id fields  https://review.opendev.org/c/openstack/nova/+/87990513:37
opendevreviewSylvain Bauza proposed openstack/nova stable/zed: Fix get_segments_id with subnets without segment_id  https://review.opendev.org/c/openstack/nova/+/88372315:39
opendevreviewSylvain Bauza proposed openstack/nova stable/yoga: Fix get_segments_id with subnets without segment_id  https://review.opendev.org/c/openstack/nova/+/88372415:40
opendevreviewSylvain Bauza proposed openstack/nova stable/xena: Fix get_segments_id with subnets without segment_id  https://review.opendev.org/c/openstack/nova/+/88372515:41
opendevreviewSylvain Bauza proposed openstack/nova stable/wallaby: Fix get_segments_id with subnets without segment_id  https://review.opendev.org/c/openstack/nova/+/88372615:42
sean-k-mooneyo/15:50
sean-k-mooneymelwitt: bauzas dansmith could ye take a look at https://review.opendev.org/c/openstack/nova/+/853269/1 and the follow ups15:53
melwittsure15:54
bauzasdone15:54
sean-k-mooneythanks15:55
melwittsweet15:55
dansmithso if powerkvm is gone, why do we still have its CI commenting on all our patches? Looks to me like it can't even devstack anymore16:20
dansmithmaybe it's just a bot in a cloud somewhere that someone forgot to turn off?16:20
opendevreviewMerged openstack/nova stable/xena: Reproducer for bug 1983753  https://review.opendev.org/c/openstack/nova/+/85326916:23
clarkbdansmith: there should be a contact email in the account as well as on the wiki for the third party ci. Asking them to stop would be the first step and if they don't then we (a gerrit admin) can disable the account16:30
dansmithokay, the "lightbits" CI also seems superfluous, but I think we've asked before and nobody knows what it is?16:33
sean-k-mooneyi do16:33
sean-k-mooneyit's a CI that tests their os-brick integration16:33
sean-k-mooneyits testing changes to https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume/lightos.py16:34
sean-k-mooneyhttps://github.com/openstack/nova/commit/b5e2128f3847d444a808a2b0f89e6f1e4ffb77fc16:34
sean-k-mooneywe asked them to set it up and maintain it when https://www.lightbitslabs.com/ wanted to integrate with nova16:35
sean-k-mooneywhich they did in yoga so it's relatively new16:35
sean-k-mooneybasically they upstreamed the integration they had internally for several years16:36
dansmithhmm, okay16:37
dansmiththe powerkvm contact either never made it to oftc or isn't around16:39
dansmithnever registered on oftc at least16:40
dansmithI can email to ask16:40
opendevreviewMerged openstack/nova stable/xena: Update RequestSpec.pci_request for resize  https://review.opendev.org/c/openstack/nova/+/85327017:17
opendevreviewMerged openstack/nova stable/xena: Add reno for fixing bug 1941005  https://review.opendev.org/c/openstack/nova/+/85327117:17
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (db)  https://review.opendev.org/c/openstack/nova/+/83119317:24
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (objects)  https://review.opendev.org/c/openstack/nova/+/83940117:24
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (manila abstraction)  https://review.opendev.org/c/openstack/nova/+/83119417:24
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (drivers and compute manager part)  https://review.opendev.org/c/openstack/nova/+/83309017:24
opendevreviewribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api)  https://review.opendev.org/c/openstack/nova/+/83683017:24
opendevreviewribaudr proposed openstack/nova master: Check shares support  https://review.opendev.org/c/openstack/nova/+/85049917:24
opendevreviewribaudr proposed openstack/nova master: Add metadata for shares  https://review.opendev.org/c/openstack/nova/+/85050017:24
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach notification  https://review.opendev.org/c/openstack/nova/+/85050117:24
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach notification  https://review.opendev.org/c/openstack/nova/+/85102817:24
opendevreviewribaudr proposed openstack/nova master: Add shares to InstancePayload  https://review.opendev.org/c/openstack/nova/+/85102917:24
opendevreviewribaudr proposed openstack/nova master: Add helper methods to attach/detach shares  https://review.opendev.org/c/openstack/nova/+/85208517:24
opendevreviewribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working.  https://review.opendev.org/c/openstack/nova/+/85208617:24
opendevreviewribaudr proposed openstack/nova master: Add virt/libvirt error test cases  https://review.opendev.org/c/openstack/nova/+/85208717:24
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/85482317:24
opendevreviewribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/85482417:24
opendevreviewribaudr proposed openstack/nova master: Add instance.share_attach_error notification  https://review.opendev.org/c/openstack/nova/+/86028217:24
opendevreviewribaudr proposed openstack/nova master: Add instance.share_detach_error notification  https://review.opendev.org/c/openstack/nova/+/86028317:24
opendevreviewribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/86028417:25
opendevreviewribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028517:25
opendevreviewribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares  https://review.opendev.org/c/openstack/nova/+/86028617:25
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part)  https://review.opendev.org/c/openstack/nova/+/86028717:25
opendevreviewribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/86028817:25
opendevreviewribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process  https://review.opendev.org/c/openstack/nova/+/88007517:25
opendevreviewribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion  https://review.opendev.org/c/openstack/nova/+/88147217:25
opendevreviewribaudr proposed openstack/nova master: Docs about Manila shares API usage  https://review.opendev.org/c/openstack/nova/+/87164217:25
opendevreviewribaudr proposed openstack/nova master: Allow to mount manila share using Cephfs protocol  https://review.opendev.org/c/openstack/nova/+/88386217:25
opendevreviewMerged openstack/nova master: Fixes a typo in availability-zone doc  https://review.opendev.org/c/openstack/nova/+/88347419:49
opendevreviewGhanshyam proposed openstack/nova master: DNM testing cirros 0.6.1  https://review.opendev.org/c/openstack/nova/+/88387520:15