*** k_mouza has joined #openstack-nova | 01:07 | |
*** k_mouza has quit IRC | 01:12 | |
*** k_mouza has joined #openstack-nova | 01:27 | |
*** gmann_afk is now known as gmann | 01:30 | |
*** k_mouza has quit IRC | 01:32 | |
*** sapd1 has joined #openstack-nova | 01:54 | |
*** sapd1_x has quit IRC | 01:56 | |
*** sapd1 has quit IRC | 02:03 | |
*** sapd1 has joined #openstack-nova | 02:03 | |
*** psachin has joined #openstack-nova | 02:49 | |
*** sapd1_x has joined #openstack-nova | 03:17 | |
*** mkrai has joined #openstack-nova | 04:23 | |
*** k_mouza has joined #openstack-nova | 04:43 | |
*** k_mouza has quit IRC | 04:47 | |
*** k_mouza has joined #openstack-nova | 04:55 | |
*** ratailor has joined #openstack-nova | 04:58 | |
*** k_mouza has quit IRC | 04:59 | |
*** k_mouza has joined #openstack-nova | 05:03 | |
*** k_mouza has quit IRC | 05:08 | |
*** Alon_KS has quit IRC | 05:22 | |
*** ralonsoh has joined #openstack-nova | 05:28 | |
*** mkrai_ has joined #openstack-nova | 05:34 | |
*** sapd1_x has quit IRC | 05:37 | |
*** mkrai has quit IRC | 05:37 | |
*** Alon_KS has joined #openstack-nova | 05:45 | |
*** sapd1_x has joined #openstack-nova | 06:01 | |
*** k_mouza has joined #openstack-nova | 06:16 | |
*** k_mouza has quit IRC | 06:20 | |
*** sapd1_x has quit IRC | 06:25 | |
*** slaweq has joined #openstack-nova | 06:33 | |
*** Alon_KS has quit IRC | 06:37 | |
*** Alon_KS has joined #openstack-nova | 06:41 | |
*** mkrai_ has quit IRC | 07:09 | |
*** sapd1_x has joined #openstack-nova | 07:09 | |
*** tosky has joined #openstack-nova | 07:20 | |
*** andrewbonney has joined #openstack-nova | 07:31 | |
*** slaweq has quit IRC | 07:32 | |
*** k_mouza has joined #openstack-nova | 07:33 | |
*** k_mouza has quit IRC | 07:34 | |
*** k_mouza has joined #openstack-nova | 07:34 | |
*** slaweq has joined #openstack-nova | 07:35 | |
*** vishalmanchanda has joined #openstack-nova | 07:41 | |
*** lucasagomes has joined #openstack-nova | 07:58 | |
lyarwood | \o morning | 08:01 |
---|---|---|
*** derekh has joined #openstack-nova | 08:05 | |
*** mkrai_ has joined #openstack-nova | 08:06 | |
*** ociuhandu has joined #openstack-nova | 08:23 | |
*** stephenfin has quit IRC | 08:27 | |
lyarwood | stephenfin: would you mind hitting these again if you're around today https://review.opendev.org/q/topic:%22bug%252F1928063%22+(status:open%20OR%20status:merged) | 08:28 |
*** Hazelesque has joined #openstack-nova | 08:47 | |
*** Hazelesque is now known as Hazelesque_ | 09:02 | |
*** Hazelesque_ is now known as Hazelesque__ | 09:02 | |
*** Hazelesque__ is now known as Hazelesque | 09:02 | |
kevinz | lyarwood: stephenfin: morning! Could you help review this when it's convenient? https://review.opendev.org/c/openstack/nova/+/763928 It is about live migration support on Arm64 | 09:03 |
* lyarwood clicks | 09:04 | |
*** sapd1_y has joined #openstack-nova | 09:19 | |
*** sapd1_x has quit IRC | 09:19 | |
*** sapd1 has quit IRC | 09:19 | |
*** ociuhandu has quit IRC | 09:34 | |
*** ociuhandu has joined #openstack-nova | 09:35 | |
*** sapd1 has joined #openstack-nova | 09:41 | |
*** ociuhandu has quit IRC | 09:43 | |
*** stephenfin has joined #openstack-nova | 09:49 | |
*** jangutter_ has joined #openstack-nova | 09:50 | |
*** jangutter has quit IRC | 09:53 | |
*** owalsh has quit IRC | 10:05 | |
*** sapd1 has quit IRC | 10:16 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: Add generic mdevs to Nova https://review.opendev.org/c/openstack/nova-specs/+/792796 | 10:21 |
*** owalsh has joined #openstack-nova | 10:27 | |
*** sapd1_y has quit IRC | 10:29 | |
*** sapd1 has joined #openstack-nova | 10:30 | |
*** sapd1_x has joined #openstack-nova | 10:31 | |
*** sapd1_x has quit IRC | 10:31 | |
*** sapd1_x has joined #openstack-nova | 10:32 | |
*** lpetrut has joined #openstack-nova | 10:43 | |
*** pmannidi has joined #openstack-nova | 10:43 | |
*** pmannidi has quit IRC | 10:43 | |
*** ociuhandu has joined #openstack-nova | 10:47 | |
*** ociuhandu has quit IRC | 11:00 | |
*** ociuhandu has joined #openstack-nova | 11:03 | |
*** ociuhandu has quit IRC | 11:15 | |
*** sapd1 has quit IRC | 11:15 | |
*** mkrai_ has quit IRC | 11:19 | |
*** ociuhandu has joined #openstack-nova | 11:27 | |
*** ociuhandu has quit IRC | 11:31 | |
*** ociuhandu has joined #openstack-nova | 11:34 | |
*** ociuhandu has quit IRC | 11:38 | |
*** ociuhandu has joined #openstack-nova | 11:38 | |
*** ociuhandu has quit IRC | 11:39 | |
*** ociuhandu has joined #openstack-nova | 11:41 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Move libvirt-specific fixtures https://review.opendev.org/c/openstack/nova/+/790969 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Add os-brick fixture https://review.opendev.org/c/openstack/nova/+/790970 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Rename 'ImageBackendFixture' to 'LibvirtImageBackendFixture' https://review.opendev.org/c/openstack/nova/+/792353 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Create a fixture around fake_notifier https://review.opendev.org/c/openstack/nova/+/758446 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Use NotificationFixture for legacy notifications too https://review.opendev.org/c/openstack/nova/+/758448 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Test the NotificationFixture https://review.opendev.org/c/openstack/nova/+/758450 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Move fake_notifier impl under NotificationFixture https://review.opendev.org/c/openstack/nova/+/758451 | 11:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: rpc: Mark attributes as private https://review.opendev.org/c/openstack/nova/+/792803 | 11:41 |
*** sapd1 has joined #openstack-nova | 11:42 | |
*** ociuhandu has quit IRC | 11:47 | |
*** ociuhandu has joined #openstack-nova | 11:59 | |
*** ociuhandu has quit IRC | 12:02 | |
*** ociuhandu has joined #openstack-nova | 12:03 | |
*** sapd1_x has quit IRC | 12:11 | |
*** sapd1 has quit IRC | 12:12 | |
*** links has joined #openstack-nova | 12:25 | |
*** psachin has quit IRC | 12:28 | |
openstackgerrit | Merged openstack/nova master: tests: Move libvirt-specific fixtures https://review.opendev.org/c/openstack/nova/+/790969 | 12:34 |
openstackgerrit | Merged openstack/nova master: tests: Add os-brick fixture https://review.opendev.org/c/openstack/nova/+/790970 | 12:35 |
sean-k-mooney | stephenfin: are you rewriting all the test fixtures again :P | 12:36 |
openstackgerrit | Merged openstack/nova master: tests: Rename 'ImageBackendFixture' to 'LibvirtImageBackendFixture' https://review.opendev.org/c/openstack/nova/+/792353 | 12:36 |
lyarwood | I didn't like backporting func tests anyway | 12:36 |
lyarwood | /s | 12:37 |
sean-k-mooney | :) | 12:37 |
sean-k-mooney | we just need to get all our customers to deploy master at all times | 12:37 |
sean-k-mooney | problem solved | 12:37 |
* lyarwood nods | 12:39 | |
*** bbowen has joined #openstack-nova | 12:41 | |
*** elod has quit IRC | 12:56 | |
*** elod has joined #openstack-nova | 13:05 | |
*** sapd1 has joined #openstack-nova | 13:11 | |
*** ociuhandu has quit IRC | 13:40 | |
lyarwood | Random, as an admin I can't seem to list server events for a user defined instance? | 13:48 |
lyarwood | $ openstack server event list srvtree-server1-rh5da6maeidh | 13:49 |
lyarwood | No server with a name or ID of 'srvtree-server1-rh5da6maeidh' exists. | 13:49 |
*** ratailor has quit IRC | 13:49 | |
stephenfin | @lyarwood If you're in a different project then that won't work | 13:51 |
lyarwood | even as the admin? | 13:51 |
lyarwood | okay weird | 13:51 |
lyarwood | I thought this worked previously | 13:51 |
stephenfin | because the name-based search is done against the instances in the project | 13:51 |
stephenfin | I doubt it. A couple of commands have an '--all-projects' option just for this | 13:52 |
* stephenfin double checks to make sure he didn't change anything recently, just in case | 13:52 | |
lyarwood | ah right I was likely using the UUID in the past | 13:53 |
stephenfin | Nope, I added missing options but it's otherwise unchanged, save for some docs, since it was added in 2017 | 13:53 |
stephenfin | I'd say so | 13:54 |
*** ociuhandu has joined #openstack-nova | 13:54 | |
lyarwood | Yeah sorry, I've used UUIDs everywhere in the request-id presentation and that just threw me | 13:56 |
*** ociuhandu has quit IRC | 13:59 | |
*** ociuhandu has joined #openstack-nova | 14:01 | |
*** jangutter has joined #openstack-nova | 14:16 | |
*** jangutter has quit IRC | 14:16 | |
*** jangutter_ has quit IRC | 14:16 | |
*** jangutter has joined #openstack-nova | 14:16 | |
*** ociuhandu has quit IRC | 14:25 | |
*** martinkennelly has joined #openstack-nova | 14:34 | |
lyarwood | test_volume_backed_live_migration keeps failing on master at the moment btw | 14:35 |
*** ociuhandu has joined #openstack-nova | 14:38 | |
*** ociuhandu has quit IRC | 14:42 | |
*** ociuhandu has joined #openstack-nova | 14:43 | |
*** jangutter_ has joined #openstack-nova | 14:47 | |
lyarwood | Ah I see why, slow nodes and pre_live_migration is timing out | 14:49 |
*** lpetrut has quit IRC | 14:50 | |
*** jangutter has quit IRC | 14:51 | |
*** dklyle has joined #openstack-nova | 14:51 | |
*** jangutter has joined #openstack-nova | 15:03 | |
*** jangutter_ has quit IRC | 15:07 | |
*** raildo has joined #openstack-nova | 15:32 | |
*** jangutter_ has joined #openstack-nova | 15:33 | |
melwitt | stephenfin: were you planning to work on trying to remove eventlet or would you mind if I took a try at it? the ptg discussion on that occurred earlier than I had come online that day | 15:35 |
stephenfin | melwitt: go for it. I am hoping to do something on it but it's a potential minefield that would benefit from many eyes | 15:36 |
*** jangutter has quit IRC | 15:37 | |
melwitt | stephenfin: cool, thanks. I've gone quite down the rabbit hole related to eventlet on some downstream bugs, so I've some ideas now (fortunately or unfortunately) | 15:41 |
*** lucasagomes has quit IRC | 16:02 | |
dansmith | melwitt: this is remove eventlet from api right? | 16:06 |
melwitt | dansmith: and potentially everything else too | 16:07 |
dansmith | melwitt: nothing else is threadsafe so I have a hard time imagining that being a thing | 16:09 |
*** jangutter has joined #openstack-nova | 16:09 | |
melwitt | tl;dr is there's a bad interaction and failure mode between gevent/eventlet and pymysql and mysqlconnector wherein if a green thread is killed before a connection is cleaned up, it leaves it in an inconsistent state and the next attempt to use a connection blows up | 16:10 |
melwitt | I've spoken at length with zzzeek about this and my understanding is this can't be worked around or handled and that replacing our usage of eventlet with native threading or similar is the only way to avoid it | 16:11 |
dansmith | is there a pointer to something to read about it? | 16:12 |
melwitt | yeah, sec | 16:12 |
*** ociuhandu has quit IRC | 16:12 | |
*** jangutter_ has quit IRC | 16:13 | |
dansmith | api and conductor going to native threads are doable I think without too much crazy, | 16:15 |
dansmith | but I think compute will be a nightmare, but it also doesn't use the DB driver at all, so it should be immune | 16:15 |
melwitt | dansmith: this comment contains the relevant references https://bugzilla.redhat.com/show_bug.cgi?id=1927994#c45 the rest of that bug has a lot of comments, most of which are private because I'm not sure they help bring any clarity. it's been a long discussion on there | 16:15 |
openstack | melwitt: Error: Error getting bugzilla.redhat.com bug #1927994: NotPermitted | 16:15 |
melwitt | oh, the entire bug looks to be private /facepalm | 16:16 |
melwitt | https://github.com/PyMySQL/PyMySQL/issues/234 | 16:16 |
melwitt | https://github.com/sqlalchemy/sqlalchemy/issues/3258 | 16:16 |
melwitt | https://github.com/PyMySQL/PyMySQL/issues/260 | 16:16 |
melwitt | those are the references ^ | 16:16 |
dansmith | thanks, I can read it at least | 16:17 |
dansmith | so the assertion is that all openstack projects will have to undertake this conversion? | 16:17 |
dansmith | that's a pretty big deal | 16:17 |
melwitt | for whatever reason nova is the only one that seems affected by this, this error has not been found in any other service's logs in the same deployments that see it in nova often | 16:18 |
dansmith | and only nova-api? | 16:19 |
melwitt | and I think most of the appearance of the error is from back when we had the eventlet-based wsgi server for nova-api. my guess is that since moving away from that, our usage is much reduced and reduces the chances of hitting this | 16:19 |
melwitt | no I have seen traces in nova-scheduler and nova-conductor as well | 16:19 |
dansmith | okay, because nova-api already runs with a combination of native and green threads, so if it was just api, then that could be why, | 16:20 |
dansmith | but if scheduler and conductor see it as well, I'm not sure why no other service would be affected | 16:20 |
melwitt | yeah, I'm guessing it's because it becomes very rare when the only things using eventlet are the periodic tasks, timers/retries, and some bits of scatter/gather. the error shows up under high load and coexists with other various connection errors to the database | 16:22 |
melwitt | well, maybe "very rare" is not a good way to put it, "more rare" | 16:23 |
melwitt | I have searched for some way that nova does something different than any other service wrt to database access and found nothing | 16:25 |
melwitt | the only lead we have so far is that nova uses eventlet more than any other services do (afaik so far) | 16:26 |
sean-k-mooney | melwitt: i assume we have not had any downstream sqlalchemy updates in rhel 7 | 16:26 |
sean-k-mooney | does this happen in modern openstack? that was for 13, so queens | 16:27 |
melwitt | sean-k-mooney: what do you mean, like version changes? I don't think so but zzzeek would have covered that | 16:27 |
sean-k-mooney | yep or perhaps a backport that could have caused it | 16:28 |
melwitt | sean-k-mooney: to your other question, I could find no bug reports for this for newer than 13 | 16:28 |
melwitt | that could either mean it went away or that it happens rarely enough that no one has bothered reporting it. not sure what to think | 16:29 |
dansmith | melwitt: but nothing changed about conductor and scheduler that would account for them not showing the issue in later versions | 16:30 |
dansmith | meaning api going from eventlet to real wsgi is a change that could affect this, but that doesn't impact the other services | 16:30 |
melwitt | zzzeek anyway strongly recommended we stop using eventlet for its known issues with the mysql connectors | 16:30 |
dansmith | I've read the bug now, and I see that he has, | 16:31 |
dansmith | but it's not quite as simple as just turning it off | 16:31 |
melwitt | dansmith: yeah, I appreciate that. but it's hard to know if this has just not been reported or if it's really not there anymore | 16:31 |
sean-k-mooney | i wonder is this wsgi related | 16:31 |
dansmith | so much of what goes on changes if you don't have those yield points patched in | 16:32 |
dansmith | sean-k-mooney: melwitt says she has reports of it from scheduler and conductor, which is what really puzzles me | 16:32 |
melwitt | yeah, I know it's not as simple as turning it off but afaict we could switch everything to native threads. I've started looking into it (just by having to explain to the customer where/when it's workaroundable and when it's not) | 16:32 |
sean-k-mooney | ok i was wondering if it was related to the issue with enabling multiple threads in the wsgi process | 16:32 |
sean-k-mooney | but if its in the scheduler and conductor its not that | 16:33 |
dansmith | melwitt: native threads mean a ton of code can race that can't race now | 16:33 |
sean-k-mooney | dansmith: well im not sure about "cant race now" but i agree it would be more | 16:33 |
melwitt | dansmith: yeah, those logs are attached to the case and accessible on supportshell if you're interested in looking | 16:33 |
sean-k-mooney | the GIL will save us somewhat but not entirely | 16:34 |
dansmith | sean-k-mooney: not all code, a ton of code.. there is a whole class of stuff that can't race because it's single threaded and can't overlap except at schedule points.. all of that stuff will suddenly be actually parallel | 16:34 |
dansmith | sean-k-mooney: it will save individual accesses to data structures, but not multiple statements making changes | 16:34 |
melwitt | maybe at the very least we could use futurist and make it configurable whether to use eventlet or native threading, and default to eventlet so as not to change existing behavior for those it works ok for | 16:34 |
sean-k-mooney | ya thats true | 16:35 |
melwitt | and let people like these customers try out the native threading and let us know if it works well in a real deployment or not | 16:35 |
dansmith | well, any change would have to be gradual like that I think.. meaning a switch to flip that we keep around for a while | 16:35 |
dansmith | otherwise we're going to flip the switch and not find out if we broke everyone for 18 months :) | 16:35 |
sean-k-mooney | we need "osapi_compute_workers" to be 1 though right and scale that via the process. | 16:36 |
melwitt | yeah, a good point | 16:36 |
dansmith | sean-k-mooney: that's a different concern I think | 16:37 |
sean-k-mooney | maybe the reinit issues for that have been fixed? but we used to have issues with the multiple interpreters running in the same wsgi process because of how it reloaded | 16:37 |
dansmith | not sure we need _workers at the point where we're actually natively threaded | 16:37 |
melwitt | sean-k-mooney: osapi_compute_workers is actually the number of processes but the wsgi.default_pool_size defaults to 1000 and represents the number of green threads for the nova-api eventlet based wsgi server | 16:38 |
sean-k-mooney | melwitt: ah ok | 16:38 |
sean-k-mooney | its https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.osapi_compute_workers | 16:39 |
sean-k-mooney | im still not sure it makes sense to use that when running under a wsgi service | 16:41 |
dansmith | right, | 16:41 |
dansmith | that's unrelated I think | 16:41 |
dansmith | we'll never spawn our own worker processes when under uwsgi, AFAIK, we'll only spawn (green)threads | 16:41 |
melwitt | yeah I think with uwsgi or mod_wsgi the number of processes is configured by their respective configs | 16:43 |
melwitt | the osapi_compute_workers is for other services or the old eventlet wsgi server we had provided back then https://github.com/openstack/nova/blob/stable/queens/nova/wsgi.py#L75 | 16:43 |
sean-k-mooney | apparently its never used directly in the nova code | 16:44 |
melwitt | er sorry, osapi_compute_workers was only for nova-api. the other services have their own "workers" settings which map to the oslo.service workers | 16:44 |
dansmith | right, it's for when we spawn our own master and sub processes and listen on the socket ourselves | 16:44 |
sean-k-mooney | https://codesearch.opendev.org/?q=osapi_compute_workers&i=nope&files=&excludeFiles=&repos=openstack/nova | 16:45 |
melwitt | sean-k-mooney: it was here https://github.com/openstack/nova/blob/stable/queens/nova/service.py#L364 | 16:46 |
sean-k-mooney | im wondering if it's still used since it does not appear to be | 16:46 |
sean-k-mooney | anyway its probably unrelated to the db error | 16:48 |
melwitt | dansmith: I was thinking one of the reasons nova sees this more is because we use the eventlet executor for oslo.messaging any maybe other projects don't. that opens up a lot more chances to hit the error, I think | 16:48 |
melwitt | s/any/and/ | 16:48 |
sean-k-mooney | melwitt: isn't that the default executor | 16:48 |
dansmith | melwitt: as opposed to what? synchronous waiting? | 16:48 |
melwitt | they have a native threads executor | 16:48 |
dansmith | melwitt: but if eventlet is monkeypatching then they're the same I think | 16:49 |
dansmith | I mean, effectively the same | 16:49 |
melwitt | iiuc with the eventlet one, any rpc call coming into a service is in a green thread, so if it collides with a periodic task or scatter/gather, that's a chance for it to happen | 16:49 |
melwitt | dansmith: yeah, that is true but if other projects use the native threads executor and don't monkey patch that might be why they don't see it. afaik nova is the only project that monkey patches | 16:50 |
dansmith | melwitt: but if python's own threading library gets patched, the "native" one will be spawning green threads too | 16:50 |
dansmith | melwitt: really? | 16:50 |
sean-k-mooney | i dont think they do | 16:51 |
melwitt | dansmith: yeah, I know, the configurable thing would only monkey patch if configured for eventlet, right? | 16:51 |
melwitt | sean-k-mooney: you don't think they monkey patch? | 16:51 |
melwitt | or you think they do | 16:51 |
dansmith | melwitt: I don't parse the configurable question | 16:51 |
dansmith | I'm not sure what the point of using eventlet without monkey patching is | 16:51 |
dansmith | otherwise you're just fully synchronous, you just have "threads" that run to completion all the time, AFAIK | 16:52 |
melwitt | dansmith: sorry, I guess I'm confused. I thought you were pointing out that making it configurable in nova would result in still having things be green threads | 16:52 |
dansmith | melwitt: I'm saying that if you were to ask for native threading in oslo.messaging, but you were monkeypatching python's thread library, then you're going to get greenthreads from your "native" o.msg threading module | 16:53 |
melwitt | yeah, I'm not sure if anyone else uses eventlet on purpose or if it's only indirectly through oslo.service or oslo.messaging | 16:53 |
dansmith | glance certainly does use it | 16:53 |
dansmith | they spawn background threads for import tasks | 16:53 |
melwitt | dansmith: ah, ok. yeah | 16:54 |
melwitt | hm, ok | 16:54 |
dansmith | and they do monkeypatch | 16:54 |
dansmith | because otherwise it would be kinda pointless | 16:54 |
dansmith | cinder monkeypatches too | 16:54 |
dansmith | and they call eventlet operations directly, in a looot of places | 16:55 |
dansmith | lots of direct threadpool and eventlet.sleep() interaction | 16:55 |
melwitt | ok, so when I began looking at this, I was assuming that nova does something different than anyone else and that's why we hit the error, but I could not find what that could be | 16:55 |
dansmith | so I think they're intentionally using eventlet | 16:55 |
melwitt | meaning, that we are not the only ones using eventlet, yet we're the only ones who hit this | 16:56 |
dansmith | well, that's what I'm trying to zero in on.. if we're the only ones.. why | 16:56 |
melwitt | so far, the only thing I can think of is we just have a lot more green threads flying around. either that, or there is something different about the way we do database interactions through sqla | 16:57 |
melwitt | I couldn't find anything when I looked, but obviously I could have missed something | 16:58 |
sean-k-mooney | is the reentrant call happening as part of the rollback | 17:02 |
sean-k-mooney | i wonder if this could be related to how we handle exceptions with the scatter gather implementation | 17:02 |
melwitt | sean-k-mooney: looks like if you don't choose an executor, it will detect whether you're monkey patched and if you are, it will use eventlet executor, else it will use native threading https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_utils.py#L70 | 17:03 |
melwitt | sean-k-mooney: yeah, something dies in the middle of the rollback and then the connection is left in a bad state and then when it's accessed again it raises that error | 17:04 |
*** andrewbonney has quit IRC | 17:05 | |
melwitt | sean-k-mooney: that was one of the earliest theories but there are also bug reports for this same thing in OSP10 when we didn't have scatter gather | 17:05 |
sean-k-mooney | well what i was wondering is dont we have slightly odd exception handling in the scatter gather where we return the exceptions instead of raising them | 17:05 |
sean-k-mooney | or am i imagining things | 17:05 |
melwitt | and I have looked at those traces too | 17:05 |
sean-k-mooney | oh ok | 17:05 |
sean-k-mooney | never mind then | 17:05 |
dansmith | melwitt: is it always during a scatter/gather? | 17:06 |
melwitt | dansmith: no, it happened in OSP10 too when we didn't have scatter/gather | 17:06 |
melwitt | and I have seen traces where it was raised from a "get quotas" call, from service_update I have seen a lot | 17:07 |
dansmith | oh right, I read that | 17:07 |
lyarwood | https://bugs.launchpad.net/nova/+bug/1929446 - This is the issue I was highlighting earlier if anyone has time to help narrow this down a little. | 17:07 |
openstack | Launchpad bug 1929446 in OpenStack Compute (nova) "check_can_live_migrate_source taking > 60 seconds in CI" [Undecided,New] | 17:07 |
sean-k-mooney | i am not sure this is eventlet related | 17:07 |
sean-k-mooney | it might be | 17:07 |
sean-k-mooney | but the nova api was not always monkey patched | 17:08 |
melwitt | there was a window when it wasn't | 17:08 |
sean-k-mooney | if you ran it under uwsgi before scatter gather was added it was not monkey patched | 17:08 |
melwitt | but it was prior to uwsgi/mod_wsgi being a way to run nova-api | 17:08 |
sean-k-mooney | the command line nova-api always was | 17:08 |
melwitt | right | 17:08 |
melwitt | *but it was monkey patched | 17:09 |
sean-k-mooney | no we had a period of time when uwsgi was supported but we did not monkey patch | 17:09 |
melwitt | I know | 17:09 |
sean-k-mooney | although we may not have released that way downstream | 17:09 |
melwitt | I'm saying that prior to uwsgi it was always monkey patched | 17:09 |
sean-k-mooney | ah yes it was | 17:09 |
sean-k-mooney | osp 10 is what, newton? i think we just used the nova-api command directly at that point | 17:10 |
melwitt | but even if we stop monkey patching in nova-api, we will still see this in nova-scheduler and nova-conductor at least | 17:10 |
sean-k-mooney | is there a clear writeup of how the error happens | 17:11 |
melwitt | yeah. and in the sosreports I've looked at for 13 it's also the nova-api command in these cases, afaict from the ps output | 17:11 |
sean-k-mooney | i think ooo avoided using apache initially due to concerns about memory overhead | 17:11 |
melwitt | sean-k-mooney: yeah but not in the context of openstack. this is a private bug but here https://bugzilla.redhat.com/show_bug.cgi?id=1927994#c45 and the links are https://github.com/PyMySQL/PyMySQL/issues/234 https://github.com/sqlalchemy/sqlalchemy/issues/3258 https://github.com/PyMySQL/PyMySQL/issues/260 | 17:11 |
openstack | melwitt: Error: Error getting bugzilla.redhat.com bug #1927994: NotPermitted | 17:11 |
sean-k-mooney | ah ok i was looking at the nova bug report and trying to find the reproducer | 17:12 |
zzzeek | hey just snooping a little bit, I think the main thing nova is doing that nobody else is, is using eventlet monkeypatching *with* mod_wsgi at the same time | 17:13 |
melwitt | we haven't been able to reproduce it in openstack | 17:13 |
zzzeek | so that's two frameworks with heavy and opposing opinions on concurrency getting together | 17:13 |
melwitt | zzzeek: the traces I've been looking at have all been not running under mod_wsgi and also occurred in services (scheduler and conductor) that are not using wsgi in any form | 17:13 |
zzzeek | ah | 17:14 |
zzzeek | melwitt: that's odd. pymysql doesn't like it if you use eventlet but as long as the scope of a connection is maintained in only one greenlet at a time, this kind of error shouldn't happen. what can happen is if requests are interrupted and not cleaned up correctly | 17:14 |
zzzeek | or if cleanup code itself is not able to run correctly due to the monkeypatching | 17:15 |
sean-k-mooney | dont we initialise the connection globally and share it between all greenthreads | 17:15 |
melwitt | no we don't | 17:15 |
melwitt | that's not a "connection" it's a "transaction context manager" which is a factory if I'm remembering terminology zzzeek explained to me last time | 17:16 |
sean-k-mooney | ah yes that is what i was thinking of | 17:16 |
sean-k-mooney | the object we recently wrapped in the run once decorator | 17:17 |
melwitt | and he has confirmed that is the correct way to use it, the threads can share that factory and get new connections from it | 17:17 |
sean-k-mooney | so whenever we context switch, eventlet never guarantees we will resume on the same thread. | 17:17 |
sean-k-mooney | but normally we have only one | 17:18 |
melwitt | yeah the configure() of those objects | 17:18 |
sean-k-mooney | could this be related to the use of pthread for the heartbeat | 17:18 |
melwitt | which heartbeat? the service heartbeats are eventlet, that I saw | 17:18 |
sean-k-mooney | the only real pthreads i know of in nova are the oslo.messaging heartbeat and the libvirt one | 17:19 |
sean-k-mooney | although no that would not make sense for 10/13 | 17:19 |
dansmith | at one point we changed the ordering of our imports relative to the monkeypatching to "fix" something | 17:20 |
melwitt | zzzeek: yeah... dansmith pointed out that glance and cinder use eventlet and monkey patch, but yet we don't see this error from them | 17:20 |
dansmith | and I think that got backported.. I wonder if that's relevant? | 17:20 |
sean-k-mooney | melwitt: i was refering to https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L90-L100 | 17:20 |
* melwitt looks for link | 17:21 | |
sean-k-mooney | dansmith: mdboots change | 17:21 |
dansmith | sean-k-mooney: right | 17:21 |
melwitt | I was just looking at that earlier | 17:21 |
dansmith | sean-k-mooney: I wonder if that ended up with us getting a combination of real and green threads in a way that is problematic.. | 17:21 |
sean-k-mooney | https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04#diff-c2e5ad6353633e738ba126e0f11ea14ed3f6ea94554deec967586fd2dfcf060d | 17:22 |
sean-k-mooney | that one | 17:22 |
melwitt | yeah that's it | 17:22 |
melwitt | sean-k-mooney: ack thanks (pthread) | 17:22 |
*** ralonsoh has quit IRC | 17:23 | |
sean-k-mooney | dansmith: well in principle that should have moved the patching earlier so less likely to get a mix | 17:23 |
sean-k-mooney | but you are suggesting without it we still could be | 17:23 |
sean-k-mooney | if we have not backported it | 17:23 |
dansmith | well, | 17:23 |
melwitt | yeah, we did not backport it | 17:23 |
dansmith | I think in wsgi mode that will come in at the point at which we hit it due to importing that api module | 17:23 |
dansmith | melwitt: oh I thought we did.. maybe that's related to the sudden cessation of reports? :) | 17:24 |
melwitt | could be, yeah | 17:24 |
dansmith | melwitt: did you say you didn't see it at all in later releases, or just ... less? | 17:24 |
*** coreycb has joined #openstack-nova | 17:24 | |
sean-k-mooney | this was merged in train | 17:25 |
melwitt | dansmith: I could not find any mention of it past queens/13 when I bugzilla searched everything under nova and pymysql | 17:25 |
sean-k-mooney | so if we did not backport we should see it up to 15 | 17:25 |
dansmith | melwitt: that seems like it could be a strong contender for being related then | 17:26 |
sean-k-mooney | didnt we also change the mysql client at one point | 17:26 |
melwitt | dansmith: yeah, agree | 17:26 |
dansmith | sean-k-mooney: long ago | 17:26 |
sean-k-mooney | i think pymysql is the new one right | 17:26 |
sean-k-mooney | it used to be mysql_python or something | 17:26 |
dansmith | it is, but that change was like icehouse or something I think | 17:26 |
sean-k-mooney | ya ok | 17:26 |
melwitt | I had been looking at it from the context of it also providing a way to disable monkey patching, but due to my lack of understanding of eventlet and mixing with native threads, it did not click for me to think it could have fixed things to monkey patch earlier | 17:27 |
melwitt | it makes sense when you say it now though.. | 17:28 |
dansmith | a combination of references to the un-patched library and the patched one could very much be relevant | 17:28 |
dansmith | and that's what that change was about | 17:28 |
dansmith | and it's also the argument against monkeypatching altogether of course :P | 17:28 |
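[Editor's sketch] The point about mixing references to the un-patched and patched library can be illustrated with a minimal, stdlib-only example. This is a sketch under stated assumptions, not nova's or eventlet's actual code: `fake_socket`, `blocking_connect`, `green_connect`, `import_from` and `monkey_patch` are all hypothetical names standing in for a real module, its blocking and green implementations, a `from module import name` statement, and a monkey patch.

```python
import types

# Hypothetical stand-in for a stdlib module such as `socket`; nothing here
# touches eventlet or the real socket module.
fake_socket = types.ModuleType("fake_socket")


def blocking_connect():
    """Stand-in for the original, blocking implementation."""
    return "blocking"


def green_connect():
    """Stand-in for the cooperative (green) replacement."""
    return "green"


fake_socket.connect = blocking_connect


def import_from(module):
    """Mimic `from fake_socket import connect`: copy the current attribute."""
    return module.connect


def monkey_patch(module):
    """Mimic a monkey patch: rebind the module attribute in place."""
    module.connect = green_connect


# Code imported *before* patching keeps a reference to the original function.
early_ref = import_from(fake_socket)

monkey_patch(fake_socket)

# Code imported *after* patching picks up the replacement.
late_ref = import_from(fake_socket)

results = {
    "early": early_ref(),
    "late": late_ref(),
    "module": fake_socket.connect(),
}
print(results)
```

The early reference still points at the blocking function after the patch, which is the kind of mix being described; patching before anything else is imported avoids leaving such stale references around.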
sean-k-mooney | it's ok, stephenfin will eventually get around to deleting all the eventlet code like all the other stuff he has deleted :) | 17:29 |
melwitt | yeah. that change was mostly non understandable by my brain | 17:29 |
sean-k-mooney | but ya we did this for urllib3 eventually | 17:29 |
sean-k-mooney | *originally | 17:30 |
sean-k-mooney | and some other service i guess but it makes sense | 17:30 |
dansmith | urllib3 being socket-oriented, along with pymysql ... :) | 17:30 |
sean-k-mooney | we did actually backport this https://review.opendev.org/c/openstack/nova/+/647310 | 17:30 |
sean-k-mooney | but only to stein | 17:31 |
melwitt | ah, ok, my bad | 17:31 |
melwitt | I had thought it landed in stein | 17:31 |
dansmith | okay I was sure we did backport it some, but .. fair enough | 17:31 |
melwitt | (originally) | 17:31 |
sean-k-mooney | i guess that stein is where the original bug was reported | 17:31 |
sean-k-mooney | and we just did not bring it back before that | 17:32 |
sean-k-mooney | hum https://bugs.launchpad.net/nova/+bug/1808951 | 17:33 |
openstack | Launchpad bug 1808951 in tripleo "python3 + Fedora + SSL + wsgi nova deployment, nova api returns RecursionError: maximum recursion depth exceeded while calling a Python object" [High,Incomplete] | 17:33 |
sean-k-mooney | oh i misread SSL as SQL | 17:33 |
sean-k-mooney | i was going to say it references SQL too | 17:33 |
melwitt | ok, I think it would be interesting if I build them a test package with that change and see if they can try it out | 17:34 |
melwitt | that would be good proof that it is/was the fix | 17:34 |
dansmith | melwitt: yeah if they're willing I think that'd be a good test | 17:34 |
melwitt | I'll get that done and give them the option | 17:35 |
sean-k-mooney | melwitt: lyarwood so it looks like the new resolver is breaking lower-constraints on stable os-vif branches | 17:55 |
sean-k-mooney | how is that addressed for stable branches | 17:55 |
sean-k-mooney | do we update to the oldest lib that works? | 17:55 |
sean-k-mooney | hacking seems to be what is breaking things | 17:55 |
sean-k-mooney | although there could be other issues | 17:56 |
sean-k-mooney | ok on master stephenfin removed any non direct deps | 17:57 |
sean-k-mooney | https://github.com/openstack/os-vif/commit/44d8937148aac1a61f40e59d3271c45f9fe6aa03 | 17:57 |
sean-k-mooney | i'll see if i can do something similar | 17:57 |
lyarwood | sean-k-mooney: ack yeah or backport a modified version of that? | 17:58 |
sean-k-mooney | well that is what i was going to do but with lower min versions to match the branch it's going to | 17:59 |
sean-k-mooney | i'm not sure if starting from that patch will be faster or not | 17:59 |
*** __ministry has joined #openstack-nova | 18:04 | |
*** links has quit IRC | 18:07 | |
*** __ministry has quit IRC | 18:18 | |
openstackgerrit | sean mooney proposed openstack/os-vif stable/victoria: Resolve dependency issues https://review.opendev.org/c/openstack/os-vif/+/792840 | 18:30 |
*** sapd1_x has joined #openstack-nova | 18:34 | |
*** sapd1 has quit IRC | 18:37 | |
*** k_mouza has quit IRC | 18:42 | |
*** k_mouza has joined #openstack-nova | 18:42 | |
*** k_mouza has quit IRC | 18:47 | |
sean-k-mooney | elod: lyarwood: am i allowed to squash two changes in a backport upstream | 18:56 |
sean-k-mooney | basically i have 2 choices: squash https://review.opendev.org/c/openstack/os-vif/+/716223 and the ussuri version of https://review.opendev.org/c/openstack/os-vif/+/792840 | 18:57 |
sean-k-mooney | or i can incorporate ignoring W504 | 18:57 |
sean-k-mooney | oh wait no that is not adding W504 | 18:58 |
sean-k-mooney | that's not the patch i need to add | 18:59 |
sean-k-mooney | it's https://github.com/openstack/os-vif/commit/d57a5f39edcb8ef3de09e80925c8fe628e5e0f3a | 18:59 |
sean-k-mooney | but since that raises the min version of hacking i can't really backport it | 19:01 |
sean-k-mooney | ok i'll take a look at this again tomorrow | 19:02 |
lyarwood | <sean-k-mooney "elod: lyarwood: am i allowed to "> Yes FWIW, if it fixes an otherwise unsolvable problem you can merge multiple. | 19:06 |
sean-k-mooney | i thought that i needed to backport the cleanup patch and merge it with the lower-constraints one but that is not what introduced the W504 skip | 19:07 |
sean-k-mooney | lyarwood: it was the patch that bumps hacking from 1.x to 3.0 for python 3 support | 19:07 |
sean-k-mooney | i'll see if i can figure out how to cap flake8 and pycodestyle to avoid the hacking bump | 19:08 |
sean-k-mooney | but if i can't we will have to decide if we are ok with the bump to hacking in the pep8 tox env | 19:09 |
sean-k-mooney | given it won't impact any other tests and won't be used at runtime | 19:09 |
*** gyee has joined #openstack-nova | 19:10 | |
sean-k-mooney | if we are ok with going to hacking 3.0 i'll look at squashing those two patches | 19:10 |
sean-k-mooney | well, tomorrow. i'm going to go get dinner now o/ | 19:11 |
*** k_mouza has joined #openstack-nova | 19:23 | |
*** k_mouza has quit IRC | 19:28 | |
*** boxiang has quit IRC | 19:52 | |
*** boxiang_ has joined #openstack-nova | 19:52 | |
*** vishalmanchanda has quit IRC | 19:59 | |
*** k_mouza has joined #openstack-nova | 20:02 | |
*** k_mouza has quit IRC | 20:07 | |
*** slaweq has quit IRC | 20:14 | |
*** raildo has quit IRC | 21:06 | |
*** pmannidi has joined #openstack-nova | 21:33 | |
openstackgerrit | Merged openstack/nova master: image_meta: Provide image_ref as the id when fetching from instance https://review.opendev.org/c/openstack/nova/+/790659 | 22:01 |
*** k_mouza has joined #openstack-nova | 22:03 | |
*** k_mouza has quit IRC | 22:04 | |
*** k_mouza_ has joined #openstack-nova | 22:04 | |
*** openstackgerrit has quit IRC | 22:05 | |
*** k_mouza_ has quit IRC | 22:09 | |
*** xek has quit IRC | 23:05 | |
*** xek has joined #openstack-nova | 23:05 | |
*** tosky has quit IRC | 23:18 | |
*** openstack has joined #openstack-nova | 23:50 | |
*** ChanServ sets mode: +o openstack | 23:50 |