openstackgerrit | Merged openstack/octavia master: Correct delay between UDP healthchecks https://review.opendev.org/718881 | 00:18 |
---|---|---|
*** hongbin has quit IRC | 00:22 | |
*** hongbin has joined #openstack-lbaas | 00:29 | |
*** sapd1 has joined #openstack-lbaas | 02:28 | |
*** JayLiu has joined #openstack-lbaas | 02:44 | |
*** hongbin has quit IRC | 02:50 | |
*** JayLiu has quit IRC | 02:51 | |
*** psachin has joined #openstack-lbaas | 03:26 | |
*** threestrands has joined #openstack-lbaas | 03:29 | |
*** osmanlicilegi has joined #openstack-lbaas | 04:14 | |
*** vishalmanchanda has joined #openstack-lbaas | 05:01 | |
*** gcheresh has joined #openstack-lbaas | 05:13 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Refactor the failover flows https://review.opendev.org/705317 | 06:03 |
*** ccamposr has joined #openstack-lbaas | 06:27 | |
rm_work | not sure what happened to https://review.opendev.org/#/c/714855/ ... it was passing and then suddenly not, due to tempest requiring python>=3.6 | 06:29 |
rm_work | i guess tempest broke it because they're branchless? this seems like a real issue... | 06:30 |
rm_work | for like... most stable branches | 06:30 |
*** ccamposr__ has quit IRC | 06:30 | |
*** maciejjozefczyk has joined #openstack-lbaas | 07:13 | |
*** ccamposr__ has joined #openstack-lbaas | 07:26 | |
*** ccamposr has quit IRC | 07:29 | |
openstackgerrit | Dawson Coleman proposed openstack/octavia master: Add TLS cipher blacklist to octavia.conf https://review.opendev.org/720375 | 07:30 |
*** rpittau|afk is now known as rpittau | 07:34 | |
*** ccamposr has joined #openstack-lbaas | 07:39 | |
*** ataraday_ has joined #openstack-lbaas | 07:42 | |
*** ccamposr__ has quit IRC | 07:43 | |
*** happyhemant has joined #openstack-lbaas | 08:14 | |
*** ccamposr__ has joined #openstack-lbaas | 08:41 | |
*** ccamposr has quit IRC | 08:43 | |
*** threestrands has quit IRC | 08:45 | |
*** tkajinam has quit IRC | 08:50 | |
openstackgerrit | Merged openstack/octavia-dashboard stable/stein: Add missing fields for HTTPS health monitors https://review.opendev.org/717672 | 09:30 |
openstackgerrit | Merged openstack/octavia-dashboard stable/train: Add missing fields for HTTPS health monitors https://review.opendev.org/717671 | 09:30 |
openstackgerrit | Merged openstack/octavia-dashboard stable/rocky: Add missing fields for HTTPS health monitors https://review.opendev.org/717673 | 09:32 |
*** born2bake has joined #openstack-lbaas | 09:51 | |
*** rpittau is now known as rpittau|bbl | 10:19 | |
openstackgerrit | Merged openstack/octavia master: Run taskflow jobboard conductor conditionally https://review.opendev.org/720237 | 11:41 |
*** rpittau|bbl is now known as rpittau | 12:07 | |
*** psachin has quit IRC | 12:16 | |
rm_work | cgoncalves: https://review.opendev.org/#/c/719922/ good to go now? | 12:37 |
*** vishalmanchanda has quit IRC | 12:50 | |
*** ataraday_ has quit IRC | 13:18 | |
*** kevinz has quit IRC | 13:51 | |
*** zzzeek has joined #openstack-lbaas | 14:20 | |
zzzeek | hi johnsom , rm_work sent me here, I wrote a detailed descriptoin of the sqlite "bug" being observed in some octavia tests and why at least the central test does what it seems to do: https://review.opendev.org/#/c/720244/ | 14:21 |
johnsom | zzzeek Hi there. Yes, so the latest 1.3.16 change caused some of our tests to start failing with sqlite. | 14:22 |
johnsom | The old behavior was the sqlite layer always did autocommit, even if in sqlalchemy we set autocommit = False. | 14:23 |
zzzeek | johnsom: yes I think at best that is an artifact of garbage collection or something. nothing has changed in sqlalchemy and there is no "bug" in sqlite | 14:24 |
johnsom | That said, when we dug into what is going on we found a odd issue that we cannot work around for 1.3.16. | 14:24 |
zzzeek | johnsom: plaease read my explaination linked above, there is no mystery at least within test_sqlite_transactions_broken | 14:24 |
johnsom | zzzeek It was this line in this change that caused the tests to fail: https://github.com/sqlalchemy/sqlalchemy/commit/9ebbf8614a24fbc430365f6e76c9fd04616992fa#diff-e9762e21a27d8e6c44db6f9dd4edc694R455 | 14:25 |
johnsom | I can toggle pass/no pass by commenting that line. | 14:26 |
johnsom | However, as I said above, I don't think that change is *wrong*, it just uncovered another issue | 14:26 |
zzzeek | johnsom: ok here is the thing. that mehtod and all of its variants are never called | 14:28 |
zzzeek | at all | 14:28 |
zzzeek | that is dead python code that is unused as far as octavia / oslo.db is concerned | 14:28 |
johnsom | That other issue is this scenario: We start a session with autocommit=False, we create a row in table A, then later in the same transaction, we create another row in table B that has a foreign key to the row in table A, it fails as the row in table A disappears from the session/transaction. | 14:28 |
johnsom | I tried adding a bunch of "flush" calls around the row in Table A, but no luck | 14:29 |
zzzeek | johnsom: OK I would have to look at that separately. the "sqlite broken" test does something very simple and wrong and it is not at all surprising | 14:29 |
zzzeek | it simply emits an unwanted ROLLBACK on the connection and puts it into autocommit mode | 14:29 |
johnsom | Yeah, I am not looking at that test at all. | 14:29 |
zzzeek | johnsom: OK tell me which test to focus on | 14:29 |
johnsom | The test that I care about is the create a tree test | 14:29 |
zzzeek | waht is the name | 14:30 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/tests/functional/db/test_repositories.py#L428 | 14:30 |
zzzeek | johnsom: test only fails when you run the entire suite and not the test itself, is my understanding ? | 14:31 |
johnsom | We create the LB record here: https://github.com/openstack/octavia/blob/0ccb997793035588e9289c23fd3f2d0ec7fd578d/octavia/db/repositories.py#L602 | 14:31 |
johnsom | I can select it after that, no problem. | 14:31 |
johnsom | However by this point: https://github.com/openstack/octavia/blob/0ccb997793035588e9289c23fd3f2d0ec7fd578d/octavia/db/repositories.py#L628 | 14:32 |
johnsom | It is gone from the session/transaction and it fails with a foreign key constraint violation and selects right before that show no LB record. | 14:32 |
johnsom | zzzeek Yeah, It's only when other tests are running. Timing is random, but so is our test order, so.... | 14:33 |
johnsom | With that line in the patch commented out, the sqlite switches to autocommit and the LB record never disappears. With that line in the sqlalchemy it disappears randomly. | 14:34 |
johnsom | I figured we just needed a flush, but that didn't help. | 14:34 |
zzzeek | johnsom: so it has something to do with garbage collection, dictionary ordering, or very likely some existing state in the sqlite database | 14:35 |
openstackgerrit | Ghanshyam Mann proposed openstack/octavia-tempest-plugin master: Use Tempest compatible version for xenial node job https://review.opendev.org/720494 | 14:38 |
*** TrevorV has joined #openstack-lbaas | 14:39 | |
openstackgerrit | Ghanshyam Mann proposed openstack/octavia stable/stein: DNM: Testing xenail job Tempest issue https://review.opendev.org/720495 | 14:41 |
johnsom | Yeah, I was wondering about a threading issue as well given it only happens when we run multiple test suites | 14:43 |
zzzeek | johnsom: OK so we've established that some other tests, related to that sqlite "Broken" thing, are just wrong, e.g. test_create_load_balancer_tree_quotas | 14:54 |
zzzeek | johnsom: that test is using two sessions also. you can't do that with a sqlite memory database beacuse there is only a single database connection | 14:55 |
zzzeek | so that test is invalid | 14:55 |
zzzeek | unless you run it on mysql | 14:55 |
zzzeek | johnsom: so now im going to look at def test_create_load_balancer_tree and just that one alone, right? | 14:56 |
zzzeek | johnsom: test uses two sessions also, it's also a broken test. behavior will be unpreditable | 14:57 |
rm_work | the way i see it the test is valid, because it's a totally accurate sample of how our code is run and what results should be produced -- sqlite just can't handle it. the test itself is not invalid | 14:57 |
johnsom | Yeah, I have not looked at the quota one either. I have ignored that test. It's been disabled for a long time | 14:57 |
zzzeek | rm_work: sure | 14:58 |
rm_work | sqlite is just not workable for it | 14:58 |
zzzeek | johnsom: I just took a cursory look at that test and I see "self.session" and "lock_session". you wont get predictiable results | 14:58 |
rm_work | we're not going to change our tests to not test our code the way it is actually run | 14:58 |
zzzeek | rm_work: then these tests have to run on mysql or Posgresql or somethign else | 14:58 |
zzzeek | you cannot test transactional concurrency with a sqlite memory database beacuse there is only one connection | 14:58 |
zzzeek | so anyway, there is no reason to blacklist sqlalchemy 1.3.1.6 | 14:59 |
zzzeek | 1.3.16 | 14:59 |
johnsom | Yeah, the two parts that started failing are all inside the "lock_session", it doesn't cross those | 14:59 |
zzzeek | johnsom: and if you replace it with "self.session", everything works, right? | 14:59 |
johnsom | No | 15:00 |
rm_work | i would assume not? since the code relies on having two distinct sessions of different types | 15:00 |
johnsom | Well, if we changed self.session to be autocommit=False, then only used it, the failure reproduces | 15:00 |
zzzeek | rm_work: all the sessions use ****the same database connection and the same transaction**** | 15:00 |
rm_work | ok but **our code relies on having two different sessions** | 15:01 |
rm_work | so changing it to be the same session will obviously not work, ever | 15:01 |
zzzeek | rm_work: then you ***cannot use sqlite memory DB for testing that *** | 15:01 |
rm_work | i'm not disagreeing on that :D | 15:01 |
rm_work | but you seem to think that "what sqlite is capable of" is the correct way to do things, over "how the code we're testing actually works" | 15:01 |
zzzeek | rm_work: I have no doubt your code works if you give it two concurrent transactions | 15:02 |
rm_work | and it will not work if we only give it one | 15:02 |
rm_work | so giving it one to make sqlite happy wouldn't suddenly make it work, lol | 15:02 |
zzzeek | rm_work: I can totally believe that , that is fine, these tests assume multiple database connecvtions | 15:02 |
rm_work | because the code needs them | 15:04 |
zzzeek | rm_work: would it make more sense if I said, you want to test for a CPU core concurrency, and you run the test on a machine that has only one core, is that "broken" ? | 15:04 |
rm_work | if our test machine is a single-core machine, and we need two cores to run certain features of our app, then yes i would create a similar canary designed to fail if multiple cores are present, so we can un-skip the tests for the multi-core features | 15:05 |
rm_work | that's an excellent analogy actually | 15:05 |
zzzeek | yes but nothing is "broken", it's just an incompatible environment | 15:05 |
rm_work | mutli-core execution is broken on a machine with one core | 15:05 |
* rm_work shrugs | 15:05 | |
zzzeek | sure but the single core machine is not the "broken" part :) | 15:05 |
zzzeek | so you need to get these tests to run on myqsl or maybe a sqlite file database | 15:06 |
rm_work | i thought file DB *also* wouldn't work for this? | 15:06 |
zzzeek | sqlite file databases tend to lock completely however so you probably can't do it there either | 15:06 |
rm_work | right | 15:06 |
zzzeek | it probably wont beucase of the oslo.db thing where they make transactions serializable | 15:06 |
rm_work | so on a feature-matrix of "things you can expect in SQL", i would say the "simultaneous transactions" / "sqlite" cell would be "broken" or "missing", take your pick, to me the two terms would be interchangeable for this usage | 15:08 |
rm_work | whereas for "mysql" it would be "present" or "working" | 15:08 |
* rm_work shrugs | 15:08 | |
*** happyhemant has quit IRC | 15:14 | |
*** rpittau is now known as rpittau|afk | 16:18 | |
*** gcheresh has quit IRC | 16:38 | |
*** servagem has joined #openstack-lbaas | 17:23 | |
*** rcernin has quit IRC | 17:37 | |
*** maciejjozefczyk has quit IRC | 17:39 | |
*** gcheresh has joined #openstack-lbaas | 18:36 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Refactor the failover flows https://review.opendev.org/705317 | 19:26 |
*** TrevorV has quit IRC | 19:56 | |
*** servagem has quit IRC | 20:35 | |
*** armax has quit IRC | 21:04 | |
*** armax has joined #openstack-lbaas | 21:05 | |
haleyb | johnsom, cgoncalves: so i have another question about this provider driver. I can see that my class is getting initialized on an API call, but I never see it getting deleted. And every API call creates another. When I stop the api server I see them all get cleaned-up. What am i missing? | 21:07 |
haleyb | even forcefully deleting it in the post() doesn't help | 21:09 |
haleyb | unless i can blame this on using atexit() | 21:12 |
*** gcheresh has quit IRC | 21:13 | |
johnsom | haleyb Some the OVN provider is doing strange? I just checked the amphora driver: | 21:16 |
johnsom | Apr 16 14:15:44 devstack2 devstack@o-api.service[26532]: ERROR octavia.api.drivers.amphora_driver.v1.driver [None req-49ac57c7-73ad-461b-b620-29a9ec1d22af admin admin] @@@@@@@@@@@ BOOM DESTROYED @@@@@@@@ | 21:16 |
johnsom | Just added: | 21:16 |
johnsom | def __del__(self): | 21:17 |
johnsom | LOG.error('@@@@@@@@@@@ BOOM DESTROYED @@@@@@@@') | 21:17 |
haleyb | johnsom: so even talking out loud led me to delete the atexit() call and override __del__().... which worked | 21:17 |
haleyb | just having atexit there caused problems | 21:17 |
johnsom | Now if only my riddle was so easy to figure out. We broke port plug in the failover patch somehow... sigh. | 21:20 |
haleyb | i'll have to look after margarita-time, which might not be that helpful :) | 21:21 |
johnsom | No worries, I'm tracking it down | 21:21 |
haleyb | johnsom: but thanks for trying that, i was thinking the api code was my problem | 21:25 |
johnsom | NP | 21:26 |
*** ramishra has quit IRC | 21:37 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Refactor the failover flows https://review.opendev.org/705317 | 22:34 |
*** rcernin has joined #openstack-lbaas | 22:39 | |
*** tkajinam has joined #openstack-lbaas | 22:39 | |
*** threestrands has joined #openstack-lbaas | 23:14 | |
*** threestrands has quit IRC | 23:15 | |
*** threestrands has joined #openstack-lbaas | 23:15 | |
*** born2bake has quit IRC | 23:26 | |
*** sapd1 has quit IRC | 23:39 | |
*** sapd1 has joined #openstack-lbaas | 23:51 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!