Thursday, 2020-04-16

openstackgerritMerged openstack/octavia master: Correct delay between UDP healthchecks  https://review.opendev.org/71888100:18
*** hongbin has quit IRC00:22
*** hongbin has joined #openstack-lbaas00:29
*** sapd1 has joined #openstack-lbaas02:28
*** JayLiu has joined #openstack-lbaas02:44
*** hongbin has quit IRC02:50
*** JayLiu has quit IRC02:51
*** psachin has joined #openstack-lbaas03:26
*** threestrands has joined #openstack-lbaas03:29
*** osmanlicilegi has joined #openstack-lbaas04:14
*** vishalmanchanda has joined #openstack-lbaas05:01
*** gcheresh has joined #openstack-lbaas05:13
openstackgerritMichael Johnson proposed openstack/octavia master: Refactor the failover flows  https://review.opendev.org/70531706:03
*** ccamposr has joined #openstack-lbaas06:27
rm_worknot sure what happened to https://review.opendev.org/#/c/714855/ ... it was passing and then suddenly not, due to tempest requiring python>=3.606:29
rm_worki guess tempest broke it because they're branchless? this seems like a real issue...06:30
rm_workfor like... most stable branches06:30
*** ccamposr__ has quit IRC06:30
*** maciejjozefczyk has joined #openstack-lbaas07:13
*** ccamposr__ has joined #openstack-lbaas07:26
*** ccamposr has quit IRC07:29
openstackgerritDawson Coleman proposed openstack/octavia master: Add TLS cipher blacklist to octavia.conf  https://review.opendev.org/72037507:30
*** rpittau|afk is now known as rpittau07:34
*** ccamposr has joined #openstack-lbaas07:39
*** ataraday_ has joined #openstack-lbaas07:42
*** ccamposr__ has quit IRC07:43
*** happyhemant has joined #openstack-lbaas08:14
*** ccamposr__ has joined #openstack-lbaas08:41
*** ccamposr has quit IRC08:43
*** threestrands has quit IRC08:45
*** tkajinam has quit IRC08:50
openstackgerritMerged openstack/octavia-dashboard stable/stein: Add missing fields for HTTPS health monitors  https://review.opendev.org/71767209:30
openstackgerritMerged openstack/octavia-dashboard stable/train: Add missing fields for HTTPS health monitors  https://review.opendev.org/71767109:30
openstackgerritMerged openstack/octavia-dashboard stable/rocky: Add missing fields for HTTPS health monitors  https://review.opendev.org/71767309:32
*** born2bake has joined #openstack-lbaas09:51
*** rpittau is now known as rpittau|bbl10:19
openstackgerritMerged openstack/octavia master: Run taskflow jobboard conductor conditionally  https://review.opendev.org/72023711:41
*** rpittau|bbl is now known as rpittau12:07
*** psachin has quit IRC12:16
rm_workcgoncalves: https://review.opendev.org/#/c/719922/ good to go now?12:37
*** vishalmanchanda has quit IRC12:50
*** ataraday_ has quit IRC13:18
*** kevinz has quit IRC13:51
*** zzzeek has joined #openstack-lbaas14:20
zzzeekhi johnsom , rm_work sent me here, I wrote a detailed descriptoin of the sqlite "bug" being observed in some octavia tests and why at least the central test does what it seems to do: https://review.opendev.org/#/c/720244/14:21
johnsomzzzeek Hi there. Yes, so the latest 1.3.16 change caused some of our tests to start failing with sqlite.14:22
johnsomThe old behavior was the sqlite layer always did autocommit, even if in sqlalchemy we set autocommit = False.14:23
zzzeekjohnsom: yes I think at best that is an artifact of garbage collection or something.  nothing has changed in sqlalchemy and there is no "bug" in sqlite14:24
johnsomThat said, when we dug into what is going on we found a odd issue that we cannot work around for 1.3.16.14:24
zzzeekjohnsom: plaease read my explaination linked above, there is no mystery at least within test_sqlite_transactions_broken14:24
johnsomzzzeek It was this line in this change that caused the tests to fail: https://github.com/sqlalchemy/sqlalchemy/commit/9ebbf8614a24fbc430365f6e76c9fd04616992fa#diff-e9762e21a27d8e6c44db6f9dd4edc694R45514:25
johnsomI can toggle pass/no pass by commenting that line.14:26
johnsomHowever, as I said above, I don't think that change is *wrong*, it just uncovered another issue14:26
zzzeekjohnsom: ok here is the thing.   that mehtod and all of its variants are never called14:28
zzzeekat all14:28
zzzeekthat is dead python code that is unused as far as octavia / oslo.db is concerned14:28
johnsomThat other issue is this scenario: We start a session with autocommit=False, we create a row in table A, then later in the same transaction,  we create another row in table B that has a foreign key to the row in table A, it fails as the row in table A disappears from the session/transaction.14:28
johnsomI tried adding a bunch of "flush" calls around the row in Table A, but no luck14:29
zzzeekjohnsom: OK I would have to look at that separately.  the "sqlite broken" test does something very simple and wrong and it is not at all surprising14:29
zzzeekit simply emits an unwanted ROLLBACK on the connection and puts it into autocommit mode14:29
johnsomYeah, I am not looking at that test at all.14:29
zzzeekjohnsom: OK tell me which test to focus on14:29
johnsomThe test that I care about is the create a tree test14:29
zzzeekwaht is the name14:30
johnsomhttps://github.com/openstack/octavia/blob/master/octavia/tests/functional/db/test_repositories.py#L42814:30
zzzeekjohnsom: test only fails when you run the entire suite and not the test itself, is my understanding ?14:31
johnsomWe create the LB record here: https://github.com/openstack/octavia/blob/0ccb997793035588e9289c23fd3f2d0ec7fd578d/octavia/db/repositories.py#L60214:31
johnsomI can select it after that, no problem.14:31
johnsomHowever by this point: https://github.com/openstack/octavia/blob/0ccb997793035588e9289c23fd3f2d0ec7fd578d/octavia/db/repositories.py#L62814:32
johnsomIt is gone from the session/transaction and it fails with a foreign key constraint violation and selects right before that show no LB record.14:32
johnsomzzzeek Yeah, It's only when other tests are running. Timing is random, but so is our test order, so....14:33
johnsomWith that line in the patch commented out, the sqlite switches to autocommit and the LB record never disappears. With that line in the sqlalchemy it disappears randomly.14:34
johnsomI figured we just needed a flush, but that didn't help.14:34
zzzeekjohnsom: so it has something to do with garbage collection, dictionary ordering, or very likely some existing state in the sqlite database14:35
openstackgerritGhanshyam Mann proposed openstack/octavia-tempest-plugin master: Use Tempest compatible version for xenial node job  https://review.opendev.org/72049414:38
*** TrevorV has joined #openstack-lbaas14:39
openstackgerritGhanshyam Mann proposed openstack/octavia stable/stein: DNM: Testing xenail job Tempest issue  https://review.opendev.org/72049514:41
johnsomYeah, I was wondering about a threading issue as well given it only happens when we run multiple test suites14:43
zzzeekjohnsom: OK so we've established that some other tests, related to that sqlite "Broken" thing, are just wrong, e.g. test_create_load_balancer_tree_quotas14:54
zzzeekjohnsom: that test is using two sessions also. you can't do that with a sqlite memory database beacuse there is only a single database connection14:55
zzzeekso that test is invalid14:55
zzzeekunless you run it on mysql14:55
zzzeekjohnsom: so now im going to look at def test_create_load_balancer_tree and just that one alone, right?14:56
zzzeekjohnsom: test uses two sessions also, it's also a broken test.  behavior will be unpreditable14:57
rm_workthe way i see it the test is valid, because it's a totally accurate sample of how our code is run and what results should be produced -- sqlite just can't handle it. the test itself is not invalid14:57
johnsomYeah, I have not looked at the quota one either. I have ignored that test. It's been disabled for a long time14:57
zzzeekrm_work: sure14:58
rm_worksqlite is just not workable for it14:58
zzzeekjohnsom: I just took a cursory look at that test and I see "self.session" and "lock_session".  you wont get predictiable results14:58
rm_workwe're not going to change our tests to not test our code the way it is actually run14:58
zzzeekrm_work: then these tests have to run on mysql or Posgresql or somethign else14:58
zzzeekyou cannot test transactional concurrency with a sqlite memory database beacuse there is only one connection14:58
zzzeekso anyway, there is no reason to blacklist sqlalchemy 1.3.1.614:59
zzzeek1.3.1614:59
johnsomYeah, the two parts that started failing are all inside the "lock_session", it doesn't cross those14:59
zzzeekjohnsom: and if you replace it with "self.session", everything works, right?14:59
johnsomNo15:00
rm_worki would assume not? since the code relies on having two distinct sessions of different types15:00
johnsomWell, if we changed self.session to be autocommit=False, then only used it, the failure reproduces15:00
zzzeekrm_work: all the sessions use ****the same database connection and the same transaction****15:00
rm_workok but **our code relies on having two different sessions**15:01
rm_workso changing it to be the same session will obviously not work, ever15:01
zzzeekrm_work: then you ***cannot use sqlite memory DB for testing that ***15:01
rm_worki'm not disagreeing on that :D15:01
rm_workbut you seem to think that "what sqlite is capable of" is the correct way to do things, over "how the code we're testing actually works"15:01
zzzeekrm_work: I have no doubt your code works if you give it two concurrent transactions15:02
rm_workand it will not work if we only give it one15:02
rm_workso giving it one to make sqlite happy wouldn't suddenly make it work, lol15:02
zzzeekrm_work: I can totally believe that , that is fine, these tests assume multiple database connecvtions15:02
rm_workbecause the code needs them15:04
zzzeekrm_work: would it make more sense if I said, you want to test for a CPU core concurrency, and you run the test on a machine that has only one core, is that "broken" ?15:04
rm_workif our test machine is a single-core machine, and we need two cores to run certain features of our app, then yes i would create a similar canary designed to fail if multiple cores are present, so we can un-skip the tests for the multi-core features15:05
rm_workthat's an excellent analogy actually15:05
zzzeekyes but nothing is "broken", it's just an incompatible environment15:05
rm_workmutli-core execution is broken on a machine with one core15:05
* rm_work shrugs15:05
zzzeeksure but the single core machine is not the "broken" part :)15:05
zzzeekso you need to get these tests to run on myqsl or maybe a sqlite file database15:06
rm_worki thought file DB *also* wouldn't work for this?15:06
zzzeeksqlite file databases tend to lock completely however so you probably can't do it there either15:06
rm_workright15:06
zzzeekit probably wont beucase of the oslo.db thing where they make transactions serializable15:06
rm_workso on a feature-matrix of "things you can expect in SQL", i would say the "simultaneous transactions" / "sqlite" cell would be "broken" or "missing", take your pick, to me the two terms would be interchangeable for this usage15:08
rm_workwhereas for "mysql" it would be "present" or "working"15:08
* rm_work shrugs15:08
*** happyhemant has quit IRC15:14
*** rpittau is now known as rpittau|afk16:18
*** gcheresh has quit IRC16:38
*** servagem has joined #openstack-lbaas17:23
*** rcernin has quit IRC17:37
*** maciejjozefczyk has quit IRC17:39
*** gcheresh has joined #openstack-lbaas18:36
openstackgerritMichael Johnson proposed openstack/octavia master: Refactor the failover flows  https://review.opendev.org/70531719:26
*** TrevorV has quit IRC19:56
*** servagem has quit IRC20:35
*** armax has quit IRC21:04
*** armax has joined #openstack-lbaas21:05
haleybjohnsom, cgoncalves: so i have another question about this provider driver.  I can see that my class is getting initialized on an API call, but I never see it getting deleted.  And every API call creates another.  When I stop the api server I see them all get cleaned-up.  What am i missing?21:07
haleybeven forcefully deleting it in the post() doesn't help21:09
haleybunless i can blame this on using atexit()21:12
*** gcheresh has quit IRC21:13
johnsomhaleyb Some the OVN provider is doing strange? I just checked the amphora driver:21:16
johnsomApr 16 14:15:44 devstack2 devstack@o-api.service[26532]: ERROR octavia.api.drivers.amphora_driver.v1.driver [None req-49ac57c7-73ad-461b-b620-29a9ec1d22af admin admin] @@@@@@@@@@@ BOOM DESTROYED @@@@@@@@21:16
johnsomJust added:21:16
johnsom    def __del__(self):21:17
johnsom        LOG.error('@@@@@@@@@@@ BOOM DESTROYED @@@@@@@@')21:17
haleybjohnsom: so even talking out loud led me to delete the atexit() call and override __del__().... which worked21:17
haleybjust having atexit there caused problems21:17
johnsomNow if only my riddle was so easy to figure out. We broke port plug in the failover patch somehow...  sigh.21:20
haleybi'll have to look after margarita-time, which might not be that helpful :)21:21
johnsomNo worries, I'm tracking it down21:21
haleybjohnsom: but thanks for trying that, i was thinking the api code was my problem21:25
johnsomNP21:26
*** ramishra has quit IRC21:37
openstackgerritMichael Johnson proposed openstack/octavia master: Refactor the failover flows  https://review.opendev.org/70531722:34
*** rcernin has joined #openstack-lbaas22:39
*** tkajinam has joined #openstack-lbaas22:39
*** threestrands has joined #openstack-lbaas23:14
*** threestrands has quit IRC23:15
*** threestrands has joined #openstack-lbaas23:15
*** born2bake has quit IRC23:26
*** sapd1 has quit IRC23:39
*** sapd1 has joined #openstack-lbaas23:51

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!