*** hongbin has joined #openstack-lbaas | 00:26 | |
*** sapd1 has joined #openstack-lbaas | 00:46 | |
*** abaindur has quit IRC | 01:05 | |
*** bzhao__ has joined #openstack-lbaas | 01:10 | |
rm_work | johnsom: anything else? | 01:11 |
johnsom | rm_work: there are two rocky backports, after that we are good | 01:21 |
rm_work | ok | 01:21 |
rm_work | #1 and 2? | 01:21 |
rm_work | ah those merged? | 01:21 |
rm_work | oh, *are merging* | 01:22 |
rm_work | so don't need me :) | 01:22 |
bzhao__ | Sorry, team. Just back to office after a business trip with miguel. | 01:27 |
rm_work | plz to be ignoring this | 01:48 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: DNM: two dumb downstream things to fix, IGNORE ME https://review.openstack.org/593986 | 01:48 |
rm_work | 100% ignore | 01:48 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Experimental multi-az support https://review.openstack.org/558962 | 01:51 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: AZ Evacuation resource https://review.openstack.org/559873 | 01:51 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: WIP: Floating IP Network Driver (spans L3s) https://review.openstack.org/435612 | 01:55 |
sapd1 | johnsom: I know why my octavia-api is too slow. It's because I added new columns to the load_balancer table, so listing from the database takes more time. But I don't know how to solve the problem. | 02:10 |
rm_work | you added DB columns? O_o | 03:15 |
sapd1 | rm_work: I know why now. | 03:17 |
sapd1 | because LoadBalancerRepository extends BaseRepository and uses the get_all function, and get_all joins all the tables. | 03:18 |
rm_work | yeah but like | 03:26 |
rm_work | why did you have to add custom DB columns? | 03:26 |
sapd1 | I would like to use multiple nova flavors for amphora instances. | 03:30 |
sapd1 | In my case, we would like to allow the user to select the flavor of the load balancer. | 03:30 |
sapd1 | rm_work: https://github.com/openstack/octavia/blob/master/octavia/db/repositories.py#L133 why do we need to join all tables here? | 03:31 |
rm_work | ah yeah, so that is the flavor framework | 03:34 |
rm_work | like, that's exactly what it will do | 03:34 |
rm_work | are you working on the patches for that? | 03:34 |
rm_work | that is there because it actually REDUCES the trips to the DB a ton | 03:34 |
rm_work | which is important because actually we found that the round-trip overhead was WAY worse than the delay from doing the joins | 03:35 |
rm_work | even with very large data sets | 03:35 |
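The trade-off rm_work describes (many per-row round trips versus one big JOIN, which is what `joinedload('*')` buys) can be sketched with a toy schema. The table and column names below are illustrative stand-ins, not Octavia's real schema:

```python
import sqlite3

# Toy stand-in for the Octavia schema: load balancers with child listeners.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE load_balancer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listener (id INTEGER PRIMARY KEY, lb_id INTEGER, port INTEGER);
""")
conn.executemany("INSERT INTO load_balancer VALUES (?, ?)",
                 [(i, "lb-%d" % i) for i in range(100)])
conn.executemany("INSERT INTO listener VALUES (?, ?, ?)",
                 [(i, i % 100, 80) for i in range(400)])

# Lazy loading: one query per parent row -- the N+1 round-trip pattern.
lazy_queries = 1  # the initial SELECT on load_balancer
for (lb_id,) in conn.execute("SELECT id FROM load_balancer").fetchall():
    conn.execute("SELECT port FROM listener WHERE lb_id = ?", (lb_id,)).fetchall()
    lazy_queries += 1

# Eager loading (the effect of joinedload('*')): a single JOIN round trip.
eager_queries = 1
rows = conn.execute("""
    SELECT lb.id, l.port FROM load_balancer lb
    LEFT JOIN listener l ON l.lb_id = lb.id
""").fetchall()

print(lazy_queries, eager_queries, len(rows))  # 101 1 400
```

With real network latency between the API process and MySQL, those 100 extra round trips dominate, which is the overhead rm_work mentions.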
sapd1 | rm_work: I used cProfile to trace the source code, and the total time spent fetching from the database is too large. | 03:39 |
sapd1 | rm_work: http://paste.openstack.org/show/728470/ | 03:40 |
*** ramishra has joined #openstack-lbaas | 03:40 | |
sapd1 | example | 03:40 |
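For reference, a minimal cProfile session like the one sapd1 ran might look like this; the profiled function is a stand-in, not Octavia's actual repository code:

```python
import cProfile
import io
import pstats

def list_load_balancers():
    # Stand-in for the repository get_all() call being investigated.
    return sorted(range(200000), key=lambda x: -x)

profiler = cProfile.Profile()
profiler.enable()
list_load_balancers()
profiler.disable()

# Render the top entries by cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The "cumulative" column is the one to watch: a slow DB layer shows up as a huge cumulative time on the query call with little time in the caller itself.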
rm_work | yeah something is wrong with your DB I think :/ | 03:40 |
rm_work | i mean, how many LBs do you have? | 03:41 |
rm_work | it should be subsecond even when joining every other table, with thousands of LBs... | 03:42 |
rm_work | maybe there's a case in the data we missed? | 03:42 |
sapd1 | rm_work: I have 4 LBs, and on one LB I have 4 L7 policies. When I create more L7 policies, the load time increases. | 03:44 |
rm_work | .... | 03:44 |
rm_work | yeah that's insane | 03:44 |
rm_work | 80s makes NO sense | 03:44 |
sapd1 | rm_work: normally it takes 20 seconds, but when I add cProfile it takes more. | 03:45 |
rm_work | right, more than 0.5s or so is nuts for that | 03:45 |
sapd1 | rm_work: Have you checked my story yet? https://storyboard.openstack.org/#!/story/2002933 | 03:45 |
rm_work | so is it ONLY with a ton of L7 policies? | 03:46 |
rm_work | i wonder if there's some sort of weird interaction with the way those are linked in? | 03:46 |
sapd1 | rm_work: I commented out query.options(joinedload('*')) and re-ran the unit tests in python-octaviaclient with no errors. | 03:46 |
*** abaindur has joined #openstack-lbaas | 04:04 | |
*** hongbin has quit IRC | 04:10 | |
*** yboaron_ has joined #openstack-lbaas | 04:20 | |
johnsom | dayou Happen to be around? https://review.openstack.org/#/c/591295 A translation for nlbaas. Then I will cut our RC2 | 04:52 |
johnsom | Thank you. RC2 patch posted. Catch you all tomorrow. | 05:18 |
openstackgerrit | Merged openstack/neutron-lbaas-dashboard master: Drop nose dependencies https://review.openstack.org/593147 | 05:20 |
*** abaindur has quit IRC | 05:38 | |
*** abaindur has joined #openstack-lbaas | 05:38 | |
*** yboaron_ has quit IRC | 06:16 | |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: [WIP] Add listener and pool protocol validation. https://review.openstack.org/594040 | 06:27 |
*** rcernin has quit IRC | 06:38 | |
*** rcernin has joined #openstack-lbaas | 06:40 | |
*** pcaruana has joined #openstack-lbaas | 06:42 | |
*** luksky has joined #openstack-lbaas | 06:50 | |
*** rcernin has quit IRC | 06:51 | |
*** velizarx has joined #openstack-lbaas | 07:16 | |
*** abaindur has quit IRC | 07:22 | |
nmagnezi | O/ | 07:41 |
*** velizarx has quit IRC | 07:49 | |
cgoncalves | johnsom, thanks for cutting queens. we could have made it to get https://review.openstack.org/#/c/592569/ in too, but okay | 07:53 |
cgoncalves | nmagnezi, https://review.openstack.org/#/c/592569/ pretty please :) | 07:53 |
nmagnezi | cgoncalves, +2 | 08:08 |
cgoncalves | tks | 08:08 |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: [WIP] Add listener and pool protocol validation. https://review.openstack.org/594040 | 08:13 |
*** velizarx has joined #openstack-lbaas | 08:23 | |
openstackgerrit | Nir Magnezi proposed openstack/octavia master: DNM: Leave VIP NIC plugging for keepalived https://review.openstack.org/589292 | 08:38 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Add octavia-v2-dsvm-scenario-ipv6 https://review.openstack.org/594078 | 08:56 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Add octavia-v2-dsvm-scenario-ipv6 https://review.openstack.org/594078 | 09:00 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Temporarily remove octavia-v2-dsvm-scenario-ubuntu.bionic https://review.openstack.org/588883 | 09:07 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 09:09 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 09:15 |
openstackgerrit | Kobi Samoray proposed openstack/neutron-lbaas master: nlbaas2octavia: do not change SG owned by user https://review.openstack.org/592471 | 09:28 |
*** luksky has quit IRC | 09:28 | |
openstackgerrit | Kobi Samoray proposed openstack/neutron-lbaas master: nlbaas2octavia: improve member error log message https://review.openstack.org/593610 | 09:28 |
*** luksky has joined #openstack-lbaas | 10:02 | |
*** dolly has joined #openstack-lbaas | 10:22 | |
dolly | johnsom: yesterday you were saying something about making a "stable" release for queens and that it would be ready today, did you manage to do so ? | 10:23 |
dolly | is it the "stable/queens" branch in git repo ? | 10:23 |
dolly | cgoncalves: you there ? | 10:26 |
*** hvhaugwitz has quit IRC | 10:45 | |
*** hvhaugwitz has joined #openstack-lbaas | 10:45 | |
*** velizarx has quit IRC | 10:46 | |
*** velizarx has joined #openstack-lbaas | 10:46 | |
jiteka | Hello, I'm still doing some testing on our deployment of octavia in queens (with ACTIVE/STANDBY topology) | 11:10 |
jiteka | I have a scenario where one of my amphorae went into ERROR while trying to fail over. | 11:10 |
jiteka | I tried to delete the amphora VM (Backup) in error to see if health-manager would catch it and generate a new VM | 11:10 |
jiteka | But now I end up with only 1 MASTER amphora, and the load balancer shows as ONLINE (operating_status) | 11:10 |
jiteka | http://paste.openstack.org/show/728491/ | 11:10 |
jiteka | What could be done to get back to an ACTIVE/STANDBY haproxy on that VIP ? | 11:10 |
jiteka | More generally, how should one deal with an amphora in ERROR ? | 11:10 |
jiteka | johnsom: looking at https://github.com/openstack/octavia/tree/stable/queens | 11:12 |
jiteka | I see that latest commit was 3 days ago 69beadc7a8a14c2fedee79227b38bc37153b5dce (Merge "Fix neutron "tenat_id" compatibility" into stable/queens) | 11:12 |
jiteka | Is it the correct way to double check if a new version can be used ? | 11:12 |
*** crazik has joined #openstack-lbaas | 11:35 | |
crazik | hello. | 11:35 |
crazik | I had some issues with the DB and octavia. Tried to fix them, and now I have only a MASTER amphora for the LB. | 11:36 |
crazik | how do I force octavia to add a BACKUP amphora? | 11:36 |
crazik | tried loadbalancer failover, but with no effect. | 11:36 |
crazik | any ideas? | 11:38 |
rm_work | jiteka: that is a thing i've run into a lot -- i believe we did have a fix recently that improved handling of that a bit | 11:47 |
rm_work | if you try a manual failover of that amp via the amphora-api, it might perform better maybe | 11:48 |
cgoncalves | yes. most of failover issues should be fixed in rocky/master and backported to queens | 11:48 |
crazik | hm. how can I do that? | 11:49 |
cgoncalves | dolly, yes | 11:49 |
cgoncalves | dolly, still open https://review.openstack.org/#/c/593954/ | 11:49 |
rm_work | crazik: sorry was still talking to jiteka -- seems you ran into basically the same issue at the same time | 11:50 |
rm_work | if it is GONE, i think the only way is manual DB hackery | 11:50 |
crazik | ;) | 11:50 |
crazik | I did manual DB cleanup | 11:50 |
nmagnezi | johnsom, do you want to have this in RC2? https://review.openstack.org/#/c/589408/ | 11:50 |
rm_work | which I have done but requires you to be very careful and know what you're doing exactly :P | 11:51 |
crazik | and at the end, I have only one amphora in DB. LB is working, but in ERROR state | 11:51 |
rm_work | yeah | 11:51 |
rm_work | so | 11:51 |
rm_work | you can ... kinda fix it manually | 11:51 |
rm_work | if it's absolutely necesary | 11:51 |
crazik | need a way to recreate backup amphora | 11:51 |
rm_work | copy the entry in the amp table for the master, and change the role to BACKUP, and all of the ID fields to made-up uuids | 11:51 |
rm_work | err, when I say "all", I mean | 11:52 |
crazik | I have db backup, I can try to re-add previous one | 11:52 |
rm_work | compute_id, vrrp_port_id | 11:53 |
rm_work | ah yeah that works too | 11:53 |
rm_work | re-add the previous line | 11:53 |
rm_work | and THEN trigger a failover via the amp-api, of that amp | 11:53 |
rm_work | i am used to not knowing the old info | 11:53 |
rm_work | congrats on having a backup :P | 11:53 |
crazik | ;> | 11:53 |
crazik | ok, I will try. | 11:53 |
crazik | thanks | 11:53 |
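rm_work's manual-recovery recipe, sketched against a toy table. The column set here is illustrative, not the full Octavia amphora schema, and on a real deployment this is careful surgery against MySQL, not sqlite:

```python
import sqlite3
import uuid

# Toy stand-in for the amphora table with an existing MASTER row.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE amphora (
    id TEXT PRIMARY KEY, load_balancer_id TEXT, compute_id TEXT,
    vrrp_port_id TEXT, role TEXT, status TEXT)""")
conn.execute("INSERT INTO amphora VALUES (?, ?, ?, ?, 'MASTER', 'ALLOCATED')",
             (str(uuid.uuid4()), "lb-1", str(uuid.uuid4()), str(uuid.uuid4())))

# Copy the MASTER row, flip the role to BACKUP, and give the unique ID
# fields (id, compute_id, vrrp_port_id) made-up UUIDs so a subsequent
# amphora failover replaces the fake record with a real instance.
(lb_id,) = conn.execute(
    "SELECT load_balancer_id FROM amphora WHERE role = 'MASTER'").fetchone()
conn.execute("INSERT INTO amphora VALUES (?, ?, ?, ?, 'BACKUP', 'ALLOCATED')",
             (str(uuid.uuid4()), lb_id, str(uuid.uuid4()), str(uuid.uuid4())))

roles = [r[0] for r in conn.execute("SELECT role FROM amphora ORDER BY role")]
print(roles)  # ['BACKUP', 'MASTER']
```

As rm_work says, the fabricated row only exists so the amphora failover API has something to act on; triggering the failover afterwards is what actually builds the replacement VM.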
dolly | cgoncalves: ok cool. trying to build my own container as we speak :p | 11:54 |
*** numans_ has joined #openstack-lbaas | 12:07 | |
*** velizarx has quit IRC | 12:08 | |
*** numans has quit IRC | 12:10 | |
*** velizarx has joined #openstack-lbaas | 12:18 | |
jiteka | rm_work: just saw that https://ghosthub.corp.blizzard.net/openstack/octavia/commit/72715ba6197105db352779e0236ac108a710f72d | 12:24 |
jiteka | and we are running a version without that change | 12:24 |
jiteka | I will try to get our deployment updated and see how it behaves | 12:24 |
rm_work | lol internal git | 12:28 |
rm_work | but yeah that one is important | 12:29 |
jiteka | rm_work: I can't agree or deny that we have an internal git | 12:32 |
jiteka | :) | 12:32 |
rm_work | you definitely can't deny :P | 12:33 |
rm_work | it's basically required tho, github breaks builds like crazy with timeouts | 12:33 |
jiteka | rm_work: ahaha yes that's good practice in CI/CD | 12:34 |
jiteka | rm_work: never said we are following it (or not) | 12:34 |
jiteka | rm_work: secrets everywhere | 12:34 |
*** Krast has joined #openstack-lbaas | 12:44 | |
*** velizarx has quit IRC | 12:45 | |
dolly | cgoncalves: you there ? | 13:50 |
cgoncalves | dolly, yep | 13:51 |
dolly | cgoncalves: so, I went out on a limb here and made some custom octavia containers. All I did was take the stable/queens branch from github and overwrite the octavia site-packages in the container. Got everything up and running, can talk to the api and so forth. But when deploying an LB I get this, http://paste.openstack.org/show/728523/ | 13:54 |
dolly | now I understand that this is not supported in any way | 13:54 |
dolly | I just figure I might as well do some testing and see what I come up with | 13:54 |
dolly | Do you think it has to do with the amphora image ? Because I haven't rebuilt the one that is used for building the amphoras | 13:55 |
cgoncalves | dolly, might be, yes. I reckon I fixed some things in the amphora agent side for centos | 13:56 |
dolly | cgoncalves: Is this image available somewhere ? Or can I build it easily myself | 13:56 |
cgoncalves | dolly, for testing purposes, there are a few options | 13:58 |
cgoncalves | dolly, I would say download http://tarballs.openstack.org/octavia/test-images/test-only-amphora-x64-haproxy-centos-7.qcow2 | 13:58 |
cgoncalves | it is based on master but should be okay | 13:59 |
dolly | Cool! | 13:59 |
cgoncalves | other option is to build your own. use diskimage-create: https://github.com/openstack/octavia/tree/master/diskimage-create | 13:59 |
dolly | Ah, well if you have one build already I gladly try that one =) | 14:00 |
dolly | Just upload it with a specific name and put that name in the octavia-conf right ? | 14:00 |
cgoncalves | upload it under the 'service' project and tag it | 14:00 |
dolly | yep! | 14:01 |
dolly | got it | 14:01 |
cgoncalves | octavia will use latest uploaded/created image filtering by tag name | 14:01 |
dolly | perfect =) | 14:02 |
cgoncalves | "amphora-image" is the tag used by default in TripleO-based envs | 14:02 |
dolly | got it | 14:02 |
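The selection logic cgoncalves describes — among images carrying the configured tag, the most recently uploaded wins — can be sketched like this; the image records are made up, not real glance output:

```python
from datetime import datetime

# Made-up image records standing in for glance's; only the tag filter
# and created_at ordering matter for this sketch.
images = [
    {"id": "img-1", "tags": ["amphora-image"], "created_at": "2018-08-01T10:00:00"},
    {"id": "img-2", "tags": ["other"],         "created_at": "2018-08-20T10:00:00"},
    {"id": "img-3", "tags": ["amphora-image"], "created_at": "2018-08-21T14:00:00"},
]

def latest_tagged(images, tag):
    # Filter by tag, then take the most recently created image.
    candidates = [img for img in images if tag in img["tags"]]
    return max(candidates, key=lambda img: datetime.fromisoformat(img["created_at"]))

print(latest_tagged(images, "amphora-image")["id"])  # img-3
```

This is why simply uploading a newer tagged image is enough: no config change is needed for new amphorae to pick it up.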
dolly | 518 234 586 2,10KB/s eta 23s <- whats going on here :D | 14:03 |
dolly | downloads like 70% of the image and then croaks | 14:03 |
cgoncalves | downloading. full speed so far | 14:03 |
dolly | ah now it started again :) | 14:03 |
cgoncalves | 100% downloaded at full speed | 14:04 |
dolly | yep, the reason why I got confused was because I started on my desktop, it croaked at ~50-60%.. Then from a server, same thing happend.. Not sure why, but started to work anyway so.. | 14:05 |
dolly | Now lets upload it to our cloud =) | 14:05 |
cgoncalves | johnsom, thanks! :) | 14:07 |
johnsom | cgoncalves The priority bug list is your friend... | 14:08 |
dolly | $ > openstack loadbalancer list | 14:17 |
dolly | $ > | 13e711ee-daad-4dee-b1e4-5f9ae23a45a6 | pm-lb | 91cf3955df114256870c20b7737b3a41 | 10.40.5.16 | ACTIVE | octavia | | 14:17 |
dolly | Would you look at that =) | 14:17 |
cgoncalves | yay! | 14:18 |
dolly | Running with both the multi_az patch and branch stable/queens from the git repo | 14:18 |
dolly | On a standard OSP 13 installation | 14:18 |
dolly | ACTIVE_STANDBY mode btw. | 14:18 |
dolly | Let me see what happens when I destroy the activve amphora :p | 14:18 |
cgoncalves | and amp from master (built today) | 14:18 |
dolly | yes, perfect =) | 14:18 |
openstackgerrit | Jacky Hu proposed openstack/neutron-lbaas-dashboard master: Remove obsolete gate hooks https://review.openstack.org/594243 | 14:23 |
dolly | Oh, btw, the octavia_health_manager container is unhealthy, same with octavia_housekeeping... | 14:23 |
dolly | I reckon that is not supposed to be like that ? | 14:23 |
dolly | Are all containers supposed to be healthy ? | 14:24 |
dolly | (just so I know) | 14:24 |
cgoncalves | that is because the container doesn't include a health check | 14:24 |
cgoncalves | should be fixed in OSP13 z2 https://bugzilla.redhat.com/show_bug.cgi?id=1517500 | 14:24 |
openstack | bugzilla.redhat.com bug 1517500 in openstack-tripleo-common "OPS Tools | Availability Monitoring | Octavia dockers monitoring support" [Medium,Assigned] - Assigned to mmagr | 14:24 |
dolly | Hm ok, well I deleted the master amp. But no new amp got created. Switchover worked though, so backends still available through LB VIP. | 14:26 |
dolly | 2018-08-21 14:27:07.343 82 WARNING stevedore.named [-] Could not load health_db: NoMatches: No 'octavia.amphora.health_update_drivers' driver found, looking for 'health_db'' | 14:27 |
dolly | 2018-08-21 14:27:07.343 83 WARNING stevedore.named [-] Could not load stats_db: NoMatches: No 'octavia.amphora.stats_update_drivers' driver found, looking for 'stats_db' | 14:27 |
dolly | Not sure if that is good ? | 14:27 |
dolly | Hm, where are the octavia.amphora.health_update_drivers and octavia.amphora.stats_update_drivers supposed to be ? Maybe I missed something when building my containers ? | 14:31 |
cgoncalves | there are defaults set for both: http://git.openstack.org/cgit/openstack/octavia/tree/octavia/common/config.py#n213 | 14:32 |
cgoncalves | so I'm not sure why it is not loading | 14:32 |
dolly | But what is health_db ? | 14:34 |
johnsom | It's a health manager driver that stores the results in the mysql database (there is also a logging-only driver) | 14:35 |
dolly | But if I do a "grep health_db" in the site-packages/octavia directory, I only find the reference from the config file... Nothing more.. Shouldn't I see more references to it ? | 14:36 |
johnsom | dolly: https://github.com/openstack/octavia/tree/stable/queens/octavia/controller/healthmanager/health_drivers | 14:37 |
johnsom | Assuming you are running queens | 14:37 |
johnsom | Same path for Rocky/master | 14:37 |
dolly | 2018-08-21 14:39:38.057 23 DEBUG octavia.amphorae.drivers.health.heartbeat_udp [-] Received packet from ('172.24.0.13', 30401) dorecv /usr/lib/python2.7/site-packages/octavia/amphorae/drivers/health/heartbeat_udp.py:187 2018-08-21 14:39:38.101 30 WARNING stevedore.named [-] Could not load health_db 2018-08-21 14:39:38.103 31 WARNING stevedore.named [-] Could not load stats_db | 14:39 |
dolly | Ok, so there is something fishy going on with the healthmanager for sure then... | 14:40 |
johnsom | Yeah, it looks like you have version mis-match somehow | 14:41 |
dolly | well, since I "manually hack this" - that is definitely possible. | 14:42 |
dolly | version mismatch between what you mean ? | 14:42 |
johnsom | I would not expect heartbeat_udp.py to exist on queens or rocky | 14:42 |
johnsom | I would update your version of Octavia inside that container. Be sure to pip uninstall then pip install. | 14:43 |
dolly | Hm, ok. Well what I did to "update my octavia" was to clone the git repo, check out the stable/queens branch, and then create a dockerfile where I copied all the content from the octavia folder into the container's /usr/lib/python2.7/site-packages/ | 14:45 |
dolly | But maybe that will only partially work then. | 14:46 |
cgoncalves | dolly, there's an easier and better way. 1 sec | 14:46 |
dolly | cgoncalves, I'm sure there is :D | 14:46 |
cgoncalves | https://docs.openstack.org/kolla/latest/admin/image-building.html | 14:48 |
cgoncalves | I ran it once months ago. I don't remember exactly which steps I took but definitely based on that page | 14:48 |
jiteka | Just deployed the latest version of stable/queens in our environment and hit a bug : | 14:53 |
jiteka | Exception during message handling: InvalidRequestError: Entity '<class 'octavia.db.models.Amphora'>' has no property 'show_deleted' | 14:53 |
jiteka | log : http://paste.openstack.org/show/728529/ | 14:53 |
jiteka | is it something that was fixed in rocky but not backported yet ? | 14:53 |
johnsom | No, again something is wrong with your versioning. Did you run the DB migration? | 14:54 |
jiteka | johnsom: hello, no I didn't | 14:55 |
jiteka | johnsom: I thought it was only needed when changing major versions | 14:55 |
johnsom | I would recommend also reading the upgrade guide: https://docs.openstack.org/octavia/latest/admin/guides/upgrade.html | 14:55 |
jiteka | johnsom: was in stable/queens and I'm still in stable/queens | 14:55 |
johnsom | Well, I'm not sure what state your environment is in. I'm not sure that is really a schema issue or not, but it's a thought. | 14:56 |
jiteka | johnsom: but it makes sense if changes were made to the DB schema within the same release | 14:56 |
johnsom | We don't really have a column for show_deleted, so it might be completely wrong. I'm just not sure how you could get that error with an install..... | 14:57 |
jiteka | no, I didn't get that with an install. I got that on an LB failover, after refreshing our docker image and restarting the services with the latest version of stable/queens | 14:58 |
jiteka | johnsom: including these changes http://paste.openstack.org/show/728531/ | 14:59 |
johnsom | jiteka Is your API process the same version of stable/queens as your octavia-worker process? | 15:02 |
jiteka | johnsom: yes | 15:06 |
johnsom | jiteka Ok, confirmed we have a problem on queens somehow. Looking into that now. Likely a missing backport | 15:07 |
johnsom | nmagnezi Are you around? | 15:09 |
*** pcaruana has quit IRC | 15:09 | |
*** dlundquist has quit IRC | 15:39 | |
*** strigazi has quit IRC | 15:46 | |
*** strigazi has joined #openstack-lbaas | 15:46 | |
*** luksky has quit IRC | 15:48 | |
openstackgerrit | Merged openstack/octavia master: Remove user_group option https://review.openstack.org/589408 | 15:51 |
*** ramishra has quit IRC | 15:54 | |
johnsom | Cores, please review https://review.openstack.org/594332 to fix a bad backport impacting failover | 15:55 |
*** velizarx has joined #openstack-lbaas | 15:58 | |
*** sapd1_ has joined #openstack-lbaas | 16:01 | |
johnsom | jiteka https://review.openstack.org/594332 will fix your queens issue | 16:01 |
jiteka | thanks johnsom, lgtm | 16:12 |
jiteka | will share it with other colleague to have them taking a look as well | 16:13 |
johnsom | I will try to get that into the 2.0.2 queens release | 16:13 |
*** sapd1_ has quit IRC | 16:19 | |
jiteka | johnsom: did that new version you told me about yesterday get released ? | 16:28 |
jiteka | johnsom: or it's 2.0.2 ? | 16:28 |
johnsom | It's 2.0.2, it didn't get merged last night for whatever reason, so I will slip it in that version. | 16:29 |
jiteka | johnsom: ok I was correct then assuming that it wasn't available yet | 16:32 |
johnsom | Yeah, just didn't get reviewed last night | 16:32 |
*** sapd1_ has joined #openstack-lbaas | 16:32 | |
sapd1_ | johnsom: I don't know why this has to join all the tables. | 16:32 |
sapd1_ | https://github.com/openstack/octavia/blob/master/octavia/db/repositories.py#L96 | 16:32 |
johnsom | sapd1_ Yeah, I have started to look at some of these issues. It is definitely a regression in API performance. | 16:33 |
johnsom | I know that there was some concern with the number of DB connections sqlalchemy was making, which led to some of these changes. But we need to re-evaluate and work on some optimizations. | 16:34 |
sapd1_ | johnsom: When I remove the options, API performance increases | 16:34 |
johnsom | Yes, it's just not that simple sadly. It's going to take some work | 16:34 |
johnsom | Y | 16:35 |
jiteka | johnsom: Applying the current stable/queens with running LBs was a bit scary, all active LBs transitioned their state to PENDING_UPDATE. Luckily the VIPs were still reachable, but I'm not sure how I would recover from this state if they were production VIPs | 16:35 |
johnsom | jiteka Did you follow the upgrade guide steps? | 16:36 |
jiteka | johnsom: forcing a state change via the CLI is not a thing at the moment, right ? something like loadbalancer set --state error LB_ID | 16:36 |
jiteka | johnsom: no, that was my mistake: when I did the db upgrade head, the octavia services were running | 16:36 |
johnsom | PENDING_UPDATE means the controllers have taken ownership and are actively doing something with the LBs. Forcing will toast your cloud | 16:36 |
jiteka | johnsom: but restarting services kept them in PENDING_UPDATE | 16:37 |
jiteka | johnsom: as I was on an unstable version (not including your fix), I destroyed all LB (after moving their state in error on the DB itself) | 16:37 |
jiteka | johnsom: were just testing LBs | 16:37 |
johnsom | You restarted gracefully right? Not a kill -9? | 16:37 |
jiteka | johnsom: yes it was not a kill | 16:38 |
johnsom | Then if they were in PENDING_* the processes should not exit until they are back in ACTIVE or ERROR | 16:38 |
johnsom | I should also comment, I don't use OSP, so things might be different in OSP | 16:39 |
jiteka | johnsom: in our deployment, systemctl stop octavia-{api,housekeeping,worker,health-manager} is actually a stop on a docker container | 16:40 |
johnsom | I'm not really sure why they would flip to PENDING_UPDATE either, other than somehow it decided they all needed to failover | 16:40 |
jiteka | it's a "docker stop" | 16:40 |
johnsom | jiteka OH! If that does not gracefully shutdown the processes you are going to have a lot of pain | 16:40 |
jiteka | hmm | 16:40 |
jiteka | johnsom: actually each restart is a bit painful, because it takes something like 3 attempts on any call that does a POST before it gets passed to the handler (most of the time octavia-worker) and the action gets done | 16:41 |
jiteka | johnsom: I was thinking the worker might not be connecting fast enough to the rabbitMQ | 16:42 |
jiteka | johnsom: I'm not using OSP | 16:42 |
jiteka | johnsom: using source with in-house CI/CD, with the control plane living in docker containers and configuration managed via a custom puppet module | 16:42 |
johnsom | Oh, ok, sorry I got confused with all of the new installations | 16:43 |
jiteka | ahaha yes I understand | 16:43 |
jiteka | but no, no ansible-kolla or OSA or OSP distro | 16:43 |
johnsom | Tell me more about this 3 attempts? What is going on? | 16:43 |
jiteka | so for example I'm restarting the service because I changed a value in conf or just pushed a new image version of my dockers images (that will be used at the next start) | 16:44 |
jiteka | if I try to create a new LB | 16:44 |
jiteka | it will always fail 3 times before working | 16:44 |
jiteka | same with trying to delete an existing LB | 16:44 |
jiteka | but anything that is handled at the API level works | 16:45 |
johnsom | Right after the restart or just always? | 16:45 |
jiteka | like loadbalancer list or loadbalancer amphora list | 16:45 |
jiteka | just after the restart | 16:45 |
jiteka | as soon as the first action reaches octavia-worker | 16:45 |
johnsom | Are you fronting the API with a load balancer? like haproxy? | 16:45 |
jiteka | all the others will also get to them (octavia-workers) | 16:45 |
jiteka | yes I have 2 controller nodes running the docker containers with a haproxy balancing api traffic between them | 16:46 |
johnsom | list calls don't go to the worker, they are serviced by the API layer directly. So no rabbit there | 16:46 |
jiteka | I see my delete/create call on api logs | 16:46 |
jiteka | but they don't get to octavia-worker | 16:46 |
jiteka | and after insisting | 16:46 |
jiteka | it works | 16:46 |
johnsom | So the sequence is: | 16:47 |
jiteka | list always works | 16:47 |
johnsom | Ok. hmm, and you are on queens? | 16:47 |
jiteka | only POST calls that need to be handled by octavia-worker fail 2 times then work the 3rd | 16:47 |
jiteka | (or fail 3 times then work the 4th, I don't remember) | 16:47 |
jiteka | yes running everything in stable/queens | 16:47 |
johnsom | And by fail, the user gets an error back on the command line? | 16:48 |
jiteka | no the user get a 200 | 16:48 |
jiteka | but nothing happened on the backend | 16:48 |
johnsom | So it's going onto the queue then. And the controllers don't run all three calls? | 16:48 |
jiteka | no | 16:48 |
jiteka | only 1 | 16:49 |
jiteka | but that's only happening after a restart | 16:49 |
jiteka | each call get process otherwise | 16:49 |
johnsom | That is super odd, if the user got 200 it should mean it posted the action to the rabbit queue. When a controller comes up it should pop that from the queue. Three calls should mean three executions | 16:50 |
johnsom | Is rabbit somehow getting restarted too and not running with durable queues? | 16:50 |
*** velizarx has quit IRC | 16:50 | |
jiteka | I don't restart the rabbit when restarting my docker containers with octavia services inside (it's 1 container per service) | 16:51 |
jiteka | if it help here are the few parameter relative to rabbitMQ http://paste.openstack.org/show/728545/ | 16:52 |
jiteka | but I'm not sure that's the culprit, it was just a guess | 16:53 |
jiteka | some of these value may come from config we run for other modules (like nova, neutron etc.) | 16:53 |
jiteka | we are still tweaking the service | 16:54 |
johnsom | jiteka Yeah, you have me stumped on that one. | 16:55 |
johnsom | I guess open a story on it with steps to reproduce | 16:55 |
rm_work | dolly: you're using my multi-az patch? :P | 16:56 |
johnsom | We don't set any of those rabbit specific settings, so I can't talk to those, but hmmm | 16:56 |
jiteka | johnsom: first I will see if it's still a problem when deploying 2.0.2 | 16:56 |
johnsom | Could be this: kombu_reconnect_delay = 15 | 16:57 |
sapd1_ | jiteka: Could you share steps to reproduce? | 16:57 |
johnsom | If it's waiting like 15 seconds before reconnecting to the queue | 16:57 |
johnsom | but still, it should get all three calls I think | 16:57 |
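If the reconnect delay is the culprit, the knob lives in the `[oslo_messaging_rabbit]` section of octavia.conf. The value below is hypothetical tuning, not a recommended default (oslo.messaging's own default for this option is 1.0 seconds; the pasted config had it at 15):

```ini
[oslo_messaging_rabbit]
# A long delay here means the worker sits idle before re-attaching to the
# queue after a restart, so the first posted actions look like they vanish.
kombu_reconnect_delay = 1.0
```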
jiteka | johnsom: just as a precision, I'm not trying to delete or create LBs right after the restart. I wait a few minutes but it still happens | 17:00 |
jiteka | johnsom sapd1 : I'm retrying it now and will share step in detail and logs | 17:00 |
jiteka | actually I'm running out of time, will take care of this tomorrow morning | 17:03 |
sapd1_ | jiteka: are you running Octavia in production now? | 17:04 |
jiteka | no | 17:05 |
jiteka | only cloud admin have access to APIs for the moment | 17:05 |
jiteka | wouldn't leave it like that if it was in production ^^ | 17:05 |
johnsom | Ok, please open a story with the steps | 17:06 |
jiteka | johnsom: ok I will | 17:07 |
rm_work | yeah i'm unclear why all your LBs went to PENDING_UPDATE still O_o | 17:21 |
rm_work | there is no reason for that to happen automatically unless like, all of them tried to failover | 17:21 |
rm_work | or did we figure that out already and i missed it | 17:21 |
johnsom | No, no idea on that one | 17:21 |
johnsom | rm_work Can you take a quick look at: https://review.openstack.org/#/c/594332/ | 17:22 |
rm_work | yeah i saw | 17:22 |
rm_work | which patch didn't we backport? | 17:22 |
johnsom | the "DELETED" as 404 patch as it changed the API version | 17:22 |
rm_work | ah <_< | 17:23 |
johnsom | This one: https://review.openstack.org/#/c/564430/ | 17:23 |
rm_work | i wish we could have merged that | 17:24 |
rm_work | i only -1'd it because i thought we should have the discussion | 17:24 |
rm_work | not because i didn't want to do it :P | 17:24 |
johnsom | Yeah, it's a good point though that we don't want to bump that version on queens.... We need that to fix the tests (which I am working on right now) | 17:25 |
*** velizarx has joined #openstack-lbaas | 17:29 | |
*** numans_ has quit IRC | 17:54 | |
*** luksky has joined #openstack-lbaas | 18:02 | |
*** sapd1_ has quit IRC | 18:08 | |
*** dolly_ has joined #openstack-lbaas | 18:31 | |
dolly_ | Hi again guys, sorry to bother you this much, but I feel that I'm pretty close to getting this up and running. As I said earlier, what I did was use the github stable/queens branch and put it into a container. This seems to be all fine and dandy and most things seem to work. Except for the health-manager, which doesn't say much except this, http://paste.openstack.org/show/728549/ | 18:34 |
dolly_ | Now, 1 ) is the healthmanager responsible for detecting if an amphora disappears and thus triggering a build of a new one ? | 18:35 |
dolly_ | Cause that would explain why that doesn't happen in my current setup =) | 18:35 |
*** abaindur has joined #openstack-lbaas | 18:36 | |
*** abaindur has quit IRC | 18:36 | |
*** abaindur has joined #openstack-lbaas | 18:37 | |
*** dougwig has quit IRC | 18:42 | |
*** fyx has quit IRC | 18:42 | |
*** mnaser has quit IRC | 18:42 | |
johnsom | dolly_ That is correct, those errors mean that stats are not getting updated and the health monitoring of the amphorae is not active. (note, those should probably be ERROR and not warning) | 18:43 |
dolly_ | ok cool. then I just need to understand why the health-manager cant find that health_db/stats_db driver, right ? | 18:45 |
johnsom | yes | 18:45 |
dolly_ | ok cool | 18:45 |
johnsom | It should be using setuptools entrypoints, loaded from the setup.cfg in octavia | 18:45 |
johnsom | at install time | 18:45 |
dolly_ | awesome, I'll dig deeper =) | 18:58 |
*** mnaser has joined #openstack-lbaas | 19:09 | |
*** luksky11 has joined #openstack-lbaas | 19:14 | |
*** luksky has quit IRC | 19:17 | |
*** celebdor has joined #openstack-lbaas | 19:22 | |
*** luksky has joined #openstack-lbaas | 19:36 | |
*** luksky11 has quit IRC | 19:38 | |
*** PagliaccisCloud has quit IRC | 19:39 | |
*** cgoncalves has quit IRC | 19:39 | |
*** PagliaccisCloud has joined #openstack-lbaas | 19:46 | |
*** dolly_ has quit IRC | 19:47 | |
*** luksky11 has joined #openstack-lbaas | 19:56 | |
*** luksky has quit IRC | 20:00 | |
*** beisner_ has joined #openstack-lbaas | 20:12 | |
*** velizarx has quit IRC | 20:17 | |
*** crazik has quit IRC | 20:19 | |
*** beisner has quit IRC | 20:19 | |
*** beisner_ is now known as beisner | 20:19 | |
nmagnezi | johnsom, thanks for the vote :) | 20:23 |
nmagnezi | johnsom, now that list looks great :D https://etherpad.openstack.org/p/octavia-priority-reviews | 20:24 |
johnsom | nmagnezi It didn't make RC2 as I didn't have enough cores for the backport vote, but if we can get that merged back in stable I will do an RC3 for you. | 20:25 |
nmagnezi | johnsom, I don't think we need an RC3 just for that one | 20:25 |
nmagnezi | johnsom, I'll propose a backports anyways, but don't cut another RC just for that one | 20:26 |
johnsom | Ok, if we can turn it around it's not that big of deal, I just didn't want to hold everything else | 20:26 |
johnsom | if I couldn't get a core quorum | 20:26 |
nmagnezi | johnsom, got it | 20:28 |
*** blake has joined #openstack-lbaas | 20:29 | |
openstackgerrit | Merged openstack/neutron-lbaas-dashboard master: Remove obsolete gate hooks https://review.openstack.org/594243 | 20:44 |
*** blake has quit IRC | 20:53 | |
*** blake has joined #openstack-lbaas | 20:54 | |
*** harlowja has joined #openstack-lbaas | 21:02 | |
*** luksky11 has quit IRC | 21:29 | |
KeithMnemonic | smcginnis: I think this one is ready now https://review.openstack.org/#/c/589576/ | 21:50 |
johnsom | KeithMnemonic Wrong channel? | 21:52 |
KeithMnemonic | dah | 21:54 |
KeithMnemonic | how are you doing today johnsom? I hope you are doing well | 21:55 |
*** blake_ has joined #openstack-lbaas | 21:55 | |
*** blake_ has quit IRC | 21:55 | |
johnsom | Ha, yeah, it's going along.... Fun with tempest tests today. You? | 21:55 |
KeithMnemonic | so my octavia, designate, neutron troubleshooting session got accepted ;-). | 21:55 |
johnsom | Nice! | 21:56 |
johnsom | I won't be in Berlin, but will be cheering for you remote | 21:56 |
KeithMnemonic | thanks, i think we will focus on amphora launch, maybe HA | 21:59 |
*** blake has quit IRC | 21:59 | |
KeithMnemonic | i have about 25 minutes or so to cover it | 21:59 |
johnsom | Let me know if you want me to review something or you have any questions | 22:00 |
*** rcernin has joined #openstack-lbaas | 22:10 | |
johnsom | nmagnezi dayou If you have a minute, I have fix for Queens: https://review.openstack.org/#/c/594332/ then we can cut 2.0.2 | 22:38 |
*** celebdor has quit IRC | 23:14 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!