*** Swami has quit IRC | 00:08 | |
*** abaindur has quit IRC | 00:56 | |
*** ricolin has joined #openstack-lbaas | 01:02 | |
sapd1 | I don't know why he did not continue implementing the l3-active-active feature: https://review.openstack.org/#/q/owner:yjf1970231893%2540gmail.com+status:open | 01:03 |
*** hongbin has joined #openstack-lbaas | 01:33 | |
*** ricolin_ has joined #openstack-lbaas | 01:50 | |
*** ricolin has quit IRC | 01:50 | |
*** hongbin has quit IRC | 02:35 | |
*** hongbin has joined #openstack-lbaas | 02:38 | |
*** hongbin has quit IRC | 02:49 | |
*** hongbin has joined #openstack-lbaas | 02:51 | |
*** yamamoto has quit IRC | 03:40 | |
*** yamamoto has joined #openstack-lbaas | 03:55 | |
*** ramishra has joined #openstack-lbaas | 03:59 | |
*** hongbin has quit IRC | 04:05 | |
*** Vorrtex has joined #openstack-lbaas | 04:13 | |
*** Vorrtex has quit IRC | 04:22 | |
*** abaindur has joined #openstack-lbaas | 04:51 | |
*** abaindur has quit IRC | 05:12 | |
*** ricolin_ has quit IRC | 05:58 | |
*** ccamposr has joined #openstack-lbaas | 06:10 | |
*** ricolin has joined #openstack-lbaas | 06:22 | |
rm_work | sapd1: basically, that work was being done by walmart, and they ended up dropping the whole project IIRC | 06:27 |
rm_work | johnsom might be able to corroborate that or correct me | 06:27 |
sapd1 | worst | 06:30 |
*** pcaruana has joined #openstack-lbaas | 06:30 | |
*** abaindur has joined #openstack-lbaas | 06:36 | |
sapd1 | rm_work, do you know his IRC nickname? | 06:41 |
rm_work | sapd1: not sure if he's still here anymore | 06:42 |
rm_work | umm | 06:42 |
rm_work | i think we interfaced with someone else there | 06:43 |
sapd1 | Ya. I see a topic about active/active at this PTG; looking forward to hearing something new. I'm trying to follow some of his patches, but it seems like he does not work on it any more. | 06:47 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Set member initializing state as OFFLINE https://review.openstack.org/651111 | 07:02 |
*** ivve has joined #openstack-lbaas | 07:06 | |
*** rpittau|afk is now known as rpittau | 07:12 | |
*** happyhemant has joined #openstack-lbaas | 07:48 | |
openstackgerrit | Merged openstack/octavia-lib master: Remove testtools from test-requirements.txt https://review.openstack.org/644876 | 07:59 |
*** rcernin has quit IRC | 08:01 | |
*** luksky has joined #openstack-lbaas | 08:15 | |
*** abaindur has quit IRC | 08:23 | |
*** yamamoto has quit IRC | 08:30 | |
*** yamamoto has joined #openstack-lbaas | 08:36 | |
*** vishalmanchanda has joined #openstack-lbaas | 08:44 | |
*** chungpht has joined #openstack-lbaas | 09:11 | |
*** luksky has quit IRC | 09:15 | |
*** yamamoto has quit IRC | 09:16 | |
*** yamamoto has joined #openstack-lbaas | 09:44 | |
*** yamamoto has quit IRC | 09:47 | |
*** psachin has joined #openstack-lbaas | 09:59 | |
*** salmankhan has joined #openstack-lbaas | 10:00 | |
*** yamamoto has joined #openstack-lbaas | 10:02 | |
*** livelace has joined #openstack-lbaas | 10:02 | |
livelace | Hello. I cannot find any information about the heartbeat logic. If an amphora sends its UDP packet through NAT, is that OK? Does the health manager work with such packets? | 10:04 |
*** yamamoto has quit IRC | 10:08 | |
*** luksky has joined #openstack-lbaas | 10:08 | |
livelace | Why am I asking? Because I see that the amphora is in ERROR status, but it works fine. | 10:12 |
cgoncalves | livelace, hi. sadly it will not work with heartbeats being sent behind NAT. the health manager looks for the source address of the UDP packet | 10:15 |
*** yamamoto has joined #openstack-lbaas | 10:15 | |
cgoncalves | https://github.com/openstack/octavia/blob/372ff99a030e6b33dad11a35cb9d5c4058805c53/octavia/amphorae/drivers/health/heartbeat_udp.py#L188 | 10:15 |
livelace | cgoncalves, Thanks. That means I should change my communication topology :( | 10:17 |
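A minimal sketch, assuming a plain stdlib UDP receiver and Octavia's default health-manager port of 5555 (port and names here are illustrative, not the actual Octavia code), of the behaviour cgoncalves points at above: the health record is keyed on the address that recvfrom() reports, so a NATed heartbeat carries the translator's address instead of the amphora's.

```python
import socket


def listen_for_heartbeats(bind_ip="0.0.0.0", bind_port=5555):
    """Yield (source_ip, payload) for each heartbeat packet received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, bind_port))
    while True:
        data, (srcaddr, _srcport) = sock.recvfrom(64 * 1024)
        # The health manager looks the amphora up by srcaddr. Behind NAT,
        # srcaddr is the translated address, the lookup fails, and the
        # amphora eventually ends up failed over / marked ERROR even
        # though it is running fine.
        yield srcaddr, data
```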
*** yamamoto has quit IRC | 10:18 | |
*** yamamoto has joined #openstack-lbaas | 10:21 | |
*** yamamoto has quit IRC | 10:21 | |
*** yamamoto has joined #openstack-lbaas | 10:22 | |
*** yamamoto has quit IRC | 10:24 | |
*** yamamoto has joined #openstack-lbaas | 10:24 | |
cgoncalves | sorry about that. perhaps the heartbeat packet could include the IP address in the payload, and the health manager could check whether it's set and use that one, otherwise fall back to the srcaddr as it does today | 10:33 |
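A hedged sketch of the idea cgoncalves floats here; the reported_ip field is hypothetical and does not exist in the current heartbeat payload.

```python
def pick_amphora_ip(heartbeat: dict, srcaddr: str) -> str:
    # Prefer an address the amphora reports about itself inside the
    # (HMAC-protected) payload; otherwise keep today's behaviour of
    # trusting the UDP source address.
    return heartbeat.get("reported_ip") or srcaddr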
cgoncalves | livelace, could you please file a story on storyboard.openstack.org? | 10:34 |
livelace | cgoncalves, I don't think so, really, because I have a very specific configuration and I don't know Octavia well. If I change my mind, I will file a story on storyboard (it's cool that such a board exists). | 10:41 |
*** livelace has quit IRC | 10:56 | |
*** yamamoto has quit IRC | 11:01 | |
*** yamamoto has joined #openstack-lbaas | 11:03 | |
*** yamamoto has quit IRC | 11:07 | |
*** yamamoto has joined #openstack-lbaas | 11:09 | |
*** yamamoto has quit IRC | 11:09 | |
*** yamamoto has joined #openstack-lbaas | 11:26 | |
*** yamamoto has quit IRC | 11:41 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Set member initializing state as OFFLINE https://review.openstack.org/651111 | 11:47 |
rm_work | cgoncalves: yeah we did it that way partly for security too, not just convenience | 11:51 |
rm_work | kind of "proof" that it is the packet it says it is | 11:51 |
rm_work | since the only other auth bits are global (no per-amp encryption key for packets) | 11:51 |
rm_work | that way someone couldn't perform part of a DoS by spoofing "healthy" messages and then trying to take down an amp or something (i dunno) | 11:52 |
rm_work | or at least make it more difficult | 11:52 |
rm_work | but maybe it doesn't actually help that much, dunno, would be good to hear from a seasoned network security person who might know if that's actually reasonable | 11:53 |
cgoncalves | rm_work, I didn't know about that decision (you guys are Octavia dinosaurs). makes sense :) | 11:53 |
rm_work | well, we could also have per-amp encryption keys | 11:54 |
* rm_work shrugs | 11:54 | |
rm_work | just more work | 11:54 |
rm_work | we were going for "secure enough" but also "able to launch in a reasonable timeframe" | 11:54 |
rm_work | turns out we were too late for RAX :/ | 11:55 |
rm_work | and also GD | 11:55 |
rm_work | as it turns out T_T | 11:55 |
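For context on the "global key" point above, a rough illustration, not Octavia's actual code, of how heartbeat verification with a single shared HMAC key looks; the digest choice and function names are assumptions, and per-amphora keys would simply mean looking up a different key per sender before this check.

```python
import hashlib
import hmac


def verify_heartbeat(packet_bytes: bytes, digest: bytes, key: bytes) -> bool:
    # With one global key, any holder of that key can forge "healthy"
    # heartbeats for any amphora; per-amphora keys would narrow the blast
    # radius at the cost of a per-sender key lookup here.
    expected = hmac.new(key, packet_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, digest)
```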
*** yamamoto has joined #openstack-lbaas | 11:59 | |
*** yamamoto has quit IRC | 11:59 | |
*** celebdor1 has joined #openstack-lbaas | 12:18 | |
*** livelace has joined #openstack-lbaas | 12:21 | |
*** celebdor1 has quit IRC | 12:22 | |
*** celebdor1 has joined #openstack-lbaas | 12:23 | |
johnsom | Yeah, the design does not support NAT. It is a routable private network, so it shouldn’t need NAT. But as rm_work said, it is a security feature. One of a few layers. | 12:24 |
livelace | I caught a difference between the CentOS and Ubuntu amphora images. I see that the Ubuntu network namespace doesn't have a default route, while CentOS has default routes (which it takes from the network settings). It seems to be a problem with the Ubuntu image, isn't it? | 12:25 |
johnsom | Ubuntu has a default route, but you might have an image that has a recent bug in it. | 12:26 |
johnsom | The bug causes eth1 to not be up at all in the netns, including the routes. | 12:27 |
livelace | johnsom, That is why I use centos mostly. Ok, thanks. | 12:27 |
*** celebdor1 has quit IRC | 12:28 | |
johnsom | In fairness, it was an Octavia bug we introduced in the RCs. RC3 has it fixed. | 12:29 |
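A quick, hedged way to check for the symptom johnsom describes, assuming the amphora uses the usual amphora-haproxy network namespace (run it on the amphora itself; the helper name is illustrative).

```python
import subprocess


def netns_has_default_route(netns: str = "amphora-haproxy") -> bool:
    # Returns True if a default route exists inside the amphora's network
    # namespace; the bug discussed above leaves eth1 (and therefore its
    # routes) down inside the netns.
    out = subprocess.run(
        ["ip", "netns", "exec", netns, "ip", "route", "show", "default"],
        capture_output=True, text=True, check=True)
    return bool(out.stdout.strip())
```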
*** livelace has quit IRC | 12:32 | |
*** yamamoto has joined #openstack-lbaas | 12:43 | |
cgoncalves | reviewers: as we plan to cut a release for stable/queens soon, it would be great if we could also try to have https://review.openstack.org/#/c/650909/ in | 12:43 |
*** openstackgerrit has quit IRC | 12:44 | |
*** HVT has quit IRC | 12:44 | |
*** yamamoto has quit IRC | 13:10 | |
*** yamamoto has joined #openstack-lbaas | 13:11 | |
*** yamamoto has quit IRC | 13:11 | |
*** vishalmanchanda has quit IRC | 13:50 | |
*** yamamoto has joined #openstack-lbaas | 13:51 | |
*** fnaval has joined #openstack-lbaas | 13:54 | |
*** Vorrtex has joined #openstack-lbaas | 13:58 | |
*** yamamoto has quit IRC | 14:02 | |
*** openstackgerrit has joined #openstack-lbaas | 14:11 | |
openstackgerrit | Merged openstack/octavia stable/queens: Fix the amphora base port coming up https://review.openstack.org/650469 | 14:11 |
*** celebdor1 has joined #openstack-lbaas | 14:13 | |
openstackgerrit | Merged openstack/octavia stable/rocky: Fix the amphora base port coming up https://review.openstack.org/650468 | 14:15 |
*** celebdor1 has quit IRC | 14:20 | |
*** boden has joined #openstack-lbaas | 14:30 | |
cgoncalves | nmagnezi, dayou_: do you have some spare minutes to review https://review.openstack.org/#/c/650909/? | 14:30 |
*** gcheresh_ has joined #openstack-lbaas | 14:33 | |
*** livelace has joined #openstack-lbaas | 14:42 | |
*** lemko has joined #openstack-lbaas | 14:46 | |
openstackgerrit | Merged openstack/octavia master: Fix the amphora base port coming up https://review.openstack.org/650417 | 14:46 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Set member initializing state as OFFLINE https://review.openstack.org/651111 | 14:46 |
*** gcheresh has joined #openstack-lbaas | 14:50 | |
*** gcheresh_ has quit IRC | 14:54 | |
livelace | johnsom, Were there problems with DNS resolution in CentOS-based amphora images? | 14:57 |
livelace | jiteka, I see in nsswitch only: "hosts: files myhostname" | 14:58 |
johnsom | DNS is disabled in all of the amphora images. | 14:58 |
livelace | jiteka, Sorry | 14:58 |
livelace | johnsom, For what reason ? | 14:59 |
johnsom | It is not needed, it slows down a lot of processes, and there is typically no DNS resolver available to the amphora. | 14:59 |
johnsom | Not to mention the security implications. | 15:00 |
johnsom | If you feel you do need it, you can always create a custom image that turns it back on and create a way for the amps to access a resolver. | 15:04 |
livelace | johnsom, Yes, I understand. Thanks for your comments. | 15:07 |
livelace | johnsom, I'm just investigating how I can increase the speed of amphora initialization. Are there any prod installations that use containers for this purpose? | 15:09 |
johnsom | No. The amps boot in around 30 seconds in most clouds, which seems reasonable for most. | 15:11 |
johnsom | I have done a PoC using lxd, the patches are posted, but there are tradeoffs. | 15:12 |
cgoncalves | and if you use the SINGLE topology, you may consider the amphora spare pool for faster provisioning times | 15:12 |
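For reference, the spare pool cgoncalves mentions is sized through the housekeeping section of octavia.conf; the option name below is from memory, so verify it against your release's configuration reference.

```ini
# Hedged example: keep a couple of pre-booted amphorae available so a new
# load balancer only needs plugging, not a full nova boot.
[house_keeping]
spare_amphora_pool_size = 2
```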
livelace | My initialization takes 160 seconds :( | 15:13 |
johnsom | The real issue is that we are plumbing, and the container platforms are focused on the app layers. | 15:13 |
johnsom | livelace: Then you have something very wrong in your deployment. | 15:13 |
cgoncalves | is your deployment on a nested virtualized environment? | 15:14 |
johnsom | I can’t go into detail today as I am traveling, so can go more into detail later in the week on containers. | 15:14 |
livelace | johnsom, I see that the worker checks the availability of the agent; at that moment the VM is available and I can connect to it over SSH. | 15:14 |
livelace | These checks eat most of the time. | 15:15 |
livelace | johnsom, Ok, have a good trip :) | 15:16 |
*** gcheresh has quit IRC | 15:17 | |
johnsom | If it is checking, the vm should be booted by then. It sounds like the environment has a problem. | 15:19 |
bcafarel | cgoncalves: o/ can you +2 https://review.openstack.org/#/c/646673/ ? (I see you went through all the other ones) | 15:19 |
cgoncalves | bcafarel, +W'd | 15:19 |
bcafarel | cgoncalves: thanks, kicking the complete set in gate run | 15:20 |
livelace | I caught another issue (for me): after shutting down the whole server (I'm testing on it), the amphora stays in "ERROR" state and the worker doesn't recreate it / create a new one. If I shut down the amphora correctly, the worker recreates a new one. | 15:22 |
*** sapd1_x has joined #openstack-lbaas | 15:22 | |
livelace | Is there any way to always check the amphora state and try to create a new instance? | 15:24 |
livelace | It seems that the controller/worker "forgets" about the amphora if the controller/worker was restarted. | 15:25 |
cgoncalves | livelace, the health manager service should have triggered an amphora failover. could you check the logs around the time the amphora went into ERROR? | 15:26 |
cgoncalves | oh, you restarted the health manager service while it was failing over the amphora? | 15:26 |
*** sapd1_x has quit IRC | 15:27 | |
colin- | i'd be surprised if any of the octavia processes forgot about an amphora, you can verify that by checking the amphora table in the octavia database where load_balancer_id=<your id> | 15:28 |
colin- | it will even track deleted ones there. if you have that visibility, it can be helpful in illustrating what octavia sees | 15:28 |
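A sketch of the database check colin- suggests; the column names are recalled from the Octavia schema and may differ slightly between releases.

```sql
-- Shows what Octavia thinks it knows about the amphorae of one LB,
-- including deleted ones.
SELECT id, compute_id, role, status, lb_network_ip
FROM amphora
WHERE load_balancer_id = '<your LB id>';
```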
livelace | Status of amphora and lb right after shutdown of the server https://paste.fedoraproject.org/paste/IEYR26k~jBJQYR2jsAzCAg | 15:28 |
livelace | The amphora vm is in Shutoff state | 15:29 |
livelace | cgoncalves, I'm simulating a power loss: I just shut down the whole server (shutdown -h now) while the amphora and the controller were running. | 15:31 |
johnsom | Yeah, that was probably the only compute node, the controller attempted to failover and repair, but nova failed to provision a replacement instance, so we stopped and marked it error. | 15:31 |
gthiemonge | View All | 15:32 |
*** ivve has quit IRC | 15:32 | |
gthiemonge | oops, sorry | 15:32 |
cgoncalves | livelace, johnsom made a good point. could you please check the health manager service logs? | 15:34 |
livelace | Debug log https://paste.fedoraproject.org/paste/IBO1HrPcMWEQskeB4rgZAA | 15:38 |
*** ccamposr has quit IRC | 15:39 | |
cgoncalves | hmm, "ComputeDeleteException: Failed to delete compute instance.". the exception raised there is not being handled | 15:39 |
livelace | And what next? The amphora and LB are not working. What should we do if the cluster is down? Recreate all the LBs by hand? :) | 15:40 |
*** luksky has quit IRC | 15:42 | |
livelace | cgoncalves, I guess because the Nova services were initializing at that time. | 15:43 |
livelace | My experience suggests that we should do periodic checks and try to recreate amphorae and their LBs, because we don't know how the cluster was taken down and we don't know in what order things will come back up during initialization. | 15:46 |
colin- | i can't see the log (gated?), which process logged the "ComputeDeleteException" message? | 15:51 |
colin- | want to see if i've ever recorded it | 15:51 |
livelace | cgoncalves, "is your deployment on a nested virtualized environment?" No, I don't use nested virt; it's a server where I can do some experiments before changes go to the prod cluster. | 15:51 |
cgoncalves | the exception is thrown here: https://github.com/openstack/octavia/blob/12668dec63906628e2f01f651a9e57d9b2446e40/octavia/controller/worker/tasks/compute_tasks.py#L187 | 15:51 |
colin- | thanks | 15:52 |
cgoncalves | get_failover_flow isn't catching that though | 15:52 |
cgoncalves | https://github.com/openstack/octavia/blob/147a340f4031d13bd196adb2fd7204db7a7bd5c5/octavia/controller/worker/flows/amphora_flows.py#L387-L389 | 15:52 |
johnsom | It didn’t revert? Did they turn that off in the config? | 15:54 |
cgoncalves | so, what I'm reading is that we trigger a nova instance delete to delete the old amphora and nova throws an error because the compute node is unavailable. am I missing something? | 15:54 |
cgoncalves | johnsom, no revert defined for that task :/ | 15:55 |
livelace | Could you guys give a verdict ? | 15:59 |
cgoncalves | livelace, at this point you may have to recreate the LB, I'm afraid | 16:00 |
livelace | cgoncalves, No, no, I meant what are your plans for this issue? | 16:01 |
livelace | Will you change the behaviour in the future or not? | 16:01 |
cgoncalves | livelace, once we confirm and root cause the issue, someone should work on a fix | 16:05 |
cgoncalves | the first step would be to file a story. could you please open one on storyboard.openstack.org? | 16:06 |
livelace | cgoncalves, johnsom Thanks for your replies! | 16:06 |
*** livelace has quit IRC | 16:06 | |
johnsom | It should still revert even if we don’t define a revert step for that task. | 16:17 |
cgoncalves | ah! our great new PTL has us covered! | 16:33 |
cgoncalves | https://review.openstack.org/#/c/616287/ | 16:33 |
cgoncalves | LB was not being marked in ERROR before | 16:33 |
cgoncalves | queens and rocky backports merged in end of March/beginning of April | 16:34 |
*** salmankhan has quit IRC | 16:39 | |
johnsom | Ah, so it failed prior to entering the flow? | 16:45 |
cgoncalves | johnsom, no. it entered the flow, the exception was raised on the nova delete, the amp was marked in ERROR, and it bubbled up to failover_amphora which, while catching any exception, was not marking the LB in ERROR | 16:51 |
johnsom | That flow doesn’t have a capstone task that moves the lb to error? Maybe the database was down by then and we couldn’t mark it as such. | 16:53 |
openstackgerrit | Merged openstack/neutron-lbaas stable/stein: Replace openstack.org git:// URLs with https:// https://review.openstack.org/646674 | 16:54 |
openstackgerrit | Merged openstack/neutron-lbaas stable/rocky: Replace openstack.org git:// URLs with https:// https://review.openstack.org/646673 | 16:54 |
cgoncalves | LoadBalancerToErrorOnRevertTask | 16:55 |
cgoncalves | apparently not, no | 16:55 |
*** gcheresh has joined #openstack-lbaas | 16:56 | |
*** psachin has quit IRC | 17:00 | |
cgoncalves | https://github.com/openstack/octavia/blob/a728bc000f65e431dc57fef61d41e5ba63d72b02/octavia/controller/worker/tasks/lifecycle_tasks.py#L34-L35 | 17:04 |
cgoncalves | no mark lb status error there | 17:05 |
cgoncalves | it should, considering also that it might be a spare amp and hence not associated to any LB | 17:05 |
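To make the gap concrete, a purely illustrative sketch, not the actual Octavia task, of a lifecycle revert that also marks the parent LB. It assumes Octavia's BaseLifecycleTask and its task_utils helpers; the method names are as recalled and should be verified against the tree linked above.

```python
class AmphoraToErrorOnRevertTask(BaseLifecycleTask):
    """Illustrative only: also flip the parent LB when the flow reverts."""

    def execute(self, amphora):
        pass

    def revert(self, amphora, *args, **kwargs):
        self.task_utils.mark_amphora_status_error(amphora.id)
        # A spare amphora has no parent LB, so guard before marking one.
        if getattr(amphora, 'load_balancer_id', None):
            self.task_utils.mark_loadbalancer_prov_status_error(
                amphora.load_balancer_id)
```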
*** jiteka1 has joined #openstack-lbaas | 17:13 | |
*** ramishra has quit IRC | 17:19 | |
*** gcheresh has quit IRC | 17:24 | |
*** yamamoto has joined #openstack-lbaas | 17:28 | |
*** livelace has joined #openstack-lbaas | 17:32 | |
*** ricolin has quit IRC | 18:04 | |
*** ceryx has joined #openstack-lbaas | 18:10 | |
*** yamamoto has quit IRC | 18:19 | |
*** rpittau is now known as rpittau|afk | 18:21 | |
*** luksky has joined #openstack-lbaas | 18:22 | |
*** happyhemant has quit IRC | 18:38 | |
*** yamamoto has joined #openstack-lbaas | 18:58 | |
openstackgerrit | Merged openstack/neutron-lbaas stable/queens: Choose correct log option by listener protocol https://review.openstack.org/647689 | 19:04 |
*** yamamoto has quit IRC | 19:09 | |
*** abaindur has joined #openstack-lbaas | 19:35 | |
openstackgerrit | Merged openstack/neutron-lbaas stable/queens: Fix proxy extension for neutron RBAC https://review.openstack.org/649048 | 20:17 |
openstackgerrit | Merged openstack/neutron-lbaas stable/pike: Replace openstack.org git:// URLs with https:// https://review.openstack.org/646671 | 20:17 |
*** pcaruana has quit IRC | 20:31 | |
*** pcaruana has joined #openstack-lbaas | 20:33 | |
*** pcaruana has quit IRC | 20:36 | |
*** pcaruana has joined #openstack-lbaas | 20:39 | |
*** lemko has quit IRC | 20:44 | |
livelace | cgoncalves, Ok. I'm going to do it right now :) | 20:45 |
*** pcaruana has quit IRC | 20:47 | |
*** Vorrtex has quit IRC | 20:47 | |
livelace | cgoncalves, https://storyboard.openstack.org/#!/story/2005417 | 21:07 |
*** boden has quit IRC | 21:08 | |
*** luksky has quit IRC | 21:50 | |
*** rcernin has joined #openstack-lbaas | 22:28 | |
*** fnaval has quit IRC | 22:29 | |
*** abaindur has quit IRC | 23:02 | |
*** abaindur has joined #openstack-lbaas | 23:03 | |
*** livelace has quit IRC | 23:14 | |
*** fnaval has joined #openstack-lbaas | 23:42 | |
*** fnaval has quit IRC | 23:45 |