Friday, 2018-12-21

*** maysams has quit IRC00:05
*** celebdor has joined #openstack-kuryr01:06
*** rh-jelabarre has joined #openstack-kuryr01:24
*** hongbin has joined #openstack-kuryr02:00
*** celebdor has quit IRC02:33
*** rh-jelabarre has quit IRC04:25
*** janki has joined #openstack-kuryr04:44
*** sean-k-mooney has quit IRC05:18
*** hongbin has quit IRC05:55
*** ccamposr has joined #openstack-kuryr05:59
*** janki has quit IRC06:38
dulekbathri-s: Seems like your Docker daemon isn't running for some reason. Check out its logs.07:18
*** maysams has joined #openstack-kuryr07:43
*** pcaruana has joined #openstack-kuryr08:19
*** dmellado has quit IRC08:44
dulekltomasbo: Seems like you're the only one I can consult my idea with… ;)08:45
ltomasbodulek, sure! tell me!08:45
dulekltomasbo: I'm tempted to try switching all jobs to multinode and only run etcd on the second node.08:46
dulekltomasbo: That way we should be able to see whether it's a contention or simple lack-of-resources issue, or something in the software.08:46
ltomasbodulek, umm, with the current status of the gates...08:46
dulekltomasbo: Naaah, just for testing.08:46
*** dmellado has joined #openstack-kuryr08:46
ltomasbodulek, are we actually using the multinode gate right now?08:46
ltomasbodulek, perhaps you can test it on that one...08:47
dulekltomasbo: Yes, but etcd only runs on controller node.08:47
ltomasbodulek, ahh, true...08:47
ltomasbodulek, sounds good to me08:47
dulekltomasbo: Well, the etcd issue is hitting us 1 in 10 runs, so it's easier to just switch all. :P08:48
ltomasbodulek, also, note there were some problems with locating etcd on a different node (if I remember correctly)08:48
ltomasbodulek, I remember having to do so for a mix env with nested and baremetal08:48
ltomasbodulek, and I had to tweak devstack to allow that... not sure about the current status08:48
ltomasbodulek, on a (somehow) related note. I didn't hit the int literal problem thing on this: https://review.openstack.org/#/c/626624/108:49
dulekltomasbo: Hm, yeah, I'll need to get subnode IP somehow.08:49
ltomasbodulek, but probably because it failed before...08:49
dulekltomasbo: xD08:49
ltomasbodulek, so, I'm rechecking to see if it helps...08:49
ltomasbodulek, also updated https://review.openstack.org/#/c/626363/08:50
ltomasbodmellado, ^^08:50
dulekdmellado: o/08:50
ltomasbodulek, this looks good too: https://review.openstack.org/#/c/626638/108:50
dulekltomasbo: It does, but K8s 1.13 drops etcd2. xD08:51
dmelladohi folks, damn (hug) bouncer08:51
dulekltomasbo: I needed to go back to 1.12 to make this work.08:51
dmelladoany more hugged findings from the gate?08:51
dulekdmellado: http://eavesdrop.openstack.org/irclogs/%23openstack-kuryr/08:51
ltomasbodulek, yep, but dmellado was moving back to 1.12 too in another patch...08:51
ltomasbobetter to go with 1.12 until we make 1.13 work, rather than the other way around, right?08:51
ltomasbodulek, dmellado ^^08:51
dulekltomasbo: It's find for short term of course, but in the long run we need to find a way.08:51
dulekOr maybe it's 1.13 bug and we should report it.08:52
dulekltomasbo: But from what I've seen K8s API folks only answer - add resources to your etcd node.08:52
dulekltomasbo: s/find/fine. :D08:52
ltomasbodulek, yep... and that could be actually the case... I think etcd has a problem...08:52
dmelladoloool so k8s 1.13 to blame08:52
dmelladoand just 'add resources'08:52
dmelladoawesome08:53
dmelladolet's revert to 1.12 for now and investigate with 1.13 on an experimental gate08:53
ltomasbodulek, in my envs sometimes pods are not getting active too due to etcd missing events08:53
ltomasbodulek, and in eminguez env the other day, deletion of resources was taking ages...08:53
dulekdmellado: We've seen that occasionally with 1.12 as well. It's probably just 1.13 stretching etcd more.08:53
ltomasboprobably due to etcd sync...08:53
dmelladoin any case it might become better if we go to kubeadm and put etcd on another node...08:54
dulekdmellado: That's my idea for a test patch now. Switch all the gates to multinode with etcd on the subnode.08:54
dulekdmellado: I just need to get subnode ip somehow. :D08:54
dulekdmellado: Another thing to try is to give the freaking etcd higher CPU priority. It hasn't worked with IO, but dstat says that it isn't really an issue with iops.08:56
*** janki has joined #openstack-kuryr08:56
dmelladoeven with nice? xD08:56
dulekI'll start with analyzing https://review.openstack.org/#/c/626638/08:56
dulekdmellado: https://review.openstack.org/#/c/624731/08:57
dmelladodulek: # Żółć. ? xD08:58
dmelladoin any case looks promising08:59
dmelladoif we could get by with this without having to get all our gates to multinode it'd be easier08:59
dulekdmellado: Oh, I only want to switch that to test if it's contention/lack of resources issue.09:00
dmelladoI wouldn't be surprised if that's the case09:00
dulekdmellado: żółć means bile. Hard to find more Polish word. :D09:00
dmelladoas we're installing a lot of things in the node09:00
dmelladodevstack, hyperkube and so09:01
dulekltomasbo: Hey, and about the failures on your OVS from source patch.09:01
dulekltomasbo: Isn't it due to different kernel versions maybe?09:01
dulekltomasbo: Just a thought but maybe it's only failing on one of the clouds.09:01
dulekOkay, so with switch to etcd2 we're still getting some issues on RAX nodes. I'll wait for the recheck, but looks like it's not helping.09:03
dmelladodulek: that with also 1.12?09:04
dulekdmellado: Yup, I needed to downgrade that - 1.13 drops etcd2 support.09:05
dmelladoactually it seems that https://review.openstack.org/#/c/624730/ performs better09:08
ltomasbodulek, I checked and some of them are unrelated... others could be due to not rmmod openvswitch before compiling it from source (I guess)09:18
*** garyloug has joined #openstack-kuryr09:19
ltomasbodulek, in your patch: HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://158.69.74.203:2379 has no leader\n","code":500}09:20
dmelladoI've also seen that on some patches09:20
dulekltomasbo: Which of my patches/09:24
dulek?09:24
ltomasbothe one using etcd2 and kubernetes 1.1209:24
dulekltomasbo: Which gate?09:25
ltomasbohttp://logs.openstack.org/38/626638/1/check/kuryr-kubernetes-tempest-daemon-containerized-octavia/050a482/testr_results.html.gz09:25
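The "has no leader" failure that ltomasbo quotes above arrives as a Kubernetes `Status` object with code 500. A minimal sketch of how such a body could be recognised when triaging gate logs follows; the helper name `is_etcd_unavailable` and the substring it matches on are assumptions made for illustration and are not part of kuryr-kubernetes or its tempest plugin.

```python
import json

# Response body exactly as quoted in the log above (raw string keeps the JSON \n escape).
BODY = r'''{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://158.69.74.203:2379 has no leader\n","code":500}'''


def is_etcd_unavailable(body: str) -> bool:
    """Hypothetical helper: True if a K8s API Status body reports etcd unavailability."""
    try:
        status = json.loads(body)
    except ValueError:
        # Not JSON at all, so it cannot be a Status object.
        return False
    if not isinstance(status, dict):
        return False
    return (
        status.get("kind") == "Status"
        and status.get("code") == 500
        and "etcd cluster is unavailable" in status.get("message", "")
    )


print(is_etcd_unavailable(BODY))
```

Matching on the message text is brittle, so this is only suitable for quick log triage, not for logic that tests should depend on.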
ltomasbodulek, this seems to work actually: https://review.openstack.org/#/c/626624/09:38
ltomasbodulek, it is failing on ovn because by using that var, I believe both ovn and neutron are trying to install ovs from source09:39
ltomasbodulek, the other gate is failing on the sg_svc_isolation (which is unrelated)09:39
ltomasbodulek, and the last failure is related to a failed kubernetes-scheduler.service...09:40
ltomasbodulek, I'm going to recheck again...09:40
dulekltomasbo: http://logs.openstack.org/24/626624/1/check/kuryr-kubernetes-tempest-daemon-octavia/d5736a8/controller/logs/screen-kubernetes-api.txt.gz#_Dec_21_08_25_21_61855109:40
dulekltomasbo: It still has failures on K8s API.09:41
dulekltomasbo: So I'd say it's just luck. ;)09:41
ltomasbodulek, I meant for the other error, not for etcd09:41
dulekltomasbo: Ah, but you were aiming to get rid of base 10. :D09:41
ltomasbodulek, I'm referring to the base 10 thing, yes09:41
dulekltomasbo: Those were more rare, so it might need a few rechecks to confirm.09:42
ltomasboyes yes, I'm rechecking...09:42
ltomasbo(and waiting for experimental)09:42
ltomasbobut I was getting those locally with ovs2.909:42
ltomasboso, I believe it may help... fingers crossed09:43
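How many rechecks "a few" has to be can be estimated with a small calculation. The sketch below assumes gate runs fail independently at a fixed rate and uses the 1-in-10 figure mentioned earlier for the etcd issue purely as an illustration; the function name `rechecks_needed` and the 95% confidence target are assumptions, and the rarer base 10 failure would need more runs than this.

```python
import math


def rechecks_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n straight passes happen by luck with probability <= 1 - confidence.

    Assumes each run fails independently with probability failure_rate.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


print(rechecks_needed(0.1))
```

Under these assumptions a bug that hits 1 run in 10 needs roughly 29 consecutive green runs before luck becomes an unlikely explanation, which is why a single passing recheck says little.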
*** gmann is now known as gmann_pto09:43
*** celebdor has joined #openstack-kuryr09:53
openstackgerritMerged openstack/kuryr-kubernetes master: Drop Octavia providers supported protocols list  https://review.openstack.org/62603209:57
*** phuoc_ has quit IRC10:06
*** phuoc_ has joined #openstack-kuryr10:06
dulekdmellado: If you want easy Friday refactoring, there's a devstack-minimal job that could serve as our base job. :)10:39
dmelladodulek: heh, we could use that10:39
dmelladobut let me first finish dealing with pagure and fedora10:39
dmelladohugged python-openshift10:39
dmelladodulek:  please remind me to force using request for anyone who needs a client that's not packaged in distro already10:40
openstackgerritMichał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services  https://review.openstack.org/62660910:51
openstackgerritMichał Dulko proposed openstack/kuryr-kubernetes master: DNM: Put etcd on another host  https://review.openstack.org/62687210:51
dulekdmellado: ^ that's really bruteforce, but let's see.10:51
dulekdmellado: My other idea is to put etcd data directory onto ramdisk. xD10:51
dmelladodulek: hmmm looking forward to see the result...10:51
dmelladoLOL10:51
dmelladothat might actually not be a bad idea xD10:52
dmelladobut we're already short on ram xD10:52
dulekdmellado: For tests it's totally a viable long-term solution.10:52
dulekdmellado: Depends how big it would need to be. If etcd only needs ~500 MB, then I think we can spare that.10:52
dmelladowell, let's see the outcome of ^^first and then we can get to discuss that10:53
dmelladowe could even use the another host ramdisk xD xD XD10:53
dulekdmellado: Shhh, I hear infra folks walking nearby. ;)10:53
dmellado:D10:54
*** gkadam has joined #openstack-kuryr10:56
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs  https://review.openstack.org/62687811:05
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs  https://review.openstack.org/62687811:37
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs  https://review.openstack.org/62687811:44
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: DNM Test building ovs from source  https://review.openstack.org/62662411:55
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs  https://review.openstack.org/62687811:55
openstackgerritMichał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services  https://review.openstack.org/62660912:19
dulekdmellado: ^ this now depends on a patch that creates a ramfs for etcd's data. We'll see…12:26
dmelladolol12:26
dmelladolet's see if that ramdisk isn't really filled out12:26
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create  https://review.openstack.org/62688712:30
dulekdmellado: I've tried it locally and, well. kubectl works super fast. xD12:38
dulekGotta go now, happy holidays everyone!12:39
dmelladodulek: happy holidays!12:39
dmelladoI'll take a look, thanks!!!12:39
dmelladoand safe travels!!!12:39
dulekI'll check back here in the evening, but probably nobody will be there. ;)12:39
dmelladoI assume you'll be driving back home for Christmas?12:39
dulekdmellado: Naaah, we've decided to fly to the Canary Islands instead. ;)12:40
dmelladoOh, enjoy then!12:40
dmelladodon't forget to try 'mojo picón' xD12:40
* dulek never tries anything dmellado recommends without checking what's that first.12:42
dmelladolol12:42
dmelladodulek:  c'mon, I even helped you with your CYD issues xD12:43
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs  https://review.openstack.org/62687812:53
*** ccamposr has quit IRC12:54
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs  https://review.openstack.org/62687812:56
*** rh-jelabarre has joined #openstack-kuryr13:15
openstackgerritAntoni Segura Puimedon proposed openstack/kuryr-tempest-plugin master: detect failed curl when streamed from pod  https://review.openstack.org/62689213:21
celebdorltomasbo: dulek: ^^ will at least give a more meaningful error13:22
celebdorto reduce headscratching13:22
*** janki has quit IRC14:04
ltomasbocelebdor, dulek: I'm rechecking once again this one: https://review.openstack.org/#/c/626624/14:21
ltomasbocelebdor, dulek: for now I haven't hit the invalid literal problem. And I fixed the problem for OVN gates and building from source (twice)14:21
celebdorltomasbo: why are you rechecking it?14:21
celebdorltomasbo: did you see the change I made to be more precise on the error?14:22
ltomasbocelebdor, as that problem was not happening all the time, to ensure it is actually avoiding the problem (and not just being lucky)14:22
celebdorok14:22
ltomasbocelebdor, not yet14:22
celebdorok14:25
dmelladolet's see if we can get the CI to behave in a more reliable way14:54
dmelladoalso the ramfs will make things faster hopefully14:54
dmelladowhile not getting the node outta ram14:54
openstackgerritMerged openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs  https://review.openstack.org/62687816:04
*** pcaruana has quit IRC16:18
dmelladoo/ I'm off until Jan the 2nd! Happy New Year kuryrs! Thanks for your help along this year! ;)16:21
openstackgerritMaysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Update CRD when NP has podSelectors  https://review.openstack.org/62558816:29
openstackgerritLuis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure gates run the latest OVS  https://review.openstack.org/62662416:40
*** gkadam has quit IRC16:58
*** gkadam has joined #openstack-kuryr16:58
openstackgerritMerged openstack/kuryr-tempest-plugin master: Fixup kuryr_daemon_enabled option description  https://review.openstack.org/62293217:15
*** celebdor has quit IRC19:33
*** maysams has quit IRC20:46
openstackgerritMerged openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create  https://review.openstack.org/62688721:26
*** aojea has joined #openstack-kuryr22:56
*** aojea has quit IRC22:56
*** dims has quit IRC23:33
