Thursday, 2019-07-11

*** hongbin has joined #openstack-kuryr00:57
*** ndesh has joined #openstack-kuryr01:47
*** irclogbot_1 has joined #openstack-kuryr02:06
*** irclogbot_1 has quit IRC02:13
*** irclogbot_0 has joined #openstack-kuryr02:16
*** altlogbot_3 has joined #openstack-kuryr02:18
*** irclogbot_0 has quit IRC02:19
*** altlogbot_3 has quit IRC02:34
*** irclogbot_1 has joined #openstack-kuryr03:20
*** irclogbot_1 has quit IRC03:25
*** hongbin has quit IRC03:44
*** altlogbot_3 has joined #openstack-kuryr03:54
*** altlogbot_3 has quit IRC03:59
*** gcheresh_ has joined #openstack-kuryr04:34
*** pcaruana has joined #openstack-kuryr04:55
*** ccamposr has joined #openstack-kuryr05:18
*** ccamposr__ has quit IRC05:20
*** gcheresh_ has quit IRC05:34
*** gcheresh_ has joined #openstack-kuryr05:40
*** ccamposr__ has joined #openstack-kuryr05:57
*** gcheresh_ has quit IRC05:58
*** ccamposr has quit IRC06:00
*** altlogbot_1 has joined #openstack-kuryr06:18
*** altlogbot_1 has quit IRC06:23
*** altlogbot_0 has joined #openstack-kuryr06:24
*** altlogbot_0 has quit IRC06:29
*** openstackgerrit has joined #openstack-kuryr06:53
openstackgerritAlexey Perevalov proposed openstack/kuryr-kubernetes master: Use CNI_IFNAME environment variable  https://review.opendev.org/67014106:53
*** altlogbot_3 has joined #openstack-kuryr06:56
*** altlogbot_3 has quit IRC07:01
*** gcheresh_ has joined #openstack-kuryr07:11
*** korzen has joined #openstack-kuryr07:14
*** irclogbot_2 has joined #openstack-kuryr07:16
*** irclogbot_2 has quit IRC07:20
*** irclogbot_3 has joined #openstack-kuryr07:24
*** irclogbot_3 has quit IRC07:27
*** altlogbot_3 has joined #openstack-kuryr07:48
*** altlogbot_3 has quit IRC07:49
*** irclogbot_3 has joined #openstack-kuryr07:52
*** irclogbot_3 has quit IRC07:55
*** irclogbot_0 has joined #openstack-kuryr08:08
*** irclogbot_0 has quit IRC08:13
*** maysams has joined #openstack-kuryr08:20
*** gkadam has joined #openstack-kuryr08:28
*** ccamposr has joined #openstack-kuryr08:51
*** ccamposr__ has quit IRC08:53
*** FlorianFa has joined #openstack-kuryr09:36
*** korzen has quit IRC09:37
*** irclogbot_3 has joined #openstack-kuryr09:48
*** irclogbot_3 has quit IRC09:51
*** altlogbot_3 has joined #openstack-kuryr10:26
*** altlogbot_3 has quit IRC10:29
*** janki has joined #openstack-kuryr10:32
*** altlogbot_0 has joined #openstack-kuryr10:34
*** altlogbot_0 has quit IRC10:39
*** altlogbot_2 has joined #openstack-kuryr10:40
*** irclogbot_0 has joined #openstack-kuryr10:40
*** altlogbot_2 has quit IRC10:45
*** irclogbot_0 has quit IRC10:45
*** altlogbot_3 has joined #openstack-kuryr10:56
*** altlogbot_3 has quit IRC11:01
openstackgerritMerged openstack/kuryr-kubernetes master: Set the validate CRD enabled flag at tempest.conf  https://review.opendev.org/66891611:03
openstackgerritMerged openstack/kuryr-kubernetes stable/stein: Fix fail to recreate namespace when previous KuryrNet CRD is not deleted  https://review.opendev.org/66980011:14
*** altlogbot_1 has joined #openstack-kuryr11:56
*** rh-jelabarre has joined #openstack-kuryr11:59
*** altlogbot_1 has quit IRC12:01
*** danil has joined #openstack-kuryr12:08
danilHello. Does anybody know why should we specify resourceVersion here https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/handlers/vif.py#L212 ? Could we make annotations withot exact resourceVersion? Can it cause problems?12:10
dulekdanil: resourceVersion is to make sure nobody updated that annotation between.12:15
dulekdanil: Basically to avoid lost update.12:15
*** janki has quit IRC12:21
*** irclogbot_3 has joined #openstack-kuryr12:32
*** irclogbot_3 has quit IRC12:35
*** janki has joined #openstack-kuryr12:43
*** janki has quit IRC12:45
danildulek, so from the code I can see, that if somebody updated annotations between, we will get an Exception from k8s client. After that an attempt will be repeated until it will be succeed, right?12:49
*** altlogbot_2 has joined #openstack-kuryr12:50
dulekdanil: Yes, but only if that annotation was not modified.12:50
dulekOtherwise you'll get an error and you'll need to check if there wasn't a conflict or something.12:51
danildo you mean "annotation was not modified" between attempts? or before ?12:51
openstackgerritMerged openstack/kuryr-kubernetes master: Use CNI_IFNAME environment variable  https://review.opendev.org/67014112:52
*** altlogbot_2 has quit IRC12:53
dulekdanil: Okay, so here how it works:13:01
dulek1. We read pod from K8s API.13:01
dulek2. We do some modifications to annotations based on that information.13:02
dulek3. Then we try to save the annotation.13:02
dulek4. Now if we won't include resourceVersion, that pod might have changed, or some other thread/service wrote to that annotation field.13:02
dulekSo we use resourceVersion to make sure that if the pod resource was modified between we read and saved it - we will reconsider it.13:03
dulekAccording to this: <danil> conflict in resourceVersion happens (in my opinion) because object was updated in binding driver (also resourceVersion was updated), but vif handler tries to annotate pod with old resourceVersion13:04
dulekWell then I think resourceVersion serves its purpose - makes sure that VIF handler will notice that pod was updated.13:04
dulekdanil: So the VIF handler should retry, shouldn't it?13:05
danilyes, it retries , but after exceptions from k8s client like this...13:07
danilException response, headers: {'Date': 'Thu, 04 Jul 2019 09:58:23 GMT', 'Content-Length': '338', 'Content-Type': 'application/json'}, content: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on pods \"nginx-sriov-664456478-cr8kq\": the object has been modified; please apply your changes to13:07
danil the latest version and try again","reason":"Conflict","details":{"name":"nginx-sriov-664456478-cr8kq","kind":"pods"},"code":409}, text: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on pods \"nginx-sriov-664456478-cr8kq\": the object has been modified; please apply your changes to the13:07
danillatest version and try again","reason":"Conflict","details":{"name":"nginx-sriov-664456478-cr8kq","kind":"pods"},"code":409}13:07
danildulek, ^ sorry, did'n ,antioned you13:13
dulekdanil: This is saving the VIF annotation?13:14
danildulek, it happens in vif handler while _set_pod_state. It tries to save vif annotations after all neutron ports were active. An Exception occurs , it retries , after that annotations are written13:18
danilhttps://pastebin.com/bLUXpsNm13:18
dulekdanil: Wait, on which level it retries?13:18
dulekOkay, I see, on RetryHandler level.13:19
danildulek, yes13:19
dulekdanil: Do you have more logs?13:25
dulekdanil: Like including this one: https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L198-L19913:25
danildulek, one moment13:25
danildulek, https://pastebin.com/BQAs08A0 here are logs from controller13:35
*** irclogbot_1 has joined #openstack-kuryr13:40
dulekdanil: Okay, so either 'openstack.org/kuryr-vif' or 'openstack.org/kuryr-pod-label' got updated between the read and write.13:43
dulekdanil: You need to discover what updated it and why it happens at this moment.13:43
dulekdanil: And if needed implement retry logic in VIF handler.13:44
dulekdanil: This stuff is actually quite expected as we started annotating pods in kuryr-daemon as well.13:44
dulekIt is way easier to implement if there's just one entity doing that.13:45
dulekBut if only kuryr-daemon knows about SR-IOV, then it might be best design we have.13:45
*** irclogbot_1 has quit IRC13:45
dulekdanil: I would start with printing values of the annotations in the LOG.debug() I linked above.13:45
dulekTo compare what actually changes.13:45
danildulek, hm, actually binding driver made changes in annotations , but it was another name of annotations. I suppose that after this change resourceVersion of pod object was changed. So when controller tries to make annotations once again (with specified old resourceVersion) it has a conflict13:47
dulekdanil: No, this code is supposed to retry if only unrelated annotations were modified: https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L215-L22913:48
dulekdanil: So unless there's a flaw there (might be, it's not trivial), then it should retry if none of 'openstack.org/kuryr-vif' and 'openstack.org/kuryr-pod-label' got modified.13:49
dulekHm should it?13:49
dulekYes, it should.13:50
danildulek,  https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L220-L222 , here I got a break because 'openstack.org/kuryr-vif' was modified from controller side (it changed 'active' to True for default VIF because is have seen active related neutron port)13:58
danilbut it is normal behavior13:58
dulekdanil: Well, then you found the culprit, I assume?13:59
danildulek, no) is this not a normal behavior? neutron port becomes active -> related vif is marked as active -> annotate new annotations14:01
dulekdanil: Wait, you have 2 VIF's there now?14:06
dulekdanil: So it's probably like this:14:06
dulekBoth become active nearly the same time and one update wins the race condition and second loses it.14:07
dulekYou can confirm it the way I proposed above - just add the annotation value to the debug statement there, so you can see what is being annotated and what was overwritten, right?14:08
danildulek, yes, we have 2 VIF's there. (Actually one of them that is sriov is made active by hands, it doesn't wait for related neutron port). Also it is strange because this problems occurred only with https://review.opendev.org/#/c/656482/14:15
danilI confirmed14:15
danilwe try to write "active": True , but we have "active": False . This is for default vif14:16
dulekdanil: Uhm, just a sec, let me think.14:17
*** gcheresh_ has quit IRC14:17
danildulek, I ve checked one thing: I checked out on master for CNI side, and problem disappeared . So I think problem is in new patch that annotates to pod from sriov binding driver14:19
danildulek, no, sorry14:19
dulekdanil: Sure thing, if something modifies pod in between and annotation is already there we get an issue.14:19
dulekNow I don't really see if we could use JSON patch to help with this.14:23
dulekMaybeee…?14:23
dulekdanil: You can try to rewrite that logic to JSON patch - I think it would only fail then when that particular annotations were modified.14:24
dulekBut I'm not sure here.14:24
danildulek, ok, thanks. I will try to localize the place where problems start. It seems to me that I have not realized it yet)14:25
openstackgerritMerged openstack/kuryr master: Fix kuryr CI  https://review.opendev.org/66955314:33
*** gcheresh_ has joined #openstack-kuryr14:35
*** gcheresh_ has quit IRC14:58
*** ndesh has quit IRC15:05
danildulek, I've checked. And it looks strange:15:07
danilI've updated CNI to master version, and only controller side has new patch now. And it causes k8s exception15:08
openstackgerritAlexey Perevalov proposed openstack/kuryr-kubernetes master: Change trace pod/pool drivers are incompatible  https://review.opendev.org/66999215:10
openstackgerritAlexey Perevalov proposed openstack/kuryr-kubernetes master: Change trace pod/pool drivers are incompatible  https://review.opendev.org/66999215:11
*** pcaruana has quit IRC15:11
dulekdanil: IMO it's related to your pod having 2 interfaces.15:17
dulekBoth want to become active and that's why it happens.15:18
dulekIt wouldn't happen if both would be updated at the same time, but I think it's not the case.15:18
*** pcaruana has joined #openstack-kuryr15:57
*** altlogbot_3 has joined #openstack-kuryr16:04
openstackgerritMerged openstack/kuryr-kubernetes master: Make SG modifications for LoadBalancers optional  https://review.opendev.org/66522716:05
*** altlogbot_3 has quit IRC16:07
*** maysams has quit IRC16:32
*** altlogbot_2 has joined #openstack-kuryr16:44
*** gkadam has quit IRC16:46
*** altlogbot_2 has quit IRC16:49
*** altlogbot_2 has joined #openstack-kuryr17:22
*** altlogbot_2 has quit IRC17:25
*** gcheresh_ has joined #openstack-kuryr17:28
*** altlogbot_1 has joined #openstack-kuryr17:30
*** altlogbot_1 has quit IRC17:35
*** altlogbot_2 has joined #openstack-kuryr17:36
*** gcheresh_ has quit IRC17:40
*** altlogbot_2 has quit IRC17:41
*** altlogbot_2 has joined #openstack-kuryr17:43
*** altlogbot_2 has quit IRC17:45
*** irclogbot_3 has joined #openstack-kuryr19:28
*** irclogbot_3 has quit IRC19:33
*** mrostecki has joined #openstack-kuryr19:37
*** altlogbot_2 has joined #openstack-kuryr19:39
*** altlogbot_2 has quit IRC19:41
*** jistr has quit IRC19:43
*** jistr has joined #openstack-kuryr19:45
*** jistr has quit IRC20:03
*** jistr has joined #openstack-kuryr20:05
*** brault has joined #openstack-kuryr20:10
*** irclogbot_3 has joined #openstack-kuryr20:12
*** gcheresh_ has joined #openstack-kuryr20:13
*** brault has quit IRC20:15
*** irclogbot_3 has quit IRC20:16
*** brault has joined #openstack-kuryr20:16
*** brault has quit IRC20:17
*** pcaruana has quit IRC20:31
*** gcheresh_ has quit IRC20:57
*** altlogbot_3 has joined #openstack-kuryr21:00
*** altlogbot_3 has quit IRC21:05
*** irclogbot_0 has joined #openstack-kuryr21:23
*** irclogbot_0 has quit IRC21:26
*** irclogbot_1 has joined #openstack-kuryr21:29
*** irclogbot_1 has quit IRC21:32
*** danil has quit IRC21:57
*** altlogbot_2 has joined #openstack-kuryr22:17
*** altlogbot_3 has joined #openstack-kuryr22:19
*** irclogbot_0 has joined #openstack-kuryr22:19
*** altlogbot_3 has quit IRC22:23
*** irclogbot_0 has quit IRC22:24
*** rh-jelabarre has quit IRC22:27
*** altlogbot_2 has joined #openstack-kuryr22:57
*** altlogbot_2 has quit IRC22:59
*** irclogbot_0 has joined #openstack-kuryr23:01
*** irclogbot_0 has quit IRC23:06
*** irclogbot_0 has joined #openstack-kuryr23:09
*** irclogbot_0 has quit IRC23:16
*** altlogbot_3 has joined #openstack-kuryr23:29
*** altlogbot_3 has quit IRC23:30

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!