Wednesday, 2017-09-06

*** setra has joined #openstack-neutron-ovn00:19
*** setra has quit IRC00:44
russellbjamesdenton: for which database, the main ovs database?  (not the OVN dbs)00:51
russellbjamesdenton: "ovs-vsctl set-manager <...>" -- that persists it in the db00:52
russellbthere are separate commands for the equivalent settings in the OVN northbound and southbound dbs00:53
russellbfor those, it's "ovn-nbctl set-connection <...>" and "ovn-sbctl set-connection <...>"00:54
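The persistence commands russellb lists can be wrapped in a small dry-run helper. This is a minimal Python sketch. The passive-TCP targets (`ptcp:<port>:<addr>`) and the 6640/6641/6642 ports are assumed conventional defaults, not values from this conversation, and `listener_commands()` is a hypothetical name.

```python
import shlex
import subprocess

# Assumption: listen on localhost only, using the conventional default ports.
LOCALHOST = "127.0.0.1"


def listener_commands(addr=LOCALHOST):
    """Return argv lists that persist passive-TCP listeners in each database."""
    return {
        # Main Open_vSwitch db: the Manager record is stored in conf.db.
        "ovs": ["ovs-vsctl", "set-manager", f"ptcp:6640:{addr}"],
        # The OVN northbound and southbound dbs each have their own command.
        "ovn-nb": ["ovn-nbctl", "set-connection", f"ptcp:6641:{addr}"],
        "ovn-sb": ["ovn-sbctl", "set-connection", f"ptcp:6642:{addr}"],
    }


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for argv in commands.values():
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply(listener_commands())
```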
*** yamamoto has joined #openstack-neutron-ovn01:15
openstackgerritDong Jun proposed openstack/networking-ovn master: Python3.5 RuntimeError: dictionary changed size during iteration  https://review.openstack.org/50106201:43
openstackgerritDong Jun proposed openstack/networking-ovn master: Set requested-chassis with binding host_id.  https://review.openstack.org/49982001:44
*** viaggarw has joined #openstack-neutron-ovn01:48
*** vikrant has joined #openstack-neutron-ovn01:48
*** fzdarsky_ has joined #openstack-neutron-ovn01:52
*** fzdarsky has quit IRC01:54
*** viaggarw has quit IRC02:08
*** vikrant has quit IRC02:08
*** setra has joined #openstack-neutron-ovn02:12
*** yamamoto_ has joined #openstack-neutron-ovn03:25
*** yamamoto has quit IRC03:28
*** janki has joined #openstack-neutron-ovn04:35
*** trinaths has joined #openstack-neutron-ovn04:42
*** yamamoto_ has quit IRC05:07
*** yamamoto has joined #openstack-neutron-ovn05:09
*** pcaruana has joined #openstack-neutron-ovn05:27
*** janki has quit IRC05:28
*** janki has joined #openstack-neutron-ovn05:53
*** zefferno has joined #openstack-neutron-ovn06:02
*** janki has quit IRC06:29
*** yamamoto has quit IRC06:34
*** yamamoto has joined #openstack-neutron-ovn06:35
*** janki has joined #openstack-neutron-ovn06:57
*** ajo has quit IRC07:33
*** ajo has joined #openstack-neutron-ovn07:45
*** lucas-afk is now known as lucasagomes08:28
*** ajo has quit IRC08:39
*** ajo has joined #openstack-neutron-ovn09:04
*** openstackgerrit has quit IRC09:18
*** setra has quit IRC09:23
*** setra has joined #openstack-neutron-ovn09:29
*** yamamoto has quit IRC09:40
*** openstackgerrit has joined #openstack-neutron-ovn09:55
openstackgerritMerged openstack/networking-ovn master: Python3.5 RuntimeError: dictionary changed size during iteration  https://review.openstack.org/50106209:55
*** openstackgerrit has quit IRC10:03
*** yamamoto has joined #openstack-neutron-ovn10:11
*** yamamoto has quit IRC10:16
*** numans has joined #openstack-neutron-ovn10:19
numanslucasagomes, Hi10:20
lucasagomesnumans, hi there10:20
numanslucasagomes, i need your help a bit in this review - https://review.openstack.org/#/c/494293/10:21
lucasagomesnumans, sure, lemme take a look10:21
numanslucasagomes, basically the job - 'gate-tripleo-ci-centos-7-scenario007-multinode-oooq' is failing10:21
* lucasagomes check logs10:21
numanslucasagomes, so this job deploys OVN using tripleo and then runs the tempest tests10:21
lucasagomesany visible error ?10:21
numanslucasagomes, the test just times out.10:21
numanslucasagomes, i think it is blocking when tempest test calls GET v2/securitygroups API10:22
lucasagomesright10:22
lucasagomesone strange thing there, why are we running the dhcp agent ?10:22
lucasagomeshttp://logs.openstack.org/93/494293/13/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/32582c3/logs/undercloud/var/log/neutron/dhcp-agent.log.txt.gz10:22
numanslucasagomes, dhcp agent shouldn't be started10:23
numanslet me see10:23
lucasagomesit seems that it's running10:23
numanslucasagomes, you are seeing the logs of undercloud neutron10:23
numanslucasagomes, see subnodes-210:23
lucasagomesoh right10:24
numanslucasagomes, here is the tempest logs - http://logs.openstack.org/93/494293/13/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/32582c3/logs/undercloud/home/jenkins/tempest_output.log.txt.gz10:24
* lucasagomes looks10:24
numansany pointers would be really helpful. i have been looking into this issue for almost 3 days10:24
numanslucasagomes, on a side note, i want to know if we can run just the one test - tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops in our networking-ovn job10:26
lucasagomesnumans, cool, I will jump in see if I can see something in the logs10:26
numanslucasagomes, thanks10:26
numanslucasagomes, have a look at this review as well - https://review.openstack.org/#/c/500609/ (this is same patch except that i am running 2 tests) and the log here - http://logs.openstack.org/09/500609/1/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/db52f31/logs/undercloud/home/jenkins/tempest_output.log.txt.gz10:28
numanslucasagomes, this shows the time out exception10:28
lucasagomesnumans, http://logs.openstack.org/93/494293/13/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/32582c3/logs/undercloud/home/jenkins/tempest/tempest.log.txt.gz10:31
lucasagomesas well10:31
lucasagomeshere's with the timestamp: http://logs.openstack.org/93/494293/13/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/32582c3/logs/undercloud/home/jenkins/tempest/tempest.log.txt.gz#_2017-09-06_03_10_01_49410:32
*** yamamoto has joined #openstack-neutron-ovn10:32
numanslucasagomes, i saw that and i dont think that's an issue10:32
numanslucasagomes, we see the same error even in networking-ovn jobs10:32
lucasagomesoh ok10:32
numanslucasagomes, to fix this issue i submitted a patch here - https://review.openstack.org/#/c/500815/10:33
lucasagomescause I was looking at the containerized version of that job and I didn't see that ping error10:33
* lucasagomes looks10:33
numanslucasagomes, in the containerized job, tempest is not run10:33
lucasagomesnumans, http://logs.openstack.org/93/494293/13/check/gate-tripleo-ci-centos-7-containers-multinode/d892bcb/logs/undercloud/home/jenkins/tempest_output.log.txt.gz#_2017-09-06_03_23_1310:34
numanslucasagomes, only gate-tripleo-ci-centos-7-scenario007- deploys OVN, rest is default neutron10:34
lucasagomesright on10:34
numanslucasagomes, yeah, the issue is seen only with OVN jobs not with ml2ovs10:34
numanslucasagomes, so something is definitely wrong either with networking-ovn or with OVN10:35
numanslucasagomes, in the case of OVN, the test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops is successful, but cleanup is failing10:36
lucasagomesyeah right at the tearDownClass10:38
* lucasagomes looks at the tempest code 10:38
*** setra has quit IRC10:48
*** yamamoto has quit IRC10:48
lucasagomesnumans, I'm still wondering about that ping test, I don't see any other apparent error11:03
lucasagomeseven after increasing the number of pings from 1 to 3, the ping command still returns 1 (return code)11:03
lucasagomeswhich is seen as a failure11:04
numanslucasagomes, that's true.11:04
lucasagomescause 2 pings succeeded and 1 was lost11:04
lucasagomesas you describe in the commit message11:04
lucasagomesand once that happens, it seems to be seen as a timeout11:04
numanslucasagomes, but why doesn't it complete the test? why does it time out?11:04
lucasagomeslook here (1 sec lemme grab the code)11:04
numansok11:05
lucasagomesnumans, https://github.com/openstack/tempest/blob/263d551edef01eb654c7f32e6821e3caec5f1ea3/tempest/scenario/manager.py#L962-L96711:06
lucasagomesthe _check_remote_connectivity() (the private one)11:06
lucasagomesis the one invoking ping_host11:06
lucasagomeswhich will return a failure due to that missing ping in OVN11:06
lucasagomesand will log that time out message11:06
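The following simplified sketch shows how a failing ping can surface only as a timeout message. It assumes the connectivity check retries the ping until a deadline and reports that it timed out once the deadline passes. `call_until_true()` and `check_remote_connectivity()` here are illustrative re-implementations, not tempest's code, and the `ping` callable is a stand-in for `ping_host`'s exit status.

```python
import time


def call_until_true(func, duration, sleep_for):
    """Call func() until it returns True or `duration` seconds elapse."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if func():
            return True
        time.sleep(sleep_for)
    return False


def check_remote_connectivity(ping, timeout=0.3, interval=0.1):
    """`ping` is a hypothetical callable returning a ping exit status."""
    ok = call_until_true(lambda: ping() == 0, timeout, interval)
    if not ok:
        # Only the timeout is reported; the non-zero exit code is not shown.
        print("Timed out waiting for ping to succeed")
    return ok
```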
numanslucasagomes, if we remove this "set -eu -o pipefail;" it would pass i suppose11:07
numanslucasagomes, i mean the ping result11:07
lucasagomesright yeah11:07
numanslucasagomes, i will submit a test patch in tempest and make it depend on ... let me try that11:08
lucasagomesnumans, let's add a patch doing it? Or maybe there's a way to tell ping to ignore 1 lost packet11:08
lucasagomeslike a success rate or something, idk11:08
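The "success rate" idea could look like this: parse the ping summary line instead of trusting the exit status. The summary formats (iputils `2 received`, BusyBox `2 packets received`) and the 50% threshold are assumptions. `ping_success()` is hypothetical and is not a tempest option.

```python
import re

# Matches both "3 packets transmitted, 2 received" (iputils style) and
# "3 packets transmitted, 2 packets received" (BusyBox style).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_success(output, min_ratio=0.5):
    """Return True if the received/sent ratio meets min_ratio."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = map(int, match.groups())
    return sent > 0 and received / sent >= min_ratio
```

With this check, losing only the first of three packets still counts as connected, while a stricter `min_ratio=1.0` reproduces the exit-status behaviour.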
lucasagomesnumans, ack11:08
lucasagomesbecause I don't see anything else that resembles an error apart from that11:08
numanslucasagomes,  set -eu -o pipefail is the one which is making the command fail11:08
lucasagomesI think that's the thing, it fails to ping and then it goes to teardown and it fails there too11:08
numanslucasagomes, that's true. but why does the test block in cleanup?11:08
lucasagomesso one exception is covering the other11:09
lucasagomesnumans, I think teardown will be called upon success or failure right ?11:09
numanslucasagomes, it does cleanup right ?11:09
lucasagomesso if it fails on the test and then fails on the teardown as well maybe one exception will overwrite the other ? idk11:09
lucasagomesyeah it def calls the cleanup11:09
lucasagomeswhich fails listing the security groups for some reason11:10
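The "one exception covering the other" hypothesis can be illustrated with plain Python. An exception raised in cleanup code replaces the one already propagating, and the original survives only as `__context__`. This is a generic try/finally sketch. The exception classes are stand-ins, and unittest's own `tearDownClass` reporting is not modelled.

```python
class PingTimeout(Exception):
    """Stand-in for the failure in the test body."""


class CleanupError(Exception):
    """Stand-in for the failure while listing security groups in cleanup."""


def run_test():
    try:
        raise PingTimeout("ping never succeeded")
    finally:
        # Raising here replaces the PingTimeout that is already propagating.
        raise CleanupError("listing security groups failed")


def outcome():
    """Return (exception seen by the caller, exception it masked)."""
    try:
        run_test()
    except CleanupError as err:
        return err, err.__context__


raised, original = outcome()
print(f"{type(raised).__name__} masks {type(original).__name__}")
# prints: CleanupError masks PingTimeout
```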
numanslucasagomes, may be. let me try this. thanks for your help and time11:10
numanslucasagomes, it actually blocks or times out11:10
lucasagomesnumans, cool np I will continue to look11:10
numanslucasagomes, thanks11:10
lucasagomeslet's see if the ping solves it11:10
lucasagomesif not we can continue to dig into it11:10
numanslucasagomes, i just hope tripleo will pick up the tempest patch and use that11:10
lucasagomesit should, with the depends-on in place11:10
numanslucasagomes, did you happen to see the code where the ping command is built?11:12
lucasagomesnumans, yeah, grep for "def ping_host"11:12
numanslucasagomes, ok11:12
lucasagomesnumans,  tempest/lib/common/utils/linux/remote_client.py:11:13
numanslucasagomes, got it thanks11:13
lucasagomesnp11:13
* lucasagomes fingers crossed11:13
numanslucasagomes, luckily i don't have to submit a patch in tempest - https://github.com/openstack/tempest/blob/263d551edef01eb654c7f32e6821e3caec5f1ea3/tempest/config.py#L70711:15
numanslucasagomes, i can just update this patch - https://review.openstack.org/#/c/500815/11:16
lucasagomesah nice11:16
lucasagomesnumans, if the first ping always fails while the OpenFlow rule is not in place yet, isn't that a bug in OVN?11:16
lucasagomesor seen as one?11:17
numanslucasagomes, that is a design decision11:17
*** jchhatbar has joined #openstack-neutron-ovn11:17
lucasagomesok11:17
numanslucasagomes, if you see the commit message11:17
numanslucasagomes, what happens is that ovn-controller stores the learnt mac/ip in the MAC_Binding table in the southbound db11:17
lucasagomesyeah i saw that, I was just wondering if we should consider that a bug in ovn or not11:17
lucasagomesI understand it's setting up the rule upon the first ping11:18
numanslucasagomes, for an ipv4 packet, if it doesn't know the mac for the ip, it generates an ARP request packet and sends it11:18
lucasagomesI see11:18
numanslucasagomes, it's tricky to fix right :). you need to kind of enqueue that packet until the ARP reply is seen and then resend it :)11:19
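Here is a toy model of the behaviour numans describes. When the next-hop MAC is unknown, the packet only triggers an ARP request and is lost, and the learned binding (a stand-in for a `MAC_Binding` row) serves later packets. The optional buffer shows the "enqueue until the ARP reply is seen" alternative. `ToyRouter` is hypothetical, ARP replies are assumed to arrive instantly, and this is not OVN's implementation.

```python
class ToyRouter:
    """Toy L3 forwarder that resolves next-hop MACs on demand (not OVN code)."""

    def __init__(self, neighbours, buffer_packets=False):
        self.neighbours = neighbours        # ip -> mac that would answer ARP
        self.mac_binding = {}               # learned ip -> mac (MAC_Binding stand-in)
        self.buffer_packets = buffer_packets
        self.pending = {}                   # ip -> payloads waiting for an ARP reply
        self.delivered = []                 # (mac, payload) actually forwarded

    def send(self, dst_ip, payload):
        mac = self.mac_binding.get(dst_ip)
        if mac is not None:
            self.delivered.append((mac, payload))
            return
        if self.buffer_packets:
            # Alternative design: hold the packet until the ARP reply arrives.
            self.pending.setdefault(dst_ip, []).append(payload)
        # Without buffering, the packet only triggers ARP and is then dropped.
        self._arp_request(dst_ip)

    def _arp_request(self, ip):
        mac = self.neighbours.get(ip)       # reply assumed to arrive instantly
        if mac is None:
            return
        self.mac_binding[ip] = mac
        for payload in self.pending.pop(ip, []):
            self.delivered.append((mac, payload))


ip = "10.0.0.5"
router = ToyRouter({ip: "fa:16:3e:00:00:01"})
for seq in range(3):
    router.send(ip, seq)
print([p for _, p in router.delivered])  # prints [1, 2]: the first ping is lost
```

This matches the "2 pings succeeded and 1 was lost" pattern above. The buffered variant delivers all three packets, at the cost of keeping per-destination queues.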
lucasagomesyeah, I'm just wondering because everyone testing connectivity with the return code of the ping command will have some trouble11:19
lucasagomesyeah it's very tricky11:19
numanslucasagomes, yeah even i thought about it :)11:19
*** janki has quit IRC11:20
jamesdentonrussellb - thanks for the help11:20
lucasagomesnumans, yeah well, fair enough, one step at a time... let's see if removing the pipefail solves the problem11:20
lucasagomesnumans, I will grab some quick lunch and be right back11:20
numanslucasagomes, bon appétit :)11:21
lucasagomescheers :D11:21
*** lucasagomes is now known as lucas-hungry11:21
*** trinaths has left #openstack-neutron-ovn11:28
*** yamamoto has joined #openstack-neutron-ovn11:45
*** yamamoto has quit IRC12:13
numanslucas-hungry, lets see how this one goes :) - https://review.openstack.org/#/c/500609/12:26
*** yamamoto has joined #openstack-neutron-ovn12:28
*** lucas-hungry is now known as lucasagomes12:34
lucasagomesnumans, cool! I will keep an eye on it12:35
*** mmichelson has joined #openstack-neutron-ovn12:57
*** thegreenhundred has joined #openstack-neutron-ovn13:30
*** openstackgerrit has joined #openstack-neutron-ovn13:51
openstackgerritMerged openstack/networking-ovn master: Delete dummy files  https://review.openstack.org/50086613:51
*** zefferno has quit IRC13:57
numansrussellb, can you please have a look at this patch - https://review.openstack.org/#/c/500367/14:00
russellbdone14:02
numansrussellb, thanks14:03
numansrussellb, and this one as well - https://review.openstack.org/#/c/500512/14:04
*** jchhatbar is now known as janki14:11
*** janki has quit IRC14:14
openstackgerritMerged openstack/networking-ovn master: Small refactor of metadata bits  https://review.openstack.org/49303814:14
*** yamamoto has quit IRC14:17
lucasagomesnumans, the job failed on gate but the error seems to be a misconfiguration? See: http://logs.openstack.org/09/500609/3/check/gate-tripleo-ci-centos-7-scenario007-multinode-oooq/7ed70d9/logs/undercloud/home/jenkins/tempest_output.log.txt.gz#_2017-09-06_13_56_2614:30
numanslucasagomes, yeah i saw that :(14:30
lucasagomes:-/14:30
dalvarezrussellb, numans i have an issue when starting networking-ovn-metadata-agent in OVN from tripleO. When the agent starts (under neutron user) it tries to connect to OVSDB server through the UNIX socket but it fails because ovsdb-server is started under 'openvswitch' user and it lacks permissions14:58
dalvarezrussellb, numans the thing is that the same may happen in ML2/OVS, and there they set a manager if the connection fails: https://github.com/openstack/neutron/blob/master/neutron/agent/ovsdb/native/connection.py#L3514:59
dalvarezso that they can connect to ovsdb via TCP14:59
dalvarezas stated in https://github.com/openstack/ovsdbapp/blob/master/ovsdbapp/schema/open_vswitch/helpers.py#L29  it would be better if tripleO or the deployment tool would set this manager14:59
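This is a minimal sketch of the ML2/OVS-style fallback linked above. It tries the local Unix socket first and falls back to a localhost TCP target if connecting fails (for example with `PermissionError`). The function name and the `tcp:127.0.0.1:6640` target are assumptions. The TCP listener itself still has to be configured separately, for example by the deployment tool setting a manager.

```python
import socket

DEFAULT_UNIX = "/var/run/openvswitch/db.sock"
FALLBACK_TCP = "tcp:127.0.0.1:6640"  # assumed localhost-only manager target


def pick_ovsdb_endpoint(unix_path=DEFAULT_UNIX, tcp_target=FALLBACK_TCP):
    """Prefer the local Unix socket; fall back to a localhost TCP target."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(unix_path)
    except OSError:
        # PermissionError (socket not writable by this user),
        # FileNotFoundError, ConnectionRefusedError, ...
        return tcp_target
    finally:
        sock.close()
    return "unix:" + unix_path
```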
dalvarezrussellb, numans what do you guys think? should i patch metadata agent to enable it? try in tripleo instead?15:00
dalvarezajo ^15:00
ajodalvarez reading15:00
russellbor run the agent as openvswitch user15:00
dalvarezrussellb doesn't it feel weird?15:01
russellbif we open tcp, that's fine, but i'd rather do it from tripleo15:01
ajoyes we may do it from tripleO15:01
dalvarezrussellb, yeah i agree although im not quite sure where to do that and how15:01
dalvarezi'd go with tripleo approach too15:02
ajonumans could maybe give you some pointers15:02
ajonumans when a tripleo/OVN internals deep dive? :) (or did we have something already and I missed it?)15:02
dalvarezhah15:02
jamesdentonIs it advisable/not-advisable to leave neutron_sync_mode as 'repair' permanently?15:03
dalvarezrussellb, ajo numans: https://github.com/openstack/puppet-neutron/blob/master/manifests/plugins/ovs/opendaylight.pp#L9615:04
dalvarezthey're already doing it in ODL, shall i mimic that or just do it on compute nodes?15:05
dalvarezif possible15:05
russellbjamesdenton: i wouldn't ... we're working to get rid of the need for it completely ...15:05
jamesdentonthanks russellb. Is the goal some sort of self-healing process?15:06
numansdalvarez, russellb it would be better to use the local unix socket though right?15:07
numansi understand about the permissions,15:07
russellbjamesdenton: more eliminating all known cases where things could get out of sync, let me link to the current proposal15:07
russellbnumans: yes15:07
dalvareznumans, security reasons?15:07
numansin my setup, i don't see ovs-vswitchd running as openvswitch user15:08
numansit is running as root15:08
dalvareznumans, tripleo here:15:08
dalvarez[heat-admin@overcloud-novacompute-0 ~]$ ps -ef | grep ovsdb-server15:08
dalvarezopenvsw+    7159       1  0 09:17 ?        00:00:03 ovsdb-server /etc/openvswitch/conf.db15:08
russellbjamesdenton: https://review.openstack.org/#/c/490834/15:08
dalvarezdeployed today running master15:08
numansdalvarez, but ovn-controller also connects to unix socket right15:08
dalvareznumans, ovn-controller runs as root15:08
numansdalvarez, i need to deploy a fresh one i guess.15:08
numansdalvarez, ok15:08
jamesdentonthanks!15:09
*** amuller has joined #openstack-neutron-ovn15:09
dalvarezrussellb, numans ajo so i guess that unless we run the agent under 'openvswitch' user we need to use TCP15:10
dalvarezis it a security concern? it would listen only on localhost15:10
dalvarezit's already done by ovs agent in ml2/ovs and ODL sets the manager too15:10
russellbseems fine15:10
ajodalvarez I wonder if it could also be done via ACLs or groups15:11
ajovia ACLs we could set the domain socket in a context, and the metadata agent into the same context15:11
ajoI haven't done that before but I suspect it could be possible15:12
dalvarezajo maybe that works but i'd do that only if we have security concerns15:12
dalvarezwhich i personally can't see now but i'm probably missing something15:13
ajomay be there aren't any big ones15:13
ajodalvarez maybe if it's in the openvswitch group it would be able to fiddle with or trash the db files directly if it wanted15:13
dalvarezbut that won't happen with the TCP connection right?15:14
dalvarezso i can see two options now :)15:14
ajoif it were hacked via a metadata request somehow, which I highly doubt because the workflow is quite simple15:14
ajonot the files15:14
dalvarez1. let tripleo set the manager15:14
ajobut I guess you can do many evil things via ovsdb directly :D15:14
dalvarez2. add neutron user to openvswitch group so that it can connect to it15:14
dalvarezyeah that's true15:15
dalvarezit's kinda exposed since it's accepting requests on a unix socket15:15
ajo3. run metadata agent under openvswitch user ?15:15
dalvarezso evil instances could potentially exploit it15:15
ajoor do we need the neutron group too ?15:15
dalvarezajo yeah 3 should be fine too but breaks the general philosophy we already have ?15:15
ajorootwrap daemon I guess15:15
dalvareznot sure about that one15:15
ajoneed to AFK for a while, maybe I'd go with 1. even if more costly, it's the same thing we do with other things15:16
ajojust to be uniform15:16
*** yamamoto has joined #openstack-neutron-ovn15:17
dalvarezi'd go with 1 as well15:18
dalvarezi'll give it a go, will do that the same way as ODL even though we need it just on compute nodes15:18
dalvareznumans, you know of a way to do https://github.com/openstack/puppet-neutron/blob/master/manifests/plugins/ovs/opendaylight.pp#L96  only on compute nodes?15:18
numansdalvarez, i will come back on this later :)15:19
dalvarezsure! :)15:19
*** yamamoto has quit IRC15:25
otherwiseguyshould be able to add user to the openvswitch group since the socket should be created 0770 according to bind_unix_socket().15:31
openstackgerritMerged openstack/networking-ovn master: Track router and floatingip quota usage using TrackedResource  https://review.openstack.org/50036715:36
dalvarezotherwiseguy++15:44
dalvarez$ stat -c %a /var/run/openvswitch/db.sock15:47
dalvarez75015:47
* otherwiseguy sighs15:49
dalvarez:(15:50
openstackgerritMerged openstack/networking-ovn master: Add DNS db mixin in l3 plugin  https://review.openstack.org/50051215:54
otherwiseguydalvarez, with strace I can definitely see it calling fchmod(fd, 0770) then bind(fd, ...), but then the file ends up with mode 0750...16:06
*** mmichelson has quit IRC16:07
*** lucasagomes is now known as lucas-afk16:07
dalvarezotherwiseguy, weird!16:27
dalvarezotherwiseguy, i think we definitely need to set the manager from tripleo :) i asked beagles to find out how to do it only on compute nodes16:28
otherwiseguydalvarez, it at least seems like a reasonable workaround. i remember tearing my hair out trying to get permissions stuff sorted last time I looked at it. "THIS SHOULD BE WORKING".16:30
otherwiseguyNever got it working without changing umask, though that changes the permissions of any files created (log files, etc.)16:31
dalvarezyeah :(16:31
otherwiseguydalvarez, https://stackoverflow.com/questions/11781134/change-linux-socket-file-permissions some comment said that fchmod didn't work for them, they had to use chmod.16:35
dalvarezotherwiseguy, yeah i see the last comment..."I  have tested fchmod() on Linux. None of the combinations (before bind, after listen) worked. In all cases it returned 0 but did not change the file permissions. Only chmod() worked"16:47
dalvarezotherwiseguy, shall we file a bug against ovs and use chmod instead (assuming it's using fchmod now)?16:47
otherwiseguydalvarez, I'm checking to see if that makes any difference right now. :)16:48
dalvarezotherwiseguy++16:48
* otherwiseguy waits for ovs to build16:48
otherwiseguythe problem with chmod is that if files are moved etc. i think it fails.16:49
otherwiseguyas in https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use16:53
dalvarezahh i see16:53
dalvarezotherwiseguy, from https://stackoverflow.com/questions/1892501/is-it-better-to-use-fchmod-over-chmod   "With chmod you run the risk of someone renaming the file out from under you and chmodding the wrong file. In certain situations (especially if you're root) this can be a huge security hole."16:54
dalvarezotherwiseguy, i gotta run now it's been a looooooong first day after PTO but drop a message if you want me to try something tomorrow first thing in the morning and i'll do that!16:56
dalvarezthanks for your awesome help :)16:56
dalvarezcya tomorrow!16:57
otherwiseguydalvarez, have a nice night16:57
otherwiseguydalvarez, i'm checking to see if it works anyway just out of curiosity.16:57
*** yamamoto has joined #openstack-neutron-ovn17:13
*** yamamoto has quit IRC17:17
otherwiseguydalvarez, yep, chmod after bind works even if it is a bad idea. :p17:39
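The two workarounds discussed above are setting a temporary umask around `bind()` and calling `chmod()` after `bind()`. This sketch assumes the usual Linux/BSD behaviour, where the socket file is created with mode 0777 masked by the process umask. It does not explain why `fchmod()` before `bind()` was ignored in the strace above. The helper names are hypothetical.

```python
import os
import socket
import stat
import tempfile


def bind_with_umask(path, umask=0o007):
    """Bind so the socket file gets mode 0777 & ~umask (0770 by default).

    os.umask() is process-wide and not thread-safe, so it is restored
    immediately after bind() to limit its effect on other files.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old = os.umask(umask)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    finally:
        os.umask(old)
    return sock


def bind_then_chmod(path, mode=0o770):
    """Bind, then chmod() the path. This works but has a TOCTOU window."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    # The path could be replaced between bind() and chmod().
    os.chmod(path, mode)
    return sock


def file_mode(path):
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    a = bind_with_umask(os.path.join(workdir, "a.sock"))
    b = bind_then_chmod(os.path.join(workdir, "b.sock"))
    print(oct(file_mode(a.getsockname())), oct(file_mode(b.getsockname())))
    a.close()
    b.close()
```

Restoring the umask right after `bind()` narrows the window in which other files (logs, etc.) would be created with the relaxed umask, but does not remove it. `chmod()` avoids touching the umask but, as noted above, is racy if the path is swapped between the two calls.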
*** anilvenkata has quit IRC18:10
*** lrichard_ has joined #openstack-neutron-ovn18:52
*** lrichard_ has quit IRC18:53
*** lrichard_ has joined #openstack-neutron-ovn18:54
*** lrichard has quit IRC18:54
*** pcaruana has quit IRC19:34
*** mmichelson has joined #openstack-neutron-ovn20:04
*** amuller has quit IRC20:13
*** mmichelson has quit IRC21:49
*** thegreenhundred has quit IRC22:55
*** mmichelson has joined #openstack-neutron-ovn23:35
*** mmichelson has quit IRC23:40
