*** daneyon has joined #openstack-kolla | 00:10 | |
*** sdake has joined #openstack-kolla | 00:13 | |
*** daneyon has quit IRC | 00:14 | |
*** rhallisey has quit IRC | 00:26 | |
*** sdake has quit IRC | 00:32 | |
*** fragatina has joined #openstack-kolla | 00:48 | |
*** fragatina has quit IRC | 00:54 | |
*** banix has quit IRC | 00:55 | |
*** mkoderer_ has quit IRC | 01:02 | |
*** mkoderer has joined #openstack-kolla | 01:03 | |
*** lmiccini_ has quit IRC | 01:03 | |
*** sdake has joined #openstack-kolla | 01:05 | |
*** sdake_ has joined #openstack-kolla | 01:07 | |
*** lmiccini has joined #openstack-kolla | 01:08 | |
*** sdake has quit IRC | 01:10 | |
*** huikang has joined #openstack-kolla | 01:12 | |
*** Administrator_ has joined #openstack-kolla | 01:15 | |
*** zhugaoxiao has quit IRC | 01:18 | |
*** zhugaoxiao has joined #openstack-kolla | 01:19 | |
*** Administrator_ has quit IRC | 01:19 | |
*** banix has joined #openstack-kolla | 01:20 | |
*** Jeffrey4l has quit IRC | 01:55 | |
*** daneyon has joined #openstack-kolla | 01:58 | |
*** daneyon has quit IRC | 02:03 | |
*** Jeffrey4l has joined #openstack-kolla | 02:07 | |
*** dwalsh has joined #openstack-kolla | 02:13 | |
*** huikang has quit IRC | 02:22 | |
*** sdake_ has quit IRC | 02:23 | |
*** huikang has joined #openstack-kolla | 02:25 | |
*** sdake has joined #openstack-kolla | 02:28 | |
*** huikang has quit IRC | 02:32 | |
*** dwalsh has quit IRC | 02:38 | |
*** dwalsh has joined #openstack-kolla | 02:43 | |
*** fragatina has joined #openstack-kolla | 02:51 | |
*** dwalsh has quit IRC | 02:52 | |
*** fragatina has quit IRC | 02:56 | |
*** signed8bit_Zzz is now known as signed8bit | 03:03 | |
*** signed8bit is now known as signed8bit_Zzz | 03:03 | |
*** huikang has joined #openstack-kolla | 03:08 | |
*** signed8bit_Zzz is now known as signed8bit | 03:12 | |
*** huikang has quit IRC | 03:13 | |
openstackgerrit | Jeffrey Zhang proposed openstack/kolla: Make the kolla_keystone_service can update fields https://review.openstack.org/348382 | 03:15 |
openstackgerrit | Jeffrey Zhang proposed openstack/kolla: Enable the nova microversion api https://review.openstack.org/348432 | 03:15 |
*** signed8bit has quit IRC | 03:17 | |
*** mkoderer has quit IRC | 03:37 | |
*** lmiccini has quit IRC | 03:37 | |
*** huikang has joined #openstack-kolla | 03:38 | |
*** sdake has quit IRC | 03:38 | |
*** mkoderer has joined #openstack-kolla | 03:39 | |
*** lmiccini has joined #openstack-kolla | 03:43 | |
*** dave-mccowan has quit IRC | 03:44 | |
*** daneyon has joined #openstack-kolla | 03:47 | |
*** daneyon has quit IRC | 03:51 | |
openstackgerrit | Shaun Smekel proposed openstack/kolla: Add full support for fernet [WIP] https://review.openstack.org/349366 | 04:00 |
openstackgerrit | Shaun Smekel proposed openstack/kolla: Add full support for fernet [WIP] https://review.openstack.org/349366 | 04:00 |
openstackgerrit | Shaun Smekel proposed openstack/kolla: Add dockerfiles for keystone fernet https://review.openstack.org/351139 | 04:06 |
openstackgerrit | Shaun Smekel proposed openstack/kolla: Add dockerfiles for keystone fernet https://review.openstack.org/351139 | 04:08 |
*** banix has quit IRC | 04:15 | |
*** zhugx has joined #openstack-kolla | 04:59 | |
*** huikang has quit IRC | 05:38 | |
*** fragatina has joined #openstack-kolla | 05:47 | |
*** fragatina has quit IRC | 05:54 | |
*** lmiccini has quit IRC | 06:22 | |
*** mkoderer has quit IRC | 06:24 | |
*** mkoderer has joined #openstack-kolla | 06:25 | |
*** lmiccini has joined #openstack-kolla | 06:28 | |
*** senk has joined #openstack-kolla | 06:29 | |
*** daneyon has joined #openstack-kolla | 06:29 | |
*** daneyon has quit IRC | 06:34 | |
openstackgerrit | Hiroki Ito proposed openstack/kolla: Prechecks fails when using multinode deploy using a single node and haproxy disabled https://review.openstack.org/351588 | 06:37 |
*** zhurong has joined #openstack-kolla | 06:44 | |
*** senk has quit IRC | 07:02 | |
*** senk__ has joined #openstack-kolla | 07:02 | |
*** senk has joined #openstack-kolla | 07:22 | |
*** senk__ has quit IRC | 07:23 | |
*** dwalsh has joined #openstack-kolla | 07:26 | |
*** dwalsh has quit IRC | 07:44 | |
*** mewald has joined #openstack-kolla | 07:46 | |
*** fragatina has joined #openstack-kolla | 07:50 | |
*** fragatina has quit IRC | 07:56 | |
*** bootsha has joined #openstack-kolla | 08:09 | |
*** zhurong has quit IRC | 08:11 | |
*** daneyon has joined #openstack-kolla | 08:17 | |
*** daneyon has quit IRC | 08:22 | |
*** bootsha has quit IRC | 08:34 | |
*** bootsha has joined #openstack-kolla | 08:36 | |
*** bootsha has quit IRC | 08:39 | |
*** senk has quit IRC | 08:42 | |
*** fragatina has joined #openstack-kolla | 09:53 | |
*** mewald has quit IRC | 09:55 | |
*** senk has joined #openstack-kolla | 09:56 | |
*** fragatina has quit IRC | 09:58 | |
*** daneyon has joined #openstack-kolla | 10:06 | |
openstackgerrit | Jeffrey Zhang proposed openstack/kolla: Make the kolla_keystone_service can update fields https://review.openstack.org/348382 | 10:06 |
openstackgerrit | Jeffrey Zhang proposed openstack/kolla: Enable the nova microversion api https://review.openstack.org/348432 | 10:06 |
*** senk has quit IRC | 10:08 | |
*** daneyon has quit IRC | 10:10 | |
*** dwalsh has joined #openstack-kolla | 10:13 | |
*** ad_rien_ has joined #openstack-kolla | 10:14 | |
*** dwalsh has quit IRC | 10:30 | |
*** bootsha has joined #openstack-kolla | 10:36 | |
*** egonzalez90 has joined #openstack-kolla | 11:22 | |
*** Jeffrey4l has quit IRC | 11:24 | |
*** zhurong has joined #openstack-kolla | 11:29 | |
*** dave-mccowan has joined #openstack-kolla | 11:34 | |
*** egonzalez90 has quit IRC | 11:36 | |
*** rhallisey has joined #openstack-kolla | 11:39 | |
*** fragatina has joined #openstack-kolla | 11:47 | |
*** fragatina has quit IRC | 11:53 | |
*** daneyon has joined #openstack-kolla | 11:54 | |
*** daneyon has quit IRC | 11:59 | |
*** senk has joined #openstack-kolla | 12:04 | |
*** senk has quit IRC | 12:29 | |
*** zhugaoxiao has quit IRC | 12:37 | |
*** zhurong has quit IRC | 12:38 | |
*** zhugaoxiao has joined #openstack-kolla | 12:38 | |
*** zhurong has joined #openstack-kolla | 12:39 | |
*** zhurong has quit IRC | 12:43 | |
*** zhurong has joined #openstack-kolla | 12:45 | |
*** banix has joined #openstack-kolla | 12:48 | |
*** zhurong has quit IRC | 12:49 | |
*** sdake has joined #openstack-kolla | 12:50 | |
*** signed8bit has joined #openstack-kolla | 12:52 | |
*** sdake_ has joined #openstack-kolla | 12:53 | |
*** sdake has quit IRC | 12:54 | |
sdake_ | morning | 12:55 |
*** banix has quit IRC | 13:05 | |
sbezverk | sdake_ morning, do you have some time next week to dedicate to the traceback issue? I am afraid the workaround is not as stable as I was hoping.. | 13:13 |
sdake_ | want to do now? | 13:14 |
sdake_ | or next week | 13:14 |
sdake_ | the week is super busy typically | 13:14 |
sbezverk | next week, I am about to leave for a trip and my test bed is down.. | 13:14 |
sdake_ | ok - well lets play it by ear | 13:15 |
sdake_ | the afternoons are typically downtime for our community - so that may be the best time to do the work | 13:15 |
sdake_ | we need uninterrupted time | 13:15 |
sbezverk | extra piece of info. while the script is short it looks stable | 13:15 |
sdake_ | and from around 5am -> 3pm PST the irc channel is off the hook | 13:15 |
sbezverk | but with extra commands I absolutely need it starts behaving as the original issue | 13:16 |
sdake_ | even when using an rc instead of a ds? | 13:16 |
sdake_ | if so, that sounds like a fundamental issue with kubernetes and openvswitch integration | 13:17 |
sbezverk | ok I can go for 9pm est monday or wednesday | 13:17 |
sdake_ | we can do earlier if you like | 13:17 |
sdake_ | but all depends on how busy the channel is | 13:17 |
sdake_ | have many cats to feed :) | 13:18 |
* sdake_ is crazy cat lady | 13:18 | |
sbezverk | :-) | 13:18 |
sbezverk | I am not so sure about fundamental issue as on one of two nodes it works perfectly.. | 13:18 |
sdake_ | but 3 nodes fail? | 13:20 |
sdake_ | sbezverk it could be your hardware | 13:23 |
sdake_ | does your gear include ECC ram? | 13:23 |
sbezverk | :-) it happens on both nodes (I have two compute nodes) | 13:24 |
sbezverk | just not at the same time | 13:24 |
*** banix has joined #openstack-kolla | 13:26 | |
*** banix has quit IRC | 13:26 | |
*** signed8bit is now known as signed8bit_Zzz | 13:26 | |
*** banix has joined #openstack-kolla | 13:29 | |
openstackgerrit | Merged openstack/kolla-kubernetes: Spec - Deploy kolla-kubernetes with Ansible https://review.openstack.org/335279 | 13:39 |
*** daneyon has joined #openstack-kolla | 13:43 | |
*** senk has joined #openstack-kolla | 13:45 | |
*** Jeffrey4l has joined #openstack-kolla | 13:46 | |
*** daneyon has quit IRC | 13:47 | |
*** fragatina has joined #openstack-kolla | 13:51 | |
*** signed8bit_Zzz is now known as signed8bit | 13:55 | |
*** fragatina has quit IRC | 13:56 | |
*** diogogmt has quit IRC | 13:57 | |
*** diogogmt has joined #openstack-kolla | 13:59 | |
*** banix has quit IRC | 14:08 | |
sdake_ | sbezverk even with rcs? | 14:09 |
sdake_ | you said it wasn't as stable as you originally thought - could you expand on that statement | 14:10 |
*** diogogmt has quit IRC | 14:11 | |
*** signed8bit is now known as signed8bit_Zzz | 14:14 | |
*** senk has quit IRC | 14:25 | |
*** zhugx has quit IRC | 14:34 | |
*** huikang has joined #openstack-kolla | 14:44 | |
*** sdake_ has quit IRC | 14:52 | |
*** signed8bit_Zzz is now known as signed8bit | 15:01 | |
*** sdake has joined #openstack-kolla | 15:05 | |
sdake | pbourke around? | 15:06 |
sdake | sean-k-mooney around? | 15:09 |
sdake | any other cats that have used the osic cluster already - could use a bone thrown :) | 15:09 |
*** duonghq has joined #openstack-kolla | 15:13 | |
*** huikang has quit IRC | 15:19 | |
*** signed8bit is now known as signed8bit_Zzz | 15:20 | |
*** zhugaoxiao has quit IRC | 15:21 | |
*** zhugaoxiao has joined #openstack-kolla | 15:22 | |
*** huikang has joined #openstack-kolla | 15:22 | |
*** daneyon has joined #openstack-kolla | 15:31 | |
*** daneyon has quit IRC | 15:35 | |
*** dwalsh has joined #openstack-kolla | 16:01 | |
*** dave-mccowan has quit IRC | 16:10 | |
*** huikang has quit IRC | 16:12 | |
*** dave-mccowan has joined #openstack-kolla | 16:14 | |
*** duonghq has left #openstack-kolla | 16:18 | |
*** senk has joined #openstack-kolla | 16:20 | |
*** huikang has joined #openstack-kolla | 16:26 | |
*** huikang has quit IRC | 16:31 | |
*** zhubingbing has joined #openstack-kolla | 16:35 | |
openstackgerrit | zhubingbing proposed openstack/kolla: Add aodh role https://review.openstack.org/351027 | 16:37 |
openstackgerrit | zhubingbing proposed openstack/kolla: Add sahara ansible role https://review.openstack.org/351294 | 16:45 |
*** dwalsh has quit IRC | 17:07 | |
*** daneyon has joined #openstack-kolla | 17:19 | |
*** daneyon has quit IRC | 17:24 | |
sdake | so.... | 17:35 |
sdake | hot.... | 17:35 |
*** fragatina has joined #openstack-kolla | 17:47 | |
*** fragatina has quit IRC | 17:52 | |
openstackgerrit | zhubingbing proposed openstack/kolla: Add gnocchi ansible role https://review.openstack.org/349351 | 17:53 |
openstackgerrit | zhubingbing proposed openstack/kolla: fix sahara dockerfile https://review.openstack.org/351320 | 17:56 |
zhubingbing | - - | 17:56 |
zhubingbing | hot | 17:56 |
sdake | ya 115F | 18:04 |
sdake | arizona is a very hot place in the summer - at least in phoenix | 18:04 |
zhubingbing | Go for a swim | 18:07 |
zhubingbing | - - | 18:07 |
zhubingbing | we're hot, too | 18:09 |
*** senk has quit IRC | 18:15 | |
zhubingbing | sdake,see you | 18:18 |
sdake | swim lol | 18:18 |
sdake | zhubingbing sorry to hear it :( | 18:18 |
zhubingbing | It is 2 in the morning here, and tomorrow we have to go on working. | 18:20 |
zhubingbing | i'll miss you :) | 18:21 |
sdake | ttyl | 18:23 |
sbezverk | sdake ping | 18:36 |
sdake | wound me sbezverk | 18:36 |
sbezverk | another observation: if the 3 container pod gets into the issue, then manually restarting the ovsdb container recovers everything | 18:37 |
sbezverk | do you want to add some extra debugging to ovsdb-server source and recompile it? | 18:39 |
sbezverk | ideally it should be done by ovs developers, but I am not sure if they go for it | 18:40 |
openstackgerrit | Steven Dake proposed openstack/kolla: Add OSIC Scale Testing Documentation https://review.openstack.org/352101 | 18:41 |
sdake | too much on my plate to do that atm sbezverk | 18:41 |
sdake | lets get a backtrace | 18:41 |
sdake | and call it a day | 18:41 |
sdake | i dont think you understand how much impact a backtrace has on c developers | 18:43 |
sdake | it will spur action - take my word for it | 18:43 |
sdake | we may need to do a little more than a backtrace | 18:44 |
sdake | but lets get the thing into gdb so we can get a backtrace and produce debug info for ovs cats to work with | 18:44 |
sdake | right now what you're telling them is "it doesn't work" | 18:44 |
sbezverk | things changed a little bit.. | 18:44 |
sdake | you're not telling them why | 18:44 |
sdake | if you tell them why- they will fix it | 18:45 |
sbezverk | now ovsdb-server does not generate backtrace | 18:45 |
sbezverk | it just does not create socket | 18:45 |
sdake | moment need to switch networks - done with osic cluster for the moment | 18:45 |
sbezverk | ok | 18:45 |
openstackgerrit | zhubingbing proposed openstack/kolla: Add gnocchi ansible role https://review.openstack.org/349351 | 18:46 |
*** sdake_ has joined #openstack-kolla | 18:47 | |
*** sdake has quit IRC | 18:50 | |
sdake_ | sbezverk just read your email | 18:54 |
sdake_ | my immediate response from an ovs point of view is "get me a backtrace of the crash" | 18:54 |
sdake_ | the socket being created or not is not relevant | 18:54 |
sdake_ | that happens after the crash | 18:54 |
sdake_ | anything that happens after a crash is bad data | 18:55 |
sdake_ | junk in = junk out | 18:55 |
sdake_ | we need to get to the good in -> junk out and see why the junk out is happening | 18:55 |
sbezverk | sdake_ there is no crash!! | 18:55 |
sdake_ | you had a crash with daemonsets | 18:55 |
sbezverk | sdake_ not anymore ovsdb just does not create a socket | 18:56 |
sdake_ | the segfault fixed itself? | 18:56 |
sbezverk | I suspect what we saw if either not related or another issue | 18:56 |
sbezverk | s/if/is/ | 18:57 |
sdake_ | how did the segfault fix itself on daemon sets | 18:57 |
sdake_ | did you reconfigure the gear? | 18:57 |
sbezverk | sdake_ nope, I was playing with commands and delays in the script | 18:57 |
sdake_ | ok so you have a delay | 18:58 |
sdake_ | if you take the delay out - you can still get a crash right? | 18:58 |
sbezverk | I could try but at this point since I still have problem even without seeing seg fault, why would we want it? | 18:59 |
sdake_ | to get a backtrace of course is why we want the crash | 18:59 |
sdake_ | but if you're getting a running environment without a crash just no socket | 18:59 |
sdake_ | and allergic to gdb | 19:00 |
sdake_ | another option is to run it through strace | 19:00 |
sdake_ | that would be helpful as well | 19:00 |
sbezverk | ok cool, let me try it, but remember last time as soon as we added strace everything started working automagically :-) | 19:01 |
openstackgerrit | Christian Berendt proposed openstack/kolla: Remove files from /var/lib/apt/lists when cleaning up on Ubuntu/Debian https://review.openstack.org/351738 | 19:03 |
sdake_ | that is because strace got rid of the segfault | 19:04 |
sdake_ | but you just said there is no longer a segfault | 19:04 |
*** daneyon has joined #openstack-kolla | 19:07 | |
*** signed8bit_Zzz is now known as signed8bit | 19:11 | |
*** daneyon has quit IRC | 19:12 | |
*** senk has joined #openstack-kolla | 19:20 | |
*** senk has quit IRC | 19:21 | |
openstackgerrit | Christian Berendt proposed openstack/kolla: Unify keystone endpoint descriptions https://review.openstack.org/352110 | 19:31 |
sdake_ | sbezverk any results with strace? | 19:42 |
sbezverk | sorry had small house emergency | 19:43 |
Mech422 | sdake_: that sounds like a classic race condition... | 19:44 |
Mech422 | sdake_: timing sensitive, magically 'disappears', etc etc | 19:45 |
sbezverk | sdake_ with strace it works without hiccup | 19:45 |
sdake_ | Mech422 yes of course | 19:45 |
sdake_ | sbezverk are you sure your assertion there is no crash is correct | 19:45 |
Mech422 | sdake_: if the ovs stuff is in a container, perhaps its trying to setup ovs before the host is ready ? | 19:46 |
sdake_ | Mech422 sbezverk has a sleep in there to prevent that scenario | 19:46 |
sbezverk | Mech422: here is the funny thing, I have two compute nodes in a cluster, the issue appears randomly on one of these nodes | 19:47 |
sbezverk | not on the same | 19:47 |
sbezverk | always, but I do not see any pattern | 19:47 |
sdake_ | dmesg shows no segfault sbezverk ? | 19:48 |
Mech422 | sbezverk: yeah - race of some sort | 19:48 |
sdake_ | it is unlikely to be a race if there is a sleep 10 at the start of things | 19:48 |
Mech422 | ovsdb just need /var/run and the db location IIRC | 19:49 |
Mech422 | just check - for ubuntu, it's looking for /var/run/openvswitch/* | 19:50 |
Mech422 | s/check/checked/ | 19:50 |
Mech422 | hmm - lsof shows it wants /var/lib/openvswitch for a lock file, and /var/log/openvswitch | 19:51 |
sbezverk | in my case the socket is at /run/openvswitch/db.sock | 19:52 |
Mech422 | and /dev/null | 19:52 |
sbezverk | and db sits at /etc/openvswitch/conf.dbn | 19:52 |
Mech422 | sbezverk: the actual db is in /etc/openvswitch? or the conf ? funny place for a db file... | 19:53 |
sbezverk | I do not know how it got there, but it is the same location as in classical kolla :-) | 19:54 |
sbezverk | sdake_ 3 times with strace no issue | 19:55 |
sdake_ | sbezverk what about dmesg? | 19:55 |
sdake_ | (without strace) | 19:55 |
sbezverk | [21586.002436] traps: handler30[17143] general protection ip:7fb6696b4e37 sp:7fb667c239d0 error:0 in libc-2.17.so[7fb66967e000+1b7000] | 19:56 |
sbezverk | here is with strace | 19:56 |
sdake_ | you said earlier there was no segfault | 19:56 |
sdake_ | clearly there is still a segfault | 19:56 |
sbezverk | well but where do you see its related to ovs? | 19:56 |
sdake_ | the segfault is related to why the socket is not created | 19:56 |
sbezverk | the socket is created | 19:57 |
sdake_ | because without strace there is no segfault | 19:57 |
sdake_ | rather with strace there is no segfault | 19:57 |
sbezverk | I keep telling you it does not seem to be related | 19:57 |
sdake_ | i keep telling you it is related | 19:57 |
sdake_ | the evidence is as follows for my position | 19:57 |
Mech422 | is it me, or does a protection fault in libc smell like bad memory ? | 19:57 |
sdake_ | strace -> no segfault -> works | 19:57 |
sdake_ | no strace -> segfault -> doesn't work | 19:58 |
sbezverk | ok now: I have strace on, I have socket created and I see seg fault of this god knows what this process is | 19:58 |
sdake_ | sbezverk need to see your screen up for a webex | 19:58 |
sbezverk | ok but I will not be able to join voice bridge | 19:58 |
sdake_ | well thats not helpful :) | 19:59 |
sbezverk | sorry cannot do anything about it atm | 19:59 |
openstackgerrit | Christian Berendt proposed openstack/kolla: Remove sudo commands from docs https://review.openstack.org/352118 | 19:59 |
sdake_ | i wasn't complaining | 19:59 |
sdake_ | ok so lets get focused here | 20:00 |
sdake_ | i need the following yes or no questions answered | 20:00 |
*** senk has joined #openstack-kolla | 20:00 | |
sdake_ | with strace, does a segfault occur? | 20:00 |
sbezverk | yes but for some process unrelated to ovs | 20:01 |
sbezverk | I got the issue with strace | 20:02 |
sdake_ | got which issue - no socket created? | 20:02 |
openstackgerrit | Christian Berendt proposed openstack/kolla: Remove heat dev environment https://review.openstack.org/352119 | 20:03 |
sbezverk | http://paste.openstack.org/show/551517/ | 20:04 |
sbezverk | yes | 20:04 |
Mech422 | sbezverk: ovsdb-server creates a sub-process - it runs a monitoring process on the original pid and the actual db in the sub - so your segfault might be in the sub process ? | 20:04 |
*** imcsk8 has quit IRC | 20:05 | |
openstackgerrit | Christian Berendt proposed openstack/kolla: Unify keystone endpoint descriptions https://review.openstack.org/352110 | 20:05 |
sdake_ | line 272 shows the socket is there | 20:05 |
sdake_ | line 280 is problematic | 20:06 |
sdake_ | let me track down errno 11 | 20:06 |
*** fragatina has joined #openstack-kolla | 20:06 | |
*** imcsk8 has joined #openstack-kolla | 20:06 | |
sdake_ | resource temporarily unavailable | 20:07 |
sbezverk | sdake_ you are right the socket is now there, you need to check my initialization script | 20:07 |
sbezverk | 1st I use ovsdb-server to run a command on its own DB to add external bridge | 20:08 |
*** ad_rien_1 has joined #openstack-kolla | 20:08 | |
*** ad_rien_ has quit IRC | 20:08 | |
sbezverk | 2nd I use ovsdb-server to run a command to plug external interface to newly created bridge | 20:08 |
sbezverk | 3rd I start ovsdb-server process | 20:08 |
sdake_ | sbezverk ok lets focus on line 280 for a moment | 20:09 |
sdake_ | can you look in /usr/include/asm-generic | 20:09 |
sbezverk | So when 1st line was running ovsdb-server process did not open a socket and external bridge did not get created | 20:09 |
sdake_ | and look for 11 | 20:09 |
*** zhubingbing has quit IRC | 20:09 | |
sdake_ | just give me a moment to switch networks | 20:10 |
Mech422 | sbezverk: wait - are you using ovs inside AND outside of kolla container ? | 20:10 |
Mech422 | sbezverk: (eg is your host networking ovs based ?) | 20:10 |
*** fragatina has quit IRC | 20:11 | |
sbezverk | there is no kolla :-) | 20:11 |
Mech422 | sbezverk: ahh - you mentioned kolla default location before...confused me ... my bad | 20:11 |
sbezverk | my ucs server runs 5 VMs each VM is a node in a cluster | 20:11 |
*** sdake has joined #openstack-kolla | 20:11 | |
sbezverk | but you are right I do use ovs on my host to connect these VMs | 20:12 |
*** senk has quit IRC | 20:12 | |
sdake | sbezverk i need to download kernel.org | 20:12 |
sdake | to see why you're getting an EAGAIN | 20:12 |
Mech422 | sdake: here's what I get for error 11: | 20:12 |
Mech422 | root@os-control-01:~# grep 11 /usr/include/asm-generic/socket.h | 20:13 |
Mech422 | #define SO_NO_CHECK 11 | 20:13 |
sdake | Mech422 nah - this is EAGAIN not SO_NO_CHECK | 20:13 |
sbezverk | sdake: /usr/include/asm/signal.h:#define SIGSEGV 11 | 20:13 |
sdake | the socket manual page doesn't list EAGAIN as a return code | 20:13 |
sdake | errno.h guys ;) | 20:13 |
*** sdake_ has quit IRC | 20:14 | |
sdake | sbezverk which kernel version do you have | 20:14 |
sbezverk | #define EWOULDBLOCK EAGAIN /* Operation would block */ | 20:14 |
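The hunt for "11" above mixes socket options, signal numbers and errno values, which all happen to reuse the same small integers. A minimal sketch of looking the number up as an errno value from Python rather than grepping headers; the helper name `errno_names` is hypothetical, and the result depends on the platform the interpreter runs on:

```python
import errno
import os


def errno_names(code):
    """Return every symbolic errno name that maps to the numeric value `code`."""
    return sorted(
        name
        for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == code
    )


if __name__ == "__main__":
    code = 11  # the value reported in the strace output discussed above
    # Lists all aliases, so platforms where EWOULDBLOCK == EAGAIN show both.
    print(code, errno_names(code), os.strerror(code))
```

Listing every alias avoids the confusion of a single header showing only one name for the value.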
sbezverk | 3.10.0-327.22.2.el7.x86_64 | 20:15 |
sdake | if you type "man socket" it doesn't list eagain as a return code | 20:15 |
sdake | red hat's kernel | 20:15 |
sbezverk | centos | 20:15 |
sbezverk | if it makes things easier I can get 4.5 or 4.6 | 20:15 |
sdake | no | 20:16 |
sdake | keep things as they are please | 20:16 |
openstackgerrit | Christian Berendt proposed openstack/kolla: Fix service_type of mistral endpoint https://review.openstack.org/352120 | 20:16 |
Mech422 | sbezverk: so the error occurs on the host side, for one machine - but not the other ? | 20:17 |
Mech422 | sbezverk: and the host AND vm's are correct on the other node ? | 20:18 |
openstackgerrit | Christian Berendt proposed openstack/kolla: Remove unused project_yaml parameter from role metadata files https://review.openstack.org/351928 | 20:18 |
Mech422 | sbezverk: or is it 1 physical box, and 1 VM is right, but not the other ? | 20:19 |
sbezverk | mech422 it is 1 physical node; 1 VM is ok and 1 VM is not | 20:20 |
sbezverk | but it is not always the same VM | 20:20 |
Mech422 | sbezverk: and virsh dumpxml shows all VMs defined the same? | 20:20 |
Mech422 | (sometimes when I'm copying vm configs I forget to change the MAC address and end up with dupes, etc) | 20:21 |
sbezverk | mech422 yep, I built all these manually | 20:21 |
sbezverk | in case of config issue I would expect to see systematic failure | 20:22 |
sbezverk | here we see very random :-( | 20:22 |
Mech422 | sbezverk: eh - never hurts to start with the basics... | 20:22 |
sbezverk | sure sure | 20:22 |
Mech422 | sbezverk: I do enough stupid shit, not to be surprised anymore :-) | 20:22 |
Mech422 | sbezverk: like copying vms and ending up using the same backing store on 2 vms | 20:23 |
sbezverk | :-) I use lvm volume per VM | 20:23 |
sbezverk | I am 99.99% positive it is race condition | 20:23 |
sbezverk | because when I start containers with sleep 86400 | 20:24 |
Mech422 | sbezverk: me too | 20:24 |
sbezverk | then connect to each container and run my script manually it always works | 20:24 |
Mech422 | sbezverk: these are full VMs not containers right ? | 20:25 |
sbezverk | correct | 20:25 |
Mech422 | sbezverk: so the race would probably be between VM | 20:25 |
sbezverk | do not think so | 20:25 |
sdake | the error is right there in the strace | 20:25 |
Mech422 | sbezverk: maybe at the disk or network layer | 20:25 |
sdake | socket is returning EAGAIN | 20:25 |
sdake | yet ovs is not trying socket again | 20:25 |
sbezverk | I bet it is kubernetes initialization | 20:25 |
sdake | it just goes blindingly on its way | 20:25 |
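EAGAIN normally means "try again later", so the usual response is a bounded retry rather than carrying on. A minimal sketch of that pattern, assuming nothing about how Open vSwitch itself handles the error; `retry_on_eagain` and its parameters are hypothetical names chosen for illustration:

```python
import errno
import time


def retry_on_eagain(call, attempts=5, delay=0.1):
    """Run call(); while it fails with EAGAIN, pause and retry up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError as exc:
            # Only EAGAIN/EWOULDBLOCK is treated as transient; anything else propagates.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            # Out of retries: surface the last EAGAIN to the caller.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A caller would wrap the failing operation in a function with no arguments, for example `retry_on_eagain(lambda: sock.connect(path))`, where `sock` and `path` are whatever socket object and address the caller already has.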
sbezverk | sequence | 20:25 |
sdake | sbezverk got a link to the openvswitch source code | 20:26 |
sbezverk | Mech422 If I manually restart ovsdb container everything stabilizes | 20:26 |
sdake | sbezverk focus on me please :) | 20:27 |
sdake | lets not rehash debugging that happened 4 days ago | 20:27 |
sbezverk | https://github.com/openvswitch/ovs/tree/branch-2.5 | 20:27 |
sdake | which process are you stracing | 20:28 |
sbezverk | ovsdb-server | 20:28 |
Mech422 | sbezverk: which one? I have two of them - the 'monitoring' one, and the 'real' one... | 20:29 |
Mech422 | sbezverk: oh - you're stracing...manual start...nvm | 20:30 |
Mech422 | sbezverk: when you have a 'working' one up - does lsof -p FOO show anything unusual | 20:31 |
Mech422 | sbezverk: no unexpected dirs/mountpoints or devices ? | 20:32 |
sbezverk | I did docker inspect on both the correctly working container and the failing one and compared them, I could not find anything abnormal | 20:34 |
*** Jeffrey4l has quit IRC | 20:35 | |
*** Jeffrey4l has joined #openstack-kolla | 20:35 | |
Mech422 | sbezverk: I don't know about k8s, but kolla likes to wipe the ovsdb when starting ovs...that hoses my host networking... | 20:35 |
Mech422 | sbezverk: if you're doing container stuff - its not trying to reset your networking is it ? | 20:36 |
sbezverk | nope everything else works perfectly | 20:37 |
sdake | working on solution - calm down guys :) | 20:38 |
*** bootsha has quit IRC | 20:43 | |
Mech422 | sbezverk: sounds like its gotta be a race caused by some sort of config. issue - All-in-one setups have been beaten to death...if it didn't work in centos, I'd imagine there'd be stuff all over the net crying about it. Anyway, I gotta get back to work...let me know how it turns out :-) | 20:47 |
sdake | sbezverk question | 20:47 |
sbezverk | sure | 20:48 |
sdake | you linked a strace prior - are you SURE there was no segfault associated with that failure to create the socket? | 20:49 |
sbezverk | sdake I do see some seg faults but I do not recognize processes | 20:50 |
sbezverk | [23126.119755] traps: urcu6[20124] general protection ip:7fee826f0e37 sp:7fee7ec5b9f0 error:0 in libc-2.17.so[7fee826ba000+1b7000] | 20:50 |
sdake | that is an openvswitch process | 20:51 |
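The two `traps:` lines pasted in this conversation carry enough information to compare the faults: subtracting the mapping base from the instruction pointer gives the offset inside libc where each fault occurred. A small sketch of that arithmetic; the regular expression and the `fault_offset` helper are illustrative assumptions, not part of any Open vSwitch or kernel tooling:

```python
import re

# Matches the kernel "traps:" general protection lines quoted in the log.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# The two dmesg lines from the conversation.
LINES = [
    "[21586.002436] traps: handler30[17143] general protection "
    "ip:7fb6696b4e37 sp:7fb667c239d0 error:0 in libc-2.17.so[7fb66967e000+1b7000]",
    "[23126.119755] traps: urcu6[20124] general protection "
    "ip:7fee826f0e37 sp:7fee7ec5b9f0 error:0 in libc-2.17.so[7fee826ba000+1b7000]",
]


def fault_offset(line):
    """Return (thread name, library, offset of ip within the library) or None."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return match["comm"], match["lib"], offset


if __name__ == "__main__":
    for line in LINES:
        comm, lib, offset = fault_offset(line)
        print(f"{comm}: {lib}+{offset:#x}")
```

Both lines resolve to the same offset, 0x36e37, inside libc-2.17.so, which hints that the two differently named threads may be hitting the same code path. Mapping that offset to a function would need the matching libc debug symbols.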
sbezverk | I did a search in ovs git for this symbol urcu6 | 20:52 |
sbezverk | nothing comes up | 20:52 |
sdake | http://openvswitch.org/pipermail/discuss/2015-December/019689.html | 20:52 |
sbezverk | ok | 20:53 |
sbezverk | but I do not see any trace generated in dmesg | 20:54 |
sbezverk | as it was mentioned in that thread.. | 20:55 |
sdake | ya who knows why that crashes - probably because the socket isn't there | 20:55 |
sdake | add a /dev:/dev bindmount | 20:55 |
sdake | and reproduce the problem with strace without a direct segfault of ovsdb | 20:55 |
sbezverk | ok | 20:56 |
*** daneyon has joined #openstack-kolla | 20:56 | |
sbezverk | should I leave strace? | 20:57 |
sdake | yes pls | 20:59 |
sdake | and reproduce the problem with strace without a direct segfault of ovsdb | 20:59 |
sdake | my bet is after you add the dev bindmount the problem will disappear | 20:59 |
sdake | but could be wrong | 20:59 |
sdake | isn't debugging fun ? :) | 20:59 |
sdake | brb switching networks | 21:00 |
*** egonzalez90 has joined #openstack-kolla | 21:00 | |
*** daneyon has quit IRC | 21:00 | |
*** sdake_ has joined #openstack-kolla | 21:01 | |
*** sdake_ has quit IRC | 21:02 | |
*** sdake_ has joined #openstack-kolla | 21:02 | |
*** signed8bit is now known as signed8bit_Zzz | 21:03 | |
*** signed8bit_Zzz is now known as signed8bit | 21:03 | |
*** signed8bit is now known as signed8bit_Zzz | 21:03 | |
*** sdake has quit IRC | 21:04 | |
*** signed8bit_Zzz is now known as signed8bit | 21:04 | |
*** signed8bit is now known as signed8bit_Zzz | 21:05 | |
sdake_ | sbezverk let me know if you have a failure in 20-30 runs | 21:06 |
sbezverk | %) 20-30 | 21:06 |
sdake_ | also can you show me your current bindmounts | 21:07 |
sdake_ | (for that container) | 21:08 |
sbezverk | - mountPath: /var/lib/kolla/config_files | 21:09 |
sbezverk | name: openvswitch-db-config | 21:09 |
sbezverk | readOnly: true | 21:09 |
sbezverk | - mountPath: /var/lib/openvswitch | 21:09 |
sbezverk | name: openvswitch-db | 21:09 |
sbezverk | - mountPath: /run | 21:09 |
sbezverk | name: host-run | 21:09 |
sbezverk | - mountPath: /dev | 21:09 |
sbezverk | name: host-dev | 21:09 |
sbezverk | - mountPath: /etc/localtime | 21:09 |
sbezverk | name: host-etc-localtime | 21:09 |
sbezverk | readOnly: true | 21:09 |
sdake_ | try out dev, if that fails, try out /sys/fs/cgroups | 21:16 |
sdake_ | if that fails | 21:16 |
sdake_ | let me know | 21:17 |
sdake_ | (with a strace paste) | 21:17 |
*** sdake has joined #openstack-kolla | 21:21 | |
*** ad_rien_1 has quit IRC | 21:23 | |
*** ad_rien_ has joined #openstack-kolla | 21:24 | |
*** sdake_ has quit IRC | 21:24 | |
*** egonzalez90 has quit IRC | 21:25 | |
sdake | sbezverk any word? | 21:28 |
openstackgerrit | Ryan Hallisey proposed openstack/kolla-kubernetes: Add an --all-in-one flag to the CLI https://review.openstack.org/352138 | 21:35 |
sbezverk | sdake: still working.. I see containers crashes but then it gets stabilized.. | 21:36 |
sdake | so /dev:/dev fixes it | 21:37 |
sdake | or undecided? | 21:37 |
sbezverk | I want to test without strace | 21:37 |
sdake | does running without strace cause a crash that is fatal in nature? | 21:38 |
sbezverk | without strace it was reproduced each time | 21:39 |
sdake | cool give that a spin | 21:39 |
*** bootsha has joined #openstack-kolla | 21:41 | |
*** sdake_ has joined #openstack-kolla | 21:50 | |
*** sdake has quit IRC | 21:52 | |
sbezverk | sdake_: man it works like a charm. now containers are not restarting | 21:53 |
sdake_ | yw | 21:53 |
sbezverk | 3 out of 3 were success | 21:53 |
sbezverk | thank you | 21:53 |
sbezverk | where did you get idea to add dev: | 21:53 |
sbezverk | have you noticed something in the code? | 21:54 |
sdake_ | pulled it out of my ass | 21:54 |
sdake_ | link the strace that failed again - my backscroll is gone | 21:54 |
sdake_ | i'll show you why i suspected that may fix it | 21:55 |
sbezverk | http://paste.openstack.org/show/551517/ | 21:55 |
sdake_ | line 274 | 21:56 |
sdake_ | nah wrong line | 21:56 |
sdake_ | line 275 | 21:57 |
sdake_ | there i a makedev syscall | 21:57 |
sdake_ | i/is | 21:57 |
sdake_ | did you remove all the other hacks you have in place | 21:57 |
sdake_ | to certify that concretely fixes it | 21:58 |
sdake_ | if it fixes it - would appreciate a coauthor line ;) | 21:59 |
sbezverk | doing it right now | 22:00 |
sbezverk | FYI I would still like to use a script regardless | 22:01 |
sbezverk | in order to be able to use DaemonSet for completely dynamic operation | 22:01 |
*** huikang has joined #openstack-kolla | 22:03 | |
*** fragatina has joined #openstack-kolla | 22:08 | |
Mech422 | [12:52] <Mech422> and /dev/null | 22:11 |
Mech422 | so it was a config error...nice | 22:11 |
sdake_ | sbezverk it would be nice if our code didnt have sleep 1s in it | 22:12 |
sdake_ | sloppy | 22:12 |
sdake_ | the only reason i +2'ed the change is to unblock you | 22:12 |
*** fragatina has quit IRC | 22:12 | |
sdake_ | short term slop is ok in my book | 22:12 |
sdake_ | as long as it gets fixed short term :) | 22:12 |
Mech422 | I am curious how it worked in the other VMs without /dev bound to the containers though... | 22:14 |
sdake_ | Mech422 i'm curious why it works in ansible without dev ;) | 22:15 |
Mech422 | sdake_: it's really weird... /dev missing isn't a race...why would it suddenly appear later? or not need /dev later ? | 22:17 |
sdake_ | i dont claim to know the root cause | 22:18 |
Mech422 | sdake_: yeah - just very odd...oh well, working now :-) | 22:18 |
sdake_ | indeed it could still be a race - that is why i want the sleep hacks removed | 22:18 |
Mech422 | sdake_: maybe 'building' /dev takes longer than mounting it 'pre-built' ? | 22:19 |
*** huikang has quit IRC | 22:24 | |
*** huikang has joined #openstack-kolla | 22:25 | |
Mech422 | sdake_: this docker bug talks about udev races: https://github.com/docker/docker/issues/4036 | 22:25 |
Mech422 | sdake_: I wonder if it might be related? | 22:25 |
Mech422 | udev not responding fast enough in the vm or something? | 22:26 |
sdake_ | dont know - dont care - watching tv :) | 22:26 |
Mech422 | sdake_: LOL - enjoy :-) | 22:26 |
Mech422 | sbezverk: so does centos even do the 'build /dev on boot' thing ? or is it a static /dev ? | 22:27 |
sbezverk | Mech422: looks like mounting dev on host | 22:27 |
sbezverk | fixed the issue | 22:27 |
sbezverk | sdake: found some references in code suggesting that this mount might be required. | 22:28 |
sdake_ | enders game inc | 22:28 |
Mech422 | sbezverk: yeah - I'm just curious why it worked on ANY nodes if /dev is required... | 22:28 |
Mech422 | sdake_: oh - loved those books.. :-) | 22:29 |
*** huikang has quit IRC | 22:29 | |
Mech422 | sdake_: my fav was the second (?) book... the one told from the Brazilian kid's point of view | 22:30 |
Mech422 | sdake_: his nemesis from the barrio was a cold blooded S.O.B. | 22:31 |
sdake_ | i like the movie | 22:31 |
sdake_ | i haven't read the books | 22:31 |
sdake_ | ender fails his team | 22:31 |
sdake_ | yet succeeds at the same time | 22:31 |
sdake_ | as oxymoronic as that is | 22:32 |
Mech422 | sdake_: I heard the movie was pretty good - but haven't seen it | 22:33 |
sdake_ | worth watching | 22:33 |
sdake_ | probably not as good as the book | 22:33 |
sdake_ | i spend 12 hours a day reading - thats enough for me | 22:33 |
Mech422 | sdake_: I'm waiting on Suicide Squad now... | 22:33 |
sdake_ | yup i want to see that one | 22:34 |
sdake_ | star trek was a let down | 22:34 |
sdake_ | bourne was just ok | 22:34 |
sdake_ | waiting for a good movie to come along | 22:34 |
sdake_ | something like enders game | 22:35 |
sdake_ | or 3:10 to yuma | 22:35 |
sdake_ | or no country for old men | 22:35 |
sdake_ | you know - a modern classic | 22:35 |
Mech422 | star trek let you down? bummer...I'm waiting for that to hit vudu | 22:35 |
sdake_ | well you know what they say about opinions :) | 22:35 |
Mech422 | the girlfriend is a big Jason Statham fan - so the new 'Mechanic' movie will be on our list too :-P | 22:35 |
sdake_ | sbezverk can you confirm you removed the sleep 1 and had success | 22:36 |
sdake_ | watched wild card last night | 22:36 |
Mech422 | oh - thats an old one... | 22:36 |
sdake_ | if you're in for a good tv episode, watch "Chain of Command" from star trek | 22:36 |
Mech422 | I still think Leon from 'The Professional' is one of the best bad-good-guys | 22:37 |
Mech422 | sdake_: oh - chain of command is ST:NG ? I'll have to check it out...plot summary sounds really good | 22:40 |
sdake_ | ya stng | 22:41 |
sdake_ | i read somewhere star trek is doing another run on tv | 22:42 |
Mech422 | oh? sweet - they seem to do a decent job with ST | 22:42 |
Mech422 | I haven't really been disappointed in any of them - for some reason, I really like the 'enterprise' series | 22:43 |
Mech422 | sometime I need to take a week off and watch babylon 5 from start to finish... | 22:43 |
Mech422 | I've never managed to see the whole thing | 22:43 |
*** ad_rien_ has quit IRC | 22:44 | |
sdake_ | i was a super fan of stargate | 22:51 |
sdake_ | all of em | 22:51 |
sdake_ | except the last one which ended with an unresolved cliffhanger | 22:52 |
Mech422 | sdake_: stargate was good...but they ended up with so many spinoffs, I refused to get sucked in - that would be as long as babylon 5 to watch them all! | 22:56 |
*** huikang has joined #openstack-kolla | 23:04 | |
*** sdake_ has quit IRC | 23:14 | |
*** huikang has quit IRC | 23:25 | |
*** daneyon has joined #openstack-kolla | 23:38 | |
*** daneyon has quit IRC | 23:42 | |
*** fragatina has joined #openstack-kolla | 23:48 | |
*** zhurong has joined #openstack-kolla | 23:52 | |
*** fragatina has quit IRC | 23:54 |