Friday, 2016-01-15

openstackgerritMerged openstack/astara: devstack doesn't check ASTARA_APPLIANCE_SSH_PUBLIC_KEY existence  https://review.openstack.org/26458900:21
adam_gwhen it rains it pours: https://bugs.launchpad.net/diskimage-builder/+bug/153438700:27
openstackLaunchpad bug 1534387 in diskimage-builder "debian DIB_EXTLINUX=1 builds fail: Missing package name for distro/element: debian/base" [Undecided,New]00:27
openstackgerritAdam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests  https://review.openstack.org/24387400:36
openstackgerritAdam Gandelman proposed openstack/astara: fail func tests early (do not merge)  https://review.openstack.org/26666600:36
openstackgerritMerged openstack/astara: Drop unused call to non-existent function  https://review.openstack.org/26712700:41
markmcclainadam_g: fun times00:47
*** stanchan has quit IRC01:01
*** yanghy has joined #openstack-astara01:58
*** outofmemory is now known as reedip02:05
openstackgerritYang Hongyang proposed openstack/astara: Refactor ensure_cache for loadbalancer driver  https://review.openstack.org/26674102:44
openstackgerritYang Hongyang proposed openstack/astara: Refactor ensure_cache for instance manager  https://review.openstack.org/26674002:44
openstackgerritYang Hongyang proposed openstack/astara: Refactor ensure_cache for router driver  https://review.openstack.org/26674202:44
openstackgerritYang Li proposed openstack/astara: Add an option to get max sleep time from the config file  https://review.openstack.org/26621302:55
openstackgerritYang Hongyang proposed openstack/astara: Remove unnecessary nosetest param  https://review.openstack.org/26790403:12
openstackgerritYang Hongyang proposed openstack/astara: Remove unused openstack common conf  https://review.openstack.org/26792003:34
openstackgerritxiayu proposed openstack/astara: Automatically generate etc/orchestrator.ini file  https://review.openstack.org/26557603:40
openstackgerritYang Li proposed openstack/astara: Add an option to get max sleep time from the config file  https://review.openstack.org/26621304:42
*** leonstack has quit IRC04:54
*** leonstack has joined #openstack-astara04:56
*** leonstack has quit IRC05:19
*** leonstack has joined #openstack-astara05:20
*** leonstack1 has joined #openstack-astara06:37
*** leonstack has quit IRC06:40
*** leonstack has joined #openstack-astara06:46
*** leonstack1 has quit IRC06:47
*** leonstack has quit IRC06:49
*** leonstack has joined #openstack-astara07:48
*** ronis has joined #openstack-astara08:30
*** yanghy_ has joined #openstack-astara08:32
*** yanghy has quit IRC08:35
*** reedip is now known as outofmemory08:49
*** leonstack1 has joined #openstack-astara08:55
*** leonstack has quit IRC08:58
*** outofmemory has quit IRC09:07
*** ronis has quit IRC09:31
*** leonstack has joined #openstack-astara09:43
*** leonstack1 has quit IRC09:45
*** xiayu has quit IRC09:47
*** xiayu has joined #openstack-astara09:48
*** xiayu has quit IRC10:00
openstackgerritYang Li proposed openstack/astara: Fix a bug for default provider  https://review.openstack.org/26808511:28
*** leonstack1 has joined #openstack-astara12:18
*** leonstack has quit IRC12:21
*** leonstack has joined #openstack-astara12:31
*** leonstack1 has quit IRC12:32
*** openstackgerrit has quit IRC12:50
*** openstackgerrit has joined #openstack-astara12:51
*** yanghy_ has quit IRC14:38
ryanpetrelloadam_g you around?15:54
ryanpetrelloor markmcclain15:54
ryanpetrelloI was talking to rods about this orphaned vrrp port issue15:54
markmcclainryanpetrello, rods: yes15:55
ryanpetrellowe had discussed the idea of having a "Cleanup" state15:55
ryanpetrellomy concern is that it complicates the state machine more15:55
ryanpetrelloand also if you run into a situation where port deletes aren't working, it clogs up the state machine workers15:55
ryanpetrellowhat about some way to mark ports as orphaned after detachment?15:55
markmcclainyeah... am concerned about that too15:55
ryanpetrelloand a thread in the rug that does nothing but delete them in the background?15:55
ryanpetrellomaybe even just changing the name to some identifier that flags them for cleanup via the rug15:56
markmcclainyeah.. we could have a reaper thread15:56
markmcclainthe issue is HA rug15:56
markmcclainwho runs the reaper?15:56
markmcclainthe alternate is that we set a dirty bit15:57
ryanpetrellois it a huge issue if they're both issuing deletes?15:57
markmcclainonly if deletes are failing for strange reasons15:57
ryanpetrelloyea15:57
markmcclainbecause you'll have N processes attempting15:57
ryanpetrelloright15:57
ryanpetrellohow does the HA sharding currently work?15:58
markmcclainit's based on the hash-ring15:58
markmcclainof resource id15:58
markmcclainwe could alternately set a dirty bit15:58
ryanpetrellowhere?15:58
markmcclainand if it's set the exit of is_alive15:58
markmcclainmakes 1 attempt to delete15:59
markmcclainand then continues on with the state machine15:59
markmcclainso blocking wouldn't occur and we'd get periodic cleanups15:59
ryanpetrellomaybe15:59
rodsmarkmcclain one of the issues that I'm seeing when deleting the port in the REPLUG is that there are cases where, if an exception is raised, the rug moves the router from REPLUG to CONFIG to CALCACTION, so just cleaning up in the replug is not enough15:59
markmcclainright15:59
ryanpetrelloright15:59
ryanpetrelloas the state machine evolves15:59
ryanpetrellowe're going to keep running into this15:59
ryanpetrellowhich is why I think something that reaps in the background makes sense16:00
markmcclainI'm thinking we just flag that the state is dirty16:00
markmcclainand there's some kind of janitor state16:00
ryanpetrelloanother consideration16:00
ryanpetrellothis approach only handles ports detached by the rug16:00
markmcclainwell the configure step will notice that there's a mismatch16:01
ryanpetrellotrue16:01
markmcclainand attempt to clean up16:01
ryanpetrelloand replug16:01
markmcclainbasically replug is supposed to be the state where ports are reconciled16:02
ryanpetrelloright16:02
ryanpetrelloI'm not against the idea of this happening in is_alive16:02
ryanpetrellomy original inclination with the reaping was in tandem w/ the health check16:02
ryanpetrellothere's obviously still room for orphaned ports16:02
markmcclainyeah.. that's what I'm leaning towards16:03
ryanpetrelloe.g., stop the rug at the wrong time16:03
markmcclainis that we set a dirty bit16:03
markmcclainand the health check can try to make the instance healthy16:03
ryanpetrellodo neutron ports have metadata?16:03
ryanpetrellowhen you say dirty bit, do you mean something at runtime in the state machine?16:04
ryanpetrelloor something stored at the DB level16:04
ryanpetrellowhatever approach we take here16:04
ryanpetrelloI'd like it to be something the rug can recover if it's restarted16:04
ryanpetrelloe.g., if a replug happens and then we immediately restart the rug16:04
ryanpetrellowhen the new rug process comes back up, it should notice the orphaned port and delete it16:04
ryanpetrello(imo)16:04
rodsyeah, should be something in the db16:04
ryanpetrellootherwise we're just slowly going to leak ports16:04
ryanpetrellomaybe we rename ports with some signification of orphaned + hash-ring resource ID16:05
ryanpetrelloso each rug only handles the orphaned ports it should care about16:05
rodsis that going to cause issues on rebalancing?16:07
rodsbrb16:09
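
A minimal sketch of the rename-and-shard idea ryanpetrello floats above, assuming a hypothetical name prefix (`ASTARA:ORPHAN:`) and a generic consistent-hash ring. It is not astara's hash-ring code, and the host names and port dicts are illustrative only. Ownership is recomputed from the current ring on every pass, so after a rebalance an orphaned port would simply be picked up by whichever rug owns the resource ID at that point.

```python
import bisect
import hashlib

# Hypothetical marker written into the port name when the rug detaches a port.
ORPHAN_PREFIX = "ASTARA:ORPHAN:"


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring; each host gets several virtual points."""

    def __init__(self, hosts, replicas=32):
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [key for key, _ in self._ring]

    def owner(self, resource_id):
        """Return the host responsible for resource_id."""
        idx = bisect.bisect(self._keys, _hash(resource_id)) % len(self._ring)
        return self._ring[idx][1]


def orphan_name(resource_id):
    """Name to give a detached port so it can be found and sharded later."""
    return ORPHAN_PREFIX + resource_id


def orphans_for_host(ports, ring, host):
    """Return ids of orphaned ports that `host` should delete.

    `ports` is a list of dicts with 'id' and 'name' keys.
    """
    mine = []
    for port in ports:
        name = port.get("name") or ""
        if not name.startswith(ORPHAN_PREFIX):
            continue  # not flagged for cleanup
        resource_id = name[len(ORPHAN_PREFIX):]
        if ring.owner(resource_id) == host:
            mine.append(port["id"])
    return mine
```

Because the marker lives in the port name rather than in rug memory, a restarted rug could rediscover the orphans it owns by listing ports again.
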
ryanpetrellointerested in adam_g's perspective when he's in16:14
markmcclainsadly the ports don't have metadata16:26
markmcclainso we'll have to set a dirty bit in the worker's state machine16:26
markmcclainthe nice thing is that even if control is handed off the dirty bit would be set again because the ports would not match16:26
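
The dirty-bit approach markmcclain describes could look roughly like the sketch below. All names here (`InstanceState`, `on_is_alive_exit`, the `delete_port` callable) are hypothetical and not taken from the astara code base. The point it illustrates is that each health check makes one delete attempt per stray port and then returns, so a failing delete never blocks the state machine, and a fresh worker re-derives the dirty bit from the port mismatch alone.

```python
class InstanceState:
    """Per-instance worker state, including the dirty bit for stray ports."""

    def __init__(self, expected_port_ids):
        self.expected = set(expected_port_ids)
        self.dirty = False

    def reconcile(self, actual_port_ids):
        """Set the dirty bit if ports exist that should not; return them."""
        stray = set(actual_port_ids) - self.expected
        self.dirty = bool(stray)
        return stray


def on_is_alive_exit(state, actual_port_ids, delete_port):
    """Make a single cleanup attempt per stray port, then carry on.

    `delete_port` is any callable taking a port id (an assumption of this
    sketch). Failures are swallowed here so the state machine is never
    blocked; the port stays dirty and is retried on the next health check.
    """
    remaining = set()
    for port_id in sorted(state.reconcile(actual_port_ids)):
        try:
            delete_port(port_id)
        except Exception:
            remaining.add(port_id)
    state.dirty = bool(remaining)
    return remaining
```

Note that this only keeps the bit in memory; it relies on the mismatch being observable again after a restart or hand-off, as discussed above.
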
*** ryanpetrello is now known as ryanpetrello116:47
*** ryanpetrello1 is now known as ryanpetrello16:48
openstackgerritmark mcclain proposed openstack/astara-neutron: Add reno for release notes management  https://review.openstack.org/26781217:13
* adam_g reads backscroll17:46
adam_gmarking stuff dirty for later cleanup seems like we'd still be in the situation of things being eventually consistent and tenants not being able to delete their things (at least for some period of time)17:50
*** cleverdevil has quit IRC17:55
*** cleverdevil has joined #openstack-astara17:56
*** smartshader has quit IRC18:17
rodsI think there are no alternatives to the eventually consistent behaviour right now, we may want to focus on not leaving stray ports18:34
markmcclainadam_g: https://review.openstack.org/#/c/267962/118:38
openstackgerritmark mcclain proposed openstack/astara-neutron: remove dead floating IP code  https://review.openstack.org/26828618:48
*** leonstack1 has joined #openstack-astara18:56
*** leonstack has quit IRC18:57
openstackgerritmark mcclain proposed openstack/astara-neutron: remove dead floating IP code  https://review.openstack.org/26828618:57
openstackgerritmark mcclain proposed openstack/astara-neutron: remove dead floating IP code  https://review.openstack.org/26828618:59
openstackgerritMerged openstack/astara-neutron: Add reno for release notes management  https://review.openstack.org/26781219:01
fzylogicmarkmcclain: that patch appears to be working as expected19:01
fzylogicthanks!19:01
markmcclainawesome19:01
adam_gmarkmcclain, so looking at the gate failures19:27
adam_gi think that test is just bad19:28
adam_gand we'd be better off dropping it once the newer tests land19:28
adam_gwhich create tenant/network/router per test instead of relying on the devstack created one19:29
markmcclainyeah.. that makes sense19:29
openstackgerritmark mcclain proposed openstack/astara-neutron: allow DHCP from router interfaces  https://review.openstack.org/26658619:32
markmcclainadam_g: fixed rebase conflict if you want to re-add +A19:33
openstackgerritAdam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests  https://review.openstack.org/24387419:33
markmcclainadam_g: thanks19:36
openstackgerritAdam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests  https://review.openstack.org/24387420:00
openstackgerritAdam Gandelman proposed openstack/astara: Enrich functional test suite  https://review.openstack.org/21995220:00
openstackgerritAdam Gandelman proposed openstack/astara: fail func tests early (do not merge)  https://review.openstack.org/26666620:00
markmcclainrods, ryanpetrello: https://review.openstack.org/#/c/246005/1420:25
*** davidlenwell has quit IRC20:38
*** davidlenwell has joined #openstack-astara20:39
openstackgerritmark mcclain proposed openstack/astara: Move settings from plugin.sh to the settings file  https://review.openstack.org/26832720:50
adam_gmarkmcclain, so im trying to make network cleanup more robust in this test suite and hitting network in use errors because of ports not being cleaned up in time21:16
adam_gmarkmcclain, wondering about overriding the ml2's delete_networks() to avoid raising the in-use exception if the ports it found are astara-internal ports, under the assumption that we will be cleaning those up later21:17
adam_g... if that coupled with some reaper thread in astara would help with the stray ports issue rods is having21:18
adam_ghttp://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/db_base_plugin_v2.py#n5821:21
markmcclainadam_g: yeah... considered that too21:29
markmcclainadam_g: I've also wanted to retire the ml2 plugin wrapper too21:29
adam_gmarkmcclain, it seems thats exactly what happens for dhcp ports/etc21:29
markmcclainright it's a bit different because dhcp does not use nova21:29
markmcclainso not sure what's going to happen if we delete things out from underneath nova21:29
adam_gyea21:32
openstackgerritAdam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests  https://review.openstack.org/24387421:34
openstackgerritAdam Gandelman proposed openstack/astara: Enrich functional test suite  https://review.openstack.org/21995221:34
openstackgerritAdam Gandelman proposed openstack/astara: fail func tests early (do not merge)  https://review.openstack.org/26666621:34
stupidnicOkay. I seem to be having a problem with the Astara instance starting up.21:42
stupidnicI blew away our entire install and redeployed. Most everything works. We can turn up instances, but I can't seem to boot an Astara instance.21:43
openstackgerritMerged openstack/astara: Allow API listening address to be specified in config  https://review.openstack.org/24600521:43
markmcclainstupidnic: seeing any errors in the logs?21:44
stupidnicmarkmcclain: the only thing I am seeing is a traceback on CheckBoot.execute()21:44
stupidnicLet me dig a bit deeper in the logs21:44
* adam_g needs to run baby to the baby doc. ttyl21:44
markmcclainstupidnic: ok.. can you paste the traceback to paste.openstack.org?21:46
stupidnicSure. Thinking on this a bit more... is it possible that image isn't in the correct project and as such Astara can't access it?21:47
markmcclainpossibly21:47
markmcclainespecially if the instance isn't showing up in Nova21:47
stupidnicWell it shows up, but it immediately errors21:47
stupidnicLet me look at Nova21:48
stupidnicRescheduledException: Build of instance dacef104-0eda-4b7a-95f4-f452574dadd9 was re-scheduled: 'dict' object has no attribute 'disk_format'\n"21:49
stupidnicbad image import?21:49
stupidnicodd... glance shows the image is disk_format raw which is what it should be21:51
stupidnicmight be related to a bug in rbd... I seem to recall having this fixed before and we made a Salt rule to update the file with the patched version, but we pulled it expecting the patch to have made it upstream21:57
stupidnicdouble checking21:57
elois this related to this bug: https://bugs.launchpad.net/nova/+bug/150823021:57
openstackLaunchpad bug 1508230 in nova (Ubuntu Wily) "regression in cloning raw image type with ceph" [High,Fix committed] - Assigned to James Page (james-page)21:57
stupidnicelo: one and the same21:58
stupidnicWe made the mistake of assuming that after 3 months this bug would have been patched in the debs that Ubuntu is publishing... sadly that is not the case21:59
stupidnicYep... sigh22:00
openstackgerritmark mcclain proposed openstack/astara: Move settings from plugin.sh to the settings file  https://review.openstack.org/26832722:01
stupidnicAlright... have the instance booted... can't talk to it over ipv6 though... digging into that now22:14
stupidnicOkay. I see the traffic making it over the VXLAN tunnels we are using22:17
stupidnicfrom the controller to the compute node22:18
elocan't talk to the management IP address22:18
stupidnicanywhere else I should look?22:18
stupidnicYeah. The instance is up and running (I can see the console) but I can't ping the management ip22:18
eloI assume using OVS on the nodes22:19
stupidnicnegative22:19
stupidnicI hate it :)22:19
stupidnicIt's not a bad technology... it just makes things way more complicated than they really need to be22:19
elomarkmcclain: could this be the issue that you've seen where linux bridge is not replicating packets from one bridge to another?22:21
stupidnicWhat are the details to connect to the Astara instance (I used the default one)22:22
markmcclainelo: possibly22:22
markmcclainif everything is on same host then it's something else22:22
stupidnicit's not22:22
markmcclainok.. so it's a replication issue22:23
stupidnicwe have separate controllers and compute nodes22:23
markmcclainsince neighbor discovery/arp has to work22:23
stupidnicI can see the packets on the vxlan interface on the compute node... so the packets are making it that far22:23
elodo you see any drops in the IPtables for packets22:23
eloon the VXlan bridge interface22:24
markmcclainhmmm.. if they're making to the compute node then things are good22:24
stupidnic17:26:46.517809 IP6 fdd6:a1fa:cfa8:748e:f816:3eff:fefc:fb88 > ff02::1:ff15:d0dc: ICMP6, neighbor solicitation, who has fdd6:a1fa:cfa8:748e:f816:3eff:fe15:d0dc, length 3222:26
stupidnicthat's on the vxlan interface on the compute node22:27
stupidnicI see fb88 is the IPv6 on the controller22:27
openstackgerritmark mcclain proposed openstack/astara: make the enabled_drivers configurable in devstack  https://review.openstack.org/26835322:28
stupidnicI would like to login to the console on the Astara router to confirm it actually has the IPv6 address we are looking for22:28
markmcclainyou have to build an appliance with the demo user22:29
markmcclainor mount the disk image and add a user that can log in with username/passwd22:29
markmcclains/demo/debug/22:29
stupidnicOkay.22:29
markmcclainhas the L2 agent moved the vxlan device to proper bridge?22:30
stupidnicHow would I confirm that?22:30
markmcclainbrctl show22:31
markmcclainshould list the interfaces added to the bridge22:31
stupidnicOkay I show that vxlan-10 and tapf92... are on the same bridge22:32
stupidnicso in theory I should also see the packets on that same interface if I tcpdump it22:32
stupidnicyep22:32
eloyes22:32
stupidnicI can the who has22:33
stupidnicI can see22:33
stupidnicSo the packets are making it into the tap interface22:33
stupidnicon the compute node22:33
stupidnicI suspect something is up with the instance itself....22:34
stupidnicmaybe dhcp22:34
eloconfig drive is used for router instance for network configuration22:35
stupidnichow do I enable the debug user in disk-image-create22:37
eloIll send you instruction on how to backdoor a qcow image22:37
markmcclainok.. I've got to head out for a bit22:37
stupidnicI am pretty sure this is a DHCP issue22:41
stupidnicI just checked the network in Horizon and got an error about DHCP for the service network22:42
elohttps://gist.github.com/eric-lopez/d0321112cc1678566c7e22:50
stupidnicelo: can you clarify something for me? We have dhcp-agent and metadata-agent disabled on the controller. Is that correct?22:51
eloyes22:52
elothese services are handled by the router instance22:52
stupidnicThat's what I thought22:52
eloif the router is not configured properly or not connected to the tenant network, the instances in that network will not get DHCP or metadata info22:53
stupidnicheh, vi isn't in the image22:56
elook. will update gist. this was quickly written up23:03
stupidnicit's no problem... I just had a laugh23:04
stupidnicI echo'ed it23:04
stupidnicand we are in23:09
stupidnicand now I can't type any more23:09
stupidnicsigh23:10
stupidnicI should start drinking23:10
stupidnicman... that's weird23:11
eloTGIF...23:12
eloyes23:12
stupidnicI have another instance that has been running for hours... I can type on the console for that one23:12
stupidnicit's like the instance locks up23:12
elois it on the same compute node?23:13
stupidnicNo. Different compute node23:13
stupidnicbut they are all identical23:13
*** shashank_hegde has joined #openstack-astara23:13
stupidnicSoft rebooting isn't working either23:13
stupidnicsomething up there23:14
stupidnicOkay. I don't see any configuration for eth0 or eth123:17
stupidnicthe interfaces are there, but there is nothing in the interfaces file23:17
stupidnicWhere is the astara appliance supposed to pull its configuration information from?23:21
eloconfdrive23:22
stupidnicOkay. The lock issue is probably due to the rug rebooting the instance out from under me23:27
stupidnicit just did it to me again23:27
stupidnicOkay. So why isn't this instance getting its configuration23:28
stupidnicWhat services do I need to have running on the controller?23:28
elothat makes sense as it can't configure the instance as it isn't getting a mgmt IP address23:29
stupidnicRight. So where might we be going wrong?23:29
elonova-compute should have installed the genisoimage package that configdrive requires23:31
fzylogicassuming nova-compute is configured to use ISO confdrive images23:32
fzylogicmight be vfat23:32
fzylogicboth have their own optional requirements23:32
elocorrect. I forgot about vfat23:32
elocheck nova.conf for the format that config_drive_format is set to23:34
stupidnicnope, not set on the compute nodes23:34
stupidnicnegative on the controller as well23:35
stupidnicso if it is not set then it will default to iso23:36
stupidnicand I can confirm that genisoimage23:36
stupidnicis installed23:36
eloreference docs say default is set to iso966023:38
fzylogicnova-compute.log should tell you for sure if it's being built when the instance spawns23:42
stupidnicthe confdrive?23:43
stupidnicnot seeing anything in the logs referencing confdrive or geniso23:43
fzylogicyeah23:44
fzylogicwhen instances boot here, it logs 2 things23:44
fzylogic"instance: <uuid>] Using config drive"23:44
fzylogic"instance: <uuid>] Creating config drive at <path>"23:44
stupidnicYeah I have that23:44
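
One way to pull those two messages out of nova-compute.log for a single instance is sketched below. The regular expression assumes the `instance: <uuid>] ...` shape quoted by fzylogic; the rest of each log line (timestamps, log level, module) is not assumed, and the function name is made up for this example.

```python
import re

# Matches the two messages quoted above, e.g.
#   "... instance: <uuid>] Using config drive"
#   "... instance: <uuid>] Creating config drive at <path>"
_EVENT_RE = re.compile(
    r"instance: (?P<uuid>[0-9a-fA-F-]{36})\] "
    r"(?:Using config drive|Creating config drive at (?P<path>\S+))"
)


def config_drive_events(lines, instance_uuid):
    """Return a list of (event, path) tuples logged for one instance.

    event is "using" or "creating"; path is None for "using" events.
    """
    events = []
    for line in lines:
        match = _EVENT_RE.search(line)
        if match is None or match.group("uuid") != instance_uuid:
            continue
        path = match.group("path")
        events.append(("creating", path) if path else ("using", None))
    return events
```

Passing an open log file as `lines` works, since file objects iterate line by line; the returned path can then be checked on the compute node to see whether the config drive file actually exists.
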
fzylogicyou can either kill the rug or put the router into debug mode after it boots so you don't get the appliance pulled out from under you23:45
fzylogicthat'll let you poke around a bit more thoroughly23:45
stupidnicOkay. So just drop orchestrator?23:46
openstackgerritAdam Gandelman proposed openstack/astara: Add astara-ctl + API functional tests  https://review.openstack.org/24387423:46
openstackgerritAdam Gandelman proposed openstack/astara: fail func tests early (do not merge)  https://review.openstack.org/26666623:46
eloastara-ctl router debug <router_id>23:47
fzylogic^^23:47
stupidnicwould the router ID be the instance ID or?23:48
fzylogicno, the router ID as neutron knows it23:48
stupidnicOkay... got it23:49
stupidnicSo... we know that nova-compute is building the confdrive23:49
eloso is it configuring the mgmt interface correctly?23:52
stupidnicNo there are no interfaces on the instance other than lo23:53
stupidnicwell there are interfaces but they are not configured23:53
stupidnicI just checked the path for the config drive as specified in the logs and that file does not exist23:53
stupidnicThe path is there and contains console.log and libvirt.xml23:54
adam_gstupidnic, check /proc/partitions in the appliance, should have config drive in there as sr023:56
adam_gon the compute node, virsh dumpxml for the instance, should be a <disk/> entry for disk.config23:57
eloshould be a disk (qcow image) and disk.config (config drive) files in that directory23:57
stupidnicThere isn't.23:57
stupidnicAnd the debug didn't work23:57
stupidnicthe instance is still being yanked out from under me23:58
stupidnicIs it possible that there is an issue with Ceph here?23:58
stupidnicLooking in the libvirt.xml there is a reference to disk.conf but it is referencing an rbd23:58
adam_gstupidnic, is 'force_config_drive=True' set in nova.conf ?23:58
stupidnicno, I don't have any settings related to config_drive23:59
stupidnic<source protocol="rbd" name="volumes/01e62deb-2fd4-4b38-935b-4326a426ecb0_disk.config">23:59
adam_gim not sure what the status of config drive /w ceph is23:59
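
The `virsh dumpxml` check adam_g suggests can be scripted. The sketch below assumes the domain XML has been captured to a string and that the config drive source ends in `disk.config`, which holds both for a file-backed disk in the instance directory and for the rbd-backed `<source>` stupidnic pasted. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def config_drive_sources(domain_xml):
    """Return (protocol, location) for each disk that looks like a config drive.

    File-backed disks carry the location in source/@file; network disks
    such as rbd carry it in source/@name together with source/@protocol.
    """
    root = ET.fromstring(domain_xml)
    found = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is None:
            continue
        location = source.get("file") or source.get("name") or ""
        if location.endswith("disk.config"):
            found.append((source.get("protocol", "file"), location))
    return found
```

An `("rbd", ...)` result would mean the config drive was written to Ceph rather than to the instance directory on the compute node, which would be consistent with the file not existing at the logged path.
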
