*** sthillma has quit IRC | 00:01 | |
*** sthillma has joined #tripleo | 00:02 | |
*** rwsu has quit IRC | 00:09 | |
*** alop has joined #tripleo | 00:12 | |
*** rwsu has joined #tripleo | 00:16 | |
*** panda_ has quit IRC | 00:18 | |
*** panda_ has joined #tripleo | 00:19 | |
*** cwolferh has quit IRC | 00:26 | |
*** cwolferh has joined #tripleo | 00:27 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: package-installs doesn't work in python3/gentoo https://review.openstack.org/273782 | 00:29 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add gentoo support to growroot https://review.openstack.org/273769 | 00:32 |
*** panda_ has quit IRC | 00:34 | |
*** panda_ has joined #tripleo | 00:34 | |
*** dmacpher has joined #tripleo | 00:34 | |
*** Marga_ has quit IRC | 00:43 | |
*** panda_ has quit IRC | 00:45 | |
*** panda_ has joined #tripleo | 00:45 | |
*** ayoung has joined #tripleo | 00:49 | |
*** chlong has joined #tripleo | 00:49 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add support for openrc to dib-init-system https://review.openstack.org/273772 | 00:51 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add support for openrc to dib-init-system https://review.openstack.org/273772 | 00:52 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: fix gentoo hardened support https://review.openstack.org/273795 | 00:52 |
*** panda_ has quit IRC | 00:56 | |
*** panda_ has joined #tripleo | 00:56 | |
*** marcusvrn_ has quit IRC | 00:57 | |
*** ccrouch has quit IRC | 01:02 | |
*** ccrouch has joined #tripleo | 01:04 | |
*** panda_ has quit IRC | 01:04 | |
*** panda_ has joined #tripleo | 01:05 | |
*** lazy_prince has joined #tripleo | 01:09 | |
*** yamahata has joined #tripleo | 01:10 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add support for gentoo to simple-init https://review.openstack.org/273790 | 01:12 |
openstackgerrit | James Slagle proposed openstack/tripleo-heat-templates: Add swap to overcloud nodes https://review.openstack.org/273752 | 01:24 |
*** cwolferh has quit IRC | 01:24 | |
*** cwolferh has joined #tripleo | 01:25 | |
*** penick has quit IRC | 01:28 | |
*** rlandy has quit IRC | 01:29 | |
*** mgagne has quit IRC | 01:32 | |
*** mgagne has joined #tripleo | 01:32 | |
*** pradk has quit IRC | 01:35 | |
*** pradk has joined #tripleo | 01:51 | |
prometheanfire | are triple-o checks failing for dib? | 01:53 |
*** tiswanso has joined #tripleo | 01:59 | |
*** tiswanso has quit IRC | 01:59 | |
*** tiswanso has joined #tripleo | 02:00 | |
*** yamahata has quit IRC | 02:08 | |
*** sthillma has quit IRC | 02:11 | |
*** cwolferh has quit IRC | 02:12 | |
*** shivrao has quit IRC | 02:16 | |
*** xinwu has quit IRC | 02:16 | |
openstackgerrit | Dan Sneddon proposed openstack/tripleo-heat-templates: Add TripleO Heat Template Parameters for Neutron Tenant MTU https://review.openstack.org/273847 | 02:23 |
*** coolsvap|away is now known as coolsvap | 02:36 | |
*** tzumainn has quit IRC | 02:42 | |
*** mgagne has quit IRC | 02:51 | |
*** mgagne has joined #tripleo | 02:51 | |
*** Marga_ has joined #tripleo | 02:59 | |
*** Marga_ has quit IRC | 03:01 | |
*** Marga_ has joined #tripleo | 03:02 | |
*** pradk has quit IRC | 03:16 | |
*** panda_ has quit IRC | 03:23 | |
*** panda_ has joined #tripleo | 03:23 | |
*** pradk has joined #tripleo | 03:28 | |
*** panda_ has quit IRC | 03:29 | |
*** panda_ has joined #tripleo | 03:30 | |
*** alop has quit IRC | 03:30 | |
*** coolsvap is now known as coolsvap|away | 03:30 | |
*** lazy_prince has quit IRC | 03:34 | |
*** yuanying_ has joined #tripleo | 03:38 | |
*** yuanying has quit IRC | 03:40 | |
*** panda_ has quit IRC | 03:45 | |
*** panda_ has joined #tripleo | 03:45 | |
*** zaneb has quit IRC | 03:46 | |
*** coolsvap|away is now known as coolsvap | 03:49 | |
*** yuanying has joined #tripleo | 04:06 | |
*** yuanying_ has quit IRC | 04:06 | |
*** yuanying has quit IRC | 04:11 | |
*** panda_ has quit IRC | 04:14 | |
*** panda_ has joined #tripleo | 04:14 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: Add support for OpenRC to dib-init-system https://review.openstack.org/273772 | 04:17 |
*** yuanying has joined #tripleo | 04:17 | |
*** panda_ has quit IRC | 04:19 | |
*** panda_ has joined #tripleo | 04:20 | |
*** panda_ has quit IRC | 04:28 | |
*** panda_ has joined #tripleo | 04:29 | |
openstackgerrit | Merged openstack/diskimage-builder: Fix debian-minimal image building https://review.openstack.org/273544 | 04:32 |
*** panda_ has quit IRC | 04:34 | |
*** panda_ has joined #tripleo | 04:34 | |
openstackgerrit | Merged openstack/diskimage-builder: Don't use wc -l for the umount check https://review.openstack.org/273386 | 04:35 |
*** tiswanso has quit IRC | 04:36 | |
*** masco has joined #tripleo | 04:37 | |
*** panda_ has quit IRC | 04:41 | |
*** panda_ has joined #tripleo | 04:41 | |
*** lazy_prince has joined #tripleo | 04:47 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder: Only match #!/bin/bash in scripts https://review.openstack.org/273879 | 04:53 |
*** panda_ has quit IRC | 04:54 | |
*** panda_ has joined #tripleo | 04:55 | |
*** killer_prince has joined #tripleo | 05:09 | |
*** anande has joined #tripleo | 05:09 | |
*** anande has quit IRC | 05:10 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: fix gentoo hardened support https://review.openstack.org/273795 | 05:10 |
*** chlong has quit IRC | 05:12 | |
*** lazy_prince has quit IRC | 05:12 | |
*** coolsvap is now known as coolsvap|away | 05:15 | |
*** panda_ has quit IRC | 05:19 | |
*** panda_ has joined #tripleo | 05:20 | |
*** jaosorior has joined #tripleo | 05:20 | |
*** panda_ has quit IRC | 05:20 | |
*** panda_ has joined #tripleo | 05:21 | |
*** lazy_prince has joined #tripleo | 05:25 | |
*** killer_prince has quit IRC | 05:28 | |
*** chlong has joined #tripleo | 05:32 | |
*** Marga_ has quit IRC | 05:58 | |
*** shivrao has joined #tripleo | 06:03 | |
*** chlong has quit IRC | 06:28 | |
*** rcernin has quit IRC | 06:29 | |
jaosorior | marios: Are you around? | 06:29 |
*** coolsvap|away is now known as coolsvap | 06:31 | |
*** xinwu has joined #tripleo | 06:32 | |
*** yamahata has joined #tripleo | 06:33 | |
*** chlong has joined #tripleo | 06:39 | |
*** jaosorior has quit IRC | 06:40 | |
*** dmacpher has quit IRC | 06:41 | |
*** Marga_ has joined #tripleo | 06:43 | |
*** jaosorior has joined #tripleo | 06:47 | |
*** jprovazn has joined #tripleo | 06:52 | |
*** jprovazn has quit IRC | 06:52 | |
*** jprovazn has joined #tripleo | 06:52 | |
marios | jaosorior: o/ morning | 06:53 |
jaosorior | marios: Hey dude, how's it going? | 06:56 |
jaosorior | marios: Hey, I'm checking out the pingtest, and noticed some functions such as | 06:57 |
jaosorior | "tripleo wait_for" and "tripleo user-config" | 06:57 |
jaosorior | marios: Such as here https://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L552 and here https://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L538 | 06:58 |
jaosorior | marios: Where do these come from? | 06:58 |
marios | jaosorior: they are some of the original tripleo-incubator scripts | 07:00 |
marios | sec | 07:00 |
*** panda_ has quit IRC | 07:01 | |
*** shivrao has quit IRC | 07:01 | |
*** panda_ has joined #tripleo | 07:01 | |
marios | jaosorior: https://github.com/openstack/tripleo-incubator/tree/master/scripts on an undercloud env, you'll find them in /usr/libexec/openstack-tripleo/ | 07:01 |
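The incubator `wait_for` helper that marios points to is a shell script; its exact flags are not reproduced here. As an illustration only, the general retry-until-success pattern such a helper provides can be sketched in Python. The function name and the `timeout`/`delay` parameters below are hypothetical and are not the script's real interface.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float, delay: float) -> bool:
    """Call `check` every `delay` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds would elapse
    before the next attempt. Hypothetical Python analogue of a shell
    wait_for helper, not the incubator script itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            return False
        time.sleep(delay)
```

In the pingtest context, `check` would be something like "is the stack CREATE_COMPLETE yet" or "does the VM answer a ping", retried with a fixed delay rather than failing on the first attempt.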
*** aufi has joined #tripleo | 07:02 | |
jaosorior | marios: Thanks dude | 07:06 |
marios | np man | 07:07 |
*** devvesa has joined #tripleo | 07:12 | |
*** rcernin has joined #tripleo | 07:25 | |
*** dshulyak has joined #tripleo | 07:26 | |
jaosorior | marios: hey dude, have you been looking into the CI failures regarding the pingtest? | 07:26 |
marios | jaosorior: there was a failure yesterday but it was about overcloud heat dying like 503 service unavailable | 07:27 |
marios | jaosorior: i haven't looked today yet | 07:27 |
marios | jaosorior: err, yes during the pingtest | 07:27 |
jaosorior | marios: Oh, I see. I was trying to check out the logs, but damn they don't really have much :/ | 07:28 |
marios | jaosorior: spoke briefly with derekh and apparently there was issue with insufficient ram on the overcloud nodes | 07:28 |
marios | jaosorior: sec | 07:28 |
marios | jaosorior: this is what i mean http://logs.openstack.org/13/260413/9/check-tripleo/gate-tripleo-ci-f22-ha/d5fff90/console.html#_2016-01-27_18_55_34_646 | 07:28 |
jaosorior | marios: I see. Started checking it out today, and only thing I saw was something about the compute not being able to assign a vcpu to a domain | 07:29 |
marios | jaosorior: (give it a sec, should load to exact time) | 07:29 |
jaosorior | marios: The error I saw in another commit was that it was actually able to create the tenant, but the ping itself failed | 07:30 |
jaosorior | marios: http://logs.openstack.org/51/273751/1/check-tripleo/gate-tripleo-ci-f22-ceph/91011a9/console.html#_2016-01-28_22_32_12_699 | 07:30 |
*** chlong has quit IRC | 07:32 | |
jaosorior | marios: This is another commit and it shows the same error: http://logs.openstack.org/52/273752/3/check-tripleo/gate-tripleo-ci-f22-ceph/b3ea7ec/console.html#_2016-01-29_03_18_51_582 | 07:33 |
jaosorior | So yeah, apparently we've been looking at different things | 07:33 |
marios | jaosorior: reading sec (sry doing sthing else gimme few) | 07:36 |
*** hjensas has joined #tripleo | 07:36 | |
marios | jaosorior: yeah that one is different, there the vm actually couldn't be pinged | 07:37 |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: WIP: Move hook generation in to python https://review.openstack.org/271139 | 07:39 |
jaosorior | marios: Yeah, those are the ones I've been looking at | 07:39 |
jaosorior | marios: Maybe the lack of memory might be somewhat addressed by this CR https://review.openstack.org/#/c/273752/ | 07:39 |
marios | jaosorior: so here's the thing. the pingtest initially used the fedora-user image, instead of the cirros image. Mainly because I had intermittent (and seemingly random) issues with pinging the cirros vms (we were also then building fedora-user by default so it made sense at the time) | 07:45 |
marios | jaosorior: i don't know if this is related in this case, but if we continue to see this we may need to revisit | 07:47 |
jaosorior | marios: Yeah, maybe it would make sense to look for another small footprint alternative to cirros | 07:47 |
jaosorior | marios: Then again, I'm not sure if the logs in nova that say that it couldn't allocate a vcpu to the domain is related. I am yet to reproduce it in my environment. | 07:48 |
*** mbound has joined #tripleo | 07:51 | |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Move hook generation in to python https://review.openstack.org/271139 | 07:54 |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Move hook generation in to python https://review.openstack.org/271139 | 07:55 |
*** fgimenez has joined #tripleo | 08:04 | |
*** fgimenez has quit IRC | 08:04 | |
*** fgimenez has joined #tripleo | 08:04 | |
*** mbound has quit IRC | 08:08 | |
*** dshulyak has left #tripleo | 08:19 | |
*** dshulyak has joined #tripleo | 08:21 | |
*** athomas has joined #tripleo | 08:23 | |
*** Marga_ has quit IRC | 08:24 | |
*** pblaho has joined #tripleo | 08:26 | |
*** rasca has quit IRC | 08:28 | |
*** rasca has joined #tripleo | 08:29 | |
*** panda_ has quit IRC | 08:30 | |
jaosorior | marios: Well, bnemec added a patch to debug the ping errors. Here it is if you wanna check it out: https://review.openstack.org/273701 I'm following it to see if I can get any info out of it | 08:30 |
*** panda_ has joined #tripleo | 08:30 | |
marios | jaosorior: thanks (heh and of course the pingtest passed on the ha job this time :/ http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-ha/3edb145/console.html#_2016-01-28_23_19_12_108 ) | 08:32 |
jaosorior | marios: Hahaha yeah dude, it's a very sneaky thing to debug | 08:34 |
*** masco has quit IRC | 08:35 | |
*** jcoufal has joined #tripleo | 08:37 | |
*** bvandenh has joined #tripleo | 08:46 | |
*** mbound has joined #tripleo | 08:50 | |
*** shardy has joined #tripleo | 08:55 | |
*** derekh has joined #tripleo | 08:56 | |
*** bvandenh has quit IRC | 08:56 | |
*** shardy has quit IRC | 09:01 | |
*** shardy has joined #tripleo | 09:02 | |
*** panda_ has quit IRC | 09:04 | |
*** panda_ has joined #tripleo | 09:05 | |
*** jcoufal has quit IRC | 09:07 | |
*** jcoufal has joined #tripleo | 09:08 | |
*** bvandenh has joined #tripleo | 09:09 | |
*** gfidente has joined #tripleo | 09:13 | |
derekh | shardy: that change to set HeatWorkers to 1, only changes the API workers, not the engine | 09:14 |
*** coolsvap is now known as coolsvap|away | 09:15 | |
derekh | shardy: it doesn't look like the puppet-heat module exposes num_engine_workers | 09:15 |
shardy | derekh: hrm | 09:15 |
shardy | doh | 09:16 |
shardy | I guess we'll have to fix that then - although don't the puppet modules allow for setting arbitrary config values? | 09:17 |
* shardy tries via extraconfig | 09:17 | |
*** nico_auv has joined #tripleo | 09:18 | |
shardy | https://review.openstack.org/#/c/269071/ | 09:19 |
shardy | Yeah I think we can do it via controllerExtraConfig instead | 09:19 |
shardy | testing now | 09:19 |
derekh | shardy: ack, thanks, if this doesn't work, I'll bump up the overcloud ram over the weekend, | 09:19 |
* derekh will also look into setting some other workers to 1 | 09:20 | |
shardy | kk, sorry I threw that patch up yesterday assuming it'd work, should've tested it locally really | 09:20 |
shardy | we could completely disable the cfn and cloudwatch APIs on the overcloud as well, seeing as we're not testing them | 09:21 |
*** jaosorior has quit IRC | 09:22 | |
*** jaosorior has joined #tripleo | 09:22 | |
*** paramite has joined #tripleo | 09:27 | |
*** jistr has joined #tripleo | 09:30 | |
*** panda_ has quit IRC | 09:34 | |
*** panda_ has joined #tripleo | 09:35 | |
*** links has joined #tripleo | 09:38 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Bump the pacemaker service op_params to 200s for start and stop https://review.openstack.org/272026 | 09:39 |
*** paramite has quit IRC | 09:40 | |
*** paramite has joined #tripleo | 09:42 | |
*** hjensas has quit IRC | 09:52 | |
*** jcoufal_ has joined #tripleo | 09:53 | |
*** paramite has quit IRC | 09:53 | |
*** jcoufal has quit IRC | 09:54 | |
*** paramite has joined #tripleo | 09:55 | |
*** regebro has quit IRC | 09:55 | |
shardy | derekh: Hey, failing to get the right hiera key structure here: | 09:56 |
shardy | https://github.com/openstack/puppet-heat/blob/master/manifests/config.pp | 09:56 |
shardy | any idea what key will be needed to feed in a value for DEFAULT/num_engine_workers? | 09:57 |
shardy | we include the manifest from https://review.openstack.org/#/c/269071/3/puppet/manifests/overcloud_controller.pp | 09:57 |
*** jcoufal has joined #tripleo | 09:57 | |
derekh | shardy: off the top of my head I don't know, will play with it in a few minutes and see if I can find out | 09:58 |
shardy | derekh: thanks, I've tried a few things but evidently doing it wrong | 09:59 |
shardy | the confusing thing is the values take a hash containing a value key | 09:59 |
derekh | shardy: ack | 09:59 |
*** jcoufal_ has quit IRC | 10:00 | |
jaosorior | shardy: I think you can actually set arbitrary config values. I did that for keystone | 10:07 |
*** jaosorior has quit IRC | 10:08 | |
*** jaosorior has joined #tripleo | 10:08 | |
jaosorior | shardy: This is the way you would set it https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/hieradata/controller.yaml#L48 | 10:08 |
shardy | jaosorior: aha! | 10:09 |
shardy | thanks | 10:09 |
shardy | I was trying to do heat::config::DEFAULT/num_engine_workers | 10:10 |
jaosorior | shardy: You could use heat_config { 'some name': 'DEFAULT/num_engine_workers' => {value => 'something something'}} | 10:11 |
jaosorior | but that's with the provider object | 10:11 |
jaosorior | better add it to hieradata like the example I put up there. Also, I've been told it's a better way to do it (less clutter in the puppet manifests) | 10:11 |
shardy | jaosorior: ack, yeah I need to do it via hiera as I'm using the controllerExtraConfig parameter | 10:12 |
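The "weird syntax" discussed here is that each option is keyed as `SECTION/option` and maps to a hash containing a `value` key. A minimal sketch that builds that hieradata shape in Python, assuming that puppet-heat's `heat::config` class takes a `heat_config` hash (so the hiera key would be `heat::config::heat_config`); verify the key name against `manifests/config.pp` before relying on it. The helper name is hypothetical.

```python
import json


def heat_config_hiera(options: dict) -> dict:
    """Build hieradata for puppet-heat's heat::config class.

    Assumption: heat::config accepts a `heat_config` hash whose keys are
    'SECTION/option' strings and whose values are {'value': ...} hashes,
    mirroring the keystone example in controller.yaml.
    """
    return {
        "heat::config::heat_config": {
            key: {"value": value} for key, value in options.items()
        }
    }


hiera = heat_config_hiera({"DEFAULT/num_engine_workers": 1})
# JSON output like this can generally be read as YAML, e.g. under a
# controllerExtraConfig entry in an environment file (assumed usage).
print(json.dumps(hiera, indent=2))
```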
*** aufi_ has joined #tripleo | 10:12 | |
jaosorior | shardy: Makes sense | 10:13 |
*** aufi has quit IRC | 10:14 | |
jaosorior | marios: the patch adding more output to pingtest in case there are errors finally failed like we wanted it to. Here you go if you wanna check it out too http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-nonha/7da6bd4/console.html#_2016-01-29_09_36_50_341 | 10:15 |
marios | thanks jaosorior | 10:17 |
*** panda_ has quit IRC | 10:18 | |
*** jcoufal has quit IRC | 10:18 | |
*** panda_ has joined #tripleo | 10:19 | |
jaosorior | marios: Seems that the neutron openvswitch agent crashes at some point. Failing with "Cannot allocate memory" | 10:22 |
jaosorior | marios: And if you see some of the output from neutron, it doesn't really crash but it fails to get some resources, giving outputs such as router not found, and the same for the port when trying to get it | 10:22 |
marios | jaosorior: interesting... are you getting this from the controller logs at http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-nonha/7da6bd4/logs/ | 10:24 |
*** athomas has quit IRC | 10:24 | |
jaosorior | marios: Yes | 10:24 |
marios | jaosorior: ok gonna have a closer look in a sec thx | 10:24 |
*** tremble has joined #tripleo | 10:25 | |
*** jcoufal has joined #tripleo | 10:25 | |
marios | jaosorior: fwiw, i remember at the time, that rebooting the vm helped (made it "pingable" again) but i never tracked down the actual cause | 10:25 |
jaosorior | marios: The interesting bits are in /var/log/neutron/openvswitch-agent.log and /var/log/neutron/server.log | 10:26 |
*** mcornea has joined #tripleo | 10:26 | |
marios | jaosorior: wow 51 - - - - -] OSError: [Errno 12] Cannot allocate memory wtf | 10:27 |
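To check whether a given CI run hit the same failure, the neutron logs jaosorior names can be scanned for `OSError` lines carrying errno 12 (ENOMEM). A small sketch, assuming plain-text log files; the function name is hypothetical and the marker is built from the `errno` module instead of a hard-coded number.

```python
import errno
from pathlib import Path
from typing import Iterable, List, Tuple

# Renders as "[Errno 12]" on Linux, matching the traceback text quoted above.
ENOMEM_MARKER = f"[Errno {errno.ENOMEM}]"


def find_enomem(paths: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every ENOMEM OSError line."""
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # log sets differ between jobs; skip missing files
        with p.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if "OSError" in line and ENOMEM_MARKER in line:
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

Called with `["/var/log/neutron/openvswitch-agent.log", "/var/log/neutron/server.log"]` on a controller (or on the downloaded copies of those logs), an empty result suggests the failure has a different cause, which matches what slagle sees later in his job.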
jaosorior | marios: fun fun fun :D | 10:29 |
jaosorior | shardy: The issue that you're trying to fix with the num_engine_workers is also related to the running-out-of-memory errors, right? | 10:32 |
shardy | yup | 10:32 |
jaosorior | shardy: Will that help with the memory problems in the overcloud nodes? or is that just for the undercloud? | 10:33 |
shardy | http://paste.openstack.org/show/485383/ | 10:34 |
shardy | that brings the overcloud memory usage down somewhat | 10:35 |
jaosorior | shardy: That looks about right | 10:35 |
shardy | there's more we can do I think | 10:36 |
openstackgerrit | Steven Hardy proposed openstack-infra/tripleo-ci: Override HeatWorkers parameter for deployed overcloud https://review.openstack.org/273431 | 10:36 |
shardy | we're still running multiple nova conductors for example | 10:36 |
* shardy looks for more things to turn off | 10:36 | |
jaosorior | is that patch directed to tripleo-ci because it's meant for testing? Maybe it should be considered to be added to t-h-t | 10:37 |
openstackgerrit | Steven Hardy proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud https://review.openstack.org/273431 | 10:38 |
shardy | jaosorior: It's only directed at CI because we're trying to use not-enough RAM | 10:38 |
shardy | the defaults make much less sense for real deployments | 10:38 |
shardy | we could document the environment for developers tho | 10:38 |
jaosorior | that makes sense | 10:38 |
shardy | the t-h-t defaults make more sense for production/real usage as they are IMO | 10:38 |
derekh | shardy: don't have a running overcloud at the moment to test on, but does this work for you ? http://paste.openstack.org/show/485385/ | 10:48 |
shardy | derekh: thanks - jaosorior actually pointed me at a keystone example and I've got it working now: | 10:49 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/hieradata/controller.yaml#L48 | 10:49 |
shardy | https://review.openstack.org/#/c/273431/5/toci_instack.sh | 10:49 |
derekh | shardy: ok, cool | 10:49 |
shardy | it's a bit of a weird syntax | 10:50 |
derekh | yup | 10:50 |
shardy | Several other services also don't scale down not-api processes via *Workers | 10:50 |
shardy | so I'll push another update shortly reducing nova/neutron backend workers | 10:50 |
shardy | In total it reduces my memory usage by nearly a gig! | 10:51 |
jaosorior | shardy: http://weknowmemes.com/wp-content/uploads/2012/11/mexcellent.jpg | 10:52 |
marios | shardy: \o/ | 10:53 |
shardy | http://giphy.com/gifs/vMnuZGHJfFSTe | 10:53 |
*** athomas has joined #tripleo | 10:54 | |
*** tosky has joined #tripleo | 10:56 | |
*** olap has quit IRC | 11:01 | |
openstackgerrit | Merged openstack/python-tripleoclient: We do not need to pass a NeutronControlPlaneID https://review.openstack.org/265320 | 11:03 |
jistr | shardy: hi, i'm trying to do kilo->liberty upgrades, and apart from other errors i've been able to work around for the time being (e.g. worked around https://bugs.launchpad.net/heat/+bug/1538551 by reverting the NtpServer patch linked there, and hit similar issues with other params too), i'm seeing this happen http://fpaste.org/315822/53997844/ | 11:05 |
openstack | Launchpad bug 1538551 in heat "Unable to update a parameter from string to comma_delimited_list" [Undecided,New] | 11:05 |
*** jcoufal has quit IRC | 11:05 | |
jistr | shardy: do you think Heat is trying to re-deploy all servers from scratch, because we changed properties of the OS::Nova::Server resources? | 11:06 |
*** sbalukoff has quit IRC | 11:06 | |
shardy | jistr: probably - what properties have changed? | 11:06 |
shardy | the one which springs to mind is the user_data | 11:07 |
jistr | shardy: e.g. when i look at controller in stable/liberty, there's a bunch of them -- https://github.com/openstack/tripleo-heat-templates/blame/stable/liberty/puppet/controller.yaml#L638 | 11:07 |
shardy | that will, unfortunately, replace the resource atm | 11:07 |
jistr | shardy: yeah, userdata, software_config_transport, metadata | 11:07 |
jistr | shardy: and for nova computes, the default hostname scheme has changed | 11:08 |
jistr | (as visible from the fpaste) | 11:08 |
shardy | gah, it'll be the user_data which is doing it I think | 11:08 |
gfidente | jistr, shardy I think the properties remained the same | 11:08 |
jistr | shardy: the rest of the changes wouldn't result in redeploy? (name, metadata, software_config_transport) | 11:09 |
gfidente | except for metadata? | 11:09 |
gfidente | not user_data | 11:09 |
jistr | gfidente: no, user_data changed too | 11:09 |
jistr | we had it before | 11:09 |
jistr | but it's now composed from 2 variables | 11:09 |
shardy | jistr: I was thinking of adding a heat property which allows us to ignore any changes to user_data | 11:09 |
shardy | that would help, but we probably can't backport it | 11:09 |
shardy | http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server-props-opt | 11:09 |
shardy | that shows which properties can be updated without replacement | 11:10 |
shardy | vs "Updates cause replacement" | 11:10 |
gfidente | jistr, right went from type: OS::TripleO::NodeUserData to type: OS::Heat::MultipartMime ? | 11:10 |
jistr | shardy: what about "locking down" a resource completely instead? wouldn't it be a better suited option for us? | 11:10 |
jistr | i don't think we ever want Heat to replace a server just like that | 11:10 |
shardy | jistr: That is another option, but then the update would just fail | 11:10 |
shardy | ramishra has a patch up which does exactly that | 11:11 |
shardy | but in this case, I think you'd want any new nodes to get the new user_data, and for existing nodes to ignore the change | 11:11 |
jistr | shardy: well i was thinking something in terms of "update the heat model but don't respin the server", instead of UPDATE_FAILED | 11:11 |
jistr | shardy: yea | 11:11 |
shardy | jistr: but then the heat model is corrupted, in that it doesn't reflect reality anymore | 11:11 |
shardy | I guess it's possible tho | 11:11 |
shardy | https://review.openstack.org/#/c/253074/19/doc/source/template_guide/environment.rst | 11:12 |
shardy | that's the new feature ramishra_ has been working on - I need to test it & hopefully we can land it soon | 11:12 |
shardy | In this case tho, we really want a way to say all OS::Nova::Server resources anywhere in the stack are restricted_actions: replace | 11:13 |
shardy | we can probably specify it for each ResourceGroup tho via wildcards | 11:13 |
gfidente | feels like what we did with the NetworkDeployment actions | 11:14 |
shardy | Yeah, anywhere we use SoftwareDeployment this is much easier | 11:14 |
jistr | shardy: yea that sounds like what we need. So when the action is restricted this way, will the update fail or will it just ignore the action? | 11:14 |
shardy | jistr: it will fail | 11:15 |
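The behaviour described here, where a restricted action makes the update fail rather than being skipped, plus shardy's idea of matching ResourceGroup members via wildcards, can be sketched as glob matching that raises on a hit. This is not Heat's implementation: the resource names, the pattern syntax (`fnmatch` globs) and the mapping shape are all hypothetical.

```python
import fnmatch
from typing import Dict, List


class RestrictedActionError(Exception):
    """Raised when an action hits a restricted resource (the update fails)."""


def check_action(resource_name: str, action: str,
                 restrictions: Dict[str, List[str]]) -> None:
    """Raise if `action` is restricted for `resource_name`.

    `restrictions` maps glob patterns to restricted action names, a
    hypothetical stand-in for restricted_actions entries in an environment.
    """
    for pattern, actions in restrictions.items():
        if fnmatch.fnmatchcase(resource_name, pattern) and action in actions:
            raise RestrictedActionError(
                f"{action} of {resource_name} is restricted by {pattern!r}")


# Hypothetical names: every controller and compute server is protected
# from replacement, but in-place updates are still allowed.
RESTRICTIONS = {"Controller-*": ["replace"], "Compute-*": ["replace"]}
```

This shows why jistr concludes it is not enough by itself: a replacement is blocked, but the stack update still stops with an error instead of proceeding.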
jistr | ok so it's not enough to solve our problem by itself | 11:15 |
shardy | no, but it's a potential step towards a solution | 11:15 |
jistr | yes | 11:15 |
shardy | for now, we have to figure out how to not change the causes-replacement properties | 11:15 |
shardy | If it's only the user_data, I can potentially post a patch which adds a config option to allow ignoring user_data changes on update | 11:16 |
*** athomas has quit IRC | 11:16 | |
gfidente | shardy, jistr so from what I can see | 11:16 |
gfidente | we used to define user_data as os::tripleo::nodeuserdata | 11:17 |
gfidente | and now we make it instead a multipart which in one of its parts is os::tripleo::nodeuserdata | 11:17 |
shardy | Yeah, now we pass it a multi-part mime archive | 11:17 |
shardy | it was to wire in the heat-admin user | 11:17 |
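Why this counts as a `user_data` change even though the original NodeUserData part is still present: once it is wrapped in a multipart archive, the resulting string differs, so a simple property comparison sees new user data. The sketch below uses Python's standard `email` package to build a comparable multipart/mixed archive; it approximates the kind of output `OS::Heat::MultipartMime` produces and is not its exact format. Both script bodies are hypothetical placeholders.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical first-boot script standing in for OS::TripleO::NodeUserData.
node_user_data = "#!/bin/bash\necho 'node user data'\n"

# Hypothetical second part standing in for the heat-admin user setup.
admin_user_data = "#!/bin/bash\necho 'create heat-admin'\n"


def multipart_user_data(*scripts: str) -> str:
    """Wrap shell scripts in a multipart/mixed MIME archive."""
    msg = MIMEMultipart()
    for script in scripts:
        # ASCII text is attached with 7bit encoding, i.e. unchanged.
        msg.attach(MIMEText(script, "x-shellscript"))
    return msg.as_string()


combined = multipart_user_data(node_user_data, admin_user_data)
# The old script is embedded verbatim, but the property value differs.
print(combined != node_user_data)
```

This is the situation gfidente describes: inspecting the archive to see whether the "real" content changed would be possible, but comparing the property as a whole is what triggers replacement.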
gfidente | but I don't think it's worth it or useful to make heat do any introspection there to see if it actually changed or not | 11:17 |
shardy | Ok, give me a few mins, let me see if I can do a heat patch quickly which makes the on-update behavior configurable | 11:18 |
shardy | we probably can't backport a new property, but we potentially can backport a new config option, defaulted to the current behavior | 11:19 |
jistr | gfidente: +1 | 11:19 |
gfidente | jistr, hey why I got the +1? | 11:19 |
gfidente | the not have introspection? | 11:20 |
jistr | re "not useful to make heat do any introspection" | 11:20 |
*** jcoufal has joined #tripleo | 11:20 | |
gfidente | jistr, man you're diving alone | 11:20 |
*** akrivoka has joined #tripleo | 11:20 | |
*** athomas has joined #tripleo | 11:20 | |
gfidente | I was just curious about this because is similar to what happened with the VIPs :P | 11:20 |
jistr | for OS::Nova::Server specifically, what i would see as useful is simply ignore any updates... | 11:21 |
jistr | maybe something like "ignore_actions" rather than "restrict_actions" | 11:21 |
gfidente | yeah the NetworkDeployment thing | 11:21 |
shardy | jistr: Let's wire in doing that for user_data now, and consider the more general case after | 11:21 |
shardy | http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server-prop-flavor_update_policy | 11:22 |
gfidente | but as shardy said that is only for softwaredeployment | 11:22 |
shardy | we could just have update_policy properties for each causes-replacement property, such as flavor/image | 11:22 |
shardy | and add an option "IGNORE" | 11:22 |
*** sbalukoff has joined #tripleo | 11:22 | |
gfidente | that would ignore all properties | 11:22 |
gfidente | or rather a list of properties to be ignored? | 11:23 |
jistr | shardy: yes, "update_policy: ignore" sounds to me like a good way forward | 11:23 |
jistr | gfidente: both would probably work for us... unless there's an update we can perform on OS::Nova::Server without re-spinning | 11:23 |
gfidente | right, because these would be for ::server only? | 11:24 |
jistr | gfidente: yeah i think so | 11:25 |
jistr | perhaps if one was able to specify a list of properties to ignore, it would be a more generally useful construct in heat templates | 11:25 |
jistr | but to solve our problem with replacing servers, ignoring all would work i think | 11:25 |
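The per-property `update_policy` idea with an extra `IGNORE` option can be sketched as: diff the old and new properties, look up each changed property's policy, drop ignored ones, and let the strongest remaining action win. The policy table below is illustrative. Only the `user_data` entry reflects what shardy says above (it currently forces replacement); the authoritative per-property behaviour is in the Heat template guide he links. Unknown properties conservatively default to replacement. All names are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical per-property policies: "REPLACE" rebuilds the server,
# "UPDATE" changes it in place, "IGNORE" (the proposed option) leaves
# the existing server untouched.
DEFAULT_POLICIES = {"user_data": "REPLACE", "name": "UPDATE", "metadata": "UPDATE"}


def plan_server_update(old: Dict[str, Any], new: Dict[str, Any],
                       overrides: Optional[Dict[str, str]] = None) -> str:
    """Return 'none', 'update' or 'replace' for a server property diff."""
    policies = {**DEFAULT_POLICIES, **(overrides or {})}
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    # Unlisted properties default to REPLACE; ignored ones drop out.
    actions = {policies.get(k, "REPLACE") for k in changed} - {"IGNORE"}
    if "REPLACE" in actions:
        return "replace"
    if "UPDATE" in actions:
        return "update"
    return "none"
```

For example, a new multipart `user_data` alone yields `"replace"`, while the same diff with `overrides={"user_data": "IGNORE"}` yields `"none"`. Accepting a list of properties to ignore (gfidente's variant) is just a larger `overrides` mapping.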
*** shardy has quit IRC | 11:27 | |
*** shardy has joined #tripleo | 11:28 | |
shardy | https://bugs.launchpad.net/heat/+bug/1539541 | 11:29 |
openstack | Launchpad bug 1539541 in heat "Can't ignore updates to OS::Nova::Server" [Undecided,New] - Assigned to Steven Hardy (shardy) | 11:29 |
shardy | jistr, gfidente ^^ | 11:29 |
jistr | shardy: not sure where your connection dropped, here's a recent chat history http://fpaste.org/316160/40669351/ | 11:29 |
jistr | shardy: thanks | 11:29 |
shardy | jistr: thanks | 11:29 |
*** xinwu has quit IRC | 11:30 | |
*** bvandenh has quit IRC | 11:30 | |
shardy | jistr, gfidente: I'm planning to solve just for the user_data now, then we can address the more general problem in subsequent patches | 11:31 |
shardy | can you confirm it's the user_data change which is forcing replacement in your case? | 11:31 |
shardy | the more general solution will be good, but it's unlikely we can backport such a change to stable branches | 11:31 |
jistr | shardy: ok will try it again with reverted patch which changes the user data. Had to rebuild my env today so i don't have everything ready yet. | 11:33 |
*** fgimenez has quit IRC | 11:45 | |
*** bvandenh has joined #tripleo | 11:45 | |
*** fgimenez has joined #tripleo | 11:45 | |
*** jcoufal has quit IRC | 11:46 | |
*** athomas has quit IRC | 11:46 | |
*** derekh has quit IRC | 11:47 | |
*** athomas has joined #tripleo | 12:03 | |
*** jcoufal has joined #tripleo | 12:03 | |
*** bvandenh has quit IRC | 12:06 | |
jaosorior | shardy: Seems to me the commit you have for reducing the used memory failed also, the nonha case has just failed due to the pingtest, and I think it's the same issue I was discussing with marios. That openvswitch fails because of lack of memory :/ | 12:10 |
slagle | are we sure it's related to memory at this point? | 12:10 |
slagle | i uploaded a patch to add 4gb of swap, and it failed as well | 12:10 |
slagle | on the pingtest | 12:11 |
jaosorior | slagle: the explicit error was that it couldn't allocate memory | 12:11 |
shardy | My local controller is only using 2.7G of ram with that patch | 12:11 |
jaosorior | slagle: Something like this OSError: [Errno 12] Cannot allocate memory wtf | 12:11 |
jaosorior | except for the wtf | 12:11 |
jaosorior | that was from me :P | 12:11 |
slagle | jaosorior: that's in the openvswitch log on the controller? | 12:12 |
jaosorior | slagle: yeah | 12:12 |
shardy | (2.7G not doing anything admittedly) | 12:12 |
slagle | ok, i don't see that in the controller logs on the failed job i'm looking at | 12:13 |
jaosorior | slagle: Funky, I see that in your patch that wasn't the error. So then there's another problem there | 12:14 |
jaosorior | slagle: The patches I was looking at with marios had the "OSError: [Errno 12] Cannot allocate memory" issue :/ | 12:15 |
*** bvandenh has joined #tripleo | 12:17 | |
*** rcernin has quit IRC | 12:18 | |
*** rcernin has joined #tripleo | 12:18 | |
*** weshay_xchat has joined #tripleo | 12:38 | |
*** weshay_xchat is now known as weshay | 12:42 | |
*** masco has joined #tripleo | 12:45 | |
*** ukalifon has joined #tripleo | 12:46 | |
*** julim has joined #tripleo | 12:47 | |
*** lblanchard has joined #tripleo | 12:50 | |
*** ukalifon has quit IRC | 12:50 | |
*** ukalifon1 has joined #tripleo | 12:50 | |
*** ukalifon1 has quit IRC | 13:01 | |
*** dprince has joined #tripleo | 13:13 | |
*** ukalifon1 has joined #tripleo | 13:17 | |
*** trown|outttypeww is now known as trown | 13:18 | |
*** jayg|g0n3 is now known as jayg | 13:22 | |
*** athomas has quit IRC | 13:24 | |
*** electrofelix has joined #tripleo | 13:25 | |
*** fgimenez has quit IRC | 13:28 | |
*** ukalifon1 has quit IRC | 13:28 | |
*** fgimenez has joined #tripleo | 13:32 | |
*** jtomasek_ has joined #tripleo | 13:34 | |
*** jtomasek has quit IRC | 13:34 | |
*** jcoufal has quit IRC | 13:40 | |
*** jcoufal has joined #tripleo | 13:47 | |
ayoung | for jenkinsnumber in range(1, 8): shouldn't this be a random number? Otherwise we always hammer the same node: https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/tripleo-jobs.py#L49 | 13:51 |
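As jaosorior points out a little later, that loop fetches job data from the Jenkins masters rather than choosing one to run jobs on, so every host is visited regardless. If the concern is always querying the same host first, a minimal sketch of randomising the visiting order, assuming host names of the form `jenkinsNN.openstack.org` (the pattern of `jenkins03.openstack.org` seen elsewhere in this log); `jenkins_hosts` is hypothetical:

```python
import random


def jenkins_hosts(count=7, rng=None):
    """Return jenkins01..jenkinsNN host names in a random order.

    range(1, 8) in the original loop covers hosts 1..7, hence count=7.
    Every host is still visited; only the order changes.
    """
    rng = rng or random.Random()
    hosts = ["jenkins%02d.openstack.org" % n for n in range(1, count + 1)]
    rng.shuffle(hosts)
    return hosts


if __name__ == "__main__":
    for host in jenkins_hosts():
        print(host)
```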
*** rpothier has joined #tripleo | 13:51 | |
*** pradk has quit IRC | 13:51 | |
jaosorior | shardy: Hey dude, have you figured out why the nonha job is failing for the CR you put up? The one for minimising the memory usage | 14:00 |
*** lazy_prince has quit IRC | 14:02 | |
slagle | i've been running the pingtest in a loop locally, and so far i've gotten it to fail twice | 14:04 |
slagle | the first time, i forgot it cleans up on failure though, so i disabled that and got it to fail again | 14:05 |
slagle | going to look into it in a bit (got pulled into a meeting) | 14:05 |
*** egafford has joined #tripleo | 14:06 | |
*** akuznetsov has joined #tripleo | 14:06 | |
jaosorior | slagle: any hints on where I should be looking? Seems that the places where I was seeing the memory problems are no longer crashing. But there is something else going on that I haven't figured out. | 14:06 |
*** akuznetsov has quit IRC | 14:06 | |
*** akuznetsov has joined #tripleo | 14:07 | |
shardy | jaosorior: sorry, I'm not looking into that atm, I'm working on heat patches to fix the update issue discovered by jistr & gfidente | 14:07 |
*** akuznetsov has quit IRC | 14:07 | |
slagle | jaosorior: none yet. the instance is up, the floating ip is associated according to nova | 14:07 |
slagle | jaosorior: so i'll need to dig into some internals as to why it's not pingable | 14:07 |
*** akuznetsov has joined #tripleo | 14:07 | |
*** jcoufal has quit IRC | 14:08 | |
*** tzumainn has joined #tripleo | 14:09 | |
slagle | the ip is set in the router ns on the controller, but i can't ping it from there either | 14:09 |
*** akuznetsov has quit IRC | 14:10 | |
*** tiswanso has joined #tripleo | 14:11 | |
*** tiswanso has quit IRC | 14:11 | |
*** rlandy has joined #tripleo | 14:12 | |
*** tiswanso has joined #tripleo | 14:12 | |
jaosorior | well, the logs I've seen haven't been too helpful | 14:14 |
jaosorior | slagle: By the way, bnemec put up a patch for debugging the issue. Might be useful to have it permanently as it only outputs more info if it fails https://review.openstack.org/#/c/273701/ | 14:15 |
*** thrash|pto is now known as thrash | 14:16 | |
*** panda_ has quit IRC | 14:18 | |
*** jcoufal has joined #tripleo | 14:18 | |
*** panda_ has joined #tripleo | 14:19 | |
jaosorior | ayoung: that function you posted seems to fetch data for the jenkins jobs. what's up with that? | 14:19 |
ayoung | jaosorior, I'm learning jenkins. | 14:20 |
ayoung | jaosorior, and I was wondering why we always select the lowest number jenkins server | 14:20 |
jaosorior | ayoung: For fun or some specific reason? :O | 14:20 |
ayoung | seems like we would hammer that one | 14:20 |
ayoung | jaosorior, Keystone HTTPD failed and ran postci | 14:20 |
ayoung | trying to figure out why | 14:20 |
*** jtomasek_ has quit IRC | 14:21 | |
ayoung | jaosorior, log is here: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html scroll to 2016-01-28 18:50:09.657 | 14:21 |
ayoung | http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html#_2016-01-28_18_50_09_687 | 14:21 |
ayoung | missed a bit, but, close enough | 14:21 |
jaosorior | ayoung: One thing to note is that the CI is broken, running out of memory :/ | 14:22 |
ayoung | jaosorior, maybe because we always run on the same jenkins machine? | 14:22 |
jaosorior | ayoung: Aaaand also some weird pingtest issue that we haven't figured out | 14:23 |
jaosorior | ayoung: This is where it fails: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html#_2016-01-28_18_48_34_046 | 14:23 |
*** ccrouch has quit IRC | 14:23 | |
ayoung | jaosorior, it might be openstackclient. | 14:23 |
jaosorior | ayoung: No idea about that dude, maybe derekh knows something | 14:23 |
*** jcoufal has quit IRC | 14:23 | |
ayoung | jaosorior, I just was looking at my tripleo deploy and running openstack server list gives an error | 14:23 |
*** Goneri has joined #tripleo | 14:23 | |
ayoung | it looks like the openstack client errors on any empty response | 14:23 |
ayoung | let me reproduce and paste | 14:24 |
*** akrivoka has quit IRC | 14:24 | |
jaosorior | ayoung: Well, that specific failure is actually a regular "ping" command, and the nova server is not responding :/ | 14:24 |
ayoung | http://paste.openstack.org/show/485421/ | 14:25 |
ayoung | jaosorior, the log shows a bunch of errors due to ebtables not being available, too | 14:25 |
ayoung | that is in worlddump.sh, so not a smoking gun, but it certainly looks ugly | 14:26 |
ayoung | jaosorior, anyway, I want to see what is calling postci... | 14:26 |
*** mbound has quit IRC | 14:27 | |
*** jcoufal has joined #tripleo | 14:27 | |
jaosorior | funky | 14:28 |
ayoung | tripleo.sh -- Overcloud pingtest, trying to ping the floating IPs 192.0.2.51 | 14:30 |
ayoung | OK...so whatever is monitoring that... | 14:30 |
ayoung | jaosorior, let me see if I can run that on my tripleo run | 14:32 |
ayoung | jaosorior, running it now... | 14:33 |
ayoung | +--------------------------------------+--------------+---------------+---------------------+--------------+ | 14:36 |
ayoung | | id | stack_name | stack_status | creation_time | updated_time | | 14:36 |
ayoung | +--------------------------------------+--------------+---------------+---------------------+--------------+ | 14:36 |
ayoung | | c91e393a-9e69-4e60-85e4-cedf2a8043ab | tenant-stack | CREATE_FAILED | 2016-01-29T14:33:11 | None | | 14:36 |
ayoung | +--------------------------------------+--------------+---------------+---------------------+--------------+ | 14:36 |
*** akrivoka has joined #tripleo | 14:36 | |
ayoung | jaosorior, does that leave a log? | 14:37 |
ayoung | actually...let me see if I can tell how to debug... | 14:37 |
*** ccrouch has joined #tripleo | 14:39 | |
ayoung | shardy, if I have a Heat stack that failed, and no output from why it failed, where do I look for the log? | 14:39 |
ayoung | http://paste.openstack.org/show/485423/ | 14:40 |
jaosorior | ayoung: You could do heat resource-list -n 5 tenant-stack | 14:41 |
jaosorior | and it will tell you which resource failed | 14:41 |
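With a few levels of nesting the resource table gets long, so it can help to filter it for failed rows. A minimal sketch, assuming the pretty-printed table the heat CLI emits, with a header row that includes `resource_name` and `resource_status` columns; `failed_resources` is hypothetical. Note that because the pingtest deletes the stack on failure, the rows will read `DELETE_COMPLETE` unless that cleanup is disabled.

```python
def failed_resources(table_text):
    """Return (resource_name, resource_status) for rows whose status ends in _FAILED.

    Assumes a pipe-delimited table whose first "|" row is the header and
    contains 'resource_name' and 'resource_status' columns.
    """
    header = None
    failed = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            name_idx = header.index("resource_name")
            status_idx = header.index("resource_status")
            continue
        if cells[status_idx].endswith("_FAILED"):
            failed.append((cells[name_idx], cells[status_idx]))
    return failed
```

Feed it the captured output of `heat resource-list -n 5 <stack>` as a string.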
ayoung | jaosorior, OK.... | 14:41 |
jaosorior | then from there, you could start debugging further | 14:41 |
*** dmacpher has joined #tripleo | 14:42 | |
ayoung | jaosorior, Stack not found: tenant-stack | 14:42 |
ayoung | and I can run via ID | 14:42 |
ayoung | heat resource-list -n 5 c91e393a-9e69-4e60-85e4-cedf2a8043ab | 14:42 |
ayoung | but they are all deleted | 14:42 |
ayoung | like server1 | | OS::Nova::Server | DELETE_COMPLETE | 2016-01-29T14:33:11 | tenant-stack | 14:43 |
ayoung | jaosorior, going to look in nova server logs | 14:43 |
jaosorior | ah yeah, now I remember that slagle had that issue. The pingtest will ultimately delete the tenant-stack after running.. you need to disable that to debug it... :/ | 14:43 |
ayoung | HTTP exception thrown: Flavor m1.demo | 14:44 |
ayoung | could not be found. | 14:44 |
ayoung | hmmm | 14:44 |
ayoung | bc007211-2eb0-4da6-b1d4-9bdd0f7c7943 | m1.demo | 512 | 10 | 0 | 1 | True | 14:44 |
ayoung | so something has made that, | 14:44 |
ayoung | ... | 14:44 |
shardy | https://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L504 | 14:44 |
jaosorior | shardy: Any idea where the m1.demo flavor is created? | 14:45 |
*** ccrouch1 has joined #tripleo | 14:46 | |
ayoung | jaosorior, I'm running the ping test again, and tailing the nova log... | 14:46 |
shardy | Hmm, no actually I don't | 14:46 |
shardy | https://github.com/openstack/tripleo-common/blob/master/templates/tenantvm_floatingip.yaml#L22 | 14:47 |
shardy | that's where it's referenced | 14:47 |
*** ccrouch has quit IRC | 14:48 | |
ayoung | There seems to be this line repeated | 14:49 |
ayoung | 2016-01-29 14:48:16.731 13030 INFO nova.api.openstack.wsgi [req-8280787b-eea7-407c-a360-70f0fab7cf1c 4012f6d4a6b6456c825e430dbcca9c53 ff789e71a7ad47b1a36b9fba084479f8 - - -] HTTP exception thrown: Flavor m1.demo could not be found. | 14:49 |
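Pulling the request ID out of a line like this makes it possible to follow one request across services. A minimal sketch, assuming the layout visible in these logs (timestamp, PID, level, logger name, bracketed request context, message); `parse_nova_line` is hypothetical. It uses `re.search` rather than `re.match` so it can also pick the same layout out of a journald-prefixed line.

```python
import re

LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"\[(?P<context>[^\]]*)\]\s+(?P<message>.*)$"
)


def parse_nova_line(line):
    """Split a service log line into fields, or return None if it does not match."""
    match = LOG_RE.search(line)
    if not match:
        return None
    fields = match.groupdict()
    # The first token of the bracketed context is the request ID, or "-" if none.
    tokens = fields["context"].split()
    fields["request_id"] = tokens[0] if tokens and tokens[0].startswith("req-") else None
    return fields
```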
jaosorior | ayoung: Seems to me that m1.demo is assumed to exist in the overcloud | 14:49 |
jaosorior | not sure where it gets created though | 14:49 |
ayoung | jaosorior, it seems to be there, but the ID is a uuid, not a single digit integer | 14:50 |
shardy | Looks like tripleoclient does it in overcloud_deploy.py | 14:50 |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Retry the ping test on failure https://review.openstack.org/274094 | 14:51 |
ayoung | If one Derek Higgins was logged in to IRC.... | 14:51 |
ayoung | we'd ask him | 14:51 |
openstackgerrit | Merged openstack/python-tripleoclient: Set NeutronMetadataProxySharedSecret https://review.openstack.org/268131 | 14:51 |
ayoung | shardy, so the log of the run shows this http://paste.openstack.org/show/485424/ | 14:52 |
ayoung | how does Heat decide that the stack has failed? | 14:53 |
shardy | https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L493 | 14:53 |
shardy | ayoung: I think the error was before that but the CI didn't catch the failure correctly and exit | 14:53 |
shardy | we presumably failed to create the flavor there (no, I don't know why we're doing that in the client btw) | 14:54 |
ayoung | shardy, I'm running this on a Tripleo I set up myself, not CI | 14:54 |
ayoung | I have that flavor defined in my overcloud like this: | 14:54 |
shardy | ayoung: source overcloudrc then do nova flavor-list | 14:54 |
shardy | Is there a m1.demo? | 14:54 |
ayoung | http://paste.openstack.org/show/485425/ | 14:55 |
*** akuznetsov has joined #tripleo | 14:55 | |
ayoung | that is overcloud | 14:55 |
shardy | odd, the heat stack makes that exact same call then decides it can't see the m1.demo flavor | 14:58 |
jaosorior | O_O | 14:58 |
*** jcoufal has quit IRC | 14:59 | |
*** akuznetsov has quit IRC | 14:59 | |
*** jcoufal has joined #tripleo | 15:00 | |
*** morazi has joined #tripleo | 15:02 | |
ayoung | shardy, how do you know that? | 15:02 |
shardy | ayoung: The pingtest creates a server via a heat template, which I linked earlier | 15:03 |
ayoung | shardy, looking at the heat template now | 15:03 |
shardy | all heat does is call novaclient | 15:03 |
shardy | is it actually that nova itself can't find the flavor when we call to create the server? | 15:04 |
ayoung | that is the create line at 493... where is the failure? | 15:04 |
shardy | ayoung: your pasted nova-api log shows it can't find m1.demo, but we just listed it, hence I am confused | 15:07 |
shardy | all heat does is a novaclient call equivalent to "nova boot" | 15:07 |
*** Goneri has quit IRC | 15:08 | |
ayoung | shardy, let me try to do that manually. | 15:09 |
*** Goneri has joined #tripleo | 15:10 | |
*** ukalifon1 has joined #tripleo | 15:10 | |
*** jcoufal_ has joined #tripleo | 15:16 | |
ayoung | shardy, maybe the issue is actually that there are no images | 15:16 |
ayoung | openstack image list | 15:16 |
ayoung | list index out of range | 15:16 |
ayoung | bm-deploy-ramdisk ? | 15:17 |
*** mbound has joined #tripleo | 15:18 | |
*** jcoufal has quit IRC | 15:18 | |
*** pradk has joined #tripleo | 15:18 | |
*** bvandenh has quit IRC | 15:18 | |
*** ccrouch1 has quit IRC | 15:20 | |
*** mbound_ has joined #tripleo | 15:21 | |
*** ccrouch has joined #tripleo | 15:23 | |
*** mbound has quit IRC | 15:23 | |
*** anande has joined #tripleo | 15:24 | |
*** masco has quit IRC | 15:24 | |
openstackgerrit | James Slagle proposed openstack/tripleo-common: Output some debug info when pingtest fails https://review.openstack.org/273701 | 15:26 |
*** pradk has joined #tripleo | 15:26 | |
ayoung | jaosorior, does that make sense? That the image is missing? | 15:28 |
slagle | bnemec: check out my addition to ^^ | 15:29 |
bnemec | slagle: Ah, good call. | 15:31 |
slagle | i'm going to check if these cirros images are booting with no_timer_check | 15:31 |
slagle | i dont see it in /proc/cmdline from within the instance itself | 15:32 |
slagle | shouldnt I? | 15:33 |
*** paramite is now known as paramite|afk | 15:33 | |
*** akrivoka has quit IRC | 15:35 | |
*** dprince has quit IRC | 15:35 | |
jaosorior | ayoung: that might as well be the issue. But it's kind of puzzling that it spits the wrong error then. | 15:36 |
*** fgimenez has quit IRC | 15:37 | |
*** jprovazn has quit IRC | 15:38 | |
*** fgimenez has joined #tripleo | 15:39 | |
jaosorior | ayoung: Is that system you're testing on accessible from somewhere? | 15:41 |
ayoung | jaosorior, yes. I'll PM you | 15:43 |
*** jdob has quit IRC | 15:43 | |
*** jcoufal_ has quit IRC | 15:45 | |
*** anande has quit IRC | 15:46 | |
*** thrash has quit IRC | 15:47 | |
*** xinwu has joined #tripleo | 15:47 | |
*** mbound_ has quit IRC | 15:47 | |
*** paramite|afk is now known as paramite | 15:50 | |
*** jdob has joined #tripleo | 15:51 | |
*** xinwu has quit IRC | 15:55 | |
*** yamahata has quit IRC | 15:55 | |
*** yamahata has joined #tripleo | 15:56 | |
*** akuznetsov has joined #tripleo | 15:56 | |
*** mbound has joined #tripleo | 15:56 | |
*** mcornea has quit IRC | 15:58 | |
*** absubram has joined #tripleo | 15:58 | |
*** jhenner1 has joined #tripleo | 15:58 | |
*** jhenner has quit IRC | 16:00 | |
*** akuznetsov has quit IRC | 16:00 | |
*** dshulyak has quit IRC | 16:01 | |
ayoung | shardy, jaosorior, we don't grab the controller logs in gerrit. Makes it hard to debug. | 16:01 |
ayoung | I think that the glance issue is telling | 16:01 |
*** thrash has joined #tripleo | 16:02 | |
jaosorior | ayoung: You can see the controller logs in gerrit | 16:03 |
ayoung | jaosorior, in logs...looking | 16:04 |
ayoung | jaosorior, where? | 16:06 |
ayoung | http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/ only has the one subdir, under that | 16:06 |
jaosorior | ayoung: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/ | 16:06 |
ayoung | http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/ does not have the glance log. | 16:06 |
jaosorior | ayoung: Specifically this file http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/overcloud-controller-0.tar.xz | 16:06 |
ayoung | thanks | 16:06 |
jaosorior | ayoung: Only reason heat spits out for the failure in that pingtest is ClientException: resources.server1: Unknown Error (HTTP 502) | 16:07 |
*** bvandenh has joined #tripleo | 16:07 | |
*** jhenner1 has quit IRC | 16:08 | |
ayoung | jaosorior, the fact that the glance log is full of errors and the fact that there are no images in glance: coincidence or causal? | 16:09 |
ayoung | ah... | 16:11 |
ayoung | | fe176d46-682e-4303-abe5-c2316a85c828 | pingtest_image | active | | 16:11 |
ayoung | so I assume pingtest cleans that up at the end | 16:11 |
jaosorior | ayoung: I removed the cleanup of pingtest in the last run | 16:11 |
ayoung | ok..so not glance. | 16:12 |
jaosorior | but currently there seems to be some funky thing. getting a 502 from nova | 16:12 |
ayoung | what was the call that triggered that? | 16:12 |
ayoung | is there a 502 in the nova api log? | 16:12 |
*** absubram has quit IRC | 16:13 | |
jaosorior | look | 16:13 |
ayoung | jaosorior, I don't see one | 16:13 |
ayoung | you sure it is from nova? | 16:14 |
jaosorior | nova-api.log shows this when I try to do server list | 16:14 |
jaosorior | 2016-01-29 16:13:36.788 13015 INFO oslo_service.service [-] Child 19808 killed by signal 9 | 16:14 |
jaosorior | 2016-01-29 16:13:37.008 19831 INFO nova.osapi_compute.wsgi.server [req-9a6f67fb-0f54-4cd6-878d-d1433deee590 - - - - -] (19831) wsgi starting up on http://192.0.2.22:8774/ | 16:14 |
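Signal 9 is `SIGKILL`, which a process cannot catch or ignore; on a host that is short of memory, one common sender is the kernel OOM killer, which fits the ENOMEM errors elsewhere in this log. A minimal sketch that decodes such lines, assuming the `Child <pid> killed by signal <n>` wording shown above; `describe_child_kill` is hypothetical:

```python
import re
import signal

# Matches the "Child <pid> killed by signal <n>" wording in the log above.
KILLED_RE = re.compile(r"Child (\d+) killed by signal (\d+)")


def describe_child_kill(line):
    """Return (pid, signal_name) for a 'killed by signal' line, else None."""
    match = KILLED_RE.search(line)
    if not match:
        return None
    pid, signum = int(match.group(1)), int(match.group(2))
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "signal %d" % signum  # number not defined on this platform
    return pid, name
```

The signal name alone does not say who sent it; checking the journal or `dmesg` for OOM-killer messages is what confirms that.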
ayoung | run it again, please | 16:15 |
ayoung | I bet it is an out of memory....let me look in the journal | 16:15 |
jaosorior | now it's hanging | 16:15 |
ayoung | Jan 29 16:15:05 overcloud-controller-0 cinder-scheduler[12156]: 2016-01-29 16:15:05.868 12156 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.22:5672 is unreachable: [Errno 111] ECONNREFUSED | 16:16 |
ayoung | ps -ef | grep rabbit | 16:16 |
ayoung | shows nothing | 16:16 |
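An `ECONNREFUSED` from the AMQP client means nothing is listening on the broker port, which matches `ps` showing no rabbit process. A minimal sketch of checking that directly, assuming the host and port from the log line (192.0.2.22:5672); `port_is_open` is hypothetical:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused (errno 111), timeouts and unreachable hosts all land here.
        return False


if __name__ == "__main__":
    print(port_is_open("192.0.2.22", 5672))
```

A refused connection only says the broker is down, not why; in this case the journal's ENOMEM errors supply the why.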
jaosorior | yeah | 16:16 |
jaosorior | I had the feeling rabbit had died | 16:16 |
jaosorior | Had some issues with that before | 16:17 |
ayoung | Jan 29 14:43:39 overcloud-controller-0 cinder-volume[12118]: 2016-01-29 14:43:39.161 12252 ERROR oslo_service.periodic_task OSError: [Errno 12] Cannot allocate memory | 16:17 |
ayoung | Hehe | 16:17 |
ayoung | the rabbit done died | 16:17 |
jaosorior | perkele | 16:17 |
*** derekh has joined #tripleo | 16:18 | |
jaosorior | ayoung: Might be worth re-deploying with this addition https://review.openstack.org/#/c/273752/ | 16:18 |
ayoung | "Billy Jean is not my lover..." | 16:18 |
ayoung | oooh, not if we can help it | 16:19 |
jaosorior | that's slagle's change adding a swap partition to the overcloud nodes | 16:19 |
*** mbound has quit IRC | 16:23 | |
*** aufi_ has quit IRC | 16:24 | |
*** jhenner has joined #tripleo | 16:24 | |
*** panda_ has quit IRC | 16:25 | |
*** panda_ has joined #tripleo | 16:26 | |
*** rcernin has quit IRC | 16:27 | |
*** paramite is now known as paramite|afk | 16:28 | |
*** ccrouch1 has joined #tripleo | 16:30 | |
*** jaosorior has quit IRC | 16:31 | |
*** ccrouch has quit IRC | 16:32 | |
EmilienM | gfidente: please look https://review.openstack.org/#/c/271260 | 16:35 |
slagle | bnemec: ok, fwiw, the fedora cloud image has no_timer_check baked into the kernel cmdline, so i think we need to switch to use that | 16:36 |
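`no_timer_check` is a Linux boot parameter that disables the kernel's test for broken timer IRQ sources, the test a guest can stall in on a loaded host. A minimal sketch of the `/proc/cmdline` check slagle did by hand inside the instance; both function names are hypothetical:

```python
def kernel_cmdline_has(param, cmdline):
    """Return True if a bare flag or key=value parameter appears on a kernel command line."""
    for token in cmdline.split():
        if token == param or token.startswith(param + "="):
            return True
    return False


def running_kernel_has(param, path="/proc/cmdline"):
    """Check the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return kernel_cmdline_has(param, f.read())
```

Splitting on whitespace avoids false positives from substring matches such as `no_timer_check_foo`.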
slagle | and it also took 134s to boot, plus the time to associate the floating ip | 16:36 |
*** ukalifon1 has quit IRC | 16:36 | |
slagle | so need to up the timeout as well | 16:36 |
bnemec | Yeah, that's uncomfortably close to 180, especially in heavily loaded CI systems. | 16:37 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Make pingtest partially voting https://review.openstack.org/274151 | 16:40 |
bnemec | slagle: marios: ^ | 16:40 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-heat-templates: Add swap to overcloud nodes https://review.openstack.org/273752 | 16:42 |
derekh | I've reproduced the ping test failures | 16:46 |
derekh | the console log for the instance on the overcloud shows this http://paste.openstack.org/show/485454/ | 16:46 |
derekh | and it stops there | 16:46 |
derekh | any ideas ? | 16:46 |
derekh | fric it, that what yer talking about | 16:47 |
derekh | slagle: bnemec ^ is that what ye are seeing ? | 16:47 |
bnemec | FFS. How has cirros not fixed that? | 16:48 |
bnemec | And how does this not kill the other OpenStack CI jobs? | 16:48 |
bnemec | Because yeah, that's exactly the no_timer_check problem. | 16:49 |
*** absubram has joined #tripleo | 16:49 | |
bnemec | I just restored my fedora image change and am cleaning it up so it can actually merge. | 16:49 |
bnemec | slagle: derekh: ^ | 16:49 |
derekh | bnemec: I've virsh destroy/started an image that failed and it booted the second time fine, ping works | 16:49 |
derekh | bnemec: ack, lets do that | 16:50 |
*** bvandenh has quit IRC | 16:50 | |
derekh | I've checked the logs from shardy's "use less mem" test and it has gotten rid of the OOM problem (or pushed it down the road a bit) | 16:51 |
slagle | derekh: yep, that's the no_timer_check thing i pointed out earlier | 16:51 |
derekh | https://review.openstack.org/#/c/274151/ | 16:52 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Use Fedora image for ping test https://review.openstack.org/273699 | 16:52 |
*** bvandenh has joined #tripleo | 16:52 | |
bnemec | I doubled the ping timeout to 360. Let me know if you think I should go higher. | 16:53 |
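With slagle measuring 134 s of boot time plus floating IP association, a 180 s window leaves little slack, hence the move to 360 s. A minimal sketch of a deadline-based ping loop, assuming Linux iputils `ping` flags; `wait_for_ping` is hypothetical and is not the actual tripleo.sh implementation. The clock, sleep and ping functions are injectable so the logic can be tested without waiting.

```python
import subprocess
import time


def wait_for_ping(ip, timeout=360, interval=5, ping=None,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until `ip` answers a ping or `timeout` seconds elapse; return True on success."""
    if ping is None:
        def ping(addr):
            # One echo request with a 1 s reply timeout (iputils flags).
            result = subprocess.run(["ping", "-c", "1", "-W", "1", addr],
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            return result.returncode == 0
    deadline = clock() + timeout
    while True:
        if ping(ip):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Using a deadline rather than a fixed retry count keeps the total wait bounded even if individual pings are slow.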
derekh | bnemec: will do, | 16:53 |
*** paramite|afk is now known as paramite | 16:54 | |
d0ugal | Is this the correct way to do releases? | 16:56 |
d0ugal | https://wiki.openstack.org/wiki/TripleO/ReleaseManagement | 16:56 |
slagle | i'll pull that fedora user image patch and run it in a loop | 16:56 |
bnemec | d0ugal: Yes. | 16:56 |
bnemec | d0ugal: Although if you haven't released before you may need to get the bit flipped to allow it. | 16:57 |
trown | testing the fedora ping test as well, I think that is a much better solution than dropping the ping test from the HA job | 16:57 |
bnemec | Or one of us could do it. | 16:57 |
d0ugal | bnemec: ah, I want to release python-tripleoclient liberty | 16:57 |
bnemec | trown: I don't think that's going to solve the problem by itself. We'll see though. | 16:57 |
d0ugal | bnemec: If you could help that would be great. I've not released before. | 16:58 |
d0ugal | How do I get the bit flipped? | 16:58 |
trown | my concern is that the HA job and the nonHA job are not testing the same templates | 16:59 |
trown | so it is totally possible to break the overcloud for just HA if we turn off ping test | 16:59 |
bnemec | d0ugal: slagle did it for me. | 16:59 |
bnemec | Sadly. I kind of liked not being on the hook for releases. :-) | 16:59 |
bnemec | trown: True. :-( | 16:59 |
slagle | what bits need flipping? | 16:59 |
slagle | besides most of them, in a general sense | 17:00 |
trown | :) | 17:00 |
d0ugal | slagle: The bit that allows me to release | 17:00 |
bnemec | slagle: d0ugal needs a release of tripleoclient. | 17:00 |
bnemec | Specifically liberty, although we should go ahead and release master too. | 17:00 |
d0ugal | thrash: FYI ^ | 17:00 |
d0ugal | bnemec: Yeah, we've never released tripleoclient. | 17:00 |
*** fgimenez has quit IRC | 17:00 | |
slagle | i thought i gave them those perms, let me check | 17:00 |
d0ugal | Maybe I just need to follow the wiki | 17:01 |
derekh | So the other error I'm seeing a lot is timeouts during the overcloud deploy, will start looking into that now | 17:01 |
*** fgimenez has joined #tripleo | 17:01 | |
*** fgimenez has quit IRC | 17:01 | |
*** fgimenez has joined #tripleo | 17:01 | |
bnemec | derekh: I have a patch up to help with that too: https://review.openstack.org/273716 | 17:02 |
bnemec | It won't fix them, but at least we'll be able to debug properly (I hope). | 17:02 |
derekh | I'm pretty sure we need this too, without it we get OOM errors all over the place on the overcloud https://review.openstack.org/#/c/273431/5 | 17:02 |
bnemec | Or slagle's swap change. Or both. :-) | 17:03 |
bnemec | I thought 1 heat engine would cause messaging timeouts :-/ | 17:03 |
*** dprince has joined #tripleo | 17:03 | |
derekh | bnemec: only on deeply nested stacks | 17:04 |
derekh | bnemec: on the overcloud we are testing a simple stack | 17:04 |
bnemec | Do we not qualify? :-) | 17:04 |
*** cwolferh has joined #tripleo | 17:04 | |
bnemec | Ah, gotcha. | 17:04 |
derekh | bnemec: I'm pretty sure we would qualify if it was the undercloud | 17:04 |
slagle | d0ugal: you're not added to do releases, so i'll just release it real quick | 17:05 |
derekh | bnemec: I think thats the wrong timeout | 17:05 |
slagle | d0ugal: so master will become 1.0.0 | 17:05 |
slagle | and i'll release stable/liberty as 0.1.0 | 17:05 |
derekh | bnemec: let me see if I can confirm | 17:05 |
bnemec | derekh: It doesn't take long to actually create the stack does it? It's just waiting for the instance to finish booting that's slower on fedora. | 17:05 |
bnemec | That was my reasoning anyway. | 17:05 |
d0ugal | slagle: thrash had suggested 0.1.0 for master and 0.0.11 for liberty | 17:06 |
d0ugal | slagle: I guess because we have a 0.0.10 tag already | 17:06 |
d0ugal | but I don't mind either way | 17:06 |
thrash | d0ugal: I'm good with 1.0.0 and 0.1.0 | 17:07 |
d0ugal | slagle: ^ | 17:07 |
slagle | yea, already done :) | 17:07 |
thrash | since 0.0.10 was for kilo | 17:07 |
thrash | :) | 17:07 |
d0ugal | hah | 17:07 |
slagle | liberty is 0.1.1 | 17:07 |
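Giving liberty the 0.1.x series leaves room above kilo's 0.0.x tags, with master starting at 1.0.0. Dotted tags should be compared numerically, not as strings: as strings, `"0.0.10"` sorts before a hypothetical `"0.0.9"`. A minimal sketch, assuming purely numeric dotted tags; `version_key` is hypothetical:

```python
def version_key(tag):
    """Split a dotted numeric tag such as '0.0.10' into integers for ordering."""
    return tuple(int(part) for part in tag.split("."))


# Tags from the discussion: master, kilo, stable/liberty.
tags = ["1.0.0", "0.0.10", "0.1.1"]
print(sorted(tags, key=version_key))  # ['0.0.10', '0.1.1', '1.0.0']
```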
derekh | bnemec: Sorry, I thought you did something completely different, ignore me | 17:08 |
d0ugal | slagle: Thanks! | 17:08 |
bnemec | derekh: Oh, I just realized you weren't talking about the fedora patch. :-) | 17:08 |
derekh | bnemec: I was wrong anyways | 17:08 |
*** akuznetsov has joined #tripleo | 17:08 | |
bnemec | derekh: We might want to set other timeouts too. I think this will help with the majority of the problems we're hitting though. | 17:08 |
derekh | bnemec: ack | 17:09 |
slagle | d0ugal: thrash : it's live, http://tarballs.openstack.org/python-tripleoclient/ | 17:09 |
d0ugal | slagle: Awesome. Thank you. | 17:09 |
slagle | derekh: bnemec : fwiw, this HA job appears to be hung, https://jenkins03.openstack.org/job/gate-tripleo-ci-f22-ha/346/console | 17:11 |
derekh | bnemec: you're using a F21 cloud image, wanna bump it to F23 ? | 17:12 |
bnemec | Yeah, we hit that quite a bit: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=build_name:%20*tripleo-ci*%20AND%20build_status:%20FAILURE%20AND%20message:%20\%22GATE_RETVAL%3D137\%22 | 17:12 |
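The logstash query matches `GATE_RETVAL=137`. bash reports a command killed by signal N as 128 + N, so 137 is 128 + 9, i.e. `SIGKILL`, which is consistent with a hung run being killed from outside. A minimal sketch of that decoding, also handling Python `subprocess`'s convention of reporting a signal-killed child as -N; `describe_exit_status` is hypothetical:

```python
import signal


def describe_exit_status(status):
    """Explain an exit status such as GATE_RETVAL=137."""
    if status < 0:
        # Python's subprocess reports a child killed by signal N as -N.
        signum = -status
    elif status > 128:
        # bash reports a command killed by signal N as 128 + N.
        signum = status - 128
    else:
        return "exited with status %d" % status
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "signal %d" % signum
    return "killed by %s" % name
```

This is a heuristic: a program can also exit with a status above 128 on its own, so the signal reading is the likely interpretation, not a certainty.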
devvesa | marios, shardy: Sorry, I don't know what's wrong with this patch: https://review.openstack.org/#/c/260413/ | 17:12 |
slagle | i dont know if there's any way to investigate that | 17:12 |
devvesa | can I help in something? | 17:12 |
bnemec | derekh: No, I did that on purpose. It's the exact image we were using for our downstream CI, and it worked well there. | 17:12 |
bnemec | It's also smaller than the newer fedora images for some reason. | 17:12 |
bnemec | slagle: That's my overcloud timeout patch. Right now we don't even get logs from hung runs. | 17:13 |
bnemec | ETOOMANYPROBLEMS | 17:13 |
* derekh should have read the commit message | 17:13 | |
*** akuznetsov has quit IRC | 17:13 | |
bnemec | :-) | 17:13 |
derekh | slagle: I'm going to log into that instance and see if I can get some logs before it times out | 17:15 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Set overcloud deploy timeout https://review.openstack.org/273716 | 17:16 |
*** dprince has quit IRC | 17:16 | |
bnemec | I realized this morning that the comment on the first version of ^ was completely incomprehensible. :-) | 17:16 |
*** absubram has quit IRC | 17:17 | |
bnemec | devvesa: I think we're mostly just trying to figure out the right combination of configurations that will get the ping test working consistently. | 17:17 |
*** dprince has joined #tripleo | 17:17 | |
*** absubram has joined #tripleo | 17:17 | |
derekh | slagle: | fdb7e332-d1d9-4260-8032-face713b2d8c | overcloud | CREATE_COMPLETE | 2016-01-29T15:43:36 | None | | 17:19 |
*** bvandenh has quit IRC | 17:21 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Enable undercloud ssl on nonha job https://review.openstack.org/273743 | 17:22 |
derekh | slagle: bnemec so the overcloud deploy command looks like its missed the fact that the Heat stack has completed http://paste.openstack.org/show/485459/ | 17:22 |
slagle | derekh: ok, i've seen this locally before | 17:22 |
*** tiswanso has quit IRC | 17:22 | |
slagle | the stack is create complete, but tripleoclient must not think so for some reason | 17:22 |
*** tiswanso has joined #tripleo | 17:23 | |
bnemec | That's strange. I wonder if one of its heat calls hung or something. | 17:24 |
derekh | [root@instack ~]# netstat -pn | grep -i 19578 | 17:25 |
derekh | tcp 1 0 192.0.2.1:60506 192.0.2.1:9292 CLOSE_WAIT 19578/python2 | 17:25 |
derekh | tcp 1 0 192.0.2.1:44483 192.0.2.1:8774 CLOSE_WAIT 19578/python2 | 17:25 |
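`CLOSE_WAIT` means the remote end has closed the connection but the local process has not yet closed its socket; here the peers are ports 9292 (glance) and 8774 (nova-api). These may just be idle leftovers rather than the cause of the hang. A minimal sketch for filtering `netstat -pn` output by state, assuming Linux net-tools column order (Proto, Recv-Q, Send-Q, Local, Foreign, State, PID/Program); `sockets_in_state` is hypothetical:

```python
def sockets_in_state(netstat_lines, state="CLOSE_WAIT"):
    """Return (local, remote, pid/program) for TCP lines of `netstat -pn` in `state`."""
    results = []
    for line in netstat_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0].startswith("tcp") and fields[5] == state:
            results.append((fields[3], fields[4], fields[6]))
    return results
```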
derekh | strace is showing lots of calls to heat | 17:29 |
*** fgimenez has quit IRC | 17:30 | |
derekh | http://paste.openstack.org/show/485461/ | 17:30 |
devvesa | bnemec: Thanks for the info! | 17:30 |
*** xinwu has joined #tripleo | 17:32 | |
*** bvandenh has joined #tripleo | 17:34 | |
*** pblaho has quit IRC | 17:34 | |
*** yamahata has quit IRC | 17:35 | |
derekh | bnemec: slagle instance is gone, best I could get is 2 minutes of strace from the deploy command, it's not stuck, maybe in a loop | 17:41 |
derekh | http://goodsquishy.com/downloads/strace.log | 17:41 |
*** ccrouch1 has quit IRC | 17:43 | |
*** rwsu has quit IRC | 17:46 | |
*** rwsu has joined #tripleo | 17:46 | |
*** ccrouch has joined #tripleo | 17:46 | |
*** gfidente has quit IRC | 17:48 | |
*** jistr has quit IRC | 17:49 | |
ayoung | derekh, hey, I saw your commit for retry on the ping test | 17:54 |
*** paramite is now known as paramite|afk | 17:54 | |
ayoung | derekh, I was able to reproduce the error on a local machine, I don't think retry is going to help. | 17:54 |
derekh | ayoung: I was just kinda testing to see what happens; we think we found the problem earlier ^^, looks like switching to the fedora cloud image should help us | 17:56 |
ayoung | derekh, is it a memory constraint issue? | 17:56 |
*** yamahata has joined #tripleo | 17:57 | |
derekh | ayoung: nope, the cirros image is failing to setup a timer on boot | 17:58 |
derekh | ayoung: we also have a memory issue | 17:58 |
derekh | ayoung: https://review.openstack.org/#/c/273431/ | 17:59 |
trown | derekh: what is the timer on boot thing? | 17:59 |
trown | the fedora ping test is not working for me | 17:59 |
derekh | trown: this is the console of the instance when its booting http://paste.openstack.org/show/485454/ | 18:00 |
derekh | trown: it stalls there | 18:00 |
derekh | trown: slagle and bnemec seem to have seen it before | 18:00 |
* derekh has gotta run, will check back later | 18:01 | |
*** derekh has quit IRC | 18:01 | |
trown | hmm, I have not seen that. I wonder what is different in CI | 18:01 |
*** paramite|afk is now known as paramite | 18:01 | |
*** bvandenh has quit IRC | 18:04 | |
slagle | trown: it's a known issue with cirros | 18:04 |
slagle | it could be anything really, environmental | 18:05 |
slagle | kernel versions, load, etc | 18:05 |
trown | slagle: have you had success with the fedora based pingtest? | 18:05 |
slagle | trown: i'm not running the exact same code | 18:06 |
slagle | it works for me, but i'm using network isolation | 18:06 |
slagle | so i create the overcloud network slightly differently | 18:06 |
slagle | dunno if that matters | 18:06 |
trown | hmm... I would think if anything that would have a lower chance of success | 18:07 |
slagle | why? | 18:07 |
trown | just more complicated | 18:07 |
slagle | oh. it works fine | 18:07 |
slagle | "for me" :) | 18:07 |
trown | with the same flavor that the tripleo.sh code makes? | 18:08 |
slagle | do your oc nodes already have >4gb ram? | 18:08 |
*** paramite has quit IRC | 18:09 | |
*** Marga_ has joined #tripleo | 18:09 | |
trown | nope, they have exactly 4 | 18:10 |
slagle | trown: oh snap, we probably need to make that flavor bigger | 18:10 |
*** Marga_ has quit IRC | 18:11 | |
*** Marga_ has joined #tripleo | 18:11 | |
bnemec | slagle: Wasn't the m1.demo flavor created solely for the purpose of booting this image? | 18:12 |
slagle | bnemec: probably. m1.demo wfm | 18:13 |
slagle | trown: ^ | 18:13 |
*** absubram has quit IRC | 18:13 | |
trown | k, rerunning even the cirros test against the same cloud it passed on a moment ago is failing, so maybe both are still flaky | 18:14 |
* bnemec fires up an environment so he can get in on this testing fun :-) | 18:15 | |
slagle | bnemec: we will likely have to merge https://review.openstack.org/#/c/273699/ and https://review.openstack.org/#/c/273431/ together | 18:15 |
slagle | should we depends-on one or the other? | 18:15 |
trown | I am running with the shardy single worker overcloud patch too fwiw | 18:15 |
bnemec | slagle: If we want to test them in the same job or are worried about them not getting merged at the same time. | 18:18 |
*** xinwu has quit IRC | 18:18 | |
trown | slagle: bnemec, is `nova console-log` working for either of you? I get nothing for either fedora/cirros | 18:20 |
trown | actually scratch that it works for cirros... I did not git reset --hard enough | 18:22 |
bnemec | git reset harder! | 18:22 |
bnemec | Or --harder | 18:22 |
trown | cirros ping is working for me while fedora is not, and I do not even get console logs from fedora | 18:22 |
slagle | both of those work for me, but i don't have shardy's patch | 18:24 |
bnemec | I don't actually test my changes. I follow the "push and pray" methodology. :-) | 18:24 |
bnemec | Also, I've never been able to reproduce any of these pingtest problems locally, so it hasn't been terribly useful in the past. | 18:25 |
trown | I have had pretty mixed success with the pingtest in rdoci... but I also have had trouble reproducing locally | 18:26 |
ayoung | derekh, is there a review out there for replacing cirros with fedora cloud image for the ping test? I can try it out | 18:26 |
trown | ayoung: https://review.openstack.org/#/c/273699/ | 18:26 |
slagle | LOL | 18:26 |
trown | slagle: are you laughing at me? I have not actually had hard enough time, since we have another one on the way :p | 18:27 |
*** penick has joined #tripleo | 18:27 | |
slagle | no i was laughing at ayoung | 18:27 |
slagle | :) | 18:27 |
ayoung | slagle, so, in addition to this, there is a Keystone midcycle going on that I am attending remotely (you can guess how well that worked) and I had an insurance inspector show up at my door, meanwhile helping a teammate get ready to present our stuff at DevConf. Where am I again? | 18:29 |
trown | redeploying without the single worker environment | 18:29 |
ayoung | slagle, I suspect that memory issues are also hurting my setup. I'm tempted to teardown the undercloud on this setup and restore with larger vms | 18:30 |
slagle | ayoung: sorry, it just made me chuckle since we were talking about the patch | 18:30 |
ayoung | slagle, no worries. If I had trouble with people laughing at me, I wouldn't be able to get my work done | 18:30 |
ayoung | I'm just hoping to be able to make a constructive contribution here | 18:30 |
*** ccrouch1 has joined #tripleo | 18:32 | |
bnemec | Do we want to just go ahead and merge https://review.openstack.org/#/c/273701/ ? | 18:34 |
bnemec | It's passed 2 of the 3 test jobs, and it pretty clearly didn't break the ceph job. | 18:34 |
bnemec | And it should be helpful debugging all of the other changes. | 18:34 |
*** ccrouch has quit IRC | 18:35 | |
*** rasca has quit IRC | 18:35 | |
slagle | wfm. | 18:35 |
slagle | no one else is around, so i guess we're in charge :) | 18:36 |
*** rasca has joined #tripleo | 18:36 | |
* bnemec merges all the things | 18:36 | |
bnemec | The shotgun approach to debugging. :-) | 18:37 |
*** nico_auv has quit IRC | 18:38 | |
*** jhenner has quit IRC | 18:38 | |
*** jhenner has joined #tripleo | 18:38 | |
*** absubram has joined #tripleo | 18:38 | |
*** sthillma has joined #tripleo | 18:39 | |
*** jhenner has quit IRC | 18:39 | |
*** jhenner has joined #tripleo | 18:39 | |
*** jhenner has quit IRC | 18:40 | |
*** shivrao has joined #tripleo | 18:43 | |
thrash | +2 from me. :) | 18:44 |
trown | thrash: to shotguns or the patch :p | 18:45 |
thrash | the patch | 18:45 |
thrash | maybe the shotguns too... Depends how desperate we are. | 18:45 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud https://review.openstack.org/273431 | 18:50 |
slagle | bnemec: i added a depends-on here ^ on the fedora cloud image one | 18:50 |
openstackgerrit | Merged openstack/tripleo-common: Output some debug info when pingtest fails https://review.openstack.org/273701 | 18:50 |
trown | that ceph job looks like it timed out during the ceph part of deployment | 18:50 |
slagle | if it passes, we can merge them both | 18:51 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-docs: First documentation for Operational tools https://review.openstack.org/265271 | 18:52 |
openstackgerrit | Emilien Macchi proposed openstack/instack-undercloud: puppet: fix uchiwa password parameter https://review.openstack.org/274202 | 18:55 |
EmilienM | dprince: a quick one ^ | 18:55 |
*** alop has joined #tripleo | 18:56 | |
dprince | EmilienM: quick review, but we wait on the CI | 18:56 |
EmilienM | no prob | 18:56 |
dprince | which won't even test this yet | 18:56 |
*** pcaruana has joined #tripleo | 18:57 | |
*** xinwu has joined #tripleo | 18:57 | |
trown | slagle: bnemec, just redeployed without the single worker env and fedora ping test works | 18:59 |
slagle | i see | 18:59 |
trown | not really sure how having a single worker would affect that | 18:59 |
slagle | guess i'll try it now :) | 18:59 |
slagle | i added a depends on so those 2 patches run together in ci, so we'll see there as well | 19:00 |
trown | ya I will be interested if we get the same thing with nova console-log not returning anything | 19:00 |
slagle | goodbye overcloud, you were a good one | 19:01 |
slagle | you lasted about 4 hours | 19:01 |
*** tosky has quit IRC | 19:02 | |
*** devvesa has quit IRC | 19:04 | |
*** akuznetsov has joined #tripleo | 19:08 | |
*** akuznetsov has quit IRC | 19:13 | |
slagle | trown: were you testing ha? | 19:14 |
trown | slagle: ya | 19:16 |
* bnemec toddles off to eat while his overcloud deploys | 19:16 | |
trown | ha is the only thing that is flaky for me... I am starting to think it is just not possible to test ha on a 32G host | 19:16 |
slagle | 32g ought to be enough | 19:19 |
slagle | the problem i run into with ha is having only a 4 core box | 19:19 |
slagle | with 5 vm's, plus the host, it's pretty much unusable | 19:20 |
trown | hmm, maybe that is more of my issue then | 19:20 |
*** absubram has quit IRC | 19:20 | |
trown | not sure what the centosci boxes have, but I bet it is not more than my dell mini, which is 4 cores | 19:20 |
*** mkovacik has joined #tripleo | 19:22 | |
trown | I get OOM a lot though testing ha | 19:24 |
*** pcaruana has quit IRC | 19:24 | |
slagle | oh aren't you giving the uc a lot more ram in rdoci? | 19:25 |
trown | 12G for the ha job and 16 for the nonha, but I get OOM on the overcloud nodes | 19:29 |
*** rbrady has quit IRC | 19:36 | |
*** pcaruana has joined #tripleo | 19:37 | |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Perform a booting test for our images https://review.openstack.org/204639 | 19:37 |
*** jistr has joined #tripleo | 19:37 | |
slagle | i believe i see the issue with the fedora image patch :) | 19:39 |
*** ccrouch1 has quit IRC | 19:39 | |
slagle | html documents don't make very good images | 19:39 |
*** electrofelix has quit IRC | 19:40 | |
slagle | that url is redirecting | 19:40 |
bnemec | Ah. | 19:40 |
bnemec | I just deleted my overcloud instead of the tenant stack again. | 19:41 |
bnemec | It's a real problem having that muscle memory for heat stack-delete overcloud. | 19:41 |
slagle | i almost did that earlier, but luckily i had overcloudrc sourced | 19:41 |
*** ccrouch has joined #tripleo | 19:41 | |
*** eil397 has joined #tripleo | 19:41 | |
trown | slagle: odd... sometimes it doesn't, but ya.. size=344 is suspect for sure :) | 19:41 |
*** thrash is now known as thrash|biab | 19:41 | |
openstackgerrit | James Slagle proposed openstack/tripleo-common: Use Fedora image for ping test https://review.openstack.org/273699 | 19:42 |
slagle | yep :) | 19:42 |
bnemec | On a semi-related note, we need to change the default for that heatclient env var. | 19:43 |
trown | that also then makes much more sense why we get no console-log | 19:43 |
bnemec | I keep forgetting to set it and end up with an orphaned stack and network because the ping dies. | 19:43 |
trown | ah ya... I am supposed to file a heatclient bug for that as well | 19:44 |
trown | kind of surprising to me that `nova boot` returns success to heat when booting a 344 byte html page | 19:46 |
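The failed Fedora pingtest above came from a redirecting download URL that saved a 344-byte HTML page, which glance and nova then accepted as an image. A header check before upload would catch that. The sketch below is hypothetical (it is not part of tripleo.sh or tripleo-common) and assumes the image is supposed to be qcow2; it relies on qcow/qcow2 files starting with the magic bytes "QFI\xfb" followed by a big-endian version number, which is 2 or 3 for qcow2.

```python
import struct

# qcow and qcow2 images both begin with these 4 magic bytes
QCOW_MAGIC = b"QFI\xfb"


def looks_like_qcow2(path: str) -> bool:
    """Return True if the file at `path` starts with a qcow2 header.

    Hypothetical pre-upload check: it reads only the first 8 bytes, so an
    HTML error or redirect page saved under a .qcow2 name is rejected.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return False
    # Bytes 4-7 hold the format version as a big-endian unsigned 32-bit int
    (version,) = struct.unpack(">I", header[4:8])
    return version in (2, 3)
```

Refusing to upload when looks_like_qcow2() returns False would turn a silent bad image into an immediate, explicit error instead of an instance that never produces a console log.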
*** zaneb has joined #tripleo | 19:46 | |
*** olap has joined #tripleo | 19:50 | |
*** ccrouch1 has joined #tripleo | 19:53 | |
trown | retrying the fedora ping test with the single worker env | 19:54 |
*** ccrouch has quit IRC | 19:55 | |
*** mkovacik_ has joined #tripleo | 20:00 | |
bnemec | It's just as well. OVB wouldn't work at all if it verified that the image was bootable. | 20:03 |
bnemec | We start the baremetal nodes out with a completely empty qcow2. :-) | 20:03 |
*** mkovacik has quit IRC | 20:04 | |
openstackgerrit | Steven Hardy proposed openstack/tripleo-docs: Document using node capabilities to control placement https://review.openstack.org/274217 | 20:05 |
*** thrash|biab is now known as thrash | 20:05 | |
*** pcaruana has quit IRC | 20:06 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add new cloud-init element https://review.openstack.org/273764 | 20:07 |
openstackgerrit | Steven Hardy proposed openstack/tripleo-docs: Document using node capabilities to control placement https://review.openstack.org/274217 | 20:07 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder: add gentoo support to growroot https://review.openstack.org/273769 | 20:09 |
*** pcaruana has joined #tripleo | 20:09 | |
trown | bnemec: slagle, confirmed the updated fedora image test worked with the single worker env | 20:14 |
*** ccrouch1 has quit IRC | 20:15 | |
bnemec | Oh, and I totally typoed the filename too. .qcow instead of .qcow2. | 20:16 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Use Fedora image for ping test https://review.openstack.org/273699 | 20:17 |
*** shardy has quit IRC | 20:18 | |
trown | ah hehe, yep glance didn't care :) | 20:18 |
openstackgerrit | James Slagle proposed openstack/tripleo-common: Use Fedora image for ping test https://review.openstack.org/273699 | 20:19 |
bnemec | I wondered how the upload to glance was going so fast with a 150 MB image. :-) | 20:19 |
*** pcaruana has quit IRC | 20:19 | |
slagle | it needed a rebase | 20:19 |
slagle | apparently | 20:19 |
bnemec | tripleo.sh -- Overcloud pingtest, SUCCESS \o/ | 20:20 |
*** ccrouch has joined #tripleo | 20:22 | |
*** Marga_ has quit IRC | 20:23 | |
*** penick has quit IRC | 20:24 | |
*** panda_ has quit IRC | 20:26 | |
*** derekh has joined #tripleo | 20:26 | |
*** panda_ has joined #tripleo | 20:27 | |
derekh | CI isn't running on shardy's workers=1 patch because the patch it depends on changed; since it's gotta be restarted anyway, I'm gonna fix the comment | 20:28 |
*** jcoufal has joined #tripleo | 20:28 | |
*** jcoufal has quit IRC | 20:28 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud https://review.openstack.org/273431 | 20:30 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Remove option of installing tuskar https://review.openstack.org/265383 | 20:32 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Deploy Aodh services, replacing Ceilometer Alarm https://review.openstack.org/265382 | 20:32 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Switch default keystone auth_uri to v3 https://review.openstack.org/265380 | 20:32 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Add check for sufficient memory to undercloud install https://review.openstack.org/265378 | 20:32 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Set Nova's ram_allocation_ratio configuration option to "1.0" By default https://review.openstack.org/265377 | 20:32 |
*** mkovacik_ has quit IRC | 20:38 | |
*** Goneri has quit IRC | 20:43 | |
*** mkovacik has joined #tripleo | 20:45 | |
ayoung | is anyone else getting an error with the openstack CLI where a list command with no results reports an out of range error? | 20:45 |
bnemec | Crap, now our pinned CentOS mirror is not cooperating. :-( | 20:46 |
bnemec | Why does nothing work?! | 20:47 |
EmilienM | it's friday I guess | 20:47 |
EmilienM | it's a sign you should stop :-) | 20:47 |
bnemec | ayoung: What list command? | 20:47 |
ryansb | all the servers went home for the weekend ;) | 20:47 |
bnemec | EmilienM: Sounds good. :-) | 20:48 |
EmilienM | no stay here, we have some work :-P | 20:48 |
*** derekh has quit IRC | 20:51 | |
ayoung | bnemec, openstack server list | 20:52 |
ayoung | bnemec, or any other list command | 20:52 |
ayoung | $ . ./overcloudrc | 20:52 |
ayoung | [stack@instack ~]$ openstack server list | 20:52 |
ayoung | list index out of range | 20:52 |
bnemec | ayoung: Ah, yes. I'm seeing the same thing. | 20:52 |
bnemec | Sounds like an OSC bug. | 20:53 |
ayoung | bnemec, yep, and it is a newish one | 20:53 |
ayoung | I suspect the problem is cliff | 20:53 |
prometheanfire | can someone who knows the tripleo gate let me know if this is my fault? I don't think it is... (gate failure) https://review.openstack.org/273769 | 20:54 |
dhellmann | ayoung : add --debug after openstack and before the rest and you should get a traceback | 20:54 |
ayoung | dhellmann, I did, it was cliff ish | 20:54 |
ayoung | I'll paste | 20:54 |
prometheanfire | ah, probably the rpb thing still | 20:54 |
ayoung | dhellmann, http://paste.openstack.org/show/485491/ | 20:55 |
prometheanfire | derekh said he was going to release the fix | 20:55 |
prometheanfire | https://review.gerrithub.io/#/c/261404/ | 20:55 |
dhellmann | ayoung : yeah, that looks like a bug in the auto-width stuff that was added recently. file a bug for us? | 20:57 |
* bnemec notes that dhellmann must have an IRC watch on cliff :-) | 20:57 | |
dhellmann | bnemec : aye | 20:57 |
ayoung | dhellmann, was just making sure there was not one already | 20:57 |
dhellmann | ayoung : cool, thanks | 20:57 |
ayoung | dhellmann, cliff or OSC? | 20:57 |
dhellmann | ayoung : cliff, I think the issue is that the result set is empty so the table is coming back blank, then we're trying to figure out how wide it is | 20:57 |
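dhellmann's hypothesis is that the new auto-width code derives column widths from the rows themselves, so an empty result set leaves nothing to index. The sketch below is a hypothetical reproduction of that failure mode and one way to guard against it; it is not cliff's actual formatter code, and both function names are invented for illustration.

```python
from typing import List, Sequence


def naive_column_widths(rows: Sequence[Sequence[object]]) -> List[int]:
    """Hypothetical auto-width helper that learns the column count from rows[0].

    On an empty result set rows[0] raises IndexError("list index out of range"),
    the same message reported by `openstack server list` above.
    """
    return [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]


def safe_column_widths(headers: Sequence[str],
                       rows: Sequence[Sequence[object]]) -> List[int]:
    """Guarded variant: seed widths from the headers so zero rows is fine."""
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))
    return widths
```

Seeding from the headers means the table always has a defined shape, which is also why switching to a non-table formatter such as json or csv (as dhellmann suggests) sidesteps the error.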
ayoung | OK... is that on Launchpad as well? | 20:58 |
bnemec | prometheanfire: It looks like our pinned mirror is having issues. It's not your patch. | 20:58 |
dhellmann | ayoung : https://launchpad.net/python-cliff | 20:58 |
ayoung | yep | 20:58 |
ayoung | dhellmann, https://bugs.launchpad.net/python-cliff/+bug/1539770 | 20:59 |
openstack | Launchpad bug 1539770 in cliff "Empy set causing out of range error" [Undecided,New] | 20:59 |
prometheanfire | bnemec: neat | 20:59 |
dhellmann | ayoung : thanks | 20:59 |
dhellmann | ayoung : is there any way for you to provide the data set that's being fed into the formatter, to verify my hypothesis? maybe use a different formatter on the command line, like json? | 21:01 |
dhellmann | or csv or something | 21:01 |
bnemec | Now the docs job is failing on the fedora image change. | 21:01 |
trown | wtf | 21:02 |
bnemec | Not that it matters since apparently the mirror is dead. | 21:03 |
bnemec | At least everything is failing fast. | 21:03 |
bnemec | I've gotta take my silver linings where I can get them right now. | 21:03 |
ayoung | dhellmann, sure... | 21:03 |
ayoung | dhellmann, I just missed grabbing it from the output. Added to the bug report | 21:04 |
dhellmann | ayoung : thanks! | 21:05 |
ayoung | bnemec, pingtest failed for me with a cherrypick of the fedoraimage change: | 21:05 |
ayoung | | server1 | 585369f2-7a6e-4eba-be5d-708f2d0e4e46 | ResourceInError: resources.server1: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500" | CREATE_FAILED | 2016-01-29T20:47:10 | | 21:05 |
bnemec | ayoung: Did you deploy with --libvirt-type qemu? | 21:06 |
ayoung | bnemec, not explicitly. | 21:06 |
ayoung | bnemec, where do I make that change? | 21:06 |
bnemec | ayoung: You need to if you're going to test in a virt environment. | 21:06 |
bnemec | ayoung: Are you using tripleo.sh for the overcloud deploy? | 21:06 |
ayoung | bnemec, not quite | 21:07 |
ayoung | bnemec, I ran | 21:07 |
ayoung | openstack overcloud deploy --template /home/stack/tripleo-heat-templates/ | 21:07 |
ayoung | because I was testing the Keystone HTTPD change | 21:07 |
bnemec | ayoung: Okay, just add --libvirt-type qemu to the end of that. | 21:07 |
ayoung | k | 21:07 |
bnemec | openstack overcloud deploy --template /home/stack/tripleo-heat-templates/ --libvirt-type qemu | 21:07 |
ayoung | let me make sure keystone worked... | 21:07 |
ayoung | of course it did...duh | 21:08 |
ayoung | the commands I showed above were run against it | 21:08 |
ayoung | ok...let me teardown and retry | 21:08 |
*** jistr has quit IRC | 21:10 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Pin to mirror.centos.org https://review.openstack.org/274241 | 21:11 |
*** olap has quit IRC | 21:11 | |
bnemec | ^ at least has a chance of passing CI. Maybe. | 21:12 |
bnemec | It's got the rack to itself, so it should be super fast. :-) | 21:13 |
prometheanfire | bnemec: I know it's being worked on, but mind letting me know when I can run recheck? | 21:15 |
bnemec | prometheanfire: I wouldn't count on it happening before next week. :-( | 21:16 |
bnemec | Our CI pinned mirror is down, which is blocking testing of all the other patches. | 21:16 |
bnemec | Because of course it is. | 21:16 |
prometheanfire | lesigh | 21:20 |
prometheanfire | was really hoping this wouldn't get stuck in the mire that is openstack gating | 21:21 |
bnemec | Hmm, apparently we're still defaulting to the centos 7.0 image. That's gotta make our image builds take longer. | 21:22 |
trown | bnemec: not sure it would matter, we are upgrading what feels like every package even on the 7.1 image | 21:23 |
bnemec | Actually, I'm confused now. We're pointing at the current image, but the size of file we download doesn't match what I'm seeing. :-/ | 21:25 |
bnemec | Oh. | 21:25 |
* bnemec can't read | 21:25 | |
bnemec | Never mind. Complete false alarm. | 21:26 |
trown | hehe | 21:26 |
trown | at least it's friday | 21:26 |
ayoung | bnemec, Ummmm not sure if this is a step forward but: http://paste.openstack.org/show/485495/ | 21:31 |
bnemec | ayoung: Yeah, we need to change the default. Let me find the variable you need to set to fix that. | 21:32 |
ayoung | bnemec, but I can ping the vm....so....yes! | 21:32 |
bnemec | ayoung: export OVERCLOUD_PINGTEST_OLD_HEATCLIENT=0 | 21:32 |
bnemec | Heat client made a change that broke us. | 21:32 |
bnemec | trown: Can we just change the default on that since we changed the CI pin so by default we're deploying a client that doesn't work with the current default? | 21:33 |
bnemec | Default default default | 21:33 |
trown | bnemec: ya probably better to have it default to working on mitaka | 21:33 |
trown | bnemec: but what works on mitaka will not work on liberty and vice versa | 21:33 |
bnemec | Ugh. | 21:34 |
trown | y | 21:34 |
*** rcernin has joined #tripleo | 21:36 | |
*** jayg is now known as jayg|g0n3 | 21:37 | |
*** absubram has joined #tripleo | 21:41 | |
trown | bnemec: filed a heatclient bug for it, though I think it might be a cliff thing https://bugs.launchpad.net/python-heatclient/+bug/1539783 | 21:43 |
openstack | Launchpad bug 1539783 in python-heatclient "backwards-incompatible change to raw format output" [Undecided,New] | 21:43 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Detect when we need the alternate heat command https://review.openstack.org/274263 | 21:50 |
bnemec | trown: ^ | 21:50 |
bnemec | Seems to be working for me locally on the new heat client. | 21:50 |
trown | nice | 21:50 |
trown | that is way better | 21:50 |
bnemec | Error: Could not find dependency File[/etc/httpd/conf/ports.conf] for File[/etc/httpd/conf/httpd.conf] at /etc/puppet/modules/apache/manifests/init.pp:349 | 21:51 |
bnemec | OMFG | 21:51 |
bnemec | I give up. | 21:51 |
*** mkovacik has quit IRC | 21:53 | |
*** derekh has joined #tripleo | 21:53 | |
slagle | wut | 21:53 |
bnemec | My change to switch our pinned centos mirror just failed on that during the undercloud install. | 21:54 |
ayoung | bnemec, soooo can't run yet | 21:55 |
ayoung | Unable to create the flat network. Physical network datacentre is in use. | 21:55 |
ayoung | I should just tear that down by hand? | 21:56 |
ayoung | stack delete did not clean it up | 21:56 |
bnemec | ayoung: Yeah, you'll need to delete all of the stuff it had created. | 21:56 |
bnemec | Unfortunately that failure causes it to skip cleanup too. | 21:56 |
ayoung | ERROR: Property error: : resources.server1.properties.image: : Multiple physical resources were found with name (pingtest_image). | 21:57 |
ayoung | hmmmm | 21:57 |
bnemec | ayoung: Delete by UUID. | 21:57 |
ayoung | image? | 21:57 |
bnemec | You probably have two pingtest_image's in glance. | 21:57 |
ayoung | it still seems to be running... | 21:57 |
ayoung | kill it and delete? | 21:58 |
bnemec | ayoung: I don't like its odds of completing, so I'd say yes. | 21:58 |
ayoung | bnemec, should I delete the flavor too? | 22:00 |
bnemec | ayoung: No | 22:00 |
bnemec | That was actually created as part of the original deployment. | 22:01 |
ayoung | OK. | 22:01 |
ayoung | image is gone. network is gone. Anything else? | 22:01 |
*** pradk has quit IRC | 22:01 | |
bnemec | ayoung: Heat stack? | 22:01 |
ayoung | dead | 22:01 |
ayoung | I killed it all good | 22:01 |
bnemec | ayoung: Heat, glance, and neutron are the only things we clean up from the pingtest, so that should do it. | 22:02 |
ayoung | pinging the floating ips.... | 22:02 |
ayoung | hasn't died yet | 22:02 |
ayoung | cleanup | 22:02 |
ayoung | bnemec, success. | 22:05 |
ayoung | bnemec, so, is there something I can do to move this along? This is all awesome | 22:05 |
bnemec | \o/ Something is actually working for somebody. | 22:05 |
bnemec | ayoung: I have no idea. We have so many problems stacked up that are blocking CI that I have no idea when we'll be back in business. | 22:06 |
bnemec | The latest one seems to be that puppet-apache has broken us. | 22:06 |
openstackgerrit | xin wu proposed openstack/tripleo-heat-templates: Add extra config yaml files for big switch agents. https://review.openstack.org/271922 | 22:07 |
ayoung | bnemec, the hacks I needed were the fedora image, setting up with libvirt and the env var to get around the IP test | 22:07 |
ayoung | ok...I'm going to go be a dad now | 22:07 |
bnemec | ayoung: Have a good weekend | 22:07 |
openstackgerrit | xin wu proposed openstack/tripleo-heat-templates: Include big switch puppet modules for deploying overcloud https://review.openstack.org/271940 | 22:09 |
*** rlandy has quit IRC | 22:10 | |
openstackgerrit | xin wu proposed openstack/tripleo-heat-templates: Include big switch puppet modules for deploying overcloud https://review.openstack.org/271953 | 22:10 |
*** Marga_ has joined #tripleo | 22:12 | |
*** dprince has quit IRC | 22:12 | |
bnemec | Warning: Scope(Concat::Fragment[Listen 80]): The $ensure parameter to concat::fragment is deprecated and has no effect. | 22:25 |
bnemec | It's not deprecated if it already has no effect. | 22:25 |
derekh | bnemec: fact | 22:26 |
bnemec | Yep, the concat module just released. 99% sure that's what broke us: https://github.com/puppetlabs/puppetlabs-concat/commit/55cf9354bb02635e3a54d96cf22c4031751ef67a | 22:27 |
*** sbalukoff has quit IRC | 22:29 | |
derekh | bnemec: are you gonna pin it | 22:32 |
bnemec | derekh: Yeah, just working up the change now. | 22:33 |
*** trown is now known as trown|outttypeww | 22:33 | |
derekh | bnemec: ok, if the top of the chain passes we can probably just merge all 4(?) changes | 22:33 |
bnemec | derekh: I don't even know how many changes are needed. There are so many things wrong now that I've lost track. :-/ | 22:34 |
derekh | bnemec: If they haven't been merged tonight I'll get them merged in the morning | 22:34 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Pin puppet-concat https://review.openstack.org/274278 | 22:34 |
*** jdob has quit IRC | 22:35 | |
bnemec | There's a reason we don't do releases on Friday. | 22:36 |
* bnemec remembers that we just released tripleoclient earlier today | 22:37 | |
bnemec | That hadn't previously been released though, so it wasn't going to break anyone. | 22:38 |
derekh | bnemec: lets merge the centos mirror switch https://review.openstack.org/#/c/274241/1 | 22:40 |
derekh | bnemec: looks like its getting a lot further | 22:40 |
bnemec | derekh: Yeah, it at least got further than the other patches. | 22:40 |
derekh | bnemec: yup, merging | 22:40 |
openstackgerrit | Merged openstack-infra/tripleo-ci: Pin to mirror.centos.org https://review.openstack.org/274241 | 22:41 |
bnemec | The first run was a little slow, but once squid has cached all the things it should be better. | 22:41 |
*** thrash is now known as thrash|bbl | 22:41 | |
bnemec | I guess I can verify the concat change locally. It'll probably be faster than CI since my VM is already up and running. | 22:43 |
derekh | bnemec: ya, I think this may have been better http://mirrors.usc.edu/pub/linux/distributions/centos | 22:44 |
derekh | bnemec: based on some pings | 22:44 |
bnemec | derekh: Yeah, I don't know that I have access to the rack so I couldn't really tell what was fastest from there. | 22:44 |
derekh | bnemec: but as long as the proxy is doing its job it should make little difference | 22:44 |
bnemec | That's what I figured. | 22:45 |
prometheanfire | derekh: hi | 22:45 |
derekh | bnemec: yup, add yourself to the list to become an admin ;-) | 22:45 |
bnemec | Maybe we should add a ping to the mirror setting part of the script and fall back to other mirrors if one of them goes down. | 22:45 |
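bnemec's idea of probing the pinned mirror and falling back to alternatives (later proposed as "Provide fallback mirrors for centos pin") can be sketched as below. This is a hypothetical Python illustration, not the tripleo-ci change itself; it probes with a plain HTTP request rather than an ICMP ping, and the probe function is injectable so the selection logic can be tested without network access.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional


def http_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on `url` answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, DNS failures and timeouts;
        # ValueError covers malformed URLs.
        return False


def pick_mirror(mirrors: Iterable[str],
                probe: Callable[[str], bool] = http_reachable) -> Optional[str]:
    """Return the first mirror whose probe succeeds, or None if all of them fail."""
    for mirror in mirrors:
        if probe(mirror):
            return mirror
    return None
```

Usage, with the two mirrors mentioned in this discussion as example URLs: pick_mirror(["http://mirror.centos.org/centos/", "http://mirrors.usc.edu/pub/linux/distributions/centos/"]). Trying mirrors in a fixed preference order keeps the choice deterministic, which matters when a caching proxy such as squid sits in front of them.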
prometheanfire | derekh: was https://review.gerrithub.io/#/c/261404/ ever finished? | 22:45 |
derekh | bnemec: good plan | 22:45 |
bnemec | Damn, it still failed. Hopefully I just screwed up the override locally... | 22:46 |
derekh | prometheanfire: the packaging guys want to see a core review on your DIB patch first, so it's less likely it will majorly change | 22:47 |
prometheanfire | derekh: so my dib patch has to merge first? | 22:48 |
prometheanfire | it can't merge because it can't pass without that patch | 22:49 |
derekh | prometheanfire: no it doesn't, just needs some reviews, | 22:49 |
derekh | they're happy to merge the packaging change first | 22:50 |
derekh | just want to make it more likely they merge the correct thing | 22:50 |
prometheanfire | ya, it's this one https://review.openstack.org/#/c/273769/ | 22:50 |
prometheanfire | right? | 22:50 |
derekh | prometheanfire: that looks like it, was that part of a bigger patch yesterday? | 22:52 |
prometheanfire | yes, I split it | 22:52 |
prometheanfire | it's going to be hard to get core reviewers to look at it if it's failing | 22:52 |
prometheanfire | dunno why it's being made so hard | 22:52 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Pin puppet-concat https://review.openstack.org/274278 | 22:52 |
bnemec | Dammit, the pin was wrong. | 22:53 |
prometheanfire | so, dib cores, can you look at https://review.openstack.org/#/c/273769 please? | 22:53 |
bnemec | Okay, this version looks like it will work. It got past the previous error in my local run. | 22:54 |
derekh | prometheanfire: added myself to the review so it will be in my face when I come in next week, | 22:55 |
bnemec | Man, there isn't even a deprecation warning in the previous version of concat. | 22:55 |
prometheanfire | derekh: thanks, I don't know why this feels like it's moving so slow :( | 22:55 |
*** sbalukoff has joined #tripleo | 22:57 | |
derekh | prometheanfire: unfortunately there is a fairly big backlog of reviews in tripleo land, we need to make things better | 22:57 |
prometheanfire | ah, :( | 22:57 |
derekh | prometheanfire: at the moment trunk is broken anyway, so we're trying to get that back on track | 22:58 |
prometheanfire | lol | 22:58 |
derekh | bnemec: +2, I made the exact same mistake myself last week pinning something else | 23:00 |
derekh | bnemec: if you've tested locally should we push it through and recheck the other tests, so we have a chance of getting stuff back on track before the next thing crops up? | 23:01 |
bnemec | derekh: Yeah, it worked for me, and it can't possibly make things worse. :-) | 23:02 |
derekh | bnemec: don't, I used that line in my comment :-) | 23:03 |
*** penick has joined #tripleo | 23:03 | |
derekh | *done | 23:03 |
bnemec | What?! This was reported to them yesterday and they merged it anyway: https://tickets.puppetlabs.com/browse/MODULES-3018 | 23:03 |
bnemec | FFS | 23:03 |
* bnemec needs a table to flip | 23:04 | |
bnemec | Maybe several | 23:04 |
openstackgerrit | Merged openstack/tripleo-common: Pin puppet-concat https://review.openstack.org/274278 | 23:04 |
* derekh shakes his fist | 23:05 | |
derekh | bnemec: ok, I rechecked them, will take a look in the morning and see how things have moved on | 23:07 |
bnemec | derekh: Sounds good, thanks. | 23:07 |
derekh | bnemec: and thank you | 23:08 |
derekh | have a good weekend all | 23:08 |
*** derekh has quit IRC | 23:08 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Provide fallback mirrors for centos pin https://review.openstack.org/274287 | 23:17 |
bnemec | And with that, I need a drink. | 23:18 |
bnemec | Maybe several | 23:18 |
*** egafford has quit IRC | 23:41 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud https://review.openstack.org/273431 | 23:50 |