*** cdearborn has quit IRC | 00:04 | |
*** [1]cdearborn has joined #tripleo | 00:19 | |
*** social has quit IRC | 00:22 | |
*** bana_k has quit IRC | 00:29 | |
*** saneax is now known as saneax-_-|AFK | 00:30 | |
*** [1]cdearborn has quit IRC | 00:39 | |
*** bana_k has joined #tripleo | 00:45 | |
*** thrash is now known as thrash|g0ne | 00:50 | |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Migrate overcloud update to a mistral workflow https://review.openstack.org/381351 | 00:51 |
---|---|---|
*** tiswanso has joined #tripleo | 00:57 | |
*** limao_ has quit IRC | 01:03 | |
*** limao has joined #tripleo | 01:04 | |
*** limao has quit IRC | 01:07 | |
*** bana_k has quit IRC | 01:37 | |
*** bana_k has joined #tripleo | 01:40 | |
larsks | I'm seeing deployments fail with: [overcloud]: CREATE_FAILED Resource CREATE failed: Expression consumed too much memory | 01:54 |
*** bana_k has quit IRC | 01:54 | |
larsks | I am pretty sure this is not the undercloud itself running out of memory. | 01:54 |
larsks | Any hints as to what this error is referring to? | 01:55 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Include ceilometer in swift proxy pipeline https://review.openstack.org/371950 | 02:16 |
openstackgerrit | Emilien Macchi proposed openstack/puppet-tripleo: Clean out UI httpd configuration file https://review.openstack.org/380152 | 02:18 |
*** yamahata has quit IRC | 02:36 | |
*** dmacpher is now known as dmacpher-afk | 02:39 | |
*** myoung|bbl is now known as myoung | 02:44 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Cinder volume service is not managed by Pacemaker on BlockStorage https://review.openstack.org/381240 | 03:00 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Set ceph osd max object name and namespace len on upgrade when on ext4 https://review.openstack.org/379401 | 03:01 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: reload HAProxy config in HA setups when certificate is updated https://review.openstack.org/381374 | 03:04 |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Set ceph osd max object name and namespace len on upgrade when on ext4 https://review.openstack.org/381375 | 03:05 |
openstackgerrit | Merged openstack/python-tripleoclient: Remove another openstackclient import https://review.openstack.org/381247 | 03:07 |
*** sudipto has joined #tripleo | 03:08 | |
*** sudipto has joined #tripleo | 03:09 | |
*** sudipto_ has joined #tripleo | 03:09 | |
*** links has joined #tripleo | 03:10 | |
openstackgerrit | Tuan Luong-Anh proposed openstack/tripleo-specs: Fix a typo in documentation https://review.openstack.org/381379 | 03:13 |
*** sudipto has quit IRC | 03:23 | |
*** sudipto_ has quit IRC | 03:23 | |
openstackgerrit | RedHat RDO CI proposed openstack/tripleo-heat-templates: GATE TEST, please ignore https://review.openstack.org/365449 | 03:30 |
*** tiswanso has quit IRC | 03:45 | |
*** sudipto_ has joined #tripleo | 04:07 | |
*** sudipto has joined #tripleo | 04:07 | |
*** dmacpher-afk is now known as dmacpher | 04:09 | |
*** absubram has joined #tripleo | 04:17 | |
*** absubram_ has joined #tripleo | 04:18 | |
*** absubram has quit IRC | 04:21 | |
*** absubram_ is now known as absubram | 04:21 | |
*** tzumainn has quit IRC | 04:27 | |
*** saneax-_-|AFK is now known as saneax | 04:33 | |
*** pmannidi has quit IRC | 04:39 | |
*** pgadiya has joined #tripleo | 04:43 | |
*** pmannidi has joined #tripleo | 04:56 | |
*** rwsu has quit IRC | 04:59 | |
*** masco has joined #tripleo | 05:03 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Make keystone api network hiera composable https://review.openstack.org/380386 | 05:18 |
*** fultonj has quit IRC | 05:36 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 05:40 |
*** ccamacho has quit IRC | 05:43 | |
*** jaosorior has joined #tripleo | 05:46 | |
openstackgerrit | Merged openstack/puppet-tripleo: Clean out UI httpd configuration file https://review.openstack.org/380152 | 05:48 |
*** flaper87 has joined #tripleo | 05:49 | |
*** mbozhenko has joined #tripleo | 05:51 | |
*** yamahata has joined #tripleo | 05:59 | |
jaosorior | bandini: it was still failing pretty randomly regarding the redis password | 06:06 |
jaosorior | but at least today things look better | 06:06 |
jaosorior | I think the old package was still cached somewhere | 06:06 |
bandini | jaosorior: ah that would explain things. I did go wtf this morning ;) | 06:07 |
jaosorior | indeed | 06:07 |
*** numans has joined #tripleo | 06:11 | |
*** mbozhenko has quit IRC | 06:15 | |
*** lmiccini has joined #tripleo | 06:15 | |
*** bana_k has joined #tripleo | 06:18 | |
*** rasca has joined #tripleo | 06:26 | |
*** mbozhenko has joined #tripleo | 06:27 | |
*** jbadiapa has quit IRC | 06:29 | |
*** jbadiapa has joined #tripleo | 06:30 | |
*** jprovazn has joined #tripleo | 06:30 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: reload HAProxy config in HA setups when certificate is updated https://review.openstack.org/381374 | 06:30 |
*** hjensas has quit IRC | 06:30 | |
*** bana_k has quit IRC | 06:40 | |
*** rcernin has joined #tripleo | 06:43 | |
*** jlinkes has joined #tripleo | 06:45 | |
*** ccamacho has joined #tripleo | 06:47 | |
*** chem has joined #tripleo | 06:48 | |
*** jlinkes has quit IRC | 06:50 | |
openstackgerrit | Sharat Sharma proposed openstack/tripleo-common: Changed the link to home-page https://review.openstack.org/381443 | 06:51 |
*** tesseract- has joined #tripleo | 06:52 | |
openstackgerrit | Sharat Sharma proposed openstack/tripleo-validations: Changed the link to home-page https://review.openstack.org/381445 | 06:54 |
*** cylopez has joined #tripleo | 06:54 | |
*** jlinkes has joined #tripleo | 06:55 | |
*** b00tcat has joined #tripleo | 06:56 | |
*** yamahata has quit IRC | 06:56 | |
*** jlinkes_ has joined #tripleo | 06:59 | |
*** jlinkes_ has quit IRC | 06:59 | |
*** jlinkes has quit IRC | 07:00 | |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove usage of --compute-scale from the baremetal overcloud docs https://review.openstack.org/381454 | 07:14 |
*** b00tcat has quit IRC | 07:15 | |
*** b00tcat has joined #tripleo | 07:15 | |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove usage of --control-scale from the HA docs https://review.openstack.org/381458 | 07:17 |
*** fzdarsky has joined #tripleo | 07:21 | |
*** panda|Zz is now known as panda | 07:22 | |
*** dtantsur|afk is now known as dtantsur | 07:26 | |
*** abehl has joined #tripleo | 07:28 | |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove the replace controller docs https://review.openstack.org/381469 | 07:29 |
*** aufi has joined #tripleo | 07:31 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 07:32 |
*** hjensas has joined #tripleo | 07:35 | |
*** hjensas has quit IRC | 07:35 | |
*** hjensas has joined #tripleo | 07:35 | |
*** tremble has joined #tripleo | 07:37 | |
*** jlinkes has joined #tripleo | 07:37 | |
*** snecklifter has joined #tripleo | 07:44 | |
*** jpich has joined #tripleo | 07:44 | |
snecklifter | Morning #tripleo | 07:44 |
snecklifter | On my Mitaka undercloud node, keystone-manage token_flush doesn't appear to be reducing the db size. | 07:45 |
snecklifter | Any ideas? | 07:45 |
openstackgerrit | Michele Baldessari proposed openstack/tripleo-heat-templates: Change the rabbitmq ha policies during an M/N Upgrade https://review.openstack.org/381485 | 07:45 |
*** jtomasek|afk is now known as jtomasek | 07:51 | |
b00tcat | hi, I'm deploying overcloud-full and everything goes fine, but when the `deploy` command ends my controller is gone and only the compute node is up | 07:52 |
b00tcat | I can see that the controller is up at one point using `nmap` | 07:53 |
b00tcat | but then it's gone | 07:53 |
b00tcat | has anybody seen this before? | 07:53 |
b00tcat | I'm redeploying after a reboot to see if this helps.. | 07:53 |
*** jpena|off is now known as jpena | 07:53 | |
*** dbecker has quit IRC | 07:54 | |
openstackgerrit | Michele Baldessari proposed openstack/tripleo-heat-templates: Change rabbitmq queues HA mode from ha-all to ha-exactly https://review.openstack.org/381489 | 07:55 |
*** dbecker has joined #tripleo | 07:56 | |
*** yamahata has joined #tripleo | 07:56 | |
bandini | jaosorior: ok https://review.openstack.org/#/c/380665/ passed ci now (you were right something was probably cached somewhere). ok to +A it? | 07:56 |
*** jistr has joined #tripleo | 07:57 | |
*** mcornea has joined #tripleo | 07:57 | |
jaosorior | bandini: sure | 07:58 |
*** yamahata has quit IRC | 07:58 | |
jaosorior | bandini: done | 07:58 |
bandini | jaosorior++ thanks! | 07:58 |
*** rasca has quit IRC | 08:00 | |
*** hewbrocca-afk is now known as hewbrocca | 08:01 | |
*** radeks has joined #tripleo | 08:01 | |
*** zoli_gone-proxy is now known as zoliXXL | 08:02 | |
*** rwsu has joined #tripleo | 08:03 | |
*** amoralej|off is now known as amoralej | 08:03 | |
*** rasca has joined #tripleo | 08:03 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 08:06 |
skramaja | hello.. we got a new lab setup which we are trying to bring up for TripleO master. during the deploy process, the IPMI is able to boot the machine, the DHCP request is reaching the undercloud and an IP is being assigned to the overcloud node. But after acquiring DHCP, the overcloud node is trying for a TFTP connect and it is timing out. what i understand is that it should use http to boot the kernel and ipa ramdisk. | 08:07 |
skramaja | any idea whether TFTP is right and supposed to work, or is there any bios setting to use http only? | 08:07 |
*** yamahata has joined #tripleo | 08:07 | |
skramaja | dtantsur: jaosorior ccamacho ^ | 08:08 |
*** ohamada has joined #tripleo | 08:08 | |
dtantsur | skramaja, TFTP is needed to bootstrap iPXE, if your hardware does not support it natively | 08:09 |
ccamacho | jaosorior, hey man sorry for not making the reviews yesterday... Ill check your reviews now :) | 08:10 |
skramaja | dtantsur: TFTP connection is timing out. i tried tcpdump on port 69 there is no trace. any idea on how to debug the TFTP connection timeout? | 08:10 |
*** egafford has joined #tripleo | 08:11 | |
*** stendulker has joined #tripleo | 08:11 | |
jaosorior | ccamacho: don't worry about it, thanks for checking them out | 08:11 |
dtantsur | skramaja, hard to tell, maybe something about your hardware, maybe something about your networking | 08:11 |
dtantsur | skramaja, first ensure that you can access TFTP at least locally | 08:12 |
skramaja | dtantsur: yes. i did that locally and verified tftp is able to download the files successfully. | 08:12 |
*** yamahata has quit IRC | 08:13 | |
*** yamahata has joined #tripleo | 08:13 | |
bandini | jaosorior: can you take a peek at this one as well please? https://review.openstack.org/#/c/379586/ | 08:14 |
*** athomas has joined #tripleo | 08:15 | |
*** milan has joined #tripleo | 08:20 | |
marios | bandini: looks like there was more update at https://review.openstack.org/#/c/379586/ | 08:24 |
marios | bandini: going over review requests from yesterday | 08:24 |
ccamacho | jaosorior, just to double check From: http://jaormx.github.io/2016/how-is-tls-powered-by-certmonger-being-done/ | 08:24 |
ccamacho | Then check: https://review.openstack.org/#/c/366548/ and https://review.openstack.org/#/c/356430/ | 08:24 |
ccamacho | right?? | 08:24 |
bandini | marios: I addressed all the comments | 08:24 |
bandini | marios: or did I miss any? | 08:25 |
marios | bandini: no is fine i was just looking,.... jaosorior already +A it looks like anyway | 08:25 |
jaosorior | ccamacho: yep | 08:26 |
ccamacho | ack | 08:26 |
marios | bandini: i mean, i didn't expect changes is all but it looks fine | 08:26 |
bandini | marios: I addressed all the comments last night, so it should be good afaict | 08:26 |
b00tcat | is it possible to force a tripleo deployment to be sequential? a.k.a. first deploy the controller, then the compute? | 08:29 |
*** fzdarsky has quit IRC | 08:29 | |
*** tosky has joined #tripleo | 08:31 | |
*** akrivoka has joined #tripleo | 08:36 | |
*** yamahata has quit IRC | 08:37 | |
*** abehl has quit IRC | 08:39 | |
d0ugal | radeks: I have replied to your review here: https://review.openstack.org/#/c/381454 | 08:51 |
*** dtantsur is now known as dtantsur|bbl | 08:52 | |
*** abehl has joined #tripleo | 08:52 | |
*** mbozhenko has quit IRC | 09:00 | |
*** electrofelix has joined #tripleo | 09:00 | |
openstackgerrit | Julie Pichon proposed openstack/puppet-tripleo: Clean out UI httpd configuration file https://review.openstack.org/381568 | 09:03 |
openstackgerrit | Julie Pichon proposed openstack/puppet-tripleo: Use FallbackResource instead of Rewrite for UI https://review.openstack.org/381276 | 09:04 |
*** derekh has joined #tripleo | 09:04 | |
*** paramite has joined #tripleo | 09:04 | |
*** gfidente has joined #tripleo | 09:11 | |
*** mbozhenko has joined #tripleo | 09:11 | |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Move the opensuse mkinitrd script to the zypper element https://review.openstack.org/381574 | 09:11 |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add zypper-minimal element https://review.openstack.org/381575 | 09:11 |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add opensuse-minimal element https://review.openstack.org/381576 | 09:11 |
openstackgerrit | Merged openstack/puppet-tripleo: Fix the timeout for pacemaker systemd resources https://review.openstack.org/380665 | 09:12 |
*** abehl has quit IRC | 09:13 | |
saneax | dtantsur|bbl, trying to debug the tftp issue, it seems the undercloud is getting a request like this - | 09:15 |
saneax | 0.60.21.4.echo > 10.60.21.255.echo: [udp sum ok] UDP, length 43 | 09:15 |
saneax | 09:14:42.718948 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 71) | 09:15 |
saneax | should the undercloud reply for echo request for 10.60.21.255? | 09:15 |
skramaja | dtantsur|bbl: saneax and i are working on the same issue ^^ | 09:19 |
*** fzdarsky has joined #tripleo | 09:24 | |
*** abehl has joined #tripleo | 09:25 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 09:27 |
*** zoliXXL is now known as zoli|mtg | 09:28 | |
*** fzdarsky has joined #tripleo | 09:30 | |
openstackgerrit | Sharat Sharma proposed openstack/puppet-pacemaker: Changed the home-page to point Openstack Puppet Homepage https://review.openstack.org/381582 | 09:31 |
openstackgerrit | Carlos Camacho proposed openstack/python-tripleoclient: Add optional roles_data.yaml override https://review.openstack.org/378740 | 09:34 |
*** ramishra has quit IRC | 09:39 | |
*** ramishra has joined #tripleo | 09:42 | |
openstackgerrit | Michele Baldessari proposed openstack/puppet-tripleo: Fix the timeout for pacemaker systemd resources https://review.openstack.org/381584 | 09:42 |
*** shardy has joined #tripleo | 09:47 | |
*** dtantsur|bbl is now known as dtantsur | 09:50 | |
*** gfidente has quit IRC | 09:51 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 09:51 |
*** gfidente has joined #tripleo | 09:53 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 09:54 |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add zypper-minimal element https://review.openstack.org/381575 | 09:57 |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add opensuse-minimal element https://review.openstack.org/381576 | 09:57 |
*** jistr is now known as jistr|mtg | 09:59 | |
*** zigo has quit IRC | 10:01 | |
matbu | marios: hey man, i'm trying to understand the CI failure for the tripleoclient review: | 10:02 |
matbu | http://logs.openstack.org/47/379547/14/check/gate-tripleo-ci-centos-7-nonha-multinode-updates-nv/1ee38a4/console.html#_2016-10-03_15_17_44_153568 | 10:02 |
matbu | marios: i don't understand from where comes this user-files/97850658acf0327e3882d6e8e00a2f2d-net-config-multinode.yaml' | 10:03 |
matbu | marios: do you have an idea ? | 10:03 |
*** radeks has quit IRC | 10:03 | |
*** zigo has joined #tripleo | 10:04 | |
*** zigo is now known as Guest84780 | 10:05 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in generic.role.j2.yaml https://review.openstack.org/381593 | 10:05 |
dtantsur | saneax, skramaja, no idea what this echo request is, to be honest | 10:08 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 10:08 |
*** mbozhenko has quit IRC | 10:08 | |
dtantsur | saneax, skramaja, what is the hardware you're using? anything interesting about network topology? do you see TFTP requests coming to undercloud (port 69 IIRC) | 10:09 |
marios | matbu: looking | 10:09 |
marios | matbu: could it be something went wrong with the copied file? | 10:10 |
marios | matbu: i mean if /tmp/tmp.... is what you just pulled from swift with the change you made | 10:10 |
shardy | matbu: The additional -e foo.yaml files get added to the plan under a special user-files directory | 10:11 |
shardy | so I suspect the path prefix is broken or something, looking at the logs | 10:11 |
marios | matbu: shardy well there is an extra '/' in the path there [Errno 2] No such file or directory: '/tmp/tmpTjatcP/tripleo-heat-templates//user-files/97850658acf0327e3882d6e8e00a2f2d-net-config-multinode.yaml' | 10:12 |
marios | could it be that i wonder | 10:12 |
*** leanderthal|afk is now known as leanderthal | 10:12 | |
*** mbozhenko has joined #tripleo | 10:12 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 10:12 |
shardy | Is it an ordering thing, e.g you're downloading from the plan before user-files is created? | 10:12 |
matbu | marios: nop the extra / is not a pb | 10:12 |
shardy | you should be able to reproduce by passing a -e something_outside_tht.yaml | 10:13 |
matbu | shardy: perhaps, idk what is this user-file | 10:13 |
marios | matbu: getting you link it is from ci | 10:13 |
matbu | shardy: hm k, i already tried on my env, with an extra file outside of tht | 10:13 |
marios | https://github.com/openstack-infra/tripleo-ci/tree/master/heat-templates here matbu | 10:13 |
shardy | matbu: d0ugal can probably explain it - basically it's where we put the stuff resolved locally which doesn't exist in the plan created from --templates foo | 10:14 |
*** Guest84780 has quit IRC | 10:14 | |
shardy | I think it's fair to say all of this logic needs work to simplify, but I'll take a look and see if I can spot where the patch is breaking | 10:14 |
saneax | no dtantsur - the tcpdump shows a DHCP request which gives the host an IP address, after which we see two more requests - | 10:16 |
saneax | 04:44:42.149235 IP 10.60.21.81.newlixengine > rhbuildhost.nfv.rh.tftp: 35 RRQ "undionly.kpxe" octet blksize 1456 | 10:16 |
matbu | ack i'll try to reproduce it | 10:16 |
saneax | 04:45:18.175627 IP 10.60.21.81.newlixconfig > rhbuildhost.nfv.rh.tftp: 35 RRQ "undionly.kpxe" octet blksize 1456 | 10:16 |
matbu | marios: thx marios | 10:16 |
*** zigo_ has joined #tripleo | 10:16 | |
gfidente | thanks jaosorior :) | 10:17 |
matbu | i think i will use the heat templates from tripleo-ci. Maybe i missed something, it probably needs additional unit tests | 10:17 |
dtantsur | saneax, ok, you can grep journalctl over "in.tftp" and see what it says there | 10:18 |
jaosorior | gfidente: no biggie | 10:20 |
saneax | there is nothing for in.tftp :( | 10:21 |
saneax | except the normal xinetd startup logs | 10:22 |
saneax | need to run a more complete tcpdump on the undercloud and host machine | 10:23 |
*** hjensas has quit IRC | 10:24 | |
saneax | I will ping you back dtantsur | 10:24 |
saneax | dtantsur, just for my knowledge, generally if UEFI boot is enabled,the request should be a http, after the DHCP offer ? | 10:27 |
shardy | ccamacho: Hey, I notice you rebased https://review.openstack.org/#/c/378740/ | 10:28 |
shardy | I was planning to add some tests and a similar interface for plan create today, unless you're planning to do it? | 10:28 |
shardy | what I posted should work tho, I tested it locally | 10:29 |
shardy | if that would help share the load of the remaining composable roles patches, I can do that while you focus on the tht patches? | 10:29 |
openstackgerrit | Martin Mágr proposed openstack/puppet-tripleo: Deploy monitoring/logging agents sooner https://review.openstack.org/381604 | 10:31 |
ccamacho | shardy yeah! If you can help there would be awesome | 10:32 |
ccamacho | I notice that we are also missing the generation of the main template for custom roles | 10:33 |
shardy | ccamacho: ack, OK I'll focus on the tripleoclient stuff, and pull/test/review your tht patches | 10:33 |
shardy | ccamacho: yes, that's the final piece | 10:33 |
ccamacho | ack I already pushed the submission | 10:33 |
ccamacho | just testing it locally | 10:33 |
shardy | I think there's going to be a few ugly special-cases there to cope with inconsistent parameter naming etc | 10:33 |
shardy | so we can either leave the old per-role templates in place (and ignore generating them), or have a few j2 conditionals | 10:34 |
ccamacho | and the tripleo-common update as we should not replace i.e. compute.yaml with the generic one | 10:34 |
ccamacho | what im testing is to create a generic.role.j2.yaml | 10:34 |
shardy | yeah, but if we choose the do-not-replace option, we've got a problem for updates, as we won't know when to replace | 10:34 |
shardy | ccamacho: ack, I'll check out your patch and we can discuss further if needed | 10:35 |
shardy | thanks for picking it up :) | 10:35 |
ccamacho | sure | 10:35 |
ccamacho | np, let's hope we can land this before Thursday | 10:35 |
*** thrash|g0ne is now known as thrash | 10:37 | |
dtantsur | saneax, these are not quite related.. if your machine does not have iPXE ROM, it first has to download ipxe.efi (undionly.kpxe is for BIOS) | 10:39 |
saneax | dtantsur, so before it requests undionly.kpxe, it should request ipxe.efi? | 10:40 |
saneax | It will help if I get a reference doc I can dig more into it | 10:40 |
*** leanderthal has quit IRC | 10:41 | |
dtantsur | saneax, instead | 10:43 |
dtantsur | undionly.kpxe is for BIOS only, you don't need it for UEFI | 10:43 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in generic.role.j2.yaml https://review.openstack.org/381593 | 10:44 |
saneax | ok, clear now, and the url should be undercloud_ip:69/undionly.kpxe | 10:45 |
saneax | the url is working locally on the undercloud, however on the console for the remote server it's timing out | 10:45 |
*** rain has joined #tripleo | 10:45 | |
*** rain is now known as leanderthal | 10:46 | |
saneax | I can see a lot of udp echo requests landing on the undercloud with dest ip as the broadcast IP of the provisioning network | 10:46 |
saneax | undercloud is not replying | 10:46 |
jaosorior | shardy: seems that the commit fixing the hostnames is failing | 10:51 |
jaosorior | shardy: I believe it's because, now that mysql is binding on the right place, trying to do actions on it fails, since we need to specify the hostname to the provider | 10:52 |
jaosorior | it used to work cause it used to bind to ctlplane | 10:52 |
shardy | jaosorior: aha, I didn't see that locally as I didn't try with net-iso | 10:52 |
shardy | jaosorior: do you have a fix you can push? | 10:53 |
jaosorior | shardy: I don't, I just started looking into it a bit ago | 10:53 |
shardy | jaosorior: Ok, I was about to dig into it, but you've ahead of me in figuring it out, are you OK to go ahead and dig into the solution? | 10:54 |
shardy | s/you've/you're/ | 10:55 |
jaosorior | shardy: looking into it | 10:56 |
shardy | jaosorior: ack, thanks! | 10:57 |
*** rcernin has quit IRC | 10:57 | |
*** rcernin has joined #tripleo | 10:58 | |
*** jistr|mtg is now known as jistr | 11:00 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Use netapp_host_type instead of netapp_eseries_host_type https://review.openstack.org/363955 | 11:00 |
*** dprince has joined #tripleo | 11:01 | |
*** coolsvap is now known as coolsvap_ | 11:01 | |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-heat-templates: Use netapp_host_type instead of netapp_eseries_host_type https://review.openstack.org/381656 | 11:02 |
*** jpena is now known as jpena|lunch | 11:09 | |
*** radeks has joined #tripleo | 11:10 | |
jschlueter | larsks: you saw "[overcloud]: CREATE_FAILED Resource CREATE failed: Expression consumed too much memory" failure ... did you get any hints or ideas why it was failing? | 11:11 |
jschlueter | ccamacho: ^^ | 11:12 |
*** sudipto has quit IRC | 11:12 | |
jschlueter | we just saw it again today and was wondering if anyone else had hints as to why or what they did to resolve this issue. | 11:12 |
*** amoralej is now known as amoralej|lunch | 11:13 | |
ccamacho | jschlueter there was a patch to increase the heat memory | 11:13 |
shardy | ccamacho: Added comments to https://review.openstack.org/#/c/381593/ | 11:13 |
ccamacho | merged already | 11:13 |
*** sudipto has joined #tripleo | 11:13 | |
shardy | I like the general approach, but I think we can simplify things a bit | 11:13 |
jschlueter | ccamacho: ahh when did that get merged? | 11:13 |
jaosorior | shardy: so... the issue stems farther than I thought | 11:13 |
jschlueter | afazekas: ^^ | 11:13 |
shardy | https://github.com/openstack/instack-undercloud/commit/55ccd0e1e8c66c9f474112358063bc263720d84f | 11:13 |
shardy | jschlueter: ^^ | 11:13 |
ccamacho | shardy ack, good idea | 11:14 |
jaosorior | nevermind | 11:14 |
jschlueter | shardy: thanks | 11:14 |
* jaosorior keeps reading | 11:14 | |
*** ccamacho is now known as ccamacho|lunch | 11:14 | |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add zypper-minimal element https://review.openstack.org/381575 | 11:14 |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add opensuse-minimal element https://review.openstack.org/381576 | 11:14 |
*** pkovar has joined #tripleo | 11:15 | |
*** rhallisey has joined #tripleo | 11:15 | |
afazekas | jschlueter, shardy : thx | 11:16 |
*** zigo_ is now known as zigo | 11:23 | |
*** tzumainn has joined #tripleo | 11:27 | |
jschlueter | shardy, ccamacho|lunch: we are still seeing this in a CI job internally | 11:28 |
jschlueter | even with that patch applied | 11:28 |
shardy | jschlueter: is it a particularly large deployment? E.g much more than we test in upstream CI? | 11:29 |
jschlueter | shardy: actually a minimal 1 controller 1 compute 1 ceph deployment by InfraRed | 11:30 |
shardy | jschlueter: ack, Ok that's strange that we're not seeing consistent errors | 11:30 |
jschlueter | what should we be looking for to figure out what's causing the issue? | 11:31 |
jschlueter | what are the moving pieces that affect this? | 11:31 |
*** karthiks has quit IRC | 11:33 | |
larsks | shardy, I'm seeing the same error as jschlueter, in approx the same situation (1 controller/1 compute). | 11:33 |
d0ugal | matbu: sorry, I was at the vet - did you get the answer you needed about the user-file? | 11:33 |
shardy | larsks: Ok and you have the instack-undercloud patch? | 11:33 |
shardy | larsks: what t-h-t commit are you using? | 11:34 |
shardy | some more yaql stuff landed yesterday ... | 11:34 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: [WIP] Mistral API Documentation https://review.openstack.org/358685 | 11:36 |
larsks | shardy, blew away the environment yesterday after repeated failures (I ran into some possibly unrelated issues and decided I needed to start fresh). I will retry w/ current master as soon as I finish getting kids to school etc. | 11:37 |
shardy | larsks: ack - I've not hit this locally so it'd be good to figure out what is different | 11:37 |
shardy | one thing is I don't use tripleo-quickstart, so possibly there could be differences in the VM setup/specs | 11:37 |
EmilienM | hi | 11:38 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Do Not Merge - Test Undercloud upgrade mitaka -> newton https://review.openstack.org/381309 | 11:38 |
jschlueter | shardy: yes we have the patch and THT is at 4cdc4fc | 11:39 |
*** stendulker has quit IRC | 11:40 | |
jschlueter | shardy, larsks: we have a CI job that is hitting this issue if you need access to it let us know, or what to look for | 11:40 |
*** Goneri has quit IRC | 11:43 | |
openstackgerrit | James Slagle proposed openstack/tripleo-common: Centos images no longer require epel element https://review.openstack.org/368976 | 11:45 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove usage of --compute-scale from the baremetal overcloud docs https://review.openstack.org/381454 | 11:46 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove usage of --control-scale from the HA docs https://review.openstack.org/381458 | 11:47 |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: Remove usage of --control-scale from the scale roles docs https://review.openstack.org/381187 | 11:47 |
*** zoli|mtg is now known as zoliXXL | 11:48 | |
*** karthiks has joined #tripleo | 11:49 | |
jschlueter | shardy, larsks, ccamacho|lunch: what am I looking for here? this is a new debugging area for me, but I'm willing to learn and do the footwork to figure this out; I could use a pointer to where to start | 11:54 |
shardy | jschlueter: I'd start by either trying to reproduce locally, or testing with your CI job and a temporary patch which increases the limits more (ref the instack-undercloud patch I linked) | 11:55 |
shardy | either we didn't increase the limit enough, or there's something about your environment causing the queries to use more memory | 11:55 |
shardy | also it'd be good to figure out from the heat logs if we're hitting the memory quota or limit_iterators limit | 11:56 |
* shardy hasn't yet checked if the errors raised are different | 11:56 | |
jschlueter | shardy: thanks will attempt to see if an increase in that max will get it to work, but any tips to see what it was working on and why it was so big? | 11:57 |
shardy | jschlueter: No, that's probably something we could add to heat, e.g when raising an error include the query (or reference to where it was defined) | 11:58 |
shardy | jschlueter: I'd probably reproduce locally, then drop a line of debug in to the yaql function so we can see exactly which query blows up | 11:58 |
shardy | maybe we can rework it to be more efficient | 11:58 |
shardy | https://github.com/openstack/heat/blob/master/heat/engine/hot/functions.py#L1082 | 11:59 |
*** lucas-afk is now known as lucasagomes | 11:59 | |
jschlueter | shardy: ack got another person who ran into it as well | 12:00 |
shardy | jschlueter: that's where you could add some debug to see what's going on when it breaks | 12:00 |
jschlueter | shardy: thanks | 12:00 |
*** soc_off has joined #tripleo | 12:01 | |
shardy | jschlueter: I'm not sure if the context is included in the limit evaluation, but you might want to try https://review.openstack.org/#/c/381588/ if you can locally reproduce | 12:01 |
soc_off | shardy: I'll try that soon | 12:02 |
*** flepied has quit IRC | 12:04 | |
pradk | marios, Hi | 12:06 |
*** fultonj has joined #tripleo | 12:06 | |
*** karthiks has quit IRC | 12:06 | |
pradk | marios, saw your comments.. so yea our assumption to leverage hiera data from new templates is what led to this manifest | 12:06 |
marios | pradk: hey man, did you see the comments ? | 12:06 |
marios | pradk: yeah we get away with the stuff that was already there | 12:07 |
pradk | marios, i'm ok with adding the config params to the manifest | 12:07 |
pradk | if there is a way to access that value from manifest | 12:07 |
pradk | the problem is we dont have this data in mitaka | 12:07 |
pradk | as we used eventlet there | 12:07 |
marios | pradk: the things that fails isn't trying to reference the new hiera directly, but rather, assuming the config from the new templates is already in place (wsgi.conf for ceilo-api for example) | 12:07 |
pradk | marios, hmm i'm referring to the servername syntax issue | 12:08 |
marios | pradk: yeah we are saying same... the 'syntax' issue is because the server isn't set in the config file (empty) since we don't pass/set that from the hiera properly until the converge | 12:09 |
pradk | right | 12:09 |
pradk | we can pass the servername => blah to apache class | 12:09 |
pradk | but is there a way to get that value from the new templates | 12:10 |
*** jayg|g0n3 is now known as jayg | 12:10 | |
marios | pradk: i think we can only rely on any existing hiera at this point in the upgrade i.e. not directly by referencing the config_settings of https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceilometer-api.yaml#L74 | 12:11 |
jaosorior | pradk: aren't we already passing the servername to the vhost? | 12:11 |
pradk | yea but as marios says thats not accessible until converge apparently | 12:12 |
marios | pradk: so i think we can get to the values we need... am currently getting ready to push a sahara upgrades related change then i'll switch to mitaka branch and poke some more | 12:12 |
marios | jaosorior: problem is we want to set config here https://review.openstack.org/#/c/360004/16/extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp | 12:12 |
marios | jaosorior: which is mitaka --> newton ... the setup of ceilometer under wsgi like in https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceilometer-api.yaml#L74 for example, won't happen until the converge step since we don't apply puppet before | 12:13 |
jpich | jtomasek: FYI - https://bugs.launchpad.net/tripleo/+bug/1630222 . I don't know if this affects deployments, if you've run a successful deploy with these settings recently maybe you can confirm it's not a big problem :) | 12:13 |
openstack | Launchpad bug 1630222 in tripleo "Kernel / Ramdisk not set when registering nodes with the UI" [Medium,Triaged] | 12:13 |
jaosorior | pradk, marios: well, servername is not REALLY necessary for the vhost since for mitaka we're still accessing everything via IP addresses | 12:14 |
openstackgerrit | Dougal Matthews proposed openstack/python-tripleoclient: [WIP] Save the result of direct action calls in Mistral https://review.openstack.org/381739 | 12:15 |
jaosorior | wouldn't the servername be set in a later step once the puppet stuff runs? it should be fine | 12:15 |
pradk | jaosorior, problem is apache restart fails with syntax error | 12:15 |
pradk | on missing servername in conf file | 12:15 |
jtomasek | jpich: thanks for bringing this up, yeah it is a problem. I am quite confident that image names were defaulted in the action before | 12:16 |
pradk | jaosorior, https://paste.fedoraproject.org/442730/52346714/ | 12:16 |
jaosorior | ah crap | 12:16 |
jaosorior | why the hell are we passing an empty servername | 12:16 |
jpich | jtomasek: It's possible, I remember struggling around that in order to allow the '--no-deploy-images' to work on the CLI with Mistral, so it's possible we need the UI to add the default as well or that will break | 12:16 |
soc_off | shardy: increased mem limit didn't help, applying patch | 12:16 |
jaosorior | we're supposed to be using defaults, and if it's empty it should not even write it X_x | 12:17 |
shardy | soc_off: did you try increasing both limit_iterators and memory_quota? | 12:17 |
jaosorior | pradk, it's hacky... but what about giving a provisional servername coming from the $::hostname fact? (which is the actual default) | 12:17 |
soc_off | shardy: yep added 0 to both | 12:18 |
jaosorior | pradk: later that will be overwritten anyway | 12:18 |
shardy | hmm | 12:18 |
shardy | Ok, it's quite surprising that didn't fix it unless we've got a bad query blowing up huge memory usage somewhere | 12:18 |
shardy | (which I'd expect to see locally and in upstream CI) | 12:18 |
*** akshai has joined #tripleo | 12:18 | |
*** karthiks has joined #tripleo | 12:19 | |
soc_off | shardy: how do I find out what query is wrong? | 12:19 |
pradk | jaosorior, hmm so you're suggesting just set servername => $::hostname in the manifest itself and let converge override it | 12:20 |
jaosorior | pradk: pretty much | 12:20 |
shardy | soc_off: as I mentioned above, probably will need a line of debug added to the heat function, so you can dump out each query string just before it's evaluated | 12:20 |
pradk | that could work but i dont know how functional the api would be until the converge | 12:20 |
*** flepied has joined #tripleo | 12:20 | |
pradk | s/functional/reachable perhaps | 12:20 |
jaosorior | pradk: we either way (until now) access the nodes via the IPs | 12:20 |
shardy | https://github.com/openstack/heat/blob/master/heat/engine/hot/functions.py#L1081 | 12:21 |
jaosorior | so we don't really use the servername to route (yet) | 12:21 |
shardy | e.g log self._expression from there | 12:21 |
pradk | marios, ^^ what do you think of that | 12:21 |
jaosorior | only if you really modify your deployment (which I do) | 12:21 |
*** trown|outtypewww is now known as trown | 12:22 | |
openstackgerrit | Markos Chandras proposed openstack/diskimage-builder: Add opensuse-minimal element https://review.openstack.org/381576 | 12:22 |
marios | pradk: it could be something to fall back on if we can't get a better fix... but we definitely expect services to be up and 'normal' after the step (there could be time lapse of days between them) | 12:22 |
EmilienM | jaosorior, jpich, matbu, bandini, gfidente: we merged a bunch of patches recently, please make sure everything candidate for newton is backported in stable/newton, thanks | 12:23 |
marios | jaosorior: pradk i mean, assuming that is the only thing we need to set (or do we need to set all the config from https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceilometer-api.yaml#L74 | 12:23 |
jpich | EmilienM: Will do, thanks! | 12:23 |
jtomasek | jpich: ok, we don't allow to specify node images in GUI yet, so the fix is quite straightforward - hardcode the default image names in the workflow input in GUI, I can create a fix for that unless you want to do it | 12:24 |
jaosorior | marios: what do you mean? | 12:24 |
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: Make keystone api network hiera composable https://review.openstack.org/381754 | 12:24 |
marios | pradk: well.. the service name you're setting already in https://review.openstack.org/#/c/360004/16/extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp | 12:24 |
marios | jaosorior: i'm saying that we may need to pass all config at this stage like we do to setup the service under wsgi... it fails and we are discussing the servername one now... i was wondering about the rest of them | 12:25 |
EmilienM | jaosorior, shardy: do we need something else to make https://review.openstack.org/#/c/378764/ passing CI? | 12:26 |
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: Replace per role manifests with a common role manifest https://review.openstack.org/381757 | 12:26 |
jaosorior | well, most of them are set, only ones I'm not sure about are the host bindings | 12:26 |
jaosorior | EmilienM: I'm investigating that. I have a theory on why it fails. | 12:26 |
jaosorior | so it might need something more | 12:27 |
pradk | i think we'll need bind_host and servername | 12:27 |
shardy | EmilienM: yes, it's broken with network isolation, jaosorior is working on a fix | 12:27 |
pradk | rest can default | 12:27 |
EmilienM | shardy: ack. Also, can you look at https://review.openstack.org/#/c/379547/ ? | 12:27 |
marios | jaosorior: would appreciate any comments on the review esp your idea about using the $::hostname for now if needs be | 12:27 |
bandini | EmilienM: ack. am waiting for the rabbit folks to comment the last ones. If we do not get positive responses I will just drop them for newton | 12:28 |
panda | https://review.openstack.org/363674 is failing liberty and mitaka, but I don't think it's related to the change ? is something missing to merge it ? | 12:28 |
jpich | jtomasek: Yeah, I agree the fix doesn't look too complicated. If you have the fix ready to go, don't wait on me!! I have to leave for a couple of hours now but can look at it after the TripleO meeting otherwise | 12:28 |
shardy | EmilienM: yup, I'd like to test that one locally, on my todo list for this afternoon | 12:28 |
jaosorior | marios: done | 12:28 |
EmilienM | shardy: ack | 12:29 |
jtomasek | jpich: ok, if I manage to create one before that I'll let you know | 12:29 |
marios | jaosorior: thanks | 12:29 |
EmilienM | bandini: ok, let me know. Ideally, we need the patch merged by tomorrow | 12:29 |
bandini | EmilienM: yeah if it is not done today I will drop it most likely | 12:29 |
jpich | jtomasek: Cool, I'll test+review if that's the case! | 12:29 |
pradk | marios, if you look at my initial patch https://review.openstack.org/#/c/360004/1/extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp | 12:30 |
pradk | marios, i explicitly commented this out with a note | 12:30 |
EmilienM | bandini: k | 12:30 |
pradk | marios, then i was told hiera data is accessible | 12:30 |
pradk | marios, and we move towards that approach | 12:30 |
pradk | on another note, my undercloud upgrade seems to hang .. any known issues | 12:31 |
soc_off | shardy: so first off changing just undercloud hiera won't work as the heat yaql options will be ignored by puppet and second even with the patch to functions.py it fails. | 12:31 |
*** cylopez has left #tripleo | 12:31 | |
shardy | soc_off: why are the heat yaql options ignored by puppet? | 12:32 |
soc_off | shardy: because it does not set them, there is no support for them | 12:32 |
soc_off | shardy: let me check why it got ignored | 12:33 |
shardy | soc_off: Ok, so you don't have https://review.openstack.org/#/q/Id41001d74ce1008dbb5a98b962d5c53dbf39c903 | 12:33 |
hewbrocca | soc_off: you don't look off to me | 12:34 |
openstackgerrit | Merged openstack/tripleo-common: Default to Ironic API v1.15 https://review.openstack.org/364319 | 12:34 |
marios | pradk: my osp9 undercloud update was ok today (will share process sec) ... with the hiera, i think the new hiera is only available after we do a puppet run ... so the things set/passed from the https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceilometer-api.yaml for example, wouldn't be there if you originally deployed mitaka, when that file didn't exist (unless we were already passing those items in the templates but in other files previously) | 12:34 |
*** akshai has quit IRC | 12:35 | |
marios | pradk: i assume you are still on a downstream osp9 env? in which case this is what i did today for example http://paste.openstack.org/show/584187/ | 12:35 |
pradk | marios, i use upstream repos for mitaka not rhos | 12:36 |
pradk | this worked for me couple of weeks ago | 12:36 |
pradk | since yesterday undercloud upgrade has been hanging | 12:36 |
marios | pradk: ah ok, i thought you were using downstream setup. then matbu may have an idea about the current undercloud upgrade issue upstream? | 12:36 |
*** amoralej|lunch is now known as amoralej | 12:37 | |
soc_off | marios: we should provide all dependencies of puppet before running puppet in undercloud. update puppet, facter, hiera and puppet modules before running it | 12:38 |
soc_off | marios: something like https://review.openstack.org/#/c/328264/ | 12:38 |
*** jeckersb is now known as jeckersb_gone | 12:38 | |
marios | pradk: i don't think it came up yesterday so may be a new issue | 12:39 |
pradk | ok, i'll do another fresh one today and report back | 12:40 |
soc_off | shardy: https://review.openstack.org/#/c/381588/1/heat/engine/hot/functions.py didn't work but manually changing yaql options in hiera seems to have worked | 12:41 |
*** pgadiya has quit IRC | 12:41 | |
soc_off | *hiera=heat.cnf | 12:41 |
soc_off | shardy: so it's downstream too old puppet issue | 12:41 |
marios | soc_off: i know you've brought this up before and honestly i don't think it is something that we get for n..m ... do we have a spec/blueprint for it? we have the https://github.com/openstack/tripleo-specs/blob/master/specs/newton/undercloud-upgrade.rst for now... so that may be a good start | 12:42 |
*** jpena|lunch is now known as jpena | 12:42 | |
*** ccamacho|lunch is now known as ccamacho | 12:42 | |
marios | soc_off: i mean write something, maybe just a bug, doesn't have to be a spec, that says why you think we should do that | 12:42 |
hewbrocca | If you want to write a bug you need gfidente , right? | 12:43 |
*** yolanda has quit IRC | 12:44 | |
marios | hewbrocca: only if you need it written and well solved all at once | 12:44 |
hewbrocca | marios: seems like gfidente could save himself some effort by not writing the bugs in the first place | 12:46 |
*** yolanda has joined #tripleo | 12:46 | |
marios | speaking of italians, gfidente perhaps you know/can confirm what i wrote at https://review.openstack.org/#/c/360004/16 - first comment from today from me - the hiera keys we set on newton templates won't be available on the nodes until the post-deploy puppet config is first applied (ie converge) right? | 12:47 |
soc_off | shardy: so yaql options in heat fix the issue | 12:49 |
shardy | soc_off: ack, thanks for confirming | 12:49 |
soc_off | marios: you mean launchpad bug yes? | 12:50 |
*** jcoufal has joined #tripleo | 12:50 | |
mcornea | jaosorior: hey man, if you have a few minutes could you please have a look at https://bugs.launchpad.net/tripleo/+bug/1629098? it's related to haproxy server names, looks like they are always set to controllers even though they're using the correct ips | 12:53 |
openstack | Launchpad bug 1629098 in tripleo "The server names in haproxy configs do not match the ip addresses" [Undecided,New] | 12:53 |
EmilienM | pradk: https://review.openstack.org/#/c/371950/ | 12:54 |
EmilienM | swift is broken in your patch | 12:54 |
EmilienM | Oct 4 03:23:23 localhost swift-proxy-server: ImportError: No module named ceilometermiddleware.swift | 12:54 |
EmilienM | looks like packaging? | 12:55 |
jaosorior | mcornea: that is the case, and it is an issue | 12:55 |
jaosorior | mcornea: we need another fix | 12:55 |
jaosorior | mcornea: first we need to finish this one https://review.openstack.org/#/c/378764/8 | 12:55 |
jaosorior | mcornea: then we need to fix manifests/haproxy.pp since that's where it's hardcoded (it's been always like this unfortunately :/) | 12:56 |
*** shardy is now known as shardy_mtg | 12:56 | |
*** limao has joined #tripleo | 12:56 | |
pradk | EmilienM, is this the ha job? | 12:56 |
pradk | EmilienM, hmm packaging should be there | 12:57 |
mcornea | jaosorior: gotcha, is https://review.openstack.org/#/c/378764/8 ready for testing? I can try adding it to my environment | 12:57 |
*** limao_ has joined #tripleo | 12:58 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in generic.role.j2.yaml https://review.openstack.org/381593 | 12:58 |
EmilienM | pradk: yeah only HA job | 12:58 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Add Flag to Keep or Remove Sahara during controller upgrade https://review.openstack.org/375517 | 12:59 |
marios | jistr: tosky added a flag for now ^^^^ | 12:59 |
marios | jistr: tosky i think we need a launchpad bug for discussing what we will do here | 12:59 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 12:59 |
jaosorior | mcornea: it's not. It's failing on net-iso environments, which is what I'm trying to figure out at the moment | 12:59 |
openstackgerrit | Merged openstack/tripleo-quickstart: Customize undercloud and overcloud with virt-customize https://review.openstack.org/370114 | 12:59 |
pradk | EmilienM, packages are in the repo https://trunk.rdoproject.org/centos7/current/python-ceilometermiddleware-0.5.0-0.20161003160220.7f502e2.el7.centos.noarch.rpm | 12:59 |
*** sudipto_ has quit IRC | 13:00 | |
*** sudipto has quit IRC | 13:00 | |
tosky | oh | 13:00 |
pradk | EmilienM, perhaps puppet is not pulling the pkg down | 13:00 |
pradk | i'll investigate | 13:00 |
marios | jistr: tosky basically we can 'detect' if sahara is running during controller_pacemaker_1.sh because all the things are still running then. but by pacemaker_3.sh we've moved to next gen HA so we can't tell then if sahara was running earlier | 13:00 |
*** limao has quit IRC | 13:01 | |
EmilienM | pradk: I'm looking too | 13:01 |
marios | jistr: tosky so another thought was writing to file/signal between those two steps but that is ugly :/ | 13:01 |
tosky | I would kind of expect that you can read the old status before starting to upgrade it, and then you can't properly know the old status because it was upgraded :) | 13:01 |
*** akshai has joined #tripleo | 13:01 | |
marios | tosky: jistr: we'd ideally use 'enabled_services' from the newton templates, but default is for sahara to be off now so it wouldn't tell us if it was previously running (no enabled_services for mitaka) | 13:02 |
*** Goneri has joined #tripleo | 13:02 | |
jistr | marios, tosky: but reading the old status might not help, right? It's not so much whether sahara was running or not, as we know it *was* running in Mitaka. It's more about whether the user wants to keep it or not... | 13:02 |
*** akshai has quit IRC | 13:02 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/python-tripleoclient: Updated from global requirements https://review.openstack.org/375125 | 13:02 |
jistr | i.e. the gap between the Mitaka and Newton defaults | 13:02 |
*** yolanda has quit IRC | 13:02 | |
marios | jistr: tosky so tosky was trying to answer the 'can we just detect' which was what was asked of us to do | 13:03 |
tosky | right, we know that it was on | 13:03 |
*** paramite has quit IRC | 13:03 | |
tosky | marios: but as jistr pointed out, we know that from the beginning | 13:03 |
tosky | if you stripped out sahara, you did some magic custom post-config manipulation | 13:04 |
marios | tosky: jistr yeah was thinking some more... like right, so it would be not trivial to just remove it then ^^ | 13:04 |
marios | tosky: jistr so that assumption means we don't need to try and detect and doing the env file/flag like the current review is ok | 13:04 |
*** akshai has joined #tripleo | 13:04 | |
slagle | EmilienM: have a look at https://review.openstack.org/#/c/381790/ | 13:05 |
marios | tosky: jistr unless they manually stopped sahara for some reason but then they'd know to explicitly disable it from the docs i guess | 13:05 |
jistr | marios, tosky: yea i think so. What we're asking is inherently undetectable AFAICT, it's a user decision. | 13:05 |
jistr | yea it might be detectable in cases when the users did something special with their deployment, as tosky wrote earlier, but that probably can't be considered the general case | 13:06 |
EmilienM | slagle: yes Sir | 13:06 |
*** sudipto has joined #tripleo | 13:06 | |
tosky | and if they did it once, they know they should explicitly remove it with the new switch | 13:06 |
tosky | yep: if they know about it and they don't want anymore, they can force its removal now; if they don't care, default -> keep | 13:06 |
*** yolanda has joined #tripleo | 13:07 | |
jistr | +1 on default = keep | 13:07 |
*** [1]cdearborn has joined #tripleo | 13:07 | |
EmilienM | slagle: commented | 13:07 |
jistr | +2'd the patch | 13:08 |
marios | jistr: advantage of doing this https://review.openstack.org/#/c/375517/2/environments/major-upgrade-pacemaker.yaml is we should be able to reuse for converge (though may not help at that point, i mean we need to use the existing 'deploy sahara' environment file) | 13:08 |
slagle | EmilienM: doh, thx | 13:08 |
*** sudipto_ has joined #tripleo | 13:09 | |
marios | jistr: by 'this' i mean having a KeepSaharaServices in parameter_defaults from the controller step... will be persisted | 13:09 |
*** panda is now known as panda|afk | 13:10 | |
jistr | marios: yea we'll need some amount of docs around this whole issue simply b/c the Mitaka vs. Newton defaults differ, so one either needs to set the param on upgrade to false, or add the sahara env file on converge and beyond | 13:10 |
jistr | but i kinda like the explicitness here, e.g. in case user forgets about the whole issue, their sahara won't disappear just by itself, and they can still fix the issue later relatively easily (either stop it manually, or start passing the env file to start managing sahara configs properly) | 13:11 |
jistr | (depending on which way they want to go wrt sahara-ful vs. sahara-less deployment :) ) | 13:12 |
*** jeckersb_gone is now known as jeckersb | 13:12 | |
marios | my what a saharaful deployment ! | 13:12 |
EmilienM | slagle: modest +1 | 13:13 |
jistr | :) | 13:13 |
gfidente | EmilienM, so I realized ensure_packages is not safe if you call it multiple times with same list of packages | 13:15 |
gfidente | are there 'known' workarounds for that? | 13:15 |
gfidente | I read about creating a class which uses ensure_packages and include the class instead? | 13:15 |
*** akshai has quit IRC | 13:16 | |
openstackgerrit | Merged openstack/tripleo-docs: Update docs for undercloud upgrade https://review.openstack.org/368138 | 13:16 |
*** lblanchard has joined #tripleo | 13:16 | |
tosky | jistr, marios: are you saying that the default configuration will still require some work to reach a proper configuration? | 13:16 |
marios | tosky: jistr on converge we will need to explicitly enable the sahara services yes | 13:17 |
tosky | I kind of understand the need for being explicit (as python tries to enforce), but I'd still argue that the default configuration would lead to a working setup | 13:17 |
EmilienM | gfidente: why is it not safe? | 13:17 |
tosky | which matches the environment available before | 13:17 |
marios | tosky: this review is about whether we keep/remove the sahara services during the controller upgrade | 13:17 |
EmilienM | gfidente: like resource duplications? | 13:17 |
gfidente | resource duplications yes | 13:17 |
*** limao_ has quit IRC | 13:18 | |
gfidente | it's safe if it gets multiple times same package from a single call | 13:18 |
gfidente | but not from multiple calls | 13:18 |
*** limao has joined #tripleo | 13:18 | |
jaosorior | shardy_mtg: so... I tried running your patch and it actually worked on my local deployment... so that means my theory was wrong. Now I don't know why it's failing. | 13:19 |
jistr | tosky: it's sorta inevitable due to Newton being saharaless by default. So if we want to keep Sahara, we will need to start passing an env file on converge and beyond. And if we want to remove Sahara, we'd need to pass KeepSaharaOnUpgrade: false during the upgrade. | 13:19 |
openstackgerrit | Merged openstack/tripleo-quickstart: Update get-overcloud-nodes script https://review.openstack.org/374190 | 13:19 |
tosky | jistr: even if it is sahara-less by default, we are talking about an upgrade here, and an upgrade is from mitaka, no other possibilities, so it should be possible to have the env file for convergence by default | 13:20 |
jistr | tosky, marios: the only way i can see this could be changed is to default to `KeepSaharaOnUpgrade: false` so that the "remove sahara" use case is without work (== the upgrade by default converges to the Newton defaults), which shifts the work to the "keep Sahara" case | 13:20 |
*** akshai has joined #tripleo | 13:20 | |
tosky | which is against the suggested direction of "whatever was available before" | 13:20 |
*** tiswanso has joined #tripleo | 13:21 | |
jistr | well i don't think it's possible (within reasonable implementation limits) to include env files automatically based on what we're upgrading from... i mean even if we baked Sahara env into the converge by default (which would make "remove Sahara" case a bit more complicated again perhaps), the user would still need to start passing the sahara env file *after* the upgrade, with any `overcloud deploy` commands | 13:23 |
marios | jistr: the requirement we currently have is 'whatever was there previously' ... going strictly on what you can deploy with the mitaka templates, then we can assume it was default on, | 13:23 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 13:23 |
marios | jistr: we could try work out how to detect that if we really need to - detect at pacemaker_1 and signal to pacemaker_3 | 13:24 |
tosky | there is only one path from where we're upgrading from, that's my point | 13:24 |
openstackgerrit | Hironori Shiina proposed openstack/diskimage-builder: Fix a command in Developer Documentation https://review.openstack.org/381833 | 13:24 |
jistr | marios, tosky: regardless of what we do for upgrade, we'll still need to start passing the env file after the upgrade, unless we go and change Newton defaults to deploy Sahara by default | 13:25 |
openstackgerrit | Dougal Matthews proposed openstack/python-tripleoclient: [WIP] Save the result of direct action calls in Mistral https://review.openstack.org/381739 | 13:25 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 13:25 |
jistr | so we can't make Newton behave the same way as Mitaka did without passing the additional env file | 13:26 |
tosky | jistr: uhm, and really no possibility of working around this? | 13:26 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/puppet-tripleo: Make MySQL client default host be same as bind address https://review.openstack.org/381834 | 13:26 |
tosky | I guess no | 13:26 |
tosky | unfortunate | 13:26 |
marios | jistr: 'any stack update operation henceforth' right | 13:26 |
jistr | yea... | 13:26 |
marios | jistr: we need a 'resource_registry_defaults' :) | 13:27 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/tripleo-heat-templates: Select per-network hostnames for service_node_names https://review.openstack.org/378764 | 13:27 |
jistr | tosky: not so unfortunate though, as this means that all Newton deployments behave the same, regardless what they were upgraded from, which IMO should be prioritized over "i don't want to change the set of env files i pass in" | 13:27 |
jistr | marios: ^ | 13:27 |
tosky | jistr: all? So what happens to custom changes to the enabled services when you will migrate from N to O or later? | 13:28 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 13:28 |
marios | jistr: what do you mean? fresh newton deployment won't have sahara if they don't enable it explicitly during their deploy | 13:28 |
jistr | i mean "Does this Newton env have Sahara?" should be a question answerable by looking at what we pass into the deployment command, not by going over the history of that deployment :) | 13:28 |
jistr | tosky: as long as users keep passing their custom changes, they should persist | 13:29 |
marios | jistr: ah so if they have newton with sahara, regardless of from upgrade or from fresh, they would have to include the -e sahara henceforth forever and ever till evermore anyway | 13:29 |
jistr | marios: yea re "fresh newton deployment won't have sahara if they don't enable it explicitly during their deploy" -- that was actually my point. Upgraded deployments should behave the same. | 13:29 |
jistr | marios: exactly | 13:30 |
jistr | marios, tosky: if we don't stick to such approach, we'll go crazy just after a few releases | 13:30 |
jistr | essentially our compute vs. novacompute problem, scaled up :) | 13:30 |
tosky | so back to the initial point, with the difference that, if you don't specify -e sahara, after the upgrade process you should be able to recover it, while if you want to kill it you need to explicitly pass another env | 13:31 |
tosky | or configuration, or whatever it is relevant in this case | 13:31 |
marios | tosky: yeah another env if you like or just set the KeepSaharaOnUpgrade from the environments/major-upgrade-pacemaker.yaml for the controller upgrade | 13:32 |
openstackgerrit | Merged openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/378750 | 13:32 |
dprince | pradk: Hi, I replied on https://bugs.launchpad.net/tripleo/+bug/1629934 | 13:32 |
openstack | Launchpad bug 1629934 in tripleo "firewall rules defined in service templates missing on overcloud" [Critical,Incomplete] | 13:32 |
dprince | pradk: could you have another look at the report there? | 13:32 |
jistr | tosky: yea. Basically for both options you need to do an explicit action. We could make the "remove Sahara" case non-explicit too, but it would make it slightly more dangerous perhaps, and the explicitness wouldn't be totally gone, it would just shift to the "keep Sahara" case. | 13:32 |
jistr | i.e. if we default to KeepSahara: false, we'd need to pass KeepSahara: true when we want to keep it | 13:33 |
tosky | jistr: explicit, but keeping the service (minus the final convergence) should be still the default, so that you don't lose your data if you forgot -e sahara and you redeploy with it | 13:33 |
*** jaosorior has quit IRC | 13:34 | |
tosky | which means, if I get it correctly: please set KeepSahara true | 13:34 |
tosky | by default | 13:34 |
dprince | pradk: oh, wait. I think it isn't working still. | 13:34 |
jistr | tosky: yea +1, safer | 13:34 |
*** jaosorior has joined #tripleo | 13:35 | |
marios | jistr: tosky filing a launchpad bug to try capture some of this discussion and so we can track the patches (there may yet be something we need on converge too) | 13:35 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Use netapp_host_type instead of netapp_eseries_host_type https://review.openstack.org/381656 | 13:36 |
*** fultonj has quit IRC | 13:37 | |
jistr | marios, tosky: generally, anytime we want to change the default set of deployed services, we'll have this problem (especially painful if the new default means removing some service i think) | 13:37 |
*** fultonj has joined #tripleo | 13:38 | |
*** limao has quit IRC | 13:38 | |
*** limao has joined #tripleo | 13:39 | |
openstackgerrit | Jiri Tomasek proposed openstack/tripleo-ui: Deployment states optimization https://review.openstack.org/381845 | 13:39 |
pradk | dprince, if you look in one of our ci jobs.. the rules seems to be missing there too | 13:40 |
*** rodrigods has quit IRC | 13:40 | |
*** rodrigods has joined #tripleo | 13:40 | |
dprince | pradk: okay, I missed the last line of the bug ticket. You are saying that only the redis, mongo, ports are missing? | 13:40 |
EmilienM | dprince: why don't we enable firewall by default? I thought bnemec enabled it | 13:40 |
pradk | dprince, from what i could tell yea | 13:41 |
*** skramaja has quit IRC | 13:41 | |
dprince | pradk: I don't see those either. But I did see keystone | 13:41 |
EmilienM | I think we should enable environments/manage-firewall.yaml by default in our CI | 13:42 |
EmilienM | wdyt? | 13:42 |
pradk | dprince, it seems like whatever is defined in puppet-tripleo in haproxy config make it fine | 13:42 |
EmilienM | or set ManageFirewall to True by default | 13:43 |
dprince | EmilienM: step at a time. Let's fix what pradk is seeing first | 13:43 |
marios | tosky: jistr not sure how well the irc chat works there but fwiw https://bugs.launchpad.net/tripleo/+bug/1630247 | 13:43 |
openstack | Launchpad bug 1630247 in tripleo "sahara services during mitaka to newton upgrade" [Undecided,Triaged] - Assigned to Marios Andreou (marios-b) | 13:43 |
soc_off | marios: https://bugs.launchpad.net/tripleo/+bug/1630249 | 13:43 |
openstack | Launchpad bug 1630249 in tripleo " [instack-undercloud] updated puppet and its dependancies before running it" [Undecided,New] | 13:43 |
EmilienM | I remember bnemec switched ManageFirewall to be True by default | 13:43 |
EmilienM | and I see it false now | 13:43 |
EmilienM | someone changed it | 13:43 |
dprince | EmilienM: it may have gotten lost in the composable services somewhere | 13:45 |
jistr | marios: works as a reference in case someone (mainly other than us) needs or wants to re-live that discussion :)) thanks | 13:45 |
EmilienM | yeah | 13:45 |
dprince | EmilienM: I will re-enable it, sure. But first I'd like to fix what pradk is seeing too | 13:46 |
EmilienM | for history: bnemec enabled firewall by default in June: https://review.openstack.org/#/c/321833/ | 13:46 |
EmilienM | dprince: yeah, it makes sense! | 13:46 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Add Flag to Keep or Remove Sahara during controller upgrade https://review.openstack.org/375517 | 13:47 |
marios | jistr: plus we will need to get this to stable/newton so will need a bug# for it | 13:47 |
jistr | right | 13:47 |
jaosorior | shardy_mtg: nevermind, it didn't fail cause the update didn't restart galera. so it still has the old config | 13:47 |
dprince | pradk: so I guess what I'd like to know from you is did you manually set ManageFirewall to true? Or use the environments/manage-firewall.yaml? | 13:48 |
tosky | jistr, marios: not only for other people, also for us (at least for me and my memory) | 13:49 |
*** links has quit IRC | 13:49 | |
pradk | dprince, whatever is the default, i didn't change anything explicitly .. i just use the same workflow our ci jobs use | 13:51 |
dprince | pradk: okay, so that is part of the problem. But I did enable it. I saw some firewall rules, not from haproxy.pp (see the rules.txt I linked in the bug) | 13:52 |
dprince | pradk: but I'm still not seeing mongo rules for whatever reason. So I'm looking into that now | 13:52 |
dprince | pradk: no, I see them | 13:52 |
pradk | dprince, ok did redis show up ? | 13:52 |
marios | soc_off: thanks | 13:52 |
*** jprovazn has quit IRC | 13:53 | |
dprince | pradk: no redis rules though | 13:53 |
dprince | pradk: so just redis is missing. but I'm seeing everything else | 13:53 |
* dprince checks hiera | 13:54 | |
pradk | dprince, so in our ci we don't enable ManageFirewall either i presume? | 13:54 |
openstackgerrit | Jiri Tomasek proposed openstack/tripleo-ui: Add image names to Nodes registration workflow https://review.openstack.org/381855 | 13:54 |
dprince | pradk: exactly, I will push a patch to re-enable that | 13:54 |
pradk | understood | 13:54 |
dprince | pradk: but I did use it... and I'm still missing redis? | 13:54 |
jtomasek | jpich: ^^^^ | 13:56 |
jpich | jtomasek: Awesome, thank you! | 13:56 |
*** panda|afk is now known as panda | 13:57 | |
jtomasek | jpich: I thought gerrit is supposed to update the launchpad bug when a patch is sent for it but for some reason it is not happening | 13:57 |
EmilienM | meeting in 2 min | 13:58 |
jtomasek | jpich: oh, now it's there... | 13:58 |
jpich | jtomasek: It did :) | 13:58 |
jpich | jtomasek: Will test and review now | 13:58 |
jtomasek | jpich: thanks | 13:58 |
tosky | marios: bandini told me about an upgrade statement for the sahara database which is currently commented, did you touch it so far and/or is it relevant for the patch above? | 13:59 |
EmilienM | pradk: we did | 13:59 |
dprince | pradk: got it, it is a bug in the pacemaker template for redis | 13:59 |
EmilienM | but someone disabled it, i'll find which commit to understand why | 13:59 |
*** coolsvap_ is now known as coolsvap | 13:59 | |
*** limao_ has joined #tripleo | 14:00 | |
marios | tosky: ah yes thanks for reminder bandini lemme get you a pointer tosky sec | 14:00 |
soc_off | marios: one thing about it, I'm not sure where to actually put the update, my patch puts it into element but maybe we could consider tripleo-client (dunno) or even just making it package dependency of instack-undercloud so if you update that you'll get updated puppet and opm | 14:01 |
soc_off | marios: package dep seems to be cleaner to me but I don't have technical argument for that | 14:01 |
marios | tosky: https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L68 | 14:01 |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Re-enable ManageFirewall by default. https://review.openstack.org/381864 | 14:01 |
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted to address performance degradation and should return momentarily | 14:02 | |
tosky | marios: that statement is enabled in the grenade jobs which upgrade sahara, so I would say that it should be enabled | 14:02 |
*** limao has quit IRC | 14:03 | |
marios | tosky: ack will include it with https://review.openstack.org/#/c/375517/ i mean uncomment it there | 14:03 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/tripleo-common: Updated from global requirements https://review.openstack.org/375997 | 14:03 |
tosky | marios: thanks! | 14:04 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in generic.role.j2.yaml https://review.openstack.org/381593 | 14:04 |
openstackgerrit | Marios Andreou proposed openstack/tripleo-heat-templates: Add Flag to Keep or Remove Sahara during controller upgrade https://review.openstack.org/375517 | 14:05 |
*** shardy_mtg is now known as shardy | 14:06 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Include redis/mongo hiera when using pacemaker https://review.openstack.org/381869 | 14:06 |
dprince | pradk, EmilienM: ^^ | 14:06 |
EmilienM | dprince: I'll look after meeting | 14:07 |
dprince | EmilienM: also, this https://review.openstack.org/#/c/381864/ | 14:07 |
*** morazi has quit IRC | 14:07 | |
openstackgerrit | Merged openstack/tripleo-quickstart: Revert "Return to using ping test in minimal jobs" https://review.openstack.org/379545 | 14:07 |
pradk | dprince, nice catch | 14:07 |
dprince | pradk: well, you caught the symptom here ;) | 14:08 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in generic.role.j2.yaml https://review.openstack.org/381593 | 14:08 |
pradk | :) | 14:08 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 14:10 |
dprince | bnemec: hi, so per https://review.openstack.org/#/c/381864/ should we actually just remove environments/manage-firewall.yaml? | 14:12 |
*** gfidente has quit IRC | 14:13 | |
dprince | bnemec: it doesn't hurt but setting this default makes that file moot | 14:13 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 14:13 |
*** masco has quit IRC | 14:14 | |
openstackgerrit | Dan Prince proposed openstack/puppet-tripleo: Cleanup the firewall logic. https://review.openstack.org/381875 | 14:14 |
bnemec | dprince: Yeah, that probably makes sense. | 14:15 |
dprince | bnemec: okay, so update this patch? Or do it in a separate one? | 14:16 |
dprince | bnemec: I'm inclined to add it to the patch... | 14:16 |
*** jeckersb is now known as jeckersb_gone | 14:18 | |
bnemec | dprince: I'm fine either way. | 14:19 |
jaosorior | bandini: in the undercloud, where is mysql listening on? is it only on localhost? | 14:19 |
*** tosky has quit IRC | 14:20 | |
dprince | bnemec: patch updated. | 14:20 |
*** jeckersb_gone is now known as jeckersb | 14:21 | |
*** tosky has joined #tripleo | 14:21 | |
*** mcornea has quit IRC | 14:21 | |
bandini | jaosorior: don't think so. the only reason we are somewhat protected is because of iptables on the undercloud not because of properly listening to interfaces | 14:22 |
*** mcornea has joined #tripleo | 14:23 | |
jaosorior | ok | 14:23 |
bandini | jaosorior: lemme check though | 14:23 |
bandini | jaosorior: no it actually binds to the br-ctlplane (via /etc/my.cnf.d/galera.cnf). iptables filters that off though | 14:24 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/instack-undercloud: Set MySQL bind-address via parameter https://review.openstack.org/381887 | 14:25 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/puppet-tripleo: Make MySQL client default host be same as bind address https://review.openstack.org/381834 | 14:25 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/tripleo-heat-templates: Select per-network hostnames for service_node_names https://review.openstack.org/378764 | 14:26 |
*** limao_ has quit IRC | 14:30 | |
*** apetrich has quit IRC | 14:34 | |
*** apetrich has joined #tripleo | 14:34 | |
*** dmacpher has quit IRC | 14:37 | |
*** dmacpher has joined #tripleo | 14:38 | |
*** rook_ is now known as rook | 14:39 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 14:39 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 14:40 |
*** numans has quit IRC | 14:42 | |
hrybacki | ayoung: I've got this deploying locally and passing tests (not that it had any reason to break them) mind reviewing my updates? https://review.openstack.org/#/c/315749/ | 14:44 |
ayoung | ++ | 14:44 |
EmilienM | dprince: ok looking now. | 14:44 |
ayoung | hrybacki, the "false" values in there are for debugging, right? | 14:45 |
ayoung | changed_when: false | 14:45 |
EmilienM | dprince: thanks :) | 14:45 |
*** ayoung has quit IRC | 14:46 | |
*** ayoung has joined #tripleo | 14:46 | |
hrybacki | ayoung: not sure -- I didn't add those. sounds like clever stuff from adaraz | 14:46 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Downloads templates from swift before processing update https://review.openstack.org/381899 | 14:48 |
thrash | d0ugal: ^^^^ | 14:48 |
trown | ayoung: that is an ansible thing so that we don't need to do "ignore errors" and such... makes ansible-lint happy | 14:48 |
thrash | d0ugal: untested. But theoretically... :) | 14:48 |
ayoung | trown, ah, ok | 14:48 |
thrash | d0ugal: i'm testing now. | 14:48 |
d0ugal | rbrady: FYI ^ | 14:48 |
hrybacki | trown: interesting, are we pushing that as standard across oooq? | 14:49 |
d0ugal | thrash: cool, I'll test it too | 14:49 |
rbrady | d0ugal, thrash: taking a look | 14:49 |
trown | hrybacki: we have gates for ansible-lint, so ya :) | 14:49 |
hrybacki | trown: are they catching extra roles as well? /me wonders | 14:50 |
rhallisey | thrash, there's already a patch for that: https://review.openstack.org/#/c/379547/ | 14:50 |
trown | hrybacki: doubt it, they are openstack-infra gates | 14:50 |
rhallisey | just an fyi | 14:50 |
hrybacki | trown: ack, thanks for the intel :) | 14:50 |
thrash | rhallisey: that doesn't cover the update scenario | 14:51 |
rhallisey | thrash, gotcha | 14:52 |
thrash | rhallisey: but the code is pretty much the same. :) | 14:53 |
rhallisey | whatever works :) | 14:53 |
EmilienM | dprince: I'm adding tripleo/rc3 gerrit topic to your firewall fixes | 14:53 |
d0ugal | rhallisey: lol, at this point that is all that matters. | 14:53 |
EmilienM | dprince: they sound critical to me to have, right? | 14:53 |
EmilienM | dprince: or no? | 14:53 |
rhallisey | d0ugal, :D | 14:53 |
EmilienM | my hope is we don't add a new regression in upgrades but I don't think so | 14:53 |
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: j2 template per-role ServiceNetMapDefaults https://review.openstack.org/381902 | 14:55 |
*** mbozhenko has quit IRC | 14:56 | |
openstackgerrit | Steven Hardy proposed openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/381903 | 14:56 |
d0ugal | thrash: it seems to just be hanging | 14:59 |
EmilienM | dprince: so it sounds like shardy removed the global parameter which was True by default https://review.openstack.org/#/c/347050/ | 14:59 |
EmilienM | dprince: but the parameter was False in puppet/controller.yaml | 14:59 |
EmilienM | which is the original reason why we have it to False since this time | 15:00 |
thrash | d0ugal: did you pass -i? | 15:00 |
EmilienM | dprince: so we have been disabling firewall on controller for 9 weeks iiuc | 15:00 |
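The regression EmilienM describes can be modeled as a parameter lookup in which a value passed down by the parent template wins over the nested template's own default. The sketch below is a simplified model of that behavior, not Heat's implementation: the function name `effective_value` and the dictionaries are hypothetical, and only `ManageFirewall` and `puppet/controller.yaml` come from the discussion.

```python
# Simplified model (assumption): a nested template uses the value its parent
# passes in, and falls back to its own declared default otherwise.

def effective_value(name, passed_by_parent, nested_defaults):
    """Return the value a nested template would see for parameter `name`."""
    if name in passed_by_parent:
        return passed_by_parent[name]
    return nested_defaults[name]

# Default declared in the nested template (puppet/controller.yaml per the log).
controller_defaults = {"ManageFirewall": False}

# Before: the parent-level parameter (True by default) was passed down.
before = effective_value("ManageFirewall", {"ManageFirewall": True}, controller_defaults)

# After: the parent-level parameter was removed, so nothing is passed down
# and the nested default silently takes over.
after = effective_value("ManageFirewall", {}, controller_defaults)

print(before, after)
```

Under this model, removing the parent-level parameter changes the effective value without any edit to the nested template, which would match the firewall quietly ending up disabled on the controller.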
openstackgerrit | Tomas Sedovic proposed openstack/tripleo-validations: Validate the instackenv format https://review.openstack.org/353956 | 15:00 |
*** rajinir has joined #tripleo | 15:01 | |
shardy | ouch :( | 15:01 |
*** r-mibu has quit IRC | 15:04 | |
*** r-mibu has joined #tripleo | 15:04 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 15:04 |
*** lmiccini has quit IRC | 15:05 | |
dprince | EmilienM: yep, plus there was a different regression affecting mongo, and redis | 15:05 |
EmilienM | nice catch 2 days before RC3 :P | 15:05 |
EmilienM | just don't tell anyone please | 15:05 |
dprince | EmilienM: that was what confused me about the ticket | 15:05 |
dprince | EmilienM: FWIW, I missed we enabled them by default (I was using that environment file) | 15:06 |
dprince | EmilienM: so I saw these working... again confusing | 15:06 |
EmilienM | yeah, the puppet-tripleo managed rules work by default | 15:07 |
EmilienM | is it expected? | 15:07 |
dprince | EmilienM: if we had removed that environment file when we enabled it I might have caught it sooner | 15:07 |
*** electrofelix has quit IRC | 15:07 | |
dprince | EmilienM: but probably not the redis and mongo issue. That was different | 15:07 |
*** electrofelix has joined #tripleo | 15:07 | |
jaosorior | dprince: what was the issue with redis and mongo? | 15:08 |
dprince | jaosorior: when using pacemaker their firewall rules wouldn't get set, because the pacemaker templates extended the wrong base serices | 15:08 |
dprince | serices | 15:08 |
dprince | services | 15:08 |
* dprince apparently my left index finger isn't typing 'v's today | 15:09 | |
dprince | jaosorior: https://review.openstack.org/#/c/381869/ | 15:10 |
dprince | jaosorior: just look at the patch. It is pretty simple | 15:10 |
*** rcernin has quit IRC | 15:10 | |
jaosorior | dprince: it is | 15:11 |
*** tremble has quit IRC | 15:12 | |
*** yamahata has joined #tripleo | 15:15 | |
*** lucasagomes is now known as lucas-hungry | 15:16 | |
*** jlinkes has quit IRC | 15:17 | |
*** paramite has joined #tripleo | 15:17 | |
jaosorior | shardy: I need some help with the hostnames CR | 15:18 |
d0ugal | thrash: no, I didn't lol | 15:18 |
jaosorior | shardy: I'm figuring out what's up with the ha failure, but the nonha gate fails because of ceph for some reason | 15:18 |
d0ugal | thrash: trying with that now. | 15:18 |
jaosorior | shardy: http://logs.openstack.org/64/378764/8/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/afd9ef2/logs/postci.txt.gz | 15:19 |
thrash | d0ugal: it still should have returned though... | 15:19 |
thrash | d0ugal: make sure you run with --debu | 15:19 |
thrash | *debug | 15:19 |
d0ugal | thrash: how long should it take? | 15:21 |
d0ugal | thrash: What command are you running exactly? | 15:21 |
*** panda is now known as panda|bbl | 15:21 | |
thrash | d0ugal: I'm just getting to it now. I suppose passing the environments is not completely necessary with a plan?? | 15:22 |
thrash | d0ugal: or does it need to maintain that? | 15:22 |
thrash | d0ugal: I guess we do since I put in the create or update plan part... | 15:22 |
*** akshai has quit IRC | 15:24 | |
*** saneax is now known as saneax-_-|AFK | 15:24 | |
thrash | d0ugal: openstack overcloud update stack --templates <path> -i -e <environments> overcloud | 15:27 |
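The command thrash quotes can be assembled programmatically when the template path and environment list vary between runs. This is a minimal sketch that only builds the argument list and does not execute anything; the helper name `build_update_command` and the example paths are hypothetical, and repeating `-e` once per environment file is an assumption about how `<environments>` expands.

```python
import shlex

def build_update_command(templates_path, environments, stack="overcloud"):
    """Assemble the argv for the update command quoted above (not executed)."""
    argv = ["openstack", "overcloud", "update", "stack",
            "--templates", templates_path, "-i"]
    for env in environments:
        # Assumption: one -e flag per environment file.
        argv += ["-e", env]
    argv.append(stack)
    return argv

# Placeholder paths, for illustration only.
cmd = build_update_command("/path/to/templates", ["env-one.yaml", "env-two.yaml"])
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the result is later handed to something like `subprocess.run`.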
*** aufi has quit IRC | 15:28 | |
d0ugal | thrash: k, so, since I just deployed initially with "openstack overcloud deploy --templates" I am trying "openstack overcloud update stack overcloud --templates -i" | 15:28 |
d0ugal | thrash: and I get this error... | 15:28 |
d0ugal | thrash: http://paste.openstack.org/show/584240/ | 15:28 |
d0ugal | thrash: oh, maybe I need to install a newer tripleo-common? | 15:29 |
thrash | d0ugal: would you? | 15:29 |
thrash | d0ugal: maybe... | 15:29 |
thrash | d0ugal: mine seems to be working... | 15:29 |
d0ugal | thrash: I remember some mergepy references were removed recently | 15:29 |
thrash | d0ugal: that could do it. | 15:29 |
shardy | ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout | 15:30 |
shardy | jaosorior: Hmm, not sure but I assume something got broken in the ceph service hieradata which does consume the hostname, looking | 15:30 |
thrash | d0ugal: nope. I hit that too... | 15:30 |
thrash | d0ugal: maybe tripleo-common does need to be fixed too. | 15:30 |
thrash | d0ugal: let me look at the UpdateManager code. | 15:31 |
thrash | d0ugal: constants.TEMPLATE_NAME | 15:32 |
thrash | should I change that reference to OVERCLOUD_YAML_NAME and get rid of the TEMPLATE_NAME? | 15:32 |
*** akshai has joined #tripleo | 15:32 | |
*** mcornea has quit IRC | 15:32 | |
d0ugal | thrash: https://review.openstack.org/#/c/375540/ | 15:32 |
*** leanderthal is now known as leanderthal|afk | 15:33 | |
d0ugal | thrash: I thought it had landed already | 15:33 |
thrash | d0ugal: ahhh | 15:33 |
thrash | yep. that'd do it. | 15:33 |
shardy | jaosorior: I assume it means ceph_mon_node_names is now pointing at a different network because CephMonNetwork is mapped to storage, not ctlplane | 15:33 |
* shardy looks around for gfidente | 15:33 | |
thrash | d0ugal: I'll add a depends-on | 15:33 |
jaosorior | shardy: and the mysql failure makes no sense either... it actually seems to be working on my deployment | 15:34 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Downloads templates from swift before processing update https://review.openstack.org/381899 | 15:34 |
shardy | jaosorior: and you're deploying with net-iso enabled? | 15:34 |
openstackgerrit | Juan Antonio Osorio Robles proposed openstack/tripleo-heat-templates: Select per-network hostnames for service_node_names https://review.openstack.org/378764 | 15:34 |
jaosorior | shardy: yes | 15:34 |
jaosorior | shardy: maybe we're missing some iptables setting? | 15:35 |
*** flepied1 has joined #tripleo | 15:35 | |
jaosorior | shardy: do we set anything for that? or in CI? | 15:35 |
*** flepied has quit IRC | 15:37 | |
shardy | jaosorior: well yes, but I'm not sure how your local tests work if the problem is with the firewall config | 15:38 |
shardy | EmilienM, dprince: the patch you were referring to earlier, where the firewall was disabled due to my patch, did the fix for that just land? | 15:38 |
shardy | and/or can you point to how per-network firewall rules are set up? | 15:38 |
dprince | shardy: https://review.openstack.org/#/c/381864/ | 15:39 |
shardy | we've been sending some traffic over ctlplane and it's now breaking when mapped to the ServiceNetMap networks | 15:39 |
EmilienM | no we haven't merged anything wrt this topic | 15:39 |
dprince | shardy: there is also a related bugfix here that affects only pacemaker https://review.openstack.org/#/c/381869/ | 15:40 |
shardy | https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceph-osd.yaml#L45 | 15:40 |
shardy | dprince, EmilienM: Ok thanks, so how do we define which bind network to apply those rules to? | 15:40 |
shardy | e.g if ceph needs tcp/6800, do we use the ServiceNetMap network to define the firewall rule, or apply it for all networks? | 15:41 |
EmilienM | it's using 0.0.0.0 by default AFAIK https://github.com/openstack/puppet-tripleo/blob/master/manifests/firewall/rule.pp#L73 | 15:41 |
EmilienM | destination is "undef" by default oo | 15:41 |
EmilienM | too | 15:41 |
EmilienM | we might want to make some dynamic binding here, re-using tht | 15:42 |
*** masco has joined #tripleo | 15:42 | |
shardy | Ok, so yeah we can probably narrow the rules a little, but it's most likely not the reason the patch is failing | 15:42 |
shardy | thanks | 15:42 |
jaosorior | crap | 15:42 |
jaosorior | so if that's not the reason then I'm out of ideas :/ | 15:42 |
shardy | jaosorior: can you reproduce the ceph failure locally? | 15:43 |
shardy | I'd probably start with that, and run the failing command manually while tcpdumping to see where it gets stuck | 15:43 |
d0ugal | thrash: now I get the same error but with overcloud.yaml :-D | 15:43 |
jaosorior | shardy: I haven't tried, I'm running another overcloud deploy trying out things :/ | 15:43 |
EmilienM | can someone approve this backport please? https://review.openstack.org/#/c/381375/ | 15:44 |
EmilienM | same for https://review.openstack.org/#/c/381276/ | 15:44 |
*** mcornea has joined #tripleo | 15:46 | |
d0ugal | thrash: I think I have got it working | 15:50 |
d0ugal | thrash: I think I have got it working | 15:50 |
d0ugal | oops | 15:50 |
openstackgerrit | mathieu bultel proposed openstack/python-tripleoclient: Download templates from swift before processing with heatclient https://review.openstack.org/379547 | 15:50 |
*** akshai has quit IRC | 15:51 | |
shardy | Can I get a final review/+A for https://review.openstack.org/#/c/378737/ please? | 15:52 |
EmilienM | shardy: looking | 15:52 |
EmilienM | shardy: we don't wait for ovb? | 15:53 |
d0ugal | thrash: Yeah, I am getting an UPDATE_IN_PROGRESS now... so looking promising. | 15:53 |
shardy | EmilienM: Oh, yeah sorry I missed that hadn't voted yet, sorry for the noise | 15:53 |
EmilienM | shardy: i'll +A it | 15:53 |
EmilienM | once it passes OVB | 15:53 |
shardy | ack, thanks | 15:53 |
*** yamahata has quit IRC | 15:55 | |
*** masco has quit IRC | 15:56 | |
*** tiswanso has quit IRC | 15:56 | |
dsneddon | shardy, When you have a few minutes, can you look at https://bugzilla.redhat.com/show_bug.cgi?id=1327742 | 15:58 |
openstack | bugzilla.redhat.com bug 1327742 in rhel-osp-director "[RFE] Director Should Allow For Multiple Compute Network Configurations" [Medium,New] - Assigned to athomas | 15:58 |
shardy | dsneddon: Will do, sounds like something we should solve via multiple compute roles using the new custom roles interfaces | 15:58 |
dsneddon | shardy, I don't think this requires immediate action to document, but at least I'd like to communicate back to the customer when this will be ready to test. | 15:59 |
*** tiswanso has joined #tripleo | 15:59 | |
*** tiswanso has quit IRC | 15:59 | |
shardy | dsneddon: Ok, will do - tbh we're still working out some kinks in the custom-roles usage, but hopefully the worst of those will be fixed by the final newton deadline | 15:59 |
EmilienM | dsneddon: could we use a launchpad bug, tracked to ocata-1 maybe? | 15:59 |
*** karthiks has quit IRC | 15:59 | |
dsneddon | EmilienM, Good idea, I'll create one. | 16:00 |
EmilienM | dsneddon: check if it's not already there | 16:00 |
shardy | Yeah, although it may end up a docs-only fix | 16:00 |
*** akshai has joined #tripleo | 16:00 | |
*** tiswanso has joined #tripleo | 16:00 | |
jaosorior | shardy: hey dude, I gotta go :/ | 16:01 |
jaosorior | Hey guys, if someone can help out figuring out what's wrong with https://review.openstack.org/#/c/378764/ it would be greatly appreciated. | 16:01 |
thrash | d0ugal: me too! :) | 16:01 |
shardy | jaosorior: Ok, I'll try to take a look later & we can sync about it tomorrow, thanks for investigating! | 16:01 |
thrash | d0ugal: I got to a breakpoint | 16:01 |
d0ugal | thrash: Nice! I'm still waiting... | 16:02 |
*** jaosorior has quit IRC | 16:02 | |
d0ugal | thrash: oh, actually, I am on one too! woo | 16:02 |
d0ugal | thrash: not that easy to spot with the debug output | 16:02 |
thrash | d0ugal: yah | 16:02 |
d0ugal | thrash: What do I do now? | 16:02 |
thrash | d0ugal: just hit enter | 16:02 |
thrash | that will clear a breakpoint and continue. Then it will hit another. | 16:02 |
*** zoliXXL is now known as zoli|gone | 16:03 | |
*** morazi has joined #tripleo | 16:03 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 16:05 |
*** zoli|gone is now known as zoli_gone-proxy | 16:06 | |
*** dhill_ has quit IRC | 16:08 | |
dsneddon | EmilienM, I found this bug in Launchpad already, which seems to cover the custom Compute role: https://bugs.launchpad.net/tripleo/+bug/1626976 | 16:08 |
openstack | Launchpad bug 1626976 in tripleo "Custom role requires manual environment/files" [High,In progress] - Assigned to Carlos Camacho (ccamacho) | 16:08 |
openstackgerrit | John Trowbridge proposed openstack/tripleo-quickstart: Add centosci release configs for cloudsig-testing repos https://review.openstack.org/381957 | 16:08 |
*** radeks has quit IRC | 16:10 | |
dsneddon | EmilienM, Although that bug is slightly different. I will create a new bug for the custom networking and link to this existing bug. | 16:10 |
*** rbrady is now known as rbrady-mtg | 16:11 | |
dsneddon | EmilienM, OK, found an existing bug that covers this exactly, I'll link them: https://bugs.launchpad.net/tripleo/+bug/1625558 | 16:11 |
openstack | Launchpad bug 1625558 in tripleo "NIC templates examples for usual custom roles" [Medium,Triaged] - Assigned to Steven Hardy (shardy) | 16:11 |
*** jcoufal_ has joined #tripleo | 16:11 | |
*** jcoufal has quit IRC | 16:12 | |
openstackgerrit | Dougal Matthews proposed openstack/python-tripleoclient: Save the result of direct action calls in Mistral https://review.openstack.org/381739 | 16:13 |
*** dhill_ has joined #tripleo | 16:13 | |
EmilienM | dsneddon: perfect. | 16:13 |
EmilienM | can someone approve https://review.openstack.org/381875 ? | 16:14 |
*** dtrainor has joined #tripleo | 16:14 | |
openstackgerrit | Merged openstack/tripleo-quickstart: Revert "Temporarily pin pycparser" https://review.openstack.org/381190 | 16:15 |
*** karthiks has joined #tripleo | 16:15 | |
EmilienM | dprince: sounds like we have iptables now http://logs.openstack.org/64/381864/2/check/gate-tripleo-ci-centos-7-nonha-multinode/5325ef3/logs/subnode-2/iptables.txt.gz | 16:15 |
*** weshay is now known as weshay_lunch | 16:15 | |
EmilienM | pradk: can you review https://review.openstack.org/#/c/381864/ ? i'll approve it once OVB jobs are green | 16:16 |
pradk | looking | 16:16 |
*** links has joined #tripleo | 16:17 | |
pradk | EmilienM, done | 16:18 |
*** links has quit IRC | 16:18 | |
*** lucas-hungry is now known as lucasagomes | 16:21 | |
EmilienM | pradk, dprince: sounds like pingtest is failing http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/console.html#_2016-10-04_16_08_27_290426 | 16:22 |
EmilienM | to ping the fip | 16:22 |
EmilienM | mhh | 16:23 |
EmilienM | http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/console.html#_2016-10-04_16_08_27_243259 | 16:23 |
EmilienM | in fact it sounds like the instance didn't get a private IP | 16:23 |
EmilienM | let's see firewall rules | 16:23 |
*** tiswanso has quit IRC | 16:23 | |
*** tiswanso has joined #tripleo | 16:25 | |
*** openstackgerrit has quit IRC | 16:26 | |
*** openstackgerrit has joined #tripleo | 16:27 | |
EmilienM | bnemec: do you know where does this debug come from in cirros vms ? http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/console.html#_2016-10-04_16_08_27_243189 | 16:27 |
EmilienM | what is running it? | 16:27 |
bnemec | EmilienM: Looks like that's part of the cirros image itself. | 16:28 |
*** openstackgerrit has quit IRC | 16:28 | |
EmilienM | 2016-10-04 16:08:27.241977 | failed 20/20: up 229.08. request failed in the VM shows that VM can't reach neutron-dhcp | 16:28 |
EmilienM | bnemec: ok | 16:28 |
*** tesseract- has quit IRC | 16:29 | |
*** openstackgerrit has joined #tripleo | 16:29 | |
EmilienM | bnemec: so we don't ssh it, right? we get the info with metadata, | 16:29 |
bnemec | EmilienM: That's just the console log of the vm, so no we don't ssh to it. | 16:29 |
EmilienM | ok | 16:29 |
EmilienM | so yeah, something's wrong in iptables | 16:29 |
*** morazi has quit IRC | 16:29 | |
*** openstackgerrit has quit IRC | 16:30 | |
EmilienM | we have -A INPUT -p udp -m multiport --dports 67 -m comment --comment "115 neutron dhcp input" -m state --state NEW -j ACCEPT | 16:30 |
*** openstackgerrit has joined #tripleo | 16:30 | |
EmilienM | and -A OUTPUT -p udp -m multiport --dports 68 -m comment --comment "116 neutron dhcp output" -m state --state NEW -j ACCEPT | 16:30 |
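When comparing rule dumps such as the iptables.txt.gz log linked earlier, it can help to reduce each rule to its chain, protocol, ports and target. The following is a rough sketch that parses only the flags appearing in the two rules quoted above; it is not a general iptables parser, and `parse_rule` and `FLAGS` are hypothetical names.

```python
import shlex

# Rule lines copied from the log above.
RULES = [
    '-A INPUT -p udp -m multiport --dports 67 -m comment --comment "115 neutron dhcp input" -m state --state NEW -j ACCEPT',
    '-A OUTPUT -p udp -m multiport --dports 68 -m comment --comment "116 neutron dhcp output" -m state --state NEW -j ACCEPT',
]

# Only the flags needed for this comparison are recognized.
FLAGS = {"-A": "chain", "-p": "proto", "--dports": "dports",
         "--comment": "comment", "-j": "target"}

def parse_rule(line):
    """Extract chain, protocol, destination ports, comment and target."""
    tokens = shlex.split(line)  # keeps the quoted comment as one token
    rule = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in FLAGS:
            rule[FLAGS[flag]] = value
    # multiport accepts a comma-separated list; keep ports as strings.
    rule["dports"] = rule["dports"].split(",") if "dports" in rule else []
    return rule

parsed = [parse_rule(line) for line in RULES]
for rule in parsed:
    print(rule["chain"], rule["proto"], rule["dports"], rule["target"])
```

This only summarizes what the rules say; whether they are sufficient for DHCP traffic between the instance and neutron-dhcp is exactly the open question in the conversation.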
*** abehl has quit IRC | 16:31 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Move the main template files for defalut services to new syntax generation https://review.openstack.org/381975 | 16:32 |
openstackgerrit | Lucas Alvares Gomes proposed openstack/tripleo-quickstart: Force the use of python/pip version 2 https://review.openstack.org/368013 | 16:33 |
openstackgerrit | Merged openstack/tripleo-quickstart: Add centosci release configs for cloudsig-testing repos https://review.openstack.org/381957 | 16:34 |
*** ohamada has quit IRC | 16:35 | |
*** karthiks has quit IRC | 16:35 | |
*** milan has quit IRC | 16:37 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 16:37 |
*** rbowen has quit IRC | 16:37 | |
*** yamahata has joined #tripleo | 16:38 | |
EmilienM | dprince: I'll continue to dig after lunch | 16:39 |
*** trown is now known as trown|lunch | 16:40 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Add generic template for custom roles. https://review.openstack.org/381587 | 16:41 |
*** yamahata has quit IRC | 16:49 | |
openstackgerrit | Dougal Matthews proposed openstack/tripleo-docs: [WIP] Mistral API Documentation https://review.openstack.org/358685 | 16:50 |
*** rbowen has joined #tripleo | 16:50 | |
*** tiswanso has quit IRC | 16:50 | |
*** karthiks has joined #tripleo | 16:52 | |
*** weshay_lunch is now known as weshay | 16:54 | |
*** hewbrocca is now known as hewbrocca-afk | 16:55 | |
openstackgerrit | Merged openstack/tripleo-common: Remove references to overcloud-without-mergepy https://review.openstack.org/375540 | 16:55 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 16:55 |
*** rbrady-mtg is now known as rbrady | 16:57 | |
*** yamahata has joined #tripleo | 16:57 | |
*** fzdarsky is now known as fzdarsky|afk | 16:57 | |
*** weshay is now known as weshay_mtg | 16:58 | |
*** ccamacho has quit IRC | 17:00 | |
*** derekh has quit IRC | 17:00 | |
openstackgerrit | Merged openstack/tripleo-specs: Fix a typo in documentation https://review.openstack.org/381379 | 17:01 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Set ceph osd max object name and namespace len on upgrade when on ext4 https://review.openstack.org/381375 | 17:03 |
openstackgerrit | Merged openstack/puppet-tripleo: Use FallbackResource instead of Rewrite for UI https://review.openstack.org/381276 | 17:03 |
rbowen | Excellent | 17:03 |
*** mcornea has quit IRC | 17:03 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Include redis/mongo hiera when using pacemaker https://review.openstack.org/381869 | 17:03 |
*** penick has joined #tripleo | 17:05 | |
*** social has joined #tripleo | 17:07 | |
*** tiswanso has joined #tripleo | 17:08 | |
*** jpich has quit IRC | 17:09 | |
*** jprovazn has joined #tripleo | 17:15 | |
openstackgerrit | mathieu bultel proposed openstack/python-tripleoclient: Download templates from swift before processing with heatclient https://review.openstack.org/379547 | 17:16 |
*** dtantsur is now known as dtantsur|afk | 17:18 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Test with scheduler hints https://review.openstack.org/378040 | 17:19 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Add support for testing predictable placement https://review.openstack.org/378014 | 17:19 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Test hostname map https://review.openstack.org/378017 | 17:19 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Fall back to previous failure list for older releases https://review.openstack.org/381998 | 17:19 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Do Not Merge - Test Undercloud upgrade mitaka -> newton https://review.openstack.org/381309 | 17:20 |
*** tiswanso has quit IRC | 17:22 | |
*** pkovar has quit IRC | 17:23 | |
*** dmacpher is now known as dmacpher-afk | 17:24 | |
*** sudipto_ has quit IRC | 17:24 | |
*** sudipto has quit IRC | 17:24 | |
*** tiswanso has joined #tripleo | 17:26 | |
*** ayoung has quit IRC | 17:27 | |
*** ayoung has joined #tripleo | 17:27 | |
*** flepied has joined #tripleo | 17:34 | |
larsks | shardy, jschlueter: I'm still seeing that "Expression consumed too much memory" error using current tripleo-heat-templates master. | 17:36 |
larsks | Any luck investigating that earlier? | 17:36 |
EmilienM | larsks: have you looked in logstash if it's consistent? | 17:36 |
larsks | EmilienM, I'm seeing this locally, and pretty consistently. | 17:37 |
*** flepied1 has quit IRC | 17:37 | |
EmilienM | in logstash? | 17:37 |
*** tosky has quit IRC | 17:37 | |
*** rbowen has quit IRC | 17:37 | |
larsks | EmilienM, I've got no logstash. This is on the console, when running "overcloud deploy". | 17:37 |
EmilienM | let's see in http://logstash.openstack.org/#/dashboard/file/logstash.json | 17:37 |
shardy | larsks: yes the downstream build is missing a puppet-heat patch | 17:37 |
shardy | so the increase wasn't doing anything | 17:37 |
shardy | See ChangeId Id41001d74ce1008dbb5a98b962d5c53dbf39c903 | 17:38 |
larsks | shardy, awesome. So if I upgrade to puppet-heat master I should be good? | 17:38 |
shardy | https://github.com/openstack/instack-undercloud/commit/55ccd0e1e8c66c9f474112358063bc263720d84f | 17:38 |
gregwork | any thoughts on cleaning out a failed overcloud deploy where the status of overcloud is DELETE_FAILED | 17:38 |
larsks | shardy, thanks! | 17:38 |
shardy | larsks: I believe so, yes | 17:38 |
EmilienM | shardy: no | 17:38 |
EmilienM | not master | 17:38 |
EmilienM | to stable/newton | 17:38 |
shardy | you can confirm by checking heat.conf to see if you have actually changed the heat config settings | 17:39 |
EmilienM | and I'm preparing a new release of puppet-heat at this time | 17:39 |
EmilienM | is it the yaql fix? | 17:39 |
*** rasca has quit IRC | 17:39 | |
shardy | EmilienM: ack, yeah, I mean you need that patch, which is on master and stable/newton now | 17:39 |
shardy | EmilienM: yup | 17:39 |
EmilienM | larsks: ^ | 17:39 |
EmilienM | do not update on master | 17:39 |
shardy | some folks pulled the instack-undercloud fix without the puppet-heat one | 17:39 |
shardy | so it does nothing | 17:39 |
EmilienM | I haven't seen it in our CI FYI: build_name: *tripleo* AND message: "Expression consumed too much memory" | 17:39 |
shardy | EmilienM: Yes, some folks are testing downstream builds | 17:40 |
EmilienM | shardy: ok, makes sense then. | 17:40 |
EmilienM | i'm working on releasing a new puppet-heat: https://review.openstack.org/#/c/381901/ | 17:40 |
EmilienM | it will be done by today i think | 17:40 |
larsks | shardy, my local heat is 7.0.0-0.20160923054727.e4c4c56. Will I need to upgrade that as well? | 17:40 |
shardy | larsks: No it just needs to be configured to increase the limit | 17:41 |
larsks | Ack. | 17:41 |
EmilienM | larsks: nope, just puppet | 17:41 |
EmilienM | or the config manually, indeed | 17:41 |
*** weshay_mtg is now known as weshay | 17:41 | |
slagle | matbu: is https://bugs.launchpad.net/tripleo/+bug/1624448 even still an issue? based on the latest PS of https://review.openstack.org/#/c/323750/, you are not even using multinode jobs to test the upgrades. you're using ovb | 17:42 |
openstack | Launchpad bug 1624448 in tripleo "CI Upgrade job hang or failed on undercloud install" [High,In progress] - Assigned to James Slagle (james-slagle) | 17:42 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Downloads templates from swift before processing update https://review.openstack.org/381899 | 17:44 |
thrash | jprovazn: do you remember why you originally made 'openstack overcloud update stack' auth_required = False? | 17:45 |
*** Goneri has quit IRC | 17:46 | |
*** rbowen has joined #tripleo | 17:49 | |
dprince | EmilienM: weird. I just rebased my dev environment and now I'm hitting this: http://paste.openstack.org/show/584276/ | 17:50 |
EmilienM | dprince: weird indeed. The error in CI is different | 17:52 |
EmilienM | dprince: pingtest doesn't work, it seems like the instance created by the heat template is not getting an IP address | 17:52 |
EmilienM | dprince: I'm rebasing my env too now. | 17:56 |
*** radeks has joined #tripleo | 17:56 | |
openstackgerrit | Emilien Macchi proposed openstack/tripleo-heat-templates: Include ceilometer in swift proxy pipeline https://review.openstack.org/371950 | 17:58 |
*** rcernin has joined #tripleo | 17:58 | |
EmilienM | pradk: let's try again ^ | 17:58 |
EmilienM | pradk: ok I know why | 17:59 |
EmilienM | pradk: puppet-tripleo patch needs to be backported | 17:59 |
EmilienM | our CI still checks out stable/newton I think | 17:59 |
*** ccamacho has joined #tripleo | 18:00 | |
pradk | EmilienM, i proposed the backport already | 18:00 |
dprince | EmilienM: I rechecked my patch BTW. Now that the redis, mongo fix landed. I don't think it is related but figured it worth a try since we are testing locally too | 18:00 |
EmilienM | pradk: I'm approving it now | 18:00 |
pradk | cool | 18:00 |
EmilienM | pradk: it will work for sure after that | 18:00 |
EmilienM | dprince: yes, my env is almost deployed, I'll debug | 18:00 |
EmilienM | dprince: have you seen the failure in CI? | 18:00 |
EmilienM | http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/console.html#_2016-10-04_16_08_27_243224 | 18:00 |
dprince | EmilienM: Yes, saw the pingtest | 18:00 |
EmilienM | k | 18:01 |
dprince | EmilienM: the firewall rules seem to be in place too | 18:01 |
EmilienM | right | 18:01 |
dprince | EmilienM: it's been off for a while so hard to say what this is | 18:01 |
EmilienM | maybe we are missing a rule | 18:01 |
*** ayoung has quit IRC | 18:01 | |
EmilienM | yeah, 9 weeks | 18:01 |
*** ayoung has joined #tripleo | 18:01 | |
EmilienM | I'll compare iptables rules since then | 18:01 |
EmilienM | I'll compare when it worked and now I mean | 18:01 |
dprince | EmilienM: it's good it wasn't 9 1/2 weeks. Then we'd be in trouble | 18:01 |
EmilienM | dprince: why? | 18:02 |
EmilienM | you're trolling me :P | 18:02 |
dprince | EmilienM: I'll let you google that one :) | 18:02 |
*** tiswanso has quit IRC | 18:02 | |
EmilienM | https://en.wikipedia.org/wiki/9%C2%BD_Weeks | 18:03 |
EmilienM | ok nice | 18:03 |
EmilienM | damn, it's not safe for work | 18:03 |
*** radeks has quit IRC | 18:03 | |
*** tiswanso has joined #tripleo | 18:03 | |
EmilienM | dprince: dude, I didn't know that movie, thanks | 18:03 |
jprovazn | thrash, not sure what you mean by 'auth_required = False"? | 18:04 |
EmilienM | dprince: sad, we don't have logs anymore from 9 weeks ago | 18:05 |
*** abehl has joined #tripleo | 18:06 | |
*** electrofelix has quit IRC | 18:06 | |
Slower_ | is there a way to force delete a stack now? | 18:09 |
*** Slower_ is now known as Slower | 18:09 | |
Slower | failed stacks with mistral seem to break things and I'm not sure how to debug that yet | 18:10 |
EmilienM | dprince: I found old logs with iptables, from August! | 18:10 |
EmilienM | I'm comparing now | 18:10 |
shardy | Slower: you should always be able to do heat stack-delete overcloud && openstack plan delete overcloud | 18:10 |
shardy | possibly you didn't do the plan delete? | 18:10 |
shardy | Slower: if you really need to reset the stack status, you can force it with heat-manage reset_stack_status | 18:11 |
Slower | plan? | 18:11 |
Slower | openstack: 'plan' is not an openstack command. See 'openstack --help'. | 18:11 |
*** mcornea has joined #tripleo | 18:11 | |
Slower | | c820d2de-6b11-436a-bdda-4d2a146e2508 | overcloud | DELETE_FAILED | 2016-10-04T17:08:45Z | 2016-10-04T17:15:31Z | | 18:11 |
shardy | Slower: sorry openstack overcloud plan delete | 18:11 |
Slower | ah ok | 18:12 |
shardy | it deletes the swift bucket and mistral environment | 18:12 |
Slower | righto | 18:12 |
Slower | thanks I needed that :) | 18:12 |
shardy | if you end up with that stuff partially created things can fail non-obviously | 18:12 |
* shardy raised a bug about it IIRC | 18:12 | |
EmilienM | dprince: I have a diff between old iptables rules and new ones https://www.diffchecker.com/7LjRfmZY | 18:12 |
Slower | Cannot delete a plan that has an associated stack. | 18:12 |
Slower | shardy: so I'm guessing I have to enable stack abandon and do that? | 18:13 |
shardy | Slower: you have to delete the stack first, then the plan | 18:13 |
Slower | DELETE_FAILED | 18:13 |
shardy | why is it delete failed tho? | 18:13 |
Slower | do you really want to know? :) | 18:13 |
EmilienM | sounds like we have a bunch of new rules | 18:13 |
*** Goneri has joined #tripleo | 18:13 | |
shardy | Slower: hehe, I just mean that shouldn't ever happen unless something somewhere is broken | 18:14 |
dprince | EmilienM: vrrp? | 18:14 |
Slower | shardy: well my stack is always broken.. | 18:14 |
shardy | Slower: heat doesn't care how broken your stack is tho, it should always be able to delete it | 18:14 |
Slower | could be an ironic issue.. | 18:14 |
Slower | or something | 18:14 |
shardy | undeletable stacks are either a critical heat bug, or a misconfiguration somewhere | 18:14 |
Slower | there are no instances up tho | 18:15 |
EmilienM | dprince: indeed I don't see vrrp any more in new iptables | 18:15 |
EmilienM | dprince: though vrrp is not useful for dhcp | 18:15 |
Slower | shardy: it happens every once in a while | 18:15 |
Slower | shardy: I always just enabled stack abandon and did that | 18:15 |
EmilienM | http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/console.html#_2016-10-04_16_08_21_799339 | 18:15 |
EmilienM | and we can see dhcp agent running fine | 18:15 |
shardy | Slower: I never see stack delete failures unless there's a bug or I somehow broke my undercloud | 18:16 |
EmilienM | looking in neutron logs to see if we see the dhcp request | 18:16 |
shardy | so sure, you can abandon, but it's possible you're papering over some other issue | 18:16 |
Slower | shardy: ResourceFailure: resources.ServiceChain: resources.ObjectStorageServiceChain.(pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') [SQL: u'DELETE FROM resource WHERE resource.id = %s'] [parameters: (410,)] | 18:16 |
EmilienM | dprince: http://logs.openstack.org/64/381864/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/1886035/logs/overcloud-controller-0/var/log/neutron/dhcp-agent.txt.gz#_2016-10-04_16_01_38_081 | 18:16 |
EmilienM | Could not load neutron.agent.linux.interface.OVSInterfaceDriver | 18:17 |
Slower | shardy: guessing that's it.. not sure yet | 18:17 |
EmilienM | do we have that log in current jobs? | 18:17 |
*** abehl has quit IRC | 18:17 | |
shardy | Slower: ack, so that sounds like it could well be a bug - if you can capture some data from the heat-engine logs we might be able to figure out what | 18:17 |
shardy | particularly if it's failing consistently every delete | 18:17 |
EmilienM | yes we have it, nevermind | 18:17 |
*** akshai has quit IRC | 18:18 | |
dprince | EmilienM: all of the neutron API stuff is missing. I got it | 18:18 |
dprince | EmilienM: we specified neutron_server instead of neutron_api | 18:18 |
dprince | EmilienM: sec and I'll push the patch | 18:18 |
dprince | EmilienM: that diff was really helpful FWIW | 18:19 |
Slower | shardy: also if I run openstack stack failures list overcloud on the failed-to-delete stack it backtraces too | 18:19 |
Slower | shardy: but that may not be a bug, not sure | 18:19 |
EmilienM | dprince: indeed | 18:19 |
EmilienM | dprince: nice catch dude | 18:19 |
dprince | EmilienM: same patch? or new one. I'll have to rebase the other one anyway.... | 18:19 |
EmilienM | dprince: same patch | 18:19 |
EmilienM | dprince: let's save time & resources | 18:20 |
EmilienM | at this stage of the week | 18:20 |
openstackgerrit | Steven Hardy proposed openstack/python-tripleoclient: Add optional overcloud deploy roles_data.yaml override https://review.openstack.org/378740 | 18:20 |
EmilienM | shardy: do you remember what we did with sahara this cycle? | 18:20 |
Slower | shardy: https://paste.fedoraproject.org/443483/14756052/ | 18:20 |
EmilienM | I see iptables rules for sahara removed by default, we stopped deploying it by default right? | 18:20 |
shardy | EmilienM: Not without looking at the git logs tbh | 18:21 |
shardy | EmilienM: that reminds me I meant to discuss release notes in the meeting | 18:21 |
EmilienM | shardy: reno? | 18:21 |
*** trown|lunch is now known as trown | 18:21 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Re-enable ManageFirewall by default. https://review.openstack.org/381864 | 18:21 |
EmilienM | shardy: it's on my looong TODO list | 18:21 |
shardy | we've not used reno this cycle (we should for ocata) | 18:21 |
*** bana_k has joined #tripleo | 18:21 | |
dprince | EmilienM: ^^ | 18:21 |
EmilienM | dprince: ack thx! | 18:21 |
shardy | EmilienM: yeah, but we need to generate some release notes manually this week | 18:21 |
EmilienM | shardy: yes. I'm adding it today | 18:21 |
EmilienM | shardy: I know how reno works, I don't need much time | 18:22 |
EmilienM | shardy: the question is about newton | 18:22 |
EmilienM | how are we going to add release notes and where | 18:22 |
EmilienM | imho it's too late, we should document in tripleo-docs | 18:22 |
EmilienM | and use reno in ocata | 18:22 |
shardy | EmilienM: yes, my assumption was that we'd collaborate on an etherpad which can be used for the release notes | 18:22 |
shardy | but if that's not possible we can do it elsewhere | 18:23 |
shardy | seems like something which could be done between the final code and cycle-trailing deadline to me | 18:23 |
Slower | shardy: oh and this time it worked.. | 18:23 |
EmilienM | shardy: etherpad works for me as a draft before putting it in tripleo-docs | 18:23 |
shardy | EmilienM: Ok, I'd prefer it to be included with the main OpenStack release notes, but if that's not possible then I guess tripleo-docs will work | 18:24 |
EmilienM | shardy: how would it be included in the main OpenStack release notes? | 18:24 |
shardy | lets capture the information & discuss with the release team | 18:25 |
EmilienM | for Newton I mean | 18:25 |
EmilienM | we could do it with reno but it's probably too late | 18:25 |
EmilienM | in terms of tags, etc | 18:25 |
EmilienM | (reno works with tags and branches) | 18:25 |
*** athomas has quit IRC | 18:25 | |
shardy | EmilienM: Ok maybe it is, I assumed there was some manual fallback method we could use, like all pre-reno releases | 18:25 |
shardy | and also that by observing the cycle trailing deadline we'd be allowed more time to finalize the release notes | 18:26 |
Slower | shardy: https://paste.fedoraproject.org/443493/60558414/ | 18:26 |
EmilienM | shardy: I think it's possible | 18:26 |
Slower | shardy: can I abandon or do you want more info? | 18:26 |
shardy | Slower: something is obviously really broken but I don't have time to debug unfortunately | 18:27 |
shardy | perhaps raise a heat bug if you think it's possible to reproduce | 18:27 |
*** shardy is now known as shardy_afk | 18:27 | |
*** amoralej is now known as amoralej|off | 18:29 | |
*** jpena is now known as jpena|off | 18:30 | |
*** radeks has joined #tripleo | 18:31 | |
*** bnemec has quit IRC | 18:34 | |
*** dsariel_ has joined #tripleo | 18:42 | |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-common: Add support to create role main template file based in role.role.j2.yaml https://review.openstack.org/381593 | 18:49 |
*** akshai has joined #tripleo | 18:50 | |
ccamacho | EmilienM around? Mind approving https://review.openstack.org/#/c/378737/ ? | 18:52 |
ccamacho | +2'ed and passing ovb jobs | 18:52 |
openstackgerrit | Emilien Macchi proposed openstack/python-tripleoclient: Add ReNo support https://review.openstack.org/382046 | 18:53 |
EmilienM | ccamacho: looking | 18:53 |
ccamacho | thanks! | 18:53 |
EmilienM | ccamacho: done | 18:54 |
*** rwsu_ has joined #tripleo | 19:01 | |
*** rwsu has quit IRC | 19:04 | |
openstackgerrit | Pradeep Kilambi proposed openstack/tripleo-heat-templates: Ceilometer Wsgi Mitaka->Newton upgrades https://review.openstack.org/360004 | 19:04 |
EmilienM | ccamacho: the patch is not going to land | 19:09 |
EmilienM | because of the Depends-On | 19:09 |
EmilienM | but we first need https://review.openstack.org/#/c/381903/ | 19:10 |
EmilienM | I restored it and ran recheck, let's see | 19:10 |
*** radeks has quit IRC | 19:11 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: POC: WIP: Full quickstart gate run on OVB https://review.openstack.org/381094 | 19:14 |
rook | beagles sai hey - so back in the day I made a change to nova to reserve the host a bit more memory than upstream (512 -> 2048MB)... If we switch OOO to deploy DVR, and with the metadata service bug (memory hungry)... we should reserve the host a bit more memory... And with the latest DVR it might be a bit silly... (ie, should we set a minimum to 10GB). | 19:15 |
d0ugal | Slower: This may be useful, but it isn't very complete yet: http://www.dougalmatthews.com/2016/Sep/21/debugging-mistral-in-tripleo/ | 19:16 |
sai | beagles: yup, i can confirm the memory growth with DVR on controller.. | 19:16 |
sai | i have data | 19:16 |
sai | sorry i meant compute | 19:17 |
Slower | d0ugal: I saw that. Mostly it went over my head tbh :) | 19:18 |
Slower | it's pretty crazy when you list and there's so many actions and the seemingly random error/non-error status etc. | 19:18 |
d0ugal | Slower: Yeah, I can imagine. I'll try and do a follow up to filter that - the number keeps increasing at the moment too | 19:19 |
* d0ugal isn't really here | 19:19 | |
openstackgerrit | Honza Pokorny proposed openstack/tripleo-ui: Integrate node tagging workflow https://review.openstack.org/367562 | 19:20 |
openstackgerrit | mathieu bultel proposed openstack/python-tripleoclient: Download templates from swift before processing with heatclient https://review.openstack.org/379547 | 19:26 |
thrash | jprovazn: https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_update.py#L32 | 19:29 |
thrash | somehow I missed your response earlier... :) | 19:29 |
rook | EmilienM so, I have an idea to make deployments a bit easier at scale, not sure if you are the right person to discuss the idea with.. However, here it goes... Right now when we do an OC deploy, and the nodes go from build->active, I have seen many times where either 1) the host doesn't finish up with the PXE install and/or the guest doesn't reboot. So a deployment/scale deployment ends up timing out (if you are not actively baby sitting the deployment). | 19:33 |
rook | to get around this -- manually, I typically just ping the set of nodes provisioning interface (after it goes from building-> active) | 19:33 |
jprovazn | thrash, no idea (this is really looong time ago), but IIRC an existing overcloud command was used as a "template" for it | 19:33 |
rook | if it doesn't ping after 5 minutes, I start rebooting the host via ironic... that typically "fixes" things, however sometimes that doesn't unwedge things... | 19:34 |
*** rbrady is now known as rbrady-afk | 19:34 | |
rook | Anyway, maybe there is somewhere we can add this logic? | 19:34 |
jprovazn | thrash, e.g. I can see in "git log" that rdomanager_oscplugin/v1/overcloud_deploy.py used the same, so there is a solid chance that if you find answer why it was used for other commands, the same will apply for this one :) | 19:34 |
thrash | jprovazn: ok. I'm really wondering if that should still be the case. Anyway, thanks for the answer. I'll review all of them. | 19:40 |
*** shardy_afk is now known as shardy | 19:41 | |
shardy | rook: that's really interesting feedback, thanks | 19:41 |
ccamacho | EmilienM this already landed | 19:41 |
ccamacho | https://review.openstack.org/#/c/378750/ | 19:42 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Downloads templates from swift before processing update https://review.openstack.org/381899 | 19:42 |
EmilienM | ccamacho: not backported | 19:42 |
shardy | rook: I think there may be somewhere we could wire that in provided the host actually booted | 19:42 |
ccamacho | aaaaaaaaaaaa I see | 19:42 |
EmilienM | ccamacho: our CI is running stable/newton packages... | 19:42 |
shardy | rook: it's more difficult if it failed to boot and the ironic reboot fixes a boot issue | 19:42 |
rook | shardy if there is somewhere I could add this check, I don't mind looking at implementing it... it is a PITA to find out that a host is either A) Stuck in PXE or B) Stuck due to some fubar raid controller, at which point we should reschedule to a different node. | 19:42 |
shardy | rook: yeah, I think it's a discussion to have with dtantsur|afk and lucasagomes when they're around | 19:43 |
shardy | rook: there are probably things we could do via mistral to e.g ping nodes before attempting to configure them, but right now the only ping validation we do is inside the nodes | 19:43 |
shardy | so clearly we're not getting to that if it's a boot time problem | 19:44 |
rook | shardy yeah... it happens quite often... which isn't a _huge_ deal... but if a customer is scaling an install out, and they hit this -- they need to debug the reason for the failure, IE look at nova list, see which hosts don't ping, then ironic node-list | grep nova-uuid, then open up the console to that host and figure out wtf went wrong. | 19:44 |
shardy | rook: could you perhaps start a ML thread, or raise a bug about the symptoms and workaround you have in mind? | 19:44 |
rook | right shardy | 19:44 |
shardy | rook: I think we definitely want to act on this operational experience, it's just worth some wider discussion before we do a tripleo specific workaround I think | 19:45 |
shardy | rook: one thing which would really help is if you can describe in detail the recovery workflow you want to automate | 19:46 |
shardy | including which API (nova, ironic) and what state things are in | 19:46 |
shardy | rook: I started looking into a mistral workflow which deploys nodes directly via ironic (see https://review.openstack.org/#/c/313048/), but it's possible we could have a workflow that instead automates the nova->ironic dance you're describing | 19:47 |
lucasagomes | shardy, rook hi there. Yeah we do lack a mechanism to check if the machine did boot correctly or not | 19:48 |
rook | shardy sure - so a RFE bz and a us discussion | 19:49 |
larsks | shardy, in https://github.com/openstack/instack-undercloud/commit/55ccd0e1e8c66c9f474112358063bc263720d84f, shouldn't heat::limit_iterators be heat::yaql_limit_iterators? I am looking at e.g. https://github.com/openstack/puppet-heat/blob/stable/newton/manifests/init.pp#L422 | 19:51 |
*** mbozhenko has joined #tripleo | 19:52 | |
shardy | rook: yup, and/or just file a launchpad bug where we can discuss further | 19:54 |
shardy | thanks for the feedback | 19:54 |
shardy | larsks: yes! Good catch :) | 19:55 |
shardy | http://logs.openstack.org/64/381864/3/check/gate-tripleo-ci-centos-7-nonha-multinode/3eff38e/logs/etc/heat/heat.conf.txt.gz | 19:55 |
shardy | larsks: you can see in that recent CI job we're only setting memory_quota | 19:55 |
larsks | shardy, I will file a change... | 19:55 |
shardy | larsks: thanks | 19:55 |
shardy | I don't think we actually hit that limit, but it seemed a good idea to bump them both at the same time | 19:56 |
*** mbozhenko has quit IRC | 19:57 | |
*** paramite has quit IRC | 19:58 | |
openstackgerrit | Lars Kellogg-Stedman proposed openstack/instack-undercloud: correctly spell yaql_limit_iterators https://review.openstack.org/382056 | 19:58 |
shardy | EmilienM: FYI I abandoned https://review.openstack.org/#/c/381903/ because https://review.openstack.org/#/c/378737/ won't merge to master with the backport posted | 19:58 |
shardy | I'll re-approve to see if we can get the master tht patch to gate, then we can restore | 19:58 |
EmilienM | we have a chicken and egg problem then | 19:58 |
EmilienM | our CI is using packages from stable/newton | 19:58 |
shardy | For patches to master? | 19:59 |
shardy | we need to land both dependent patches to master, then propose them to stable/newton, no? | 19:59 |
EmilienM | shardy: yes | 19:59 |
EmilienM | until we have ocata repo, tripleo takes newton packages. | 19:59 |
EmilienM | since Friday. | 20:00 |
EmilienM | it's how delorean works, they take latest branch | 20:00 |
shardy | Hmm, OK | 20:00 |
shardy | then I guess we have to land https://review.openstack.org/#/c/381903/ | 20:00 |
EmilienM | yes | 20:01 |
*** dprince has quit IRC | 20:01 | |
EmilienM | shardy: i'm approving it | 20:01 |
shardy | EmilienM: Ok, thanks, I hadn't realized we had such a branch issue with CI | 20:01 |
EmilienM | shardy: I hope it will pass CI though | 20:01 |
EmilienM | because our multinode job is voting | 20:01 |
shardy | I don't think there's any reason for this to fail the multinode job, so hopefully it will be OK | 20:02 |
EmilienM | ok | 20:03 |
*** shardy has quit IRC | 20:03 | |
*** lblanchard has quit IRC | 20:06 | |
EmilienM | dprince left, firewall still not passing pingtest http://logs.openstack.org/64/381864/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/045a9cc/console.html#_2016-10-04_19_59_14_926984 | 20:06 |
*** paramite has joined #tripleo | 20:06 | |
*** ccamacho has quit IRC | 20:08 | |
*** Goneri has quit IRC | 20:09 | |
openstackgerrit | Steven Hardy proposed openstack/tripleo-heat-templates: Move the main template files for defalut services to new syntax generation https://review.openstack.org/381975 | 20:15 |
*** ipsecguy has quit IRC | 20:20 | |
*** ipsecguy has joined #tripleo | 20:20 | |
*** gchamoul has quit IRC | 20:21 | |
*** gchamoul has joined #tripleo | 20:22 | |
*** rbowen has quit IRC | 20:23 | |
*** zeroshft has joined #tripleo | 20:28 | |
*** zeroshft has quit IRC | 20:28 | |
*** egafford has quit IRC | 20:29 | |
*** coolsvap has quit IRC | 20:32 | |
*** maticue has joined #tripleo | 20:34 | |
*** jayg is now known as jayg|g0n3 | 20:37 | |
*** rbowen has joined #tripleo | 20:38 | |
*** shardy has joined #tripleo | 20:39 | |
*** paramite has quit IRC | 20:47 | |
*** jprovazn has quit IRC | 20:54 | |
*** dhill_ has quit IRC | 20:54 | |
*** chem has quit IRC | 20:54 | |
*** oneswig has joined #tripleo | 20:58 | |
*** dhill_ has joined #tripleo | 20:58 | |
*** dhill_ has quit IRC | 21:01 | |
*** dhill_ has joined #tripleo | 21:01 | |
*** rbowen has quit IRC | 21:10 | |
*** akrivoka has quit IRC | 21:11 | |
*** trown is now known as trown|outtypewww | 21:12 | |
*** shardy has quit IRC | 21:15 | |
*** tiswanso has quit IRC | 21:15 | |
*** mcornea has quit IRC | 21:15 | |
*** ipsecguy has quit IRC | 21:18 | |
*** ipsecguy has joined #tripleo | 21:18 | |
*** jeckersb is now known as jeckersb_gone | 21:25 | |
*** adam_g` is now known as adam_g | 21:25 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Do Not Merge - Test Undercloud upgrade mitaka -> newton https://review.openstack.org/381309 | 21:27 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Undercloud upgrade for mitaka and newton https://review.openstack.org/381286 | 21:27 |
*** akshai has quit IRC | 21:30 | |
openstackgerrit | Merged openstack/tripleo-common: Modify j2 templating to allow role files generation https://review.openstack.org/381903 | 21:36 |
*** rbrady-afk is now known as rbrady | 21:38 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: j2 template role config templates https://review.openstack.org/378737 | 21:40 |
*** dbecker has quit IRC | 21:40 | |
*** b00tcat has quit IRC | 21:43 | |
*** jcoufal_ has quit IRC | 21:45 | |
*** akshai has joined #tripleo | 21:48 | |
*** akshai has quit IRC | 21:49 | |
*** bana_k has quit IRC | 21:50 | |
*** bana_k has joined #tripleo | 21:52 | |
*** mbozhenko has joined #tripleo | 21:53 | |
openstackgerrit | Emilien Macchi proposed openstack-infra/tripleo-ci: enable undercloud/ssh on multinode jobs https://review.openstack.org/382082 | 21:56 |
*** mbozhenko has quit IRC | 21:58 | |
slagle | MOAR ssh | 21:59 |
*** oneswig has quit IRC | 22:00 | |
EmilienM | :) | 22:03 |
*** mburned is now known as mburned_out | 22:04 | |
dmsimard | slagle: fyi https://review.rdoproject.org/r/#/c/3044/1 | 22:12 |
*** tobias-fiberdata has joined #tripleo | 22:29 | |
*** tobias_fiberdata has quit IRC | 22:34 | |
openstackgerrit | Honza Pokorny proposed openstack/tripleo-common: Support node untagging https://review.openstack.org/372628 | 22:34 |
*** dsariel_ has quit IRC | 22:36 | |
*** pradk has quit IRC | 22:48 | |
*** rcernin has quit IRC | 22:50 | |
*** rhallisey has quit IRC | 22:55 | |
*** yamahata has quit IRC | 22:55 | |
*** dsariel_ has joined #tripleo | 23:04 | |
*** rajinir has quit IRC | 23:05 | |
*** tiswanso has joined #tripleo | 23:09 | |
*** tiswanso has quit IRC | 23:13 | |
*** saneax-_-|AFK is now known as saneax | 23:13 | |
*** sthillma has joined #tripleo | 23:16 | |
*** yamahata has joined #tripleo | 23:17 | |
*** sthillma has quit IRC | 23:21 | |
*** penick has quit IRC | 23:26 | |
*** penick has joined #tripleo | 23:30 | |
*** bana_k has quit IRC | 23:39 | |
*** bana_k has joined #tripleo | 23:42 | |
*** maticue has quit IRC | 23:42 | |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Downloads templates from swift before processing update https://review.openstack.org/381899 | 23:51 |
*** mbozhenko has joined #tripleo | 23:53 | |
*** mbozhenko has quit IRC | 23:58 |
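
shardy's suggestion at 17:39 (check heat.conf to confirm the limits were really changed) and the 19:55 remark that CI was only setting memory_quota could be checked with something like the sketch below. It is only an illustration: the `[yaql]` section name and the `limit_iterators` option name are assumptions inferred from the `heat::yaql_limit_iterators` puppet parameter mentioned in the log, the helper name `yaql_limits` is made up, and the sample values are invented.

```python
import configparser

# Hypothetical excerpt of a heat.conf; the values are invented for the example.
SAMPLE_HEAT_CONF = """
[DEFAULT]
debug = False

[yaql]
memory_quota = 100000
"""


def yaql_limits(heat_conf_text):
    """Return the yaql limits found in heat.conf text (None when an option is unset).

    Assumes both options live in a [yaql] section.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(heat_conf_text)
    limits = {}
    for option in ("memory_quota", "limit_iterators"):
        # fallback=None also covers the case where the section is missing
        limits[option] = parser.get("yaql", option, fallback=None)
    return limits


if __name__ == "__main__":
    for option, value in yaql_limits(SAMPLE_HEAT_CONF).items():
        state = value if value is not None else "not set (service default applies)"
        print(f"{option}: {state}")
```

An option reported as not set would match what larsks spotted at 19:51: the setting never reaches heat.conf if the hiera key is misspelled.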
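
The manual workaround rook describes from 19:33 onward (ping each node's provisioning interface once it goes from build to active, and reboot it via ironic if it still does not answer after 5 minutes) could be sketched roughly as below. This is not an existing TripleO, nova or ironic API: `Node`, `nodes_to_reboot` and the injected `ping` callable are hypothetical names, and the actual reboot call is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Node:
    name: str             # hypothetical label, e.g. the nova server name
    provisioning_ip: str  # address on the provisioning network
    active_since: float   # time in seconds when the node went build -> active


def nodes_to_reboot(
    nodes: Iterable[Node],
    ping: Callable[[str], bool],
    now: float,
    grace_seconds: float = 300.0,
) -> List[Node]:
    """Return nodes that have been active past the grace period but do not answer ping."""
    stuck = []
    for node in nodes:
        waited = now - node.active_since
        if waited >= grace_seconds and not ping(node.provisioning_ip):
            stuck.append(node)
    return stuck


if __name__ == "__main__":
    fleet = [
        Node("compute-0", "192.0.2.10", active_since=0.0),
        Node("compute-1", "192.0.2.11", active_since=0.0),
    ]
    reachable = {"192.0.2.10"}  # stand-in for a real ping check
    for node in nodes_to_reboot(fleet, ping=lambda ip: ip in reachable, now=600.0):
        print(f"would ask ironic to reboot {node.name}")
```

Passing `ping` in as a callable keeps the decision logic testable without a network; as shardy and lucasagomes note, where such a check should actually live (mistral workflow, ironic, or elsewhere) is still an open question.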