*** Marga_ has joined #tripleo | 00:01 | |
*** Marga_ has quit IRC | 00:02 | |
*** Marga_ has joined #tripleo | 00:03 | |
*** Marga_ has quit IRC | 00:03 | |
*** Marga_ has joined #tripleo | 00:03 | |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder: Remove ssh host keys when using simple init https://review.openstack.org/301982 | 00:04 |
---|---|---|
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Upload the ironic-python-agent images to cache https://review.openstack.org/301698 | 00:08 |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: [WIP] Use the cached ironic-python-agent images https://review.openstack.org/301699 | 00:08 |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Fix image caching logic https://review.openstack.org/299537 | 00:08 |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Pre install packages on the instack image https://review.openstack.org/299538 | 00:08 |
*** xinwu has joined #tripleo | 00:14 | |
*** shivrao has joined #tripleo | 00:27 | |
*** rhallisey has quit IRC | 01:00 | |
*** yuanying has quit IRC | 01:01 | |
*** yuanying has joined #tripleo | 01:02 | |
*** yuanying has quit IRC | 01:04 | |
*** yuanying has joined #tripleo | 01:13 | |
*** yamahata has joined #tripleo | 01:15 | |
*** tiswanso has joined #tripleo | 01:15 | |
*** tiswanso has quit IRC | 01:17 | |
*** tiswanso has joined #tripleo | 01:18 | |
*** akshai has joined #tripleo | 01:21 | |
*** akshai_ has joined #tripleo | 01:22 | |
*** yuanying has quit IRC | 01:25 | |
*** akshai has quit IRC | 01:25 | |
*** yuanying has joined #tripleo | 01:28 | |
*** yuanying has quit IRC | 01:30 | |
*** yuanying has joined #tripleo | 01:39 | |
*** yuanying has quit IRC | 01:42 | |
*** yuanying has joined #tripleo | 01:43 | |
*** yuanying has quit IRC | 02:02 | |
*** yuanying has joined #tripleo | 02:05 | |
*** shivrao has quit IRC | 02:37 | |
*** ramishra has joined #tripleo | 02:46 | |
*** ccamacho has quit IRC | 02:47 | |
*** ccamacho has joined #tripleo | 02:47 | |
*** yuanying has quit IRC | 02:50 | |
*** lblanchard has joined #tripleo | 03:13 | |
*** lblanchard has quit IRC | 03:17 | |
*** panda has quit IRC | 03:33 | |
*** panda has joined #tripleo | 03:33 | |
*** coolsvap has joined #tripleo | 03:35 | |
*** ramishra has quit IRC | 03:42 | |
*** tiswanso has quit IRC | 03:43 | |
*** ramishra has joined #tripleo | 03:45 | |
*** yuanying has joined #tripleo | 03:46 | |
*** dmacpher has quit IRC | 03:48 | |
*** links has joined #tripleo | 03:52 | |
*** rwsu has quit IRC | 04:00 | |
*** oshvartz has quit IRC | 04:29 | |
*** rwsu has joined #tripleo | 04:29 | |
*** r-mibu has quit IRC | 04:29 | |
*** r-mibu has joined #tripleo | 04:30 | |
*** tzumainn has quit IRC | 04:38 | |
*** ramishra has quit IRC | 04:45 | |
*** rwsu has quit IRC | 04:49 | |
*** rwsu has joined #tripleo | 04:55 | |
*** jaosorior has joined #tripleo | 04:55 | |
*** shivrao has joined #tripleo | 05:00 | |
openstackgerrit | Swapnil Kulkarni (coolsvap) proposed openstack/tripleo-docs: First documentation for Operational tools https://review.openstack.org/265271 | 05:10 |
*** rwsu_ has joined #tripleo | 05:14 | |
openstackgerrit | Swapnil Kulkarni (coolsvap) proposed openstack/tripleo-docs: First documentation for Operational tools https://review.openstack.org/265271 | 05:17 |
*** rwsu has quit IRC | 05:18 | |
openstackgerrit | Swapnil Kulkarni (coolsvap) proposed openstack/tripleo-docs: Docs for containerized compute node https://review.openstack.org/254743 | 05:20 |
*** ramishra has joined #tripleo | 05:25 | |
*** akshai_ has quit IRC | 05:28 | |
*** shivrao_ has joined #tripleo | 05:34 | |
*** Marga_ has quit IRC | 05:34 | |
*** shivrao has quit IRC | 05:37 | |
*** shivrao_ is now known as shivrao | 05:37 | |
*** Marga_ has joined #tripleo | 05:37 | |
*** Marga_ has quit IRC | 05:38 | |
*** Marga_ has joined #tripleo | 05:38 | |
*** bvandenh has joined #tripleo | 05:38 | |
*** Marga_ has quit IRC | 05:43 | |
*** liverpooler has quit IRC | 05:44 | |
*** saneax_AFK is now known as saneax | 05:44 | |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Remove ssh host keys when using simple init https://review.openstack.org/301982 | 05:47 |
*** ramishra has quit IRC | 05:51 | |
*** Marga_ has joined #tripleo | 05:52 | |
*** masco has joined #tripleo | 05:53 | |
*** oshvartz has joined #tripleo | 05:54 | |
*** Marga_ has quit IRC | 05:56 | |
lazy_prince | Hi all.. Need reviews for https://review.openstack.org/#/c/287784/18 pls.. | 05:58 |
*** rcernin has joined #tripleo | 05:59 | |
*** Marga_ has joined #tripleo | 06:01 | |
*** dtrainor has joined #tripleo | 06:01 | |
*** aufi has joined #tripleo | 06:03 | |
openstackgerrit | Merged openstack/tripleo-docs: Fix some typos in docs https://review.openstack.org/289720 | 06:10 |
*** bvandenh has quit IRC | 06:13 | |
*** mikelk has joined #tripleo | 06:13 | |
*** bvandenh has joined #tripleo | 06:17 | |
*** coolsvap has quit IRC | 06:28 | |
*** florianf has joined #tripleo | 06:30 | |
*** akuznets_ has joined #tripleo | 06:31 | |
*** jprovazn has joined #tripleo | 06:39 | |
*** leanderthal|afk is now known as leanderthal | 06:40 | |
*** hewbrocca-afk is now known as hewbrocca | 06:41 | |
*** coolsvap has joined #tripleo | 06:41 | |
*** hewbrocca has quit IRC | 06:46 | |
*** hewbrocca-afk has joined #tripleo | 06:46 | |
*** hewbrocca-afk is now known as hewbrocca | 06:46 | |
jaosorior | marios: Hey dude, the update gate seems to only pass in stable/mitaka. Are you aware if this is a known issue? Or, is there a specific issue that's preventing it from passing on master? | 06:47 |
hewbrocca | jaosorior: morning | 06:47 |
jaosorior | hewbrocca: Hey dude, how's it going? | 06:47 |
hewbrocca | Well not so bad all in all | 06:48 |
hewbrocca | Folks found some interesting issues with CI setup last night | 06:48 |
hewbrocca | jistr was working on it | 06:48 |
hewbrocca | seems like things were going really slow and we were wondering if the machines were actually swapping | 06:49 |
hewbrocca | the slowness was causing corosync to lose the cluster on the HA job | 06:49 |
jaosorior | I see | 06:50 |
jaosorior | well, today the HA job seems to be passing | 06:50 |
jaosorior | but the upgrades job has been red for a long time | 06:50 |
jaosorior | so I was just wondering if there's a specific error, or a set of errors, that are known to cause this | 06:50 |
hewbrocca | not sure TBH | 06:50 |
marios | bookmark added https://review.openstack.org/#/q/status:open+project:openstack/tripleo-quickstart,n,z | 06:51 |
marios | o/ jaosorior | 06:51 |
marios | reading | 06:51 |
marios | hewbrocca: was that from qe setups testing upgrades, or do you mean the actual upstream ci setup? | 06:52 |
* marios wondering if there is a new issue | 06:52 | |
*** akuznets_ has quit IRC | 06:53 | |
marios | jaosorior: yeah not sure what's going on there | 06:53 |
marios | jaosorior: just getting into my review round - but fwiw http://tripleo.org/cistatus.html aint a lot green | 06:55 |
*** qasims has joined #tripleo | 06:55 | |
*** rook-lappy has quit IRC | 06:56 | |
jaosorior | indeed | 06:56 |
*** rook-lappy has joined #tripleo | 06:57 | |
*** jaosorior has quit IRC | 07:00 | |
*** dshulyak has joined #tripleo | 07:01 | |
*** athomas has quit IRC | 07:02 | |
*** athomas has joined #tripleo | 07:02 | |
*** openstackgerrit has quit IRC | 07:02 | |
*** openstackgerrit has joined #tripleo | 07:03 | |
*** ifarkas has joined #tripleo | 07:04 | |
*** akuznetsov has joined #tripleo | 07:04 | |
*** Marga_ has quit IRC | 07:05 | |
*** liverpooler has joined #tripleo | 07:05 | |
*** pcaruana has joined #tripleo | 07:06 | |
*** jaosorior has joined #tripleo | 07:07 | |
*** akuznetsov has quit IRC | 07:08 | |
*** rcernin has quit IRC | 07:08 | |
*** rcernin has joined #tripleo | 07:09 | |
*** rook-lappy has quit IRC | 07:14 | |
*** rlandy has quit IRC | 07:15 | |
*** shivrao has quit IRC | 07:16 | |
*** dsneddon has quit IRC | 07:17 | |
*** ccamacho has quit IRC | 07:21 | |
*** jaosorior has quit IRC | 07:22 | |
*** jaosorior has joined #tripleo | 07:22 | |
*** gfidente has joined #tripleo | 07:23 | |
*** dsneddon has joined #tripleo | 07:29 | |
*** panda has quit IRC | 07:32 | |
*** panda has joined #tripleo | 07:33 | |
*** jaosorior has quit IRC | 07:47 | |
*** jaosorior has joined #tripleo | 07:47 | |
*** michchap has quit IRC | 07:50 | |
*** shardy has joined #tripleo | 07:51 | |
*** dtantsur|afk is now known as dtantsur | 07:51 | |
*** shardy has quit IRC | 07:52 | |
*** mbound has joined #tripleo | 07:52 | |
*** shardy has joined #tripleo | 07:53 | |
dtantsur | morning folks. so gate is still down, not worth rechecking? | 07:56 |
jaosorior | dtantsur: the update gate seems to fail every time. However, HA and non-HA are passing | 07:56 |
dtantsur | could you please at least land https://review.openstack.org/#/c/288417/ which is from the time when gate was green? | 07:56 |
dtantsur | jaosorior, but update gate is voting, right? | 07:56 |
jaosorior | dtantsur: Oh yeah, I had read that code before... dunno why I didn't score it | 07:58 |
dtantsur | :) | 07:58 |
jaosorior | dtantsur: it is voting. So yeah, the gate is not fully green, and I was just giving you the status of it. So if you see a failure on the HA, it might be an actual issue, cause that's passing now | 08:00 |
dtantsur | thanks | 08:02 |
openstackgerrit | Merged openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device https://review.openstack.org/288417 | 08:02 |
*** ohamada has joined #tripleo | 08:02 | |
dtantsur | shardy, morning! are backport exceptions requests still accepted, and how are chances of getting one for ^^^? | 08:03 |
dtantsur | the idea of this patch is to simplify live people upgrading from Kilo | 08:04 |
*** jpena|off is now known as jpena | 08:04 | |
dtantsur | * life for people | 08:04 |
shardy | dtantsur: if it's likely to break folks on upgrade then I think we agreed upgrade related patches have an exception anyway | 08:05 |
shardy | but if you think it requires discussion, please drop a mail to the list describing the reason we need it and the risk of backporting, and folks can vote there | 08:06 |
dtantsur | shardy, for some definition of "break"... please take a look at the commit message | 08:06 |
dtantsur | will do, just want to get a quick sanity check | 08:06 |
shardy | dtantsur: in the heat meeting atm but will review after, to me it looks OK as it's just adding some new cli options, not an entirely new feature, so should be low risk | 08:07 |
shardy | dtantsur: also, on upgrade, note that folks will upgrade to the latest client before doing the upgrade anyway | 08:08 |
shardy | although you said kilo->liberty so I guess it'd need to be on the liberty branch? | 08:08 |
dtantsur | shardy, yep. downstream we worked around it by changing IPA default logic, but upstream is still affected | 08:10 |
* dtantsur should really write an email on all this | 08:10 | |
jaosorior | thrash: Are you around? | 08:11 |
shardy | dtantsur: cool, email to the list sounds good | 08:11 |
jaosorior | thrash, shardy: Would sure use some reviews for this: https://review.openstack.org/#/c/299475/ trying to solve this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1320950 | 08:12 |
openstack | bugzilla.redhat.com bug 1320950 in rhel-osp-director "os-cloud-config hardcodes SSL ports" [High,On_dev] - Assigned to josorior | 08:12 |
*** paramite has joined #tripleo | 08:13 | |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Allow import command to set deploy image and local boot https://review.openstack.org/299362 | 08:15 |
*** mcornea has joined #tripleo | 08:15 | |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Use wait_for_finish from python-ironic-inspector-client https://review.openstack.org/299377 | 08:17 |
dtantsur | rebase party \o/ | 08:17 |
dtantsur | :) | 08:17 |
*** jistr has joined #tripleo | 08:22 | |
dtantsur | shardy, mail sent | 08:33 |
*** hjensas has quit IRC | 08:38 | |
*** hjensas has joined #tripleo | 08:38 | |
openstackgerrit | xin wu proposed openstack/os-net-config: Always reset-failed ivs before restart ivs https://review.openstack.org/302096 | 08:43 |
*** devvesa has joined #tripleo | 08:43 | |
*** derekh has joined #tripleo | 08:46 | |
*** shivrao has joined #tripleo | 08:49 | |
openstackgerrit | Jiri Stransky proposed openstack/tripleo-heat-templates: Increase corosync token timeout https://review.openstack.org/301836 | 08:51 |
jistr | gfidente: morning :) | 08:52 |
gfidente | jistr I saw the changes | 08:52 |
jistr | so it looks like trown|outtypewww's recheck on https://review.openstack.org/#/c/301624/ produced a green result indeed | 08:52 |
jistr | i mean for the HA job | 08:52 |
jistr | and it has passed for non-HA before | 08:53 |
jistr | i wonder if it's worth merging, or we should try another recheck | 08:53 |
openstackgerrit | Merged openstack-infra/tripleo-ci: Fix timeout on crm_resource --wait https://review.openstack.org/301624 | 08:53 |
hewbrocca | derekh: Hey, did we ever turn *off* swap on the testenvs? | 08:54 |
gfidente | it's safe change in ci | 08:54 |
jistr | gfidente: cool, thanks | 08:54 |
derekh | hewbrocca: no, we can't | 08:54 |
derekh | hewbrocca: we're still overcommitted although not as badly as we were | 08:54 |
gfidente | jistr so eyes on https://review.openstack.org/#/c/301836/2 now :) | 08:54 |
hewbrocca | derekh: arrgh | 08:55 |
hewbrocca | jistr: there's your slowness, I guess | 08:55 |
hewbrocca | derekh: hey I had another thought, tell me what you think | 08:55 |
derekh | hewbrocca: most of the time it isn't used, and I don't think it's causing the slowness | 08:55 |
hewbrocca | we've ordered the RAM upgrade | 08:55 |
jistr | yeah i'll just post more context for derekh | 08:55 |
jistr | yesterday we discussed the HA job failure rate | 08:56 |
hewbrocca | While we've got the boxes open, worth dropping an SSD in each one? | 08:56 |
jistr | and it seems that the cause is generally things being slow | 08:56 |
jistr | http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2016-04-05.log.html#t2016-04-05T16:43:20 | 08:56 |
hewbrocca | assuming it's possible | 08:56 |
gfidente | yeah I think the reason for the high load is indeed disks | 08:56 |
jistr | derekh: mainly corosync log looks bad http://paste.fedoraproject.org/349953/14598757/ | 08:56 |
jistr | "Corosync main process was not scheduled for 6085.1523 ms (threshold is 1320.0000 ms). Consider token timeout increase." | 08:57 |
jistr | this probably results in corosync losing cluster members | 08:57 |
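The warning quoted above carries two numbers worth comparing: how long the corosync main process went unscheduled, and the threshold it is measured against. A minimal Python sketch for pulling both out of such a log line follows; the regular expression is written against the single message quoted here and is an assumption about the general format, not something taken from corosync itself.

```python
import re

# Assumed pattern, derived only from the one warning quoted in the log above.
PATTERN = re.compile(
    r"not scheduled for (?P<delay>[\d.]+) ms "
    r"\(threshold is (?P<threshold>[\d.]+) ms\)"
)


def parse_scheduling_warning(line):
    """Return (delay_ms, threshold_ms), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group("delay")), float(match.group("threshold"))


line = (
    "Corosync main process was not scheduled for 6085.1523 ms "
    "(threshold is 1320.0000 ms). Consider token timeout increase."
)
delay_ms, threshold_ms = parse_scheduling_warning(line)
# Report how far past the threshold the stall was.
print(f"stalled {delay_ms:.0f} ms, {delay_ms / threshold_ms:.1f}x the threshold")
```

Run over a whole corosync log, the same function would show whether stalls of this size are frequent or isolated, which is the distinction the token timeout change depends on.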
derekh | hewbrocca: it certainly wouldn't do any harm, also each host has 4 disks only one of which is in use, I was wondering about RAID for the new deployment, spreading the load might speed it up | 08:57 |
hewbrocca | I'm sure it would, but not as much as an SSD :) | 08:58 |
derekh | jistr: yes, we've had those warnings for months | 08:58 |
derekh | hewbrocca: yup | 08:58 |
hewbrocca | derekh: I'll check with the relevant sysadmin and see if they think it could be helpful | 08:58 |
openstackgerrit | Merged openstack/diskimage-builder: Remove ssh host keys when using simple init https://review.openstack.org/301982 | 08:58 |
derekh | hewbrocca: ack | 08:58 |
*** sambetts|afk is now known as sambetts | 09:00 | |
*** mkovacik has quit IRC | 09:01 | |
gfidente | derekh I think we should enable writeback cache too | 09:01 |
gfidente | in the guests xml | 09:02 |
gfidente | <driver name='qemu' cache='writeback' /> | 09:03 |
gfidente | so the guests don't wait for data to be committed on disk | 09:04 |
hewbrocca | that's an excellent idea | 09:04 |
*** Marga_ has joined #tripleo | 09:05 | |
*** electrofelix has joined #tripleo | 09:05 | |
openstackgerrit | Saverio Proto proposed openstack/diskimage-builder: Debian: dont set always the hostname to debian https://review.openstack.org/205047 | 09:07 |
derekh | gfidente, hewbrocca: they're already in unsafe mode | 09:07 |
*** shivrao has quit IRC | 09:08 | |
*** olap has joined #tripleo | 09:08 | |
gfidente | :( | 09:08 |
derekh | [root@testenv31-testenv1-bsom7x7dmic2 heat-admin]# virsh dumpxml baremetal2brbm_one2_3 | grep unsafe | 09:09 |
derekh | <driver name='qemu' type='qcow2' cache='unsafe'/> | 09:09 |
derekh | [root@testenv31-testenv1-bsom7x7dmic2 heat-admin]# virsh dumpxml seed_2 | grep unsafe | 09:09 |
derekh | <driver name='qemu' type='qcow2' cache='unsafe'/> | 09:09 |
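The `grep unsafe` check above can also be done structurally, which reports the cache mode per disk instead of only confirming that the string appears. The sketch below parses domain XML with Python's standard library; the sample document is trimmed and hypothetical (the domain name, disk target and bus are made up), and only the `<driver>` element mirrors the output pasted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed domain XML. Only the <driver> line matches the
# virsh output shown in the log; everything else is illustrative.
DOMAIN_XML = """
<domain type='kvm'>
  <name>baremetal_example</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <target dev='sda' bus='sata'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Map each disk target device to its driver cache mode (None if unset)."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        dev = target.get("dev") if target is not None else "unknown"
        modes[dev] = driver.get("cache") if driver is not None else None
    return modes


print(disk_cache_modes(DOMAIN_XML))
```

In practice the input would be the text produced by `virsh dumpxml` for each guest on a testenv host.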
*** Marga_ has quit IRC | 09:10 | |
derekh | So here is my take on the situation, 6 months ago (or thereabouts), our non HA job was looking like we would soon be able to get it below 60 minutes, since then all that has changed on the testenvs is | 09:12 |
derekh | 1. We have increased the CPU and RAM allocated to each undercloud | 09:12 |
derekh | 2. We have increased the RAM allocated to each overcloud | 09:12 |
derekh | 3. We have reduced the number of test envs hosted by each host from 4 to 3 | 09:12 |
derekh | 4. We have added 6? more nics to each instance on the host | 09:12 |
derekh | 5. there are others, but those are the main ones I can remember | 09:12 |
derekh | And the result is that the shortest running job I can point to is now 101 minutes | 09:12 |
derekh | The wall time on our jobs is now 50% more than it used to be | 09:12 |
derekh | We have added something or changed something or screwed up something else that is now demanding a lot more than it used to (or any combination of these) | 09:12 |
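The figures above can be put side by side with a little arithmetic. This Python sketch is only an illustration: it treats the sub-60-minute goal as a stand-in baseline (an assumption, since that was a projection and not a measured run time) and also works backwards from the stated 50% increase to the baseline it would imply for a 101-minute job.

```python
def percent_increase(old_minutes, new_minutes):
    """Relative growth from old to new, as a percentage."""
    return (new_minutes - old_minutes) / old_minutes * 100.0


def implied_baseline(new_minutes, increase_percent):
    """Baseline that would grow to new_minutes after the given increase."""
    return new_minutes / (1.0 + increase_percent / 100.0)


goal_minutes = 60        # assumption: the 60-minute goal used as a baseline
shortest_now = 101       # shortest current job, from the log

print(round(percent_increase(goal_minutes, shortest_now), 1))
print(round(implied_baseline(shortest_now, 50), 1))
```

Under these assumptions, a 50% slowdown ending at 101 minutes corresponds to a baseline a little above the one-hour goal, so the two statements are roughly consistent with each other.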
hewbrocca | is it just we've added more other jobs? | 09:13 |
*** coolsvap has quit IRC | 09:13 | |
derekh | we need to find out what, and until somebody figures it out we're gonna keep adding more resources until we get to a happy place | 09:13 |
hewbrocca | derekh: how big an SSD would you want | 09:13 |
hewbrocca | wfoster says there's space in the boxes | 09:13 |
derekh | hewbrocca: no matter how many jobs we add, each host is only gonna run 3 at a time (it used to be 4) | 09:13 |
hewbrocca | derekh: *nod* | 09:14 |
hewbrocca | is it that the HA job runs with pacemaker enabled now? | 09:14 |
*** ramishra has joined #tripleo | 09:15 | |
derekh | hewbrocca: I don't know the answer to that to be honest | 09:15 |
derekh | hewbrocca: but that wouldn't explain why the nonha job is now never below 100 minutes even during quiet times | 09:16 |
hewbrocca | indeed not | 09:17 |
jistr | we only ever had HA with pacemaker. But we added more services, and one more step that overcloud init is slowly moving into. Still, it's likely that there are possibilities for optimization somewhere. | 09:17 |
hewbrocca | the *nonha* job takes that long? | 09:17 |
derekh | hewbrocca: yes (the others are now worse) | 09:17 |
derekh | hewbrocca: 100 minutes is minimum for nonha now, 130 is probably average (based on a glance of the stats for the last few days) | 09:18 |
derekh | We can't be blaming that change on HW that hasn't changed... | 09:19 |
*** paramite is now known as paramite|afk | 09:19 | |
jistr | could it be that it's not a code change which did the increase, but deploying more services (aodh, sahara) made us add swap, and that made the jobs run longer? | 09:19 |
gfidente | I think we saw a big spike after enabling netiso | 09:20 |
derekh | jistr: it could, worth investigating | 09:20 |
jistr | hmm that could be it too, but that's not enabled for non-ha job | 09:20 |
jistr | gfidente: ^ | 09:20 |
derekh | gfidente: yup, that's certainly a good candidate to look into I think | 09:21 |
derekh | jistr: the nics aren't used but they are present on the instances, just having them defined on the host could be adding load somewhere | 09:22 |
*** chem has quit IRC | 09:23 | |
shardy | http://paste.openstack.org/show/493116/ | 09:24 |
shardy | Hey guys, FYI I'm digging a little into memory usage on the undercloud ^^ | 09:24 |
shardy | it looks like we have a bunch of non-heat memory hogs ;) | 09:25 |
shardy | mysql, ceilometer-agent-notification, erlang, sensu and swift-object-replicator are all using >1G on a freshly started but idle undercloud | 09:25 |
jaosorior | what the hell O_O | 09:25 |
jaosorior | what's the backend for ceilometer in the undercloud? Is it mysql? | 09:26 |
shardy | Yes I think it is | 09:27 |
shardy | we actually don't need ceilometer at all in the undercloud | 09:27 |
*** chem has joined #tripleo | 09:27 | |
shardy | so we can probably save a gig by just turning it off | 09:27 |
jaosorior | That makes sense to me | 09:27 |
jaosorior | I have no idea how to reduce the load for rabbitmq though | 09:28 |
shardy | We could even turn it off for some overclouds too, given that we don't test it | 09:28 |
jaosorior | nor swift | 09:28 |
shardy | e.g we don't need all jobs to enable every service | 09:28 |
lazy_prince | greghaynes: can you pls review https://review.openstack.org/#/c/287784/ | 09:31 |
hewbrocca | shardy: huh yeah that seems like an easy win | 09:32 |
derekh | shardy: those numbers don't seem right, just adding up your top 10 there gets to about 8G, also top on the undercloud on CI runs is showing ceilometer way down the list | 09:34 |
jistr | we smoke-test it by starting it, at least the non-ha job would fail if anything fails to start up. But there might be more value on the "let's have stable jobs" side than "let's smoke-test ceilometer" anyway. | 09:34 |
derekh | :q | 09:34 |
*** xinwu has quit IRC | 09:34 | |
openstackgerrit | Merged openstack/os-net-config: Add MASTER=bond SLAVE=yes to linux bond interfaces https://review.openstack.org/301827 | 09:36 |
* shardy looks at tripleo-ci scripts | 09:36 | |
*** ramishra has quit IRC | 09:37 | |
hewbrocca | woop woop ^^^ | 09:38 |
derekh | shardy: look in host_info.txt , search for top -n 1 -b -o RES | 09:40 |
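The `top -n 1 -b -o RES` output mentioned above lists one row per process, so several workers of the same service appear separately. A rough Python sketch for totalling resident memory per command name follows. The sample rows and their values are made up for illustration, and the unit handling (bare numbers in KiB, `m`/`g`/`t` suffixes for scaled values) is an assumption about how top prints the RES column.

```python
def res_to_mib(field):
    """Convert a RES field to MiB; bare numbers are assumed to be KiB."""
    units = {"m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024.0


def top_consumers(top_output, limit=5):
    """Sum RES per command name and return the largest entries first."""
    lines = top_output.splitlines()
    # Locate the column header row so summary lines above it are skipped.
    start = next(i for i, text in enumerate(lines)
                 if text.split()[:2] == ["PID", "USER"])
    header = lines[start].split()
    res_idx = header.index("RES")
    cmd_idx = header.index("COMMAND")
    totals = {}
    for text in lines[start + 1:]:
        fields = text.split()
        if len(fields) <= cmd_idx:
            continue  # blank or truncated row
        command = fields[cmd_idx]
        totals[command] = totals.get(command, 0.0) + res_to_mib(fields[res_idx])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical sample rows; the numbers are not taken from any CI run.
SAMPLE = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1201 mysql     20   0 2000000   1.0g   9000 S   1.0 12.5   1:00.00 mysqld
 1302 rabbitmq  20   0 1500000 524288   4000 S   0.5  6.2   0:30.00 beam.smp
 1403 heat      20   0  400000 102400   3000 S   0.0  1.2   0:05.00 heat-engine
 1404 heat      20   0  400000 102400   3000 S   0.0  1.2   0:05.00 heat-engine
"""

for command, mib in top_consumers(SAMPLE):
    print(f"{command}: {mib:.0f} MiB")
```

Grouping by command name is a deliberate simplification: it makes multi-worker services comparable with single-process ones, at the cost of counting shared pages once per worker.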
shardy | derekh: Yeah that does give a somewhat different view: | 09:43 |
shardy | http://paste.openstack.org/show/493120/ | 09:43 |
shardy | rabbit and mysql are still shown as amongst the worst tho | 09:44 |
shardy | I'm not sure why the ceilometer-agent is so wrong tho | 09:44 |
* shardy juggles different memory stats | 09:45 | |
jistr | looking at HA http://logs.openstack.org/40/301840/1/check-tripleo/gate-tripleo-ci-f22-ha/3a5eacb/logs/ | 09:45 |
derekh | shardy: not sure either, didn't even try to understand the awk magic | 09:45 |
jistr | undercloud http://fpaste.org/350264/93587214/raw/ | 09:45 |
*** xinwu has joined #tripleo | 09:45 | |
jistr | overcloud http://fpaste.org/350265/35961145/raw/ | 09:46 |
derekh | shardy: also on the undercloud we have var/log/dstat-csv.log , using that you can see how much memory is being used over time, and how much swap is being used | 09:47 |
shardy | derekh: Yeah, I'm pretty sure the ps size metric is misleading, but there's definitely evidence that DB & rabbit are chewing through a fair amount | 09:48 |
* shardy runs dstat and kills ceilometer | 09:49 | |
derekh | shardy: yup | 09:49 |
hewbrocca | scwewy wabbit | 09:51 |
openstackgerrit | Dmitry Ilyin proposed openstack/puppet-pacemaker: Merge with fuel-infra/puppet-pacemaker https://review.openstack.org/296440 | 09:55 |
*** ramishra has joined #tripleo | 09:57 | |
*** akrivoka has joined #tripleo | 10:03 | |
*** tosky has joined #tripleo | 10:09 | |
*** ramishra has quit IRC | 10:09 | |
*** bvandenh has quit IRC | 10:19 | |
-openstackstatus- NOTICE: npm lint jobs are failing due to a problem with npm registry. The problem is under investigation, and we will update once the issue is solved. | 10:20 | |
*** ChanServ changes topic to "npm lint jobs are failing due to a problem with npm registry. The problem is under investigation, and we will update once the issue is solved." | 10:20 | |
*** ramishra has joined #tripleo | 10:20 | |
*** zoli|wfh is now known as zoli|intw | 10:24 | |
*** mgould has joined #tripleo | 10:25 | |
*** bvandenh has joined #tripleo | 10:33 | |
*** pblaho has quit IRC | 10:34 | |
*** weshay has quit IRC | 10:37 | |
*** mkovacik has joined #tripleo | 10:40 | |
*** dshulyak has quit IRC | 10:48 | |
*** paramite|afk has quit IRC | 10:49 | |
openstackgerrit | Dmitry Ilyin proposed openstack/puppet-pacemaker: Merge with fuel-infra/puppet-pacemaker https://review.openstack.org/296440 | 10:51 |
*** rodrigods has quit IRC | 10:52 | |
*** rodrigods has joined #tripleo | 10:53 | |
*** rhallisey has joined #tripleo | 11:01 | |
*** dshulyak has joined #tripleo | 11:04 | |
gfidente | bandini the rabbit partition handling maybe we can make it a parameter in puppet/controller.yaml | 11:10 |
gfidente | and default to pause_minority so we don't mess with .pp ? | 11:10 |
*** veteran has joined #tripleo | 11:10 | |
*** Marga_ has joined #tripleo | 11:15 | |
*** Marga_ has quit IRC | 11:16 | |
*** Marga_ has joined #tripleo | 11:16 | |
*** michchap has joined #tripleo | 11:18 | |
bandini | gfidente: we could, but then it would be the user who has to set it, no? If we do it in the .pp we could do it automagically when count(controller_nodes)==2 | 11:18 |
gfidente | yes it should be the user | 11:19 |
bandini | gfidente: I wonder if it is worth adding it in controller.yaml in the short-term and then move the logic in .pp at a later time | 11:20 |
gfidente | my point was to avoid adding logic in the .pp | 11:21 |
bandini | ok but then, if we ever decide to support two-node clusters, we need to tell the user: if you use two nodes set this variable, etc. | 11:21 |
bandini | I think doing it automatically would be preferable? | 11:21 |
bandini | also because I assume there will be other needed changes for full two controller node support | 11:22 |
bandini | in galera and whatnot | 11:22 |
hewbrocca | Like, make everything A/P | 11:22 |
hewbrocca | yes | 11:22 |
*** weshay has joined #tripleo | 11:22 | |
hewbrocca | What I don't understand is why we're not providing a P/P configuration | 11:22 |
bandini | lol a full P/P config, would be fun :) | 11:23 |
hewbrocca | We'd get way fewer bugs filed... | 11:23 |
bandini | totally | 11:23 |
gfidente | hmm we wanted to do that with ipv6 too | 11:25 |
gfidente | have an ipv6 hieradata | 11:25 |
*** Goneri has quit IRC | 11:25 | |
bandini | and it did not work? | 11:27 |
*** dtantsur is now known as dtantsur|brb | 11:28 | |
gfidente | we still don't have 'optional inclusion' | 11:29 |
*** veteran has quit IRC | 11:30 | |
shardy | gfidente: what needs to be optionally included, vs having something set (or not) in the hierarchy? | 11:31 |
*** jpena is now known as jpena|lunch | 11:31 | |
shardy | it would actually be possible with recent heat for us to have a parameter which joins things into the hierarchy list btw | 11:31 |
gfidente | shanky chainlist? | 11:31 |
gfidente | shardy :) | 11:32 |
gfidente | resource chains | 11:32 |
gfidente | ? | 11:32 |
shardy | gfidente: I made list_join accept multiple lists | 11:32 |
shardy | which means that it'd be possible to join multiple lists, then split them back into a single hierarchy list | 11:32 |
*** panda has quit IRC | 11:33 | |
shardy | previously that wasn't possible, hence the hard-coded vendor extraconfig things e.g in controller.yaml | 11:33 |
shardy | I'm not sure if that helps in this case, just mentioning FYI | 11:33 |
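The join-then-split idea described above can be modelled in a few lines of Python. This is only a sketch of the data flow, not Heat code: the two helper functions are named after the template functions for readability, and the hierarchy entries are hypothetical placeholders.

```python
from itertools import chain


def list_join(delimiter, *lists):
    """Join the items of one or more lists into a single delimited string."""
    return delimiter.join(chain.from_iterable(lists))


def str_split(delimiter, value):
    """Split a delimited string back into a list."""
    return value.split(delimiter)


# Hypothetical hierarchy entries, for illustration only.
base_hierarchy = ["controller", "common"]
extra_hierarchy = ["vendor_extra"]  # would come from a parameter

joined = list_join(",", base_hierarchy, extra_hierarchy)
hierarchy = str_split(",", joined)
print(hierarchy)
```

The round trip only works if no entry contains the delimiter itself, which is worth keeping in mind when choosing one.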
mburned | marios: https://review.openstack.org/#/c/300363/ <-- looks like it passed all tests in the last 2 runs... | 11:33 |
*** panda has joined #tripleo | 11:34 | |
mburned | marios: anything stopping +A? | 11:34 |
marios | mburned: looking | 11:34 |
marios | mburned: heh... :/ passed ha this run, previous run passed the other 2 | 11:35 |
gfidente | shardy ah you mean make hierarchy a list_join where additional hieradata is added from parameter? | 11:35 |
hewbrocca | SHIP IT | 11:35 |
marios | mburned: indeed +1 to merging and it is blocking the liberty | 11:35 |
mburned | marios: yep... | 11:35 |
marios | gfidente: jistr 14:33 < mburned> marios: https://review.openstack.org/#/c/300363/ <-- looks like it passed all tests in the last 2 runs... | 11:35 |
* mburned just triggered recheck on the liberty version | 11:35 | |
marios | going to +A | 11:35 |
hewbrocca | merge early, merge often | 11:35 |
shardy | gfidente: Yes, exactly | 11:36 |
jistr | marios: ack | 11:36 |
shardy | so e.g all these https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/controller.yaml#L1312 | 11:36 |
gfidente | shardy ah nice, I see | 11:36 |
shardy | could be appended via some parameter which is set according to the enabled things | 11:36 |
jistr | marios, gfidente: tbh i think we can do the same here no? https://review.openstack.org/#/c/300360/ | 11:36 |
*** zoli|intw is now known as zoli|lunch | 11:36 | |
shardy | we may be able to make use of ResourceChain too, in the context of composable services to do something similar | 11:36 |
jistr | or do we expect -upgrades to be passing on liberty now? | 11:36 |
marios | jistr: looking | 11:37 |
shardy | jistr: Hey, wondering if you spotted https://review.openstack.org/#/c/296578/ ? | 11:38 |
shardy | I was hoping you and stevebaker can take a look | 11:38 |
marios | jistr: but we should land mitaka first | 11:38 |
shardy | it's the first step to fixing update preview | 11:38 |
marios | jistr: (yes but..) | 11:38 |
marios | jistr: should merge in a sec | 11:39 |
jistr | shardy: makes sense to me, +2'd | 11:39 |
shardy | jistr: thanks! | 11:40 |
marios | shardy: heat question please https://bugzilla.redhat.com/show_bug.cgi?id=1324160#c7 | 11:40 |
openstack | bugzilla.redhat.com bug 1324160 in rhel-osp-director "Overcloud nodes have an empty /etc/resolv.conf post upgrade" [High,New] - Assigned to mandreou | 11:40 |
marios | comment 7.. the question is about the validity of '[]' for the actions of a structureddeployment | 11:40 |
*** ramishra has quit IRC | 11:41 | |
openstackgerrit | Merged openstack/instack-undercloud: Temporarily set +e on systemd-journald restart for +bug/1564471 https://review.openstack.org/300363 | 11:41 |
shardy | marios: so, you want to make NetworkDeployment not run, ever, even on CREATE? | 11:42 |
marios | shardy: yeah at least for the duration of the upgrade | 11:43 |
jistr | shardy: the problem there is that we have the actions set to ['CREATE'], but for some reason NetworkDeployment runs on update even though it shouldn't, so we want to try if [] would help | 11:43 |
shardy | marios: but the default NetworkDeploymentActions means we do nothing on UPDATE? | 11:43 |
gfidente | well not on an actual stack update | 11:44 |
gfidente | though on upgrade from 7 to 8 yes | 11:44 |
shardy | e.g CREATE should mean it does nothing on upgrade, because the Deployment is already CREATE_COMPLETE | 11:44 |
jistr | yea | 11:44 |
gfidente | shardy yeah it works as expected on a regular stack-update | 11:44 |
gfidente | it's upgrade from 7 to 8 where we see this happening | 11:44 |
shardy | jistr: the reason for that is probably not the Deployment - does it end up UPDATE_COMPLETE, or is it still CREATE_COMPLETE? | 11:45 |
hewbrocca | UPDATE_COMPLETE | 11:45 |
shardy | Ok, so an input value must be changing, either to the config or deployment | 11:45 |
jistr | ah so input value change overrides actions? | 11:46 |
jistr | as in, if input changes, it's always going to be redeployed, regardless of 'actions' value? | 11:46 |
shardy | Possibly, need to check, sec | 11:47 |
shardy | if that's the case, it's probably a bug, but a workaround will be to map OS::TripleO::Controller::Net::SoftwareConfig to a SoftwareConfig that does nothing | 11:47 |
shardy | Or, use an environment file to map NetworkDeployment to OS::Heat::None | 11:47 |
shardy | give me a few and I'll try a few things | 11:47 |
jistr | ok thank you | 11:48 |
marios | thanks shardy | 11:48 |
shardy | The other confusing thing here is the o-r-c script, e.g 20-os-net-config probably runs every time there's *any* change to the o-a-c data | 11:49 |
shardy | but that should just reassert the existing state | 11:49 |
gfidente | yeah because .json should be updated | 11:49 |
shardy | That's one reason I'm aiming for https://review.openstack.org/#/c/271450/ | 11:49 |
shardy | which just runs os-net-config via a script | 11:49 |
*** ramishra has joined #tripleo | 11:50 | |
shardy | that means we get an error path when it fails, and it means it's explicitly controlled via the Deployment without all the oac/orc stuff | 11:50 |
*** MaxPC has joined #tripleo | 11:50 | |
openstackgerrit | Merged openstack/instack-undercloud: Temporarily set +e on systemd-journald restart for +bug/1564471 https://review.openstack.org/300360 | 11:50 |
hewbrocca | woooooo | 11:52 |
*** ramishra has quit IRC | 11:54 | |
jaosorior | slagle: Hey dude, if you have time can you give this one a review? https://review.openstack.org/#/c/299475/ | 11:55 |
gfidente | jaosorior while messing with the 389 and keystone I realized there is no object class which provides the infamous 'enabled' attribute | 11:57 |
gfidente | jaosorior I resorted to nsAccountLock and _invert | 11:57 |
gfidente | jaosorior but that's not possible for tenants, which are not account entries | 11:57 |
gfidente | so is enabled in some freeipa schema? | 11:58 |
gfidente | cause openldap doesn't have it either | 11:58 |
*** ohamada has quit IRC | 11:58 | |
gfidente | I wonder if we shouldn't ask keystone to just make it possible to not filter for 'enabled' at all | 11:58 |
*** snecklifter_ has joined #tripleo | 12:00 | |
jaosorior | gfidente: Only thing I could find is this: https://www.rdoproject.org/documentation/keystone-integration-with-idm/ | 12:01 |
*** shadower90 has joined #tripleo | 12:01 | |
*** shadower90 has quit IRC | 12:03 | |
jistr | clean corosync log @ https://review.openstack.org/#/c/301836 | 12:04 |
jistr | this http://fpaste.org/350350/94384714/raw/ | 12:04 |
gfidente | jaosorior yeah it's using emulation | 12:04 |
gfidente | see tenant_enabled_emulation and user_enabled_emulation | 12:04 |
shardy | jistr: so, testing shows that despite the deployment going UPDATE_COMPLETE, the deploy actions are respected | 12:04 |
gfidente | so ipa doesn't have 'enabled' either it seems | 12:04 |
jistr | my 2 cents is we should merge ^^ and revert if it does trouble | 12:04 |
shardy | e.g the deployment doesn't actually run | 12:04 |
shardy | https://github.com/openstack/heat/blob/master/heat/engine/resources/openstack/heat/software_deployment.py#L259 | 12:04 |
shardy | that shows why - we process the update, but return before doing anything if the action isn't in the DEPLOY_ACTIONS list | 12:05 |
shardy | so, I think if you're seeing os-net-config running again, it's the orc script doing it (with the old config) not Heat triggering anything | 12:05 |
jaosorior | gfidente: Seems not :/ | 12:05 |
jistr | shardy: thanks for the info, that's very helpful. So it's probably not Heat triggering o-n-c after all. /cc gfidente, marios, mcornea | 12:06 |
*** masco has quit IRC | 12:06 | |
jistr | i haven't been able to reproduce the issue for me | 12:06 |
jistr | still have the same resolv.conf as before upgrade | 12:07 |
*** tiswanso has joined #tripleo | 12:07 | |
shardy | jistr, marios: as far as I can see from the bug, the issue is the upgraded version of os-net-config is configuring things differently, when it's supposed to be idempotent when run multiple times with the same config | 12:07 |
*** tiswanso has quit IRC | 12:07 | |
*** Marga_ has quit IRC | 12:07 | |
gfidente | so maybe we should compare the config.json file | 12:08 |
*** tiswanso has joined #tripleo | 12:08 | |
marios | shardy: right, we were wondering earlier if it was something like the update itself of the os-net-config package which triggers it to reapply the config | 12:08 |
gfidente | but I am pretty sure on a stack-update I don't see the resource moving in UPDATE | 12:08 |
gfidente | while during the upgrade yes | 12:08 |
marios | so there goes the nice workaround | 12:08 |
shardy | gfidente: It'll only try to update if you change one of the input_values | 12:09 |
shardy | Hmm, actually wait | 12:09 |
*** Marga_ has joined #tripleo | 12:09 | |
marios | mcornea tried the [] too and it didn't work https://bugzilla.redhat.com/show_bug.cgi?id=1324160#c8 | 12:10 |
openstack | bugzilla.redhat.com bug 1324160 in rhel-osp-director "Overcloud nodes have an empty /etc/resolv.conf post upgrade" [High,New] - Assigned to mandreou | 12:10 |
shardy | I don't think NetworkDeploymentActions will help, because Heat isn't doing anything here except returning None | 12:10 |
marios | so mcornea confirms it happens at upgrade step3 which is when the controllers are upgraded, so they get new os-net-config | 12:10 |
bandini | gfidente: remind me the bug you worked on some time ago with ipv6 and no route to host errors? | 12:11 |
marios | which we think is just triggering the re-application, but which now happens slightly differently with inclusion of netmask | 12:11 |
jistr | ok this supports the theory that this isn't heat related at all | 12:12 |
marios | yeah :/ | 12:12 |
*** morazi has joined #tripleo | 12:12 | |
shardy | So, yes, if input_values change, the Deployment goes UPDATE_COMPLETE, but the config is not reapplied, because we exit due to DEPLOY_ACTIONS | 12:12 |
shardy | but if you don't change anything, it'll stay CREATE_COMPLETE, as we don't even try to update it | 12:13 |
shardy | in either case, nothing at all should happen on the node (from Heat's perspective) | 12:13 |
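The behaviour shardy describes can be sketched as a small Python model. This is an illustration only, not Heat's code: `process_deployment` and its arguments are hypothetical names, and the logic is reduced to the two cases discussed above.

```python
def process_deployment(state, action, deploy_actions, inputs_changed):
    """Return (new_state, config_applied) for one stack operation.

    Hypothetical model of the discussion above, not the Heat implementation.
    """
    if action == "UPDATE" and not inputs_changed:
        # Nothing differs, so the resource is not updated at all.
        return state, False
    new_state = f"{action}_COMPLETE"
    if action not in deploy_actions:
        # The update is processed, but we return before deploying anything.
        return new_state, False
    return new_state, True


# Upgrade case from the log: inputs changed, actions set to ['CREATE'].
print(process_deployment("CREATE_COMPLETE", "UPDATE", ["CREATE"], True))
# Plain stack-update with no input changes.
print(process_deployment("CREATE_COMPLETE", "UPDATE", ["CREATE"], False))
```

In both cases the second element is False, which matches the point that nothing should run on the node from Heat's side.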
*** jayg|g0n3 is now known as jayg | 12:13 | |
shardy | So it must be one of: | 12:14 |
marios | shardy: looks like os-net-config writes the ifcfg files slightly differently and that triggers the network restart which wipes resolv.conf | 12:14 |
shardy | 20-os-net-config runs every time *any* change to the oac data happens | 12:14 |
shardy | marios: OK, so I guess we either fix os-net-config, or chmod/move 20-os-net-config so it can never run | 12:15 |
marios | shardy: this happens just as we have upgraded the controllers, so it gets the new version of os-net-config which writes the files slightly differently than before (it now includes the netmask on the ip addr) | 12:16 |
shardy | aha | 12:16 |
shardy | perhaps dprince or dsneddon will have some ideas on if we can just fix that in os-net-config | 12:16 |
shardy | it seems to fail the idempotency requirement | 12:16 |
shardy | assuming the config itself hasn't actually changed | 12:16 |
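A rough Python sketch of why a formatting-only change breaks idempotency, assuming the tool restarts an interface whenever the rendered file differs from the one on disk. The function names, the `IPV6ADDR` line, and the address are illustrative assumptions, not os-net-config code.

```python
def render_ifcfg(device, ip_addr, prefix_len, append_prefix):
    """Render a minimal ifcfg body; append_prefix models the newer output style."""
    addr = f"{ip_addr}/{prefix_len}" if append_prefix else ip_addr
    return f"DEVICE={device}\nIPV6ADDR={addr}\n"


def needs_restart(on_disk, rendered):
    """An interface is restarted only when its file content would change."""
    return on_disk != rendered


# Same logical config, rendered by the "old" and the "new" version of the tool.
old = render_ifcfg("vlan10", "fd00:fd00:fd00:3000::12", 64, append_prefix=False)
new = render_ifcfg("vlan10", "fd00:fd00:fd00:3000::12", 64, append_prefix=True)

print(needs_restart(old, old))  # same renderer, same config
print(needs_restart(old, new))  # same config, different rendering
```

The second comparison reports a difference even though the input config is unchanged, which is the idempotency failure being discussed.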
gfidente | shardy thanks | 12:18 |
pradk | shardy, gfidente, the gnocchi patch is passing ha and nonha can we get some +2's and get it in? https://review.openstack.org/#/c/252032/ | 12:18 |
*** ramishra has joined #tripleo | 12:21 | |
*** shadower39 has joined #tripleo | 12:21 | |
*** xinwu has quit IRC | 12:22 | |
jistr | ah so by the action: ['CREATE'] on NetworkDeployment we prevent changes to the configuration coming from o-a-c, but we do not prevent os-net-config from running, so a change in os-net-config itself can still result in network bumps etc. | 12:22 |
*** shadower39 has quit IRC | 12:22 | |
hewbrocca | jistr: ARRRGH | 12:22 |
hewbrocca | you're right, that's the problem | 12:22 |
shardy | jistr: Yes, that's my understanding of it | 12:22 |
shardy | jistr: until we land https://review.openstack.org/#/c/271450/ | 12:22 |
hewbrocca | I still think it's narsty that making a perfectly reasonable network change | 12:22 |
hewbrocca | blats over your resolv.conf | 12:22 |
shardy | which is my WIP attempt to kill all the os-net-config oac/orc stuff | 12:23 |
hewbrocca | that seems like the real problem to me | 12:23 |
*** zoli|lunch is now known as zoli | 12:23 | |
*** trown|outtypewww is now known as trown | 12:23 | |
marios | jistr: mcornea shardy: updated https://bugzilla.redhat.com/show_bug.cgi?id=1324160#c9 | 12:23 |
openstack | bugzilla.redhat.com bug 1324160 in rhel-osp-director "Overcloud nodes have an empty /etc/resolv.conf post upgrade" [High,New] - Assigned to mandreou | 12:23 |
jistr | hewbrocca: yea that's another issue probably. Re-running os-net-config should ideally generate pretty much the same resolv.conf as before. | 12:24 |
shardy | marios: hehe, mid-air-collision, added some notes also | 12:24 |
hewbrocca | Unless someone makes an explicit change to affect resolv.conf... | 12:25 |
hewbrocca | it should generate EXACTLY the same one | 12:25 |
hewbrocca | I mean | 12:25 |
marios | shardy: ack thanks i was going to ask you but then thought we'd already hassled you enough | 12:25 |
hewbrocca | that's a fairly large bug | 12:25 |
shardy | marios: np, I'll push on getting that mega-patch linked into shape so we can move away from the element stuff completely | 12:25 |
*** pradk_ has joined #tripleo | 12:26 | |
jistr | also, i think i sunk my pitch for this patch in the previous discussion https://review.openstack.org/#/c/301836 | 12:26 |
jistr | very likely improved HA CI pass rate ^^ ... any takers? | 12:26 |
* jistr working on pitching skills | 12:26 | |
hewbrocca | ALL THIS CAN BE YOURS | 12:27 |
hewbrocca | for the price of one tiny +2 | 12:27 |
hewbrocca | ^^^ pitch | 12:27 |
jistr | TIL | 12:27 |
marios | jistr: 10000 milliseconds | 12:27 |
marios | 10,000 | 12:28 |
jistr | ten thousand | 12:28 |
marios | lol | 12:28 |
marios | jistr: yeah but | 12:28 |
marios | $cluster_setup_extras = { '--token' => hiera('corosync_token_timeout', 1000) } | 12:28 |
* shardy was just about to ask the same | 12:28 | |
jistr | yea i explained in the commit message, but i'm open to -1s :D | 12:29 |
* marios re-reads | 12:29 | |
jistr | i wanted the manifest default to be the same as the real corosync default | 12:29 |
gfidente | bandini ipv6 and no routes? | 12:29 |
jistr | but if that feels weird i can sync it with hiera, or remove the manifest default altogether | 12:29 |
shardy | Ok, provided it was intentional +2 :) | 12:29 |
bandini | gfidente: yea, unless I misremember. In which case ignore me ;) | 12:30 |
marios | jistr: i see so you set 10K but if that is unset for some reason, set the default which is 1K | 12:30 |
jistr | yea | 12:30 |
marios | jistr: yeah ok, i mean if we just want the default then i'd rather not set anything at all but it works like that too | 12:31 |
marios | jistr: i guess its not likely to change in the module ... | 12:31 |
jistr | marios: right, not setting anything would be best functional-wise, but it could make some complex/ugly puppet. We already if/else based on IPv6 in the same spot. | 12:32 |
marios | jistr: ack i +2 with comment | 12:32 |
jistr | maybe using joins as what we do with cinder backends could be done | 12:32 |
jistr | hmm cinder backends is array though, not map, but perhaps there's a similar approach | 12:34 |
jistr | if it's not a blocker i'd be inclined to land it though | 12:35 |
jistr | can we do so without -upgrades passing? it has about 1 in 10 (or less) chance of passing, judging by http://tripleo.org/cistatus.html | 12:35 |
jistr | so recheck might not help | 12:36 |
*** Marga_ has quit IRC | 12:36 | |
trown | +1 to ignoring upgrades job until we up the pass rate | 12:37 |
trown | it is below 10% over the last 3 days conditional on the nonha job passing on the same patch | 12:37 |
openstackgerrit | Dan Prince proposed openstack-infra/tripleo-ci: Metrics tracking for TripleO deployment tasks https://review.openstack.org/291393 | 12:37 |
openstackgerrit | Dan Prince proposed openstack-infra/tripleo-ci: Add common bash functions to help track metrics. https://review.openstack.org/291392 | 12:37 |
trown | http://chunk.io/f/37247de59deb451dbef7be66bcaacb4f | 12:38 |
marios | jistr: gfidente shardy how about an exclude=os-net-config to prevent update for now in yum.conf? | 12:38 |
trown | it has also passed 0 times in 31 tries in the last 3 days on stable/liberty | 12:39 |
*** julim has joined #tripleo | 12:39 | |
derekh | going AFK for a bit, started looking into why the periodic job is failing, looks like it's an error coming back from the IPA agent https://etherpad.openstack.org/p/delorean_master_current_issues | 12:40 |
derekh | will resume looking into it later | 12:40 |
gfidente | trown I was long looking for chunk.io :) | 12:40 |
trown | it is pretty sweet | 12:40 |
trown | being able to pipe output to public paste for the win | 12:41 |
* derekh sometimes uses fpaste, chunk.io looks better, fewer dependencies | 12:41 | |
trown | derekh: as in no deps... just curl | 12:41 |
derekh | trown: curl is a dep | 12:41 |
trown | :) | 12:41 |
trown | technically you could use any tool that can do HTTP POST | 12:42 |
*** ramishra has quit IRC | 12:42 | |
derekh | and technically they're all deps | 12:42 |
gfidente | derekh is there a cli client for fpaste? | 12:43 |
marios | mcornea: do you still have the env you tried the [] on? | 12:44 |
*** dprince has joined #tripleo | 12:44 | |
mcornea | marios: yes; credentials are in https://bugzilla.redhat.com/show_bug.cgi?id=1324160#c2 | 12:45 |
openstack | bugzilla.redhat.com bug 1324160 in rhel-osp-director "Overcloud nodes have an empty /etc/resolv.conf post upgrade" [High,New] - Assigned to mandreou | 12:45 |
marios | mcornea: thanks just want to confirm the os-net-config update | 12:45 |
*** julim has quit IRC | 12:45 | |
marios | mcornea: looking | 12:45 |
gfidente | derekh trown but hey it's just a GET or POST with fpaste too | 12:45 |
gfidente | http://fpaste.org/doc/api/#create | 12:45 |
*** egafford has quit IRC | 12:46 | |
*** tiswanso has quit IRC | 12:46 | |
hewbrocca | marios: we need the stuff in the new os-net-config, is the problem :( | 12:46 |
marios | Apr 06 11:57:40 Updated: 1:NetworkManager-config-server-1.0.6-29.el7_2.x86_64 | 12:46 |
derekh | gfidente: yup install fpaste , but I prefer trowns method | 12:47 |
jistr | this is what the -upgrade job failed at -- ping validation | 12:47 |
jistr | Trying to ping fd00:fd00:fd00:3000::12 for local network fd00:fd00:fd00:3000::/64...SUCCESS | 12:47 |
jistr | dub 06 12:18:36 overcloud-cephstorage-0 os-collect-config[5264]: Trying to ping fd00:fd00:fd00:4000::12 for local network fd00:fd00:fd00:4000::/64...SUCCESS | 12:47 |
jistr | dub 06 12:18:36 overcloud-cephstorage-0 os-collect-config[5264]: Trying to ping default gateway 192.0.2.1...FAILURE | 12:47 |
jistr | dub 06 12:18:36 overcloud-cephstorage-0 os-collect-config[5264]: 192.0.2.1 is not pingable. | 12:47 |
*** rbrady has joined #tripleo | 12:48 | |
marios | hewbrocca: yeah and it is indeed updated Apr 06 11:56:20 Updated: os-net-config-0.2.2-1.el7ost.noarch | 12:48 |
* marios just sanity checking | 12:48 | |
*** jpena|lunch is now known as jpena | 12:49 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Increase corosync token timeout https://review.openstack.org/301836 | 12:55 |
*** links has quit IRC | 12:56 | |
openstackgerrit | Michele Baldessari proposed openstack/tripleo-heat-templates: [WIP] Add update path for keystone to be moved under wsgi https://review.openstack.org/302235 | 12:57 |
*** rlandy has joined #tripleo | 12:58 | |
openstackgerrit | Dmitry Ilyin proposed openstack/puppet-pacemaker: Merge with fuel-infra/puppet-pacemaker https://review.openstack.org/296440 | 12:58 |
jistr | marios: hmm trying to think how to update it, but fully prevent it from running on existing deployments. We could just rm 20-net-config, but that feels a bit harsh. | 12:58 |
*** akshai has joined #tripleo | 13:01 | |
jistr | hmm looking at failed https://review.openstack.org/#/c/301840 -- also failed at *AllNodesValidationDeployment (this time controller, not ceph). It's intermittent though, so it could be some kind of a race condition. | 13:03 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/python-tripleoclient: Updated from global requirements https://review.openstack.org/268528 | 13:04 |
*** bvandenh has quit IRC | 13:04 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: Tempest for tripleo CI https://review.openstack.org/295844 | 13:04 |
trown | derekh: I added an etherpad to the trunk issues etherpad that describes building a trunk image using tripleo-quickstart, I am building an image now, and will poke at it https://etherpad.openstack.org/p/trunk-undercloud-image | 13:05 |
*** saneax is now known as saneax_AFK | 13:06 | |
*** nico_auv has joined #tripleo | 13:06 | |
*** myoung has joined #tripleo | 13:10 | |
*** myoung|remote has joined #tripleo | 13:11 | |
*** myoung|remote has quit IRC | 13:11 | |
openstackgerrit | Jiri Stransky proposed openstack-infra/tripleo-ci: Add routing info to host_info.txt https://review.openstack.org/302246 | 13:12 |
*** myoung|afk has quit IRC | 13:13 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: Tempest for tripleo CI https://review.openstack.org/295844 | 13:16 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: Run tempest tests with periodic jobs https://review.openstack.org/297038 | 13:16 |
shardy | guys what's the status of the upgrades job now? | 13:17 |
shardy | we landed the use-master-heat patch, but what issue remain to get that passing again? | 13:17 |
*** Goneri has joined #tripleo | 13:18 | |
jistr | shardy: intermittent failures trying to ping default gateway | 13:19 |
jistr | i pasted some info above | 13:19 |
jistr | could be some kind of race perhaps | 13:20 |
jistr | submitted https://review.openstack.org/302246 if it can help us spot anything interesting | 13:20 |
shardy | jistr: ack, thanks | 13:21 |
*** dprince has quit IRC | 13:22 | |
shardy | the upgrades job is the only one we have w/ipv6 so are we blocking everything until we get this resolved? | 13:22 |
*** tiswanso has joined #tripleo | 13:23 | |
shardy | pradk_: ^^ this is the risk if we land your patch with failing CI, we have no coverage of ipv6 unless the upgrades job passes | 13:23 |
openstackgerrit | Michele Baldessari proposed openstack/tripleo-heat-templates: Add update path for keystone to be moved under wsgi https://review.openstack.org/292535 | 13:23 |
*** dustins has joined #tripleo | 13:24 | |
*** dprince has joined #tripleo | 13:29 | |
*** bvandenh has joined #tripleo | 13:34 | |
*** pblaho has joined #tripleo | 13:41 | |
*** tzumainn has joined #tripleo | 13:41 | |
jrist | dprince: ! | 13:43 |
jrist | that was fast | 13:43 |
dprince | jrist: yeah well it auto-deploys man | 13:44 |
*** jaosorior has quit IRC | 13:44 | |
dprince | jrist: I just accepted the pull request ;) | 13:44 |
dprince | jrist: I did test it first mind you. But it looked fine to me | 13:44 |
pradk | shardy, from what i can tell the upgrade failure is not related to my patch | 13:45 |
dprince | jrist: much improved, one minor suggestion would be to make the letters "pop" a bit more | 13:45 |
jrist | dprince: same | 13:45 |
jrist | dprince: hmm | 13:45 |
pradk | shardy, there was one issue i found and fixed already https://review.openstack.org/#/c/301798/ | 13:45 |
jrist | I thought the drop shadow did that | 13:45 |
dprince | jrist: they are a bit gray to me... but totally just me perhaps | 13:45 |
dprince | jrist: seriously, it is a huge step forward. So ++ | 13:46 |
jrist | well it is grey | 13:46 |
jrist | #c3c3c3 | 13:46 |
jrist | specifically | 13:46 |
dprince | jrist: let it ride man. It is good | 13:47 |
derekh | trown: ack | 13:47 |
dprince | jrist: print the t-shirts, stickers, hats | 13:47 |
jrist | dprince: yeah we can always iterate :) | 13:47 |
jrist | dprince: I think rbrady is working on stickers :) | 13:47 |
hewbrocca | We can make swag happen | 13:48 |
rbrady | jrist: limited run of vinyl | 13:48 |
jrist | hewbrocca: \o/ | 13:48 |
slagle | do i get a free mp3 download with the vinyl | 13:48 |
marios | jrist: link or it didn't happen | 13:48 |
jrist | tripleo.org marios ! | 13:48 |
*** pradk_ has quit IRC | 13:49 | |
marios | lol /me grabs his coat | 13:49 |
marios | nice | 13:49 |
*** jprovazn has quit IRC | 13:50 | |
*** egafford has joined #tripleo | 13:51 | |
bandini | dprince: I am *totally* feeling the owl | 13:52 |
bandini | ;) | 13:52 |
*** ramishra has joined #tripleo | 13:52 | |
*** ramishra has quit IRC | 13:56 | |
openstackgerrit | Jiri Tomasek proposed openstack/tripleo-ui: Move Environment and Parameters config to single modal https://review.openstack.org/302272 | 13:58 |
openstackgerrit | Pierre Blanc proposed openstack/tripleo-heat-templates: Add network ExtraConfig hook https://review.openstack.org/265577 | 13:58 |
dprince | bandini: thank jrist | 13:59 |
jrist | bandini: glad to hear :) | 13:59 |
*** apetrich has quit IRC | 14:00 | |
bandini | jrist: \o/ ;) | 14:00 |
*** saneax_AFK is now known as saneax | 14:02 | |
*** electrofelix has quit IRC | 14:02 | |
*** liverpooler has quit IRC | 14:10 | |
*** jaosorior has joined #tripleo | 14:12 | |
*** mkovacik has quit IRC | 14:13 | |
*** snecklifter_ has quit IRC | 14:15 | |
*** snecklifter_ has joined #tripleo | 14:15 | |
*** snecklifter has quit IRC | 14:16 | |
*** snecklifter_ has quit IRC | 14:16 | |
*** snecklifter has joined #tripleo | 14:16 | |
*** snecklifter_ has joined #tripleo | 14:17 | |
snecklifter | Hi, sorry to be a noob but my patch is failing and I'm unsure whether it's a problem with the patch or a wider issue with CI at the moment? | 14:21 |
snecklifter | https://review.openstack.org/#/c/291243/ | 14:21 |
*** ramishra has joined #tripleo | 14:22 | |
trown | snecklifter: no need to apologize, the upgrades job is very unstable, and the ha job is only slightly better. Folks are working on it though | 14:23 |
snecklifter | trown: thanks, so do you think I need to spend more time on debugging my patch or is it an issue with CI? | 14:23 |
snecklifter | I have done one re-check and same failure | 14:24 |
trown | snecklifter: I would not spend time debugging those jobs right now, I will add myself to your patch and recheck when CI is in a better state | 14:24 |
trown | sorry for the inconvenience | 14:24 |
snecklifter | trown: no problem, stuff happens, thanks for responding | 14:24 |
*** ramishra has quit IRC | 14:27 | |
*** dustins has quit IRC | 14:30 | |
*** Goneri has quit IRC | 14:32 | |
*** oshvartz has quit IRC | 14:33 | |
*** dustins has joined #tripleo | 14:34 | |
*** electrofelix has joined #tripleo | 14:40 | |
*** ohamada has joined #tripleo | 14:40 | |
*** jprovazn has joined #tripleo | 14:43 | |
*** dprince has quit IRC | 14:46 | |
*** Goneri has joined #tripleo | 14:46 | |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-heat-templates: Update .sh references from openstack-keystone to openstack-core https://review.openstack.org/296592 | 14:48 |
gfidente | bandini jistr ^^ massive removal | 14:49 |
bandini | w00t | 14:50 |
gfidente | bandini given you call the migration script early | 14:54 |
*** jaosorior has quit IRC | 14:54 | |
gfidente | I think we might also remove the if openstack-core thing | 14:54 |
gfidente | and just disable it | 14:54 |
gfidente | see inline comments | 14:54 |
hewbrocca | shardy, gfidente any more progress on that resolv.conf thing? | 14:56 |
*** yamahata has quit IRC | 14:59 | |
*** mbound has quit IRC | 15:01 | |
*** mbound has joined #tripleo | 15:01 | |
*** julim has joined #tripleo | 15:06 | |
shardy | hewbrocca: Not from me - it looks like the added subnet mask comes from this commit: | 15:07 |
shardy | https://github.com/openstack/os-net-config/commit/95a8412bd9a80ce1e15363ae6c1a51db9432f9d7 | 15:07 |
shardy | I was hoping for some feedback from dsneddon or dprince | 15:07 |
gfidente | shardy yeah so given the config.json does not change | 15:07 |
gfidente | and removing the orc script isn't great | 15:08 |
gfidente | either we remove that | 15:09 |
gfidente | patch | 15:09 |
*** mbound has quit IRC | 15:09 | |
gfidente | or add dns_servers for all interfaces in the nic templates | 15:09 |
*** ramishra has joined #tripleo | 15:09 | |
derekh | trown: ImportError: No module named ironic_lib | 15:09 |
*** dshulyak has quit IRC | 15:10 | |
trown | derekh: weird... we fixed that https://github.com/openstack-packages/ironic-python-agent/blob/rpm-master/openstack-ironic-python-agent.spec#L57 | 15:10 |
trown | derekh: are you using "consistent" | 15:10 |
hewbrocca | gfidente: wait | 15:11 |
*** egafford has quit IRC | 15:11 | |
hewbrocca | add dns_servers for all interfaces... | 15:11 |
*** absubram has joined #tripleo | 15:11 | |
derekh | trown: yup | 15:11 |
hewbrocca | why wouldn't we just *do that*? | 15:11 |
hewbrocca | Surely it is the correct solution | 15:11 |
trown | derekh: because it is unfortunately a bit old since there is a new dep added for oslo config generator that is not packaged yet | 15:11 |
gfidente | hewbrocca :( | 15:11 |
trown | derekh: I think while we wait for RDO to pick up newton we have to use current | 15:11 |
hewbrocca | gfidente: like, I am perfectly fine with os-net-config updating the vlan config | 15:12 |
hewbrocca | that is not a bug AFAIK | 15:12 |
hewbrocca | But it shouldn't hose the nameservers when it does it | 15:12 |
trown | derekh: kind of hard for RDO to pick up newton before releasing mitaka | 15:12 |
derekh | trown: ok, I'm gonna start over with /current and see how far I get | 15:12 |
derekh | trown: yup, a lot of balls to juggle | 15:12 |
trown | derekh: did you disable mistral in your run? I got stuck there | 15:12 |
hewbrocca | So if we can prevent that by putting dns_servers everywhere | 15:12 |
hewbrocca | ... fine? | 15:12 |
derekh | trown: nope did a run as close to ci as possible, the ironic-lib dep is the first thing I hit | 15:13 |
*** dustins_ has joined #tripleo | 15:13 | |
trown | derekh: ok, probably something specific to tripleo-quickstart then | 15:14 |
derekh | trown: my main problem was figuring out how to get onto the instance and see the error | 15:14 |
derekh | trown: possibly | 15:14 |
*** dmacpher has joined #tripleo | 15:14 | |
openstackgerrit | Steven Hardy proposed openstack-infra/tripleo-ci: tripleo.sh --delorean-build handle oslo.* package builds https://review.openstack.org/302312 | 15:14 |
gfidente | hewbrocca give me a little more to figure why ifcfg is doing the nameservers cleanup | 15:14 |
*** pblaho has quit IRC | 15:14 | |
gfidente | hewbrocca I think it's ifcfg which is mistakenly clearing resolv.conf | 15:14 |
*** dustins has quit IRC | 15:16 | |
*** Marga_ has joined #tripleo | 15:16 | |
hewbrocca | gfidente: OK, makes sense | 15:16 |
*** jcoufal has joined #tripleo | 15:19 | |
*** dtantsur|brb is now known as dtantsur | 15:19 | |
*** mcornea has quit IRC | 15:22 | |
gfidente | hewbrocca though we didn't want the interfaces to go up/down to not interfere with pcmk | 15:22 |
*** leanderthal is now known as leanderthal|afk | 15:22 | |
openstackgerrit | Dimitri Savineau proposed openstack/python-tripleoclient: Generate ceph client key https://review.openstack.org/302318 | 15:24 |
openstackgerrit | Dimitri Savineau proposed openstack/tripleo-heat-templates: Use a different ceph key for admin/client user https://review.openstack.org/302319 | 15:24 |
hewbrocca | gfidente: yes, that is true | 15:24 |
*** egafford has joined #tripleo | 15:25 | |
hewbrocca | won't matter on the upgrade though since cluster is down anyway | 15:25 |
gfidente | shardy so it looks to me if we call first_v6.ip we can still support multiple ips but we won't update it to /64 ? | 15:25 |
hewbrocca | and for that matter | 15:25 |
hewbrocca | it wouldn't matter on the update because the cluster is down then too | 15:25 |
gfidente | oh right it's in maintenance at that stage | 15:25 |
*** mcornea has joined #tripleo | 15:27 | |
*** mcornea has quit IRC | 15:27 | |
hewbrocca | right | 15:29 |
hewbrocca | because yum update misbehaves a lot of things | 15:29 |
*** dshulyak has joined #tripleo | 15:29 | |
*** shardy has quit IRC | 15:31 | |
*** rajinir has joined #tripleo | 15:32 | |
*** aufi has quit IRC | 15:35 | |
*** mikelk has quit IRC | 15:36 | |
gfidente | so I reproduced it | 15:36 |
gfidente | we set the DNS in the ifcfg-br interface but if one of the vlan ifcfg interfaces is brought down and up after that, and it doesn't have DNS | 15:37 |
gfidente | the resolv.conf will be restored | 15:37 |
hewbrocca | gfidente: what does that mean | 15:38 |
openstackgerrit | Jiri Stransky proposed openstack-infra/tripleo-ci: Add routing info to host_info.txt https://review.openstack.org/302246 | 15:41 |
hewbrocca | poor jistr is still trying to fix CI | 15:42 |
hewbrocca | jistr: good work... | 15:42 |
hewbrocca | gfidente: so is what you're describing above a solution? | 15:42 |
jistr | well this is just adding debug info | 15:42 |
gfidente | hewbrocca no | 15:42 |
derekh | trown: ^[[mNotice: /Stage[main]/Mistral::Db::Sync/Exec[mistral-db-sync]/returns: raise TypeError('{!r} is not a Python function'.format(func))^[[0m | 15:42 |
hewbrocca | gfidente: excellent | 15:43 |
derekh | trown: is that the mistral error you were seeing? | 15:43 |
hewbrocca | gfidente: what is the solution? :) | 15:43 |
gfidente | hewbrocca I think there is a change we could do in os-net-config to make it not rerun | 15:43 |
*** julim has quit IRC | 15:43 | |
hewbrocca | Well | 15:43 |
trown | derekh: indeed | 15:44 |
gfidente | passing dns_servers to all interfaces is an alternative | 15:44 |
gfidente | but I think the real issue is in the ifup-post thing | 15:45 |
*** masco has joined #tripleo | 15:46 | |
hewbrocca | gfidente: passing dns_servers to all interfaces has the advantage of being a doc fix (basically) | 15:46 |
hewbrocca | like we could do that *now* | 15:46 |
gfidente | uhm we need to do that in the nic templates we ship | 15:47 |
gfidente | too | 15:47 |
hewbrocca | Sure | 15:47 |
hewbrocca | but still, basically a doc fix? | 15:47 |
gfidente | I can put up a change for that anyway yes | 15:47 |
gfidente | maybe together with the os-net-config change | 15:47 |
bnemec | Which is a problem because it means all existing templates in the field are now broken. | 15:47 |
hewbrocca | yay field | 15:47 |
hewbrocca | bnemec: you are right | 15:47 |
bnemec | Required changes to the nic-configs are extremely painful. | 15:47 |
hewbrocca | I mean, that's still a bandaid | 15:48 |
hewbrocca | What I want is for ifconfig to stop blatting over resolv.conf | 15:48 |
gfidente | hewbrocca yeah that's what I am looking into now | 15:48 |
jistr | gfidente: just wondering if passing dns_servers is enough. We get a good resolv.conf, but the ifaces still get bumped, which could make pacemaker go wild, no? | 15:48 |
hewbrocca | jistr: cluster's down, isn't it? | 15:48 |
bnemec | Yeah, I don't understand how this whole per-interface DNS is supposed to work. There's only one resolv.conf on the system. | 15:48 |
gfidente | ifup script in network-scripts | 15:49 |
bnemec | I assume it made sense to someone somewhere, but it doesn't to me. | 15:49 |
hewbrocca | bnemec: yeah. It seems just basically flawed | 15:49 |
hewbrocca | dsneddon: probably has a better answer | 15:49 |
*** yamahata has joined #tripleo | 15:49 | |
*** pblaho has joined #tripleo | 15:50 | |
jistr | hewbrocca: good point, it could be. I'm not sure where exactly the 20-net-config run which does this happens, tbh. It's not during yum update, but i think it would be before pcs cluster start gets applied. | 15:50 |
jistr | so yeah cluster is probably down | 15:50 |
gfidente | so the problem is as follows | 15:51 |
gfidente | you start with an empty resolv.conf | 15:52 |
gfidente | if-br-cfg makes a copy of it and uses the DNS servers we specify | 15:52 |
gfidente | when later we re-run os-net-config it only has to update the vlans | 15:52 |
gfidente | in bringing those down it restores the empty resolv.conf to what it was | 15:52 |
gfidente | and in bringing it up it doesn't write back the changes because the vlan interfaces have no dns_servers | 15:53 |
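The sequence gfidente walks through can be modelled with a toy Python class. This is a sketch of the described save/restore behaviour only, not the actual network-scripts logic; `Host`, its methods, the interface names, and the nameserver address are all hypothetical.

```python
class Host:
    """Toy model of resolv.conf handling as described in the log above."""

    def __init__(self):
        self.resolv_conf = []  # starts out empty
        self.saved = None      # copy taken before DNS servers are first written

    def ifup(self, iface, dns_servers=None):
        # Only an interface that carries DNS settings rewrites resolv.conf.
        if dns_servers:
            if self.saved is None:
                self.saved = list(self.resolv_conf)
            self.resolv_conf = list(dns_servers)

    def ifdown(self, iface):
        # Bringing an interface down restores the saved copy, if there is one.
        if self.saved is not None:
            self.resolv_conf = self.saved
            self.saved = None


host = Host()
host.ifup("br-ex", dns_servers=["192.0.2.53"])  # bridge carries the DNS servers
host.ifdown("vlan10")                           # re-run touches only the vlan
host.ifup("vlan10")                             # vlan has no dns_servers
print(host.resolv_conf)
```

In this model, passing `dns_servers` to the final `ifup("vlan10", ...)` call leaves a populated resolv.conf, which corresponds to the "add dns_servers for all interfaces" alternative raised earlier.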
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Use split cirros image for ping test https://review.openstack.org/301951 | 15:53 |
bnemec | derekh: slagle: ^ may help our CI load significantly. | 15:53 |
bnemec | Running the ping test locally, I realized how long it takes to create and delete a 10 gig volume. | 15:54 |
bnemec | 1 gig should be a lot better. | 15:54 |
hewbrocca | bnemec: +1! | 15:54 |
hewbrocca | gfidente: oh... huh... that sucks, doesn't it | 15:54 |
bnemec | Also, the cirros image was basically pingable within seconds of the stack being complete, which is also a huge improvement. | 15:55 |
*** dshulyak has quit IRC | 15:56 | |
hewbrocca | OK folks, I really do have to take off | 15:56 |
*** electrofelix has quit IRC | 15:56 | |
*** yamahata has quit IRC | 15:59 | |
*** yamahata has joined #tripleo | 15:59 | |
hewbrocca | Get all the bugs fixed please, would you? | 16:00 |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Do not append netmaks suffix to IPV6 addresses https://review.openstack.org/302337 | 16:01 |
gfidente | bnemec jistr so ^^ restores the behaviour we had before | 16:01 |
gfidente | but there could still be other things which trigger a vlan down/up | 16:01 |
mburned | gfidente: that's for the resolv.conf issue? | 16:01 |
gfidente | mburned yeah | 16:02 |
bnemec | gfidente: Is there just not supposed to be a netmask on the address for ipv6? I think I missed that part of the discussion. | 16:02 |
*** devvesa has quit IRC | 16:02 | |
gfidente | there can be, it's optional | 16:02 |
gfidente | while for ipv4 it goes in the NETMASK thing | 16:03 |
gfidente | let me add some more stuff in the commit msg | 16:03 |
*** pblaho has quit IRC | 16:03 | |
bnemec | That would be good. | 16:04 |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Do not append netmaks suffix to IPV6 addresses https://review.openstack.org/302337 | 16:05 |
gfidente | bnemec done, but it isn't fixing an issue | 16:05 |
gfidente | it's just trying to stop os-net-config from rerunning | 16:06 |
hewbrocca | I would love it if there was some -- *any* -- better way to fix this problem | 16:07 |
hewbrocca | since this patch doesn't really fix it in any real sense | 16:07 |
hewbrocca | but, it's better than nothing I suppose | 16:07 |
hewbrocca | now I really am leaving | 16:07 |
gfidente | yeah it's just meant to not break things | 16:07 |
gfidente | I can also add dns_servers to all nic templates | 16:07 |
gfidente | but looks to me the issue is in the behaviour that the ifup script has | 16:08 |
* jistr gtg too | 16:10 | |
*** jistr has quit IRC | 16:10 | |
*** masco has quit IRC | 16:12 | |
derekh | trown: I seem to be able to reproduce that mistral error quite easily, any idea why they didn't see it? http://chunk.io/f/d2896bfbcabb45f89037440b189a471b | 16:13 |
trown | derekh: so it is a mistral issue not a puppet-mistral issue | 16:16 |
*** jdob has quit IRC | 16:16 | |
derekh | trown: yup, looks like it was a problem introduced yesterday https://review.openstack.org/#/c/301588/ | 16:16 |
derekh | trown: I'm asking on #openstack-mistral now | 16:17 |
openstackgerrit | Dmitry Ilyin proposed openstack/puppet-pacemaker: Merge with fuel-infra/puppet-pacemaker https://review.openstack.org/296440 | 16:17 |
trown | derekh: sadly I think the same thing that is causing failures to build on master (missing networkx dep) is making cinder-api loop on start-up so validation fails | 16:17 |
trown | derekh: so I am not sure we will be able to fix trunk without RDO adding the new dep | 16:17 |
derekh | trown: ok, has anybody started the networkx package? if not I'll take a look once I get past this mistral thing | 16:19 |
trown | derekh: I know apevec and number80 started on it, not sure how far they got. probably good to sync with them | 16:20 |
derekh | trown: ack, thanks | 16:21 |
*** Ryjedo has quit IRC | 16:23 | |
gfidente | guys | 16:27 |
gfidente | I might actually have found a proper fix :) | 16:27 |
*** jistr has joined #tripleo | 16:28 | |
*** apetrich has joined #tripleo | 16:29 | |
*** trown is now known as trown|lunch | 16:30 | |
*** ohamada has quit IRC | 16:31 | |
*** jtomasek has quit IRC | 16:33 | |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Use PEERDNS when no dns_servers is provided https://review.openstack.org/302352 | 16:33 |
*** jistr is now known as jistr|off | 16:34 | |
*** Marga_ has quit IRC | 16:34 | |
*** dprince has joined #tripleo | 16:34 | |
*** Marga_ has joined #tripleo | 16:37 | |
openstackgerrit | Dan Prince proposed openstack/os-net-config: Bump hacking in test-requirements.txt https://review.openstack.org/301863 | 16:42 |
*** Marga_ has quit IRC | 16:44 | |
*** MaxPC has quit IRC | 16:46 | |
openstackgerrit | Derek Higgins proposed openstack-infra/tripleo-ci: Nothing to see here https://review.openstack.org/111011 | 16:47 |
*** tosky has quit IRC | 16:47 | |
derekh | trown|lunch: will check back on ^ later, and look out for a networkx fix | 16:48 |
*** coolsvap has joined #tripleo | 16:48 | |
*** derekh has quit IRC | 16:50 | |
bnemec | gfidente: That looks good. It will still trigger an interface restart though, right? | 16:57 |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Use PEERDNS when no dns_servers is provided https://review.openstack.org/302352 | 16:57 |
gfidente | bnemec yeah and update the config | 16:57 |
gfidente | ^^ just updated | 16:57 |
gfidente | with tests | 16:57 |
bnemec | gfidente: Okay, it just sounded like that might cause grief for pacemaker. | 16:58 |
*** jpena is now known as jpena|off | 16:58 | |
*** ayoung has quit IRC | 16:59 | |
*** saneax is now known as saneax_AFK | 17:00 | |
gfidente | bnemec yeah so I kept both submissions up | 17:01 |
gfidente | bnemec the updated version of PEERDNS only applies when some ipaddress is provided, because dhcp is the default and in that case we want to keep it to yes, I suppose | 17:01 |
*** apetrich has quit IRC | 17:03 | |
bnemec | gfidente: Yeah, it's weird. The docs say "If using DHCP, then yes is the default. " which seems to imply when not using DHCP no is default, but that doesn't seem to be true. | 17:03 |
gfidente | yeah the ifup script just checks if PEERDNS == no | 17:04 |
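The rule being discussed here can be sketched as follows: emit `PEERDNS=no` only for an interface that has a static address and no `dns_servers`, and leave the DHCP case on its default. This is an illustration of the idea as stated in the channel, not the code in the os-net-config patch, which may differ; the helper name `ifcfg_dns_lines` and its parameters are hypothetical.

```python
def ifcfg_dns_lines(addresses, dns_servers):
    """Return the DNS-related ifcfg lines for one interface (hypothetical helper)."""
    lines = []
    if dns_servers:
        # DNS1, DNS2, ... are the ifcfg keys for explicit name servers.
        for index, server in enumerate(dns_servers, start=1):
            lines.append(f"DNS{index}={server}")
    elif addresses:
        # Static address but no dns_servers: ask the scripts to leave
        # resolv.conf alone for this interface.
        lines.append("PEERDNS=no")
    # Neither addresses nor dns_servers: treated as DHCP, default kept.
    return lines


print(ifcfg_dns_lines(["172.16.0.10/24"], []))
print(ifcfg_dns_lines([], []))
```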
bnemec | I'm pretty sure I looked into this option for another thing a while ago and was equally unimpressed by the documentation that time. | 17:04 |
gfidente | shells growing out of control :P | 17:05 |
*** ramishra has quit IRC | 17:05 | |
bnemec | gfidente: Is there a test that sets both a static address and DNS like we would in tripleo? I don't see one, and that's an important configuration for us to keep working. | 17:08 |
bnemec | I think it should be fine, but it might be good to add coverage of that if we don't have it so we make sure it keeps working going forward. | 17:08 |
*** dprince has quit IRC | 17:08 | |
*** dprince has joined #tripleo | 17:09 | |
gfidente | bnemec there is one yes | 17:10 |
gfidente | bnemec L63 | 17:11 |
openstackgerrit | greghaynes proposed openstack/diskimage-builder: Fix ssh key cleanup to run in chroot https://review.openstack.org/302373 | 17:12 |
bnemec | gfidente: That isn't setting DNS, is it? | 17:12 |
gfidente | ah right | 17:13 |
gfidente | sec | 17:13 |
*** jdob_lt has joined #tripleo | 17:13 | |
bnemec | I'm looking for something like line 553, but with IPADDR set too. | 17:13 |
gfidente | L559 | 17:13 |
*** jtomasek has joined #tripleo | 17:14 | |
gfidente | L573 too | 17:14 |
bnemec | Yeah, but those don't include a static address. | 17:14 |
gfidente | ahaha | 17:15 |
gfidente | ok let me try | 17:15 |
*** eil397 has joined #tripleo | 17:18 | |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Use PEERDNS when no dns_servers is provided https://review.openstack.org/302352 | 17:19 |
gfidente | cause it also doesn't make sense to pass DNS1 when BOOTPROTO=none | 17:20 |
gfidente | and anyway the test below is doing the same thing | 17:20 |
openstackgerrit | Merged openstack/diskimage-builder: Revert "Skip centos functional testing" https://review.openstack.org/274938 | 17:21 |
*** eil397 has left #tripleo | 17:23 | |
openstackgerrit | Miles Gould proposed openstack/instack-undercloud: Install UEFI version of iPXE ROM in /tftpboot https://review.openstack.org/302379 | 17:26 |
*** Marga_ has joined #tripleo | 17:28 | |
*** jdob_lt has quit IRC | 17:33 | |
*** ayoung has joined #tripleo | 17:33 | |
mburned | gfidente: nice, peerdns=no | 17:34 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Force rebuild of ramdisk as part of overcloud-full https://review.openstack.org/302380 | 17:35 |
dprince | gfidente: is there a bug for https://review.openstack.org/#/c/302352/ ? | 17:35 |
gfidente | mburned looks like it should do it yes | 17:35 |
gfidente | dprince no I was working based on a BZ https://bugzilla.redhat.com/show_bug.cgi?id=1324160 | 17:36 |
openstack | bugzilla.redhat.com bug 1324160 in rhel-osp-director "Overcloud nodes have an empty /etc/resolv.conf post upgrade" [High,On_dev] - Assigned to gfidente | 17:36 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Force rebuild of ramdisk as part of overcloud-full https://review.openstack.org/302381 | 17:36 |
gfidente | dprince it's an intricate problem though | 17:36 |
gfidente | the config.json isn't changing | 17:36 |
gfidente | but yet we need to update the ifcfg files | 17:36 |
gfidente | so firstly I tried to avoid it with https://review.openstack.org/302337 | 17:37 |
gfidente | but that is meh | 17:37 |
*** jdob_lt has joined #tripleo | 17:37 | |
*** ayoung has quit IRC | 17:38 | |
*** shivrao has joined #tripleo | 17:38 | |
*** ayoung has joined #tripleo | 17:39 | |
*** sambetts is now known as sambetts|afk | 17:39 | |
dprince | gfidente: filing a bug upstream that we can track might still be useful. Perhaps even link to the BZ in the LP bug too | 17:39 |
dprince | gfidente: otherwise no one will know why you did this | 17:39 |
gfidente | dprince yeah I don't do that much :( | 17:40 |
gfidente | will do | 17:40 |
dprince | gfidente: thanks | 17:40 |
jrist | what's our slogan | 17:41 |
jrist | for the t-shirts | 17:41 |
jrist | "it's like wearing pants on pants" | 17:42 |
*** tosky_ has joined #tripleo | 17:42 | |
*** tosky_ has quit IRC | 17:42 | |
*** hewbrocca is now known as hewbrocca-afk | 17:43 | |
*** nico_auv has quit IRC | 17:44 | |
*** david-lyle has quit IRC | 17:44 | |
openstackgerrit | Giulio Fidente proposed openstack/os-net-config: Use PEERDNS when no dns_servers is provided https://review.openstack.org/302352 | 17:49 |
gfidente | jrist pop | 17:50 |
gfidente | I'd say l on l so we can make it a LOL | 17:51 |
*** ayoung has quit IRC | 17:52 | |
*** trown|lunch is now known as trown | 17:53 | |
bnemec | Stackception - we have to go deeper! | 17:54 |
*** rcernin has quit IRC | 17:56 | |
*** jprovazn has quit IRC | 18:00 | |
*** jcoufal has quit IRC | 18:01 | |
*** david-lyle has joined #tripleo | 18:02 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Add ping_retry function https://review.openstack.org/302386 | 18:07 |
*** shivrao has quit IRC | 18:08 | |
openstackgerrit | Ben Kero proposed openstack/diskimage-builder: Fix add-apt-repository package for precise https://review.openstack.org/294760 | 18:09 |
*** dtantsur is now known as dtantsur|afk | 18:10 | |
*** pradk has quit IRC | 18:14 | |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Revert "Ping retry" https://review.openstack.org/302393 | 18:18 |
dprince | bnemec: slagle: I'd like to fast track that ^^^ | 18:18 |
dprince | bnemec: slagle: and then follow it up with a correct version https://review.openstack.org/#/c/302386/ | 18:19 |
*** qasims has quit IRC | 18:20 | |
*** shivrao has joined #tripleo | 18:21 | |
*** yamahata has quit IRC | 18:21 | |
dprince | trown: you reviewed that one too | 18:23 |
*** dustins_ is now known as dustins | 18:24 | |
bnemec | dprince: I'm okay with that. Out of curiosity, how did it break? | 18:24 |
dprince | bnemec: the bug describes everything. Quite cryptic actually | 18:24 |
slagle | dprince: in your fixed patch, you only retry 5 times | 18:24 |
dprince | bnemec: the $ping bash variable got set to ping6 | 18:24 |
slagle | i think rook needed the 10 retries | 18:24 |
bnemec | Oh, missed the bug ref. :-) | 18:25 |
dprince | slagle: I want to land the revert first | 18:25 |
dprince | slagle: happy to fix the new patch however we want | 18:25 |
bnemec | Well, we weren't sure how many retries. 10 seemed to be a sane amount. | 18:25 |
dprince | 10 retries is totally fine | 18:25 |
*** pradk has joined #tripleo | 18:25 | |
dprince | slagle, bnemec we've lost another day of CI due to this patch so I'd like to land the revert ASAP... (it should be safe to revert) | 18:26 |
bnemec | dprince: Agreed. | 18:26 |
openstackgerrit | Ethan Gafford proposed openstack/tripleo-heat-templates: Trove Integration https://review.openstack.org/233240 | 18:26 |
trown | +1 to revert, looking at the bug just to understand what went wrong, but no need to wait for me | 18:26 |
slagle | dprince: sure, wfm | 18:26 |
dprince | thanks guys | 18:28 |
*** ayoung has joined #tripleo | 18:29 | |
openstackgerrit | Zane Bitter proposed openstack/tripleo-heat-templates: Don't have separate protocols/ports for Keystone v3 https://review.openstack.org/284265 | 18:31 |
*** mgould has quit IRC | 18:32 | |
*** apetrich has joined #tripleo | 18:34 | |
trown | ah, so $ping was not used in that ping_default_gateways() function before the ping retry patch | 18:35 |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Add ping_retry function https://review.openstack.org/302386 | 18:35 |
openstackgerrit | Ethan Gafford proposed openstack/python-tripleoclient: Trove integration https://review.openstack.org/233241 | 18:35 |
dprince | trown: yeah, it worked by accident for IPv4. But for IPv6 $ping=ping6 | 18:36 |
trown | I was having trouble understanding how that patch had anything to do with IPv6 vs IPv4 | 18:36 |
dprince | trown: and that is a total fail | 18:36 |
dprince | trown: ping6 192.0.2.1 fails clearly in all our logs | 18:36 |
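One way to avoid running `ping6` against an IPv4 gateway is to choose the command from the address being pinged rather than from a shared variable. The sketch below shows that idea in Python using the standard `ipaddress` module. It is not the validation script itself (which is bash) and not the fix proposed in review 302386; `ping_command` is a hypothetical helper.

```python
import ipaddress


def ping_command(address):
    """Pick ping or ping6 from the address itself (hypothetical helper)."""
    version = ipaddress.ip_address(address).version
    return "ping6" if version == 6 else "ping"


print(ping_command("192.0.2.1"))      # the IPv4 default gateway from the logs
print(ping_command("2001:db8::1"))    # an IPv6 documentation address
```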
*** coolsvap has quit IRC | 18:37 | |
* bnemec wonders again why they decided to push the ipv4 vs ipv6 logic onto users instead of making ping smart enough to DTRT. | 18:37 | |
trown | dprince: did we add IPv6 to upgrades on Mar30? | 18:37 |
dprince | trown: it just merged yesterday | 18:37 |
trown | that could be the other tricky part about reviewing that patch is seeing the upgrades job pass on Mar30 on the same patch | 18:37 |
bnemec | This is the danger of not actually gating. | 18:38 |
dprince | trown: it most likely passed CI before the upgrades job was switched to IPv6 | 18:38 |
trown | cool, all makes sense to me now | 18:38 |
openstackgerrit | Dan Prince proposed openstack/tripleo-heat-templates: Add ping_retry function https://review.openstack.org/302386 | 18:38 |
dprince | trown: yeah, it all makes me sad. But 2 days left this week so let's see what happens now :) | 18:39 |
dprince | bnemec: and yeah. this is why we want to gate | 18:39 |
*** gfidente has quit IRC | 18:41 | |
*** apetrich has quit IRC | 18:41 | |
bnemec | No package liberasurecode-devel available. | 18:44 |
bnemec | Sigh | 18:44 |
rook | sup slagle? | 18:46 |
rook | what went on with the ping_retry patch? | 18:48 |
rook | dprince: ^ ? | 18:48 |
dprince | rook: oh hey Joe | 18:49 |
rook | Hey | 18:49 |
dprince | rook: so it caused CI failures for all the IPv6 jobs (upgrades) | 18:50 |
rook | previously the gateway ping was just ping... i didn't know if that was intentional | 18:50 |
rook | so i left it alone. | 18:50 |
dprince | rook: $ping got set to 'ping6' | 18:50 |
rook | right | 18:50 |
rook | but the gateway ping was previously just 'ping' | 18:50 |
dprince | rook: correct | 18:50 |
rook | so, that was intentional | 18:51 |
dprince | rook: our IPv6 still had a default gateway that was IPv4. So with the new patch this was happening... | 18:51 |
dprince | rook: ping6 192.0.2.1 | 18:51 |
rook | ugh | 18:51 |
dprince | rook: that will always fail | 18:51 |
rook | yeah | 18:51 |
dprince | rook: try this one https://review.openstack.org/#/c/302386/ | 18:51 |
rook | thanks dprince | 18:54 |
rook | sorry about causing hell for CI | 18:54 |
rook | we didn't see this :/ | 18:54 |
dprince | rook: ah, no worries. We didn't either. Just a bit of a window where the IPv6 job hadn't come online to catch this yet | 18:54 |
bnemec | rook: It's an edge case that exposed a hole in our CI. | 18:54 |
*** jayg is now known as jayg|g0n3 | 18:55 | |
rook | dprince are you on this call? | 18:58 |
rook | bnemec: alright... anyway... trown has rode me hard about patches causing problems ;P | 18:58 |
dprince | rook: no calls for me | 18:58 |
rook | sent you a private dprince | 18:58 |
trown | rook: lol, in this case I +2'd it so I have no basis to give you crap :) | 18:59 |
rook | trown :D | 19:01 |
rook | who uses ipv6 anyway!?!?!? :P | 19:01 |
rook | please don't whois me now | 19:02 |
trown | [Whois] rook is rook!~jtaleric@2606:a000:4ed0:f300:21f:bcff:fe10:12aa (Joe Talerico) | 19:03 |
rook | :() | 19:03 |
rook | only assholes use v6 | 19:03 |
rook | anyhoo.. I apologize for yet again causing headaches... | 19:04 |
bnemec | Okay, puppet syntax jobs are borked. | 19:04 |
* bnemec blames rook | 19:04 | |
bnemec | :-P | 19:04 |
rook | bnemec you and my wife. | 19:04 |
*** mkovacik has joined #tripleo | 19:05 | |
openstackgerrit | Athlan-Guyot sofer proposed openstack/tripleo-heat-templates: WIP: integration of the new puppet pacemaker. https://review.openstack.org/302409 | 19:09 |
openstackgerrit | Athlan-Guyot sofer proposed openstack/tripleo-heat-templates: WIP: integration of the new puppet pacemaker. https://review.openstack.org/302409 | 19:11 |
openstackgerrit | Merged openstack/diskimage-builder: Fix ssh key cleanup to run in chroot https://review.openstack.org/302373 | 19:12 |
bnemec | So, we're hosed until infra gets Ansible to update Jenkins again. Apparently there's a problem with it right now. | 19:13 |
*** jdob_lt has quit IRC | 19:15 | |
bnemec | dprince: FYI^ The ping revert and everything else with a puppet job is blocked indefinitely. | 19:16 |
*** ayoung has quit IRC | 19:16 | |
*** ayoung has joined #tripleo | 19:16 | |
* bnemec is seriously tempted just to take the rest of the afternoon off | 19:16 | |
trown | the puppet jobs gate? | 19:16 |
trown | w the actual f... of all the things we would have gating | 19:16 |
bnemec | trown: Essentially. Jenkins won't let anything into the gate with a failing check job. | 19:17 |
dprince | bnemec: yeah, I'm noticing that it isn't wanting to land. Same package installation failure twice now... | 19:17 |
bnemec | dprince: Yeah, there's a problem which should be fixed by https://review.openstack.org/#/c/302298 | 19:17 |
bnemec | But with Jenkins not updating it isn't actually applied to the environments yet. | 19:17 |
*** MaxPC has joined #tripleo | 19:24 | |
*** jdob_lt has joined #tripleo | 19:29 | |
*** jdob_lt has quit IRC | 19:29 | |
*** jdob_lt has joined #tripleo | 19:29 | |
*** jdob_lt has quit IRC | 19:30 | |
*** jdob_lt has joined #tripleo | 19:30 | |
*** jayg|g0n3 is now known as jayg | 19:33 | |
*** mkovacik has quit IRC | 19:34 | |
*** Goneri has quit IRC | 19:35 | |
*** Goneri has joined #tripleo | 19:35 | |
*** ifarkas has quit IRC | 19:40 | |
*** apetrich has joined #tripleo | 19:49 | |
*** apetrich has quit IRC | 19:49 | |
*** apetrich has joined #tripleo | 19:50 | |
openstackgerrit | Dan Prince proposed openstack/puppet-tripleo: Add Glance profiles https://review.openstack.org/296076 | 19:55 |
*** julim has joined #tripleo | 19:57 | |
openstackgerrit | Dan Radez proposed openstack/tripleo-heat-templates: Enable deployment of Ceph Storage (OSD) on the Compute Nodes https://review.openstack.org/273754 | 20:02 |
*** mgagne_ is now known as mgagne | 20:04 | |
jrist | bnemec: I like your slogan | 20:04 |
*** jayg is now known as jayg|g0n3 | 20:08 | |
*** rcernin has joined #tripleo | 20:09 | |
*** pradk_ has joined #tripleo | 20:12 | |
*** yamahata has joined #tripleo | 20:13 | |
*** pradk_ has quit IRC | 20:13 | |
*** pcaruana has quit IRC | 20:14 | |
*** pradk has quit IRC | 20:14 | |
*** pradk has joined #tripleo | 20:18 | |
*** jprovazn has joined #tripleo | 20:21 | |
*** mbound has joined #tripleo | 20:25 | |
*** mbound has quit IRC | 20:25 | |
*** mbound has joined #tripleo | 20:25 | |
*** yamahata has quit IRC | 20:30 | |
*** egafford has quit IRC | 20:31 | |
*** dustins has quit IRC | 20:38 | |
slagle | can you not use get_input in a list_join? | 20:38 |
slagle | jdob_lt: you're a heat person now. can i use get_input in a list_join? | 20:39 |
slagle | i'm asking b/c it appears to not work :) | 20:39 |
jdob_lt | in theory, i think it should | 20:42 |
slagle | oh, i found a bug https://bugs.launchpad.net/heat/+bug/1344284 | 20:42 |
openstack | Launchpad bug 1344284 in heat ""Items to join must be strings" - Intrinsic functions resolve before get_input in StructuredConfig resource" [Medium,Triaged] | 20:42 |
bnemec | I feel like the ci status page may have quit updating. | 20:43 |
bnemec | Did the fancy new logo overload the server? :-) | 20:43 |
slagle | bnemec: i think it ran out of red ink | 20:43 |
bnemec | lol | 20:44 |
*** trown is now known as trown|outtypewww | 20:44 | |
*** rwsu_ has quit IRC | 20:44 | |
*** rwsu has joined #tripleo | 20:45 | |
*** apetrich has quit IRC | 20:48 | |
*** qasims has joined #tripleo | 20:48 | |
*** MaxPC has quit IRC | 20:57 | |
*** jdob_lt has quit IRC | 20:58 | |
*** Goneri has quit IRC | 21:04 | |
*** jistr|off has quit IRC | 21:04 | |
openstackgerrit | James Slagle proposed openstack/instack-undercloud: Add hieradata override file https://review.openstack.org/302438 | 21:05 |
*** julim has quit IRC | 21:14 | |
*** qasims has quit IRC | 21:17 | |
*** dprince has quit IRC | 21:20 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Revert "Ping retry" https://review.openstack.org/302393 | 21:24 |
*** rcernin has quit IRC | 21:27 | |
*** egafford has joined #tripleo | 21:27 | |
*** jprovazn has quit IRC | 21:28 | |
bnemec | \o/ puppet must be working again | 21:30 |
*** tiswanso has quit IRC | 21:31 | |
*** ibravo has joined #tripleo | 21:35 | |
*** ibravo has quit IRC | 21:36 | |
mburned | <sigh> os-net-config ci already failed on nonha... | 21:39 |
mburned | at least that one passed on the last run | 21:39 |
*** ayoung has quit IRC | 21:39 | |
* mburned crosses fingers that ha and upgrades pass | 21:39 | |
dsneddon | mburned, Let me know if os-net-config fails in the other CI passes. The latest change looked OK to me, but I don't know that dprince actually tested it. | 21:41 |
mburned | dsneddon: https://review.openstack.org/302352 | 21:41 |
mburned | dsneddon: it's gfidente's patch | 21:41 |
mburned | dprince's is already landed | 21:41 |
dsneddon | mburned, Right, I meant this one. Maybe it already passed CI: https://review.openstack.org/#/c/301827/ | 21:42 |
mburned | looks like it passed all but upgrades and they waived that since it's failing consistently | 21:42 |
bnemec | We should stop doing that now that the ipv6 ping patch has merged. | 21:44 |
bnemec | We suddenly seem to be having network issues. | 21:45 |
bnemec | I've seen a couple of patches fail on DNS lookup failures for various things. | 21:45 |
bnemec | I mean, why not? Everything else has been broken today. | 21:46 |
*** myoung has quit IRC | 21:46 | |
mburned | bnemec: fix 1 thing, 10 others break... | 21:48 |
mburned | bnemec: i don't have the power to waive it anyway... | 21:48 |
*** morazi has quit IRC | 21:49 | |
*** tiswanso has joined #tripleo | 21:51 | |
*** olap has quit IRC | 21:51 | |
*** tiswanso has quit IRC | 21:52 | |
*** tiswanso has joined #tripleo | 21:52 | |
*** rwsu has quit IRC | 22:05 | |
*** weshay has quit IRC | 22:08 | |
*** dustins has joined #tripleo | 22:08 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: TEST: Delete the overcloud when finished https://review.openstack.org/297328 | 22:12 |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Test overcloud deletion with net-iso in periodic job https://review.openstack.org/296765 | 22:12 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder: Turn of tracing around du invocations https://review.openstack.org/302454 | 22:12 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Add ability to auto-generate self-signed certificates https://review.openstack.org/273233 | 22:14 |
*** social has quit IRC | 22:18 | |
*** social has joined #tripleo | 22:18 | |
*** Goneri has joined #tripleo | 22:20 | |
*** rwsu has joined #tripleo | 22:30 | |
*** pradk has quit IRC | 22:36 | |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Generate most of the pystache context automatically https://review.openstack.org/297319 | 22:36 |
*** akrivoka has quit IRC | 23:00 | |
*** ayoung has joined #tripleo | 23:09 | |
*** saneax_AFK is now known as saneax | 23:17 | |
*** dustins has quit IRC | 23:18 | |
openstackgerrit | Merged openstack/diskimage-builder: Turn of tracing around du invocations https://review.openstack.org/302454 | 23:18 |
*** egafford has quit IRC | 23:30 | |
*** akshai has quit IRC | 23:34 | |
*** dmacpher has quit IRC | 23:34 | |
*** rhallisey has quit IRC | 23:45 |