*** igordc has quit IRC | 00:59 | |
*** hongbin has joined #openstack-kolla | 01:03 | |
*** happyhemant has quit IRC | 01:17 | |
*** JangwonLee_ has joined #openstack-kolla | 01:27 | |
*** JangwonLee has quit IRC | 01:30 | |
*** altlogbot_0 has quit IRC | 01:30 | |
*** altlogbot_2 has joined #openstack-kolla | 01:31 | |
*** hongbin has quit IRC | 01:57 | |
*** KeithMnemonic has quit IRC | 02:26 | |
*** vmixor has quit IRC | 02:46 | |
*** hongbin has joined #openstack-kolla | 02:57 | |
*** icarusfactor has quit IRC | 03:03 | |
*** whoami-rajat has joined #openstack-kolla | 03:12 | |
openstackgerrit | Merged openstack/kolla-ansible stable/stein: CI: add periodic-stable-jobs Zuul project template https://review.opendev.org/669630 | 03:16 |
---|---|---|
*** BjoernT has joined #openstack-kolla | 03:17 | |
*** Sravan has joined #openstack-kolla | 03:17 | |
*** shyamb has joined #openstack-kolla | 03:40 | |
*** skramaja has joined #openstack-kolla | 03:53 | |
*** hongbin has quit IRC | 04:04 | |
*** Sravan has quit IRC | 04:15 | |
*** factor has joined #openstack-kolla | 04:16 | |
*** shyamb has quit IRC | 04:20 | |
*** Sravan has joined #openstack-kolla | 04:27 | |
*** Sravan has quit IRC | 04:28 | |
*** Sravan has joined #openstack-kolla | 04:29 | |
*** Sravan has quit IRC | 04:33 | |
*** pcaruana has joined #openstack-kolla | 04:35 | |
*** pcaruana has quit IRC | 04:38 | |
*** Sravan has joined #openstack-kolla | 05:03 | |
*** Sravan has quit IRC | 05:08 | |
*** ivve has joined #openstack-kolla | 05:19 | |
*** BjoernT has quit IRC | 05:26 | |
*** luksky11 has joined #openstack-kolla | 05:30 | |
*** shyamb has joined #openstack-kolla | 05:36 | |
*** Luzi has joined #openstack-kolla | 05:37 | |
*** shyam89 has joined #openstack-kolla | 05:41 | |
*** shyamb has quit IRC | 05:41 | |
*** cah_link has joined #openstack-kolla | 06:02 | |
openstackgerrit | Radosław Piliszek proposed openstack/kolla-ansible stable/stein: Deprecate Ceph deployment https://review.opendev.org/669792 | 06:07 |
yoctozepto | morning | 06:15 |
yoctozepto | mgoddard: if you check the failures for nova refix, you will find that mariadb fixes did not help :-( | 06:16 |
*** dpawlik has joined #openstack-kolla | 06:19 | |
yoctozepto | and our ci errors queue grew too :-( | 06:22 |
egonzalez | ohwhyosa hi, looking into the ODL bug, can you check karaf.log? Likely ODL hasn't started properly and so the restconf service is not available | 06:41 |
openstackgerrit | Eduardo Gonzalez proposed openstack/kolla master: Upgrade ODL to fluorine release https://review.opendev.org/601094 | 06:49 |
*** shyam89 has quit IRC | 06:52 | |
*** rpittau|afk is now known as rpittau | 06:58 | |
openstackgerrit | Eduardo Gonzalez proposed openstack/kolla-ansible master: Upgrade ODL to fluorine release https://review.opendev.org/622896 | 07:00 |
*** shyamb has joined #openstack-kolla | 07:02 | |
*** pcaruana has joined #openstack-kolla | 07:03 | |
openstackgerrit | Eduardo Gonzalez proposed openstack/kolla-ansible master: Run OpenDaylight in NFV CI jobs https://review.opendev.org/598612 | 07:05 |
*** ianw is now known as ianw_pto | 07:16 | |
openstackgerrit | Jeffrey Zhang proposed openstack/kolla stable/rocky: Write hash after compressing horizon static assets https://review.opendev.org/669805 | 07:19 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 07:33 |
openstackgerrit | Merged openstack/kolla-ansible master: Exit on failure in init-runonce https://review.opendev.org/668149 | 07:33 |
*** Sravan has joined #openstack-kolla | 07:34 | |
openstackgerrit | Radosław Piliszek proposed openstack/kolla-ansible stable/stein: Exit on failure in init-runonce https://review.opendev.org/669808 | 07:36 |
openstackgerrit | Eduardo Gonzalez proposed openstack/kolla-ansible master: Test HAproxy in multinode https://review.opendev.org/625088 | 07:38 |
*** Sravan has quit IRC | 07:39 | |
openstackgerrit | Eduardo Gonzalez proposed openstack/kolla-ansible master: Test HAproxy in multinode https://review.opendev.org/625088 | 07:39 |
*** shyamb has quit IRC | 07:40 | |
*** shyamb has joined #openstack-kolla | 07:44 | |
*** priteau has joined #openstack-kolla | 07:44 | |
*** shyamb has quit IRC | 07:50 | |
*** gkadam has joined #openstack-kolla | 07:55 | |
*** gkadam has quit IRC | 07:55 | |
*** fxpester has joined #openstack-kolla | 07:56 | |
priteau | Good morning. One of our gate jobs in Kayobe master, which uses kolla-ansible stein, is failing with the following error: | 07:57 |
priteau | TASK [nova : Waiting for nova-compute services to register themselves] ********* | 07:57 |
priteau | fatal: [controller0]: FAILED! => {"msg": "The field 'vars' has an invalid value, which includes an undefined variable. The error was: 'nova_compute_services' is undefined\n\nThe error appears to have been in '/home/zuul/kolla-venv/share/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml': line 30, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe | 07:57 |
priteau | offending line appears to be:\n\n\n- name: Waiting for nova-compute services to register themselves\n ^ here\n"} | 07:57 |
priteau | It looks like nova_compute_services is used before it is defined at https://opendev.org/openstack/kolla-ansible/src/branch/master/ansible/roles/nova/tasks/discover_computes.yml#L38 | 07:58 |
mgoddard | priteau: yes, we hit this yesterday due to a patch merged to stein which does not support ansible<2.8 | 07:58 |
mgoddard | priteau: there is a fix you can cherry pick for now | 07:58 |
egonzalez | priteau https://review.opendev.org/#/c/669730/2 | 07:59 |
*** Wasaac has joined #openstack-kolla | 08:00 | |
mgoddard | egonzalez beat me to it | 08:00 |
priteau | Thanks. | 08:01 |
priteau | I see, it's allowed to declare the variable using the value from register as long as it is used in the until statement. | 08:02 |
openstackgerrit | Mark Goddard proposed openstack/kolla-ansible master: Test minimum supported and latest versions of Ansible https://review.opendev.org/668413 | 08:04 |
*** cgrosjean has joined #openstack-kolla | 08:07 | |
*** k_mouza has joined #openstack-kolla | 08:16 | |
*** k_mouza has quit IRC | 08:17 | |
*** k_mouza has joined #openstack-kolla | 08:17 | |
*** shyamb has joined #openstack-kolla | 08:23 | |
yoctozepto | priteau: one just can't reference a registered variable in vars in ansible<2.8 | 08:25 |
yoctozepto | seems to be a feature of 2.8 rather than bug of earlier | 08:25 |
*** cgrosjea_ has joined #openstack-kolla | 08:42 | |
*** cgrosjean has quit IRC | 08:42 | |
*** cgrosjean has joined #openstack-kolla | 08:43 | |
mnasiadka | morning | 08:44 |
*** cgrosje__ has joined #openstack-kolla | 08:45 | |
*** cgrosjea_ has quit IRC | 08:46 | |
*** priteau has quit IRC | 08:47 | |
*** cgrosjean has quit IRC | 08:47 | |
*** priteau has joined #openstack-kolla | 08:48 | |
mgoddard | morning mnasiadka | 08:55 |
mnasiadka | hi mgoddard | 08:55 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 09:01 |
*** k_mouza has quit IRC | 09:02 | |
*** k_mouza has joined #openstack-kolla | 09:03 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 09:05 |
*** cgrosje__ has quit IRC | 09:06 | |
*** cgrosjean has joined #openstack-kolla | 09:11 | |
openstackgerrit | Radosław Piliszek proposed openstack/kolla-ansible master: Do not require valid migration_interface for controllers https://review.opendev.org/669631 | 09:11 |
*** hamzaachi has joined #openstack-kolla | 09:14 | |
yoctozepto | mgoddard, mnasiadka: ^ would be pleased if you reviewed the above change ;-) | 09:22 |
*** cgrosjean has quit IRC | 09:26 | |
*** cgrosjean has joined #openstack-kolla | 09:26 | |
mgoddard | yoctozepto: hadn't realised it was the same issue as discovery | 09:33 |
openstackgerrit | Merged openstack/kolla-ansible master: Fix nova deploy with Ansible<2.8 https://review.opendev.org/669730 | 09:33 |
openstackgerrit | Mark Goddard proposed openstack/kolla-ansible stable/stein: Fix nova deploy with Ansible<2.8 https://review.opendev.org/669828 | 09:34 |
openstackgerrit | Mark Goddard proposed openstack/kolla-ansible stable/rocky: Wait for all compute services before cell discovery https://review.opendev.org/669698 | 09:37 |
openstackgerrit | Mark Goddard proposed openstack/kolla-ansible stable/queens: Wait for all compute services before cell discovery https://review.opendev.org/669700 | 09:39 |
*** shyamb has quit IRC | 09:40 | |
*** cgrosjean has quit IRC | 09:44 | |
*** cgrosjean has joined #openstack-kolla | 09:45 | |
*** pcaruana has quit IRC | 09:48 | |
egonzalez | does anybody know if the second iface in the infra VMs' network range is reserved for each job or is used by other jobs? I'm asking so I can decide whether to associate an IP address in that range or work around it some other way | 09:48 |
ohwhyosa | Hey people! Morning! | 09:49 |
*** shyamb has joined #openstack-kolla | 09:52 | |
*** priteau has quit IRC | 10:03 | |
*** egonzalez has quit IRC | 10:07 | |
*** egonzalez has joined #openstack-kolla | 10:09 | |
stingrayza | mgoddard: looks like https://review.opendev.org/#/c/668413/5 was failing while waiting for https://review.opendev.org/#/c/669730/2 to go in. now that the latter is in, the first one should recheck ok? | 10:12 |
mgoddard | egonzalez: I'm not sure, would need to ask infra | 10:13 |
mgoddard | hi ohwhyosa | 10:13 |
yoctozepto | stingrayza: it hangs on https://review.opendev.org/669828 | 10:14 |
yoctozepto | (stein backport is required to pass it) | 10:14 |
stingrayza | ah, gotcha :) | 10:15 |
yoctozepto | <mgoddard> yoctozepto: hadn't realised it was the same issue as discovery | 10:32 |
yoctozepto | ^ did not get you, what did you mean? | 10:33 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 10:34 |
*** shyamb has quit IRC | 10:35 | |
*** luksky11 has quit IRC | 10:40 | |
mnasiadka | mgoddard: is that a huge sin, to enforce dbus installation on Ubuntu in bootstrap? (to make ceph-nfs work) :) | 10:43 |
mgoddard | yoctozepto: looks like the migration_interface issue is a similar underlying issue to the nova discovery issue? | 10:43 |
mgoddard | mnasiadka: probably not if you have ceph-nfs enabled | 10:44 |
yoctozepto | mgoddard: not really, the only common points are that it's ansible's fault and touches variable expansion point | 10:48 |
yoctozepto | mgoddard, mnasiadka: re: ceph - if we are dumping ceph and ceph-nfs does not seem to be too popular, then... ;-) | 10:49 |
mnasiadka | yoctozepto: we can't deprecate and drop in the same moment | 10:49 |
mnasiadka | yoctozepto: officially - it's supported :) | 10:49 |
mnasiadka | yoctozepto: that it doesn't work and nobody uses that, that's something else :D | 10:49 |
mgoddard | so we need to continue to support ceph for Train, and possibly U | 10:50 |
mgoddard | we could put ceph-nfs on the (long) list of things that are not really core and well maintained | 10:51 |
mgoddard | if someone wants to fix things on that list that are broken, great | 10:51 |
mgoddard | we don't necessarily need to prioritise them too highly though IMO | 10:52 |
*** shyamb has joined #openstack-kolla | 10:52 | |
mgoddard | hopefully we can formalise some of these ideas this cycle with a support matrix | 10:52 |
*** serhatd has joined #openstack-kolla | 10:53 | |
mgoddard | if you want to get ceph-nfs going mnasiadka, go for it | 10:53 |
mgoddard | but don't feel obliged :) | 10:53 |
mgoddard | that's my 2c anyway | 10:53 |
mnasiadka | well, I'm nearly done - so it doesn't make sense to stop now :) | 10:53 |
mgoddard | sure | 10:53 |
*** cgrosjean has quit IRC | 10:53 | |
mnasiadka | and then still, in migration to ceph-ansible we should test ceph-nfs, which I will need to fix anyway ;) | 10:54 |
*** cgrosjean has joined #openstack-kolla | 11:00 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 11:01 |
ohwhyosa | Should the router gateway interface be on the same lan as the nodes? | 11:04 |
*** dosaboy has quit IRC | 11:05 | |
ohwhyosa | I kinda managed twice to deploy a multinode, but none of the times does the network, well... work | 11:05 |
jovial[m] | When using kayobe, can I customize nova.conf on a per hypervisor level? How would I achieve that? | 11:06 |
ohwhyosa | mgoddard "we could put ceph-nfs on the (long) list of things that are not really core and well maintained" is that a figurative list or a real, readable list? | 11:06 |
mgoddard | ohwhyosa: plan is for it to become a real list soon | 11:07 |
mgoddard | jovial[m]: etc/kayobe/kolla/config/nova/{{ inventory_hostname }}/nova.conf | 11:08 |
mgoddard | jovial[m]: or reference host/group vars in etc/kayobe/kolla/config/nova.conf | 11:09 |
*** dosaboy has joined #openstack-kolla | 11:09 | |
jovial[m] | thanks :) | 11:09 |
*** luksky11 has joined #openstack-kolla | 11:16 | |
*** altlogbot_2 has quit IRC | 11:19 | |
*** irclogbot_0 has quit IRC | 11:19 | |
*** altlogbot_2 has joined #openstack-kolla | 11:20 | |
*** zbr is now known as zbr|lunch | 11:22 | |
*** altlogbot_2 has quit IRC | 11:25 | |
*** hamdyk has joined #openstack-kolla | 11:40 | |
hamdyk | Hello, we are working on creating a new container for the neutron mlnx agent. How can we do that in kolla? | 11:42 |
*** cgrosjean has quit IRC | 11:42 | |
*** jistr_ has joined #openstack-kolla | 11:51 | |
*** niceplace_ has joined #openstack-kolla | 11:52 | |
*** jistr has quit IRC | 11:55 | |
*** dannins has quit IRC | 11:55 | |
*** markmcclain has quit IRC | 11:55 | |
*** niceplace has quit IRC | 11:55 | |
*** hogepodge has quit IRC | 11:55 | |
*** mnasiadka has quit IRC | 11:55 | |
*** TheJulia has quit IRC | 11:55 | |
*** rwellum has quit IRC | 11:55 | |
*** zbr|lunch is now known as zbr | 11:59 | |
*** shyamb has quit IRC | 11:59 | |
*** irclogbot_1 has joined #openstack-kolla | 12:00 | |
*** cgrosjean has joined #openstack-kolla | 12:01 | |
*** dannins has joined #openstack-kolla | 12:01 | |
*** hogepodge has joined #openstack-kolla | 12:01 | |
*** mnasiadka has joined #openstack-kolla | 12:01 | |
*** TheJulia has joined #openstack-kolla | 12:01 | |
*** rwellum has joined #openstack-kolla | 12:01 | |
*** altlogbot_1 has joined #openstack-kolla | 12:02 | |
*** irclogbot_1 has quit IRC | 12:05 | |
*** altlogbot_1 has quit IRC | 12:05 | |
*** altlogbot_3 has joined #openstack-kolla | 12:08 | |
*** pcaruana has joined #openstack-kolla | 12:10 | |
*** altlogbot_3 has quit IRC | 12:11 | |
*** strigazi has quit IRC | 12:14 | |
*** strigazi has joined #openstack-kolla | 12:15 | |
*** cah_link has quit IRC | 12:18 | |
*** shyamb has joined #openstack-kolla | 12:26 | |
*** jistr_ is now known as jistr | 12:33 | |
yoctozepto | <ohwhyosa> Should the router gateway interface be on the same lan as the nodes? | 12:36 |
yoctozepto | what do you mean? | 12:36 |
yoctozepto | there are oh so many routers nowadays ;D | 12:37 |
mgoddard | mnasiadka: would you mind: https://review.opendev.org/#/c/669828, https://review.opendev.org/669792 | 12:39 |
mgoddard | hamdyk: hi. Do you want to create just an image or the kolla-ansible support too? | 12:40 |
yoctozepto | mgoddard: http://logs.openstack.org/31/669631/3/check/kolla-ansible-ubuntu-source-upgrade-ceph/cbf3b5c/primary/ara-report/ <- sad :-( | 12:40 |
mgoddard | hamdyk: some info on adding images here: https://docs.openstack.org/kolla/latest/contributor/CONTRIBUTING.html#adding-a-new-service | 12:40 |
mgoddard | yoctozepto: I'm tired of mariadb | 12:42 |
mgoddard | let's switch to NoSQL | 12:43 |
yoctozepto | mgoddard: lolz | 12:44 |
yoctozepto | but it looked so fixed! | 12:44 |
*** Wasaac has quit IRC | 12:45 | |
hamdyk | mgoddard: I see the doc you sent for adding new services in kolla | 12:47 |
hamdyk | mgoddard: it's clear and detailed | 12:48 |
hamdyk | not sure about kolla-ansible support | 12:48 |
hamdyk | but in case we had to, is there such a document for kolla-ansible ? | 12:49 |
mgoddard | hamdyk: yes: https://docs.openstack.org/kolla-ansible/latest/contributor/CONTRIBUTING.html#adding-a-new-service | 12:49 |
*** Wasaac has joined #openstack-kolla | 12:50 | |
hamdyk | mgoddard: thanks man, you have been very helpful :D | 12:52 |
mgoddard | hamdyk: np | 12:53 |
*** altlogbot_0 has joined #openstack-kolla | 12:54 | |
*** altlogbot_0 has quit IRC | 12:57 | |
*** skramaja has quit IRC | 13:09 | |
*** priteau has joined #openstack-kolla | 13:14 | |
*** goldyfruit has joined #openstack-kolla | 13:19 | |
*** shyamb has quit IRC | 13:23 | |
*** BjoernT has joined #openstack-kolla | 13:34 | |
*** cgrosjean has quit IRC | 13:39 | |
openstackgerrit | Radosław Piliszek proposed openstack/kolla-ansible master: Trivial fix: log stderr of init-runonce as well https://review.opendev.org/669873 | 13:40 |
openstackgerrit | Merged openstack/kolla-ansible stable/stein: Fix nova deploy with Ansible<2.8 https://review.opendev.org/669828 | 13:46 |
openstackgerrit | Merged openstack/kolla-ansible stable/stein: Deprecate Ceph deployment https://review.opendev.org/669792 | 13:46 |
*** BjoernT has quit IRC | 13:52 | |
*** cah_link has joined #openstack-kolla | 13:56 | |
*** BjoernT has joined #openstack-kolla | 13:57 | |
*** Luzi has quit IRC | 13:57 | |
*** cgrosjean has joined #openstack-kolla | 14:03 | |
openstackgerrit | Gaëtan Trellu proposed openstack/kolla-ansible master: Testing Masakari role in gate https://review.opendev.org/616050 | 14:04 |
mgoddard | yoctozepto mnasiadka what are we going to do about the stein release given the few mariadb failures that have popped up? | 14:10 |
*** heikkine has joined #openstack-kolla | 14:11 | |
mnasiadka | mgoddard: can we deprecate mariadb? | 14:14 |
mgoddard | :) | 14:14 |
mnasiadka | :) | 14:14 |
mgoddard | would love to | 14:15 |
kplant | yeah, let's go to microsoft sql! | 14:17 |
*** cah_link has quit IRC | 14:18 | |
goldyfruit | Oracle ? | 14:18 |
openstackgerrit | Gaëtan Trellu proposed openstack/kolla-ansible master: Testing Masakari role in gate https://review.opendev.org/616050 | 14:18 |
*** whoami-rajat has quit IRC | 14:18 | |
*** BjoernT_ has joined #openstack-kolla | 14:30 | |
yoctozepto | <kplant> yeah, let's go to microsoft sql! | 14:31 |
yoctozepto | <goldyfruit> Oracle ? | 14:31 |
yoctozepto | have you really worked with those or just heard the names? ;p | 14:31 |
yoctozepto | they fail all the same ;p | 14:31 |
yoctozepto | <mgoddard> yoctozepto mnasiadka what are we going to do about the stein release given the few mariadb failures that have popped up? | 14:31 |
yoctozepto | what do you propose? | 14:32 |
*** BjoernT has quit IRC | 14:32 | |
yoctozepto | is there anything better supported for the future (with a migration plan)? | 14:32 |
goldyfruit | yoctozepto, worked with both \o/ | 14:32 |
mgoddard | seriously, openstack == mysql/mariadb | 14:33 |
*** dpawlik has quit IRC | 14:34 | |
mgoddard | galera is probably the issue here | 14:34 |
goldyfruit | at the beginning postgres was one of the choices | 14:34 |
mgoddard | really we need to just make it more reliable. Question for now is, is it too unreliable to release? | 14:35 |
mgoddard | and what happens if it fails? can you just run it again and expect it to work? | 14:37 |
yoctozepto | goldyfruit: postgres has nice clustering | 14:38 |
kplant | yoctozepto: i've worked with mssql and it's cancer | 14:38 |
kplant | is was being sarcastic | 14:38 |
kplant | i* | 14:38 |
mgoddard | postgres seems to have a loyal following | 14:38 |
mgoddard | but overall it's less popular than mariadb, and not well tested with openstack | 14:39 |
mgoddard | at one point I was testing with a 'sleep 60'. I guess that wouldn't be so bad if it lets us release | 14:39 |
mgoddard | how many passes do we need to see to be sure it's working though? | 14:39 |
yoctozepto | kplant: thanks for clarifying xD | 14:40 |
goldyfruit | mgoddard, was it the issue with Kolla and MariaDB ? | 14:40 |
yoctozepto | mgoddard: well, the thing I am worried the most | 14:40 |
yoctozepto | are scenarios like | 14:40 |
yoctozepto | it deploys | 14:40 |
mgoddard | goldyfruit: yeah | 14:40 |
yoctozepto | some API db calls go through just fine | 14:40 |
yoctozepto | and then it fails with wsrep | 14:40 |
yoctozepto | wtf | 14:40 |
mgoddard | I think that's due to the lack of haproxy | 14:40 |
mgoddard | possibly | 14:41 |
mgoddard | if we're pointing at a node that isn't primary, maybe we'd see weirdness like that? | 14:41 |
yoctozepto | mgoddard: but should not it then just work all the time? you could convince me that haproxy was to blame but not necessarily the other way around unless im missing something | 14:41 |
yoctozepto | because you are pointing to the same node all the time? | 14:42 |
mgoddard | well we restart things so could end up with a different primary depending on timing | 14:42 |
mgoddard | and non-primaries may or may not be up to date, depending on replication | 14:42 |
mgoddard | honestly I don't know | 14:43 |
goldyfruit | yoctozepto, mgoddard so what is the issue (sorry didn't follow everything) | 14:43 |
yoctozepto | mgoddard: hmm, but it is much after the restarts and some API succeed some do not | 14:44 |
mgoddard | yeah, could just be replication randomness | 14:44 |
yoctozepto | maybe there is something we are doing wrong that's so obvious that we miss it | 14:44 |
mgoddard | i.e. it works if you catch it when up to date, but not if it needs to sync | 14:45 |
yoctozepto | mgoddard: hmm, that's a valid reason | 14:45 |
mgoddard | could be, who knows | 14:45 |
yoctozepto | goldyfruit: mariadb issues | 14:45 |
yoctozepto | mostly upgrade | 14:45 |
yoctozepto | but sometimes also in regular deploy | 14:46 |
yoctozepto | we have observed CI failures | 14:46 |
yoctozepto | related to both mariadb failing to start properly | 14:46 |
yoctozepto | and other oddities in later stages | 14:46 |
yoctozepto | related to wsrep issues | 14:46 |
yoctozepto | goldyfruit: https://etherpad.openstack.org/p/kolla-ci-errors | 14:46 |
goldyfruit | Related to MariaDB packages upgrade ? | 14:46 |
yoctozepto | look at mariadb/wsrep things | 14:46 |
goldyfruit | oki | 14:47 |
yoctozepto | goldyfruit: we replace the containers | 14:47 |
yoctozepto | but some are for "fresh" deployments | 14:47 |
yoctozepto | so it's not bound to upgrades | 14:47 |
yoctozepto | possibly upgrades fail more often because we restart mariadb twice ;p | 14:47 |
yoctozepto | (well, twice the normal times) | 14:47 |
*** hamdyk has quit IRC | 14:51 | |
yoctozepto | mgoddard: what if we are just hitting some bug in mariadb | 14:52 |
yoctozepto | most failures come from centos | 14:52 |
*** ivve has quit IRC | 14:53 | |
goldyfruit | and what about percona xtradb ? | 14:53 |
yoctozepto | I did not notice an ubuntu failure after patching | 14:53 |
mgoddard | yoctozepto: the one you linked earlier was ubuntu: kef201.dh2.11p.lsw1 | 14:54 |
mgoddard | or http://logs.openstack.org/31/669631/3/check/kolla-ansible-ubuntu-source-upgrade-ceph/cbf3b5c/primary/ara-report/ | 14:54 |
yoctozepto | hmm, then I missed them in etherpad :-( | 14:54 |
yoctozepto | ah, I did not group them well | 14:55 |
yoctozepto | let me do some etherpad regrouping | 14:55 |
goldyfruit | mgoddard, yoctozepto during the upgrade how do you stop the containers ? | 14:57 |
goldyfruit | waiting the 10s from Docker and then kill -9 from Docker ? | 14:57 |
goldyfruit | Maybe not related at all but we had an issue with keepalived and docker within Kolla a few years ago | 14:58 |
goldyfruit | if keepalived is not stopped gracefully then keepalived doesn't clean the resources properly | 14:58 |
goldyfruit | Maybe something like that happens with mariadb container | 14:59 |
yoctozepto | goldyfruit: keepalived/haproxy not used | 14:59 |
goldyfruit | I saw some posts about that with MariaDB in Docker | 14:59 |
goldyfruit | yoctozepto, it was just to give an example of how bad was the kill -9 of Docker during the stop step | 15:00 |
goldyfruit | of Docker container* | 15:00 |
mgoddard | goldyfruit: it's a fair point - docker is very quick to kill -9, we should up that timeout | 15:01 |
yoctozepto | ! | 15:02 |
mgoddard | in this case we use mysqladmin shutdown because docker stop might not work for some rocky containers | 15:02 |
yoctozepto | and since we do stop/start in deploy | 15:02 |
goldyfruit | yeah | 15:02 |
yoctozepto | <mgoddard> in this case we use mysqladmin shutdown because docker stop might not work for some rocky containers | 15:02 |
yoctozepto | we do? | 15:02 |
yoctozepto | no hits in master | 15:02 |
mgoddard | yeah, only in stein, for upgrades | 15:03 |
yoctozepto | mgoddard: but stein->master fails the same | 15:03 |
yoctozepto | or you mean rocky->stein does not now? | 15:03 |
yoctozepto | I have to rethink the CI issue presentation | 15:03 |
yoctozepto | would be better to use some excel-like thing probably | 15:04 |
mgoddard | we only added mysqladmin shutdown in stein upgrades. Master still uses docker restart | 15:04 |
yoctozepto | and then just sort/filter | 15:04 |
yoctozepto | mgoddard: yeah, got you | 15:04 |
mgoddard | ethercalc? | 15:04 |
yoctozepto | the point is you might be right | 15:04 |
yoctozepto | ethercalc? o.O | 15:04 |
yoctozepto | https://ethercalc.openstack.org/kolla-ci-errors | 15:05 |
*** cgrosjean has quit IRC | 15:05 | |
*** fxpester has quit IRC | 15:06 | |
yoctozepto | not the best UI/UX | 15:07 |
yoctozepto | nah, too crappy | 15:08 |
yoctozepto | can we use something else? | 15:08 |
yoctozepto | or no better ideas? | 15:08 |
*** altlogbot_0 has joined #openstack-kolla | 15:12 | |
yoctozepto | re: postgres -> https://docs.openstack.org/oslo.db/latest/install/index.html#using-with-postgresql | 15:13 |
yoctozepto | seems to be still supported | 15:13 |
*** cgrosjean has joined #openstack-kolla | 15:13 | |
mgoddard | yeah sort of, but not well tested. And was considered for dropping a while back | 15:15 |
*** altlogbot_0 has quit IRC | 15:17 | |
yoctozepto | mgoddard: then no touching | 15:21 |
mgoddard | indeed | 15:23 |
mgoddard | too late for stein anyway | 15:23 |
openstackgerrit | Radosław Piliszek proposed openstack/kolla-ansible stable/stein: Exit on failure in init-runonce https://review.opendev.org/669808 | 15:25 |
yoctozepto | mgoddard: obviously, it was more about the future | 15:26 |
yoctozepto | but it seems it must be mariadb | 15:27 |
yoctozepto | honestly I've never had to upgrade a galera cluster | 15:27 |
yoctozepto | and also don't see what we could be doing wrong in deploy | 15:27 |
yoctozepto | but maybe it's worth checking the docker kill issue | 15:27 |
yoctozepto | it never failed me locally so might be that our vms are just a bit slower than normal | 15:28 |
yoctozepto | ggm | 15:30 |
yoctozepto | hmm* | 15:30 |
yoctozepto | look at | 15:30 |
yoctozepto | http://logs.openstack.org/30/669730/2/check/kolla-ansible-centos-source-upgrade-ceph/6f71867/secondary2/logs/kolla/mariadb/mariadb.txt.gz | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [ERROR] mysqld: Table './mysql/user' is marked as crashed and should be repaired | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [Warning] Checking table: './mysql/user' | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [ERROR] mysql.user: 1 client is using or hasn't closed the table properly | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [ERROR] mysqld: Table './mysql/db' is marked as crashed and should be repaired | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [Warning] Checking table: './mysql/db' | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [ERROR] mysql.db: 1 client is using or hasn't closed the table properly | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [Note] WSREP: Signalling provider to continue. | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [Warning] WSREP: SST position can't be set in past. Requested: 12101, Current: 12680. | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [Warning] WSREP: Can't continue. | 15:30 |
yoctozepto | 2019-07-08 21:54:21 0 [ERROR] Aborting | 15:31 |
yoctozepto | goldyfruit: you were riiiight | 15:32 |
yoctozepto | good node: | 15:32 |
goldyfruit | yeah \o/ | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Stop replication | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Closing send monitor... | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Closed send monitor. | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: gcomm: terminating thread | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: gcomm: joining thread | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: gcomm: closing backend | 15:32 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: view(view_id(NON_PRIM,94e9a10d,6) memb { | 15:32 |
yoctozepto | 94e9a10d,0 | 15:33 |
yoctozepto | } joined { | 15:33 |
*** hamzy has quit IRC | 15:33 | |
yoctozepto | } left { | 15:33 |
yoctozepto | } partitioned { | 15:33 |
yoctozepto | 94f373f1,0 | 15:33 |
yoctozepto | b005a5d0,0 | 15:33 |
yoctozepto | }) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: view((empty)) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: gcomm: closed | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Flow-control interval: [16, 16] | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Trying to continue unpaused monitor | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Received NON-PRIMARY. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 12680) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Received self-leave message. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 10 [Note] WSREP: New cluster view: global state: 87be6844-a1be-11e9-9356-b282b53359d4:12680, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3 | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Flow-control interval: [0, 0] | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Trying to continue unpaused monitor | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Received SELF-LEAVE. Closing connection. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 12680) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: RECV thread exiting 0: Success | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: recv_thread() joined. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Closing replication queue. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 0 [Note] WSREP: Closing slave action queue. | 15:33 |
yoctozepto | 2019-07-08 21:54:02 10 [Note] WSREP: New cluster view: global state: 87be6844-a1be-11e9-9356-b282b53359d4:12680, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3 | 15:33 |
yoctozepto | 2019-07-08 21:54:02 10 [Note] WSREP: applier thread exiting (code:0) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 2 [Note] WSREP: applier thread exiting (code:6) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 9 [Note] WSREP: applier thread exiting (code:6) | 15:33 |
yoctozepto | 2019-07-08 21:54:02 12 [Note] WSREP: applier thread exiting (code:6) | 15:33 |
yoctozepto | 2019-07-08 21:54:04 1 [Note] WSREP: rollbacker thread exiting | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] Event Scheduler: Purging the queue. 0 events | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] InnoDB: FTS optimize thread exiting. | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: dtor state: CLOSED | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: mon: entered 12679 oooe fraction 0 oool fraction 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: mon: entered 12679 oooe fraction 0.0914899 oool fraction 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: mon: entered 12940 oooe fraction 0 oool fraction 7.72798e-05 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: cert index usage at exit 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: cert trx map usage at exit 113 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: deps set usage at exit 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: avg deps dist 22.8553 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: avg cert interval 0.36659 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: cert index size 101 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: Service thread queue flushed. | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0.987776, misses: 155, in use: 0, in pool: 155 | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 12680) | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] WSREP: Flushing memory map to disk... | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] InnoDB: Starting shutdown... | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool | 15:34 |
yoctozepto | 2019-07-08 21:54:04 0 [Note] InnoDB: Buffer pool(s) dump completed at 190708 21:54:04 | 15:34 |
yoctozepto | 2019-07-08 21:54:06 0 [Note] InnoDB: Shutdown completed; log sequence number 25668564; transaction id 38178 | 15:34 |
yoctozepto | 2019-07-08 21:54:06 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" | 15:34 |
yoctozepto | 2019-07-08 21:54:06 0 [Note] /usr/libexec/mysqld: Shutdown complete | 15:34 |
yoctozepto | sorry for the spam! | 15:34 |
yoctozepto | will redo in paste | 15:34 |
yoctozepto | good node: | 15:34 |
yoctozepto | http://paste.openstack.org/show/754226/ | 15:35 |
yoctozepto | (look at timestamps) I pasted the whole shutdown procedure | 15:35 |
yoctozepto | bad node: | 15:35 |
yoctozepto | v | 15:35 |
yoctozepto | http://paste.openstack.org/show/754227/ | 15:35 |
kplant | haha | 15:35 |
yoctozepto | it did not finish shutdown | 15:35 |
yoctozepto | though why it failed to recover at all | 15:35 |
yoctozepto | is beyond me | 15:35 |
yoctozepto | the whole point of clustering is to make it more reliable ;p | 15:35 |
yoctozepto | mgoddard, goldyfruit: ^ | 15:35 |
mgoddard | jeez | 15:35 |
kplant | your client is good at buffering | 15:35 |
yoctozepto | yeah, again sorry for the spam | 15:35 |
yoctozepto | the pastes should be clearer | 15:36 |
mgoddard | yoctozepto: it depends on when that crash happened. Mariadb can recover from a crash in the same version, but rocky -> stein it cannot, which is why we need to be very careful about shutting down properly | 15:37 |
yoctozepto | mgoddard: 'tis be stein->master | 15:38 |
yoctozepto | specifically the docker stop issue | 15:38 |
yoctozepto | seems it can never recover | 15:38 |
yoctozepto | ;p | 15:38 |
yoctozepto | that's it for HA | 15:39 |
goldyfruit | http://eavesdrop.openstack.org/irclogs/%23openstack-kolla/%23openstack-kolla.2017-05-22.log.html | 15:41 |
goldyfruit | look for string "docker stop -t 30 mariadb will work better" | 15:41 |
*** altlogbot_2 has joined #openstack-kolla | 15:42 | |
*** altlogbot_2 has quit IRC | 15:47 | |
openstackgerrit | Gaëtan Trellu proposed openstack/kolla-ansible master: Testing Masakari role in gate https://review.opendev.org/616050 | 15:48 |
openstackgerrit | Mark Goddard proposed openstack/kolla-ansible master: WIP: Set default timeout to 60 seconds for docker stop https://review.opendev.org/669897 | 15:50 |
mgoddard | from those IRC logs: "just tested upgrade from ocata -=> master and mariadb keeps restarting" | 15:51 |
mgoddard | some things never change :) | 15:51 |
yoctozepto | goldyfruit: yeah, what mgoddard did is use shutdown | 15:51 |
yoctozepto | mgoddard: we should really be using shutdown, not docker stop | 15:51 |
yoctozepto | because if it needs that 61 seconds, then let it be | 15:52 |
mgoddard | yoctozepto: docker stop should trigger a shutdown | 15:52 |
*** priteau has quit IRC | 15:52 | |
yoctozepto | mgoddard: but will wait only 60s, not until the real shutdown | 15:52 |
mgoddard | yoctozepto: this may help us: https://review.opendev.org/669897 | 15:53 |
mgoddard | well, currently it will wait 10 seconds :) | 15:53 |
mgoddard | was thinking, perhaps we need a mariadb ansible module | 15:53 |
yoctozepto | mgoddard: yeah, the thing in general is fine by me - e.g. for all services | 15:53 |
mgoddard | might be nicer than docker exec + mysqladmin, and could wrap up multiple tasks into one | 15:53 |
goldyfruit | That could increase upgrade time but it should make it more consistent | 15:54 |
yoctozepto | you mean in-house or some external? | 15:54 |
mgoddard | in-house | 15:54 |
yoctozepto | k | 15:54 |
mgoddard | would be good if we factored out our docker interactions into a python module that we import | 15:55 |
mgoddard | these are just pipe dreams | 15:55 |
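
Editor's sketch of the stop-timeout discussion above: `docker stop` sends SIGTERM and then kills the container after a grace period (10 seconds by default, as mgoddard notes), so a slow MariaDB shutdown can be cut short. The helpers below only build the two commands mentioned in the conversation. The function names, the `root` user, and the omission of credentials are assumptions for illustration, not kolla-ansible code.

```python
import subprocess
from typing import Callable, List, Sequence

# Docker's default grace period (seconds) between SIGTERM and SIGKILL.
DOCKER_DEFAULT_STOP_TIMEOUT = 10


def docker_stop_cmd(container: str, timeout: int = 60) -> List[str]:
    """Build `docker stop -t <timeout> <container>` with an explicit grace period."""
    if timeout < 0:
        raise ValueError("timeout must be non-negative")
    return ["docker", "stop", "-t", str(timeout), container]


def mysqladmin_shutdown_cmd(container: str, user: str = "root") -> List[str]:
    """Build `docker exec <container> mysqladmin -u <user> shutdown`.

    Assumption: credentials come from an option file inside the container;
    password handling is deliberately left out of this sketch.
    """
    return ["docker", "exec", container, "mysqladmin", "-u", user, "shutdown"]


def run_commands(cmds: Sequence[List[str]], run: Callable = subprocess.run) -> None:
    """Run each command in order, raising if any of them fails."""
    for cmd in cmds:
        run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(" ".join(docker_stop_cmd("mariadb", timeout=60)))
    print(" ".join(mysqladmin_shutdown_cmd("mariadb")))
```

Whether an explicit shutdown or a longer `-t` value is the better fix is exactly what is being debated here; the sketch takes no side. A container restart policy may also affect what happens after an in-container shutdown.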
yoctozepto | I still don't get this error | 15:55 |
yoctozepto | 2019-07-08 21:54:21 0 [Warning] WSREP: SST position can't be set in past. Requested: 12101, Current: 12680. | 15:55 |
yoctozepto | it was a slave | 15:55 |
yoctozepto | and restarted as slave | 15:56 |
yoctozepto | and it speaks about position in the past | 15:56 |
*** igordc has joined #openstack-kolla | 15:56 | |
yoctozepto | dafuq | 15:56 |
*** hamzy has joined #openstack-kolla | 16:02 | |
yoctozepto | https://stackoverflow.com/questions/54664565/unable-to-complete-sst-transfer-due-to-wsrep-sst-position-cant-be-set-in-past | 16:03 |
yoctozepto | unresolved | 16:03 |
*** icarusfactor has joined #openstack-kolla | 16:04 | |
*** factor has quit IRC | 16:06 | |
*** factor__ has joined #openstack-kolla | 16:06 | |
*** altlogbot_1 has joined #openstack-kolla | 16:08 | |
*** ivve has joined #openstack-kolla | 16:08 | |
*** icarusfactor has quit IRC | 16:08 | |
*** BjoernT_ has quit IRC | 16:09 | |
mgoddard | night all \o | 16:12 |
*** altlogbot_1 has quit IRC | 16:13 | |
*** BjoernT_ has joined #openstack-kolla | 16:13 | |
yoctozepto | night | 16:17 |
*** Sravan has joined #openstack-kolla | 16:17 | |
yoctozepto | I expanded the CI failure listing for mariadb | 16:18 |
*** henriqueof has joined #openstack-kolla | 16:20 | |
*** hamzaachi has quit IRC | 16:20 | |
*** altlogbot_1 has joined #openstack-kolla | 16:20 | |
*** whoami-rajat has joined #openstack-kolla | 16:20 | |
*** altlogbot_1 has quit IRC | 16:23 | |
*** hamzaachi has joined #openstack-kolla | 16:23 | |
*** irclogbot_1 has joined #openstack-kolla | 16:24 | |
*** irclogbot_1 has quit IRC | 16:27 | |
*** factor__ has quit IRC | 16:29 | |
*** factor__ has joined #openstack-kolla | 16:30 | |
*** goldyfruit has quit IRC | 16:35 | |
*** rpittau is now known as rpittau|afk | 16:37 | |
*** luksky11 has quit IRC | 16:39 | |
*** cgrosjean has quit IRC | 16:41 | |
*** k_mouza has quit IRC | 16:41 | |
openstackgerrit | Merged openstack/kolla-ansible master: Trivial fix: log stderr of init-runonce as well https://review.opendev.org/669873 | 16:43 |
*** factor__ has quit IRC | 16:44 | |
*** mgoddard has quit IRC | 16:45 | |
*** mgoddard has joined #openstack-kolla | 16:48 | |
*** dpawlik has joined #openstack-kolla | 16:49 | |
*** altlogbot_3 has joined #openstack-kolla | 17:00 | |
*** altlogbot_3 has quit IRC | 17:05 | |
*** irclogbot_3 has joined #openstack-kolla | 17:10 | |
*** irclogbot_3 has quit IRC | 17:13 | |
*** Sravan has quit IRC | 17:16 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: DNM: Troubleshoot ceph-nfs on ubuntu https://review.opendev.org/669315 | 17:19 |
*** k_mouza has joined #openstack-kolla | 17:19 | |
*** dpawlik has quit IRC | 17:20 | |
*** k_mouza has quit IRC | 17:24 | |
*** igordc has quit IRC | 17:29 | |
*** Sravan has joined #openstack-kolla | 17:32 | |
*** cgrosjean has joined #openstack-kolla | 17:40 | |
*** Sravan has quit IRC | 17:42 | |
*** hamzy has quit IRC | 17:47 | |
*** Sravan has joined #openstack-kolla | 17:51 | |
*** hamzy has joined #openstack-kolla | 17:52 | |
*** Sravan has quit IRC | 17:53 | |
*** Sravan has joined #openstack-kolla | 17:54 | |
openstackgerrit | Merged openstack/kolla-ansible stable/stein: Exit on failure in init-runonce https://review.opendev.org/669808 | 17:57 |
yoctozepto | mgoddard: https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2019/glance.html <- maybe advertise kolla in a similar way | 18:01 |
mnasiadka | we are not so critical as glance :) | 18:03 |
*** Wasaac has quit IRC | 18:03 | |
mnasiadka | now mds has some problem in ceph, ugh | 18:05 |
*** luksky11 has joined #openstack-kolla | 18:09 | |
*** henriqueof has quit IRC | 18:09 | |
*** BjoernT_ has quit IRC | 18:09 | |
*** Wasaac has joined #openstack-kolla | 18:23 | |
*** whoami-rajat has quit IRC | 18:30 | |
*** irclogbot_2 has joined #openstack-kolla | 18:36 | |
*** irclogbot_2 has quit IRC | 18:39 | |
*** Wasaac has quit IRC | 18:40 | |
*** henriqueof has joined #openstack-kolla | 18:44 | |
*** BjoernT has joined #openstack-kolla | 18:45 | |
*** Sravan has quit IRC | 18:48 | |
*** hamzy has quit IRC | 18:49 | |
*** hamzy has joined #openstack-kolla | 18:49 | |
*** Sravan has joined #openstack-kolla | 18:49 | |
*** igordc has joined #openstack-kolla | 18:55 | |
*** Sravan has quit IRC | 18:59 | |
*** Wasaac has joined #openstack-kolla | 19:01 | |
*** factor has joined #openstack-kolla | 19:06 | |
*** Wasaac has quit IRC | 19:07 | |
*** Sravan has joined #openstack-kolla | 19:10 | |
*** jonaspaulo has joined #openstack-kolla | 19:11 | |
*** Sravan has quit IRC | 19:12 | |
*** Sravan has joined #openstack-kolla | 19:16 | |
*** hamzaachi has quit IRC | 19:18 | |
*** Sravan has quit IRC | 19:19 | |
*** Sravan has joined #openstack-kolla | 19:21 | |
*** Sravan has quit IRC | 19:22 | |
*** nde has joined #openstack-kolla | 19:24 | |
*** Sravan has joined #openstack-kolla | 19:24 | |
nde | Hi all. My deployment seems to run fine except the ceph-osd containers never fire up. Any ideas? | 19:25 |
*** goldyfruit has joined #openstack-kolla | 19:25 | |
kplant | could be a number of reasons, did you mark your disks with the proper labels? | 19:25 |
nde | I did. The disk was labeled properly and the proper ceph items were tagged in the inventory file. All other ceph containers come up fine. ceph-osd just never gets created. It's strange. | 19:28 |
kplant | you look in /var/log/kolla/ ? | 19:29 |
*** Sravan has quit IRC | 19:29 | |
kplant | or do a ceph -s in the monitor container | 19:29 |
nde | ok | 19:32 |
kplant | i would double check your labels as well, they should have been changed if they were found by kolla-ansible | 19:33 |
nde | https://pastebin.com/MeSP8rHs | 19:34 |
nde | I trimmed out critical data | 19:34 |
kplant | osd: 0 osds: 0 up, 0 in | 19:34 |
kplant | right there | 19:34 |
kplant | if it knew about your osds it would be reporting them down | 19:34 |
kplant | you also don't want an even number of monitors | 19:35 |
kplant | 2n+1 is usually best practice in a cluster | 19:35 |
*** nde has quit IRC | 19:37 | |
*** nde has joined #openstack-kolla | 19:37 | |
nde | kplant: sorry was disconnected if I missed anything after sending the pastie | 19:37 |
kplant | 15:34 < kplant> osd: 0 osds: 0 up, 0 in | 19:37 |
kplant | 15:34 < kplant> right there | 19:37 |
kplant | 15:34 < kplant> if it knew about your osds it would be reporting them down | 19:37 |
kplant | 15:35 < kplant> you also don't want an even number of monitors | 19:38 |
kplant | 15:35 < kplant> 2n+1 is usually best practice in a cluster | 19:38 |
nde | Perhaps that's the issue altogether. We'll trim down the even number. | 19:38 |
kplant | that's not your problem here, it's only a problem in the event of failures really | 19:38 |
kplant | split-brain | 19:38 |
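
Editor's sketch of the 2n+1 advice above, assuming a strict-majority quorum rule (more than half of the members must agree). Under that assumption, going from an odd member count to the next even one adds no failure tolerance. The function names are illustrative only.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of `members`."""
    if members < 1:
        raise ValueError("need at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a strict majority remains."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "members: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

With these definitions 3 and 4 members both tolerate a single failure, and 5 and 6 both tolerate two, which is why odd counts are the usual recommendation.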
nde | I see. Pretty strange. | 19:39 |
kplant | i would look at your disks and make sure kolla did find them and make the partitions it needed to | 19:39 |
nde | Ok. Would there be anything stopping Kolla from hitting /dev/sda? | 19:39 |
kplant | nope, what labels did you use? | 19:40 |
nde | parted /dev/sda -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1 | 19:40 |
kplant | are you using filestore or bluestore | 19:42 |
nde | Perhaps that's the issue. | 19:43 |
yoctozepto | nde, kplant: better yet: what release of kolla-ansible are you using? the one from PyPI currently still deploys rocky only | 19:43 |
nde | In /etc/ansible/group_vars/all.yml the system is defined as bluestore | 19:43 |
nde | 8.0.0.0rc1 | 19:44 |
yoctozepto | nde: k, that's for stein | 19:44 |
yoctozepto | nde: why using /etc/ansible/group_vars/all.yml ? | 19:44 |
yoctozepto | if it's inventory in /etc/ansible then be aware that we might be overriding it by group_vars/all.yml in k-a playbooks dir | 19:45 |
nde | I see. | 19:45 |
yoctozepto | though we default to bluestore anyway | 19:45 |
nde | Should it be another location that is set? | 19:45 |
nde | Ok... perhaps it's my disk label that is incorrect because I noticed it is not for bluestore. | 19:46 |
yoctozepto | more specific groups than 'all' will override playbook's 'all' | 19:46 |
yoctozepto | and for globals we use globals.yml in /etc/kolla | 19:46 |
yoctozepto | note that globals override anything | 19:47 |
yoctozepto | because they are sourced as extra variables | 19:47 |
yoctozepto | (that's how ansible works) | 19:47 |
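
Editor's sketch of the precedence yoctozepto describes, modelled as a simple last-writer-wins merge. Only the variable sources mentioned in the conversation are listed (Ansible has more), and the variable name `osd_store_type` and its values are hypothetical.

```python
from typing import Any, Dict, Tuple

# Lowest to highest precedence, limited to the sources discussed above.
PRECEDENCE = [
    "inventory group_vars/all",
    "playbook group_vars/all",
    "inventory group_vars/<group>",
    "playbook group_vars/<group>",
    "extra vars",  # per the discussion, globals.yml is passed this way
]


def resolve(sources: Dict[str, Dict[str, Any]]) -> Dict[str, Tuple[Any, str]]:
    """Return each variable's winning value and the source it came from."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for name in PRECEDENCE:
        # Later (higher-precedence) sources overwrite earlier ones.
        for key, value in sources.get(name, {}).items():
            resolved[key] = (value, name)
    return resolved


if __name__ == "__main__":
    example = {
        "inventory group_vars/all": {"osd_store_type": "filestore"},
        "playbook group_vars/all": {"osd_store_type": "bluestore"},
    }
    print(resolve(example))
```

In the example, a value placed in the inventory's `group_vars/all` is overridden by the playbook's `group_vars/all`, which matches the warning about `/etc/ansible/group_vars/all.yml`; adding an `extra vars` entry would override both.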
nde | yoctozepto: That's very helpful. Should I just whack all.yml in that case? | 19:47 |
yoctozepto | nde: yeah, saving your customizations elsewhere | 19:48 |
*** Bico_Fino has quit IRC | 19:49 | |
nde | yoctozepto: Very good. Thanks. I'll do that and then properly tag my disks. Perhaps that will help. | 19:49 |
yoctozepto | nde: very likely since labels were wrong indeed | 19:50 |
nde | Thank you guys! | 19:51 |
yoctozepto | nde: you are welcome | 19:53 |
*** cgrosjean has quit IRC | 20:04 | |
*** cgrosjean has joined #openstack-kolla | 20:05 | |
*** Sravan has joined #openstack-kolla | 20:05 | |
*** cgrosjean has quit IRC | 20:06 | |
*** ivve has quit IRC | 20:06 | |
*** cgrosjean has joined #openstack-kolla | 20:08 | |
*** cgrosjean has quit IRC | 20:09 | |
*** Sravan has quit IRC | 20:10 | |
*** cgrosjean has joined #openstack-kolla | 20:12 | |
*** cgrosjean has quit IRC | 20:16 | |
*** nde has quit IRC | 20:21 | |
*** hamzaachi has joined #openstack-kolla | 20:21 | |
*** irclogbot_1 has joined #openstack-kolla | 20:24 | |
*** irclogbot_1 has quit IRC | 20:27 | |
*** hamzy has quit IRC | 20:37 | |
*** strigazi has quit IRC | 20:42 | |
*** BjoernT_ has joined #openstack-kolla | 20:43 | |
*** strigazi has joined #openstack-kolla | 20:43 | |
*** BjoernT has quit IRC | 20:44 | |
*** Sravan has joined #openstack-kolla | 21:00 | |
*** irclogbot_3 has joined #openstack-kolla | 21:14 | |
*** pcaruana has quit IRC | 21:19 | |
*** irclogbot_3 has quit IRC | 21:19 | |
*** goldyfruit has quit IRC | 21:20 | |
*** irclogbot_1 has joined #openstack-kolla | 21:38 | |
*** irclogbot_1 has quit IRC | 21:43 | |
*** BjoernT_ has quit IRC | 21:52 | |
*** irclogbot_0 has joined #openstack-kolla | 21:54 | |
*** altlogbot_3 has joined #openstack-kolla | 21:55 | |
*** altlogbot_3 has quit IRC | 21:55 | |
*** henriqueof has quit IRC | 21:58 | |
*** irclogbot_0 has quit IRC | 21:59 | |
*** igordc has quit IRC | 22:00 | |
*** cgrosjean has joined #openstack-kolla | 22:09 | |
*** luksky11 has quit IRC | 22:12 | |
*** Sravan has quit IRC | 22:14 | |
*** goldyfruit has joined #openstack-kolla | 22:28 | |
*** Sravan has joined #openstack-kolla | 22:34 | |
*** hamzaachi has quit IRC | 22:37 | |
*** Sravan has quit IRC | 22:43 | |
*** Sravan has joined #openstack-kolla | 22:43 | |
*** altlogbot_2 has joined #openstack-kolla | 22:44 | |
*** serhatd has quit IRC | 22:45 | |
*** altlogbot_2 has quit IRC | 22:49 | |
*** jonaspaulo has quit IRC | 22:50 | |
*** Sravan has quit IRC | 22:54 | |
*** goldyfruit has quit IRC | 23:14 | |
*** Wasaac has joined #openstack-kolla | 23:46 | |
*** Wasaac has quit IRC | 23:50 | |
*** hamzy has joined #openstack-kolla | 23:59 | |
*** kplant has quit IRC | 23:59 |