*** sshnaidm|afk has joined #tripleo | 00:20 | |
*** sshnaidm|afk is now known as sshnaidm|off | 00:20 | |
*** gouthamr has quit IRC | 00:57 | |
*** gouthamr has joined #tripleo | 00:59 | |
*** tosky has quit IRC | 01:01 | |
*** rfolco has joined #tripleo | 01:02 | |
*** rfolco has quit IRC | 01:18 | |
*** openstackgerrit has quit IRC | 01:37 | |
*** zzzeek has quit IRC | 02:14 | |
*** zzzeek has joined #tripleo | 02:15 | |
*** Goneri has quit IRC | 02:19 | |
*** dhill has quit IRC | 03:37 | |
*** skramaja has joined #tripleo | 03:40 | |
*** dhill has joined #tripleo | 03:46 | |
*** mvalsecc has quit IRC | 04:32 | |
*** hakhande has joined #tripleo | 04:44 | |
*** ykarel|away has joined #tripleo | 04:53 | |
*** mvalsecc has joined #tripleo | 04:53 | |
*** ykarel|away is now known as ykarel | 04:53 | |
*** udesale has joined #tripleo | 04:56 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #tripleo | 05:33 | |
*** lbragstad has quit IRC | 06:02 | |
*** lbragstad has joined #tripleo | 06:03 | |
*** hakhande has quit IRC | 06:17 | |
*** ratailor has joined #tripleo | 06:20 | |
*** hakhande has joined #tripleo | 06:29 | |
*** bandini has joined #tripleo | 06:38 | |
*** lmiccini has joined #tripleo | 06:51 | |
*** ysandeep|away is now known as ysandeep | 06:55 | |
*** rcernin has quit IRC | 07:04 | |
*** marios has joined #tripleo | 07:17 | |
*** rcernin has joined #tripleo | 07:17 | |
*** rcernin has quit IRC | 07:18 | |
*** saneax has joined #tripleo | 07:31 | |
*** hakhande has quit IRC | 07:41 | |
*** amoralej|off is now known as amoralej | 07:44 | |
*** jcapitao has joined #tripleo | 07:57 | |
*** belmoreira has joined #tripleo | 08:05 | |
*** cylopez has joined #tripleo | 08:30 | |
*** tkajinam has quit IRC | 08:31 | |
*** tkajinam has joined #tripleo | 08:32 | |
*** jaosorior has joined #tripleo | 08:35 | |
*** pcaruana has joined #tripleo | 08:35 | |
*** jpena|off is now known as jpena | 08:57 | |
*** xek_ has joined #tripleo | 08:58 | |
*** jpich has joined #tripleo | 08:59 | |
*** tosky has joined #tripleo | 09:05 | |
*** frenzy_friday has joined #tripleo | 09:06 | |
*** derekh has joined #tripleo | 09:09 | |
*** ysandeep is now known as ysandeep|lunch | 09:12 | |
*** ramishra has quit IRC | 09:19 | |
*** ramishra has joined #tripleo | 09:40 | |
*** gfidente|afk is now known as gfidente | 09:46 | |
*** mvalsecc has quit IRC | 09:49 | |
*** karthiks has joined #tripleo | 10:22 | |
*** ysandeep|lunch is now known as ysandeep | 11:13 | |
*** cmorey has joined #tripleo | 11:28 | |
cmorey | where's the best place to ask about troubleshooting ceph storage node deployment with 'openstack overcloud deploy'? I'm getting "stderr: [errno 110] error connecting to the cluster\", \"--> RuntimeError: Unable to create a new OSD id" when it's in TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] | 11:31 |
*** rfolco has joined #tripleo | 11:36 | |
Tengu | gfidente, fultonj, fmount -^^ that's for you I think :) | 11:47 |
fmount | cmorey: o/ that should be ceph-ansible (triggered during the overcloud deploy) trying to build the ceph components: I think you should take a look into your <config-download>/ceph-ansible dir (where a ceph_ansible_command.log can be found) | 11:50 |
fmount | Tengu: thanks for the ping | 11:50 |
Tengu | fmount: bouncing things to other folks is my Friday Pleasure ;) | 11:51 |
fmount | cmorey: <config-download> could be /var/lib/mistral/<stack_name> | 11:51 |
fmount | Tengu: ahahah ++ | 11:51 |
fmount | cmorey: that error you see there reminds me of some issues related to mons <-> osds node connectivity: check they can reach each other on the storage network | 11:53 |
fmount | (mons are collocated with the openstack services on the controller nodes, osds run on the ceph-storage nodes) | 11:53 |
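A quick way to act on fmount's suggestion is to probe the monitor ports on each controller's storage-network address from the ceph-storage node. The sketch below is a hypothetical helper, assuming the mons listen on Ceph's standard ports (3300 for msgr2, 6789 for legacy msgr1); the IP addresses in the usage comment are placeholders. On Linux, errno 110 is ETIMEDOUT, which is what a connection silently dropped by a misconfigured VLAN usually produces, so an UNREACHABLE result points at the same problem.

```python
import socket
import sys

# Standard Ceph monitor ports: 3300 (msgr2) and 6789 (legacy msgr1).
MON_PORTS = (3300, 6789)


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable routes
        return False


def check_mons(mon_ips, ports=MON_PORTS, timeout=3.0):
    """Hypothetical helper: map (ip, port) -> reachable for every mon address."""
    return {(ip, p): port_reachable(ip, p, timeout) for ip in mon_ips for p in ports}


if __name__ == "__main__":
    # Placeholder usage: python3 check_mons.py 172.16.1.10 172.16.1.11
    for (ip, port), ok in check_mons(sys.argv[1:]).items():
        print(f"{ip}:{port} {'reachable' if ok else 'UNREACHABLE'}")
```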
fmount | fultonj: gfidente fyi ^ | 11:53 |
*** jcapitao is now known as jcapitao_luncj | 11:57 | |
*** jcapitao_luncj is now known as jcapitao_lunch | 11:57 | |
cmorey | fmount, ah... that's a nugget i was missing | 11:57 |
cmorey | fmount (the location of the mons), | 11:57 |
cmorey | let me double-check the network config then, I should have prov, storage and storagemgmt vlans configured on both the controller and the ceph node | 11:58 |
fmount | ack | 11:59 |
Tengu | and check container status | 11:59 |
cmorey | dangnabbit | 12:00 |
Tengu | nice password | 12:01 |
* Tengu runs away | 12:01 | |
*** ratailor has quit IRC | 12:01 | |
cmorey | not a password, just an expression of "oops, i may have found the problem": although the vlans appear to be configured, i can't ping from ceph to controller on at least one of the 3 vlans | 12:02 |
Tengu | route issue? | 12:02 |
hjensas | reviews please - https://review.opendev.org/c/openstack/tripleo-ansible/+/763377 *thanks* | 12:03 |
Tengu | hjensas: uho.... | 12:03 |
Tengu | I'll need to update tripleo-lab I think, with that change :/ | 12:03 |
Tengu | hjensas: does it have any impact on the "name_lower" thing in the network description? | 12:04 |
Tengu | *network-data | 12:05 |
hjensas | Tengu: I don't think we remove the role_networks_lower map set in group_vars. So if you have working templates using that they shouldn't be broken. | 12:05 |
hjensas | Tengu: but this will https://review.opendev.org/c/openstack/tripleo-heat-templates/+/763379 | 12:05 |
Tengu | erf | 12:05 |
Tengu | guess I'll start an env with those 2 patches. | 12:06 |
hjensas | Tengu: I'm open to discuss if we shouldn't merge that thing. | 12:06 |
hjensas | Tengu: i.e. keep the role_networks_lower map in group_vars to not break people. I just think it's not so long since we introduced it, so maybe we can be a bit naughty and just clean it up before we have to keep both forever because lots of people are using it. | 12:07 |
Tengu | hjensas: well... if the migration path is known, I'm not against. for instance, here's what I generate here: https://github.com/cjeanner/tripleo-lab/blob/master/roles/overcloud/templates/oc-network-data.yaml.j2 | 12:07 |
Tengu | but it's "name_lower" here. sooooo maybe I'm just mixing things up ? | 12:08 |
Tengu | just want to avoid surprises on the network part next time I deploy ^^'. call me selfish ;) | 12:08 |
cmorey | Tengu, not sure, vlan configuration looks o.k. on the switch, i think there may be an issue with the network profile | 12:13 |
hjensas | Tengu: yes, you are mixing things up. It's not the name_lower in network data. It's a group_var role_networks_lower used by the ansible j2 nic config templates that change. | 12:25 |
Tengu | hjensas: fine then :). Better asking than being lost later. | 12:26 |
*** jpich has quit IRC | 12:26 | |
hjensas | Tengu: indeed. | 12:26 |
*** jpich has joined #tripleo | 12:27 | |
hjensas | Tengu: btw, we should catch up on the need to disable validations when using nova-less (pre-deployed) server some time. | 12:27 |
Tengu | hjensas: ah, I saw a ping some time ago indeed! | 12:27 |
hjensas | Tengu: since nova-less is the default now, it's not nice that we have to use | 12:27 |
hjensas | Tengu: '--disable-validations' flag. | 12:27 |
Tengu | yep. would be interesting to understand why we have to deactivate them, and maybe do some cleanup in order to remove the problematic things. Or, better, correct them | 12:28 |
hjensas | Tengu: I think that disables many validations we do want to run ... | 12:28 |
Tengu | yup | 12:28 |
* hjensas needs to figure out why it's needed | 12:28 | |
Tengu | hjensas: we can have a discussion next week? | 12:28 |
hjensas | Tengu: yes, if I find time to investigate before. Let's revisit next week. | 12:29 |
Tengu | hjensas: same for me - need to do some checks on my own | 12:29 |
Tengu | hjensas: you know where you can find me ;). | 12:29 |
Tengu | hjensas: I think it's related to utils.check_stack_network_matches_env_files (in tripleoclient/v1/overcloud_deploy.py) | 12:31 |
*** jpena is now known as jpena|lunch | 12:32 | |
Tengu | though the easiest way is to disable the check on the parameter inputs and just run things. | 12:32 |
Tengu | and see how it burns :] | 12:33 |
*** zzzeek has quit IRC | 12:36 | |
*** zzzeek has joined #tripleo | 12:37 | |
*** cgoncalves has quit IRC | 12:45 | |
*** cgoncalves has joined #tripleo | 12:46 | |
*** cgoncalves has quit IRC | 12:47 | |
*** udesale_ has joined #tripleo | 12:47 | |
*** udesale has quit IRC | 12:49 | |
*** weshay|pto is now known as weshay|ruck | 12:49 | |
cmorey | Tengu, thanks for that nugget, for some reason i assumed the mon would be running on the (only) CephStorage node, and it turns out my vlan assignments were on the wrong port (obscured by the fact that i was seeing vlan 1 traffic on the port i was expecting to be up) | 12:50 |
Tengu | cmorey: heh - yeah, ceph mons are on the controllers, only the OSDs are on the ceph nodes | 12:51 |
*** jcapitao_lunch is now known as jcapitao | 12:58 | |
*** rlandy has joined #tripleo | 13:00 | |
*** bandini has quit IRC | 13:03 | |
*** jpich has quit IRC | 13:03 | |
*** jpich has joined #tripleo | 13:04 | |
fultonj | fmount: gfidente: i think we can land the non-wip parts of https://review.opendev.org/q/topic:%22tripleo_ceph_client%22 soon. | 13:07 |
fultonj | so we're trying to get RDO green (though it's red for unrelated reasons) | 13:08 |
fultonj | and tht patch needs green zuul but others are green with zuul | 13:08 |
fultonj | fmount: i'll be trying your CephExternalMultiConfig idea today | 13:09 |
*** hberaud has quit IRC | 13:14 | |
*** bandini has joined #tripleo | 13:15 | |
*** hberaud has joined #tripleo | 13:16 | |
gfidente | fultonj yeah I am looking at the comments in the puppet change right now | 13:17 |
*** skramaja has quit IRC | 13:19 | |
gfidente | so I might have to make a small change to a parameter description | 13:21 |
gfidente | fmount fultonj ^^ do we have any job in progress waiting to finish or can I update puppet change now? | 13:21 |
fultonj | gfidente: not me | 13:22 |
Tengu | hjensas: sooo.... I'm running an overcloud deploy with pre-deployed nodes (using metalsmith) AND validations... let's see. | 13:22 |
Tengu | since I happen to have an env. | 13:22 |
*** jpena|lunch is now known as jpena | 13:29 | |
*** ysandeep is now known as ysandeep|mtg | 13:33 | |
*** cgoncalves has joined #tripleo | 13:40 | |
cmorey | Tengu, i'm not sure if the fact i have multipath on this node (so the drive is showing up twice) is causing it, but i've only told it to use one path, and i'm getting an error that looks like it's unhappy about a non-zero exit code due to "WARNING: The same type, major and minor should not be used for multiple devices.", which is generating a "KeyError: 'ceph.cluster_name'" | 13:41 |
cmorey | Tengu, short of physically pulling one of the SAS links, is there a way around this? | 13:41 |
Tengu | cmorey: you probably want to talk again to fmount and his team :) | 13:41 |
*** fmount has quit IRC | 13:46 | |
cmorey | as if on queue :) | 13:47 |
cmorey | cue even | 13:47 |
cmorey | thanks for your help so far Tengu | 13:48 |
*** pleimer_ has joined #tripleo | 13:48 | |
slagle | hjensas: what do you think about https://review.opendev.org/c/openstack/tripleo-ansible/+/765431? i was running into the issue this fixes yesterday | 13:50 |
*** amoralej is now known as amoralej|lunch | 13:52 | |
cmorey | Tengu, looks a lot like https://access.redhat.com/solutions/5398181 . | 13:52 |
Tengu | meh.... | 13:53 |
cmorey | except i'm not running in HCI (and it's not RHOSP, but train/centos7) | 13:53 |
hjensas | slagle: that makes sense. Thanks! +2 | 13:53 |
cmorey | so i guess fmount/gfidente/fultonj are my go-tos for it? | 13:54 |
Tengu | cmorey: osp-16 (basically stable/train) is targeting rhel8 (centos-8). I think you might want to move away from centos-7 | 13:54 |
Tengu | cmorey: and yeah - they are the best ppl for ceph issues :) | 13:54 |
cmorey | Tengu, centos8 dropped support for the sas cards in the boxes i'm using as a testbench,... so.. | 13:55 |
Tengu | yay... | 13:55 |
Tengu | how convenient | 13:55 |
*** fmount has joined #tripleo | 13:55 | |
*** saneax has quit IRC | 13:57 | |
cmorey | ah ha, speak of fmount and ... | 14:02 |
*** tkajinam has quit IRC | 14:07 | |
cmorey | fmount: o/ i'm getting the same error as https://access.redhat.com/solutions/5398181 but i'm not running HCI | 14:13 |
fmount | cmorey: hey, mmm let me see, didn't see this kind of error before but we can investigate | 14:18 |
*** ysandeep|mtg is now known as ysandeep | 14:19 | |
fmount | so it's basically ceph-volume failing on running that command to build the osd | 14:19 |
*** cmorey has quit IRC | 14:21 | |
*** cmorey has joined #tripleo | 14:25 | |
fmount | cmorey: hey I found an open tracker for this kind of bug in ceph-volume | 14:28 |
fmount | cmorey: https://tracker.ceph.com/issues/44356 | 14:28 |
*** bogdando has joined #tripleo | 14:29 | |
fmount | cmorey: can you show me the ceph-ansible DiskConfig? | 14:31 |
cmorey | where would i find it? | 14:32 |
*** mcornea has joined #tripleo | 14:32 | |
fmount | you should have a storage/ceph/ceph-ansible environment file (where the disks are specified) included in your deploy command, let me see the content of that file | 14:34 |
*** apetrich has quit IRC | 14:35 | |
cmorey | sorry, brain slow, one sec | 14:35 |
cmorey | http://paste.openstack.org/show/800735/ | 14:35 |
*** apetrich has joined #tripleo | 14:38 | |
slagle | hjensas: thanks :) | 14:39 |
*** amoralej|lunch is now known as amoralej | 14:40 | |
*** bogdando has quit IRC | 14:40 | |
*** bnemec has joined #tripleo | 14:44 | |
gfidente | fmount fultonj tht just passed | 14:45 |
gfidente | I am going to refresh puppet just to fix the parameters comment | 14:45 |
fmount | cmorey: osd_scanario: lvm | 14:45 |
fmount | s/scanario/scenario | 14:45 |
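The fix here was to set osd_scenario to lvm in the ceph-ansible environment file, and later in the thread fultonj points out that the OS disk cannot be used as an OSD. A hypothetical pre-flight check for both points is sketched below, assuming the environment file has already been loaded into a Python dict with the parameter_defaults / CephAnsibleDisksConfig layout (devices, osd_scenario). The helper name and the default OS disk are illustrative, not part of TripleO.

```python
def check_disks_config(env: dict, os_disk: str = "/dev/sda") -> list[str]:
    """Hypothetical check: return problems found in CephAnsibleDisksConfig."""
    problems = []
    disks = env.get("parameter_defaults", {}).get("CephAnsibleDisksConfig", {})
    scenario = disks.get("osd_scenario")
    if scenario != "lvm":
        problems.append(f"osd_scenario is {scenario!r}, expected 'lvm'")
    devices = disks.get("devices", [])
    if not devices:
        problems.append("no devices listed")
    if os_disk in devices:
        problems.append(f"{os_disk} is the OS disk and cannot be used as an OSD")
    return problems


# Example: devices are listed but osd_scenario is missing, so the check flags it.
env = {"parameter_defaults": {"CephAnsibleDisksConfig": {"devices": ["/dev/sdb"]}}}
print(check_disks_config(env))  # ["osd_scenario is None, expected 'lvm'"]
```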
cmorey | o.k. | 14:46 |
fultonj | gfidente: glad it passed, sounds good | 14:46 |
fmount | gfidente: /me checking the logs | 14:46 |
cmorey | fmount, whoops,.. thanks | 14:46 |
*** TrevorV has joined #tripleo | 14:48 | |
*** bnemec is now known as beekneemech | 14:50 | |
*** tmazur has joined #tripleo | 14:51 | |
fmount | fultonj: gfidente logs look good, just rechecked with scenario00{1,4} => 760915 | 14:52 |
*** openstackgerrit has joined #tripleo | 14:52 | |
openstackgerrit | Giulio Fidente proposed openstack/puppet-tripleo master: Remove /etc/ceph dependency on puppet services https://review.opendev.org/c/openstack/puppet-tripleo/+/763545 | 14:52 |
*** Goneri has joined #tripleo | 14:54 | |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key https://review.opendev.org/c/openstack/tripleo-ansible/+/762160 | 14:56 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module https://review.opendev.org/c/openstack/tripleo-ansible/+/764638 | 14:56 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output https://review.opendev.org/c/openstack/tripleo-ansible/+/764639 | 14:56 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports https://review.opendev.org/c/openstack/tripleo-ansible/+/764640 | 14:56 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support https://review.opendev.org/c/openstack/tripleo-ansible/+/762162 | 14:56 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook https://review.opendev.org/c/openstack/tripleo-ansible/+/765545 | 14:56 |
gfidente | fmount ack I fixed the params in the puppet change | 14:57 |
gfidente | (descriptions) | 14:57 |
fmount | ++ thanks | 14:57 |
*** jbadiapa has joined #tripleo | 15:00 | |
openstackgerrit | Francesco Pantano proposed openstack/tripleo-heat-templates master: WIP - Disable ceph client role execution https://review.opendev.org/c/openstack/tripleo-heat-templates/+/760915 | 15:02 |
openstackgerrit | Sandeep Yadav proposed openstack/tripleo-quickstart-extras master: Modify overcloud-deploy to support multiple stacks https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/763786 | 15:11 |
*** ysandeep is now known as ysandeep|away | 15:23 | |
cmorey | fmount, died with the same error | 15:24 |
*** pojadhav|ruck is now known as pojadhav|afk | 15:25 | |
cmorey | should i re-enable multipathd and switch to the multi-path device? | 15:25 |
*** mcornea has quit IRC | 15:27 | |
fmount | not sure, do you have a ceph-volume.log on the storage node which is failing? | 15:29 |
cmorey | yeah, it seems obsessed with /dev/sda1 | 15:29 |
fultonj | cmorey: did you clean your disks? | 15:30 |
fultonj | https://bugzilla.redhat.com/show_bug.cgi?id=1613918 | 15:30 |
cmorey | fultonj, earlier today i removed the vg and pv for the lvm volume. /dev/sda is the OS disk | 15:30 |
openstack | bugzilla.redhat.com bug 1613918 in Documentation-RHHI4C "[Docs] The Ceph Guide for OpenStack should have have a disk cleaning recomendation" [Medium,Verified] - Assigned to agunn | 15:30 |
fultonj | the OS disk cannot be used as an osd | 15:31 |
fmount | yeah cmorey if you can rerun after cleaning your disks it's better | 15:31 |
fmount | cmorey: is /dev/sdb used for osds, right? | 15:31 |
cmorey | fultonj, i don't want it to use the OS disk | 15:31 |
fmount | fultonj: http://paste.openstack.org/show/800735/ | 15:31 |
fultonj | good | 15:32 |
cmorey | fultonj, i only want it to use /dev/sdb (or ideally /dev/mapper/mpatha) | 15:32 |
cmorey | I did a full node clean before i tried to start deploying it as a cephNode | 15:32 |
cmorey | i haven't cleaned it since though | 15:32 |
fultonj | cmorey: please see https://bugzilla.redhat.com/show_bug.cgi?id=1613918#c1 | 15:33 |
openstack | bugzilla.redhat.com bug 1613918 in Documentation-RHHI4C "[Docs] The Ceph Guide for OpenStack should have have a disk cleaning recomendation" [Medium,Verified] - Assigned to agunn | 15:33 |
fultonj | you could run 'lsblk' on the node | 15:34 |
cmorey | fultonj, /dev/sdb is a new array added to the node after deployment, the only thing that's used it is the ceph-node | 15:34 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook https://review.opendev.org/c/openstack/tripleo-ansible/+/765545 | 15:35 |
fultonj | did you read comment #1 ? | 15:35 |
fultonj | it needs to be cleaned in between every deployment attempt | 15:35 |
cmorey | :( | 15:36 |
*** ekultails has joined #tripleo | 15:37 | |
cmorey | fultonj, the node is active, so i can't clean it... | 15:37 |
fultonj | is it in production? | 15:38 |
*** udesale_ has quit IRC | 15:38 | |
cmorey | the rest of the overcloud kind of is | 15:38 |
cmorey | i guess i can set cephnode count to 0 and see if that will de-provision | 15:39 |
mwhahaha | no | 15:39 |
fultonj | deprovisioning is documented | 15:39 |
mwhahaha | you have to delete the node (if deployed) | 15:39 |
fultonj | production server with only 1 OSD? | 15:39 |
cmorey | openstack baremetal node delete | 15:39 |
fultonj | no | 15:39 |
cmorey | fultonj, PoC test cluster | 15:39 |
fultonj | you can ssh into it and directly clean it | 15:40 |
hjensas | slagle: since you are testing the network extract/provision things I added you on a couple of reviews that still need to merge. | 15:40 |
fultonj | sgdisk -Z | 15:40 |
fultonj | and there's a dmsetup command too | 15:40 |
cmorey | fultonj, one question, do you want me to clean the whole node, or just sdb? | 15:41 |
fultonj | sdb | 15:41 |
cmorey | oh right | 15:41 |
cmorey | that's much easier... | 15:41 |
fultonj | since that's what ceph-volume isn't making an OSD on | 15:41 |
fultonj | but because people often don't clean the disk correctly when they try to do it manually | 15:41 |
cmorey | i'm happy to blow that away, i thought you wanted me to blow the whole node | 15:41 |
fultonj | i think it's better to have ironic clean it correctly | 15:42 |
cmorey | fultonj, i'll happily trash the volume on the storage device and re-create it if need be | 15:42 |
fultonj | ironic will do the whole node and clean it right | 15:42 |
fultonj | you can try, it's certainly possible | 15:42 |
fultonj | sgdisk -Z may not be sufficient | 15:42 |
cmorey | is there a command that would show that it's clean enough? | 15:43 |
fultonj | lsblk | 15:43 |
fultonj | dmsetup | 15:43 |
cmorey | o.k. | 15:43 |
fultonj | sdb 8:16 0 50G 0 disk ceph--a881a17b--eee2--4c65--8709--538fb07af16c-osd--data--207c4446--851f--44de--b25c--9699552c9243 253:1 0 50G 0 lvm | 15:44 |
fultonj | is for a disk that's NOT clean ^ | 15:44 |
fultonj | it has ceph data on it | 15:44 |
fultonj | so ceph "helps" you by not deleting your data for you even if it's left over from an older deployment | 15:44 |
openstackgerrit | Merged openstack/tripleo-ansible stable/victoria: Fix networks_skip_config condition in nic templates https://review.opendev.org/c/openstack/tripleo-ansible/+/764965 | 15:45 |
fultonj | it really is helping you in the long run though, since deleting data is something the admin should take responsibility for | 15:45 |
fultonj | but for now it's getting in your way ;) | 15:45 |
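The lsblk line above shows the telltale sign of an unclean disk: an lvm child named ceph--... underneath sdb. A minimal sketch that checks for this from `lsblk -J` (util-linux JSON output) is below, assuming ceph-volume's ceph--<vg>-osd--data--<lv> naming as in the example. The function names are hypothetical; an empty result only means no ceph LVs were found, not that the disk is otherwise fully clean.

```python
import json
import subprocess


def ceph_leftovers(lsblk_json: str, disk: str) -> list[str]:
    """Hypothetical helper: names of ceph LVs under `disk` (e.g. "sdb")."""
    found = []

    def walk(dev):
        # Partitions and LVs appear as nested "children" entries
        for child in dev.get("children", []):
            if child["name"].startswith("ceph--"):
                found.append(child["name"])
            walk(child)

    for dev in json.loads(lsblk_json)["blockdevices"]:
        if dev["name"] == disk:
            walk(dev)
    return found


def check_disk(disk: str = "sdb") -> list[str]:
    """Run lsblk on the storage node itself and inspect the given disk."""
    out = subprocess.run(["lsblk", "-J", f"/dev/{disk}"],
                         capture_output=True, text=True, check=True).stdout
    return ceph_leftovers(out, disk)
```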
fultonj | dmsetup ls | 15:46 |
*** ysandeep|away is now known as ysandeep | 15:46 | |
fultonj | so if you see something like that then | 15:46 |
fultonj | dmsetup remove ceph--98dfc177--3fe1--4248--9333--c2110f854c76-osd--data--4284de94--753a--4bfc--a27e--8fa79f1ab42b | 15:46 |
fultonj | cmorey: ^ | 15:46 |
cmorey | http://paste.openstack.org/show/800740/ | 15:46 |
fultonj | and then sgdisk -Z /dev/sdX | 15:47 |
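The manual cleanup above boils down to: list the device-mapper entries, remove the ceph ones, then zap the disk's partition tables. The dry-run sketch below only builds that command list from `dmsetup ls` output, so it can be reviewed before anything destructive runs. It assumes each `dmsetup ls` line starts with the mapping name, and it selects every ceph-- mapping, so it is only appropriate on a node where all ceph LVs are leftovers (as on this single-OSD node). The helper is hypothetical; as fultonj notes, ironic cleaning is more thorough and sgdisk -Z may not be sufficient on its own.

```python
import shlex


def cleanup_plan(dmsetup_ls_output: str, disk: str) -> list[str]:
    """Hypothetical helper: build (but do not run) cleanup commands."""
    cmds = []
    for line in dmsetup_ls_output.splitlines():
        if not line.strip():
            continue
        name = line.split()[0]  # first column is the mapping name
        if name.startswith("ceph--"):  # ceph-volume VG/LV naming seen above
            cmds.append(f"dmsetup remove {shlex.quote(name)}")
    # Zap the GPT/MBR structures last, once nothing maps onto the disk
    cmds.append(f"sgdisk -Z {shlex.quote(disk)}")
    return cmds
```

Feeding it the output of `dmsetup ls` and printing the list gives commands to review and paste by hand.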
rlandy | mwhahaha: https://review.opendev.org/q/dbfa2399b47aa3b9fef84709ff786b9bf69bf2d3 - the branch cherry-picks here are not merged - any problem with merging these now? | 15:47 |
cmorey | fultonj, done | 15:47 |
fultonj | you did the sgdisk too? | 15:47 |
cmorey | yep | 15:47 |
fultonj | is there a ceph-volume log on the machine? | 15:47 |
cmorey | http://paste.openstack.org/show/800740/ | 15:48 |
cmorey | yes | 15:48 |
cmorey | (to the ceph-volume log) | 15:48 |
cmorey | want me to move that out of the way? or just nuke /etc/ceph, /var/lib/ceph, /var/log/ceph | 15:48 |
fultonj | no | 15:48 |
mwhahaha | rlandy: no. i think they were stuck on ci | 15:49 |
fultonj | the ceph-volume log should tell you why ceph-volume wasn't happy | 15:49 |
fultonj | usually it's because the disk isn't clean | 15:49 |
fultonj | cmorey: did the output of those commands look diff before you cleaned it? | 15:49 |
rlandy | mwhahaha: k - thanks - will recheck | 15:49 |
fultonj | it basically runs https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ | 15:50 |
fultonj | but within the ceph container | 15:50 |
cmorey | fultonj, lsblk showed something like you posted, | 15:50 |
fultonj | cmorey: ah good | 15:50 |
fultonj | cmorey: that probably means it was the culprit | 15:50 |
fultonj | so try re-running the same 'openstack overcloud deploy ...' | 15:51 |
cmorey | fultonj, do you want the ceph-volume.log or shall i re-try the deploy | 15:51 |
fultonj | you can tail -f config-download/<stack>/ceph-ansible/ceph-ansible.log when you redeploy | 15:51 |
openstackgerrit | mbu proposed openstack/tripleo-validations master: Generate inventory without any overcloud https://review.opendev.org/c/openstack/tripleo-validations/+/764955 | 15:51 |
fultonj | you'll need to confirm the exact path | 15:52 |
fultonj | relative to your version and env | 15:52 |
cmorey | is that on the undercloud node? | 15:52 |
fultonj | yes | 15:52 |
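Two log names come up in the channel (ceph_ansible_command.log and ceph-ansible.log), both in a ceph-ansible directory of the config-download tree, whose root varies by version (for example /var/lib/mistral/<stack_name>). The sketch below, meant to run on the undercloud, lists candidate logs newest first so you know which one to tail -f. The default root comes from the chat and is an assumption for your environment; the function name is hypothetical.

```python
from pathlib import Path


def ceph_ansible_logs(root: str = "/var/lib/mistral") -> list[Path]:
    """Hypothetical helper: *.log files in any ceph-ansible dir under root, newest first."""
    logs = [
        log
        for d in Path(root).rglob("ceph-ansible")
        if d.is_dir()
        for log in d.glob("*.log")
    ]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```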
cmorey | ok. i'll see how that goes, | 15:52 |
*** ysandeep has quit IRC | 15:52 | |
*** odyssey4me has quit IRC | 15:52 | |
*** ysandeep has joined #tripleo | 15:53 | |
*** odyssey4me has joined #tripleo | 15:53 | |
openstackgerrit | Merged openstack/tripleo-heat-templates stable/victoria: Fix barbican settings missing from glance Edge nodes https://review.opendev.org/c/openstack/tripleo-heat-templates/+/765139 | 16:04 |
openstackgerrit | Merged openstack/tripleo-validations stable/train: Add a validation to check the local. https://review.opendev.org/c/openstack/tripleo-validations/+/764334 | 16:04 |
openstackgerrit | Merged openstack/tripleo-heat-templates master: [PowerFlex] Fix resource name typo in template https://review.opendev.org/c/openstack/tripleo-heat-templates/+/765031 | 16:05 |
openstackgerrit | Merged openstack/tripleo-quickstart master: Revert "ensure dlrn current is actually pulling current" https://review.opendev.org/c/openstack/tripleo-quickstart/+/765085 | 16:05 |
*** ykarel has quit IRC | 16:17 | |
openstackgerrit | David Peacock proposed openstack/tripleo-operator-ansible master: WIP - add role to show tripleo validation https://review.opendev.org/c/openstack/tripleo-operator-ansible/+/755375 | 16:19 |
openstackgerrit | David Peacock proposed openstack/tripleo-operator-ansible master: WIP - add role to list available tripleo validations https://review.opendev.org/c/openstack/tripleo-operator-ansible/+/755365 | 16:19 |
*** marios is now known as marios|out | 16:22 | |
openstackgerrit | Alan Bishop proposed openstack/tripleo-heat-templates stable/ussuri: Fix barbican settings missing from glance Edge nodes https://review.opendev.org/c/openstack/tripleo-heat-templates/+/765566 | 16:23 |
*** ysandeep is now known as ysandeep|away | 16:28 | |
openstackgerrit | Merged openstack/tripleo-ansible stable/train: port c7 molecule job to c8 - part 3 https://review.opendev.org/c/openstack/tripleo-ansible/+/752382 | 16:31 |
*** amoralej is now known as amoralej|off | 16:38 | |
cmorey | fultonj, cleaning the disk has allowed it to move on, it now seems to be complaining about unrecognised pools (specifically vms, volumes and images) | 16:40 |
cmorey | ah, it created vms o.k. but volumes and images complain with "[\"Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)" | 16:41 |
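The numbers in that error can be reproduced by hand, which also explains why vms succeeded while volumes and images failed. The check appears to sum pg_num × size over the pools that already exist plus the one being created, and compare it with mon_max_pg_per_osd × num_in_osds: with vms already at 128 × 3 = 384, adding volumes gives 768, which exceeds 250 × 3 = 750. The sketch below mirrors only that arithmetic, with every number taken from the error message (num_in_osds is whatever the message reports, here 3); it is not Ceph's code.

```python
def projected_pgs(existing_pools, new_pg_num, new_size):
    """Total PG replicas if a pool with new_pg_num/new_size were added."""
    return sum(pg_num * size for pg_num, size in existing_pools) + new_pg_num * new_size


def fits(existing_pools, new_pg_num, new_size, max_per_osd=250, num_in_osds=3):
    """True if the new pool stays within mon_max_pg_per_osd * num_in_osds."""
    return projected_pgs(existing_pools, new_pg_num, new_size) <= max_per_osd * num_in_osds


# Defaults: vms already exists at pg_num 128, size 3; now creating volumes.
print(projected_pgs([(128, 3)], 128, 3), fits([(128, 3)], 128, 3))  # 768 False
# Reduced settings suggested in the channel: pg_num 32, size 1.
print(projected_pgs([(32, 1), (32, 1)], 32, 1), fits([(32, 1), (32, 1)], 32, 1))  # 96 True
```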
fultonj | yep | 16:42 |
fultonj | another "feature" | 16:42 |
cmorey | fixable? | 16:42 |
fultonj | oh yeah definitely | 16:42 |
fultonj | one sec | 16:42 |
fultonj | cmorey: https://bugs.launchpad.net/tripleo/+bug/1749544 | 16:43 |
openstack | Launchpad bug 1749544 in tripleo "Overcloud deployment during ControllerDeployment_Step4 with ceph fails "ObjectNotFound: error opening pool 'metrics'\"," [High,Fix released] - Assigned to John Fulton (jfulton-org) | 16:43 |
fultonj | mon_max_pg_per_osd: 3072 | 16:44 |
fultonj | is a BAD ^ idea | 16:44 |
fultonj | if you care about your data | 16:44 |
fultonj | it's all explained in the bug report above | 16:44 |
fultonj | see also https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_config.html#ceph-placement-group-validation | 16:45 |
fultonj | that will at least let you simulate the pool creation to check that your proposed configuration won't fail going in | 16:46 |
fultonj | cmorey: ^ | 16:46 |
*** chandankumar is now known as raukadah | 16:47 | |
cmorey | I really need to get my head around ceph | 16:47 |
fultonj | how many storage nodes do you have, just 1? | 16:48 |
fultonj | with one OSD? | 16:48 |
cmorey | at the moment | 16:49 |
fultonj | so ceph makes copies of your data to keep it safe | 16:49 |
fultonj | it can't make multiple copies if there's only one disk (OSD) | 16:50 |
cmorey | it's user data.... they know it's almost ephemeral | 16:50 |
*** bandini has quit IRC | 16:50 | |
fultonj | ceph's default is 3 copies | 16:50 |
cmorey | (but that's why it's on a storage array) | 16:50 |
fultonj | tell it you only need 1 and it will stop comaining | 16:50 |
fultonj | complaining | 16:50 |
cmorey | at the moment, i don't have any other nodes with extra disks | 16:51 |
cmorey | o.k. so if I tell it to keep 1 copy (presumably via an override), the rest of the config (CephPoolDefaultSize, CephPoolDefaultPgNum and CephPools) will be fine? | 16:53 |
fultonj | not exactly | 16:54 |
fultonj | the pools were already created with the CephPoolDefaultSize which defaults to 3 | 16:55 |
fultonj | you can update the existing pools | 16:55 |
*** bandini has joined #tripleo | 16:55 | |
fultonj | for P in vms volumes images; do ceph osd pool set $P size 1; done | 16:56 |
fultonj | but the ceph binary is in the container | 16:56 |
fultonj | so | 16:56 |
fultonj | on a controller | 16:56 |
fultonj | podman exec ceph-mon-$HOSTNAME | 16:56 |
fultonj | podman exec -ti ceph-mon-$HOSTNAME /bin/bash | 16:57 |
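The ceph CLI lives in the mon container on a controller, so the per-pool size change can also be run through podman exec without an interactive shell. The sketch below builds those commands and, when dry_run is off, runs them (typically as root). It assumes the container name follows the ceph-mon-<short hostname> pattern shown above, and the pool list must match what the cluster actually has (in this thread only vms existed). The helper names are hypothetical. Setting size 1 disables Ceph's replication, as fultonj notes.

```python
import socket
import subprocess


def pool_size_commands(pools, size=1, host=None):
    """Hypothetical helper: podman exec commands that set each pool's replica count."""
    host = host or socket.gethostname().split(".")[0]  # assumes short hostname
    container = f"ceph-mon-{host}"
    return [
        ["podman", "exec", container, "ceph", "osd", "pool", "set", pool, "size", str(size)]
        for pool in pools
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```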
cmorey | if i'm going to be re-running openstack overcloud deploy, do i need to nuke /dev/sdb, or are we past that stage? | 16:57 |
fultonj | first, so you can use the ceph binary | 16:57 |
fultonj | no need to nuke | 16:57 |
fultonj | it's idempotent | 16:57 |
fultonj | it will just see you have that osd and not try to create it | 16:57 |
fultonj | so add 'CephPoolDefaultSize: 1' to your templates | 16:57 |
cmorey | fwiw, i'm also trying to have ceph_rgw running, if that makes a difference | 16:58 |
fultonj | CephPoolDefaultPgNum: 32 | 16:58 |
fultonj | which is very low | 16:58 |
fultonj | then set all pools which have already been created from 3 to 1 replicas | 16:58 |
fultonj | i'd say you're just trying ceph out on hardware that isn't a good fit for it, which is fine | 16:59 |
fultonj | it was designed for lots of disks and will perform better that way | 16:59 |
fultonj | but you can deploy with just 1, you'll just need to reduce the defaults | 16:59 |
cmorey | *nods* if i had the ability to go and plug extra disks in, i would | 16:59 |
gfidente | fultonj fmount so I was thinking, if we end up using cephadm for ganesha | 17:00 |
cmorey | so, edit template, re-deploy, then run ceph osd pool set <volume> size 1 | 17:00 |
gfidente | and write some role in tripleo-ceph to create its config file | 17:00 |
gfidente | is this really worth a spec or would it rather go under the existing tripleo-cephadm spec | 17:00 |
gfidente | ? | 17:00 |
*** jaosorior has quit IRC | 17:01 | |
gfidente | probably yes just to mention it'll use a slightly different "workflow" ? | 17:01 |
*** marios|out has quit IRC | 17:01 | |
fultonj | cmorey: ceph osd pool set <volume> size 1 | 17:02 |
fultonj | then edit template and redeploy | 17:02 |
fultonj | else redeploy will still fail with same message since those pools were already created | 17:03 |
fultonj | do it for all the pools you have | 17:03 |
fultonj | you're basically disabling ceph's data protection since you're short on disk | 17:03 |
cmorey | fultonj, only one pool: 'set pool 1 size to 1' | 17:03 |
fmount | gfidente: yeah it's probably ok to describe this workflow because it's a standalone service deployed by cephadm (to close the gap w/ ceph-ansible) | 17:04 |
fultonj | pool number vs name | 17:04 |
* fultonj thought both worked | 17:04 | |
cmorey | that was the output of .. set vms size 1 | 17:04 |
cmorey | i only have one pool | 17:04 |
cmorey | my default poolsize is already set to 1 | 17:05 |
cmorey | http://paste.openstack.org/show/800746/ <-- new env file | 17:06 |
fultonj | gfidente: you could make the spec shorter then | 17:06 |
fultonj | the tripleo-ceph spec merged | 17:06 |
gfidente | fultonj yeah I am avoiding work | 17:06 |
fultonj | so the ganesha spec could become like an addendum to it | 17:07 |
fmount | ++ | 17:07 |
fultonj | perhaps a blueprint | 17:07 |
fmount | that's a good idea ^ | 17:07 |
gfidente | we have the blueprints in launchpad for all three I think | 17:07 |
gfidente | I'll shorten it a little | 17:07 |
fultonj | since the spec exists perhaps just finish it but make it short as it needs to be | 17:07 |
fultonj | so people don't think we gave up on the project :) | 17:07 |
gfidente | yeah | 17:08 |
gfidente | thanks | 17:08 |
fultonj | gfidente: so you would deploy standalone ganesha on the overcloud before running heat | 17:08 |
fultonj | gfidente: and you'd do it with a correctly crafted cephadm spec file | 17:08 |
*** lmiccini has quit IRC | 17:09 | |
fultonj | cmorey: you've had 'CephPoolDefaultPgNum: 32' + 'CephDefaultPoolSize: 1' since day 1? | 17:09 |
cmorey | fultonj, i'm getting a warning from "openstack overcloud deploy" that the CephDefaultPOolSize is defined but not currently used in the deployment plan | 17:09 |
cmorey | fultonj, no, just CephDefaultPoolSize 1 | 17:09 |
fultonj | ah | 17:09 |
gfidente | POol | 17:09 |
*** dprince has joined #tripleo | 17:10 | |
cmorey | my typo, sorry | 17:10 |
fultonj | yeah so 128 is the default and it's too high | 17:10 |
cmorey | "WARNING: Following parameter(s) are defined but not currently used in the deployment plan. These parameters may be valid but not in use due to the service or deployment configuration. BondInterfaceOvsOptions, SwiftRingPutTempurl, CephDefaultPoolSize, SwiftRingGetTempurl" | 17:10 |
fultonj | ceph octopus has pg auto scale so we won't have to do this dance in the future | 17:10 |
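A rough sketch of why a `pg_num` of 128 is too high on a cluster with a single OSD, assuming Ceph refuses to create a pool once the sum of `pg_num` × replica `size` across all pools would exceed `mon_max_pg_per_osd` × the number of OSDs. The default of 250 used for `mon_max_pg_per_osd` below is an assumption; check the value your Ceph release actually uses. The helper name `pools_that_fit` is hypothetical.

```python
def pools_that_fit(pg_num: int, size: int, num_osds: int,
                   mon_max_pg_per_osd: int = 250) -> int:
    """Return how many pools of `pg_num` PGs with `size` replicas fit under
    the cluster-wide PG budget (mon_max_pg_per_osd * num_osds).

    Hypothetical helper; the 250 default is an assumption, not read from Ceph.
    """
    if pg_num <= 0 or size <= 0 or num_osds <= 0:
        raise ValueError("pg_num, size and num_osds must be positive")
    budget = mon_max_pg_per_osd * num_osds  # total PG replicas allowed
    per_pool = pg_num * size                # PG replicas one pool consumes
    return budget // per_pool


# One OSD, replica size 1:
print(pools_that_fit(128, 1, 1))  # → 1
print(pools_that_fit(32, 1, 1))   # → 7
```

Under these assumptions only one 128-PG pool fits on a single OSD, so later pools cannot be created, which is consistent with the missing-pool errors described in the bug linked below; 32 PGs per pool leaves room for several pools. The PG autoscaler mentioned for Octopus removes the need to pick this number by hand.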
fultonj | maybe the templates aren't in the right order of -e's | 17:11 |
cmorey | hopefully next week i can add some more drives to the other servers and them | 17:11 |
cmorey | "-e $THT/environments/ceph-ansible/ceph-ansible.yaml -e $THT/environments/ceph-ansible/ceph-rgw.yaml -e templates/ceph-settings.yaml -e templates/local-config.yaml" | 17:11 |
cmorey | you can ignore local-config, that's just SELinux | 17:12 |
cmorey | but the others are the ceph ones | 17:12 |
fultonj | sadly my day off is starting | 17:12 |
cmorey | have a good break. | 17:12 |
fultonj | but https://bugs.launchpad.net/tripleo/+bug/1749544 has the info on what you're running into | 17:13 |
openstack | Launchpad bug 1749544 in tripleo "Overcloud deployment during ControllerDeployment_Step4 with ceph fails "ObjectNotFound: error opening pool 'metrics'\"," [High,Fix released] - Assigned to John Fulton (jfulton-org) | 17:13 |
cmorey | should i add the low-memory.yaml? | 17:13 |
cmorey | (i've not looked at that yaml yet) | 17:13 |
fultonj | cmorey: no point | 17:14 |
fultonj | https://review.opendev.org/c/openstack/tripleo-heat-templates/+/544588/4/environments/low-memory-usage.yaml | 17:14 |
cmorey | fultonj, ok. | 17:14 |
fultonj | you've already done the equivalent | 17:14 |
cmorey | true | 17:14 |
cmorey | the nodes aren't short of memory (all overcloud nodes currently have 64G or more) | 17:15 |
* fultonj has to rip himself away from his computer and go play guitar | 17:16 | |
cmorey | thanks for your help fultonj | 17:16 |
* fultonj scheduled pto | 17:16 | |
fultonj | happy hacking everyone | 17:16 |
*** fultonj has quit IRC | 17:16 | |
openstackgerrit | Merged openstack/python-tripleoclient master: Dumping task key instead of tasks from validation_output https://review.opendev.org/c/openstack/python-tripleoclient/+/764937 | 17:18 |
weshay|ruck | mwhahaha, for a happier gate this should help https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/762400 | 17:19 |
*** rlandy is now known as rlandy|brb | 17:20 | |
openstackgerrit | Francesco Pantano proposed openstack/tripleo-heat-templates master: Remove /etc/ceph dependency and add tripleo_ceph_client role https://review.opendev.org/c/openstack/tripleo-heat-templates/+/763542 | 17:20 |
slagle | hjensas: check my comment on https://review.opendev.org/c/openstack/tripleo-ansible/+/761908, i figured out where the Exception was coming from | 17:20 |
*** jpich has quit IRC | 17:21 | |
slagle | code assumes "subnet" key is on the instance networks, but the examples in the spec do not show it. i think it might be needed though, so you know what subnet to use to provision the port | 17:22 |
hjensas | slagle: ah, right. It should only be required if there is more than one subnet. I will update the patch. | 17:25 |
hjensas | slagle: or, wdyt should we just require the key to be set? For spine-leaf/edge we need it to know where to create the port. But for L2 we can assume only one subnet and get it from the network. | 17:27 |
slagle | i think i'd prefer to not require it if it's not necessary | 17:29 |
hjensas | slagle: ok, we have reached agreement. The feedback is much appreciated! Thanks! | 17:30 |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-specs master: Introduce tripleo-ceph-ganesha spec https://review.opendev.org/c/openstack/tripleo-specs/+/759395 | 17:39 |
*** gfidente is now known as gfidente|afk | 17:41 | |
slagle | hjensas: i added the subnet key to my yaml for now, and I got all my ports provisioned. :) | 17:44 |
slagle | about to try the heat stack now | 17:44 |
*** rlandy|brb is now known as rlandy | 17:47 | |
*** beekneemech has quit IRC | 17:49 | |
*** jpena is now known as jpena|off | 17:59 | |
*** derekh has quit IRC | 18:04 | |
cmorey | :( ceph still is unhappy, i think it deployed, but won't pass.. http://paste.openstack.org/show/800749/ | 18:05 |
mwhahaha | you can specify the max count if you are deploying only a single node | 18:06 |
mwhahaha | we do it in ci i think | 18:06 |
cmorey | max count of? | 18:06 |
mwhahaha | CephPoolDefaultSize i think | 18:06 |
cmorey | mwhahaha, got that included (i hope) | 18:07 |
cmorey | PGs were not reported as active+clean\", \"It is possible that the cluster has less OSDs than the replica configuration\", \"Will refuse to continue\" is the error | 18:07 |
* mwhahaha shrugs | 18:07 | |
cmorey | https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/? might be missing things | 18:08 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Network ports module https://review.opendev.org/c/openstack/tripleo-ansible/+/761908 | 18:12 |
hjensas | slagle: ^^ should take care of the subnet key error? | 18:13 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key https://review.opendev.org/c/openstack/tripleo-ansible/+/762160 | 18:19 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module https://review.opendev.org/c/openstack/tripleo-ansible/+/764638 | 18:19 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output https://review.opendev.org/c/openstack/tripleo-ansible/+/764639 | 18:19 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports https://review.opendev.org/c/openstack/tripleo-ansible/+/764640 | 18:19 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support https://review.opendev.org/c/openstack/tripleo-ansible/+/762162 | 18:19 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook https://review.opendev.org/c/openstack/tripleo-ansible/+/765545 | 18:19 |
*** raildo has quit IRC | 18:21 | |
cmorey | mwhahaha, is there a way to render the output of all of the -e options? | 18:22 |
mwhahaha | No | 18:23 |
cmorey | :( | 18:23 |
mwhahaha | It might be in the stack itself | 18:23 |
mwhahaha | After you tried a deploy | 18:24 |
mwhahaha | But I don't have info off the top of my head | 18:24 |
cmorey | it looks like it ignored 'CephDefaultPoolSize: 1' | 18:24 |
*** bandini has quit IRC | 18:27 | |
cmorey | doh | 18:28 |
cmorey | because the tht has CephPoolDefaultSize | 18:29 |
cmorey | not CephDefaultPoolSize | 18:29 |
* cmorey bangs his head on the desk | 18:30 | |
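The mix-up above (`CephDefaultPoolSize` vs `CephPoolDefaultSize`) is what the "defined but not currently used" warning was pointing at. A minimal sketch of catching it before a deploy, assuming the environment file's `parameter_defaults` have already been loaded into a dict. The known-parameter list and the function name `suggest_parameter_fixes` are hypothetical, and the matching uses Python's standard `difflib.get_close_matches`, not anything TripleO itself does.

```python
import difflib

# Hypothetical list of parameter names the templates accept. In a real
# deployment this would come from the deployment plan, not a hard-coded list.
KNOWN_PARAMETERS = ["CephPoolDefaultSize", "CephPoolDefaultPgNum"]


def suggest_parameter_fixes(parameter_defaults, known=KNOWN_PARAMETERS):
    """Map each key the templates do not know to similar known names."""
    suggestions = {}
    for name in parameter_defaults:
        if name not in known:
            # Candidates are ranked by difflib's similarity ratio (cutoff 0.6),
            # so the closest match comes first.
            suggestions[name] = difflib.get_close_matches(name, known, n=3, cutoff=0.6)
    return suggestions


env = {"CephDefaultPoolSize": 1, "CephPoolDefaultPgNum": 32}
print(suggest_parameter_fixes(env))
```

Only the unknown key is reported, with `CephPoolDefaultSize` ranked first among the suggestions.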
*** karthiks has quit IRC | 18:35 | |
*** raildo has joined #tripleo | 18:43 | |
*** raildo has quit IRC | 18:45 | |
*** raildo has joined #tripleo | 18:48 | |
*** jbadiapa has quit IRC | 18:51 | |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Network ports module https://review.opendev.org/c/openstack/tripleo-ansible/+/761908 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key https://review.opendev.org/c/openstack/tripleo-ansible/+/762160 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module https://review.opendev.org/c/openstack/tripleo-ansible/+/764638 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output https://review.opendev.org/c/openstack/tripleo-ansible/+/764639 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports https://review.opendev.org/c/openstack/tripleo-ansible/+/764640 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support https://review.opendev.org/c/openstack/tripleo-ansible/+/762162 | 19:04 |
openstackgerrit | Harald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook https://review.opendev.org/c/openstack/tripleo-ansible/+/765545 | 19:04 |
hjensas | ETOOMANYPATCHESTOAPPLYINMYLAB : can we merge https://review.opendev.org/c/openstack/tripleo-ansible/+/763377 please? | 19:09 |
cmorey | mwhahaha, works now... | 19:12 |
mwhahaha | k | 19:12 |
hjensas | mwhahaha: tusen takk (many thanks)! | 19:13 |
*** jcapitao has quit IRC | 19:14 | |
slagle | hjensas: do I still define the fixed ip for the vips with the Heat parameters? | 19:14 |
slagle | hjensas: i'm thinking no, b/c then there's no port created for them | 19:14 |
cmorey | mwhahaha, thanks for your help, (and if you see fultonj, please pass it on) | 19:14 |
hjensas | slagle: yes, the VIP ports are still created by heat. I didn't get to those yet. It'll be easy for the network VIP, but we have service VIPs for ovn, redis etc which I'm not sure how we should manage. | 19:16 |
slagle | ok, that's fine. i figured i was missing something :) | 19:17 |
*** pcaruana has quit IRC | 19:22 | |
openstackgerrit | Adriano Petrich proposed openstack/tripleo-common master: Fix localization for horizon container https://review.opendev.org/c/openstack/tripleo-common/+/763753 | 19:22 |
*** cmorey has quit IRC | 19:23 | |
openstackgerrit | Harald Jensås proposed openstack/python-tripleoclient master: Add '--network-ports' option to node (un)provision https://review.opendev.org/c/openstack/python-tripleoclient/+/764810 | 19:36 |
*** jfrancoa has quit IRC | 20:03 | |
openstackgerrit | mbu proposed openstack/tripleo-validations master: Generate inventory without any overcloud https://review.opendev.org/c/openstack/tripleo-validations/+/764955 | 20:04 |
openstackgerrit | Francesco Pantano proposed openstack/tripleo-puppet-elements master: [WIP] - Include cephadm in the overcloud image https://review.opendev.org/c/openstack/tripleo-puppet-elements/+/765512 | 20:26 |
openstackgerrit | Merged openstack/tripleo-heat-templates stable/train: Set correct default NovaLibvirtCPUMode https://review.opendev.org/c/openstack/tripleo-heat-templates/+/764314 | 20:31 |
openstackgerrit | Merged openstack/tripleo-heat-templates master: Ensure cloud-init has finished before puppet run https://review.opendev.org/c/openstack/tripleo-heat-templates/+/764943 | 20:31 |
*** belmoreira has quit IRC | 20:46 | |
*** dciabrin_ has joined #tripleo | 20:50 | |
*** jamesdenton has quit IRC | 20:51 | |
*** jamesdenton has joined #tripleo | 20:51 | |
*** dciabrin has quit IRC | 20:52 | |
openstackgerrit | mbu proposed openstack/python-tripleoclient master: Move extra vars cli option to narg type https://review.opendev.org/c/openstack/python-tripleoclient/+/765137 | 20:59 |
*** rlandy has quit IRC | 21:04 | |
*** rfolco has quit IRC | 21:07 | |
*** dprince has quit IRC | 21:28 | |
*** TrevorV has quit IRC | 21:36 | |
*** ccamacho has quit IRC | 21:54 | |
*** raildo has quit IRC | 21:59 | |
*** tkajinam has joined #tripleo | 22:00 | |
*** gfidente|afk has quit IRC | 22:11 | |
*** tmazur has quit IRC | 22:16 | |
*** ekultails has left #tripleo | 22:29 | |
*** jamesdenton has quit IRC | 22:40 | |
*** jamesdenton has joined #tripleo | 22:40 | |
*** pleimer_ has quit IRC | 22:46 | |
*** rfolco has joined #tripleo | 23:01 | |
*** xek_ has quit IRC | 23:02 | |
*** supamatt has joined #tripleo | 23:05 | |
*** rfolco has quit IRC | 23:06 | |
*** cylopez has quit IRC | 23:18 | |
*** ccamacho has joined #tripleo | 23:22 | |
*** Goneri has quit IRC | 23:50 | |
*** tosky has quit IRC | 23:54 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!