opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 05:59 |
---|---|---|
f0o | ^ this is more of a WIP right now, I need to test it first. It was a shameless ripoff of https://github.com/openstack/openstack-ansible-os_keystone/commit/19af9dabc83fda4f2e14b2cc8ab87b14c50fdc2d - I feel like I need to configure sshd to use sshca first | 06:00 |
f0o | (also good morning!) | 06:00 |
noonedeadpunk | good morning :) | 07:03 |
noonedeadpunk | f0o: I think that the `openstack.osa.ssh_keypairs` does configure sshd | 07:04 |
noonedeadpunk | during ca installation | 07:04 |
f0o | Coolio | 07:05 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/roles/ssh_keypairs/tasks/standalone/install_ssh_ca.yml#L32-L67 | 07:05 |
f0o | wish I could test it but I'm currently at a coworking space because vattenfall is doing some maintenance in the neighborhood resulting in a loss of power | 07:05 |
f0o | but should be done by lunch so I can give it a spin then | 07:06 |
noonedeadpunk | well, we can also wait for CI results :) | 07:08 |
f0o | would the CI result even be conclusive here? even if the SSHCA fails to properly deploy, AIO will still pass.. Same as the issue with ssh-keygen being required before being installed, it worked because a previous service installed it | 07:09 |
noonedeadpunk | well CA is more fine-grained I guess as the key is specific to the swift user.... | 07:09 |
noonedeadpunk | but indeed in case of metal - it doesn't matter at all | 07:10 |
noonedeadpunk | unless there's some obvious issue making code to fail | 07:10 |
f0o | I meant more the conclusiveness/meaningfulness of the AIO CI in this regard. The SSHCA tasks might still pass (since they're 1:1 copypasta probably will pass) but they're not actually function-tested since the AIO wont use it | 07:13 |
f0o | either way, I'll toss it into our env once I'm back at home and got access | 07:13 |
f0o | I've been looking into why the br-storage network wasnt found and honestly dont understand it. The facts should have it. I see the containers have the veth on the bridge and a correct IP. The hosts absolutely have the br-storage since they require it for nova<>cinder... | 07:16 |
f0o | it's like ansible doesnt find the facts.. will have to compare the tasks with cinder or similar, maybe they diverged | 07:16 |
f0o | like maybe the facts arent called the same anymore | 07:16 |
noonedeadpunk | so I think it looks for the interface inside of the container | 07:33 |
noonedeadpunk | my guess would be that it should be `eth2`, not br-storage | 07:33 |
noonedeadpunk | as you won't have `br-storage` fact for container | 07:34 |
f0o | that's the funny part tho, it fails on the hosts and not inside a container.. swift-proxy container passes metal hx_y does not | 07:34 |
f0o | unless the failing task is not the actual task that's failing but the failure is somewhere higher up and obscure | 07:35 |
f0o | one thing I dont understand is the replacement of - to _, so br-storage becomes br_storage which is being looked at for ipv4.address facts | 07:36 |
f0o | https://github.com/openstack/openstack-ansible-os_swift/blob/master/tasks/swift_calculate_addresses.yml#L37 and line 63 does the lookup | 07:36 |
noonedeadpunk | f0o: so I'd guess we need to replace metal with LXC jobs at the very least for swift | 07:37 |
f0o | so if that has changed from 2 years ago, then its an obvious failure | 07:37 |
noonedeadpunk | or even create several copies of swift-proxy containers for aio? | 07:37 |
noonedeadpunk | as we can do that :) | 07:37 |
f0o | perhaps but swift-proxy hosts are not failing here | 07:38 |
f0o | it's the swift-hosts (storage hosts) that are failing with missing interfaces | 07:38 |
f0o | and those should be metal because the container has no access to the /dev/sdX | 07:38 |
f0o | or am I mistaken here? | 07:38 |
noonedeadpunk | well, I was about ssh part still | 07:38 |
f0o | aaaaah | 07:38 |
f0o | yes in the ssh regard yes | 07:38 |
f0o | I've got too many mental and browser tabs open, apologies for jumping around | 07:39 |
noonedeadpunk | and about br-storage - I think indeed we might just be missing facts | 07:39 |
f0o | it looks to me that i can freely set the IPs directly instead of defining the network interface and it will bypass that lot | 07:40 |
noonedeadpunk | ie - smth like that is missing for the playbook? https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/ceph_install.yml#L46-L58 | 07:40 |
f0o | swift_hosts>hx_y>swift_vars>repl_ip for instance seems to be accepted and replaces the swift>replication_network entry entirely | 07:41 |
f0o | gather_extra_facts is not defined anywhere in os_swift | 07:41 |
noonedeadpunk | it kind of is, but only hardware facts: https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/swift.yml#L20-L23 | 07:42 |
noonedeadpunk | as that is the default https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/all.yml#L139-L140 | 07:43 |
f0o | so then extending that to the network and ipv*_addresses like you linked in ceph should solve that? | 07:43 |
f0o | or does the fact structure itself also change and we need to touch that calculate_addresses tasks? | 07:44 |
f0o | I know too little of that structure unfortunately | 07:44 |
noonedeadpunk | um, not sure what you mean about fact structure, but it should be the same. So at least trying to add something like the ceph one to the swift playbook might be just enough | 07:48 |
f0o | call me odd but I always get suspicious when there are text transformations and then access to that transformed key | 07:55 |
f0o | like br-storage becoming br_storage | 07:56 |
f0o | feels like a source of problems as it could be a legacy thing that might have been changed in the 2 years since its introduction but forgotten to apply here | 07:56 |
f0o | because that replace screams workaround for me | 07:56 |
derekokeeffe85 | Morning all, noonedeadpunk anymore suggestions on what could be causing or how to fix my mariadb issue? | 07:57 |
darkhackernc | morning all, | 07:58 |
darkhackernc | noonedeadpunk, any thought on https://bugs.launchpad.net/openstack-ansible/+bug/2106625 and https://bugs.launchpad.net/openstack-ansible/+bug/2106715 | 07:58 |
f0o | 2106715 doesnt seem to be an ansible related issue tho, reads to me like an openstack cli issue | 08:00 |
f0o | so I think 2106715 should be in cinder or python-cinderclient projects | 08:04 |
jrosser | isnt that replace just an artefact of how ansible has to transform the interface names | 08:04 |
f0o | jrosser: possibly, I do see that plugins/ceph does similar replace for the monitor_interface | 08:05 |
f0o | it was just something that struck me at a first glance | 08:05 |
jrosser | the facts about interfaces always have _ iirc | 08:05 |
jrosser | regardless of what the name of the interface actually is on the system | 08:06 |
f0o | good to know! | 08:06 |
jrosser | the same goes for `:` i think | 08:08 |
jrosser | darkhackernc: 2106715 really is not about the deployment tool at all | 08:10 |
darkhackernc | jrosser++ yes, that is an cinder-cli operations part | 08:12 |
jrosser | cinder won't see that with the bug assigned to openstack-ansible though | 08:13 |
jrosser | derekokeeffe85: can you remind what your mariadb issue is? | 08:14 |
derekokeeffe85 | Will do jrosser, let me capture the error and do a paste of it for you | 08:15 |
derekokeeffe85 | fails on the setup_openstack playbook with this https://paste.openstack.org/show/bqyekQF7IOJP9rt3RQGG/ when I run mysql or mariadb on the utility container it's not there | 08:17 |
f0o | `Lost connection to MySQL server during query` reads like MTU issue to me; can you verify that all interfaces (galera container, haproxy, keystone container) use the same MTU? | 08:22 |
f0o | if my memory is correct, the traffic goes keystone_container -> host bridge -> haproxy interface/s -> host bridge -> galera_container | 08:22 |
f0o | so that's 5-6 interfaces that could have MTU mismatches | 08:22 |
f0o | only reason I mentioned this is because I had very similar intermittent issues when I set up all interfaces to jumbo and missed one | 08:23 |
jrosser | well and your switches could have mtu trouble too | 08:24 |
f0o | oh yeah those too good point | 08:25 |
jrosser | derekokeeffe85: that task is run from the utility container and it tries to connect to the db via the loadbalancer | 08:25 |
jrosser | there is a mysql/mariadb client installed for you already on the utility container and you could try some interaction with the db manually using that | 08:26 |
jrosser | if that doesnt work properly, the ansible wont either | 08:26 |
jrosser | you could also have ip routing issues with traffic accidentally exiting a large mtu interface when you didnt intend it to | 08:27 |
derekokeeffe85 | when I run mysql or mariadb on the utility container it's not installed. I did have MTU and UFW issues at the start but I'm fairly sure they're resolved now. I re ran the playbooks after I sorted that issue and both completed fully but still this issue. I set the IPs of all the nodes to their br-ext IP in openstack_user_config rather than the container network, would that be playing a part in it? | 08:31 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 08:43 |
derekokeeffe85 | Sorry I never tagged you in the reply jrosser but you can see it above | 08:45 |
jrosser | derekokeeffe85: you should have a /usr/bin/mariadb binary on the utility host | 08:49 |
jrosser | and a ~/.my.cnf on the same host with the db connection details | 08:49 |
derekokeeffe85 | Yep I have this https://paste.openstack.org/show/bBP0fnh9HOmTMSaYcz6f/ jrosser | 08:52 |
derekokeeffe85 | The IP address is that of the controller node br-mgmt | 08:53 |
jrosser | so you should be able to use the mariadb client on the utility container to connect to the db i think | 08:53 |
derekokeeffe85 | Ok I can try that. Do you have an example off the top of your head by any chance? | 08:54 |
jrosser | at the very simplest you can try `show databases;` | 08:56 |
derekokeeffe85 | Says mariadb and mysql services are not found :( | 08:56 |
derekokeeffe85 | Not sure if this helps but I can't telnet to the utility container on 3306 from the controller so maybe it couldn't install it?? Trying 172.29.239.232... | 08:58 |
derekokeeffe85 | telnet: Unable to connect to remote host: Connection refused | 08:58 |
jrosser | i'm confused | 08:58 |
jrosser | why to the utility container on 3306? | 08:59 |
jrosser | the database runs in the galera containers, and the connection to the database goes via the haproxy loadbalancer | 08:59 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 08:59 |
jrosser | so in your my.cnf you have 172.29.236.6 as the address for the database, which should be your internal vip? | 08:59 |
f0o | jrosser: how can I restart the CI? will it happen automatically on the new commit? | 09:00 |
jrosser | f0o: yes a new revision of the patch will cancel the previous job and start a new one straight away | 09:01 |
f0o | coolio! so nothing to do :) | 09:01 |
f0o | the copypaste errors you mentioned are also in keystone btw heh | 09:01 |
jrosser | oh no :( | 09:01 |
f0o | I'll make a patch for it | 09:01 |
f0o | just need to walk the doggo first | 09:02 |
jrosser | excellent, that's super helpful, we are a bit short on reviewers currently so that makes it easier | 09:02 |
derekokeeffe85 | jrosser internal_lb_vip_address: 172.29.236.6. That's the IP of my br-mgmt on the controller node. Is that correct? | 09:07 |
jrosser | i think you only have one controller? is that right? | 09:09 |
noonedeadpunk | iirc we left at stage, that haproxy marks mariadb backend as down, right? | 09:17 |
noonedeadpunk | darkhackernc: can you remind me the paste you sent yesterday from haproxy stat? | 09:25 |
noonedeadpunk | sorry, derekokeeffe85 ^ | 09:25 |
noonedeadpunk | miss-pinged | 09:25 |
noonedeadpunk | darkhackernc: frankly, I have never seen anything like you're describing in https://bugs.launchpad.net/openstack-ansible/+bug/2106625 | 09:26 |
noonedeadpunk | don't you accidentally have `package_state: latest` or `nova_package_state: latest` defined somewhere together with running CentOS/Rocky linux? | 09:27 |
noonedeadpunk | nah, you run ubuntu | 09:27 |
noonedeadpunk | derekokeeffe85: what's the output of `curl http://$(lxc-ls -1 | grep galera):9200`? | 09:49 |
noonedeadpunk | or it's getting stuck? | 09:49 |
noonedeadpunk | if it's getting stuck - I'd suggest checking from which IP request is coming from the control plane to galera | 09:50 |
noonedeadpunk | as you've changed the VIP - you might need to re-run the galera role, if you didn't do that yet | 09:50 |
noonedeadpunk | ok, forget that ^ | 09:52 |
noonedeadpunk | but basically you'd need to check, that haproxy goes to the galera from one of the allowed IP addresses | 09:54 |
noonedeadpunk | you can check currently allowed ones as `systemctl cat mariadbcheck.socket` from galera container | 09:54 |
noonedeadpunk | or, set `galera_monitoring_allowed_source: 172.29.236.0/22` and `galera_server_proxy_protocol_networks: 172.29.236.0/22` | 09:58 |
noonedeadpunk | derekokeeffe85: ^ | 09:58 |
derekokeeffe85 | Sorry jrosser and noonedeadpunk I was called away. Yep I only have one controller. I'll do the other suggestions now | 10:10 |
f0o | jrosser: it doesnt seem like CI was restarted for my change 946865 | 10:33 |
jrosser | f0o: https://zuul.opendev.org/t/openstack/status?change=946865 | 10:34 |
jrosser | you can see that the lint has already failed so thats probably something to look at immediately | 10:35 |
f0o | annoying that it complains about files that werent touched in the change | 10:36 |
f0o | so this is going to feature-creep | 10:36 |
derekokeeffe85 | jrosser noonedeadpunk thank you so much!!!!! :) It passed that task. I added `galera_monitoring_allowed_source: 172.29.236.0/22` and `galera_server_proxy_protocol_networks: 172.29.236.0/22` to user_variables and re-ran the playbooks. setup_openstack is still running now | 10:37 |
derekokeeffe85 | https://paste.openstack.org/show/bRNE6hdoDcJgxMg7C2Bm/ | 10:38 |
jrosser | f0o: you can always make a separate patch for fixing linters | 10:38 |
jrosser | it doesnt have to be in the same change, ideally it wouldnt be | 10:38 |
f0o | I'm not 100% sure how to fix 'no-handler: Tasks that run when changed should likely be handlers. (warning)' | 10:40 |
f0o | tasks/swift_pypy_setup.yml:38 Task/Handler: Setup local pypy | 10:41 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 10:41 |
f0o | but I think I've addressed all other linting issues | 10:41 |
jrosser | i kind of have no odea | 10:42 |
jrosser | idea | 10:42 |
jrosser | ultimately the swift role needs a maintainer really | 10:43 |
f0o | hehe | 10:43 |
jrosser | i.e someone who is wanting to use swift and understand it sufficiently to keep the ansible relevant | 10:43 |
f0o | I'm a bit on the fence whether I should go swift or go ceph... for us it really depends what the resource steal/overhead of ceph is. We would like to run the object storage on the same hosts as compute does just to use the local disks for cheap object storage. Cinder volumes is all NFS appliances so we cant reuse that gear for ceph and have no requirement for ceph to do blockstorage | 10:45 |
f0o | I remember from way back (Mitaka times) swift was super low overhead - I have no experience with Ceph other than knowing that Proxmox uses it as HCI storage | 10:45 |
noonedeadpunk | f0o: warning is not why it's failing | 10:54 |
noonedeadpunk | as warnings are not treated as errors there | 10:54 |
f0o | oh ok then I just fixed a few pipefail warnings for nothing | 10:54 |
f0o | :D | 10:54 |
f0o | but I did also fix the indents and other things it complained about | 10:54 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 10:56 |
noonedeadpunk | there's a stack trace on loading yaml.... | 10:56 |
f0o | yeah I fat fingered the last patchset | 10:57 |
noonedeadpunk | ah, indent here on L26 | 10:57 |
f0o | yep | 10:57 |
noonedeadpunk | ok, sorry :D | 10:57 |
f0o | slowly but surely I'm getting used to gerrit and zuul navigation :D | 10:58 |
f0o | linting passes now - at least one less thing to worry about. Vattenfall on the other hand... they seem to have left for lunch and while the apartments have power, the building does not so by extension the internet is still down | 11:09 |
f0o | Vattenfall just flipped the power on for the building and I have internet again - Great Success! | 11:17 |
f0o | now I can checkout the os_swift role locally and toss it against the env | 11:17 |
f0o | :) | 11:17 |
f0o | I'm just going to assume they're done for today and hopefully wont get a powercut mid deploy | 11:18 |
noonedeadpunk | f0o: you're not that far from loving it compared to github :D | 11:36 |
f0o | haha I still feel very much at home with GH after 20 odd years of using it | 11:37 |
f0o | SSH-CA patchset works | 11:38 |
f0o | let me push the patch for all_addresses facts in ansible-plugins | 11:39 |
noonedeadpunk | well, depends-on and series of patches are really nice things. And that fork->PR->clean-up flow is smth I do hate now... | 11:39 |
f0o | heh just skip the cleanup xD | 11:40 |
noonedeadpunk | or when you see a stale PR that's owned by someone else, which you can't do anything with, except open a new one... | 11:40 |
f0o | they changed that tho, you can now push into other's PRs if you're a maintainer | 11:40 |
noonedeadpunk | yeah. if you own the project - it's simpler. | 11:41 |
noonedeadpunk | sure | 11:41 |
noonedeadpunk | but if you need to contribute to smth you don't own - it's painful imo. Especially painful smth like helm chart couple of years ago | 11:41 |
noonedeadpunk | where each feature needs to increment a version number inside of the PR, resulting in constant conflict between everything | 11:42 |
noonedeadpunk | anyway:) | 11:42 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-plugins master: Add all_addresses facts to os_swift playbook https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/946891 | 11:43 |
f0o | oh yeah those things are pain, you can use GH Actions now but its also a pain | 11:43 |
noonedeadpunk | `stderr: '/bin/sh: 1: set: Illegal option -o pipefail'` | 11:44 |
noonedeadpunk | https://zuul.opendev.org/t/openstack/build/fbd95331f81b4d839c5dd12ea5e70990/log/job-output.txt#11689-11703 | 11:44 |
f0o | one step forward... two steps back... | 11:45 |
f0o | :| | 11:45 |
f0o | but I guess the pipefail linting were all warnings so I can just revert it | 11:45 |
noonedeadpunk | yeah | 11:45 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/.ansible-lint#L9 | 11:45 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 11:46 |
f0o | os_swift seems to require crontab; what's the policy here? install cron or try to migrate the recon job to systemd timers? | 11:49 |
noonedeadpunk | systemd timers.... | 11:50 |
f0o | ezpz I'll staple that onto my current change | 11:51 |
noonedeadpunk | I think this is a good example of it: https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/tasks/main.yml#L286-L303 | 11:51 |
f0o | oh awesome | 11:52 |
noonedeadpunk | it's really better to make series of smaller changes | 11:52 |
noonedeadpunk | unless their merge is blocked | 11:52 |
f0o | fair | 11:52 |
f0o | they all sort of depend on eachother | 11:52 |
noonedeadpunk | so you can make just 2 commits in the same branch and `git review` them independently | 11:52 |
f0o | also the sshca does not work.. just got asked for a password | 11:52 |
noonedeadpunk | as gerrit distinguishes patches by their `Change-Id` | 11:53 |
noonedeadpunk | so any commit with the same Change-Id will land on the existing change, as a new patchset | 11:54 |
inakoti | Hi. Quick question: The ansible roles under openstack-ansible have no stable 2025.1 branch(Epoxy) yet. Will it be coming later? Thanks in advance | 11:56 |
noonedeadpunk | NeilHanlon: so... things went relatively well... Until faced cinder: https://zuul.opendev.org/t/openstack/build/d85f551fd67f44a68dc6aeef0a918923 | 11:57 |
jrosser | inakoti: deployment tooling projects always release later than the service projects in openstack, its basically chicken/egg otherwise | 11:57 |
noonedeadpunk | inakoti: yes, sure it will come later. OSA has a trailing release model, meaning we have 2 months after the coordinated release to adopt and release | 11:57 |
jrosser | you can see the official release schedule here https://files.openstack.org/project/releases.openstack.org/flamingo/schedule.html | 11:58 |
noonedeadpunk | Jun 02 is the deadline for us | 11:58 |
jrosser | so epoxy for deployment projects gets released ~10 weeks into the Flamingo dev cycle for the rest of openstack | 11:59 |
noonedeadpunk | NeilHanlon: so I was kinda wondering about `libzstd(x86-64) >= 1.5.5` part.... | 11:59 |
noonedeadpunk | NeilHanlon: or you think it's worth to get python3-zstd from epel instead? | 11:59 |
noonedeadpunk | just what your suggestion would be on solving the conflict? | 12:00 |
noonedeadpunk | inakoti: but hopefully we get a beta release next week | 12:00 |
jrosser | soo much to merge though :( | 12:01 |
noonedeadpunk | I'd say for beta we can have just https://review.opendev.org/c/openstack/openstack-ansible/+/946083 ? | 12:01 |
jrosser | you mean make the branch? | 12:01 |
noonedeadpunk | it will hopefully pass in next 30m | 12:01 |
noonedeadpunk | nah | 12:01 |
noonedeadpunk | branch is made with RC | 12:01 |
noonedeadpunk | not beta | 12:01 |
jrosser | yeah ok sure | 12:02 |
jrosser | f0o: you should be able to compare the ssh setup that got made with your swift changes to what you have for keystone/nova | 12:03 |
noonedeadpunk | beta is pretty much a milestone (which back in the days when evrardjp was PTL) that potentially should be even before coordinated release... | 12:03 |
f0o | jrosser: the issue is that the synchronize module used to distribute the rings uses the root user while the sshca was set up for the swift user - I'm going to use the rsync command from the fernet keys distribution - funny enough the comment states that this should be moved to synchronize module. But synchronize does not support setting a user it seems | 12:05 |
jrosser | hmm synchronise is tricky | 12:06 |
noonedeadpunk | can't you `become_user` for it? not sure, but it might work if `swift` has a shell | 12:06 |
noonedeadpunk | but yeah, it can be super tricky indeed | 12:07 |
jrosser | it might be better to remove use of it totally | 12:08 |
inakoti | Thanks noonedeadpunk and jrosser. Will 2024.2 stable be compatible with 2025.1 openstack service projects? (We have a deliverable where we intend to upgrade standalone baremetal service to Epoxy by end of May) | 12:08 |
jrosser | inakoti: possibly - sometimes you can run services from future versions, but you would need to test quite significantly and take steps to ensure that the future versions of services used the future set of upper-constraints | 12:10 |
noonedeadpunk | yes, most likely. But then I guess I'd suggest to try out beta... | 12:10 |
noonedeadpunk | and report back what is broken :D | 12:10 |
inakoti | For sure :) | 12:11 |
inakoti | Thanks again guys for the quick info | 12:11 |
jrosser | do you mean ironic when you say baremetal service? | 12:12 |
inakoti | yes | 12:13 |
noonedeadpunk | I'd hope by mid May to release an RC already TBH | 12:14 |
f0o | noonedeadpunk: top of your head, how can I manually verify that sshca works? | 12:14 |
f0o | should I just be able to ssh from a swift-user into a different host? | 12:14 |
noonedeadpunk | only to one, which has swift as allowed principal | 12:15 |
noonedeadpunk | and from one which has a private key issued by the ca | 12:15 |
noonedeadpunk | or smth like that.... | 12:15 |
noonedeadpunk | so generally it should be swift <-> swift ssh | 12:16 |
f0o | then sshca does not work | 12:16 |
f0o | yeah /etc/ssh/auth_principals/ is missing the swift_principals on one of my swift storage hosts | 12:17 |
f0o | it does exist in the swift-proxy lxc container | 12:17 |
f0o | wonder what went wrong there | 12:17 |
f0o | https://paste.opendev.org/show/bT7aq9SzKsaHFWk76Xgj/ << I think that's the issue | 12:25 |
f0o | why would the trusted_ca be missing on the lxc_containers... | 12:25 |
jrosser | you could re-run the swift playbook with `--tags swift-key` to check exactly what happens for that part | 12:30 |
noonedeadpunk | yeah, it's hard to guess without some output | 12:31 |
f0o | So it places OpenStack-Ansible-SSH-Signing-Key into /etc/ssh/trusted_ca.d | 12:31 |
f0o | is the config supposed to reference that instead of /etc/ssh/trusted_ca ? | 12:33 |
noonedeadpunk | there should be a handler: https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/roles/ssh_keypairs/handlers/main.yml#L16-L20 | 12:34 |
noonedeadpunk | which combines all /etc/ssh/trusted_ca.d/ to /etc/ssh/trusted_ca | 12:34 |
f0o | 'Regenerate trusted_ca file' is not being executed | 12:35 |
noonedeadpunk | so it should be called here https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/roles/ssh_keypairs/tasks/standalone/install_ssh_ca.yml#L32-L42 | 12:35 |
f0o | that is being executed but then it goes immediately to 'Remove sshd trusted authorities for absent CA' | 12:36 |
noonedeadpunk | yeah, handler is executed closer to the end of the play | 12:37 |
f0o | which ironically also calls the regenerate but that is not being executed anywhere in the logs | 12:37 |
f0o | it ends with: TASK [openstack.osa.ssh_keypairs : Copy ssh keys to target] ******************** | 12:37 |
f0o | then comes osa.mq_setup | 12:37 |
f0o | h1_2-swift-proxy-container-a9384c07 : ok=24 changed=0 unreachable=0 failed=0 skipped=24 rescued=0 ignored=0 | 12:37 |
f0o | is there like a wgetpaste somewhere? | 12:37 |
noonedeadpunk | https://paste.openstack.org/ ? or? | 12:38 |
f0o | https://paste.opendev.org/show/bM4hdSYDdufIJV4g8CJx/ | 12:40 |
f0o | only had to scp it around a billion times heh | 12:40 |
noonedeadpunk | so handler only executes if task is `changed` | 12:41 |
noonedeadpunk | and it's not if all content already there | 12:41 |
f0o | so if I delete the file and rexec it, it should generate it? | 12:42 |
noonedeadpunk | there can be a corner case where things were placed, but the host failed and handler not executed as a result of failure | 12:42 |
noonedeadpunk | yeah, if you delete and re-exec it should generate it | 12:42 |
f0o | let's see, deleted | 12:42 |
f0o | you're right it did it | 12:43 |
f0o | wouldnt it be safer to always regenerate it? | 12:43 |
f0o | it seems like a nobrainer operation | 12:43 |
opendevreview | Merged openstack/ansible-role-frrouting master: Remove become blocks from tasks https://review.opendev.org/c/openstack/ansible-role-frrouting/+/946115 | 12:44 |
opendevreview | Merged openstack/ansible-role-frrouting master: Use FQCN for module calls https://review.opendev.org/c/openstack/ansible-role-frrouting/+/938273 | 12:44 |
opendevreview | Merged openstack/ansible-role-frrouting master: Use OSA_TEST_REQUIREMENTS_FILE for molecule job https://review.opendev.org/c/openstack/ansible-role-frrouting/+/939300 | 12:44 |
noonedeadpunk | well... I'd rather did a variable flag to force paste | 12:44 |
noonedeadpunk | like clean-up all auth providers before placing new ones | 12:44 |
f0o | I have rings! | 12:45 |
noonedeadpunk | awesome! | 12:45 |
f0o | yes almost there :D | 12:46 |
f0o | the systemd unit expects the rings to be elsewhere, this might be a distro specific thing | 12:46 |
noonedeadpunk | so, https://review.opendev.org/c/openstack/openstack-ansible/+/946083 has just passed CI, so let's land it and a follow-up change right away | 12:46 |
noonedeadpunk | And I'll propose beta with that :) | 12:47 |
noonedeadpunk | mgariepy: damiandabrowski ^ | 12:47 |
f0o | now that I know that sshca works, I'll switch back to synchronize and give the become:swift a try | 12:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-systemd_service master: Remove quotes from conditional statements https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/941946 | 12:51 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add openstack_user_config verification playbook as healthcheck https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/938980 | 12:52 |
f0o | noonedeadpunk: synchronize does not seem to care about become:true;become_user:swift | 13:02 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 13:07 |
jrosser | f0o: how many files are there in the swift rings definition? | 13:11 |
f0o | {account,container,object}.builder {account,container,object}.ring.gz and a few object-N.ring.gz | 13:12 |
f0o | I guess the N is the zones. at least it matches to the 3 zones I got | 13:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Ensure that failures are fatal for upgrade_check https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/946234 | 13:15 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Fix quorum/stream queues if they're below minimal size https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/946268 | 13:15 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Execute rabbitmq post_upgrade hook https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/946270 | 13:15 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 13:26 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 13:33 |
*** noonedeadpunk_ is now known as noonedeadpunk | 13:35 | |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-os_swift master: Migrate role to use SSH CA https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/946865 | 13:39 |
f0o | tried to be smart to merge two tasks into one. failed miserably. so it stays at two | 13:39 |
jrosser | dont be afraid to make several small commits, they can be easier to review | 13:41 |
f0o | these do all tie into the sshca thing tho, maybe the timers could be split out | 13:41 |
jrosser | if you make several commits locally all on top of each other and then do `git review` at the top one, they will get submitted as a series with the concept that they stack on each other being preserved | 13:42 |
jrosser | yes its fine, just giving pointers for the future :) | 13:42 |
f0o | oooh | 13:42 |
f0o | good to know | 13:42 |
jrosser | its particularly important if you find a bug and want it backported to a stable branch | 13:42 |
f0o | I'm very new to gerrit | 13:42 |
jrosser | so this is a thing that you basically can't do with github workflow | 13:43 |
f0o | that is true | 13:43 |
jrosser | if you look at this one https://review.opendev.org/c/openstack/openstack-ansible/+/946043 | 13:43 |
jrosser | see the box "relation chain", thats got 4 patches stacked up on each other in order | 13:44 |
jrosser | those are all patches in the same repo | 13:44 |
jrosser | and then in the commit message there are two "Depends-On" lines | 13:44 |
jrosser | those say that "this other patch in another repo must be applied when testing this one" | 13:45 |
jrosser | and by extension, "this other patch in another repo must merge before this one" | 13:45 |
opendevreview | Merged openstack/openstack-ansible-os_blazar master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-os_blazar/+/941325 | 13:47 |
f0o | I cannot get the regeneration of trusted_ca to trigger reliably | 13:48 |
*** frickler_ is now known as frickler | 13:48 | |
f0o | I got 3 hosts that wont get it, even if I remove the trusted_ca.d/* like I did before | 13:48 |
f0o | do I need to remove the trusted_ca.d/* from all hosts in the group for the handler to retrigger correctly? | 13:49 |
jrosser | for a handler, you need to find the condition that triggers it | 13:49 |
jrosser | and when that task is "changed" the handler will run | 13:49 |
f0o | https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/roles/ssh_keypairs/tasks/standalone/install_ssh_ca.yml#L32-L42 | 13:49 |
f0o | that is run, the file is placed into trusted_ca.d but no regeneration is triggered | 13:50 |
f0o | nvmd. It was triggered at the very end of the playbook - after everything had already failed since it can't copy the rings | 13:50 |
f0o | can I somehow force it to run the handler a bit earlier? | 13:50 |
opendevreview | Merged openstack/openstack-ansible-os_neutron master: Remove lxb driver support from the role https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/946145 | 13:52 |
noonedeadpunk | not really afaik | 13:52 |
f0o | how do I solve this chicken-egg issue then? | 13:53 |
f0o | I can't copy the rings because the trust isn't set up until after the ring copy has already been attempted | 13:53 |
jrosser | handlers always run at the end of the play | 13:55 |
jrosser | i think that there is a similar situation in keystone perhaps | 13:55 |
noonedeadpunk | f0o: you can flush_handlers | 13:55 |
noonedeadpunk | https://docs.ansible.com/ansible/latest/collections/ansible/builtin/meta_module.html | 13:56 |
opendevreview | Merged openstack/openstack-ansible-os_barbican master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/941322 | 13:56 |
noonedeadpunk | maybe it makes sense to add this to the end of ssh_keypairs tasks/main.yml | 13:56 |
noonedeadpunk | but it can actually trigger unexpected things as well | 13:57 |
jrosser | i would worry about flush_handlers | 13:57 |
jrosser | keystone is https://github.com/openstack/openstack-ansible-os_keystone/blob/master/tasks/main_pre.yml | 13:57 |
jrosser | and https://github.com/openstack/openstack-ansible-plugins/blob/master/playbooks/keystone.yml#L53-L61 | 13:57 |
f0o | so keystone solved it by running the sshkeygen in a different playbook before the actual installation | 13:59 |
f0o | I mean I can do the same, swift doesnt have that but I can make a patch for it | 14:00 |
f0o | just seems like this is adding more and more creep | 14:00 |
f0o | soon I touched every bit of it :D | 14:00 |
noonedeadpunk | hehe | 14:00 |
noonedeadpunk | and you said you're not that good in ansible :D | 14:00 |
f0o | I'm not I'm just smashing buttons and copypasta and it somehow works :D | 14:01 |
jrosser | i think that whats happening is that we've come across similar problems in the other roles, and there are good patterns you can lift | 14:01 |
opendevreview | Merged openstack/openstack-ansible-os_aodh master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/941320 | 14:02 |
jrosser | its just very hard to keep on top of all the roles, particularly if you're not using them personally | 14:02 |
opendevreview | Merged openstack/openstack-ansible-memcached_server master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-memcached_server/+/941504 | 14:12 |
noonedeadpunk | omfg, I've started looking at image decompressing code.... | 14:13 |
noonedeadpunk | quite some complexity has built up in image upload, I'd say | 14:14 |
jrosser | this is in glance itself? | 14:15 |
noonedeadpunk | and specifically around checksums.... | 14:15 |
noonedeadpunk | nah, it's all possible different scenarios | 14:15 |
noonedeadpunk | what if a path supplied, what if it's url, what if there's a checksum | 14:15 |
noonedeadpunk | and not with compression - there's an archive checksum and decompressed image checksum | 14:16 |
noonedeadpunk | *now with | 14:16 |
noonedeadpunk | and then there's gz, xz, etc... | 14:16 |
noonedeadpunk | and then what to do if archive checksum does match, but decompressed image checksum does not match with what is supplied | 14:21 |
opendevreview | Merged openstack/openstack-ansible-os_horizon master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/941445 | 14:24 |
noonedeadpunk | jrosser: do you remember if glance verifies the checksum provided by the user against the image it receives? I think yes, right? | 14:44 |
jrosser | i think so? | 14:45 |
jrosser | and doesnt that also get used perhaps by nova or something later if the image is moved to a compute | 14:45 |
* jrosser not sure | 14:45 | |
noonedeadpunk | yeah, but glance calculates checksum if it's not provided I think... not sure | 14:46 |
noonedeadpunk | but just decided to double-check | 14:46 |
noonedeadpunk | as a bit /o\ | 14:46 |
jrosser | but then there is also image signing stuff thats more complicated again | 14:47 |
noonedeadpunk | yeah, that requires barbican iirc | 14:49 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add ability to decompress images for upload https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/946918 | 14:55 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Define parameters to decompress magnum image before upload https://review.opendev.org/c/openstack/openstack-ansible/+/946919 | 15:04 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_magnum master: Use libxslt1-dev package instead of unversioned one https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/946560 | 15:04 |
noonedeadpunk | I guess it will break terribly... | 15:04 |
opendevreview | Merged openstack/openstack-ansible master: Freeze roles for 31.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/946083 | 15:21 |
noonedeadpunk | release proposed: https://review.opendev.org/c/openstack/releases/+/946937 | 15:54 |
noonedeadpunk | hm.... .wtf https://zuul.opendev.org/t/openstack/build/cac47b4a97ce4ea18c2cc899f2a3b088/log/job-output.txt#11694-11729 | 17:41 |
noonedeadpunk | ah | 17:42 |
noonedeadpunk | fair enough | 17:42 |
noonedeadpunk | no, not at all.... | 17:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add ability to decompress images for upload https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/946918 | 17:57 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add ability to decompress images for upload https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/946918 | 17:57 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add ability to decompress images for upload https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/946918 | 18:00 |
opendevreview | Ivan Anfimov proposed openstack/openstack-ansible master: docs: forgot dot in releases info https://review.opendev.org/c/openstack/openstack-ansible/+/946961 | 21:24 |
opendevreview | Ivan Anfimov proposed openstack/openstack-ansible master: docs: forgot dot in releases info https://review.opendev.org/c/openstack/openstack-ansible/+/946961 | 21:26 |
opendevreview | Merged openstack/openstack-ansible-os_neutron master: Auto-fix usage of modules via FQCN https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/941378 | 22:21 |
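
The handler-ordering problem discussed around 13:50-13:57 (handlers run at the end of the play, so the trusted CA file is regenerated only after the ring copy has already failed) can be illustrated with a minimal Ansible sketch. This is not the actual os_swift or ssh_keypairs code: the host group, variable name, file paths, task names, and the handler body are all hypothetical stand-ins. Only `ansible.builtin.meta: flush_handlers` is the mechanism mentioned in the channel.

```yaml
# Hypothetical play: every name and path below is illustrative only.
- name: Sketch of flushing handlers before a dependent task
  hosts: swift_all            # assumed group name
  tasks:
    - name: Place a CA public key (stand-in for the ssh_keypairs role task)
      ansible.builtin.copy:
        content: "{{ example_ca_public_key }}"   # hypothetical variable
        dest: /etc/ssh/trusted_ca.d/example.pub  # hypothetical path
        mode: "0644"
      notify: Regenerate trusted_ca

    # Without this task the handler would only run at the end of the play.
    - name: Run all notified handlers now
      ansible.builtin.meta: flush_handlers

    - name: Task that needs the SSH trust to exist (placeholder for the ring copy)
      ansible.builtin.command: /bin/true
      changed_when: false

  handlers:
    # Hypothetical handler body; the real role may regenerate the file differently.
    - name: Regenerate trusted_ca
      ansible.builtin.assemble:
        src: /etc/ssh/trusted_ca.d
        dest: /etc/ssh/trusted_ca
        mode: "0644"
```

Note that `flush_handlers` runs every handler notified so far, not just the one of interest, which is the concern raised in the channel. The keystone approach linked above avoids this by running the key setup in a separate, earlier play.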