Wednesday, 2023-06-21

04:08 *** ultra is now known as Guest3688
08:46 <opendevreview> Damian Dąbrowski proposed openstack/openstack-ansible master: Add 'tls-transition' scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/885194
08:46 <opendevreview> Damian Dąbrowski proposed openstack/openstack-ansible stable/zed: Add support for 'tls-transition' scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/885196
08:48 <kleini> I am currently upgrading to Yoga. setup-infrastructure did not fail in staging, but it failed in production: "Host controller2-repo-container-lalala is not in 'Peer in Cluster' state". I will read through the GlusterFS setup guide now, but maybe you have some faster hints.
08:48 <opendevreview> Damian Dąbrowski proposed openstack/openstack-ansible master: Enable TLS on haproxy VIPs and backends by default  https://review.opendev.org/c/openstack/openstack-ansible/+/885192
08:58 <kleini> Stumbled over https://bugzilla.redhat.com/show_bug.cgi?id=1051992 - restarting glusterd resolved peers stuck in the "Accepted peer request" status
09:01 <jrosser> i wonder if it is a race somehow
09:02 <jrosser> like if we bring the gluster peers up all at once then it might end up in a strange state
09:05 <kleini> Now I have controller2 and controller3 listed as peers in controller2-repo-container-something and controller3-repo-container-something. Of course, "disconnected".
09:05 <kleini> before, 2- and 3-repo-container had just 1-repo-container as their single peer
09:08 <kleini> no, 2-repo-container has just 3-repo-container as its single peer and vice versa.
09:09 <kleini> sorry, my first time getting in touch with glusterfs
09:24 <kleini> finally solved. I had two issues: 1. glusterd required a restart on all nodes to get the transition from "Accepted peer request" to "Peer in Cluster"; 2. nodes 2 and 3 had each other as peers, while 1 had 2 and 3 as peers. I had to add 1 as a peer on node 2, which automatically added node 1 as a peer on node 3.
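
A minimal sketch of the recovery steps kleini describes, using the standard GlusterFS CLI (the container hostname is a placeholder; the commands are run inside the repo containers):

    # A healthy peer shows "State: Peer in Cluster (Connected)"; peers stuck
    # in "Accepted peer request" indicate an incomplete handshake.
    gluster peer status

    # Restarting glusterd on the affected nodes usually lets the stuck
    # handshake complete.
    systemctl restart glusterd

    # Re-probe the missing node from a node that is already in the cluster,
    # so all three repo containers end up knowing each other.
    gluster peer probe controller1-repo-container-XXXX
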
09:27 <jrosser> interesting
09:28 <jrosser> iirc the `infra` CI jobs start 3 repo containers to check this
09:28 <noonedeadpunk> yup, it does
09:29 <noonedeadpunk> but we don't check things like idempotency there, for example
09:29 <noonedeadpunk> and on top of that, I guess the environment wasn't brand new?
09:30 <kleini> maybe my second issue is caused by my first one.
09:31 <kleini> the environment is old, initially deployed with S or T. the second and third controller nodes were added later.
09:41 <kleini> as my initial problem with peers in wrong states still does not really seem to be resolved, is there anything that could help to avoid such issues?
09:41 <kleini> anything in my logs now that could help to avoid it?
09:45 <noonedeadpunk> To be frank - I'm not a huge expert in gluster. We mount cephfs instead. You can actually use any shared FS instead of gluster, even things like s3fs or nfs, as in the end the mount is configured using systemd-mount, and the gluster installation can be disabled with a variable
09:50 <jrosser> yes, gluster is not an absolute requirement at all, it's just that something is needed to provide a shared filesystem
09:51 <jrosser> you can disable it and provide something else here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/repo_all.yml#L25-L32
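
Since the mount is driven by systemd-mount, any shared filesystem works. As a rough illustration only (the NFS server, export and the /var/www/repo target path are assumptions; the linked repo_all.yml is the authoritative source for the real variables and path), an NFS-backed repo mount would look something like:

    # mount an NFS export at the repo path on each repo container (sketch)
    systemd-mount --type=nfs4 --options=rw,noatime nfs.example.com:/srv/osa-repo /var/www/repo
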
09:52 <jrosser> having said that, really it should work though
09:55 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump ansible-core to 2.15.1 and collections  https://review.opendev.org/c/openstack/openstack-ansible/+/886527
09:56 <noonedeadpunk> yeah, it should for sure
10:00 <jrosser> maybe as simple as needing to run that in serial, but i don't really know
10:00 <opendevreview> Damian Dąbrowski proposed openstack/openstack-ansible master: Remove haproxy_accept_both_protocols from repo_all  https://review.opendev.org/c/openstack/openstack-ansible/+/886586
10:05 <kleini> will think about migrating that to cephfs, but then the repo containers need an additional connection to the Ceph storage network.
10:06 <jrosser> glusterfs has been reliable for us
10:10 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Remove Ubuntu 20.04 support  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/886595
10:11 <kleini> I expect it to be reliable too, at least according to what I've heard. I just had issues now in production with setting it up initially.
10:14 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Cleanup old OS support  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/886597
10:33 <noonedeadpunk> switching to ansible-core 2.15 won't be trivial... Partially because I did an unsupported thing with the loop label lately :(
10:57 <jrosser> this was to prevent secret output (key) being in the ansible log?
11:04 <noonedeadpunk> well, more to suppress output and be clearer about what's passed to the module instead of just what we're looping over
11:04 <noonedeadpunk> as the loop item != what we pass to the module, so it's kinda weird
11:09 <jrosser> i think using default(omit) on the label is pretty suspect too
11:09 <jrosser> even if a mapping were allowed
11:32 <noonedeadpunk> I think I was adding default('omit')? Which would just print out "omit"
11:32 <noonedeadpunk> At least that was the intention
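
Roughly what the custom loop label being discussed looks like; the task, module and variable names here are made up for illustration:

    # hypothetical task: the label shows a harmless identifier instead of the
    # full loop item (which may carry a secret), and default('omit') would only
    # ever print the literal string "omit" when the field is undefined
    - name: Show which user each key belongs to, without logging the key itself
      ansible.builtin.debug:
        msg: "configuring key for {{ item.username }}"
      loop: "{{ ssh_key_list }}"
      loop_control:
        label: "{{ item.username | default('omit') }}"
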
11:33 <noonedeadpunk> jrosser: have you ever been concerned about live migration speed?
11:33 <noonedeadpunk> It seems that with TLS enabled for libvirtd it uses only a single core for migration
11:34 <noonedeadpunk> while with it disabled it utilizes all of them. Which means the migration speed is like a VM with multiqueue disabled
11:34 <noonedeadpunk> ~1.2gb
11:35 <jrosser> i don't recall us having seen an issue with that yet
11:35 <jrosser> andrewbonney: ^ ?
11:35 <jrosser> that is pretty sad though
11:35 <andrewbonney> I haven't seen it, but that doesn't mean we don't have it
11:36 <andrewbonney> Our previous issues were all around using the wrong interfaces
11:36 <noonedeadpunk> Like https://listman.redhat.com/archives/libvirt-users/2018-May/msg00053.html
11:38 <jrosser> the second post there is talking about large volumes
11:38 <jrosser> andrewbonney: related to wrong interfaces, there are some patches regarding management address / ssh address which we need to go over
11:42 <noonedeadpunk> ah, well https://wiki.qemu.org/Features/Migration-Multiple-fds
11:50 <noonedeadpunk> what is fun though, is to see how encryption affects network throughput
11:54 <jrosser> similar https://bugzilla.redhat.com/show_bug.cgi?id=1968540
11:54 <noonedeadpunk> as without tls enabled for live migrations (using plain tcp), I get something like 20gbit/s vs 3gbit/s with encryption enabled
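
For reference, the libvirt-native TLS migration being discussed is toggled on the nova side roughly like this (a sketch, not checked against any particular release; when disabled, the migration stream falls back to plain TCP):

    # nova.conf on the compute hosts
    [libvirt]
    # wrap the qemu migration stream in TLS (gnutls); this is where the
    # single-threaded ~3 Gbit/s ceiling mentioned above shows up
    live_migration_with_native_tls = true
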
11:56 <noonedeadpunk> yup...
11:57 <noonedeadpunk> So it's a feature
11:57 <noonedeadpunk> Though now I'm much more sceptical about enabling internal tls by default
11:57 <noonedeadpunk> damiandabrowski: you might be interested in the topic as well
12:00 <jrosser> feels like that is a legitimate thing to talk to nova about, as it's not obvious that there is a big performance hit there
12:01 <noonedeadpunk> yeah, already pinged them as well. At least mentioning that in the docs would be good I guess
12:02 <damiandabrowski> noonedeadpunk: but nova_qemu_vnc_tls is enabled by default already
12:03 <noonedeadpunk> but what is really nasty is that when you disable tls - you also cannot do authentication
12:03 <noonedeadpunk> as it's done through mTLS
12:03 <damiandabrowski> i didn't want to mention vnc :D
12:04 <noonedeadpunk> damiandabrowski: well, what i meant is that using any encryption backed by gnutls will take a serious performance hit
12:05 <jrosser> isn't live migration an extreme case of that though?
12:06 <jrosser> normal API traffic will be spending much time doing $stuff in python anyway
12:06 <damiandabrowski> i can test that with rally if we have any doubts
12:06 <damiandabrowski> but i agree with jrosser
12:08 <jrosser> i am also not really familiar with the process model of uwsgi, whether the tls is in a separate process/thread from the python parts
12:11 <noonedeadpunk> well, this feature for gnutls was merged for 3.7.3 which is exactly what you'd get in 22.04
12:11 <noonedeadpunk> I should test in the sandbox
12:28 *** mgoddard- is now known as mgoddard
12:57 <NeilHanlon> yayyy centos is no longer publishing to git.centos.org
12:57 * NeilHanlon begins crying
13:18 <noonedeadpunk> ┻━┻︵ \(°□°)/ ︵ ┻━┻
13:22 <NeilHanlon> (https://www.redhat.com/en/blog/furthering-evolution-centos-stream)
13:29 <noonedeadpunk> So now the source code is locked to rhel customers only?
13:29 <noonedeadpunk> rly?
13:30 <NeilHanlon> pretty much
13:30 <NeilHanlon> what a fun wednesday
13:31 <noonedeadpunk> guess we should discuss marking CentOS as experimental at this point
13:32 <noonedeadpunk> But I'm not really sure I understand what it means for Rocky? Not much I guess?
13:33 <NeilHanlon> for now it means we don't have updates...
13:33 <NeilHanlon> actively working on wtf we're going to do, though
13:34 <mgariepy> wow.
13:35 <noonedeadpunk> oh, wow
13:37 <mgariepy> rebase rocky on.. debian ?
13:37 <noonedeadpunk> lol
13:39 <noonedeadpunk> but that's not fun at all to be frank
13:39 <mgariepy> i know.
13:39 <noonedeadpunk> obviously that's a move against derivatives
13:41 <mgariepy> ibm is so evil. imo.
13:44 <spatel> what could be the issue for a cinder volume stuck in the detaching state?
13:44 <spatel> I am able to create/attach but detach just gets stuck
13:45 <noonedeadpunk> does it get detached from nova's point of view?
13:46 <noonedeadpunk> as the attachment is stored in both the cinder and nova databases
13:47 <spatel> checking nova logs
13:47 <noonedeadpunk> and depending on what command you use to detach - the flows might be different. Or well, they could be different until the latest os-brick OSSA vulnerability got covered
13:51 <spatel> noonedeadpunk this is the error I am getting in nova-compute.log - https://paste.opendev.org/show/bAbgQN0Qpf9do7pp0tMj/
13:52 <noonedeadpunk> ask kolla ヽ(。_°)ノ
13:52 <spatel> haha, it's my lab
13:53 <spatel> my production still runs on openstack-ansible but some small environments use kolla.. :(
13:53 <noonedeadpunk> but eventually the fix for the latest OSSA made detach-volume requests issued directly to cinder invalid
13:54 <noonedeadpunk> and I think you should always use the nova api to detach volumes since then
13:54 <spatel> Hmm! I am using horizon to detach. You are saying use the CLI?
13:57 <spatel> everything was working fine until yoga, but as soon as I upgraded to zed I ran into this issue.
13:57 <spatel> I will open a bug and see if it's a real issue or something else
13:58 <noonedeadpunk> Yeah, that's actually backported back to Yoga
13:58 <noonedeadpunk> https://security.openstack.org/ossa/OSSA-2023-003.html
14:00 <noonedeadpunk> and that's the release note covering your issue I believe https://review.opendev.org/c/openstack/cinder/+/882835/2/releasenotes/notes/redirect-detach-nova-4b7b7902d7d182e0.yaml#20
14:01 <noonedeadpunk> `cinder now rejects user attachment delete requests for attachments that are being used by nova instances to ensure that no leftover devices are produced on the compute nodes which could be used to access another project's volumes.`
14:01 <spatel> You are saying it's required to use the nova service token?
14:02 <spatel> is this what you're referring to - https://docs.openstack.org/nova/latest/admin/configuration/service-user-token.html
14:03 <noonedeadpunk> I'm saying there used to be 2 api calls that allowed detaching a volume - one to cinder and another to nova
14:03 <noonedeadpunk> from now on, requests made directly to cinder will fail
14:04 <noonedeadpunk> So you have this https://docs.openstack.org/api-ref/block-storage/v3/index.html#detach-volume-from-server
14:04 <noonedeadpunk> and you have that https://docs.openstack.org/api-ref/compute/#detach-a-volume-from-an-instance
14:05 <noonedeadpunk> and now the first one can be called only by the nova service and not by the user
14:06 <noonedeadpunk> Unless I'm mistaken and it's vice versa...
14:07 <spatel> Let me understand: you want me to use the nova volume-detach command to detach the volume?
14:07 <spatel> when you say nova api, what do you mean?
14:12 <spatel> Let me understand the whole bug report first
14:12 <noonedeadpunk> or `openstack server remove volume`
14:14 <spatel> that didn't help :(
14:14 <spatel> I believe we need to configure something like this in cinder or nova - send_service_user_token = True
14:15 <spatel> because of the security fix, cinder now won't allow a volume to be detached without a valid service token from nova..
14:15 <spatel> trying to understand where and how I should add those options in the config
14:16 <spatel> https://bugs.launchpad.net/cinder/+bug/2004555/comments/75
14:18 <noonedeadpunk> You can check how we did that :)
14:19 <noonedeadpunk> so you should define service_user https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L193-L203
14:19 <spatel> oh.. let me spelunk through the OSA code..
14:19 <noonedeadpunk> and also use service token roles https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L177-L179
14:19 <spatel> I should be adding them in both nova and cinder, correct?
14:19 <noonedeadpunk> and glance I guess
14:20 <spatel> 3 roles?
14:24 <noonedeadpunk> Well, the role is `service` for all of them
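
As a rough sketch of the service-token setup spatel is after (the auth URL, user names and password are placeholders; the authoritative source is the linked cinder.conf.j2 template), the relevant pieces in nova.conf/cinder.conf look something like:

    [service_user]
    send_service_user_token = true
    auth_type = password
    auth_url = https://keystone.example.com:5000/v3
    project_domain_name = Default
    user_domain_name = Default
    project_name = service
    username = nova            # cinder/glance in their respective config files
    password = SERVICE_PASSWORD

    [keystone_authtoken]
    service_token_roles = service
    service_token_roles_required = true
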
14:30 <admin1> noonedeadpunk, from which tag in osa is the nova service token implemented?
14:31 <admin1> i see .. yoga and xena
14:42 <noonedeadpunk> it's not backported to xena :(
14:43 <noonedeadpunk> for yoga it's 25.4.0
14:43 <noonedeadpunk> and the minor upgrade could be quite breaking as well - I believe I wrote a release note to address that
14:44 <spatel> I hit this issue in my upgrade path so it's definitely worth keeping an eye on
15:23 <anskiy> noonedeadpunk: that note only mentions major upgrades, if I understand correctly
15:23 <mgariepy> anyone here doing multiple letsencrypt domains/ips on a deployment?
15:24 <anskiy> I'm planning an upgrade from 25.2.0 to 25.4.0, should I be concerned about that thing?
15:24 * noonedeadpunk needs to check the patch again
15:26 <noonedeadpunk> anskiy: I think you're right, and the minor upgrade will just cover the vulnerability and resiliently enable usage of service tokens
15:26 <noonedeadpunk> or well, relatively resiliently
15:27 <anskiy> okay, thank you :)
15:27 <noonedeadpunk> as the problem arose when you already require service roles, but users were not assigned the role in the first place
15:28 <noonedeadpunk> When upgrading to Yoga, you would get the role assigned, but it was not enforced yet. And with this minor upgrade it will be enforced
16:16 <spatel> noonedeadpunk it works now after adding the service_user snippet :)
16:16 <spatel> Thanks for pointing that out
18:22 <admin1> mgariepy, use case ? .. i usually use a wildcard
18:24 <mgariepy> having 2 different ips on the ctrl, one for the api, the other for object storage
18:25 <mgariepy> admin1, ^^
18:33 <admin1> i have used a SAN for those, but not with letsencrypt
18:52 <jrosser> mgariepy: I think you can supply extra args to certbot through the haproxy role vars
18:52 <jrosser> with more '-d {{ fqdn }}' as you need them
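
In other words, the resulting certbot call would carry one -d flag per name, roughly like this (domains are placeholders and --standalone is shown only for illustration; the haproxy role may drive certbot differently):

    # one SAN certificate covering both names
    certbot certonly --standalone -d api.example.com -d object.example.com

    # versus one certificate per name, which is what mgariepy is after
    certbot certonly --standalone -d api.example.com
    certbot certonly --standalone -d object.example.com
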
18:53 <mgariepy> jrosser, yep i think i found the correct stuff
18:53 <mgariepy> i want to have 1 cert per ip/domain though.
18:53 <jrosser> yeah I'm not so sure we can do that just now
18:53 <mgariepy> will need to do some stuff but it should work.
18:53 <mgariepy> i'll see and patch as needed i guess.
18:53 <jrosser> we are looking at enabling s3 static sites which needs another ip/dns on the same haproxy
18:54 <mgariepy> the keepalived part is kinda simple.
18:54 <mgariepy> with keepalived_instances_overrides
18:54 <jrosser> unclear if it is possible to have two haproxy frontends with different LE setups
18:55 <jrosser> or if it's ok to share the same cert with a SAN
18:59 <mgariepy> i'll dig a bit, might need some custom haproxy front/back i guess..
19:24 <mgariepy> hmm yeah i guess it would need some adjustment..
19:56 <mgariepy> jrosser, i guess we would need to refactor part of the code here.
19:58 <mgariepy> the haproxy part is probably the simplest since a frontend can be bound to a specific ip
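
For illustration, binding an extra haproxy frontend to its own IP (the address, certificate path and backend name are placeholders) is straightforward haproxy configuration:

    # additional frontend dedicated to the object storage VIP
    frontend object-store
        bind 203.0.113.20:443 ssl crt /etc/haproxy/ssl/object.example.com.pem
        default_backend swift_proxy-back
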
