Wednesday, 2024-03-27

<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Do not define a random password for each run  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912332 [08:43]
<jrosser> i think we might have circular dependencies for skyline [08:49]
<jrosser> linters fail for this https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [08:49]
<jrosser> which appears to be fixed by https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912370/6 [08:49]
<jrosser> but that depends-on the first one [08:49]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: Use container setup role from plugins repo  https://review.opendev.org/c/openstack/openstack-ansible/+/905004 [09:56]
<noonedeadpunk> jrosser: yeah, you're right I guess.... though I hoped they were fixed by 912333 [09:59]
<jrosser> it might be the difference between 9 and "9" in the meta/main.yml [10:00]
<jrosser> but it is not good how lint passes on 912333 but fails on 859446 [10:02]
<noonedeadpunk> yeah, though I hoped that meta is not verified in integrated repo [10:03]
<noonedeadpunk> we can actually make another patch I guess to cover this meta thing [10:04]
<jrosser> oh you mean just merge that first into the os_skyline? [10:04]
<noonedeadpunk> yeah [10:06]
<jrosser> i can make a patch for that [10:06]
<noonedeadpunk> jrosser: do you remember how you worked around the unsafe condition previously for python_venv_build? [10:11]
<noonedeadpunk> as seems we have common thing here https://zuul.opendev.org/t/openstack/build/d2c51c94f37e48419a73143266eb699d/log/job-output.txt#9486-9490 [10:11]
<jrosser> yeah i added `use: "{{ ansible_facts['pkg_mgr'] }}"` to the `package` module [10:12]
<jrosser> which makes it then not try to template that ^^ inside the action plugin [10:13]
<noonedeadpunk> ok, it's different then I assume [10:15]
<jrosser> it's related but yes different [10:15]
<jrosser> looks like that totally deserves a new bug [10:15]
<jrosser> as it's a different side-effect of the same change to the types they made [10:16]
<jrosser> oh well perhaps it's not, as it says `[WARNING]: conditional statements should not include jinja2 templating` [10:17]
<jrosser> right before it fails [10:17]
<noonedeadpunk> yeah, true, though I would assume that it could be fine for assert... anyway will check [10:19]
<jrosser> how did this not fail before [10:20]
<jrosser> oh that job is maybe conditional [10:20]
<noonedeadpunk> yeah [10:21]
<noonedeadpunk> will check how to transform [10:21]
<jrosser> i think i found it pretty hard to escape from having an AnsibleUnsafeText [10:25]
<jrosser> and the | type_debug thing was pretty handy when trying to work out what was happening [10:26]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-os_skyline master: Add quotes for EL version in meta/main.yml  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/914439 [10:30]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability  https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [10:30]
<jrosser> urgh `[MIRROR] systemd-devel-252-32.el9.x86_64.rpm: Status code: 404 for https://mirror-int.ord.rax.opendev.org/centos-stream/9-stream/AppStream/x86_64/os/Packages/systemd-devel-252-32.el9.x86_64.rpm (IP: 10.209.96.36)` [10:32]
<jrosser> and more `fatal: [localhost]: FAILED! => {"attempts": 5, "changed": false, "msg": "Failed to download packages: pcp-6.2.0-2.el9.x86_64: Cannot download, all mirrors were already tried without success", "results": []}` [10:34]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions  https://review.opendev.org/c/openstack/openstack-ansible/+/914461 [10:42]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions  https://review.opendev.org/c/openstack/openstack-ansible/+/914461 [10:42]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability  https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [10:42]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions  https://review.opendev.org/c/openstack/openstack-ansible/+/914461 [10:42]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability  https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [10:42]
<noonedeadpunk> ugh [10:43]
<noonedeadpunk> ok, but now that should pass, I assume [10:44]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Add EL distro support  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912370 [10:44]
<jrosser> i hope so [10:44]
<opendevreview> James Denton proposed openstack/openstack-ansible-os_skyline master: Support large uploads via Skyline  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/914149 [10:44]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Install skyline-console through yarn  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/914405 [10:44]
<gokhan> hello folks, after upgrade to antelope, live migration is not working. I can migrate but live migrate is not [10:44]
<gokhan> when I check with nova user on compute hosts, I can not ssh between compute hosts. It waits when trying to ssh [10:45]
<noonedeadpunk> iirc, what changed in terms of live migration, is that libvirt should be service tls since antelope [10:45]
<noonedeadpunk> and also ports were changed iirc [10:46]
<noonedeadpunk> regarding SSH - it's done with SSH certs right now, but SSH is used only for offline migration [10:46]
<gokhan> my env is without ssl :( [10:46]
<noonedeadpunk> IIRC tunnel migrations were deprecated [10:47]
<jrosser> ssl between haproxy and the api is not the same as live migration over tls between libvirt<>libvirt [10:47]
<noonedeadpunk> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_tunnelled [10:47]
<noonedeadpunk> so, we have that by default: https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/defaults/main.yml#L454-L455 [10:48]
<noonedeadpunk> but we kept old logic there actually: https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/templates/nova.conf.j2#L271-L277 [10:49]
<noonedeadpunk> though I guess it's broken? [10:49]
<noonedeadpunk> as keyfile is probably wrong? [10:49]
<noonedeadpunk> huh, let me check quickly [10:49]
<gokhan> ok thanks jrosser fyi [10:49]
<gokhan> noonedeadpunk, my libvirt config is in nova https://paste.openstack.org/show/bPYwsAhtOFxZUoJbxhYO/ [10:50]
<noonedeadpunk> yeah, ok, so you should be running live migrations through TLS [10:51]
<noonedeadpunk> if you're using firewall, just ensure that required ports are open, as they're different [10:53]
<noonedeadpunk> so it should be 16514 and 49152:49215 for management network [10:54]
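The port numbers quoted above (16514 and the 49152:49215 range) can be turned into a quick reachability check. This is only a minimal sketch: the helper names `required_ports` and `port_is_open` are hypothetical, and it assumes a plain TCP connect from one compute host to another over the management network is a fair stand-in for what the firewall allows.

```python
import socket

# Ports quoted in the discussion: 16514 plus the 49152:49215 range.
LIBVIRT_TLS_PORT = 16514
MIGRATION_PORTS = range(49152, 49215 + 1)


def required_ports():
    """Return every TCP port mentioned for the management network."""
    return [LIBVIRT_TLS_PORT, *MIGRATION_PORTS]


def port_is_open(host, port, timeout=2.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False
```

A `False` result for a port in the 49152:49215 range may only mean that nothing is listening there at the moment, so the 16514 check is probably the more telling one.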
<noonedeadpunk> I again forgot how ssh certs are working /o\ [10:58]
<noonedeadpunk> ssh 10.X.X.X -l nova -i /var/lib/nova/.ssh/id_rsa throws Permission denied (publickey) [10:58]
<gokhan> nova conductor throws no valid host found error https://paste.openstack.org/show/broAn9S8ZiszrJVzenXc/ [10:59]
<noonedeadpunk> ok, well, that is different [10:59]
<gokhan> all compute service is up [11:00]
<noonedeadpunk> anything in scheduler logs? [11:02]
<noonedeadpunk> like some "info" of why [11:02]
<gokhan> noonedeadpunk, Mar 27 10:35:37 dev-infra3-nova-api-container-d4035159 nova-scheduler[65766]: 2024-03-27 10:35:37.376 65766 INFO nova.scheduler.host_manager [req-15e24c17-4b77-4911-a4a5-f99315dcc173 req-e06bd0de-9490-44ec-8c9a-ca75321d8744 4841276dcdbe4ab096ef60b1744c4fa9 f2e52a5c5d1c4ca1b51274619b517e0e - - default default] Host filter only checking host dev-compute1 and node dev-compute1 [11:04]
<gokhan> Mar 27 10:35:37 dev-infra3-nova-api-container-d4035159 nova-scheduler[65766]: 2024-03-27 10:35:37.377 65766 INFO nova.scheduler.host_manager [req-15e24c17-4b77-4911-a4a5-f99315dcc173 req-e06bd0de-9490-44ec-8c9a-ca75321d8744 4841276dcdbe4ab096ef60b1744c4fa9 f2e52a5c5d1c4ca1b51274619b517e0e - - default default] Host filter ignoring hosts: dev-compute1 [11:04]
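When there are many scheduler log lines like the two above, it can help to pull out just the hosts that were ignored. A minimal sketch, assuming the message text is exactly `Host filter ignoring hosts: ...` with a comma-separated host list at the end of the line, as it appears here; the function name `ignored_hosts` is hypothetical.

```python
import re

# Matches the tail of the nova-scheduler INFO line quoted above.
IGNORED_RE = re.compile(r"Host filter ignoring hosts: (?P<hosts>.+)$")


def ignored_hosts(line):
    """Return the list of hosts named in an 'ignoring hosts' log line."""
    match = IGNORED_RE.search(line.strip())
    if match is None:
        return []
    # Hosts are separated by commas, with optional surrounding spaces.
    return [h.strip() for h in match.group("hosts").split(",") if h.strip()]
```

Lines that do not carry the message, such as the `Host filter only checking host ...` one, give an empty list.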
<noonedeadpunk> Well, I guess it's worth checking what is special about the instance or host - maybe some aggregates, AZs, server groups... [11:07]
<opendevreview> Merged openstack/openstack-ansible-os_nova stable/2023.2: Ensure nova_device_spec is templated as JSON string  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/913501 [11:17]
<gokhan> noonedeadpunk, https://paste.openstack.org/show/bBgug7t7cvPqKnHwoepK/ [11:17]
<noonedeadpunk> ok, it was my internal issue not being able to login via ssh :) [11:18]
<noonedeadpunk> (as nova user) [11:18]
<gokhan> noonedeadpunk, I can't ssh to compute nodes with command which you sent [11:19]
<noonedeadpunk> actually this reminds me of some bug [11:19]
<noonedeadpunk> gokhan: SSH is used only for offline migration with the config you have [11:21]
<noonedeadpunk> for online migration it is not needed at all [11:21]
<noonedeadpunk> and you obviously have some scheduling issue rather than anything else [11:21]
<gokhan> noonedeadpunk, ok so this is different. [11:22]
<gokhan> noonedeadpunk, I am using default scheduler filter in osa [11:23]
<noonedeadpunk> I guess the question here is, why `Host filter ignoring hosts: dev-compute1` [11:24]
<noonedeadpunk> as that's the only candidate [11:24]
<noonedeadpunk> and it could be quite some reasons actually [11:25]
<gokhan> in nova config resizeonsamehost is true [11:25]
<jrosser> noonedeadpunk: i just also checked the nova<>nova ssh and it works for me here [11:25]
<noonedeadpunk> yeah, just our internal mess dropped Include /etc/ssh/sshd_config.d/*.conf out of sshd_config [11:26]
<gokhan> live migration is not working in all my antelope envs [11:28]
<gokhan> filters are there https://paste.openstack.org/show/bSnInnwo29kLjTB79oc0/ [11:29]
<jrosser> gokhan: i think you need to do debugging to find the root cause in your specific environment [11:31]
<jrosser> it's not possible to understand just from the config what is happening [11:31]
<gokhan> jrosser, I think it is about scheduling. I debugged nova scheduler but I didn't find root cause. In scheduler logs it lastly says There are 0 hosts available but 1 instances requested to build. these are scheduler logs https://paste.openstack.org/show/b1CktfA1CRBk95ZR6KYS/ [11:43]
<noonedeadpunk> yeah, so `Host filter ignoring hosts: dev-compute1` [11:46]
<noonedeadpunk> so it's a specific filter that filtered it out [11:46]
<opendevreview> Merged openstack/openstack-ansible-os_horizon stable/2023.2: Do not change mode of files recursively  https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/914015 [11:47]
<noonedeadpunk> question - do you provide destination for the vm explicitly? [11:47]
<gokhan> noonedeadpunk, yes I am giving destination explicitly [11:47]
<noonedeadpunk> and what if you don't?:) [11:48]
<noonedeadpunk> same - no hosts available? [11:48]
<jrosser> you can look in the nova compute log on the dev-compute1 to see if the migration was rejected there [11:49]
<jrosser> and you can also look in placement logs to see if the resource claim is rejected there [11:50]
<gokhan> noonedeadpunk, yes similar error :( Host filter ignoring hosts: dev-compute3, dev-compute1, dev-compute2 [11:50]
<jrosser> gokhan: are you migrating between antelope hosts? or is this migration part of your upgrade to clean out / upgrade compute nodes? [11:51]
<noonedeadpunk> well, from what I do see in nova code - there should be ignored host somehow explicitly to get filtered in a way it does https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/host_manager.py#L595-L598 [11:51]
<noonedeadpunk> I do recall one possible bug in nova, but it was sorted quite some time ago [11:51]
<jrosser> there is this https://bugs.launchpad.net/nova/+bug/2023035 [11:52]
<noonedeadpunk> nah, this one is kernel issue iirc [11:52]
<noonedeadpunk> it's like specific set of cpu and kernel - at least that's what we had [11:53]
<noonedeadpunk> sec [11:53]
<noonedeadpunk> will try to find what I'm talking about [11:53]
<gokhan> jrosser, yes now I checked nova-compute and it throws this bug. it is about cpu doesn't have compatibility [11:54]
<gokhan> jrosser, yes  this is migration part of upgrade to clean out / upgrade compute nodes [11:55]
<noonedeadpunk> jrosser: so, there's regression between kernel 3.14 and 3.17 where with Intel Gold kernel announces an extra cpu flag regardless of what's requested or present in cpu_map [11:56]
<noonedeadpunk> so migrating vms from some old E5 to Gold is one way ticket, unless you have hwe kernel [11:56]
<jrosser> ahhhh ok we would not have come across that [11:57]
<jrosser> and on focal i think we were running HWE [11:57]
<jrosser> for antelope we needed to patch nova for migrations to work https://github.com/bbc/nova/commits/bbc-antelope-27.1.0/ [11:58]
<gokhan> we are running also hwe on focal [11:58]
<jrosser> gokhan: it is pretty much normal for us to test this completely in a lab and the result of that is patched versions of a few services [11:59]
<jrosser> i think it's been a while since a completely stock install worked for us [12:00]
<noonedeadpunk> huh, it does for us mostly... [12:01]
<gokhan> jrosser, so I need this commit https://github.com/bbc/nova/commit/159869cde16fbd3e780a2a5bfa59e999890e6511 [12:02]
<jrosser> nova/neutron/magnum/keystone we need to fork currently [12:03]
<jrosser> gokhan: perhaps - this is not anything i guarantee for you [12:03]
<jrosser> that's based on our tests in our lab [12:03]
<gokhan> jrosser, I will test it now in test env [12:04]
<jrosser> we discussed this with the nova team some time ago in their weekly meeting, and their suggestion for a quick-fix was to make that reverty [12:05]
<jrosser> *revert [12:05]
<jrosser> where "we" is my team here [12:05]
<jrosser> not OSA [12:06]
<gokhan> jrosser, it worked and now I can migrate thanks. [12:09]
<jrosser> it is probably worth replying to the bug to say that you are also affected [12:11]
<gokhan> jrosser, yes I am replying now. [12:12]
<gokhan> thanks jrosser noonedeadpunk for your help :) [12:19]
<gokhan> jrosser, rebuild from volume based image is also not working. It seems you also fixed this problem. https://github.com/bbc/nova/commit/0411adc64c96fe2f83ef7cf1584e2b1ee5b602d3 [12:26]
<jrosser> gokhan: ah well yes but i think you really have noonedeadpunk to thank there https://bugs.launchpad.net/nova/+bug/2040264 [12:28]
<jrosser> and that looks to be fixed in nova 2023.1 anyway now, so would be good idea to see why you don't have that already [12:30]
<gokhan> jrosser, I am getting again Image 4ef6efed-3ee1-4360-b366-3c6d69eedc09 is unacceptable: Unable to rebuild with a different image for a volume-backed server. [12:34]
<gokhan> I think this feature is added in zed but now it is not working [12:34]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: Use container setup role from plugins repo  https://review.opendev.org/c/openstack/openstack-ansible/+/905004 [13:01]
<nixbuilder> Anyone run across this after installing with 28.0.1??? https://paste.openstack.org/show/bAgJFWLfCr6VTIdTkNEK/ [13:03]
<nixbuilder> It's rabbitmq errors :-( [13:04]
<jrosser> ideally you'd be using 28.1.0 [13:06]
<nixbuilder> jrosser: OK... will try that! [13:07]
<jrosser> though i don't specifically expect a magic fix there, just that it's the latest tagged release [13:07]
<noonedeadpunk> WRONG_VERSION_NUMBER? [13:08]
<noonedeadpunk> it feels that it tries to reach rabbit on tls port without tls or smth like that [13:09]
<noonedeadpunk> So yeah. I wouldn't put too much on 28.1.0 in this case [13:09]
<jrosser> so in my rabbitmq.conf i have management.ssl.versions.1 = tlsv1.2 [13:10]
<jrosser> and then in my nova.conf the rabbitmq connection string ends `@10.48.240.167:5671//nova?ssl=1&ssl_version=TLSv1_2&ssl_ca_file=` [13:11]
<jrosser> nixbuilder: ^ so you can see there that ssl is enabled and the tls version is matching at both ends [13:12]
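The comparison jrosser describes (is ssl enabled, and which TLS version is requested on the client side) can be read straight off the connection string. A minimal sketch using only the standard library: the `rabbit://user:secret` prefix in `EXAMPLE_URL` is a made-up placeholder, only the tail after `@` comes from the log, and the function name `rabbit_tls_settings` is hypothetical. It assumes a single host in the URL; a comma-separated multi-host string would need splitting first.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder credentials; only the part after "@" is quoted from the log.
EXAMPLE_URL = (
    "rabbit://user:secret@10.48.240.167:5671//nova"
    "?ssl=1&ssl_version=TLSv1_2&ssl_ca_file="
)


def rabbit_tls_settings(transport_url):
    """Extract the port and TLS-related query options from a single-host URL."""
    parts = urlsplit(transport_url)
    # keep_blank_values so an empty ssl_ca_file= is still visible in the dict.
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        "port": parts.port,
        "ssl": query.get("ssl", ["0"])[0] == "1",
        "ssl_version": query.get("ssl_version", [None])[0],
    }
```

If `ssl` comes back `False` while the port is the TLS listener (5671 here), that would fit the "tls port without tls" guess above.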
<jrosser> thats probably a good place to start debugging whats happening [13:12]
<noonedeadpunk> jrosser: finally found bug I was talking about /o\ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036675 [13:12]
<nixbuilder> jrosser: Thanks for the tip.. I am re-installing with 28.1.0 but I imagine I will have to debug my issue after my latest system comes back up! [13:14]
<noonedeadpunk> it was time when I spent like 2 days trying to build a kernel patch to live-apply through kpatch, but failed in building even kernel itself lol [13:15]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Define amount of gunicorn workers through config  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/914516 [13:36]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability  https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [13:37]
<noonedeadpunk> ok, this one passed ^ but I had to update it a bit as yarn jobs were in retry_limit [13:37]
<noonedeadpunk> and this one also looks like good to go https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/914275 [13:40]
<jrosser> relation chain on some of the skyline patches is pretty messed up [13:40]
<jrosser> i dare not try to rebase that [13:40]
<noonedeadpunk> haha [13:40]
<jrosser> this in particular https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912332 [13:42]
<jrosser> i'm always wary when the relation chain on the patch is pretty different to how it looks on other patches [13:42]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Do not define a random password for each run  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912332 [13:43]
<noonedeadpunk> there're just couple of changes on top of https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/912370/7 as the one which should run at least playbooks towards skyline role [13:44]
<noonedeadpunk> still not good enough testing, but at least role gets executed [13:44]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Place sysctl options to it's own file  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/908912 [13:49]
<opendevreview> Merged openstack/openstack-ansible-openstack_hosts unmaintained/wallaby: Update .gitreview for unmaintained/wallaby  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/913087 [13:59]
<noonedeadpunk> I'm looking through apt_key patches and see how ppl just rush to irc/launchpad filing bugs about failed deployments... [14:00]
<noonedeadpunk> while indeed that might make most sense right now [14:01]
<jrosser> because the data structure for configuring custom repos pretty much changes? [14:02]
<opendevreview> Merged openstack/openstack-ansible-openstack_hosts unmaintained/xena: Update .gitreview for unmaintained/xena  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/913144 [14:02]
<noonedeadpunk> yeah [14:03]
<noonedeadpunk> and not always obvious [14:03]
<jrosser> no, though the new module is pretty neat and there is an opportunity to do a bunch of tidy up [14:03]
<jrosser> it is good to get feedback / review though, and i also think that people will trip over this [14:03]
<noonedeadpunk> I was thinking if we can get some pre-upgrade check or smth to fail early [14:04]
<noonedeadpunk> or dunno [14:04]
<jrosser> we could do that if we changed the name of the var perhaps [14:04]
<jrosser> then assert old var is not defined [14:04]
<noonedeadpunk> well, I tried to do some wiring, but then realized it's also so different for 3 repos in topic [14:05]
<noonedeadpunk> and vars are same for centos/ubuntu which is another complication [14:06]
<noonedeadpunk> we could probably check for some required keys in the list [14:07]
<jrosser> well perhaps this is then "all or nothing" [14:07]
<noonedeadpunk> but I'm not sure what is required except name [14:07]
<jrosser> that we need a big transition to the new module, everywhere [14:07]
<jrosser> i realised pretty early that there was no way to retrofit the old data structure to the new module [14:08]
<jrosser> but the new module is so flexible that pretty much any possibility is covered if we just expose all the module params and put in the right places for overrides [14:09]
<jrosser> it would make it possible to no longer need to pre-stage apt keys for internal repos for example, they could be in data [14:10]
<noonedeadpunk> yeah, I know and kinda agree on that... but also I'm quite sure that at least 2 times I will be pinged about that internally after upgrade failures [14:28]
<jrosser> i also didnt make a releasenote for this as it felt like one overall one was needed [14:30]
<jrosser> ^ in the individual patches [14:30]
<noonedeadpunk> I think you did? [14:30]
<jrosser> oh did i?! long ago :_ [14:30]
<noonedeadpunk> ie https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/907833/4/releasenotes/notes/rabbitmq_repo_deb822-b47ef07ff462193f.yaml [14:30]
<noonedeadpunk> or well, not everywhere, but somewhere [14:30]
<noonedeadpunk> and https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/907752/5/releasenotes/notes/galera_repo_deb822-fc1aa6b88ee33b57.yaml [14:31]
<noonedeadpunk> ok, so only openstack_hosts is not covered :D [14:31]
<jrosser> hah [14:31]
<noonedeadpunk> and debian bullseye seems not liking that at all [15:03]
<noonedeadpunk> https://zuul.opendev.org/t/openstack/build/86b88fc0c94146a0b6d5868d53b00840/log/job-output.txt#11505 [15:04]
<noonedeadpunk> same for mariadb [15:04]
<noonedeadpunk> sounds like bullseye expects smth different for signed-by than should be [15:08]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Implement installation method selection for MariaDB role  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/914530 [15:36]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Implement installation method selection for MariaDB role  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/914530 [15:37]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Install skyline-console through yarn  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/914405 [17:10]
<noonedeadpunk> so this looks quite good now from what I can tell https://review.opendev.org/c/openstack/openstack-ansible/+/859446 [17:11]
<noonedeadpunk> the biggest question if 914405 pass :D [17:12]
<jrosser> this is really cool stuff [17:20]
<jrosser> damiandabrowski: mgariepy NeilHanlon is anyone around to look at https://review.opendev.org/c/openstack/openstack-ansible/+/914461 ? [17:22]
* NeilHanlon is around [17:23]
<jrosser> cool, thanks! [17:24]
<NeilHanlon> of course :) [17:25]
<NeilHanlon> backport candidate? [17:25]
<noonedeadpunk> nah [17:26]
<noonedeadpunk> or well... dunno [17:26]
<noonedeadpunk> that's good question actually [17:26]
<noonedeadpunk> as we probably should bump ansible version on stable branches to cover cve [17:26]
<NeilHanlon> mmm [17:28]
<NeilHanlon> i mean, probably not incredibly urgent [17:28]
<opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-os_neutron master: Fix multiline yaml formatting in neutron systemd services  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/914544 [17:32]
* jrosser throws rocks at ansible lint :/ [17:32]
<noonedeadpunk> well, we can at least on 2023.2 [17:33]
<jrosser> 914544 can be checked with something like this 914544 [17:33]
<jrosser> argh [17:33]
<jrosser> https://paste.opendev.org/show/b4rOfwsuGBVJsLwLr4F7/ [17:33]
<jrosser> noonedeadpunk: out of interest, where does the actual skyline-console wheel get built? [18:19]
<jrosser> oh i see it gives skyline_console_yarn_build_path as the source location for skyline-console to python_venv_build role [18:23]
<opendevreview> Merged openstack/openstack-ansible master: Remove Jinja from conditions  https://review.opendev.org/c/openstack/openstack-ansible/+/914461 [19:43]
