Monday, 2023-05-22

opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Bump down etcd version for zun  https://review.opendev.org/c/openstack/openstack-ansible/+/88371507:48
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun stable/zed: Install kata containers from source  https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/88371107:49
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Configure spice console on haproxy only when it is used  https://review.opendev.org/c/openstack/openstack-ansible/+/88368807:53
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Rename container_address to management_address  https://review.opendev.org/c/openstack/openstack-ansible/+/88314807:57
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Add management_ip option for metal hosts  https://review.opendev.org/c/openstack/openstack-ansible/+/87011309:54
noonedeadpunkI did bunch of rebases that have invalidated previous votes on them09:57
noonedeadpunk(as they were not "trivial" ones)09:57
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump upstream SHAs for Antelope  https://review.opendev.org/c/openstack/openstack-ansible/+/88355010:14
opendevreviewMerged openstack/openstack-ansible master: Add 'tls' scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/88196810:40
noonedeadpunkI'm trying to figure out 2 things now - why volume tempest test is failing with ceph scenario and why jammy upgrade fails11:08
noonedeadpunkmanually vms are created without issues for ceph scenario12:08
jrosserare there two approaches12:10
jrosserlike 2 steps of create volume, boot instance using volume12:10
jrosservs. single step b-f-v of image=foo and the first step is done automatically12:10
jrosseri recall a bunch of work on cinder tempest tests recently to force waiting to be able to ssh in as well?12:19
jrosserperhaps that was volume detach specifically12:19
noonedeadpunkYes, so what fails is volume detach at the moment. But when I've tried these patches to wait for ssh (newer tempest), there were waaay more failures...13:22
noonedeadpunkbut it's not about tempest but smth related to the service config13:31
noonedeadpunk` Detected user call to delete in-use attachment. Call must come from the nova service and nova must be configured to send the service token. Bug #2004555` but defining service_user doesn't help13:39
noonedeadpunkI wonder if it could be somehow related to a client or smth...13:45
noonedeadpunkaha, ok.... my bad....13:48
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Add auth credentials for service_user  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/88384213:52
jrosserahha yes i was just reading the bug13:55
jrosserand got as far as https://review.opendev.org/c/openstack/kolla-ansible/+/883110/3/ansible/roles/nova-cell/templates/nova.conf.j213:55
jrossersaw `valid_interfaces = internal` there as well13:56
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_cinder master: Define service_user for cinder services  https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/88384413:57
noonedeadpunkI don't see valid_interfaces in config ref: https://docs.openstack.org/nova/latest/configuration/config.html#service-user13:58
jrosserhttps://opendev.org/openstack/keystoneauth/src/branch/master/keystoneauth1/loading/adapter.py#L4614:00
jrosserTIL14:00
noonedeadpunk`interface and valid_interfaces are mutually exclusive`14:01
noonedeadpunkAnd we use interface usually14:01
jrosserwhich seems deprecated14:02
noonedeadpunkwell. yes. we can replace with sed and mass-push the update14:03
noonedeadpunkhaven't really seen any deprecation notice in logs though...14:03
noonedeadpunkbut let's update it for B :)14:04
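A minimal sketch of the rule quoted above, in plain Python. `normalize_interfaces` is a hypothetical helper written for illustration, not keystoneauth code; it assumes an unset option arrives as `None`.

```python
def normalize_interfaces(interface=None, valid_interfaces=None):
    """Return the interface preference list for one config section.

    Hypothetical helper: it mirrors the quoted rule that `interface`
    and `valid_interfaces` must not both be set.
    """
    if interface and valid_interfaces:
        raise ValueError("interface and valid_interfaces are mutually exclusive")
    if valid_interfaces:
        return list(valid_interfaces)
    if interface:
        # Legacy single-value option, treated as a one-element list.
        return [interface]
    return []


print(normalize_interfaces(interface="internal"))
print(normalize_interfaces(valid_interfaces=["internal", "public"]))
```

Under this reading, a mass replacement of `interface` with `valid_interfaces` would not change which endpoint is selected, only which option name carries the value.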
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump upstream SHAs for Antelope  https://review.opendev.org/c/openstack/openstack-ansible/+/88355014:07
noonedeadpunkhopefully this pass now ^14:07
jrosserthat is odd it's not in our deprecation logs as it seems like that for many releases now14:11
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Switch ubuntu upgrade jobs to Jammy  https://review.opendev.org/c/openstack/openstack-ansible/+/87989014:20
damiandabrowskijrosser: I've looked into your comments about ironic & tls-backend. I think we have 2 topics to discuss.14:42
damiandabrowski1. [iPXE] fetching image over https14:42
damiandabrowskiIt's not that simple as iPXE does not allow to disable cert verification.14:42
damiandabrowskiThe only option is to recompile iPXE with custom CA, so I suggest to mark it as out of scope for now.14:43
damiandabrowskihttps://lists.ipxe.org/pipermail/ipxe-devel/2014-November/003898.html14:43
damiandabrowski2. You mentioned that in some cases, IPA can communicate with ironic conductor *backend* directly.14:43
damiandabrowskiWhen exactly can this happen? Could you please share more details so I can update docs accordingly?14:43
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Ensure OVN is restarted on package update  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/88374014:44
noonedeadpunkdamiandabrowski: fwiw, I also had comments about docker & tls for zun14:45
noonedeadpunkbut it's merged already and not sure how much we should worry about that14:45
jrosserdamiandabrowski: the "default" config for ironic in OSA potentially puts br-bmaas as a isolated network14:46
jrosserand in that case the VIP is not accessible from the bmaas network14:47
jrosserso the solution there (as documented in my lxc example in the os_ironic docs) is to configure the ironic agent to make its callback to the conductor which is responsible for it, directly14:48
jrosseras that has an interface on the bmaas network14:48
damiandabrowskinoonedeadpunk: thanks for reminding me, I'll check if we can do something about it before release14:53
damiandabrowskijrosser: ah, i get it now, but IIRC you mentioned conductor before and I see an override only for inspector15:01
damiandabrowskihttps://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/doc/source/configure-lxc-example.rst?display=source#L45615:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Drop OVN package installation from ovn_config  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/88385115:01
damiandabrowskihttps://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/templates/inspector.ipxe.j2#L815:01
jrosserdamiandabrowski: https://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/doc/source/configure-lxc-example.rst?display=source#L443-L44615:01
jrosserinspector is probably 3) on your list tbh15:02
damiandabrowskihmm, so it indicates that IPA can access service catalog. Do you know how is it possible on isolated bmaas network? 15:05
jrosserno this is config for inspector/conductor15:06
jrossernormally they use the catalog to construct the URL passed to IPA to use for the callback15:06
jrosserso in this case `endpoint_override` means, "dont use the catalog, use this URL instead"15:07
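The `endpoint_override` behaviour described above can be sketched roughly as follows. This is an illustration only: `resolve_endpoint`, the catalog layout, and both URLs are made up for the example and are not taken from ironic or keystoneauth.

```python
def resolve_endpoint(catalog, service_type, interface, endpoint_override=None):
    """Pick the URL a service hands out for callbacks.

    Hypothetical helper: an explicit override wins, otherwise the URL
    is looked up in the (simplified) service catalog.
    """
    if endpoint_override:
        return endpoint_override
    return catalog[service_type][interface]


# Made-up catalog: the internal endpoint sits behind the VIP.
catalog = {"baremetal": {"internal": "https://vip.example.com:6385"}}

# No override: the VIP URL from the catalog is used.
print(resolve_endpoint(catalog, "baremetal", "internal"))

# Override set: the catalog is bypassed, e.g. to reach a conductor
# directly over an isolated network (address is made up).
print(resolve_endpoint(catalog, "baremetal", "internal",
                       endpoint_override="http://10.0.0.5:6385"))
```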
noonedeadpunkwe somehow broke tempest installation for distro scenario as well15:13
noonedeadpunkit's trying to follow source path15:13
noonedeadpunkhttps://zuul.opendev.org/t/openstack/build/dbe35ce58d43486db28bd727227516af15:13
damiandabrowskiahhh okok, and how exactly IPA receives ironic API URL from inspector/conductor?15:13
damiandabrowskias a response, after sending a request to ipa-inspection-callback-url ?15:13
* NeilHanlon has to mentally context switch to not read IPA as in FreeIPA, the identity management suite15:18
noonedeadpunk++15:18
jrosserdamiandabrowski: it needs something to start with, so generally kernel command line parameter15:18
jrosserit's a disk image which is pxebooted, and it looks like the options are kernel command line parameter or lookup with mdns15:19
damiandabrowskiyeah, but I can't see any parameter for setting ironic API URL here: https://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/templates/inspector.ipxe.j215:20
jrosserbut thats inspector15:28
jrosseri'm confused15:28
jrosserthey are two different things15:28
opendevreviewMerged openstack/openstack-ansible master: Configure spice console on haproxy only when it is used  https://review.opendev.org/c/openstack/openstack-ansible/+/88368815:28
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_tempest master: Make tempest respect service_install_method  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/88385415:29
jrosserdamiandabrowski: i'm not sure what we are talking about now tbh, ironic, or ironic inspector15:29
damiandabrowskii'm trying to figure out how this endpoint is passed to IPA :D 15:29
damiandabrowskihttps://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/doc/source/configure-lxc-example.rst?display=source#L443-L44615:29
jrossermy example shows overrides for both the ironic inspector callback and the ironic-python-agent callback15:29
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_tempest master: Make tempest respect service_install_method  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/88385415:30
damiandabrowskiah wait, i think i understand it now15:30
jrosserdamiandabrowski: well ironic writes a lot of files out dynamically as well15:30
jrosserfor the pxebooting15:31
damiandabrowskiironic inspector container also has only bmaas network, right?15:31
jrossertheres a diagram :)15:31
jrosseri mean this also all "depends" of course15:31
damiandabrowskiso ironic inspector <> ironic API communication also needs to be done without haproxy15:31
jrosseryou can do what you like in any particular deployment15:31
damiandabrowskiyeah, I'm talking only about your example15:31
jrossermake it be on metal, routed networks, add a firewall etc15:32
jrosserwhat ironic inspector <> ironic API?15:32
damiandabrowskior no, i think i still do not understand it..15:32
damiandabrowski"# Direct IPA to callback directly to deploying ironic container (via BMAAS network)"15:32
damiandabrowskithis comment in particular15:32
spateljrosser I am getting this error during my upgrade of wallaby to Xena - https://paste.opendev.org/show/bKeyJvdFoldpPtI6iErQ/15:33
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_tempest master: Clean up old rhel variable files  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/88385615:33
jrosserdamiandabrowski: what in particular?15:33
spatelshould i do  -e galera_ignore_cluster_state=true ?15:33
noonedeadpunkspatel: are all your galera backends up?15:33
noonedeadpunkas this might indicate that one of them is unhealthy15:33
damiandabrowskihow exactly service_catalog.endpoint_override affects IPA15:33
spatelHmm I think its dead.. damn it let me check 15:34
damiandabrowskii.e. what and how provides the value of service_catalog.endpoint_override to the IPA15:35
spatelnoonedeadpunk only infra-1 mysql is down 15:36
jrosserwell ironic writes out a whole load of files into /httpboot, when a node is booted15:36
jrosserspecific to that node15:37
spatelon infra-2 and 3 working fine..15:37
spatellet me try to restart and see15:37
damiandabrowskijrosser: yes, i even deployed ironic AIO trying to understand that, besides initramfs and kernel there is only inspector.ipxe15:39
damiandabrowskibut the value of ironic API endpoint is not provided in inspector.ipxe: https://opendev.org/openstack/openstack-ansible-os_ironic/src/branch/master/templates/inspector.ipxe.j215:39
damiandabrowskiahh sorry15:39
jrosserno because ironic != inspector15:39
damiandabrowski"when the node is booted"15:39
jrossersorry i keep saying this15:39
jrosserinspector config != ironic config15:40
damiandabrowskiyes, but IPA is a part of inspector, right?15:40
jrosserIPA is used by both yes i believe15:41
damiandabrowskiahhh ok15:42
jrosserironic-python-agent is used as part of the inspection process (i am slightly guessing now) that the relevant inspector parts are dropped into the regular i-p-a thing with dib15:43
jrosserbut i-p-a booted in ramdisk and used to write the actual OS to the physical disk during deployment with ironic, and also to wipe the disks during cleaning etc15:43
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_tempest master: Build wheels only for source isntalls  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/88385715:44
jrosserafaik there is work going on currently in the ironic project to merge inspector and ironic into one singular thing15:44
jrosserdamiandabrowski: i think that the reason you see a static config for the callback URL in the inspector case in the config file OSA writes is that nothing at all is known about the node at that point, the idea of inspector is to do discovery from zero, so you have to start somewhere15:46
damiandabrowskiahh thanks, it's complicated :| i'll go through ironic docs once again trying to understand how exactly endpoint_override affects IPA15:46
jrosserbut for an already enrolled node that is being deployed later using ironic then there are * options for how it happens, so the actual config to pxeboot (and somewhere in there goes the callback url) are written out dynamically15:47
jrosserunfortunately it also cleans these files up afterwards so i can't just grab an old one and take a look15:47
jrosserdamiandabrowski: we need to improve the CI here and add "virtualbmc" to our ironic tests to allow a virtual baremetal node to be booted15:49
spatelnoonedeadpunk This is very odd, I have upgraded 3 production cluster without error but today on 4th one hitting lots of error and strange issue 15:49
spatelPlaybook stuck here - TASK [galera_server : Install galera role remote packages (apt)] 15:49
opendevreviewMerged openstack/openstack-ansible stable/zed: Bump down etcd version for zun  https://review.opendev.org/c/openstack/openstack-ansible/+/88371515:50
opendevreviewMerged openstack/openstack-ansible master: Rename container_address to management_address  https://review.opendev.org/c/openstack/openstack-ansible/+/88314815:50
jrosserspatel: playbooks can get stuck like that if you have CTRL-C them at some point before and leave a python process behind holding the lock on the apt database15:50
damiandabrowskijrosser: ack, thanks for your patience!15:51
spateljrosser I didn't do any CTRL+C yet.. just trying to look into logs and see if i can find something 15:52
spatelJust test "apt update" in galera container and it works fine.. 15:52
jrosserok15:53
spatelwhere are ansible logs? 15:53
spateloh its on deployment node 15:54
spatelno activity happening in /var/log/apt/term.log on galera container15:55
spatelcontainer saying mysql is missing.. haha15:56
spatellet me reboot container and see15:56
spatelvery odd that it has all mariadb packages install but still not able to find mysql command 15:59
spateljrosser looks like galera-2 node is in a wonky state, it has packages half-installed and now i am not able to remove them even though i can see them 16:05
spatelis it safe to delete container ?16:05
spateland re-run - openstack-ansible setup-infrastructure.yml -e 'galera_upgrade=true' -e 'rabbitmq_upgrade=true' -e package_state=latest16:05
spatelhttps://paste.opendev.org/show/bkzqzhzMDNKsaWE4DyyI/16:06
damiandabrowskijrosser: found it! that's how ironic fetches API URL for IPA:16:07
damiandabrowskihttps://github.com/openstack/ironic/blob/69c8977e5094403ff8a06c1fdbac381ebfebc56a/ironic/drivers/modules/deploy_utils.py#L66816:07
damiandabrowskithen its passed as a ipa-api-url kernel parameter(so you were right about temporary files in /httpboot)16:08
damiandabrowskiPS. it's only used when deploying node16:08
jrosserand here is the other end of that in the agent https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/config.py#L30-L3716:08
damiandabrowskivictory \o/16:09
NeilHanlon#teamwork 16:14
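As a rough illustration of the agent side of this, the sketch below pulls `ipa-api-url` out of a kernel command line. `parse_kernel_cmdline` is a hypothetical helper and the sample command line and address are made up; the real agent's parsing may differ (for example in how it handles quoting).

```python
def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters.

    Hypothetical helper: `key=value` tokens map key to value, bare
    flags map to None. Quoted values are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Made-up command line; only the `ipa-api-url` name comes from the discussion.
sample = "quiet ipa-api-url=http://10.0.0.5:6385"
params = parse_kernel_cmdline(sample)
print(params.get("ipa-api-url"))
```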
spatelQuick question how do I re-add fresh node in galera cluster? I have removed all package on node-2 but playbook doesn't like that 16:18
spatelshould i delete container and re-run play ?16:18
jrosserthere are instructions for this in the guides16:25
jrosserhttps://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html16:26
spateljrosser I am reading that but in my case its whole container is dead and if i am re-running playbook then getting error that bring up mysql but i don't have mysql on that container 16:30
spatelTrying to understand should I be going - openstack-ansible galera-install.yml --tags galera-bootstrap 16:31
spatelor just openstack-ansible galera-install.yml  will re-build my node-2 and add into existing node. 16:32
spatelThis is prod so taking baby step :(16:32
damiandabrowskispatel: if it's not primary galera node(`galera_cluster_members[0]`), I'd remove /var/lib/mysql on a broken container and just run `openstack-ansible galera-install.yml -l <broken_galera_container>`16:51
spateldamiandabrowski how do i check that? This is node-2 (infra-2)16:52
damiandabrowskicd /opt/openstack-ansible; ansible -m ping galera_all[0]16:54
spatelostack-sin-infra-1-1_galera_container-dbb207e816:56
spatelIn my case node-1/3 is working one. 16:56
spatelnode-2 is dead16:56
damiandabrowskiok, feel free to run openstack-ansible galera-install.yml -l node-2 then16:56
damiandabrowskibut to ensure that galera won't have any troubles with state transfer, I'd remove /var/lib/mysql content beforehand16:57
spatelLet me understand that means it will use node-1 to do SST to node-2 correct?16:57
damiandabrowski(only on node-2 ofc)16:57
spatelok16:57
damiandabrowskihmm, IIRC donor is chosen randomly16:59
damiandabrowskibut I'm not 100% sure16:59
spateldamiandabrowski I have one more question after upgrade why we keep older packages hanging there? - https://paste.opendev.org/show/bOw5MabaWbAEcJl0hnQo/17:00
spatel10.5 is wallaby and 10.6 is Xena 17:01
noonedeadpunkLooks like 883857 is fixing distro jobs17:02
damiandabrowskihmm i'm not sure tbh, probably because it doesn't hurt to still have them17:05
damiandabrowskibut since galera does not support downgrades, they are not really useful17:05
noonedeadpunkspatel: we actually should not...17:05
noonedeadpunkI'd say that's likely a bug as packages we remove should be here: https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/vars/debian.yml#L60-L6317:06
noonedeadpunkbut wait....17:06
noonedeadpunkwhat older packages?17:06
noonedeadpunkrc means these are removed from the system17:07
damiandabrowskiah, i was looking for 'apt' instead of 'package' in our code :D so noonedeadpunk is right17:07
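For reference, the `rc` status mentioned above is the first column of `dpkg -l` output: desired state "remove", current state "config-files", i.e. the package is removed and only its configuration remains. A small sketch that filters such lines; `removed_but_configured` is a hypothetical helper and the sample lines are invented, with placeholder versions.

```python
def removed_but_configured(dpkg_lines):
    """Return package names whose `dpkg -l` status column is `rc`."""
    names = []
    for line in dpkg_lines:
        fields = line.split()
        # First column is the two-letter status, second is the package name.
        if len(fields) >= 2 and fields[0] == "rc":
            names.append(fields[1])
    return names


# Invented sample in the shape of `dpkg -l` rows; versions are placeholders.
sample = [
    "ii  mariadb-server-10.6  1:10.6.x  amd64  MariaDB database server",
    "rc  mariadb-server-10.5  1:10.5.x  amd64  MariaDB database server",
]
print(removed_but_configured(sample))
```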
noonedeadpunksweet, 883550 almost passing as well17:09
spateldamiandabrowski I am getting this error when i run playbook -l node-2 17:10
spatelhttps://paste.opendev.org/show/byr2B536kxRVuZxzc8LD/17:10
spatelThis is rabbit hole :(17:13
spatellet me destroy container that is all left at this point 17:14
spatelI believe i should do openstack-ansible galera-install.yml (instead -l node-2)17:15
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Ensure OVN is restarted on package update  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/88374017:17
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Drop OVN package installation from ovn_config  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/88385117:18
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Switch ubuntu upgrade jobs to Jammy  https://review.opendev.org/c/openstack/openstack-ansible/+/87989017:18
noonedeadpunkno, you totally should not use limit here17:20
spatelI have destroy whole container and now re-running setup-host + galera-install.yml 17:22
NeilHanlonbtw i'm around if anything needs a +2 (especially if I already reviewed it ;) ). will check on review dashboard periodically17:22
NeilHanlonhappy monday :) 17:22
damiandabrowskinoonedeadpunk: weird, i was pretty sure that galera role supports limits O.o distro upgrade guide also suggests to run it with limits: 17:25
damiandabrowskihttps://docs.openstack.org/openstack-ansible/latest/admin/upgrades/distribution-upgrades.html#:~:text=limit%20localhost%2Creinstalled_host*-,openstack%2Dansible%20setup%2Dinfrastructure.yml%20%2D%2Dlimit%20localhost%2Crepo_all%2Crabbitmq_all%2Creinstalled_host*,-openstack%2Dansible%20setup17:25
noonedeadpunkoh, huh, I might be wrong then...17:26
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Implement support for octavia-ovn-provider driver  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/86846217:32
spateldamiandabrowski noonedeadpunk that works... all looking good now18:09
spatelwe should add this kind of scenario in doc but again.. we can't add all use case...:( 18:10
spatelThank you!!! 18:10
opendevreviewMerged openstack/openstack-ansible-ceph_client master: Add immutable object cache documentation  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/88283018:24
opendevreviewMerged openstack/openstack-ansible-ceph_client master: Add config and documentation for ceph perisistent write log cache  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/88284919:36
opendevreviewMerged openstack/openstack-ansible-ceph_client master: Add releasenote for ceph immutable object cache and persistent write log cache  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/88285019:36
noonedeadpunkso, looks like all obvious issues are covered now20:57
noonedeadpunkcouple of rechecks needed due to intermittent issues21:02
noonedeadpunkI've updated https://etherpad.opendev.org/p/osa-antelope-leftover-patches with patches just in case21:02
jrosserfeels like we continually (like every few years) mess with the distro install for tempest21:02
noonedeadpunkyeah, when I was looking at the role I was wondering how it worked to be frank21:04
noonedeadpunkas it was more like lucky co-incidence21:04
jrosseridk if it is fixed now but there never used to be tempest plugins packaged for ubuntu21:04
noonedeadpunkno, it's not21:04
noonedeadpunkfor ubuntu it's still source install21:04
noonedeadpunkthough, we shouldn't try to build wheels in this case21:04
noonedeadpunkI'm not sure though if we should try to build wheels for tempest at all though21:05
noonedeadpunkas playbook run it against utility_all[0] always21:05
noonedeadpunkso wheels in this case kinda waste of time21:05
jrosserso here https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/883857/121:05
jrosserisnt `venv_build_host != inventory_hostname` always false when there is no repo server21:06
noonedeadpunkI was thinking to hardcode "no" there...21:06
jrosserright - so this is to make the code more obvious then, rather than change behaviour?21:06
noonedeadpunkvenv_build_host was failing to calculate21:06
noonedeadpunkdue to our changes21:06
jrosserahha i see21:06
noonedeadpunkwith 'dict object' has no attribute 'ubuntu_22.04_x86_64'21:07
jrosserhrrm we probably miss a default somewhere21:07
jrosserbut anyway21:07
noonedeadpunkBut what default that would be...21:07
noonedeadpunkas venv_build_targets should be empty dict I guess21:09
noonedeadpunkbut yeah, anyway21:10
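A loose sketch of the failure mode discussed above, in plain Python rather than Jinja. It is not the role's actual logic: `pick_build_host` and the fallback to the host itself are assumptions made for illustration. Indexing an empty mapping with the distro/arch key fails, while a lookup with a default falls back to the host itself, which makes the "build host differs from this host" check false.

```python
def pick_build_host(build_targets, distro_arch, inventory_hostname):
    """Hypothetical helper: choose a wheel build host.

    Falls back to the host itself when no build target is known for
    this distro/arch, instead of failing on a missing key.
    """
    return build_targets.get(distro_arch, inventory_hostname)


build_targets = {}  # assumed state when there is no repo server
host = "utility-container-1"  # made-up inventory hostname

try:
    build_targets["ubuntu_22.04_x86_64"]
except KeyError as exc:
    print("direct lookup fails:", exc)

build_host = pick_build_host(build_targets, "ubuntu_22.04_x86_64", host)
# With the fallback, the host builds for itself and the check is False.
print(build_host != host)
```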
