Thursday, 2024-01-18

bbezakMorning, kevko, frickler: please review - https://review.opendev.org/c/openstack/kolla-ansible/+/905858, the change that is testing it looks good: https://review.opendev.org/c/openstack/kolla/+/90457608:09
fricklerbbezak: thx for the reminder, +3. I also notice that there seems to be a severe lack of debian coverage, will bump that on my todo list once again08:14
bbezakthx08:17
opendevreviewDarin Chakalov proposed openstack/kolla-ansible master: Enables Cinder Volume Image Caching.  https://review.opendev.org/c/openstack/kolla-ansible/+/90602908:32
opendevreviewMerged openstack/kolla-ansible master: use docker_custom_config override for Kolla CI upgrade jobs  https://review.opendev.org/c/openstack/kolla-ansible/+/90585809:47
SvenKieske\o/09:58
opendevreviewGrzegorz Koper proposed openstack/kolla-ansible master: Configure nova-compute to support exposing vendordata over configdrive  https://review.opendev.org/c/openstack/kolla-ansible/+/90584310:31
kevko\o/10:31
opendevreviewGrzegorz Koper proposed openstack/kolla-ansible master: Configure missing nova services to support exposing vendordata over configdrive  https://review.opendev.org/c/openstack/kolla-ansible/+/90584310:32
HighLowGuys, I'm having a hard time fixing an issue with instance reboot on a fresh deployment of kolla-ansible for 2023.2. Is this the right forum to ask for some assistance?10:55
kevkoHighLow: it's not ..but go ahead, we will try to help you 10:55
HighLowThank you 10:55
HighLowHere is my env: vanilla Ubuntu 22.04 spun up with MaaS, 3 controller VMs, 3 compute and 1 neutron. 10:56
SvenKieskekevko: why is it not?10:56
HighLowUploaded nova images and created a VM with image - Worked good10:56
kevkoSvenKieske: well, this is kolla irc channel ..not openstack-nova channel ... but as I said ... we will help :) 10:57
HighLowI use Netapp backend with iscsi and multipath enabled. 10:57
HighLowWhen the VM instance is up, it's properly showing both multipath as active and enabled at the OS layer on the compute node.10:58
SvenKieskekevko: I bet openstack-nova would say "this is not kolla channel, ask there" ;)10:58
HighLowWhen I issue openstack reboot, it just gets stuck forever. The volume gets unmapped on the storage end, and per the storage log the unmap was sent from the openstack compute node. 10:58
HighLowhttps://bugs.launchpad.net/kolla-ansible/+bug/2049574 If possible can you please check this link 10:59
HighLowWhen I use a bootable volume and spin up the image, it reboots as expected11:00
kevkoSvenKieske: Well, he is trying to solve a problem with instance reboot ...and as you can see ..he is asking for specific NetApp driver ...  I think openstack-nova is definitely better channel than kolla11:00
HighLowThe reboot issue happens on LVM as well, saying device not found11:01
SvenKieskekevko: might be, might not be; in my experience you get more help from the deployment project channels because there are actual operators there who know how to fix and diagnose stuff in prod envs. if it's code related I agree though.11:02
HighLowDocker is 24.0.411:02
SvenKieskeHighLow: did you use the "--wait" flag also?11:02
HighLowI tried once, but it got stuck and after some time the instance moved to Error state11:02
HighLowI can share my screen if required, if I did not get my point across clearly. Noobie here, so please be patient with me LOL11:03
kevkoHighLow: well, i don't have experience with netapp actually .. did you switch to debug ? to see more logs ? 11:05
HighLow9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] [instance: 9f2931f0-2752-457e-8943-f0a48a509e3c] Booting with volume-backed-image ffeca47e-d6e0-41c3-865c-8e2673e45009 at /dev/vda 2024-01-18 10:13:30.242 7 INFO oslo.privsep.daemon [req-5c56425d-5f9b-44cc-81f8-821d9ce88468 req-8cc0708c-bda1-4a4c-a07c-93b6e263d7c9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] Running 11:06
SvenKieskewhat I can say is, that we don't have, in fact, automated tests that utilize "openstack server reboot" in any form, so that might be worthwhile to add?11:06
HighLow024-01-18 10:13:31.210 7 WARNING os_brick.initiator.connectors.nvmeof [req-5c56425d-5f9b-44cc-81f8-821d9ce88468 req-8cc0708c-bda1-4a4c-a07c-93b6e263d7c9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] Process execution error in _get_host_uuid: Unexpected error while running command. Command: blkid overlay -s UUID -o value Exit code: 2 Stdout: '' Stderr: '': oslo_concurrency.processutils.ProcessExecutionErro11:06
HighLowAny idea why its trying to access this file - SvenKieske /sys/module/nvme_core/parameters/multipath11:06
HighLowI'm only using iSCSI11:06
SvenKieskeHighLow if you post many lines of logs it might be better to use https://paste.opendev.org for that11:07
kevkoHighLow:  currently checking the code if it is relevant ..11:07
SvenKieskethat seems to be a function call in os-brick project. maybe it's built lazy and just checks all multipath stuff for everything it knows, even if not necessary?11:08
HighLowThank you - Here you go - https://paste.opendev.org/show/bOAbb3rQ4fBIUEPj7nW7/ This is what happens during the VM creation backed by Image11:08
SvenKieskehttps://docs.openstack.org/os-brick/latest/reference/os_brick/initiator/connector.html states "The connectors here are responsible for discovering and removing volumes for each of the supported transport protocols."11:08
HighLowAnd it tries to run this command - Command: blkid overlay -s UUID -o value and errors out with generic unknown error11:09
kevkoHighLow: this is already fixed i think ...but it shouldn't be related 11:09
HighLowThe weird thing I can't wrap my head around is that somehow the openstack compute node is sending an unmap command to the netapp lun11:10
HighLowWhen I ask my storage buddies to map the lun manually, voila it reboots successfully 11:11
SvenKieskekevko: this looks related? https://bugs.launchpad.net/os-brick/+bug/191913211:11
HighLowand subsequent reboots are good11:11
SvenKieskein my experience multipath is often buggy, because few people actually use it and even less people report bugs and fix them when stuff goes wrong.11:12
SvenKieskeso you already stand out, HighLow :D11:12
HighLowLooks like it @SvenKieske. But only difference is OP of that bug is using FC, while I use Iscsi11:12
SvenKieskeyeah, but I doubt it's only relevant for FC, like stated in the bug, but I didn't really look into the code (I'm no os-brick expert)11:13
HighLowI have used Yoga back in the day for a POC that went nowhere, dont remember seeing this problem. That was with netapp backend as well11:13
HighLowSo it looks to me like only the recent versions are affected.11:14
HighLowAnd there are soooo many deprecation warnings in os-brick11:15
kevkoSvenKieske: it's for FC ..11:15
kevkoHighLow: did you try to use latest os-brick ? 11:17
SvenKieskeyeah, but if FC has broken multipath which seems to never have been fixed, chances are higher than zero that iscsi is also broken? :D11:17
HighLowNo, how do I change that ?11:18
HighLowos-brick run on container or on the host os11:18
HighLow?11:18
kevkoHighLow: everything related to openstack runs in containers 11:18
kevkoin kolla deployment 11:18
HighLowunderstood, but I'm not really sure how to upgrade os-brick though :(11:19
kevkoHighLow: well, rewrite the openstack-base and build new images locally 11:20
HighLowgot it, I will try to do this.. 11:21
kevkoHighLow: check my review for pycadf -> i am installing from source instead of pip -> https://review.opendev.org/c/openstack/kolla/+/904576 ...you can do the same with os-brick ... 11:21
kevkoHighLow: because if I check the os-brick git ..there are some fixes with multipath11:22
HighLowsorry, I'm a bit lost here11:23
kevkoHighLow: I am afraid that's the only thing I can help you with ... don't have experience with netapp11:23
HighLowdo you mean I should install the os-brick from source, in the dockerfile?11:24
HighLowand build image from that?11:24
kevkoHighLow: 1. you can build your images from edited code in kolla ... 2. you can go into nova_compute container ... source venv for kolla located in /var/lib/kolla/venv ; git checkout os-brick master for example ..and pip3 install -e /path-to-os-brick-checkouted11:25
kevkoHighLow: yes 11:25
HighLowI checked with the netapp guys, they are pointing out that openstack is unmapping the lun. Provided more cinder logs to them, to see if they can help from the netapp side11:25
kevkoos-brick should be OK to upgrade11:25
HighLowI will do the upgrade inside container for now and let you know shortly11:26
HighLowdoing it right away11:26
SvenKieskeah well, you might want to test that; or is this a test environment? :)11:27
HighLowThis is a POC lab, so no problem, nothing critical runs here11:28
SvenKieskecool, you can also look at the kolla docs on how to customize your setup (in this case using a different os-brick version)11:29
SvenKieskehttps://docs.openstack.org/kolla/latest/admin/image-building.html#build-openstack-from-source11:29
SvenKieskethat might be new to you if you don't already build your own container images. it's a little work, but it's a good thing to do if you plan for production usage anyway11:30
HighLowyes, if it works, I will do this for sure11:30
HighLowpip install -e git+https://github.com/openstack/os-brick/@stable/2023.211:31
SvenKieskeplease update the bug report if you find a solution/os-brick is the culprit. people tend to forget that if it works ;)11:31
HighLowIs that the correct command11:32
greatgatsbyGood day.  We've started our yoga -> zed upgrade, and are looking at the local patches we apply in yoga that we'll need to port to zed.  Can someone confirm there is nothing in zed that solves this ticket related to graceful l3 agent restarts?  https://review.opendev.org/c/openstack/kolla-ansible/+/87476911:35
SvenKieskeHighLow: I'm not sure what the "-e" option does there, my man page doesn't reference it? The rest looks fine though.11:36
greatgatsbywe put together our own hacky solution when we discovered a couple tickets related to this problem, and it's been fine so far, but from the looks of it, there's nothing in zed that solves network connectivity to VMs getting dropped during a deploy?11:36
kevkoHighLow: yes it is, if you want 2023.2 ...but source the kolla venv first, so: source /var/lib/kolla/venv/bin/activate i think11:36
HighLowthat did not work, I did this, git cloned it and installed it from the local folder11:36
greatgatsby-e is for editable, as in you want to be able to edit it in place, which is usually for dev purposes11:37
HighLowos-brick==6.4.0 (earlier)  Successfully installed os-brick-6.6.011:37
HighLowaaahhh ok I get it now11:37
SvenKieskegreatgatsby: yes, this is not fixed. I happen to know this patchset fairly well. you could apply it locally I guess, as I happened to run this in prod in the past11:37
greatgatsbySvenKieske: excellent, thanks11:38
SvenKieskegreatgatsby: notice though that it also does not really fix the problem at its core, but it's a better solution than the current upstream one in master branch. there is a plan to eventually fix this completely, but nobody has really had time to do this yet11:38
SvenKieskegreatgatsby: if you have a better patch, just post it ;)11:38
HighLowSven, stay here with me for a second, I will try to schedule the vm now and update you 11:39
greatgatsbyI don't think ours is better and likely wouldn't work in all deployments, we end up making the l3 agent restart handler just loop and call a separate tasks file11:39
HighLowShould I restart the container ? Sven11:41
kevkoHighLow: you need to 11:41
SvenKieskeyeah, I liked the approach in the above change, an ex colleague invented it. the ultimate goal will be afaik to start the processes in different containers11:41
kevko+111:42
HighLowRestarted the container on one of my computes and scheduled a vm creation with rocky image now11:44
greatgatsbyok - thanks again, just wanted to be 100% sure we didn't miss something before I start patching files11:44
kevkoreally cool patch 11:45
kevkoi will write it to my todo list probably :D 11:45
SvenKieskewell the inline python is really ugly11:45
HighLowIt's still doing this - blkid overlay -s UUID -o value and errors out..11:45
kevkoyeah - but the idea is nice 11:45
kevkoHighLow: blkid is somewhere fixed ..but it's not related i think 11:45
kevkoHighLow: let me check where it's fixed 11:45
SvenKieskeHighLow: mhmm too bad, please add this information to the bug report, but I guess we should switch this bug report over to os-brick, it looks like a multipath issue to me11:46
opendevreviewMerged openstack/kolla-ansible master: Drop more remnants of install_type  https://review.opendev.org/c/openstack/kolla-ansible/+/90588011:46
HighLowIt's still the same reboot problem. Instance stuck at hard reboot, multipath is in fail state.11:46
HighLowOnly happens with image-based instances 11:47
HighLowOhhh wait a min11:49
HighLow@Sven, That might have worked out. I'll retry this again and report back11:51
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: Fix neutron DNS integration  https://review.opendev.org/c/openstack/kolla-ansible/+/90585211:53
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate  https://review.opendev.org/c/openstack/kolla-ansible/+/90564411:53
HighLowNo, it did not work, the stale multipath device is still present. But when I use this now orphaned image and spin up a new vm, the old vm is also active. The Volume page says the volume is now attached to the new vm, which has io errors, while the old instance that just became active has no errors11:58
HighLow(Head Scratching moment really ) @SvenKieske11:59
HighLowUpdated the bug report12:04
HighLowIt's as if the multipath connection is not terminated properly in this case ..12:07
*** priteau_ is now known as priteau12:14
Core7908.12:36
opendevreviewRafal Lewandowski proposed openstack/kolla-ansible master: Enable ML2/OVN and distributed FIP by default  https://review.opendev.org/c/openstack/kolla-ansible/+/90495912:41
wuchunyangHi, cores, help me review this trove commit, thanks in advance. https://review.opendev.org/c/openstack/kolla-ansible/+/86332112:59
opendevreviewGrzegorz Koper proposed openstack/kolla-ansible master: Configure missing nova services to expose vendordata over configdrive  https://review.opendev.org/c/openstack/kolla-ansible/+/90584313:05
opendevreviewMerged openstack/kolla master: Fix openstack CADF audit maps and installation  https://review.opendev.org/c/openstack/kolla/+/90457613:19
opendevreviewMartin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario  https://review.opendev.org/c/openstack/kolla-ansible/+/83694114:24
kevkojust curious how big is your openstacks ? 14:26
kevko*are14:27
Core5633kevko: are you asking me?14:49
SvenKieskekevko: well that depends, various customers, with various sizes.. from a handful of hosts to large data centers. and yours?15:28
opendevreviewMartin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario  https://review.opendev.org/c/openstack/kolla-ansible/+/83694116:24
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: Fix neutron DNS integration  https://review.opendev.org/c/openstack/kolla-ansible/+/90585216:35
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate  https://review.opendev.org/c/openstack/kolla-ansible/+/90564416:35
opendevreviewRoman Krček proposed openstack/kolla-ansible master: Split ipv4 and ipv6 systemctl config  https://review.opendev.org/c/openstack/kolla-ansible/+/90583117:27
kevkoSvenKieske: 4291 regular vms, 938 amphoras, 112 computes, 21660VCPUs 60TB ram 17:31
kevkoI was just wondering if it is small medium big compared to others :D 17:31
spatelkevko :O17:32
spatelkevko What are you guys using for monitoring and logging? 17:33
spatelI am also looking for that solution 17:33
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate  https://review.opendev.org/c/openstack/kolla-ansible/+/90564418:02
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate  https://review.opendev.org/c/openstack/kolla-ansible/+/90564421:27
opendevreviewMichal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate  https://review.opendev.org/c/openstack/kolla-ansible/+/90564422:25
