bbezak | Morning, kevko, frickler: please review https://review.opendev.org/c/openstack/kolla-ansible/+/905858 - the change that tests it looks good: https://review.opendev.org/c/openstack/kolla/+/904576 | 08:09 |
---|---|---|
frickler | bbezak: thx for the reminder, +3. I also notice that there seems to be a severe lack of debian coverage, will bump that on my todo list once again | 08:14 |
bbezak | thx | 08:17 |
opendevreview | Darin Chakalov proposed openstack/kolla-ansible master: Enables Cinder Volume Image Caching. https://review.opendev.org/c/openstack/kolla-ansible/+/906029 | 08:32 |
opendevreview | Merged openstack/kolla-ansible master: use docker_custom_config override for Kolla CI upgrade jobs https://review.opendev.org/c/openstack/kolla-ansible/+/905858 | 09:47 |
SvenKieske | \o/ | 09:58 |
opendevreview | Grzegorz Koper proposed openstack/kolla-ansible master: Configure nova-compute to support exposing vendordata over configdrive https://review.opendev.org/c/openstack/kolla-ansible/+/905843 | 10:31 |
kevko | \o/ | 10:31 |
opendevreview | Grzegorz Koper proposed openstack/kolla-ansible master: Configure missing nova services to support exposing vendordata over configdrive https://review.opendev.org/c/openstack/kolla-ansible/+/905843 | 10:32 |
HighLow | Guys, I'm having a hard time fixing an issue with instance reboot on a fresh deployment of kolla-ansible for 2023.2. Is this the right forum to ask for some assistance? | 10:55 |
kevko | HighLow: it's not ..but go ahead, we will try to help you | 10:55 |
HighLow | Thank you | 10:55 |
HighLow | Here is my env: vanilla Ubuntu 22.04 spun up with MAAS, 3 controller VMs, 3 compute and 1 neutron. | 10:56 |
SvenKieske | kevko: why is it not? | 10:56 |
HighLow | Uploaded nova images and created a VM with image - Worked good | 10:56 |
kevko | SvenKieske: well, this is kolla irc channel ..not openstack-nova channel ... but as I said ... we will help :) | 10:57 |
HighLow | I use Netapp backend with iscsi and multipath enabled. | 10:57 |
HighLow | When the VM instance is up, its properly showing both multipath as active and enabled on OS layer in compute. | 10:58 |
SvenKieske | kevko: I bet openstack-nova would say "this is not kolla channel, ask there" ;) | 10:58 |
HighLow | When I issue openstack reboot, it just gets stuck forever. The volume gets unmapped on the storage end, and the unmap was sent from the openstack compute node as per the storage log. | 10:58 |
HighLow | https://bugs.launchpad.net/kolla-ansible/+bug/2049574 If possible can you please check this link | 10:59 |
HighLow | When I use a bootable volume and spin up the image, it reboots as expected | 11:00 |
kevko | SvenKieske: Well, he is trying to solve a problem with instance reboot ...and as you can see ..he is asking for specific NetApp driver ... I think openstack-nova is definitely better channel than kolla | 11:00 |
HighLow | Reboot issues happens on LVM as well, saying device not found | 11:01 |
SvenKieske | kevko: might be, might not be; in my experience you get more help from the deployment project channels because there are actual operators there who know how to fix and diagnose stuff in prod envs. if it's code related I agree though. | 11:02 |
HighLow | Docker is 24.0.4 | 11:02 |
SvenKieske | HighLow: did you use the "--wait" flag also? | 11:02 |
HighLow | I tried once, but it stuck and after sometime, the instance moves to Error state | 11:02 |
HighLow | I can share my screen if required, if I did not get my point across clearly. Noobie here, so please be patient with me LOL | 11:03 |
kevko | HighLow: well, i don't have experience with netapp actually .. did you switch to debug ? to see more logs ? | 11:05 |
HighLow | 9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] [instance: 9f2931f0-2752-457e-8943-f0a48a509e3c] Booting with volume-backed-image ffeca47e-d6e0-41c3-865c-8e2673e45009 at /dev/vda 2024-01-18 10:13:30.242 7 INFO oslo.privsep.daemon [req-5c56425d-5f9b-44cc-81f8-821d9ce88468 req-8cc0708c-bda1-4a4c-a07c-93b6e263d7c9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] Running | 11:06 |
SvenKieske | what I can say is, that we don't have, in fact, automated tests that utilize "openstack server reboot" in any form, so that might be worthwhile to add? | 11:06 |
HighLow | 024-01-18 10:13:31.210 7 WARNING os_brick.initiator.connectors.nvmeof [req-5c56425d-5f9b-44cc-81f8-821d9ce88468 req-8cc0708c-bda1-4a4c-a07c-93b6e263d7c9 47508d32186a45b1bcff2a5e30e1b92e 99c5d5886e9040558a9486fb26af942a - - default default] Process execution error in _get_host_uuid: Unexpected error while running command. Command: blkid overlay -s UUID -o value Exit code: 2 Stdout: '' Stderr: '': oslo_concurrency.processutils.ProcessExecutionErro | 11:06 |
HighLow | SvenKieske: any idea why it's trying to access this file - /sys/module/nvme_core/parameters/multipath | 11:06 |
HighLow | I'm only using iscsi | 11:06 |
SvenKieske | HighLow if you post many lines of logs it might be better to use https://paste.opendev.org for that | 11:07 |
kevko | HighLow: currently checking the code if it is relevant .. | 11:07 |
SvenKieske | that seems to be a function call in os-brick project. maybe it's built lazy and just checks all multipath stuff for everything it knows, even if not necessary? | 11:08 |
HighLow | Thank you - Here you go - https://paste.opendev.org/show/bOAbb3rQ4fBIUEPj7nW7/ This is what happens during the VM creation backed by Image | 11:08 |
SvenKieske | https://docs.openstack.org/os-brick/latest/reference/os_brick/initiator/connector.html states "The connectors here are responsible for discovering and removing volumes for each of the supported transport protocols." | 11:08 |
HighLow | And it tries to run this command - Command: blkid overlay -s UUID -o value and errors out with generic unknown error | 11:09 |
kevko | HighLow: this is already fixed i think ...but it shouldn't be related | 11:09 |
HighLow | The weird thing I can't wrap my head around is somehow the openstack compute node is sending an unmap command to the netapp lun | 11:10 |
HighLow | When I ask my storage buddies to map the lun manually, voila it reboots successfully | 11:11 |
SvenKieske | kevko: this looks related? https://bugs.launchpad.net/os-brick/+bug/1919132 | 11:11 |
HighLow | and subsequent reboots are good | 11:11 |
SvenKieske | in my experience multipath is often buggy, because few people actually use it and even fewer people report bugs and fix them when stuff goes wrong. | 11:12 |
SvenKieske | so you already stand out, HighLow :D | 11:12 |
HighLow | Looks like it @SvenKieske. But only difference is OP of that bug is using FC, while I use Iscsi | 11:12 |
SvenKieske | yeah, but I doubt it's only relevant for FC, like stated in the bug, but I didn't really look into the code (I'm no os-brick expert) | 11:13 |
HighLow | I have used Yoga back in the day for a POC that went nowhere, dont remember seeing this problem. That was with netapp backend as well | 11:13 |
HighLow | So it looks to me like only the recent versions are affected. | 11:14 |
HighLow | And there are soooo many deprecation warnings in os-brick | 11:15 |
kevko | SvenKieske: it's for FC .. | 11:15 |
kevko | HighLow: did you try to use latest os-brick ? | 11:17 |
SvenKieske | yeah, but if FC has broken multipath which seems to never have been fixed, chances are higher than zero that iscsi is also broken? :D | 11:17 |
HighLow | No, how do I change that ? | 11:18 |
HighLow | does os-brick run in a container or on the host os | 11:18 |
HighLow | ? | 11:18 |
kevko | HighLow: everything related to openstack runs in containers | 11:18 |
kevko | in kolla deployment | 11:18 |
HighLow | understood, but I'm not really sure how to upgrade os-brick though :( | 11:19 |
kevko | HighLow: well, rewrite the openstack-base and build new images locally | 11:20 |
HighLow | got it, I will try to do this.. | 11:21 |
kevko | HighLow: check my review for pycadf -> i am installing from source instead of pip -> https://review.opendev.org/c/openstack/kolla/+/904576 ...you can do the same with os-brick ... | 11:21 |
kevko | HighLow: because if I check the os-brick git ..there are some fixes with multipath | 11:22 |
HighLow | sorry, I'm a bit lost here | 11:23 |
kevko | HighLow: I am afraid that's the only thing I can help you with ... don't have experience with netapp | 11:23 |
HighLow | do you mean I should install the os-brick from source, in the dockerfile? | 11:24 |
HighLow | and build image from that? | 11:24 |
kevko | HighLow: 1. you can build your images from edited code in kolla ... 2. you can go into the nova_compute container ... source the kolla venv located in /var/lib/kolla/venv ; git checkout os-brick master, for example ..and pip3 install -e /path-to-os-brick-checkout | 11:25 |
kevko | HighLow: yes | 11:25 |
HighLow | I checked with the netapp guys, they are pointing out that openstack is unmapping the lun. Provided more cinder logs to them, to see if they can help from the netapp side | 11:25 |
kevko | os-brick should be OK to upgrade | 11:25 |
HighLow | I will do the upgrade inside container for now and let you know shortly | 11:26 |
HighLow | doing it right away | 11:26 |
SvenKieske | ah well, you might want to test that; or is this a test environment? :) | 11:27 |
HighLow | This is a POC lab, so no problem, nothing critical runs here | 11:28 |
SvenKieske | cool, you can also look at the kolla docs on how to customize your setup (in this case using a different os-brick version) | 11:29 |
SvenKieske | https://docs.openstack.org/kolla/latest/admin/image-building.html#build-openstack-from-source | 11:29 |
SvenKieske | that might be new to you if you don't already build your own container images. it's a little work, but it's a good thing to do if you plan for production usage anyway | 11:30 |
HighLow | yes, if it works, I will do this for sure | 11:30 |
HighLow | pip install -e git+https://github.com/openstack/os-brick/@stable/2023.2 | 11:31 |
SvenKieske | please update the bug report if you find a solution/os-brick is the culprit. people tend to forget that if it works ;) | 11:31 |
HighLow | Is that the correct command | 11:32 |
greatgatsby | Good day. We've started our yoga -> zed upgrade, and are looking at the local patches we apply in yoga that we'll need to port to zed. Can someone confirm there is nothing in zed that solves this ticket related to graceful l3 agent restarts? https://review.opendev.org/c/openstack/kolla-ansible/+/874769 | 11:35 |
SvenKieske | HighLow: I'm not sure what the "-e" option does there, my man page doesn't reference it? The rest looks fine though. | 11:36 |
greatgatsby | we put together our own hacky solution when we discovered a couple tickets related to this problem, and it's been fine so far, but from the looks of it, there's nothing in zed that solves network connectivity to VMs getting dropped during a deploy? | 11:36 |
kevko | HighLow: yes it is, if you want 2023.2 ...but activate the kolla venv first, so source /var/lib/kolla/venv/bin/activate i think | 11:36 |
HighLow | that did not work, I did this, git cloned it and installed it from the local folder | 11:36 |
greatgatsby | -e is for editable, as in you want to be able to edit it in place, which is usually for dev purposes | 11:37 |
HighLow | os-brick==6.4.0 (earlier) Successfully installed os-brick-6.6.0 | 11:37 |
HighLow | aaahhh ok I get it now | 11:37 |
SvenKieske | greatgatsby: yes, this is not fixed. I happen to know this patchset fairly well. you could apply it locally I guess, as I happened to run this in prod in the past | 11:37 |
greatgatsby | SvenKieske: excellent, thanks | 11:38 |
SvenKieske | greatgatsby: notice though that it also does not really fix the problem at its core, but it's a better solution than the current upstream one in master branch. there is a plan to eventually fix this completely but nobody has really had time to do this yet | 11:38 |
SvenKieske | greatgatsby: if you have a better patch, just post it ;) | 11:38 |
HighLow | Sven, stay here with me for a second, I will try to schedule the vm now and update you | 11:39 |
greatgatsby | I don't think ours is better and likely wouldn't work in all deployments, we end up making the l3 agent restart handler just loop and call a separate tasks file | 11:39 |
HighLow | Should I restart the container ? Sven | 11:41 |
kevko | HighLow: you need to | 11:41 |
SvenKieske | yeah, I liked the approach in the above change, an ex colleague invented it. the ultimate goal will be afaik to start the processes in different containers | 11:41 |
kevko | +1 | 11:42 |
HighLow | Restarted the container on one of my computes and scheduled a vm creation with a rocky image now | 11:44 |
greatgatsby | ok - thanks again, just wanted to be 100% sure we didn't miss something before I start patching files | 11:44 |
kevko | really cool patch | 11:45 |
kevko | i will write it to my todo list probably :D | 11:45 |
SvenKieske | well the inline python is really ugly | 11:45 |
HighLow | It's still doing this - blkid overlay -s UUID -o value and errors out.. | 11:45 |
kevko | yeah - but the idea is nice | 11:45 |
kevko | HighLow: blkid is fixed somewhere ..but it's not related i think | 11:45 |
kevko | HighLow: let me check where it's fixed | 11:45 |
SvenKieske | HighLow: mhmm too bad, please add this information to the bug report, but I guess we should switch this bug report over to os-brick, it looks like a multipath issue to me | 11:46 |
opendevreview | Merged openstack/kolla-ansible master: Drop more remnants of install_type https://review.opendev.org/c/openstack/kolla-ansible/+/905880 | 11:46 |
HighLow | It's still the same reboot problem. Instance stuck at hard reboot, multipath is in fail state. | 11:46 |
HighLow | Only happens with image based instance | 11:47 |
HighLow | Ohhh wait a min | 11:49 |
HighLow | @Sven, That might have worked out. I'll retry this again and report back | 11:51 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Fix neutron DNS integration https://review.opendev.org/c/openstack/kolla-ansible/+/905852 | 11:53 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate https://review.opendev.org/c/openstack/kolla-ansible/+/905644 | 11:53 |
HighLow | No, it did not work, the stale multipath device is still present. But when I use this now orphaned image and spin up a new vm, the old vm is also active. The Volume page says the volume is now attached to the new vm, but it has io errors, while the old instance that just became active has no errors | 11:58 |
HighLow | (Head Scratching moment really ) @SvenKieske | 11:59 |
HighLow | Updated the bug report | 12:04 |
HighLow | It's as if the multipath connection is not terminated properly in this case .. | 12:07 |
*** priteau_ is now known as priteau | 12:14 | |
Core7908 | . | 12:36 |
opendevreview | Rafal Lewandowski proposed openstack/kolla-ansible master: Enable ML2/OVN and distributed FIP by default https://review.opendev.org/c/openstack/kolla-ansible/+/904959 | 12:41 |
wuchunyang | Hi, cores, help me review this trove commit, thanks in advance. https://review.opendev.org/c/openstack/kolla-ansible/+/863321 | 12:59 |
opendevreview | Grzegorz Koper proposed openstack/kolla-ansible master: Configure missing nova services to expose vendordata over configdrive https://review.opendev.org/c/openstack/kolla-ansible/+/905843 | 13:05 |
opendevreview | Merged openstack/kolla master: Fix openstack CADF audit maps and installation https://review.opendev.org/c/openstack/kolla/+/904576 | 13:19 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario https://review.opendev.org/c/openstack/kolla-ansible/+/836941 | 14:24 |
kevko | just curious how big is your openstacks ? | 14:26 |
kevko | *are | 14:27 |
Core5633 | kevlo: are you asking me? | 14:49 |
SvenKieske | kevko: well that depends, various customers, with various sizes.. from a handful of hosts to large data centers. and yours? | 15:28 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario https://review.opendev.org/c/openstack/kolla-ansible/+/836941 | 16:24 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: Fix neutron DNS integration https://review.opendev.org/c/openstack/kolla-ansible/+/905852 | 16:35 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate https://review.opendev.org/c/openstack/kolla-ansible/+/905644 | 16:35 |
opendevreview | Roman Krček proposed openstack/kolla-ansible master: Split ipv4 and ipv6 systemctl config https://review.opendev.org/c/openstack/kolla-ansible/+/905831 | 17:27 |
kevko | SvenKieske: 4291 regular vms, 938 amphoras, 112 computes, 21660VCPUs 60TB ram | 17:31 |
kevko | I was just wondering if it is small medium big compared to others :D | 17:31 |
spatel | kevko :O | 17:32 |
spatel | kevko: what are you guys using for monitoring and logging? | 17:33 |
spatel | I am also looking for that solution | 17:33 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate https://review.opendev.org/c/openstack/kolla-ansible/+/905644 | 18:02 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate https://review.opendev.org/c/openstack/kolla-ansible/+/905644 | 21:27 |
opendevreview | Michal Arbet proposed openstack/kolla-ansible master: [CI] Test neutron DNS integration and designate https://review.opendev.org/c/openstack/kolla-ansible/+/905644 | 22:25 |
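
In the log, HighLow reports os-brick 6.4.0 before the in-container upgrade and 6.6.0 after it. A minimal sketch of how that check could be scripted follows. It assumes it is run with the Python interpreter of the kolla venv (/var/lib/kolla/venv, as mentioned by kevko) inside the nova_compute container. The helper names `installed_version`, `release_tuple` and `is_upgraded` are hypothetical and not part of kolla or os-brick. Only the standard library is used.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn '6.4.0' into (6, 4, 0); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_upgraded(before, after):
    """True if 'after' is a newer release than 'before' (numeric parts only)."""
    return release_tuple(after) > release_tuple(before)


if __name__ == "__main__":
    # "os-brick" is the distribution name quoted in the log (os-brick==6.4.0).
    found = installed_version("os-brick")
    print("os-brick:", found or "not installed in this environment")
```

The comparison deliberately ignores suffixes such as `.dev3`, so it is only a rough sanity check that the venv picked up the new package. As kevko notes in the log, the container still needs a restart before the running service uses it.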