* NeilHanlon took a pass at some reviews while he was avoiding sleeping | 04:14 | |
opendevreview | Merged openstack/openstack-ansible master: [doc] Rename extending-osa page https://review.opendev.org/c/openstack/openstack-ansible/+/915078 | 04:26 |
---|---|---|
opendevreview | Merged openstack/openstack-ansible master: [doc] Document usage of user.rc file https://review.opendev.org/c/openstack/openstack-ansible/+/915076 | 04:30 |
opendevreview | Merged openstack/openstack-ansible-os_trove master: Manage trove images through openstack_resources role https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/918103 | 06:03 |
noonedeadpunk | mornings | 07:28 |
noonedeadpunk | thanks Neil! | 07:28 |
jrosser_ | so the manila trouble is starting with an exception in cinder https://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/host/cinder-api.service.journal-17-03-38.log.txt#2670 | 07:50 |
noonedeadpunk | well | 07:57 |
noonedeadpunk | tempest config does not contain image ids | 07:57 |
noonedeadpunk | https://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/etc/host/tempest/tempest.conf.txt#67-68 | 07:57 |
noonedeadpunk | I think it should be detected by this: https://opendev.org/openstack/openstack-ansible-os_tempest/src/branch/master/tasks/tempest_resources.yml#L202-L217 | 07:58 |
noonedeadpunk | but also this looks weird: https://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/etc/host/tempest/tempest.conf.txt#29 | 07:58 |
noonedeadpunk | so it's quite weird | 08:01 |
noonedeadpunk | on positive note - https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/868462 passed ovn test :) | 08:04 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Implement support for octavia-ovn-provider driver https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/868462 | 08:16 |
noonedeadpunk | andrewbonney: it would be nice to have a release note covering the tags change for https://review.opendev.org/c/openstack/openstack-ansible/+/918615 | 08:26 |
andrewbonney | I think I've added one | 08:28 |
noonedeadpunk | snap | 08:28 |
noonedeadpunk | sorry :D | 08:28 |
andrewbonney | No problem :) | 08:28 |
opendevreview | Merged openstack/openstack-ansible-os_adjutant master: reno: Update master for unmaintained/zed https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/919143 | 08:53 |
jrosser_ | oh well i think in the past we had some thing to use a local image file in CI for tempest | 09:02 |
farbod | Hi, after upgrading from xena to yoga everything was ok except neutron-server and neutron-rpc-server. | 09:22 |
farbod | The problem is that it's using a lot of resources. Actually it uses all the memory and other services go down. | 09:22 |
farbod | here are the logs of neutron-server when I start it: https://paste.opendev.org/show/bMt9mf825ndj81dLa1Z7/ | 09:22 |
farbod | I tried to decrease the threadpool executor size and the workers but nothing changed. | 09:22 |
noonedeadpunk | yeah, adjutant is severely borked as of today.... | 09:25 |
noonedeadpunk | farbod: well, one thing you can try, is to disable uwsgi for neutron | 09:25 |
noonedeadpunk | though I've spotted recently, that just disabling uwsgi won't bring neutron-rpc-server down on its own | 09:26 |
noonedeadpunk | so you can define `neutron_use_uwsgi: False` in user_variables and run openstack-ansible os-neutron-install.yml --limit neutron_server | 09:26 |
noonedeadpunk | once that is done, you can use ad-hoc to stop/disable/mask rpc service | 09:27 |
noonedeadpunk | ie - `cd /opt/openstack-ansible; ansible -m systemd -a "name=neutron-rpc-server state=stopped enabled=false masked=true" neutron_server` | 09:27 |
farbod | Thanks. Let me test it | 09:29 |
jrosser_ | i did notice that neutron were adding some wsgi zuul jobs so hopefully this situation is going to improve | 09:34 |
noonedeadpunk | well, afaik they still don't support ovn | 09:34 |
noonedeadpunk | at least last time I asked, around Caracal branching, it was not supported | 09:35 |
jrosser_ | https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/919725 | 09:35 |
farbod | What do I lose by disabling uwsgi for neutron-server? | 09:47 |
noonedeadpunk | kinda nothing | 09:49 |
noonedeadpunk | just eventlet vs wsgi | 09:49 |
noonedeadpunk | it used to be eventlet for neutron for quite a while, but trend is just to go with wsgi everywhere | 09:50 |
farbod | Does this problem exist in later versions? | 09:51 |
noonedeadpunk | um, don't really know, I'm not sure we've seen issues with resources | 09:54 |
noonedeadpunk | but we have quite heavy hardware for control plane | 09:55 |
farbod | Oh you mean with enough resources I can use uwsgi? | 09:57 |
jrosser_ | i think it means more that we have not seen issues with large amounts of resources needed by neutron server | 09:58 |
noonedeadpunk | yeah, exactly | 09:59 |
farbod | how much e.g? | 09:59 |
jrosser_ | depends what you mean but on one infra node i see neutron-server wanting 50% of one CPU core and ~500M ram | 10:03 |
farbod | So now with 2 workers and 4 thread pool executors on an 8-core, 64GB RAM infra node, my neutron server is using about 20GB of memory and it's growing... | 10:05 |
noonedeadpunk | Um.... Something is very off I'd say... | 10:06 |
noonedeadpunk | Though, I think we're running 2023.1 as of today | 10:07 |
noonedeadpunk | We never stayed on Yoga long enough | 10:07 |
noonedeadpunk | As just did Xena->Yoga->2023.1 right away | 10:07 |
semantic | So, trying to revert https://github.com/openstack/oslo.messaging/commit/fd2381c723fe805b17aca1f80bfff4738fbe9628 makes things even worse in my case, with rabbit-server constantly logging something like that https://paste.opendev.org/show/bMl8Te9hxAvyXrNDEpOy/ | 10:09 |
halali | Hi, it seems that with tag 27.4.2 and an ubuntu-22.04 redeploy, nova-api failed to reconnect to the newly deployed RabbitMQ node https://paste.openstack.org/show/bZqhykeJV8ygwJSQXWiX/ and requires a nova daemon restart | 10:18 |
noonedeadpunk | feels quite alike to what semantic is experiencing | 10:27 |
jrosser_ | i think we might try some more to replicate this | 10:30 |
jrosser_ | though it is clearly a problem between rabbitmq <> oslo.messaging <> nova | 10:30 |
jrosser_ | it is not really at all an openstack-ansible problem, as far as i can see | 10:31 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Define release names in documentation https://review.opendev.org/c/openstack/openstack-ansible/+/919814 | 10:43 |
noonedeadpunk | unless we miss that reporting on failover flag to oslo.messaging | 10:45 |
noonedeadpunk | `enable_cancel_on_failover` | 10:46 |
noonedeadpunk | as that somehow sounds very related at least to me | 10:46 |
noonedeadpunk | https://www.rabbitmq.com/docs/ha#cancellation | 10:47 |
opendevreview | Merged openstack/openstack-ansible master: Deploy horizon by default with metal AIO scenarios https://review.opendev.org/c/openstack/openstack-ansible/+/916005 | 10:48 |
noonedeadpunk | so like - instead of waiting for reply, client will be notified that it has no chance to get it? | 10:48 |
noonedeadpunk | I would really like to try that out, but there's no very easy way to apply it everywhere | 10:49 |
jrosser_ | well - if you have ideas to try i think this afternoon we might look at it some more | 10:51 |
jrosser_ | andrewbonney has some time to spend on this i think | 10:51 |
noonedeadpunk | I would try to add `enable_cancel_on_failover = True` to [oslo_messaging_rabbit] section of configs | 10:55 |
noonedeadpunk | I'm not sure if that has that much effect with quorum queues enabled, but it really might help with HA queues | 10:55 |
andrewbonney | My suspicion is that now reply queues are actually HA this isn't really an RMQ issue but something internal getting confused, but happy to try things | 10:56 |
noonedeadpunk | well, what this doc describes, is that this is particularly useful for HA queues | 11:04 |
noonedeadpunk | as they seem to be not master/master anyway | 11:04 |
noonedeadpunk | so clients might need to be acked about queues being moved to another host, as it's not transparent process... | 11:04 |
noonedeadpunk | but I can be wrong | 11:04 |
noonedeadpunk | but yeah, that's probably more about duplicates.... | 11:05 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Enable RabbitMQ Quorum Queues by default https://review.opendev.org/c/openstack/openstack-ansible/+/919816 | 11:19 |
noonedeadpunk | huh, I just realized, that distro install method has no chance of passing ^ | 11:21 |
noonedeadpunk | due to severe old versions of rabbitmq | 11:22 |
noonedeadpunk | in centos.... | 11:22 |
noonedeadpunk | or maybe it's not _that_ bad... | 11:34 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Update rabbitmq/erlang to latest versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/919821 | 11:34 |
noonedeadpunk | ooops, cinder looks borked: https://zuul.opendev.org/t/openstack/build/bd5791375b98451f8c8ef136dd0ec9b9/log/logs/host/cinder-api.service.journal-10-02-53.log.txt#2795-2824 | 11:51 |
noonedeadpunk | huh, why it's trying amqp driver.... | 11:52 |
noonedeadpunk | nah, amqp is part of rabbit implementation anyway | 11:59 |
farbod | For the high neutron resource usage which i mentioned above, I tried to check logs in debug mode but everything seems OK. Do any of you guys have an idea about this, or a way to troubleshoot neutron-server? neutron-server was ok in xena, but after the upgrade it started to consume all the resources | 12:08 |
semantic | enable_cancel_on_failover is not supported with quorum queues it seems: 2024-05-16 12:20:09.800 49061 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 10.33.177.116:5671 after inf tries: Basic.consume: (406) PRECONDITION_FAILED - invalid arg 'x-cancel-on-ha-failover' for queue 'reply_compute2:nova-compute:1' in vhost 'nova' of queue type rabbit_quorum_queue: amqp.exceptions.PreconditionFailed: Basic.consume: (406) PRECONDITION_FAILED - invalid arg 'x-cancel-on-ha-failover' for queue 'reply_compute2:nova-compute:1' in vhost 'nova' of queue type rabbit_quorum_queue | 12:24 |
noonedeadpunk | farbod: and you have reverted to non-UWSGI mode? | 12:43 |
noonedeadpunk | huh | 12:44 |
noonedeadpunk | interesting | 12:44 |
noonedeadpunk | semantic: vmware docs say it should be... | 12:44 |
noonedeadpunk | https://docs.vmware.com/en/VMware-RabbitMQ-for-Kubernetes/1/rmq/migrate-mcq-to-qq.html#:~:text=x%2Dcancel%2Don%2Dha,sent%20again%20(duplicate%20messages). | 12:44 |
noonedeadpunk | "Most of the cases covered by x-cancel-on-ha-failover do not exist with quorum queues but those that are not covered are still there" | 12:44 |
farbod | yes i disabled uwsgi | 12:45 |
farbod | here is the end of log: | 12:47 |
farbod | https://paste.opendev.org/show/bcKmwaRfg2BrwMMJltm7/ | 12:47 |
farbod | it doesn't continue and doesn't respond to requests | 12:47 |
mnaser | jrosser_: are you still seeing a bunch of NODE_FAILUREs? | 12:56 |
jrosser_ | mnaser: no, but last time i looked at grafana something still looked wrong | 12:56 |
mnaser | I wonder where those are being launched | 12:57 |
jrosser_ | https://grafana.opendev.org/d/b283670153/nodepool3a-vexxhost?orgId=1&from=now-7d&to=now | 12:57 |
jrosser_ | "something" happened and there's a ton of things deleting forever | 12:57 |
jrosser_ | i *think* that when deleting was at 32 i got NODE_FAILURE (i guess that is the number of available 32G instances?) and it's hovering now at 28 | 12:58 |
semantic | Well https://github.com/rabbitmq/rabbitmq-server/blob/58b36b808878d5e29c49cd40eae3286b06291ca1/deps/rabbit/src/rabbit_quorum_queue.erl makes me think that quorum queues really do not support x-cancel-on-ha-failover, as it is missing from capabilities/consumer_arguments, unlike for classic queues currently. Though i may be reading it wrong of course... | 13:05 |
noonedeadpunk | yeah, you can be right about that | 13:06 |
noonedeadpunk | according to https://review.opendev.org/q/topic:%22osa/rmq-migrate%22 - only Cinder is broken with quorum queues | 13:12 |
noonedeadpunk | and weirdly broken.... | 13:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Implement support for octavia-ovn-provider driver https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/868462 | 13:14 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Use NFS Ganesha 5 https://review.opendev.org/c/openstack/openstack-ansible/+/919714 | 13:18 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Update cirros image for manila tempest https://review.opendev.org/c/openstack/openstack-ansible/+/919702 | 13:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add quorum queues support for service https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/898914 | 13:22 |
noonedeadpunk | mhm, so probably the issue with cinder is the non-unique name of the service in shm... | 13:29 |
noonedeadpunk | which would make sense | 13:29 |
noonedeadpunk | ugh | 13:31 |
noonedeadpunk | as `/dev/shm/aio1_uwsgi_qmanager` would be same for all services using uwsgi | 13:34 |
noonedeadpunk | and that merged by https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqpdriver.py#L64-L65 | 13:35 |
noonedeadpunk | which kinda configurable: https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/impl_rabbit.py#L248-L255 | 13:35 |
noonedeadpunk | and will be just fine for LXC.... | 13:35 |
noonedeadpunk | it probably won't for metal | 13:35 |
noonedeadpunk | meaning... we should supply processname in config | 13:36 |
noonedeadpunk | meaning... another series of patches... | 13:36 |
noonedeadpunk | but apparently I found what's wrong with manila images... | 13:37 |
noonedeadpunk | crap. but then I guess counter will be reset each time :( | 13:43 |
NeilHanlon | noonedeadpunk: regarding mq on rocky/centos... if there's a want to have a newer version, I can investigate | 14:20 |
NeilHanlon | there used to be a messaging SIG | 14:20 |
noonedeadpunk | I realized that I mixed up mq and mariadb | 14:22 |
NeilHanlon | i do that all the time with rabbit and redis | 14:23 |
* NeilHanlon also needs to look at the lxc-templates stuff | 14:24 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Enable RabbitMQ Quorum Queues by default https://review.opendev.org/c/openstack/openstack-ansible/+/919816 | 15:33 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_cinder master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/919674 | 15:33 |
noonedeadpunk | can't think of anything better than just disabling queue manager by default :( | 15:33 |
noonedeadpunk | it has weird default in roles... | 15:34 |
noonedeadpunk | but really no good solution here | 15:34 |
opendevreview | Merged openstack/openstack-ansible-os_heat master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_heat/+/919675 | 20:49 |
opendevreview | Merged openstack/openstack-ansible-os_glance master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/919673 | 20:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add service policies defenition https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918129 | 20:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add variable to globally control notifications enablement https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918130 | 20:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Implement variables to address oslo.messaging improvements https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918131 | 20:50 |
opendevreview | Merged openstack/openstack-ansible-os_skyline master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/919694 | 20:51 |
opendevreview | Merged openstack/openstack-ansible-os_horizon master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/919676 | 20:52 |
opendevreview | Merged openstack/openstack-ansible-os_barbican master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/919671 | 20:52 |
opendevreview | Merged openstack/openstack-ansible-os_ironic master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_ironic/+/919684 | 20:52 |
opendevreview | Merged openstack/openstack-ansible-os_octavia master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/919687 | 20:53 |
opendevreview | Merged openstack/openstack-ansible-os_keystone master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/919670 | 20:53 |
opendevreview | Merged openstack/openstack-ansible-os_swift master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/919678 | 20:53 |
opendevreview | Merged openstack/openstack-ansible-os_blazar master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_blazar/+/919689 | 20:53 |
jrosser_ | noonedeadpunk: ^ i have no idea why these are only running the docs job in the gate queue :/ | 20:54 |
opendevreview | Merged openstack/openstack-ansible-os_aodh master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/919682 | 20:54 |
opendevreview | Merged openstack/openstack-ansible-os_nova master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/918614 | 20:54 |
opendevreview | Merged openstack/openstack-ansible-os_designate master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_designate/+/919677 | 20:54 |
opendevreview | Merged openstack/openstack-ansible-os_placement master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_placement/+/919672 | 20:56 |
opendevreview | Merged openstack/openstack-ansible-os_trove master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/919686 | 20:56 |
opendevreview | Merged openstack/openstack-ansible-os_ceilometer master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/919681 | 20:56 |
opendevreview | Merged openstack/openstack-ansible-os_ceilometer master: Add service policies defenition https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/918105 | 20:56 |
opendevreview | Merged openstack/openstack-ansible-os_ceilometer master: Implement variables to address oslo.messaging improvements https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/918107 | 20:56 |
jrosser_ | omg it is also in the check jobs https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918130?tab=change-view-tab-header-zuul-results-summary | 20:58 |
opendevreview | Merged openstack/openstack-ansible-os_tacker master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_tacker/+/919688 | 20:58 |
opendevreview | Merged openstack/openstack-ansible-os_mistral master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_mistral/+/919692 | 20:58 |
opendevreview | Merged openstack/openstack-ansible-os_magnum master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/919685 | 21:04 |
opendevreview | Merged openstack/openstack-ansible-os_neutron master: Add tag to enable targeting of post-install config elements only https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/919695 | 21:06 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Remove murano from zuul required projects, it is now retired https://review.opendev.org/c/openstack/openstack-ansible/+/919902 | 21:23 |
jrosser_ | ok, so don't approve anything at all. until we merge that ^^^ | 21:30 |
jrosser_ | otherwise testing is basically bypassed | 21:30 |
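
The `/dev/shm` naming clash that noonedeadpunk describes at 13:34 can be illustrated with a small sketch. It is a hypothetical model, not the oslo.messaging implementation: the path pattern is inferred only from the `/dev/shm/aio1_uwsgi_qmanager` example in the log, and the function names, service names, and the idea of using the service name as the process name are assumptions made for illustration.

```python
# Hypothetical illustration only; not the oslo.messaging implementation.
# The path pattern is inferred from "/dev/shm/aio1_uwsgi_qmanager" in the log.


def qmanager_path(hostname: str, processname: str) -> str:
    """Build a shared-memory file path from a host name and a process name."""
    return f"/dev/shm/{hostname}_{processname}_qmanager"


def find_collisions(services: dict[str, str], hostname: str) -> dict[str, list[str]]:
    """Return only the paths that more than one service would end up sharing."""
    by_path: dict[str, list[str]] = {}
    for service, processname in services.items():
        by_path.setdefault(qmanager_path(hostname, processname), []).append(service)
    return {path: names for path, names in by_path.items() if len(names) > 1}


# Metal deployment (assumed): every service runs on the same host under "uwsgi".
metal = {"cinder-api": "uwsgi", "nova-api": "uwsgi", "glance-api": "uwsgi"}

# Same services with an explicit, per-service process name supplied in config.
named = {service: service for service in metal}

print(find_collisions(metal, "aio1"))  # all three services share one file
print(find_collisions(named, "aio1"))  # no shared files
```

With one container per service, as in an LXC deployment, each service has its own host name and its own `/dev/shm`, which matches the remark in the log that LXC is unaffected while metal is not.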