Thursday, 2021-07-08

*** zbr is now known as Guest164 [05:03]
<opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Exclude neutron from venv constraints  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/798960 [05:47]
*** rpittau|afk is now known as rpittau [06:59]
*** sshnaidm_ is now known as sshnaidm [08:25]
<mindthecap> hi! Asking this here as well (copying from openstack channel): i'm getting an error with a clean install using OSA victoria. I can't create an instance - volume attachment fails with cinder error "Invalid input received: Connector doesn't have required information: initiator)". The error persists when i try to attach an already created volume to the instance. [11:48]
<mindthecap> I'm out of ideas what / where to check. I'm using iSCSI (LVM). [11:48]
<jrosser> mindthecap: i'm guessing you've got something like this in your config https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L631-L642 [12:23]
<fridtjof[m]> mindthecap: try (re)starting iscsi related services on the compute node. I don't remember which one did the trick right now [12:33]
<mindthecap> thanks! for some reason iscsid was disabled and stopped on compute hosts. Started them and it works. [12:42]
<mindthecap> It's weird that the service is stopped and not started even though i have replayed the OSA deployment scripts many times. [12:43]
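A minimal sketch of the check that resolved this, assuming a systemd-based compute host and the unit name `iscsid` mentioned above. The helper names (`systemctl_state`, `needs_attention`) are hypothetical, and some distributions may report states other than "enabled" (for example when the daemon is socket-activated), so treat the result as a hint rather than a verdict.

```python
import subprocess


def systemctl_state(unit, query, run=subprocess.run):
    """Return the one-word state printed by `systemctl <query> <unit>`.

    `query` is expected to be "is-enabled" or "is-active".
    `run` is injectable so the logic can be exercised without systemd.
    """
    # check=False: systemctl exits non-zero for "disabled"/"inactive",
    # which is an answer here, not a failure.
    result = run(["systemctl", query, unit],
                 capture_output=True, text=True, check=False)
    return result.stdout.strip()


def needs_attention(unit, run=subprocess.run):
    """True if the unit is not both enabled at boot and currently active."""
    enabled = systemctl_state(unit, "is-enabled", run)
    active = systemctl_state(unit, "is-active", run)
    return enabled != "enabled" or active != "active"
```

On a compute host this could be called as `needs_attention("iscsid")`. If it returns True, `systemctl enable --now iscsid` is the usual way to both enable and start the unit.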
*** rpittau is now known as rpittau|afk [12:45]
<jrosser> mindthecap: which OS are you using? [13:03]
<spatel> any idea about this error when building a new vm - {"message": "Build of instance 4e65ec9b-1b47-4972-98f9-2430d67eece5 aborted: Failed to allocate the network(s), not rescheduling.", "code": 500, "created": "2021-07-08T13:08:01Z"} [13:12]
<spatel> i am not seeing any bad error in neutron logs [13:12]
<spatel> still looking for more evidence [13:12]
<noonedeadpunk> well, I see issues for neutron OVN jobs for master [13:13]
<noonedeadpunk> or you're not talking about OVN? [13:13]
<spatel> no no [13:15]
<spatel> i have a real production issue in my old openstack [13:15]
<spatel> what is the issue related to OVN? [13:15]
<noonedeadpunk> I was seeing that when dhcp agent was stuck for some reason in ovs [13:15]
<spatel> hmm! in CI job? [13:16]
<spatel> send me link i will try to debug and see [13:16]
<noonedeadpunk> no, in our prod) so when neutron dhcp agent was acting weird, we were not able to create a VM, with the same issue and nothing in any logs [13:16]
<noonedeadpunk> regarding OVN - https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/798960 [13:17]
<spatel> hmm let me check DHCP logs and see [13:17]
<noonedeadpunk> there's issue with calico, but I think I will just bump it back to old version for now... [13:17]
<spatel> ok [13:17]
<spatel> noonedeadpunk my /var/log/neutron/neutron-dhcp-agent.log is looking very clean [13:19]
<spatel> noticed this in neutron log - 2021-07-08 09:19:42.699 26878 ERROR oslo.messaging._drivers.impl_rabbit [-] [6cd375b0-8f3a-4a70-b2fa-91d52197b74b] AMQP server on 172.28.15.248:5671 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer [13:20]
<spatel> restarting and see [13:20]
<spatel> noonedeadpunk do you think this is something serious - http://paste.openstack.org/show/807274/ [14:07]
<noonedeadpunk> um, well, it could mean that either smth is wrong with rabbit (but you would see that in other services) or dhcp is just stuck for some reason [14:12]
<noonedeadpunk> and not replying to messages [14:12]
<spatel> rabbitMQ cluster looking healthy so not sure what is the issue, but i can restart rabbitMQ cluster [14:13]
<noonedeadpunk> and restart of dhcp agents didn't help? [14:14]
<spatel> is just restarting the rabbitMQ service enough? [14:14]
<spatel> no help with dhcp agent restart [14:14]
<noonedeadpunk> hm... [14:14]
<jrosser> you could try starting it with debug logging [14:14]
<spatel> neutron agent? [14:15]
<spatel> i meant dhcp [14:15]
<jrosser> yeah, you'd get some idea if it was just sitting doing nothing, or somehow spinning and failing [14:15]
<spatel> let me try that also... [14:16]
<noonedeadpunk> also - is it regarding only one network or all networks are failing? [14:17]
<spatel> i am able to delete vm but not able to create, is that also related to rabbitMQ issue? i know if rabbitMQ is not working then you can't delete vm [14:17]
<noonedeadpunk> as what we also did - we were adding another dhcp agent to the network [14:17]
<noonedeadpunk> as it might be an issue with the namespace actually [14:17]
<spatel> This is related to any network.. when i create vm it gets stuck in BUILD and then throws error - aborted: Failed to allocate the network(s), not rescheduling [14:18]
<noonedeadpunk> well in our case it sometimes was dependent on the network where the port for the VM resides [14:19]
<noonedeadpunk> s/sometimes/most times [14:19]
<spatel> hmm [14:23]
<spatel> jrosser i have enabled debug in /etc/neutron/dhcp_agent.ini is that the correct place? [14:23]
<spatel> after restart agent not seeing any good info except this error - http://paste.openstack.org/show/807274/ [14:24]
<jrosser> i think it's normally somewhere right at the top of /etc/neutron/neutron.conf [14:24]
<spatel> let me try that [14:24]
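A small sketch of how one might confirm whether debug logging is switched on, assuming the usual oslo.config layout where `debug` is an option in the `[DEFAULT]` section and defaults to off. The function name `debug_enabled` and the sample text are hypothetical. Python's `configparser` is only an approximation of how the services themselves parse the file.

```python
import configparser


def debug_enabled(conf_text):
    """Return True if `debug` is set to a true value under [DEFAULT]."""
    # interpolation=None: values may contain literal '%' characters.
    # strict=False: tolerate repeated options, which INI-style service
    # configs sometimes contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean("DEFAULT", "debug", fallback=False)


# Hypothetical excerpt from the top of a neutron.conf-style file.
sample = """
[DEFAULT]
debug = True
"""
print(debug_enabled(sample))
```

To check a real file, read it first, for example `debug_enabled(open("/etc/neutron/neutron.conf").read())`, and restart the agent after changing the value.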
<spatel> does DHCP agent talk to RabbitMQ? [14:25]
<jrosser> this looks similar https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1774764 [14:26]
<spatel> hmm but not saying any solution or something [14:27]
<jrosser> but kind of unhelpful other than it does reference some patches like https://review.opendev.org/c/openstack/neutron/+/659274 and https://review.opendev.org/c/openstack/neutron/+/694561 [14:27]
<jrosser> but i'm just totally guessing [14:28]
<spatel> when it's saying timeout, is it saying it failed to talk to neutron server or rabbitMQ? [14:29]
<spatel> jrosser after restarting all nova-* services looks like i am able to spin up vms [14:36]
<spatel> doesn't make any sense [14:36]
<jrosser> nothing useful in the nova log? [14:37]
<jrosser> anyway remember to put any debug back to False :) [14:37]
<spatel> i noticed some errors like failed to talk to neutron [14:37]
<jrosser> that is probably related [14:37]
<jrosser> as booting the VM / creating the port are very much coupled [14:38]
<spatel> I strongly believe my neutron-server is under pressure [14:38]
<spatel> I have 800 vms running on this cloud.. [14:39]
<spatel> do more vms put pressure on neutron or compute host? [14:39]
<spatel> on this cloud i have 260 compute hosts and 800 VMs.. [14:41]
<noonedeadpunk> it would put pressure on rabbit in the first place [15:02]
<noonedeadpunk> so maybe it's timing out for a reason [15:02]
*** frickler is now known as frickler_pto [15:03]
<spatel> hmm [15:04]
<spatel> how do you guys scale rabbitMQ? [15:05]
<spatel> Are you guys using some kind of different HA queue in rabbitMQ? like don't sync ABC queue etc? [15:06]
<spatel> i have noticed rabbit is doing a bad job when you sync all queues [15:06]
<noonedeadpunk> it really does [15:06]
<noonedeadpunk> ha queue is really a penalty on performance [15:07]
<spatel> so what do you suggest? I am running everything default that comes out of OSA [15:07]
<spatel> never thought of playing with ha queue [15:07]
<noonedeadpunk> but otherwise you might get issues when restarting rabbit [15:08]
<noonedeadpunk> I'm not sure though, might be with maintenance mode this can be worked around, but I never had time to play a lot with it [15:08]
<noonedeadpunk> also it's available only from rabbit 3.8 or smth like that [15:09]
<noonedeadpunk> eventually I know that at some scale ppl even start using dedicated nodes for rabbit to gain some more performance [15:10]
<noonedeadpunk> and have like 5-7 nodes for rabbit... [15:10]
<spatel> agreed on dedicated nodes, but still even if you add more nodes that adds more load on the HA syncing job [15:11]
<noonedeadpunk> iirc there were also some new features in rabbit that allowed syncing queues between specific nodes only, but not sure here [15:11]
<noonedeadpunk> yeah, they don't use ha queues [15:12]
<noonedeadpunk> not sure how they recover in case of rabbit failure though... [15:12]
<noonedeadpunk> as what we saw previously without ha queues is that services got stuck with rabbit failover for some reason. Maybe it's solved now though [15:12]
<spatel> when i last talked to someone they said they are running this HA policy which helps them a lot - http://paste.openstack.org/show/807279/ [15:13]
<spatel> no HA for notifications* etc which is useless [15:14]
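The paste is not reproduced here, so the pattern below is an assumption and not the policy spatel was given. It only sketches the general idea: a RabbitMQ policy applies to queues whose names match a regular expression, so a negative lookahead can leave notification queues out of mirroring. The pattern, the queue names, and the helper `is_mirrored` are all hypothetical, and Python's `re` is used here only to try the expression out.

```python
import re

# Hypothetical policy pattern: match every queue name except those
# that start with "notifications".
HA_PATTERN = re.compile(r"^(?!notifications).*")


def is_mirrored(queue_name):
    """True if the hypothetical HA pattern would apply to this queue."""
    return HA_PATTERN.match(queue_name) is not None


# Made-up queue names, for illustration only.
for name in ("conductor", "q-plugin", "notifications.info"):
    print(name, is_mirrored(name))
```

A real policy would likely need to exclude more than one prefix, and it should be tried in a lab first, since queues that are not mirrored can be lost when a node fails.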
<spatel> i will try to play in lab and see how we can make RabbitMQ more responsive... it's painful when you want to scale.. [15:15]
<spatel> OVN can solve lots of rabbitMQ issues but it's itself a beast [15:15]
<fridtjof[m]> jrosser: re the issue mindthecap was having, i've personally experienced this on ubuntu 18.04 at least [15:47]
<jrosser> ah - i was wondering if it was a centos type thing where the service is installed but disabled by default [15:48]
*** mgoddard- is now known as mgoddard [20:50]
<opendevreview> Ghanshyam proposed openstack/openstack-ansible master: Moving IRC network reference to OFTC  https://review.opendev.org/c/openstack/openstack-ansible/+/800127 [23:25]
<opendevreview> Ghanshyam proposed openstack/ansible-hardening master: Moving IRC network reference to OFTC  https://review.opendev.org/c/openstack/ansible-hardening/+/800128 [23:26]
