Monday, 2021-11-08

*** gouthamr_ is now known as gouthamr		04:07
*** akahat is now known as akahat|rover		07:00
*** sshnaidm|off is now known as sshnaidm		08:04
admin1	in an already running cluster, I have to add ceph support for only volumes .. this is what I did in the user_variables => https://gist.github.com/a1git/152e79a84700dc0bee43d69266c11990 and re-ran os-cinder and os-nova playbooks .. i am able to create the ceph volumes via cinder .. and i also see the /etc/ceph folder in nova compute and	08:38
admin1	virsh secret-list is added .. .. trying to mount the volume gives: 2021-11-08 03:37:22.092 3559222 ERROR oslo_messaging.rpc.server libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd' ... running qemu-system-x86_64 -drive format=? 2>&1 | grep rbd ... this returns rbd so i think the support	08:38
admin1	is there	08:38
admin1	so checking here if anyone has seen this before and if I have missed anything in the variables ..	08:38
admin1	this is the error message: https://gist.githubusercontent.com/a1git/be15ef9f7e64fccb9a18e78f727ff592/raw/d0164a35633d1c29eceaedd2107d3dc3b1fb2c3e/gistfile1.txt	08:42
noonedeadpunk	admin1: and the same issue is there even when you try to create an instance from volume directly (not attaching volume to already running VM)?	08:44
admin1	noonedeadpunk, i have not tried that yet .. but could be the same	08:47
noonedeadpunk	because eventually the command should differ there. Also - have you checked the apparmor log? Wondering if it might be unhappy for some reason	08:49
admin1	nothing about this in the kern logs	08:51
admin1	journalctl -u libvirtd -f => libvirtd[1125752]: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd'	08:58
kleini	rbd driver has been somehow separated from qemu code. I don't know any details. This results in running VMs, still running on older qemu code, need a restart to be able to load this separated rbd driver. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1495895	08:59
kleini	I had the same problem just with attaching volumes to VMs although having Ceph as backend for volumes from the beginning.	08:59
admin1	kleini, you had to restart the whole hypervisor ?	09:02
kleini	no, just the VM	09:02
admin1	oh	09:02
admin1	let me do a shutdown and try that out	09:02
noonedeadpunk	for rbd support qemu-block-extra is required but we should install it anyway	09:05
kleini	it was installed on my hypervisors but still the VM requires a complete restart	09:05
noonedeadpunk	huh interesting. i guess due to the secret that was added after vm startup	09:06
kleini	sorry, I did not dig deeper as I was satisfied to have solved the problem by restarting the VM	09:07
admin1	kleini thanks .. it worked \o/	09:15
jrosser_	so https://review.opendev.org/c/openstack/ansible-role-pki/+/816520 and https://review.opendev.org/c/openstack/ansible-role-pki/+/808022 were not quite duplicates of each other	10:23
agemuend	Dear all, is there a way to set WEBSSO_IDP_MAPPING in horizon conf? Its commented out in horizon_local_settings.py.j2, and it seems not possible to override it. We're trying to use openidc based identity federation with osa, which seems to generate most of the config correctly, but after being redirected back to horizon, it reports "No authentication backend could be determined to handle the provided credentials. This is likely a	10:34
agemuend	configuration error that should be addressed.", so we assume that we need to set that mapping variable	10:34
jrosser_	agemuend: we have an OIDC integration here and don't need to override that	10:36
jrosser_	if you do want to set a value for it you can use this https://github.com/openstack/openstack-ansible-os_horizon/blob/master/templates/horizon_local_settings.py.j2#L829	10:37
agemuend	Mhm, okay, then maybe we made a mistake.	10:38
jrosser_	because horizon_local_settings.py is a python file you have to generate valid python code with your override, which is why the normal override mechanism in openstack-ansible doesnt quite hold here	10:38
agemuend	I've read somewhere that the config override doesnt work for these settings because the file is python based	10:38
agemuend	axo	10:38
jrosser_	right, thats why you have to construct something in horizon_config_overrides which results in valid python. this is all ugly really	10:39
agemuend	I'd rather fix our error if it works in your case	10:39
agemuend	One thing we've been a bit confused about is the WEBSSO_CHOICES	10:40
agemuend	The examples use "openid", but the config file comment actually states:	10:40
agemuend	"Current supported protocol IDs are 'saml2' and 'oidc'"	10:40
agemuend	Could you maybe compare that in your file?	10:41
jrosser_	we set that to 'openid'	10:42
agemuend	Okay, mhm, we did that as well	10:42
agemuend	It generated the line ("openid", _("EGI")) for the WEBSSO_CHOICES, but it still says "no authentication backend could be determined" in Horizon after coming back from the login page	10:43
jrosser_	and EGI is the display_name of your idp?	10:48
jrosser_	keystone_sp -> trusted_idp_list -> display_name	10:49
agemuend	Yes	10:50
agemuend	Ah no, its actually keystone_sp -> trusted_idp_list -> name	10:51
agemuend	not display_name. Should it be the latter, is the name relevant for something else?	10:51
jrosser_	urgh its been a while :/	10:52
jrosser_	There are some docs here https://github.com/openstack/openstack-ansible-os_keystone/blob/master/doc/source/configure-federation-sp.rst#L72	10:58
jrosser_	i think in our case the IDP has a 'friendly name' that the users expect/recognise which includes a space, so thats not really appropriate for the 'name' field	10:59
agemuend	Thanks, we'll check again	11:11
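
For reference, a minimal sketch of the shape Horizon expects for the two settings discussed above, written as they would appear in a Django settings file. The IdP id "egi", the choice key "egi_openid" and the helper resolve_choice are hypothetical and only illustrate the lookup; the protocol id "openid" is the value mentioned in the conversation. Anything placed in horizon_config_overrides has to render to valid Python of this form.

```python
# Stand-in for Django's gettext_lazy so the sketch runs on its own.
def _(text):
    return text


# Each choice is (value, label). A bare protocol id such as "openid" is
# handed to keystone as the federation protocol.
WEBSSO_CHOICES = (
    ("credentials", _("Keystone Credentials")),
    ("egi_openid", _("EGI")),  # hypothetical key, resolved via the mapping below
)

# Maps a choice value to (identity provider id, protocol id) in keystone.
# "egi" is an assumed IdP id, not taken from the conversation.
WEBSSO_IDP_MAPPING = {
    "egi_openid": ("egi", "openid"),
}


def resolve_choice(choice):
    """Illustrative helper: return (idp_id, protocol_id) for a choice value.

    idp_id is None when the choice is not in the mapping, in which case the
    value itself is used as the protocol id.
    """
    if choice in WEBSSO_IDP_MAPPING:
        return WEBSSO_IDP_MAPPING[choice]
    return (None, choice)
```

With the mapping in place the "EGI" entry resolves to one specific IdP and protocol pair; without it, the choice value is treated as a protocol id only.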
admin1	anyone uses swift with OSA ?	11:18
admin1	also, checking if anyone knows what project adds the NFV buttons to horizon	11:19
kleini	jrosser_: how do you create clouds.yaml and secure.yaml when using OIDC? how does everything headless authenticate then?	11:25
noonedeadpunk	I do recall that ppl use swift but I can't recall who exactly...	11:35
jrosser_	kleini: we use a combination of two things, this adds PKCE support to keystoneauth https://github.com/bbc/keystoneauth-oidc and then we have a shell script that interacts with a local browser to complete the OIDC authentication flow. The shell script either drops you into the interactive openstack client, or exits and leaves the relevant OS_<...> env vars set with a valid keystone token	11:45
jrosser_	truly headless things cannot authenticate in the usual manner as we have no support at all for username/password auth, so anything needing to do that uses application credentials	11:46
opendevreview	James Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain https://review.opendev.org/c/openstack/ansible-role-pki/+/816857	12:01
opendevreview	Andrew Bonney proposed openstack/openstack-ansible-os_octavia master: keypair: copy key to deploy host rather than setup host https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/816997	12:08
kleini	jrosser_: thanks. this clarifies for me such an authentication integration	12:27
jrosser_	in our case PKCE support in the IDP was critical	12:27
jrosser_	otherwise you have to give the client secret out to anyone wanting to use the CLI, which is pretty bad	12:27
jrosser_	PKCE allows secure auth without a client secret, and is very often used with 'public' clients such as mobile apps or embedded devices	12:29
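
To illustrate why PKCE removes the need for a shared client secret, here is a minimal sketch of the S256 code verifier and code challenge derivation described in RFC 7636. It is not taken from the keystoneauth-oidc plugin mentioned above; the function names are hypothetical. The client keeps the verifier private, sends only the challenge with the authorization request, and reveals the verifier when exchanging the code for a token.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier(num_bytes: int = 32) -> str:
    """Create a random code verifier (43 characters for 32 random bytes)."""
    return _b64url(secrets.token_bytes(num_bytes))


def code_challenge_s256(verifier: str) -> str:
    """Derive the S256 code challenge: BASE64URL(SHA256(ASCII(verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


if __name__ == "__main__":
    verifier = make_code_verifier()
    print("code_verifier: ", verifier)
    print("code_challenge:", code_challenge_s256(verifier))
```

Because the verifier is generated fresh for every login, nothing long-lived has to be distributed to CLI users.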
opendevreview	James Gibson proposed openstack/openstack-ansible master: Add playbook to generate any user defined certificates https://review.opendev.org/c/openstack/openstack-ansible/+/816522	13:04
noonedeadpunk	wtf is that... https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/815631	13:12
noonedeadpunk	feels like a result of the tempestconf move	13:13
jrosser_	hrrm i wonder if the infra people are adjusting stuff	13:25
jrosser_	has tempestconf repo moved?	13:26
noonedeadpunk	Um, now I'm not sure. Just see tons of errors like `Unknown projects: osf/python-tempestconf` https://zuul.openstack.org/config-errors	13:34
noonedeadpunk	probably they are to be renamed...	13:34
noonedeadpunk	but anyway we don't declare it anywhere as required-projects	13:40
opendevreview	James Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain https://review.opendev.org/c/openstack/ansible-role-pki/+/816857	13:50
spatel	folk, my stein release of openstack running rabbitmq-server-3.7.15 (look like it has lots of bugs, small network blip making cluster out of sync)	15:46
spatel	i want to upgrade this version	15:46
spatel	is this rabbitmq-server-3.7.15 hardcoded somewhere in playbook?	15:48
spatel	i found it - vars/redhat.yml	15:48
spatel	do i need to bump erlang if i bump rabbitmq version?	15:50
mgariepy	hmm. how comes you are on 3.7.15 ? you are not on the latest sha ?	15:51
mgariepy	https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/stable/stein/vars/redhat.yml#L36	15:52
spatel	I stuck with my older release of stein and now i am thinking to retire this stack and move it to ubuntu	15:57
spatel	but regardless that i can just upgrade rabbitMQ right? if i don't want to touch other components	15:58
mgariepy	yep you could indeed.	15:58
mgariepy	take the version from the repo that file ?	15:58
spatel	let me give it a try and just upgrade rabbitMQ with this new version	15:58
mgariepy	from that file in the repo lol	15:58
mgariepy	monday is hard lol.	15:59
spatel	_rabbitmq_erlang_repo_url: right?	15:59
spatel	i will sure give this try on my lab and then i will do.. anyway this is 4th time my rabbitMQ is down in last 1 week	15:59
mgariepy	upgrade the rabbit role to the stable/stein ?	15:59
spatel	its giving me hard time	15:59
mgariepy	not fun	16:00
spatel	my problem is we are getting DDoS and that causing little blip in network making this rabbitMQ cluster go bad.. i am not able to figure out what is happening in logs because logs are not useful	16:01
spatel	all i am doing nuke rabbitMQ and everything come back	16:01
spatel	i do have other openstack stack on same LAN and they are fine..	16:01
spatel	only this guys doesn't like small blip so assuming its kind of BUG	16:01
mgariepy	what version the other stack is running ?	16:04
spatel	they are running on qeens mgariepy	16:26
spatel	queens*	16:28
mgariepy	hmm ok	16:31
spatel	that is why i am so sure that may be i am on wrong version on stein..	16:42
spatel	what do you guys use to monitor rabbitMQ, any good way or just check pid etc..?	16:43
spatel	mgariepy thinking why don't i go directly to 3.9.8-1.el7 ?	16:50
spatel	that is most latest version	16:51
spatel	does it required any OSA change if i directly push to 3.9.8-1 ?	16:51
mgariepy	spatel, are both rabbitmq queues configured the same on both deployment ?	18:53
spatel	what do you mean by same queues?	18:55
spatel	I have two openstack cloud both running on same datacenter same switch fabric	18:55
spatel	but only stein one giving me hard time	18:55
spatel	currently i am upgrading them	18:56
spatel	installing - rabbitmq-server-3.7.28-1.el7.noarch	18:57
mgariepy	i saw in the past not sure when. but rabbitmq queues switched from non-ha to ha.	19:12
mgariepy	so i'm wondering if both your deployment have the same config for the rabbitmq queues.	19:13
spatel	i have HA on both rabbitMQ	19:21
spatel	i never mess with default setting	19:21
spatel	how do i increase memory setting in rabbitMQ ?	19:21
spatel	currently i have 0.2 setting whatever default comes with OSA	19:22
spatel	thinking may be rabbit crying for memory	19:22
spatel	mgariepy i can see my ready queue is getting full	19:41
spatel	any good command to clean up queue	19:41
spatel	i can see scheduler_fanout_9ce95bd354464c5bbf0ef65d01e4bdb0 is growing	19:42
mgariepy	hmm why is this one growing?	19:53
spatel	i don't know :(	19:58
spatel	it has 80k mesg in queue in "ready"	19:58
spatel	again nuking rabbitMQ and re-building it	19:58
spatel	something is very odd going on..	19:58
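
A growing "ready" count like the one above can be spotted without the management UI. The following is a minimal sketch that parses the tab-separated text printed by `rabbitmqctl list_queues name messages_ready` and reports queues over a threshold. The function name and the threshold of 10000 are assumptions, and the sample text is made up except for the queue name and count quoted in the conversation.

```python
def backlogged_queues(listing: str, threshold: int = 10000):
    """Return (queue_name, messages_ready) pairs at or above the threshold.

    `listing` is the text output of:
        rabbitmqctl list_queues name messages_ready
    Lines that are not "<name><TAB><integer>" (banners, headers) are skipped.
    """
    found = []
    for line in listing.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        name, ready = parts[0], int(parts[1])
        if ready >= threshold:
            found.append((name, ready))
    # Largest backlog first.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = (
        "Listing queues ...\n"
        "name\tmessages_ready\n"
        "scheduler_fanout_9ce95bd354464c5bbf0ef65d01e4bdb0\t80000\n"
        "conductor\t3\n"
    )
    for name, ready in backlogged_queues(sample):
        print(f"{name}: {ready} messages ready")
```

Run periodically, a check like this shows whether a queue is draining or only growing, which is more informative than checking that the RabbitMQ pid exists.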
mgariepy	is it the number of msg that is crashing it ?	19:58
spatel	may be my rabbitMQ need more memory	19:59
mgariepy	you probably need to find why they are not consumed i guess ?	19:59
spatel	as soon as i upgrade rabbitMQ and start noticing growing queue	19:59
spatel	how do i increase this value using OSA - vm_memory_high_watermark, 0.2	20:01
spatel	i have 250 compute nodes and may be my rabbitMQ need more memory	20:01
spatel	i didn't find this variable in default/main.yml	20:01
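
To put the 0.2 value in perspective: a relative vm_memory_high_watermark is a fraction of the host's RAM at which RabbitMQ raises its memory alarm and blocks publishers. A minimal sketch of the arithmetic follows; the 64 GiB host size is an assumption, not a figure from the conversation.

```python
GIB = 1024 ** 3


def watermark_bytes(total_ram_bytes: int, fraction: float) -> int:
    """Bytes of RAM at which a relative high watermark is reached."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_bytes * fraction)


if __name__ == "__main__":
    total = 64 * GIB  # assumed host size
    for fraction in (0.2, 0.4):
        limit = watermark_bytes(total, fraction)
        print(f"watermark {fraction}: alarm at about {limit / GIB:.1f} GiB")
```

Raising the fraction only moves the alarm point; it does not help if messages are produced faster than they are consumed.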
spatel	mgariepy just rebuild and now this is the status - https://ibb.co/L5cTSMg	20:05
spatel	watching closely to see if queue growing or not	20:05
spatel	I have 11k consumer do you think its big number?	20:10
spatel	hmm i have rpc_workers=1 in neutron.conf may be this number is very slow causing backlog	20:13
spatel	damn it.. that was it..	20:22
spatel	rpc_workers=1 was my issue	20:22
spatel	as soon as i bump that up my rabbitMQ is draining like hell	20:22
spatel	i will keep eye on it and let you know if this setting resolve my issue..	20:23
spatel	what is q-server-resource-versions_fanout ?	20:27
mgariepy	how big is that cluster ?	21:31
mgariepy	arf..	21:32
mgariepy	i'm not sure what can be the issue, i don't have a cluster that big on openstack.	21:34
mgariepy	11k consumer. i guess it probably X thread per process..	21:35
mgariepy	or per service	21:35
spatel	I have 250 compute nodes in this RabbitMQ cluster	21:49
spatel	I think my cluster was missing some critical tuning options, as i said rpc_workers, rpc_pool_size and agent report_interval etc..	21:50
spatel	https://docs.mirantis.com/mcp/q4-18/mcp-deployment-guide/advanced-config/tune-rabbitmq-perf.html	21:50
spatel	I am using this guide to adjust them accordingly..	21:51
spatel	Thinking we should create a document in the official OSA docs about which settings you can tune to scale out your stack	21:53
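
The reasoning behind the rpc_workers fix can be sketched as a simple rate comparison: a queue grows whenever messages arrive faster than the workers consume them. This is a toy model, not how neutron accounts for load. The 30 second report interval and the per-worker throughput of 5 messages per second are assumptions chosen for illustration; only the 250 compute nodes come from the conversation.

```python
def backlog_growth_per_second(agents: int,
                              report_interval_s: float,
                              workers: int,
                              msgs_per_worker_per_s: float) -> float:
    """Net queue growth in messages per second (negative means draining).

    Assumes every agent sends one message per report interval and every
    worker consumes at a fixed rate.
    """
    arrival_rate = agents / report_interval_s
    service_rate = workers * msgs_per_worker_per_s
    return arrival_rate - service_rate


if __name__ == "__main__":
    for workers in (1, 2, 4):
        growth = backlog_growth_per_second(250, 30.0, workers, 5.0)
        state = "growing" if growth > 0 else "draining"
        print(f"rpc_workers={workers}: {growth:+.2f} msg/s ({state})")
```

Under these assumed numbers a single worker falls behind while two or more keep up, which matches the observed behaviour of the queue draining once rpc_workers was raised.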
spatel	More good stuff - https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Evolution-of-OpenStack-Networking-at-CERN3.pdf	21:57
