Monday, 2021-11-08

*** gouthamr_ is now known as gouthamr 04:07
*** akahat is now known as akahat|rover 07:00
*** sshnaidm|off is now known as sshnaidm 08:04
admin1in an already running cluster, I have to add ceph support for only volumes .. this is what I did in the user_variables => https://gist.github.com/a1git/152e79a84700dc0bee43d69266c11990       and re-ran  os-cinder and os-nova playbooks ..   i am able to create the ceph volumes via cinder .. and i also see the /etc/ceph folder in nova compute and08:38
admin1virsh secret-list is added ..  .. trying to mount the volume gives:  2021-11-08 03:37:22.092 3559222 ERROR oslo_messaging.rpc.server libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd' ... running  qemu-system-x86_64 -drive format=? 2>&1 | grep rbd    ... this returns rbd so i think the support08:38
admin1is there08:38
admin1so checking here if anyone has seen this before and if I have missed anything in the variables .. 08:38
admin1this is the error message: https://gist.githubusercontent.com/a1git/be15ef9f7e64fccb9a18e78f727ff592/raw/d0164a35633d1c29eceaedd2107d3dc3b1fb2c3e/gistfile1.txt 08:42
noonedeadpunkadmin1: and the same issue is there even when you try to create an instance from volume directly (not attaching volume to already running VM)?08:44
admin1noonedeadpunk, i have not tried that yet .. but could be the same 08:47
noonedeadpunkbecause eventually the command should differ there. Also - have you checked the apparmor log? Wondering if it might be unhappy for some reason 08:49
admin1nothing about this in the kern logs 08:51
admin1journalctl -u libvirtd -f => libvirtd[1125752]: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd' 08:58
kleinithe rbd driver has somehow been separated from the qemu code. I don't know any details. As a result, running VMs that are still on the older qemu code need a restart to be able to load this separated rbd driver. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1495895 08:59
kleiniI had the same problem just with attaching volumes to VMs although having Ceph as backend for volumes from the beginning.08:59
admin1kleini, you had to restart the whole hypervisor ? 09:02
kleinino, just the VM09:02
admin1oh 09:02
admin1let me do a shutdown and try that out 09:02
noonedeadpunkfor rbd support qemu-block-extra is required but we should install it anyway09:05
kleiniit was installed on my hypervisors but still the VM requires a complete restart09:05
noonedeadpunkhuh interesting. i guess due to the secret that was added after vm startup 09:06
kleinisorry, I did not dig deeper as I was satisfied to have solved the problem by restarting the VM09:07
admin1kleini thanks .. it worked \o/ 09:15
jrosser_so https://review.opendev.org/c/openstack/ansible-role-pki/+/816520 and https://review.opendev.org/c/openstack/ansible-role-pki/+/808022 were not quite duplicates of each other10:23
agemuendDear all, is there a way to set WEBSSO_IDP_MAPPING in horizon conf? Its commented out in horizon_local_settings.py.j2, and it seems not possible to override it. We're trying to use openidc based identity federation with osa, which seems to generate most of the config correctly, but after being redirected back to horizon, it reports "No authentication backend could be determined to handle the provided credentials. This is likely a 10:34
agemuendconfiguration error that should be addressed.", so we assume that we need to set that mapping variable10:34
jrosser_agemuend: we have an OIDC integration here and don't need to override that10:36
jrosser_if you do want to set a value for it you can use this https://github.com/openstack/openstack-ansible-os_horizon/blob/master/templates/horizon_local_settings.py.j2#L829 10:37
agemuendMhm, okay, then maybe we made a mistake.10:38
jrosser_because horizon_local_settings.py is a python file you have to generate valid python code with your override, which is why the normal override mechanism in openstack-ansible doesnt quite hold here10:38
agemuendI've read somewhere that the config override doesnt work for these settings because the file is python based10:38
agemuendaxo10:38
jrosser_right, thats why you have to construct something in horizon_config_overrides which results in valid python. this is all ugly really10:39
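The "valid python" point above can be illustrated with a minimal sketch. It assumes the override values end up as Python assignment lines in the settings file; how the openstack-ansible template actually renders `horizon_config_overrides` is not shown in this log. The helper names `render_settings` and `is_valid_python`, and the values `egi_oidc`, `egi` and `openid`, are hypothetical placeholders.

```python
import ast


def render_settings(overrides):
    """Render a mapping of setting name -> Python value as assignment lines."""
    lines = []
    for name, value in overrides.items():
        # repr() of dicts, tuples, strings and numbers is a valid Python literal
        lines.append(f"{name} = {value!r}")
    return "\n".join(lines)


def is_valid_python(source):
    """Return True if the rendered text parses as Python source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Hypothetical mapping: choice key -> (identity provider id, protocol id)
overrides = {
    "WEBSSO_IDP_MAPPING": {"egi_oidc": ("egi", "openid")},
}

rendered = render_settings(overrides)
print(rendered)
print("parses as python:", is_valid_python(rendered))
```

Checking the rendered text with `ast.parse` before deploying it is one cheap way to catch an override that would break the settings file.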
agemuendI'd rather fix our error if it works in your case10:39
agemuendOne thing we've been a bit confused about is the WEBSSO_CHOICES10:40
agemuendThe examples use "openid", but the config file comment actually states: 10:40
agemuend"Current supported protocol IDs are 'saml2' and 'oidc'"10:40
agemuendCould you maybe compare that in your file?10:41
jrosser_we set that to 'openid'10:42
agemuendOkay, mhm, we did that as well10:42
agemuendIt generated the line ("openid", _("EGI")) for the WEBSSO_CHOICES, but it still says "no authentication backend could be determined" in Horizon after coming back from the login page10:43
jrosser_and EGI is the display_name of your idp?10:48
jrosser_keystone_sp -> trusted_idp_list -> display_name10:49
agemuendYes10:50
agemuendAh no, its actually keystone_sp -> trusted_idp_list -> name10:51
agemuendnot display_name. Should it be the latter, is the name relevant for something else?10:51
jrosser_urgh its been a while :/10:52
jrosser_There are some docs here https://github.com/openstack/openstack-ansible-os_keystone/blob/master/doc/source/configure-federation-sp.rst#L72 10:58
jrosser_i think in our case the IDP has a 'friendly name' that the users expect/recognise which includes a space, so thats not really appropriate for the 'name' field10:59
jrosser_theres some docs here https://github.com/openstack/openstack-ansible-os_keystone/blob/master/doc/source/configure-federation-sp.rst#L72 11:00
agemuendThanks, we'll check again11:11
admin1anyone uses swift with OSA ? 11:18
admin1also, checking if anyone knows what project adds the NFV  buttons to horizon 11:19
kleinijrosser_: how do you create clouds.yaml and secure.yaml when using OIDC? how does everything headless authenticate then?11:25
noonedeadpunkI do recall that ppl use swift but I can't recall who exactly...11:35
jrosser_kleini: we use a combination of two things, this adds PKCE support to keystoneauth https://github.com/bbc/keystoneauth-oidc and then we have a shell script that interacts with a local browser to complete the OIDC authentication flow. The shell script either drops you into the interactive openstack client, or exits and leaves the relevant OS_<...> env vars set with a valid keystone token11:45
jrosser_truly headless things cannot authenticate in the usual manner as we have no support at all for username/password auth, so anything needing to do that uses application credentials11:46
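For the headless case, a minimal sketch of what a `clouds.yaml` entry using an application credential might look like, built here as a Python dict. The function name `app_credential_cloud`, the cloud name, the URL and the credential values are all placeholders and not taken from this log; the credential itself would have to be created beforehand by an authenticated user.

```python
import json


def app_credential_cloud(name, auth_url, cred_id, cred_secret):
    """Build a clouds.yaml-style structure for application credential auth."""
    return {
        "clouds": {
            name: {
                "auth_type": "v3applicationcredential",
                "auth": {
                    "auth_url": auth_url,
                    "application_credential_id": cred_id,
                    "application_credential_secret": cred_secret,
                },
            }
        }
    }


# Placeholder values for illustration only
config = app_credential_cloud(
    "mycloud",
    "https://keystone.example.com:5000/v3",
    "<application-credential-id>",
    "<application-credential-secret>",
)

# JSON is (for practical purposes) a subset of YAML, so this output
# should be loadable when saved as clouds.yaml
print(json.dumps(config, indent=2))
```

In a real setup the secret would more likely live in `secure.yaml` than next to the rest of the entry.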
opendevreviewJames Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain  https://review.opendev.org/c/openstack/ansible-role-pki/+/816857 12:01
opendevreviewAndrew Bonney proposed openstack/openstack-ansible-os_octavia master: keypair: copy key to deploy host rather than setup host  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/816997 12:08
kleinijrosser_: thanks. this clarifies for me such an authentication integration12:27
jrosser_in our case PKCE support in the IDP was critical12:27
jrosser_otherwise you have to give the client secret out to anyone wanting to use the CLI, which is pretty bad12:27
jrosser_PKCE allows secure auth without a client secret, and is very often used with 'public' clients such as mobile apps or embedded devices12:29
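As a rough illustration of why PKCE needs no client secret: the client generates a random one-time `code_verifier`, sends only its SHA-256 based `code_challenge` with the authorization request, and proves possession of the verifier when exchanging the code. The sketch below follows the S256 method from RFC 7636; the function names are made up here, and it is not the code used by the keystoneauth plugin mentioned above.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes=32):
    """Create a random URL-safe verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier):
    """Derive the S256 challenge: base64url(sha256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
print("code_verifier: ", verifier)
print("code_challenge:", challenge)
```

Only the challenge travels in the browser redirect, so intercepting the authorization code alone is not enough to obtain a token.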
opendevreviewJames Gibson proposed openstack/openstack-ansible master: Add playbook to generate any user defined certificates  https://review.opendev.org/c/openstack/openstack-ansible/+/816522 13:04
noonedeadpunkwtf is that... https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/815631 13:12
noonedeadpunkfeels like a result of the tempestconf move13:13
jrosser_hrrm i wonder if the infra people are adjusting stuff13:25
jrosser_has tempestconf repo moved?13:26
noonedeadpunkUm, now I'm not sure. Just see tons of errors like `Unknown projects: osf/python-tempestconf` https://zuul.openstack.org/config-errors 13:34
noonedeadpunkprobably they are to be renamed...13:34
noonedeadpunkbut anyway we don't declare it anywhere as required-projects13:40
opendevreviewJames Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain  https://review.opendev.org/c/openstack/ansible-role-pki/+/816857 13:50
spatelfolk, my stein release of openstack running rabbitmq-server-3.7.15 (look like it has lots of bugs, small network blip making cluster out of sync) 15:46
spateli want to upgrade this version 15:46
spatelis this rabbitmq-server-3.7.15 hardcoded somewhere in playbook?15:48
spateli found it - vars/redhat.yml15:48
spateldo i need to bump erlang if i bump rabbitmq version? 15:50
mgariepyhmm. how comes you are on 3.7.15 ? you are not on the latest sha ?15:51
mgariepyhttps://github.com/openstack/openstack-ansible-rabbitmq_server/blob/stable/stein/vars/redhat.yml#L36 15:52
spatelI stuck with my older release of stein and now i am thinking to retire this stack and move it to ubuntu15:57
spatelbut regardless that i can just upgrade rabbitMQ right? if i don't want to touch other components15:58
mgariepyyep you could indeed.15:58
mgariepytake the version from the repo that file ?15:58
spatellet me give it a try and just upgrade rabbitMQ with this new version 15:58
mgariepyfrom that file in the repo lol15:58
mgariepymonday is hard lol.15:59
spatel_rabbitmq_erlang_repo_url: right?15:59
spateli will sure give this try on my lab and then i will do.. anyway this is 4th time my rabbitMQ is down in last 1 week15:59
mgariepyupgrade the rabbit role to the stable/stein  ?15:59
spatelits giving me hard time15:59
mgariepynot fun 16:00
spatelmy problem is we are getting DDoS and that causing little blip in network making this rabbitMQ cluster go bad.. i am not able to figure out what is happening in logs because logs are not useful16:01
spatelall i am doing nuke rabbitMQ and everything come back 16:01
spateli do have other openstack stack on same LAN and they are fine.. 16:01
spatelonly this guys doesn't like small blip so assuming its kind of BUG 16:01
mgariepywhat version the other stack is running ?16:04
spatelthey are running on queens mgariepy 16:26
mgariepyhmm ok16:31
spatelthat is why i am so sure that may be i am on wrong version on stein.. 16:42
spatelwhat do you guys use to monitor rabbitMQ, any good way or just check pid etc..?16:43
spatelmgariepy thinking why don't i go directly to 3.9.8-1.el7 ? 16:50
spatelthat is the latest version16:51
spateldoes it require any OSA change if i directly push to 3.9.8-1 ?16:51
mgariepyspatel, are both rabbitmq queues configured the same on both deployment ?18:53
spatelwhat do you mean by same queues? 18:55
spatelI have two openstack cloud both running on same datacenter same switch fabric18:55
spatelbut only stein one giving me hard time18:55
spatelcurrently i am upgrading them 18:56
spatelinstalling - rabbitmq-server-3.7.28-1.el7.noarch18:57
mgariepyi saw in the past not sure when. but rabbitmq queues switched from non-ha to ha. 19:12
mgariepyso i'm wondering if both your deployment have the same config for the rabbitmq queues.19:13
spateli have HA on both rabbitMQ 19:21
spateli never mess with default setting 19:21
spatelhow do i increase memory setting in rabbitMQ ? 19:21
spatelcurrently i have 0.2 setting whatever default comes with OSA19:22
spatelthinking may be rabbit crying for memory 19:22
spatelmgariepy i can see my ready queue is getting full 19:41
spatelany good command to clean up queue19:41
spateli can see scheduler_fanout_9ce95bd354464c5bbf0ef65d01e4bdb0 is growing 19:42
mgariepyhmm why is this one growing?19:53
spateli don't know :(19:58
spatelit has 80k mesg in queue in "ready" 19:58
spatelagain nuking rabbitMQ and re-building it19:58
spatelsomething is very odd going on.. 19:58
mgariepyis it the number of msg that is crashing it ?19:58
spatelmay be my rabbitMQ need more memory19:59
mgariepyyou probably need to find why they are not consumed i guess ?19:59
spatelas soon as i upgrade rabbitMQ and start noticing growing queue 19:59
spatelhow do i increase this value using OSA - vm_memory_high_watermark, 0.220:01
spateli have 250 compute nodes and may be my rabbitMQ need more memory20:01
spateli didn't find this variable in default/main.yml20:01
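For context on the 0.2 value: a relative `vm_memory_high_watermark` is a fraction of the memory available to the node, above which RabbitMQ starts blocking publishers. A small arithmetic sketch follows; the 64 GiB host size is a made-up example, not a figure from this log, and the sketch does not show how the setting is exposed in OSA.

```python
GIB = 1024 ** 3


def watermark_bytes(total_ram_bytes, fraction):
    """Return the memory threshold in bytes for a relative watermark."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_bytes * fraction)


# Hypothetical host with 64 GiB of RAM
total_ram = 64 * GIB
for fraction in (0.2, 0.4):
    limit = watermark_bytes(total_ram, fraction)
    print(f"watermark {fraction}: {limit / GIB:.1f} GiB")
```

Raising the fraction only buys headroom; it does not help if messages are never consumed.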
spatelmgariepy just rebuild and now this is the status - https://ibb.co/L5cTSMg 20:05
spatelwatching closely to see if queue growing or not20:05
spatelI have 11k consumer do you think its big number?20:10
spatelhmm i have rpc_workers=1 in neutron.conf, may be this number is too low and is causing the backlog 20:13
spateldamn it.. that was it.. 20:22
spatelrpc_workers=1 was my issue20:22
spatelas soon as i bump that up my rabbitMQ is draining like hell 20:22
spateli will keep eye on it and let you know if this setting resolve my issue.. 20:23
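Why a single worker can leave a queue growing while a few more drain it can be sketched with a very rough model. Only the 80k backlog comes from this log; the publish and per-worker consume rates are invented for illustration, and real consumption will not scale perfectly linearly with `rpc_workers`.

```python
import math


def seconds_to_drain(backlog, publish_rate, per_worker_rate, workers):
    """Estimate the time to empty a queue, or infinity if it can never drain."""
    net_rate = workers * per_worker_rate - publish_rate
    if net_rate <= 0:
        # Consumers are not keeping up, so the backlog only grows
        return math.inf
    return backlog / net_rate


backlog = 80_000        # messages in "ready", as reported above
publish_rate = 500      # hypothetical messages per second arriving
per_worker_rate = 300   # hypothetical messages per second one worker handles

for workers in (1, 2, 4):
    estimate = seconds_to_drain(backlog, publish_rate, per_worker_rate, workers)
    print(f"rpc_workers={workers}: {estimate:.0f} s to drain")
```

With these made-up rates one worker never catches up, which matches the symptom of a "ready" count that only grows until the broker is rebuilt.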
spatelwhat is q-server-resource-versions_fanout ? 20:27
mgariepyhow big is that cluster ?21:31
mgariepyarf.. 21:32
mgariepyi'm not sure what can be the issue, i don't have a cluster that big on openstack.21:34
mgariepy11k consumer. i guess it probably X thread per process..21:35
mgariepyor per service21:35
spatelI have 250 compute nodes in this RabbitMQ cluster21:49
spatelI think my cluster was missing some critical tuning options, as i said rpc_workers, rpc_pool_size and agent report_interval etc..21:50
spatelhttps://docs.mirantis.com/mcp/q4-18/mcp-deployment-guide/advanced-config/tune-rabbitmq-perf.html 21:50
spatelI am using this guide to adjust them accordingly.. 21:51
spatelThinking we should create a document in the official OSA docs about which settings you can tune to scale out your stack 21:53
spatelMore good stuff - https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Evolution-of-OpenStack-Networking-at-CERN3.pdf 21:57
