*** gouthamr_ is now known as gouthamr | 04:07 | |
*** akahat is now known as akahat|rover | 07:00 | |
*** sshnaidm|off is now known as sshnaidm | 08:04 | |
admin1 | in an already running cluster, I have to add ceph support for only volumes .. this is what I did in the user_variables => https://gist.github.com/a1git/152e79a84700dc0bee43d69266c11990 and re-ran os-cinder and os-nova playbooks .. i am able to create the ceph volumes via cinder .. and i also see the /etc/ceph folder in nova compute and | 08:38 |
---|---|---|
admin1 | virsh secret-list is added .. .. trying to mount the volume gives: 2021-11-08 03:37:22.092 3559222 ERROR oslo_messaging.rpc.server libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd' ... running qemu-system-x86_64 -drive format=? 2>&1 | grep rbd ... this returns rbd so i think the support | 08:38 |
admin1 | is there | 08:38 |
admin1 | so checking here if anyone has seen this before and if I have missed anything in the variables .. | 08:38 |
admin1 | this is the error message: https://gist.githubusercontent.com/a1git/be15ef9f7e64fccb9a18e78f727ff592/raw/d0164a35633d1c29eceaedd2107d3dc3b1fb2c3e/gistfile1.txt | 08:42 |
noonedeadpunk | admin1: and the same issue is there even when you try to create an instance from volume directly (not attaching volume to already running VM)? | 08:44 |
admin1 | noonedeadpunk, i have not tried that yet .. but could be the same | 08:47 |
noonedeadpunk | because eventually the command should differ there. Also - have you checked the apparmor log? Wondering if it might be unhappy for some reason | 08:49 |
admin1 | nothing about this in the kern logs | 08:51 |
admin1 | journalctl -u libvirtd -f => libvirtd[1125752]: internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd' | 08:58 |
kleini | rbd driver has been somehow separated from qemu code. I don't know any details. This results in running VMs, still running on older qemu code, need a restart to be able to load this separated rbd driver. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1495895 | 08:59 |
kleini | I had the same problem just with attaching volumes to VMs although having Ceph as backend for volumes from the beginning. | 08:59 |
admin1 | kleini, you had to restart the whole hypervisor ? | 09:02 |
kleini | no, just the VM | 09:02 |
admin1 | oh | 09:02 |
admin1 | let me do a shutdown and try that out | 09:02 |
noonedeadpunk | for rbd support qemu-block-extra is required but we should install it anyway | 09:05 |
kleini | it was installed on my hypervisors but still the VM requires a complete restart | 09:05 |
noonedeadpunk | huh interesting. i guess due to the secret that was added after vm startup | 09:06 |
kleini | sorry, I did not dig deeper as I was satisfied to have solved the problem by restarting the VM | 09:07 |
admin1 | kleini thanks .. it worked \o/ | 09:15 |
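The fix above was to fully restart the affected VM: per kleini, a QEMU process that was already running cannot load the separately packaged rbd driver. A rough way to find candidates before they fail is to compare each QEMU process start time with the modification time of the rbd module file. This is a minimal heuristic sketch, not a tool from the discussion. The module path is an assumption (where `qemu-block-extra` is believed to install it on Ubuntu x86_64), and `vms_needing_restart` / `rbd_module_mtime` are hypothetical helpers. How you collect the process start times (for example from `ps`) is left open.

```python
import os
import time
from typing import Dict, List

# ASSUMPTION: path of the rbd block module shipped by qemu-block-extra on
# Ubuntu x86_64. Other distributions/architectures use different paths.
RBD_MODULE = "/usr/lib/x86_64-linux-gnu/qemu/block-rbd.so"


def vms_needing_restart(vm_start_times: Dict[str, float],
                        module_mtime: float) -> List[str]:
    """Return the VMs whose QEMU process started before the module file changed.

    Heuristic only: such a process may be unable to load the current rbd
    module and then fail volume attach with "Unknown driver 'rbd'".
    """
    return sorted(name for name, started in vm_start_times.items()
                  if started < module_mtime)


def rbd_module_mtime(path: str = RBD_MODULE) -> float:
    """Last-modification time (epoch seconds) of the rbd module file."""
    return os.path.getmtime(path)


if __name__ == "__main__":
    if os.path.exists(RBD_MODULE):
        print("rbd module last changed:", time.ctime(rbd_module_mtime()))
    else:
        print("rbd module not found at", RBD_MODULE)
```

Restarting here means a new QEMU process, i.e. a shutdown and start of the instance as admin1 did. A reboot from inside the guest keeps the same process and would presumably not help.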
jrosser_ | so https://review.opendev.org/c/openstack/ansible-role-pki/+/816520 and https://review.opendev.org/c/openstack/ansible-role-pki/+/808022 were not quite duplicates of each other | 10:23 |
agemuend | Dear all, is there a way to set WEBSSO_IDP_MAPPING in horizon conf? It's commented out in horizon_local_settings.py.j2, and it seems not possible to override it. We're trying to use openidc based identity federation with osa, which seems to generate most of the config correctly, but after being redirected back to horizon, it reports "No authentication backend could be determined to handle the provided credentials. This is likely a | 10:34 |
agemuend | configuration error that should be addressed.", so we assume that we need to set that mapping variable | 10:34 |
jrosser_ | agemuend: we have an OIDC integration here and don't need to override that | 10:36 |
jrosser_ | if you do want to set a value for it you can use this https://github.com/openstack/openstack-ansible-os_horizon/blob/master/templates/horizon_local_settings.py.j2#L829 | 10:37 |
agemuend | Mhm, okay, then maybe we made a mistake. | 10:38 |
jrosser_ | because horizon_local_settings.py is a python file you have to generate valid python code with your override, which is why the normal override mechanism in openstack-ansible doesnt quite hold here | 10:38 |
agemuend | I've read somewhere that the config override doesnt work for these settings because the file is python based | 10:38 |
agemuend | axo | 10:38 |
jrosser_ | right, thats why you have to construct something in horizon_config_overrides which results in valid python. this is all ugly really | 10:39 |
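jrosser_'s point is that whatever ends up in `horizon_local_settings.py` must be valid Python, not YAML. A minimal sketch of that constraint, assuming you build the value yourself: `render_local_settings` is a hypothetical helper that turns a dict of settings into Python assignments with `repr()` and uses `compile()` to prove the result parses. The IdP id `egi` and the mapping key `egi_openid` are illustrative. The tuple shape `(identity provider id, protocol id)` is how Horizon documents `WEBSSO_IDP_MAPPING`.

```python
from typing import Any, Dict


def render_local_settings(overrides: Dict[str, Any]) -> str:
    """Render setting overrides as Python assignment statements.

    repr() of str/int/bool/tuple/list/dict values yields valid Python
    literals, which is the property horizon_local_settings.py needs.
    """
    lines = [f"{name} = {value!r}" for name, value in overrides.items()]
    source = "\n".join(lines) + "\n"
    # compile() raises SyntaxError if the generated text is not valid Python.
    compile(source, "local_settings_snippet.py", "exec")
    return source


# Hypothetical identifiers: choice key -> (identity provider id, protocol id).
snippet = render_local_settings({
    "WEBSSO_IDP_MAPPING": {"egi_openid": ("egi", "openid")},
})
print(snippet, end="")
```

One limitation: `repr()` can only emit plain literals, so labels that the template wraps in `_()` for translation (as in `WEBSSO_CHOICES`) would come out untranslated. That is part of why overriding these settings is awkward.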
agemuend | I'd rather fix our error if it works in your case | 10:39 |
agemuend | One thing we've been a bit confused about is the WEBSSO_CHOICES | 10:40 |
agemuend | The examples use "openid", but the config file comment actually states: | 10:40 |
agemuend | "Current supported protocol IDs are 'saml2' and 'oidc'" | 10:40 |
agemuend | Could you maybe compare that in your file? | 10:41 |
jrosser_ | we set that to 'openid' | 10:42 |
agemuend | Okay, mhm, we did that as well | 10:42 |
agemuend | It generated the line ("openid", _("EGI")) for the WEBSSO_CHOICES, but it still says "no authentication backend could be determined" in Horizon after coming back from the login page | 10:43 |
jrosser_ | and EGI is the display_name of your idp? | 10:48 |
jrosser_ | keystone_sp -> trusted_idp_list -> display_name | 10:49 |
agemuend | Yes | 10:50 |
agemuend | Ah no, its actually keystone_sp -> trusted_idp_list -> name | 10:51 |
agemuend | not display_name. Should it be the latter, is the name relevant for something else? | 10:51 |
jrosser_ | urgh its been a while :/ | 10:52 |
jrosser_ | There are some docs here https://github.com/openstack/openstack-ansible-os_keystone/blob/master/doc/source/configure-federation-sp.rst#L72 | 10:58 |
jrosser_ | i think in our case the IDP has a 'friendly name' that the users expect/recognise which includes a space, so thats not really appropriate for the 'name' field | 10:59 |
agemuend | Thanks, we'll check again | 11:11 |
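To see why the first element of the `WEBSSO_CHOICES` entry matters, here is a simplified sketch modelled on how Horizon's openstack_auth turns a chosen key into a Keystone WebSSO URL. If the key is found in `WEBSSO_IDP_MAPPING`, it yields an IdP-specific endpoint. Otherwise the key itself is used as the protocol id. The function name and the example Keystone URL are hypothetical. The real code also appends an `origin` query parameter, which is omitted here. The practical consequence is that, without a mapping, a choice such as `"openid"` only works if Keystone has a federation protocol registered under exactly that id.

```python
from typing import Dict, Optional, Tuple


def websso_url(auth_url: str, choice: str,
               idp_mapping: Optional[Dict[str, Tuple[str, str]]] = None) -> str:
    """Sketch: map a WEBSSO_CHOICES key to a Keystone WebSSO endpoint."""
    idp_mapping = idp_mapping or {}
    # Keys that are not in the mapping are treated as a bare protocol id.
    idp_id, protocol_id = idp_mapping.get(choice, (None, choice))
    base = auth_url.rstrip("/")
    if idp_id:
        # IdP-specific endpoint: the user is sent straight to that IdP.
        return (f"{base}/auth/OS-FEDERATION/identity_providers/{idp_id}"
                f"/protocols/{protocol_id}/websso")
    # Protocol-only endpoint: Keystone/the web server picks the IdP.
    return f"{base}/auth/OS-FEDERATION/websso/{protocol_id}"


print(websso_url("https://keystone.example.com:5000/v3", "openid"))
```

So the choice key, the mapping (if any) and the protocol name configured on the Keystone side have to agree. In the discussion above, both deployments used `openid`.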
admin1 | anyone uses swift with OSA ? | 11:18 |
admin1 | also, checking if anyone knows what project adds the NFV buttons to horizon | 11:19 |
kleini | jrosser_: how do you create clouds.yaml and secure.yaml when using OIDC? how does everything headless authenticate then? | 11:25 |
noonedeadpunk | I do recall that ppl use swift but I can't recall who exactly... | 11:35 |
jrosser_ | kleini: we use a combination of two things, this adds PKCE support to keystoneauth https://github.com/bbc/keystoneauth-oidc and then we have a shell script that interacts with a local browser to complete the OIDC authentication flow. The shell script either drops you into the interactive openstack client, or exits and leaves the relevant OS_<...> env vars set with a valid keystone token | 11:45 |
jrosser_ | truly headless things cannot authenticate in the usual manner as we have no support at all for username/password auth, so anything needing to do that uses application credentials | 11:46 |
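For kleini's clouds.yaml question, a headless client authenticating with an application credential would use the keystoneauth `v3applicationcredential` auth type. This sketch only renders such an entry as text. The cloud name, URL, id and secret are placeholders, and `app_credential_cloud` is a hypothetical helper. The credential itself would come from something like `openstack application credential create <name>` run by a user who already holds a token.

```python
def app_credential_cloud(name: str, auth_url: str, cred_id: str,
                         cred_secret: str, region: str = "RegionOne") -> str:
    """Render a minimal clouds.yaml entry for application-credential auth.

    Values are written unquoted, so this assumes they contain no characters
    that are special in YAML (true for typical ids and URLs).
    """
    return (
        "clouds:\n"
        f"  {name}:\n"
        "    auth_type: v3applicationcredential\n"
        "    auth:\n"
        f"      auth_url: {auth_url}\n"
        f"      application_credential_id: {cred_id}\n"
        f"      application_credential_secret: {cred_secret}\n"
        f"    region_name: {region}\n"
        "    identity_api_version: 3\n"
    )


print(app_credential_cloud("headless", "https://keystone.example.com:5000/v3",
                           "PLACEHOLDER_ID", "PLACEHOLDER_SECRET"), end="")
```

If you prefer to keep the secret out of clouds.yaml, openstacksdk also merges a separate secure.yaml. In that case, the `application_credential_secret` line would move there under the same cloud name.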
opendevreview | James Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain https://review.opendev.org/c/openstack/ansible-role-pki/+/816857 | 12:01 |
opendevreview | Andrew Bonney proposed openstack/openstack-ansible-os_octavia master: keypair: copy key to deploy host rather than setup host https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/816997 | 12:08 |
kleini | jrosser_: thanks. this clarifies for me such an authentication integration | 12:27 |
jrosser_ | in our case PKCE support in the IDP was critical | 12:27 |
jrosser_ | otherwise you have to give the client secret out to anyone wanting to use the CLI, which is pretty bad | 12:27 |
jrosser_ | PKCE allows secure auth without a client secret, and is very often used with 'public' clients such as mobile apps or embedded devices | 12:29 |
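The reason PKCE removes the need for a shared client secret is that each login proves possession of a one-time random value instead. This is a minimal sketch of the S256 method from RFC 7636. It is not code from the keystoneauth-oidc plugin mentioned above, and the function names are illustrative. The client keeps `code_verifier` private, sends `code_challenge` with `code_challenge_method=S256` in the authorization request, and later presents the verifier in the token request. The IdP recomputes the hash and compares.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """High-entropy verifier; 32 random bytes encode to 43 URL-safe chars."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636 S256."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
print("send in authorization request:", challenge)
print("send in token request:", verifier)
```

An attacker who intercepts the authorization code cannot redeem it without the verifier, which never leaves the client until the final token request.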
opendevreview | James Gibson proposed openstack/openstack-ansible master: Add playbook to generate any user defined certificates https://review.opendev.org/c/openstack/openstack-ansible/+/816522 | 13:04 |
noonedeadpunk | wtf is that... https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/815631 | 13:12 |
noonedeadpunk | feels like a result of the tempestconf move | 13:13 |
jrosser_ | hrrm i wonder if the infra people are adjusting stuff | 13:25 |
jrosser_ | has tempestconf repo moved? | 13:26 |
noonedeadpunk | Um, now I'm not sure. Just see tons of errors like `Unknown projects: osf/python-tempestconf` https://zuul.openstack.org/config-errors | 13:34 |
noonedeadpunk | probably they are to be renamed... | 13:34 |
noonedeadpunk | but anyway we don't declare it anywhere as required-projects | 13:40 |
opendevreview | James Gibson proposed openstack/ansible-role-pki master: Add tasks to generate intermediate cert chain https://review.opendev.org/c/openstack/ansible-role-pki/+/816857 | 13:50 |
spatel | folks, my stein release of openstack is running rabbitmq-server-3.7.15 (looks like it has lots of bugs, a small network blip knocks the cluster out of sync) | 15:46 |
spatel | i want to upgrade this version | 15:46 |
spatel | is this rabbitmq-server-3.7.15 hardcoded somewhere in playbook? | 15:48 |
spatel | i found it - vars/redhat.yml | 15:48 |
spatel | do i need to bump erlang if i bump rabbitmq version? | 15:50 |
mgariepy | hmm. how comes you are on 3.7.15 ? you are not on the latest sha ? | 15:51 |
mgariepy | https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/stable/stein/vars/redhat.yml#L36 | 15:52 |
spatel | I'm stuck on my older release of stein and now i am thinking of retiring this stack and moving it to ubuntu | 15:57 |
spatel | but regardless that i can just upgrade rabbitMQ right? if i don't want to touch other components | 15:58 |
mgariepy | yep you could indeed. | 15:58 |
mgariepy | take the version from the repo that file ? | 15:58 |
spatel | let me give it a try and just upgrade rabbitMQ with this new version | 15:58 |
mgariepy | from that file in the repo lol | 15:58 |
mgariepy | monday is hard lol. | 15:59 |
spatel | _rabbitmq_erlang_repo_url: right? | 15:59 |
spatel | i will sure give this try on my lab and then i will do.. anyway this is 4th time my rabbitMQ is down in last 1 week | 15:59 |
mgariepy | upgrade the rabbit role to the stable/stein ? | 15:59 |
spatel | its giving me hard time | 15:59 |
mgariepy | not fun | 16:00 |
spatel | my problem is we are getting DDoS and that causing little blip in network making this rabbitMQ cluster go bad.. i am not able to figure out what is happening in logs because logs are not useful | 16:01 |
spatel | all i am doing nuke rabbitMQ and everything come back | 16:01 |
spatel | i do have other openstack stack on same LAN and they are fine.. | 16:01 |
spatel | only this one doesn't like a small blip, so i'm assuming it's some kind of BUG | 16:01 |
mgariepy | what version the other stack is running ? | 16:04 |
spatel | they are running on queens mgariepy | 16:26 |
mgariepy | hmm ok | 16:31 |
spatel | that is why i am so sure that may be i am on wrong version on stein.. | 16:42 |
spatel | what do you guys use to monitor rabbitMQ, any good way or just check pid etc..? | 16:43 |
spatel | mgariepy thinking why don't i go directly to 3.9.8-1.el7 ? | 16:50 |
spatel | that is most latest version | 16:51 |
spatel | does it require any OSA change if i directly push to 3.9.8-1 ? | 16:51 |
mgariepy | spatel, are both rabbitmq queues configured the same on both deployment ? | 18:53 |
spatel | what do you mean by same queues? | 18:55 |
spatel | I have two openstack cloud both running on same datacenter same switch fabric | 18:55 |
spatel | but only stein one giving me hard time | 18:55 |
spatel | currently i am upgrading them | 18:56 |
spatel | installing - rabbitmq-server-3.7.28-1.el7.noarch | 18:57 |
mgariepy | i saw in the past not sure when. but rabbitmq queues switched from non-ha to ha. | 19:12 |
mgariepy | so i'm wondering if both your deployment have the same config for the rabbitmq queues. | 19:13 |
spatel | i have HA on both rabbitMQ | 19:21 |
spatel | i never mess with default setting | 19:21 |
spatel | how do i increase memory setting in rabbitMQ ? | 19:21 |
spatel | currently i have 0.2 setting whatever default comes with OSA | 19:22 |
spatel | thinking may be rabbit crying for memory | 19:22 |
spatel | mgariepy i can see my ready queue is getting full | 19:41 |
spatel | any good command to clean up queue | 19:41 |
spatel | i can see scheduler_fanout_9ce95bd354464c5bbf0ef65d01e4bdb0 is growing | 19:42 |
mgariepy | hmm why is this one growing? | 19:53 |
spatel | i don't know :( | 19:58 |
spatel | it has 80k msgs in the queue in "ready" | 19:58 |
spatel | again nuking rabbitMQ and re-building it | 19:58 |
spatel | something is very odd going on.. | 19:58 |
mgariepy | is it the number of msgs that is crashing it ? | 19:58 |
spatel | may be my rabbitMQ need more memory | 19:59 |
mgariepy | you probably need to find why they are not consumed i guess ? | 19:59 |
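To answer both "how do you monitor rabbitMQ" and "why is this queue growing", it helps to list queues by ready-message count alongside their consumer count. A queue with a large, rising `messages_ready` means messages are arriving faster than they are consumed. This sketch assumes the RabbitMQ management plugin is enabled and reachable, typically on port 15672. `GET /api/queues` returns JSON objects that include `vhost`, `name`, `messages_ready` and `consumers`. The host, credentials and threshold are placeholders, and `fetch_queues` / `backed_up_queues` are hypothetical helpers.

```python
import base64
import json
import urllib.request
from typing import List


def fetch_queues(base_url: str, user: str, password: str) -> List[dict]:
    """Fetch all queues from the RabbitMQ management HTTP API."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def backed_up_queues(queues: List[dict], min_ready: int = 1000) -> List[dict]:
    """Queues with at least min_ready ready messages, largest backlog first."""
    flagged = [
        {"vhost": q.get("vhost"), "name": q["name"],
         "ready": q.get("messages_ready", 0), "consumers": q.get("consumers", 0)}
        for q in queues
        if q.get("messages_ready", 0) >= min_ready
    ]
    return sorted(flagged, key=lambda q: q["ready"], reverse=True)


# Example with hypothetical data instead of a live fetch_queues() call.
sample = [
    {"vhost": "/nova", "name": "scheduler_fanout_x", "messages_ready": 80000, "consumers": 0},
    {"vhost": "/neutron", "name": "q-plugin", "messages_ready": 12, "consumers": 8},
]
for q in backed_up_queues(sample):
    print(q)
```

Without the management plugin, `rabbitmqctl list_queues name messages_ready consumers` gives the same columns. `rabbitmqctl purge_queue <name>` empties a queue, but it discards those messages, so it treats the symptom rather than the missing consumers.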
spatel | as soon as i upgraded rabbitMQ i started noticing the growing queue | 19:59 |
spatel | how do i increase this value using OSA - vm_memory_high_watermark, 0.2 | 20:01 |
spatel | i have 250 compute nodes and may be my rabbitMQ need more memory | 20:01 |
spatel | i didn't find this variable in default/main.yml | 20:01 |
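For the `vm_memory_high_watermark` question, the relative form is a fraction of the host's RAM. When RabbitMQ's memory use crosses it, a memory alarm fires and publishing connections are blocked until usage drops. RabbitMQ's own default is 0.4. The 0.2 seen here is what spatel's deployment ships with. This sketch only does the arithmetic for a hypothetical 64 GiB controller. The setting is `{vm_memory_high_watermark, 0.2}` in the classic config format and `vm_memory_high_watermark.relative = 0.2` in rabbitmq.conf. Which file and variable the OSA role uses for it is not established in this discussion.

```python
GIB = 1024 ** 3


def memory_alarm_threshold_bytes(total_ram_bytes: int, relative_watermark: float) -> int:
    """Bytes of RabbitMQ memory use at which the memory alarm triggers."""
    if not 0 < relative_watermark <= 1:
        raise ValueError("relative watermark must be in (0, 1]")
    return int(total_ram_bytes * relative_watermark)


# Hypothetical 64 GiB host: compare the OSA-shipped 0.2 with RabbitMQ's 0.4.
for wm in (0.2, 0.4):
    limit = memory_alarm_threshold_bytes(64 * GIB, wm)
    print(f"watermark {wm}: alarm at ~{limit / GIB:.1f} GiB")
```

Raising the watermark only gives a backlog more room before publishers are blocked. As the rest of the discussion shows, an 80k-message queue points at consumers that cannot keep up.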
spatel | mgariepy just rebuild and now this is the status - https://ibb.co/L5cTSMg | 20:05 |
spatel | watching closely to see if queue growing or not | 20:05 |
spatel | I have 11k consumer do you think its big number? | 20:10 |
spatel | hmm i have rpc_workers=1 in neutron.conf, maybe this number is too low and is causing the backlog | 20:13 |
spatel | damn it.. that was it.. | 20:22 |
spatel | rpc_workers=1 was my issue | 20:22 |
spatel | as soon as i bump that up my rabbitMQ is draining like hell | 20:22 |
spatel | i will keep eye on it and let you know if this setting resolve my issue.. | 20:23 |
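Why `rpc_workers=1` produces an ever-growing queue, while bumping it drains the queue quickly, follows from a simple rate balance. If messages are published faster than the workers consume them, the backlog grows linearly. Once total consumption exceeds the publish rate, it drains at the difference. This is a deliberately simplified fluid model, not a measurement of neutron-server. Real RPC traffic is bursty, and the per-worker rate of 60 msg/s and publish rate of 100 msg/s below are made-up numbers.

```python
def backlog_after(seconds: int, publish_rate: float, per_worker_rate: float,
                  workers: int, initial_backlog: float = 0.0) -> float:
    """Queue depth after `seconds`, stepping a constant-rate model once per second."""
    backlog = initial_backlog
    consume_rate = per_worker_rate * workers
    for _ in range(seconds):
        # The queue can never go below zero messages.
        backlog = max(0.0, backlog + publish_rate - consume_rate)
    return backlog


# Hypothetical rates: 100 msg/s published, 60 msg/s handled per RPC worker.
print("1 worker, after 2000 s:", backlog_after(2000, 100, 60, workers=1))
print("4 workers, 80k backlog, after 600 s:",
      backlog_after(600, 100, 60, workers=4, initial_backlog=80000))
```

With one worker, the model grows by 40 msg/s, reaching 80k in about 2000 s. With four workers, it drains an 80k backlog at 140 msg/s in under ten minutes. This matches the "draining like hell" behaviour only qualitatively.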
spatel | what is q-server-resource-versions_fanout ? | 20:27 |
mgariepy | how big is that cluster ? | 21:31 |
mgariepy | arf.. | 21:32 |
mgariepy | i'm not sure what can be the issue, i don't have a cluster that big on openstack. | 21:34 |
mgariepy | 11k consumer. i guess it probably X thread per process.. | 21:35 |
mgariepy | or per service | 21:35 |
spatel | I have 250 compute nodes in this RabbitMQ cluster | 21:49 |
spatel | I think my cluster was missing some critical tuning options, as i said rpc_workers, rpc_pool_size and agent report_interval etc.. | 21:50 |
spatel | https://docs.mirantis.com/mcp/q4-18/mcp-deployment-guide/advanced-config/tune-rabbitmq-perf.html | 21:50 |
spatel | I am using this guide to adjust them according.. | 21:51 |
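In OSA, options like these normally go into `neutron_neutron_conf_overrides`, which the role's config_template merges into neutron.conf. This sketch only imitates that merge with Python's `configparser` to show the resulting INI layout. It is not how config_template works internally. The option names are real neutron.conf options, but every value below is a hypothetical placeholder, not a recommendation. One documented constraint worth keeping: `agent_down_time` should be at least twice `[agent] report_interval`, otherwise healthy agents may be reported as down.

```python
import configparser
import io

# Hypothetical starting point resembling the problematic setting.
BASE_CONF = """\
[DEFAULT]
rpc_workers = 1
"""

# Hypothetical override values, in the section/option shape OSA overrides use.
OVERRIDES = {
    "DEFAULT": {"rpc_workers": "8", "rpc_state_report_workers": "4",
                "agent_down_time": "120"},
    "agent": {"report_interval": "60"},
}


def merge_ini(base: str, overrides: dict) -> str:
    """Apply section/option overrides on top of an INI document."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(base)
    parser.read_dict(overrides)  # later values replace earlier ones
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


merged = merge_ini(BASE_CONF, OVERRIDES)
print(merged, end="")
```

Keeping report_interval and agent_down_time in step matters at this scale. With 250 compute nodes, every agent heartbeat is an RPC message, so a longer interval reduces steady-state load on RabbitMQ.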
spatel | Thinking we should create a document in the official OSA docs about which settings you can tune to scale out your stack | 21:53 |
spatel | More good stuff - https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Evolution-of-OpenStack-Networking-at-CERN3.pdf | 21:57 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!