Friday, 2022-01-14

kleiniwith W I have the problem on some computes, that nova-compute runs into too many open files with oslo messaging. Is this a known issue?08:32
andrewbonneyAha, you'll be after https://bugs.launchpad.net/oslo.messaging/+bug/194996408:33
andrewbonneyThere's a partial fix in there, but I'm not sure it's 100% solved08:33
kleinioh, thanks very much. will try to get the fix deployed08:38
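A quick way to see how close a process such as nova-compute is to the "too many open files" limit is to compare its open descriptor count with its soft limit. This is a minimal, Linux-only sketch, not taken from the bug report; the helper names `open_fd_count` and `fd_usage` are made up for illustration, and reading another user's process normally requires root.

```python
import os
import resource
from typing import Tuple


def open_fd_count(pid: int) -> int:
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid: int) -> Tuple[int, int]:
    """Return (open descriptors, soft RLIMIT_NOFILE) for the given pid."""
    # prlimit() with no new limits only reads the current (soft, hard) pair.
    soft, _hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    return open_fd_count(pid), soft


if __name__ == "__main__":
    used, soft = fd_usage(os.getpid())
    print(f"{used} descriptors open, soft limit {soft}")
```

Run against the nova-compute pid over time, a steadily growing count would point at a descriptor leak rather than a limit that is simply set too low.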
opendevreviewJonathan Rosser proposed openstack/ansible-role-python_venv_build master: Add per-distro vars files  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/82418008:50
opendevreviewMerged openstack/openstack-ansible-plugins master: Fix modules location  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82464909:01
opendevreviewMerged openstack/openstack-ansible-plugins master: Update provider_networks with latest changes  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82464609:02
opendevreviewMerged openstack/openstack-ansible-plugins master: Move git_requirements to plugins collection  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82456309:03
opendevreviewMerged openstack/openstack-ansible-tests stable/xena: Fix rich version for ansible-lint  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/82454009:23
opendevreviewDmitriy Rabotyagov proposed openstack/ansible-role-systemd_service master: Add ability to create templated services  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/81653110:24
noonedeadpunkI'm working on journald-remote now and systemd templated service would be handy to define multiple destinations ^10:25
noonedeadpunkDo we have any agreement if we want this feature or not because of complexity it brings?10:25
noonedeadpunkIt always could be just different services indeed10:27
noonedeadpunkbut it's somehow comfy to have same name and just different arguments...10:28
opendevreviewMerged openstack/openstack-ansible-os_keystone master: Use common service setup tasks from a collection rather than in-role  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/82099910:30
jrosser__noonedeadpunk: the complexity is not out of line with other things we have10:31
jrosser__it maybe lacks a link to the systemd documentation which describes what template services are10:32
noonedeadpunkI put it in reno:) https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/816531/4/releasenotes/notes/templated_service-f31e4515c2fd75ab.yaml10:32
noonedeadpunksorry should have placed in commit msg as well I guess10:32
noonedeadpunkbut it's really described in good manner there10:34
jrosser__or even as a comment in the defaults file, as I had to go read to understand that it was not an ansible templating thing, but actually native to systemd10:34
noonedeadpunkah, yes, fair10:35
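For readers unfamiliar with the systemd feature discussed here: a template unit is a file named like `name@.service`, and each instance `name@foo.service` gets `foo` substituted for the `%i` specifier. The sketch below only mimics that naming and substitution in Python to show the idea. The unit name `journal-upload@.service`, the `ExecStart` line and the host names are hypothetical, and systemd's escaping rules for instance names are ignored.

```python
def instance_unit(template: str, instance: str) -> str:
    """Turn 'name@.service' plus an instance into 'name@instance.service'."""
    prefix, sep, suffix = template.partition("@")
    if not sep or not suffix.startswith("."):
        raise ValueError(f"{template!r} is not a template unit name")
    return f"{prefix}@{instance}{suffix}"


def expand_instance(line: str, instance: str) -> str:
    """Substitute the %i specifier the way systemd does for simple names."""
    return line.replace("%i", instance)


# Hypothetical template content: one definition, several destinations.
TEMPLATE = "journal-upload@.service"
EXEC_START = "ExecStart=/usr/bin/example-uploader --url=https://%i:19532"

if __name__ == "__main__":
    for dest in ("loghost1", "loghost2"):
        print(instance_unit(TEMPLATE, dest))
        print("  " + expand_instance(EXEC_START, dest))
```

This is the "same name and just different arguments" convenience mentioned above: one unit definition, with the destination carried in the instance name.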
jrosser__does it need to also account for the “load” Boolean I just added?10:35
noonedeadpunkI was again suck in naming things... 10:35
noonedeadpunkI just rebased it on top of your change10:35
noonedeadpunkL25 https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/816531/4/tasks/systemd_load.yml#2510:36
jrosser__ah yes I see10:37
opendevreviewDmitriy Rabotyagov proposed openstack/ansible-role-systemd_service master: Add ability to create templated services  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/81653110:37
jrosser__I’m happy with it, so use it for the journal things10:38
noonedeadpunkWhy I was talking about complexity is that with patch it becomes less obvious from output what service we're trying to run against10:38
noonedeadpunkbecause of double iteration10:38
jrosser__can that be improved with two variables used in the task description10:39
noonedeadpunkThus I added service_name to task to at least somehow cover that10:39
jrosser__if I follow properly10:39
noonedeadpunk*to task name10:39
noonedeadpunkyeah, true10:39
noonedeadpunkBut I wasn't able to make it for handlers for some reason...10:40
noonedeadpunk(but not sure now maybe I was :p)10:41
noonedeadpunkworth checking logs10:41
*** arxcruz|ruck is now known as arxcruz11:13
*** dviroel|out is now known as dviroel11:21
opendevreviewMerged openstack/ansible-config_template master: Copy refactor of code quality issues  https://review.opendev.org/c/openstack/ansible-config_template/+/82460111:32
opendevreviewJonathan Rosser proposed openstack/ansible-role-python_venv_build master: Split venv_rebuild functionality  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/77398411:57
*** anbanerj is now known as frenzyfriday|ruck12:02
opendevreviewMerged openstack/openstack-ansible-os_zun stable/xena: Remove testing on Centos-8  https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/82453512:11
jrosser__theres still two blocked things on here which i won't be able to look at today https://review.opendev.org/q/topic:%22osa%252Fremove-centos8%22+(status:open%20OR%20status:merged)12:13
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Use provider_networks from collection  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/82465012:16
opendevreviewMerged openstack/ansible-role-python_venv_build master: Add per-distro vars files  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/82418013:14
opendevreviewMerged openstack/openstack-ansible-plugins master: Move system_crontab_role to collection  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82459013:36
opendevreviewMerged openstack/openstack-ansible-tests stable/ussuri: Remove opensuse jobs  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/82420713:50
opendevreviewOpenStack Proposal Bot proposed openstack/openstack-ansible-os_murano stable/ussuri: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-os_murano/+/82471714:02
opendevreviewMerged openstack/openstack-ansible master: Do not duplicate packages installed with the venv build role  https://review.opendev.org/c/openstack/openstack-ansible/+/82417914:37
spateljamesdenton question how do i remove dead gateway chassis from OVN 15:11
jamesdentoncan you elaborate?15:18
jamesdenton(i am not sure)15:18
spatelI had 4 network node and one of node is dead,  i have removed dead node but ovn still showing in ovn-nb db 15:28
*** dviroel is now known as dviroel|lunch15:33
jamesdentonwhich command are you using to list those15:48
spateljamesdenton let me DM you for security reason 15:55
opendevreviewMerged openstack/openstack-ansible-os_magnum master: Run service_setup only once  https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/82452616:02
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Convert infra-journal-remote playbook to role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82473116:36
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Move infra-journal-remote logic to its role  https://review.opendev.org/c/openstack/openstack-ansible/+/82473416:41
noonedeadpunkdamiandabrowski[m]: you might be interested in these 2 ^16:41
noonedeadpunkfor some reason I see systemd-journal-upload being stuck in my aio, but as log host I used container, so maybe that's why...16:42
damiandabrowski[m]ouh, so we already made a decision to put it into openstack-ansible-plugins16:42
damiandabrowski[m]thanks anyway16:42
noonedeadpunkI guess we did during last meeting?16:43
noonedeadpunkWe agreed to re-evaluate that later if needed16:44
noonedeadpunkbut place for now in plugins as "staging" place16:45
damiandabrowski[m]ouh ok, maybe i missed that :D it's ok then16:45
*** dviroel|lunch is now known as dviroel|16:46
*** dviroel| is now known as dviroel16:46
noonedeadpunkor we might have all misunderstood each other16:46
noonedeadpunkI was kind of referencing https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-01-11-15.00.log.html#l-9316:47
noonedeadpunkbut you're right, I must use `agreed` command more16:47
noonedeadpunkWill try fixing that in the future16:47
damiandabrowski[m]it's ok Dmitriy ;)16:47
damiandabrowski[m]if someone is interested: it may be a new beginning of infra-journal-remote role(i'm going to focus on that on February) 16:48
damiandabrowski[m]https://github.com/citynetwork/role-journal-remote16:48
noonedeadpunklol16:48
noonedeadpunkwelll16:50
noonedeadpunkwe're a big company :p16:50
noonedeadpunkI know that we were about to do some work on it but didn't know it has been already done16:53
noonedeadpunkI should have left that alone...16:54
damiandabrowski[m]ooops :D 16:54
damiandabrowski[m]however, I haven't looked much into Erik's role yet, so at the end of the day, it may be easier to start with Your version16:57
noonedeadpunkthey're very close... I kind of have a feeling that I just stole that work, but I swear I saw it just now :)16:58
noonedeadpunkI mean - even files named same16:59
noonedeadpunkwell, looking https://github.com/citynetwork/role-journal-remote/blob/main/tasks/journal_remote_post_install.yaml#L40-L54 I realized that using systemd_service role to setup service might be indeed wrong17:03
noonedeadpunkas just realized that systemd-journal-remote.service shipped with package17:03
noonedeadpunkor, well, override_only should be used I believe17:05
noonedeadpunkor just mask default services :)17:05
noonedeadpunkbut override_only sounds like nice option indeed.17:06
noonedeadpunkwill sort this out anyway )17:08
damiandabrowski[m]I'm not that familiar with ansible-role-systemd_service but looking at #816531, I think override_only may be a good idea ;)17:09
jawad-axdHi all! I came across this paper http://seclab.cs.sunysb.edu/seclab/pubs/asiaccs16.pdf . A bit shocked to see Section 3.2 in the paper. Just want to know if that is really the case, when one compute node is compromised, to what extent rest of infrastructure is secure. Highly appreciate some comments on this.17:44
mgariepyin what case you can have a host compromised and be confident you are not screwed ?17:58
noonedeadpunkjawad-axd: um, for instance we create a rabbitmq user per vhost, each service resides on its own vhost18:08
jawad-axdThere are hypervisor exploitation stories around, and my boss is very concerned that if one compute host is compromised then others should not be affected. Sigh! But yeah, is there some answer to this question?18:09
noonedeadpunkAnd I don't think you can really send RPC call with nova rpc user that will make neutron to create port and without that you won't have instance18:10
noonedeadpunkIf one is compromised this means there's a way to compromise it, which means that others likely have same door open. So even without RPC and other stuff mentioned you're likely screwed as mgariepy said18:11
mgariepyyou can jump from compute to compute in case you have migration activated.18:12
noonedeadpunkwell, also, I think good thing to do might be to use different users for cells... So even if they get rpc for cell, this doesn't mean it will be possible to screw whole rabbit18:13
noonedeadpunkfor nova user - yes18:13
mgariepyif you use shared storage between your computes, you likely have access to the whole pool via the client access.18:13
jawad-axdIn the paper, they describe, on a compromised compute host, sniffing a token, grabbing MQ credentials, creating an MQ message with a wildcard, and getting authorized by the API. According to them, there is no check after the token is authorized by the API, token permissions are not limited, and so on. 18:14
mgariepyfor nova user yes but escalation is often possible but it depends on a lot of other stuff for sure..18:15
noonedeadpunkhm, btw I wonder if things like https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/templates/nova.conf.j2#L135-L150 must be defined for computes....18:19
noonedeadpunkbecause with that you don't need to sniff a thing:)18:19
mgariepyhmm. indeed.18:20
jrosser__ultimately any host running software that needs to interact with another host must have some kind of credential18:21
jrosser__and so if that credential is compromised then there is nothing you can do18:22
mgariepyeven with vault or something like that it would not help a lot i think18:23
noonedeadpunkit will just make things a bit harder18:24
mgariepyyeah18:24
noonedeadpunkbut until you have credentials to access vault...18:24
noonedeadpunkstored on same host18:24
mgariepythe idea is only to run faster than the other anyway no ? haha18:25
mgariepyso the lions doesn't get you ! 18:25
jawad-axdRight. Seems like it's very difficult to close all the holes from a compromised compute, also there is shared storage between compute hosts. I got that. Is there any other OSA solution for compute security domains or maybe I am asking too much.18:25
jawad-axd?18:26
jawad-axdWhich might help in this kind of situation.18:26
noonedeadpunkSo indeed, you don't need to sniff anything, as keystone admin credentials are stored in nova.conf on each compute18:30
noonedeadpunkthey passed really extra mile there :P18:30
noonedeadpunkas nova-compute needs to talk to other services and have admin privileges or write specific policy, etc18:31
mgariepylocking all the holes is hard, often holes get spawned out of thin air like the log4shell thing or heartbleed...18:31
noonedeadpunkgood idea that was thrown in #openstack-nova - use application credentials per compute node.18:36
noonedeadpunkat least you can rotate that fast enough....18:36
noonedeadpunkbut still - extra users/projects could be created until that is done18:37
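To make the per-compute application credential idea concrete, here is a rough sketch that renders a separate auth section for each compute host, so that a leaked credential can be revoked for that one host only. It assumes keystoneauth's `v3applicationcredential` auth type with the `application_credential_id` and `application_credential_secret` options; the section name `example_service`, the URL, the host names and the credential values are placeholders, not what the os_nova role actually templates.

```python
import configparser
import io


def render_auth_section(section: str, auth_url: str,
                        cred_id: str, cred_secret: str) -> str:
    """Render an INI section that authenticates with an application credential."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[section] = {
        "auth_type": "v3applicationcredential",
        "auth_url": auth_url,
        "application_credential_id": cred_id,
        "application_credential_secret": cred_secret,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical inventory: one credential per compute host.
    per_host = {
        "compute01": ("id-for-compute01", "secret-for-compute01"),
        "compute02": ("id-for-compute02", "secret-for-compute02"),
    }
    for host, (cred_id, secret) in per_host.items():
        print(f"# {host}")
        print(render_auth_section("example_service",
                                  "https://keystone.example.org/v3",
                                  cred_id, secret))
```

As noted above, this only narrows the blast radius and speeds up rotation; whatever the credential is allowed to do can still be abused until it is revoked.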
jawad-axdI'll look at app credentials with compute node. Thanks. Wondering if all openstack public clouds out there have some kind of mechanism for this scenario or they are just trusting the hypervisor.18:41
noonedeadpunkjawad-axd: but that paper is stupid imo... `Each compute node stores its MQ credentials inside its OpenStack configuration files.` In addition to MQ credentials, all keystone user credentials are stored there as well, so they could just use them rather play around and sniff smth...18:42
noonedeadpunkIf you can read nova.conf - you don't need to do all the rest stuff as you granted access to keystone18:43
jawad-axdThats also true. @noonedeadpunk 18:44
noonedeadpunkbut eventually if you escaped libvirt domain - you should not be able to read nova.conf anyway...18:46
noonedeadpunkso it's only if you got root kind of...18:51
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Fix umask for /etc/nova directory  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/82477418:54
mgariepynoonedeadpunk, small comment on this one ^^19:04
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Fix umask for /etc/nova directory  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/82477419:07
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Change default mode while creating directories  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/82477419:08
mgariepythanks19:13
spateljamesdenton did you work on GPU virtualization19:53
spateli have GPU nodes and trying to play :)19:53
spateli am working on build HPC on openstack19:55
spatelMy hardware is Tesla V100S PCIe 32GB19:56
jrosser__noonedeadpunk: i expect we could do a lot more with this stuff https://docs.arbitrary.ch/security/systemd.html20:16
spatelis this check against any database to get this score?20:22
spatelhow does it score ?20:22
spatelnevermind - https://itectec.com/ubuntu/ubuntu-how-to-address-results-of-systemd-analyze-security/20:23
spatelhope this is not some like SELinux and soon folks stop paying attention and disable it20:24
jamesdentonspatel i did some GPU passthrough a while back, but nothing more than that20:26
spateljamesdenton https://www.jimmdenton.com/gpu-offloading-openstack/20:26
jamesdentonyep20:26
spateljamesdenton - https://paste.opendev.org/show/812125/20:34
spatelin my case should i black-list - nouveau, nvidia_drm20:34
spatelcurious that you used vfio-pci but all other documents not talking about that 20:52
*** dviroel is now known as dviroel|out20:57
spatelanyway i am following your blog to see how it goes :)21:08
mgariepyspatel, do you want to do passthrough or vgpus stuff?21:13
spatelpassthrough21:13
spateli want to expose my GPU to virtual machine21:13
mgariepyvgpus also expose some gpu to the vms 21:14
mgariepybut you can usually split the gpu and share it between multiple vms.21:14
spatelwhat is the difference here?21:14
mgariepyand you need to pay a licence i think.21:15
mgariepyimo passthrough is much more simple.21:15
spatelwhat is the difference between Passthrough vs vgpu ? function and benefit point of view21:15
mgariepyyou can split a gpu to share it between multiple vms21:16
spatelin vgpu deployment?21:17
mgariepyyou can have 1 gpu split between 4 vms 21:17
mgariepy(made up number as i don't currently have any that i run) 21:18
spatelThis is my first GPU compute nodes so no idea what i am doing :)21:18
spateljust 2 hour ago i got my GPU compute node and try to learn as much as possible21:19
mgariepypassthrough is more simple.21:19
mgariepyvgpus need some licence and a match of driver version on the hosts and in vm. iirc.21:19
spatellets do passthrough then and later vgpus 21:20
spatelI have 10 GPU compute nodes where we are going to run simulation 21:20
mgariepyok21:21
mgariepyif you have 1 user i guess you don't need to bother with vgpus.21:21
spatelThis is University openstack cluster for HPC style simulation and research 21:22
spateli am assuming multiple folks or student going to use21:22
spatelbut let me first go with whatever easy21:23
spateli am reading this doc and they are talking about vGPU - https://docs.openstack.org/nova/queens/admin/virtual-gpu.html21:23
mgariepyjamesdenton, nice post. i vaguely remember having to add the driver and pciid combination to the initramfs. 21:23
spateli am following his doc21:24
mgariepylet me know if it works.21:24
spatelI will blog that out too :)21:24
spateli need to add entry in /etc/nova/nova.conf of compute node like - passthrough_whitelist: 21:25
spatelbut why do i need entry in nova-api for alias ? alias: { "vendor_id":"10de", "product_id":"1c30", "device_type":"type-PCI", "name":"quadro-p2000" }21:25
mgariepyso you can refer to the card via the alias in the flavor.21:26
spatelwhat if i have multiple kind of GPU hardware then how do i handle in nova-api?21:26
spateli have to add multiple key/value assuming 21:26
mgariepy--property "pci_passthrough:alias"="quadro-p2000:1" 21:27
mgariepyyou can have multiple alias21:27
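The "multiple alias" answer can be sketched as follows: one `alias` entry per GPU model on the API side, and a flavor property that refers to the alias by name. The quadro-p2000 values are the ones quoted above; the second entry's name and product ID are placeholders (look up the real IDs with `lspci -nn`), and the helper names are made up for illustration.

```python
import json


def alias_line(name: str, vendor_id: str, product_id: str,
               device_type: str = "type-PCI") -> str:
    """Build one 'alias = {...}' line for the [pci] section of nova.conf."""
    alias = {
        "vendor_id": vendor_id,
        "product_id": product_id,
        "device_type": device_type,
        "name": name,
    }
    return "alias = " + json.dumps(alias)


def flavor_property(name: str, count: int = 1) -> str:
    """Build the flavor property that requests <count> devices of an alias."""
    return f'"pci_passthrough:alias"="{name}:{count}"'


if __name__ == "__main__":
    gpus = [
        ("quadro-p2000", "10de", "1c30"),    # values quoted in the chat
        ("example-gpu-b", "10de", "ffff"),   # placeholder product ID
    ]
    for name, vendor, product in gpus:
        print(alias_line(name, vendor, product))
        print("  flavor: --property " + flavor_property(name))
```

Each GPU model then gets its own flavor, and the scheduler matches the alias name in the flavor against the devices the compute nodes whitelist.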
spatelah! ok21:27
spatellet me give it a try 21:27
mgariepyif you end up with gamers gpus that have usb ports and other stuff you will have fun :D21:28
spatelfun part is i am doing all this with kolla-ansible :) 21:28
mgariepyget out ! :P21:28
spatelhehe21:28
mgariepyLOL21:28
spatelThey have hard requirement to use kolla-ansible and that is why i am learning my way to use kolla 21:28
mgariepydo you have real gpu or you have gamers ones?21:28
spatelreal GPU :)21:29
mgariepyok nice.21:29
mgariepyanyway. have a nice weekend. i'm, done for this week.21:29
spatelthanks for the help :) and have a great weekend with lots of wins and beers21:29
spatelwine*21:30
spateli bought new keyborad and its messing with me 21:30
opendevreviewMerged openstack/openstack-ansible master: Move git_requirements plugin to collection  https://review.opendev.org/c/openstack/openstack-ansible/+/82457422:42
