bjolo | godmorning | 07:43 |
bjolo | goodmorning i mean :) | 07:44 |
jrosser | good morning | 07:55 |
noonedeadpunk | \o/ | 08:16 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113 | 10:25 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Create ssh certificate authority https://review.opendev.org/c/openstack/openstack-ansible/+/825292 | 10:27 |
*** dviroel|out is now known as dviroel | 10:58 | |
jrosser | noonedeadpunk: did you see the ML stuff about the new RBAC for deployment projects? | 11:38 |
damiandabrowski[m] | he's not fully available today | 12:03 |
damiandabrowski[m] | guys, have you noticed that attaching a cinder volume does not work on our AIO? for some reason, iscsid.socket is not spawning iscsid.service | 12:05 |
damiandabrowski[m] | starting iscsid.service manually or rebooting the whole AIO helps | 12:06 |
damiandabrowski[m] | but I wonder how to properly fix it | 12:06 |
damiandabrowski[m] | (i'm testing it on focal) | 12:09 |
jrosser | that might be why the zun tests fail | 12:14 |
jrosser | (one of the reasons) | 12:14 |
jrosser | it could be an ordering issue, seeing the service state and journal from a fresh AIO might be interesting | 12:15 |
jrosser | to see if it ever tried to start, or if there is some error with the config which then doesn't get reloaded | 12:16 |
damiandabrowski[m] | I just spawned a fresh aio | 12:18 |
damiandabrowski[m] | https://paste.openstack.org/show/812220/ | 12:18 |
damiandabrowski[m] | don't see anything indicating that iscsid.service tried to start before | 12:19 |
andrewbonney | I've seen this before in my own AIOs, but relevant tests have passed in CI. I couldn't work out why there was a difference | 12:27 |
damiandabrowski[m] | i assume that's why this test is currently disabled (it tries to attach a volume to a VM, which does not work, so it fails): | 12:29 |
damiandabrowski[m] | https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/utility_all.yml#L96 | 12:29 |
jrosser | damiandabrowski[m]: you might want to look at this | 12:33 |
jrosser | https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/824042 | 12:33 |
jrosser | it's not about iscsi but i used a socket activated service there to replace xinetd | 12:34 |
jrosser | so you can see the order that the services need to be created / loaded / restarted to make that work | 12:34 |
jrosser | i think that the state: "restarted" was a key thing on the socket service itself | 12:35 |
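The ordering jrosser describes can be sketched as Ansible tasks (a hedged sketch, not the actual galera_server change; the unit name is a placeholder and the `ansible.builtin.systemd` module is assumed):

```yaml
# Hypothetical handler ordering for a socket-activated service:
# refresh the unit files first, then restart the .socket unit itself
# so systemd re-reads its listener configuration.
- name: Reload systemd to pick up new unit files
  ansible.builtin.systemd:
    daemon_reload: true

- name: Restart the socket unit (the key step noted above)
  ansible.builtin.systemd:
    name: example.socket   # placeholder unit name
    state: restarted
    enabled: true
```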
damiandabrowski[m] | thank you, unfortunately restarting iscsid.socket does not help | 12:37 |
damiandabrowski[m] | I'm trying to find out what `ListenStream=@ISCSIADM_ABSTRACT_NAMESPACE` in the socket definition means, because I have literally no idea O.o | 12:38 |
jrosser | which bit? ListenStream? | 12:39 |
damiandabrowski[m] | no, `@ISCSIADM_ABSTRACT_NAMESPACE` :D | 12:39 |
jrosser | `If the address starts with an at symbol ("@"), it is read as abstract namespace socket in the AF_UNIX family.` | 12:40 |
damiandabrowski[m] | ahhh, thanks | 12:41 |
jrosser | https://www.freedesktop.org/software/systemd/man/systemd.socket.html | 12:42 |
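At the socket API level, the `@` prefix jrosser quotes corresponds to a leading NUL byte in the socket name. A minimal Linux-only Python sketch (the socket name is made up):

```python
import socket

# On Linux, binding an AF_UNIX socket to a name starting with a NUL byte
# places it in the abstract namespace -- no filesystem entry is created.
# systemd spells this leading NUL as "@" in ListenStream=.
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind("\0demo_abstract_namespace")  # made-up name
print(s.getsockname())  # b'\x00demo_abstract_namespace'
s.close()
```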
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/825306 | 12:45 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Create ssh certificate authority https://review.opendev.org/c/openstack/openstack-ansible/+/825292 | 12:45 |
jrosser | damiandabrowski[m]: this is related but a long time ago https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1755858 | 12:54 |
damiandabrowski[m] | ah, so we probably hit our issue when this one was fixed | 13:01 |
damiandabrowski[m] | the interesting thing is that when i started and then stopped iscsid.service manually, nova/cinder were able to start it again when i tried to attach a volume (previously they couldn't do that) | 13:03 |
opendevreview | Merged openstack/openstack-ansible master: Move system_crontab_coordination role to collection https://review.opendev.org/c/openstack/openstack-ansible/+/824593 | 14:16 |
*** dviroel is now known as dviroel|lunch | 14:57 | |
DK4 | i'm using this guide for a ceph prod setup: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html and it fails because of a ceph error in the second playbook: https://controlc.com/01090692 any ideas? | 15:45 |
jrosser | DK4: ultimately i think this is the documentation having diverged from reality https://paste.opendev.org/show/812224/ | 15:49 |
jrosser | i would think that in the past cidr_networks used to be available in ansible hostvars but that seems not to be the case any more | 15:50 |
DK4 | jrosser: thanks for the quick response. i think i found the mistake, i forgot the &-anchor in the userconfig | 15:50 |
jrosser | quickest workaround would be to replace those entries in user_variables.yml with the actual address ranges | 15:50 |
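The workaround could look something like this in user_variables.yml (a sketch only: the variable names follow the ceph-ansible integration used by the linked guide, and the address ranges are example values to replace with your own):

```yaml
# Instead of referencing cidr_networks from the inventory (no longer
# visible in hostvars), hard-code the actual ranges (example values):
public_network: 172.29.244.0/22
cluster_network: 172.29.248.0/22
```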
jrosser | DK4: for production deployments, we normally see people deploying separate ceph clusters, rather than integrated tightly with OSA | 15:52 |
jrosser | you have the choice to do it either way, but long term maintenance tends to be easier if they are decoupled | 15:53 |
jrosser | but size / scale / use-case can also play a part | 15:53 |
*** dviroel|lunch is now known as dviroel | 16:15 | |
evrardjp | hello folks. For keepalived I am still testing with ansible 2.9. Should I drop this? | 16:47 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/825306 | 16:47 |
jrosser | evrardjp: we are a long way ahead of that now in these parts | 16:47 |
evrardjp | jrosser: including stable branches? | 16:48 |
jrosser | though it depends how far back you want to cover | 16:48 |
evrardjp | as long as OSA is covering I guess | 16:48 |
evrardjp | for the rest of the folks using the roles I think it's fine to move on. | 16:48 |
evrardjp | alternatively, old osa branches can just not bump keepalived role, which is fine too | 16:49 |
jrosser | ussuri is EM and the last place we used 2.9 | 16:49 |
evrardjp | ok then I should be good | 16:49 |
evrardjp | thanks for confirming jrosser! | 16:49 |
jrosser | we are 2.10 for V & W so that will be around for a while yet | 16:50 |
evrardjp | and happy new year to you, your family, and your team :) | 16:50 |
jrosser | thankyou :) | 16:50 |
evrardjp | good to see damiandabrowski[m] in here :) | 16:51 |
damiandabrowski[m] | hey JP! | 16:52 |
evrardjp | damiandabrowski[m]: unrelated convo, are you using matrix? | 16:55 |
damiandabrowski[m] | I am, is it wrong? :D | 16:56 |
evrardjp | Well, I do have matrix, but I am still using my bouncer. I want to get rid of it tbh | 16:57 |
evrardjp | was wondering if the bridge is nice nowadays. | 16:57 |
jrosser | i've never looked back from irccloud | 16:58 |
evrardjp | jrosser: a bit more context, when I was in TC, before the whole mess with freenode happened: https://governance.openstack.org/ideas/ideas/pylon/synchronous-and-pseudo-synchronous-comms.html#the-proposal | 16:59 |
evrardjp | but yeah irccloud is nice. | 17:00 |
damiandabrowski[m] | I'm using element.io and I'm quite happy with it, but I never used anything else(except some console client years ago) so I can't really compare | 17:01 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825164 | 17:23 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Implement variable: tempest_endpoint_type https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825156 | 17:26 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Rename [orchestration] section to [heat_plugin] in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825163 | 17:27 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825164 | 17:31 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Implement variable: tempest_endpoint_type https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825156 | 17:32 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825164 | 17:33 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Rename [orchestration] section to [heat_plugin] in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825163 | 17:35 |
spatel | I have a glusterFS mount point on all my compute nodes and i pointed nova's /var/lib/nova at the glusterfs mount point and all is good, but when i delete a vm i found nova does not delete the disk file, so i have to do it by hand | 17:35 |
evrardjp | Sometimes I want to shoot myself in the head when I see the direction ansible and molecule are taking. Making things incredibly hard for 0 reason... | 17:35 |
spatel | did anyone notice this issue with shared mount points | 17:35 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825164 | 17:40 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825164 | 17:40 |
evrardjp | does anyone here have an example repo with molecule testing? I would like to know the recommended way to add the docker collection to make testing work with molecule, by requiring the collection in a new requirements.yml at the root of my repo | 17:46 |
evrardjp | without the docker collection, it all fails, as molecule-docker now requires it since version >1.0 | 17:46 |
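A sketch of what that root-level requirements.yml could contain, assuming the collection in question is community.docker (which recent molecule-docker releases depend on):

```yaml
# requirements.yml at the repo root; install with:
#   ansible-galaxy collection install -r requirements.yml
collections:
  - name: community.docker
```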
jrosser | evrardjp: the tripleo people started on this in the os_tempest role https://github.com/openstack/openstack-ansible-os_tempest/commit/3f4b58bd4133b83c8556c2275875188147d2a58b | 17:48 |
jrosser | but i feel that it really has not gone anywhere | 17:48 |
evrardjp | I see | 17:48 |
jrosser | however i'm not really sure this is going to be helpful | 17:49 |
evrardjp | I was using molecule, but pinned to old versions | 17:49 |
evrardjp | it's the new versions that are a pain, because they are assuming the docker collection is installed on the system | 17:49 |
jrosser | we do have the same problem in OSA, what to do in a post openstack-ansible-tests world | 17:49 |
jrosser | we are very close to dropping all the jobs relying on that repo now, as the maintenance overhead is just too much | 17:49 |
evrardjp | well, I feel that the value of having different repos is nowadays pretty much gone | 17:50 |
jrosser | but it does leave a gap with how to run tests in underlying/utility roles | 17:50 |
evrardjp | jrosser: would it make sense to take a stance, in the OSA community, about where you want to head in terms of testing, and get the ball rolling? | 17:51 |
jrosser | indeed | 17:51 |
evrardjp | If you feel it's not sustainable, that's something you need to fix | 17:51 |
evrardjp | you -> we | 17:52 |
evrardjp | was there any proposition raised in the last PTG? | 17:52 |
jrosser | mostly we have to address tech debt currently | 17:52 |
jrosser | so discussions tend to focus on that | 17:53 |
jrosser | and we do make good progress though | 17:53 |
jrosser | but i guess i mean feature debt rather than process / ci | 17:53 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources. https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803477 | 17:55 |
jrosser | like i say in terms of sustainability, there is not enough effort to simultaneously keep openstack-ansible and openstack-ansible-tests both functional | 17:55 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources. https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803477 | 17:55 |
jrosser | but i think we lack someone with insight / ready answer for ansible role testing outside of the AIO | 17:56 |
evrardjp | Then you might want to reconsider the current testing structure to simplify indeed | 17:57 |
evrardjp | moving all the things back to the openstack-ansible repo might help with maintainability | 17:57 |
evrardjp | (as you focus energy on testing scenarios) | 17:58 |
evrardjp | (and reduce the amount of repos) | 17:58 |
evrardjp | I see there is less and less reason to work on separate repos nowadays. | 17:58 |
jrosser | the only time that multiple repos is a big pain is when we want to do refactoring across them all | 17:59 |
evrardjp | maybe there should be some kind of project plan to do such refactors? | 17:59 |
evrardjp | moving to noop jobs isn't that hard ;) | 17:59 |
jrosser | this sort of thing https://review.opendev.org/q/topic:%2522osa/include_vars%2522+(status:open+OR+status:merged) | 17:59 |
evrardjp | I mean, it's all work, so you need to evaluate the end goal and if it's worth it over time | 18:00 |
jrosser | moving the existing roles to a collection would be easy | 18:00 |
jrosser | and with some benefit | 18:00 |
evrardjp | well, you could go as crazy as bringing all roles into the integrated repo, it would make things far simpler. But then you lose the flexibility of overriding easily. That is a price I am not sure the community is ready to pay | 18:01 |
jrosser | more problematic is key things like the new work on pki and ssh where the low level roles lack really any rigorous tests | 18:01 |
evrardjp | yaeh, but that's not something the _structure_ will fix | 18:02 |
jrosser | no indeed | 18:02 |
jrosser | more that we don't have a cookie-cutter pattern to use there yet | 18:02 |
evrardjp | that's understanding the importance of tests, a different topic :) | 18:02 |
evrardjp | we used to have one | 18:02 |
evrardjp | but it wasn't really well maintained | 18:02 |
evrardjp | testability of the roles is the hardest, tbh | 18:03 |
evrardjp | because that needs thinking what needs to run to be efficient | 18:03 |
evrardjp | standalone work can probably be more easily tested.... | 18:03 |
evrardjp | so what's holding those roles up to increase coverage? | 18:03 |
evrardjp | manpower? prioritization? | 18:04 |
evrardjp | or setting expectations for commits? | 18:04 |
jrosser | no, knowing what to do there "here is a great way to test your role in a zuul job -> use it" vs. having to figure that out | 18:04 |
evrardjp | I have to go for dinner, but it sounds to me that you found the solution: Define template, and document that in OSA ;) | 18:06 |
jrosser | and i would prefer that to be something as simple as possible, the openstack-ansible-tests repo was mind-bogglingly complex | 18:06 |
evrardjp | that's for sure | 18:07 |
evrardjp | for standalone you can probably use molecule directly ;) | 18:07 |
jrosser | right, and standalone should == in zuul | 18:07 |
jrosser | to match expectations with openstack-ansible repo | 18:07 |
jrosser | anyway, enjoy your dinner :) | 18:08 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803492 | 18:15 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803492 | 18:15 |
noonedeadpunk | Doh, it feels I missed all fun... | 18:27 |
noonedeadpunk | But I'm not convinced there's no reason to have roles in independent repos nowadays as well | 18:28 |
noonedeadpunk | Like looking at huge ceph-ansible (which is not _that_ big compared to osa) - the repo is really overloaded. | 18:31 |
evrardjp | I think the point was not structure, but test coverage: The need to have a documented "standard" for standalone testing, and simplify the coverage overall | 18:32 |
noonedeadpunk | Well, with roles kept separately it's easier to control coverage imo | 18:33 |
evrardjp | for non standalone, it seems the -tests repo is considered complex, and simplification would be welcomed. | 18:33 |
noonedeadpunk | as it's super easy to miss smth when it's all gathered together | 18:33 |
evrardjp | I agree with you | 18:33 |
evrardjp | it's however easy to "miss something" in all cases | 18:33 |
noonedeadpunk | While we do miss things now, we also kind of know what ) | 18:33 |
noonedeadpunk | but yeah | 18:34 |
evrardjp | I think jrosser is right however on deciding on a standard for standalone roles, which should be "easy to apply" | 18:34 |
noonedeadpunk | also, to continue with ceph-ansible - they run about 20 jobs for each change to cover the scenarios they have | 18:34 |
noonedeadpunk | infra will kill us for that approach :) | 18:34 |
noonedeadpunk | Yes, absolutely | 18:35 |
evrardjp | to simplify the -tests, I feel it _could_ (not saying we should do it) make sense to group the roles that can only be tested "together" | 18:36 |
evrardjp | e.g. use a collection, or make those part of the main repo | 18:36 |
noonedeadpunk | What we miss at the moment is a way of testing the collections themselves | 18:38 |
noonedeadpunk | (not even talking about publishing them) | 18:38 |
noonedeadpunk | so yeah, we have room for improvement, and I was thinking about just another, more unified and simplified way of running tests/test.yml compared to the tests repo tbh | 18:39 |
evrardjp | I am not understanding your proposition :) | 18:41 |
noonedeadpunk | so the idea is kind of to leverage gate-check-commit, but instead of deploying things, run test.yml from the repo | 18:42 |
noonedeadpunk | as the main pita with the tests repo was its own way of deploying things: own a-r-r, inventory, etc | 18:43 |
noonedeadpunk | which was leading to cross-dependencies and being unable to update the ansible version (as it's not possible to update it in 2 places at the same time) | 18:44 |
noonedeadpunk | but it might not be that good an idea, or it might not work out as expected. I wasn't really looking into details yet, just smth that came to mind during the previous meeting | 18:45 |
noonedeadpunk | but again - it's all about what issue we solve... This way it would be really easy to start per-repo tests that are defined only inside that repo, but have the exact same environment prepared as if it was a regular deployment | 18:49 |
jrosser | maybe we can move the bootstrap host role to the plugins collection | 18:50 |
jrosser | then that becomes a common piece | 18:50 |
noonedeadpunk | but we also need bootstrap-ansible? | 18:56 |
noonedeadpunk | I'd even say that bootstrap-ansible might be the key thing for such tests? | 18:57 |
evrardjp | bootstrap-ansible should be restricted to just install ansible in a way we expect | 19:13 |
evrardjp | for me, it _could_ make sense to make the bootstrap_host content the sole purpose of the test repo | 19:15 |
evrardjp | or alternatively, only focus on the integrated repo for _everything_ | 19:16 |
evrardjp | (everything related to the integrated) | 19:16 |
evrardjp | but you're right it all depends on what you want to achieve | 19:16 |
evrardjp | for those large changes, think about the pain, write code, see if it's better, then iterate | 19:16 |
evrardjp | it has happened quite a few times that I rewrote a large chunk of code, thinking it would be easier long term, then abandoned it because it was clever but not simpler | 19:17 |
noonedeadpunk | oh, yes, it's really hard to see the whole picture until you start doing smth... And that leads to work being abandoned :( | 19:25 |
noonedeadpunk | jrosser: answering your question - no, I haven't yet. But I wonder how that aligns with https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 (the part where I tried to add an extra role to the admin one for services) | 19:26 |
jrosser | i don't know tbh - there seems to be a lot of stuff in the ML thread now | 19:27 |
jrosser | some implemented, some not right now | 19:27 |
noonedeadpunk | `add service role to all service users` :) | 19:28 |
noonedeadpunk | I kind of read their ideas https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009/6/defaults/main.yml#154 | 19:28 |
noonedeadpunk | I will read properly tomorrow and will iterate on that... | 19:28 |
noonedeadpunk | but yes, thread is big now... | 19:29 |
noonedeadpunk | but indeed, overall changes seem big | 19:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump OpenStack-Ansible master https://review.opendev.org/c/openstack/openstack-ansible/+/825390 | 19:50 |
noonedeadpunk | not sure though what project/system scopes will bring us in terms of changes | 20:00 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OpenStack-Ansible Xena https://review.opendev.org/c/openstack/openstack-ansible/+/825391 | 20:03 |
DK4 | does osa have any means to recover from a complete mariadb failure? are there any recover functions like in kolla? | 20:17 |
noonedeadpunk | DK4: well, we're not finding out the member with the latest state, that's for sure. But you can trigger a re-bootstrap and define the bootstrap node explicitly https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L21-L24 | 20:25 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible Wallaby https://review.opendev.org/c/openstack/openstack-ansible/+/825395 | 20:27 |
noonedeadpunk | But tbh I won't trust any automation tooling to recover my galera cluster :) | 20:28 |
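A hedged sketch of what an explicit bootstrap override could look like in user_variables.yml. The variable names below are hypothetical placeholders; the real ones are in the linked defaults/main.yml lines of the galera_server role, and the container name is made up:

```yaml
# Hypothetical override: force which node the galera role treats as the
# cluster bootstrap member during recovery (names are placeholders --
# check the linked defaults/main.yml for the actual variables).
galera_server_bootstrap_node: "infra1_galera_container-abcdef12"
galera_force_bootstrap: true
```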
spatel | jamesdenton around | 20:33 |
jamesdenton | maybe | 20:33 |
spatel | I am still dealing with my GPU issue.. take a look here - https://paste.opendev.org/show/812235/ | 20:33 |
jamesdenton | k | 20:33 |
spatel | I have two GPU card in compute node and don't know how my flavor will target them? | 20:33 |
noonedeadpunk | spatel: and you want to do just passthrough? As with the v100 I guess it might make sense to use vgpus instead? | 20:34 |
spatel | if you see, both my GPU PCI cards have the same vendor/device id 10de:1df6 | 20:34 |
spatel | noonedeadpunk passthrough because we don't have a license for vGPU | 20:34 |
spatel | i believe we need to buy it in order to unlock that feature | 20:35 |
noonedeadpunk | passthrough should be really straightforward... But I think it still might be up to placement to report compute capabilities.... | 20:36 |
spatel | if i spin up the first VM it works, but how does the second VM know i need to use the second GPU? | 20:36 |
noonedeadpunk | not sure though | 20:36 |
mgariepy | "pci_passthrough:alias"="tesla-v100:1" will match 1 gpu and assign it to your vm | 20:37 |
jamesdenton | IIUC, the first flavor will assign 1 GPU, and the 2nd flavor will assign 2 GPU | 20:37 |
mgariepy | "pci_passthrough:alias"="tesla-v100:2" will match 2 gpus and assign 2 to your vm | 20:37 |
jamesdenton | ^^^ | 20:37 |
spatel | i have created two flavors, g1.small and g2.small, with tesla-v100:1 and tesla-v100:2 | 20:37 |
jamesdenton | the flavor is targeting the GPUs via the alias you defined | 20:38 |
jamesdenton | which in turn matches vendor/product id | 20:38 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/victoria: Bump OpenStack-Ansible Victoria https://review.opendev.org/c/openstack/openstack-ansible/+/825397 | 20:39 |
jamesdenton | the flavors you defined give you the ability to match a single GPU to a single VM (twice) or both GPUs to a single VM | 20:39 |
mgariepy | you also need to add the scheduling filtering stuff iirc | 20:39 |
spatel | i need single VM with single GPU | 20:39 |
jamesdenton | so, your first flavor with tesla-v100:1 should do that | 20:39 |
spatel | its not doing :( | 20:40 |
jamesdenton | ok, what's the error? | 20:40 |
mgariepy | what is the error ? | 20:40 |
mgariepy | lol @jamesdenton | 20:40 |
jamesdenton | mind-meld | 20:40 |
mgariepy | yep lol | 20:40 |
spatel | first {"code": 500, "created": "2022-01-19T20:39:07Z", "message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 5c70f26e-840a-4336-b975-b8d81d3ef54f. Last exception: XML error: Hostdev already exists in the domain configuration", "details": "Traceback (most recent call last): | 20:41 |
jamesdenton | send me your GPU and i fix for you :D | 20:42 |
spatel | i thought tesla-v100:1 would target the first GPU and tesla-v100:2 would target the second GPU | 20:42 |
jamesdenton | no, it's more to do with scheduling 1 or 2 GPUs to the same VM | 20:42 |
spatel | ohhh | 20:42 |
mgariepy | lspci -nk -s 5e:00.0 | 20:43 |
spatel | lol :) | 20:43 |
mgariepy | make sure you do not have the nvidia or nouveau kernel module loaded for the gpus. | 20:43 |
spatel | i have 12 compute nodes, so 24 GPUs in total :) each node has 2 | 20:43 |
jamesdenton | which flavor did you use in your test? | 20:44 |
jamesdenton | try the single one, first | 20:44 |
spatel | https://paste.opendev.org/show/812236/ | 20:44 |
spatel | let me just use single flavor tesla-v100:1 and try | 20:45 |
spatel | mgariepy that output for you | 20:46 |
mgariepy | what's in your nova.conf for : [pci] passthrough_whitelist ? | 20:46 |
mgariepy | saw it, it seems correct :D | 20:46 |
mgariepy | should have something like: passthrough_whitelist = [{"vendor_id": "10de", "product_id": "1df6"}] | 20:47 |
mgariepy | on your computes with gpus. | 20:48 |
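Putting the pieces from this exchange together, a nova.conf sketch (wallaby-era option names; the alias name must match what the flavor references via "pci_passthrough:alias", here tesla-v100, and the filter list is an example, not a complete one):

```ini
# On compute nodes with GPUs: which PCI devices may be passed through.
[pci]
passthrough_whitelist = [{"vendor_id": "10de", "product_id": "1df6"}]
# The alias referenced by flavors as "pci_passthrough:alias"="tesla-v100:1"
alias = {"vendor_id": "10de", "product_id": "1df6", "device_type": "type-PCI", "name": "tesla-v100"}

# On scheduler nodes: append PciPassthroughFilter to the existing list
# (the filtering mgariepy mentions above).
[filter_scheduler]
enabled_filters = AvailabilityZoneFilter,ComputeFilter,PciPassthroughFilter
```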
spatel | jamesdenton look at this, i created both vms with the single-GPU flavor and the second one errored out - https://paste.opendev.org/show/812238/ | 20:48 |
spatel | i do have passthrough_whitelist | 20:49 |
spatel | let me show you | 20:49 |
spatel | https://paste.opendev.org/show/812239/ on my compute node | 20:50 |
jamesdenton | so, spatel - there was a patch to libvirt about 6 mos ago that introduced that error: https://www.mail-archive.com/libvir-list@redhat.com/msg218688.html | 20:52 |
jamesdenton | if i'm reading that correctly, anyway | 20:53 |
spatel | let me read | 20:53 |
prometheanfire | I'm trying to figure out why my infra nodes are not getting a storage_address in the container networks for the storage bridge | 20:55 |
jamesdenton | I think you have to do that by hand | 20:55 |
prometheanfire | I have the swift_proxy group bind added to the storage network | 20:55 |
spatel | hmm interesting | 20:55 |
spatel | patch by hand ? | 20:56 |
jamesdenton | no that was for prometheanfire | 20:56 |
prometheanfire | I have another cluster with it included, no idea why | 20:56 |
jamesdenton | oh hmm | 20:56 |
spatel | haha | 20:56 |
prometheanfire | it has a storage_hostS stanza, but only on one of the three infra nodes, so probably not it | 20:56 |
mgariepy | spatel, do you have any trace on the compute itself ? | 20:59 |
spatel | trace? | 20:59 |
jamesdenton | is there a traceback or error in the compute log | 20:59 |
spatel | let me look | 21:00 |
jamesdenton | and if you have nova-compute in debug mode, will it print the xml for the domain? i can't recall | 21:00 |
prometheanfire | adding swift_hosts instead of swift_proxy seems to have done it, maybe docs are bad or need updating | 21:00 |
jamesdenton | never, sir. | 21:00 |
prometheanfire | or not, br-storage still not found | 21:01 |
* prometheanfire shrugs | 21:01 | |
mgariepy | or an error in dmesg if the kernel did not allow you to do something for ${REASON} | 21:01 |
mgariepy | or in libvirt | 21:02 |
spatel | https://paste.opendev.org/show/812240/ | 21:03 |
spatel | in libvirt single line error - error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: Hostdev already exists in the domain configuration | 21:03 |
spatel | can i open a bug for nova? maybe something is already patched and i am running older code | 21:05 |
spatel | I am running wallaby | 21:05 |
jamesdenton | what version of libvirt? | 21:06 |
spatel | libvirt version: 7.6.0 | 21:06 |
spatel | https://www.mail-archive.com/libvir-list@redhat.com/msg218688.html looks very close to our issue | 21:07 |
spatel | don't know if i have patched version or not | 21:07 |
jamesdenton | 7.6.0 appears to be where it was introduced | 21:08 |
jamesdenton | this may be an unintended side effect | 21:08 |
spatel | or maybe i am missing some config or setting | 21:09 |
spatel | jamesdenton you are correct, when i use the tesla-v100:2 flavor i can see two GPUs attached in my VM | 21:15 |
jamesdenton | that's working? | 21:16 |
jamesdenton | that's the one i would expect not to work :D | 21:16 |
spatel | yes, i can see two GPUs connected to the vm in the lspci output | 21:16 |
jamesdenton | but glad to hear it | 21:16 |
spatel | look for yourself - https://paste.opendev.org/show/812241/ | 21:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove ANSIBLE_ACTION_PLUGINS override https://review.opendev.org/c/openstack/openstack-ansible/+/824595 | 21:17 |
jamesdenton | very nice. for grins, can you 'virsh dumpxml <domain>'? | 21:17 |
spatel | k | 21:17 |
spatel | let me pull out | 21:17 |
spatel | https://paste.opendev.org/show/812242/ | 21:18 |
spatel | hostdev0 and hostdev1 | 21:19 |
spatel | look like same issue i have :) - https://bugs.launchpad.net/nova/+bug/1628168 | 21:21 |
jamesdenton | i saw that, but being 5+ years old i'm likely to ignore | 21:22 |
spatel | lol | 21:22 |
spatel | thinking of opening a bug for nova to see how it goes | 21:23 |
jamesdenton | good call. i wonder if the same hostdev alias is being used for the second instance and causing any kind of issue (no idea) | 21:25 |
jamesdenton | like, does it compare domains or only the single domain configuration | 21:25 |
spatel | hmm | 21:26 |
spatel | let me open bug and see how it goes | 21:28 |
*** dviroel is now known as dviroel|out | 21:38 | |
spatel | jamesdenton by the way i built this cloud using kolla-ansible with OVN for networking, this cloud has around 50 compute nodes | 21:41 |
spatel | kolla-ansible is a hard requirement from the customer but in the next upgrade i am planning to migrate it to OSA | 21:42 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Remove tempest.api.volume.admin.test_multi_backend test https://review.opendev.org/c/openstack/openstack-ansible/+/825166 | 21:54 |
jamesdenton | i noticed that. which openstack version? | 21:55 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources. https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803477 | 21:55 |
spatel | wallaby | 21:55 |
spatel | This is an HPC openstack, it has all kinds of cool toys like GPUs and an InfiniBand network with 200Gbps | 21:56 |
spatel | Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6] | 21:58 |
spatel | They are going to use it for Research | 21:58 |
spatel | did you work on mechanism_drivers = mlnx_infiniband? | 22:02 |
spatel | jamesdenton do you know what Partition Keys (PKEY) per network are? | 22:08 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/803492 | 22:18 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Add support for both Credential Provider Mechanisms https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825403 | 22:26 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Remove unused variables https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825405 | 22:46 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Do not store unnecessary sections in tempest.conf https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825407 | 23:23 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded instance_type in [heat_plugin] section https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/825408 | 23:23 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!