Wednesday, 2022-01-19

bjologodmorning07:43
bjologoodmorning i mean :)07:44
jrossergood morning07:55
noonedeadpunk\o/08:16
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/82511310:25
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Create ssh certificate authority  https://review.opendev.org/c/openstack/openstack-ansible/+/82529210:27
*** dviroel|out is now known as dviroel10:58
jrossernoonedeadpunk: did you see the ML stuff about the new RBAC for deployment projects?11:38
damiandabrowski[m]he's not fully available today12:03
damiandabrowski[m]guys, have You noticed that attaching cinder volume does not work on our AIO? for some reason, iscsid.socket is not spawning iscsid.service12:05
damiandabrowski[m]starting iscsid.service manually or rebooting the whole AIO helps12:06
damiandabrowski[m]but I wonder how to properly fix it12:06
damiandabrowski[m](i'm testing it on focal)12:09
jrosserthat might be why the zun tests fail12:14
jrosser(one of the reasons)12:14
jrosserit could be an ordering issue, seeing the service state and journal from a fresh AIO might be interesting12:15
jrosserto see if anything ever tried to start it, or if there is some error with the config which then doesn't get reloaded12:16
damiandabrowski[m]I just spawned a fresh aio12:18
damiandabrowski[m]https://paste.openstack.org/show/812220/12:18
damiandabrowski[m]don't see anything indicating that iscsid.service tried to start before12:19
andrewbonneyI've seen this before in my own AIOs, but relevant tests have passed in CI. I couldn't work out why there was a difference12:27
damiandabrowski[m]i assume that's why this test is currently disabled(it tries to attach a volume to the VM which is not working, so it fails):12:29
damiandabrowski[m]https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/utility_all.yml#L9612:29
jrosserdamiandabrowski[m]: you might want to look at this12:33
jrosserhttps://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/82404212:33
jrosserit's not about iscsi but i used a socket activated service there to replace xinetd12:34
jrosserso you can see the order that the services need to be created / loaded / restarted to make that work12:34
jrosseri think that the state: "restarted" was a key thing on the socket service itself12:35
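For readers unfamiliar with socket activation: systemd owns the listening socket and starts the service on the first connection, handing the socket over as an inherited file descriptor. A minimal sketch of the receiving side is below, assuming the documented `LISTEN_PID` / `LISTEN_FDS` environment protocol; the function name `inherited_fds` is made up for this illustration and is not taken from iscsid or the galera role.

```python
import os

# Under systemd socket activation the first passed descriptor is always fd 3.
SD_LISTEN_FDS_START = 3


def inherited_fds(environ=None, pid=None):
    """Return the file descriptors systemd passed to this process, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID names the process the sockets are meant for; if it does not
    # match, the variables were inherited from some other process.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```

If the `.socket` unit is not active (or was never restarted after its definition changed), nothing is listening and the service is never triggered, which is why the state of the socket unit matters as much as the service itself.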
damiandabrowski[m]thank You, unfortunately restarting iscsid.socket does not help12:37
damiandabrowski[m]I'm trying to find out what `ListenStream=@ISCSIADM_ABSTRACT_NAMESPACE` in the socket definition means, because I have literally no idea O.o12:38
jrosserwhich bit? ListenStream?12:39
damiandabrowski[m]no, `@ISCSIADM_ABSTRACT_NAMESPACE` :D 12:39
jrosser`If the address starts with an at symbol ("@"), it is read as abstract namespace socket in the AF_UNIX family.`12:40
damiandabrowski[m]ahhh, thanks12:41
jrosserhttps://www.freedesktop.org/software/systemd/man/systemd.socket.html12:42
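To make the quoted rule concrete, here is a small sketch of how an `@`-prefixed `ListenStream=` value maps onto an abstract-namespace `AF_UNIX` address. It only handles the two `AF_UNIX` forms (a path or an `@name`), not ports or IP addresses, and the helper names are invented for this example. Abstract sockets are Linux-specific.

```python
import socket


def systemd_to_sockaddr(listen_stream):
    """Map an AF_UNIX ListenStream= value to an address Python can bind."""
    if listen_stream.startswith("@"):
        # Abstract namespace: the kernel expects a leading NUL byte where
        # systemd's notation uses "@". No file appears on disk.
        return b"\0" + listen_stream[1:].encode()
    return listen_stream  # ordinary filesystem path


def bind_unix(listen_stream):
    """Bind and listen on the address (abstract names work on Linux only)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(systemd_to_sockaddr(listen_stream))
    sock.listen(1)
    return sock
```

So `@ISCSIADM_ABSTRACT_NAMESPACE` is just a socket name without a filesystem path, which is also why there is no socket file to look for when debugging.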
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/82530612:45
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Create ssh certificate authority  https://review.opendev.org/c/openstack/openstack-ansible/+/82529212:45
jrosserdamiandabrowski[m]: this is related but a long time ago https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/175585812:54
damiandabrowski[m]ah, so we probably hit our issue when this one was fixed13:01
damiandabrowski[m]the interesting thing is that when i started and then stopped iscsid.service manually, then nova/cinder were able to start it again when i tried to attach a volume(previously they couldn't do that)13:03
opendevreviewMerged openstack/openstack-ansible master: Move system_crontab_coordination role to collection  https://review.opendev.org/c/openstack/openstack-ansible/+/82459314:16
*** dviroel is now known as dviroel|lunch14:57
DK4im using this guide for ceph prod setup: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html and fail because of ceph error in the second playbook: https://controlc.com/01090692 any ideas?15:45
jrosserDK4: ultimately i think that the documentation has diverged from reality here https://paste.opendev.org/show/812224/15:49
jrosseri would think that in the past cidr_networks used to be available in ansible hostvars but that seems not to be the case any more15:50
DK4jrosser: thanks for the quick response. i think i found the mistake, i forgot the &-anchor in the user config15:50
jrosserquickest workaround would be to replace those entries in user_variables.yml with the actual address ranges15:50
jrosserDK4: for production deployments, we normally see people deploying separate ceph clusters, rather than integrated tightly with OSA15:52
jrosseryou have the choice to do it either way, but long term maintenance tends to be easier if they are decoupled15:53
jrosserbut size / scale / use-case can also play a part15:53
*** dviroel|lunch is now known as dviroel16:15
evrardjphello folks. For keepalived I am still testing with ansible 2.9.  Should I drop this? 16:47
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_nova master: Use ssh_keypairs role to generate cold migration ssh keys  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/82530616:47
jrosserevrardjp: we are a long way ahead of that now in these parts16:47
evrardjpjrosser: including stable branches? 16:48
jrosserthough it depends how far back you want to cover16:48
evrardjpas long as OSA is covering I guess16:48
evrardjpfor the rest of the folks using the roles I think it's fine to move on. 16:48
evrardjpalternatively,  old osa branches can just not bump keepalived role, which is fine too16:49
jrosserussuri is EM and the last place we used 2.916:49
evrardjpok then I should be good16:49
evrardjpthanks for confirming jrosser! 16:49
jrosserwe are 2.10 for V & W so that will be around for a while yet16:50
evrardjpand happy new year to you, your family,  and your team :)16:50
jrosserthankyou :)16:50
evrardjpgood to see damiandabrowski[m] in here :)16:51
damiandabrowski[m]hey JP!16:52
evrardjpdamiandabrowski[m]: unrelated convo, are you using matrix?16:55
damiandabrowski[m]I am, is it wrong? :D 16:56
evrardjpWell,  I do have matrix, but I am still using my bouncer.  I want to get rid of it tbh 16:57
evrardjpwas wondering if the bridge is nice nowadays.16:57
jrosseri've never looked back from irccloud16:58
evrardjpjrosser: a bit more context, when I was in TC, before the whole mess with freenode happened: https://governance.openstack.org/ideas/ideas/pylon/synchronous-and-pseudo-synchronous-comms.html#the-proposal 16:59
evrardjpbut yeah irccloud is nice. 17:00
damiandabrowski[m]I'm using element.io and I'm quite happy with it, but I never used anything else(except some console client years ago) so I can't really compare17:01
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516417:23
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Implement variable: tempest_endpoint_type  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82515617:26
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Rename [orchestration] section to [heat_plugin] in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516317:27
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516417:31
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Implement variable: tempest_endpoint_type  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82515617:32
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516417:33
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Rename [orchestration] section to [heat_plugin] in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516317:35
spatelI have glusterFS mount point on all my compute nodes and i pointed nova /var/lib/nova to glusterfs mount point and all good but when i delete vm i found nova not deleting disk file so i have to do it by hand 17:35
evrardjpSometimes I want to shoot myself in the head when I see the direction ansible and molecule is taking. Making things incredibly hard for 0 reason... 17:35
spateldid anyone notice this issue with a shared mount point17:35
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516417:40
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: do not include [*-feature-enabled] sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82516417:40
evrardjphas anyone here an example repo with molecule testing? I would like to know what's the recommended way to add the docker collection to make testing work with molecule with requiring the collection into a new requirements.yml on the root of my repo17:46
evrardjpwithout the docker collection,  it all fails,  as molecule-docker now requires it since version >1.017:46
jrosserevrardjp: the tripleo people started on this in the os_tempest role https://github.com/openstack/openstack-ansible-os_tempest/commit/3f4b58bd4133b83c8556c2275875188147d2a58b17:48
jrosserbut i feel that it really has not gone anywhere17:48
evrardjpI see17:48
jrosserhowever i'm not really sure this is going to be helpful17:49
evrardjpI was using molecule,  but pinned to old versions17:49
evrardjpit's the new versions that are a pain, because they are assuming the docker collection is installed on the system17:49
jrosserwe do have the same problem in OSA, what to do in a post openstack-ansible-tests world17:49
jrosserwe are very close to dropping all the jobs relying on that repo now as the maintenance overhead is just too much17:49
evrardjpwell, I feel that the value of having different repos is nowadays pretty much removed17:50
jrosserbut it does leave a gap with how to run tests in underlying/utility roles17:50
evrardjpjrosser: would it make sense to take a stance,  in the OSA community,  about where you want to head in terms of testing,  and get the ball rolling? 17:51
jrosserindeed17:51
evrardjpIf you feel it's not sustainable,  that's something you need to fix17:51
evrardjpyou -> we17:52
evrardjpwas there any proposition raised in the last PTG?17:52
jrossermostly we have to address tech debt currently17:52
jrosserso discussions tend to focus on that17:53
jrosserand we do make good progress though17:53
jrosserbut i guess i mean feature debt rather than process / ci17:53
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources.  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80347717:55
jrosserlike i say in terms of sustainability, there is not enough effort to simultaneously keep openstack-ansible and openstack-ansible-tests both functional17:55
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources.  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80347717:55
jrosserbut i think we lack someone with insight / ready answer for ansible role testing outside of the AIO17:56
evrardjpThen you might want to reconsider the current testing structure to simplify indeed17:57
evrardjpmoving all the things back to the openstack-ansible might help on maintainability17:57
evrardjp(as you focus energy on testing scenarios)17:58
evrardjp(and reduce the amount of repos)17:58
evrardjpI see there is less and less reason to work on separate repos nowadays.  17:58
jrosserthe only time that multiple repos is a big pain is when we want to do refactoring across them all17:59
evrardjpmaybe there should be some kind of project plan to do such refactors?17:59
evrardjpmoving to noop jobs isn't that hard ;) 17:59
jrosserthis sort of thing https://review.opendev.org/q/topic:%2522osa/include_vars%2522+(status:open+OR+status:merged)17:59
evrardjpI mean, it's all work, so you need to evaluate the end goal and if it's worth it over time18:00
jrossermoving the existing roles to a collection would be easy18:00
jrosserand with some benefit18:00
evrardjpwell,  you could go as crazy as bringing all roles into the integrated repo,  it would make things far simpler.  But then you lose the flexibility of overriding easily. That is something I am not sure the community is ready to pay18:01
jrossermore problematic is key things like the new work on pki and ssh where the low level roles lack really any rigorous tests18:01
evrardjpyeah,  but that's not something the _structure_ will fix18:02
jrosserno indeed18:02
jrossermore that we don't have a cookie-cutter pattern to use there yet18:02
evrardjpthat's understanding the importance of tests,  a different topic :) 18:02
evrardjpwe used to have one18:02
evrardjpbut it wasn't really well maintained18:02
evrardjptestability of the roles is the hardest, tbh18:03
evrardjpbecause that needs thinking what needs to run to be efficient18:03
evrardjpstandalone work can probably be more easily tested.... 18:03
evrardjpso what's holding those roles up to increase coverage? 18:03
evrardjpmanpower? prioritization? 18:04
evrardjpor setting expectations for commits?18:04
jrosserno, knowing what to do there "here is a great way to test your role in a zuul job -> use it" vs. having to figure that out18:04
evrardjpI have to go for dinner, but it sounds to me that you found the solution: Define template, and document that in OSA ;) 18:06
jrosserand i would prefer that to be a something as simple as possible, the openstack-ansible-tests repo was mind bogglingly complex18:06
evrardjpthat's for sure18:07
evrardjpfor standalone you can probably use molecule directly ;)18:07
jrosserright, and standalone should == in zuul18:07
jrosserto match expectations with openstack-ansible repo18:07
jrosseranyway, enjoy your dinner :)18:08
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80349218:15
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80349218:15
noonedeadpunkDoh, it feels I missed all fun...18:27
noonedeadpunkBut I'm not convinced there's no reason to have roles in independent repos nowadays as well18:28
noonedeadpunkLike looking at huge ceph-ansible (which is not _that_ big comparing to osa) - and repo is really overloaded.18:31
evrardjpI think the point was not structure,  but test coverage: The need to have a documented "standard" for standalone testing,  and simplify the coverage overall18:32
noonedeadpunkWell, having roles separately makes it easier to control coverage imo18:33
evrardjpfor non standalone,  it seems the -tests repo is considered complex, and simplification would be welcomed.18:33
noonedeadpunkas it's super easy to miss smth when having all gathered together18:33
evrardjpI agree with you18:33
evrardjpit's however easy to "miss something" in all cases18:33
noonedeadpunkWhile we do miss things now, we also kind of know what )18:33
noonedeadpunkbut yeah18:34
evrardjpI think jrosser is right however on deciding on a standard for standalone roles,  which should be "easy to apply" 18:34
noonedeadpunkalso if we continue with ceph-ansible - they run about 20 jobs for each change to cover the scenarios they have18:34
noonedeadpunkinfra will kill us for that approach:)18:34
noonedeadpunkYes, absolutely18:35
evrardjpto simplify the -tests,  I feel it _could_ (not saying we should do it) make sense to make the roles that can only be tested "together"18:36
evrardjpe.g.  use a collection,  or make those part of the main repo18:36
noonedeadpunkWhat we miss now is the way of testing collections themselves at the moment18:38
noonedeadpunk(not even saying about publishing them)18:38
noonedeadpunkso yeah, we have room for improvement and I was thinking about just another, more unified and simplified way (compared to the tests repo) of running tests/test.yml tbh18:39
evrardjpI am not understanding your proposition :)18:41
noonedeadpunkso idea is kind of to leverage gate-check-commit, but instead of start deploying things, run test.yml from repo18:42
noonedeadpunkas main pita with tests were their own way of deploying things, own a-r-r, inventory, etc18:43
noonedeadpunkwhich was leading to cross-dependencies, being unable to update ansible version (as not possible to update it in 2 places at same time)18:44
noonedeadpunkbut it might be not that good idea or it might not work out as expected. Wasn't really looking into details yet, just smth that raised in mind during previous meeting18:45
noonedeadpunkbut again - it's all about what issue we solve... This way it would be really easy to start per-repo test, that are defined only inside that repo. But have exact same environment prepared as if it was a regular deployment18:49
jrossermaybe we can move the bootstrap host role to the plugins collection18:50
jrosserthen that becomes a common piece18:50
noonedeadpunkbut we also need bootstrap-ansible?18:56
noonedeadpunkI'd even say that bootstrap-ansible might be key thing for such tests?18:57
evrardjpbootstrap-ansible should be restricted to just install ansible in a way we expect 19:13
evrardjpfor me,  it _could_ make sense to move bootstrap_host content the sole purpose of the test repo19:15
evrardjpor alternatively,  only focus on the integrated repo for _everything_ 19:16
evrardjp(everything related to the integrated)19:16
evrardjpbut you're right it all depends on what you want to achieve19:16
evrardjpfor those large changes,  think about the pain,  write code,  see if it's better,  then iterate 19:16
evrardjpit happened quite a few times I rewrote a large chunk of code, thinking it will be easier long term, then abandon it because it was clever and not simpler19:17
noonedeadpunkoh, yes, it's really hard to see whole picture until you start doing smth... And that leads to work abandonment :(19:25
noonedeadpunkjrosser: answering your question - no, I haven't yet. But I wonder how that aligns with https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009 (the part that I tried to add extra role to admin one for services)19:26
jrosseri don't know tbh - there seems to be a lot of stuff in the ML thread now19:27
jrossersome implemented, some not right now19:27
noonedeadpunk`add service role to all service users` :)19:28
noonedeadpunkI kind of read their ideas https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/823009/6/defaults/main.yml#15419:28
noonedeadpunkI will read properly tomorrow and will iterate on that...19:28
noonedeadpunkbut yes, thread is big now...19:29
noonedeadpunkbut indeed, overall changes seem big19:45
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump OpenStack-Ansible master  https://review.opendev.org/c/openstack/openstack-ansible/+/82539019:50
noonedeadpunknot sure though what project/system scopes will bring to us in terms of changes20:00
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/xena: Bump OpenStack-Ansible Xena  https://review.opendev.org/c/openstack/openstack-ansible/+/82539120:03
DK4does osa have any means to recover from a complete mariadb failure? are there any recover functions like in kolla?20:17
noonedeadpunkDK4: well, we're not finding out which member has the latest state, that's for sure. But you can trigger re-bootstrap and define the bootstrap node explicitly https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L21-L2420:25
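Since the role does not pick the most advanced member for you, that step has to be done by hand before choosing the bootstrap node. A rough sketch of the comparison follows, assuming each node's Galera state file (`grastate.dat`) has been collected as text and contains a `seqno:` line. The host names in the usage note are placeholders, and this is an illustration rather than anything OSA ships.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not key: value.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


def most_advanced_node(states):
    """Given {hostname: grastate text}, return the host with the highest seqno.

    Returns None when no node recorded a usable seqno (-1 means the position
    is unknown, for example after an unclean shutdown).
    """
    best_host, best_seqno = None, -1
    for host, text in states.items():
        try:
            seqno = int(parse_grastate(text).get("seqno", "-1"))
        except ValueError:
            continue
        if seqno > best_seqno:
            best_host, best_seqno = host, seqno
    return best_host
```

For example, `most_advanced_node({"infra1": "seqno: 10", "infra2": "seqno: 12"})` returns `"infra2"`. When every node reports `-1` the function returns `None`, and the position has to be recovered from the database itself before any node can be bootstrapped safely.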
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible Wallaby  https://review.opendev.org/c/openstack/openstack-ansible/+/82539520:27
noonedeadpunkBut tbh I won't trust any automation toolings to recover my galera cluster :)20:28
spateljamesdenton around 20:33
jamesdentonmaybe20:33
spatelI am still dealing with my GPU issue.. take a look here - https://paste.opendev.org/show/812235/20:33
jamesdentonk20:33
spatelI have two GPU card in compute node and don't know how my flavor will target them?20:33
noonedeadpunkspatel: and you want to do just passthrough? As with v100 I guess it might make sense to use vgpus instead?20:34
spatelif you see my GPU PCI card has same bus number 10de:1df620:34
spatelnoonedeadpunk passthrough because we don't have license for vGPU20:34
spateli believe we need to buy it in order to unlock that feature 20:35
noonedeadpunkpassthrough should be really straightforward... But I think it still might be up to placement to report compute capabilities....20:36
spatelif i spin up first VM it works but how does second VM know i need to use second GPU?20:36
noonedeadpunknot sure though20:36
mgariepy"pci_passthrough:alias"="tesla-v100:1"  will match 1 gpu and assign it to your vm20:37
jamesdentonIIUC, the first flavor will assign 1 GPU, and the 2nd flavor will assign 2 GPU20:37
mgariepy"pci_passthrough:alias"="tesla-v100:2"  will match 2 gpus and assign 2 to your vm20:37
jamesdenton^^^20:37
spateli have created two flavor g1.small and g2.small with tesla-v100:1 and tesla-v100:2 20:37
jamesdentonthe flavor is targeting the GPUs via the alias you defined20:38
jamesdentonwhich in turn matches vendor/product id20:38
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/victoria: Bump OpenStack-Ansible Victoria  https://review.opendev.org/c/openstack/openstack-ansible/+/82539720:39
jamesdentonthe flavors you defined give you the ability to match a single GPU to a single VM (twice) or both GPUs to a single VM20:39
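The `alias:count` semantics described above can be modelled in a few lines. This is a toy model only, not nova's scheduler code: the `ALIASES` table, the `claim` function and the second PCI address (`d8:00.0`) are assumptions made up for the illustration, while the alias name and the `10de:1df6` IDs come from the conversation.

```python
# Hypothetical alias table: alias name -> vendor/product IDs it matches.
ALIASES = {"tesla-v100": {"vendor_id": "10de", "product_id": "1df6"}}


def claim(free_devices, spec):
    """Claim devices for a flavor value such as 'tesla-v100:1'.

    The number after the colon is how many matching devices one VM gets,
    not which device it gets. Claimed devices are removed from free_devices.
    Returns the claimed devices, or None if too few are free on the host.
    """
    name, _, count = spec.partition(":")
    wanted = int(count) if count else 1
    alias = ALIASES[name]
    matching = [
        d for d in free_devices
        if d["vendor_id"] == alias["vendor_id"]
        and d["product_id"] == alias["product_id"]
    ]
    if len(matching) < wanted:
        return None
    claimed = matching[:wanted]
    for dev in claimed:
        free_devices.remove(dev)
    return claimed
```

With two free V100s on a host, two successive `claim(host, "tesla-v100:1")` calls hand out one card each, and a third call returns `None`. A single `claim(host, "tesla-v100:2")` takes both cards for one VM.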
mgariepyyou also need to add the scheduling filtering stuff iirc20:39
spateli need single VM with single GPU 20:39
jamesdentonso, your first flavor with tesla-v100:1 should do that20:39
spatelits not doing :(20:40
jamesdentonok, what's the error?20:40
mgariepywhat is the error ?20:40
mgariepylol @jamesdenton 20:40
jamesdentonmind-meld20:40
mgariepyyep lol20:40
spatelfirst {"code": 500, "created": "2022-01-19T20:39:07Z", "message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 5c70f26e-840a-4336-b975-b8d81d3ef54f. Last exception: XML error: Hostdev already exists in the domain configuration", "details": "Traceback (most recent call last):20:41
jamesdentonsend me your GPU and i fix for you :D20:42
spateli thought tesla-v100:1 will target first GPU-1 and tesla-v100:2 will target GPU-2 20:42
jamesdentonno, it's more to do with scheduling 1 or 2 GPUs to the same VM20:42
spatelohhh20:42
mgariepylspci -nk -s 5e:00.0 20:43
spatellol :)20:43
mgariepymake sure you do not have the nvidia or nouveau kernel module loaded for the gpus.20:43
spateli have 12 compute nodes so total 24 GPU :) each has 2 GPU20:43
jamesdentonwhich flavor did you use in your test?20:44
jamesdentontry the single one, first20:44
spatelhttps://paste.opendev.org/show/812236/20:44
spatellet me just use single flavor tesla-v100:1 and try 20:45
spatelmgariepy that output for you20:46
mgariepywhat's in your nova.conf for : [pci] passthrough_whitelist ?20:46
mgariepysaw it it seems correct :D20:46
mgariepyshould have something like: passthrough_whitelist = [{"vendor_id": "10de", "product_id": "1df6"}]20:47
mgariepyon your computes with gpus.20:48
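As a quick sanity check, the whitelist value can be compared against the `vendor:product` pair that `lspci -n` prints. The sketch below assumes the value is the JSON list shown above and only looks at `vendor_id` and `product_id`. The helper name `is_whitelisted` is hypothetical, and the real option accepts further keys that this ignores.

```python
import json


def is_whitelisted(whitelist_value, device_id):
    """Check an lspci -n style 'vendor:product' ID against a whitelist value."""
    vendor, _, product = device_id.lower().partition(":")
    for entry in json.loads(whitelist_value):
        # In this sketch a key missing from an entry is treated as a wildcard.
        if (entry.get("vendor_id", vendor) == vendor
                and entry.get("product_id", product) == product):
            return True
    return False
```

For instance, `is_whitelisted('[{"vendor_id": "10de", "product_id": "1df6"}]', "10de:1df6")` evaluates to `True`, while a device with a different product ID does not match.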
spateljamesdenton look at this i created both vm with single flavor and second one ERROR out - https://paste.opendev.org/show/812238/20:48
spateli do have passthrough_whitelist 20:49
spatellet me show you20:49
spatelhttps://paste.opendev.org/show/812239/ on my compute node20:50
jamesdentonso, spatel - there was a patch to libvirt about 6 mos ago that introduced that error: https://www.mail-archive.com/libvir-list@redhat.com/msg218688.html20:52
jamesdentonif i'm reading that correctly, anyway20:53
spatellet me read20:53
prometheanfireI'm trying to figure out why my infra nodes are not getting a storage_address in the container networks for the storage bridge20:55
jamesdentonI think you have to do that by hand20:55
prometheanfireI have the swift_proxy group bind added to the storage network20:55
spatelhmm interesting 20:55
spatelpatch by hand ?20:56
jamesdentonno that was for prometheanfire 20:56
prometheanfireI have another cluster with it included, no idea why20:56
jamesdentonoh hmm20:56
spatelhaha20:56
prometheanfireit has a storage_hostS stanza, but only on one of the three infra nodes, so probably not it20:56
mgariepyspatel, do you have any trace on the compute itself ?20:59
spateltrace? 20:59
jamesdentonis there a traceback or error in the compute log20:59
spatellet me look 21:00
jamesdentonand if you have nova-compute in debug mode, will it print the xml for the domain? i can't recall21:00
prometheanfireadding swift_hosts instead of swift_proxy seems to have done it, maybe docs are bad or need updating21:00
jamesdentonnever, sir.21:00
prometheanfireor not, br-storage still not found21:01
* prometheanfire shrugs21:01
mgariepyor an error in dmesg if the kernel did not allow you to do something for ${REASON}21:01
mgariepyor on libvirt21:02
spatelhttps://paste.opendev.org/show/812240/21:03
spatelin libvirt single line error - error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error: Hostdev already exists in the domain configuration21:03
spatelcan i open bug for nova, may be something is already patched and i am running older code 21:05
spatelI am running wallaby 21:05
jamesdentonwhat version of libvirt?21:06
spatellibvirt version: 7.6.021:06
spatelhttps://www.mail-archive.com/libvir-list@redhat.com/msg218688.html looks very close to our issue21:07
spateldon't know if i have patched version or not21:07
jamesdenton7.6.0 appears to be where it was introduced21:08
jamesdentonthis may be an unintended side effect21:08
spatelor may be i am missing some config or setting21:09
spateljamesdenton you are correct when i use tesla-v100:2 flavor then i can see two GPU attached in my VM 21:15
jamesdentonthat's working?21:16
jamesdentonthat's the one i would expect not to work :D21:16
spatelyes i can see two GPU connected to vm in lspci output21:16
jamesdentonbut glad to hear it21:16
spatellook for yourself - https://paste.opendev.org/show/812241/21:17
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove ANSIBLE_ACTION_PLUGINS override  https://review.opendev.org/c/openstack/openstack-ansible/+/82459521:17
jamesdentonvery nice. for grins, can you 'virsh dumpxml <domain>'?21:17
spatelk21:17
spatellet me pull out21:17
spatelhttps://paste.opendev.org/show/812242/21:18
spatelhostdev0 and hostdev1 21:19
spatellook like same issue i have :) - https://bugs.launchpad.net/nova/+bug/162816821:21
jamesdentoni saw that, but being 5+ years old i'm likely to ignore21:22
spatellol21:22
spatelthinking to open bug for nova and see how it goes21:23
jamesdentongood call. i wonder if the same hostdev alias is being used for the second instance and causing any kind of issue (no idea)21:25
jamesdentonlike, does it compare domains or only the single domain configuration21:25
spatelhmm 21:26
spatellet me open bug and see how it goes 21:28
*** dviroel is now known as dviroel|out21:38
spateljamesdenton by the way i have build this cloud using kolla-ansible and using OVN for networking, this cloud has 50 around compute nodes 21:41
spatelkolla-ansible is hard requirement from customer but in next upgrade i am planning to migrate it to OSA 21:42
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Remove tempest.api.volume.admin.test_multi_backend test  https://review.opendev.org/c/openstack/openstack-ansible/+/82516621:54
jamesdentoni noticed that. which openstack version?21:55
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Allow to create only specific tempest resources.  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80347721:55
spatelwallaby21:55
spatelThis is HPC openstack, it has all kind of cool toy like GPUs, InfiniBand network with 200Gbps 21:56
spatelInfiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]21:58
spatelThey are going to use it for Research 21:58
spateldid you work on mechanism_drivers = mlnx_infiniband22:02
spateljamesdenton do you know what is Partition Keys (PKEY) per network22:08
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded flavor_ref and flavor_ref_alt  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/80349222:18
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Add support for both Credential Provider Mechanisms  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82540322:26
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Remove unused variables  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82540522:46
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Do not store unnecessary sections in tempest.conf  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82540723:23
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_tempest master: Fix hardcoded instance_type in [heat_plugin] section  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/82540823:23
