Wednesday, 2022-07-06

*** ysandeep|out is now known as ysandeep|ruck03:12
*** raukadah is now known as chandankumar04:22
*** ysandeep|ruck is now known as ysandeep|ruck|afk04:32
*** ysandeep|ruck|afk is now known as ysandeep|ruck05:08
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_rally master: Return upgrade jobs to voting  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/84877806:43
jrosser_morning08:23
*** ysandeep|ruck is now known as ysandeep|ruck|lunch08:43
noonedeadpunko/09:04
damiandabrowski[m]hi!09:05
jrosser_doh got to be careful with those +W regarding gate queues09:08
noonedeadpunkoh yes09:08
jrosser_i still am 50/50 about if the shared queue is a good idea or no09:09
jrosser_like random db errors and MODULE FAILURE still happens too much09:09
jrosser_and those are soooo hard to debug, i don't know what to do about them09:09
jrosser_it was on my mind to make an AIO and take that MODULE FAILURE prone facts gathering task and just run it endlessly in a loop09:10
jrosser_"see stdout stderr for details" it says and theres just nothing to see09:11
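The "run it endlessly in a loop" idea above could be sketched roughly like this. It is a minimal sketch, not anything taken from the OSA repos: `run_until_failure` is a hypothetical helper, and the `ansible all -m setup` ad-hoc call in `FACTS_CMD` is only an assumed stand-in for the facts gathering task that hits MODULE FAILURE. The point is to keep the raw stdout and stderr of the first failing run, since the normal error output shows nothing.

```python
import subprocess
import sys

# Assumption: an ad-hoc facts-gathering call run on the AIO; swap in
# whatever command actually reproduces the MODULE FAILURE.
FACTS_CMD = ["ansible", "all", "-m", "setup"]


def run_until_failure(cmd, max_runs=1000):
    """Run cmd repeatedly and stop at the first non-zero exit code.

    Returns (attempt_number, CompletedProcess) for the first failing run,
    or None if all max_runs runs succeeded.
    """
    for attempt in range(1, max_runs + 1):
        # Capture both streams so they can be inspected after a failure.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return attempt, result
    return None


def report(failure):
    """Print whatever the failing run wrote to stdout and stderr."""
    if failure is None:
        print("no failure seen")
        return
    attempt, result = failure
    print(f"failed on attempt {attempt} with rc={result.returncode}")
    print("--- stdout ---")
    print(result.stdout)
    print("--- stderr ---", file=sys.stderr)
    print(result.stderr, file=sys.stderr)
```

Calling `report(run_until_failure(FACTS_CMD, max_runs=500))` on the AIO would then either print the captured output of the first failure or say that none was seen within 500 runs.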
noonedeadpunkoh yes, it's weird09:39
noonedeadpunkregarding shared queue - yeah, I don't know. But I'd tried it just to be sure it doesn't work for us indeed.09:40
noonedeadpunkAs zuul and infra folks were quite convincing about need of that09:41
jrosser_yeah, and it then also becomes quite important which order we +W things in09:42
noonedeadpunkAnd I don't think this is smth you can catch locally...09:42
jrosser_which is a total change of workflow for all of us09:42
noonedeadpunkOh really? I thought for some reason that without shared queues +w is important and at least that will be fixed...09:43
jrosser_+W order (and maybe something to do with topics?) decides the order the patches stack up in the gate queue i think09:43
jrosser_i was 8-O about that09:43
noonedeadpunkbut how that would affect our workflow...09:44
jrosser_well we allow many things to go in parallel09:44
noonedeadpunkwe still have depends-on and when it's defined queue should understand that09:44
jrosser_and quite often we make a mistake with dependencies or something that causes an os_<> role patch to fail09:44
jrosser_anyway09:45
noonedeadpunkyeah, dunno09:45
noonedeadpunkI mean it's controversial for sure09:45
noonedeadpunkBut dunno if we should try that or not09:46
*** ysandeep|ruck|lunch is now known as ysandeep|ruck09:53
opendevreviewMerged openstack/openstack-ansible-os_rally master: Control rally-openstack installed version  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/84866610:35
*** dviroel|out is now known as dviroel11:25
mgariepywoohoo !!! openstack-ansible 25.0.0: Ansible playbooks for deploying OpenStack14:32
jrosser_deploy!14:39
mgariepyi usually wait a couple of months :D14:40
*** dviroel is now known as dviroel|lunch14:59
b1tsh1ft3rRunning train release here, is there a specific tag or playbook for removing a single controller node from the cluster? I know one exists for removing/adding compute nodes.15:08
noonedeadpunkb1tsh1ft3r: I'm pretty sure we don't have anything to remove compute, only to add15:16
jrosser_removing a controller is pretty challenging too15:16
jrosser_depending on what you want to do it is better to "replace it in place" rather than try to remove15:17
b1tsh1ft3rWell.. looking to remove it entirely to use the gear somewhere else. Best i could do for now was change up haproxy to not use the services on the node and then power it off for now.15:21
jrosser_it could easily be that the memcached config in all the hosts needs updating15:22
jrosser_perhaps oslo.cache deals with a missing server, not sure15:23
*** ysandeep|ruck is now known as ysandeep|dinner15:28
b1tsh1ft3rnoonedeadpunk im thinking of the openstack-ansible-ops remove_compute_node.yml playbook.15:29
noonedeadpunkoh, well. I didn't know about it lol15:31
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Add default rate-limits for API endpoints and Horizon authentication  https://review.opendev.org/c/openstack/openstack-ansible/+/84865915:31
b1tsh1ft3rHeh, no worry. I figured if the compute node removal playbook existed, surely a controller would. Oh well15:32
jrosser_i think it doesn't exist because to do it properly is really tricky15:33
b1tsh1ft3rYeah i could see that for sure. It's tied into quite a lot.15:33
*** ysandeep|dinner is now known as ysandeep16:03
*** dviroel|lunch is now known as dviroel16:08
*** ysandeep is now known as ysandeep|out16:08
spatelanyone here from cumulus world?18:08
spateli need help to setup switch18:09
mgariepyi know a little bit18:26
mgariepywhat do you need ?18:26
mgariepyspatel, ^^ 18:40
spatelgive me a sec.. 18:45
spatelmgariepy i am learning cumulus linux 19:11
spateli used vagrant to bring up one cumulus linux 19:11
spatelbut it's not allowing me to run "net add" etc.. (100% related to permission or privilege issue)19:12
spateltrying to understand how do i give full access to vagrant user so it can run all NCLU commands 19:13
spatelThis is very nice doc but somehow not working for me - https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-41/System-Configuration/Network-Command-Line-Utility-NCLU/#:~:text=To%20add%20a%20new%20user,group%20%60netedit'%20...&text=You%20can%20use%20the%20adduser%20command%20for%20local%20user%20accounts%20only.19:13
spatelusers_with_edit = root, cumulus19:15
spatelgroups_with_edit = netedit19:15
spatelhttps://paste.opendev.org/show/bDI0aMXdlkoInJkR1d1E/19:16
mgariepyho. no idea. we do run it on our switches and we do always use the cumulus user.19:18
mgariepyour testbed are on gns319:18
mgariepywe do have some ansible playbook to manage the configuration19:19
spatelI did switch to cumulus user but still same issue19:20
mgariepyyou probably need to complete the command19:20
mgariepy`net show configuration commands`19:21
mgariepydoes that display the running config ?19:21
mgariepyyou also should have tab completion19:22
noonedeadpunkspatel: never trust nvidia docs :p19:22
spatel:(19:22
mgariepylol.19:22
spatelThey own cumulus lol19:23
noonedeadpunkthat's what I learned working with vGPUs19:23
mgariepyand yet they push for sonicos.19:23
noonedeadpunkthey also own GRID licensing. Doesn't mean their docs are always relevant19:23
mgariepyhttps://en.wikipedia.org/wiki/SONiC_(operating_system)19:23
noonedeadpunkand not misleading or just wrong19:23
noonedeadpunkbtw, if they own cumulus, why do they ship their drivers in packed qcow images of Ubuntu?19:24
noonedeadpunkThey can't get it working as well? :p19:25
noonedeadpunks/drivers/license server19:25
spatelhmm19:26
mgariepylol.19:26
noonedeadpunksry, just got some beer and every time I hear nvidia I feel so frustrated with them....19:26
spatelI am building ovn lab using cumulus spine-leaf network 19:26
mgariepyi've seen quite a lot of ugly stuff from corporate trying to do linux stuff19:27
mgariepyso that `net show configuration commands` does it works ??19:28
spatelall net show command works 19:28
mgariepyok that's a good start19:28
spatelonly net add or privilege command not working :(19:28
mgariepyfrom cumulus and vagrant user?19:28
spatelcumulus@spine-1:~$ id cumulus19:28
spateluid=1000(cumulus) gid=1000(cumulus) groups=1000(cumulus),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),993(netedit)19:28
spatelI am cumulus user and it has permission of netedit group19:29
mgariepythen ? what does it output ? `net add hostname spine1`?19:30
spateldo i need to restart any service etc ?19:30
mgariepyif you edit the configuration of certain services i think you do.19:31
spatelhttps://paste.opendev.org/show/bg8FKSb9lZ8q0wgMuo2s/19:31
spatelit doesn't understand "add" 19:31
mgariepynet help ?19:31
spatelnet show config works 19:31
spatelhttps://paste.opendev.org/show/bKrfc6jr6Wtz4pi9vGdG/19:32
mgariepyhmm. weird.19:32
mgariepywhat pkg do you have installed?19:32
spatelI just did vagrant init cumulus-vx 19:34
spatelvagrant up19:34
spateland machine was ready in few min19:34
jamesdentonwhat version of cumulus do you end up with on that?19:35
jrosseri never did anything more than run up one of the examples but there is stuff here https://air.nvidia.com/Login19:35
jrossernvidia air is a virtualised network-lab-as-a-service you can try stuff out in19:36
spatelCumulus Linux 5.1.019:36
mgariepyinstalling vagrant and virtual box on a spare server to test it 19:37
spateljrosser I am trying to build lab on my desktop 19:37
jamesdentonhttps://docs.nvidia.com/networking-ethernet-software/cumulus-linux-50/System-Configuration/NVIDIA-User-Experience-NVUE/19:37
spatelHere is the lab which i am trying to mimic using their vagrant files - https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/19:38
jamesdentonnclu may be deprecated now?19:38
mgariepynclu has been replaced by curl ? 19:39
jamesdentonlol19:39
jamesdentonplausible. but, NVUE CLI 19:39
jamesdenton"In addition to the nv show commands, Cumulus Linux continues to provide a subset of the NCLU net show commands."19:39
mgariepyhahah19:39
jamesdentonIIRC this change made me migrate my lone Cumulus-based switch -> SONiC19:40
mgariepyyeah19:41
mgariepySONiC seems quite nice.19:41
noonedeadpunkas I said - never trust nvidia docs :D19:41
mgariepyand the cumulus support is just...19:41
mgariepyterrible ..19:41
jamesdentonlast convo i had with Nvidia, my takeaway was they were going to double down on cumulus and maybe slowly pull away from Onyx19:42
jamesdentonso, maybe it will get better19:42
mgariepywe have a site with probably close to 100 of cumulus switch.19:43
mgariepywhen the network engineer needs support it's always painful19:43
jamesdentonhow do you manage them?19:44
mgariepythey are managed via ansible19:44
mgariepysome homemade playbooks.19:44
jamesdentonhow do you store their configuration? in inventory files or do you have some better way?19:45
mgariepyit's somewhat bad and in inventory...19:45
spatellet me post this issue in reddit or stackoverflow to see what other folks thinking19:47
mgariepydoes `nv set system` works?19:48
spatelif nclu is deprecated then what is the alternative way?19:48
mgariepyhttps://docs.nvidia.com/networking-ethernet-software/cumulus-linux-51/System-Configuration/NVIDIA-User-Experience-NVUE/NVUE-CLI/19:48
mgariepythis ? ^^19:48
spateli ran this command - nv set system and it works but what is this?19:48
spatelno error in command19:48
mgariepynv set system hostname leaf0119:48
mgariepythat's the new command.. 19:48
spatelit works - nv set system hostname spine-119:49
spateldamn, now i need to translate all net add to nv stuff19:49
mgariepycumulus linux 5 is nv now ..19:49
mgariepyso..19:49
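The one translation worked out in this exchange, `net add hostname <name>` becoming `nv set system hostname <name>`, could be captured in a small helper like the one below. It is only a sketch: `nclu_to_nvue` is a hypothetical name, it handles just the hostname case mentioned here, and it returns None for everything else instead of guessing at mappings that were not discussed.

```python
import shlex


def nclu_to_nvue(command):
    """Translate an NCLU command string to its NVUE form.

    Only the hostname case from the discussion is covered:
        net add hostname NAME  ->  nv set system hostname NAME
    Any other command returns None so it can be looked up by hand.
    """
    parts = shlex.split(command)
    if len(parts) == 4 and parts[:3] == ["net", "add", "hostname"]:
        return f"nv set system hostname {parts[3]}"
    return None


# Example with the commands that appear in the conversation.
for old in ["net add hostname spine-1", "net show configuration commands"]:
    new = nclu_to_nvue(old)
    print(f"{old!r} -> {new!r}")
```

A table-driven version would be the natural next step once more `net add` lines from the lab need converting; the mappings themselves would have to come from the vendor's own NCLU to NVUE reference.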
spatelnoonedeadpunk agreed never trust nvidia doc 19:50
mgariepywell you were not reading the right doc ;p19:50
* mgariepy not defending nvidia19:50
spatelThis is handy - https://docs.nvidia.com/networking-ethernet-software/knowledge-base/Configuration-and-Usage/Network-Configuration/NCLU-to-NVUE-Commands/19:50
spatelThey should update doc (with if / else version ) :D19:51
mgariepythe version is in the url ;p19:51
mgariepysave you some trouble.. go with sonic :P19:52
mgariepyjamesdenton, is sonic working correctly on your side ?19:52
spatelsonic switches?19:52
mgariepyi did try to convince the network guy to try it but i'm not sure it will be done.19:53
mgariepyhttps://github.com/sonic-net/SONiC/blob/master/doc/SONiC-User-Manual.md#sonic-user-manual19:53
jamesdenton"working correctly"19:54
jamesdentonif you mean, forwarding traffic, then yes19:55
jamesdentoni have it installed on an SN210019:55
mgariepywhat no advanced configuration ?19:55
jamesdentoni think i had immediate issues with some of the CLI commands, and needed to patch them. But nothing too advanced, no. 19:55
jamesdentoni've only got 1. and it's powered off right now to save me a few bucks in power costs19:56
spatelwhat HW are you guys running sonic ?19:58
jamesdentonMellanox/NVIDIA SN2100 here19:58
spatelhmm19:58
jamesdentoni can only imagine support is worse than Cumulus Linux, though, since it's DIY19:59
mgariepywell at least sonic is open..20:00
mgariepynot a big fan of support contract here..20:00
jamesdentontrue20:01
mgariepythat's why i love osa :) haha20:01
noonedeadpunkmgariepy: they should have training for finding right doc20:02
noonedeadpunkwhich you also should never trust:D20:02
mgariepylol20:02
mgariepyyeah or just.. net add .. error. you should switch to nv instead.. lol20:03
spatelnv also giving tough time.. crashing daemon 20:08
spatelI am going to switch image version to 3.x 20:08
jamesdenton3.x. The dark ages.20:09
mgariepyyep.20:09
mgariepyindeed it's quite old.20:09
mgariepyhave a nice weekend guys. i'm out until monday.20:09
jamesdentonsee ya! enjoy your time off20:09
mgariepythanks20:09
spatelI want to match with author version for POC. i am not going to use cumulus in my life... hehe20:09
noonedeadpunkoh, that's sweet! have great time!20:09
spatelThis is for simplicity.. otherwise my goal is to use Cisco lab 20:10
noonedeadpunk*a great time20:10
noonedeadpunkI see you love pain spatel :D20:10
jamesdentonit's a good learning exercise, if it works20:11
spatelI will make it work so we can implement in OSA :)20:11
jamesdenton-120:12
noonedeadpunkworks for me then :)20:12
jamesdenton:D20:12
spatel-1 scare the shit out of me...lol20:12
noonedeadpunkdoesn't reject my previous replica though hehe20:12
noonedeadpunk-1 should not. jamesdenton was generous and haven't put -2 :[20:12
spatellol20:13
spatelok guys time for some beers... my brain exploding watching errors whole day :D20:14
noonedeadpunkcheers!20:15
spateli will see you tomorrow!! good night20:15
jamesdentonenjoy20:15
*** dviroel is now known as dviroel|out20:50
