Monday, 2021-11-01

*** sshnaidm is now known as sshnaidm|afk10:09
*** sshnaidm|afk is now known as sshnaidm11:39
evrardjpShameless plug: If you know someone who wants to work on Ironic,  feel free to give that person this link: https://citynetwork.uhigher.com/en/job-details?job=61508774-cc6f-4269-87c2-3e07162160f7 ...  Or to contact me on irc ;)13:31
spateljamesdenton_alt  alternative ID :)14:47
spatelwhat happened here?14:47
*** jamesdenton_alt is now known as jamesdenton14:48
jamesdentonmaybe an imposter!14:48
spatel:)14:52
spatelI have a very strange networking issue going on 14:52
spateli thought maybe you can help guide or advise me14:53
spatelwe have a c7000 HP chassis with 16 gen9 blades 14:53
spatelall blades are configured for an Active-Standby LACP bundle for redundancy. 14:53
spatelyesterday i noticed one of the blades crashed, which turned out to be related to a memory failure. but that created a strange issue: the blade switch went wrong and stopped sending LACP PDUs to the upstream TOR switch, and the switch got isolated :(14:55
spatelI have a wild theory that maybe the memory failure created a loop on the switch (not sure how) 14:56
spatelthinking of configuring PASSIVE LACP on the HP blade switch side so that if anything happens to the server, the switch will shut down the port. 14:57
jamesdentonhmm14:59
jamesdentonwas it active-standby or lacp? I think lacp aggregates all links?15:00
spatelhttps://paste.opendev.org/show/810315/15:01
spatelThis is what i have on Ubuntu server15:02
spatelI thought that LACP has a mode called active-standby15:02
jamesdentoni just blame netplan15:02
spatelwhat do you mean?15:03
jamesdentonactive-standby corresponds to mode 1 (not lacp), i think, while 802.3ad would be active-active (mode 4 lacp)15:04
spatelyou are saying in my case it's not an LACP bond, right?15:04
jamesdentonright15:04
jamesdentonyes15:04
jamesdentonso the link must actually go NO-CARRIER, i think, for the failover to occur15:05
spatelhmm 15:05
spatelThis bond config doesn't detect my upstream uplink failure :(15:06
jamesdentonit would not detect that15:06
spatelI am thinking of adding arp_ip_target so ARP probes to the gateway detect an upstream uplink failure 15:06
jamesdentonnever used it myself, but give it a shot15:08
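[Editor's note] The mode distinction discussed above (active-backup is mode 1 and does not speak LACP, 802.3ad is mode 4 and does) can be checked on a Linux host by reading the bonding driver's status file, typically /proc/net/bonding/<bond>. The following is a minimal sketch, not anything posted in the channel: the sample text, the interface name eno1, and the helper names bonding_mode and uses_lacp are hypothetical, and only the "Bonding Mode:" line is parsed.

```python
import re

# Hypothetical excerpt of /proc/net/bonding/bond0 for an active-backup bond.
# The interface name (eno1) is made up for illustration.
SAMPLE_ACTIVE_BACKUP = """Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eno1
MII Status: up
"""


def bonding_mode(status_text):
    """Return the text after 'Bonding Mode:' or None if the line is absent."""
    match = re.search(r"^Bonding Mode:\s*(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def uses_lacp(status_text):
    """True only when the bond reports 802.3ad (mode 4), the LACP mode."""
    mode = bonding_mode(status_text)
    return mode is not None and "802.3ad" in mode


if __name__ == "__main__":
    print(bonding_mode(SAMPLE_ACTIVE_BACKUP))
    print("LACP in use:", uses_lacp(SAMPLE_ACTIVE_BACKUP))
```

On a real host the text would come from open("/proc/net/bonding/bond0").read(), assuming the bond is named bond0.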
spatelThis issue is killing me.. whenever a server crashes or memory fails on these blades, it causes the blade switch to break its LACP bond with the TOR switch :(15:10
spateltrying to understand the relation between a server crash and the TOR LACP bundle going down. 15:10
spatelI am seeing the HP 6120XG blade switch stop sending LACP PDUs to the tor switch, which puts LACP in suspended mode. 15:11
jamesdentonand then the downlinks to the servers don't recognize that and appear offline?15:14
spateljamesdenton look at this diagram - https://ibb.co/FntGz0115:16
spatelfrom the server's side both HP 6120 switches are up, but the TOR switch is not getting any LACP PDU packets, so the tor is putting this switch's LACP port in suspended 15:17
spatelThis incident only happened to switch-A15:17
spateljamesdenton did you test bonding inside VM ?18:40
jamesdentoneeeesh, if i did i don't recall19:43
jamesdentonhaving issues?19:43
spateljamesdenton no worry let me dig and see21:09
spatelwhat vm_memory_high_watermark setting do you guys use for rabbitMQ?21:09
bjoernt0.221:11
bjoerntit really depends on how much ram you have and how large the vm should become21:11
spateli have 128GB memory 21:14
spatelmy rabbitMQ keeps dying :(21:14
spateli kept getting OOM killer when i had 64GB memory so i have added a bunch more dimms and now i have 128GB21:20
spatelmy current setting is 0.2 so thinking to change it to 0.4 21:20
bjoerntdepends how it dies. doubling the ram is effectively the same as 0.2 on the old one21:39
bjoernt0.4 i meant21:49
bjoerntyou don't want too-large vms, or the GC will take too long21:49
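[Editor's note] The arithmetic behind bjoernt's point, as a small sketch. It assumes vm_memory_high_watermark is treated as a fraction of total RAM and uses the 64GB and 128GB figures from the conversation; the helper name watermark_gb is hypothetical.

```python
def watermark_gb(total_ram_gb, fraction):
    """Memory threshold in GB for a relative watermark (fraction of total RAM)."""
    return total_ram_gb * fraction


if __name__ == "__main__":
    # (total RAM in GB, watermark fraction) pairs mentioned in the discussion
    for ram, frac in [(64, 0.2), (64, 0.4), (128, 0.2), (128, 0.4)]:
        print(f"{ram} GB RAM at {frac}: {watermark_gb(ram, frac):.1f} GB")
```

Under that assumption, 0.2 on the 128GB host gives the same threshold (25.6 GB) as 0.4 would have on the old 64GB host, and moving to 0.4 on 128GB doubles it again to 51.2 GB.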
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org is being restarted quickly for some security updates, but should return to service momentarily22:09
