Monday, 2023-07-03

rpittaugood morning ironic! o/08:10
Continuity_Morning o/09:13
*** Continuity_ is now known as Continuity09:13
opendevreviewRiccardo Pittau proposed openstack/ironic master: Remove python 3.6 mock hack  https://review.opendev.org/c/openstack/ironic/+/88702309:30
iurygregorygood morning Ironic11:18
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: RedfishFirmware Interface  https://review.opendev.org/c/openstack/ironic/+/88542511:20
Kirill_Hi, maybe someone can help. I'm working with the neutron client method show_port, but I have >10 ports. Do we have any possibility to get info for all these ports in one request instead of calling show_port(port) several times? Thanks12:24
TheJuliaKirill_: more than 10 ports?!13:05
Kirill_yep, i use list_ports to get trunk ports, then i get all the subport ids from "trunk_details" and need to get info for each subport. in that case i'm calling show_port13:08
Kirill_right now i got answer from neutron - that i still have to call show_port for each id(13:09
TheJuliaI thought it was 100 ports, but yeah, best to do specific port lookups if you know the data you need, just don't understand why you need to walk the ports13:13
Kirill_in nearest future it will be > 100 ports.13:19
TheJuliaI'm still waking up, but I don't understand why13:20
Kirill_i want to return to the user a list of bms with trunk+subports13:20
opendevreviewJulia Kreger proposed openstack/ironic master: Fix db migration tests for sqlalchemy 2.0  https://review.opendev.org/c/openstack/ironic/+/88743213:27
opendevreviewJulia Kreger proposed openstack/ironic master: Add job to test with SQLAlchemy master (2.x)  https://review.opendev.org/c/openstack/ironic/+/88602013:27
TheJuliaperhaps start with asking ironic for each node's vifs?13:28
TheJuliathere is no special flag afaik in neutron13:28
TheJuliaI guess test_walk_versions is still deadlocking sometimes13:54
TheJuliaI guess more reason to push it into its own job to allow us to continue working and then hopefully figure out exactly what is going on13:54
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: Add more debugging to tie test class, test, state  https://review.opendev.org/c/openstack/ironic/+/88716814:08
iurygregory++14:10
TheJuliaThat might help, because we should see some debugging at where it halts14:13
TheJuliayeah, we should merge the split off14:28
TheJuliaI looked at some of my other changes and some of them are still impacted in there14:29
JayFfyi might be a couple minutes late getting the meeting started but I am here14:57
JayFgot back in time, good stuff15:00
JayF#startmeeting ironic15:00
opendevmeetMeeting started Mon Jul  3 15:00:22 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
dtantsuro/15:00
JayFI would anticipate an ill-attended, short meeting as tomorrow is a US federal holiday and today is a popular day to take off :D 15:00
TheJuliao/15:00
TheJuliaI was going to take today off15:00
JayF#topic Announcements/Reminder15:00
JayF#note Standing reminder: review patches tagged #ironic-week-prio, and tag your patches for priority review15:01
JayF#topic Review previous action items15:01
iurygregoryo/15:01
dtantsurI may add the next inspection patch to ironic-week-prio once I test it15:01
JayFJust a reminder that rpittau has an action to moderate our meeting the next two weeks. Next week (starting a week from today) I will be out of office and out of country for a week, so don't expect me around :D15:01
JayF#topic Review Ironic CI Status15:02
rpittauo/15:02
JayFWhere are we?15:02
rpittauthat's very philosophical15:02
dtantsurWell, my patch has seen the first green for a long time. So not too bad at least?15:02
rpittauJayF: I think we fixed/worked around most of the issues15:02
JayFApplying extremely cautious optimism lol15:03
JayFAre there any outstanding CI related patches we need to review or land?15:03
rpittaummm not on ironic AFAICS15:03
TheJuliaThere is one for sqlalchemy 2.0, to fix migrations15:03
rpittauah yeah, that one15:03
TheJuliaand there is the mysql split out patch15:04
iurygregoryyeah15:04
TheJuliaI'd split it and possibly merge additional troubleshooting15:04
JayFBoth of those sound good to me, you wanna link them or just we'll find and land them afterwards in any event15:04
iurygregorysome of them we need in stable / bugfix branches 15:04
rpittauI rechecked the sqla one15:04
TheJulia++15:05
JayF++ lets backport whatever is needed to stable branches; but I don't feel like that should be a rush until/unless we have patches to backport there tbh15:05
JayFI trust us all to sanely prioritize 15:05
JayFAight, going to move on15:05
JayF#topic 2023.2 Workstream15:05
JayF#link https://etherpad.opendev.org/p/IronicWorkstreams2023.215:05
JayFI will note that many things seem to be pending review; I'll be taking time to review today15:05
JayFThanks for updating that dtantsur 15:06
dtantsurbtw, have we had a chance to say welcome (back) to masghar?15:07
dtantsurMahnoor is helping me with the inspector merger work and will take over more tasks as we go15:07
iurygregoryI don't think we did15:07
JayFmasghar: welcome (back?) I'm not sure we've ever met but any friend of ironic+dtantsur gets adopted by me :D15:08
iurygregorywelcome masghar =) 15:08
dtantsurMahnoor participated in outreachy, I think TheJulia was her mentor15:08
JayFoh, wonderful!15:08
JayFheck yeah15:08
JayFThere is another former member of our community coming back, for at least a small stint15:08
JayFbut I'll let them make the announcement to that larger group when time comes15:08
JayFMoving on15:09
JayF#topic Open Discussion15:09
JayFI had a note here on PTL availability, I will not be here next week as noted before and am planning to miss the next two-ish meetings due to travel.15:09
JayFAnything else for open discussion?15:09
iurygregoryI probably have something (sorry didn't add to the agenda)15:10
JayFit's open discussion :D 15:10
iurygregory:D15:10
TheJulia\o/15:10
iurygregoryok, some of you probably remember a problem we had related to multipath and we added a lot of logic on IPA to be able to handle things15:10
opendevreviewMerged openstack/ironic master: Use jammy for base jobs  https://review.opendev.org/c/openstack/ironic/+/86905215:11
rpittau\o/15:11
* dtantsur hears multipath and runs away screaming15:11
rpittauI actually tried hard to forget about that mpath stuff15:11
dtantsurno amount of alcohol can wash this out of memory15:12
iurygregorywe have an interesting bug downstream, where inspection is timing out (takes more than 30min), because the machine has a loooooot of disks and we check all of them I think 15:12
rpittaubut I guess we'll hear more about it :/15:12
iurygregory+80 disks if I recall15:12
dtantsuriurygregory: define "check" please. or is it unclear yet?15:12
* iurygregory looks for the tab with the information15:13
zorunis the issue about multipath in Linux + some NVMe disks?15:13
zorun(hi there)15:13
TheJuliawho said mpath?!/15:13
* TheJulia hides15:13
* TheJulia builds a bunker15:13
JayFWe should probably ensure the behavior is documented in a launchpad bug15:14
JayFthen go from there?15:14
JayFsounds like one in a long line of "ridiculously large hardware causes edge case" bugs we've been squashing for a decade :D 15:14
TheJuliaiurygregory: you're really going to need to be specific on what is being encountered15:14
TheJuliabecause what JayF said :)15:15
JayFIs there any further specifics on this or something else for open discussion?15:16
iurygregoryit's not an error, it's a timeout issue because ipa doesn't report all info in 30 min, because they have a lot of disks and we will do all the checks from _get_multipath_parent_device etc15:16
iurygregoryso it takes a lot of time and fails15:16
JayFyeah, in that case I'd probably adjust timeouts to reflect the reality of that environment15:16
JayFbut we should likely have a way to turn off some of that if it's taking forever, too15:16
iurygregoryso I'm wondering if we have some ideas on how to avoid this taking a lot of time15:17
JayFI suspect it's probably reproducible in a unit test; most of what takes a long time is probably in the python parsing, yeah?15:17
iurygregoryinstead of just increasing timeout15:17
iurygregorymaybe the logic we added to not clean some devices can be used? like "I don't want IPA to check things on /dev/sda, /dev/sdb...."15:18
JayFI suspect there may be a more straightforward fix15:18
JayFbut until there's a bug with details we can look at alongside the code we're just guessing :D 15:19
* TheJulia is on a call so trying to digest15:19
iurygregory(currently I don't think we are using the feature to skip downstream...)15:19
rpittauyou really need to know your disks very well in advance, that is not always trivial15:19
iurygregorythings are not trivial when they have like +80 disks :D15:19
JayFlets continue this talk outside of the logged meeting?15:19
JayF#endmeeting15:19
opendevmeetMeeting ended Mon Jul  3 15:19:53 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:19
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-07-03-15.00.html15:19
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-07-03-15.00.txt15:19
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-07-03-15.00.log.html15:19
iurygregoryyeah np15:20
rpittauyou should not have a direct correspondence between disks and mpath devices though15:21
TheJuliawell, really you do.15:21
TheJulia /dev/mpath0 may be backed by /dev/sdx /dev/sda and /dev/sdn15:21
dtantsurI'd still be curious to learn which exactly process takes so much time on each disk.15:22
TheJulia+++++15:22
TheJuliathey should all be super fast checks15:22
dtantsurthat's nearly half a minute per disk15:23
dtantsurif we figure it out, we can try running $thing in parallel15:23
JayF++15:23
iurygregorydtantsur, derek found things and we have some info in the internal slack I think...15:23
JayFI may be less available than usual today on IRC; going to be migrating my IRC bastion server at some point today15:24
opendevreviewVerification of a change to openstack/ironic-python-agent-builder master failed: Extend the DIB_CHECKSUM variable usage  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/88129915:24
dtantsurwell, we'll need to bring this info upstream anyway15:24
JayFUpstream cannot participate in conversations without some redacted public info :D 15:24
iurygregorydtantsur, yeah15:24
iurygregoryI will try to extract the info from the thread15:24
rpittaugood night! o/15:57
TheJuliaiurygregory: some solid way to identify to exclude, I guess...16:03
TheJuliaiurygregory: the bug you're chasing, does the customer have *any* multipath devices in that physical server? it sounds like it is just a storage node16:30
TheJuliawhich makes me think it *shouldn't*16:30
TheJuliaso maybe the path is to enable a kernel command line to disable multipath checking?!?16:30
TheJuliaI'm *guessing* here, but it sounds like list_all_block_devices gets called, and because the ramdisk has running multipath tools, it still attempts to do the inverse resolution to provide an accurate list16:31
TheJuliaperhaps we upfront just run multipath -ll and cache that?!16:32
JayFIs there any reason we should *enable* multipath support by default?16:32
JayFif we added the flag, it seems like default off is the saner setting, given the rarity (is my experience wrong?) of that hardware config16:32
TheJuliaokay, we ask it to check the device and then pull the data, and some hardware will always have an entry there16:34
TheJuliain enterprise environments, it is still sometimes a thing16:34
TheJuliaand we break pretty hard without checking16:35
TheJuliaand otherwise, they end up trying to clean the same lun 4-8 times16:35
JayFI'm mentally modelling this to a single disk still setup in RAID mode on a controller16:35
JayFmultipath support being enabled but not really in full use so we have to be aware of it anyway (?)16:35
TheJuliaso... my laptop *used* to have multipath installed....16:35
TheJuliawell, more like, if we don't know it, then we do bad things16:36
JayFthis is mainly showing my giant gaping blind spot for storage tech16:36
TheJulialike... clean for a week16:36
JayFI've literally never worked on non-local storage in a production environment ever, so I'm just asking questions to try and understand :D 16:36
JayFif it's a thing that's relatively common, it's a thing16:36
TheJuliathink of having a disk on the other side of the city, but you can take 4-8 different ways to get there16:37
TheJuliaand if you don't know it, you may just think they are 4-8 unique devices16:37
JayFso the multipath "failure mode" is N os devices for N paths, at least in some cases16:37
TheJuliaso then you do 4-8 times the work driving to the exact same disk in the end because you can't tell them apart without checking16:37
TheJuliayeah16:37
JayFso we really have to check hard to ensure that's not happening because then we do N things per device16:37
TheJulia*OR*16:37
JayFwhich can be materially impacting to a drive to shred it so much (depending on config)16:38
TheJulia"only 1 or 2 of 8 paths is available for use, all others blow up"16:38
JayFeven regardless of clock time elapsed16:38
TheJulia"or cause the san to get VERY angry"16:38
* TheJulia has crashed SAN controllers by making them VERY angry16:38
JayFthat makes a lot of sense16:38
JayFwe have to do the expensive annoying but rareish thing16:38
JayFbecause of how spectacularly it breaks if we don't16:38
TheJuliayup16:38
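[Editor's sketch] The "N OS devices for N paths" failure mode described above can be illustrated by grouping block devices by the LUN they resolve to, so that each LUN is acted on once rather than once per path. This is a minimal sketch and not IPA's actual logic: the function name `group_paths_by_lun`, the device names, and the `wwid-A`/`wwid-B` identifiers are all hypothetical.

```python
from collections import defaultdict


def group_paths_by_lun(path_to_wwid):
    """Group OS block devices by the LUN identifier they resolve to.

    path_to_wwid is a hypothetical mapping of device path -> unique LUN id.
    """
    luns = defaultdict(list)
    for device, wwid in sorted(path_to_wwid.items()):
        luns[wwid].append(device)
    return dict(luns)


# Hypothetical data: two LUNs, each visible through four paths.
paths = {
    "/dev/sda": "wwid-A", "/dev/sdb": "wwid-A",
    "/dev/sdc": "wwid-A", "/dev/sdd": "wwid-A",
    "/dev/sde": "wwid-B", "/dev/sdf": "wwid-B",
    "/dev/sdg": "wwid-B", "/dev/sdh": "wwid-B",
}

luns = group_paths_by_lun(paths)
for wwid, devices in luns.items():
    # Without grouping, each of these devices would look like a unique disk.
    print(f"{wwid}: {len(devices)} paths -> act on it once, not {len(devices)} times")
```

Without the grouping step, eight devices would be treated as eight disks, which is the "4-8 times the work" case from the conversation.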
JayFlike, go set off some sans to celebrate tomorrow levels of spectacular I'm sure16:39
JayF'is that a smoke and laser show?' 'nope, just made the EMC angry!'16:39
TheJulialol16:39
TheJuliayup16:39
TheJulia"Why is my sql database corrupt" "oh, the agent clobbered the wrong controller and the san tried to do the needful, but your database VM had IO requests which took 31 seconds due to direct io locking... sorry" 16:40
TheJuliaI'm kind of with you, timeouts or a disable mpath for this node knob16:41
JayFthat's when the CTO is the fireworks16:41
TheJuliaindeed16:41
JayFI honestly feel like timeouts are the real answer16:41
TheJuliaAnd things like DR/BC begin to get thrown about16:41
JayFit's not unreasonable to say "you have extreme hardware, it takes longer to deal with"16:41
TheJulia"to do it right"16:41
TheJuliayeah16:41
JayFare those 80 physical disks on the server?16:41
JayFor are they 80 attached via a SAN/LUN/etc?16:41
JayFthe other thing this could be a sign of -> the need to support attached storage more as a first class, so Ironic can do intelligent things with it instead of just puppeting the nodes to16:42
* TheJulia looks16:42
JayFI don't know the domain enough to know if that even makes sense, it just seems like an obvious question 16:42
TheJuliadepends on the SAN actually16:43
JayFthis *is* composable hardware, even if it's such an old school form of it that we may not mentally model it that way16:43
TheJuliasince there is variety on if they are different luns or not16:43
TheJuliaoh of course, the customer redacted the device id info16:44
TheJuliaand the scsi id path data16:44
JayFI'd hate for you to mount their iscsi disk from a physically different network, you rascal16:45
TheJuliaso yeah, 80 LUNs, 4 paths per LUN16:45
TheJulialike... who does that16:45
TheJuliaso yeah, we check/scan each "device" so... basically end up checking like 320 disks which can take 30 seconds to double check16:47
JayFthat seems exceedingly reasonable16:47
JayF320 paths in 30 seconds would be asking it to do 10+ a second16:47
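[Editor's sketch] The numbers quoted in this exchange can be worked through as below. The inputs come from the conversation (80 LUNs, 4 paths per LUN, a 30 minute inspection timeout); the variable names are made up for illustration, and this assumes the time is spread evenly across devices, which the log does not confirm.

```python
luns = 80
paths_per_lun = 4
timeout_s = 30 * 60  # the 30 minute inspection timeout, in seconds

devices = luns * paths_per_lun            # block devices the agent sees
budget_per_lun = timeout_s / luns         # seconds available per LUN
budget_per_device = timeout_s / devices   # seconds available per path device
rate_for_30s = devices / 30               # devices per second to finish in 30 s

print(devices, budget_per_lun, budget_per_device, round(rate_for_30s, 2))
```

The per-LUN budget is the "nearly half a minute per disk" figure mentioned earlier; per path device the budget is considerably smaller.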
TheJuliawell, we do it one disk at a time16:48
TheJuliabecause we want the latest data16:48
JayFI would also think it's tough to assume that a parallel check would return valid data16:48
JayFgiven all the "sans act weird" conversation we've had in the last 15 minutes16:48
TheJuliayeah16:49
TheJuliaiurygregory: when you get a chance, lets talk, I've got a lunch pancake thing starting in 11 minutes16:49
iurygregoryTheJulia yeah it's a storage node (with ceph) 16:59
iurygregoryI just finished my lunch XD17:00
TheJuliawe could split multipath -c out and multi-thread/block, and then do the data check17:01
TheJuliaif we limit it to like 10 concurrently, it shouldn't be horrible17:01
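[Editor's sketch] The idea of running the per-device check with a cap of roughly 10 at a time could look like the following. This is a sketch under assumptions, not the agent's code: `check_devices` and `fake_check` are hypothetical names, and `fake_check` only stands in for whatever real per-device probe would be used (the conversation mentions `multipath -c`).

```python
from concurrent.futures import ThreadPoolExecutor


def check_devices(devices, check_one, max_workers=10):
    """Run check_one(device) for every device, at most max_workers at a time.

    Results are returned as a dict keyed by device, in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls run concurrently.
        results = list(pool.map(check_one, devices))
    return dict(zip(devices, results))


def fake_check(device):
    # Hypothetical stand-in for a real probe; pretends sda/sdb are mpath paths.
    return device in ("/dev/sda", "/dev/sdb")


devices = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
result = check_devices(devices, fake_check)
print(result)
```

Whether concurrent probing returns trustworthy data on a given SAN is exactly the concern raised earlier in the log, so the worker limit is a knob rather than a guarantee.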
iurygregorykinda makes sense to me17:02
iurygregoryfirst I think I will try to ask them to test increasing the timeout to see how it goes17:04
iurygregoryhopefully they will be unblocked and I wouldn't need to rush in thinking on how to improve our checks17:05
TheJuliaWell, I think they are *very* much a not intended environment config17:09
TheJuliaCeph storage nodes, the intent and performance design is local disks for the multithreaded ip, but use of a SAN is just centralizing the SAN to be the io bottleneck17:10
iurygregoryI'm wondering if they are using a SAN to plug in the server or if it is just a server with a bunch of disks 17:15
iurygregoryI can check some info tomorrow (the TAM from the bug is OOO)17:15
TheJuliaYeah, they surely are to have this state17:17
TheJuliaAnd granted, sans are highly optimized, but even then still…17:17
iurygregoryyeahhh17:24
iurygregoryif someone has time to review https://review.opendev.org/c/openstack/ironic/+/887297 so we can have a separate job to mysql =)17:50
JayFopening17:50
JayFtrying to frantically get python-ironicclient support for sharded done17:51
JayFsince b-2 is this week17:51
JayFlol17:51
JayFhttps://review.opendev.org/c/openstack/ironic/+/887343/3/ do we not want this anymore?17:57
iurygregoryI think it's already included in the squash that rpittau did18:00
* iurygregory double checks18:00
iurygregoryhttps://review.opendev.org/c/openstack/ironic/+/88737318:01
iurygregoryyup it's18:01
TheJuliaOh, I guess I should do parent_node18:07
opendevreviewJay Faulkner proposed openstack/python-ironicclient master: Node sharding support  https://review.opendev.org/c/openstack/python-ironicclient/+/88753318:14
JayFabandoned that patch then18:15
JayFfor the ironicclient change, I wasn't sure how much logic I should put in the client about permitted/unpermitted searches18:15
JayFright now, you can only search for sharded=True/False (with no other attributes)18:16
JayFwe validate that in the API code; I was thinking duplicating that validation in the client was the wrong thing, given the existing code18:16
JayFif that's wrong please say so (preferably in code review)18:16
JayFI'm going to test this as soon as my bifrost test env is up then hopefully it's done18:16
TheJuliaoh... my... soooo hot18:19
TheJuliayeah, I think that is fine just to keep it simple18:20
JayFI gotta show off my new test environment setup that I did this weekend18:20
JayFif you have time for an aside; go look at netboot.xyz18:20
JayFsomeone hacked ipxe to basically turn it into a menu of internet-installable distribution/livecd/etc options18:21
JayFso I can click a few buttons in virt manager, bridge the vm, pxe boot, and provision a machine in almost any distribution immediately18:21
JayFwhich I mean, we're openstack, that shouldn't be super impressive but I finally did it for my homelab lol18:21
JayFshoemaker's children have no shoes and all that18:21
JayFTheJulia: fyi https://review.opendev.org/c/openstack/releases/+/887497 is the release I'm holding for any client stuff we need, it's due 7/6 aiui18:23
TheJuliaoh wonderful18:24
TheJuliaokay, so I guess I'm working today18:24
TheJuliaor.. maybe I don't stress on it18:24
JayFI mean, I worry about it b/c the nova bits are coming for sharding18:25
JayFfor parent/child node it seems ... environment specific enough that someone getting a newer client wouldn't be awful18:25
JayFbut I trust you to make that call18:25
TheJuliayeah, the whole kit and kaboodle for dpus/smartnics and all, I'm thinking more end of year mentally18:25
JayFand I also suspect the change is super duper duper trivial for the python-ironicclient18:25
TheJuliaso many pieces to get to line up there and it is just going to move at its own speed18:25
JayFand looks a lot like mine18:25
TheJuliawell, I also have to do value setting and whatnot18:26
JayFdoes value setting not come for free?18:26
TheJuliaanyway, give me a few to look18:26
JayFit looked like it came for free in the client18:26
JayFif not my change is incomplete18:26
TheJuliait might18:26
TheJuliawell,18:26
TheJuliafor get but on the node object18:26
* TheJulia will look in a few18:26
JayFTheJulia: Node.update() is basically just yolo-putting fields into a JSON patch18:27
JayFso I'm 99.99% sure you get it free18:27
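[Editor's sketch] The "putting fields into a JSON patch" point can be pictured as below: an update is expressed as a list of JSON Patch operations (RFC 6902), so a new top-level field needs no field-specific code to be set or unset. The helper `build_node_patch` and the value `rack-42` are hypothetical; only the `shard` field name comes from the conversation, and how the client actually assembles and sends its patch is not shown in this log.

```python
def build_node_patch(**fields):
    """Build a JSON Patch (RFC 6902) list for top-level node fields.

    A value of None is treated as "unset" and becomes a remove operation.
    """
    patch = []
    for name, value in sorted(fields.items()):
        path = "/" + name
        if value is None:
            patch.append({"op": "remove", "path": path})
        else:
            patch.append({"op": "add", "path": path, "value": value})
    return patch


set_patch = build_node_patch(shard="rack-42")   # hypothetical shard value
unset_patch = build_node_patch(shard=None)
print(set_patch)
print(unset_patch)
```

Because the patch is built generically from field names, the remaining work is on the command-line side (the set/unset options discussed just below), not in the update call itself.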
JayFI'll find out shortly b/c my ironic just came up in my bifrost test to test this with18:28
TheJuliasomething feels like something is missing18:33
TheJuliaheh18:33
TheJuliaokay18:33
TheJuliaokay, it is osc stuff which might not be needed in your case18:35
JayFWhat OSC stuff?18:38
TheJuliaoh, you need to add osc unset and set support18:38
TheJuliafor the shard value18:38
TheJuliaopenstack command18:38
JayFhttps://github.com/openstack/python-ironicclient/blob/master/ironicclient/osc/v1/baremetal_node.py#L118018:40
JayFack, will do 18:40
JayFI realize why this was so confusing to me before but it was so clear now18:40
JayFI was only in osc/ before and missed the other code18:40
JayFthis time I missed osc/ and was in the actual code18:40
JayFwhee18:40
* JayF not known for his attn to detail18:41
opendevreviewJulia Kreger proposed openstack/python-ironicclient master: WIP: Parent_node support  https://review.opendev.org/c/openstack/python-ironicclient/+/88753518:45
TheJulianowhere near complete, but maybe helps draw lines18:45
TheJuliaI'm going to go into town with the wife18:45
TheJuliaI'm not going to stress on parent_node right now, I just don't have the spoons on top of our home's A/C being dead18:47
TheJulia(that was yesterday!)18:47
opendevreviewJay Faulkner proposed openstack/python-ironicclient master: Node sharding support  https://review.opendev.org/c/openstack/python-ironicclient/+/88753319:53
JayFTheJulia: ^ tested working \o/19:53
JayFthank you for the pointer19:53
opendevreviewJay Faulkner proposed openstack/python-ironicclient master: Node sharding support  https://review.opendev.org/c/openstack/python-ironicclient/+/88753319:56
JayFand now pep8 won't hate it either lol19:56
opendevreviewVerification of a change to openstack/ironic master failed: Unit tests: Isolate mysql test migrations  https://review.opendev.org/c/openstack/ironic/+/88729720:12
JayFproductive quick day so far today20:22
JayFI always feel like I get 10x as much done after getting back from being sick20:22
opendevreviewVerification of a change to openstack/ironic master failed: Unit tests: Isolate mysql test migrations  https://review.opendev.org/c/openstack/ironic/+/88729720:36
JayFstevebaker[m]: hang tight and I'll have that revised, tyvm -- this one has to be in before 7/6 so trying to make sure it's on a tight cycle21:05
stevebaker[m]OK I'll rereview as soon as it's revised21:06
opendevreviewJay Faulkner proposed openstack/python-ironicclient master: Node sharding support  https://review.opendev.org/c/openstack/python-ironicclient/+/88753321:22
JayFIn general, we should look at libraries21:22
JayFthey get cut b-221:22
JayFwhich is 7/6/2023 (3 days)21:22
stevebaker[m]+2!21:43
JayF\o/ thanks21:43
