Thursday, 2022-03-10

jandersquestion: is it possible to do "live" development on a booted IPA? If so, do I need to reload IPA service to make changes take effect? Trying to debug a couple things, it would be much quicker this way than if I keep coding locally, push the change, rebuild IPA and reboot the node into the new image...00:07
janders^ and importantly how do I reload IPA service without making the whole thing fall over?00:07
JayFThere used to be a `--standalone` flag for IPA, which made it not connect to the Ironic API, and before agent tokens, was useful for developing cleaning/deploy steps00:12
JayFnowadays, I don't know what folks do for that kinda dev00:13
JayFbut restarting IPA service on the ramdisk is *never* going to work properly :(00:13
jandersJayF noted, thank you. Two follow-up questions: 1) Would kill -HUP be any good? 2) I suppose if I edit hardware.py, it likely won't have an effect on the running service, right?00:19
JayFNope :(00:19
JayFYou're trying to troubleshoot something in a hardware manager?00:20
jandersI'm trying to finish https://review.opendev.org/c/openstack/ironic-python-agent/+/81871200:20
JayFYou can try to emulate standalone mode thru code edits; remove the callbacks to ironic, and checks for the agent token, then you should just be left with an IPA bringing up an API you can call (as if you were Ironic)00:20
janderscouple bits aren't working and if I could tweak the code on a live system it would make the development cycle much shorter00:21
jandersbut it is likely basic stuff so trying to create a framework for this is likely harder than the problem itself00:21
jandersI was meaning to finish this patch long time ago but downstream duties took priority00:22
JayFthe other option, just get on a box you can test it on00:22
JayFload it into a pythonpath00:22
JayFand call the method(s) in question from the python cli00:22
JayFbut that gets dangeresque when you're working on an app designed to mutilate your data00:22
janders:D yeah00:23
jandersI can think of ways but again likely harder than the problem I am trying to fix00:23
janders(if I booted this off a read-only NFS on a test node - and were tweaking code on the NFS server that would likely do what I want)00:24
jandersand the code won't harass ro devices obviously00:24
jandersbut I think I better add some unit tests and that may fix my problems00:24
jandersThe problem is a "TypeError: 'BlockDevice' object is not iterable" in https://paste.openstack.org/show/b24ZbE3rYFX2noPciv9Y/00:26
jandersnot sure whether it's in _list_erasable_devices or erase_devices_express, guessing the latter00:26
janderswill have a harder look at the code :)00:27
jandersbiggest issue is I had several months of a break from looking into bits like this so a bit rusty00:27
jandersmostly working on RedFishy stuff these days00:27
TheJuliajanders: I've done it in the past, but mainly by short circuiting ironic so it never proceeded00:36
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871201:08
opendevreviewMerged openstack/networking-baremetal master: Set agent_type in tests  https://review.opendev.org/c/openstack/networking-baremetal/+/83269701:28
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871201:36
jandersJayF TheJulia looks like I fixed it, just had a successful test run. Thanks for the hints anyway.02:03
opendevreviewMerged openstack/networking-baremetal stable/yoga: Set agent_type in tests  https://review.opendev.org/c/openstack/networking-baremetal/+/83285806:35
opendevreviewMerged openstack/networking-baremetal stable/yoga: Update .gitreview for stable/yoga  https://review.opendev.org/c/openstack/networking-baremetal/+/83257207:04
opendevreviewMerged openstack/networking-baremetal stable/yoga: Update TOX_CONSTRAINTS_FILE for stable/yoga  https://review.opendev.org/c/openstack/networking-baremetal/+/83257307:04
arne_wiebalckGood morning, Ironic!07:32
rpittaugood morning ironic! o/07:57
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Add non-voting dib CentOS Stream 9 job  https://review.opendev.org/c/openstack/ironic-python-agent/+/83294708:11
dtantsurmorning folks08:18
rpittauhey dtantsur :)08:19
dtantsurFYI folks, the glance change has merged for https://review.opendev.org/c/openstack/ironic/+/826930, ready for approval now08:21
jandersgood morning arne_wiebalck rpittau dtantsur and Ironic o/09:13
rpittauhey janders :)09:14
jandersarne_wiebalck I got https://review.opendev.org/c/openstack/ironic-python-agent/+/818712 (hybrid NVMe/HDD cleaning) to a working state. There's still some finishing off required but I would be keen to know 1) if you are in a position to test it and 2) what you think about the general approach. I tested it in our lab and it seems to work well. Let me know and I will pass the config required to save you digging.09:16
jandersrpittau thank you for your review and suggestion regarding ^^. Will work on unit tests next and will add that log line.09:17
opendevreviewVerification of a change to openstack/networking-baremetal master failed: Add Python3 zed unit tests  https://review.opendev.org/c/openstack/networking-baremetal/+/83257509:26
hjensasI'm seeing an issue where conductor moves to provision state "inspect failed" for no apparent reason. https://paste.opendev.org/10:04
hjensasI'm using virtual switch appliances, performance is very poor, but I've bumped timeouts to 7200 and that happens after "just" 30ish minutes10:06
hjensasoh, proper paste link: https://paste.opendev.org/show/bxFTmX0ofs7cDgNeAtm2/10:07
hjensasThese are my timeout settings:10:09
hjensascrudini --set --existing /etc/ironic-inspector/inspector.conf DEFAULT timeout 720010:09
hjensascrudini --set --existing /etc/ironic/ironic.conf conductor deploy_callback_timeout 720010:09
hjensascrudini --set --existing /etc/ironic/ironic.conf pxe boot_retry_timeout 720010:09
hjensashm, there is an event failed:10:15
hjensasMar 09 21:33:17 openstack ironic-conductor[127396]: DEBUG ironic.common.states [None req-31a60bcb-2a09-4e25-bc9a-63d1406d91e7 None None] Exiting old state 'inspect wait' in response to event 'fail' {{(pid=127396) on_exit /opt/stack/ironic/ironic/common/states.py:325}}10:15
hjensasoh, inspect_wait_timeout is probably what I need to bump.10:30
jandersarne_wiebalck dtantsur rpittau would you have some time to give me a couple hints how to go about unit tests for https://review.opendev.org/c/openstack/ironic-python-agent/+/818712?10:35
jandersIt would be awesome to have this in Yoga, but I think I need to move reasonably quickly right?10:35
dtantsurright10:38
dtantsurwhat's the problem with tests?10:38
dtantsurhjensas: yep10:38
jandersdtantsur they do not yet exist10:39
jandersand looking at existing tests I am not too sure where to start to get this done quickly10:39
jandersthinking what would be the simplest sufficient set10:40
jandersand which existing ones could it be based on10:40
jandersUnit tests aren't my strength but with test_hardware.py in IPA in particular there's a lot of mocks and other structures that need to be fabricated in exactly the right way10:41
dtantsurjanders: you pretty much need to mock _list_erasable_devices(), _nvme_erase() and destroy_disk_metadata()10:43
dtantsurah and _ata_erase()10:44
jandersdo I need to make a list of block_devices with NVMes, HDDs, etc and feed it to the tests to see if the right cleaning mode is triggered for each type?10:45
janders( e.g. https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/tests/unit/test_hardware.py#L2107 )10:47
dtantsurjanders: exactly10:49
dtantsurI think we have plenty of examples in the existing tests10:49
dtantsur(you probably only need a mock with a name though)10:49
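The rough shape of the test dtantsur is suggesting: patch out the helpers (_list_erasable_devices, _nvme_erase, _ata_erase, destroy_disk_metadata) and assert the right erase path runs per media type. The hardware manager below is a simplified stand-in for illustration, not the real IPA GenericHardwareManager.

```python
# Sketch of a mock-based unit test for express cleaning device routing.
from unittest import mock

class FakeDev:
    def __init__(self, name, is_nvme):
        self.name = name
        self.is_nvme = is_nvme

class Manager:
    # Stand-ins for the real helpers; they get mocked in the test.
    def _list_erasable_devices(self): ...
    def _nvme_erase(self, dev): ...
    def _ata_erase(self, dev): ...

    def erase_devices_express(self):
        for dev in self._list_erasable_devices():
            if dev.is_nvme:
                self._nvme_erase(dev)
            else:
                self._ata_erase(dev)

def test_express_cleaning_routes_by_media():
    mgr = Manager()
    nvme = FakeDev("/dev/nvme0n1", True)
    hdd = FakeDev("/dev/sda", False)
    with mock.patch.object(Manager, "_list_erasable_devices",
                           return_value=[nvme, hdd]), \
         mock.patch.object(Manager, "_nvme_erase") as m_nvme, \
         mock.patch.object(Manager, "_ata_erase") as m_ata:
        mgr.erase_devices_express()
    # Each device should hit exactly one erase path.
    m_nvme.assert_called_once_with(nvme)
    m_ata.assert_called_once_with(hdd)

test_express_cleaning_routes_by_media()
```

In the real test_hardware.py the same idea applies, just with @mock.patch.object decorators on the actual GenericHardwareManager methods.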
dtantsuriurygregory: hey, when are we planning the final Yoga releases?11:37
dtantsurwe got a bit of a delay with sprint 2, so it seems from https://releases.openstack.org/yoga/schedule.html that the final releases are this or next week? or?11:37
iurygregorygood morning ironic o/11:42
iurygregoryhey dtantsur o/11:42
iurygregoryMar 21 - Mar 25 (R-1): Final RCs and intermediary releases11:43
iurygregorythis would be the deadline we have11:43
dtantsuryeah, that's the very last deadline11:44
iurygregorywe can focus on reviewing things we want to be included in the cycle till middle of next week11:44
iurygregoryand try to cut stable/yoga11:44
dtantsurack, I think it's sensible11:44
iurygregoryI will be keeping an eye on ironic, inspector, ipa, bifrost11:44
dtantsurI think I have quite a few bifrost patches still :) and a few ironic ones11:45
iurygregoryyeah =)11:45
iurygregorytoday I will be reviewing things11:45
iurygregoryEveryone, please take some time and evaluate the proposals for our PTG slots http://lists.openstack.org/pipermail/openstack-discuss/2022-March/027647.html =)11:46
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871211:57
mgoddardmorning, I have a question about Ironic restarts. How well does Ironic conductor cope with restarts when there are transient operations (deploy, clean, etc) these days?11:59
dtantsurmgoddard: 'wait' operations are kept, '-ing' operations are aborted12:16
mgoddarddtantsur: ack, thanks12:16
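dtantsur's rule of thumb can be sketched as a simple predicate; the state names below are the usual ironic provision states, listed here for illustration rather than exhaustively:

```python
# Sketch: which provision states survive a conductor restart.
# "wait" states keep waiting; "-ing" states get failed/aborted.
KEPT_ON_RESTART = {"wait call-back", "clean wait", "inspect wait", "rescue wait"}
ABORTED_ON_RESTART = {"deploying", "cleaning", "inspecting", "rescuing", "deleting"}

def survives_restart(provision_state):
    return provision_state in KEPT_ON_RESTART

print(survives_restart("clean wait"))  # True
print(survives_restart("cleaning"))    # False
```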
*** pmannidi is now known as pmannidi|Away12:17
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871212:36
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871213:25
opendevreviewJacob Anders proposed openstack/ironic-python-agent master: Improve efficiency of storage cleaning in mixed media envs  https://review.opendev.org/c/openstack/ironic-python-agent/+/81871213:27
* TheJulia tries to wake up14:48
dtantsurTheJulia: don't14:49
dtantsur:)14:49
TheJuliadoes this mean I should just go get a pillow and sleep at my desk?14:50
dtantsursounds good!14:51
TheJuliadtantsur: does that mean all my outstanding patches will be reviewed today?!14:52
dtantsurprobably? you never know before you try14:53
TheJuliatouche14:53
TheJuliaoooh ipa patch has review feedback14:55
dtantsursee? it works!15:02
opendevreviewJulia Kreger proposed openstack/ironic-python-agent master: Create fstab entry with appropriate label  https://review.opendev.org/c/openstack/ironic-python-agent/+/83102915:04
TheJuliarloo: rpittau ^^^ review feedback addressed15:04
rlooTheJulia: +2 :)15:09
opendevreviewJulia Kreger proposed openstack/ironic stable/queens: Remove legacy experimental jobs  https://review.opendev.org/c/openstack/ironic/+/82771315:12
dtantsuriurygregory, TheJulia, FYI I'm writing a response to dansmith's email with my -2 on prolonging grenade and deprecation period to 1 year skip-release.15:14
TheJuliadtantsur: There is a balance that can be achieved, but projects will have to be smarter about not breaking users. i.e. "oh, you're trying to do this, this should be that" sort of logic which makes the service a bit more graceful15:16
dtantsurwell, we have been doing it15:17
TheJuliabut not everyone has been doing it, and we could be a little better by carrying some migration stuff more than one cycle when it is in place15:17
dtantsurwell, the conversation is basically about switching to one year cadence, except that also have half-year releases15:17
iurygregorydtantsur, ack, I've added to my list to read the email today since it has the PTL tag15:17
dtantsurwhich is the worst of both worlds15:18
iurygregorywow15:18
iurygregoryO.o15:18
dtantsurrealistically, we *barely* keep the current CI working15:18
TheJuliadtantsur: well, yeah. Nobody was willing to concede their position in the TC from their extremes15:19
TheJuliaso we somehow ended up in to that, which in a sense is *kind* of what we've already been doing15:19
dtantsurwell... I was not going to have 1 year deprecation e.g. for netboot15:19
dansmithdtantsur: FWIW, aside from one nova broken sanity check, wallaby->yoga "just worked" with no changes to any of the projects that grenade runs by default15:19
TheJuliafor some reason, there is a whole set of people who only want to ever release every six months15:19
* TheJulia doesn't grok it15:19
dtantsurdansmith: I won't be surprised if Ironic works too. but that's not something I want to commit to in the form of a voting job.15:20
dtantsurwe're very limited in resources, with half of this team only caring about ~ 1 year of lifetime15:20
TheJulia(and that six months cycle has a complex loaded history which will make me depressed expressing)15:20
dansmithdtantsur: so you were against 1yr cycles?15:20
dtantsurdansmith: I'm not *that* much against 1yr cycles as I'm against a hybrid approach15:20
dtantsurI may not even be against 1yr cycles at all if we can still produce our lightweight intermediate releases15:21
dansmithdtantsur: of course you can, so I'm not sure why this is really any different :)15:21
TheJuliadtantsur: ++ on limited resources, keeping CI working in older releases can be a huge lift when things change15:21
dansmithkeeping things working across a 1yr gap is pretty much the same regardless of if there's an intermediate release or two in between no?15:22
dtantsurdansmith: the promise is very different. bugfix branches are maintained in a similar vein to EM branches: they're just a common place to share patches15:22
dtantsurwith e.g. no grenade and limited CI coverage15:22
dansmithokay I still don't see how the hybrid approach is substantially more work15:22
TheJuliaIf we skip major stable backports on the "upstream intermediates" then that actually reduces our burden after a year15:23
dtantsurdansmith: much more CI, much longer life time15:23
dansmithdtantsur: it's not longer life time, at all15:23
dtantsurunless normal grenade is gone (which is explicitly not the case) and the life time of "tock" releases is much shorter15:23
dtantsurI'm explaining the difference with bugfix branches15:23
dtantsurthey come with basically no life time guarantee15:24
dtantsurTheJulia: the current stable policy disallows us skipping backports (for good reasons)15:24
dtantsurwe do skip backports for bugfix branches - another example of how they are best-effort15:24
TheJuliadtantsur: maybe it is time that policy be revisited?15:25
dtantsurTheJulia: if we stop supporting upgrades - sure15:25
dtantsurotherwise there is a very good reason for not skipping backports: to avoid regressions on upgrade15:25
TheJuliaupgrades from, or stopping over on that release?15:25
dtantsurif you backport a change from CC to AA skipping BB, then there will be a regression on AA -> BB15:26
TheJuliaI believe the desire is to move more towards skipleveling15:26
dtantsurand since we support upgrades to/from "tock" releases, it's a problem15:26
TheJuliawell, from tock to tock on a bounds, should be fine if the job is voting15:26
TheJuliabut beyond that tock remaining tocks drop off15:27
TheJuliaI can see your point there15:27
TheJuliaif it is kept rolling forever, yeah15:27
iurygregorybut we will also have grenade voting, so it would be Dmitry's concern if I understood?15:27
TheJuliasomewhere, at some point, the tocks need to die quickly15:27
dtantsuriurygregory: yeah, 2 grenades now: normal and skip-release15:28
iurygregoryyeah =(15:28
TheJuliawell, last commit and last release15:28
TheJulialast major release15:28
dansmiththe tocks only ever need to support upgrading to/from adjacent ticks, you can't skip from one tock to another tock15:28
TheJuliaon the openstack stable model15:28
dtantsurdansmith: sure, I was explaining why skipping backports is a problem15:28
iurygregoryI think we can say we will have a bomb instead of grenades lol =X 15:28
TheJulialets not talk about bombs15:29
dtantsuras to killing tocks quickly... we already barely support N-315:29
TheJuliaenough people are worried about the current state of geopolitical affairs.15:30
TheJuliawe need a word for "bringer of sadness"15:30
TheJuliaperhaps something in german?15:30
TheJuliaLabel every single job that15:30
dtantsurScheissursache? dunno, arne_wiebalck can invent a better one15:30
TheJulia:)15:30
iurygregoryyeah15:31
TheJuliaWell, the burden now is we quite literally jump across every release and we explicitly expect people to do the same when upgrading. Reality is... people don't always (and the fact they don't explode is good now), but our testing doesn't represent or account for it and we've been a bit aggressive in even trying to enable it to be a thing in the past15:31
dtantsurhonestly, the openshift approach to upgrades (upgrade often and in smaller chunks) is growing on me15:32
TheJuliamoving forward, a job, even non-voting can start to move us on that path as long as we eventually enable it to be voting15:32
TheJuliaand then remove it after so long15:32
dtantsurTheJulia: this is a good argument for a 1 year cycle15:32
dtantsurnot sure why keep "tock" releases then15:32
TheJuliayeah15:33
dtantsurprojects that are consumed often (ironic, swift?) can keep bugfix releases with whatever promises they define15:33
dansmithdtantsur: I really don't understand -- you want 1yr cycles but you also want to do lots of intermediate ones? but don't want tocks? :)15:33
dtantsurdansmith: I don't want to commit to too much. if we want 1yr cycles - fine. let's just do 1yr cycles.15:33
dansmithyou still need to test from the previous major release to all those intermediate ones no?15:33
TheJuliaThe whole frustrating thing of this is really the coordinated release is more about problems and needs unrelated to the software's existence and testing15:33
dtantsurdansmith: *I* do not.15:33
dtantsurthat's simply not our mode of operation15:34
dtantsur(our = metal3/openshift)15:34
iurygregoryyeah15:34
iurygregorywe need the bugfix to be able to have a better integration with metal3/openshift 15:35
dansmithokay, well, this is definitely about committing to more, that's kinda the point... because it seems a subset of people want it15:35
dtantsurI don't doubt it, but then the "subset of people" should contribute more15:35
iurygregoryI will try to push a patch with the new upgrade job to see how it goes also15:35
dansmithso I understand the push back on it taking more resources, I just think this is the least change and least amount of extra work of the options we realistically have15:36
dtantsurwell.. we're struggling the way we are already15:36
dansmithiurygregory: fwiw, writing that job helped me uncover an issue in nova, which also affected our (redhat) FFU, which I was able to get fixed upstream before it bit us later15:36
dtantsurand what this suggests STILL doesn't help $SOME_COMPANY that is basing its product on Train15:36
dansmithso from our (redhat) perspective it has already paid off, even without the skip-level aspect15:37
iurygregorydansmith, nice! =)15:37
dtantsurI wish Red Hat provided a 3rd party CI for projects it cares about15:38
dansmithI'm quite un-proud nova was the only thing I found, and it was pretty trivial, but still :D15:38
dtantsurI remember the fate of the multinode grenade on Ironic...15:39
TheJuliaoh.. that one15:39
dtantsurso yeah, we did rolling upgrades, added a job.. then it kept breaking, we made it non-voting... now it's experimental15:39
dtantsurI'm worried that the new job will suffer the same fate15:39
TheJuliathat one largely had problems and still does because of networking15:39
dtantsur(and I'll *definitely* mark it non-voting if it starts causing troubles)15:39
dtantsurtrue that. but the regular grenade has also been problematic15:39
TheJuliaone or two of the cloud providers has weird MTU restrictions which completely blows up our packets across the vxlan tunnel which is setup15:40
TheJuliaand given network booting, it is super sensitive to that15:40
dtantsurand I'm very unimpressed about the perspective to expand the deprecation period15:40
TheJuliaand *boom*15:40
TheJuliawe actually had it super stable before that cloud went into place15:40
dtantsurMTU is always a problem. I have a question about it that I ask on job interviews nowadays :)15:40
TheJuliahehehe15:40
dtantsurbecause I've recently had the DNS haiku, but with MTU15:41
TheJuliaYesterday was spanning tree15:41
TheJuliafor the 32767 time15:41
iurygregorynow if it's not DNS we can say it's MTU?15:41
TheJuliawe deal with the pysical, it can be spanning tree!15:41
TheJuliaphysical15:42
iurygregoryI saw your tweet yesterday about it 15:42
* TheJulia needs to wake up before the next meeting15:42
TheJulia... in 20 minutes15:42
dtantsuriurygregory: if it's a weird traceback on the server side, it can be MTU with a very high probability15:42
TheJuliaor someone stepped on the thinnet cable15:52
TheJuliathe packets became attenuated and decided to meet the floor17:53
dtantsurheh15:56
* dtantsur screams in yaml at kubernetes15:56
dtantsuryou've probably heard me calling tripleo overcomplicated? forget it, I just didn't know kubernetes back then.15:56
iurygregorywow15:57
TheJuliaFor the longest time when I was growing up, we ran a serial cable and ran PPP from one end of the house to the other end... and then finally got some 10base2 gear and replaced the serial cable. We ended up having to buy cable covers to lay on the floor... and we *eventually* went to 10baset and 100baset because if you disturbed the thinnet cable, bad things would happen.15:58
TheJulia"Browser load stopped because corgi is sitting on the cable"15:58
dtantsurlovely :D15:58
TheJuliaactually, that was a collie then15:58
dtantsurcorgi-based firewall15:58
dtantsurcolliewall \o/15:58
*** gmann is now known as gmann_afk16:13
*** gmann_afk is now known as gmann16:29
TheJuliaiurygregory: are you going to request another networking-baremetal release?17:08
arne_wiebalckdtantsur: your German is improving rapidly I see :-D17:26
dtantsur:D17:26
opendevreviewVerification of a change to openstack/networking-baremetal master failed: Add Python3 zed unit tests  https://review.opendev.org/c/openstack/networking-baremetal/+/83257517:28
rpittaugood night! o/17:30
* TheJulia digs into logs to see why that failed17:31
opendevreviewMerged openstack/ironic stable/queens: Remove legacy experimental jobs  https://review.opendev.org/c/openstack/ironic/+/82771317:53
opendevreviewMerged openstack/sushy stable/ussuri: Protect Connector against empty auth object  https://review.opendev.org/c/openstack/sushy/+/83282617:58
opendevreviewMerged openstack/sushy stable/train: Protect Connector against empty auth object  https://review.opendev.org/c/openstack/sushy/+/83282717:58
opendevreviewMerged openstack/sushy stable/xena: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83286017:58
opendevreviewMerged openstack/sushy stable/yoga: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83285917:58
opendevreviewMerged openstack/sushy stable/victoria: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83286417:58
opendevreviewMerged openstack/sushy stable/wallaby: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83286217:58
opendevreviewMerged openstack/sushy stable/train: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83286717:59
opendevreviewMerged openstack/sushy stable/ussuri: Fix session authentication issues  https://review.opendev.org/c/openstack/sushy/+/83286617:59
TheJulia\o/18:01
arne_wiebalckbye everyone o/18:20
TheJuliagoodnight18:21
dtantsurI'll go as well, see you18:26
TheJuliagoodnight as well18:31
opendevreviewHarald Jensås proposed openstack/networking-baremetal master: WIP - OpenConfig YANG and Netconf  https://review.opendev.org/c/openstack/networking-baremetal/+/83226819:31
iurygregoryTheJulia, the patches merged in stable/yoga already? too many emails today, sorry20:27
TheJuliaiurygregory: I think so, but there was a failure I ... think20:48
TheJuliaunit test failed20:49
iurygregorywoot20:51
iurygregorylet me check things to see20:52
TheJuliayeah, so it was just a timeout20:52
TheJuliaI didn't see a cause20:52
TheJuliarechecking it20:52
*** pmannidi|Away is now known as pmannidi21:38
*** pmannidi is now known as pmannidi|Away21:43

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!