Tuesday, 2021-11-09

TheJuliadtantsur: doesn't Xena override the last bug fix branch at this point?00:16
TheJuliaarne_wiebalck: so, thinking through nova patch I'm working on. I *think* changing the conductor group would just get it re-assigned onto a compute node host where it is then visible, if visible to the instance list visible to that compute node, but that likely has a failure case I'm not thinking of in that 00:19
TheJuliaarne_wiebalck: could there be another case or a permutation on that?00:19
opendevreviewJacob Anders proposed openstack/sushy master: Changing boot device string for vmedia on SuperMicro  https://review.opendev.org/c/openstack/sushy/+/81713705:31
*** pmannidi is now known as pmannidi|pto06:20
iurygregorygood morning Ironic o/06:43
arne_wiebalckGood morning, iurygregory janders and Ironic!07:16
iurygregoryarne_wiebalck, o//07:16
arne_wiebalckTheJulia: The issue we see on Nova (Stein) is that the moved host is only soft-deleted by the old compute node and then fails to be re-added by the new one (since the UUID already exists).07:19
opendevreviewArne Wiebalck proposed openstack/ironic-python-agent stable/xena: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81714307:25
opendevreviewArne Wiebalck proposed openstack/ironic-python-agent stable/wallaby: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81714507:32
arne_wiebalckiurygregory: which bugfix branch(es) should the partx patch go to (and how do I find them next time)? 07:33
iurygregoryarne_wiebalck, bugfix/18.1 if possible :D07:34
iurygregory(downstream we are using this one =X)07:34
arne_wiebalckthis for the IPA, not Ironic07:34
iurygregoryoh 07:34
iurygregorylet me check =)07:34
arne_wiebalckbugfix/8.1 ?07:34
iurygregoryarne_wiebalck, yup =)07:35
arne_wiebalckiurygregory: policy is that it only goes to the latest bugfix?07:36
arne_wiebalckiurygregory: i.e. not to bugfix/8.007:36
iurygregoryarne_wiebalck, it makes sense to me (but I don't remember if we did specify if we need to always backport to bugfix branches)07:37
* iurygregory looks for the spec about the new release model to double check07:37
iurygregoryBugfix branches (for deliverables that have them) are supported for 6 months. Only high and critical bug fixes are accepted during the whole support time.07:40
iurygregorywell, we should backport them to all based on that (but since we don't know if people are really using bugfix/8.0 for ipa it's not mandatory)07:41
iurygregoryhttps://specs.openstack.org/openstack/ironic-specs/specs/15.1/new-release-model.html07:42
iurygregoryarne_wiebalck, does it make sense to you? =)07:44
*** sshnaidm is now known as sshnaidm|afk07:45
arne_wiebalckiurygregory: let's start with 8.1 then and see07:45
arne_wiebalckiurygregory: 8.1 is the bugfix branch for Xena, right?07:46
iurygregoryyup07:46
arne_wiebalckiurygregory: then I am lost as to why the patch applies cleanly to stable/xena, but not to bugfix/8.107:46
iurygregoryhuh? :O07:46
arne_wiebalckiurygregory: hmm ... maybe there are non-backported patches07:47
iurygregoryI was about to say this :D07:47
arne_wiebalcki.e. sth in stable/xena which did not go into bugfix/8.107:47
arne_wiebalckok07:47
arne_wiebalckpatch incoming ... :)07:47
iurygregorymaybe some of TheJulia patches?07:49
opendevreviewArne Wiebalck proposed openstack/ironic-python-agent bugfix/8.1: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81715107:49
arne_wiebalckiurygregory: yes07:49
opendevreviewZhouHao proposed openstack/ironic master: [iRMC] Convert the type of irmc_port to int  https://review.opendev.org/c/openstack/ironic/+/81715407:52
opendevreviewZhouHao proposed openstack/ironic master: [iRMC] Convert the type of irmc_port to int  https://review.opendev.org/c/openstack/ironic/+/81715407:54
rpittaugood morning ironic! o/08:43
dtantsurmorning ironic08:44
dtantsurTheJulia: some of us do use bugfix/18.1 still08:44
rpittauoh yes we do08:45
iurygregorymorning rpittau dtantsur o/08:45
rpittauhey iurygregory :)08:45
jandershey arne_wiebalck iurygregory rpittau dtantsur and Ironic o/08:45
rpittauhey janders :)08:46
iurygregoryjanders, o/08:46
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent-builder master: Build and publish arm64 debian based ipa ramdisk  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/81581508:49
*** sshnaidm|afk is now known as sshnaidm09:40
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Move rescan device function to general utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/81691609:58
rpittau^ we probably need udevadm settle after parted too, so I'm wondering if we should invert and run udevadm settle *before* partx in rescan_device, or before AND after10:08
iurygregoryomg10:09
rpittauheh10:10
dtantsurle sigh10:12
rpittauI believe before should be enough as the uuids should be already updated by udev, let's see what the CI says10:12
rpittauI may even move udevadm_settle to utils at this point10:12
dtantsurI'm quite sure we need udev after partx, that was the outcome of Arne's previous attempt10:12
rpittauoh right :/10:13
arne_wiebalckyes, udev should be last10:43
arne_wiebalckit is the sync point where all the newly found partitions are turned into device nodes10:43
rpittaualright, let's see how the CI goes, in the end it may be that we can ignore that error I'm seeing and we just need a little customization of partx inside rescan_device10:44
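
[Editor's sketch, not part of the log] The ordering agreed on above (partx first, udevadm settle last as the sync point) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the ironic-python-agent implementation: the helper names rescan_commands and rescan are hypothetical, and ignoring the partx exit status is an assumption based on the remark later in this log that partx -a can fail on VMs.

```python
import subprocess


def rescan_commands(device):
    """Return the commands in the order discussed: partx, then udev.

    Hypothetical helper for illustration only; the real
    rescan_device in ironic-python-agent may differ.
    """
    return [
        # Ask the kernel to add partitions found on the device.
        ["partx", "-a", device],
        # Sync point: wait until udev has processed pending events,
        # so the new partitions exist as device nodes.
        ["udevadm", "settle"],
    ]


def rescan(device, run=subprocess.run):
    """Run the rescan commands in order.

    Assumption: failures are not fatal here (check=False), since
    partx -a may return an error, for example on VMs.
    """
    for cmd in rescan_commands(device):
        run(cmd, check=False)
```

The run parameter exists only so the ordering can be exercised without touching real devices, for example by passing a function that records the commands it receives.
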
*** dviroel|out is now known as dviroel11:07
opendevreviewVerification of a change to openstack/ironic master failed: Avoid handling a deploy failure twice  https://review.opendev.org/c/openstack/ironic/+/81668411:26
rpittaummm I see still ironic-standalone-ipa-src failing for 'mdadm: cannot open /dev/vdb1: No such file or directory'11:33
iurygregoryhuh?! but we already merged the fix right? 11:36
rpittauyeah11:36
arne_wiebalckrpittau: does the partx -a call fail? it does when I run it on VMs, it works on BM, though11:37
rpittaummm interesting it still shows partx -u11:38
arne_wiebalckaha11:38
dtantsurthe image wasn't rebuilt yet?11:40
rpittaummm the change was just for image though, I'm reading this from hardware11:41
arne_wiebalckaren't there 2 partx calls, one in rescan_device, one in the s/w RAID part in hardware.py11:41
rpittauarne_wiebalck: yeah this is in the raid part11:41
arne_wiebalckrpittau: this is the one we wanted subsequently change to call the new rescan11:42
rpittauyep11:42
arne_wiebalckwe need the refactoring to merge first I guess?11:42
rpittauyes, at this point, or we can split that in two for the sake of backporting the entire fix11:43
rpittauwhat I mean is we change -u to -a in the raid part as a fix for that, and we can backport it easily11:47
rpittauthen we do the refactor and we don't necessarily need to backport it11:47
arne_wiebalckyou mean: have partx -u replaced in the raid code now, then merge your refactoring patch, then call rescan_device from the new place? 11:47
rpittauyeah11:47
arne_wiebalcksounds good to me ... you want to do the u/a replacement or shall I do it?11:48
rpittauI can do that so I directly rebase the refactor onto it11:49
arne_wiebalckrpittau: +1, ty11:49
opendevreviewMerged openstack/ironic-python-agent-builder master: new element burn-in for package stress-ng, added fio  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/81545311:52
opendevreviewMerged openstack/ironic-python-agent stable/xena: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81714311:59
opendevreviewMerged openstack/ironic-python-agent stable/wallaby: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81714511:59
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Re-read the partition table with partx -a, part 2  https://review.opendev.org/c/openstack/ironic-python-agent/+/81719712:03
opendevreviewVerification of a change to openstack/ironic master failed: Avoid handling a deploy failure twice  https://review.opendev.org/c/openstack/ironic/+/81668412:05
opendevreviewJacob Anders proposed openstack/sushy master: Changing boot device string for vmedia on SuperMicro  https://review.opendev.org/c/openstack/sushy/+/81713712:13
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Move rescan device function to general utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/81691612:16
arne_wiebalckwell ... the raid code is calling udev, then partx ... rescan is doing it the other way round :)12:17
arne_wiebalcklet's see what implications this has ...12:18
opendevreviewMerged openstack/ironic-python-agent bugfix/8.1: Re-read the partition table with partx -a  https://review.opendev.org/c/openstack/ironic-python-agent/+/81715112:18
opendevreviewDmitry Tantsur proposed openstack/ironic-python-agent master: Simplify error messages when running clean/deploy step  https://review.opendev.org/c/openstack/ironic-python-agent/+/81720113:00
janderssee you tomorrow Ironic o/13:14
iurygregorybye janders o/13:19
opendevreviewRiccardo Pittau proposed openstack/ironic bugfix/18.1: Fix idrac-wsman deploy with existing non-BIOS jobs  https://review.opendev.org/c/openstack/ironic/+/81710713:27
arne_wiebalckBare Metal SIG meeting in 10mins! Topic: "Hardware Burn-in with Ironic", details on  https://etherpad.opendev.org/p/bare-metal-sig13:51
rpiosoarne_wiebalck: I lost my Zoom connection to the meeting.14:12
TheJuliaarne_wiebalck: hmm, soft delete does definitely sound like a separate conundrum14:31
arne_wiebalckTheJulia: ok14:57
arne_wiebalckTheJulia: this is why I was a little careful on saying this is an issue, since it also may have changed already14:57
opendevreviewMerged openstack/ironic-python-agent stable/wallaby: Output verbose info from efibootmgr  https://review.opendev.org/c/openstack/ironic-python-agent/+/81701215:09
opendevreviewMerged openstack/ironic-python-agent stable/wallaby: Delete EFI boot entry duplicate labels first  https://review.opendev.org/c/openstack/ironic-python-agent/+/81701315:09
opendevreviewMerged openstack/ironic master: Avoid handling a deploy failure twice  https://review.opendev.org/c/openstack/ironic/+/81668415:12
opendevreviewDmitry Tantsur proposed openstack/ironic stable/xena: Avoid handling a deploy failure twice  https://review.opendev.org/c/openstack/ironic/+/81710915:27
TheJuliaarne_wiebalck: is it just a matter of changing soft deleted back to active in the db?15:30
arne_wiebalckTheJulia: I cannot say, tbh: for sure, nova does not expect to find an entry. If undeleting it, or hard delete and re-create is better, I cannot judge.15:40
arne_wiebalckTheJulia: Belmiro (as our Nova expert) wanted to check more recent releases first, before patching, esp. as we are preparing the Nova upgrade.15:42
TheJuliaokay, well, if it is not auto-recovered, we likely could do it in the compute node soon then and just update the db, but that entirely depends on what else nova does15:44
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Move rescan device function to general utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/81691616:33
arne_wiebalckI do not have the full logs to prove it, but we just removed a test node from our deployment which has been instantiated with Ironic certainly more than 10k times (probably even more than 20k times).16:58
opendevreviewDmitry Tantsur proposed openstack/ironic master: Fix RedfishManagement.get_mac_addresses and related functions  https://review.opendev.org/c/openstack/ironic/+/81672617:00
dtantsurarne_wiebalck: wow Oo17:00
arne_wiebalckrally, once an hour, and the node was added in nov 201717:01
dtantsurimpressive :)17:10
dtantsursee you tomorrow folks o/17:10
arne_wiebalckbye dtantsur o/17:10
arne_wiebalckTheJulia: it is not auto-recovered from what I see ... maybe when the DB is finally purged after some months :-D17:10
*** JasonF is now known as JayF17:12
arne_wiebalckbye everyone, see you tomorrow o/17:12
JayFarne_wiebalck: we hit numbers like that with onmetal17:12
JayFarne_wiebalck: because with the number of nodes we had in inventory + the amount of CI we ran, we provisioned a node dozens of times a week17:13
arne_wiebalckJayF: I imagine!17:13
JayFit's more impressive if you hit those numbers with real usage17:13
JayFand not CI-style usage17:13
arne_wiebalckJayF: heh, true, but all the activity on our cloud is dominated by the CI/Rally activity17:14
arne_wiebalckJayF: it is still cool to see how the numbers accumulate17:14
JayFIt's often easy to focus on the times when things don't work right.17:14
arne_wiebalckJayF: and also to think how often that node has been rebooted17:14
JayFWhen you hit big milestones, it helps you reflect on all the stuff that silently works17:15
arne_wiebalckJayF: exactly17:15
arne_wiebalcko/17:15
JayFo/17:15
TheJuliaarne_wiebalck: you should so tweet about a node deployed 10k+ times :)17:17
TheJuliaarne_wiebalck: I guess in that case, regarding auto recovery, we just need to have a bug someplace and then it *might* be stupidly easy to fix soon()17:18
rpittaubye everyone! o/17:32
*** dviroel is now known as dviroel|out20:51
opendevreviewJacob Anders proposed openstack/sushy master: Changing boot device string for vmedia on SuperMicro  https://review.opendev.org/c/openstack/sushy/+/81713720:53
jandersgood morning Ironic o/21:42
