Thursday, 2021-10-21

opendevreviewMichal Nasiadka proposed openstack/bifrost master: WIP: Bump up Ansible to >=2.9,<5  https://review.opendev.org/c/openstack/bifrost/+/814858 06:09
iurygregorygood morning Ironic o/06:11
jandershey iurygregory o/06:11
janders(making progress with my laptop setup :) )06:11
iurygregoryjanders, o/06:11
dtantsurplease tell me it's Friday already06:11
jandershey dtantsur o/06:12
iurygregorydtantsur, almost? 06:12
jandersunfortunately I do not have good news on that06:12
jandersbut good news is - it ain't Wednesday!06:12
janders(unless someone is in LA or Hawaii - then it's Wednesday still)06:14
iurygregoryOMG :D06:16
dtantsurpoor people!06:16
iurygregory*poor soul*06:17
arne_wiebalckGood morning janders iurygregory dtantsur and Ironic!06:28
jandershey arne_wiebalck o/06:28
iurygregoryarne_wiebalck, o/06:32
rpittaugood morning ironic! o/06:34
iurygregorymorning rpittau o/06:34
rpittauhey iurygregory :)06:34
* rpittau breakfast incoming06:39
* dtantsur -> haircut, bbl07:54
opendevreviewMichal Nasiadka proposed openstack/bifrost master: Bump up Ansible to >=2.10,<5  https://review.opendev.org/c/openstack/bifrost/+/814858 08:52
*** pmannidi is now known as pmannidi|brb10:34
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent-builder master: Bump pip for tinyipa to 21.3  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/814894 10:37
opendevreviewMerged openstack/ironic master: Add Xena versions to release notes  https://review.opendev.org/c/openstack/ironic/+/814581 10:51
*** dviroel|rover|out is now known as dviroel|rover10:51
TheJuliaGood morning11:35
dtantsurmorning TheJulia. it's still not Friday, imagine?11:36
TheJuliano, unfortunately it is not11:50
TheJuliaiurygregory: so it looks like I'm going to get pulled onto a call at 10:3011:53
iurygregory good morning TheJulia o/11:56
TheJuliawell, 10:30 my local time, 14:30 UTC11:57
iurygregoryoh ok11:57
TheJuliaSo we can start discussions and see where they go11:57
iurygregoryI was going to ask the UTC time XD11:57
TheJuliaAnd I can always follow-up with notes11:57
iurygregoryyeah, we can also probably move dtantsur topic from friday to today also, so we don't have 3 topics from you11:58
opendevreviewArne Wiebalck proposed openstack/ironic-python-agent stable/xena: Assert EFI part UUID is not None before editing fstab  https://review.opendev.org/c/openstack/ironic-python-agent/+/814904 12:11
opendevreviewArne Wiebalck proposed openstack/ironic-python-agent stable/wallaby: Assert EFI part UUID is not None before editing fstab  https://review.opendev.org/c/openstack/ironic-python-agent/+/814905 12:12
iurygregoryJust a reminder that our PTG sessions today are in the Kilo room - https://ptg.opendev.org/ptg.html 12:36
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent bugfix/8.1: Assert EFI part UUID is not None before editing fstab  https://review.opendev.org/c/openstack/ironic-python-agent/+/814769 12:49
*** redrobot is now known as Guest365612:59
opendevreviewHanGuangyu proposed openstack/ironic master: Add a description of stopping ironic-api.service  https://review.opendev.org/c/openstack/ironic/+/814912 13:04
dtantsurTheJulia, iurygregory, wrote the RFE about enabled interfaces: https://storyboard.openstack.org/#!/story/2009316 13:05
dtantsurwill probably take next (downstream) sprint if nobody beats me to it13:06
dtantsurbtw what do you think about enabling redfish by default nowadays? (and thus making sushy a requirement)13:07
iurygregorysounds like a good idea i would say ^13:07
iurygregoryreminder: the tc discussion about CI is in a few minutes13:08
dtantsur12 minutes to be precise?13:08
iurygregoryyeah =)13:08
iurygregoryif it starts sooner I will put a message here13:09
arne_wiebalckI observe that nodes after instantiation sometimes end up in active but with no instance UUID. I think this happens when RPCs time out (Nova fails, but Ironic moves on). While I guess this "active, but no UUID" situation is ok for Ironic stand-alone, it is a false active and hence a blocked resource in an Ironic w/ Nova deployment. Would it make sense / is it possible to have sth in Ironic which detects and cleans this up?13:09
dtantsurarne_wiebalck: I thought Nova was supposed to set instance_uuid *first*13:21
dtantsursince it's a sort of a lock13:21
arne_wiebalckdtantsur: set it in Ironic?13:22
dtantsuryeah13:22
arne_wiebalckdtantsur: I thought the lock is the resource allocation in placement13:22
dtantsurit shouldn't proceed with deployment if instance_uuid is not set on a node13:22
dtantsurthat's on the nova side13:22
dtantsuron the ironic side instance_uuid is a lock13:22
arne_wiebalckright, but no one else can take that node anymore (through nova)13:22
arne_wiebalckalso, the reason why I end up here is just speculation13:23
arne_wiebalckbut it is a fact I end up here :)13:23
arne_wiebalck... and the nodes are blocked until I step in and clean up13:24
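A minimal sketch of the kind of detection asked about above, assuming node records are available as plain dicts carrying the `provision_state` and `instance_uuid` fields. The helper name `find_false_active` and the sample data are hypothetical; this is not Ironic code and it only reports candidates, it does not clean anything up.

```python
def find_false_active(nodes):
    """Return UUIDs of nodes that are active but carry no instance UUID."""
    return [
        node["uuid"]
        for node in nodes
        if node.get("provision_state") == "active"
        and not node.get("instance_uuid")
    ]


# Hypothetical sample data: only node-2 is "active, but no instance UUID".
nodes = [
    {"uuid": "node-1", "provision_state": "active", "instance_uuid": "inst-1"},
    {"uuid": "node-2", "provision_state": "active", "instance_uuid": None},
    {"uuid": "node-3", "provision_state": "available", "instance_uuid": None},
]

print(find_false_active(nodes))
```

In a stand-alone Ironic deployment such nodes may be legitimate, so a check like this would only make sense where every active node is expected to belong to a Nova instance.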
dtantsurNode ca2b5f09-1fc7-4784-b2c9-700cd614cec3 reached failure state deploy failed while waiting for provision_state=['active']. Error: Agent returned error for deploy step {'interface': 'raid', 'step': 'apply_configuration', 'args': {'raid_config': {'logical_disks': [{'size_gb': 'MAX', 'raid_level': '1', 'controller': 'software'}]}}, 'priority': 97} on node13:29
dtantsurca2b5f09-1fc7-4784-b2c9-700cd614cec3 : Error performing deploy_step apply_configuration: Software RAID caused unknown error: Failed to create md device /dev/md0 on /dev/vda1 /dev/vdb1: Unexpected error while running command.13:29
dtantsurCommand: mdadm --create /dev/md0 --force --run --metadata=1 --level 1 --raid-devices 2 /dev/vda1 /dev/vdb113:29
dtantsurExit code: 213:29
dtantsurStdout: ''13:29
dtantsurStderr: 'mdadm: cannot open /dev/vdb1: No such file or directory\n'.13:29
dtantsurThis has started appearing in the CI (stable/wallaby). arne_wiebalck, has your recent fix hit wallaby?13:30
arne_wiebalckdtantsur: the udev patch, yes it should have13:31
arne_wiebalckTheJulia: filed the backports IIRC13:31
arne_wiebalckbut the symptom looks indeed identical13:33
TheJuliawhat did I do!??13:33
TheJuliaor forgot?!?13:33
opendevreviewHanGuangyu proposed openstack/ironic master: Add description to the mod_wsgi part  https://review.opendev.org/c/openstack/ironic/+/814916 13:36
dtantsurhmm, this is a fresh stable/wallaby run13:36
dtantsurit's https://opendev.org/openstack/ironic-python-agent/commit/9d707e9f4bab40109b7e29df2136e86d65325ea3 right?13:36
arne_wiebalckdtantsur: yes13:37
dtantsurit should be in the builds...13:37
opendevreviewHanGuangyu proposed openstack/ironic master: Add description to the mod_wsgi part  https://review.opendev.org/c/openstack/ironic/+/814916 13:37
dtantsurthe job: https://zuul.opendev.org/t/openstack/build/9a9727d4b6f44734949ccb47f2ee1040/artifacts 13:37
arne_wiebalckthe patch is indeed in stable/wallaby13:39
arne_wiebalckTheJulia: heh, you submitted the backport for the udev raid patch :)13:40
TheJuliadtantsur: redfish by default was on my mind, just haven't pushed the patch up yet. By all means!13:40
TheJulialock on the ironic node is instance_uuid being set13:41
TheJuliaif field is already set, then it reschedules13:41
opendevreviewHanGuangyu proposed openstack/ironic master: Add description to the mod_wsgi part  https://review.opendev.org/c/openstack/ironic/+/814916 13:41
arne_wiebalckTheJulia: hmm ... how would I end up with an active node without UUID then?13:42
arne_wiebalck*instance UUID13:42
TheJuliaaiui, you shouldn't unless a human did it13:43
arne_wiebalcka human did what?13:43
TheJuliadeploy an instance13:43
dtantsurthat humans!!!13:43
TheJuliaevil, pesky humans13:43
arne_wiebalckit is the humans' fault then!13:44
TheJuliabad humans, no cookies13:44
dtantsurwe, wool owls, consider humans overrated13:44
arne_wiebalcksrsly, what do you mean by a human deployed an instance? directly on ironic w/o nova?13:44
TheJuliadtantsur: upgraded to a wool owl?13:44
TheJuliaarne_wiebalck: you can totally do it :)13:44
dtantsurTheJulia: indeed: https://twitter.com/creepy_owlet/status/1450815340900933634 (actually it says "owl wool", dunno what they mean)13:45
TheJuliaso dtantsur is now a wool owl13:45
TheJuliahuh!13:45
TheJuliabrainsplodes13:45
arne_wiebalckI bet, but this is not what I did ... unfortunately I do not have the nova instances anymore13:46
TheJuliadoes it show up in nova logs?13:46
arne_wiebalckI will see if I can get it from my logs  ... (input for the log discussion later :-D)13:46
TheJulia++13:47
arne_wiebalckTheJulia: it should, the node should show up in nova and from there I should be able to find the failed instance and ... reasons!13:47
TheJuliaoh wow13:47
opendevreviewAija Jauntēva proposed x/sushy-oem-idrac master: Update unit test folder structure  https://review.opendev.org/c/x/sushy-oem-idrac/+/814919 13:48
TheJuliaarne_wiebalck: I *guess* there could have been a window time wise where the request was still in the pipeline but not locked yet with a task on the conductor side13:49
TheJuliaif nova failed, it could potentially rip the entry out and ironic would keep going13:49
dtantsurIury, Julia and I are still in the TC room FYI14:00
iurygregorydtantsur, I just left 14:04
dtantsurI think I'll stay until the end of this discussion14:05
iurygregorythat was my plan also..14:06
rpittauI'm having some issues with zoom14:08
* dtantsur hears "privsep has issues with memory"14:09
iurygregorywow14:13
dtantsurlike, serious issues apparently...14:13
iurygregoryfunny =(14:13
opendevreviewMerged openstack/ironic-python-agent stable/xena: Assert EFI part UUID is not None before editing fstab  https://review.opendev.org/c/openstack/ironic-python-agent/+/814904 15:16
erbarrneed a molteniron review: https://review.opendev.org/c/openstack/molteniron/+/815023 16:00
rpittaubye everyone, see you on monday o/16:06
dtantsursame, but on Friday o/16:07
TheJuliastevebaker[m]: so I guess my concern of sorts is that vendors are building some grub network booting, some not. I guess fedora/centos/rhel grubs are not network booting for uefi, except for a special purpose built binary which is documented to be on the install disks. :\16:08
TheJuliastevebaker[m]: which to me, seems like a bug/defect, but maybe we should address that with the maintainers directly16:08
TheJuliasome others, upstream, I guess it is there/present in other forms16:08
stevebaker[m]Yeah I'm assuming when grub network boot doesn't work it's because it's not well tested by distros16:10
arne_wiebalckTheJulia: dtantsur: I had a look at the logs of the node in active w/o instance UUID now. It seems indeed that nova sees a timeout ("Failed to request Ironic to provision instance abc: Timed out waiting for a reply to message ID xyz"), then issues a delete, but Ironic ends up with an active instance.16:24
arne_wiebalckIronic moves the node from available to deploying only after Nova has given up already.16:24
JayFThat is an overworked rmq or conductor, IME16:25
arne_wiebalckJayF: exactly16:25
JayFthis is a very longstanding bug, we experienced this from day 1 at onmetal during failure cases16:25
arne_wiebalckthe conductor is overloaded and cannot handle the RPC in time, Nova gives up, Ironic moves on :)16:25
arne_wiebalckJayF: ha! ok :)16:26
arne_wiebalckJayF: Usually, our conductor is not overloading, but we re-created >1000 nodes in a short time.16:26
arne_wiebalck*overloaded16:27
arne_wiebalckJayF: I increased the RPC timeout as a first measure16:28
arne_wiebalckJayF: these are nodes which are not distributed over groups nicely16:28
arne_wiebalckso the conductor side is understaffed16:28
arne_wiebalckanyway, good to know my theory of how I ended up here seems to check out16:29
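The flow described above can be illustrated with a toy model, assuming the only inputs are how long Nova waits for the RPC reply and how long the overloaded conductor takes to pick the request up. The function `provision_outcome`, its parameters and the placeholder instance ID are hypothetical and only mirror the sequence reported in the chat; they are not taken from Nova or Ironic.

```python
def provision_outcome(rpc_timeout_s, conductor_delay_s):
    """Toy model: what the node looks like once the conductor catches up."""
    # Nova gives up (and issues a delete) if no reply arrives in time.
    nova_gave_up = conductor_delay_s > rpc_timeout_s
    # The queued request is still processed, so the node deploys either way;
    # after Nova's delete it is left without an instance UUID.
    return {
        "provision_state": "active",
        "instance_uuid": None if nova_gave_up else "inst-1",
    }


print(provision_outcome(rpc_timeout_s=60, conductor_delay_s=120))
print(provision_outcome(rpc_timeout_s=120, conductor_delay_s=60))
```

In this model, raising the timeout above the conductor delay avoids the orphaned node, which matches the first measure mentioned above of increasing the RPC timeout.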
arne_wiebalckJayF: Is there a bug in storyboard for this, or were there even any ideas how to address this?16:30
arne_wiebalck(apart from "scale up your infra!" :))16:30
JayFI don't know the answers to those questions; I just recognize the flow and the behavior16:30
JayFwe only ever saw it during like, split-brain network failurey sorta things16:30
arne_wiebalckthanks16:31
arne_wiebalckbye everyone, see you tomorrow o/16:37
opendevreviewMerged openstack/ironic-python-agent stable/wallaby: Assert EFI part UUID is not None before editing fstab  https://review.opendev.org/c/openstack/ironic-python-agent/+/814905 16:52
opendevreviewArun S A G proposed openstack/ironic master: Fix format of kickstart options  https://review.opendev.org/c/openstack/ironic/+/814087 17:50
*** sshnaidm is now known as sshnaidm|afk18:53
*** dviroel|rover is now known as dviroel|rover|afk21:58
*** pmannidi|brb is now known as pmannidi23:23
