Wednesday, 2022-06-01

opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: Add a regression test for bug 1939545  https://review.opendev.org/c/openstack/nova/+/84394800:05
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: compute: Ensure updates to bdms during pre_live_migration are saved  https://review.opendev.org/c/openstack/nova/+/84394900:05
opendevreviewmelanie witt proposed openstack/nova stable/train: Define new functional test tox env for placement gate to run  https://review.opendev.org/c/openstack/nova/+/84077700:19
opendevreviewSteve Baker proposed openstack/nova master: Align ironic driver with libvirt secure boot enable  https://review.opendev.org/c/openstack/nova/+/84424305:35
*** whoami-rajat__ is now known as whoami-rajat07:35
opendevreviewRico Lin proposed openstack/nova master: libvirt: Ignore LibvirtConfigObject kwargs  https://review.opendev.org/c/openstack/nova/+/83064408:44
opendevreviewRico Lin proposed openstack/nova master: libvirt: Remove unnecessary TODO  https://review.opendev.org/c/openstack/nova/+/83064508:44
opendevreviewRico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest  https://review.opendev.org/c/openstack/nova/+/83064608:44
opendevreviewBalazs Gibizer proposed openstack/nova stable/train: Extend the reproducer for 1953359 and 1952915  https://review.opendev.org/c/openstack/nova/+/83935409:22
opendevreviewBalazs Gibizer proposed openstack/nova stable/train: [rt] Apply migration context for incoming migrations  https://review.opendev.org/c/openstack/nova/+/83935509:22
opendevreviewBalazs Gibizer proposed openstack/nova master: Reject AZ changes during aggregate add / remove host  https://review.opendev.org/c/openstack/nova/+/82142309:26
opendevreviewBalazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39  https://review.opendev.org/c/openstack/osc-placement/+/82854509:32
slaweqgibi hi, for a few days now we have been seeing the same failure in our fedora based scenario periodic job, see https://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d for example09:38
slaweqit seems like some nova and/or glance issue to me09:38
slaweqdid you maybe already see something like that? or should I open a new LP for it?09:38
gibislaweq: it does not immediately ring a bell but give me some time to look into it... I will get back to you09:39
slaweq@gibi sure, thx a lot09:40
sean-k-mooneyslaweq: its probably related to running on python 3.1009:47
sean-k-mooneyslaweq: we dont have any tempest based testing on 3.10 currently for nova09:48
sean-k-mooneyif you are only seeing it on fedora09:48
slaweqsean-k-mooney: yes, we are seeing it only on fedora currently09:50
sean-k-mooneydo you need to run that job on fedora for a particular reason by the way? im wondering if ubuntu 22.04 would see the same failure09:51
slaweqsean-k-mooney (@sean-k-mooney:matrix.org) we just want to have fedora based periodic job, we run many Ubuntu jobs in check/gate queues already09:54
slaweqmaybe we could/should move it to c9s now09:54
slaweqbut for now it's fedora09:54
sean-k-mooneyslaweq: right i was going to suggest ye move to ubuntu 22.04 or centos 9 stream09:57
sean-k-mooneyi strongly dislike having fedora based testing in the gate09:57
sean-k-mooneyim supportive of c9s testing09:58
sean-k-mooneyim looking at the logs currently and waiting for the filtered error logs to render09:59
sean-k-mooneyactually there are no error level logs in nova10:02
sean-k-mooneyso there were no exceptions10:02
sean-k-mooneyi do see  DEBUG neutronclient.v2_0.client [-] Error message: {"NeutronError": {"type": "PortNotFound", "message": "Port 1a6c72ae-3e1a-45d6-a70e-1f9964b70756 could not be found.", "detail": ""}}10:02
sean-k-mooneybut thats probably not related to the failure10:02
sean-k-mooneyim only looking at the nova compute log currently10:03
gibislaweq: I see that the instance being snapshotted has crashed10:04
gibihttps://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d/log/controller/logs/screen-n-cpu.txt#1923510:04
gibiInstance instance-0000002e disappeared while taking snapshot of it: [Error Code 42] Domain not found: no domain with matching uuid 10:04
gibihttps://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d/log/controller/logs/libvirt/libvirt/qemu/instance-0000002e_log.txt10:04
gibi2022-06-01 03:13:33.853+0000: shutting down, reason=crashed10:04
gibibut I don't see any reason why it crashed10:05
sean-k-mooneyits paused but i have not gotten to where it crashed yet10:05
sean-k-mooneyInstance instance-0000002e disappeared while taking snapshot of it: [Error Code 42] Domain not found: no domain with matching uuid '84e8668d-6263-4a85-98f5-15d1a7e38f0d' (instance-0000002e) 10:06
gibiit seems that the syslog journal was rotated as it only has later logs in it10:07
gibislaweq: as far as I see this job has been failing since 2022-05-07 with the same issue10:16
gibibut older runs have no logs any more10:17
gibiso potentially it has been failing a lot longer than since 05-0710:18
gibislaweq: I found the rotated journal10:26
gibiqemu segfaulted10:26
gibihttps://paste.opendev.org/show/bIxQstqasZeJiuyDRHxD/10:26
gibibetter paste https://paste.opendev.org/show/byrrh32nwTYvGXtw52ax/10:27
sean-k-mooneynice find10:29
sean-k-mooneyso likely a fedora bug10:29
gibiprobably worth taking up with the downstream virt team10:29
gibias my knowledge ends here10:29
sean-k-mooneyya perhaps although they dont really care about the tcg backend for qemu10:30
sean-k-mooneyso if this does not happen with kvm10:30
sean-k-mooneythey may or may not help10:30
sean-k-mooneyhum there is nothing useful in the qemu instance log10:32
sean-k-mooneyit looks like a pretty normal vm 10:33
sean-k-mooney-machine pc-i440fx-6.1,accel=tcg,usb=off,dump-guest-core=off,memory-backend=pc.ram 10:33
sean-k-mooneyso still pc machine type too10:33
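The crash was dug out of the rotated journal by searching for qemu segfault lines; a minimal sketch of that kind of search follows. The journal-dump file and its log lines below are fabricated for illustration, not taken from the actual node.

```shell
# Illustrative only: fake a journal dump and grep it the way the rotated
# journal was searched above. These log lines are made up for the demo.
cat > /tmp/journal_dump.txt <<'EOF'
Jun 01 03:13:33 node kernel: qemu-system-x86[12345]: segfault at 0 ip 00007f... sp 00007f... error 4
Jun 01 03:13:33 node libvirtd[678]: internal error: qemu unexpectedly closed the monitor
EOF

# find the segfault line (and its line number) in the dump
grep -n segfault /tmp/journal_dump.txt
```

On a live node with systemd the equivalent would be something like `journalctl -b -1 | grep -i segfault`, or `coredumpctl list` if a core dump was captured — assuming journald/coredumpctl are in use on the image.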
opendevreviewMerged openstack/nova master: Fix typos in help messages  https://review.opendev.org/c/openstack/nova/+/84384310:50
opendevreviewMerged openstack/nova master: Fix typos  https://review.opendev.org/c/openstack/nova/+/84312710:50
opendevreviewMerged openstack/nova stable/train: Define new functional test tox env for placement gate to run  https://review.opendev.org/c/openstack/nova/+/84077710:50
*** tbachman_ is now known as tbachman10:56
*** tbachman_ is now known as tbachman11:07
opendevreviewBalazs Gibizer proposed openstack/osc-placement master: Drop py36 and py37 support  https://review.opendev.org/c/openstack/osc-placement/+/84428111:28
opendevreviewBalazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39  https://review.opendev.org/c/openstack/osc-placement/+/82854511:29
slaweqthx sean-k-mooney (@sean-k-mooney:matrix.org) and gibi for checking that11:36
sean-k-mooneyslaweq: assuming this is just the qemu bug then you should be able to swap the nodeset to c9s and that hopefully will resolve the problem, unless that bug has already made it into c9s11:42
gibielodilles, melwitt: I see a new blocker on stable/ussuri https://zuul.opendev.org/t/openstack/build/034890df09f94bcba63b97f1586e9a48/log/job-output.txt#43311 11:46
gibiadded to the tracking etherpad https://etherpad.opendev.org/p/nova-stable-branch-ci L2611:46
gibigmann: ^^ fyi maybe you have a view on this too11:47
elodillesgibi: there's already a patch for that: https://review.opendev.org/c/openstack/openstacksdk/+/84397811:49
gibithanks, then that is now a blocker for ussuri as the job fails 100%11:51
gibibecause the master upper-constraint was bumped recently11:51
elodillesyepp11:52
gibielodilles: thanks for the info11:54
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: func: Increase rpc_response_timeout in TestMultiCellMigrate tests  https://review.opendev.org/c/openstack/nova/+/84420012:01
gibielodilles, melwitt: I saw this issue ^^ in wallaby (newer branches are already fixed)12:03
opendevreviewBalazs Gibizer proposed openstack/osc-placement master: Add Python3 zed unit tests  https://review.opendev.org/c/openstack/osc-placement/+/83536912:09
opendevreviewBalazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39  https://review.opendev.org/c/openstack/osc-placement/+/82854512:09
opendevreviewAlexey Stupnikov proposed openstack/nova master: [minor]Remove unused argument from _fake_do_delete  https://review.opendev.org/c/openstack/nova/+/84428512:15
gibielodilles, melwitt: another blocker on stable/train grenade is failing in devstack-plugin-ceph https://zuul.opendev.org/t/openstack/build/a6e02e6d4b5d40f794c188c852214002/log/job-output.txt#5771-578012:17
gibiit seems it started two days ago https://paste.opendev.org/show/bxCRqAFCVzM28hDqlWix/12:17
elodillesgibi: wallaby patch is +2'd. the train blocker is interesting :S i was focused on ussuri and victoria and hadn't noticed we have a blocker in train :S12:25
opendevreviewRico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest  https://review.opendev.org/c/openstack/nova/+/83064612:28
gibielodilles: seems to be a fresh one12:32
gibielodilles: thanks for the review on the wallaby one12:33
elodillesgibi: about the train failure: it is strange, it seems the 'source' command is missing (?) from the node :-o12:40
gibiit could be that "" is not found12:45
gibiby source12:45
gibias the next line is12:46
gibi/bin/sh: 6: install_ceph_remote: not found12:46
gibior meh12:46
elodillesi guess that should come from the sourced code12:46
gibihttps://github.com/openstack/devstack-plugin-ceph/commit/67da31fa430b99bbe13dc53bbcf81d0e7688523f12:47
gibithis did some change 6 days ago on ussuri and victoria12:47
gibiso this might be related12:47
gibiother than that I don't see any new thing in the plugin repo12:50
gibielodilles: btw, you were right in https://review.opendev.org/c/openstack/nova/+/839354/2#message-8f0247bc92c2d82ed37c05c88db0769fa1f69fa0 I fixed up the cherry pick13:04
gibiyou have eagle eyes 13:04
elodillesgibi: diffing the diffs helps a lot :D13:22
elodillesgibi: it was strange that in one we have CPUPinning while in the other we have CPUUnpinning :)13:23
elodillesi'll review that again when i get there :)13:23
elodillesbtw, i think the problem with the train thing is that we don't have bash (thus no 'source' command). somehow it now runs /bin/sh instead of /bin/bash. (at least that is what i suspect)13:25
sean-k-mooneyelodilles: odd thing i noticed on pop os 22.04, but it might be the same on ubuntu 22.04: /bin/sh is actually dash...13:26
sean-k-mooneyyou should never depend on /bin/sh for portable scripts13:26
gibielodilles: interesting...13:27
sean-k-mooney. <thing>13:27
sean-k-mooneyis more portable than "source thing"13:27
sean-k-mooneybut ya i hit the /bin/sh -> /bin/dash issue because pushd was not a command13:28
sean-k-mooneyin the script that uses /bin/sh 13:29
sean-k-mooneywhere is /bin/sh being used?13:29
elodillessean-k-mooney: what i see is that we explicitly ask for /bin/bash: https://opendev.org/openstack/nova/src/branch/stable/train/gate/live_migration/hooks/ceph.sh#L1013:30
sean-k-mooneyis it in the devstack plugin13:30
sean-k-mooney ah yep we do13:30
elodillessean-k-mooney: but then we got errors like '/bin/sh: .: install_ceph_remote: not found'13:30
elodillessean-k-mooney: or '/bin/sh: 5: source: not found'13:31
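The 'source: not found' errors above are the classic bash-ism: `source` is a bash extension, while the POSIX spelling is `.`, which is all dash understands. A minimal demonstration, using a throwaway /tmp file (nothing from the job itself):

```shell
# a tiny "library" file to be sourced
echo 'greet() { echo hello; }' > /tmp/lib_demo.sh

# portable: `.` works under dash, bash, and any POSIX sh
sh -c '. /tmp/lib_demo.sh && greet'

# bash-only spelling; under dash this dies with "source: not found"
bash -c 'source /tmp/lib_demo.sh && greet'
```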
sean-k-mooneywhy are we using ansible with raw there13:31
sean-k-mooneyand not shell13:31
elodillesso it seems the subnode does not have 'bash'13:31
elodillessean-k-mooney: good question :)13:32
sean-k-mooneyit should. im not sure that executable=/bin/bash is the correct syntax13:32
sean-k-mooneythat is defining a variable in the "script"13:32
sean-k-mooneyits not actually part of the arguments to the ansible module13:33
sean-k-mooneyas in, that is not telling the raw module to use bash as far as i can see13:33
sean-k-mooneyits just storing /bin/bash in the executable variable in the shell that is created by raw13:34
sean-k-mooneyexecutable is a parameter it can accept13:35
sean-k-mooneyhttps://docs.ansible.com/ansible/latest/collections/ansible/builtin/raw_module.html13:35
sean-k-mooneybut that does not look like the correct syntax to me13:35
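The point that a leading `executable=/bin/bash` is a plain shell variable assignment, not an argument to the ansible module, can be shown directly without ansible at all (a minimal sketch):

```shell
# To /bin/sh, `executable=/bin/bash;` at the front of a command line is just
# an ordinary variable assignment — it does not change which shell runs.
sh -c 'executable=/bin/bash; echo "value stored: $executable"'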
elodillessean-k-mooney: the interesting thing is that this worked so far on stable/train, but some days ago it started to fail13:36
sean-k-mooney~/repos/ansible_role_devstack on  add-ubuntu-multinode [?] via 🐍 v3.8.13 (.venv) 13:40
sean-k-mooney[14:40:15]❯ ansible localhost -m raw -a "executable=/bin/bash; echo test" 13:40
sean-k-mooney[WARNING]: No inventory was parsed, only implicit localhost is available13:40
sean-k-mooneylocalhost | FAILED | rc=127 >>13:40
sean-k-mooney/bin/sh: line 1: -c: command not found13:40
sean-k-mooneynon-zero return code13:40
sean-k-mooneyso that would imply that this is using /bin/sh13:41
sean-k-mooneyim trying to figure out how to print the current executable13:41
sean-k-mooneymaybe $013:41
sean-k-mooneyhum13:41
sean-k-mooney14:41:30]❯ ansible localhost -m raw -a "executable=/bin/bash echo $0" 13:41
sean-k-mooney[WARNING]: No inventory was parsed, only implicit localhost is available13:41
sean-k-mooneylocalhost | CHANGED | rc=0 >>13:41
sean-k-mooneybash13:41
sean-k-mooney14:42:10]➜ ansible localhost -m raw -a "executable=/home/sean/.nix-profile/bin/fish echo $0" 13:42
sean-k-mooney[WARNING]: No inventory was parsed, only implicit localhost is available13:42
sean-k-mooneylocalhost | CHANGED | rc=0 >>13:42
sean-k-mooneybash13:42
sean-k-mooneyso setting executable seems to have no impact on what is used with that syntax13:42
gibithis is the last success run https://zuul.opendev.org/t/openstack/build/9c886d2f459143cdaf587ba0b93c7cd4/log/job-output.txt13:43
sean-k-mooneybased on https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html it looks like that syntax should actually work13:44
elodillesgibi: yepp, and the 'source' here works so install_ceph_remote simply runs without any error13:45
sean-k-mooneyoh im dumb, i need to use '' not ""13:47
sean-k-mooney$0 is being evaluated before its passed to ansible13:47
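The quoting pitfall hit above is easy to reproduce locally: inside double quotes the invoking shell expands `$0` before the command string is ever handed to the inner shell (or to ansible), while single quotes pass it through untouched.

```shell
# $0 inside double quotes is replaced by the *current* context's name first
sh -c "echo outer expanded: $0"

# $0 inside single quotes reaches the inner sh intact and expands there
sh -c 'echo inner expanded: $0'   # prints "inner expanded: sh"
```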
sean-k-mooneyopenstack's paste seems to be down, so: https://pastebin.com/0wfCmm1z13:54
sean-k-mooneybut ya even if i quote it correctly that does not seem to set the executable used by raw13:55
sean-k-mooneyoh no, for fish the syntax is different13:57
elodillessean-k-mooney: based on your example, here is mine: https://paste.opendev.org/show/btfrnyRESb4uBo2t9TCj/14:01
elodillessean-k-mooney: so i think bash has disappeared from the subnode in the past days14:01
sean-k-mooneymaybe but i wonder how14:02
elodillessean-k-mooney: and now ansible defaults to /bin/sh14:02
elodillessean-k-mooney: where we don't have 'source' command14:02
sean-k-mooneywell /bin/sh is not a real shell14:02
sean-k-mooneyits a symlink14:02
elodillessean-k-mooney: that is a good question: how :)14:02
sean-k-mooneythere is no sh binary14:02
sean-k-mooneymaybe it got removed from the cloud image by default14:03
sean-k-mooneydid you say this was train14:03
sean-k-mooneyor ussuri14:03
sean-k-mooneyjust wondering if its 18.04 or 20.0414:03
elodillessean-k-mooney: well it could be /bin/dash, still the result is the same: we don't have 'source' command14:03
sean-k-mooneycorrect14:04
elodillessean-k-mooney: train14:04
elodillessean-k-mooney: and it's bionic14:04
elodillessean-k-mooney: so 18.0414:04
sean-k-mooneyack so i think we can pull the nodepool images directly from somewhere14:04
sean-k-mooneyinfra makes them available via the nodepool job14:04
sean-k-mooneywe could open the qcow and take a look14:04
sean-k-mooneyand/or look and see if anyone tweaked the dib element or nodepool config recently14:05
elodillessean-k-mooney: is there a way to see the recent changes in the image we have in nodepool? :-o14:07
sean-k-mooneythe images are here: nb03.opendev.org14:09
sean-k-mooneywell the logs at least14:09
Ugglasean-k-mooney, are you using starship for your prompt ?14:09
sean-k-mooneyUggla: yes14:09
Ugglasean-k-mooney, cool ! Do you like it ? 14:10
sean-k-mooneyya its not bad i dont notice it much14:11
sean-k-mooneythe command time in the output is nice14:11
* Uggla plans to use starship, but stay with powerline atm.14:11
sean-k-mooneyit can be helpful sometimes14:11
sean-k-mooneyi do not use vim so powerline is not as useful to me14:11
Ugglasean-k-mooney, you are emacs user right ?14:12
sean-k-mooneyelodilles: the x86 logs are here https://nb02.opendev.org/ubuntu-bionic-0000229995.log14:12
sean-k-mooneyUggla: i used to just use nano14:12
sean-k-mooneyi wanted something a little more advanced so went to spacemacs in emacs mode14:12
sean-k-mooneyi used to use ides like pycharm years ago but got tired of having to set them up on various dev systems14:13
Ugglasean-k-mooney, oh ok. I tried astronvim --> really cool. (if you like vim)14:13
sean-k-mooneyi hate how modes work in vim 14:14
sean-k-mooneyi dont want my modes to change what keys do14:14
sean-k-mooneyso insert vs replace vs normal vs visual mode is a deal breaker for me14:15
sean-k-mooneyi never want to have to care about that and carry that context in my brain when im coding14:15
sean-k-mooneyUggla: as you may have noticed, there is more than enough context to remember for nova as it is14:16
opendevreviewAlexey Stupnikov proposed openstack/nova master: [minor]Remove unused argument from _fake_do_delete  https://review.opendev.org/c/openstack/nova/+/84428514:17
sean-k-mooneyelodilles: dash is definitely installed: 2022-06-01 08:18:05.065 | I: Retrieving dash 0.5.8-2.1014:18
Ugglasean-k-mooney, sure. I understand, in my case, I have used vim for so long that it is more or less automatic.14:18
sean-k-mooneyas is bash14:18
sean-k-mooneyUggla: i used to automate uninstalling it from every system i used14:18
elodillessean-k-mooney: yepp, i see bash 4 in it. still it is strange :S14:19
Ugglasean-k-mooney, ಥ_ಥ14:20
Ugglasean-k-mooney, you break my heart. ;)14:20
elodillessean-k-mooney: even the base ubuntu image ( bionic-server-cloudimg-amd64.img ) has /bin/bash in it14:29
zigoI'm getting "Live Migration failure: operation failed: migration out job: Cannot write to TLS channel: Input/output error: libvirt.libvirtError: operation failed: migration out job: Cannot write to TLS channel: Input/output error".14:46
zigoHow can I check what's wrong?14:46
zigoOh, I know ... nova doesn't have a shell. :/14:48
zigoHum... this was a problem, but it's not the only one, I'm still getting it.14:50
opendevreviewGhanshyam proposed openstack/nova stable/ussuri: Make sdk broken job non voting until it is fixed  https://review.opendev.org/c/openstack/nova/+/84430914:58
gmanngibi:  elodilles melwitt ^^ the sdk broken job fix might take time and might not land soon. http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028763.html14:59
gmanngibi: elodilles melwitt ^^ I am making it non voting until then to unblock gate like we did in devstack 14:59
elodillesgmann: ack. though this is passing so this should be a good workaround i guess: https://review.opendev.org/c/openstack/openstacksdk/+/84397815:03
elodillesor do i miss something? :-o15:04
Ugglagibi, sean-k-mooney could you review https://review.opendev.org/c/openstack/nova/+/831507 in the next days if you have time ?15:04
gmannelodilles: yeah but honestly saying, that does not give us much more benefit than making it non voting15:05
zigoFound the issue ... :)15:14
melwittgibi: thanks for adding those to the etherpad! also +W on the rpc_response_timeout backport15:18
melwittgmann: thanks, I'm looking through to understand the options15:36
gmannmelwitt: i commented on sdk patch. I like the venv approach but that needs some commitment and bandwidth from the sdk team15:38
melwittgmann: I see, makes sense. we need opinion from gtema and stephenfin 15:39
gmannyeah I will ping them on sdk channel in case they did not see15:40
melwitt++15:40
melwittordinarily I would say we'd want the venv approach bc sdk is to be backward compat with previous versions BUT as you point out, branch is EM so there is no obligation to keep it working15:41
gmannyeah, same as Tempest. and Tempest team bandwidth is the key reason for us not supporting Tempest master for all EM branches.15:42
gmannand I think sdk is also in the same situation.15:43
* melwitt nods15:46
opendevreviewRico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest  https://review.opendev.org/c/openstack/nova/+/83064615:49
melwittgmann, elodilles: commented on the sdk job n-v patch https://review.opendev.org/c/openstack/nova/+/84430916:28
opendevreviewGhanshyam proposed openstack/nova stable/ussuri: [stable-only] Make sdk broken job non voting until it is fixed  https://review.opendev.org/c/openstack/nova/+/84430916:35
gmannmelwitt: thanks, added [stable-only]  ^^16:35
opendevreviewMerged openstack/nova-specs master: Repropose volume backed server rebuild spec  https://review.opendev.org/c/openstack/nova-specs/+/84015516:36
opendevreviewMerged openstack/nova stable/wallaby: func: Increase rpc_response_timeout in TestMultiCellMigrate tests  https://review.opendev.org/c/openstack/nova/+/84420016:43
gibigmann: thanks for the sdk nonvoting patch17:19
opendevreviewMerged openstack/nova stable/xena: Add service version check workaround for FFU  https://review.opendev.org/c/openstack/nova/+/83117417:33
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Add service version check workaround for FFU  https://review.opendev.org/c/openstack/nova/+/84420217:37
opendevreviewArtom Lifshitz proposed openstack/nova master: Func test for deletion of auto-created port on detach  https://review.opendev.org/c/openstack/nova/+/84432518:21
opendevreviewArtom Lifshitz proposed openstack/nova master: Don't delete auto-existing port when detaching  https://review.opendev.org/c/openstack/nova/+/84432618:21
artom_dansmith, well there's 2 things. Does the user want the port deleted when the instance is deleted, and does the user want the port deleted when it's detached?18:27
*** artom_ is now known as artom18:27
dansmithartom: ack, so we let them change the flag on volumes for the same reason right/18:28
dansmithyeah in 2.8518:29
dansmithso maybe making that the same in the port is the way, if you want to be able to change the instance delete behavior too18:29
artomdansmith, so the instance delete case is orthogonal, the original issue is that the port gets deleted when the user detaches it18:30
dansmithon first thought I was thinking that you'd always want to delete it on instance delete, but I dunno why.. volume data is *super* important, but ip allocation can be important too (for broken software :P)18:30
dansmithartom: right I know18:30
melwittdo we delete a volume when you detach it? I didn't think so18:30
dansmithmelwitt: you couldn't detach root volumes so it didn't matter18:31
artomYou still can't IIRC... can you?18:31
dansmithand those are the ones that get that flag18:31
artomI know there was an attempt https://review.opendev.org/q/topic:bp/detach-boot-volume+18:31
dansmithartom: not that I know of, yeah18:31
dansmithright18:31
melwittyeah but what if you booted with another block device to attach? I guess that would mean you created the volume outside of nova 18:31
dansmithmelwitt: right18:31
artomdansmith, so for the detach case, you're proposing an API change (like 'delete_port': True in the body, for example) when doing the DELETE os-interface call?18:33
dansmithso I think we don't expose preserve_on_delete anywhere currently right? if not, it would probably be best to mirror the same boolean logic as the volume one18:33
dansmithwhich is inverted from preserve, I think18:33
artomdansmith, we do not, it's in a JSON blob in instance info network cache18:33
dansmithartom: that's what I was proposing, but I think I've changed my mind18:33
dansmithartom: for the "I need to delete this instance but I want to keep its IP" case18:33
melwittI think it's best to mirror it too if we can18:33
dansmithmelwitt: yeah should just be "delete_on_termination=!preserve_on_delete"18:34
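The volume analogue dansmith references is compute API microversion 2.85, which made `delete_on_termination` mutable on an existing volume attachment via the swap-volume endpoint. A hedged sketch of the request shape — the endpoint, token, and IDs are placeholders, and the function is illustrative rather than any real tooling:

```shell
# Sketch: update delete_on_termination on an attached volume
# (PUT /servers/{server_id}/os-volume_attachments/{volume_id}, microversion 2.85).
# Arguments are placeholders for a real endpoint/token/server/volume.
update_delete_on_termination() {
  # $1=endpoint $2=token $3=server_id $4=volume_id $5=true|false
  curl -s -X PUT \
    -H "X-Auth-Token: $2" \
    -H "X-OpenStack-Nova-API-Version: 2.85" \
    -H "Content-Type: application/json" \
    -d '{"volumeAttachment": {"volumeId": "'"$4"'", "delete_on_termination": '"$5"'}}' \
    "$1/servers/$3/os-volume_attachments/$4"
}

# example invocation (needs a real cloud, so not executed here):
#   update_delete_on_termination "$NOVA_ENDPOINT" "$TOKEN" "$SERVER_ID" "$VOLUME_ID" true
```

Note the body reuses the attached volume's own ID as `volumeId`; that is how 2.85 distinguishes "update the attachment flags" from an actual volume swap.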
artomdansmith, but we're not deleting any instance, we're keeping the instance (at least initially), but detaching the port18:35
dansmithartom: I realize that, I'm saying make the flag mutable so that it will work for either18:35
melwittwhich is why I was thinking preserve_on_delete_and_detach18:36
dansmithartom: I'm considering the whole picture and not just your single case :)18:36
artomAnd I'm saying they're different cases :)18:36
dansmithmelwitt: you want that to be the flag name? that's inverted from the volume one and also really oddly worded :)18:36
dansmithartom: why?18:36
melwittdansmith: er... I guess it should be delete_on_terminate_and_detach then18:37
artomBecause you could want a port auto-deleted when you delete an instance (delete_on_termination addresses that)18:37
dansmithit's basically a "is this precious" flag which applies in either case, and can be tweaked right before you do either operation18:38
artomBut for the exact same instance, you could want a port preserved when detaching it18:38
artomWell...18:38
dansmithartom: so tweak the flag before you do either, but I think having different behavior for those cases is more complicated than necessary and more likely for someone to set one and not the other and then be surprised18:38
melwittI dunno how yall will want to do it but I'm just saying I don't want it to delete on detach if the only word in the flag is "termination"18:38
dansmithmelwitt: the volume flag doesn't mention the exact operation in the name either.. we have no "terminate" server op,18:39
dansmithso terminate can cover any reason, IMHO18:40
dansmithcross-cell migrate might not be able to keep the same port in the other cell, so wouldn't we want to honor the same flag and leave that port allocated for the user in the old cell if they've marked it as precious?18:40
dansmithor should we delete it because "meh, it's not detach or delete"? :)18:41
melwitteh... "terminate" means "delete server" as far as I've known18:41
artomI guess if you think of the verb terminate as acting on the attachment itself...18:41
dansmithartom: right :)18:41
artomBut then it should really be delete_(volume|port)_on_termination18:42
dansmithit would be better to not use a verb and use an adjective.. "is this thing precious, should I preserve it always"18:42
dansmithpreserve_on_deallocate -- does that cover both?18:42
artomIt's really about lifecycle coupling (bingo!)18:42
artomIs this volume/port coupled to this instance?18:43
dansmithdelete_with_parent18:43
artomdelete_with_owner?18:43
dansmithowner is the user, IMHO, but it's close18:43
dansmiththe user *owns* the hierarchy of resources including an instance and a port, volume18:44
dansmithbut whatever, we're far into the weeds now18:44
artom*puffs*18:44
dansmithsounds like we're coalescing on exposing the to-be-named flag, letting it be mutable, and using that during detach and instance delete?18:44
artomWell, we are, because you're more opinionated and I care less, but the Europeans might have a different opinion once a spec is written up :)18:45
dansmithdelete_with_owner?18:45
melwittI'm not commenting on it being coupled, just trying to say if we couple it, it should be really clear in the name of the flag. otherwise we're gonna have people be surprised and unhappy their port got deleted when they detached it and we'll be doing yet another microversion to change the name or add another flag18:46
dansmithwell, delete_on_termination is plenty ambiguous, when stored on a volume attachment, IMHO18:46
dansmithbut it says "server delete" in the docs as the explainer,18:47
dansmithso I think we do the same and we'll please some and offend others18:47
melwittI don't think it is given that detaching a volume _never_ deletes the volume18:47
artomdansmith, honestly, if it wasn't for the long standing precedent, I'd be arguing for changing the default18:47
artomBecause if you want to delete the port, just do the damn API call yourself :)18:47
artomNo less ambiguous than that18:48
dansmithartom: eh? you can't if it's bound right?18:48
artomRight, so detach it first18:48
dansmithyou know what the feedback was when we split out neutron?18:48
dansmithpeople hated having to do one operation in two places just to get an instance booted18:48
artomLemme guess, "plz proxy and/or automate more things"18:48
dansmithso that sounds a lot like a regressive pattern to me 18:48
artomYeah, I see the smooth UX argument18:49
dansmithartom: I boot my instance on a private provisioning network first, then I want to attach it to public and delete the temporary nic, but I have to do two things?18:49
artomBut can't beat the idea in terms of clarity of intent :)18:49
dansmithyou could also follow your logic for the delete case,18:50
artomYeah, it's admittedly gray there18:50
dansmithand say: if you have an ip you really need to keep, detach it with no_delete=True before you delete your instance,18:50
dansmithso if you think that's more natural, that's fine18:50
dansmithmeaning put the intent in the detach call, instead of a property on the attachment,18:51
dansmithwhich solves melwitt's "what does terminate mean" problem18:51
dansmith(which I created for her, noted...)18:51
melwitthehe. I do like this idea more, it feels a lot clearer18:52
dansmithit was my original thought, but then I was thinking "but what if I want to delete the instance?" .. but if you can detach first, then that solves that18:52
dansmithkinda unfortunate to make it different from volumes, even though volumes have a reason to be different (because detaching root)18:53
artomSo for volumes AFAICT the delete_on_termination flag only applies to instance delete18:54
artomThe detach flow *always* preserves the volume18:54
dansmithartom: but because it has to18:54
gibiyou can delete a bound port in neutron and it will trigger a detach in nova, a buggy detach :D18:54
melwittI think it's fine to make it same as volumes in concept but it shouldn't use ambiguous words IMHO. it's not ambiguous for volumes but it is for NICs, IMO18:55
artomAnd I guess because nova-created volumes aren't a thing18:55
dansmithgibi: because we get a network-deleted and then try to yank it out?18:55
gibiyepp18:55
dansmithartom: they are, but only root18:55
gibibut that codepath is incomplete18:55
artomdansmith, ah, right18:55
dansmithgibi: yeah, I wish we didn't do that18:55
dansmithgibi: like trove creating instances for you, I wish we had a better notion of "this was created for you by $service, delete it from there if you want to"18:56
gibidansmith: totally agree18:57
artomI guess what I'm forgetting is that in this particular case, the instance *will* get deleted regardless18:57
gibialso I'm wondering if nova can trigger volume creation other than the root volume via image->volume or blank->volume BDMs in the boot request18:57
dansmithso does anyone think that just declaring intent during detach, and detach before instance delete if you want to keep it, is not okay?18:57
gibiI'm OK to declar intent at detach18:58
artomSo for them delete_on_termination=False or whatever we end up calling it is fine18:58
artomAnd then they'll be able to just set that and delete the instance, and have the port be preserved18:58
dansmithgibi: Depending on the destination_type and guest_format, this will either be a blank persistent volume "18:58
dansmithgibi: so source_type=blank,destination_type=volume maybe?18:59
gibiyepp that is what I vaguely remember18:59
dansmithartom: that's not what I was just describing18:59
gibibut it is too late here to actually try18:59
dansmithartom: detach(port=$uuid,no_delete=True) <- only change19:00
dansmithartom: if you want to keep an ip before deleting an instance, detach it first19:00
dansmithgibi: I thought you had to do this with a volume uuid so yeah I'm skeptical, but maybe it works19:00
artomdansmith, oh, so you wouldn't actually expose the preserve_on_delete flag19:00
dansmithartom: right, hence the "intent on detach" approach19:01
artomdansmith, but rather *just* add a delete_port option to the detach call19:01
melwittif everyone else prefers to have delete_on_termination for the port and have it also mean delete on detach, I will support the consensus19:01
dansmithartom: right, whatever the flag is19:01
artomdansmith, right, I think I prefer that19:02
artomThe confusion was never about instance deletion19:02
dansmithartom: since we already have behavior, I'd say that if you don't provide the flag, we follow the network_info setting to determine whether or not it gets deleted, but if you ask for True or False, we do that19:02
artomThe current logic of "if Nova created it, Nova deletes it" is fine19:02
artomThe confusion is upon interface detach19:02
artomSo it would make sense to add a way to specify intent there19:02
artomWhile keeping the current behaviour as the historical default19:03
dansmithartom: again, I was not confusing the two, I was trying to make sure that we had a consistent story for both19:03
artomConfusion in the sense of "what happens to the port?"19:03
dansmithagain, I... okay. :P19:05
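dansmith's proposal above is effectively a tri-state flag on detach: no flag keeps today's behaviour (follow the network_info setting for whether nova created the port), while an explicit True/False honors the caller's intent. A sketch of that decision logic, where the `delete_port` flag name and the `preserve_on_delete` network_info key are illustrative, not a settled Nova API:

```python
# Illustrative sketch of the "intent on detach" behaviour discussed above.
# The flag name (delete_port) is hypothetical, not an actual API parameter.
def should_delete_port(network_info_entry, delete_port=None):
    """Decide whether a detached port gets deleted.

    delete_port=None        -> current behaviour: delete only if nova
                               created the port (preserve_on_delete False).
    delete_port=True/False  -> honor the caller's explicit intent.
    """
    if delete_port is None:
        return not network_info_entry.get("preserve_on_delete", False)
    return delete_port
```

Under this scheme a user who wants to keep an IP across instance deletion detaches the port first with `delete_port=False`, rather than nova exposing a separate preserve-on-instance-delete flag for ports.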
gibiUggla: I read half of the https://review.opendev.org/c/openstack/nova/+/831507 and I left feedback. 19:20
opendevreviewRico Lin proposed openstack/nova master: libvirt: Ignore LibvirtConfigObject kwargs  https://review.opendev.org/c/openstack/nova/+/83064420:19
opendevreviewRico Lin proposed openstack/nova master: libvirt: Remove unnecessary TODO  https://review.opendev.org/c/openstack/nova/+/83064520:19
opendevreviewRico Lin proposed openstack/nova master: add locked_memory extra spec and image property  https://review.opendev.org/c/openstack/nova/+/77834720:19
opendevreviewRico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest  https://review.opendev.org/c/openstack/nova/+/83064620:19
ricolin^^^ sean-k-mooney: stephenfin gibi just updated the nova vIOMMU and locked memory patch sets according to the BP20:24
opendevreviewRico Lin proposed openstack/os-traits master: Add traits for vIOMMU  https://review.opendev.org/c/openstack/os-traits/+/84433620:34
melwittgmann: hm, it looks like the patch somehow did not affect the gate queue and it voted https://review.opendev.org/c/openstack/nova/+/84430921:23
melwitter, I mean that it didn't remove the job from the gate queue and it ran and then voted21:25
gmannmelwitt: ohk, I think we have this in integrated template also, let me fix21:25
melwittk21:25
melwittgmann: ah yeah I found it (I was curious) https://opendev.org/openstack/tempest/src/branch/master/zuul.d/integrated-gate.yaml#L36121:31
gmannyeah21:34
gmannmelwitt: https://review.opendev.org/c/openstack/tempest/+/84434221:34
gmannl wait for this to merge first21:35
melwittack21:35
opendevreviewGhanshyam proposed openstack/nova stable/ussuri: DNM: testing sdk job not runnign from template on ussuri  https://review.opendev.org/c/openstack/nova/+/84434321:37
gmannmelwitt: ^^ will test it with the tempest change in case it is running from somewhere else also :)21:38
melwitt:)21:38
gmannirrelevant-files are making all these hidden runs; we should have some way to just override irrelevant-files without adding the job to the pipeline21:39
melwittgmann: it looks like https://bugs.launchpad.net/devstack/+bug/1906322 has somehow cropped up on stable/wallaby grenade runs /o\22:25
melwittexample build: https://zuul.opendev.org/t/openstack/build/09f7ff5b69b84e429e3141a622dfa951/log/controller/logs/grenade.sh_log.txt#16798 from https://review.opendev.org/c/openstack/nova/+/84420222:26
melwittdoes it mean we should backport https://review.opendev.org/c/openstack/devstack/+/802642 to wallaby?22:28
melwittI'm not sure how this suddenly started failing22:29
melwittlog says it's using pip version 21.0.1;22:29
melwittI'll try a backport and see what happens22:31
melwittsomeone else had proposed it and abandoned it https://review.opendev.org/c/openstack/devstack/+/80500822:33
gmannohk, let's discuss in qa channel22:54
