Thursday, 2024-03-07

opendevreviewRajesh Tailor proposed openstack/python-novaclient master: Bump microversion to 2.96  https://review.opendev.org/c/openstack/python-novaclient/+/91157505:06
*** mklejn_ is now known as mklejn09:00
*** fnordahl_ is now known as fnordahl10:13
opendevreviewRajesh Tailor proposed openstack/python-novaclient master: Bump microversion to 2.96  https://review.opendev.org/c/openstack/python-novaclient/+/91157512:00
-opendevstatus- NOTICE: Jobs that fail due to being unable to resolve mirror.dfw.rackspace.opendev.org can be rechecked. This error was an unexpected side effect of some nodepool configuration changes which have been reverted.16:55
dansmithmelwitt: so I created an encrypted instance, cirros image, booted fine18:04
dansmithI snapshotted it, the snap has the new credential uuid18:05
dansmithI then booted a new instance from that snap, saw that it ran qemu-img convert to (I think) unencrypt it before it uploaded to ceph to then create a vm image from,18:05
dansmithbut the disk is unreadable in the new instance.. like fails in BIOS18:06
dansmithsince this is ceph it's hard to examine the actual disk to see what the state is.. I should try to repro with qcow because I can poke at it easier18:06
dansmithbut (a) that's expected to work, right?18:07
dansmithand (b) I'm thinking if a user does this, they don't realize that you just stripped the encryption from their snapshot image to upload it18:07
dansmithor maybe I'm wrong about what was happening - perhaps we were just re-encrypting it with the new key? if had used qcow it'd be easier for me to see what happened18:08
dansmithmaybe it's in the logs actually18:08
melwittdansmith: hm, well it shouldn't be unencrypting it, convert is used to make a copy with a new key. but as we discussed I have to change all that and haven't yet18:08
dansmithokay, I thought that maybe the "no support yet for encrypted backing files" meant we were de-keying it because it's the image we're based on18:09
melwittand yeah, it was expected to work, to be able to boot a new instance from the image. there might be something that got messed up when I refactored the things needed from review feedback on the bottom patches18:10
melwittwell, the rbd patch is after the encrypted backing file support one18:10
dansmithokay so, there's a command in the log which is what I was seeing when I watched it, where we convert _to_ luks _from_ raw18:11
dansmithso I assumed that meant we had raw'd it18:11
melwittand that's the root disk, not swap or ephemeral?18:12
dansmithwe wouldn't run convert on ephemeral or swap because we're creating from scratch would we/18:14
melwittoh, right. in earlier revisions I had it creating as raw blank and then mkfs and then encrypted it. but yeah I think I changed it to create luks off the bat18:14
melwittso it sounds like there must be a bug there that happened in the intervening time. I need to look into it18:16
melwittconverting from raw to luks should only happen if the source image is unencrypted. and it would be interesting to see if it's erroneously getting converted to raw after pulling it down from ceph. I wonder if maybe i messed something up in the fetch_to_raw function. it's supposed to be a no-op if the source image is luks18:17
melwittthat was something that got changed during addressing review feedback18:18
melwittI'll investigate18:21
dansmiththere's a raw disk in _base, but I think that might be the original cirros I created from18:22
dansmithbut yeah, I probably need to take ceph out of the equation here so I can grok what's going on18:22
dansmithI'm trying to look at the log and I don't see anything with "-O raw" which is good18:22
dansmithyeah I'm not sure maybe I misread a command line as a convert when it was something else18:24
melwittit would be 'qemu-img convert' in that case18:26
dansmithyeah18:27
dansmithwe are doing a lot of qemu-img info execs with no input format, which is a problem,18:27
dansmithnot sure if that's old or new code though18:27
dansmithokay let me redo this with no ceph so I can try to figure out they boot-of-a-snap is dead18:28
dansmithor see if it's a ceph thing I guess18:29
melwittshould be old, for qemu-img info18:29
dansmithack, I thought we had fixed those18:30
melwittI'm double checking. maybe I'm forgetting something18:31
melwittinfo itself didn't change but passing the format is optional, it's possible I added a call and didn't pass the format. I don't remember off the top of my head but just saying there's a possibility. I'll go back and fix that if I find it18:33
dansmithoh I know it's optional, it's just a security bug to not provide the format18:36
dansmithI guess maybe we had some info cases where we need to probe in the dark, as long as we're careful not to follow the links18:36
dansmithconvert is the one you absolutely have to always have18:36
dansmithanyway, no need to rathole on that specifically, just noticed it18:37
melwittack, thanks18:38
stblatzheimdansmith: if you're around would be happy for review of https://review.opendev.org/c/openstack/nova/+/911070 :)20:10
artomI think I need a bit of philosophical guidance from anyone still around...21:20
artomWhat's the general opinion for changing Nova code _just_ to enable additional testing in CI?21:21
artomSpecifically https://review.opendev.org/c/openstack/nova/+/910022/21:21
artomCPU power management in libvirt is done via module-level functions, so I don't see a way to mock/patch/stub stuff to do live migration testing in functional tests, as both of the libvirt drivers on the source and destination host call into the same module, nova.virt.libvirt.cpu.api21:22
artomI would essentially have to turn all that machinery into classes, and give each virt driver an instance of the class21:23
artomWhich would allow proper per-host mocking21:23
artomAnd I feel like that's a pretty radical change21:23
dansmithphilosophically, I think we should design code to work well and to be testable (as the latter feeds into the former)21:38
dansmithwhether or not it's worth refactoring what you're looking at specifically is maybe a judgment call, but I wouldn't rule it out on principle or anything21:38
artomSo - thinking out loud here then.21:39
artom1. The scope is in-tree functional tests.21:39
artom2. The existing code is _not_ testable for live migration. Not without some pretty convoluted, probably fragile, and hard to understand hax that may not even work.21:40
artom3. Sounds like I have a case for refactoring. At least, it's not to be dismissed off-hand.21:42
artomIf I do it and it ends up getting shot down in review, at least that outcome wasn't immediately obvious when I started (now).21:43
dansmithyeah I mean it's hard to say without a little more detail, but I'm too fried at this point in the day to dig into something else21:43
dansmithbut the words you're saying sound fine on their face21:44
artomYeah, that's the best I can hope for, at this point. Hard to argue about hypothetical without concrete code to look at.21:44
artomAt least I know I'm not way cray-cray with my reflex to refactor.21:44
dansmithnot *because* of this anyway :)21:46

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!