kata-irc-bot | <fupan> What’s the raw volume in /mnt? Can you paste the test cases here? The kata-agent only uses that check to make sure it mounts a blk storage from a device file in the guest. | 02:13 |
---|---|---|
kata-irc-bot | <dgibson> @fupan this device bind mount thing is still confusing me | 04:55 |
kata-irc-bot | <dgibson> @fupan afaict, every place that sets the "blk" storage driver on the runtime side also sets the storage source to a PCI path, so it won't hit that /dev/ path | 04:55 |
kata-irc-bot | <fupan> @dgibson Do you mean it wouldn’t reach https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L378 here, instead it would go to here: https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L386 ? | 05:20 |
kata-irc-bot | <dgibson> well assuming this case goes to that function at all, which I'm not 100% sure of yet | 05:22 |
kata-irc-bot | <fupan> but there’s no difference, and it would reach https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L392 at last, which would also do the bind mount, and the mount src would be the real /dev/<vdx> path. | 05:23 |
kata-irc-bot | <fupan> Here https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L389 the storage source would be set to the /dev/<vdx> path. | 05:24 |
kata-irc-bot | <dgibson> right, I guess si | 05:26 |
kata-irc-bot | <dgibson> so | 05:26 |
kata-irc-bot | <dgibson> I guess that mount wouldn't work anyway if it was hitting that /dev path - the host path of the /dev node wouldn't match the guest path anyway | 05:26 |
kata-irc-bot | <dgibson> I think it's a bug that both this bind mount of a device node, and actually mounting a device with a filesystem go through the same "driver" | 05:27 |
kata-irc-bot | <dgibson> they're very different operations | 05:27 |
kata-irc-bot | <fupan> @dgibson Actually, in the case of the /dev path, it isn’t the host path; it’s the guest path, which the runtime works out from the device's virtual path in the guest. | 05:35 |
kata-irc-bot | <dgibson> hmm... that's not really clear. The source field has different precise semantics depending on a bunch of different factors | 05:35 |
kata-irc-bot | <dgibson> in some cases, yes, the runtime predicts the guest path (often in ways I suspect aren't really reliable), but I'm not sure that's true in every case | 05:36 |
kata-irc-bot | <dgibson> for PCI devices the host has no way of predicting the guest /dev path | 05:36 |
kata-irc-bot | <dgibson> well, usually | 05:36 |
kata-irc-bot | <fupan> But for a blk volume, sometimes the runtime can't know the real filesystem, and just inherits the original “bind” mount to deal with the storage mount. | 05:38 |
kata-irc-bot | <dgibson> in the case you're describing there is no "real filesystem" | 05:40 |
kata-irc-bot | <dgibson> or at least the filesystem is not relevant to the operation | 05:41 |
kata-irc-bot | <dgibson> you're binding a device node into the container | 05:41 |
kata-irc-bot | <dgibson> on runc that's a straightforward bind mount, on kata we need to translate host to guest device node, then do a guest-local bind mount | 05:41 |
kata-irc-bot | <dgibson> that's a totally different operation from asking that the filesystem contained on a host device node be mounted into the guest | 05:41 |
kata-irc-bot | <dgibson> instead it's like mounting a single regular file into the container | 05:43 |
kata-irc-bot | <dgibson> also a simple bind mount on runc, but for kata we'd need to use virtiofs (or 9p) instead | 05:43 |
kata-irc-bot | <dgibson> I'm not sure where those two cases (dev vs. regular file) are separated in the kata runtime | 05:44 |
kata-irc-bot | <fupan> It doesn’t mount the host device node into the guest; it actually hotplugs the host device into the guest and mounts the guest device node | 05:44 |
kata-irc-bot | <dgibson> "it" is unclear | 05:45 |
kata-irc-bot | <dgibson> we're trying to match runc semantics, right | 05:45 |
kata-irc-bot | <fupan> It means kata runtime. | 05:46 |
kata-irc-bot | <dgibson> I mean there are several different related, but distinct cases here | 05:46 |
kata-irc-bot | <dgibson> for runc, -v /dev/sdX:/some/path is basically identical to -v /tmp/somefile:/some/path | 05:47 |
kata-irc-bot | <dgibson> for Kata they're different | 05:47 |
kata-irc-bot | <fupan> Yes, and only when the runtime can't detect the block device’s filesystem does kata try to match runc’s semantics and do the device node bind mount. | 05:47 |
kata-irc-bot | <dgibson> no, that's not correct | 05:47 |
kata-irc-bot | <dgibson> even when it knows the filesystem, with runc -v /dev/sdX:/some/path won't mount the filesystem in the container | 05:48 |
kata-irc-bot | <dgibson> it will mount the device node in the container | 05:48 |
kata-irc-bot | <dgibson> the container can use the device raw, or it can mount it itself | 05:48 |
kata-irc-bot | <dgibson> in fact, runc really never needs to mount filesystems, other than bind mounts | 05:48 |
kata-irc-bot | <dgibson> Kata does, as an optimization | 05:48 |
kata-irc-bot | <dgibson> the normal case for "blk" is, AFAICT, where we mount a block device on the host, then bind that mounted filesystem into the container | 05:49 |
kata-irc-bot | <dgibson> with runc, it just bind mounts the host mount | 05:49 |
kata-irc-bot | <dgibson> for Kata we could virtiofs the host mount, but that's slow | 05:49 |
kata-irc-bot | <dgibson> so AIUI, we have an optimization to instead map the underlying device into the guest and mount it within the guest | 05:50 |
kata-irc-bot | <dgibson> Hrm... wait.. now I'm less sure about this | 05:50 |
kata-irc-bot | <dgibson> They *should* be totally different operations, but OCI is kind of broken | 05:51 |
kata-irc-bot | <dgibson> the semantics of the container shouldn't depend on whether the runtime recognizes the filesystem or not | 05:51 |
kata-irc-bot | Action: dgibson rereads OCI | 05:53 |
kata-irc-bot | <dgibson> oh for goodness sake | 05:53 |
kata-irc-bot | <dgibson> no, you're at least partially right, and OCI is totally broken | 05:53 |
kata-irc-bot | <dgibson> "source" for a mount can be a device name or a host path | 05:53 |
kata-irc-bot | <fupan> that’s kata’s optimization: once it knows the filesystem, it does the real filesystem mount into the container. yes, it breaks the OCI semantics, but it is what the user would expect. | 05:53 |
kata-irc-bot | <dgibson> ugh... | 05:54 |
kata-irc-bot | <dgibson> so the distinguishing thing is whether it's a bind mount (from the host side description) or not | 05:54 |
kata-irc-bot | <dgibson> it's still nothing to do with whether the runtime can recognize the filesystem or not | 05:54 |
kata-irc-bot | <dgibson> it's whether the container spec says it's a real fs or a bind mount | 05:55 |
kata-irc-bot | <dgibson> the thing is that "bind" mounts as specified in the container spec make no sense in the Kata context | 05:56 |
kata-irc-bot | <greg.bock> I’ve been carrying forward these: https://github.com/kata-containers/agent/pull/407 https://github.com/kata-containers/runtime/pull/882 | 05:56 |
kata-irc-bot | <dgibson> for regular files "Bind" mounts for the container mean virtiofs mounts within the guest | 05:57 |
kata-irc-bot | <dgibson> but for device nodes we have to do extra magic | 05:57 |
kata-irc-bot | <dgibson> my point is that the special case of exposing a host device node into the guest should be a special case of the virtio-fs mount path, not of the block mount path | 05:58 |
kata-irc-bot | <dgibson> it has nothing to do with block *mount*ing | 05:58 |
kata-irc-bot | <dgibson> we're passing through the host side fstype to the guest, and that can never be correct for bind mounts | 05:59 |
kata-irc-bot | <fupan> But there’s no point exposing a host device node into the guest by virtio-fs, since the container couldn’t access the host device node. | 06:00 |
kata-irc-bot | <dgibson> exactly, which is why we can't implement it with virtiofs | 06:00 |
kata-irc-bot | <dgibson> but my point is the difference in container semantics between "filesystem" volume and "bind" volume is a more fundamental distinction | 06:00 |
kata-irc-bot | <dgibson> the decision is like this | 06:02 |
kata-irc-bot | <dgibson> if (volume-is-bind-in-spec) { if (bind-target is regular files & dirs) { | 06:02 |
kata-irc-bot | <dgibson> but at the moment paths A & C are ending up in the same place and we have to re-special case them with logic that doesn't really make sense | 06:06 |
kata-irc-bot | <fupan> @dgibson Did you mean the B and C are blended into the same place? | 06:50 |
kata-irc-bot | <eric.adams> According to https://vsupalov.com/docker-latest-tag/ the latest is the default tag when no tag is specified. Browsing around some other popular Docker hub repos they used :latest to refer to the last release which usually is some alpha. I think using latest as a tag to match the last image posted makes sense. The "stable" version will always be a little bit older which makes sense to me. | 21:55 |
kata-irc-bot | <fidencio> Cool, it matches with what was my understanding! | 21:57 |
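
The mount discussion above distinguishes three situations by what the container spec says, not by whether the runtime recognizes a filesystem. The sketch below is one possible reading of that distinction, written as a small Rust decision function. Every name in it (`SourceKind`, `SpecMount`, `GuestAction`, `plan`) is hypothetical and does not come from the Kata runtime or agent. It does not reconstruct the truncated pseudocode in the log, and it makes no claim about which branches correspond to the paths A, B and C mentioned there. It also assumes the non-bind case has a block device as its source.

```rust
/// What the bind source on the host turns out to be (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    RegularFileOrDir,
    DeviceNode,
}

/// How the container spec describes the volume (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecMount {
    /// The spec asks for a bind mount of some host path.
    Bind(SourceKind),
    /// The spec asks for a real filesystem mount; assumed here to be
    /// backed by a host block device.
    Filesystem,
}

/// What would have to happen on the guest side (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestAction {
    /// Share the host file or directory into the guest (virtiofs or 9p).
    ShareViaVirtiofs,
    /// Hotplug the device, translate host node to guest node,
    /// then do a guest-local bind mount of the device node.
    HotplugAndBindDeviceNode,
    /// Hotplug the device and mount its filesystem inside the guest.
    HotplugAndMountFilesystem,
}

/// Decide on the guest-side action from the spec alone.
pub fn plan(mount: SpecMount) -> GuestAction {
    match mount {
        SpecMount::Bind(SourceKind::RegularFileOrDir) => GuestAction::ShareViaVirtiofs,
        SpecMount::Bind(SourceKind::DeviceNode) => GuestAction::HotplugAndBindDeviceNode,
        SpecMount::Filesystem => GuestAction::HotplugAndMountFilesystem,
    }
}
```

In this reading, `plan(SpecMount::Bind(SourceKind::DeviceNode))` never yields a filesystem mount, which mirrors the point made in the log that binding a device node has nothing to do with block mounting. The optimization also mentioned in the log, where a bind of a host-mounted filesystem is replaced by mounting the underlying device in the guest, is deliberately left out of this sketch.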