*** fuentess has quit IRC | 00:15 | |
*** th0din has quit IRC | 01:10 | |
*** th0din has joined #kata-dev | 01:15 | |
kata-irc-bot | <leevn2011> Hi there, I am wondering why we only allow running `one pod per VM`. What is the main motivation for running the container inside a VM (OCI compatibility?) Since we already have the VM and can only run 1 container inside it, why don't we just run the applications inside the VM directly? Why bother creating a container inside? Any reference would be greatly appreciated! | 07:05 |
kata-irc-bot | <caoruidong> Kata is "the speed of containers, the security of VMs". If you run in the VM directly, you're back to the age before containers | 07:19 |
kata-irc-bot | <leevn2011> When you say `speed`, do you mean the speed of deployment and packaging libraries? It seems to me that running a container inside the VM might impose some overhead (perhaps not much); in principle it should be slower than running applications inside the `kata lightweight VM` directly. | 07:25 |
kata-irc-bot | <leevn2011> Another interesting thing is that with `one pod per one VM` the memory footprint cost is actually very high. | 07:30 |
*** sameo has joined #kata-dev | 07:31 | |
kata-irc-bot | <fidencio> the memory footprint is around 300MB per pod, when using QEMU as VMM. | 07:34 |
kata-irc-bot | <leevn2011> Indeed, compared to a traditional QEMU VM, the memory footprint cost is high. @caoruidong Maybe frank can elaborate more? | 07:40 |
*** dklyle has quit IRC | 07:41 | |
*** sgarzare has joined #kata-dev | 08:07 | |
*** jodh has joined #kata-dev | 08:13 | |
*** snir has quit IRC | 08:21 | |
*** snir has joined #kata-dev | 08:22 | |
kata-irc-bot | <christophe> Since a VM is a pod, and you can run multiple containers per pod (see https://www.howtoforge.com/multi-container-pods-in-kubernetes/ on how to do it), why do you think we can't run multiple containers in a VM? | 08:34 |
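A minimal sketch of what christophe describes: a two-container pod scheduled onto the Kata runtime, so both containers end up inside the same VM (the pod sandbox). The RuntimeClass name `kata` is an assumption and may differ in your deployment.

```bash
# Hypothetical two-container pod; both containers share the same Kata VM (the pod sandbox).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kata-multi-container
spec:
  runtimeClassName: kata   # assumption: your RuntimeClass is called "kata"
  containers:
  - name: web
    image: nginx
  - name: sidecar
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF

# Both containers should show up in the single pod:
kubectl get pod kata-multi-container -o jsonpath='{.status.containerStatuses[*].name}'
```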
kata-irc-bot | <christophe> If you tried it and it failed, I believe this is a bug and you should report it as such. | 08:35 |
*** fgiudici has joined #kata-dev | 08:36 | |
kata-irc-bot | <leevn2011> Oh thanks, that makes sense. According to the official documentation, in the case of docker, `kata-runtime` creates a single container per pod. https://github.com/kata-containers/documentation/blob/master/design/architecture.md | 08:57 |
*** davidgiluk has joined #kata-dev | 09:03 | |
kata-irc-bot | <caoruidong> Kata uses a VM for security; you have to pay something for that. Pod is a k8s concept; docker doesn't know about it. So with docker it's one container per pod | 09:54 |
kata-irc-bot | <christophe> @leevn2011 This is _in the case of docker_ because docker has no real idea of a pod. | 10:40 |
kata-irc-bot | <leevn2011> @christophe @caoruidong Thank you both for the clarification. It makes sense to me now! | 10:53 |
*** yyyeer has joined #kata-dev | 11:10 | |
yyyeer | ping eric | 11:11 |
yyyeer | https://github.com/kata-containers/kata-containers/issues/1171 | 11:12 |
yyyeer | we might decide which way to go | 11:12 |
kata-irc-bot | <christophe> FYI, silly experiment of the day: it looks like the lowest `default_memory` I can use and still boot `alpine` correctly is 184M. With that, `cat /proc/meminfo` gives me: ```MemTotal: 147616 kB MemFree: 34104 kB MemAvailable: 42972 kB``` I wonder why there is always a delta between the `default_memory` value and what we see as `MemTotal` in the guest. | 11:35 |
kata-irc-bot | <christophe> That same setup also boots `fedora` with: ```MemTotal: 147616 kB MemFree: 28292 kB MemAvailable: 41628 kB``` | 11:37 |
kata-irc-bot | <christophe> However, with `fedora`, you can't use `dnf install` with such a low memory footprint. It gets `Killed`. By contrast, `apk add emacs` in `alpine` works OK. | 11:38 |
*** yyyeer has quit IRC | 11:38 | |
kata-irc-bot | <christophe> ```[ 119.140604] Out of memory: Killed process 215 (dnf) total-vm:299452kB, anon-rss:28180kB, file-rss:8kB, shmem-rss:0kB, UID:0 pgtables:208kB oom_score_adj:0 [ 119.142402] oom_reaper: reaped process 215 (dnf), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB``` Will try with straight rpm. | 11:38 |
kata-irc-bot | <christophe> Straight `rpm` seems to work, although it means you need to find the URLs and dependencies manually. | 11:43 |
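A rough way to reproduce this experiment with containerd's `ctr`; a sketch only. The configuration path and the `io.containerd.kata.v2` runtime name are assumptions (the config may instead live under /usr/share/defaults/kata-containers/).

```bash
# Lower the VM's default memory to 184 MiB and inspect what the guest actually sees.
CFG=/etc/kata-containers/configuration.toml          # assumption: local override path
sudo sed -i 's/^default_memory = .*/default_memory = 184/' "$CFG"

sudo ctr image pull docker.io/library/alpine:latest
sudo ctr run --rm --runtime io.containerd.kata.v2 \
     docker.io/library/alpine:latest mem-probe \
     sh -c 'head -3 /proc/meminfo'
```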
kata-irc-bot | <fidencio> @christophe, out of curiosity, what exactly are you trying to measure? The minimal value for `default_memory` that would work for the most basic operations in the most common container images? | 11:49 |
kata-irc-bot | <christophe> Yes. | 13:01 |
kata-irc-bot | <fidencio> @christophe, and are you testing with our own kernel or with upstream kernel? | 13:08 |
kata-irc-bot | <christophe> On Fedora with the host kernel | 13:15 |
kata-irc-bot | <fidencio> *If* you have the time and the interest, it would be nice to know those numbers *also* with: • upstream kernel; • 2.x agent. Both things should bring that number down. | 13:20 |
kata-irc-bot | <wmoschet> Hi folks! I noticed that `make cri-containerd` has failed on some jobs. Is it a known issue? Is someone already working on a fix? | 13:44 |
kata-irc-bot | <fidencio> I guess it's part of the issues faced since last week, Wainer. | 13:47 |
kata-irc-bot | <wmoschet> @fidencio it might be. I noticed the URL to download the cri-containerd binaries is broken, so the script builds it from source. I'm not sure it is building the correct version, so this might be one of the problems | 13:50 |
kata-irc-bot | <fidencio> Also, I've noticed kube* not being installed, which would also cause a different failure | 13:51 |
kata-irc-bot | <wmoschet> @fidencio hmm... so, let's try to fix it? | 13:52 |
kata-irc-bot | <fidencio> Do you have cycles for that? | 13:52 |
kata-irc-bot | <fidencio> If so, please, go ahead. | 13:52 |
kata-irc-bot | <wmoschet> @fidencio yes, I am blocked by some PRs not getting merged...so... | 13:53 |
*** fuentess has joined #kata-dev | 13:56 | |
*** crobinso has joined #kata-dev | 14:05 | |
kata-irc-bot | <eric.ernst> Just to clarify one more thing @leevn2011 - we're looking at defense in depth here. | 15:40 |
kata-irc-bot | <eric.ernst> And in the pod case, we still want to provide the same isolation between containers within a single pod. | 15:41 |
kata-irc-bot | <eric.ernst> 300 MB seems high @fidencio? | 15:41 |
kata-irc-bot | <fidencio> As @christophe is investigating, it does depend on the guest kernel being used, but that's what I was able to benchmark with the distro kernel. | 15:44 |
kata-irc-bot | <christophe> @eric.ernst At the moment, roughly: 150-200M required for the VM payload, 30-50M of overhead for qemu, another 20M for shim/runtime and for virtiofsd. Some variations depending on kernel and qemu build, but these are the rough estimates. | 15:47 |
kata-irc-bot | <fidencio> @christophe, was your evaluation done with 1.x? (mine was). | 15:49 |
kata-irc-bot | <eric.ernst> this is with the releases packages (kernel/guest/qemu)? | 15:50 |
kata-irc-bot | <eric.ernst> (sorry, apparently i can only read up two messages at once -- I see -- distro kernel) | 15:50 |
kata-irc-bot | <wmoschet> https://github.com/kata-containers/tests/pull/3126 | 15:51 |
kata-irc-bot | <wmoschet> running some jobs...let's see | 15:51 |
kata-irc-bot | <eric.ernst> that number hurts, but i guess it depends on expectations. I would hope that we're on the order of 100 MB best case, and was concerned as we slipped to ~160. | 15:51 |
kata-irc-bot | <eric.ernst> You may run into issues if you try running a pod with container memory requests, as well. ie, you'll be limited on how much you can hotplug. | 15:53 |
kata-irc-bot | <eric.ernst> ...perhaps this should be for a different thread though... | 15:54 |
kata-irc-bot | <eric.ernst> overheads :thread: | 15:55 |
kata-irc-bot | <eric.ernst> I was looking at this a few weeks ago, as I was uncomfortable with the numbers. | 15:55 |
kata-irc-bot | <eric.ernst> virtiofsd was ~4 MB each (2 per pod); shim is ~25 MB; VMM ~147 MB | 15:56 |
kata-irc-bot | <eric.ernst> this was with shim-v2. | 15:56 |
kata-irc-bot | <eric.ernst> and with Kata 2.0 | 15:56 |
kata-irc-bot | <eric.ernst> with 1.x, we weren't building with PIE, so the shim was smaller. When PIE is enabled (it should be!), it is near 25 MB as well | 15:57 |
kata-irc-bot | <fidencio> And VMM being statically QEMU? | 15:57 |
kata-irc-bot | <eric.ernst> (PIE is expensive) | 15:57 |
kata-irc-bot | <eric.ernst> yes. | 15:57 |
kata-irc-bot | <eric.ernst> from dmesg: ```10242 K kernel code 1864 K rodata 501 K rwdata 852 K init Total of kernel stuff: ~13.1 MB``` | 15:57 |
kata-irc-bot | <eric.ernst> And then, looking more at costs in the guest: ```expected total memory: 1179648 (128MiB default + 1GiB for container) Memory total reported in /proc/meminfo: 1152084 Difference (presumable kernel structures??): 26.9 MB. ``` | 15:59 |
kata-irc-bot | <eric.ernst> (note, I wouldn't suggest a memory default of 128 MiB, it was just used during this measurement experiment) | 16:00 |
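For reference, the delta quoted above is just this arithmetic (no new data, only redoing the calculation from the numbers in the message):

```bash
# 128 MiB default_memory + 1 GiB hotplugged for the container, vs. MemTotal in the guest.
awk 'BEGIN {
    expected = (128 + 1024) * 1024      # 1179648 kB
    reported = 1152084                  # kB, from /proc/meminfo in the guest
    printf "unaccounted for: %.1f MiB\n", (expected - reported) / 1024   # ~26.9 MiB
}'
```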
kata-irc-bot | <eric.ernst> if folks have time, i think it'd be pretty useful to standardize on how we measure each item, and to look at improving. | 16:00 |
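One possible way to standardize the host-side part of that measurement; a sketch only, and the process names are assumptions (the QEMU binary name in particular varies between builds).

```bash
#!/usr/bin/env bash
# Sum proportional set size (PSS) per Kata component for the pods on this host.
# Requires /proc/<pid>/smaps_rollup (kernel >= 4.14) and root to read other users' maps.
set -euo pipefail

for name in qemu-system-x86_64 virtiofsd containerd-shim-kata-v2; do
    total_kb=0
    for pid in $(pgrep -f "$name" || true); do
        pss_kb=$(awk '/^Pss:/ {print $2}' "/proc/$pid/smaps_rollup" 2>/dev/null || echo 0)
        total_kb=$((total_kb + pss_kb))
    done
    printf '%-26s %5d MiB\n' "$name" $((total_kb / 1024))
done
```

PSS (rather than RSS) avoids double-counting pages shared between the two virtiofsd processes and the VMM.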
kata-irc-bot | <fidencio> Sure, we're in a meeting right now, which will take 51 more minutes. | 16:09 |
kata-irc-bot | <fidencio> In short, what I measured was far less specific than what you or @christophe are doing. What I ended up doing was programmatically starting a number of pods, checking (via prometheus) how much the memory went up, and redoing it several times. | 16:14 |
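Roughly, that coarse-grained approach looks like the sketch below when done with `free` on the worker node instead of Prometheus. The pod names, image, and the `kata` RuntimeClass are assumptions, and it has to run on the node where the pods land.

```bash
#!/usr/bin/env bash
# Start N Kata pods, then divide the node's memory growth by N.
set -euo pipefail
N=${N:-10}

used_kb() { free -k | awk '/^Mem:/ {print $3}'; }

before=$(used_kb)
for i in $(seq 1 "$N"); do
    kubectl run "kata-probe-$i" --image=busybox --restart=Never \
        --overrides='{"apiVersion":"v1","spec":{"runtimeClassName":"kata"}}' -- sleep 3600
done
for i in $(seq 1 "$N"); do
    kubectl wait --for=condition=Ready "pod/kata-probe-$i" --timeout=180s
done
sleep 30   # let the guests and agents settle
after=$(used_kb)

echo "approx. overhead per pod: $(( (after - before) / N / 1024 )) MiB"
```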
kata-irc-bot | <eric.ernst> Yeah, which makes sense for first pass. I just want to understand the exact breakdown, so we can begin to investigate improvements, and understand where we are ... sub .. optimal. | 16:21 |
*** dklyle has joined #kata-dev | 16:52 | |
*** dklyle has quit IRC | 17:30 | |
*** dklyle has joined #kata-dev | 17:30 | |
*** sgarzare has quit IRC | 17:59 | |
*** jodh has quit IRC | 18:00 | |
*** fgiudici has quit IRC | 18:28 | |
kata-irc-bot | <christophe> Also, note that we have a non-technical constraint: in our case, we need to run as well as possible with the host kernel. | 18:55 |
kata-irc-bot | <fidencio> @jose.carlos.venegas.m, @gabriela.cervantes.te, @wmoschet, Seems that sonobuoy timeout failures have been increasing a lot. Are those timeouts counted from the time the k8s cluster is up, or do they take the whole job time into consideration? | 19:25 |
kata-irc-bot | <wmoschet> @fidencio I glanced at this problem...IIUC at some point the test script runs `sonobuoy run (...)` and the timeout happens on that operation. So I think it is a sonobuoy timeout | 19:27 |
kata-irc-bot | <wmoschet> supposedly sonobuoy by default should wait hours before timing out...but it seems to wait only a few minutes | 19:27 |
kata-irc-bot | <fidencio> Ack, I see it here. Let me increase the time and see if it helps. | 19:31 |
kata-irc-bot | <fidencio> @wmoschet, as you have to rerun the tests anyway in your last PR about the skipping test, mind also adding the following patch before rerunning? ```fidencio@machado ~/go/src/github.com/kata-containers/tests $ git diff
diff --git a/integration/kubernetes/e2e_conformance/run.sh b/integration/kubernetes/e2e_conformance/run.sh
index 36f2e40d..a8390a2d 100755
--- a/integration/kubernetes/e2e_conformance/run.sh
+++ b/integration/kubernetes/e2e_conformance/run.sh
@@ -29,7 +29,7 @@
 MINIMAL_CONTAINERD_K8S_E2E="${MINIMAL_CONTAINERD_K8S_E2E:-false}"
 KATA_HYPERVISOR="${KATA_HYPERVISOR:-}"
 # Overall Sonobuoy timeout in minutes.
-WAIT_TIME=${WAIT_TIME:-180}
+WAIT_TIME=${WAIT_TIME:-300}
 JOBS_FILE="${SCRIPT_PATH}/e2e_k8s_jobs.yaml"``` | 19:33 |
kata-irc-bot | <wmoschet> @fidencio sure. btw, I saw that code... 180 is supposed to be in hours but I believe it is in seconds | 19:39 |
kata-irc-bot | <fidencio> I also thought that's in seconds | 19:40 |
kata-irc-bot | <fidencio> let me re-check | 19:40 |
kata-irc-bot | <fidencio> Yep, the wait is actually in minutes. | 19:46 |
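If anyone wants to double-check the unit, the flag's own help text should settle it (assuming `sonobuoy` is on the PATH):

```bash
# Sonobuoy documents --wait in minutes, which matches the
# "Overall Sonobuoy timeout in minutes" comment in run.sh.
sonobuoy run --help 2>&1 | grep -A1 -- '--wait'
```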
kata-irc-bot | <fidencio> Well, the problem is not exactly the sonobuoy timeout for finishing the run ... take a look ... ```20:10:30 + echo 'running: sonobuoy run --wait=180 --e2e-focus="ConfigMap should be consumable from pods in volume|ConfigMap should be consumable from pods in volume as non-root|Kubelet when scheduling a busybox command that always fails in a pod should be possible to delete|Kubectl client Kubectl apply should reuse port when apply to an existing SVC"'
20:10:30 running: sonobuoy run --wait=180 --e2e-focus="ConfigMap should be consumable from pods in volume|ConfigMap should be consumable from pods in volume as non-root|Kubelet when scheduling a busybox command that always fails in a pod should be possible to delete|Kubectl client Kubectl apply should reuse port when apply to an existing SVC"
20:10:30 + eval 'sonobuoy run --wait=180 --e2e-focus="ConfigMap should be consumable from pods in volume|ConfigMap should be consumable from pods in volume as non-root|Kubelet when scheduling a busybox command that always fails in a pod should be possible to delete|Kubectl client Kubectl apply should reuse port when apply to an existing SVC"'
20:10:30 ++ sonobuoy run --wait=180 '--e2e-focus=ConfigMap should be consumable from pods in volume|ConfigMap should be consumable from pods in volume as non-root|Kubelet when scheduling a busybox command that always fails in a pod should be possible to delete|Kubectl client Kubectl apply should reuse port when apply to an existing SVC'
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy namespace= resource=namespaces
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
20:10:30 time="2020-12-16T19:10:30Z" level=info msg="created object" name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
20:10:31 time="2020-12-16T19:10:31Z" level=info msg="created object" name=sonobuoy namespace=sonobuoy resource=pods
20:10:32 time="2020-12-16T19:10:31Z" level=info msg="created object" name=sonobuoy-aggregator namespace=sonobuoy resource=services
20:11:51 time="2020-12-16T19:11:42Z" level=error msg="error attempting to run sonobuoy: waiting for run to finish: failed to get status: failed to get namespace sonobuoy: etcdserver: request timed out"``` The problem here is the timeout hit while starting sonobuoy; it seems to time out because of etcdserver. @eric.ernst, any idea why it may be timing out like this? | 19:53 |
kata-irc-bot | <fidencio> I remember facing similar issues when the VM I use for testing happens to lack resources | 19:55 |
kata-irc-bot | <fidencio> @jose.carlos.venegas.m, do we happen to have an easy way to increase the number of vCPUs / RAM in the guest? | 19:55 |
kata-irc-bot | <jose.carlos.venegas.m> @fidencio I am not familiar with how to do it, @salvador.fuentes is the one | 19:56 |
kata-irc-bot | <jose.carlos.venegas.m> I think we have to create a new machine alias in jenkins to point to a new VM machine type | 19:57 |
kata-irc-bot | <fidencio> @jose.carlos.venegas.m, Could we have the -k8s-minimal tests not be blockers for now? | 19:57 |
kata-irc-bot | <fidencio> Those errors started happening when a new version of Ubuntu was deployed, didn't they? I wonder if the resource consumption increased | 19:58 |
kata-irc-bot | <fidencio> A way to debug would be to create one of those VMs, manually run the process, and see how stressed the system is. | 19:58 |
kata-irc-bot | <fidencio> > Those errors started happening when a new version of Ubuntu was deployed, didn't they? I wonder if the resource consumption increased Hmm. Let me take this out. Fedora faces exactly the same issue. | 20:02 |
kata-irc-bot | <jose.carlos.venegas.m> so not distro specific ? | 20:02 |
kata-irc-bot | <jose.carlos.venegas.m> can you share a job URL with me so we can see if we can find something more? | 20:03 |
kata-irc-bot | <fidencio> I'd say it's not distro specific, but seems to happen more with Ubuntu. | 20:03 |
kata-irc-bot | <fidencio> http://jenkins.katacontainers.io/job/kata-containers-2.0-tests-ubuntu-PR-containerd-k8s-minimal/228/ | 20:03 |
kata-irc-bot | <fidencio> http://jenkins.katacontainers.io/job/kata-containers-2.0-tests-fedora-PR-crio-k8s-e2e-minimal/27/ | 20:03 |
kata-irc-bot | <jose.carlos.venegas.m> ```20:11:51 time="2020-12-16T19:11:42Z" level=error msg="error attempting to run sonobuoy: waiting for run to finish: failed to get status: failed to get namespace sonobuoy: etcdserver: request timed out"``` Looks like a very early setup fail | 20:04 |
kata-irc-bot | <jose.carlos.venegas.m> thx | 20:04 |
kata-irc-bot | <jose.carlos.venegas.m> also just seen in 2.0, correct? | 20:04 |
kata-irc-bot | <fidencio> Yes, just seen in 2.0 | 20:05 |
kata-irc-bot | <jose.carlos.venegas.m> ok, just from googling quickly, at least based on https://github.com/kubernetes-sigs/kind/issues/717 | 20:11 |
kata-irc-bot | <jose.carlos.venegas.m> it seems this could happen depending on resources | 20:11 |
kata-irc-bot | <jose.carlos.venegas.m> and this is happening randomly | 20:12 |
kata-irc-bot | <jose.carlos.venegas.m> at that point kata should not be a blocker, so it looks like an env/setup issue | 20:12 |
kata-irc-bot | <fidencio> Exactly, it's too early in the process. | 20:14 |
*** davidgiluk has quit IRC | 20:17 | |
*** sameo has quit IRC | 20:33 | |
kata-irc-bot | <archana.m.shinde> @fidencio @jose.carlos.venegas.m Yes, I have seen those errors myself when running on a smaller machine | 21:43 |
kata-irc-bot | <archana.m.shinde> searching on the web does point to resources running out | 21:44 |
kata-irc-bot | <archana.m.shinde> could be due to a slower disk | 21:44 |
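A hypothetical way to check the slow-disk theory on one of those CI VMs; the etcd certificate paths below assume a kubeadm-style layout and may differ on the CI images.

```bash
# Watch I/O wait and per-device latency while a job is running.
vmstat 5 3            # "wa" column = time spent waiting on I/O
iostat -x 5 3         # high await / %util points at the disk

# etcd exposes disk-latency histograms; slow fsyncs line up with
# "etcdserver: request timed out" errors.
curl -sk https://127.0.0.1:2379/metrics \
     --cert /etc/kubernetes/pki/etcd/server.crt \
     --key  /etc/kubernetes/pki/etcd/server.key |
  grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds_(sum|count)'
```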
kata-irc-bot | <archana.m.shinde> @salvador.fuentes Want to make sure we are not running out of space on those VMs | 21:45 |
kata-irc-bot | <archana.m.shinde> @jose.carlos.venegas.m That issue does mention increasing the inotify limit `sysctl -w fs.inotify.max_user_watches=524288` | 21:48 |
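For completeness, a sketch of applying that bump immediately and persisting it across reboots; the value is the one from the kind issue, not something tuned for this CI.

```bash
sudo sysctl -w fs.inotify.max_user_watches=524288
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system   # re-read all sysctl config files
```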
kata-irc-bot | <jose.carlos.venegas.m> @archana.m.shinde @fidencio hey sorry, I got distracted. I think we need to run in an azure VM to identify the real cause | 21:50 |
kata-irc-bot | <jose.carlos.venegas.m> I can spend some time on it tomorrow in case you are not looking into it now | 21:50 |
*** dklyle has quit IRC | 22:04 | |
*** dklyle has joined #kata-dev | 22:21 | |
*** crobinso has quit IRC | 22:52 | |
*** fuentess has quit IRC | 23:23 | |
kata-irc-bot | <salvador.fuentes> @fidencio, @jose.carlos.venegas.m, @archana.m.shinde, @eric.ernst hey, taking into consideration @archana.m.shinde's comment that the issue could be due to a slow disk, I went through the azure configuration and checked a flag to use an Ephemeral OS disk (according to azure, they provide lower I/O latency) and have restarted some jobs. I see that the 3 jobs I restarted have all passed: | 23:31 |
kata-irc-bot | http://jenkins.katacontainers.io/job/kata-containers-2.0-tests-ubuntu-PR-containerd-k8s-minimal/. It seems that this helped, but I still want to be cautious and see how they behave between today and tomorrow. | 23:31 |
kata-irc-bot | <archana.m.shinde> great @salvador.fuentes. Yeah, let's monitor them over a day | 23:33 |
kata-irc-bot | <fidencio> @salvador.fuentes thanks a lot for taking a look into this! | 23:50 |
kata-irc-bot | <liubin0329> Glad to see some jobs have succeeded | 23:54 |