Friday, 2022-09-23

kata-irc-bot<feng.wang> @norbj.d_kata We frequently run more than five Kata VMs on the same node and never see the issue you describe. We run on Kata 2.4. What hypervisor do you use? 03:02
kata-irc-bot<mail173> thanks @liubin0329, we'll try to reproduce with the `3.0.0-rc0` tag. @feng.wang we also frequently run a lot more than five kata VMs on the same node, but don't launch them all simultaneously. if we launch them progressively, e.g. one after another, it's fine, but we see this issue when we launch them in parallel. if i were to guess, i'd say there's a deadlock or race condition somewhere 07:37
kata-irc-bot<norbj.d_kata> @feng.wang actually, running more than 5 kata VMs (for example 30) works as long as we don't start the VMs *at the same time* :S 08:16
kata-irc-bot<mail173> amazing, thank you @yinnanyao, let us know if we can help in any way 08:55
kata-irc-bot<yinnanyao> @mail173 @norbj.d_kata @feng.wang I use a random string to set the cgroup path when the cgroup path is empty. Please help me review this PR: https://github.com/kata-containers/kata-containers/pull/5241 Thanks. 11:57
kata-irc-bot<mail173> awesome, thanks @yinnanyao, @norbj.d_kata can you test and confirm whether this fixes it for us? 12:07
kata-irc-bot<yinnanyao> I have already tested and confirmed that containers can be created in parallel after using the random path. These modifications fix the `failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found` problem. But when more containers are created in parallel, a `failed to create shim task: Failed to Check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed out connecting to vsock 1541455645:1024: unknown` problem appears. I guess it is caused by insufficient memory: after I increased the memory, I could run with a higher concurrency. I haven't been able to find out whether there are other causes. If you have better suggestions, please let me know. 12:51
kata-irc-bot<norbj.d_kata> @yinnanyao thank you! I can confirm the `failed to create shim task: open /sys/fs/cgroup/systemd/vc/tasks: no such file or directory: not found` error has disappeared with your modifications (which I applied to the 2.5.1 branch). But you're right, there is still the timeout connecting to vsock. It doesn't look like a memory problem, though, because my instance has enough memory (when launching 30 containers in parallel, I see a peak of only 4 GB, and my instance has 64 GB of RAM). However, the CPU is saturated (100% usage) :thinking_face: Have you experienced that CPU overload too? 14:43
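Editor's sketch of the workaround described above, where a random string is used for the cgroup path when none is configured. It is not the code from the PR: the function name `resolve_cgroup_path`, the `vc` default parent (taken only from the path in the error message), and the suffix format are assumptions made for illustration. The idea it shows is that sandboxes launched in parallel no longer share one default directory.

```python
import secrets

# Hypothetical default parent; the log only shows a shared ".../vc/" directory
# in the error message, not how the runtime actually builds the path.
DEFAULT_PARENT = "vc"


def resolve_cgroup_path(configured: str, parent: str = DEFAULT_PARENT) -> str:
    """Return the configured cgroup path, or a unique one when it is empty."""
    if configured:
        # An explicitly configured path is left untouched.
        return configured
    # Assumption: a random per-sandbox suffix keeps parallel launches from
    # creating and removing the same directory at the same time.
    return f"{parent}/{secrets.token_hex(8)}"


if __name__ == "__main__":
    print(resolve_cgroup_path(""))                  # unique path on every call
    print(resolve_cgroup_path("/kubepods/pod123"))  # configured path is kept
```

With a fixed default path, one sandbox tearing down its cgroup could plausibly remove a directory another sandbox is about to write to, which would match the `no such file or directory` error seen only on parallel launches. That reading is an inference from the log, not something the participants confirmed.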
