Wednesday, 2021-09-08

kata-irc-bot<shuo.chen> Hi, does anyone know whether we can enable multi-queue for the kata container NIC? From what I saw in a kata container: ```root@nginx-deployment-794d9ffbb7-pgcc7:/mnt# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:       0
TX:       0
Other:    0
Combined: 1
Current hardware settings:
RX:       0
TX:       0
Other:    0
Combined: 1``` we are always using one queue18:17
kata-irc-bot<shuo.chen> looks like changing the default vcpu count changes the queue count18:54
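[For reference, the virtio-net channel count discussed above can be inspected and, up to the pre-set maximum, raised from inside the guest with ethtool. A sketch only; `eth0` and the queue count are examples, and for virtio-net the maximum is fixed at VM boot by the number of queues the device was created with:

```shell
# Show current and maximum channel (queue) counts for eth0
ethtool -l eth0

# Raise the number of combined queues -- only works up to the
# "Pre-set maximums" value reported above, which for virtio-net
# is determined by the hypervisor when the NIC is created
ethtool -L eth0 combined 8
```

If the pre-set maximum is 1, as in the paste above, `ethtool -L` cannot help; the queue count has to be raised on the hypervisor side.]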
kata-irc-bot<shuo.chen> we did a small benchmark that creates around 200 connections to S3 and downloads a 50MB file on every connection. What we found is that the kata container takes much longer than the non-kata one. Any thoughts on how to tune it?19:11
kata-irc-bot<fidencio> @rlk, cc'ing you here as you did a reasonable amount of benchmarks for the project.19:17
kata-irc-bot<fidencio> I think @rlk faced the very same issue19:18
kata-irc-bot<fidencio> And I also think we may need some work to get it solved.  Are you using kata-containers with containerd?19:18
kata-irc-bot<fidencio> And some work on other components apart from kata-containers (as in, in the cri runtimes)19:19
kata-irc-bot<shuo.chen> yes I am using containerd19:20
kata-irc-bot<rlk> Yep, same issue.  In my case I saw it in terms of the number of vhost kernel threads on the host, and it looked like that vhost thread was using close to a full CPU.  The guest CPU was not pinned.19:20
kata-irc-bot<shuo.chen> @rlk would you mind taking a look at the micro benchmark we are doing and seeing whether you have any thoughts for improvements?19:21
kata-irc-bot<rlk> I can look at it, but I don't have a huge amount of time for it.19:21
kata-irc-bot<rlk> Without seeing the benchmark and the pod and VM configuration it's hard to say.  Are these all from one pod (within 1 VM)?  Is this the same issue as the other thread here?19:22
kata-irc-bot<shuo.chen> yes it is from a single pod. We are using the default configuration. My initial thought was that it is the same issue as the other threads, but after changing default_vcpus to 8, I saw 8 queues in the kata container but the performance didn’t get better19:23
kata-irc-bot<rlk> Look on the host; how many `vhost` threads are present and how much CPU are they getting while running?19:24
kata-irc-bot<shuo.chen> ```root   44371 1.0 0.0   0   0 ?    S  18:39  0:27 [vhost-44364]
root   44372 0.8 0.0   0   0 ?    S  18:39  0:21 [vhost-44364]
root   44373 0.6 0.0   0   0 ?    S  18:39  0:18 [vhost-44364]
root   44374 0.6 0.0   0   0 ?    S  18:39  0:16 [vhost-44364]
root   44375 0.5 0.0   0   0 ?    S  18:39  0:13 [vhost-44364]
root   44376 0.4 0.0   0   0 … [vhost-44364]
root   44378 0.4 0.0   0   0 ?    S  18:39  0:12 [vhost-44364]
root   44389 0.0 0.0   0   0 ?    S  18:39  0:00 [vhost-44364]``` 9 in total19:24
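[One way to watch how busy those vhost kernel threads are while the benchmark runs. A sketch only; the thread name `vhost-44364` is taken from the ps paste above, where 44364 is the hypervisor PID on this particular host:

```shell
# List the vhost kernel threads for this VM, busiest first
ps -o pid,pcpu,comm -C vhost-44364 --sort=-pcpu

# Or watch them live; press 'H' inside top to toggle per-thread view
top -p "$(pgrep -d, vhost-44364)"
```

A single vhost thread pinned near 100% CPU while the guest NIC has only one queue is the signature of the bottleneck being discussed here.]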
kata-irc-bot<rlk> Yeah, actually that's probably wrong, it's virtiofsd you need to look for on the host.19:25
kata-irc-bot<shuo.chen> I saw two virtiofsd process19:26
kata-irc-bot<shuo.chen> but that is for file system right? not network19:26
kata-irc-bot<rlk> Yep.  I just don't know whether this is a filesystem or networking operation as seen from inside the Kata pod.19:26
kata-irc-bot<shuo.chen> oh, in our benchmark, we just downloading 50MB files into memory, so there should be no disk IO19:27
kata-irc-bot<rlk> Downloading how?19:27
kata-irc-bot<shuo.chen> we are using python aws s3 client to downloading a 50MB object19:28
kata-irc-bot<rlk> OK.  Was your benchmark running when you took that `ps` output?19:28
kata-irc-bot<shuo.chen> yes19:28
kata-irc-bot<shuo.chen> sample python script is here: ```import io
import os
import sys
import time

import boto3
from _thread import start_new_thread

# Define a function for the thread
def print_time(s3_client):
    bytes_buffer = io.BytesIO()
    time1 = round(time.time() * 1000)
    s3_client.download_fileobj(Bucket="shuochen-test", Key="workload", Fileobj=bytes_buffer)
    time2 = round(time.time() * 1000)
    print(time2 - time1)

boto3.setup_default_session(profile_name='aws-dev_databricks-power-user')
s3_client = boto3.client('s3')
for i in range(1, int(sys.argv[1])):
    start_new_thread(print_time, (s3_client,))

os.system("rm -rf output*")

while 1:
    pass```19:29
kata-irc-bot<rlk> It doesn't look like `vhost` is very busy here.  Can you try a much bigger file, so you can watch `vhost` for a longer period of time?19:32
kata-irc-bot<rlk> BTW, it looks like you are actually generating output (why the `rm -rf output*`?). But unless you have a really slow storage back end, I wouldn't be inclined to think that's a problem. That said, one of the first rules of performance analysis is to not be too quick to rule things out.20:18
kata-irc-bot<shuo.chen> oh, the `rm -rf output*` is code that I haven’t cleaned up yet. let me monitor it again20:33
kata-irc-bot<eric.ernst> Yeah, see https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/network.go#L44020:58
kata-irc-bot<eric.ernst> imo this isn’t ideal.20:58
kata-irc-bot<eric.ernst> IIRC the challenge faced in this instance is that the number of resulting vCPUs isn’t known until long after we have the network device set up (i.e., this is set up before the containers are added to the sandbox)21:02
kata-irc-bot<fidencio> CRI-O passes that info down to kata-containers (see: https://github.com/kata-containers/kata-containers/issues/2071#issuecomment-865081526) ... that's something we could explore: reaching an agreement with the containerd folks to expand their sandbox API proposal to include it21:07
kata-irc-bot<rlk> We have the same issue with virtiofsd (I'm crunching on that right now, in fact).21:24
kata-irc-bot<eric.ernst> Looking at that comment now, Fabiano21:25
kata-irc-bot<eric.ernst> need jq and some sed…21:25
kata-irc-bot<eric.ernst> can you clarify what that is? You mean to say crio passes annotations to the underlying runtime?21:26
kata-irc-bot<eric.ernst> For virtiofsd — are you saying number of queues to use, @rlk?21:27
kata-irc-bot<eric.ernst> I thought there wasn’t much perf benefit in changing the # of queues for virtiofsd :shrug:21:27
kata-irc-bot<fidencio> > can you clarify what that is? You mean to say crio passes annotations to the underlying runtime? Correct me if I'm mistaken, but the problem happens because the underlying runtime doesn't have the info about the requested CPUs, and thus doesn't know what to set until after the network device is set up.  However, CRI-O does pass that information down (the whole yaml file used to create the pod, actually) to the underlying21:31
kata-irc-botruntime as an annotation, and I think we could explore that, mainly if we get an agreement to do the same on containerd side, and that could help with both the issue reported by Robert and the issue now reported by Shuo Chen.21:31
kata-irc-bot<rlk> @eric.ernst yes21:31
kata-irc-bot<fidencio> Or am I missing something here? (which is very very plausible ;-))21:32
kata-irc-bot<eric.ernst> yowza. i didn’t realize crio passed the whole pod spec.  how’d that come about?21:33
kata-irc-bot<rlk> There can be a lot of benefit; the default of 1 queue isn't enough for heavy multi-stream traffic.  For example, with a 32 instance fio job (and a large enough VM -- 16 cores seems to do the job), with the default thread pool size of 1 I can't get better than 1.1 GB/sec on an 8x gen3 NVMe; with 4 threads, I get about 3.2, and 16 threads about 5.2.  On bare metal I'm getting 5.9 GB/sec.21:35
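[The thread-pool numbers above refer to virtiofsd's request-handling pool. A sketch of how that knob is exposed; the paths are placeholders, `--thread-pool-size` is a real virtiofsd option, and `virtio_fs_extra_args` is the Kata configuration.toml field that forwards extra arguments to virtiofsd, though defaults and option spellings vary across virtiofsd versions:

```shell
# Standalone virtiofsd invocation with a larger request thread pool
# (socket path and shared directory are placeholders)
virtiofsd --socket-path=/run/vfsd.sock --shared-dir=/path/to/share \
          --cache=auto --thread-pool-size=16

# Under Kata, the equivalent is set in configuration.toml:
#   virtio_fs_extra_args = ["--thread-pool-size=16"]
```

As with the NIC queue count, the right value depends on how many vCPUs the guest ends up with, which is exactly the information the runtime doesn't have at device-setup time.]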
kata-irc-bot<fidencio> ping me tomorrow and we can have a nice discussion about this, if you think it's handy. I remember doing some tests that I need to find my notes about, but I refuse to open the "office" door at this time. :slightly_smiling_face: (sorry)21:36
kata-irc-bot<eric.ernst> I’m not sure consuming this is a good idea; i was surprised to see it (the pod spec that is).21:36
kata-irc-bot<eric.ernst> no worries — get offline dude! I’ll chat w/ you tmo21:37
kata-irc-bot<fidencio> I don't think we should consume it as it is.  But I do think it can be a good starting point for a bigger discussion (like the one suggested by c3d in the last AC meeting) on how to improve the sandbox containers API proposed by the containerd folks21:37
kata-irc-bot<eric.ernst> I see, got it.  I’ll look at pushing a PR to update CRI to pass this information down explicitly.22:28
kata-irc-bot<eric.ernst> Probably won’t trickle out until 1.23, but let’s see….22:28

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!