kata-irc-bot | <archana.m.shinde> @sebastien.boeuf I am debugging the netmon failures for tc now, that I talked about yesterday | 00:06 |
---|---|---|
kata-irc-bot | <archana.m.shinde> question about netmon, before I go dig deeper | 00:06 |
*** igordc has quit IRC | 00:07 | |
kata-irc-bot | <archana.m.shinde> it subscribes to all netlink events? | 00:07 |
kata-irc-bot | <archana.m.shinde> I just started the debug logs for netmon, and for some reason I don't see the debug netlink events in place when a network is disconnected with docker disconnect | 00:08 |
*** auk has joined #kata-dev | 00:20 | |
*** auk has quit IRC | 00:31 | |
*** auk has joined #kata-dev | 01:05 | |
*** EricRen has joined #kata-dev | 01:35 | |
*** irclogbot_0 has quit IRC | 03:01 | |
*** irclogbot_1 has joined #kata-dev | 03:01 | |
*** changcheng has quit IRC | 03:36 | |
*** changcheng has joined #kata-dev | 03:38 | |
*** sameo has joined #kata-dev | 05:08 | |
*** lpetrut has joined #kata-dev | 06:02 | |
*** sgarzare has joined #kata-dev | 06:26 | |
*** jodh has joined #kata-dev | 07:10 | |
*** davidgiluk has joined #kata-dev | 07:59 | |
*** gwhaley has joined #kata-dev | 08:06 | |
gwhaley | hi brtknr: yeah, we know that 'grpc server' error quite well, but sadly, it is a bit of a generic error indicating there was trouble connecting to the agent inside the container. So, it doesn't give us the immediate clue to the solution :-( I don't suppose you got further overnight did you? :-) | 08:18 |
*** sameo_ has joined #kata-dev | 08:43 | |
*** sameo has quit IRC | 08:44 | |
*** tmhoang has joined #kata-dev | 09:28 | |
*** auk has quit IRC | 09:45 | |
brtknr | gwhaley: so the minikube environment on packet seems to work without any issues... I am seeing this problem on another cloud where I was trying to replicate the workshop... | 10:15 |
brtknr | gwhaley: Perhaps I did something wrong in my configuration... | 10:15 |
gwhaley | hi brtknr: as long as you installed the minikube using the command line as detailed in the gist and on the packet motd file, and you checked that you have kvm enabled and nested vm enabled, then I would think it would work.... so... | 10:18 |
brtknr | :q | 10:18 |
gwhaley | if you have done all those (I suspect you have), and you are willing, if you could let us know what sort of setup (or cloud) it is, we could maybe investigate | 10:18 |
brtknr | Do you still need to enable RuntimeClass feature gate? | 10:20 |
brtknr | I didn't think I did in 1.14 | 10:20 |
brtknr | k8s that is | 10:20 |
*** tmhoang has quit IRC | 10:25 | |
gwhaley | brtknr: in 1.14, I don't think you need to enable. but, I've not tried myself | 10:26 |
gwhaley | be interested in what the other cloud is or the physical node - most other clouds are not bare metal, so are already a nested VM - not all support further nesting - so, we may need to check that for sure | 10:27 |
brtknr | The other cloud is Sausage cloud :) it has nested virt enabled | 10:30 |
gwhaley | must be Friday! - heh, never heard of that - off I go to look! | 10:30 |
gwhaley | brtknr: do you have a link? | 10:31 |
* davidgiluk notes the google for that doesn't look promising | 10:33 | |
brtknr | compute.sausage.cloud... you need to ask Nick Jones (yankcrime) to create an account for you | 10:34 |
brtknr | You're presenting at the Manchester cloud native inaugural meetup that he organises, aren't you? | 10:35 |
gwhaley | brtknr: indeed I am, and we chatted some at the OpenInfraDays... I'll go peek, and then maybe ask him :-) | 10:39 |
brtknr | I can add your ssh key to the machine where I am messing around if you wanna have a go? | 10:39 |
gwhaley | heh, that homepage for sausage is nicely anonymous... | 10:39 |
gwhaley | let me have a chat with yankcrime and see what I can find out - thx! | 10:40 |
davidgiluk | brtknr: Take care though, things like migration of L1 doesn't work if nesting is enabled (well, in some cases; in other cases only if it's being used) | 10:41 |
brtknr | kgz: hello there | 10:41 |
kgz | sup | 10:42 |
*** yankcrime has joined #kata-dev | 10:42 | |
* brtknr waves to yankcrime | 10:42 | |
yankcrime | yo brtknr | 10:42 |
brtknr | > things like migration of L1 doesn't work if nesting is enabled | 10:44 |
gwhaley | I'm thinking maybe the hardware is too old for the instruction set used to build the rootfs maybe - the clearlinux rootfs has some not-too-old hardware requirements iirc... | 10:52 |
*** EricRen has quit IRC | 10:52 | |
gwhaley | sooo, brtknr yankcrime - there is a ref in the code that we should have 'at least a Westmere', which is a couple of years more recent than the hw I think you have: https://github.com/kata-containers/runtime/blob/master/cli/kata-check_amd64.go#L64 | 10:55 |
yankcrime | yeah looks like that feature was introduced in the westmere architecture | 10:57 |
brtknr | gwhaley: That seems like a reasonable guess! I'll try to find peace in this newly acquired knowledge then :) | 10:58 |
* yankcrime wonders what would happen if you disabled that check | 11:00 | |
* brtknr wonders if /usr/bin/kata-qemu kata-check should pick up on this and report it as a blocker | 11:04 | |
davidgiluk | are you really running on something older than a westmere? That's pretty old | 11:05 |
gwhaley | brtknr: yankcrime - yes, I think if that is the issue then a nicer error message would be great ;-) jodh, do you remember the (long and tortured) history here? | 11:05 |
gwhaley | I don't remember if it won't work, or there are features that won't work, or we just never ever test it, so give you no guarantees ;-) | 11:06 |
gwhaley | I still have a feeling it might come down to some instruction set minimum we set for the clear linux rootfs build as well | 11:06 |
yankcrime | davidgiluk: it's running on old datacentred kit - hardware that was given to us by another isp that was throwing it away, and this was 5 years ago! | 11:08 |
yankcrime | so yeah, it's relatively ancient | 11:08 |
yankcrime | but i hate to see working hardware go to waste, so.... | 11:08 |
davidgiluk | nod | 11:08 |
*** tmhoang has joined #kata-dev | 11:40 | |
*** devimc has joined #kata-dev | 12:00 | |
*** dhellmann has left #kata-dev | 12:10 | |
*** altlogbot_1 has joined #kata-dev | 13:03 | |
*** sgarzare has quit IRC | 13:09 | |
*** altlogbot_1 has quit IRC | 13:32 | |
*** altlogbot_3 has joined #kata-dev | 13:32 | |
*** altlogbot_3 has quit IRC | 13:38 | |
*** altlogbot_3 has joined #kata-dev | 13:38 | |
*** altlogbot_3 has quit IRC | 14:00 | |
*** lpetrut has quit IRC | 14:11 | |
*** altlogbot_2 has joined #kata-dev | 14:25 | |
*** altlogbot_2 has quit IRC | 14:29 | |
*** altlogbot_3 has joined #kata-dev | 14:29 | |
*** altlogbot_3 has quit IRC | 14:33 | |
*** altlogbot_2 has joined #kata-dev | 14:33 | |
*** altlogbot_2 has quit IRC | 14:33 | |
brtknr | gwhaley: 117Mb/s inside kata container, 1.5Gb/s inside a runc container is the preliminary result... I'm going to try doing the volumeBlock approach that someone recommended earlier and see what happens | 14:44 |
*** sgarzare has joined #kata-dev | 14:44 | |
brtknr | this is using the volumeMount approach | 14:44 |
brtknr | which you already warned me about :) | 14:45 |
brtknr | I'm running this command: dd if=/dev/zero of=block bs=1G count=1 | 14:47 |
*** altlogbot_2 has joined #kata-dev | 14:48 | |
davidgiluk | brtknr: That doesn't do a sync does it? So it's still writing once the dd completes? | 14:48 |
gwhaley | writing ... somewhere... oh, sync and VMs - what an interesting little area that is eh davidgiluk :-) | 14:50 |
davidgiluk | brtknr: I think you typically add a oflag=dsync or something? | 14:51 |
*** altlogbot_2 has quit IRC | 14:51 | |
*** altlogbot_0 has joined #kata-dev | 14:52 | |
brtknr | davidgiluk: its about the same: 1073741824 bytes (1.1 GB) copied, 8.91691 s, 120 MB/s | 14:55 |
davidgiluk | ok | 14:55 |
brtknr | with oflag=dsync | 14:55 |
*** altlogbot_0 has quit IRC | 14:55 | |
*** altlogbot_1 has joined #kata-dev | 14:56 | |
*** altlogbot_1 has quit IRC | 14:56 | |
*** altlogbot_2 has joined #kata-dev | 14:58 | |
*** igordc has joined #kata-dev | 15:04 | |
brtknr | davidgiluk: gwhaley: using the volumeBlock approach: 492765184 bytes (493 MB, 470 MiB) copied, 1.32476 s, 372 MB/s | 15:24 |
brtknr | But still nowhere near the raw performance | 15:24 |
brtknr | Anything else I should try? | 15:25 |
brtknr | @archana.m.shinde^^ | 15:25 |
kata-irc-bot | <gmmaharaj> brtknr: is there a place where you have documented your setup? it would be good to mimic it locally to see it. | 15:26 |
*** devimc has quit IRC | 15:26 | |
kata-irc-bot | <gmmaharaj> if we are using a block based volume, the performance should be good. Won't match what you see in raw, but pretty close. | 15:26 |
brtknr | I'm running the setup documented here: https://gist.github.com/brtknr/06521748bca81b399152a42bf7cb6538 | 15:28 |
brtknr | Initially, I just did a regular hostPath mount | 15:28 |
kata-irc-bot | Action: gmmaharaj goes to see | 15:28 |
davidgiluk | brtknr: What does the qemu command line part of that look like? | 15:29 |
gwhaley | brtknr: you are also using dd with bs=1G still, yes? we should consider if that is anything like a real world case. maybe look at that fio test code I pointed at, that does multiple different block size transfers iirc :-) | 15:30 |
* gwhaley suspects davidgiluk has his own preference for what io tests to run, probably also fio based maybe? | 15:30 | |
davidgiluk | gwhaley: I'm not really a block wrangler, more of stefanha's department | 15:30 |
brtknr | tbh, i get the same performance outside of kata container... so this seems like a limitation of using loop block device in general | 15:37 |
brtknr | gwhaley: I will eventually run fio tests... I am just trying to get a feel for what is achievable.. dd is a pretty good proxy imho | 15:37 |
gwhaley | brtknr: ah, yes, loopback will hurt perf - but, it is the easiest way to get a block device iirc. I had to repart/install a machine to get a real devicemapper block device at one point :-( | 15:37 |
gwhaley | sure, dd will get you a feel, sure | 15:38 |
gwhaley | gmmaha ^^ fyi, loopback. | 15:38 |
* brtknr wonders how to turn a network mounted disk into a block device | 15:39 | |
gwhaley | heh, when it is not gluster or ceph? :-) | 15:39 |
brtknr | gwhaley: we're using beegfs | 15:39 |
gwhaley | in theory you can mount those net block storage devices inside the container itself.... but, I've not tried that for some time. it might need some kernel module enabling in the vm kernel depending on the fs in use | 15:40 |
kata-irc-bot | <gmmaharaj> brtknr: i think the block size matters when it comes to io with dd too. | 15:40 |
kata-irc-bot | <gmmaharaj> ``` ganeshma@ganeshma-lab1:~$ dd if=/dev/zero of=block bs=512M count=1 1+0 records in 1+0 records out 536870912 bytes (537 MB, 512 MiB) copied, 0.753192 s, 713 MB/s ganeshma@ganeshma-lab1:~$ dd if=/dev/zero of=block_small bs=1M count=512 512+0 records in 512+0 records out 536870912 bytes (537 MB, 512 MiB) copied, 0.491864 s, 1.1 GB/s ganeshma@ganeshma-lab1:~$ dd if=/dev/zero of=block_small bs=4K count=131072 131072+0 records in | 15:40 |
kata-irc-bot | 131072+0 records out 536870912 bytes (537 MB, 512 MiB) copied, 1.65169 s, 325 MB/s ``` | 15:40 |
kata-irc-bot | <gmmaharaj> i wonder what your throughput would be if you use small block size within kata? any chance you can run that test one more time but drop the block size? | 15:40 |
brtknr | gmmaharaj: using volumeBlock or volumeMount? | 15:41 |
kata-irc-bot | <gmmaharaj> brtknr: volumeBlock please if you could. | 15:41 |
kata-irc-bot | <gmmaharaj> what is your current setup? volumeMount? if yes, then why not both? | 15:42 |
gwhaley | <cough> https://github.com/kata-containers/tests/blob/master/metrics/storage/fio.sh#L57-L60 ;-) | 15:42 |
gmmaha | :D | 15:42 |
brtknr | root@my-pod:/mnt# dd if=/dev/zero of=block count=1000 oflag=dsync | 15:42 |
brtknr | 1000+0 records in | 15:42 |
brtknr | 1000+0 records out | 15:42 |
brtknr | 512000 bytes (512 kB, 500 KiB) copied, 1.45657 s, 352 kB/s | 15:43 |
kata-irc-bot | <gmmaharaj> i wonder what dd's default size is. 4K? | 15:43 |
kata-irc-bot | <gmmaharaj> brtknr: can you specify some size for the block? bs=1M maybe? | 15:43 |
brtknr | with 1M, its 198M/s | 15:44 |
brtknr | with 1M, its 198MB/s | 15:44 |
kata-irc-bot | <gmmaharaj> and this is VolumeMount? | 15:45 |
brtknr | gmmaha: like I said, I get the same perf outside of kata too... if I mount the block device directly on the host | 15:45 |
brtknr | gmmaha: no, volumeBlock | 15:45 |
kata-irc-bot | <gmmaharaj> aaah ok | 15:45 |
kata-irc-bot | <gmmaharaj> so it | 15:46 |
kata-irc-bot | <gmmaharaj> let me see if i can reproduce this locally with a block on a true volume device instead of loopback | 15:47 |
brtknr | gmmaha: I get the same perf with volumeMount with 1M blocksize | 15:47 |
kata-irc-bot | <gmmaharaj> i have tested this using https://github.com/ganeshmaharaj/lvm-snapshotter on a physical device and the performance was comparable to runc. | 15:48 |
brtknr | but inside a runc container, it goes up to 1.5GB/s | 15:48 |
kata-irc-bot | <gmmaharaj> i can collect the numbers again with the new version to see what i get | 15:48 |
brtknr | gwhaley: I remember you mentioning that things should get better with vsock? | 15:49 |
gwhaley | brtknr: vsock in two ways - only one I think storage related.... | 15:50 |
gwhaley | if vsock is available in the kernel, then we can use it to talk to the container, and drop the proxy process - so a small win on size and complexity for kata.... | 15:50 |
brtknr | gwhaley: is there already a way to enable it? | 15:51 |
gwhaley | and then there is the new in-test virtio-fs - which I'm not sure is actually bound to vsock. virtio-fs gives better than 9p storage performance. maybe not as good as block though, but you cannot use block in all situations I think | 15:51 |
gwhaley | there is a guide to enabling virtio-fs - I think gmmaha has done it recently. not quite trivial right now though I think - we are working on getting all the bits into kata so it is easier/available before all the bits land in the upstream (kernel, qemu etc.) | 15:51 |
gmmaha | quite a few moving parts that davidgiluk stefanha have been working on and i have been tracking to make sure we get all the bits landed in kata for it to work out of the box. as gwhaley mentioned, not trivial right now. | 15:53 |
brtknr | gwhaley: should i go down that path if its already known that its no better than using block? | 15:53 |
*** devimc has joined #kata-dev | 15:53 | |
gwhaley | brtknr - I guess that depends on what your goal is? Right now, it seems you have enough to go at with loop block, so you could run some initial stuff and get a feel for it. | 15:54 |
brtknr | gwhaley: it seems like I will be testing the limits of using loop block device rather than kata | 15:55 |
brtknr | I need to see if there is another way to do this | 15:56 |
gwhaley | I'll leave gmmaha to discuss that - he knows a lot more about storage stuff than I do :-) | 15:56 |
gmmaha | brtknr: if you have a spare drive attached to your machine, i can provide you with steps on where you can setup a devicemapper device and use that as a backend snapshot holder using containerd | 15:56 |
brtknr | gmmaha: that would be great | 15:57 |
gmmaha | doing it in the kube context might be a bit hard, but you can still do it via commandlines if performance is what you are after | 15:57 |
kata-irc-bot | <archana.m.shinde> catching up.. | 15:57 |
gmmaha | cool. will get that your way later today | 15:57 |
brtknr | i've gone down the route of using cri-o, will it work with that too? | 15:57 |
kata-irc-bot | <archana.m.shinde> brtknr: if you have a spare device, you can pass it using the gist for raw block devices in k8s | 15:57 |
kata-irc-bot | <archana.m.shinde> or use devicemapper backed by a real device as @gmmaharaj mentioned | 15:58 |
gmmaha | brtknr: it seems cri-o also has a devicemapper device. you can definitely work with that | 15:59 |
gmmaha | i haven't tested it though | 15:59 |
brtknr | let me explain my use case, I want to mount a parallel file system to kata containers and see if they can read/write lots of things in parallel when there's lots of them | 15:59 |
brtknr | using a raw block device limits me to using one host correct? | 16:00 |
brtknr | as I need to specify node affinity | 16:00 |
gmmaha | brtknr: yes. what is the parallel file system you have in mind? | 16:01 |
kata-irc-bot | <archana.m.shinde> yes a raw block device will limit you to one | 16:01 |
brtknr | with runc, I can mount the parallel file system using hostPath and can get close to raw performance | 16:01 |
brtknr | gmmaha: We're using BeeGFS | 16:02 |
gmmaha | brtknr: aah.. i have never worked on that system, but i have worked on ceph a little and know something about it. | 16:03 |
gmmaha | beegFS lets you do a fuse mount of the block backend? | 16:03 |
gmmaha | on the host? | 16:03 |
brtknr | gmmaha: We also have Ceph available for testing but havent tried it yet | 16:03 |
brtknr | gmmaha: with BeeGFS, it's a kernel mount on the host | 16:04 |
brtknr | however the server runs on user space... dont ask me why :P | 16:04 |
gmmaha | brtknr: lol.. i won't. | 16:05 |
gmmaha | @amshinde can correct me. but if it is a kernel mount and it can be recognized as a mount point, kata runtime should automatically pass it as a block volume onto the guest | 16:05 |
gmmaha | which should bypass 9p and get you out of those woes. | 16:06 |
gmmaha | amshinde: ^^ | 16:06 |
kata-irc-bot | <archana.m.shinde> if it's mounted on the host side, then the way it would be passed is through 9p | 16:08 |
kata-irc-bot | <archana.m.shinde> brtknr: do you have your pod yaml file that you have setup with runc? | 16:09 |
kata-irc-bot | <archana.m.shinde> wanted to take a look at your exact setup you have now | 16:09 |
kata-irc-bot | <gmmaharaj> https://gist.github.com/brtknr/06521748bca81b399152a42bf7cb6538 | 16:09 |
kata-irc-bot | <gmmaharaj> that's what i got from him @archana.m.shinde | 16:09 |
kata-irc-bot | <archana.m.shinde> @gmmaharaj doesnt help, thats the gist I provided him for `raw block support` in k8s :slightly_smiling_face: | 16:11 |
kata-irc-bot | <gmmaharaj> duh! you wanted the final setup. sorry | 16:12 |
brtknr | https://github.com/brtknr/packaging/commit/db356ec01d236021a691d567db0dfe99a4d64d49 | 16:14 |
brtknr | its just a simple hostPath volume mount | 16:14 |
brtknr | its time for pub here, thanks for the discussion everyone, feels like 2 step forward 1 step but hey, its progress :) | 16:20 |
brtknr | s/1 step/1 step back | 16:21 |
gwhaley | ah, Friday night... have a good weekend brtknr! | 16:21 |
brtknr | you too :) | 16:21 |
*** jodh has quit IRC | 16:24 | |
brtknr | Last thought: wondering if I run 10x as many kata containers, whether I’ll be able to saturate the iops | 16:39 |
davidgiluk | brtknr: It should do, they're all pretty independent - but then it depends how your network filesystem attaches | 16:55 |
*** gwhaley has quit IRC | 16:57 | |
*** sgarzare has quit IRC | 17:34 | |
*** irclogbot_1 has quit IRC | 18:08 | |
*** irclogbot_1 has joined #kata-dev | 18:10 | |
*** davidgiluk has quit IRC | 19:14 | |
*** eernst has joined #kata-dev | 19:21 | |
*** eernst has quit IRC | 19:23 | |
*** eernst has joined #kata-dev | 19:27 | |
*** eernst has quit IRC | 19:32 | |
*** eernst has joined #kata-dev | 19:34 | |
*** eernst has quit IRC | 19:38 | |
*** eernst has joined #kata-dev | 19:40 | |
*** eernst has quit IRC | 19:45 | |
*** eernst has joined #kata-dev | 19:46 | |
*** eernst has quit IRC | 19:50 | |
*** eernst has joined #kata-dev | 19:53 | |
*** eernst has quit IRC | 19:57 | |
*** eernst has joined #kata-dev | 19:59 | |
*** eernst has quit IRC | 20:02 | |
*** eernst has joined #kata-dev | 20:05 | |
*** eernst has quit IRC | 20:09 | |
*** eernst has joined #kata-dev | 20:16 | |
*** eernst has quit IRC | 20:22 | |
*** eernst has joined #kata-dev | 20:23 | |
*** eernst has quit IRC | 20:25 | |
*** devimc has quit IRC | 20:51 | |
*** igordc has quit IRC | 23:38 |
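
The dd comparisons discussed in the log (block size, `oflag=dsync`, and whether the data has actually reached the device when the timer stops) can be sketched in a few lines. This is a minimal sketch, assuming Python 3 on Linux; `timed_write` and `mb_per_s` are hypothetical helper names introduced here for illustration and are not part of Kata, dd, or fio. The final `os.fsync` is an added assumption that roughly corresponds to dd's `conv=fsync`, so the measurement does not stop while data is still in the page cache. Note that dd without `bs=` writes 512-byte blocks, which matches the `count=1000` run above (512000 bytes).

```python
import os
import tempfile
import time


def mb_per_s(nbytes, seconds):
    """Throughput in decimal megabytes per second, the unit dd prints."""
    return nbytes / seconds / 1e6


def timed_write(path, block_size, count, dsync=False):
    """Write `count` zero-filled blocks of `block_size` bytes to `path`.

    Returns (bytes_written, elapsed_seconds). With dsync=True each write
    is synchronous, similar in spirit to dd's oflag=dsync.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if dsync:
        # O_DSYNC is not defined on every platform; fall back to 0 there.
        flags |= getattr(os, "O_DSYNC", 0)
    view = memoryview(bytes(block_size))  # one reusable block of zeros
    written = 0
    start = time.perf_counter()
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(count):
            offset = 0
            # os.write may write fewer bytes than asked, so loop until done.
            while offset < block_size:
                offset += os.write(fd, view[offset:])
            written += block_size
        # Flush to the device before stopping the clock.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written, time.perf_counter() - start


if __name__ == "__main__":
    total = 8 * 1024 * 1024  # small total so the demo finishes quickly
    with tempfile.TemporaryDirectory() as tmp:
        for bs in (512, 4096, 1024 * 1024):
            path = os.path.join(tmp, "block")
            nbytes, secs = timed_write(path, bs, total // bs)
            print(f"bs={bs:>8}: {nbytes} bytes in {secs:.4f} s, "
                  f"{mb_per_s(nbytes, secs):.0f} MB/s")
```

To compare volume types, point `path` at a file on the volume under test (for example somewhere under `/mnt` inside the pod) instead of a temporary directory, and use a larger total. A multi-block-size tool such as fio, as suggested in the log, remains the better choice for real measurements; this only gives a rough feel.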