*** sm806 has quit IRC | 02:46 | |
*** amoralej has quit IRC | 02:46 | |
*** vabada has quit IRC | 02:46 | |
*** odyssey4me has quit IRC | 02:46 | |
*** vabada has joined #openstack-dib | 02:46 | |
*** sm806 has joined #openstack-dib | 02:47 | |
ianw | johnsom: sorry been a bit distracted with other things, lgtm | 04:36 |
---|---|---|
*** hwoarang has quit IRC | 04:37 | |
johnsom | ianw No worries. I heard you are having fun with the 8 series.... | 04:37 |
*** hwoarang has joined #openstack-dib | 04:37 | |
ianw | that and looking at some weirdness with slower boots from centos nodes | 04:38 |
johnsom | Ah, Carlos looped you into that too... grin | 04:38 |
johnsom | Sadly I have not had time to poke at that, I have been focused on a big feature for stein. | 04:40 |
ianw | it's ... interesting ... spent a bit of time on it today, updating https://etherpad.openstack.org/p/dib-centos-slow | 04:44 |
ianw | spoiler alert ... no smoking gun :/ | 04:44 |
johnsom | My purely out of the air guess is there is some driver issue, like one of the virtio drivers is off or something | 04:46 |
ianw | this could be it ... i'm noticing some weird pauses during scsi enumeration (maybe ... nothing comes out between pauses) as a start | 04:48 |
johnsom | ianw Are these two images being booted on the same host with qemu/libvirt? upstream vs. ext4? | 05:38 |
johnsom | The other thing that is obvious is the ext4 boot kernel includes more cgroup subsystems, yama, and the Spectre code | 05:47 |
johnsom | That is known to tank io performance | 05:47 |
johnsom | The ext4 kernel is a lot more bloated with modules too | 05:52 |
ianw | johnsom: yep, same host, same environment | 06:44 |
*** hwoarang has quit IRC | 07:18 | |
*** hwoarang has joined #openstack-dib | 07:19 | |
ianw | johnsom: added some nopti/spectre_v2 off numbers; within the noise so not totally that :/ | 07:34 |
*** brault has joined #openstack-dib | 08:42 | |
*** brault has quit IRC | 08:45 | |
*** odyssey4me has joined #openstack-dib | 09:44 | |
ianw | tonyb: this fedora gate failure mean anything to you? | 10:08 |
ianw | http://logs.openstack.org/49/627949/1/check/dib-functests-centos7-python2/9ade208/logs/ironic-agent_build-succeeds-fedora.FAIL.log | 10:08 |
ianw | 2019-01-15 19:26:36.687 | Could not open requirements file: [Errno 2] No such file or directory: '/usr/share/ironic-python-agent/upper-constraints.txt' | 10:08 |
ianw | in the good test, it seems to go downloading it from git | 10:09 |
ianw | http://logs.openstack.org/75/627775/1/check/dib-functests-xenial-python3/045759e/logs/ironic-agent_build-succeeds-fedora.PASS.log | 10:09 |
ianw | but seems to bail out in the bad one | 10:09 |
ianw | i'll check in on it tomorrow ... | 10:09 |
*** odyssey4me has quit IRC | 10:49 | |
*** irclogbot_0 has quit IRC | 12:47 | |
*** irclogbot_0 has joined #openstack-dib | 12:58 | |
*** irclogbot_0 has quit IRC | 13:45 | |
*** irclogbot_0 has joined #openstack-dib | 13:56 | |
*** brault has joined #openstack-dib | 14:33 | |
*** brault has quit IRC | 14:33 | |
*** brault has joined #openstack-dib | 16:04 | |
*** brault has quit IRC | 16:08 | |
*** brault has joined #openstack-dib | 16:24 | |
*** greghaynes has joined #openstack-dib | 16:27 | |
*** brault has quit IRC | 16:29 | |
*** brault has joined #openstack-dib | 18:52 | |
*** hwoarang has quit IRC | 20:12 | |
*** hwoarang has joined #openstack-dib | 20:17 | |
*** irclogbot_0 has quit IRC | 20:41 | |
tonyb | win 54 | 20:48 |
*** irclogbot_0 has joined #openstack-dib | 20:53 | |
*** openstackgerrit has quit IRC | 20:56 | |
prometheanfire | tonyb: too many windows | 21:20 |
tonyb | prometheanfire: that's about the midpoint | 21:22 |
prometheanfire | ouch | 21:23 |
ianw | i'm assuming the fedora job didn't magically fix itself as i slept :) | 21:36 |
ianw | i think it must be related to https://github.com/openstack/ironic-python-agent/commit/041c1795db9249b423622b21191cb88d22594451 | 21:49 |
ianw | tonyb: anyway, re my previous message looks like an i-p-a issue, nothing to do with requirements : Fix upper-constraints.txt fallback copying https://review.openstack.org/631664 | 22:08 |
tonyb | Ahh cool | 22:08 |
tonyb | where are you? | 22:08 |
tonyb | Why were you working 10 hours ago? | 22:08 |
ianw | i'm where i always am :) it was only 9pm | 22:10 |
clarkb | johnsom: centos doesn't ship a different kernel for xfs vs ext4 as far as I know. Both modules are loaded in the kernel by default | 22:12 |
johnsom | clarkb That wasn't my concern/question. | 22:13 |
clarkb | ok I don't understand what you mean by ext4 kernel then | 22:13 |
clarkb | the ramdisk kernel? | 22:13 |
johnsom | clarkb I was asking if the platform was the same for Ian's test runs, some of which were ext4 and some were xfs. | 22:13 |
johnsom | The "upstream" run has a very different kernel than the DIB runs. | 22:14 |
clarkb | dib installs the upstream kernel? | 22:15 |
clarkb | (I'm still confused, sorry. DIB installs upstream packages and upstream kernel has ext4 loaded in it) | 22:15 |
johnsom | The "upstream" test run (I assume this is an image built somewhere else and not using DIB) is using 3.10.0-229.el7.x86_64 | 22:16 |
johnsom | The DIB run was 3.10.0-957.1.3.el7.x86_64 | 22:16 |
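The two kernel strings quoted above share the same 3.10.0 base and differ only in the build field (229 versus 957.1.3). A minimal sketch of that comparison follows; `kernel_release_key` is a hypothetical helper, and it is a plain numeric comparison, not RPM's own version ordering.

```python
import re


def kernel_release_key(release):
    """Split e.g. '3.10.0-957.1.3.el7.x86_64' into two comparable integer tuples."""
    version, _, rest = release.partition("-")
    # Keep only the leading dotted-numeric part of the build field ("957.1.3").
    build = re.match(r"\d+(?:\.\d+)*", rest)
    version_key = tuple(int(part) for part in version.split("."))
    build_key = tuple(int(part) for part in build.group(0).split(".")) if build else ()
    return version_key, build_key


# Kernel strings as quoted in the conversation.
UPSTREAM = "3.10.0-229.el7.x86_64"
DIB = "3.10.0-957.1.3.el7.x86_64"

upstream_key = kernel_release_key(UPSTREAM)
dib_key = kernel_release_key(DIB)

# Same base version, but the DIB image carries a much later build.
print("same base version:", upstream_key[0] == dib_key[0])
print("DIB build is newer:", dib_key > upstream_key)
```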
ianw | clarkb: you probably missed a bunch of context sorry, because we've been tracking why dib-built centos is slower than the upstream images in a google doc with octavia | 22:16 |
ianw | clarkb: anyway, i started running some tests as best i could -- https://etherpad.openstack.org/p/dib-centos-slow | 22:17 |
johnsom | clarkb It was a conversation Ian and I were having about some observed performance issues with the DIB built centos images. | 22:17 |
ianw | one thing i thought was maybe using ext4 instead of xfs (as the upstream image was doing) would make a difference | 22:17 |
ianw | (it didn't) | 22:17 |
johnsom | Yeah, I don't think it's the filesystem | 22:17 |
clarkb | if you update the kernel in the upstream image does it get slow? | 22:18 |
ianw | hrm, i think i can do that, let me see if it works | 22:19 |
johnsom | I only got half way through my analysis of the kernel logs last night. I'll take a few minutes and see if anything jumps out at me. | 22:20 |
ianw | this pause | 22:20 |
ianw | [00:01:06.835] [ 45.499264] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5 <------------------ pause | 22:20 |
ianw | [00:01:25.950] [ 64.613378] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray | 22:20 |
ianw | is pretty consistent on the dib kernels, and not seen on the upstream one | 22:20 |
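A pause like the one quoted above can be found mechanically by comparing consecutive kernel timestamps in the boot log. This is a minimal sketch, assuming each log line carries the kernel's `[ seconds.micros]` stamp; `find_pauses` and the 5 second threshold are hypothetical choices, not part of any tool used in the discussion.

```python
import re

# Matches the kernel timestamp, e.g. "[ 45.499264]". The wall-clock prefix
# "[00:01:06.835]" is skipped because it contains colons.
KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_pauses(lines, threshold=5.0):
    """Return (gap_seconds, previous_line, next_line) for each gap >= threshold."""
    pauses = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = KERNEL_TS.search(line)
        if match is None:
            continue  # no kernel timestamp on this line
        ts = float(match.group(1))
        if prev_ts is not None and ts - prev_ts >= threshold:
            pauses.append((ts - prev_ts, prev_line, line))
        prev_ts, prev_line = ts, line
    return pauses


# The two lines pasted in the channel.
SAMPLE = [
    "[00:01:06.835] [ 45.499264] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5",
    "[00:01:25.950] [ 64.613378] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray",
]

for gap, before, after in find_pauses(SAMPLE):
    print(f"{gap:.3f}s pause after: {before}")
```

Running the same function over the full console logs of both images would list every stall in one pass, which makes the DIB and upstream boots easier to compare side by side.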
johnsom | Yeah, the ACPI is different for some reason. That is why I was asking about the platform. I.e. did the qemu machine type change | 22:20 |
ianw | clarkb: hrm, the vm networking isn't setup, because i'm just booting this by hand nested in my work vm (so it is using binary translation, which we want). have to see what i can setup | 22:25 |
ianw | johnsom: yeah, same qemu command ... although let me see about kernel command lines | 22:25 |
clarkb | ianw: virsh should be able to get you a console if the root or centos user will let you login that way | 22:25 |
johnsom | They are pretty close | 22:25 |
ianw | yeah, nothing too weird on upstream image | 22:26 |
ianw | ooohhh, why am i using such an old upstream image. i've sorted them the wrong way | 22:31 |
ianw | let me try again to get a better baseline | 22:31 |
ianw | hi ho, hi ho, downloading qcow images we go ... | 22:32 |
clarkb | different movie but now I've got "Hi Ho SILVER!" from whats his name in the island of misfit toys in my head | 22:33 |
clarkb | er no I think I've mashed two things together and now I'm confused | 22:34 |
clarkb | I'll blame the kids waking me up at 3am | 22:34 |
johnsom | Yeah, I see the delay you are talking about with the partition scan on the CDROM device. 2 seconds vs. 20. But, it seems like the kernel is slower even before that on most things. But, it's hard to tell as well since the DIB kernel has more modules loaded. | 22:43 |
ianw | if we could take ~ 1m out of the initrd, it seems like we'd be pretty close to on par with upstream | 22:47 |
ianw | the 1503 image does a weird thing where it throws up a login prompt, but it hasn't really finished booting. i'm guessing someone fixed a bunch of systemd stuff because that doesn't happen on 1809 | 22:47 |
johnsom | Yeah, that is actually pretty common | 22:48 |
johnsom | Yeah, the only things I see are that the DIB kernel has more modules and has more of the security cpu vulnerability code in it. | 22:49 |
ianw | johnsom: so i tried to nopti and spectre_v2 off, didn't make a big difference | 22:50 |
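When repeating a test like this, it helps to confirm the flags actually reached the kernel. The sketch below checks a command-line string for `nopti` and `spectre_v2=off`; `mitigation_flags` is a hypothetical helper, and the sample command line is made up for illustration. On a booted guest the real string could be read from `/proc/cmdline`.

```python
def mitigation_flags(cmdline):
    """Report which of the two mitigation-disabling flags appear in a kernel command line."""
    tokens = cmdline.split()
    return {
        "nopti": "nopti" in tokens,
        "spectre_v2=off": "spectre_v2=off" in tokens,
    }


# Hypothetical command line, not taken from the images under test.
SAMPLE_CMDLINE = "ro console=ttyS0,115200 nopti spectre_v2=off"

print(mitigation_flags(SAMPLE_CMDLINE))
```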
johnsom | Well, that ACPI stuff is different in odd ways. | 22:51 |
clarkb | ianw: is the up to date upstream image also fast? | 22:52 |
johnsom | Yeah, I don't know what all knobs are required to disable all of those changes. But even so you would expect to see the pain in the ubuntu images too if it was that stuff. | 22:52 |
ianw | clarkb: yep, results in https://etherpad.openstack.org/p/dib-centos-slow | 22:53 |
johnsom | Ok, yeah, that kernel is a lot closer in terms of modules to the config of the one running via DIB. | 22:56 |
clarkb | it would be nice to do a 1:1 kernel comparison using the upstream image without any dib | 22:57 |
ianw | i wonder if there's something we can set to get the scsi subsystem a bit more chatty about what's going on in that pause | 22:57 |
clarkb | because it could be a regression (and the newer kernel is slower too in upstream) | 22:57 |
clarkb | (just not as slow) | 22:57 |
ianw | clarkb: i think between the two upstreams that's probably close to noise. however the dib kernels seem to be consistently about 1m slower through the intird | 22:58 |
johnsom | Yeah, I see two pauses one almost 10 seconds delay, just prior to the start of the nic and scsi drivers. | 23:02 |
johnsom | This isn't some strange partition alignment thing in the DIB image is it? | 23:03 |
ianw | hrm, seems there's a tool for that, virt-alignment-scan ... let's see | 23:05 |
ianw | we do pad all that out in dib-block-device though, however | 23:06 |
ianw | reports ok, i pasted results in | 23:07 |
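The alignment question comes down to simple arithmetic: a partition is aligned when its starting byte offset is a multiple of the chosen boundary. This is a minimal sketch of that check, not a reimplementation of virt-alignment-scan; the 512-byte sector size, the 1 MiB boundary, and the sample start sectors are all assumptions for illustration.

```python
SECTOR_SIZE = 512        # bytes per logical sector (assumed)
ALIGNMENT = 1024 * 1024  # 1 MiB boundary (assumed target)


def is_aligned(start_sector, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """True when the partition's starting byte offset falls on the alignment boundary."""
    return (start_sector * sector_size) % alignment == 0


# Hypothetical start sectors: 2048 is the common 1 MiB start, 63 is the old DOS default.
for sector in (2048, 63):
    print(sector, "aligned" if is_aligned(sector) else "misaligned")
```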
johnsom | Well, it was a thought... | 23:10 |
ianw | hrm scsi_logging_level doesn't seem to be doing anything | 23:19 |
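For reference, the SCSI logging level is a single integer packed from 3-bit fields, one per facility. The sketch below builds such a value, assuming the field offsets from the kernel's `scsi_logging.h`; `scsi_logging_level` here is a hypothetical Python helper. The setting only has an effect on kernels built with `CONFIG_SCSI_LOGGING`, which may be worth checking when it appears to do nothing.

```python
# Bit offset of each 3-bit logging field (assumed from the kernel's scsi_logging.h).
SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def scsi_logging_level(**levels):
    """Pack per-facility levels (0-7) into one logging_level integer."""
    value = 0
    for facility, level in levels.items():
        if facility not in SHIFTS:
            raise ValueError(f"unknown facility: {facility}")
        if not 0 <= level <= 7:
            raise ValueError(f"level for {facility} must be between 0 and 7")
        value |= level << SHIFTS[facility]
    return value


# Maximum verbosity for device scanning only.
print(scsi_logging_level(scan=7))
```

A value built this way would typically be passed at boot as `scsi_mod.scsi_logging_level=<value>` so that it is active during the early enumeration where the pause occurs.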
ianw | i wonder if i could send a magic sysrq dump while it's paused in there, might give a clue | 23:21 |