Thursday, 2019-01-17

*** sm806 has quit IRC02:46
*** amoralej has quit IRC02:46
*** vabada has quit IRC02:46
*** odyssey4me has quit IRC02:46
*** vabada has joined #openstack-dib02:46
*** sm806 has joined #openstack-dib02:47
ianwjohnsom: sorry been a bit distracted with other things, lgtm04:36
*** hwoarang has quit IRC04:37
johnsomianw No worries. I heard you are having fun with the 8 series....04:37
*** hwoarang has joined #openstack-dib04:37
ianwthat and looking at some weirdness with slower boots from centos nodes04:38
johnsomAh, Carlos looped you into that too...  grin04:38
johnsomSadly I have not had time to poke at that, I have been focused on a big feature for stein.04:40
ianwit's ... interesting ... spent a bit of time on it today, updating https://etherpad.openstack.org/p/dib-centos-slow04:44
ianwspoiler alert ... no smoking gun :/04:44
johnsomMy purely out of the air guess is there is some driver issue, like one of the virtio drivers is off or something04:46
ianwthis could be it ... i'm noticing some weird pauses during scsi enumeration (maybe ... nothing comes out between pauses) as a start04:48
johnsomianw Are these two images being booted on the same host with qemu/libvirt? upstream vs. ext4?05:38
johnsomThe other thing that is obvious is the ext4 boot kernel includes more cgroup subsystems, yama, and the Spectre code05:47
johnsomThat is known to tank io performance05:47
johnsomThe ext4 kernel is a lot more bloated with modules too05:52
ianwjohnsom: yep, same host, same environment06:44
*** hwoarang has quit IRC07:18
*** hwoarang has joined #openstack-dib07:19
ianwjohnsom: added some nopti/spectre_v2 off numbers; within the noise so not totally that :/07:34
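Whether the nopti / spectre_v2=off runs are "within the noise" can be stated a bit more explicitly. Below is a minimal sketch that assumes a handful of boot-time measurements per kernel command line. `within_noise` and its k=2.0 threshold are hypothetical choices, not what was used for the etherpad numbers, and the sample values are invented.

```python
import statistics


def within_noise(baseline, variant, k=2.0):
    """Return True if the mean of `variant` differs from the mean of
    `baseline` by no more than k times the larger sample standard deviation.

    The k=2.0 default is an arbitrary, hypothetical threshold, not a
    rigorous statistical test.
    """
    if len(baseline) < 2 or len(variant) < 2:
        raise ValueError("need at least two runs per configuration")
    diff = abs(statistics.mean(variant) - statistics.mean(baseline))
    spread = max(statistics.stdev(baseline), statistics.stdev(variant))
    return diff <= k * spread


# Invented boot times in seconds, NOT the numbers from the etherpad.
default_boot = [100.0, 102.0, 98.0]
mitigations_off = [101.0, 99.0, 103.0]
print(within_noise(default_boot, mitigations_off))  # True
```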
*** brault has joined #openstack-dib08:42
*** brault has quit IRC08:45
*** odyssey4me has joined #openstack-dib09:44
ianwtonyb: this fedora gate failure mean anything to you?10:08
ianwhttp://logs.openstack.org/49/627949/1/check/dib-functests-centos7-python2/9ade208/logs/ironic-agent_build-succeeds-fedora.FAIL.log10:08
ianw2019-01-15 19:26:36.687 | Could not open requirements file: [Errno 2] No such file or directory: '/usr/share/ironic-python-agent/upper-constraints.txt'10:08
ianwin the good test, it seems to go downloading it from git10:09
ianwhttp://logs.openstack.org/75/627775/1/check/dib-functests-xenial-python3/045759e/logs/ironic-agent_build-succeeds-fedora.PASS.log10:09
ianwbut seems to bail out in the bad one10:09
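The difference between the passing and failing runs comes down to a fallback: use the constraints file installed at a fixed path if it exists, and otherwise fetch a copy. The following is a minimal sketch of that pattern, not the actual ironic-python-agent code. `pick_constraints` and the URL are hypothetical, and the sketch assumes pip can read a constraints file from a URL.

```python
import os
import tempfile


def pick_constraints(local_path, remote_url):
    """Hypothetical helper: prefer the installed constraints file, else the URL.

    Mirrors the passing run's behaviour (using a copy fetched from git)
    instead of failing on a missing local file.
    """
    if os.path.isfile(local_path):
        return local_path
    return remote_url


# Made-up URL; the returned value would be handed to "pip install -c".
URL = "https://example.invalid/upper-constraints.txt"
with tempfile.TemporaryDirectory() as d:
    local = os.path.join(d, "upper-constraints.txt")
    print(pick_constraints(local, URL))  # file missing, so the URL is returned
    open(local, "w").close()
    print(pick_constraints(local, URL))  # file exists, so the local path is returned
```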
ianwi'll check in on it tomorrow ...10:09
*** odyssey4me has quit IRC10:49
*** irclogbot_0 has quit IRC12:47
*** irclogbot_0 has joined #openstack-dib12:58
*** irclogbot_0 has quit IRC13:45
*** irclogbot_0 has joined #openstack-dib13:56
*** brault has joined #openstack-dib14:33
*** brault has quit IRC14:33
*** brault has joined #openstack-dib16:04
*** brault has quit IRC16:08
*** brault has joined #openstack-dib16:24
*** greghaynes has joined #openstack-dib16:27
*** brault has quit IRC16:29
*** brault has joined #openstack-dib18:52
*** hwoarang has quit IRC20:12
*** hwoarang has joined #openstack-dib20:17
*** irclogbot_0 has quit IRC20:41
tonybwin 5420:48
*** irclogbot_0 has joined #openstack-dib20:53
*** openstackgerrit has quit IRC20:56
prometheanfiretonyb: too many windows21:20
tonybprometheanfire: that's about the midpoint21:22
prometheanfireouch21:23
ianwi'm assuming the fedora job didn't magically fix itself as i slept :)21:36
ianwi think it must be related to https://github.com/openstack/ironic-python-agent/commit/041c1795db9249b423622b21191cb88d2259445121:49
ianwtonyb: anyway, re my previous message looks like an i-p-a issue, nothing to do with requirements : Fix upper-constraints.txt fallback copying  https://review.openstack.org/63166422:08
tonybAhh cool22:08
tonybwhere are you?22:08
tonybWhy were you working 10 hours ago?22:08
ianwi'm where i always am :)  it was only 9pm22:10
clarkbjohnsom: centos doesn't ship a different kernel for xfs vs ext4 as far as I know. Both modules are loaded in the kernel by default22:12
johnsomclarkb That wasn't my concern/question.22:13
clarkbok I don't understand what you mean by ext4 kernel then22:13
clarkbthe ramdisk kernel?22:13
johnsomclarkb I was asking if the platform was the same for Ian's test runs, some of which were ext4 and some were xfs.22:13
johnsomThe "upstream" run has a very different kernel than the DIB runs.22:14
clarkbdib installs the upstream kernel?22:15
clarkb(I'm still confused, sorry. DIB installs upstream packages and upstream kernel has ext4 loaded in it)22:15
johnsomThe "upstream" test run (I assume this is an image built somewhere else and not using DIB) is using 3.10.0-229.el7.x86_6422:16
johnsomThe DIB run was 3.10.0-957.1.3.el7.x86_6422:16
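Both kernels share the 3.10.0 upstream base. The difference is therefore entirely in the distribution release part of the version string. This simplified sketch pulls that part out and orders el7 kernels, assuming the `N.N.N-R[.R...].elN` pattern seen here. `el_kernel_key` is a hypothetical helper, not a full rpm version comparison.

```python
import re

# Matches e.g. "3.10.0-957.1.3.el7.x86_64": upstream version, then the
# distribution release up to ".el<N>".
_EL_KERNEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([\d.]+)\.el(\d+)")


def el_kernel_key(uname_r):
    """Turn an el7-style `uname -r` string into a sortable tuple.

    Only handles the N.N.N-R.R.R.elN pattern; not a full rpm vercmp.
    """
    m = _EL_KERNEL.match(uname_r)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {uname_r!r}")
    base = tuple(int(x) for x in m.group(1, 2, 3))
    release = tuple(int(x) for x in m.group(4).split("."))
    return base, release


print(el_kernel_key("3.10.0-229.el7.x86_64"))      # ((3, 10, 0), (229,))
print(el_kernel_key("3.10.0-957.1.3.el7.x86_64"))  # ((3, 10, 0), (957, 1, 3))
```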
ianwclarkb: you probably missed a bunch of context sorry, because we've been tracking why dib-built centos is slower than the upstream images in a google doc with octavia22:16
ianwclarkb: anyway, i started running some tests as best i could -- https://etherpad.openstack.org/p/dib-centos-slow22:17
johnsomclarkb It was a conversation Ian and I were having about some observed performance issues with the DIB built centos images.22:17
ianwone thing i thought was maybe using ext4 instead of xfs (as the upstream image was doing) would make a difference22:17
ianw(it didn't)22:17
johnsomYeah, I don't think it's the filesystem22:17
clarkbif you update the kernel in the upstream image does it get slow?22:18
ianwhrm, i think i can do that, let me see if it works22:19
johnsomI only got half way through my analysis of the kernel logs last night.  I'll take a few minutes and see if anything jumps out at me.22:20
ianwthis pause22:20
ianw[00:01:06.835] [   45.499264] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5       <------------------ pause22:20
ianw[00:01:25.950] [   64.613378] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray22:20
ianwis pretty consistent on the dib kernels, and not seen on the upstream one22:20
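The pause shows up as a jump in the kernel's own printk timestamps, from 45.499264 s to 64.613378 s, which is about 19 s. This minimal sketch scans a captured serial console log for such gaps. It assumes each kernel line carries a printk timestamp. `find_pauses` and its 5 s default threshold are hypothetical.

```python
import re

# Kernel printk timestamp such as "[   45.499264]". The console wall-clock
# prefix "[00:01:06.835]" does not match because it contains colons.
_KTIME = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_pauses(lines, threshold=5.0):
    """Return (gap_seconds, line_before, line_after) for every gap of at
    least `threshold` seconds between consecutive kernel timestamps."""
    pauses = []
    prev_ts = prev_line = None
    for line in lines:
        m = _KTIME.search(line)
        if m is None:
            continue  # not a timestamped kernel message
        ts = float(m.group(1))
        if prev_ts is not None and ts - prev_ts >= threshold:
            pauses.append((ts - prev_ts, prev_line.strip(), line.strip()))
        prev_ts, prev_line = ts, line
    return pauses


console = [
    "[00:01:06.835] [   45.499264] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5",
    "[00:01:25.950] [   64.613378] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray",
]
for gap, before, after in find_pauses(console):
    print(f"{gap:.1f}s pause before: {after}")
```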
johnsomYeah, the ACPI is different for some reason. That is why I was asking about the platform.  I.e. did the qemu machine type change22:20
ianwclarkb: hrm, the vm networking isn't set up, because i'm just booting this by hand nested in my work vm (so it is using binary translation, which we want).  have to see what i can set up22:25
ianwjohnsom: yeah, same qemu command ... although let me see about kernel command lines22:25
clarkbianw: virsh should be able to get you a console if the root or centos user will let you login that way22:25
johnsomThey are pretty close22:25
ianwyeah, nothing too weird on upstream image22:26
ianwooohhh, why am i using such an old upstream image.  i've sorted them the wrong way22:31
ianwlet me try again to get a better baseline22:31
ianwhi ho, hi ho, downloading qcow images we go ...22:32
clarkbdifferent movie but now I've got "Hi Ho SILVER!" from what's his name in the island of misfit toys in my head22:33
clarkber no I think I've mashed two things together and now I'm confused22:34
clarkbI'll blame the kids waking me up at 3am22:34
johnsomYeah, I see the delay you are talking about with the partition scan on the CDROM device. 2 seconds vs. 20. But, it seems like the kernel is slower even before that on most things. But, it's hard to tell as well since the DIB kernel has more modules loaded.22:43
ianwif we could take ~ 1m out of the initrd, it seems like we'd be pretty close to on par with upstream22:47
ianwthe 1503 image does a weird thing where it throws up a login prompt, but it hasn't really finished booting.  i'm guessing someone fixed a bunch of systemd stuff because that doesn't happen on 180922:47
johnsomYeah, that is actually pretty common22:48
johnsomYeah, the only things I see are that the DIB kernel has more modules and has more of the security cpu vulnerability code in it.22:49
ianwjohnsom: so i tried to nopti and spectre_v2 off, didn't make a big difference22:50
johnsomWell, that ACPI stuff is different in odd ways.22:51
clarkbianw: is the up to date upstream image also fast?22:52
johnsomYeah, I don't know what all knobs are required to disable all of those changes. But even so you would expect to see the pain in the ubuntu images too if it was that stuff.22:52
ianwclarkb: yep, results in https://etherpad.openstack.org/p/dib-centos-slow22:53
johnsomOk, yeah, that kernel is a lot closer in terms of modules to the config of the one running via DIB.22:56
clarkbit would be nice to do a 1:1 kernel comparison using the upstream image without any dib22:57
ianwi wonder if there's something we can set to get the scsi subsystem a bit more chatty about what's going on in that pause22:57
clarkbbecause it could be a regression (and the newer kernel is slower too in upstream)22:57
clarkb(just not as slow)22:57
ianwclarkb: i think between the two upstreams that's probably close to noise.  however the dib kernels seem to be consistently about 1m slower through the initrd22:58
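One way to compare the initrd phase across images is the summary line from `systemd-analyze time`, which splits boot time into kernel, initrd and userspace. The sketch below assumes that line has its usual "Startup finished in ... (kernel) + ... (initrd) + ... (userspace) = ..." shape. The sample line is invented for illustration and is not taken from these runs. The helper names are hypothetical.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
# One component of a systemd timespan, e.g. "1min", "5.123s", "812ms".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)\b")
# A "<timespan> (<phase>)" chunk between the "+" separators.
_PHASE = re.compile(r"([^+=]+?)\s*\((\w+)\)")


def timespan_seconds(text):
    """Convert a timespan such as '1min 5.123s' into seconds."""
    return sum(float(v) * _UNIT_SECONDS[u] for v, u in _TOKEN.findall(text))


def boot_phases(summary):
    """Map each phase name (kernel, initrd, userspace, ...) to seconds."""
    body = summary.split("Startup finished in", 1)[1].split("=", 1)[0]
    return {name: timespan_seconds(span) for span, name in _PHASE.findall(body)}


# Invented example line, not a measurement from this investigation.
line = ("Startup finished in 2.067s (kernel) + 1min 5.123s (initrd) "
        "+ 7.589s (userspace) = 1min 14.779s")
print(boot_phases(line))
```

Comparing the `initrd` value between a DIB-built and an upstream image would isolate the roughly one-minute difference described above.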
johnsomYeah, I see two pauses one almost 10 seconds delay, just prior to the start of the nic and scsi drivers.23:02
johnsomThis isn't some strange partition alignment thing in the DIB image is it?23:03
ianwhrm, seems there's a tool for that, virt-alignment-scan ... let's see23:05
ianwwe do pad all that out in dib-block-device, though23:06
ianwreports ok, i pasted results in23:07
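The alignment question reduces to arithmetic on each partition's starting byte offset. This rough sketch assumes 512-byte logical sectors and treats a 1 MiB boundary as "good". That is a common convention and not necessarily the exact rule `virt-alignment-scan` applies. The function names are hypothetical.

```python
def offset_alignment(start_sector, sector_size=512):
    """Largest power of two that divides the partition's byte offset."""
    offset = start_sector * sector_size
    if offset <= 0:
        raise ValueError("partition must start after sector 0")
    return offset & -offset  # isolates the lowest set bit


def is_aligned(start_sector, boundary=1024 * 1024, sector_size=512):
    """True if the partition starts on a multiple of `boundary` bytes."""
    return (start_sector * sector_size) % boundary == 0


for start in (63, 2048):  # legacy DOS-style start vs. the common 1 MiB start
    print(start, offset_alignment(start), is_aligned(start))
```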
johnsomWell, it was a thought...23:10
ianwhrm scsi_logging_level doesn't seem to be doing anything23:19
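The `scsi_logging_level` helper (from sg3_utils) ultimately sets a single bitmask with one 3-bit verbosity field per SCSI logging area. The sketch below composes that value by hand. It assumes the field shifts from the kernel's `scsi_logging.h`. The resulting number would go in `/proc/sys/dev/scsi/logging_level` or, for early boot, on the kernel command line as `scsi_mod.scsi_logging_level=`. One possible reason for seeing no extra output is a kernel built without `CONFIG_SCSI_LOGGING`, in which case the mask has no effect. That explanation is unverified here.

```python
# Bit offset of each 3-bit logging field, as assumed from scsi_logging.h.
SCSI_LOG_SHIFTS = {
    "error": 0, "timeout": 3, "scan": 6, "mlqueue": 9, "mlcomplete": 12,
    "llqueue": 15, "llcomplete": 18, "hlqueue": 21, "hlcomplete": 24, "ioctl": 27,
}


def scsi_logging_mask(**levels):
    """Build a logging_level value, e.g. scsi_logging_mask(scan=7)."""
    mask = 0
    for area, level in levels.items():
        if area not in SCSI_LOG_SHIFTS:
            raise ValueError(f"unknown SCSI logging area: {area}")
        if not 0 <= level <= 7:
            raise ValueError("each level is a 3-bit value (0-7)")
        mask |= level << SCSI_LOG_SHIFTS[area]
    return mask


# Maximum verbosity for the scan area only; whether scanning is where the
# time goes is an assumption.
print(scsi_logging_mask(scan=7))  # 448
```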
ianwi wonder if i could send a magic sysrq dump while it's paused in there, might give a clue23:21
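SysRq can dump task states and stack traces while the boot is stuck. Where a shell is reachable, writing a command key to `/proc/sysrq-trigger` does this. During the initrd pause itself, the keystroke would have to come from outside the guest, for example `sendkey alt-sysrq-t` in the QEMU monitor. That keyboard route also needs SysRq enabled, for instance with the `sysrq_always_enabled` boot parameter. This is an untested suggestion, and `trigger_sysrq` is a hypothetical helper.

```python
import os
import tempfile

# A few standard SysRq command keys.
SYSRQ_COMMANDS = {
    "t": "dump the state of all tasks",
    "w": "dump tasks in uninterruptible (blocked) state",
    "l": "show a backtrace for all active CPUs",
}


def trigger_sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the trigger file.

    Needs root on a real system; trigger_path is a parameter only so the
    function can be exercised against an ordinary file.
    """
    if command not in SYSRQ_COMMANDS:
        raise ValueError(f"unsupported sysrq command: {command!r}")
    with open(trigger_path, "w") as f:
        f.write(command)


# Dry run against a temporary file instead of /proc/sysrq-trigger.
with tempfile.TemporaryDirectory() as d:
    fake_trigger = os.path.join(d, "sysrq-trigger")
    trigger_sysrq("w", fake_trigger)
    with open(fake_trigger) as f:
        print(f.read())  # w
```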
