Thursday, 2021-10-14

00:19 *** dviroel|rover is now known as dviroel|out
01:27 *** rlandy|ruck|bbl is now known as rlandy|ruck
03:30 *** bhagyashris is now known as bhagyashris|out
04:53 *** ysandeep|out is now known as ysandeep
05:19 *** ykarel|away is now known as ykarel
06:13 *** ykarel_ is now known as ykarel
07:30 <opendevreview> Dr. Jens Harbott proposed openstack/devstack-gate master: Cap PyYAML to stay compatible with our code  https://review.opendev.org/c/openstack/devstack-gate/+/813958
07:32 *** jpena|off is now known as jpena
08:14 *** ykarel is now known as ykarel|lunch
08:35 *** ysandeep is now known as ysandeep|lunch
09:27 *** ykarel|lunch is now known as ykarel|lunc
09:27 *** ykarel|lunc is now known as ykarel
09:39 *** ysandeep|lunch is now known as ysandeep
10:02 -opendevstatus- NOTICE: zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued
10:26 *** jcapitao is now known as jcapitao_lunch
10:37 *** rlandy is now known as rlandy|ruck
11:04 <sean-k-mooney> clarkb: sorry, not that i know of with regards to f34
11:09 *** dviroel|out is now known as dviroel|rover
11:27 *** jpena is now known as jpena|lunch
11:54 *** akahat is now known as akahat|afk
11:58 *** ysandeep is now known as ysandeep|afk
12:08 <fungi> ade_lee: i know you've been working some on fips testing, has that gotten to the point of trying tempest, or is it just unit and functional tests for now?
12:17 <frickler> fungi: there were some wip patches, but not working yet iirc, let me check
12:17 <fungi> we've got cloud providers starting to show an interest in having fips-compliant deployments, but getting tempest (via refstack) to work is apparently blocking the ability to actually claim they have interoperable services
12:18 <fungi> lots of problems ssh'ing, generating certs, et cetera
12:18 <frickler> https://review.opendev.org/c/openstack/neutron/+/797537
12:18 <fungi> thanks frickler!
12:22 <frickler> fungi: sure, may I ask where you see those cloud providers showing interest? is that via the foundation?
12:22 <fungi> frickler: yes, approaching the foundation about certifying their deployments as interoperable so they can use the trademarks
12:22 *** jpena|lunch is now known as jpena
12:23 <frickler> well they don't need to run tempest on a fips-enabled machine to do that, or do they?
12:24 <fungi> i'm only just hearing about the problem now, but the test architecture is certainly one of my first questions
12:25 <frickler> might be worth discussing with the qa team once you have more details
12:27 <fungi> yep, i'll also push to see if one of these companies can discuss their challenges with the community directly, since having an intermediary in such matters tends to just get in the way
12:28 <frickler> +1
12:46 <fungi> i'm also going to try to track the fips testing work a little more actively from the security sig too, in case we can help drum up more people interested in working on it
12:47 <fungi> (not that there are a bunch of people involved there any more either)
12:55 *** ysandeep|afk is now known as ysandeep
12:57 *** ysandeep is now known as ysandeep|brb
12:57 <afaranha> Hi, we have a test timing out on the CI: https://zuul.opendev.org/t/openstack/build/c1c1a44dcb7d42e08d16f7e929e00378
12:57 <afaranha> it's from this FIPS patch https://review.opendev.org/c/openstack/swift/+/796057
12:57 <afaranha> When running the tests locally on a Centos8 server with FIPS enabled, it works. For the next step, can we have access to the node to troubleshoot the issue? or have a sosreport?
13:00 *** bhagyashris|out is now known as bhagyashris|mtg
13:01 <fungi> afaranha: great timing! we were just talking about some similar jobs in here. and yeah, since the run phase playbook timed out it seems we don't generate a summary of it due to incomplete ansible metadata
13:02 <fungi> i made the mistake of trying to pull up the job log in my browser, so waiting for it to stop having a fit before i try to download a copy to go through locally instead
13:03 *** ysandeep|brb is now known as ysandeep
13:04 <afaranha> fungi, yea, it takes forever on the browser :P
13:04 <afaranha> fungi, do you see more timing out jobs on swift?
13:06 <fungi> afaranha: i downloaded the job-output.txt, decompressed it and looked at it locally. tox ran out of disk space on the test node; though i'm not certain that's why the playbook timed out, it certainly wouldn't have succeeded after that
13:07 <fungi> 2021-10-04 15:16:14.471609 | centos-8 | OSError: [Errno 28] No space left on device
13:09 <afaranha> fungi, we thought that swift was retrying to create containers many times and that's why it was timing out
13:10 *** akahat|afk is now known as akahat
13:10 <afaranha> but it's passing on the local server
13:10 <fungi> afaranha: how much disk space do you have locally? more than the 60gb or so which tends to be available for file creation on our test nodes?
13:11 <afaranha> do you know how we can troubleshoot this? how many containers did it try to create?
13:11 <afaranha> fungi, 40GB, 2 cores, 4GB RAM
13:13 <afaranha> fungi, after the tests: 178M  ./.tox
13:15 <fungi> i think swift's tests create files outside .tox though
13:16 <fungi> looking at https://zuul.opendev.org/t/openstack/build/c1c1a44dcb7d42e08d16f7e929e00378/log/zuul-info/zuul-info.centos-8.txt#100 the job started with 25gb on its rootfs
13:17 <fungi> normally, devstack would format and mount the ephemeral disk on the nodes in that provider at /opt, but i'm not seeing evidence of it doing that in the log
13:20 <fungi> looking at the log, it creates a 1gb file and formats it xfs to mount on a loop device as a scratch space for xattr tests
13:21 <fungi> oh! i think that may be where the "No space left on device" error is actually coming from, it seems like it may have run out of available space in that 1gb xfs filesystem it created
13:22 <fungi> the exception is raised in swift/obj/diskfile.py from write_metadata calling to xattr
13:23 <fungi> so i have a feeling that's where it was writing
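[Context: the code path fungi is pointing at packs a pickled metadata dict into numbered extended attributes on the object's file descriptor. The following is a simplified sketch of that pattern, not swift's exact code; the key name and 64KiB chunk size are illustrative. It shows why a full 1gb xfs scratch filesystem can surface as ENOSPC from an xattr write rather than from an ordinary file write:]

    import pickle

    import xattr  # extended-attribute bindings of the kind swift uses

    METADATA_KEY = 'user.swift.metadata'  # illustrative key name

    def write_metadata(fd, metadata, xattr_size=65536):
        # Serialize the metadata dict and spread it across numbered xattr
        # keys, xattr_size bytes per key. xfs stores extended attributes in
        # the filesystem itself, so once the scratch fs is full, setxattr
        # fails with OSError: [Errno 28] No space left on device -- the
        # error quoted from the failing job log above.
        metastr = pickle.dumps(metadata, protocol=2)
        key = 0
        while metastr:
            xattr.setxattr(fd, '%s%s' % (METADATA_KEY, key or ''),
                           metastr[:xattr_size])
            metastr = metastr[xattr_size:]
            key += 1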
13:28 <ade_lee> fungi, frickler that's interesting to hear about other folks looking to have fips-compliant deployments
13:29 <ade_lee> fungi, there are a bunch of patches out there right now for both unit and tempest tests
13:30 <ade_lee> fungi, these are some of them -- https://review.opendev.org/q/topic:%22add_fips_job%22+(status:open%20OR%20status:merged)
13:31 <afaranha> fungi, would the fix be to increase this file size?
13:32 <ade_lee> fungi, the issue with tempest is that tempest uses paramiko, and paramiko uses some md5 there. To get some results, I ended up patching paramiko to allow the md5, so most of the tests there depend on having a hacked paramiko. But there are some other patches out there where I'm trying to replace paramiko in tempest with libssh
13:33 <ade_lee> fungi, https://review.opendev.org/c/openstack/tempest/+/806274
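[Context: the md5 problem ade_lee describes is the usual FIPS failure mode in Python. With the host in FIPS mode, OpenSSL rejects MD5 as a security digest, so hashlib.md5() raises unless the caller can mark the use as non-cryptographic. A rough illustration follows; the usedforsecurity keyword is only accepted on Python 3.9+ or on distro-patched interpreters such as RHEL/CentOS 8's python3.6, and fingerprint() is just a hypothetical caller:]

    import hashlib

    def fingerprint(data: bytes) -> str:
        # On a FIPS-enabled host, plain hashlib.md5() raises ValueError
        # because OpenSSL refuses MD5 for security purposes.
        try:
            digest = hashlib.md5(data, usedforsecurity=False)
        except TypeError:
            # Older/unpatched interpreters don't accept the keyword; this
            # unguarded call is what breaks under FIPS, which is why a
            # hacked paramiko (or replacing it) was needed for tempest.
            digest = hashlib.md5(data)
        return digest.hexdigest()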
13:34 <ade_lee> fungi, I think maybe I need to attend the security SIG session next week to discuss fips
13:46 <fungi> ade_lee: that would be great!
13:46 <ade_lee> afaranha, one thing I remember doing - but I might be misremembering - is also running the swift encryption job without fips enabled - and that job passed. We should probably do that again just to make sure. That would confirm that there really is something going on with fips and not vm sizes
13:46 <fungi> afaranha: i don't know for sure that will fix it, the xfs filesystem running out of space could just be masking other problems as well, but it might at least fix whichever tests are hanging
13:47 <ade_lee> afaranha, that is running the encryption test using the centos nodeset
13:47 <fungi> ade_lee: on closer inspection i don't think it's complaining about the vm size, but rather running out of space on the file-backed 1gb xfs loop fs the swift jobs create for the xattr tests
13:48 <fungi> i'm not sure why that would only happen in the fips job, but maybe swift needs more space when using fips-compliant hashes?
13:48 <afaranha> ade_lee, I recreated the env today, and re-ran without fips and with fips
13:48 <afaranha> all passed
13:50 <ade_lee> fungi, afaranha hmm .. so where is the file size defined in the test? we could try bumping it to 1.5G ..
13:50 <fungi> it's also possible filling up that xfs fs is merely a symptom of some other failure, causing a test to retry creating files in there over and over or something
13:51 <fungi> i'm not really familiar with swift's jobs, i was mainly going by what i saw in the job-output.txt from the failing build, but can try to find where it's set
13:53 <ade_lee> afaranha, let's do a run where we do the test on the centos-8-stream nodeset without fips enabled -- at least we can confirm then that there is something fips related.
13:56 <afaranha> ade_lee, I actually don't know how to check this
13:56 <ade_lee> afaranha, ok - let me take a quick look and we can sync
13:56 <afaranha> ade_lee, ack, I will test without fips enabled again
13:57 <afaranha> ade_lee, sorry, I'm going to the glance meeting in 3min, I need to talk about the bindep.txt
13:57 <ade_lee> afaranha, sounds good
13:58 <afaranha> ade_lee, btw, shouldn't we have some CIs running on centos upstream as well?
14:01 <fungi> we do have jobs running on centos-7, centos-8, centos-8-stream, and there are centos-9-stream images in the works
14:02 <ade_lee> afaranha, so I think there is already a swift job defined there -- swift-tox-func-encryption-py36-centos-8 which presumably passes
14:03 <ade_lee> afaranha, actually we're running it in https://review.opendev.org/c/openstack/swift/+/796057 and we see it succeeds when fips is not enabled and fails when fips is enabled
14:21 *** ykarel is now known as ykarel|away
14:26 <afaranha> ade_lee, fungi I was asking mostly because of the glance issue
14:26 <fungi> which glance issue?
14:27 <afaranha> fungi, https://review.opendev.org/c/openstack/glance/+/790536/28
14:28 <afaranha> the test was failing because it needs qemu-img installed to run, not fips related
14:28 <fungi> ahh, okay
14:29 <fungi> i guess qemu-img isn't packaged for centos?
14:30 <afaranha> fungi, currently not. I'll check whether the previous version had this issue, the glance team wants to backport it to victoria (I'm not sure yet if victoria uses centos7)
14:30 <afaranha> https://bugs.launchpad.net/glance/+bug/1947146
14:31 <afaranha> fungi, for ubuntu it installs the qemu and qemu-utils packages, but for centos there's nothing https://github.com/openstack/glance/blob/master/bindep.txt#L23-L24
14:31 <afaranha> s/ubuntu/dpkg/  s/centos/rpm/
14:32 <fungi> in nodepool we use qemu-img to perform image conversions (build raw, make qcow2 and vhd from that) so we can upload the "same" image to multiple providers who have different hypervisors with their own image format requirements, but we use a forked qemu-img based on the ubuntu package
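[Context: the conversion step fungi describes is the generic qemu-img workflow; the exact nodepool/diskimage-builder invocations may differ, and the file names here are placeholders:]

    # build the image once in raw, then derive provider-specific formats
    qemu-img convert -f raw -O qcow2 image.raw image.qcow2
    qemu-img convert -f raw -O vpc image.raw image.vhd   # 'vpc' is qemu's name for the vhd format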
14:37 <afaranha> fungi, what's the nodepool?
14:38 <fungi> afaranha: a component of zuul: https://zuul-ci.org/docs/nodepool/
14:38 <fungi> it's the part of its service fleet which does resource management and assignment
14:39 <fungi> (build/upload/replace virtual machine images, boot/delete test nodes)
14:40 <fungi> though the use of qemu-img is via diskimage-builder
14:43 <afaranha> fungi, that's a bit confusing
14:44 <afaranha> the test was running qemu-img manually, maybe that's why we didn't see this issue on other tests...
14:45 <fungi> sorry, i was just pointing out that while we do use qemu-img in production, we do so with a (forked) version of the ubuntu packages, not on centos
14:45 <afaranha> fungi, https://github.com/openstack/glance/blob/master/glance/tests/functional/v2/test_images.py#L906-L912
14:48 <fungi> afaranha: yeah, if you're trying to call qemu-img create on a system where it's not installed, then you'll probably not have much luck: https://opendev.org/openstack/glance/src/branch/master/bindep.txt#L24
14:49 <fungi> the jobs will install it on dpkg-based platforms like ubuntu or debian based on that line in the bindep.txt file
14:49 <fungi> you'd need to add a similar line for the name of whatever package in centos provides the qemu-img command (if there is one)
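[Context: a hypothetical bindep.txt addition along the lines fungi suggests. The dpkg lines approximate what glance already lists (the real entries may carry extra bindep profiles); qemu-img is the usual CentOS/RHEL package providing the command, but the name should be verified for the target release:]

    qemu [platform:dpkg]
    qemu-utils [platform:dpkg]
    qemu-img [platform:rpm]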
15:03 *** bhagyashris|mtg is now known as bhagyashris|away
15:45 *** ysandeep is now known as ysandeep|dinner
16:00 *** ysandeep|dinner is now known as ysandeep
16:37 *** dpawlik5 is now known as dpawlik
16:39 *** ysandeep is now known as ysandeep|out
16:42 *** jpena is now known as jpena|off
17:09 *** sboyron_ is now known as sboyron
17:10 *** sboyron is now known as Guest2897
17:36 *** sboyron__ is now known as sboyron_
18:21 *** sboyron_ is now known as sboyron
18:47 <opendevreview> Merged openstack/devstack-gate master: Cap PyYAML to stay compatible with our code  https://review.opendev.org/c/openstack/devstack-gate/+/813958
18:48 <clarkb> frickler: ^ thank you for jumping on that
18:51 <fungi> oh, sorry i meant to review that earlier. but yes thanks!
18:58 <frickler> yeah, that one seemed easy enough to do a quick fix instead of telling people to finally stop using ds-gate right now
18:58 <frickler> also it was the patch that triggered me to look at the zuul issues when it didn't have results after 2h
19:00 <fungi> right, people thankfully *have* stopped using devstack-gate for the latest openstack releases, but older stable branches of some projects are still using it
19:01 <fungi> dropping current use of d-g took about 2 years longer than we had hoped it would, but i'm okay with where we're finally at with it
21:30 *** dviroel|rover is now known as dviroel|rover|afk
22:20 *** rlandy|ruck is now known as rlandy|ruck|bbl
