Tuesday, 2021-10-19

[00:50] *** dviroel|rover|afk is now known as dviroel|out
[03:51] *** vishalmanchanda_ is now known as vishalmanchanda
[05:02] *** ysandeep|out is now known as ysandeep
[05:20] *** ykarel_ is now known as ykarel
[05:36] *** bshephar is now known as bshephar|brb
[06:30] *** ysandeep is now known as ysandeep|afk
[06:58] *** ysandeep|afk is now known as ysandeep|trng
[07:32] *** jpena|off is now known as jpena
[08:48] *** ykarel is now known as ykarel|lunch
[10:12] <afaranha> fungi, hi, regarding the timing out test on the CI, could you save the node and give access to it? https://zuul.opendev.org/t/openstack/build/8fefe2da3d754c9484f2cdd2090eb484
[10:14] <opendevreview> Pierre Riteau proposed openstack/project-config master: [kolla] Preserve Backport-Candidate and Review-Priority scores  https://review.opendev.org/c/openstack/project-config/+/814548
[10:34] *** rlandy is now known as rlandy|ruck
[10:34] *** ykarel|lunch is now known as ykarel
[10:47] *** jcapitao is now known as jcapitao_lunch
[11:06] *** dviroel|out is now known as dviroel|rover
[11:26] *** jpena is now known as jpena|lunch
[12:10] *** jcapitao_lunch is now known as jcapitao
[12:15] *** jpena|lunch is now known as jpena
[12:56] *** ysandeep|trng is now known as ysandeep
[13:21] <fungi> afaranha: i've set an autohold for that job and rechecked your change
[13:21] <fungi> once it fails again, let me know the ssh key you want granted access to the job node
[13:23] *** bshephar|brb is now known as bshephar
[13:24] <afaranha> fungi, sure, thanks :)
[13:27] <afaranha> we have 2 approaches here, one is to identify the test that timed out, and the second one is to increase the size of the of the file created that is mounted for the test, this second one I don't quite get it yet
[13:27] <opendevreview> Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement  https://review.opendev.org/c/openstack/project-config/+/814580
[13:28] <fungi> afaranha: yeah, also the build does archive the subunit stream from the tests, i don't know if you've tried analyzing it
[13:28] <fungi> it's presumably missing whichever test(s) timed out or otherwise didn't get run
[13:32] <afaranha> fungi, I was checking the job-output.txt, the tox folder has only the requirements it seems
[13:33] <afaranha> do you mean something else?
[13:35] <fungi> the tempfile which is included in the top level logs list is the subunit stream
[13:35] <fungi> normally it gets postprocessed to create the test report
[13:36] <fungi> but in cases where the job could not complete it's still available for retrieval and analysis
[13:49] <afaranha> fungi,  so the last entry there was: test: test.functional.s3api.test_object.TestS3ApiObject.test_put_object_underscore_in_metadata
[13:49] <afaranha> that means this was the last test is run, so the next one is the one that timed out?
[13:54] <fungi> not necessarily, the tests are likely run in parallel, so determining a sequence could be hard. you could compare the names of the completed tests to the list of tests you expected to be run and see which ones are missing or don't indicate they finihed
[13:54] <fungi> finished
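fungi's suggestion (diff the tests that finished against the tests that were expected) is easy to sketch in Python. This assumes the subunit stream has already been decoded into `(test_id, status)` pairs and that the list of expected test ids is known; both inputs and the function name `unfinished_tests` are hypothetical. The status names are the subunit v2 ones, which `testtools.StreamResult` also uses.

```python
from typing import Iterable, List, Tuple

# Statuses that mark a test as finished in a subunit v2 stream
# (the same names testtools.StreamResult uses).
FINAL_STATUSES = {"success", "uxsuccess", "skip", "fail", "xfail"}


def unfinished_tests(
    expected: Iterable[str], events: Iterable[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Split expected tests into (started but never finished, never started).

    `events` is assumed to be (test_id, status) pairs already decoded from
    the subunit stream; decoding the binary stream is out of scope here.
    """
    started = set()
    finished = set()
    for test_id, status in events:
        if status in FINAL_STATUSES:
            finished.add(test_id)
        elif status == "inprogress":
            started.add(test_id)

    hung: List[str] = []
    not_run: List[str] = []
    for test_id in expected:
        if test_id in finished:
            continue
        if test_id in started:
            # Started with no final status: the likeliest timeout candidate.
            hung.append(test_id)
        else:
            not_run.append(test_id)
    return hung, not_run


if __name__ == "__main__":
    expected = ["test_a", "test_b", "test_c"]
    events = [("test_a", "inprogress"), ("test_a", "success"), ("test_b", "inprogress")]
    print(unfinished_tests(expected, events))  # (['test_b'], ['test_c'])
```

A test reported as `inprogress` with no final status is the strongest timeout candidate; the `not_run` list covers everything the runner never reached.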
[14:00] <gibi> #nova next Oslo-Nova cross project session from 14:00
[14:00] <gibi> #nova now Oslo-Nova cross project session from 14:00
[14:01] <gibi> #nova now Oslo-Nova cross project session: oslopolicy-sample-generator extensions
[14:02] <gibi> #nova next break
[14:09] <frickler> gibi: EWIN
[14:10] <gibi> ups
[14:10] <gibi> sorry
[14:15] *** lbragstad_ is now known as lbragstad
[14:22] *** ykarel_ is now known as ykarel
[14:36] <afaranha> fungi, thanks, I'll try to isolate it. For the second approach, increase the size of the file, can you point me to where this is done and how can I change it?
[14:38] <fungi> afaranha: it's somewhere in swift's job definitions i think, probably search their repo for xattr or xfs or mkfs
[14:48] <opendevreview> Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement  https://review.opendev.org/c/openstack/project-config/+/814597
[14:53] <timburke_> afaranha, fairly certain it's this: https://opendev.org/openstack/swift/src/branch/master/tools/test-setup.sh#L7-L14
[14:56] <timburke_> it's also worth noting that we *do* run our functional tests serially (because they share a user and do things like upload a few objects then check that account stats match) -- see https://opendev.org/openstack/swift/src/branch/master/.functests#L10
[15:05] <opendevreview> Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement  https://review.opendev.org/c/openstack/project-config/+/814580
[15:05] <opendevreview> Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement  https://review.opendev.org/c/openstack/project-config/+/814597
[15:06] <opendevreview> Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/814600
[15:07] <afaranha> fungi, timburke_ thanks for the help, I'll try to increase the size as soons as I find time today or tomorrow (quite a busy week)
[15:08] <fungi> yeah, as to why it would run out of space in that file, i have a feeling just increasing the side won't make any difference and that there's some test running away filling it up no matter how large you make it
[15:08] <fungi> s/side/size/
[15:10] <afaranha> fungi,  right, so I'll get have to find and check the timed  out test first to see what it's trying to do
[15:10] <timburke_> fwiw, i'd be somewhat shocked if it was really setting the xattrs that pushed it over the edge -- i would've expected the data itself to be doing it. there might be something funky going on with max xattr sizes in fips mode
[15:11] <fungi> yeah, hopefully once we've got a held node for the failing job, the reasons will become more obvious
[15:14] <timburke_> we have some checks for large xattr support (https://opendev.org/openstack/swift/src/branch/master/test/unit/__init__.py#L1265-L1302) but i'm willing to bet we don't have appropriate skips in *all* the right places
[15:27] <fungi> afaranha: what ssh key do you want granted access?
[15:29] <afaranha> fungi,  https://paste.openstack.org/raw/810081/
[15:30] <opendevreview> Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/814600
[15:33] <fungi> afaranha: ssh root@137.74.28.172
[15:34] <timburke_> speaking of gate jobs, i wouldn't mind getting some thoughts on https://review.opendev.org/c/zuul/zuul-jobs/+/795419 and/or https://review.opendev.org/c/openstack/project-config/+/794351 -- currently, pyeclib's test-release-openstack job keeps failing because it tries to build wheels despite not having all the deps it'd need to build binaries
[15:35] <fungi> fwiw, i mounted /home/zuul/1G_xfs_file to /home/zuul/xfstmp/ on that node and it's only 5% space used, 1% inodes used
[15:35] <fungi> so it's not full now, even if it was at some point
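The space and inode percentages fungi reports for the mounted file can also be read from Python via `os.statvfs`. A minimal sketch for a POSIX system; `usage_percent` is a hypothetical helper name, and `df` computes its Use% column slightly differently, so expect small discrepancies.

```python
import os
from typing import Tuple


def usage_percent(path: str) -> Tuple[float, float]:
    """Return (percent of blocks used, percent of inodes used) for the
    filesystem that contains `path`."""
    st = os.statvfs(path)
    blocks_used = st.f_blocks - st.f_bfree
    inodes_used = st.f_files - st.f_ffree
    space_pct = 100.0 * blocks_used / st.f_blocks if st.f_blocks else 0.0
    # Some filesystems report f_files == 0 (no fixed inode table).
    inode_pct = 100.0 * inodes_used / st.f_files if st.f_files else 0.0
    return space_pct, inode_pct
```

On the held node this would be called as `usage_percent("/home/zuul/xfstmp")`; polling it while the functional tests run would show whether some test keeps filling the file up.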
[15:35] <fungi> timburke_: looking
[15:36] <timburke_> thanks! seems to me we can either skip building the wheel (it's not been getting uploaded anyway) or install bin-deps -- i don't care much which way it goes
[15:37] <fungi> timburke_: i think skipping the wheel builds is best. you won't be able to upload platform-specific linux wheels to pypi anyway, you need special build environments for the "manylinux" meta-architectures
[15:40] <timburke_> makes sense
[15:42] <timburke_> afaranha, you might try cd'ing into the xfs mount and running something like `touch t && xattr -w user.test $(python -c "print(4097*'x')") t`
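timburke_'s one-liner writes a 4097-byte `user.test` attribute with the `xattr` CLI. The same check can be done with the standard library's `os.setxattr`/`os.getxattr`, which avoids needing the tool from the tox venv. This is a Linux-only sketch and `probe_user_xattr` is a hypothetical name. Failures come back as an errno, so a full filesystem would show up as `errno.ENOSPC` (28, the error in the job log), while a filesystem without `user.*` xattr support typically reports `ENOTSUP`.

```python
import os
from typing import Optional, Tuple


def probe_user_xattr(path: str, size: int, name: str = "user.test") -> Tuple[bool, Optional[int]]:
    """Store a `size`-byte xattr on `path` and read it back (Linux only).

    Returns (True, None) if the value round-trips, or (False, errno) if
    setxattr/getxattr raised an OSError.
    """
    value = b"x" * size  # same payload as print(4097*'x') in the CLI version
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name) == value, None
    except OSError as exc:
        return False, exc.errno
```

Create a scratch file inside `/home/zuul/xfstmp` first, then call e.g. `probe_user_xattr("/home/zuul/xfstmp/t", 4097)`.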
[15:47] <opendevreview> Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/814600
[15:52] <clarkb> fungi: timburke_ note there is some manylinux build work done for cryptography on our zuul installation. Though I think the package itself has to be fairly aware too and statically link libs that aren't part of manylinux
[15:53] <fungi> right, would need to bundle eclib, presumably
[15:53] <fungi> (at a minimum)
[15:53] <timburke_> yup
[15:54] <fungi> working toward having manylinux* wheels for pyeclib might be an interesting exercise, but probably best to not have the tarball release uploads broken until that can be added
[15:54] <timburke_> it'd be cool to have liberasurecode statically linked, but i don't know that anyone's interested enough to dig into what it would take to do it
[15:54] <timburke_> fungi, exactly -- the really frustrating thing it that it's blocking me from even updating the README to reflect the freenode -> OFTC change
[15:54] <fungi> i think it's just a matter of copying its .so into the right path in the wheel, but i've not tried it
[15:56] <fungi> timburke_: yep, i think if we can get tristan to confirm that change isn't breaking software foundary, we should be able to go ahead and merge that flag
[15:56] <fungi> i've asked him in the zuul matrix channel
[15:58] <afaranha> timburke_,  ack, I just need to find the venv to source it
[16:00] <fungi> afaranha: no need to source a venv for that, just cd to /home/zuul/xfstmp where it's mounted
[16:00] <afaranha> fungi,  yea, but I can't use python
[16:00] <afaranha> oops, python3 my bad
[16:01] <afaranha> but no xattr
[16:01] <afaranha> let me see what it does...
[16:02] <fungi> might need to yum install it?
[16:02] <fungi> er, dnf now
[16:03] <fungi> oh, i see, it is a python-based tool, so yeah it's probably installed in the tox venv
[16:04] <fungi> afaranha: the tox venv is /home/zuul/src/opendev.org/openstack/swift/.tox/func-encryption-py3
[16:04] <fungi> you can just directly invoke /home/zuul/src/opendev.org/openstack/swift/.tox/func-encryption-py3/bin/xattr in that case
[16:04] <fungi> no need to activate the venv to do that
[16:05] <afaranha> right, just a minute
[16:07] <afaranha> fungi, I run it, no issue happened, but what was I supposed to see?
[16:08] <fungi> potentially a traceback with "OSError: [Errno 28] No space left on device" like in the job log
[16:08] <afaranha> no, no issue
[16:08] <afaranha> I checked the file with xattr -l t, and the attribute is there
[16:11] <fungi> in /var/log/messages at the same time as the failure in the job output i also see this:
[16:11] <fungi> Oct 19 13:35:05 centos-8-ovh-gra1-0027003455 wsgi-server[1826]: ERROR Insufficient Storage 127.0.0.1:41647/sdb1 (txn: tx2eef682df47b4737af575-00616ec989)
[16:16] <fungi> afaranha: https://paste.opendev.org/show/810084 that's the transaction which failed on the insufficient space error, i think
[16:17] <fungi> includes the syslogged entries from both swift and wsgi-server
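The transaction id in that log line is useful in two ways: as a grep key for every syslog entry belonging to the same request (like the swift and wsgi-server entries in fungi's paste), and as a timestamp. Assuming the part after the dash is the request time as a hexadecimal Unix timestamp, `00616ec989` decodes to 2021-10-19 13:35:05 UTC, which lines up with the syslog time. A hedged sketch; the regex and both function names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Assumed transaction-id shape: "tx" + hex chars, a dash, then a 10-digit
# hex Unix timestamp (anything after the timestamp is ignored here).
TXN_RE = re.compile(r"\btx[0-9a-f]+-([0-9a-f]{10})")


def txn_time(text: str) -> datetime:
    """Decode the (assumed) hex timestamp of the first transaction id in `text`."""
    match = TXN_RE.search(text)
    if match is None:
        raise ValueError(f"no transaction id in {text!r}")
    return datetime.fromtimestamp(int(match.group(1), 16), tz=timezone.utc)


def lines_for_txn(lines: Iterable[str], txn_id: str) -> Iterator[str]:
    """Yield every log line that mentions `txn_id`, without trailing newline."""
    for line in lines:
        if txn_id in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    line = ("Oct 19 13:35:05 centos-8-ovh-gra1-0027003455 wsgi-server[1826]: ERROR "
            "Insufficient Storage 127.0.0.1:41647/sdb1 (txn: tx2eef682df47b4737af575-00616ec989)")
    print(txn_time(line))  # 2021-10-19 13:35:05+00:00
```

On the node itself, `lines_for_txn(open("/var/log/messages"), "tx2eef682df47b4737af575-00616ec989")` would collect the same entries across services.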
[16:17] <afaranha> fungi,  shouldn't we see the sdb1 partition on this server?
[16:18] <afaranha> "ERROR Insufficient Storage 127.0.0.1:41647/sdb1"
[16:18] <fungi> i have no idea if that sdb1 is an actual block device partition on the node or something that's been virtualized in the functest suite
[16:19] <fungi> i'm not super familiar with swift's functional testing
[16:19] <timburke_> most likely it's something faked up for the test. i can double check
[16:22] <timburke_> yeah, those are basically just made up names: https://opendev.org/openstack/swift/src/branch/master/test/functional/__init__.py#L510-L521
[16:22] <clarkb> most of our clouds give us xvd* and vd* type devices
[16:25] <opendevreview> Merged openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/814600
[16:27] *** jpena is now known as jpena|off
[16:29] *** ysandeep is now known as ysandeep|out
[16:37] <timburke_> afaranha, might also try a higher value than 4097 -- we expect to be able to go as high as 64k: https://opendev.org/openstack/swift/src/branch/master/swift/obj/diskfile.py#L252-L268
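Rather than trying sizes one at a time, the largest single xattr value the mount accepts can be found by bisection. A sketch assuming Linux `os.setxattr` and that acceptance is monotonic in size (if N bytes fit, anything smaller fits). The upper bound of 65536 matches the 64k timburke_ mentions and the Linux per-value cap (`XATTR_SIZE_MAX`). The function and attribute names are hypothetical, and this probes a single attribute, which is not necessarily how Swift lays out its metadata.

```python
import os


def max_xattr_value(path: str, name: str = "user.probe", upper: int = 65536) -> int:
    """Largest value size in [0, upper] that setxattr accepts on `path`.

    Assumes acceptance is monotonic in size; returns 0 if even a
    1-byte value is rejected.
    """
    def fits(size: int) -> bool:
        try:
            os.setxattr(path, name, b"x" * size)
            return True
        except OSError:
            return False

    # Invariant: every size <= lo fits (0 is treated as fitting),
    # and every size > hi is known not to fit.
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1

    try:
        os.removexattr(path, name)  # clean up the probe attribute
    except OSError:
        pass  # nothing was ever stored
    return lo
```

On a healthy XFS mount one would expect the full 65536; a noticeably smaller result on the FIPS node would support the hunch that something is off with max xattr sizes there.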
[16:38] <timburke_> and the encryption job is definitely going to want to write more metadata per object, which could explain why it fails while the erasure-coding job succeeds
[16:41] <clarkb> timburke_: out of curiousity is the semi recent update to linux xfs drivers to make xfs Y2038 safe something that is on swifts radar? I have no idea what that actually entails from an operation standpoint. I guess tell everyone to upgrade to linux >=5.10 before 2038 and do whatever xfs udpate is necessary?
[16:43] <fungi> the clock is ticking, only 19 years left!
[16:48] <timburke_> it's something we'll definitely need to think about, but i don't think anyone's started on it yet. presumably, operators would start formatting all new disks to be Y2038-safe; normal disk failures and hardware deprecation cycles would mostly ensure that you're ready by the time it really matters
[16:50] <timburke_> might make some difference in our RAM requirements (those inodes must've gotten a little bigger), but RAM's getting denser and denser anyway
[16:50] <clarkb> timburke_: ya and you've got plenty of time. It came up in the context of building centos-9 images because centos-8 and ubuntu focal can't mount the xfs that centos-9 produces by default. Turns out the y2038 fix is the reason. Definitely seems like somethign worht backporting but maybe not with that many years ahead of it
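For context on the Y2038 discussion: the limit is where a signed 32-bit count of seconds since the Unix epoch runs out, which is how XFS stored inode timestamps before its "bigtime" format. A quick sketch of the arithmetic in plain Python (nothing XFS-specific; the helper names are hypothetical). One second past the maximum, a signed 32-bit field wraps to its most negative value, which lands in 1901.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def as_int32(seconds: int) -> int:
    """Wrap an integer the way a signed 32-bit field would."""
    return (seconds + 2**31) % 2**32 - 2**31


def int32_time(seconds: int) -> datetime:
    """UTC time represented by `seconds` after squeezing it into int32."""
    return EPOCH + timedelta(seconds=as_int32(seconds))


if __name__ == "__main__":
    print(int32_time(INT32_MAX))      # 2038-01-19 03:14:07+00:00
    print(int32_time(INT32_MAX + 1))  # 1901-12-13 20:45:52+00:00
```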
[16:51] <timburke_> good to know!
[16:53] <fungi> i suppose that could be a problem for scenarios like rdb or other shared block devices if one of the systems is centos-9 and another is centos-8, like during upgrades
[17:00] -opendevstatus- NOTICE: Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again
[19:39] <ade_lee> fungi, timburke_ hey I saw you set an autohold on the swift fips job that was failing.  Did you guys figure out what was going on?
[19:40] <fungi> ade_lee: afaranha was looking into it
[19:40] <ade_lee> fungi, ok - did he get his keys on there?
[19:40] <fungi> yep
[19:41] <ade_lee> ok cool -- I expect I'll hear from him tomorrow then
[19:42] <fungi> i poked at it a little and confirmed swift and wsgi-server logged a test related transaction was rejected with a 503 due to insufficient space somewhere, but it's not entirely clear where
[19:43] <ade_lee> fungi, ok - hopefully with a system there we'll be able to see whats going on.  do you want to discuss how want to handle https://review.opendev.org/c/zuul/zuul-jobs/+/813253 now?
[19:43] <ade_lee> dviroel|rover, ^^
[19:44] <fungi> ade_lee: the opendev meeting is in progress, and after that i need to work on dinner prep, but can probably start revisiting it
[19:45] <clarkb> I think you should parent the multinode job and then add in your own enable-fips rather than making the multinode job have a flag for fips
[19:45] <clarkb> the two don't really ahve a relationship to each other so keeping them separate will reduce confusion
[19:45] <fungi> yeah, one of the suggestions i had was to do the fips setup before the multinode setup
[19:46] <fungi> because the reboot should happen as early as possible in order to not have to worry about stateless setup steps getting undone by the reboot
[19:46] * dviroel|rover reads
[19:46] <clarkb> fungi: ah yup. But that should be doable. Have a fips job and then layer on top of that
[19:47] <fungi> since zuul wraps pre-run and post-run playbooks like an onion based on order of inheritance, it implies having some job which uses the multinode bits parented to a job with the fips setup bits
[19:48] <clarkb> ya I think that is fine too as you can do that to the side. But if you stick it directly in the role it becomes a part of that specific interface which is weird to me
[19:48] <fungi> or just having one pre-run playbook which includes the fips role first and then the multinode role
[19:48] <clarkb> basically have a fips job, them a multinode-fips that inherits from fips
[19:48] <clarkb> but the multinode role interface itself isn't change
[19:48] <clarkb> *isn't changed
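The "onion" fungi describes can be modelled in a few lines. This is a simulation, not Zuul code: it assumes pre-run playbooks run from the top-most parent down to the child and post-run playbooks in the reverse order, which is why a `fips` parent's reboot would happen before any multinode setup in a `multinode-fips` child. The job names follow the proposal in the channel; the `Job` class and all playbook names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    parent: Optional["Job"] = None
    pre_run: List[str] = field(default_factory=list)
    run: List[str] = field(default_factory=list)
    post_run: List[str] = field(default_factory=list)


def lineage(job: Job) -> List[Job]:
    """Return the inheritance chain as [root parent, ..., job]."""
    chain: List[Job] = []
    current: Optional[Job] = job
    while current is not None:
        chain.append(current)
        current = current.parent
    return list(reversed(chain))


def execution_order(job: Job) -> List[str]:
    chain = lineage(job)
    # Pre-run wraps from the outside in: parents before children.
    pre = [pb for j in chain for pb in j.pre_run]
    # Assumed: the most-derived job that sets `run` supplies it.
    run = next((j.run for j in reversed(chain) if j.run), [])
    # Post-run unwraps from the inside out: children before parents.
    post = [pb for j in reversed(chain) for pb in j.post_run]
    return pre + run + post


if __name__ == "__main__":
    fips = Job("fips", pre_run=["enable-fips-and-reboot.yaml"], post_run=["fips-post.yaml"])
    multinode_fips = Job(
        "multinode-fips",
        parent=fips,
        pre_run=["multinode-setup.yaml"],
        run=["run-tests.yaml"],
        post_run=["collect-logs.yaml"],
    )
    for step in execution_order(multinode_fips):
        print(step)
```

fungi's alternative (one pre-run playbook that includes the fips role and then the multinode role) gives the same ordering inside a single job, at the cost of not having a reusable `fips` parent.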
[19:50] <ade_lee> fungi, clarkb ok - so what I'm hearing is add a fips job and a mutinode fips job to zuul.d/general-jobs.yaml
[19:50] <ade_lee> where multinode-fips depends on the fips job
[19:50] <clarkb> ya I think that is better than modifying the multinode role which is intended to be a bit more self contained
[19:50] <fungi> right
[19:51] <ade_lee> dviroel|rover, ^^ think this could work for us?
[19:51] <fungi> or you can have a multinode-fips job which directly includes the fips and then multinode roles in that order, i think
[19:51] <dviroel|rover> yeah, make sense for me, to have another multinode-fips and not mixing things
[19:52] <ade_lee> cool - well that was easy :)
[19:52] <dviroel|rover> ade_lee: yes, i can work on that in a couple of mins
[19:53] <ade_lee> cool thanks -- fungi , clarkb - we'll let you know when we have some results
[19:54] <fungi> great! don't hesitate to ask if you have more questions. happy to answer them when i can, though some will come down to experimentation
[19:56] <ade_lee> yup will do - especially if it doesn't work :)
[20:59] *** dviroel|rover is now known as dviroel|rover|afk
[22:20] <opendevreview> Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/814677
[22:22] <opendevreview> Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/814677
[22:56] <opendevreview> Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/814677
[23:11] <opendevreview> James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific  https://review.opendev.org/c/openstack/project-config/+/814683
[23:13] <opendevreview> James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific  https://review.opendev.org/c/openstack/project-config/+/814683
[23:17] <opendevreview> James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB  https://review.opendev.org/c/openstack/project-config/+/814683
