Thursday, 2019-06-13

openstackgerritTristan Cacqueray proposed zuul/nodepool master: static: enable using a single host with different user or port  https://review.opendev.org/65920900:01
*** paladox has quit IRC00:02
*** paladox has joined #zuul00:02
*** paladox has quit IRC00:02
*** paladox has joined #zuul00:02
*** jamesmcarthur has joined #zuul00:07
*** paladox has quit IRC00:08
*** paladox has joined #zuul00:08
*** paladox has quit IRC00:09
*** paladox has joined #zuul00:10
*** jamesmcarthur has quit IRC00:49
*** jamesmcarthur has joined #zuul00:50
*** olaph has joined #zuul00:54
*** jamesmcarthur has quit IRC00:55
*** rf0lc0 has joined #zuul00:57
*** mattw4 has quit IRC01:05
*** jamesmcarthur has joined #zuul01:20
*** michael-beaver has joined #zuul01:38
*** rlandy|ruck|bbl is now known as rlandy|ruck02:06
*** rf0lc0 has quit IRC02:13
*** jamesmcarthur has quit IRC02:15
*** jamesmcarthur has joined #zuul02:15
*** jamesmcarthur has quit IRC02:18
*** jamesmcarthur has joined #zuul02:20
*** rlandy|ruck has quit IRC02:32
*** jamesmcarthur has quit IRC02:33
*** jamesmcarthur has joined #zuul02:33
*** bhavikdbavishi has joined #zuul02:48
openstackgerritMerged zuul/zuul master: Update quickstart nodepool node to python3  https://review.opendev.org/65848602:52
*** bhavikdbavishi has quit IRC02:53
*** jamesmcarthur has quit IRC03:01
*** jamesmcarthur has joined #zuul03:02
*** bhavikdbavishi has joined #zuul03:03
*** jamesmcarthur has quit IRC03:04
*** jamesmcarthur has joined #zuul03:04
openstackgerritPaul Belanger proposed zuul/zuul master: Add more test coverage on using python-path  https://review.opendev.org/65981203:31
*** jamesmcarthur has quit IRC03:43
*** jamesmcarthur has joined #zuul03:43
*** michael-beaver has quit IRC03:48
*** jamesmcarthur has quit IRC03:58
*** igordc has joined #zuul04:07
*** bhavikdbavishi has quit IRC04:19
*** bhavikdbavishi has joined #zuul04:20
*** swest has joined #zuul04:25
*** swest has quit IRC04:31
*** sanjayu_ has joined #zuul04:43
*** swest has joined #zuul04:45
*** pcaruana|afk| has joined #zuul05:01
*** pcaruana|afk| has quit IRC05:04
*** pcaruana has joined #zuul05:05
*** badboy has joined #zuul05:08
*** spsurya has joined #zuul06:07
*** zbr|flow is now known as zbr|ooo06:11
*** gtema has joined #zuul06:34
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213407:00
*** ianychoi has joined #zuul07:02
*** igordc has quit IRC07:15
*** jangutter has joined #zuul07:31
*** jpena|off is now known as jpena07:43
ofososIs there any way I can tell the executor what SSH key to use? In theory Bitbucket has an API for uploading SSH keys and I would like to use that to upload the Zuul key to Bitbucket.07:47
*** themroc has joined #zuul08:08
tristanCofosos: for gerrit, there is an sshkey option that can be set per connection in the zuul.conf08:09
ofosostristanC: i'll have a look08:15
ofososThanks08:15
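tristanC's pointer can be sketched as a zuul.conf fragment. This is a hypothetical example: the connection name, server, user, and key path are all made up, and only the gerrit driver's per-connection sshkey option is confirmed here — uploading the key to Bitbucket via its API would be a separate step.

```ini
# Hypothetical zuul.conf sketch: per-connection SSH key for a gerrit
# connection (names and paths are placeholders).
[connection mygerrit]
driver=gerrit
server=review.example.org
user=zuul
sshkey=/var/lib/zuul/.ssh/id_rsa
```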
*** gtema has quit IRC08:20
*** gtema has joined #zuul08:21
openstackgerritMatthieu Huin proposed zuul/zuul master: web: add tenant and project scoped, JWT-protected actions  https://review.opendev.org/57690708:38
*** hashar has joined #zuul08:41
*** themroc has quit IRC09:18
openstackgerritMatthieu Huin proposed zuul/zuul master: Allow operator to generate auth tokens through the CLI  https://review.opendev.org/63619709:20
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul CLI: allow access via REST  https://review.opendev.org/63631509:31
*** gtema has quit IRC10:28
*** gtema_ has joined #zuul10:28
*** gtema_ is now known as gtema10:28
*** bhavikdbavishi has quit IRC10:43
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213410:49
badboyany ideas what's causing this:10:52
badboyAttributeError: type object 'EllipticCurvePublicKey' has no attribute 'from_encoded_point'10:52
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213410:58
*** jpena is now known as jpena|lunch11:01
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213411:20
*** sshnaidm has quit IRC11:52
*** sshnaidm has joined #zuul11:56
*** bhavikdbavishi has joined #zuul12:03
*** rlandy has joined #zuul12:05
*** rlandy is now known as rlandy|ruck12:06
*** rf0lc0 has joined #zuul12:07
*** bhavikdbavishi has quit IRC12:08
*** bhavikdbavishi has joined #zuul12:15
*** spsurya has quit IRC12:18
*** jpena|lunch is now known as jpena12:40
*** pcaruana has quit IRC13:01
*** chandankumar is now known as raukadah13:01
*** rlandy|ruck is now known as rlandy|ruck|mtg13:02
*** sanjayu_ has quit IRC13:05
*** gtema has quit IRC13:26
*** bhavikdbavishi has quit IRC13:27
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213413:27
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul CLI: allow access via REST  https://review.opendev.org/63631513:34
openstackgerritMatthieu Huin proposed zuul/zuul master: Add Authorization Rules configuration  https://review.opendev.org/63985513:34
openstackgerritMatthieu Huin proposed zuul/zuul master: Web: plug the authorization engine  https://review.opendev.org/64088413:35
*** jamesmcarthur has joined #zuul13:43
fungibadboy: are you seeing that with a gerrit connection? i want to say we've seen broken ecc implementations in some gerrit versions13:55
fungibadboy: oh, looks like that could be a mismatch with the installed version of pyca/cryptography13:57
fungiyou may be running too old of a version?13:57
fungiwhat version of cryptography does pip say is installed?13:58
fungialso, sticking the full traceback on http://paste.openstack.org/ would help provide some context13:58
fungihttps://github.com/pyca/cryptography/blob/master/CHANGELOG.rst#25---2019-01-22 suggests you need at least 2.5 (from january of this year) for that method14:02
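Based on fungi's changelog pointer, a minimal sketch (not Zuul code) of the version check implied above — the helper names are made up; the only grounded fact is that `from_encoded_point` was added in pyca/cryptography 2.5:

```python
# The AttributeError above means the installed pyca/cryptography predates
# 2.5, where EllipticCurvePublicKey.from_encoded_point was added.
def parse_version(version):
    """'2.4.2' -> (2, 4, 2); good enough for plain release strings."""
    return tuple(int(part) for part in version.split(".")[:3])

def supports_from_encoded_point(installed_version):
    """True if from_encoded_point should exist in this release."""
    return parse_version(installed_version) >= (2, 5)
```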
*** sanjayu_ has joined #zuul14:02
*** igordc has joined #zuul14:05
*** rlandy|ruck|mtg is now known as rlandy|ruck14:10
*** gtema has joined #zuul14:17
*** hashar has quit IRC14:17
pabelangerdmsimard: have you see this ARA failure before? Looks to be encoding issue when generating html: https://logs.zuul.ansible.com/89/57789/8d9f8e0547417362c0241ab039e360035b778478/third-party-check-silent/ansible-test-network-integration-ios-python27/bc7e0b5/job-output.html#l865214:27
dmsimardpabelanger: I have not, I thought all those encoding issues had been ironed out :D14:30
pabelangerdmsimard: yah, this is the first time I've seen it happen, we've been using ARA for ansible-test for some time. Will dig more into it14:31
pabelangerI know it does some odd things with directory names for testing14:31
dmsimardpabelanger: oh, it might be https://github.com/ansible-community/ara/issues/48 then -- that's >1.0 though but it's possible 0.x is also impacted14:32
dmsimardit was also for a filesystem path with non-ascii characters14:32
dmsimard(who does that?)14:32
dmsimardI ran the ansible integration test suite against 1.x but not 0.x -- I should be able to reproduce14:33
pabelangerdmsimard: yup, that likely is it14:35
pabelangerlet me confirm we have those non-ascii chars disabled for ansible integration testing14:35
pabelangerI also don't know why they do it14:35
dmsimard#ansible-devel said it's because it bubbles up bugs like this one :p14:36
pabelangerthat is true14:36
*** igordc has quit IRC14:42
smcginnisDaily third party CI question... :)14:43
smcginnisIf I want to use the devstack job in my local zuul instance from https://opendev.org/openstack/devstack/src/branch/master/.zuul.yaml#L34314:44
smcginnisI've added it to my untrusted-projects, but trying to define a job locally that inherits from it results in "Job devstack not defined".14:44
smcginnisIs there something else I need to do in order to be able to use that?14:45
pabelangersmcginnis: in your tenant config for devstack, did you allow loading of jobs?14:45
smcginnispabelanger: Ah, just noticing... is that "include: - job"?14:46
pabelangeryah14:46
smcginnisOK, nope, I did not do that part. Trying now.14:46
smcginnispabelanger: That job has a long list of required-projects. Will I also need to add those in my zuul config as untrusted-projects?14:47
pabelangeryes14:47
smcginnisOK, thanks. That saves me some digging then. I'll get all of that added and try things out. Thanks!14:47
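The tenant-config step pabelanger describes can be sketched as yaml. This is a hedged example: the tenant name, source name, and project list are invented; the grounded part is that each project under untrusted-projects can carry an include list limiting which config objects are loaded from it.

```yaml
# Hypothetical tenant config sketch: load only job and nodeset definitions
# from openstack/devstack, not its project pipelines.
- tenant:
    name: local
    source:
      opendev:
        untrusted-projects:
          - openstack/devstack:
              include:
                - job
                - nodeset
```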
pabelangerI started testing devstack on rdoproject zuul a while back, but don't think I got it working 100%. Let me look to see if I can find the code14:48
smcginnisOh, great. Or if there's some other way - I ultimately need to get things running and run tempest against it for third party CI testing.14:50
pabelangersmcginnis: looks like I just added the tenant config, which you already know. Seems I didn't import all the required-projects, and code seems to have been removed now from rdoproject14:52
smcginnispabelanger: OK, no worries. This is a great start, so I'll keep fiddling. Thanks for looking.14:52
pabelangersmcginnis: the job _should_ load properly, that was some of the work that needed to be done. However, you also might run into issues with missing nodesets, which you'll need to also define locally14:53
smcginnisAh, I didn't think about that.14:53
clarkbdevstack has its own nodesets14:53
pabelangeryah, zuul does a good job at saying what doesn't work :)14:53
smcginnisI'll see if it makes sense to get all that matched up, or just define a local job and hope it doesn't diverge too much over time.14:54
fungias long as the nodeset is defined zuul will be happy. you don't actually need nodes matching those provided by nodepool if you're not actually going to run the jobs declaring they use them14:57
corvussmcginnis: it's going to be some upfront investment to get the list of all the things you need to add, but i think it's going to be worth it.14:58
corvusyeah, nodesets are something that may make sense to override locally14:58
fungisomeone else was working on putting together exactly the same list for the base devstack job... who was it?14:58
fungimaybe they've already done that legwork now14:58
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213414:59
*** sanjayu_ has quit IRC15:09
corvusfungi: several people have done it, i have not seen it shared yet.15:09
smcginnisOK, I'll keep working towards that then and hopefully get it all documented well.15:11
*** igordc has joined #zuul15:23
Shrewscorvus: i have no idea what's going on with the plugin tests, but https://review.opendev.org/663762 has seen so many random failures with it. Earlier failures were timeout related. The new one post-fungi's timeout fix is now: http://logs.openstack.org/62/663762/10/check/tox-py35/ec8a51c/job-output.txt.gz#_2019-06-13_15_08_38_57433615:24
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213415:24
corvusShrews: that means an executor is still running a job15:25
Shrewscorvus: seems unrelated to my change. and the following change passed all tests15:27
corvusShrews: could it be related to http://logs.openstack.org/62/663762/10/check/tox-py35/ec8a51c/job-output.txt.gz#_2019-06-13_15_19_28_066472 ?15:28
corvusShrews: and also http://logs.openstack.org/62/663762/10/check/tox-py35/ec8a51c/job-output.txt.gz#_2019-06-13_15_08_58_19077115:29
Shrewssubunit exception. neat15:29
Shrewsi wonder if there was a recent release of subunit15:30
corvusShrews: i think the test needs to be split.  let's just cut it in half.15:30
corvusShrews: that will address both timeouts and subunit report lengths.15:31
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213415:31
corvusfungi: ^15:31
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213415:32
fungiwill take a look after the meetings i'm in. we could just revert for now15:33
Shrewscorvus: by "cut in half", you mean a new test_plugins() test with half of the entries in the plugin_tests array?15:34
corvusShrews: yep15:35
corvusfungi: i think it'll be just as easy to split as it would be to revert15:36
Shrewsi'll toss a change up15:36
fungithanks Shrews!15:36
fungii hadn't yet looked closely enough at how the framework for that test was done to work out how much duplication/abstraction would be needed to split it15:37
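The "cut it in half" plan corvus describes amounts to splitting one data-driven test's entry list in two. A minimal sketch, assuming a plugin_tests-style list of entries (the helper name is made up):

```python
def split_in_half(entries):
    """Return two halves of a test entry list, so each half can drive its
    own test method and neither exceeds the job timeout or the subunit
    report length."""
    mid = (len(entries) + 1) // 2
    return entries[:mid], entries[mid:]
```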
corvusclarkb, pabelanger, fungi, Shrews, tobiash: did we decide yesterday that we should release zuul now?  how about nodepool?15:38
fungii think there was a suggestion to restart the opendev deployment on the current state. i expect we might as well lump nodepool in15:38
fungithough i haven't looked to see what's landed in nodepool since the last tag15:38
corvusenough for a release i think https://zuul-ci.org/docs/nodepool/releasenotes.html15:39
pabelangercorvus: +1 for zuul release15:39
corvusokay, so the plan is: restart all of opendev today, release both later today or tomorrow?15:39
pabelangerwfm15:40
pabelangeralso, we've been using nodepool 3.6.1.dev16 without any issues, so +1 for tagging that too15:40
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Split plugin tests into two  https://review.opendev.org/66516115:41
fungiyeah, the conclusion with the zuul memory leak we're seeing in opendev is that it took a couple weeks to manifest after scheduler restart the last couple times it bit us, so it'll probably be a while before it crops up again and we shouldn't delay the release waiting for that15:41
Shrewscorvus: i'm not sure which version of nodepool we are running in production and if we're in sync with master15:42
Shrewslemme check15:42
Shrewslooks like status logs says we last restarted launchers on 58a2a2b68c58f9626170795dba10b75e96cd551 to pick up memory leak fix15:43
Shrewserr f58a2a2b68c58f9626170795dba10b75e96cd55115:44
Shrewsthat's the 3.6.0 tag15:44
corvusi think that's old enough to warrant a restart15:44
*** bhavikdbavishi has joined #zuul15:45
Shrewscorvus: agreed. which means "no" on nodepool release just yet15:45
corvusShrews: we need to restart zuul too, so i was thinking we'd restart both today and release tomorrow; how's that sound?15:45
Shrewscorvus: fine by me. if there's a nodepool problem, it should be spotted rather quickly15:46
*** bhavikdbavishi1 has joined #zuul15:53
*** jangutter has quit IRC15:54
*** bhavikdbavishi has quit IRC15:54
*** bhavikdbavishi1 is now known as bhavikdbavishi15:54
ofososI get 'something went wrong' from zuul.openstack.org15:56
corvusofosos: please see #openstack-infra for information related to that service15:57
corvuspabelanger, tobiash: http://paste.openstack.org/show/752891/15:57
clarkbcatching up after getting kids out the door for school and ya that plan sounds good to me15:57
pabelangercorvus: oh, was that a retry?15:58
corvuspabelanger: no idea15:58
clarkbcorvus: I feel like that should go under "achievement unlocked" re abuse against github by zuul15:59
clarkbdid that happen after the restart? If so the multiprocessing change may send too many requests at once to github?15:59
corvusclarkb: yes15:59
pabelangerI wonder if ad668d74-8df3-11e9-93ab-4ff1818b4f8e got 502 Server error, then we sleep(1) and tried again15:59
smcginnisOK, I'm feeling dumb now. Where do I need to define things to get rid of 'Unable to freeze job graph: The nodeset "openstack-single-node-bionic" was not found.'16:00
corvuspabelanger: did you log retries?16:00
clarkbcorvus: yes they shouldbe logged16:00
pabelangercorvus: yah, you should see it as exception16:00
corvussmcginnis: define a nodeset called "openstack-single-node-bionic"16:00
corvuspabelanger: feel free to dig, i have too many windows open at the moment running the restart16:00
smcginniscorvus: Where is that actually done. I thought I did, but I still get the error.16:00
pabelangerhttps://review.opendev.org/664843/ might be why16:00
corvussmcginnis: it can be in any repo in the tenant16:01
pabelangercorvus: sure, looking now16:01
clarkbpabelanger: I think it more likely the multiprocessing change is to blame16:01
clarkbpabelanger: I would think a second delay between requests is plenty. But sending ~20 (or however many threads are in the pool) requests at once may make it unhappy16:01
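clarkb's burst hypothesis suggests spacing requests globally across the worker pool, not per thread. A sketch of that idea (not Zuul's actual code; the class name and interval are assumptions):

```python
import threading
import time

class RequestThrottle:
    """Enforce a minimum spacing between API requests across a pool of
    worker threads, so a parallelized change can't fire ~20 calls at once."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.lock = threading.Lock()
        self.last = 0.0

    def wait(self):
        # Serialize callers; each one sleeps just long enough to keep
        # `interval` seconds between consecutive requests.
        with self.lock:
            now = time.monotonic()
            delay = self.last + self.interval - now
            if delay > 0:
                time.sleep(delay)
                now = time.monotonic()
            self.last = now
```

Workers would call `throttle.wait()` immediately before each GitHub API request.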
tobiashcorvus: you hit the rate limit?16:01
pabelangerclarkb: oh, maybe16:02
tobiashOh and the retry succeeded :)16:02
tobiashcorvus: re release, did the python interpreter work land?16:03
pabelangertobiash: where do you see the succeed?16:03
tobiashpabelanger: maybe i misinterpreted the log16:03
*** sshnaidm is now known as sshnaidm|off16:04
pabelangertobiash: Oh, I think you are right16:04
corvustobiash: i believe so:     Merge "executor: use node python path"16:04
pabelangertobiash: let me look at code again16:05
corvustobiash: i'm under the impression that both sides of that will be present in the nodepool and zuul releases, but if i'm wrong, let me know :)16:05
tobiashAh yes, so then ++ for release after burn in16:05
pabelangertobiash: actually, I don't think we retried. Looking at 664843 we'd have to add github3.exceptions.ForbiddenError too. But right now we don't trap generic github exceptions. cc clarkb16:06
tobiashcorvus: correct, just wanted to make sure that the zuul part is not missing :)16:06
corvus++16:06
clarkbpabelanger: ya I think that behavior is correct there16:06
clarkbpabelanger: retrying would only make the abuse perception worse16:06
pabelangeryah16:06
tobiashGithub timeout and retry is also in so ++ for burn in and release16:07
pabelangerclarkb: I think you might be right, I don't see a previous failure on ad668d74-8df3-11e9-93ab-4ff1818b4f8e in zuul logs. So maybe the multiprocessing change you mentioned is to blame.16:09
pabelangertobiash:^16:09
clarkbpabelanger: for retries and abuse detection I think we may want a backoff that is more sophisticated than a sleep(1)16:09
clarkblike sleep with increasing backoff if we detect that case or something16:09
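clarkb's "increasing backoff" idea can be sketched as a delay generator replacing the flat sleep(1). A minimal sketch — the function name and the base/factor/cap values are assumptions, not Zuul's implementation:

```python
def backoff_delays(base=1.0, factor=2.0, attempts=5, cap=60.0):
    """Yield increasing sleep intervals (1, 2, 4, ...) for retrying after
    abuse-detection responses, capped so a long outage doesn't explode."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor
```

A retry loop would `time.sleep(d)` for each yielded `d` until the request succeeds.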
pabelangerclarkb: yah, I'm kinda curious why ratelimit didn't help here16:10
tobiashAnd I think there is still some potential to optimize away a few requests16:10
tobiashWe don't do rate limiting afaik16:11
tobiashWe only log it16:11
pabelangerokay, I see a retry attempt: dc71183c-8df3-11e9-97c9-52bfbc81ffb516:11
pabelangerlooking at logs16:11
*** hashar has joined #zuul16:12
*** olaph has quit IRC16:12
pabelangertobiash: clarkb: :( http://paste.openstack.org/show/752892/16:15
pabelangerthat is a retry attempt16:15
pabelangerfrom a 502 Server Error16:15
pabelangerso, sleep(1) doesn't seem to be enough time16:15
pabelangertime to read docs on why that is16:16
corvustobiash: re log annotations -- did you happen to see a way to get the tracebacks formatted like the other lines?  (so they show up in a grep?)16:19
pabelangerjlk: maybe you also have suggestion about 502 Server Error we get back from github api. We created, https://review.opendev.org/664843/ but now look to be tripping the abuse detection mechanism.16:21
fungipabelanger: what sort of api query is causing that?16:24
corvuspabelanger, tobiash, jlk: here's the expanded log entries for that event: http://paste.openstack.org/show/752893/16:24
corvusfungi: our old friend "getPullBySha" -- the info that everyone (github internal devs included) really wants included in the event16:25
fungigot it16:25
pabelangeryah16:25
pabelangera quick google says we _should_ get a Retry-After header back16:25
fungiso it's a read operation16:25
pabelangerbut need to confirm that16:25
jlkis it timing out or is the 502 immediate?16:26
jlkMy team was CCd on an issue that looks like there is a recent spike in somewhat immediate 502s16:26
corvusjlk: i think it took a little over a minute to get the 502 back, if i'm reading the logs right; i'll double check that16:27
pabelangerright, we now see it more because we've also bumped up the default_read_timeout to 300: https://review.opendev.org/664667/16:27
corvusyeah, in http://paste.openstack.org/show/752893/ "Handling status event" is right before the call, and "Failed handling" is right after16:27
corvuspabelanger: right, if someone was using github3.py with the defaults, they would have hit the 10 second read timeout before getting the 50216:28
*** jamesmcarthur_ has joined #zuul16:29
pabelangertalking to some ansibullbot folks, they say there is an undocumented limit of ~20 POSTs per minute before hitting abuse detection. Maybe we are also hitting that now16:32
*** jamesmcarthur has quit IRC16:32
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213416:36
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502316:37
smcginnisIs there a config option somewhere that controls allowed disk space? I see I'm getting aborts now from ExecutorDiskAccountant because the limit is set to 250mb.16:43
clarkbsmcginnis: https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.disk_limit_per_job16:45
smcginnisPerfect, thanks clarkb16:46
clarkbthat should be plenty for devstack jobs last I checked16:46
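For reference, the option clarkb links is set in the executor section of zuul.conf; the value below is an arbitrary example, not a recommendation (the documented default is 250 MiB):

```ini
# Sketch: raise the per-job workspace limit (value in MiB).
[executor]
disk_limit_per_job=2048
```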
clarkbpabelanger: POST would be to leave comments on PRs?16:47
smcginnisIt looks like it's happening while checking out the other repos. The one I have on screen right now was Checking out openstack/cinder, Checking out master, ExecutorDiskAccountant warning using 544MB (limit=250)16:47
clarkbpabelanger: the searching should all be GETs right?16:47
pabelangerclarkb: yah, that is right. So, I'm now looking into search api16:48
pabelangerbecause i do see a bit of google hits around search and abuse16:48
clarkbsmcginnis: hrm I didn't think the git repos counted against that16:48
pabelangermaybe we are missing something with rate-limit16:48
pabelangerwith new multiprocessing change16:48
corvussmcginnis: that can happen if the executor mounts are misconfigured -- https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.git_dir and https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.job_dir need to be on the same filesystem16:49
smcginnisclarkb: I was a little surprised to see them all checked out there.16:49
pabelangerI have to relocate, but plan to keep looking. We do seem to be hitting abuse message on zuul.o.o often16:49
pabelangerback shortly16:49
fungismcginnis: normally you would deploy it so the workspace and the git cache are on the same fs, and then git will just make hardlinks when cloning16:54
fungiif they are not on the same fs, git will copy all the data16:54
smcginnisfungi: I'm just running the containers from the doc/source/admin/examples docker-compose setup, so I would have thought it would all be on the same fs.16:55
corvussmcginnis: can you share your docker-compose.yaml file and your zuul.conf?16:56
*** jamesmcarthur_ has quit IRC16:57
*** jamesmcarthur has joined #zuul16:58
corvussmcginnis: also "docker exec -it mount examples_executor_1" may be helpful16:58
smcginnis"Error: No such container: mount"16:58
smcginnisGetting the rest...16:59
corvuser, other way around then16:59
corvus"docker exec -it examples_executor_1 mount"16:59
smcginnisHeh, yep. Sorry, didn't really look at the command when I ran it. That makes more sense.16:59
fungimattw4 ran into this exactly a week ago too: http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2019-06-06.log.html#t2019-06-06T20:22:0816:59
smcginniscorvus: Adding to https://etherpad.openstack.org/p/yvyRWS72JG17:00
fungithough in that case it seems to have been caused by a spurious /var/lib/zuul bindmount17:01
fungiso i guess same symptom, different underlying misconfiguration17:01
corvussmcginnis: try running "df" in the container17:03
smcginnisIn the executor?17:03
corvusyeah17:03
smcginniscorvus: Added to the bottom.17:03
smcginnisNo I just get NODE_FAILURE17:06
*** mattw4 has joined #zuul17:07
corvusthis is really weird...17:08
*** pcaruana has joined #zuul17:09
smcginnisI noticed that I had one DISK_FULL failure, but now the last couple attempts were NODE_FAILUREs.17:09
corvusi'm focused on the disk issue17:09
corvusit's going to be really easy to sweep that under the rug; it needs to be fixed17:10
*** hashar has quit IRC17:10
corvuswhen i run docker-compose locally, i'm seeing mounts in containers which i don't expect17:10
tobiashcorvus: re log annotations, I think I saw a change in nodepool that does something like this17:10
smcginnisJust want to warn that the root cause may have gone away, since with the configs I pasted it appears to have gotten past the DISK_FULL error and is hitting NODE_FAILURE instead.17:10
*** gtema has quit IRC17:11
corvussmcginnis: but you disabled the disk limit?17:11
corvusanyway, give me a minute, i'm trying to put together a demonstration of what i'm seeing that is weird17:12
smcginniscorvus: I did now. I can remove disk_limit_per_job and restart to see if the DISK_FULL error comes back, but I just am not sure right now if that went away before or after that change.17:12
corvussmcginnis, fungi, tobiash: this doesn't make sense to me: https://etherpad.openstack.org/p/yvyRWS72JG  lines 225-23717:15
corvusthat's in my executor container; there should be no /var/lib/zuul mount there17:15
tobiashthat's weird17:17
corvusthis seems to match the behavior that smcginnis is seeing too -- smcginnis, if you run "df /var/lib/zuul" does it also show you that the fs is mounted on /var/lib/zuul ?17:17
smcginniscorvus: Is it from here: https://opendev.org/zuul/zuul/src/branch/master/Dockerfile#L4317:18
tobiashcorvus: does the dockerfile specify /var/lib/zuul as volume?17:18
fungiindeed, that's the presumed spurious /var/lib/zuul mount mattw4 had17:18
corvusoh, it does...17:18
fungihe said he thought he'd mounted it to give access to ssh keys17:18
fungibut maybe not?17:18
smcginniscorvus:  /dev/vda1       40470732 6877524  33576824  18% /var/lib/zuul17:18
corvusthat would do it17:18
tobiashbecause the scheduler container specifies it (line 61) and there is probably some automagic connection17:18
*** jpena is now known as jpena|off17:19
corvusokay, given that, i think i understand the patch that's needed.  i'll push it up and we can see if we agree17:19
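One interim workaround for the surprise mount, before corvus's proper fix: declare the /var/lib/zuul volume explicitly in docker-compose so it isn't an anonymous volume and the git cache and job dir share one filesystem. This is speculative and assumes the compose service is named executor:

```yaml
# Hypothetical compose override; volume and service names are assumptions.
services:
  executor:
    volumes:
      - zuul-lib:/var/lib/zuul
volumes:
  zuul-lib:
```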
smcginnisSo, should the reason for my current NODE_FAILURE show up in the executor log, or should I be looking somewhere else to figure out why that's happening?17:22
tobiashcorvus: re log annotations, this is the nodepool change I meant: https://review.opendev.org/61319617:22
tobiashbut it is using a custom formatter17:23
corvussmcginnis: probably nodepool launcher, or if not, possibly scheduler17:23
smcginnisk, thanks. I'll look17:23
corvussmcginnis: (the executor doesn't go into action until the scheduler hands it a node which it gets from the nodepool launcher)17:24
*** panda has quit IRC17:24
smcginnisOK, that makes sense. So if the node has a failure along the way, it never gets sent over.17:25
openstackgerritJames E. Blair proposed zuul/zuul master: Correct quick-start executor volume error  https://review.opendev.org/66518617:25
corvussmcginnis: yep.  which is why the most likely place to find the error is the launcher, but if that's inconclusive, the scheduler should know why it declared it a node failure17:26
corvussmcginnis, fungi, tobiash, mattw4: see  https://review.opendev.org/66518617:26
fungiyup, saw you push and just finished reviewing17:27
fungithanks!!!17:27
corvus(i still think we should change the executor default, but i think it's safer to make this change quicker)17:27
*** panda has joined #zuul17:27
tobiash++ for changing the default17:27
corvussmcginnis, mattw4: thanks for helping us find that; that was rather subtle, and i'm sorry you had to run into it for us to see it17:28
smcginnisGlad some of this has been useful!17:29
mattw4me too!  You all have helped a tremendous help too!!17:30
mattw4I am a native English speaker so I have no excuse for the grammar ^ :)17:31
smcginnis:D17:31
fungii make excuses for my grammar all the time17:32
corvusit's okay, that sentence made me feel very helpful :)17:35
fungiextra helpful even17:37
smcginnisJust FYI, the NODE_FAILURE I was hitting was found in the executor logs and it was due to not being able to deploy the openstack-single-node-bionic nodeset defined by the devstack job. So makes sense, just needed to figure out where to look for the error. Thanks again.17:38
mattw4Tremendously! :)17:39
mattw4smcginnis, I kinda faked it by defining a nodeset with that name and supplying my own node label in the definition.17:40
mattw4Scheduler complains that some nodes are undefined, but I don't need those nodes for my jobs. I'm not sure if that is a problem, but it doesn't seem to impact the tests that I'm running ATM17:41
smcginnismattw4: Good call, I think that was my mistake of not setting the label right.17:41
fungii think that highlights a rough patch in the job sharing model, not sure if anyone's yet thought through how to deal with reusing jobs that specify node labels which may not be relevant in the consumer's context17:42
smcginnisSeems like you need to be able to separately share the jobs and their resource requirements with the nodes and what resources they can provide.17:43
fungismcginnis: oh, so i guess we're missing something to make /var/lib/zuul/builds a valid job_dir?17:43
smcginnisBut that's a drastic oversimplification.17:43
smcginnisfungi: Yeah, looks like it.17:44
fungidoes it need to be created first?17:44
smcginnisSo it would appear.17:44
*** jamesmcarthur has quit IRC17:58
tobiashdoes anybody know the book Powerful Python by Aaron Maxwell?18:02
tobiashI just stumbled accross it and I'm wondering if anybody would recommend reading it18:03
clarkbcorvus: re the volume thing, do we think it would be better to have flexibility in the deployment and have docker-compose or similar do the specification rather than the image?18:04
clarkbI guess the problem with that is then people have to know to add it to compose or whatever18:04
clarkbso better off in the image18:04
smcginnisclarkb: Umm, you just approved the patch that has an error. Maybe I should have left -1 instead of just commenting. Might want to remove approval on that one.18:07
clarkbsmcginnis: done18:07
clarkbsmcginnis: and ya that is what -1 is for :P18:07
smcginnis:)18:08
clarkbfungi's comment is probably on the money for why it isn't working18:08
smcginnisIt was a "I'm getting an error but could be convinced it's just me" 0.18:08
clarkbBecause that is a volume mount we can't mkdir it during the build so I think we have to add that to the init script thing18:09
*** ianychoi has quit IRC18:13
corvusor have the executor create it18:14
tobiashI'd vote for the executor18:14
corvusmattw4, smcginnis, fungi: defining a local nodeset that satisfies what upstream jobs like devstack needs is exactly what i would expect.  and if you don't actually need to use it, you could define it with "nodes: []".18:15
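corvus's suggestion can be written down as a local config sketch; the nodeset name comes from the error smcginnis saw, and the empty nodes list is exactly what he describes for jobs you inherit from but never run:

```yaml
# Satisfy the upstream job's nodeset reference without real nodes.
- nodeset:
    name: openstack-single-node-bionic
    nodes: []
```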
corvusi'll work on an update to the patch; i'll probably just go ahead and switch the default, since it's going to involve executor code changes18:20
*** bhavikdbavishi has quit IRC18:23
*** michael-beaver has joined #zuul18:23
openstackgerritJames E. Blair proposed zuul/zuul master: Change default job_dir location  https://review.opendev.org/66518618:28
pabelangerclarkb: so, looking at github3.py, we should be able to inspect the exception for response headers on 502, I'm trying to see if there is 'Retry-After', if so we can use that value for our sleep.18:28
clarkbpabelanger: ++18:29
pabelangerclarkb: otherwise, maybe we need a better backoff process as you mentioned before18:29
corvustobiash, smcginnis, fungi, mattw4: okay, there's a slightly more substantial change ^  since that will need new images, etc, i'd suggest smcginnis and mattw4 just manually "mkdir /var/lib/zuul/builds" on the executor (since it's a volume, that will persist) and set the job_dir value in zuul.conf as in the previous patch.  then after that change merges, you should be able to undo that.18:30
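The manual workaround corvus describes boils down to two steps: create the directory inside the persistent volume, then point job_dir at it. A sketch of the zuul.conf side (the path matches his message; this mirrors the earlier patch, not the final default change):

```ini
# Interim workaround: keep the job workspace on the same volume as the
# git cache so clones can hardlink instead of copying.
[executor]
job_dir=/var/lib/zuul/builds
```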
corvuspabelanger: sounds good -- also, be thinking about whether we should hold the release for this (i'm inclined to -- this is the sort of thing we hope to catch by burning in on opendev).18:31
mattw4sounds good corvus, I will do that18:32
pabelangercorvus: +1, I think we'll need to fix this before releasing18:32
Shrewsi guess our zuul timeouts are still not long enough? http://logs.openstack.org/61/665161/1/gate/tox-py36/c0ebbc7/job-output.txt.gz#_2019-06-13_18_15_58_35190518:32
corvusShrews: wow, that was a job timeout18:33
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213418:39
*** jamesmcarthur has joined #zuul18:56
*** hashar has joined #zuul19:02
openstackgerritPaul Belanger proposed zuul/zuul master: Improve retry handling for github driver  https://review.opendev.org/66522019:21
clarkbpabelanger: were you able to check if retry after is ever present?19:22
pabelangercorvus: clarkb: tobiash: jlk: ^ is my first attempt to deal with 502 / 403 github errors. Based on things I am reading on the web and some manual testing, 'retry-after' was there19:22
pabelangerclarkb: yah, let me get a paste19:22
pabelangerclarkb: but I am not sure if it is on 502 error19:22
clarkbcool that explains the fallback19:23
clarkbnote that will cause a 5 minute backup if it never recovers from the 502 and there aren't shorter retry-after values19:23
pabelangerhttp://paste.openstack.org/show/752900/19:23
clarkb(I think we can probably test with this and see if that causes problems)19:23
*** jamesmcarthur has quit IRC19:24
pabelangerclarkb: yah, I didn't actually wait 60 seconds, so maybe we should add a little buffer?19:24
pabelangerI just did testing using curl19:24
corvusis that inside our parallelized workers or outside?19:24
clarkbcorvus: I believe it is inside19:25
clarkbso once we get past that 5 minute zone we should catch up quick19:25
pabelangeror maybe we don't retry 5 times?19:25
corvusclarkb: but other queries will still be happening in parallel, so we're only waiting for the sequencing19:25
corvus?19:25
clarkbcorvus: correct19:26
corvuscool, i think (based on what i know atm) that's the way to go.  at least, until we discover more about github rate limiting :)19:26
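The retry scheme this exchange converges on — catch the transient error, honor a Retry-After header when the response carries one, fall back to a fixed delay otherwise, and give up after a few attempts — could be sketched roughly as follows. This is a hypothetical helper, not the actual patch; `TransientAPIError` is a stand-in for whatever github3.py raises, and the injectable `sleep` exists only to make the waits observable:

```python
import time


class TransientAPIError(Exception):
    """Stand-in for an API exception that carries response headers."""

    def __init__(self, headers):
        super().__init__("transient API error")
        self.headers = headers


def call_with_retries(func, max_retries=5, fallback_delay=60, sleep=time.sleep):
    """Call func(), retrying on transient errors.

    Honors a Retry-After header when the response provides one;
    otherwise waits fallback_delay seconds between attempts.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except TransientAPIError as e:
            if attempt == max_retries - 1:
                raise  # out of retries, propagate the error
            retry_after = e.headers.get('Retry-After')
            sleep(int(retry_after) if retry_after else fallback_delay)
```

Note clarkb's point above: with five attempts and a 60-second fallback, a persistent 502 costs about five minutes per request sequence before the error propagates.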
pabelangeryah, I didn't find https://developer.github.com/v3/#rate-limiting too helpful, with examples19:28
corvuspabelanger: i like the patch, but i left a suggestion about improving the debug info for us19:29
pabelangerack, give me a few mins to look19:30
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502319:31
*** rf0lc0 has quit IRC19:44
*** rfolco has joined #zuul19:44
openstackgerritPaul Belanger proposed zuul/zuul master: Improve retry handling for github driver  https://review.opendev.org/66522019:48
pabelangercorvus: clarkb: ^updated19:48
hogepodgeclarkb: right now locistack is broken during a refactor to use stock images, chasing down the issue right now. should have numbers sometime next week.19:50
corvushogepodge: cool, i'm going to proceed with the devstack approach, and we can look at swapping it in later.19:52
corvusshould be fairly isolated19:52
hogepodgethat sounds best, will give me a chance to do a last bit of housekeeping and setting up a tempest job against it so I can create an opendev repository19:54
pabelangerjlk: maybe you could confirm if 'Retry-After' would be present on a 502 Server Error response, I haven't been able to find much info on the web. If you have the ability19:57
fungias a first step we could start logging more details from the 502 responses19:59
corvuspabelanger: not quite what i had in mind, may i push up a revision?20:00
pabelangercorvus: please do so20:01
corvuspabelanger: also, are you sure you want to retry forbidden errors?20:02
pabelangercorvus: that was mostly based on the pastebin from today, so we could open it to more20:04
pabelangerfrom my readings on the web, 403 did return 'retry-after' header20:04
pabelangerbut it was difficult to see what else did20:04
openstackgerritJames E. Blair proposed zuul/zuul master: Improve retry handling for github driver  https://review.opendev.org/66522020:05
corvuspabelanger: this update should supply the information we need to answer that question ^20:06
pabelangerah, much better20:06
corvusoh 1 thing20:06
openstackgerritJames E. Blair proposed zuul/zuul master: Improve retry handling for github driver  https://review.opendev.org/66522020:07
corvusmissed type conversion20:07
pabelangercorvus: thanks, I see what you were asking now. +120:11
corvusdoes anyone know how this $REGION_NAME variable gets set? https://opendev.org/zuul/nodepool/src/branch/master/devstack/plugin.sh#L30320:28
corvusoh, that must come from devstack20:29
openstackgerritMerged zuul/zuul master: Split plugin tests into two  https://review.opendev.org/66516120:31
pabelangertobiash: clarkb: if you don't mind adding https://review.opendev.org/665220/ to your review pipeline, I think we should try to restart zuul.o.o with that to help avoid the 'abuse' errors we are now getting20:36
*** pcaruana has quit IRC20:36
tobiashLgtm20:41
openstackgerritJames E. Blair proposed zuul/zuul master: Change default job_dir location  https://review.opendev.org/66518620:43
corvusjust a minor pep8 fix on that ^, otherwise it passed all the tests, so should be gtg20:43
openstackgerritMatthieu Huin proposed zuul/zuul master: Web: plug the authorization engine  https://review.opendev.org/64088420:44
corvusclarkb, fungi, Shrews: running devstack without the benefit of local git clones took 25 minutes 11.9 seconds (which ara rounds up to 13? cc:dmsimard): http://logs.openstack.org/23/665023/8/check/nodepool-functional-openstack/194fed6/ara-report/20:47
*** panda has quit IRC20:49
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109920:49
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502320:50
dmsimardthe duration in ara is calculated based on the time the task started and when it ended and then it's rounded in the webapp20:50
fungiheh, i suppose you can make the argument that 11.9 is roughly 13 when rounded to the nearest odd number? ;)20:51
fungiwell, rounded up to the next odd number anyway20:51
dmsimardthere is some latency20:51
dmsimardbecause task ends -> tells ara task ended -> ara marks end timestamp20:51
*** panda has joined #zuul20:51
fungigot it. so this is time it took for ara to become aware it was done20:51
dmsimardyes20:52
fungiit just gets a notification, not a timestamp passed to it20:52
*** hashar has quit IRC20:55
openstackgerritMatthieu Huin proposed zuul/zuul master: Web: plug the authorization engine  https://review.opendev.org/64088420:57
dmsimardfungi: right -- this is more or less also how the upstream profile_tasks callback plugin calculates the duration but there is less overhead since it's just printing to stdout20:57
dmsimardhttps://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/profile_tasks.py20:57
corvusdmsimard: ah ok, i assumed it was working from the same data that shows up here: http://logs.openstack.org/23/665023/8/check/nodepool-functional-openstack/194fed6/ara-report/result/6cad0ed8-1cee-47d3-b1a3-58426aef0e37/21:20
corvus(start/end/delta)21:20
dmsimardcorvus: the problem is that (unless mistaken), those fields are not always returned21:20
corvusyeah, i guess those are "command module" specific fields?21:21
dmsimardlike, depending on which module was used21:21
dmsimardyeah21:21
corvusgot it, til, thx :)21:22
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502321:24
fungialso, i suppose they could "lie" under some circumstances, so having an external timer helps keep them honest even if it does only provide loose bounds on the runtime21:25
pabelangerdarn, we had py36 job timeout21:38
pabelangerlooks like it was limestone21:38
dmsimardfungi: if Ansible would reliably return timestamps for every module/action/etc, we'd probably use it21:40
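The timing scheme dmsimard describes — the callback records wall-clock timestamps when it hears a task start and end, so the measured duration includes notification latency — can be sketched as below. This is illustrative only, not ara's actual code:

```python
import time


class TaskTimer:
    """Measure task durations from a callback's point of view.

    The delta includes any latency between the task finishing and the
    callback being told about it, which is why it can disagree with
    (and slightly exceed) module-reported start/end fields.
    """

    def __init__(self):
        self.started = {}

    def on_task_start(self, task_id):
        # Record when the callback heard the task start.
        self.started[task_id] = time.time()

    def on_task_end(self, task_id):
        # Duration as seen by the callback, in seconds.
        return time.time() - self.started.pop(task_id)
```

This is also roughly how Ansible's upstream profile_tasks callback computes durations, as dmsimard notes, though that plugin only prints to stdout.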
smcginnisMaybe a little more relevant in -infra than #zuul, but any idea why the devstack job would have ansible_interfaces undefined errors? Didn't collect facts, but where?21:55
mattw4smcginnis, I know this one!!21:55
smcginnis:)21:55
mattw4I created a new role in base:pre.yaml to collect all facts with the setup module21:55
smcginnisOh good, I'm not the only one hitting some weird thing.21:55
mattw4I couldn't figure out how to make Zuul collect all facts by default so I just added a small role with the setup module21:56
smcginnisHmm, I tried something similar adding "gather_facts: True" to my task in pre.yaml, but same error.21:56
smcginnismattw4: Do you have that up somewhere I could take a peek?21:57
mattw4I think it gathers facts by default, but the fact set is limited to the minimum (!all)21:58
mattw4smcginnis: it's in an internal repo ATM, but I can share the role, just a sec21:58
smcginnisThanks mattw4!21:58
mattw4smcginnis: I named it "gather-all-ansible-facts" and it's super-small: http://paste.openstack.org/show/752907/22:00
mattw4smcginnis: that added the 'ansible_interfaces' list to the fact set22:01
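Since the paste link may not survive, a role of the shape mattw4 describes — a single task running the setup module with the full fact subset — would presumably look something like this (a reconstruction, not the actual paste):

```yaml
# roles/gather-all-ansible-facts/tasks/main.yaml (hypothetical):
# gather the full fact set, including ansible_interfaces, rather
# than the minimal default subset.
- name: Gather all Ansible facts
  setup:
    gather_subset: all
```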
smcginnisAwesome, I'll give that a shot. Thanks!22:01
mattw4np :)22:01
mattw4I already posted this in #openstack-qa, but I think this may be the right audience: Does anyone know why devstack would fail to install an SSL certificate for Apache2, causing a failure when apache2.service is restarted after installing uwsgi?22:03
mattw4the job is a child of devstack-minimal with a few additional services enabled.22:04
corvusmattw4: that's probably a better #openstack-qa question unless zuul itself is somehow involved (but it doesn't sound like it)22:05
mattw4corvus: ok.  True, it's probably not Zuul.  Thanks tho.22:06
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502322:17
*** jamesmcarthur has joined #zuul22:18
pabelangercorvus: clarkb: https://review.opendev.org/665220/ should land in the next 90mins, do we want to look to restart zuul again today or hold off until another time? I'll be able to assist either way22:19
clarkbI'm in the middle of copying lots of data around so that we can do ssl cert updates on infra services22:22
clarkbso I'll defer to others22:22
pabelangerack22:25
pabelangeralso, I should have used #openstack-infra for that22:25
*** tobiash has quit IRC22:40
ofososcorvus, SpamapS: any love for https://review.opendev.org/#/c/662134/54 ? All the basic tests are good now... I'd like some guidance on how to proceed23:14
ofososLinter will be fixed tomorrow23:18
clarkbofosos: you may want to send an email to the zuul-discuss list soliciting reviews? Sounds like its to a point where it is generally working and now its double checking (and potential refinement)?23:19
ofososclarkb: +123:20
*** jamesmcarthur has quit IRC23:24
corvusofosos: looks like there's a pep8 error at http://logs.openstack.org/34/662134/54/check/tox-pep8/cc56a61/job-output.txt.gz#_2019-06-13_20_51_57_086167 but that's the only failing test23:25
corvusofosos: might want to go aheand and push up a fix for that; i can give the whole stack a closer look tomorrow.  i'm looking forward to it!  :)23:26
ofososYup, it's a single line. I was already at the pub when that popped up. My IDE was happy with the code. I'll fix it tomorrow23:27
ofososHad to celebrate a birthday yesterday (already Friday in my tz).23:28
ofososcorvus: very good :)23:29
openstackgerritMerged zuul/zuul master: Improve retry handling for github driver  https://review.opendev.org/66522023:31
ofososI'd still like to refactor some things. Sometimes it's unclear where you have to pass a project or a project name. That's the biggest problem I saw.23:31
ofososThe testing process was really nice though. The fixture tests took 10 hours today to get right, but in the end I think it's for the better.23:32
ofososI also need to incorporate API paging, but that can be done on the client level.23:33
jlkpabelanger: I don't know if a Retry-After is going to be present. It looks like our system can throw a 502 if a query has gone longer than 10 seconds, and there was a recent change that caused that to happen a lot more often. This change was reverted a few hours ago, so I'm curious if Zuul is still seeing a slew of 502s.23:34
mattw4Where can I set zuul_log_verbose: true to produce more verbose logs?23:51
pabelangerjlk: great, thanks for the information, we've landed https://review.opendev.org/665220/ to help deal with it, and give additional info23:51
pabelangermattw4: you can set it in your playbook where you call the upload-logs role23:52
mattw4pabelanger: gotcha, Thanks!23:52
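Setting the variable where the role is invoked, as pabelanger suggests, might look like this (playbook layout and hosts are assumptions for illustration):

```yaml
# Illustrative post-run playbook: pass zuul_log_verbose as a role
# parameter when calling the upload-logs role.
- hosts: localhost
  roles:
    - role: upload-logs
      zuul_log_verbose: true
```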
pabelangerHmm, for some reason, we have 3 entries in our third-party-check pipeline, all from the same PR: https://dashboard.zuul.ansible.com/t/ansible/status23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!