clarkb | infra-root can I get https://review.openstack.org/#/c/516473/ reviewed and merged so that I don't have to disable puppet on logstash.o.o? | 00:00 |
clarkb | it just restarted the daemon with the broken config (I'm going to restart with manually fixed config now) | 00:00 |
*** andreww has joined #openstack-infra | 00:00 | |
*** ijw has quit IRC | 00:01 | |
fungi | it's reviewed and approved | 00:01 |
*** ijw has joined #openstack-infra | 00:01 | |
*** bobh has quit IRC | 00:02 | |
clarkb | tyty | 00:02 |
*** gouthamr has quit IRC | 00:05 | |
* clarkb pops out to make dinner | 00:06 | |
fungi | subunit-worker02 seems to have gear 0.11.0 installed now, so i'm going to restart the worker on it | 00:06 |
fungi | maybe it'll catch back up | 00:07 |
*** armaan__ has quit IRC | 00:10 | |
*** markvoelker_ has quit IRC | 00:11 | |
*** dingyichen has joined #openstack-infra | 00:11 | |
*** ijw has quit IRC | 00:14 | |
*** ijw has joined #openstack-infra | 00:14 | |
fungi | i need to knock off for the evening. i'll be semi-around tomorrow for meetings but need to take some time to finish prepping for a very long flight on wednesday-friday | 00:16 |
clarkb | good night | 00:17 |
fungi | thanks, you too | 00:17 |
clarkb | and ya I'll be around but also packing/prepping | 00:17 |
*** ijw has quit IRC | 00:19 | |
openstackgerrit | Merged openstack-infra/system-config master: Remove zl's from jenkins-logstash-client config https://review.openstack.org/516473 | 00:21 |
clarkb | 00:24 | |
clarkb | whoops | 00:24 |
*** markvoelker has joined #openstack-infra | 00:25 | |
*** LindaWang has quit IRC | 00:30 | |
pabelanger | clarkb: ah, we should have landed https://review.openstack.org/515181/ :) | 00:31 |
pabelanger | I can rebase quickly | 00:32 |
clarkb | ah sorry | 00:32 |
*** bobh has joined #openstack-infra | 00:33 | |
*** thorst has quit IRC | 00:34 | |
*** ijw has joined #openstack-infra | 00:35 | |
*** owalsh_pto has quit IRC | 00:36 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Remove zuul-launcher support https://review.openstack.org/515181 | 00:37 |
pabelanger | ianw: clarkb: okay, should be rebased and fully removes zuul-launchers | 00:37 |
*** hongbin has joined #openstack-infra | 00:39 | |
pabelanger | ugh, another failure on logstash-worker | 00:40 |
pabelanger | I'll have to pick it up in the morning | 00:40 |
*** bobh has quit IRC | 00:42 | |
*** xingchao has joined #openstack-infra | 00:45 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 00:48 |
*** owalsh_ has joined #openstack-infra | 00:49 | |
*** owalsh has joined #openstack-infra | 00:51 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 00:51 |
*** mat128 has joined #openstack-infra | 00:53 | |
*** owalsh- has joined #openstack-infra | 00:53 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 00:54 |
*** zhurong has joined #openstack-infra | 00:54 | |
*** owalsh_ has quit IRC | 00:54 | |
*** owalsh_ has joined #openstack-infra | 00:55 | |
*** owalsh has quit IRC | 00:56 | |
*** owalsh has joined #openstack-infra | 00:58 | |
*** cuongnv has joined #openstack-infra | 00:59 | |
*** owalsh- has quit IRC | 00:59 | |
*** namnh has joined #openstack-infra | 01:00 | |
*** owalsh- has joined #openstack-infra | 01:01 | |
*** LindaWang has joined #openstack-infra | 01:01 | |
*** kiennt26 has joined #openstack-infra | 01:01 | |
*** owalsh_ has quit IRC | 01:01 | |
*** ijw has quit IRC | 01:03 | |
*** ijw has joined #openstack-infra | 01:03 | |
openstackgerrit | Merged openstack-infra/system-config master: Switch to cgit from gitweb https://review.openstack.org/511970 | 01:03 |
*** ijw has joined #openstack-infra | 01:04 | |
*** yamahata has quit IRC | 01:04 | |
*** owalsh_ has joined #openstack-infra | 01:04 | |
*** owalsh has quit IRC | 01:04 | |
*** baoli has joined #openstack-infra | 01:05 | |
openstackgerrit | Merged openstack-infra/system-config master: Remove unneeded encoding change. https://review.openstack.org/515580 | 01:06 |
*** rhallisey has quit IRC | 01:07 | |
*** owalsh has joined #openstack-infra | 01:07 | |
*** xingchao has quit IRC | 01:08 | |
*** owalsh- has quit IRC | 01:08 | |
*** dhinesh has joined #openstack-infra | 01:08 | |
*** ijw has quit IRC | 01:08 | |
*** owalsh- has joined #openstack-infra | 01:10 | |
*** owalsh_ has quit IRC | 01:11 | |
*** smatzek has joined #openstack-infra | 01:14 | |
openstackgerrit | Merged openstack-infra/system-config master: Add stretch mirror for ceph https://review.openstack.org/513591 | 01:14 |
*** owalsh has quit IRC | 01:14 | |
yamamoto | why does logstash.o.o show files which are not in jenkins-log-client.yaml? (like logs/screen-gnocchi-metricd.txt.gz) | 01:17 |
*** bobh has joined #openstack-infra | 01:17 | |
yamamoto | is there another list? | 01:17 |
clarkb | yamamoto: with the switch to zuulv3 we switched to regex based listing based on what is on disk for the job | 01:18 |
clarkb | so anything matching the regex is now pushed, you can see that in project-config/roles iirc | 01:18 |
*** Apoorva_ has joined #openstack-infra | 01:19 | |
*** rlandy has quit IRC | 01:20 | |
yamamoto | clarkb: so jenkins-log-client.yaml is no longer relevant? | 01:21 |
*** psachin has joined #openstack-infra | 01:21 | |
clarkb | for the most part that is correct | 01:21 |
clarkb | we still use it to run the gearman server | 01:21 |
*** psachin has quit IRC | 01:22 | |
*** Apoorva has quit IRC | 01:22 | |
*** psachin has joined #openstack-infra | 01:22 | |
*** Apoorva_ has quit IRC | 01:23 | |
yamamoto | clarkb: i got it. thank you | 01:24 |
*** larainema has joined #openstack-infra | 01:24 | |
*** bobh has quit IRC | 01:26 | |
yamamoto | is it expected that logstash.o.o shows both of job-output.txt and job-output.txt.gz? | 01:31 |
*** smatzek has quit IRC | 01:32 | |
*** LindaWang has quit IRC | 01:34 | |
clarkb | it should just be one per job iirc | 01:35 |
clarkb | if not then that is a bug | 01:35 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Create build-openstack-puppet-tarball https://review.openstack.org/515980 | 01:35 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Remove publish-openstack-puppet-branch-tarball from post pipeline https://review.openstack.org/515982 | 01:35 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Move publish-openstack-puppet-branch-tarball into ozj https://review.openstack.org/515981 | 01:35 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Revert "Remove publish-openstack-puppet-branch-tarball from post pipeline" https://review.openstack.org/515983 | 01:35 |
*** LindaWang has joined #openstack-infra | 01:37 | |
pabelanger | AJaeger: fungi: clarkb: EmilienM: mnaser: ianw: ^when you have time, those should be the last steps to getting puppet modules 'release / build' jobs as native zuulv3 jobs | 01:39 |
EmilienM | nice | 01:39 |
EmilienM | pabelanger: why did it work on ocata/pike/master? | 01:40 |
EmilienM | and not on newton | 01:40 |
yamamoto | clarkb: message:"At completion, logs for this job will be available at" for 7d seems to have both of them for many of the results | 01:41 |
*** mriedem has quit IRC | 01:42 | |
clarkb | yamamoto: does it have both for the same job? | 01:42 |
clarkb | I think as long as it's unique per job we are ok, but if not it needs investigating | 01:42 |
pabelanger | EmilienM: you'll have to ask mnaser that, maybe the jobs didn't get backported properly? The correct path forward is to use the new build-openstack-puppet-tarball job, as it simplifies things greatly | 01:42 |
*** Sukhdev has quit IRC | 01:44 | |
yamamoto | clarkb: both with the same build_uuid | 01:44 |
*** annp has joined #openstack-infra | 01:44 | |
clarkb | ok will have to investigate then | 01:45 |
*** edmondsw has quit IRC | 01:45 | |
clarkb | can you share an example uuid/query? | 01:45 |
yamamoto | clarkb: the query was message:"At completion, logs for this job will be available at" | 01:46 |
yamamoto | clarkb: for 7d period | 01:46 |
yamamoto | clarkb: build_uuid is eg. 2c30560da8f04966af83c1d951dd8603 | 01:46 |
clarkb | thanks will look in the morning | 01:47 |
*** dhinesh has quit IRC | 01:49 | |
mnaser | EmilienM: is it possible that stable/newton didn't have puppet in bindep, which is why it didn't work? | 01:53 |
EmilienM | mnaser: no, see https://review.openstack.org/#/c/515132/ which is on top of the bindep patch | 01:53 |
EmilienM | mnaser: and still failing after recheck | 01:54 |
*** gmann_afk is now known as gmann | 01:58 | |
*** camunoz has quit IRC | 01:58 | |
pabelanger | mnaser: revoke-sudo is called, before sudo install | 01:58 |
*** aeng has quit IRC | 02:00 | |
*** apetrich has quit IRC | 02:03 | |
*** apetrich has joined #openstack-infra | 02:04 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Remove publish-openstack-puppet-branch-tarball https://review.openstack.org/515984 | 02:06 |
mnaser | EmilienM: i wonder why it's failing for that | 02:07 |
mnaser | EmilienM: oooooh | 02:09 |
mnaser | one moment | 02:09 |
mnaser | EmilienM: https://review.openstack.org/#/q/Id68ee1b443a4172d0c1d6d58a04908c52a566623 you can blame mwhahaha for this one :D | 02:10 |
mnaser | oh, merge conflict | 02:10 |
EmilienM | I always do | 02:10 |
EmilienM | I can do the git thing | 02:11 |
mnaser | EmilienM: do you mind cherry picking that locally into stable/newton please | 02:11 |
mnaser | that will fix it for you | 02:11 |
*** threestrands has joined #openstack-infra | 02:11 | |
*** threestrands has quit IRC | 02:11 | |
*** threestrands has joined #openstack-infra | 02:11 | |
mnaser | you can do Depends-On as well to get your puppet-tripleo job to be green | 02:11 |
* mnaser goes back to winter tire shopping :( | 02:11 | |
EmilienM | mnaser: ok | 02:11 |
EmilienM | mnaser: I need to buy that also | 02:11 |
mnaser | november 15 is coming up :P | 02:12 |
EmilienM | mnaser: I don't live in QUebec :P | 02:12 |
EmilienM | I don't even know how it works here in BC lol | 02:12 |
EmilienM | mnaser: where do you go? | 02:14 |
mnaser | EmilienM: usually to Costco but they don’t carry tires in the size of my new car | 02:14 |
EmilienM | ok | 02:14 |
mnaser | I found a place in Ottawa that has nice packages with both tires + rims so I can swap them myself | 02:14 |
EmilienM | I'll let professionals do it :-D | 02:15 |
*** aeng has joined #openstack-infra | 02:17 | |
mgagne | isn't December 15 in Quebec? | 02:18 |
*** rkukura has quit IRC | 02:19 | |
pabelanger | I kinda wish I lived in northern Ontario, you can put studs on winter tires | 02:20 |
mgagne | yea, just read about it, you can have studs since October 1st but no laws regarding winter tires | 02:20 |
mnaser | mgagne: it was december 15th when they first started enforcing it, but the actual date was november 15th | 02:21 |
mnaser | wait | 02:21 |
mnaser | it is december 15 | 02:21 |
mnaser | i thought it was november 15 | 02:21 |
mgagne | yes, can't find anything about november | 02:21 |
*** rwsu has joined #openstack-infra | 02:22 | |
mgagne | so you have time, maybe someone suggested November in the news? | 02:22 |
clarkb | EmilienM: are you in vancouver or victoria? | 02:22 |
mnaser | let me keep telling myself november 15 so i can get it done earlier :P | 02:22 |
* mgagne said nothing | 02:22 | |
clarkb | if so you probably don't need tires unless driving up to e.g. whistler | 02:22 |
mnaser | mgagne: but also, i have summer performance tires which means the car is useless with the slightest of snow | 02:22 |
clarkb | pnw is typically relatively warm and wet | 02:22 |
*** aeng has quit IRC | 02:22 | |
mgagne | I read you are required by law to get winter tires in some BC areas | 02:22 |
mnaser | so dont wanna take any chances | 02:22 |
mgagne | hehe | 02:23 |
mgagne | and now to think about storing that motorcycle ^^' | 02:23 |
*** catintheroof has joined #openstack-infra | 02:24 | |
mgagne | time to go home and get some rest for more Nova Mitaka upgrade tomorrow :D | 02:25 |
mnaser | oh boy | 02:25 |
mnaser | bonne chance | 02:25 |
mgagne | thanks =) | 02:25 |
mnaser | odds of a change whose console only says "--- END OF STREAM ---" actually doing work? | 02:26 |
mnaser | 515937 legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet .. the reset would be massive :( | 02:26 |
clarkb | it typically is iirc | 02:26 |
clarkb | there is a bug where we dont always get a stream that hasnt been sortes but job is running | 02:26 |
mnaser | lets hope thats the case | 02:27 |
*** bobh has joined #openstack-infra | 02:30 | |
pabelanger | hmm | 02:33 |
pabelanger | could be 79 isn't listening again | 02:33 |
pabelanger | let me check quickly | 02:33 |
pabelanger | ya | 02:34 |
pabelanger | finger test@ze02.openstack.org | 02:34 |
pabelanger | is down | 02:34 |
mnaser | yep, it went through! | 02:34 |
*** thorst has joined #openstack-infra | 02:35 | |
*** aeng has joined #openstack-infra | 02:35 | |
pabelanger | netstat -na | grep \:79 | 02:35 |
pabelanger | returns nothing on ze02 | 02:35 |
ianw | pabelanger: is it a full executor restart to get that back? | 02:35 |
pabelanger | yah | 02:36 |
pabelanger | I think it is because of high load on the system | 02:36 |
pabelanger | then we somehow lose the socket | 02:36 |
pabelanger | ianw: actually | 02:36 |
pabelanger | https://review.openstack.org/516403/ | 02:36 |
pabelanger | we should land that, then do restart all our executors | 02:37 |
pabelanger | that will fix the issue you found yesterday | 02:37 |
ianw | pabelanger: yeah, i need to write a playbook for that, it was a bit of an emergency situation last night | 02:37 |
ianw | we should really stop the scheduler, restart executors, then restart scheduler right? | 02:38 |
pabelanger | https://review.openstack.org/510155/ | 02:38 |
pabelanger | I need to rebase that | 02:38 |
pabelanger | and address comments, but should give us a playbook | 02:39 |
*** salv-orl_ has joined #openstack-infra | 02:39 | |
*** thorst has quit IRC | 02:39 | |
pabelanger | ianw: no, we should be okay to keep scheduler running | 02:39 |
pabelanger | just stop executors | 02:39 |
pabelanger | then start | 02:39 |
*** rkukura has joined #openstack-infra | 02:40 | |
ianw | what about the running jobs though? they just die? | 02:40 |
pabelanger | scheduler will see job aborted and requeue it | 02:40 |
pabelanger | so, users shouldn't need to do anything, just that their job will restart a few times until the restarts are finished | 02:41 |
*** salv-orlando has quit IRC | 02:42 | |
ianw | ahh, ok. i guess last night, the executors were in their really odd state, which messed things up | 02:42 |
*** catintheroof has quit IRC | 02:43 | |
ianw | let's merge the typo fix, i'll see about playbook | 02:44 |
*** dhinesh has joined #openstack-infra | 02:45 | |
*** reed_ has joined #openstack-infra | 02:52 | |
*** gildub has joined #openstack-infra | 02:54 | |
*** reed_ has quit IRC | 02:54 | |
clarkb | yamamoto: I think the issue is that we use the archive ansible module to gzip job-output.txt and it does not remove the original file by default | 02:55 |
clarkb | yamamoto: so when we look at the filesystem to create the jobs we see both job-output.txt and job-output.txt.gz | 02:56 |
*** dave-mccowan has quit IRC | 02:56 | |
clarkb | we actually just want job-output.txt I think | 02:56 |
*** hongbin has quit IRC | 02:56 | |
*** hongbin_ has joined #openstack-infra | 02:56 | |
yamamoto | clarkb: so adding $ to regex would solve the issue? | 02:57 |
*** cshastri has joined #openstack-infra | 02:57 | |
clarkb | yamamoto: I think so at least for this specific instance | 02:58 |
clarkb | (which may be sufficient) | 02:58 |
clarkb | this could also partly explain why we are behind in indexing | 02:59 |
clarkb | we are indexing twice as much data as we should | 02:59 |
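A minimal sketch of the regex change being discussed here, assuming the log-processor configuration lists files to index by regex under a files/name/tags layout; the field names and surrounding structure are illustrative, not the actual project-config contents:

```yaml
# Illustrative only: anchor the pattern so job-output.txt.gz no longer
# matches in addition to job-output.txt.
files:
  # before: job-output\.txt   (also matches job-output.txt.gz)
  # after:  job-output\.txt$  (matches only the uncompressed file)
  - name: job-output\.txt$
    tags:
      - console
```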
*** hongbin_ has quit IRC | 02:59 | |
*** hongbin has joined #openstack-infra | 03:00 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix syntax with gear unRegisterFunction() https://review.openstack.org/516403 | 03:03 |
*** bobh has quit IRC | 03:08 | |
*** rosmaita has quit IRC | 03:11 | |
*** bobh has joined #openstack-infra | 03:14 | |
*** Sukhdev has joined #openstack-infra | 03:15 | |
*** cody-somerville has joined #openstack-infra | 03:17 | |
*** lathiat_ has joined #openstack-infra | 03:18 | |
*** ramishra has joined #openstack-infra | 03:19 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors https://review.openstack.org/510155 | 03:19 |
*** lathiat has quit IRC | 03:19 | |
*** hongbin has quit IRC | 03:22 | |
*** esberglu_ has quit IRC | 03:22 | |
*** esberglu has joined #openstack-infra | 03:22 | |
ianw | ok i've restarted ze01 using ^ to pick up ^^ | 03:23 |
ianw | i will monitor for a bit before doing others | 03:23 |
*** edmondsw has joined #openstack-infra | 03:24 | |
*** liujiong has joined #openstack-infra | 03:26 | |
*** edmondsw has quit IRC | 03:29 | |
*** vkmc has quit IRC | 03:29 | |
jeblair | ianw, pabelanger: let's give it longer to stop. like 15-20m? | 03:33 |
jeblair | (in the playbook) | 03:33 |
*** gouthamr has joined #openstack-infra | 03:34 | |
*** vkmc has joined #openstack-infra | 03:35 | |
jeblair | pabelanger, ianw, Shrews: i think we have some more error logging now which may have information on why finger daemon died on ze02, we should look for that. | 03:35 |
*** armax has quit IRC | 03:35 | |
pabelanger | jeblair: ianw: yah, I don't think it worked well on ze01. We have 3 zuul-executor processes and 1 defunct process now | 03:35 |
*** armax has joined #openstack-infra | 03:36 | |
ianw | yep, just poking and noticed that | 03:36 |
pabelanger | which usually means we started an executor while another one was shutting down | 03:36 |
ianw | it did correctly find that the pid had disappeared though? it did not timeout | 03:36 |
pabelanger | I have to run now, should be able to stop both sockets again | 03:36 |
*** bobh has quit IRC | 03:37 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical https://review.openstack.org/516502 | 03:37 |
ianw | jeblair: yeah ... but at least checking that pid didn't even hit that timeout (http://paste.openstack.org/show/625041/) | 03:37 |
clarkb | yamamoto: jeblair dmsimard ^ that is totally untested but I think we may want to do something like that to solve both the query logical name problem and the double indexing of job-output.txt/job-output.txt.gz | 03:37 |
clarkb | jeblair: ^ btw I think yamamoto discovered the cause of the increase in index volume we are indexing console logs twice | 03:38 |
ianw | i'm going to stop it manually and see what disappears | 03:38 |
jeblair | yamamoto: thanks! :) | 03:38 |
openstackgerrit | Kien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui gate jobs https://review.openstack.org/516503 | 03:38 |
*** armaan has joined #openstack-infra | 03:39 | |
openstackgerrit | Kien Nguyen proposed openstack-infra/openstack-zuul-jobs master: Remove Zun-ui legacy gate jobs https://review.openstack.org/516504 | 03:39 |
ianw | ok the init.d stop has returned, the process from the pid file is still there | 03:41 |
openstackgerrit | Kien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui legacy gate jobs https://review.openstack.org/516503 | 03:42 |
jeblair | replacement process was probably unable to create a socket file | 03:42 |
ianw | zuul 22102 11596 0 Oct30 ? 00:00:00 [git] <defunct> | 03:44 |
ianw | zuul 22215 1 0 Oct30 ? 00:00:00 ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 zuul@review.openstack.org git-upload-pack '/openstack/tripleo-heat-templates' | 03:44 |
ianw | that ssh parented to init ... | 03:44 |
ianw | ok, 5301 disappeared after 03:47:10 - 03:39:45 | 03:47 |
ianw | so 10 minutes minimum i guess | 03:48 |
ianw | ianw@ze01:~$ ps -aef | grep [z]uul-e | 03:48 |
ianw | zuul 11596 1 8 Oct27 ? 06:42:58 /usr/bin/python3 /usr/local/bin/zuul-executor | 03:48 |
ianw | zuul 11599 11596 0 Oct27 ? 00:00:41 /usr/bin/python3 /usr/local/bin/zuul-executor | 03:48 |
ianw | zuul 21113 11599 0 Oct29 ? 00:00:00 /usr/bin/python3 /usr/local/bin/zuul-executor | 03:48 |
ianw | i'll manually clean up these | 03:48 |
*** dhinesh_ has joined #openstack-infra | 03:53 | |
*** dhinesh has quit IRC | 03:53 | |
*** markvoelker has quit IRC | 03:55 | |
*** ijw has joined #openstack-infra | 04:04 | |
*** ykarel has joined #openstack-infra | 04:06 | |
*** udesale has joined #openstack-infra | 04:06 | |
*** ijw has quit IRC | 04:09 | |
*** xingchao has joined #openstack-infra | 04:13 | |
*** armax_ has joined #openstack-infra | 04:21 | |
*** rkukura_ has joined #openstack-infra | 04:22 | |
*** armax has quit IRC | 04:22 | |
*** notmyname has quit IRC | 04:22 | |
*** armax_ is now known as armax | 04:22 | |
*** dhinesh_ has quit IRC | 04:23 | |
*** jpena|off has quit IRC | 04:23 | |
*** dhinesh has joined #openstack-infra | 04:23 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors https://review.openstack.org/510155 | 04:24 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts https://review.openstack.org/516510 | 04:24 |
*** vhosakot_ has joined #openstack-infra | 04:24 | |
*** notmyname has joined #openstack-infra | 04:24 | |
*** vhosakot_ has quit IRC | 04:24 | |
*** sc` has quit IRC | 04:24 | |
*** sc` has joined #openstack-infra | 04:25 | |
*** jpena|off has joined #openstack-infra | 04:25 | |
*** nhicher has quit IRC | 04:25 | |
*** nhicher has joined #openstack-infra | 04:25 | |
*** rkukura has quit IRC | 04:25 | |
*** rkukura_ is now known as rkukura | 04:25 | |
*** vhosakot has quit IRC | 04:27 | |
*** cshastri has quit IRC | 04:31 | |
*** vsaienk0 has joined #openstack-infra | 04:31 | |
*** thorst has joined #openstack-infra | 04:34 | |
*** gouthamr has quit IRC | 04:34 | |
*** vhosakot has joined #openstack-infra | 04:34 | |
*** thorst has quit IRC | 04:39 | |
*** vsaienk0 has quit IRC | 04:41 | |
*** xingchao has quit IRC | 04:49 | |
*** armax has quit IRC | 04:50 | |
*** zhurong has quit IRC | 04:55 | |
*** Sukhdev has quit IRC | 04:55 | |
*** markvoelker has joined #openstack-infra | 04:56 | |
*** dhinesh has quit IRC | 05:06 | |
*** janki has joined #openstack-infra | 05:06 | |
yamamoto | can in-repo .zuul.yaml have periodic jobs? | 05:06 |
*** liusheng has quit IRC | 05:07 | |
*** liusheng has joined #openstack-infra | 05:07 | |
*** edmondsw has joined #openstack-infra | 05:13 | |
*** edmondsw has quit IRC | 05:17 | |
*** sree has joined #openstack-infra | 05:19 | |
*** gildub has quit IRC | 05:25 | |
*** janki has quit IRC | 05:27 | |
*** janki has joined #openstack-infra | 05:27 | |
*** markvoelker has quit IRC | 05:30 | |
*** mat128 has quit IRC | 05:30 | |
*** yamahata has joined #openstack-infra | 05:35 | |
*** xingchao has joined #openstack-infra | 05:36 | |
*** gildub has joined #openstack-infra | 05:37 | |
*** kiennt26 has quit IRC | 05:42 | |
*** gildub has quit IRC | 05:46 | |
*** zhurong has joined #openstack-infra | 05:46 | |
*** threestrands has quit IRC | 05:46 | |
*** cshastri has joined #openstack-infra | 05:58 | |
ianw | pabelanger: ahhh! "path": "/proc/11206\n/status" ... the '\n' is why it doesn't wait properly | 05:58 |
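A minimal sketch of the fix this points at, assuming the reset playbook builds a /proc/<pid>/status path from a pidfile whose contents keep a trailing newline; task names and paths are illustrative:

```yaml
# Illustrative only: strip the trailing newline from the pidfile before
# building the /proc path, otherwise the path becomes "/proc/11206\n/status",
# which never exists, so the wait returns immediately instead of waiting
# for the process to go away.
- name: Read the old zuul-executor pid
  command: cat /var/run/zuul-executor/executor.pid
  register: old_pid

- name: Wait for the old executor process to exit
  wait_for:
    path: "/proc/{{ old_pid.stdout | trim }}/status"
    state: absent
    timeout: 600
```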
*** threestrands has joined #openstack-infra | 05:58 | |
*** threestrands has quit IRC | 05:58 | |
*** threestrands has joined #openstack-infra | 05:58 | |
*** threestrands has quit IRC | 06:03 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 06:05 |
*** kiennt26 has joined #openstack-infra | 06:05 | |
*** ijw has joined #openstack-infra | 06:05 | |
openstackgerrit | Chason Chan proposed openstack-infra/project-config master: Add pike branch for OpenStack-Manuals gerritbot https://review.openstack.org/516523 | 06:08 |
*** ijw has quit IRC | 06:10 | |
*** gildub has joined #openstack-infra | 06:12 | |
*** dhajare has joined #openstack-infra | 06:15 | |
*** aeng has quit IRC | 06:19 | |
AJaeger | yamamoto: yes, should be possible - try it out - and point me to your change for review | 06:22 |
*** esberglu has quit IRC | 06:24 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 06:25 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors https://review.openstack.org/510155 | 06:25 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts https://review.openstack.org/516510 | 06:25 |
*** markvoelker has joined #openstack-infra | 06:27 | |
*** gongysh has joined #openstack-infra | 06:28 | |
*** nikhil has quit IRC | 06:30 | |
*** thorst has joined #openstack-infra | 06:35 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 06:35 |
*** linkedinyou has joined #openstack-infra | 06:37 | |
*** ijw has joined #openstack-infra | 06:38 | |
openstackgerrit | Merged openstack-infra/project-config master: Setup Contributor Guide in Storyboard https://review.openstack.org/516462 | 06:38 |
*** thorst has quit IRC | 06:39 | |
*** ijw has quit IRC | 06:42 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 06:44 |
*** hemna_ has quit IRC | 06:45 | |
*** hemna_ has joined #openstack-infra | 06:46 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 06:53 |
*** tosky has joined #openstack-infra | 06:53 | |
*** ijw has joined #openstack-infra | 06:54 | |
*** vhosakot has quit IRC | 06:54 | |
*** jtomasek has joined #openstack-infra | 06:58 | |
*** ijw has quit IRC | 06:58 | |
*** rcernin has quit IRC | 06:59 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 07:00 |
*** jtomasek has quit IRC | 07:00 | |
*** markvoelker has quit IRC | 07:00 | |
openstackgerrit | Rui Chen proposed openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor https://review.openstack.org/516532 | 07:03 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing https://review.openstack.org/516010 | 07:05 |
*** spectr has joined #openstack-infra | 07:08 | |
*** salv-orl_ has quit IRC | 07:11 | |
*** salv-orlando has joined #openstack-infra | 07:11 | |
*** yamahata has quit IRC | 07:14 | |
*** esberglu has joined #openstack-infra | 07:15 | |
*** salv-orlando has quit IRC | 07:16 | |
*** kiennt26 has quit IRC | 07:17 | |
*** pcaruana has joined #openstack-infra | 07:17 | |
*** vsaienk0 has joined #openstack-infra | 07:18 | |
*** aviau has quit IRC | 07:19 | |
*** esberglu has quit IRC | 07:19 | |
*** tosky has quit IRC | 07:19 | |
*** kiennt26 has joined #openstack-infra | 07:19 | |
*** aviau has joined #openstack-infra | 07:19 | |
*** gildub has quit IRC | 07:21 | |
*** salv-orlando has joined #openstack-infra | 07:21 | |
*** gildub has joined #openstack-infra | 07:22 | |
openstackgerrit | Bhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project https://review.openstack.org/515337 | 07:24 |
openstackgerrit | Bhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project https://review.openstack.org/515337 | 07:26 |
openstackgerrit | Bhagyashri Shewale proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project https://review.openstack.org/516537 | 07:26 |
*** vsaienk0 has quit IRC | 07:32 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican https://review.openstack.org/510390 | 07:32 |
*** jtomasek has joined #openstack-infra | 07:33 | |
*** vsaienk0 has joined #openstack-infra | 07:34 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican https://review.openstack.org/510390 | 07:38 |
*** linkedinyou has quit IRC | 07:39 | |
*** rcernin has joined #openstack-infra | 07:44 | |
*** shardy has joined #openstack-infra | 07:45 | |
*** shardy has quit IRC | 07:45 | |
*** ffledgling has left #openstack-infra | 07:48 | |
*** shardy has joined #openstack-infra | 07:49 | |
*** gildub has quit IRC | 07:50 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Barbican legacy jobs https://review.openstack.org/510414 | 07:54 |
*** markvoelker has joined #openstack-infra | 07:58 | |
*** ykarel is now known as ykarel|lunch | 08:03 | |
*** tmorin has joined #openstack-infra | 08:09 | |
*** ralonsoh has joined #openstack-infra | 08:15 | |
*** Liced has joined #openstack-infra | 08:15 | |
*** tesseract has joined #openstack-infra | 08:16 | |
*** sree has quit IRC | 08:17 | |
*** ccamacho has joined #openstack-infra | 08:17 | |
*** kiennt26 has quit IRC | 08:17 | |
*** priteau has joined #openstack-infra | 08:22 | |
leyal | Hi, I need some help - when I try to upload a patch I get the following message: "Received disconnect from 104.130.246.91 port 29418:12: Too many concurrent connections (64) - max. allowed: 64" | 08:22 |
leyal | But I don't have any open connections to 104.130.246.91 .. | 08:22 |
*** yamamoto has quit IRC | 08:23 | |
openstackgerrit | Niraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project https://review.openstack.org/516550 | 08:26 |
openstackgerrit | Niraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project https://review.openstack.org/516550 | 08:27 |
openstackgerrit | Niraj Singh proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project https://review.openstack.org/516552 | 08:27 |
*** pgadiya has joined #openstack-infra | 08:27 | |
*** salv-orlando has quit IRC | 08:27 | |
*** salv-orlando has joined #openstack-infra | 08:28 | |
*** d0ugal has quit IRC | 08:29 | |
*** d0ugal has joined #openstack-infra | 08:29 | |
*** markvoelker has quit IRC | 08:30 | |
*** alexchadin has joined #openstack-infra | 08:31 | |
*** gcb has joined #openstack-infra | 08:32 | |
*** salv-orlando has quit IRC | 08:32 | |
*** jpena|off is now known as jpena | 08:33 | |
*** ociuhandu has joined #openstack-infra | 08:42 | |
*** ykarel|lunch is now known as ykarel | 08:42 | |
*** amoralej|off is now known as amoralej | 08:44 | |
*** jpich has joined #openstack-infra | 08:44 | |
*** hashar has joined #openstack-infra | 08:46 | |
*** edmondsw has joined #openstack-infra | 08:49 | |
*** salv-orlando has joined #openstack-infra | 08:50 | |
*** edmondsw has quit IRC | 08:53 | |
*** sdague has joined #openstack-infra | 08:53 | |
*** dhajare has quit IRC | 08:56 | |
*** baoli has quit IRC | 08:58 | |
*** baoli has joined #openstack-infra | 08:58 | |
ianw | leyal: are you behind some sort of nat? | 08:58 |
leyal | ianw, thanks for answering me. I am working from my home, so I am the only one using gerrit from this network .. | 09:00 |
*** dingyichen has quit IRC | 09:01 | |
*** gmann is now known as gmann_afk | 09:02 | |
*** cuongnv has quit IRC | 09:03 | |
*** gcb_ has joined #openstack-infra | 09:03 | |
*** baoli has quit IRC | 09:03 | |
*** zhurong has quit IRC | 09:03 | |
*** gcb has quit IRC | 09:04 | |
*** annp has quit IRC | 09:04 | |
*** dhajare has joined #openstack-infra | 09:05 | |
ianw | leyal: i can see logins from yourself but no particular errors. is this persistent? | 09:06 |
leyal | ianw, It started yesterday, and since then it's been persistent (I tried git review ~10 times in the last 3 hours) | 09:10 |
Liced | hi, AJaeger told me yesterday that the translation job was broken 10 days ago. translation doesn't work for https://github.com/openstack/networking-bgpvpn even though the last merge on the project was yesterday, so my translation setup doesn't work and I can't find the solution | 09:11 |
Liced | translation support was added in https://review.openstack.org/486349 and translation is activated in project-config in https://review.openstack.org/509178 but after the last merge the project in zanata is still empty | 09:14 |
*** jascott1 has quit IRC | 09:14 | |
*** jascott1 has joined #openstack-infra | 09:15 | |
*** martinkopec has joined #openstack-infra | 09:15 | |
*** Kevin_Zheng has joined #openstack-infra | 09:15 | |
*** lucas-afk is now known as lucasagomes | 09:19 | |
*** jascott1 has quit IRC | 09:19 | |
*** electrofelix has joined #openstack-infra | 09:20 | |
ianw | gerrit2@review:~$ ssh -i review_site/etc/ssh_host_rsa_key -p 29418 'Gerrit Code Review'@127.0.0.1 gerrit show-connections -n | grep 'a/26131' | wc -l | 09:21 |
ianw | 64 | 09:21 |
ianw | infra-root: ^ somehow leyal is leaking connections | 09:21 |
*** yamamoto has joined #openstack-infra | 09:24 | |
ianw | leyal: i've forcibly closed all the open connections, can you try again? | 09:24 |
*** markvoelker has joined #openstack-infra | 09:27 | |
*** jistr|mtgs is now known as jistr | 09:29 | |
leyal | ianw, i tried again and it's ok now .. | 09:32 |
*** yamamoto has quit IRC | 09:34 | |
ianw | ok, and i confirmed there was no open connection just now, so it doesn't appear to be leaking any more | 09:34 |
*** liujiong_lj has joined #openstack-infra | 09:35 | |
*** liujiong has quit IRC | 09:35 | |
ianw | seeing as both are from ip's that are not your current one, but from your isp, i think it's probably transient. keep an eye on it; if problems recur you can point back to this in the logs and we can investigate further | 09:35 |
*** SpamapS has quit IRC | 09:36 | |
*** sflanigan has quit IRC | 09:36 | |
*** sflanigan has joined #openstack-infra | 09:36 | |
*** sflanigan has joined #openstack-infra | 09:36 | |
*** bradm has quit IRC | 09:36 | |
ianw | ("both" above being the ip's against the open sessions that i killed, to be clear) | 09:37 |
*** wolverineav has joined #openstack-infra | 09:38 | |
leyal | ianw, thanks ! | 09:39 |
*** bradm has joined #openstack-infra | 09:39 | |
*** SpamapS has joined #openstack-infra | 09:40 | |
*** shardy has quit IRC | 09:42 | |
*** shardy has joined #openstack-infra | 09:42 | |
*** dsariel__ has joined #openstack-infra | 09:44 | |
*** wolverineav has quit IRC | 09:46 | |
*** wolverineav has joined #openstack-infra | 09:47 | |
*** owalsh- is now known as owalsh | 09:47 | |
ianw | jeblair / pabelanger: i have not ended up touching ze02 as i haven't had a chance to look for any info on the finger death. i can look tomorrow if you don't get to it | 09:48 |
*** slaweq has joined #openstack-infra | 09:49 | |
openstackgerrit | Niraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project https://review.openstack.org/516550 | 09:51 |
*** wolverineav has quit IRC | 09:51 | |
*** slaweq_ has quit IRC | 09:51 | |
*** pgadiya has quit IRC | 09:52 | |
*** bradm has quit IRC | 09:53 | |
*** bradm has joined #openstack-infra | 09:54 | |
*** sambetts|afk is now known as sambetts | 09:55 | |
*** kjackal_ has joined #openstack-infra | 09:55 | |
*** mandre has quit IRC | 09:55 | |
*** mandre_ has joined #openstack-infra | 09:56 | |
*** mandre_ is now known as mandre | 09:56 | |
openstackgerrit | Merged openstack-infra/project-config master: Fix Grafana neutron-lib dashboard https://review.openstack.org/514801 | 09:59 |
*** namnh has quit IRC | 10:00 | |
*** armaan_ has joined #openstack-infra | 10:00 | |
*** jistr_ has joined #openstack-infra | 10:00 | |
*** hemna- has joined #openstack-infra | 10:00 | |
*** niska` has joined #openstack-infra | 10:00 | |
*** markvoelker has quit IRC | 10:00 | |
*** xhku_ has joined #openstack-infra | 10:01 | |
openstackgerrit | Merged openstack-infra/project-config master: Publish requirements loci images to DockerHub https://review.openstack.org/512941 | 10:01 |
openstackgerrit | Merged openstack-infra/project-config master: ironic: Remove publish-to-pypi add release-openstack-server https://review.openstack.org/516453 | 10:01 |
*** witek has quit IRC | 10:01 | |
*** niska has quit IRC | 10:01 | |
*** jistr has quit IRC | 10:01 | |
*** jschlueter has quit IRC | 10:01 | |
*** hemna has quit IRC | 10:01 | |
*** fbouliane has quit IRC | 10:01 | |
*** michaelxin has quit IRC | 10:01 | |
*** timrc has quit IRC | 10:01 | |
*** armaan has quit IRC | 10:01 | |
*** rwsu has quit IRC | 10:01 | |
*** krtaylor has quit IRC | 10:01 | |
*** askb has quit IRC | 10:01 | |
*** zerick has quit IRC | 10:01 | |
*** migi has quit IRC | 10:01 | |
*** admcleod_ has quit IRC | 10:01 | |
*** admcleod has joined #openstack-infra | 10:01 | |
*** zerick has joined #openstack-infra | 10:01 | |
*** isq_ has joined #openstack-infra | 10:01 | |
*** askb has joined #openstack-infra | 10:02 | |
*** krtaylor has joined #openstack-infra | 10:02 | |
*** rwsu has joined #openstack-infra | 10:02 | |
*** witek has joined #openstack-infra | 10:02 | |
*** michaelxin has joined #openstack-infra | 10:02 | |
*** timrc has joined #openstack-infra | 10:02 | |
*** migi has joined #openstack-infra | 10:02 | |
*** isq has quit IRC | 10:03 | |
*** Jeffrey4l has quit IRC | 10:04 | |
*** zoli has quit IRC | 10:04 | |
*** pgadiya has joined #openstack-infra | 10:06 | |
*** pblaho has joined #openstack-infra | 10:06 | |
*** zoli has joined #openstack-infra | 10:06 | |
*** Jeffrey4l has joined #openstack-infra | 10:07 | |
*** pblaho has quit IRC | 10:09 | |
*** pblaho has joined #openstack-infra | 10:09 | |
*** bkero has quit IRC | 10:11 | |
*** kota_ has quit IRC | 10:11 | |
*** kota_ has joined #openstack-infra | 10:11 | |
*** bkero has joined #openstack-infra | 10:12 | |
*** tobiash has quit IRC | 10:12 | |
*** tobiash has joined #openstack-infra | 10:15 | |
*** e0ne has joined #openstack-infra | 10:15 | |
*** liujiong_lj has quit IRC | 10:18 | |
*** pgadiya has quit IRC | 10:24 | |
*** udesale has quit IRC | 10:26 | |
*** sree has joined #openstack-infra | 10:29 | |
*** boden has joined #openstack-infra | 10:33 | |
*** sree has quit IRC | 10:33 | |
*** sree has joined #openstack-infra | 10:35 | |
*** jschlueter|znc has joined #openstack-infra | 10:35 | |
*** hemna_ has quit IRC | 10:36 | |
*** thorst has joined #openstack-infra | 10:36 | |
*** edmondsw has joined #openstack-infra | 10:37 | |
*** pgadiya has joined #openstack-infra | 10:38 | |
*** yamamoto has joined #openstack-infra | 10:39 | |
*** sree has quit IRC | 10:39 | |
*** edmondsw has quit IRC | 10:41 | |
*** ociuhandu has quit IRC | 10:42 | |
*** [HeOS] has quit IRC | 10:42 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 10:43 |
*** thorst has quit IRC | 10:43 | |
AJaeger | Liced: let me check... | 10:46 |
*** ociuhandu has joined #openstack-infra | 10:46 | |
AJaeger | Liced: that merged yesterday at a time that Zuul was unhappy ;( | 10:46 |
AJaeger | We had to restart Zuul and the post job never ran. | 10:47 |
AJaeger | Liced: So, waiting for next merge ;( | 10:47 |
*** vsaienk0 has quit IRC | 10:47 | |
*** gongysh has quit IRC | 10:51 | |
*** pbourke has quit IRC | 10:53 | |
*** vsaienk0 has joined #openstack-infra | 10:53 | |
*** pbourke has joined #openstack-infra | 10:54 | |
*** ijw has joined #openstack-infra | 10:55 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Increase timeout for requirements propose job https://review.openstack.org/516610 | 10:57 |
*** markvoelker has joined #openstack-infra | 10:58 | |
*** ijw has quit IRC | 10:59 | |
*** huanxie has quit IRC | 10:59 | |
hwoarang | good day | 11:02 |
hwoarang | I am seeing some problems with some internal openstack mirrors for opensuse | 11:02 |
hwoarang | as you can see from here http://logs.openstack.org/27/511227/2/gate/openstack-ansible-functional-opensuse-423/c0d460c/host/lxc-cache-prep-commands.log.txt.gz downloading a package takes a while and the job times out | 11:03 |
hwoarang | the mirror which is used is http://mirror.mtl01.inap.openstack.org/opensuse/... | 11:03 |
hwoarang | fungi pabelanger dirk^ | 11:04 |
hwoarang | it's been hitting all the openstack-ansible jobs for quite a while | 11:04 |
dirk | hwoarang: about to turn off mobile phone for the next 24 hours. Will be back from Sydney | 11:08 |
dirk | hwoarang: it looks like I can reach that mirror. Maybe mtu or ipv6 issues? | 11:08 |
hwoarang | dirk: the mirror is reachable but terribly slow as it seems from the job output. it takes 10 minutes to download a few packages | 11:09 |
*** priteau has quit IRC | 11:09 | |
hwoarang | and the job is killed | 11:09 |
*** priteau has joined #openstack-infra | 11:10 | |
*** Hal has joined #openstack-infra | 11:10 | |
*** Hal is now known as Guest95277 | 11:10 | |
*** do3 has joined #openstack-infra | 11:10 | |
*** priteau has quit IRC | 11:11 | |
*** do3 has left #openstack-infra | 11:11 | |
*** hashar is now known as hasharLunch | 11:12 | |
dirk | hwoarang: smells like mtu issue to me | 11:12 |
dirk | hwoarang: can you add debug output for testing that theory? Maybe there is something mtu related different just for opensuse | 11:13 |
hwoarang | dirk: i will have a look but ubuntu also uses mtu 1500 | 11:14 |
*** armaan_ has quit IRC | 11:14 | |
*** armaan has joined #openstack-infra | 11:15 | |
dirk | hwoarang: and you can reproduce the slowness? | 11:17 |
*** rcernin has quit IRC | 11:18 | |
hwoarang | i can't reproduce it outside of openstack gates | 11:18 |
*** ldnunes has joined #openstack-infra | 11:18 | |
dirk | hwoarang: weird. | 11:19 |
*** kjackal_ has quit IRC | 11:20 | |
hwoarang | i have no proof that it's mtu related because the host progresses fine with downloads and setup, and it only starts to fail about 20 minutes down the road | 11:20 |
hwoarang | when running a chroot zypper command to prepare a chroot | 11:20 |
hwoarang | anyway | 11:20 |
*** stakeda has quit IRC | 11:22 | |
*** panda|ruck|off is now known as panda|ruck | 11:24 | |
openstackgerrit | Arx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag https://review.openstack.org/516624 | 11:24 |
*** armaan has quit IRC | 11:25 | |
*** armaan has joined #openstack-infra | 11:26 | |
*** hemna has joined #openstack-infra | 11:29 | |
vdrok | good morning folks. could someone take a look at https://review.openstack.org/515716? I've added job definitions as per ML thread, but still only the jobs from project-config are run | 11:29 |
*** wolverineav has joined #openstack-infra | 11:29 | |
*** markvoelker has quit IRC | 11:31 | |
*** sileht has quit IRC | 11:33 | |
*** sileht has joined #openstack-infra | 11:34 | |
*** ociuhandu has quit IRC | 11:36 | |
*** armaan has quit IRC | 11:37 | |
*** armaan has joined #openstack-infra | 11:37 | |
*** armaan has quit IRC | 11:42 | |
*** smatzek has joined #openstack-infra | 11:44 | |
*** esberglu has joined #openstack-infra | 11:48 | |
*** salv-orlando has quit IRC | 11:48 | |
*** pgadiya has quit IRC | 11:51 | |
*** rosmaita has joined #openstack-infra | 11:51 | |
*** esberglu has quit IRC | 11:52 | |
*** udesale has joined #openstack-infra | 11:54 | |
*** salv-orlando has joined #openstack-infra | 11:55 | |
*** jaypipes has joined #openstack-infra | 11:55 | |
*** thorst has joined #openstack-infra | 11:58 | |
*** kjackal_ has joined #openstack-infra | 12:05 | |
*** hemna has quit IRC | 12:07 | |
*** shardy is now known as shardy_lunch | 12:07 | |
*** alexchadin has quit IRC | 12:09 | |
*** yamamoto has quit IRC | 12:11 | |
*** yamamoto has joined #openstack-infra | 12:11 | |
*** thorre has quit IRC | 12:13 | |
*** dprince has joined #openstack-infra | 12:13 | |
*** thorre has joined #openstack-infra | 12:16 | |
*** armaan has joined #openstack-infra | 12:17 | |
*** hemna has joined #openstack-infra | 12:19 | |
*** armaan has quit IRC | 12:20 | |
*** markvoelker has joined #openstack-infra | 12:21 | |
*** dhajare has quit IRC | 12:22 | |
*** martinkopec has quit IRC | 12:24 | |
*** salv-orlando has quit IRC | 12:24 | |
*** martinkopec has joined #openstack-infra | 12:25 | |
*** edmondsw_ has joined #openstack-infra | 12:25 | |
*** rhallisey has joined #openstack-infra | 12:28 | |
*** thorst_ has joined #openstack-infra | 12:29 | |
*** thorst has quit IRC | 12:31 | |
*** catintheroof has joined #openstack-infra | 12:31 | |
*** catintheroof has quit IRC | 12:32 | |
*** catintheroof has joined #openstack-infra | 12:32 | |
*** rlandy has joined #openstack-infra | 12:34 | |
*** trown|outtypewww is now known as trown | 12:34 | |
*** dave-mccowan has joined #openstack-infra | 12:35 | |
*** [HeOS] has joined #openstack-infra | 12:38 | |
*** jcoufal has joined #openstack-infra | 12:39 | |
*** salv-orlando has joined #openstack-infra | 12:39 | |
*** jonher has joined #openstack-infra | 12:40 | |
*** tosky has joined #openstack-infra | 12:44 | |
Shrews | pabelanger: jeblair: I searched ze02 logs for the new finger daemon logging on abnormal exception back to Oct 20th. Found nothing. | 12:45 |
dmsimard | Shrews: that catches finger:// urls being returned as logs ? | 12:46 |
*** lucasagomes is now known as lucas-hungry | 12:47 | |
*** Dinesh_Bhor has quit IRC | 12:47 | |
Shrews | dmsimard: no | 12:47 |
Shrews | only unexpected exceptions from the daemon | 12:48 |
dmsimard | Shrews: oh, okay, cause I have a reproducer for those errors :) | 12:48 |
*** udesale has quit IRC | 12:49 | |
*** udesale has joined #openstack-infra | 12:49 | |
*** zhurong has joined #openstack-infra | 12:49 | |
*** shardy_lunch is now known as shardy | 12:49 | |
*** LindaWang has quit IRC | 12:50 | |
*** jpena is now known as jpena|lunch | 12:51 | |
*** dhajare has joined #openstack-infra | 12:52 | |
Liced | AJaeger: bad luck for me | 12:52 |
*** esberglu has joined #openstack-infra | 12:54 | |
*** udesale has quit IRC | 12:56 | |
*** yamamoto has quit IRC | 12:56 | |
*** udesale has joined #openstack-infra | 12:56 | |
*** mandre is now known as mandre_afk | 12:56 | |
*** janki has quit IRC | 12:58 | |
*** bh526r has joined #openstack-infra | 13:01 | |
*** felipemonteiro has joined #openstack-infra | 13:03 | |
*** martinkopec has quit IRC | 13:03 | |
*** mat128 has joined #openstack-infra | 13:04 | |
*** edmondsw_ is now known as edmondsw | 13:04 | |
*** martinkopec has joined #openstack-infra | 13:04 | |
*** LindaWang has joined #openstack-infra | 13:06 | |
*** hasharLunch is now known as hashar | 13:09 | |
*** yamamoto has joined #openstack-infra | 13:10 | |
*** kgiusti has joined #openstack-infra | 13:13 | |
*** bh526r has quit IRC | 13:13 | |
*** bh526r has joined #openstack-infra | 13:14 | |
*** yamamoto has quit IRC | 13:15 | |
*** jcoufal_ has joined #openstack-infra | 13:16 | |
*** jascott1 has joined #openstack-infra | 13:17 | |
*** mriedem has joined #openstack-infra | 13:18 | |
*** jcoufal has quit IRC | 13:18 | |
*** salv-orlando has quit IRC | 13:19 | |
*** hemna has quit IRC | 13:20 | |
*** salv-orlando has joined #openstack-infra | 13:20 | |
*** jascott1 has quit IRC | 13:21 | |
*** bobh has joined #openstack-infra | 13:23 | |
jonher | Is it possible to merge two gerrit accounts? I seem to have double accounts because I've logged in using two different ubuntu accounts to review | 13:23 |
*** salv-orlando has quit IRC | 13:24 | |
*** smatzek has quit IRC | 13:26 | |
*** smatzek has joined #openstack-infra | 13:27 | |
openstackgerrit | Luka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin https://review.openstack.org/516673 | 13:27 |
Shrews | vdrok: That's a good question. I'm not sure what's going on there. We'll have to wait for jeblair b/c I'm interested in the reason too. | 13:30 |
*** smatzek has quit IRC | 13:31 | |
openstackgerrit | Arx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag https://review.openstack.org/516624 | 13:32 |
*** nikhil has joined #openstack-infra | 13:32 | |
*** jaosorior has quit IRC | 13:33 | |
*** baoli has joined #openstack-infra | 13:35 | |
vdrok | Shrews: ok, thank you | 13:36 |
openstackgerrit | Luka Peschke proposed openstack-infra/project-config master: Add initial jobs for CloudKitty Tempest plugin https://review.openstack.org/516679 | 13:36 |
*** ldnunes has quit IRC | 13:37 | |
*** lbragstad has joined #openstack-infra | 13:37 | |
*** ldnunes has joined #openstack-infra | 13:37 | |
*** lucas-hungry is now known as lucasagomes | 13:41 | |
*** eharney has joined #openstack-infra | 13:41 | |
*** yamamoto has joined #openstack-infra | 13:41 | |
*** zhurong has quit IRC | 13:42 | |
*** hongbin has joined #openstack-infra | 13:43 | |
*** dhajare has quit IRC | 13:43 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 13:44 |
*** yamamoto has quit IRC | 13:45 | |
*** felipemonteiro_ has joined #openstack-infra | 13:46 | |
openstackgerrit | Luka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin https://review.openstack.org/516673 | 13:46 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 13:48 |
*** amoralej is now known as amoralej|lunch | 13:48 | |
*** kiennt26 has joined #openstack-infra | 13:49 | |
*** felipemonteiro has quit IRC | 13:49 | |
openstackgerrit | Dmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload https://review.openstack.org/195043 | 13:50 |
*** jpena|lunch is now known as jpena | 13:52 | |
*** esberglu has quit IRC | 13:53 | |
mriedem | grenade seems to be busted against ocata changes http://logs.openstack.org/04/516404/1/check/legacy-grenade-dsvm-neutron-multinode/aa97464/logs/grenade.sh.txt.gz#_2017-10-30_18_28_09_865 | 13:53 |
mriedem | seeing ImportError issues with neutronlib | 13:54 |
mriedem | did something get EOL'ed? | 13:54 |
mriedem | tonyb: ^ | 13:54 |
mriedem | looks like at least neutron is eol | 13:54 |
mriedem | for newton | 13:54 |
mriedem | sdague: once anything that stable/newton relies on is eol then don't we have to just kill the grenade job in ocata? | 13:56 |
fungi | jonher: yeah, we've seen that happen if you change which e-mail address you give the ubuntu sso when logging in (gerrit accounts map to ubuntu sso ids, not to launchpad profiles, so even if your multiple ubuntu sso ids are associated with the same lp profile they'll result in distinct accounts in gerrit) | 13:56 |
*** oanson has quit IRC | 13:57 | |
fungi | jonher: what's the ssh username on one of the accounts? i should be able to use that to find the other account id by looking for e-mail address overlaps | 13:57 |
*** oanson has joined #openstack-infra | 13:57 | |
fungi | or worst case i'll try to match them up by full name | 13:58 |
jonher | old account id: 18279 that I want to keep. New one is ID 27013 | 13:58 |
fungi | even better. looking now | 13:58 |
jonher | thanks | 13:58 |
*** hemna has joined #openstack-infra | 13:58 | |
*** smatzek has joined #openstack-infra | 13:59 | |
*** iyamahat has joined #openstack-infra | 13:59 | |
*** gcb_ has quit IRC | 14:00 | |
*** jokke_ has joined #openstack-infra | 14:00 | |
*** gcb_ has joined #openstack-infra | 14:00 | |
*** janki has joined #openstack-infra | 14:01 | |
*** smatzek has quit IRC | 14:03 | |
fungi | jonher: i've moved your new openid from account 27013 to account 18279 and set 27013 inactive. you may need to log out of and back into the gerrit webui to be in the correct account again | 14:03 |
jonher | Alright, thanks :) | 14:03 |
fungi | my pleasure | 14:03 |
fungi | let us know if you have any further trouble with it | 14:03 |
*** smatzek has joined #openstack-infra | 14:05 | |
*** mandre_afk is now known as mandre | 14:05 | |
*** ykarel has quit IRC | 14:06 | |
sdague | mriedem: yeh, it should be | 14:07 |
sdague | grenade should be turned off first before eoling branches | 14:08 |
mriedem | ok, trying to figure out how to do that in the new zuulv3 world | 14:08 |
*** esberglu has joined #openstack-infra | 14:09 | |
fungi | patch to stable/ocata to remove grenade jobs you've declared within that branch, or patch to project-config to adjust the branch filter for grenade jobs if defined there | 14:09 |
*** marst has joined #openstack-infra | 14:09 | |
mriedem | not openstack-zuul-jobs? | 14:10 |
mriedem | ok project-config it is | 14:10 |
AJaeger | mriedem: might be openstack-zuul-jobs as well - if you want to patch the job directly | 14:11 |
fungi | yeah, i'm looking now to see where that branch filter is set | 14:12 |
*** catintheroof has quit IRC | 14:12 | |
AJaeger | fungi: we can set it in openstack-zuul-jobs on the job itself - and then it applies everywhere | 14:12 |
mriedem | i know how to do it per-job in openstack-zuul-jobs/zuul.d/zuul-legacy-jobs | 14:12 |
AJaeger | yep | 14:12 |
mriedem | since layout.yaml is gone in project-config | 14:12 |
mriedem | ok | 14:12 |
fungi | yeah, we set it in project-config for the legacy grenade jobs at the moment, like http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n58 | 14:13 |
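A minimal sketch of the kind of branch filter fungi is pointing at, assuming a legacy grenade job attached to a project pipeline in project-config's zuul.d/projects.yaml; the project name and regex here are illustrative:

```yaml
# Illustrative only: keep the legacy grenade job off branches whose
# upgrade-from release is EOL.
- project:
    name: openstack/nova
    check:
      jobs:
        - legacy-grenade-dsvm-neutron-multinode:
            branches: ^(?!stable/(newton|ocata)).*$
```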
AJaeger | config-core, the requirements proposal job times out, please review this change to have it run longer https://review.openstack.org/516610 | 14:14 |
*** catintheroof has joined #openstack-infra | 14:14 | |
AJaeger | fungi: yes, that's per job and project. If we want to do it globally, it's openstack-zuul-jobs | 14:14 |
fungi | right there's a mix of the two right now but once we get the legacy jobs cleaned up we'll hopefully just have one place to update that in job-templates | 14:15 |
fungi | er, project-templates | 14:16 |
*** rbrndt has joined #openstack-infra | 14:16 | |
AJaeger | indeed that one as well... | 14:16 |
*** armax has joined #openstack-infra | 14:17 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata https://review.openstack.org/516694 | 14:17 |
mriedem | AJaeger: i think this ^ | 14:17 |
AJaeger | mriedem: yes, expect so | 14:18 |
fungi | well, except for the ones which are still set in project-config in the projects.yaml for now | 14:19 |
fungi | also, i don't think you want to remove the grenade-forward jobs from ocata | 14:19 |
mriedem | what does the forward job do again? | 14:19 |
fungi | since those test that you can upgrade from a proposed stable/ocata change to stable/pike | 14:19 |
mriedem | oh.. | 14:19 |
mriedem | will projects.yaml override what's in openstack-zuul-jobs/ | 14:20 |
mriedem | ? | 14:20 |
*** jaosorior has joined #openstack-infra | 14:20 | |
sdague | fungi: they've never been voting, I'm not convinced they even work | 14:20 |
fungi | sdague: perhaps we should remove them entirely in that case? | 14:20 |
fungi | newton eol isn't a reason to remove the grenade-forward jobs from ocata since they don't touch newton, but if there is a good reason to just drop the grenade-forward jobs globally then probably better to do that | 14:21 |
sdague | yeh, that's fine | 14:22 |
fungi | mriedem: i think the variants in the projects.yaml in project-config will override what's in openstack-zuul-jobs (though there aren't many) | 14:23 |
mriedem | ok i'll tinker with project-config too then | 14:23 |
*** armax has quit IRC | 14:23 | |
AJaeger | I agree with fungi, we need to update both | 14:24 |
openstackgerrit | Matt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata https://review.openstack.org/516694 | 14:24 |
*** amoralej|lunch is now known as amoralej | 14:27 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata https://review.openstack.org/516694 | 14:27 |
*** armax has joined #openstack-infra | 14:28 | |
*** Guest53850 has quit IRC | 14:29 | |
*** lamt has joined #openstack-infra | 14:29 | |
jeblair | vdrok: i don't see what's wrong with that patch. i lost a debugging tool in a recent zuul restart; i may need to restart it again to get it back to dig into that. | 14:30 |
*** catintheroof has quit IRC | 14:30 | |
*** eharney has quit IRC | 14:31 | |
jeblair | i'll do that now, unless anyone objects | 14:32 |
dmsimard | AJaeger: commented on https://review.openstack.org/#/c/516397/ | 14:33 |
fungi | jeblair: no objection from me | 14:33 |
AJaeger | dmsimard: that's so far the only repo that needs it and therefore I did it at repo level | 14:34 |
AJaeger | dmsimard: do you see that this is needed by more repos? | 14:35 |
dmsimard | AJaeger: that's curious, why wouldn't this be required for other repos ? it's the same job isn't it ? | 14:35 |
*** catintheroof has joined #openstack-infra | 14:35 | |
AJaeger | dmsimard: that repo installs the requirements repo in its tox_install... | 14:36 |
*** ykarel has joined #openstack-infra | 14:36 | |
*** salv-orlando has joined #openstack-infra | 14:37 | |
*** spectr has quit IRC | 14:37 | |
jeblair | restarted and re-enqueueing now | 14:37 |
*** vsaienk0 has quit IRC | 14:38 | |
*** spectr has joined #openstack-infra | 14:39 | |
*** salv-orl_ has joined #openstack-infra | 14:40 | |
*** nicolasbock has joined #openstack-infra | 14:41 | |
dmsimard | AJaeger: the failing playbook is actually this one: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/file/7f67273f-9e6a-4fa4-91a4-a6ab4c28c511/ ## task: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/result/a2ce2da7-1b23-41fe-9d9c-07e3edc27aea/ | 14:42 |
dmsimard | "python-tarball/run.yaml" is used for those jobs: 1) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n132 2) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n153 and 3) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n401 | 14:42 |
dmsimard | #3, our failing job, is the only one without requirements | 14:42 |
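In other words, several job definitions in project-config share that one run playbook, roughly like this (the first job name is a placeholder; only the branch-tarball job is the failing one):

    - job:
        name: publish-openstack-python-tarball           # placeholder for jobs 1 and 2
        run: playbooks/python-tarball/run.yaml

    - job:
        name: publish-openstack-python-branch-tarball    # job 3, the failing one
        run: playbooks/python-tarball/run.yaml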
jaosorior | after zuul was restarted, should the jobs be rechecked or will they get requeued automatically? | 14:43 |
jeblair | jaosorior: they are being re-enqueued now | 14:43 |
dmsimard | jaosorior: they get requeued by the operator who restarts zuul | 14:43 |
dmsimard | it's not automatic in the sense that it still requires manual injection, right jeblair ? | 14:43 |
AJaeger | dmsimard: AH! | 14:44 |
dmsimard | AJaeger: does that make sense ? | 14:44 |
jaosorior | I see | 14:44 |
openstackgerrit | Merged openstack-infra/project-config master: Remove zuul/mapping and job https://review.openstack.org/516029 | 14:44 |
jeblair | jaosorior: if your change isn't there now, we may have missed it so you should recheck it | 14:44 |
AJaeger | dmsimard: my hero, thanks for digging into this. Let me double check... | 14:44 |
jeblair | dmsimard: ya | 14:44 |
*** salv-orlando has quit IRC | 14:44 | |
*** amotoki has quit IRC | 14:44 | |
*** jbernard has quit IRC | 14:44 | |
*** cshastri has quit IRC | 14:44 | |
jaosorior | that makes sense | 14:44 |
jaosorior | jeblair, dmsimard: thanks for the info :D | 14:45 |
*** charz has quit IRC | 14:45 | |
jaosorior | the workings of zuul are still quite unknown to me | 14:45 |
AJaeger | dmsimard: yes, makes sense. Do you want to patch it? | 14:45 |
* AJaeger will abandon his ... | 14:45 | |
dmsimard | AJaeger: it's the same repo you can just submit another patchset | 14:45 |
*** jbernard has joined #openstack-infra | 14:45 | |
dmsimard | I can submit it if you want | 14:45 |
jeblair | vdrok: well, that's annoying. 515716 appears to be working correctly after the restart. | 14:46 |
*** eharney has joined #openstack-infra | 14:46 | |
AJaeger | dmsimard: if you have time, go for it, please | 14:46 |
dmsimard | ack | 14:46 |
vdrok | jeblair: I see, well, that's not so bad :) | 14:46 |
jeblair | vdrok: i guess let me know if it happens again :| | 14:46 |
vdrok | sure, will do. thanks for looking into this | 14:46 |
fungi | dmsimard: correct, we run a python script to generate a shell script based on the contents of specific pipelines obtained from the scheduler's status.json before stopping the service, and then run that shell script once the scheduler is back up and running again (the shell script consists of preformatted calls to the zuul enqueue rpc cli) | 14:47 |
*** udesale has quit IRC | 14:47 | |
*** amotoki has joined #openstack-infra | 14:47 | |
dmsimard | fungi: if that script hasn't changed since v2, I know what script it is :) | 14:47 |
*** charz has joined #openstack-infra | 14:47 | |
*** psachin has quit IRC | 14:48 | |
fungi | dmsimard: it's changed just ever so slightly to add --tenant | 14:49 |
*** david-lyle has quit IRC | 14:50 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata https://review.openstack.org/516694 | 14:50 |
openstackgerrit | Dmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload https://review.openstack.org/195043 | 14:50 |
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions https://review.openstack.org/516705 | 14:53 |
dmsimard | AJaeger: so, digging a bit further for that requirements thing... it turns out this is the culprit: http://codesearch.openstack.org/?q=%5C%24ZUUL_CLONER&i=nope&files=&repos= | 14:55 |
dmsimard | tools/tox_install.sh all over the place uses ZUUL_CLONER to ensure that (amongst other things) requirements is there | 14:56 |
*** xarses has joined #openstack-infra | 14:56 | |
jeblair | mordred is working to change the pti so we build tarballs differently and won't need that anymore | 14:58 |
fungi | yep, that's so local devs get a consistent experience and don't have to wonder why their unconstrained unit tests on their workstation are broken while the same patches pass testing in our constrained ci jobs | 14:58 |
*** dtantsur|afk is now known as dtantsur | 14:58 | |
*** salv-orl_ has quit IRC | 14:58 | |
*** salv-orlando has joined #openstack-infra | 14:58 | |
*** gcb_ has quit IRC | 14:58 | |
fungi | but overriding how tox builds all its virtualenvs is a pretty clumsy hammer, and then when we turn around and use tox for jobs which don't actually need it we end up with nasty side effects like that | 14:59 |
dmsimard | AJaeger: that gives us the list of repos requiring requirements http://codesearch.openstack.org/?q=openstack%2Frequirements&i=nope&files=tools%2Ftox_install.sh&repos= | 14:59 |
jeblair | let's just put it in the job for now, so it's easy to clean up when mordred finishes his work | 14:59 |
jeblair | or template or whatever. ie, not on individual projects. | 15:00 |
dmsimard | yup, taking care of it | 15:00 |
*** gcb_ has joined #openstack-infra | 15:01 | |
*** diablo_rojo has quit IRC | 15:01 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball https://review.openstack.org/516397 | 15:01 |
dmsimard | AJaeger: ^ | 15:01 |
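The fix boils down to listing the requirements repo as a required project on the job so the executor prepares it on the node; a minimal sketch of the shape (not the literal patch):

    - job:
        name: publish-openstack-python-branch-tarball
        required-projects:
          - openstack/requirements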
dmsimard | EmilienM: are you around ? | 15:02 |
*** yamamoto has joined #openstack-infra | 15:04 | |
jeblair | i'm going to restart all of the executors | 15:05 |
*** lbragstad has quit IRC | 15:05 | |
jeblair | i don't think they were cleaned up properly after the unclean shutdown the other day | 15:05 |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical https://review.openstack.org/516502 | 15:05 |
Shrews | jeblair: would this be a good time to restart the np launchers? we have a couple of bug fixes that should go in | 15:06 |
EmilienM | dmsimard: yes | 15:06 |
*** salv-orlando has quit IRC | 15:06 | |
jeblair | Shrews: yes... though i think theoretically any time should be fine? :) | 15:07 |
*** salv-orlando has joined #openstack-infra | 15:07 | |
pabelanger | dmsimard: AJaeger: when we remove zuul-cloner from images, that is going to break branch-tarball jobs right? re 516397. Maybe we need to be creating a legacy branch tarball job that will still use zuul-cloner, or I think mordred has patches to remove the need for tox_install.sh | 15:07 |
AJaeger | dmsimard: we need horizon and neutron as well - do you want to update again or shall I? | 15:07 |
Shrews | jeblair: infra-root: i'm going to do that then. restarting launchers (unless i hear any objections) | 15:07 |
*** lbragstad has joined #openstack-infra | 15:08 | |
pabelanger | +1 | 15:08 |
jeblair | pabelanger: are you suggesting we have a job using the old v2 zuul-cloner? | 15:08 |
*** sree has joined #openstack-infra | 15:08 | |
fungi | Shrews: sounds good, thanks | 15:09 |
AJaeger | pabelanger: yes, this needs some analysis - the use of tox_install and zuul-cloner will be fun once we remove zuul-cloner from images | 15:09 |
jeblair | no wait | 15:09 |
jeblair | AJaeger, pabelanger: no jobs should be using the copy of zuul-cloner on the images, if you think a job is, please investigate and confirm that now | 15:10 |
pabelanger | jeblair: not sure, just indicating when we remove zuul-cloner from images, and base playbook branch-tarballs job look like they are going to break | 15:10 |
Shrews | infra-root: restarted nodepool-launcher on both nl01 and nl02 | 15:10 |
jeblair | pabelanger: *please* confirm that. it should not be the case. | 15:10 |
pabelanger | yes, looking now | 15:10 |
openstackgerrit | Matt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata https://review.openstack.org/516694 | 15:11 |
*** ykarel has quit IRC | 15:11 | |
AJaeger | dmsimard: will update | 15:11 |
dmsimard | AJaeger: I guess it would avoid having to create -horizon and -neutron variants (which I dislike very much) | 15:11 |
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions https://review.openstack.org/516705 | 15:11 |
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config master: Remove legacy-grenade-dsvm-neutron-nova-next https://review.openstack.org/516711 | 15:11 |
AJaeger | dmsimard: agreed, let me fix | 15:11 |
*** salv-orlando has quit IRC | 15:11 | |
*** sree has quit IRC | 15:12 | |
AJaeger | jeblair: I see both required-projects with "name: some-repo" and with just "some-repo". Are both valid? Any preference? | 15:13 |
AJaeger | check project-config/zuul.d/jobs.yaml | 15:13 |
jeblair | AJaeger: both valid; if not using a branch specifier, i'd prefer just "some-repo" | 15:14 |
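The two spellings being compared look roughly like this; the branch-override key on the long form is only an illustration of why the mapping form exists (the exact key name depends on the Zuul version):

    - job:
        name: example-job
        required-projects:
          - openstack/requirements           # short form, preferred with no extra keys
          - name: openstack/horizon          # long form, used with a branch specifier
            override-branch: stable/pike     #   (illustrative)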
*** yamamoto has quit IRC | 15:14 | |
*** vsaienk0 has joined #openstack-infra | 15:15 | |
AJaeger | ok | 15:15 |
EmilienM | dmsimard: what's up? | 15:15 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball https://review.openstack.org/516397 | 15:15 |
dmsimard | EmilienM: sorry got sidetracked | 15:15 |
AJaeger | dmsimard: updated ^ | 15:15 |
*** dtantsur is now known as dtantsur|afk | 15:15 | |
dmsimard | EmilienM: mnaser has a nice series of patches here but I put them on hold as you were working on migration https://review.openstack.org/#/c/515972/1 | 15:15 |
mriedem | AJaeger: fungi: openstack-zuul-jobs with a depends-on to project-config won't work, right? | 15:16 |
mriedem | project-config changes still have to merge first? | 15:16 |
mriedem | https://review.openstack.org/#/c/516694/ | 15:16 |
EmilienM | dmsimard: he can go ahead - I won't have time to work on that until after summit. | 15:16 |
dmsimard | EmilienM: it's work that you'd need to do after the migration anyway, so I thought maybe we ought to land those but that means you'll need to rebase | 15:16 |
EmilienM | dmsimard: I'll rebase | 15:16 |
AJaeger | mriedem: yes, it has to. So, push them both up and then recheck the openstack-zuul-jobs once project-config merged | 15:16 |
dmsimard | EmilienM: ok, it's not going to be so much a rebase as a rewrite but sure | 15:16 |
EmilienM | dmsimard: no worries | 15:16 |
jeblair | mriedem: they won't run tests with the new content, but they will still perform config syntax validation and ensure merging in the right order, so generally worth including the footer still. | 15:16 |
dmsimard | EmilienM: ack, are you confident if I review those or would you rather we keep jobs frozen for now ? | 15:17 |
mriedem | jeblair: then i don't know why this is failing https://review.openstack.org/#/c/516694/ | 15:17 |
dmsimard | EmilienM: they're no-op for the most part, just reducing duplication and streamlining | 15:17 |
AJaeger | dmsimard: EmilienM gave a +1, so you could +2A if you like | 15:17 |
dmsimard | AJaeger: double checking :) | 15:17 |
dmsimard | tripleo has had a bumpy gate recently | 15:18 |
AJaeger | dmsimard: sure, appreciated... | 15:18 |
clarkb | dmsimard: fyi https://review.openstack.org/516502 may be of interest to you | 15:18 |
dmsimard | mriedem: you can't do a depends-on on a project-config patch | 15:18 |
*** jbadiapa has quit IRC | 15:18 | |
mriedem | dmsimard: that's what i thought, but see jeblair's comment | 15:18 |
EmilienM | dmsimard, AJaeger I need to review it properly | 15:19 |
dmsimard | mriedem: project-config is a "special" project in the context of zuul, it is used for storing secrets and trusted jobs | 15:19 |
mriedem | jeblair: but i see this now, "The syntax of the configuration in this change has been verified to be correct once the config project change upon which it depends is merged, but it can not be used until that occurs." | 15:19 |
mriedem | which is different, and better | 15:19 |
mriedem | so ok | 15:19 |
pabelanger | jeblair: okay, so projects that use tox_install.sh (eg: python-ironicclient) and publish-openstack-python-branch-tarball jobs will be okay when we merge https://review.openstack.org/514483/ (delete zuul-env from DIB) but will break when we land https://review.openstack.org/513506/ (fetch-zuul-cloner from base) | 15:19 |
EmilienM | dmsimard, AJaeger : a commit message would have helped | 15:19 |
*** jcoufal_ has quit IRC | 15:19 | |
*** jcoufal has joined #openstack-infra | 15:20 | |
pabelanger | I'm thinking, we could create legacy-publish-openstack-python-branch-tarball and parent to base-legacy which will pull in fetch-zuul-cloner | 15:20 |
jeblair | mriedem: yeah. i don't know why that didn't happen the first time.... :| | 15:20 |
pabelanger | then update jobs using zuul-cloner to that | 15:20 |
dmsimard | pabelanger: let's not create new legacy jobs | 15:20 |
mriedem | jeblair: i think it was just the order in which i pushed the changes up | 15:20 |
dmsimard | pabelanger: it was mentioned that mordred was working on fixing the different tox_install.sh | 15:20 |
jeblair | dmsimard: people can and should use depends-on against project-config patches. it lets zuul do config syntax validation, ensures they land in the right order, and helps human reviewers understand the sequencing. | 15:21 |
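The footer in question is the usual commit-message trailer pointing at the project-config change, shown here with a placeholder change-id:

    Depends-On: I0123456789abcdef0123456789abcdef01234567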
pabelanger | well, it isn't a new legacy job, it is just parenting to base-legacy, which pulls in zuul-cloner | 15:21 |
EmilienM | AJaeger, dmsimard : lgtm | 15:21 |
dmsimard | jeblair: sure, I mean they can do it but it's not going to have the intended effect is what I mean. It "works" in the sense that it doesn't let that patch merge until the project-config patch merges but it doesn't actually apply and run the jobs intended to run | 15:22 |
pabelanger | but we have zuulv3 jobs, still using legacy code, which is something that is a little confusing too. Meaning, we are going to have breakages at some point | 15:22 |
dmsimard | jeblair: so it... half works ? | 15:22 |
*** martinkopec has quit IRC | 15:23 | |
jeblair | dmsimard: look at mriedem's change and the message that zuul reported. that's only possible because of the depends-on | 15:23 |
dmsimard | pabelanger: legacy code embedded in project repositories | 15:23 |
dmsimard | pabelanger: the jobs themselves aren't the ones doing zuul-cloner, it's tox_install.sh | 15:23 |
dmsimard | fungi mentioned earlier that the approach with tox_install.sh is fairly clunky to begin with | 15:24 |
jeblair | dmsimard: i'm just saying if folks ask, rather than saying "it doesn't work" say "it won't run jobs with the changes in effect but there are still several good reasons to do it". i don't want folks to think they should stop using those footers. | 15:24 |
dmsimard | jeblair: fair | 15:24 |
clarkb | also we've always used depends-on just for the merge-this-first behavior | 15:25 |
clarkb | regardless of how it affects jobs | 15:25 |
clarkb | so that is not a regression and still valuable | 15:25 |
dmsimard | clarkb: I use(d) it a lot for the zuul-cloner factor :) | 15:25 |
clarkb | we're definitely not keeping up with the logstash worker load (up to 94k jobs queued since yesterday evening's restart) | 15:26 |
mriedem | AJaeger: thanks for hitting those patches | 15:27 |
clarkb | I'd like to get https://review.openstack.org/#/c/516502/2 reviewed, tested, and in to see if not indexing job-output.txt twice helps there | 15:27 |
pabelanger | dmsimard: right, but we need a plan to remove zuul-cloner that ideally doesn't break them. Today, the 2 patches up to do so will break them | 15:27 |
clarkb | so reviews on that and thoughts on testing very much welcome | 15:27 |
clarkb | I guess I need to update the issues ether pad too | 15:28 |
pabelanger | clarkb: I'm still trying to bring online a new worker, another puppet issue I am debugging now | 15:28 |
dmsimard | clarkb: sorry about that. I started down that road yesterday after our discussion, got sidetracked and then wanted to chat about it | 15:29 |
*** spectr has quit IRC | 15:29 | |
clarkb | dmsimard: it's not a problem, I learned new things with yamamoto's help | 15:29 |
openstackgerrit | Jose Luis Franco proposed openstack-infra/tripleo-ci master: WIP: Upgrade UC and OC using tripleo-upgrade role https://review.openstack.org/515643 | 15:30 |
dmsimard | clarkb: I'm not sure we want to blindly remove the .gz, I can see it affecting unexpected things -- but mostly because we gzip *by default* https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/upload-logs/tasks/main.yaml | 15:31 |
jeblair | okay all zuul-related processes on the executors are stopped | 15:31 |
jeblair | restarting them now | 15:31 |
fungi | clarkb et al: do we have any updates we want to give the board on our drive to beef up root sysadmin count in emea/apac? https://etherpad.openstack.org/p/syd-leadership-top-5-update | 15:32 |
dmsimard | clarkb: I think one of the things we discussed was to not assume it would be gzipped and it might be specific to openstack-infra, but really it's there by default so I would adjust the e-r queries to take that into account | 15:32 |
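For context, elastic-recheck queries key off the indexed filename tag, and today they assume the name without the .gz suffix; a rough sketch of a query file (the error string is made up):

    query: >-
      message:"Some error signature" AND tags:"job-output.txt"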
*** camunoz has joined #openstack-infra | 15:33 | |
*** catintheroof has quit IRC | 15:34 | |
*** gyee has joined #openstack-infra | 15:34 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp https://review.openstack.org/516717 | 15:37 |
clarkb | dmsimard: the problem is that we are ending up with job-output.txt and job-output.txt.gz on disk so we submit jobs to index both. So we need to pick one or the other. I chose picking the one that is backward compatible with zuulv2 | 15:38 |
clarkb | we could choose to use the .gz but then we would have to update all the queries | 15:38 |
*** Liced has quit IRC | 15:39 | |
*** pblaho has quit IRC | 15:40 | |
*** catintheroof has joined #openstack-infra | 15:40 | |
*** pblaho has joined #openstack-infra | 15:40 | |
clarkb | (worth noting this is a behavior difference between gzip the command and ansible archive module, gzip doesn't leave the original around but ansible archive does) | 15:40 |
*** gcb_ has quit IRC | 15:40 | |
*** kiennt26 has quit IRC | 15:41 | |
*** spectr has joined #openstack-infra | 15:42 | |
*** catintheroof has quit IRC | 15:42 | |
dmsimard | clarkb: pretty sure we can get ansible to delete the extra file | 15:43 |
dmsimard | http://docs.ansible.com/ansible/latest/archive_module.html "remove" "no" "Remove any added source files and trees after adding to archive." | 15:43 |
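In playbook terms the difference is just the remove flag on the archive task; a minimal sketch (paths and variables are placeholders):

    - name: Compress the console log and drop the uncompressed copy
      archive:
        path: "{{ log_root }}/job-output.txt"
        dest: "{{ log_root }}/job-output.txt.gz"
        format: gz
        remove: true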
*** gcb_ has joined #openstack-infra | 15:43 | |
clarkb | we can but I also don't want to rely on everyone's ansible doing the right thing so this is defensive | 15:44 |
dmsimard | clarkb: you mean if I write a job that archives files relevant to me as an end user and then try to submit those ? | 15:45 |
clarkb | yes | 15:45 |
dmsimard | (and omit the remove parameter) | 15:46 |
dmsimard | hmm | 15:46 |
pabelanger | clarkb: okay, I think 516717 is our fix for logstash-workers, but waiting on zuul to report back | 15:46 |
*** notemerson has quit IRC | 15:47 | |
openstackgerrit | Miguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron https://review.openstack.org/516724 | 15:47 |
clarkb | pabelanger: I don't think you can use a before that way because those are defines not classes | 15:47 |
clarkb | pabelanger: instead you want to require in each of the defines that that file is in place | 15:48 |
pabelanger | clarkb: Oh, right | 15:48 |
pabelanger | Hmm | 15:48 |
clarkb | so you just need one require for each define after config_file | 15:48 |
pabelanger | yah, I started doing that and switched | 15:48 |
pabelanger | let me update | 15:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp https://review.openstack.org/516717 | 15:49 |
pabelanger | clarkb: ^ | 15:49 |
dmsimard | clarkb: in the current form of your patch I'm just worried of it backfiring in unexpected ways, but I'm not coming up with any examples to express my concerns.. closest I can come up with is matching .tar.gz. You mentioned the regexes are new, should we take that out and be explicit instead ? I guess we're here discussing this because the regex is matching unexpected things. | 15:50 |
*** xingchao has quit IRC | 15:50 | |
*** gongysh has joined #openstack-infra | 15:51 | |
*** xingchao has joined #openstack-infra | 15:51 | |
*** gongysh has quit IRC | 15:51 | |
clarkb | dmsimard: it is two things though, one is the regex overmatching. The other is we have broken the (somewhat loose) contract we had around the file metadata we inject to elasticsearch | 15:51 |
clarkb | I think this addresses both, whereas dropping the regex would only address one? | 15:51 |
clarkb | and as for things like tar those aren't valid indexable files anyways so would fail either way | 15:52 |
dmsimard | What contract is that ? | 15:52 |
pabelanger | dmsimard: clarkb: jeblair: fungi: mordred: AJaeger: we likely need to have some discussion around https://review.openstack.org/513506/ (Remove fetch-zuul-cloner from base job) should I add that to today's infra meeting or is that something we could hash out outside of the meeting? | 15:52 |
clarkb | dmsimard: we've always indexed with filename and tags dropping the .gz even if that is actually the name on disk | 15:52 |
dmsimard | pabelanger: discussing at the meeting is probably fair game | 15:52 |
clarkb | dmsimard: because logically the file is foo.txt not foo.txt.gz and our webserver honors that as well | 15:52 |
dmsimard | clarkb: console.html wasn't gzipped by default, was it ? | 15:52 |
dmsimard | hm, yeah other files, maybe | 15:53 |
clarkb | other files were; console.html wasn't up front but was eventually | 15:53 |
clarkb | and the webserver would serve console.html and console.html.gz from the same source | 15:53 |
dmsimard | yeah due to mime types and live decompressing | 15:53 |
dmsimard | and rewrite rules | 15:53 |
dmsimard | I'm familiar with those bits.. | 15:54 |
dmsimard | need to pick up my kids from school for lunch, I'll try to think about it | 15:54 |
clarkb | pabelanger: logstash fix lgtm. Lets see if anyone else is willing to review that quickly | 15:55 |
clarkb | (otherwise I think you can single approve) | 15:55 |
*** jbadiapa has joined #openstack-infra | 15:55 | |
*** Apoorva has joined #openstack-infra | 15:56 | |
clarkb | pabelanger: I think we can talk about zuul cloner things in today's meeting, go ahead and add it as a zuulv3 subtopic | 15:56 |
*** Apoorva has quit IRC | 15:56 | |
clarkb | I expect the meeting will be relatively quick? | 15:56 |
clarkb | I've got to pack :) so here is hoping | 15:56 |
*** Apoorva has joined #openstack-infra | 15:56 | |
*** smatzek has quit IRC | 15:57 | |
*** ihrachys has joined #openstack-infra | 15:57 | |
*** Apoorva has quit IRC | 15:57 | |
*** xingchao_ has joined #openstack-infra | 15:57 | |
*** gongysh has joined #openstack-infra | 15:57 | |
*** gongysh has quit IRC | 15:57 | |
*** slaweq has quit IRC | 15:58 | |
pabelanger | k, added | 15:58 |
*** xingchao has quit IRC | 15:58 | |
*** smatzek has joined #openstack-infra | 15:59 | |
*** janki has quit IRC | 16:00 | |
pabelanger | also added removal of jenkins to topic too | 16:00 |
*** bnemec has quit IRC | 16:00 | |
*** iyamahat has quit IRC | 16:00 | |
clarkb | fyi for others packing sydney weather is supposed to be damp and relatively cool so don't be tricked by recent news of a heat wave | 16:01 |
*** smatzek_ has joined #openstack-infra | 16:01 | |
* clarkb goes to find rain jacket | 16:01 | |
*** xingchao_ has quit IRC | 16:02 | |
*** yamamoto has joined #openstack-infra | 16:02 | |
*** yamamoto has quit IRC | 16:02 | |
*** smatzek has quit IRC | 16:03 | |
*** jaosorior has quit IRC | 16:03 | |
pabelanger | ya, I haven't looked at weather | 16:04 |
panda|ruck | cloud-y with a chance of tarballs. | 16:05 |
* fungi will pack a hat | 16:06 | |
*** david-lyle has joined #openstack-infra | 16:09 | |
*** esberglu has quit IRC | 16:09 | |
mwhahaha | seeing errors in the gate | 16:09 |
inc0 | hey, I can't figure out what happens in our mariadb setup - did you change iptables in gates during zuulv3 transition? | 16:10 |
mwhahaha | no logs, just jobs in 'error' | 16:10 |
jeblair | mwhahaha: zuul should have more detailed information on what the error is when it reports on the change | 16:10 |
mwhahaha | jeblair: http://zuulv3.openstack.org/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet | 16:10 |
mwhahaha | 404 | 16:11 |
mwhahaha | from status page | 16:11 |
mwhahaha | jeblair: see 485172,5 | 16:11 |
jeblair | mwhahaha: yeah, it doesn't have a log url | 16:11 |
*** bh526r has quit IRC | 16:11 | |
jeblair | mwhahaha: we'll know more when it reports | 16:11 |
*** xingchao has joined #openstack-infra | 16:11 | |
*** smatzek_ has quit IRC | 16:12 | |
*** edmondsw has quit IRC | 16:12 | |
*** smatzek has joined #openstack-infra | 16:12 | |
jeblair | (those errors could probably be added to the status.json, but i don't think they are there now) | 16:12 |
dmsimard | jeblair: not sure if it's related but I'm seeing an error for another job that did report back -- https://review.openstack.org/#/c/516397/ : openstack-tox-linters openstack-tox-linters : ERROR Project openstack/requirements does not have the default branch master | 16:13 |
dmsimard | that seems like an odd message | 16:13 |
jeblair | dmsimard: could be. i wonder if it's fallout from the executor restart. | 16:13 |
AJaeger | what is this? " build-tox-manuals-checkbuild build-tox-manuals-checkbuild : ERROR Project openstack/requirements does not have the default branch master" - change https://review.openstack.org/516696 | 16:13 |
dmsimard | jeblair, AJaeger: I just did a recheck on https://review.openstack.org/#/c/516397/ -- let's see if it reproduces | 16:14 |
AJaeger | and also on https://review.openstack.org/516397 | 16:14 |
dmsimard | AJaeger: we were just discussing that, yes | 16:14 |
AJaeger | dmsimard: sorry, read backscroll - found bug, pasted - and you beat me to it ;) | 16:14 |
*** bnemec has joined #openstack-infra | 16:15 | |
AJaeger | clarkb: could you put the infra-publishing change on your review queue again, please? https://review.openstack.org/516010 | 16:16 |
jeblair | mwhahaha: seems likely the errors you're seeing are the same thing -- a transient issue caused by the zuul restart | 16:16 |
mwhahaha | :/ | 16:16 |
*** trown is now known as trown|lunch | 16:16 | |
mwhahaha | it's really screwing with the gate, is there some sort of recovery that can be done? | 16:17 |
openstackgerrit | Ronelle Landy proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag Also testing dlrn_hash_tag_newest set to specific hash. https://review.openstack.org/516624 | 16:17 |
mwhahaha | before the restart we were like 20hours behind | 16:17 |
*** panda|ruck is now known as panda|ruck|bbl | 16:18 | |
jeblair | mwhahaha: we could sacrifice the change at the head of the gate to force a reset on all the changes behind it | 16:18 |
fungi | or promote a change to cause it all to reshuffle | 16:19 |
mwhahaha | i already sacrificed a whole bunch of puppet jobs | 16:19 |
jeblair | fungi: yeah | 16:19 |
* fungi is unsure whether promoting the first change would actually restart jobs | 16:19 | |
jeblair | fungi: i don't think so | 16:20 |
jeblair | but promoting the second one ahead of the first should | 16:20 |
*** bhavik1 has joined #openstack-infra | 16:20 | |
AJaeger | the first three in the queue are looking fine, aren't they? So better let them through? | 16:20 |
jeblair | AJaeger: our only tool is moving it to the top | 16:20 |
jeblair | AJaeger: we could wait i guess | 16:20 |
mwhahaha | meh i guess we'll just have to deal | 16:20 |
mwhahaha | just really frustrating | 16:21 |
AJaeger | jeblair: at this time: I would wait - otherwise we lose the first two... | 16:21 |
jeblair | mwhahaha: you don't like any of the proposed options? | 16:21 |
*** smatzek has quit IRC | 16:21 | |
mwhahaha | not really since we haven't been able to merge anything in like a day | 16:21 |
fungi | well, we can give promote a list for which the first three are unchanged but swap the fifth for the fourth | 16:21 |
mwhahaha | how about retry if error :/ | 16:21 |
jeblair | fungi: oh we can? | 16:21 |
*** smatzek has joined #openstack-infra | 16:21 | |
fungi | promote used to at least take a list of changes | 16:22 |
jeblair | mwhahaha: these are "permanent" errors | 16:22 |
pabelanger | just looking into zuul-executors, we look to be swapping 2GB-3GB (on average) across all of them | 16:22 |
AJaeger | fungi: that might just work... | 16:22 |
dmsimard | unrelated and don't want to sidetrack but wanted to point out maybe there's an issue with nodepool? looking at grafana, we're capping at ~830 nodes.. I see an uptick in failures, it seems like there's a lot of nodes in "ready" state perhaps not being allocated to jobs in a timely fashion ? | 16:22 |
dmsimard | we should be capping at higher than 830 nodes, we saw north of 900 easily after we shifted back inap's capacity to v3 | 16:22 |
jeblair | mwhahaha: so what change would you like moved to position #4? | 16:23 |
mwhahaha | let me look | 16:23 |
jeblair | Shrews: can you look into that with dmsimard ? | 16:23 |
dmsimard | Shrews: http://grafana.openstack.org/dashboard/db/nodepool | 16:24 |
*** shardy is now known as shardy_afk | 16:24 | |
pabelanger | I'm also thinking we might either need to bump load governor up, or stand up a few more executors. We look to have a fair bit of ready nodes atm | 16:24 |
dmsimard | 185 ready nodes and 74 failed nodes.. with a bunch of queued jobs in zuul, there's something going on for sure :/ | 16:24 |
mwhahaha | jeblair: can we promote a failed one? 511509 | 16:25 |
*** bhavik1 has quit IRC | 16:25 | |
dmsimard | pabelanger: oh, that's a good point. Perhaps we have a bunch of nodes available to be queued but current executors are too loaded ? | 16:25 |
fungi | mwhahaha: that one looks like it declares a dependency on 510363 | 16:25 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Increase github delay to 10 seconds https://review.openstack.org/515812 | 16:25 |
jeblair | mwhahaha: yeah, it doesn't really matter. as soon as whatever is in position #4 changes, everything behind it will restart. | 16:25 |
dmsimard | pabelanger: doesn't explain the failed nodes, but does explain the ready nodes | 16:25 |
pabelanger | dmsimard: possible, I am trying to see if that is the case | 16:25 |
mwhahaha | fungi: no that declared a dep on 512082 which is already merged | 16:26 |
dmsimard | pabelanger: no load graphs on cacti for ze's :( | 16:26 |
pabelanger | each time we restart an executor, there is a large spike in load on others, which make sense | 16:26 |
clarkb | inc0: we changed how the iptables are configured, but they should be wide open between all test nodes still (it's just ansible doing it instead of a nodepool ready script now) | 16:26 |
pabelanger | so need to wait a bit for things to even out | 16:27 |
*** smatzek has quit IRC | 16:27 | |
jeblair | dmsimard: we have graphs http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all | 16:27 |
fungi | mwhahaha: oh, you're right it was showing a tree where 510363 was the closest working change | 16:27 |
clarkb | inc0: you should see in the job-output.txt and in ara if the jobs fail the playbooks that set up the rules between the nodes run | 16:27 |
dmsimard | jeblair: oh, I wasn't looking at the right place, thanks | 16:27 |
dmsimard | jeblair: was looking at, example, http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=558 | 16:27 |
*** smatzek has joined #openstack-infra | 16:27 | |
*** iyamahat has joined #openstack-infra | 16:27 | |
jeblair | dmsimard, pabelanger: this new graph may be helpful: http://graphite.openstack.org/render/?width=586&height=308&_salt=1509467265.42&target=stats.gauges.zuul.executors.accepting | 16:28 |
fungi | mwhahaha: jeblair: at any rate, something seems to have kicked the change after 514330 out of the gate so everything behind restarted anyway | 16:28 |
jeblair | i haven't grafana'd that | 16:28 |
pabelanger | jeblair: Yah, that is neat | 16:28 |
mwhahaha | fungi: jeblair ok well i'll just keep an eye on it | 16:29 |
inc0 | clarkb: what I'm experiencing (and can't figure out what's wrong) is timeouts between multinode stuff | 16:29 |
inc0 | like galera | 16:29 |
jeblair | mwhahaha: do you want me to reshuffle to fix 514330, or leave it? | 16:29 |
mwhahaha | jeblair: just leave it | 16:29 |
inc0 | so while the iptables jobs run well (I assume) they might close something I need | 16:29 |
jeblair | mwhahaha: ok | 16:29 |
inc0 | I'm also trying with tunnel playbook, but doesn't seem to help | 16:29 |
*** esberglu has joined #openstack-infra | 16:29 | |
*** kjackal_ has quit IRC | 16:30 | |
pabelanger | dmsimard: we also have ~135 ready, but locked nodes in nodepool. So we are likely waiting for the node request to be fulfilled before unlocking | 16:30 |
*** kjackal_ has joined #openstack-infra | 16:30 | |
clarkb | inc0: the overlay you mean? | 16:30 |
inc0 | yeah | 16:30 |
clarkb | inc0: do you have an example failing job we can look at logs for? | 16:30 |
AJaeger | dmsimard, jeblair recheck did not help - still " ERROR Project openstack/requirements does not have the default branch master" on https://review.openstack.org/516397 | 16:31 |
jeblair | AJaeger: i'll dig into it | 16:31 |
pabelanger | dmsimard: so, it is possible that nodepool is waiting for jobs to finish, so it can launch more nodes | 16:31 |
AJaeger | thanks, jeblair | 16:31 |
AJaeger | bbl | 16:32 |
*** smatzek has quit IRC | 16:32 | |
*** felipemonteiro_ has quit IRC | 16:32 | |
*** salv-orlando has joined #openstack-infra | 16:32 | |
*** salv-orlando has quit IRC | 16:33 | |
Shrews | dmsimard: that grafana graph, i believe, can be a bit misleading. a node can be READY and already assigned to a request, but there could be a wait on other nodes needed for the request | 16:33 |
*** ijw has joined #openstack-infra | 16:33 | |
pabelanger | yah, that's basically what is going on ATM | 16:33 |
inc0 | clarkb: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/ for example | 16:33 |
*** smatzek has joined #openstack-infra | 16:33 | |
inc0 | in this ps mariadb is in single node, but it fails in different place | 16:33 |
*** ramishra has quit IRC | 16:33 | |
inc0 | also timeout | 16:34 |
inc0 | http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/ansible/deploy <- this is deployment log | 16:34 |
dmsimard | inc0: iptables is literally set up to accept any traffic from nodes in a multinode set, let me pull up the set of rules for you | 16:35 |
Shrews | dmsimard: if you have a specific review in mind, or anything specific, really, i don't mind digging. but it's hard to give an explanation of the general state of things | 16:35 |
dmsimard | Shrews: there's contention for the amount of ready nodes and I think we understand that part. I was also asking about the seemingly high amount of failed nodes | 16:36 |
clarkb | inc0: so it is trying to hit http://172.24.4.250:35357 and failing? where do we see that ip address is assigned? | 16:36 |
*** jascott1 has joined #openstack-infra | 16:37 | |
dmsimard | inc0: http://logs.openstack.org/36/509436/6/gate/multinode-integration-ubuntu-xenial/7a9df40/job-output.txt.gz#_2017-10-23_22_40_18_201002 | 16:37 |
*** salv-orlando has joined #openstack-infra | 16:37 | |
dmsimard | inc0: are you using switch/peer groups ? | 16:37 |
jeblair | AJaeger: it looks like that executor's internal git repo for openstack/requirements is corrupted | 16:37 |
jeblair | i suspect that's either related to the unclean shutdown/startup, or my cleaning up after it | 16:38 |
clarkb | dmsimard: ya they are in the inventory | 16:38 |
clarkb | dmsimard: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/zuul-info/inventory.yaml | 16:38 |
pabelanger | Shrews: dmsimard: the failed nodes, is because we've hit quota on cloud. So maybe an issue with calculating that? | 16:38 |
*** smatzek has quit IRC | 16:38 | |
pabelanger | eg: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already used 1531904 of 1536000 ram | 16:38 |
*** pvaneck has joined #openstack-infra | 16:38 | |
Shrews | pabelanger: dmsimard: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already us | 16:38 |
dmsimard | pabelanger: leaked nodes perhaps ? | 16:38 |
Shrews | ed 1531904 of 1536000 ram | 16:38 |
Shrews | seeing that in nl01 logs | 16:38 |
pabelanger | yah, same | 16:39 |
clarkb | inc0: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 198.72.124.138. Set the 'ServerName' directive globally to suppress this message I think that is the problem | 16:39 |
clarkb | inc0: from http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/keystone.txt.gz | 16:39 |
pabelanger | I can check for leaked nodes in IAD quickly | 16:39 |
clarkb | inc0: would have to double check the vhost but if apache thinks it is serving from a different name than the one you are hitting that could explain it | 16:39 |
Shrews | pabelanger: nodepool doesn't calculate anything wrt cpu or ram | 16:40 |
clarkb | inc0: ya the listens are all off at http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla_configs/keystone/wsgi-keystone.conf too | 16:40 |
clarkb | 172.24.4.1 != 172.24.4.250 | 16:40 |
*** catintheroof has joined #openstack-infra | 16:40 | |
pabelanger | Shrews: yah, maybe a leak or max-servers isn't quite correct | 16:41 |
Shrews | pabelanger: possible that max-servers is set too high? | 16:41 |
clarkb | so I think apache is just not listening on the IP that would make this work | 16:41 |
Shrews | yeah, that | 16:41 |
*** vsaienk0 has quit IRC | 16:41 | |
*** jascott1 has quit IRC | 16:41 | |
*** armaan has joined #openstack-infra | 16:41 | |
dmsimard | clarkb: those are internal bridged IPs, right? We don't set those up in the firewall explicitly but in practice the traffic goes in and out of the private IPs so it should go through fine I think | 16:42 |
pabelanger | clarkb: can I delete you clarkb-test-centos7 in rax-iad? | 16:42 |
*** sree has joined #openstack-infra | 16:42 | |
clarkb | pabelanger: yes that should be fine | 16:42 |
pabelanger | kk | 16:42 |
clarkb | dmsimard: ya I think the firewall is fine, its the process not listening on the right ip:port combo | 16:42 |
*** jascott1 has joined #openstack-infra | 16:43 | |
inc0 | clarkb: .250 will be handled by haproxy | 16:43 |
inc0 | that's why | 16:43 |
pabelanger | yah, I see a few vms in rax-iad that looks to be in error state | 16:43 |
pabelanger | going to try and clean them up | 16:43 |
clarkb | inc0: do you have more logs than http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/haproxy.txt.gz ? | 16:44 |
Shrews | pabelanger: also worth noting that the launcher restart a few moments ago would have eliminated a couple of situations where we could have some instances essentially sticking around much too long. | 16:44 |
*** Apoorva has joined #openstack-infra | 16:44 | |
pabelanger | Shrews: ok | 16:45 |
clarkb | aha http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla/haproxy/haproxy_latest.20171025.b55c6793b5c7f834e.txt.gz | 16:45 |
*** catintheroof has quit IRC | 16:45 | |
*** sree has quit IRC | 16:47 | |
inc0 | clarkb: let me retry full mariadb cluster so you'll see it | 16:47 |
inc0 | issue wasn't with haproxy tho because turning it off didn't help with mariadb | 16:47 |
inc0 | it's node->node timeout over 172... ips that failed | 16:48 |
*** yamahata has joined #openstack-infra | 16:48 | |
dmsimard | Shrews: fwiw tobiash has a stack of patches around switching from "max-servers" to use quotas instead, https://review.openstack.org/#/c/503838/ | 16:48 |
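Today the only per-provider cap is the static max-servers value in nodepool.yaml, which is what a quota-aware launcher would replace; a trimmed sketch with made-up numbers:

    providers:
      - name: rax-iad
        pools:
          - name: main
            max-servers: 150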
clarkb | inc0: right but you aren't really going node to node over 172 | 16:48 |
clarkb | inc0: you are going node to proxy to node to docker | 16:48 |
clarkb | inc0: and any one of those pieces could be broken | 16:48 |
clarkb | (actually its docker to node to proxy to node to docker) | 16:48 |
inc0 | we use net=host in docker so network stack isn't dockerized | 16:48 |
dmsimard | Shrews: (which I'm very excited about) | 16:48 |
inc0 | we don't use docker proxy or docker networking at all | 16:49 |
clarkb | inc0: so haproxy is the only address rewriting involved? | 16:49 |
inc0 | yes, well that and keepalived | 16:49 |
inc0 | keepalived creates .250 ip and handles HA | 16:50 |
inc0 | (on host) | 16:50 |
dmsimard | jeblair: so we had one particular executor with a corrupted repository and that's it ? | 16:50 |
inc0 | haproxy listens on .250 and forwards to .1 .2. .3 | 16:50 |
Shrews | dmsimard: yes. i think we're in a position to begin considering those now | 16:50 |
jeblair | dmsimard: i'm going to check all of them | 16:50 |
jeblair | sudo ansible 'ze*' -m shell -a 'whoami' --become-user zuul | 16:51 |
jeblair | why does that report 'root' ? | 16:51 |
Shrews | jeblair: i think you also need --become | 16:51 |
dmsimard | jeblair: perhaps .ssh/config defaults to root ? and what Shrews said | 16:51 |
jeblair | Shrews: yep, thanks :) | 16:52 |
Shrews | jeblair: you'd think the user one would imply the other, but... not so much | 16:52 |
jeblair | yeah, i guess it's a really strict correspondence with commandline args and module args | 16:52 |
clarkb | inc0: ok the haproxy log gives us clues on each node it appears it can talk to the keystone running on itself but not the others | 16:52 |
pabelanger | Shrews: did you want to take a peek into infracloud-chocolate, a large portion of its nodes are available currently: http://grafana.openstack.org/dashboard/db/nodepool-infra-cloud I haven't checked why that is | 16:52 |
mwhahaha | still getting errors :/ | 16:53 |
clarkb | (just based on the health checks) | 16:53 |
*** smatzek has joined #openstack-infra | 16:53 | |
Shrews | pabelanger: looking | 16:53 |
inc0 | clarkb: problem is, even if I turned off haproxy it fails | 16:53 |
dmsimard | mwhahaha: jeblair identified the problem and is working on it | 16:53 |
jeblair | mwhahaha: yeah, i think there's a problem on at least one executor, i'm working on a solution | 16:53 |
*** smatzek has quit IRC | 16:53 | |
mwhahaha | k | 16:53 |
inc0 | clarkb: let this latest patchset run | 16:53 |
jeblair | mwhahaha: i'll be happy to promote changes once it's fixed; i'll let you know | 16:53 |
inc0 | I just uploaded new one with multinode galera | 16:53 |
clarkb | inc0: I don't think we need galera... | 16:54 |
*** smatzek has joined #openstack-infra | 16:54 | |
inc0 | that's basically setup I'd like to test | 16:54 |
inc0 | well, we do, galera deployment is part of our code and we want to gate it | 16:54 |
clarkb | inc0: I mean to debug this problem | 16:54 |
clarkb | it exists with or without galera | 16:54 |
inc0 | right, but galera makes it appear earlier | 16:54 |
inc0 | anyway, removing haproxy has the same effect | 16:55 |
inc0 | removing haproxy -> pointing all API traffic to .1 | 16:55 |
mnaser | are you deploying keepalived in the gate across 3 vms? | 16:55 |
inc0 | yeah | 16:55 |
mnaser | so how can the 2 other nodes reach the vip | 16:56 |
inc0 | well they should, I use tunnel overlay | 16:56 |
mnaser | because usually openstack won't let traffic originating from an IP that does not belong to an instance leave the instance | 16:56 |
inc0 | that's why I'm using tunnel overlay;) | 16:57 |
mnaser | oh, if you have an overlay for that then im not sure, i'd consider MTU issues because now you're doing tunnel in tunnel | 16:57 |
clarkb | inc0 Oct 25 23:40:24 ubuntu kernel: [ 720.595795] iptables dropped: IN=brinfra OUT= MAC=e2:9f:77:40:45:4a:fa:58:48:a1:33:49:08:00 SRC=172.24.4.3 DST=172.24.4.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25069 DF PROTO=TCP SPT=41312 DPT=35357 WINDOW=28200 RES=0x00 SYN URGP=0 | 16:57 |
inc0 | I checked mtu | 16:57 |
mnaser | as some cloud providers don't give you a real L2 network but a tunneled one | 16:57 |
mnaser | so you the MTU might change from a provider to another | 16:57 |
*** bnemec has quit IRC | 16:57 | |
inc0 | clarkb: sounds like problem there | 16:57 |
clarkb | inc0: reading that I think the problem is likely that the firewall rules are only updated for the actual host IPs and not for the overlay | 16:57 |
clarkb | dmsimard: ^ does the overlay role update firewall rules too? | 16:58 |
clarkb | dmsimard: if not we probably want it to | 16:58 |
*** rbrndt has quit IRC | 17:00 | |
*** baoli has quit IRC | 17:00 | |
dmsimard | clarkb: that's what I said earlier, no ? said we didn't add bridge IPs, only nodepool private ips | 17:00 |
clarkb | dmsimard: oh sorry I misread it | 17:01 |
*** baoli has joined #openstack-infra | 17:01 | |
dmsimard | clarkb: just making sure I'm not crazy http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-31.log.html#t2017-10-31T16:42:10 | 17:01 |
dmsimard | do we need to add the bridge IPs to the firewall, then ? | 17:01 |
clarkb | yes we will | 17:01 |
pabelanger | clarkb: dmsimard: Shrews: okay, clean up of rax-IAD underway, looks like we did leak some old nodes. maybe during cutover to zuulv3 | 17:01 |
*** jcoufal has quit IRC | 17:01 | |
Shrews | pabelanger: i'm sort of suspecting zuul has those chocolate ready nodes locked and is not doing anything with them | 17:02 |
clarkb | (I read private IPs as the thing used by the overlay but you mean the actual cloud provided private IPs) | 17:02 |
dmsimard | clarkb: ok, I'll get that done | 17:02 |
dmsimard | brb | 17:02 |
clarkb | dmsimard: and rather than doing ip to ip we just want to open the whole range up | 17:02 |
pabelanger | Shrews: oh, could it because executors stopped accepting jobs, due to high load? | 17:02 |
clarkb | so that inc0 can use .250 for haproxy | 17:03 |
Shrews | pabelanger: the nodes are assigned, the request is gone, so nodepool is waiting for zuul to change their state | 17:03 |
*** yamamoto has joined #openstack-infra | 17:03 | |
*** hashar is now known as hasharAway | 17:03 | |
pabelanger | kk | 17:03 |
Shrews | pabelanger: probably | 17:03 |
*** jcoufal has joined #openstack-infra | 17:03 | |
pabelanger | Shrews: thanks! I'll dig into that more | 17:03 |
pabelanger | once I finish clean up | 17:03 |
jeblair | dmsimard: ze 02,03,04 have ara 0.14.0. ze 05,06,07,08 have 0.14.2. ze 01,09,10 have 0.14.4. | 17:03 |
jeblair | dmsimard: i believe those are clustered by installation date | 17:03 |
jeblair | dmsimard: what's the version you just released? | 17:03 |
clarkb | dmsimard: I think `sudo iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT` is what devstack-gate used to do | 17:05 |
openstackgerrit | James E. Blair proposed openstack-infra/puppet-zuul master: Ensure ara is updated on executors https://review.openstack.org/516740 | 17:05 |
jeblair | clarkb, fungi, pabelanger: ^ is that okay to do? | 17:05 |
jeblair | i can never remember what works and doesn't. | 17:06 |
*** rkukura has quit IRC | 17:06 | |
jeblair | dmsimard: ^ | 17:06 |
pabelanger | yah, syntax looks correct | 17:06 |
dmsimard | jeblair: 0.14.5 | 17:06 |
clarkb | jeblair: that should be ok but it will update all the deps too (which may cause problems like with the subunit2sql thing fungi ran into) | 17:06 |
inc0 | also fyi, patchset with dockerhub publishing is up | 17:06 |
inc0 | we'll move our gates to dockerhub + proxy as soon as first images appear upstream | 17:07 |
inc0 | to get rid of tarballs | 17:07 |
dmsimard | jeblair: we don't run a pip upgrade of zuul which would trigger an update of its dependencies? | 17:07 |
inc0 | proxy == https cache in nodepools | 17:07 |
dmsimard | brb... | 17:07 |
jeblair | dmsimard: ara is not a zuul dependency | 17:07 |
dmsimard | Ohhh | 17:07 |
jeblair | (it's optional) | 17:07 |
*** rkukura has joined #openstack-infra | 17:08 | |
jeblair | we could probably work something out with those requirement tag thingies... i'm not sure if that whole pipeline works now...? | 17:08 |
*** baoli has quit IRC | 17:08 | |
pabelanger | okay, ready nodes are dropping now that quota in rax-iad is coming available again | 17:09 |
clarkb | dmsimard: oh except that may only work with the old linux bridge things and not ovs, looking into how we set it up for ovs | 17:09 |
jeblair | tag isn't the right word for that... what's the word they use? | 17:09 |
*** baoli has joined #openstack-infra | 17:09 | |
clarkb | jeblair: extras | 17:09 |
jeblair | clarkb: yeah that's it! | 17:09 |
*** edmondsw has joined #openstack-infra | 17:10 | |
*** yamamoto has quit IRC | 17:11 | |
clarkb | dmsimard: I'm actually not seeing any explicit allows since we switched to ovs. I think that is because neutron/nova net manage their own IPs and firewall rules on that range | 17:11 |
*** tosky has quit IRC | 17:11 | |
clarkb | dmsimard: so we didn't actually need to manage it directly which would explain why inc0 is having problems too but devstack didn't | 17:11 |
clarkb | we don't run the control plane on devstack multinode on the overlay | 17:11 |
pabelanger | clarkb: so, to loop back to queue window size and related node usage at PTG. If we support a negative window size on failures, i think that would help with the wasting of CI resources? EG: gate resets, 20 patches reset and suck up all the nodes | 17:12 |
clarkb | we only run the VM networks there | 17:12 |
*** lucasagomes is now known as lucas-afk | 17:12 | |
*** catintheroof has joined #openstack-infra | 17:12 | |
clarkb | inc0: fyi ^ what is the motivation for using the overlay for the control plane here? | 17:12 |
*** bnemec has joined #openstack-infra | 17:12 | |
inc0 | clarkb: well, I had same mariadb timeouts before overlay, so I asked here and you guys suggested overlay | 17:12 |
clarkb | pabelanger: you mean 0 size? you can't really have a negative size | 17:12 |
inc0 | overlay is good for us also because we can use keepalived+haproxy too | 17:13 |
pabelanger | clarkb: by sliding the window down to 10, then it reduces the amount of nodes is grabs each reset | 17:13 |
jlvillal | Is there a little command line utility that I can point at the 'zuul.d/' directory in our repo and it will print out what jobs should run for master, stable/pike, stable/ocata. And if they are voting, non-voting, or experimental. | 17:13 |
clarkb | inc0: those should all run over layer 3 right? | 17:13 |
* jlvillal doesn't ask for much | 17:13 | |
clarkb | pabelanger: so my issue with that is the gate is supposed to pass. If it doesn't that is the bug not the window size which is already a hack around bad jobs | 17:13 |
inc0 | well, neutron underneath will not allow keepalived floating ip to work over regular network | 17:13 |
pabelanger | clarkb: I mean, start at 20 today, and ramp up to 40 and down to 10 | 17:13 |
jlvillal | I'm trying to port what is in master to our stable branches. | 17:13 |
inc0 | neutron of nodepools | 17:14 |
clarkb | pabelanger: why not just change the minimum to 10? | 17:14 |
clarkb | pabelanger: it will slide up to 20 if your jobs pass | 17:14 |
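The knobs being discussed are the dependent pipeline's window settings; a sketch with illustrative values, not the live configuration:

    - pipeline:
        name: gate
        manager: dependent
        window: 20                   # starting size of the active window
        window-floor: 10             # never shrink below this
        window-increase-factor: 1    # grow by one per change that merges
        window-decrease-factor: 2    # halve on a gate reset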
inc0 | also L2 connectivity will generally be more similar to prod so it's little added benefit | 17:14 |
pabelanger | clarkb: yah, I mean setting to 10 might be easier. If we want to do that | 17:14 |
clarkb | inc0: oh right you need a shared IP then ya you need an overlay | 17:14 |
inc0 | we might add second overlay for vms later;) | 17:14 |
jeblair | pabelanger: what problem are you trying to solve? | 17:14 |
clarkb | inc0: dmsimard ok considering that I think what we want is an option to the overlay network role to open up the entire range between the nodes eg 172.24.4.0/23 can talk to 172.24.4.0/23 | 17:15 |
clarkb | inc0: dmsimard default it to not do that as that is the old behavior and some things (like neutron) want to manage the rules itself | 17:15 |
jeblair | #status log restarted all zuul executors and cleaned up old processes from previous restarts | 17:15 |
openstackstatus | jeblair: finished logging | 17:16 |
clarkb | but then inc0 can set that to true and get it working for control plane on overlay | 17:16 |
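A minimal sketch of what such an option could look like in the multi-node firewall setup (the variable name and exact rule are assumptions, not the actual zuul-jobs implementation):

    - name: Allow all traffic between hosts on the overlay range
      become: true
      iptables:
        chain: INPUT
        source: 172.24.4.0/23
        destination: 172.24.4.0/23
        jump: ACCEPT
      when: multi_node_firewall_open_overlay | default(false) | bool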
jeblair | #status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/python-glanceclient on ze05 | 17:16 |
openstackstatus | jeblair: finished logging | 17:16 |
fungi | jeblair: yeah, digging through logs and the source for puppet's pip package provider, i confirmed that using ensure=>latest will cause not only the named package to be updated but all its dependencies will be unconditionally updated to the latest versions on pypi (even if you've preinstalled sufficient versions as system packages) due to using the default upgrade strategy rather than the | 17:16 |
jeblair | #status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/neutron on ze10 | 17:16 |
fungi | only-if-needed strategy | 17:16 |
openstackstatus | jeblair: finished logging | 17:16 |
pabelanger | jeblair: minimize tripleo change pipeline resetting and consuming all the nodes, it has been happening pretty often the last few weeks. I know they are trying to fix the issues, but still making progress | 17:16 |
jeblair | #status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements on ze07 | 17:16 |
openstackstatus | jeblair: finished logging | 17:16 |
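The cleanup being logged is just removing the executor's cached copy so it gets re-created the next time it is needed; one way to do it, sketched as a playbook (not necessarily the exact commands that were run):

    - hosts: ze07.openstack.org
      become: true
      tasks:
        - name: Remove the corrupted cached repo from the executor
          file:
            path: /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements
            state: absent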
inc0 | clarkb: just add new role for iptables | 17:16 |
inc0 | we can call it explicitly | 17:16 |
*** gouthamr has joined #openstack-infra | 17:16 | |
jeblair | pabelanger: can you elaborate on "consuming all the nodes"? | 17:17 |
pabelanger | dmsimard: Shrews: okay, rax-iad is happy again. Looking at rax-ord now | 17:17 |
jeblair | mwhahaha, AJaeger, dmsimard: all the executors should be repaired. should i promote any changes? | 17:17 |
*** markvoelker_ has joined #openstack-infra | 17:17 | |
AJaeger | jeblair: thanks - no requests from my side | 17:17 |
clarkb | inc0: except it's fairly tightly coupled here, I don't think a new role is right because people will miss it when adding this role | 17:17 |
clarkb | inc0: instead its an option of the overlay | 17:18 |
inc0 | either way works, thanks! | 17:18 |
jeblair | fungi: and that does not happen on initial installation? | 17:18 |
AJaeger | jeblair: what about integrated gate? Promote 515702 ? | 17:19 |
fungi | jeblair: it does not with ensure=>present because it just calls pip install without --upgrade | 17:19 |
*** markvoelker has quit IRC | 17:19 | |
fungi | pip thinks --upgrade means "upgrade everything" unless you supply --upgrade-strategy=only-if-needed | 17:19 |
pabelanger | jeblair: currently, we're up to 470 centos-7 nodes, which I haven't calculated but likely used mostly by tripleo jobs. Since those job run times are pretty long, each time their gate resets, it is a large amount of resources that gets wasted and jobs have to start again. So, was looking for a way to see how we could decrease the window size until that change queue becomes happy again. | 17:20 |
dmsimard | jeblair: for my own curiosity, were you able to tell if (strangely) just openstack/requirements was affected? | 17:20 |
fungi | the latter will upgrade the named packages but only upgrade their dependencies if the installed versions are insufficient | 17:20 |
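To illustrate the pip behavior fungi is describing, a minimal shell sketch (the package name python-foo is a placeholder, not something from this discussion):

    # ensure=>present corresponds to a plain install: dependencies that already
    # satisfy the requirements are left alone.
    pip install python-foo

    # ensure=>latest adds --upgrade, which by default uses the eager strategy and
    # pulls every dependency up to the newest release on PyPI.
    pip install --upgrade python-foo

    # With the only-if-needed strategy the named package is upgraded but its
    # dependencies are only touched when the installed versions are insufficient.
    pip install --upgrade --upgrade-strategy=only-if-needed python-foo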
pabelanger | jeblair: the delay has been pushing 24hrs for the last week or so | 17:20 |
jeblair | dmsimard: i status logged the repos i repaired | 17:20 |
AJaeger | dmsimard: see the #status log above | 17:20 |
dmsimard | jeblair: oops, didn't read far back enough, thanks | 17:20 |
*** Swami has joined #openstack-infra | 17:21 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor https://review.openstack.org/516532 | 17:21 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Check start time for wait_time key https://review.openstack.org/516465 | 17:21 |
*** markvoelker has joined #openstack-infra | 17:22 | |
jeblair | pabelanger: "wasting" is an abstract concern -- what's the problem? | 17:23 |
*** markvoelker_ has quit IRC | 17:23 | |
jeblair | help me understand the concrete problem that needs solving | 17:23 |
dmsimard | jeblair: tripleo is consuming more resources because the gate keeps resetting for different reasons, not all within their control | 17:23 |
dmsimard | is the gist of it | 17:23 |
jeblair | dmsimard: more than what? | 17:23 |
*** e0ne has quit IRC | 17:23 | |
dmsimard | more than if they merged the first time, rather than be rechecked/requeued several times | 17:24 |
*** salv-orlando has quit IRC | 17:24 | |
clarkb | 470 nodes would be ~half our capacity right? | 17:24 |
*** felipemonteiro has joined #openstack-infra | 17:24 | |
*** salv-orlando has joined #openstack-infra | 17:25 | |
jeblair | dmsimard: not really; a gate reset releases old resources and consumes new ones, so the consumption stays constant | 17:25 |
*** tmorin has quit IRC | 17:25 | |
*** felipemonteiro_ has joined #openstack-infra | 17:25 | |
pabelanger | jeblair: Sure, our check pipeline is currently 231, and wanted to see how we can get more nodes for it. My _gut_ is saying because the change queue for tripleo is resetting every 2 hours, that is the reason we are backing up check | 17:25 |
dmsimard | jeblair: okay, I think we're understanding each other but using different vocabulary | 17:25 |
pabelanger | jeblair: however, it isn't a problem. Since we eventually get nodes into check after the gate reset | 17:26 |
clarkb | my initial guess is that the runtime of tripleo jobs is what is making this potentially problematic | 17:26 |
jeblair | pabelanger: okay, a check backlog is something that can be addressed by reducing gate node usage (which can be done by shrinking the window when things are bad) | 17:26 |
clarkb | because check is at a lower priority than gate so when tripleo holds half our capacity in gate then resets they get to keep holding that half for significant periods of time | 17:27 |
jeblair | pabelanger: i'd caution against evaluating the backlog right now since i just reset the entire system twice this morning | 17:27 |
dmsimard | clarkb: it's a bit of a vicious circle, yes | 17:27 |
clarkb | we give gate a higher priority because those jobs should never fail | 17:27 |
clarkb | and so in theory have good throughput | 17:27 |
jeblair | we had a 100 change backlog in check and 40 in gate when i restarted the first time. when i did that, we lost about 800 CPU hours of computation. and then i did it again. | 17:27 |
clarkb | (also merging things is important) | 17:27 |
pabelanger | jeblair: sure, understood. | 17:27 |
*** salv-orlando has quit IRC | 17:29 | |
fungi | the main question i have is why is tripleo's shared change queue so much larger than the others in the gate? some combination of excessive job runtimes, more frequent job failures and tighter coupling between a greater number of repos than most other openstack projects? | 17:29 |
*** felipemonteiro has quit IRC | 17:29 | |
*** jpich has quit IRC | 17:29 | |
dmsimard | mwhahaha, EmilienM ^ | 17:29 |
fungi | or do they actually push that many more changes than other teams? | 17:29 |
*** pcaruana has quit IRC | 17:29 | |
jeblair | fungi: i'd say mostly the first 2 right now. the project diversity doesn't seem to be a big issue. | 17:29 |
*** sree has joined #openstack-infra | 17:29 | |
*** trown|lunch is now known as trown | 17:30 | |
pabelanger | yah job failures and long runtimes are the current state | 17:30 |
odyssey4me | oh dear, did zuul restart? | 17:30 |
odyssey4me | did I break it again? | 17:30 |
fungi | or is it also that they've held off approving changes due to tripleo cloud outages, and are working through an approval backlog now? | 17:30 |
pabelanger | however, that isn't something we can currently fix, that would be done on tripleo side | 17:30 |
*** camunoz has quit IRC | 17:30 | |
pabelanger | can't* | 17:31 |
dmsimard | odyssey4me: zuul restarted, likely not your fault :) | 17:31 |
mwhahaha | there are many reasons for the long queue today, most of which are not necessarily tripleo failures | 17:31 |
fungi | just trying to figure out whether it's expected to be perpetual or whether this is temporary while tripleo catches up on pending change approvals | 17:31 |
*** dhinesh has joined #openstack-infra | 17:31 | |
mwhahaha | 1) zuul reset (and subsequent errors) has contributed to the length, 2) puppet jobs mixed in also got hit with a gem update that broke unit tests | 17:32 |
pabelanger | okau, moving to rax-ord clean up | 17:32 |
mwhahaha | we aren't approving more things than normal afaik | 17:32 |
jeblair | pabelanger: anyway, if you want to lower the min window because check is too backlogged, i will probably be fine with that, i'd just ask that you not make that evaluation based on the backlog right now which we know is not representative. | 17:33 |
*** ijw has quit IRC | 17:33 | |
mwhahaha | I've asked that we start actively tracking the gate failures better, it's really hard to tell once they clear the dashboard what failed where | 17:33 |
jeblair | pabelanger: it would be good to know what the backlog is under normal circumstances, and how the min-window change would be expected to affect that | 17:33 |
*** ijw has joined #openstack-infra | 17:33 | |
mwhahaha | i have noticed there does seem to be a slower response time in updating the status of the jobs on the v3 dashboard as opposed to the older one, so i'm wondering if the constant churn is also not helping that | 17:33 |
fungi | mwhahaha: "zuul reset" isn't a cause, merely a symptom. aside from the handful we just had due to the corrupt requirements git repository on one of the executors, most of those presumably go back to a generally higher failure rate for tripleo jobs i suppose? otherwise we'd see similar backlog for other projects | 17:34 |
*** ijw has quit IRC | 17:34 | |
clarkb | mwhahaha: in theory that is what the health dashboard should tell you (what failed where) | 17:34 |
*** sree has quit IRC | 17:34 | |
*** ijw has joined #openstack-infra | 17:34 | |
pabelanger | jeblair: is there an easy way to track shared queue reset with statsd? Or maybe something we could start tracking. | 17:34 |
Shrews | keep in mind that right now, jobs using the tripleo-centos-7 node label are limited to a single pool that has max-servers of 70. Once those are all in use by long running jobs, other jobs requesting that node will be waiting for those 70 in-use nodes to be released before they even begin | 17:34 |
mwhahaha | fungi: but it's a symptom of something outside of the tripleo world | 17:34 |
Shrews | not sure how relevant that info is, just throwing it out there | 17:34 |
mwhahaha | fungi: the point is that there are zuul or other related problems causing jobs to reset and because the jobs take long it has a bigger impact | 17:35 |
mwhahaha | fungi: not related to things specific to the tripleo world, so because our jobs take longer the impact is greater on our queue | 17:35 |
fungi | mwhahaha: many (i would wager most?) gate resets are due to job failures | 17:35 |
mwhahaha | fungi: except this morning when the executor was erroring for a few hours | 17:36 |
*** felipemonteiro_ has quit IRC | 17:36 | |
*** tesseract has quit IRC | 17:36 | |
fungi | right, i said aside from that specific incident | 17:36 |
*** felipemonteiro_ has joined #openstack-infra | 17:36 | |
fungi | which should have affected other projects too | 17:36 |
mwhahaha | it did but they are in their own queue and approve less | 17:36 |
mwhahaha | so all of ours are more visible because of the single queue | 17:36 |
mwhahaha | i saw a single nova change in the gate | 17:36 |
pabelanger | pypi.slave.openstack.org is that something we can delete from rax-ord? looks like something from the past we didn't clean up | 17:36 |
jeblair | Shrews: a couple of things mitigate that -- those are only used by check jobs, and only used by tripleo. so that shouldn't directly affect the gate queue length (but can make getting jobs ready to be gated take longer) | 17:37 |
pabelanger | that is nodepool project too | 17:37 |
fungi | okay, so you are saying you're actually approving more changes than the projects sharing the integrated gate queue | 17:37 |
clarkb | jeblair: remind me which roles were we moving into devstack? was it the swap setup? network overlay is staying in project-config right? | 17:37 |
mwhahaha | fungi: without actual metrics I cannot say for certain but our queue is larger than the integrated one | 17:37 |
mwhahaha | fungi: all i can speak to is the last 48 hours | 17:37 |
fungi | mwhahaha: agreed, trying to find out how it reached that point | 17:37 |
mwhahaha | fungi: and the failures the queue has had weren't necessarily tripleo specific | 17:38 |
fungi | it's also just possible other openstack projects are taking the week off in preparation for the summit i guess? | 17:38 |
clarkb | multi-node-bridge is in zuul-jobs so it must not be moving | 17:38 |
dmsimard | clarkb: re: bridge network and iptables -- we're doing this *inside* the image: http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/nodepool-base/install.d/20-iptables | 17:38 |
pabelanger | http://logs.openstack.org/74/467074/7/gate/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-container/84b7572/ just reset tripleo gate | 17:38 |
dmsimard | clarkb: notice the rules with 172.24.4.0/23 | 17:38 |
pabelanger | failed to download from github.com looks like | 17:38 |
clarkb | dmsimard: iirc that is a hack to make ironic agent work | 17:39 |
mwhahaha | fungi: do we have graphite metrics for zuul queue sizes? | 17:39 |
pabelanger | so, that is one reason for failures, we discussed at PTG not downloading from github any more, but that requires changes to DLRN | 17:39 |
fungi | mwhahaha: the disconnect for me is that if the majority of the issues weren't tripleo-specific, then i'm trying to understand how that isn't impacting other projects equally | 17:39 |
jeblair | clarkb: i think network overlay should be in ozj or zj | 17:39 |
clarkb | dmsimard: we can probably clean that up in the future, but aiui ironic nodes must have access to the control plane | 17:39 |
pabelanger | eg: using spec files from RPMS over github.com | 17:39 |
clarkb | jeblair: ya its zj I was confused | 17:39 |
dmsimard | clarkb: but we're talking about just whitelisting the entire traffic between that range so that would be taken out, aye ? | 17:40 |
openstackgerrit | Miguel Lavalle proposed openstack-infra/openstack-zuul-jobs master: Remove job neutron-dsvm-api https://review.openstack.org/516744 | 17:40 |
jeblair | fungi, mwhahaha: it sounds like at least one of the non-tripleo issues is tripleo-focused at least -- the gem failures. | 17:40 |
clarkb | dmsimard: no not necessarily. That rule is 172.24.4.0/23 to the control plane, which is one cloud's IPs | 17:40 |
mwhahaha | jeblair: no that's puppet-openstack specific | 17:40 |
jeblair | fungi, mwhahaha: that one is a matter of perspective | 17:40 |
clarkb | dmsimard: I think we only want to add the rules that allow 172.24.4.0/23 to talk to 172.24.4.0/23 | 17:40 |
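A rough sketch of the rule clarkb is describing, as it might look if added by hand with iptables (the chain and exact placement are illustrative only; the actual change is being worked out in the multi-node-bridge role):

    # Allow any host in the overlay range to talk to any other host in that range;
    # traffic outside 172.24.4.0/23 is unaffected.
    sudo iptables -A INPUT -s 172.24.4.0/23 -d 172.24.4.0/23 -j ACCEPT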
mwhahaha | jeblair: and because we share queue, it impacted tripleo | 17:40 |
pabelanger | fungi: clarkb: see my question about pypi.slave.o.o in rax-ord, is that safe to delete | 17:40 |
jeblair | mwhahaha: you use those modules, right? | 17:40 |
clarkb | pabelanger: I don't know | 17:41 |
pabelanger | clarkb: k, I'll add it to meeting | 17:41 |
jeblair | mwhahaha: i mean, there's a *reason* that queue is shared | 17:41 |
mwhahaha | jeblair: by your argument then we should share neutron, nova, etc | 17:41 |
clarkb | dmsimard: if you haven't started on the change to multi node bridge I can take a stab at it | 17:41 |
clarkb | dmsimard: let me know | 17:41 |
jeblair | mwhahaha: that is in fact my argument but i have compromised | 17:41 |
fungi | mwhahaha: and attempting to ascertain whether whatever is causing tripleo to be singled out at the moment is a long-term problem we need to address systemically to bring tripleo's resource consumption into alignment with other teams, or whether this is a temporary/fleeting ballooning of resource needs which will subside once you work through it | 17:41 |
mwhahaha | once again, we need metrics and data so we can get to the bottom | 17:42 |
*** armaan has quit IRC | 17:42 | |
*** salv-orlando has joined #openstack-infra | 17:42 | |
mwhahaha | i also do not like this but without help understanding wtf is occurring in zuul over time when i'm not watching it's hard to say | 17:42 |
*** camunoz has joined #openstack-infra | 17:42 | |
dmsimard | clarkb: mostly making sure we'd be doing the change in the right place, I searched and couldn't find a reference to "iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT".. closest I found was in actual neutron code | 17:42 |
jeblair | mwhahaha: at any rate, what i'm trying to say is that those gem failures are a partial answer to fungi's question of why the tripleo queue has been adversely affected recently | 17:43 |
*** d0ugal has quit IRC | 17:43 | |
fungi | makes sense | 17:43 |
clarkb | dmsimard: ya I think that was left over from when we used linux bridges but now it is ovs | 17:43 |
mwhahaha | jeblair: that was this morning and was addressed before the executor problem. it doesn't explain what happened yesterday | 17:43 |
clarkb | dmsimard: I did a bit of digging myself, pretty sure the existing rule isn't enough to meet inc0's needs | 17:43 |
mwhahaha | the queue was already in bad shape before that happened | 17:43 |
mwhahaha | that's just contributed to further delays today | 17:44 |
clarkb | mwhahaha: is the health dashboard not tracking it for you? | 17:44 |
clarkb | re metrics | 17:44 |
inc0 | thanks clarkb, dmsimard do you want me to publish a patch for it? | 17:45 |
fungi | mwhahaha: poking around in graphite i don't think we have separate meters for every shared change queue | 17:45 |
mwhahaha | fungi: that would be beneficial to have so that we can tell when issues start for RCA | 17:45 |
clarkb | mwhahaha: http://status.openstack.org/openstack-health/#/ | 17:45 |
clarkb | seems to be tracking tripleo jobs based on the front page | 17:45 |
dmsimard | clarkb, inc0: I've started something | 17:45 |
clarkb | dmsimard: cool, thanks | 17:46 |
*** Guest95277 has quit IRC | 17:46 | |
inc0 | thanks dmsimard | 17:46 |
mwhahaha | clarkb: yea it's there but i'll have to dig further. | 17:46 |
openstackgerrit | Miguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron https://review.openstack.org/516724 | 17:47 |
mwhahaha | clarkb: what specifically would be beneficial is a break out of check vs gate | 17:47 |
clarkb | mwhahaha: aiui its all gate today (no check) just due to volume but mtreinish would have to confirm | 17:47 |
mwhahaha | k i'll have to dig in and see if there's specifics we can point to | 17:48 |
mwhahaha | i recall there being a problem in the dashboard around the pingtest being improperly reported so i'll need to make sure that's still not a problem | 17:49 |
fungi | jeblair: is stats.zuul.tenant.openstack.pipeline.gate.total_changes scaled by 0.01? seems more spiky than i would expect too | 17:50 |
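For reference, a hypothetical way to pull that meter out of graphite using the standard render API; the host graphite.openstack.org and the exact metric path are assumptions based on the key fungi mentions, not verified:

    # Fetch the gate pipeline's total_changes for the last 24 hours as JSON.
    curl 'http://graphite.openstack.org/render?target=stats.zuul.tenant.openstack.pipeline.gate.total_changes&from=-24h&format=json'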
dmsimard | clarkb: bleh, I hate this but we might need to use a meta dependency. The problem is that the bridge network is not known before the multi-node-bridge role runs and for that role to work, we need to run the firewall role first to authorize the traffic between the nodes | 17:51 |
pabelanger | mwhahaha: I had a patch up to fix that, https://review.openstack.org/495517/ it was because it would set pingtest fail by default, then never run the test | 17:51 |
EmilienM | legacy-tripleo-ci-centos-7-nonha-multinode-oooq legacy-tripleo-ci-centos-7-nonha-multinode-oooq : ERROR Project openstack/requirements does not have the default branch master ( found on https://review.openstack.org/#/c/516683/ - stable/newton) | 17:52 |
dmsimard | EmilienM: we identified the issue and resolved it | 17:52 |
fungi | EmilienM: that was corrected an hour or so ago | 17:52 |
EmilienM | dmsimard: ok, I'll run "recheck" in that case | 17:52 |
clarkb | dmsimard: yes I think you need to just do it in multi-node-bridge completely independent of the firewall role | 17:52 |
fungi | corrupt git repository cached on an executor | 17:52 |
mwhahaha | pabelanger: yea that's what i'm remembering so not sure since we moved to tempest if we need that as much | 17:52 |
*** jascott1 has quit IRC | 17:52 | |
clarkb | dmsimard: as a feature of the multinode bridge (and have it be a flag, off by default, that decides if it turns on or not) | 17:52 |
pabelanger | clarkb: fungi: do you mind reaching out to citycloud, or I can if you have a contact email, about deleting our 7 stuck instances in Kna1. They seem stuck in BUILDING | 17:53 |
dmsimard | clarkb: we could put it in multi-node-bridge directly, but then the rules wouldn't be persisted (https://review.openstack.org/#/c/513943/) | 17:53 |
clarkb | dmsimard: I think you can just add a task to main.yaml in multi-node-bridge with a when: flag | bool | 17:53 |
*** ccamacho has quit IRC | 17:53 | |
clarkb | dmsimard: oh right that | 17:53 |
dmsimard | clarkb: let me put up a WIP to explain | 17:53 |
*** felipemonteiro__ has joined #openstack-infra | 17:54 | |
fungi | pabelanger: i don't know that i have any specific contact off the top of my head--if we do have contact info it'll generally be in our passwords file | 17:54 |
clarkb | pabelanger: I've included you in earlier emails to them you can use but also our contact for that cloud should be in the passwords file | 17:54 |
pabelanger | clarkb: k, couldn't remember. let me search mail again | 17:54 |
pabelanger | with nodepool launcher errors under control, I'm going to see why we have a large amount of ready nodes now | 17:55 |
fungi | pabelanger: if we have a dashboard login of some kind for them, might make sense to just open a trouble ticket through that since it's likely non-urgent | 17:55 |
clarkb | dmsimard: can you just call the firewall role again from multi node bridge and pass in the different IPs? I think the problem is today the firewall role assumes the node IPs in inventory right? but we should be able to have it take a list? | 17:55 |
clarkb | dmsimard: or break out the persist iptables portion of that rule and only reuse that bit (that might actually be easiest) | 17:55 |
pabelanger | fungi: good idea | 17:55 |
dmsimard | hang on, I'll have a patch up soon | 17:55 |
clarkb | jeblair: considering you wrote https://review.openstack.org/#/c/516502/2 I'd be curious to get your thoughts on that (its logstash job submission change) | 17:57 |
*** felipemonteiro_ has quit IRC | 17:57 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status https://review.openstack.org/516755 | 18:01 |
pabelanger | jeblair: ^is that the correct syntax to render the new accepting metric for executors? | 18:01 |
*** pvaneck has quit IRC | 18:02 | |
*** panda|ruck|bbl is now known as panda|ruck | 18:02 | |
mwhahaha | question about logstash, which is the correct filename to use going forward? with or without the .gz? ie job-output.txt or job-output.txt.gz | 18:02 |
clarkb | mwhahaha: that is what I'm trying to sort out with https://review.openstack.org/#/c/516502/2 | 18:02 |
clarkb | mwhahaha: I'm asserting no .gz (backward compatible) | 18:03 |
mwhahaha | k | 18:03 |
*** d0ugal has joined #openstack-infra | 18:03 | |
*** tpsilva has joined #openstack-infra | 18:03 | |
clarkb | but hoping more people will review that change so we can make that decision | 18:03 |
dmsimard | clarkb: re-reading ianw's comment on https://review.openstack.org/#/c/513943/ -- I *guess* we could take out the iptables persistence into a specific role that'd run after multi-node-bridge and multi-node-firewall | 18:04 |
mwhahaha | i know it used to be not .gz but the v3 changed that. so i'm for dropping it | 18:04 |
dmsimard | clarkb: which would avoid having to run the multi-node-firewall role twice. | 18:04 |
clarkb | dmsimard: ya | 18:04 |
dmsimard | (and dealing with a meta dependency, which is cool too.) | 18:04 |
dmsimard | okay, let's do that. | 18:04 |
clarkb | dmsimard: that would be my preference; I think it's clear what is going on that way | 18:04 |
* dmsimard hates meta dependencies | 18:04 | |
* mwhahaha adds a meta dependency on dmsimard | 18:06 | |
*** rbrndt has joined #openstack-infra | 18:06 | |
clarkb | can you include_role the same role from multiple places? | 18:06 |
dmsimard | clarkb: yes | 18:07 |
clarkb | but then main firewall setup can include_role persist iptables and then multi node bridge can include_role persist iptables too | 18:07 |
clarkb | either way works, as long as persist iptables happens after all the firewalling at least once | 18:07 |
*** rbrndt has quit IRC | 18:07 | |
dmsimard | clarkb: I intended to add the persist iptables roles once in the multinode playbook after both roles, but it's true that each of those roles could do an include role too. | 18:08 |
dmsimard | either way works -- using include_role makes it so each role can be used on their own without relying on the playbook including the role | 18:09 |
* dmsimard no strong opinion on either | 18:09 | |
clarkb | dmsimard: ya may be worth going that route just for the ability to consume things individually | 18:09 |
*** dsariel__ has quit IRC | 18:09 | |
*** zzzeek has quit IRC | 18:10 | |
*** jpena is now known as jpena|off | 18:10 | |
*** zzzeek has joined #openstack-infra | 18:12 | |
openstackgerrit | Alex Schultz proposed openstack-infra/elastic-recheck master: Add query for 1729054 https://review.openstack.org/516756 | 18:13 |
AJaeger | odyssey4me: looking at https://review.openstack.org/#/c/516605/2/zuul.d/project.yaml - why are you not using a project-template in a central place? That way you define the template once and can use it everywhere | 18:13 |
AJaeger | odyssey4me: looks to me like you use the same jobs in a couple of repos | 18:13 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one https://review.openstack.org/516757 | 18:15 |
dmsimard | clarkb: should be as simple as that ? ^ I'll fix the persistence stack | 18:15 |
clarkb | dmsimard: one comment inline | 18:16 |
clarkb | oh actually one more one sec | 18:16 |
*** sree has joined #openstack-infra | 18:16 | |
*** zzzeek has quit IRC | 18:17 | |
clarkb | posted | 18:18 |
dmsimard | clarkb: sure | 18:18 |
*** zzzeek has joined #openstack-infra | 18:18 | |
clarkb | dmsimard: thinking about it more specifying the dest is probably not necessary | 18:18 |
clarkb | you aren't going to have packets from that range coming from external nodes to the test env (due to routing) | 18:19 |
dmsimard | clarkb: probably doesn't hurt to specify it | 18:19 |
mwhahaha | is there anything we can do to speed up the response time of the elastic-recheck bot? or is it only as good as the indexing delay | 18:20 |
pabelanger | /dev/xvde2 70G 66G 1.1G 99% /var/lib/zuul | 18:20 |
pabelanger | that is on ze10.o.o | 18:20 |
pabelanger | for some reason, almost full | 18:20 |
*** zzzeek has quit IRC | 18:20 | |
dmsimard | pabelanger: that's what, git repos ? | 18:20 |
clarkb | mwhahaha: its only as good as the indexing delay and right now thats not great due to the double indexing described in https://review.openstack.org/#/c/516502/2 | 18:20 |
clarkb | inc0: you should be able to depends on https://review.openstack.org/516757 and see if that makes things better for you | 18:20 |
*** hemna has quit IRC | 18:21 | |
clarkb | pabelanger: I think we leak build workspaces when executors are restarted | 18:21 |
*** d0ugal_ has joined #openstack-infra | 18:21 | |
*** sree has quit IRC | 18:21 | |
clarkb | not the case on ze02 though | 18:21 |
pabelanger | yah, I'm going to stop ze10.o.o now, we are posting back some errors to jobs | 18:21 |
pabelanger | then see what leaked | 18:21 |
clarkb | hold on | 18:22 |
pabelanger | k | 18:22 |
dmsimard | clarkb, inc0: hang on, we'll default that to false, right ? | 18:22 |
dmsimard | so that it's opt-in, not opt-out | 18:22 |
clarkb | pabelanger: there are a few builds from the 30th you can probably just delete without stopping the executor since our timeout is less than 18 hours | 18:22 |
clarkb | dmsimard: ya that preserves the old behavior of things like neutron testing their own firewall rules | 18:23 |
pabelanger | clarkb: didn't we have a clean up find command we used before | 18:23 |
*** d0ugal has quit IRC | 18:23 | |
clarkb | pabelanger: I'm not sure | 18:23 |
inc0 | dmsimard: I think default for configure addresses is true | 18:24 |
inc0 | https://github.com/openstack-infra/zuul-jobs/blob/master/roles/multi-node-bridge/defaults/main.yaml#L5 | 18:24 |
odyssey4me | AJaeger our job definitions are a little common, but not *that common* - the extra layer of abstraction doesn't actually help much | 18:24 |
clarkb | inc0: ya I'm saying we need another flag for whther or not the firewall should be opened as well since old behavior was not to do that because things like neutron do it themselves | 18:24 |
inc0 | and if you configure addresses, it will require iptables too | 18:24 |
clarkb | inc0: no it won't require iptables | 18:25 |
AJaeger | odyssey4me: then my example files were too small ;) | 18:25 |
inc0 | having an address you can't communicate over? | 18:25 |
clarkb | because things like neutron are expected to directly manage that stuff and if we do a global rule that masks neutron's rules we won't test neutron | 18:25 |
inc0 | maybe configure_addresses should be default false | 18:25 |
*** zzzeek has joined #openstack-infra | 18:25 | |
clarkb | inc0: yes because some things like neutron do it themselves | 18:25 |
inc0 | right, but then you don't want an address on the iface too right? | 18:25 |
odyssey4me | AJaeger I think as we stabilise we might look into using the templates -but for now it's not too bad | 18:25 |
clarkb | inc0: we do | 18:25 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one https://review.openstack.org/516757 | 18:26 |
dmsimard | clarkb: ^ with your comments | 18:26 |
clarkb | inc0: without addresses on the interface we won't be able to ssh to neutron managed VMs | 18:26 |
clarkb | due to routing | 18:26 |
clarkb | inc0: but neutron is responsible for making sure the iptables rules are set to all ssh | 18:26 |
clarkb | *allow | 18:26 |
*** zzzeek_ has joined #openstack-infra | 18:27 | |
clarkb | dmsimard: +2 thanks | 18:27 |
dhinesh | hi, looks like i might have a working CI https://review.openstack.org/#/c/516758/ , but how do you get the 'success' of 'failure' status for a CI under workflow | 18:27 |
dmsimard | inc0: try a Depends-On with https://review.openstack.org/#/c/516757/ and set bridge_authorize_internal_traffic: true in your job vars | 18:28 |
clarkb | inc0: basically this isn't a regression from the old overlay code in bash; it seemed to always expect the deployed software to then manage the IPs | 18:28 |
dmsimard | clarkb: -infra uses puppet3 right ? | 18:28 |
clarkb | inc0: your use case is different so we are adding this as a feature that is off by default | 18:28 |
clarkb | dmsimard: yes | 18:28 |
dmsimard | docs for puppet3 are dead :/ they took them out lol | 18:29 |
clarkb | woo | 18:29 |
clarkb | (I mean we do it too so can't complain) | 18:29 |
dmsimard | oh wait there's https://docs.puppet.com/puppet/3.8/ but https://puppet.com/docs/puppet/3.8/index.html is broken | 18:30 |
clarkb | dhinesh: your comments to gerrit have to match our comment link rules | 18:30 |
*** zzzeek has quit IRC | 18:30 | |
inc0 | trying | 18:30 |
clarkb | dhinesh: er not comment links but the javascript we inject looks for a specific format (trying to find where that is) | 18:31 |
*** ralonsoh has quit IRC | 18:31 | |
*** zzzeek_ has quit IRC | 18:31 | |
clarkb | pabelanger: fwiw my du to try and identify the bad dirs is going much slower on ze10 than I expected; we probably should stop it since we can't turn it around quickly | 18:33 |
clarkb | pabelanger: my concern with just stopping it though is that I think some jobs use a lot more disk than others due to having more required projects and those jobs will just all migrate to other executors potentially causing them to run out of disk too | 18:33 |
*** sambetts is now known as sambetts|afk | 18:34 | |
pabelanger | clarkb: yah, I cleaned up old dirs, no help | 18:34 |
clarkb | dhinesh: looks like it is comment links after all https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/review.pp#n170 | 18:34 |
pabelanger | clarkb: I suspect we are just syncing back too much data | 18:34 |
clarkb | pabelanger: last time when I investigate the fs issues it was largely the git repos | 18:35 |
*** pvaneck has joined #openstack-infra | 18:36 | |
clarkb | each job can have like 5GB of just git repos | 18:36 |
pabelanger | clarkb: oh, yah. that could be it too | 18:36 |
clarkb | (and git repos are also inode heavy) | 18:36 |
dmsimard | not just that, the executor also pulls the logs, right ? | 18:36 |
pabelanger | clarkb: okay, so stop or ride it out? | 18:36 |
clarkb | I'm torn I don't want to stop it so we can actually see what is using the disk | 18:37 |
clarkb | but du is running very slowly | 18:37 |
clarkb | still hasn't returned to me | 18:37 |
pabelanger | k, I am searching manually myself | 18:37 |
pabelanger | clarkb: yah, we are swapping too | 18:37 |
dmsimard | clarkb: we can attach an additional volume temporarily and move stuff ? | 18:37 |
clarkb | dmsimard: ya though I'm not sure that will be much faster (but I guess lets us investigate more later if necessary) | 18:38 |
*** bnemec has quit IRC | 18:38 | |
clarkb | (the problem is stopping it deletes all/most of the builds dirs) | 18:38 |
*** zzzeek has joined #openstack-infra | 18:39 | |
clarkb | ok I've got to prep for meeting /me is mostly afk until 1900 | 18:39 |
pabelanger | clarkb: so far, everything I have found is 1.2GB to 1.6GB | 18:39 |
pabelanger | 5f02db42bc9a4be680c3d617a2eacdbf 2.3 GB | 18:40 |
pabelanger | oh, interesting | 18:41 |
pabelanger | I think we are leaking stuff | 18:41 |
pabelanger | 2017-10-31 15:24:18,141 DEBUG zuul.AnsibleJob: [build: 5f02db42bc9a4be680c3d617a2eacdbf] Sending result: {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"} | 18:41 |
pabelanger | that is last log entry, but still data on disk | 18:41 |
pabelanger | clarkb: ^ | 18:42 |
dhinesh | clarkb: so just adding comment links from the log server would initiate it? | 18:42 |
openstackgerrit | Merged openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status https://review.openstack.org/516755 | 18:42 |
pabelanger | clarkb: so, I think we should stop ze10 and see has been leaked, then go back into debug logs | 18:42 |
*** zzzeek has quit IRC | 18:42 | |
clarkb | pabelanger: ok | 18:43 |
clarkb | dhinesh: your comments to gerrit have to match that rule there | 18:44 |
pabelanger | clarkb: k, stopping | 18:44 |
*** hemna has joined #openstack-infra | 18:44 | |
*** zzzeek has joined #openstack-infra | 18:44 | |
pabelanger | jobs aborting now | 18:45 |
*** rloo has joined #openstack-infra | 18:45 | |
*** dprince has quit IRC | 18:49 | |
jeblair | back | 18:50 |
dmsimard | the issues are with ze10 ? | 18:51 |
jeblair | clarkb: are you running du? | 18:51 |
dmsimard | yeah ok, nevermind -- saw a finger url for it. | 18:51 |
clarkb | jeblair: not anymore | 18:51 |
jeblair | i don't want to run my own if others are | 18:51 |
jeblair | clarkb: what did you find? | 18:52 |
clarkb | 55G builds was biggest consumer followed by 9.1G executor-git | 18:52 |
clarkb | everything else is in the KB range | 18:52 |
clarkb | there are 224 builds | 18:53 |
clarkb | so even at 1GB each thats enough to fill the disk | 18:53 |
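The kind of check being run here, sketched as shell on an executor (paths assume the /var/lib/zuul layout discussed above):

    # Per-directory usage under the executor state dir, largest first.
    sudo du -sh /var/lib/zuul/* | sort -rh

    # How many build directories are sitting on disk.
    ls /var/lib/zuul/builds | wc -l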
jeblair | there's no zuul-executor running? | 18:53 |
clarkb | jeblair: pabelanger was stopping it | 18:53 |
jeblair | okay, then they can all be deleted :) | 18:53 |
pabelanger | okay, I don't see any more playbooks running on ze10, but I do see ssh connections still open | 18:53 |
*** e0ne has joined #openstack-infra | 18:54 | |
jeblair | i'm assuming a bunch of them leaked due to earlier unclean shutdowns | 18:54 |
pabelanger | yes, it just stopped now | 18:54 |
jeblair | we probably should check for and delete old build dirs on the other executors | 18:54 |
clarkb | jeblair: pabelanger found that {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"} is a thing | 18:54 |
jeblair | i'll start that | 18:54 |
mugsie | is "ERROR Project openstack/requirements does not have the default branch master" a known issue? | 18:54 |
jeblair | that's probably due to being out of space | 18:54 |
clarkb | oh ya all those build dirs are from 1500UTC or so | 18:55 |
clarkb | which was around when things restarted? | 18:55 |
jeblair | yep | 18:55 |
pabelanger | agree | 18:55 |
pabelanger | 5f02db42bc9a4be680c3d617a2eacdbf was the one I linked before and still exists on disk | 18:55 |
*** jascott1 has joined #openstack-infra | 18:56 | |
clarkb | infra meeting in ~4 minutes | 18:56 |
AJaeger | mugsie: known issue and fixed - please recheck | 18:56 |
clarkb | join us in #openstack-meeting | 18:56 |
jeblair | okay i'm deleting all build dirs older than 4 hours | 18:56 |
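Not necessarily the exact command used, but one way to express that cleanup, assuming leaked build dirs sit directly under /var/lib/zuul/builds:

    # Delete build directories older than four hours (240 minutes) on a stopped executor.
    sudo find /var/lib/zuul/builds -mindepth 1 -maxdepth 1 -type d -mmin +240 -exec rm -rf {} +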
mugsie | AJaeger: thanks | 18:57 |
AJaeger | mugsie: if you get it on a change that you just pushed or after the recheck, then it's a new one - please report back in that case | 18:57 |
clarkb | jeblair: I don't think that will catch those on ze10 just yet | 18:57 |
mugsie | cool - I will keep an eye on them | 18:57 |
jeblair | clarkb: it seems to be the only one with a bunch of 1500s; other executors are generally older | 18:58 |
jeblair | since it's stopped i'll delete the whole builds dir | 18:59 |
*** yamahata has quit IRC | 18:59 | |
fungi | mmm, meeting time? | 19:00 |
clarkb | yup | 19:00 |
jeblair | pabelanger: i'm deleting all the build dirs, and i'm also doing fscks on all the git repos on ze10 | 19:03 |
jeblair | just to make sure everything is clean when we restart it | 19:03 |
*** sree has joined #openstack-infra | 19:03 | |
pabelanger | jeblair: ack | 19:03 |
*** yamahata has joined #openstack-infra | 19:07 | |
*** sree has quit IRC | 19:07 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci master: Default $NODEPOOL_PROVIDER https://review.openstack.org/490037 | 19:07 |
*** catintheroof has quit IRC | 19:14 | |
*** yamahata has quit IRC | 19:14 | |
*** pcaruana has joined #openstack-infra | 19:15 | |
*** dprince has joined #openstack-infra | 19:15 | |
*** yamahata has joined #openstack-infra | 19:17 | |
*** rbrndt has joined #openstack-infra | 19:22 | |
*** ijw has quit IRC | 19:23 | |
*** ijw has joined #openstack-infra | 19:24 | |
jeblair | pabelanger: all of the build dirs on ze10 are deleted, and my git repo fsck has come back clean; you should be clear to restart when ready | 19:24 |
pabelanger | jeblair: thanks, starting up now | 19:25 |
*** electrofelix has quit IRC | 19:25 | |
AJaeger | jeblair, pabelanger what about tripleo-quickstart? That was mentioned above | 19:25 |
jeblair | AJaeger: that error is probably caused by an error cloning from the git repo cache to the job's build dir. i checked the cache, and the repos are fine, so it was probably just that it ran out of space copying it into the build dir. | 19:27 |
jeblair | AJaeger: now that all the old build dirs are deleted, should be fine | 19:27 |
AJaeger | jeblair: ok, thanks | 19:27 |
*** eharney has quit IRC | 19:29 | |
*** eharney has joined #openstack-infra | 19:30 | |
*** hasharAway is now known as hashar | 19:38 | |
*** Hal has joined #openstack-infra | 19:43 | |
*** Hal is now known as Guest4300 | 19:43 | |
*** pvaneck has quit IRC | 19:44 | |
*** pvaneck has joined #openstack-infra | 19:45 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints https://review.openstack.org/516773 | 19:48 |
*** amoralej is now known as amoralej|off | 19:48 | |
*** pcaruana has quit IRC | 19:49 | |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints https://review.openstack.org/516773 | 19:49 |
*** pvaneck has quit IRC | 19:49 | |
*** sree has joined #openstack-infra | 19:50 | |
*** salv-orlando has quit IRC | 19:51 | |
openstackgerrit | Merged openstack-infra/system-config master: Fix dependency order with logstash_worker.pp https://review.openstack.org/516717 | 19:52 |
*** Guest4300 has quit IRC | 19:53 | |
openstackgerrit | Ruby Loo proposed openstack-infra/project-config master: Remove legacy python-ironicclient jobs https://review.openstack.org/516774 | 19:55 |
*** catintheroof has joined #openstack-infra | 19:55 | |
*** sree has quit IRC | 19:55 | |
*** mat128 has quit IRC | 19:56 | |
fungi | pabelanger: fwiw, i can't seem to get either of the "fg-test" instances for openstackjenkins in ord to accept any of my ssh keys | 19:58 |
fungi | so i don't think they're anything i created | 19:58 |
pabelanger | fungi: k, thanks | 19:58 |
clarkb | I think show will tell us how old they are? | 19:59 |
* clarkb looks | 19:59 | |
fungi | my money is on "ancient" | 19:59 |
pabelanger | http://paste.openstack.org/show/625158/ | 19:59 |
fungi | they're using a "512 MB Classic v1" flavor after all | 19:59 |
pabelanger | centos-6.2 nodes | 19:59 |
pabelanger | 512 | 19:59 |
fungi | oh yeah, created 2016-06-02 | 20:00 |
fungi | less ancient than i anticipated | 20:00 |
clarkb | oh in that case we likely can delete them as the only thing we really ever used centos 6 for was test nodes and git.o.o and both work otherwise | 20:00 |
pabelanger | fungi: I am guessing live migration? | 20:00 |
pabelanger | stop /create | 20:00 |
fungi | maybe | 20:00 |
pabelanger | okay, will delete them now then | 20:01 |
clarkb | re https://review.openstack.org/#/c/516502/ if we can get that in I think I'd like to restart gear on logstash.o.o as its over 100k now and see if that change makes a dent in job queue growth | 20:01 |
clarkb | but first lunch | 20:01 |
jeblair | i have to go run errands for a few hours | 20:01 |
*** efried has joined #openstack-infra | 20:03 | |
openstackgerrit | Ruby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironicclient jobs https://review.openstack.org/516776 | 20:05 |
*** pvaneck has joined #openstack-infra | 20:06 | |
efried | Howdy folks. Zuul is -1ing everything without giving a reason I can understand. Known issue? | 20:06 |
pabelanger | looking into why infracloud-chocolate appears to be wedged. | 20:06 |
pabelanger | 56 locked, ready nodes | 20:06 |
*** ccamacho has joined #openstack-infra | 20:10 | |
pabelanger | yah, appears to be wedged waiting on more nodes | 20:10 |
pabelanger | just a waiting game now I think | 20:11 |
pabelanger | | 0000624820 | infracloud-chocolate | nova | centos-7 | 0da89be1-f267-4926-bc91-b1debb4c509d | ready | 00:04:07:47 | locked | | 20:11 |
pabelanger | is the longest right now | 20:11 |
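A rough way to spot nodes stuck like the one in that paste, assuming the column layout nodepool list prints above:

    # Show nodes that are marked ready but still held by a lock.
    nodepool list | grep ' ready ' | grep ' locked '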
*** camunoz has quit IRC | 20:11 | |
fungi | efried: please link to an example of the everything it has -1'd | 20:11 |
fungi | there was a disk spontaneously running out of room on one of the 10 executors a little bit ago which could have resulted in some job failures | 20:12 |
fungi | though it seemed like we caught it pretty quickly | 20:13 |
efried | fungi E.g.: https://review.openstack.org/#/c/515151/ | 20:13 |
fungi | efried: thanks, looking | 20:13 |
efried | fungi E.g.: https://review.openstack.org/#/c/515223/ | 20:14 |
efried | fungi Let me know if you want more. | 20:14 |
efried | fungi Thanks for looking! | 20:14 |
fungi | efried: if you toggle the ci comments on that first one, all those "ERROR Project openstack/requirements does not have the default branch master" entries are likely to have been the executor failure we resolved earlier; those jobs started a few hours ago based on the time of your recheck and the duration of the working jobs and the time it reported on the change | 20:16 |
*** jtomasek has quit IRC | 20:17 | |
efried | fungi Okay, I want to say there was at least one we rechecked and it still failed, lemme go find... | 20:17 |
fungi | efried: same thing with the second change you linked | 20:17 |
efried | fungi https://review.openstack.org/#/c/516662/ | 20:17 |
efried | MERGER_FAILURE sounds like a sinister Wall Street thing. | 20:18 |
fungi | efried: yes, that looks like a different issue | 20:18 |
*** mrunge has quit IRC | 20:18 | |
*** kgiusti has left #openstack-infra | 20:20 | |
fungi | efried: unrelated to the corrupt git repo on one of the executors which caused the "does not have the default branch master" errors, timing on the failed recheck there looks related to an issue we resolved shortly thereafter where an executor ran out of disk space | 20:21 |
fungi | it should be fine at this point | 20:21 |
efried | fungi Cool, thanks for checking it out. | 20:21 |
clarkb | merger failures can be lack of disk too | 20:21 |
*** Swami has quit IRC | 20:22 | |
fungi | yup | 20:22 |
fungi | that's what i expect it was in that case | 20:22 |
*** mrunge has joined #openstack-infra | 20:22 | |
clarkb | we may want to formalize rm -rf /var/lib/zuul/builds on zuul startup | 20:23 |
clarkb | maybe add it to the init script? | 20:23 |
*** ijw has quit IRC | 20:24 | |
*** dave-mcc_ has joined #openstack-infra | 20:26 | |
*** smatzek has quit IRC | 20:26 | |
*** smatzek has joined #openstack-infra | 20:27 | |
*** dave-mccowan has quit IRC | 20:27 | |
inc0 | dmsimard: I think it helped; gates still failed but for a different reason I think | 20:28 |
inc0 | mariadb finally bootstraps:) | 20:28 |
dmsimard | inc0: progress! Do you know what's the issue ? | 20:29 |
*** erlon has quit IRC | 20:29 | |
inc0 | yeah I think, I think I need to recreate /etc/hosts so hostname will point to 172... ip | 20:29 |
inc0 | for rabbitmq | 20:29 |
dmsimard | inc0: we setup the inventory hostnames in /etc/hosts | 20:29 |
inc0 | yeah I know | 20:30 |
*** ldnunes has quit IRC | 20:30 | |
dmsimard | But they point to internal nodepool ip | 20:30 |
inc0 | but since I'm using overlay, I'll set it to overlay net | 20:30 |
inc0 | shouldn't be too bad | 20:30 |
dmsimard | I guess you want to setup the bridge IPs ? | 20:30 |
inc0 | yeah | 20:30 |
*** csomerville has joined #openstack-infra | 20:30 | |
dmsimard | Ok, maybe something to consider as well clarkb ^ | 20:30 |
inc0 | kolla-ansible already does that, but I needed to remove previous setup | 20:31 |
inc0 | dmsimard: not sure, it might be kolla-ansible specific | 20:31 |
*** smatzek has quit IRC | 20:31 | |
pabelanger | mwhahaha: EmilienM: can you see what in tripleo jobs is overwriting /root/.ssh/known_host? it is deleting the infra-root keys we add with nodepool and prevents us from SSH into the running nodes | 20:31 |
mwhahaha | pabelanger: probably in quickstart | 20:32 |
*** Hal has joined #openstack-infra | 20:32 | |
*** cody-somerville has quit IRC | 20:32 | |
*** Hal is now known as Guest95121 | 20:32 | |
mwhahaha | pabelanger: we append not overwrite | 20:33 |
fungi | pabelanger: itym authorized_keys? | 20:33 |
dmsimard | pabelanger: infra-root keys != known_hosts ? | 20:33 |
*** xingchao has quit IRC | 20:33 | |
dmsimard | fungi beat me to it :) | 20:33 |
pabelanger | oh ya, that | 20:33 |
pabelanger | thanks | 20:33 |
pabelanger | authorized_keys | 20:33 |
pabelanger | ty | 20:34 |
mwhahaha | but we do remove known_hosts https://github.com/openstack/tripleo-quickstart-extras/blob/79cf07e3dd3e555206ae6fefdd41423a6da38cd8/roles/virthost-full-cleanup/tasks/main.yml#L111 | 20:34 |
mwhahaha | pabelanger: https://github.com/openstack/tripleo-quickstart-extras/blob/dab754c8de7d235ffe85d157f7d6d6f05be988eb/roles/undercloud-setup/tasks/non_root_user_setup.yml | 20:34 |
mwhahaha | pabelanger: but we're using the authorized_key thing in ansible so not sure if that should be removing any more keys | 20:35 |
pabelanger | mwhahaha: is undercloud_user == root? | 20:36 |
mwhahaha | pabelanger: usually it isn't but it might be in multinode | 20:37 |
dmsimard | mwhahaha: authorized_key from ansible doesn't delete keys unless "state: absent" or "exclusive: yes" | 20:37 |
*** sree has joined #openstack-infra | 20:37 | |
mwhahaha | found it | 20:37 |
mwhahaha | https://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L238 | 20:38 |
pabelanger | yup | 20:38 |
dmsimard | oh yeah, that totally overwrites the one in /root | 20:39 |
*** sshnaidm is now known as sshnaidm|afk | 20:39 | |
pabelanger | cat foo | sudo tee -a /root/.ssh/authorized_keys | 20:39 |
pabelanger | that is the fix | 20:39 |
*** Guest95121 has quit IRC | 20:39 | |
dmsimard | depends what's the purpose | 20:39 |
dmsimard | there might already be keys in there | 20:39 |
mwhahaha | did you guys stop putting those keys in /etc/nodepool? https://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L232 | 20:40 |
mwhahaha | maybe that's the problem? | 20:40 |
pabelanger | no, we use glean now to populate /root/.ssh/authorized_keys | 20:41 |
pabelanger | just that yours are the only jobs that overwrite it | 20:41 |
clarkb | dmsimard: inc0 that would be another new use case for the overlay | 20:41 |
clarkb | dmsimard: inc0 I am not opposed to supporting it too but the way kolla sets it up I don't think kolla really wants us to do it anyways? because we are unaware of the keepalived ip | 20:41 |
pabelanger | mwhahaha: what is your key used for? | 20:41 |
*** sree has quit IRC | 20:42 | |
dmsimard | pabelanger: the fix is to leave line 238 as is, but then cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys | 20:42 |
dmsimard | instead of doing the cp | 20:42 |
pabelanger | right | 20:42 |
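In shell terms the fix being discussed looks roughly like this; the "before" line is a guess at the current cp, and the actual change is what dmsimard pushes shortly after:

    # Before: overwrites root's authorized_keys, dropping the infra-root keys glean added.
    # sudo cp "${HOME}/.ssh/authorized_keys" /root/.ssh/authorized_keys

    # After: append instead, preserving whatever keys are already present.
    cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys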
mwhahaha | pabelanger: no idea it was like that when i got here | 20:42 |
dmsimard | I can send a patch since i'm not core and all | 20:42 |
* dmsimard writes | 20:42 | |
clarkb | ianw: for https://review.openstack.org/#/c/516502/ you good if I go ahead and approve that now and edit the comment in a followup? | 20:44 |
*** thorst_ has quit IRC | 20:44 | |
*** trown is now known as trown|outtypewww | 20:45 | |
*** priteau has joined #openstack-infra | 20:45 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/tripleo-ci master: Don't replace /root/.ssh/authorized_keys, append to it https://review.openstack.org/516785 | 20:45 |
dmsimard | pabelanger, mwhahaha ^ | 20:45 |
*** thorst has joined #openstack-infra | 20:47 | |
pabelanger | jeblair: Shrews: clarkb: yah, chocolate is completely wedged. I'm going to see about bumping max-servers up by 5 to see if that will cause things to move again. | 20:47 |
pabelanger | otherwise, I don't know how to delete or release a locked node | 20:47 |
pabelanger | locked 'ready' node | 20:47 |
inc0 | clarkb: yeah in general we have a lot of setup like that in our code, so unless someone else wants it, I don't think there is reason for making this thing just for us | 20:48 |
clarkb | inc0: in this case I think because you are using IPs in the range that we aren't directly controlling you'll want to do it | 20:48 |
dmsimard | clarkb: yeah.. basically at that point, they might as well parent to base instead of multinode and include each role they need individually (and leave the multi-node-hosts-file out) | 20:48 |
*** felipemonteiro__ has quit IRC | 20:51 | |
*** thorst has quit IRC | 20:51 | |
*** xingchao has joined #openstack-infra | 20:53 | |
*** smatzek has joined #openstack-infra | 20:56 | |
pabelanger | okay, I am not sure what is going on with infracloud-chocolate, 3 more nodes came on line, and looked to be fulfilled for zuul, but still locked | 20:57 |
pabelanger | 2017-10-31 20:49:53,310 DEBUG zuul.nodepool: Updating node request <NodeRequest 200-0000806206 <NodeSet legacy-centos-7-2-node OrderedDict([('primary', <Node None primary:centos-7>), ('secondary', <Node None secondary:centos-7>)])OrderedDict([('subnodes', <Group subnodes ['secondary']>)])>> | 20:57 |
*** hemna has quit IRC | 20:58 | |
pabelanger | I'll check back once jeblair is back from errands, see if we can figure out the issue | 20:58 |
pabelanger | but, unsure how to release the locked ready nodes and unwedge | 20:58 |
*** cody-somerville has joined #openstack-infra | 20:59 | |
*** cody-somerville has joined #openstack-infra | 20:59 | |
clarkb | pabelanger: did the nodes that are locked boot and did zuul use them? | 20:59 |
clarkb | if they are just not booting we should be able to address that problem | 20:59 |
*** xingchao has quit IRC | 21:00 | |
*** smatzek has quit IRC | 21:01 | |
*** csomerville has quit IRC | 21:01 | |
*** jcoufal_ has joined #openstack-infra | 21:02 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Better comment in logstash job submission role https://review.openstack.org/516786 | 21:03 |
clarkb | ianw: ^ there is the bigger comment I'm going to go ahead and approve the other change now | 21:03 |
*** salv-orlando has joined #openstack-infra | 21:05 | |
pabelanger | clarkb: they have booted, nodepool-launcher marked them ready (fulfilled), then zuul locked them , but hasn't launched any jobs. let me see if I can figure out if an executor was assigned | 21:05 |
pabelanger | maybe we are having an issue there | 21:05 |
*** jcoufal has quit IRC | 21:05 | |
Shrews | pabelanger: yeah, not sure how to debug the zuul side of that | 21:06 |
*** eharney has quit IRC | 21:07 | |
pabelanger | http://paste.openstack.org/show/625161/ | 21:07 |
pabelanger | Shrews: clarkb: that is all the data I see in zk | 21:07 |
pabelanger | which looks correct | 21:08 |
pabelanger | and I see a lock too | 21:08 |
pabelanger | but don't know how to see that info | 21:08 |
*** catintheroof has quit IRC | 21:09 | |
pabelanger | k, have to dadops with kids, I'll check backscroll this evening | 21:09 |
*** rhallisey has quit IRC | 21:10 | |
openstackgerrit | Merged openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical https://review.openstack.org/516502 | 21:12 |
mwhahaha | ok so is there anywhere to look in the logs to see why the tripleo queue keeps resetting | 21:16 |
mwhahaha | cause it just reset again and i have no idea why | 21:16 |
mwhahaha | besides 510900,2 being stuck there in error | 21:17 |
clarkb | usually the easiest thing is to look at the top of the queue and see what just failed. Has 510900 been in that state for a while? | 21:18 |
mwhahaha | clarkb: yes | 21:18 |
mwhahaha | clarkb: hours | 21:18 |
*** ijw has joined #openstack-infra | 21:18 | |
clarkb | ok in that case its likely whatever change was ahead attempted to merge and failed because jgit (so zuul wasn't able to detect that ahead of time) or it got a new patchset and was evicted | 21:19 |
openstackgerrit | Ruby Loo proposed openstack-infra/project-config master: Remove legacy python-ironic-inspector-client jobs https://review.openstack.org/516789 | 21:19 |
clarkb | but otherwise you should have a failure at the tip | 21:19 |
mwhahaha | i know usually it shows up | 21:19 |
mwhahaha | but come back to the ui and it's like wtf just happened | 21:19 |
mwhahaha | but i literally caught it zeroing out everything but no failure listed | 21:19 |
clarkb | ya if it's not a failure then likely merge failed in gerrit late or a new patchset arrived for some new change | 21:20 |
clarkb | er some change at the head of the queue | 21:20 |
mwhahaha | i don't think so | 21:20 |
* mwhahaha goes looking | 21:20 | |
mwhahaha | http://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A | 21:20 |
mwhahaha | gotta love all those runs | 21:20 |
*** xingchao has joined #openstack-infra | 21:21 | |
mwhahaha | clarkb: it's items in the inventory.yaml to show what was ahead of it right? | 21:22 |
*** priteau has quit IRC | 21:22 | |
mwhahaha | it looks like the last 3 runs of 509521 had nothing in front of it | 21:23 |
*** priteau has joined #openstack-infra | 21:23 | |
clarkb | I'm not sure where that is recorded | 21:23 |
openstackgerrit | Ruby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironic-inspector-client jobs https://review.openstack.org/516791 | 21:23 |
mwhahaha | it just reset again | 21:23 |
mwhahaha | and nothing is infront of it | 21:23 |
* mwhahaha flips tables | 21:24 | |
*** sree has joined #openstack-infra | 21:24 | |
clarkb | I don't think it reset the gate | 21:24 |
clarkb | the changes behind it are still running jobs | 21:25 |
mwhahaha | why is that job getting reset | 21:25 |
clarkb | zuul will do that if the test node crashes (up to some limit of retries) | 21:25 |
*** ijw has quit IRC | 21:25 | |
clarkb | I'm trying to find it in the logs now | 21:25 |
clarkb | so it's resetting jobs on that change but not resetting the gate as a result, from what I see | 21:26 |
*** xingchao has quit IRC | 21:26 | |
mwhahaha | they are all resetting | 21:26 |
*** priteau_ has joined #openstack-infra | 21:26 | |
*** priteau has quit IRC | 21:27 | |
Shrews | i'm going to wager a guess that the infracloud long-locked nodes, and the tripleo problems mwhahaha is seeing are somehow related | 21:28 |
*** sree has quit IRC | 21:29 | |
clarkb | 2017-10-31 21:20:10,386 INFO zuul.Pipeline.openstack.gate: Resetting builds for change <Change 0x7fee5d659978 509521,2> because the item ahead, <QueueItem 0x7fee5d66fd30 for <Change 0x7fee5d66fba8 510900,2> in gate>, is not the nearest non-failing item, None | 21:29 |
clarkb | 2017-10-31 21:20:10,387 DEBUG zuul.Pipeline.openstack.gate: Cancel jobs for change <Change 0x7fee5d659978 509521,2> | 21:29 |
mwhahaha | so it's resetting because 510900,2 is stuck? | 21:30 |
clarkb | mwhahaha: well "is not the nearest non-failing item" I think means something other than 510900 failed | 21:30 |
mwhahaha | but there's nothing there :/ | 21:30 |
clarkb | and the only thing between those two changes that can fail other than 510900 is 509521 itself | 21:30 |
clarkb | I'm trying to see if I can find build logs for 509521 now | 21:31 |
mwhahaha | the pep8 logs were http://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A | 21:31 |
mwhahaha | but not sure where the other job logs were | 21:31 |
*** mrunge has quit IRC | 21:31 | |
clarkb | if it is the node crashing then we won't have copied logs becaues the instances went away | 21:32 |
clarkb | trying to dig through via the zuul logs | 21:32 |
*** rcernin has joined #openstack-infra | 21:32 | |
mwhahaha | clarkb: could it be the stomping on the authorized_keys that pabelanger mentioned earlier? | 21:34 |
mwhahaha | clarkb: where zuul thinks the node crashed but it wasn't | 21:34 |
mwhahaha | it's just that you can't connect anymore | 21:34 |
*** amoralej|off is now known as amoralej | 21:34 | |
mwhahaha | or is it the fact that it seems to be looping in http://zuulv3.openstack.org/static/stream.html?uuid=6d416d1154364d65982e64c940d2f6d0&logfile=console.log | 21:35 |
clarkb | ya if zuul can't ssh back in that could do it | 21:35 |
clarkb | but I think the thing pabelanger was talking about was root user specific not zuul user | 21:35 |
mwhahaha | so how is that a new thing | 21:35 |
*** threestrands has joined #openstack-infra | 21:36 | |
mwhahaha | is the zuul user different than the user the job uses? | 21:36 |
*** armax_ has joined #openstack-infra | 21:36 | |
*** armax has quit IRC | 21:38 | |
*** armax_ is now known as armax | 21:38 | |
clarkb | no the zuul user is the user the job framework uses | 21:38 |
*** mrunge has joined #openstack-infra | 21:39 | |
mwhahaha | ok maybe we're touching that, but we didn't change anything recently around that | 21:39 |
rm_work | is zuul on storyboard or launchpad? wondering where i could request a feature | 21:39 |
clarkb | rm_work: storyboard | 21:39 |
rm_work | kk | 21:39 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Persist iptables rules https://review.openstack.org/513943 | 21:39 |
rm_work | clarkb: actually, is zuul's webui *part of zuul* or part of something else | 21:40 |
dmsimard | clarkb: as discussed ^ I'll fix the integration tests | 21:40 |
*** edmondsw has quit IRC | 21:40 | |
*** bobh has quit IRC | 21:41 | |
openstackgerrit | Merged openstack-infra/project-config master: Better comment in logstash job submission role https://review.openstack.org/516786 | 21:41 |
clarkb | rm_work: it is part of zuul | 21:42 |
rm_work | k, thanks | 21:42 |
*** jascott1 has quit IRC | 21:43 | |
clarkb | mwhahaha: http://paste.openstack.org/show/625167/ that is a better snippet of logs but I think I am more confused now reading that | 21:44 |
mwhahaha | clarkb: :( i have no idea. i'm watching the console of the one running job to see if it is something we're doing. if it resets again i'm going to abandon 510900 to get it to go away | 21:46 |
*** jascott1 has joined #openstack-infra | 21:46 | |
*** dprince has quit IRC | 21:46 | |
clarkb | I think what is going on is zuul thinks that 510900 is still the parent of 509521 in the ordered future git state under test | 21:47 |
clarkb | where it should decouple the two queues and 510900 is in a queue of its own with one item in it (itself) and then 509521 forms the tip of a new queue | 21:48 |
mwhahaha | my thought as well is that it's confused about those two | 21:48 |
*** jcoufal has joined #openstack-infra | 21:48 | |
mwhahaha | legacy-tripleo-ci-centos-7-scenario003-multinode-oooq-container seems to be queued on 510900 | 21:49 |
mwhahaha | so i wonder if they keep resetting each other | 21:49 |
mwhahaha | so let me abandon that patch to clear it | 21:49 |
*** boden has quit IRC | 21:50 | |
*** jcoufal_ has quit IRC | 21:50 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence https://review.openstack.org/513934 | 21:51 |
mwhahaha | clarkb: are there visible metrics for number of times a job (or jobs) gets reset somewhere? | 21:51 |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence https://review.openstack.org/513934 | 21:51 |
dmsimard | ianw: addressed your comment in https://review.openstack.org/#/c/513943/ because splitting the role out made something else easier | 21:55 |
clarkb | mwhahaha: I'm not sure if zuul emits that to graphite, I think it may under the status NONE category | 21:55 |
*** bnemec has joined #openstack-infra | 21:55 | |
mwhahaha | clarkb: cause i want to say it's been happening a lot based on queues but it's hard to tell :/ | 21:56 |
clarkb | http://paste.openstack.org/show/625169/ | 21:57 |
clarkb | mwhahaha: that's how often it's happened, roughly | 21:58 |
dmsimard | clarkb: wow | 21:58 |
mwhahaha | yea that coincides with all the pep8 logs from that one change | 21:58 |
dmsimard | clarkb: each time that's happened, all the jobs essentially restart with the failed job out of the queue, right ? | 21:58 |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: Drop TripleO jobs from Puppet modules https://review.openstack.org/516794 | 21:59 |
clarkb | dmsimard: they should, but not sure if that is happening; 510900 and 509521 seem to be coupled more tightly than I would expect | 21:59 |
mnaser | mwhahaha: ^ | 21:59 |
mwhahaha | clarkb: the coupling was just within zuul cause one is stable/newton and the other is master :D | 22:00 |
*** ijw has joined #openstack-infra | 22:00 | |
mwhahaha | so it's artificially together just in queue | 22:00 |
* mwhahaha shrugs | 22:01 | |
clarkb | well it's not artificial, that is by design so that upgrade jobs do the right thing | 22:01 |
clarkb | you can't test upgrades without that | 22:01 |
mwhahaha | we've pulled our upgrades out of infra | 22:01 |
clarkb | but reading that latest paste it looks like we have 510900 <- some change <- 509521 | 22:01 |
mwhahaha | so in this case, we probably don't want it | 22:01 |
clarkb | and that some change changes constantly | 22:01 |
clarkb | then it transitions to 510900 <- 509521 | 22:02 |
*** jcoufal has quit IRC | 22:02 | |
clarkb | but the whole time 510900 is there | 22:02 |
clarkb | which seems odd to me | 22:02 |
clarkb | unless 510900 was causing every change after it to break? | 22:02 |
mwhahaha | probably | 22:02 |
mwhahaha | it was one of the ones where the jobs had errored from the executor stuff | 22:03 |
mwhahaha | so it seemed to have gotten in a really bad state | 22:03 |
*** amoralej is now known as amoralej|off | 22:05 | |
clarkb | we may need jeblair to dig in when he gets back | 22:06 |
clarkb | I'm quickly getting beyond my understanding here | 22:06 |
mwhahaha | there be dragons | 22:06 |
mwhahaha | seems that we have some other changes that might also be suffering from the same problems themselves | 22:07 |
mwhahaha | like 516630,1 | 22:07 |
*** marst has quit IRC | 22:07 | |
mwhahaha | that's got a bunch of stuff that was errored but may also be requeuing | 22:07 |
mwhahaha | 516651,1 | 22:07 |
mwhahaha | 516047,2 499239,27 511350,25 511350,25 | 22:08 |
mwhahaha | i'm going to abandon/restore the tripleo ones | 22:09 |
mwhahaha | but there's a rally and nova one as well | 22:09 |
mwhahaha | that may continue | 22:09 |
clarkb | mwhahaha: which rally/nova one? | 22:10 |
mwhahaha | 516047,2 | 22:10 |
mwhahaha | rally | 22:10 |
clarkb | the only nova ones in the gate are green | 22:10 |
mwhahaha | 499239,27 | 22:10 |
mwhahaha | not gate | 22:11 |
mwhahaha | in check | 22:11 |
clarkb | ah ok | 22:11 |
mwhahaha | doing same stuff | 22:11 |
mwhahaha | or at least they look like they might be since they are still around | 22:11 |
mwhahaha | and have things queued | 22:11 |
clarkb | in check the fallout should be limited to that change, but ya probably want to evict them so they don't hang out there until next restart | 22:11 |
*** jascott1_ has joined #openstack-infra | 22:12 | |
mwhahaha | well they might also be requeuing constantly, taking up resources | 22:12 |
mwhahaha | i cleared the 3 tripleo ones | 22:12 |
mwhahaha | but i can't help with those two | 22:12 |
*** slaweq has joined #openstack-infra | 22:12 | |
*** bobh has joined #openstack-infra | 22:13 | |
clarkb | ya not sure looks like they each have a single job queued (and no gate resets) | 22:13 |
*** ijw has quit IRC | 22:13 | |
mwhahaha | just something to keep an eye on | 22:14 |
*** slaweq has quit IRC | 22:15 | |
*** jascott1 has quit IRC | 22:15 | |
*** baoli has quit IRC | 22:18 | |
*** tpsilva has quit IRC | 22:18 | |
*** e0ne has quit IRC | 22:18 | |
clarkb | we'll want to keep an eye on it to see if "normal" gate failures create the same problem | 22:18 |
clarkb | what should happen is after the first failure things get decoupled from each other and move on on their own | 22:19 |
*** ijw has joined #openstack-infra | 22:19 | |
*** rloo has left #openstack-infra | 22:19 | |
clarkb | but if a second failure causes it to reset again that would be a bug | 22:19 |
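To make the expected behaviour clarkb describes concrete, here is a hedged sketch (plain Python, not Zuul's implementation) of a nearest-non-failing-item pass over a dependent queue: when the head fails, the item behind it is reparented onto the nearest non-failing item ahead and reset once, and a second pass over the same queue should not reset it again. The change numbers come from the discussion above; the class and function names are made up for the example.

```python
# Illustrative NNFI pass over a dependent pipeline queue; not Zuul's actual code.
class Item:
    def __init__(self, change):
        self.change = change
        self.item_ahead = None   # item whose future git state we build on
        self.failing = False
        self.resets = 0

def nnfi_pass(queue):
    """Reparent each item onto the nearest non-failing item ahead of it."""
    nearest_ok = None
    for item in queue:                       # queue is ordered head -> tail
        if item.item_ahead is not nearest_ok:
            item.item_ahead = nearest_ok     # decouple from the failing head
            item.resets += 1                 # jobs restart against new git state
        if not item.failing:
            nearest_ok = item

# Example mirroring the discussion: 510900 fails at the head, 509521 behind it.
head, behind = Item("510900,2"), Item("509521,2")
behind.item_ahead = head
head.failing = True
queue = [head, behind]
nnfi_pass(queue)      # first pass: 509521 is reset once and decoupled
nnfi_pass(queue)      # a correct second pass leaves it alone
print(behind.resets)  # -> 1
```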
*** bobh has quit IRC | 22:20 | |
*** ijw has quit IRC | 22:22 | |
*** ijw has joined #openstack-infra | 22:22 | |
ianw | dmsimard: dropped a bridge/iptables comment on 516757 ... is testing sufficient for that? | 22:24 |
clarkb | ianw: in this case we use ovs without linux bridges, does that change your concern? | 22:24 |
clarkb | ianw: we also have syslogs from kolla jobs showing iptables dropped packets with source and destination in that 172.24.4.0/23 range | 22:26 |
ianw | clarkb: ok, just as long as we've tested it, i guess ovs is probably different | 22:27 |
ianw | whenever i see "bridge" and "firewall" it makes me think of this | 22:27 |
clarkb | ianw: ya inc0 mentions it seemed to address the problem in his depends-on change | 22:27 |
inc0 | clarkb: fwiw it helped | 22:28 |
*** aeng has joined #openstack-infra | 22:30 | |
*** ijw has quit IRC | 22:33 | |
*** slaweq has joined #openstack-infra | 22:33 | |
clarkb | mwhahaha: I've added this problem to the issues with zuul list at https://etherpad.openstack.org/p/zuulv3-issues | 22:33 |
*** lbragstad has quit IRC | 22:36 | |
*** edmondsw has joined #openstack-infra | 22:38 | |
*** edmondsw has quit IRC | 22:43 | |
*** wolverineav has quit IRC | 22:44 | |
jlvillal | What does it mean when there is a Zuul error: MERGER_FAILURE on some of the jobs, but not all? | 22:45 |
*** rbrndt has quit IRC | 22:45 | |
jlvillal | Seen on this patch: https://review.openstack.org/#/c/513152/3 | 22:45 |
*** priteau_ has quit IRC | 22:47 | |
jeblair | back | 22:47 |
clarkb | jlvillal: earlier today an executor ran out of disk due to leaked build dirs. This caused the merger failure messages | 22:47 |
clarkb | jeblair: can you see the conversation with mwhahaha above? (I also tried to tl;dr it on the zuul issues etherpad) | 22:47 |
jlvillal | clarkb: Ah, okay. recheck it is :) | 22:47 |
jeblair | clarkb, mwhahaha: ack | 22:48 |
jeblair | clarkb: "Changes in the gate did not appear to decouple from each other when the one ahead failed. Specifically 510900 and 509521 on October 31." ? | 22:49 |
*** rlandy has quit IRC | 22:49 | |
mwhahaha | jeblair: yea that one | 22:49 |
*** mriedem has quit IRC | 22:50 | |
clarkb | jeblair: yes | 22:51 |
ianw | clarkb: do you have experience re-initing a bup repo -> http://paste.openstack.org/show/625174/ ? | 22:51 |
clarkb | jeblair: it looked like 510900 kept resetting 509521, the paste in the etherpad tries to capture that | 22:51 |
clarkb | ianw: I think the .bup should be in /root ? re error: '/opt/backups/bup-ask/.bup/' is not a bup repository; run "bup init" | 22:52 |
clarkb | oh its talking to the remote end and not finding a repo there | 22:52 |
clarkb | ianw: I think you may have to bootstrap local and remote? | 22:52 |
clarkb | system-config docs hopefully have more info? | 22:52 |
*** ijw has joined #openstack-infra | 22:53 | |
ianw | clarkb: yeah, i'm not sure ... /opt/backups/bup-ask/.bup is the remote side, which you'd think "bup init -r ..." would create for you, anyway, i'll keep poking | 22:53 |
clarkb | ianw: does /opt/backups/bup-ask exist? | 22:54 |
clarkb | it may create the .bup for you but not if it can't login? | 22:55 |
ianw | clarkb: yep, that was cloned from the old server. i just removed the .bup directory | 22:55 |
clarkb | gotcha | 22:55 |
ianw | it may be the system-config instructions are missing a bup init on the remote side | 22:55 |
*** hongbin has quit IRC | 22:55 | |
clarkb | ianw: the bup init on the local side to be backed up was a relatively new thing too; it's possible the bootstrap-on-the-remote-side steps are new as well? | 22:55 |
clarkb | ianw: ya thinking that could be possible | 22:56 |
jeblair | clarkb: so if i can try to summarize -- down at the end of your paste (where it says the NNFI is None), we're looking at 509521 behind 510900 which is at the head. you might expect 510900 to fail and reset 509521 once, but it appears that it somehow failed multiple times and therefore reset 509521 multiple times. | 22:58 |
jeblair | clarkb: does that sound right? (also, are we sure that 510900 was the head at that time?) | 22:58 |
*** slaweq has quit IRC | 22:59 | |
clarkb | jeblair: correct on my theory | 23:00 |
jeblair | okay. i'll try to dig into that. it will take a while. | 23:00 |
clarkb | jeblair: http://paste.openstack.org/show/625167/ may also be helpful | 23:00 |
clarkb | jeblair: that's a bit more logging around a single reset occurrence | 23:00 |
ianw | i think we should exclude /var/lib/postgresql from backups ... we just want to back up the database dump. it changes under the live backup | 23:00 |
clarkb | ianw: +1 | 23:01 |
jeblair | clarkb: ah thx | 23:01 |
jeblair | i'm going to try to get 2 of those and see what happened between them | 23:01 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [DNM] remove ci-backup-rs-ord.openstack.org https://review.openstack.org/516159 | 23:01 |
ianw | clarkb: ^ i think remote needs an init, updated instructions | 23:02 |
clarkb | ianw: do you need to flip the order around? bup init on server then on client? | 23:03 |
clarkb | docs say do server to be backed up first | 23:03 |
clarkb | but I think that is why you had the error | 23:04 |
*** xarses has quit IRC | 23:04 | |
ianw | clarkb: i think "bup init" on the client will create /root/.bup, but it's not till it runs with "-r user@backupserver:" that it tries to look at the remote .bup dir | 23:05 |
clarkb | aha | 23:06 |
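For reference, a hedged sketch of the bootstrap order being worked out here, written as a small Python wrapper around the bup CLI. The host name, backup user, paths, and backup name are made-up placeholders, and the exact sequence in the system-config docs may differ; the point is only that the server-side repository needs to exist before the client's `bup save -r` can push into it.

```python
# Sketch of the client/server bup bootstrap discussed above; hosts, paths and
# the backup name are illustrative placeholders, not the real infra values.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

# Step 1 (on the backup server, as the per-host backup user): create the
# repository that the client's "bup save -r" will push into, e.g.:
#   BUP_DIR=/opt/backups/bup-example/.bup bup init

# Step 2 (on the client being backed up, as root):
sh("bup", "init")                                   # creates /root/.bup locally
sh("bup", "index", "/etc", "/var/backups")          # index the paths to save
sh("bup", "save", "-r", "backup-user@backup.example.org:",
   "-n", "example-host", "/etc", "/var/backups")    # push the named backup set
```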
openstackgerrit | Ian Wienand proposed openstack-infra/puppet-bup master: Ignore postgres working directory https://review.openstack.org/516798 | 23:08 |
*** gyee has quit IRC | 23:10 | |
ianw | infra-root: ^ if ok, could i get two eyes on this, i'd like to start new backups without this | 23:11 |
jeblair | clarkb, mwhahaha: i think there's a bug with reconfiguration; i think we erroneously put 509521 behind 510900 again after reconfiguration, then the next pass through the queue processor threw it out again. | 23:16 |
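A hedged sketch of the failure mode jeblair describes (again plain Python, not Zuul's code): if a reconfiguration rebuilds the queue and naively re-parents every item onto the item in front of it, including a failed head, then the next nearest-non-failing-item pass sees the wrong parent and resets the item again, so each reconfiguration costs another reset.

```python
# Self-contained illustration of the suspected reconfiguration bug.
class Item:
    def __init__(self, change, failing=False):
        self.change, self.failing = change, failing
        self.item_ahead, self.resets = None, 0

def reconfigure(queue):
    """Buggy rebuild: chain every item behind the one in front of it,
    even when that item has already failed."""
    for ahead, item in zip([None] + queue[:-1], queue):
        item.item_ahead = ahead

def nnfi_pass(queue):
    """Reparent items onto the nearest non-failing item ahead, resetting movers."""
    nearest_ok = None
    for item in queue:
        if item.item_ahead is not nearest_ok:
            item.item_ahead, item.resets = nearest_ok, item.resets + 1
        if not item.failing:
            nearest_ok = item

head, behind = Item("510900,2", failing=True), Item("509521,2")
queue = [head, behind]
for _ in range(3):        # e.g. one reconfiguration roughly every five minutes
    reconfigure(queue)    # puts 509521 back behind the failed head...
    nnfi_pass(queue)      # ...and the next pass throws it out again: a reset
print(behind.resets)      # -> 3, one reset per reconfiguration
```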
*** LindaWang has quit IRC | 23:17 | |
*** yamahata has quit IRC | 23:21 | |
*** yamahata has joined #openstack-infra | 23:21 | |
*** thorst has joined #openstack-infra | 23:24 | |
*** gildub has joined #openstack-infra | 23:25 | |
*** gmann_afk is now known as gmann | 23:27 | |
*** slaweq has joined #openstack-infra | 23:29 | |
clarkb | jeblair: ok, so my theory wasn't too far off | 23:30 |
* clarkb goes to pack | 23:31 | |
*** thorst has quit IRC | 23:31 | |
*** daidv has quit IRC | 23:35 | |
*** daidv has joined #openstack-infra | 23:35 | |
*** aviau has quit IRC | 23:38 | |
*** aviau has joined #openstack-infra | 23:38 | |
*** nicolasbock has quit IRC | 23:39 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: failing test for reconfiguration at failed head https://review.openstack.org/516799 | 23:41 |
jeblair | there's a failing test case that reproduces this. this may be a longstanding zuulv2 bug, we just didn't notice it because we didn't reconfigure every 5 minutes. | 23:42 |
*** jascott1_ has quit IRC | 23:46 | |
*** jascott1 has joined #openstack-infra | 23:47 | |
*** baoli has joined #openstack-infra | 23:50 | |
*** markvoelker has quit IRC | 23:51 | |
*** adreznec has quit IRC | 23:52 | |
*** sdague has quit IRC | 23:56 |