Tuesday, 2017-10-31

clarkbinfra-root can I get https://review.openstack.org/#/c/516473/ reviewed and merged so that I don't have to disable puppet on logstash.o.o?00:00
clarkbit just restarted the daemon with the broken config (I'm going to restart with manually fixed config now)00:00
*** andreww has joined #openstack-infra00:00
*** ijw has quit IRC00:01
fungiit's reviewed and approved00:01
*** ijw has joined #openstack-infra00:01
*** bobh has quit IRC00:02
clarkbtyty00:02
*** gouthamr has quit IRC00:05
* clarkb pops out to make dinner00:06
fungisubunit-worker02 seems to have gear 0.11.0 installed now, so i'm going to restart the worker on it00:06
fungimaybe it'll catch back up00:07
*** armaan__ has quit IRC00:10
*** markvoelker_ has quit IRC00:11
*** dingyichen has joined #openstack-infra00:11
*** ijw has quit IRC00:14
*** ijw has joined #openstack-infra00:14
fungii need to knock off for the evening. i'll be semi-around tomorrow for meetings but need to take some time to finish prepping for a very long flight on wednesday-friday00:16
clarkbgood night00:17
fungithanks, you too00:17
clarkband ya I'll be around but also packing/prepping00:17
*** ijw has quit IRC00:19
openstackgerritMerged openstack-infra/system-config master: Remove zl's from jenkins-logstash-client config  https://review.openstack.org/51647300:21
clarkb00:24
clarkbwhoops00:24
*** markvoelker has joined #openstack-infra00:25
*** LindaWang has quit IRC00:30
pabelangerclarkb: ah, we should have landed https://review.openstack.org/515181/ :)00:31
pabelangerI can rebase quickly00:32
clarkbah sorry00:32
*** bobh has joined #openstack-infra00:33
*** thorst has quit IRC00:34
*** ijw has joined #openstack-infra00:35
*** owalsh_pto has quit IRC00:36
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Remove zuul-launcher support  https://review.openstack.org/51518100:37
pabelangerianw: clarkb: okay, should be rebased and fully removes zuul-launchers00:37
*** hongbin has joined #openstack-infra00:39
pabelangerugh, another failure on logstash-worker00:40
pabelangerI'll have to pick it up in the morning00:40
*** bobh has quit IRC00:42
*** xingchao has joined #openstack-infra00:45
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:48
*** owalsh_ has joined #openstack-infra00:49
*** owalsh has joined #openstack-infra00:51
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:51
*** mat128 has joined #openstack-infra00:53
*** owalsh- has joined #openstack-infra00:53
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:54
*** zhurong has joined #openstack-infra00:54
*** owalsh_ has quit IRC00:54
*** owalsh_ has joined #openstack-infra00:55
*** owalsh has quit IRC00:56
*** owalsh has joined #openstack-infra00:58
*** cuongnv has joined #openstack-infra00:59
*** owalsh- has quit IRC00:59
*** namnh has joined #openstack-infra01:00
*** owalsh- has joined #openstack-infra01:01
*** LindaWang has joined #openstack-infra01:01
*** kiennt26 has joined #openstack-infra01:01
*** owalsh_ has quit IRC01:01
*** ijw has quit IRC01:03
*** ijw has joined #openstack-infra01:03
openstackgerritMerged openstack-infra/system-config master: Switch to cgit from gitweb  https://review.openstack.org/51197001:03
*** ijw has joined #openstack-infra01:04
*** yamahata has quit IRC01:04
*** owalsh_ has joined #openstack-infra01:04
*** owalsh has quit IRC01:04
*** baoli has joined #openstack-infra01:05
openstackgerritMerged openstack-infra/system-config master: Remove unneeded encoding change.  https://review.openstack.org/51558001:06
*** rhallisey has quit IRC01:07
*** owalsh has joined #openstack-infra01:07
*** xingchao has quit IRC01:08
*** owalsh- has quit IRC01:08
*** dhinesh has joined #openstack-infra01:08
*** ijw has quit IRC01:08
*** owalsh- has joined #openstack-infra01:10
*** owalsh_ has quit IRC01:11
*** smatzek has joined #openstack-infra01:14
openstackgerritMerged openstack-infra/system-config master: Add stretch mirror for ceph  https://review.openstack.org/51359101:14
*** owalsh has quit IRC01:14
yamamotowhy does logstash.o.o show files which are not in jenkins-log-client.yaml? (like logs/screen-gnocchi-metricd.txt.gz)01:17
*** bobh has joined #openstack-infra01:17
yamamotois there another list?01:17
clarkbyamamoto: with the switch to zuulv3 we switched to regex-based listing of what is on disk for the job01:18
clarkbso anything matching the regex is now pushed, you can see that in project-config/roles iirc01:18
*** Apoorva_ has joined #openstack-infra01:19
*** rlandy has quit IRC01:20
yamamotoclarkb: so jenkins-log-client.yaml is no longer relevant?01:21
*** psachin has joined #openstack-infra01:21
clarkbfor the most part that is correct01:21
clarkbwe still use it to run the gearman server01:21
*** psachin has quit IRC01:22
*** Apoorva has quit IRC01:22
*** psachin has joined #openstack-infra01:22
*** Apoorva_ has quit IRC01:23
yamamotoclarkb: i got it. thank you01:24
*** larainema has joined #openstack-infra01:24
*** bobh has quit IRC01:26
yamamotois it expected that logstash.o.o shows both job-output.txt and job-output.txt.gz?01:31
*** smatzek has quit IRC01:32
*** LindaWang has quit IRC01:34
clarkbit should just be one per job iirc01:35
clarkbif not then that is a bug01:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Create build-openstack-puppet-tarball  https://review.openstack.org/51598001:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Remove publish-openstack-puppet-branch-tarball from post pipeline  https://review.openstack.org/51598201:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Move publish-openstack-puppet-branch-tarball into ozj  https://review.openstack.org/51598101:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Revert "Remove publish-openstack-puppet-branch-tarball from post pipeline"  https://review.openstack.org/51598301:35
*** LindaWang has joined #openstack-infra01:37
pabelangerAJaeger: fungi: clarkb: EmilienM: mnaser: ianw: ^when you have time, that should be the last steps to getting puppet modules 'release / build' jobs as native zuulv3 jobs01:39
EmilienMnice01:39
EmilienMpabelanger: why did it work on ocata/pike/master?01:40
EmilienMand not on newton01:40
yamamotoclarkb: message:"At completion, logs for this job will be available at" for 7d seems to have both of them for many of the results01:41
*** mriedem has quit IRC01:42
clarkbyamamoto: does it have both for the same job?01:42
clarkbI think as long as it's unique per job we are ok, but if not it needs investigating01:42
pabelangerEmilienM: you'll have to ask mnaser that, maybe jobs didn't get backported properly? The correct path forward is to use the new build-openstack-puppet-tarball job, as it simplifies things greatly01:42
*** Sukhdev has quit IRC01:44
yamamotoclarkb: both with the same build_uuid01:44
*** annp has joined #openstack-infra01:44
clarkbok will have to investigate then01:45
*** edmondsw has quit IRC01:45
clarkbcan you share an example uuid/query?01:45
yamamotoclarkb: the query was message:"At completion, logs for this job will be available at"01:46
yamamotoclarkb: for 7d period01:46
yamamotoclarkb: build_uuid is eg. 2c30560da8f04966af83c1d951dd860301:46
clarkbthanks will look in the morning01:47
*** dhinesh has quit IRC01:49
mnaserEmilienM: is it possible that stable/newton didn't have puppet in bindep, which is why it didn't work?01:53
EmilienMmnaser: no, see https://review.openstack.org/#/c/515132/ which is on top of the bindep patch01:53
EmilienMmnaser: and still failing after recheck01:54
*** gmann_afk is now known as gmann01:58
*** camunoz has quit IRC01:58
pabelangermnaser: revoke-sudo is called, before sudo install01:58
*** aeng has quit IRC02:00
*** apetrich has quit IRC02:03
*** apetrich has joined #openstack-infra02:04
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Remove publish-openstack-puppet-branch-tarball  https://review.openstack.org/51598402:06
mnaserEmilienM: i wonder why it's failing for that02:07
mnaserEmilienM: oooooh02:09
mnaserone moment02:09
mnaserEmilienM: https://review.openstack.org/#/q/Id68ee1b443a4172d0c1d6d58a04908c52a566623 you can blame mwhahaha  for this one :D02:10
mnaseroh, merge conflict02:10
EmilienMI always do02:10
EmilienMI can do the git thing02:11
mnaserEmilienM: do you mind cherry picking that locally into stable/newton please02:11
mnaserthat will fix it for you02:11
*** threestrands has joined #openstack-infra02:11
*** threestrands has quit IRC02:11
*** threestrands has joined #openstack-infra02:11
mnaseryou can do Depends-On as well to get your puppet-tripleo job to be green02:11
* mnaser goes back to winter tire shopping :(02:11
EmilienMmnaser: ok02:11
EmilienMmnaser: I need to buy that also02:11
mnasernovember 15 is coming up :P02:12
EmilienMmnaser: I don't live in QUebec :P02:12
EmilienMI don't even know how it works here in BC lol02:12
EmilienMmnaser: where do you go?02:14
mnaserEmilienM: usually to Costco but they don’t carry tires in the size of my new car02:14
EmilienMok02:14
mnaserI found a place in Ottawa that has nice packages with both tires + rims so I can swap them myself02:14
EmilienMI'll let professionals do it :-D02:15
*** aeng has joined #openstack-infra02:17
mgagneisn't December 15 in Quebec?02:18
*** rkukura has quit IRC02:19
pabelangerI kinda wish I lived in northern ontario, you can put studs on winter tires02:20
mgagneyea, just read about it, you can have studs since October 1st but no laws regarding winter tires02:20
mnasermgagne: it was december 15th when they first started enforcing it, but the actual date was november 15th02:21
mnaserwait02:21
mnaserit is december 1502:21
mnaseri thought it was november 1502:21
mgagneyes, can't find anything about november02:21
*** rwsu has joined #openstack-infra02:22
mgagneso you have time, maybe someone suggested November in the news?02:22
clarkbEmilienM: are you in vancouver or victoria?02:22
mnaserlet me keep telling myself november 15 so i can get it done earlier :P02:22
* mgagne said nothing02:22
clarkbif so you probably dont need tires unless driving up.to eg whistler02:22
mnasermgagne: but also, i have summer performance tires which means the car is useless with the slightest of snow02:22
clarkbpnw is typically realtively warm and wet02:22
*** aeng has quit IRC02:22
mgagneI read you are required by law to get winter tires in some BC areas02:22
mnaserso dont wanna take any chances02:22
mgagnehehe02:23
mgagneand now to think about storing that motorcycle ^^'02:23
*** catintheroof has joined #openstack-infra02:24
mgagnetime to go home and get some rest for more Nova Mitaka upgrade tomorrow :D02:25
mnaseroh boy02:25
mnaserbonne chance02:25
mgagnethanks =)02:25
mnaserodds of a change whose console says "--- END OF STREAM ---" actually doing work?02:26
mnaser515937 legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet .. the reset would be massive :(02:26
clarkbit typically is iirc02:26
clarkbthere is a bug where we don't always get a stream even though the job is running; it hasn't been sorted out yet02:26
mnaserlets hope thats the case02:27
*** bobh has joined #openstack-infra02:30
pabelangerhmm02:33
pabelangercould be 79 isn't listening again02:33
pabelangerlet me check quickly02:33
pabelangerya02:34
pabelangerfinger test@ze02.openstack.org02:34
pabelangeris down02:34
mnaseryep, it went through!02:34
*** thorst has joined #openstack-infra02:35
*** aeng has joined #openstack-infra02:35
pabelangernetstat -na | grep \:7902:35
pabelangerreturns nothing on ze0202:35
ianwpabelanger: is it a full executor restart to get that back?02:35
pabelangeryah02:36
pabelangerI think it is because of high load on the system02:36
pabelangerthen we somehow lose the socket02:36
pabelangerianw: actually02:36
pabelangerhttps://review.openstack.org/516403/02:36
pabelangerwe should land that, then do restart all our executors02:37
pabelangerthat will fix the issue you found yesterday02:37
ianwpabelanger: yeah, i need to write a playbook for that, it was a bit of an emergency situation last night02:37
ianwwe should really stop the scheduler, restart executors, then restart scheduler right?02:38
pabelangerhttps://review.openstack.org/510155/02:38
pabelangerI need to rebase that02:38
pabelangerand address comments, but should give us a playbook02:39
*** salv-orl_ has joined #openstack-infra02:39
*** thorst has quit IRC02:39
pabelangerianw: no, we should be okay to keep scheduler running02:39
pabelangerjust stop executors02:39
pabelangerthen start02:39
*** rkukura has joined #openstack-infra02:40
ianwwhat about the running jobs though?  they just die?02:40
pabelangerscheduler will see job aborted and requeue it02:40
pabelangerso, users shouldn't need to do anything, just that their job will restart a few times until the restarts are finished02:41
*** salv-orlando has quit IRC02:42
ianwahh, ok.  i guess last night, the executors were in their really odd state, which messed things up02:42
*** catintheroof has quit IRC02:43
ianwlet's merge the typo fix, i'll see about playbook02:44
*** dhinesh has joined #openstack-infra02:45
*** reed_ has joined #openstack-infra02:52
*** gildub has joined #openstack-infra02:54
*** reed_ has quit IRC02:54
clarkbyamamoto: I think the issue is that we use the archive ansible module to gzip job-output.txt and it does not remove the original file by default02:55
clarkbyamamoto: so when we look at the filesystem to create jobs we see both job-output.txt and job-output.txt.gz02:56
*** dave-mccowan has quit IRC02:56
clarkbwe actually just want job-output.txt I think02:56
*** hongbin has quit IRC02:56
*** hongbin_ has joined #openstack-infra02:56
yamamotoclarkb: so adding $ to regex would solve the issue?02:57
*** cshastri has joined #openstack-infra02:57
clarkbyamamoto: I think so at least for this specific instance02:58
clarkb(which may be sufficient)02:58
clarkbthis could also partly explain why we are behind in indexing02:59
clarkbwe are indexing twice as much data as we should02:59
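A minimal sketch of the two fixes being weighed here, assuming the Ansible archive module and a regex-driven file list as described above (the task is illustrative and log_root is a placeholder, not the real variable):

    # Hypothetical post-run task: if the archive module dropped the original,
    # only job-output.txt.gz would remain and nothing would be indexed twice.
    - name: Compress the console log
      archive:
        path: "{{ log_root }}/job-output.txt"
        remove: true        # delete the uncompressed copy after archiving
    # The alternative taken in 516502 works on the indexing side instead:
    # anchor the filename pattern (e.g. job-output\.txt$) so the .txt and
    # .txt.gz names are treated as the same log and submitted only once.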
*** hongbin_ has quit IRC02:59
*** hongbin has joined #openstack-infra03:00
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix syntax with gear unRegisterFunction()  https://review.openstack.org/51640303:03
*** bobh has quit IRC03:08
*** rosmaita has quit IRC03:11
*** bobh has joined #openstack-infra03:14
*** Sukhdev has joined #openstack-infra03:15
*** cody-somerville has joined #openstack-infra03:17
*** lathiat_ has joined #openstack-infra03:18
*** ramishra has joined #openstack-infra03:19
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015503:19
*** lathiat has quit IRC03:19
*** hongbin has quit IRC03:22
*** esberglu_ has quit IRC03:22
*** esberglu has joined #openstack-infra03:22
ianwok i've restarted ze01 using ^ to pickup ^^03:23
ianwi will monitor for a bit before doing others03:23
*** edmondsw has joined #openstack-infra03:24
*** liujiong has joined #openstack-infra03:26
*** edmondsw has quit IRC03:29
*** vkmc has quit IRC03:29
jeblairianw, pabelanger: let's give it longer to stop.  like 15-20m?03:33
jeblair(in the playbook)03:33
*** gouthamr has joined #openstack-infra03:34
*** vkmc has joined #openstack-infra03:35
jeblairpabelanger, ianw, Shrews: i think we have some more error logging now which may have information on why finger daemon died on ze02, we should look for that.03:35
*** armax has quit IRC03:35
pabelangerjeblair: ianw: yah, I don't think it worked well on ze01. We have 3 zuul-executor processes and 1 defunct process now03:35
*** armax has joined #openstack-infra03:36
ianwyep, just poking and noticed that03:36
pabelangerwhich usually means we started an executor while another one was shutting down03:36
ianwit did correctly find that the pid had disappeared though?  it did not timeout03:36
pabelangerI have to run now, sould be able to stop both sockets again03:36
*** bobh has quit IRC03:37
openstackgerritClark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650203:37
ianwjeblair: yeah ... but at least checking that pid didn't even hit that timeout (http://paste.openstack.org/show/625041/)03:37
clarkbyamamoto: jeblair dmsimard ^ that is totally untested but I think we may want to do something like that to solve both the query logical name problem and the double indexing of job-output.txt/job-output.txt.gz03:37
clarkbjeblair: ^ btw I think yamamoto discovered the cause of the increase in index volume we are indexing console logs twice03:38
ianwi'm going to stop it manually and see what disappears03:38
jeblairyamamoto: thanks! :)03:38
openstackgerritKien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui gate jobs  https://review.openstack.org/51650303:38
*** armaan has joined #openstack-infra03:39
openstackgerritKien Nguyen proposed openstack-infra/openstack-zuul-jobs master: Remove Zun-ui legacy gate jobs  https://review.openstack.org/51650403:39
ianwok the init.d stop has returned, the process from the pid file is still there03:41
openstackgerritKien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui legacy gate jobs  https://review.openstack.org/51650303:42
jeblairreplacement process was probably unable to create a socket file03:42
ianwzuul     22102 11596  0 Oct30 ?        00:00:00 [git] <defunct>03:44
ianwzuul     22215     1  0 Oct30 ?        00:00:00 ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 zuul@review.openstack.org git-upload-pack '/openstack/tripleo-heat-templates'03:44
ianwthat ssh parented to init ...03:44
ianwok, 5301 disappeared after 03:47:10 - 03:39:4503:47
ianwso 10 minutes minimum i guess03:48
ianwanw@ze01:~$ ps -aef | grep [z]uul-e03:48
ianwzuul     11596     1  8 Oct27 ?        06:42:58 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwzuul     11599 11596  0 Oct27 ?        00:00:41 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwzuul     21113 11599  0 Oct29 ?        00:00:00 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwi'll manually clean up these03:48
*** dhinesh_ has joined #openstack-infra03:53
*** dhinesh has quit IRC03:53
*** markvoelker has quit IRC03:55
*** ijw has joined #openstack-infra04:04
*** ykarel has joined #openstack-infra04:06
*** udesale has joined #openstack-infra04:06
*** ijw has quit IRC04:09
*** xingchao has joined #openstack-infra04:13
*** armax_ has joined #openstack-infra04:21
*** rkukura_ has joined #openstack-infra04:22
*** armax has quit IRC04:22
*** notmyname has quit IRC04:22
*** armax_ is now known as armax04:22
*** dhinesh_ has quit IRC04:23
*** jpena|off has quit IRC04:23
*** dhinesh has joined #openstack-infra04:23
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015504:24
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts  https://review.openstack.org/51651004:24
*** vhosakot_ has joined #openstack-infra04:24
*** notmyname has joined #openstack-infra04:24
*** vhosakot_ has quit IRC04:24
*** sc` has quit IRC04:24
*** sc` has joined #openstack-infra04:25
*** jpena|off has joined #openstack-infra04:25
*** nhicher has quit IRC04:25
*** nhicher has joined #openstack-infra04:25
*** rkukura has quit IRC04:25
*** rkukura_ is now known as rkukura04:25
*** vhosakot has quit IRC04:27
*** cshastri has quit IRC04:31
*** vsaienk0 has joined #openstack-infra04:31
*** thorst has joined #openstack-infra04:34
*** gouthamr has quit IRC04:34
*** vhosakot has joined #openstack-infra04:34
*** thorst has quit IRC04:39
*** vsaienk0 has quit IRC04:41
*** xingchao has quit IRC04:49
*** armax has quit IRC04:50
*** zhurong has quit IRC04:55
*** Sukhdev has quit IRC04:55
*** markvoelker has joined #openstack-infra04:56
*** dhinesh has quit IRC05:06
*** janki has joined #openstack-infra05:06
yamamotocan in-repo .zuul.yaml have periodic jobs?05:06
*** liusheng has quit IRC05:07
*** liusheng has joined #openstack-infra05:07
*** edmondsw has joined #openstack-infra05:13
*** edmondsw has quit IRC05:17
*** sree has joined #openstack-infra05:19
*** gildub has quit IRC05:25
*** janki has quit IRC05:27
*** janki has joined #openstack-infra05:27
*** markvoelker has quit IRC05:30
*** mat128 has quit IRC05:30
*** yamahata has joined #openstack-infra05:35
*** xingchao has joined #openstack-infra05:36
*** gildub has joined #openstack-infra05:37
*** kiennt26 has quit IRC05:42
*** gildub has quit IRC05:46
*** zhurong has joined #openstack-infra05:46
*** threestrands has quit IRC05:46
*** cshastri has joined #openstack-infra05:58
ianwpabelanger: ahhh! "path": "/proc/11206\n/status" ... the '\n' is why it doesn't wait properly05:58
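A minimal sketch of the kind of wait task affected, assuming the playbook builds a /proc path from a pidfile; without stripping the trailing newline the path becomes "/proc/11206\n/status" and the existence check never matches (paths and retry values here are illustrative):

    - name: Read the executor pid          # pidfile location is assumed
      command: cat /var/run/zuul/executor.pid
      register: executor_pid

    - name: Wait for the executor process to exit
      stat:
        # trim the newline, otherwise this becomes "/proc/NNN\n/status"
        path: "/proc/{{ executor_pid.stdout | trim }}/status"
      register: proc_status
      until: not proc_status.stat.exists
      retries: 90            # ~15 minutes at 10s per retry
      delay: 10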
*** threestrands has joined #openstack-infra05:58
*** threestrands has quit IRC05:58
*** threestrands has joined #openstack-infra05:58
*** threestrands has quit IRC06:03
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:05
*** kiennt26 has joined #openstack-infra06:05
*** ijw has joined #openstack-infra06:05
openstackgerritChason Chan proposed openstack-infra/project-config master: Add pike branch for OpenStack-Manuals gerritbot  https://review.openstack.org/51652306:08
*** ijw has quit IRC06:10
*** gildub has joined #openstack-infra06:12
*** dhajare has joined #openstack-infra06:15
*** aeng has quit IRC06:19
AJaegeryamamoto: yes, should be possible - try it out - and point me to your change for review06:22
*** esberglu has quit IRC06:24
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015506:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts  https://review.openstack.org/51651006:25
*** markvoelker has joined #openstack-infra06:27
*** gongysh has joined #openstack-infra06:28
*** nikhil has quit IRC06:30
*** thorst has joined #openstack-infra06:35
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:35
*** linkedinyou has joined #openstack-infra06:37
*** ijw has joined #openstack-infra06:38
openstackgerritMerged openstack-infra/project-config master: Setup Contributor Guide in Storyboard  https://review.openstack.org/51646206:38
*** thorst has quit IRC06:39
*** ijw has quit IRC06:42
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:44
*** hemna_ has quit IRC06:45
*** hemna_ has joined #openstack-infra06:46
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:53
*** tosky has joined #openstack-infra06:53
*** ijw has joined #openstack-infra06:54
*** vhosakot has quit IRC06:54
*** jtomasek has joined #openstack-infra06:58
*** ijw has quit IRC06:58
*** rcernin has quit IRC06:59
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601007:00
*** jtomasek has quit IRC07:00
*** markvoelker has quit IRC07:00
openstackgerritRui Chen proposed openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor  https://review.openstack.org/51653207:03
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601007:05
*** spectr has joined #openstack-infra07:08
*** salv-orl_ has quit IRC07:11
*** salv-orlando has joined #openstack-infra07:11
*** yamahata has quit IRC07:14
*** esberglu has joined #openstack-infra07:15
*** salv-orlando has quit IRC07:16
*** kiennt26 has quit IRC07:17
*** pcaruana has joined #openstack-infra07:17
*** vsaienk0 has joined #openstack-infra07:18
*** aviau has quit IRC07:19
*** esberglu has quit IRC07:19
*** tosky has quit IRC07:19
*** kiennt26 has joined #openstack-infra07:19
*** aviau has joined #openstack-infra07:19
*** gildub has quit IRC07:21
*** salv-orlando has joined #openstack-infra07:21
*** gildub has joined #openstack-infra07:22
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51533707:24
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51533707:26
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project  https://review.openstack.org/51653707:26
*** vsaienk0 has quit IRC07:32
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican  https://review.openstack.org/51039007:32
*** jtomasek has joined #openstack-infra07:33
*** vsaienk0 has joined #openstack-infra07:34
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican  https://review.openstack.org/51039007:38
*** linkedinyou has quit IRC07:39
*** rcernin has joined #openstack-infra07:44
*** shardy has joined #openstack-infra07:45
*** shardy has quit IRC07:45
*** ffledgling has left #openstack-infra07:48
*** shardy has joined #openstack-infra07:49
*** gildub has quit IRC07:50
openstackgerritNam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Barbican legacy jobs  https://review.openstack.org/51041407:54
*** markvoelker has joined #openstack-infra07:58
*** ykarel is now known as ykarel|lunch08:03
*** tmorin has joined #openstack-infra08:09
*** ralonsoh has joined #openstack-infra08:15
*** Liced has joined #openstack-infra08:15
*** tesseract has joined #openstack-infra08:16
*** sree has quit IRC08:17
*** ccamacho has joined #openstack-infra08:17
*** kiennt26 has quit IRC08:17
*** priteau has joined #openstack-infra08:22
leyalHi, I need some help - when I tried to upload a patch I got the following message: "Received disconnect from 104.130.246.91 port 29418:12: Too many concurrent connections (64) - max. allowed: 64"08:22
leyalBut i don't have any open connection to 104.130.246.91 ..08:22
*** yamamoto has quit IRC08:23
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655008:26
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655008:27
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project  https://review.openstack.org/51655208:27
*** pgadiya has joined #openstack-infra08:27
*** salv-orlando has quit IRC08:27
*** salv-orlando has joined #openstack-infra08:28
*** d0ugal has quit IRC08:29
*** d0ugal has joined #openstack-infra08:29
*** markvoelker has quit IRC08:30
*** alexchadin has joined #openstack-infra08:31
*** gcb has joined #openstack-infra08:32
*** salv-orlando has quit IRC08:32
*** jpena|off is now known as jpena08:33
*** ociuhandu has joined #openstack-infra08:42
*** ykarel|lunch is now known as ykarel08:42
*** amoralej|off is now known as amoralej08:44
*** jpich has joined #openstack-infra08:44
*** hashar has joined #openstack-infra08:46
*** edmondsw has joined #openstack-infra08:49
*** salv-orlando has joined #openstack-infra08:50
*** edmondsw has quit IRC08:53
*** sdague has joined #openstack-infra08:53
*** dhajare has quit IRC08:56
*** baoli has quit IRC08:58
*** baoli has joined #openstack-infra08:58
ianwleyal: are you behind some sort of nat?08:58
leyalianw , thanks for answering me . i am from working from my home - so i am the only one that will gerrit from this network ..09:00
*** dingyichen has quit IRC09:01
*** gmann is now known as gmann_afk09:02
*** cuongnv has quit IRC09:03
*** gcb_ has joined #openstack-infra09:03
*** baoli has quit IRC09:03
*** zhurong has quit IRC09:03
*** gcb has quit IRC09:04
*** annp has quit IRC09:04
*** dhajare has joined #openstack-infra09:05
ianwleyal: i can see logins from yourself but no particular errors.  is this persistent?09:06
leyalianw, It started yesterday, and since then it's persistent (I tried git review ~10 times in the last 3 hours)09:10
Licedhi, AJaeger told me yesterday that the translation job was broken 10 days ago. translation doesn't work for https://github.com/openstack/networking-bgpvpn and the last merge on the project was yesterday. so my translation setup doesn't work and I can't find the solution09:11
Licedtranslation support was added in https://review.openstack.org/486349 and translation is activated in project-config in https://review.openstack.org/509178, but after the last merge the project in zanata is still empty09:14
*** jascott1 has quit IRC09:14
*** jascott1 has joined #openstack-infra09:15
*** martinkopec has joined #openstack-infra09:15
*** Kevin_Zheng has joined #openstack-infra09:15
*** lucas-afk is now known as lucasagomes09:19
*** jascott1 has quit IRC09:19
*** electrofelix has joined #openstack-infra09:20
ianwgerrit2@review:~$ ssh -i review_site/etc/ssh_host_rsa_key -p 29418 'Gerrit Code Review'@127.0.0.1 gerrit show-connections -n | grep 'a/26131' | wc -l09:21
ianw6409:21
ianwinfra-root: ^ somehow leyal is leaking connections09:21
*** yamamoto has joined #openstack-infra09:24
ianwleyal: i've forcibly closed all the open connections, can you try again?09:24
*** markvoelker has joined #openstack-infra09:27
*** jistr|mtgs is now known as jistr09:29
leyalianw, i tried again and it's ok now ..09:32
*** yamamoto has quit IRC09:34
ianwok, and i confirmed there was no open connection just now, so it doesn't appear to be leaking any more09:34
*** liujiong_lj has joined #openstack-infra09:35
*** liujiong has quit IRC09:35
ianwseeing as both are from IPs that are not your current one, but from your isp, i think it's probably transient. keep an eye on it; if problems reoccur you can point back to this in the logs and we can investigate further09:35
*** SpamapS has quit IRC09:36
*** sflanigan has quit IRC09:36
*** sflanigan has joined #openstack-infra09:36
*** sflanigan has joined #openstack-infra09:36
*** bradm has quit IRC09:36
ianw("both" above being the ip's against the open sessions that i killed, to be clear)09:37
*** wolverineav has joined #openstack-infra09:38
leyalianw, thanks !09:39
*** bradm has joined #openstack-infra09:39
*** SpamapS has joined #openstack-infra09:40
*** shardy has quit IRC09:42
*** shardy has joined #openstack-infra09:42
*** dsariel__ has joined #openstack-infra09:44
*** wolverineav has quit IRC09:46
*** wolverineav has joined #openstack-infra09:47
*** owalsh- is now known as owalsh09:47
ianwjeblair / pabelanger: i have not ended up touching ze02 as i haven't had a chance to look for any info on the finger death. i can look tomorrow if you don't get to it09:48
*** slaweq has joined #openstack-infra09:49
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655009:51
*** wolverineav has quit IRC09:51
*** slaweq_ has quit IRC09:51
*** pgadiya has quit IRC09:52
*** bradm has quit IRC09:53
*** bradm has joined #openstack-infra09:54
*** sambetts|afk is now known as sambetts09:55
*** kjackal_ has joined #openstack-infra09:55
*** mandre has quit IRC09:55
*** mandre_ has joined #openstack-infra09:56
*** mandre_ is now known as mandre09:56
openstackgerritMerged openstack-infra/project-config master: Fix Grafana neutron-lib dashboard  https://review.openstack.org/51480109:59
*** namnh has quit IRC10:00
*** armaan_ has joined #openstack-infra10:00
*** jistr_ has joined #openstack-infra10:00
*** hemna- has joined #openstack-infra10:00
*** niska` has joined #openstack-infra10:00
*** markvoelker has quit IRC10:00
*** xhku_ has joined #openstack-infra10:01
openstackgerritMerged openstack-infra/project-config master: Publish requirements loci images to DockerHub  https://review.openstack.org/51294110:01
openstackgerritMerged openstack-infra/project-config master: ironic: Remove publish-to-pypi add release-openstack-server  https://review.openstack.org/51645310:01
*** witek has quit IRC10:01
*** niska has quit IRC10:01
*** jistr has quit IRC10:01
*** jschlueter has quit IRC10:01
*** hemna has quit IRC10:01
*** fbouliane has quit IRC10:01
*** michaelxin has quit IRC10:01
*** timrc has quit IRC10:01
*** armaan has quit IRC10:01
*** rwsu has quit IRC10:01
*** krtaylor has quit IRC10:01
*** askb has quit IRC10:01
*** zerick has quit IRC10:01
*** migi has quit IRC10:01
*** admcleod_ has quit IRC10:01
*** admcleod has joined #openstack-infra10:01
*** zerick has joined #openstack-infra10:01
*** isq_ has joined #openstack-infra10:01
*** askb has joined #openstack-infra10:02
*** krtaylor has joined #openstack-infra10:02
*** rwsu has joined #openstack-infra10:02
*** witek has joined #openstack-infra10:02
*** michaelxin has joined #openstack-infra10:02
*** timrc has joined #openstack-infra10:02
*** migi has joined #openstack-infra10:02
*** isq has quit IRC10:03
*** Jeffrey4l has quit IRC10:04
*** zoli has quit IRC10:04
*** pgadiya has joined #openstack-infra10:06
*** pblaho has joined #openstack-infra10:06
*** zoli has joined #openstack-infra10:06
*** Jeffrey4l has joined #openstack-infra10:07
*** pblaho has quit IRC10:09
*** pblaho has joined #openstack-infra10:09
*** bkero has quit IRC10:11
*** kota_ has quit IRC10:11
*** kota_ has joined #openstack-infra10:11
*** bkero has joined #openstack-infra10:12
*** tobiash has quit IRC10:12
*** tobiash has joined #openstack-infra10:15
*** e0ne has joined #openstack-infra10:15
*** liujiong_lj has quit IRC10:18
*** pgadiya has quit IRC10:24
*** udesale has quit IRC10:26
*** sree has joined #openstack-infra10:29
*** boden has joined #openstack-infra10:33
*** sree has quit IRC10:33
*** sree has joined #openstack-infra10:35
*** jschlueter|znc has joined #openstack-infra10:35
*** hemna_ has quit IRC10:36
*** thorst has joined #openstack-infra10:36
*** edmondsw has joined #openstack-infra10:37
*** pgadiya has joined #openstack-infra10:38
*** yamamoto has joined #openstack-infra10:39
*** sree has quit IRC10:39
*** edmondsw has quit IRC10:41
*** ociuhandu has quit IRC10:42
*** [HeOS] has quit IRC10:42
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364510:43
*** thorst has quit IRC10:43
AJaegerLiced: let me check...10:46
*** ociuhandu has joined #openstack-infra10:46
AJaegerLiced: that merged yesterday at a time that Zuul was unhappy ;(10:46
AJaegerWe had to restart Zuul and the post job never ran.10:47
AJaegerLiced: So, waiting for next merge ;(10:47
*** vsaienk0 has quit IRC10:47
*** gongysh has quit IRC10:51
*** pbourke has quit IRC10:53
*** vsaienk0 has joined #openstack-infra10:53
*** pbourke has joined #openstack-infra10:54
*** ijw has joined #openstack-infra10:55
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Increase timeout for requirements propose job  https://review.openstack.org/51661010:57
*** markvoelker has joined #openstack-infra10:58
*** ijw has quit IRC10:59
*** huanxie has quit IRC10:59
hwoaranggood day11:02
hwoarangI am seeing some problems with some internal openstack mirrors for opensuse11:02
hwoarangas you can see from here http://logs.openstack.org/27/511227/2/gate/openstack-ansible-functional-opensuse-423/c0d460c/host/lxc-cache-prep-commands.log.txt.gz downloading a package takes a while and the job times out11:03
hwoarangthe mirror which is used is http://mirror.mtl01.inap.openstack.org/opensuse/...11:03
hwoarangfungi pabelanger dirk^11:04
hwoarangit's been hitting all the openstack-ansible jobs for quite a while11:04
dirkhwoarang: about to turn off mobile phone for the next 24 hours. Will be back from Sydney11:08
dirkhwoarang: it looks like I can reach that mirror. Maybe mtu or ipv6 issues?11:08
hwoarangdirk: the mirror is reachable but terribly slow as it seems from the job output. it takes 10 minutes to download a few packages11:09
*** priteau has quit IRC11:09
hwoarangand the job is killed11:09
*** priteau has joined #openstack-infra11:10
*** Hal has joined #openstack-infra11:10
*** Hal is now known as Guest9527711:10
*** do3 has joined #openstack-infra11:10
*** priteau has quit IRC11:11
*** do3 has left #openstack-infra11:11
*** hashar is now known as hasharLunch11:12
dirkhwoarang: smells like mtu issue to me11:12
dirkhwoarang: can you add debug output for testing that theory? Maybe there is something mtu related different just for opensuse11:13
hwoarangdirk: i will have a look but ubuntu also uses mtu 150011:14
*** armaan_ has quit IRC11:14
*** armaan has joined #openstack-infra11:15
dirkhwoarang: and you can reproduce the slowness?11:17
*** rcernin has quit IRC11:18
hwoarangi can't reproduce it outside of openstack gates11:18
*** ldnunes has joined #openstack-infra11:18
dirkhwoarang: weird.11:19
*** kjackal_ has quit IRC11:20
hwoarangi have no proof that it's mtu related because the host progresses with downloads and setup fine, and it only starts to fail about 20 minutes down the road11:20
hwoarangwhen running a chroot zypper command to prepare a chroot11:20
hwoaranganyway11:20
*** stakeda has quit IRC11:22
*** panda|ruck|off is now known as panda|ruck11:24
openstackgerritArx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag  https://review.openstack.org/51662411:24
*** armaan has quit IRC11:25
*** armaan has joined #openstack-infra11:26
*** hemna has joined #openstack-infra11:29
vdrokgood morning folks. could someone take a look at https://review.openstack.org/515716? I've added job definitions as per ML thread, but still only the jobs from project-config are run11:29
*** wolverineav has joined #openstack-infra11:29
*** markvoelker has quit IRC11:31
*** sileht has quit IRC11:33
*** sileht has joined #openstack-infra11:34
*** ociuhandu has quit IRC11:36
*** armaan has quit IRC11:37
*** armaan has joined #openstack-infra11:37
*** armaan has quit IRC11:42
*** smatzek has joined #openstack-infra11:44
*** esberglu has joined #openstack-infra11:48
*** salv-orlando has quit IRC11:48
*** pgadiya has quit IRC11:51
*** rosmaita has joined #openstack-infra11:51
*** esberglu has quit IRC11:52
*** udesale has joined #openstack-infra11:54
*** salv-orlando has joined #openstack-infra11:55
*** jaypipes has joined #openstack-infra11:55
*** thorst has joined #openstack-infra11:58
*** kjackal_ has joined #openstack-infra12:05
*** hemna has quit IRC12:07
*** shardy is now known as shardy_lunch12:07
*** alexchadin has quit IRC12:09
*** yamamoto has quit IRC12:11
*** yamamoto has joined #openstack-infra12:11
*** thorre has quit IRC12:13
*** dprince has joined #openstack-infra12:13
*** thorre has joined #openstack-infra12:16
*** armaan has joined #openstack-infra12:17
*** hemna has joined #openstack-infra12:19
*** armaan has quit IRC12:20
*** markvoelker has joined #openstack-infra12:21
*** dhajare has quit IRC12:22
*** martinkopec has quit IRC12:24
*** salv-orlando has quit IRC12:24
*** martinkopec has joined #openstack-infra12:25
*** edmondsw_ has joined #openstack-infra12:25
*** rhallisey has joined #openstack-infra12:28
*** thorst_ has joined #openstack-infra12:29
*** thorst has quit IRC12:31
*** catintheroof has joined #openstack-infra12:31
*** catintheroof has quit IRC12:32
*** catintheroof has joined #openstack-infra12:32
*** rlandy has joined #openstack-infra12:34
*** trown|outtypewww is now known as trown12:34
*** dave-mccowan has joined #openstack-infra12:35
*** [HeOS] has joined #openstack-infra12:38
*** jcoufal has joined #openstack-infra12:39
*** salv-orlando has joined #openstack-infra12:39
*** jonher has joined #openstack-infra12:40
*** tosky has joined #openstack-infra12:44
Shrewspabelanger: jeblair: I searched ze02 logs for the new finger daemon logging on abnormal exception back to Oct 20th. Found nothing.12:45
dmsimardShrews: that catches finger:// urls being returned as logs ?12:46
*** lucasagomes is now known as lucas-hungry12:47
*** Dinesh_Bhor has quit IRC12:47
Shrewsdmsimard: no12:47
Shrewsonly unexpected exceptions from the daemon12:48
dmsimardShrews: oh, okay, cause I have a reproducer for those errors :)12:48
*** udesale has quit IRC12:49
*** udesale has joined #openstack-infra12:49
*** zhurong has joined #openstack-infra12:49
*** shardy_lunch is now known as shardy12:49
*** LindaWang has quit IRC12:50
*** jpena is now known as jpena|lunch12:51
*** dhajare has joined #openstack-infra12:52
LicedAJaeger: bad luck for me12:52
*** esberglu has joined #openstack-infra12:54
*** udesale has quit IRC12:56
*** yamamoto has quit IRC12:56
*** udesale has joined #openstack-infra12:56
*** mandre is now known as mandre_afk12:56
*** janki has quit IRC12:58
*** bh526r has joined #openstack-infra13:01
*** felipemonteiro has joined #openstack-infra13:03
*** martinkopec has quit IRC13:03
*** mat128 has joined #openstack-infra13:04
*** edmondsw_ is now known as edmondsw13:04
*** martinkopec has joined #openstack-infra13:04
*** LindaWang has joined #openstack-infra13:06
*** hasharLunch is now known as hashar13:09
*** yamamoto has joined #openstack-infra13:10
*** kgiusti has joined #openstack-infra13:13
*** bh526r has quit IRC13:13
*** bh526r has joined #openstack-infra13:14
*** yamamoto has quit IRC13:15
*** jcoufal_ has joined #openstack-infra13:16
*** jascott1 has joined #openstack-infra13:17
*** mriedem has joined #openstack-infra13:18
*** jcoufal has quit IRC13:18
*** salv-orlando has quit IRC13:19
*** hemna has quit IRC13:20
*** salv-orlando has joined #openstack-infra13:20
*** jascott1 has quit IRC13:21
*** bobh has joined #openstack-infra13:23
jonherIs it possible to merge two gerrit accounts? I seem to have double accounts because I've logged in using two different ubuntu accounts to review13:23
*** salv-orlando has quit IRC13:24
*** smatzek has quit IRC13:26
*** smatzek has joined #openstack-infra13:27
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin  https://review.openstack.org/51667313:27
Shrewsvdrok: That's a good question. I'm not sure what's going on there. We'll have to wait for jeblair b/c I'm interested in the reason too.13:30
*** smatzek has quit IRC13:31
openstackgerritArx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag  https://review.openstack.org/51662413:32
*** nikhil has joined #openstack-infra13:32
*** jaosorior has quit IRC13:33
*** baoli has joined #openstack-infra13:35
vdrokShrews: ok, thank you13:36
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Add initial jobs for CloudKitty Tempest plugin  https://review.openstack.org/51667913:36
*** ldnunes has quit IRC13:37
*** lbragstad has joined #openstack-infra13:37
*** ldnunes has joined #openstack-infra13:37
*** lucas-hungry is now known as lucasagomes13:41
*** eharney has joined #openstack-infra13:41
*** yamamoto has joined #openstack-infra13:41
*** zhurong has quit IRC13:42
*** hongbin has joined #openstack-infra13:43
*** dhajare has quit IRC13:43
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364513:44
*** yamamoto has quit IRC13:45
*** felipemonteiro_ has joined #openstack-infra13:46
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin  https://review.openstack.org/51667313:46
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364513:48
*** amoralej is now known as amoralej|lunch13:48
*** kiennt26 has joined #openstack-infra13:49
*** felipemonteiro has quit IRC13:49
openstackgerritDmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload  https://review.openstack.org/19504313:50
*** jpena|lunch is now known as jpena13:52
*** esberglu has quit IRC13:53
mriedemgrenade seems to be busted against ocata changes http://logs.openstack.org/04/516404/1/check/legacy-grenade-dsvm-neutron-multinode/aa97464/logs/grenade.sh.txt.gz#_2017-10-30_18_28_09_86513:53
mriedemseeing ImportError issues with neutronlib13:54
mriedemdid something get EOL'ed?13:54
mriedemtonyb: ^13:54
mriedemlooks like at least neutron is eol13:54
mriedemfor newton13:54
mriedemsdague: once anything that stable/newton relies on is eol then don't we have to just kill the grenade job in ocata?13:56
fungijonher: yeah, we've seen that happen if you change which e-mail address you give the ubuntu sso when logging in (gerrit accounts map to ubuntu sso ids, not to launchpad profiles, so even if your multiple ubuntu sso ids are associated with the same lp profile they'll result in distinct accounts in gerrit)13:56
*** oanson has quit IRC13:57
fungijonher: what's the ssh username on one of the accounts? i should be able to use that to find the other account id by looking for e-mail address overlaps13:57
*** oanson has joined #openstack-infra13:57
fungior worst case i'll try to match them up by full name13:58
jonherold account id: 18279 that I want to keep. New one is ID 2701313:58
fungieven better. looking now13:58
jonherthanks13:58
*** hemna has joined #openstack-infra13:58
*** smatzek has joined #openstack-infra13:59
*** iyamahat has joined #openstack-infra13:59
*** gcb_ has quit IRC14:00
*** jokke_ has joined #openstack-infra14:00
*** gcb_ has joined #openstack-infra14:00
*** janki has joined #openstack-infra14:01
*** smatzek has quit IRC14:03
fungijonher: i've moved your new openid from account 27013 to account 18279 and set 27013 inactive. you may need to log out of and back into the gerrit webui to be in the correct account again14:03
jonherAlright, thanks :)14:03
fungimy pleasure14:03
fungilet us know if you have any further trouble with it14:03
*** smatzek has joined #openstack-infra14:05
*** mandre_afk is now known as mandre14:05
*** ykarel has quit IRC14:06
sdaguemriedem: yeh, it should be14:07
sdaguegrenade should be turned off first before eoling branches14:08
mriedemok, trying to figure out how to do that in the new zuulv3 world14:08
*** esberglu has joined #openstack-infra14:09
fungipatch to stable/ocata to remove grenade jobs you've declared within that branch, or patch to project-config to adjust the branch filter for grenade jobs if defined there14:09
*** marst has joined #openstack-infra14:09
mriedemnot openstack-zuul-jobs?14:10
mriedemok project-config it is14:10
AJaegermriedem: might be openstack-zuul-jobs as well - if you want to patch the job directly14:11
fungiyeah, i'm looking now to see where that branch filter is set14:12
*** catintheroof has quit IRC14:12
AJaegerfungi: we can set it in openstack-zuul-jobs on the job itself - and then it applies everywhere14:12
mriedemi know how to do it per-job in openstack-zuul-jobs/zuul.d/zuul-legacy-jobs14:12
AJaegeryep14:12
mriedemsince layout.yaml is gone in project-config14:12
mriedemok14:12
fungiyeah, we set it in project-config for the legacy grenade jobs at the moment, like http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5814:13
AJaegerconfig-core, the requirements proposal job times out, please review this change to have it run longer https://review.openstack.org/51661014:14
*** catintheroof has joined #openstack-infra14:14
AJaegerfungi: yes, that's per job and project. If we want to do it globally, it's openstack-zuul-jobs14:14
fungiright there's a mix of the two right now but once we get the legacy jobs cleaned up we'll hopefully just have one place to update that in job-templates14:15
fungier, project-templates14:16
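For reference, a sketch of the kind of branch filter being discussed; whether it ends up on the job in openstack-zuul-jobs or on the project entry in project-config, the shape is roughly the same (the job name and regex here are illustrative):

    - job:
        name: legacy-grenade-dsvm-neutron-multinode
        # skip branches whose upgrade-from source is now EOL; run everywhere else
        branches: ^(?!stable/(newton|ocata)).*$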
*** rbrndt has joined #openstack-infra14:16
AJaegerindeed that one as well...14:16
*** armax has joined #openstack-infra14:17
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:17
mriedemAJaeger: i think this ^14:17
AJaegermriedem: yes, expect so14:18
fungiwell, except for the ones which are still set in project-config in the projects.yaml for now14:19
fungialso, i don't think you want to remove the grenade-forward jobs from ocata14:19
mriedemwhat does the forward job do again?14:19
fungisince those test that you can upgrade from a proposed stable/ocata change to stable/pike14:19
mriedemoh..14:19
mriedemwill projects.yaml override what's in openstack-zuul-jobs/14:20
mriedem?14:20
*** jaosorior has joined #openstack-infra14:20
sdaguefungi: they've never been voting, I'm not convinced they even work14:20
fungisdague: perhaps we should remove them entirely in that case?14:20
funginewton eol isn't a reason to remove the grenade-forward jobs from ocata since they don't touch newton, but if there is a good reason to just drop the grenade-forward jobs globally then probably better to do that14:21
sdagueyeh, that's fine14:22
fungimriedem: i think the variants in the projects.yaml in project-config will override what's in openstack-zuul-jobs (though there aren't many)14:23
mriedemok i'll tinker with project-config too then14:23
*** armax has quit IRC14:23
AJaegerI agree with fungi, we need to update both14:24
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:24
*** amoralej|lunch is now known as amoralej14:27
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:27
*** armax has joined #openstack-infra14:28
*** Guest53850 has quit IRC14:29
*** lamt has joined #openstack-infra14:29
jeblairvdrok: i don't see what's wrong with that patch.  i lost a debugging tool in a recent zuul restart; i may need to restart it again to get it back to dig into that.14:30
*** catintheroof has quit IRC14:30
*** eharney has quit IRC14:31
jeblairi'll do that now, unless anyone objects14:32
dmsimardAJaeger: commented on https://review.openstack.org/#/c/516397/14:33
fungijeblair: no objection from me14:33
AJaegerdmsimard: that's so far the only repo that needs it and therefore I did it at repo level14:34
AJaegerdmsimard: do you see that this is needed by more repos?14:35
dmsimardAJaeger: that's curious, why wouldn't this be required for other repos? it's the same job, isn't it?14:35
*** catintheroof has joined #openstack-infra14:35
AJaegerdmsimard: that repo installs the requirements repo in its tox_install...14:36
*** ykarel has joined #openstack-infra14:36
*** salv-orlando has joined #openstack-infra14:37
*** spectr has quit IRC14:37
jeblairrestarted and re-enqueueing now14:37
*** vsaienk0 has quit IRC14:38
*** spectr has joined #openstack-infra14:39
*** salv-orl_ has joined #openstack-infra14:40
*** nicolasbock has joined #openstack-infra14:41
dmsimardAJaeger: the failing playbook is actually this one: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/file/7f67273f-9e6a-4fa4-91a4-a6ab4c28c511/ ## task: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/result/a2ce2da7-1b23-41fe-9d9c-07e3edc27aea/14:42
dmsimard"python-tarball/run.yaml" is used for those jobs: 1) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n132 2) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n153 and 3) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n40114:42
dmsimard#3, our failing job, is the only one without requirements14:42
jaosoriorafter zuul was restarted, should the jobs be rechecked or will they get requeued automatically?14:43
jeblairjaosorior: they are being re-enqueued now14:43
dmsimardjaosorior: they get requeued by the operator who restarts zuul14:43
dmsimardit's not automatic in the sense that it still requires manual injection, right jeblair ?14:43
AJaegerdmsimard: AH!14:44
dmsimardAJaeger: does that make sense ?14:44
jaosoriorI see14:44
openstackgerritMerged openstack-infra/project-config master: Remove zuul/mapping and job  https://review.openstack.org/51602914:44
jeblairjaosorior: if your change isn't there now, we may have missed it so you should recheck it14:44
AJaegerdmsimard: my hero, thanks for digging into this. Let me double check...14:44
jeblairdmsimard: ya14:44
*** salv-orlando has quit IRC14:44
*** amotoki has quit IRC14:44
*** jbernard has quit IRC14:44
*** cshastri has quit IRC14:44
jaosoriorthat makes sense14:44
jaosoriorjeblair, dmsimard: thanks for the info :D14:45
*** charz has quit IRC14:45
jaosoriorthe workings of zuul are still quite unknown to me14:45
AJaegerdmsimard: yes, makes sense. Do you want to patch it?14:45
* AJaeger will abandon his ...14:45
dmsimardAJaeger: it's the same repo you can just submit another patchset14:45
*** jbernard has joined #openstack-infra14:45
dmsimardI can submit it if you want14:45
jeblairvdrok: well, that's annoying.  515716 appears to be working correctly after the restart.14:46
*** eharney has joined #openstack-infra14:46
AJaegerdmsimard: if you have time, go for it, please14:46
dmsimardack14:46
vdrokjeblair: I see, well, that's not so bad :)14:46
jeblairvdrok: i guess let me know if it happens again :|14:46
vdroksure, will do. thanks for looking into this14:46
fungidmsimard: correct, we run a python script to generate a shell script based on the contents of specific pipelines obtained from the scheduler's status.json before stopping the service, and then run that shell script once the scheduler is back up and running again (the shell script consists of preformatted calls to the zuul enqueue rpc cli)14:47
*** udesale has quit IRC14:47
*** amotoki has joined #openstack-infra14:47
dmsimardfungi: if that script hasn't changed since v2, I know what script it is :)14:47
*** charz has joined #openstack-infra14:47
*** psachin has quit IRC14:48
fungidmsimard: it's changed just ever so slightly to add --tenant14:49
*** david-lyle has quit IRC14:50
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:50
openstackgerritDmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload  https://review.openstack.org/19504314:50
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions  https://review.openstack.org/51670514:53
dmsimardAJaeger: so, digging a bit further for that requirements thing... it turns out this is the culprit: http://codesearch.openstack.org/?q=%5C%24ZUUL_CLONER&i=nope&files=&repos=14:55
dmsimardtools/tox_install.sh all over the place uses ZUUL_CLONER to ensure that (amongst other things) requirements is there14:56
*** xarses has joined #openstack-infra14:56
jeblairmordred is working to change the pti so we build tarballs differently and won't need that anymore14:58
fungiyep, that's so local devs get a consistent experience and don't have to wonder why their unconstrained unit tests on their workstation are broken while the same patches pass testing in our constrained ci jobs14:58
*** dtantsur|afk is now known as dtantsur14:58
*** salv-orl_ has quit IRC14:58
*** salv-orlando has joined #openstack-infra14:58
*** gcb_ has quit IRC14:58
fungibut overriding how tox builds all its virtualenvs is a pretty clumsy hammer, and then when we turn around and use tox for jobs which don't actually need it we end up with nasty side effects like that14:59
dmsimardAJaeger: that gives us the list of repos requiring requirements http://codesearch.openstack.org/?q=openstack%2Frequirements&i=nope&files=tools%2Ftox_install.sh&repos=14:59
jeblairlet's just put it in the job for now, so it's easy to clean up when mordred finishes his work14:59
jeblairor template or whatever.  ie, not on individual projects.15:00
dmsimardyup, taking care of it15:00
*** gcb_ has joined #openstack-infra15:01
*** diablo_rojo has quit IRC15:01
openstackgerritDavid Moreau Simard proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball  https://review.openstack.org/51639715:01
dmsimardAJaeger: ^15:01
dmsimardEmilienM: are you around ?15:02
*** yamamoto has joined #openstack-infra15:04
jeblairi'm going to restart all of the executors15:05
*** lbragstad has quit IRC15:05
jeblairi don't think they were cleaned up properly after the unclean shutdown the other day15:05
openstackgerritClark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650215:05
Shrewsjeblair: would this be a good time to restart the np launchers? we have a couple of bug fixes that should go in15:06
EmilienMdmsimard: yes15:06
*** salv-orlando has quit IRC15:06
jeblairShrews: yes... though i think theoretically any time should be fine? :)15:07
*** salv-orlando has joined #openstack-infra15:07
pabelangerdmsimard: AJaeger: when we remove zuul-cloner from images, that is going to break branch-tarball jobs right? re 516397. Maybe we need to be creating a legacy branch tarball job that will still use zuul-cloner, or I think mordred has patches to remove the need for tox_install.sh15:07
AJaegerdmsimard: we need horizon and neutron as well - do you want to update again or shall I?15:07
Shrewsjeblair: infra-root: i'm going to do that then. restarting launchers (unless i hear any objections)15:07
*** lbragstad has joined #openstack-infra15:08
pabelanger+115:08
jeblairpabelanger: are you suggesting we have a job using the old v2 zuul-cloner?15:08
*** sree has joined #openstack-infra15:08
fungiShrews: sounds good, thanks15:09
AJaegerpabelanger: yes, this needs some analysis - the use of tox_install and zuul-cloner will be fun once we remove zuul-cloner from images15:09
jeblairno wait15:09
jeblairAJaeger, pabelanger: no jobs should be using the copy of zuul-cloner on the images, if you think a job is, please investigate and confirm that now15:10
pabelangerjeblair: not sure, just indicating that when we remove zuul-cloner from images and the base playbook, branch-tarball jobs look like they are going to break15:10
Shrewsinfra-root: restarted nodepool-launcher on both nl01 and nl0215:10
jeblairpabelanger: *please* confirm that.  it should not be the case.15:10
pabelangeryes, looking now15:10
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669415:11
*** ykarel has quit IRC15:11
AJaegerdmsimard: will update15:11
dmsimardAJaeger: I guess it would avoid having to create -horizon and -neutron variants (which I dislike very much)15:11
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions  https://review.openstack.org/51670515:11
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Remove legacy-grenade-dsvm-neutron-nova-next  https://review.openstack.org/51671115:11
AJaegerdmsimard: agreed, let me fix15:11
*** salv-orlando has quit IRC15:11
*** sree has quit IRC15:12
AJaegerjeblair: I see both required-projects with "name: some-repo" and with just "some-repo". Are both valid?  Any preference?15:13
AJaegercheck project-config/zuul.d/jobs.yaml15:13
jeblairAJaeger: both valid; if not using a branch specifier, i'd prefer just "some-repo"15:14
*** yamamoto has quit IRC15:14
*** vsaienk0 has joined #openstack-infra15:15
AJaegerok15:15
EmilienMdmsimard: what's up?15:15
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball  https://review.openstack.org/51639715:15
dmsimardEmilienM: sorry got sidetracked15:15
AJaegerdmsimard: updated ^15:15
*** dtantsur is now known as dtantsur|afk15:15
dmsimardEmilienM: mnaser has a nice series of patches here but I put them on hold as you were working on migration https://review.openstack.org/#/c/515972/115:15
mriedemAJaeger: fungi: openstack-zuul-jobs with a depends-on to project-config won't work, right?15:16
mriedemproject-config changes still have to merge first?15:16
mriedemhttps://review.openstack.org/#/c/516694/15:16
EmilienMdmsimard: he can go ahead - I won't have time to work on that until after summit.15:16
dmsimardEmilienM: it's work that you'd need to do after the migration anyway, so I thought maybe we ought to land those but that means you'll need to rebase15:16
EmilienMdmsimard: I'll rebase15:16
AJaegermriedem: yes, it has to. So, push them both up and then recheck the openstack-zuul-jobs once project-config merged15:16
dmsimardEmilienM: ok, it's not going to be so much a rebase as a rewrite but sure15:16
EmilienMdmsimard: no worries15:16
jeblairmriedem: they won't run tests with the new content, but they will still perform config syntax validation and ensure merging in the right order, so generally worth including the footer still.15:16
dmsimardEmilienM: ack, are you confident if I review those or would you rather we keep jobs frozen for now ?15:17
mriedemjeblair: then i don't know why this is failing https://review.openstack.org/#/c/516694/15:17
dmsimardEmilienM: they're no-op for the most part, just reducing duplication and streamlining15:17
AJaegerdmsimard: EmilienM gave a +1, so you could +2A if you like15:17
dmsimardAJaeger: double checking :)15:17
dmsimardtripleo has had a bumpy gate recently15:18
AJaegerdmsimard: sure, appreciated...15:18
clarkbdmsimard: fyi https://review.openstack.org/516502 may be of interest to you15:18
dmsimardmriedem: you can't do a depends-on on a project-config patch15:18
*** jbadiapa has quit IRC15:18
mriedemdmsimard: that's what i thought, but see jeblair's comment15:18
EmilienMdmsimard, AJaeger I need to review it properly15:19
dmsimardmriedem: project-config is a "special" project in the context of zuul, it is used for storing secrets and trusted jobs15:19
mriedemjeblair: but i see this now, "The syntax of the configuration in this change has been verified to be correct once the config project change upon which it depends is merged, but it can not be used until that occurs."15:19
mriedemwhich is different, and better15:19
mriedemso ok15:19
pabelangerjeblair: okay, so projects that use tox_install.sh (eg: python-ironicclient) and publish-openstack-python-branch-tarball jobs will be okay when we merge https://review.openstack.org/514483/ (delete zuul-env from DIB) but will break when we land https://review.openstack.org/513506/ (fetch-zuul-cloner from base)15:19
EmilienMdmsimard, AJaeger : a commit message would have helped15:19
*** jcoufal_ has quit IRC15:19
*** jcoufal has joined #openstack-infra15:20
pabelangerI'm thinking, we could create legacy-publish-openstack-python-branch-tarball and parent to base-legacy which will pull in fetch-zuul-cloner15:20
jeblairmriedem: yeah. i don't know why that didn't happen the first time.... :|15:20
pabelangerthen update jobs using zuul-cloner to that15:20
dmsimardpabelanger: let's not create new legacy jobs15:20
mriedemjeblair: i think it was just the order in which i pushed the changes up15:20
dmsimardpabelanger: it was mentioned that mordred was working on fixing the different tox_install.sh15:20
jeblairdmsimard: people can and should use depends-on against project-config patches.  it lets zuul do config syntax validation, ensures they land in the right order, and helps human reviewers understand the sequencing.15:21
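[For reference, the footer being discussed is just a line at the end of the commit message pointing at the project-config change, e.g. "Depends-On: Ideadbeefdeadbeefdeadbeefdeadbeefdeadbeef" where the Change-Id shown here is a placeholder for the real one.]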
pabelangerwell, it isn't a new legacy job, it is just parenting to base-legacy, that pull in zuul-cloner15:21
EmilienMAJaeger, dmsimard : lgtm15:21
dmsimardjeblair: sure, I mean they can do it but it's not going to have the intended effect. It "works" in the sense that it doesn't let that patch merge until the project-config patch merges, but it doesn't actually apply and run the jobs intended to run15:22
pabelangerbut we have zuulv3 jobs, still using legacy code, which is something that is a little confusing too. Meaning, we are going to have breakages at some point15:22
dmsimardjeblair: so it... half works ?15:22
*** martinkopec has quit IRC15:23
jeblairdmsimard: look at mriedem's change and the message that zuul reported.  that's only possible because of the depends-on15:23
dmsimardpabelanger: legacy code embedded in project repositories15:23
dmsimardpabelanger: the jobs themselves aren't the ones doing zuul-cloner, it's tox_install.sh15:23
dmsimardfungi mentioned earlier that the approach with tox_install.sh is fairly clunky to begin with15:24
jeblairdmsimard: i'm just saying if folks ask, rather than saying "it doesn't work" say "it won't run jobs with the changes in effect but there are still several good reasons to do it".  i don't want folks to think they should stop using those footers.15:24
dmsimardjeblair: fair15:24
clarkbalso we've always used depends on just for the merge this first behavior15:25
clarkbregardless of how it affects jobs15:25
clarkbso that is not a regression and still valuable15:25
dmsimardclarkb: I use(d) it a lot for the zuul-cloner factor :)15:25
clarkbwe're definitely not keeping up with the logstash worker load (up to 94k jobs queued since yesterday evening's restart)15:26
mriedemAJaeger: thanks for hitting those patches15:27
clarkbI'd like to get https://review.openstack.org/#/c/516502/2 reviewed, tested, and in to see if not indexing job-output.txt twice helps there15:27
pabelangerdmsimard: right, but we need a plan to remove zuul-cloner that ideally doesn't break them. Today, the 2 patches up to do so will break them15:27
clarkbso reviews on that and thoughts on testing very much welcome15:27
clarkbI guess I need to update the issues ether pad too15:28
pabelangerclarkb: I'm still trying to bring online a new worker, another puppet issue I am debugging now15:28
dmsimardclarkb: sorry about that. I started down that road yesterday after our discussion, got sidetracked and then wanted to chat about it15:29
*** spectr has quit IRC15:29
clarkbdmsimard: its not a problem I learned new things with yamamoto's help15:29
openstackgerritJose Luis Franco proposed openstack-infra/tripleo-ci master: WIP: Upgrade UC and OC using tripleo-upgrade role  https://review.openstack.org/51564315:30
dmsimardclarkb: I'm not sure we want to blindly remove the .gz, I can see it affecting unexpected things -- but mostly because we gzip *by default* https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/upload-logs/tasks/main.yaml15:31
jeblairokay all zuul-related processes on the executors are stopped15:31
jeblairrestarting them now15:31
fungiclarkb et al: do we have any updates we want to give the board on our drive to beef up root sysadmin count in emea/apac? https://etherpad.openstack.org/p/syd-leadership-top-5-update15:32
dmsimardclarkb: I think one of the things we discussed was to not assume it would be gzipped and it might be specific to openstack-infra, but really it's there by default so I would adjust the e-r queries to take that into account15:32
*** camunoz has joined #openstack-infra15:33
*** catintheroof has quit IRC15:34
*** gyee has joined #openstack-infra15:34
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671715:37
clarkbdmsimard: the problem is that we are ending up with job-output.txt and job-output.txt.gz on disk so we submit jobs to index both. So we need to pick one or the other. I chose the one that is backward compatible with zuulv215:38
clarkbwe could choose to use the .gz but then we would have to update all the queries15:38
*** Liced has quit IRC15:39
*** pblaho has quit IRC15:40
*** catintheroof has joined #openstack-infra15:40
*** pblaho has joined #openstack-infra15:40
clarkb(worth noting this is a behavior difference between gzip the command and the ansible archive module: gzip doesn't leave the original around but ansible archive does)15:40
*** gcb_ has quit IRC15:40
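[A quick shell illustration of the difference clarkb describes; the ansible archive module's keep-the-source behavior is mimicked here with gzip -k, so this is only a sketch.]

    echo "example log line" > job-output.txt
    gzip job-output.txt      # gzip(1) replaces the source: only job-output.txt.gz remains
    gunzip job-output.txt.gz
    gzip -k job-output.txt   # -k keeps the source, like ansible archive without remove:
                             # both job-output.txt and job-output.txt.gz now exist on disk
    ls job-output.txt*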
*** kiennt26 has quit IRC15:41
*** spectr has joined #openstack-infra15:42
*** catintheroof has quit IRC15:42
dmsimardclarkb: pretty sure we can get ansible to delete the extra file15:43
dmsimardhttp://docs.ansible.com/ansible/latest/archive_module.html "remove" "no" "Remove any added source files and trees after adding to archive."15:43
*** gcb_ has joined #openstack-infra15:43
clarkbwe can but I also don't want to rely on everyone's ansible doing the right thing so this is defensive15:44
dmsimardclarkb: you mean if I write a job that archives files relevant to me as an end user and then try to submit those ?15:45
clarkbyes15:45
dmsimard(and omit the remove parameter)15:46
dmsimardhmm15:46
pabelangerclarkb: okay, I think 516717 is our fix for logstash-workers, but waiting on zuul to report back15:46
*** notemerson has quit IRC15:47
openstackgerritMiguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron  https://review.openstack.org/51672415:47
clarkbpabelanger: I don't think you can use a before that way because those are defines not classes15:47
clarkbpabelanger: instead you want to require in each of the defines that that file is in place15:48
pabelangerclarkb: Oh, right15:48
pabelangerHmm15:48
clarkbso you just need one require for each define after config_file15:48
pabelangeryah, I started doing that and switched15:48
pabelangerlet me update15:48
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671715:49
pabelangerclarkb: ^15:49
dmsimardclarkb: in the current form of your patch I'm just worried about it backfiring in unexpected ways, but I'm not coming up with any examples to express my concerns.. closest I can come up with is matching .tar.gz. You mentioned the regexes are new, should we take that out and be explicit instead ? I guess we're here discussing this because the regex is matching unexpected things.15:50
*** xingchao has quit IRC15:50
*** gongysh has joined #openstack-infra15:51
*** xingchao has joined #openstack-infra15:51
*** gongysh has quit IRC15:51
clarkbdmsimard: it is two things though, one is the regex overmatching. The other is we have broken the (somewhat loose) contract we had around the file metadata we inject to elasticsearch15:51
clarkbI think this addresses both, whereas dropping the regex would only address one?15:51
clarkband as for things like tar those aren't valid indexable files anyways so would fail either way15:52
dmsimardWhat contract is that ?15:52
pabelangerdmsimard: clarkb: jeblair: fungi: mordred: AJaeger: we likely need to have some discussion around https://review.openstack.org/513506/ (Remove fetch-zuul-cloner from base job); should I add that to today's infra meeting or is that something we could hash out outside of the meeting?15:52
clarkbdmsimard: we've always indexed with filename and tags dropping the .gz even if that is actually the name on disk15:52
dmsimardpabelanger: discussing at the meeting is probably fair game15:52
clarkbdmsimard: because logically the file is foo.txt not foo.txt.gz and our webserver honors that as well15:52
dmsimardclarkb: console.html wasn't gzipped by default, was it ?15:52
dmsimardhm, yeah other files, maybe15:53
clarkbother files were; console.html wasn't gzipped up front but was eventually15:53
clarkband the webserver would serve console.html and console.html.gz from the same source15:53
dmsimardyeah due to mime types and live decompressing15:53
dmsimardand rewrite rules15:53
dmsimardI'm familiar with those bits..15:54
dmsimardneed to pick up my kids from school for lunch, I'll try to think about it15:54
clarkbpabelanger: logstash fix lgtm. Lets see if anyone else is willing to review that quickly15:55
clarkb(otherwise I think you can single approve)15:55
*** jbadiapa has joined #openstack-infra15:55
*** Apoorva has joined #openstack-infra15:56
clarkbpabelanger: I think we can talk about zuul cloner things in today's meeting, go ahead and add it as a zuulv3 subtopic15:56
*** Apoorva has quit IRC15:56
clarkbI expect the meeting will be relatively quick?15:56
clarkbI've got to pack :) so here is hoping15:56
*** Apoorva has joined #openstack-infra15:56
*** smatzek has quit IRC15:57
*** ihrachys has joined #openstack-infra15:57
*** Apoorva has quit IRC15:57
*** xingchao_ has joined #openstack-infra15:57
*** gongysh has joined #openstack-infra15:57
*** gongysh has quit IRC15:57
*** slaweq has quit IRC15:58
pabelangerk, added15:58
*** xingchao has quit IRC15:58
*** smatzek has joined #openstack-infra15:59
*** janki has quit IRC16:00
pabelangeralso added removal of jenkins to topic too16:00
*** bnemec has quit IRC16:00
*** iyamahat has quit IRC16:00
clarkbfyi for others packing: sydney weather is supposed to be damp and relatively cool so don't be tricked by recent news of a heat wave16:01
*** smatzek_ has joined #openstack-infra16:01
* clarkb goes to find rain jacket16:01
*** xingchao_ has quit IRC16:02
*** yamamoto has joined #openstack-infra16:02
*** yamamoto has quit IRC16:02
*** smatzek has quit IRC16:03
*** jaosorior has quit IRC16:03
pabelangerya, I haven't looked at weather16:04
panda|ruckcloud-y with a chance of tarballs.16:05
* fungi will pack a hat16:06
*** david-lyle has joined #openstack-infra16:09
*** esberglu has quit IRC16:09
mwhahahaseeing errors in the gate16:09
inc0hey, I can't figure out what happens in our mariadb setup - did you change iptables in gates during zuulv3 transition?16:10
mwhahahano logs, just jobs in 'error'16:10
jeblairmwhahaha: zuul should have more detailed information on what the error is when it reports on the change16:10
mwhahahajeblair: http://zuulv3.openstack.org/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet16:10
mwhahaha40416:11
mwhahahafrom status page16:11
mwhahahajeblair: see 485172,516:11
jeblairmwhahaha: yeah, it doesn't have a log url16:11
*** bh526r has quit IRC16:11
jeblairmwhahaha: we'll know more when it reports16:11
*** xingchao has joined #openstack-infra16:11
*** smatzek_ has quit IRC16:12
*** edmondsw has quit IRC16:12
*** smatzek has joined #openstack-infra16:12
jeblair(those errors could probably be added to the status.json, but i don't think they are there now)16:12
dmsimardjeblair: not sure if it's related but I'm seeing an error for another job that did report back -- https://review.openstack.org/#/c/516397/ : openstack-tox-linters openstack-tox-linters : ERROR Project openstack/requirements does not have the default branch master16:13
dmsimardthat seems like an odd message16:13
jeblairdmsimard: could be.  i wonder if it's fallout from the executor restart.16:13
AJaegerwhat is this? " build-tox-manuals-checkbuild build-tox-manuals-checkbuild : ERROR Project openstack/requirements does not have the default branch master" - change  https://review.openstack.org/51669616:13
dmsimardjeblair, AJaeger: I just did a recheck on https://review.openstack.org/#/c/516397/ -- let's see if it reproduces16:14
AJaegerand also on  https://review.openstack.org/51639716:14
dmsimardAJaeger: we were just discussing that, yes16:14
AJaegerdmsimard: sorry, read backscroll - found bug, pasted - and you beat me to it ;)16:14
*** bnemec has joined #openstack-infra16:15
AJaegerclarkb: could you put the infra-publishing change on your review queue again, please? https://review.openstack.org/51601016:16
jeblairmwhahaha: seems likely the errors you're seeing are the same thing -- a transient issue caused by the zuul restart16:16
mwhahaha:/16:16
*** trown is now known as trown|lunch16:16
mwhahahait's really screwing with the gate, is there some sort of recovery that can be done?16:17
openstackgerritRonelle Landy proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag Also testing dlrn_hash_tag_newest set to specific hash.  https://review.openstack.org/51662416:17
mwhahahabefore the restart we were like 20hours behind16:17
*** panda|ruck is now known as panda|ruck|bbl16:18
jeblairmwhahaha: we could sacrifice the change at the head of the gate to force a reset on all the changes behind it16:18
fungior promote a change to cause it all to reshuffle16:19
mwhahahai already sacrificed a whole bunch of puppet jobs16:19
jeblairfungi: yeah16:19
* fungi is unsure whether promoting the first change would actually restart jobs16:19
jeblairfungi: i don't think so16:20
jeblairbut promoting the second one ahead of the first should16:20
*** bhavik1 has joined #openstack-infra16:20
AJaegerthe first three in the queue are looking fine, aren't they? So better let them through?16:20
jeblairAJaeger: our only tool is moving it to the top16:20
jeblairAJaeger: we could wait i guess16:20
mwhahahameh i guess we'll just have to deal16:20
mwhahahajust really frustrating16:21
AJaegerjeblair: at this time: I would wait - otherwise we lose the first two...16:21
jeblairmwhahaha: you don't like any of the proposed options?16:21
*** smatzek has quit IRC16:21
mwhahahanot really since we haven't been able to merge anything in like a day16:21
fungiwell, we can give promote a list for which the first three are unchanged but swap the fifth for the fourth16:21
mwhahahahow about retry if error :/16:21
jeblairfungi: oh we can?16:21
*** smatzek has joined #openstack-infra16:21
fungipromote used to at least take a list of changes16:22
jeblairmwhahaha: these are "permanent" errors16:22
pabelangerjust looking into zuul-executors, we look to be swapping 2GB-3GB (on average) across all of them16:22
AJaegerfungi: that might just work...16:22
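[A hedged sketch of the promote invocation being weighed here: move the listed changes to the front of the gate queue. The flag names are assumed from the zuul RPC client and the change ids are the ones mentioned in this conversation, used purely for illustration.]

    zuul promote --tenant openstack --pipeline gate \
        --changes 514330,2 511509,7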
dmsimardunrelated and don't want to sidetrack but wanted to point out maybe there's an issue with nodepool? looking at grafana, we're capping at ~830 nodes.. I see an uptick in failures, it seems like there's a lot of nodes in "ready" state perhaps not being allocated to jobs in a timely fashion ?16:22
dmsimardwe should be capping at higher than 830 nodes, we saw north of 900 easily after we shifted back inap's capacity to v316:22
jeblairmwhahaha: so what change would you like moved to position #4?16:23
mwhahahalet me look16:23
jeblairShrews: can you look into that with dmsimard ?16:23
dmsimardShrews: http://grafana.openstack.org/dashboard/db/nodepool16:24
*** shardy is now known as shardy_afk16:24
pabelangerI'm also thinking we might either need to bump load governor up, or stand up a few more executors. We look to have a fair bit of ready nodes atm16:24
dmsimard185 ready nodes and 74 failed nodes.. with a bunch of queued jobs in zuul, there's something going on for sure :/16:24
mwhahahajeblair: can we promote a failed one? 51150916:25
*** bhavik1 has quit IRC16:25
dmsimardpabelanger: oh, that's a good point. Perhaps we have a bunch of nodes available to be queued but current executors are too loaded ?16:25
fungimwhahaha: that one looks like it declares a dependency on 51036316:25
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Increase github delay to 10 seconds  https://review.openstack.org/51581216:25
jeblairmwhahaha: yeah, it doesn't really matter. as soon as whatever is in position #4 changes, everything behind it will restart.16:25
dmsimardpabelanger: doesn't explain the failed nodes, but does explain the ready nodes16:25
pabelangerdmsimard: possible, I am trying to see if that is the case16:25
mwhahahafungi: no that declared a dep on 512082 which is already merged16:26
dmsimardpabelanger: no load graphs on cacti for ze's :(16:26
pabelangereach time we restart an executor, there is a large spike in load on the others, which makes sense16:26
clarkbinc0: we changed how the iptables are configured, but they should be wide open between all test nodes still (it's just ansible doing it instead of a nodepool ready script now)16:26
pabelangerso need to wait a bit for things to even out16:27
*** smatzek has quit IRC16:27
jeblairdmsimard: we have graphs http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all16:27
fungimwhahaha: oh, you're right it was showing a tree where 510363 was the closest working change16:27
clarkbinc0: if the jobs fail, you should be able to see in the job-output.txt and in ara whether the playbooks that set up the rules between the nodes ran16:27
dmsimardjeblair: oh, I wasn't looking at the right place, thanks16:27
dmsimardjeblair: was looking at, example, http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=55816:27
*** smatzek has joined #openstack-infra16:27
*** iyamahat has joined #openstack-infra16:27
jeblairdmsimard, pabelanger: this new graph may be helpful: http://graphite.openstack.org/render/?width=586&height=308&_salt=1509467265.42&target=stats.gauges.zuul.executors.accepting16:28
fungimwhahaha: jeblair: at any rate, something seems to have kicked the change after 514330 out of the gate so everything behind restarted anyway16:28
jeblairi haven't grafana'd that16:28
pabelangerjeblair: Yah, that is neat16:28
mwhahahafungi: jeblair ok well i'll just keep an eye on it16:29
inc0clarkb: what I'm experiencing (and can't figure out what's wrong) is timeouts between multinode stuff16:29
inc0like galera16:29
jeblairmwhahaha: do you want me to reshuffle to fix 514330, or leave it?16:29
mwhahahajeblair: just leave it16:29
inc0so while the iptables setup runs well (I assume), it might close something I need16:29
jeblairmwhahaha: ok16:29
inc0I'm also trying with tunnel playbook, but doesn't seem to help16:29
*** esberglu has joined #openstack-infra16:29
*** kjackal_ has quit IRC16:30
pabelangerdmsimard: we also have ~135 ready, but locked nodes in nodepool. So we are likely waiting for the node request to be fulfilled before unlocking16:30
*** kjackal_ has joined #openstack-infra16:30
clarkbinc0: the overlay you mean?16:30
inc0yeah16:30
clarkbinc0: do you have an example failing job we can look at logs for?16:30
AJaegerdmsimard, jeblair recheck did not help - still "  ERROR Project openstack/requirements does not have the default branch master" on https://review.openstack.org/51639716:31
jeblairAJaeger: i'll dig into it16:31
pabelangerdmsimard: so, it is possible that nodepool is waiting for jobs to finish, so it can launch more nodes16:31
AJaegerthanks, jeblair16:31
AJaegerbbl16:32
*** smatzek has quit IRC16:32
*** felipemonteiro_ has quit IRC16:32
*** salv-orlando has joined #openstack-infra16:32
*** salv-orlando has quit IRC16:33
Shrewsdmsimard: that grafana graph, i believe, can be a bit misleading. a node can be READY and already assigned to a request, but there could be a wait on other nodes needed for the request16:33
*** ijw has joined #openstack-infra16:33
pabelangeryah, that's basically what is going on ATM16:33
inc0clarkb: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/ for example16:33
*** smatzek has joined #openstack-infra16:33
inc0in this ps mariadb is single node, but it fails in a different place16:33
*** ramishra has quit IRC16:33
inc0also timeout16:34
inc0http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/ansible/deploy <- this is deployment log16:34
dmsimardinc0: iptables is literally set up to accept any traffic from nodes in a multinode set, let me pick you up the set of rules16:35
Shrewsdmsimard: if you have a specific review in mind, or anything specific, really, i don't mind digging. but it's hard to give an explanation of the general state of things16:35
dmsimardShrews: there's contention for the amount of ready nodes and I think we understand that part. I was also asking about the seemingly high amount of failed nodes16:36
clarkbinc0: so it is trying to hit http://172.24.4.250:35357 and failing? where do we see that ip address is assigned?16:36
*** jascott1 has joined #openstack-infra16:37
dmsimardinc0: http://logs.openstack.org/36/509436/6/gate/multinode-integration-ubuntu-xenial/7a9df40/job-output.txt.gz#_2017-10-23_22_40_18_20100216:37
*** salv-orlando has joined #openstack-infra16:37
dmsimardinc0: are you using switch/peer groups ?16:37
jeblairAJaeger: it looks like that executor's internal git repo for openstack/requirements is corrupted16:37
jeblairi suspect that's either related to the unclean shutdown/startup, or my cleaning up after it16:38
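[One way to confirm and clear a corrupted executor git cache like the one described above; the path follows the executor-git layout noted later in the log, and the assumption is that the executor simply re-clones the repo the next time a job needs it.]

    # check the cached copy for corruption, then remove it so it is re-cloned
    sudo git -C /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements fsck
    sudo rm -rf /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements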
clarkbdmsimard: ya they are in the inventory16:38
clarkbdmsimard: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/zuul-info/inventory.yaml16:38
pabelangerShrews: dmsimard: the failed nodes are because we've hit quota on the cloud. So maybe an issue with calculating that?16:38
*** smatzek has quit IRC16:38
pabelangereg: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already used 1531904 of 1536000 ram16:38
*** pvaneck has joined #openstack-infra16:38
Shrewspabelanger: dmsimard: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already used 1531904 of 1536000 ram16:38
dmsimardpabelanger: leaked nodes perhaps ?16:38
Shrewsseeing that in nl01 logs16:38
pabelangeryah, same16:39
clarkbinc0: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 198.72.124.138. Set the 'ServerName' directive globally to suppress this message I think that is the problem16:39
clarkbinc0: from http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/keystone.txt.gz16:39
pabelangerI can check for leaked nodes in IAD quickly16:39
clarkbinc0: would have to double check the vhost but if apache thinks it is serving from a different name than the one you are hitting that could explain it16:39
Shrewspabelanger: nodepool doesn't calculate anything wrt cpu or ram16:40
clarkbinc0: ya the listens are all off at http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla_configs/keystone/wsgi-keystone.conf too16:40
clarkb172.24.4.1 != 172.24.4.25016:40
*** catintheroof has joined #openstack-infra16:40
pabelangerShrews: yah, maybe a leak or max-server isn't quite correct16:41
Shrewspabelanger: possible that max-servers is set too high?16:41
clarkbso I think apache is just not listening on the IP that would make this work16:41
Shrewsyeah, that16:41
*** vsaienk0 has quit IRC16:41
*** jascott1 has quit IRC16:41
*** armaan has joined #openstack-infra16:41
dmsimardclarkb: those are internal bridged IPs, right ? We don't set those up in the firewall explicitly but in practice the traffic goes in and out of the private IPs so it should go through fine I think16:42
pabelangerclarkb: can I delete you clarkb-test-centos7 in rax-iad?16:42
*** sree has joined #openstack-infra16:42
clarkbpabelanger: yes that should be fine16:42
pabelangerkk16:42
clarkbdmsimard: ya I think the firewall is fine, its the process not listening on the right ip:port combo16:42
*** jascott1 has joined #openstack-infra16:43
inc0clarkb: .250 will be handled by haproxy16:43
inc0that's why16:43
pabelangeryah, I see a few vms in rax-iad that looks to be in error state16:43
pabelangergoing to try and clean them up16:43
clarkbinc0: do you have more logs than http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/haproxy.txt.gz ?16:44
Shrewspabelanger: also worth noting that the launcher restart a few moments ago would have eliminated a couple of situations where we could have some instances essentially sticking around much too long.16:44
*** Apoorva has joined #openstack-infra16:44
pabelangerShrews: ok16:45
clarkbaha http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla/haproxy/haproxy_latest.20171025.b55c6793b5c7f834e.txt.gz16:45
*** catintheroof has quit IRC16:45
*** sree has quit IRC16:47
inc0clarkb: let me retry full mariadb cluster so you'll see it16:47
inc0issue wasn't with haproxy tho because turning it off didn't help with mariadb16:47
inc0it's node->node timeout over 172... ips that failed16:48
*** yamahata has joined #openstack-infra16:48
dmsimardShrews: fwiw tobiash has a stack of patches around switching from "max-servers" to use quotas instead, https://review.openstack.org/#/c/503838/16:48
clarkbinc0: right but you aren't really going node to node over 17216:48
clarkbinc0: you are going node to proxy to node to docker16:48
clarkbinc0: and any one of those pieces could be broken16:48
clarkb(actually its docker to node to proxy to node to docker)16:48
inc0we use net=host in docker so network stack isn't dockerized16:48
dmsimardShrews: (which I'm very excited about)16:48
inc0we don't use docker proxy or docker networking at all16:49
clarkbinc0: so haproxy is the only address rewriting involved?16:49
inc0yes, well that and keepalived16:49
inc0keepalived creates .250 ip and handles HA16:50
inc0(on host)16:50
dmsimardjeblair: so we had one particular executor with a corrupted repository and that's it ?16:50
inc0haproxy listens on .250 and forwards to .1 .2. .316:50
Shrewsdmsimard: yes. i think we're in a position to begin considering those now16:50
jeblairdmsimard: i'm going to check all of them16:50
jeblairsudo ansible 'ze*' -m shell -a 'whoami' --become-user zuul16:51
jeblairwhy does that report 'root' ?16:51
Shrewsjeblair: i think you also need --become16:51
dmsimardjeblair: perhaps .ssh/config defaults to root ? and what Shrews said16:51
jeblairShrews: yep, thanks :)16:52
Shrewsjeblair: you'd think the user one would imply the other, but... not so much16:52
jeblairyeah, i guess it's a really strict correspondence with commandline args and module args16:52
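[The corrected ad-hoc invocation per the exchange above: --become-user only takes effect when --become is also supplied.]

    sudo ansible 'ze*' -m shell -a 'whoami' --become --become-user zuul
    # expected per-host output: zuul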
clarkbinc0: ok the haproxy log gives us clues; on each node it appears it can talk to the keystone running on itself but not the others16:52
pabelangerShrews: did you want to take a peek into infracloud-chocolate, a large portion of its nodes are available currently: http://grafana.openstack.org/dashboard/db/nodepool-infra-cloud I haven't checked why that is16:52
mwhahahastill getting errors :/16:53
clarkb(just based on the health checks)16:53
*** smatzek has joined #openstack-infra16:53
Shrewspabelanger: looking16:53
inc0clarkb: problem is, even if I turned off haproxy it fails16:53
dmsimardmwhahaha: jeblair identified the problem and is working on it16:53
jeblairmwhahaha: yeah, i think there's a problem on at least one executor, i'm working on a solution16:53
*** smatzek has quit IRC16:53
mwhahahak16:53
inc0clarkb: let this latest patchset run16:53
jeblairmwhahaha: i'll be happy to promote changes once it's fixed; i'll let you know16:53
inc0I just uploaded new one with multinode galera16:53
clarkbinc0: I don't think we need galera...16:54
*** smatzek has joined #openstack-infra16:54
inc0that's basically setup I'd like to test16:54
inc0well, we do, galera deployment is part of our code and we want to gate it16:54
clarkbinc0: I mean to debug this problem16:54
clarkbit exists with or without galera16:54
inc0right, but galera makes it appear earlier16:54
inc0anyway, removing haproxy has the same effect16:55
inc0removing haproxy -> pointing all API traffic to .116:55
mnaserare you deploying keepalived in the gate across 3 vms?16:55
inc0yeah16:55
mnaserso how can the 2 other nodes reach the vip16:56
inc0well they should, I use tunnel overlay16:56
mnaserbecause usually openstack won't let traffic originating from an IP that does not belong to an instance leave the instance16:56
inc0that's why I'm using tunnel overlay;)16:57
mnaseroh, if you have an overlay for that then im not sure, i'd consider MTU issues because now you're doing tunnel in tunnel16:57
clarkbinc0 Oct 25 23:40:24 ubuntu kernel: [  720.595795] iptables dropped: IN=brinfra OUT= MAC=e2:9f:77:40:45:4a:fa:58:48:a1:33:49:08:00 SRC=172.24.4.3 DST=172.24.4.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25069 DF PROTO=TCP SPT=41312 DPT=35357 WINDOW=28200 RES=0x00 SYN URGP=016:57
inc0I checked mtu16:57
mnaseras some cloud providers don't give you a real L2 network but a tunneled one16:57
mnaserso the MTU might change from one provider to another16:57
*** bnemec has quit IRC16:57
inc0clarkb: sounds like problem there16:57
clarkbinc0: reading that I think the problem is likely that the firewall rules are only updated for the actual host IPs and not for the overlay16:57
clarkbdmsimard: ^ does the overlay role update firewall rules too?16:58
clarkbdmsimard: if not we probably want it to16:58
*** rbrndt has quit IRC17:00
*** baoli has quit IRC17:00
dmsimardclarkb: that's what I said earlier, no ? said we didn't add bridge IPs, only nodepool private ips17:00
clarkbdmsimard: oh sorry I misread it17:01
*** baoli has joined #openstack-infra17:01
dmsimardclarkb: just making sure I'm not crazy http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-31.log.html#t2017-10-31T16:42:1017:01
dmsimarddo we need to add the bridge IPs to the firewall, then ?17:01
clarkbyes we will17:01
pabelangerclarkb: dmsimard: Shrews: okay, clean up of rax-IAD underway, looks like we did leak some old nodes. maybe during cutover to zuulv317:01
*** jcoufal has quit IRC17:01
Shrewspabelanger: i'm sort of suspecting zuul has those chocolate ready nodes locked and is not doing anything with them17:02
clarkb(I read private IPs as the thing used by the overlay but you mean the actual cloud provided private IPs)17:02
dmsimardclarkb: ok, I'll get that done17:02
dmsimardbrb17:02
clarkbdmsimard: and rather than doing ip to ip we just want to open the whole range up17:02
pabelangerShrews: oh, could it be because executors stopped accepting jobs, due to high load?17:02
clarkbso that inc0 can use .250 for haproxy17:03
Shrewspabelanger: the nodes are assigned, the request is gone, so nodepool is waiting for zuul to change their state17:03
*** yamamoto has joined #openstack-infra17:03
*** hashar is now known as hasharAway17:03
pabelangerkk17:03
Shrewspabelanger: probably17:03
*** jcoufal has joined #openstack-infra17:03
pabelangerShrews: thanks! I'll dig into that more17:03
pabelangeronce I finish clean up17:03
jeblairdmsimard: ze 02,03,04 have ara 0.14.0.  ze 05,06,07,08 have 0.14.2.  ze 01,09,10 have 0.14.4.17:03
jeblairdmsimard: i believe those are clustered by installation date17:03
jeblairdmsimard: what's the version you just released?17:03
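[A hedged way to survey the version skew jeblair describes, reusing the ad-hoc pattern from earlier in the log; this assumes ara is pip-installed system-wide on the executors.]

    sudo ansible 'ze*' -m shell -a 'pip show ara | grep ^Version'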
clarkbdmsimard: I think `sudo iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT` is what devstack-gate used to do17:05
openstackgerritJames E. Blair proposed openstack-infra/puppet-zuul master: Ensure ara is updated on executors  https://review.openstack.org/51674017:05
jeblairclarkb, fungi, pabelanger: ^ is that okay to do?17:05
jeblairi can never remember what works and doesn't.17:06
*** rkukura has quit IRC17:06
jeblairdmsimard: ^17:06
pabelangeryah, syntax looks correct17:06
dmsimardjeblair: 0.14.517:06
clarkbjeblair: that should be ok but it will update all the deps too (which may cause problems like with the subunit2sql thing fungi ran into)17:06
inc0also fyi, patchset with dockerhub publishing is up17:06
inc0we'll move our gates to dockerhub + proxy as soon as first images appear upstream17:07
inc0to get rid of tarballs17:07
dmsimardjeblair: we don't run a pip upgrade of zuul which would trigger an update of its dependencies ?17:07
inc0proxy == https cache in nodepools17:07
dmsimardbrb...17:07
jeblairdmsimard: ara is not a zuul dependency17:07
dmsimardOhhh17:07
jeblair(it's optional)17:07
*** rkukura has joined #openstack-infra17:08
jeblairwe could probably work something out with those requirement tag thingies... i'm not sure if that whole pipeline works now...?17:08
*** baoli has quit IRC17:08
pabelangerokay, ready nodes are dropping now that quota in rax-iad is coming available again17:09
clarkbdmsimard: oh except that may only work with the old linux bridge things and not ovs, looking into how we set it up for ovs17:09
jeblairtag isn't the right word for that... what's the word they use?17:09
*** baoli has joined #openstack-infra17:09
clarkbjeblair: extras17:09
jeblairclarkb: yeah that's it!17:09
*** edmondsw has joined #openstack-infra17:10
*** yamamoto has quit IRC17:11
clarkbdmsimard: I'm actually not seeing any explicit allows since we switched to ovs. I think that is because neutron/nova net manage their own IPs and firewall rules on that range17:11
*** tosky has quit IRC17:11
clarkbdmsimard: so we didn't actually need to manage it directly which would explain why inc0 is having problems too but devstack didn't17:11
clarkbwe don't run the control plane on devstack multinode on the overlay17:11
pabelangerclarkb: so, to loop back to queue window size and related node usage at PTG. If we support a negative window size on failures, i think that would help with the wasting of CI resources?  EG: gate resets, 20 patches reset and suck up all the nodes17:12
clarkbwe only run the VM networks there17:12
*** lucasagomes is now known as lucas-afk17:12
*** catintheroof has joined #openstack-infra17:12
clarkbinc0: fyi ^ what is the motivation for using the overlay for the control plane here?17:12
*** bnemec has joined #openstack-infra17:12
inc0clarkb: well, I had same mariadb timeouts before overlay, so I asked here and you guys suggested overlay17:12
clarkbpabelanger: you mean 0 size? you can't really have a negative size17:12
inc0overlay is good for us also because we can use keepalived+haproxy too17:13
pabelangerclarkb: by sliding the window down to 10, it reduces the amount of nodes it grabs each reset17:13
jlvillalIs there a little command line utility that I can point at the 'zuul.d/' directory in our repo and it will print out what jobs should run for master, stable/pike, stable/ocata. And if they are voting, non-voting, or experimental.17:13
clarkbinc0: those should all run over layer 3 right?17:13
* jlvillal doesn't ask for much17:13
clarkbpabelanger: so my issue with that is the gate is supposed to pass. If it doesn't that is the bug not the window size which is already a hack around bad jobs17:13
inc0well, neutron underneath will not allow keepalived floating ip to work over regular network17:13
pabelangerclarkb: I mean, start at 20 today, and ramp up to 40 and down to 1017:13
jlvillalI'm trying to port what is in master to our stable branches.17:13
inc0neutron of nodepools17:14
clarkbpabelanger: why not just change the minimum to 10?17:14
clarkbpabelanger: it will slide up to 20 if your jobs pass17:14
inc0also L2 connectivity will generally be more similar to prod so it's a little added benefit17:14
pabelangerclarkb: yah, I mean setting to 10 might be easier. If we want to do that17:14
clarkbinc0: oh right you need a shared IP then ya you need an overlay17:14
inc0we might add second overlay for vms later;)17:14
jeblairpabelanger: what problem are you trying to solve?17:14
clarkbinc0: dmsimard ok considering that I think what we want is an option to the overlay network role to open up the entire range between the nodes eg 172.24.4.0/23 can talk to 172.24.4.0/2317:15
clarkbinc0: dmsimard default it to not do that as that is the old behavior and some things (like neutron) want to manage the rules themselves17:15
jeblair#status log restarted all zuul executors and cleaned up old processes from previous restarts17:15
openstackstatusjeblair: finished logging17:16
clarkbbut then inc0 can set that to true and get it working for control plane on overlay17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/python-glanceclient on ze0517:16
openstackstatusjeblair: finished logging17:16
fungijeblair: yeah, digging through logs and the source for puppet's pip package provider, i confirmed that using ensure=>latest will cause not only the named package to be updated but all its dependencies will be unconditionally updated to the latest versions on pypi (even if you've preinstalled sufficient versions as system packages) due to using the default upgrade strategy rather than the only-if-needed strategy17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/neutron on ze1017:16
openstackstatusjeblair: finished logging17:16
pabelangerjeblair: minimize tripleo change pipeline resetting and consuming all the nodes, it has been happening pretty often the last few weeks. I know they are trying to fix the issues, but still making progress17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements on ze0717:16
openstackstatusjeblair: finished logging17:16
inc0clarkb: just add new role for iptables17:16
inc0we can call it explicitly17:16
*** gouthamr has joined #openstack-infra17:16
jeblairpabelanger: can you elaborate on "consuming all the nodes"?17:17
pabelangerdmsimard: Shrews: okay, rax-iad is happy again. Looking at rax-ord now17:17
jeblairmwhahaha, AJaeger, dmsimard: all the executors should be repaired.  should i promote any changes?17:17
*** markvoelker_ has joined #openstack-infra17:17
AJaegerjeblair: thanks - no requests from my side17:17
clarkbinc0: except it's fairly tightly coupled here, I don't think a new role is right because people will miss it when adding this role17:17
clarkbinc0: instead its an option of the overlay17:18
inc0either way works, thanks!17:18
jeblairfungi: and that does not happen on initial installation?17:18
AJaegerjeblair: what about integrated gate? Promote 515702 ?17:19
fungijeblair: it does not with ensure=>present because it just calls pip install without --upgrade17:19
*** markvoelker has quit IRC17:19
fungipip thinks --upgrade means "upgrade everything" unless you supply --upgrade-strategy=only-if-needed17:19
pabelangerjeblair: currently, we're up to 470 centos-7 nodes, which I haven't calculated but likely used mostly by tripleo jobs. Since those job run times are pretty long, each time their gate resets a large amount of resources gets wasted and jobs have to start again.  So, I was looking for a way to see how we could decrease the window size until that change queue becomes happy again.17:20
dmsimardjeblair: for my own curiosity, were you able to tell if (strangely) just openstack/requirements affected ?17:20
fungithe latter will upgrade the named packages but only upgrade their dependencies if the installed versions are insufficient17:20
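[A shell illustration of the three pip behaviors fungi contrasts, using ara as the example package since it is the one at issue; the comments describe the pip 9-era defaults discussed here.]

    pip install ara     # ensure=>present equivalent: installs if missing, never upgrades
    pip install -U ara  # ensure=>latest today: upgrades ara and ALL of its dependencies
    pip install -U --upgrade-strategy only-if-needed ara
                        # upgrades ara, touches deps only when installed versions are insufficient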
pabelangerjeblair: the delay has been pushing 24hrs for the last week or so17:20
jeblairdmsimard: i status logged the repos i repaired17:20
AJaegerdmsimard: see the #status log above17:20
dmsimardjeblair: oops, didn't read far back enough, thanks17:20
*** Swami has joined #openstack-infra17:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor  https://review.openstack.org/51653217:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Check start time for wait_time key  https://review.openstack.org/51646517:21
*** markvoelker has joined #openstack-infra17:22
jeblairpabelanger: "wasting" is an abstract concern -- what's the problem?17:23
*** markvoelker_ has quit IRC17:23
jeblairhelp me understand the concrete problem that needs solving17:23
dmsimardjeblair: tripleo is consuming more resources because the gate keeps resetting for different reasons, not all within their control17:23
dmsimardis the gist of it17:23
jeblairdmsimard: more than what?17:23
*** e0ne has quit IRC17:23
dmsimardmore than if they merged the first time, rather than be rechecked/requeued several times17:24
*** salv-orlando has quit IRC17:24
clarkb470 nodes would be ~half our capacity right?17:24
*** felipemonteiro has joined #openstack-infra17:24
*** salv-orlando has joined #openstack-infra17:25
jeblairdmsimard: not really; a gate reset releases old resources and consumes new ones, so the consumption stays constant17:25
*** tmorin has quit IRC17:25
*** felipemonteiro_ has joined #openstack-infra17:25
pabelangerjeblair: Sure, our check pipeline is currently 231 changes deep, and I wanted to see how we can get more nodes for it.  My _gut_ is saying that because the change queue for tripleo is resetting every 2 hours, that is the reason we are backing up check17:25
dmsimardjeblair: okay, I think we're understanding each other but using different vocabulary17:25
pabelangerjeblair: however, it isn't a problem. Since we eventually get nodes into check after the gate reset17:26
clarkbmy initial guess is that the runtime of tripleo jobs is what is making this potentially problematic17:26
jeblairpabelanger: okay, a check backlog is something that can be addressed by reducing gate node usage (which can be done by shrinking the window when things are bad)17:26
clarkbbecause check is at a lower priority than gate so when tripleo holds half our capacity in gate then resets they get to keep holding that half for significant periods of time17:27
jeblairpabelanger: i'd caution against evaluating the backlog right now since i just reset the entire system twice this morning17:27
dmsimardclarkb: it's a bit of a vicious circle, yes17:27
clarkbwe give gate a higher priority because those jobs should never fail17:27
clarkband so in theory have good throughput17:27
jeblairwe had a 100 change backlog in check and 40 in gate when i restarted the first time.  when i did that, we lost about 800 CPU hours of computation.  and then i did it again.17:27
clarkb(also merging things is important)17:27
pabelangerjeblair: sure, understood.17:27
*** salv-orlando has quit IRC17:29
fungithe main question i have is why is tripleo's shared change queue so much larger than the others in the gate? some combination of excessive job runtimes, more frequent job failures and tighter coupling between a greater number of repos than most other openstack projects?17:29
*** felipemonteiro has quit IRC17:29
*** jpich has quit IRC17:29
dmsimardmwhahaha, EmilienM ^17:29
fungior do they actually push that many more changes than other teams?17:29
*** pcaruana has quit IRC17:29
jeblairfungi: i'd say mostly the first 2 right now.  the project diversity doesn't seem to be a big issue.17:29
*** sree has joined #openstack-infra17:29
*** trown|lunch is now known as trown17:30
pabelangeryah job failures and long runtimes is the current state17:30
odyssey4meoh dear, did zuul restart?17:30
odyssey4medid I break it again?17:30
fungior is it also that they've held off approving changes due to tripleo cloud outages, and are working through an approval backlog now?17:30
pabelangerhowever, that is something we can't currently fix, that would be done on the tripleo side17:30
*** camunoz has quit IRC17:30
dmsimardodyssey4me: zuul restarted, likely not your fault :)17:31
mwhahahathere are many reasons for the long queue today, most of which are not necessarily tripleo failures17:31
fungijust trying to figure out whether it's expected to be perpetual or whether this is temporary while tripleo catches up on pending change approvals17:31
*** dhinesh has joined #openstack-infra17:31
mwhahaha1) zuul reset (and subsequent errors) has contributed to the length, 2) puppet jobs mixed in also got hit with a gem update that broke unit tests17:32
pabelangerokau, moving to rax-ord clean up17:32
mwhahahawe aren't approving more things than normal afaik17:32
jeblairpabelanger: anyway, if you want to lower the min window because check is too backlogged, i will probably be fine with that, i'd just ask that you not make that evaluation based on the backlog right now which we know is not representative.17:33
*** ijw has quit IRC17:33
mwhahahaI've asked that we start actively tracking the gate failures better, it's really hard to tell once they clear the dashboard what failed where17:33
jeblairpabelanger: it would be good to know what the backlog is under normal circumstances, and how the min-window change would be expected to affect that17:33
*** ijw has joined #openstack-infra17:33
mwhahahai have noticed there does seem to be a slower response time in updating the status of the jobs on the v3 dashboard as opposed to the older one so i'm wondering if the constant churn is also not helping that17:33
fungimwhahaha: "zuul reset" isn't a cause, merely a symptom. aside from the handful we just had due to the corrupt requirements git repository on one of the executors, most of those presumably go back to a generally higher failure rate for tripleo jobs i suppose? otherwise we'd see similar backlog for other projects17:34
*** ijw has quit IRC17:34
clarkbmwhahaha: in theory that is what the health dashboard should tell you (what failed where)17:34
*** sree has quit IRC17:34
*** ijw has joined #openstack-infra17:34
pabelangerjeblair: is there an easy way to track shared queue reset with statsd? Or maybe something we could start tracking.17:34
Shrewskeep in mind that right now, jobs using the tripleo-centos-7 node label are limited to a single pool that has max-servers of 70. Once those are all in use by long running jobs, other jobs requesting that node will be waiting for those 70 in-use nodes to be released before they even begin17:34
mwhahahafungi: but it's a symptom of something outside of the tripleo world17:34
Shrewsnot sure how relevant that info is, just throwing it out there17:34
mwhahahafungi: the point is that there are zuul or other related problems causing jobs to reset and because the jobs take long it has a bigger impact17:35
mwhahahafungi: not related to things specific to the tripleo world, so because our jobs take longer the impact is greater on our queue17:35
fungimwhahaha: many (i would wager most?) gate resets are due to job failures17:35
mwhahahafungi: except this morning when the executor was erroring for a few hours17:36
*** felipemonteiro_ has quit IRC17:36
*** tesseract has quit IRC17:36
fungiright, i said aside from that specific incident17:36
*** felipemonteiro_ has joined #openstack-infra17:36
fungiwhich should have affected other projects too17:36
mwhahahait did but they are in their own queue and approve less17:36
mwhahahaso all of ours are more visible because of the single queue17:36
mwhahahai saw a single nova change in the gate17:36
pabelangerpypi.slave.openstack.org is that something we can delete from rax-ord? looks like something from the past we didn't clean up17:36
jeblairShrews: a couple of things mitigate that -- those are only used by check jobs, and only used by tripleo.  so that shouldn't directly affect the gate queue length (but can make getting jobs ready to be gated take longer)17:37
pabelangerthat is nodepool project too17:37
fungiokay, so you are saying you're actually approving more changes than the projects sharing the integrated gate queue17:37
clarkbjeblair: remind me which roles were we moving into devstack? was it the swap setup? network overlay is staying in project-config right?17:37
mwhahahafungi: without actual metrics I cannot say for certain but our queue is larger than the integrated one17:37
mwhahahafungi: all i can speak to is the last 48 hours17:37
fungimwhahaha: agreed, trying to find out how it reached that point17:37
mwhahahafungi: and the failures the queue has had weren't necessarily tripleo specific17:38
fungiit's also just possible other openstack projects are taking the week off in preparation for the summit i guess?17:38
clarkbmulti-node-bridge is in zuul-jobs so it must not be moving17:38
dmsimardclarkb: re: bridge network and iptables -- we're doing this *inside* the image: http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/nodepool-base/install.d/20-iptables17:38
pabelangerhttp://logs.openstack.org/74/467074/7/gate/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-container/84b7572/ just reset tripleo gate17:38
dmsimardclarkb: notice the rules with 172.24.4.0/2317:38
pabelangerfailed to download from github.com looks like17:38
clarkbdmsimard: iirc that is a hack to make ironic agent work17:39
mwhahahafungi: do we have graphite metrics for zuul queue sizes?17:39
pabelangerso, that is one reason for failures, we discussed at PTG not downloading from github any more, but that requires changes to DLRN17:39
fungimwhahaha: the disconnect for me is that if the majority of the issues weren't tripleo-specific, then i'm trying to understand how that isn't impacting other projects equally17:39
jeblairclarkb: i think network overlay should be in ozj or zj17:39
clarkbdmsimard: we can probably clean that up in the future, but aiui ironic nodes must have access to the control plane17:39
pabelangereg: using spec files from RPMS over github.com17:39
clarkbjeblair: ya its zj I was confused17:39
dmsimardclarkb: but we're talking about just whitelisting the entire traffic between that range so that would be taken out, aye ?17:40
openstackgerritMiguel Lavalle proposed openstack-infra/openstack-zuul-jobs master: Remove job neutron-dsvm-api  https://review.openstack.org/51674417:40
jeblairfungi, mwhahaha: it sounds like at least one of the non-tripleo issues is tripleo-focused at least -- the gem failures.17:40
clarkbdmsimard: no not necessarily. That rule is 172.24.4.0/23 to the control plane, which is one cloud's IPs17:40
mwhahahajeblair: no that's puppet-openstack specific17:40
jeblairfungi, mwhahaha: that one is a matter of perspective17:40
clarkbdmsimard: I think we only want to add the rules that allow 172.24.4.0/23 to talk to 172.24.4.0/2317:40
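For context, the rule clarkb is describing amounts to accepting traffic whose source and destination both fall in the overlay range. A rough sketch of the equivalent raw iptables commands (the real change lands in the zuul-jobs Ansible role; the chains and rule positions here are assumptions):
    # allow overlay-to-overlay traffic on the multi-node bridge network
    sudo iptables -I INPUT   -s 172.24.4.0/23 -d 172.24.4.0/23 -j ACCEPT
    sudo iptables -I FORWARD -s 172.24.4.0/23 -d 172.24.4.0/23 -j ACCEPT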
mwhahahajeblair: and because we share queue, it impacted tripleo17:40
pabelangerfungi: clarkb: see my question about pypi.slave.o.o in rax-ord, is that safe to delete17:40
jeblairmwhahaha: you use those modules, right?17:40
clarkbpabelanger: I don't know17:41
pabelangerclarkb: k, I'll add it to meeting17:41
jeblairmwhahaha: i mean, there's a *reason* that queue is shared17:41
mwhahahajeblair: by your argument then we should share neutron, nova, etc17:41
clarkbdmsimard: if you haven't started on the change to multi node bridge I can take a stab at it17:41
clarkbdmsimard: let me know17:41
jeblairmwhahaha: that is in fact my argument but i have compromised17:41
fungimwhahaha: and attempting to ascertain whether whatever is causing tripleo to be singled out at the moment is a long-term problem we need to address systemically to bring tripleo's resource consumption into alignment with other teams, or whether this is a temporary/fleeting ballooning of resource needs which will subside once you work through it17:41
mwhahahaonce again, we need metrics and data so we can get to the bottom17:42
*** armaan has quit IRC17:42
*** salv-orlando has joined #openstack-infra17:42
mwhahahai also do not like this but without help understanding wtf is occurring in zuul over time when i'm not watching it's hard to say17:42
*** camunoz has joined #openstack-infra17:42
dmsimardclarkb: mostly making sure we'd be doing the change in the right place, I searched and couldn't find a reference to "iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT".. closest I found was in actual neutron code17:42
jeblairmwhahaha: at any rate, what i'm trying to say is that those gem failures are a partial answer to fungi's question of why the tripleo queue has been adversely affected recently17:43
*** d0ugal has quit IRC17:43
fungimakes sense17:43
clarkbdmsimard: ya I think that was left over from when we used linux bridges but now it is ovs17:43
mwhahahajeblair: that was this morning and was addressed before the executor problem. it doesn't explain what happened yesterday17:43
clarkbdmsimard: I did a bit of digging myself, pretty sure the rule is lacking to meet inc0's needs17:43
mwhahahathe queue was already in bad shape before that happened17:43
mwhahahathat's just contributed to further delays today17:44
clarkbmwhahaha: is the health dashboard not tracking it for you?17:44
clarkbre metrics17:44
inc0thanks clarkb, dmsimard do you want me to publish patch for it?17:45
fungimwhahaha: poking around in graphite i don't think we have separate meters for every shared change queue17:45
mwhahahafungi: that would be beneficial to have so that we can tell when issues start for RCA17:45
clarkbmwhahaha: http://status.openstack.org/openstack-health/#/17:45
clarkbseems to be tracking tripleo jobs based on the front page17:45
dmsimardclarkb, inc0: I've started something17:45
clarkbdmsimard: cool, thanks17:46
*** Guest95277 has quit IRC17:46
inc0thanks dmsimard17:46
mwhahahaclarkb: yea it's there but i'll have to dig further.17:46
openstackgerritMiguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron  https://review.openstack.org/51672417:47
mwhahahaclarkb: what specifically would be beneficial is a break out of check vs gate17:47
clarkbmwhahaha: aiui its all gate today (no check) just due to volume but mtreinish would have to confirm17:47
mwhahahak i'll have to dig in and see if there's specifics we can point to17:48
mwhahahai recall there being a problem in the dashboard around the pingtest being improperly reported so i'll need to make sure that's still not a problem17:49
fungijeblair: is stats.zuul.tenant.openstack.pipeline.gate.total_changes scaled by 0.01? seems more spiky than i would expect too17:50
dmsimardclarkb: bleh, I hate this but we might need to use a meta dependency. The problem is that the bridge network is not known before the multi-node-bridge role runs and for that role to work, we need to run the firewall role first to authorize the traffic between the nodes17:51
pabelangermwhahaha: I had a patch up to fix that, https://review.openstack.org/495517/ it was because it would set pingtest fail by default, then never run the test17:51
EmilienMlegacy-tripleo-ci-centos-7-nonha-multinode-oooq legacy-tripleo-ci-centos-7-nonha-multinode-oooq : ERROR Project openstack/requirements does not have the default branch master ( found on https://review.openstack.org/#/c/516683/ - stable/newton)17:52
dmsimardEmilienM: we identified the issue and resolved it17:52
fungiEmilienM: that was corrected an hour or so ago17:52
EmilienMdmsimard: ok, I'll run "recheck" in that case17:52
clarkbdmsimard: yes I think you need to just do it in multi-node-bridge completely independent of the firewall role17:52
fungicorrupt git repository cached on an executor17:52
mwhahahapabelanger: yea that's what i'm remembering so not sure since we moved to tempest if we need that as much17:52
*** jascott1 has quit IRC17:52
clarkbdmsimard: i.e. make it a feature of the multinode bridge (and have it be a flag, off by default, that decides if it turns on or not)17:52
pabelangerclarkb: fungi: do you mind reaching out to citycloud, or I can if you have a contact email, about deleting our 7 stuck instances in Kna1. They seems stuck in BUILDING17:53
dmsimardclarkb: we could put it in multi-node-bridge directly, but then the rules wouldn't be persisted (https://review.openstack.org/#/c/513943/)17:53
clarkbdmsimard: I think you can just add a task to main.yaml in multi-node-bridge with a when: flag | bool17:53
*** ccamacho has quit IRC17:53
clarkbdmsimard: oh right that17:53
dmsimardclarkb: let me put up a WIP to explain17:53
*** felipemonteiro__ has joined #openstack-infra17:54
fungipabelanger: i don't know that i have any specific contact off the top of my head--if we do have contact info it'll generally be in our passwords file17:54
clarkbpabelanger: I've included you in earlier emails to them you can use but also our contact for that cloud should be in the passwords file17:54
pabelangerclarkb: k, couldn't remember. let me search mail again17:54
pabelangerwith nodepool launcher errors under control, I'm going to see why we have a large amount of ready nodes now17:55
fungipabelanger: if we have a dashboard login of some kind for them, might make sense to just open a trouble ticket through that since it's likely non-urgent17:55
clarkbdmsimard: can you just call the firewall role again from multi node bridge and pass in the different IPs? I think the problem is today the firewall role assumes the node IPs in inventory right? but we should be able to have it take a list?17:55
clarkbdmsimard: or break out the persist iptables portion of that role and only reuse that bit (that might actually be easiest)17:55
pabelangerfungi: good idea17:55
dmsimardhang on, I'll have a patch up soon17:55
clarkbjeblair: considering you wrote https://review.openstack.org/#/c/516502/2 I'd be curious to get your thoughts on that (it's the logstash job submission change)17:57
*** felipemonteiro_ has quit IRC17:57
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status  https://review.openstack.org/51675518:01
pabelangerjeblair: ^is that the correct syntax to render the new accepting metric for executors?18:01
*** pvaneck has quit IRC18:02
*** panda|ruck|bbl is now known as panda|ruck18:02
mwhahahaquestion about logstash, which is the correct filename to use going forward? with or without the .gz? ie job-output.txt or job-output.txt.gz18:02
clarkbmwhahaha: that is what I'm trying to sort out with https://review.openstack.org/#/c/516502/218:02
clarkbmwhahaha: I'm asserting no .gz (backward compatible)18:03
mwhahahak18:03
*** d0ugal has joined #openstack-infra18:03
*** tpsilva has joined #openstack-infra18:03
clarkbbut hoping more people will review that change so we can make that decision18:03
dmsimardclarkb: re-reading ianw's comment on https://review.openstack.org/#/c/513943/ -- I *guess* we could take out the iptables persistence into a specific role that'd run after multi-node-bridge and multi-node-firewall18:04
mwhahahai know it used to be not .gz but the v3 changed that. so i'm for dropping it18:04
dmsimardclarkb: which would avoid having to run the multi-node-firewall role twice.18:04
clarkbdmsimard: ya18:04
dmsimard(and dealing with a meta dependency, which is cool too.)18:04
dmsimardokay, let's do that.18:04
clarkbdmsimard: that would be my preference, I think it's clear what is going on that way18:04
* dmsimard hates meta dependencies18:04
* mwhahaha adds a meta dependency on dmsimard 18:06
*** rbrndt has joined #openstack-infra18:06
clarkbcan you include_role the same role from multiple places?18:06
dmsimardclarkb: yes18:07
clarkbbut then main firewall setup can include_role persist iptables and then multi node bridge can include_role persist iptables too18:07
clarkbeither way works, as long as persist iptables happens after all the firewalling at least once18:07
*** rbrndt has quit IRC18:07
dmsimardclarkb: I intended to add the persist iptables role once in the multinode playbook after both roles, but it's true that each of those roles could do an include_role too.18:08
dmsimardeither way works -- using include_role makes it so each role can be used on its own without relying on the playbook including the role18:09
* dmsimard no strong opinion on either18:09
clarkbdmsimard: ya may be worth going that route just for the ability to consume things individually18:09
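For reference, the "persist iptables" step these roles share boils down to writing the live ruleset out to the file that is restored at boot. A sketch under the usual distro conventions (the exact paths and mechanism the role ends up using may differ):
    # RHEL/CentOS-family nodes
    sudo sh -c 'iptables-save > /etc/sysconfig/iptables'
    # Debian/Ubuntu-family nodes (with iptables-persistent installed)
    sudo sh -c 'iptables-save > /etc/iptables/rules.v4'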
*** dsariel__ has quit IRC18:09
*** zzzeek has quit IRC18:10
*** jpena is now known as jpena|off18:10
*** zzzeek has joined #openstack-infra18:12
openstackgerritAlex Schultz proposed openstack-infra/elastic-recheck master: Add query for 1729054  https://review.openstack.org/51675618:13
AJaegerodyssey4me: looking at https://review.openstack.org/#/c/516605/2/zuul.d/project.yaml - why are you not using a project-template in a central place? That way you define the template once and can use it everywhere18:13
AJaegerodyssey4me: looks to me like you use the same jobs in a couple of repos18:13
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one  https://review.openstack.org/51675718:15
dmsimardclarkb: should be as simple as that ? ^ I'll fix the persistence stack18:15
clarkbdmsimard: one comment inline18:16
clarkboh actually one more one sec18:16
*** sree has joined #openstack-infra18:16
*** zzzeek has quit IRC18:17
clarkbposted18:18
dmsimardclarkb: sure18:18
*** zzzeek has joined #openstack-infra18:18
clarkbdmsimard: thinking about it more specifying the dest is probably not necessary18:18
clarkbyou aren't going to have packets from that range coming from external nodes to the test env (due to routing)18:19
dmsimardclarkb: probably doesn't hurt to specify it18:19
mwhahahais there anything we can do to speed up the response time of the elastic-recheck bot? or is it only as good as the indexing delay18:20
pabelanger/dev/xvde2       70G   66G  1.1G  99% /var/lib/zuul18:20
pabelangerthat is on ze10.o.o18:20
pabelangerfor some reason, almost full18:20
*** zzzeek has quit IRC18:20
dmsimardpabelanger: that's what, git repos ?18:20
clarkbmwhahaha: it's only as good as the indexing delay and right now that's not great due to the double indexing described in https://review.openstack.org/#/c/516502/218:20
clarkbinc0: you should be able to depends on https://review.openstack.org/516757 and see if that makes things better for you18:20
*** hemna has quit IRC18:21
clarkbpabelanger: I think we leak build workspaces when executors are restarted18:21
*** d0ugal_ has joined #openstack-infra18:21
*** sree has quit IRC18:21
clarkbnot the case on ze02 though18:21
pabelangeryah, I'm going to stop ze10.o.o now, we are posting back some errors to jobs18:21
pabelangerthen see what leaked18:21
clarkbhold on18:22
pabelangerk18:22
dmsimardclarkb, inc0: hang on, we'll default that to false, right ?18:22
dmsimardso that it's opt-in, not opt-out18:22
clarkbpabelanger: there are a few builds from the 30th you can probably just delete those without stopping the executor since our timeout is less than 18 hours18:22
clarkbdmsimard: ya that preserves the old behavior of things like neutron testing their own firewall rules18:23
pabelangerclarkb: didn't we have a clean up find command we used before18:23
*** d0ugal has quit IRC18:23
clarkbpabelanger: I'm not sure18:23
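The cleanup command being half-remembered here would look roughly like the following; the builds path is the one mentioned later in the log, and the 18-hour cutoff is an assumption based on the job timeout discussed above:
    # remove leaked build directories older than the maximum job lifetime
    sudo find /var/lib/zuul/builds -mindepth 1 -maxdepth 1 -type d \
        -mmin +$((18 * 60)) -exec rm -rf {} +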
inc0dmsimard: I think default for configure addresses is true18:24
inc0https://github.com/openstack-infra/zuul-jobs/blob/master/roles/multi-node-bridge/defaults/main.yaml#L518:24
odyssey4meAJaeger our job definitions are a little common, but not *that common* - the extra layer of abstraction doesn't actually help much18:24
clarkbinc0: ya I'm saying we need another flag for whther or not the firewall should be opened as well since old behavior was not to do that because things like neutron do it themselves18:24
inc0and if you configure addresses, it will require iptables too18:24
clarkbinc0: no it won't require iptables18:25
AJaegerodyssey4me: then my example files were too small ;)18:25
inc0having address you can't communicate over?18:25
clarkbbecause things like neutron are expected to directly manage that stuff and if we do a global rule that masks neutron's rules we won't test neutron18:25
inc0maybe configure_addresses should be default false18:25
*** zzzeek has joined #openstack-infra18:25
clarkbinc0: yes because some things like neutron do it themselves18:25
inc0right, but then you don't want address on iface too right?18:25
odyssey4meAJaeger I think as we stabilise we might look into using the templates -but for now it's not too bad18:25
clarkbinc0: we do18:25
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one  https://review.openstack.org/51675718:26
dmsimardclarkb: ^ with your comments18:26
clarkbinc0: without addresses on the interface we won't be able to ssh to neutron managed VMs18:26
clarkbdue to routing18:26
clarkbinc0: but neutron is responsible for making sure the iptables rules are set to all ssh18:26
clarkb*allow18:26
*** zzzeek_ has joined #openstack-infra18:27
clarkbdmsimard: +2 thanks18:27
dhineshhi, looks like i might have a working CI https://review.openstack.org/#/c/516758/ , but how do you get the 'success' or 'failure' status for a CI under workflow18:27
dmsimardinc0: try a Depends-On with https://review.openstack.org/#/c/516757/ and set bridge_authorize_internal_traffic: true in your job vars18:28
clarkbinc0: basically this isn't a regression for the old overlay code in bash, it seemed to always expect the deployed software to then manage the IPs18:28
dmsimardclarkb: -infra uses puppet3 right ?18:28
clarkbinc0: your use case is different so we are adding this as a feature that is off by default18:28
clarkbdmsimard: yes18:28
dmsimarddocs for puppet3 are dead :/ they took them out lol18:29
clarkbwoo18:29
clarkb(I mean we do it too so can't complain)18:29
dmsimardoh wait there's https://docs.puppet.com/puppet/3.8/ but https://puppet.com/docs/puppet/3.8/index.html is broken18:30
clarkbdhinesh: your comments to gerrit have to match our comment link rules18:30
*** zzzeek has quit IRC18:30
inc0trying18:30
clarkbdhinesh: er not comment links but the javascript we inject looks for a specific format (trying to find where that is)18:31
*** ralonsoh has quit IRC18:31
*** zzzeek_ has quit IRC18:31
clarkbpabelanger: fwiw my du to try and identify the bad dirs is going much slower on ze10 than I expected, we probably should stop it since we can't turn it around quickly18:33
clarkbpabelanger: my concern with just stopping it though is that I think some jobs use a lot more disk than others due to having more required projects and those jobs will just all migrate to other executors potentially causing them to run out of disk too18:33
*** sambetts is now known as sambetts|afk18:34
pabelangerclarkb: yah, I cleaned up old dirs, no help18:34
clarkbdhinesh: looks like it is comment links after all https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/review.pp#n17018:34
pabelangerclarkb: I suspect we are just syncing back too much data18:34
clarkbpabelanger: last time when I investigated the fs issues it was largely the git repos18:35
*** pvaneck has joined #openstack-infra18:36
clarkbeach job can have like 5GB of just git repos18:36
pabelangerclarkb: oh, yah. that could be it too18:36
clarkb(and git repos are also inode heavy)18:36
dmsimardnot just that, the executor also pulls the logs, right ?18:36
pabelangerclarkb: okay, so stop or ride it out?18:36
clarkbI'm torn I don't want to stop it so we can actually see what is using the disk18:37
clarkbbut du is running very slowly18:37
clarkbstill hasn't returned to me18:37
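For context, the scan in question is a disk-usage walk over the executor's state directory; something along these lines (the exact invocation isn't shown in the log):
    # per-directory usage under the executor state dir, largest entries last
    sudo du -h --max-depth=1 /var/lib/zuul | sort -h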
pabelangerk, I am searching manually myself18:37
pabelangerclarkb: yah, we are swapping too18:37
dmsimardclarkb: we can attach an additional volume temporarily and move stuff ?18:37
clarkbdmsimard: ya though I'm not sure that will be much faster (but I guess lets us investigate more later if necessary)18:38
*** bnemec has quit IRC18:38
clarkb(the problem is stopping it deletes all/most of the builds dirs)18:38
*** zzzeek has joined #openstack-infra18:39
clarkbok I've got to prep for meeting /me is mostly afk until 190018:39
pabelangerclarkb: so far, everything I have got is 1.2GB to 1.6GB18:39
pabelanger5f02db42bc9a4be680c3d617a2eacdbf 2.3 GB18:40
pabelangeroh, interesting18:41
pabelangerI think we are leaking stuff18:41
pabelanger2017-10-31 15:24:18,141 DEBUG zuul.AnsibleJob: [build: 5f02db42bc9a4be680c3d617a2eacdbf] Sending result: {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"}18:41
pabelangerthat is last log entry, but still data on disk18:41
pabelangerclarkb: ^18:42
dhineshclarkb: so just adding comment links from the log server would initiate it?18:42
openstackgerritMerged openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status  https://review.openstack.org/51675518:42
pabelangerclarkb: so, I think we should stop ze10 and see what has been leaked, then go back into debug logs18:42
*** zzzeek has quit IRC18:42
clarkbpabelanger: ok18:43
clarkbdhinesh: your comments to gerrit have to match that rule there18:44
pabelangerclarkb: k, stopping18:44
*** hemna has joined #openstack-infra18:44
*** zzzeek has joined #openstack-infra18:44
pabelangerjobs aborting now18:45
*** rloo has joined #openstack-infra18:45
*** dprince has quit IRC18:49
jeblairback18:50
dmsimardthe issues are with ze10 ?18:51
jeblairclarkb: are you running du?18:51
dmsimardyeah ok, nevermind -- saw a finger url for it.18:51
clarkbjeblair: not anymore18:51
jeblairi don't want to run my own if others are18:51
jeblairclarkb: what did you find?18:52
clarkb55G builds was biggest consumer followed by 9.1G executor-git18:52
clarkbeverything else is in the KB range18:52
clarkbthere are 224 builds18:53
clarkbso even at 1GB each thats enough to fill the disk18:53
jeblairthere's no zuul-executor running?18:53
clarkbjeblair: pabelanger was stopping it18:53
jeblairokay, then they can all be deleted :)18:53
pabelangerokay, I don't see any more playbooks running on ze10, but I do see ssh connections still open18:53
*** e0ne has joined #openstack-infra18:54
jeblairi'm assuming a bunch of them leaked due to earlier unclean shutdowns18:54
pabelangeryes, it just stopped now18:54
jeblairwe probably should check for and delete old build dirs on the other executors18:54
clarkbjeblair: pabelanger found that {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"} is a thing18:54
jeblairi'll start that18:54
mugsieis "ERROR Project openstack/requirements does not have the default branch master" a known issue?18:54
jeblairthat's probably due to being out of space18:54
clarkboh ya all those build dirs are from 1500UTC or so18:55
clarkbwhich was around when things restarted?18:55
jeblairyep18:55
pabelangeragree18:55
pabelanger5f02db42bc9a4be680c3d617a2eacdbf was the one I linked before and still exists on disk18:55
*** jascott1 has joined #openstack-infra18:56
clarkbinfra meeting in ~4 minutes18:56
AJaegermugsie: known issue and fixed - please recheck18:56
clarkbjoin us in #openstack-meeting18:56
jeblairokay i'm deleting all build dirs older than 4 hours18:56
mugsieAJaeger: thanks18:57
AJaegermugsie: if you get it on a change that you just pushed or after the recheck, then it's a new one - please report back in that case18:57
clarkbjeblair: I don't think that will catch those on ze10 just yet18:57
mugsiecool - I will keep an eye on them18:57
jeblairclarkb: it seems to be the only one with a bunch of 1500s; other executors are generally older18:58
jeblairsince it's stopped i'll delete the whole builds dir18:59
*** yamahata has quit IRC18:59
fungimmm, meeting time?19:00
clarkbyup19:00
jeblairpabelanger: i'm deleting all the build dirs, and i'm also doing fscks on all the git repos on ze1019:03
jeblairjust to make sure everything is clean when we restart it19:03
*** sree has joined #openstack-infra19:03
pabelangerjeblair: ack19:03
*** yamahata has joined #openstack-infra19:07
*** sree has quit IRC19:07
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci master: Default $NODEPOOL_PROVIDER  https://review.openstack.org/49003719:07
*** catintheroof has quit IRC19:14
*** yamahata has quit IRC19:14
*** pcaruana has joined #openstack-infra19:15
*** dprince has joined #openstack-infra19:15
*** yamahata has joined #openstack-infra19:17
*** rbrndt has joined #openstack-infra19:22
*** ijw has quit IRC19:23
*** ijw has joined #openstack-infra19:24
jeblairpabelanger: all of the build dirs on ze10 are deleted, and my git repo fsck has come back clean; you should be clear to restart when ready19:24
pabelangerjeblair: thanks, starting up now19:25
*** electrofelix has quit IRC19:25
AJaegerjeblair, pabelanger what about tripleo-quickstart? That was mentioned above19:25
jeblairAJaeger: that error is probably caused by an error cloning from the git repo cache to the job's build dir.  i checked the cache, and the repos are fine, so it was probably just that it ran out of space copying it into the build dir.19:27
jeblairAJaeger: now that all the old build dirs are deleted, should be fine19:27
AJaegerjeblair: ok, thanks19:27
*** eharney has quit IRC19:29
*** eharney has joined #openstack-infra19:30
*** hasharAway is now known as hashar19:38
*** Hal has joined #openstack-infra19:43
*** Hal is now known as Guest430019:43
*** pvaneck has quit IRC19:44
*** pvaneck has joined #openstack-infra19:45
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints  https://review.openstack.org/51677319:48
*** amoralej is now known as amoralej|off19:48
*** pcaruana has quit IRC19:49
openstackgerritMerged openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints  https://review.openstack.org/51677319:49
*** pvaneck has quit IRC19:49
*** sree has joined #openstack-infra19:50
*** salv-orlando has quit IRC19:51
openstackgerritMerged openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671719:52
*** Guest4300 has quit IRC19:53
openstackgerritRuby Loo proposed openstack-infra/project-config master: Remove legacy python-ironicclient jobs  https://review.openstack.org/51677419:55
*** catintheroof has joined #openstack-infra19:55
*** sree has quit IRC19:55
*** mat128 has quit IRC19:56
fungipabelanger: fwiw, i can't seem to get either of the "fg-test" instances for openstackjenkins in ord to accept any of my ssh keys19:58
fungiso i don't think they're anything i created19:58
pabelangerfungi: k, thanks19:58
clarkbI think show will tell us how old they are?19:59
* clarkb looks19:59
fungimy money is on "ancient"19:59
pabelangerhttp://paste.openstack.org/show/625158/19:59
fungithey're using a "512 MB Classic v1" flavor after all19:59
pabelangercentos-6.2 nodes19:59
pabelanger51219:59
fungioh yeah, created 2016-06-0220:00
fungiless ancient than i anticipated20:00
clarkboh in that case we likely can delete them as the only thing we really ever used centos 6 for was test nodes and git.o.o and both work otherwise20:00
pabelangerfungi: I am guessing live migration?20:00
pabelangerstop /create20:00
fungimaybe20:00
pabelangerokay, will delete them now then20:01
clarkbre https://review.openstack.org/#/c/516502/ if we can get that in I think I'd like to restart gear on logstash.o.o as it's over 100k now and see if that change makes a dent in job queue growth20:01
clarkbbut first lunch20:01
jeblairi have to go run errands for a few hours20:01
*** efried has joined #openstack-infra20:03
openstackgerritRuby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironicclient jobs  https://review.openstack.org/51677620:05
*** pvaneck has joined #openstack-infra20:06
efriedHowdy folks.  Zuul is -1ing everything without giving a reason I can understand.  Known issue?20:06
pabelangerlooking into why infracloud-chocolate appears to be wedged.20:06
pabelanger56 locked, ready nodes20:06
*** ccamacho has joined #openstack-infra20:10
pabelangeryah, appears to be wedged waiting on more nodes20:10
pabelangerjust a waiting game now I think20:11
pabelanger| 0000624820 | infracloud-chocolate   | nova     | centos-7         | 0da89be1-f267-4926-bc91-b1debb4c509d | ready    | 00:04:07:47 | locked   |20:11
pabelangeris the longest right now20:11
*** camunoz has quit IRC20:11
fungiefried: please link to an example of the everything it has -1'd20:11
fungithere was a disk spontaneously running out of room on one of the 10 executors a little bit ago which could have resulted in some job failures20:12
fungithough it seemed like we caught it pretty quickly20:13
efriedfungi E.g.: https://review.openstack.org/#/c/515151/20:13
fungiefried: thanks, looking20:13
efriedfungi E.g.: https://review.openstack.org/#/c/515223/20:14
efriedfungi Let me know if you want more.20:14
efriedfungi Thanks for looking!20:14
fungiefried: if you toggle the ci comments on that first one, all those "ERROR Project openstack/requirements does not have the default branch master" entries are likely to have been the executor failures we resolved earlier; those jobs started a few hours ago based on the time of your recheck and the duration of the working jobs and time it reported on the change20:16
*** jtomasek has quit IRC20:17
efriedfungi Okay, I want to say there was at least one we rechecked and it still failed, lemme go find...20:17
fungiefried: same thing with the second change you linked20:17
efriedfungi https://review.openstack.org/#/c/516662/20:17
efriedMERGER_FAILURE sounds like a sinister Wall Street thing.20:18
fungiefried: yes, that looks like a different issue20:18
*** mrunge has quit IRC20:18
*** kgiusti has left #openstack-infra20:20
fungiefried: unrelated to the corrupt git repo on one of the executors which caused the "does not have the default branch master" errors, timing on the failed recheck there looks related to an issue we resolved shortly thereafter where an executor ran out of disk space20:21
fungiit should be fine at this point20:21
efriedfungi Cool, thanks for checking it out.20:21
clarkb merger failures can be lack of disk too20:21
*** Swami has quit IRC20:22
fungiyup20:22
fungithat's what i expect it was in that case20:22
*** mrunge has joined #openstack-infra20:22
clarkbwe may want to formalize rm -rf /var/lib/zuul/builds on zuul startup20:23
clarkbmaybe add it to the init script?20:23
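A minimal sketch of the idea, assuming it were wired into the start path of the executor's init script (hypothetical; not the actual script):
    # before launching zuul-executor, clear build dirs leaked by a prior unclean shutdown
    rm -rf /var/lib/zuul/builds/*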
*** ijw has quit IRC20:24
*** dave-mcc_ has joined #openstack-infra20:26
*** smatzek has quit IRC20:26
*** smatzek has joined #openstack-infra20:27
*** dave-mccowan has quit IRC20:27
inc0dmsimard: I think it helped, gates failed still but for different reason I think20:28
inc0mariadb finally bootstraps:)20:28
dmsimardinc0: progress! Do you know what's the issue ?20:29
*** erlon has quit IRC20:29
inc0yeah I think, I think I need to recreate /etc/hosts so hostname will point to 172... ip20:29
inc0for rabbitmq20:29
dmsimardinc0: we setup the inventory hostnames in /etc/hosts20:29
inc0yeah I know20:30
*** ldnunes has quit IRC20:30
dmsimardBut they point to internal nodepool ip20:30
inc0but since I'm using overlay, I'll set it to overlay net20:30
inc0shouldn't be too bad20:30
dmsimardI guess you want to setup the bridge IPs ?20:30
inc0yeah20:30
*** csomerville has joined #openstack-infra20:30
dmsimardOk, maybe something to consider as well clarkb ^20:30
inc0kolla-ansible already does that, but I needed to remove previous setup20:31
inc0dmsimard: not sure, it might be kolla-ansible specific20:31
*** smatzek has quit IRC20:31
pabelangermwhahaha: EmilienM: can you see what in tripleo jobs is overwriting /root/.ssh/known_host? it is deleting the infra-root keys we add with nodepool and prevents us from SSH into the running nodes20:31
mwhahahapabelanger: probably in quickstart20:32
*** Hal has joined #openstack-infra20:32
*** cody-somerville has quit IRC20:32
*** Hal is now known as Guest9512120:32
mwhahahapabelanger: we append not overwrite20:33
fungipabelanger: itym authorized_keys?20:33
dmsimardpabelanger: infra-root keys != known_hosts ?20:33
*** xingchao has quit IRC20:33
dmsimardfungi beat me to it :)20:33
pabelangeroh ya, that20:33
pabelangerthanks20:33
pabelangerauthorized_keys20:33
pabelangerty20:34
mwhahahabut we do remove known_hosts https://github.com/openstack/tripleo-quickstart-extras/blob/79cf07e3dd3e555206ae6fefdd41423a6da38cd8/roles/virthost-full-cleanup/tasks/main.yml#L11120:34
mwhahahapabelanger: https://github.com/openstack/tripleo-quickstart-extras/blob/dab754c8de7d235ffe85d157f7d6d6f05be988eb/roles/undercloud-setup/tasks/non_root_user_setup.yml20:34
mwhahahapabelanger: but we're using the authorized_key thing in ansible so not sure if that should be removing any more keys20:35
pabelangermwhahaha: is undercloud_user == root?20:36
mwhahahapabelanger: usually it isn't but it might be in multinode20:37
dmsimardmwhahaha: authorized_key from ansible doesn't delete keys unless "state: absent" or "exclusive: yes"20:37
*** sree has joined #openstack-infra20:37
mwhahahafound it20:37
mwhahahahttps://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L23820:38
pabelangeryup20:38
dmsimardoh yeah, that totally overwrites the one in /root20:39
*** sshnaidm is now known as sshnaidm|afk20:39
pabelangercat foo | sudo tee -a /root/.ssh/authorized_keys20:39
pabelangerthat is the fix20:39
*** Guest95121 has quit IRC20:39
dmsimarddepends what's the purpose20:39
dmsimardthere might already be keys in there20:39
mwhahahadid you guys stop putting those keys in /etc/nodepool? https://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L23220:40
mwhahahamaybe that's the problem?20:40
pabelangerno, we use glean now to populate /root/.ssh/authorized_keys20:41
pabelangerjust that yours are the only jobs that overwrite it20:41
clarkbdmsimard: inc0 that would be another new use case for the overlay20:41
clarkbdmsimard: inc0 I am not opposed to supporting it too but the way kolla sets it up I don't think kolla really wants us to do it anyways? because we are unaware of the keepalived ip20:41
pabelangermwhahaha: what is your key used for?20:41
*** sree has quit IRC20:42
dmsimardpabelanger: the fix is to leave line 238 as is, but then cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys20:42
dmsimardinstead of doing the cp20:42
pabelangerright20:42
mwhahahapabelanger: no idea it was like that when i got here20:42
dmsimardI can send a patch since i'm not core and all20:42
* dmsimard writes20:42
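The fix being written here is roughly the following; the cp line stands in for what the tripleo-ci script was doing at the linked line 238 and is illustrative rather than an exact quote:
    # before: clobbers /root/.ssh/authorized_keys, dropping the keys glean installed
    sudo cp "${HOME}/.ssh/authorized_keys" /root/.ssh/authorized_keys
    # after: append instead, so the existing infra-root keys survive
    cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys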
clarkbianw: for https://review.openstack.org/#/c/516502/ you good if I go ahead and approve that now and edit the comment in a followup?20:44
*** thorst_ has quit IRC20:44
*** trown is now known as trown|outtypewww20:45
*** priteau has joined #openstack-infra20:45
openstackgerritDavid Moreau Simard proposed openstack-infra/tripleo-ci master: Don't replace /root/.ssh/authorized_keys, append to it  https://review.openstack.org/51678520:45
dmsimardpabelanger, mwhahaha ^20:45
*** thorst has joined #openstack-infra20:47
pabelangerjeblair: Shrews: clarkb: yah, chocolate is completely wedged. I'm going to see about bumping max-servers up by 5 to see if that will cause things to move again.20:47
pabelangerotherwise, I don't know how to delete or release a locked node20:47
pabelangerlocked 'ready' node20:47
inc0clarkb: yeah in general we have a lot of setup like that in our code, so unless someone else wants it, I don't think there is reason for making this thing just for us20:48
clarkbinc0: in this case I think because you are using IPs in the range that we aren't directly controlling you'll want to do it20:48
dmsimardclarkb: yeah.. basically at that point, they might as well parent to base instead of multinode and include each role they need individually (and leave the multi-node-hosts-file out)20:48
*** felipemonteiro__ has quit IRC20:51
*** thorst has quit IRC20:51
*** xingchao has joined #openstack-infra20:53
*** smatzek has joined #openstack-infra20:56
pabelangerokay, I am not sure what is going on with infracloud-chocolate, 3 more nodes came on line, and looked to be fulfilled for zuul, but still locked20:57
pabelanger2017-10-31 20:49:53,310 DEBUG zuul.nodepool: Updating node request <NodeRequest 200-0000806206 <NodeSet legacy-centos-7-2-node OrderedDict([('primary', <Node None primary:centos-7>), ('secondary', <Node None secondary:centos-7>)])OrderedDict([('subnodes', <Group subnodes ['secondary']>)])>>20:57
*** hemna has quit IRC20:58
pabelangerI'll check back once jeblair is back from errands, see if we can figure out the issue20:58
pabelangerbut, unsure how to release the locked ready nodes and unwedge20:58
*** cody-somerville has joined #openstack-infra20:59
*** cody-somerville has joined #openstack-infra20:59
clarkbpabelanger: did the nodes that are locked boot and did zuul use them?20:59
clarkbif they are just not booting we should be able to address that problem20:59
*** xingchao has quit IRC21:00
*** smatzek has quit IRC21:01
*** csomerville has quit IRC21:01
*** jcoufal_ has joined #openstack-infra21:02
openstackgerritClark Boylan proposed openstack-infra/project-config master: Better comment in logstash job submission role  https://review.openstack.org/51678621:03
clarkbianw: ^ there is the bigger comment I'm going to go ahead and approve the other change now21:03
*** salv-orlando has joined #openstack-infra21:05
pabelangerclarkb: they have booted, nodepool-launcher marked them ready (fulfilled), then zuul locked them , but hasn't launched any jobs. let me see if I can figure out if an executor was assigned21:05
pabelangermaybe we are having an issue there21:05
*** jcoufal has quit IRC21:05
Shrewspabelanger: yeah, not sure how to debug the zuul side of that21:06
*** eharney has quit IRC21:07
pabelangerhttp://paste.openstack.org/show/625161/21:07
pabelangerShrews: clarkb: that is all the data I see in zk21:07
pabelangerwhich looks correct21:08
pabelangerand I see a lock too21:08
pabelangerbut don't know how to see that info21:08
*** catintheroof has quit IRC21:09
pabelangerk, have to dadops with kids, I'll check backscroll this evening21:09
*** rhallisey has quit IRC21:10
openstackgerritMerged openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650221:12
mwhahahaok so is there anywhere to look in the logs to see why the tripleo queue keeps resetting21:16
mwhahahacause it just reset again and i have no idea why21:16
mwhahahabesides 510900,2 being stuck there in error21:17
clarkbusually the easiest thing is to look at the top of the queue and see what just failed. Has 510900 been in that state for a while?21:18
mwhahahaclarkb: yes21:18
mwhahahaclarkb: hours21:18
*** ijw has joined #openstack-infra21:18
clarkbok in that case it's likely whatever change was ahead attempted to merge and failed because of jgit (so zuul wasn't able to detect that ahead of time) or it got a new patchset and was evicted21:19
openstackgerritRuby Loo proposed openstack-infra/project-config master: Remove legacy python-ironic-inspector-client jobs  https://review.openstack.org/51678921:19
clarkbbut otherwise you should have a failure at the tip21:19
mwhahahai know usually it shows up21:19
mwhahahabut come back to the ui and it's like wtf just happened21:19
mwhahahabut i literally caught it zeroing out everything but no failure listed21:19
clarkbya if it's not a failure then likely merge failed in gerrit late or a new patchset arrived for some new change21:20
clarkber some change at the head of the queue21:20
mwhahahai don't think so21:20
* mwhahaha goes looking21:20
mwhahahahttp://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A21:20
mwhahahagotta love all those runs21:20
*** xingchao has joined #openstack-infra21:21
mwhahahaclarkb: it's items in the inventory.yaml to show what was ahead of it right?21:22
*** priteau has quit IRC21:22
mwhahahait looks like the last 3 runs of 509521 had nothing in front of it21:23
*** priteau has joined #openstack-infra21:23
clarkbI'm not sure where that is recorded21:23
openstackgerritRuby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironic-inspector-client jobs  https://review.openstack.org/51679121:23
mwhahahait just reset again21:23
mwhahahaand nothing is in front of it21:23
* mwhahaha flips tables21:24
*** sree has joined #openstack-infra21:24
clarkbI don't think it reset the gate21:24
clarkbthe changes behind it are still running jobs21:25
mwhahahawhy is that job getting reset21:25
clarkbzuul will do that if the test node crashes (up to some limit of retries)21:25
*** ijw has quit IRC21:25
clarkbI'm trying to find it in the logs now21:25
clarkbso it's resetting jobs on that change but not resetting the gate as a result from what I see21:26
*** xingchao has quit IRC21:26
mwhahahathey are all resetting21:26
*** priteau_ has joined #openstack-infra21:26
*** priteau has quit IRC21:27
Shrewsi'm going to wager a guess that the infracloud long-locked nodes, and the tripleo problems mwhahaha is seeing are somehow related21:28
*** sree has quit IRC21:29
clarkb2017-10-31 21:20:10,386 INFO zuul.Pipeline.openstack.gate: Resetting builds for change <Change 0x7fee5d659978 509521,2> because the item ahead, <QueueItem 0x7fee5d66fd30 for <Change 0x7fee5d66fba8 510900,2> in gate>, is not the nearest non-failing item, None21:29
clarkb2017-10-31 21:20:10,387 DEBUG zuul.Pipeline.openstack.gate: Cancel jobs for change <Change 0x7fee5d659978 509521,2>21:29
mwhahahaso it's resetting because 510900,2 is stuck?21:30
clarkbmwhahaha: well "is not the nearest non-failing item" I think means something other than 510900 failed21:30
mwhahahabut there's nothing there :/21:30
clarkband the only thing between those two changes that can fail other than 510900 is 509521 itself21:30
clarkbI'm trying to see if I can find build logs for 509521 now21:31
mwhahahathe pep8 logs were http://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A21:31
mwhahahabut not sure where the other job logs were21:31
*** mrunge has quit IRC21:31
clarkbif it is the node crashing then we won't have copied logs because the instances went away21:32
clarkbtrying to dig through via the zuul logs21:32
*** rcernin has joined #openstack-infra21:32
mwhahahaclarkb: could it be the stomping on the authorized_keys that pabelanger mentioned earlier?21:34
mwhahahaclarkb: where zuul thinks the node crashed but it wasn't21:34
mwhahahait's just that you can't connect anymore21:34
*** amoralej|off is now known as amoralej21:34
mwhahahaor is it the fact that it seems to be looping in http://zuulv3.openstack.org/static/stream.html?uuid=6d416d1154364d65982e64c940d2f6d0&logfile=console.log21:35
clarkbya if zuul can't ssh back in that could do it21:35
clarkbbut I think the thing pabelanger was talking about was root user specific not zuul user21:35
mwhahahaso how is that a new thing21:35
*** threestrands has joined #openstack-infra21:36
mwhahahais the zuul user different than the user the job uses?21:36
*** armax_ has joined #openstack-infra21:36
*** armax has quit IRC21:38
*** armax_ is now known as armax21:38
clarkbno the zuul user is the user the job framework uses21:38
*** mrunge has joined #openstack-infra21:39
mwhahahaok maybe we're touching that, but we didn't change anything recently around that21:39
rm_workis zuul on storyboard or launchpad? wondering where i could request a feature21:39
clarkbrm_work: storyboard21:39
rm_workkk21:39
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Persist iptables rules  https://review.openstack.org/51394321:39
rm_workclarkb: actually, is zuul's webui *part of zuul* or part of something else21:40
dmsimardclarkb: as discussed ^ I'll fix the integration tests21:40
*** edmondsw has quit IRC21:40
*** bobh has quit IRC21:41
openstackgerritMerged openstack-infra/project-config master: Better comment in logstash job submission role  https://review.openstack.org/51678621:41
clarkbrm_work: it is part of zuul21:42
rm_workk, thanks21:42
*** jascott1 has quit IRC21:43
clarkbmwhahaha: http://paste.openstack.org/show/625167/ that is a better snippet of logs but I think I am more confused now reading that21:44
mwhahahaclarkb: :( i have no idea. i'm watching the console of the one running job to see if it is something we're doing. if it resets again i'm going to abandon 510900 to get it to go away21:46
*** jascott1 has joined #openstack-infra21:46
*** dprince has quit IRC21:46
clarkbI think what is going on is zuul thinks that 510900 is still the parent of 509521 in the ordered future git state under test21:47
clarkbwhere it should decouple the two queues and 510900 is in a queue of its own with one item in it (itself) and then 509521 forms the tip of a new queue21:48
mwhahahamy thoughts as well is that it's confused about those two21:48
*** jcoufal has joined #openstack-infra21:48
mwhahahalegacy-tripleo-ci-centos-7-scenario003-multinode-oooq-container seems to be queued on 51090021:49
mwhahahaso i wonder if they keep resetting each other21:49
mwhahahaso let me abandon that patch to clear it21:49
*** boden has quit IRC21:50
*** jcoufal_ has quit IRC21:50
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence  https://review.openstack.org/51393421:51
mwhahahaclarkb: are there visible metrics for number of times a job (or jobs) gets reset somewhere?21:51
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence  https://review.openstack.org/51393421:51
dmsimardianw: addressed your comment in https://review.openstack.org/#/c/513943/ because splitting the role out made something else easier21:55
clarkbmwhahaha: I'm not sure if zuul emits that to graphite, I think it may be under the status NONE category21:55
*** bnemec has joined #openstack-infra21:55
mwhahahaclarkb: cause i want to say it's been happening a lot based on queues but it's hard to tell :/21:56
clarkbhttp://paste.openstack.org/show/625169/21:57
clarkbmwhahaha: that's how often it's happened roughly21:58
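A sketch of how a count like that can be pulled out of the scheduler log, using the message quoted a few lines up (the log path is an assumption):
    sudo grep -c 'Resetting builds for change .*509521,2' /var/log/zuul/debug.log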
dmsimardclarkb: wow21:58
mwhahahayea that coincides with all the pep8 logs from that one change21:58
dmsimardclarkb: each time that's happened, all the jobs essentially restart with the failed job out of the queue, right ?21:58
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Drop TripleO jobs from Puppet modules  https://review.openstack.org/51679421:59
clarkbdmsimard: they should, but not sure if that is happening; 510900 and 509521 seem to be coupled more tightly than I would expect21:59
mnasermwhahaha: ^21:59
mwhahahaclarkb: the coupling was just within zuul cause one is stable/newton and the other is master :D22:00
*** ijw has joined #openstack-infra22:00
mwhahahaso it's artificially together just in queue22:00
* mwhahaha shrugs22:01
clarkbwell it's not artificial, that is by design so that upgrade jobs do the right thing22:01
clarkbyou can't test upgrades without that22:01
mwhahahawe've pulled our upgrades out of infra22:01
clarkbbut reading that latest paste it looks like we have 510900 <- some change <- 50952122:01
mwhahahaso in this case, we probably don't want it22:01
clarkband that some change changes constantly22:01
clarkbthen it transitions to 510900 <- 50952122:02
*** jcoufal has quit IRC22:02
clarkbbut the whole time 510900 is there22:02
clarkbwhich seems odd to me22:02
clarkbunless 510900 was causing every change after it to break?22:02
mwhahahaprobably22:02
mwhahahait was one of the ones where the jobs had errored from the executor stuff22:03
mwhahahaso it seemed to have gotten in a really bad state22:03
*** amoralej is now known as amoralej|off22:05
clarkbwe may need jeblair to dig in when he gets back22:06
clarkbI'm quickly getting beyond my understanding here22:06
mwhahahathere be dragons22:06
mwhahahaseems that we have some other changes that might also be suffering from the same problems themselves22:07
mwhahahalike 516630,122:07
*** marst has quit IRC22:07
mwhahahathat's got a bunch of stuff that was errored but may also be requeuing22:07
mwhahaha516651,122:07
mwhahaha516047,2 499239,27 511350,25 511350,2522:08
mwhahahai'm going to abandon/restore the tripleo ones22:09
mwhahahabut there's a rally and nova one as well22:09
mwhahahathat may continue22:09
clarkbmwhahaha: which rally/nova one?22:10
mwhahaha516047,222:10
mwhahaharally22:10
clarkbthe only nova ones in the gate are green22:10
mwhahaha499239,2722:10
mwhahahanot gate22:11
mwhahahain check22:11
clarkbah ok22:11
mwhahahadoing same stuff22:11
mwhahahaor at least they look like they might be since they are still around22:11
mwhahahaand have things queued22:11
clarkbin check the fallout should be limited to that change, but ya probably want to evict them so they don't hang out there until next restart22:11
*** jascott1_ has joined #openstack-infra22:12
mwhahahawell they also might be requeuing constantly as well taking up resources22:12
mwhahahai cleared the 3 tripleo ones22:12
mwhahahabut i can't help with those two22:12
*** slaweq has joined #openstack-infra22:12
*** bobh has joined #openstack-infra22:13
clarkbya not sure looks like they each have a single job queued (and no gate resets)22:13
*** ijw has quit IRC22:13
mwhahahajust something to keep an eye on22:14
*** slaweq has quit IRC22:15
*** jascott1 has quit IRC22:15
*** baoli has quit IRC22:18
*** tpsilva has quit IRC22:18
*** e0ne has quit IRC22:18
clarkbwe'll want to keep an eye on it to see if "normal" gate failures create the same problem22:18
clarkbwhat should happen is after the first failure things get decoupled from each other and move on on their own22:19
*** ijw has joined #openstack-infra22:19
*** rloo has left #openstack-infra22:19
clarkbbut if a second failure causes it to reset again that would be a bug22:19
*** bobh has quit IRC22:20
*** ijw has quit IRC22:22
*** ijw has joined #openstack-infra22:22
ianwdmsimard: dropped a bridge/iptables comment on 516757 ... is testing sufficient for that?22:24
clarkbianw: in this case we use ovs without linux bridges, does that change your concern?22:24
clarkbianw: we also have syslogs from kolla jobs showing iptables-dropped packets with source and destination in that 172.24.4.0/23 range22:26
ianwclarkb: ok, just as long as we've tested it, i guess ovs is probably different22:27
ianwwhenever i see "bridge" and "firewall" it makes me think of this22:27
clarkbianw: ya inc0 mentions it seemed to address the problem in his depends-on change22:27
inc0clarkb: fwiw it helped22:28
*** aeng has joined #openstack-infra22:30
*** ijw has quit IRC22:33
*** slaweq has joined #openstack-infra22:33
clarkbmwhahaha: I've added this problem to the issues with zuul list at https://etherpad.openstack.org/p/zuulv3-issues22:33
*** lbragstad has quit IRC22:36
*** edmondsw has joined #openstack-infra22:38
*** edmondsw has quit IRC22:43
*** wolverineav has quit IRC22:44
jlvillalWhat does it mean when there is a Zuul error: MERGER_FAILURE  on some of the jobs, but not all?22:45
*** rbrndt has quit IRC22:45
jlvillalSeen on this patch: https://review.openstack.org/#/c/513152/322:45
*** priteau_ has quit IRC22:47
jeblairback22:47
clarkbjlvillal: earlier today an executor ran out of disk due to leaked build dirs. This caused the merger failure messages22:47
clarkbjeblair: can you see the conversation with mwhahaha above (I tried to tl;dr it on the zuul issues etherpad too)22:47
jlvillalclarkb: Ah, okay. recheck it is :)22:47
jeblairclarkb, mwhahaha: ack22:48
jeblairclarkb:     "Changes in the gate did not appear to decouple from each other when the one ahead failed. Specifically 510900 and 509521 on October 31." ?22:49
*** rlandy has quit IRC22:49
mwhahahajeblair: yea that one22:49
*** mriedem has quit IRC22:50
clarkbjeblair: yes22:51
ianwclarkb: do you have experience re-init of a bup repo -> http://paste.openstack.org/show/625174/ ?22:51
clarkbjeblair: it looked like 510900 kept resetting 509521, paste in etherpad tries to capture that22:51
clarkbianw: I think the .bup should be in /root ? re error: '/opt/backups/bup-ask/.bup/' is not a bup repository; run "bup init"22:52
clarkboh its talking to the remote end and not finding a repo there22:52
clarkbianw: I think you may have to bootstrap local and remote?22:52
clarkbsystem-config docs hopefully have more info?22:52
*** ijw has joined #openstack-infra22:53
ianwclarkb: yeah, i'm not sure ... /opt/backups/bup-ask/.bup is the remote side, which you'd think "bup init -r ..." would create for you, anyway, i'll keep poking22:53
clarkbianw: does /opt/backups/bup-ask exist?22:54
clarkbit may create the .bup for you but not if it can't login?22:55
ianwclarkb: yep, that was cloned from the old server.  i just removed the .bup directory22:55
clarkbgotcha22:55
ianwit may be the system-config instructions are missing a bup init on the remote side22:55
*** hongbin has quit IRC22:55
clarkbianw: the bup init on the local side to be backed up was a relatively new thing too, it's possible the bootstrap on the remote side steps are new as well?22:55
clarkbianw: ya thinking that could be possible22:56
jeblairclarkb: so if i can try to summarize -- down at the end of your paste (where it says the NNFI is None), we're looking at 509521 behind 510900 which is at the head.  you might expect 510900 to fail and reset 509521 once, but it appears that it somehow failed multiple times and therefore reset 509521 multiple times.22:58
jeblairclarkb: does that sound right?  (also, are we sure that 510900 was the head at that time?)22:58
*** slaweq has quit IRC22:59
clarkbjeblair: correct on my theory23:00
jeblairokay.  i'll try to dig into that.  it will take a while.23:00
clarkbjeblair: http://paste.openstack.org/show/625167/ may also be helpful23:00
clarkbjeblair: thats a bit more logging around a single reset occurence23:00
ianwi think we should exclude /var/lib/postgresql from backups ... we just want to back up the database dump.  it changes under the live backup23:00
clarkbianw: +123:01
jeblairclarkb: ah thx23:01
jeblairi'm going to try to get 2 of those and see what happened between them23:01
openstackgerritIan Wienand proposed openstack-infra/system-config master: [DNM] remove ci-backup-rs-ord.openstack.org  https://review.openstack.org/51615923:01
ianwclarkb: ^ i think remote needs an init, updated instructions23:02
clarkbianw: do you need to flip the order around? bup init on server then on client?23:03
clarkbdocs say do server to be backed up first23:03
clarkbbut I think that is why you had the error23:04
*** xarses has quit IRC23:04
ianwclarkb: i think "bup init" on the client will create /root/.bup, but it's not till it runs with "-r user@backupserver:" that it tries to look at the remote .bup dir23:05
clarkbaha23:06
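Putting ianw's observation together with the docs gap, the bootstrap order being worked out is roughly this (user, host, and the paths indexed are placeholders, not the real backup setup):
    # on the backup server, as the per-host backup user: create the repository
    bup init
    # on the client being backed up:
    bup init
    bup index -x /
    bup save -r backup-user@backup-server: -n root /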
openstackgerritIan Wienand proposed openstack-infra/puppet-bup master: Ignore postgres working directory  https://review.openstack.org/51679823:08
*** gyee has quit IRC23:10
ianwinfra-root: ^ if ok, could i get two eyes on this, i'd like to start new backups without this23:11
jeblairclarkb, mwhahaha: i think there's a bug with reconfiguration; i think we erroneously put 509521 behind 510900 again after reconfiguration, then the next pass through the queue processor threw it out again.23:16
*** LindaWang has quit IRC23:17
*** yamahata has quit IRC23:21
*** yamahata has joined #openstack-infra23:21
*** thorst has joined #openstack-infra23:24
*** gildub has joined #openstack-infra23:25
*** gmann_afk is now known as gmann23:27
*** slaweq has joined #openstack-infra23:29
clarkbjeblair: ok, so my theory wasn't too far off23:30
* clarkb goes to pack23:31
*** thorst has quit IRC23:31
*** daidv has quit IRC23:35
*** daidv has joined #openstack-infra23:35
*** aviau has quit IRC23:38
*** aviau has joined #openstack-infra23:38
*** nicolasbock has quit IRC23:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: failing test for reconfiguration at failed head  https://review.openstack.org/51679923:41
jeblairthere's a failing test case that reproduces this.  this may be a longstanding zuulv2 bug, we just didn't notice it because we didn't reconfigure every 5 minutes.23:42
*** jascott1_ has quit IRC23:46
*** jascott1 has joined #openstack-infra23:47
*** baoli has joined #openstack-infra23:50
*** markvoelker has quit IRC23:51
*** adreznec has quit IRC23:52
*** sdague has quit IRC23:56
