Friday, 2018-12-14

01:34 <zxiiro> Anyone else seeing "ImportError: cannot import name decorate" when using openstack client?
01:34 <zxiiro> I think dogpile.cache released a new version yesterday that's breaking.
01:35 <clarkb> zxiiro: I think the dogpile thing is a known issue, but I'm unaware of a fix
01:39 <zxiiro> pinning it to 0.6.8 seems to help my build job at least.
01:43 <clarkb> kmalloc: Shrews: might be worth an email to the discuss list?
02:50 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page  https://review.openstack.org/599472
03:18 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page  https://review.openstack.org/599472
03:22 <kmalloc> clarkb: i also think that openstacksdk is doing the wrong thing here
03:23 <kmalloc> clarkb: trying to figure out why we have written a wrapper that insists on passing a bound method into a decorator instead of just wrapping the methods like you normally would.
03:24 <kmalloc> zxiiro: set to <0.7.0 for now
03:24 <kmalloc> clarkb: i'll write an email to the ML tomorrow/later tonight if someone else doesn't get to it first.
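
For anyone else hitting this, the pin kmalloc suggests would look something like the line below in a requirements or constraints file. This is a hedged sketch (where exactly it belongs depends on how your job installs the client); 0.6.8 is the last version zxiiro reported working.

    # temporary pin until the dogpile.cache decorator import breakage is resolved
    dogpile.cache<0.7.0
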
03:43 <ianw> kmalloc: i just dropped a mail now that we have everything lined up
03:49 <kmalloc> Thanks!
03:49 <kmalloc> I think I have a fix for SDK, just need to poke at it a bit tomorrow.
03:49 <kmalloc> Should be straightforward actually
03:55 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page  https://review.openstack.org/604266
03:55 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer  https://review.openstack.org/625145
03:55 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer  https://review.openstack.org/625146
04:55 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider  https://review.openstack.org/570667
05:37 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size  https://review.openstack.org/622010
05:58 <spsurya> whoami-rajat and I discussed this, so we think we can save our infra resources if we optimise and fix https://storyboard.openstack.org/#!/story/2004569. We are also checking how feasible this is; confirmation from the infra team would correct our understanding and approach
05:58 <spsurya> Thanks
06:05 <whoami-rajat> fungi clarkb ^ Please provide your valuable inputs on the above query. Thanks!
06:05 <gengchc> hello EmilienM! There is a problem in freezer-api and freezer: the Elasticsearch server can't start. Could you please take a look at https://review.openstack.org/#/c/624867/ ? The error message is [pkg/elasticsearch.sh:_check_elasticsearch_ready:53 :   die 53 'Maximum timeout reached. Could not connect to ElasticSearch']
06:06 <openstackgerrit> OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/625149
08:24 <openstackgerrit> Merged openstack-infra/project-config master: Add 'Review-Priority' for Cinder repos  https://review.openstack.org/620664
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page  https://review.openstack.org/599472
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer  https://review.openstack.org/625145
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page  https://review.openstack.org/604266
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer  https://review.openstack.org/625146
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add labels page  https://review.openstack.org/604682
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add nodes page  https://review.openstack.org/604683
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page to use a reducer  https://review.openstack.org/624894
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page using a container  https://review.openstack.org/624895
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add errors from the job-output to the build page  https://review.openstack.org/624896
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering  https://review.openstack.org/537869
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add project page  https://review.openstack.org/625177
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor job page to use a reducer  https://review.openstack.org/625178
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor labels page to use a reducer  https://review.openstack.org/625179
08:35 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor nodes page to use a reducer  https://review.openstack.org/625180
11:33 <dulek> Hey, any idea what might be using port 50036 on infra VMs? Or how do I check that?
11:33 <dulek> Our kuryr-daemon is unable to bind to it: http://logs.openstack.org/54/623554/6/check/kuryr-kubernetes-tempest-daemon-containerized-octavia-py36/aba6dc8/controller/logs/kubernetes/pod_logs/kube-system-kuryr-cni-ds-lb8xk-kuryr-cni.txt.gz#_2018-12-14_09_02_55_111
11:34 <dulek> It's not 100% of the time, but from time to time the port is taken.
12:32 <openstackgerrit> Jeremy Stanley proposed openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org  https://review.openstack.org/625241
12:33 <fungi> clarkb: jhesketh: corvus: ^ corresponding dns addition for lists.opendev.org
12:37 <jhesketh> fungi: lgtm
12:53 <openstackgerrit> Merged openstack-infra/system-config master: Add lists.opendev.org to Mailman  https://review.openstack.org/625096
13:02 <fungi> spsurya: whoami-rajat: it's come up many times in the past (as you can expect, lots of people propose this "simple" optimization), but it requires a lot of discussion because of the combination of whether we configure gerrit to clear or preserve verified votes on commit-message-only edits, whether some projects might want ci jobs which lint commit messages, and so on. can't hurt to discuss it again, but there's a lot more nuance to it than it might seem
13:05 <fungi> dulek: 50036 is well within the default ephemeral port range for the linux kernel (32768-61000) as well as iana's suggested range (49152-65535), so it could be, and most probably is, something random, and maybe a different process each time you hit it
13:06 <fungi> dulek: assigning a static listening port above 2^15 is a bad idea
13:07 <spsurya> fungi: thanks for the update
13:11 <dulek> fungi: Okay, thanks!
13:13 <fungi> dulek: in ci jobs, it's often more effective to use a method which chooses an available ephemeral port and then passes that information along to whatever routines will try connecting to it. this also allows you to have the same fixture start up multiple copies of a listening service without them conflicting over a single port and without having to manually configure individual ports for them
13:14 <fungi> i forget the syscall, but you can basically ask the socket to bind to an unspecified ephemeral port and it will get assigned one and return the integer value on success
13:14 <fungi> if this is python, socket.socket() and friends probably have a parameter explicitly for this
13:15 <dulek> fungi: That would be doable, but initially we simply used a file socket and stopped due to some issues with the requests lib. Maybe we should revisit that approach.
13:15 <fungi> sure, a named pipe/fifo for a unix socket is a useful alternative if you don't actually need it to be a real network connection
13:23 <openstackgerrit> Jeremy Stanley proposed openstack-infra/system-config master: Add rust-vmm OpenDev ML  https://review.openstack.org/625254
13:26 <fungi> jhesketh: clarkb: corvus: ^ and the first mailing list anyone has requested us to host on lists.opendev.org
13:51 <jhesketh> +2
14:00 <Shrews> fungi: dulek: i think binding to port 0 picks a random, available port
14:01 <Shrews> iirc, we do that in nodepool tests a lot
14:06 <fungi> oh, yep, that's the way
14:06 <fungi> for some reason i always forget port 0 is magic
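
A minimal sketch of the port-0 trick Shrews describes, using just the Python standard library (the address and printed message are illustrative):

    import socket

    # Bind to port 0 and the kernel picks a free ephemeral port for us;
    # getsockname() reports which one was actually assigned.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    host, port = sock.getsockname()
    print("listening on %s:%d" % (host, port))
    # Hand `port` to whatever needs to connect, rather than hard-coding a
    # value inside the ephemeral range (like 50036) and racing other
    # processes for it.
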
14:47 <dhellmann> gerrit-admin: could someone add me to the git-os-job-core and git-os-job-release groups, please, so I can complete the migration? https://review.openstack.org/#/admin/groups/1988,members and https://review.openstack.org/#/admin/groups/1989,members
14:51 <dansmith> Shrews: I'm finger | grep'ing right now.. nifty trick
14:51 <frickler> dhellmann: done
14:52 <openstackgerrit> Doug Hellmann proposed openstack-infra/project-config master: add release jobs for git-os-job  https://review.openstack.org/625273
14:52 <dhellmann> frickler: thanks!
14:53 <fungi> dhellmann: also if you don't feel like using the gerrit groups search form (i find it cumbersome) you can use urls like https://review.openstack.org/#/admin/groups/git-os-job-core
14:53 <dhellmann> oh, that's handy
14:53 <dhellmann> I couldn't remember the name of the group in this case, so I started from the git repo details page
14:53 <dhellmann> but in future...
14:54 <fungi> ahh, yep
14:54 <fungi> i think i stumbled on that entirely by accident, so not sure where/whether it's actually documented
14:57 <openstackgerrit> Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant  https://review.openstack.org/618569
14:58 <ssbarnea|rover> is there a way for zuul to finish a job with a WARNING result? I think I've seen some nice orange WARNING results in gerrit somewhere, but I'm not sure where.
14:58 <Shrews> dansmith: ++
14:59 <openstackgerrit> Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant  https://review.openstack.org/618569
15:05 <fungi> ssbarnea|rover: these are the job statuses documented as provided by zuul: https://zuul-ci.org/docs/zuul/user/jobs.html#build-status
15:28 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
15:31 <boden> hi, has anyone else reported a ContextualVersionConflict error cropping up in the last day or so that appears to be related to eventlet? ex: http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-2.log
15:31 <boden> I can't seem to figure out what's changed
15:33 <fungi> you've compared that install log with a previous passing run?
15:34 <fungi> looks like it's getting eventlet 0.24.1 (maybe via oslo.service?) when the constraint requests <0.21.0
15:38 <boden> fungi: yeah, I'm just trying to understand why/how... it just started breaking in the last 24 hours or so and I don't see any changes in requirements that would've affected it
15:39 <fungi> Collecting eventlet==0.24.1 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 22))
15:39 <fungi> http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-1.log
15:40 <openstackgerrit> Merged openstack-infra/storyboard master: Change openstack-dev to openstack-discuss  https://review.openstack.org/622377
15:40 <fungi> boden: https://git.openstack.org/cgit/openstack/vmware-nsx/tree/lower-constraints.txt#n22
15:40 <boden> fungi: yes, but the lower constraints haven't changed there recently... so why is it starting to fail now?
15:41 <fungi> i'm looking for the <0.21.0
15:43 <fungi> resorting to http://codesearch.openstack.org/?q=eventlet.*<0.21.0 since my hunches based on the error message didn't pan out
15:44 <fungi> none of those seem relevant either
15:47 <fungi> oh, i should check the tagged versions
15:47 <openstackgerrit> Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600  https://review.openstack.org/625290
15:48 <openstackgerrit> Doug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository  https://review.openstack.org/625292
15:50 <fungi> boden: Collecting oslo.service==1.24.0 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 80))
15:50 <fungi> boden: so it's https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.1#n6
15:51 <fungi> er, https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.0#n6 rather
15:52 <fungi> so your lower constraint for eventlet is set higher than what your lower constraint for oslo.service supports as its maximum eventlet version
15:52 <fungi> that's the reason for the error
15:52 <fungi> now as to why it only just started happening, this will require more digging
15:53 <boden> fungi: yeah, I don't understand why it just cropped up... I'll have to dig more into how we can resolve it
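
The conflict fungi traced, reduced to the two relevant entries (the line numbers refer to vmware-nsx's lower-constraints.txt linked above; the eventlet cap comes from oslo.service 1.24.0's own requirements.txt):

    # lower-constraints.txt (vmware-nsx), simplified
    eventlet==0.24.1       # line 22
    oslo.service==1.24.0   # line 80; this release itself requires eventlet<0.21.0

pip installs eventlet 0.24.1 as pinned, but oslo.service 1.24.0 declares eventlet<0.21.0, so pkg_resources raises the ContextualVersionConflict when the packages are loaded.
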
15:57 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
16:00 <dansmith> clarkb: so I've been trying to poke the lvm timeout stuff with a stick
16:00 <dansmith> clarkb: we tried serializing all lvm ops, which didn't seem to help
16:01 <dansmith> clarkb: I'm also wondering if having a couple few 24G loop devices is causing us to do some really long buffer flushes
16:01 <dansmith> clarkb: I dunno how much you know about how that works, but heavy writes to a loop device can OOM the system and the overhead is generally quite high, so I'm wondering if lvm ops occasionally cause a bunch of data to be flushed out and take a really long time
16:02 <dansmith> clarkb: >= bionic has direct-io support for loop, which should help if that's the case.. making loop devices behave more like real block devices. so I have a patch up for devstack to enable that when available
16:02 <openstackgerrit> Merged openstack-dev/hacking master: Change openstack-dev to openstack-discuss  https://review.openstack.org/622317
16:04 <fungi> boden: https://review.openstack.org/605834 is when the eventlet lower constraint got bumped in vmware-nsx. that merged on october 4
16:04 <fungi> the oslo.service lower constraint has remained unchanged since the job was added
16:04 <fungi> boden: this really hasn't been failing all the way back to october 4?
16:06 <boden> fungi: https://review.openstack.org/#/c/623609/
16:06 <boden> see the lower-constraints job
16:06 <fungi> boden: agreed, http://zuul.openstack.org/builds?project=openstack%2Fvmware-nsx&job_name=vmware-tox-lower-constraints&branch=master shows it succeeded as recently as 15 hours ago
16:13 <smarcet> fungi: how are u doing? thx for your advice on reviews, but i'm having an issue with apt::update on xenial: it complains about an entry in /etc/apt/sources.list: deb cdrom
16:13 <smarcet> fungi: if i remove that line by hand, the puppet run is ok
16:14 <smarcet> fungi: i am testing the puppet on a xenial 16.04 LTS server
16:15 <smarcet> fungi: the error seems to be  _("/etc/apt/sources.list contains a cdrom source; not installing.  Use 'allowcdrom' to override this failure.")
16:16 <clarkb> dansmith: that is good to know, I can review the devstack change if you like. I think we are largely switched over to bionic for devstack/tempest testing at this point, so we should see a change if the direct io support helps
16:16 <ttx> Hi! With some IRC meetings having moved to team channels, we have a lot more room in the "common" meeting rooms. To the point where openstack-meeting-5 is not used that much and we could easily consolidate to the other 4. Would that be desirable or overkill?
16:16 <dansmith> clarkb: https://review.openstack.org/#/c/625269/2
16:17 <dansmith> clarkb: swift uses a loop as well, but via mount -o loop, which doesn't get directio turned on..
16:17 <dansmith> clarkb: I figure if this seems to help I can refactor out some loop utilities and do the loop manually for the swift piece if we decide it's worth it
16:17 <ttx> with only 32 lurkers, meeting-5 fails to deliver the "lurkers benefit too!" benefit
16:18 <ttx> we could also get rid of #openstack-meeting-cp. 31 lurkers, no meetings.
16:19 * cmurphy had no idea we had a -5
16:19 <ttx> cmurphy: well, only neutron-upgrades and helm use it right now. And they could move to another room that's free at the same time
16:21 <openstackgerrit> Merged openstack-infra/git-review master: test_uploads_with_nondefault_rebase: fix git screen scraping  https://review.openstack.org/623096
16:25 <clarkb> whoami-rajat: spsurya: One gotcha with that is that projects have chosen in the past to enforce testing against their commit messages beyond simple rules like the metadata for depends-on
16:25 <openstackgerrit> Merged openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org  https://review.openstack.org/625241
16:26 <clarkb> whoami-rajat: spsurya: implementing a feature like that should likely go in zuul itself and be a per-project flag if we want to try it.
16:27 <clarkb> that said, as I have tried to point out elsewhere, the real cost for openstack infra is tied up in a small number of repos and really one extra large project. I am going to continue to push for fixing flaky tests and reducing the impact of those expensive projects over these smaller optimizations
16:27 <clarkb> What we get out of reliable testing is not just more efficient use of resources but better software too
16:27 <ttx> fungi: opinion on that? (IRC channels ^)
16:28 <clarkb> Shrews: yup, port 0 will bind to an available high port, and the python socket lib lets you ask the socket object for the port number it found
16:28 <clarkb> super useful in testing
16:28 <fungi> smarcet: you're seeing that error raised by our ci jobs, or on your local system? if the latter, i expect puppet just doesn't think you'll be running on a system installed from cd (or which gets its updates by cd anyway) and instead assumes you'll have removed the cdrom lines from your config already
16:29 <smarcet> the latter
16:29 <smarcet> ok i will remove it by hand then and retest
16:29 <smarcet> thx u!
16:29 <jbryce> Thanks for setting up the lists.opendev.org pieces. I think this is a simple but neat step toward getting more communities involved
16:30 <clarkb> jbryce: it's on its way. I've +2'd https://review.openstack.org/#/c/625254/1 but not approved it in case fungi would like more non-OSF input first. fungi, you've tended to be cautious on that front in the past; let me know if I should just go ahead and approve or if you want to
16:31 <fungi> ttx: i do not object to smashing meeting-5 and meeting-cp into the others if someone wants to reach out to those teams to ask them to consolidate. they likely need time to warn their regular attendees about the channel changes
16:31 <ttx> yes of course. Was just wanting to gut-check that it was desirable before starting anything
16:32 <fungi> clarkb: well, we have jhesketh's blessing at least. but sure, if we can get an additional infra-root reviewer to weigh in i'm all for that, as it is our first proposed mailing list on that new domain
16:34 <clarkb> fungi: any chance you have a moment to quickly review https://review.openstack.org/#/c/615968/3 and its parents? I think I can likely get through that portion of the stack today (so I can approve them in chunks today and babysit)
16:34 <fungi> trying to catch up, but sure, i'll get it on my roster
16:35 <clarkb> thanks!
16:35 <clarkb> dansmith: fwiw my version of losetup says that direct-io=on is the default setting
16:36 <clarkb> dansmith: possible we are already enabling it on bionic. /me digs up a bionic manpage
16:36 <dansmith> clarkb: mine too, but I floated a test patch to confirm that's a lie
16:36 <clarkb> oh neat
16:36 <dansmith> clarkb: https://review.openstack.org/#/c/625268/
16:36 <clarkb> ya, the bionic manpage says the same, so if it isn't actually set to on as claimed, that's a fun bug
16:36 <dansmith> see the pastebin in there
16:37 <clarkb> certainly seems set to 0
16:37 <dansmith> when I pass =on it goes to 1
16:37 <dansmith> so yeah
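
A quick way to reproduce what dansmith is seeing; this is a hedged sketch (it assumes sudo access, a util-linux new enough to support --direct-io, and a throwaway backing file at a made-up path):

    import subprocess

    def losetup(*args):
        return subprocess.check_output(("sudo", "losetup") + args, text=True).strip()

    # Scratch backing file for the loop device.
    subprocess.check_call(["truncate", "-s", "1G", "/tmp/backing.img"])

    # Attach with no flags: despite the manpage claiming direct-io defaults
    # to "on", the DIO column reportedly comes back 0 here.
    dev = losetup("--find", "--show", "/tmp/backing.img")
    print(losetup("--list", "--output", "NAME,DIO", dev))

    # Re-attach with direct I/O explicitly requested: DIO flips to 1.
    losetup("--detach", dev)
    dev = losetup("--find", "--show", "--direct-io=on", "/tmp/backing.img")
    print(losetup("--list", "--output", "NAME,DIO", dev))
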
16:44 <clarkb> frickler: if you are still around, https://review.openstack.org/#/c/625269/ is dansmith's change from above that may help cinder test reliability
16:44 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS  https://review.openstack.org/624817
16:45 <clarkb> mwhahaha: ^ fyi, that should improve the reliability of the multinode setup
16:45 <mwhahaha> cool, in general it's been really stable lately
16:45 <clarkb> (it is worth noting that that was always failing in pre-run so should've been retried, but we should avoid retries as much as possible where we can too)
16:46 <clarkb> mwhahaha: ya, I think http://status.openstack.org/elastic-recheck/gate.html#1708704 shows a worse picture than what we are seeing on gerrit because those failures will be retried
16:46 <clarkb> but cleaning that up and getting it out of the way on e-r will improve resource usage slightly and also fix a bug
16:46 <mwhahaha> we've had a few of those in our container update process where we were getting 503s from the mirrors
16:46 <clarkb> (and reshuffle e-r with the more important graphs at the top)
16:46 <mwhahaha> so it might have been accurate actually
16:47 <fungi> yeah, every retry_limit result you see probably means 6x as many jobs got aborted (since some may work on the second or third retry)
16:50 <clarkb> hrm, oslo.policy crashing stestr subunit streams is a really weird interaction
16:50 <clarkb> ah, infinite recursion, that will do it
16:51 <fungi> clarkb: i gather it's likely due to creating massive amounts of stdout/stderr?
16:51 <fungi> and yeah, i suppose unbounded recursion could explain that case
16:51 <clarkb> fungi: possibly due to infinite recursion. https://review.openstack.org/#/c/625114/4/glance/quota/__init__.py seems to be the fix
16:52 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
17:08 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
17:09 <clarkb> ssbarnea|rover: was going to ask if you had any more insight into the possible du issue. I'm mostly curious to know what the cause of that was when you sort it out, as it seems like it could be useful knowledge for the future :)
17:10 <ssbarnea|rover> clarkb: sure, I think i almost nailed it. I will add you to the review so you will be able to see it, ok?
17:10 <ssbarnea|rover> clarkb: mainly I am still on it!
17:10 <clarkb> ssbarnea|rover: thanks
17:10 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
17:11 <clarkb> ssbarnea|rover: it's the sort of bug where knowing what causes zuul to exhibit that behavior is useful as a zuul operator :)
17:11 <ssbarnea|rover> clarkb: sadly I am not sure, but i suspect i found a workaround, see https://review.openstack.org/#/c/624381/13
17:12 <ssbarnea|rover> using the threshold param on du makes it 10x faster, probably because sort ends up doing much less work.
17:12 <clarkb> interesting, so du does still seem suspect
17:12 <clarkb> the threshold flag is probably a reasonable compromise there
17:13 <ssbarnea|rover> using timeout solves nothing, even with SIGKILL it does not do it.
17:13 <dmellado> hey clarkb, are there any issues with zuul as of now? I have seen patches passing on the gate queue being stuck for a while and not getting merged....
17:13 <clarkb> dmellado: I'm not aware of any functional issues with zuul itself
17:14 <ssbarnea|rover> there is also another aspect, which could point to a possible bug related to std* redirections or buffering.
17:14 <clarkb> dmellado: kuryr-kubernetes is waiting for the top of its queue to pass tests so it can merge. kuryr-kubernetes-tempest-daemon-octavia is still running
17:14 <ssbarnea|rover> if you remember, we always had some warnings about closed pipes around du|sort|tail, something I was not able to reproduce outside zuul.
17:14 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
17:14 <dmellado> clarkb: I've seen for example https://review.openstack.org/#/c/623554/ and if you check for openstack/kuryr-kubernetes on zuul.openstack.org some patches are stuck on the gate queue for a while
17:15 <ssbarnea|rover> maybe these warnings were related to the blocking bug.
17:15 <clarkb> dmellado: yes, the gate is a queue; the top/head of the queue must pass testing and merge before anything behind it can merge
17:15 <clarkb> dmellado: this is what ensures correctness of the resulting code (we remove the race of one change breaking another by merging out of sync)
17:15 <dmellado> d'oh
17:15 <dmellado> forget it
17:15 <dmellado> I had filtering enabled and didn't realize it
17:15 <dmellado> lol
17:15 <dmellado> I guess it's friday after all
17:17 <clarkb> dmellado: no worries. I had a pretty d'oh moment yesterday thinking we had broken requirements
17:17 <dmellado> heh, glad that it didn't happen xD
17:17 <clarkb> (turns out it was a broken job on unmerged code, the system was working as intended, protecting us from the broken :) )
17:18 <dmellado> xD
17:29 <clarkb> ssbarnea|rover: thinking about the warnings more, perhaps they're also related to how zuul does logging?
17:30 <clarkb> ssbarnea|rover: could be there is a buffering bug lingering somewhere or similar
17:30 <clarkb> (would need more data to debug that likely)
17:39 <clarkb> mriedem_lunch: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_15_29_30_910274 timed out and lost a bunch of time in tempest. Any idea what is going on there? I can dig in more if that is unfamiliar to you
17:39 <clarkb> it seems that tempest got incredibly unhappy and it just snowballed from there
17:42 <dansmith> clarkb: almost looks like something fundamental is stuck.. like keystone or apache itself
17:44 <clarkb> dansmith: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/apache/access_log.txt.gz shows that apache seems to process requests during the lost time. Many of them to placement
17:44 <dansmith> yep
17:44 <clarkb> There is the occasional identity request (reupping a token?)
17:45 <dansmith> you know what I mean though, right? if everything after that is just timing out http calls..
17:45 <clarkb> ya
17:49 <dansmith> clarkb: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/screen-n-api.txt.gz#_Dec_14_15_02_19_898727
17:49 <dansmith> rabbit goes down at some point it looks like
17:51 <dansmith> although I don't really see any evidence in rabbit's log,
17:51 <dansmith> so maybe something networking-wise
17:52 <clarkb> there are a bunch of missed heartbeats, but ya, other than that rabbit doesn't seem to think something is wrong
17:53 <dansmith> we're connecting to the public ip, but that shouldn't really require that the network be up
17:53 <dansmith> so it'd have to be something like iptables blocking something, or just extreme scheduling lag, or something like that
17:54 <dansmith> http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/compute1/logs/screen-n-cpu.txt.gz?level=ERROR
17:54 <dansmith> the compute is way unhappy about rabbit
17:55 <dansmith> it also got 502 Proxy Error from cinder
17:55 <dansmith> and neutron
17:56 <clarkb> dansmith: I've pulled up a render of the dstat csv and I think that explains why this happens
17:56 <clarkb> cpu wai > 50% for much of the job
17:57 <clarkb> and load skyrockets. Basically not enough cpu to go around
17:57 <dansmith> do you have some automated way to do that btw?
17:57 <dansmith> clarkb: cpu wai is not iowait right?
17:57 <clarkb> cpu wai is when the kernel is busy waiting on io iirc
17:57 <clarkb> so it can be iowait related
17:57 <clarkb> dansmith: I dump the csv file from the job into https://lamada.eu/dstat-graph/
17:58 <dansmith> okay, iowait does not mean that there's not enough cpu to go around
17:58 <dansmith> ooh
17:58 <dansmith> have to download it to drag/drop it I guess?
17:59 <clarkb> ya. There is probably a way to hack things in the js behind the scenes to load from http, but I'm a browser noob
18:00 <clarkb> that's a good point though. cpus are busy waiting on other things. Not things the cpu can do itself
18:01 <dansmith> that cpu wai looks like iowait to me,
18:01 <dansmith> which would mean we're on a really io constrained node
18:01 <clarkb> we don't appear to be swapping either (though that graph doesn't actually render swap usage, so I'll need to look more carefully at the raw data)
18:01 <dansmith> and io total is low until the end
18:01 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
18:01 <dansmith> but, just because we're not doing any doesn't mean we're not trying
18:02 <dansmith> fwiw, this was next on my list of debugging cinder timeouts, rendering out this data to see if we see spikes around the time we hang for a while
18:02 <clarkb> ya, it is also curious that it seems to start around when we start tempest
18:02 <clarkb> er no, devstack? /me double checks timestamps
clarkber no devstack? /me double checks timestamps18:02
*** trown is now known as trown|lunch18:03
18:03 <clarkb> http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_14_50_51_551412 tempest correlates strongly
18:03 <dansmith> ah yeah, see,
18:04 <clarkb> so devstack + services isn't unhappy until it adds workload
18:04 <dansmith> the cpu idle value is like 50% most of the time
18:04 <dansmith> that means it has nothing to do, but is doing a lot of waiting
18:05 <dansmith> if you mouse over the graph you can see the actual cpu idle value, which is hard to see otherwise since it's white on my screen at least
18:06 <dansmith> disk traffic in MB/s is zero the whole time until the end, and then writes spike,
18:06 <dansmith> but there are write iops the whole time
18:07 <smcginnis> So directio should help prevent that big flush from happening at the end.
18:07 <dansmith> smcginnis: it's possible that that's what it is, yeah, although I wouldn't expect the loop to hamstring the whole system this badly
18:08 <smcginnis> Yeah, seems like something else is compounding the situation.
18:08 <dansmith> there are literally zero read iops over the range I'm looking at, but some write iops all the time
18:08 <dansmith> which really sounds like thrashing with very little io bandwidth
18:12 <clarkb> what is odd is devstack does a bunch of io too
18:12 <dansmith> there is a big spike in writes and write iops early in the run, which is exactly when cinder runs lvchange -ay for the first volume
18:12 <clarkb> so we have io available during the start and end of the job. It isn't until we try to use the cloud that we find it unhappy
18:12 <dansmith> and load starts climbing right there and never recovers
18:12 <dansmith> let me get a screenshot, this is interesting
18:12 <clarkb> so it could be a combo workload plus a bug, or different io demands
18:13 <dansmith> https://imgur.com/a/0bvhVEB
18:13 <clarkb> (also we did just switch to bionic, possibly this is new bionic behavior and maybe direct-io would help)
18:13 <dansmith> right as that disk spike hits, load goes nuts
18:13 <dansmith> that spike is lvchange -ay $first_volume
18:14 <ssbarnea|rover> does anyone know a way to dump the SSL certificates when using an https proxy? i need the certs returned by the proxy for debugging purposes.
18:14 <clarkb> ssbarnea|rover: openssl s_client
18:14 <ssbarnea|rover> clarkb: i know how to use it to get a certificate from a normal web server, but not how to make the request through a proxy
18:15 <dansmith> there's a net spike at the same time.. we don't have /opt mounted on nfs or something crazy do we?
18:15 <ssbarnea|rover> the HTTPS proxy would generate and sign an SSL cert using its own CA cert.
18:15 <clarkb> ssbarnea|rover: ok, so the proxy is a mitm
18:15 <fungi> ssbarnea|rover: right, make a request to the proxy and you'll get it
18:15 <clarkb> ssbarnea|rover: in that case I think s_client should still work
18:16 <clarkb> since that is the cert you see, not the one on the backend
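
For reference, s_client can be pointed through the proxy; a hedged sketch (the -proxy option needs OpenSSL 1.1.0 or newer, and the host names here are made up):

    import subprocess

    # Dump the certificate chain the MITM proxy presents for a given site.
    # Empty stdin makes s_client exit once the handshake is done.
    out = subprocess.run(
        ["openssl", "s_client", "-proxy", "proxy.example.com:3128",
         "-connect", "pypi.org:443", "-showcerts"],
        input="", capture_output=True, text=True)
    print(out.stdout)
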
18:16 <ssbarnea|rover> clarkb: yep... me trying to debug why curl works and python requests (and pip) choke with the same cert bundle.
18:16 <clarkb> ssbarnea|rover: probably because python requests uses its own package of CAs to trust
18:17 <clarkb> so it isn't using your system set
18:17 <fungi> ssbarnea|rover: they don't use the same trust set. python (or requests if python is too old) bundles its own by default
18:17 <ssbarnea|rover> clarkb: yep, i need the proxy cert, but the proxy cert returned when the request was made for that specific URL (the cert will vary for each website)
18:18 <clarkb> ssbarnea|rover: not if you are mitm'd
18:18 <ssbarnea|rover> clarkb: I do have SSL_CERT_FILE=/Users/ssbarnea/cacert.pem and REQUESTS_CA_BUNDLE=/Users/ssbarnea/cacert.pem -- which worked well so far.
18:18 <fungi> clarkb: they'll still differ for each site if it's a "transparent" proxy which uses its own ca to generate new certs for those sites on the fly
18:19 <ssbarnea|rover> cacert.pem contains the root CA from the proxy server. proof that it is ok is that both browsers and curl accept use of the proxy.
18:19 <clarkb> dansmith: maybe this is a case of getting your direct-io change in, then looking to see if the behavior changes or persists
18:19 <dansmith> it will definitely be interesting to see if and what changes, yeah
18:19 <dansmith> clarkb: side note.. we _have_ to automate this dstat graph thing, right? :)
18:20 <clarkb> dansmith: there was support for it in the stackviz tool but that broke somewhere along the way
18:20 <fungi> ssbarnea|rover: pass verify='/path/to/public_key.pem' as a parameter into your requests methods
18:20 <clarkb> dansmith: but ya, it would be nice to get this back into an easy to consume format
18:20 <dansmith> yeah
18:20 <ssbarnea|rover> yep, my proxy is configured in transparent mode but traffic is not enforced, i still need to tell clients to use the proxy.
18:20 <ssbarnea|rover> fungi: I did set verify, it still fails.
18:20 <fungi> ssbarnea|rover: you can also export REQUESTS_CA_BUNDLE
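
Both of fungi's suggestions in one place; a minimal sketch, assuming the bundle path from the log (swap in your own) and using pypi.org purely as an example target:

    import os
    import requests

    # Bundle that contains the proxy's signing CA, in PEM format.
    bundle = "/Users/ssbarnea/cacert.pem"

    # Either per call...
    resp = requests.get("https://pypi.org/", verify=bundle)
    print(resp.status_code)

    # ...or process wide, via the environment variable requests honors,
    # which also covers other requests-based code in the same process.
    os.environ["REQUESTS_CA_BUNDLE"] = bundle
    print(requests.get("https://pypi.org/").status_code)
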
18:21 <ssbarnea|rover> see https://gist.github.com/ssbarnea/3d5067d41abc68c3788f1c9bc0ab4418#file-ssl-request-transparent-proxy-txt-L29
18:21 <ssbarnea|rover> yes, they are exported.
18:23 <dansmith> clarkb: I picked another random tempest-full run from a few days ago and it doesn't have the same signature
18:23 <ssbarnea|rover> i suspect one of two issues: either requests fails to load the entire ca bundle (>250kb in size), or it fails to validate the entire chain because the MITM re-signing generates some intermediary certs, if I remember well. probably requests fails to inherit the trust from the CA.
18:24 <dansmith> like, we do IO the whole time successfully and much more normally
18:24 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
18:25 <ssbarnea|rover> obviously it makes no sense to export the certs from each visited site, trusting the CA should be enough.
18:25 <fungi> ssbarnea|rover: i can't fathom why there would be any intermediary chain required if you're putting the ca's signing cert into the bundle already
18:25 <clarkb> if it does end up being an issue we can tie back to ubuntu bionic, we may want to see if coreycb is able to help
18:26 <fungi> ssbarnea|rover: contents of the ca bundle are basically the end of the line. if one of those signed something, the cert with that signature is trusted
18:26 <ssbarnea|rover> fungi: exactly. i used this proxy to install fedora/centos without any problems once I trusted the CA. Only pip chokes on it, on mac... in fact i have an idea...
18:27 <clarkb> fwiw pip has historically had other issues with proxies too
18:27 <fungi> ssbarnea|rover: oh! this is pip on a mac? not pypi-installed pip in a ci job?
18:27 <ssbarnea|rover> only one issue? python requests is notorious for doing things its own way around SSL.
18:27 <fungi> i thought you were trying to work out how to dump and log ssl certs in a ci job. tunnel vision ;)
18:28 <Linkid> hi
18:28 <Linkid> I have a question about the spec I'm writing
18:28 <ssbarnea|rover> fungi: well, my long shot is to see if I can use an HTTPS proxy as a generic proxy, as a simple way to avoid configuring custom mirrors.
18:28 <Linkid> I saw that you are using puppet modules to install services
18:29 <Linkid> but I saw that there is a spec for using ansible + containers
18:30 <clarkb> Linkid: yes, we are now running two services with ansible alone (no puppet), and working out some bugs shown in testing to do containers (so no container-based services yet)
18:30 <Linkid> so, I'm wondering which way I should speak of in the spec for a new service
18:31 <clarkb> Linkid: depending on your patience for tools: in general I would likely assume ansible if you are impatient, but you can assume containers if you're more patient
18:31 <fungi> ssbarnea|rover: you're sure your ca bundle is in pem format?
18:32 <clarkb> (unfortunately we are in the weird spot of figuring out what our migration looks like. I think you should be fine to pick one and go with it, and if we learn stuff that changes your spec we can help you to update it)
18:33 <ssbarnea|rover> fungi: if it were not, curl would choke, because I defined SSL_CERT_FILE to point to the same file.
18:33 <openstackgerrit> Clark Boylan proposed openstack-infra/system-config master: Set iptables forward drop by default  https://review.openstack.org/624501
18:33 <openstackgerrit> Clark Boylan proposed openstack-infra/system-config master: Collect syslogs from nodes in ansible tests  https://review.openstack.org/624827
18:33 <openstackgerrit> Clark Boylan proposed openstack-infra/system-config master: Import install-docker role  https://review.openstack.org/605585
18:34 <fungi> ssbarnea|rover: is it possible curl supports SSL_CERT_FILE in other formats, like a bare cert?
18:34 <clarkb> infra-root ^ I've removed the ipv6 setting from that change as we intend on using the system network namespace anyway. I expect that update will get the change into a mergeable state. Now why did it just push all three changes? I only updated the last one
18:34 <clarkb> OH!
18:34 <fungi> ssbarnea|rover: like could it be in der format maybe?
18:34 <ssbarnea|rover> fungi: I don't know, but I will narrow it down, probably tomorrow, as I am already getting tired.
18:34 <ssbarnea|rover> looks like PEM to me.
18:34 <clarkb> fungi: stephenfin: ssbarnea|rover: ^ this points at a git-review bug
18:35 <clarkb> I ran git-review in a dir that doesn't exist on master, so it failed with Errors running git reset --hard 3496292b845d33be6c5649195a54ccbf76494050
18:35 <openstackgerrit> sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7  https://review.openstack.org/624957
18:35 <clarkb> I think it must've done the rebase by that point
18:35 <clarkb> and that error prevented it from undoing the rebase, so when I pushed it pushed up the rebase too
18:35 <fungi> ssbarnea|rover: another workaround seems to be setting cert=/path/to/ca.crt in a [global] section within pip.conf
18:36 <clarkb> thankfully the diffs come out cleanly
18:36 <ssbarnea|rover> fungi: it does not help, as it does the same thing as setting the variable, which I already did. the only way I was able to make it work was to set verify=False, which is not really an option
18:37 <ssbarnea|rover> anyway, i will find a solution.
18:37 <fungi> clarkb: i expect we don't get enough testing of running git-review from random subdirs of a repo. maybe we should make sure it performs all its actions under a chdir
18:37 <clarkb> fungi: ya, the repo root should be more predictable
18:38 <fungi> ssbarnea|rover: this sounds like you should be filing a bug report against pip/requests or asking for help in their irc channels instead
18:38 <jrosser> ssbarnea|rover: i have a similar situation and add a custom CA to the system CA store
18:38 <jrosser> then using the requests env var to point to that, and it is all good
18:38 <fungi> jrosser: that's working for pip install in particular?
18:38 <fungi> or just requests-based python software in general?
18:39 <jrosser> i get an entire openstack-ansible deploy done in an environment like that
18:39 <jrosser> the big gotcha is that by default requests is set up to use certifi on ubuntu, but you can't add extra custom CAs to that
18:40 <jrosser> so the env var is needed to point it at the ca-certificates stuff instead
18:41 <jrosser> so in principle, if all the certs are good, then it should work
18:42 <dansmith> clarkb: does anyone try to do correlation between e-r failures and providers?
18:42 <dansmith> clarkb: it would be interesting to know if the cinder-related failures are almost always on one provider
18:42 <clarkb> dansmith: I had tried some of it with the ovh bhs1 slowness we saw
18:43 <clarkb> dansmith: there were ~6 bugs that went away when we turned off bhs1 the first time
18:43 <ssbarnea|rover> there is a patch for e-r that should add metadata about this, i think.
18:43 <dansmith> clarkb: nice
18:43 <ssbarnea|rover> it should provide info while you hover over the graph, once we merge it.
18:43 <fungi> dansmith: yes, fairly easy to do by following the logstash query link from the graphs page and then adding the node_provider column
18:43 <clarkb> dansmith: I haven't followed up since we turned off bhs1 again (and it is still off)
18:44 <clarkb> at least in that specific case amorin found a memory leak on their end that would need fixing, and we'll likely do artificial load testing with devstack and tempest outside of zuul before adding it back to the pool
18:44 <dansmith> fungi: ah, thanks
18:44 <fungi> dansmith: usually if there is a strong correlation with node_provider in the results you don't really need to do much statistical analysis
18:44 <dansmith> fungi: yeah
18:44 <fungi> like, if it's a provider-specific issue, 9 out of 10 results will be that one provider
18:44 <clarkb> dansmith: in this particular case it happened on inap, which we weren't previously watching for io issues
18:45 <fungi> at least on the ones i've investigated in the past
18:45 <dansmith> yup, I just didn't know I could click one box and get that in front of my face
18:46 <fungi> on the failures related to job timeouts it's been a little more subtle, so i've resorted to turning off all columns except node provider, pasting the resulting list into a file and running it through sort|uniq -c
18:47 <fungi> using the little gear to the top-right of the results list to crank up the number of results per page also helps for that case
18:47 <fungi> so that you don't have to stitch together multiple pages of results
18:48 <dansmith> yeah
18:49 <Linkid> clarkb: ok, thanks :)
18:50 <clarkb> (thinking out loud with little hard evidence here) if we do find that we have general IO issues across clouds, we may have to consider that it is not cloud specific but potentially something in how nova runs compute, or issues in the linux kernel. I think most of our clouds use local storage for our instances. The exception being vexxhost in sjc1, which is boot from ceph-backed volumes
18:50 <mriedem> clarkb: nope, don't think i've seen that
18:50 <ssbarnea|rover> clarkb: fungi: dansmith: it may be worth checking https://review.openstack.org/#/c/260188/10 about e-r - i only rebased it to make it pass ci and tested the CLI output. i didn't have time to test the graphs.
18:54 <ssbarnea|rover> clarkb: apparently my du patch seems to work well, but I will do two more rechecks on it to be sure it's not just luck.
18:58 <fungi> ssbarnea|rover: what was the workaround there?
18:58 <ssbarnea|rover> fungi: --threshold=100K combined with a nohup ... & -- just to be sure.
18:59 <fungi> ssbarnea|rover: oh, you weren't using --summarize
19:00 <fungi> hence, tons of stdout
19:00 <ssbarnea|rover> likely. i decided that i don't want to wait for du to finish. the irony is that the threshold already made it run in 0s.
19:01 <ssbarnea|rover> but the background part should be a safety measure: if it ever hangs again, the job will miss the report, but it will be a success.
19:01 <clarkb> as long as you still get a total so you can see if things move on the long tail, I think it's fine
19:01 <ssbarnea|rover> this is why i want to have 2-3 rechecks, to see if I spot one such job.
19:02 <ssbarnea|rover> --threshold=100K was a huge improvement on my own machine: it reduced the report from 15s to 1s.
19:03 <ssbarnea|rover> (not the same data as a build, but clearly less data goes to sort; it's not like sorting 1M files just to keep the top 200)
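
A hedged sketch of the pipeline under discussion (the path and the top-200 figure are illustrative; the actual zuul role also wraps this in nohup ... & as a safety measure). --threshold drops every entry smaller than 100K, so far less output reaches sort, which is where the reported 15s to 1s speedup comes from:

    import subprocess

    # Report the 200 largest entries over 100K instead of sorting everything.
    report = subprocess.run(
        "du --threshold=100K /var/log 2>/dev/null | sort -n | tail -n 200",
        shell=True, capture_output=True, text=True, timeout=60)
    print(report.stdout)
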
19:09 <clarkb> fungi: I promise to buy you all the beers in portland next month if you can review https://review.openstack.org/#/c/615968/ and children :) (I'm impatient, but the turnaround time on those puppet-related changes isn't the quickest)
19:09 <fungi> clarkb: i have them up in gertty, almost there now
19:09 <clarkb> yay
19:10 <fungi> also, i don't think i could drink all the beer in portland any more than i could eat all the rice in china
19:10 <clarkb> the trick is to be strategic about the beers you do drink, then call it good
19:10 <fungi> sound logic
19:11 <fungi> i just now approved the one for logstash.openstack.org
*** dklyle has joined #openstack-infra19:11
fungido you want me to only +2 the others so you can approve at your preferred pace?19:11
clarkbfungi: ya that would be good19:11
fungidon't want to flood you with too many at once19:11
clarkbI'll do subunit workers and elasticsearch with the logstash one as well. Then make sure those are happy before being a bit more cautious with the git servers19:12
*** e0ne has quit IRC19:13
fungithere are so many of these chained together that the threading in gertty's change display doesn't even give me the first letter of title on the last dozen or so19:13
fungier, changes list display i mean19:13
clarkbhrm your reviews prompted me to hop on nodes and double check things, and lists.o.o and wiki-dev don't have parser = future in their puppet.conf19:14
clarkbso I don't think this is actually working as expected.19:14
clarkbI think I won't approve any additional ones and instead try to figure out why they aren't futureparsing19:14
fungioh, are we failing to set it?19:14
*** bhavikdbavishi has quit IRC19:15
*** bhavikdbavishi has joined #openstack-infra19:15
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository  https://review.openstack.org/62529219:16
clarkbfungi: it looks that way though not obvious to me why that is the case19:16
fungiwhat were some of the earlier ones we tried to turn on?19:18
clarkb`ansible --list-hosts futureparser` shows wiki-dev01.o.o is in the group19:18
clarkbfungi: eavesdrop is one that works19:19
clarkbit was the host before kata containers lists19:19
clarkbI haven't checked kata containers lists though19:19
*** armax has joined #openstack-infra19:19
clarkbfungi: neither lists server shows up in the --list-hosts output from above19:20
*** bhavikdbavishi has quit IRC19:20
clarkbI think there are at least two issues. The first is that lists.* don't show up in the futureparser group at all. The second is hosts like wiki-dev01.o.o, which is in the group, not getting it. Oh, that is because puppet is disabled on wiki-dev01 maybe?19:21
*** dklyle has quit IRC19:21
clarkbya not seeing any puppeting happen there19:21
clarkbin that case I think we wait for logstash to happen and see if it works properly as expected. Then figure out the lists server group membership issue19:21
clarkbfungi: I think it's an issue localized to the list servers now. Likely the glob is wrong for them19:22
clarkblogstash.o.o should confirm19:22
clarkbya I see the problem now: [0-9]* in glob means something different than in regex19:23
clarkbin regex it means match zero or more digits; in glob it means match exactly one digit, then match anything19:23
* clarkb scribbles a note to come back around and address that when we can watch the lists19:23
*** wolverineav has quit IRC19:25
*** wolverineav has joined #openstack-infra19:25
fungiaha19:26
fungiyes indeedie19:26
*** Emine has quit IRC19:26
fungii don't think there's a "zero or more" operator in shell globbing19:27
*** gagehugo has quit IRC19:28
fungiwell, except for an any match at least19:29
*** trown|lunch is now known as trown19:29
*** wolverineav has quit IRC19:30
openstackgerritClark Boylan proposed openstack-infra/system-config master: Import install-docker role  https://review.openstack.org/60558519:30
clarkbthis time git-review only pushed the one change19:31
clarkbya the * is a match-nothing-or-anything, which would actually work there19:31
clarkbbut I think we may want to list lists.o.o and lists[0-9]*.o.o separately and delete lists.o.o when we get that host converted over19:31
fungiright. basically need two entries in the case of non-enumerated hostnames19:32
fungior to cover the case of19:33
clarkbI'll try to get some of these changes merged before pushing updates to that file though :)19:33
clarkbrebasing yesterday was an interesting experience19:33
fungiseeing [0-9]* in there is definitely confusing as it means two distinctly different patterns depending on whether you intended a file glob or a regex19:34
clarkbya I've already had to fix a bunch of related bugs19:34
clarkb+ isn't valid in globbing for example19:34
fungione digit followed by anything (or nothing) vs zero or more digits19:35
clarkbmaybe a file header comment that says "this file uses shell globs not regexes"19:35
fungii really only see that assumption as a risk if we expect to have digits in the middle of the host portion of some server names19:35
fungiif we can assume digits will always fall at the end, it's fine19:35
clarkbya so far that is true19:36
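A quick shell illustration of the two interpretations being discussed; the hostnames are made up:

    $ ls
    lists.example.org  lists01.example.org
    $ echo lists[0-9]*            # glob: exactly one digit, then anything
    lists01.example.org
    $ ls | grep 'lists[0-9]*'     # regex: zero or more digits, so both match
    lists.example.org
    lists01.example.org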
*** wolverineav has joined #openstack-infra19:38
*** dklyle has joined #openstack-infra19:42
notmynameI'm seeing a permission denied error on one of our test jobs. it doesn't seem to be something related to swift code, so I'm hoping someone here may be able to provide some insight. http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/job-output.txt.gz#_2018-12-14_19_23_50_36210219:42
clarkbnotmyname: let me see19:42
notmynamethank19:43
notmynames19:43
openstackgerritSean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600  https://review.openstack.org/62529019:45
clarkbnotmyname: I think pip is saying it can't read the git repo for swift. I believe that command is running as the zuul user, so if something earlier in the job has updated or chowned that repo to another user that may explain it (digging in logs for any evidence of that)19:45
clarkbnotmyname: http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/ara-report/file/1ba7e97e-08df-4385-b6ca-32e4edce0d22/#line-28 I think that may do it19:47
clarkbnotmyname: that task is running python setup.py develop in the swift repo as root which will modify the build dir and package link stuff iirc. Then when tox tries to do the same as the zuul user it fails to update those files19:48
clarkbnotmyname: I think there are a few options to fix that 1) only install swift external to tox (there is a tox setting to not install the source repo; see the sketch below) 2) only install with tox and don't do it externally19:49
clarkbif you are just using tox as a way to trigger the testsuite then the first option might make the most sense19:49
clarkbthere is a 3) which is to have a subsequent task clean up/chown things so that tox works19:50
notmynameclarkb: ah, ok. thanks. I'll have to think on this and talk with timburke and tdasilva to see what the best option is19:50
clarkband 4) run tox as root19:50
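For option 1, a hedged sketch of the tox.ini settings involved, assuming tox's skipsdist/skip_install knobs are what is meant by "a tox setting to not install the source repo":

    [tox]
    skipsdist = True

    [testenv]
    # rely on swift having been installed outside tox (e.g. earlier in the
    # job) instead of letting tox build and install the source repo itself
    skip_install = True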
*** dklyle has quit IRC19:55
mriedemi just noticed that https://docs.openstack.org/nova/latest/admin/live-migration-usage.html hasn't been updated since september, but there have been changes to that doc since then - is that normal?19:57
fungimriedem: possible your doc publication jobs have broken, i suppose. looking now19:58
*** mtreinish has joined #openstack-infra19:58
mtreinishis there any way for us to check the load on the trove instance that runs the subunit2sql db19:58
mtreinishI've had a couple queries going for >173min. and I'm wondering if the little trove node is too small for the size of the db now19:59
clarkbmtreinish: I don't think we get system level access like that so unless mysql can provide that info I'm guessing no20:00
fungimriedem: http://logs.openstack.org/f6/f6996903d2ef0fdb40135b506c83ed6517b28e19/post/publish-openstack-tox-docs/e140ad1/job-output.txt.gz#_2018-12-14_15_29_25_02637320:00
fungilooks like it's getting built and included20:00
openstackgerritMerged openstack-infra/system-config master: Turn on the future parser for logstash.openstack.org  https://review.openstack.org/61566020:00
clarkbbut maybe rax collects that data for us?20:00
mtreinishclarkb: hmm, that's what I expected the answer was gonna be :/20:01
fungiclarkb: mtreinish: yes, i think we can see it in the rackspace cloud dashboard20:01
mtreinishwell mriedem will just have to keep waiting for his list of slowest tempest tests20:01
fungiheh20:01
mtreinishit probably wouldn't hurt to check the dashboard though and think about upsizing the node20:03
clarkbcan we/should we trim the db?20:03
mtreinishwe trim at 6 months now20:03
clarkbah so it's already bounded, in that case ya maybe upsize is best if we see indication it's too small20:03
* clarkb tries to figure out where that lives20:05
mtreinishwell we were supposed to be running a cron or something to trim to six months, fungi and I set that up a long time ago20:06
mtreinishbut I just checked the oldest entries in the db and it's from june 201420:06
clarkbnice20:06
mtreinishoh, but theres only a few20:06
mtreinishthen 201620:06
mtreinish10/201620:06
mtreinishthen 11/201720:07
mtreinishand then it's 6months and a lot more data20:07
mtreinishI guess our trimming job isn't perfect :p20:07
clarkbmight need to do multiple passes to get the new stuff20:08
clarkbso I see a bunch of our DBs for services but not one for health20:08
mriedemmtreinish: gibi already eyeballed it20:08
clarkbwhich makes me think i am not looking in the right place20:08
mriedemusing good ol' gumption20:08
mriedemsee comments 11 and 12 https://bugs.launchpad.net/tempest/+bug/178340520:09
openstackLaunchpad bug 1783405 in tempest "Slow tests randomly timing out jobs (which aren't marked slow)" [High,In progress] - Assigned to Ghanshyam Mann (ghanshyammann)20:09
*** betherly has joined #openstack-infra20:09
clarkbfungi: any idea where the db is hiding?20:09
mtreinishclarkb: it's probably called subunit2sql something20:09
mriedemfungi: ok maybe my browser has that page cached.../me hard refreshes20:09
fungiyeah, it'll be the subunit2sql db20:09
clarkbmtreinish: bah there it is20:09
clarkbok I'm just blind in that case :)20:09
mtreinishwe turned it on around paris iirc :p20:09
*** bobh has quit IRC20:09
mriedemhmm, that didn't help20:10
clarkbmtreinish: load average spiked to 15 and is on its way down now but still high20:10
mtreinishmriedem: heh, ok20:10
mtreinishmriedem: oh, you actually used openstack-health?20:10
*** armstrong has quit IRC20:10
mtreinishclarkb: hmm, that's probably not a good sign20:10
clarkbmemory usage is either happy or the gauge isn't working20:10
clarkb(I don't see any memory usage)20:11
clarkbdisk usage looks ok20:11
*** e0ne has joined #openstack-infra20:11
mtreinishI'm pretty sure we used the smallest node size when we provisioned it20:11
*** e0ne has quit IRC20:11
clarkbit's pretty big now at least for disk20:11
clarkbI don't see an indication of the cpus we've got20:12
mriedemmtreinish: i did, however, i still complained about its ux when i did (in here yesterday)20:12
mriedemi can't use it w/o cursing its name first20:12
openstackgerritMerged openstack-infra/system-config master: Add rust-vmm OpenDev ML  https://review.openstack.org/62525420:13
clarkbmtreinish: I take that back I misread this graph20:13
clarkbcpu usage is 15% not a load average20:13
*** betherly has quit IRC20:13
clarkbload average is ~220:14
clarkband has been for ~3 hours20:14
mtreinishheh, well that's when I started running my script20:14
clarkbso it seems that we notice your script but it doesn't seem that the script has used all the available memory or cpu or disk20:15
clarkbmight also need to consider if the query is inefficient or can be improved?20:16
mtreinishI was looking at the explain before; it didn't look that bad to me, but I'm hardly an expert20:17
clarkbthe 15% cpu usage implies that either we have a bunch of cpus and mysql can't use them for that query or we are waiting on io?20:17
mtreinishhttp://paste.openstack.org/show/737336/20:18
mtreinisherr I guess I misread it before, it's not that great20:19
mtreinishI'm probably trying to grab too much data at once20:19
clarkbthe time scale on load average and cpu % are different so overlapping them in my head is hard20:19
mtreinishmordred: ^^^ want to fix it for me :p20:20
clarkbmtreinish: I'm definitely not a db expert :)20:20
clarkbmtreinish: are the 500k rows for test ids unique runs or just unique tests?20:20
clarkbbecause ya, going through 500k unique tests to find all the unique run times could be quite expensive20:20
*** jento has quit IRC20:21
clarkbbut if it's 500k unique test runs I would expect that to be easy for it20:21
mtreinishit's 500k unique test_ids (which I expect is that table's size)20:21
mtreinishthe tests table is the test name, total run counts, and a moving average of run times for that individual test across all runs20:22
*** armax has quit IRC20:22
mtreinishthe query my script generated was: http://paste.openstack.org/show/737337/ (yeah paste not line wrapping)20:22
mtreinishI just called: https://github.com/openstack-infra/subunit2sql/blob/master/subunit2sql/db/api.py#L1854 although looking at that it's grabbing more data than i actually need (and has an extra join for no benefit because of that)20:27
Shrewsoof, that explain doesn't look so great for that query20:31
*** rkukura has joined #openstack-infra20:31
Shrewsyou might try a combined index in 'runs' that contains both uuid and id, but i'm speculating what the table schema actually looks like20:31
Shrewsb/c that first table scan is likely what's hurting you20:32
Shrewsbeen a while since i did that type of stuff, too, so take with a grain of salt  :)20:32
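Shrews's suggestion as a sketch; the table and column names are speculation about the actual subunit2sql schema, per the caveats above:

    -- speculative: a combined index covering both columns used in the join,
    -- so the lookup can avoid the full table scan shown in the EXPLAIN
    CREATE INDEX runs_uuid_id_idx ON runs (uuid, id);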
clarkbfungi: as an fyi it appears lists.opendev.org records are in place20:35
fungiexcellent20:36
clarkbfwiw logstash seems to have future parsered and broken its apache20:37
clarkbwhich isn't a huge emergency but I'm sorting that out now20:37
*** priteau has quit IRC20:37
fungi#status log started the new opendev mailing list manager process with `sudo service mailman-opendev start` on lists.openstack.org20:39
openstackstatusfungi: finished logging20:39
*** armax has joined #openstack-infra20:39
*** diablo_rojo has joined #openstack-infra20:40
*** wolverineav has quit IRC20:43
*** wolverineav has joined #openstack-infra20:44
mtreinishShrews: that seems correct, I'm trying to rewrite the script using a better query right now20:44
openstackgerritClark Boylan proposed openstack-infra/puppet-kibana master: Set server admin var so that vhost works  https://review.openstack.org/62534420:47
clarkbfungi: ^ I expect that to be the fix for logstash.o.o apache brokenness20:48
fungiclarkb: for some reason the mm_domains variable addition in the system-config change doesn't seem to have propagated to lists.o.o20:48
clarkband now I must lunch20:48
clarkboh hrm20:48
clarkbdid puppet actually run?20:48
fungiit created the new mailing list20:48
fungiit just didn't update exim configuration20:48
clarkbyes puppet has run according to syslog20:48
clarkbfungi: I can help look after lunch20:49
fungi-r--r--r-- 1 root root 34445 Nov 13 04:15 /etc/exim4/exim4.conf20:49
fungii have to disappear to meet someone in ~25 minutes but may work it out before then20:49
*** wolverineav has quit IRC20:49
*** betherly has joined #openstack-infra20:50
clarkbhave a link to the line that sets it?20:52
fungias best i can tell roles/exim/templates/exim4.conf.j2 isn't getting applied by ansible20:54
*** betherly has quit IRC20:55
clarkboh right exim is in ansible now20:55
fungithe "Write Exim config file" task in roles/exim/tasks/main.yaml is what should be applying that20:56
clarkb/var/log/ansible on bridge has the log files20:57
fungiit sets dest: "{{ config_file }}"20:57
*** wolverineav has joined #openstack-infra20:58
fungiwhich roles/exim/vars/Debian.yaml sets to /etc/exim4/exim4.conf20:58
clarkbI dont see mm_domains in the conf.j2 fike21:00
clarkb*file21:00
fungii still feel uninformed about ansible... does it automatically replace files with a template task?21:00
clarkbyes it should21:00
fungiclarkb: it's used to set a couple values in playbooks/host_vars/lists.openstack.org.yaml21:01
funginot used in the role itself21:01
clarkbah its transitive21:01
fungione of them is exim_local_domains which is what i'm tracking down now21:01
fungithat does then get included in the template21:01
fungibut the template never seems to have been written out on lists.o.o once that was updated21:02
clarkb the force parameter to template is defaulted to yes21:03
clarkbso should update if contents differ21:03
dmsimardunrelated question, I want to sync a feature branch with master with no regard to git history -- in the context of gerrit and zuul, should I send a merge commit for review or should I delete and re-create the branch ?21:04
dmsimardI guess it's sort of the reverse of when we merged the zuulv3 branch of zuul back into master21:05
clarkbfungi do we maybe overwrite mm_domains in the ansible var data on bridge?21:05
clarkband so the old value supersedes your new value?21:05
clarkbdmsimard: is this feature branch in gerrit?21:05
dmsimardclarkb: it is21:06
clarkbit already exists, that is? I would merge master into feature then push that to gerrit21:06
dmsimardopenstack/ansible-role-ara has a feature/1.0 branch21:06
dmsimardok so something like git merge master feature/1.0 and then git push gerrit feature/1.0 ? or a git review ?21:06
mtreinishmriedem: fwiw: http://paste.openstack.org/show/737340/ is the average run time of every test over the last 300 runs of tempest-full in gate21:08
clarkbdmsimard: git review should work21:08
clarkbdmsimard: it will push up a proposal change for merging the merge commit21:08
dmsimardclarkb: ok, I don't think I've ever sent a merge commit for review -- I'll try that, thanks :D21:08
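A sketch of the sequence being suggested; the branch names come from the discussion, and git-review's defaults are assumed for the gerrit remote:

    git checkout feature/1.0
    git merge master        # keep the merge commit; resolve any conflicts
    # proposes the merge commit for review against that branch; git-review
    # may offer to amend in a Change-Id if the commit hook isn't installed
    git review feature/1.0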
clarkbfungi: it's not overridden in private vars from what I see21:08
fungiclarkb: on bridge.o.o `sudo grep -ir mm_domains /etc` only turns up hits in /etc/puppet/modules/exim/templates/exim4.conf.erb which is presumably vestigial21:09
clarkbdmsimard: note that you may have to allow it in your acls file21:09
clarkbfungi: ya that should be ignored because it's puppet21:09
mtreinishmriedem: you can generate it in the future by just running: http://paste.openstack.org/show/737341/21:09
clarkbfungi: I'm currently looking for evidence that role ran at all21:11
*** rlandy has quit IRC21:12
fungiyeah, i was checking the logs on bridge.o.o and thinking the exim role isn't getting used?21:12
clarkbfungi: its in playbooks/base.yaml21:13
clarkbbut ya I don't see evidence of it in the logs21:13
*** chandan_kumar has joined #openstack-infra21:13
fungii agree, seems to be included for all of hosts: "!disabled"21:14
clarkbbase-server is running according to the logs21:15
fungiclarkb: i don't see the iptables role getting applied either21:16
fungiand it's in the same set in base21:16
clarkbya but base-server is, which is really weird21:16
fungii need to run now. worst case we can append lists.opendev.org to the two hostlists in /etc/exim/exim4.conf and reload the exim4 service while digging deeper21:16
clarkbit's almost like ansible crashes21:17
clarkbthe logs go from base-server to "Base: configure OpenStackSDK on bridge" which is the next block of tasks21:18
fungii've manually appended that hostname to the hostlists in the exim config and reloaded21:18
fungiand with that i'm disappearing for an hour or so but will check on this as soon as i get back21:18
clarkbit looks like my rework of the debian handling for arm may be the cause?21:18
clarkbat least it gets to that point then stops21:18
mriedemmtreinish: ok 2-4 there in that first paste are the same things i identified from gibi's comments in the bug report21:19
mriedemso that's good to know21:19
mriedemtempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern: 161.5385627070707 is the only difference, but that one shouldn't surprise me21:19
clarkbdmsimard: is include_tasks as used at https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/Ubuntu.xenial.aarch64.yaml#n6 not expected to work?21:20
mriedemkind of surprised that isn't already marked as slow21:20
clarkbdmsimard: http://paste.openstack.org/show/737342/ I'm seeing that include_tasks seemingly no-ops, then the entire playbook jumps to the next play https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/base.yaml#n12 any idea why that happens21:23
*** tpsilva has quit IRC21:23
clarkbalso extra scary here is that ansible isn't failing (this is something that, for all puppet's problems, it was really good at: if something can't happen then be safe and stop)21:25
dmsimardclarkb: I have no idea why that is21:25
dmsimardhang on21:26
dmsimardhum21:27
*** dklyle has joined #openstack-infra21:27
clarkbdmsimard: I'm tempted to just copy pasta that set of tasks from Debian.yaml into the arm64 task list for now21:28
clarkbbut am open to other ideas (like is it worth trying import_tasks instead of include_tasks?)21:29
dmsimardI'm a bit lost in the Ansible transition between include and import, especially across versions21:29
clarkbI'm sure I'm more lost :)21:29
dmsimardMy understanding is that include is parsed "at runtime" and is meant to be used when there are conditions attached21:29
clarkb(this is a specific topic that could use better docs)21:29
dmsimardWhile import is static21:30
clarkbdmsimard: ya, but in this case it's being called from a file that itself is conditionally loaded21:30
clarkbso I'm guessing it has to be included not imported? or maybe not21:30
dmsimardworth a try21:30
clarkbhttps://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/main.yaml#n6721:31
dmsimardimport_* came with Ansible >= 2.521:31
dmsimardAccording to https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_includes.html21:31
clarkbmaybe it can't handle multiple levels of include21:31
clarkb(the docs say it can but could be buggy)21:31
dmsimardyeah, I'm not implying there's no bug or odd behavior at play here21:32
dmsimardThere's definitely something weird going on21:32
*** dklyle has quit IRC21:32
*** dklyle has joined #openstack-infra21:33
dmsimardIn unrelated news, I was looking at a system-config run through ara (to see what the base-server role did) and I'm confused why a job came back successful despite a failure http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/21:33
dmsimard(from https://review.openstack.org/#/c/605585/)21:33
clarkbcould be the same bug21:34
clarkbansible is apparently not tracking failures in both cases21:35
clarkbwe should double check we are checking return codes properly too21:35
*** woojay has quit IRC21:35
dmsimardThe failure in that particular system-config run was http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/result/e3ced53a-5b94-484c-976d-868386826527/21:35
dmsimardBut everything is green from the perspective of Zuul :/ http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/21:36
*** woojay has joined #openstack-infra21:37
dmsimardctrl+f for "root_rsa_key" is turning up empty in the full job log..21:38
openstackgerritClark Boylan proposed openstack-infra/system-config master: Copy pasta the debian base server bits, don't include them  https://review.openstack.org/62535021:38
clarkbthere is the naive make-it-work change (I hope that makes it work)21:38
clarkbdmsimard: where does bridge.yaml run? I see the base.yaml in the surrounding ara report21:40
*** jamesmcarthur has joined #openstack-infra21:40
dmsimardclarkb: that's what I was trying to find out but I ended up being even more confused21:41
dmsimardThe failing task is "Write out ssh private key" which is "changed" in run-base.yaml (from the perspective of zuul) but it's "failed" when it later runs for bridge.yaml from the perspective of system-config ?21:42
dmsimardhttp://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/job-output.txt.gz#_2018-12-14_19_41_53_38794121:42
dmsimardThat's the only instance of "Write out ssh private key" in the job logs, I suppose the one from inside system-config is not in the stdout21:42
clarkbdmsimard: it's from run-base.yaml21:43
clarkbwhich is the run playbook for the job21:43
*** EvilienM is now known as EmilienM21:44
*** weshay is now known as weshay_pto21:44
*** jamesmcarthur has quit IRC21:44
*** jamesmcarthur has joined #openstack-infra21:44
clarkbis it possible the streams are getting crossed?21:44
clarkbdmsimard: ok I think I get it now maybe21:46
clarkbdmsimard: the job's run playbook runs bridge.yaml to set things up for the job. We pass in a ssh key value there http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/ee88cd4f-f3b3-4ea8-b0f6-d0fbc9f05bea/21:47
dmsimardyeah, that one works21:47
dmsimardbut the nested one doesn't21:47
clarkbdmsimard: then what the job is testing is that we can run ansible against all the hosts in our inventory which happens to include bridge.o.o so it reruns bridge.yaml only this time we don't pass in the ssh key info21:47
dmsimardwhat I don't understand is that there's no such thing as "Write out ssh keys" in the ansible-playbook command output, unless I'm not looking at the right one: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/89704ccd-5067-4715-81c9-fa0dcee02e55/21:48
clarkbdmsimard: ya thats base.yaml which doesn't run bridge.yaml I don't think21:48
clarkbrun_all.sh runs bridge.yaml21:49
clarkbwhich is where this gets weird21:49
clarkbit's the cron21:49
*** jtomasek has quit IRC21:49
dmsimardso that's why we wouldn't have the output anywhere then ? it's in the cron shell ?21:49
clarkbso that run isn't part of the job except that the job installs a cron to run things every 15 minutes21:49
clarkbyup21:49
clarkband that cron ansible is crossing the streams with your nested ara21:49
clarkbI think the clean up for this is to disable the cron on test jobs21:50
dmsimardthat would explain why the job didn't fail as a result21:50
clarkbyup21:50
dmsimardianw: ^ FYI, tl;dr is there was a failure in the nested bridge.o.o ara report and I was confused as to why Zuul hadn't failed the job: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/21:52
clarkbdmsimard: fungi: https://review.openstack.org/#/c/625350/1 is I think the short term ansible fix (and we should watch it when it goes in to make sure iptables continues working as expected; ugh, I really wish that ansible would fail safe when it crashes like this)21:53
clarkbdmsimard: also thinking about that more I wonder if the issue is including a task list that is included elsewhere by other hosts in the same play21:53
clarkbbasically we were trying to dedup special tasks for arm by doing arm specific then generic tasks, but maybe the reorg here is to do all the generic tasks on debuntu then include only the arm specific tasks from there?21:54
clarkbI dunno this is all incredibly cryptic to me and it not actually knowing it failed makes it harder to understand21:54
*** dklyle has quit IRC21:54
*** jamesmcarthur has quit IRC21:54
dmsimardclarkb: where is that failure occurring ? not in zuul right ?21:55
clarkbno this is to apply things to production. I don't think it happens in the zuul jobs because we don't use arm64 test nodes in the zuul jobs21:55
clarkbwe could add an arm64 node to the system-config inventory to further test things (but that seems out of scope for now)21:56
clarkb(at least for me trying to make things happy before weekend)21:56
*** dklyle has joined #openstack-infra21:56
dmsimardclarkb: I don't fully recognize the output format of the paste you sent me, could a callback be eating some sort of trace or failure ?21:56
clarkbmaybe? I think we use default output for that21:57
dmsimardwhere does "2018-12-14 20:54:28,515 p=11685 u=root |" come from ? is that journalctl ?21:57
clarkbcallback_whitelist=profile_tasks, timer21:57
clarkbI think from the timer callback21:57
clarkbor the profiler?21:57
clarkbI dunno21:58
clarkbit does look like syslog/journalctl format though21:58
dmsimardah, it looks like it's the format provided by ansible from log_path=/var/log/ansible/ansible.log21:59
*** mriedem has quit IRC22:01
openstackgerritMerged openstack-infra/puppet-kibana master: Set server admin var so that vhost works  https://review.openstack.org/62534422:02
*** dklyle has quit IRC22:02
openstackgerritClark Boylan proposed openstack-infra/system-config master: Stop running unnecessary tests on trusty  https://review.openstack.org/62535822:02
clarkband ^ is a small cleanup optimization I noticed we could make22:03
dmsimardclarkb: my gut feeling is that the logging doesn't tell the whole story22:03
dmsimarda bit like how the zuul callback munges some traces or errors22:03
clarkbdmsimard: ya I'm beginning to think it must be a corner case of include_tasks for a file that is already included in the play22:03
dmsimardsometimes* munges22:04
clarkbit's being included for a different set of nodes, but the way ansible queues things up is global iirc22:04
dmsimardclarkb: if we have the ability to run that playbook in the foreground, we would certainly get a different output22:04
*** priteau has joined #openstack-infra22:05
*** priteau has quit IRC22:05
clarkbdmsimard: probably the path forward here is merge the copy pasta (I assume that won't break things similarly) then make a revert that adds an arm64 test node to the inventory and tweak until it works22:06
clarkbkeeping in mind that ansible success doesn't mean it actually worked22:07
dmsimardclarkb: hmmm, the output is the same in the run_all raw output http://paste.openstack.org/show/737344/22:09
dmsimardI'm out of ideas22:09
dmsimard¯\_(ツ)_/¯22:09
clarkbit definitely looks like ansible just goes "NOPE"22:09
clarkband continues with the next play22:09
clarkbdmsimard: should I try to fabricobble a bug together on github for this?22:13
clarkbI don't really have much info other than the setup and logs and ansible version info22:13
*** dklyle has joined #openstack-infra22:13
clarkbI guess I can start there and see if anyone in ansible land is able/willing to debug further22:13
dmsimardWe have a good case for a bug if we can come up with a generic reproducer22:13
*** betherly has joined #openstack-infra22:14
*** boden has quit IRC22:14
dmsimardProbably worthwhile to check if there's already an issue about this too22:14
dmsimardyikes, the answer in https://github.com/ansible/ansible/issues/41984 is basically "include_tasks is in tech preview, use at your own risks"22:16
*** trown is now known as trown|outtypewww22:16
dmsimard(that was a while ago, though)22:16
clarkbre reproducing: if I had to guess, the trick there is having a play that includes foo.yaml and bar.yaml on different hosts, then have bar.yaml include_tasks: foo.yaml22:17
*** dklyle has quit IRC22:18
dmsimardyou can probably fake it using add_host22:18
clarkbplay1 runs role1 and role2. role1 executes foo.yaml via include_tasks on some hosts and bar.yaml on others. Have bar.yaml also include_tasks foo.yaml22:18
dmsimard(with ansible_connection: local)22:18
clarkbthen role2 never runs (and neither does play2)22:18
*** betherly has quit IRC22:18
clarkbor just use an inventory with a few connect local settings22:19
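A hypothetical standalone reproducer along the lines sketched above, faking two hosts with ansible_connection=local; all file, role, and variable names are invented, and whether this exact layout triggers the crash is an assumption (clarkb's actual reproducer repo is linked further down):

    mkdir -p roles/role1/tasks roles/role2/tasks
    # role1 picks a per-host task file; bar.yaml re-includes foo.yaml
    cat > roles/role1/tasks/main.yaml <<'EOF'
    - include_tasks: "{{ flavor }}.yaml"
    EOF
    cat > roles/role1/tasks/foo.yaml <<'EOF'
    - debug: msg=foo
    EOF
    cat > roles/role1/tasks/bar.yaml <<'EOF'
    - include_tasks: foo.yaml
    EOF
    # role2 should run after role1; in the failure mode it silently never does
    cat > roles/role2/tasks/main.yaml <<'EOF'
    - debug: msg=role2
    EOF
    cat > site.yaml <<'EOF'
    - hosts: all
      gather_facts: false
      roles:
        - role1
        - role2
    EOF
    printf 'host1 ansible_connection=local flavor=foo\nhost2 ansible_connection=local flavor=bar\n' > inventory
    ansible-playbook -i inventory site.yaml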
*** rfolco has quit IRC22:22
*** e0ne has joined #openstack-infra22:23
*** betherly has joined #openstack-infra22:24
*** betherly has quit IRC22:29
openstackgerritdemosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``.  https://review.openstack.org/62536422:32
clarkbdmsimard: I can reproduce22:32
clarkbtotally doesn't print any errors either as far as I can tell22:32
clarkbI'm double checking that copying the tasks fixes it22:32
*** slaweq has quit IRC22:33
clarkbyup22:33
clarkbthat is amazing22:33
clarkbI'm going to take a short break then will be back to file a bug report22:34
*** eernst has joined #openstack-infra22:34
*** dklyle has joined #openstack-infra22:37
*** e0ne has quit IRC22:40
*** dklyle has quit IRC22:41
*** dklyle has joined #openstack-infra22:42
*** eernst has quit IRC22:48
*** wolverineav has quit IRC22:51
dmsimardsigning off for today, do share the link, I'm curious :)22:52
kmallocShrews, ianw, mordred, clarkb: https://review.openstack.org/625370  <-- dogpile.cache and sdk fix22:52
kmalloci am unsure how to actually run this as a unit test...22:53
kmallocbut that has been confirmed locally to fix the issue22:53
notmynameFYI https://blade.tencent.com/magellan/index_en.html22:55
notmynamethere's a severe lack of detail (including a CVE), but it sounds scary. but maybe "just" upgrading will mitigate it?22:56
*** kgiusti has left #openstack-infra22:57
*** dklyle has quit IRC22:57
kmallocnotmyname: gross.22:58
kmalloci think chromium is fixed with upgrade according to that.22:58
kmallocnot sure if it's SQLite upgrade fixing =/ man so few details.22:59
*** wolverineav has joined #openstack-infra22:59
*** jamesmcarthur has joined #openstack-infra22:59
notmynameah, it's the one linked in https://news.ycombinator.com/item?id=18685516. seems related to WebSQL23:00
kmallocahh23:00
notmynameCVE TBD23:00
notmynamewell, that's at least the chromium issue. not sure if there are others23:02
*** wolverineav has quit IRC23:03
*** jamesmcarthur has quit IRC23:03
kmallocyeah23:04
*** wolverineav has joined #openstack-infra23:06
*** lbragstad has quit IRC23:08
clarkbkmalloc: thanks and notmyname that looks like a fun one23:09
clarkbI'm going to figure out how to write this bug report for ansible without making the person taking it on go crazy23:09
notmynameheh23:09
notmynameclarkb: looks like https://review.openstack.org/#/c/625361/ worked for the permissions error23:10
openstackgerritdemosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``.  https://review.openstack.org/62537223:11
clarkbnotmyname: cool23:14
*** lbragstad has joined #openstack-infra23:17
fungiclarkb: okay, back from steak night and catching up23:20
fungisounds like you found a bug23:20
clarkbfungi: ya, in the process of writing a bug23:22
*** lbragstad has quit IRC23:22
clarkbjust pushed https://github.com/cboylan/ansible_include_tasks_crash to refer to in the bug23:22
clarkbas it's easier to do that than to try and do all this in markdown I think23:22
*** eernst has joined #openstack-infra23:26
fungisure23:26
fungii would have never figured that out myself, fwiw23:27
fungigood job23:27
fungiansible is still so much blackbox to me23:27
scaskeeping ruby in wet memory for chef leaves ansible pretty opaque for me. it's a trade-off23:35
clarkbfungi: dmsimard mordred Shrews https://github.com/ansible/ansible/issues/4996923:36
clarkbimo it's a pretty big deal because ansible shouldn't fail and continue running like that23:37
clarkbhrm logstash webserver still broken23:37
*** diablo_rojo has quit IRC23:37
clarkbarg I know why23:38
fungido share23:38
*** yamamoto has quit IRC23:39
openstackgerritClark Boylan proposed openstack-infra/puppet-kibana master: Use full lookup path for serveradmin in template  https://review.openstack.org/62537423:41
clarkbfungi: ^ because I fail at puppet23:41
clarkbthe good news is I fail at ansible equally as evidenced by the include_tasks thing :)23:41
clarkbfungi: https://review.openstack.org/625350 and https://review.openstack.org/625374 should fix the two outstanding issues we know we currently have23:41
clarkbfungi: on the first one I have no idea if ansible will apply cleanly after not doing so for a few days23:42
fungimaybe we just merge and fix any issues we spot over the weekend23:42
scasspeaking of opaque, what might i do with several cross-repo dependencies that all need each other to pass each build?23:43
kmalloczzzeek: i don't think we need to revert dogpile. what SDK is doing is very .. not normal.23:43
clarkbscas: option A: build in backward compat/future compat as necessary to get them all happy. option B: make tests non-voting as necessary to get over the hump.23:44
clarkbscas: there is an option C too: realize that these pieces of software are tightly coupled and might be better off in a single repo23:44
scassingle repo is not exactly the easiest to manage, since it's configuration management23:45
scasoption A might be the path forward23:45
scasmaking things non-voting would mean making everything non-voting, negating the testing mechanism23:45
clarkbscas: ya B is a short term, thing23:46
clarkbthat comma is there for I don't know why reasons23:46
fungiin openstack, we've held that option a is good engineering and generally downstream-friendly23:47
clarkb++23:47
fungisince at any point in time, the set of your stuff continues to work23:47
scasyeah, that's where i'm leaning23:47
clarkbafter filing this ansible github bug I feel like I've done my good deed of the week23:48
clarkbthat was a fun one23:48
clarkbfungi: fwiw if https://review.openstack.org/#/c/625350/ looks good to you I don't mind keeping one eyeball on irc/bridge logs this evening23:48
clarkbif you want to approve it23:48
fungiscas: it's more complexity and more iteration for sure, but has the up-side that if one of those changes gets ignored for weeks due to lack of reviewers or a wayward bus, stuff still runs23:49
clarkband I don't mind self approving the logstash fix since I already self approved the broken fix23:49
clarkb(maybe this one is broken too)23:49
scasi know force-merging is a less favorable option, but i'm not considering that an option at this point23:52
scasi'd rather get them testing without having to rely on local testing alone saying all's well23:53
fungiscas: at least here that requires cooperation of the admins for the repository hosting platform23:53
scasabsolutely, and i'm sure none would be too pleased of me asking it23:54
funginot that we're an uncooperative bunch, we'll just spend a while telling you why it's a bad idea ;)23:54
scasi'm familiar with how bad of an idea it can be. it had to be wielded in recent months for something else unrelated, as it was the only option in that case23:56
fungiyeah, it's not necessarily the worst idea, circumstances depending. though it is usually still a bad idea regardless23:57
fungisometimes all the other ideas are simply worse still23:57
