Friday, 2019-01-11

*** openstackgerrit has quit IRC  00:34
*** openstackgerrit has joined #zuul  00:54
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering  https://review.openstack.org/537869  00:54
*** rlandy has quit IRC  01:07
*** bhavikdbavishi has joined #zuul  01:49
*** bhavikdbavishi has quit IRC  01:59
*** ruffian_sheep has joined #zuul  02:07
<ruffian_sheep> Hi! I want to build a Third Party CI! But now, I don't know how to set the pipelines of the jenkins job builds used in zuul  02:10
<ruffian_sheep> This is the link where I learned how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml.  02:13
*** sshnaidm is now known as sshnaidm|off  02:14
<fungi> ahh, i wish in #openstack-infra you'd mentioned this was for an openstack third-party ci system  02:14
<fungi> or i wouldn't have directed you to #zuul  02:15
<fungi> zuul hasn't supported jenkins for roughly a year now  02:15
<fungi> so if you want to run zuul with jenkins you're going to be stuck running a somewhat old and unmaintained release of zuul  02:16
<fungi> i think 2.6.0 is likely the last release which had jenkins integration  02:17
<ruffian_sheep> But... the link is given by the official guidance document  02:25
<fungi> that's guidance from the openstack cinder project. you should talk to people involved in that project to get recommendations on what they expect from you  02:26
<ruffian_sheep> I think so! But... this is my third week learning how to build it, and I can't find anyone doing it like me to answer my questions T.T  02:32
<ruffian_sheep> fungi: Do you know the channel where they discuss it?  02:33
<fungi> there is an #openstack-cinder channel where the openstack cinder team tends to be present, and also an #openstack-third-party-ci channel where some third-party ci operators tend to be  02:34
<ruffian_sheep> fungi: Thanks! Does IRC have the ability to add permanent friends?  02:36
<ruffian_sheep> fungi: ;P  02:36
<fungi> not really, no (some irc clients might, i don't know)  02:36
<ruffian_sheep> fungi: Alright  02:37
<ruffian_sheep> Hi! I want to build a Third Party CI! But now, I don't know how to set the pipelines of the jenkins job builds used in zuul. This is the link where I learned how to build it: https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers. I don't know the exact settings for layout.yaml and project.yaml.  02:38
<ruffian_sheep> Sorry, sent to the wrong channel..  02:39
<ruffian_sheep> Please ignore the message  02:39
*** jiapei has joined #zuul  03:10
*** _ari_ has quit IRC  03:21
*** rcarrillocruz has joined #zuul  03:39
*** bhavikdbavishi has joined #zuul  04:51
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add version to info endpoint  https://review.openstack.org/609571  05:35
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add about dropdown to display zuul version  https://review.openstack.org/630027  05:35
*** chkumar|out is now known as chandankumar  05:40
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid  https://review.openstack.org/630034  06:45
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route  https://review.openstack.org/630035  06:45
*** hashar has joined #zuul  06:54
<openstackgerrit> Rui Chen proposed openstack-infra/zuul master: Avoid using list branches with protected=1 in github driver  https://review.openstack.org/630038  06:55
*** pcaruana has joined #zuul  07:05
*** bhavikdbavishi has quit IRC  07:10
*** quiquell|off is now known as quiquell  07:16
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column  https://review.openstack.org/630034  07:17
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route  https://review.openstack.org/630035  07:17
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page  https://review.openstack.org/630041  07:17
*** hashar has quit IRC  07:18
*** jiapei has left #zuul  07:20
*** gtema has joined #zuul  07:44
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: sql: add buildset uuid column  https://review.openstack.org/630034  07:46
*** gtema has quit IRC  08:01
*** gtema has joined #zuul  08:02
*** rcarrillocruz has quit IRC  08:03
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route  https://review.openstack.org/630035  08:17
*** jpena|off is now known as jpena  08:54
*** panda|off is now known as panda  08:55
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildsets route  https://review.openstack.org/630035  09:07
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page  https://review.openstack.org/630041  09:07
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route  https://review.openstack.org/630078  09:07
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page  https://review.openstack.org/630079  09:07
*** avass has joined #zuul  09:38
<avass> any way to run a job on all nodes with a specific label?  09:40
<panda> avass: adding the label in the nodeset and then using hosts: <label> in the playbook doesn't work?  09:42
<panda> does anyone know if the files: attribute in a child completely overrides the same attribute in the parent, or if the two lists are merged?  09:42
<avass> panda: i thought that only requests one node with the label?  09:43
<avass> panda: i mean all nodes with a specific label in nodepool, sorry  09:43
<panda> avass: yes, nevermind, I always confuse the label as an alias for group  09:45
<panda> avass: in fact I think the only way is to use groups  09:46
<avass> yeah, but that would mean setting up different labels for all nodes and then adding them to the group, wouldn't it?  09:46
<ssbarnea|rover> found a bug in zuul, take a look at this message which translates into a table full of SUCCESS in gerrit and a negative vote. https://s3.sbarnea.com/ss/190111-Change_Ia5bcd556_GATE_CHECK_for_TripleO__review.openstack_Code_Review_.png  09:48
<ssbarnea|rover> or maybe this counts as a bug in the JS extension for gerrit made for zuul.  09:49
<panda> avass: two nodes can have the same label, and I think it's separate from the group definition. If you set two nodes with the same label in nodes: you will need to manually add those nodes to the same group to run the playbook on both  09:51
<avass> panda: yes, but wouldn't that mean that the nodeset only requests one of the nodes from nodepool?  09:52
<avass> panda: or nvm, I see what you mean. But that wouldn't be the same as all nodes  09:52
<panda> avass: nodeset.nodes.label (required)  09:53
<panda> "The Nodepool label for the node. Zuul will request a node with this label."  09:53
<panda> two nodes with the same label will be two nodes anyway  09:53
<avass> panda: yes, but I have static nodes with jobs that need to run on all of them at the same time. only adding a label in nodepool when setting up another node is nicer than having to reconfigure the job  09:55
<avass> panda: meaning I don't want to run a job on a set number of nodes, but on all of the nodes with a nodepool label  09:56
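[Editor's note: a minimal sketch of the group-based workaround panda describes above. It requests a fixed set of nodes that happen to share a label and puts them in one Ansible group, so a playbook can target them all at once; it does not request "every node with the label" in nodepool. The node names, label, and group name are illustrative.]

```yaml
# Zuul nodeset: two nodes with the same nodepool label,
# manually collected into one group for the playbook.
- nodeset:
    name: all-static-workers
    nodes:
      - name: worker1
        label: static-worker
      - name: worker2
        label: static-worker
    groups:
      - name: workers
        nodes:
          - worker1
          - worker2
```

A job playbook can then use `hosts: workers` to run on both nodes at the same time.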
*** avass is now known as avass|lunch  10:07
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines  https://review.openstack.org/630118  10:08
<tobiash> avass: that's not possible atm. You would need to remove them from nodepool and use add_host in the job. Then you can run stuff on all of them. But you would still need to find a way to query that list of nodes somewhere  10:15
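[Editor's note: a hedged sketch of the add_host approach tobiash mentions, for nodes managed outside nodepool. The `static_workers` variable is hypothetical; in practice the host list would have to come from some inventory source, which is exactly the open problem tobiash points out.]

```yaml
# Playbook sketch: build an ad-hoc group from an externally
# supplied host list, then run the payload on every member.
- hosts: localhost
  tasks:
    - name: Add every static worker to an ad-hoc group
      add_host:
        name: "{{ item }}"
        groups: all_workers
      loop: "{{ static_workers }}"

- hosts: all_workers
  tasks:
    - name: Run the job payload on every worker
      command: ./run-tests.sh
```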
<openstackgerrit> Andriy Shevchenko proposed openstack/pbrx master: Updatae home-page  https://review.openstack.org/630132  10:21
*** ruffian_sheep has quit IRC  10:37
*** electrofelix has joined #zuul  10:48
*** avass|lunch is now known as avass  10:53
<avass> tobiash: alright  10:53
*** hashar has joined #zuul  10:55
<avass> tobiash: do you think it would be a lot of work to implement? might work on it later  10:56
<jkt> my zuul/web/static/ contains just the .keep file after I `pip install`, what am I doing wrong? Do I need some additional packages to build the web dashboard?  10:58
*** gtema has quit IRC  11:01
<tobiash> avass: I'm not sure how this would fit into the current architecture of nodepool  11:01
<avass> ah, I see  11:01
<avass> would have been a nice thing to have for setting up static nodes  11:03
<jkt> ah, right, if I'm installing from git directly, I need yarn.  11:20
*** hashar has quit IRC  11:29
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Fix test_load_governor on large machines  https://review.openstack.org/630118  11:39
*** cmurphy is now known as cmorpheus  12:05
*** dkehn has quit IRC  12:10
*** rcarrillocruz has joined #zuul  12:35
*** hashar has joined #zuul  12:35
*** jpena is now known as jpena|lunch  12:38
*** rcarrillocruz has quit IRC  12:40
*** gtema has joined #zuul  12:49
*** rcarrillocruz has joined #zuul  13:10
*** rlandy has joined #zuul  13:18
<openstackgerrit> Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs  https://review.openstack.org/630224  13:37
*** jpena|lunch is now known as jpena  13:44
<openstackgerrit> Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Avoid zuul DISK_FULL failure with too big logs  https://review.openstack.org/630224  13:46
*** dkehn has joined #zuul  13:55
<fungi> ssbarnea|rover: yeah, what you're seeing is a combination of 1. zuul not having an external log link to supply because the job was aborted prior to uploading its logs, and 2. the "hideci" javascript overlay in use on openstack's gerrit server not knowing how to deal with that  13:57
<fungi> i know we'd discussed in the past having a way to at least archive and report a log link for the build's console log even if it exceeded storage on the executor. the current implementation, where the disk accountant as a separate thread aborts jobs out-of-band, makes that a bit challenging to implement, i think  13:58
<dmsimard> fungi: perhaps the takeaway could be that the executor should not be storing the files in transit at all?  14:43
<fungi> well, it does various things to them while they're stored on the executor which would need to happen elsewhere, i suppose  14:44
<fungi> dmsimard: also, the one safeguard we currently have to curtail filling up logservers is the executor disk accountant thread, which operates on the transient build dir  14:46
<fungi> if we copy those files straight through to the destination, we likely need some other way to figure out how much we're about to copy  14:47
<dmsimard> what I had implemented a while back for RDO's jenkins things is that the logserver had an ssh key embedded in the image (not unlike the current infra root keys) and the log server would pull the logs instead  14:47
<fungi> yeah, we're also hoping to get away from having an actual logserver soon. i doubt swift containers have a mechanism to pull data from some other source on demand  14:48
<dmsimard> yeah, swift makes this somewhat of a no-go  14:48
<dmsimard> The ideal situation is if we're able to either pull once or push once; right now we're pulling and then pushing, which costs us bandwidth, storage and performance  14:51
<fungi> the previous iteration of logs-on-swift under zuul v2 did have the job nodes upload their logs directly to the swift api, though we had to take extra care to only authorize them for access to their job's log subtree  14:52
<fungi> and much of the challenge was around generating and distributing credentials for that  14:53
<dmsimard> yeah, security is why we had the log server pull the logs instead of having the nodes push them  14:54
<fungi> also, aggregating artifacts/logs through the executor does, i think, provide increased flexibility in supporting multiple storage solutions  14:55
<dmsimard> pros and cons :)  14:56
<jkt> how does nodepool actually connect to the nodes when using the static provider?  15:14
<jkt> I see that I have to register the nodes' SSH server pubkeys, but the docs don't say anything about how to configure the nodes themselves  15:15
<jkt> presumably, I should somehow add some SSH pubkey to my target user's home dir, but the docs don't specify which key nodepool/zuul uses for this  15:16
<jkt> is that the same SSH key zuul uses for talking to Gerrit, by chance?  15:16
<Shrews> jkt: you may find https://zuul-ci.org/docs/zuul/admin/nodepool_static.html useful  15:18
<Shrews> jkt: that's part of the Zuul From Scratch guide https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html  15:19
<jkt> Shrews: thanks, I read that one, and I haven't found my answer in there  15:21
<jkt> Shrews: it is probably executor.private_key_file, right?  15:21
*** quiquell is now known as quiquell|off  15:22
<avass> it sets up temporary ssh keys during the pre-job using the master ssh key  15:23
<avass> jkt: and yes, it should be that file  15:23
<jkt> let me check that playbook file, thanks  15:24
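[Editor's note: for reference, the setting jkt is asking about lives in the executor's zuul.conf. This is a sketch; the key path shown is illustrative, and the relevant user must have the corresponding public key in authorized_keys on each static node.]

```ini
# zuul.conf on the executor host: the private key the executor
# uses for its initial SSH connection to job nodes.
[executor]
private_key_file=~/.ssh/id_rsa
```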
*** hashar has quit IRC  15:28
<corvus> dmsimard, fungi: i think we can have the executor store a few lines of the ansible log in the database in these cases.  15:48
*** avass has quit IRC  15:49
<fungi> makes sense. at least knowing which task was in progress when the DISK_FULL result was determined would have helped speed up diagnosis  15:50
<fungi> granted that's still not a huge help in the particular case we ran into, with a job which normally produces 85mb of logs when it succeeds, but wanted to archive 7gb of logs during an infrequent failure case  15:51
<fungi> but mordred has suggested in #openstack-infra that perhaps running a remote du and then not collecting the logs from the node if over a set threshold would provide sufficient detail to diagnose the cause  15:52
<fungi> (and collect at least a summary of the du info, perhaps by just echoing it into the console log)  15:53
<mordred> yeah - thinking that could go into fetch-output  15:54
<dmsimard> what was the failure scenario? would it be something that could've been handled by an ansible rescue block?  15:55
<dmsimard> like fetch-output fails, rescue -> do something helpful  15:55
<mordred> dmsimard: right now no - right now it's that the disk accounting protector on the zuul executor killed things because the job had used too much executor disk space  15:55
<mordred> dmsimard: but that's sort of what we're talking about as a potential mitigation so that we can avoid hitting the kill-job-too-much-disk safety net  15:56
<dmsimard> could the fetch-output role look at the size of the artifacts it needs to pull before pulling them?  15:57
<mordred> dmsimard: that's what we were just talking about doing - yeah  15:57
<dmsimard> oh, hadn't read the entire thing  15:57
<dmsimard> caffeine--  15:57
<mordred> corvus: if we did go that direction - perhaps we could expose zuul's disk usage thresholds in a zuul variable so that the role could have the default disk threshold be set from the executor threshold?  16:00
<mordred> although - with all of that - I need to AFK for a bit ... y'all have fun  16:00
<tobiash> jkt: nodepool only gets the public keys, it doesn't log in to the nodes  16:00
<corvus> fungi, mordred: yeah, the two approaches are not mutually exclusive.  :)  i don't see a problem with exposing the threshold variable  16:02
<fungi> mordred: the math still gets interesting, because we're pulling from multiple nodes but have one threshold to work with. i guess the role could tally up the du from all of them before pulling from any of them?  16:05
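[Editor's note: a hedged sketch of the mitigation being discussed, not the real fetch-output role. It measures the log tree on the node with du and only pulls the logs when the total is under a threshold; the `log_root` and `max_log_kb` variable names are hypothetical, while `zuul.executor.log_root` is a real Zuul job variable.]

```yaml
# Task sketch: check log size on the remote node before collecting.
- name: Measure size of logs on the node
  command: du -sk {{ log_root | default('~/zuul-output/logs') }}
  register: log_du

- name: Collect logs only when under the threshold
  synchronize:
    src: "{{ log_root | default('~/zuul-output/logs') }}/"
    dest: "{{ zuul.executor.log_root }}"
    mode: pull
  when: (log_du.stdout.split()[0] | int) < (max_log_kb | default(1048576))
```

As fungi notes, with a multi-node job the du results would need to be tallied across all nodes before any of them are pulled.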
<jkt> tobiash: yup, I am now aware that it's actually zuul-executor, and its default config happens to reuse the "main" key used for connecting to Gerrit  16:06
*** evrardjp has quit IRC  16:10
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job  https://review.openstack.org/630302  16:11
*** evrardjp has joined #zuul  16:11
<pabelanger> fungi: ttx: ^ is the follow-up for pushing up the dco-license job; going forward, testing is now from the ansible-network zuul  16:11
<corvus> pabelanger: you were asking for a zuul release... should we release 2fd6883 as 3.4.0?  16:12
<corvus> tobiash, clarkb: ^?  16:13
<pabelanger> corvus: yes, that looks fine to me  16:13
<tobiash> corvus: is that what you are running atm?  16:13
<corvus> tobiash: yeah  16:14
<tobiash> lgtm  16:14
<tobiash> nothing really important between that and current master  16:14
<corvus> there's a small behavior change for the archive url artifacts, but i think it's forward-compatible enough to have it span releases  16:15
<tobiash> corvus: while we're at release partying, does anything speak against a gear release? That would enable us to introduce the client-side keepalive in zuul.  16:16
<corvus> tobiash: oh thanks, yeah we can do that too... though since gear is a library, let's do that monday? :)  16:17
<tobiash> monday is fine  16:17
<tobiash> thanks  16:17
<clarkb> corvus: fine with me. The github improvements would be good to publish  16:19
<clarkb> on the nodepool side of things we may want to get Shrews's image build timeouts in and tested with openstack before releasing. I think that may be an important one given recent errors with image building in openstack land  16:19
<Shrews> yeah, i'm working on validating that the dib log output is the same before i'm comfortable with it  16:20
<Shrews> should know shortly  16:20
<tobiash> clarkb, Shrews: regarding the build timeout, I think it should be configurable. We have one image that regularly takes 8-10 hours already...  16:22
<tobiash> (it's a 500gb image)  16:22
<clarkb> tobiash: ++  16:24
<corvus> zuul 3.4.0 pushed  16:24
<tobiash> \o/  16:24
<clarkb> tobiash: I haven't had a chance to review it yet, but it's on my list of things to try to get to today  16:26
*** EmilienM is now known as EvilienM  16:26
<tobiash> clarkb: you mean the image build timeout?  16:27
<clarkb> yes  16:27
<tobiash> I just had a quick look at the first version yesterday when it was wip'ed and noticed the hard-coded timeout  16:28
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul-jobs master: Add dco-license job  https://review.openstack.org/630302  16:30
*** pcaruana has quit IRC  16:36
*** pwhalen has joined #zuul  16:51
*** gtema has quit IRC  16:52
<corvus> tobiash, pabelanger, tristanC: are you aware of this capability? it's not something we've really advertised... https://review.openstack.org/629983  16:53
<tobiash> corvus: not yet, will look later  16:54
<corvus> i'm wondering whether we should continue to pretend that doesn't exist, or should we start to use it in some places (keeping in mind that using it will serve as an example to others). or should we jump into the deep end of the pool and make the 'parent' attribute a list and make this more convenient to use.  16:58
<pabelanger> corvus: ah, interesting. I don't think I've written a job like that before  16:59
<corvus> i don't think anyone has :)  16:59
<pabelanger> yah, having a job with 2 parents, way cool!  17:00
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build  https://review.openstack.org/629923  17:09
<clarkb> Shrews: ^ one note on the flush location there  17:11
<Shrews> clarkb: that doesn't appear necessary since EOF in the normal case seems to handle it  17:12
<clarkb> Shrews: but the log.write(m.encode('utf8')) only happens in the error cases  17:13
<clarkb> er no, the else is all cases  17:13
<clarkb> so just move the flush below the log.write in the exception handler?  17:13
<Shrews> clarkb: you've confused me. sorry  17:14
<Shrews> clarkb: i need the flush+fsync there to get that message at the end of the log file  17:15
<clarkb> Shrews: https://review.openstack.org/#/c/629923/2/nodepool/builder.py line 785 is a write that happens after the flush and fsync. Shouldn't it come first?  17:15
<Shrews> clarkb: the normal case doesn't need that  17:15
<Shrews> no, it shouldn't be first, because i want it as close to the end of the file as possible  17:16
<clarkb> gotcha, it's ensuring the first process has finished writing before the nodepool builder writes that line  17:16
<Shrews> because in the timeout case, there can still be buffered data this not yet written to the log  17:16
<Shrews> s/this/that is/  17:17
<Shrews> i guess technically it's not necessary since the dib process could still write to the log long after we've moved on, but it doesn't hurt either  17:19
*** panda is now known as panda|off  17:25
<clarkb> ya, getting it close enough is probably fine. grep works and we shouldn't time out often  17:25
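[Editor's note: a minimal sketch of the flush-then-write ordering being debated above. It is not the actual nodepool builder code (the function and message are illustrative); it just shows why flush + fsync comes before the marker write: buffered log data is pushed to disk first, so the timeout marker lands as close to the end of the file as possible.]

```python
import os

def finalize_build_log(log, message):
    """Append a timeout marker to an open binary build log.

    Flush Python's buffer and fsync the file descriptor first, so any
    data buffered so far reaches disk before the marker is written.
    """
    log.flush()
    os.fsync(log.fileno())
    log.write(message.encode('utf8'))
```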
*** pwhalen has quit IRC  17:28
*** electrofelix has quit IRC  17:31
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build  https://review.openstack.org/629923  17:32
<Shrews> ok, so the requirements on that ^^ seem to be in flux. Do we want the timeout configurable per-image or just an overall timeout?  17:34
<Shrews> clarkb: tobiash: ^^^  17:34
<Shrews> the last PS added an overall timeout  17:35
<Shrews> but I just saw the comments  17:35
<clarkb> Shrews: given tobiash's extreme builds we likely have to make it configurable. I don't think it has to be configurable per image if that is more work  17:35
<Shrews> ok, then the current PS should work  17:36
<clarkb> (that was more of an optimization so tobiash could say "this is the really long image build, treat it differently"). For openstack most of our builds are consistent and it won't matter  17:36
*** rlandy is now known as rlandy|brb  17:36
<clarkb> Shrews: ya, the latest ps lgtm  17:37
<pabelanger> Hmm. I've updated my zuul-web to 3.4.0, but for some reason it isn't connecting properly to zookeeper now  17:40
<pabelanger> rolling back to 3.3.1 works as expected  17:40
<pabelanger> going to try another service; zuul-web has a lot of changes  17:42
<pabelanger> Oh, ha  17:51
<pabelanger> I think zuul-web now needs a connection to zookeeper  17:51
<pabelanger> no wonder it doesn't work, I have it blocked in the firewall  17:52
<pabelanger> let me confirm that is the fix, and I'll propose a reno change about upgrading  17:53
<pabelanger> Yup, that is it  17:54
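[Editor's note: the upgrade issue pabelanger hit. As of the change discussed below (3.4.0), zuul-web needs its own [zookeeper] section in zuul.conf and firewall access to the ZooKeeper port, not just the scheduler. A sketch; the hostname is illustrative.]

```ini
# zuul.conf on the zuul-web host
[zookeeper]
hosts=zk01.example.com:2181
```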
*** pwhalen has joined #zuul  17:55
<timburke> fungi: thinking about limiting swift access to a particular subtree, you might want to look at prefix-based tempurls -- see https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html  18:00
<fungi> timburke: i think that's what we did once upon a time  18:01
<timburke> wow, that's been around longer than i'd remembered... since swift 2.12...  18:03
*** rlandy|brb is now known as rlandy  18:07
*** TheJulia is now known as needssleep  18:10
<fungi> timburke: yeah, looks like we were using tempurls with prefixes: https://git.zuul-ci.org/cgit/zuul/tree/zuul/configloader.py?h=9e78452#n125  18:10
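[Editor's note: a sketch of how a prefix-based tempurl signature is computed, following the Swift temporary URL middleware documentation timburke links above. The account, container, and prefix values are illustrative; the point is that one HMAC over the `prefix:` form authorizes every object under that prefix, which is what limits a job node to its own log subtree.]

```python
import hmac
from hashlib import sha1

def prefix_tempurl_sig(key, method, expires, prefix_path):
    """Compute a Swift prefix-based tempurl signature.

    prefix_path is e.g. '/v1/AUTH_account/container/build-uuid/';
    the signed body uses the 'prefix:' form so the signature is
    valid for any object path beginning with that prefix.
    """
    hmac_body = '%s\n%s\nprefix:%s' % (method, expires, prefix_path)
    return hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
```

The resulting signature goes into the `temp_url_sig` query parameter, alongside `temp_url_expires` and `temp_url_prefix`.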
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul master: Update docs since zuul-web requires zookeeper  https://review.openstack.org/630365  18:12
<pabelanger> corvus: tobiash: clarkb: ^ documentation updates for zuul-web when upgrading to 3.4.0  18:12
<corvus> when did that change?  18:13
<pabelanger> corvus: I have nodes now in the web UI, let me find the commit  18:14
<pabelanger> corvus: I think https://review.openstack.org/553979/  18:15
<corvus> welp, we should have tagged zuul v4 then.  18:16
<corvus> that's an architectural deployment change  18:16
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Add validate-dco-license role  https://review.openstack.org/629565  18:17
<pabelanger> Yah, I only caught it because zuul-web is on a different VM than the scheduler  18:17
<corvus> i think we should re-tag 3.3.1 and 3.4.1  18:17
<corvus> er  18:17
<corvus> i think we should re-tag 3.3.1 as 3.4.1  18:17
<pabelanger> ok  18:18
<corvus> because we should not be sneaking in deployment changes (like, if you need to update your firewall rules, it's a deployment change) as minor version changes  18:18
<pabelanger> ++ agree  18:19
<corvus> and then we can either revert the web->zk changes, or tag 3.4.0 as 4.0.0  18:19
<corvus> (and i guess scale-out-scheduler becomes zuul v5 :)  18:19
<fungi> i'll have to stop talking about "zuul v3"  18:19
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Add dco-license job  https://review.openstack.org/630302  18:20
<pabelanger> will defer to people, but would be okay with a revert and 3.4.0 over a 4.0.0 bump  18:20
<corvus> well, we lose the node+label stuff, unless we rework that to use gearman  18:20
<corvus> tristanC, tobiash, SpamapS: ^ thoughts?  18:22
<AJaeger> corvus, do we need https://review.openstack.org/#/c/625615/1/zuul/site-variables.yaml ? Happy to +2A, just want confirmation (or you can +2A yourself ;)  18:22
*** hashar has joined #zuul  18:23
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul-jobs master: Simplify dco-license job playbook  https://review.openstack.org/630369  18:25
<AJaeger> here's a zuul-jobs change for review - do we want to pin nodejs as mordred proposed? https://review.openstack.org/627823  18:25
<SpamapS> 3.4 seems sufficient for an internal-only deploy change.  18:26
<SpamapS> I'd expect a 3.3.1 to be patches only.  18:27
<SpamapS> (fixes, non-changing features, etc)  18:27
<SpamapS> but we added some cool features, so it seems fine to say "to get them you need to update your firewalls"  18:27
<SpamapS> Honestly, I'd even be fine with the scale-out scheduler being a minor 3.x+1 release, if it didn't change any of the external interfaces.  18:28
<fungi> SpamapS: well, even if you don't use those new features, you're going to need to update your firewalls if you want to continue running zuul-web, i expect?  18:28
<SpamapS> fungi: Yeah, that's fine with me. I'm not going to upgrade from 3.3 to 3.4 without planning and reading changelogs.  18:29
<fungi> fair  18:29
<corvus> i'm happy to be talked back from the edge... though up to this point it's generally been safe to update to 3.x without touching your deployment setup (modulo that one time where certain web configs had to change, sorry SpamapS). 3.x releases have generally been "new job features or behavior", not "new system configuration"  18:29
<SpamapS> (I might upgrade from 3.3.0 -> 3.3.1 without doing that, if it had an important fix)  18:29
<SpamapS> So, one thing might be to make the new zuul-web bits off by default.  18:30
<pabelanger> fungi: yah, in my case, my zuul.conf on zuul-web didn't have the zookeeper configuration, so I needed a zuul.conf change too  18:30
<pabelanger> but that is only because I dynamically generate it with minimal content  18:31
<mrhillsman> let me just say the changes that have been going in from a ux perspective have been great, so kudos to all the folks doing the work ;)  18:31
<corvus> if i'm reading the change correctly, it is effectively required  18:33
<corvus> if you omit the zk hosts, it defaults to localhost  18:33
<corvus> so regardless, zuul-web will not start if it can't reach a zk  18:33
<pabelanger> yes, that is what happened to me  18:34
<corvus> (even though the connect invocation is wrapped in a conditional to test whether zk_hosts is set)  18:35
<corvus> (it's never going to fail)  18:35
<corvus> except... in tests?  18:36
<corvus> nope, even then it's used  18:36
<corvus> oh, ok.  *some* tests use it, some don't  18:37
<corvus> so the only place it's possible to have a web server without a zk connection is in *some* unit tests. not in production.  18:37
<corvus> we usually try to run tests as in production, so we should probably change that.  18:38
<corvus> given that there's a default value for the zk host, i don't see an easy way to make this optional  18:38
<corvus> why do we have a default for the zk hosts value when we don't document that, and say that the hosts list is required?  https://zuul-ci.org/docs/zuul/admin/components.html#attr-zookeeper  18:40
<corvus> (if we didn't have the default, i think there would be an easy way to implement what SpamapS suggested -- add the zk connection info if you want it, or don't)  18:41
*** jpena is now known as jpena|off18:44
corvusclarkb, tobiash, mordred, pabelanger, SpamapS, fungi, Shrews, and anyone else who wants to weigh in:  what should we do?  A) tag 3.3.1 as 3.4.1, land the release note, then tag master as 4.0.0.  B) tag 3.3.1 as 3.4.1, make the ZK stuff optional [somehow] then tag master as 3.4.2.  C) land the release note, and send out the release announcement with a note about the additional requirement.18:45
clarkbI like B if we can solve the somehow18:47
clarkbthat gives us future flexibility too18:47
fungiwould making the zk stuff optional and then tagging 3.3.2 be on the table?18:47
fungias in "oops, 3.3.1 was a regression, fixed in 3.3.2"18:48
corvusfungi: 3.4.0 is already tagged, it has the new zk stuff18:48
fungioh, nevermind18:48
corvus3.3.1 is the last "safe" release18:48
clarkbfor solving the somehow, one appraoch could be to check if 2181 is listening on localhost, if so connect to it, else don't try to connect?18:49
tobiashI like B18:49
clarkb(that might be too smart for its own good though)18:49
fungigot it, so "tag 3.3.1 as 3.4.1" means "revert 3.4.0 temporarily"18:49
corvusclarkb: i don't like an approach that fails quietly if zk is down18:49
corvusclarkb: i think we need to know what the operator expects, and then fail loudly if they expect zk to exist but it doesn't18:49
pabelangerthere is also D right, tag 3.3.1 as 3.4.1, revert node+label web ui, tag 3.4.2 ?18:49
* tobiash is only partly around18:49
*** dmellado has quit IRC18:51
*** gouthamr has quit IRC18:51
pabelangeras much as I'd like C, I don't think that will be fair to users. To that is -1 for me18:51
corvusB is nice too, but i need ideas on how to do it.  only way i see would be to drop the undocumented default value.  i'm not sure that needs a major release, but it might be nice to at least give notice and have a deprecation period.18:51
fungirevisiting the proposed options, i like b though it makes me wonder if we have consensus/precedent on when to increment the major release version component18:52
pabelanger+1 for B, if we can figure out how to do18:53
corvusfungi: i'm not sure if we have consensus, but we do have precedent.  it has only incremented on major architecture changes (jenkins api -> gearman -> zk/ansible)18:53
fungii agree version numbers are cheap/free, so if we want a new connection between two components to mean new major version, i'm okay with it18:53
corvusthough 2.5 is an asterisk :)18:54
fungiheh18:54
fungithe addition of a connection from zuul-web to zk doesn't feel like it's in the same class as our previous major version bumps, at least18:54
corvusokay, so does anyone have a suggestion or how to implement B?18:54
pabelangerif we go to 4.0.0, what else would be include feature wise in that stream?18:54
corvuspabelanger: i don't understand the question18:55
clarkbcorvus: one option is to make anothe rbreaking change and force people to list localhost if that is what they mean (sounds like docs already say this is how you would do it)18:55
fungii don't think we include anything else feature-wise in 4.0.0 aside from the node+label webui18:55
pabelangerlet me reask, would multiple ansible version be 4.0 or 5.018:55
fungidepends on when it lands and what lands before it18:56
Shrewsgoing to 4.0.0 seems... extreme, but i get it. if we could do B, that seems better18:56
corvusclarkb: yeah.  i'd like to do that, but it does mean breaking anyone relying on the undocumented default18:56
clarkbdoes the docker-compose quickstart take advantage of that?18:57
clarkb(trying to get a sense for how many people might actually be using it, I know openstack bmw and SpamapS all run remote zk clusters)18:57
fungii feel like changing an undocumented behavior (one which contradicts what the documentation claims) could be argued as a minor version bump with very thorough release note18:57
corvusif we were planning this ahead of time, i would say "lets send an announcement that in the next release we're going to remove the undocumented default, and wait at least a week.18:57
corvusfungi: yeah.  you might be able to talk me into that.  i hope so.  :)18:58
SpamapSHOw long has the undocumented default been out?18:58
SpamapSBecause, I'm inclined to ignore it if it's less than a few months.18:58
SpamapSIf it were documented, that'd be one thing.18:58
corvusi think it's old.  i'll check.18:58
SpamapSBut the accidental "oops this works" being replaced with "hey where'd my nodes/labels UI go?" is kind of fine with me.18:59
SpamapS(assuming the answer is that the nodes/labels UI goes away when there's no ZK config)18:59
fungiso amended proposal: re-tag 3.3.1 as 3.4.1, announce the coming removal of the default, then tag 3.4.0 as 3.5.0 with a clear release note explaining the situation19:00
SpamapSI thought the nodes/labels UI bits only landed a few days ago.19:00
SpamapSBut maybe the API has been around longer.19:00
fungi(or, you know, master as 3.5.0 really)19:00
corvuson 2017-06-23 i changed the section it was under; but it's been there since the beginning, 2017-02-2119:01
fungibasically just a "oops, our bad, here's what we should have announced, we'll unwind it with another release in a week"19:01
pabelangerfungi: so, that would only break people using localhost zookeeper with zuul scheduler, IIUC19:02
corvusSpamapS: yeah, the nodes/labels are new.  but having a zk default is old19:02
corvusfungi: yeah, that's starting to sound like a plan19:02
fungii do feel like 4.0.0 is going to imply much larger architectural changes than we intend to convey, unless we start to set expectations of much more frequent major version bumps going forward19:03
corvusclarkb: the quickstart does not rely on implicit zk hosts19:04
SpamapSWait, so zuul-web has been trying to connect to ZK since June?19:05
SpamapSOr the config value was there, but unused?19:05
corvusfungi: yeah, it sounds like we'd all probably like to be able to make this change as 3.X, but we just need to do it carefully, and reserve 4.0 for "you're going to start or stop running a new service"19:05
pabelangerSpamapS: only after dec 29 https://review.openstack.org/553979/19:06
corvusSpamapS: no, zuul-web has only been trying to connect since 3.4.0 (or master a week ago).  but the "[zookeeper]" section of the config file, which is valid on any host (for this reason) has had an implicit default of localhost since "forever"19:06
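For context, the zuul.conf section being discussed looks roughly like this (hostnames here are illustrative, not from the log; the comment describes the implicit-default behavior corvus refers to):

```ini
# zuul.conf -- illustrative fragment only.
# If this section (or its "hosts" option) is omitted, the code under
# discussion silently falls back to an implicit localhost default,
# which is why zuul-web always tries to connect.
[zookeeper]
hosts=zk01.example.com:2181,zk02.example.com:2181
```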
*** hashar has quit IRC19:06
SpamapSAh19:06
SpamapSwell IMO the more important thing is whether the daemon started up and worked.19:06
corvusheh, apparently that was 2 weeks ago, time flies19:06
fungii suppose we could find a way to make that default optional only for zuul-web, but that could get complicated19:07
corvusfungi: yeah, it undercuts the work we did to unify that19:07
fungier, make that default nonexistent i mean19:07
fungii agree, it seems like a poor choice for consistency19:07
SpamapSBut I see the complication: the config file parsed fine right up until we started actually using that value.19:07
pabelangerso, if zookeeper is nonexistent in zuul-web, the new nodes+label UI won't load? Sorry, getting lost mental mapping everything19:08
SpamapSRight now I believe if you don't set it, zuul-web will fail to start unless you have zk on localhost.19:09
corvuspabelanger: the current behavior is that zuul-web will always attempt to connect to zk, regardless of your config, because of the implicit default.19:09
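A minimal sketch of how an implicit default like this arises with Python's configparser (the function name and fallback value are illustrative, not Zuul's actual code):

```python
import configparser


def zk_hosts(conf: configparser.ConfigParser) -> str:
    # A fallback like this means callers always get a value, so
    # zuul-web would always attempt a connection -- even when the
    # operator never wrote a [zookeeper] section at all.
    return conf.get("zookeeper", "hosts", fallback="localhost:2181")


conf = configparser.ConfigParser()
conf.read_string("[scheduler]\ntenant_config=/etc/zuul/main.yaml\n")
print(zk_hosts(conf))  # no [zookeeper] section, yet we get a host back
```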
corvusSpamapS: right19:09
SpamapSWe could change it to go ahead and catch that failure when the default is used, with a "We think you didn't mean to do this" warning in the log and release notes, and then 3.5.0 can be the first release that doesn't have the default?19:11
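A sketch of the guard SpamapS is proposing, with all names hypothetical: tolerate a failed connection only when the operator relied on the implicit localhost default, warning loudly instead of refusing to start.

```python
import logging

log = logging.getLogger("zuul.web")


def connect_zk(connector, hosts, explicitly_configured):
    """Try to connect to zookeeper via the given connector callable.

    If the connection fails and the hosts came from the implicit
    default (not the operator's config), log a warning and continue
    without zookeeper rather than aborting startup.
    """
    try:
        return connector(hosts)
    except ConnectionError:
        if not explicitly_configured:
            log.warning(
                "Could not reach zookeeper at implicit default %s; the "
                "nodes/labels UI will not work. We think you didn't mean "
                "to do this -- set [zookeeper] hosts explicitly.", hosts)
            return None
        raise  # an explicit config that fails is still a hard error
```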
SpamapS*We could change it -- in a 3.4.1 release19:11
pabelangercorvus: ok, and sounds like we are wanting to remove that. which stops zuul-web from connecting to zookeeper, we then also update UI to not display labels / nodes if no zookeeper connection? Or do we get that by default (guess I should look)19:12
SpamapSSo that way 3.4.1 still starts where 3.4.0 doesn't, but I don't think it *breaks* anybody.19:12
SpamapSAlso, I'm sort of inclined to suggest that 4.0 drops any nodepool-specific APIs from zuul-web, and we make a nodepool-web.19:12
SpamapS(but that's jumping ahead)19:13
fungipabelanger: it seems like the simplest way forward is to not rework too much of that, roll back the release by re-tagging the old release, and then announce removal of the implicit zk config option, then tag a new release after a bit of a waiting period (so just roll forward with requiring zk for zuul-web)19:13
fungis/option/default/19:14
clarkbfungi: well if we don't rework that won't it also fail on not having zk?19:14
fungiyes, but we will have announced that the configuration there is mandatory, like the documentation already states19:14
clarkbah right19:14
corvuspabelanger: i'm not sure; we might just end up pushing the brokenness down to when someone clicks on the "labels" tab.  in which case, maybe 3.4.0 with a release note saying "oops this is required" is best.  or maybe combine the two approaches: re-tag 3.3.1 then release 3.5.0 in a week after an announcement that the new requirement is coming.19:15
fungiat least for me, it seems like a reasonable balance of courtesy to users with a minimal amount of reworking the software we already have19:15
corvusSpamapS: well, we didn't have any until 2 weeks ago.... i think a lot of folks feel like presenting a unified web view of the system is worthwhile (regardless of whether nodepool also has a web ui)19:15
pabelangerYah, I guess what I am trying to figure out: if we still expect 3.x to have the nodes+labels UI, that needs a firewall / config update, unless zuul-web is smart enough not to render it without a zookeeper connection. Otherwise, users still need to update firewalls today or when 3.x happens.19:17
pabelangerand if so, then maybe just the original option C) land the reno note and ML announcement is good enough without changing defaults19:17
fungii think the firewall config update is suitable for a minor version bump with a clear release note, but that's just my opinion19:17
fungi(an opinion SpamapS helped me to form, credit where credit is due)19:18
corvusfungi: do you think we could do (C) -- just land a release note and announce 3.4.0 with it?19:18
fungihave we not done a release announcement for 3.4.0 yet?19:19
corvusnope19:19
corvusi'm slow19:19
*** smyers has quit IRC19:19
fungithat's tempting. my biggest worry is that it won't be included in the repo state for that tag, so visibility is mainly in the announcement19:19
pabelangerfungi: https://review.openstack.org/630365/ should fix that, once we run reno publish again19:20
*** smyers has joined #zuul19:20
corvusyeah, though it will be in the website docs19:20
corvusunder the right section19:20
fungithat's not bad19:20
tobiashWe could make zk optional and hide the labels tab just like we hide the builds tab without sql19:21
fungitobiash: yeah, that's what the code tries to do19:21
corvusfungi: i don't think it tries to hide the tabs19:21
fungiproblem is we always supply a default zk server of localhost when it's not explicitly configured19:21
fungiahh19:21
pabelangeryah, I think we'd need to add logic to hide tabs19:22
corvusfungi: the main issue is that startup fails due to not connecting19:22
fungiso the conditional there is currently more limited19:22
tobiashWe can add the has labels info in the info endpoint19:22
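tobiash's suggestion could look something like this: advertise capabilities in the info endpoint so the javascript UI can hide the labels tab the same way it hides the builds tab without sql. This is a hypothetical sketch; the field names are illustrative, not zuul-web's actual API.

```python
import json


def info_payload(has_sql, has_zk):
    # Hypothetical extension of the info endpoint's JSON response:
    # each capability flag tells the web UI whether to render the
    # corresponding tab (builds needs sql, labels/nodes needs zk).
    return json.dumps({
        "info": {
            "capabilities": {
                "job_history": has_sql,
                "labels_nodes": has_zk,
            }
        }
    })
```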
pabelangerwhich is fine also, and removes the need for a firewall change, unless you want it19:22
corvusif we caused it to get past that, then we'd be confronted with the fact that clicking the labels tab triggers a 50019:22
corvusbut at least it's running :)19:22
SpamapSzuul-web not running is, technically, hiding the tab.19:22
fungiheh ;)19:22
pabelangerSpamapS: touché19:23
corvustobiash: yes... though, tbh, i'm not sure that's something we really want to be optional.  i think we'd only be considering it because of this issue; otherwise i don't think it's worth it19:23
corvusi mean, presented with that choice, i would just say rework it to use gearman19:23
SpamapSDoes solve a lot of issues if it can just fetch via gearman.19:25
fungiif that's the route we want to take, i'd say skip the re-tagging of 3.3.1 and just tag a revert of the feature19:26
fungias 3.4.119:26
corvusat some point zuul-web will need to talk to zk, so it's not throwaway work.  it's just surprisingly early is all. :)19:28
pabelangerI'm okay with reverting the feature, I'm not really using it. But possible users on zuul.o.o are19:30
pabelangerI need to step away for a few moments, but will also be EOD shortly, need to run some errands19:30
fungithe distributed scheduler work... is the plan that it gets rid of gearman from the zuul/nodepool architecture entirely?19:30
pabelangerI've reverted my local zuul to 3.3.1 for now19:31
corvusokay, new poll: A) land the release note and send out an announcement with it.  B) revert the feature and figure out later what to do about it (put it back with better notice, make it optional, use gearman).19:31
corvuspabelanger: before you leave, ^ do you have a preference, and if it's B, do you strongly object to A?19:31
corvusfungi: yes19:31
pabelangercorvus: both are fine, B is more user friendly, but if you are okay with A, so am I19:32
corvusfungi: so doing more stuff in zuul-web with gearman may be throwaway work.19:32
fungipoints in favor of a: it's a lot less work, it doesn't kick the can further down the road, and at least it was a minor version bump not a patchlevel increment19:32
fungicorvus: thanks, that's why i asked, yes ;)19:32
corvusokay, i think everyone thinks A is acceptable, if not ideal.  so how about we go with that.19:33
pabelangersorry, should say deployer friendly19:33
fungii think option a isn't terrible if 3.5.0 (announced in advance) drops the implicit zk config default so that we don't fail to notice similar dependencies creeping into future releases19:34
corvusyeah, i think we should do that.19:34
fungiif 3.4.0 had already been announced i'd be pretty against option a, fwiw19:34
corvusthat may not force the issue, so i think we'll mostly need to be careful19:35
corvus(because we actually recommend using the same zuul config file on all hosts)19:35
corvusfungi: care to +3   https://review.openstack.org/630365   ?19:37
corvushttp://logs.openstack.org/65/630365/1/check/tox-docs/ff72c5d/html/releasenotes.html#relnotes-3-4-019:37
fungiand done19:38
SpamapScorvus: I mirror pabelanger's feelings. Fine with either, prefer B, but not going to do the work, so (A) is sufficient.19:39
corvushow does this look?  https://etherpad.openstack.org/p/6thJzljYMd19:41
corvusi put it first in the list; should we call it out more strongly?19:42
*** openstackstatus has quit IRC19:43
SpamapSstrong enough for me.19:43
fungiyeah, i worry that doing too much else to it deviates from the notes published to the docs site19:43
fungiif we do want to call it out more, we could mention it between the download url and the release notes introduction i guess19:44
corvusk.  when 630365 lands, i'll send that19:44
*** openstackstatus has joined #zuul19:45
*** ChanServ sets mode: +v openstackstatus19:45
fungilike "be aware this release may require configuration and firewall changes (see below)"19:45
corvusfungi: yeah, part of me wants to do that, then part of me says it would look like "please read 6 lines down"19:45
fungisure19:46
fungii think it's good19:46
fungii mean, if we expect people to notice it in the published release notes we should expect them to notice the same thing in the release announcement19:46
pabelanger+119:47
pabelangeron etherpad19:47
clarkbya lgtm19:47
*** kmalloc has joined #zuul19:50
*** kmalloc has quit IRC19:51
*** gouthamr has joined #zuul19:53
pabelangerokay, shutting down for a while. Thanks everybody for help this afternoon!19:55
*** dmellado has joined #zuul19:58
openstackgerritMerged openstack-infra/zuul-jobs master: Simplify dco-license job playbook  https://review.openstack.org/63036920:05
openstackgerritMerged openstack-infra/zuul master: Update docs since zuul-web requires zookeeper  https://review.openstack.org/63036520:05
*** openstack has joined #zuul20:18
*** ChanServ sets mode: +o openstack20:18
corvusdocs are published, release announcement sent; thanks everyone!20:42
*** hashar has joined #zuul20:43
*** corvus is now known as thecount21:02
*** thecount is now known as corvus21:02
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add a timeout for the image build  https://review.openstack.org/62992321:14
*** rlandy has quit IRC21:31
openstackgerritMerged openstack-infra/zuul-jobs master: Add role to move docs and artifacts to log root  https://review.openstack.org/62957122:07
*** hashar has quit IRC22:10
*** EvilienM is now known as EmilienM22:14
*** pabelanger has quit IRC23:58

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!