Tuesday, 2021-03-09

*** tosky has quit IRC00:28
openstackgerritJames E. Blair proposed zuul/nodepool master: Convert NodeLaunchRecord into NodeLauncher  https://review.opendev.org/c/zuul/nodepool/+/77940700:36
corvusthat gets us stats (and, interestingly, slots even better into the existing driver framework); main todo left now is quota handling00:37
corvusactually, looks like that's done; i think that framework may be ready to port the azure driver over00:59
*** jamesmcarthur has joined #zuul01:05
*** jamesmcarthur has quit IRC01:14
*** jamesmcarthur has joined #zuul01:15
*** jamesmcarthur has quit IRC01:17
*** jamesmcarthur has joined #zuul01:17
*** hamalq has quit IRC01:24
*** jamesmcarthur has quit IRC01:25
*** jamesmcarthur has joined #zuul01:28
*** jamesmcarthur has quit IRC01:33
*** jamesmcarthur has joined #zuul01:39
*** jamesmcarthur has quit IRC02:00
*** jamesmcarthur has joined #zuul02:01
*** jamesmcarthur has quit IRC02:02
*** jamesmcarthur has joined #zuul02:02
*** jamesmcarthur has quit IRC02:04
*** jamesmcarthur has joined #zuul02:05
*** ikhan has quit IRC02:05
*** jamesmcarthur has quit IRC02:09
*** jamesmcarthur has joined #zuul02:25
*** jamesmcarthur has quit IRC02:37
*** jamesmcarthur has joined #zuul02:41
*** jamesmcarthur has quit IRC02:45
corvusthe error with those tests in the zk stack is the lack of a time database directory02:48
corvuswe should keep the testonly argument to scheduler for that02:49
openstackgerritJames E. Blair proposed zuul/zuul master: Make ConnectionRegistry mandatory for Scheduler  https://review.opendev.org/c/zuul/zuul/+/77908602:52
openstackgerritJames E. Blair proposed zuul/zuul master: Instantiate executor client, merger, nodepool and app within Scheduler  https://review.opendev.org/c/zuul/zuul/+/77908702:52
corvustobiash, swest, felixedel: the alternate stack that i pushed under "hashtag:sos" should be ready now; can you take a look on tuesday?02:54
*** jamesmcarthur has joined #zuul02:59
*** ajitha has joined #zuul03:14
*** jamesmcarthur has quit IRC03:22
*** jamesmcarthur has joined #zuul03:27
*** jamesmcarthur has quit IRC03:27
*** jamesmcarthur has joined #zuul03:37
*** jamesmcarthur has quit IRC03:49
corvusremote:   https://review.opendev.org/c/zuul/nodepool/+/779420 WIP: add azure state machine driver [NEW]03:50
corvusthat's still early -- but that does create, delete, and cleanup leaks for real03:51
*** dpawlik6 has joined #zuul03:51
*** jamesmcarthur has joined #zuul03:53
*** Tahvok_ has joined #zuul03:53
*** raukadah has joined #zuul03:54
*** avass_ has joined #zuul03:55
*** freefood has joined #zuul03:56
*** icey_ has joined #zuul03:56
*** paulalbertella has joined #zuul03:56
*** openstackgerrit has quit IRC03:59
*** reiterative has quit IRC03:59
*** Tahvok has quit IRC03:59
*** icey has quit IRC03:59
*** avass has quit IRC03:59
*** chandankumar has quit IRC03:59
*** jkt has quit IRC03:59
*** mhu has quit IRC03:59
*** freefood_ has quit IRC03:59
*** dpawlik has quit IRC03:59
*** fbo has quit IRC03:59
*** Tahvok_ is now known as Tahvok03:59
*** dpawlik6 is now known as dpawlik03:59
*** jkt has joined #zuul04:00
*** jamesmcarthur has quit IRC04:03
*** jamesmcarthur has joined #zuul04:14
*** vishalmanchanda has joined #zuul04:16
*** vishalmanchanda has quit IRC04:21
*** vishalmanchanda has joined #zuul04:21
*** ykarel has joined #zuul04:23
*** saneax has joined #zuul04:48
*** wuchunyang has joined #zuul04:49
*** raukadah is now known as chandankumar04:51
*** jamesmcarthur has quit IRC04:55
*** jamesmcarthur has joined #zuul05:11
*** jangutter has quit IRC05:15
*** jangutter has joined #zuul05:16
*** wuchunyang has quit IRC05:24
*** iurygregory has quit IRC05:26
*** jamesmcarthur has quit IRC05:32
*** evrardjp has quit IRC05:33
*** evrardjp has joined #zuul05:33
*** jfoufas1 has joined #zuul05:34
*** jamesmcarthur has joined #zuul05:35
*** jamesmcarthur has quit IRC05:40
*** jamesmcarthur has joined #zuul06:03
*** wuchunyang has joined #zuul06:27
*** wuchunyang has quit IRC06:53
*** hashar has joined #zuul07:09
*** vishalmanchanda has quit IRC07:36
*** piotrowskim has joined #zuul07:59
*** jamesmcarthur has quit IRC08:08
*** jcapitao has joined #zuul08:12
*** okamis has joined #zuul08:15
okamisHello, what is the purpose of gearman in zuul?08:16
*** hashar has quit IRC08:19
tobiashokamis: it's the rpc protocol zuul-scheduler uses to talk to zuul-executors08:20
okamisokay, do i understand it correctly that gearman also partially decides on the scheduling because it has itself a queue?08:22
*** rpittau|afk is now known as rpittau08:24
*** jamesmcarthur has joined #zuul08:36
*** jamesmcarthur has quit IRC08:43
swestcorvus: lgtm, small comment on 779087 that should fix the failing test08:51
*** jpena|off is now known as jpena08:55
*** paulalbertella is now known as reiterative08:57
*** tosky has joined #zuul09:02
*** hashar has joined #zuul09:25
*** ajitha has quit IRC10:08
*** jamesmcarthur has joined #zuul10:39
*** jangutter has quit IRC10:41
*** jangutter has joined #zuul10:42
*** jangutter has quit IRC10:43
*** jangutter has joined #zuul10:44
*** iurygregory_ has joined #zuul10:45
*** jamesmcarthur has quit IRC10:46
*** iurygregory_ is now known as iurygregory10:46
okamisis this channel more active in other hours?10:53
okamisIm in utc +110:53
*** icey_ is now known as icey11:02
*** jangutter has quit IRC11:07
*** jangutter has joined #zuul11:07
*** hashar has quit IRC11:08
*** nils has joined #zuul11:14
avass_okamis: yeah it's usually active later11:17
*** avass_ is now known as avass11:17
avassokamis: most people are in america. exceptions are me, tobiash and zbr I believe11:18
okamisguess they be sleeping.11:19
tobiashokamis: yes, gearman decides on its own how the jobs are distributed11:19
tobiashwhat's the background of your question?11:19
okamisIm curious about the scheduler, in my old experiences we had issues with gearman either starving jobs, or we went round robin and its also not optimal11:19
okamisIm hoping that it would be possible to write custom job prioritizer, so  I can use scarce resources effectively, and also prioritize jobs together which belong to the same change to have good throughput11:21
tobiashokamis: are you asking about zuul jobs or using gearman for your own software?11:22
okamiswe are using zuulv2, but im curious what zuul (v4?) can do for us11:22
tobiashthis is how the zuul executors distribute their load: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L253811:23
tobiashthis makes the scheduling quite equally distributed among all executors11:24
okamisHow does delaying the request spread it among executors?11:27
*** jangutter has quit IRC11:30
*** jangutter has joined #zuul11:30
okamisEven spread is nice, but I would like that scarce resources are only used for specific jobs when we have those in queue, and if there is none I would like them to pick up arbitrary jobs11:31
tobiashokamis: each executor delays the noop request with a backoff depending on its current running jobs11:33
okamisAaah, okay, that makes sense, so nodes that aren't running anything respond faster to say they are available11:34
tobiashyes11:34
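A minimal sketch of the load-based backoff tobiash describes above, in which a busy executor waits longer before answering so that an idle one tends to pick up the job first. This is not Zuul's executor code: the names `response_delay` and `pick_winner`, the linear formula, and the `step` and `cap` values are all assumptions made for illustration.

```python
def response_delay(running_jobs: int, step: float = 0.05, cap: float = 2.0) -> float:
    """Seconds a worker waits before answering a work request.

    Hypothetical linear backoff: an idle worker answers at once, a busy
    worker waits longer, and the wait never exceeds `cap`.
    """
    if running_jobs < 0:
        raise ValueError("running_jobs must be non-negative")
    return min(running_jobs * step, cap)


def pick_winner(load_by_executor: dict) -> str:
    """Simulate one request: the executor with the shortest delay answers first."""
    return min(
        load_by_executor,
        key=lambda name: response_delay(load_by_executor[name]),
    )


if __name__ == "__main__":
    # Executor "b" is idle, so it answers before the loaded ones.
    print(pick_winner({"a": 3, "b": 0, "c": 7}))
```

Because every executor applies the same rule locally, no central component has to know the load of each executor; the ordering falls out of who answers first.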
okamisAnother topic im interested in, is upgrading the worker hosts/executor? through this queue system as well. So I dont have to remove worker hosts from ci. Is that something you guys thought about?11:35
okamisIf I have a pool of 100 hosts, I want to say, after you finish job X, run this upgrade host job. If I can prioritize jobs in the scheduler I could send those to the top of the queue11:36
*** jangutter has quit IRC11:42
*** jangutter has joined #zuul11:42
tobiashokamis: is this still a zuulv2 question? Remember zuulv2 is eol long ago and zuulv3 works completely different11:44
tobiashwith zuulv5 we'll even get rid of gearman in the system11:44
okamisnot a zuulv2 question at least, Latest and greatest makes sense :)11:45
tobiashthis question doesn't really fit into the zuulv3+ world since there are usually no queued jobs on the executors11:46
tobiashsince then the zuul-scheduler talks with nodepool and asks for nodes (here is where the waiting/queuing mainly happens) and when it got its nodes it schedules a job on the node (which is most of the time without queuing)11:48
okamishttps://zuul-ci.org/docs/zuul/discussion/components.html#overview11:49
okamisit talks through gearman when requesting nodes?11:50
tobiashno, it talks through zookeeper to nodepool when requesting nodes11:50
*** jcapitao is now known as jcapitao_lunch11:52
okamisah, so I don't know what the zookeeper api supports, but if the scheduler has the queue then it would be possible to write a custom queue to implement the features i mentioned?11:53
tobiashzookeeper is just a distributed datastore where nodepool implements its own queue mechanism on top11:54
tobiashthis already supports priorities so it might already support what you need11:55
tobiashthe priority can be defined in the pipeline in zuul11:55
tobiashhttps://zuul-ci.org/docs/zuul/reference/pipeline_def.html#attr-pipeline.precedence11:56
okamisHas happened that we need an emergency patch to go through quickly, I guess having a second check pipeline with high precedence would help12:01
avassokamis: you can also promote changes in a queue I believe12:09
avassokamis: https://zuul-ci.org/docs/zuul/reference/client.html?highlight=promote#configuration12:09
avassokamis: https://zuul-ci.org/docs/zuul/reference/client.html?highlight=promote#promote12:09
*** ykarel_ has joined #zuul12:17
*** ykarel has quit IRC12:20
*** ykarel_ is now known as ykarel12:21
okamisAh cheers, forgot about it as the client doesnt work on our zuul v2 version12:28
okamisAnother topic is the dependent gate, I read that testing with the assumption that changes will fail is more efficient than just being optimistic and assuming all will pass. Is that something you guys have evaluated?12:30
okamis*changes might also fail12:32
*** jpena is now known as jpena|lunch12:34
*** jamesmcarthur has joined #zuul12:42
*** mhu has joined #zuul12:46
*** jamesmcarthur has quit IRC12:50
avassokamis: I'm not sure I understand what you mean :)12:51
avassthe reason we test is that we expect that the changes might fail, no?12:54
okamisMany changes are tested with assumption that the changes ahead will pass. When a change fails, following changes will have to be shuffled around and rerun which can be costly12:58
okamisI realized I spoke too soon without understanding in which scenarios it's too costly.12:58
avassthat's why it's often recommended to run the same tests in check and gate12:58
tobiashokamis: did you watch the video on https://zuul-ci.org/ ?12:59
tobiashthat is a short but very good explanation of the gating12:59
avassand the changes aren't shuffled around. the change that fails is removed from the queue and the ones behind it are re-enqueued in the same order (the logic is a bit more than that but it gets you the right idea)12:59
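A simplified sketch of the behaviour avass describes: the failing change leaves the queue and the changes behind it keep their order but have to be tested again, since their speculative state included the failed change. As avass notes, the real logic is more involved; `handle_failure` and the plain list of change names are assumptions for illustration, not Zuul's data model.

```python
def handle_failure(queue: list, failed: str) -> tuple:
    """Drop `failed` from a dependent queue.

    Returns (new_queue, to_retest). The new queue keeps the original order.
    Every change that sat behind the failed one must be retested because it
    was tested on top of the failed change.
    """
    index = queue.index(failed)
    ahead = queue[:index]        # unaffected: never depended on the failure
    behind = queue[index + 1:]   # same order, but their results are stale
    return ahead + behind, behind


if __name__ == "__main__":
    new_queue, to_retest = handle_failure(["A", "B", "C", "D"], "B")
    print(new_queue, to_retest)
```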
okamishttps://eng.uber.com/research/keeping-master-green-at-scale/  I got it from this paper, there is a pdf on that page13:00
okamisSeen the video, I didn't read the paper super in depth, so it might be having scaling issues when adding 500 worker nodes or some condition. Just wanted to mention it because it's very close to what zuul does13:02
avassif you're running the same jobs in both check & gate then the only reason something will fail in gate is when two changes are incompatible.13:02
*** jcapitao_lunch is now known as jcapitao13:03
*** hashar has joined #zuul13:04
okamisThere are definitely flaky tests and intermittency issues in our ci sadly :(13:07
okamisI will come back with some data later13:08
avassoh yeah there's that too, which is a bit harder to avoid13:09
avassokamis: a way to avoid that is to make sure your setup steps are in a 'pre-run' so the job gets retried if the pre-run fails. however if something fails in the tests themselves (in a 'run' playbook) the change would fail13:11
*** sduthil has quit IRC13:11
*** sduthil has joined #zuul13:12
avassokamis: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.attempts13:13
*** vishalmanchanda has joined #zuul13:16
avassokamis: you could also tell ansible to retry specific tasks if they're prone to intermittent errors13:19
*** jangutter has quit IRC13:34
*** jangutter has joined #zuul13:34
*** jpena|lunch is now known as jpena13:36
*** jangutter has quit IRC13:44
*** jangutter has joined #zuul13:44
*** ikhan has joined #zuul13:59
okamisokay.14:06
okamisI do hope you guys will peek a bit on that paper, because they have a speculative approach to see if it will pass or not :)14:08
*** jangutter has quit IRC14:16
*** jangutter has joined #zuul14:17
avassokamis: I might take a look later :)14:28
mordredI like that they say the zuul approach doesn't scale when we're already operating at higher throughput rates than they are. :)14:32
*** okamis has quit IRC14:32
mordredok - so the magic beans as to how they "predict" which speculative paths are more or less likely to fail is related to SubmitQueue being integrated with a particular build system14:33
mordredthey then analyze the sub-tasks of the build system as part of their analysis (in their case Buck, but they mention Bazel as well)14:33
mordredthat's hard to generalize for a system designed to do integration testing of arbitrary workloads and languages14:34
mordredthe conflictanalyzer seems like an interesting idea if it's feasible given an environment14:37
*** okamis has joined #zuul14:39
okamisWhat throughput you guys got, Im assuming uber talks about a single project and not the sum of many14:39
mordredyeah - because they chose a monorepo organization. comparing their results against a single repo in opendev wouldn't be an apples to apples comparison. the more appropriate comparison would be to compare their single repo vs, say, all of the interrelated repos of openstack. my throughput comment was more anecdotal than analysis - they said "thousands of changes per day" and I normally think of our load in terms of "thousands of jobs per hour". :) ... it's14:46
mordredan interesting paper14:46
mordred(although I'm totally still on first-caffeine of the day)14:46
corvusopendev's zuul has peaked at about 2000 jobs per hour at the limit of its donated cloud resources; there are even larger private zuul installations.14:47
*** jamesmcarthur has joined #zuul14:47
mordredyah14:48
*** jamesmcarthur has quit IRC14:52
*** jamesmcarthur has joined #zuul14:52
okamisNo doubt you guys win there in total jobs per hour, but they surely improved the efficiency with dependent gate.14:55
corvusokamis: perhaps, i don't have jobs-per-change-merged handy.  but as mordred pointed out, that approach is dependent on a ci system narrowly tailored to the software under test.  that approach can not be applied to the general case.14:57
avassI suppose to compare the systems you really need to check something like: (changes/resources)*confidence14:58
mordredthey definitely do some interesting things - thanks for sharing the paper. I think the main win is in what they call the conflict analyzer. there are also some other things they're able to do by being closely aligned with the underlying build system. as corvus mentions, I don't think those are as readily applicable to the general case. that said - I've long been a proponent of standardized tooling, and I think the uber paper here is a great explanation14:59
mordredof the power of having all of your teams use the same thing :)14:59
avassmordred: I agree that having a standardized system can be really nice when you need to optimize the system, but the dev in me says it wants to use the latest tooling for everything :)15:02
okamisThe page 11 should be of interest to you guys, it mentions that speculate all performs better than zuuls optimistic approach. Which is generic15:02
okamisI retract my previous statement, in some conditions it is better15:03
corvusavass: latest version of zuul for everyone :)  zuul's job content is continuously deployed, so devs do get the latest everything :)15:03
okamisany chance you guys can implement (changes/resources)*custom_confidence? Then you can hardcode it custom_confidence=115:07
corvusit's worth noting that zuul currently has three queue manager implementations, each describing a slightly different way of handling dependent changes; we can add more if we find a use for them.15:17
okamisWow, thats nice to hear, so the code architecture I guess makes it easy to add more cases :starstruck:15:18
corvusi'm not sure i'd use the word "easy", but we do try to avoid backing ourselves into a corner :)15:19
avassone that some people I know think is missing is queueing up a bunch of changes and testing the last one then merging all of them if that succeeds, or doing sort of a binary search with tests15:20
okamisJust wanna mention also that we had a lot of intermittency so we modified zuul to not rerun already passed jobs so rechecks would be much faster in check. Something you guys have considered?15:20
okamisavass that also sounds very nice :)15:21
*** hashar has quit IRC15:28
tobiashavass: I've also thought about adding a batch parameter to the dependent pipeline manager such that it saves some builds in between15:37
clarkbone thing I've wondered about batching is how do you decide which groups of changes to batch. Do you use a count, a timeout, both? I think some other systems rely on humans to manually construct the batch15:41
clarkbif you accept the downsides with bisectability and breaking rolling release models batching would likely be a good way to save resources. Just have to sort out the mechanism for collecting changes15:42
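A rough sketch of the batching idea avass and tobiash raise: test the whole batch once and merge everything if it passes, otherwise split the batch in half and recurse to find what can land. This is not an existing Zuul pipeline manager; `land_batch` and the `passes` callback are hypothetical, and how changes get collected into a batch (count, timeout, or both) is left open, as clarkb points out.

```python
from typing import Callable, List, Sequence


def land_batch(
    changes: Sequence[str],
    passes: Callable[[List[str]], bool],
    base: Sequence[str] = (),
) -> List[str]:
    """Return the changes from `changes` that can merge, in order.

    `passes(stack)` is assumed to run the tests with `stack` applied in
    order on top of the branch tip. `base` holds changes from earlier in
    the batch that have already been accepted.
    """
    if not changes:
        return []
    if passes(list(base) + list(changes)):
        return list(changes)      # one test run covers the whole group
    if len(changes) == 1:
        return []                 # a single failing change is rejected
    mid = len(changes) // 2
    first = land_batch(changes[:mid], passes, base)
    second = land_batch(changes[mid:], passes, list(base) + first)
    return first + second


if __name__ == "__main__":
    broken = {"C"}
    # Stand-in for a real test run: the stack fails if it contains a broken change.
    result = land_batch(["A", "B", "C", "D"], lambda stack: not broken & set(stack))
    print(result)
```

When every change is good this costs a single test run for the whole batch; when failures are common the recursion can cost more runs than testing each change on its own, which is the trade-off behind the bisectability concern raised above.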
*** ykarel has quit IRC15:43
clarkbalso looking at their chart for probability of conflicting changes makes me wonder if they also have communications problems between teams15:45
mordredclarkb: of course they do! they have teams :)15:45
clarkb(what level that manifests in I don't know could be in person, api stability/docs, etc)15:45
clarkbheh ya15:45
clarkbthe graph says 15 concurrent android changes have a 50% chance of conflict15:46
clarkbthat seems really really high (remember the CI system can discard trivial conflicts so they must only be looking at conflicts in functionality)15:46
*** ikhan has quit IRC15:48
avassclarkb: I think it's either on a timer or number of changes or a combination of both yeah15:48
okamisA conflict is when changes touch the same function I think; it's not necessarily bad, but there's a high risk after merging that it's not working as intended.15:48
clarkbokamis: right but lets say you have 15 people working together at the same company on the same software and the same function even. They should be communicating15:48
clarkb50% probability of a conflict there says to me that there isn't enough communication15:49
clarkband that doesn't necessarily mean meetings all day, but stronger api contracts, stability assertions in libraries, etc can go a long way ime15:50
avassthe problem is that what they "should" is often not what they will do :)15:50
okamisclarkb If you and I both need to update function FOO, then we should both be able to do it in different changes. The uber thing will just lower the confidence that it will merge correctly. And I think that is very reasonable15:51
*** jangutter_ has joined #zuul15:51
clarkbokamis: yes, but the zuul approach is also very reasonable imo. We communicate and stack the changes appropriately15:52
clarkbessentially we try to drive practices that drive the confidence up rather than accepting it will be terrible.15:52
*** jangutte_ has joined #zuul15:53
okamisclarkb Uber doesnt necessarily conflict that.15:54
*** jangutter has quit IRC15:54
okamisSay I'm doing feature A and modifying function Foo, and my colleague does feature B and modifies function Foo too. We can communicate that. But the conflict analyzer regardless will see that there is an increased risk of mistakes15:55
clarkbyup they are not exclusive, it just feels weird to optimize for the suboptimal. But maybe they are more efficient that way15:55
okamiswhy is it suboptimal?15:56
clarkbokamis: because the developers could collaborate and stack the work explicitly to avoid the conflicts15:56
*** jangutter_ has quit IRC15:56
okamisI dont understand that stack thing to avoid conflict, can you make up some scenario15:57
*** hashar has joined #zuul15:57
clarkbour tools tell us when we are conflicting and we talk about it. I did this just yesterday with the ansible shell type work and corvus' zk work15:57
okamiswhat tools informs you?15:57
clarkbokamis: gerrit15:57
okamisof merge conflicts?15:58
clarkbyes15:58
okamisI dont know what uber is using, but a function can be modified by 2 parties without having merge conflicts15:58
clarkbthis is true, also zuul won't enqueue subsequent changes that merge conflict15:59
clarkb(which avoids the reset cost entirely)15:59
clarkbokamis: but in theory you could run the conflict analyzer ahead of CI and force discussions to happen rather than waiting for when we want to merge stuff16:00
okamisI think the discussion is moving to improving something else now16:01
clarkbyes communication :)16:01
okamissure16:01
clarkbI was just thinking that a conflict rate of 50% for 15 in flight changes seems really really high16:02
clarkbbecause 15 people working on 15 changes that touch the same code should be able to coordinate16:02
clarkbI do think some conflict is likely unavoidable, we are human afterall. Just that specific published number seems high16:03
okamissame logic can also mean (im python developer) touching same module16:03
okamisI dont know if its high, I belonged to a team of 7 but we didnt develop 1 single thing, we were doing cicd so we touch many things16:05
clarkbnote in my evaluation I'm not considering a coordinated stack of changes to the same function as a conflict16:05
clarkbthey can still operate on the same code just in an explicit manner that acknowledges some sort of order (to resolve the conflict)16:06
okamisYeah, I dont know if its high, if its one product then it maybe is reasonable because it scales fast per developer. Its like the birthday paradox right16:07
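To put numbers on the birthday-paradox comparison: under the simplifying assumption that every pair of in-flight changes conflicts independently with the same probability, the 50% figure for 15 concurrent changes quoted above corresponds to a per-pair conflict probability of well under 1%, because 15 changes form 105 pairs. The independence assumption and the function names are illustrative only and are not taken from the paper.

```python
from math import comb


def prob_any_conflict(n_changes: int, pairwise_p: float) -> float:
    """Chance that at least one pair among n changes conflicts.

    Assumes each pair conflicts independently with probability `pairwise_p`.
    """
    pairs = comb(n_changes, 2)
    return 1.0 - (1.0 - pairwise_p) ** pairs


def pairwise_p_for(n_changes: int, target: float) -> float:
    """Per-pair probability that yields `target` overall conflict chance."""
    pairs = comb(n_changes, 2)
    return 1.0 - (1.0 - target) ** (1.0 / pairs)


if __name__ == "__main__":
    p = pairwise_p_for(15, 0.5)
    print(f"pairs: {comb(15, 2)}, per-pair probability: {p:.4f}")
```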
okamisbut how do you guys resolve it if it doesnt warn of merge conflicts? I rather just press a button and get an answer then speculate16:08
okamisthan*16:08
clarkbyup, I'm suggesting that the conflict detection might be of better use pushed earlier in the change lifecycle16:09
clarkbI think catching conflicts that are more complex than a text merge conflict would be great, but I'd want that in code review early on ?16:09
clarkb(and then the CI system can still query its status, similar to how merge conflicts are handled)16:09
okamisYeah, so you want gerrit to have a new feature right?16:10
okamisIf it was nice in gerrit interface it would be cool, changes that modifies same files as you or function16:10
clarkbright16:11
okamisyeah, that I think I would agree on, probably very possible as you can query the changes through gerrit api16:12
fungithere could even be an external solution, for example a periodic zuul job which runs an analysis of open changes for a project and then updates some information somewhere (via a change comment, findings tab, separate interface, whatever) to inform reviewers when different changes under development are touching the same areas16:18
fungithat could serve as a reminder for them to coordinate better with one another on those particular changes16:18
clarkbaha I think they explain it in another portion of the paper. "This is due to the fact that the build graph on the iOS monorepo is very deep (i.e., only a handful of leaf-level nodes) resulting in a large number of conflicts among changes. Consequently, the speculation graph has few independent changes that can execute and commit in parallel. Therefore, we expect substantially better improvements when16:19
clarkbusing the conflict analyzer for repositories that have a wider build graph."16:19
clarkbsounds like the repo itself induces conflicting changes16:19
clarkbwhereas something like openstack is probably comparatively wide considering we've explicitly split it up into multiple repos and so on16:19
fungi"multiple" being hundreds16:20
clarkboh yup and they talk about proactively communicating the conflicts to devs as well.16:21
clarkband suggest that monorepo change counts make that difficult16:22
*** openstackgerrit has joined #zuul16:22
openstackgerritMerged zuul/zuul master: Fix possible race in _getChange  https://review.opendev.org/c/zuul/zuul/+/75842416:22
clarkbI disagree where they suggest it encourages developers to rush their code to avoid conflicts. I think what we more frequently see is explicit stacking to resolve the conflicts then working to land the bottom up16:23
okamisim heading out, thx all16:28
fungiperhaps a similar model is the linux kernel, where there are several layers of commit aggregation which happens before branches are pulled to the main tree. people with familiarity with or working in a particular area of the repository are forced to coordinate their work, and the owners of those parts of the tree then have to coordinate with one another16:28
*** jfoufas1 has quit IRC16:29
*** okamis has quit IRC16:29
*** jamesmcarthur has quit IRC16:40
*** jamesmcarthur has joined #zuul16:41
*** jcapitao has quit IRC16:47
*** jamesmcarthur has quit IRC16:47
*** rpittau is now known as rpittau|afk17:05
*** hashar has quit IRC17:07
*** jamesmcarthur has joined #zuul17:11
*** saneax has quit IRC17:12
*** jamesmcarthur has quit IRC17:15
*** jamesmcarthur has joined #zuul17:15
*** bhavikdbavishi has joined #zuul17:23
*** jangutter has joined #zuul17:26
*** jangutte_ has quit IRC17:30
tobiashzuul-maint: it would be great if you could put the spec for enhancing regional executors onto your review list: https://review.opendev.org/c/zuul/zuul/+/66341317:51
*** jpena is now known as jpena|off18:02
*** nils has quit IRC18:06
*** bhavikdbavishi has quit IRC18:08
*** bhavikdbavishi has joined #zuul18:14
openstackgerritTobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw  https://review.opendev.org/c/zuul/zuul/+/66494918:15
openstackgerritTobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway  https://review.opendev.org/c/zuul/zuul/+/66496518:15
openstackgerritTobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/66495018:15
openstackgerritMerged zuul/zuul master: Make repo state buildset global  https://review.opendev.org/c/zuul/zuul/+/73860318:22
openstackgerritTobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/66495018:29
*** hamalq has joined #zuul18:30
openstackgerritTobias Henkel proposed zuul/zuul master: Make reporting asynchronous  https://review.opendev.org/c/zuul/zuul/+/69125318:34
tristanCtobiash: does the tls finger protocol work with the finger client? i wonder if dropping the finger protocol support would make things easier18:40
tobiashtristanC: you mean gnu finger?18:43
tristanCtobiash: yes, when the user connect to the stream with gnu finger18:44
tobiashI don't think gnu finger works with tls, but anyway this protocol is as easy as it can get18:45
tobiashConnect, send build id, stream data18:45
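The three steps tobiash lists can be sketched as a small plain-TCP client. The function name `stream_build_log`, the newline terminator after the build id, and the host and port passed in are assumptions for illustration; TLS is deliberately left out, matching the non-TLS finger case discussed here.

```python
import socket
from typing import Iterator


def stream_build_log(
    host: str, port: int, build_id: str, timeout: float = 10.0
) -> Iterator[bytes]:
    """Connect, send the build id, then yield log data until the peer closes.

    Assumes the server expects the id on a single newline-terminated line.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_id.encode("utf-8") + b"\n")
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # empty read: server closed the stream
                break
            yield chunk
```

A caller would loop over the generator and write each chunk to the terminal, which is essentially what a command-line finger client does with the same connection.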
clarkbcorrect, finger doesn't do ssl/tls at all aiui18:45
tobiashtristanC: if you want a tls capable terminal streaming client we'd probably need to add that to zuul-client18:46
tristanCtobiash: thus i wonder if we shouldn't drop the fingergw and only use the web service?18:48
tobiashwebsocket is way more complicated for streaming between executors and zuul-web18:48
tristanCi also noticed we don't actually need websocket and we could use plain http with EventSource18:49
tristanC(which would work using `curl -N`)18:51
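For reference, the wire format EventSource consumes (`text/event-stream`) is simple enough that a terminal client can decode it by hand, which is why `curl -N` is usable for it. The sketch below parses only `data:` fields from an iterable of lines; `iter_sse_data` is a hypothetical helper, and nothing here implies that zuul-web exposes such an endpoint.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream.

    Events end at a blank line, lines starting with ':' are comments
    (often used as keepalives), and several `data:` lines in one event
    are joined with newlines. Other fields (event, id, retry) are ignored.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]      # one optional space after the colon
        if field == "data":
            data.append(value)


if __name__ == "__main__":
    sample = ["data: first line\n", "\n", ": keepalive\n", "data: a\n", "data: b\n", "\n"]
    for payload in iter_sse_data(sample):
        print(repr(payload))
```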
tobiashgetting rid of websocket would be compelling for some users since that creates problems with some reverse proxies18:52
tristanCi think so too, it would be easier to manage18:53
*** hashar has joined #zuul18:53
tristanCwe can keep (tls) finger between zuul-web and executor, and only support plain http between user and zuul-web18:54
corvuswe can also keep non-tls finger for end-users.  that costs nothing.18:55
*** jamesmcarthur has quit IRC18:56
corvusthough if the "curl -N" experience is just as good or better, then i could see us dropping it for simplicity and improved UX18:57
corvusbut maybe let's see it in action first18:57
avasshad to look up evensource and it seems that it only allows for 6 open connection to the same url across all tabs for some reason19:03
fungieventsource? i'm not finding much about anything called evensource19:06
tristanCfungi: it's https://developer.mozilla.org/en-US/docs/Web/API/EventSource19:06
avassfungi: yeah eventsource, typo :)19:07
fungiyep, okay i did find that. thanks19:07
tristanCavass: 6 sounds acceptable?19:07
avasstristanC: some people don't close tabs19:08
tobiashand some people likely want to open 10 streams at once and cycle through them19:08
corvus"You are about to close 7 windows with 344 tabs."  actual message from this weekend.19:08
tobiashwow19:08
corvusi declared tab bankruptcy and started over19:09
*** hamalq has quit IRC19:09
avasswhat do you even do with that many tabs? :)19:09
corvusnothing19:09
*** hamalq has joined #zuul19:09
corvusjust open new ones19:09
tristanCtobiash: avass: we could use the same trick as in the status page, and stop the stream when the tab doesn't have the focus19:10
avassmaybe that's good enough19:12
clarkbcorvus: I suffer this ailment as well19:12
fungii've been able to brutally close out tabs i don't strictly need for the past year now. it's a challenge though19:13
fungii realized i'd been using open browser tabs as a very lazy to do list, and started to get better about putting them on my actual to do list instead19:13
clarkbI found a plugin that allowed me to set a tab limit. I hit the limit and discovered I couldn't open a new tab to adjust the plugin settings up more and rage uninstalled it once19:13
*** bhavikdbavishi has quit IRC19:24
corvustristanC: i think 6 would be ok.  there are 2 use cases i can think of that would change a little bit.  one is a user opens a console window for a long running job, switches away for a while, then comes back to check on it.  as long as the job is still running, there's no behavior change.  if the job has finished, then there will be a behavior change since the result won't be visible.  that could be19:44
corvusmitigated by supplying a nice link to the current log or build location.19:44
corvustristanC: the other case is a little more specialized:  sometimes i open ~10 builds at once to try to catch some random behavior in the act (say, some post-log upload failure).  that would be limited to 6 now, and i'd have to make sure all 6 windows are open and visible.  that's still probably sufficient for that use case, but it's something to be aware of.19:45
corvusi don't think we need to design for the second case, that's esoteric, and as long as we have a procedure for an expert user to follow, i think it's fine.  the first case is probably more typical and we should design a good ux for it.19:46
clarkbkeeping 6 windows open and visible doesn't play nice with my tiling window manager workflow. Not the end of the world though19:48
clarkb(it would clutter up the window space)19:48
avasscorvus: you could also work around it by running 6 firefox windows and 6 chrome windows :)19:49
clarkbavass: you'd need 6 firefox profiles I think19:49
clarkbas they all operate on the same set of resource limits within a profile iirc19:49
avassyeah but one browser would be limited to 6 connections, so 6 connections per browser19:50
avassyou can also increase that value in some settings and configure it to be tab local as well from what I understood19:51
avassyou can set 'network.http.max-persistent-connections-per-server' for firefox.19:54
fungifor streaming 10 different build consoles at the same time, i think i'd use terminals and finger (or the mentioned curl stream) anyway20:04
fungiwhich would not be subject to any javascript limitations20:05
avassyeah I'm more concerned about users being confused why their tab doesn't show any log output because they've maxed out the number of connections they can have at once20:11
avassspeaking of the build console, can we reduce the number of "Waiting on logger" messages that get sent? https://review.opendev.org/c/zuul/zuul/+/777887 :)20:14
openstackgerritAlbin Vass proposed zuul/zuul master: Reduce amount of 'Waiting on logger' messages sent  https://review.opendev.org/c/zuul/zuul/+/77788720:16
avassI got the wrong scope on that variable20:16
fungiclarkb: ^ i seem to remember you had an opinion on that too20:17
clarkboh yes I should review that one, thanks. I've got a number of changes to review after lunch today20:25
*** irclogbot_3 has quit IRC20:31
*** irclogbot_2 has joined #zuul20:34
*** sassyn has joined #zuul20:37
sassynhi all20:37
sassyngood evening.20:37
openstackgerritTobias Henkel proposed zuul/nodepool master: Log openstack requests  https://review.opendev.org/c/zuul/nodepool/+/77579720:43
mordredcorvus, tristanC: the limit of six is only when not using http/220:43
avasszbr: I've gotten my rust compilation down to 2min and the entire build from 30min to below 6min with the zuul-cache. So I'd call that a successful experiment20:43
sassynconsider this: I have a repo named RepoX on my Gerrit server. I commit a patch to RepoX. RepoX is an untrusted repo configured in Zuul and has a .zuul.yaml file. The .zuul.yaml runs JobX, and then JobY, JobZ and JobL. JobY, JobZ and JobL all depend on JobX. My question is as follows: I want JobY, for example, to run only if the20:43
sassynpatch changed the file foo in RepoX, while JobZ should run only if the patch changed the file bar in RepoX. I saw there is a job option called files, but I'm not sure how I should configure it. If I set files: foo on JobY, will this work?20:43
avasssassyn: hi!20:43
sassynavass: Hi dear friend how are you?20:44
openstackgerritTobias Henkel proposed zuul/nodepool master: Log openstack requests  https://review.opendev.org/c/zuul/nodepool/+/77579720:44
mordredso perhaps eventsource over http v1 for easy support of curl (curl does have an --http2 option, but now things are getting maybe more complex) - and maybe eventsource over http/2 for clients (like browsers) that support it?20:44
clarkbsassyn: if you set the files config on each of the jobs that should work20:45
sassynOK. Thank u20:47
tristanCmordred: is there http2 server library available for python yet?20:48
tristanCfrom what i understand you need a solid runtime to manage the different channels, and it seems like python asyncio implementation is not so popular20:52
tobiashcherrypy has an open issue for http2: https://github.com/cherrypy/cherrypy/issues/127620:53
mordredI read a thing about just having your apache/nginx do the http/2 upgrade for you20:55
mordredI don't know if that would be an improvement over the websocket module in apache/nginx - but maybe since it's all the same port the firewall issues would be lower20:56
mordredand - considering that the eventsource does work over http v1 - the http/2 proxy upgrade could just be an option for deployers that should otherwise be transparent?20:57
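A minimal sketch of the event stream framing discussed above, assuming the standard `text/event-stream` wire format (each payload line prefixed with `data: `, events terminated by a blank line). The function name `sse_event` and its `event_id` parameter are illustrative and are not taken from Zuul's code.

```python
from typing import Optional


def sse_event(data: str, event_id: Optional[int] = None) -> str:
    """Format one server-sent event as text/event-stream framing.

    Hypothetical helper: multi-line payloads become one "data:" field
    per line, and the event ends with a blank line.
    """
    lines = []
    if event_id is not None:
        # Optional "id:" field, which a client can echo back on reconnect.
        lines.append(f"id: {event_id}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

Because this is plain text over an ordinary HTTP response, it works over HTTP/1 without a websocket upgrade, which is the property the discussion relies on.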
openstackgerritTristan Cacqueray proposed zuul/zuul master: WIP: replace console-stream websocket with event stream  https://review.opendev.org/c/zuul/zuul/+/77958120:59
mordredtristanC: +54,-25521:01
tristanCmordred: i haven't tested how cherrypy handles long running generator, but if it does it efficiently, then we can drop that stream manager logic21:03
tristanCmordred: using apache/nginx sounds good to me if we are ok with the extra dependency21:04
corvusi'm not sure i'm okay with that21:05
corvusi'd rather us make it work with http121:05
corvusi think it's worth considering implementing the auto-shut-off on the client side over http1, and removing that restriction on http2.  i'm assuming the client can tell.21:06
tobiashit should work with http1, but as I've understood it's easy to set the reverse proxy into http2 mode to have both worlds21:06
mordredbut you'd still want to have the javascript client support the auto-shut-off so that it worked well if an admin did not deploy an http/2 proxy21:08
tobiashyea21:08
corvusyeah, i'd rather not add an extra deployment burden if we don't need to.  i mean, part of the rationale here is to make deployment easier.  so plain-old-http1 proxy should be a use case we support.  also supporting http2 and additionally eating more data seems fine.21:08
mordredalso - the upgrade notes of "you need to add this new proxy layer while you stop caring about the websocket proxy" is less friendly than " you can just stop caring about the websocket proxy"21:09
mordredcorvus: ++21:09
*** jamesmcarthur has joined #zuul21:13
tristanCit may not be possible to detect http2 on the client side, so perhaps we could keep both endpoints and have a toggle to enable one or the other21:13
*** tflink_ has joined #zuul21:13
*** tflink has quit IRC21:14
corvustristanC: admin toggle exposed in api/info?21:14
clarkbwill apache/nginx have similar connection count problems between the backend and themselves?21:14
clarkbor is that purely a browser thing?21:14
corvusi think that would be okay, but if it is very complicated, we might want to consider the idea that auto-limiting the number of simultaneous streams in both cases may be universally a good idea and friendly to the server operator :)21:15
avassclarkb: it's a browser thing21:15
tristanChttps://developer.mozilla.org/en-US/docs/Web/API/PerformanceResourceTiming/nextHopProtocol is the javascript thing that is still under draft21:16
tristanCcorvus: we don't even need it in api/info, the endpoint url is embedded in the status page content, so if the admin picks `event-stream`, then the status page would have the correct links21:17
openstackgerritJames E. Blair proposed zuul/nodepool master: Add python-logstash-async to container images  https://review.opendev.org/c/zuul/nodepool/+/77879321:17
corvustristanC: keep websocket and event-stream?21:17
corvusi assumed you were suggesting have the admin toggle between limiting the number of connections or leaving it unlimited.21:18
tristanCcorvus: well if you want http1 compatibility, then it seems like websocket is more appropriate21:18
*** tflink_ has quit IRC21:18
corvustristanC: oh, so you're not in favor of auto-shutoff?21:18
tobiashI'd be in favor of having event-stream xor websocket actually21:20
tristanCcorvus: not sure that is actually possible, e.g. a tab can't tell how many other tabs are opened21:20
corvustobiash: ++ i agree i don't think we should have 2 implementations21:20
corvustristanC: the suggestion was to have it stop when the user leaves the tab, like the status page.21:20
corvusso there would be a max of 1 console stream from a browser21:21
tristanCcorvus: and i don't know what is the actual limitation, so perhaps we have to close and re-open, resulting in a brand new stream21:21
corvus(but if the user switched back, we could resume, so it could be semi-transparent)21:21
corvustristanC: yeah, but if we re-open, we can still ask the remote side to skip the first (x) bytes21:21
corvusthe only downside i see is if the user switches back after the job is finished, then we can't resume; we have to just send them to either the log url or the build page.21:22
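A rough sketch of the "skip the first (x) bytes" resume idea, assuming the server side can replay the log as an iterable of byte chunks. The name `resume_stream` and its parameters are hypothetical and only illustrate the offset bookkeeping; they are not Zuul's actual streaming API.

```python
from typing import Iterable, Iterator


def resume_stream(chunks: Iterable[bytes], skip_bytes: int) -> Iterator[bytes]:
    """Yield log data from chunks, dropping the first skip_bytes bytes.

    Hypothetical helper: a client that already received skip_bytes bytes
    before pausing would pass that count when it re-opens the stream.
    """
    remaining = skip_bytes
    for chunk in chunks:
        if remaining >= len(chunk):
            # The client has already seen this entire chunk.
            remaining -= len(chunk)
            continue
        # Emit only the unseen tail of the chunk that straddles the offset.
        yield chunk[remaining:]
        remaining = 0
```

The client only needs to remember how many bytes it has rendered, so switching back to the tab can pick up where it left off as long as the job is still running.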
tristanCcorvus: well if we are comfortable with having only one stream active at a time, then that sounds possible21:22
*** tflink has joined #zuul21:23
corvustristanC: yeah, i at least think it's worth considering.  i'm not 100% sure, but based on thinking about it a little over lunch :), i think it's a trade-off i'd be willing to make.21:23
tobiashzuul-maint: this is a tiny executor lifecycle bugfix: https://review.opendev.org/c/zuul/zuul/+/777694/21:34
tobiashcurrently the executor doesn't exit on graceful shutdown if it has been paused or governed already21:35
tobiashand this is a small but I think important doc fix since sql reporters are deprecated now: https://review.opendev.org/c/zuul/zuul/+/777638/21:37
clarkbI'm looking at replacing our nodepool launchers and in the process have wondered if I can start a new launcher on a different host with the same hostname using the same provider config (but set max-servers on the old one to 0 and max-servers to valid number on the new one)21:37
clarkbit appears this won't work because the launcher id uses the hostname as part of the launcher id value21:37
clarkbit also uses the pid, but our pids are fairly static because we launch everything in containers21:38
clarkbI'm wondering if I can make that value random (uuid4) and if so do I need to keep it around like the builder does?21:38
clarkbanother option may be to try and use the fqdn?21:38
clarkbreading the code I think it is ok for different launchers to operate on the same provider as long as they have different launcher ids. Which would imply it is ok for a restarted launcher to register with a new id21:39
tobiashclarkb: we always do a rolling restart with an overlap with two launchers21:39
clarkbtobiash: using the same provider name but different hostnames?21:40
tobiashbut I guess ours get unique launcher ids due to pods21:40
clarkbya, but that gives more reinforcement to my reading it should be safe to just use a random value or at least a more unique value21:40
tobiashyes, same provider name21:40
clarkbsince that should be what you are getting out of your pods21:40
clarkb(I expect that it doesn't reuse names anyway)21:40
openstackgerritTobias Henkel proposed zuul/zuul master: Gracefully handle non-existent label on unlabel  https://review.opendev.org/c/zuul/zuul/+/77532921:44
clarkblooks like we don't recreate pool workers unless you delete the provider from config entirely (this isn't something that gets recreated if you change a provider setting)21:45
clarkbwhich means that if we set a uuid4 as part of the name instead of hostname-pid that should be stable for the entirety of the process life. Which is also all we can guarantee using a pid21:45
clarkbthe fact that containers give us a fairly stable pid doesn't mean that that value would be stable across process restarts21:46
clarkball that to say I think using uuid4 instead would be fine. maybe hostname-uuid so that it is easier to identify them but we'd get away from stable pids21:46
clarkbor as an alternative switch socket.gethostname with socket.getfqdn. Though I think this still suffers issues from containers because the containers could all have the same hostname and also the same pid due to pid namespacing21:49
clarkb(though in opendev's case we run a container per host so that would work for us)21:49
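A small sketch of the hostname-uuid idea described above, assuming the id only needs to be unique per process start and stable for the life of that process. The function `make_launcher_id` and its exact format are illustrative and are not the code in the nodepool change.

```python
import socket
import uuid
from typing import Optional


def make_launcher_id(hostname: Optional[str] = None) -> str:
    """Build a launcher id from the hostname plus a random uuid4.

    Hypothetical helper: the hostname keeps the id human-readable, and
    the uuid4 replaces the pid so that two launchers sharing a hostname
    (for example in containers) still get distinct ids.
    """
    host = hostname or socket.gethostname()
    return f"{host}-{uuid.uuid4().hex}"
```

Calling this once at startup and keeping the result gives an id that is stable for the process lifetime, which is the same guarantee a pid provides.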
tobiashtristanC: added a question to https://review.opendev.org/c/zuul/zuul/+/776287/21:56
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation.  https://review.opendev.org/c/zuul/zuul-jobs/+/76480821:56
avassadded example in the docs ^21:57
openstackgerritClark Boylan proposed zuul/nodepool master: Uniquely identify launchers  https://review.opendev.org/c/zuul/nodepool/+/77961622:01
clarkbI'll WIP that but pushed it up so others can see what I'm talking about more concretely (I don't think I need to update the tests to match the new format but I should and haven't done that yet)22:02
corvusclarkb: lgtm; while you're in there, it also might be nice to put the uuid first or last (if we do it last, then sorting them becomes useful)22:18
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation.  https://review.opendev.org/c/zuul/zuul-jobs/+/76480822:18
corvusi think the "random" part of that was in the middle just because we added the pool name onto the end later iirc22:18
clarkbcorvus: good idea22:19
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation.  https://review.opendev.org/c/zuul/zuul-jobs/+/76480822:24
openstackgerritMerged zuul/nodepool master: Add python-logstash-async to container images  https://review.opendev.org/c/zuul/nodepool/+/77879322:24
avassthere we go. I've added some better documentation and I think I've workout out all the quirks I've encountered while using the zuul-cache :)22:25
avassworked out*22:25
openstackgerritClark Boylan proposed zuul/nodepool master: Uniquely identify launchers  https://review.opendev.org/c/zuul/nodepool/+/77961622:29
clarkbcorvus: ^ I'll pull the WIP now I guess22:29
corvusclarkb: typo see comment22:33
clarkbtoo much copy pasta, thanks22:34
openstackgerritClark Boylan proposed zuul/nodepool master: Uniquely identify launchers  https://review.opendev.org/c/zuul/nodepool/+/77961622:35
corvusgotta admit, there was a moment there where i was like "is this a new python3.42 dict assignment syntax?"22:36
clarkbI just got lost in the difference between object to dict and dict to objcet :)22:37
clarkband yy p was easay22:37
clarkbalso I can't type ^ see above for evidence22:37
clarkbI think we can restart our launchers with that landed, ensure everything is happy, then try the easy mode rollout of new launchers22:37
corvusagain, i just assumed i was missing out on the lingo.  kk. ymmv. yy.22:38
clarkbcorvus: yy is vi(m) for yank the line and p for put the yank buffer22:38
clarkbI copied the object version assignment into the dict then got all sideways making pep8 line lengths happy22:38
corvusi love that the terms are backwards from emacs (you kill the line (into the kill ring), then yank it back into the buffer)22:39
clarkbalso I had a really early start today (for me) and my brain is probably not working so well as a result22:39
openstackgerritTristan Cacqueray proposed zuul/zuul master: WIP: replace console-stream websocket with event stream  https://review.opendev.org/c/zuul/zuul/+/77958122:41
*** hashar has quit IRC22:53
openstackgerritMerged zuul/zuul master: Catch exception when double unregistering merge jobs  https://review.opendev.org/c/zuul/zuul/+/77769423:16
*** jamesmcarthur has quit IRC23:17
*** jamesmcarthur has joined #zuul23:17
openstackgerritMerged zuul/zuul master: Include database requirements by default  https://review.opendev.org/c/zuul/zuul/+/77724523:24
openstackgerritMerged zuul/zuul master: ansible: ensure we can delete ansible files  https://review.opendev.org/c/zuul/zuul/+/77594323:24
mordredcorvus: I find, for that reason, that it's best to not try to associate vim commands with concepts. that said ...23:27
mordredclarkb: yy does not yank anything in vi for me - I use dd for that purpose?23:27
clarkbmordred: oh maybe yy is a vim ism then23:27
corvusi've done the dd before23:27
clarkbyy is like dd without removing the line23:27
mordredOH!23:27
mordredI didn't know that23:27
mordredlookie there23:27
* mordred learned a new vi today23:28
fungiyy is copy, dd is cut, essentially23:28
mordredto do copy, I always just cut, then paste, then paste23:29
mordred but clearly that's silly23:29
mordredooh - and a single y works in a visual block23:29
corvusthe good news is you already know how to use ed, the standard unix text editor!23:31
corvusy is also yank in ed23:32
corvusthough x is put, not p23:32