Monday, 2021-08-02

*** marios is now known as marios|ruck05:38
*** jpena|off is now known as jpena07:00
*** rpittau|afk is now known as rpittau07:15
*** sshnaidm_ is now known as sshnaidm08:20
opendevreviewSimon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events  https://review.opendev.org/c/zuul/zuul/+/80315408:45
opendevreviewSimon Westphahl proposed zuul/zuul master: Store build params separate from request  https://review.opendev.org/c/zuul/zuul/+/80315508:45
opendevreviewSimon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315608:45
opendevreviewSimon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events  https://review.opendev.org/c/zuul/zuul/+/80315408:53
opendevreviewSimon Westphahl proposed zuul/zuul master: Store build params separate from request  https://review.opendev.org/c/zuul/zuul/+/80315508:53
opendevreviewSimon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315608:53
opendevreviewSimon Westphahl proposed zuul/zuul master: Remove redundant get when removing a build request  https://review.opendev.org/c/zuul/zuul/+/80315809:03
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present  https://review.opendev.org/c/zuul/zuul-jobs/+/77065609:05
opendevreviewSimon Westphahl proposed zuul/zuul master: Vendor and use fixed Kazoo read/write lock  https://review.opendev.org/c/zuul/zuul/+/80316610:11
*** jcapitao is now known as jcapitao_lunch10:39
opendevreviewSimon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events  https://review.opendev.org/c/zuul/zuul/+/80315411:17
opendevreviewSimon Westphahl proposed zuul/zuul master: Store build params separate from request  https://review.opendev.org/c/zuul/zuul/+/80315511:17
opendevreviewSimon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315611:17
*** jpena is now known as jpena|lunch11:30
felixedelmhu: I've reviewed your UI patches for autohold and authentication https://review.opendev.org/q/topic:%22fffaff%22+(status:open%20OR%20status:merged). I hope the comments are helpful. Let me know if anything is unclear or if I should provide some examples or prototype something.12:17
*** jpena|lunch is now known as jpena12:28
opendevreviewMatthieu Huin proposed zuul/zuul master: [api][cors] Add CORS configuration  https://review.opendev.org/c/zuul/zuul/+/76769112:57
corvusswest: if i change test_sharded_tenant_trigger_events to read data = {'test': "x" * (sharding.NODE_BYTE_SIZE_LIMIT * 2)} the test still passes, and i can see that it made 3 sequence nodes for sharding13:02
swestcorvus: we only compress the content of a znode after we split it based on the original data size13:02
corvusswest: oh that seems backwards13:03
swestcorvus: from 797156 it sounded like we mainly wanted to reduce data size on the network and in zk13:06
swestif we want to do compression before sharding we probably need to do that in the BufferedShard[Reader|Writer] implementation13:06
swestbut that would still lead to the same issue with the transaction when we exceed the jute.maxbuffer limit13:08
corvusswest: yeah, the order isn't that important -- it probably mostly works out well enough with actual data13:10
corvusswest: the test fails with the expected error if we do `data = {'test': ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(sharding.NODE_BYTE_SIZE_LIMIT * 10))}`13:11
corvusswest: not that i doubted you -- but i did want to understand why our tests failed to catch it :{13:11
corvus * swest: not that i doubted you -- but i did want to understand why our tests failed to catch it :)13:12
swestsure13:12
corvusswest: thanks for the changes -- i'll take a look at those -- and maybe update the tests too :)   i know there are some tricky sequencing problems that the transaction solved13:13
swestcorvus: yea, for the build request I had to move the params to a separate data node similar to the event data. I don't think there should be any issues as long as we create the build/event data before we create the build request or the event in the queue13:16
swestonly downside of not using a transaction is, that we need to handle the cleanup of leaked build or event data in a periodic cleanup13:17
opendevreviewMatthieu Huin proposed zuul/zuul master: CORS: support regular expressions in allowed origins  https://review.opendev.org/c/zuul/zuul/+/80320913:40
*** jcapitao_lunch is now known as jcapitao13:40
corvusswest: i suspect we should go ahead and vendor in your fixed lock recipe -- do you want to do that?13:50
swestcorvus: 80316613:50
corvusswest: great thanks :)13:50
swestcorvus: I moved the vendored watchers and lock recipe to zuul/zk/vendor13:51
swestotherwise we'd have lock.py and locks.py in the same path which might get confusing13:52
opendevreviewMatthieu Huin proposed zuul/zuul master: Add authentication-realm attribute to tenants  https://review.opendev.org/c/zuul/zuul/+/73558613:55
opendevreviewBenjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool  https://review.opendev.org/c/zuul/zuul/+/78868013:59
corvusfelixedel: did you look at https://review.opendev.org/803117 ?14:04
felixedelcorvus: Now I did14:08
opendevreviewJames E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/80229914:17
corvusfelixedel: thanks!  i squashed it.  now i think we'll need to review swest's changes and restack the merger api on top of them.14:17
corvusswest: changes lgtm so far but we should add the orphaned data cleanup first.14:37
corvusswest: i can work on that today if you haven't started on it14:39
swestcorvus: I won't have time to do that today but can look into that tomorrow if you are busy with something else14:40
corvusswest: i should be able to take care of it14:40
*** jpena is now known as jpena|off15:25
*** rpittau is now known as rpittau|afk16:24
*** sshnaidm is now known as sshnaidm|afk16:25
*** marios|ruck is now known as marios|out16:26
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data  https://review.opendev.org/c/zuul/zuul/+/80324518:20
opendevreviewJames E. Blair proposed zuul/zuul master: Store build params separate from request  https://review.opendev.org/c/zuul/zuul/+/80315520:50
opendevreviewJames E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters  https://review.opendev.org/c/zuul/zuul/+/80326220:50
*** dviroel is now known as dviroel|out20:53
opendevreviewJames E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters  https://review.opendev.org/c/zuul/zuul/+/80326221:07
opendevreviewJames E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler  https://review.opendev.org/c/zuul/zuul/+/80326421:07
opendevreviewJames E. Blair proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315621:08
opendevreviewJames E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request  https://review.opendev.org/c/zuul/zuul/+/80315821:08
opendevreviewJames E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters  https://review.opendev.org/c/zuul/zuul/+/80326221:30
opendevreviewJames E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler  https://review.opendev.org/c/zuul/zuul/+/80326421:30
opendevreviewJames E. Blair proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315621:30
opendevreviewJames E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request  https://review.opendev.org/c/zuul/zuul/+/80315821:30
corvusclarkb, tobiash, swest: i've added to and restacked swest's changes from earlier.  the gist is that my "transaction" fix to the payload size issues observed with very large repos was flawed, and so was the test of it.  the new stack fixes the issue and the test.  would you please review this with a relatively high priority (since it addresses issues with current master/release)?  https://review.opendev.org/q/status:open+hashtag:sos-fix21:47
clarkbcorvus: yes I can start to look at that once I've gotten the opendev meeting agenda out.21:47
corvus(i cheated and also added the kazoo lock vendor/fix in there, mostly because i'm worried about divergence if we add new read/write locks)21:48
corvusclarkb: thanks!21:48
opendevreviewlotorev vitaly proposed zuul/zuul master: doc: Specify job.required-projects is overriden  https://review.opendev.org/c/zuul/zuul/+/80326922:07
clarkbcorvus: in https://review.opendev.org/c/zuul/zuul/+/803154 what ensures we don't read a half written set of data? is this a keep reviewing the whole stack situation?22:17
corvusclarkb: no that should be in place there -- basically it's the "write the sharded data first, then write the event last" that makes that work22:22
clarkbI guess we write the actual event after the side channel22:22
clarkbyup22:22
corvusoh shoot -- is that going to race with the cleanup i wrote in the next patch?22:23
clarkbpossibly22:23
clarkbI'm only just now reading that change so not sure yet :)22:23
clarkbcorvus: do you need to add a sequence id to the shard data entries? then if the last queue entry sequence id is > the shard data we know we can clean the shard data?22:26
clarkbthough that may not be true unless you also prevent writes to the queue in parallel22:26
corvusi think parallel writes are expected22:28
corvusso we could have A start, B start, B finish, A finish22:28
corvuswe also don't know the sequence id when we write the shard data22:29
corvusi think we may have to resort to timestamps?22:29
corvusthat's so hard to test though :/22:30
clarkbya if timestamp of shard data is at least 5 minutes old and doesn't have a corresponding event it is highly likely to have leaked?22:30
clarkba delta that large between writing shard data and the event itself should basically never happen22:31
corvusyep22:31
clarkbfwiw I think this is something that kafka struggled with22:31
clarkbthey basically had to keep all the real data outside of zookeeper then index into it via zk? I want to say this is one reason they ended up just absorbing the pacsos type db into their own software22:32
corvusyeah, it's a problem :)22:34
clarkbcorvus: reading the cleanup code it seems that it only cleans up shard data for events that are in the queue?22:35
clarkbwhich may mean we are not susceptible to problems as long as we lock around queue processing (I think we do)22:36
clarkbhwoever, it does mean that if we clean up an event but not its shard data we'd still leak that shard data?22:36
clarkbcorvus: also left another thing on https://review.opendev.org/c/zuul/zuul/+/803245 which may end up related to ^22:38
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data  https://review.opendev.org/c/zuul/zuul/+/80324522:40
corvusclarkb: re the first thing -- it should only clean up shard data that isn't referenced by an event22:41
corvusie: make candidate list of shard data; remove any shard data from that list if it's referenced by a current event22:42
clarkbcorvus: oh I see the remove call there is against the set22:43
clarkbI totally read that as remove from zk22:43
clarkbok ya22:43
corvuscool -- the next ps replaced that double event list with the timestamp check22:43
corvusi'll fixup the comments too22:43
clarkbyup looking at the new ps now22:43
clarkbya I think this new patchset will work. It builds a candidate list based on age. If we still see the event for the a candidate it gets removed from the list. Then we return the list of candidates that have no corresponding event22:45
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data  https://review.opendev.org/c/zuul/zuul/+/80324522:45
corvusclarkb: bonus: filtering by age will reduce the candidate set to 0 in most cases22:45
corvusthat ps is just the comment update22:46
clarkbcorvus: should you add that check in _findLostSideChannelData() ? if data_nodes: #get events and iterate over events22:47
clarkbcorvus: that way we reduce the number of zk queries necessary to scan for leaks22:47
corvusclarkb: oh you mean short circuit if candidate list is empty?22:49
clarkbyes22:50
clarkbleft a comment with more info on that22:50
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data  https://review.opendev.org/c/zuul/zuul/+/80324522:51
corvusdone22:51
clarkblgtm22:52
clarkbcorvus: are you going to rebase the children of ^ if so I'll wait for that before I start reviewing22:52
corvusclarkb: yeah, gimme 5 mins to finish up the same update for the next pair of changes22:53
clarkbok22:53
opendevreviewJames E. Blair proposed zuul/zuul master: Store build params separate from request  https://review.opendev.org/c/zuul/zuul/+/80315522:57
opendevreviewJames E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters  https://review.opendev.org/c/zuul/zuul/+/80326222:57
corvusclarkb: okay that's the next pair22:57
corvusftr, i really would like to DRY this, but every one of these has just enough subtle differences in paths or behavior that make that impractical22:58
opendevreviewJames E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler  https://review.opendev.org/c/zuul/zuul/+/80326422:59
opendevreviewJames E. Blair proposed zuul/zuul master: Remove transaction support from sharding API  https://review.opendev.org/c/zuul/zuul/+/80315622:59
opendevreviewJames E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request  https://review.opendev.org/c/zuul/zuul/+/80315822:59
clarkbcorvus: why use sets and iterators and dicts in the cleanup and find paths differently23:09
clarkblooking at the code I think https://review.opendev.org/c/zuul/zuul/+/803262/4/zuul/zk/executor.py _findLostParams might be able to use a set like _findLostSideChannelData() ? might be nice to be consistent there23:09
clarkbanyway I'll leave proper review comments23:10
corvusclarkb: because we deduplicate by build uuid, but we return paths23:10
clarkboh I see23:10
corvuswe need a path at step 1 to get zstat, then an id at step 2 to de-duplicate, then a path at step 3 to return to the caller.  i figure since we need to keep both bits of data past step 2, a dict made sense to pair them23:11
clarkbya and unlike the side channel data the ids are meaningful here23:12
corvusexactyl. and that's one of those subtle differences that means this can't be a straight copy/reuse :/23:12
clarkbsmall note on https://review.opendev.org/c/zuul/zuul/+/803264 as a sanity check23:19
clarkbthe stack lgtm now23:21
corvusclarkb: yep, good to be aware of that23:22

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!