Thursday, 2021-04-22

ianw2021-04-21 06:35:48,993 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:01
ianwthis feels like what would have deleted "/nodepool/images/fedora-32/builds/0000057968/providers/ovh-bhs1/images/0000000001"00:02
clarkbcorvus: that first change that I had previously reviewed lgtm. Looks like the only diff is handling use of item.item_ahead when it has been unset (but carried forward by old_item_ahead)00:13
clarkbI don't think I'll get to the other today, but will try to do it early tomorrow00:13
ianwnb01 : nodepool-builder.log.2021-04-20_23:2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs100:14
corvusclarkb: no problem, thanks!00:14
ianwnb02 : nodepool-builder.log.2021-04-21_01:2021-04-21 06:35:46,370 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs100:14
ianwit seems to me they both decided to delete it?00:15
corvusianw: yeah, that's probably the smoking gun; it's possible one of them created a lock immediately after the other one deleted it and caused the znode to stick around00:15
ianwcorvus: hrm, because of the recursive delete?  it can delete the lock file before removing the node, and the other thinks it got the lock?00:17
corvusianw: yeah -- i'm not looking at the code right now, so speaking only in generalities from memory :)00:17
fungii remember when shutil.rmtree had a similar bug00:17
ianwhttps://opendev.org/zuul/nodepool/src/branch/master/nodepool/zk.py#L1639 would be the function in question i guess00:18
ianwdeleteUpload()00:19
ianwi guess we need to get the lock, delete the node, delete the lock.  i'll have to do some reading to get my head around it00:23
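A minimal sketch of the ordering ianw describes, using kazoo (the ZooKeeper client library nodepool builds on); the znode layout and lock path here are illustrative, not nodepool's actual schema:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

# Illustrative path, mirroring the znode quoted above.
upload = ('/nodepool/images/fedora-32/builds/0000057968'
          '/providers/ovh-bhs1/images/0000000001')
lock = zk.Lock(upload + '/lock', 'cleanup-worker')

# The racy version is a bare zk.delete(upload, recursive=True): that can
# remove the lock child first, so a second cleanup worker "acquires" a lock
# under a znode that is about to vanish and recreates part of the tree.
if lock.acquire(blocking=False):
    try:
        # Delete everything except the lock we are holding.
        for child in zk.get_children(upload):
            if child != 'lock':
                zk.delete('%s/%s' % (upload, child), recursive=True)
    finally:
        lock.release()
    # Only now remove the (effectively empty) parent and the released lock.
    zk.delete(upload, recursive=True)
```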
ianwin the meantime, i guess i should just manually delete the node on the production system to stop all the spewing of errors00:24
fungithat's how we've handled it in the past, yeah00:25
corvusthe lock is under the node that's deleted00:26
corvusthere's also the interaction with builds and uploads here -- since that build from earlier had an upload in it still00:32
ianw2021-04-21 06:35:52,698 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:55
ianw2021-04-21 06:35:48,993 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:55
ianwagain nb01 & nb0200:55
ianwthe next thing both should do after putting out that message is00:56
ianw  with self._zk.imageUploadNumberLock(upload,00:56
ianw                                      blocking=False):00:56
ianwfrom the logs, neither hit https://opendev.org/zuul/nodepool/src/branch/master/nodepool/builder.py#L35200:58
*** rlandy|rover|bbl has quit IRC01:53
*** evrardjp has quit IRC02:36
*** evrardjp has joined #zuul02:36
*** bhavikdbavishi has joined #zuul04:12
*** bhavikdbavishi1 has joined #zuul04:15
*** bhavikdbavishi has quit IRC04:17
*** bhavikdbavishi1 is now known as bhavikdbavishi04:17
*** ykarel has joined #zuul04:45
*** saneax has joined #zuul04:56
*** zbr9 has joined #zuul05:04
*** zbr has quit IRC05:06
*** zbr9 is now known as zbr05:06
*** saneax has quit IRC05:10
*** saneax has joined #zuul05:36
*** bhavikdbavishi has quit IRC06:07
*** ykarel_ has joined #zuul06:08
*** ykarel has quit IRC06:11
*** ykarel__ has joined #zuul06:11
*** ykarel_ has quit IRC06:14
*** ykarel__ is now known as ykarel06:17
swestcorvus: re the time for loading the keys: can you see if the problem is loading the keys from Zookeeper or writing them out to the backup keystore?06:27
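One way to split that question apart is to time the two phases separately; a rough sketch, where loadKeys and writeKeys are hypothetical stand-ins for whatever the real keystore calls are:

```python
import time

start = time.monotonic()
keys = zk_keystore.loadKeys()      # hypothetical: read every key from ZooKeeper
loaded = time.monotonic()
backup_keystore.writeKeys(keys)    # hypothetical: write the backup keystore
done = time.monotonic()

print('load from ZK: %.1fs  write backup: %.1fs'
      % (loaded - start, done - loaded))
```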
*** ykarel_ has joined #zuul06:32
*** ykarel has quit IRC06:35
*** saneax has quit IRC06:40
*** reiterative has quit IRC06:49
*** reiterative has joined #zuul06:50
*** jpena|off is now known as jpena06:50
*** jcapitao has joined #zuul07:04
openstackgerritIan Wienand proposed zuul/nodepool master: Avoid race deleting images  https://review.opendev.org/c/zuul/nodepool/+/78747507:07
ianwcorvus: ^ that's about my best guess of what's going on with the build delete race ...07:07
*** nils has joined #zuul07:17
*** bhavikdbavishi has joined #zuul07:19
*** okamis has joined #zuul07:34
*** rpittau|afk is now known as rpittau07:34
*** okamis has left #zuul07:37
*** jcapitao has quit IRC07:39
*** jcapitao has joined #zuul07:39
*** snapiri has joined #zuul07:46
*** tosky has joined #zuul07:49
*** sean-k-mooney has joined #zuul08:28
sean-k-mooneyquick question: can the gerrit plugin for zuul work with only http access, or does it also need ssh? i know it can use the http api for some things but am not sure if it can work with only that access08:30
swestcorvus: I deployed the latest master in our int environment and I can confirm your numbers. I also tried writing the keys to the backup store only if they don't already exist there, but that did not seem to speed things up. So it's probably the time for loading the keys from Zookeeper08:59
*** saneax has joined #zuul09:11
*** ykarel_ has quit IRC09:18
lyrHi there. I migrated jobs from one zuul to another; one is failing with an "Executing local code is prohibited" error. Is there an option to allow that?09:32
avasslyr: local code execution is only allowed in trusted contexts. is that new zuul instance also a newer version? because there was a bug that used to allow that anyway09:34
avasslyr: it was fixed in 3.19.1: https://zuul-ci.org/docs/zuul/reference/releasenotes.html#security-issues09:35
lyravass: yes, I'm migrating from software factory 3.3 to 3.5, there's definitely a zuul version bump involved09:36
lyrI'm gonna rewrite the piece of a job, I don't like it anyway09:36
lyrs/the/this09:37
avasslyr: good :)09:38
lyrWhile I'm around, there's a handy zuul-client CLI. Is there a nodepool equivalent ? pip install nodepool seems to install the whole nodepool thing09:39
avassno, not yet at least09:39
lyrAnd is there a way to specify the default tenant in the ~/.zuul.conf file for zuul-client usage ?09:40
avassI have no clue actually, but that would be nice to have :)09:40
mhulyr: no, but that'd be a nice improvement09:40
lyrok, thanks for the intel09:41
mhuzuul-client comes from the admin CLI and thus kept the args hierarchy for backward compatibility. In the admin CLI it makes sense to pass tenants as subcommand arguments since the admin can manage all tenants.09:41
lyrOn github I'd make a feature request issue for this, but I'm not familiar with gerrit. Since we agree this option would be useful, can you submit the idea the way it should be?09:43
mhulyr: since https://review.opendev.org/c/zuul/zuul-client/+/765203 though, you can omit the tenant arg if you're using a tenant-scoped api url09:44
mhufor example zuul.openstack.org is tenant-scoped to openstack, see https://zuul.openstack.org/api/info09:45
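The difference is visible in the info endpoint itself; a small sketch, assuming the response shape of /api/info (a tenant name is reported only on white-labeled deployments):

```python
import requests

info = requests.get('https://zuul.openstack.org/api/info').json()

# White-labeled (tenant-scoped) deployments report their tenant here;
# a multi-tenant root API does not.
tenant = info.get('info', {}).get('tenant')
if tenant:
    print('tenant-scoped API for tenant:', tenant)  # e.g. 'openstack'
else:
    print('root API; pass --tenant explicitly')
```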
mhulyr, I'll pop a quick patch and add you as reviewer, what's your gerrit handle?09:46
lyrmhu: I don't have one, I'll create that09:48
lyrTried this, with a "weird" error, didn't check the required versions, though I installed zuul-client this week https://paste.garrigue.re/?e70913ffe35f2e95#DwDCyeM6EjrVTYxyUJELf8ALnJsbC2HLJUsK9ucvyYJG09:48
mhuthe first url is incorrect, it's a web UI one09:49
mhuit's a sf deployment so my guess is that it's single tenant?09:50
lyryes09:52
lyrmhu: my handle should be rgarrigue09:54
lyrlogin went through ubuntu one, ended up with an error, I'm connected but there was no username, which I updated to rgarrigue... should be ok09:54
*** bhavikdbavishi has quit IRC10:03
openstackgerritSimon Westphahl proposed zuul/zuul master: Cache secret/SSH keys from Zookeeper  https://review.opendev.org/c/zuul/zuul/+/78752010:04
lyrmhu: what's the tenant scoped api URL ? Tried https://sf/zuul/api/tenant/local, zuul-client says 404 https://sf/zuul/api/tenant/local/api/info10:12
mhulyr, for zuul itself, see https://zuul-ci.org/docs/zuul/howtos/installation.html#web-deployment-options10:14
mhuit's white labelling10:14
mhufor SF, there's a doc for multi tenancy as well10:15
mhuIIRC if you're not using multi tenancy on SF there's no white labeling for the local tenant, so you have to use the root API url10:18
lyrhum10:20
lyrok, I'll write the "--tenant local" for as long as it takes :D10:20
lyr(I've a whole infra to manage and I already spent way too much time taking back control of this)10:21
*** jcapitao is now known as jcapitao_lunch11:03
swestcorvus: in 787520 I tried caching the keys. Since there is no way to rotate keys atm this looked like the best way to me. This doesn't solve the delay on startup but it helps with reconfigurations.11:06
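A minimal sketch of the caching idea in 787520; the class and method names are illustrative, not Zuul's actual keystore API:

```python
class CachingKeyStore:
    """Memoize per-project keys so reconfigurations skip ZooKeeper reads."""

    def __init__(self, backend):
        self.backend = backend  # e.g. the ZooKeeper-backed keystore
        self._cache = {}        # project canonical name -> key pair

    def getProjectKeys(self, project_name):
        if project_name not in self._cache:
            # Keys cannot currently be rotated, so a cache entry can
            # safely live for the scheduler's lifetime.
            self._cache[project_name] = self.backend.getProjectKeys(
                project_name)
        return self._cache[project_name]
```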
*** jfoufas1 has joined #zuul11:23
openstackgerritSimon Westphahl proposed zuul/zuul master: Cache secret/SSH keys from Zookeeper  https://review.opendev.org/c/zuul/zuul/+/78752011:26
*** ykarel has joined #zuul11:29
*** bhavikdbavishi has joined #zuul11:32
*** jpena is now known as jpena|lunch11:34
*** rlandy has joined #zuul11:47
*** rlandy is now known as rlandy|rover11:50
*** ykarel has quit IRC11:55
*** jcapitao_lunch is now known as jcapitao11:57
*** ykarel has joined #zuul11:57
*** bhavikdbavishi1 has joined #zuul11:58
*** ykarel has quit IRC12:00
*** rlandy|rover is now known as rlandy|rover|cal12:00
*** rlandy|rover|cal is now known as rlandy|rvr|call12:00
*** ykarel has joined #zuul12:00
*** bhavikdbavishi has quit IRC12:01
*** bhavikdbavishi1 is now known as bhavikdbavishi12:01
*** rlandy|rvr|call is now known as rlandy|rover12:18
*** jpena|lunch is now known as jpena12:35
*** hamalq has joined #zuul12:39
*** saneax has quit IRC13:04
*** nils has quit IRC13:04
*** erbarr has quit IRC13:04
*** saneax has joined #zuul13:05
*** nils has joined #zuul13:05
*** erbarr has joined #zuul13:05
*** bhavikdbavishi has quit IRC13:46
*** saneax has quit IRC14:52
*** harrymichal has joined #zuul15:24
*** harrymichal has quit IRC15:24
*** harrymichal has joined #zuul15:25
*** avass has quit IRC15:26
*** avass has joined #zuul15:27
*** ykarel is now known as ykarel|away15:36
*** ykarel|away has quit IRC15:41
*** bhavikdbavishi has joined #zuul15:49
*** jcapitao has quit IRC15:56
*** bhavikdbavishi1 has joined #zuul15:56
*** bhavikdbavishi has quit IRC15:58
*** bhavikdbavishi1 is now known as bhavikdbavishi15:58
*** jpena is now known as jpena|away16:00
*** hamalq has quit IRC16:06
*** sshnaidm is now known as sshnaidm|afk16:28
avasscorvus: before merging https://review.opendev.org/c/zuul/zuul/+/787451 I have a use case for unique workspace checkout you might want to take into consideration:16:44
avasscaching git repos in images16:44
avassotherwise there would be overlapping projects causing problems, but after they're converted to bare repositories they would no longer overlap and instead have the same structure as gerrit has internally16:46
corvusavass: there are 2 parts to this, let's make sure we're talking about the same part -- you're talking specifically about building an image with cached repos (not using a previously built image with cached repos on it).  and you want to have zuul check out all of those repos into a workspace as part of building the image?16:47
avassyeah16:48
corvusavass: are you sure you want to ask zuul to prepare all of those repos, as opposed to doing new clones?16:49
corvusi guess if you're building the image in a zuul job, that's probably the best way to use an existing repo cache16:49
avasscorvus: not completely convinced of that yet no16:50
avassbut yeah that's the current idea16:50
avassbut caching the larger repositories would at least help a lot16:51
corvusokay, so running with that idea -- you'd like to have the option to specify the 'unique' scheme so that you can use that as the repo cache in the image.  then, presumably, update prepare-workspace-git (or whatever) to use the unique scheme on the source side when it's copying stuff into the workspace.  that means that it would need to support multiple schemes, and know how to create 'unique' names given a16:52
corvuszuul vars project entry (which is doable, it's pretty simple)16:52
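For illustration, a collision-free name can be derived from a zuul.projects entry by url-quoting the canonical name; a sketch of the idea (the actual 'unique' scheme may differ in detail):

```python
import urllib.parse

def unique_src_dir(project):
    # 'project' is shaped like an entry in the zuul.projects job variable.
    return 'src/' + urllib.parse.quote_plus(project['canonical_name'])

print(unique_src_dir({'canonical_name': 'opendev.org/zuul/zuul'}))
# -> src/opendev.org%2Fzuul%2Fzuul  (flat, so deep-hierarchy collisions go away)
```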
avassnot exactly, just use the unique scheme for the workspace on the node and then cache the repos as bare repositories using their canonical names16:52
avassso there would only be a need for the unique scheme16:53
corvusi see; using bare repos avoids the collision issue; does prep-workspace-git expect bare repos or no?16:54
avassI think it does let me check16:54
avasslooks like it doesn't but that shouldn't be hard to configure16:55
corvusavass: i think strictly speaking as long as you're using a deep hierarchy, you could still technically have collisions even with bare repos16:56
avassbut you'd need to have two projects called <something>.git and <something> in that case16:58
corvusfirst, bare repos don't need to have .git suffixes16:58
corvusbut if you chose to give them .git suffixes, then yeah, you can still have a collision like that ^16:58
avassah yeah my mistake16:59
corvus(but you don't need a bare repo to add a .git suffix; really it's two related but ultimately different issues)16:59
corvusi think we can generalize this as we can't avoid collisions with deep hierarchies, but we can make little changes to adjust the likelihood of them :)17:00
avassIn that case caching the repositories with their unique names should still be enough, since we can derive those from the canonical names, which we should always have17:00
avasswhich means we still just need to support one scheme17:01
corvusyeah, i think if we have a use case of "i have conflicting repos with multiple schemes and i want the executor to splat *all* of them out in one job" then the only way to guarantee that is with the 'unique' scheme, or one like it.17:01
corvusand then have prepare-workspace-git understand that a repo cache might be in a unique scheme.17:03
avassI'm not sure if we're gonna need that but I'd rather make sure the option to do it exists :)17:06
corvuslet's mull that over for a bit; but if we like that, i think it's pretty easy to open up 'unique' as an additional user-facing scheme (with documentation that says "Do not use this." :)  then if we want prepare-workspace-git to handle multiple cache schemes, we just need to update it to deal with that.  it should have all the info it needs (but we can give it an assist with more zuul job vars if we need to)17:08
corvusi think the main thing right now is: 1) i don't see that doing option #2 from the email and supporting arbitrary mappings helps us here; 2) nothing so far blocks us from supporting unique caches in the future.17:09
avassagreed17:11
corvusavass: note that as part of 786744 zuul should detect repo collisions and should refuse to run the job if they collide; so if you write that job right now, it will probably stop working when 786744 lands.  but you could exclude just the problematic repo from the cache and it would work (at the cost of cloning that repo whenever it appears in jobs)17:16
corvus(i mean, granted, it's not really working for *other* reasons right now; but just wanted to point that out)17:17
avassyup that's the current idea because I expect them to cause problems anyway. but looking at the repo sizes I don't think we would gain much by caching them either17:17
corvusfrom prep-workspace-git:  git clone --bare {{ cached_repos_root }}/{{ zj_project.0.canonical_name }} {{ ansible_user_dir }}/{{ zj_project.0.src_dir }}/.git17:19
corvusi have no idea why we chose to use canonical_name for the first part and src_dir for the second part... oh i think it's because src_dir also has 'src/' on it?  anyway, that stroke of luck i think actually means that prep-workspace-git will already work with a repo cache in golang scheme mapping to a workspace in flat scheme.17:20
corvusneat :)17:20
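Expanding that Jinja by hand for one project shows why the two schemes line up; the values here are illustrative:

```python
cached_repos_root = '/opt/git'                  # golang-scheme cache on the image
project = {
    'canonical_name': 'opendev.org/zuul/zuul',  # keys the cache path
    'src_dir': 'src/zuul',                      # flat-scheme workspace dir
}
print('git clone --bare {}/{} ~/{}/.git'.format(
    cached_repos_root, project['canonical_name'], project['src_dir']))
# git clone --bare /opt/git/opendev.org/zuul/zuul ~/src/zuul/.git
```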
avasscorvus: does that throw an error back to gerrit? if so that's going to help to avoid confused users :)17:21
corvusavass: i don't think so, i think it's going to be a thoroughly confusing situation and we should improve it.17:21
corvusthe existing error handling around merge errors is weird and complicated; it may not be straightforward.  but worth doing.17:23
avassalso an idea from working with rust for a while: it could help even more if that error referenced the feature/upgrade note and included a way to avoid it by supplying workspace-scheme: flat17:24
*** rpittau is now known as rpittau|afk17:24
avassthough I suppose the feature note explains that17:24
*** bhavikdbavishi1 has joined #zuul17:25
*** rlandy|rover is now known as rlandy|rover|mtg17:27
*** jfoufas1 has quit IRC17:28
*** bhavikdbavishi has quit IRC17:29
*** bhavikdbavishi1 is now known as bhavikdbavishi17:29
avasshmm looks like it's not possible to link to specific notes :/17:30
fungiwe could probably implement that in reno fairly easily... they do already have unique id strings, we'd just need to expose anchors for them17:31
fungioh, though a release note may add multiple entries in different sections and they get collated, so maybe not as easy as i first thought17:32
corvusavass: if we're going to link an error message to anything, let's link it to docs17:34
corvusrelease notes are for old users; new users can hit errors too17:34
avassthat works too17:34
fungiyeah, that i completely agree with. documentation is for documenting things (including errors)17:34
fungirelease notes are something you should only ever need to look at when upgrading17:35
avassit's just to avoid having an error message saying "ERROR: BAD CODE" and instead allow for having a longer explanation for why and how to fix it. it would also help the user get acquainted with the documentation :)17:36
corvusto be fair, when we pass error messages to gerrit, they are pretty rarely terse :)17:37
avasstrue :)17:37
corvusbut i agree they can be verbose and link to docs17:38
*** ianychoi_ has joined #zuul17:38
corvushowever, i'm not sure we've done the last of our doc reorgs, so it might be worth thinking about what the story is when we move them again17:39
corvus(generate a redirect map like last time?  or come up with a system for explicit error redirects?)17:39
*** webknjaz_ has joined #zuul17:41
*** ianychoi__ has quit IRC17:48
*** sduthil has quit IRC17:48
*** xarragon has quit IRC17:49
*** webknjaz has quit IRC17:49
*** fbo has quit IRC17:49
*** webknjaz_ is now known as webknjaz17:49
*** fbo has joined #zuul17:54
*** zettabyte has joined #zuul17:58
*** jpena|away is now known as jpena|off18:07
*** rlandy|rover|mtg is now known as rlandy|rover18:07
*** zettabyte has quit IRC18:47
*** ajitha has joined #zuul19:09
*** bhavikdbavishi has quit IRC19:16
*** hamalq has joined #zuul20:10
*** ajitha has quit IRC21:18
openstackgerritJames E. Blair proposed zuul/zuul-client master: Encrypt: never strip with --infile  https://review.opendev.org/c/zuul/zuul-client/+/78764221:39
*** irclogbot_2 has quit IRC21:50
clarkbcorvus: I left a question on https://review.opendev.org/c/zuul/zuul/+/78674421:54
*** irclogbot_3 has joined #zuul21:55
corvusclarkb: good q, responded22:04
clarkbthanks for the clarification. I think the change lgtm22:07
corvuscool, i'll start +wing that stack and maybe we can restart w it tomorrow22:08
clarkbcool. I'm hoping we can also restart gerrit to pick up 3.2.8 and the jeepyb fixes as well as maybe finally start the zk cluster upgrade22:09
clarkbwill be a fun one :)22:09
*** rlandy|rover is now known as rlandy|rover|bbl22:17
corvuslol friday22:17
*** nils has quit IRC22:36
openstackgerritMerged zuul/zuul master: Add a fast-forward test  https://review.opendev.org/c/zuul/zuul/+/78652123:07
*** harrymichal has quit IRC23:09
*** harrymichal has joined #zuul23:10
*** harrymichal has quit IRC23:14
*** tosky has quit IRC23:17
ianwclarkb: good question ... i think they're ephemeral nodes, but maybe not23:21
clarkbya I think image uploads survive restarts23:22
corvuslocks are always ephemeral nodes23:23
clarkboh the locks themselves23:23
clarkbok so recursive shouldn't be necessary, but we may need to do our best to check for the old lock path too when locking?23:23
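The distinction corvus is pointing at: kazoo's lock contenders are ephemeral znodes that vanish with the client session, but the lock's parent container is a plain persistent znode, which is what can stick around. A hypothetical check, with an illustrative path:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

# Illustrative path, as in the morning's discussion.
upload = ('/nodepool/images/fedora-32/builds/0000057968'
          '/providers/ovh-bhs1/images/0000000001')

# Contender znodes (e.g. <upload>/lock/xxxx__lock__0000000001) are ephemeral
# and die with their session; the <upload>/lock container itself persists.
stat = zk.exists(upload + '/lock')
if stat and stat.ephemeralOwner == 0:
    print('persistent lock container left behind; clean it up explicitly')
```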
openstackgerritMerged zuul/zuul master: Correct repo_state format in isUpdateNeeded  https://review.opendev.org/c/zuul/zuul/+/78652223:35
openstackgerritMerged zuul/zuul master: Revert "Revert "Make repo state buildset global""  https://review.opendev.org/c/zuul/zuul/+/78553523:36
openstackgerritMerged zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen  https://review.opendev.org/c/zuul/zuul/+/78553623:36
openstackgerritMerged zuul/zuul master: Restore repo state in checkoutBranch  https://review.opendev.org/c/zuul/zuul/+/78652323:37
openstackgerritMerged zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/78674423:37
openstackgerritMerged zuul/nodepool master: Remove statsd args to OpenStack API client call  https://review.opendev.org/c/zuul/nodepool/+/78686223:48
*** paladox has joined #zuul23:54

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!