Thursday, 2021-04-22

ianw2021-04-21 06:35:48,993 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:01
ianwthis feels like what would have deleted "/nodepool/images/fedora-32/builds/0000057968/providers/ovh-bhs1/images/0000000001"00:02
clarkbcorvus: that first change that I had previously reviewed lgtm. Looks like the only diff is handling use of item.item_ahead when it has been unset (but carried forward by old_item_ahead)00:13
clarkbI don't think I'll get to the other today, but will try to do it early tomorrow00:13
ianwnb01 : nodepool-builder.log.2021-04-20_23:2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs100:14
corvusclarkb: no problem, thanks!00:14
ianwnb02 : nodepool-builder.log.2021-04-21_01:2021-04-21 06:35:46,370 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs100:14
ianwit seems to me they both decided to delete it?00:15
corvusianw: yeah, that's probably the smoking gun; it's possible one of them created a lock immediately after the other one deleted it and caused the znode to stick around00:15
ianwcorvus: hrm, because of the recursive delete?  it can delete the lock file before removing the node, and the other thinks it got the lock?00:17
corvusianw: yeah -- i'm not looking at the code right now, so speaking only in generalities from memory :)00:17
fungii remember when shutil.rmtree had a similar bug00:17
ianwhttps://opendev.org/zuul/nodepool/src/branch/master/nodepool/zk.py#L1639 would be the function in question i guess00:18
ianwdeleteUpload()00:19
ianwi guess we need to get the lock, delete the node, delete the lock.  i'll have to do some reading to get my head around it00:23
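A minimal sketch of the ordering ianw describes, using kazoo (the ZooKeeper client library nodepool builds on); the znode layout and lock path here are illustrative, not nodepool's actual schema:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

# Illustrative path, mirroring the znode quoted above.
upload = ('/nodepool/images/fedora-32/builds/0000057968'
          '/providers/ovh-bhs1/images/0000000001')
lock = zk.Lock(upload + '/lock', 'cleanup-worker')

# The racy version is a bare zk.delete(upload, recursive=True): that can
# remove the lock child first, so a second cleanup worker "acquires" a lock
# under a znode that is about to vanish and recreates part of the tree.
if lock.acquire(blocking=False):
    try:
        # Delete everything except the lock we are holding.
        for child in zk.get_children(upload):
            if child != 'lock':
                zk.delete('%s/%s' % (upload, child), recursive=True)
    finally:
        lock.release()
    # Only now remove the (effectively empty) parent and the released lock.
    zk.delete(upload, recursive=True)
```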
ianwin the meantime, i guess i should just manually delete the node on the production system to stop all the spewing of errors00:24
fungithat's how we've handled it in the past, yeah00:25
corvusthe lock is under the node that's deleted00:26
corvusthere's also the interaction with builds and uploads here -- since that build from earlier had an upload in it still00:32
ianw2021-04-21 06:35:52,698 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:55
ianw2021-04-21 06:35:48,993 DEBUG nodepool.builder.CleanupWorker.0: Deleting image upload: <ImageUpload {'state': 'deleting', 'state_time': 1618986946.3660314, 'external_id': '517d4974-b220-47b2-8cbf-551e7d28bb88', 'external_name': 'fedora-32-1618803263', 'format': 'qcow2', 'username': 'zuul', 'python_path': '/usr/bin/python3', 'shell_type': None, 'id': '0000000001', 'build_id': '0000057968', 'provider_name': 'ovh-bhs1', 'image_name': 'fedora-32'}>00:55
ianwagain nb01 & nb0200:55
ianwthe next thing both should do after putting out that message is00:56
ianw  with self._zk.imageUploadNumberLock(upload,00:56
ianw                                      blocking=False):00:56
ianwfrom the logs, neither hit https://opendev.org/zuul/nodepool/src/branch/master/nodepool/builder.py#L35200:58
*** rlandy|rover|bbl has quit IRC01:53
*** evrardjp has quit IRC02:36
*** evrardjp has joined #zuul02:36
*** bhavikdbavishi has joined #zuul04:12
*** bhavikdbavishi1 has joined #zuul04:15
*** bhavikdbavishi has quit IRC04:17
*** bhavikdbavishi1 is now known as bhavikdbavishi04:17
*** ykarel has joined #zuul04:45
*** saneax has joined #zuul04:56
*** zbr9 has joined #zuul05:04
*** zbr has quit IRC05:06
*** zbr9 is now known as zbr05:06
*** saneax has quit IRC05:10
*** saneax has joined #zuul05:36
*** bhavikdbavishi has quit IRC06:07
*** ykarel_ has joined #zuul06:08
*** ykarel has quit IRC06:11
*** ykarel__ has joined #zuul06:11
*** ykarel_ has quit IRC06:14
*** ykarel__ is now known as ykarel06:17
swestcorvus: re the time for loading the keys: can you see if the problem is loading the keys from Zookeeper or writing them out to the backup keystore?06:27
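One way to split that question apart is to time the two phases separately; a rough sketch, where loadKeys and writeKeys are hypothetical stand-ins for whatever the real keystore calls are:

```python
import time

start = time.monotonic()
keys = zk_keystore.loadKeys()      # hypothetical: read every key from ZooKeeper
loaded = time.monotonic()
backup_keystore.writeKeys(keys)    # hypothetical: write the backup keystore
done = time.monotonic()

print('load from ZK: %.1fs  write backup: %.1fs'
      % (loaded - start, done - loaded))
```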
*** ykarel_ has joined #zuul06:32
*** ykarel has quit IRC06:35
*** saneax has quit IRC06:40
*** reiterative has quit IRC06:49
*** reiterative has joined #zuul06:50
*** jpena|off is now known as jpena06:50
*** jcapitao has joined #zuul07:04
openstackgerritIan Wienand proposed zuul/nodepool master: Avoid race deleting images  https://review.opendev.org/c/zuul/nodepool/+/78747507:07
ianwcorvus: ^ that's about my best guess of what's going on with the build delete race ...07:07
*** nils has joined #zuul07:17
*** bhavikdbavishi has joined #zuul07:19
*** okamis has joined #zuul07:34
*** rpittau|afk is now known as rpittau07:34
*** okamis has left #zuul07:37
*** jcapitao has quit IRC07:39
*** jcapitao has joined #zuul07:39
*** snapiri has joined #zuul07:46
*** tosky has joined #zuul07:49
*** sean-k-mooney has joined #zuul08:28
sean-k-mooneyquick question: can the gerrit plugin for zuul work with only http access, or does it also need ssh? i know it can use the http api for some things but am not sure if it can work with only that access08:30
swestcorvus: I deployed the latest master in our int environment and I can confirm your numbers. I also tried writing the keys to the backup store only if they don't already exist there, but that did not seem to speed things up. So it's probably the time for loading the keys from Zookeeper08:59
*** saneax has joined #zuul09:11
*** ykarel_ has quit IRC09:18
lyrHi there. I migrated jobs from one zuul to another; one is failing with an "Executing local code is prohibited" error. Is there an option to allow that?09:32
avasslyr: local code execution is only allowed in trusted contexts. is that new zuul instance also a newer version? because there was a bug that used to allow that anyway09:34
avasslyr: it was fixed in 3.19.1: https://zuul-ci.org/docs/zuul/reference/releasenotes.html#security-issues09:35
lyravass: yes, I'm migrating from software factory 3.3 to 3.5, there's definitely a zuul version bump involved09:36
lyrI'm gonna rewrite the piece of a job, I don't like it anyway09:36
lyrs/the/this09:37
avasslyr: good :)09:38
lyrWhile I'm around, there's a handy zuul-client CLI. Is there a nodepool equivalent ? pip install nodepool seems to install the whole nodepool thing09:39
avassno, not yet at least09:39
lyrAnd is there a way to specify the default tenant in the ~/.zuul.conf file for zuul-client usage ?09:40
avassI have no clue actually, but that would be nice to have :)09:40
mhulyr: no, but that'd be a nice improvement09:40
lyrok, thanks for the intel09:41
mhuzuul-client comes from the admin CLI and thus kept the args hierarchy for backward compatibility. In the admin CLI it makes sense to pass tenants as subcommand arguments since the admin can manage all tenants.09:41
lyrOn github I'd make a feature request issue for this, but I'm not familiar with gerrit. Since we agree this option would be useful, can you submit the idea the way it should be?09:43
mhulyr: since https://review.opendev.org/c/zuul/zuul-client/+/765203 though, you can omit the tenant arg if you're using a tenant-scoped api url09:44
mhufor example zuul.openstack.org is tenant-scoped to openstack, see https://zuul.openstack.org/api/info09:45
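The difference is visible in the info endpoint itself; a small sketch, assuming the response shape of /api/info (a tenant name is reported only on white-labeled deployments):

```python
import requests

info = requests.get('https://zuul.openstack.org/api/info').json()

# White-labeled (tenant-scoped) deployments report their tenant here;
# a multi-tenant root API does not.
tenant = info.get('info', {}).get('tenant')
if tenant:
    print('tenant-scoped API for tenant:', tenant)  # e.g. 'openstack'
else:
    print('root API; pass --tenant explicitly')
```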
mhulyr, I'll pop a quick patch and add you as reviewer, what's your gerrit handle?09:46
lyrmhu: I don't have one, I'll create that09:48
lyrTried this, with a "weird" error, didn't check the required versions, though I installed zuul-client this week https://paste.garrigue.re/?e70913ffe35f2e95#DwDCyeM6EjrVTYxyUJELf8ALnJsbC2HLJUsK9ucvyYJG09:48
mhuthe first url is incorrect, it's a web UI one09:49
mhuit's a sf deployment so my guess is that it's single tenant?09:50
lyryes09:52
lyrmhu: my handle should be rgarrigue09:54
lyrlogin went through ubuntu one, ended up with an error, I'm connected but there was no username, which I updated to rgarrigue... should be ok09:54
*** bhavikdbavishi has quit IRC10:03
openstackgerritSimon Westphahl proposed zuul/zuul master: Cache secret/SSH keys from Zookeeper  https://review.opendev.org/c/zuul/zuul/+/78752010:04
lyrmhu: what's the tenant scoped api URL ? Tried https://sf/zuul/api/tenant/local, zuul-client says 404 https://sf/zuul/api/tenant/local/api/info10:12
mhulyr, for zuul itself, see https://zuul-ci.org/docs/zuul/howtos/installation.html#web-deployment-options10:14
mhuit's white labelling10:14
mhufor SF, there's a doc for multi tenancy as well10:15
mhuIIRC if you're not using multi tenancy on SF there's no white labeling for the local tenant, so you have to use the root API url10:18
lyrhum10:20
lyrok, I'll write the "--tenant local" for as long as it takes :D10:20
lyr(I've a whole infra to manage and I already spent way too much time taking back control of this)10:21
*** jcapitao is now known as jcapitao_lunch11:03
swestcorvus: in 787520 I tried caching the keys. Since there is no way to rotate keys atm this looked like the best way to me. This doesn't solve the delay on startup but it helps with reconfigurations.11:06
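A minimal sketch of the caching idea in 787520; the class and method names are illustrative, not Zuul's actual keystore API:

```python
class CachingKeyStore:
    """Memoize per-project keys so reconfigurations skip ZooKeeper reads."""

    def __init__(self, backend):
        self.backend = backend  # e.g. the ZooKeeper-backed keystore
        self._cache = {}        # project canonical name -> key pair

    def getProjectKeys(self, project_name):
        if project_name not in self._cache:
            # Keys cannot currently be rotated, so a cache entry can
            # safely live for the scheduler's lifetime.
            self._cache[project_name] = self.backend.getProjectKeys(
                project_name)
        return self._cache[project_name]
```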
*** jfoufas1 has joined #zuul11:23
openstackgerritSimon Westphahl proposed zuul/zuul master: Cache secret/SSH keys from Zookeeper  https://review.opendev.org/c/zuul/zuul/+/78752011:26
*** ykarel has joined #zuul11:29
*** bhavikdbavishi has joined #zuul11:32
*** jpena is now known as jpena|lunch11:34
*** rlandy has joined #zuul11:47
*** rlandy is now known as rlandy|rover11:50
*** ykarel has quit IRC11:55
*** jcapitao_lunch is now known as jcapitao11:57
*** ykarel has joined #zuul11:57
*** bhavikdbavishi1 has joined #zuul11:58
*** ykarel has quit IRC12:00
*** rlandy|rover is now known as rlandy|rover|cal12:00
*** rlandy|rover|cal is now known as rlandy|rvr|call12:00
*** ykarel has joined #zuul12:00
*** bhavikdbavishi has quit IRC12:01
*** bhavikdbavishi1 is now known as bhavikdbavishi12:01
*** rlandy|rvr|call is now known as rlandy|rover12:18
*** jpena|lunch is now known as jpena12:35
*** hamalq has joined #zuul12:39
*** saneax has quit IRC13:04
*** nils has quit IRC13:04
*** erbarr has quit IRC13:04
*** saneax has joined #zuul13:05
*** nils has joined #zuul13:05
*** erbarr has joined #zuul13:05
*** bhavikdbavishi has quit IRC13:46
*** saneax has quit IRC14:52
*** harrymichal has joined #zuul15:24
*** harrymichal has quit IRC15:24
*** harrymichal has joined #zuul15:25
*** avass has quit IRC15:26
*** avass has joined #zuul15:27
*** ykarel is now known as ykarel|away15:36
*** ykarel|away has quit IRC15:41
*** bhavikdbavishi has joined #zuul15:49
*** jcapitao has quit IRC15:56
*** bhavikdbavishi1 has joined #zuul15:56
*** bhavikdbavishi has quit IRC15:58
*** bhavikdbavishi1 is now known as bhavikdbavishi15:58
*** jpena is now known as jpena|away16:00
*** hamalq has quit IRC16:06
*** sshnaidm is now known as sshnaidm|afk16:28
avasscorvus: before merging https://review.opendev.org/c/zuul/zuul/+/787451 I have a use case for unique workspace checkout you might want to take into consideration:16:44
avasscaching git repos in images16:44
avassotherwise there would be overlapping projects causing problems, but after they're converted to bare repositories they would no longer overlap and instead have the same structure as gerrit has internally16:46
corvusavass: there are 2 parts to this, let's make sure we're talking about the same part -- you're talking specifically about building an image with cached repos (not using a previously built image with cached repos on it).  and you want to have zuul check out all of those repos into a workspace as part of building the image?16:47
avassyeah16:48
corvusavass: are you sure you want to ask zuul to prepare all of those repos, as opposed to doing new clones?16:49
corvusi guess if you're building the image in a zuul job, that's probably the best way to use an existing repo cache16:49
avasscorvus: not completely convinced of that yet no16:50
avassbut yeah that's the current idea16:50
avassbut caching the larger repositories would at least help a lot16:51
corvusokay, so running with that idea -- you'd like to have the option to specify the 'unique' scheme so that you can use that as the repo cache in the image.  then, presumably, update prepare-workspace-git (or whatever) to use the unique scheme on the source side when it's copying stuff into the workspace.  that means that it would need to support multiple schemes, and know how to create 'unique' names given a16:52
corvuszuul vars project entry (which is doable, it's pretty simple)16:52
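For illustration, a collision-free name can be derived from a zuul.projects entry by url-quoting the canonical name; a sketch of the idea (the actual 'unique' scheme may differ in detail):

```python
import urllib.parse

def unique_src_dir(project):
    # 'project' is shaped like an entry in the zuul.projects job variable.
    return 'src/' + urllib.parse.quote_plus(project['canonical_name'])

print(unique_src_dir({'canonical_name': 'opendev.org/zuul/zuul'}))
# -> src/opendev.org%2Fzuul%2Fzuul  (flat, so deep-hierarchy collisions go away)
```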
avassnot exactly, just use the unique scheme for the workspace on the node and then cache the repos as bare repositories using their canonical names16:52
avassso there would only be a need for the unique scheme16:53
corvusi see; using bare repos avoids the collision issue; does prep-workspace-git expect bare repos or no?16:54
avassI think it does let me check16:54
avasslooks like it doesn't but that shouldn't be hard to configure16:55
corvusavass: i think strictly speaking as long as you're using a deep hierarchy, you could still technically have collisions even with bare repos16:56
avassbut you'd need to have two projects called <something>.git and <something> in that case16:58
corvusfirst, bare repos don't need to have .git suffixes16:58
corvusbut if you chose to give them .git suffixes, then yeah, you can still have a collision like that ^16:58
avassah yeah my mistake16:59
corvus(but you don't need a bare repo to add a .git suffix; really it's two related but ultimately different issues)16:59
corvusi think we can generalize this as we can't avoid collisions with deep hierarchies, but we can make little changes to adjust the likelihood of them :)17:00
avassIn that case caching the repositories with their unique names should still be enough, since we can derive those from the canonical names, which we should always have17:00
avasswhich means we still just need to support one scheme17:01
corvusyeah, i think if we have a use case of "i have conflicting repos with multiple schemes and i want the executor to splat *all* of them out in one job" then the only way to guarantee that is with the 'unique' scheme, or one like it.17:01
corvusand then have prepare-workspace-git understand that a repo cache might be in a unique scheme.17:03
avassI'm not sure if we're gonna need that but I'd rather make sure the option to do it exists :)17:06
corvuslet's mull that over for a bit; but if we like that, i think it's pretty easy to open up 'unique' as an additional user-facing scheme (with documentation that says "Do not use this." :)  then if we want prepare-workspace-git to handle multiple cache schemes, we just need to update it to deal with that.  it should have all the info it needs (but we can give it an assist with more zuul job vars if we need to)17:08
corvusi think the main thing right now is: 1) i don't see that doing option #2 from the email and supporting arbitrary mappings helps us here; 2) nothing so far blocks us from supporting unique caches in the future.17:09
avassagreed17:11
corvusavass: note that as part of 786744 zuul should detect repo collisions and should refuse to run the job if they collide; so if you write that job right now, it will probably stop working when 786744 lands.  but you could exclude just the problematic repo from the cache and it would work (at the cost of cloning that repo whenever it appears in jobs)17:16
corvus(i mean, granted, it's not really working for *other* reasons right now; but just wanted to point that out)17:17
avassyup that's the current idea because I expect them to cause problems anyway. but looking at the repo sizes I don't think we would gain much by caching them either17:17
corvusfrom prep-workspace-git:  git clone --bare {{ cached_repos_root }}/{{ zj_project.0.canonical_name }} {{ ansible_user_dir }}/{{ zj_project.0.src_dir }}/.git17:19
corvusi have no idea why we chose to use canonical_name for the first part and src_dir for the second part... oh i think it's because src_dir also has 'src/' on it?  anyway, that stroke of luck i think actually means that prep-workspace-git will already work with a repo cache in golang scheme mapping to a workspace in flat scheme.17:20
corvusneat :)17:20
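Expanding that Jinja by hand for one project shows why the two schemes line up; the values here are illustrative:

```python
cached_repos_root = '/opt/git'                  # golang-scheme cache on the image
project = {
    'canonical_name': 'opendev.org/zuul/zuul',  # keys the cache path
    'src_dir': 'src/zuul',                      # flat-scheme workspace dir
}
print('git clone --bare {}/{} ~/{}/.git'.format(
    cached_repos_root, project['canonical_name'], project['src_dir']))
# git clone --bare /opt/git/opendev.org/zuul/zuul ~/src/zuul/.git
```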
avasscorvus: does that throw an error back to gerrit? if so that's going to help to avoid confused users :)17:21
corvusavass: i don't think so, i think it's going to be a thoroughly confusing situation and we should improve it.17:21
corvusthe existing error handling around merge errors is weird and complicated; it may not be straightforward.  but worth doing.17:23
avassalso an idea from working with rust for a while: it could help even more if that error referenced the feature/upgrade note and included a way to avoid it by supplying workspace-scheme: flat17:24
*** rpittau is now known as rpittau|afk17:24
avassthough I suppose the feature note explains that17:24
*** bhavikdbavishi1 has joined #zuul17:25
*** rlandy|rover is now known as rlandy|rover|mtg17:27
*** jfoufas1 has quit IRC17:28
*** bhavikdbavishi has quit IRC17:29
*** bhavikdbavishi1 is now known as bhavikdbavishi17:29
avasshmm looks like it's not possible to link to specific notes :/17:30
fungiwe could probably implement that in reno fairly easily... they do already have unique id strings, we'd just need to expose anchors for them17:31
fungioh, though a release note may add multiple entries in different sections and they get collated, so maybe not as easy as i first thought17:32
corvusavass: if we're going to link an error message to anything, let's link it to docs17:34
corvusrelease notes are for old users; new users can hit errors too17:34
avassthat works too17:34
fungiyeah, that i completely agree with. documentation is for documenting things (including errors)17:34
fungirelease notes are something you should only ever need to look at when upgrading17:35
avassit's just to avoid having an error message saying "ERROR: BAD CODE" and instead allow for having a longer explanation for why and how to fix it. it would also help the user get acquainted with the documentation :)17:36
corvusto be fair, when we pass error messages to gerrit, they are pretty rarely terse :)17:37
avasstrue :)17:37
corvusbut i agree they can be verbose and link to docs17:38
*** ianychoi_ has joined #zuul17:38
corvushowever, i'm not sure we've done the last of our doc reorgs, so it might be worth thinking about what the story is when we move them again17:39
corvus(generate a redirect map like last time?  or come up with a system for explicit error redirects?)17:39
*** webknjaz_ has joined #zuul17:41
*** ianychoi__ has quit IRC17:48
*** sduthil has quit IRC17:48
*** xarragon has quit IRC17:49
*** webknjaz has quit IRC17:49
*** fbo has quit IRC17:49
*** webknjaz_ is now known as webknjaz17:49
*** fbo has joined #zuul17:54
*** zettabyte has joined #zuul17:58
*** jpena|away is now known as jpena|off18:07
*** rlandy|rover|mtg is now known as rlandy|rover18:07
*** zettabyte has quit IRC18:47
*** ajitha has joined #zuul19:09
*** bhavikdbavishi has quit IRC19:16
*** hamalq has joined #zuul20:10
*** ajitha has quit IRC21:18
openstackgerritJames E. Blair proposed zuul/zuul-client master: Encrypt: never strip with --infile  https://review.opendev.org/c/zuul/zuul-client/+/78764221:39
*** irclogbot_2 has quit IRC21:50
clarkbcorvus: I left a question on https://review.opendev.org/c/zuul/zuul/+/78674421:54
*** irclogbot_3 has joined #zuul21:55
corvusclarkb: good q, responded22:04
clarkbthanks for the clarification. I think the change lgtm22:07
corvuscool, i'll start +wing that stack and maybe we can restart w it tomorrow22:08
clarkbcool. I'm hoping we can also restart gerrit to pick up 3.2.8 and the jeepyb fixes as well as maybe finally start the zk cluster upgrade22:09
clarkbwill be a fun one :)22:09
*** rlandy|rover is now known as rlandy|rover|bbl22:17
corvuslol friday22:17
*** nils has quit IRC22:36
openstackgerritMerged zuul/zuul master: Add a fast-forward test  https://review.opendev.org/c/zuul/zuul/+/78652123:07
*** harrymichal has quit IRC23:09
*** harrymichal has joined #zuul23:10
*** harrymichal has quit IRC23:14
*** tosky has quit IRC23:17
ianwclarkb: good question ... i think they're ephemeral nodes, but maybe not23:21
clarkbya I think image uploads survive restarts23:22
corvuslocks are always ephemeral nodes23:23
clarkboh the locks themselves23:23
clarkbok so recursive shouldn't be necessary, but we may need to do our best to check for the old lock path too when locking?23:23
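The distinction corvus is pointing at: kazoo's lock contenders are ephemeral znodes that vanish with the client session, but the lock's parent container is a plain persistent znode, which is what can stick around. A hypothetical check, with an illustrative path:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()

# Illustrative path, as in the morning's discussion.
upload = ('/nodepool/images/fedora-32/builds/0000057968'
          '/providers/ovh-bhs1/images/0000000001')

# Contender znodes (e.g. <upload>/lock/xxxx__lock__0000000001) are ephemeral
# and die with their session; the <upload>/lock container itself persists.
stat = zk.exists(upload + '/lock')
if stat and stat.ephemeralOwner == 0:
    print('persistent lock container left behind; clean it up explicitly')
```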
openstackgerritMerged zuul/zuul master: Correct repo_state format in isUpdateNeeded  https://review.opendev.org/c/zuul/zuul/+/78652223:35
openstackgerritMerged zuul/zuul master: Revert "Revert "Make repo state buildset global""  https://review.opendev.org/c/zuul/zuul/+/78553523:36
openstackgerritMerged zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen  https://review.opendev.org/c/zuul/zuul/+/78553623:36
openstackgerritMerged zuul/zuul master: Restore repo state in checkoutBranch  https://review.opendev.org/c/zuul/zuul/+/78652323:37
openstackgerritMerged zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/78674423:37
openstackgerritMerged zuul/nodepool master: Remove statsd args to OpenStack API client call  https://review.opendev.org/c/zuul/nodepool/+/78686223:48
*** paladox has joined #zuul23:54

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!