Tuesday, 2020-01-07

*** jamesmcarthur has joined #zuul  [00:02]
<openstackgerrit> Merged zuul/zuul-jobs master: Test swift upload logs without encoding type for gzip files  https://review.opendev.org/701283  [00:04]
<mnaser> corvus: neat, i just did a little bit of changes to make those fixes, i wrote those jobs without sudo in mind  [00:17]
*** jamesmcarthur has quit IRC00:58
*** jamesmcarthur has joined #zuul01:15
*** igordc has joined #zuul01:49
*** jamesmcarthur has quit IRC02:02
*** zbr_ has joined #zuul02:04
*** bhavikdbavishi has quit IRC02:04
*** zbr has quit IRC02:07
*** zbr has joined #zuul02:09
*** zbr_ has quit IRC02:11
*** Goneri has quit IRC02:19
*** igordc has quit IRC02:43
*** bhavikdbavishi has joined #zuul02:52
*** bhavikdbavishi1 has joined #zuul02:55
*** bhavikdbavishi has quit IRC02:56
*** bhavikdbavishi1 is now known as bhavikdbavishi02:56
*** michael-beaver has quit IRC03:05
*** chandankumar has joined #zuul03:55
*** spsurya has joined #zuul04:32
*** chandankumar has quit IRC04:34
*** sshnaidm|afk has quit IRC04:38
*** bolg has joined #zuul04:52
*** chandankumar has joined #zuul05:13
*** sshnaidm has joined #zuul05:30
*** evrardjp has quit IRC05:33
*** evrardjp has joined #zuul05:33
*** chandankumar has quit IRC05:52
*** sshnaidm has quit IRC05:54
*** pcaruana has joined #zuul06:00
*** chandankumar has joined #zuul06:02
*** bolg has quit IRC06:02
*** pcaruana has quit IRC06:14
*** AJaeger has quit IRC06:55
*** AJaeger has joined #zuul07:02
<openstackgerrit> Felix Schmidt proposed zuul/zuul master: Report retried builds via sql reporter.  https://review.opendev.org/633501  [07:18]
*** avass has joined #zuul07:22
*** themroc has joined #zuul07:26
*** bolg has joined #zuul08:18
*** saneax has joined #zuul08:21
*** tosky has joined #zuul08:23
*** arxcruz|off is now known as arxcruz08:24
*** pcaruana has joined #zuul08:29
*** sshnaidm has joined #zuul08:32
*** chandankumar has quit IRC08:51
*** chandankumar has joined #zuul08:53
*** sshnaidm has quit IRC09:02
*** dmellado has quit IRC09:07
*** dmellado has joined #zuul09:08
*** bhavikdbavishi has quit IRC09:12
*** sshnaidm has joined #zuul09:42
<avass> Hmm, the fetch-output role fails with a 'chown invalid argument' message if we try to preserve uid and gid for some reason.  [09:54]
*** rlandy has quit IRC10:10
*** chandankumar has quit IRC10:39
*** sshnaidm has quit IRC11:32
*** sshnaidm has joined #zuul11:33
*** bolg has quit IRC11:53
*** bolg has joined #zuul11:55
*** themroc has quit IRC12:03
*** themroc has joined #zuul12:03
*** bolg has quit IRC12:05
*** themr0c has joined #zuul12:09
*** themroc has quit IRC12:09
*** zbr has quit IRC12:11
*** zbr_ has joined #zuul12:11
*** bolg has joined #zuul12:29
*** bhavikdbavishi has joined #zuul12:43
*** rfolco has joined #zuul12:45
*** rlandy has joined #zuul12:57
*** bolg has quit IRC13:06
*** bolg has joined #zuul13:08
*** bolg has quit IRC13:27
*** sshnaidm is now known as sshnaidm|mtg13:33
*** bhavikdbavishi has quit IRC13:34
*** Goneri has joined #zuul13:42
*** bolg has joined #zuul13:47
*** bolg has quit IRC13:56
*** bolg has joined #zuul13:59
*** bolg has quit IRC14:16
*** pcaruana has quit IRC14:20
*** themroc has joined #zuul14:29
*** bolg has joined #zuul14:29
*** themr0c has quit IRC14:31
*** pcaruana has joined #zuul14:32
*** bolg has quit IRC14:36
<fungi> i don't think rsync can preserve uid/gid unless it's run as root  [14:43]
<fungi> maybe we need a note about that  [14:44]
<mordred> avass: also - I don't see anything in fetch-output about preserving uid/gid - where is that issue happening?  [14:49]
*** zbr_ is now known as zbr|rover  [14:50]
<Shrews> mordred: i think the synchronize module automatically does that unless you give the option not to (was looking at it earlier)  [14:52]
<Shrews> archive=yes is the default  [14:53]
<mordred> ah - nod  [14:54]
*** sshnaidm|mtg is now known as sshnaidm|afk  [14:55]
*** avass has quit IRC  [14:55]
*** themroc has quit IRC  [14:55]
<mordred> so if content is put into the fetch-output directory with preserved uid/gid from its original source, fetch-output is going to have a sad  [14:55]
<pabelanger> is anybody working on a patch to add volume quota support to nodepool? We often see NODE_FAILURE on zuul.a.c if we've leaked volumes in our providers. Last I checked, we don't account for volumes in our quota calculations  [14:55]
<mordred> Shrews: seems like we should maybe either set archive=no in fetch-output or add a flag somewhere or something  [14:56]
*** themroc has joined #zuul  [14:56]
<Shrews> mordred: yah  [14:56]
<mordred> pabelanger: not to my knowledge, but my knowledge should be considered suspect  [14:56]
<pabelanger> ack  [14:57]
<Shrews> pabelanger: leaking volumes is an openstack cinder issue that i don't know how to safely solve in nodepool (for automatic cleanup)  [14:57]
<Shrews> but adding general quota support should be doable, i would think  [14:58]
<Shrews> mordred: i can throw something up for fetch-output unless you're already on it  [14:58]
<pabelanger> Shrews: it would be nice if we added a clean-volumes flag, like we do for clean-floating-ips, but not sure how we'd track them. Think that logic is in openstacksdk?  [14:58]
<pabelanger> mordred: is that right?  [14:59]
<Shrews> pabelanger: that's what i meant. tracking those is the key  [14:59]
*** saneax has quit IRC  [15:00]
<pabelanger> basically, there are 2 issues. a) the provider has leaked volumes and we're not able to delete them; for that you need admin access. b) there are leaked 'available' volumes, which are valid but just taking up quota space; in that case, nodepool won't delete or use them  [15:00]
<Shrews> i haven't seen (b) i don't think?  [15:00]
<smcginnis> It's been years, but still waiting for someone to show some data on "cinder leaking volumes" to show it actually is cinder and give a clue how to reproduce and fix. If anyone has logs or anything, it would be useful to get an issue filed to capture that.  [15:01]
<pabelanger> mnaser: not to call you out, but is that something you may be able to help with? logs for smcginnis  [15:01]
<Shrews> smcginnis: i think mnaser and fungi had a pretty good handle on the cause. the difficulty (i think) was being able to communicate with nova from cinder  [15:02]
<smcginnis> Deja vu and motion sickness from going in circles. :)  [15:02]
<smcginnis> Right, the closest I've seen in the past was that nova was not telling cinder to delete a volume. Yet it keeps getting put out there that "cinder leaks volumes".  [15:03]
<Shrews> smcginnis: sorry, not trying to lay blame on the wrong thing. that's just how i refer to the issue in general :)  [15:04]
<smcginnis> Cinder will happily leak as many volumes as it's not told to delete. ;)  [15:04]
<pabelanger> as for tracking volumes, can a volume hold metadata?  [15:04]
<pabelanger> I am guessing, no  [15:04]
<smcginnis> Yeah, I get it. Just a little frustrating to keep seeing that repeated when we've at least narrowed down some of the cases to another service's issue, yet still getting framed as a cinder problem.  [15:04]
<Shrews> perhaps "openstack volume leak" is a better term  [15:05]
*** themroc has quit IRC  [15:05]
<smcginnis> Nova volume leak, from the cases where we've actually gotten data, but sure. :P  [15:05]
<Shrews> or "that dang volume thing"  [15:05]
<pabelanger> volume_image_metadata looks promising  [15:05]
<mordred> Shrews: I'm not already on fetch-output  [15:05]
<Shrews> mordred: k  [15:06]
<pabelanger> Shrews: couldn't we do the same thing for servers and metadata, but with volumes? eg: inject zk info, then have a periodic loop delete it if it doesn't exist?  [15:06]
<mordred> yeah - nova volume leak or openstack volume leak - because the issue at hand is with volumes created by nova as part of a boot-from-volume  [15:07]
<Shrews> pabelanger: iirc, the issue was from automatic volume creation when booting a server?  [15:07]
<pabelanger> yah, this is BFV  [15:08]
<mordred> if the user (nodepool) had initial control over the volume rather than delegated control, it's not a problem - but that's a bit harder coding-wise  [15:08]
<Shrews> pabelanger: yeah, see boot-from-volume per above  [15:08]
<mordred> I don't remember - can we do BFV with a pre-existing volume?  [15:08]
<mordred> smcginnis: ^^ ??  [15:08]
<smcginnis> Yes, and that's usually the safest way.  [15:09]
<smcginnis> Nova has pushed back in the past on doing anything more with orchestrating, so really the best practice is to create a volume first, then tell nova to use it to boot from.  [15:09]
<mordred> ah. ok. so - we maybe need to recode nodepool's bfv support to do that  [15:09]
<smcginnis> That would probably be safer and help avoid this.  [15:10]
<mordred> smcginnis: that's making me think I should recode openstacksdk to create the volume itself rather than using nova-create  [15:10]
<mordred> smcginnis: we can attach metadata to a volume, right?  [15:10]
<pabelanger> Shrews: sorry, not seeing it.  [15:11]
<mordred> the problem with having sdk do this for people automatically would be deleting the server and not knowing which volume to delete - but if we can put metadata on a volume that's solvable  [15:11]
<mordred> Shrews: do we want to try to solve this in the sdk layer? or add logic to nodepool?  [15:12]
<Shrews> pabelanger: sorry, just meant we use the bfv option, which you already know  [15:12]
<pabelanger> mordred: yah, openstack volume set --property foo=bar e86ab129-1915-4574-9e67-09b1d9823f48 works  [15:12]
<mordred> pabelanger: awesome  [15:12]
<Shrews> mordred: sdk seems right? but not sure what implications that might have for non-nodepool users  [15:13]
<pabelanger> so, if we tagged all volumes with the zk id, that is a step in the right direction  [15:13]
<smcginnis> mordred: Yep, that sounds like a good plan too. It would be great if the SDK just took care of that and didn't make it so the end user had to take extra steps.  [15:13]
<mordred> pabelanger: so with that - we could store something like "openstacksdk-delete-with-server: true"  [15:13]
<smcginnis> So someone used to the "old" workflow just starts noticing better results.  [15:13]
<mordred> for volumes sdk auto-created  [15:13]
<mordred> smcginnis: ++  [15:13]
<pabelanger> mordred: ++  [15:13]
<smcginnis> And Cinder finally fixed its leak problem. :D  [15:13]
<mordred> smcginnis: as long as I get cinder points :)  [15:13]
*** bhavikdbavishi has joined #zuul  [15:14]
<Shrews> smcginnis: aha! you admit it! it's now documented ;)  [15:14]
<smcginnis> Extra bonus points even.  [15:14]
<smcginnis> Shrews: Haha! ;)  [15:14]
<mordred> \o/  [15:14]
<mordred> ok. I can work on a patch for that - the concepts at least make sense  [15:14]
<smcginnis> mordred: If you happen to remember, please add me as a reviewer or ping me with it. I'd love to see that go through.  [15:15]
<mordred> also - thanks smcginnis! it's super helpful when we get good info from services like that  [15:15]
<pabelanger> mordred: awesome, tyty  [15:15]
<mordred> smcginnis: I will *definitely* ping you with it  [15:15]
<mordred> smcginnis: if you don't watch out I'll make you sdk core :)  [15:15]
<Shrews> mordred: i think i just heard smcginnis request core status  [15:15]
*** avass has joined #zuul  [15:16]
<smcginnis> I knew I should have just gone back to my coffee!  [15:16]
<avass> mordred: yeah I fixed it by setting owner=no group=no  [15:18]
*** bolg has joined #zuul  [15:18]
<Shrews> openstack: punishing developers with core status since 2010  [15:18]
*** bhavikdbavishi1 has joined #zuul  [15:19]
<openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Don't preserve logs uid and guid  https://review.opendev.org/701381  [15:20]
*** bhavikdbavishi has quit IRC  [15:20]
*** bhavikdbavishi1 is now known as bhavikdbavishi  [15:20]
<avass> Gotta go but I'll check in later  [15:21]
<Shrews> avass: oh, mordred and i were just discussing a similar change. perhaps you could change 701381 to use a variable to control permissions via the 'archive' option?  [15:21]
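A minimal sketch of what that fix looks like in role terms, assuming fetch-output copies files back to the executor with Ansible's synchronize module (whose archive=yes default implies owner/group preservation); the variable and path names here are illustrative:

    # Hypothetical fetch-output task: pull collected output back to the
    # executor without trying to chown files to their original uid/gid.
    - name: Collect logs, artifacts and docs
      synchronize:
        src: "{{ zuul_output_dir }}/"
        dest: "{{ zuul.executor.log_root }}/"
        mode: pull
        # archive (the default) implies owner/group preservation, which is
        # what produces the "chown: invalid argument" failure when the
        # destination user cannot map the source uid/gid
        owner: "{{ fetch_output_preserve_ownership | default(false) }}"
        group: "{{ fetch_output_preserve_ownership | default(false) }}"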
<avass> Shrews: Sure I'll take a look at it when I get home :)  [15:22]
<Shrews> avass: ++  [15:22]
<Shrews> thanks!  [15:22]
*** avass has quit IRC  [15:22]
<fungi> smcginnis: Shrews: yes, the issue i'm familiar with seemed to be nova ignoring deletion errors returned from cinder (or possibly never managing to request deletion due to connectivity errors?) and then going ahead and deleting the instance while cinder still thinks the volume is attached (to a now nonexistent instance)  [15:24]
<fungi> which leaves nobody but the cloud operator able to clean up the mess, since there's no way to get nova to tell cinder about a detach at that point, and cinder (sanely) won't let normal users delete attached volumes  [15:27]
<mordred> fungi, smcginnis: sdk still might have problems working around that  [15:28]
*** michael-beaver has joined #zuul  [15:28]
<fungi> i can't remember why telling nova not to proceed with the instance deletion if it can't confirm from cinder that the volume got detached was a problem, but maybe that's what smcginnis means by "more orchestration in nova"  [15:29]
<mordred> because if the issue is that a server delete isn't communicating the detach fully - then the situation fungi describes will still occur  [15:29]
<pabelanger> mordred: other question I had, do you think we could bubble up warnings / deprecation notices to console logs for zuul users to see? When moving jobs between ansible versions, it is sometimes hard to see what is deprecated. Today, only the zuul operator has access to that info in executor logs.  [15:30]
<mordred> because I don't think as an API consumer sdk can say "please detach this volume" to a volume that is the root volume for a running server  [15:30]
<fungi> could it if the server were stopped first?  [15:30]
<corvus> iiuc, pabelanger described both situations -- user-level leakage, and admin-level leakage. could be 2 problems?  [15:30]
<fungi> yeah, we have seen both. some volumes are successfully detached but never deleted  [15:30]
<mordred> fungi: I don't know - we could try that  [15:31]
<corvus> oh, so maybe it's more or less the same problem, but the interruption in communication happens at different phases?  [15:31]
<fungi> those we could certainly clean up with normal credentials  [15:31]
<fungi> corvus: it wouldn't surprise me  [15:31]
<pabelanger> corvus: yup, I'm able to manually clean up 'available' volumes which block quota. For leaked (still attached) ones I need to ask the admin  [15:31]
<mordred> smcginnis: do you happen to know if a normal user can detach a volume from a stopped server if the volume in question is the root volume of the server?  [15:32]
<mordred> fungi, corvus, Shrews: also - upon further reflection, I do not believe we can fix this in sdk - or, at least, not in a way that nodepool can use  [15:32]
<Shrews> yah  [15:32]
<mordred> and that's because fixing it sdk-side would turn create_server into a synchronous call (we'd have to wait for the volume create to pass the volume to the server create)  [15:33]
<Shrews> also yah  [15:34]
<corvus> or else some kind of chain of closures or something  [15:34]
<mordred> so I think there's two things here - is there a user-level sequence of API operations we can do instead of BFV via nova that will avoid the attachment issue - and then how best to implement it - which might be a feature in sdk that nodepool doesn't use, as well as a feature in nodepool  [15:34]
<mordred> yeah  [15:34]
<mordred> maybe we do the sdk support in such a way that nodepool can consume it piecemeal  [15:34]
<mordred> but first we need to find the appropriate api sequence  [15:35]
<Shrews> or we could just add a new sdk API method specifically for this?  [15:35]
<openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Add cri-o support to use-buildset registry  https://review.opendev.org/701237  [15:36]
<Shrews> create_server_with_volume() which automatically assumes wait_for_volume() / wait_for_server()  [15:36]
<Shrews> or similar nonsense words  [15:37]
<mordred> Shrews: yeah - but I think nodepool couldn't use that - so we'd still need to figure out how best to expose the capability in a way nodepool could use ... I'm liking the corvus chain-of-closures idea as I think about it - but it might go to the dark place  [15:38]
<corvus> yeah, pieces that nodepool can use sounds like a great first step without turning the taskmanager into a turing machine :)  [15:39]
*** bolg has quit IRC  [15:39]
<smcginnis> mordred: No, you are not able to detach a boot volume from an instance, even when it's been stopped.  [15:40]
<smcginnis> I think Matt had started looking at allowing that to be able to do some other shelving or something, but it sounded like a not insignificant effort and I don't think it's gotten anywhere yet.  [15:40]
<mordred> smcginnis: gotcha. so without that we'd be unable to avoid the proposed issue of "nova sent a detach which didn't work but then deleted the instance anyway"  [15:42]
<mordred> since we'd be reliant on nova doing the detach  [15:42]
<smcginnis> Hmm, yeah.  [15:43]
<mordred> that said - maybe that's a more tractable bug? when instances are deleted, sometimes the cinder metadata about volume attachment state is stale?  [15:44]
<mordred> it's not really more orchestration as much as simply completing an action that nova is the only actor able to perform  [15:45]
<smcginnis> Yeah, I think in that case nova would blow away its information about it, but cinder would still have an entry in the attachments table for the volume and instance.  [15:45]
<smcginnis> So in theory something external should be able to reconcile it.  [15:45]
<mordred> yeah  [15:45]
<mordred> smcginnis: if you don't mind ... how does this work today? is there an internal api or something that nova is calling as part of its delete process to tell cinder it's no longer attached?  [15:46]
<smcginnis> Yep. Nova calls a detach API to tell cinder to remove any mappings or whatever backend specific thing that exposes the volume to the host and remove the entry from the database.  [15:48]
<smcginnis> Some backends need to explicitly say "let initiator XYZ access volume 123".  [15:49]
<smcginnis> So detach removes that.  [15:49]
<mordred> smcginnis: nod. and that API is one that is not available to normal users, because if it's something that's called on a volume that is actually still in use, it would potentially create havoc  [15:51]
<smcginnis> mordred: Well, it is exposed as a regular API, no special secret handshake needed.  [15:51]
<smcginnis> It's just not publicized as something an end user should use normally.  [15:52]
<smcginnis> Most should be using this newer microversion - https://docs.openstack.org/api-ref/block-storage/v3/index.html?expanded=delete-attachment-detail#delete-attachment  [15:52]
<mordred> smcginnis: oh - wait - a non-privileged end-user *could* use it?  [15:52]
<mordred> aha!  [15:52]
<smcginnis> I believe if they have access to the volume, they have rights to make that call with the default policies.  [15:52]
<mordred> pabelanger: do you have any volumes in the leaked state needing admin deletion right now?  [15:53]
<smcginnis> John Griffith and Matt R did a lot of the work to implement this API and get a better handle on the whole workflow.  [15:53]
<smcginnis> I would need to find the specs from that work to get more details. It's not always clear which calls come in which order without swapping some of that back in.  [15:53]
<pabelanger> mordred: yup  [15:54]
<mordred> pabelanger: cool. I think we should try that api call above to see if it removes the attachment from the volume and then lets you delete it directly  [15:54]
<mordred> pabelanger: if you like, i can try to cook up a script that would do it - but I can't test it myself  [15:54]
<smcginnis> Some docs in case anyone is interested - http://specs.openstack.org/openstack/cinder-specs/specs/ocata/add-new-attach-apis.html  [15:54]
<mordred> if the api call works, then I think we've got a path forward  [15:55]
<pabelanger> mordred: sure, happy to test  [15:55]
<mordred> pabelanger: k. I'll get you a thing in just a bit  [15:55]
<pabelanger> http://paste.openstack.org/show/788131/  [15:55]
<pabelanger> is what I see if I try to directly delete  [15:55]
<pabelanger> http://paste.openstack.org/show/788132/  [15:56]
<pabelanger> is server info  [15:56]
<smcginnis> You can also force reset a volume status, but I think that would leave some crud in some of the tables.  [15:56]
<smcginnis> https://ask.openstack.org/en/question/66918/how-to-delete-volume-with-available-status-and-attached-to/  [15:57]
<Shrews> gotta step out for an appointment, but will catch up on this convo when i return.  [15:57]
<mordred> pabelanger: http://paste.openstack.org/show/788133/  [16:01]
<mordred> pabelanger: obvs change the cloud name  [16:01]
<mordred> pabelanger: but that's your attachment id  [16:01]
<mordred> pabelanger: if that works, then you should be able to try deleting the volume again  [16:02]
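The paste itself isn't preserved in this log, but the call being tested is roughly the following; a hypothetical sketch using openstacksdk, with the cloud name and volume id as placeholders, against the cinder attachments API smcginnis linked above (microversion 3.27):

    # Hypothetical reconstruction of the "delete the stale attachment" script.
    import openstack

    conn = openstack.connect(cloud='my-cloud')
    volume = conn.block_storage.get_volume('VOLUME_UUID')

    for attachment in volume.attachments or []:
        # DELETE /v3/{project_id}/attachments/{attachment_id}; the
        # attachments API needs the 3.27 block-storage microversion.
        conn.block_storage.delete(
            '/attachments/{id}'.format(id=attachment['attachment_id']),
            headers={'OpenStack-API-Version': 'volume 3.27'})

    # With the attachment record gone, a normal volume delete should work.
    conn.block_storage.delete_volume(volume)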
<pabelanger> mordred: yup, that worked! I tested with 2 leaked volumes  [16:05]
<mordred> pabelanger: WOOT!  [16:07]
<mordred> I have a path forward  [16:07]
<pabelanger> Yay  [16:07]
<mordred> also - Shrews re: the above - it should be possible to also code a safe leak checker if we have server ids - logic being "do a get on /attachments ; look to see if it's for a server we know about but which we think is deleted, and if so, force a detach"  [16:08]
<mordred> but I think we can make a system that doesn't leak  [16:08]
<mordred> fungi, clarkb, corvus: ^^ see scrollback and paste for information on how to resolve leaked volumes without need of an admin (the hacky/script version)  [16:08]
<clarkb> sometimes they fail to delete and have no attachment  [16:09]
<clarkb> we may even have a few in that state in vexxhost right now  [16:09]
<clarkb> iirc we did pre holidays  [16:09]
<mordred> I'll go look  [16:10]
<mordred> or - rather  [16:10]
<clarkb> do you know how to identify those?  [16:10]
<clarkb> mordred: volume list | grep available, then volume show on those and look for ones that are older than a few days  [16:10]
<clarkb> then try and delete them and see if they go away  [16:11]
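Roughly, that manual sweep looks like this with the standard openstackclient commands (cloud name and volume id are placeholders):

    # find volumes sitting idle in the "available" state
    openstack --os-cloud my-cloud volume list --status available

    # inspect a candidate; created_at shows whether it is old enough to be
    # a leak rather than something a job is about to use
    openstack --os-cloud my-cloud volume show <volume-uuid>

    # try to delete it; if this refuses, it is in the "ask an admin" bucket
    openstack --os-cloud my-cloud volume delete <volume-uuid>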
<pabelanger> clarkb: yah, that is b) from above  [16:11]
<pabelanger> we also see that  [16:11]
<pabelanger> for that, I think we could add the zk id into metadata, then compare, and if the zk id is missing, delete?  [16:12]
<clarkb> the problem is they don't delete  [16:12]
<pabelanger> well, not missing zk id, the zk id doesn't exist  [16:12]
<clarkb> some do, some don't  [16:12]
<mordred> yah - clarkb is suggesting there is a c) state  [16:12]
<pabelanger> ack, so far I haven't seen that issue  [16:12]
<pabelanger> mind you, I often delete 2 weeks later  [16:12]
<mordred> I think I'm going to focus on seeing if we can eliminate a and b  [16:12]
<mordred> since we've got new info on how to handle those  [16:13]
<clarkb> there is a fourth state too where your quota usage recorded by cinder is completely out of sync and you can't bfv because you have no more volume quota  [16:13]
<Shrews> mordred: problem with us "having server ids" is we don't keep that info after the server is deleted. Can discuss more later  [16:15]
*** rfolco is now known as rfolcOUT  [16:16]
<mordred> Shrews: good point.  [16:18]
<mordred> clarkb: is it common for these volumes to take my entire life to delete?  [16:19]
<clarkb> mordred: ya it's not fast, but that may mean they won't delete? which is the state I was pointing out  [16:20]
*** electrofelix has joined #zuul  [16:23]
*** rishabhhpe has joined #zuul  [16:23]
<mordred> clarkb: it was outbound http hanging because of a stupid captive portal :)  [16:24]
<mordred> clarkb: all leaked volumes in vexxhost deleted with no issue. I'll keep my eye out for that so we can try to trick smcginnis into helping us with them  [16:29]
<fungi> he's surprisingly easy to bait into being helpful  [16:35]
<mordred> fungi: it's almost like he's a helpful person  [16:36]
<fungi> oh, maybe that's it ;)  [16:40]
<mordred> smcginnis, Shrews: ok - so I can confirm that I can do the delete attachments call on a volume that has been created by nova as part of a BFV and is a boot volume for a server (I stopped the server first)  [16:46]
<mnaser> pabelanger, smcginnis: indeed i actually don't agree it's cinder leaking volumes, it's more like nova noop-ing volume deletes *however* i have seen cases where quota does get out of sync  [16:46]
<mordred> mnaser: yah - I think we've got a fix in the works  [16:46]
<mnaser> reproducers are hard obviously for the quota out of sync  [16:46]
<mnaser> i will say: 99% of the time everything works  [16:47]
<mordred> so I *think* the safe flow would be "create volume ; boot server on volume.id" for create - then for delete: "stop server ; delete attachment ; delete server ; delete volume"  [16:48]
<pabelanger> mnaser: yah, it only happens every few months  [16:48]
<pabelanger> all other times, works every time :)  [16:48]
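A sketch of what that flow could look like with openstacksdk's cloud layer; the names, sizes and flavor are placeholders, and whether nodepool should actually do it this way is exactly what is being debated here:

    import openstack

    conn = openstack.connect(cloud='my-cloud')

    # create: make the boot volume ourselves so we own its lifecycle
    volume = conn.create_volume(size=80, image='ubuntu-bionic',
                                bootable=True, wait=True)
    server = conn.create_server('node-0001', flavor='v2-standard-8',
                                boot_volume=volume.id, wait=True)

    # delete: stop the server, then remove the server and finally the volume;
    # if the volume refuses to delete, clear the stale attachment with the
    # attachment-delete call shown in the earlier sketch
    conn.compute.stop_server(server.id)
    conn.compute.wait_for_server(conn.compute.get_server(server.id),
                                 status='SHUTOFF')
    conn.delete_server(server.id, wait=True)
    conn.delete_volume(volume.id, wait=True)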
<mordred> we could also leave things as they are, since we can clean up the situation with the attachment delete - and simply add a flag like "nodepool.autocreated" to the volume metadata after the server create is finished  [16:49]
<mordred> then add a cleanup thread that looks for volumes in an attached state with nodepool.autocreated set to True where the server id listed in the volume's attachments does not exist  [16:50]
<mordred> and clean those up by deleting the attachment and then deleting the volume  [16:50]
<mordred> might not need the nodepool.autocreated flag - any volume in an attached state with a non-existent server is a leak and should be safe to clean  [16:50]
<mordred> yeah?  [16:50]
<mnaser> i would scan for attachments for nodepool-created volumes instead, just in case there is some other thing going on  [16:51]
<mnaser> i don't know if we enforce a "nodepool owns an entire tenant" rule  [16:51]
<mordred> we don't  [16:51]
<mordred> so a nodepool flag is safer  [16:52]
<pabelanger> +1  [16:52]
<mordred> although it's pretty much always a volume leak if a volume has a bogus attach - even if it's not a nodepool leak  [16:52]
<mordred> but let's start with the nodepool filter  [16:52]
<mnaser> i agree with that too  [16:53]
*** mattw4 has joined #zuul  [16:53]
<mnaser> but when deleting things it's always scary with automation :p  [16:53]
<mnaser> so the more barriers...  [16:53]
<mordred> yup  [16:53]
<pabelanger> nodepool, deleter of all things  [16:53]
<mordred> I'll make an SDK "cleanup_leaked_volumes" method that takes an optional thing to look for in metadata  [16:54]
<clarkb> any busy nodepool effectively owns a tenant due to quota utilization fwiw  [16:54]
<clarkb> (also isn't this what tenants are for?)  [16:55]
<mordred> s/tenants/projects/ yes  [16:55]
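A sketch of the cleanup thread mordred describes; cleanup_leaked_volumes is the name from the chat, while the metadata key and the rest of the logic are illustrative:

    import openstack

    def cleanup_leaked_volumes(conn, metadata_key='nodepool.autocreated'):
        """Delete volumes whose attachments all point at servers that no
        longer exist, but only if they carry our metadata flag."""
        live_servers = {server.id for server in conn.list_servers()}
        for volume in conn.list_volumes():
            if volume.metadata.get(metadata_key) != 'true':
                continue
            attachments = volume.attachments or []
            stale = [a for a in attachments
                     if a['server_id'] not in live_servers]
            # only act when every attachment is stale; anything else might
            # still be in use
            if not attachments or len(stale) != len(attachments):
                continue
            for attachment in stale:
                conn.block_storage.delete(
                    '/attachments/{id}'.format(
                        id=attachment['attachment_id']),
                    headers={'OpenStack-API-Version': 'volume 3.27'})
            conn.delete_volume(volume.id)

    # usage: cleanup_leaked_volumes(openstack.connect(cloud='my-cloud'))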
<rishabhhpe> hello all, on the community page and also in ciwatch i can see status N, which means not registered .. what does it signify  [16:55]
<rishabhhpe> and what can be done to overcome that  [16:55]
<fungi> rishabhhpe: i don't know what you're talking about, though also we have nothing to do with ciwatch  [16:56]
<fungi> rishabhhpe: it sounds like maybe you're running a third-party ci system for testing an openstack driver? i would start by asking people from the project that driver integrates with  [16:57]
<fungi> they're the ones who will be able to tell you what they expect of your ci system  [16:57]
<rishabhhpe> OK i will ask in the third party ci channel then  [16:58]
*** pcaruana has quit IRC  [17:05]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-jobs master: Add basic Helm jobs  https://review.opendev.org/700222  [17:12]
*** tosky has quit IRC  [17:23]
*** bhavikdbavishi has quit IRC  [17:23]
<rishabhhpe> Hi .. can anyone check the error pasted at "http://paste.openstack.org/show/788139/" and let me know if i am missing anything while configuring zuul v3 using this link: https://zuul-ci.org/docs/zuul/admin/quick-start.html  [17:24]
*** bhavikdbavishi has joined #zuul  [17:24]
<clarkb> fungi: I don't think we need to update the docs for the upload roles  [17:27]
<clarkb> fungi: for the non-swift upload role you'll only want to compress if the fileserver can handle it (and if not you'd disable stage_compress_logs too)  [17:27]
<clarkb> and for swift the idea is to make it transparent to the user  [17:27]
*** pcaruana has joined #zuul  [17:28]
<fungi> yeah, just wondering if the behavior change provided by the stage_compress_logs variable is documented (i confused it with zuul_log_compress)  [17:29]
<fungi> https://zuul-ci.org/docs/zuul-jobs/general-roles.html#rolevar-stage-output.stage_compress_logs  [17:30]
<fungi> not sure if there's benefit to explaining in the rolevar description what the implications of that option are in combination with swift storage  [17:31]
<clarkb> fungi: https://review.opendev.org/#/c/701282/2/roles/stage-output/README.rst updates that  [17:31]
<clarkb> maybe that update is too brief  [17:32]
<fungi> aha, nope, that's plenty  [17:32]
*** evrardjp has quit IRC  [17:33]
<fungi> i was only looking at 701284  [17:33]
*** evrardjp has joined #zuul  [17:33]
<fungi> and i guess https://etherpad.openstack.org/p/wwgBLSq1Rq is announcing 701282  [17:34]
<clarkb> yes  [17:34]
<fungi> makes much more sense now, thanks  [17:34]
<clarkb> I don't think 84 needs to be announced as we've always hidden the behavior from the user (except for when it leaks through as in this bug and does the wrong thing)  [17:34]
<fungi> yeah  [17:35]
<pabelanger> okay, using mordred's script, I've been able to clean up all leaked volumes (without admin powers)  [17:35]
<fungi> clarkb: do those need to merge in any particular order?  [17:35]
<fungi> i guess they're independent-but-related changes  [17:35]
<mordred> pabelanger: woot!  [17:36]
* mordred has helped something today  [17:36]
<clarkb> fungi: I don't think order matters too much as long as they are adjacent  [17:39]
*** sgw has quit IRC  [17:39]
<fungi> yeah, i'm +2 on both. thanks for working through those  [17:39]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Added nodepool to charts  https://review.opendev.org/700462  [17:41]
<corvus> clarkb: 284 lgtm, +3; etherpad lgtm; do you want to wip 282 until next week?  [17:42]
<clarkb> corvus: that works, wasn't sure how long of a warning you wanted to give  [17:44]
*** saneax has joined #zuul  [17:44]
<clarkb> I'll add a link to 282 to the email draft too  [17:44]
<clarkb> WIP'd and sending email now  [17:46]
<fungi> rishabhhpe: which error are you referring to? if it's "Cannot send SSH key added message to admin@example.com" that's probably benign since it's not a reachable e-mail address  [17:46]
<fungi> rishabhhpe: yep, it shows up in our quickstart job log too and that doesn't result in any failure: http://zuul.opendev.org/t/zuul/build/c422829e153640b68bd983f1bd2e120f/log/container_logs/gerrit.log#135  [17:48]
<clarkb> corvus: once 284 lands we should merge something to zuul to get a new js tarball  [17:49]
<clarkb> https://review.opendev.org/#/c/697055/ maybe? /me reviews this change  [17:49]
<clarkb> oh you've got a question on that one  [17:50]
<corvus> clarkb: https://review.opendev.org/698774 looks ready  [17:50]
<corvus> or we could ask someone else to review https://review.opendev.org/691715  [17:51]
<rishabhhpe> fungi: so this docker-compose up command is a never-ending shell? and can i run it in the background if required to keep my services up all the time?  [17:52]
<clarkb> corvus: that first one looks like a good one to get in. I'm reviewing it  [17:54]
<openstackgerrit> Merged zuul/zuul-jobs master: Swift upload logs without encoding type for gzip files  https://review.opendev.org/701284  [17:56]
<clarkb> https://review.opendev.org/#/c/698774/ is headed in now  [18:00]
<clarkb> that should update the js tarball which should be retrieved with no decompression now  [18:00]
*** saneax has quit IRC  [18:01]
<fungi> rishabhhpe: you mean the command at https://zuul-ci.org/docs/zuul/admin/quick-start.html#start-zuul-containers i guess?  [18:03]
<fungi> i thought `docker-compose up` returned control to the calling shell  [18:03]
<clarkb> it does not. It follows the containers  [18:05]
<clarkb> and when you kill the up command it kills the containers  [18:05]
<clarkb> docker-compose start runs them in the background  [18:05]
<clarkb> or up -d  [18:05]
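In other words, the options being described look roughly like this, run from the directory holding the quick-start's docker-compose.yaml:

    # foreground: follows the containers and stops them when interrupted
    docker-compose up

    # detached: start the containers in the background instead
    docker-compose up -d

    # ...and follow the debug logs separately when wanted
    docker-compose logs -f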
<rishabhhpe> clarkb: it worked for me but i can see after this some of my containers are exiting  [18:08]
<clarkb> rishabhhpe: some of them are one-shot containers to do setup  [18:10]
<fungi> aha, my grasp of containery stuff is still a bit tenuous  [18:10]
<fungi> thanks  [18:10]
<fungi> i suppose where the quickstart says "All of the services will be started with debug-level logging sent to the standard output of the terminal where docker-compose is running." should have been a tip-off  [18:11]
<fungi> obviously leaking output from a background process to an interactive terminal would be a poor choice, so that does imply the shell is blocked on that running process  [18:12]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Added nodepool to charts  https://review.opendev.org/700462  [18:13]
<rishabhhpe> fungi: i did not get what you are trying to say  [18:13]
*** michael-beaver has quit IRC  [18:18]
<fungi> rishabhhpe: i was conversing with clarkb about how in retrospect it should have been obvious to me from the quickstart document that the `docker-compose up` command will block the calling shell and stay in the foreground indefinitely  [18:21]
<rishabhhpe> fungi: clarkb: is there any documentation available for setting up third-party ci using zuul v3?  [18:25]
<fungi> rishabhhpe: i don't know of any. like i said earlier, you should talk to the projects who are asking you to do this testing and find out what they expect from you  [18:26]
<fungi> rishabhhpe: the opendev infra team has a help-wanted spec up about what that documentation might cover: https://docs.opendev.org/opendev/infra-specs/latest/specs/zuulv3-3rd-party-ci.html  [18:27]
<Shrews> tristanC: a few weeks back (which seems like years now), you suggested a web site that explained how technical documentation should be organized. do you recall what that was?  [18:29]
<tristanC> Shrews: it was: https://www.divio.com/en/blog/documentation/  [18:30]
<Shrews> yes! thank you  [18:31]
<tristanC> Shrews: you're welcome, i find this structure quite transformative :)  [18:33]
<Shrews> tristanC: do you have any documentation using this structure?  [18:34]
<Shrews> that's public  [18:34]
<tristanC> Shrews: https://docs.djangoproject.com/en/3.0/#how-the-documentation-is-organized  [18:35]
<Shrews> oh, yeah. that works. thx again  [18:35]
<tristanC> Shrews: https://docs.dhall-lang.org/ and my own attempt at it is: https://github.com/podenv/podenv  [18:36]
*** sgw has joined #zuul  [18:37]
<Shrews> ++  [18:38]
<fungi> https://zuul-ci.org/docs/zuul/developer/specs/multiple-ansible-versions.html is fully implemented now, right? is there any process we're using to mark completion of a specification in zuul?  [18:38]
<fungi> do we delete it? or remove the admonition from the top of the page, or...?  [18:38]
*** rishabhhpe has quit IRC  [18:40]
<tobiash> fungi: yes, that's fully implemented  [18:49]
<tobiash> I guess we don't have a process for marking completed specs yet?  [18:50]
<Shrews> i think also https://zuul-ci.org/docs/zuul/developer/specs/container-build-resources.html  [18:50]
<fungi> right, i was using multi-ansible as an example. that was really my question... i guess we need to decide how to signal when a spec is implemented  [18:51]
<Shrews> maybe just sections: Proposed and Completed  [18:51]
<fungi> i'll push up a possible solution  [18:51]
<fungi> well, and also removing or changing the admonition  [18:51]
<fungi> for the completed specs  [18:51]
<Shrews> yeah  [18:52]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Added nodepool to charts  https://review.opendev.org/700462  [18:52]
<fungi> also, do we expect specs to be updated to reflect the implementation, if the proposed plan differed somewhat from what we wound up with? that will impact how we present a completed spec to the reader, i guess  [18:52]
<openstackgerrit> Merged zuul/zuul master: Don't set ansible_python_interpreter if in vars  https://review.opendev.org/698774  [18:52]
<AJaeger> Regarding specs: I think with Zuul the team does a great job of updating docs, so do we need completed specs at all? Or isn't the documentation good enough that the spec can be removed from our published site?  [18:54]
<fungi> that seems like the simplest option to me. delete specs once implemented  [18:54]
<fungi> which is what i had assumed was going on before i started looking closer at what was in the specs directory  [18:55]
<fungi> the text isn't lost, since we'll have it in the git history forever  [18:55]
<AJaeger> exactly  [18:55]
<AJaeger> and a completed spec should be of interest only to developers implementing Zuul, shouldn't it? So, they have access to git history...  [18:56]
<fungi> agreed  [18:56]
<fungi> i'll start with that and we can use it at least as a springboard for talking about more complicated options  [18:56]
<AJaeger> If we delete specs, we might add a note to the specs site: Implemented specs get deleted, use git if you need them - and read our docs for full treatment  [18:56]
<fungi> good idea, i'll include that  [18:57]
<fungi> https://zuul-ci.org/docs/zuul/developer/specs/logs.html is done too, right?  [18:59]
<clarkb> the zuul js tarball is actually compressed now \o/  [18:59]
* fungi cheers  [19:00]
*** tjgresha has joined #zuul  [19:02]
<AJaeger> \o/  [19:02]
*** bhavikdbavishi has quit IRC19:04
*** tjgresha has quit IRC19:07
*** tjgresha has joined #zuul19:07
<openstackgerrit> Jeremy Stanley proposed zuul/zuul master: Remove implemented specs  https://review.opendev.org/701435  [19:19]
*** spsurya has quit IRC  [19:25]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Added nodepool to charts  https://review.opendev.org/700462  [19:28]
*** electrofelix has quit IRC  [19:33]
*** armstrongs has joined #zuul  [20:08]
*** armstrongs has quit IRC  [20:14]
*** tjgresha has quit IRC  [20:17]
*** tjgresha_ has joined #zuul  [20:17]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Add Zuul charts  https://review.opendev.org/700460  [20:25]
<mnaser> corvus: fyi i'm slowly progressing on the helm charts, nodepool seems ok and passing, i'll iterate on the rest but i don't have a lot of bandwidth these days so feel free to pick things up  [20:26]
<pabelanger> mnaser: haven't been following along, but is ansible-operator zuul not a thing anymore?  [20:27]
<mnaser> pabelanger: given time constraints and a bit of limbo in deciding where to go and take it, i settled for working on helm charts which are a bit less "invasive" in terms of work needed to deploy  [20:28]
<mnaser> ideally we can take the work we build there and convert it to an operator easily though  [20:28]
<corvus> in the end, i don't think it'll hurt to have both, if people need different deployment methods  [20:28]
<mnaser> yep that too  [20:29]
<pabelanger> yup, mostly curious if the other was dropped  [20:30]
<mnaser> yeah it's a lot more work for the other one and this works enough  [20:31]
<corvus> mnaser: thanks, i'm hoping to resume work on the gerrit-project zuul work soon where i think we can use the helm charts. but i'm blocked waiting on account creation right now, so probably not this week.  [20:33]
<mnaser> corvus: cool, i have a basic deployment running fine with no problems. the only issue is the runc zombie processes  [20:33]
<corvus> mnaser: what runc processes?  [20:33]
<mnaser> corvus: or maybe it was bwrap processes?  [20:34]
<corvus> yeah, probably bwrap.  [20:34]
<corvus> but i haven't seen that yet -- any theories about the cause?  [20:34]
* mnaser looks to check  [20:37]
<mnaser> i'm not really sure, i'm curious if it's the fact that it's a container-inside-container thing  [20:37]
<mnaser> ok i have a few here, let me get a paste  [20:37]
*** tjgresha_ has quit IRC  [20:37]
<mnaser> https://www.irccloud.com/pastebin/2qNlcLxM/  [20:38]
<mnaser> corvus: i wonder if it has to do with zuul running as uid 0 in the container..  [20:38]
<mnaser> in the case of https://opendev.org/zuul/zuul/src/branch/master/Dockerfile -- we don't seem to make a non-privileged user like in nodepool  [20:40]
<fungi> we're definitely not getting that on production executors in opendev, so could be container-related  [20:41]
<mordred> mnaser: that process list doesn't seem to show dumb-init  [20:41]
<mnaser> i don't think the zuul images have dumb-init, or do they  [20:41]
<fungi> looks like crio is running the executor directly?  [20:41]
*** pcaruana has quit IRC  [20:41]
<mnaser> correct, that's pretty much using the k8s helm charts that are up for review  [20:42]
<mnaser> so https://review.opendev.org/#/c/700460/5/charts/zuul/templates/executor/statefulset.yaml  [20:42]
<fungi> but yeah, maybe dumb-init is needed to clean up terminated subprocess forks there?  [20:42]
<mordred> mnaser: https://opendev.org/opendev/system-config/src/branch/master/docker/python-base/Dockerfile#L23  [20:42]
<corvus> the executor does not wait() on processes it kills, so if there is no init, that would explain zombie processes in those cases (aborted/timed out jobs)  [20:42]
<mnaser> o  [20:42]
<fungi> that makes sense  [20:43]
<mordred> mnaser: maybe doing command zuul-executor in the chart is also overriding the base image entrypoint?  [20:43]
<mnaser> "Note: The command field corresponds to entrypoint in some container runtimes. Refer to the Notes below."  [20:43]
<mnaser> https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes  [20:43]
<corvus> ah yeah, is there an option to do the other thing? where it adds extra args to the entrypoint?  [20:44]
<corvus> mnaser: also, why set "command" in the helm chart?  [20:44]
<mordred> ah - but you could also just supply it as args  [20:44]
<mnaser> i wrote this a while back so i'm not sure if it didn't work using args only  [20:45]
<mnaser> let me try it out right now  [20:45]
<mordred> nod  [20:45]
<mnaser> i def remember there was a reason behind that  [20:45]
<mordred> I'd start with corvus' question - shouldn't need it at all  [20:45]
<mordred> but if it is needed, I'd try just doing args  [20:45]
<corvus> maybe to add the "-d" ?  [20:45]
<mordred> ah - that's a good point  [20:46]
<corvus> but yeah, let's try switching that to "args: .../zuul-executor -d"  [20:46]
<mordred> so either args: or just add /usr/bin/dumb-init -- to the beginning of the command list  [20:46]
<mnaser> yeah -d is all i need it for  [20:46]
<corvus> and if "args" fails, do mordred's full command suggestion  [20:46]
<mnaser> it might be leftovers from like different iterations  [20:46]
<corvus> and let's, someday, add an env variable to toggle debug logs to make this nicer  [20:46]
<mnaser> well this goes back to an old patch i never had time to get around to, which is  [20:47]
<mnaser> running zuul in the foreground implies debug mode i think  [20:47]
<mnaser> it failed  [20:47]
<mnaser> [dumb-init] -d: No such file or directory  [20:47]
<mordred> what did you do for args?  [20:47]
<mnaser> no command, just: args: ['-d']  [20:48]
<mnaser> args: ['zuul-executor', '-d'] could be a thing but i feel like that's wrong(tm)  [20:48]
<mordred> that's the thing  [20:48]
<mordred> I think it needs to be args: ['/usr/local/bin/zuul-executor', '-d']  [20:48]
<mordred> because command == entrypoint and args == cmd  [20:48]
<mordred> you know ...  [20:49]
<mnaser> i assume this is largely because zuul requires a sort of init?  [20:49]
<mordred> yeah - you need an init to reap child processes  [20:49]
<mnaser> okay so this is the sort of thing where it'd be useful to have a note explaining why zuul-executor needs this  [20:49]
<mordred> nod  [20:49]
<mnaser> ok, let me try with those args and update that chart if that's the case  [20:50]
*** tjgresha has joined #zuul  [20:50]
<mordred> mnaser: we have it in the entrypoint for all of the images just so that things work right in general - but didn't know until just now that k8s did this with entrypoints  [20:50]
*** tjgresha has quit IRC  [20:51]
<mordred> so - lesson definitely learned for the day  [20:51]
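The working spec mnaser pastes next isn't preserved in this log, but the fix being described amounts to something like this sketch of the executor container (image tag and paths illustrative): leave command unset so the image's dumb-init ENTRYPOINT survives, and pass the program plus -d as args.

    # hypothetical excerpt of the executor StatefulSet template
    containers:
      - name: executor
        image: zuul/zuul-executor:latest
        # no "command:" here -- that would replace the image ENTRYPOINT
        # (dumb-init), and without an init the executor's killed job
        # processes are never reaped and accumulate as zombies
        args:
          - /usr/local/bin/zuul-executor
          - "-d"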
<mnaser> https://www.irccloud.com/pastebin/DYtOn7Ul/  [20:52]
<mnaser> ok, so i'll monitor things and update the chart with that and whatever has failed the chart linter  [20:52]
<fungi> does zuul-merger need it as well (for git subprocesses)?  [20:52]
<mnaser> i haven't seen any zombie processes in the merger but i assume that's less likely to have things go 'wrong'  [20:53]
<mnaser> or maybe it hasn't in my case  [20:53]
<corvus> i think we should do the same for all the images; let's run dumb-init everywhere  [20:54]
<corvus> and, basically, if we're overriding the command, override the full thing  [20:54]
<corvus> *alternatively* we could set entrypoint to "dumb-init zuul-executor" in zuul's dockerfile then args could be "-d"  [20:56]
<corvus> but i think we'd have to make that change carefully with announcements, so let's shelve that for now.  [20:56]
<openstackgerrit> Mohammed Naser proposed zuul/zuul-helm master: Add Zuul charts  https://review.opendev.org/700460  [20:56]
<mnaser> corvus: ok, updated all services to use dumb-init with the above, addressed tristanC's comments and fixed some lint stuff  [20:56]
<corvus> cool!  [20:57]
<mnaser> i split out zuul and nodepool into two different charts  [20:58]
<mnaser> and then my idea was to have some zuul-ci chart that has both included inside of it  [20:58]
<mnaser> in the case that someone wants to use one or the other for a specific use case, and the 'zuul-ci' chart would be more of a 'distro' (aka it includes zookeeper, mysql, etc)  [20:58]
<corvus> or "zuul-system"  [20:58]
<mnaser> yeah, something like that!  [20:58]
<mnaser> that's a way better name heh  [20:59]
<corvus> btw, i think we can make a job that spins up a k8s and runs the helm chart for real  [20:59]
<corvus> using speculative registries and can cross-test changes to zuul images  [20:59]
<mordred> corvus: ++  [21:00]
*** Goneri has quit IRC  [21:00]
<corvus> i just wrote a change to add crio support to that, so it'll work with fully-qualified image names in non-dockerhub registries in k8s: https://review.opendev.org/701237  [21:00]
<mnaser> corvus: there's a few tricks with that if you're trying to depend on a helm chart that hasn't merged yet (it has to live in a repo) -- but i think we can overcome those  [21:04]
<corvus> i'm looking forward to digging into it :)  [21:04]
<mnaser> also, totally my next step is to run a full zuul and k8s job, but also if we want to start publishing those for users, we *really* need tagged images for zuul  [21:04]
<mnaser> that way we can leverage the appVersion inside the helm chart and point it directly at the right image (and makes those reliably consumable)  [21:05]
<mordred> it seems like appVersion in the chart would make speculative images harder to work with  [21:05]
<mordred> but it'll be fun to poke at and figure out  [21:05]
<corvus> mnaser: we should get tagged images i agree -- however, as a potential user of the helm charts, i'm actually only interested in running master CD  [21:06]
<mordred> yeah  [21:06]
<corvus> just some user-level input fyi  [21:06]
<pabelanger> clarkb: we just flipped zuul.a.c to ansible 2.9, 1 issue found with the version_compare() filter. It was renamed to version()  [21:06]
<mordred> I think it'll be interesting to figure out how to support CD-from-master as well as tagged releases and to do both reliably  [21:06]
<mordred> pabelanger: yay  [21:07]
<clarkb> pabelanger: yup I had to fix that in zuul-jobs  [21:07]
<pabelanger> clarkb: configure-mirrors?  [21:08]
<pabelanger> (we fork that today)  [21:08]
<clarkb> yes, for distro version checks  [21:08]
<pabelanger> yah  [21:08]
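For reference, the rename looks like this in a task's when clause (a sketch; the version number is illustrative):

    - name: Do the bionic-or-newer thing
      debug:
        msg: "running on Ubuntu 18.04 or newer"
      # up to Ansible 2.8 this was written as:
      #   when: ansible_distribution_version is version_compare('18.04', '>=')
      # with Ansible 2.9 the test is named 'version':
      when: ansible_distribution_version is version('18.04', '>=')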
<mnaser> mordred, corvus: totally possible and very easy, we simply define (imagine this as a yaml tree): image.tag: "{{ .Chart.appVersion }}"  [21:29]
<mnaser> so by default we use the one defined by the chart, but a user can use their own values file to override image.tag to "latest"  [21:29]
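As a sketch of that pattern (file names and image name are illustrative); the template falls back to the chart's appVersion unless a values override supplies image.tag:

    # values.yaml (chart defaults)
    image:
      repository: zuul/zuul-executor
      tag: ""   # empty means "use the chart's appVersion"

    # templates/executor/statefulset.yaml (excerpt, a Go template line)
    #   image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"

    # my-values.yaml (a user tracking master CD instead)
    image:
      tag: latest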
<mordred> cool  [21:34]
*** rlandy is now known as rlandy|bbl23:08
*** mattw4 has quit IRC23:45
