Tuesday, 2017-01-10

*** hashar has quit IRC  00:02
*** saneax is now known as saneax-_-|AFK  00:56
*** Shuo has joined #zuul  01:09
<Shuo> For the dashboard showing the "Testing Nodes", it shows the number of VMs in use. Can I somehow translate that into bare-metal machine capacity (if someone starts thinking about the budgeting aspect of such a problem)?  01:13
<clarkb> Shuo: we don't have insight into how our cloud providers pack and oversubscribe. But each of our test instances is 8vcpu x 8GB ram x at least 80GB disk  01:24
<Shuo> clarkb: I would imagine the virtual-to-physical virtualization factor for CPU can be pretty big (let's say 2:1 for now), but the memory might be 1:1. So I would imagine a 64GB + 16-CPU machine can give us 8 VMs. So my ballpark guesstimate (I don't need to be precise for this :-) ) is that 200 such physical machines can serve ~1500 build VMs, which is not too scary from a budgeting perspective.  02:11
<mordred> jeblair: shade/nodepool gate fixed - feel free to +A those two changes  02:12
<mordred> Shuo: yes - I believe we have guesstimated similar numbers  02:14
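
A minimal sketch of the back-of-envelope math above, using the flavor clarkb quoted (8 vCPU x 8GB per VM) and Shuo's assumed host size and oversubscription ratios; host capacity is bound by the tighter of the CPU and memory limits:

```python
# Back-of-envelope host capacity, using the numbers from this exchange.
# The oversubscription ratios and host size are Shuo's assumptions,
# not measured values from any provider.

VM_VCPU, VM_RAM_GB = 8, 8          # test instance flavor (per clarkb)
HOST_CORES, HOST_RAM_GB = 16, 64   # hypothetical physical machine
CPU_RATIO, RAM_RATIO = 2.0, 1.0    # assumed vcpu:core and vram:ram packing

vms_by_cpu = HOST_CORES * CPU_RATIO // VM_VCPU
vms_by_ram = HOST_RAM_GB * RAM_RATIO // VM_RAM_GB
vms_per_host = int(min(vms_by_cpu, vms_by_ram))  # the tighter limit wins

print("%d VMs/host -> %d VMs on 200 hosts" % (vms_per_host, 200 * vms_per_host))
```

Note that with these exact numbers the CPU side binds at 4 VMs per host; reaching 8 VMs per host (and hence ~1500 VMs on 200 machines) implicitly assumes roughly 4:1 CPU oversubscription.
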
<Shuo> mordred: thanks :-)  02:17
<Shuo> mordred: how did we plan for network proximity? (I heard that those physical machines are located in different vendors' DCs, and they are continents apart from each other?)  02:24
<Shuo> my question is more or less about how we move the bits around (say, if we need some new version of an image, which might be a big chunk of bits) - are we potentially going to have network bandwidth problems?  02:28
<mordred> Shuo: ah - so ... we do have that situation (we have at least one cloud region in europe)  02:36
<mordred> in our case, we deal with everything in terms of regions of clouds and treat regions as physically disparate, even if they are regions from the same provider  02:36
<mordred> this means we have to upload new images to each region when we make them (which for us is daily, and many images, and is a LOT of bandwidth)  02:37
<mordred> it also means that we have implemented a mirror infrastructure which has per-region mirrors of frequently fetched artifacts  02:37
<mordred> and we have scripts that run when a node reaches ready state that set local config on the node to point to the mirror in the same region as the node  02:38
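
A hypothetical sketch of the ready-script idea mordred describes: derive a mirror hostname from the node's cloud and region, then write it into local package configuration. The hostname pattern and the pip config path here are illustrative assumptions, not the project's actual scripts:

```python
# Hypothetical ready-script logic: point a freshly booted node at the
# artifact mirror in its own region. Hostname pattern and config path
# are assumptions for illustration.

def mirror_url(cloud, region):
    return "http://mirror.%s.%s.example.org" % (region.lower(), cloud.lower())

def write_pip_conf(cloud, region, path="/etc/pip.conf"):
    # Direct pip at the per-region mirror so jobs never cross the WAN
    # for frequently fetched packages.
    with open(path, "w") as f:
        f.write("[global]\nindex-url = %s/pypi/simple\n" % mirror_url(cloud, region))

write_pip_conf("somecloud", "regionone")
```
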
<Shuo> mordred: could you elaborate a bit on "a LOT of bandwidth" by giving a back-of-envelope calculation?  02:38
<mordred> Shuo: well - we don't pay for bandwidth, so we have made some choices with that in mind ...  02:38
<mordred> Shuo: each of our images is in the 8G range  02:39
<Shuo> mordred: if you need to update (push) new images to each region once a day, how does it cost "A LOT" of bandwidth?  02:39
<mordred> and we have ... 6 images and 14 regions of cloud  02:40
<mordred> so we push 672G of image updates to our clouds daily from our image build machines  02:41
<mordred> _roughly_  02:41
<mordred> the number is slightly higher because the rackspace regions take vhd format which is about twice as big  02:41
<Shuo> mordred: is 20 images a good number to think about for that problem? if so, 8GB/image * 20 = 160GB (~1280 Gbit) toward a 'region'  02:41
<mordred> I honestly think 20 is a bit high - but it would _certainly_ be a good high-water mark  02:42
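
The daily-upload arithmetic from this exchange, spelled out; the base figure comes straight from mordred's numbers, while the count of Rackspace regions is an illustrative assumption:

```python
# Daily image-upload volume, using the figures from the conversation:
# 6 images of ~8GB each pushed to 14 regions, with Rackspace regions
# taking vhd images roughly twice as big. The number of Rackspace
# regions (3) is an illustrative assumption.

IMAGE_GB, NUM_IMAGES, NUM_REGIONS = 8, 6, 14
RAX_REGIONS = 3  # assumption; vhd roughly doubles the upload for these

base = IMAGE_GB * NUM_IMAGES * NUM_REGIONS          # 672 GB/day
vhd_overhead = IMAGE_GB * NUM_IMAGES * RAX_REGIONS  # extra upload for vhd
print("~%d GB/day base, ~%d GB/day with vhd overhead" % (base, base + vhd_overhead))

# Shuo's high-water mark: 20 images at 8GB each, toward one region
print("high-water per region: %d GB (~%d Gbit)" % (IMAGE_GB * 20, IMAGE_GB * 20 * 8))
```
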
<Shuo> mordred: thanks, you already gave me the answer to my previous question  02:42
<mordred> \o/  02:42
<mordred> Shuo: fwiw, we build new images and they are that size because we pre-cache a bunch of things in the image in an attempt to reduce internet traffic during jobs  02:44
<mordred> it's possible that for another installation with different usage characteristics than ours, a different tradeoff might be desired  02:44
<Shuo> mordred: I guess the above is the main bandwidth cost, right? for each individual build/test job, we only need to pull down the python code from github, which is tiny, right?  02:45
<mordred> well - you'd think - but the nova repository is actually quite large (as are a few of the others)  02:46
<mordred> so we actually have copies of every git repo we deal with in our base images, so that the only thing we're fetching at job time is the proposed changes (and any other changes that might have landed that day)  02:46
<mordred> we also don't clone from github - the failure rate is too high  02:46
<mordred> we run a farm of 8 git mirrors behind a load balancer - although that is currently centralized  02:47
<mordred> we have plans to investigate per-region git mirrors, but haven't yet done that  02:47
<Shuo> mordred: hmm, then how do you manage codeline consistency?  02:47
<mordred> Shuo: what do you mean?  02:48
<Shuo> mordred: let me try to recap what you said....  02:48
<Shuo> mordred: first, we have a once-a-day network consumption to push the daily-built new image to each region;  02:49
<mordred> yes  02:49
<Shuo> mordred: then, we have a per-build/test code pull (and I thought it was a tiny amount of traffic, but you said it could also be a huge amount. Ok, let's make a parking lot for that for now and come back to it later). And you said you have 8 git servers for different regions to pull from - is this a correct understanding?  02:52
<mordred> yes. BUT - it _should_ be a tiny amount of traffic, as it should at most be a daily delta from what we cached in our image  02:53
<mordred> the main reason I mention it is that our particular approach to pre-caching and mirroring might be overkill for you - and if it is, it could make those image sizes smaller  02:53
<Shuo> mordred: "run a farm of 8 git mirrors...", how are these 8 mirrors synced with the primary git? (I just wonder if there exists a tiny window where a mirror git does not yet hold a commit when the test job asks for it)  02:57
<Shuo> mordred: not sure if I asked my question clearly above -- let me know if I did not  02:58
<mordred> Shuo: our gerrit server replicates to them on push - so yes, there is a possible race, but the zuul cloning takes that into account  02:58
<Shuo> mordred: let me try to describe the different git servers' sync workflow here, and please correct me if I am not making the right description...  03:00
<Shuo> mordred: 1) the code passing the final gating test gets merged into HEAD on the Gerrit git server; 2) the Gerrit git server then pushes that commit onto the 8 git replica servers; 3) any build/test job fetches from one of the replica git servers; 4) if a particular commit is not available (yet) on a replica git, zuul might wait for some time and retry, I guess. Close enough? ;-)  03:03
<mordred> yes. that is it  03:05
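
A sketch of what "the zuul cloning takes that into account" could look like in general terms - retry a fetch until the replicated ref shows up on the mirror. This illustrates the technique, not Zuul's actual cloner code:

```python
# Tolerating the gerrit -> mirror replication race: if the ref we were
# asked to test has not replicated to the mirror yet, wait and retry.
# A sketch of the general technique, not Zuul's implementation.

import subprocess
import time

def fetch_ref(repo_dir, remote, ref, attempts=5, delay=2.0):
    for _ in range(attempts):
        result = subprocess.run(["git", "fetch", remote, ref],
                                cwd=repo_dir, capture_output=True)
        if result.returncode == 0:
            return  # the mirror has caught up
        time.sleep(delay)  # replication may still be in flight
    raise RuntimeError("%s never appeared on %s" % (ref, remote))
```
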
<mordred> clarkb: ^^ please correct me if I've lied to Shuo ... my brain doesn't always work perfectly  03:05
<Shuo> is github.com/openstack/A-project one of the 8 replicas? or is it the 9th?  03:06
<Shuo> mordred: ^^  03:13
<mordred> Shuo: it's the 9th  03:17
<mordred> we don't actually use github for anything - we just replicate there for dev convenience  03:18
<mordred> Shuo: my flight is landing, so I'm going to afk ...  03:19
<Shuo> mordred: kk, thanks for sharing...  03:20
<Shuo> mordred: regarding the github question, I was just trying to understand whether it's purely a push (in that sense it is the same as the other 8 replicas) from the gerrit git server (the 'master' git)  03:21
*** Shuo has quit IRC  03:37
*** bhavik1 has joined #zuul  04:39
*** bhavik1 has quit IRC  04:43
*** saneax-_-|AFK is now known as saneax  05:17
*** Cibo_ has joined #zuul  05:28
*** bhavik1 has joined #zuul  05:31
*** bhavik1 has quit IRC  05:56
*** saneax is now known as saneax-_-|AFK  08:10
*** saneax-_-|AFK is now known as saneax  08:18
*** hashar has joined #zuul  08:32
*** saneax is now known as saneax-_-|AFK  09:11
*** saneax-_-|AFK is now known as saneax  09:33
*** pabelanger has quit IRC  09:35
*** wznoinsk has quit IRC  09:35
*** mmedvede has quit IRC  09:35
*** wznoinsk has joined #zuul  09:35
*** pabelanger has joined #zuul  09:35
*** mmedvede has joined #zuul  09:36
*** rbergeron has quit IRC  09:37
*** rbergeron has joined #zuul  09:37
*** hogepodge has quit IRC  09:38
*** SpamapS has quit IRC  09:38
*** hogepodge has joined #zuul  09:39
*** openstack has joined #zuul  14:31
*** saneax-_-|AFK is now known as saneax  14:32
*** saneax is now known as saneax-_-|AFK  14:36
*** saneax-_-|AFK is now known as saneax  14:40
*** Cibo_ has quit IRC  15:00
<openstackgerrit> Lenny Verkhovsky proposed openstack-infra/nodepool: Fixed typo in info msg  https://review.openstack.org/418435  15:08
*** mptacekx has quit IRC  16:05
*** saneax is now known as saneax-_-|AFK  16:55
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Remove unsed variables  https://review.openstack.org/418492  17:12
*** herlo has quit IRC  17:22
*** herlo has joined #zuul  17:27
*** herlo has joined #zuul  17:27
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages  https://review.openstack.org/414273  17:37
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages  https://review.openstack.org/412160  17:41
*** hashar has quit IRC  18:02
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder  https://review.openstack.org/418137  18:04
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages  https://review.openstack.org/412160  18:18
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Allow nodepool-builder to only build diskimages  https://review.openstack.org/412160  18:31
*** Shuo has joined #zuul  18:34
<openstackgerrit> Merged openstack-infra/nodepool: Source glean installs in simple-init  https://review.openstack.org/414662  18:48
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages  https://review.openstack.org/414273  19:00
*** klindgren has joined #zuul  19:45
<harlowja> rbergeron would u be the guy that i can perhaps ask ansible tower questions to? sorta new to it, and klindgren and I had some questions about functionality  19:58
<harlowja> (nothing crazy hard)  19:58
<openstackgerrit> James E. Blair proposed openstack-infra/zuul: Separate driver interfaces and make abstract  https://review.openstack.org/418554  20:00
<rbergeron> harlowja: not really (being community gal, and it's not open source, so i've had super limited interaction with it --  20:02
<rbergeron> but happy to find the right person :)  20:02
<jeblair> jhesketh, jamielennox, jlk, SpamapS: ^ the parent of 418554 is the restructuring-into-drivers change which you may recall; 418554 is all about making a nice api for driver implementors. it's one approach we could take. please check it out and let me know if you like that direction, or think something else would be better.  20:03
<harlowja> rbergeron thx. klindgren i think had a question, hopefully he remembers (pattern matching related or something)  20:28
<klindgren> basically, does ansible tower have a rules engine? As an example, can I feed it notifications of alerts from a system, and have it do matching against those alerts to trigger specific jobs? Or is the expectation that, via the API, something else needs to do that work and just call a defined playbook?  20:31
<harlowja> (or perhaps said thing is a planned-but-not-yet feature)  20:39
<Shrews> harlowja: is there a safe way to delete a lock node after it is released?  20:46
<Shrews> harlowja: with kazoo, that is  20:47
<harlowja> hmmm, do u know if it's unused by others?  20:48
<harlowja> if known unused, ya, just delete the lock directory  20:50
<Shrews> harlowja: well now, that's the issue  :)  20:50
<harlowja> if not known unused  20:50
<Shrews> i was hoping there was an option to release() that would delete it before unlocking it, but alas, no  20:50
<harlowja> ya, that'd involve a little bit more, cause locks, at least via kazoo, aren't just single nodes  20:51
<harlowja> they are directories  20:51
<Shrews> jeblair: i think we may need a cleanup thread in nodepoold to clean up request locks that are older than X  20:51
<jeblair> Shrews: yeah, or lock nodes that are for request nodes that don't exist any more  20:51
<harlowja> so a release() method would almost need to drop a "this_lock_is_dead" file and wake other waiters to tell them to drop their waiting  20:52
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages  https://review.openstack.org/414273  20:53
<harlowja> Shrews but one idea - try to delete the lock directory  20:55
<harlowja> i don't think https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/lock.py#L184 will handle it nicely, but one way to find out  20:55
<harlowja> lol  20:55
<Shrews> harlowja: we'll just do the cleanup thread idea  :)  20:56
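
A minimal sketch of the cleanup-thread idea being settled on here: periodically scan a lock root and delete request-lock znodes that have no contenders and have not been modified recently. The znode path and the age threshold are assumptions for illustration, not nodepool's actual layout; kazoo reports znode mtime in milliseconds:

```python
# Sketch of the stale request-lock cleanup discussed above. The
# /nodepool/request-locks path and 8-hour threshold are assumptions,
# not nodepool's actual layout.

import time

from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError, NotEmptyError

LOCK_ROOT = "/nodepool/request-locks"  # assumed znode layout
MAX_AGE_MS = 8 * 3600 * 1000           # assumed staleness threshold

def cleanup_stale_locks(zk):
    now_ms = time.time() * 1000
    for name in zk.get_children(LOCK_ROOT):
        path = "%s/%s" % (LOCK_ROOT, name)
        try:
            if zk.get_children(path):
                continue  # a holder or waiter is present; leave it alone
            _, stat = zk.get(path)
            if now_ms - stat.mtime < MAX_AGE_MS:
                continue  # recently touched; not stale yet
            # Non-recursive delete: if a locker created a contender
            # znode between our check and here, this raises
            # NotEmptyError instead of destroying their lock.
            zk.delete(path)
        except (NoNodeError, NotEmptyError):
            pass  # lost a race with a locker or another cleaner

zk = KazooClient(hosts="localhost:2181")
zk.start()
cleanup_stale_locks(zk)
zk.stop()
```

The non-recursive delete is what makes this safe against the race harlowja points out: contenders show up as children of the lock directory, so a delete that refuses to remove a non-empty node can never steal a live lock.
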
<harlowja> k  20:56
<rbergeron> klindgren / harlowja: will ask. no idea. :)  21:09
<harlowja> :)  21:09
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests  https://review.openstack.org/418585  21:45
<jhesketh> Morning  21:46
<Shrews> jeblair: hrm, i just noticed that we've made some changes to nodepool/zk.py after cutting the latest features/zuulv3 branch. i guess we can merge those in later, but we'll probably get conflicts.  21:46
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests  https://review.openstack.org/418585  21:55
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool: Add framework for handling node requests  https://review.openstack.org/418585  21:58
<harlowja> Shrews how's the overall zookeeper stuff going btw?  21:59
<Shrews> harlowja: pretty decently so far. the current nodepool image builder we run in infra production is using it pretty successfully  22:01
<harlowja> cool  22:10
<pabelanger> much better than gearman at this point. Zero backed-up builds  23:11
*** saneax-_-|AFK is now known as saneax  23:17
*** morgan has quit IRC  23:23
*** morgan_ has joined #zuul  23:38
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: [WIP] Support AFS mirrors for nodepool diskimages  https://review.openstack.org/414273  23:40
<openstackgerrit> Merged openstack-infra/nodepool: Remove the ability for nodepoold to launch a builder  https://review.openstack.org/418137  23:41

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!