jeblair | 00:06 < openstackgerrit> James E. Blair proposed openstack-infra/infra-specs: Zuul v3: Add section on secrets https://review.openstack.org/386281 | 00:07 |
jeblair | i've updated that based both on conversations from the summit meetup as well as a subsequent conversation with mordred | 00:07 |
jhesketh | jeblair: left a comment if you're still around | 00:40 |
*** saneax is now known as saneax-_-|AFK | 00:53 | |
jeblair | jhesketh: thanks; replied and revision forthcoming | 00:54 |
jhesketh | cool, glad I wasn't completely off, will wait for the next iteration | 00:56 |
jeblair | 00:59 < openstackgerrit> James E. Blair proposed openstack-infra/infra-specs: Zuul v3: Add section on secrets https://review.openstack.org/386281 | 01:01 |
jhesketh | responded | 01:19 |
openstackgerrit | watanabe isao proposed openstack-infra/nodepool: Add ssh timeout to client https://review.openstack.org/329799 | 01:45 |
*** persia has quit IRC | 03:59 | |
*** persia has joined #zuul | 04:01 | |
*** bcoca has quit IRC | 05:58 | |
*** abregman has joined #zuul | 06:05 | |
*** harlowja_ has quit IRC | 06:33 | |
*** openstackgerrit has quit IRC | 07:48 | |
*** openstackgerrit has joined #zuul | 07:49 | |
*** abregman_ has joined #zuul | 08:26 | |
*** abregman_ has quit IRC | 08:26 | |
*** abregman has quit IRC | 08:29 | |
*** hashar has joined #zuul | 08:32 | |
*** hashar has quit IRC | 08:32 | |
*** abregman has joined #zuul | 08:32 | |
*** hashar has joined #zuul | 08:38 | |
*** hashar_ has joined #zuul | 08:41 | |
*** hashar has quit IRC | 08:43 | |
*** abregman has quit IRC | 08:52 | |
*** hashar_ is now known as hashar | 09:06 | |
*** abregman has joined #zuul | 09:08 | |
*** yolanda has quit IRC | 09:31 | |
*** yolanda has joined #zuul | 09:31 | |
*** abregman is now known as abregman|afk | 10:47 | |
*** abregman|afk is now known as abregman | 11:34 | |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Support GitHub PR webhooks https://review.openstack.org/163117 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Make the string representation of change transparent https://review.openstack.org/238948 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Merge pull requests from github reporter https://review.openstack.org/243250 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Encapsulate determining the event purpose https://review.openstack.org/247487 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Fix job hierarchy bug. https://review.openstack.org/192457 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: support github pull reqeust labels https://review.openstack.org/247421 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Better merge message for GitHub pull reqeusts https://review.openstack.org/280667 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Add 'push' and 'tag' github webhook events. https://review.openstack.org/191207 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Add 'pr-comment' github webhook event https://review.openstack.org/239203 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Support for dependent pipelines with github https://review.openstack.org/247500 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Configurable SSH access to GitHub https://review.openstack.org/239138 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: GitHub file matching support https://review.openstack.org/292376 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Allow github trigger to match on branches/refs https://review.openstack.org/258448 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Log GitHub API rate limit https://review.openstack.org/292377 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Set filter according to PR/Change in URL https://review.openstack.org/325300 | 13:01 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Allow using webapp from connections https://review.openstack.org/215642 | 13:02 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Allow list values in template parameters. https://review.openstack.org/191208 | 13:02 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Add basic Github Zuul Reporter. https://review.openstack.org/191312 | 13:02 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Support for github commit status https://review.openstack.org/239303 | 13:02 |
*** bcoca has joined #zuul | 14:30 | |
openstackgerrit | Merged openstack-infra/nodepool: Remove OldNodePoolBuilder class https://review.openstack.org/392884 | 14:48 |
*** yolanda has quit IRC | 15:11 | |
dmsimard | pabelanger: any news in regards to zuul merger namespacing ? | 15:22 |
pabelanger | dmsimard: not yet, going to try and update feature/zuulv3 first. Make sure everybody agrees on that implementation, then see about backporting to zuulv2 | 15:25 |
*** yolanda has joined #zuul | 15:26 | |
*** abregman has quit IRC | 15:33 | |
openstackgerrit | Merged openstack-infra/zuul: Ansible launcher: move AFS publisher into a module https://review.openstack.org/394658 | 16:20 |
*** harlowja has joined #zuul | 16:31 | |
*** harlowja has quit IRC | 16:32 | |
*** harlowja has joined #zuul | 16:34 | |
*** hashar is now known as hasharAway | 16:40 | |
SpamapS | jeblair: so, I want to make sure we have only one source of truth. I was thinking I'd send an email to openstack-infra asking people to use storyboard for v3. How does that sound? | 17:59 |
* SpamapS should have done that back when the list went in as stories and tasks | 17:59 | |
mordred | SpamapS: that sounds completely sane to me | 18:00 |
jeblair | SpamapS: context? | 18:03 |
jeblair | what would the other source of truth be? | 18:03 |
SpamapS | jeblair: pabelanger wasn't aware of storyboard and there was mention of etherpads | 18:04 |
SpamapS | I just realized we haven't told everyone to do that. | 18:04 |
pabelanger | Ya, I've been just using gerrit up until now | 18:05 |
jeblair | SpamapS: yes. now that i realize you are done setting up storyboard, i have retired the etherpad. | 18:05 |
pabelanger | at least to check which tasks were done | 18:05 |
jeblair | SpamapS: i think an email would be good | 18:05 |
SpamapS | cool will send shortly | 18:09 |
mordred | jeblair: weekly meeting is mon 2200 yeah? | 18:09 |
jeblair | mordred: yes -- http://eavesdrop.openstack.org/#Zuul_Meeting | 18:10 |
jeblair | in -alt | 18:10 |
jeblair | so everybody be sure to show up for that on monday :) | 18:11 |
*** Shuo_ has joined #zuul | 18:26 | |
SpamapS | sent | 18:27 |
*** jamielennox|away is now known as jamielennox | 18:37 | |
Shrews | SpamapS: How come when I search https://storyboard.openstack.org/#!/search for the zuulv3 tag (as your email suggests), it does not return all items from your workboard (https://storyboard.openstack.org/#!/board/41), even though they are clearly marked with that tag? | 18:43 |
Shrews | Am I missing something about how storyboard works? | 18:43 |
jeblair | Shrews: i will note that the search result returns a very suspicious 10 results | 18:46 |
Shrews | indeed | 18:47 |
jeblair | that's a number that shows up a lot in storyboard :/ | 18:47 |
* jeblair heads over to #storyboard | 18:47 | |
mordred | zuulv3 search gets me a link to the workboard | 18:48 |
Shrews | mordred: try just the "tag" search. should show up as a popup in the search bar (with a tag icon next to it) | 18:49 |
jeblair | mordred: yeah, i think that's a text search | 18:49 |
jeblair | mordred: as compared with what Shrews said | 18:50 |
mordred | AH | 18:51 |
mordred | Shrews: oh! I got a bunch of things | 18:52 |
mordred | (more than 10) | 18:52 |
mordred | jeblair: ^^ | 18:52 |
Shrews | profile says my page size is 10 | 18:53 |
Shrews | yet i cannot go to the next page | 18:53 |
jeblair | yeah, if i set my profile to 100, i get more than 10 | 18:53 |
SpamapS | when I search for the _tag_ I get all of it. | 18:54 |
SpamapS | oh but yeah, I have it set to 100 | 18:54 |
jeblair | my profile resets to 10 every time i log in | 18:54 |
mordred | jeblair: that's odd - fwiw, mine does not reset every time I log in | 18:57 |
mordred | seems like a bug | 18:57 |
Shrews | mine does reset | 18:58 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Combine ZKTestCase with DBTestCase https://review.openstack.org/383962 | 19:03 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Supply ZK connection information to test configs https://review.openstack.org/383963 | 19:03 |
jeblair | Shrews: what's still WIP about 592? | 19:04 |
Shrews | jeblair: i wanted to run nodepool-builder by hand to test the changes | 19:05 |
Shrews | which i'm setting up to do now | 19:05 |
Shrews | jeblair: but so far it's working ok | 19:09 |
Shrews | (CONNECTED) /nodepool/image/devstack-trusty/builds> get 0000000001 | 19:11 |
Shrews | {"state": "building", "builder": "localhost.localdomain", "state_time": 1478804951} | 19:11 |
Shrews | y | 19:11 |
mordred | Shrews: that seems positive | 19:13 |
Shrews | if post build data looks ok, and delete code runs ok, i'll un-WIP | 19:13 |
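The build record Shrews pasted above is plain JSON stored in a ZooKeeper znode (under /nodepool/image/devstack-trusty/builds). A minimal sketch of decoding such a record in Python; the field names come from the paste, everything else here is illustrative:

```python
import json

# Payload exactly as pasted from the zkCli session above.
payload = '{"state": "building", "builder": "localhost.localdomain", "state_time": 1478804951}'

record = json.loads(payload)

# "state" moves through values like building/ready (and, for nodes,
# used/delete, per the discussion later in this log).
print(record["state"])    # building
print(record["builder"])  # localhost.localdomain
```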
mordred | \o/ | 19:15 |
mordred | Shrews: zomg. that means we're pretty much like almost done with that and stuff amirite? | 19:16 |
clarkb | we should enable all the tests again too and get the integration test working | 19:16 |
Shrews | yeah, but did find a minor issue | 19:16 |
jeblair | clarkb: yes, that is the plan | 19:22 |
jeblair | Shrews: cool, then ignore those two changes, i will re-rebase on 592 (like i said i would yesterday but just now remembered) | 19:26 |
Shuo_ | jeblair: what's the status of v3? (before alpha or ?) | 19:26 |
jeblair | Shuo_: yes, under heavy development and not able to actually be used at all yet | 19:27 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: Transition ZK API from dict to object model https://review.openstack.org/394592 | 19:37 |
Shrews | jeblair: ^^^ found a few minor issues | 19:38 |
Shrews | main one is that 'formats' is an attribute of ImageBuild, not ImageUpload | 19:38 |
Shrews | duh | 19:38 |
Shrews | but, un-WIP'd | 19:38 |
Shrews | and one place where i still treated it as a dict instead of obj | 19:40 |
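The dict-to-object transition under review (394592) might look roughly like the sketch below. The ImageBuild name and the 'formats' attribute come from the discussion above; the method names and shapes are purely illustrative, not the actual nodepool code:

```python
class ImageBuild:
    """Illustrative stand-in for a ZK image-build record (not nodepool's real class)."""

    def __init__(self, state=None, builder=None, state_time=None, formats=None):
        self.state = state
        self.builder = builder
        self.state_time = state_time
        # Per the discussion above, 'formats' belongs on the build,
        # not on the upload.
        self.formats = formats or []

    @staticmethod
    def fromDict(d):
        # Replaces passing raw dicts around: callers get attribute access.
        return ImageBuild(d.get("state"), d.get("builder"),
                          d.get("state_time"), d.get("formats"))

    def toDict(self):
        # Serialized back to the JSON shape stored in the znode.
        return {"state": self.state, "builder": self.builder,
                "state_time": self.state_time, "formats": self.formats}
```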
Shrews | mordred: to be clear, i don't think we'll be "done with that" until we actually put it through the wringer with actual usage with actual providers and stuff. there HAVE to be bugs. | 19:47 |
Shrews | but, i think we're at the point we need to start doing that | 19:48 |
mordred | Shrews: ++ | 19:48 |
mordred | yes - that's what I meant | 19:48 |
*** openstackgerrit has quit IRC | 19:48 | |
clarkb | it should be easy because we have a lot of testing for it already | 19:49 |
*** openstackgerrit has joined #zuul | 19:49 | |
clarkb | for the integration test I think you should be able to just update to make sure zk is installed and running and then that will tell you if the build, upload, boot, delete cycle works | 19:52 |
clarkb | though it might only delete the node not the image (adding an image delete too is easy though) | 19:52 |
Shrews | i'll have to spend some time learning the integration tests. haven't looked at that part yet | 19:53 |
mordred | Shrews: it's very similar to the shade tests - devstack plugin, runs nodepool against the devstack cloud | 19:54 |
clarkb | and then checks nodepool reaches expected steady states | 19:54 |
Shrews | this is found in nodepool/devstack? | 19:55 |
clarkb | yes | 19:55 |
clarkb | and then the check steady state script is in tools/ I think | 19:55 |
Shuo_ | jeblair: what's the estimated release time? If we set up a v2, what's the migration path? | 19:56 |
clarkb | Shrews: https://git.openstack.org/cgit/openstack-infra/nodepool/tree/tools/check_devstack_plugin.sh is the state check script, you probably want to add in image delete checking there too | 20:01 |
clarkb | Shrews: and replace https://git.openstack.org/cgit/openstack-infra/nodepool/tree/devstack/plugin.sh#n309 with starting a zk | 20:01 |
clarkb | Shrews: and you can install zk by modifying files in https://git.openstack.org/cgit/openstack-infra/nodepool/tree/devstack/files | 20:01 |
Shrews | clarkb: well, gearman is still needed for part of nodepool, but i get what you mean. thx | 20:02 |
Shrews | let's hope zookeeper actually works :) | 20:03 |
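The real steady-state check lives in tools/check_devstack_plugin.sh (a shell script); the polling idea clarkb describes ("checks nodepool reaches expected steady states") can be sketched in Python. The helper below is illustrative, with made-up timeouts:

```python
import time

def wait_for(predicate, timeout=300, interval=5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll until predicate() is true or timeout seconds elapse.

    Returns True on success, False on timeout. clock/sleep are
    injectable so tests don't have to really wait.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if predicate():
            return True
        sleep(interval)
    return False

# e.g. wait for a ready node of a given label before asserting that
# the build, upload, boot, delete cycle worked.
```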
jeblair | Shrews: note that some of my changes are a pre-req for that; i will finish rebasing them after lunch | 20:04 |
Shrews | jeblair: ack | 20:05 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396388 | 20:07 |
clarkb | jeblair: the db test case reorg changes are the ones you mean that need a rebase? | 20:16 |
* clarkb holds off on those for now then | 20:16 | |
jeblair | yep | 20:16 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396388 | 20:31 |
jeblair | Shrews: 592 fails tests now | 20:48 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Transition ZK API from dict to object model https://review.openstack.org/394592 | 20:51 |
jeblair | Shrews: ^ fixed, and also rebased to branch tip (which i needed to resolve a conflict in the stuff i'm stacking on it) | 20:52 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Combine ZKTestCase with DBTestCase https://review.openstack.org/383962 | 20:53 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Supply ZK connection information to test configs https://review.openstack.org/383963 | 20:53 |
Shrews | jeblair: bleh. thx | 20:54 |
clarkb | jeblair: would mixing the zk test case into classes that need both be easier? also make it easier to untangle the two later if that becomes desirable? | 20:59 |
jeblair | clarkb: that was my first attempt. testtools setup weirdness struck. | 21:06 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396388 | 21:06 |
clarkb | jeblair: huh ok | 21:07 |
jeblair | clarkb: this seemed reasonable considering our trajectory (in reality, they will both almost always be used together until gearman is gone completely) | 21:07 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add getMostRecentBuildImageUpload method to zk https://review.openstack.org/383964 | 21:08 |
clarkb | ya I think test_zk and the allocator tests are probably the odd ones out | 21:08 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Update waitForImage test method for ZK https://review.openstack.org/383966 | 21:09 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Re-enable test_dib_image_list https://review.openstack.org/383967 | 21:09 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Assume diskimage and image names are the same https://review.openstack.org/383965 | 21:09 |
jeblair | clarkb, greghaynes, Shrews: ^ there's the stack updated | 21:09 |
jeblair | 967 is actually the tip. we need to do that and also update the rest of the commands before the integration test since it uses them. but we also still have to update the main nodepool daemon to get its image information from zk. the getMostRecentBuildImageUpload method in 964 and having switched the model to use objects should make that relatively painless. | 21:12 |
jeblair | oh, i think i need to update 967 more | 21:13 |
Shrews | jeblair: do you want to add a 'count' param to your new API to match the others? | 21:27 |
Shrews | totally don't have to, but just a thought | 21:27 |
clarkb | another random thought, it's weird that you can't get all the uploads regardless of state (to me at least) | 21:28 |
jhesketh | Morning | 21:28 |
Shrews | yeah, the others have the option of ignoring state altogether | 21:29 |
jeblair | Shrews: i thought about it, but i'm leaning toward no because i think this is basically the primary output of the builder for the rest of the system. nodepool launcher never needs more than the most recent upload | 21:29 |
jeblair | it's the "get me the image i need to launch this node" method | 21:29 |
Shrews | jeblair: *nod* | 21:29 |
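jeblair's new method is essentially "get me the image i need to launch this node". A toy version of that lookup over in-memory data (the real method reads ZK; the function name, data shape, and 'ready' filter here are illustrative):

```python
def most_recent_ready_upload(uploads):
    """Return the newest upload in the 'ready' state, or None.

    Each upload is a dict with 'state' and 'state_time' keys,
    mirroring the build record format shown earlier in this log.
    Only the single most recent upload matters to the launcher,
    which is why no count parameter is offered.
    """
    ready = [u for u in uploads if u.get("state") == "ready"]
    if not ready:
        return None
    return max(ready, key=lambda u: u["state_time"])
```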
clarkb | reading the failed logs it seems to be getting the same build over and over and over even though the state is ready? | 21:32 |
jeblair | i think we may not be uploading | 21:34 |
clarkb | ah | 21:34 |
clarkb | oh right, it's builds/ that it is iterating through constantly, so ya likely no upload data | 21:35 |
jeblair | yeah, and i don't see any log msgs from the uploader | 21:36 |
jeblair | other than 'starting' | 21:36 |
clarkb | http://logs.openstack.org/67/383967/4/check/gate-nodepool-python27-db-ubuntu-xenial/826558c/console.html#_2016-11-10_21_15_06_494643 and that confirms no nodes there | 21:37 |
Shuo_ | where can I find a good architecture description of zuul (hopefully one for v3 and one for the existing one) | 21:52 |
openstackgerrit | Caleb Boylan proposed openstack-infra/nodepool: Fix subnode deletion https://review.openstack.org/370455 | 21:54 |
jeblair | clarkb, Shrews: it looks to be another manifestation of the diskimage name != image name problem | 21:55 |
jeblair | Iebe873bae1cf71a56ad9b791a9e6751a4e15043e is what broke it | 21:55 |
jeblair | working on a fix | 21:56 |
jeblair | Shuo_: http://docs.openstack.org/infra/publications/zuul/#(1) and http://docs.openstack.org/infra/zuul/quick-start.html#zuul-components may help with v2 | 22:01 |
jeblair | Shuo_: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html may help with v3 | 22:02 |
Shuo_ | jeblair: thanks! | 22:02 |
jeblair | Shuo_: none of those are easily consumable though, sorry. we will address that for the v3 release. | 22:02 |
Shuo_ | jeblair: hopefully get some whiteboard time to consume it :-) | 22:03 |
jeblair | clarkb, Shrews: okay, i have a fix, but there are a couple more fixes i need to untangle. will have patches soon. | 22:11 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Update waitForImage test method for ZK https://review.openstack.org/383966 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Re-enable test_dib_image_list https://review.openstack.org/383967 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add getMostRecentBuildImageUpload method to zk https://review.openstack.org/383964 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Assume diskimage and image names are the same https://review.openstack.org/383965 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Use diskimage name when looking up image on disk https://review.openstack.org/396422 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Override the cleanup interval in builder fixture https://review.openstack.org/396423 | 22:23 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add __repr__ methods to ZK objects https://review.openstack.org/396424 | 22:23 |
jeblair | clarkb, Shrews: ^ I think that should do it. | 22:23 |
Shuo_ | In the architecture of a zuul-based CI infrastructure setup, is nodepool's purpose/role to bring VMs with the right type of base image online (through the nova API)? | 22:24 |
mordred | Shuo_: yes. and also to manage base-images that you'd use to boot vms | 22:28 |
mordred | Shuo_: and also to, based on config, keep a pool of available nodes so that tests don't have to wait for nova to boot a vm | 22:28 |
clarkb | it also does some scrubbing of instances to avoid using "bad ones" and can do per region configuration | 22:28 |
Shuo_ | mordred: 1) does nodepool maintain states (the answer seems to be 'yes' from your "keep a pool of available nodes")?; 2) it sounds like the continuous integration infra team consumes a constant, large pool of capacity, regardless of whether there are a lot of jobs in flight. is this understanding true? | 22:31 |
clarkb | yes it keeps states, building, ready, used, delete currently | 22:33 |
mordred | the consumed capacity does vary with demand | 22:33 |
mordred | so although it does keep some amount of nodes around if there is no demand, it's a minimal amount compared to peak demand times | 22:34 |
clarkb | right, we run "under demand" most of the time | 22:34 |
clarkb | holidays, portions of weekends, and summits are just about the only times we drop below | 22:34 |
Shuo_ | To add to what I meant by 2): say I have set up a 400-machine Dell OpenStack cluster and use it as the capacity for the zuul + nodepool service. If I configure (say) 1000 4GB VMs in nodepool, then even if I have zero jobs in flight in the queue, I will be occupying 1000 VMs | 22:35 |
clarkb | jeblair: if you don't skip all those tests in test_commands.py do they all fail? | 22:35 |
clarkb | Shuo_: no, nodepool is demand based, you set a minimum level to be ready at any time and that's how many you get with zero jobs in flight | 22:36 |
clarkb | Shuo_: it will then expand capacity up to 1k VMs as demand requires it | 22:36 |
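clarkb's description of demand-based allocation can be sketched as: keep at least the configured minimum ready when idle, grow with demand, never exceed the configured maximum. The function below is an illustration of that arithmetic, not nodepool's actual allocator:

```python
def desired_nodes(min_ready, demand, max_servers):
    """Nodes to hold for a label: at least min_ready when idle,
    scaled up with job demand, capped at max_servers."""
    return min(max_servers, max(min_ready, demand))

# With zero jobs in flight you occupy only min_ready nodes, not the
# full 1000-VM figure from the example above; under heavy demand the
# pool grows up to, but never past, max_servers.
```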
Shuo_ | clarkb: thanks and make sense to me now. | 22:37 |
Shuo_ | clarkb: any discussion/brainstorming on a resource pool from a container-oriented cluster? Say we have a cluster of Kubernetes or Mesos already, and we want to use that pool of machines (the mesos/kubernetes cluster) as our zuul solution's capacity. Can it fit in? | 22:41 |
Shuo_ | clarkb: if it can, replacing or extending what part? | 22:41 |
clarkb | I think in the past we have said that it would be nice if you continued to use the nova api because that can do baremetal, VMs, and containers and we don't have to change anything in nodepool | 22:42 |
Shuo_ | http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/ | 22:42 |
clarkb | It would probably be a significant amount of work to add in non-OpenStack image management and instance boot/deletion | 22:43 |
clarkb | you'd basically be rewriting most of nodepool to add that in | 22:43 |
jeblair | clarkb: most of them will probably fail | 22:44 |
Shuo_ | clarkb: interesting, are you saying this can be done: keep all other parts as is and replace the nodepool component in the picture? | 22:44 |
clarkb | Shuo_: its possible. It would be a large effort though | 22:46 |
clarkb | I'm not sure how valuable it would be as a result (at least in their example they already have an openstack you could talk to) | 22:46 |
mordred | that said - running k8s workloads is a thing we'd like to support - but using k8s or mesos to manage resources instead of openstack would not really be a win for us | 22:48 |
clarkb | mordred: we already do support them :) | 22:48 |
Shuo_ | clarkb: "at least in their example they already have an openstack you could talk to", but our reality is we have a huge mesos cluster running. Can't bend the existing infrastructure too much. | 22:48 |
clarkb | kolla at least is doing it today on top of zuul+nodepool | 22:48 |
clarkb | I think magnum and higgins too? | 22:48 |
mordred | clarkb: I mean without first spinning up a VM to install the k8s in | 22:49 |
*** hasharAway has quit IRC | 22:49 | |
clarkb | mordred: right its not consuming k8s resources, but it is testing k8s things | 22:49 |
mordred | yah | 22:49 |
clarkb | Shuo_: I'm not sure I understand how you would put them together either isn't mesos fairly static? | 22:49 |
clarkb | Shuo_: not sure how nodepool would help there | 22:49 |
clarkb | k8s too right? | 22:49 |
mordred | Shuo_: zuul+nodepool is very much focused on operating on things that behave like computers | 22:49 |
clarkb | eg you wouldn't have nodepool do much with them | 22:50 |
mordred | not abstract container execution environments | 22:50 |
clarkb | you'd maybe have things register with zuul and zuul execute on them directly | 22:50 |
timrc | yolanda made a nodepool-like service that spit out containers using k8s, Shuo_ -- totally randomly jumping in here w/out much context so not sure if that's useful or not :) | 22:50 |
clarkb | at least from my first impression I think thats how I would try to structure it | 22:50 |
clarkb | basically nodepool registers openstack resources with zuul, $other thing could register mesos resources with zuul | 22:51 |
mordred | clarkb: yah - that was somewhat was I was originally thinking - a nodepool provider that returned a k8s endpoint | 22:51 |
mordred | yah | 22:51 |
mordred | it seems there are k8s namespaces - so a "single user k8s endpoint" could potentially be a namespace thing inside of an existing k8s | 22:52 |
mordred | this is all very handwavey at this point | 22:52 |
Shuo_ | clarkb: "isn't mesos fairly static?" No and yes. No, because you can start jenkins containers on demand (the amount of occupied capacity is very dynamic); yes, because your 400 Dell machines are all managed by mesos (400 is a static thing, but that's the same for OpenStack as well) | 22:52 |
mordred | in some ways, parts of this may be easy - ansible can talk to k8s and mesos just fine | 22:55 |
Shuo_ | clarkb: did that answer your question? | 22:55 |
mordred | however, the hard part, from when we've talked with container people about this problem space | 22:55 |
clarkb | Shuo_: sort of, iirc mesos pre-chunked its resources | 22:55 |
clarkb | Shuo_: so you basically consume the entire set at all times which is your problem with using openstack next to mesos because mesos has taken over the whole set | 22:55 |
mordred | is that zuul v3 will want to prepare git repository states for a given job and then rsync those contents to a VM | 22:55 |
clarkb | Shuo_: I think a better way to describe it might be "mesos is single tenant" | 22:56 |
mordred | so if things are operating in a container model, we're going to need to figure out the git repo data transfer | 22:56 |
clarkb | so yes, while openstack consumes the entire set of compute resources, it schedules on them dynamically in a multitenant manner | 22:56 |
mordred | it is on the list of things to figure out - but it's pretty low on the list right now to be honest | 22:56 |
Shuo_ | clarkb: no, mesos does not pre-chunk resources (it labels your machines, so that you can say things like "Job A can only run on machine with Label-I", but that's totally optional) | 22:56 |
mordred | because of the things we need to do to get zuul v3 up and going just in the current non-container oriented world | 22:56 |
clarkb | mordred: I mean it will just work in a container world too | 22:57 |
clarkb | mordred: you just have to treat containers as instances not processes | 22:57 |
clarkb | same as baremetal | 22:57 |
timrc | "system containers"? | 22:57 |
mordred | right. that's not actually doing the thing that k8s people want | 22:57 |
clarkb | (apparently people are using nodepool with baremetal which is cool) | 22:57 |
mordred | so isn't really a good analog | 22:57 |
clarkb | mordred: sort of | 22:57 |
clarkb | mordred: that's not what you want for deploying applications in production | 22:58 |
clarkb | but for CI I think we learn over and over and over again that you need to turn those knobs very often | 22:58 |
mordred | clarkb: so - if you have a container-based application that is in git repos A and B | 22:58 |
clarkb | because you aren't just running unittests, you are also testing that nginx/apache/haproxy/mysql/etc work together | 22:58 |
mordred | to deploy that, the thing you need to do is build containers a and b from the content in git repos A and B | 22:59 |
timrc | I don't think i'd abstract k8s as a nodepool provider if what I'm getting back is an endpoint to an immutable process container... might as well just have zuul interface k8s directly. | 22:59 |
clarkb | yes that is what kolla is basically doing today | 22:59 |
mordred | the questions are "where do you build the containers" "where do you upload them" and "how do you actually run the deploy based on them" | 22:59 |
clarkb | timrc: yup thats what I was saying earlier :) | 22:59 |
Shuo_ | clarkb: concept of tenancy in mesos is quite different from tenancy in openstack; though it's hard to have a 1-1 mapping, people can do some cheating. | 22:59 |
Shuo_ | do some cheating and achieve such mapping. | 23:00 |
mordred | it's possible we could decide that we want to teach the zuul core about building containers | 23:00 |
timrc | clarkb: sweet :) | 23:00 |
mordred | but that seems like a bad idea | 23:00 |
Shuo_ | fire alarm, ttyl | 23:00 |
clarkb | timrc: mordred I don't think zuul would build the containers | 23:00 |
clarkb | or at least not as part of zuul itself | 23:00 |
mordred | where do you think they would be built? | 23:00 |
clarkb | but the interaction of run a job over here | 23:00 |
clarkb | that seems like something that zuul should figure out between k8s/mesos directly without a nodepool | 23:01 |
clarkb | because its fairly inelastic | 23:01 |
clarkb | k8s cluster is X big, use it | 23:01 |
clarkb | same for mesos | 23:01 |
mordred | so - let's ignore nodepool for a second | 23:01 |
mordred | it's not the hard part | 23:01 |
mordred | the hard part is - how do you go from speculative git repository state to containers running in k8s | 23:02 |
clarkb | mordred: the "easy" way is the way we do it today | 23:02 |
clarkb | mordred: or at least the theoretical future state of how to do it better today | 23:02 |
mordred | right. which requires either a VM or a container running multi-process pretending to be a VM | 23:03 |
clarkb | give people "real" computers to do their work on and be flexible | 23:03 |
mordred | right | 23:03 |
timrc | That ^^ btw would be useful for lint tests :) | 23:03 |
timrc | But I digress :) | 23:03 |
mordred | but there's this whole world of container people who explicitly do not want to be given 'real' computers | 23:03 |
mordred | but who want to express all of their things in terms of collections of interrelated containers | 23:04 |
timrc | Yeah so you have a job that uses k8s instead of nodepool and asks for pods with $image and you get back the ips into that pod? | 23:04 |
clarkb | timrc: ya that is basically what I had in mind | 23:05 |
clarkb | especially with zuulv3 where its already "give me N of image foo" from openstack via nodepool | 23:05 |
clarkb | you get back a "primary"; run your testing from that, whatever it is you need to do to assert state | 23:05 |
clarkb | which leads into mordreds thing | 23:06 |
clarkb | of how do we make the new image that you ask the things to boot | 23:06 |
mordred | yah | 23:06 |
clarkb | as part of a speculative git future world that may or may not exist | 23:06 |
mordred | without having each user have to script doing that themselves | 23:06 |
mordred | since for computer based workloads we provide a VM that has a bunch of state shoved in to it for them | 23:06 |
mordred | we'll figure it out eventually :) | 23:06 |
* mordred has to run to dinner | 23:07 | |
jeblair | in the mean time, we need to get it to run. at all. | 23:07 |
timrc | lol | 23:07 |
clarkb | mordred: this is sort of related to that thing from earlier; the reason I like having entirely self-contained units, or things that appear that way, is it makes it very easy to reproduce locally | 23:07 |
clarkb | which is why I am less a fan of "job A compiles" feeding into "job B builds the package" feeding into "X jobs assert things" | 23:08 |
clarkb | I like Job 1 asserts X and also compiles and builds package and installs package. job 2 asserts Y and also compiles and builds package and installs package | 23:08 |
clarkb | then you figure out how to make the cost of those steps cheap when repeated. Because now I can just grab that same thing run locally and boom I reproduced | 23:09 |
clarkb | without needing to understand a complicated graph of build targets | 23:09 |
timrc | Just like we have the ansible launcher to run ansible against newly provisioned nodes, I imagine you'd have some sort of k8s-launcher thing that would create images/apply deltas and upload those images to k8s. I dunno, that sounds complicated :) | 23:09 |
clarkb | timrc: that assumes I have a k8s right? | 23:10 |
clarkb | but if you flip it around and make having a k8s part of the process now its easy | 23:10 |
clarkb | (which is the setup we use today) | 23:10 |
clarkb | (unfortunately we haven't solved the make those steps cheap part) | 23:10 |
timrc | Hm, yeah. | 23:10 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396388 | 23:11 |
clarkb | I also fully acknowledge that if you want things to be performant taking those shortcuts is often necessary (at least due to time constraints) | 23:12 |
Shuo_ | clarkb: following up on the topic of mesos/k8s. If we think of zuul's core value (great gating system, great serialized checking with speculative test runs), then it does not care whether it's an openstack cluster that provides the test machine capacity or a mesos cluster doing so -- it simply asks for capacity to execute speculative jobs. | 23:19 |
Shuo_ | so although I don't particularly understand the architecture detail yet, I feel there could exist synergy here. | 23:19 |
jeblair | Shuo_: yes, that's part of why there's an interface between zuul and nodepool, to allow for this kind of flexibility | 23:20 |
clarkb | yup exactly | 23:20 |
clarkb | the interaction between zuul and nodepool is a fairly well defined api | 23:21 |
Shuo_ | one more thing: zuul may even be able to consume aws spot instances (I know this is the OpenStack community, but I am just sharing our current non-openstack perspective) | 23:21 |
clarkb | and should be useable by things other than nodepool | 23:21 |
jeblair | Shuo_: agreed. our plan is to focus first on openstack's usecase, then the ansible community (which is less tied to openstack and uses github), then other uses, including containers | 23:22 |
clarkb | and in fact I don't think zuul has ever required nodepool | 23:23 |
jeblair | Shuo_: right now, we're at a stage where we're learning about container-based workflows | 23:23 |
clarkb | the old jenkins based system could've used jenkins + mesos | 23:23 |
clarkb | and new system can plug into zuul's resource allocation system | 23:23 |
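The boundary jeblair and clarkb describe could be illustrated with a toy interface (purely hypothetical names; the real Zuul/nodepool protocol is not this API): Zuul asks an abstract provider for N nodes of a label, and anything -- nodepool + OpenStack, a mesos driver, an EC2 driver -- can sit behind it.

```python
import abc

# Illustrative only: a minimal provider boundary of the kind the channel
# is discussing, not Zuul's actual API.

class NodeProvider(abc.ABC):
    @abc.abstractmethod
    def request_nodes(self, label, count):
        """Return `count` node addresses for the given image label."""

class FakeCloudProvider(NodeProvider):
    def __init__(self, name):
        self.name = name

    def request_nodes(self, label, count):
        # A real driver would boot VMs/pods; here we fabricate addresses.
        return [f"{self.name}-{label}-{i}" for i in range(count)]

def run_job(provider, label, count):
    nodes = provider.request_nodes(label, count)
    primary, *extras = nodes  # "give me N of image foo", run from a primary
    return primary, extras

primary, extras = run_job(FakeCloudProvider("openstack"), "foo", 3)
print(primary, extras)
```

Swapping `FakeCloudProvider` for a mesos- or EC2-backed implementation is the flexibility the interface is meant to allow.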
jeblair | Shrews, clarkb: 394592 through 383967 pass tests | 23:23 |
Shuo_ | To me (and thanks a lot of clarkb, mordred and jeblair for helping out my rudimentary questions), zuul has two primary types of partners: 1) gerrit/gitlab/github -- developer interface; 2) nodepool+openstack cluster / some-kind-of-Mesos_driver + mesos cluster / some-kind-of-aws-driver + EC2 account /.... | 23:25 |
clarkb | Shuo_: I don't know for sure because I haven't tested it but I would expect that zuul 2 + jenkins gearman + jenkins mesos would just work | 23:26 |
Shrews | jeblair: will review when less drunk. Which reminds me, where is olaph??? | 23:27 |
clarkb | you might have to hack it so that jenkins mesos forces gearman to register jobs, but otherwise that should do things | 23:27 |
Shuo_ | and 2) is resource capacity interface. | 23:27 |
Shrews | olaph: we totes need to hang out | 23:27 |
Shuo_ | jeblair: I'd love to volunteer for the container workflow if that helps the zuul community -- I happened to have the chance to see both the nova/VM side of the story/workflow and the mesos/container side | 23:28 |
pabelanger | clarkb: on the run-containers-on-a-VM thing, I do think it would be neat for nodepool to launch a VM, then somehow flag it as static for x number of runs, to let some container jobs do things. Then, have nodepool delete those VMs and repeat. | 23:29 |
pabelanger | so, people get container things for lint jobs | 23:30 |
clarkb | pabelanger: why not use nova's container drivers though? | 23:30 |
pabelanger | but we still don't have to manually create static nodes | 23:30 |
clarkb | I think you are solving it at the wrong layer by having nodepool do that | 23:30 |
olaph | Shrews: I'm on the west coast | 23:31 |
pabelanger | clarkb: because not all our clouds support that? | 23:31 |
olaph | getting my protest on outside jeblair's house | 23:31 |
pabelanger | I have no idea, if that is true | 23:31 |
clarkb | mesos, k8s, openstack, docker swarm, etc all solve that problem | 23:31 |
pabelanger | never tried using nova container | 23:31 |
clarkb | it seems weird to try and solve it in nodepool | 23:31 |
Shrews | olaph: bah. Come to the Mexican restaurant below my house and drink margaritas! | 23:32 |
clarkb | but also lint jobs run fine on the "bigger" VMs | 23:32 |
pabelanger | well, I know ansible has an lxc task, it would be neat to use that interface | 23:32 |
pabelanger | but I haven't tried it before | 23:32 |
clarkb | and they won't run faster on containers, so it feels like optimizing for a problem that doesn't exist | 23:32 |
pabelanger | clarkb: right, I think in our case, we can start using smaller vms for lint tests | 23:32 |
Shuo_ | clarkb: "I don't know for sure because I haven't tested it but I would expect that zuul 2 + jenkins gearman + jenkins mesos would just work" -- that would be super! give me enough appetizer to recruit internal partners :-) | 23:32 |
pabelanger | but the common argument is that people don't have the budget to keep a lot of VMs going | 23:33 |
pabelanger | clarkb: speaking of smaller VMs, when do we want to start on that? | 23:33 |
clarkb | pabelanger: for us booting smaller VMs doesn't really help much | 23:33 |
clarkb | pabelanger: since we have a fixed instance quota | 23:33 |
clarkb | but people can boot smaller VMs with nodepool | 23:33 |
clarkb | if they want to reduce consumption of resources | 23:34 |
pabelanger | ya, capacity isn't an issue right now for sure | 23:34 |
clarkb | even if capacity was an issue for us using smaller VMs would not help | 23:34 |
clarkb | because in all of our clouds we either have a fixed number of IPs or a fixed number of total instances or both | 23:34 |
pabelanger | Ya, ipv6 helps | 23:35 |
pabelanger | but, agreed | 23:35 |
Shuo_ | clarkb: in the container world there is pretty good adoption of ip-per-container, so the number of IPs is not an issue (you can use a 10.0/16 and that gives you a huge runway) | 23:36 |
clarkb | if nova says you get 100 instances, making 100 small instances really doesn't do much except for the cloud scheduling | 23:36 |
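clarkb's quota point can be shown with a toy calculation (all numbers invented for illustration): the node cap is the minimum of the binding quotas, so when an instance or IP count binds, shrinking the flavor doesn't buy any capacity; it only helps when RAM is the binding limit.

```python
# Toy numbers, purely illustrative of the quota discussion above.

def max_nodes(instance_quota, ip_quota, ram_quota_mb, flavor_ram_mb):
    # Concurrent nodes are capped by whichever quota binds first.
    return min(instance_quota, ip_quota, ram_quota_mb // flavor_ram_mb)

# 100-instance quota, 100 IPs, ample RAM: the instance quota binds,
# so an 8 GB flavor and a 2 GB flavor both cap out at 100 nodes.
big = max_nodes(100, 100, 1_600_000, 8192)    # -> 100
small = max_nodes(100, 100, 1_600_000, 2048)  # -> 100

# Only when RAM binds does a smaller flavor raise the cap:
ram_big = max_nodes(1000, 1000, 400_000, 8192)    # -> 48
ram_small = max_nodes(1000, 1000, 400_000, 2048)  # -> 195

print(big, small, ram_big, ram_small)
```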
clarkb | Shuo_: we require public IPs because we run things all over the world and don't want to deal with the headache of resolving a broken ipv4 world | 23:36 |
clarkb | Shuo_: so as pabelanger says ipv6 is nice :) | 23:37 |
Shuo_ | oh, I missed that part. | 23:37 |
clarkb | Shuo_: we run in something like 7 clouds and 11 regions? I forget the current number (it fluctuates a bit) and in at least 3 countries on two continents | 23:38 |
Shuo_ | clarkb: but even for multiple clusters around the world, there could be a private IP space solution (we are doing that now on aws): say we have a 400-machine cluster on AWS EAST-1 and you have a 600-machine cluster on AWS WEST-1; we can set up a virtual gateway and peering (layer 3 connectivity) in your private IP space. But I know AWS gave us more weapons to use -- not a comparable situation. | 23:41 |
clarkb | ya its a bit different when you are dealing with a single cloud | 23:42 |
Shuo_ | clarkb: but it's great to hear you feel the current code base might just work :-) | 23:42 |
clarkb | Shuo_: ya I think it will come down to whether or not jenkins mesos gives jenkins gearman enough to register the jobs | 23:45 |
clarkb | but once the jobs are registered via gearman zuul should be able to schedule them and go crazy | 23:45 |
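The Zuul v2 flow clarkb sketches -- workers register job names with gearman, and once registered the scheduler can dispatch by name -- can be mimicked in miniature. This toy registry is a stand-in: a real deployment would use a geard server with the `gear` library and the Jenkins gearman plugin, not this class.

```python
# Toy stand-in for gearman-style job registration and dispatch.

class FakeGearman:
    def __init__(self):
        self.functions = {}

    def register(self, name, handler):
        # e.g. the jenkins side advertising "build:unit-tests"
        self.functions[name] = handler

    def submit(self, name, payload):
        # The scheduler can only dispatch names a worker has registered,
        # which is why registration is the make-or-break step above.
        if name not in self.functions:
            raise KeyError(f"no worker registered for {name}")
        return self.functions[name](payload)

gearman = FakeGearman()
gearman.register("build:unit-tests",
                 lambda change: f"ran unit-tests for {change}")

result = gearman.submit("build:unit-tests", "change 12345")
print(result)
```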
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!