*** goldenfri has quit IRC | 04:36 | |
*** b1airo has joined #scientific-wg | 10:10 | |
*** rbudden has joined #scientific-wg | 10:54 | |
*** oneswig has joined #scientific-wg | 10:58 | |
*** martial_ has joined #scientific-wg | 11:01 | |
martial_ | hello? | 12:03 |
martial_ | so https://etherpad.openstack.org/p/Scientific-Sydney17-Forum | 12:04 |
oneswig | aha | 12:04 |
martial_ | b1airo: the BoF and lightning talks are on the schedule now | 12:04 |
b1airo | i'll have 4x SN2700s, 80x CX-4 50GbE DP, 60x C6420s, 20x R740s... | 12:04 |
martial_ | ? I did not receive any notice of this from the Foundation ... strange | 12:04 |
oneswig | b1airo: nice, that's pretty much the entire Cambridge deployment, updated | 12:05 |
b1airo | martial_, ah ok, i did get a message from Jimmy saying they were up - will forward it... | 12:05 |
martial_ | thanks b1airo | 12:06 |
oneswig | b1airo: so you're looking at leading the scientific openstack pain points suggestion? I have a suspicion these two sessions might be merged. If you're meeting other HPC Ops people, what else have you got to talk about? | 12:09 |
martial_ | I see that they accepted the forum sessions I submitted a while back | 12:09 |
b1airo | oneswig, i'll also have 2x SN2100s, 20x CX-4 25GbE DP and 20x R740XD - 3.5PB of extra Ceph :-) | 12:09 |
oneswig | I wonder if jmlowe might be interested in the Ceph BoF, if he's going | 12:09 |
b1airo | i would say so oneswig - lots of people interested in OpenStack will also be interested in Ceph | 12:10 |
oneswig | b1airo: great to hear it, sounds awesome. All part of the same cloud, or a new project? | 12:10 |
martial_ | I need to reach out to them: it is not an SWG meeting anymore but an SWG Lightning Talk, my affiliation is wrong (likely why I did not get the email ... again) and Stig is missing from the list :P | 12:10 |
oneswig | martial_: I'm missing, I think, because I'm already doing 3 other things | 12:11 |
oneswig | I couldn't get proposed for a fourth | 12:11 |
oneswig | (or fifth) | 12:11 |
martial_ | oneswig: I remember this conversation :) | 12:11 |
martial_ | oneswig: popular guy | 12:12 |
martial_ | :) | 12:12 |
oneswig | don't know when to stop, that's my problem... | 12:12 |
martial_ | well who needs sleep anyhow right? | 12:12 |
oneswig | OK - better go, tea to drink etc. | 12:12 |
oneswig | catch you later | 12:12 |
*** oneswig has quit IRC | 12:13 | |
martial_ | b1airo: you okay on the "Top pain points for scientific openstack"? | 12:13 |
martial_ | I will email Mike related to the "Ceph BoF" | 12:13 |
martial_ | Maybe invite Robert Budden and Tim Randles to be moderators for the "HPC Ops Unite" | 12:14 |
b1airo | martial_, recall that we specifically decided to have a BoF and Lightning Talks as our two WG sessions and forgo a "meeting" - a BoF is already similar | 12:15 |
martial_ | b1airo: I understand but if you look at https://www.openstack.org/summit/sydney-2017/summit-schedule/global-search?t=scientific the wrong items are listed | 12:16 |
martial_ | b1airo: are you concerned the "HPC Ops Unite" is a repeat of "SWG BoF", is that what you mean? | 12:19 |
martial_ | I think the BoF is specific to current members and the other is to welcome people that are not yet members | 12:20 |
jmlowe | The ceph bof from MSI? | 12:39 |
jmlowe | Oh, wait, you mean Sydney not SC'17 | 12:40 |
jmlowe | I couldn't swing Sydney | 12:40 |
*** jmlowe has quit IRC | 12:48 | |
*** jmlowe has joined #scientific-wg | 13:34 | |
*** hogepodge has joined #scientific-wg | 13:35 | |
hogepodge | Sorry b1airo I fell back asleep | 13:35 |
hogepodge | I didn't understand the question either: access to the sections? Also, what were the terms offered for the images? Attribution? | 13:36 |
martial_ | jmlowe: do you know if Tim and Robert are coming? | 13:40 |
jmlowe | Tim yes, Robert no | 13:40 |
martial_ | I think Tim is a good candidate for the "HPC Ops Unite" moderator, will reach out to him | 13:40 |
martial_ | I was hoping to be able to give Robert a slot as moderator for a forum session, I know he was interested in the experience | 13:41 |
martial_ | I know Mike May is a good candidate for the Ceph one, who else would you recommend? | 13:42 |
jmlowe | Anybody from MSI if they are going, they're the only people I know who have been using ceph in production longer than me | 13:43 |
jmlowe | Trying to remember who was at the SC ceph BoFs and might be going to Sydney | 13:45 |
martial_ | if you can email me that info, I can follow up with them. I would appreciate it | 13:46 |
martial_ | thank you very much | 13:46 |
jmlowe | bollig and masber? are from MSI | 13:47 |
martial_ | (okay looking for their email :) ) | 13:54 |
martial_ | thanks Mike | 13:54 |
martial_ | found Evan, will reach out and ask him to follow up with "masber" (real name?) | 13:57 |
rbudden | martial_: sorry, I couldn’t swing Sydney either unfortunately :( | 14:07 |
martial_ | Robert, I am sorry to hear that, I was looking forward to seeing you guys again ... Vancouver hopefully | 14:07 |
rbudden | yes, I think I should be able to swing Vancouver | 14:07 |
rbudden | international travel is rather difficult at times | 14:08 |
rbudden | I need to write more papers/presentations ;) | 14:08 |
martial_ | I understand, trust me :) | 14:08 |
*** jmlowe has quit IRC | 14:31 | |
*** jmlowe has joined #scientific-wg | 14:37 | |
*** jmlowe has quit IRC | 14:48 | |
*** jmlowe has joined #scientific-wg | 14:48 | |
*** jmlowe has quit IRC | 15:03 | |
*** jmlowe has joined #scientific-wg | 15:04 | |
*** martial_ has quit IRC | 15:13 | |
*** jmlowe has quit IRC | 15:32 | |
*** b1airo has quit IRC | 16:20 | |
*** jmlowe has joined #scientific-wg | 16:35 | |
jmlowe | Come hell or high water I'm going to Vancouver, Berlin is an easier sell than Sydney | 16:36 |
*** b1airo has joined #scientific-wg | 16:37 | |
masber | jmlowe, MSI? | 16:43 |
jmlowe | Minnesota Supercomputing Institute, and clearly I was wrong, you are from AZ right? | 16:44 |
masber | I'm going to the Sydney summit | 16:45 |
masber | and I also have a talk I have to prepare | 16:46 |
masber | jmlowe, sorry I'm not from MSI but from Garvan Institute (Sydney) | 16:47 |
jmlowe | I need more coffee, country code for Australia is AU not AZ | 16:47 |
jmlowe | masber yeah, getting all kinds of things wrong today | 16:48 |
masber | jmlowe, np | 16:48 |
masber | b1airo, are you from Monash University? | 16:48 |
masber | could I ask, now that most of you are here, how do you deploy openstack? | 16:50 |
masber | I used kolla-ansible but I am thinking of moving to either openstack-ansible or tripleO? any thoughts? | 16:51 |
jmlowe | b1airo is from Monash | 16:51 |
jmlowe | I rolled my own using saltstack https://github.com/jetstream-cloud/Jetstream-Salt-States | 16:52 |
masber | wow | 16:52 |
jmlowe | rbudden was using packstack for a while, not sure what now | 16:53 |
masber | packstack is not production ready as far as I know | 16:53 |
jmlowe | Do I remember that trandles was giving kolla a try? | 16:53 |
jmlowe | rbudden has a special case, they might have the world's largest ironic cluster | 16:54 |
jmlowe | https://www.psc.edu/bridges is their machine | 16:54 |
masber | apparently they use HPC and big data, I wonder how they deploy the hadoop ecosystem | 16:56 |
jmlowe | he's active, I'll poke him in a minute | 16:59 |
jmlowe | https://www.stackhpc.com/blog.html | 16:59 |
jmlowe | Stig is apparently using Kolla | 16:59 |
rbudden | masber: we have a hadoop guy that converts portions of bridges into hadoop sections on demand | 17:00 |
rbudden | we actually don’t do a lot of hadoop | 17:00 |
rbudden | less than we expected | 17:01 |
rbudden | but when we do, the user(s) request a reservation in Slurm for the # of nodes they want, then our hadoop guy scripts everything up | 17:01 |
rbudden | I’d like to play with Sahara eventually and just have OpenStack handle it | 17:01 |
rbudden | maybe one day ;) | 17:01 |
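The reservation workflow rbudden describes is plain Slurm; a minimal sketch of what such a request typically looks like, with the reservation name, user, node count and duration all hypothetical:

    # Carve out 16 nodes for a hadoop run for 12 hours (names and sizes are made up)
    scontrol create reservation ReservationName=hadoop_run \
        Users=hadoop_user NodeCnt=16 StartTime=now Duration=12:00:00
    # Jobs are then submitted against the reservation, e.g.:
    sbatch --reservation=hadoop_run provision_hadoop.sh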
masber | rbudden, why don't you use ambari? it gives you a better-supported version of hadoop plus monitoring and version control of the configuration files | 17:02 |
rbudden | I can’t answer that as I don’t handle any of our hadoop infra, but I’ll check it out | 17:03 |
rbudden | I’m largely our OpenStack guy and do some filesystems work and other dev work | 17:03 |
masber | has anyone used zun + kuryr? | 17:04 |
masber | rbudden, so you use packstack for provisioning openstack? | 17:05 |
rbudden | yes, modified packstack | 17:05 |
rbudden | looking to move to kolla | 17:05 |
masber | why not tripleO? | 17:05 |
masber | kolla-k8s or kolla-ansible? | 17:06 |
rbudden | triple o is on the radar to take a look at | 17:06 |
rbudden | i largely have ansible to handle things so OSA seemed interesting as well | 17:07 |
rbudden | was thinking Kolla just based on Stig’s experience with it | 17:07 |
masber | yeah OSA and tripleO are the ones I want to move to | 17:07 |
masber | they both provide quite a lot of features like SR-IOV and ironic | 17:08 |
rbudden | i’m open to anything at this point that would simplify things and containerize the setup | 17:08 |
masber | tripleO does containers based on docker and OSA is based on linux containers | 17:08 |
masber | I like tripleo because they release updates quite fast | 17:09 |
masber | and I want to use pike so I can deploy containers using kuryr and zun | 17:09 |
masber | and I am also excited about ceph-bluestore | 17:10 |
masber | I think tripleo/packstack will be the first ones to support it | 17:10 |
jmlowe | is zun usable now? | 17:11 |
rbudden | i’m very interested in Zun as well | 17:11 |
masber | apparently yes with pike | 17:11 |
rbudden | I was sitting in on the IRC meetings, but it’s 11pm EST so it conflicts with getting sleep while my baby sleeps ;) | 17:11 |
jmlowe | I last checked in on zun during the Boston summit, it looked a bit like vaporware | 17:12 |
masber | egonzalez from kolla-ansible is quite active helping the community and giving support, he told me they were testing zun and kuryr | 17:12 |
rbudden | jmlowe: it’s more than vapor, it’s largely spun off nova-docker so a lot of the work was already done/started | 17:13 |
rbudden | i was planning on trying it on Bridges shortly | 17:13 |
rbudden | in a test setup | 17:13 |
masber | jmlowe, why is that? apparently running containers using nova is deprecated | 17:13 |
jmlowe | it looked like they were just getting the project organized back in May | 17:14 |
rbudden | masber: correct, nova-docker is effectively dead. it was just one dev ‘dimms’ maintaining it | 17:14 |
rbudden | he’s now part of the Zun team | 17:14 |
jmlowe | I reserve the right to be completely wrong | 17:14 |
masber | I am wondering how kuryr works... does it provide service discovery? | 17:15 |
masber | jmlowe, I remember you told me you achieve 97% efficiency in your cluster, which tool do you use to measure that? | 17:18 |
jmlowe | that was linpack | 17:18 |
jmlowe | HPL 0.696 GFLOPS bare metal, 0.678 GFLOPS vm | 17:19 |
jmlowe | worst was STREAM 88 GB/s bare metal, 68 GB/s vm | 17:20 |
*** goldenfri has joined #scientific-wg | 17:20 | |
jmlowe | I take that back, worst was FFTE 13.75 vs 9.235 GFLOPS | 17:20 |
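For reference, the VM-versus-bare-metal ratios behind the roughly 97% figure mentioned earlier work out as follows, computed directly from the numbers jmlowe quotes above:

    # VM efficiency relative to bare metal, from the figures quoted above
    python3 -c 'print(round(0.678 / 0.696 * 100, 1))'   # HPL:    97.4%
    python3 -c 'print(round(68 / 88 * 100, 1))'         # STREAM: 77.3%
    python3 -c 'print(round(9.235 / 13.75 * 100, 1))'   # FFTE:   67.2%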
masber | jmlowe, why don't use YCSB? | 17:21 |
jmlowe | We ran HPCC because we are funded by an agency that buys HPC and runs HPCC on all their machines | 17:22 |
jmlowe | YCSB would have been nice but we also had a 6 month delay in hardware delivery and a 2 week delay in production date | 17:23 |
masber | jmlowe, why such a difference between baremetal and vm in STREAM? NUMA without cpu pinning? | 17:24 |
jmlowe | Jetstream went from os install on nodes 2015-11-09 to first vm started 2015-12-15 with early operations and users on 2016-02 | 17:24 |
jmlowe | probably, they are numa nodes | 17:25 |
jmlowe | we don't do pinning | 17:25 |
jmlowe | 17 business days to do bare metal benchmarks, burn in, and get openstack functional | 17:26 |
masber | numa is a pain and it is hard to fully utilize hardware resources with cpu pinning, but apparently it gives better performance on big VMs | 17:27 |
masber | jmlowe, enabling cpu pinning is quite easy, you just need to set the attributes when you create the image | 17:28 |
jmlowe | one of those things I would have liked to fully map out | 17:28 |
jmlowe | it also prevents live migration, that is not something I'm willing to give up for any price | 17:28 |
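For readers unfamiliar with the attributes masber is referring to: dedicated CPUs in Nova are normally requested through a flavor extra spec or an image property; a minimal sketch, with the flavor and image names being hypothetical:

    # Flavor-based CPU pinning (hypothetical flavor name)
    openstack flavor set hpc.large --property hw:cpu_policy=dedicated
    # ...or the equivalent image property (hypothetical image name)
    openstack image set centos7-hpc --property hw_cpu_policy=dedicated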
masber | so let me ask you, do you do any type of optimization on openstack? | 17:29 |
jmlowe | our mandate isn't hpc, it is to get the 97% of users eligible to use NSF-funded resources who don't, getting an extra 10% performance boost won't do it | 17:29 |
jmlowe | Any time there was a choice between live migrate and always up vs speed we chose live migrate, no sriov, no pinning, 10gige w/ vxlan | 17:31 |
jmlowe | again, if users had to squeeze that last 10% out of a node they would already be using hpc systems | 17:32 |
masber | yes | 17:32 |
jmlowe | also I think those runs were done with gcc not intel | 17:32 |
masber | ok, I'm going to bed, it is 3:37am | 17:37 |
*** goldenfri has quit IRC | 18:26 | |
*** jmlowe has quit IRC | 19:25 | |
*** b1airo has quit IRC | 19:47 | |
*** b1airo has joined #scientific-wg | 19:48 | |
*** martial has joined #scientific-wg | 19:56 | |
*** martial has quit IRC | 20:20 | |
trandles | yes, I've been attempting to use kolla this week | 21:26 |
trandles | I've been totally unimpressed | 21:26 |
trandles | I think it's largely a lack of coherent documentation though | 21:27 |
trandles | the quickstart guide didn't work | 21:28 |
trandles | the "how to build an image" documentation makes a load of assumptions that you have to trip over and discover on your own before you can make progress | 21:29 |
trandles | if you start here (which doesn't seem unreasonable given the title of the page) https://docs.openstack.org/kolla/latest/index.html | 21:29 |
trandles | you are sort of persuaded to choose between kolla-ansible and kolla-kubernetes | 21:30 |
trandles | I went down the kolla-ansible path and followed each link in the User Guides in succession...but only got to "Quick Start" before it failed | 21:31 |
trandles | when googling for help with the failure I came across a bug report that basically said "we never said everything would work, here's documentation to prove it" and the issue was closed | 21:32 |
trandles | I think if you were trying to use the bootstrap-servers ansible play but not for the most recent release (i.e. Pike in my case) you might be ok | 21:34 |
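For anyone retracing trandles' steps, the quickstart he refers to boils down to roughly the following sequence; this is a sketch of the Pike-era kolla-ansible workflow, with the inventory path assumed, not a tested recipe:

    # Install kolla-ansible and generate service passwords
    pip install kolla-ansible
    kolla-genpwd
    # Prepare the target hosts, run sanity checks, then deploy
    kolla-ansible -i ./multinode bootstrap-servers
    kolla-ansible -i ./multinode prechecks
    kolla-ansible -i ./multinode deploy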
trandles | I'm very interested in talking to Stig about how they managed an upgrade using kolla and ansible. One hint may be that they | 21:36 |
trandles | I fell back to using the kolla stable/pike branch on github. In there I used the various scripts in tools/ but only had partial success. Several of the images didn't build. I have been busy with other things today and haven't had a chance to go back and debug yet. | 21:39 |
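The image builds trandles mentions are normally driven by kolla-build (or tools/build.py in a source checkout); a minimal sketch, with the base distro, install type and image selection chosen as assumptions:

    # Build only the images needed from the stable/pike checkout
    pip install kolla
    kolla-build --base centos --type binary keystone glance nova neutron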
trandles | Overall though, I don't think I'm going to try to untangle the kolla mess | 21:40 |
*** goldenfri has joined #scientific-wg | 21:41 | |
trandles | This however looks really good: https://docs.openstack.org/install-guide/ | 21:44 |
trandles | Following it for a clean pike install has been painless. Plus I know docker well, so working from this guide to make LANL-specific dockerfiles, for only the things we care about right now, is the path I'm pursuing. | 21:45 |
*** b1airo has quit IRC | 22:11 | |
*** rbudden has quit IRC | 22:24 | |
*** rbudden has joined #scientific-wg | 22:38 | |
*** b1airo has joined #scientific-wg | 22:47 | |
*** b1airo has quit IRC | 23:10 | |
*** jmlowe has joined #scientific-wg | 23:32 | |
*** b1airo has joined #scientific-wg | 23:35 | |
*** jmlowe has quit IRC | 23:52 | |
*** jmlowe has joined #scientific-wg | 23:53 |