*** radsy has joined #tripleo | 00:00 | |
lifeless | slagle: one thing I'd like to do | 00:01 |
lifeless | slagle: like source-repositories factors out 'things to download' | 00:01 |
lifeless | slagle: I'd love a declarative 'things to install' in elements | 00:01 |
SpamapS | lifeless: yes | 00:01 |
slagle | lifeless: yes | 00:01 |
lifeless | slagle: so we can scan and do one big install run | 00:01 |
SpamapS | lifeless: I started working on that at one point | 00:01 |
GheRivero | +1 | 00:01 |
slagle | and a things to uninstall at the end | 00:01 |
SpamapS | lifeless: we can also have two of them, dev and runtime | 00:01 |
lifeless | greghaynes: reviewed https://review.openstack.org/#/c/83675/ rev 6 | 00:01 |
SpamapS | or rather, build / runtime | 00:01 |
SpamapS | so build deps can be pulled out at the end | 00:02 |
lifeless | SpamapS: three perhaps | 00:02 |
lifeless | SpamapS: runtime; buildtime; testtime | 00:02 |
SpamapS | we don't really need python-dev et. al on our servers | 00:02 |
greghaynes | lifeless: ty | 00:02 |
SpamapS | or gcc for that matter.. or.. do we.. | 00:02 |
* SpamapS shakes fist at cffi | 00:02 | |
lifeless | SpamapS: we do because cffi and packaging fail | 00:03 |
lifeless | SpamapS: but yes | 00:03 |
lifeless | note that by testtime I mean 'things where we test our code', not the CI->glance->deploy pipeline tests. | 00:03 |
lifeless | those are separate, obviously. | 00:03 |
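A declarative per-element "things to install" list, split into the runtime / buildtime / testtime phases discussed above, could be scanned into one big install run plus one cleanup run at the end. The manifest format, element names and package names below are hypothetical and illustrative only; nothing here is an existing diskimage-builder interface.

```python
from typing import Dict, List

# Hypothetical manifests: element name -> phase -> packages.
# Element and package names are illustrative only.
ELEMENTS: Dict[str, Dict[str, List[str]]] = {
    "nova-compute": {"runtime": ["libvirt-bin"], "buildtime": ["python-dev", "gcc"]},
    "mysql": {"runtime": ["mysql-server"]},
    "unit-tests": {"testtime": ["python-tox"]},
}


def collect(elements: Dict[str, Dict[str, List[str]]], phases: List[str]) -> List[str]:
    """Union of the packages every element declares for the given phases."""
    wanted = set()
    for manifest in elements.values():
        for phase in phases:
            wanted.update(manifest.get(phase, []))
    return sorted(wanted)


def plan(elements, include_tests=False):
    """Return (install, remove): one big install run and one cleanup run.

    A build/test package is only removed if no element also lists it as a
    runtime dependency, so an element that really needs gcc at runtime
    (the cffi case) keeps it by declaring it under "runtime".
    """
    phases = ["runtime", "buildtime"] + (["testtime"] if include_tests else [])
    install = collect(elements, phases)
    runtime = set(collect(elements, ["runtime"]))
    remove = [p for p in collect(elements, phases[1:]) if p not in runtime]
    return install, remove


if __name__ == "__main__":
    install, remove = plan(ELEMENTS)
    print("install:", " ".join(install))
    print("remove:", " ".join(remove))
```

Keeping testtime separate means production images never pull in test-only packages unless asked to.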
lifeless | greghaynes: 7 pushed up while I was reviewing - sorry | 00:04 |
greghaynes | is np, forgot about the stackname deal | 00:04 |
greghaynes | er, forgot about the cluster name | 00:04 |
lifeless | greghaynes: as for what next - I think the big arc is something like this - get it working, get it in CI, where working means '3 node control planes work' | 00:05 |
lifeless | StevenK: reminds me - ping - I'd love it if you broadened your hacking slightly to include heterogeneous VM descriptions | 00:05 |
lifeless | StevenK: so we can make the hypervisors smaller | 00:05 |
greghaynes | Does 3 node control planes work mean with upgrading? | 00:06 |
lifeless | greghaynes: we don't test upgrade in CI yet | 00:06 |
lifeless | greghaynes: the graceful upgrade arc also needs to be pushed on | 00:06 |
lifeless | greghaynes: and there are cross-arc deps - like, hard to test graceful works properly without a cluster to graceful upgrade. | 00:07 |
lifeless | greghaynes: and - hard to have a cluster without graceful deploys | 00:07 |
greghaynes | true. Such dependencies | 00:07 |
greghaynes | ok, I'll play with merge.py and the other things preventing my stack from reaching CREATE_COMPLETE when deploying to controlscale > 1. | 00:08 |
lifeless | yeah, CREATE_COMPLETE Is the first step | 00:10 |
*** geerdest has quit IRC | 00:10 | |
derekh | lifeless: we got green jobs | 00:11 |
lifeless | fuckyeah | 00:12 |
lifeless | and one rather long backlog :) | 00:12 |
derekh | yup, testenvs will be busy | 00:12 |
*** sdake_1 has quit IRC | 00:15 | |
*** rpodolyaka1 has joined #tripleo | 00:19 | |
slagle | lifeless: so my plan for stable branches is to create an "icehouse" branch for tie, tht, t-inc, and tuskar. and document on the ReleaseManagement page that for doing releases for tie/tht/tuskar from the icehouse branch, you need to add an additional .[0-9] to the tag you create for the version | 00:23 |
slagle | lifeless: does that sound ok? | 00:23 |
*** CaptTofu has joined #tripleo | 00:23 | |
*** rpodolyaka1 has quit IRC | 00:24 | |
lifeless | slagle: huh | 00:24 |
lifeless | slagle: let me dig up the thread where we discussed this | 00:24 |
lifeless | slagle: versions should still be x.y.z right ? | 00:26 |
*** matsuhashi has joined #tripleo | 00:26 | |
lifeless | slagle: or are you proposing x.y.z.a ? | 00:27 |
slagle | lifeless: x.y.z.a, or you wouldn't be able to upgrade and stay on stable | 00:27 |
slagle | unless we say that you always bump .Y when releasing from master | 00:27 |
slagle | and we only use .Z for releasing from the stable branch | 00:28 |
*** CaptTofu has quit IRC | 00:28 | |
*** derekh has quit IRC | 00:28 | |
xuhaiwei | lifeless: when running the devtest_seed.sh, it's asking root password for many times, is it normal? | 00:28 |
lifeless | xuhaiwei: no | 00:28 |
*** openstackgerrit has joined #tripleo | 00:30 | |
lifeless | slagle: so - I have an alternative proposal | 00:37 |
lifeless | slagle: staying to x.y.z is important - see the semver docs in pbr | 00:37 |
lifeless | slagle: if we make sure the next release of tie/tht/tuskar increments the y of x.y.z | 00:39 |
lifeless | slagle: then the stable branch can increment z indefinitely | 00:39 |
lifeless | slagle: t-inc doesn't release | 00:39 |
lifeless | slagle: so it can just create a branch | 00:39 |
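The scheme agreed here (master releases bump y, the stable branch only ever bumps z, versions stay x.y.z) can be sketched as below. The branch-point tag "0.1.4" is hypothetical; the point is that every stable release sorts below the next master release.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(tag: str) -> Version:
    """Split an x.y.z tag into integers so versions compare numerically."""
    x, y, z = (int(part) for part in tag.split("."))
    return (x, y, z)


def fmt(version: Version) -> str:
    return "%d.%d.%d" % version


def next_master_release(last_tag: str) -> str:
    """Releases from master bump y and reset z."""
    x, y, _ = parse(last_tag)
    return fmt((x, y + 1, 0))


def next_stable_release(last_tag: str) -> str:
    """Releases from the stable branch only ever bump z."""
    x, y, z = parse(last_tag)
    return fmt((x, y, z + 1))


if __name__ == "__main__":
    branch_point = "0.1.4"  # hypothetical tag the stable branch was cut from
    stable = branch_point
    for _ in range(3):
        stable = next_stable_release(stable)
        print("stable:", stable)
    print("master:", next_master_release(branch_point))
```

Comparing parsed tuples rather than strings avoids "0.1.10" sorting before "0.1.9".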
*** eguz has joined #tripleo | 00:39 | |
slagle | lifeless: ok, that works for me | 00:39 |
slagle | should be simpler too. i can doc that on the ReleaseManagement wiki | 00:40 |
slagle | and yes, no release for t-inc :). | 00:41 |
*** eghobo has quit IRC | 00:43 | |
greghaynes | SpamapS: If I want to add a resource to a heat template that is 'somestring-' + Heat::RandomString is there a way to do this? Specifically I want to make a MysqlClusterName resource that mysql can use... | 00:43 |
SpamapS | greghaynes: no | 00:44 |
greghaynes | :_( | 00:44 |
SpamapS | greghaynes: the name of the resource is entirely static unfortunately. | 00:45 |
SpamapS | greghaynes: you can just have a MySQLClusterUniquePart randomstring and prepend somestring- when you use it. | 00:45 |
greghaynes | Yep, doing that | 00:46 |
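The workaround SpamapS suggests (a random-string resource, with "somestring-" prepended where it is used) looks roughly like the CFN-style snippet built below. Note that Fn::Join takes a delimiter and a *nested* list of values. The resource name is hypothetical, and both the resource type and the use of Ref to read the random value are assumptions to check against your Heat version; the small resolver only illustrates the join semantics locally and is not Heat code.

```python
import json

# Hypothetical resource; the type name and reading its value via Ref are assumptions.
RESOURCES = {
    "MySQLClusterUniquePart": {"Type": "OS::Heat::RandomString"},
}

# Fn::Join takes [delimiter, [list of values]]; the values go in their own list.
CLUSTER_NAME = {"Fn::Join": ["-", ["somestring", {"Ref": "MySQLClusterUniquePart"}]]}


def resolve(node, refs):
    """Tiny local evaluator for Ref and Fn::Join, for illustration only."""
    if isinstance(node, dict) and len(node) == 1:
        (key, arg), = node.items()
        if key == "Ref":
            return refs[arg]
        if key == "Fn::Join":
            delimiter, parts = arg
            return delimiter.join(resolve(part, refs) for part in parts)
    return node


if __name__ == "__main__":
    print(json.dumps({"Resources": RESOURCES}, indent=2))
    print(json.dumps(CLUSTER_NAME))
    print(resolve(CLUSTER_NAME, {"MySQLClusterUniquePart": "a1b2c3"}))
```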
*** matsuhashi has quit IRC | 00:47 | |
*** matsuhashi has joined #tripleo | 00:48 | |
*** sdake has joined #tripleo | 00:48 | |
*** sdake has quit IRC | 00:48 | |
*** sdake has joined #tripleo | 00:48 | |
*** e0ne has joined #tripleo | 00:52 | |
*** matsuhashi has quit IRC | 00:52 | |
*** e0ne has quit IRC | 00:56 | |
*** matsuhashi has joined #tripleo | 00:57 | |
SpamapS | greghaynes: I do think Heat needs the concept of variables or at the very least macros, just to make code easier to read. | 00:58 |
lifeless | we were using params for that | 00:59 |
lifeless | it broke | 00:59 |
lifeless | apparently its ok to put it or something like it back in, but needs doing + tests | 00:59 |
SpamapS | greghaynes: Something like Variables: {MySQLClusterName: {Fn::Join: ['-', 'something', {Ref: RandomMySQLPart}]}} | 00:59 |
SpamapS | lifeless: Yeah, parameters seems like a violation though. I think that should be possible, but local variables would be good too | 01:00 |
SpamapS | lifeless: btw I fixed this: https://review.openstack.org/#/c/83614/ | 01:00 |
SpamapS | lifeless: it's blocking the software-config migration | 01:00 |
greghaynes | hrm, that doesnt seem too hard to implement either. Would you use the same way (Ref:) to refer to them and how would you deal with colliding namespaces then | 01:01 |
greghaynes | or do something like VarRef: | 01:01 |
lifeless | SpamapS: huh we don't even run unittests for occ | 01:06 |
lifeless | oh, its tie, nvm | 01:09 |
openstackgerrit | A change was merged to openstack-infra/tripleo-ci: Expose more testenv parameters https://review.openstack.org/82855 | 01:10 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Fixup testenv config for interface names. https://review.openstack.org/84326 | 01:11 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Fixup HP region testenv config. https://review.openstack.org/84075 | 01:11 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Performance tweaks for testenv deploy script. https://review.openstack.org/84073 | 01:11 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Tune deploy-ci-overcloud a little. https://review.openstack.org/84076 | 01:11 |
*** rpodolyaka1 has joined #tripleo | 01:19 | |
*** eguz has quit IRC | 01:20 | |
*** rpodolyaka1 has quit IRC | 01:21 | |
*** ramishra has joined #tripleo | 01:22 | |
lifeless | thats a new one http://paste.openstack.org/show/74788/ | 01:27 |
*** ramishra has quit IRC | 01:30 | |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: A sysctl element to manage settings via sysctl.d. https://review.openstack.org/84599 | 01:33 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Update bootstack to use sysctl-set-value. https://review.openstack.org/84600 | 01:33 |
*** bauzas has quit IRC | 01:39 | |
*** nosnos has joined #tripleo | 01:47 | |
*** e0ne has joined #tripleo | 01:52 | |
*** e0ne has quit IRC | 01:57 | |
*** spzala has quit IRC | 01:58 | |
*** ccrouch has left #tripleo | 02:07 | |
*** CaptTofu has joined #tripleo | 02:07 | |
*** sballe has joined #tripleo | 02:11 | |
*** rpodolyaka1 has joined #tripleo | 02:20 | |
*** rpodolyaka1 has quit IRC | 02:24 | |
*** sballe has quit IRC | 02:26 | |
*** newell_ has quit IRC | 02:29 | |
*** yamahata has joined #tripleo | 02:31 | |
*** ramishra has joined #tripleo | 02:36 | |
*** rlandy has quit IRC | 02:37 | |
*** ramishra_ has joined #tripleo | 02:37 | |
*** ramishra has quit IRC | 02:41 | |
*** giulivo has quit IRC | 02:46 | |
*** e0ne has joined #tripleo | 02:52 | |
*** e0ne has quit IRC | 02:57 | |
*** xuhaiwei has quit IRC | 02:58 | |
*** untriaged-bot has joined #tripleo | 03:00 | |
untriaged-bot | Untriaged bugs so far: | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1290488 | 03:00 |
*** untriaged-bot has quit IRC | 03:00 | |
uvirtbot | Launchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete] | 03:00 |
openstackgerrit | Om Kumar proposed a change to openstack/diskimage-builder: Fix Grub configurations for Fedora images built on a UEFI host. https://review.openstack.org/83342 | 03:09 |
*** CaptTofu has quit IRC | 03:12 | |
StevenK | lifeless: So if I'm understanding you correctly, init-heat only needs host defined, init-keystone needs them all, and init-swift doesn't need any? | 03:14 |
lifeless | StevenK: yup | 03:17 |
lifeless | StevenK: in more detail | 03:17 |
lifeless | init-keystone can't use the keystone API normally because the keystone API requires that you have an admin account created | 03:18 |
lifeless | init-heat and init-swift should use the normal keystone API and not require (or have access to) the admin token | 03:18 |
lifeless | they should instead use the normal keystoneclient CLI glue - the OS_USERNAME etc etc variables, which I *think* there are trivial facilities to reuse in python-keystoneclient etc | 03:19 |
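A minimal sketch of the "normal keystoneclient CLI glue" idea for init-heat and init-swift: gather the usual OS_* variables from the environment and refuse to run without them, never touching an admin/service token. The exact set of variables these scripts need is an assumption, and the returned dict is only meant to be handed on to whatever client library is used.

```python
import os
from typing import Dict, Mapping

# The usual client variables; which ones init-heat/init-swift need is an assumption.
REQUIRED = ("OS_USERNAME", "OS_PASSWORD", "OS_TENANT_NAME", "OS_AUTH_URL")


def credentials_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect normal user credentials; deliberately ignores any admin token."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # OS_TENANT_NAME -> tenant_name, OS_AUTH_URL -> auth_url, and so on.
    return {name[len("OS_"):].lower(): environ[name] for name in REQUIRED}
```

Only init-keystone, which has to bootstrap the admin account, would need the admin token path.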
killer_prince | lifeless: were you able to re-review https://review.openstack.org/#/c/79873/ (Refactor code to select boot kernel) | 03:20 |
*** rpodolyaka1 has joined #tripleo | 03:21 | |
lifeless | no, CI cloud was down last 2.75 days | 03:21 |
lifeless | we're now up and monitoring it | 03:21 |
lifeless | but that took precedence over everything | 03:21 |
*** matsuhashi has quit IRC | 03:21 | |
killer_prince | aha.. np... just checking.. | 03:22 |
killer_prince | can you re-review it now.. | 03:22 |
killer_prince | if you have time.. | 03:22 |
*** rpodolyaka1 has quit IRC | 03:25 | |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: Add CLI scripts for init-{heat,keystone,swift} https://review.openstack.org/84330 | 03:30 |
*** jtomasek has quit IRC | 03:33 | |
*** morganfainberg is now known as morganfainberg_Z | 03:34 | |
*** nosnos has quit IRC | 03:37 | |
*** ramishra_ has quit IRC | 03:37 | |
*** ramishra has joined #tripleo | 03:37 | |
*** xuhaiwei has joined #tripleo | 03:37 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Enable Galera clustering https://review.openstack.org/83675 | 03:41 |
*** eghobo has joined #tripleo | 03:45 | |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: Add CLI scripts for init-{heat,keystone,swift} https://review.openstack.org/84330 | 03:47 |
*** e0ne has joined #tripleo | 03:52 | |
openstackgerrit | James Polley proposed a change to openstack/tripleo-incubator: Standardise location of environment password/rc files. https://review.openstack.org/83250 | 03:53 |
*** killer_prince is now known as lazy_prince | 03:55 | |
*** e0ne has quit IRC | 03:56 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-heat-templates: Add initial support for galera clustering https://review.openstack.org/83883 | 03:57 |
tchaypo | greghaynes: i love your nitpicking | 03:58 |
greghaynes | :p | 03:58 |
greghaynes | sorry, I wish I could find it all in one round | 03:58 |
tchaypo | I wish i thought of it when i first wrote it | 03:59 |
* tchaypo dreams of denying greghaynes the pleasure of nitpicking | 03:59 | |
greghaynes | one day | 03:59 |
tchaypo | almost as fun will be being nitpicky on your commits :) | 03:59 |
* greghaynes hides | 04:00 | |
*** rpodolyaka1 has joined #tripleo | 04:02 | |
StevenK | tchaypo: I find the best solution is to have stuff he doesn't notice land. | 04:02 |
tchaypo | it's hard when he never seems to not be here | 04:02 |
lifeless | tchaypo: ok so | 04:03 |
tchaypo | maybe we should work US-EAST hours to avoid him? | 04:03 |
lifeless | tchaypo: IIRC you were going to dive in on the HA stuff w/greghaynes? | 04:03 |
lifeless | tchaypo: hows that going? can I help at all ? | 04:03 |
*** akuznetsov has joined #tripleo | 04:03 | |
tchaypo | so yesterday I grabbed all of greghaynes ' patches and cherry-picked them, then tried to build myself an overcloud. it failed, but it looks like that might have been my own error, so I'm making another attempt today | 04:04 |
lifeless | cool | 04:05 |
greghaynes | tchaypo: Want a really bad heat template? | 04:05 |
tchaypo | from what I've seen, the notcomputescale stuff is working as far as building out the right number of nodes, it just requires changes to the services that need to be HA | 04:06 |
tchaypo | greghaynes: yes please | 04:06 |
greghaynes | mm, well the (now named) controlscale change doesnt work for values != 1 | 04:06 |
tchaypo | once I have enough nodes running, I plan to look at what you had to do for percona and make some guesses based on that about what I'd have to do for rabbitmq (which should be far less, probably just telling it "you're not alone" or something, but since I know nothing about rabbit...) | 04:07 |
tchaypo | in what way doesn't it work? | 04:07 |
greghaynes | The merge.py scaling operation doesnt result in a sane template | 04:07 |
greghaynes | hence the really bad template which I've been using rather than relying on merge.py to perform the scaling | 04:08 |
tchaypo | ah, right. I didn't get that far yesterday. | 04:08 |
lifeless | tchaypo: one thing you can do to make this less painful is skip the undercloud | 04:09 |
lifeless | register all nodes with the seed rather than just the first one | 04:09 |
StevenK | lifeless: Are you preparing a novel for a PTL mail? | 04:11 |
tchaypo | I think I understand the concept, but the details of how nodes get registered is one of those areas I've been skipping past and planning to investigate when it starts blocking me. One of the things I saw yesterday is that _overcloud.sh was getting stuck waiting for nova hypervisor-stats to show a count of >1 | 04:13 |
tchaypo | I'm assuming that count is the count of registered nodes | 04:13 |
tchaypo | and I think the registration happens inside setup-baremetal, so I'm just arriving at the point of starting to dig into that to see what it does | 04:14 |
StevenK | register-nodes | 04:14 |
StevenK | Which setup-baremetal calls | 04:14 |
greghaynes | nova baremetal-node-list is more directly that, hypervisor-stats is what resources are used / available | 04:14 |
greghaynes | so they are related | 04:14 |
tchaypo | oh look, it calls a script called setup-nodes | 04:14 |
tchaypo | that was fairly obvious, once i looked | 04:15 |
StevenK | I don't think so, I think it's register-nodes :-) | 04:15 |
tchaypo | greghaynes: but the script is calling hypervisor-stats, so that's what i care about | 04:15 |
tchaypo | StevenK: dangnabbit. | 04:15 |
* StevenK is currently rewriting register-nodes, anyway | 04:15 | |
greghaynes | yep, I mention it because when you start seeing No Valid Host Found errors from nova youll want to know both commands | 04:16 |
tchaypo | thanks | 04:16 |
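The stall tchaypo hit in _overcloud.sh is a poll loop: keep checking until the hypervisor count reaches a target. If fewer nodes were registered than the count being waited for, the condition can never become true. The sketch below is generic; the `check` callable (which might parse `nova hypervisor-stats` output) is hypothetical.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], attempts: int = 60, delay: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() until it returns True; return how many attempts it took."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # no pointless sleep after the final failed attempt
    raise TimeoutError("condition not met after %d attempts" % attempts)
```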
StevenK | Well. So far I'm trying to work out how to talk to nova-bm using the python API, and not having much luck. | 04:16 |
tchaypo | okay, so if we're not using ironic it looks like "register a node" means "nova baremetal-interface-add" | 04:16 |
StevenK | Nope | 04:17 |
lifeless | StevenK: can I suggest subprocess.check_call ? | 04:17 |
lifeless | StevenK: iteration 0 | 04:17 |
StevenK | A few lines up | 04:17 |
StevenK | lifeless: Ew? | 04:17 |
lifeless | StevenK: I know it's not great, but its better than being bottlenecked on it - once its in the reusable place many people can help fix it | 04:17 |
lifeless | StevenK: right now, they can't, and tuskar can't use the functionality at all | 04:17 |
tchaypo | oh, nova baremetal-node-create ? | 04:17 |
StevenK | Yeah | 04:17 |
greghaynes | winner | 04:17 |
* tchaypo feels slightly less clueless | 04:18 | |
StevenK | lifeless: Now I'm currently internally battling about the perfect is the enemy of the good | 04:18 |
greghaynes | tchaypo: https://gist.github.com/greghaynes/9927862 | 04:19 |
lifeless | StevenK: of course :) | 04:20 |
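"Iteration 0" with subprocess.check_call might look like the sketch below: build one `nova baremetal-node-create` command per node and shell out, letting check_call raise on a non-zero exit. The node keys, the argument order and the service host name are assumptions; check them against your novaclient version and TE_DATAFILE before relying on this.

```python
import subprocess
from typing import Any, Callable, Dict, List, Sequence

Node = Dict[str, Any]


def node_create_argv(node: Node, service_host: str) -> List[str]:
    """Build a `nova baremetal-node-create` command line for one node.

    Assumed node keys: pm_addr, pm_user, pm_password, cpu, memory, disk, mac.
    """
    return [
        "nova", "baremetal-node-create",
        "--pm_address=%s" % node["pm_addr"],
        "--pm_user=%s" % node["pm_user"],
        "--pm_password=%s" % node["pm_password"],
        service_host,
        str(node["cpu"]), str(node["memory"]), str(node["disk"]),
        str(node["mac"][0]),
    ]


def register_nodes(nodes: Sequence[Node], service_host: str,
                   run: Callable[[List[str]], Any] = subprocess.check_call) -> None:
    """Shell out once per node; subprocess.check_call raises on failure."""
    for node in nodes:
        run(node_create_argv(node, service_host))
```

Passing `run` in makes the wrapper testable without a cloud, and later iterations could swap the subprocess call for a Python API call without changing callers.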
tchaypo | ah, i think i understand how this can save me time now. Instead of having to do all of devtest_undercloud just to register the one undercloud node, then build it out, set up everything again, just to register the overcloud nodes so i can heat stack-create overcloud | 04:20 |
StevenK | You're right. That is pretty horrible. | 04:20 |
greghaynes | be afraid | 04:20 |
tchaypo | registering those nodes with the seed should let me skip directly to heat stack-create overcloud? | 04:20 |
lifeless | tchaypo: well, to all the logic in devtest_overcloud.sh; yes. | 04:20 |
tchaypo | to me that feels like a lot of effort for very little saving | 04:21 |
lifeless | tchaypo: 15m or so | 04:21 |
lifeless | tchaypo: and by lots of effort you mean one script call ? | 04:21 |
lifeless | tchaypo: source seedrc; register-nodes seed <(jq '.nodes - [.nodes[0]]' $TE_DATAFILE) | 04:23 |
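The `jq '.nodes - [.nodes[0]]' $TE_DATAFILE` filter hands register-nodes every node except the first (which the seed's own flow already uses). A Python equivalent is sketched below. One subtlety: jq array subtraction drops *every* element equal to the first node, not just index 0; with unique node entries that is the same as `nodes[1:]`.

```python
import json
import os
import sys
from typing import Any, Dict, List


def nodes_except_first(data: Dict[str, Any]) -> List[Any]:
    """Python equivalent of jq '.nodes - [.nodes[0]]' (removes all copies of the first node)."""
    nodes = data.get("nodes", [])
    if not nodes:
        return []
    first = nodes[0]
    return [node for node in nodes if node != first]


if __name__ == "__main__":
    path = os.environ.get("TE_DATAFILE")
    if path:
        with open(path) as handle:
            json.dump(nodes_except_first(json.load(handle)), sys.stdout)
```

Used as, for example, `register-nodes seed <(python3 drop_first_node.py)`, where the script name is hypothetical; bash process substitution passes the output to register-nodes as a file path.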
tchaypo | i mean that creating the undercloud is one call; to register the nodes it looks like I'd have to pick apart the nova baremetal-node-create call and figure out how to do one by hand | 04:24 |
tchaypo | however you've just made me realise i don't need to do that at all. | 04:24 |
StevenK | tchaypo: Er, you call register-nodes | 04:24 |
StevenK | Like lifeless just pointed out | 04:24 |
lifeless | tchaypo: a design principle we have is to have small tools | 04:26 |
lifeless | tchaypo: that are reusable | 04:26 |
tchaypo | and loosely joined | 04:27 |
lifeless | right | 04:27 |
lifeless | so anytime you have cognitive dissonance between someone saying 'do X, its easy' and 'omg wall of stuff to do' - look for, and ask about, a tool :) | 04:27 |
lifeless | (because there probably is one) | 04:28 |
tchaypo | in this case i was too busy reading the guts of the tool to realise that the tool itself is what i wanted | 04:28 |
lifeless | :) | 04:28 |
tchaypo | okay, I'm going to take a short tea-break and then register some nodes | 04:29 |
StevenK | tchaypo: My blood, sweat and tears are in the guts of register-nodes | 04:29 |
tchaypo | I assume that one difference in this case is that I'll want to be sourcing seedrc rather than undercloudrc prior to running _overcloud.sh | 04:30 |
*** nosnos has joined #tripleo | 04:31 | |
lifeless | tchaypo: yes | 04:31 |
*** killer_prince has joined #tripleo | 04:35 | |
*** matsuhashi has joined #tripleo | 04:36 | |
*** Rakesh5 has joined #tripleo | 04:49 | |
*** matsuhashi has quit IRC | 04:50 | |
*** e0ne has joined #tripleo | 04:52 | |
*** radsy has quit IRC | 04:55 | |
*** cwolferh has quit IRC | 04:56 | |
*** cwolferh has joined #tripleo | 04:57 | |
*** e0ne has quit IRC | 04:57 | |
*** matsuhashi has joined #tripleo | 05:00 | |
openstackgerrit | A change was merged to openstack/diskimage-builder: Fix dhcp-all-interfaces upstart job https://review.openstack.org/84539 | 05:11 |
*** e0ne has joined #tripleo | 05:24 | |
xuhaiwei | OSError: [Errno 2] No such file or directory: '/tmp/pypi/markupsafe/' means markupsafe package download failed? | 05:33 |
xuhaiwei | this package is in the nova/requirements, but can't find it in the mirror | 05:34 |
xuhaiwei | sorry, in the mirror I found MarkupSafe, it can't be used just because the name uses capital character ? | 05:38 |
xuhaiwei | lifeless: Could you please answer this question? | 05:40 |
*** e0ne has quit IRC | 05:45 | |
*** e0ne has joined #tripleo | 05:45 | |
*** lazy_prince has quit IRC | 05:46 | |
*** killer_p- has joined #tripleo | 05:47 | |
tchaypo | afternoon xuhaiwei | 05:47 |
xuhaiwei | good aternoon | 05:48 |
xuhaiwei | afternoon, :) | 05:48 |
tchaypo | it's almost 8pm for lifeless, I don't think he's going to be around | 05:50 |
lifeless | xuhaiwei: I don't see markupsafe in the requirements | 05:50 |
lifeless | tchaypo: 7pm atm | 05:50 |
lifeless | tchaypo: and C is just watching TV so I have a minute | 05:50 |
*** e0ne has quit IRC | 05:50 | |
xuhaiwei | I am glad you are still here | 05:50 |
lifeless | xuhaiwei: https://pypi.python.org/pypi/MarkupSafe is the official pypi page for it | 05:50 |
xuhaiwei | but I see this log :Downloading/unpacking markupsafe (from Jinja2->-r /opt/stack/nova/requirements.txt (line 8)) | 05:50 |
lifeless | xuhaiwei: so you can see the name should be MarkupSafe | 05:51 |
lifeless | xuhaiwei: so Jinja2 is a dependency | 05:51 |
lifeless | xuhaiwei: markupsafe is a dependency of Jinja2 | 05:51 |
xuhaiwei | jinja2 is not in the mirror | 05:52 |
tchaypo | I dug into the code a few weeks ago - it turns out there are a few weird things with case | 05:52 |
tchaypo | the pypi servers do weird things to be mostly case-insensitive; and pip itself will re-try with the name all in lowercase if it fails the first time | 05:52 |
xuhaiwei | Jinja2 is in the mirror, why it fails to use it | 05:52 |
tchaypo | but usually download fails are just transient network issues - does it consistently fail every time you try? | 05:52 |
lifeless | xuhaiwei: this is what I see: | 05:53 |
lifeless | Downloading/unpacking markupsafe (from Jinja2) | 05:53 |
lifeless | Real name of requirement markupsafe is MarkupSafe | 05:53 |
lifeless | http://mirror.robertcollins.net/pypi/simple/MarkupSafe/ uses an insecure transport scheme (http). Consider using https if mirror.robertcollins.net has it available | 05:53 |
lifeless | do you see the Real name line ? | 05:53 |
xuhaiwei | the Real name line? | 05:55 |
xuhaiwei | from where? | 05:55 |
xuhaiwei | I can't access to http://mirror.robertcollins.net/pypi/simple/MarkupSafe/ | 05:55 |
lifeless | thats my mirror, its a private url | 05:55 |
lifeless | 18:53 < lifeless> Real name of requirement markupsafe is MarkupSafe | 05:55 |
lifeless | ^ that line | 05:55 |
lifeless | do you see it in your log ? | 05:55 |
xuhaiwei | no | 05:56 |
*** rpodolyaka1 has quit IRC | 05:57 | |
lifeless | ok so now we need to figure out why :) | 05:57 |
StevenK | Hm | 05:57 |
StevenK | Tomorrow or Friday could be interesting -- plumber visit with a new hot water service. | 05:57 |
xuhaiwei | there is only "markupsafe" | 05:57 |
lifeless | xuhaiwei: can you pastebin exactly what you see please v? | 05:58 |
xuhaiwei | ok | 05:58 |
xuhaiwei | http://paste.openstack.org/show/74795/ | 05:59 |
lifeless | oh wow thats nasty | 06:01 |
lifeless | I know now how pip handles case insensitivity | 06:01 |
StevenK | bnemec: Found your blog article about pypi mirroring -- Ubuntu Saucy i386 is ~60GiB, but adding amd64 is probably only going to bump that by 25GiB or so. | 06:03 |
*** e0ne has joined #tripleo | 06:03 | |
tchaypo | lifeless: case not-entirely-sensitivity might be a better name | 06:04 |
*** e0ne has quit IRC | 06:04 | |
lifeless | tchaypo: look at pip/index.py line 433 | 06:04 |
tchaypo | lifeless: but the issue is compounded by the fact that the pypi official servers do extra case insensitivity server-side | 06:04 |
lifeless | tchaypo: no, thats irrelevant | 06:05 |
tchaypo | lifeless: that's the line that calls .tolower() and then s/-/_/ right? | 06:05 |
lifeless | tchaypo: my local mirror *doesn't* and it still works. | 06:05 |
lifeless | tchaypo: no, its worse | 06:05 |
tchaypo | oh dear. | 06:05 |
StevenK | Hmmm. https://pypi.python.org/pypi/pep381client | 06:05 |
tchaypo | lifeless: for bad_ext in ... ? | 06:08 |
tchaypo | lines 197-214 of pip-1.2-py2.7.egg/pip/index.py are where the "Real name of requirement %s is %s" comes from | 06:11 |
lifeless | logger.notify( | 06:13 |
lifeless | 'Real name of requirement %s is %s' % (url_name, base) | 06:13 |
lifeless | yeah anyhow | 06:13 |
*** cwolferh has quit IRC | 06:14 | |
tchaypo | yep. you must be looking at a different version if that's 433 for you | 06:14 |
tchaypo | anyway - xuhaiwei - have we helped at all, or are you still stuck? | 06:15 |
xuhaiwei | I am still stuck | 06:15 |
lifeless | tchaypo: git :) | 06:15 |
tchaypo | I'd suggest looking at /root/.pip/pip.log, which *might* have more info | 06:16 |
lifeless | tchaypo: basically it reads the full index | 06:16 |
tchaypo | that's from line 448 of your paste | 06:16 |
lifeless | and then looks for a case insensitive match | 06:16 |
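The behaviour lifeless describes (read the whole index, then look for a case-insensitive match, logging "Real name of requirement ... is ...") can be illustrated with the simplified sketch below. It is not pip's actual `_find_url_name` code. It also shows why xuhaiwei's run fails: if the index page cannot be fetched (here, because of the proxy), there is nothing to match against.

```python
from typing import Iterable, Optional


def find_real_name(requested: str, index_names: Iterable[str]) -> Optional[str]:
    """Return the index's spelling of `requested`, matched case-insensitively."""
    wanted = requested.lower()
    for name in index_names:
        if name.lower() == wanted:
            return name
    return None  # e.g. the index page could not be fetched at all


if __name__ == "__main__":
    index = ["Jinja2", "MarkupSafe", "nova"]  # stand-in for the /simple/ page
    real = find_real_name("markupsafe", index)
    if real is not None and real != "markupsafe":
        print("Real name of requirement %s is %s" % ("markupsafe", real))
```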
xuhaiwei | can I install it by hand? | 06:16 |
tchaypo | yeah, my second suggestion was going to be a manual "pip install markupsafe" to see if that works | 06:16 |
tchaypo | if it does I'd try uninstalling and then retry the previous step again to rule out transient errors | 06:17 |
lifeless | xuhaiwei: bear with me - I'm looking at your bug, but I'm OTP right now | 06:17 |
xuhaiwei | otp? | 06:18 |
tchaypo | on the phone | 06:18 |
xuhaiwei | oh, ok | 06:18 |
xuhaiwei | I am not so eager | 06:18 |
xuhaiwei | I tried pip install markupsafe, but it hit an proxy problem again | 06:19 |
xuhaiwei | what about changing the mirror file name to 'markupsafe' | 06:20 |
tchaypo | lifeless: yep, _find_url_name is 197-214 in the version i was looking at, but 414-436 in the git version | 06:20 |
*** rpodolyaka1 has joined #tripleo | 06:21 | |
tchaypo | xuhaiwei: you could, but then you're probably going to hit similar issues with the proxy later and have to solve them; I'd rather tackle the root problem so you don't hit it again later | 06:21 |
tchaypo | what proxy problem did you see? | 06:21 |
xuhaiwei | Cannot fetch index base URL http://pypi.python.org/simple/ Could not find any downloads that satisfy the requirement markupsafe | 06:21 |
lifeless | tchaypo: I believe the problem is in pip _get_page | 06:21 |
xuhaiwei | I think this is caused by the proxy | 06:22 |
tchaypo | ah, right. | 06:22 |
tchaypo | so fixing the name of markupsafe won't do anything here | 06:22 |
tchaypo | as lifeless said, pip crawls through all the links in the page it retrieves from http://pypi.python.org/simple/ and looks for something that looks like "markupsafe" | 06:23 |
lifeless | xuhaiwei: your mirror is in ~/.cache/ ... right ? | 06:23 |
xuhaiwei | If I fix the mirror file name, it wont go to the proxy, right? | 06:23 |
lifeless | tchaypo: yeah and if get_page doesn't return things properly ... | 06:23 |
xuhaiwei | yes | 06:23 |
tchaypo | in this case it's failing to download that page, so it can't look at the index | 06:23 |
lifeless | xuhaiwei: can you get me ls -lR from your cache? gzip that and put it somewhere I can download it ? | 06:24 |
tchaypo | yep, if you can make it never need to look up the index it should be better. | 06:24 |
xuhaiwei | lifeless: I will try to do it | 06:25 |
*** rdopieralski has joined #tripleo | 06:30 | |
xuhaiwei | can I send you a mail? | 06:31 |
*** rdopieralski has quit IRC | 06:31 | |
*** rdopieralski has joined #tripleo | 06:31 | |
*** shardy_afk is now known as shardy | 06:33 | |
xuhaiwei | lifeless: can I send you a mail? | 06:33 |
*** rpodolyaka1 has quit IRC | 06:33 | |
lifeless | yes | 06:33 |
lifeless | robertc at robetcollins dot net | 06:33 |
tchaypo | in other news, I think I know why nova hypervisor-stats is reporting just one - my nodes.js only has two nodes. | 06:36 |
greghaynes | nodes.js? What are you doing!? | 06:37 |
tchaypo | bah. | 06:37 |
tchaypo | s/nodes.js/$TE_DATAFILE | 06:37 |
tchaypo | brain thinks: the js file that defines the nodes. fingers type: nodes.js | 06:37 |
openstackgerrit | A change was merged to openstack/tuskar-ui: Removing unused testadata https://review.openstack.org/84406 | 06:38 |
xuhaiwei | lifeless: have you got the mail? | 06:39 |
lifeless | I dont think so | 06:42 |
lifeless | oh I typoed | 06:42 |
lifeless | robertc at robertcollins dot net | 06:42 |
xuhaiwei | I will send it again | 06:43 |
xuhaiwei | and this time? | 06:46 |
lifeless | i have it | 06:47 |
*** mrunge has joined #tripleo | 06:48 | |
openstackgerrit | James Polley proposed a change to openstack/tripleo-incubator: Minor tweaks to docs in _testenv.sh https://review.openstack.org/84640 | 07:08 |
tchaypo | greghaynes: do your worst. | 07:08 |
*** ramishra has quit IRC | 07:11 | |
StevenK | Haha | 07:14 |
greghaynes | done | 07:14 |
*** ramishra has joined #tripleo | 07:16 | |
*** jcoufal has joined #tripleo | 07:17 | |
*** bauzas has joined #tripleo | 07:20 | |
*** jprovazn has joined #tripleo | 07:21 | |
rpodolyaka | morning | 07:25 |
GheRivero | morning all | 07:28 |
*** jtomasek has joined #tripleo | 07:28 | |
lifeless | xuhaiwei: looking now | 07:28 |
lifeless | xuhaiwei: ok, so there is a MarkupSafe directory, now we need to see why pip isn't identifying that correctly | 07:29 |
xuhaiwei | yeah | 07:29 |
*** akuznetsov has quit IRC | 07:29 | |
*** giulivo has joined #tripleo | 07:30 | |
ProfFalken | If I want to run nova client against the undercloud as part of a post-install.d script, which heat variable should I use in my overcloud-source.yaml file? | 07:31 |
*** rpodolyaka1 has joined #tripleo | 07:33 | |
*** akuznetsov has joined #tripleo | 07:36 | |
*** rpodolyaka1 has quit IRC | 07:36 | |
lifeless | ProfFalken: sorry, not entirely sure what you mean by that | 07:37 |
lifeless | xuhaiwei: what version of pip do you have installed? | 07:39 |
lifeless | xuhaiwei: trunk looks like it should work for you | 07:39 |
*** ifarkas has joined #tripleo | 07:40 | |
xuhaiwei | python-pip 1.0-1build1 | 07:40 |
ProfFalken | lifeless: don't worry, there was an error in my heat template - I was pointing at the wrong host! | 07:42 |
lifeless | xuhaiwei: try this patch please http://paste.ubuntu.com/7193216/ | 07:43 |
xuhaiwei | lifeless: it could be the version's problem? | 07:43 |
lifeless | xuhaiwei: it may be | 07:43 |
ProfFalken | lxsli pointed out the error :) | 07:43 |
lifeless | cool :) | 07:44 |
tchaypo | i wonder when Ng is going to be online now that DST has played games | 07:45 |
StevenK | DST has played games? | 07:46 |
StevenK | We don't switch until this weekend, no? | 07:46 |
lxsli | GB has already switched, think IE has too | 07:47 |
xuhaiwei | lifeless: I am running it again | 07:48 |
tchaypo | it's a great game. First the US flips, then the UK flips, then AU flips | 07:48 |
tchaypo | i think we've at least got to the point where all of AU flips on the same night now | 07:49 |
*** boris-42 has quit IRC | 07:50 | |
*** boris-42 has joined #tripleo | 07:51 | |
lifeless | StevenK: and AU flips differently to NZ | 07:51 |
lifeless | StevenK: and AU flips differently to AU, even | 07:51 |
StevenK | Heh | 07:52 |
lxsli | QI tells me the original plan was for the UK to use 4x20min adjustments | 07:53 |
lxsli | let us be thankful for small mercies | 07:53 |
*** jistr has joined #tripleo | 07:53 | |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-image-elements: Restructure the nova.conf to match documentation https://review.openstack.org/83821 | 08:00 |
xuhaiwei | lifeless: I got the same error | 08:00 |
* tchaypo grovels tz source data | 08:00 | |
lifeless | xuhaiwei: please file a bug | 08:02 |
tchaypo | as far as i can tell all states in aus have changed on the 1st sun in april since 2008 | 08:02 |
lifeless | tchaypo: what about the territories ? also I may be remembering before then ? | 08:02 |
tchaypo | looks like NZ is the same | 08:03 |
xuhaiwei | lifeless: This bug belong to tripleo? | 08:03 |
*** bauzas has quit IRC | 08:03 | |
*** viktors has quit IRC | 08:05 | |
tchaypo | lifeless: i can't see any variations in any of the territories | 08:05 |
StevenK | Except for QLD, and WA | 08:07 |
lifeless | xuhaiwei: yes please | 08:08 |
lifeless | xuhaiwei: pad.lv/b/tripleo | 08:08 |
tchaypo | which don't do DST; neither does the NT, except if the year is 1899 | 08:08 |
StevenK | tchaypo: WA has tried DST 3 times | 08:09 |
tchaypo | most recently in ... | 08:09 |
xuhaiwei | pad.lv/b/tripleo??? | 08:10 |
tchaypo | Rule AW 2007 2008 - Oct lastSun 2:00s 1:00 - | 08:10 |
lifeless | http://pad.lv/b/tripleo | 08:10 |
xuhaiwei | https://bugs.launchpad.net/tripleo/+bug/1301220 | 08:11 |
uvirtbot | Launchpad bug 1301220 in tripleo "pip can't find markupsafe when running devtest_seed.sh" [Undecided,New] | 08:11 |
xuhaiwei | is it ok? | 08:11 |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-heat-templates: Set the block_migration_flag as Heat-configurable https://review.openstack.org/84655 | 08:13 |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-image-elements: Read libvirt block_migration_flag from nova.conf https://review.openstack.org/84657 | 08:15 |
lifeless | xuhaiwei: thank you, we'll ask some questions there shortly to gather data | 08:15 |
xuhaiwei | ok, I will see it | 08:16 |
*** e0ne has joined #tripleo | 08:16 | |
tchaypo | Patch set 9 of https://review.openstack.org/#/c/83294/ was uploaded 30 hours ago, and jenkins just got around to failing it. I guess we still have a bit of a backlog :p | 08:20 |
*** gcha has joined #tripleo | 08:22 | |
*** eghobo has quit IRC | 08:23 | |
*** lucasagomes has joined #tripleo | 08:25 | |
lifeless | tchaypo: yes, 36 hours of downtime more or less | 08:26 |
*** ramishra has quit IRC | 08:27 | |
*** derekh has joined #tripleo | 08:28 | |
*** jcoufal_ has joined #tripleo | 08:29 | |
*** jcoufal has quit IRC | 08:30 | |
*** jcoufal_ is now known as jcoufal | 08:30 | |
openstackgerrit | jan grant proposed a change to openstack/tripleo-image-elements: Ensure the (block) loop device is available. https://review.openstack.org/83383 | 08:33 |
*** rpodolyaka1 has joined #tripleo | 08:34 | |
*** rpodolyaka1 has quit IRC | 08:38 | |
*** yassine has joined #tripleo | 08:38 | |
derekh | hmm, one of the test envs seems to be rejecting the key in its own json | 08:46 |
derekh | 2014-04-02 07:47:21.403 | Permission denied (publickey). | 08:46 |
derekh | 2014-04-02 07:47:21.441 | dd: writing to `standard output': Broken pipe | 08:46 |
derekh | http://logs.openstack.org/09/76509/6/check-tripleo/check-tripleo-undercloud-precise/3eb81fc/console.html | 08:46 |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-image-elements: Allow settings for Nova quotas https://review.openstack.org/84666 | 08:52 |
lifeless | derekh: \o/ | 08:52 |
lifeless | derekh: so we're broadly up | 08:52 |
lifeless | derekh: but only because I have a while loop deleting floating-ips | 08:52 |
*** jcoufal has quit IRC | 08:54 | |
derekh | lifeless: ok, ya I saw the bug, it explains why I was sent on a wild goose chase yesterday trying to figure out a traceback when in fact floatingips was the problem | 08:57 |
lifeless | derekh: *bugs* | 08:58 |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-heat-templates: Set the block_migration_flag as Heat-configurable https://review.openstack.org/84655 | 08:58 |
lifeless | derekh: it's been a lovely exercise in OMG we release this? | 08:58 |
derekh | lifeless: yup | 08:58 |
lifeless | followed in short order by OMG people use this :P | 08:58 |
lifeless | tchaypo: I hesitate to throw yet *more* things your way | 08:59 |
tchaypo | more learning opportunities! | 09:00 |
tchaypo | I've already learnt so much today | 09:00 |
lifeless | tchaypo: but it seems to me xuhaiwei's issue, given that his mirror is correct (at least per the ls -lR), should be thoroughly reproducible | 09:00 |
*** untriaged-bot has joined #tripleo | 09:00 | |
untriaged-bot | Untriaged bugs so far: | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1301220 | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1290488 | 09:00 |
uvirtbot | Launchpad bug 1301220 in tripleo "pip can't find markupsafe when running devtest_seed.sh" [Undecided,New] | 09:00 |
*** untriaged-bot has quit IRC | 09:00 | |
uvirtbot | Launchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete] | 09:00 |
tchaypo | lifeless: oh right. assign it to me so i can look it up in the morning. | 09:00 |
lifeless | tchaypo: e.g. a pypi-mirror on local disk, file:// access to it (see the pypi element's pre-install rules) installing the nova element and boom | 09:00 |
lifeless | tchaypo: and I'd love to get a fix into pip to unbreak this for everyone | 09:01 |
tchaypo | Yep, if it's that simple I should be able to reproduce it easily | 09:01 |
tchaypo | as long as xuhaiwei isn't using a weird case-mangling filesystem like the OS X default fs :p | 09:02 |
lifeless | tchaypo: even that should work | 09:03 |
lifeless | tchaypo: at least with trunk pip | 09:03 |
lifeless | tchaypo: it may be as simple as 'run git' :P | 09:03 |
lifeless | tchaypo: assigned | 09:04 |
andreaf | lifeless: ping | 09:10 |
lifeless | andreaf: hey, I'm just relaxing atm, will be back in ~90m | 09:11 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed. https://review.openstack.org/84083 | 09:11 |
andreaf | lifeless: eh enjoy ;) ttyl | 09:12 |
*** jp_at_hp has joined #tripleo | 09:12 | |
*** ramishra has joined #tripleo | 09:13 | |
openstackgerrit | Ryan Moore proposed a change to openstack/tripleo-image-elements: Allow setting of common configuration options https://review.openstack.org/84672 | 09:19 |
*** yamahata has quit IRC | 09:24 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed. https://review.openstack.org/84083 | 09:25 |
*** hashar has joined #tripleo | 09:26 | |
*** ProfFalken has quit IRC | 09:27 | |
*** proffalken has joined #tripleo | 09:28 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Bridge physical interface to the seed. https://review.openstack.org/84083 | 09:31 |
*** Rakesh5 has quit IRC | 09:33 | |
lifeless | andreaf: ok, so wassup ? | 09:35 |
*** Rakesh5 has joined #tripleo | 09:37 | |
openstackgerrit | Gonéri Le Bouder proposed a change to openstack/diskimage-builder: fix grub2 installation on Debian Wheezy https://review.openstack.org/83506 | 09:38 |
*** bauzas has joined #tripleo | 09:39 | |
*** pblaho has joined #tripleo | 09:41 | |
lifeless | down to 12 hours behind | 09:42 |
*** Rakesh5 has quit IRC | 09:44 | |
lifeless | ohhh I know why we might be having glitches... new image is saucy not trusty | 09:47 |
lifeless | need to upgrade the network node mellanox driver | 09:48 |
andreaf | lifeless: I just saw your earlier question about CI | 09:49 |
*** xuhaiwei has quit IRC | 09:50 | |
*** matsuhashi has quit IRC | 09:50 | |
andreaf | lifeless: at the moment I'm trying to get a clean baseline - a set of tempest config / test selection which runs stable against a tripleo overcloud | 09:50 |
lifeless | andreaf: ok; I *think* derekh figured that out previously | 09:51 |
lifeless | andreaf: so I had a thought about where tempest should run | 09:51 |
lifeless | andreaf: which is that since its testing the thing, it shouldn't run *in* the thing | 09:51 |
lifeless | e.g. | 09:51 |
lifeless | say we run baremetal tempest tests against a seed | 09:51 |
lifeless | we also want to be able to take the seed image and say 'this is known good' | 09:52 |
lifeless | in which case we don't really want tempest in the image | 09:52 |
lifeless | ditto undercloud, overcloud | 09:52 |
*** ccorrigan has joined #tripleo | 09:53 | |
andreaf | lifeless: right | 09:54 |
smulcahy | So once I've run tripleo and want to kick the tyres of the overcloud, I think I need to run the following, | 09:54 |
derekh | andreaf: the tempest tests as filtered by the tempest element were "known to work" a few months back, it may now be out of date but is probably a good place to start | 09:54 |
smulcahy | . tripleo-incubator/scripts/tripleorc | 09:54 |
smulcahy | . tripleo-incubator/scripts/devtest_variables.sh | 09:54 |
smulcahy | . tripleo-incubator/scripts/tripleo-overcloud-passwords | 09:54 |
smulcahy | . tripleo-incubator/overcloudrc-user | 09:54 |
smulcahy | export no_proxy=$OVERCLOUD_IP | 09:54 |
smulcahy | nova list | 09:54 |
smulcahy | (with the nova list being an example tyre kick) | 09:54 |
andreaf | derekh: yes it is out of date now, I was trying to bring it back to a working state | 09:54 |
smulcahy | do I really need to source all of those files for a tyre-kick? or is there a single thingie I can run to set up my environment correctly? | 09:55 |
andreaf | lifeless: so we should run tempest from the VM out of nodepool | 09:55 |
lifeless | the jenkins slave yes | 09:56 |
lifeless | smulcahy: tripleorc is a snapshot of the state at the end of the run, it should never be necessary - but some folk find it useful | 09:56 |
andreaf | lifeless, derekh: we cannot really use the element to setup tempest on the jenkins slave, so we'll need some different type of tempest configuration, we could use devstack functions (i.e. iniset) for example | 09:57 |
lifeless | andreaf: exactly | 09:57 |
*** matsuhashi has joined #tripleo | 09:57 | |
derekh | andreaf: ok | 09:57 |
lifeless | I think there may in future be a use case for a tempest we can deploy into a real cloud (e.g. for certifications), but not for CI | 09:57 |
smulcahy | lifeless: aha thanks, thats one down | 09:58 |
andreaf | lifeless: +1 yes I'd like to maintain the element alive and working for sure | 09:58 |
lifeless | smulcahy: my morning 'resume state' is to source my-rc (which sets things like TRIPLEO_ROOT) for me; devtest_variables (which sets everything up) and then the RC file for the cloud which I want to work with | 09:58 |
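The morning "resume state" described above can be sketched as one shell function. This is a minimal sketch, not something shipped in tripleo-incubator: the name `tripleo_resume`, the `MY_RC` override and the default `$HOME/my-rc` location are assumptions, while the `tripleo-incubator/scripts/devtest_variables.sh` and `tripleo-incubator/<rc file>` paths follow the ones quoted earlier in this log.

```shell
# Hypothetical helper (not part of TripleO): restore a devtest shell in the order
# described above: personal rc, then devtest_variables.sh, then one cloud rc file.
tripleo_resume() {
    [ -n "${1:-}" ] || { echo "usage: tripleo_resume RCFILE" >&2; return 1; }
    cloud_rc=$1                          # e.g. undercloudrc, overcloudrc, overcloudrc-user
    my_rc=${MY_RC:-"$HOME/my-rc"}        # assumed location of the personal rc
    # Check before sourcing: a failed '.' can abort a non-interactive POSIX shell.
    { [ -f "$my_rc" ] && [ -r "$my_rc" ]; } || { echo "cannot read $my_rc" >&2; return 1; }
    . "$my_rc"
    [ -n "${TRIPLEO_ROOT:-}" ] || { echo "$my_rc did not set TRIPLEO_ROOT" >&2; return 1; }
    vars="$TRIPLEO_ROOT/tripleo-incubator/scripts/devtest_variables.sh"
    rc="$TRIPLEO_ROOT/tripleo-incubator/$cloud_rc"
    for f in "$vars" "$rc"; do
        { [ -f "$f" ] && [ -r "$f" ]; } || { echo "cannot read $f" >&2; return 1; }
    done
    . "$vars"    # global devtest state, e.g. TE_DATAFILE
    . "$rc"      # credentials for the chosen cloud
}
```

Usage would be something like `tripleo_resume undercloudrc && nova list`. Plain variables are used instead of `local` to stay POSIX, so `cloud_rc`, `my_rc`, `vars` and `rc` leak into the calling shell.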
smulcahy | but I still need the other 3 and the no_proxy | 09:58 |
smulcahy | could we make the no_proxy part of one of the others? | 09:58 |
lifeless | smulcahy: you can put noproxy in your myrc - just put the whole subnet you want excluded in | 09:59 |
lifeless | 192.0.2.0/24 or whatever | 09:59 |
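The subnet exclusion in a personal rc can be written so that re-sourcing the file does not keep appending to `no_proxy`. A sketch assuming the devtest subnet 192.0.2.0/24 mentioned above and a hypothetical default `TRIPLEO_ROOT` of `$HOME/tripleo`. CIDR entries in `no_proxy` are honoured by some HTTP clients but not by all (some only match literal host names or domain suffixes), so individual IPs may still need listing.

```shell
# Hypothetical ~/my-rc: per-user settings, sourced before devtest_variables.sh.
export TRIPLEO_ROOT=${TRIPLEO_ROOT:-"$HOME/tripleo"}   # assumed checkout location

# Bypass the proxy for the devtest network, adding the entry only once.
devtest_subnet=192.0.2.0/24
case ",${no_proxy:-}," in
    *",$devtest_subnet,"*) ;;    # already listed: leave no_proxy untouched
    *) no_proxy="${no_proxy:+$no_proxy,}$devtest_subnet" ;;
esac
export no_proxy
```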
andreaf | lifeless: where would you have the logic / script to deploy tempest on the slave? I think it should go in tripleo-ci? | 09:59 |
lifeless | andreaf: thinking out loud | 09:59 |
lifeless | andreaf: requirements are: we want to run the zuul_ref of tempest, so we can't install it at buildtime, so not nodepool | 10:00 |
lifeless | andreaf: that means yes, tripleo-ci | 10:00 |
smulcahy | lifeless: I get that, I'm trying to put together a simple howto for someone unfamiliar with tripleo to kick the tyres on a deployment | 10:00 |
lifeless | andreaf: note that devtest will be cached on local disk if you need things from it | 10:00 |
lifeless | smulcahy: ah! so - they are approaching an existing deploy, and just want to use Nova API ? | 10:00 |
andreaf | lifeless: ok I was about to ask that :D | 10:00 |
andreaf | lifeless: what about testing changes to tripleo-ci itself? | 10:01 |
lifeless | andreaf: theres a cache of all of openstack git trees locally, set to zuul ref stuff | 10:01 |
lifeless | andreaf: indeed, thats part of it :) | 10:01 |
andreaf | lifeless: I think for devstack-gate there is some logic in place to care for the case where we have a special zuul ref for devstack-gate | 10:02 |
lifeless | andreaf: so we get some coverage by the fact CI works; we don't have explicit tests for large chunks though, adding some would be great. | 10:02 |
lifeless | andreaf: where possible we should use regular devstack nodes rather than the limited tripleo-precise nodes which run in the much smaller tripleo test regions | 10:02 |
lifeless | andreaf: there is, I think we have that in place for toci already | 10:02 |
smulcahy | lifeless: exactly, its a use-case that we don't cater well for atm | 10:02 |
lifeless | andreaf: sufficient for what you'll be doing anyhow | 10:02 |
smulcahy | I think thats what jp_at_hp was aiming at with his select-cloud stuff | 10:03 |
lifeless | smulcahy: doesn't horizon have a 'download an rc file' thing | 10:03 |
lifeless | ? | 10:03 |
smulcahy | err, the horizon that isn't enabled by default in the cloud? | 10:03 |
smulcahy | in the tripleo overcloud even | 10:03 |
lifeless | smulcahy: thats a bug, no ? | 10:03 |
smulcahy | it used to at some stage alright | 10:03 |
lifeless | anyhow | 10:04 |
smulcahy | lifeless: I'm not sure, it might be, but not sure it should be a requirement | 10:04 |
lifeless | smulcahy: fair enough | 10:04 |
lifeless | smulcahy: so I think whats needed is a way to generate the right final RC file for someone without looking up creds dynamically | 10:04 |
lifeless | you don't care about the deployment infra in this use case | 10:04 |
lifeless | e.g. you want to say 'give me the RC file for user 'demo'' | 10:04 |
lifeless | that seems to me to be exactly what horizon offered | 10:05 |
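"Give me the RC file for user demo" can be approximated without Horizon by a function that prints the usual `OS_*` client variables from values supplied up front, so nothing is looked up when the file is sourced. A sketch only: `make_user_rc` and `shell_quote` are hypothetical names, and the tenant name and Keystone v2.0 URL in the usage line are placeholders rather than values TripleO guarantees.

```shell
# Quote a value so it survives being sourced by a POSIX shell
# (each embedded single quote becomes '\'').
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical helper: print an rc file for one Keystone user.
# The admin supplies every value when generating the file.
make_user_rc() {
    if [ $# -ne 4 ]; then
        echo "usage: make_user_rc USER PASSWORD TENANT AUTH_URL" >&2
        return 1
    fi
    printf 'export OS_USERNAME=%s\n' "$(shell_quote "$1")"
    printf 'export OS_PASSWORD=%s\n' "$(shell_quote "$2")"
    printf 'export OS_TENANT_NAME=%s\n' "$(shell_quote "$3")"
    printf 'export OS_AUTH_URL=%s\n' "$(shell_quote "$4")"
}
```

For example, `(umask 077; make_user_rc demo "$OVERCLOUD_DEMO_PASSWORD" demo "http://$OVERCLOUD_IP:5000/v2.0" > demorc)` writes a file that can be handed to the user. The restrictive umask matters because the file contains a plaintext password.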
smulcahy | I guess I'm ok if there are a bunch of rc files and I need to decide which one to source | 10:05 |
smulcahy | it doesn't need to be too fancy | 10:05 |
lifeless | the noproxy thing is really only needed for the case where we make a new nonroutable network | 10:05 |
lifeless | which is local machine only | 10:05 |
lifeless | so (AFAICT?) really not relevant to the use case of giving someone else access to the cloud | 10:05 |
smulcahy | uhm | 10:05 |
lifeless | ^ checking my assumptions | 10:06 |
smulcahy | you still want to no_proxy for a routable network | 10:06 |
andreaf | lifeless: is there a doc somewhere of the current setup of the tripleo test region? | 10:06 |
smulcahy | I believe | 10:06 |
lifeless | andreaf: https://wiki.openstack.org/wiki/TripleO/TripleOCloud/Regions ? | 10:06 |
smulcahy | and we've seen people fall over this time and time again | 10:06 |
smulcahy | if you don't get the no_proxy stuff right, your openstack clients start returning 503 errors | 10:07 |
smulcahy | which regularly lead people to the erroneous conclusion that the overcloud is not working | 10:07 |
lifeless | smulcahy: yeah, gateway errors from squid | 10:07 |
andreaf | lifeless: ehe thanks | 10:07 |
lifeless | smulcahy: the assumption I'm making is twofold; a) that a running cloud cannot know for a given user whether they are able to route traffic to it or not | 10:07 |
lifeless | smulcahy: e.g. proxy[or not] config is a user domain not a cloud domain problem | 10:08 |
lifeless | smulcahy: and b) that real environments will have either 1) users set up properly as part of their regular machine setup or 2) proxies set up to know about all internal services | 10:08 |
smulcahy | lifeless: I think you're right, I guess I'm focusing strictly on the end-user use case here and not thinking too much about the architecture of a solution so much as the desired end result | 10:09 |
lifeless | smulcahy: I may be thoroughly wrong on b) | 10:09 |
smulcahy | you are wrong on a) and b) | 10:09 |
smulcahy | speaking as someone in one such real environment | 10:09 |
smulcahy | oh | 10:09 |
lifeless | great, we've identified one mismatch :) | 10:09 |
smulcahy | wrong on b)1 and b2) | 10:09 |
smulcahy | where did a) go to? | 10:09 |
lifeless | a) that a running cloud cannot know for a given user whether they are able to route | 10:09 |
smulcahy | ah, I see it now | 10:09 |
lifeless | traffic to it or not | 10:10 |
smulcahy | yes, a) is right | 10:10 |
lifeless | so | 10:10 |
smulcahy | but both parts of b) are wrong in the real world where idiots like me live :) | 10:10 |
*** jcoufal has joined #tripleo | 10:10 | |
lifeless | the reason we do no_proxy stuff in *devtest* is that we know the network will be broken for everyone, because 192.0.2.0/24 is TEST-NET-1 | 10:10 |
lifeless | and no one is allowed to route it :) | 10:10 |
*** tzumainn has quit IRC | 10:11 | |
smulcahy | but even in the case where it might work, in a typical corporate environment, the proxy could be on the other side of the continent, so you end up doing a really unnecessary and slow round-trip to the cloud in the lab next door to you | 10:11 |
lifeless | would it be the case for b1 and b2 then that you can assert that an entire cloud is no_proxy ? | 10:11 |
*** tzumainn has joined #tripleo | 10:11 | |
lifeless | e.g. we could write that to horizon.conf to change the output rc file ? | 10:11 |
smulcahy | I'd prefer to see a solution that didn't require me to login to horizon to get my rc file tbh | 10:12 |
*** markmc has joined #tripleo | 10:12 | |
lifeless | smulcahy: prior to HP the corporate environments I had had proxies in every office, because damn they're useful | 10:12 |
smulcahy | thats not how we work on a day to day basis with openstack | 10:12 |
lifeless | smulcahy: well bear with me a minute | 10:12 |
lifeless | smulcahy: here are the constraints / goals AIUI: | 10:13 |
lifeless | - make it easy for users approaching a cloud someone else deployed (or potentially the same person days later) to use api scripts etc etc | 10:13 |
lifeless | - deal with corporate networks that don't know about the cloud for $various reasons, and as such will fail if you use the regular proxy configuration | 10:14 |
lifeless | - not require manual configuration of the users machine [except perhaps a one time bootstrap thing to get connected at all] | 10:14 |
lifeless | ? | 10:14 |
smulcahy | sounds about right, can you elaborate on the 3rd one? | 10:15 |
lifeless | sure | 10:15 |
lifeless | let me add a couple of my own though :) | 10:15 |
lifeless | - be *able to be* folded into openstack core libraries - e.g. python-openstackclient / horizon / keystone | 10:16 |
smulcahy | on 3, are we talking about something like "I install ubuntu 13.10 on a machine, I pip install python-novaclient, I source <magic rc file>, nova list", simples! ? | 10:16 |
jp_at_hp | lifeless: thanks for the config comments - they're really good. I think I'll try and absorb and reply for your tomorrow morning | 10:16 |
lifeless | smulcahy: say you've a cloud deployed in Vlad's test area | 10:17 |
lifeless | smulcahy: how do you get to it over the network at all ? | 10:17 |
smulcahy | lifeless: I've no idea what vlad's test area looks like. | 10:17 |
lifeless | smulcahy: firewalled off, jump host only access - that sort of thing | 10:18 |
smulcahy | are you saying it's a test area on a 10.x net, for example? | 10:18 |
lifeless | smulcahy: yeah | 10:18 |
lifeless | 3 is about me trying to stop it being unlimited scope creep | 10:18 |
smulcahy | right, well, if its firewalled off, I'll need to login to some gateway machine | 10:18 |
lifeless | exactly | 10:18 |
smulcahy | where I can hopefully source <magic rc file> and nova list | 10:18 |
lifeless | I'm willing to ack that no_proxy setup can be very valuable | 10:18 |
lifeless | I'm dubious about putting in openvpn config at this point | 10:19 |
smulcahy | otoh, if its not firewalled and my entire corporation is on 10.x | 10:19 |
smulcahy | I just source <magic rc file> and nova list | 10:19 |
lifeless | sure | 10:19 |
lifeless | so the final one I wanted to add is | 10:19 |
smulcahy | and yes, I think we can draw the line at the openvpn config for sure | 10:19 |
smulcahy | but not ipsec | 10:19 |
smulcahy | joke | 10:19 |
lifeless | - not assume local access to the machine | 10:19 |
lifeless | frees/wan 4 ever | 10:20 |
smulcahy | I think catering for the scenario where I have direct access to the machine covers 80% of real world scenarios | 10:20 |
lifeless | I have a freeswan mesh config generator around somewhere that I wrote ages back | 10:20 |
* smulcahy waves hands | 10:20 | |
smulcahy | because that's what I'm seeing day to day in our use of openstack clouds (both public and test) | 10:21 |
lifeless | smulcahy: so you're thinking you'd generate the rc file for user X ? | 10:21 |
lifeless | smulcahy: I certainly don't have shell access to public cloud :P | 10:21 |
smulcahy | well, tripleo seems to do that already overcloudrc-user and so on | 10:21 |
smulcahy | it just needs me to . a bunch of other files before I can meaningfully use that one and I think it would be goodness to collapse that sourcing requirement into a single step | 10:22 |
lifeless | ok so | 10:22 |
smulcahy | ok, for some definitions of access - I'm thinking the api port(s), not shell | 10:22 |
smulcahy | thats so 90s! :) | 10:22 |
lifeless | there are I think conflicting use cases here | 10:22 |
lifeless | but lets pick a real common one | 10:23 |
*** e0ne_ has joined #tripleo | 10:23 | |
lifeless | admin (you) want to let user (fred) onto the cloud | 10:23 |
lifeless | you need to: | 10:23 |
lifeless | - create a user | 10:23 |
lifeless | - with a password | 10:23 |
lifeless | - make an rc | 10:23 |
lifeless | - get it to fred | 10:23 |
lifeless | the horizon thing splits that in two | 10:23 |
lifeless | you make the user with a password and tell them the uid and password, and they login and get the rest themselves | 10:24 |
*** jang1 has joined #tripleo | 10:24 | |
smulcahy | oh yeah, I'm not even talking about that scenario here | 10:24 |
lifeless | I keep coming back to this not because I think horizon is necessarily the right answer, but because it already exists and does the job :) | 10:25 |
smulcahy | I'm focusing on the "ok, I've run devtest and it seems to have worked - how do I run a nova command against the overcloud, undercloud and indeed seed" | 10:25 |
lifeless | smulcahy: if you've done that, just source Xrc | 10:25 |
lifeless | smulcahy: its all setup already | 10:25 |
smulcahy | its not | 10:25 |
lifeless | smulcahy: did you close your terminal ? | 10:25 |
smulcahy | I need to source 3 separate files and set a no_proxy variable | 10:26 |
smulcahy | the network crapped on me briefly after running devtest so I needed to login again | 10:26 |
smulcahy | it happens | 10:26 |
lifeless | smulcahy: ok | 10:26 |
*** jtomasek has quit IRC | 10:26 | |
lifeless | so the reason I'm giving pushback here | 10:26 |
smulcahy | I think the subtext here is that this isn't something we want to address in tripleo devtest | 10:26 |
lifeless | is that we want small tools that are useful indefinitely | 10:26 |
smulcahy | which is fine but I think we should call it out | 10:26 |
lifeless | if there is a user story, great - lets incubate it | 10:27 |
smulcahy | then I can proceed with dealing with this in my own environment in the knowledge that I'm not duplicating something that already exists in TripleO | 10:27 |
lifeless | smulcahy: I replied with one such one that I think would be great - an rc manager to the select-cloud thing | 10:27 |
*** e0ne has quit IRC | 10:27 | |
lifeless | smulcahy: sure | 10:27 |
lifeless | uhm | 10:28 |
lifeless | if the focus is folk doing devtest | 10:28 |
lifeless | lets start over | 10:28 |
lifeless | what can we fix or unduplicate to make sourcing e.g. undercloudrc on its own work | 10:28 |
lifeless | right now undercloudrc is a static file - we don't generate it | 10:28 |
smulcahy | exactly | 10:28 |
lifeless | our clouds don't know how to generate rc files - which is the path I went down in our previous discussion above. | 10:29 |
smulcahy | so what generates it? | 10:29 |
smulcahy | export OS_PASSWORD=$(os-apply-config -m $TE_DATAFILE --type raw --key undercloud.password) | 10:30 |
smulcahy | export OS_AUTH_URL=$(os-apply-config -m $TE_DATAFILE --type raw --key undercloud.endpoint) | 10:30 |
smulcahy | from my undercloudrc looks generated | 10:30 |
lifeless | smulcahy: we wrote it by hand | 10:30 |
lifeless | git log -p -- undercloudrc | 10:30 |
smulcahy | aha, gotcha | 10:30 |
smulcahy | so maybe the solution then is to make a few mods to that file | 10:31 |
lifeless | it's set up to make use of a LOT of assumptions about the environment | 10:31 |
lifeless | that is that there is a TE_DATAFILE | 10:31 |
smulcahy | I think thats ok for now at least | 10:31 |
lifeless | it knows its the undercloud | 10:31 |
jp_at_hp | lifeless: I really want the select-cloud script to get into the incubator. I think right now it is a solution for a developer to very easily choose what cloud to interact with for test purposes, and it provides a place where the no_proxy can be set for the seed automagically (which by design is likely always a vm, right?) - and I think it provides a location into which work can go for allowing developers to inspect and manipulate conf | 10:31 |
lifeless | jp_at_hp: the seed might not be a VM outside of a test environment, its distinguishing characteristic is that it wasn't deployed by nova+heat | 10:32 |
smulcahy | so maybe we add some additional logic into undercloudrc, overcloudrc and overcloudrc-user as a starting point for this | 10:32 |
jp_at_hp | fair point :D | 10:32 |
lifeless | smulcahy: I could see moving idempotent no_proxy logic into those files | 10:32 |
lifeless | jp_at_hp: for instance RedHat have a seed thats on a USB key | 10:32 |
jang1 | if I can chip in: perhaps even expecting TE_DATAFILE to be set is a bit much. Many of our scripts expect it; but if it's not set, devtest_variables can pick a suitable spot. I'd personally like to see more of these things work given a mostly 'pristine' environment. | 10:33 |
lifeless | derekh: ruhroh | 10:33 |
lifeless | derekh: no jobs running | 10:33 |
*** rpodolyaka1 has joined #tripleo | 10:34 | |
lifeless | jang1: so, I dislike that it's an environment variable, but that's kind of like gcc assuming a.out | 10:34 |
*** Rakesh5 has joined #tripleo | 10:35 | |
smulcahy | so what about adding a sourcing of devtest_variables and tripleo-<environment>-passwords to the <environment>rc files? and a no_proxy line - does that seem reasonably uncontentious? | 10:35 |
lifeless | jang1: I think its reasonable for our development toolchain to assume that TRIPLEO_ROOT is set (where the code is) and that devtest_variables.sh has been sourced, because thats how we avoid hugely complex boilerplate leaking into every script | 10:35 |
lifeless | e.g. we use PATH etc | 10:35 |
smulcahy | I'm happy to submit that change if it has a chance of being approved | 10:35 |
lifeless | smulcahy: no_proxy line I think is uncontentious (for me at least) | 10:36 |
smulcahy | and I think it dramatically improves the usability of tripleo devtest deployments for non-experts | 10:36 |
smulcahy | but not the other two? | 10:36 |
derekh | lifeless: crap, looking | 10:36 |
lifeless | derekh: nova list --all-tenants | 10:36 |
lifeless | ERROR: HTTPSConnectionPool(host='ci-overcloud.tripleo.org', port=13000): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.error'>: [Errno 110] Connection timed out) | 10:36 |
lifeless | derekh: it's too late for me to jump on this | 10:36 |
*** CLOUDOUTAGE has joined #tripleo | 10:36 | |
CLOUDOUTAGE | lifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage | 10:36 |
*** CLOUDOUTAGE has quit IRC | 10:36 | |
derekh | lifeless: no, prob will dig in | 10:36 |
lifeless | derekh: but I'm going to say OHF*K and leave it to you/ng/ghe | 10:36 |
lifeless | smulcahy: the passwords file isn't needed | 10:37 |
lifeless | smulcahy: not sure why you want to source that | 10:37 |
lifeless | smulcahy: its needed if you're using the current heat commandlines | 10:38 |
lifeless | smulcahy: which is why I want to move those to use a JSON environment file | 10:38 |
lifeless | smulcahy: but its not needed for the rc files | 10:38 |
*** rpodolyaka1 has quit IRC | 10:38 | |
smulcahy | export OS_PASSWORD=$OVERCLOUD_DEMO_PASSWORD | 10:38 |
smulcahy | thats one of the lines from overcloudrc-user | 10:38 |
lifeless | smulcahy: oh, I was focused on the use case you gave :) | 10:39 |
smulcahy | does that not mean I need the passwords file? | 10:39 |
lifeless | smulcahy: you're right, we haven't changed that one | 10:39 |
lifeless | the demo user is a bit of an odd thing really; what we need is a good user management solution | 10:39 |
lifeless | (the unencrypted asserted-users stuff we do today really isn't good enough - but we needed something automated and repeatable, and blah) | 10:40 |
smulcahy | and yet ironically for a new user, its the most useful one because it lets you quickly test the whole environment | 10:40 |
lifeless | anyhow | 10:40 |
smulcahy | but ok, that may only be necessary for the overcloudrc-user file | 10:41 |
lifeless | I think that one we'd want a #FIXME against it | 10:41 |
lifeless | because right now the path to those files can be all over the place | 10:41 |
smulcahy | which brings us to source devtest_variables.sh in the overcloudrc and undercloudrc files .. objections to that? | 10:41 |
lifeless | why do you want to source devtest_variables? <- not trolling :) | 10:42 |
lifeless | I assume its because of this line: | 10:42 |
lifeless | export TE_DATAFILE=${TE_DATAFILE:-"$TRIPLEO_ROOT/testenv.json"} | 10:42 |
smulcahy | an example is easiest | 10:43 |
*** lparth has joined #tripleo | 10:43 | |
smulcahy | $ . tripleo-incubator/overcloudrc | 10:43 |
smulcahy | os-apply-config: command not found | 10:43 |
smulcahy | os-apply-config: command not found | 10:43 |
lifeless | ah | 10:43 |
smulcahy | I haven't even gotten to the TE_DATAFILE error yet :) | 10:43 |
derekh | Can't ssh to ci-overcloud-notCompute0 trying textcons | 10:43 |
lifeless | so - ok, the scripts from client-tools | 10:43 |
jang1 | lifeless: "hugely complex boilerplate leaking into every script" can be as simple as ". $(dirname "$0")/common", can't it? | 10:43 |
lifeless | jang1: which then breaks when you install the script | 10:44 |
lifeless | jang1: we had unpleasant times when we started productionising things | 10:44 |
lifeless | jang1: I'd rather not set us up for that again | 10:44 |
jang1 | well, that surely depends what's inside "common". | 10:44 |
*** e0ne_ has quit IRC | 10:44 | |
lifeless | anyhow, - I would object to sourcing devtest_variables; it makes the dependency be on the specific layout rather than on os-apply-config being in the path and TE_DATAFILE being set (which is ugly in its own right, but thats a pre-existing ugly :( | 10:45 |
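The failure smulcahy hit (`os-apply-config: command not found`) comes from exactly those two dependencies. A sketch of a pre-flight check an rc file could run before deriving credentials, so the user gets one clear message instead of a cascade of errors; `rc_preflight` is a hypothetical name, not an existing TripleO function.

```shell
# Hypothetical pre-flight check for rc files that derive credentials the way
# undercloudrc does (os-apply-config reading $TE_DATAFILE).
rc_preflight() {
    if ! command -v os-apply-config >/dev/null 2>&1; then
        echo "os-apply-config is not on PATH; install the client tools first" >&2
        return 1
    fi
    if [ -z "${TE_DATAFILE:-}" ] || [ ! -r "$TE_DATAFILE" ]; then
        echo "TE_DATAFILE is unset or unreadable: ${TE_DATAFILE:-<unset>}" >&2
        return 1
    fi
}
```

It would sit in front of the existing lines, e.g. `rc_preflight && export OS_PASSWORD=$(os-apply-config -m $TE_DATAFILE --type raw --key undercloud.password)`.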
smulcahy | we've lost an hour of our lives to this, is the conclusion for now that whats in TripleO is good enough and we should implement something for our own usability for now? | 10:45 |
lifeless | I suggest you add a patch to move the noproxy logic to the rc files | 10:46 |
lifeless | idempotently | 10:46 |
lifeless | and a patch to the -user rc to source the passwords file IFF the variable isn't set | 10:46 |
lifeless | I would like to see those | 10:46 |
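The second suggested patch, sourcing the passwords file only when the variable is missing, might look like this inside overcloudrc-user. A sketch: `load_demo_password` and the `TRIPLEO_PASSWORDS_FILE` override are hypothetical, and the default path is a guess because the passwords file location varies between setups (hence the FIXME). The final `export` mirrors the `OS_PASSWORD` line quoted from overcloudrc-user above.

```shell
# Hypothetical guard for overcloudrc-user: only read the generated passwords
# file when OVERCLOUD_DEMO_PASSWORD is not already in the environment.
load_demo_password() {
    # Already provided by the caller's environment: leave it alone.
    [ -n "${OVERCLOUD_DEMO_PASSWORD:-}" ] && return 0
    # FIXME: the passwords file can live in different places; this default is a guess.
    pw_file=${TRIPLEO_PASSWORDS_FILE:-"${TRIPLEO_ROOT:-.}/tripleo-overcloud-passwords"}
    if [ ! -f "$pw_file" ] || [ ! -r "$pw_file" ]; then
        echo "OVERCLOUD_DEMO_PASSWORD is unset and $pw_file is not readable" >&2
        return 1
    fi
    . "$pw_file"
}
load_demo_password && export OS_PASSWORD="$OVERCLOUD_DEMO_PASSWORD"
```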
lifeless | that should get you down to a) source variables, which gets you your global state setup | 10:47 |
smulcahy | except that needs a TRIPLEO root to work, the passwords one anyway | 10:47 |
lifeless | and then source the rc you want | 10:47 |
smulcahy | right, its an improvement | 10:47 |
openstackgerrit | gerry-drudy proposed a change to openstack/tripleo-image-elements: Add swift-get-nodes, swift-recon and swift-recon-cron https://review.openstack.org/84689 | 10:47 |
smulcahy | ok | 10:47 |
lifeless | smulcahy: which variables will take care of for you | 10:47 |
jang1 | ... so the conclusion is that now there are only two files to source? | 10:48 |
jang1 | 33% improvement? | 10:48 |
derekh | groan, [183900.294370] hpsa 0000:06:00.0: cmd_alloc returned NULL! | 10:48 |
smulcahy | seems to be | 10:48 |
lifeless | jang1: 50 | 10:48 |
smulcahy | and is something like export no_proxy=$OVERCLOUD_IP acceptable? or do you imagine something fancier? | 10:48 |
lifeless | jang1: actually 5 -> 2 | 10:49 |
lifeless | smulcahy: It needs to edit the no_proxy rather than be hardcoded; ideally idempotently to stop it growing out of control | 10:49 |
jang1 | or you could do what I do and depend on the specific layout. I'd be interested to know who _doesn't_ have shell aliases or functions or what-have-you to chop this down to a single line | 10:49 |
lifeless | smulcahy: the existing code you'll be moving does the edit | 10:49 |
jang1 | I'm fairly sure I've seen something that tr's , to \n on $no_proxy and goes from there, already | 10:50 |
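Editing `no_proxy` idempotently, rather than hardcoding `export no_proxy=$OVERCLOUD_IP`, comes down to splitting on commas (the `tr , '\n'` trick jang1 mentions) and appending only when the entry is missing. A minimal Python sketch of that check; the function name is hypothetical and the real logic would sit in the rc files.

```python
def add_no_proxy(no_proxy: str, host: str) -> str:
    """Return `no_proxy` with `host` appended unless it is already listed.

    Comparing whole comma-separated entries (not substrings) means calling
    this repeatedly never grows the list.
    """
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    if host not in entries:
        entries.append(host)
    return ",".join(entries)
```

Comparing whole entries avoids the substring trap where `10.0.0.1` would wrongly look present in `10.0.0.10`.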
lifeless | derekh: yeah, new control plane node is saucy not trusty, so its the older kernel | 10:50 |
lifeless | derekh: please add to the cloud page - I'm going to capture *everything* in a postmortem when we're stable and stay up for more than a few hours :) | 10:51 |
derekh | lifeless: will do, btw before I cleared the page I hit "save revision" to mark it | 10:52 |
derekh | lifeless: before you go, tcp syn/acks not getting out of the box again, we haven't come up with a way around this besides reboot have we? | 10:56 |
*** hashar has quit IRC | 10:56 | |
lifeless | derekh: which box ? | 10:59 |
derekh | lifeless: ci-overcloud controller | 10:59 |
smulcahy | ok, thanks lifeless | 10:59 |
lifeless | derekh: can try rmmoding the mlx_core etc | 10:59 |
*** gcha has quit IRC | 10:59 | |
lifeless | derekh: but yeah reboot | 11:00 |
smulcahy | we may revisit but I'll take this as a step in the right direction anyway :)( | 11:00 |
derekh | lifeless: ok will do, I'll leave you alone now | 11:00 |
lifeless | derekh: oh before you do | 11:00 |
lifeless | derekh: /e/n/i hasn't been updated on the controller | 11:00 |
derekh | lifeless: ok, will do it also | 11:00 |
*** matsuhashi has quit IRC | 11:00 | |
lifeless | derekh: so you'll want to do that similarly to the one on the hypervisors (but different bridge names) | 11:00 |
lifeless | derekh: so adjust to taste! night all | 11:00 |
derekh | lifeless: ok, night | 11:01 |
lifeless | derekh: and os-collect-config --force --one after reboot of course, but you know that :) | 11:01 |
derekh | yup | 11:01 |
*** nosnos has quit IRC | 11:04 | |
*** akuznetsov has quit IRC | 11:06 | |
*** akuznetsov has joined #tripleo | 11:07 | |
*** CLOUDOUTAGE has joined #tripleo | 11:07 | |
CLOUDOUTAGE | lifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage | 11:07 |
*** CLOUDOUTAGE has quit IRC | 11:07 | |
*** openstackgerrit has quit IRC | 11:08 | |
*** openstackgerrit has joined #tripleo | 11:08 | |
openstackgerrit | Gonéri Le Bouder proposed a change to openstack/diskimage-builder: clean up: fix some indent not multiple of 2 https://review.openstack.org/84693 | 11:14 |
*** e0ne has joined #tripleo | 11:16 | |
*** matsuhashi has joined #tripleo | 11:16 | |
*** julim has joined #tripleo | 11:16 | |
*** lucasagomes is now known as lucas-hungry | 11:23 | |
*** e0ne_ has joined #tripleo | 11:24 | |
*** rlandy has joined #tripleo | 11:24 | |
*** pblaho has quit IRC | 11:26 | |
*** e0ne has quit IRC | 11:27 | |
*** CaptTofu has joined #tripleo | 11:33 | |
*** killer_p- has quit IRC | 11:34 | |
openstackgerrit | A change was merged to openstack/tripleo-incubator: overcloud: Look for notCompute or controller https://review.openstack.org/84180 | 11:48 |
openstackgerrit | Ana Krivokapic proposed a change to openstack/tuskar-ui: Use num_nodes to get node count if possible https://review.openstack.org/84702 | 11:51 |
derekh | about to reboot ci-controller, anybody want to double check my edits to /etc/network/interfaces before I do ? | 11:55 |
derekh | Ng: SpamapS ^ | 11:55 |
*** morazi has joined #tripleo | 11:55 | |
derekh | brb, its at the bottom of https://etherpad.openstack.org/p/cloud-outage | 11:55 |
*** gcha has joined #tripleo | 11:56 | |
*** ccrouch has joined #tripleo | 11:59 | |
derekh | rebooting ci-controller | 11:59 |
*** morazi has quit IRC | 12:00 | |
*** hashar has joined #tripleo | 12:00 | |
*** morazi has joined #tripleo | 12:02 | |
jprovazn | back in 60 m | 12:02 |
*** jprovazn has quit IRC | 12:02 | |
*** dprince has joined #tripleo | 12:12 | |
*** rbrady has joined #tripleo | 12:13 | |
*** killer_prince has quit IRC | 12:16 | |
*** lblanchard has joined #tripleo | 12:17 | |
openstackgerrit | Ladislav Smola proposed a change to openstack/tuskar: Swift parameters fix https://review.openstack.org/84705 | 12:17 |
*** Matt2 has quit IRC | 12:17 | |
*** jistr is now known as jistr|english | 12:18 | |
*** CaptTofu has quit IRC | 12:20 | |
*** morazi has quit IRC | 12:21 | |
* Ng -> office to pick up his laptop at long long last \o/ | 12:22 | |
*** e0ne has joined #tripleo | 12:22 | |
*** morazi has joined #tripleo | 12:22 | |
*** jistr|mobi has joined #tripleo | 12:23 | |
*** morazi has quit IRC | 12:24 | |
*** e0ne_ has quit IRC | 12:25 | |
*** morazi has joined #tripleo | 12:26 | |
*** jdob has joined #tripleo | 12:28 | |
*** weshay has joined #tripleo | 12:32 | |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: A sysctl element to manage settings via sysctl.d. https://review.openstack.org/84599 | 12:40 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Update bootstack to use sysctl-set-value. https://review.openstack.org/84600 | 12:40 |
*** ramishra has quit IRC | 12:42 | |
*** martyntaylor has quit IRC | 12:46 | |
*** martyntaylor has joined #tripleo | 12:48 | |
*** lucas-hungry is now known as lucasagomes | 12:49 | |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-incubator: Don't hard code the baremetal seed IP in seedrc https://review.openstack.org/83126 | 12:53 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-incubator: Write network configuration into seeds config.json https://review.openstack.org/83125 | 12:53 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-incubator: Make the baremetal-network configurable https://review.openstack.org/82327 | 12:53 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: TEST ONLY: make nova depend on common-venv https://review.openstack.org/79989 | 12:54 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Openstack-clients: don't hard code venv https://review.openstack.org/79988 | 12:54 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Wire in _EXTRA_INSTALL_OPTS... https://review.openstack.org/76966 | 12:54 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Add a new common-venv element https://review.openstack.org/76967 | 12:54 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Horizon: dynamically set config time env vars https://review.openstack.org/82611 | 12:54 |
*** jprovazn has joined #tripleo | 12:55 | |
openstackgerrit | A change was merged to openstack/tuskar-ui: Adding overcloud keystone client https://review.openstack.org/84379 | 12:58 |
*** jistr|mobi has quit IRC | 13:01 | |
*** yamahata has joined #tripleo | 13:02 | |
*** matsuhashi has quit IRC | 13:02 | |
*** matsuhashi has joined #tripleo | 13:02 | |
openstackgerrit | gerry-drudy proposed a change to openstack/tripleo-image-elements: Add swift-get-nodes, swift-recon and swift-recon-cron https://review.openstack.org/84689 | 13:03 |
*** matsuhashi has quit IRC | 13:07 | |
*** jcoufal has quit IRC | 13:08 | |
*** petertoft has joined #tripleo | 13:08 | |
*** matsuhashi has joined #tripleo | 13:10 | |
*** jistr|mobi has joined #tripleo | 13:11 | |
*** jcoufal has joined #tripleo | 13:11 | |
openstackgerrit | Radomir Dopieralski proposed a change to openstack/tuskar-ui: Overcloud initialization https://review.openstack.org/83340 | 13:13 |
*** matty_dubs|gone is now known as matty_dubs | 13:18 | |
*** CaptTofu has joined #tripleo | 13:20 | |
*** jistr|english is now known as jistr | 13:43 | |
*** jistr|mobi has quit IRC | 13:44 | |
derekh | lifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle : controller is back up, floating IPs seem to be giving trouble; anybody want to jump in and see if they can spot the problem ? | 13:45 |
openstackgerrit | A change was merged to openstack/tripleo-incubator: Load undercloud images with -d (delete duplicate) https://review.openstack.org/84478 | 13:47 |
*** akuznetsov has quit IRC | 13:47 | |
*** jpeeler1 is now known as jpeeler | 13:50 | |
*** jpeeler has joined #tripleo | 13:50 | |
*** akuznetsov has joined #tripleo | 13:51 | |
*** ramishra has joined #tripleo | 14:00 | |
*** Rakesh5 has quit IRC | 14:07 | |
dprince | derekh: seeing a traceback in the neutron-l3-agent log file... | 14:10 |
derekh | :q | 14:10 |
Goneri | slagle: Hi, I don't know if you were notified. I answered your comment https://review.openstack.org/#/c/84693/ | 14:11 |
derekh | dprince: hmm, that was possibly when o-c-c was restarting things not sure | 14:12 |
dprince | derekh: maybe, I don't usually see it but not sure | 14:13 |
derekh | dprince: I did notice that there is no router namespace | 14:13 |
dprince | derekh: I bounced it | 14:13 |
derekh | dprince: that traceback says | 14:13 |
derekh | 2014-04-02 13:12:37.083 2951 ERROR neutron.agent.l3_agent [req-09de240a-6daf-4810-9898-de030ae6f30c None] Failed synchronizing routers due to RPC error | 14:13 |
derekh | so maybe relevant | 14:13 |
dprince | derekh: doesn't seem to have done anything though | 14:13 |
derekh | dprince: cool, I tried it a few minutes ago but worth another try | 14:14 |
dprince | derekh: Now that its up can we recreate the NW again? | 14:14 |
*** jpeeler has quit IRC | 14:17 | |
*** jpeeler has joined #tripleo | 14:18 | |
derekh | dprince: could be worth a try, you mean remove and recreate the 3 networks in neutron? It would be nice to figure out what's gone wrong but it could end up being our only course of action | 14:19 |
dprince | derekh: I'm not sure what has been done, just trying to clean slate it as much as possible to get it back up. | 14:20 |
openstackgerrit | gerry-drudy proposed a change to openstack/tripleo-image-elements: Add swift-get-nodes, swift-recon and swift-recon-cron https://review.openstack.org/84689 | 14:20 |
derekh | dprince: yup, pretty much everything I've done so far is in the etherpad https://etherpad.openstack.org/p/cloud-outage | 14:21 |
*** hashar has quit IRC | 14:22 | |
*** hashar has joined #tripleo | 14:23 | |
dprince | derekh: The core network/DHCP on this machine is still concerning to me. | 14:23 |
*** bauzas has quit IRC | 14:24 | |
derekh | dprince: yup, we're running a manually edited dhcp-all-interfaces that I edited today along with a manually edited /etc/network/interfaces | 14:25 |
dprince | derekh: as far as I can tell at this point if the machine is rebooting for any reason it will likely not come back up. :( | 14:25 |
derekh | dprince: but I agree, it would be great if we could bring this back up with a new image and fixed dhcp-all-interfaces | 14:26 |
dprince | derekh: oh well. I guess that is the case w/ TripleO in general until ensure-bridge fixes land though :( | 14:26 |
*** e0ne_ has joined #tripleo | 14:26 | |
derekh | dprince: yup | 14:26 |
*** rdopieralski has quit IRC | 14:26 | |
derekh | dprince: so the error that happened this morning is the same as the one we kept getting a few weeks ago when controllers randomly stopped accepting tcp connections | 14:28 |
derekh | I'm starting to think it's more than just a hardware issue (which is what it was put down to the last time) | 14:28 |
dprince | derekh: related to the underlying NW driver? | 14:28 |
derekh | dprince: yup | 14:28 |
*** e0ne has quit IRC | 14:30 | |
dprince | derekh: So... with regards to statically assigning the IP I'm fine with that as a stop-gap but we should at least persist it in /etc/network/interfaces | 14:32 |
dprince | derekh: otherwise it is game over if someone else comes in and reboots this baby | 14:32 |
* dprince really prefers using ifup/ifdown rather than ad hoc ip commands | 14:33 | |
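Persisting a static assignment so that `ifup`/`ifdown` (and a reboot) reproduce it means writing a standard ifupdown `inet static` stanza. The controller's actual edits live in the etherpad and are not reproduced here; this hypothetical helper only shows the basic stanza shape, with placeholder interface and addresses. A bridge managed by Open vSwitch would need extra options not shown here.

```python
def static_stanza(iface, address, netmask, gateway=None):
    """Render an ifupdown 'inet static' stanza for /etc/network/interfaces."""
    lines = [
        f"auto {iface}",                    # bring the interface up at boot
        f"iface {iface} inet static",
        f"    address {address}",
        f"    netmask {netmask}",
    ]
    if gateway:
        lines.append(f"    gateway {gateway}")
    return "\n".join(lines) + "\n"
```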
derekh | dprince: yup, I was hoping somebody could look at the interfaces file I edited to see if I screwed something up | 14:33 |
derekh | so anybody out there with a running devstack, remind me, where is the floating ip mapped to a private IP ? iptables? | 14:34 |
derekh | *devtest | 14:34 |
dprince | derekh: I think we need the qrouter namespace first. | 14:37 |
dprince | derekh: Then this would do it | 14:37 |
dprince | derek: ip netns exec <namespace> iptables-save | 14:37 |
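Floating IPs are implemented as DNAT rules in the nat table of the `qrouter-<router id>` namespace, so the `iptables-save` dump dprince suggests can be reduced to a floating-to-fixed map. A sketch, assuming rules of the form `-d <floating>/32 -j DNAT --to-destination <fixed>`; the helper names are hypothetical, and `dump_nat_rules` needs root and an existing namespace.

```python
import re
import subprocess

# e.g. "-A neutron-l3-agent-PREROUTING -d 192.0.2.10/32 -j DNAT --to-destination 10.0.0.5"
_DNAT = re.compile(
    r"-d (?P<floating>\d+\.\d+\.\d+\.\d+)(?:/32)? .*-j DNAT "
    r"--to(?:-destination)? (?P<fixed>\d+\.\d+\.\d+\.\d+)"
)


def dump_nat_rules(namespace: str) -> str:
    """Run `iptables-save -t nat` inside a router namespace (requires root)."""
    return subprocess.run(
        ["ip", "netns", "exec", namespace, "iptables-save", "-t", "nat"],
        capture_output=True, text=True, check=True,
    ).stdout


def floating_ip_map(iptables_save_output: str) -> dict:
    """Map floating IP -> fixed IP from the DNAT rules in a dump."""
    mapping = {}
    for line in iptables_save_output.splitlines():
        if not line.startswith("-A "):
            continue  # skip table headers, chain policies, COMMIT
        m = _DNAT.search(line)
        if m:
            mapping[m.group("floating")] = m.group("fixed")
    return mapping
```

With the namespace found later in the session, that would be `floating_ip_map(dump_nat_rules("qrouter-917ea51e-050a-4945-8af7-1040e77876a1"))`. If the namespace does not exist, `ip netns exec` fails and `check=True` raises, which is the "we need the qrouter namespace first" situation.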
*** jprovazn is now known as jprovazn_afk | 14:37 | |
dprince | derekh: which is why I wanted to recreate the NW | 14:37 |
dprince | derekh: where do those scripts live? (to recreate the neutron networks) | 14:37 |
derekh | dprince: ya, I was suspicious that it was missing, should it be created by the neutron-ovs-agent? | 14:37 |
dprince | derekh: not sure | 14:38 |
derekh | dprince: yup, can we create the networks with a specific id? | 14:38 |
dprince | derekh: lets bounce that too | 14:38 |
derekh | dprince: ok | 14:38 |
derekh | dprince: the network IDs have to match what infra has in their configs | 14:39 |
derekh | which is why I ask about the id | 14:39 |
dprince | derekh: not sure, checking. I've never done that. | 14:41 |
dprince | derekh: if we recreated the overcloud from scratch just yesterday it must be possible (which is why I thought it might have been scripted) | 14:41 |
* dprince isn't sure what all has happened on the HP cloud yet... | 14:41 | |
derekh | dprince: updated yesterday https://review.openstack.org/#/c/84263/3/modules/openstack_project/templates/nodepool/nodepool.yaml.erb | 14:44 |
*** mrunge has quit IRC | 14:44 | |
dprince | derekh: sounds like an opportunity to add a neutron feature to me :) | 14:45 |
derekh | dprince: yup | 14:45 |
dprince | derekh: anyway so the l3-agent should be creating this namespace so that is our problem | 14:45 |
dprince | derekh: wanna enable debug and bounce the l3-agent to get more info? | 14:46 |
derekh | dprince: yup, lets do that | 14:46 |
*** geerdest has joined #tripleo | 14:46 | |
derekh | dprince: I'm doing it now | 14:46 |
dprince | derekh: okay, will let you. Presumably you are editing the os-apply-config source, and then re-running os-collect-config? | 14:47 |
fungi | nova list|grep -c ERROR | 14:47 |
fungi | 52 | 14:47 |
dprince | fungi: we are working on it sir | 14:47 |
fungi | dprince: i expected you were all on top of it. thanks! | 14:47 |
dprince | fungi: but let us know if that number goes up :) | 14:47 |
fungi | it won't. not much anyway. nodepool thinks it's only allowed to have 55 instances in that provider | 14:48 |
dprince | fungi: ack, thanks | 14:48 |
derekh | dprince: only going to edit the neutron config file (I'm kind of afraid to rerun o-c-c as it does lots of things) | 14:48 |
*** akrivoka has quit IRC | 14:49 | |
*** akrivoka has joined #tripleo | 14:49 | |
fungi | the remaining 3 in its allowance are currently in a build state according to novaclient | 14:49 |
dprince | derekh: scared to use the TripleO recommended way of doing things! Never! | 14:49 |
derekh | shh | 14:49 |
* dprince tails log in anticipation of debug messages | 14:50 | |
*** matsuhashi has quit IRC | 14:52 | |
derekh | dprince: hold on, while I pull a thought from the back of my memory | 14:52 |
morazi | lsmola_, slagle I'm talking a bit with jdob and d0ugal about what work is left on the tuskar api side to support swift. I noticed you had a bit of a patch out there in progress. Are you driving that forward? | 14:52 |
dprince | derekh: So I think the router namespace is only going to be re-added if we change/add a router | 14:53 |
dprince | derekh: lets create a new one and then delete it? | 14:53 |
derekh | dprince: neutron agent-list shows agents with hostnames with novalocal and some without | 14:53 |
derekh | dprince: they change after the reboot | 14:53 |
lsmola_ | morazi: yeah it should be done | 14:53 |
dprince | derekh: oh... | 14:53 |
lsmola_ | morazi: swift is working for me | 14:53 |
derekh | dprince: I seem to remember lifeless saying something about this | 14:53 |
andrearosa | wendar: I have a question about one of your changes (https://review.openstack.org/81552) are u around? | 14:54 |
derekh | dprince: and the fact that the route is added to the agent with the old name (or something like that) | 14:54 |
lsmola_ | morazi: the latest patch is just cosmetic change I forgot, so it presents the users correct heat params | 14:54 |
derekh | dprince: lemme check some logs | 14:54 |
dprince | derekh: The one agent is marked w/ xxx (meaning down) | 14:55 |
dprince | derekh: so perhaps it got assigned this router? | 14:55 |
derekh | dprince: yup, that's what I'm thinking | 14:55 |
derekh | dprince: lets remove .novalocal from the hostname and restart it again | 14:56 |
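The symptom here is the same agent registered under two names, `host` and `host.novalocal`, with the router still bound to the dead one. A Python sketch for spotting that from agent rows, assuming each row is an `(agent_type, host)` pair such as the agent_type and host columns of `neutron agent-list`; the helper names and sample hostnames are illustrative only.

```python
SUFFIX = ".novalocal"


def short_name(host: str, suffix: str = SUFFIX) -> str:
    """Drop the search-domain suffix that shows up after a reboot."""
    return host[: -len(suffix)] if host.endswith(suffix) else host


def flipped_agents(agents):
    """Return {(agent_type, short_host): sorted names} seen under >1 name."""
    seen = {}
    for agent_type, host in agents:
        seen.setdefault((agent_type, short_name(host)), set()).add(host)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}
```

Any key in the result is a candidate for the stale-agent problem: resources scheduled to the old name will not be served by the agent running under the new one.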
dprince | derekh: so perhaps this is because the machine was rebooted then? | 14:57 |
derekh | dprince: yup, I'm pretty sure this has happened before | 14:58 |
dprince | derekh: bingo | 14:58 |
dprince | derekh: we have qrouter | 14:58 |
derekh | qrouter-917ea51e-050a-4945-8af7-1040e77876a1 | 14:58 |
dprince | derekh: I see floating IP rules too | 14:59 |
derekh | dprince: ok, going to create a new instance and test its floating ip | 14:59 |
dprince | derekh: go, go! | 14:59 |
*** Matt2 has joined #tripleo | 14:59 | |
derekh | booting, back in 5 minutes | 15:00 |
*** untriaged-bot has joined #tripleo | 15:01 | |
untriaged-bot | Untriaged bugs so far: | 15:01 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1301431 | 15:01 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1301435 | 15:01 |
uvirtbot | Launchpad bug 1301431 in tripleo "Nova compute service failed to rebind to rabbit on control node after control node update("nova rebuild")" [Undecided,New] | 15:01 |
uvirtbot | Launchpad bug 1301435 in tripleo "devtest -c with --offline and specific DIB_REPOREF_<project>'s fails if cache is older than refs" [Undecided,Confirmed] | 15:01 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1290488 | 15:01 |
uvirtbot | Launchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete] | 15:01 |
*** untriaged-bot has quit IRC | 15:01 | |
lsmola_ | morazi: so it should be working for a week or so :-) | 15:03 |
slagle | morazi: i +2'd the patch. am happy to test anything once it's in an rpm build as well | 15:05 |
lsmola_ | d0ugal: about the endpoint, probably speak with slagle how to do it best | 15:05 |
lsmola_ | d0ugal: I expect you will have to add code to devtest setup-endpoints, so we can use it in a comfortable way | 15:06 |
d0ugal | lsmola_: right, makes sense. I'll take a look. | 15:06 |
jistr | i didn't approve this one yet https://review.openstack.org/#/c/84705/ | 15:07 |
jistr | but i suppose i can - Tuskar is still not part of CI jobs in any way, so it cannot interfere, right? | 15:07 |
lsmola_ | slagle: the support for swift has been done here https://github.com/openstack/tuskar/commit/b6e1c9d0c3b1e2cca37bce8bd46626782455f02a | 15:07 |
derekh | dprince: ok, I can now ssh to the floating IP for the te-broker, so we've made progress | 15:08 |
lsmola_ | slagle: the current patch is just fix for UI | 15:08 |
derekh | dprince: but can't ssh to the new instance | 15:08 |
jistr | is anyone against me approving that Tuskar patch even though CI deployment jobs didn't run there yet? | 15:08 |
lsmola_ | jistr: yes we are not in CI | 15:08 |
dprince | derekh: Hmm. Is this an MTU issue? | 15:08 |
derekh | dprince: although I do see iptables rules for it | 15:08 |
lsmola_ | jistr: we==Tuskar | 15:08 |
jistr | lsmola_: right. I'm going to approve it. | 15:09 |
dprince | derekh: I'm not familiar with the network/MTU settings on the HP rack yet | 15:09 |
derekh | dprince: the mtu issue usually caused slowness, I don't think it caused any problems connecting | 15:09 |
lsmola_ | jistr: cool | 15:09 |
derekh | dprince: trying another new instance for the hell of it | 15:09 |
dprince | derekh: okay, seems like it could cause both but lets go with that | 15:09 |
dprince | derekh: could be a bad compute host? | 15:10 |
derekh | dprince: yup | 15:10 |
*** e0ne_ has quit IRC | 15:11 | |
dprince | derekh: that instance landed on compute1. Looks like that compute host has a ton of q... interfaces. | 15:14 |
openstackgerrit | A change was merged to openstack/tuskar: Swift parameters fix https://review.openstack.org/84705 | 15:14 |
derekh | dprince: left over from when lots of instances were running? | 15:15 |
dprince | derekh: Not sure. I don't think they should still be there though. | 15:15 |
derekh | dprince: the one on compute1 is one I probably deleted, the newer one is on compute2 | 15:15 |
derekh | dprince: agreed they should be gone | 15:15 |
derekh | so now the floating IPs are working, dhcp has stopped working... instance didn't get its private IP | 15:16 |
*** ifarkas has quit IRC | 15:16 | |
derekh | dprince: this is the opposite to what I had earlier | 15:16 |
dprince | derekh: right, well I wanted to check the original instance... could be related to the fact it was unpingable. | 15:16 |
dprince | derekh: bounce dhcp too? | 15:17 |
dprince | derekh: after switching the hostname? | 15:17 |
derekh | dprince: ok, just did it, trying again | 15:18 |
dprince | derekh: we should have just re-ran os-collect-config man! | 15:18 |
dprince | derekh: that would bounce everything... which may be a problem in TripleO at some point but for the case we have here (switching the hostname) is probably what we want | 15:19 |
*** mtaylor is now known as mordred | 15:19 | |
derekh | dprince: may have been better, I had to run it 4 times when the server was rebooted tweaking things each time to get it to complete, just didn't want to go through that again | 15:19 |
*** mordred has quit IRC | 15:19 | |
*** mordred has joined #tripleo | 15:19 | |
*** ifarkas has joined #tripleo | 15:21 | |
*** openstackgerrit has quit IRC | 15:21 | |
*** openstackgerrit has joined #tripleo | 15:22 | |
wendar | Hi andrearosa, I'm around now. | 15:25 |
*** e0ne has joined #tripleo | 15:26 | |
derekh | dprince: ok, that didn't work, will we do it your way ? rerun o-c-c ? | 15:26 |
dprince | derekh: worth a shot, So DHCP still isn't working after restarting the neutron-dhcp-agent then (post hostname change)? | 15:28 |
derekh | dprince: correct | 15:28 |
andrearosa | wendar: in that change you added new Parameters in the overcloud-source.yaml and nova-compute-instance.yaml, why in both? is it not enough to add Params in the overcloud-source.yaml? | 15:28 |
derekh | dprince: ok, running o-c-c | 15:28 |
wendar | andrearosa: Because the parameters have to be passed into the overcloud templates from the call to heat in the tripleo scripts, and then the overcloud templates pass them on to the nova-compute-instance template. You'll see the same pattern with several other parameters. | 15:29 |
*** UtahDave has joined #tripleo | 15:29 | |
*** UtahDave has left #tripleo | 15:29 | |
dprince | derekh: is neutron-openvswitch-agent on the compute node happy as well? | 15:30 |
derekh | dprince: lots of errors about failed ovs ports | 15:31 |
derekh | 2014-04-02 15:09:52.197 3526 WARNING neutron.agent.linux.ovs_lib [-] Found failed openvswitch port: [u'qvo3be27798-44', [u'map', [[u'attached-mac', u'fa:16:3e:81:b1:f0'], [u'iface-id', u'3be27798-4411-4f22-9cfa-169889ca50de'], [u'iface-status', u'active'], [u'vm-uuid', u'bb708bf9-aa56-4236-8e7d-8a6475c0e320']]], -1] | 15:31 |
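The payload of that warning is a Python literal (`[name, ['map', [[key, value], ...]], ofport]`), so it can be parsed safely with `ast.literal_eval` to pull out the port name and the `vm-uuid` it was attached to. That helps check whether the failed `qvo...` ports belong to long-deleted instances. A sketch assuming the message layout shown above; `parse_failed_port` is a hypothetical helper.

```python
import ast

MARKER = "Found failed openvswitch port: "


def parse_failed_port(line: str):
    """Return (port_name, external_ids, ofport) from an ovs_lib warning, or None."""
    idx = line.find(MARKER)
    if idx == -1:
        return None
    # literal_eval accepts the u'' prefixes and the negative ofport.
    name, (_, pairs), ofport = ast.literal_eval(line[idx + len(MARKER):].strip())
    return name, dict(pairs), ofport
```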
andrearosa | wendar: so basically if we want to add a new Parameter for the compute and notcompute (controller) we need to declare the Parameter in overcloud-source.yaml, nova-compute-instance.yaml and notcompute.yaml? | 15:32 |
openstackgerrit | Jon-Paul Sullivan proposed a change to openstack/diskimage-builder: Change refspec used to fetch all branches and tags https://review.openstack.org/84763 | 15:32 |
wendar | andrearosa: Yes, that would be the current state of affairs. | 15:32 |
wendar | IIRC, that may change, but I don't have a timeline. | 15:33 |
andrearosa | it's not 100% clear to me but that explains why one of my changes was not working properly! Thank you very much! | 15:33 |
andrearosa | wendar: ^^ | 15:33 |
wendar | andrearosa: Glad to help. :) | 15:34 |
openstackgerrit | Dan Prince proposed a change to openstack/tripleo-image-elements: Make tripleo-cd's te_localrc to support controller https://review.openstack.org/84765 | 15:35 |
wendar | andrearosa: You can think of each template kind of like a subroutine call. And the overcloud calls the compute and notcompute. So it first has to accept the parameters before it can pass them along. | 15:35 |
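Following wendar's subroutine analogy, every parameter the overcloud forwards to a nested template has to be declared twice: by the caller, to accept it, and by the callee, to use it. The hypothetical checker below models only that rule on plain sets of names; it does not parse the YAML and is not part of tripleo-heat-templates. `NewParam` is a placeholder.

```python
def missing_declarations(declared, forwards):
    """List (template, parameter) pairs that still need a Parameters entry.

    declared: template name -> set of parameter names it declares.
    forwards: (caller, callee, params) triples describing what is passed down.
    """
    missing = []
    for caller, callee, params in forwards:
        for param in params:
            # Caller must accept the value; callee must declare it to receive it.
            for template in (caller, callee):
                needed = (template, param)
                if param not in declared.get(template, set()) and needed not in missing:
                    missing.append(needed)
    return missing
```

This is andrearosa's case: a parameter declared in overcloud-source.yaml and notcompute.yaml but forgotten in nova-compute-instance.yaml shows up as missing for the compute template only.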
*** ilives has joined #tripleo | 15:35 | |
andrearosa | nice explanation, ta | 15:37 |
derekh | dprince: o-c-c completed fine, still no luck with dhcp | 15:38 |
*** spzala has joined #tripleo | 15:40 | |
dprince | derekh: well, then. DHCP is always a kicker isn't it. Let me see... | 15:41 |
*** hashar has quit IRC | 15:42 | |
*** matty_dubs is now known as matty_dubs|lunch | 15:45 | |
*** CLOUDOUTAGE has joined #tripleo | 15:46 | |
CLOUDOUTAGE | lifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage | 15:46 |
*** CLOUDOUTAGE has quit IRC | 15:46 | |
SpamapS | wait it's still down? | 15:51 |
*** rpodolyaka1 has joined #tripleo | 15:51 | |
SpamapS | derekh: sitrep? | 15:51 |
*** eghobo has joined #tripleo | 15:51 | |
dprince | derekh: see the etherpad... | 15:51 |
dprince | derekh: wondering if we need to do the same to all the computes? What is the root cause of this hostname issue? (how did it even happen) | 15:52 |
SpamapS | gah, we have to fix this hostname flipping stuff | 15:52 |
SpamapS | the hostname flipping thing has to do with bad interactions between cloud-init and openstack, IIRC | 15:52 |
SpamapS | I've never gotten a good handle on it. | 15:52 |
* mordred reminds people that he thinks everything about how hostnames are handled is evil | 15:53 | |
SpamapS | mordred: +1 | 15:53 |
SpamapS | or should I say, +666 | 15:53 |
SpamapS | so why did the SSH key for ci-overcloud change again? | 15:53 |
dprince | SpamapS: another day another cloud? | 15:53 |
dprince | SpamapS: no idea. I just roll w/ it | 15:54 |
SpamapS | SSH really needs to grow some kind of PKI other than "I remember your SSH key was X" | 15:54 |
Ng | it has PKI | 15:55 |
SpamapS | Ng: can I sign the host key somehow with a CA? | 15:55 |
SpamapS | Ng: or "kerberos" | 15:55 |
SpamapS | don't say kerberos | 15:55 |
SpamapS | I'll spit coffee at you | 15:55 |
dprince | derekh/SpamapS: what is the root cause of the hostname flipping? | 15:55 |
Ng | SpamapS: http://blog.habets.pp.se/2011/07/OpenSSH-certificates | 15:55 |
Ng | SpamapS: I would never say kerberos, I hate that crap | 15:56 |
dprince | derekh: We can write a quick script to fix all the computes, and then bounce vSwitch's on them/ | 15:56 |
dprince | derekh: ? | 15:56 |
Ng | signed host keys *and* signed user keys. I've never deployed it, but it looks like giant epic win | 15:56 |
SpamapS | dprince: I believe what happens is nova tells cloud-init to name the box '$name', and cloud-init does so. Then on reboot, the system's init scripts see that it has a searchdomain of .novalocal, and that gets tacked on. | 15:56 |
openstackgerrit | Andrea Rosa proposed a change to openstack/tripleo-heat-templates: Adding the reserved host disk https://review.openstack.org/84770 | 15:56 |
SpamapS | Ng: that's exactly what I want. | 15:56 |
SpamapS | good to know I'm still about 2 years behind smart people. | 15:57 |
Ng | SpamapS: I only came across it recently and I was all "wow this is a great new feature" and then realised that it's already years old ;) | 15:57 |
dprince | SpamapS: So it happens on reboot, presumably that means all of our overcloud boxes got rebooted then. | 15:57 |
SpamapS | dprince: right | 15:58 |
SpamapS | dprince: it's possible that we have a boot time race | 15:58 |
SpamapS | dprince: os-collect-config may need to wait for cloud-config | 15:58 |
SpamapS | actually I think that is likely | 15:58 |
SpamapS | also I think persisting the metadata means that on bootup we're not running os-collect-config's command anyway | 15:59 |
SpamapS | oops | 15:59 |
dprince | SpamapS: we shouldn't unless there is a change | 15:59 |
SpamapS | dprince: except that there is system state that we need to assert | 16:00 |
dprince | SpamapS: services should start themselves via persistent configs | 16:00 |
SpamapS | dprince: yeah, that's really the failing then isn't it? | 16:00 |
SpamapS | so here's a thought.. | 16:00 |
SpamapS | cloud-config may be runnign in parallel with the persistent service startups | 16:00 |
SpamapS | running even | 16:00 |
dprince | SpamapS: That is my perspective. os-collect-config isn't smart enough to assert a config. It just bounces everything. | 16:00 |
openstackgerrit | Ana Krivokapic proposed a change to openstack/tuskar-ui: Use num_nodes to get node count if possible https://review.openstack.org/84702 | 16:01 |
SpamapS | dprince: well it isn't asserting config, it is asserting state. | 16:01 |
SpamapS | But the services should in fact be asserting their own state too | 16:01 |
dprince | SpamapS: if we make it run on reboot now, it'll cause a double restart of all the services. (really a bad idea) | 16:01 |
SpamapS | anyway, the reason os-collect-config --force --one works is because it restarts everything and they all match gethostname() to their agent records. | 16:02 |
derekh | dprince: on the quick script, let me try one node and reboot an instance on it to see if it works | 16:02 |
derekh | dprince: if it does we can do them all | 16:02 |
*** killer_prince has joined #tripleo | 16:02 | |
SpamapS | dprince: Right so we need to not start anything until the hostname is stable. | 16:02 |
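SpamapS's point is that nothing should start until the hostname has settled. The Python sketch below shows one way to express that. It assumes a hypothetical pre-start hook; `wait_for_stable_hostname` is illustrative and is not part of os-collect-config, upstart, or systemd. The hook polls `socket.gethostname()` until it returns the same value a few times in a row:

```python
import socket
import time
from typing import Callable, Optional


def wait_for_stable_hostname(
    get_hostname: Callable[[], str] = socket.gethostname,
    stable_reads: int = 3,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the hostname once it reads the same `stable_reads` times in a row.

    Hypothetical helper: raises TimeoutError if the name is still changing
    after `timeout` seconds.
    """
    deadline = clock() + timeout
    last: Optional[str] = None
    streak = 0
    while True:
        current = get_hostname()
        if current == last:
            streak += 1
        else:
            # The name changed (or this is the first read): restart the count.
            last, streak = current, 1
        if streak >= stable_reads:
            return current
        if clock() >= deadline:
            raise TimeoutError(f"hostname still changing; last value {current!r}")
        sleep(interval)
```

Polling only guesses that the hostname has settled. The fixes discussed later in this log order startup after cloud-config instead, which removes the guesswork.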
dprince | derekh: ack, may as well re-run os-collect-config on them all | 16:02 |
dprince | SpamapS: exactly. Having it swap out is not cool | 16:03 |
SpamapS | so... let's see | 16:03 |
SpamapS | /var/log/boot.log has the output of cloud-config | 16:03 |
dprince | SpamapS: I don't think it is a startup race. I think the reboot is the culprit. But we absolutely need to support that | 16:03 |
SpamapS | the reboot is a startup.. no? | 16:04 |
SpamapS | the difference is that on the _first_ startup we run os-refresh-config | 16:04 |
dprince | SpamapS: it should be logged that way | 16:04 |
*** petertoft has quit IRC | 16:04 | |
derekh | dprince: compute4 has novalocal in its hostname and compute2 doesn't..... | 16:05 |
derekh | well thats just weird | 16:05 |
dprince | derekh: perhaps only select computes have been rebooted. | 16:06 |
dprince | derekh: I added a full list to the etherpad | 16:06 |
*** yamahata has quit IRC | 16:06 | |
dprince | derekh: most of them are .novalocal | 16:07 |
derekh | dprince: they should all have been rebooted see the script lifeless ran in /root/recovert on the undercloud | 16:07 |
SpamapS | gah, one evil thing is /etc/init/hostname.conf ... we actually run the hostname command before all the filesystems are even mounted readonly. | 16:07 |
SpamapS | doesn't systemd have a specific thing just for setting the hostname? | 16:07 |
dprince | SpamapS: not sure. I think in the TripleO case cloud-init will do it as well. | 16:08 |
SpamapS | so I mean the command 'hostname' | 16:08 |
SpamapS | not editing /etc/hosts | 16:08 |
*** e0ne has quit IRC | 16:09 | |
SpamapS | cloud-init does in fact set it though | 16:09 |
*** e0ne has joined #tripleo | 16:09 | |
dprince | SpamapS: so we have systemd-hostnamed.service which I think does what you are asking | 16:10 |
*** cwolferh has joined #tripleo | 16:11 | |
dprince | derekh: IMO at this point I'm close to saying let's just respin the whole cloud. IMO rebooting is something that is known to be broken... | 16:11 |
dprince | derekh: because we don't persist our core networking configs, among other things | 16:11 |
SpamapS | wait | 16:11 |
dprince | derekh: my take... | 16:11 |
SpamapS | so are you saying os-collect-config --force--one is not fixing them? | 16:12 |
*** e0ne has quit IRC | 16:12 | |
SpamapS | --force --one I mean | 16:12 |
*** ifarkas has quit IRC | 16:12 | |
SpamapS | because it has fixed this exact issue in the past | 16:12 |
dprince | SpamapS: it may | 16:12 |
dprince | SpamapS: I said I'm close... we are trying that now | 16:12 |
SpamapS | ok | 16:12 |
derekh | dprince: ya, I'm starting to think the same, | 16:12 |
dprince | SpamapS: but it is like one thing after another. | 16:12 |
SpamapS | because respin is more unknowns. :P | 16:12 |
SpamapS | dprince: yeah, hopefully we're getting a clue as to where we are weakest though. | 16:13 |
dprince | SpamapS: Not if we use the same images (with known problems we can fix!) | 16:13 |
dprince | SpamapS: respin with new images... yes. A total roll of the dice | 16:13 |
dprince | derekh: on that note I'd like to consider a way to archive our images for the CI overcloud. | 16:13 |
derekh | SpamapS: dprince ok lets remove novalocal from hostnames and run o-c-c on all compute nodes | 16:14 |
derekh | SpamapS: dprince if that doesn't work then maybe a redeploy (but it would be great to avoid having to send new net IDs into infra's config) | 16:15 |
dprince | derekh: fine, I'll do it | 16:16 |
derekh | dprince: ok, go for it | 16:17 |
*** e0ne has joined #tripleo | 16:17 | |
*** eghobo has quit IRC | 16:17 | |
SpamapS | dprince: well ideally we wouldn't delete the existing ones | 16:17 |
*** CLOUDOUTAGE has joined #tripleo | 16:17 | |
CLOUDOUTAGE | lifeless devananda Ng SpamapS jog0 GheRivero derekh dprince slagle -- ci-overcloud currently down https://etherpad.openstack.org/p/cloud-outage | 16:17 |
*** CLOUDOUTAGE has quit IRC | 16:17 | |
SpamapS | we'd just rename them | 16:17 |
*** eghobo has joined #tripleo | 16:17 | |
*** gcha has quit IRC | 16:17 | |
SpamapS | and use IDs for the stack parameter | 16:17 |
SpamapS | then they'll just stick around in glance. | 16:17 |
howleyt | Are there any tricks for getting rid of a heat stack stuck in 'DELETE_FAILED' state? | 16:17 |
SpamapS | derekh: you should not need to remove anything | 16:17 |
SpamapS | derekh: you shouldn't have to do anything manual except run 'os-collect-config --force --one' | 16:18 |
howleyt | I'd love to avoid having to re-run devtest from the beginning | 16:18 |
SpamapS | howleyt: delete again | 16:18 |
SpamapS | howleyt: or stack-abandon if you know the resources are in fact deleted | 16:18 |
SpamapS | dprince: ^^ do not manually edit /etc/hostname or anything | 16:19 |
dprince | SpamapS: I'm writing a script man | 16:19 |
howleyt | SpamapS: I already tried both. Did a stack-abandon after I had deleted the nova instances, but it stays in DELETE_FAILED. | 16:19 |
howleyt | Resource DELETE failed: NotFound: No resource data found | 16:19 |
SpamapS | dprince: do not automatically edit /etc/hostname or anything :) | 16:19 |
SpamapS | or run the 'hostname' command | 16:19 |
dprince | SpamapS: why not? | 16:20 |
SpamapS | just run os-collect-config --force --one | 16:20 |
SpamapS | howleyt: you have found a bug in heat | 16:20 |
*** e0ne has quit IRC | 16:20 | |
*** rpodolyaka1 has quit IRC | 16:20 | |
SpamapS | howleyt: please report it. I'm glad to help you debug it. | 16:20 |
*** e0ne has joined #tripleo | 16:21 | |
dprince | SpamapS: running just os-collect-config doesn't fix the hostname sir. | 16:21 |
SpamapS | dprince: we don't need to fix the hostname. we need to fix the running daemons. | 16:21 |
howleyt | SpamapS: I think there may be a bug for this already, let me check first. | 16:21 |
SpamapS | dprince: as long as hostname == agent name they'll be happy | 16:21 |
SpamapS | they were likely started with the wrong hostname and now are ignoring messages for themselves. | 16:22 |
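The invariant SpamapS describes is that each agent's registered host must equal the node's current hostname. The small Python sketch below checks that. It assumes the agent host names were collected beforehand, for example from the host column of `nova service-list` or `neutron agent-list`. The helper names are hypothetical:

```python
from typing import Iterable, List


def short_name(host: str) -> str:
    """Drop the domain part: 'compute4.novalocal' -> 'compute4'."""
    return host.split(".", 1)[0]


def stale_agents(agent_hosts: Iterable[str], current_hostname: str) -> List[str]:
    """Return registered agent hosts that no longer equal the node's hostname.

    These agents were likely started under a different hostname than the
    one the node reports now.
    """
    return [h for h in agent_hosts if h != current_hostname]


def differs_only_by_domain(agent_host: str, current_hostname: str) -> bool:
    """True if the names differ but match once a suffix like '.novalocal' is dropped."""
    return agent_host != current_hostname and short_name(agent_host) == short_name(current_hostname)
```

Take compute4 as an example. Its agent may be registered as `compute4` while `hostname` now returns `compute4.novalocal`. That agent would be reported as stale, and the two names would differ only by the `.novalocal` domain.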
dprince | SpamapS: well, we were going for consistency as well. But whatever | 16:22 |
dprince | SpamapS: if we use this approach we'll have a mixed bag... | 16:22 |
SpamapS | dprince: consistency is to accept the (broken) automatically assigned hostname unfortunately. | 16:22 |
SpamapS | dprince: yes, we'll have a mixed bag, until we fix this problem, and deploy new images which don't have it. | 16:23 |
dprince | derekh: the os-collect-configs are running now. | 16:23 |
dprince | derekh: in serial :(. But probably fine | 16:23 |
SpamapS | that's probably better really :) | 16:24 |
SpamapS | neutron will die if we hit it with all the work at once :) | 16:24 |
SpamapS | dprince: so I think we might want to change the system default startup to delay runlevel 2 until cloud-config is done. | 16:25 |
SpamapS | I think we may even want to question upstream why they don't do this. | 16:25 |
*** e0ne has quit IRC | 16:25 | |
SpamapS | dprince: the alternative is to have the os-svc-install upstart jobs changed to be 'start on runlevel [2345] and stopped cloud-config' | 16:25 |
SpamapS | (cloud-config is a "task", hence the 'stopped') | 16:26 |
SpamapS | task + stopped == done | 16:26 |
derekh | dprince: SpamapS I'll be popping off soon; pretty much everything I did is on the etherpad. Can ye keep it up to date? lifeless wants to do a postmortem afterwards | 16:26 |
dprince | derekh: Thanks for all the work dude | 16:28 |
SpamapS | derekh: sure and many many thanks | 16:28 |
SpamapS | to both of ye ;) | 16:28 |
SpamapS | I feel like we need full time ops on this cloud | 16:28 |
derekh | no probs, | 16:28 |
SpamapS | not joking | 16:29 |
*** rpodolyaka1 has joined #tripleo | 16:29 | |
SpamapS | like when we get it from "on fire" to "smoldering" we need to not then go back to writing code that breaks it. ;) | 16:29 |
derekh | SpamapS: btw, the original problem "ci-controller not responding to tcp connections" is the exact same as what we had a few weeks back around the time of the sprint | 16:30 |
dprince | SpamapS: on Fedora we have os-collect-config run After=cloud-final.service | 16:30 |
dprince | SpamapS: which I think means we are good to go. | 16:30 |
*** d0ugal has quit IRC | 16:30 | |
derekh | SpamapS: I'm starting to think it can't be the same HW problem again and we have a net kernel module problem | 16:30 |
dprince | SpamapS: For Debian it would be nice if you could wait until after cloud-init is finished... | 16:30 |
*** ramishra has quit IRC | 16:31 | |
derekh | but that's just me thinking out loud with no proof | 16:31 |
derekh | outgoing tcp was fine | 16:31 |
SpamapS | derekh: Oh we have that too | 16:31 |
SpamapS | derekh: if we haven't upgraded the mellanox driver then that is the problem. | 16:31 |
SpamapS | derekh: I was hoping we'd be able to limp that ci-overcloud controller along until trusty, which has the newer mellanox built in. | 16:31 |
derekh | SpamapS: ok cool | 16:32 |
SpamapS | dprince: cloud-config != cloud-init | 16:32 |
*** d0ugal has joined #tripleo | 16:32 | |
SpamapS | dprince: cloud-final might be cloud-init + cloud-config .. is it? | 16:32 |
* derekh signs off | 16:32 | |
*** derekh has quit IRC | 16:32 | |
dprince | SpamapS: I think so | 16:33 |
*** d0ugal has quit IRC | 16:33 | |
dprince | SpamapS: I don't see any dependencies on Debian though so you'll probably want something | 16:33 |
*** jistr has quit IRC | 16:33 | |
*** e0ne has joined #tripleo | 16:33 | |
SpamapS | dprince: ok, I'll change all of our upstart jobs to be runlevel [2345] and stopped cloud-config. | 16:33 |
SpamapS | though that leaves mysql screwed | 16:33 |
SpamapS | and probably rabbitmq too | 16:33 |
dprince | SpamapS: No need to change all of them | 16:33 |
SpamapS | which is why I'm wondering if we should override the rc default | 16:34 |
*** e0ne_ has joined #tripleo | 16:34 | |
SpamapS | dprince: this is not just os-collect-config's problem | 16:34 |
dprince | SpamapS: I think just os-collect-config would suffice here no? | 16:34 |
SpamapS | dprince: on reboot, all of the services start on their own... and they're starting before cloud-config sets the hostname to the one from ec2 metadata | 16:34 |
SpamapS | _I think_ | 16:35 |
* SpamapS curses race conditions | 16:35 | |
*** stevehuang has joined #tripleo | 16:35 | |
dprince | SpamapS: not sure about that. I think changing them all may be overkill | 16:35 |
*** e0ne has quit IRC | 16:35 | |
dprince | SpamapS: if we change hostnames we may need a scrubber/cleaner for other reasons anyways | 16:35 |
howleyt | SpamapS: think I need to pull some changes in, probably hitting this one: https://bugs.launchpad.net/tripleo/+bug/1291060 | 16:36 |
uvirtbot | Launchpad bug 1291060 in tripleo "stack delete overcloud fails on Delete AccessKey "NovaCompute0Key"" [Critical,Fix released] | 16:36 |
SpamapS | dprince: We're not changing the hostname if we always start for the first time in the same state. | 16:36 |
SpamapS | dprince: changing the system-wide default for runlevel 2 to be after cloud-config should actually work fine. | 16:37 |
SpamapS | dprince: and I could make a strong argument that it should be the default, but upstart makes that hard. ;) | 16:37 |
*** martyntaylor has left #tripleo | 16:41 | |
SpamapS | dprince: I'll discuss w/ cloud-init people | 16:41 |
*** blamar_ has joined #tripleo | 16:41 | |
*** blamar has quit IRC | 16:42 | |
*** blamar_ is now known as blamar | 16:42 | |
*** ilives has quit IRC | 16:42 | |
openstackgerrit | Clint "SpamapS" Byrum proposed a change to openstack/diskimage-builder: Delay runlevel2 on Ubuntu until after cloud-config https://review.openstack.org/84790 | 16:43 |
SpamapS | dprince: ^^ suggested fix. Will test here locally. | 16:43 |
openstackgerrit | Tzu-Mainn Chen proposed a change to openstack/tuskar-ui: Adds additional overcloud deployment config params https://review.openstack.org/84791 | 16:43 |
dprince | SpamapS: isn't the definition of runlevel 2 after networking is up (and cloud-init requires networking to actually do its thing) | 16:44 |
dprince | SpamapS: Sorry, misstated that. | 16:44 |
dprince | SpamapS: runlevel 2 is without networking right? | 16:45 |
*** martyntaylor has joined #tripleo | 16:46 | |
*** CaptTofu has quit IRC | 16:47 | |
dprince | SpamapS: I think pinning this to the runlevel is the sledge hammer solution. Why can't you just simply make os-collect-config run first after cloud-init finishes? | 16:48 |
jang1 | support-2% who -r | 16:48 |
jang1 | run-level 2 2014-04-02 11:22 | 16:48 |
jang1 | not on ubuntu, not any more. | 16:48 |
SpamapS | dprince: runlevel2, in debian land, is "ready to do business" | 16:48 |
SpamapS | dprince: networking, users, etc. | 16:49 |
jang1 | SpamapS: if that patch works, it'd be good to see it on debian-upstart, too. | 16:49 |
SpamapS | jang1: good point | 16:49 |
jang1 | (I don't think we have a generic "upstart", do we?) | 16:49 |
SpamapS | dprince: I don't see this as a sledgehammer. The system is booting and the hostname is changing right out from under daemons. | 16:50 |
SpamapS | jang1: no we don't | 16:50 |
SpamapS | jang1: but we could. :) | 16:50 |
jp_at_hp | slagle: nice spot on the manifests change... | 16:50 |
SpamapS | dprince: I'm looking into pushing this all the way up into Ubuntu. | 16:50 |
SpamapS | and I guess Debian too really | 16:51 |
SpamapS | dprince: I'm failing at communicating the problem. | 16:51 |
SpamapS | dprince: the problem is not that os-collect-config runs in parallel with cloud-config on the first boot. | 16:52 |
jp_at_hp | slagle: I think what I'm going to do is to undo my changes to the base store-build-settings file, and if those files exist, move them in the manifests element? Either that or make the base element depend on the manifests element. The reasoning being, if the manifests element is not included I still want something like that left in the base image... | 16:52 |
SpamapS | dprince: the problem is that _everything_ runs in parallel with cloud-config on every boot. So the hostname changes at ??? moment. | 16:52 |
SpamapS | dprince: I'm suggesting that we should stop that. cloud-config will have the facilities it needs, and then everything else will start after it has morphed system state | 16:53 |
dprince | SpamapS: So long as networking is up that is fine. We aren't seeing that on Fedora... What guards it is the 'After' clause for os-collect-config. | 16:54 |
SpamapS | dprince: right, and then every other service starts After=os-refresh-config.service (btw is that a bug?) | 16:55 |
SpamapS | you shouldn't still have an os-refresh-config service | 16:55 |
dprince | SpamapS: so long as what you are doing works with dhcp-all-interfaces on first boot I think you'll be fine | 16:55 |
jang1 | I have a question. Why are we having any services start themselves at all, and not letting os-c-c do it? | 16:56 |
dprince | jang1: because o-c-c doesn't always run unless there is new metadata | 16:56 |
jang1 | okay. So, I've heard it claimed that a power state change is a config change. Shouldn't o-c-c actually run at least once on boot? | 16:57 |
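jang1's suggestion amounts to treating every boot as a configuration change. The minimal Python sketch below detects "first run since this boot". It relies on Linux's `/proc/sys/kernel/random/boot_id`, which gets a new random value on each boot, and on a hypothetical state file. Nothing like this exists in os-collect-config as described here:

```python
from pathlib import Path

# Linux regenerates this value on every boot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def first_run_this_boot(state_file: Path, boot_id_path: Path = BOOT_ID_PATH) -> bool:
    """Return True on the first call after each boot, False on later calls.

    The current boot id is recorded in `state_file` (a hypothetical path),
    so a caller could force one full configuration pass per boot.
    """
    current = boot_id_path.read_text().strip()
    try:
        previous = state_file.read_text().strip()
    except FileNotFoundError:
        previous = None  # never ran before
    if previous == current:
        return False
    state_file.write_text(current + "\n")
    return True
```

Forcing a full run whenever this returns True is exactly what dprince warns about. As long as services also start themselves from persistent configs, a forced run would restart every service a second time on each reboot.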
SpamapS | dprince: oh interesting, there may be another race there. :) | 16:57 |
SpamapS | no.. no race | 16:57 |
SpamapS | start on starting network-interface | 16:57 |
SpamapS | dprince: so the chain for that is udev->device-added->starting network-interface->dhcp-all-interface runs and configures the interface->network-interface ifup's it, if they're all configured, emit static-networking-up | 16:58 |
SpamapS | but the ifup script that emits static-networking-up does a lock.. so in theory they'll all be blocked up on that. | 16:59 |
SpamapS | hm there may still be a race | 16:59 |
SpamapS | when I wrote it to check all interfaces every time, thats why I did all, not a single one | 16:59 |
SpamapS | but that's flawed too | 17:00 |
dprince | SpamapS: On Fedora I wrote the dhcp-interface@.service such that it fails fast if the interface script exists (it doesn't even bother calling dhcp-all-interfaces.sh... so no need to flock it) | 17:00 |
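The Python sketch below illustrates the fail-fast behaviour dprince describes. It is not the actual dhcp-interface@.service or dhcp-all-interfaces.sh code. The `ifcfg-<iface>` keys (`DEVICE`, `BOOTPROTO`, `ONBOOT`) are standard Fedora network-scripts keys, but the exact file contents written here are an assumption:

```python
from pathlib import Path

# Standard Fedora network-scripts keys; the exact contents are an assumption.
IFCFG_TEMPLATE = "DEVICE={iface}\nBOOTPROTO=dhcp\nONBOOT=yes\n"


def ensure_dhcp_config(iface: str, config_dir: Path) -> bool:
    """Write a DHCP ifcfg file for `iface` unless one already exists.

    Returns False (fail fast, nothing touched) if the interface is already
    configured, True if a new file was written.
    """
    cfg = config_dir / f"ifcfg-{iface}"
    try:
        # Mode "x" creates the file exclusively and raises if it already
        # exists, so the existence check and the write cannot race.
        with cfg.open("x") as f:
            f.write(IFCFG_TEMPLATE.format(iface=iface))
    except FileExistsError:
        return False
    return True
```

Each template instance handles a single interface name, so instances do not share files. Exclusive creation also means that even a duplicate instance for the same interface cannot clobber an existing file, which is why no lock is needed.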
SpamapS | dprince: anyway, I'm confident the patch I did will eliminate _a_ race, but not _all_ races. | 17:01 |
SpamapS | dprince: and it will have as much networking as it does today. | 17:02 |
dprince | SpamapS: so our overcloud is still busted :/ | 17:03 |
SpamapS | dprince: down agents? | 17:04 |
dprince | SpamapS: should be | 17:04 |
dprince | SpamapS: oh, restarted os-collect-config rather. | 17:04 |
SpamapS | neutron agents are up | 17:04 |
dprince | SpamapS: which should have rekicked all the agents | 17:04 |
SpamapS | nova computes are up | 17:05 |
*** jcoufal has quit IRC | 17:05 | |
SpamapS | restarting os-collect-config wouldn't do anything | 17:05 |
*** lucasagomes is now known as lucas-afk | 17:06 | |
SpamapS | running --force --one would bounce everything | 17:06 |
dprince | SpamapS: dude, that is what I ran! | 17:06 |
SpamapS | ok just making sure | 17:07 |
dprince | SpamapS: See /root/recovert/hostname_fix.sh | 17:07 |
SpamapS | so I see a working cloud. What's broken now? | 17:07 |
dprince | SpamapS: still no floating IPs | 17:07 |
dprince | SpamapS: well, they aren't pinging | 17:07 |
dprince | SpamapS: it is getting to the point where my confidence in this is little to none | 17:07 |
dprince | SpamapS: I'm inclined to say lets fix some issues and try again | 17:08 |
*** newell_ has joined #tripleo | 17:08 | |
*** akuznetsov has quit IRC | 17:09 | |
mordred | /buffer Ng | 17:09 |
mordred | gah | 17:09 |
*** eguz has joined #tripleo | 17:09 | |
SpamapS | dprince: Well perhaps we should debug a bit more before giving up on it. We're gaining insight into why this is broken. | 17:10 |
dprince | SpamapS: Sure. Well FWIW I already knew rebooting caused problems :( | 17:10 |
SpamapS | dprince: yes, but did you know why? :) | 17:11 |
dprince | SpamapS: Well, one of them is DHCP (which is still hosed) | 17:11 |
dprince | SpamapS: the hostname thing I had forgotten though | 17:12 |
*** eghobo has quit IRC | 17:14 | |
*** killer_prince has quit IRC | 17:16 | |
openstackgerrit | Ricardo Carrillo Cruz proposed a change to openstack/tripleo-incubator: Add 'Supported Platforms' section in README.md https://review.openstack.org/84801 | 17:18 |
SpamapS | wow security-group-list is... useless | 17:18 |
*** morganfainberg_Z is now known as morganfainberg | 17:23 | |
SpamapS | dprince: I'm able to get to instances I've booted | 17:26 |
dprince | SpamapS: cool, floating too? | 17:27 |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-image-elements: Configurable Keystone token provider https://review.openstack.org/84802 | 17:28 |
SpamapS | dprince: $ telnet 138.35.77.39 22 | 17:28 |
SpamapS | Trying 138.35.77.39... | 17:28 |
SpamapS | Connected to 138.35.77.39. | 17:28 |
SpamapS | Escape character is '^]'. | 17:28 |
SpamapS | SSH-2.0-OpenSSH_6.2p2 Ubuntu-6ubuntu0.1 | 17:28 |
SpamapS | dprince: had to add a security group rule allowing it | 17:28 |
dprince | SpamapS: Okay. I was testing one of derekh's. Not sure if he did that... | 17:28 |
SpamapS | I see nodepool slaves building too | 17:28 |
dprince | SpamapS: So I suppose our os-collect-config did the trick... | 17:29 |
SpamapS | wait no those might be old | 17:29 |
*** e0ne has joined #tripleo | 17:29 | |
SpamapS | dprince: word back from cloud-init is that hostname should not be changing after runlevel 2 btw | 17:29 |
SpamapS | so my theory is bust | 17:29 |
*** matty_dubs|lunch is now known as matty_dubs | 17:29 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-heat-templates: Configurable Keystone token provider https://review.openstack.org/84807 | 17:30 |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-incubator: Configurable Keystone token provider https://review.openstack.org/84808 | 17:30 |
*** e0ne has quit IRC | 17:31 | |
openstackgerrit | Coleman Corrigan proposed a change to openstack/tripleo-image-elements: Activate venvs in os-*-config elements source install https://review.openstack.org/84810 | 17:31 |
openstackgerrit | Coleman Corrigan proposed a change to openstack/tripleo-image-elements: Activate venvs in os-*-config elements source install https://review.openstack.org/84810 | 17:33 |
*** jprovazn_afk is now known as jprovazn | 17:37 | |
*** dividebin has joined #tripleo | 17:37 | |
*** e0ne_ has quit IRC | 17:38 | |
*** noslzzp has quit IRC | 17:38 | |
*** tchaypo has quit IRC | 17:38 | |
*** ohadlevy has quit IRC | 17:38 | |
*** dividehex has quit IRC | 17:38 | |
*** dividebin is now known as dividehex | 17:38 | |
*** akuznetsov has joined #tripleo | 17:40 | |
*** jcoufal has joined #tripleo | 17:41 | |
*** ohadlevy has joined #tripleo | 17:43 | |
*** ohadlevy is now known as Guest13973 | 17:43 | |
*** tchaypo has joined #tripleo | 17:44 | |
openstackgerrit | Saurabh Surana proposed a change to openstack/tripleo-image-elements: base element for trove control plane elements https://review.openstack.org/82605 | 17:51 |
*** UtahDave has joined #tripleo | 17:57 | |
jprovazn | greghaynes, hi | 18:01 |
openstackgerrit | Tzu-Mainn Chen proposed a change to openstack/tuskar-ui: Adds additional overcloud deployment config params https://review.openstack.org/84791 | 18:06 |
*** akuznetsov has quit IRC | 18:09 | |
*** giulivo has quit IRC | 18:11 | |
openstackgerrit | Jay Dobies proposed a change to openstack/tuskar: Added keystone configuration to install guide https://review.openstack.org/84827 | 18:14 |
*** akuznetsov has joined #tripleo | 18:17 | |
*** dkehn__ has joined #tripleo | 18:19 | |
*** e0ne has joined #tripleo | 18:20 | |
*** dkehn_ has quit IRC | 18:23 | |
tchaypo | surely greghaynes can't be awake at the moment | 18:27 |
jprovazn | tchaypo, what TZ is greghaynes? | 18:29 |
greghaynes | jprovazn: hey | 18:29 |
tchaypo | technically US-west, but he seems to work AUS hours | 18:29 |
greghaynes | psh, always awake :) | 18:29 |
jprovazn | greghaynes, :) sorry for waking you up | 18:30 |
greghaynes | oh, you didnt. Just had a few real world things to do | 18:30 |
jprovazn | greghaynes, I enjoyed today very "nice" time with galera cluster | 18:30 |
greghaynes | uh oh | 18:31 |
greghaynes | well... did it work? ;) | 18:32 |
*** noslzzp has joined #tripleo | 18:32 | |
jprovazn | greghaynes, the thing is that cluster init is more tricky: 1) start the first node in standalone mode (bootstrap param), then *after* the cluster is created (after more nodes join), if this first node is restarted it can be restarted the normal way (so it joins the cluster formed by the other nodes) | 18:32 |
jprovazn | greghaynes, but if no other nodes have joined meantime and this first node is being restarted, then it should be started again as standalone | 18:33 |
greghaynes | yes, are you getting at how rebooting the whole cluster is broken? | 18:33 |
jprovazn | greghaynes, no - ^ this is still first boot phase :) | 18:34 |
jprovazn | greghaynes, IOW current cluster status has to be considered when starting first node | 18:34 |
geerdest | the heat metadata needed to configure ntp server in overcloud…,where does that go? | 18:35 |
greghaynes | Well, it's the same problem that comes up when the whole cluster is shut down and restarted - someone has to say "I have the actual on disk data" | 18:36 |
SpamapS | they call that leader election | 18:36 |
greghaynes | hehe yep | 18:36 |
*** rpodolyaka1 has quit IRC | 18:36 | |
SpamapS | and we had originally decided that node 0 is special for the bootstrap case only | 18:36 |
greghaynes | so for the one cluster we can put in some special logic | 18:36 |
jprovazn | greghaynes, yes, if you shutdown whole cluster, situation is same | 18:37 |
greghaynes | If there is more than one node though you cant just pick anyone - you have to figure out who has the most up to date data | 18:37 |
SpamapS | once it has bootstrapped, they'll all join that one, and then from that point on, we try really hard not to lose quorum. If we do, then we have to manually kick it. | 18:37 |
SpamapS | greghaynes: most up to date is easy because quorum | 18:37 |
SpamapS | if you lost quorum, manual kick | 18:37 |
greghaynes | If they all have been shut down for some reason then they all have to propose their latest data revision and highest wins | 18:38 |
SpamapS | greghaynes: true thats still a quorum | 18:38 |
greghaynes | Yep, so I kind of punted on implementing that part | 18:38 |
*** hashar has joined #tripleo | 18:38 | |
SpamapS | greghaynes: galera doesn't? | 18:38 |
greghaynes | hrm, xtradb-cluster docs indicated that needed to be done by the operator | 18:39 |
SpamapS | greghaynes: seems to me that you could say that galera should never start if it isn't the bootstrap node and there isn't quorum already. | 18:39 |
SpamapS | but yeah that sounds easier than it probably is | 18:39 |
jprovazn | greghaynes, SpamapS galera keeps seqno attr for this reason (most up to date node) in grastate.dat | 18:39 |
SpamapS | right | 18:39 |
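greghaynes and SpamapS converge on a recovery rule: after a full shutdown, nodes report their latest data revision and the highest wins, provided a quorum reports. jprovazn points to the `seqno` field in `grastate.dat` as that revision. The Python sketch below combines the two. It leaves out how the reports are gathered across nodes, since that is exactly the open question. Two choices are assumptions: the tie-break toward node 0, and treating `seqno: -1` (what Galera leaves after an unclean shutdown) as "needs manual recovery":

```python
import re
from typing import Dict, Optional

SEQNO_RE = re.compile(r"^seqno:[ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


def parse_seqno(grastate_text: str) -> Optional[int]:
    """Extract seqno from grastate.dat contents; None if missing or -1 (unknown)."""
    m = SEQNO_RE.search(grastate_text)
    if m is None:
        return None
    seqno = int(m.group(1))
    return None if seqno < 0 else seqno


def has_quorum(reporting: int, cluster_size: int) -> bool:
    """A strict majority of the configured cluster size."""
    return 2 * reporting > cluster_size


def choose_bootstrap_node(seqnos: Dict[int, Optional[int]], cluster_size: int) -> int:
    """Pick the node to bootstrap after a full-cluster shutdown.

    `seqnos` maps node index -> parsed seqno for the nodes that reported.
    Highest seqno wins; ties go to the lowest index (node 0 stays special).
    Raises RuntimeError for the "manual kick" cases.
    """
    if not has_quorum(len(seqnos), cluster_size):
        raise RuntimeError("no quorum of reports; manual intervention needed")
    if any(s is None for s in seqnos.values()):
        raise RuntimeError("a node has an unknown seqno; manual intervention needed")
    return max(seqnos, key=lambda node: (seqnos[node], -node))
```

Take seqnos `{0: 10, 1: 12, 2: 11}` in a three-node cluster. The function picks node 1, which matches greghaynes's later point: when node 1 has the newer data, node 1 should bootstrap and node 0 should join.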
greghaynes | jprovazn: Yes, but not sure what facility to use to resolve that info across the cluster | 18:40 |
SpamapS | jprovazn: so if the first two nodes that start up (0, 1) pick 1 as the most up to date, 1 will send its data to 0, right? | 18:40 |
SpamapS | jprovazn: what if 2 starts up, and it is higher than 0 or 1? | 18:40 |
jprovazn | SpamapS, I think so | 18:40 |
greghaynes | For a temp fix though I could just special case for a cluster size of 1 ... | 18:40 |
SpamapS | greghaynes: cluster size of 1 is special indeed. | 18:41 |
jprovazn | SpamapS, I suppose it should fail to start (node 2) | 18:41 |
SpamapS | jprovazn: ok, that is what worries me, especially in the datacenter-power-on scenario. So I bet Galera has a way to delay startup until all nodes are present. | 18:42 |
*** spzala has quit IRC | 18:42 | |
jprovazn | SpamapS, hm I don't know about such mechanism, but it deserves checking it | 18:43 |
SpamapS | codership's wiki seems to have gone all 404 | 18:44 |
greghaynes | Yes. So the question is - is the N cold reboot something required for that patch to land | 18:45 |
*** yassine has quit IRC | 18:45 | |
greghaynes | I can see an argument that 1 node is, but I dont think > 1 should be as its not something we support yet | 18:45 |
jprovazn | greghaynes, the issue with https://review.openstack.org/#/c/83675/8/elements/mysql/os-refresh-config/configure.d/51-mysql-init is that this starts mysql in standalone mode only on first run of os-refresh-config, then on the next run db.initialized file is created, and it will try to start in "join to cluster" mode | 18:46 |
greghaynes | Yep. really any time you try to restart mysql, because the next time it's possible that it should be the leader | 18:46 |
SpamapS | this is most disturbing | 18:47 |
jprovazn | greghaynes, not exactly | 18:47 |
SpamapS | galera's manuals have gone away | 18:47 |
*** akuznetsov has quit IRC | 18:47 | |
jprovazn | greghaynes, by the "bootstrap" param (or by gcomm://) you are just saying "don't join a cluster", but if you restart this first node later (once the cluster was created), you want it to join the existing cluster | 18:48 |
greghaynes | docs? who needs those | 18:48 |
SpamapS | greghaynes: I think as long as the cold reboot results in nodes -down- and not nodes inconsistent, then the patch is fine. | 18:48 |
greghaynes | jprovazn: yes, its possible that another node in the cluster has more up to date info in which case the 0th node should be started without the bootstrap param | 18:49 |
greghaynes | SpamapS: Agreed | 18:49 |
SpamapS | right so here's a thought. bootstrap with node 0, that's easy, but then never do that again. If you encounter another situation where there is no known leader, error, exit, done. | 18:50 |
jprovazn | greghaynes, SpamapS: well, it seems to me that the patch will not work: when you boot up the first node, it will fail, because from the second os-refresh-config run onwards it will start the node with "mysql restart", which will fail because it will try to connect to other mysql nodes which are not running yet | 18:51 |
SpamapS | then we can cross the cold boot bridge later. | 18:51 |
greghaynes | w00t, thats basically what happens. One enhancement that would probably be nice is an if cluster_size==1 then always reboot with bootstrap | 18:51 |
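SpamapS's rule is to bootstrap from node 0 once, never again, and to exit with an error when there is no known leader. Combined with greghaynes's single-node special case, it can be written as a small decision function. The Python sketch below does that. `cluster_ever_formed` and `peer_reachable` are hypothetical inputs, and computing the first one correctly is the hard part jprovazn runs into:

```python
from enum import Enum


class StartMode(Enum):
    BOOTSTRAP = "bootstrap"  # start a brand-new cluster from this node
    JOIN = "join"            # join a cluster formed by other nodes


class NoLeaderError(RuntimeError):
    """No known leader: stop and leave it to an operator (the "manual kick")."""


def choose_start_mode(node_index: int, cluster_size: int,
                      cluster_ever_formed: bool, peer_reachable: bool) -> StartMode:
    """Decide how this MySQL/Galera node should start (hypothetical inputs)."""
    if cluster_size == 1:
        # Nobody to join, so a single-node cluster always bootstraps.
        return StartMode.BOOTSTRAP
    if not cluster_ever_formed:
        # Initial deployment: node 0 is special for the bootstrap case only.
        return StartMode.BOOTSTRAP if node_index == 0 else StartMode.JOIN
    if peer_reachable:
        # A cluster exists and a peer is up: join it, never re-bootstrap.
        return StartMode.JOIN
    raise NoLeaderError(
        f"node {node_index}: cluster existed before but no peer is reachable"
    )
```

A local marker written after the first os-refresh-config run, like the `db.initialized` file in the review above, is not a safe source for `cluster_ever_formed`. Node 0 would set it before any peer has joined, which is the failure jprovazn describes.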
SpamapS | jprovazn: will it require those other nodes, or just try to connect for the purposes of sharing? | 18:51 |
* SpamapS is actually quite hungry and will go soon | 18:52 | |
jprovazn | SpamapS, other nodes are required I believe - it tries nodes one by one, if none is available, it fails | 18:52 |
jprovazn | greghaynes, +1 for the enhancement | 18:52 |
SpamapS | jprovazn: why not just try forever at that point? | 18:53 |
jprovazn | (which is what I was playing with today :) | 18:53 |
jprovazn | SpamapS, because it would be forever ;) - other nodes will not come up because these will not have other running nodes too | 18:53 |
jprovazn | SpamapS, I think everybody will be waiting for first running node | 18:54 |
SpamapS | you're talking about the 1 node cluster.. that is special cased. The one with quorum expected will have quorum eventually no? | 18:54 |
jprovazn | SpamapS, no, the same happens with multiple nodes | 18:54 |
jprovazn | I think, I tried with 2 only | 18:54 |
SpamapS | ah, ok, I do not understand enough | 18:55 |
SpamapS | and my belly is overriding my curiosity | 18:55 |
* SpamapS will bbiab | 18:55 | |
*** marun is now known as maru_afk | 18:57 | |
*** rwsu has quit IRC | 19:01 | |
*** ramishra has joined #tripleo | 19:01 | |
openstackgerrit | Jan Provaznik proposed a change to openstack/tripleo-image-elements: WIP: Add mysql/mariadb cluster support https://review.openstack.org/84838 | 19:02 |
jprovazn | greghaynes, SpamapS: ^ | 19:02 |
jprovazn | greghaynes, SpamapS: this adds a check whether the first node should be started with bootstrap or joined to the cluster; as you can see, determining whether the cluster was already created is not trivial | 19:03 |
tchaypo | 99 | 19:04 |
tchaypo | c | 19:04 |
*** rwsu has joined #tripleo | 19:07 | |
tchaypo | well done me. | 19:10 |
greghaynes | jprovazn: yes, so the issue is the second time around the node 0 could have a data version that is not most up to date and node 1 could be most up to date | 19:11 |
greghaynes | jprovazn: in that case node 1 should do the bootstrap and node 0 should join | 19:11 |
greghaynes | if you have node 0 bootstrap then there will be data loss | 19:12 |
tchaypo | how do you determine which node has the most recent data? | 19:16 |
jprovazn | greghaynes, well, that would be the situation when all your nodes have been down at the same time and you need to bootstrap again (hopefully this will not happen); the really major issue the script is trying to solve is the first cluster setup | 19:16 |
tchaypo | Is it possible to have cases where node 0 and node 1 both have some data that the other hasn't seen (maybe because of a network segmentation event before they both died)? | 19:17 |
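On tchaypo's question of which node has the most recent data: Galera keeps each node's last committed position in a `grastate.dat` file in the data directory, and its `seqno:` field is -1 while the node is running, so a crash leaves the position unknown. The sketch below assumes you have already collected each node's `grastate.dat` text (the helper names are hypothetical, not part of the patch under review). It picks the node with the highest known seqno as the bootstrap candidate and refuses to choose when any position is unknown. It does not answer the divergent-history question: a single counter per node cannot show that two nodes saw different writes.

```python
import re
from typing import Dict, Optional


def parse_seqno(grastate_text: str) -> Optional[int]:
    """Hypothetical helper: read the "seqno:" field from grastate.dat text."""
    match = re.search(r"^seqno:\s*(-?\d+)\s*$", grastate_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1))


def pick_bootstrap_node(states: Dict[str, str]) -> Optional[str]:
    """Return the node whose saved position is furthest ahead, or None.

    `states` maps a node name to the contents of its grastate.dat.
    Returns None if any node's position is missing or -1 (unknown),
    because bootstrapping from a guess risks losing committed data.
    """
    seqnos = {}
    for node, text in states.items():
        seqno = parse_seqno(text)
        if seqno is None or seqno < 0:
            return None
        seqnos[node] = seqno
    if not seqnos:
        return None
    return max(seqnos, key=seqnos.get)
```

With `node0` at seqno 10 and `node1` at seqno 42, this picks `node1` to bootstrap and `node0` would join, which is the ordering greghaynes describes.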
jprovazn | greghaynes, it takes care of bootstrapping the first node for the right time | 19:17 |
jprovazn | (as long as cluster was not created yet) | 19:18 |
*** CaptTofu has joined #tripleo | 19:18 | |
greghaynes | how do you know with that check the second time through that it shouldn't bootstrap again? | 19:18 |
jprovazn | greghaynes, https://review.openstack.org/#/c/84838/1/elements/mysql-common/os-refresh-config/configure.d/51-init-mysql-cluster - cluster_created function - it checks if cluster was created meantime or not | 19:19 |
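The first-setup logic jprovazn describes can be reduced to a small decision function. This is a hypothetical Python sketch, not the shell script in review 84838: `cluster_created` here simply asks whether any peer is already running, and `is_running` stands in for whatever probe (or notify script) the real element uses. As greghaynes points out above, this rule alone would still let node 0 bootstrap on a full cold restart even if another node holds newer data.

```python
from typing import Callable, Iterable


def cluster_created(peers: Iterable[str], is_running: Callable[[str], bool]) -> bool:
    """Hypothetical check: treat the cluster as created if any peer is up."""
    return any(is_running(peer) for peer in peers)


def start_mode(node_index: int, peers: Iterable[str],
               is_running: Callable[[str], bool]) -> str:
    """Return "bootstrap" or "join" for this node.

    Only the first node may bootstrap, and only while no peer is already
    running the cluster; every other case joins.
    """
    if node_index == 0 and not cluster_created(peers, is_running):
        return "bootstrap"
    return "join"
```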
greghaynes | :) aha | 19:20 |
greghaynes | I like that idea | 19:20 |
greghaynes | I'll mess with it a bit | 19:20 |
jprovazn | greghaynes, and yes, it's not very nice - it would be great to have a solution without the notify script | 19:21 |
greghaynes | That stuff should probably go in /mysql not mysql-common right? | 19:21 |
jprovazn | greghaynes, and I could not find such a solution today | 19:21 |
jprovazn | greghaynes, no - it's same for mysql and mariadb - mysql-common is right place unless there is percona speciality | 19:21 |
greghaynes | Awesome | 19:24 |
jprovazn | greghaynes, an alternative to the notify script would be to always do "service mysql start || service mysql bootstrap" for the first node (if it fails to join an existing cluster, bootstrap it), but there is a problem with the timeout: a cluster join might take a long time (data sync), which is why the init script timeout defaults to 5 minutes; it could be changed, but a delay of a couple of minutes on each start is probably suboptimal | 19:25 |
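The `service mysql start || service mysql bootstrap` alternative could be sketched as below, assuming an init script that provides both `start` and `bootstrap` actions as jprovazn describes. The 300-second default mirrors the 5-minute init-script timeout he mentions, and the command runner is injectable so the control flow can be exercised without a real MySQL. This only illustrates the fallback pattern greghaynes considers risky; it is not a recommendation.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes (command, timeout_seconds) and returns an exit code.
Runner = Callable[[Sequence[str], float], int]


def run_with_timeout(cmd: Sequence[str], timeout: float) -> int:
    """Run cmd and return its exit code, or -1 if it exceeds timeout seconds."""
    try:
        return subprocess.run(list(cmd), timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return -1


def start_or_bootstrap(run: Runner = run_with_timeout,
                       join_timeout: float = 300.0) -> str:
    # Try to join an existing cluster first; this can be slow because of data sync.
    if run(["service", "mysql", "start"], join_timeout) == 0:
        return "joined"
    # Fall back to bootstrapping a new cluster from this node's data.
    if run(["service", "mysql", "bootstrap"], join_timeout) == 0:
        return "bootstrapped"
    return "failed"
```

Every start that cannot join pays the full join timeout before the fallback runs, which is the delay jprovazn calls suboptimal.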
greghaynes | yes, that seems very scary to me and likely to cause spurious data corruption | 19:26 |
greghaynes | There's definitely a long term solution needing to be developed so it should be fine to punt on having nodes come back from a full cold reboot | 19:27 |
greghaynes | and I want to get enough landed to start working on fixing the other parts we're missing - like migrations on only one node | 19:28 |
greghaynes | jprovazn: What do you think about making your patch based on the galera one? | 19:30 |
greghaynes | otherwise it's going to be a bad set of conflicts when one merges | 19:30 |
*** rpodolyaka1 has joined #tripleo | 19:30 | |
*** e0ne has quit IRC | 19:37 | |
jprovazn | greghaynes, sure thing | 19:37 |
*** spzala has joined #tripleo | 19:38 | |
*** jp_at_hp has quit IRC | 19:42 | |
*** cwolferh has quit IRC | 19:43 | |
*** cwolferh has joined #tripleo | 19:43 | |
*** dividehex has quit IRC | 19:44 | |
*** jcoufal has quit IRC | 19:44 | |
tchaypo | dprince: you just made my morning :) | 19:45 |
dprince | tchaypo: cool. What did I do? | 19:46 |
tchaypo | dprince: it's more what you didn't do | 19:46 |
tchaypo | dprince: specifically, find nits on the standarize-location-of-passwords change | 19:47 |
tchaypo | I'm easily pleased | 19:48 |
dprince | tchaypo: ah, well I'm glad it worked out then :) | 19:48 |
*** e0ne has joined #tripleo | 19:51 | |
*** dividehex has joined #tripleo | 19:51 | |
*** ramishra has quit IRC | 19:51 | |
*** e0ne has quit IRC | 19:52 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-image-elements: Configure logging for keystone https://review.openstack.org/84847 | 20:09 |
*** dprince has quit IRC | 20:15 | |
*** akrivoka has quit IRC | 20:29 | |
*** dkehn__ is now known as dkehn_ | 20:29 | |
*** maru_afk is now known as marun | 20:31 | |
*** jprovazn has quit IRC | 20:32 | |
*** lblanchard has quit IRC | 20:32 | |
*** lblanchard has joined #tripleo | 20:33 | |
*** blamar_ has joined #tripleo | 20:34 | |
*** blamar has quit IRC | 20:36 | |
*** blamar_ is now known as blamar | 20:36 | |
greghaynes | tchaypo: You have any luck with that heat template? | 20:40 |
tchaypo | got distracted by something else, and now I've inadvertently destroyed my seed | 20:42 |
greghaynes | rip | 20:42 |
lifeless | wow, I slept in | 20:43 |
lifeless | SpamapS: hi | 20:43 |
tchaypo | morning lifeless | 20:43 |
SpamapS | lifeless: howdy | 20:43 |
tchaypo | welcome to the day | 20:43 |
SpamapS | lifeless: so ci-overcloud looks to be working.. but nodepool isn't taking advantage just yet.. not sure why | 20:44 |
*** blamar has quit IRC | 20:44 | |
*** blamar has joined #tripleo | 20:44 | |
lifeless | | cdf9ae2d-5b65-4911-a5e3-4ea982a6fead | tripleo-precise-tripleo-test-cloud-3392283.slave.openstack.org | BUILD | - | NOSTATE | | | 20:45 |
lifeless | | 78b0f203-3d26-42f8-9ef4-c0aa2c29dd93 | tripleo-precise-tripleo-test-cloud-3392286.slave.openstack.org | BUILD | - | NOSTATE | | | 20:45 |
SpamapS | lifeless: those are old | 20:45 |
lifeless | ok | 20:46 |
SpamapS | lifeless: from right before we got things back up IIRC | 20:46 |
lifeless | SpamapS: have we upgraded the kernel ? | 20:47 |
SpamapS | lifeless: not that I know of | 20:47 |
*** blamar has quit IRC | 20:47 | |
SpamapS | lifeless: I came in after DHCP was working again | 20:47 |
SpamapS | they were about to start mucking with /etc/hostname and I said just run os-collect-config --force --one and that seemed to have worked | 20:48 |
*** marun is now known as maru_afk | 20:48 | |
*** spzala has quit IRC | 20:49 | |
*** spzala has joined #tripleo | 20:50 | |
*** blamar has joined #tripleo | 20:50 | |
* tchaypo discovers dib's "check-break" function | 20:53 | |
*** CaptTofu has quit IRC | 20:58 | |
*** rpodolyaka1 has quit IRC | 20:59 | |
*** jdob has quit IRC | 21:00 | |
*** untriaged-bot has joined #tripleo | 21:00 | |
untriaged-bot | Untriaged bugs so far: | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1290488 | 21:00 |
uvirtbot | Launchpad bug 1290488 in tripleo "Baremetal: Invalid credentials" [Undecided,Incomplete] | 21:00 |
*** untriaged-bot has quit IRC | 21:00 | |
*** rpodolyaka1 has joined #tripleo | 21:02 | |
*** blamar has quit IRC | 21:02 | |
*** lblanchard has quit IRC | 21:11 | |
*** rpodolyaka1 has quit IRC | 21:14 | |
*** jang1 has quit IRC | 21:16 | |
*** hashar has quit IRC | 21:18 | |
*** eguz has quit IRC | 21:25 | |
*** eghobo has joined #tripleo | 21:25 | |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: set -u and -o pipefail everywhere https://review.openstack.org/84868 | 21:26 |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: set -e all the things https://review.openstack.org/83927 | 21:26 |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: Make sure all scripts are set -e https://review.openstack.org/81637 | 21:26 |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: dib-lint does not work with set -e https://review.openstack.org/83930 | 21:26 |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: Check for set -o pipefail https://review.openstack.org/83929 | 21:26 |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: Ensure scripts are set -u https://review.openstack.org/83928 | 21:26 |
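The series above pushes every diskimage-builder script towards `set -e`, `set -u` and `set -o pipefail`. As a rough illustration of what a lint check for that could look for, here is a hypothetical Python helper. It is not dib-lint's actual implementation: it strips comments naively and ignores `set +e`-style resets.

```python
from typing import List


def missing_strict_flags(script: str) -> List[str]:
    """Report which of `set -e`, `set -u` and `set -o pipefail` a script never enables."""
    enabled = set()
    for raw_line in script.splitlines():
        # Naive comment stripping: fine for a sketch, wrong for '#' inside strings.
        words = raw_line.split("#", 1)[0].split()
        if not words or words[0] != "set":
            continue
        args = words[1:]
        for i, arg in enumerate(args):
            if not arg.startswith("-") or arg.startswith("--"):
                continue
            letters = arg[1:]
            # Combined short options such as -eux enable several flags at once.
            enabled.update(letters.replace("o", ""))
            # A trailing "o" (as in -o or -eo) takes the next word as an option name.
            if letters.endswith("o") and i + 1 < len(args):
                enabled.add("o " + args[i + 1])
    missing = []
    if "e" not in enabled:
        missing.append("-e")
    if "u" not in enabled:
        missing.append("-u")
    if "o pipefail" not in enabled:
        missing.append("-o pipefail")
    return missing
```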
*** matty_dubs is now known as matty_dubs|gone | 21:29 | |
*** meena has quit IRC | 21:29 | |
*** meena has joined #tripleo | 21:30 | |
*** meena has joined #tripleo | 21:30 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-image-elements: Configurable Keystone token provider https://review.openstack.org/84802 | 21:31 |
*** markmc has quit IRC | 21:32 | |
*** eguz has joined #tripleo | 21:38 | |
*** eghobo has quit IRC | 21:41 | |
* tchaypo wonders how StevenK enjoyed his 3am phonecall | 21:47 | |
*** blamar has joined #tripleo | 21:51 | |
greghaynes | 3am and then one coming up | 21:52 |
tchaypo | I did not expect the phone conference to get all metaphysical on me | 21:56 |
tchaypo | "while you wait you will hear silence" | 21:56 |
tchaypo | will i? really? is that even possible? | 21:56 |
greghaynes | Trying to give you something to think about while you wait | 21:56 |
*** UtahDave has quit IRC | 21:57 | |
tchaypo | it's a lot better than spamming muzak in my ear | 21:58 |
greghaynes | mmm seems merge.py isn't correctly scaling compute with software-config | 22:00 |
tchaypo | cody-somerville: do we have any plans for pycon-au yet? | 22:00 |
cody-somerville | Of course. :) | 22:01 |
tchaypo | ah, excellent! How can I find out more about these plans? | 22:01 |
openstackgerrit | Derek Higgins proposed a change to openstack/tripleo-image-elements: Install bridge-utils on compute nodes https://review.openstack.org/84876 | 22:01 |
openstackgerrit | Derek Higgins proposed a change to openstack/tripleo-image-elements: Install bridge-utils on compute nodes https://review.openstack.org/84876 | 22:02 |
cody-somerville | tchaypo: So it doesn't look like HP is currently sponsoring (we can see about changing that if we feel it important - though we're already sponsoring a number of Python events this year) but I do have money set aside for about 3-4 people to go. | 22:03 |
tchaypo | Do we plan to run a miniconf again? | 22:04 |
tchaypo | or rather - did we find that valuable last year? | 22:04 |
cody-somerville | I wasn't involved in our activities at PyCon-AU last year so uncertain. | 22:05 |
cody-somerville | But CFP closes on April 25th. | 22:05 |
lifeless | tchaypo: we didn't run the miniconf last time | 22:06 |
lifeless | tchaypo: tristan did, and roped me in to help | 22:06 |
tchaypo | ah. Tristan is.. rackspace? | 22:06 |
lifeless | tchaypo: what I think we should do is actively be involved and reach out to pull in talks etc | 22:07 |
lifeless | tchaypo: aptira | 22:07 |
tchaypo | I've seen his name on the ozstackers meetup group | 22:07 |
bnemec | If any cores are looking for an easy review, I'd love a +2 on https://review.openstack.org/#/c/78461 (assuming there are no problems, of course :-) | 22:24 |
*** TravT has joined #tripleo | 22:26 | |
openstackgerrit | Ben Nemec proposed a change to openstack/tripleo-image-elements: Make cinder-tgt/lio depend on cinder-volume https://review.openstack.org/84884 | 22:27 |
openstackgerrit | Ben Nemec proposed a change to openstack/tripleo-image-elements: Add cinder-lio element https://review.openstack.org/78463 | 22:27 |
openstackgerrit | Ben Nemec proposed a change to openstack/tripleo-image-elements: Factor out tgt-specific parts of cinder element https://review.openstack.org/78462 | 22:27 |
*** tchaypo has quit IRC | 22:40 | |
*** tchaypo has joined #tripleo | 22:54 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-incubator: Configurable Keystone token provider https://review.openstack.org/84808 | 22:57 |
*** weshay has quit IRC | 22:59 | |
*** john-n-seattle2 has joined #tripleo | 23:01 | |
*** yamahata has joined #tripleo | 23:01 | |
*** john-n-seattle2 has left #tripleo | 23:01 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-heat-templates: Configurable Keystone token provider https://review.openstack.org/84807 | 23:03 |
*** spzala has quit IRC | 23:04 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-image-elements: Configurable Keystone token provider https://review.openstack.org/84802 | 23:07 |
*** yamahata has quit IRC | 23:11 | |
*** lucas-afk has quit IRC | 23:15 | |
*** noslzzp has quit IRC | 23:16 | |
greghaynes | SpamapS: Is the plan for merge.py to not scale out NovaComputeConfig in https://review.openstack.org/#/c/81666/11/nova-compute-instance.yaml ? | 23:33 |
*** xuhaiwei has joined #tripleo | 23:34 | |
greghaynes | As is it currently scales that resource into NovaCompute0Config and NovaCompute1Config, but doesn't update refs | 23:34 |
xuhaiwei | morning | 23:35 |
lifeless | wow the day has gone fast | 23:36 |
greghaynes | I wonder if it's sane to make merge.py not apply scaling if a resource doesn't match scaling_prefix + 0 ? | 23:37 |
greghaynes | lifeless: ^ or why doesn't it currently operate that way? | 23:38 |
lifeless | so it scales the configs out today because they need to be unique. I'd be ok with making it only scale Foo0 and changing the CFN templates to match | 23:39 |
greghaynes | awesome. Seems safer than assuming prefix.* | 23:39 |
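The rule agreed here, scaling only resources whose names start with the scaling prefix plus `0` and updating references to match, could look roughly like this. It is a hypothetical sketch over a plain `Resources` dict, not merge.py's code. `NovaCompute0Config` becomes `NovaCompute<i>Config` alongside `NovaCompute<i>`, `Ref` and `Fn::GetAtt` targets inside each copy are renamed, and an unsuffixed `NovaComputeConfig` is left alone. References from unscaled resources into the scaled set are not handled.

```python
from typing import Any, Callable, Dict


def _rewrite_refs(node: Any, rename: Callable[[str], str]) -> Any:
    """Return a copy of node with Ref / Fn::GetAtt targets passed through rename."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                out[key] = rename(value)
            elif key == "Fn::GetAtt" and isinstance(value, list) and value:
                out[key] = [rename(value[0])] + [_rewrite_refs(v, rename) for v in value[1:]]
            else:
                out[key] = _rewrite_refs(value, rename)
        return out
    if isinstance(node, list):
        return [_rewrite_refs(item, rename) for item in node]
    return node  # strings, numbers, booleans and None are immutable


def scale_resources(resources: Dict[str, Any], prefix: str, count: int) -> Dict[str, Any]:
    """Copy every resource named prefix + "0" + suffix into `count` instances."""
    base = prefix + "0"
    scaled = {name for name in resources if name.startswith(base)}
    # Names that do not start with prefix + "0" (e.g. NovaComputeConfig) pass through.
    out = {name: body for name, body in resources.items() if name not in scaled}
    for i in range(count):
        def rename(name: str, i: int = i) -> str:
            # NovaCompute0Config -> NovaCompute<i>Config; other names are unchanged.
            return prefix + str(i) + name[len(base):] if name in scaled else name
        for name in scaled:
            out[rename(name)] = _rewrite_refs(resources[name], rename)
    return out
```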
tchaypo | from my cursory reading I thought it was looking for exactly prefix0, I didn't realise it was prefix.* | 23:41 |
*** andreaf2 has joined #tripleo | 23:41 | |
greghaynes | oh, you're reading merge.py | 23:41 |
* greghaynes hands tchaypo a pot of coffee | 23:41 | |
greghaynes | you'll need this | 23:41 |
StevenK | greghaynes: Don't forget the flask of whiskey | 23:42 |
greghaynes | Yes, good call | 23:42 |
tchaypo | no no | 23:43 |
tchaypo | I'm currently reading pip | 23:43 |
tchaypo | and trying to find why it doesn't seem to bother reading mirror_base/index.html if mirror_base starts with file:// | 23:43 |
tchaypo | various bits of code hint that it should be doing that, but strace shows that it doesn't. | 23:44 |
greghaynes | strace for great good? | 23:44 |
greghaynes | :/ | 23:44 |
greghaynes | much fun | 23:44 |
tchaypo | One of these days I'm going to figure out how to make pip set loglevel.DEBUG and then i might get more of an idea what it's doing. | 23:45 |
tchaypo | oh, and I'm pretty sure it's one of those things where I'm going to feel stupid that I didn't figure it out sooner, too | 23:45 |
lifeless | tchaypo: does it treat the directory as the page and readdir instead? | 23:46 |
lifeless | tchaypo: and cast that into links | 23:46 |
lifeless | tchaypo: personally for this sort of thing I tend to edit the code in question and insert 'import pdb; pdb.set_trace()' | 23:46 |
tchaypo | the comment from the --index-url param in the help output says that that's what it does | 23:47 |
tchaypo | and of course that's why it breaks; when it tries to get the directory name that doesn't exist | 23:47 |
tchaypo | but other comments say that it should be looking for mirror_base/index.html first, even if it's a file://, and that doesn't seem to be happening | 23:47 |
lifeless | tchaypo: I have different help I think | 23:48 |
lifeless | -i, --index-url <url> Base URL of Python Package Index (default https://pypi.python.org/simple/). | 23:48 |
lifeless | --extra-index-url <url> Extra URLs of package indexes to use in addition to --index-url. | 23:48 |
lifeless | --no-index Ignore package index (only looking at --find-links URLs instead). | 23:48 |
lifeless | -f, --find-links <url> If a url or path to an html file, then parse for links to archives. If a local path or file:// url that's a directory, then | 23:48 |
lifeless | look for archives in the directory listing. | 23:48 |
lifeless | we're not using -f | 23:48 |
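For reference, the `-f/--find-links` behaviour in the help text lifeless pasted (parse an HTML page for links, or list a local directory) can be sketched as follows. This illustrates the documented semantics using only the standard library; it is not pip's implementation, `candidate_links` is a hypothetical name, and remote http(s) locations are not handled.

```python
import os
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def candidate_links(location: str) -> List[str]:
    """Return candidate archive locations for a local path or file:// URL."""
    parsed = urlparse(location)
    path = url2pathname(parsed.path) if parsed.scheme == "file" else location
    if os.path.isdir(path):
        # Directory: every entry in the listing is a candidate archive.
        return sorted(os.path.join(path, entry) for entry in os.listdir(path))
    # Otherwise treat it as an HTML page and resolve its links against it.
    collector = _LinkCollector()
    with open(path, encoding="utf-8") as handle:
        collector.feed(handle.read())
    return [urljoin(location, href) for href in collector.hrefs]
```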
tchaypo | no, but even if we do, it stats /tmp/pypi/markupsafe and dies | 23:49 |
tchaypo | sorry, distracted moving stuff getting ready to get it taken to storage | 23:49 |
lifeless | tis ok | 23:49 |
lifeless | may I suggest that -f is a distraction | 23:49 |
lifeless | the mirror we have is correct AFAICT, even *without* an index.html | 23:49 |
tchaypo | yep, I'm more interested in why the index-url isn't being hit | 23:50 |
lifeless | tchaypo: so I'd put a breakpoint as above, at the start of the function where "Real name of" *should* be output | 23:50 |
lifeless | but that's me :) | 23:50 |
tchaypo | I agree with the breakpoint idea | 23:52 |
tchaypo | but the "Real name" bit happens inside the call to self._find_url_name() on line 196, and we never get that far - calls self._get_page() on line 194 and (further down the call tree) does the stat and exits | 23:52 |
tchaypo | There's another obvious target there - should that exception get caught? | 23:53 |
lifeless | let me have a look | 23:53 |
lifeless | what's the function name in the routine - the one line 194 is in | 23:53 |
tchaypo | sorry, was just musing, not actually intending that to be a question to you | 23:53 |
lifeless | curiosity piqued | 23:53 |
tchaypo | File "/usr/local/lib/python2.7/dist-packages/pip/index.py", line 194, in find_requirement | 23:53 |
tchaypo | page = self._get_page(main_index_url, req) | 23:53 |
lifeless | yeah | 23:54 |
lifeless | what exception is thrown ? | 23:54 |
lifeless | if scheme == 'file' and os.path.isdir(url2pathname(path)): | 23:55 |
lifeless | Is that what raises? | 23:55 |
*** spzala has joined #tripleo | 23:55 | |
lifeless | no | 23:55 |
lifeless | that should be safe | 23:56 |
tchaypo | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 194, in send | 23:56 |
tchaypo | stats = os.stat(pathname) | 23:56 |
tchaypo | OSError: [Errno 2] No such file or directory: '/tmp/pypi/markupsafe/' | 23:56 |
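The OSError above comes from stat-ing `<mirror>/<project>/` directly. A hypothetical lookup for a `file://` index that prefers `<project>/index.html`, falls back to the directory itself for listing, and treats a missing directory as "not on this mirror" instead of raising might look like this. It is a sketch of the behaviour being discussed, not pip's `_get_page` code path.

```python
import os
import stat
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def project_index_target(index_url: str, project: str) -> Optional[str]:
    """Hypothetical: decide what to read for <index_url>/<project>/ on a file:// mirror.

    Returns the path of <project>/index.html if it exists, otherwise the
    project directory itself (to be listed), or None if the project is absent.
    """
    parsed = urlparse(index_url)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file:// index URLs")
    project_dir = os.path.join(url2pathname(parsed.path), project)
    try:
        info = os.stat(project_dir)
    except FileNotFoundError:
        # The case tchaypo hit: report "not on this mirror" instead of crashing.
        return None
    if not stat.S_ISDIR(info.st_mode):
        return None
    index_html = os.path.join(project_dir, "index.html")
    if os.path.isfile(index_html):
        return index_html
    return project_dir
```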
lifeless | what's the frame above ? | 23:56 |
lifeless | erm | 23:56 |
lifeless | what's the first frame in index.py | 23:56 |
Shrews | lifeless: fyi, i've gotten much better battery life since installing pm-powersave | 23:56 |
lifeless | apt-cache show pm-powersave | 23:57 |
lifeless | N: Unable to locate package pm-powersave | 23:57 |
Shrews | and powertop | 23:57 |
tchaypo | what lifeless said | 23:57 |
Shrews | lifeless: pm-utils | 23:57 |
lifeless | Shrews: I have that already :) | 23:57 |
lifeless | Shrews: I guess you mean running pm-powersave? | 23:57 |
Shrews | hrm, maybe it was the powertop twiddling that did it | 23:57 |
tchaypo | I find that when I run powertop --auto-tune my usage drops 3-5W, which is nice | 23:58 |
tchaypo | but every time things change, various settings fall back to bad | 23:58 |
lifeless | TIL | 23:58 |
lifeless | tchaypo: so - frames ? | 23:58 |
tchaypo | lifeless: in http://paste.openstack.org/show/74795/ https://bugs.launchpad.net/tripleo/+bug/1301220 | 23:59 |
lifeless | really? | 23:59 |