mordred | so even though there was transparent connection re-use going on - the failures are not transparent | 00:00 |
---|---|---|
mordred | it's an option in how you use requests to disable such a thing -but I think we were thinking that requests itself should/could be more graceful | 00:00 |
jogo | mordred: you should yell at https://github.com/sigmavirus24 | 00:01 |
jogo | mordred: he is apparently also the maintainer of flake8 | 00:01 |
*** matsuhashi has joined #tripleo | 00:01 | |
jogo | mordred: which I am pushing to have automatic parallelism in the next version | 00:01 |
mordred | jogo: when is he going to join us in the IRCs? | 00:02 |
jogo | mordred: not sure, he was on this weekend | 00:02 |
*** Penick has joined #tripleo | 00:10 | |
*** Penick has quit IRC | 00:16 | |
*** shakamunyi has quit IRC | 00:18 | |
*** Penick has joined #tripleo | 00:19 | |
*** Penick has quit IRC | 00:19 | |
openstackgerrit | Adam Gandelman proposed a change to openstack/tripleo-incubator: Allow user-specified timeout for stack creation https://review.openstack.org/95968 | 00:21 |
*** lazy_prince has joined #tripleo | 00:24 | |
*** chuckC has quit IRC | 00:24 | |
*** ddieterly has joined #tripleo | 00:25 | |
SpamapS | adam_g: so perhaps what we need is a pool approach, rather than full serialization | 00:30 |
adam_g | SpamapS, yeah, thats what im thinking | 00:31 |
adam_g | SpamapS, or a >1 node undercloud, but pooling would still be generally useful for ironic | 00:31 |
SpamapS | adam_g: though it might be best to focus on even more optimized methods than dd over iscsi | 00:32 |
SpamapS | adam_g: scatter gather would take the network load off the conductor quite a bit | 00:33 |
lifeless | multicast would be great | 00:37 |
lifeless | but I'm really having trouble modelling parallel dd causing conductor issues on a single hw node | 00:37 |
lifeless | since there's no off-node traffic to be interfered with | 00:37 |
mordred | lifeless: I really want to track down the dude I was talking to who freaked out when I mentioned multicast | 00:38 |
lifeless | mordred: please | 00:38 |
lifeless | mordred: btw we've got some discussion now happening about IPMI plane access | 00:38 |
mordred | awesome | 00:38 |
lifeless | of course, the discussion is so far limited to a freak out about it | 00:38 |
lifeless | rather than actual discussion of the policy goals etc etc | 00:38 |
mordred | well, that's step one | 00:38 |
lifeless | I'm almost willing to suggest we buy PDU's and don't use IPMI for the folk in question | 00:39 |
*** yamahata has joined #tripleo | 00:39 | |
mordred | the guy I talked to did that too - "PSHAW - OBVIOUSLY you can't do multicast in a REAL data center" | 00:39 |
mordred | with the look in his eyes which implied I was an idiot for bringing it up | 00:39 |
lifeless | yeah | 00:39 |
lifeless | thats the one | 00:39 |
mordred | I forgot that the right response is to take his name and number and send him to lifeless | 00:40 |
SpamapS | lifeless: tftp failing because we have saturated the network interface. | 00:40 |
mordred | SpamapS: that's special | 00:40 |
openstackgerrit | A change was merged to openstack/diskimage-builder: add some missing \n at end of file https://review.openstack.org/91823 | 00:40 |
openstackgerrit | A change was merged to openstack/diskimage-builder: dib-lint: ensure file finish with a new line https://review.openstack.org/91818 | 00:41 |
lifeless | hmm | 00:41 |
lifeless | fatal: unable to access 'https://git.openstack.org/openstack-infra/tripleo-ci/': Failed to connect to git.openstack.org port 443: Network is unreachable | 00:41 |
lifeless | SpamapS: I can imagine that but we didn't see that. | 00:41 |
lifeless | SpamapS: adam_g reported *conductor* failures. | 00:41 |
lifeless | SpamapS: and we saw *saucy host node* TFTP failures. Mellanox. | 00:42 |
greghaynes | adam_g: did you ever see that occur on the 4xx rack? | 00:42 |
SpamapS | Hm, thats not what I recalled. | 00:42 |
lifeless | SpamapS: nick upgraded the jump host to trusty and started deploying trusty end to end and the TFTP issue disappeared. | 00:43 |
adam_g | there were two tftp issues i saw | 00:43 |
SpamapS | Ahh, so just a slight disconnect I think. | 00:43 |
adam_g | one was the lock up of tftp xfers at boot time | 00:43 |
adam_g | which i believe is attributed to the driver update | 00:43 |
adam_g | the other was tftp connection failures from the ramdisk when trying to fetch keystone token-$foo file from conductor | 00:43 |
adam_g | i observed the latter while other strange things were happening, like i/o errors writing to iscsi devices on the conductor | 00:44 |
lifeless | adam_g: wait, what | 00:44 |
adam_g | node locking errors | 00:44 |
adam_g | rpc timeouts, etc | 00:44 |
lifeless | adam_g: the conductor doesn't implement tftp does it ? | 00:44 |
adam_g | lifeless, it runs a tftp server and puts files there. once booted, the ramdisk does tftp get of a file containing a keystone token it uses to do a curl callback to the ironic-api | 00:45 |
adam_g | s/runs/relies on | 00:45 |
lifeless | adam_g: sure, but thats not tftping from the conductor :) | 00:45 |
lifeless | adam_g: its tftping from the tftpd | 00:45 |
lifeless | adam_g: what OS was the conductor node running ? | 00:46 |
adam_g | lifeless, trusty | 00:46 |
lifeless | nuts | 00:46 |
adam_g | lifeless, im running one more overcloud deployment on the 405 rack and writing an update email | 00:47 |
adam_g | lifeless, if you'd like, you can remove the serialization and start throwing large numbers of instances at ironic | 00:47 |
lifeless | I'm going to do a spike at a 3-node undercloud | 00:47 |
lifeless | locally first | 00:48 |
adam_g | also, all of the errors i was observing were throwing the instances back into reschedule and ending up in a NoValidHost ERROR on the nova side | 00:48 |
lifeless | timeout will do that | 00:48 |
openstackgerrit | Adam Gandelman proposed a change to openstack/tripleo-incubator: Allow user-specified timeout for stack creation https://review.openstack.org/95968 | 00:53 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-incubator: Support --help as well as -h in devtest.sh. https://review.openstack.org/95982 | 00:56 |
*** noslzzp has joined #tripleo | 00:56 | |
*** BadCub01_ has quit IRC | 00:57 | |
SpamapS | adam_g: odd.. why use tftp if you already have curl? | 01:08 |
adam_g | SpamapS, asking the wrong person there | 01:08 |
adam_g | :) | 01:08 |
SpamapS | curl would have a hell of a lot better chance of working than tftp if we're overwhelming the network | 01:10 |
*** chuckC has joined #tripleo | 01:10 | |
lifeless | SpamapS: when the token thing was being added there was no http server guaranteed to exist | 01:21 |
lifeless | SpamapS: and the token file is 400 bytes, if we overwhelm things, the kernels etc are more of an issue | 01:22 |
*** eguz has quit IRC | 01:27 | |
*** weshay has quit IRC | 01:29 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-heat-templates: Add initial support for galera clustering https://review.openstack.org/83883 | 01:35 |
*** weshay has joined #tripleo | 01:37 | |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Fix ironic api port in nova element https://review.openstack.org/93078 | 01:38 |
*** nosnos has joined #tripleo | 01:48 | |
*** ddieterly has quit IRC | 01:52 | |
*** ddieterly has joined #tripleo | 01:53 | |
*** morazi has quit IRC | 01:57 | |
*** martyntaylor has quit IRC | 02:08 | |
*** lazy_prince has quit IRC | 02:13 | |
*** nati_uen_ has quit IRC | 02:13 | |
*** weshay has quit IRC | 02:24 | |
*** funzo_ has joined #tripleo | 02:31 | |
*** mkerrin1 has joined #tripleo | 02:32 | |
*** funzo has quit IRC | 02:33 | |
*** mkerrin has quit IRC | 02:33 | |
*** olaph has quit IRC | 02:47 | |
openstackgerrit | lifeless proposed a change to openstack/tripleo-incubator: Add -c support to devtest_seed.sh. https://review.openstack.org/96014 | 02:48 |
lifeless | SpamapS: up ? | 02:52 |
*** rcarrill` has joined #tripleo | 02:55 | |
*** rcarrillocruz has quit IRC | 02:58 | |
*** untriaged-bot has joined #tripleo | 03:00 | |
untriaged-bot | Untriaged bugs so far: | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1321943 | 03:00 |
uvirtbot | Launchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1318767 | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1314978 | 03:00 |
uvirtbot | Launchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1315355 | 03:00 |
uvirtbot | Launchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New] | 03:00 |
uvirtbot | Launchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1323167 | 03:00 |
uvirtbot | Launchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1319473 | 03:00 |
uvirtbot | Launchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1316675 | 03:00 |
uvirtbot | Launchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress] | 03:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1320090 | 03:00 |
*** untriaged-bot has quit IRC | 03:00 | |
uvirtbot | Launchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New] | 03:00 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-heat-templates: Add Controller scale param to merge.py https://review.openstack.org/88085 | 03:08 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 03:14 |
greghaynes | lifeless: in case you run into it - im mid-saneifying https://review.openstack.org/#/c/91177/3 | 03:17 |
*** shakamunyi has joined #tripleo | 03:19 | |
lifeless | greghaynes: I'm just looking at the bootstrap host stuff | 03:20 |
lifeless | greghaynes: which looks entirely broken right now ? | 03:20 |
lifeless | greghaynes: ahha, same stuff | 03:20 |
greghaynes | Yes :( I was fooled by seeing the green checks in my gerrit dashboard | 03:20 |
lifeless | greghaynes: so | 03:20 |
lifeless | greghaynes: it seems to me that the boot-stack element is the one to use for this | 03:20 |
lifeless | greghaynes: grep for bootstack in t-h-t | 03:20 |
greghaynes | oh, to determine what IP is bootstrap? | 03:21 |
lifeless | greghaynes: no, as the namespace | 03:21 |
lifeless | rather than boot-strap | 03:21 |
greghaynes | ah, that works | 03:22 |
greghaynes | what about the element, merge them also? | 03:22 |
lifeless | secondly, AIUI the issue with merge.py is getting the first element of a list of all the control nodes ? | 03:22 |
lifeless | greghaynes: I'm thinking merge the elements yes. | 03:22 |
*** noslzzp has quit IRC | 03:22 | |
lifeless | its not sufficiently different AFAICT | 03:22 |
greghaynes | Yep, and kind of puts our init magic in one place | 03:23 |
greghaynes | lifeless: Which issue with merge.py? (theres a couple) | 03:23 |
lifeless | greghaynes: having it spit out the one true host | 03:23 |
*** shakamunyi has quit IRC | 03:24 | |
greghaynes | oh. So the logic for picking one host was not going to be in merge.py or templates - it was just supplying "heres a list of everyone, and heres you" | 03:24 |
lifeless | greghaynes: yeah, I saw. let me quickly (har har) test something | 03:25 |
greghaynes | On a slightly related note - did you see the etcd discovery protocol I linked? | 03:26 |
greghaynes | Even if we dont etcd, seems like their design might be worth imitating - use the seed to elect a master for our master-election system in the undercloud, and so on for overcloud | 03:28 |
lifeless | I haven't read it no | 03:30 |
greghaynes | basically they have a system to use an etcd to elect an initial master for another etcd cluster | 03:31 |
*** eghobo has joined #tripleo | 03:33 | |
tchaypo | so the seed functions as a cluster with a quorum of 1? | 03:33 |
tchaypo | that seems reasonable to me - if the seed is deaded we have problems | 03:34 |
greghaynes | Well it can die after, it just doesnt have the master-election issue since its size of 1 | 03:34 |
lifeless | greghaynes: http://paste.ubuntu.com/7533785/ | 03:34 |
lifeless | greghaynes: I haven't convinced myself of its correctness by using it yet, I'm just about to do that | 03:34 |
greghaynes | ok, you can commit it if its good - thats different enough from mine I should just wash my changes | 03:35 |
greghaynes | argh, gotta run for a few, bbiab | 03:35 |
lifeless | greghaynes: If there is an equality fn in cfn we can output a bool | 03:35 |
lifeless | I think | 03:35 |
*** ramishra has joined #tripleo | 03:40 | |
*** nosnos has quit IRC | 03:47 | |
*** tzumainn has quit IRC | 03:49 | |
*** shakamunyi has joined #tripleo | 03:54 | |
lifeless | greghaynes: let me know when you are back | 03:55 |
tchaypo | lifeless: are you going to be around for the meeting tonight? I'm planning to be ready to drive it just in case you aren't.. | 03:56 |
lifeless | I hope to be | 03:57 |
lifeless | but am delighted if you want to run it | 03:57 |
StevenK | Oh, the first meeting I can actually attend | 03:57 |
*** akuznetsov has joined #tripleo | 03:57 | |
tchaypo | As long as you have your feet on the spare pedals, I'd be happy to drive with L plates for the second time in a day... | 03:57 |
StevenK | Heh heh | 03:58 |
lifeless | tchaypo: cool | 03:58 |
*** rbrady has quit IRC | 04:03 | |
lifeless | init-complete is the bane of my life | 04:03 |
lifeless | greghaynes: adam_g: was there a patch to make that issue better that I've missed? | 04:03 |
*** ramishra has quit IRC | 04:03 | |
*** rpodolyaka1 has joined #tripleo | 04:06 | |
lifeless | arggghhh | 04:12 |
openstackgerrit | Steve Kowalik proposed a change to openstack/os-cloud-config: Add logging to os_cloud_config/nodes https://review.openstack.org/96051 | 04:13 |
StevenK | lifeless: Hmm? | 04:13 |
lifeless | who moved the control plane image id *out* of the heat environment file | 04:13 |
lifeless | usability fail | 04:13 |
StevenK | git blame ? | 04:13 |
lifeless | indeed | 04:13 |
lifeless | Dan | 04:13 |
*** matsuhashi has quit IRC | 04:15 | |
lifeless | sadface at Fn::Equals being *in* the heat delivered metadata | 04:16 |
lifeless | however | 04:16 |
lifeless | I have got | 04:16 |
lifeless | current id vs selected id in two variables | 04:16 |
lifeless | great | 04:17 |
lifeless | Heat never implemented Fn::Equals | 04:17 |
lifeless | stevebaker: ^ ! | 04:17 |
lifeless | stevebaker: worth a bug ? | 04:17 |
lifeless | http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-conditions.html#d0e39905 | 04:17 |
stevebaker | lifeless: we've discussed implementing those. We even have a blueprint https://blueprints.launchpad.net/heat/+spec/intrinsics | 04:20 |
lifeless | now there is something weird with merge.py and lists that I'm going to ignore for now | 04:20 |
lifeless | but I think this will work | 04:20 |
lifeless | stevebaker: ah, ok. | 04:20 |
stevebaker | lifeless: but I'm not sure it is on anybody's immediate radar | 04:20 |
lifeless | there we go | 04:25 |
lifeless | greghaynes: | 04:25 |
lifeless | "bootstrap_nodeid": "overcloud-controller0-f3k7jgpnh2bl", | 04:25 |
lifeless | "public_interface_ip": "", | 04:25 |
lifeless | "nodeid": "overcloud-controller2-h2arrn3qicjv" | 04:25 |
*** matsuhashi has joined #tripleo | 04:26 | |
*** nosnos has joined #tripleo | 04:27 | |
*** shakamunyi has quit IRC | 04:27 | |
openstackgerrit | lifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys to the overcloud. https://review.openstack.org/96052 | 04:27 |
lifeless | I have to bounce for baby time, but I'll be back. | 04:29 |
SpamapS | lifeless: I'm here now. About to be surrounded by new Heat devs though. Wassup? | 04:38 |
*** lazy_prince has joined #tripleo | 04:50 | |
*** shakamunyi has joined #tripleo | 04:52 | |
*** shakamunyi has quit IRC | 04:54 | |
lifeless | SpamapS: paging in | 04:55 |
lifeless | SpamapS: oh yeah, was going to ask about your patch that moves the hosts calculations around | 04:56 |
lifeless | and something else I don't recall right now | 05:03 |
lifeless | SpamapS: hows the bootstrapping going ? | 05:03 |
openstackgerrit | Adam Vinsh proposed a change to openstack/tripleo-image-elements: Added dependencies to install ceilometer https://review.openstack.org/96055 | 05:06 |
*** shakamunyi has joined #tripleo | 05:11 | |
*** lokesh184 has joined #tripleo | 05:12 | |
*** dshulyak_ has joined #tripleo | 05:15 | |
*** ddieterly has quit IRC | 05:22 | |
*** ddieterly has joined #tripleo | 05:22 | |
*** akuznetsov has quit IRC | 05:23 | |
*** akuznetsov has joined #tripleo | 05:27 | |
*** shakamunyi has quit IRC | 05:40 | |
*** shakamunyi has joined #tripleo | 05:40 | |
*** rpodolyaka1 has quit IRC | 05:43 | |
greghaynes | lifeless: ah, nice | 05:46 |
greghaynes | lifeless: I think the hosts moving got replaced by nova api memoization | 05:47 |
*** rlandy has joined #tripleo | 05:47 | |
*** rcarrillocruz has joined #tripleo | 05:49 | |
*** rcarrill` has quit IRC | 05:51 | |
greghaynes | lifeless: are you working on the element changes for that heat patch or should I do it? | 05:53 |
lifeless | greghaynes: we should move the hosts anyway | 05:56 |
lifeless | greghaynes: if you're still doing stuff tonight, that would be a good thing to do | 05:56 |
* StevenK frowns at the function he just wrote. | 05:56 | |
lifeless | greghaynes: I'm just tweaking the heat patch anyhow | 05:56 |
greghaynes | ok, ill do the element bit | 05:56 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys to the overcloud. https://review.openstack.org/96052 | 05:59 |
lifeless | greghaynes: don't forget to add them to seed-config too | 05:59 |
greghaynes | yerp, just saw that comment | 05:59 |
xuhaiwei | hi, has anyone run into this situation? I build the devtest environment and ssh to the seed, undercloud and overcloud all works, but after I exit the undercloud and overcloud and wait a while (maybe a few hours), I can't ssh to the undercloud or overcloud again | 06:00 |
lifeless | xuhaiwei: what do you mean by 'quit under and overcloud' | 06:00 |
xuhaiwei | exit them, don't ssh | 06:01 |
xuhaiwei | come back to host | 06:01 |
xuhaiwei | i can still ssh seed cloud, and on seed run 'nova list', the undercloud is still active | 06:02 |
lifeless | are they real hardware or emulated? | 06:02 |
xuhaiwei | emulated | 06:02 |
lifeless | do you still have your route to the bm bridge? | 06:02 |
lifeless | can you ping the undercloud node? | 06:03 |
xuhaiwei | I can't ping undercloud and overcloud | 06:03 |
xuhaiwei | and in my route table, i can see 0.0.0.0 10.21.43.254 0.0.0.0 UG 100 0 0 brbm 10.21.40.0 0.0.0.0 255.255.252.0 U 0 0 0 brbm 192.0.2.0 192.168.122.197 255.255.255.0 UG 0 0 0 virbr0 192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0 | 06:04 |
*** rpodolyaka1 has joined #tripleo | 06:04 | |
*** rpodolyaka1 has quit IRC | 06:05 | |
xuhaiwei | this is not the first time i have hit this, i have rebuilt the devtest environment several times and every time it is the same | 06:05 |
*** eghobo has quit IRC | 06:05 | |
lifeless | if you tcpdump brbm and then try to ping, can you see the ICMP requests? | 06:06 |
xuhaiwei | but if i keep ssh in undercloud( don't leave it since ssh it for the first time), i can always stay in it | 06:06 |
xuhaiwei | $ tcpdump brbm tcpdump: no suitable device found | 06:07 |
xuhaiwei | does this mean brbm is gone? | 06:07 |
StevenK | xuhaiwei: No, you're using tcpdump wrong :-) | 06:08 |
StevenK | xuhaiwei: sudo tcpdump -i brbm -p -n | 06:08 |
xuhaiwei | StevenK: thank you :( | 06:09 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-heat-templates: Add initial support for galera clustering https://review.openstack.org/83883 | 06:15 |
lifeless | greghaynes: ^ old / bad deps removed | 06:15 |
xuhaiwei | when running tcpdump i got a screen full of messages like this | 06:16 |
*** rakesh_hs has joined #tripleo | 06:16 | |
SpamapS | lifeless: bootstrapping is going very well | 06:16 |
xuhaiwei | 15:43:42.166034 IP 10.21.41.72.22 > 10.21.42.98.33591: Flags [P.], seq 1433824:1434016, ack 145, win 216, options [nop,nop,TS val 103173044 ecr 284062690], length 192 | 06:16 |
SpamapS | lifeless: regarding moving the inputs around, stevebaker has a patch into heat to memoize the calls which should result in a much better performance and we get to keep our nicely expressed templates. | 06:17 |
lifeless | xuhaiwei: what ip address is your undercloud on ? | 06:17 |
xuhaiwei | lifeless: the host's ip? | 06:17 |
lifeless | SpamapS: it seems nicer to me to express globally global things | 06:17 |
lifeless | xuhaiwei: yes, from nova list | 06:17 |
SpamapS | lifeless: we also discussed at the summit implementing variables as a separate recursive thing from software config so that we don't have to express the same thing many times | 06:17 |
lifeless | great | 06:18 |
xuhaiwei | the undercloud's ip is 192.0.2.3 | 06:18 |
SpamapS | lifeless: basically it was a poor attempt at variables. We need actual variables. | 06:18 |
xuhaiwei | 10.21.41.72 is the host's ip | 06:18 |
lifeless | xuhaiwei: ok so your tcpdump should be tcpdump -ni brbm host 192.0.2.3 | 06:19 |
lifeless | you need promiscuous on | 06:19 |
*** dshulyak_ has quit IRC | 06:19 | |
lifeless | SpamapS: we do, I agree, but even so | 06:19 |
StevenK | lifeless: Blah, didn't know that. -i <foo> -p -n has been in my muscle memory for so long | 06:19 |
lifeless | SpamapS: the config as a global expression seems like the right place to put global expressions | 06:19 |
lifeless | StevenK: its appropriate when you're on the interface you want to dump, but we're not here :> | 06:20 |
lifeless | StevenK: if you see what I mean | 06:20 |
StevenK | Heh, right | 06:20 |
*** dkehn has quit IRC | 06:20 | |
xuhaiwei | lifeless: running tcpdump and then ping 192.0.2.3, there is no reaction to tcpdump | 06:20 |
lifeless | xuhaiwei: what is the output of 'ip route' | 06:21 |
xuhaiwei | default via 10.21.43.254 dev brbm metric 100 10.21.40.0/22 dev brbm proto kernel scope link src 10.21.41.72 192.0.2.0/24 via 192.168.122.197 dev virbr0 192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 | 06:21 |
lifeless | xuhaiwei: use a pastebin please - thats unreadable | 06:21 |
xuhaiwei | ok | 06:22 |
lifeless | also the output of ip route get 192.0.2.3 | 06:22 |
*** dkehn has joined #tripleo | 06:22 | |
lifeless | SpamapS: but it sounds like you're unhappy with that patch as it stands ? | 06:22 |
SpamapS | lifeless: it is not global | 06:22 |
lifeless | SpamapS: the config object? | 06:23 |
SpamapS | lifeless: we are using it with the controller address today, but tomorrow it will be the haproxy endpoint | 06:23 |
SpamapS | lifeless: and the next day it might be a separate mysql server | 06:23 |
xuhaiwei | lifeless:http://paste.openstack.org/ | 06:23 |
lifeless | SpamapS: I don't understand | 06:23 |
lifeless | xuhaiwei: I need the actual url, not the website :) | 06:24 |
lifeless | xuhaiwei: after you click 'paste' | 06:24 |
xuhaiwei | sorry | 06:24 |
SpamapS | lifeless: basically to support reusing the config object in multiple topologies, we have to wait until deployment time to decide what values to lookup and inject into the config. | 06:24 |
xuhaiwei | http://paste.openstack.org/show/81824/ | 06:24 |
SpamapS | lifeless: Or, we have to go back to Type: FileInclude for reusability. | 06:25 |
lifeless | SpamapS: right, so things that are per-node are inputs | 06:25 |
lifeless | SpamapS: things that are the same for all nodes using the config are directly expressable | 06:25 |
lifeless | SpamapS: or - I'm misunderstanding something ? | 06:25 |
lifeless | SpamapS: the deployment binds node-local values into the global config | 06:26 |
SpamapS | lifeless: Yes, you're misunderstanding that the config is meant to be reusable across multiple topologies. If we could compose a config, out of another config, then we would have that. But we can't. :-/ | 06:26 |
lifeless | SpamapS: at what point do the expressions in the config as-written get evaluated ? | 06:26 |
SpamapS | lifeless: when they can be evaluated according to the graph | 06:27 |
SpamapS | so as soon as all the parents are active | 06:27 |
SpamapS | but this is missing the point. If we only were ever going to have one topology.. agreed. But I'd think at some point we might want to be able to run rabbitmq and mysql on their own servers. | 06:27 |
lifeless | are they evaluated once for all the deploys-of-the-config or just once? | 06:28 |
lifeless | SpamapS: I know that, but thats a separate question AFAICT | 06:28 |
SpamapS | if we're expressing the compute node config with a reference to the controller IP for rabbitMQ .. then we cannot reuse that config definition. | 06:28 |
lifeless | SpamapS: since we're *only* talking about the scatter-gather of *all hosts* | 06:28 |
SpamapS | so anyway... the performance problem is addressed under the covers | 06:30 |
SpamapS | the expressability problem can be given more thought | 06:30 |
SpamapS | anyway, back to bootstrapping | 06:31 |
lifeless | SpamapS: see e.g. my latest patch in tht | 06:36 |
lifeless | yay p.o.o.your 500's thrill me | 06:37 |
lifeless | xuhaiwei: you did not include the 'ip route get 192.0.2.3' command output | 06:39 |
lifeless | though I would expect it to be unsurprising | 06:39 |
lifeless | xuhaiwei: now, can you ping 192.0.2.1 ? | 06:39 |
xuhaiwei | i can ping 192.0.2.1 | 06:39 |
lifeless | ok | 06:40 |
lifeless | now log into the seed | 06:40 |
lifeless | and from there try to ping 192.0.2.3 | 06:40 |
xuhaiwei | but as i ping 192.0.2.1 tcpdump still got nothing | 06:40 |
lifeless | also check in virsh list that the nodes are still running | 06:40 |
xuhaiwei | on seed , can't ping 192.0.2.3 | 06:41 |
xuhaiwei | lifeless: yes, virsh list shows all the nodes are running (under and over) | 06:42 |
lifeless | ok | 06:42 |
lifeless | so check the ip route from within the seed | 06:42 |
lifeless | and connect to the console of e.g. the undercloud and login with stack / stack to debug there | 06:42 |
xuhaiwei | lifeless: the ip route http://paste.openstack.org/show/81825/ | 06:44 |
*** e0ne has joined #tripleo | 06:44 | |
*** boris-42 has quit IRC | 06:45 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 06:45 |
dshulyak | greghaynes: hi, i'm ok with it, are you going to use heat Parameters or some kind of StructuredConfig ? | 06:45 |
lifeless | greghaynes: oh, we'll need to add this to the undercloud too | 06:46 |
greghaynes | lifeless: yerp | 06:46 |
greghaynes | dshulyak: I was thinking parameters | 06:46 |
lifeless | greghaynes: insta-review done | 06:47 |
greghaynes | insta-1 ;) | 06:47 |
*** e0ne has quit IRC | 06:48 | |
*** e0ne has joined #tripleo | 06:49 | |
* TheJulia wipes sleep from her eyes | 06:49 | |
*** e0ne has quit IRC | 06:49 | |
*** boris-42 has joined #tripleo | 06:49 | |
greghaynes | For the midnight meeting? | 06:49 |
*** e0ne has joined #tripleo | 06:50 | |
*** rdopiera has joined #tripleo | 06:50 | |
lifeless | TheJulia: *now* you need that caffeine | 06:50 |
TheJulia | yeah, 3am | 06:50 |
greghaynes | oh wow | 06:50 |
xuhaiwei | lifeless: you mean connect to the console of the undercloud? use nova get-vnc-console? | 06:50 |
TheJulia | lifeless: already got it | 06:50 |
StevenK | TheJulia: Eek, you don't need to attend meetings if they're horrible times | 06:51 |
*** e0ne has quit IRC | 06:51 | |
*** e0ne has joined #tripleo | 06:51 | |
StevenK | TheJulia: East coast of the US? | 06:52 |
TheJulia | Its still a first for me, so might as well, besides my sleep cycle could use some reverse adjustment. I was starting to wake up at 4:30-5 AM on my own anyway | 06:52 |
TheJulia | Yeah, North Carolina | 06:52 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-heat-templates: Export new bootstack keys for cluster init. https://review.openstack.org/96052 | 06:52 |
lifeless | xuhaiwei: no, virsh or virt-manager directly | 06:52 |
StevenK | Is the meeting in -alt or -meeting itself? | 06:53 |
lifeless | greghaynes: ^ should do it | 06:53 |
greghaynes | wah? | 06:53 |
StevenK | TheJulia: This will be my first meeting, since all of the previous meetings were at 5am or so | 06:54 |
lifeless | greghaynes: we're going to be consulting this in every cluster, regardless of scale :) | 06:54 |
lifeless | tchaypo: so - I will be putting C to bed for the first 10m or so | 06:54 |
lifeless | tchaypo: I hope :). | 06:54 |
devananda | StevenK: hi! https://bugs.launchpad.net/ironic/+bug/1315224 was just pointed out to me, and it looks like you filed it, so i'll ask you :) | 06:54 |
uvirtbot | Launchpad bug 1315224 in ironic "Ironic should set the node power-state to off when registering a node" [Undecided,New] | 06:54 |
lifeless | devananda: you'll ask him why you aren't asleep ? | 06:55 |
StevenK | Bwahaha | 06:55 |
devananda | that too | 06:55 |
StevenK | devananda: Ask me why I filed it? That's easy, lifeless asked me to. :-P | 06:55 |
devananda | hah | 06:55 |
*** e0ne has quit IRC | 06:56 | |
lifeless | devananda: because in a rack of unknown state, having machines running that aren't undergoing maintenance or actively deployed is a bad idea | 06:56 |
devananda | StevenK: so you (or lifeless) expect a newly-registered node to be immediately powered off | 06:56 |
devananda | lifeless: we covered that. there's a periodic task to ensure state | 06:56 |
lifeless | devananda: and we found that ironic takes quite some time to assert the power state post-registration - long enough for deploys to get broken by it | 06:56 |
devananda | gotcha | 06:57 |
lifeless | devananda: so someone put a patch into tripleo-incubator to power off nodes post registration | 06:57 |
devananda | that was my assumption based on reading the bug | 06:57 |
lifeless | devananda: NobodyCam I believe in fact | 06:57 |
devananda | yep | 06:57 |
lifeless | devananda: I believe that workarounds in incubator are a misfeature :) | 06:57 |
devananda | lifeless: you don't want to be a special case? :) | 06:57 |
lifeless | please god no | 06:57 |
devananda | *incubator to be | 06:57 |
*** eghobo has joined #tripleo | 06:58 | |
devananda | ok - thanks. i have what I wanted (an understanding of where this issue came from) | 06:58 |
devananda | enough to comment and triage it | 06:58 |
*** rcarrill` has joined #tripleo | 06:59 | |
tchaypo | Meeting time! | 07:00 |
rpodolyaka | morning all | 07:01 |
StevenK | tchaypo: Which is in which channel? | 07:01 |
tchaypo | StevenK: the usual #openstack-meeting-alt | 07:01 |
*** rcarrillocruz has quit IRC | 07:01 | |
*** lifeless changes topic to " https://etherpad.openstack.org/p/tripleo-ci-r1-trusty | tripleo-cd running preserve-ephemeral WIP patches and https://review.openstack.org/#/c/62042/ | Using OpenStack to deploy OpenStack;meetings Tuesday 1900//0700 UTC in #openstack-meeting-alt" | 07:02 | |
*** jistr has joined #tripleo | 07:05 | |
*** e0ne has joined #tripleo | 07:05 | |
*** ddieterly has quit IRC | 07:06 | |
*** mrunge has joined #tripleo | 07:06 | |
*** ddieterly has joined #tripleo | 07:06 | |
*** derekh_ has joined #tripleo | 07:08 | |
lifeless | greghaynes: what made you say 'wah?' | 07:08 |
greghaynes | lifeless: confusion, I thought you were saying I should drive meeting | 07:09 |
lifeless | SpamapS: is https://etherpad.openstack.org/p/tripleo-ci-r1-trusty up to date? | 07:09 |
lifeless | greghaynes: ah no; was saying that patch should make the uc work too | 07:09 |
*** e0ne has quit IRC | 07:10 | |
*** e0ne has joined #tripleo | 07:11 | |
*** mrunge has quit IRC | 07:11 | |
*** mrunge has joined #tripleo | 07:12 | |
*** e0ne has quit IRC | 07:13 | |
*** pblaho has joined #tripleo | 07:13 | |
*** ifarkas has joined #tripleo | 07:14 | |
*** rcarrill` is now known as rcarrillocruz | 07:15 | |
openstackgerrit | Dmitry Shulyak proposed a change to openstack/tripleo-specs: Haproxy configuration options https://review.openstack.org/94907 | 07:19 |
SpamapS | lifeless: I haven't done anything else so yes it should be up to date. | 07:19 |
*** e0ne has joined #tripleo | 07:20 | |
*** jcoufal has joined #tripleo | 07:28 | |
*** lokesh184 has quit IRC | 07:33 | |
*** lazy_prince has quit IRC | 07:34 | |
*** lazy_prince has joined #tripleo | 07:35 | |
*** e0ne has quit IRC | 07:36 | |
lifeless | neutron subnet-show is barfing on the ext-net | 07:36 |
lifeless | subnet-list isn't listing it | 07:36 |
lifeless | net-list lists it | 07:36 |
*** e0ne has joined #tripleo | 07:36 | |
*** giulivo has joined #tripleo | 07:38 | |
*** e0ne has quit IRC | 07:39 | |
*** eghobo has quit IRC | 07:44 | |
*** jprovazn has joined #tripleo | 07:46 | |
*** lokesh184 has joined #tripleo | 07:50 | |
marios | lifeless: what i mean is (less specific axes) the latest specs in advanced services (service chaining, external ports, traffic steering) are much more abstract and implementation agnostic | 07:55 |
lifeless | marios: yeah, been watching that | 07:58 |
lifeless | marios: we'll see if they get more cores and review bandwidth though :) | 07:58 |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Choose whether to deploy/update using heat. https://review.openstack.org/92344 | 07:58 |
lifeless | derekh_: so | 07:58 |
marios | lifeless: on one hand great/interesting/exciting. on other hand, there are so many other things to fix first so... as you say | 07:58 |
derekh_ | lifeless: SpamapS subnet-show works for me | 07:58 |
derekh_ | neutron subnet-show fc0673ec-b4a6-4802-8454-02d5937cd1a3 | 07:59 |
lifeless | derekh_: there are 40 active nodes other than te-broker | 07:59 |
lifeless | derekh_: none have a public IP | 07:59 |
derekh_ | lifeless: yup, which presumably is why nodepool isn't using them | 07:59 |
lifeless | 2014-05-28 08:00:10.727 2582 TRACE nova.api.openstack RuntimeError: maximum recursion depth exceeded while getting the str of an object | 08:00 |
lifeless | is getting logged in the nova-api log | 08:00 |
lifeless | in 2014-05-28 08:00:26.241 2582 TRACE nova.api.openstack File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/api.py", line 371, in _check_num_instances_quota | 08:00 |
lifeless | 2014-05-28 08:00:26.241 2582 TRACE nova.api.openstack min_count, allowed) | 08:00 |
tchaypo | TheJulia: where in the world are you? | 08:00 |
TheJulia | North Carolina | 08:01 |
tchaypo | I'm guessing western europe? | 08:01 |
tchaypo | well, I was a bit wrong | 08:01 |
tchaypo | you're handy for the mid-cycle though | 08:01 |
*** dtantsur|afk is now known as dtantsur | 08:01 | |
tchaypo | okay, I'm in a moving car, so I'm going to stop looking at the screen now | 08:02 |
TheJulia | enjoy | 08:02 |
marios | tchaypo: irc chairing like a boss | 08:03 |
derekh_ | lifeless: so that recursive _check_num_instances_quota only happens if in except exception.OverQuota as exc: | 08:04 |
derekh_ | lifeless: so its over quota ? | 08:04 |
derekh_ | or trying to go over quota | 08:06 |
derekh_ | :q | 08:06 |
* TheJulia goes back to sleep | 08:07 | |
derekh_ | nodepool could have lost track of some of the instances it started | 08:08 |
derekh_ | brb, tethering off phone and want to get onto home network | 08:09 |
*** derekh_ has quit IRC | 08:09 | |
*** derekh_ has joined #tripleo | 08:11 | |
*** matsuhashi has quit IRC | 08:14 | |
lifeless | derekh_: could be | 08:14 |
*** matsuhashi has joined #tripleo | 08:15 | |
*** viktors|afc is now known as viktors | 08:16 | |
*** matsuhashi has quit IRC | 08:17 | |
*** matsuhashi has joined #tripleo | 08:17 | |
*** andreaf has joined #tripleo | 08:18 | |
*** IvanBerezovskiy has joined #tripleo | 08:27 | |
*** lucasagomes has joined #tripleo | 08:33 | |
openstackgerrit | A change was merged to openstack/tripleo-incubator: Support --help as well as -h in devtest.sh. https://review.openstack.org/95982 | 08:42 |
*** shakamunyi has quit IRC | 08:43 | |
*** ddieterly has quit IRC | 08:49 | |
*** ddieterly has joined #tripleo | 08:50 | |
*** e0ne has joined #tripleo | 08:58 | |
openstackgerrit | Dmitry Shulyak proposed a change to openstack/tripleo-image-elements: Change stunnel priority and binding addresses https://review.openstack.org/95663 | 08:59 |
*** untriaged-bot has joined #tripleo | 09:00 | |
untriaged-bot | Untriaged bugs so far: | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1321943 | 09:00 |
uvirtbot | Launchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1318767 | 09:00 |
uvirtbot | Launchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1314978 | 09:00 |
uvirtbot | Launchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1315355 | 09:00 |
uvirtbot | Launchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1323167 | 09:00 |
uvirtbot | Launchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1319473 | 09:00 |
uvirtbot | Launchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1316675 | 09:00 |
uvirtbot | Launchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress] | 09:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1320090 | 09:00 |
uvirtbot | Launchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New] | 09:00 |
*** untriaged-bot has quit IRC | 09:00 | |
openstackgerrit | Loganathan Parthipan proposed a change to openstack/tripleo-incubator: scripts to persist a running devtest to disk https://review.openstack.org/92575 | 09:01 |
*** e0ne_ has joined #tripleo | 09:03 | |
*** pelix has joined #tripleo | 09:03 | |
*** e0ne has quit IRC | 09:06 | |
dshulyak | greghaynes: so are you still going to parametrize haproxy configuration :) ? i can add params, but i thought that it is not good | 09:12 |
*** jp_at_hp has joined #tripleo | 09:12 | |
*** lazy_prince has quit IRC | 09:19 | |
*** lokesh184 has quit IRC | 09:20 | |
*** rlandy has quit IRC | 09:21 | |
*** lokesh184 has joined #tripleo | 09:22 | |
*** martyntaylor has joined #tripleo | 09:22 | |
*** lazy_prince has joined #tripleo | 09:22 | |
*** yamahata has quit IRC | 09:22 | |
openstackgerrit | Jiri Stransky proposed a change to openstack/tripleo-specs: Tuskar multiple clouds https://review.openstack.org/96112 | 09:27 |
openstackgerrit | Lin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw https://review.openstack.org/96114 | 09:31 |
openstackgerrit | Lin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw https://review.openstack.org/96114 | 09:33 |
openstackgerrit | Lin Tan proposed a change to openstack/diskimage-builder: Correct the substitution of suffix from qcow2 to raw https://review.openstack.org/96114 | 09:34 |
*** shakamunyi has joined #tripleo | 09:40 | |
*** shakamunyi has quit IRC | 09:45 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-heat-templates: Move to software-config for the undercloud. https://review.openstack.org/93319 | 09:51 |
*** boris-42 has quit IRC | 09:52 | |
openstackgerrit | Martin Geisler proposed a change to openstack/tuskar: Remove unnecessary coding lines https://review.openstack.org/96123 | 09:52 |
openstackgerrit | Martin Geisler proposed a change to openstack/tuskar: Use Emacs-friendly file variable to set file encoding https://review.openstack.org/95886 | 09:53 |
openstackgerrit | Dmitry Shulyak proposed a change to openstack/tripleo-heat-templates: Haproxy configuration https://review.openstack.org/93554 | 09:53 |
dshulyak | greghaynes: updated haproxy configuration with params ) | 09:54 |
*** dtantsur is now known as dtantsur|lunch | 09:56 | |
*** akrivoka has joined #tripleo | 10:07 | |
*** lokesh184 has quit IRC | 10:12 | |
*** boris-42 has joined #tripleo | 10:14 | |
*** jtomasek has joined #tripleo | 10:20 | |
*** e0ne_ has quit IRC | 10:21 | |
*** e0ne has joined #tripleo | 10:22 | |
*** e0ne has quit IRC | 10:26 | |
*** matsuhashi has quit IRC | 10:48 | |
*** markmc has joined #tripleo | 10:56 | |
*** e0ne has joined #tripleo | 10:59 | |
*** e0ne has quit IRC | 11:04 | |
*** lokesh184 has joined #tripleo | 11:05 | |
*** rakesh_hs has quit IRC | 11:15 | |
*** lucasagomes is now known as lucas-hungry | 11:15 | |
*** rakesh_hs has joined #tripleo | 11:15 | |
giulivo | jprovazn, a question regarding https://bugs.launchpad.net/tripleo/+bug/1226310 | 11:19 |
uvirtbot | Launchpad bug 1226310 in tripleo "Nova bm operations fail when LIBVIRT_DEFAULT_URI not set" [Medium,Triaged] | 11:19 |
giulivo | if I wanted to submit something small to fix it | 11:19 |
giulivo | would you actually approach it as a config variable for our tripleo user, or would it make more sense to fix that in nova-bm itself? | 11:20 |
*** tzumainn has joined #tripleo | 11:21 | |
jprovazn | giulivo: ah, I forgot about this issue. Setting variable for tripleo user was NACKed before. A fix on nova side would be much better | 11:22 |
giulivo | jprovazn, ok, thanks for pointing that out :) | 11:22 |
jprovazn | np | 11:22 |
*** dtantsur|lunch is now known as dtantsur | 11:29 | |
*** e0ne has joined #tripleo | 11:32 | |
dshulyak | jprovazn: hi, are you going to restore https://review.openstack.org/#/c/61376/ ? | 11:37 |
jprovazn | dshulyak: hi, yes | 11:37 |
jprovazn | dshulyak: looks like easiest solution | 11:38 |
jprovazn | let me reopen/update it now | 11:38 |
*** e0ne_ has joined #tripleo | 11:45 | |
*** e0ne has quit IRC | 11:49 | |
*** lokesh184 has quit IRC | 11:52 | |
*** morazi has joined #tripleo | 11:55 | |
openstackgerrit | Ladislav Smola proposed a change to openstack/tripleo-image-elements: Properly enabling and restarting snmpd https://review.openstack.org/95689 | 11:56 |
*** rlandy has joined #tripleo | 11:57 | |
*** dprince has joined #tripleo | 12:10 | |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: Adding -x to keystone orc scripts https://review.openstack.org/90760 | 12:10 |
*** gcha has joined #tripleo | 12:14 | |
*** funzo_ is now known as funzo | 12:16 | |
*** pblaho has quit IRC | 12:18 | |
*** julim has joined #tripleo | 12:21 | |
*** weshay has joined #tripleo | 12:23 | |
*** lucas-hungry is now known as lucasagomes | 12:23 | |
openstackgerrit | A change was merged to openstack/tripleo-image-elements: indent using 4 spaces (3/3) https://review.openstack.org/93208 | 12:25 |
Ng | morning | 12:26 |
slagle | good morning | 12:27 |
*** martyntaylor has left #tripleo | 12:30 | |
*** rbrady has joined #tripleo | 12:31 | |
openstackgerrit | Radomir Dopieralski proposed a change to openstack/tuskar-ui: Tests are broken since Horizon started using angular-cookies https://review.openstack.org/96152 | 12:33 |
*** rakesh_hs2 has joined #tripleo | 12:33 | |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: dpkg: support pkg-map in bin/install-packages https://review.openstack.org/91601 | 12:34 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Yum: support pkg-map in bin/install-packages https://review.openstack.org/91600 | 12:34 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: opensuse: support pkg-map in bin/install-packages https://review.openstack.org/91602 | 12:34 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Update base element to make use of pkg-map https://review.openstack.org/91880 | 12:34 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Add pkg-map element. https://review.openstack.org/91598 | 12:34 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Set DISTRO_NAME in OS environment.d https://review.openstack.org/91599 | 12:34 |
*** rakesh_hs has quit IRC | 12:35 | |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Name deploy-ironic and deploy-baremetal files uniq https://review.openstack.org/95812 | 12:36 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Add top level 'tests' dir for element testing https://review.openstack.org/95813 | 12:36 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Name 01-install-bin uniquely https://review.openstack.org/95810 | 12:36 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Name 10-rhel-cloud-image uniquely https://review.openstack.org/95811 | 12:36 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Name 99-setup-first-boot uniquely https://review.openstack.org/95808 | 12:36 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Delete redhat-common 00-usr-local-bin-secure-path https://review.openstack.org/95809 | 12:36 |
*** jdob has joined #tripleo | 12:37 | |
*** lazy_prince has quit IRC | 12:37 | |
*** lazy_prince has joined #tripleo | 12:38 | |
*** andreaf has quit IRC | 12:41 | |
lxsli | greghaynes: I got a clean Jenkins on https://review.openstack.org/#/c/93041/ , turn your +2 into +2/+A please? | 12:48 |
*** nosnos has quit IRC | 12:50 | |
*** jcoufal has quit IRC | 12:56 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes https://review.openstack.org/96154 | 12:56 |
*** jcoufal has joined #tripleo | 12:57 | |
*** lazy_prince has quit IRC | 13:05 | |
*** edmund has quit IRC | 13:09 | |
*** edmund has joined #tripleo | 13:09 | |
*** ramishra has joined #tripleo | 13:15 | |
*** mrunge has quit IRC | 13:17 | |
*** ddieterly has quit IRC | 13:18 | |
openstackgerrit | A change was merged to openstack/tuskar-ui: Separate out BareMetalNode from IronicNode https://review.openstack.org/95881 | 13:21 |
*** rakesh_hs2 has quit IRC | 13:23 | |
*** rakesh_hs has joined #tripleo | 13:24 | |
*** gcha has quit IRC | 13:26 | |
openstackgerrit | Dmitry Shulyak proposed a change to openstack/tripleo-heat-templates: WIP: Haproxy configuration https://review.openstack.org/93554 | 13:27 |
openstackgerrit | Ladislav Smola proposed a change to openstack/tripleo-image-elements: Storing SNMPd credentials in Ceilometer https://review.openstack.org/90374 | 13:29 |
openstackgerrit | Ladislav Smola proposed a change to openstack/tripleo-image-elements: Storing SNMPd credentials in Ceilometer https://review.openstack.org/90374 | 13:32 |
*** ifarkas_ has joined #tripleo | 13:34 | |
*** ifarkas has quit IRC | 13:34 | |
*** ddieterly has joined #tripleo | 13:49 | |
*** andreaf has joined #tripleo | 13:52 | |
*** jcoufal has quit IRC | 13:54 | |
*** jcoufal has joined #tripleo | 13:55 | |
*** jcoufal has quit IRC | 13:56 | |
*** shakamunyi has joined #tripleo | 13:56 | |
*** jcoufal has joined #tripleo | 13:56 | |
openstackgerrit | Jan Provaznik proposed a change to openstack/tripleo-image-elements: Update openstack services to listen on stunnel connect port https://review.openstack.org/61376 | 13:57 |
*** BadCub has joined #tripleo | 14:00 | |
openstackgerrit | Ladislav Smola proposed a change to openstack/tripleo-incubator: Adding Undercloud Ceilometer config element https://review.openstack.org/94637 | 14:05 |
openstackgerrit | Ladislav Smola proposed a change to openstack/tripleo-incubator: Generating of password for SNMPd https://review.openstack.org/94838 | 14:05 |
*** matty_dubs|gone is now known as matty_dubs | 14:05 | |
*** akuznetsov has quit IRC | 14:06 | |
*** akuznetsov has joined #tripleo | 14:06 | |
*** edmund has quit IRC | 14:19 | |
*** yamahata has joined #tripleo | 14:20 | |
*** e0ne_ has quit IRC | 14:21 | |
*** e0ne has joined #tripleo | 14:21 | |
openstackgerrit | Dmitry Shulyak proposed a change to openstack/tripleo-image-elements: Change horizon binding address to local-ipv4 in haproxy case https://review.openstack.org/91089 | 14:22 |
openstackgerrit | Stuart McLaren proposed a change to openstack/tripleo-incubator: Run the overcloud with an SSL enabled public IP https://review.openstack.org/85098 | 14:30 |
*** jprovazn has quit IRC | 14:31 | |
*** jcoufal has quit IRC | 14:33 | |
openstackgerrit | A change was merged to openstack/tripleo-incubator: Properly default MysqlInnodbBufferPoolSize (overcloud) https://review.openstack.org/93041 | 14:38 |
*** eghobo has joined #tripleo | 14:39 | |
*** dtantsur is now known as dtantsur|afk | 14:39 | |
openstackgerrit | Kiall Mac Innes proposed a change to openstack/diskimage-builder: VM element: Enable serial console on Debian https://review.openstack.org/96177 | 14:40 |
*** eghobo has quit IRC | 14:42 | |
*** edmund has joined #tripleo | 14:48 | |
*** rakesh_hs has quit IRC | 14:50 | |
openstackgerrit | Alexis Lee proposed a change to openstack/tripleo-incubator: Properly default MysqlInnodbBufferPoolSize (undercloud) v2 https://review.openstack.org/94856 | 14:58 |
*** untriaged-bot has joined #tripleo | 15:00 | |
untriaged-bot | Untriaged bugs so far: | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1321943 | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1318767 | 15:00 |
uvirtbot | Launchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress] | 15:00 |
uvirtbot | Launchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1314978 | 15:00 |
uvirtbot | Launchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1315355 | 15:00 |
uvirtbot | Launchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1323167 | 15:00 |
uvirtbot | Launchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1319473 | 15:00 |
uvirtbot | Launchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1316675 | 15:00 |
uvirtbot | Launchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress] | 15:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1320090 | 15:00 |
uvirtbot | Launchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New] | 15:00 |
*** untriaged-bot has quit IRC | 15:00 | |
*** rdopiera has quit IRC | 15:03 | |
openstackgerrit | Nicholas Randon proposed a change to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes https://review.openstack.org/96154 | 15:12 |
*** shakamunyi has quit IRC | 15:16 | |
*** shakamunyi has joined #tripleo | 15:18 | |
*** noslzzp has joined #tripleo | 15:33 | |
*** pblaho has joined #tripleo | 15:35 | |
*** pblaho has quit IRC | 15:35 | |
*** pblaho has joined #tripleo | 15:35 | |
openstackgerrit | A change was merged to openstack/tripleo-incubator: Ironic "lock already held" when powering off nodes https://review.openstack.org/96154 | 15:47 |
*** e0ne has quit IRC | 15:47 | |
*** akrivoka has quit IRC | 15:47 | |
*** e0ne has joined #tripleo | 15:48 | |
*** e0ne_ has joined #tripleo | 15:49 | |
*** jistr has quit IRC | 15:50 | |
*** akrivoka has joined #tripleo | 15:50 | |
*** e0ne has quit IRC | 15:52 | |
*** eghobo has joined #tripleo | 15:54 | |
*** derekh_ has quit IRC | 16:00 | |
*** derekh_ has joined #tripleo | 16:01 | |
*** cwolferh has joined #tripleo | 16:03 | |
openstackgerrit | Ben Nemec proposed a change to openstack/diskimage-builder: Factor out error behavior in dib-lint https://review.openstack.org/95850 | 16:05 |
derekh_ | lifeless: R1 is back in use now, check back to http://goodsquishy.com/downloads/s_tripleo-jobs.html later to check pass rates and know how its doing | 16:07 |
*** ifarkas_ has quit IRC | 16:09 | |
*** ifarkas has joined #tripleo | 16:13 | |
*** IvanBerezovskiy has left #tripleo | 16:15 | |
*** matty_dubs is now known as matty_dubs|lunch | 16:17 | |
*** e0ne_ has quit IRC | 16:19 | |
*** e0ne has joined #tripleo | 16:19 | |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: dpkg: support pkg-map in bin/install-packages https://review.openstack.org/91601 | 16:22 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Yum: support pkg-map in bin/install-packages https://review.openstack.org/91600 | 16:22 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: opensuse: support pkg-map in bin/install-packages https://review.openstack.org/91602 | 16:22 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Update base element to make use of pkg-map https://review.openstack.org/91880 | 16:22 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Add pkg-map element. https://review.openstack.org/91598 | 16:22 |
openstackgerrit | Dan Prince proposed a change to openstack/diskimage-builder: Set DISTRO_NAME in OS environment.d https://review.openstack.org/91599 | 16:22 |
*** dprince has quit IRC | 16:22 | |
*** derekh_ has quit IRC | 16:23 | |
*** e0ne has quit IRC | 16:24 | |
*** dshulyak_ has joined #tripleo | 16:25 | |
*** viktors is now known as viktors|afk | 16:26 | |
*** akrivoka has quit IRC | 16:30 | |
*** hashar has joined #tripleo | 16:31 | |
openstackgerrit | James Slagle proposed a change to openstack/tripleo-specs: Varying the deploy cloud hypervisor type https://review.openstack.org/94586 | 16:31 |
*** jdob has quit IRC | 16:33 | |
*** fandi has joined #tripleo | 16:34 | |
*** noslzzp has quit IRC | 16:40 | |
*** pblaho has quit IRC | 16:44 | |
*** rpodolyaka1 has joined #tripleo | 16:47 | |
*** rpodolyaka1 has quit IRC | 16:50 | |
openstackgerrit | Alexis Lee proposed a change to openstack/tripleo-specs: Promote HEAT_ENV https://review.openstack.org/94910 | 16:52 |
*** noslzzp has joined #tripleo | 16:53 | |
*** rpodolyaka1 has joined #tripleo | 16:53 | |
*** eghobo has quit IRC | 16:56 | |
*** nati_ueno has joined #tripleo | 16:57 | |
*** eghobo has joined #tripleo | 16:59 | |
*** BadCub has quit IRC | 17:00 | |
*** matty_dubs|lunch is now known as matty_dubs | 17:01 | |
*** rdopiera has joined #tripleo | 17:02 | |
*** yamahata has quit IRC | 17:03 | |
*** andreaf has quit IRC | 17:06 | |
*** andreaf has joined #tripleo | 17:10 | |
*** andreaf has quit IRC | 17:10 | |
*** ifarkas has quit IRC | 17:10 | |
*** eghobo has quit IRC | 17:11 | |
*** hashar has quit IRC | 17:12 | |
*** andreaf has joined #tripleo | 17:12 | |
*** lucasagomes is now known as lucas-dinner | 17:21 | |
*** lparth_ has quit IRC | 17:27 | |
*** jdob has joined #tripleo | 17:28 | |
*** lparth has joined #tripleo | 17:28 | |
vinsh | mordred, I was watching this movie "the 5th estate" about wiki links... and the whole time, I kept thinking of you as Julian Assange.. some how.. :D | 17:28 |
vinsh | *wikileaks | 17:28 |
vinsh | kinda sorta.. but not really. | 17:29 |
mordred | vinsh: I may or may not be the same person | 17:33 |
vinsh | explains a lot. | 17:33 |
lifeless | can I get two +2's and a +A on https://review.openstack.org/#/c/93848/ please? | 17:45 |
lifeless | critical path for any non-trivial deploy. | 17:45 |
greghaynes | oh right, that bug | 17:45 |
lifeless | bnemec: / SpamapS: / greghaynes ... | 17:45 |
greghaynes | once I finish epic battle with EEM | 17:46 |
lifeless | see you in 2015 | 17:46 |
bnemec | lifeless: Ick. Do we really want 24 hour tokens in the overcloud too? | 17:51 |
bnemec | Maybe a follow-up patch to make it configurable? | 17:51 |
greghaynes | This should be config-passthrough-able if you really want to configure it | 17:52 |
greghaynes | so probably doesnt need special config magic | 17:52 |
greghaynes | I cant find any documentation of what the default val is there :/ | 17:54 |
bnemec | Maybe I'm overthinking it and 24 hour tokens aren't a problem. It's just that hard-coding it is going to set it in all Keystones, so I want to at least consider the implications. | 17:54 |
bnemec | greghaynes: https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample#L1270 | 17:55 |
bnemec | Which matches the behavior mentioned in the bug. | 17:55 |
greghaynes | Yea, its there... doesnt say what it is by default though | 17:55 |
bnemec | greghaynes: That value is the default. The config generator includes the default value commented out. | 17:56 |
greghaynes | sweet | 17:56 |
*** dtantsur|afk is now known as dtantsur | 17:56 | |
lifeless | bnemec: so this is a problem in the overcloud too | 17:58 |
lifeless | bnemec: heat there will also fail in the same way if any stack operation takes > token time | 17:58 |
lifeless | bnemec: so we have to fix the underlying bug IMO, this is a stopgap. | 17:58 |
greghaynes | We were having issues with the keystone db getting too large though, right? | 17:59 |
greghaynes | and looks like this will make it grow by a factor of ~20 | 17:59 |
lifeless | AIUI the issues have been when the gc doesn't work | 18:00 |
bnemec | Yeah, although I think it was the expired token table that was causing issues so this wouldn't have much of an effect. | 18:00 |
greghaynes | Maybe the thing to do is run this on one of our actual racks for a day | 18:01 |
lifeless | this used to be the keystone default, if that helps | 18:04 |
greghaynes | ah, that explains our odd commented out line | 18:04 |
greghaynes | im +2 then | 18:04 |
* SpamapS awakens .. oh jet lag... you fun / infuriating character you | 18:06 | |
greghaynes | Youve almost made it to AUS time | 18:06 |
lifeless | SpamapS: home again? | 18:06 |
SpamapS | lifeless: no 2 more days here | 18:12 |
SpamapS | lifeless: re 24 hour tokens.. that is... a bad idea. | 18:12 |
SpamapS | lifeless: we'll completely destroy performance in a very short time. | 18:13 |
SpamapS | lifeless: we should pile on to help shardy fix it. | 18:13 |
lifeless | SpamapS: I agree! | 18:18 |
lifeless | SpamapS: would that be a reasonable bootstrappy thing ? | 18:18 |
lifeless | SpamapS: or is it unrelated to the big refactor? | 18:18 |
SpamapS | lifeless: so with 1 hour tokens, we see the token growing to GB already. Selects to delete users blow up even a large sized buffer pool. | 18:19 |
SpamapS | lifeless: making it 24x longer ... will make it 24x worse. | 18:19 |
greghaynes | ah, I thought it was just a non gc issue when we had the db size explosion :/ | 18:20 |
lifeless | SpamapS: are you saying there is no short term solution and we just have to tell people 'you cannot use heat for stacks > 1 hour in deploy time' ? | 18:20 |
lifeless | SpamapS: I had the impression from the summit sprint times that this was an ok interim step. | 18:20 |
lifeless | SpamapS: until a) we get the dogpile keystone config ratified by morganfainberg and co, and / or heat fixed. | 18:21 |
SpamapS | lifeless: it will snowball on itself. | 18:22 |
SpamapS | lifeless: I think we can maybe do 2 hour expire times. | 18:22 |
SpamapS | lifeless: but that will make stack deletes much worse. | 18:22 |
greghaynes | Have we set this in the 405 rack? I thought our overcloud deploys there were > 1hr | 18:23 |
SpamapS | Yeah I think 2 hours is fine. | 18:23 |
lifeless | we set it in the 405 and 306 racks AFAIK | 18:23 |
*** rpodolyaka1 has quit IRC | 18:23 | |
lifeless | otherwise we wouldn't be able to deploy anything with the reduced performance of serialised ironic | 18:23 |
* morganfainberg apologizes to everyone for not getting token stuff better yet | 18:23 | |
*** nati_ueno has quit IRC | 18:24 | |
morganfainberg | the real win will be https://review.openstack.org/#/c/95976/ and we're talking about creating a session token (tgt style) that can be refreshed and reused to help w/ this type of issue (unscoped) | 18:24 |
SpamapS | lifeless: but we didn't set 86400 right? | 18:25 |
*** ramishra has quit IRC | 18:25 | |
lifeless | SpamapS: we set 86400 | 18:25 |
lifeless | SpamapS: using this patch | 18:26 |
lifeless | SpamapS: and its been used there for the last 1.5 weeks | 18:26 |
SpamapS | morganfainberg: yesssssssss non-persistent tokens will be fantastic. | 18:26 |
SpamapS | lifeless: but is that cloud used in any kind of ongoing basis, or torn down and re-deployed a lot? | 18:26 |
lifeless | SpamapS: I don't know if adam_g and greghaynes have been doing end-to-end runs or repeated-oc-runs | 18:27 |
lifeless | if the latter, then ongoing | 18:27 |
greghaynes | We can just leave one up for >24hr and do some stack-updates | 18:27 |
morganfainberg | SpamapS, well, that is on the slate for Juno 100% even if i have to work nights and weekends through RC to get it done. | 18:27 |
SpamapS | morganfainberg: heh, a common thread for the magical tasks we have ;) | 18:28 |
SpamapS | lifeless: well perhaps the urgency is not as high as I thought then. :p | 18:28 |
lifeless | SpamapS: do you mean the severity of impact? | 18:28 |
lifeless | SpamapS: I think its urgent to fix the ability to deploy a rack | 18:29 |
lifeless | SpamapS: I think its urgent to fix heat to do $whatever shardy says is needed here | 18:29 |
*** nati_ueno has joined #tripleo | 18:30 | |
SpamapS | lifeless: I just mean that I am pretty sure 1 day expiring tokens will be a workaround with many other consequences. | 18:30 |
SpamapS | But perhaps I'm wrong, and the token table will remain manageable even in a moderately busy cloud. | 18:31 |
*** e0ne has joined #tripleo | 18:32 | |
lifeless | SpamapS: so here are our options today AIUI | 18:35 |
lifeless | - do nothing: Ironic + adam's serialisation patch + steve's heat patch for API memoisation can deploy 5-10 nodes. | 18:35 |
lifeless | - set a 2 hour limit: 20-30ish nodes | 18:36 |
lifeless | - set a 3 hours limit: 30 reliably I suspect | 18:36 |
lifeless | - set the limit only in the undercloud | 18:36 |
lifeless | the undercloud has a smaller/identical control plane than the overcloud at the moment so that doesn't make a huge amount of sense to me but - what do you think? | 18:37 |
SpamapS | lifeless: I'd be interested to hear what size the token table ends up at after a deploy completes on the undercloud. | 18:38 |
adam_g | lifeless, FYI, im testing right now across both racks without serialization to see if i can reproduce the issues i was seeing. it doesn't make sense you guys didn't hit the same issues but i was plagued by it all last week. | 18:38 |
morganfainberg | SpamapS, lifeless, i could help you run Redis (which iirc will work better than the memcache token backend) but it all depends on token churn. | 18:39 |
lifeless | adam_g: / greghaynes: you happen to have a oc deploy thats completed/completing and not torn down? can we gather that data for SpamapS ? | 18:40 |
morganfainberg | if the token churn is low-enough, it shouldn't be too bad to do a db token cleanup on a cron as a stop-gap (keystone-manage something-something-flush i think) | 18:40 |
adam_g | lifeless, i have one up right now im testing concurrent dd's | 18:40 |
lifeless | adam_g: ok, lets check token size post deploy | 18:40 |
adam_g | lifeless, post-deploy of the heat stack? | 18:40 |
lifeless | adam_g: so we *did* see lots of flaky hardware. And it may be the serialisation takes the edge off of that | 18:40 |
lifeless | adam_g: yeah, SpamapS wants to see how big the tokens table becomes | 18:41 |
adam_g | lifeless, yeah.. either way, configurable concurrency of deployments would be a valuable knob even if we're not hitting an i/o limit on 30 nodes | 18:41 |
lifeless | adam_g: when we investigated why specific nodes failed to deploy, it was the same nodes repeatedly, which is why we decided 'its hardware' | 18:42 |
*** eghobo has joined #tripleo | 18:43 | |
SpamapS | morganfainberg: we already do token cleanup in cron | 18:44 |
SpamapS | adam_g: /mnt/state/var/lib/mysql/keystone/token.ibd should be the max size the token table has reached. | 18:46 |
adam_g | SpamapS, deploying the stack now, one min | 18:46 |
morganfainberg | SpamapS, ok you're probably ok increasing token TTL as a work around for now. But, i wouldn't go over ~5h life. | 18:47 |
SpamapS | morganfainberg: yeah maybe 5h will be a happy medium. | 18:47 |
morganfainberg | SpamapS, previous job had a customer churning tokens like mad (~100k / day) still took till about 20mil token rows to see real nasty nasty issues. | 18:48 |
*** cwolferh has quit IRC | 18:49 | |
SpamapS | morganfainberg: thats assuming you have a nicely tuned db server. :) | 18:49 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat token handling brain-damage. https://review.openstack.org/93848 | 18:49 |
morganfainberg | eh, percona, 3 nodes, minimal tuning | 18:49 |
morganfainberg | but a glut of ram on those boxes | 18:50 |
openstackgerrit | Jon-Paul Sullivan (jp_at_hp) proposed a change to openstack/tripleo-image-elements: Start the cloud-init nic before cloud-init https://review.openstack.org/96221 | 18:50 |
bnemec | lifeless: Did you mean to make it only 4 hours? | 18:51 |
SpamapS | morganfainberg: yeah, in theory our current target for controllers would be able to have a ton of RAM allocated to the buffer pool. | 18:51 |
jerryz | lifeless: what is the cpu_allocation_ratio in your test cloud? and ram_allocation_ratio? | 18:52 |
*** hashar has joined #tripleo | 18:52 | |
morganfainberg | SpamapS, i've had success running in that config | 18:52 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat token handling brain-damage. https://review.openstack.org/93848 | 18:52 |
lifeless | bnemec: yes, but I flumped the commit messages | 18:52 |
lifeless | jerryz: which test cloud ? | 18:52 |
jerryz | lifeless: tripleo-test-cloud or rh1 | 18:54 |
lifeless | jerryz: ah, the ones the jenkins CI slaves run in | 18:54 |
bnemec | lifeless: Okay, +2. | 18:54 |
lifeless | jerryz: they are stock tripleo kvm clouds, with no tweaks | 18:54 |
*** cwolferh has joined #tripleo | 18:55 | |
lifeless | jerryz: so cpu_allocation_ratio unset, and ram_allocation_ratio set to 1.0 | 18:55 |
jerryz | lifeless: let me ask in another way, the max servers vs physical cpu cores? | 18:56 |
jerryz | lifeless: according to current usage | 18:57 |
*** shardy is now known as shardy_afk | 18:57 | |
lifeless | I think you are asking how many physical cores in use vs virtual cores in use | 18:57 |
jerryz | lifeless: yes | 18:57 |
lifeless | we run one virtual core per physical core | 18:57 |
lifeless | since slow CI is not good | 18:58 |
jerryz | lifeless: in current usage model, what is the peak number of used nodes? number of concurrent patches tested | 19:00 |
*** jp_at_hp has quit IRC | 19:00 | |
*** matty_dubs is now known as matty_dubs|gone | 19:01 | |
jerryz | lifeless: because now we have some difficulty finding enough resource to guarantee a 1:1 ratio for cpu. that's why i am asking | 19:01 |
lifeless | jerryz: we're running 123 VMs with 456 vcpus at the moment, and 1.5TB of memory in use and 2.5TB of local disk storage. | 19:04 |
lifeless | jerryz: in the HP1 region | 19:04 |
lifeless | jerryz: its probably ok to run a smaller region at 1:1 rather than overcommitting. | 19:04 |
lifeless | jerryz: note that those stats don't include the testenvs used to emulate baremetal; we have 10 of those machines at the moment, 24 cores each, fully deployed | 19:05 |
lifeless | I think we overcommit there slightly because the usage pattern of emulated baremetal has a lot of quiet periods | 19:06 |
*** e0ne has quit IRC | 19:10 | |
*** e0ne has joined #tripleo | 19:10 | |
jerryz | lifeless: i see. thanks for the info. how many of 123 vms are visible to nodepool | 19:11 |
*** dtantsur is now known as dtantsur|afk | 19:12 | |
lifeless | actually those stats are out - nova bug :(. | 19:12 |
lifeless | 45 nodepool vms at the moment, I think | 19:12 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-image-elements: Workaround Heat not handling token expiry https://review.openstack.org/93848 | 19:13 |
lifeless | better message there ^ | 19:14 |
*** e0ne has quit IRC | 19:14 | |
*** e0ne_ has joined #tripleo | 19:14 | |
jerryz | lifeless: ok. we will start from a small region first. about ssh jump host, nodepool's sshclient needs some work to use openssh config. jenkins also does not honor it. the link you posted in the spec about prefix suffix command is no longer working. | 19:17 |
lifeless | jerryz: well, we'll need to work through those issues :) | 19:18 |
*** chuckC has quit IRC | 19:19 | |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/os-apply-config: Updated from global requirements https://review.openstack.org/96233 | 19:19 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/os-cloud-config: Updated from global requirements https://review.openstack.org/93253 | 19:19 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/os-collect-config: Updated from global requirements https://review.openstack.org/96234 | 19:19 |
lifeless | bnemec: ^ I made the commit message less derogatory; could I get the +2 back ? Thanks! https://review.openstack.org/93848 | 19:19 |
openstackgerrit | OpenStack Proposal Bot proposed a change to openstack/os-refresh-config: Updated from global requirements https://review.openstack.org/96235 | 19:19 |
*** akuznetsov has quit IRC | 19:20 | |
lifeless | greghaynes: did I get a +2 from you as well ? | 19:20 |
jerryz | lifeless: out to lunch, talk to you later. thanks! | 19:20 |
bnemec | lifeless: It was a trivial rebase so my +2 stuck. | 19:20 |
lifeless | bnemec: ah cool | 19:20 |
greghaynes | adam_g: The build with the keystone timeout change is still going? | 19:21 |
*** e0ne_ has quit IRC | 19:21 | |
greghaynes | I think ill wait until we can poke at it | 19:22 |
lifeless | greghaynes: adam_g is building with parallel ironic; keystone unaltered (but all the builds this last week have had the keystone timeout change) | 19:22 |
adam_g | greghaynes, keystone timeout change? | 19:22 |
greghaynes | oh | 19:22 |
*** e0ne has joined #tripleo | 19:22 | |
greghaynes | conversation overload | 19:22 |
greghaynes | ok, and the last week thing - was that at 24hr or 4? | 19:22 |
greghaynes | either way actually | 19:23 |
greghaynes | should be fine - I know we had stacks up for over 4hr | 19:23 |
*** rpodolyaka1 has joined #tripleo | 19:23 | |
greghaynes | what actually does our partitioning for use-ephemeral? | 19:25 |
lifeless | the way its meant to work is this | 19:25 |
lifeless | flavour with ephemeral in it | 19:25 |
lifeless | -> nova instance metadata | 19:25 |
lifeless | -> nova ironic driver | 19:25 |
lifeless | -> ironic api | 19:26 |
lifeless | -> ironic conductor then does the partition table and mkfs for it | 19:26 |
greghaynes | ok, so theres supposed to be some logic in ironic about how to size the partitions? | 19:27 |
*** pelix has quit IRC | 19:28 | |
greghaynes | oh no, we pass that in | 19:28 |
greghaynes | ah ok, its just all set in the flavor | 19:29 |
lifeless | right via the flavour | 19:29 |
greghaynes | *lightbulb* | 19:29 |
lifeless | flavours are cloud operator tools to describe available machine configurations to tenants | 19:30 |
lifeless | one reason its done by the operator is to prevent bad-packing dos attacks | 19:31 |
lifeless | though whether thats really a risk is arguable IMO | 19:32 |
greghaynes | hah, ok | 19:32 |
*** nati_ueno has quit IRC | 19:32 | |
greghaynes | adam_g: Do you grok what Ng was saying about preserve_ephemeral in ironic not actually provisioning an ephemeral partition? | 19:36 |
lifeless | SpamapS: so I think a question I had for you got lost | 19:36 |
lifeless | SpamapS: the token thing in heat | 19:36 |
lifeless | SpamapS: is that something the team you're bootstrapping would consider in-remit ? | 19:36 |
adam_g | greghaynes, i believe the gist is that we're ending up with a single partition (/dev/sda2 mounted at /). all stateful stuff ends up in /mnt/state/ on the same partition. | 19:38 |
greghaynes | Yep, but is this a feature change from nova-bm? | 19:38 |
lifeless | its a bit of an annoying fallback mode | 19:38 |
lifeless | greghaynes: yes | 19:38 |
lifeless | greghaynes: in nova-bm we get a separate partition that can survive rebuilds | 19:39 |
greghaynes | What do we get in ironic? | 19:39 |
SpamapS | lifeless: Not sure. | 19:40 |
lifeless | greghaynes: well it works for me | 19:40 |
lifeless | ├─sda1 8:1 0 30G 0 part /mnt | 19:40 |
lifeless | ├─sda2 8:2 0 1023.5K 0 part | 19:40 |
lifeless | └─sda3 8:3 0 10G 0 part / | 19:40 |
*** rpodolyaka1 has quit IRC | 19:40 | |
lifeless | oh | 19:40 |
lifeless | I may have used nova-bm, nuts | 19:41 |
greghaynes | cached -env.json? | 19:41 |
lifeless | greghaynes: the -env doesn't affect the disk image | 19:41 |
greghaynes | true | 19:41 |
lifeless | time to build new images | 19:42 |
SpamapS | did we get a table size yet btw? | 19:42 |
adam_g | SpamapS, on its way | 19:42 |
lifeless | SpamapS: re (not sure)- so perhaps run it up the flagpole? It might be a good way to get known to the heat cores | 19:43 |
lifeless | SpamapS: and its certainly critical | 19:43 |
lifeless | SpamapS: as well as something the long term design needs IMO - can't react arbitrary times later without it fixed | 19:44 |
SpamapS | lifeless: shardy is already 50% done.. so it may not be the best thing to tackle. | 19:44 |
lifeless | SpamapS: didn't realise | 19:44 |
lifeless | nobudy tells me nudding | 19:44 |
SpamapS | lifeless: yeah, it's a straight forward issue. We just need tests. | 19:44 |
SpamapS | https://review.openstack.org/96222 | 19:45 |
lifeless | how about a tempest test? | 19:45 |
lifeless | set the keystone token expiry to 2 minutes | 19:45 |
lifeless | then do stuff | 19:45 |
lifeless | I'm willing to bet a *lot* of openstack code will fall over in a heap :) | 19:45 |
SpamapS | True | 19:46 |
morganfainberg | lifeless, never set below 5minutes :P accepted clock-skew window (but i don't disagree) | 19:46 |
morganfainberg | in fact... we should probably make the minimum 300 seconds... | 19:47 |
SpamapS | morganfainberg: in a testing situation, you'd be on the same box | 19:48 |
SpamapS | morganfainberg: well in devstack-gate anyway | 19:48 |
morganfainberg | SpamapS, sure, but doesn't mean any real deployment could ever really support it, iirc | 19:48 |
SpamapS | Another option is to introduce time warping into devstack-gate and tempest by mucking with clocks. | 19:49 |
lifeless | morganfainberg: so the question is 'how can we do integration tests that make common operations which are occasionally slow... exceed token timeout time' | 19:49 |
lifeless | morganfainberg: in order to test that token renewal (where appropriate) is working | 19:50 |
morganfainberg | lifeless, right. sleep? :P i mean no no no. | 19:50 |
lifeless | morganfainberg: lol, in the gate man | 19:50 |
mordred | oh yes. please GOD add a bunch of sleep calls into the tempest gate | 19:50 |
morganfainberg | lifeless, *snicker* | 19:50 |
morganfainberg | lifeless, see mordred likes this plan! :P | 19:51 |
morganfainberg | >.> | 19:51 |
mordred | morganfainberg: that should be your clue that it's very very dangerous | 19:51 |
morganfainberg | mordred, oh good! i have a metric to go by! | 19:51 |
giulivo | guys, wanted to ask for your opinion about this bug https://bugs.launchpad.net/tripleo/+bug/1226310 | 19:51 |
uvirtbot | Launchpad bug 1226310 in tripleo "Nova bm operations fail when LIBVIRT_DEFAULT_URI not set" [Medium,Triaged] | 19:51 |
morganfainberg | lifeless, i could see a benefit to allowing for requesting an initial token with less time-to-live than the maximum | 19:52 |
giulivo | basically people @nova are so and so about introducing anything new into nova-bm, so even if I manage to land the default_uri config setting in ironic, we can't be sure it will be in nova-bm | 19:52 |
morganfainberg | lifeless, immediately, probably setting the value silly low and watching things explode :( | 19:52 |
*** markmc has quit IRC | 19:52 | |
giulivo | I wonder if that is still worth trying or if we should instead try to just set the env variable for the tripleo user in something like .bashrc ? | 19:53 |
lifeless | giulivo: we should fix things where the fix belongs | 19:54 |
morganfainberg | lifeless, oh i bet you could do this by issuing a delete on the active token | 19:54 |
lifeless | giulivo: ironic will have the same issue | 19:54 |
SpamapS | mordred: so instead of sleep, what we need instead is just a way to warp time. I bet there's already an LD_PRELOAD that lets you do that. | 19:55 |
morganfainberg | lifeless, revoke the token. auth_token should respond the same way if the token is expired or revoked | 19:55 |
lifeless | giulivo: so I'd start by fixing it in Ironic then you can discuss 'backporting' the feature to nova-bm | 19:55 |
giulivo | lifeless, got it, thanks | 19:56 |
morganfainberg | lifeless, using v2.0 DELETE /v2.0/tokens/<token_id> v3 is ... DELETE /v3/auth/tokens with x-subject-token header as the token to be revoked iirc | 19:56 |
morganfainberg | lifeless, but that should solve the immediate need (slightly different workings under the hood) | 19:56 |
*** shardy_afk is now known as shardy | 19:58 | |
lifeless | morganfainberg: there we go :) nice | 20:00 |
lifeless | SpamapS: ^ and thus we can get a tempest test | 20:00 |
lifeless | giulivo: really the bug is that Fedora has very strange defaults, but thats a different discussion :) | 20:00 |
lifeless | giulivo: the rh folk have already tried to get it addressed AIUI | 20:00 |
giulivo | lifeless, well no I don't agree there... why shall it default to qemu:///system when issued by a normal user? | 20:02 |
giulivo | and also, why shall the virtual_power rely on a particular configuration of the libvirt client? | 20:02 |
lifeless | giulivo: it already relies on a particular configuration in that the vms have to be manually created | 20:03 |
*** chuckC has joined #tripleo | 20:03 | |
lifeless | giulivo: but sure | 20:04 |
giulivo | lifeless, indeed the problem is exactly that we do not rely on the default setting when creating the VMs, we create them via our own DEFAULT_URI setting, but nova-bm doesn't have it set | 20:04 |
lifeless | giulivo: in terms of strange defaults and a case for that - fedora is primarily a desktop OS, but the default doesn't use hardware accelerated hypervisors | 20:05 |
lifeless | giulivo: this is analogous to not using hardware accelerated graphics even if the hardware has it | 20:05 |
lifeless | giulivo: Personally, I think thats very strange. | 20:05 |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 20:05 |
*** nati_ueno has joined #tripleo | 20:06 | |
giulivo | lifeless, wait if I was to follow up on that, it would be off topic ... I won't get into the trap :P | 20:06 |
lifeless | I don't think we worry too much about off-topic here :) | 20:07 |
lifeless | anyhoo | 20:07 |
lifeless | ironic first ;) | 20:07 |
giulivo | so anyway, regarding the fedora default, it really isn't a fedora default either.... it is libvirt default, fedora just lets the developer pick their default | 20:07 |
giulivo | but see, I don't question ubuntu decision of manually changing it to qemu:///system I'm just not sure this can be blamed on the distro really | 20:08 |
*** noslzzp has quit IRC | 20:08 | |
lifeless | hmm, Ubuntu (and I believe debian) have it as qemu:///system | 20:08 |
giulivo | yeah but it is manually set in the libvirt.conf file | 20:08 |
giulivo | the global libvirt.conf | 20:09 |
*** rlandy has quit IRC | 20:09 | |
lifeless | giulivo: not on my one | 20:09 |
lifeless | giulivo: /etc/libvirt/libvirt.conf is all comments for me | 20:09 |
giulivo | so that means, unless you have DEFAULT_URI set, you are using qemu:///session I think | 20:09 |
*** rpodolyaka1 has joined #tripleo | 20:10 | |
*** nati_ueno has quit IRC | 20:11 | |
lifeless | however it works, on Ubuntu, I end up making qemu:///system and it all just works | 20:11 |
lifeless | virt-manager with a connection to qemu:///system, for instance, shows the same vms | 20:11 |
openstackgerrit | James Slagle proposed a change to openstack/tripleo-specs: Varying the deploy cloud hypervisor type https://review.openstack.org/94586 | 20:11 |
lifeless | perhaps session==system? here | 20:11 |
giulivo | huh, well it shouldn't | 20:11 |
giulivo | but at this point, it is worth investigating further on ubuntu/debian and see if that is behaving differently for some other reason | 20:12 |
giulivo | will do :) | 20:12 |
*** dshulyak_ has quit IRC | 20:13 | |
openstackgerrit | Gregory Haynes proposed a change to openstack/tripleo-image-elements: Add os-is-bootstrap-host element and script https://review.openstack.org/86435 | 20:14 |
SpamapS | lifeless: cool. I will lay it out as a low hanging fruit when we're in the office in 9 hours ;) | 20:14 |
*** rpodolyaka1 has quit IRC | 20:14 | |
lifeless | adam_g: did ironic fail ? | 20:15 |
adam_g | lifeless, on the concurrent image writeouts? not in the way i was seeing last week, but there has been a recurring iscsiadm error | 20:15 |
Ng | greghaynes: a fresh deployment from completely blank hardware with ironic, will pass preserve_ephemeral and that causes ironic to skip even trying to mkfs the ephemeral partition | 20:16 |
adam_g | lifeless, see https://bugs.launchpad.net/ironic/+bug/1321504 | 20:16 |
uvirtbot | Launchpad bug 1321504 in ironic "Launching ~30 nova instances /w Ironic + pxe_ipmitool results in many ERRORs" [Medium,In progress] | 20:16 |
adam_g | my OC deploys have been delayed b/c a fix for the trusty cloud-init hang was not pulled in, should be up shortly | 20:16 |
lifeless | Ng: is there a bug for that in Ironic now ? | 20:17 |
Ng | greghaynes: see around line 231 of ironic/drivers/modules/deploy_utils.py | 20:17 |
adam_g | oh actually, one is done | 20:17 |
Ng | lifeless: not yet, I wanted to discuss it a little. I'm not 100% convinced that it's a bug, we're asking it to not touch the ephemeral partition, it seems unreasonable to expect it to be magical given that constraint | 20:17 |
adam_g | SpamapS, 188M May 28 20:18 token.ibd | 20:17 |
Ng | if it does some dumb "is ephemeral and ext fs" it will happily blow away some weird fs someone chooses to use | 20:18 |
SpamapS | bingo | 20:18 |
SpamapS | adam_g: lovely | 20:18 |
*** akrivoka has joined #tripleo | 20:18 | |
adam_g | SpamapS, contains 9117 rows | 20:18 |
SpamapS | adam_g: so I think 4h is a nice compromise while we work through the re-auth issue in Heat | 20:18 |
adam_g | SpamapS, this is a slightly smaller 24 node clode | 20:18 |
adam_g | clode? | 20:19 |
SpamapS | adam_g: sure, but I'm ok with 10x the size | 20:19 |
SpamapS | clode is a scottish cloud | 20:19 |
Ng | lifeless: my thinking was that we'd be better off having something like an o-r-c job that checks for a filesystem and makes it otherwise, since that's much easier for operators to discover and tune/override | 20:19 |
slagle | lifeless: fwiw, qemu:///session != no kvm | 20:19 |
slagle | lifeless: session vs system is a way to have per-user vm's, configurations, etc | 20:19 |
slagle | the crux of this issue has always been that sometimes the scripts use virsh, and sometimes they use sudo virsh | 20:20 |
slagle | and sometimes they use hardcoded /var/lib/libvirt/... paths, etc, which of course assumes you're using qemu:///system when you later call virsh | 20:20 |
lifeless | slagle: which *on redhat and fedora* are different, but they aren't on Ubuntu | 20:21 |
lifeless | slagle: the hardcoded paths are of course bugs | 20:21 |
lifeless | slagle: the session vs system both letting kvm be used is something I didn't know - thanks ! | 20:21 |
lifeless | Ng: that happens way too late | 20:21 |
lifeless | Ng: it is a bug, because a new instance has nothing to preserve. | 20:22 |
lifeless | Ng: we have to do this in early userspace or else services that we've configured to start - like mysql - will write to /mnt/state *under* the filesystem mount point | 20:22 |
Shrews | Ng: preserve_ephemeral is passed only on a rebuild | 20:22 |
Shrews | ironic/nova/virt/ironic/driver.py, about line 652 | 20:23 |
Shrews | (if i've followed the bits of the conversation correctly) | 20:23 |
*** e0ne has quit IRC | 20:29 | |
*** e0ne has joined #tripleo | 20:30 | |
Ng | hrm, well that would deepen the mystery and definitely justify a bug | 20:30 |
*** andreaf has quit IRC | 20:32 | |
*** e0ne has quit IRC | 20:33 | |
*** noslzzp has joined #tripleo | 20:37 | |
greghaynes | Ng: So is there just unpartitioned space left on the device? | 20:38 |
greghaynes | oh | 20:38 |
Shrews | greghaynes: i'm now curious, what's the tl;dr of the problem? too much scrollback | 20:39 |
*** mestery has quit IRC | 20:39 | |
*** mestery has joined #tripleo | 20:40 | |
greghaynes | Shrews: Deploying nodes via ironic with preserve ephemeral, basically we dont end up with an ephemeral partition (let alone fs) to mount | 20:40 |
Ng | it seems like we get the partition, it's just not formatted | 20:40 |
Shrews | oh, hrm. not seen that at all in my devstack tests. i just added a tempest test for that (but not yet merged) | 20:42 |
Ng | ok, this isn't what we thought it was, there is a fs, it's just not being mounted | 20:43 |
Ng | maybe | 20:43 |
Shrews | i just looked at that element the other day, wondering where it was mounted | 20:45 |
Shrews | https://github.com/openstack/tripleo-image-elements/blob/master/elements/use-ephemeral/os-refresh-config/pre-configure.d/00-fix-ephemeral-mount#L18 | 20:45 |
Shrews | but that's all blackmagic to me | 20:45 |
*** jdonalds has joined #tripleo | 20:46 | |
lifeless | so the idea is | 20:46 |
lifeless | nova says 'ephemeral is over there XXX' | 20:47 |
lifeless | if nova isn't saying that | 20:47 |
lifeless | that would be a problem | 20:47 |
lifeless | Ng: can you pastebin an os-collect-config --print, of a faulty setup | 20:47 |
Ng | adam_g: ^^ | 20:47 |
lifeless | ok, --build-only has regressed and now tries to do things. sadface. | 20:48 |
lifeless | we really need a test for it | 20:48 |
lifeless | I was asking mordred about this the other day | 20:48 |
openstackgerrit | Robert Parker proposed a change to openstack/tripleo-heat-templates: Setup SSL for Ceilometer https://review.openstack.org/96257 | 20:49 |
adam_g | lifeless, http://paste.ubuntu.com/7539163/ | 20:50 |
*** nati_ueno has joined #tripleo | 20:50 | |
lifeless | bnemec: hey you're around now :) - so I had a question | 20:51 |
Ng | hmm, no ephemeral0 in the block-device-mapping | 20:51 |
lifeless | you've looked into running tests in various places | 20:51 |
lifeless | bnemec: if I wanted to test 'devtest.sh --build-only --trash-my-machine' in the gate, do you think you could point me at the right places to poke? | 20:52 |
*** jdob has quit IRC | 20:52 | |
lifeless | not that trash-my-machine *should* be needed, but one thing at a time | 20:52 |
greghaynes | Dont change anything, but somehow find a way to trash my machine | 20:52 |
mordred | lifeless: you were asking me about what? | 20:53 |
bnemec | lifeless: I think you could add that as a tox target, right? | 20:53 |
lifeless | greghaynes: right so right now it still makes a new testenv, for instance. | 20:54 |
lifeless | greghaynes: unless you tell it not to. | 20:54 |
*** noslzzp has quit IRC | 20:55 | |
mordred | lifeless: just make a job similar to: {pipeline}-requirements-integration-dsvm | 20:56 |
mordred | lifeless: and specify that it wants to run on a bare-precise node | 20:56 |
mordred | that will give you root on the box, and we'll throw the box away after anyway | 20:56 |
mordred | then you can skip devstack-checkout and friends | 20:56 |
mordred | lifeless: actually - look at gate-openstack-chef-repo | 20:57 |
mordred | just do something like that except remove "revoke-sudo" | 20:57 |
lifeless | mordred: devstack-checkout and friends are helpful perf stuff though | 20:57 |
mordred | because you want to keep sudo | 20:57 |
mordred | lifeless: ah - good point | 20:58 |
lifeless | mordred: we'll want to consolidate some of the tripleo glue at the same time | 20:58 |
mordred | so do those - in any case- bare-precise node is the thing you want - and from there you can do anything | 20:58 |
lifeless | I really wish new jobs could get tested as part of gating | 20:59 |
mordred | (and yes, we're working on getting trusty nodes) | 20:59 |
lifeless | so we could see if it works before its reviewed | 20:59 |
mordred | ++ | 20:59 |
lifeless | last time I raised this -infra folk were skeptical :) | 20:59 |
*** noslzzp has joined #tripleo | 20:59 | |
mordred | well, I want that too - I think we're probably a few steps away from being able to do it | 20:59 |
*** untriaged-bot has joined #tripleo | 21:00 | |
untriaged-bot | Untriaged bugs so far: | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1321943 | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1318767 | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1314978 | 21:00 |
mordred | but it should be easier to do when we're turbo-hipstered, because then we'll have a thing that can more directly run jobs on things | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1315355 | 21:00 |
uvirtbot | Launchpad bug 1321943 in tripleo "Ceilometer Swift polling on overcloud control node fails with a 403 forbidden error" [Undecided,In progress] | 21:00 |
uvirtbot | Launchpad bug 1318767 in tripleo "apache element SSL cert check fails" [Undecided,New] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1323167 | 21:00 |
uvirtbot | Launchpad bug 1314978 in tripleo "Cloud vm not pingable after overcloud upgrade " [Undecided,New] | 21:00 |
uvirtbot | Launchpad bug 1315355 in tripleo "Upgrade of overcloud failed with "Connection to neutron failed: Maximum attempts reached"" [Undecided,New] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1319473 | 21:00 |
uvirtbot | Launchpad bug 1323167 in tripleo "overcloud can't create an instance by neutron error" [Undecided,New] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1316675 | 21:00 |
uvirtbot | Launchpad bug 1319473 in tripleo "devtest_testenv.sh doesn't honour overcloud_computescale or overcloud_controlscale" [Undecided,In progress] | 21:00 |
untriaged-bot | https://bugs.launchpad.net/tripleo/+bug/1320090 | 21:00 |
uvirtbot | Launchpad bug 1316675 in tripleo "Saving a devtest VM results in error" [Undecided,In progress] | 21:00 |
uvirtbot | Launchpad bug 1320090 in tripleo "pxe_ephemeral_format': u'ext4 left on nodes after deploys deleted" [Undecided,New] | 21:00 |
*** untriaged-bot has quit IRC | 21:00 | |
mordred | as opposed to now, when we'd have to blow the job out into xml and inject it into a jenkins | 21:00 |
mordred | which is possible - but orchestrating that would be a bunch of work that we _know_ we'd be throwing away in a few months when jenkins diafs | 21:01 |
SpamapS | mordred: jenkins can hear you, and is going to spit in your tea | 21:01 |
mordred | SpamapS: what else is new? | 21:02 |
*** eghobo has quit IRC | 21:04 | |
*** noslzzp has quit IRC | 21:06 | |
*** eghobo has joined #tripleo | 21:07 | |
Ng | lifeless: is racks.txt the sum total of everything we know about the hardware in our not-CI rack? | 21:10 |
lifeless | I don't know what racks.txt is | 21:11 |
lifeless | Ng: do you mean the 405 rack? | 21:11 |
Ng | I do, I was trying to be oblique ;) | 21:11 |
Ng | filing jira tickets for stuck hosts and I'm being asked for machine serials, but I don't see them anywhere | 21:11 |
*** casanch1 has joined #tripleo | 21:12 | |
*** BadCub has joined #tripleo | 21:18 | |
*** cwolferh_ has joined #tripleo | 21:21 | |
*** cwolferh has quit IRC | 21:24 | |
lifeless | Ng: so there was an early email that had stuff | 21:27 |
lifeless | let me look it up | 21:27 |
*** eghobo has quit IRC | 21:27 | |
lifeless | Ng: nope, ilo password and ip only | 21:28 |
lifeless | Ng: everything else is in ~ on the 405 rack jump host | 21:28 |
Ng | lifeless: ok thanks | 21:28 |
*** noslzzp has joined #tripleo | 21:29 | |
*** jdonalds has quit IRC | 21:34 | |
*** casanch1_ has joined #tripleo | 21:38 | |
*** Penick has joined #tripleo | 21:39 | |
*** Penick has quit IRC | 21:39 | |
*** hashar has quit IRC | 21:39 | |
*** akrivoka has quit IRC | 21:41 | |
*** casanch1 has quit IRC | 21:41 | |
*** TravT has joined #tripleo | 21:43 | |
*** casanch1_ has quit IRC | 21:43 | |
*** nati_ueno has quit IRC | 21:49 | |
*** ddieterly has quit IRC | 21:51 | |
*** bcrochet is now known as bcrochet|g0ne | 21:53 | |
*** ddieterly has joined #tripleo | 21:55 | |
lifeless | jerryz: you might like to report back in the infra-specs spec for using jump host about what you've found out | 21:57 |
jerryz | lifeless: sure. | 21:59 |
*** ekarlso has quit IRC | 22:01 | |
lifeless | Ng: have you filed the bug on ironic ? | 22:01 |
lifeless | Ng: if not, please do. | 22:01 |
lifeless | Ng: or tell me to ;) | 22:01 |
Ng | lifeless: it's in an open tab, I'll finish it off shortly | 22:02 |
lifeless | Shrews: so ^ the bug is that the block device mapping for the ephemeral device is missing | 22:04 |
*** yamahata has joined #tripleo | 22:06 | |
Ng | https://bugs.launchpad.net/ironic/+bug/1324286 | 22:07 |
uvirtbot | Launchpad bug 1324286 in ironic "ephemeral partition not being mounted" [Undecided,New] | 22:07 |
Ng | terrible subject | 22:07 |
*** nati_ueno has joined #tripleo | 22:07 | |
*** greghaynes has quit IRC | 22:07 | |
lifeless | Ng: putting it in the etherpad ? | 22:07 |
Ng | done | 22:08 |
*** greghaynes has joined #tripleo | 22:09 | |
*** lucas-dinner has quit IRC | 22:13 | |
*** edmund1 has joined #tripleo | 22:23 | |
*** edmund has quit IRC | 22:24 | |
*** greghaynes has quit IRC | 22:27 | |
*** ekarlso has joined #tripleo | 22:27 | |
*** jdonalds has joined #tripleo | 22:28 | |
*** greghaynes has joined #tripleo | 22:28 | |
*** jml has quit IRC | 22:28 | |
*** weshay has quit IRC | 22:29 | |
*** jml has joined #tripleo | 22:30 | |
Shrews | Ng: where does the block-device-mapping data come from? | 22:30 |
*** jp_at_hp has joined #tripleo | 22:31 | |
Ng | Shrews: nova makes it available to the instance, via the ec2 metadata URL | 22:31 |
Ng | (cloud-init then consumes it and does various things, including writing out /etc/fstab entries for ephemeralN entries) | 22:32 |
Ng | I'm afraid I don't know exactly how nova constructs it. I started poking around nova and the ironic nova driver code to see what I could find, but I didn't get anywhere conclusive | 22:32 |
*** greghaynes has quit IRC | 22:32 | |
Shrews | Ng: so it's ironic's responsibility to update the metadata? seems more like nova's responsibility, but i'm probably wrong | 22:33 |
Ng | Shrews: I would suspect that Ironic would need to inform nova that it has created an ephemeral partition, and its label (i.e. ephemeral0) | 22:33 |
Ng | but I don't know that for a fact | 22:33 |
*** greghaynes has joined #tripleo | 22:34 | |
Shrews | Ng: i'll poke around a bit tomorrow and see what I can find out for you, unless someone beats me to it | 22:34 |
*** eghobo has joined #tripleo | 22:34 | |
*** greghaynes has quit IRC | 22:37 | |
*** greghaynes has joined #tripleo | 22:38 | |
*** greghaynes has quit IRC | 22:38 | |
*** greghaynes has joined #tripleo | 22:39 | |
openstackgerrit | lifeless proposed a change to openstack/tripleo-incubator: Add -c support to devtest_seed.sh. https://review.openstack.org/96014 | 22:42 |
openstackgerrit | lifeless proposed a change to openstack/tripleo-incubator: Fix --build-only. https://review.openstack.org/96303 | 22:42 |
jogo | lifeless: message:"tripleo" AND build_queue:"check-tripleo" AND build_name:"check-tripleo-overcloud-precise" | 22:51 |
jogo | lifeless: console logs for tripleo cloud | 22:53 |
lifeless | jogo: that's for narrow selection right? simple searches will find it all regardless? | 22:53 |
jogo | yeah, you just specify the build_name | 22:54 |
lifeless | jogo: legendary | 22:54 |
jogo | lifeless: try build_name:"check-tripleo-overcloud-precise" AND message:"failed" | 22:54 |
jogo | or build_name:"check-tripleo-overcloud-precise" AND build_status:"FAILURE" | 22:55 |
jogo | for all failures | 22:55 |
jogo | or to track the number of failed jobs: build_name:"check-tripleo-overcloud-precise" AND build_status:"FAILURE" AND message:"Finished: FAILURE" | 22:55 |
jogo | but graphite can do that too | 22:55 |
greghaynes | argh, probably should stop giving servers the same IP as my IRC box | 22:55 |
lifeless | greghaynes: :> | 22:55 |
lifeless | jogo: <3 seed log next yah? We can change the format we capture them in in toci easily if that will help | 22:56 |
lifeless | jogo: e.g. subdir no tar, or whatever | 22:56 |
jogo | lifeless: yeah subdir would be great | 22:56 |
lifeless | jogo: sanity checking - you enabled all the tripleo jobs, not just that one ? | 22:56 |
lifeless | jogo: have a look in openstack-infra/tripleo-ci | 22:57 |
jogo | lifeless: I didn't do anything ^_^ infra already supported it | 22:57 |
lifeless | jogo: huh blink | 22:57 |
lifeless | jogo: in tripleo-ci in the root see toci_devtest.sh | 22:57 |
lifeless | there is a function in there | 22:57 |
lifeless | get_state_from_host | 22:58 |
jogo | http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/logstash/jenkins-log-client.yaml#n16 | 22:58 |
jogo | lifeless: ahh cool | 22:59 |
jogo | so on the infra side what happens is: because you already have logs in the infra log pipeline, if you name them right or update that logstash yaml file, they get added to logstash.o.o automatically | 22:59 |
lifeless | (teaching you how it's done so you can change both sides at the same time) | 22:59 |
jogo | and yes it will work for all tests | 22:59 |
jogo | you can specify the queue instead | 22:59 |
jogo | such as: build_queue:"check-tripleo" | 23:00 |
jogo | lifeless: for starters what do you think of just doing the upstart logs | 23:03 |
lifeless | that would be a great start | 23:03 |
jogo | h | 23:04 |
*** BadCub has quit IRC | 23:04 | |
openstackgerrit | Michael Tupitsyn proposed a change to openstack/tripleo-incubator: Do not create admin user if it exists already https://review.openstack.org/96307 | 23:07 |
*** rbrady has quit IRC | 23:09 | |
*** yamahata has quit IRC | 23:09 | |
*** ddieterly has quit IRC | 23:11 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/tripleo-ci: Extract upstart logs so they can be loaded into logstash https://review.openstack.org/96308 | 23:13 |
*** edmund1 has quit IRC | 23:17 | |
jogo | https://review.openstack.org/#/c/96308/ isn't self gated? | 23:20 |
Shrews | lifeless: just posted a comment on that eph device bug. not as simple as i first thought, so will need some discussion with lucas | 23:36 |
*** jtomasek has quit IRC | 23:36 | |
*** jp_at_hp has quit IRC | 23:37 | |
bnemec | jogo: We don't actually gate on the tripleo queue jobs, but we do require them to pass before approving. | 23:37 |
bnemec | You can see that change on the zuul status page under the check-tripleo list. | 23:37 |
*** jp_at_hp has joined #tripleo | 23:38 | |
lifeless | jogo: infra won't let us gate yet | 23:39 |
lifeless | we haven't met e.g. the reliability requirement | 23:39 |
lifeless | Shrews: ack | 23:42 |
greghaynes | Do we know what the cause is for tests that fail with "tar: /var/log/host_info.txt: file changed as we read it" when grabbing logs after success? | 23:45 |
lifeless | presumably something wrote to it... | 23:49 |
*** jtomasek has joined #tripleo | 23:49 | |
greghaynes | as it read it? | 23:49 |
greghaynes | I just know I've seen it several times now, curious if anyone debugged yet | 23:50 |
lifeless | guessing | 23:51 |
lifeless | tar stats it | 23:51 |
lifeless | outputs the tar header | 23:51 |
lifeless | opens and reads it | 23:51 |
lifeless | and it's reading too little or too much | 23:51 |
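
The guess at the end of the log (tar stats the file, writes the header, then reads the contents, and the sizes disagree) can be illustrated with a small sketch. This is not tar's actual implementation and not anything from tripleo-ci; `snapshot` and `read_with_change_check` are hypothetical helper names, and comparing size and mtime is an assumption about which fields matter.

```python
import os


def snapshot(path):
    """Return the (size, mtime in ns) pair used to detect a change."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def read_with_change_check(path, chunk_size=65536):
    """Read a file and report whether it appears to have changed meanwhile.

    Mirrors the sequence guessed at in the log: stat first, then read,
    then compare what was read against what the first stat promised.
    """
    before = snapshot(path)
    chunks = []
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    after = snapshot(path)
    data = b"".join(chunks)
    # A writer appending during the read shows up either as a metadata
    # difference or as a byte count that disagrees with the first stat.
    changed = before != after or len(data) != before[0]
    return data, changed
```

If something like this reported a change for `/var/log/host_info.txt`, it would suggest that a process is still writing to the file when logs are collected, which fits the "presumably something wrote to it" remark above.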