*** markvoelker has joined #openstack-infra | 00:02 | |
*** markvoelker has quit IRC | 00:06 | |
*** markvoelker has joined #openstack-infra | 00:45 | |
*** jamesmcarthur has quit IRC | 00:57 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kafs support https://review.opendev.org/623974 | 00:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels https://review.opendev.org/665057 | 00:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd https://review.opendev.org/674215 | 00:59 |
*** markvoelker has quit IRC | 01:09 | |
*** jamesmcarthur has joined #openstack-infra | 01:19 | |
ianw | clarkb: thanks for looking at mirror; lmn if any help needed | 01:19 |
openstackgerrit | Merged opendev/system-config master: Re-add the Debian 8/jessie key to reprepro https://review.opendev.org/674406 | 01:29 |
*** jamesmcarthur has quit IRC | 01:32 | |
*** yikun has joined #openstack-infra | 01:44 | |
*** yamamoto has joined #openstack-infra | 01:45 | |
openstackgerrit | Merged openstack/diskimage-builder master: journal-to-console: element to send systemd journal to console https://review.opendev.org/669784 | 02:00 |
*** markvoelker has joined #openstack-infra | 02:01 | |
*** bhavikdbavishi has joined #openstack-infra | 02:02 | |
*** jamesmcarthur has joined #openstack-infra | 02:03 | |
*** yamamoto has quit IRC | 02:04 | |
*** yamamoto has joined #openstack-infra | 02:04 | |
*** markvoelker has quit IRC | 02:06 | |
*** bhavikdbavishi has quit IRC | 02:07 | |
*** bhavikdbavishi1 has joined #openstack-infra | 02:07 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:09 | |
*** n-saito has joined #openstack-infra | 02:10 | |
*** bhavikdbavishi has quit IRC | 02:22 | |
*** markvoelker has joined #openstack-infra | 02:32 | |
*** yamamoto has quit IRC | 02:35 | |
*** whoami-rajat has joined #openstack-infra | 02:38 | |
*** yamamoto has joined #openstack-infra | 02:38 | |
*** markvoelker has quit IRC | 02:42 | |
*** gregoryo has joined #openstack-infra | 02:46 | |
openstackgerrit | Merged openstack/diskimage-builder master: Cleanup: remove useless statement https://review.opendev.org/668372 | 02:46 |
*** ricolin has joined #openstack-infra | 02:51 | |
*** bhavikdbavishi has joined #openstack-infra | 03:11 | |
*** jamesmcarthur has quit IRC | 03:11 | |
*** markvoelker has joined #openstack-infra | 03:13 | |
*** markvoelker has quit IRC | 03:17 | |
*** psachin has joined #openstack-infra | 03:29 | |
*** ricolin_ has joined #openstack-infra | 03:35 | |
*** ricolin has quit IRC | 03:38 | |
*** mrda has joined #openstack-infra | 03:39 | |
mrda | hey infra, I've seen some strange behaviour lately in git cloning off opendev.org. For example, I cloned https://opendev.org/openstack/devstack.git and got a bunch of zero length files. i.e. like this: https://pastebin.com/fhAHaZpb I didn't see any errors on the clone, and only when I ran ./tools/create-stack-user.sh did I see anything wrong. Just wanted to add the data point here in case others | 03:41 |
mrda | see it too. | 03:41 |
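mrda's symptom (a clone that finishes without errors but leaves zero-length files in the working tree) is easy to scan for after the fact. A minimal sketch, not part of the log, that walks a clone looking for empty files while skipping `.git` internals:

```python
import os

def find_empty_files(root):
    """Return sorted paths of zero-length regular files under root, skipping .git."""
    empties = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune .git in place so os.walk never descends into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() first: getsize() would raise on dangling symlinks
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empties.append(path)
    return sorted(empties)
```

Against a healthy devstack clone this should return an empty list; `git fsck --full` inside the clone is the complementary object-level integrity check.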
*** jamesmcarthur has joined #openstack-infra | 03:42 | |
*** ramishra has joined #openstack-infra | 04:04 | |
*** udesale has joined #openstack-infra | 04:07 | |
*** AJaeger is now known as AJaeger_ | 04:11 | |
*** dpawlik has joined #openstack-infra | 04:13 | |
*** dpawlik has quit IRC | 04:20 | |
*** ykarel has joined #openstack-infra | 04:23 | |
*** markvoelker has joined #openstack-infra | 04:28 | |
*** Lucas_Gray has joined #openstack-infra | 04:29 | |
*** jamesmcarthur has quit IRC | 04:30 | |
*** markvoelker has quit IRC | 04:33 | |
*** ramishra has quit IRC | 04:45 | |
*** ramishra has joined #openstack-infra | 04:45 | |
ianw | mrda: is it only devstack or you've seen this on other repos? | 04:45 |
fungi | i can't seem to reproduce it. could you have run out of disk space? | 04:47 |
mrda | It's happened 3 times in a week, and I've just recloned and all was well. | 04:48 |
mrda | It's not a disk space issue. FWIW, it was in a local KVM running F29 on a F30 host. | 04:49 |
fungi | strange, i wonder if one of the backends has a corrupt copy | 04:49 |
mrda | Just figured y'all would like to know in case any other reports come in | 04:49 |
fungi | appreciated! | 04:51 |
fungi | i'm test cloning it from all 8 of the backends individually now | 04:51 |
mrda | thanks fungi | 04:51 |
fungi | no dice. tried to reproduce by cloning from all 8 backends individually | 04:59 |
fungi | so doesn't look like a corrupt repository on any of them at least | 04:59 |
*** janki has joined #openstack-infra | 05:02 | |
mrda | ok, thanks for letting me know. If I see it again I'll report it here. | 05:02 |
fungi | another possibility could be a misbehaving proxy (if you have a transparent https proxy anyway) | 05:03 |
*** jamesmcarthur has joined #openstack-infra | 05:04 | |
*** Lucas_Gray has quit IRC | 05:25 | |
*** notmyname has quit IRC | 05:40 | |
*** jaosorior has joined #openstack-infra | 05:41 | |
*** notmyname has joined #openstack-infra | 05:41 | |
*** dpawlik has joined #openstack-infra | 05:43 | |
*** kota_ has quit IRC | 05:46 | |
*** yamamoto has quit IRC | 05:47 | |
*** yamamoto has joined #openstack-infra | 05:48 | |
ianw | i would have thought git's checksumming would have avoided proxies getting in the way ... the other thing might be some odd manifestation of I5ebdaded3ffd0a5bc70c5e9ab5b18daefb358f58 if it's devstack only | 05:51 |
*** jamesmcarthur has quit IRC | 05:53 | |
ianw | like if you only notice it after it runs | 05:58 |
mrda | I shall take a look | 05:59 |
*** dchen has quit IRC | 06:06 | |
*** ramishra has quit IRC | 06:06 | |
*** ramishra has joined #openstack-infra | 06:06 | |
*** jamesmcarthur has joined #openstack-infra | 06:23 | |
*** jamesmcarthur has quit IRC | 06:27 | |
*** dpawlik has quit IRC | 06:28 | |
*** jtomasek has joined #openstack-infra | 06:34 | |
*** kota_ has joined #openstack-infra | 06:38 | |
*** dchen has joined #openstack-infra | 06:40 | |
*** ricolin__ has joined #openstack-infra | 06:41 | |
*** ricolin__ is now known as ricolin | 06:41 | |
*** dpawlik has joined #openstack-infra | 06:44 | |
*** ricolin_ has quit IRC | 06:45 | |
*** apetrich has joined #openstack-infra | 06:48 | |
*** jamesmcarthur has joined #openstack-infra | 06:49 | |
*** kopecmartin|pto is now known as kopecmartin | 06:51 | |
*** dchen has quit IRC | 06:53 | |
*** ginopc has joined #openstack-infra | 06:54 | |
*** ralonsoh has joined #openstack-infra | 06:54 | |
*** jamesmcarthur has quit IRC | 06:54 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Ansible roles for backup https://review.opendev.org/662657 | 07:01 |
*** pgaxatte has joined #openstack-infra | 07:01 | |
*** rcernin has quit IRC | 07:04 | |
*** slaweq has joined #openstack-infra | 07:05 | |
*** xek has joined #openstack-infra | 07:05 | |
*** pcaruana has joined #openstack-infra | 07:08 | |
*** pkopec has joined #openstack-infra | 07:11 | |
*** markvoelker has joined #openstack-infra | 07:17 | |
*** tesseract has joined #openstack-infra | 07:17 | |
*** tosky has joined #openstack-infra | 07:19 | |
*** bhavikdbavishi has quit IRC | 07:27 | |
*** xek has quit IRC | 07:28 | |
*** ccamacho has joined #openstack-infra | 07:49 | |
*** markvoelker has quit IRC | 07:51 | |
*** rpittau|afk is now known as rpittau | 07:51 | |
*** ykarel is now known as ykarel|lunch | 07:59 | |
*** dchen has joined #openstack-infra | 08:07 | |
*** bhavikdbavishi has joined #openstack-infra | 08:08 | |
*** tkajinam has quit IRC | 08:11 | |
*** yamamoto has quit IRC | 08:13 | |
*** iurygregory has joined #openstack-infra | 08:16 | |
*** arxcruz is now known as arxcruz|grb | 08:18 | |
*** arxcruz|grb is now known as arxcruz|brb | 08:18 | |
*** dchen has quit IRC | 08:19 | |
*** tesseract-RH has joined #openstack-infra | 08:22 | |
*** tesseract has quit IRC | 08:22 | |
*** piotrowskim has joined #openstack-infra | 08:24 | |
*** Lucas_Gray has joined #openstack-infra | 08:24 | |
*** tesseract-RH has quit IRC | 08:24 | |
*** tesseract has joined #openstack-infra | 08:24 | |
*** gregoryo has quit IRC | 08:25 | |
openstackgerrit | Ian Wienand proposed opendev/zone-opendev.org master: Add vexxhost backup server https://review.opendev.org/674549 | 08:29 |
*** Lucas_Gray has quit IRC | 08:30 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add vexxhost backup server https://review.opendev.org/674550 | 08:36 |
*** sshnaidm|afk is now known as sshnaidm | 08:37 | |
*** janki has quit IRC | 08:43 | |
*** smrcascao has joined #openstack-infra | 08:45 | |
*** e0ne has joined #openstack-infra | 08:48 | |
openstackgerrit | Merged opendev/system-config master: Ansible roles for backup https://review.opendev.org/662657 | 08:48 |
*** jaosorior has quit IRC | 08:51 | |
*** yamamoto has joined #openstack-infra | 08:54 | |
openstackgerrit | Merged opendev/irc-meetings master: Free up unused Murano meeting slot https://review.opendev.org/674307 | 08:56 |
*** markvoelker has joined #openstack-infra | 08:56 | |
openstackgerrit | Merged opendev/irc-meetings master: Free up unused Gluon meeting slot https://review.opendev.org/674304 | 08:59 |
openstackgerrit | Merged opendev/irc-meetings master: Free up unused openstack-chef meeting slot https://review.opendev.org/674300 | 08:59 |
*** gfidente has joined #openstack-infra | 09:00 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add vexxhost backup server https://review.opendev.org/674550 | 09:00 |
*** BrentonPoke has quit IRC | 09:01 | |
*** ykarel|lunch is now known as ykarel | 09:08 | |
*** yamamoto has quit IRC | 09:09 | |
*** tdasilva has joined #openstack-infra | 09:09 | |
*** guoqiao has quit IRC | 09:24 | |
*** markvoelker has quit IRC | 09:30 | |
*** yamamoto has joined #openstack-infra | 09:44 | |
*** janki has joined #openstack-infra | 09:45 | |
*** spsurya has joined #openstack-infra | 09:50 | |
*** bhavikdbavishi has quit IRC | 09:53 | |
*** jaosorior has joined #openstack-infra | 10:05 | |
*** ociuhandu has joined #openstack-infra | 10:20 | |
*** janki has quit IRC | 10:27 | |
openstackgerrit | Sagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL https://review.opendev.org/674572 | 10:38 |
*** markvoelker has joined #openstack-infra | 10:38 | |
sshnaidm | cores please take a look ^^ | 10:42 |
*** markvoelker has quit IRC | 10:43 | |
*** arxcruz|brb is now known as arxcruz | 10:52 | |
mordred | sshnaidm: left a suggestion | 10:55 |
openstackgerrit | Sagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL https://review.opendev.org/674572 | 10:57 |
*** tdasilva has quit IRC | 10:57 | |
sshnaidm | mordred, updated ^ | 10:57 |
*** tdasilva has joined #openstack-infra | 10:58 | |
mordred | sshnaidm: lgtm. I think we could also get rid of that ansible_os_family line now - but this reads well | 10:58 |
*** dchen has joined #openstack-infra | 11:00 | |
*** rosmaita has joined #openstack-infra | 11:01 | |
*** rascasoft has quit IRC | 11:03 | |
*** rascasoft has joined #openstack-infra | 11:05 | |
*** pgaxatte has quit IRC | 11:06 | |
*** pkopec_ has joined #openstack-infra | 11:10 | |
*** pkopec__ has joined #openstack-infra | 11:14 | |
*** pkopec has quit IRC | 11:14 | |
*** pkopec has joined #openstack-infra | 11:15 | |
*** pkopec_ has quit IRC | 11:16 | |
*** pkopec__ has quit IRC | 11:18 | |
*** jaosorior has quit IRC | 11:23 | |
*** ramishra has quit IRC | 11:24 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kafs support https://review.opendev.org/623974 | 11:26 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels https://review.opendev.org/665057 | 11:26 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd https://review.opendev.org/674215 | 11:26 |
*** ramishra has joined #openstack-infra | 11:26 | |
*** jamesdenton has joined #openstack-infra | 11:30 | |
*** pkopec_ has joined #openstack-infra | 11:31 | |
*** pkopec has quit IRC | 11:33 | |
*** rh-jelabarre has joined #openstack-infra | 11:45 | |
*** jamesmcarthur has joined #openstack-infra | 11:46 | |
*** ociuhandu has quit IRC | 11:49 | |
*** ociuhandu has joined #openstack-infra | 11:50 | |
*** yamamoto has quit IRC | 11:53 | |
*** markvoelker has joined #openstack-infra | 11:55 | |
*** adriancz has joined #openstack-infra | 12:01 | |
*** markvoelker has quit IRC | 12:02 | |
*** markvoelker has joined #openstack-infra | 12:02 | |
*** pgaxatte has joined #openstack-infra | 12:02 | |
*** jamesmcarthur has quit IRC | 12:05 | |
*** dchen has quit IRC | 12:06 | |
*** udesale has quit IRC | 12:06 | |
*** udesale has joined #openstack-infra | 12:07 | |
*** markvoelker has quit IRC | 12:09 | |
*** jaosorior has joined #openstack-infra | 12:10 | |
*** markvoelker has joined #openstack-infra | 12:11 | |
*** ociuhandu has quit IRC | 12:12 | |
*** yamamoto has joined #openstack-infra | 12:13 | |
*** betherly has joined #openstack-infra | 12:15 | |
*** rfolco has joined #openstack-infra | 12:16 | |
*** rfolco is now known as rfolco|ruck | 12:16 | |
*** ociuhandu has joined #openstack-infra | 12:20 | |
sshnaidm | some problem with limestone: SSH Error: data could not be sent to remote host "10.4.70.72". Make sure this host can be reached over ssh | 12:24 |
*** yamamoto has quit IRC | 12:24 | |
sshnaidm | Hostname: centos-7-limestone-regionone-0009753160 Provider: limestone-regionone | 12:24 |
*** rlandy has joined #openstack-infra | 12:26 | |
logan- | sshnaidm: could you link to the job logs, i'd like to take a look into that | 12:29 |
sshnaidm | logan-, https://logs.opendev.org/33/674433/7/check/tripleo-ci-centos-7-containers-multinode/bd26761/job-output.txt.gz#_2019-08-05_12_05_52_389815 | 12:31 |
logan- | thanks | 12:31 |
*** jroll has quit IRC | 12:39 | |
*** jroll has joined #openstack-infra | 12:39 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 12:40 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Rework some bugs https://review.opendev.org/674425 | 12:40 |
*** jamesmcarthur has joined #openstack-infra | 12:41 | |
*** ccamacho has quit IRC | 12:43 | |
*** priteau has joined #openstack-infra | 12:45 | |
*** ramishra has quit IRC | 12:46 | |
*** yamamoto has joined #openstack-infra | 12:48 | |
*** jamesdenton has quit IRC | 12:49 | |
*** priteau has quit IRC | 12:54 | |
logan- | sshnaidm: going to monitor for a while. it looks like some rabbitmq messages were being held up due to a massive "notifications.sample" queue backlog. I cleaned that up and I don't see neutron messages backing up anymore, but that may have been the cause of communications issues during your multinode jobs. we'll see | 12:55 |
sshnaidm | logan-, thanks! | 12:56 |
*** bhavikdbavishi has joined #openstack-infra | 12:59 | |
*** jamesmcarthur has quit IRC | 13:04 | |
*** aaronsheffield has joined #openstack-infra | 13:04 | |
*** jamesdenton has joined #openstack-infra | 13:04 | |
fungi | overrun with rabbits | 13:05 |
*** ekultails has joined #openstack-infra | 13:05 | |
*** goldyfruit has joined #openstack-infra | 13:06 | |
*** goldyfruit has quit IRC | 13:11 | |
*** bhavikdbavishi has quit IRC | 13:12 | |
*** bhavikdbavishi has joined #openstack-infra | 13:13 | |
*** Lucas_Gray has joined #openstack-infra | 13:14 | |
*** mriedem has joined #openstack-infra | 13:14 | |
*** jamesmcarthur has joined #openstack-infra | 13:15 | |
*** ykarel is now known as ykarel|afk | 13:22 | |
mordred | fungi: I didn't realize the outer banks had rabbit issues | 13:22 |
fungi | just the openstack parts | 13:23 |
*** ociuhandu has quit IRC | 13:32 | |
*** HenryG has quit IRC | 13:37 | |
*** aedc has quit IRC | 13:38 | |
openstackgerrit | Thierry Carrez proposed openstack/ptgbot master: Display room capabilities https://review.opendev.org/674606 | 13:38 |
*** jbadiapa has joined #openstack-infra | 13:39 | |
*** aedc has joined #openstack-infra | 13:39 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: POC: Add ensure-managed role https://review.opendev.org/674609 | 13:42 |
*** jamesdenton has quit IRC | 13:44 | |
*** jbadiapa has quit IRC | 13:44 | |
*** pcaruana has quit IRC | 13:47 | |
*** dchen has joined #openstack-infra | 13:48 | |
*** dchen has joined #openstack-infra | 13:49 | |
*** dchen has quit IRC | 13:49 | |
*** ykarel|afk is now known as ykarel|away | 13:52 | |
*** bhavikdbavishi has quit IRC | 13:54 | |
*** iurygregory has quit IRC | 13:55 | |
*** yamamoto has quit IRC | 13:58 | |
*** pcaruana has joined #openstack-infra | 14:00 | |
*** ramishra has joined #openstack-infra | 14:07 | |
*** yamamoto has joined #openstack-infra | 14:08 | |
johnsom | Is http://paste.openstack.org/ down or is it just me? | 14:10 |
AJaeger_ | infra-root, takes ages to respond (still spinning) for me as well ^ | 14:11 |
AJaeger_ | wait, it's there now... | 14:11 |
johnsom | Yeah, mine just opened as well. | 14:11 |
*** rlandy is now known as rlandy|brb | 14:13 | |
*** ociuhandu has joined #openstack-infra | 14:19 | |
*** eharney has joined #openstack-infra | 14:20 | |
*** Lucas_Gray has quit IRC | 14:20 | |
*** Lucas_Gray has joined #openstack-infra | 14:22 | |
*** jcoufal has joined #openstack-infra | 14:22 | |
*** _erlon_ has joined #openstack-infra | 14:23 | |
*** rlandy|brb is now known as rlandy | 14:23 | |
*** jamesdenton has joined #openstack-infra | 14:23 | |
fungi | i believe it occasionally gets its mysql socket timed out and the backend daemon doesn't realize it, so takes a while before it tries to reconnect | 14:23 |
*** _erlon_ has left #openstack-infra | 14:23 | |
fungi | i've tried to track down the cause several times in the past, to no avail | 14:24 |
fungi | we probably ought to just move its db back out of trove and keep it local to the server | 14:24 |
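The failure mode fungi describes (the database server times out an idle connection and the daemon only notices on the next query) is the classic argument for a liveness check at connection checkout, as SQLAlchemy's `pool_pre_ping` does. A runnable sketch of the pattern using stdlib sqlite3 as a stand-in for lodgeit's MySQL backend; the class and DSN are illustrative, not the real paste.openstack.org code:

```python
import sqlite3

class ReconnectingDB:
    """Illustrative ping-before-use wrapper for a connection a server may drop.

    Mirrors the idea behind SQLAlchemy's pool_pre_ping: issue a cheap
    SELECT 1 before each real query and transparently reopen the
    connection if the ping fails.
    """

    def __init__(self, dsn):
        self.dsn = dsn
        self.conn = sqlite3.connect(dsn)

    def _ping(self):
        try:
            self.conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def execute(self, sql, params=()):
        if not self._ping():
            # the server (or a timeout) dropped the socket while idle; reconnect
            self.conn = sqlite3.connect(self.dsn)
        return self.conn.execute(sql, params)
```

With a real MySQL backend the equivalent is SQLAlchemy's `create_engine(..., pool_pre_ping=True, pool_recycle=3600)`, which retires pooled connections before the server's `wait_timeout` can bite.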
*** pkopec_ has quit IRC | 14:25 | |
*** mriedem has quit IRC | 14:29 | |
*** mriedem has joined #openstack-infra | 14:29 | |
*** yamamoto has quit IRC | 14:31 | |
*** yamamoto has joined #openstack-infra | 14:32 | |
*** jaosorior has quit IRC | 14:36 | |
*** yamamoto has quit IRC | 14:37 | |
openstackgerrit | Thierry Carrez proposed openstack/ptgbot master: Generate etherpad links automatically https://review.opendev.org/674622 | 14:38 |
*** jaosorior has joined #openstack-infra | 14:39 | |
openstackgerrit | Thierry Carrez proposed openstack/ptgbot master: Display room capabilities https://review.opendev.org/674606 | 14:52 |
*** Lucas_Gray has quit IRC | 14:54 | |
clarkb | I'm going to rerun the fedora mirror sync with updated atomic excludes just as soon as my morning meeting completes | 14:57 |
clarkb | then rerun vos release in screen with -localauth | 14:57 |
clarkb | and if that looks happy I will release the lockfile on mirror-update | 14:57 |
*** pkopec has joined #openstack-infra | 14:59 | |
*** bnemec-pto is now known as bnemec | 15:01 | |
donnyd | clarkb: It looks like you are moving to object storage, but I haven't been following for what. I am working on getting some all nvme object storage going (gonna be a bit). Is every provider going to need object stores? | 15:01 |
clarkb | donnyd: I think the idea is to use object stores for log storage if they are available. We won't be requiring that of every cloud but will happily make use of them from those that have it (as a note we probably want ipv4 from those instances since client browsers will make log requests directly to the swift api) | 15:03 |
donnyd | Also FN is at capacity, and it looks like all of the issues have been pretty well ironed out. The only errors I get are when a fresh image is loaded and nodepool schedules too many of that instance type too fast | 03:03 |
clarkb | donnyd: as long as that results in "clean failures" eg we don't break the cloud in the process that is probably fine | 15:03 |
donnyd | All the api's are on v4 here (also working v6 for that too.) | 15:03 |
donnyd | So if its useful, I can make it my next target... I need to find a faster backend for glance anyways. How much space are you thinking the logs will require? Performance sla? | 15:04 |
clarkb | donnyd: corvus likely has better grasp of those numbers since he has been working on it recently, but I can take a look after meetings too | 15:05 |
*** Lucas_Gray has joined #openstack-infra | 15:07 | |
donnyd | Just for anyone who was curious, I know in the beginning of getting FN we talked a little about power requirements. At full load with 100(ish) instance in system I am using 3200 watts | 15:13 |
fungi | that's still probably half the power my dec alphastation drew at idle ;) | 15:15 |
donnyd | About 250 bucks a month in electrical costs. Quite honestly that isn't that bad | 15:16 |
donnyd | The blade chassis isn't really efficient until i put a full lot in it, and a full load on it. My 3 EMC isilions were taking more than that just for disk prior to the NVME build | 15:17 |
fungi | wow | 15:17 |
*** yamamoto has joined #openstack-infra | 15:19 | |
*** yamamoto has quit IRC | 15:25 | |
*** goldyfruit has joined #openstack-infra | 15:26 | |
*** pkopec has quit IRC | 15:26 | |
*** betherly has quit IRC | 15:29 | |
*** ralonsoh has quit IRC | 15:29 | |
*** iurygregory has joined #openstack-infra | 15:38 | |
*** ociuhandu has quit IRC | 15:40 | |
*** ociuhandu has joined #openstack-infra | 15:40 | |
*** ociuhandu has quit IRC | 15:41 | |
*** sthussey has joined #openstack-infra | 15:41 | |
*** ociuhandu has joined #openstack-infra | 15:42 | |
*** ociuhandu has quit IRC | 15:43 | |
*** ociuhandu has joined #openstack-infra | 15:43 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Report retried builds in a build set via mqtt. https://review.opendev.org/632727 | 15:43 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Report retried builds via sql reporter. https://review.opendev.org/633501 | 15:43 |
*** ociuhandu has quit IRC | 15:43 | |
*** ociuhandu has joined #openstack-infra | 15:45 | |
*** ociuhandu has quit IRC | 15:45 | |
*** ociuhandu has joined #openstack-infra | 15:45 | |
*** pgaxatte has quit IRC | 15:48 | |
*** gyee has joined #openstack-infra | 15:48 | |
*** Lucas_Gray has quit IRC | 15:51 | |
corvus | donnyd, clarkb: our current total log usage is 9.2TiB in 380M inodes; we're not sure how many cloud providers we can use yet, but including FN, it would be between 4 and 7, so that's something like 1.3-1.2TiB per provider and 54-95M objects. | 15:52 |
corvus | donnyd, clarkb: if we have room, we might expand our log retention a bit | 15:54 |
clarkb | corvus: we are at 4 weeks now? | 15:54 |
donnyd | Requirements for speed? | 15:54 |
corvus | yeah, looks like 30 days | 15:55 |
corvus | i don't think we've ever characterized the numbers around speed. everything gets written once and then read once automatically, after that, very little of it gets read. there shouldn't be a lot of concurrent activity, and read throughput isn't a big deal, but it would be good if it didn't take too long to fetch a random file since that's a user-interactive thing. | 15:57 |
corvus | maybe there's a nugget of useful info in there, sorry. | 15:58 |
*** AJaeger_ is now known as AJaeger | 15:59 | |
donnyd | I can add on swift without issue, just gonna take some time. The reason I ask is I am using all nvme disks. If this has no specific performance requirements, I can put it on normal spinning rust and offer a lot more in terms of available space. I have 3.2TB of nvme to throw at swift and 500TB of spinning rust laying around. | 16:00 |
donnyd | Preference for swift over ceph or vice versa? | 16:01 |
clarkb | donnyd: we need CORS headers to make this work and ceph's swift api doesn't support that (the s3 api does but we've not yet gotten that working/confirmed) I think that means the preference for now is swift proper | 16:02 |
corvus | i think i lean toward more spinning rust being more useful in the long run; should be sufficient performance and has potential for growth. clarkb, fungi? | 16:02 |
corvus | swift++ | 16:02 |
clarkb | donnyd: also spinning disk is probably fine since ya being able to extend retention would be great | 16:02 |
clarkb | I am starting the fedora mirror sync with updates (no vos release) now | 16:02 |
fungi | ooh, this is always a fun debate | 16:03 |
fungi | and i agree, cheap slow bulk storage is likely a great tradeoff | 16:03 |
corvus | i typo'd before -- current retention would be 1.3-2.3TiB per provider | 16:04 |
donnyd | Ok, I think i can do that without any issues at all. | 16:04 |
*** tesseract has quit IRC | 16:04 | |
donnyd | If its rust, the smallest drive I have is 3tb | 16:04 |
fungi | write speed probably matters more than read speed (though maybe the amount of the two is going to be roughly on par considering our current logstash worker implementation?) | 16:04 |
corvus | if we increase retention to our target of 6 months, it would be 8-14TiB per provider | 16:04 |
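The per-provider figures corvus quotes fall out of simple division: 9.2TiB and 380M objects today, spread over 4 to 7 participating clouds, scaled roughly 6x for the six-month retention target. A quick check of the arithmetic:

```python
total_tib = 9.2        # current ~30-day log corpus
total_objects = 380e6  # inodes today, roughly one swift object each
providers = (4, 7)     # plausible range of participating clouds

# current retention, per provider: ~2.3 TiB down to ~1.3 TiB
per_provider_tib = [total_tib / n for n in providers]

# ~95M down to ~54M objects per provider
per_provider_objects = [total_objects / n for n in providers]

# stretching 30 days to ~6 months multiplies storage by ~6:
# ~13.8 TiB down to ~7.9 TiB, i.e. the quoted 8-14TiB range
six_month_tib = [6 * total_tib / n for n in providers]
```

This matches the log's corrected numbers (1.3-2.3TiB and 54-95M objects per provider at current retention).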
*** ociuhandu_ has joined #openstack-infra | 16:04 | |
donnyd | so as long as we can keep it below say 20TB, I am a happy camper. | 16:05 |
corvus | seems reasonable | 16:05 |
corvus | fungi: can you +3 https://review.opendev.org/674459 ? | 16:05 |
fungi | corvus: yep, that's a simple enough one | 16:06 |
*** ociuhandu has quit IRC | 16:07 | |
*** michael-beaver has joined #openstack-infra | 16:09 | |
*** ginopc has quit IRC | 16:09 | |
*** lpetrut has joined #openstack-infra | 16:09 | |
clarkb | rsync done, doing vos release with localauth now | 16:09 |
donnyd | Also more interesting tidbits of info on what the CI looks like from the cloud perspective. The instances only use about 25% of what they are provisioned for disk space | 16:14 |
openstackgerrit | Merged opendev/base-jobs master: Add zuul manifest role to swift base job https://review.opendev.org/674459 | 16:15 |
AJaeger | config-core, please review https://review.opendev.org/673988 | 16:15 |
clarkb | AJaeger: on it | 16:15 |
AJaeger | can we merge new repo creation changes again? https://review.opendev.org/673900 and https://review.opendev.org/673898 create two new airship repos | 16:16 |
AJaeger | thanks, clarkb | 16:16 |
AJaeger | and here's one more change: https://review.opendev.org/658439 | 16:16 |
clarkb | AJaeger: yes all gitea backends have been replaced and are being managed by ansible, creating projects should be safe | 16:16 |
AJaeger | and one more https://review.opendev.org/673764 for review, please | 16:17 |
AJaeger | clarkb: great | 16:17 |
*** e0ne has quit IRC | 16:19 | |
clarkb | AJaeger: pabelanger I left a concern on https://review.opendev.org/#/c/658439/2 | 16:19 |
AJaeger | thanks for spotting | 16:20 |
*** gfidente has quit IRC | 16:20 | |
fungi | donnyd: we could likely get away with something like 3:1 oversubscribed storage because only a fraction of jobs need additional space (for example jobs which create an assortment of cinder volumes for some tests). problem likely is that oversubscribing means thin-provisioning rootfs volumes, which likely implies a write throughput penalty | 16:21 |
fungi | also some jobs use swap, and i expect that would get markedly worse performance on thin-provisioned volumes | 16:22 |
openstackgerrit | Merged openstack/project-config master: Fix ACL for compute-hyperv https://review.opendev.org/673988 | 16:28 |
AJaeger | clarkb: time to review the two new repo creations changes as well? | 16:28 |
donnyd | I see no write penalties on my thin volumes, but my drives are so much faster than the rest of everything else... I can't really tell | 16:29 |
* AJaeger would love to be in that situation ;) | 16:30 | |
clarkb | knocked 160GB off the fedora mirror I think | 16:30 |
clarkb | AJaeger: and ya I'll review those shortly | 16:30 |
AJaeger | clarkb: thanks | 16:31 |
donnyd | Its fun to see just how stupid fast nvme drives really are | 16:31 |
*** ociuhandu_ has quit IRC | 16:33 | |
*** ociuhandu has joined #openstack-infra | 16:33 | |
donnyd | If only i could get this on all the hypervisors READ: bw=11.4GiB/s (12.2GB/s), 1454MiB/s-1555MiB/s (1525MB/s-1630MB/s), io=80.0GiB (85.9GB), run=6586-7043msec | 16:33 |
*** mattw4 has joined #openstack-infra | 16:34 | |
*** bhavikdbavishi has joined #openstack-infra | 16:34 | |
openstackgerrit | Merged openstack/project-config master: Allow registered users to vote for backport candidates https://review.opendev.org/673764 | 16:35 |
donnyd | AJaeger: And I have less in it than most peoples laptops | 16:36 |
*** diablo_rojo has joined #openstack-infra | 16:37 | |
*** mattw4 has quit IRC | 16:37 | |
*** mattw4 has joined #openstack-infra | 16:39 | |
*** jamesmcarthur has quit IRC | 16:39 | |
clarkb | infra-root AJaeger can you check my comment on https://review.opendev.org/#/c/673900/2 and consider if that concern is valid? | 16:40 |
fungi | donnyd: oh, yeah, if you're already thin-provisioning and not seeing a disk bottleneck, then i guess that's ideal (and also presumably means you have available blocks on the backend?) | 16:40 |
mordred | clarkb: I agree with your comment | 16:42 |
*** weifan has joined #openstack-infra | 16:43 | |
donnyd | fungi: I am 100% overprovisioned on the backend. ( requested 8TB, have 4Tb). We are only using 25% of the 4tb I have. | 16:43 |
donnyd | so it's more like it actually uses 12.5% of what is requested. | 16:43 |
AJaeger | clarkb: I see your point but few people type it, so I'm fine either way. | 16:44 |
clarkb | ya testing cinder in particular requires quite a bit of disk space, but most other jobs likely don't get near the limits | 16:44 |
*** ramishra has quit IRC | 16:44 | |
donnyd | I will keep an eye on it, as today is the first day at full capacity | 16:44 |
clarkb | (cinder wouldn't need so much disk if they allowed for specifying sizes smaller than increments of 1GB) | 16:44 |
clarkb | iirc devstack provisions ~30GB of disk just for cinder because we have multiple threads running and the volume deletion for test cleanup can lag behind subsequent tests starting | 16:45 |
fungi | donnyd: that's excellent data, thanks! | 16:45 |
*** ociuhandu has quit IRC | 16:45 | |
clarkb | AJaeger: I mostly don't want the next k8s related thing to come along and be cranky someone has decided they are the k8s thing | 16:46 |
donnyd | it just interesting to see what the CI system actually uses.. From my POV, its memory, then cpu, then network, then storage | 16:46 |
fungi | clarkb: some swift tests might too, since i know they create an auxiliary filesystem to be able to exercise extended xfs attributes | 16:46 |
clarkb | donnyd: that is great feedback | 16:47 |
clarkb | would it be worthwhile for us to have donnyd write a little section of notes we can tack onto https://docs.openstack.org/infra/system-config/contribute-cloud.html ? | 16:47 |
clarkb | a "from the provider perspective" notes? | 16:48 |
openstackgerrit | Merged openstack/project-config master: New project request: airship/porthole https://review.opendev.org/673898 | 16:48 |
corvus | yeah, the storage overprovisioning factor is a great new number to have | 16:48 |
*** markvoelker has quit IRC | 16:49 | |
donnyd | I need to find a way to find out what it needs from an iops, BW perspective as well to be optimal | 16:49 |
clarkb | donnyd: https://opendev.org/opendev/system-config/src/branch/master/doc/source/contribute-cloud.rst is the source file for that doc if you want to propose a change with your thoughts. I am happy to push a change with you too if you would rather we do ti that way (can collaborate on an etherpad if so) | 16:49 |
donnyd | https://usercontent.irccloud-cdn.com/file/UmYecpK0/Screenshot%20from%202019-08-05%2012-50-09.png | 16:50 |
*** ociuhandu has joined #openstack-infra | 16:50 | |
donnyd | Do we just want to add a notes section? | 16:51 |
donnyd | Also is it helpful to have public facing dashboards for FN, so you can capture the data from the provider side? | 16:52 |
clarkb | donnyd: maybe a "From a provider's perspective" section then you can format that in whatever manner makes the most sense | 16:53 |
clarkb | for dashboards I think that may also help but am not sure how easy it is for you to do that without also exposing important details | 16:53 |
*** jamesmcarthur has joined #openstack-infra | 16:54 | |
donnyd | There are no details that are important.. this is the only workload on here | 16:54 |
donnyd | I think the important details are probably going to provide valuable data for the underpinnings of the CI. The more we know, the more efficient we can make things. | 16:56 |
clarkb | ++ | 16:56 |
*** ricolin has quit IRC | 16:57 | |
donnyd | Just not sure what data we want to gather, but I have grafana hooked up to gnocchi... just don't know how to gnocchi quite yet | 16:57 |
*** kopecmartin is now known as kopecmartin|off | 16:59 | |
*** markvoelker has joined #openstack-infra | 17:01 | |
corvus | clarkb, fungi, mordred: mnaser told me about "openstack ec2 credentials create" so i tried that and plugged those into this script, but i still get an error: http://paste.openstack.org/show/755536/ -- any ideas of anything else we could try? | 17:01 |
clarkb | that makes me wonder if the ceph s3 api isn't tied into the openstack aws api support | 17:05 |
clarkb | ec2 credentials for example are a nova thing | 17:05 |
logan- | ^ typically that works, at least that's the way we connect s3 clients to our radosgw | 17:06 |
*** jamesmcarthur has quit IRC | 17:07 | |
*** udesale has quit IRC | 17:07 | |
corvus | that's good to know; i guess we'll wait to see what mnaser says | 17:08 |
*** bhavikdbavishi1 has joined #openstack-infra | 17:10 | |
*** bhavikdbavishi has quit IRC | 17:12 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 17:12 | |
fungi | that's an interesting entanglement, but i guess makes some sense | 17:12 |
*** jamesmcarthur has joined #openstack-infra | 17:12 | |
*** goldyfruit has quit IRC | 17:18 | |
*** goldyfruit_ has joined #openstack-infra | 17:18 | |
donnyd | can anyone hit https://grafana.fortnebula.com | 17:21 |
AJaeger | yes, I can, donnyd | 17:21 |
diablo_rojo | I can too | 17:22 |
donnyd | is there anything worthwhile on it? | 17:22 |
clarkb | and the openstack utilization dashboard renders for me too | 17:22 |
AJaeger | donnyd: nit, it's OpenStack with capital S ;) | 17:22 |
donnyd | refresh AJaeger | 17:22 |
AJaeger | max 75 % utilization... | 17:22 |
AJaeger | donnyd: nice trick ;) | 17:23 |
donnyd | LOL | 17:23 |
AJaeger | donnyd: interesting, compute-8 and compute-9 have very low cpu utilization compared with rest | 17:24 |
donnyd | Does anyone know how to actually use gnocchi... these are backend stats | 17:24 |
donnyd | compute-9 always will, it's the dedicated node for the mirror | 17:24 |
AJaeger | and compute-2 has more than average... | 17:24 |
AJaeger | ah, I see | 17:24 |
donnyd | compute-8 was placed into service early this morning | 17:24 |
donnyd | Would you want to see anything else... I was thinking maybe detailed dashboards per hypervisor or something like that maybe... I need to also gather front-end metrics | 17:26 |
donnyd | like instance related things | 17:26 |
clarkb | donnyd: aggregate network bw at the gateway/boundary would probably be helpful | 17:27 |
donnyd | Ok, I can do that | 17:28 |
clarkb | donnyd: in part because that might help us evaluate effectiveness of the mirrors/caching | 17:28 |
*** psachin has quit IRC | 17:28 | |
*** ociuhandu has quit IRC | 17:28 | |
*** ociuhandu has joined #openstack-infra | 17:29 | |
*** ociuhandu has quit IRC | 17:31 | |
*** ociuhandu has joined #openstack-infra | 17:31 | |
*** jamesmcarthur has quit IRC | 17:32 | |
donnyd | how about now clarkb | 17:32 |
donnyd | My local network requirements are usually pretty low, so that is 90% the CI | 17:33 |
clarkb | donnyd: edge bw graph lgtm | 17:34 |
donnyd | sweet... | 17:34 |
*** kjackal has joined #openstack-infra | 17:34 | |
*** ociuhandu has quit IRC | 17:35 | |
*** jamesmcarthur has joined #openstack-infra | 17:37 | |
*** sgw has quit IRC | 17:37 | |
*** jamesmcarthur_ has joined #openstack-infra | 17:39 | |
*** jamesmcarthur has quit IRC | 17:41 | |
*** jamesmcarthur has joined #openstack-infra | 17:42 | |
*** rpittau is now known as rpittau|afk | 17:42 | |
*** tdasilva has quit IRC | 17:43 | |
*** tdasilva has joined #openstack-infra | 17:43 | |
*** jamesmcarthur_ has quit IRC | 17:43 | |
Shrews | infra-root: I've put together a script to cleanup the leaked swift objects from image uploads. I'm not sure how to determine which ones might actually still be needed, so I thought I'd start by deleting any objects that have a last-modified timestamp older than 5 days, just out of safety and until I can figure out exactly where they're being leaked. Does this seem safe? | 17:44 |
clarkb | Shrews: I want to say mordred believes the swift objects from image uploads are always safe to delete as long as the upload isn't in progress | 17:45 |
clarkb | Shrews: so 5 days should be plenty to make sure we don't delete anything in progress | 17:45 |
Shrews | yeah, it was the "figure out which ones are in progress" that wasn't the easy part | 17:46 |
fungi | does glance just use them as an upload staging area? | 17:46 |
clarkb | fungi: exactly | 17:46 |
fungi | Shrews: pause the builders first? | 17:46 |
clarkb | I think we can start with the 5 day limit first to clean out 99% of the leaks | 17:46 |
clarkb | then ya turn off the builders and clean up the rest maybe | 17:46 |
Shrews | fungi: i don't think we have to do that yet with the 5 day limit | 17:46 |
fungi | if there are no image uploads in progress, then it would be safe to delete it all en masse | 17:46 |
Shrews | what clarkb said :) | 17:46 |
fungi | ahh, also an option, sure | 17:47 |
corvus | Shrews: yeah, i vote delete with 5 day limit, don't worry about pausing, then after we fix the leak, we can pause and delete all | 17:48 |
fungi | makes sense | 17:48 |
*** jamesmcarthur has quit IRC | 17:48 | |
corvus | (since we know we're going to continue leaking for a bit) | 17:48 |
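The age-based filter agreed on above can be sketched roughly like this (a hypothetical helper, not the actual script Shrews ran; Swift listings return `last_modified` as an ISO 8601 UTC string):

```python
from datetime import datetime, timedelta, timezone

def stale_objects(objects, max_age_days=5, now=None):
    """Yield names of Swift objects safe to delete by age.

    `objects` is an iterable of dicts with `name` and `last_modified`
    fields as returned by a Swift container listing, e.g.
    {"name": "...", "last_modified": "2019-07-30T12:00:00.000000"}.
    Anything modified within the last `max_age_days` is skipped so that
    in-progress image uploads are never touched.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    for obj in objects:
        modified = datetime.fromisoformat(
            obj["last_modified"]).replace(tzinfo=timezone.utc)
        if modified < cutoff:
            yield obj["name"]
```

Each yielded name would then be passed to a delete call; the five-day margin is what makes pausing the builders unnecessary.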
*** jamesmcarthur has joined #openstack-infra | 17:53 | |
Shrews | ok, it's running for dfw. might take a while | 17:55 |
*** ykarel|away has quit IRC | 17:56 | |
mordred | Shrews: my understanding is that we don't need any of them - as long as we're not actively importing the related image | 17:56 |
mordred | Shrews: having read the full scrollback now - yes, I agree with the above course of action :) | 17:57 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: render console in js with listview https://review.opendev.org/674663 | 17:59 |
*** sgw has joined #openstack-infra | 18:02 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: render console in js with listview https://review.opendev.org/674663 | 18:03 |
*** jamesmcarthur_ has joined #openstack-infra | 18:03 | |
*** e0ne has joined #openstack-infra | 18:05 | |
*** jamesmcarthur has quit IRC | 18:06 | |
Shrews | Seems that 'openstack object list images' does not change as the objects are deleted from the container. Yet the deletes are clearly working since I get a 404 if I try to delete one already processed. Maybe the 'list' data is cached? | 18:08 |
Shrews | oh, i bet it limits output to 10000 | 18:09 |
Shrews | which means we probably have WAAAY more objects than we think | 18:09 |
Shrews | because 'list' output changes, but not the total size of it | 18:10 |
Shrews | :( | 18:11 |
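Swift container listings return at most 10,000 names per request, which is why the output appeared not to shrink; a full listing has to follow markers. A minimal sketch (with a stand-in for the actual listing request):

```python
def list_all_objects(list_page, container):
    """Collect a complete Swift container listing by following markers.

    `list_page(container, marker)` stands in for one listing request
    (a GET on the container with `?marker=<last name seen>`); Swift
    caps each response at 10,000 names, so keep requesting pages until
    one comes back empty.
    """
    names = []
    marker = None
    while True:
        page = list_page(container, marker)
        if not page:
            return names
        names.extend(page)
        marker = page[-1]
```

The `openstack object list` CLI does the same marker-following internally only when asked for all pages, which matches the behavior observed here.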
*** jamesmcarthur has joined #openstack-infra | 18:13 | |
*** bhavikdbavishi has quit IRC | 18:13 | |
corvus | Shrews: i want to say that the web interface was like ~13000 objects per region | 18:13 |
corvus | assuming it's correct (i'm going out on a limb for it) | 18:14 |
*** jamesmcarthur_ has quit IRC | 18:14 | |
*** bhavikdbavishi has joined #openstack-infra | 18:14 | |
Shrews | ok, not much over 10k then | 18:14 |
corvus | fungi, clarkb, mordred: i have learned from our friends at ovh that swift doesn't return an allow-origin header unless the browser sends an origin header, so my test of ovh was faulty, ovh+swift should be working fine and we can go ahead and start using it. | 18:15 |
mordred | corvus: oh good! (because browsers will send the origin header) | 18:17 |
corvus | mordred: yeah, though apparently only with an xmlhttprequest, since just hitting the url in the browser and watching the network panel shows no headers | 18:18 |
corvus | fungi, clarkb, mordred, mnaser: repeating the test against vexxhost with an origin header, i do see an allow-origin: * header, but i also see allow-methods: HEAD... so that's confusing. | 18:18 |
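The header behavior being compared here can be expressed as a small check (a rough sketch of the manual inspection; real browser CORS handling is more involved, and simple GETs skip the preflight, so an echoed `Allow-Methods` may not matter in practice):

```python
def cors_allows_get(headers, origin):
    """Rough check of whether a GET from `origin` would pass CORS,
    given the response headers from a request that sent that Origin
    header. Swift only emits Access-Control-Allow-Origin when an
    Origin header was sent, which is what broke the earlier test."""
    allow_origin = headers.get("Access-Control-Allow-Origin", "")
    allow_methods = headers.get("Access-Control-Allow-Methods", "")
    if allow_origin not in ("*", origin):
        return False
    methods = [m.strip().upper() for m in allow_methods.split(",") if m.strip()]
    # an absent Allow-Methods header restricts nothing
    return not methods or "GET" in methods
```

With `Allow-Origin: *` but `Allow-Methods: HEAD`, a GET fails this check, which is the confusing vexxhost result above.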
mordred | Shrews: I *think* if we just delete the SLO manifest object we don't need to delete the subobjects | 18:18 |
mordred | Shrews: I do not know if that is useful to you or not | 18:18 |
Shrews | mordred: maybe if you explained the words? | 18:19 |
mordred | corvus: ah - yeah - because just opening it in a normal browser link is a thing you can just do | 18:19 |
mordred | Shrews: each of these images is a "Large Object" - which has an object in /images and then a bunch of smaller ones in (I think) /image_segments | 18:19 |
corvus | i think image_segments may have been empty when i looked? | 18:20 |
mordred | Shrews: I *believe* we only have to delete the manifest objects and the segment objects will get autodeleted for us | 18:20 |
mordred | neat | 18:20 |
mordred | then it's also possible I'm just high and shouldn't be listened to | 18:20 |
corvus | but Shrews was looking a different way and should probably confirm :) | 18:20 |
Shrews | mordred: i'm currently only deleting in 'images' so i'll check the other afterwards | 18:20 |
corvus | (because i'm *really* suspicious when the control panel says 0 objects in a container) | 18:20 |
Shrews | mordred: corvus: i currently don't see anything in image_segments | 18:21 |
clarkb | corvus: the s3 api allows you to do fine grained cors for request types too | 18:21 |
Shrews | err, images_segments, that is | 18:21 |
clarkb | corvus: was your vexxhost test via s3 or swift apis? | 18:22 |
corvus | clarkb: still no joy with s3, those files were put via swift apis | 18:22 |
corvus | oh, ha | 18:24 |
corvus | vexxhost returns whatever method you use to access it via Access-Control-Allow-Methods | 18:24 |
corvus | so, erm, i think we're just going to have to try this | 18:24 |
*** jamesmcarthur_ has joined #openstack-infra | 18:24 | |
clarkb | oh that should be fine because we are doing GETs from the web ui | 18:25 |
corvus | yeah, as long as the options response makes sense | 18:26 |
corvus | that's the thing i'm not sure how to simulate | 18:26 |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur has joined #openstack-infra | 18:28 | |
*** betherly has joined #openstack-infra | 18:30 | |
*** weifan has quit IRC | 18:30 | |
*** jamesmcarthur_ has quit IRC | 18:30 | |
*** tosky has quit IRC | 18:32 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:34 | |
*** betherly has quit IRC | 18:34 | |
*** smarcet has joined #openstack-infra | 18:35 | |
cloudnull | anyone around mind giving this a look - https://review.opendev.org/#/c/674414/7/zuul.d/molecule.yaml - seems this change is resulting in a zuul configuration validation failure, and I'm not really sure why? | 18:35 |
cloudnull | keeps raising "Unknown configuration error" | 18:36 |
*** jamesmca_ has joined #openstack-infra | 18:36 | |
*** jamesmcarthur has quit IRC | 18:36 | |
clarkb | cloudnull: I can take a look | 18:37 |
cloudnull | thanks clarkb! | 18:37 |
mordred | cloudnull: do you have hide-ci turned on? because there's a nice clear error message from zuul on that | 18:38 |
clarkb | hideci should no longer hide that message fwiw | 18:38 |
mordred | awesome | 18:39 |
cloudnull | no. | 18:39 |
*** jamesmcarthur_ has quit IRC | 18:39 | |
* cloudnull not that I'm aware of | 18:39 |
fungi | and it's "toggle extra ci" now | 18:39 |
mordred | cloudnull: weird. anywho - I'm going to guess it's https://review.opendev.org/#/c/674414/7/zuul.d/playbooks/releasenotes/notes/docker_enable_vfs-c8b41b02111341df.yaml | 18:39 |
mordred | which zuul is attempting to parse as zuul config | 18:39 |
mordred | cloudnull: oh - I see the unknown config error you mentioned | 18:39 |
clarkb | mordred: ya because that isn't a list but instead a dict | 18:39 |
mordred | cloudnull: there's a better error message on ps7 | 18:40 |
cloudnull | its the release note ? | 18:40 |
mordred | cloudnull: maybe you were in the wrong dir when you ran reno new? | 18:41 |
cloudnull | that very well could be... | 18:41 |
* cloudnull walks away in shame | 18:41 | |
*** jamesmca_ has quit IRC | 18:41 | |
clarkb | vos release on fedora mirror is still running if anyone is wondering | 18:41 |
mordred | cloudnull: it happens to the best of us - and also to me :) | 18:42 |
cloudnull | thanks clarkb mordred | 18:42 |
clarkb | I wonder if I should rerun the sync script by hand and rerun vos release again just to make sure it comes in under the timeout normally | 18:44 |
*** tdasilva has quit IRC | 18:44 | |
clarkb | this update did remove a bunch of data from the volume so not suprising it is slow | 18:44 |
*** tdasilva has joined #openstack-infra | 18:45 | |
*** e0ne has quit IRC | 18:45 | |
*** jamesmcarthur has joined #openstack-infra | 18:46 | |
*** smarcet has quit IRC | 18:46 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:48 | |
*** spsurya has quit IRC | 18:49 | |
*** jamesmcarthur has quit IRC | 18:50 | |
*** jamesmcarthur has joined #openstack-infra | 18:53 | |
*** jamesmcarthur_ has quit IRC | 18:55 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars https://review.opendev.org/667698 | 18:58 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars https://review.opendev.org/667698 | 19:00 |
*** jamesmcarthur has quit IRC | 19:01 | |
*** jamesmcarthur_ has joined #openstack-infra | 19:01 | |
mordred | infra-root: afk for just a bit | 19:01 |
*** bhavikdbavishi has quit IRC | 19:04 | |
*** lpetrut has quit IRC | 19:05 | |
*** jamesmcarthur_ has quit IRC | 19:06 | |
*** weifan has joined #openstack-infra | 19:08 | |
*** igordc has joined #openstack-infra | 19:08 | |
*** Goneri has joined #openstack-infra | 19:08 | |
*** jamesmcarthur has joined #openstack-infra | 19:12 | |
*** weifan has quit IRC | 19:12 | |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New project request: airship/kubernetes-entrypoint https://review.opendev.org/673900 | 19:13 |
*** e0ne has joined #openstack-infra | 19:16 | |
*** jamesmcarthur has quit IRC | 19:17 | |
*** jamesmcarthur has joined #openstack-infra | 19:19 | |
*** jamesmcarthur has quit IRC | 19:24 | |
*** iurygregory has quit IRC | 19:24 | |
*** diablo_rojo has quit IRC | 19:25 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars https://review.opendev.org/667698 | 19:26 |
*** jamesmcarthur has joined #openstack-infra | 19:31 | |
*** e0ne has quit IRC | 19:31 | |
* clarkb finds lunch | 19:31 | |
*** smarcet has joined #openstack-infra | 19:33 | |
*** jamesmcarthur_ has joined #openstack-infra | 19:36 | |
*** jamesmcarthur has quit IRC | 19:37 | |
*** jamesmcarthur has joined #openstack-infra | 19:38 | |
*** jamesmcarthur_ has quit IRC | 19:40 | |
*** betherly has joined #openstack-infra | 19:41 | |
*** Goneri has quit IRC | 19:42 | |
*** Goneri has joined #openstack-infra | 19:43 | |
*** betherly has quit IRC | 19:45 | |
*** tdasilva has quit IRC | 19:45 | |
*** tdasilva has joined #openstack-infra | 19:46 | |
*** jamesmcarthur has quit IRC | 19:48 | |
openstackgerrit | Merged openstack/project-config master: New project request: airship/kubernetes-entrypoint https://review.opendev.org/673900 | 19:54 |
*** weifan has joined #openstack-infra | 19:59 | |
*** weifan has quit IRC | 20:03 | |
*** Goneri has quit IRC | 20:07 | |
*** harlowja has joined #openstack-infra | 20:08 | |
mnaser | corvus: ill try to dig a little bit and see whats up with the "405 Method Not Allowed" | 20:09 |
mnaser | also, is storyboard-dev.openstack.org having issues? https://storyboard-dev.openstack.org/api/v1/stories?limit=10&project_id=237&sort_dir=desc&status=active takes almost 10 seconds | 20:10 |
fungi | looking now | 20:10 |
fungi | interestingly, system load on it is nearly nonexistent | 20:13 |
*** Goneri has joined #openstack-infra | 20:13 | |
fungi | the vast majority of memory utilization is buffers/cache too | 20:14 |
fungi | mnaser: watching top, it seems that's the amount of time it takes a mysql thread to return the result of the corresponding database queries | 20:16 |
*** jcoufal has quit IRC | 20:17 | |
*** michael-beaver has quit IRC | 20:18 | |
*** betherly has joined #openstack-infra | 20:22 | |
*** betherly has quit IRC | 20:27 | |
*** tdasilva has quit IRC | 20:27 | |
*** tdasilva_ has joined #openstack-infra | 20:27 | |
*** weifan has joined #openstack-infra | 20:27 | |
*** jamesmcarthur has joined #openstack-infra | 20:27 | |
*** guoqiao has joined #openstack-infra | 20:41 | |
*** prometheanfire has quit IRC | 20:41 | |
mnaser | i guess maybe some indexes are missing, i dunno | 20:42 |
* mnaser shrugs | 20:42 | |
*** prometheanfire has joined #openstack-infra | 20:43 | |
*** betherly has joined #openstack-infra | 20:43 | |
mnaser | it makes dealing with this pretty slow | 20:43 |
mnaser | https://storyboard.openstack.org/#!/project/openstack/governance | 20:43 |
*** jamesmcarthur has quit IRC | 20:47 | |
*** betherly has quit IRC | 20:47 | |
*** jamesmcarthur has joined #openstack-infra | 20:48 | |
*** jamesmcarthur has quit IRC | 20:48 | |
*** jamesmcarthur has joined #openstack-infra | 20:48 | |
donnyd | Do we know what the jobs are reaching out to the interwebs for? I could just sync what the mirror cannot down locally to speed up many things | 20:49 |
donnyd | mnaser: What do you have the nodepool tenants volume limit set at for vexxhost? | 20:51 |
donnyd | i should say volume quota? | 20:52 |
clarkb | donnyd: the idea is that they shouldn't hit the internet for anything (well not without going through the mirror at least) but it is a general purpose ci system and we know people don't use the mirror for everything so we keep that option available | 20:53 |
*** Goneri has quit IRC | 20:53 | |
donnyd | Well I ask because I was thinking I may try to get on the official mirror list for the major distros so when it goes "upstream", that would be local too | 20:54 |
fungi | mnaser: yes, we've got mysql slow query logging turned on for the production storyboard to try and figure out where the hot spots are and which queries are in most need of refactoring. there are some really egregious ones | 20:54 |
clarkb | donnyd: oh in many cases it is github repos | 20:54 |
mnaser | donnyd: i am not sure, for nodepool i think i have 80*number_of_nodes*small% | 20:54 |
clarkb | donnyd: for random projects like puppet and ansible modules or golang repos | 20:55 |
mnaser | donnyd: you know what'd be neat? but also i dont know the security implication of it, but running a transparent squid proxy at your gateway | 20:55 |
donnyd | oh, well i can't do much about that. I could cache locally with the squids or something like that.. but I would be worried it would interfere with what is upstream and | 20:55 |
mnaser | and then generate stats from that only (not cache) | 20:55 |
mnaser | *just* to help us find what are the big things that are being hit | 20:56 |
donnyd | I can surely setup a squid that would just miss on everything and tell us where they are going | 20:56 |
donnyd | mnaser: As long as the squid doesn' | 20:58 |
donnyd | doesn't have any hits, it shouldn't affect the jobs. right? | 20:58 |
donnyd | hit the enter too early | 20:58 |
mnaser | i dont think so, it'll just be a pass through, but don't quote me on that :) | 20:59 |
ianw | Shrews: you've possibly finished by now, but I had a script to maybe do a similar cleanups to what you've described above @ -> https://review.opendev.org/#/c/562510/1/tools/rax-cleanup-image-uploads.py | 20:59 |
donnyd | clarkb: fungi any thoughts on that? | 20:59 |
ianw | looks like that never got a second +2 | 20:59 |
ianw | infra-root: could i get a couple of eyes on https://review.opendev.org/#/c/674550/ and the dns change https://review.opendev.org/#/c/674549/ to get the ansible-ised backup server started please? thanks | 21:01 |
ianw | it won't have any clients yet, one thing at a time :) | 21:02 |
corvus | infra-root: i have paused the ansible run_all script for a bit | 21:02 |
clarkb | donnyd: the reason we haven't set up squid ourselves is that we can't know source ranges for all our possible VM IPs to set up whitelists for access to that. We don't want open transparent proxy on the internet | 21:02 |
donnyd | Not sure how I will man-in-the-middle github though | 21:02 |
clarkb | donnyd: right that was going to be my next concern. I think if you did it at the cloud level it would have to be transparent | 21:03 |
donnyd | well I know the range, because I assigned it | 21:03 |
clarkb | in which case https wouldn't work | 21:03 |
*** betherly has joined #openstack-infra | 21:03 | |
clarkb | donnyd: ya in your cloud that particular problem is probably fine. In most other clouds it is a bigger issue (since we pull from large pools of addrs that are recycled) | 21:03 |
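For the known-range case being described, a log-only squid restricted to the nodepool addresses might look roughly like this (a sketch; the address range and paths are placeholders, not FN's real values):

```
# /etc/squid/squid.conf (sketch)
# only the nodepool tenant range may use the proxy; everyone else is denied,
# so this is not an open proxy on the internet
acl ci_nodes src 203.0.113.0/24
http_access allow ci_nodes
http_access deny all

# log-only operation: never serve from cache, just record what was fetched
cache deny all
access_log /var/log/squid/access.log squid
```

With `cache deny all`, every request is a miss and passes straight through, which matches donnyd's "miss on everything and tell us where they are going" idea without risking stale content affecting jobs.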
corvus | if all the clouds support neutron we might be able to set up a private network | 21:03 |
clarkb | corvus: I don't think ovh does | 21:04 |
donnyd | well we only need one cloud to gets the datas | 21:04 |
*** mattw4 has quit IRC | 21:04 | |
*** mattw4 has joined #openstack-infra | 21:04 | |
donnyd | I think our options are pretty limited for https proxy. You would have to create a CA the image trusts and then pass it off to my squid and then figure out what breaks | 21:05 |
Shrews | ianw: oh, i wasn't aware of your script. mine is much more basic. i've had to pause my deletes for a bit but i'll look at yours. If we fix the root problem, we shouldn't need to keep a script around in system-config. | 21:05 |
donnyd | or force instances to not care and trust all | 21:05 |
*** diablo_rojo has joined #openstack-infra | 21:06 | |
donnyd | I am no squid expert, btw | 21:07 |
donnyd | just know enough to break things | 21:07 |
clarkb | donnyd: ya to do https caching you essentially set up a mitm | 21:07 |
clarkb | (which is maybe an option for us, I'll have to think about that a bit more) | 21:07 |
*** eharney has quit IRC | 21:07 | |
donnyd | well I don't actually need to do the caching part (while it won't bother me to), just find out where the connections are going to. | 21:08 |
ianw | Shrews: yeah, from my notes, at the time it cleared out 200+TB ... maybe a lot of that was de-duped, but it was *a lot* | 21:08 |
*** betherly has quit IRC | 21:08 | |
donnyd | I can also turn logging up for the edge fw and find out who is talking to who... | 21:08 |
clarkb | donnyd: oh for that tcpdump on your gateway may be sufficient. You can tcpdump filtering out the mirror (since we want traffic to go through it) then see what everything else is pinging | 21:08 |
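That gateway capture could look something like the following (a sketch; the interface name and addresses are placeholders, with 203.0.113.0/24 standing in for the nodepool range and 198.51.100.10 for the mirror):

```
# capture outbound web traffic from the nodepool range that is NOT
# going via the mirror, for later analysis with e.g. ntop or wireshark
tcpdump -ni eth0 -w leaks.pcap \
  'src net 203.0.113.0/24 and not host 198.51.100.10 and (port 80 or port 443)'
```

The resulting pcap's destination addresses are exactly the "where else are jobs going" data without any MITM or TLS complications.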
fungi | it would be possible (i'm not saying it's worth the effort involved) to set up a private ca for infra which signs certificates for a list of specific dns names and then distribute those to transparent proxies in each provider and set our images to add them to the appropriate trust lists | 21:08 |
donnyd | but won't the client still complain about the mitm | 21:09 |
*** weshay is now known as weshay_dentist | 21:09 | |
fungi | not if the cert provided by the proxy is trusted by the client | 21:09 |
donnyd | Ok, then that could work | 21:09 |
fungi | that's the "set our images to add them to the appropriate trust lists" part | 21:09 |
donnyd | I can easily lie in dns to tell it who github.com is | 21:10 |
fungi | the other trick would be working out how to make only requests for specific domains go through them | 21:10 |
fungi | oh, yeah that would do it | 21:10 |
donnyd | DNS attack | 21:10 |
donnyd | LOL | 21:10 |
fungi | we could also have our own dns overrides which substitute the proxy's address for those names | 21:10 |
donnyd | surely could | 21:11 |
donnyd | I would just have to update the tenant's network to point to your dns | 21:11 |
donnyd | You could also get the who is asking for what from there too | 21:11 |
fungi | so clients know github.com as the ip address of whatever the local transparent proxy is in that provider, and are configured to trust the infra ca which signed the github.com cert we installed on the proxy | 21:11 |
donnyd | correct | 21:12 |
fungi | well, we also install local caching dns recursive resolver/forwarder on every node, so could simply carry the overrides there | 21:12 |
donnyd | and you get to know where else they are going by looking in the dns logs for who looked up what | 21:12 |
donnyd | all you would require from the cloud providers is an updated dns for the nodepool network | 21:13 |
fungi | we don't actually trust/use the complimentary recursive resolvers in our providers anyway because they tend to be unreliable for a number of reasons | 21:13 |
donnyd | right now its google, but that could easily be you | 21:13 |
fungi | (most fun one being rackspace, whose resolvers have threat mitigation to automatically blacklist queries from the ip addresses of misbehaving server instances, but the blacklist doesn't get updated as quickly as they turn those addresses over to new instances) | 21:14 |
clarkb | ya so we use google and cloudflare dns with local unbound caching results they return | 21:15 |
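That per-node unbound setup, plus the DNS-override idea from earlier in the discussion, would look roughly like this in `unbound.conf` (a sketch; 203.0.113.10 is a placeholder for a hypothetical local transparent proxy, and the override stanza is only the speculative part of the plan):

```
server:
  # speculative override: answer github.com with the local proxy's address
  local-zone: "github.com." redirect
  local-data: "github.com. A 203.0.113.10"

# everything else is forwarded to the public resolvers and cached locally,
# avoiding the unreliable provider resolvers mentioned above
forward-zone:
  name: "."
  forward-addr: 8.8.8.8
  forward-addr: 1.1.1.1
```

Query logging on such a resolver would also yield the "who looked up what" data donnyd mentions, independent of any proxying.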
donnyd | So your mirror would need a squid and dns server | 21:16 |
jrosser | deploying openstack where a bunch of the resources needed come from git repos / mirrors / proxies / blah with a company CA rather than a public one took a ton of work | 21:16 |
donnyd | and some certs, then populate those certs (we already know github.com), but there may be other places | 21:16 |
jrosser | specifically the hoops that python requests / certifi make you jump through | 21:16 |
clarkb | jrosser: good reminder re requests | 21:16 |
donnyd | you could also prevent the I am gonna pull from the public centos mirrors anyway behaviors too | 21:17 |
clarkb | it ships its own ca trust chain | 21:17 |
jrosser | yep, it's a PITA | 21:17 |
jrosser | so i don't think this is a set-and-forget without fixing up the actual jobs | 21:17 |
clarkb | ya there will be fallout from that for sure | 21:18 |
*** jamesmcarthur has quit IRC | 21:18 | |
donnyd | well a good start is just doing dns, because then you are going to definitively know where the tests are going | 21:18 |
donnyd | maybe not speed anything up (except dns requests), but surely get valuable data | 21:19 |
fungi | donnyd: wouldn't necessarily have to be squid. the apache mod_proxy with proxy caching we already have in place would be capable, in theory | 21:19 |
*** mriedem has quit IRC | 21:19 | |
*** jamesmcarthur has joined #openstack-infra | 21:20 | |
donnyd | yea any proxy could work. | 21:20 |
jrosser | i do have a sort of related example | 21:21 |
*** mriedem has joined #openstack-infra | 21:21 | |
jrosser | i suffer / enjoy / deal with http proxies all day and to ensure good support i made a CI job with one | 21:21 |
jrosser | https://logs.opendev.org/21/672721/1/gate/openstack-ansible-deploy-aio_proxy-ubuntu-bionic/130b0f8/logs/host/squid/access.log.txt.gz | 21:21 |
*** rh-jelabarre has quit IRC | 21:23 | |
donnyd | So in this job, the CI build starts and sends traffic through this proxy? | 21:23 |
jrosser | pretty much, yes | 21:23 |
*** yamamoto has joined #openstack-infra | 21:23 | |
fungi | that's a neat design | 21:24 |
jrosser | but it's really trying to be a mock up of an environment where the deployment is behind a proxy | 21:24 |
jrosser | and as there isnt a full http proxy as part of the infra i made my own in the job | 21:24 |
*** jamesmcarthur has quit IRC | 21:25 | |
*** yamamoto has quit IRC | 21:28 | |
*** mattw4 has quit IRC | 21:28 | |
donnyd | So where should we start? I can get working on the logging thing after I am done with work today | 21:29 |
*** mattw4 has joined #openstack-infra | 21:29 | |
donnyd | I need to get central logging up and running anyways | 21:29 |
*** markvoelker has quit IRC | 21:29 | |
clarkb | jrosser: at a job level it is easier to do that because you know all of the source IPs related to the job at runtime | 21:29 |
fungi | what is it you want to do again? measure what the network traffic to nodes is that doesn't get proxied/cached? | 21:30 |
jrosser | also a full http proxy is different to the transparent one you have been discussing, so not quite the same thing | 21:30 |
jrosser | but anyway, it's really just as an example and a maybe representative set of squid log to look at | 21:31 |
donnyd | fungi: Yea, just see where the hotspots are in the public side and then figure out a way to get that service faster access | 21:32 |
*** mattw4 has quit IRC | 21:32 | |
donnyd | If its 90% github.com, then we can solve that a bunch of ways, if its random interwebs there might not be anything that can be done | 21:32 |
donnyd | but right now we have very little data other than the traffic graph from my wan side (which is 90% these instances). Not that I mind using the public internet for things, but more in the interest of what can be done to speed it up | 21:33 |
donnyd | If we could do something, we would need a way to make sure that it does in fact make things faster and not slower | 21:35 |
fungi | simple ip address analyses (netflow, tcpdump, whatever) might yield some insights | 21:35 |
fungi | (ntop) | 21:35 |
donnyd | like case in point, the package mirrors. I could have local mirrors that would make access to .rpm and .deb content orders of magnitude faster | 21:35 |
fungi | part of the challenge is selectively depending on resources like that hosted by a provider since we don't have it as a consistent option everywhere | 21:37 |
*** mattw4 has joined #openstack-infra | 21:37 | |
fungi | is rpm and deb content on our mirror host not fast (once the afs cache is warmed)? | 21:37 |
donnyd | Not as fast as I could provide it | 21:39 |
donnyd | I can cache 100% of the content downloaded locally | 21:41 |
donnyd | Only so much can be done in 8GB | 21:41 |
donnyd | yea and that makes sense from my perspective. Consistency is more important than speed at scale | 21:42 |
clarkb | donnyd: so there are a couple things about the linux distro mirrors that are worth pointing out | 21:43 |
clarkb | the first is that they cache to disk not memory so the actual afs disk cache size is much larger than 8GB | 21:43 |
clarkb | second is that through the use of afs we synchronize all of our package mirrors globally | 21:44 |
clarkb | this means you don't have jobs passing or failing due to local state (which is nice) | 21:44 |
*** betherly has joined #openstack-infra | 21:44 | |
donnyd | Oh I c, I thought AFS was caching everything in memory. | 21:44 |
clarkb | and finally we build the mirrors very carefully (ubuntu/debian at least) and only publish them when we have a valid mirror ready as well as not deleting older content until some time has passed to avoid jobs failing because files disappear out from under them | 21:44 |
clarkb | I think the rpm rsync mirrors already handle the deletion phase out but also the way tools like yum work means it is far less likely you'll want to download older files compared to say apt | 21:45 |
fungi | yeah, and avoiding mismatched package/index combinations | 21:45 |
fungi | and afs provides atomic volume updates | 21:46 |
fungi | so the data in a given package repository isn't changing while it's being read, which helps maintain consistency | 21:46 |
donnyd | I am not arguing the value of AFS, it makes sense to me. | 21:46 |
donnyd | But here is an example https://logs.opendev.org/87/674687/3/check/tripleo-ci-centos-7-scenario001-standalone/ce1d8a5/job-output.txt#_2019-08-05_20_34_33_056272 | 21:47 |
clarkb | looks like we are set to use 50GB of afs disk cache and fn mirror is using about 8GB currently | 21:47 |
fungi | except at given update checkpoints, where we ensure we don't delete packages referenced by the previous index state on that pulse | 21:47 |
donnyd | On the rest of my network packagey things are about 5x that speed | 21:47 |
clarkb | donnyd: so I did a test of warm cache data from afs on the mirror itself and I get super speed (I forget the exact number but it was silly) | 21:47 |
clarkb | donnyd: I think we need to test the network bw between VMs? | 21:48 |
clarkb | because testing shows the afs cache works and it is fast, but then pull from off host is slower | 21:48 |
clarkb | (which means maybe something between the hosts is slowing it down) | 21:48 |
*** jamesmcarthur has joined #openstack-infra | 21:49 | |
*** betherly has quit IRC | 21:49 | |
donnyd | we should do that and figure out where the bottleneck is. Could be the router between the mirror and the clients | 21:49 |
donnyd | I did just upgrade it so I know for sure it will move at wireline 10G | 21:49 |
donnyd | but its worth a look see. | 21:49 |
donnyd | the mirror here should be ludicrous fast. It shares nothing and is on the fastest equipment you can get: a very new cpu / ram / nvme | 21:51 |
clarkb | ok lets warm a file up then we can request it from elsewhere and see what it looks like | 21:51 |
donnyd | Anyways, I think i should probably work on getting the logging stuffs up and running and then we can work on the host - mirror thing | 21:52 |
donnyd | sure | 21:52 |
donnyd | I think we did BW tests and were able to get quite respectable speeds a few weeks back. I think fungi and I were working on it | 21:53 |
clarkb | donnyd: http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso I am going to warm that up. It is 500MB ish | 21:53 |
clarkb | as expected, cold is not fast. ~1MBps | 21:54 |
clarkb | so will be a few before we can do second request and see what it is like warm | 21:54 |
donnyd | is there a way to make those cold requests go faster? Or do they go back to something central infra owns? | 21:55 |
clarkb | (the slowness on cold access is due to how afs does window sizing of its udp packets, unfortunately I don't think that is very fixable, at least not without replacing afs) | 21:55 |
clarkb | donnyd: ya we have central fileservers and they serve the data and they do it with a max window size that was set back in the 80s? and it is much smaller than modern networking would prefer | 21:55 |
*** jamesmcarthur has quit IRC | 21:55 | |
clarkb | it predates tcp so afs does its own windowing with udp | 21:56 |
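The fixed-window behavior clarkb describes can be sketched with simple bandwidth-delay arithmetic: if the protocol only allows one window of packets in flight per round trip, throughput is capped at window size divided by RTT. The window of 32 packets of ~1400 bytes and the 45 ms RTT below are illustrative assumptions, not values taken from the actual OpenAFS deployment being discussed.

```python
# Rough bandwidth-delay model for a fixed-window protocol like AFS's Rx.
# Window size (32 packets x ~1400 bytes) and RTT (45 ms) are assumed,
# illustrative values only.

def max_throughput(window_packets, packet_bytes, rtt_seconds):
    """Max bytes/sec when at most one window can be in flight per RTT."""
    return window_packets * packet_bytes / rtt_seconds

bw = max_throughput(32, 1400, 0.045)
print(f"{bw / 1e6:.2f} MB/s")  # → 1.00 MB/s
```

With a small fixed window, adding link bandwidth does not help; only a shorter RTT (or a bigger window) would, which is consistent with the ~1 MB/s cold-fetch numbers seen in this log.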
*** whoami-rajat has quit IRC | 21:58 | |
donnyd | lmk when to test on my end. Desktop has the 10g's, so I should get some respectable numbers | 21:59 |
clarkb | will do | 21:59 |
*** mattw4 has quit IRC | 22:02 | |
*** mattw4 has joined #openstack-infra | 22:03 | |
*** jtomasek has quit IRC | 22:04 | |
clarkb | donnyd: cold: 507.00M 931KB/s in 10m 23s warm: 507.00M 456MB/s in 1.1s | 22:04 |
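As a sanity check on the figures clarkb reports, the wall-clock averages can be recomputed directly. Note wget's own rate (931 KB/s) is averaged over transfer intervals, so the simple size/time figure below comes out slightly lower.

```python
# Back-of-the-envelope comparison of the cold vs. warm fetches above.
size_mb = 507.0

cold_seconds = 10 * 60 + 23   # 10m 23s
warm_seconds = 1.1

cold_rate = size_mb / cold_seconds   # MB/s
warm_rate = size_mb / warm_seconds   # MB/s

print(f"cold: {cold_rate:.2f} MB/s, warm: {warm_rate:.0f} MB/s, "
      f"speedup: {warm_rate / cold_rate:.0f}x")
# → cold: 0.81 MB/s, warm: 461 MB/s, speedup: 566x
```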
clarkb | donnyd: you should test it from your desktop now | 22:05 |
*** betherly has joined #openstack-infra | 22:05 | |
donnyd | (zuul) donny@office:~> curl -o /dev/null http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso | 22:05 |
donnyd | % Total % Received % Xferd Average Speed Time Time Time Current | 22:05 |
donnyd | Dload Upload Total Spent Left Speed | 22:05 |
donnyd | 100 507M 100 507M 0 0 292M 0 0:00:01 0:00:01 --:--:-- 291M | 22:05 |
clarkb | now I guess I should ssh into a test node and give it a test from there | 22:06 |
*** rcernin has joined #openstack-infra | 22:06 | |
donnyd | Oh you know what else I should put on the public graph is mirror node bw | 22:07 |
clarkb | donnyd: 507.00M 295MB/s in 1.7s from one of our bionic test nodes | 22:07 |
clarkb | the mystery deepens | 22:07 |
*** smarcet has quit IRC | 22:07 | |
clarkb | maybe that points at our cache not being as warm as we often think it is? | 22:08 |
clarkb | ianw: ^ you've been poking at afs logs recently. Any idea if we can better quantify that? | 22:08 |
donnyd | so that number makes sense to me because my edge fw will only do 3.5GB on a single thread | 22:08 |
*** betherly has quit IRC | 22:09 | |
corvus | infra-root: i have re-enabled the run-all crontab | 22:12 |
clarkb | corvus: ty | 22:12 |
ianw | clarkb: not really, sorry. i've been working on a little speed test type thing to send to grafana to get openafs/kafs comparisons, not ready just yet | 22:12 |
donnyd | ok, now you can see the BW for the mirror node from the host view | 22:13 |
mordred | clarkb: ++ | 22:14 |
donnyd | Does that graph help? | 22:17 |
clarkb | donnyd: ya that likely gives us something we can cross check for over/under loading to at least narrow it down | 22:17 |
* donnyd dinnering | 22:21 | |
clarkb | I think the next thing for us to investigate is if those slower than expected downloads are cache misses | 22:21 |
*** tosky has joined #openstack-infra | 22:26 | |
*** diablo_rojo has quit IRC | 22:28 | |
*** diablo_rojo has joined #openstack-infra | 22:29 | |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Update swift upload credentials https://review.opendev.org/674709 | 22:29 |
clarkb | fedora mirror vos release finally completed. I am going to rerun the script manually and vos release manually again just to be sure it runs in a reasonable amount of time | 22:30 |
corvus | infra-root: a speedy review of that ^ would help me progress testing | 22:30 |
mordred | corvus: looks great | 22:39 |
*** rlandy is now known as rlandy|bbl | 22:42 | |
openstackgerrit | Merged opendev/base-jobs master: Update swift upload credentials https://review.opendev.org/674709 | 22:45 |
donnyd | clarkb: the cache misses are usually much slower, I found one that was on the quicker side | 22:45 |
clarkb | donnyd: so we have an intermediate speed too between cold and hot cache? | 22:46 |
donnyd | I have yet to see anything close to what we can see in hot cache on any instance | 22:46 |
*** markvoelker has joined #openstack-infra | 22:46 | |
*** tosky has quit IRC | 22:47 | |
donnyd | Here is a better view of what is most likely not cold, but surely not showing the hot numbers we are getting | 22:47 |
donnyd | https://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_15_560505 | 22:47 |
*** jamesmcarthur has joined #openstack-infra | 22:48 | |
clarkb | hrm ya 27MB/s is well above cold numbers but like 1/10th hot numbers | 22:48 |
clarkb | rsync took about 16 minutes for fedora and I've started the vos release | 22:49 |
donnyd | I do have a faster nic to drop into my fw sometime in the near future, should get numbers up even higher.. but it doesn't look to me like I can really do anything to make it faster | 22:52 |
clarkb | I wonder if the intermediate speed could be due to a mix of hot and cold files? | 22:53 |
clarkb | yum is requesting a whole set then depending on which are warm and not you get a different aggregate speed? probably still a good idea to check afs cache hit/miss rate | 22:54 |
donnyd | granted it only took 3 seconds from flash to bang on the download, so I am not sure that doing something would make a marked improvement on build times | 22:54 |
donnyd | that part started here | 22:54 |
donnyd | https://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_11_871635 | 22:54 |
donnyd | But one would think that if its in cache you would see speeds more like our test | 22:56 |
*** tkajinam has joined #openstack-infra | 22:56 | |
*** eernst has joined #openstack-infra | 22:57 | |
fungi | throughput for one large file is going to be markedly different than serialized throughput for lots of small files each with its own request | 22:57 |
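fungi's point about serialized small-file fetches can be illustrated with a toy model: a fixed per-request overhead that is negligible for one large file dominates when spread over thousands of packages. All numbers below (link rate, overhead, file counts) are made up for illustration.

```python
# Toy model: why many small serialized requests yield lower aggregate
# throughput than one large transfer on the same link. Illustrative only.

def aggregate_rate(total_mb, n_files, per_request_overhead_s, link_mbps):
    """Effective MB/s when each file adds a fixed request overhead."""
    transfer_s = total_mb / link_mbps
    total_s = transfer_s + n_files * per_request_overhead_s
    return total_mb / total_s

link = 300.0  # MB/s, roughly the warm single-file rate seen above

one_big = aggregate_rate(500, 1, 0.05, link)
many_small = aggregate_rate(500, 2000, 0.05, link)  # 2000 packages, 50 ms each

print(f"{one_big:.0f} MB/s vs {many_small:.1f} MB/s")  # → 291 MB/s vs 4.9 MB/s
```

This is one plausible explanation for the intermediate ~27 MB/s numbers: warm data served quickly, but aggregate speed dragged down by per-file request overhead.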
donnyd | any recommendations on log aggregation tools? looks like E(L/F)K is pretty much the standard. Just curious | 22:58 |
clarkb | corvus: ianw http://docs.openafs.org/Reference/1/xstat_cm_test.html is that the command we want to run to get info about cache hit/miss rates? | 22:58 |
clarkb | the stat cache is separate from the data cache | 23:00 |
* clarkb looks to see what our statcache size is | 23:00 | |
fungi | donnyd: or old school it with rsyslog on a mysql backend | 23:02 |
corvus | clarkb: i'm unfamiliar with that command | 23:02 |
clarkb | getcacheparms seems to only show dcache entries? | 23:02 |
clarkb | corvus: do you know how to get info on the stat cache? | 23:02 |
*** eernst has quit IRC | 23:02 | |
corvus | no | 23:02 |
donnyd | fungi: which do you think requires the lowest level of maintenance | 23:02 |
donnyd | I was looking a graylog just because it does some inline transforms I would think I could make use of | 23:03 |
clarkb | docs say default stats cache size is 300 on unix machines | 23:03 |
clarkb | trying to see what we are actually set to /me digs around in config files | 23:03 |
*** aaronsheffield has quit IRC | 23:03 | |
fungi | donnyd: once set up, neither requires a considerable amount of maintenance. elk is a lot more resource-heavy, but has all your log analysis and indexing without needing separate analysis tools | 23:03 |
*** diablo_rojo has quit IRC | 23:03 | |
fungi | when i hear "log aggregation" i don't immediately assume log analysis too | 23:04 |
donnyd | fungi: I want to push the logs public too, but I don't particularly want to hand someone keys to the network | 23:04 |
donnyd | So something I could easily do a transform on would be baller | 23:05 |
fungi | kibana can be public facing. exposing elasticsearch to the open internet on the other hand is dangerific | 23:05 |
*** mattw4 has quit IRC | 23:06 | |
*** mattw4 has joined #openstack-infra | 23:06 | |
clarkb | afsd manpage says the -stat value is actually based on the size of the dcache | 23:06 |
donnyd | LOL, of course I just meant a dashboard.. not handing over my monitoring stuffs to anyone either, but dashboards are fairly harmless | 23:06 |
fungi | but that raises another hidden goal which mere "log aggregation" doesn't cover in my book. publication is something kibana handles for you. if you wanted a web frontend to your mysql tables that would be yet another something you'd need to find/write | 23:07 |
clarkb | I now believe that line 3 in http://paste.openstack.org/show/755542/ may expose the stat cache | 23:07 |
clarkb | we set the dcache to 50GB and from that afs decides we should be able to cache 1.5 million stat entries | 23:07 |
clarkb | we actually use more dcache than stat cache so that should be fine | 23:08 |
clarkb | corvus: any objection to me trying that cm_test command? | 23:09 |
donnyd | fungi: so you are saying elk is a pretty safe bet | 23:10 |
clarkb | `xstat_cm_test -cmname mirror01.regionone.fortnebula.org -colID 2 -onceonly` something like that | 23:10 |
corvus | clarkb: honestly, no idea | 23:11 |
fungi | donnyd: well, i'm saying it's a popular choice, and you're looking for at least a few bells and whistles it provides over mere log aggregation | 23:11 |
donnyd | yea, I think that is safe to say. I need a log thingy that does all the things | 23:12 |
fungi | be mindful of the licensing. there are proprietary-license builds subject to a number of additional conditions; you probably want to stick to the open source version | 23:13 |
donnyd | thanks fungi | 23:14 |
clarkb | dcache miss rate is .08%. vcache miss rate is .3% | 23:16 |
clarkb | both are within the recommended parameters | 23:17 |
clarkb | https://docs.openafs.org/AdminGuide/HDRWQ402.html | 23:17 |
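The miss rates clarkb quotes are just misses over total lookups from the cache-manager counters. The hit/miss counts below are hypothetical, chosen only to reproduce the ~0.08% dcache and ~0.3% vcache figures mentioned above; they are not the actual counters from the mirror.

```python
# Miss rate from cache-manager hit/miss counters (as reported by tools
# like xstat_cm_test). Counts below are hypothetical examples.

def miss_rate(hits, misses):
    """Fraction of lookups that missed the cache."""
    return misses / (hits + misses)

dcache = miss_rate(hits=999_200, misses=800)   # data cache
vcache = miss_rate(hits=99_700, misses=300)    # stat (vnode) cache

print(f"dcache miss rate: {dcache:.2%}, vcache miss rate: {vcache:.2%}")
# → dcache miss rate: 0.08%, vcache miss rate: 0.30%
```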
*** markvoelker has quit IRC | 23:20 | |
clarkb | mirror01.regionone.fortnebula.opendev.org:/root/cache_manager_data if anyone else wants to see the verbose output | 23:20 |
*** betherly has joined #openstack-infra | 23:27 | |
*** betherly has quit IRC | 23:32 | |
*** kjackal has quit IRC | 23:38 | |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Synchronize cloud creds in base-test-swift job https://review.opendev.org/674713 | 23:46 |
corvus | infra-root: ^ small update i missed earlier | 23:46 |
clarkb | vos release has now been running for almost an hour. The rsync pulled in some new arm64 images for atomic. I do wonder if this was ultimately what broke it in the first place? | 23:46 |
clarkb | +2 | 23:46 |
*** weifan has quit IRC | 23:48 | |
*** weifan has joined #openstack-infra | 23:48 | |
fungi | quite possible | 23:52 |
fungi | what jobs are using atomic images? | 23:52 |
*** weifan has quit IRC | 23:52 | |
clarkb | fungi: magnum jobs aiui | 23:53 |
corvus | i think the expiration time is something like 12h | 23:53 |
corvus | though i'm not sure what happens if we hit our cron job timeout first | 23:53 |
clarkb | it actually doesn't look like we have a cron job timeout on these | 23:54 |
corvus | good | 23:54 |
corvus | the 12h timeout (or maybe it's 10h?) is the longest that a vos release can take without using -localauth | 23:54 |
clarkb | ya crontab lacks timeouts and the commands with timeouts in the script itself do not include the vos release | 23:55 |
corvus | so if a release from mirror-update takes longer than that, we'll be left with a locked volume | 23:55 |
clarkb | k | 23:56 |
clarkb | in that case it is probably safe for me to unlock the lockfile when this vos release finishes assuming it does so in under 12 hours | 23:56 |
corvus | (the transaction to perform the update will eventually finish, but the auth will have expired for the subsequent volume unlock) | 23:56 |
corvus | clarkb: you running from m-u? | 23:56 |
clarkb | corvus: no I am running the vos release on afs01.dfw.openstack.org | 23:57 |
*** smarcet has joined #openstack-infra | 23:57 | |
corvus | oh you mean unlock the cron lockfile | 23:57 |
corvus | gotcha, yeah | 23:57 |
clarkb | yes that | 23:57 |
*** smarcet has left #openstack-infra | 23:57 | |
openstackgerrit | Merged opendev/base-jobs master: Synchronize cloud creds in base-test-swift job https://review.opendev.org/674713 | 23:57 |
*** owalsh_ has joined #openstack-infra | 23:58 |