Monday, 2019-08-05

*** markvoelker has joined #openstack-infra00:02
*** markvoelker has quit IRC00:06
*** markvoelker has joined #openstack-infra00:45
*** jamesmcarthur has quit IRC00:57
openstackgerritIan Wienand proposed opendev/system-config master: kafs support  https://review.opendev.org/62397400:59
openstackgerritIan Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels  https://review.opendev.org/66505700:59
openstackgerritIan Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd  https://review.opendev.org/67421500:59
*** markvoelker has quit IRC01:09
*** jamesmcarthur has joined #openstack-infra01:19
ianwclarkb: thanks for looking at mirror; lmn if any help needed01:19
openstackgerritMerged opendev/system-config master: Re-add the Debian 8/jessie key to reprepro  https://review.opendev.org/67440601:29
*** jamesmcarthur has quit IRC01:32
*** yikun has joined #openstack-infra01:44
*** yamamoto has joined #openstack-infra01:45
openstackgerritMerged openstack/diskimage-builder master: journal-to-console: element to send systemd journal to console  https://review.opendev.org/66978402:00
*** markvoelker has joined #openstack-infra02:01
*** bhavikdbavishi has joined #openstack-infra02:02
*** jamesmcarthur has joined #openstack-infra02:03
*** yamamoto has quit IRC02:04
*** yamamoto has joined #openstack-infra02:04
*** markvoelker has quit IRC02:06
*** bhavikdbavishi has quit IRC02:07
*** bhavikdbavishi1 has joined #openstack-infra02:07
*** bhavikdbavishi1 is now known as bhavikdbavishi02:09
*** n-saito has joined #openstack-infra02:10
*** bhavikdbavishi has quit IRC02:22
*** markvoelker has joined #openstack-infra02:32
*** yamamoto has quit IRC02:35
*** whoami-rajat has joined #openstack-infra02:38
*** yamamoto has joined #openstack-infra02:38
*** markvoelker has quit IRC02:42
*** gregoryo has joined #openstack-infra02:46
openstackgerritMerged openstack/diskimage-builder master: Cleanup: remove useless statement  https://review.opendev.org/66837202:46
*** ricolin has joined #openstack-infra02:51
*** bhavikdbavishi has joined #openstack-infra03:11
*** jamesmcarthur has quit IRC03:11
*** markvoelker has joined #openstack-infra03:13
*** markvoelker has quit IRC03:17
*** psachin has joined #openstack-infra03:29
*** ricolin_ has joined #openstack-infra03:35
*** ricolin has quit IRC03:38
*** mrda has joined #openstack-infra03:39
mrdahey infra, I've seen some strange behaviour lately in git cloning off opendev.org.  For example, I cloned https://opendev.org/openstack/devstack.git and got a bunch of zero length files.  i.e. like this: https://pastebin.com/fhAHaZpb  I didn't see any errors on the clone, and only when I ran ./tools/create-stack-user.sh did I see anything wrong.  Just wanted to add the data point here in case others03:41
mrdasee it too.03:41
*** jamesmcarthur has joined #openstack-infra03:42
*** ramishra has joined #openstack-infra04:04
*** udesale has joined #openstack-infra04:07
*** AJaeger is now known as AJaeger_04:11
*** dpawlik has joined #openstack-infra04:13
*** dpawlik has quit IRC04:20
*** ykarel has joined #openstack-infra04:23
*** markvoelker has joined #openstack-infra04:28
*** Lucas_Gray has joined #openstack-infra04:29
*** jamesmcarthur has quit IRC04:30
*** markvoelker has quit IRC04:33
*** ramishra has quit IRC04:45
*** ramishra has joined #openstack-infra04:45
ianwmrda: is it only devstack or you've seen this on other repos?04:45
fungii can't seem to reproduce it. could you have run out of disk space?04:47
mrdaIt's happened 3 times in a week, and I've just recloned and all was well.04:48
mrdaIt's not a disk space issue. FWIW, it was in a local KVM running F29 on a F30 host.04:49
fungistrange, i wonder if one of the backends has a corrupt copy04:49
mrdaJust figured y'all would like to know in case any other reports come in04:49
fungiappreciated!04:51
fungii'm test cloning it from all 8 of the backends individually now04:51
mrdathanks fungi04:51
fungino dice. tried to reproduce by cloning from all 8 backends individually04:59
fungiso doesn't look like a corrupt repository on any of them at least04:59
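A rough sketch of the per-backend check fungi describes, assuming the gitea backends are reachable directly as gitea01 through gitea08.opendev.org on port 3000 (those hostnames and the port are assumptions, not taken from this log):

    for i in 01 02 03 04 05 06 07 08; do
        # clone straight from one backend, bypassing the load balancer
        git clone "https://gitea${i}.opendev.org:3000/openstack/devstack" "devstack-${i}" &&
        # verify object integrity; truncated or zero-length objects would show up here
        git -C "devstack-${i}" fsck --full
    done
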
*** janki has joined #openstack-infra05:02
mrdaok, thanks for letting me know.  If I see it again I'll report it here.05:02
fungianother possibility could be a misbehaving proxy (if you have a transparent https proxy anyway)05:03
*** jamesmcarthur has joined #openstack-infra05:04
*** Lucas_Gray has quit IRC05:25
*** notmyname has quit IRC05:40
*** jaosorior has joined #openstack-infra05:41
*** notmyname has joined #openstack-infra05:41
*** dpawlik has joined #openstack-infra05:43
*** kota_ has quit IRC05:46
*** yamamoto has quit IRC05:47
*** yamamoto has joined #openstack-infra05:48
ianwi would have thought git's checksumming would have avoided proxies getting in the way ... the other thing might be some odd manifestation of I5ebdaded3ffd0a5bc70c5e9ab5b18daefb358f58 if it's devstack only05:51
*** jamesmcarthur has quit IRC05:53
ianwlike if you only notice it after it runs05:58
mrdaI shall take a look05:59
*** dchen has quit IRC06:06
*** ramishra has quit IRC06:06
*** ramishra has joined #openstack-infra06:06
*** jamesmcarthur has joined #openstack-infra06:23
*** jamesmcarthur has quit IRC06:27
*** dpawlik has quit IRC06:28
*** jtomasek has joined #openstack-infra06:34
*** kota_ has joined #openstack-infra06:38
*** dchen has joined #openstack-infra06:40
*** ricolin__ has joined #openstack-infra06:41
*** ricolin__ is now known as ricolin06:41
*** dpawlik has joined #openstack-infra06:44
*** ricolin_ has quit IRC06:45
*** apetrich has joined #openstack-infra06:48
*** jamesmcarthur has joined #openstack-infra06:49
*** kopecmartin|pto is now known as kopecmartin06:51
*** dchen has quit IRC06:53
*** ginopc has joined #openstack-infra06:54
*** ralonsoh has joined #openstack-infra06:54
*** jamesmcarthur has quit IRC06:54
openstackgerritIan Wienand proposed opendev/system-config master: Ansible roles for backup  https://review.opendev.org/66265707:01
*** pgaxatte has joined #openstack-infra07:01
*** rcernin has quit IRC07:04
*** slaweq has joined #openstack-infra07:05
*** xek has joined #openstack-infra07:05
*** pcaruana has joined #openstack-infra07:08
*** pkopec has joined #openstack-infra07:11
*** markvoelker has joined #openstack-infra07:17
*** tesseract has joined #openstack-infra07:17
*** tosky has joined #openstack-infra07:19
*** bhavikdbavishi has quit IRC07:27
*** xek has quit IRC07:28
*** ccamacho has joined #openstack-infra07:49
*** markvoelker has quit IRC07:51
*** rpittau|afk is now known as rpittau07:51
*** ykarel is now known as ykarel|lunch07:59
*** dchen has joined #openstack-infra08:07
*** bhavikdbavishi has joined #openstack-infra08:08
*** tkajinam has quit IRC08:11
*** yamamoto has quit IRC08:13
*** iurygregory has joined #openstack-infra08:16
*** arxcruz is now known as arxcruz|grb08:18
*** arxcruz|grb is now known as arxcruz|brb08:18
*** dchen has quit IRC08:19
*** tesseract-RH has joined #openstack-infra08:22
*** tesseract has quit IRC08:22
*** piotrowskim has joined #openstack-infra08:24
*** Lucas_Gray has joined #openstack-infra08:24
*** tesseract-RH has quit IRC08:24
*** tesseract has joined #openstack-infra08:24
*** gregoryo has quit IRC08:25
openstackgerritIan Wienand proposed opendev/zone-opendev.org master: Add vexxhost backup server  https://review.opendev.org/67454908:29
*** Lucas_Gray has quit IRC08:30
openstackgerritIan Wienand proposed opendev/system-config master: Add vexxhost backup server  https://review.opendev.org/67455008:36
*** sshnaidm|afk is now known as sshnaidm08:37
*** janki has quit IRC08:43
*** smrcascao has joined #openstack-infra08:45
*** e0ne has joined #openstack-infra08:48
openstackgerritMerged opendev/system-config master: Ansible roles for backup  https://review.opendev.org/66265708:48
*** jaosorior has quit IRC08:51
*** yamamoto has joined #openstack-infra08:54
openstackgerritMerged opendev/irc-meetings master: Free up unused Murano meeting slot  https://review.opendev.org/67430708:56
*** markvoelker has joined #openstack-infra08:56
openstackgerritMerged opendev/irc-meetings master: Free up unused Gluon meeting slot  https://review.opendev.org/67430408:59
openstackgerritMerged opendev/irc-meetings master: Free up unused openstack-chef meeting slot  https://review.opendev.org/67430008:59
*** gfidente has joined #openstack-infra09:00
openstackgerritIan Wienand proposed opendev/system-config master: Add vexxhost backup server  https://review.opendev.org/67455009:00
*** BrentonPoke has quit IRC09:01
*** ykarel|lunch is now known as ykarel09:08
*** yamamoto has quit IRC09:09
*** tdasilva has joined #openstack-infra09:09
*** guoqiao has quit IRC09:24
*** markvoelker has quit IRC09:30
*** yamamoto has joined #openstack-infra09:44
*** janki has joined #openstack-infra09:45
*** spsurya has joined #openstack-infra09:50
*** bhavikdbavishi has quit IRC09:53
*** jaosorior has joined #openstack-infra10:05
*** ociuhandu has joined #openstack-infra10:20
*** janki has quit IRC10:27
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL  https://review.opendev.org/67457210:38
*** markvoelker has joined #openstack-infra10:38
sshnaidmcores please take a look ^^10:42
*** markvoelker has quit IRC10:43
*** arxcruz|brb is now known as arxcruz10:52
mordredsshnaidm: left a suggestion10:55
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL  https://review.opendev.org/67457210:57
*** tdasilva has quit IRC10:57
sshnaidmmordred, updated ^10:57
*** tdasilva has joined #openstack-infra10:58
mordredsshnaidm: lgtm. I think we could also get rid of that ansible_os_family line now - but this reads well10:58
*** dchen has joined #openstack-infra11:00
*** rosmaita has joined #openstack-infra11:01
*** rascasoft has quit IRC11:03
*** rascasoft has joined #openstack-infra11:05
*** pgaxatte has quit IRC11:06
*** pkopec_ has joined #openstack-infra11:10
*** pkopec__ has joined #openstack-infra11:14
*** pkopec has quit IRC11:14
*** pkopec has joined #openstack-infra11:15
*** pkopec_ has quit IRC11:16
*** pkopec__ has quit IRC11:18
*** jaosorior has quit IRC11:23
*** ramishra has quit IRC11:24
openstackgerritIan Wienand proposed opendev/system-config master: kafs support  https://review.opendev.org/62397411:26
openstackgerritIan Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels  https://review.opendev.org/66505711:26
openstackgerritIan Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd  https://review.opendev.org/67421511:26
*** ramishra has joined #openstack-infra11:26
*** jamesdenton has joined #openstack-infra11:30
*** pkopec_ has joined #openstack-infra11:31
*** pkopec has quit IRC11:33
*** rh-jelabarre has joined #openstack-infra11:45
*** jamesmcarthur has joined #openstack-infra11:46
*** ociuhandu has quit IRC11:49
*** ociuhandu has joined #openstack-infra11:50
*** yamamoto has quit IRC11:53
*** markvoelker has joined #openstack-infra11:55
*** adriancz has joined #openstack-infra12:01
*** markvoelker has quit IRC12:02
*** markvoelker has joined #openstack-infra12:02
*** pgaxatte has joined #openstack-infra12:02
*** jamesmcarthur has quit IRC12:05
*** dchen has quit IRC12:06
*** udesale has quit IRC12:06
*** udesale has joined #openstack-infra12:07
*** markvoelker has quit IRC12:09
*** jaosorior has joined #openstack-infra12:10
*** markvoelker has joined #openstack-infra12:11
*** ociuhandu has quit IRC12:12
*** yamamoto has joined #openstack-infra12:13
*** betherly has joined #openstack-infra12:15
*** rfolco has joined #openstack-infra12:16
*** rfolco is now known as rfolco|ruck12:16
*** ociuhandu has joined #openstack-infra12:20
sshnaidmsome problem with limestone:  SSH Error: data could not be sent to remote host "10.4.70.72". Make sure this host can be reached over ssh12:24
*** yamamoto has quit IRC12:24
sshnaidmHostname: centos-7-limestone-regionone-0009753160 Provider: limestone-regionone12:24
*** rlandy has joined #openstack-infra12:26
logan-sshnaidm: could you link to the job logs, i'd like to take a look into that12:29
sshnaidmlogan-, https://logs.opendev.org/33/674433/7/check/tripleo-ci-centos-7-containers-multinode/bd26761/job-output.txt.gz#_2019-08-05_12_05_52_38981512:31
logan-thanks12:31
*** jroll has quit IRC12:39
*** jroll has joined #openstack-infra12:39
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213412:40
openstackgerritMark Meyer proposed zuul/zuul master: Rework some bugs  https://review.opendev.org/67442512:40
*** jamesmcarthur has joined #openstack-infra12:41
*** ccamacho has quit IRC12:43
*** priteau has joined #openstack-infra12:45
*** ramishra has quit IRC12:46
*** yamamoto has joined #openstack-infra12:48
*** jamesdenton has quit IRC12:49
*** priteau has quit IRC12:54
logan-sshnaidm: going to monitor for a while. it looks like some rabbitmq messages were being held up due to a massive "notifications.sample" queue backlog. I cleaned that up and I don't see neutron messages backing up anymore, but that may have been the cause of communications issues during your multinode jobs. we'll see12:55
sshnaidmlogan-, thanks!12:56
*** bhavikdbavishi has joined #openstack-infra12:59
*** jamesmcarthur has quit IRC13:04
*** aaronsheffield has joined #openstack-infra13:04
*** jamesdenton has joined #openstack-infra13:04
fungioverrun with rabbits13:05
*** ekultails has joined #openstack-infra13:05
*** goldyfruit has joined #openstack-infra13:06
*** goldyfruit has quit IRC13:11
*** bhavikdbavishi has quit IRC13:12
*** bhavikdbavishi has joined #openstack-infra13:13
*** Lucas_Gray has joined #openstack-infra13:14
*** mriedem has joined #openstack-infra13:14
*** jamesmcarthur has joined #openstack-infra13:15
*** ykarel is now known as ykarel|afk13:22
mordredfungi: I didn't realize the outer banks had rabbit issues13:22
fungijust the openstack parts13:23
*** ociuhandu has quit IRC13:32
*** HenryG has quit IRC13:37
*** aedc has quit IRC13:38
openstackgerritThierry Carrez proposed openstack/ptgbot master: Display room capabilities  https://review.opendev.org/67460613:38
*** jbadiapa has joined #openstack-infra13:39
*** aedc has joined #openstack-infra13:39
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: POC: Add ensure-managed role  https://review.opendev.org/67460913:42
*** jamesdenton has quit IRC13:44
*** jbadiapa has quit IRC13:44
*** pcaruana has quit IRC13:47
*** dchen has joined #openstack-infra13:48
*** dchen has joined #openstack-infra13:49
*** dchen has quit IRC13:49
*** ykarel|afk is now known as ykarel|away13:52
*** bhavikdbavishi has quit IRC13:54
*** iurygregory has quit IRC13:55
*** yamamoto has quit IRC13:58
*** pcaruana has joined #openstack-infra14:00
*** ramishra has joined #openstack-infra14:07
*** yamamoto has joined #openstack-infra14:08
johnsomIs http://paste.openstack.org/ down or is it just me?14:10
AJaeger_infra-root, takes ages to respond (still spinning) for  me as well ^14:11
AJaeger_wait, it's there now...14:11
johnsomYeah, mine just opened as well.14:11
*** rlandy is now known as rlandy|brb14:13
*** ociuhandu has joined #openstack-infra14:19
*** eharney has joined #openstack-infra14:20
*** Lucas_Gray has quit IRC14:20
*** Lucas_Gray has joined #openstack-infra14:22
*** jcoufal has joined #openstack-infra14:22
*** _erlon_ has joined #openstack-infra14:23
*** rlandy|brb is now known as rlandy14:23
*** jamesdenton has joined #openstack-infra14:23
fungii believe it occasionally gets its mysql socket timed out and the backend daemon doesn't realize it, so takes a while before it tries to reconnect14:23
*** _erlon_ has left #openstack-infra14:23
fungii've tried to track down the cause several times in the past, to no avail14:24
fungiwe probably ought to just move its db back out of trove and keep it local to the server14:24
*** pkopec_ has quit IRC14:25
*** mriedem has quit IRC14:29
*** mriedem has joined #openstack-infra14:29
*** yamamoto has quit IRC14:31
*** yamamoto has joined #openstack-infra14:32
*** jaosorior has quit IRC14:36
*** yamamoto has quit IRC14:37
openstackgerritThierry Carrez proposed openstack/ptgbot master: Generate etherpad links automatically  https://review.opendev.org/67462214:38
*** jaosorior has joined #openstack-infra14:39
openstackgerritThierry Carrez proposed openstack/ptgbot master: Display room capabilities  https://review.opendev.org/67460614:52
*** Lucas_Gray has quit IRC14:54
clarkbI'm going to rerun the fedora mirror sync with updated atomic excludes just as soon as my morning meeting completes14:57
clarkbthen rerun vos release in screen with -localauth14:57
clarkband if that looks happy I will release the lockfile on mirror-update14:57
*** pkopec has joined #openstack-infra14:59
*** bnemec-pto is now known as bnemec15:01
donnydclarkb: It looks like you are moving to object storage, but I haven't been following for what. I am working on getting some all nvme object storage going (gonna be a bit). Is every provider going to need object stores?15:01
clarkbdonnyd: I think the idea is to use object stores for log storage if they are available. We won't be requiring that of every cloud but will happily make use of them from those that have it (as a note we probably want ipv4 from those instances since client browsers will make log requests directly to the swift api)15:03
donnydAlso FN is at capacity, and it looks like all of the issues have been pretty well ironed out. The only errors i get is when a fresh image is loaded, nodepool schedules too many of that instance type too fast15:03
clarkbdonnyd: as long as that results in "clean failures" eg we don't break the cloud in the process that is probably fine15:03
donnydAll the api's are on v4 here (also working v6 for that too.)15:03
donnydSo if its useful, I can make it my next target... I need to find a faster backend for glance anyways. How much space are you thinking the logs will require? Performance sla?15:04
clarkbdonnyd: corvus likely has better grasp of those numbers since he has been working on it recently, but I can take a look after meetings too15:05
*** Lucas_Gray has joined #openstack-infra15:07
donnydJust for anyone who was curious, I know in the beginning of getting FN we talked a little about power requirements. At full load with 100(ish) instance in system I am using 3200 watts15:13
fungithat's still probably half the power my dec alphastation drew at idle ;)15:15
donnydAbout 250 bucks a month in electrical costs. Quite honestly that isn't that bad15:16
donnydThe blade chassis isn't really efficient until i put a full lot in it, and a full load on it. My 3 EMC isilions were taking more than that just for disk prior to the NVME build15:17
fungiwow15:17
*** yamamoto has joined #openstack-infra15:19
*** yamamoto has quit IRC15:25
*** goldyfruit has joined #openstack-infra15:26
*** pkopec has quit IRC15:26
*** betherly has quit IRC15:29
*** ralonsoh has quit IRC15:29
*** iurygregory has joined #openstack-infra15:38
*** ociuhandu has quit IRC15:40
*** ociuhandu has joined #openstack-infra15:40
*** ociuhandu has quit IRC15:41
*** sthussey has joined #openstack-infra15:41
*** ociuhandu has joined #openstack-infra15:42
*** ociuhandu has quit IRC15:43
*** ociuhandu has joined #openstack-infra15:43
openstackgerritTobias Henkel proposed zuul/zuul master: Report retried builds in a build set via mqtt.  https://review.opendev.org/63272715:43
openstackgerritTobias Henkel proposed zuul/zuul master: Report retried builds via sql reporter.  https://review.opendev.org/63350115:43
*** ociuhandu has quit IRC15:43
*** ociuhandu has joined #openstack-infra15:45
*** ociuhandu has quit IRC15:45
*** ociuhandu has joined #openstack-infra15:45
*** pgaxatte has quit IRC15:48
*** gyee has joined #openstack-infra15:48
*** Lucas_Gray has quit IRC15:51
corvusdonnyd, clarkb: our current total log usage is 9.2TiB in 380M inodes; we're not sure how many cloud providers we can use yet, but including FN, it would be between 4 and 7, so that's something like 1.3-1.2TiB per provider and 54-95M objects.15:52
corvusdonnyd, clarkb: if we have room, we might expand our log retention a bit15:54
clarkbcorvus: we are at 4 weeks now?15:54
donnydRequirements for speed?15:54
corvusyeah, looks like 30 days15:55
corvusi don't think we've ever characterized the numbers around speed.  everything gets written once and then read once automatically, after that, very little of it gets read.  there shouldn't be a lot of concurrent activity, and read throughput isn't a big deal, but it would be good if it didn't take too long to fetch a random file since that's a user-interactive thing.15:57
corvusmaybe there's a nugget of useful info in there, sorry.15:58
*** AJaeger_ is now known as AJaeger15:59
donnydI can add on swift without issue, just gonna take some time. The reason I ask is I am using all nvme disks. If this has no specific performance requirements, I can put it on normal spinning rust and offer a lots more in terms of available space. I have 3.2TB of nvme to throw at swift and 500Tb of spinning rust laying around.16:00
donnydPreferance for swift over ceph or vice vesa?16:01
clarkbdonnyd: we need CORS headers to make this work and ceph's swift api doesn't support that (the s3 api does but we've not yet gotten that working/confirmed) I think that means the preference for now is swift proper16:02
corvusi think i lean toward more spinning rust being more useful in the long run; should be sufficient performance and has potential for growth.  clarkb, fungi?16:02
corvusswift++16:02
clarkbdonnyd: also spinning disk is probably fine since ya being able to extend retention would be great16:02
clarkbI am starting the fedora mirror sync with updates (no vos release) now16:02
fungiooh, this is always a fun debate16:03
fungiand i agree, cheap slow bulk storage is likely a great tradeoff16:03
corvusi typo'd before -- current retention would be 1.3-2.3TiB per provider16:04
donnydOk, I think i can do that without any issues at all.16:04
*** tesseract has quit IRC16:04
donnydIf its rust, the smallest drive I have is 3tb16:04
fungiwrite speed probably matters more than read speed (though maybe the amount of the two is going to be roughly on par considering our current logstash worker implementation?)16:04
corvusif we increase retention to our target of 6months, it would be 8-14TiB per provider16:04
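The arithmetic behind those per-provider figures, reproduced as a quick sketch from the numbers quoted above (9.2 TiB and 380M objects over ~30 days, split across 4-7 providers):

    awk 'BEGIN {
        tib = 9.2; objs_m = 380                 # current ~30-day log footprint
        n = split("7 4", p)
        for (i = 1; i <= n; i++) {
            printf "%d providers: %.1f TiB and %.0fM objects each\n", p[i], tib / p[i], objs_m / p[i]
            printf "  at 6 months (~6x): %.1f TiB each\n", 6 * tib / p[i]
        }
    }'
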
*** ociuhandu_ has joined #openstack-infra16:04
donnydso as long as we can keep it below say 20TB, I am a happy camper.16:05
corvusseems reasonable16:05
corvusfungi: can you +3 https://review.opendev.org/674459 ?16:05
fungicorvus: yep, that's a simple enough one16:06
*** ociuhandu has quit IRC16:07
*** michael-beaver has joined #openstack-infra16:09
*** ginopc has quit IRC16:09
*** lpetrut has joined #openstack-infra16:09
clarkbrsync done, doing vos release with localauth now16:09
donnydAlso more interesting tidbits of info on what the CI looks like from the cloud perspective. The instances only use about 25% of what they are provisioned for disk space16:14
openstackgerritMerged opendev/base-jobs master: Add zuul manifest role to swift base job  https://review.opendev.org/67445916:15
AJaegerconfig-core, please review https://review.opendev.org/67398816:15
clarkbAJaeger: on it16:15
AJaegercan we merge new repo creation changes again? https://review.opendev.org/673900 and https://review.opendev.org/673898 create two new airship repos16:16
AJaegerthanks, clarkb16:16
AJaegerand here's one more change: https://review.opendev.org/65843916:16
clarkbAJaeger: yes all gitea backends have been replaced and are being managed by ansible, creating projects should be safe16:16
AJaegerand one more https://review.opendev.org/673764 for review, please16:17
AJaegerclarkb: great16:17
*** e0ne has quit IRC16:19
clarkbAJaeger: pabelanger I left a concern on https://review.opendev.org/#/c/658439/216:19
AJaegerthanks for spotting16:20
*** gfidente has quit IRC16:20
fungidonnyd: we could likely get away with something like 3:1 oversubscribed storage because only a fraction of jobs need additional space (for example jobs which create an assortment of cinder volumes for some tests). problem likely is that oversubscribing means thin-provisioning rootfs volumes, which likely implies a write throughput penalty16:21
fungialso some jobs use swap, and i expect that would get markedly worse performance on thin-provisioned volumes16:22
openstackgerritMerged openstack/project-config master: Fix ACL for compute-hyperv  https://review.opendev.org/67398816:28
AJaegerclarkb: time to review the two new repo creations changes as well?16:28
donnydI see no write penalties on my thin volumes, but my drives are so much faster than the rest of everything else... I can't really tell16:29
* AJaeger would love to be in that situation ;)16:30
clarkbknocked 160GB off the fedora mirror I think16:30
clarkbAJaeger: and ya I'll review those shortly16:30
AJaegerclarkb: thanks16:31
donnydIts fun to see just how stupid fast nvme drives really are16:31
*** ociuhandu_ has quit IRC16:33
*** ociuhandu has joined #openstack-infra16:33
donnydIf only i could get this on all the hypervisors   READ: bw=11.4GiB/s (12.2GB/s), 1454MiB/s-1555MiB/s (1525MB/s-1630MB/s), io=80.0GiB (85.9GB), run=6586-7043msec16:33
*** mattw4 has joined #openstack-infra16:34
*** bhavikdbavishi has joined #openstack-infra16:34
openstackgerritMerged openstack/project-config master: Allow registered users to vote for backport candidates  https://review.opendev.org/67376416:35
donnydAJaeger: And I have less in it than most peoples laptops16:36
*** diablo_rojo has joined #openstack-infra16:37
*** mattw4 has quit IRC16:37
*** mattw4 has joined #openstack-infra16:39
*** jamesmcarthur has quit IRC16:39
clarkbinfra-root AJaeger can you check my comment on https://review.opendev.org/#/c/673900/2 and consider if that concern is valid?16:40
fungidonnyd: oh, yeah, if you're already thin-provisioning and not seeing a disk bottleneck, then i guess that's ideal (and also presumably means you have available blocks on the backend?)16:40
mordredclarkb: I agree with yoru comment16:42
*** weifan has joined #openstack-infra16:43
donnydfungi: I am 100% overprovisioned on the backend. ( requested 8TB, have 4Tb). We are only using 25% of the 4tb I have.16:43
donnydso its more like it actually uses 12.5% of what is requested.16:43
AJaegerclarkb: I see your point but few people type it, so I'm fine either way.16:44
clarkbya testing cinder in particular requires quite a bit of disk space, but most other jobs likely don't get near the limits16:44
*** ramishra has quit IRC16:44
donnydI will keep an eye on it, as today is the first day at full capacity16:44
clarkb(cinder wouldn't need so much disk if they allowed for specifying sizes smaller than incremements of 1GB)16:44
clarkbiirc devstack provisions ~30GB of disk just for cinder beause we have multiple threads running and the volume deletion for test cleanup can lag behind subsequent tests starting16:45
fungidonnyd: that's excellent data, thanks!16:45
*** ociuhandu has quit IRC16:45
clarkbAJaeger: I mostly don't want the next k8s related thing to come along and be cranky someone has decided they are the k8s thing16:46
donnydit just interesting to see what the CI system actually uses.. From my POV, its memory, then cpu, then network, then storage16:46
fungiclarkb: some swift tests might too, since i know they create an auxiliary filesystem to be able to exercise extended xfs attributes16:46
clarkbdonnyd: that is great feedback16:47
clarkbwould it be worthwhile for us to have donnyd write a little section of notes we can tack onto https://docs.openstack.org/infra/system-config/contribute-cloud.html ?16:47
clarkba "from the provider perspective" notes?16:48
openstackgerritMerged openstack/project-config master: New project request: airship/porthole  https://review.opendev.org/67389816:48
corvusyeah, the storage overprovisioning factor is a great new number to have16:48
*** markvoelker has quit IRC16:49
donnydI need to find a way to find out what it needs from an iops, BW perspective as well to be optimal16:49
clarkbdonnyd: https://opendev.org/opendev/system-config/src/branch/master/doc/source/contribute-cloud.rst is the source file for that doc if you want to propose a change with your thoughts. I am happy to push a change with you too if you would rather we do ti that way (can collaborate on an etherpad if so)16:49
donnydhttps://usercontent.irccloud-cdn.com/file/UmYecpK0/Screenshot%20from%202019-08-05%2012-50-09.png16:50
*** ociuhandu has joined #openstack-infra16:50
donnydDo we just want to add a notes section?16:51
donnydAlso is it helpful to have public facing dashboards for FN, so you can capture the data from the provider side?16:52
clarkbdonnyd: maybe a "From a provider's perspective" section then you can format that in whatever manner makes the most sense16:53
clarkbfor dashboards I think that may also help but am not sure how easy it is for you to do that without also exposing important details16:53
*** jamesmcarthur has joined #openstack-infra16:54
donnydThere are no details that are important.. this is the only workload on here16:54
donnydI think the important details are probably going to provide valuable data for the underpinnings of the CI. The more we know, the more efficient we can make things.16:56
clarkb++16:56
*** ricolin has quit IRC16:57
donnydJust no sure what data we want to gather, but I have grafana hooked up to gnocchi... just don't know how to gnocchi quite yet16:57
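For the gnocchi side, the usual starting points with the gnocchi client look roughly like this (from memory of gnocchiclient, so treat the exact options as assumptions):

    # what resources and metrics exist
    gnocchi resource list
    gnocchi metric list
    # aggregated measures for a single metric
    gnocchi measures show <metric-id> --aggregation mean
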
*** kopecmartin is now known as kopecmartin|off16:59
*** markvoelker has joined #openstack-infra17:01
corvusclarkb, fungi, mordred: mnaser told me about "openstack ec2 credentials create" so i tried that and plugged those into this script, but i still get an error: http://paste.openstack.org/show/755536/  -- any ideas of anything else we could try?17:01
clarkbthat makes me wonder if the ceph s3 api isn't tied into the openstack aws api support17:05
clarkbec2 credentials for example are a nova thing17:05
logan-^ typically that works, at least that's the way we connect s3 clients to our radosgw17:06
*** jamesmcarthur has quit IRC17:07
*** udesale has quit IRC17:07
corvusthat's good to know; i guess we'll wait to see what mnaser says17:08
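For reference, the flow logan- describes is roughly this (a sketch; the endpoint URL and bucket name are placeholders, and whether a given radosgw deployment accepts the keystone-issued EC2 credentials is exactly the open question here):

    # create EC2-style credentials backed by keystone
    openstack ec2 credentials create -f value -c access -c secret
    # then point any S3 client at the object-store endpoint with them
    AWS_ACCESS_KEY_ID=<access> AWS_SECRET_ACCESS_KEY=<secret> \
        aws --endpoint-url https://object-store.example.com s3 ls s3://zuul-logs
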
*** bhavikdbavishi1 has joined #openstack-infra17:10
*** bhavikdbavishi has quit IRC17:12
*** bhavikdbavishi1 is now known as bhavikdbavishi17:12
fungithat's an interesting entanglement, but i guess makes some sense17:12
*** jamesmcarthur has joined #openstack-infra17:12
*** goldyfruit has quit IRC17:18
*** goldyfruit_ has joined #openstack-infra17:18
donnydcan anyone hit https://grafana.fortnebula.com17:21
AJaegeryes, I can, donnyd17:21
diablo_rojoI can too17:22
donnydis there anything worthwhile on it?17:22
clarkband the openstack utilization dashboard renders for me too17:22
AJaegerdonnyd: nit, it's OpenStack with capital S; )17:22
donnydrefresh AJaeger17:22
AJaegermax 75 % utilization...17:22
AJaegerdonnyd: nit trick ;)17:23
donnydLOL17:23
AJaegerdonnyd: interesting, compute-8 and compute-9 have very low cpu utilization compared with rest17:24
donnydDoes anyone know how to actually use gnocchi... these are backend stats17:24
donnydcompute-9 always will, its the dedicated node for the mirror17:24
AJaegerand compute-2 has more than average...17:24
AJaegerah, I see17:24
donnydcompute-8 was placed into serivce early this morning17:24
donnydWould you want to see anything else... I was thinking maybe detailed dashboards per hypervisor or something like that maybe... I need to also gather front-end metrics17:26
donnydlike instance related things17:26
clarkbdonnyd: aggregate network bw at the gateway/boundary would probably be helpful17:27
donnydOk, I can do that17:28
clarkbdonnyd: in part because that might help us evaluate effectiveness of the mirrors/caching17:28
*** psachin has quit IRC17:28
*** ociuhandu has quit IRC17:28
*** ociuhandu has joined #openstack-infra17:29
*** ociuhandu has quit IRC17:31
*** ociuhandu has joined #openstack-infra17:31
*** jamesmcarthur has quit IRC17:32
donnydhow about now clarkb17:32
donnydMy local network requirements are usually pretty low, so that is 90% the CI17:33
clarkbdonnyd: edge bw graph lgtm17:34
donnydsweet...17:34
*** kjackal has joined #openstack-infra17:34
*** ociuhandu has quit IRC17:35
*** jamesmcarthur has joined #openstack-infra17:37
*** sgw has quit IRC17:37
*** jamesmcarthur_ has joined #openstack-infra17:39
*** jamesmcarthur has quit IRC17:41
*** jamesmcarthur has joined #openstack-infra17:42
*** rpittau is now known as rpittau|afk17:42
*** tdasilva has quit IRC17:43
*** tdasilva has joined #openstack-infra17:43
*** jamesmcarthur_ has quit IRC17:43
Shrewsinfra-root: I've put together a script to cleanup the leaked swift objects from image uploads. I'm not sure how to determine which ones might actually still be needed, so I thought I'd start by deleting any objects that have a last-modified timestamp older than 5 days, just out of safety and until I can figure out exactly where they're being leaked. Does this seem safe?17:44
clarkbShrews: I want to say mordred believes the swift objects from image uploads are always safe to delete as long as the upload isn't in progress17:45
clarkbShrews: so 5 days should be plenty to make sure we don't delete anything in progress17:45
Shrewsyeah, it was the "figure out which ones are in progress" that wasn't the easy part17:46
fungidoes glance just use them as an upload staging area?17:46
clarkbfungi: exactly17:46
fungiShrews: pause the builders first?17:46
clarkbI think we can start with the 5 day limit first to clean out 99% of the leaks17:46
clarkbthen ya turn off the builders and clean up the rest maybe17:46
Shrewsfungi: i don't think we have to do that yet with the 5 day limit17:46
fungiif there are no image uploads in progress, then it would be safe to delete it all en masse17:46
Shrewswhat clarkb said  :)17:46
fungiahh, also an option, sure17:47
corvusShrews: yeah, i vote delete with 5 day limit, don't worry about pausing, then after we fix the leak, we can pause and delete all17:48
fungimakes sense17:48
*** jamesmcarthur has quit IRC17:48
corvus(since we know we're going to continue leaking for a bit)17:48
*** jamesmcarthur has joined #openstack-infra17:53
Shrewsok, it's running for dfw. might take a while17:55
*** ykarel|away has quit IRC17:56
mordredShrews: my understanding is that we don't need any of them - as long as we're not actively importing the related image17:56
mordredShrews: having read the full scrollback now - yes, I agree with the above course of action :)17:57
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: render console in js with listview  https://review.opendev.org/67466317:59
*** sgw has joined #openstack-infra18:02
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: render console in js with listview  https://review.opendev.org/67466318:03
*** jamesmcarthur_ has joined #openstack-infra18:03
*** e0ne has joined #openstack-infra18:05
*** jamesmcarthur has quit IRC18:06
ShrewsSeems that 'openstack object list images' does not change as the objects are deleted from the container. Yet the deletes are clearly working since I get a 404 if I try to delete one already processed. Maybe the 'list' data is cached ?18:08
Shrewsoh, i bet it limits output to 1000018:09
Shrewswhich means we probably have WAAAY more objects than we think18:09
Shrewsbecause 'list' output changes, but not the total size of it18:10
Shrews:(18:11
*** jamesmcarthur has joined #openstack-infra18:13
*** bhavikdbavishi has quit IRC18:13
corvusShrews: i want to say that the web interface was like ~13000 objects per region18:13
corvusassuming it's correct (i'm going going out on a limb for it)18:14
*** jamesmcarthur_ has quit IRC18:14
*** bhavikdbavishi has joined #openstack-infra18:14
Shrewsok, not much over 10k then18:14
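For what it's worth, a sketch of seeing past that default 10k page (my recollection of the client options, so treat the exact flags as assumptions):

    # openstackclient: list everything rather than the first page
    openstack object list images --all -f value -c Name | wc -l
    # or page manually from the last name of the previous page
    openstack object list images --marker "<last-object-name>"
    # python-swiftclient paginates automatically
    swift list images | wc -l
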
corvusfungi, clarkb, mordred: i have learned from our friends at ovh that swift doesn't return an allow-origin header unless the browser sends an origin header, so my test of ovh was faulty, ovh+swift should be working fine and we can go ahead and start using it.18:15
mordredcorvus: oh good! (because browsers will send the origin header)18:17
corvusmordred: yeah, though apparently only with an xmlhttprequest, since just hitting the url in the browser and watching the network panel shows no headers18:18
corvusfungi, clarkb, mordred, mnaser: repeating the test against vexxhost with an origin header, i do see an allow-origin: * header, but i also see allow-methods: HEAD... so that's confusing.18:18
mordredShrews: I *think* if we just delete the SLO manifest object we don't need to delete the subobjects18:18
mordredShrews: I do not know if that is useful to you or not18:18
Shrewsmordred: maybe if you explained the words?18:19
mordredcorvus: ah - yeah - because just opening it in a normal browser link is a thing you can just do18:19
mordredShrews: each of these images is a "Large Object" - which has an object in /images and then a bunch of smaller ones in (I think) /image_segments18:19
corvusi think image_segments may have been empty when i looked?18:20
mordredShrews: I *believe* we only have to delete the manifest objects and the segment objects will get autodeleted for us18:20
mordredneat18:20
mordredthen it's also possible I'm just high and shouldn't be listened to18:20
corvusbut Shrews was looking a different way and should probably confirm :)18:20
Shrewsmordred: i'm currently only deleting in 'images' so i'll check the other afterwards18:20
corvus(because i'm *really* suspicious when the control panel says 0 objects in a container)18:20
Shrewsmordred: corvus: i currently don't see anything in image_segments18:21
clarkbcorvus: the s3 api allows you to do fine grained cors for request types too18:21
Shrewserr, images_segments, that is18:21
clarkbcorvus: was your vexxhost test via s3 or swift apis?18:22
corvusclarkb: still no joy with s3, those files were put via swift apis18:22
corvusoh, ha18:24
corvusvexxhost returns whatever method you use to access it via Access-Control-Allow-Methods18:24
corvusso, erm, i think we're just going to have to try this18:24
*** jamesmcarthur_ has joined #openstack-infra18:24
clarkboh taht should be fine because we are doing GETs from the web ui18:25
corvusyeah, as long as the options response makes sense18:26
corvusthat's the thing i'm not sure how to simulate18:26
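One way to simulate it from the command line (the object URL is a placeholder and the Origin value is just an example):

    # roughly what the browser's XMLHttpRequest sends for a simple GET
    curl -s -o /dev/null -D - -H "Origin: https://zuul.opendev.org" \
        https://swift.example.com/v1/AUTH_xyz/zuul_logs/job-output.txt | grep -i access-control
    # and the preflight OPTIONS request
    curl -s -o /dev/null -D - -X OPTIONS \
        -H "Origin: https://zuul.opendev.org" \
        -H "Access-Control-Request-Method: GET" \
        https://swift.example.com/v1/AUTH_xyz/zuul_logs/job-output.txt | grep -i access-control
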
*** jamesmcarthur has quit IRC18:26
*** jamesmcarthur has joined #openstack-infra18:28
*** betherly has joined #openstack-infra18:30
*** weifan has quit IRC18:30
*** jamesmcarthur_ has quit IRC18:30
*** tosky has quit IRC18:32
*** jamesmcarthur_ has joined #openstack-infra18:34
*** betherly has quit IRC18:34
*** smarcet has joined #openstack-infra18:35
cloudnullanyone around mind giving this a look - https://review.opendev.org/#/c/674414/7/zuul.d/molecule.yaml - seems this change is resulting in a zuul configuration validation failure, and I'm not really sure why?18:35
cloudnullkeeps raising "Unknown configuration error"18:36
*** jamesmca_ has joined #openstack-infra18:36
*** jamesmcarthur has quit IRC18:36
clarkbcloudnull: I can take a look18:37
cloudnullthanks clarkb!18:37
mordredcloudnull: do you have hide-ci turned on? because there's a nice clear error message from zuul on that18:38
clarkbhideci should no longer hide that message fwiw18:38
mordredawesome18:39
cloudnullno.18:39
*** jamesmcarthur_ has quit IRC18:39
* cloudnull not that I'm aware of 18:39
fungiand it's "toggle extra ci" now18:39
mordredcloudnull: weird. anywho - I'm going to guess it's https://review.opendev.org/#/c/674414/7/zuul.d/playbooks/releasenotes/notes/docker_enable_vfs-c8b41b02111341df.yaml18:39
mordredwhich zuul is attempting to parse as zuul config18:39
mordredcloudnull: oh - I see the unknown config error you mentioned18:39
clarkbmordred: ya because that isn't a list but instead a dict18:39
mordredcloudnull: there's a better error message on ps718:40
cloudnullits the release note ?18:40
mordredcloudnull: maybe you were in the wrong dir when you ran reno new?18:41
cloudnullthat very well could be...18:41
* cloudnull walks away in shame 18:41
*** jamesmca_ has quit IRC18:41
clarkbvos release on fedora mirror is still running if anyone is wondering18:41
mordredcloudnull: it happens to the best of us - and also to me :)18:42
cloudnullthanks clarkb mordred18:42
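For the record, the fix is just relocating the note to where reno (and zuul) expect it; something like the following, using the filename from the review linked above and assuming the repo already has a releasenotes/notes/ directory:

    # from the repository root: move the stray note out of zuul.d/ so zuul stops
    # trying to parse it as job configuration
    git mv zuul.d/playbooks/releasenotes/notes/docker_enable_vfs-c8b41b02111341df.yaml \
        releasenotes/notes/
    # future notes: run "reno new <slug>" from the repo root so they land in releasenotes/notes/
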
clarkbI wonder if I should rerun the sync script by hand and rerun vos release again just to make sure it comes in under the timeout normally18:44
*** tdasilva has quit IRC18:44
clarkbthis update did remove a bunch of data from the volume so not surprising it is slow18:44
*** tdasilva has joined #openstack-infra18:45
*** e0ne has quit IRC18:45
*** jamesmcarthur has joined #openstack-infra18:46
*** smarcet has quit IRC18:46
*** jamesmcarthur_ has joined #openstack-infra18:48
*** spsurya has quit IRC18:49
*** jamesmcarthur has quit IRC18:50
*** jamesmcarthur has joined #openstack-infra18:53
*** jamesmcarthur_ has quit IRC18:55
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769818:58
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769819:00
*** jamesmcarthur has quit IRC19:01
*** jamesmcarthur_ has joined #openstack-infra19:01
mordredinfra-root: afk for just a bit19:01
*** bhavikdbavishi has quit IRC19:04
*** lpetrut has quit IRC19:05
*** jamesmcarthur_ has quit IRC19:06
*** weifan has joined #openstack-infra19:08
*** igordc has joined #openstack-infra19:08
*** Goneri has joined #openstack-infra19:08
*** jamesmcarthur has joined #openstack-infra19:12
*** weifan has quit IRC19:12
openstackgerritMatt McEuen proposed openstack/project-config master: New project request: airship/kubernetes-entrypoint  https://review.opendev.org/67390019:13
*** e0ne has joined #openstack-infra19:16
*** jamesmcarthur has quit IRC19:17
*** jamesmcarthur has joined #openstack-infra19:19
*** jamesmcarthur has quit IRC19:24
*** iurygregory has quit IRC19:24
*** diablo_rojo has quit IRC19:25
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769819:26
*** jamesmcarthur has joined #openstack-infra19:31
*** e0ne has quit IRC19:31
* clarkb finds lunch19:31
*** smarcet has joined #openstack-infra19:33
*** jamesmcarthur_ has joined #openstack-infra19:36
*** jamesmcarthur has quit IRC19:37
*** jamesmcarthur has joined #openstack-infra19:38
*** jamesmcarthur_ has quit IRC19:40
*** betherly has joined #openstack-infra19:41
*** Goneri has quit IRC19:42
*** Goneri has joined #openstack-infra19:43
*** betherly has quit IRC19:45
*** tdasilva has quit IRC19:45
*** tdasilva has joined #openstack-infra19:46
*** jamesmcarthur has quit IRC19:48
openstackgerritMerged openstack/project-config master: New project request: airship/kubernetes-entrypoint  https://review.opendev.org/67390019:54
*** weifan has joined #openstack-infra19:59
*** weifan has quit IRC20:03
*** Goneri has quit IRC20:07
*** harlowja has joined #openstack-infra20:08
mnasercorvus: ill try to dig a little bit and see whats up with the "405 Method Not Allowed"20:09
mnaseralso, is storyboard-dev.openstack.org having issues?  https://storyboard-dev.openstack.org/api/v1/stories?limit=10&project_id=237&sort_dir=desc&status=active takes almost 10 seconds20:10
fungilooking now20:10
fungiinterestingly, system load on it is nearly nonexistent20:13
*** Goneri has joined #openstack-infra20:13
fungithe vast majority of memory utilization is buffers/cache too20:14
fungimnaser: watching top, it seems that's the amount of time it takes a mysql thread to return the result of the corresponding database queries20:16
*** jcoufal has quit IRC20:17
*** michael-beaver has quit IRC20:18
*** betherly has joined #openstack-infra20:22
*** betherly has quit IRC20:27
*** tdasilva has quit IRC20:27
*** tdasilva_ has joined #openstack-infra20:27
*** weifan has joined #openstack-infra20:27
*** jamesmcarthur has joined #openstack-infra20:27
*** guoqiao has joined #openstack-infra20:41
*** prometheanfire has quit IRC20:41
mnaseri guess maybe some indexes are missing, i dunno20:42
* mnaser shrugs20:42
*** prometheanfire has joined #openstack-infra20:43
*** betherly has joined #openstack-infra20:43
mnaserit makes dealing with this pretty slow20:43
mnaserhttps://storyboard.openstack.org/#!/project/openstack/governance20:43
*** jamesmcarthur has quit IRC20:47
*** betherly has quit IRC20:47
*** jamesmcarthur has joined #openstack-infra20:48
*** jamesmcarthur has quit IRC20:48
*** jamesmcarthur has joined #openstack-infra20:48
donnydDo we know what the jobs are reaching out to the interwebs for? I could just sync what the mirror cannot down locally to speed up many things20:49
donnydmnaser: What do you have the nodepool tenants volume limit set at for vexxhost?20:51
donnydi should say volume quota?20:52
clarkbdonnyd: the idea is that they shouldn't hit the internet for anything (well not without going through the mirror at least) but it is a general purpose ci system and we know people don't use the mirror for everything so we keep that option available20:53
*** Goneri has quit IRC20:53
donnydWell I ask because I was thinking I may try to get on the official mirror list for the major distros so when it goes "upstream", that would be local too20:54
fungimnaser: yes, we've got mysql slow query logging turned on for the production storyboard to try and figure out where the hot spots are and which queries are in most need of refactoring. there are some really egregious ones20:54
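The knobs involved, for anyone following along (a generic MySQL/MariaDB sketch, not the actual production settings):

    mysql -e "SET GLOBAL slow_query_log = 1;
              SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
              SET GLOBAL long_query_time = 1;
              SET GLOBAL log_queries_not_using_indexes = 1;"
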
clarkbdonnyd: oh in many cases it is github repos20:54
mnaserdonnyd: i am not sure, for nodepool i think i have 80*number_of_nodes*small%20:54
clarkbdonnyd: for random projects like puppet and ansible modules or golang repos20:55
mnaserdonnyd: you know what'd be neat?  but also i dont know the security implication of it, but running a transparent squid proxy at your gateway20:55
donnydoh, well i can't do much about that.  I could cache locally with the squids or something like that.. but I would be worried it would interfere with what is upstream and20:55
mnaserand then generate stats from that only (not cache)20:55
mnaser*just* to help us find what are the big things that are being hit20:56
donnydI can surely setup a squid that would just miss on everything and tell us where they are going20:56
donnydmnaser: As long as the squid doesn'20:58
donnyddoesn't have any hits, it shouldn't effect the jobs. right?20:58
donnydhit the enter too early20:58
mnaseri dont think so, it'll just be a pass through, but don't quote me on that :)20:59
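A minimal squid.conf sketch of that log-everything, cache-nothing idea, for plain HTTP only (HTTPS interception would need the ssl-bump / private-CA machinery discussed further down; addresses are placeholders):

    acl localnet src 203.0.113.0/24          # nodepool tenant range (placeholder)
    http_port 3128                           # explicit proxy
    http_port 3129 intercept                 # transparently redirected traffic
    cache deny all                           # pass through, never cache
    access_log /var/log/squid/access.log squid
    http_access allow localnet
    http_access deny all
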
ianwShrews: you've possibly finished by now, but I had a script to maybe do a similar cleanups to what you've described above @ -> https://review.opendev.org/#/c/562510/1/tools/rax-cleanup-image-uploads.py20:59
donnydclarkb: fungi any thoughts on that?20:59
ianwlooks like that never got a second +220:59
ianwinfra-root: could i get a couple of eyes on https://review.opendev.org/#/c/674550/ and the dns change https://review.opendev.org/#/c/674549/ to get the ansible-ised backup server started please?  thanks21:01
ianwit won't have any clients yet, one thing at a time :)21:02
corvusinfra-root: i have paused the ansible run_all script for a bit21:02
clarkbdonnyd: the reason we haven't set up squid ourselves is that we can't know source ranges for all our possible VM IPs to set up whitelists for access to that. We don't want open transparent proxy on the internet21:02
donnydNot sure how I will man-in-the-middle github though21:02
clarkbdonnyd: right that was going to be my next concern. I think if you did it at the cloud level it would have to be transparent21:03
donnydwell I know the range, because I assigned it21:03
clarkbin which case https wouldn't work21:03
*** betherly has joined #openstack-infra21:03
clarkbdonnyd: ya in your cloud that particular problem is probably fine. In most other clouds it is a bigger issue (since we pull from large pools of addrs that are recycled)21:03
corvusif all the clouds support neutron we might be able to set up a private network21:03
clarkbcorvus: I don't think ovh does21:04
donnydwell we only need one cloud to gets the datas21:04
*** mattw4 has quit IRC21:04
*** mattw4 has joined #openstack-infra21:04
donnydI think our options are pretty limited for https proxy. You would have to create a CA the image trusts and then pass it off to my squid and then figure out what breaks21:05
Shrewsianw: oh, i wasn't aware of your script. mine is much more basic. i've had to pause my deletes for a bit but i'll look at yours. If we fix the root problem, we shouldn't need to keep a script around in system-config.21:05
donnydor force instances to not care and trust all21:05
*** diablo_rojo has joined #openstack-infra21:06
donnydI am no squid expert, btw21:07
donnydjust know enough to break things21:07
clarkbdonnyd: ya to do https caching you essentially set up a mitm21:07
clarkb(which is maybe an option for us, I'll have to think about that a bit more)21:07
*** eharney has quit IRC21:07
donnydwell I don't actually need to do the caching part (while it won't bother me to), just find out where the connections are going to.21:08
ianwShrews: yeah, from my notes, at the time it cleared out 200+TB ... maybe a lot of that was de-duped, but it was *a lot*21:08
*** betherly has quit IRC21:08
donnydI can also turn logging up for the edge fw and find out who is talking to who...21:08
clarkbdonnyd: oh for that tcpdump on your gateway may be sufficient. You can tcpdump filtering out the mirror (since we want traffic to go through it) then see what everything else is pinging21:08
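Something like this on the gateway would give that picture (a sketch; interface, tenant network, and mirror address are placeholders):

    # outbound web traffic from the test nodes that is NOT going via the region mirror
    tcpdump -ni eth0 \
        'src net 203.0.113.0/24 and not dst host 198.51.100.10 and (dst port 80 or dst port 443)'
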
fungiit would be possible (i'm not saying it's worth the effort involved) to set up a private ca for infra which signs certificates for a list of specific dns names and then distribute those to transparent proxies in each provider and set our images to add them to the appropriate trust lists21:08
donnydbut won't the client still complain about the mitm21:09
*** weshay is now known as weshay_dentist21:09
funginot if the cert provided by the proxy is trusted by the client21:09
donnydOk, then that could work21:09
fungithat's the "set our images to add them to the appropriate trust lists" part21:09
donnydI can easily lie in dns to tell it who github.com is21:10
fungithe other trick would be working out how to make only requests for specific domains go through them21:10
fungioh, yeah that would do it21:10
donnydDNS attack21:10
donnydLOL21:10
fungiwe could also have our own dns overrides which substitute the proxy's address for those names21:10
donnydsurely could21:11
donnydI would just have to update the tenant's network to point to your dns21:11
donnydYou could also get the who is asking for what from there too21:11
fungiso clients know github.com as the ip address of whatever the local transparent proxy is in that provider, and are configured to trust the infra ca which signed the github.com cert we installed on the proxy21:11
donnydcorrect21:12
fungiwell, we also install local caching dns recursive resolver/forwarder on every node, so could simply carry the overrides there21:12
donnydand you get to know where else they are going by looking in the dns logs for who looked up what21:12
donnydall you would require from the cloud providers is an updated dns for the nodepool network21:13
fungiwe don't actually trust/use the complimentary recursive resolvers in our providers anyway because they tend to be unreliable for a number of reasons21:13
donnydright now its google, but that could easily be you21:13
fungi(most fun one being rackspace, whose resolvers have threat mitigation to automatically blacklist queries from the ip addresses of misbehaving server instances, but the blacklist doesn't get updated as quickly as they turn those addresses over to new instances)21:14
clarkbya so we use google and cloudflare dns with local unbound caching results they return21:15
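If the DNS-override route were taken, the per-node unbound configuration would look roughly like this (the address is a placeholder for the local transparent proxy):

    server:
        # answer github.com (and names under it) with the local proxy's address
        local-zone: "github.com." redirect
        local-data: "github.com. IN A 203.0.113.20"
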
donnydSo your mirror would need a squid and dns server21:16
jrosserdeploying openstack where a bunch of the resources needed come from git repos / mirrors / proxies / blah with a company CA rather than a public one took a ton of work21:16
donnydand some certs, then populate those certs (we already know github.com), but there may be other places21:16
jrosserspecifically the hoops that python requests / certifi make you jump through21:16
clarkbjrosser: good reminder re requests21:16
donnydyou could also prevent the I am gonna pull from the public centos mirrors anyway behaviors too21:17
clarkbit ships its own ca trust chain21:17
jrosseryep, it's a PITA21:17
jrosserso i don't think this is a set-and-forget without fixing up the actual jobs21:17
clarkbya there will be fallout from that for sure21:18
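The usual workarounds jrosser is alluding to look something like this (Debian/Ubuntu paths; a sketch rather than a complete answer for every tool):

    # add the private CA to the system trust store...
    sudo cp infra-ca.crt /usr/local/share/ca-certificates/
    sudo update-ca-certificates
    # ...and point requests/certifi (and curl) at that store instead of their bundled CAs
    export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
    export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
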
*** jamesmcarthur has quit IRC21:18
donnydwell a good start is just doing dns, because then you are going to definitively know where the tests are going21:18
donnydmaybe not speed anything up (except dns requests), but surely get valuable data21:19
fungidonnyd: wouldn't necessarily have to be squid. the apache mod_proxy with proxy caching we already have in place would be capable, in theory21:19
*** mriedem has quit IRC21:19
*** jamesmcarthur has joined #openstack-infra21:20
donnydyea any proxy could work.21:20
jrosseri do have a sort of related example21:21
*** mriedem has joined #openstack-infra21:21
jrosseri suffer / enjoy / deal with http proxies all day and to ensure good support i made a CI job with one21:21
jrosserhttps://logs.opendev.org/21/672721/1/gate/openstack-ansible-deploy-aio_proxy-ubuntu-bionic/130b0f8/logs/host/squid/access.log.txt.gz21:21
*** rh-jelabarre has quit IRC21:23
donnydSo in this job, the CI build starts and sends traffic through this proxy?21:23
jrosserpretty much, yes21:23
*** yamamoto has joined #openstack-infra21:23
fungithat's a neat design21:24
jrosserbut it's really trying to be a mock up of an environment where the deployment is behind a proxy21:24
jrosserand as there isnt a full http proxy as part of the infra i made my own in the job21:24
*** jamesmcarthur has quit IRC21:25
*** yamamoto has quit IRC21:28
*** mattw4 has quit IRC21:28
donnydSo where should we start? I can get working on the logging thing after I am done with work today21:29
*** mattw4 has joined #openstack-infra21:29
donnydI need to get central logging up and running anyways21:29
*** markvoelker has quit IRC21:29
clarkbjrosser: at a job level it is easier to do that because you know all of the source IPs related to the job at runtime21:29
fungiwhat is it you want to do again? measure what the network traffic to nodes is that doesn't get proxied/cached?21:30
jrosseralso a full http proxy is different to the transparent one you have been discussing, so not quite the same thing21:30
jrosserbut anyway, it's really just as an example and a maybe representative set of squid log to look at21:31
donnydfungi: Yea, just see where the hotspots are in the public side and then figure out a way to get that service faster access21:32
*** mattw4 has quit IRC21:32
donnydIf its 90% github.com, then we can solve that a bunch of ways, if its random interwebs there might not be anything that can be done21:32
donnydbut right now we have very little data other than the traffic graph from my wan side (which is 90% these instances). Not that I mind using the public internet for things, but more in the interest of what can be done to speed it up21:33
donnydIf we could do something, we would need a way to make sure that it does in fact make things faster and not slower21:35
fungisimple ip address analyses (netflow, tcpdump, whatever) might yield some insights21:35
fungi(ntop)21:35
donnydlike case in point, the package mirrors. I could have local mirrors that would make access to .rpm and .deb content in orders of magnitude faster21:35
fungipart of the challenge is selectively depending on resources like that hosted by a provider since we don't have it as a consistent option everywhere21:37
*** mattw4 has joined #openstack-infra21:37
fungiis rpm and deb content on our mirror host not fast (once the afs cache is warmed)?21:37
donnydNot as fast as I could provide it21:39
donnydI can cache 100% of the content downloaded locally21:41
donnydOnly so much can be done in 8GB21:41
donnydyea and that makes sense from my perspective. Consistency is more important than speed at scale21:42
clarkbdonnyd: so there are a couple things about the linux distro mirrors that are worth pointing out21:43
clarkbthe first is that they cache to disk not memory so the actual afs disk cache size is much larger than 8GB21:43
clarkbsecond is that through the use of afs we synchronize all of our package mirrors globally21:44
clarkbthis means you don't have jobs passing or failing due to local state (which is nice)21:44
*** betherly has joined #openstack-infra21:44
donnydOh I c, I thought AFS was caching everything in memory.21:44
clarkband finally we build the mirrors very carefully (ubuntu/debian at least) and only publish them when we have a valid mirror ready, and we don't delete older content until some time has passed, to avoid jobs failing because files disappear out from under them21:44
clarkbI think the rpm rsync mirrors already handle phasing out deletions, but also the way tools like yum work means it is far less likely you'll want to download older files compared to, say, apt21:45
fungiyeah, and avoiding mismatched package/index combinations21:45
fungiand afs provides atomic volume updates21:46
fungiso the data in a given package repository isn't changing while it's being read, which helps maintain consistency21:46
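For anyone following along, the client-side disk cache being discussed is configured in the openafs cacheinfo file; a minimal sketch of where to look and how to check usage, assuming a Debian/Ubuntu-style layout:

    # format is mountpoint:cachedir:cache-size-in-1K-blocks, so a ~50GB
    # disk cache looks something like /afs:/var/cache/openafs:50000000
    cat /etc/openafs/cacheinfo
    # show how much of the configured cache is currently in use
    fs getcacheparms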
donnydI am not questioning the value of AFS, it makes sense to me.21:46
donnydBut here is an example https://logs.opendev.org/87/674687/3/check/tripleo-ci-centos-7-scenario001-standalone/ce1d8a5/job-output.txt#_2019-08-05_20_34_33_05627221:47
clarkblooks like we are set to use 50GB of afs disk cache and fn mirror is using about 8GB currently21:47
fungiexcept at given update checkpoints, where we ensure we don't delete packages referenced by the previous index state on that pulse21:47
donnydOn the rest of my network packagey things are about 5x that speed21:47
clarkbdonnyd: so I did a test of warm cache data from afs on the mirror itself and I get super speed (I forget the exact number but it was silly)21:47
clarkbdonnyd: I think we need to test the network bw between VMs?21:48
clarkbbecause testing shows the afs cache works and it is fast, but then pull from off host is slower21:48
clarkb(which means maybe something between the hosts is slowing it down)21:48
*** jamesmcarthur has joined #openstack-infra21:49
*** betherly has quit IRC21:49
donnydwe should do that and figure out where the bottleneck is.  Could be the router between the mirror and the clients21:49
donnydI did just upgrade it so I know for sure it will move at wireline 10G21:49
donnydbut it's worth a look-see.21:49
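A simple way to rule the network path in or out is a raw throughput test between the mirror and a test node; a sketch using iperf3, assuming it can be installed on both ends and the default port 5201 is reachable between them:

    # on the mirror (server side)
    iperf3 -s
    # on a test node (client side), run a 30 second test toward the mirror
    iperf3 -c mirror.regionone.fortnebula.opendev.org -t 30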
donnydthe mirror here should be ludicrously fast. It shares nothing and is on the fastest equipment you can get: a very new cpu / ram / nvme all to itself21:51
clarkbok lets warm a file up then we can request it from elsewhere and see what it looks like21:51
donnydAnyways, I think i should probably work on getting the logging stuffs up and running and then we can work on the host - mirror thing21:52
donnydsure21:52
donnydI think we did BW tests and were able to get quite respectable speeds a few weeks back. I think fungi and I were working on it21:53
clarkbdonnyd: http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso I am going to warm that up. It is 500MB ish21:53
clarkbas expected, cold is not fast. ~1MBps21:54
clarkbso will be a few before we can do second request and see what it is like warm21:54
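The test pattern here is just fetching the same object twice; a sketch of the cold/warm comparison (the first request is served through AFS from the central fileservers, the repeat should come out of the mirror's local disk cache):

    URL=http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso
    # cold: populates the afs cache on the mirror (~1MB/s was observed here)
    wget -O /dev/null "$URL"
    # warm: served from the mirror's disk cache (hundreds of MB/s observed)
    wget -O /dev/null "$URL"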
donnydis there a way to make those cold requests go faster? Or do they go back to something central infra owns?21:55
clarkb(the slowness on cold access is due to how afs does window sizing of its udp packets, unfortunately I don't think that is very fixable at least not without replacing afs)21:55
clarkbdonnyd: ya we have central fileservers and they serve the data and they do it with a max window size that was set back in the 80s? and it is much smaller than modern networking would prefer21:55
*** jamesmcarthur has quit IRC21:55
clarkbit predates tcp so afs does its own windowing with udp21:56
*** whoami-rajat has quit IRC21:58
donnydlmk when to test on my end. Desktop has the 10g's, so I should get some respectable numbers21:59
clarkbwill do21:59
*** mattw4 has quit IRC22:02
*** mattw4 has joined #openstack-infra22:03
*** jtomasek has quit IRC22:04
clarkbdonnyd: cold: 507.00M   931KB/s    in 10m 23s warm: 507.00M   456MB/s    in 1.1s22:04
clarkbdonnyd: you should test it from your desktop now22:05
*** betherly has joined #openstack-infra22:05
donnyd(zuul) donny@office:~> curl -o /dev/null http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso22:05
donnyd  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current22:05
donnyd                                 Dload  Upload   Total   Spent    Left  Speed22:05
donnyd100  507M  100  507M    0     0   292M      0  0:00:01  0:00:01 --:--:--  291M22:05
clarkbnow I guess I should ssh into a test node and give it a test from there22:06
*** rcernin has joined #openstack-infra22:06
donnydOh you know what else I should put on the public graph is mirror node bw22:07
clarkbdonnyd: 507.00M   295MB/s    in 1.7s from one of our bionic test nodes22:07
clarkbthe mystery deepens22:07
*** smarcet has quit IRC22:07
clarkbmaybe that points at our cache not being as warm as we often think it is?22:08
clarkbianw: ^ you've been poking at afs logs recently. Any idea if we can better quantify that?22:08
donnydso that number makes sense to me because my edge fw will only do 3.5Gb/s on a single thread22:08
*** betherly has quit IRC22:09
corvusinfra-root: i have re-enabled the run-all crontab22:12
clarkbcorvus: ty22:12
ianwclarkb: not really, sorry.  i've been working on a little speed test type thing to send to grafana to get openafs/kafs comparisons, not ready just yet22:12
donnydok, now you can see the BW for the mirror node from the host view22:13
mordredclarkb: ++22:14
donnydDoes that graph help?22:17
clarkbdonnyd: ya that likely gives us something we can cross check for over/under loading to at least narrow it down22:17
* donnyd dinnering22:21
clarkbI think the next thing for us to investigate is if those slower than expected downloads are cache misses22:21
*** tosky has joined #openstack-infra22:26
*** diablo_rojo has quit IRC22:28
*** diablo_rojo has joined #openstack-infra22:29
openstackgerritJames E. Blair proposed opendev/base-jobs master: Update swift upload credentials  https://review.opendev.org/67470922:29
clarkbfedora mirror vos release finally completed. I am going to rerun the script manually and vos release manually again just to be sure it runs in a reasonable amount of time22:30
corvusinfra-root: a speedy review of that ^ would help me progress testing22:30
mordredcorvus: looks great22:39
*** rlandy is now known as rlandy|bbl22:42
openstackgerritMerged opendev/base-jobs master: Update swift upload credentials  https://review.opendev.org/67470922:45
donnydclarkb: the cache misses are usually much slower, I found one that was on the quicker side22:45
clarkbdonnyd: so we have an intermediate speed too between cold and hot cache?22:46
donnydI have yet to see anything close to what we can see in hot cache on any instance22:46
*** markvoelker has joined #openstack-infra22:46
*** tosky has quit IRC22:47
donnydHere is a better view of what is most likely not cold, but surely not showing the hot numbers we are getting22:47
donnydhttps://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_15_56050522:47
*** jamesmcarthur has joined #openstack-infra22:48
clarkbhrm ya 27MB/s is well above cold numbers but like 1/10th hot numbers22:48
clarkbrsync took about 16 minutes for fedora and I've started the vos release22:49
donnydI do have a faster nic to drop into my fw sometime in the near future, should get numbers up even higher.. but it doesn't look to me like I can really do anything to make it faster22:52
clarkbI wonder if the intermediate speed could be due to a mix of hot and cold files?22:53
clarkbyum is requesting a whole set, then depending on which are warm or not you get a different aggregate speed? probably still a good idea to check the afs cache hit/miss rate22:54
donnydgranted it only took 3 seconds from flash to bang on the download, so I am not sure that doing something would make a marked improvement on build times22:54
donnydthat part started here22:54
donnydhttps://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_11_87163522:54
donnydBut one would think that if it's in cache you would see speeds more like our test22:56
*** tkajinam has joined #openstack-infra22:56
*** eernst has joined #openstack-infra22:57
fungithroughput for one large file is going to be markedly different than serialized throughput for lots of small files each with its own request22:57
donnydany recommendations on log aggregation tools? looks like E(L/F)K is pretty much the standard. Just curious22:58
clarkbcorvus: ianw http://docs.openafs.org/Reference/1/xstat_cm_test.html is that the command we want to run to get info about cache hit/miss rates?22:58
clarkbthe stat cache is separate from the data cache23:00
* clarkb looks to see what our statcache size is23:00
fungidonnyd: or old school it with rsyslog on a mysql backend23:02
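The old-school route mostly comes down to pointing rsyslog on each host at a central collector; a minimal sketch, assuming a hypothetical collector at 10.0.0.5 (storing into mysql via ommysql would then be configured on the collector itself):

    # forward everything to the central syslog box over TCP (@@) rather than UDP (@)
    cat >/etc/rsyslog.d/90-forward.conf <<'EOF'
    *.* @@10.0.0.5:514
    EOF
    systemctl restart rsyslog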
corvusclarkb: i'm unfamiliar with that command23:02
clarkbgetcacheparms seems to only show dcache entries?23:02
clarkbcorvus: do you know how to get info on the stat cache?23:02
*** eernst has quit IRC23:02
corvusno23:02
donnydfungi: which do you think requires the lowest level of maintenance?23:02
donnydI was looking at graylog just because it does some inline transforms I would think I could make use of23:03
clarkbdocs say default stats cache size is 300 on unix machines23:03
clarkbtrying to see what we are actually set to /me digs around in config files23:03
*** aaronsheffield has quit IRC23:03
fungidonnyd: once set up, neither requires a considerable amount of maintenance. elk is a lot more resource-heavy, but has all your log analysis and indexing without needing separate analysis tools23:03
*** diablo_rojo has quit IRC23:03
fungiwhen i hear "log aggregation" i don't immediately assume log analysis too23:04
donnydfungi: I want to push the logs public too, but I don't particularly want to hand someone keys to the network23:04
donnydSo something I could easily do a transform on would be baller23:05
fungikibana can be public facing. exposing elasticsearch to the open internet on the other hand is dangerific23:05
*** mattw4 has quit IRC23:06
*** mattw4 has joined #openstack-infra23:06
clarkbafsd manpage says the -stat value is actually based on the size of the dcache23:06
donnydLOL,  of course I just meant a dashboard.. not handing over my monitoring stuffs to anyone either, but dashboards are fairly harmless23:06
fungibut that raises another hidden goal which mere "log aggregation" doesn't cover in my book. publication is something kibana handles for you. if you wanted a web frontend to your mysql tables that would be yet another something you'd need to find/write23:07
clarkbI now believe that line 3 in http://paste.openstack.org/show/755542/ may expose the stat cache23:07
clarkbwe set the dcache to 50GB and from that afs decides we should be able to cache 1.5 million stat entries23:07
clarkbwe actually use more dcache than stat cache so that should be fine23:08
clarkbcorvus: any objection to me trying that cm_test command?23:09
donnydfungi: so you are saying elk is a pretty safe bet23:10
clarkb`xstat_cm_test -cmname mirror01.regionone.fortnebula.org -colID 2 -onceonly` something like that23:10
corvusclarkb: honestly, no idea23:11
fungidonnyd: well, i'm saying it's a popular choice, and you're looking for at least a few bells and whistles it provides over mere log aggregation23:11
donnydyea, I think that is safe to say. I need a log thingy that does all the things23:12
fungibe mindful of the licensing. there are proprietary-license builds subject to a number of additional conditions; you probably want to stick to the open source version23:13
donnydthanks fungi23:14
clarkbdcache miss rate is .08%. vcache miss rate is .3%23:16
clarkbboth are within the recommended parameters23:17
clarkbhttps://docs.openafs.org/AdminGuide/HDRWQ402.html23:17
*** markvoelker has quit IRC23:20
clarkbmirror01.regionone.fortnebula.opendev.org:/root/cache_manager_data if anyone else wants to see the verbose output23:20
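For anyone wanting to reproduce those numbers, the rough procedure is to dump the cache manager performance counters and compare hits against misses; a sketch reusing the command from above (the exact counter names in the output vary by openafs version, so the grep pattern is illustrative only):

    # collection 2 is the full cache manager performance data
    xstat_cm_test -cmname mirror01.regionone.fortnebula.opendev.org -colID 2 -onceonly > cm_stats.txt
    # pull out the dcache/vcache hit and miss counters and work out the rates by hand
    grep -iE '(dcache|vcache).*(hit|miss)' cm_stats.txt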
*** betherly has joined #openstack-infra23:27
*** betherly has quit IRC23:32
*** kjackal has quit IRC23:38
openstackgerritJames E. Blair proposed opendev/base-jobs master: Synchronize cloud creds in base-test-swift job  https://review.opendev.org/67471323:46
corvusinfra-root: ^ small update i missed earlier23:46
clarkbvos release has now been running for almost an hour. The rsync pulled in some new arm64 images for atomic. I do wonder if this was ultimately what broke it in the first place?23:46
clarkb+223:46
*** weifan has quit IRC23:48
*** weifan has joined #openstack-infra23:48
fungiquite possible23:52
fungiwhat jobs are using atomic images?23:52
*** weifan has quit IRC23:52
clarkbfungi: magnum jobs aiui23:53
corvusi think the expiration time is something like 12h23:53
corvusthough i'm not sure what happens if we hit our cron job timeout first23:53
clarkbit actually doesn't look like we have a cron job timeout on these23:54
corvusgood23:54
corvusthe 12h timeout (or maybe it's 10h?) is the longest that a vos release can take without using -localauth23:54
clarkbya crontab lacks timeouts and the commands with timeouts in the script itself do not include the vos release23:55
corvusso if a release from mirror-update takes longer than that, we'll be left with a locked volume23:55
clarkbk23:56
clarkbin that case it is probably safe for me to unlock the lockfile when this vos release finishes assuming it does so in under 12 hours23:56
corvus(the transaction to perform the update will eventually finish, but the auth will have expired for the subsequent volume unlock)23:56
corvusclarkb: you running from m-u?23:56
clarkbcorvus: no I am running the vos release on afs01.dfw.openstack.org23:57
*** smarcet has joined #openstack-infra23:57
corvusoh you mean unlock the cron lockfile23:57
corvusgotcha, yeah23:57
clarkbyes that23:57
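For the record, a minimal sketch of the two vos commands being discussed, run as root directly on a fileserver (the volume name mirror.fedora is an assumption for illustration; -localauth sidesteps the token lifetime limit):

    # release the read-only replicas without a token that can expire mid-run
    vos release mirror.fedora -localauth
    # if an earlier release died with an expired token and left the volume locked
    vos unlock mirror.fedora -localauth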
*** smarcet has left #openstack-infra23:57
openstackgerritMerged opendev/base-jobs master: Synchronize cloud creds in base-test-swift job  https://review.opendev.org/67471323:57
*** owalsh_ has joined #openstack-infra23:58
