Monday, 2019-08-05

*** markvoelker has joined #openstack-infra00:02
*** markvoelker has quit IRC00:06
*** markvoelker has joined #openstack-infra00:45
*** jamesmcarthur has quit IRC00:57
openstackgerritIan Wienand proposed opendev/system-config master: kafs support  https://review.opendev.org/62397400:59
openstackgerritIan Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels  https://review.opendev.org/66505700:59
openstackgerritIan Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd  https://review.opendev.org/67421500:59
*** markvoelker has quit IRC01:09
*** jamesmcarthur has joined #openstack-infra01:19
ianwclarkb: thanks for looking at mirror; lmn if any help needed01:19
openstackgerritMerged opendev/system-config master: Re-add the Debian 8/jessie key to reprepro  https://review.opendev.org/67440601:29
*** jamesmcarthur has quit IRC01:32
*** yikun has joined #openstack-infra01:44
*** yamamoto has joined #openstack-infra01:45
openstackgerritMerged openstack/diskimage-builder master: journal-to-console: element to send systemd journal to console  https://review.opendev.org/66978402:00
*** markvoelker has joined #openstack-infra02:01
*** bhavikdbavishi has joined #openstack-infra02:02
*** jamesmcarthur has joined #openstack-infra02:03
*** yamamoto has quit IRC02:04
*** yamamoto has joined #openstack-infra02:04
*** markvoelker has quit IRC02:06
*** bhavikdbavishi has quit IRC02:07
*** bhavikdbavishi1 has joined #openstack-infra02:07
*** bhavikdbavishi1 is now known as bhavikdbavishi02:09
*** n-saito has joined #openstack-infra02:10
*** bhavikdbavishi has quit IRC02:22
*** markvoelker has joined #openstack-infra02:32
*** yamamoto has quit IRC02:35
*** whoami-rajat has joined #openstack-infra02:38
*** yamamoto has joined #openstack-infra02:38
*** markvoelker has quit IRC02:42
*** gregoryo has joined #openstack-infra02:46
openstackgerritMerged openstack/diskimage-builder master: Cleanup: remove useless statement  https://review.opendev.org/66837202:46
*** ricolin has joined #openstack-infra02:51
*** bhavikdbavishi has joined #openstack-infra03:11
*** jamesmcarthur has quit IRC03:11
*** markvoelker has joined #openstack-infra03:13
*** markvoelker has quit IRC03:17
*** psachin has joined #openstack-infra03:29
*** ricolin_ has joined #openstack-infra03:35
*** ricolin has quit IRC03:38
*** mrda has joined #openstack-infra03:39
mrdahey infra, I've seen some strange behaviour lately in git cloning off opendev.org.  For example, I cloned https://opendev.org/openstack/devstack.git and got a bunch of zero length files.  i.e. like this: https://pastebin.com/fhAHaZpb  I didn't see any errors on the clone, and only when I ran ./tools/create-stack-user.sh did I see anything wrong.  Just wanted to add the data point here in case others03:41
mrdasee it too.03:41
*** jamesmcarthur has joined #openstack-infra03:42
*** ramishra has joined #openstack-infra04:04
*** udesale has joined #openstack-infra04:07
*** AJaeger is now known as AJaeger_04:11
*** dpawlik has joined #openstack-infra04:13
*** dpawlik has quit IRC04:20
*** ykarel has joined #openstack-infra04:23
*** markvoelker has joined #openstack-infra04:28
*** Lucas_Gray has joined #openstack-infra04:29
*** jamesmcarthur has quit IRC04:30
*** markvoelker has quit IRC04:33
*** ramishra has quit IRC04:45
*** ramishra has joined #openstack-infra04:45
ianwmrda: is it only devstack or you've seen this on other repos?04:45
fungii can't seem to reproduce it. could you have run out of disk space?04:47
mrdaIt's happened 3 times in a week, and I've just recloned and all was well.04:48
mrdaIt's not a disk space issue. FWIW, it was in a local KVM running F29 on a F30 host.04:49
fungistrange, i wonder if one of the backends has a corrupt copy04:49
mrdaJust figured y'all would like to know in case any other reports come in04:49
fungiappreciated!04:51
fungii'm test cloning it from all 8 of the backends individually now04:51
mrdathanks fungi04:51
fungino dice. tried to reproduce by cloning from all 8 backends individually04:59
fungiso doesn't look like a corrupt repository on any of them at least04:59
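A rough sketch of the per-backend check fungi describes, assuming the gitea backends are reachable directly as gitea01 through gitea08.opendev.org on port 3000 (those hostnames and the port are assumptions, not taken from this log):

    for i in 01 02 03 04 05 06 07 08; do
        # clone straight from one backend, bypassing the load balancer
        git clone "https://gitea${i}.opendev.org:3000/openstack/devstack" "devstack-${i}" &&
        # verify object integrity; truncated or zero-length objects would show up here
        git -C "devstack-${i}" fsck --full
    done
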
*** janki has joined #openstack-infra05:02
mrdaok, thanks for letting me know.  If I see it again I'll report it here.05:02
fungianother possibility could be a misbehaving proxy (if you have a transparent https proxy anyway)05:03
*** jamesmcarthur has joined #openstack-infra05:04
*** Lucas_Gray has quit IRC05:25
*** notmyname has quit IRC05:40
*** jaosorior has joined #openstack-infra05:41
*** notmyname has joined #openstack-infra05:41
*** dpawlik has joined #openstack-infra05:43
*** kota_ has quit IRC05:46
*** yamamoto has quit IRC05:47
*** yamamoto has joined #openstack-infra05:48
ianwi would have thought git's checksumming would have avoided proxies getting in the way ... the other thing might be some odd manifestation of I5ebdaded3ffd0a5bc70c5e9ab5b18daefb358f58 if it's devstack only05:51
*** jamesmcarthur has quit IRC05:53
ianwlike if you only notice it after it runs05:58
mrdaI shall take a look05:59
*** dchen has quit IRC06:06
*** ramishra has quit IRC06:06
*** ramishra has joined #openstack-infra06:06
*** jamesmcarthur has joined #openstack-infra06:23
*** jamesmcarthur has quit IRC06:27
*** dpawlik has quit IRC06:28
*** jtomasek has joined #openstack-infra06:34
*** kota_ has joined #openstack-infra06:38
*** dchen has joined #openstack-infra06:40
*** ricolin__ has joined #openstack-infra06:41
*** ricolin__ is now known as ricolin06:41
*** dpawlik has joined #openstack-infra06:44
*** ricolin_ has quit IRC06:45
*** apetrich has joined #openstack-infra06:48
*** jamesmcarthur has joined #openstack-infra06:49
*** kopecmartin|pto is now known as kopecmartin06:51
*** dchen has quit IRC06:53
*** ginopc has joined #openstack-infra06:54
*** ralonsoh has joined #openstack-infra06:54
*** jamesmcarthur has quit IRC06:54
openstackgerritIan Wienand proposed opendev/system-config master: Ansible roles for backup  https://review.opendev.org/66265707:01
*** pgaxatte has joined #openstack-infra07:01
*** rcernin has quit IRC07:04
*** slaweq has joined #openstack-infra07:05
*** xek has joined #openstack-infra07:05
*** pcaruana has joined #openstack-infra07:08
*** pkopec has joined #openstack-infra07:11
*** markvoelker has joined #openstack-infra07:17
*** tesseract has joined #openstack-infra07:17
*** tosky has joined #openstack-infra07:19
*** bhavikdbavishi has quit IRC07:27
*** xek has quit IRC07:28
*** ccamacho has joined #openstack-infra07:49
*** markvoelker has quit IRC07:51
*** rpittau|afk is now known as rpittau07:51
*** ykarel is now known as ykarel|lunch07:59
*** dchen has joined #openstack-infra08:07
*** bhavikdbavishi has joined #openstack-infra08:08
*** tkajinam has quit IRC08:11
*** yamamoto has quit IRC08:13
*** iurygregory has joined #openstack-infra08:16
*** arxcruz is now known as arxcruz|grb08:18
*** arxcruz|grb is now known as arxcruz|brb08:18
*** dchen has quit IRC08:19
*** tesseract-RH has joined #openstack-infra08:22
*** tesseract has quit IRC08:22
*** piotrowskim has joined #openstack-infra08:24
*** Lucas_Gray has joined #openstack-infra08:24
*** tesseract-RH has quit IRC08:24
*** tesseract has joined #openstack-infra08:24
*** gregoryo has quit IRC08:25
openstackgerritIan Wienand proposed opendev/zone-opendev.org master: Add vexxhost backup server  https://review.opendev.org/67454908:29
*** Lucas_Gray has quit IRC08:30
openstackgerritIan Wienand proposed opendev/system-config master: Add vexxhost backup server  https://review.opendev.org/67455008:36
*** sshnaidm|afk is now known as sshnaidm08:37
*** janki has quit IRC08:43
*** smrcascao has joined #openstack-infra08:45
*** e0ne has joined #openstack-infra08:48
openstackgerritMerged opendev/system-config master: Ansible roles for backup  https://review.opendev.org/66265708:48
*** jaosorior has quit IRC08:51
*** yamamoto has joined #openstack-infra08:54
openstackgerritMerged opendev/irc-meetings master: Free up unused Murano meeting slot  https://review.opendev.org/67430708:56
*** markvoelker has joined #openstack-infra08:56
openstackgerritMerged opendev/irc-meetings master: Free up unused Gluon meeting slot  https://review.opendev.org/67430408:59
openstackgerritMerged opendev/irc-meetings master: Free up unused openstack-chef meeting slot  https://review.opendev.org/67430008:59
*** gfidente has joined #openstack-infra09:00
openstackgerritIan Wienand proposed opendev/system-config master: Add vexxhost backup server  https://review.opendev.org/67455009:00
*** BrentonPoke has quit IRC09:01
*** ykarel|lunch is now known as ykarel09:08
*** yamamoto has quit IRC09:09
*** tdasilva has joined #openstack-infra09:09
*** guoqiao has quit IRC09:24
*** markvoelker has quit IRC09:30
*** yamamoto has joined #openstack-infra09:44
*** janki has joined #openstack-infra09:45
*** spsurya has joined #openstack-infra09:50
*** bhavikdbavishi has quit IRC09:53
*** jaosorior has joined #openstack-infra10:05
*** ociuhandu has joined #openstack-infra10:20
*** janki has quit IRC10:27
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL  https://review.opendev.org/67457210:38
*** markvoelker has joined #openstack-infra10:38
sshnaidmcores please take a look ^^10:42
*** markvoelker has quit IRC10:43
*** arxcruz|brb is now known as arxcruz10:52
mordredsshnaidm: left a suggestion10:55
openstackgerritSagi Shnaidman proposed zuul/zuul-jobs master: Don't install centos repos on RHEL  https://review.opendev.org/67457210:57
*** tdasilva has quit IRC10:57
sshnaidmmordred, updated ^10:57
*** tdasilva has joined #openstack-infra10:58
mordredsshnaidm: lgtm. I think we could also get rid of that ansible_os_family line now - but this reads well10:58
*** dchen has joined #openstack-infra11:00
*** rosmaita has joined #openstack-infra11:01
*** rascasoft has quit IRC11:03
*** rascasoft has joined #openstack-infra11:05
*** pgaxatte has quit IRC11:06
*** pkopec_ has joined #openstack-infra11:10
*** pkopec__ has joined #openstack-infra11:14
*** pkopec has quit IRC11:14
*** pkopec has joined #openstack-infra11:15
*** pkopec_ has quit IRC11:16
*** pkopec__ has quit IRC11:18
*** jaosorior has quit IRC11:23
*** ramishra has quit IRC11:24
openstackgerritIan Wienand proposed opendev/system-config master: kafs support  https://review.opendev.org/62397411:26
openstackgerritIan Wienand proposed opendev/system-config master: ubuntu-kernel: role to use Ubuntu mainline kernels  https://review.opendev.org/66505711:26
openstackgerritIan Wienand proposed opendev/system-config master: kafs: allow to skip cachefilesd  https://review.opendev.org/67421511:26
*** ramishra has joined #openstack-infra11:26
*** jamesdenton has joined #openstack-infra11:30
*** pkopec_ has joined #openstack-infra11:31
*** pkopec has quit IRC11:33
*** rh-jelabarre has joined #openstack-infra11:45
*** jamesmcarthur has joined #openstack-infra11:46
*** ociuhandu has quit IRC11:49
*** ociuhandu has joined #openstack-infra11:50
*** yamamoto has quit IRC11:53
*** markvoelker has joined #openstack-infra11:55
*** adriancz has joined #openstack-infra12:01
*** markvoelker has quit IRC12:02
*** markvoelker has joined #openstack-infra12:02
*** pgaxatte has joined #openstack-infra12:02
*** jamesmcarthur has quit IRC12:05
*** dchen has quit IRC12:06
*** udesale has quit IRC12:06
*** udesale has joined #openstack-infra12:07
*** markvoelker has quit IRC12:09
*** jaosorior has joined #openstack-infra12:10
*** markvoelker has joined #openstack-infra12:11
*** ociuhandu has quit IRC12:12
*** yamamoto has joined #openstack-infra12:13
*** betherly has joined #openstack-infra12:15
*** rfolco has joined #openstack-infra12:16
*** rfolco is now known as rfolco|ruck12:16
*** ociuhandu has joined #openstack-infra12:20
sshnaidmsome problem with limestone:  SSH Error: data could not be sent to remote host "10.4.70.72". Make sure this host can be reached over ssh12:24
*** yamamoto has quit IRC12:24
sshnaidmHostname: centos-7-limestone-regionone-0009753160 Provider: limestone-regionone12:24
*** rlandy has joined #openstack-infra12:26
logan-sshnaidm: could you link to the job logs, i'd like to take a look into that12:29
sshnaidmlogan-, https://logs.opendev.org/33/674433/7/check/tripleo-ci-centos-7-containers-multinode/bd26761/job-output.txt.gz#_2019-08-05_12_05_52_38981512:31
logan-thanks12:31
*** jroll has quit IRC12:39
*** jroll has joined #openstack-infra12:39
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/66213412:40
openstackgerritMark Meyer proposed zuul/zuul master: Rework some bugs  https://review.opendev.org/67442512:40
*** jamesmcarthur has joined #openstack-infra12:41
*** ccamacho has quit IRC12:43
*** priteau has joined #openstack-infra12:45
*** ramishra has quit IRC12:46
*** yamamoto has joined #openstack-infra12:48
*** jamesdenton has quit IRC12:49
*** priteau has quit IRC12:54
logan-sshnaidm: going to monitor for a while. it looks like some rabbitmq messages were being held up due to a massive "notifications.sample" queue backlog. I cleaned that up and I don't see neutron messages backing up anymore, but that may have been the cause of communications issues during your multinode jobs. we'll see12:55
sshnaidmlogan-, thanks!12:56
*** bhavikdbavishi has joined #openstack-infra12:59
*** jamesmcarthur has quit IRC13:04
*** aaronsheffield has joined #openstack-infra13:04
*** jamesdenton has joined #openstack-infra13:04
fungioverrun with rabbits13:05
*** ekultails has joined #openstack-infra13:05
*** goldyfruit has joined #openstack-infra13:06
*** goldyfruit has quit IRC13:11
*** bhavikdbavishi has quit IRC13:12
*** bhavikdbavishi has joined #openstack-infra13:13
*** Lucas_Gray has joined #openstack-infra13:14
*** mriedem has joined #openstack-infra13:14
*** jamesmcarthur has joined #openstack-infra13:15
*** ykarel is now known as ykarel|afk13:22
mordredfungi: I didn't realize the outer banks had rabbit issues13:22
fungijust the openstack parts13:23
*** ociuhandu has quit IRC13:32
*** HenryG has quit IRC13:37
*** aedc has quit IRC13:38
openstackgerritThierry Carrez proposed openstack/ptgbot master: Display room capabilities  https://review.opendev.org/67460613:38
*** jbadiapa has joined #openstack-infra13:39
*** aedc has joined #openstack-infra13:39
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: POC: Add ensure-managed role  https://review.opendev.org/67460913:42
*** jamesdenton has quit IRC13:44
*** jbadiapa has quit IRC13:44
*** pcaruana has quit IRC13:47
*** dchen has joined #openstack-infra13:48
*** dchen has joined #openstack-infra13:49
*** dchen has quit IRC13:49
*** ykarel|afk is now known as ykarel|away13:52
*** bhavikdbavishi has quit IRC13:54
*** iurygregory has quit IRC13:55
*** yamamoto has quit IRC13:58
*** pcaruana has joined #openstack-infra14:00
*** ramishra has joined #openstack-infra14:07
*** yamamoto has joined #openstack-infra14:08
johnsomIs http://paste.openstack.org/ down or is it just me?14:10
AJaeger_infra-root, takes ages to respond (still spinning) for  me as well ^14:11
AJaeger_wait, it's there now...14:11
johnsomYeah, mine just opened as well.14:11
*** rlandy is now known as rlandy|brb14:13
*** ociuhandu has joined #openstack-infra14:19
*** eharney has joined #openstack-infra14:20
*** Lucas_Gray has quit IRC14:20
*** Lucas_Gray has joined #openstack-infra14:22
*** jcoufal has joined #openstack-infra14:22
*** _erlon_ has joined #openstack-infra14:23
*** rlandy|brb is now known as rlandy14:23
*** jamesdenton has joined #openstack-infra14:23
fungii believe it occasionally gets its mysql socket timed out and the backend daemon doesn't realize it, so takes a while before it tries to reconnect14:23
*** _erlon_ has left #openstack-infra14:23
fungii've tried to track down the cause several times in the past, to no avail14:24
fungiwe probably ought to just move its db back out of trove and keep it local to the server14:24
*** pkopec_ has quit IRC14:25
*** mriedem has quit IRC14:29
*** mriedem has joined #openstack-infra14:29
*** yamamoto has quit IRC14:31
*** yamamoto has joined #openstack-infra14:32
*** jaosorior has quit IRC14:36
*** yamamoto has quit IRC14:37
openstackgerritThierry Carrez proposed openstack/ptgbot master: Generate etherpad links automatically  https://review.opendev.org/67462214:38
*** jaosorior has joined #openstack-infra14:39
openstackgerritThierry Carrez proposed openstack/ptgbot master: Display room capabilities  https://review.opendev.org/67460614:52
*** Lucas_Gray has quit IRC14:54
clarkbI'm going to rerun the fedora mirror sync with updated atomic excludes just as soon as my morning meeting completes14:57
clarkbthen rerun vos release in screen with -localauth14:57
clarkband if that looks happy I will release the lockfile on mirror-update14:57
*** pkopec has joined #openstack-infra14:59
*** bnemec-pto is now known as bnemec15:01
donnydclarkb: It looks like you are moving to object storage, but I haven't been following for what. I am working on getting some all nvme object storage going (gonna be a bit). Is every provider going to need object stores?15:01
clarkbdonnyd: I think the idea is to use object stores for log storage if they are available. We won't be requiring that of every cloud but will happily make use of them from those that have it (as a note we probably want ipv4 from those instances since client browsers will make log requests directly to the swift api)15:03
donnydAlso FN is at capacity, and it looks like all of the issues have been pretty well ironed out. The only errors i get is when a fresh image is loaded, nodepool schedules too many of that instance type too fast15:03
clarkbdonnyd: as long as that results in "clean failures" eg we don't break the cloud in the process that is probably fine15:03
donnydAll the api's are on v4 here (also working v6 for that too.)15:03
donnydSo if its useful, I can make it my next target... I need to find a faster backend for glance anyways. How much space are you thinking the logs will require? Performance sla?15:04
clarkbdonnyd: corvus likely has better grasp of those numbers since he has been working on it recently, but I can take a look after meetings too15:05
*** Lucas_Gray has joined #openstack-infra15:07
donnydJust for anyone who was curious, I know in the beginning of getting FN we talked a little about power requirements. At full load with 100(ish) instance in system I am using 3200 watts15:13
fungithat's still probably half the power my dec alphastation drew at idle ;)15:15
donnydAbout 250 bucks a month in electrical costs. Quite honestly that isn't that bad15:16
donnydThe blade chassis isn't really efficient until i put a full lot in it, and a full load on it. My 3 EMC isilions were taking more than that just for disk prior to the NVME build15:17
fungiwow15:17
*** yamamoto has joined #openstack-infra15:19
*** yamamoto has quit IRC15:25
*** goldyfruit has joined #openstack-infra15:26
*** pkopec has quit IRC15:26
*** betherly has quit IRC15:29
*** ralonsoh has quit IRC15:29
*** iurygregory has joined #openstack-infra15:38
*** ociuhandu has quit IRC15:40
*** ociuhandu has joined #openstack-infra15:40
*** ociuhandu has quit IRC15:41
*** sthussey has joined #openstack-infra15:41
*** ociuhandu has joined #openstack-infra15:42
*** ociuhandu has quit IRC15:43
*** ociuhandu has joined #openstack-infra15:43
openstackgerritTobias Henkel proposed zuul/zuul master: Report retried builds in a build set via mqtt.  https://review.opendev.org/63272715:43
openstackgerritTobias Henkel proposed zuul/zuul master: Report retried builds via sql reporter.  https://review.opendev.org/63350115:43
*** ociuhandu has quit IRC15:43
*** ociuhandu has joined #openstack-infra15:45
*** ociuhandu has quit IRC15:45
*** ociuhandu has joined #openstack-infra15:45
*** pgaxatte has quit IRC15:48
*** gyee has joined #openstack-infra15:48
*** Lucas_Gray has quit IRC15:51
corvusdonnyd, clarkb: our current total log usage is 9.2TiB in 380M inodes; we're not sure how many cloud providers we can use yet, but including FN, it would be between 4 and 7, so that's something like 1.3-1.2TiB per provider and 54-95M objects.15:52
corvusdonnyd, clarkb: if we have room, we might expand our log retention a bit15:54
clarkbcorvus: we are at 4 weeks now?15:54
donnydRequirements for speed?15:54
corvusyeah, looks like 30 days15:55
corvusi don't think we've ever characterized the numbers around speed.  everything gets written once and then read once automatically, after that, very little of it gets read.  there shouldn't be a lot of concurrent activity, and read throughput isn't a big deal, but it would be good if it didn't take too long to fetch a random file since that's a user-interactive thing.15:57
corvusmaybe there's a nugget of useful info in there, sorry.15:58
*** AJaeger_ is now known as AJaeger15:59
donnydI can add on swift without issue, just gonna take some time. The reason I ask is I am using all nvme disks. If this has no specific performance requirements, I can put it on normal spinning rust and offer a lots more in terms of available space. I have 3.2TB of nvme to throw at swift and 500Tb of spinning rust laying around.16:00
donnydPreferance for swift over ceph or vice vesa?16:01
clarkbdonnyd: we need CORS headers to make this work and ceph's swift api doesn't support that (the s3 api does but we've not yet gotten that working/confirmed) I think that means the preference for now is swift proper16:02
corvusi think i lean toward more spinning rust being more useful in the long run; should be sufficient performance and has potential for growth.  clarkb, fungi?16:02
corvusswift++16:02
clarkbdonnyd: also spinning disk is probably fine since ya being able to extend retention would be great16:02
clarkbI am starting the fedora mirror sync with updates (no vos release) now16:02
fungiooh, this is always a fun debate16:03
fungiand i agree, cheap slow bulk storage is likely a great tradeoff16:03
corvusi typo'd before -- current retention would be 1.3-2.3TiB per provider16:04
donnydOk, I think i can do that without any issues at all.16:04
*** tesseract has quit IRC16:04
donnydIf its rust, the smallest drive I have is 3tb16:04
fungiwrite speed probably matters more than read speed (though maybe the amount of the two is going to be roughly on par considering our current logstash worker implementation?)16:04
corvusif we increase retention to our target of 6months, it would be 8-14TiB per provider16:04
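The arithmetic behind those per-provider figures, reproduced as a quick sketch from the numbers quoted above (9.2 TiB and 380M objects over ~30 days, split across 4-7 providers):

    awk 'BEGIN {
        tib = 9.2; objs_m = 380                 # current ~30-day log footprint
        n = split("7 4", p)
        for (i = 1; i <= n; i++) {
            printf "%d providers: %.1f TiB and %.0fM objects each\n", p[i], tib / p[i], objs_m / p[i]
            printf "  at 6 months (~6x): %.1f TiB each\n", 6 * tib / p[i]
        }
    }'
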
*** ociuhandu_ has joined #openstack-infra16:04
donnydso as long as we can keep it below say 20TB, I am a happy camper.16:05
corvusseems reasonable16:05
corvusfungi: can you +3 https://review.opendev.org/674459 ?16:05
fungicorvus: yep, that's a simple enough one16:06
*** ociuhandu has quit IRC16:07
*** michael-beaver has joined #openstack-infra16:09
*** ginopc has quit IRC16:09
*** lpetrut has joined #openstack-infra16:09
clarkbrsync done, doing vos release with localauth now16:09
donnydAlso more interesting tidbits of info on what the CI looks like from the cloud perspective. The instances only use about 25% of what they are provisioned for disk space16:14
openstackgerritMerged opendev/base-jobs master: Add zuul manifest role to swift base job  https://review.opendev.org/67445916:15
AJaegerconfig-core, please review https://review.opendev.org/67398816:15
clarkbAJaeger: on it16:15
AJaegercan we merge new repo creation changes again? https://review.opendev.org/673900 and https://review.opendev.org/673898 create two new airship repos16:16
AJaegerthanks, clarkb16:16
AJaegerand here's one more change: https://review.opendev.org/65843916:16
clarkbAJaeger: yes all gitea backends have been replaced and are being managed by ansible, creating projects should be safe16:16
AJaegerand one more https://review.opendev.org/673764 for review, please16:17
AJaegerclarkb: great16:17
*** e0ne has quit IRC16:19
clarkbAJaeger: pabelanger I left a concern on https://review.opendev.org/#/c/658439/216:19
AJaegerthanks for spotting16:20
*** gfidente has quit IRC16:20
fungidonnyd: we could likely get away with something like 3:1 oversubscribed storage because only a fraction of jobs need additional space (for example jobs which create an assortment of cinder volumes for some tests). problem likely is that oversubscribing means thin-provisioning rootfs volumes, which likely implies a write throughput penalty16:21
fungialso some jobs use swap, and i expect that would get markedly worse performance on thin-provisioned volumes16:22
openstackgerritMerged openstack/project-config master: Fix ACL for compute-hyperv  https://review.opendev.org/67398816:28
AJaegerclarkb: time to review the two new repo creations changes as well?16:28
donnydI see no write penalties on my thin volumes, but my drives are so much faster than the rest of everything else... I can't really tell16:29
* AJaeger would love to be in that situation ;)16:30
clarkbknocked 160GB off the fedora mirror I think16:30
clarkbAJaeger: and ya I'll review those shortly16:30
AJaegerclarkb: thanks16:31
donnydIts fun to see just how stupid fast nvme drives really are16:31
*** ociuhandu_ has quit IRC16:33
*** ociuhandu has joined #openstack-infra16:33
donnydIf only i could get this on all the hypervisors   READ: bw=11.4GiB/s (12.2GB/s), 1454MiB/s-1555MiB/s (1525MB/s-1630MB/s), io=80.0GiB (85.9GB), run=6586-7043msec16:33
*** mattw4 has joined #openstack-infra16:34
*** bhavikdbavishi has joined #openstack-infra16:34
openstackgerritMerged openstack/project-config master: Allow registered users to vote for backport candidates  https://review.opendev.org/67376416:35
donnydAJaeger: And I have less in it than most peoples laptops16:36
*** diablo_rojo has joined #openstack-infra16:37
*** mattw4 has quit IRC16:37
*** mattw4 has joined #openstack-infra16:39
*** jamesmcarthur has quit IRC16:39
clarkbinfra-root AJaeger can you check my comment on https://review.opendev.org/#/c/673900/2 and consider if that concern is valid?16:40
fungidonnyd: oh, yeah, if you're already thin-provisioning and not seeing a disk bottleneck, then i guess that's ideal (and also presumably means you have available blocks on the backend?)16:40
mordredclarkb: I agree with yoru comment16:42
*** weifan has joined #openstack-infra16:43
donnydfungi: I am 100% overprovisioned on the backend. ( requested 8TB, have 4Tb). We are only using 25% of the 4tb I have.16:43
donnydso its more like it actually uses 12.5% of what is requested.16:43
AJaegerclarkb: I see your point but few people type it, so I'm fine either way.16:44
clarkbya testing cinder in particular requires quite a bit of disk space, but most other jobs likely don't get near the limits16:44
*** ramishra has quit IRC16:44
donnydI will keep an eye on it, as today is the first day at full capacity16:44
clarkb(cinder wouldn't need so much disk if they allowed for specifying sizes smaller than incremements of 1GB)16:44
clarkbiirc devstack provisions ~30GB of disk just for cinder beause we have multiple threads running and the volume deletion for test cleanup can lag behind subsequent tests starting16:45
fungidonnyd: that's excellent data, thanks!16:45
*** ociuhandu has quit IRC16:45
clarkbAJaeger: I mostly don't want the next k8s related thing to come along and be cranky someone has decided they are the k8s thing16:46
donnydit just interesting to see what the CI system actually uses.. From my POV, its memory, then cpu, then network, then storage16:46
fungiclarkb: some swift tests might too, since i know they create an auxiliary filesystem to be able to exercise extended xfs attributes16:46
clarkbdonnyd: that is great feedback16:47
clarkbwould it be worthwhile for us to have donnyd write a little section of notes we can tack onto https://docs.openstack.org/infra/system-config/contribute-cloud.html ?16:47
clarkba "from the provider perspective" notes?16:48
openstackgerritMerged openstack/project-config master: New project request: airship/porthole  https://review.opendev.org/67389816:48
corvusyeah, the storage overprovisioning factor is a great new number to have16:48
*** markvoelker has quit IRC16:49
donnydI need to find a way to find out what it needs from an iops, BW perspective as well to be optimal16:49
clarkbdonnyd: https://opendev.org/opendev/system-config/src/branch/master/doc/source/contribute-cloud.rst is the source file for that doc if you want to propose a change with your thoughts. I am happy to push a change with you too if you would rather we do ti that way (can collaborate on an etherpad if so)16:49
donnydhttps://usercontent.irccloud-cdn.com/file/UmYecpK0/Screenshot%20from%202019-08-05%2012-50-09.png16:50
*** ociuhandu has joined #openstack-infra16:50
donnydDo we just want to add a notes section?16:51
donnydAlso is it helpful to have public facing dashboards for FN, so you can capture the data from the provider side?16:52
clarkbdonnyd: maybe a "From a provider's perspective" section then you can format that in whatever manner makes the most sense16:53
clarkbfor dashboards I think that may also help but am not sure how easy it is for you to do that without also exposing important details16:53
*** jamesmcarthur has joined #openstack-infra16:54
donnydThere are no details that are important.. this is the only workload on here16:54
donnydI think the important details are probably going to provide valuable data for the underpinnings of the CI. The more we know, the more efficient we can make things.16:56
clarkb++16:56
*** ricolin has quit IRC16:57
donnydJust no sure what data we want to gather, but I have grafana hooked up to gnocchi... just don't know how to gnocchi quite yet16:57
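For the gnocchi side, the usual starting points with the gnocchi client look roughly like this (from memory of gnocchiclient, so treat the exact options as assumptions):

    # what resources and metrics exist
    gnocchi resource list
    gnocchi metric list
    # aggregated measures for a single metric
    gnocchi measures show <metric-id> --aggregation mean
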
*** kopecmartin is now known as kopecmartin|off16:59
*** markvoelker has joined #openstack-infra17:01
corvusclarkb, fungi, mordred: mnaser told me about "openstack ec2 credentials create" so i tried that and plugged those into this script, but i still get an error: http://paste.openstack.org/show/755536/  -- any ideas of anything else we could try?17:01
clarkbthat makes me wonder if the ceph s3 api isn't tied into the openstack aws api support17:05
clarkbec2 credentials for example are a nova thing17:05
logan-^ typically that works, at least that's the way we connect s3 clients to our radosgw17:06
*** jamesmcarthur has quit IRC17:07
*** udesale has quit IRC17:07
corvusthat's good to know; i guess we'll wait to see what mnaser says17:08
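For reference, the flow logan- describes is roughly this (a sketch; the endpoint URL and bucket name are placeholders, and whether a given radosgw deployment accepts the keystone-issued EC2 credentials is exactly the open question here):

    # create EC2-style credentials backed by keystone
    openstack ec2 credentials create -f value -c access -c secret
    # then point any S3 client at the object-store endpoint with them
    AWS_ACCESS_KEY_ID=<access> AWS_SECRET_ACCESS_KEY=<secret> \
        aws --endpoint-url https://object-store.example.com s3 ls s3://zuul-logs
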
*** bhavikdbavishi1 has joined #openstack-infra17:10
*** bhavikdbavishi has quit IRC17:12
*** bhavikdbavishi1 is now known as bhavikdbavishi17:12
fungithat's an interesting entanglement, but i guess makes some sense17:12
*** jamesmcarthur has joined #openstack-infra17:12
*** goldyfruit has quit IRC17:18
*** goldyfruit_ has joined #openstack-infra17:18
donnydcan anyone hit https://grafana.fortnebula.com17:21
AJaegeryes, I can, donnyd17:21
diablo_rojoI can too17:22
donnydis there anything worthwhile on it?17:22
clarkband the openstack utilization dashboard renders for me too17:22
AJaegerdonnyd: nit, it's OpenStack with capital S; )17:22
donnydrefresh AJaeger17:22
AJaegermax 75 % utilization...17:22
AJaegerdonnyd: nit trick ;)17:23
donnydLOL17:23
AJaegerdonnyd: interesting, compute-8 and compute-9 have very low cpu utilization compared with rest17:24
donnydDoes anyone know how to actually use gnocchi... these are backend stats17:24
donnydcompute-9 always will, its the dedicated node for the mirror17:24
AJaegerand compute-2 has more than average...17:24
AJaegerah, I see17:24
donnydcompute-8 was placed into serivce early this morning17:24
donnydWould you want to see anything else... I was thinking maybe detailed dashboards per hypervisor or something like that maybe... I need to also gather front-end metrics17:26
donnydlike instance related things17:26
clarkbdonnyd: aggregate network bw at the gateway/boundary would probably be helpful17:27
donnydOk, I can do that17:28
clarkbdonnyd: in part because that might help us evaluate effectiveness of the mirrors/caching17:28
*** psachin has quit IRC17:28
*** ociuhandu has quit IRC17:28
*** ociuhandu has joined #openstack-infra17:29
*** ociuhandu has quit IRC17:31
*** ociuhandu has joined #openstack-infra17:31
*** jamesmcarthur has quit IRC17:32
donnydhow about now clarkb17:32
donnydMy local network requirements are usually pretty low, so that is 90% the CI17:33
clarkbdonnyd: edge bw graph lgtm17:34
donnydsweet...17:34
*** kjackal has joined #openstack-infra17:34
*** ociuhandu has quit IRC17:35
*** jamesmcarthur has joined #openstack-infra17:37
*** sgw has quit IRC17:37
*** jamesmcarthur_ has joined #openstack-infra17:39
*** jamesmcarthur has quit IRC17:41
*** jamesmcarthur has joined #openstack-infra17:42
*** rpittau is now known as rpittau|afk17:42
*** tdasilva has quit IRC17:43
*** tdasilva has joined #openstack-infra17:43
*** jamesmcarthur_ has quit IRC17:43
Shrewsinfra-root: I've put together a script to cleanup the leaked swift objects from image uploads. I'm not sure how to determine which ones might actually still be needed, so I thought I'd start by deleting any objects that have a last-modified timestamp older than 5 days, just out of safety and until I can figure out exactly where they're being leaked. Does this seem safe?17:44
clarkbShrews: I want to say mordred believes the swift objects from image uploads are always safe to delete as long as the upload isn't in progress17:45
clarkbShrews: so 5 days should be plenty to make sure we don't delete anything in progress17:45
Shrewsyeah, it was the "figure out which ones are in progress" that wasn't the easy part17:46
fungidoes glance just use them as an upload staging area?17:46
clarkbfungi: exactly17:46
fungiShrews: pause the builders first?17:46
clarkbI think we can start with the 5 day limit first to clean out 99% of the leaks17:46
clarkbthen ya turn off the builders and clean up the rest maybe17:46
Shrewsfungi: i don't think we have to do that yet with the 5 day limit17:46
fungiif there are no image uploads in progress, then it would be safe to delete it all en masse17:46
Shrewswhat clarkb said  :)17:46
fungiahh, also an option, sure17:47
corvusShrews: yeah, i vote delete with 5 day limit, don't worry about pausing, then after we fix the leak, we can pause and delete all17:48
fungimakes sense17:48
*** jamesmcarthur has quit IRC17:48
corvus(since we know we're going to continue leaking for a bit)17:48
*** jamesmcarthur has joined #openstack-infra17:53
Shrewsok, it's running for dfw. might take a while17:55
*** ykarel|away has quit IRC17:56
mordredShrews: my understanding is that we don't need any of them - as long as we're not actively importing the related image17:56
mordredShrews: having read the full scrollback now - yes, I agree with the above course of action :)17:57
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: render console in js with listview  https://review.opendev.org/67466317:59
*** sgw has joined #openstack-infra18:02
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: render console in js with listview  https://review.opendev.org/67466318:03
*** jamesmcarthur_ has joined #openstack-infra18:03
*** e0ne has joined #openstack-infra18:05
*** jamesmcarthur has quit IRC18:06
ShrewsSeems that 'openstack object list images' does not change as the objects are deleted from the container. Yet the deletes are clearly working since I get a 404 if I try to delete one already processed. Maybe the 'list' data is cached ?18:08
Shrewsoh, i bet it limits output to 1000018:09
Shrewswhich means we probably have WAAAY more objects than we think18:09
Shrewsbecause 'list' output changes, but not the total size of it18:10
Shrews:(18:11
*** jamesmcarthur has joined #openstack-infra18:13
*** bhavikdbavishi has quit IRC18:13
corvusShrews: i want to say that the web interface was like ~13000 objects per region18:13
corvusassuming it's correct (i'm going going out on a limb for it)18:14
*** jamesmcarthur_ has quit IRC18:14
*** bhavikdbavishi has joined #openstack-infra18:14
Shrewsok, not much over 10k then18:14
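For what it's worth, a sketch of seeing past that default 10k page (my recollection of the client options, so treat the exact flags as assumptions):

    # openstackclient: list everything rather than the first page
    openstack object list images --all -f value -c Name | wc -l
    # or page manually from the last name of the previous page
    openstack object list images --marker "<last-object-name>"
    # python-swiftclient paginates automatically
    swift list images | wc -l
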
corvusfungi, clarkb, mordred: i have learned from our friends at ovh that swift doesn't return an allow-origin header unless the browser sends an origin header, so my test of ovh was faulty, ovh+swift should be working fine and we can go ahead and start using it.18:15
mordredcorvus: oh good! (because browsers will send the origin header)18:17
corvusmordred: yeah, though apparently only with an xmlhttprequest, since just hitting the url in the browser and watching the network panel shows no headers18:18
corvusfungi, clarkb, mordred, mnaser: repeating the test against vexxhost with an origin header, i do see an allow-origin: * header, but i also see allow-methods: HEAD... so that's confusing.18:18
mordredShrews: I *think* if we just delete the SLO manifest object we don't need to delete the subobjects18:18
mordredShrews: I do not know if that is useful to you or not18:18
Shrewsmordred: maybe if you explained the words?18:19
mordredcorvus: ah - yeah - because just opening it in a normal browser link is a thing you can just do18:19
mordredShrews: each of these images is a "Large Object" - which has an object in /images and then a bunch of smaller ones in (I think) /image_segments18:19
corvusi think image_segments may have been empty when i looked?18:20
mordredShrews: I *believe* we only have to delete the manifest objects and the segment objects will get autodeleted for us18:20
mordredneat18:20
mordredthen it's also possible I'm just high and shouldn't be listened to18:20
corvusbut Shrews was looking a different way and should probably confirm :)18:20
Shrewsmordred: i'm currently only deleting in 'images' so i'll check the other afterwards18:20
corvus(because i'm *really* suspicious when the control panel says 0 objects in a container)18:20
Shrewsmordred: corvus: i currently don't see anything in image_segments18:21
clarkbcorvus: the s3 api allows you to do fine grained cors for request types too18:21
Shrewserr, images_segments, that is18:21
clarkbcorvus: was your vexxhost test via s3 or swift apis?18:22
corvusclarkb: still no joy with s3, those files were put via swift apis18:22
corvusoh, ha18:24
corvusvexxhost returns whatever method you use to access it via Access-Control-Allow-Methods18:24
corvusso, erm, i think we're just going to have to try this18:24
*** jamesmcarthur_ has joined #openstack-infra18:24
clarkboh taht should be fine because we are doing GETs from the web ui18:25
corvusyeah, as long as the options response makes sense18:26
corvusthat's the thing i'm not sure how to simulate18:26
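One way to simulate it from the command line (the object URL is a placeholder and the Origin value is just an example):

    # roughly what the browser's XMLHttpRequest sends for a simple GET
    curl -s -o /dev/null -D - -H "Origin: https://zuul.opendev.org" \
        https://swift.example.com/v1/AUTH_xyz/zuul_logs/job-output.txt | grep -i access-control
    # and the preflight OPTIONS request
    curl -s -o /dev/null -D - -X OPTIONS \
        -H "Origin: https://zuul.opendev.org" \
        -H "Access-Control-Request-Method: GET" \
        https://swift.example.com/v1/AUTH_xyz/zuul_logs/job-output.txt | grep -i access-control
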
*** jamesmcarthur has quit IRC18:26
*** jamesmcarthur has joined #openstack-infra18:28
*** betherly has joined #openstack-infra18:30
*** weifan has quit IRC18:30
*** jamesmcarthur_ has quit IRC18:30
*** tosky has quit IRC18:32
*** jamesmcarthur_ has joined #openstack-infra18:34
*** betherly has quit IRC18:34
*** smarcet has joined #openstack-infra18:35
cloudnullanyone around mind giving this a look - https://review.opendev.org/#/c/674414/7/zuul.d/molecule.yaml - seems this change is resulting in a zuul configuration validation failure, and I'm not really sure why?18:35
cloudnullkeeps raising "Unknown configuration error"18:36
*** jamesmca_ has joined #openstack-infra18:36
*** jamesmcarthur has quit IRC18:36
clarkbcloudnull: I can take a look18:37
cloudnullthanks clarkb!18:37
mordredcloudnull: do you have hide-ci turned on? because there's a nice clear error message from zuul on that18:38
clarkbhideci should no longer hide that message fwiw18:38
mordredawesome18:39
cloudnullno.18:39
*** jamesmcarthur_ has quit IRC18:39
* cloudnull not that I'm aware of 18:39
fungiand it's "toggle extra ci" now18:39
mordredcloudnull: weird. anywho - I'm going to guess it's https://review.opendev.org/#/c/674414/7/zuul.d/playbooks/releasenotes/notes/docker_enable_vfs-c8b41b02111341df.yaml18:39
mordredwhich zuul is attempting to parse as zuul config18:39
mordredcloudnull: oh - I see the unknown config error you mentioned18:39
clarkbmordred: ya because that isn't a list but instead a dict18:39
mordredcloudnull: there's a better error message on ps718:40
cloudnullits the release note ?18:40
mordredcloudnull: maybe you were in the wrong dir when you ran reno new?18:41
cloudnullthat very well could be...18:41
* cloudnull walks away in shame 18:41
*** jamesmca_ has quit IRC18:41
clarkbvos release on fedora mirror is still running if anyone is wondering18:41
mordredcloudnull: it happens to the best of us - and also to me :)18:42
cloudnullthanks clarkb mordred18:42
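For the record, the fix is just relocating the note to where reno (and zuul) expect it; something like the following, using the filename from the review linked above and assuming the repo already has a releasenotes/notes/ directory:

    # from the repository root: move the stray note out of zuul.d/ so zuul stops
    # trying to parse it as job configuration
    git mv zuul.d/playbooks/releasenotes/notes/docker_enable_vfs-c8b41b02111341df.yaml \
        releasenotes/notes/
    # future notes: run "reno new <slug>" from the repo root so they land in releasenotes/notes/
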
clarkbI wonder if I should rerun the sync script by hand and rerun vos release again just to make sure it comes in under the timeout normally18:44
*** tdasilva has quit IRC18:44
clarkbthis update did remove a bunch of data from the volume so not surprising it is slow18:44
*** tdasilva has joined #openstack-infra18:45
*** e0ne has quit IRC18:45
*** jamesmcarthur has joined #openstack-infra18:46
*** smarcet has quit IRC18:46
*** jamesmcarthur_ has joined #openstack-infra18:48
*** spsurya has quit IRC18:49
*** jamesmcarthur has quit IRC18:50
*** jamesmcarthur has joined #openstack-infra18:53
*** jamesmcarthur_ has quit IRC18:55
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769818:58
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769819:00
*** jamesmcarthur has quit IRC19:01
*** jamesmcarthur_ has joined #openstack-infra19:01
mordredinfra-root: afk for just a bit19:01
*** bhavikdbavishi has quit IRC19:04
*** lpetrut has quit IRC19:05
*** jamesmcarthur_ has quit IRC19:06
*** weifan has joined #openstack-infra19:08
*** igordc has joined #openstack-infra19:08
*** Goneri has joined #openstack-infra19:08
*** jamesmcarthur has joined #openstack-infra19:12
*** weifan has quit IRC19:12
openstackgerritMatt McEuen proposed openstack/project-config master: New project request: airship/kubernetes-entrypoint  https://review.opendev.org/67390019:13
*** e0ne has joined #openstack-infra19:16
*** jamesmcarthur has quit IRC19:17
*** jamesmcarthur has joined #openstack-infra19:19
*** jamesmcarthur has quit IRC19:24
*** iurygregory has quit IRC19:24
*** diablo_rojo has quit IRC19:25
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769819:26
*** jamesmcarthur has joined #openstack-infra19:31
*** e0ne has quit IRC19:31
* clarkb finds lunch19:31
*** smarcet has joined #openstack-infra19:33
*** jamesmcarthur_ has joined #openstack-infra19:36
*** jamesmcarthur has quit IRC19:37
*** jamesmcarthur has joined #openstack-infra19:38
*** jamesmcarthur_ has quit IRC19:40
*** betherly has joined #openstack-infra19:41
*** Goneri has quit IRC19:42
*** Goneri has joined #openstack-infra19:43
*** betherly has quit IRC19:45
*** tdasilva has quit IRC19:45
*** tdasilva has joined #openstack-infra19:46
*** jamesmcarthur has quit IRC19:48
openstackgerritMerged openstack/project-config master: New project request: airship/kubernetes-entrypoint  https://review.opendev.org/67390019:54
*** weifan has joined #openstack-infra19:59
*** weifan has quit IRC20:03
*** Goneri has quit IRC20:07
*** harlowja has joined #openstack-infra20:08
mnasercorvus: ill try to dig a little bit and see whats up with the "405 Method Not Allowed"20:09
mnaseralso, is storyboard-dev.openstack.org having issues?  https://storyboard-dev.openstack.org/api/v1/stories?limit=10&project_id=237&sort_dir=desc&status=active takes almost 10 seconds20:10
fungilooking now20:10
fungiinterestingly, system load on it is nearly nonexistent20:13
*** Goneri has joined #openstack-infra20:13
fungithe vast majority of memory utilization is buffers/cache too20:14
fungimnaser: watching top, it seems that's the amount of time it takes a mysql thread to return the result of the corresponding database queries20:16
*** jcoufal has quit IRC20:17
*** michael-beaver has quit IRC20:18
*** betherly has joined #openstack-infra20:22
*** betherly has quit IRC20:27
*** tdasilva has quit IRC20:27
*** tdasilva_ has joined #openstack-infra20:27
*** weifan has joined #openstack-infra20:27
*** jamesmcarthur has joined #openstack-infra20:27
*** guoqiao has joined #openstack-infra20:41
*** prometheanfire has quit IRC20:41
mnaseri guess maybe some indexes are missing, i dunno20:42
* mnaser shrugs20:42
*** prometheanfire has joined #openstack-infra20:43
*** betherly has joined #openstack-infra20:43
mnaserit makes dealing with this pretty slow20:43
mnaserhttps://storyboard.openstack.org/#!/project/openstack/governance20:43
*** jamesmcarthur has quit IRC20:47
*** betherly has quit IRC20:47
*** jamesmcarthur has joined #openstack-infra20:48
*** jamesmcarthur has quit IRC20:48
*** jamesmcarthur has joined #openstack-infra20:48
donnydDo we know what the jobs are reaching out to the interwebs for? I could just sync what the mirror cannot down locally to speed up many things20:49
donnydmnaser: What do you have the nodepool tenants volume limit set at for vexxhost?20:51
donnydi should say volume quota?20:52
clarkbdonnyd: the idea is that they shouldn't hit the internet for anything (well not without going through the mirror at least) but it is a general purpose ci system and we know people don't use the mirror for everything so we keep that option available20:53
*** Goneri has quit IRC20:53
donnydWell I ask because I was thinking I may try to get on the official mirror list for the major distros so when it goes "upstream", that would be local too20:54
fungimnaser: yes, we've got mysql slow query logging turned on for the production storyboard to try and figure out where the hot spots are and which queries are in most need of refactoring. there are some really egregious ones20:54
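The knobs involved, for anyone following along (a generic MySQL/MariaDB sketch, not the actual production settings):

    mysql -e "SET GLOBAL slow_query_log = 1;
              SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
              SET GLOBAL long_query_time = 1;
              SET GLOBAL log_queries_not_using_indexes = 1;"
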
clarkbdonnyd: oh in many cases it is github repos20:54
mnaserdonnyd: i am not sure, for nodepool i think i have 80*number_of_nodes*small%20:54
clarkbdonnyd: for random projects like puppet and ansible modules or golang repos20:55
mnaserdonnyd: you know what'd be neat?  but also i dont know the security implication of it, but running a transparent squid proxy at your gateway20:55
donnydoh, well i can't do much about that.  I could cache locally with the squids or something like that.. but I would be worried it would interfere with what is upstream and20:55
mnaserand then generate stats from that only (not cache)20:55
mnaser*just* to help us find what are the big things that are being hit20:56
donnydI can surely setup a squid that would just miss on everything and tell us where they are going20:56
donnydmnaser: As long as the squid doesn'20:58
donnyddoesn't have any hits, it shouldn't effect the jobs. right?20:58
donnydhit the enter too early20:58
mnaseri dont think so, it'll just be a pass through, but don't quote me on that :)20:59
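A minimal squid.conf sketch of that log-everything, cache-nothing idea, for plain HTTP only (HTTPS interception would need the ssl-bump / private-CA machinery discussed further down; addresses are placeholders):

    acl localnet src 203.0.113.0/24          # nodepool tenant range (placeholder)
    http_port 3128                           # explicit proxy
    http_port 3129 intercept                 # transparently redirected traffic
    cache deny all                           # pass through, never cache
    access_log /var/log/squid/access.log squid
    http_access allow localnet
    http_access deny all
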
ianwShrews: you've possibly finished by now, but I had a script to maybe do a similar cleanups to what you've described above @ -> https://review.opendev.org/#/c/562510/1/tools/rax-cleanup-image-uploads.py20:59
donnydclarkb: fungi any thoughts on that?20:59
ianwlooks like that never got a second +220:59
ianwinfra-root: could i get a couple of eyes on https://review.opendev.org/#/c/674550/ and the dns change https://review.opendev.org/#/c/674549/ to get the ansible-ised backup server started please?  thanks21:01
ianwit won't have any clients yet, one thing at a time :)21:02
corvusinfra-root: i have paused the ansible run_all script for a bit21:02
clarkbdonnyd: the reason we haven't set up squid ourselves is that we can't know source ranges for all our possible VM IPs to set up whitelists for access to that. We don't want open transparent proxy on the internet21:02
donnydNot sure how I will man-in-the-middle github though21:02
clarkbdonnyd: right that was going to be my next concern. I think if you did it at the cloud level it would have to be transparent21:03
donnydwell I know the range, because I assigned it21:03
clarkbin which case https wouldn't work21:03
*** betherly has joined #openstack-infra21:03
clarkbdonnyd: ya in your cloud that particular problem is probably fine. In most other clouds it is a bigger issue (since we pull from large pools of addrs that are recycled)21:03
corvusif all the clouds support neutron we might be able to set up a private network21:03
clarkbcorvus: I don't think ovh does21:04
donnydwell we only need one cloud to gets the datas21:04
*** mattw4 has quit IRC21:04
*** mattw4 has joined #openstack-infra21:04
donnydI think our options are pretty limited for https proxy. You would have to create a CA the image trusts and then pass it off to my squid and then figure out what breaks21:05
Shrewsianw: oh, i wasn't aware of your script. mine is much more basic. i've had to pause my deletes for a bit but i'll look at yours. If we fix the root problem, we shouldn't need to keep a script around in system-config.21:05
donnydor force instances to not care and trust all21:05
*** diablo_rojo has joined #openstack-infra21:06
donnydI am no squid expert, btw21:07
donnydjust know enough to break things21:07
clarkbdonnyd: ya to do https caching you essentially set up a mitm21:07
clarkb(which is maybe an option for us, I'll have to think about that a bit more)21:07
*** eharney has quit IRC21:07
donnydwell I don't actually need to do the caching part (while it won't bother me to), just find out where the connections are going to.21:08
ianwShrews: yeah, from my notes, at the time it cleared out 200+TB ... maybe a lot of that was de-duped, but it was *a lot*21:08
*** betherly has quit IRC21:08
donnydI can also turn logging up for the edge fw and find out who is talking to who...21:08
clarkbdonnyd: oh for that tcpdump on your gateway may be sufficient. You can tcpdump filtering out the mirror (since we want traffic to go through it) then see what everything else is pinging21:08
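Something like this on the gateway would give that picture (a sketch; interface, tenant network, and mirror address are placeholders):

    # outbound web traffic from the test nodes that is NOT going via the region mirror
    tcpdump -ni eth0 \
        'src net 203.0.113.0/24 and not dst host 198.51.100.10 and (dst port 80 or dst port 443)'
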
fungiit would be possible (i'm not saying it's worth the effort involved) to set up a private ca for infra which signs certificates for a list of specific dns names and then distribute those to transparent proxies in each provider and set our images to add them to the appropriate trust lists21:08
donnydbut won't the client still complain about the mitm21:09
*** weshay is now known as weshay_dentist21:09
funginot if the cert provided by the proxy is trusted by the client21:09
donnydOk, then that could work21:09
fungithat's the "set our images to add them to the appropriate trust lists" part21:09
donnydI can easily lie in dns to tell it who github.com is21:10
fungithe other trick would be working out how to make only requests for specific domains go through them21:10
fungioh, yeah that would do it21:10
donnydDNS attack21:10
donnydLOL21:10
fungiwe could also have our own dns overrides which substitute the proxy's address for those names21:10
donnydsurely could21:11
donnydI would just have to update the tenant's network to point to your dns21:11
donnydYou could also get the who is asking for what from there too21:11
fungiso clients know github.com as the ip address of whatever the local transparent proxy is in that provider, and are configured to trust the infra ca which signed the github.com cert we installed on the proxy21:11
donnydcorrect21:12
fungiwell, we also install local caching dns recursive resolver/forwarder on every node, so could simply carry the overrides there21:12
donnydand you get to know where else they are going by looking in the dns logs for who looked up what21:12
donnydall you would require from the cloud providers is an updated dns for the nodepool network21:13
fungiwe don't actually trust/use the complimentary recursive resolvers in our providers anyway because they tend to be unreliable for a number of reasons21:13
donnydright now its google, but that could easily be you21:13
fungi(most fun one being rackspace, whose resolvers have threat mitigation to automatically blacklist queries from the ip addresses of misbehaving server instances, but the blacklist doesn't get updated as quickly as they turn those addresses over to new instances)21:14
clarkbya so we use google and cloudflare dns with local unbound caching results they return21:15
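If the DNS-override route were taken, the per-node unbound configuration would look roughly like this (the address is a placeholder for the local transparent proxy):

    server:
        # answer github.com (and names under it) with the local proxy's address
        local-zone: "github.com." redirect
        local-data: "github.com. IN A 203.0.113.20"
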
donnydSo your mirror would need a squid and dns server21:16
jrosserdeploying openstack where a bunch of the resources needed come from git repos / mirrors / proxies / blah with a company CA rather than a public one took a ton of work21:16
donnydand some certs, then populate those certs (we already know github.com), but there may be other places21:16
jrosserspecifically the hoops that python requests / certifi make you jump through21:16
clarkbjrosser: good reminder re requests21:16
donnydyou could also prevent the I am gonna pull from the public centos mirrors anyway behaviors too21:17
clarkbit ships its own ca trust chain21:17
jrosseryep, it's a PITA21:17
jrosserso i don't think this is a set-and-forget without fixing up the actual jobs21:17
clarkbya there will be fallout from that for sure21:18
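The usual workarounds jrosser is alluding to look something like this (Debian/Ubuntu paths; a sketch rather than a complete answer for every tool):

    # add the private CA to the system trust store...
    sudo cp infra-ca.crt /usr/local/share/ca-certificates/
    sudo update-ca-certificates
    # ...and point requests/certifi (and curl) at that store instead of their bundled CAs
    export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
    export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
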
*** jamesmcarthur has quit IRC21:18
donnydwell a good start is just doing dns, because then you are going to definitively know where the tests are going21:18
donnydmaybe not speed anything up (except dns requests), but surely get valuable data21:19
fungidonnyd: wouldn't necessarily have to be squid. the apache mod_proxy with proxy caching we already have in place would be capable, in theory21:19
*** mriedem has quit IRC21:19
*** jamesmcarthur has joined #openstack-infra21:20
donnydyea any proxy could work.21:20
jrosseri do have a sort of related example21:21
*** mriedem has joined #openstack-infra21:21
jrosseri suffer / enjoy / deal with http proxies all day and to ensure good support i made a CI job with one21:21
jrosserhttps://logs.opendev.org/21/672721/1/gate/openstack-ansible-deploy-aio_proxy-ubuntu-bionic/130b0f8/logs/host/squid/access.log.txt.gz21:21
*** rh-jelabarre has quit IRC21:23
donnydSo in this job, the CI build starts and sends traffic through this proxy?21:23
jrosserpretty much, yes21:23
*** yamamoto has joined #openstack-infra21:23
fungithat's a neat design21:24
jrosserbut it's really trying to be a mock up of an environment where the deployment is behind a proxy21:24
jrosserand as there isnt a full http proxy as part of the infra i made my own in the job21:24
*** jamesmcarthur has quit IRC21:25
*** yamamoto has quit IRC21:28
*** mattw4 has quit IRC21:28
donnydSo where should we start? I can get working on the logging thing after I am done with work today21:29
*** mattw4 has joined #openstack-infra21:29
donnydI need to get central logging up and running anyways21:29
*** markvoelker has quit IRC21:29
clarkbjrosser: at a job level it is easier to do that because you know all of the source IPs related to the job at runtime21:29
fungiwhat is it you want to do again? measure what the network traffic to nodes is that doesn't get proxied/cached?21:30
jrosseralso a full http proxy is different to the transparent one you have been discussing, so not quite the same thing21:30
jrosserbut anyway, it's really just as an example and a maybe representative set of squid log to look at21:31
donnydfungi: Yea, just see where the hotspots are in the public side and then figure out a way to get that service faster access21:32
*** mattw4 has quit IRC21:32
donnydIf its 90% github.com, then we can solve that a bunch of ways, if its random interwebs there might not be anything that can be done21:32
donnydbut right now we have very little data other than the traffic graph from my wan side (which is 90% these instances). Not that I mind using the public internet for things, but more in the interest of what can be done to speed it up21:33
donnydIf we could do something, we would need a way to make sure that it does in fact make things faster and not slower21:35
fungisimple ip address analyses (netflow, tcpdump, whatever) might yield some insights21:35
fungi(ntop)21:35
donnydlike case in point, the package mirrors. I could have local mirrors that would make access to .rpm and .deb content in orders of magnitude faster21:35
fungipart of the challenge is selectively depending on resources like that hosted by a provider since we don't have it as a consistent option everywhere21:37
*** mattw4 has joined #openstack-infra21:37
fungiis rpm and deb content on our mirror host not fast (once the afs cache is warmed)?21:37
donnydNot as fast as I could provide it21:39
donnydI can cache 100% of the content downloaded locally21:41
donnydOnly so much can be done in 8GB21:41
donnydyea and that makes sense from my perspective. Consistency is more important than speed at scale21:42
clarkbdonnyd: so there are a couple things about the linux distro mirrors that are worth pointing out21:43
clarkbthe first is that they cache to disk not memory so the actual afs disk cache size is much larger than 8GB21:43
clarkbsecond is that through the use of afs we synchronize all of our package mirrors globally21:44
clarkbthis means you don't have jobs passing or failing due to local state (which is nice)21:44
*** betherly has joined #openstack-infra21:44
donnydOh I c, I thought AFS was caching everything in memory.21:44
clarkband finally we build the mirrors very carefully (ubuntu/debian at least) and only publish them when we have a valid mirror ready, and we don't delete older content until some time has passed, to avoid jobs failing because files disappear out from under them21:44
clarkbI think the rpm rsync mirrors already handle phasing out deletions, but also the way tools like yum work means it is far less likely you'll want to download older files compared to, say, apt21:45
fungiyeah, and avoiding mismatched package/index combinations21:45
fungiand afs provides atomic volume updates21:46
fungiso the data in a given package repository isn't changing while it's being read, which helps maintain consistency21:46
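For anyone following along, the client-side disk cache being discussed is configured in the openafs cacheinfo file; a minimal sketch of where to look and how to check usage, assuming a Debian/Ubuntu-style layout:

    # format is mountpoint:cachedir:cache-size-in-1K-blocks, so a ~50GB
    # disk cache looks something like /afs:/var/cache/openafs:50000000
    cat /etc/openafs/cacheinfo
    # show how much of the configured cache is currently in use
    fs getcacheparms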
donnydI am not questioning the value of AFS, it makes sense to me.21:46
donnydBut here is an example https://logs.opendev.org/87/674687/3/check/tripleo-ci-centos-7-scenario001-standalone/ce1d8a5/job-output.txt#_2019-08-05_20_34_33_05627221:47
clarkblooks like we are set to use 50GB of afs disk cache and fn mirror is using about 8GB currently21:47
fungiexcept at given update checkpoints, where we ensure we don't delete packages referenced by the previous index state on that pulse21:47
donnydOn the rest of my network packagey things are about 5x that speed21:47
clarkbdonnyd: so I did a test of warm cache data from afs on the mirror itself and I get super speed (I forget the exact number but it was silly)21:47
clarkbdonnyd: I think we need to test the network bw between VMs?21:48
clarkbbecause testing shows the afs cache works and it is fast, but then pull from off host is slower21:48
clarkb(which means maybe something between the hosts is slowing it down)21:48
*** jamesmcarthur has joined #openstack-infra21:49
*** betherly has quit IRC21:49
donnydwe should do that and figure out where the bottleneck is.  Could be the router between the mirror and the clients21:49
donnydI did just upgrade it so I know for sure it will move at wireline 10G21:49
donnydbut it's worth a look-see.21:49
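A simple way to rule the network path in or out is a raw throughput test between the mirror and a test node; a sketch using iperf3, assuming it can be installed on both ends and the default port 5201 is reachable between them:

    # on the mirror (server side)
    iperf3 -s
    # on a test node (client side), run a 30 second test toward the mirror
    iperf3 -c mirror.regionone.fortnebula.opendev.org -t 30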
donnydthe mirror here should be ludicrously fast. It shares nothing and is on the fastest equipment you can get: a very new cpu / ram / nvme all to itself21:51
clarkbok lets warm a file up then we can request it from elsewhere and see what it looks like21:51
donnydAnyways, I think i should probably work on getting the logging stuffs up and running and then we can work on the host - mirror thing21:52
donnydsure21:52
donnydI think we did BW tests and were able to get quite respectable speeds a few weeks back. I think fungi and I were working on it21:53
clarkbdonnyd: http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso I am going to warm that up. It is 500MB ish21:53
clarkbas expected, cold is not fast. ~1MBps21:54
clarkbso will be a few before we can do second request and see what it is like warm21:54
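The test pattern here is just fetching the same object twice; a sketch of the cold/warm comparison (the first request is served through AFS from the central fileservers, the repeat should come out of the mirror's local disk cache):

    URL=http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso
    # cold: populates the afs cache on the mirror (~1MB/s was observed here)
    wget -O /dev/null "$URL"
    # warm: served from the mirror's disk cache (hundreds of MB/s observed)
    wget -O /dev/null "$URL"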
donnydis there a way to make those cold requests go faster? Or do they go back to something central infra owns?21:55
clarkb(the slowness on cold access is due to how afs does window sizing of its udp packets, unfortunately I don't think that is very fixable at least not without replacing afs)21:55
clarkbdonnyd: ya we have central fileservers and they serve the data and they do it with a max window size that was set back in the 80s? and it is much smaller than modern networking would prefer21:55
*** jamesmcarthur has quit IRC21:55
clarkbit predates tcp so afs does its own windowing with udp21:56
*** whoami-rajat has quit IRC21:58
donnydlmk when to test on my end. Desktop has the 10g's, so I should get some respectable numbers21:59
clarkbwill do21:59
*** mattw4 has quit IRC22:02
*** mattw4 has joined #openstack-infra22:03
*** jtomasek has quit IRC22:04
clarkbdonnyd: cold: 507.00M   931KB/s    in 10m 23s warm: 507.00M   456MB/s    in 1.1s22:04
clarkbdonnyd: you should test it from your desktop now22:05
*** betherly has joined #openstack-infra22:05
donnyd(zuul) donny@office:~> curl -o /dev/null http://mirror.regionone.fortnebula.opendev.org/centos/7/os/x86_64/images/boot.iso22:05
donnyd  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current22:05
donnyd                                 Dload  Upload   Total   Spent    Left  Speed22:05
donnyd100  507M  100  507M    0     0   292M      0  0:00:01  0:00:01 --:--:--  291M22:05
clarkbnow I guess I should ssh into a test node and give it a test from there22:06
*** rcernin has joined #openstack-infra22:06
donnydOh you know what else I should put on the public graph is mirror node bw22:07
clarkbdonnyd: 507.00M   295MB/s    in 1.7s from one of our bionic test nodes22:07
clarkbthe mystery deepens22:07
*** smarcet has quit IRC22:07
clarkbmaybe that points at our cache not being as warm as we often think it is?22:08
clarkbianw: ^ you've been poking at afs logs recently. Any idea if we can better quantify that?22:08
donnydso that number makes sense to me because my edge fw will only do 3.5Gb/s on a single thread22:08
*** betherly has quit IRC22:09
corvusinfra-root: i have re-enabled the run-all crontab22:12
clarkbcorvus: ty22:12
ianwclarkb: not really, sorry.  i've been working on a little speed test type thing to send to grafana to get openafs/kafs comparisons, not ready just yet22:12
donnydok, now you can see the BW for the mirror node from the host view22:13
mordredclarkb: ++22:14
donnydDoes that graph help?22:17
clarkbdonnyd: ya that likely gives us something we can cross check for over/under loading to at least narrow it down22:17
* donnyd dinnering22:21
clarkbI think the next thing for us to investigate is if those slower than expected downloads are cache misses22:21
*** tosky has joined #openstack-infra22:26
*** diablo_rojo has quit IRC22:28
*** diablo_rojo has joined #openstack-infra22:29
openstackgerritJames E. Blair proposed opendev/base-jobs master: Update swift upload credentials  https://review.opendev.org/67470922:29
clarkbfedora mirror vos release finally completed. I am going to rerun the script manually and vos release manually again just to be sure it runs in a reasonable amount of time22:30
corvusinfra-root: a speedy review of that ^ would help me progress testing22:30
mordredcorvus: looks great22:39
*** rlandy is now known as rlandy|bbl22:42
openstackgerritMerged opendev/base-jobs master: Update swift upload credentials  https://review.opendev.org/67470922:45
donnydclarkb: the cache misses are usually much slower, I found one that was on the quicker side22:45
clarkbdonnyd: so we have an intermediate speed too between cold and hot cache?22:46
donnydI have yet to see anything close to what we can see in hot cache on any instance22:46
*** markvoelker has joined #openstack-infra22:46
*** tosky has quit IRC22:47
donnydHere is a better view of what is most likely not cold, but surely not showing the hot numbers we are getting22:47
donnydhttps://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_15_56050522:47
*** jamesmcarthur has joined #openstack-infra22:48
clarkbhrm ya 27MB/s is well above cold numbers but like 1/10th hot numbers22:48
clarkbrsync took about 16 minutes for fedora and I've started the vos release22:49
donnydI do have a faster nic to drop into my fw sometime in the near future, should get numbers up even higher.. but it doesn't look to me like I can really do anything to make it faster22:52
clarkbI wonder if the intermediate speed could be due to a mix of hot and cold files?22:53
clarkbyum is requesting a whole set, then depending on which are warm or not you get a different aggregate speed? probably still a good idea to check the afs cache hit/miss rate22:54
donnydgranted it only took 3 seconds from flash to bang on the download, so I am not sure that doing something would make a marked improvement on build times22:54
donnydthat part started here22:54
donnydhttps://logs.opendev.org/53/674653/4/check/openstack-ansible-deploy-aio_lxc-centos-7/07cf6c1/job-output.txt#_2019-08-05_19_48_11_87163522:54
donnydBut one would think that if it's in cache you would see speeds more like our test22:56
*** tkajinam has joined #openstack-infra22:56
*** eernst has joined #openstack-infra22:57
fungithroughput for one large file is going to be markedly different than serialized throughput for lots of small files each with its own request22:57
donnydany recommendations on log aggregation tools? looks like E(L/F)K is pretty much the standard. Just curious22:58
clarkbcorvus: ianw http://docs.openafs.org/Reference/1/xstat_cm_test.html is that the command we want to run to get info about cache hit/miss rates?22:58
clarkbthe stat cache is separate from the data cache23:00
* clarkb looks to see what our statcache size is23:00
fungidonnyd: or old school it with rsyslog on a mysql backend23:02
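The old-school route mostly comes down to pointing rsyslog on each host at a central collector; a minimal sketch, assuming a hypothetical collector at 10.0.0.5 (storing into mysql via ommysql would then be configured on the collector itself):

    # forward everything to the central syslog box over TCP (@@) rather than UDP (@)
    cat >/etc/rsyslog.d/90-forward.conf <<'EOF'
    *.* @@10.0.0.5:514
    EOF
    systemctl restart rsyslog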
corvusclarkb: i'm unfamiliar with that command23:02
clarkbgetcacheparms seems to only show dcache entries?23:02
clarkbcorvus: do you know how to get info on the stat cache?23:02
*** eernst has quit IRC23:02
corvusno23:02
donnydfungi: which do you think requires the lowest level of maintenance?23:02
donnydI was looking at graylog just because it does some inline transforms I would think I could make use of23:03
clarkbdocs say default stats cache size is 300 on unix machines23:03
clarkbtrying to see what we are actually set to /me digs around in config files23:03
*** aaronsheffield has quit IRC23:03
fungidonnyd: once set up, neither requires a considerable amount of maintenance. elk is a lot more resource-heavy, but has all your log analysis and indexing without needing separate analysis tools23:03
*** diablo_rojo has quit IRC23:03
fungiwhen i hear "log aggregation" i don't immediately assume log analysis too23:04
donnydfungi: I want to push the logs public too, but I don't particularly want to hand someone keys to the network23:04
donnydSo something I could easily do a transform on would be baller23:05
fungikibana can be public facing. exposing elasticsearch to the open internet on the other hand is dangerific23:05
*** mattw4 has quit IRC23:06
*** mattw4 has joined #openstack-infra23:06
clarkbafsd manpage says the -stat value is actually based on the size of the dcache23:06
donnydLOL,  of course I just meant a dashboard.. not handing over my monitoring stuffs to anyone either, but dashboards are fairly harmless23:06
fungibut that raises another hidden goal which mere "log aggregation" doesn't cover in my book. publication is something kibana handles for you. if you wanted a web frontend to your mysql tables that would be yet another something you'd need to find/write23:07
clarkbI now believe that line 3 in http://paste.openstack.org/show/755542/ may expose the stat cache23:07
clarkbwe set the dcache to 50GB and from that afs decides we should be able to cache 1.5 million stat entries23:07
clarkbwe actually use more dcache than stat cache so that should be fine23:08
clarkbcorvus: any objection to me trying that cm_test command?23:09
donnydfungi: so you are saying elk is a pretty safe bet23:10
clarkb`xstat_cm_test -cmname mirror01.regionone.fortnebula.org -colID 2 -onceonly` something like that23:10
corvusclarkb: honestly, no idea23:11
fungidonnyd: well, i'm saying it's a popular choice, and you're looking for at least a few bells and whistles it provides over mere log aggregation23:11
donnydyea, I think that is safe to say. I need a log thingy that does all the things23:12
fungibe mindful of the licensing. there are proprietary-license builds subject to a number of additional conditions; you probably want to stick to the open source version23:13
donnydthanks fungi23:14
clarkbdcache miss rate is .08%. vcache miss rate is .3%23:16
clarkbboth are within the recommended parameters23:17
clarkbhttps://docs.openafs.org/AdminGuide/HDRWQ402.html23:17
*** markvoelker has quit IRC23:20
clarkbmirror01.regionone.fortnebula.opendev.org:/root/cache_manager_data if anyone else wants to see the verbose output23:20
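For anyone wanting to reproduce those numbers, the rough procedure is to dump the cache manager performance counters and compare hits against misses; a sketch reusing the command from above (the exact counter names in the output vary by openafs version, so the grep pattern is illustrative only):

    # collection 2 is the full cache manager performance data
    xstat_cm_test -cmname mirror01.regionone.fortnebula.opendev.org -colID 2 -onceonly > cm_stats.txt
    # pull out the dcache/vcache hit and miss counters and work out the rates by hand
    grep -iE '(dcache|vcache).*(hit|miss)' cm_stats.txt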
*** betherly has joined #openstack-infra23:27
*** betherly has quit IRC23:32
*** kjackal has quit IRC23:38
openstackgerritJames E. Blair proposed opendev/base-jobs master: Synchronize cloud creds in base-test-swift job  https://review.opendev.org/67471323:46
corvusinfra-root: ^ small update i missed earlier23:46
clarkbvos release has now been running for almost an hour. The rsync pulled in some new arm64 images for atomic. I do wonder if this was ultimately what broke it in the first place?23:46
clarkb+223:46
*** weifan has quit IRC23:48
*** weifan has joined #openstack-infra23:48
fungiquite possible23:52
fungiwhat jobs are using atomic images?23:52
*** weifan has quit IRC23:52
clarkbfungi: magnum jobs aiui23:53
corvusi think the expiration time is something like 12h23:53
corvusthough i'm not sure what happens if we hit our cron job timeout first23:53
clarkbit actually doesn't look like we have a cron job timeout on these23:54
corvusgood23:54
corvusthe 12h timeout (or maybe it's 10h?) is the longest that a vos release can take without using -localauth23:54
clarkbya crontab lacks timeouts and the commands with timeouts in the script itself do not include the vos release23:55
corvusso if a release from mirror-update takes longer than that, we'll be left with a locked volume23:55
clarkbk23:56
clarkbin that case it is probably safe for me to unlock the lockfile when this vos release finishes assuming it does so in under 12 hours23:56
corvus(the transaction to perform the update will eventually finish, but the auth will have expired for the subsequent volume unlock)23:56
corvusclarkb: you running from m-u?23:56
clarkbcorvus: no I am running the vos release on afs01.dfw.openstack.org23:57
*** smarcet has joined #openstack-infra23:57
corvusoh you mean unlock the cron lockfile23:57
corvusgotcha, yeah23:57
clarkbyes that23:57
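For the record, a minimal sketch of the two vos commands being discussed, run as root directly on a fileserver (the volume name mirror.fedora is an assumption for illustration; -localauth sidesteps the token lifetime limit):

    # release the read-only replicas without a token that can expire mid-run
    vos release mirror.fedora -localauth
    # if an earlier release died with an expired token and left the volume locked
    vos unlock mirror.fedora -localauth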
*** smarcet has left #openstack-infra23:57
openstackgerritMerged opendev/base-jobs master: Synchronize cloud creds in base-test-swift job  https://review.opendev.org/67471323:57
*** owalsh_ has joined #openstack-infra23:58
