Friday, 2019-07-26

*** xek__ has joined #openstack-infra00:00
*** xek_ has quit IRC00:02
*** jamesmcarthur has joined #openstack-infra00:03
fungidoes seem to be going a bit more than twice as slow00:03
*** mattw4 has quit IRC00:03
*** jamesmcarthur has quit IRC00:06
*** jamesmcarthur has joined #openstack-infra00:06
clarkbfungi: seems like we are about 14-15 minutes in and it has 2100 ish tasks00:08
clarkbso about the same speed?00:08
fungiyeah, maybe00:08
*** slaweq has joined #openstack-infra00:11
*** weifan has joined #openstack-infra00:13
*** slaweq has quit IRC00:15
*** weifan has quit IRC00:17
*** rosmaita has quit IRC00:19
*** gyee has quit IRC00:19
funginearly done00:20
*** yolanda has quit IRC00:20
fungiso ~2x14min00:20
clarkbya, you'd expect it to be quicker than that00:20
clarkbbut at least it hasn't gone slower than the single case00:20
fungiso far00:20
clarkbshould I do 05, 07, 08 next?00:20
fungibut yeah, replicating two in parallel took roughly as long as replicating one00:20
fungihas 01 been done yet?00:21
clarkbyes 01 06 03 04 are done00:21
fungioh, right, 01 went first then 0600:21
clarkbif we do 05 07 08 together we can see if it is 3x14 minutes00:22
fungiand we're not doing 02 because it's already removed to replace00:22
fungiso, yep, sounds good00:22
clarkbfungi: ya, though maybe you should double check that haproxy has 02 pulled as we expect?00:22
clarkbdoing 05 07 08 now00:22
fungichecking now, sure00:22
clarkb6.3k tasks now00:23
clarkbI'm going to start on that curry now00:23
*** weifan has joined #openstack-infra00:25
fungiyeah, 02 is no longer in the pools according to "show stat"00:25
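A minimal sketch of the check fungi just ran, scripted against the haproxy admin socket. The socket path and the "gitea02" match are assumptions; "show stat" itself is the real command and returns CSV whose first two columns are the proxy and server names.

```python
#!/usr/bin/env python3
# Sketch: ask the haproxy admin socket which servers are still in the pools.
import socket

SOCKET_PATH = "/var/haproxy/run/stats"  # assumed location of the admin socket

def show_stat(path=SOCKET_PATH):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# The reply is CSV; pxname and svname are the first two columns, so any
# line mentioning a server shows whether it is still in a pool.
for line in show_stat().splitlines():
    if "gitea02" in line:
        print(line)
```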
fungienjoy curry!00:25
fungii'm monitoring the times for the gerrit replication queue with timestamps, so should know within a minute when it's caught up00:29
fungiyou know, for science00:30
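For reference, a sketch of that sort of timestamped monitoring loop built on gerrit's `show-queue` ssh command; the host, port, and the "replicate" match string are assumptions.

```python
#!/usr/bin/env python3
# Sketch: poll gerrit's task queue over ssh and print a timestamped count
# of replication tasks, stopping once the queue drains.
import subprocess
import time
from datetime import datetime

CMD = ["ssh", "-p", "29418", "review.openstack.org",
       "gerrit", "show-queue", "-w"]  # assumed host/port

while True:
    out = subprocess.run(CMD, capture_output=True, text=True).stdout
    tasks = [l for l in out.splitlines() if "replicate" in l.lower()]
    print(f"{datetime.now().isoformat()} {len(tasks)} replication tasks")
    if not tasks:
        break
    time.sleep(60)
```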
*** weifan has quit IRC00:30
*** rosmaita has joined #openstack-infra00:32
*** igordc has quit IRC00:45
ianwanyone know how to encode the date range in the url of the kibana query so if you send it to someone they see the same range?00:47
fungino clue, sorry00:51
fungii've not been able to figure out how to direct-link kibana queries for that matter00:51
*** betherly has joined #openstack-infra00:51
*** yamamoto has joined #openstack-infra00:52
logan-?from=5d00:52
logan-there is a "share" button at the top that will link to the query (2nd from the right)00:52
ianwlogan-: yeah, that share link doesn't seem to add in the date range if you've selected a custom one00:53
logan-nope, it doesn't00:53
logan-you have to add the from parameter manually :/00:53
ianwwhat if it's a range?00:54
logan-not sure00:54
logan-sorry, i only know how to specify the start time00:54
logan-half way there at least :P00:54
ianwmaybe ?from=..?to=... ... ?00:55
fungiworth a try00:55
*** ricolin has joined #openstack-infra00:56
*** betherly has quit IRC00:56
clarkbsorry I dont know00:58
ianwhttp://logstash.openstack.org/#/dashboard/file/logstash.json?from=2019-07-25T16:39:00&to=2019-07-25T17:00:00&query=message:%5C%22HTTP%20Error%20404%5C%22%20AND%20node_provider:%20rax-iad00:58
ianwdoes not appear to work00:58
ianwit seems like this is done by default in kibana 4, but for 3 might be out of luck :/00:59
ianw(date range as part of shared url i mean)00:59
clarkb ya but 4 requires write access to elasticsearch01:01
clarkbwhich is why we never upgraded01:02
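To make the two link styles concrete, a sketch assembling them with urllib; per the discussion above, kibana 3 honors the relative ?from= form but apparently ignores the absolute from/to pair, so treat that variant as an assumption.

```python
# Sketch: build shareable logstash/kibana URLs for the query ianw used.
from urllib.parse import quote

BASE = "http://logstash.openstack.org/#/dashboard/file/logstash.json"
query = 'message:"HTTP Error 404" AND node_provider: rax-iad'

# Relative window: logan- confirmed this form works in kibana 3.
relative = f"{BASE}?from=5d&query={quote(query)}"
# Absolute range: what ianw tried; kibana 3 appears to ignore it.
absolute = (f"{BASE}?from=2019-07-25T16:39:00&to=2019-07-25T17:00:00"
            f"&query={quote(query)}")
print(relative)
print(absolute)
```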
fungilast batch nearly done01:02
clarkbfungi: about on time01:02
fungiyeah01:02
fungiand done01:02
clarkbI wonder where we degrade to an hour per and if it is related to load like you suggested01:03
fungiso... 40 minutes?01:03
fungiright01:03
openstackgerritJoshua Hesketh proposed opendev/system-config master: Toggle CI should also hide old zuul comments  https://review.opendev.org/67143601:09
*** betherly has joined #openstack-infra01:13
rm_workerr, is there a good channel to ask someone about some Shanghai visa specifics? nothing to do with infra really, but you folks are the most connected I know :D01:14
*** betherly has quit IRC01:17
clarkbrm_work: there is an invite letter form to fill out on the summit site. Other than that it has been suggested to me to use a handling company01:17
rm_workyeah i think we have one internally we use01:18
rm_workfollowing the email directions seemed to indicate i had to sign up for the summit first but i think i misread pre-coffee and it's fine01:18
rm_work(to get the invite letter)01:19
ianwclarkb / auristor: correlating everything (as clarkb had done anyway) it seems very likely that apache can think files aren't on disk during releases even with 5.3-rc1 afs-next branch; see -> http://lists.infradead.org/pipermail/linux-afs/2019-July/003122.html01:19
rm_workahh no, it does, the form says your order ID is required01:22
rm_workso ... if we are waiting for speaker codes... we just have to keep waiting before we can get our visa thing?01:23
clarkbI guess? I was also told the visa is relatively quick01:24
auristorianw: is there FileAuditLog data for [Thu Jul 25 16:39:18 2019] kAFS: Volume 536870968 'mirror.epel' is offline ?01:24
ianwauristor: no unfortunately, it's turned off atm.  i can go back and re-enable it like last time now we have these new changes to test01:25
rm_worki've been told to allow 2 months for the visa process, lol01:26
rm_workbut I guess we do have a bit01:26
*** diablo_rojo has quit IRC01:27
ianwauristor: i can do that in a bit.  the tar.gz i provided before was sufficient right?  just replicate that?01:27
*** jamesmcarthur has quit IRC01:30
auristorianw: yes, the same contents as the last time would be great01:32
*** betherly has joined #openstack-infra01:33
auristorianw: openafs, unlike kafs, will upon receiving a VBUSY or VOFFLINE error sleep 15 seconds and then retry the request up to 100 times.  at the moment, kafs will immediately fail over to the other locations but will then fail the syscall.01:35
auristorianw: I would like to confirm from the FileAuditLog entries whether or not the failover is taking place01:36
ianwok.  i'm not sure if there's a way to make apache a bit more verbose too about what it is seeing on disk01:38
*** betherly has quit IRC01:38
auristorI think the translation of VOFFLINE to ENOENT is wrong01:41
*** betherly has joined #openstack-infra01:54
*** betherly has quit IRC01:59
auristorfs/afs/misc.c afs_abort_to_error() should not convert VOFFLINE to ENOENT but to ENODEV because ENOENT will cause a negative lookup to be cached resulting in a missing file error.02:03
*** tinwood has quit IRC02:10
*** slaweq has joined #openstack-infra02:11
*** tinwood has joined #openstack-infra02:12
*** betherly has joined #openstack-infra02:15
*** slaweq has quit IRC02:16
*** betherly has quit IRC02:20
*** bobh has joined #openstack-infra02:22
*** whoami-rajat has joined #openstack-infra02:26
*** bobh has quit IRC02:36
*** yamamoto has quit IRC02:36
*** factor has joined #openstack-infra02:38
*** yamamoto has joined #openstack-infra02:50
openstackgerritIan Wienand proposed opendev/system-config master: AFS audit logging: helper script  https://review.opendev.org/67284702:54
ianwauristor: logging enabled; for infra-root ^ should be helpful so there's less magic involved02:54
*** betherly has joined #openstack-infra02:57
prometheanfireianw: do_extra_package_install does not include the hooks mount, is this intentional?02:58
auristorianw; thanks02:59
*** jamesmcarthur has joined #openstack-infra02:59
ianwprometheanfire: ahhh, i have no idea :)  i guess it's just always been like that02:59
prometheanfireok, may need to unify that behavior then :D02:59
prometheanfireI guess I could just have it run portageq itself, since it will always be in the chroot02:59
prometheanfireya, will just do that02:59
*** betherly has quit IRC03:01
*** EmilienM|pto is now known as EmilienM03:01
*** HenryG has quit IRC03:04
*** slaweq has joined #openstack-infra03:11
*** bhavikdbavishi has joined #openstack-infra03:13
*** rh-jelabarre has quit IRC03:14
*** slaweq has quit IRC03:15
*** psachin has joined #openstack-infra03:22
*** psachin has quit IRC03:23
*** psachin has joined #openstack-infra03:26
*** betherly has joined #openstack-infra03:28
*** michael-beaver has quit IRC03:32
*** ykarel|away has joined #openstack-infra03:32
*** betherly has quit IRC03:33
*** gregoryo has joined #openstack-infra03:35
*** diablo_rojo has joined #openstack-infra03:43
*** bhavikdbavishi1 has joined #openstack-infra03:46
prometheanfiredoesn't do the caching either03:47
*** bhavikdbavishi has quit IRC03:48
*** bhavikdbavishi1 is now known as bhavikdbavishi03:48
*** betherly has joined #openstack-infra03:48
*** betherly has quit IRC03:53
*** HenryG has joined #openstack-infra03:54
*** udesale has joined #openstack-infra03:57
*** betherly has joined #openstack-infra04:09
*** slaweq has joined #openstack-infra04:11
*** ramishra has joined #openstack-infra04:12
*** betherly has quit IRC04:14
*** slaweq has quit IRC04:15
*** apetrich has quit IRC04:20
*** betherly has joined #openstack-infra04:30
*** jamesmcarthur has quit IRC04:32
*** betherly has quit IRC04:35
*** ykarel|away has quit IRC04:40
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153004:41
*** yamamoto_ has joined #openstack-infra04:46
*** yamamoto has quit IRC04:50
*** diablo_rojo has quit IRC04:56
*** betherly has joined #openstack-infra05:01
*** jbadiapa has quit IRC05:02
*** ykarel|away has joined #openstack-infra05:02
*** betherly has quit IRC05:06
*** armax has quit IRC05:16
*** jaosorior has joined #openstack-infra05:21
*** ramishra has quit IRC05:27
*** ramishra has joined #openstack-infra05:38
*** rtjure has joined #openstack-infra05:42
*** yamamoto_ has quit IRC05:43
*** zbr has joined #openstack-infra05:44
*** kjackal has quit IRC05:44
*** jamesmcarthur has joined #openstack-infra05:45
*** yamamoto has joined #openstack-infra05:56
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153005:57
*** rcernin has quit IRC06:08
*** slaweq has joined #openstack-infra06:11
*** odicha has joined #openstack-infra06:14
*** apetrich has joined #openstack-infra06:15
*** slaweq has quit IRC06:16
*** gregoryo has quit IRC06:17
*** yamamoto_ has joined #openstack-infra06:19
*** yamamoto has quit IRC06:21
*** jbadiapa has joined #openstack-infra06:24
*** dpawlik has joined #openstack-infra06:25
*** Lucas_Gray has quit IRC06:26
*** jamesmcarthur has quit IRC06:33
*** ramishra has quit IRC06:34
*** iurygregory has joined #openstack-infra06:34
*** ykarel|away is now known as ykarel06:36
*** joeguo has quit IRC06:44
*** kjackal has joined #openstack-infra06:45
*** rlandy has joined #openstack-infra06:47
*** kjackal has quit IRC06:48
*** jpena|off is now known as jpena06:51
*** jpena is now known as jpena|mtg06:51
*** raukadah is now known as chandankumar06:51
*** Vadmacs has joined #openstack-infra06:59
*** kjackal has joined #openstack-infra07:00
*** tesseract has joined #openstack-infra07:03
*** slaweq has joined #openstack-infra07:07
*** ginopc has joined #openstack-infra07:10
*** ykarel is now known as ykarel|lunch07:26
*** bhavikdbavishi has quit IRC07:26
*** kopecmartin|off is now known as kopecmartin07:26
*** cshen has joined #openstack-infra07:32
*** tosky has joined #openstack-infra07:34
*** dchen has quit IRC07:39
*** cshen has quit IRC07:43
*** rpittau|afk is now known as rpittau07:44
*** pcaruana has joined #openstack-infra07:44
*** Goneri has joined #openstack-infra07:45
*** ramishra has joined #openstack-infra07:53
*** bhavikdbavishi has joined #openstack-infra07:54
*** dtantsur|afk is now known as dtantsur07:55
*** ralonsoh has joined #openstack-infra07:56
*** cshen has joined #openstack-infra07:56
*** ramishra has quit IRC07:57
*** ramishra has joined #openstack-infra07:57
*** lucasagomes has joined #openstack-infra08:03
*** yamamoto_ has quit IRC08:05
*** yamamoto has joined #openstack-infra08:13
*** ricolin has quit IRC08:19
*** pkopec has joined #openstack-infra08:23
*** siqbal has joined #openstack-infra08:29
*** rosmaita has quit IRC08:30
*** rosmaita has joined #openstack-infra08:34
*** joeguo has joined #openstack-infra08:41
*** bhavikdbavishi has quit IRC08:47
*** ykarel|lunch is now known as ykarel08:49
*** bhavikdbavishi has joined #openstack-infra08:56
*** e0ne has joined #openstack-infra09:16
*** bhavikdbavishi has quit IRC09:17
*** derekh has joined #openstack-infra09:23
*** siqbal has quit IRC09:31
*** siqbal has joined #openstack-infra09:31
*** arxcruz is now known as arxcruz|off09:32
*** cshen has left #openstack-infra09:33
*** jbadiapa has quit IRC09:39
openstackgerritSimon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies  https://review.opendev.org/64330909:49
*** priteau has joined #openstack-infra09:53
*** rlandy has quit IRC10:03
*** dpawlik has quit IRC10:08
*** yamamoto has quit IRC10:10
*** yamamoto has joined #openstack-infra10:14
*** yamamoto has quit IRC10:15
*** dpawlik has joined #openstack-infra10:26
*** tdasilva has quit IRC10:29
*** psachin has quit IRC10:38
*** udesale has quit IRC10:44
*** udesale has joined #openstack-infra10:45
*** yamamoto has joined #openstack-infra10:45
*** siqbal has quit IRC10:47
*** kjackal has quit IRC10:48
*** dpawlik has quit IRC11:00
*** tdasilva has joined #openstack-infra11:02
*** psachin has joined #openstack-infra11:02
*** dpawlik has joined #openstack-infra11:06
*** jaosorior has quit IRC11:15
*** ramishra has quit IRC11:19
*** ramishra has joined #openstack-infra11:20
*** roman_g has quit IRC11:25
*** roman_g has joined #openstack-infra11:26
*** kjackal has joined #openstack-infra11:26
*** EmilienM has quit IRC11:27
*** EmilienM has joined #openstack-infra11:28
*** jbadiapa has joined #openstack-infra11:37
*** larainema has joined #openstack-infra11:51
*** yamamoto has quit IRC11:51
*** irclogbot_2 has quit IRC11:53
*** yamamoto has joined #openstack-infra11:54
*** irclogbot_1 has joined #openstack-infra11:54
*** rh-jelabarre has joined #openstack-infra11:59
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227312:05
*** joeguo has quit IRC12:06
*** bhavikdbavishi has joined #openstack-infra12:07
*** hwoarang has quit IRC12:08
*** jbadiapa has quit IRC12:08
*** jbadiapa has joined #openstack-infra12:08
*** bhavikdbavishi1 has joined #openstack-infra12:09
*** bhavikdbavishi has quit IRC12:11
*** bhavikdbavishi1 is now known as bhavikdbavishi12:11
*** hwoarang has joined #openstack-infra12:13
*** derekh has quit IRC12:24
openstackgerritTristan Cacqueray proposed zuul/zuul master: manager: specify report failure in logs  https://review.opendev.org/67176012:31
*** rascasoft has quit IRC12:32
*** Goneri has quit IRC12:34
*** rascasoft has joined #openstack-infra12:34
*** yamamoto has quit IRC12:34
*** derekh has joined #openstack-infra12:36
*** dpawlik has quit IRC12:37
*** yamamoto has joined #openstack-infra12:38
*** dpawlik has joined #openstack-infra12:43
*** jpena|mtg is now known as jpena|off12:47
*** mriedem has joined #openstack-infra12:53
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227312:55
*** yamamoto has quit IRC12:57
openstackgerritFabien Boucher proposed zuul/zuul master: Builds page - Fix bad labels display  https://review.opendev.org/67297312:59
*** bhavikdbavishi has quit IRC12:59
*** xek__ has quit IRC13:01
*** xek__ has joined #openstack-infra13:02
donnydLast night I finally got together something to gather metrics for FN. It looks like the hypervisors are under utilized, so I am going to turn things back up a bit.13:06
donnydCPU utilization for the most part sits around 20%, and I am thinking 40-50% would make more use of my equipment. So with that, I turned it back up to 6013:07
donnydI will watch over the weekend to see what it does... I am still.. waiting... on... parts.. for the new storage (fans this time) and when I go to put it in place there will be hard downtime. So I will need to roll it back to zero, but there should be plenty of advance notice.13:08
donnydStill getting timeouts, but about 1/3 as many as rax-ord (probably because they have 4x the instances)13:10
donnydSeems to be the same jobs timing out everywhere, so I am pretty sure it's not from the infra13:10
*** b3nt_pin is now known as beagles13:15
*** bhavikdbavishi has joined #openstack-infra13:18
*** jpena|off is now known as jpena13:19
*** ykarel has quit IRC13:19
*** ekultails has joined #openstack-infra13:20
*** aaronsheffield has joined #openstack-infra13:28
*** smcginnis has joined #openstack-infra13:29
*** Goneri has joined #openstack-infra13:29
*** yamamoto has joined #openstack-infra13:31
*** goldyfruit has joined #openstack-infra13:31
*** bhavikdbavishi has quit IRC13:35
*** pkopec has quit IRC13:36
*** pkopec has joined #openstack-infra13:37
*** michael-beaver has joined #openstack-infra13:38
*** jpena is now known as jpena|off13:38
fungidonnyd: there are also guaranteed to be at least some jobs whose runtimes have crept up close to their timeout values and so minor variances cause them to overrun the allowed duration13:40
fungiusually easy to tell by looking at the success runs, though the durations for them are a click away from http://zuul.opendev.org/t/openstack/builds13:42
fungii wonder if including a duration column there would be useful13:42
fungiand then there are jobs which have nondeterministic/race condition issues causing some process to hang, so the success runs are well under their timeouts but then sometimes they timeout inexplicably because the job stops indefinitely halfway through13:43
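A sketch of the "look at the success runs" check against the Zuul builds API mentioned above; the query parameters and the duration field match what zuul.opendev.org exposes, but treat the exact names as assumptions.

```python
# Sketch: fetch recent successful runs of one job and summarize how long
# they take, to see how close they creep toward the job timeout.
import json
from urllib.request import urlopen

API = ("http://zuul.opendev.org/api/tenant/openstack/builds"
       "?job_name=tempest-full&result=SUCCESS&limit=50")

with urlopen(API) as resp:
    builds = json.load(resp)

durations = sorted(b["duration"] for b in builds if b.get("duration"))
print(f"min {durations[0]/60:.0f}m  "
      f"median {durations[len(durations)//2]/60:.0f}m  "
      f"max {durations[-1]/60:.0f}m "
      f"over {len(durations)} successful runs")
```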
donnydTo me it would be extremely useful. A large part of this project is to figure out what it takes to make a CI system go as fast as possible13:44
*** yamamoto has quit IRC13:45
*** xek__ has quit IRC13:46
*** xek__ has joined #openstack-infra13:47
*** yamamoto has joined #openstack-infra13:48
mordredfungi: if we look into adding that duration column, we should keep in mind there are some jobs that will show a long duration because they were paused (docker registry, for instance) - so we should account for that somehow, or maybe have a total duration and a total active duration or something13:51
donnydI would think any metric on runtime would be a great place to start. Is there anything I could do to speed up the container-based builds? They seem to time out more than others13:52
*** liuyulong has joined #openstack-infra13:55
*** FlorianFa has quit IRC13:58
*** eharney has joined #openstack-infra14:02
AJaegerconfig-core, puppet-crane is updated now - want to help retiring it, please? https://review.opendev.org/#/c/671268/14:05
AJaegerthanks, mordred14:08
AJaegerconfig-core, and one review to rename a job in grafana, please - https://review.opendev.org/67229014:09
*** ykarel has joined #openstack-infra14:12
*** dpawlik has quit IRC14:16
*** psachin has quit IRC14:22
*** odicha has quit IRC14:22
openstackgerritJames E. Blair proposed zuul/zuul master: Add react-lazylog package  https://review.opendev.org/67298814:23
fungimordred: great point14:24
fungidonnyd: have an example?14:24
*** bhavikdbavishi has joined #openstack-infra14:25
*** bnemec is now known as beekneemech14:26
*** Goneri has quit IRC14:27
*** xek__ has quit IRC14:28
*** xek__ has joined #openstack-infra14:29
donnydWell from a provider perspective I'm concerned with job start/finish times. I am pretty sure we already gather the data. But I really only can track instance on to off on my end14:30
*** smrcascao has joined #openstack-infra14:31
clarkbwe track that for every job in graphite14:31
fungidonnyd: oh, i meant an example build for something containery which was slow14:31
donnydOh, yea14:31
fungibut sure, makes sense14:31
clarkbevery job also has its own timeout value though which isn't in graphite and I think what we really want is proximity of duration to timeout14:31
fungii'm mostly just curious if some of these jobs are failing to use mirrors and whatnot14:31
clarkbfungi: ++14:31
openstackgerritMerged openstack/project-config master: Remove puppet-crane  https://review.opendev.org/67126814:32
donnydWell for the containery thing, maybe a proxy that can cache container images in memory or very fast disk14:32
clarkbfungi: for gitea backend replacements do we have a change to pull more of them out of inventory? or do we want to go ahead with 02 for now?14:32
donnydBut I have no real ideas what it would take without digging in14:32
clarkbdonnyd: that is what our mirror node does14:32
donnydDoes it do that for containers14:32
clarkbdonnyd: yup14:33
clarkbdonnyd: as long as you request the images through mirror.regionone.fortnebula.opendev.org instead of hub.docker.com directly14:33
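One way a node ends up using that proxy is a registry-mirrors entry for dockerd; a minimal sketch, where the :8082 port for the Docker Hub proxy on the mirror host is an assumption to check against the mirror's config. Writing the file needs root, and dockerd must be restarted to pick it up.

```python
# Sketch: point the docker daemon at the regional Docker Hub proxy.
import json

daemon_config = {
    # assumed port for the hub proxy on the mirror host
    "registry-mirrors": ["http://mirror.regionone.fortnebula.opendev.org:8082"]
}

with open("/etc/docker/daemon.json", "w") as f:
    json.dump(daemon_config, f, indent=2)
```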
fungiclarkb: i was mostly thinking of doing more in parallel because syncing took so long, but last night's syncs were fast, so one at a time is likely fine14:34
donnydOh... well then the job timeouts have to be more related to the actual job config than the service provider... Without a massive CPU / Memory upgrade, I cannot make things turn any faster.. and from what I could tell the workload seems to be IO bound anyways14:34
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: try lazylog  https://review.opendev.org/67299114:34
donnydclarkb: I guess we would have to look at the failing jobs to see where they are getting their bits from14:35
*** ricolin has joined #openstack-infra14:36
fungiclarkb: anyway, should be safe to delete 02 so i'm doing that now and will then boot the replacement14:37
openstackgerritGraham Hayes proposed zuul/nodepool master: Implement an Azure driver  https://review.opendev.org/55443214:39
clarkbdonnyd: ya that is why fungi was looking for examples14:39
clarkbfungi: gotcha. fwiw you don't have to delete the old one first if you don't want to (though with 8 total backends deleting first should also be totally fine)14:40
clarkbfungi: don't forget to look for leaked volume after server delete14:40
fungiyup14:40
fungino available volumes in that region/tenant14:41
fungiso nova took care of it this time14:41
fungisudo /opt/system-config/launch/launch-node.py gitea02.opendev.org --flavor=v2-highcpu-8 --cloud=openstackci-vexxhost --region=sjc1 --image=infra-ubuntu-bionic-minimal-20190612 --boot-from-volume --volume-size=80 --ignore_ipv6 --network=public --config-drive14:43
fungithose are the options we settled on previously14:43
*** jamesmcarthur has joined #openstack-infra14:43
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227314:43
clarkblooks correct to me14:43
fungi[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details14:44
fungi[WARNING]: Unable to parse /opt/system-config/inventory/emergency.yaml as an inventory source14:45
fungiyeah, that file doesn't exist14:45
clarkbfungi: that may be a regression in launch-node looking for specific inventory files which have since moved14:46
fungiyeah, i'm hunting it down14:46
clarkbyup in launch-node.py it lists that file specifically14:46
clarkbshould be updated to point to /etc/ansible/hosts/emergency.yaml I think14:46
fungiit already includes that14:47
fungioh, wait, playbooks/roles/install-ansible/templates/ansible.cfg.j2 includes it14:48
fungilaunch/launch-node.py only includes the one from the system-config repo14:48
*** chandankumar is now known as raukadah14:48
clarkb/opt/system-config/inventory/emergency.yaml is in launch-node.py14:48
fungii guess we can also clean out the nonexistent system-config one and make sure they both use the one from /etc14:48
clarkbyup14:49
fungii'll also correct some references in doc/source/sysadmin.rst14:50
*** openstackgerrit has quit IRC14:51
*** openstackgerrit has joined #openstack-infra14:52
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: try lazylog  https://review.opendev.org/67299114:52
*** kopecmartin is now known as kopecmartin|off14:56
openstackgerritJeremy Stanley proposed opendev/system-config master: Correct emergency file reference in launch script  https://review.opendev.org/67299614:57
fungiturns out the other entries in playbooks/roles/install-ansible/templates/ansible.cfg.j2 were fine, they were for the actual inventories in the system-config repo14:57
openstackgerritJeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea02  https://review.opendev.org/67299715:02
clarkbfungi: ^ we need a change to add that host to the inventory as gitea02 too15:04
*** piotrowskim has quit IRC15:04
openstackgerritJeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory  https://review.opendev.org/67299915:05
fungiyeah, i was typing it ;)15:05
*** ykarel is now known as ykarel|away15:05
*** yamamoto has quit IRC15:06
clarkbleft a couple notes on the emergency file docs changes. I think followups for that are fine so I +2'd15:07
clarkbfungi: I +2'd 672999 but just realized I think it needs the exclusion on the remote_puppet_git.yaml playbook to avoid having ansible update the db15:08
*** yamamoto has joined #openstack-infra15:09
*** dtantsur is now known as moltendmitry15:09
fungioh, right, thanks15:09
*** yamamoto has quit IRC15:10
*** bhavikdbavishi has quit IRC15:10
openstackgerritJeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory  https://review.opendev.org/67299915:11
clarkb+2 mordred corvus ^ have a moment for that change?15:13
fungigonna grab an early lunch while all that percolates and get the repos initialized when i get back15:14
clarkbfungi: we'll want to double check that gitea01 noops as expected when ansible runs against it but it did with 06 so should be fine15:14
donnydOk, I'm picking up what you are putting down now15:14
mordredclarkb: lgtm15:14
clarkbdonnyd: I'm happy to look at some examples too and should have a logstash window with the timeout query up somewhere let me see15:14
donnydnode_provider:"fortnebula-regionone" AND filename:job-output.txt AND message:"RUN END RESULT_TIMED_OUT"15:15
clarkbyou can keep all your tabs forever but what they don't tell you is that if you do you'll never find the one you want again15:15
* clarkb has a tab problem15:15
*** jamesmcarthur has quit IRC15:17
donnydclarkb: needs nested tabbing15:17
openstackgerritMerged opendev/zone-opendev.org master: Update IP address for gitea02  https://review.opendev.org/67299715:21
*** jamesmcarthur has joined #openstack-infra15:21
*** cmurphy is now known as cmorpheus15:25
clarkbhttp://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/job-output.txt is an example of a timed out default tempest job15:29
clarkbdevstack took 32 minutes to run there which isn't the fastest but is also within range of other cloud regions15:29
clarkband it doesn't timeout until it gets into the slow tempest tests run which is right at the end of the job so we are very near the timeout15:29
donnydI would like to get the devstack install times down a little further, but I am not sure where the bottleneck in it is15:30
clarkbdonnyd: I think a lot of it could be improvements to devstack and the projects themselves. For example we do database migrations for a lot of projects from many releases ago as they haven't been rolled up15:32
donnydOn the bright side I bumped the cpu ratios back up and it would seem that this is more where I would like density to be15:32
clarkbbut ya I agree devstack could be quicker, we just don't have anyone investing in that (and when people suggest alternatives to devstack they tend to be even slower :( )15:33
donnydhttps://usercontent.irccloud-cdn.com/file/K6D0a0CW/Screenshot%20from%202019-07-26%2011-32-21.png15:33
donnydtowards the left side is this morning with cpu ratio at 1.5:115:34
donnydand right side is 2.0:115:34
donnydin case anyone is curious15:34
clarkbthe good thing about that tempest job is that it has dstat logs so we can sanity check those15:34
clarkbif you download http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/dstat-csv_log.txt.gz, uncompress it then you can feed it to https://lamada.eu/dstat-graph/15:35
donnydI am interesting in what can be done from an infra perspective to speed up devstack15:35
*** yamamoto has joined #openstack-infra15:35
donnydinterested*15:35
clarkblooking at that dstat graph I think (lack of) memory and resulting spike in load average and disk usage and paging are a big hit15:37
clarkbc-bak is still a running service which was identified as a memory hog that isn't actually tested by the job? mriedem you responded to my emails about this in the past do you have up to date info?15:38
mriedempatch in tempest is still sitting15:38
mriedemhttps://review.opendev.org/#/c/651865/15:38
mriedemi think it's waiting on gmann's work to split apart the integrated gate template into separate jobs so that cinder is running something which runs c-bak so those tests still get run on cinder changes15:39
*** jbadiapa has quit IRC15:39
mriedemhaving said that, now that gmann is adding all of these new templates, those new templates, except for anything that runs cinder, should probably also disable c-bak15:39
clarkblooking at the graph a bit more closely we can see lack of memory results in a spike in wai state15:39
clarkbwhich will definitely slow things down15:40
clarkbmriedem: Ideally we'd look at where the memory use in the projects is coming from too. I know heat did this once and dropped memory use by like 800MB15:40
donnydI can't make disk speeds much more than they already are15:40
*** yamamoto has quit IRC15:41
clarkbdonnyd: I think that is fine. Once we hit swap its sort of a "good luck" point15:41
donnydwrites are hitting the limits of an individual nic, although reads could be faster15:41
*** noorul has joined #openstack-infra15:41
clarkbdonnyd: we have swap there because sometimes it is fast enough that we don't have to throw away the job, but if it isn't well hard to blame the cloud for that15:41
mriedemaha https://review.opendev.org/#/c/669312/6/.zuul.yaml@21315:41
donnydI am curious to see if the new storage will speed the jobs up any at all... doesn't look like there is much of a write workload looking at an individual job15:42
*** armax has joined #openstack-infra15:43
donnydhttps://usercontent.irccloud-cdn.com/file/YeqEgCTu/image.png15:43
donnydBut looking at network traffic overall I surely bang up against the limits of an individual nic15:43
clarkbscanning devstack logs we install uwsgi and compile it because our wheel mirror doesn't have it15:44
clarkbfixing that will save ~15 seconds15:44
clarkbprometheanfire: smcginnis tonyb any idea why uwsgi isn't listed in global-requirements? if it were we would have wheels ready to go for it15:45
donnydOn at least my cloud, my cpu to memory ratios could go way up on the memory side. I am using not even 25% of the addressable memory15:46
clarkbdonnyd: we've intentionally limited memory in the test environments in part to make it possible for someone to say "my test failed" then run it locally on say a laptop or desktop and not require them to have a rack of servers15:46
clarkbit also helps reach a balance with clouds and resource utilization where we don't have a ton of underutilized instances15:47
clarkb(tests can scale up by requesting multiple nodes if they know they need that)15:47
donnydThat makes sense, but I am curious to know if giving devstack more memory would in-fact speed it up15:47
donnydI am not sure what laptops out there have 8 cores, but only 8 GB of memory15:48
donnydThe laptops I have that do have better processors (with 8 cores), also usually have a bit more in the memory dept... mine specifically 64G... But i don't think 16 is atypical at this point...15:49
clarkbdonnyd: with cores we use more of them if we have them (to speed up testing) but you don't need 8 to run tempest15:49
clarkb8GB remains the pretty typical laptop memory setup15:49
clarkbgoing to single dimm machines for thinness seems to have really impacted memory availability15:50
donnydI am not disagreeing, because what you are saying makes sense15:50
*** moltendmitry is now known as dtantsur|afk15:50
donnydI am just trying to find out where the optimal amount of memory lies to make devstack go as fast as possible with reasonable DC equipment15:51
*** xek__ has quit IRC15:51
clarkbBut also we have bloated memory use and I don't think anyone has looked at why other than my quick "oh c-bak made dstat sad"15:51
clarkband rather than simply throw more memory at the problem it would be good to understand15:51
*** xek__ has joined #openstack-infra15:51
prometheanfireclarkb: at the time we didn't want to choose one impl (gunicorn vs uwsgi vs whatever)15:52
donnydI will run some tests on my end to see where the balance between "not achievable on a laptop" and "fast as possible" lies15:52
prometheanfirenot sure about license off the top of head either15:52
*** Vadmacs has quit IRC15:52
*** jangutter has joined #openstack-infra15:54
clarkbprometheanfire: uwsgi is gpl v2+ with linking exception15:54
prometheanfireclarkb: ya, just looked it up15:54
prometheanfirenot sure that's allowed or not, I know the base gpl is not15:55
paladoxmordred bazel caused load for me to go up to 233 apparently15:55
clarkbwe spend almost 7 minutes just creating keystone services and roles and such15:57
*** eharney has quit IRC15:57
prometheanfireguess it'd be considered Projects run as part of the OpenStack Infrastructure15:57
prometheanfireso as it's OSI, it'd be fine15:57
clarkbcmorpheus: ^ Any idea if we can trim that down? like maybe we don't need all of those roles by default?15:57
clarkbprometheanfire: and the linking exception makes it extra safe15:57
prometheanfireyep15:57
*** rpittau is now known as rpittau|afk15:57
clarkbcmorpheus: starts at about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_36_48_436 there and goes to about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_43_43_15315:58
cmorpheusclarkb: looking15:58
*** gyee has joined #openstack-infra15:59
openstackgerritGraham Hayes proposed zuul/nodepool master: Implement an Azure driver  https://review.opendev.org/55443216:01
cmorpheusthe reader member and admin roles are created by keystone and can't be trimmed down, the ResellerAdmin role i think is for swift, i don't think anotherrole or invisible_to_admin are really useful right now16:01
cmorpheusoh invisible_to_admin is a project i guess16:02
cmorpheusi haven't read scrollback, what is the actual problem?16:02
prometheanfireclarkb: I'd put it to the list before we settle on it, given that reqs doesn't like duplicate functionality16:03
clarkbcmorpheus: looking into general slowness of devstack + tempest jobs16:03
clarkbcmorpheus: devstack took 32 minutes in this case and ~7 of that is just that keystone setup16:03
clarkbseparately it appears that digging into swap during tempest runs may be a cause of slowdown when running tempest16:03
clarkbcmorpheus: mostly just trying to see if we can improve runtime by fixing inefficiencies16:04
*** lucasagomes has quit IRC16:04
cmorpheusokay cool16:05
clarkbis osc the only way to create those keystone setup entries? Is keystoneclient still a thing or is there an admin tool?16:06
*** icarusfactor has joined #openstack-infra16:06
clarkb(might be helpful to do comparison of tool costs if we can)16:06
cmorpheuskeystoneclient has no cli and is going away some day16:07
cmorpheusthe keystone-manage admin tool can't be used for most of this, we only use it to bootstrap an admin user16:07
clarkbwe unfortunately pushed everything into osc and then realized after it was too late that it had a large performance impact16:08
*** factor has quit IRC16:08
clarkbI guess we could write a script to do that chunk of config16:08
clarkbto avoid the cost of python and pkg_resources spin up time16:08
cmorpheusone thing we could do is remove all the service users and just have one service user do all the things16:09
clarkbmordred: ^ how crazy would such a thing be? and maybe you already have such a thing because sdk testing?16:09
cmorpheuscrap i have a meeting brb16:09
*** e0ne has quit IRC16:12
clarkbhttps://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1146-L1161 that is the 7 minute block16:14
openstackgerritPaul Belanger proposed opendev/base-jobs master: Switch base-test ansible version to 2.8  https://review.opendev.org/67301216:14
*** jamesmcarthur has quit IRC16:15
pabelangerinfra-root: ^if you don't mind reviewing, this allows us to start testing base-test jobs using ansible 2.8.   Which should be fine, and humans need to opt into base-test16:15
*** jamesmcarthur has joined #openstack-infra16:16
*** mattw4 has joined #openstack-infra16:17
openstackgerritMerged opendev/system-config master: Add gitea02 replacement to inventory  https://review.opendev.org/67299916:17
*** pkopec has quit IRC16:18
jangutteris #openstack-infra the go-to place to talk about devstack?16:19
jonherjangutter #openstack-qa is probably best for devstack16:20
jangutterthanks jonher!16:20
jonhernp16:20
*** diablo_rojo has joined #openstack-infra16:21
*** jamesmcarthur has quit IRC16:21
fungiokay, back now... let's see where we're at16:23
clarkbfungi: change to run ansible on gitea02 just merged16:23
clarkbfungi: waiting on that to apply now16:23
*** ricolin has quit IRC16:23
fungiaccepted the gitea02 ssh host key on bridge.o.o now16:23
*** noorul has quit IRC16:23
fungion the next ansible pass it ought to set up docker/gitea and then i can initialize the repos16:24
*** larainema has quit IRC16:25
clarkbmordred: do methods like conn.identity.role_project_user_assignments in sdk's examples actually exist? and if so why do they not show up in the api docs nor outside of the examples when grepped for?16:27
clarkb(in particular it would be great to know if that takes any parameters for narrowing the list of roles)16:28
*** jamesmcarthur has joined #openstack-infra16:31
openstackgerritMerged opendev/base-jobs master: Switch base-test ansible version to 2.8  https://review.opendev.org/67301216:32
*** pkopec has joined #openstack-infra16:33
*** icarusfactor has quit IRC16:33
*** icarusfactor has joined #openstack-infra16:33
openstackgerritPaul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test  https://review.opendev.org/67301416:35
*** mriedem is now known as mriedem_lunch16:35
*** jamesmcarthur has quit IRC16:36
*** icarusfactor has quit IRC16:36
*** jamesmcarthur has joined #openstack-infra16:36
openstackgerritPaul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs  https://review.opendev.org/67301516:36
openstackgerritPaul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs  https://review.opendev.org/67301616:37
*** roman_g has quit IRC16:41
*** iurygregory has quit IRC16:46
openstackgerritJames E. Blair proposed zuul/zuul master: Add severity filtering to logs  https://review.opendev.org/67283916:46
*** iurygregory has joined #openstack-infra16:47
openstackgerritJames E. Blair proposed zuul/zuul master: Add severity filtering to logs  https://review.opendev.org/67283916:49
clarkbcmorpheus: mordred https://review.opendev.org/673018 is the I have no idea what I am doing change16:51
openstackgerritJean-Philippe Evrard proposed openstack/project-config master: [WIP] Add tooling to update python jobs on branch creation  https://review.opendev.org/67301916:52
*** mattw4 has quit IRC16:54
clarkbfungi: base.yaml is running on gitea01 now16:57
clarkber 0216:57
fungiyep, docker not running yet though16:58
*** mattw4 has joined #openstack-infra16:58
fungikeeping tabs on it and will initialize repos as soon as gitea is up16:58
*** derekh has quit IRC16:58
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302017:01
*** eharney has joined #openstack-infra17:01
fungiit did install gnupg this time17:01
funginot docker yet though17:01
*** jangutter has quit IRC17:02
mnaserclarkb: i wonder if replacing all that osc shell code by some sort of python code would speed things up17:03
mnaseri'd imagine it would17:03
clarkbmnaser: yup see https://review.opendev.org/67301817:04
clarkbmnaser: thats the small scale check it on a thing we run 15 times17:04
clarkbif that shows improvements we can rewrite to be more complete17:04
* fungi misread that as a suggestion to replace openstackclient with something written in python17:04
fungi1000      4476  1.4  1.1 1702848 90592 ?       Ssl  17:04   0:01 /app/gitea/gitea web17:06
fungiwoo!17:06
fungiproceeding17:06
*** iurygregory has quit IRC17:07
fungihttps://docs.openstack.org/infra/system-config/gitea.html#deploy-a-new-backend indicates the next step is to stop gitea and restore a database dump17:08
clarkb++17:08
*** jamesmcarthur has quit IRC17:08
dtroyerheh, I proposed once upon a time replacing those keystone bits with a string of commands piped into osc so it only loads once, it does help time-wise but the lack of decent error handling was a concern so we dropped it…17:09
* dtroyer is still accepting sponsorship proposals to write a proper cli in a single non-interpreted binary17:09
fungi`docker ps -a` indicates i should stop 5c59a8a31b9d (opendevorg/gitea:latest) and 8e68bb69a209 (opendevorg/gitea-openssh) but leave 5bdefc623895 (mariadb:10.4) running17:09
clarkbdtroyer: fwiw non interpreted binary isn't really the problem as much as "python is silly and scans the entire disk when loading packages then sorts them all by name and version because that is fast"17:10
fungiokay, now only the mariadb:10.4 container is "up"17:10
clarkbdtroyer: I fully expect my python script there to be much quicker since it doesn't pkg_resources17:10
clarkb(at least I hope it doesn't end up doing that via openstacksdk)17:11
clarkbthis is why I'm testing it small scale first17:11
dtroyerclarkb: yes, but it is still a PITA to install17:11
clarkbdtroyer: wget'ing a prebuilt binary has a lot of problems with it too :/17:12
clarkbmostly in verifying the contents are as expected17:12
* mnaser would rather wget a prebuilt binary that's always a fast client (or even build once) than wait 3 seconds every single time i run a command :(17:12
mnaseron a brand new openstack cluster (3 controllers in ha, zero load, zero vms): real 0m2.605s for openstack server list17:13
fungisee, it's only 3 seconds because you rounded to the nearest second! ;)17:14
clarkbmnaser: ya when I first looked at this a couple years ago I think the numbers I had were about 50% scanning packages and sorting them and 50% http rtt17:14
clarkbbut again you can avoid scanning packages and sorting them with python17:15
clarkbunfortunately the thing that does the scanning and sorting is pretty well entrenched so shows up all over the place (meaning if you remove it one place you find it in another and the list goes on and on)17:16
clarkband of that 50% rtt for http I want to say a good chunk of it is getting a token?17:18
clarkbI don't think I was caching the tokens17:18
clarkbbut maybe that is automagic and I didn't notice17:18
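The fixed startup tax clarkb describes is easy to observe directly; a small sketch timing the import that does the scan-and-sort.

```python
# Sketch: measure how long pkg_resources takes to import. On import it
# scans every installed distribution and sorts them, which is a big part
# of osc's fixed per-command cost.
import time

start = time.monotonic()
import pkg_resources  # noqa: E402  (timed deliberately)
elapsed = time.monotonic() - start

print(f"import pkg_resources: {elapsed:.2f}s "
      f"({len(list(pkg_resources.working_set))} distributions scanned)")
```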
*** bobh has joined #openstack-infra17:23
clarkbfungi: how goes gitea02 db recovery?17:25
fungijust about done shuffling db copies around17:26
*** bobh has quit IRC17:26
fungiwanted to grab a fresh one just in case17:26
fungii can reuse it for subsequent replacements today/tomorrow at least17:26
clarkbcmorpheus: can you see my responses to your comments on that devstack change? the sdk api docs say that those parameters are not valid17:28
*** weifan has joined #openstack-infra17:28
*** udesale has quit IRC17:31
*** igordc has joined #openstack-infra17:34
*** harlowja has quit IRC17:35
*** igordc has quit IRC17:36
*** igordc has joined #openstack-infra17:36
fungidb import to gitea02 completed and "All missing Git repositories for which records existed have been reinitialized."17:37
fungistarting the gerrit replication to it now17:37
clarkbcmorpheus: does that mean I should do a filter just of the user then scan the resulting list for RoleAssignments that match the user_domain (and project_domain if assigned)?17:38
*** electrofelix has quit IRC17:38
fungi2115 tasks17:39
clarkbfungi: I've realized this may take longer than ~14 minutes because there is no data at all on the remote17:39
fungioh, perhaps17:39
clarkbbut we should see how long it actually takes compared to the mostly noop case of 14 minutes17:39
fungithat could explain some of it17:39
fungiyep17:39
*** Vadmacs has joined #openstack-infra17:41
cmorpheusclarkb: did you see my response?17:41
*** hwoarang has quit IRC17:42
*** hwoarang_ has joined #openstack-infra17:43
cmorpheusfiltering by role assignment on a domain is not the same as the user domain that the get_or_add_user_project_role function is getting17:43
clarkbcmorpheus: ya I'm not quite sure I understand what we want to get back? I guess we specify the user domain for when the user isn't in the default domain set by env vars? in which case asking for all the roles won't work?17:43
clarkbcmorpheus: right I get they aren't the same but I don't know what it is we actually want17:43
clarkbwe want the list of role assignments for the user that is in domain_not_default_in_env_var ?17:43
openstackgerritJeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03  https://review.opendev.org/67302617:44
fungithat ^ should be the next round once replication completes17:44
clarkband sdk doesn't have (or at least doesn't document) a method to get that data short of creating a different connection with different user domain details maybe17:44
cmorpheusclarkb: we want the list of role assignments that the user has on the project, domains only come into play here because both the user and the project are namespaced by a domain17:44
cmorpheusif you only have the username then you always need to specify the domain17:45
cmorpheusexcept if it's the default domain then i think osc and maybe sdk do some magic for you there17:45
clarkbcmorpheus: right ok. Domain is specified to be default via env vars. So this will work as long as the user isn't in a non default domain17:45
clarkb(and I update it to not filter on domain)17:45
cmorpheusokay sounds good17:45
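Putting the agreed behavior together, a sketch of what such a helper might look like with openstacksdk: resolve names to IDs up front (devstack passes both), filter role assignments by user and project only with no domain filter, and grant the role if it is missing. The proxy method names follow the sdk identity v3 API, but treat the exact signatures as assumptions; this is not the actual change under review.

```python
# Sketch of a get_or_add_user_project_role-style helper using openstacksdk.
import openstack

conn = openstack.connect(cloud="devstack-admin")  # assumed clouds.yaml entry

def get_or_add_user_project_role(role_name, user_name, project_name):
    # find_* accepts a name or an ID, matching devstack's mixed inputs.
    role = conn.identity.find_role(role_name)
    user = conn.identity.find_user(user_name)
    project = conn.identity.find_project(project_name)
    # Filter by user and project only; no domain filter, per cmorpheus.
    existing = conn.identity.role_assignments(
        user_id=user.id, scope_project_id=project.id)
    if any(a.role["id"] == role.id for a in existing):
        return role.id
    conn.identity.assign_project_role_to_user(project, user, role)
    return role.id
```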
cmorpheusalso i apologize on behalf of my predecessors who came up with this17:46
fungii'll drink to that17:48
*** rtjure has quit IRC17:50
clarkbonce I get enough logging to hopefully figure out if the domains are supplied as IDs or names I'll update the connect call17:50
*** mriedem_lunch is now known as mriedem17:50
*** ykarel|away has quit IRC17:51
*** tesseract has quit IRC17:54
clarkbwe appear to primarily operate using IDs. change updated to reflect that now17:54
*** hwoarang_ has quit IRC17:54
fungi651 tasks17:54
*** Vadmacs has quit IRC17:55
clarkbfungi: that is slower but not significantly so17:55
fungii'll self-approve 673026 once it hits ~0 and start on replacing 0317:55
*** rtjure has joined #openstack-infra17:56
*** hwoarang has joined #openstack-infra17:56
*** chason has quit IRC17:56
clarkbfungi: I just noticed a bug in that change, the ip addr for gitea02 in the haproxy config is not up to date17:57
fungithanks!!!17:59
fungifixing now17:59
openstackgerritJeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03  https://review.opendev.org/67302618:01
fungiclarkb: ^18:01
clarkbthat looks better thanks18:02
fungialso looks like replication finished18:02
*** mattw4 has quit IRC18:02
*** mattw4 has joined #openstack-infra18:02
fungicurrently there's a slew of git-upload-pack '/openstack/nova' for the zuul user18:03
*** hwoarang has quit IRC18:03
fungiand a sudden burst of index changes18:03
*** hwoarang has joined #openstack-infra18:04
*** ramishra has quit IRC18:12
*** weifan has quit IRC18:16
*** weifan has joined #openstack-infra18:17
*** weifan has quit IRC18:17
donnydclarkb: fungi So I am hoping that this will be the weekend I can actually get my controller swapped out. Do you think we should update the quota in zuul, as the control plane will be unreachable for an hour or so? Or just let it ride because weekend loads are low anyways18:20
fungidonnyd: dropping the quota to 0 won't help i don't think if the api itself is unreachable, but we can certainly set max-servers to 0 in nodepool. or alternatively just expect that there will be a handful of boot failures logged by nodepool until the api is back up18:22
donnydthats what I mean.. yes. max-servers18:22
donnydthanks fungi18:22
fungii don't see the latter as a problem really18:22
fungii mean, we've designed nodepool to withstand severe provider outages18:23
donnydI am hoping to do an mostly online swap from my edge LB between the two control planes18:23
fungiif setting max-servers=0 for it will help you feel less rushed, i'm happy to approve such a change18:23
donnydSo the new controllers will be built and populated in parallel and then just swap out the place the edge LB sends requests to18:24
donnydAll sounds good... till it doesn't18:24
fungiif the account credentials, endpoint and so on will remain the same, then i don't personally see any need to zero out the max-servers for it18:25
donnydYea, I have all of what was used to provision this automated (mostly)18:27
donnydso nothing should change on your end at all18:27
pleia2happy sysadmin day :)18:27
clarkbnodepool should happily deal with those api errors, it may print a lot of log meesages about it though (thats fine)18:27
clarkbpleia2: and to you too!18:28
donnydI'm hopeful for a 5 minute outage... but I also live in the real sysadmin world18:28
clarkbpleia2: are you sysadmining for ibm?18:28
*** Vadmacs has joined #openstack-infra18:29
fungithanks pleia2!!!18:31
*** hwoarang has quit IRC18:32
fungimay your systems be bountiful18:32
pleia2clarkb: only a tiny bit here and there (we run an openstack-driven cloud that launches VMs on mainframes, so I poke my head in when needed)18:32
pleia2mostly I do dev advocacy though18:32
*** hwoarang has joined #openstack-infra18:32
pleia2(we use the z/VM connector for nova, but switching to KVM soon, which runs on s390x and will make our lives 100x easier)18:33
*** tdasilva has quit IRC18:33
*** weifan has joined #openstack-infra18:34
*** weifan has quit IRC18:34
*** weifan has joined #openstack-infra18:38
fungipleia2: openstack in use at ibm? i thought that was only a myth!18:39
pleia2haha18:41
clarkbpleia2: is kvm loaded off of a virtual card deck?18:42
*** weifan has quit IRC18:42
clarkbit would make me so happy if it is18:42
pleia2clarkb: I don't actually know how it works :)18:43
pleia2it's a supported thing though, right alongside z/VM18:44
*** mattw4 has quit IRC18:46
*** bobh has joined #openstack-infra18:48
corvuspleia2: happy sysadmin day to you too!18:48
corvuswho brought the cake?18:50
clarkbI don't have cake but now I want some18:50
clarkbI did just eat some leftover curry18:50
*** mattw4 has joined #openstack-infra18:51
corvusclarkb: was that breakfast or lunch?18:51
clarkbsomething in the middle but closer to lunch18:51
clarkbI think my little python script in devstack didn't break this time18:52
clarkbnot sure if it is faster yet. Will have to wait for logs and compare to other jobs on that cloud18:52
fungii could go for some currycake18:53
clarkbhrm I got the handling of user_domain and project_domain wrong looks like18:53
clarkbbecause they are names not ids18:53
clarkbso inconsistent18:53
fungipleia2: i have fond memories of being a racf administrator for s/390 clusters running linux in lpars. i hope it's as enjoyable for you!18:57
fungi[edit: i guess we called it a "sysplex" not a "cluster"]18:57
*** bobh has quit IRC19:02
pleia2fungi: cool, it sure is :)19:03
clarkbcmorpheus: it is amazing how complicated this little script ends up getting, makes me wonder how much faster it would be overall to continue to support names and ids over osc19:04
*** rh-jelabarre has quit IRC19:04
cmorpheusheh19:04
clarkbI've basically run into needing to look up all inputs to get their ids because they might be names19:05
clarkb(devstack uses both names and ids)19:05
fungiand i guess the api doesn't treat them interchangeably19:07
clarkbthe wins now may not be seen unless we rewrite that section of shell into python entirely. Then we can get the role, user, and project data once and reuse it over and over and over19:07
*** weifan has joined #openstack-infra19:07
*** weifan has quit IRC19:08
fungiand so the slow progression of translating devstack into python continues19:08
*** weifan has joined #openstack-infra19:08
clarkbfungi: apparently not, re: treating them the same19:08
fungithat's one thing i've come to appreciate about gerrit's rest api... you can provide a variety of typed inputs for certain parameters and it will decide how to equate them19:09
fungiso if you're doing an account lookup you can provide an id number or a username or an e-mail address or... and it will dereference them all to the same values behind the scenes19:10
fungigranted, it also returns lists for just about everything, because there's no guarantee that the inputs reduce to a single value19:12
cmorpheusclarkb: yeah you're starting to reproduce part of why osc is so slow19:12
*** weifan has quit IRC19:13
fungisometimes the only way to truly understand a problem is to try and reproduce the solution?19:13
openstackgerritPaul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test  https://review.opendev.org/67301419:15
*** rh-jelabarre has joined #openstack-infra19:16
openstackgerritPaul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs  https://review.opendev.org/67301619:18
openstackgerritPaul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs  https://review.opendev.org/67301519:18
*** diablo_rojo has quit IRC19:26
clarkbcmorpheus: ya I think the proper way this gets faster is to rewrite the whole configure accounts stuff in python rather than just the function bits19:30
openstackgerritMerged opendev/system-config master: Swap gitea02 into service and bring down gitea03  https://review.opendev.org/67302619:31
*** weifan has joined #openstack-infra19:32
fungias soon as that gets installed onto the lb and the active connections to 03 trail off, i'll rip and replace19:35
*** weifan has quit IRC19:36
*** weifan has joined #openstack-infra19:36
*** weifan has quit IRC19:37
*** weifan has joined #openstack-infra19:37
*** igordc has quit IRC19:37
*** weifan has quit IRC19:38
*** weifan has joined #openstack-infra19:38
*** weifan has quit IRC19:39
*** weifan has joined #openstack-infra19:39
*** weifan has quit IRC19:39
*** weifan has joined #openstack-infra19:40
*** weifan has quit IRC19:40
*** slaweq has quit IRC19:41
*** wpp has quit IRC19:48
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302019:50
*** ralonsoh has quit IRC19:55
openstackgerritJeff Liu proposed zuul/zuul-operator master: Verify Operator Pod Running  https://review.opendev.org/67039519:55
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302019:55
mordredwow. that's a fun change!19:58
*** goldyfruit has quit IRC20:02
paladoxmordred bazel caused load on my mac to go up to 233.20:03
mordredpaladox: well - I can build 2.15 now - but 2.16 and 3.0 are _really_ unhappy20:04
*** goldyfruit has joined #openstack-infra20:04
paladoxoh20:04
paladoxmordred do you run out of cpu/ram for 2.16?20:04
mordredyeah - if I use the same settings that work for 2.1520:05
paladoxI'm surprised 3.0 is a problem as that removed GWTUI (so less to build)20:05
*** wpp has joined #openstack-infra20:05
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227320:05
mordredyeah20:05
mordredpaladox: oh - actually - the last time I ran it, 2.16 failed in a new and different way20:06
paladoxoh20:06
mordredhttp://logs.openstack.org/73/672273/6/check/system-config-build-image-gerrit-2.16/1c13564/job-output.txt.gz#_2019-07-26_13_32_10_64353620:06
*** harlowja has joined #openstack-infra20:07
mordredand I seem to be foot-gunning with 3.0 ... doh.20:07
paladoxohhh, was it trying to use the master branch?20:07
*** Vadmacs has quit IRC20:07
paladoxhttps://github.com/GerritCodeReview/plugins_download-commands/commit/891455076417dd097fdfd63f4afc0d28a3e85aff <-- was the change that caused that20:08
*** Vadmacs has joined #openstack-infra20:08
paladoxhttps://github.com/GerritCodeReview/plugins_download-commands/branches doesn't appear to have a stable-2.16 branch20:08
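Easy to confirm from git itself rather than the github UI:

    # list the plugin's remote branches; stable-2.16 is indeed absent
    git ls-remote --heads https://github.com/GerritCodeReview/plugins_download-commands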
openstackgerritJames E. Blair proposed zuul/zuul master: Colorize log severity  https://review.opendev.org/67310320:09
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302020:09
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227320:09
*** igordc has joined #openstack-infra20:10
openstackgerritPaul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs  https://review.opendev.org/67301520:11
*** slaweq has joined #openstack-infra20:11
*** weifan has joined #openstack-infra20:11
mordredpaladox: hrm. good catch.20:11
*** sgw has joined #openstack-infra20:12
*** weifan has quit IRC20:16
*** slaweq has quit IRC20:16
openstackgerritJames E. Blair proposed zuul/zuul master: Add raw links to log manifest  https://review.opendev.org/67310420:25
openstackgerritJames E. Blair proposed zuul/zuul master: Rename view to logfile  https://review.opendev.org/67310520:25
*** wolke has quit IRC20:26
*** wolke has joined #openstack-infra20:27
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227320:27
mordredpaladox: thanks! I think that version might just work (instead of overlaying the zuul cloned repo directly, it uses submodule update --init to do it - since we don't have an origin remote but we DO have things cloned in the right relative path, it should clone from the already cloned repo and not across the network)20:28
mordredcorvus: ^^ weird but nice side-effect for gerrit submodule plugin repos and zuul20:28
paladox:)20:29
mordredcorvus: I *think* we might want to update the playbook to do that for all of the plugin repos, not just download-commands20:29
mordredso that we get the ref that the gerrit repo is expecting. now - that obviously breaks depends-on - but we can solve that when we have a need to do a depends-on with a plugin ref20:29
*** wolke has quit IRC20:29
*** wolke has joined #openstack-infra20:30
corvusmordred: wait we use required-projects for the plugins20:32
mordredyes. that's why the submodule update --init will work20:32
mordredsince there's no origin remote, it'll actually resolve the relative path20:32
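A minimal sketch of that behavior, assuming a zuul-prepared gerrit checkout with no remotes configured (paths are illustrative):

    # gerrit's .gitmodules uses relative submodule URLs such as
    # ../plugins/download-commands; with no default remote, git resolves
    # them against the superproject's own directory, so this clones from
    # the sibling checkout on disk instead of over the network
    cd ~/src/gerrit.googlesource.com/gerrit   # no 'origin' remote here
    git submodule update --init plugins/download-commands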
*** wolke has quit IRC20:33
*** wolke has joined #openstack-infra20:33
corvusright, yes.  i think i misunderstood what you were saying before :)20:33
mordredcorvus: I probably misunderstood what I'm saying - we're talking about submodules after all :)20:33
corvushaha20:34
corvusand yeah the shape of that change looks good to me20:34
mordredcorvus: here's hoping this build works!20:34
mordredif it does, I might try making all of the plugin repos use this mechanism instead of the copy20:34
mordredin fact - why don't I do that as a followup...20:35
fungigitea02 is in rotation now and gitea03 is removed, working on replacement20:35
corvusmordred: oh wait20:35
*** wolke has quit IRC20:35
paladoxthat reminded me to pull in 2.15.15 :P, where i also found the build now fails :(20:35
corvusmordred: okay, so download-commands is the issue -- why don't we just specify the right branch for that?20:35
paladoxcorvus it doesn't have any 2.16+ branches20:35
paladoxapparently20:35
paladoxSee https://github.com/GerritCodeReview/plugins_download-commands/branches20:36
*** wolke has joined #openstack-infra20:36
corvusmordred: ok; the downside of that is that depends-on won't work20:37
corvusbuilding at all > supporting depends-on > not supporting depends-on20:38
mordredyes20:38
corvusso i agree this is the best we can do with download-commands :)20:38
corvusbut maybe it's not what we want for the others20:38
mordredyeah - I agree20:38
mordredalso - doing it for the others makes the playbook more, not less, complex20:39
mordredbecause doing submodule update --init is only useful for "builtin" plugins - and our "standard" set is a mix of both20:39
mordredfor non-standard, moving the repo in is always the right choice20:40
mordredcorvus: can override checkout take a sha?20:40
corvusmordred: i think it has to be a ref20:42
mordreddarn20:42
corvusbut can be a branch or tag20:42
*** weifan has joined #openstack-infra20:42
mordredI was thinking that if it could, we could just override-checkout the sha that 2.16 wants, and then depends-on is only borked for 2.1620:43
openstackgerritMonty Taylor proposed opendev/system-config master: Override-checkout download-commands to v2.16.10  https://review.opendev.org/67310720:44
mordredcorvus: woot. there's a tag20:44
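Worth a quick sanity check that the tag is really there before leaning on it in override-checkout:

    # confirm the plugin repo carries a v2.16.10 tag
    git ls-remote --tags https://github.com/GerritCodeReview/plugins_download-commands v2.16.10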
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302020:47
*** wolke has quit IRC20:48
*** goldyfruit has quit IRC20:48
*** wolke has joined #openstack-infra20:49
*** wolke has joined #openstack-infra20:51
*** wolke has quit IRC20:52
*** priteau has quit IRC20:53
jrosserthis has merged https://review.opendev.org/#/c/672952/, but i don't see it here https://opendev.org/openstack/ansible-config_template/commits/branch/master?lang=en-US#20:54
jrosseris something broken?20:54
fungithe commit itself seems to have replicated: https://opendev.org/openstack/ansible-config_template/commit/b7f38639a21857aead860195d12eccf6eb9f437e20:56
jrosseri just rechecked a ton of jobs that need that and they've all failed20:57
jrossersuggests that master doesn't point to quite the right place20:58
corvusjrosser: have a link to a job that failed?20:59
fungigerrit's on-disk copy of the repository indicates b7f38639a21857aead860195d12eccf6eb9f437e is the tip of master, so in theory ci jobs should be using that20:59
jrosserhttp://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz21:00
fungibut i do wonder what's happened to replication to gitea21:00
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302021:02
*** goldyfruit has joined #openstack-infra21:05
clarkbcmorpheus: https://review.opendev.org/673108 I probably went a little crazy21:05
fungithat master branch state doesn't seem to have replicated to any of the active gitea backends21:06
corvusevrardjp, jrosser: is anyone working on updating that job to use zuul git repo checkouts?  because we really shouldn't be cloning from opendev.org in jobs21:06
cmorpheusclarkb: omg21:06
fungii'm going to try to manually trigger full replication for openstack/ansible-config_template and see what happens21:06
cmorpheusclarkb: you replaced 80 lines of shell with 300 lines of python21:07
clarkbcmorpheus: I know. I just want real data to know if there are gains to be had here. If not I'll give up21:07
clarkbcmorpheus: this change should be large enough and caching enough id data to see a delta though21:08
jrossercorvus: well in theory it should be using them https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L35-L6121:08
jrosserbut of course that could be broken21:08
mordredhttp://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_33_116422 seems to be running, but then http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_46_107736 is still doing the clone21:10
corvusjrosser: oh that looks promising... /me digs into that21:10
mordredso I'm thinking maybe the filtering in https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L74-L78 is not doing the right thing?21:10
mnaseroh look osa things21:10
mordred?21:10
mnaseri wonder if we're missing required-projects21:11
fungiforcing full replication doesn't seem to have updated openstack/ansible-config_template master branch state21:11
jrosserthis looks suspect http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_43_64614821:11
mnaserhttps://review.opendev.org/#/c/670473/ is an os_ceilometer change21:11
mnaserand well it checked out os_ceilometer21:11
corvusmnaser, jrosser: yes, i think that's it21:11
mnaserso because we don't have all the required-projects listed21:12
corvusso it is working, it's just there are no required projects, so it's only activated for dependencies21:12
fungifor some reason the master branch state is correct in the github mirror but not on the gitea backends21:12
mnaserwe're not hard failing like things failed when we moved to zuulv321:12
fungichecking gerrit logs next21:12
mnasercause we don't hard fail if the stuff is missing21:12
mnaser(we should probably hard fail in ci if a required-project is missing)21:12
corvusand if we added more there, then the job would benefit from the cache and be faster (and also be immune to mirror hiccups)21:12
mnaserand skip when not in ci21:12
mnaserlet me hack up something21:12
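The check being hacked up is roughly this shape (a hypothetical sketch, not the actual get-ansible-role-requirements logic):

    # prefer the repo zuul prepared via required-projects; otherwise
    # fall back to cloning the opendev.org mirror
    src=/home/zuul/src/opendev.org/openstack/ansible-config_template
    if [ -d "$src" ]; then
        cp -a "$src" /etc/ansible/roles/config_template
    else
        git clone https://opendev.org/openstack/ansible-config_template \
            /etc/ansible/roles/config_template
    fi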
*** rfarr has joined #openstack-infra21:13
jrossermnaser: thanks :)21:13
corvusmnaser, jrosser, fungi: cool, so to summarize -- the job is set up to use zuul repos but simply doesn't have enough required projects listed so we fell back on the mirror, and the mirror is slightly out of date for as-yet-unknown reason.21:13
corvusevrardjp: ^ you can ignore ping from earlier :)21:14
fungicorvus: agreed. still digging into the outdated state of the master branch ref for that repo in gitea21:15
fungino mentions of that repo in the gerrit error log aside from some timeouts while stackalytics-bot-2 was trying to query it21:15
clarkbfungi: if you rereplicate that repo does it update?21:15
clarkbcan limit it to a single gitea to prevent polluting the debuggable state21:16
fungithat did nothing as far as i could tell (saw it enqueue the replication events, waited until they were done, still no change)21:16
fungiall active gitea backends seem to be in a similar state with that repo too21:16
clarkbdid we have fs/disk problems again?21:19
clarkbor maybe this is a hold over?21:19
mnaserjrosser, corvus, fungi: https://review.opendev.org/673109 should be a failing CI job (because not everything is in required projects).  if that fails as expected, i'll readd them to required-projects21:21
*** beekneemech is now known as bnemec-pto21:22
*** eharney has quit IRC21:23
fungiclarkb: what's strange is that gitea02 was created at 14:43z today, its database was copied from gitea01's most recent nightly mysqldump, and then content was replicated in. the openstack/ansible-config_template master branch state updated at 19:38z today, long after we had finished replicating all repository states into gitea0221:25
clarkboh interesting21:25
fungiso it's hard to blame this on old state, unless we blame the copied database dump?21:25
*** raissa has quit IRC21:26
*** Lucas_Gray has joined #openstack-infra21:29
*** rfolco|rover has quit IRC21:30
openstackgerritJeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test  https://review.opendev.org/67302021:30
fungion gitea02, /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is still 09c76e238026d7ba4134ee2b66a4e9fd2617b84321:33
fungiwhich coincides with what the webui shows21:33
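The divergence, side by side (gitea's on-disk ref versus what gerrit serves anonymously for the same branch):

    cat /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master
    # -> 09c76e238026d7ba4134ee2b66a4e9fd2617b843
    git ls-remote https://review.opendev.org/openstack/ansible-config_template master
    # -> b7f38639a21857aead860195d12eccf6eb9f437e  refs/heads/master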
clarkbdoes a fsck report anything?21:34
*** wolke has joined #openstack-infra21:34
*** ekultails has quit IRC21:34
fungii do realize i forgot to do a git gc on gitea02, though that shouldn't affect this21:34
clarkbno, should only affect performance (load avg may be higher on that host than the others)21:35
fungigit fsck reports no problems for that repo21:36
*** wolke has left #openstack-infra21:37
*** Vadmacs has quit IRC21:38
fungialso, it replicated the head change to github, but not to gitea21:38
fungior rather, the replication to github is successfully reflected while on gitea it is not21:39
clarkbmaybe fsck of the content in gerrit/github will reveal something that might make gitea unhappy?21:40
clarkbwe have had that with github before where replication didn't work due to problems in the repo that gerrit was ok with21:40
fungijust a few dangling commits: http://paste.openstack.org/show/754915/21:41
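Dangling commits are just unreferenced objects waiting for gc, not corruption, so this is the expected healthy result; for reference the check is simply:

    # run inside the bare repo; only 'missing' or 'broken' entries
    # would indicate corruption that could upset replication
    git fsck --full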
clarkbfungi: if you docker ps -a you can check the logs for the gitea ssh container using docker logs --since/--until $containerid type stuff21:42
clarkbmaybe do that for the time around when you triggered replication of that repo and see if anything shows up as a problem?21:42
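For reference, the time-window flags on docker logs are --since and --until; something along these lines for the window around the manual replication:

    docker ps -a     # find the gitea openssh container id
    docker logs --since 2019-07-26T21:06:00 --until 2019-07-26T21:12:00 <container-id>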
fungithe opendevorg/gitea-openssh has no docker logs at all21:49
fungithe opendevorg/gitea:latest docker logs don't seem to have any sort of failure/error messages related to openstack/ansible-config_template (just entries which look like clients fetching/cloning from it)21:50
clarkbin that case you may have to docker exec -it $opensshcontainer bash21:52
clarkbthen poke around and look for logs21:52
clarkbthat gives you a bash process running in the context of that container21:53
*** mattw4 has quit IRC21:53
*** jamesmcarthur has joined #openstack-infra21:55
fungiit looks like everything in that filesystem root is actually just mapped to files in /var/gitea21:56
fungiso easier to browse/view them without docker getting in the way21:57
clarkbincluding sshd logs?21:57
fungi(at least so far i've found no files via a docker shell which weren't present there outside the container)21:57
fungioh, the ssh container. i'll try that one21:58
fungilooks like they share the same filesystem tree21:59
paladoxmordred heh gerrit 2.15.15 broke some of our plugins due to a bazel change, so i've had to spend time pulling in the plugin update && also adapting one of our other plugins to the changes.22:00
clarkbfungi: it is possible that sshd is only logging to syslog and we have to mount in /dev/log for that container to properly log (we did this with haproxy)22:01
clarkbcorvus: mordred ^ do you know?22:01
paladoxbut that also means that the gerrit docker image we have will have to stay broken with 2.15.15 until we merge && deploy a new image.22:01
fungiclarkb: yeah, i don't see any syslog in /var/log of the ssh container at least22:03
*** whoami-rajat has quit IRC22:06
fungikinda tempted to press forward with the gitea03 replacement (ready to write the change to update dns records and add it back to the inventory) to see if it gets that head replicated22:06
clarkbfungi: that seems like a reasonable test, worst case 03 will be in the same situation as the others22:07
clarkbit would be cool if pip didn't tell you every package it was ignoring because your python version didn't match env markers22:10
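The closest knob appears to be pip's verbosity, since those notices are informational:

    # -q raises the log threshold past INFO, which drops the per-package
    # "Ignoring X: markers ... don't match" lines along with other chatter
    pip install -q -r requirements.txt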
*** kjackal has quit IRC22:10
openstackgerritJeremy Stanley proposed opendev/system-config master: Add gitea03 replacement to inventory  https://review.opendev.org/67311322:10
clarkbthat IP tells me it is a gitea03 + 222:11
*** slaweq has joined #openstack-infra22:11
openstackgerritJeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea03  https://review.opendev.org/67311422:11
*** xek__ has quit IRC22:12
*** raissa has joined #openstack-infra22:14
*** raissa has quit IRC22:14
*** raissa has joined #openstack-infra22:15
*** jamesmcarthur has quit IRC22:15
*** raissa has quit IRC22:15
*** slaweq has quit IRC22:16
*** raissa has joined #openstack-infra22:16
*** raissa has joined #openstack-infra22:16
*** raissa has quit IRC22:17
fungiinfra-root: i'm at a loss for why /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is out of date on all our gitea servers (still referring to 09c76e238026d7ba4134ee2b66a4e9fd2617b843 when it should be b7f38639a21857aead860195d12eccf6eb9f437e like /home/review_site/git/openstack/ansible-config_template.git/refs/heads/master has)22:17
fungi(...has on our gerrit server)22:17
clarkbfungi: you did say the ref itself is present but the master ref isn't updated?22:17
fungiyep22:17
corvusby ref you mean commit?22:18
clarkber yes22:18
fungiref is already there, would have been replicated when the review patchset was pushed22:18
fungicommit, yes22:18
corvusand yeah, it's not a merge commit, so it's the same as a refs/changes ref22:18
clarkbas a sanity check, there's plenty of disk22:18
clarkbat least on gitea0122:18
corvusi don't know where the ssh logs end up22:18
corvuswherever ssh puts them by default at LogLevel INFO22:19
fungiyeah, i had no luck finding them either22:19
clarkbI think sshd probably logs to syslog and we don't have a syslog22:19
corvusthere's no other special logging22:19
fungii'm in the process of building a new gitea server (since i was doing that anyway) to see what state ends up replicated to it22:19
clarkbwe could mount /dev/log as with haproxy, that might get a little confusing as there is the host's sshd and the container sshd but probably not to the point where we don't want to do that22:20
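In docker-run terms that would look like the following (the real deployment is docker-compose, where this becomes a volumes entry):

    # bind-mount the host's syslog socket so sshd's syslog output lands
    # in the host's logs, the same trick used for haproxy
    docker run -v /dev/log:/dev/log [other options] opendevorg/gitea-openssh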
fungifrom a timeline perspective, that change merged after gitea02 was built, replicated to and brought into service yet it has the same stale state as its sibling gitea servers22:21
corvuswhat was the state of gitea03 at the time22:21
corvus[2019-07-26 22:19:55,500] [f900b6cd] Cannot replicate to ssh://git@gitea03.opendev.org:222/openstack/ansible-config_template.git22:21
fungioffline22:21
fungii had already deleted the nova server instance for it22:21
openstackgerritMerged opendev/zone-opendev.org master: Update IP address for gitea03  https://review.opendev.org/67311422:22
fungithere are now changes up to add dns and inventory entries for the new 0322:22
fungidns just now merged, from the looks of it22:22
corvuswhen was the original replication time?22:22
clarkbhttps://review.opendev.org/#/c/672952/ is the change; looks like 19:38 UTC ish22:23
fungithere were two because i manually initiated a replication of that repo in troubleshooting this22:23
clarkbfor when it merged which should be near the replication time22:23
fungibut yeah, that would be the earlier one22:23
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153022:24
fungisecond was sometime between 21:06 and 21:11 since the various tasks were queued in gerrit for a few minutes22:24
corvusin a minute, i'm going to manually trigger replication while running strace on sshd on gitea0222:26
corvusjust as soon as this openstack/openstack replication finishes22:26
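Roughly this, as a sketch (the container's sshd pid can be found with docker top first):

    # follow forks and capture write() calls; strace's default 32-byte
    # string limit is why the error text seen later appears truncated
    strace -f -e trace=write -p <sshd-pid>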
*** michael-beaver has quit IRC22:27
fungisounds like a reasonable test22:31
corvusthe replication commands are not appearing in gerrit's log22:33
corvusif i run this:  ssh review replication start zuul/zuul-operator --url gitea02.opendev.org22:33
corvusi see this: [2019-07-26 22:32:49,250] [] scheduling replication zuul/zuul-operator:..all.. => ssh://git@gitea02.opendev.org:222/zuul/zuul-operator.git22:33
corvusif i run this: ssh review replication start openstack/ansible-config_templates --url gitea02.opendev.org22:34
corvusi see nothing in replication.log22:34
corvusdid i spell the project right?22:34
corvusno i did not, it's singular.22:34
corvusi'll try again22:34
clarkb openstack/ansible-config_template <- straight copy paste from gerrit webui22:35
clarkbno s22:35
clarkboh you figured it out before me22:35
corvusit's "running" now22:35
fungiyep, sorry, had my nose in the gitea database schema seeing if it could somehow be double-tracking head states in there incorrectly22:35
fungi(found no obvious place it might track repository heads in the db, fwiw)22:36
*** mattw4 has joined #openstack-infra22:38
*** gyee has quit IRC22:39
corvusi think the logs are going to stderr22:41
corvusi don't know why they are not showing up in "docker logs"22:41
corvus-rw-r--r--  1 1000 1000   73 Jul 26 19:38 .gitconfig22:43
corvus-rw-r--r--  1 1000 1000    0 Jul 26 19:43 .gitconfig.lock22:43
corvuswell that's a coincidence, huh?22:43
corvushave we confirmed any updates since then?22:44
fungithose times do look suspicious22:44
fungiis that inside that particular repo?22:44
corvusno, that's the homedir22:45
corvuson gitea0822:45
corvussame times on all servers22:45
*** jamesmcarthur has joined #openstack-infra22:45
corvus/var/gitea/data/git22:45
clarkbfwiw that .gitconfig.lock is the same file we had to delete after restarting the giteas yesterday before replication would work22:46
clarkb(was assumed to be fallout of the ceph disaster earlier in the day)22:46
clarkbmaybe an OOM left it behind and the 8GB swapfile isn't big enough?22:46
fungithat's well after gitea02 was brought back online, so there was nothing manual going on with gitea servers22:46
corvusi don't see any currently running processes started around that time22:47
clarkb[Fri Jul 26 19:39:45 2019] INFO: task khugepaged:65 blocked for more than 120 seconds.22:47
clarkbfrom dmesg -T22:48
clarkbnot an OOM but unhappy disk?22:48
clarkb[Fri Jul 26 19:39:19 2019] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating.22:48
* clarkb looks on other giteas22:48
fungiit does indeed look like other repos aren't updating either22:48
fungihttps://opendev.org/opendev/zone-opendev.org/commits/branch/master is missing 673114 which merged at 22:2222:49
clarkb[Fri Jul 26 19:38:44 2019] INFO: task jbd2/vda1-8:248 blocked for more than 120 seconds. <- from gitea0122:49
corvus[pid 26929] write(2, "error: could not lock config fil"..., 68) = 6822:49
clarkbseems like something happened in the cloud around that time and made the hosts unhappy; a git process or gitea may not have cleaned up after itself as a result?22:49
corvus[pid 26929] exit_group(-1 <unfinished ...>22:49
corvus(that's from that strace)22:50
clarkbso maybe we remove those lock files again, rereplicate (again) and bring this up with mnaser?22:50
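For the record, the mitigation amounts to something like this per backend ('review' in the commands above is an ssh alias; the expanded form below assumes gerrit's standard ssh port):

    # clear the stale lock left behind by the interrupted git config write
    rm /var/gitea/data/git/.gitconfig.lock
    # then re-trigger full replication to that backend from gerrit
    ssh -p 29418 review.opendev.org replication start --all --url gitea02.opendev.org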
fungigitea02 is showing signs of write errors as well from 19:39 today22:50
clarkbfungi: I expect all of them will exhibit that but 0322:50
fungigitea02 didn't even get nova booted until 14:43 today, so this is definitely not lingering issues from instances which went through the problems on wednesday22:51
*** goldyfruit has quit IRC22:52
fungilooks like a wholly fresh incident similar to the early wednesday one22:52
clarkbyup22:52
fungior was that yesterday? i've lost track already22:52
corvusclarkb: i agree with the mitigation plan22:53
corvusi can easily remove the locks if ya'll are ready22:53
fungiprevious incident was early 2019-07-25 so i guess that was yesterday (thursday)22:53
*** weifan has quit IRC22:53
fungicorvus: sounds good22:53
corvusclarkb: ?22:54
clarkbcorvus: am ready22:54
clarkband plan sounds good22:54
corvusdone22:54
clarkbfungi: should we trigger replication ~3 at a time like yesterday?22:55
fungii'm on hand to babysit the replication... no exciting friday night plans22:55
*** gyee has joined #openstack-infra22:55
clarkbI'm around for the next hour at least22:55
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153022:55
clarkbfungi: just let me know how I can help22:55
fungiwe could try x4 and see what the slowdown is22:55
clarkbfungi: ++22:55
corvusi'll trigger the single rep on gitea02 now22:56
fungithanks corvus, a good canary22:57
fungii have a feeling we could fire all 8 together and they'd still complete in a reasonable amount of time, if the slowdown was really related to other use of the server22:57
*** aaronsheffield has quit IRC22:57
corvus02 is good now on a-c_t22:58
fungithough it was rather uncanny that last night the replication time scaled roughly linearly with the number of repos we replicated at once22:58
fungiimplying "parallel" replication still has some inherent serialization imposed somewhere22:58
corvusi have to run, so i'll leave the repl kick-off to you22:59
fungithanks again corvus!!!22:59
*** weifan has joined #openstack-infra22:59
fungii'll fire up full replication for 05-08 first22:59
clarkbsounds good22:59
corvus(it wouldn't be sysadmin day if we didn't get to do some sysadminning, huh?)22:59
fungiall queued23:00
fungiindeed, indeed23:00
fungihappy sysadmin day to all23:00
*** jamesmcarthur has quit IRC23:00
fungii'm polling the queue length with some timestamping again so i'll know what time it reaches ~023:03
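That polling loop is easy to sketch, assuming the replication tasks show up in gerrit's task queue and name their target:

    # timestamped count of queued pushes to the gitea backends, once a minute
    while sleep 60; do
        printf '%s %s\n' "$(date -u +%FT%TZ)" \
            "$(ssh -p 29418 review.opendev.org gerrit show-queue | grep -c gitea)"
    done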
*** weifan has quit IRC23:03
fungiguessing it'll be around 23:55 if the trending from last night remains consistent23:03
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153023:05
*** mriedem has quit IRC23:08
*** pkopec has quit IRC23:09
*** jamesmcarthur has joined #openstack-infra23:10
*** slaweq has joined #openstack-infra23:11
*** slaweq has quit IRC23:15
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153023:22
clarkbI think my devstack-into-python change is working now and takes keystone account setup from ~60 seconds to 7 seconds?23:27
clarkbthere are a bunch of other keystone user and endpoint type creation things that were taking ~7 minutes before, which we can probably get down to ~30 seconds if rewritten in python23:27
*** diablo_rojo has joined #openstack-infra23:28
*** tosky has quit IRC23:31
*** jamesmcarthur has quit IRC23:42
fungireplication seems to be taking far longer than my earlier estimate23:42
fungiso either things are slower now than this time last night, or 4x is past the tipping point for contention23:43
*** mattw4 has quit IRC23:45
fungitaking nearly twice as long as projected23:50
clarkbhuh23:51
fungiyeah, i expected it to wrap up around nowish, but 2788 tasks23:54
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153023:55
