Thursday, 2019-08-01

*** armax has quit IRC00:00
*** aaronsheffield has quit IRC00:01
*** jrist has quit IRC00:01
donnydSo I am 100% confident my bottleneck is no longer storage00:02
donnydJust fired up a vm on a loaded server, and here are the results00:03
donnydhttps://www.irccloud.com/pastebin/9OMKNFQH/00:03
*** jrist has joined #openstack-infra00:03
*** sgw has joined #openstack-infra00:04
*** jrist has quit IRC00:04
donnydI am not sure what other clouds are doing, but  I would be curious to see what one would get in this instance type. Maybe I am still short00:05
*** betherly has joined #openstack-infra00:06
*** lseki has quit IRC00:09
donnydSo I am not sure why tempest runs keep timing out on FN... It appears to me like things are running as they should from the infra perspective00:09
*** jrist has joined #openstack-infra00:10
*** betherly has quit IRC00:11
clarkbdonnyd: I think it is possible that the swapping is just really hurting it00:12
donnydHere are stats from my edge router showing we aren't short there... The only thing left it could be is my CPUs are too old and slow.. or too overcommitted https://usercontent.irccloud-cdn.com/file/WoVNW88l/Screenshot%20from%202019-07-31%2020-10-29.png00:12
*** michael-beaver has quit IRC00:13
*** jrist has quit IRC00:15
*** jrist has joined #openstack-infra00:17
*** ekultails has quit IRC00:18
donnydclarkb: I will see if turning down the cpu ratios helps at all.. but I can't do much more to make this thing go any faster00:20
donnydMaybe the context switching on the hypervisor is slowing things to a crawl00:20
*** sgw has quit IRC00:20
donnydor something like that00:20
clarkbdonnyd: ok, I think part of the problem is definitely the software being slow too (for example that api stuff we were discussing can be much quicker)00:21
donnydyea, that would surely help00:21
donnydI have a tv calling me, but I do appreciate everyone ( clarkb fungi mordred ianw ) for helping me to get FN back online00:23
donnydHave a great night00:23
fungiwe appreciate your help too, thanks donnyd!00:23
clarkboh right that is happening again tonight00:24
clarkbdonnyd: thank you! and I should go find the tv too00:24
*** armax has joined #openstack-infra00:44
*** gregoryo has joined #openstack-infra00:47
*** efried has quit IRC00:51
*** efried has joined #openstack-infra00:59
*** ricolin has joined #openstack-infra01:04
*** betherly has joined #openstack-infra01:08
*** betherly has quit IRC01:13
*** eernst_ has quit IRC01:19
*** igordc has quit IRC01:31
*** bobh has joined #openstack-infra01:34
*** happyhemant has quit IRC01:35
*** armax has quit IRC01:39
*** betherly has joined #openstack-infra01:40
*** dychen has joined #openstack-infra01:41
*** dchen has quit IRC01:44
*** betherly has quit IRC01:44
*** bhavikdbavishi has joined #openstack-infra01:48
*** eernst has joined #openstack-infra01:50
*** bhavikdbavishi1 has joined #openstack-infra01:51
*** bhavikdbavishi has quit IRC01:52
*** bhavikdbavishi1 is now known as bhavikdbavishi01:52
*** eernst has quit IRC01:58
*** gyee has quit IRC02:00
*** betherly has joined #openstack-infra02:01
*** betherly has quit IRC02:05
*** slaweq has joined #openstack-infra02:11
*** slaweq has quit IRC02:15
*** bobh has quit IRC02:19
*** betherly has joined #openstack-infra02:25
*** bobh has joined #openstack-infra02:26
*** dingyichen has joined #openstack-infra02:26
*** betherly has quit IRC02:29
*** dychen has quit IRC02:30
*** ykarel|away has joined #openstack-infra02:30
*** ykarel|away has quit IRC02:35
*** bhavikdbavishi has quit IRC02:35
*** ramishra has joined #openstack-infra02:41
*** bobh has quit IRC02:56
*** kjackal has joined #openstack-infra03:04
*** eernst has joined #openstack-infra03:05
*** rlandy|bbl has quit IRC03:10
*** kjackal has quit IRC03:17
*** notmyname has quit IRC03:20
*** notmyname has joined #openstack-infra03:20
*** ykarel|away has joined #openstack-infra03:26
*** diablo_rojo has quit IRC03:27
*** bhavikdbavishi has joined #openstack-infra03:29
*** bobh has joined #openstack-infra03:33
*** bobh has quit IRC03:38
*** whoami-rajat has joined #openstack-infra03:41
*** ykarel|away is now known as ykarel03:43
*** hongbin has joined #openstack-infra03:45
*** hongbin has quit IRC03:46
*** udesale has joined #openstack-infra03:46
*** psachin has joined #openstack-infra03:55
*** slaweq has joined #openstack-infra04:11
*** ramishra has quit IRC04:17
*** slaweq has quit IRC04:17
ianwhrm not getting a console stream on http://zuul.openstack.org/stream/ec45ac1d208343b58dee520053e8caee?logfile=console.log04:29
*** ramishra has joined #openstack-infra04:30
ianwwent to ze02, job-output.txt is flowing through ok but not console04:32
*** n-saito has joined #openstack-infra04:45
*** eernst has quit IRC04:55
*** raukadah is now known as chandankumar04:59
*** goldyfruit has joined #openstack-infra04:59
*** ykarel is now known as ykarel|afk05:00
*** gfidente has joined #openstack-infra05:02
*** goldyfruit has quit IRC05:04
*** ykarel|afk has quit IRC05:04
ianwinfra-root: so it looks like the log daemon on ze01 & ze02 has disappeared; nothing listening on port 7900.  i've looked through the logs for anything with "^Traceback" but i can't see anything obvious as to why it would stop05:06
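The "nothing listening on port 7900" check above is easy to reproduce from a script. A minimal sketch (the port number comes from the log; the helper name and host are ours, not anything from the executors' code):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. the executors' log streaming daemon serves on 7900:
# is_listening("ze01.openstack.org", 7900)
```

The same check with `ss -ltn 'sport = :7900'` or `nc -z` on the host itself is equivalent; the point is just that the daemon had died without leaving a traceback.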
*** n-saito has quit IRC05:09
*** diablo_rojo has joined #openstack-infra05:21
*** ociuhandu has joined #openstack-infra05:22
*** dingyichen has quit IRC05:23
*** pkopec has joined #openstack-infra05:23
*** dchen has joined #openstack-infra05:23
*** diablo_rojo has quit IRC05:25
*** dpawlik has joined #openstack-infra05:26
*** jamesmcarthur has quit IRC05:26
*** dansmith has quit IRC05:26
*** ociuhandu has quit IRC05:27
*** dansmith has joined #openstack-infra05:28
*** kopecmartin|off is now known as kopecmartin05:46
*** jaosorior has quit IRC05:46
*** ykarel|afk has joined #openstack-infra05:51
*** ccamacho has quit IRC05:54
*** ykarel|afk is now known as ykarel05:58
*** slaweq has joined #openstack-infra06:04
ianw#status log restarted ze02 to get log streaming working06:07
openstackstatusianw: finished logging06:07
ianwinfra-root: ^ see my notes in #zuul -- ze01 is still not streaming but i'm not going to touch it right now.  i don't know if the  "Unable to find worker for job" is just part of the startup now, or there's something else going on06:07
ianwze02 appears to be processing jobs but i don't want to risk making anything worse06:08
*** slaweq has quit IRC06:09
*** slaweq has joined #openstack-infra06:11
*** takamatsu has joined #openstack-infra06:13
*** slaweq has quit IRC06:16
*** xek has joined #openstack-infra06:17
*** bobh has joined #openstack-infra06:19
*** jtomasek has joined #openstack-infra06:21
*** janki has joined #openstack-infra06:22
*** bobh has quit IRC06:23
*** pgaxatte has joined #openstack-infra06:26
*** ricolin_ has joined #openstack-infra06:26
*** xek has quit IRC06:27
*** iurygregory has joined #openstack-infra06:29
*** ricolin has quit IRC06:29
*** jlufr has joined #openstack-infra06:40
*** odicha has joined #openstack-infra06:45
openstackgerritLuigi Toscano proposed zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files  https://review.opendev.org/67388506:56
*** e0ne has joined #openstack-infra06:57
*** e0ne has quit IRC06:59
*** slaweq has joined #openstack-infra07:04
*** ginopc has joined #openstack-infra07:12
*** pcaruana has quit IRC07:12
*** rpittau|afk is now known as rpittau07:13
*** ccamacho has joined #openstack-infra07:18
*** tesseract has joined #openstack-infra07:20
*** fdegir has joined #openstack-infra07:24
*** jtomasek has quit IRC07:29
*** jpena|off is now known as jpena07:34
*** Goneri has joined #openstack-infra07:35
*** ykarel is now known as ykarel|lunch07:37
*** AJaeger has quit IRC07:37
*** ralonsoh has joined #openstack-infra07:39
*** ociuhandu has joined #openstack-infra07:39
*** ralonsoh has quit IRC07:40
*** ralonsoh has joined #openstack-infra07:40
*** pcaruana has joined #openstack-infra07:42
*** AJaeger has joined #openstack-infra07:48
*** gregoryo has quit IRC07:51
*** jtomasek has joined #openstack-infra07:52
*** e0ne has joined #openstack-infra07:53
*** tosky has joined #openstack-infra07:54
*** ociuhandu has quit IRC07:56
*** electrofelix has joined #openstack-infra07:58
*** lpetrut has joined #openstack-infra08:03
*** lpetrut has quit IRC08:04
*** goldyfruit has joined #openstack-infra08:04
*** lpetrut has joined #openstack-infra08:04
*** dchen has quit IRC08:07
*** jtomasek has quit IRC08:09
*** goldyfruit has quit IRC08:09
*** happyhemant has joined #openstack-infra08:09
*** lucasagomes has joined #openstack-infra08:11
*** ykarel|lunch is now known as ykarel|away08:21
*** pgaxatte has quit IRC08:22
*** derekh has joined #openstack-infra08:25
*** ykarel|away has quit IRC08:27
*** tkajinam has quit IRC08:27
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455708:28
*** lpetrut has quit IRC08:29
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455708:31
*** pgaxatte has joined #openstack-infra08:52
*** priteau has joined #openstack-infra08:53
*** psachin has quit IRC09:00
*** bobh has joined #openstack-infra09:07
*** janki has quit IRC09:08
*** janki has joined #openstack-infra09:09
*** bobh has quit IRC09:12
*** jlufr has quit IRC09:13
openstackgerritSorin Sbarnea proposed opendev/base-jobs master: [POC] Execute linters using pre-commit tool  https://review.opendev.org/67396909:17
*** jchhatbar has joined #openstack-infra09:17
*** janki has quit IRC09:17
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Reduce yum-minimal based OS install size footprint  https://review.opendev.org/67232909:19
*** ginopc has quit IRC09:19
*** ginopc has joined #openstack-infra09:25
*** jchhatbar has quit IRC09:25
*** e0ne_ has joined #openstack-infra09:36
*** yamamoto has joined #openstack-infra09:36
*** e0ne has quit IRC09:37
*** roman_g has joined #openstack-infra09:38
*** priteau has quit IRC09:43
*** priteau has joined #openstack-infra09:43
openstackgerritThierry Carrez proposed openstack/project-config master: Fix ACL for compute-hyperv  https://review.opendev.org/67398809:47
*** priteau has quit IRC09:50
*** priteau has joined #openstack-infra09:51
*** goldyfruit has joined #openstack-infra09:55
*** bhavikdbavishi has quit IRC09:56
*** ociuhandu has joined #openstack-infra09:57
*** goldyfruit has quit IRC10:00
*** ociuhandu has quit IRC10:01
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: POC: Run linters via pre-commit  https://review.opendev.org/66769910:01
*** prometheanfire has quit IRC10:04
*** rpittau is now known as rpittau|bbl10:04
*** yamamoto has quit IRC10:07
*** sshnaidm|afk is now known as sshnaidm10:25
*** ricolin__ has joined #openstack-infra10:28
*** dtantsur|afk is now known as dtantsur10:29
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455710:29
*** yamamoto has joined #openstack-infra10:30
*** yamamoto has quit IRC10:30
*** ricolin_ has quit IRC10:31
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455710:33
*** georgk has quit IRC10:34
*** georgk has joined #openstack-infra10:35
*** georgk has quit IRC10:35
*** georgk has joined #openstack-infra10:37
openstackgerritSorin Sbarnea proposed opendev/system-config master: Recognize DISK_FULL failure messages (review_dev)  https://review.opendev.org/67389310:47
*** prometheanfire has joined #openstack-infra10:50
*** yamamoto has joined #openstack-infra10:51
fricklerianw: I've commented on the ethercalc regarding devstack OSC timings, if you remove the two results from clarkb's tests, the stddev goes way down and the improvement resulting from them gets much clearer IMHO10:54
*** yamamoto has quit IRC10:55
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455710:55
*** dpawlik has quit IRC10:58
*** rcernin has quit IRC10:58
*** dpawlik has joined #openstack-infra11:06
*** yamamoto has joined #openstack-infra11:10
*** goldyfruit has joined #openstack-infra11:14
*** ociuhandu has joined #openstack-infra11:20
*** ociuhandu has quit IRC11:21
*** goldyfruit has quit IRC11:21
*** panda is now known as panda|eat11:25
*** dpawlik has quit IRC11:27
*** e0ne has joined #openstack-infra11:29
*** e0ne_ has quit IRC11:30
*** bhavikdbavishi has joined #openstack-infra11:35
*** bhavikdbavishi has quit IRC11:37
*** bhavikdbavishi has joined #openstack-infra11:38
*** bobh has joined #openstack-infra11:45
*** priteau has quit IRC11:46
*** tdasilva has joined #openstack-infra11:47
*** udesale has quit IRC11:49
*** bobh has quit IRC11:50
*** udesale has joined #openstack-infra11:50
*** jamesmcarthur has joined #openstack-infra11:51
*** dpawlik has joined #openstack-infra11:57
*** pgaxatte has quit IRC11:58
*** ociuhandu has joined #openstack-infra12:01
*** dpawlik has quit IRC12:01
*** ociuhandu has quit IRC12:05
*** jamesmcarthur has quit IRC12:06
*** ociuhandu has joined #openstack-infra12:15
*** bobh has joined #openstack-infra12:16
openstackgerritTobias Henkel proposed zuul/zuul master: Optionally allow zoned executors to process unzoned jobs  https://review.opendev.org/67384012:19
*** Lucas_Gray has joined #openstack-infra12:22
*** ociuhandu has quit IRC12:25
*** ociuhandu has joined #openstack-infra12:27
*** dpawlik has joined #openstack-infra12:29
*** ociuhandu has quit IRC12:31
*** rlandy has joined #openstack-infra12:31
*** ociuhandu has joined #openstack-infra12:31
*** Lucas_Gray has quit IRC12:32
*** ekultails has joined #openstack-infra12:32
*** panda|eat is now known as panda12:34
*** jpena is now known as jpena|off12:34
*** Lucas_Gray has joined #openstack-infra12:35
*** bhavikdbavishi1 has joined #openstack-infra12:37
*** bhavikdbavishi has quit IRC12:37
*** bhavikdbavishi1 is now known as bhavikdbavishi12:37
*** Lucas_Gray has quit IRC12:42
*** jamesmcarthur has joined #openstack-infra12:44
*** aaronsheffield has joined #openstack-infra12:51
gmanninfra team,  patrole stable/stein was created by mistake, I have fixed that on release side (https://review.opendev.org/#/c/670942/) can you delete the branch now - https://opendev.org/openstack/patrole/src/branch/stable/stein12:53
mordredit's currently on 58ec6210c6a121743e6bc3217d1388962f0647c312:54
mordreddone12:54
*** rpittau|bbl is now known as rpittau12:57
*** jaosorior has joined #openstack-infra12:58
*** pgaxatte has joined #openstack-infra13:03
gmannmordred: thanks13:05
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455713:07
*** tosky has quit IRC13:10
*** tosky has joined #openstack-infra13:10
*** bobh has quit IRC13:12
donnydin the interest of getting to the bottom of job performance I am curious to know if increasing the available memory for each instance will in fact improve performance.13:15
*** jpena|off is now known as jpena13:16
donnydI think at this point the jobs that timeout on FN are fairly well established, so i propose increasing memory for each instance on FN from 8G to 16G to see if that helps jobs move any faster. With the understanding that we aren't planning to ask for more from the cloud providers, I just want to get to the bottom of getting the absolute most out of each job.13:18
donnydWith the proper data, I think we can do better at optimizing the jobs if we can definitively say that the jobs are using too much memory and find a way to make them better13:20
*** yamamoto has quit IRC13:21
*** ociuhandu has quit IRC13:26
*** ociuhandu has joined #openstack-infra13:27
*** mriedem has joined #openstack-infra13:27
*** jcoufal has joined #openstack-infra13:27
*** priteau has joined #openstack-infra13:27
*** ginopc has quit IRC13:29
*** ociuhandu has quit IRC13:31
*** lseki has joined #openstack-infra13:33
*** ginopc has joined #openstack-infra13:33
openstackgerritJan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455713:37
openstackgerritTobias Henkel proposed zuul/zuul master: Make tenant and pipeline optional in zuul-changes  https://review.opendev.org/67403413:38
*** ginopc has quit IRC13:40
*** yamamoto has joined #openstack-infra13:43
*** witek has joined #openstack-infra13:44
openstackgerritTobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw  https://review.opendev.org/66494913:45
openstackgerritTobias Henkel proposed zuul/zuul master: WIP: Route streams to different zones via finger gateway  https://review.opendev.org/66496513:45
openstackgerritTobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/66495013:45
*** eharney has joined #openstack-infra13:46
fricklerdonnyd: if you want to abandon https://review.opendev.org/673838 you should see an "abandon" button there where you can do that yourself13:47
*** ginopc has joined #openstack-infra13:48
fricklerdonnyd: we generally want to keep the memory per instance identical on all providers. otherwise jobs will start to fail for lack of memory based on provider selection, which would be bad13:48
donnydfrickler: well they already do that13:50
donnydThe understanding is that we know they fail, but is it from lack of memory and going into swap?13:50
donnydNot a permanent thing, just testing to see if the instances that are provided more memory stop failing, or build faster13:51
witekhello Infra Team13:52
fricklerdonnyd: hmm, maybe wait for feedback from other infra-root, for me I'd like to see such a test made with dedicated job runs, not with a random subset of our general job queue13:52
*** priteau has quit IRC13:53
witekwe see Monasca tempest jobs failing since Tuesday with strange errors when stacking DevStack13:54
witekI don't think we have changed anything in the configuration though13:54
fricklerwitek: is that with nova.conf missing? do you have a sample log pointer?13:54
witekfrickler: yes13:55
witekhttps://logs.opendev.org/16/674016/1/check/monasca-tempest-python3-influxdb/3b8a695/controller/logs/devstacklog.txt.gz#_2019-08-01_12_38_53_23113:55
fricklerwitek: yes, that's a known issue in tempest, sadly the revert keeps failing in gate, see https://review.opendev.org/67378413:56
witekthere are some other errors in other jobs as well13:56
donnydI surely agree that across the cloud providers we should be doing exactly the same thing, but there are a certain batch of jobs that fail on the regular. The other purpose in doing it more broadly is to see if all jobs are completed faster or it's only certain ones that can actually make use of the extra memory. If yes, then we can report back to the communities with complete data saying that we need to find a better13:56
donnydway to more efficiently use the memory we have13:56
openstackgerritJeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image  https://review.opendev.org/67279113:57
witekfrickler: thanks for info13:57
donnydBut I am thinking a temporary test could give us the data we need13:57
donnydI agree we should wait to see what everyone else thinks13:58
*** lbragstad has joined #openstack-infra13:59
donnydfrickler: thanks for reminding me to abandon my review, still learning how to gerrit and such14:00
*** priteau has joined #openstack-infra14:01
*** michael-beaver has joined #openstack-infra14:02
frickler#status log Force-Merged openstack/tempest master: Revert "Use memcached based cache in nova in all devstack-tempest jobs"  https://review.opendev.org/673784 as requested by gmann to unblock other projects14:04
openstackstatusfrickler: finished logging14:04
gmannfrickler: thanks14:04
*** ociuhandu has joined #openstack-infra14:07
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: Avoid openstacksdk image delete bug  https://review.opendev.org/67404314:14
mriedemclarkb: the devstack memcache + nova meta api patch https://review.opendev.org/#/c/674025/14:15
*** ociuhandu has quit IRC14:16
*** ociuhandu has joined #openstack-infra14:19
*** jhesketh has quit IRC14:23
*** ociuhandu has quit IRC14:23
openstackgerritTobias Henkel proposed zuul/zuul master: Add spec for enhanced regional executor distribution  https://review.opendev.org/66341314:34
jrollneat, pypi added upload-scoped API tokens, we should consider moving to that for uploads: https://pyfound.blogspot.com/2019/07/pypi-now-supports-uploading-via-api.html14:38
jrollit also means we'd be able to put 2FA on our pypi account(s) :)14:39
corvusjroll: that looks perfect14:39
corvussupports scoping to all packages maintained by an account14:40
corvusso we can still just have one for all of openstack14:40
jrollyep14:40
corvusmost of that work will be enhancements to the roles in zuul-jobs to support tokens.  anyone can do that.  once that's in place, an infra-root will need to go into our account and get a token.14:43
*** dpawlik has quit IRC14:43
mordred++14:43
*** ykarel has joined #openstack-infra14:43
corvuswe need to restart zuul to pick up the log_url patch; earlier the better on that, so i'll get started on that14:45
*** jaosorior has quit IRC14:46
fungicorvus: jroll: the topic came up in #zuul last week, and it was unclear what benefit that provided over using a dedicated account for performing uploads (which is basically what we do now)14:46
jrollfungi: can that account do anything besides upload?14:47
jrollif not, I guess I don't see any benefit either14:48
fungiwhat else important is there to do on pypi besides uploading?14:48
corvusfungi: i think it means if the creds were compromised, they couldn't be used to remove or replace old releases, or change ownership of a project14:48
fungiwell, pypi already doesn't allow replacing old releases14:49
corvusat least, i really hope that's the case, otherwise they shouldn't call them 'upload' tokens :)14:49
corvusfungi: if they allow deleting old releases (which i'm pretty sure we've learned to our chagrin they do) and they allow uploading, then they do, right?14:49
jrollright, uploads are probably the most interesting thing to do with a stolen credential anyway, so it isn't a large benefit IMO. but things like changing ownership are also bad14:49
fungithe feature seemed to be mostly so that for users who interact with the warehouse webui and so want to use 2fa because they're at risk for leaking their credentials, but also use the same account to upload files and need some means of doing that non-interactively14:50
fungicorvus: they allow you to delete old releases, but they don't allow you to reupload a previously-deleted release14:50
corvusfascinating14:51
fungithey indefinitely track the old release/file names14:51
fungieven after deletion14:51
fungi(to close the loophole you describe)14:51
corvusthen i agree this doesn't significantly change our security posture except in relation to ownership changes (though to really effect that, we'd have to revoke the ownership of our names by the patchwork of people who additionally own them now)14:51
fungiyeah, i mean i think from a zuul-jobs perspective supporting tokens is great, but i believe because it reuses the existing credential options for twine it would already work today14:52
corvus(zuul is restarting, btw)14:53
corvusit just uses the password field?14:53
corvusor, username + password14:54
fungiright, and a generic username of "token" (or something like that)14:54
corvusok, then i guess it's at most a docs change :)14:54
fungitoken goes in the password field, username is a generic one which indicates the warehouse auth api should do a token lookup14:54
fungiso right, if we wanted opendev's pypi uploads to use tokens, pretty sure it's mostly a matter of issuing one with our account and then updating the secret in project-config14:55
fungiso maybe still worth doing if we think the slight reduction in permissions is worthwhile14:56
*** priteau has quit IRC14:56
fungithe main reason the announcement is so broad is that pypi plans to make upload tokens mandatory for accounts which have enabled two-factor authentication (though no indication of when that's planned yet, and the implication is that non-2fa accounts can continue to upload with their normal account username+password for the foreseeable future)14:57
fungimaybe someday they'll also make upload tokens mandatory for non-2fa accounts, but i've not seen anyone suggest that14:58
corvusi'm re-enqueing now; the executors haven't finished stopping yet, which is interesting.  they're extra slow today.15:01
jrollcorvus: fungi: good info, thanks. it does seem like it isn't super useful, but a good item for someone to pick up if they're bored :)15:02
*** yamamoto has quit IRC15:02
clarkbcorvus: did you see ianws noted about ze01 and ze02 not streaming logs?15:02
corvusclarkb: yes15:03
clarkbmaybe related to slowness stopping?15:03
openstackgerritNatal Ngétal proposed openstack/reviewstats master: Load subproject data from governance  https://review.opendev.org/65302415:03
openstackgerritNatal Ngétal proposed openstack/reviewstats master: Raise hacking version and fix pep8 errors  https://review.opendev.org/65591115:03
corvusze02 stopped, and ze01 just did15:03
openstackgerritNatal Ngétal proposed openstack/reviewstats master: Switch to stestr  https://review.opendev.org/65550615:03
openstackgerritNatal Ngétal proposed openstack/reviewstats master: Drop pypy default tox env  https://review.opendev.org/65591215:03
*** yamamoto has joined #openstack-infra15:04
*** yamamoto has quit IRC15:04
*** yamamoto has joined #openstack-infra15:05
corvusneat, ze07 hit the oom killer15:05
corvusand the whole executor process died15:05
corvusi suspect that's what happened to 01 and 02, except that it only killed the streaming daemon15:06
corvusthe extra slowness is probably from the other executors being overloaded15:06
clarkbperhaps related to cmurphy's keystone console logs that killed my desktop15:06
clarkbif someone was trting to view them in the live streamed console?15:06
*** pkopec has quit IRC15:06
corvusall is back up now15:06
fungiugh, our broadband provider seems to have lost contact with the mainland at 15:00z precisely (i'm back on through a wireless modem for now)15:06
clarkbfungi: mine texted me several hours ago about an outage today too15:07
clarkbseems to be working so Im hoping it is all behind us15:07
*** ykarel is now known as ykarel|away15:08
*** yamamoto has quit IRC15:09
*** diga has joined #openstack-infra15:10
*** ociuhandu has joined #openstack-infra15:10
digaHello Everyone15:10
digaI want to create new project in storyboard and under Open Infra topic15:11
digawhat's the process for it ?15:11
*** yamamoto has joined #openstack-infra15:13
*** yamamoto has quit IRC15:13
*** yamamoto has joined #openstack-infra15:14
AJaegerdiga: what kind of new project? What's the purpose?15:14
*** ykarel|away has quit IRC15:17
*** ykarel|away has joined #openstack-infra15:17
*** ykarel|away has quit IRC15:18
*** yamamoto has quit IRC15:18
paladoxfungi you live on an island?15:19
*** ykarel|away has joined #openstack-infra15:19
*** ykarel|away has quit IRC15:20
*** ykarel|away has joined #openstack-infra15:20
fungipaladox: yup, but it's only ~15km offshore15:20
fungi(barrier island separating an inland brackish body of water from the atlantic)15:21
*** odicha has quit IRC15:21
donnydisle of mann fungi ?15:22
*** Goneri has quit IRC15:23
fungiheh, wrong side of the atlantic15:23
donnydlol15:23
fungibodie (pronounced "body") island, part of the northern stretch of the outer banks of north carolina15:23
paladoxlol15:24
donnydi was going to keep guessing :)15:24
donnydand now you have taken the fun out of my day15:26
* paladox will be near the isle of man in 2 weeks15:26
paladoxi won't be on the isle, but near it :)15:26
donnydNext year I am going to try to make the TT... greatest show on earth15:27
*** gyee has joined #openstack-infra15:27
paladoxTT?15:29
knikollamordred sir, just got word that app creds should be enabled now on the moc.15:30
mordredknikolla: woot!15:32
mordredknikolla: I can confirm that this works!15:33
* knikolla makes a happy dance. 15:33
*** jpena is now known as jpena|off15:36
*** ykarel|away has quit IRC15:36
*** e0ne has quit IRC15:37
openstackgerritMonty Taylor proposed opendev/system-config master: Add clouds.yaml entry for MOC control plane project  https://review.opendev.org/67146315:38
mordredinfra-root: that ^^ is ready for review15:38
donnydIts a motorcycle race @paladox15:38
paladoxah ok15:39
*** sthussey has joined #openstack-infra15:39
clarkbdonnyd: I share frickler's concern. Basically we want to avoid having code merge that depends on that increase in memory. If we want to set up a new label temporarily that FN serves to test if it makes a difference that would be fine15:39
clarkbdonnyd: the basic process there is updating that nl02.openstack.org.yaml file to have another ubuntu-bionic image + flavor + label combo. Then we can push a change to tempest that uses a different nodeset to run its jobs on that label15:40
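The image + flavor + label combo clarkb outlines would look roughly like this in the nodepool config. This is an illustrative sketch only: the label, pool, and flavor names below are invented, not the real nl02.openstack.org.yaml entries for FN:

```yaml
# Hypothetical fragment for nl02.openstack.org.yaml: a second bionic
# label backed by a bigger flavor, usable only by jobs that request it.
labels:
  - name: ubuntu-bionic-16g       # invented label name
    min-ready: 0

providers:
  - name: fortnebula              # provider name as registered in nodepool
    pools:
      - name: main
        labels:
          - name: ubuntu-bionic-16g
            flavor-name: zuul-16g # invented flavor with doubled RAM
            diskimage: ubuntu-bionic
```

A tempest change would then define a nodeset referencing `ubuntu-bionic-16g`, so only the experiment's jobs land on the larger flavor and nothing in the gate starts depending on the extra memory.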
*** tdasilva has quit IRC15:43
*** ccamacho has quit IRC15:43
donnydI am happy to do whatever infra thinks is a good path forward. Just so I understand what you are saying, if we bump the memory up: a job will succeed and code will be merged based on that. When the job runs again somewhere else it won't succeed the next time because it will get scheduled somewhere that does not have that setup. Trying to understand the why clarkb frickler15:44
clarkbmordred: is there just the one account? I notice the all_clouds file only adds one15:45
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Create bindep_virtualenv_python for bindep role  https://review.opendev.org/65843915:46
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add test-bindep job  https://review.opendev.org/67407815:46
*** jamesmcarthur has quit IRC15:46
*** ociuhandu has quit IRC15:47
openstackgerritLuigi Toscano proposed zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files  https://review.opendev.org/67388515:49
*** priteau has joined #openstack-infra15:50
*** whoami-rajat has quit IRC15:51
*** jtomasek has joined #openstack-infra15:51
*** dpawlik has joined #openstack-infra15:52
*** lucasagomes has quit IRC15:54
clarkbdonnyd: yup that is the concern. Not necessarily that it will happen but that it could happen15:54
mordredclarkb: yes - so far - I haven't requested the second project yet because I wanted to get the first one working with app creds first15:54
clarkbgotcha15:55
donnydok, now i see what clarkb and frickler are saying and it makes sense to me15:55
*** ginopc has quit IRC15:55
*** ginopc has joined #openstack-infra15:55
*** armax has joined #openstack-infra15:58
*** ginopc has quit IRC15:59
*** dpawlik has quit IRC15:59
*** sgw has joined #openstack-infra16:00
*** pgaxatte has quit IRC16:02
*** michael-beaver has quit IRC16:04
*** jpena|off is now known as jpena16:05
openstackgerritMerged zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files  https://review.opendev.org/67388516:07
*** jpena is now known as jpena|off16:07
*** igordc has joined #openstack-infra16:08
openstackgerritJames E. Blair proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109916:19
openstackgerritJames E. Blair proposed zuul/zuul master: authentication config: add optional token_expiry  https://review.opendev.org/64240816:19
fungidonnyd: also i don't know that we really need experimental data on that. we already have data which indicates that those jobs used to work in 8gb of ram and are now using increasing amounts of swap space16:20
fungii mean, increasing ram to find out if they still time out might confirm for us that it's the swapping which is slowing them down, but we need them to stop swapping heavily regardless16:20
openstackgerritMerged openstack/reviewstats master: Mailing lists change openstack-dev to openstack-discuss  https://review.opendev.org/66844616:23
openstackgerritJames E. Blair proposed opendev/system-config master: WIP: Sketch of a registry test which uses swift  https://review.opendev.org/65379716:24
*** lbragstad has quit IRC16:29
cloudnullhey all , we've been seeing the following error with the docker registry reverse proxy on OVH - https://logs.opendev.org/20/673920/4/check/tripleo-ci-centos-7-scenario004-standalone/59f618e/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz?level=ERROR16:29
cloudnullwhen I curl http://mirror.bhs1.ovh.openstack.org:8082/v2/tripleomaster/centos-binary-nova-compute/blobs/sha256:2efded40b28a63edb701aef3f646be560c1d938199334b173f496db2ca7285b1 - I get the same 40116:29
cloudnullso i'16:29
clarkbcloudnull: what happens if you request the same object from the backend?16:30
cloudnullSo i'm wondering if this is something you've seen or may have some insight on ?16:30
clarkbcloudnull: but this is likely the thing that sshnaidm brought up a little while ago16:30
clarkbcloudnull: docker requires auth tokens even if doing anonymous access16:30
clarkbtripleo uses a python script that doesn't get a token (or maybe the token is expiring?)16:31
cloudnullah ha !16:31
clarkbyou basically request an anonymous token that they can use to track the session/requestor without authenticating16:31
cloudnullinteresting. I will look into that.16:31
* cloudnull TIL 16:31
clarkbcloudnull: let me see if I can find the docs on that16:31
clarkbI had them found when sshnaidm was looking at it16:32
clarkbcloudnull: https://docs.docker.com/registry/spec/auth/token/16:32
clarkbI was able to reproduce the 401's locally with curl at the time not authing. Then when I auth'd it worked16:32
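For reference, the flow clarkb reproduced with curl works like this: the registry answers an unauthenticated blob request with a 401 whose WWW-Authenticate header names the token realm, and the client then fetches an anonymous bearer token from that realm (per the Docker token auth spec linked above). A minimal sketch of building the token URL from that challenge header; the exact challenge contents our mirror relays back from Docker Hub are an assumption:

```python
import re
import urllib.parse

def token_request_url(www_authenticate):
    """Build the anonymous-token URL from a registry 401 challenge.

    Expects the standard Docker registry v2 Bearer challenge, e.g.
    Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:foo/bar:pull"
    """
    # Pull out the key="value" pairs from the challenge header.
    params = dict(re.findall(r'(\w+)="([^"]*)"', www_authenticate))
    realm = params.pop("realm")
    # Remaining params (service, scope) become the query string.
    return realm + "?" + urllib.parse.urlencode(params)
```

A GET on that URL returns JSON whose `token` field is then sent as `Authorization: Bearer <token>` on the blob request, which is what made the curl reproduction stop 401ing.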
cloudnullok, cool . that should be an easy enough fix16:33
cloudnullthanks cloudnull16:33
cloudnullbah...16:33
cloudnullclarkb16:33
fungithere was an example where we already do that in another script/job, right?16:34
fungii just don't recall which one now16:34
corvusthese roles: https://zuul-ci.org/docs/zuul-jobs/container-roles.html16:34
*** dtantsur is now known as dtantsur|afk16:38
*** rpittau is now known as rpittau|afk16:40
*** witek has quit IRC16:41
clarkbcorvus: ttx and jroll have asked that we add them to the openstack org as admins so they can begin cleaning it up. They have promised to be careful. But I remember you were working with jroll before and wanted to make sure you didn't have some other plan in mind16:44
clarkbif not I'll go ahead and add them16:44
*** sshnaidm is now known as sshnaidm|afk16:46
corvusclarkb: no other plan; sgtm16:47
mordredclarkb: also - we should point jroll and ttx at my crappy scripts from yesterday16:48
jrollmordred: heh, I've already written another crappy script16:48
mordredawesome!16:48
jrollbut more crappy code is always welcome16:48
mordredjroll: well, https://review.opendev.org/#/c/673831/ is what I used to retire and archive the stuff in openstack-infra that had moved to opendev/16:48
corvusmostly i wanted to highlight how very dangerous it is to use the same github account for regular github work and also give it admin perms in the openstack org16:49
mordredjroll: if it's useful to you in anyway, awesome. if not - also awesome :)16:49
jrollcool, thanks mordred16:49
corvusi lost track of that thread and am unsure if a solution to that was ever arrived at16:49
corvusie, it should really either be a dedicated account, or an account owned by someone who doesn't actually use github.16:51
jrollcorvus: we plan to use our accounts for now (as neither of us use/clone from/etc github for openstack work), and move to a shared account if we feel that's needed to add more people to help16:51
donnydclarkb: Do you want me to setup a custom resource for jobs known to fail on FN?16:51
corvusjroll: sounds reasonable16:51
jrollsee also https://etherpad.openstack.org/p/openstack-repos-on-github16:51
clarkbdonnyd: up to you if you want to proceed with testing that further. I'm fairly happy with the results we've got so far and have identified deficiencies in the software that should make it go quicker16:52
clarkbdonnyd: I can assist with that if you are interested16:52
donnydI should have been more specific, this job we are trying to get to fail correct? https://review.opendev.org/#/c/673923/16:53
clarkbdonnyd: ya if we recheck it a few times hopefully one of them runs on fn and then fails and we get logs16:53
donnydOH ic16:53
clarkbjroll: invite sent16:54
clarkbjroll: you can check your email or go to https://github.com/openstack to accept says github's banner16:54
clarkbttx: ^ you too16:55
jrollclarkb: got it, thanks16:55
donnydI wouldn't mind a pointer on what to change to setup a custom resource. Do I just give it a different name, say like ubuntu-bionic-testonly16:55
clarkbdonnyd: ya pretty much. We have an example with the vexxhost gpu flavor type I can get links for. one sec16:56
cloudnullso when I start looking at our test results for that 401 error it looks like all of our failures for the last week have been in OVH.16:56
cloudnullhttp://logstash.openstack.org/#/dashboard/file/logstash.json?query=(message:%20%5C%22401%20Client%20Error:%20Unauthorized%5C%22)%20AND%20tags:%20%5C%22console%5C%22%20AND%20voting:1%20AND%20(project:%20*tripleo*)16:56
clarkbdonnyd: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L48-L53 and https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L237-L251 are the config you need16:57
cloudnull116 instances of that 401 error over the last 7 days16:57
clarkbcloudnull: yes but the error is reproducible from anywhere including my desktop16:57
cloudnullbut seems odd that we're not hitting it on the other node providers ?16:58
clarkbcloudnull: that was why I theorized maybe the token is expiring16:58
cloudnullhum . I will continue digging .16:58
clarkbcould be that the script is requesting a token but that token expires before all requests are complete possibly due to transatlantic data transit16:58
cloudnull++16:58
cloudnullthat very well could be16:58
clarkbcloudnull: the dockerhub stuff is just a proxy so if we are not hitting cached data it could theoretically be slow to pull data down to france16:59
clarkb(maybe, its one thing to look into at least)16:59
cloudnullI will see if I can make our tools reauth on 40116:59
clarkbcloudnull: that sounds like a great idea17:00
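A sketch of the reauth-on-401 idea cloudnull describes, kept generic since the tripleo image-prepare tooling isn't shown here; `fetch` and `fetch_token` are hypothetical stand-ins for the tool's own blob-request and token-request calls:

```python
def get_with_reauth(fetch, fetch_token, max_retries=1):
    """Call fetch(token); on a 401, refresh the token and retry.

    fetch(token) -> (status, body); fetch_token() -> a fresh bearer token.
    This covers the "token expired mid-run" theory: a long transfer can
    outlive the token's validity window, so one refresh-and-retry recovers.
    """
    token = fetch_token()
    for attempt in range(max_retries + 1):
        status, body = fetch(token)
        if status != 401 or attempt == max_retries:
            return status, body
        # 401 with retries left: the token may have expired, get a new one.
        token = fetch_token()
```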
fungiyeah, my theory to the intermittent nature of it was that mostly jobs are hitting cached copies but when there's a cache miss only jobs which are getting tokens manage to re-warm the cache17:00
* mordred hands cloudnull +2 Chain Mail of Awesomeness17:00
clarkbfungi: cloudnull oh right the other theory was that docker itself may be pulling down those images which caches them Then our proxy doesn't care about the auth token17:00
cloudnullfungi ++17:00
fungimordred: ooh, is it mithril?17:00
clarkbfungi: cloudnull so depending on job ordering and mirror expiry demands (we have 24 hour refresh but also expire due to disk space) we could see it in one cloud more frequently17:01
mordredfungi: yes, and also covered in feathers17:01
*** derekh has quit IRC17:01
fungirighteous17:01
* cloudnull tarred and feathered 17:01
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Don't compare to literal True/False  https://review.opendev.org/66769717:01
openstackgerritDonny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test  https://review.opendev.org/67409117:02
fungione reason ovh is hitting it more often could be that we're running a larger number of builds there relative to the single mirror in each region, so caching a diversity of things more quickly and expiring some images out of the cache faster as a result of pressure for the cache space17:02
*** igordc has quit IRC17:03
cloudnullthat does make some sense17:03
fungithough it also could simply be that we're failing to cache (a subset of?) images in ovh for some reason17:03
fungior registering cache misses when we shouldn't17:04
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars  https://review.opendev.org/66769817:04
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support  https://review.opendev.org/67409217:04
*** priteau has quit IRC17:05
*** smrcascao has quit IRC17:05
*** electrofelix has quit IRC17:06
donnydclarkb: I think i got it17:06
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support  https://review.opendev.org/67409217:07
clarkbdonnyd: couple of comments inline17:10
*** bobh has joined #openstack-infra17:11
*** bobh has quit IRC17:11
*** udesale has quit IRC17:14
*** whoami-rajat has joined #openstack-infra17:15
*** ramishra has quit IRC17:18
*** ralonsoh has quit IRC17:20
openstackgerritDonny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test  https://review.opendev.org/67409117:20
donnydoh i messed that one up17:21
*** kopecmartin is now known as kopecmartin|off17:24
donnydOk, so maybe my browser was just caching something... weird.17:27
*** igordc has joined #openstack-infra17:27
*** jcoufal_ has joined #openstack-infra17:30
*** ricolin__ is now known as ricolin17:30
*** igordc has quit IRC17:30
*** jcoufal has quit IRC17:32
openstackgerritMerged zuul/zuul master: Try out reporting the build page  https://review.opendev.org/67386317:32
openstackgerritJeff Liu proposed zuul/zuul-operator master: Change operator namespace to zuul-ci.org  https://review.opendev.org/67410017:37
clarkbdonnyd: one more small thing but otherwise that looks bout ready17:41
*** slaweq has quit IRC17:42
*** igordc has joined #openstack-infra17:42
*** ociuhandu has joined #openstack-infra17:44
openstackgerritDonny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test  https://review.opendev.org/67409117:46
clarkbdonnyd: that lgtm17:48
*** ociuhandu has quit IRC17:48
*** betherly has joined #openstack-infra17:49
donnydonly took me three tries17:51
*** betherly has quit IRC17:54
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Revert "fetch-subunit-output: collect additional subunit files"  https://review.opendev.org/67410217:55
AJaegertosky, corvus , I see failures and propose a revert ^17:56
toskyAJaeger: sure, failures where?17:56
AJaegerexample is https://logs.opendev.org/01/673401/2/check/openstacksdk-tox-py36-tips/0e0bed9/job-output.txt.gz#_2019-08-01_17_04_29_54962917:56
toskyotherwise I can't fix it17:56
*** yamamoto has joined #openstack-infra17:57
toskyAJaeger: there is an inconsistency in the way zuul_work_dir is defined, it seems17:57
*** slaweq has joined #openstack-infra17:57
AJaegertosky: lots of "post_failures" on openstacksdk jobs, have a look at zuul.opendev.org17:57
AJaegertosky: here as well: https://logs.opendev.org/87/673987/1/gate/cross-cinder-py27/9b81c84/job-output.txt.gz#_2019-08-01_17_55_02_29139117:58
toskyAJaeger: I can't fix them there, but I think there is an inconsistency in the usage of relative paths and full paths17:58
toskyso sure, revert, but the issue is somewhere else17:58
AJaegerinfra-root, should we revert - and then fix?17:59
corvuswe should certainly revert17:59
corvuswhether the fix is to correct all instances of zuul_work_dir, or rather to make the consuming role more forgiving is an open question17:59
AJaegerthanks, clarkb17:59
AJaegertosky: then let's work on figuring out the best way forward...18:00
toskytomorrow18:01
*** tosky has quit IRC18:01
*** gfidente has quit IRC18:03
*** yamamoto has quit IRC18:04
fungii think, given that it's a change in zuul-jobs, we probably need to make the role more forgiving (or do a lot of communication to downstream consumers in advance of the behavior change)18:04
fungibecause "we" can't fix all uses of it, given we are probably not even aware of some of them and have no visibility into them18:05
*** witek has joined #openstack-infra18:10
openstackgerritMerged zuul/zuul-jobs master: Revert "fetch-subunit-output: collect additional subunit files"  https://review.opendev.org/67410218:12
openstackgerritMerged openstack/project-config master: Adding another pool to FN for expanded memory test  https://review.opendev.org/67409118:13
AJaegerwe have a couple of places that use a relative zuul_work_dir, so those jobs will fail. I agree, we need to make it more robust18:13
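One way to make the consuming role "more forgiving", per corvus's open question: normalize zuul_work_dir before use so both the relative and absolute spellings seen in codesearch keep working. A Python sketch of that normalization (in the role itself this would be a Jinja conditional on whether the value starts with "/"); the function name is made up for illustration:

```python
import posixpath

def normalize_work_dir(zuul_work_dir, ansible_user_dir):
    """Accept both forms seen in the wild: absolute paths pass through,
    relative ones (e.g. "src/opendev.org/zuul/zuul") are anchored under
    the remote user's home dir, which is where Zuul checks out repos."""
    if posixpath.isabs(zuul_work_dir):
        return zuul_work_dir
    return posixpath.join(ansible_user_dir, zuul_work_dir)
```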
clarkbwe could use a tmpdir maybe?18:14
clarkbor possibly even append on the executor before upload to the log server?18:14
*** jcoufal_ has quit IRC18:14
AJaegerlooks like 32 or so repos use that in master - so if we require absolute paths, we have to backport them. http://codesearch.openstack.org/?q=zuul_work_dir%3A%20src&i=nope&files=&repos=18:15
AJaeger32 lines - not repos18:15
*** diablo_rojo has joined #openstack-infra18:17
openstackgerritMatt McEuen proposed openstack/project-config master: New project request: airship/kubernetes-entrypoint  https://review.opendev.org/67390018:18
*** jamesmcarthur has joined #openstack-infra18:23
*** betherly has joined #openstack-infra18:30
*** eharney has quit IRC18:32
*** diablo_rojo has quit IRC18:32
*** jcoufal has joined #openstack-infra18:34
*** betherly has quit IRC18:35
*** eharney has joined #openstack-infra18:36
fungiAJaeger: and that's just in opendev, to say nothing of other possible users outside opendev18:37
*** diablo_rojo has joined #openstack-infra18:42
*** diga has quit IRC18:44
AJaegerfungi: yeah. As tosky pointed out, the role itself says it needs to be an absolute path - but we have other places that do not require that ;( So, I would be fine making that change *after* everything is fixed and an announcement went out since the risk of breakage is high...18:44
* AJaeger is surprised to see zuul_work_dir: "{{ zuul.project.src_dir }}" and in other files: zuul_work_dir: "{{ ansible_user_dir }}/{{ zuul.project.src_dir }}"18:45
AJaegerWhich one should we use?18:45
AJaegerand then zuul_work_dir: "src/{{ zuul.project.canonical_name }}"18:46
AJaegereven openstack-zuul-jobs has one occurrence of "zuul_work_dir: src/opendev.org/zuul/zuul"18:46
AJaegerand I see zuul_work_dir: "{{ zuul.executor.work_root }}/{{ zuul.project.src_dir }}"18:47
*** jcoufal has quit IRC18:49
openstackgerritAndreas Jaeger proposed openstack/openstack-zuul-jobs master: Use absolute zuul_work_dir  https://review.opendev.org/67410818:52
AJaegerfungi: that is one place to update ^18:52
*** betherly has joined #openstack-infra18:58
AJaegerargh, that is not an absolute path either ;(19:01
AJaegerso, even more relative zuul_work_dirs since I ignored that pattern19:02
*** bhavikdbavishi has quit IRC19:02
AJaegerzuul.project.src_dir is relative19:03
*** betherly has quit IRC19:03
* AJaeger waves good night19:03
fungihave a good night AJaeger!19:04
*** bhavikdbavishi has joined #openstack-infra19:04
*** jtomasek has quit IRC19:05
*** michael-beaver has joined #openstack-infra19:22
*** tesseract has quit IRC19:23
*** eernst has joined #openstack-infra19:25
*** dpawlik has joined #openstack-infra19:32
clarkbanyone know how to link to multiple cacti graphs?19:35
clarkbI can't seem to figure it out19:35
clarkbI can change the graphs in my browser but they don't seem to have links19:35
clarkbanyways gitea08 OOM'd again today19:36
clarkbno other hosts have OOM'd since the replacements19:36
clarkbcacti shows large number of connections and high cpu, memory, swap usage19:36
clarkbI'm going to trigger gerrit replication against gitea08 now just to be sure we don't have missing stuff there19:36
clarkbbut then I guess I'll need to look at logs to see what was going on?19:37
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109919:39
openstackgerritMatthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry  https://review.opendev.org/64240819:39
openstackgerritMatthieu Huin proposed zuul/zuul master: [WIP] admin REST API: zuul-web integration  https://review.opendev.org/64353619:39
*** bhavikdbavishi has quit IRC19:40
openstackgerritMatthieu Huin proposed zuul/zuul master: Use a requests session to simplify auth'd calls  https://review.opendev.org/67051119:40
clarkblooks like gitea actually crashed or was killed by oomkiller according to the logs. Not sure which yet19:41
clarkbbut basically goroutines complain about lack of memory and then there are logs of it starting again19:41
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support  https://review.opendev.org/67409219:41
fungiclarkb: so the lack of screaming means our haproxy health checks dtrt and took it out of the pool at least?19:42
clarkbfungi: ya I'm assuming so19:43
clarkbit should be back in the pool now19:43
fungiawesome19:43
fungithanks19:43
fungiyou couldn't tell from dmesg whether the oom killer got it?19:43
clarkboh I didn't readd it, I think haproxy checks should've passed and stuck it back into the rotation19:43
clarkbfungi: I could if I want to read the logs thoroughly19:43
fungino worries19:44
fungii'm taking a quick look to see if i can tell19:44
clarkbI'm trying to figure out what triggered this now19:44
clarkband I'm remembering it is difficult to read the logs for that (that proxy protocol thing might be a good idea)19:44
fungi[Thu Aug  1 11:36:15 2019] Out of memory: Kill process 14505 (gitea) score 949 or sacrifice child19:46
fungi[Thu Aug  1 11:36:15 2019] Killed process 24769 (git) total-vm:686700kB, anon-rss:23304kB, file-rss:0kB, shmem-rss:0kB19:47
fungiso gitea needed memory, git was ultimately sacrificed19:47
clarkbfungi: oomkiller was invoked a bunch though19:47
clarkbthe gitea crash seems to be closer to 11:4719:47
fungioh19:48
fungiyep19:48
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: builder: Remove recency table logging  https://review.opendev.org/67412419:48
fungiall the "Killed process" lines are git processes, except for one pandoc19:49
fungisuggesting that gitea was not killed directly via oom-killer but crashed or was terminated for some other reason19:49
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: add-build-sshkey: add centos/rhel-8 support  https://review.opendev.org/67409219:55
clarkbk19:55
clarkbsomehow emc, novell, ibm, and rackspace IPs all land on this one backend19:56
clarkbI suppose that is deterministic hashing for the how19:56
clarkbin the ~4 hour time period surrounding that increase in resource usage emc is the biggest number of connections19:57
clarkbfollowed closely by novell19:57
clarkbthen rackspace half as many as them and ibm half as many as rax19:57
*** jtomasek has joined #openstack-infra19:57
clarkb(the rax IP is not ours according to our inventory)19:57
*** jcoufal has joined #openstack-infra19:57
fungiany signs of similar spikes on other backends in the same timeframe?19:58
clarkbmy grep | sed | uniq -c is not sophisticated enough for that :)19:59
clarkbthough now that you mention it maybe we want a python script that can make bar graphs or something19:59
*** betherly has joined #openstack-infra20:00
clarkbI need to eat lunch so will have to take a break20:00
clarkbmy homedir on gitea08 has some files I've trimmed down from a copy of syslog20:00
clarkb`cut -d' ' -f 7 gitea08_curated_haproxy_logs | sed -ne 's/\(:[0-9]\+\)$//p' | sort | uniq -c | sort -n -r | head` will give you numbers20:00
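The same tally in Python, in case a script ends up replacing the pipeline; it assumes field 7 of the haproxy syslog line is the client address:port, matching the `cut -f 7` above:

```python
import re
from collections import Counter

def top_clients(lines, n=10):
    """Count client addresses: take the 7th whitespace-separated field
    of each haproxy syslog line, drop a trailing :port, and tally."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip lines that don't match the expected format
        addr = re.sub(r':\d+$', '', fields[6])
        counts[addr] += 1
    return counts.most_common(n)
```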
*** eharney has quit IRC20:03
*** betherly has quit IRC20:04
*** nhicher has quit IRC20:04
clarkbthe emc IP shows up with multiple connections to gerrit with about 5 different accounts so likely a NAT addr20:05
*** jtomasek has quit IRC20:06
clarkball of the accounts talking to gerrit from that ip are dell emc third party ci accounts20:07
clarkbI don't know yet that they are at fault, but they are making a ton of requests and wouldn't surprise me if there is a relationship there20:08
clarkbok eating lunch now20:08
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: builder: Log all deletions of image upload records  https://review.opendev.org/67412620:10
*** tdasilva has joined #openstack-infra20:14
*** jamesmcarthur has quit IRC20:16
*** tosky has joined #openstack-infra20:17
openstackgerritMerged zuul/nodepool master: Avoid openstacksdk image delete bug  https://review.opendev.org/67404320:18
*** jcoufal has quit IRC20:22
fungithere is a great python library which will do "ascii" (more unicode) line/bar graphs, but i'm on the wrong computer to find that tab i had up20:34
* fungi is only half here at the moment, between breaks pushing around a mower20:34
*** diablo_rojo has quit IRC20:35
smcginnisfungi: This? https://github.com/lord63/ascii_art20:36
*** e0ne has joined #openstack-infra20:40
funginope, but that looks neat20:47
*** witek has quit IRC20:54
*** betherly has joined #openstack-infra20:59
*** smrcascao has joined #openstack-infra21:02
*** betherly has quit IRC21:03
*** witek has joined #openstack-infra21:05
*** cloudnull has quit IRC21:06
*** cloudnull has joined #openstack-infra21:07
*** Lucas_Gray has joined #openstack-infra21:13
*** betherly has joined #openstack-infra21:19
*** betherly has quit IRC21:24
*** witek has quit IRC21:25
clarkb`goaccess` can apparently be convinced to do things with haproxy via a log format string21:26
clarkbso I'm about to fiddle with that21:26
ianwfrickler: ahh, great point on removing the numbers from the test runs from the timing results!21:30
ianwlies, damned lies and statistics :)21:30
ianwclarkb: did you see https://review.opendev.org/#/c/673724/ & https://review.opendev.org/#/c/673739/1 from the other day -- your intuition on the recommends install is i believe correct21:37
*** eernst has quit IRC21:38
*** ekultails has quit IRC21:38
clarkbianw: I did not21:40
clarkbsorry I've been all over the place the last week or so21:40
ianwno worries :)  i think it does solve the mystery of why it works in the gate but not on new servers, though21:46
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift  https://review.opendev.org/67413621:47
*** betherly has joined #openstack-infra21:50
*** mriedem has quit IRC21:53
clarkbugh goaccess has trouble parsing haproxy logs because haproxy writes ipv6 addresses without []s21:53
clarkbbut even then it seems to not do greedy matching so it fails21:53
*** betherly has quit IRC21:55
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift  https://review.opendev.org/67413621:59
*** tosky has quit IRC22:07
*** slaweq has quit IRC22:10
*** betherly has joined #openstack-infra22:11
*** slaweq has joined #openstack-infra22:11
clarkbianw: I think https://review.opendev.org/#/c/673739/1 is inverted. The test nodes already don't install recommends but production images do22:12
*** betherly has quit IRC22:15
*** slaweq has quit IRC22:16
clarkbok I finally figured out how to get goaccess to read the logs (I ended up wrapping all ip addrs in []s via sed then I could update the format for goaccess appropriately)22:19
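A sketch of the rewrite the sed step does: haproxy logs an IPv6 client as e.g. `2001:db8::1:51234` with no brackets, so the port is ambiguous to parsers like goaccess. Splitting the port off on the last colon and re-wrapping makes the field unambiguous (IPv4 gets the same treatment so the log format stays uniform):

```python
def bracket_client(addr_port):
    """Rewrite "2001:db8::1:51234" as "[2001:db8::1]:51234".

    rpartition splits on the LAST colon, which is always the host/port
    boundary in haproxy's addr:port field, even for IPv6 hosts.
    """
    host, _, port = addr_port.rpartition(':')
    return '[%s]:%s' % (host, port)
```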
clarkbGives some interesting data22:19
clarkbthe emc ip I identified doesn't do anywhere near the bulk of the data transfer but it does do the third most cumulative connection time22:20
clarkbso whatever they are doing is expensive?22:20
clarkbNow I need to trim my logs a bit so that I'm only looking at the time period I care about (goaccess dashboard doesn't make that configurable)22:20
ianwclarkb: that's right, so that's why in base-server i've added the don't install flags so it applies on our control-plane/production servers?22:22
*** diablo_rojo has joined #openstack-infra22:22
ianwso they act the same as the testing nodes (and, if when we get nodepool uploads working, the dib control plane nodes)22:22
clarkboh for some reason I read that as a test playbook/role22:23
clarkbany concern with things breaking because we'll stop installing all the packages we depend on if we do that?22:23
ianwclarkb: i don't think so ... i mean we would presumably only notice when rolling out new servers, and new servers should have pretty good gate coverage22:24
clarkbI guess unattended upgrades will continue to update installed packages for the older stuff so that is fine22:26
ianwand even all the old puppet testing has been running in the gate on nodes with it disabled via dib22:26
ianwi think it is definitely a openafs packaging bug that it seems to start the client and systemd seems to think it started although the modules aren't built22:27
ianwbut still, i think we're also better off being more homogeneous between testing and production anyway22:28
*** smrcascao has quit IRC22:28
*** jamespage_ has joined #openstack-infra22:29
*** dustinc_ has joined #openstack-infra22:29
*** sdoran_ has joined #openstack-infra22:29
*** mnasiadka_ has joined #openstack-infra22:30
*** jrosser_ has joined #openstack-infra22:30
*** kmalloc_ has joined #openstack-infra22:30
*** dtantsur has joined #openstack-infra22:30
clarkban engineering college in india did the largest amount of data transfer during that ~4 hour window with the largest cumulative time spent against gitea0822:31
*** betherly has joined #openstack-infra22:31
clarkb~5.5x the data transferred compared to emc22:31
clarkbabout 0.25 of a day more time spent talking to the gitea08 backend22:32
clarkbdid class start and everyone was told to clone nova at the same time or something?22:32
clarkbtheir connections do begin in earnest around that 8:40UTC timeperiod22:35
clarkbI'm going to try and map that onto gitea connection logs now22:35
*** jpenag has joined #openstack-infra22:36
*** betherly has quit IRC22:36
*** jpena|off has quit IRC22:37
*** dtantsur|afk has quit IRC22:37
*** mordred has quit IRC22:37
*** mnasiadka has quit IRC22:37
*** jamespage has quit IRC22:37
*** kmalloc has quit IRC22:37
*** sdoran has quit IRC22:37
*** dustinc has quit IRC22:37
*** jrosser has quit IRC22:37
*** mnasiadka_ is now known as mnasiadka22:37
*** kmalloc_ is now known as kmalloc22:37
*** jamespage_ is now known as jamespage22:37
*** dustinc_ is now known as dustinc22:37
*** jrosser_ is now known as jrosser22:37
*** sdoran_ is now known as sdoran22:37
*** panda has quit IRC22:41
*** panda has joined #openstack-infra22:42
*** mordred has joined #openstack-infra22:44
*** e0ne_ has joined #openstack-infra22:45
*** e0ne has quit IRC22:46
*** e0ne has joined #openstack-infra22:47
*** e0ne_ has quit IRC22:50
*** tkajinam has joined #openstack-infra22:51
openstackgerritJames E. Blair proposed opendev/base-jobs master: Add swift base test job  https://review.opendev.org/67414322:51
*** e0ne has quit IRC22:52
*** whoami-rajat has quit IRC22:55
openstackgerritJames E. Blair proposed opendev/base-jobs master: Upload to a swift at random  https://review.opendev.org/67414423:07
corvusi think that ^ is going to be pretty cool :)23:08
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift  https://review.opendev.org/67413623:10
*** igordc has quit IRC23:10
*** ianychoi has quit IRC23:12
*** betherly has joined #openstack-infra23:12
clarkbinfra-root https://etherpad.openstack.org/p/debugging-gitea08-OOM23:13
clarkbI've tried to collect my current thoughts on the problem there23:13
clarkbcorvus: re swift stuff neat. We could probably also weight the swift uploads by total storage size23:14
clarkbso that we balance things out over time23:14
clarkb(but random is probably close enough)23:14
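The weighting idea could be as small as this; provider names and capacity numbers here are made up for illustration, and the change under review just picks uniformly at random:

```python
import random

def pick_swift(providers):
    """providers: dict of provider name -> available storage (any units).
    random.choices with weights biases selection toward providers with
    more headroom, balancing usage over time instead of purely at random."""
    names = list(providers)
    weights = [providers[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]
```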
*** betherly has quit IRC23:18
*** rcernin has joined #openstack-infra23:21
clarkbtl;dr is we have some users that fetch disproportionately more data per connection and more time per connection and on top of that we have some very very long lived requests in gitea which I think may result in git processes living for extended periods using all the memory23:24
clarkbI don't want to tell users to go away, which means our best option may be to have gitea timeout quicker if things are going south?23:24
clarkbianw: left a comment on https://review.opendev.org/#/c/673739/123:33
ianwclarkb: yeah, i need to work on arm64 bionic control plane23:40
*** sthussey has quit IRC23:49
*** aaronsheffield has quit IRC23:51
*** betherly has joined #openstack-infra23:54
*** trident has quit IRC23:54
*** diablo_rojo is now known as diablo_rojo_23:56
*** diablo_rojo_ is now known as diablo__rojo_23:56
*** diablo__rojo_ is now known as diablo_rojoooooo23:57
*** diablo_rojoooooo is now known as diablo_rojo23:57
*** betherly has quit IRC23:58
*** trident has joined #openstack-infra23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!