Tuesday, 2020-07-14

*** ryohayakawa has joined #openstack-infra00:02
*** d34dh0r53 has quit IRC00:09
*** yamamoto has joined #openstack-infra00:10
*** tetsuro has joined #openstack-infra00:10
*** ysandeep|away is now known as ysandeep|rover00:11
*** d34dh0r53 has joined #openstack-infra00:12
*** rlandy|bbl is now known as rlandy00:34
*** gyee has quit IRC00:52
*** ysandeep|rover is now known as ysandeep|afk01:42
openstackgerritIan Wienand proposed zuul/zuul-jobs master: add-build-sshkey: Generate PEM format key  https://review.opendev.org/74084101:52
*** apetrich has quit IRC02:14
*** ysandeep|afk is now known as ysandeep|rover02:22
openstackgerritIan Wienand proposed zuul/zuul-jobs master: add-build-sshkey: Generate PEM format key  https://review.opendev.org/74084102:25
*** Goneri has quit IRC02:29
*** rlandy has quit IRC02:32
*** Lucas_Gray has quit IRC02:44
*** rakhmerov has joined #openstack-infra02:47
*** psachin has joined #openstack-infra02:59
*** rfolco has quit IRC02:59
*** ricolin_ has joined #openstack-infra03:00
*** tetsuro has quit IRC03:04
*** ysandeep|rover is now known as ysandeep|afk03:10
*** ykarel|away has joined #openstack-infra03:47
*** iurygregory has quit IRC03:54
*** TheJulia has quit IRC03:54
prometheanfireclarkb: ok, will do, I will say to just be prepared for breakage tomorrow03:54
*** TheJulia has joined #openstack-infra03:55
*** psachin has quit IRC04:07
*** tetsuro has joined #openstack-infra04:22
*** tetsuro_ has joined #openstack-infra04:26
*** tetsuro has quit IRC04:27
*** ykarel|away is now known as ykarel04:31
*** evrardjp has quit IRC04:33
*** evrardjp has joined #openstack-infra04:33
*** tetsuro_ has quit IRC04:41
*** psachin has joined #openstack-infra04:45
*** tetsuro has joined #openstack-infra04:46
*** marios has joined #openstack-infra04:54
*** ysandeep|afk is now known as ysandeep04:57
*** tetsuro has quit IRC04:58
*** ociuhandu has joined #openstack-infra05:01
*** ociuhandu has quit IRC05:05
*** soniya29 has joined #openstack-infra05:20
*** lmiccini has joined #openstack-infra05:25
*** udesale has joined #openstack-infra05:29
*** psachin has quit IRC05:29
*** psachin has joined #openstack-infra05:39
*** vishalmanchanda has joined #openstack-infra05:41
openstackgerritzhangboye proposed openstack/openstack-zuul-jobs master: migrate testing to ubuntu focal  https://review.opendev.org/74087505:45
openstackgerritzhangboye proposed openstack/os-testr master: migrate testing to ubuntu focal  https://review.opendev.org/74088906:08
*** elod is now known as elod_off06:15
*** ccamacho has joined #openstack-infra06:18
*** ykarel_ has joined #openstack-infra06:34
*** ykarel has quit IRC06:37
mnasiadkaMorning06:38
mnasiadkaAnybody from Linaro managing aarch64 CI nodes around? They seem to have an issue connecting to Docker Hub on Kolla jobs...06:39
*** dklyle has quit IRC06:39
*** SotK has quit IRC06:54
*** SotK has joined #openstack-infra06:55
*** eolivare has joined #openstack-infra07:02
*** rcernin has quit IRC07:05
*** iurygregory_ has joined #openstack-infra07:10
*** kevinz has joined #openstack-infra07:26
*** iurygregory_ is now known as iurygregory07:31
*** ralonsoh has joined #openstack-infra07:31
*** nightmare_unreal has joined #openstack-infra07:31
*** ccamacho has quit IRC07:32
*** ysandeep is now known as ysandeep|brb07:34
*** tosky has joined #openstack-infra07:37
*** yamamoto has quit IRC07:38
*** xek has joined #openstack-infra07:40
*** yamamoto has joined #openstack-infra07:40
*** bhagyashris|afk is now known as bhagyashris07:44
*** dtantsur|afk is now known as dtantsur07:56
*** fumesover3 has joined #openstack-infra08:07
*** rcernin has joined #openstack-infra08:09
*** ysandeep|brb is now known as ysandeep|rover08:10
*** Lucas_Gray has joined #openstack-infra08:10
tkajinamIs zuul down now ?08:14
*** rcernin has quit IRC08:14
*** rcernin has joined #openstack-infra08:15
fricklertkajinam: not for me, what issue do you see with it?08:18
tkajinamfrickler, seems like it came back. I couldn't access it a couple of minutes ago08:18
fricklermnasiadka: ianw mentioned a possible network issue with linaro in #opendev, likely related08:18
tkajinambecause of connection timeout08:18
mnasiadkafrickler: yeah, it's long running08:19
fricklertkajinam: o.k., let us know if you see any further issue08:20
tkajinamfrickler, thanks. it seems that it is working well so far08:20
tkajinamI can see zuul status and its job results08:21
tkajinamnow08:21
*** ykarel_ is now known as ykarel08:36
*** rcernin has quit IRC08:42
*** derekh has joined #openstack-infra08:48
*** ociuhandu has joined #openstack-infra08:50
*** donnyd has quit IRC08:53
*** donnyd has joined #openstack-infra08:53
*** ccamacho has joined #openstack-infra09:00
*** apetrich has joined #openstack-infra09:00
*** piotrowskim has joined #openstack-infra09:03
*** auristor has quit IRC09:03
*** auristor has joined #openstack-infra09:09
*** lucasagomes has joined #openstack-infra09:10
*** xek has quit IRC09:12
*** pkopec has joined #openstack-infra09:23
*** pkopec has quit IRC09:41
*** frickler is now known as frickler_pto09:44
*** frickler_pto is now known as frickler09:47
*** dtantsur is now known as dtantsur|bbl09:49
*** grantza has joined #openstack-infra10:08
*** tkajinam has quit IRC10:12
*** yamamoto has quit IRC10:16
*** yamamoto has joined #openstack-infra10:22
*** gouthamr has quit IRC10:22
*** gouthamr has joined #openstack-infra10:23
*** Jeffrey4l has quit IRC10:23
*** Jeffrey4l has joined #openstack-infra10:24
*** yamamoto has quit IRC10:26
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind  https://review.opendev.org/74093510:50
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind  https://review.opendev.org/74093510:51
*** ysandeep|rover is now known as ysandeep|afk10:53
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind  https://review.opendev.org/74093510:58
*** yamamoto has joined #openstack-infra11:02
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind  https://review.opendev.org/74093511:05
*** yamamoto has quit IRC11:08
*** rcernin has joined #openstack-infra11:13
*** Lucas_Gray has quit IRC11:28
*** Lucas_Gray has joined #openstack-infra11:33
*** yamamoto has joined #openstack-infra11:40
*** rlandy has joined #openstack-infra11:43
*** rlandy is now known as rlandy|ruck11:45
*** iurygregory has quit IRC11:46
*** ysandeep|afk is now known as ysandeep11:47
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind  https://review.opendev.org/74093511:54
*** iurygregory has joined #openstack-infra12:00
*** rfolco has joined #openstack-infra12:03
zbr|ruckmore mirror errors? https://c3c4d9b326375e78bcd8-2bdda90a1128cbc54c09909a8150f07c.ssl.cf2.rackcdn.com/739939/2/gate/tripleo-tox-molecule/3975439/tox/molecule-1.log12:04
zbr|ruckmirror.mtl01.inap.opendev.org - Caused by ResponseError('too many 503 error responses', -- looks weird.12:05
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093512:06
*** yamamoto has quit IRC12:08
*** yamamoto has joined #openstack-infra12:13
zbr|ruckI also tried to search for "too many 503 error responses" on logstash but got no results.12:13
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093512:13
*** udesale_ has joined #openstack-infra12:22
*** xek has joined #openstack-infra12:23
*** derekh has quit IRC12:23
*** udesale has quit IRC12:25
*** ryohayakawa has quit IRC12:25
*** yamamoto has quit IRC12:27
*** grantza has quit IRC12:33
*** grantza has joined #openstack-infra12:34
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093512:36
*** yamamoto has joined #openstack-infra12:41
*** artom has quit IRC12:43
*** artom has joined #openstack-infra12:44
*** artom has quit IRC12:44
*** artom has joined #openstack-infra12:45
*** rcernin has quit IRC12:48
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093512:59
*** rlandy|ruck is now known as rlandy|ruck|mtg13:00
*** ysandeep is now known as ysandeep|rover13:03
*** derekh has joined #openstack-infra13:03
*** yamamoto has quit IRC13:10
*** eharney has joined #openstack-infra13:11
*** xek has quit IRC13:13
*** Goneri has joined #openstack-infra13:14
fungizbr|ruck: that's an odd message indeed. i mean, the url it complained about wouldn't have been served from there, but should be generating 404 not 50313:16
*** ykarel is now known as ykarel|away13:16
fungii'll see if i can find some 503 errors in the apache logs for port 44313:16
fungioh, right, we have proxying for /pypi and /pypifiles on 44313:18
fungiso it could have been passing through the 503 responses from pypi/fastly13:18
zbrfungi: 2nd question would be why it is not visible in logstash?13:20
fungido tox/*.log get indexed in logstash?13:21
fungior is logstash processing backlogged?13:21
zbrpyyaml was released in march and is very popular, so it should have been in the cache already anyway13:21
fungiyeah, but the url it's complaining about is /pypifiles/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz13:22
zbrno idea yet, but i plan to find out, seems like very important information13:22
zbrlol, that is 2018, even worse.13:23
fungiit's too bad tox doesn't record timestamps, but i can probably find the requests by ip address13:24
fungizbr: yeah, so it tried 6 times to get https://mirror.mtl01.inap.opendev.org/pypifiles/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz and each time https://files.pythonhosted.org/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz (to which it's proxying) responded with a 50313:29
fungicurrently that mirror can retrieve it with wget, so whatever the problem was seemed to have been temporary13:30
fungihttp://paste.openstack.org/show/795915/13:30
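As background for the retry behavior fungi describes above, pip's download retries and timeout are tunable on the client side. A minimal pip.conf sketch with illustrative values (not what the CI images actually ship); the /pypi/simple path is an assumption based on the proxy paths mentioned earlier:

    # ~/.config/pip/pip.conf (or /etc/pip.conf)
    [global]
    # number of retries for failed downloads (pip's default is 5)
    retries = 10
    # socket timeout in seconds (pip's default is 15)
    timeout = 60
    # point at the regional caching proxy instead of pypi.org directly
    index-url = https://mirror.mtl01.inap.opendev.org/pypi/simple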
zbrit happened at the gate.... :p13:30
zbrand w/o logstash search we are unable to discover how common the 503 from pypi is.13:30
fungii can grep the apache logs real quick on that mirror to get an idea13:31
zbrtbh, we should expect 503 from time to time.13:31
zbrquestion is if we can find a way to prevent a failure13:32
zbrwhat is our current mirroring logic? can we have another fallback on specific errors, like 503 and try another mirror instead?13:33
fungiaccess log on that mirror starts at 06:25z today; 12 503 responses, and knowing that pip tries 6 times before giving up, that suggests it happened twice to builds in inap in the past ~7 hours13:33
zbrfungi: you grepped for 503 in general or this file in particular?13:34
zbri guess it does not make sense to ask pypi, as they will say "it is possible".13:34
fungiyeah, in addition to your ansi2html-1.5.2.tar.gz failure at 10:40:55-10:41:02 there was a similar failure for lxml-4.5.0-cp27-cp27mu-manylinux1_x86_64.whl at 10:46:11-10:46:1913:35
fungiso the 12 error responses spanned a little over 5 minutes13:36
fungialso remember it's not *necessarily* pypi because they're going through a cdn (fastly) so it could have been isolated to the nearest cdn endpoint to inap's mtl01 region13:36
fungii'll check some other mirrors' access logs for comparison13:37
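A sketch of the kind of quick log survey being described, assuming a combined-style access log where the status code is the ninth field and the request path the seventh (the real log format on the mirrors may differ):

    # count 503 responses per requested path in the mirror's https access log
    awk '$9 == 503 {print $7}' /var/log/apache2/mirror_443_access.log | sort | uniq -c | sort -rn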
fungiovh bhs1 had a build which saw two such failures for varlink-30.3.0-py2.py3-none-any.whl at 10:42:2713:39
*** benj_ has quit IRC13:39
fungisince it wasn't six, i'm guessing the next try was successful... checking13:39
fungiyep, 200 ok for it at 10:42:2813:40
*** benj_ has joined #openstack-infra13:41
fungihowever, both those provider regions are geographically near one another (quebec, canada) so it could still be a localized issue with that fastly endpoint13:43
fungii didn't see any similar occurrences in our other ovh region (in france)13:43
zbrfungi: would it be possible to fall back to one central mirror of ours on 503?13:44
*** psachin has quit IRC13:44
funginor in vexxhost's california region13:45
zbrthat needs to go in our mirror implementation, something like "try pypi, else if 503 try fallback ...."13:45
*** dtantsur|bbl is now known as dtantsur13:45
fungii have no idea, nor do i know whether that would be generally more robust, nor whether this is such an isolated incident that the human cost in maintaining a workaround exceeds the benefit13:46
fungiso far we have a situation which caused some downloads to fail in quebec over a 5 minute span of time13:46
zbri know for sure that I have seen 503 from pypi in the past (like months ago).13:47
zbri asked about fallback, w/o knowing what implementation we have.13:47
zbri know how to do a fallback with nginx, but no idea if it is easily doable with what we use.13:47
fungino similar occurrences today yet in any of our rackspace regions either (texas, illinois, virginia)13:48
zbryes, if it is too hard it may not make sense, but if it is easy it may save us from a few failures.13:48
*** ralonsoh has quit IRC13:49
*** frickler is now known as frickler_pto13:50
fungiit's apache mod_proxy, the configuration can be found here: sudo grep 'Response status 503' /var/log/apache2/mirror_443_access.log|cut -d' ' -f5,713:50
fungier, sorry, pasted from wrong buffer13:50
fungihttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L49-L10113:50
fungithere13:50
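For readers without the repo handy, a heavily simplified sketch of what such a mod_proxy/mod_cache vhost can look like; the hostname, backend URLs and values here are illustrative guesses, the linked mirror.vhost.j2 is the real configuration:

    <VirtualHost *:443>
        ServerName mirror.example.opendev.org
        SSLProxyEngine on

        # disk-backed cache for proxied objects
        CacheRoot /var/cache/apache2/proxy
        CacheEnable disk /pypi
        CacheEnable disk /pypifiles
        # revalidate cached objects after at most 24 hours
        CacheMaxExpire 86400

        # pass index and package-file requests through to pypi and its cdn;
        # backend 5xx responses are returned to the client unchanged
        ProxyPass /pypi/ https://pypi.org/ keepalive=On retry=0
        ProxyPassReverse /pypi/ https://pypi.org/
        ProxyPass /pypifiles/ https://files.pythonhosted.org/ keepalive=On retry=0
        ProxyPassReverse /pypifiles/ https://files.pythonhosted.org/
    </VirtualHost>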
*** ralonsoh has joined #openstack-infra14:00
*** rlandy|ruck|mtg is now known as rlandy|ruck14:00
*** dave-mccowan has joined #openstack-infra14:02
zbri am reading the docs now, apparently it has support for lots of failover models. i am still not sure where to point it.14:02
fungipart of the problem is that pypi itself provides no fallback, instead it trusts its cdn network to be robust14:03
fungiso there's no "if the cdn is broken go here instead" url i'm aware of14:04
*** ykarel|away has quit IRC14:05
openstackgerritOleksandr Kozachenko proposed openstack/project-config master: Add openstack/horizon to the vexxhost tenant  https://review.opendev.org/74096914:06
*** lucasagomes has quit IRC14:13
*** lucasagomes has joined #openstack-infra14:14
*** armax has joined #openstack-infra14:16
*** knikolla has joined #openstack-infra14:21
zbrfungi: i asked for help from pypi-dev and I was asked to raise https://github.com/pypa/warehouse/issues/826014:22
zbrthey will look into it, hopefully they will give us a fallback14:22
zbrfungi: let's watch it, dstufft offered to look into it :)14:25
*** ykarel|away has joined #openstack-infra14:26
*** yamamoto has joined #openstack-infra14:28
fungithanks for bringing it up with them14:28
AJaegerconfig-core, please review https://review.opendev.org/739876 https://review.opendev.org/739892 https://review.opendev.org/740711 https://review.opendev.org/740614 https://review.opendev.org/74031014:29
*** dave-mccowan has quit IRC14:33
*** dklyle has joined #openstack-infra14:34
*** dave-mccowan has joined #openstack-infra14:37
openstackgerritMerged openstack/project-config master: Add openstack/horizon to the vexxhost tenant  https://review.opendev.org/74096914:43
openstackgerritMerged openstack/project-config master: update-constraints: Install pip for all versions  https://review.opendev.org/73892614:43
clarkbzbr: fungi: an important clarification is that our apache mirrors have a 24 hour limit on any cached object. That means we'll still query the remote backend to check validity after that period. Also the cache objects can be expired early if necessary to make room for other objects.14:47
*** ysandeep|rover is now known as ysandeep|food14:47
clarkband yes logstash indexing has been behind recently. I think some file is causing things to OOM or similar though I've not had time to look closer than e-r's status page14:47
zbrclarkb: that 24h is an implementation aspect, it does not influence the outcome.14:48
clarkbzbr: it does, you assumed that an older popular package would always be in the cache and the mirror wouldn't make external requests14:48
clarkbthis isn't necessarily true for multiple reasons. I wanted to make sure that was clarified14:48
zbrok. for pypi packages that cannot be updated by design we could decide to improve the logic maybe.14:49
clarkb13:21:49              zbr | pyyaml was released in march and is very popular, so it should have being already in cache anyway14:49
*** manfly000 has joined #openstack-infra14:49
clarkbno I think the current logic is correct even for objects that shouldn't change14:50
clarkbit acts as belts and suspenders14:50
*** johnthetubaguy has quit IRC14:50
zbrstill, i would like to see a way to avoid sending the 503 to pip, when it happens, but we will need upstream help for a fallback.14:50
clarkbour mirrors shouldn't change responses14:50
clarkbso ya thats entirely up to pypi and their cdn14:51
*** johnthetubaguy has joined #openstack-infra14:52
zbri disagree here, we are in business to make CI/CD fast and less dependent on external networking issues. It would be up to us to decide what to do when upstream is flaky.14:53
funginote that files on pypi *can* change by being deleted, and we want that state correctly reflected too14:53
clarkbthat isn't quite accurate imo either14:53
zbri mentioned only 503 code, which is a non-cachable one14:53
clarkbwe are in the business of testing software that accurately reflects its "life" outside of the CI system14:54
zbri did not say we should cache 404, or other codes.14:54
*** manfly000 is now known as xiaoguang14:54
fungialso, i don't know when we got into business, i don't personally remember that, i'm here to serve community collaborations14:54
zbrthere is a small set of http requests which indicate that the client should try again, as "we have a problem".14:54
clarkbfungi: figure of speech14:54
clarkbthe goal of the CI system isn't to make the software pass on every run14:55
*** xiaoguang is now known as manfly00014:55
clarkbthe goal of the CI system is to accurately reflect the life of software outside of the CI system so that we can catch and address problems early14:55
*** artom has quit IRC14:56
clarkbif we change external behaviors we make the software dependent on the CI system and that is a problem14:56
*** artom has joined #openstack-infra14:56
clarkbin this particular case it seems like maybe we found a pypi bug and now it can be addressed14:56
clarkbthat is why we run the CI tooling14:57
*** artom has quit IRC14:57
*** artom has joined #openstack-infra14:57
zbrclarkb: fungi: please comment on the bug, perhaps even posting a comment on #pypa-dev -- maybe we can persuade them to do something while it is still "hot".14:58
zbri clearly agree that it is a pypi issue, but i am realistic that I cannot always rely on external service providers to fix their stuff.14:58
zbrwe should first engage upstream, and see what we can do after.14:59
clarkbeven then we shouldn't update the CI system to fix it, because the software still has to deal with it outside of the CI system14:59
clarkbworking around the problem in the software itself is fine14:59
*** manfly000 has left #openstack-infra14:59
*** artom has quit IRC14:59
clarkbbut the CI "platform" should do its best to be special in that regard14:59
fungiif we can't rely on external providers to fix things, we shouldn't expect the users of our software to have to either though, right?14:59
clarkb*do its best not to be special in that regard15:00
*** xek has joined #openstack-infra15:00
zbrfungi: pypi has been reliable enough so far, but i am sure you would have a different take if it suddenly became more flaky15:00
*** artom has joined #openstack-infra15:01
*** artom has quit IRC15:01
clarkbfor similar reasons this is why I've said that devstack should handle setuptools distutils vendoring and not try and fix that in the CI system15:01
zbranyway, i consider the subject closed, not proposing any change on our side for mirrors.15:01
clarkbbecause other people using debian and ubuntu will have the same problem and we don't want the tooling to be dependent on the CI platform15:01
*** artom has joined #openstack-infra15:01
*** xiaoguang has joined #openstack-infra15:01
zbrthe irony is that I got a similar issue today with docker build with alpine apk, which is so stupid that it ignores failure to add a repository.15:02
zbrclearly not a CI/CD issue, only a combination of lack of retry support in both apk and docker build15:02
zbrand it happened exactly on my release tag :p15:03
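One possible client-side workaround for that kind of transient failure is a plain retry loop in the image build; a sketch only, with an invented package list and retry count:

    # retry the index refresh and package install a few times before failing the build
    RUN for i in 1 2 3; do \
          apk update && apk add --no-cache git && exit 0; \
          echo "apk attempt $i failed, retrying" >&2; \
          sleep 5; \
        done; \
        exit 1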
*** yamamoto has quit IRC15:03
*** artom has quit IRC15:04
*** rlandy|ruck is now known as rlandy|ruck|mtg15:06
*** xiaoguang has quit IRC15:06
*** lmiccini has quit IRC15:16
*** armax has quit IRC15:18
*** xek has quit IRC15:19
*** Limech has joined #openstack-infra15:20
openstackgerritMerged openstack/openstack-zuul-jobs master: Enable grenade again for Stein, Rocky and Queens  https://review.opendev.org/74031015:21
*** hamalq has joined #openstack-infra15:23
openstackgerritMerged openstack/project-config master: maintain-github-mirror: add requests dependency  https://review.opendev.org/74071115:26
*** artom has joined #openstack-infra15:27
*** hamalq has quit IRC15:29
*** ykarel|away has quit IRC15:30
*** hamalq has joined #openstack-infra15:35
*** udesale_ has quit IRC15:35
*** ysandeep|food is now known as ysandeep15:38
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a60/733250/19/check/neutron-tempest-dvr-ha-multinode-full/a6025d9/controller/logs/screen-q-svc.txt (don't actually open that link directly, drop the file at the end to browse the dir) is ~300MB and I think it is contributing to the logstash behindness15:39
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_464/735125/10/check/neutron-ovn-rally-task/4648c27/controller/logs/screen-q-svc.txt as well15:40
clarkbit seems that maybe neutron exploded its logging size15:40
openstackgerritMerged openstack/openstack-zuul-jobs master: Add template for charm check and gate  https://review.opendev.org/74061415:40
* clarkb joins #neutron15:41
*** ysandeep is now known as ysandeep|away15:43
clarkb* #openstack-neutron. I've brought it up there and will see if they can help make that happier15:44
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093515:47
*** gyee has joined #openstack-infra15:50
clarkbneutron is instrumenting function call times in debug log output15:50
clarkbthat accounts for the majority of the logging output there. I've left that info in #openstack-neutron and suggested it be off by default and when enabled it can log to a separate file possibly15:50
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093515:51
clarkbzbr: ^ you had wondered about that and I think this is the root cause15:51
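As a rough illustration of the "separate file" suggestion above, a hypothetical Python logging config of the sort oslo.log can load via log_config_append; the neutron.profiler qualname and file path are invented here, the real logger name for the call-timing output would need to be checked:

    [loggers]
    keys = root, profiler

    [handlers]
    keys = stderr, profiler_file

    [formatters]
    keys = generic

    [logger_root]
    level = INFO
    handlers = stderr

    # hypothetical logger for the function call-time instrumentation
    [logger_profiler]
    level = DEBUG
    handlers = profiler_file
    qualname = neutron.profiler
    propagate = 0

    [handler_stderr]
    class = StreamHandler
    args = (sys.stderr,)
    formatter = generic

    [handler_profiler_file]
    class = FileHandler
    args = ('/var/log/neutron/profiler.log',)
    formatter = generic

    [formatter_generic]
    format = %(asctime)s %(levelname)s %(name)s %(message)s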
zbrwow, 300MB, we should definitely have a limit per job.15:53
fungiwe do15:53
clarkbif they are unwilling or unable to fix that we can stop indexing that file, but I'd like to see if they can address the underlying issue first15:57
*** artom has quit IRC15:59
*** lucasagomes has quit IRC15:59
toskyfungi: about os-loganalyze, is the OpenDev Infra team meeting later today to discuss it (whether to port the jobs or retire it altogether), or have you discussed it already?16:00
clarkbtosky: we aren't using it anymore. I don't know that there is much to discuss.16:01
*** marios is now known as marios|out16:01
*** Lucas_Gray has quit IRC16:01
clarkbmaybe we want to advertise we aren't able to maintain it anymore and if anyone is using it they can maintain it if they like16:01
clarkbbut we've long since switched to zuul's rendering of log output16:02
*** rlandy|ruck|mtg is now known as rlandy|ruck16:02
fungii think the question is specifically about current jobs for osla blocking the effort to get rid of legacy ci jobs16:04
fungiwe could delete those jobs or retire the repo entirely16:04
fungibut i don't personally see the benefit in us spending time updating jobs for a project we no longer use16:05
clarkbya if the desire is to fix jobs for osla I'd say delete the jobs instead16:05
*** sshnaidm is now known as sshnaidm|afk16:08
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093516:09
toskyclarkb: yes, sorry, I should have provided more details (it's a follow-up from a few days ago); it's about removing or porting a legacy job16:12
*** aedc_ has joined #openstack-infra16:24
*** aedc_ has quit IRC16:24
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/74093516:25
*** aedc has quit IRC16:27
zbrclarkb: fungi : https://github.com/pypa/warehouse/issues/8260 -- i knew it was too good to be true. every time i have to interact with him, i regret it.16:46
*** sshnaidm|afk is now known as sshnaidm16:48
fungii think they're being reasonable on that. pypi is funded by psf, which has basically no budget and something like 1.5 full time engineers managing all of their infrastructure. the cdn is generously donated by fastly, so it's not like they have any leverage if it's down occasionally. this is just one of a vast number of ways pip install can encounter a failure, and it already has a number of general16:51
fungimitigations built in (in the cases we logged, it retried the request 6 times with an escalating backoff timer over a couple of minutes)16:51
*** nightmare_unreal has quit IRC16:52
fungias i noted in my comment, trying to engineer solutions for an ever increasing number of corner cases increases the complexity of your software until that complexity becomes a liability and contributes to worse outages16:52
fungiwe had fewer request failures when we maintained a full mirror, but we decided to switch to a caching proxy even though we knew it would at least slightly decrease reliability, because it also significantly reduced the maintenance cost16:53
*** jackedin has joined #openstack-infra16:54
fungithese community services (our ci system and pypi as well) have to balance maintenance burden against reliability16:54
fungii also think it was not at all nice of you to make comments like 'a Netflix engineer would not have closed the bug with a "not my problem"'16:55
fungithese are people striving to make the best choices for their community with the funds and time available to them16:56
*** ociuhandu_ has joined #openstack-infra16:58
*** derekh has quit IRC16:58
*** yamamoto has joined #openstack-infra17:01
*** ociuhandu has quit IRC17:01
*** ociuhandu_ has quit IRC17:02
*** dtantsur is now known as dtantsur|afk17:31
*** marios|out has quit IRC17:35
zbri see things a bit differently here; nobody asked fastly how to resolve this, nor did we ask them if they could provide another endpoint.17:47
*** yamamoto has quit IRC17:53
*** piotrowskim has quit IRC17:56
zbrat this point we have proof that it happened 6 times on one mirror in two weeks. Can we start logging 503s for a longer period of time? what is the number of failures over 6 months, for example?17:57
*** ralonsoh has quit IRC17:57
zbrtwo weeks on a single host is not enough to get an estimate for the number of failures per year17:58
fungithose requests spanned a few minutes, so were part of a single incident, not six17:58
fungii'm pretty sure that the pypi maintainers are happy that their content is accessible most of the time in most of the world. they're not accountable to you or me or anyone else. they're providing a public service run on donated funding and donated infrastructure, and it's up to their donors to determine where that time and money are best applied. sometimes it's down. sometimes the internet is down18:03
fungitoo. there are likely much larger problems they'd rather focus on fixing, and that's their prerogative18:03
clarkbright, we've sort of accepted there is a certain degree of unreliability inherrent to the system18:03
clarkbthis is why we cache and mirror as we are able18:04
clarkbit's not perfect but it helps and when things sneak through we can report them to see if upstream can fix them. If not, oh well18:04
*** xek has joined #openstack-infra18:04
zbrok. so no action, even if i would have preferred to have a way to log such occurrences for longer periods of time.18:05
fungilog analysis might be warranted to determine the degree of unreliability so that choices we make can be reevaluated, but i wouldn't take one incident as a signal that it's worth the time to perform that deeper level of investigation18:06
zbrfungi: the reason why i spent this amount of time on this today is that it is not the first time I have seen this error.18:06
zbrusually on the first occurrence I just... recheck18:07
zbrbut i remember seeing exactly this 503 on pythonhosted before. obviously I have no logs to prove it.18:07
clarkbfwiw we DO have a method of tracking these problems. the unfortunate reality is those tools have languished because few use them and they process terabytes of data constantly and as a result need care18:07
*** d34dh0r53 has quit IRC18:08
fungiit apparently impacted two builds in the span of two weeks from what i could see. out of, i don't know how many builds total but we did a *lot* likely tens of thousands. the percentage failure from this occurrence is disproportionate to the time invested in trying to "solve" it18:08
zbrcurrent logstash instance is not good for this as we feed it too much data; we need something where we cherry-pick and log specific events, and keep them for like 1-2 years.18:09
clarkbzbr: it would be trivial to record occurence data over the long term via logstash queries18:09
zbrbasically it is the same tool, but with a very different usage approach.18:09
clarkbbasically once a week record the number of instances18:09
clarkbthen log that over time18:09
fungias long as we know what to look for (say based on elastic-recheck queries getting maintained regularly)18:10
zbrhmm, that would be interesting.18:10
clarkbfungi: yes exactly18:10
zbrlike a log-compress filter18:11
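A sketch of the periodic count clarkb describes, using the Elasticsearch _count API; the host, index pattern and field name are guesses and would need to match the actual logstash deployment:

    # count occurrences of the pypi 503 signature over the last 7 days
    curl -s -H 'Content-Type: application/json' \
      'http://logstash.example.org:9200/logstash-*/_count' -d '{
        "query": {
          "bool": {
            "must": [
              {"match_phrase": {"message": "too many 503 error responses"}},
              {"range": {"@timestamp": {"gte": "now-7d"}}}
            ]
          }
        }
      }'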
*** d34dh0r53 has joined #openstack-infra18:15
*** jackedin has quit IRC18:29
*** rmcall has joined #openstack-infra18:30
*** xarses has quit IRC18:35
*** xarses has joined #openstack-infra18:35
*** eolivare has quit IRC18:37
*** ricolin_ has quit IRC18:41
*** slaweq has joined #openstack-infra18:54
*** slaweq has quit IRC18:56
*** ramishra has quit IRC18:58
*** ociuhandu has joined #openstack-infra19:14
*** ociuhandu has quit IRC19:23
*** ramishra has joined #openstack-infra19:25
*** soniya29 has quit IRC19:32
*** yamamoto has joined #openstack-infra19:51
*** auristor has quit IRC20:38
*** auristor has joined #openstack-infra20:40
*** yamamoto has quit IRC20:51
openstackgerritIan Wienand proposed zuul/zuul-jobs master: upload-afs-synchronize: expand documentation  https://review.opendev.org/74105120:52
*** markvoelker has joined #openstack-infra21:23
openstackgerritMerged zuul/zuul-jobs master: write-inventory: add per-host variables  https://review.opendev.org/73989221:26
*** markvoelker has quit IRC21:27
*** yamamoto has joined #openstack-infra21:35
*** krotscheck has joined #openstack-infra21:46
*** vapjes has joined #openstack-infra21:53
*** jamesmcarthur has joined #openstack-infra22:32
*** vishalmanchanda has quit IRC22:39
*** rcernin has joined #openstack-infra22:44
*** armax has joined #openstack-infra22:47
*** tosky has quit IRC22:50
*** rcernin has quit IRC22:51
*** rcernin has joined #openstack-infra22:51
*** tkajinam has joined #openstack-infra22:58
*** hamalq has quit IRC22:58
*** yamamoto has quit IRC23:02
*** armax has quit IRC23:12
*** yamamoto has joined #openstack-infra23:33
*** jamesmcarthur has quit IRC23:36
*** xek has quit IRC23:36
*** jamesmcarthur has joined #openstack-infra23:48
