Wednesday, 2018-11-07

ianwok, so it's failing on "TASK [cloud-launcher : Processing keypair infra-root-keys for openstackzuul-citycloud]"00:01
clarkbianw: ok for that one I think we have to delete the existing keys00:01
clarkbI recall running into that when spinning up packethost? basically we changed the value in yaml but the launcher doesn't know how to do a delete/add to update00:02
ianwclarkb: oh, it's an authentication issue (at least the first problem is :)00:02
clarkbianw: if you delete it nodepool won't be able to boot instances while its gone but citycloud is disabled anyway00:02
clarkbah00:02
ianwi guess we *also* want the region00:03
ianwin the task message00:03
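
Going back to the keypair problem above: a minimal sketch of cycling the key by hand with the openstack CLI, since the launcher can't do a delete/add update itself (cloud and key names are the ones discussed; the public-key path is illustrative):

    OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml \
        openstack --os-cloud=openstackzuul-citycloud --os-region-name Lon1 \
        keypair delete infra-root-keys
    # re-create it with the new public key so the next launcher run is a no-op
    OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml \
        openstack --os-cloud=openstackzuul-citycloud --os-region-name Lon1 \
        keypair create --public-key ~/.ssh/infra-root-keys.pub infra-root-keys
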
*** betherly has quit IRC00:04
clarkbhttp://graphite.openstack.org/render/?from=20181019&until=20181106&target=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.00001666666) illustrates the queue improvements we've seen00:07
clarkbthank you tripleo!00:07
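
The graphite target in that URL can also be pulled as JSON for ad-hoc checks; a small sketch (the scale factor converts the millisecond timer to roughly minutes, as in the link above):

    curl -s 'http://graphite.openstack.org/render/?from=20181019&until=20181106&format=json&target=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.00001666666)' \
        | python -m json.tool | head -n 40
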
openstackgerritIan Wienand proposed openstack/ansible-role-cloud-launcher master: Add region to task output  https://review.openstack.org/61603500:11
ianwclarkb: ok, that will put info in the launcher, but i'll look at manual auth now00:12
*** slaweq has quit IRC00:12
*** bobh has joined #openstack-infra00:12
*** slaweq has joined #openstack-infra00:13
ianwyep, so even "OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml ./env/bin/openstack --os-cloud=openstackzuul-citycloud --os-region Lon1 server list" doesn't work right now00:13
clarkbhrm did openstacksdk 0.19.0 install properly?00:14
clarkbit's there00:15
ianwyeah, this is my virtualenv with that00:15
ianwmordred: ? ^00:15
clarkbianw: it works with my venv00:16
clarkb/home/clarkb/venv/bin/openstack if you want to cross check that00:16
*** bobh has quit IRC00:17
ianwhrm00:17
ianw./env-ansible-devel/bin/pip list | grep openstacksdk00:17
ianwopenstacksdk           0.19.000:17
ianw$ /home/clarkb/venv/bin/pip list | grep openstacksdk00:18
ianwopenstacksdk           0.17.200:18
clarkbhrm00:18
clarkbcould be a new regression in the sdk?00:18
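
One way to rule the sdk version in or out is to line the two virtualenvs up and re-test; a sketch using the paths mentioned above:

    # bring the older venv up to the same openstacksdk release
    /home/clarkb/venv/bin/pip install -U 'openstacksdk==0.19.0'
    OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml \
        /home/clarkb/venv/bin/openstack --os-cloud=openstackzuul-citycloud \
        --os-region-name Lon1 server list
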
ianwi feel like I've journeyed a long way from trying to stand up a mirror server in the new arm cloud :)00:19
clarkbianw: fwiw, I would not be opposed to you running cloud launcher against the armci cloud alone and then setting up a mirror there00:20
clarkbianw: and we can work the larger more general problem in parallel (with mordred hopefully as he is able)00:20
ianwyeah, that's an option00:22
clarkbI'm going to update my venv now and run with debug output from openstackclient00:22
clarkbianw: hrm it actually works with 0.19.0 for me too00:23
clarkbbug in another library that didn't get updated maybe?00:23
ianw:/ ... ok, i have debugging output too, want to be careful with it that i don't post a password :)00:23
ianwit's talking to "Starting new HTTPS connection (1): lon1.citycloud.com:5000"00:24
clarkbianw: feel free to use my venv as needed to debug if you decide to do that00:25
ianwclarkb: it doesn't work for me using your venv00:26
clarkbI'm getting sucked into election watching. And I can always recreate the venv if I need to00:26
clarkbianw: hrm something external to the venv then??00:26
ianwOS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml /home/clarkb/venv/bin/openstack --os-cloud=openstackzuul-citycloud -vvv --os-region Lon1 server list00:26
clarkboh! I'm using openstackci-citycloud, not openstackzuul-citycloud00:26
clarkbok now it doesn't work00:26
clarkbso difference in those two accounts00:27
ianwhrm, then i wonder if we just have a wrong password00:27
clarkbmaybe?00:27
mordredclarkb: what did I do?00:27
clarkbmordred: maybe nothing? citycloud still not working quite right, but we've narrowed it down to openstackzuul-citycloud. openstackci-citycloud works00:28
ianwmordred: can you make OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml /home/ianw/env/bin/openstack --os-cloud=openstackzuul-citycloud --os-region Lon1 server list work? :)00:28
*** ssbarnea has quit IRC00:28
ianwdid the password change maybe?00:28
mordredoh - ... I think the project id for lon is different00:28
clarkbianw: we've been in hibernation with citycloud so it is possible00:28
mordredI think I remember seeing that this was the case when I was logged in to their web interface00:29
mordredor maybe the domain id - or something00:29
clarkbswitch to name maybe?00:29
openstackgerritMerged openstack/ansible-role-cloud-launcher master: Add region to task output  https://review.openstack.org/61603500:29
mordredyeah - I think that's what I did for the other one I updated ... because the names are consistent across the regions00:30
mordred    Additionally, Sto2 and Lon1 each have different domain ids. The domain00:30
ianwok, pulling up webui ...00:30
mordred    names are the same though - and that's good, because logical names are00:30
mordred    nicer in config files anyway.00:30
*** gfidente|afk has quit IRC00:31
mordredyes - that will be it - update project_id project_domain_id and user_domain_id to be the name varieties00:31
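
A rough sketch of what that name-based change might look like, with a re-test; the project and domain names below are placeholders to be read off the webui, not the real values:

    # hypothetical fragment of /etc/openstack/all-clouds.yaml:
    #   openstackzuul-citycloud:
    #     auth:
    #       auth_url: https://lon1.citycloud.com:5000/v3
    #       project_name: <project name from the webui>
    #       project_domain_name: <domain name from the webui>
    #       user_domain_name: <domain name from the webui>
    OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml \
        openstack --os-cloud=openstackzuul-citycloud --os-region-name Lon1 server list
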
mordredsorry I didn't catch that in the other patch00:32
ianwright yeah they're all CCP_Domain_27611 ?00:32
mordredI don't know if the openstackzuul user is in the same domain00:33
mordredI'm *certain* that 'Default Project 27611' will not be the project00:33
clarkbI think each user/account has a different domain there00:33
ianwmordred: from the webui, i think so -- both users seem to be there00:33
clarkbianw: oh, you know what it is: we have one account with two users in the same domain00:34
clarkbmaybe00:34
mordredyah00:34
mordredand probably 2 different projects00:34
mordredwhich have a selector up top00:34
clarkbya00:34
ianwall the endpoints https://sto2.citycloud.com:8776/v2/65684... end with the same digits00:35
ianwis that the project?00:35
mordredyah - that'll be the project_id00:35
*** mriedem has quit IRC00:36
*** longkb has joined #openstack-infra00:36
ianwthat's not private is it?00:36
mordredit's not00:36
mordredthe project name will be up in the top - it's a little harder to find00:36
* mordred has to run - but is very confident this will fix things00:38
*** irclogbot_1 has quit IRC00:38
ianwhrm, i see where that projectid comes from ... bed89257500340af8d0fbe7141b1bfd6 ... but that's the one not working00:38
clarkbwell if domain is wrong the project id wont matter00:39
ianwand it's different from the one listed at the end of the endpoint urls00:39
ianwhang on they have an rc file ... maybe this will help00:41
ianwok, i think so, testing ...00:42
ianwyep, ok, got it00:45
clarkbyay00:46
ianwoh haha it's already ok nodepool config00:48
openstackgerritIan Wienand proposed openstack-infra/system-config master: Update citycloud project details  https://review.openstack.org/61603900:50
ianwclarkb / mordred : ^00:50
clarkb+200:51
clarkbany other infra-root around to do second eyeball on ^00:51
jheskethclarkb, ianw: yep, taking a look00:51
*** sthussey has quit IRC00:52
jheskethlgtm00:53
ianwthanks.  i'm going to grab some lunch and when i come back will check on cloud launcher again ...00:58
*** Swami has quit IRC01:01
*** fresta has quit IRC01:38
*** fresta has joined #openstack-infra01:40
*** yamamoto has joined #openstack-infra01:46
*** fresta has quit IRC01:47
*** fresta has joined #openstack-infra01:51
*** rlandy has quit IRC01:58
*** markvoelker has quit IRC02:00
*** markvoelker has joined #openstack-infra02:00
*** markvoelker has quit IRC02:02
*** armax has quit IRC02:10
*** dave-mccowan has joined #openstack-infra02:16
*** dayou has quit IRC02:20
*** dayou has joined #openstack-infra02:21
*** dayou has quit IRC02:25
*** dayou has joined #openstack-infra02:26
*** dave-mccowan has quit IRC02:27
*** dayou has quit IRC02:29
*** dayou has joined #openstack-infra02:33
*** annp has joined #openstack-infra02:34
*** dayou has quit IRC02:34
*** dayou has joined #openstack-infra02:36
*** dayou has quit IRC02:37
*** dayou has joined #openstack-infra02:38
*** dayou has quit IRC02:42
*** dayou has joined #openstack-infra02:44
*** dayou has quit IRC02:45
*** dayou has joined #openstack-infra02:46
*** dayou has quit IRC02:47
*** mrsoul has quit IRC02:48
*** dayou has joined #openstack-infra02:49
*** yamamoto has quit IRC02:49
*** felipemonteiro has joined #openstack-infra02:51
*** dayou has quit IRC03:05
*** diablo_rojo has quit IRC03:06
*** hongbin has joined #openstack-infra03:12
*** felipemonteiro has quit IRC03:16
*** dayou has joined #openstack-infra03:20
*** dave-mccowan has joined #openstack-infra03:24
ianwhrm, i'm not sure we haven't broken the system-config gate now03:33
*** dave-mccowan has quit IRC03:36
*** felipemonteiro has joined #openstack-infra03:37
ianwhttp://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/job-output.txt.gz03:37
ianw[WARNING]:  * Failed to parse /opt/system-config/inventory/openstack.yaml with03:38
ianw2018-11-07 01:00:14.693975 | bridge.openstack.org | openstack plugin: Auth plugin requires parameters which were not given:03:38
ianw2018-11-07 01:00:14.694008 | bridge.openstack.org | auth_url03:38
clarkbhrm we don't want to use the openstack plugin there do we? the inventory is provided by the job itself03:40
clarkbweird that it started later though03:40
clarkbianw: thats part of the ansible config that says which plugins to enable iirc03:41
*** kota_ has quit IRC03:41
ianwyeah, i dunno, it's maybe working now ...03:42
clarkbianw: ok we copy ./playbooks/roles/install-ansible/files/ansible.cfg to /etc/ansible/ansible.cfg. Then update the ini values to remove openstack.yaml?03:44
clarkbhttp://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ara-report/file/16abe354-93fb-4feb-a763-838fed1379b6/#line-24 I'm guessing that didn't work for some reason?03:45
clarkbunfortunately the logging there is light03:45
clarkbianw: http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ansible/ansible.cfg we seem to log that file and it doesn't look updated03:46
clarkbthat is really weird03:46
clarkbara indicates it did what we wanted03:47
clarkbbug in ansible maybe?03:47
*** owalsh_ has joined #openstack-infra03:47
ianwi'm not sure, i think that the openstack plugin should have read the file, but didn't, giving that error with auth_url03:48
ianwmaybe it was an old openstacksdk03:49
clarkbianw: well I don't think we are supposed to use the openstack plugin at all03:49
clarkbthat is what http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ara-report/file/16abe354-93fb-4feb-a763-838fed1379b6/#line-24 should change (it writes a new inventory option value)03:49
*** rkukura has quit IRC03:50
clarkbbut if you look at the file value we log for that job it's still got the original inventory option value03:50
ianwoh right, ok03:50
ianwwell, it all just passed, so it's non-deterministic :/03:50
*** owalsh has quit IRC03:51
clarkbya none of the three options we should update appear to have been changed there03:51
*** bhavikdbavishi has joined #openstack-infra03:52
clarkblooks like they try really hard to make that an atomic update03:57
clarkbI wonder if it is buggy03:57
clarkbI take that back. This ran in ansible 2.5.11, which doesn't do the atomic move04:01
clarkbianw: I wonder if it is a sync issue? we read the file, update the contents and write it back then we do that over and over to do each individual option04:06
clarkbif we somehow read back something that hasn't been updated on the write side?04:06
openstackgerritMerged openstack-infra/system-config master: Update citycloud project details  https://review.openstack.org/61603904:08
clarkbianw: if it persists maybe we should use sed :/04:12
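
A blunt sketch of what that sed fallback could look like, should the ini_file edits keep racing (the replacement value is illustrative, not the job's real setting):

    sudo sed -i 's|^inventory *=.*|inventory = /opt/system-config/inventory|' /etc/ansible/ansible.cfg
    # verify the rewrite stuck
    grep -E '^(inventory|enable_plugins)' /etc/ansible/ansible.cfg
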
*** felipemonteiro has quit IRC04:15
*** yamamoto has joined #openstack-infra04:21
*** bhavikdbavishi has quit IRC04:30
*** felipemonteiro has joined #openstack-infra04:31
*** roman_g has quit IRC04:35
*** felipemonteiro has quit IRC04:43
*** hongbin has quit IRC04:47
*** felipemonteiro has joined #openstack-infra04:47
*** felipemonteiro has quit IRC04:53
*** noama has joined #openstack-infra04:54
*** annp has quit IRC05:00
*** longkb has quit IRC05:00
*** longkb has joined #openstack-infra05:00
*** agopi has joined #openstack-infra05:08
ianwit looks like 2604:e100:1:0:f816:3eff:fe05:7ce0 is hanging the ansible run05:09
ianwit's planet.o.o05:11
ianw$ ssh ianw@planet.openstack.org05:11
ianwssh_exchange_identification: read: Connection reset by peer05:11
ianwor it hangs; but apache is responding05:12
ianwplanet01 login: [195722.883411] INFO: task jbd2/vda1-8:287 blocked for more than 120 seconds.05:16
*** hamzy has joined #openstack-infra05:18
*** armax has joined #openstack-infra05:20
ianwlooks like it's dead on the remote end05:20
ianwinteresting, i can't seem to get to vexxhost.com ... can anyone else?05:21
clarkbit pings but no http(s)05:22
bkeroI get a HTTP connection and SSL negotiation05:24
clarkbianw: those hosts are boot-from-volume on ceph, right? though I don't know if planet is. But maybe ceph had a sad05:24
ianwclarkb: i'm not sure; planet.o.o is the only control-plane server we have in vexxhost05:27
ianwi tried to reboot it, and it's gone into an error state05:27
ianw{'message': 'internal error: process exited while connecting to monitor', 'code': 500, 'created': '2018-11-07T05:20:07Z'}05:28
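
A sketch of pulling the same state and fault details via the CLI; the cloud name here just follows the openstackci-<provider> naming used elsewhere and may not match the real entry, and the fault column only appears for errored instances:

    openstack --os-cloud=openstackci-vexxhost server show planet01.openstack.org \
        -c status -c fault
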
*** rascasoft has quit IRC05:28
*** pall has quit IRC05:28
clarkbya the vda message makes me think that maybe the backing store is unhappy05:29
ianwi'm seeing a cloudflare error for vexxhost.com now.  so i'm guessing something is wrong :/05:29
ianw#status log planet.o.o shutdown and in error state, vexxhost.com currently not responding (planet.o.o is hosted in ca-ymq-2)05:30
openstackstatusianw: finished logging05:30
ianwi guess what i'll do is put it in the emergency file.  hopefully it doesn't require being rebuilt but i'm not sure what else to do right now05:30
*** rascasoft has joined #openstack-infra05:31
ianwfor reference, here is the status of the server after i rebooted it -> http://paste.openstack.org/show/734320/05:31
ianwand here is the log of it going down -> http://paste.openstack.org/show/734321/05:32
clarkbseems like a reasonable step and we can check in with mnaser when he is around05:32
ianwhrm, the emergency.yaml file is empty05:33
ianwi'm not sure that's right05:33
clarkbianw: iirc mordred did that because all the emergency file hosts were in the disabled list in the yaml groups file05:34
clarkbso we didn't need anything in the emergency file?05:34
clarkbit is getting late for me though and election things are wrapping up so I expect to be afk soon05:34
ianwoh, ok, so we just add them as an entry under disabled?05:34
clarkbya or you can add an emergency file for more temporary things05:35
clarkb(since emergency file doesn't go thorugh code review)05:35
ianwwhy is /etc/ansible/groups.yaml in the old format?05:37
ianwhosts/groups.yaml ...05:38
clarkbianw: I think that file is ignored (we should probably clean that up); the ansible.cfg points to other files (those in /opt/system-config/inventory)05:38
ianwohhh, right05:39
ianwso we're not deploying it, i must have missed that bit05:39
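
A sketch of the two options being discussed, with hypothetical paths and entries (the real file locations and formats are whatever bridge already uses):

    # option 1: add the host to the reviewed "disabled" group, e.g. in
    #   /opt/system-config/inventory/groups.yaml:
    #     disabled:
    #       hosts:
    #         planet01.openstack.org: {}
    # option 2: for something quick and unreviewed, list it in the emergency file
    # either way, confirm ansible now sees it as disabled:
    ansible disabled --list-hosts
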
ianw#status log planet01.o.o in the emergency file, pending investigation with vexxhost05:42
openstackstatusianw: finished logging05:42
*** wznoinsk has quit IRC05:46
*** jrist has quit IRC05:58
openstackgerritIan Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job  https://review.openstack.org/61489405:58
openstackgerritIan Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk  https://review.openstack.org/61598205:58
*** d0ugal has quit IRC06:05
*** jrist has joined #openstack-infra06:11
*** dayou has quit IRC06:14
*** dayou has joined #openstack-infra06:15
*** haleyb has quit IRC06:20
*** d0ugal has joined #openstack-infra06:21
*** apetrich has quit IRC06:29
*** apetrich has joined #openstack-infra06:44
*** felipemonteiro has joined #openstack-infra06:50
*** bhavikdbavishi has joined #openstack-infra06:52
*** andreaf has quit IRC06:53
*** andreaf has joined #openstack-infra06:55
*** diablo_rojo has joined #openstack-infra06:56
AJaegerconfig-core, could I get a second +2 on https://review.openstack.org/615637 (clarkb already reviewed) - removes now obsolete publish-static jobs from project-config, please?06:58
AJaegerianw: I left a question on https://review.openstack.org/#/c/615698/5/roles/validate-host/library/zuul_network_validate.py for you07:07
*** bhavikdbavishi has quit IRC07:10
*** quiquell|off is now known as quiquell07:10
*** dpawlik has joined #openstack-infra07:11
*** bhavikdbavishi has joined #openstack-infra07:13
*** alexchadin has joined #openstack-infra07:15
*** dpawlik has quit IRC07:17
*** maciejjozefczyk has joined #openstack-infra07:17
*** dpawlik has joined #openstack-infra07:17
openstackgerritMerged openstack-infra/project-config master: Remove publish-static  https://review.openstack.org/61563707:18
*** rkukura has joined #openstack-infra07:30
*** ccamacho has joined #openstack-infra07:31
*** diablo_rojo has quit IRC07:32
*** apetrich has quit IRC07:33
*** pcaruana has joined #openstack-infra07:36
*** bhavikdbavishi has quit IRC07:37
*** aojea has joined #openstack-infra07:42
*** ifat_afek has joined #openstack-infra07:43
*** yamamoto has quit IRC07:43
*** apetrich has joined #openstack-infra07:47
*** e0ne has joined #openstack-infra07:49
*** rkukura has quit IRC07:49
*** rkukura has joined #openstack-infra07:49
*** quiquell is now known as quiquell|brb07:49
*** rkukura has quit IRC07:49
*** yamamoto has joined #openstack-infra07:57
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: doc: fix typo in secret example  https://review.openstack.org/61609508:02
*** jtomasek has joined #openstack-infra08:02
*** quiquell|brb is now known as quiquell08:04
*** felipemonteiro has quit IRC08:06
ifat_afekHi, IRC bot seems to be down on #openstack-meeting-4. The Vitrage meeting is not being logged08:07
AJaegerifat_afek: you added an extra space before "#startmeeting" - try again without08:08
AJaegerifat_afek: see http://eavesdrop.openstack.org/irclogs/%23openstack-meeting-4/%23openstack-meeting-4.2018-11-07.log for your invocation08:09
ifat_afekAJaeger: you are probably right, I’ll try again. Thanks!08:09
AJaegerifat_afek: yes, try again...08:10
*** lpetrut has joined #openstack-infra08:20
*** ginopc has joined #openstack-infra08:20
*** bhavikdbavishi has joined #openstack-infra08:23
ianwclarkb: "localhost                  : ok=601  changed=13   unreachable=0    failed=0    skipped=1535" ... yay good cloud launcher run!  but that's using ansible from devel.  so we either need to look into the snat thing or think about https://review.openstack.org/#/c/614894/ and pin to non-release version08:30
ianwAJaeger: thanks, i'll have to think about your comment :)08:30
ianw... till tomorrow, i'm out!08:30
AJaegerianw: good night!08:32
*** ralonsoh has joined #openstack-infra08:33
*** dayou has quit IRC08:35
*** ginopc has quit IRC08:36
*** bhavikdbavishi has quit IRC08:36
*** roman_g has joined #openstack-infra08:47
*** gfidente has joined #openstack-infra08:50
*** jpena|off is now known as jpena08:50
*** dims has quit IRC08:52
*** dims has joined #openstack-infra08:53
*** dims has quit IRC08:58
*** dims has joined #openstack-infra08:59
*** jpich has joined #openstack-infra09:01
openstackgerritMerged openstack-infra/yaml2ical master: update the versions of python 3 claimed  https://review.openstack.org/61526609:04
*** dayou has joined #openstack-infra09:04
*** kukacz has quit IRC09:04
*** kukacz has joined #openstack-infra09:05
*** alexchadin has quit IRC09:06
openstackgerritMerged openstack-infra/yaml2ical master: add base class for recurrence  https://review.openstack.org/61526709:11
openstackgerritMerged openstack-infra/yaml2ical master: add day_specifier to recurrence  https://review.openstack.org/61526809:11
*** yamamoto has quit IRC09:14
*** ssbarnea has joined #openstack-infra09:18
*** shardy has joined #openstack-infra09:19
*** owalsh_ is now known as owalsh09:23
*** jtomasek has quit IRC09:26
*** electrofelix has joined #openstack-infra09:29
*** jtomasek has joined #openstack-infra09:31
*** ttx has quit IRC09:37
*** dayou has quit IRC09:38
*** dpawlik has quit IRC09:38
*** yamamoto has joined #openstack-infra09:41
*** dpawlik has joined #openstack-infra09:42
*** panda|off is now known as panda09:43
*** derekh has joined #openstack-infra09:44
*** ttx has joined #openstack-infra09:50
*** kopecmartin|off is now known as kopecmartin09:59
*** sshnaidm|afk is now known as sshnaidm|rover10:01
*** e0ne has quit IRC10:02
*** e0ne has joined #openstack-infra10:04
*** dayou has joined #openstack-infra10:05
*** longkb has quit IRC10:13
*** rfolco|ruck has joined #openstack-infra10:36
*** yamamoto has quit IRC10:39
*** ginopc has joined #openstack-infra10:39
*** e0ne has quit IRC10:45
*** dtantsur|afk is now known as dtantsur10:47
*** admcleod has joined #openstack-infra10:47
*** admcleod has quit IRC10:48
*** admcleod has joined #openstack-infra10:49
*** alexchadin has joined #openstack-infra10:50
*** bhavikdbavishi has joined #openstack-infra10:51
*** priteau has joined #openstack-infra10:53
*** e0ne has joined #openstack-infra10:54
*** admcleod has quit IRC10:54
*** admcleod has joined #openstack-infra10:59
*** shrasool has joined #openstack-infra11:02
*** yamamoto has joined #openstack-infra11:16
*** bhavikdbavishi has quit IRC11:24
*** yamamoto has quit IRC11:25
*** florianf has quit IRC11:30
*** ssbarnea has quit IRC11:34
*** florianf has joined #openstack-infra11:48
*** ssbarnea has joined #openstack-infra11:53
*** bhavikdbavishi has joined #openstack-infra11:59
*** e0ne has quit IRC12:04
*** e0ne has joined #openstack-infra12:08
*** e0ne has quit IRC12:09
*** pbourke has quit IRC12:11
*** dpawlik has quit IRC12:11
*** pbourke has joined #openstack-infra12:11
*** dpawlik has joined #openstack-infra12:12
*** yamamoto has joined #openstack-infra12:12
*** dpawlik has quit IRC12:16
*** florianf has quit IRC12:16
openstackgerritStephen Finucane proposed openstack-infra/git-review master: As suggested by pep8 don't compare boolean values or empty sequences  https://review.openstack.org/22117212:16
*** alexchadin has quit IRC12:17
openstackgerritDmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces  https://review.openstack.org/39178712:20
*** snapiri has joined #openstack-infra12:22
openstackgerritStephen Finucane proposed openstack-infra/git-review master: option.compatible as a config variable This is to avoid the annoying use of '-c' option with old gerrit servers. Usage doc has been updated to include '-c' option  https://review.openstack.org/44427412:24
openstackgerritStephen Finucane proposed openstack-infra/git-review master: Fix log command used to count refs to be submitted.  https://review.openstack.org/33788312:25
*** e0ne has joined #openstack-infra12:26
openstackgerritStephen Finucane proposed openstack-infra/git-review master: Fix "git-review -d" erases work directory if on the same branch as the change downloaded  https://review.openstack.org/39977912:27
*** ifat_afek has quit IRC12:37
*** ifat_afek has joined #openstack-infra12:38
*** dpawlik has joined #openstack-infra12:39
*** dpawlik has quit IRC12:39
*** dpawlik has joined #openstack-infra12:40
*** ifat_afek has quit IRC12:43
*** sshnaidm|rover is now known as sshnaidm|afk12:50
*** jpena is now known as jpena|lunch12:53
*** rlandy has joined #openstack-infra12:54
coreycbclarkb: AJaeger: if you have a moment can you respond to this thread? http://lists.openstack.org/pipermail/openstack-dev/2018-November/136337.html12:57
*** shrasool has quit IRC13:06
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for mirror-update.o.o  https://review.openstack.org/61599113:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for rax mirror  https://review.openstack.org/61599213:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all mirrors  https://review.openstack.org/61599313:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for files.openstack.org  https://review.openstack.org/61599413:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for on zookeeper instance  https://review.openstack.org/61599513:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances  https://review.openstack.org/61599613:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver  https://review.openstack.org/61599813:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers  https://review.openstack.org/61599913:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver  https://review.openstack.org/61600013:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev  https://review.openstack.org/61600113:13
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid  https://review.openstack.org/61600213:13
*** jpena|lunch is now known as jpena13:18
*** trown|outtypewww is now known as trown13:21
*** chem has joined #openstack-infra13:23
chemHi, I don't get why https://review.openstack.org/#/c/611677/ doesn't go through the gate.  Should I change the depends-on to point only to the master patch ?13:25
AJaegerchem: new syntax is "depends-on: https://review.openstack.org/616666". It only merges if *all* dependencies are in.13:27
AJaegerchem: and your change has a dependency on a stable change via the Change-Id form.13:27
AJaegerchem: so, it depends on what you want - wait for all of them, or depend on only a single change (or multiple ones with multiple depends-on lines)13:27
chemAJaeger: Thanks for the confirmation.  Will use the new syntax from now on for this kind of patch.13:28
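
A quick sketch of switching a change over to the URL form of Depends-On (the review number in the footer is only an example):

    git commit --amend
    # in the editor, use a URL footer instead of the Change-Id form, e.g.
    #   Depends-On: https://review.openstack.org/616666
    git review
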
*** bhavikdbavishi has quit IRC13:29
*** ansmith has joined #openstack-infra13:30
*** dpawlik has quit IRC13:35
*** dpawlik has joined #openstack-infra13:35
*** dpawlik has quit IRC13:40
*** e0ne has quit IRC13:41
*** bobh has joined #openstack-infra13:42
*** shrasool has joined #openstack-infra13:45
*** sthussey has joined #openstack-infra13:45
*** pcaruana has quit IRC13:45
*** e0ne has joined #openstack-infra13:45
*** agopi has quit IRC13:47
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one zookeeper instance  https://review.openstack.org/61599513:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances  https://review.openstack.org/61599613:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver  https://review.openstack.org/61599813:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers  https://review.openstack.org/61599913:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver  https://review.openstack.org/61600013:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev  https://review.openstack.org/61600113:50
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid  https://review.openstack.org/61600213:50
*** kgiusti has joined #openstack-infra13:50
openstackgerritLuka Peschke proposed openstack-infra/irc-meetings master: Creating an IRC meeting for CloudKitty  https://review.openstack.org/61620513:51
dhellmannsmcginnis , fungi, AJaeger: we had another rsync error in the releases repo publish job. I thought we had eliminated all of those, so I wonder if this is somehow due to switching to the new job? http://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/13:52
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620613:57
*** mriedem has joined #openstack-infra13:58
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620613:59
*** pcaruana has joined #openstack-infra14:00
openstackgerritSlawek Kaplonski proposed openstack-infra/irc-meetings master: Add ralonsoh to chairing neutron-qos meeting  https://review.openstack.org/61620814:03
*** shrasool has quit IRC14:03
smcginnisdhellmann: I didn't think we actually did something to fix those. There was a thought that the way zuul would run things it would be less likely, but I assumed we hadn't seen it in a while just because we didn't release two branches of the same deliverable at the same time lately.14:08
dhellmannok, I couldn't remember what changed but I thought it was fixed14:09
dhellmannI'm not sure why releasing different branches would matter, since it's always master in that repo14:09
smcginnisSomething about the local paths ending up the same or something. My memory of it is a bit vague.14:10
dhellmannhmm14:10
dhellmannok14:10
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620614:11
*** e0ne has quit IRC14:13
*** e0ne has joined #openstack-infra14:14
*** agopi has joined #openstack-infra14:17
openstackgerritMerged openstack-infra/nodepool master: Correct heading levels for Kubernetes config docs  https://review.openstack.org/61600714:20
*** rh-jelabarre has joined #openstack-infra14:21
*** ifat_afek has joined #openstack-infra14:22
*** yamamoto has quit IRC14:29
efriedAnyone else having log server woes? Some (but not all) requests to the log server just spin and eventually time out? E.g. http://logs.openstack.org/24/615724/2/check/openstack-tox-lower-constraints/a135914/14:32
smcginnisHmm, first I've seen of that, but I get the same thing with that link.14:32
efriedIt happened briefly yesterday, then seemed to clear up.14:33
*** pcaruana has quit IRC14:33
efriedand I thought it was happening on some links and not others, but maybe it's just that it's intermittent.14:33
*** pcaruana has joined #openstack-infra14:34
cmurphyyeah it's been kind of slowish for me14:35
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances  https://review.openstack.org/61599614:38
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver  https://review.openstack.org/61599814:38
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers  https://review.openstack.org/61599914:38
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver  https://review.openstack.org/61600014:38
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev  https://review.openstack.org/61600114:38
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid  https://review.openstack.org/61600214:38
*** florianf has joined #openstack-infra14:40
*** bobh has quit IRC14:42
*** jcoufal has joined #openstack-infra14:43
*** bobh has joined #openstack-infra14:50
*** yamamoto has joined #openstack-infra14:52
*** jistr is now known as jistr|call15:00
*** priteau has quit IRC15:01
*** priteau has joined #openstack-infra15:03
*** anteaya has joined #openstack-infra15:08
*** rpioso|afk is now known as rpioso15:13
*** roman_g has quit IRC15:15
*** felipemonteiro has joined #openstack-infra15:16
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: add day_specifier from recurrence  https://review.openstack.org/61527015:19
*** dpawlik has joined #openstack-infra15:22
*** dpawlik has quit IRC15:22
*** dpawlik has joined #openstack-infra15:23
fungithe cacti graphs for static.o.o don't look especially anomalous15:25
fungihttp://logs.openstack.org/24/615724/2/check/openstack-tox-lower-constraints/a135914/ loaded for me instantly but i can try reloading it a bunch and see what happens15:26
funginot seeing anything so far15:26
fungidmesg on the server isn't reporting trouble reaching any of its cinder volumes, nor any problems with the filesystem15:27
*** jistr|call is now known as jistr15:29
fungidhellmann: smcginnis: yes it could be changes triggered by different branches racing i suppose... zuul no longer runs jobs in parallel in (release-)post if they're triggered by ref updates for the same project+branch but could still run them in parallel if they're for different branches of the same project (i think)15:30
fungitriggered by ref updates of different branches for the same project, that is15:31
fungithough looks like that build was triggered by a ref update for the master branch of openstack/releases? and it only has one branch, right?15:32
corvusfungi: i believe that description of behavior is correct15:32
fungirsync: rename failed for "/srv/static/releases/.buildinfo" (from .~tmp~/.buildinfo): No such file or directory (2)15:33
fungihttp://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/ara-report/15:34
fungier, http://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/ara-report/result/afe3b900-624a-4435-bf7f-01db4b4981a0/15:34
smcginnisfungi: I thought that was the same behavior we were seeing before.15:35
fungiit certainly looks familiar15:35
smcginnisBefore as in, when we saw two jobs running simultaneously and modifying local temporary directory paths.15:35
fungiwell, i don't know that we ever managed to confirm positively that was the cause. i remember speculating about the possibility15:36
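
For context, the ".~tmp~/" path in that error looks like the staging directory rsync creates with --delay-updates, so two publish runs syncing the same target concurrently could remove each other's staged files. One possible way to serialize them, sketched with illustrative paths rather than the job's real ones:

    flock /var/lock/releases-publish.lock \
        rsync -a --delay-updates --delete docs/build/html/ /srv/static/releases/
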
*** felipemonteiro has quit IRC15:39
*** ralonsoh has quit IRC15:41
*** dtantsur is now known as dtantsur|afk15:42
*** david-lyle has quit IRC15:42
*** dklyle has joined #openstack-infra15:47
*** kopecmartin is now known as kopecmartin|off15:51
*** mriedem has quit IRC15:53
*** ralonsoh has joined #openstack-infra15:54
*** gyee has joined #openstack-infra15:58
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Migration to PHP 7.x  https://review.openstack.org/61622616:00
ssbarneafungi: do you know if we can use "tr:" or "bug:" searches on our gerrit?16:07
clarkbssbarnea: should be able to if searching with the message: filed16:08
clarkb*field16:08
ssbarneai am trying to find a way to perform a query that would return all CR related to a specific sprint. using topic is not possible as we have multiple ones.16:08
clarkbmessage:"bug: foo" type of query16:08
fungissbarnea: we likely need 471078 which nobody ever reviewed after i proposed it16:09
fungiif it's something you'd like to take advantage of i can try to get people to take a look16:10
ssbarneaclarkb: ok, so if we use a keyword like ooo-sprint-123 we can rely on this kind of query to get them; it would work.16:11
clarkbssbarnea: possibly? you should test that gerrit is indexing the entire commit message16:11
clarkbbut yes I expect that is how it should work16:11
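
A sketch of testing that kind of query against the Gerrit REST API, using the sprint keyword from above (the response starts with Gerrit's ")]}'" prefix, hence the tail):

    curl -s 'https://review.openstack.org/changes/?q=message:%22ooo-sprint-123%22&n=25' \
        | tail -n +2 | python -m json.tool | head -n 40
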
ssbarneafungi: well, I do also support https://review.openstack.org/#/c/471078/ -- and added a +1 on it. who can merge it?16:11
*** agopi is now known as agopi|lunch16:14
*** lpetrut has quit IRC16:14
fungissbarnea: infra-root members are core reviewers on that repository16:15
clarkboh that's an actual gerrit feature TIL16:15
ssbarneabtw, we started to use taiga and we are posting URLs on "Story:". They look a bit ugly but they work.16:15
ssbarneayep, tr:/bug: is a gerrit thing. in fact I hope to see it upgraded to be able to benefit from other improvements.16:16
clarkbssbarnea: any reason to not use storyboard?16:16
clarkbI thought tripleo had migrated16:16
*** imacdonn has quit IRC16:17
ssbarneayeah, lots of them, but more important was to use the same tool as the rdo infra team. they started first.16:17
*** imacdonn has joined #openstack-infra16:17
clarkbssbarnea: the feedback would likely be appreciated by the storyboard team if anyone has time to write it down16:18
fungiif we get that tracking change in (it may need updating, i can check after my current meeting) then it will go into effect along with the task footer hyperlinks at the next gerrit restart16:18
*** quiquell is now known as quiquell|off16:19
ssbarneaclarkb: i plan to do this but it will be in december: i need to learn more about both of them.16:19
mordredI didn't think the openstack policy was to allow official projects to use whatever tracker they felt like, but instead that projects needed to use one of the agreed upon tools that the community as a whole uses - did we change that policy?16:20
ssbarneathis was the 2nd sprint and we are still learning about it; after 3 sprints I think I will know enough to make a correct comparison.16:20
clarkbmordred: I don't know if the TC has said one way or the other16:20
ssbarneaslow down: LP is still used for bugs, this is a planning tool.16:20
fungimordred: it sounds more like tripleo is interested in splitting off from openstack anyway? so should be fine16:21
clarkbssbarnea: storyboard already provides that functionality, is the concern I think16:21
mordredclarkb: the _original_ tooling decision, that stands until it's changed, is that projects must use tools from a tc-approved selection of tools16:21
mordredfungi: ah - I didn't know that16:21
mordredssbarnea: and no worries either way - I'm mostly just clarifying what's up16:21
fungiit's what i'm getting from this discussion at least16:21
clarkbssbarnea: there are workboards that can either be automatically managed or managed by hand to track work in a kanban-like fashion (which is what taiga does aiui)16:21
* mordred isn't going to rage around anywhere or anything16:21
fungibut yeah, they were using trello before, which wasn't something openstack had officially approved either16:22
mordredah - gotcha16:22
* SotK would be interested in hearing about what taiga provides that storyboard doesn't, or what is so painful in storyboard as to make folk go elsewhere16:22
ssbarneai don't want to upset anyone who spent effort on storyboard, so I will stop here... but i will try to document what is missing.16:23
fungiSotK: doesn't sound like it's even that, more that a red hat product team had already decided to use taiga, and as another red hat product the tripleo team is expected to fall in line16:23
clarkbcoreycb: I did ack the py37 changes from infra perspective16:24
clarkber on the mailing list16:24
coreycbclarkb: awesome, thanks16:25
SotKssbarnea: thanks, I'd appreciate that16:25
*** mriedem has joined #openstack-infra16:26
ssbarneaSotK: it would be useful to hear about a team doing mixed Scrum and Kanban on storyboard, i am curious what the planning looks like.16:27
fungialso curious how anyone ever thought applying agile methodology to a free software project was anything other than completely nuts16:28
dmsimard:(16:30
fungii've never been a fan of agile to start with, but seems like it sucks all the fun out of making free software16:32
clarkbssbarnea: fwiw I think the concern here is that an openstack project is demanding that someone else use the openstack tool first rather than being willing to dogfood. particularly when the motivation seems to be that a non openstack project has chosen a different tool16:32
clarkbssbarnea: it would be a lot easier to swallow if we could identify deficiencies that could at least be reported and possibly addressed, but as is we don't have even that feedback16:33
dmsimardfungi: let's have alcoholic beverages over that topic one day16:37
*** pcaruana has quit IRC16:38
fungihappy to. i've turned down a number of jobs over their use of agile, because i find it absolutely painful16:38
ssbarneadmsimard: first good comment in a while! this is the kind of subject to discuss over drinks.16:39
ssbarneassbarnea: experience with storyboard while trying to create a board:  500: POST /api/v1/worklists: _CallbackResult was already set  --- and this is not the first time when I see failures or timeouts.16:41
ssbarneaeven the /#!/board/list page takes like 5-10s to load the two columns, even though they have <100 elements each.16:43
*** e0ne has quit IRC16:44
*** e0ne has joined #openstack-infra16:44
*** e0ne has quit IRC16:44
*** shardy has quit IRC16:46
clarkbssbarnea: poking around storyboard it appears that starlingx may be attempting to use the boards in the way you were describing? dtroyer may have more info16:49
hogepodgeI'm having a tough time accessing logs.openstack.org16:49
ssbarneaperfect, I will keep an eye on that.16:50
ttxhogepodge: yes me too16:50
hogepodgeI'm trying to figure out why Loci jobs are failing stochastically, with a failure rate of about 1 in 1016:50
clarkbhogepodge: ttx trouble is opening log files?16:50
hogepodgeclarkb: I can't access the ARA reports. I was able to read the job log, but now I want to chase down what exactly is timing out to see if it's a problem on our side or the infra side.16:51
clarkbseems like indexes open quickly16:51
clarkbbut actual content does not16:51
ssbarneai can confirm the logs issue; it also often happens that the index page loads fast but when you click to open the log file ... "waiting for ...". Many times I press it multiple times, probably causing extra load on the server.16:51
fungia couple of folks reported a problem accessing logs earlier but the link they posted was to the index which was loading fine for me. i didn't get that it was maybe the log content they were having trouble with16:52
ssbarneathis is not a unique problem, it has happened like this for weeks, but the experience varies a lot.16:52
fungicacti wasn't indicating anything out of the ordinary for the server itself16:52
fungiapache processes are spiking to ~100% cpu16:53
fungicould this be only a problem for ara and prettified logs?16:54
fungior are raw logs having trouble too?16:54
odyssey4meI'm getting timeouts when trying to browse logs/ara reports from time to time. Is it just me, or are others seeing this too?16:54
odyssey4meah, fungi same with raw logs16:54
ssbarneaprobably because they are trying to re-gzip an already gzipped file, but it makes no sense; that file was only 85kb (gz).16:54
clarkbfungi: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=138&rra_id=all that is very periodic in its iowait vs not. But we are in the not iowait period16:54
ssbarneatake a look http://logs.openstack.org/13/609413/5/check/tripleo-ci-centos-7-containers-multinode/0099c05/job-output.txt.gz16:54
odyssey4meI hit it with http://logs.openstack.org/18/616218/1/check/openstack-ansible-functional-centos-7/94aef6a/job-output.txt.gz and http://logs.openstack.org/18/616218/1/check/openstack-ansible-functional-centos-7/94aef6a/logs/ara-report/ on the first try, then a refresh worked.16:55
ssbarneai tried to load it multiple times and it takes ages to start serving it.16:55
odyssey4messbarnea yep me too16:55
clarkbssbarnea: -rw-r--r-- 1 jenkins jenkins  83K Nov  6 16:21 job-output.txt.gz16:55
ssbarneai tried even with wget, same experience: the web server doesn't even start to reply for about 10-20s. when it starts you get it instantly, but the connection is the issue.16:55
clarkbssbarnea: looks like 83K according to the filesystem16:56
clarkbssbarnea: so I don't think file sizes are the issue here16:56
fungithere is a fair amount of cache memory in use16:56
ssbarneaclarkb: i agree with you, it is something with the webserver.16:56
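
A sketch for measuring where the delay sits on a single request, using the example file above (time_starttransfer is effectively time to first byte):

    curl -s -o /dev/null \
        -w 'connect=%{time_connect}s  ttfb=%{time_starttransfer}s  total=%{time_total}s\n' \
        http://logs.openstack.org/13/609413/5/check/tripleo-ci-centos-7-containers-multinode/0099c05/job-output.txt.gz
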
fungifairly large spikes in established tcp connections16:56
clarkbfungi: is it possible we are queuing connections?16:57
fungia but higher than this time previous weeks16:57
ssbarneaeither lack of http connections or something more sinister regarding how files are served.16:57
fungiwell, yes that could be a symptom of apache taking longer to return content16:57
corvusi've pulled up a server-status page... there are available connection slots16:58
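
A sketch of sampling the same scoreboard from the server itself, assuming mod_status is reachable on localhost (access may be restricted differently here):

    curl -s 'http://localhost/server-status?auto' \
        | grep -E 'BusyWorkers|IdleWorkers|Scoreboard'
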
corvusodyssey4me: are you going over ipv4 or v6?17:00
clarkbcorvus: I am ipv4 and saw it17:01
odyssey4mecorvus hmm, good question - most likely v4 but let me verify17:01
corvusi'm not seeing any v4 packet loss from my workstation17:02
fungithings are loading for me over v6, though slow-ish17:02
*** irclogbot_1 has joined #openstack-infra17:02
odyssey4meyep, looks like v4 for me17:02
fungiand once they're cached they seem speedier17:02
fungiactually they're back to loading very quickly for me now17:02
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=311&rra_id=all doesn't seem to show an appreciable drop in data being served17:03
fungii have a feeling there is contention for the storage leading to long access times reading from disk. could be something like a noisy neighbor starving bandwidth for the provider's cinder network connection17:03
fungior a stripe set is degraded, or...17:04
clarkbfungi: or given we are still serving the same amount of data maybe we have specific files/requests being our own dos?17:04
fungialso possible17:04
ssbarneaout of file handles?17:04
ssbarneait is true that I also have ipv6 working (even if it proved to be a permanent source of problems)17:05
clarkbwe are using 78% of disk with ~50% of inodes in use which is a huge swing in the opposite direction of the old not enough inodes problem17:06
clarkbdoes make me wonder if there are large files dominating the downloads17:06
corvusfwiw, i'm seeing cacti behaving slowly.  that's a different server.17:06
clarkb(and using available bw/disk iops)17:06
fungiwhere are the queue graphs for our logstash workers lately?17:06
clarkbfungi: grafana under zuul-status dashboard17:06
fungiahh, just found17:06
fungisorry for the noise17:06
clarkbmight be "Zuul Status"17:06
fungisome fairly large spikes in logstash queue17:07
fungiperhaps those could correspond to starvation?17:07
fungihttp://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=117:07
clarkbfungi: ya could be the pipeline backs up waiting for logs to process17:08
*** efried is now known as efried_rollin17:08
fungiyour comment about "maybe we're being our own dos" got me thinking about that17:09
mordredyah - that could totally be a thing17:09
mordredwe're very good at DOSing things17:09
clarkbapache logs should tell us which requests take the most time right?17:11
fungithe apache access logs don't tell you how long the connection took, only the time it started and the size transferred17:12
clarkbI do wonder if there are specific files dominating (maybe because wsgi middleware is spinning on them or they are huge)17:12
fungiit may be possible to add an elapsed time field? i know there are options for adding fields to the access log17:12
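
A sketch of what that could look like: Apache's %D token logs the time taken to serve the request in microseconds (format name and log path are illustrative):

    # in the vhost config:
    #   LogFormat "%h %l %u %t \"%r\" %>s %b %D" combined_timed
    #   CustomLog ${APACHE_LOG_DIR}/logs-access.log combined_timed
    # then validate and reload:
    sudo apachectl configtest && sudo apachectl graceful
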
mordredclarkb: 31849 www-data  20   0  343168  74488   6044 S  97.6  0.9  51:45.11 apache217:14
mordredclarkb: there is defintely some high-cpu apaching going on17:14
fungiwell, i mean apache will use as much cpu as it can to fulfill a request. it's more a question of how long it's sustained for17:15
*** trown is now known as trown|lunch17:15
*** rkukura has joined #openstack-infra17:15
mordredyah17:15
ssbarneahigh cpu for serving plain text files? this smells like .log.gz recompression to me.17:17
*** yamamoto has quit IRC17:17
clarkbssbarnea: remember the wsgi that rewrites the text based on regexes17:18
*** florianf is now known as florianf|afk17:18
ssbarneaclarkb: nope.17:19
clarkbssbarnea: that's the os-loganalyze stuff you updated the css for17:19
clarkbit checks against known lists of files, then rewrites the plain text into html for colorizing and severity filtering17:20
clarkb(so it isn't just gzip)17:20
ssbarneaclarkb: yep but i was expecting this to happen offline, not on each request.... it would be stupid.17:21
clarkbssbarnea: its wsgi, it runs on request17:21
clarkbssbarnea: and it's not stupid because our constraint is typically disk space, not cpu17:21
clarkbif we did it online the 12TB of disk we have would be even less useful17:21
clarkber offline17:21
ssbarneai was expecting the logs server to be as plain as possible, easy to replace with a CDN.17:22
mordredthere is work underway to shift to uploading logs to swift - when that is done, what is now active wsgi is done at upload time17:23
clarkbfungi: fwiw on the log processors I see them basically going out to lunch like everyone else. Haven't found an indication they are the cause of the lunch break yet17:23
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: Use python3 to run pep8 job  https://review.openstack.org/61625617:24
fungiclarkb: yeah, the spikes there may be symptoms of the underlying problem just like the spikes in the established tcp connections graph for the logserver17:24
*** roman_g has joined #openstack-infra17:24
clarkband the log processors are in the same cloud as the fileserver so I think that rules out internet networking trouble17:24
corvusssbarnea: please don't call the good work of other folks stupid.17:25
openstackgerritFabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver  https://review.openstack.org/60440417:25
fungissbarnea: 99.99% of the logs on the logserver are never viewed with a browser, so rendering the browser-friendly view on demand is actually far less computation expended17:25
ssbarneacorvus: it wasn't my intent to upset anyone.17:26
*** shrasool has joined #openstack-infra17:26
fungipre-rendering them all *would* be stupid, in my opinion17:27
corvusfungi: well, that's what the logs-in-swift approach does :)17:27
fungiyep, only because we end up with no other choice17:27
clarkbScript timed out before returning headers: wsgi.py17:27
fungirather, with no more efficient choice17:28
ssbarneai would go so far as to say that processing logs should be the task of the *uploader*.17:28
mordredssbarnea: yup. that's what logs-in-swift does17:28
clarkbthat log comes from the logs.openstack.org error log file17:28
*** jpich has quit IRC17:28
fungissbarnea: yeah, that's basically what the logs in swift solution is switching to17:28
clarkbso either the disk isn't returning the data or os-loganalyze is getting really confused17:28
clarkbnow to figure out what files were requested at that time :/17:30
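
A sketch for correlating those timeouts with requests, assuming the default Debian/Ubuntu apache log locations on the server:

    grep 'Script timed out before returning headers' /var/log/apache2/error.log | tail -n 5
    # then see what was being requested around the same minute, e.g.:
    grep '07/Nov/2018:17:2' /var/log/apache2/access.log \
        | awk '{print $7}' | sort | uniq -c | sort -rn | head
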
ssbarneait seems that it is time for me to play a little bit with os-loganalyze, unless there are plans to replace it with something else.17:31
fungiif it ends up being noisy neighbor bandwidth consumption on the cinder network, we should expect to find no appreciable correlation for the requests being processed17:31
ssbarneaalso another question that may sound weird: why are we not doing the processing client-side (javascript)? would it mess up the browsers?17:31
clarkbok I see hogepodge's requests (they stand out because it's ara and loci)17:31
*** bobh has quit IRC17:32
hogepodgeclarkb: ? Are we doing something that needs to be addressed?17:32
fungihogepodge: you merely provided a great example to research17:32
clarkbhogepodge: I don't think so. Mostly just that I can correlate the errors you saw to specific log entries17:32
hogepodgeclarkb: with loading logs?17:33
hogepodgeclarkb: or with the job timeouts?17:33
*** mriedem has quit IRC17:33
corvusssbarnea: if you write a javascript log viewer, i can incorporate it into zuul and the swift work i'm doing.  see http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000501.html for what that would look like17:33
mordred++17:33
clarkbhogepodge: loading logs and that being slow17:34
mordredssbarnea: yah - a js log viewer has been on my tdl - just haven't gotten to it. I wouldn't spend much effort on improving the os-loganalyze wsgi app though - it will hopefully be going away in the not too distant future17:34
ssbarneacorvus: i am already doing something like this in https://github.com/openstack/coats -- so it may not be such a big deal.17:35
*** rh-jelabarre has quit IRC17:35
ssbarneanowadays clients are more powerful than the servers; also, a less powerful client may just skip doing any processing and load the plain text.17:36
clarkbfwiw both os-loganalyze and ara wsgi timeout17:36
clarkbunless we've got a bug in both, my hunch is that it is disk related17:37
*** ginopc has quit IRC17:38
clarkbreading os loganalyze it returns headers with a start_response basically as soon as it has created the python fileinput object17:40
*** shrasool has quit IRC17:41
clarkbcorvus: jobs don't have a log upload size limit yet right? or do they?17:41
*** rkukura has quit IRC17:42
corvusclarkb: only the executor workspace limit17:42
corvusi have a request hanging for a plain old svg file.17:43
*** derekh has quit IRC17:44
*** shrasool has joined #openstack-infra17:44
*** shrasool has quit IRC17:44
corvusi don't see my browser connection in netstat17:44
clarkbcorvus: now we don't have available connections on apache17:45
clarkbmaybe it is/was that all along17:45
fungiwhich, again, is likely a symptom of taking too long to return content17:45
fungiconnections pile up waiting17:46
clarkbfungi: ya though if we have ~80 logstash log processors we can chew up a good number of apache workers17:46
clarkbapache status shows what seems to mostly be log processing17:46
*** noama has quit IRC17:46
corvusi think the "_" connections are still available for use17:46
fungiwe could pause them for a bit and see what happens17:46
*** jpena is now known as jpena|off17:46
clarkband then another good chunk is red hat doing something17:46
clarkbcorvus: ah17:47
clarkblots of red hat requests to ci.openstack.org17:47
clarkb(which is fine, still fewer of those than log processors)17:48
clarkb(I didn't realize that alias was in place too, neat)17:49
corvusyeah i still use it.  so easy to type :)17:49
clarkbgiven that we have cpu and memory available to expand should we consider allowing for more connections to see if that helps? It could be that the requests using up the slots aren't any slower than before (see bw hasn't changed much) but we have more of them?17:51
clarkbthat doesn't explain the wsgi errors that correlate to when hogepodge was having trouble though17:52
corvusclarkb: i don't think we're running out of slots17:53
clarkbcorvus: wouldn't your connection have gone through if we had slots? Or do you suspect possibly some other error in front of apache?17:53
corvusclarkb: since i've seen latency in http requests to static and cacti as well as shell latency on logs, i'm suspecting that there's a network issue between me and rax.  but i can't pinpoint it, or say whether it's closer to me or rax.17:55
corvuss/logs/static/17:55
*** yamamoto has joined #openstack-infra17:56
clarkbcorvus: do you know why dm-2 (logs) isn't in cacti but dm-0 and dm-1 (static and tarballs) are?18:00
clarkbI'm thinking adding dm-2 graphs to cacti is what we may need to further debug this18:00
corvusclarkb: no; i think those may have had to be added manually18:00
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: add day_specifier from recurrence  https://review.openstack.org/61527018:00
corvusclarkb: ++18:00
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: Add ralonsoh to chairing neutron-qos meeting  https://review.openstack.org/61620818:00
openstackgerritThierry Carrez proposed openstack-infra/irc-meetings master: Creating an IRC meeting for CloudKitty  https://review.openstack.org/61620518:00
clarkbcorvus: by manual does that mean edit the database directly?18:01
*** jcoufal has quit IRC18:01
corvusclarkb: no, through the ui; i can do it real quick if you want18:02
clarkbcorvus: that would be great (I'm not quite sure where to start myself)18:02
clarkbmaybe you can show me in berlin18:02
corvusoh, hrm.  there are 81 http connections from my ip in CLOSE_WAIT18:02
clarkbcorvus: we have both iops and bytes for dm-0 and dm-118:02
fungithat sounds like packet loss18:03
openstackgerritMerged openstack-infra/irc-meetings master: Use python3 to run pep8 job  https://review.openstack.org/61625618:03
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes  https://review.openstack.org/61626218:04
corvusthere are 162 connections in CLOSE_WAIT from 3 ips.  one is mine, 1 is redhat in boston, one is comcast in washington18:04
openstackgerritMerged openstack-infra/zuul master: Fix reporting ansible errors in buildlog  https://review.openstack.org/61620618:04
openstackgerritMerged openstack-infra/zuul master: doc: fix typo in secret example  https://review.openstack.org/61609518:04
clarkbthe red hat one is NAT so could push a lot of connections, but if you alone account for half of that then maybe that isn't the case for the red hat ip18:05
clarkbI've got to go find breakfast now but ya my suggestion is add dm-2 to cacti and go from there if the network trouble from corvus/redhat/comcast isn't the issue18:08
corvusclarkb: dm2 is added now18:09
*** yamamoto has quit IRC18:09
clarkbI see it (but no graph yet)18:09
clarkbthanks18:09
corvusyeah, it will only have data starting now (we should see a graph in 5m, and a graph with data in 10m)18:10
*** panda is now known as panda|off18:10
fungifwiw, i'm not seeing any (icmp) packet loss to static.o.o from home over either v4 or v6 routes18:10
corvusme neither (on v4 only).  nor the other direction from static to my home router.18:11
corvusi will delete a bunch of old hosts from cacti while i'm here18:11
*** agopi|lunch is now known as agopi18:15
*** trown|lunch is now known as trown18:20
*** e0ne has joined #openstack-infra18:22
*** e0ne has quit IRC18:25
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Remove infracloud from cacti  https://review.openstack.org/61626518:26
*** rkukura has joined #openstack-infra18:26
openstackgerritSorin Sbarnea proposed openstack-infra/os-loganalyze master: Adds libmagic on plaforms missing it  https://review.openstack.org/61626618:28
*** Swami has joined #openstack-infra18:30
*** ifat_afek has quit IRC18:30
*** ralonsoh has quit IRC18:30
ssbarneacorvus: any reason why pypy is included on os-loganalyze tox targets?18:31
ssbarneais kinda broken on my system, and it was a surprise to see it, especially as py3 is missing.18:32
fungilikely inherited from a very old cookiecutter template18:32
mordredssbarnea: probably just historical18:32
mordredyeah18:32
corvusyep, historical accident of when active work on osla more or less stopped18:32
corvus(at that time, pypy was interesting and py3 was not [shrug])18:32
fungiwe could install python3.1 from a ppa back then, i think, or maybe compile 3.2 from source18:33
fungipy3k porting didn't really gain traction in our community until 3.3 was readily available18:33
hogepodgeso clarkb and fungi, once the the log situation is sorted out, any chance we can take a peek at what's going on with jobs like this? https://review.openstack.org/#/c/616044/218:34
fungimainly because the options for writing a nontrivial application which could run unaltered on 2.6, 2.7 and 3.2 were... basically not happening18:34
hogepodgeAbout 1 in every 10 loci jobs is timing out, and I can't find where in the logs things are going wrong.18:34
*** haleyb has joined #openstack-infra18:34
clarkbcorvus: on https://etherpad.openstack.org/p/WUNOTv8MuP some of the faq questions I think you had thoughts on. Care to add them?18:35
hogepodgeIf there's a place I can add logging to get that information (or where I should be looking), I'm all ears. From what I can tell, the build jobs for ubuntu/centos/leap are being done in parallel and it's not returning.18:35
*** e0ne has joined #openstack-infra18:36
clarkbok it's happening again with logs and dm-2 shows device reads dropping18:37
fungiyeah, i just tried to load the ara-report for the failing job hogepodge mentioned18:37
*** e0ne has quit IRC18:38
hogepodgeI'm sure this has nothing to do with jobs named after a trickster god18:38
*** mriedem has joined #openstack-infra18:38
corvusclarkb: wow that is a write-only filesystem if i've ever seen one :)18:38
clarkbcorvus: indeed18:39
ssbarneamwhahaha: please read my comment on https://review.openstack.org/#/c/616203/18:39
ssbarneamordred: i was expecting historical reasons. so nobody will be against me removing it. later, if needed, i will add a py3 one which seems more practical.18:40
*** jtomasek has quit IRC18:40
mordredclarkb: I just added the word "a" on line 2218:41
mordredssbarnea: ++18:41
ssbarneai guess i should also move zuul job into the repository. if we ask others to do the same, it would be a good idea to do it for infra repos too :D18:42
*** boden has joined #openstack-infra18:42
corvusssbarnea: it's fine to do, but it's low priority for us -- we aren't blockers to our own config changes.18:47
ssbarneacorvus: true, i am doing  it just because I am starting to touch the project. seems like the kind of change you do the first time you need to reconfigure the jobs.18:48
corvusyep18:49
openstackgerritSorin Sbarnea proposed openstack-infra/os-loganalyze master: Move zuul job definition inside the respository  https://review.openstack.org/61627318:50
*** diablo_rojo has joined #openstack-infra18:51
*** rh-jelabarre has joined #openstack-infra18:52
*** kjackal has quit IRC18:52
*** rh-jelabarre has quit IRC18:52
*** rh-jelabarre has joined #openstack-infra18:52
*** eharney has quit IRC18:53
openstackgerritSorin Sbarnea proposed openstack-infra/project-config master: Move os-loganalize job definition inside the project  https://review.openstack.org/61627518:56
*** gtmanfred has quit IRC18:57
openstackgerritSorin Sbarnea proposed openstack-infra/os-loganalyze master: Move zuul job definition inside the respository  https://review.openstack.org/61627318:57
*** roman_g has quit IRC18:57
*** gtmanfred has joined #openstack-infra18:57
*** gfidente is now known as gfidente|afk18:58
ssbarneaare circular dependencies allowed by zuul?18:59
mordredssbarnea: no, they are not18:59
ssbarneamordred: so which CR must go first? the one adding the new jobs, or the one removing it from project-config?19:01
clarkbhogepodge: reading the ara report for that I'm confused why the async stuff is used. Couldn't you just wait for the build to finish? I ask because I think we've found that ansible async is really buggy19:02
clarkbmordred: ^ maybe you know the status on the async work but iirc you had pushed a patch or two to try and clean that up19:02
mordredclarkb, hogepodge it is _definitely_ buggy. when last we sent in patches, we were basically the only ones touching it - and then we stopped19:02
mordredssbarnea: it's safer to add them first, then remove them - if you do it the other way the repo will be ungated for a little bit19:02
mordredssbarnea: double-defining the jobs isn't an issue, they'll still only be run once19:02
Shrewsand ansible declared us "experts in async"  LOL19:02
*** kmalloc is now known as needscoffee19:05
*** bobh has joined #openstack-infra19:06
mordredShrews: to be fair - I believe at that moment in time we were :)19:06
*** bobh has quit IRC19:07
*** priteau has quit IRC19:10
*** priteau has joined #openstack-infra19:12
fungia frightening thought19:13
clarkbfungi: I recall you had an excellent answer for conference opendev association. Care to add that to https://etherpad.openstack.org/p/WUNOTv8MuP faq?19:15
clarkblooking at dm-2 cacti info for the last slow spell I'm not seeing anything that looks suspicious19:16
clarkbthe next step may be to try stracing a wsgi processing when it happens if we can catch it?19:16
fungiclarkb: my pleasure19:17
*** rh-jelabarre has quit IRC19:18
clarkbtyty19:21
*** jcoufal has joined #openstack-infra19:22
*** pfallenop has joined #openstack-infra19:23
*** e0ne has joined #openstack-infra19:23
fungiclarkb: that lgty?19:24
clarkbfungi: I'll look in a sec, but logs is doing it again so I'm stracing stuff19:25
clarkbthe pid reported by apache status for my connection is blocking on a read to a pipe19:26
*** eharney has joined #openstack-infra19:26
clarkbhow do you find out what is on the other side of the pipe?19:26
clarkbprocfs?19:26
AJaegerclarkb: yeah, procfs19:27
AJaeger/proc/<pid>/fd19:27
clarkblr-x------ 1 root     root     64 Nov  7 19:25 7 -> pipe:[1025660877]19:27
AJaegermmh, now I'm lost ;(19:28
clarkblsof seems to think that maybe that is a pipe between apache processes19:29
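A rough sketch of that /proc + lsof dance, using the pid placeholder and the pipe inode from the lines above:

    # list the blocked worker's fds and note the pipe inode
    sudo ls -l /proc/<pid>/fd | grep pipe
    # then look for every process holding the same pipe inode; whatever else
    # shows up is the other end of the pipe
    sudo lsof 2>/dev/null | grep 1025660877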
*** rh-jelabarre has joined #openstack-infra19:29
*** e0ne has quit IRC19:31
corvusclarkb: the faq answers lgtm; i left some comments on the prose in chat19:32
*** dpawlik has quit IRC19:34
*** dpawlik_ has joined #openstack-infra19:34
clarkbthanks19:35
dmsimardclarkb: gave it a read, +119:39
clarkbapparently the wsgi script timed out errors can be related to mod_wsgi python interpreters and the GIL?19:39
clarkbok heres a theory19:39
clarkbssbarnea's change to osla, while not in python, would've caused us to reinstall osla, and it's been a while since that last happened19:40
clarkbmaybe we pulled in some new dep causing that sort of python problem with mod_wsgi?19:40
*** betherly has joined #openstack-infra19:42
clarkbWSGIApplicationGroup %{GLOBAL} could be the workaround for this (but forces all wsgi processes to run in the context of the main python interpreter?)19:43
fungithere should be a pip log which mentions what got upgraded/installed on the server19:44
clarkbthere is no ~root/.pip on logs.o.o19:46
*** electrofelix has quit IRC19:47
*** betherly has quit IRC19:47
clarkbok nevermind we already set the WSGIApplicationGroup to GLOBAL19:47
clarkbcould it be related to having two wsgi apps in the same vhost? (ara and osla)19:47
fungibut why didn't we notice this until *very* recently if so?19:48
fungiwe've been doing ara-on-wsgi for a few months at this point, right?19:48
clarkbfungi: ya but we haven't updated osla in longer than that I think19:49
dmsimardara senses tingling19:49
dmsimardI had to put in a bit of effort for ara and osla to not conflict but it was a matter of rewrite rules, not deps19:50
dmsimardi.e, ara was hijacking osla routes (or vice versa)19:50
clarkbdmsimard: in this case we are seeing both of them time out in apache, which apparently can indicate conflicts between the wsgi processes when sharing the python interpreter?19:51
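One quick, hedged way to see how the two apps are grouped is to grep the vhost for the relevant mod_wsgi directives (the path below is an assumption about where the logs vhost lives):

    grep -nE 'WSGIDaemonProcess|WSGIProcessGroup|WSGIApplicationGroup' \
        /etc/apache2/sites-enabled/50-logs.openstack.org.conf
    # if ara and osla share the same process group and the GLOBAL application
    # group, they run in the same interpreters and can stall one another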
dmsimardI'll poke at the logs a bit19:53
*** mriedem is now known as mriedem_afk19:53
dmsimardno recent occurrences of those but that's still odd http://paste.openstack.org/raw/734369/19:54
dmsimardoh wait, I was looking at .119:55
clarkbthe next time someone notices slowness can you see if logs-dev.openstack.org has the same behavior for that file?20:03
clarkbthey share the global application wsgi group20:03
clarkbso I think they should both be slow if the problem is associated with the python interpreter20:03
*** dpawlik_ has quit IRC20:06
clarkbcorvus: want to double check my edits based on your feedback?20:07
*** dpawlik has joined #openstack-infra20:07
*** dpawlik has quit IRC20:09
*** dpawlik has joined #openstack-infra20:09
dmsimardclarkb: so there's a pattern emerging from the error logs: http://paste.openstack.org/show/734370/20:11
dmsimard38.145.34.10 is a tripleo tool, I'm in touch with them to understand what it does and determine if it's misbehaving20:12
clarkbdmsimard: what does this tool do?20:13
dmsimardnot entirely sure yet haha20:13
openstackgerritMerged openstack-infra/storyboard master: Fix up a few requirements  https://review.openstack.org/61119420:15
clarkbcorvus: fungi dmsimard the other thing I notice digging into wsgi config is we only have 8 single-thread wsgi processes, which means that apache may be able to take more connections but have busy wsgi processes behind them?20:17
clarkbits possible this is a dos after all and we are just not able to keep up with demand for those processes?20:17
dmsimardclarkb: yes, the thread/process config caught my eye as well20:17
clarkbthis would explain why apache sees my connection that is hanging as connected and otherwise happy, because its waiting for wsgi processes to open up?20:18
*** efried_rollin is now known as efried20:18
dmsimardclarkb: the logstash workers are (understandably) hammering logs.o.o on a regular basis -- they probably don't need their logs to go through osla ?20:18
clarkbdmsimard: they use osla to filter out debug logs20:19
clarkbdoesn't necessarily have to happen there20:19
*** needscoffee is now known as kmalloc20:19
clarkb(but that is why it is done this way)20:19
dmsimardI had put a reasonable default in the template, perhaps we need to tune it http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/templates/logs.vhost.erb#n1620:22
clarkbdmsimard: I think if we notice it happening again and logs-dev works fine then that would be a good next step20:23
clarkbif logs-dev shows the same problem then it likely isn't process/thread contention (those two have different process groups)20:23
*** kjackal has joined #openstack-infra20:23
dmsimard++20:24
dmsimardthe server doesn't seem particularly busy on the cpu side of things http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=138&rra_id=0&view_type=tree&graph_start=1541103780&graph_end=154162218020:24
dmsimardso it wouldn't shock me to increase the amount of threads20:24
*** eharney has quit IRC20:25
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one nodepool launcher  https://review.openstack.org/61628820:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all nodepool launchers  https://review.openstack.org/61628920:29
clarkbok it just did it to me. logs-dev was fine20:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one nodepool builder  https://review.openstack.org/61629020:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all nodepool builders  https://review.openstack.org/61629120:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one zuul executor  https://review.openstack.org/61629220:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zuul executors  https://review.openstack.org/61629320:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one zuul merger  https://review.openstack.org/61629420:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all zuul mergers  https://review.openstack.org/61629520:29
openstackgerritColleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for zuul.openstack.org  https://review.openstack.org/61629620:29
clarkbhttp://logs-dev.openstack.org/44/616044/2/check/loci-heat/6909fc4/job-output.txt.gz vs http://logs.openstack.org/44/616044/2/check/loci-heat/6909fc4/job-output.txt.gz20:29
clarkbinfra-root given ^ thoughts on bumping the process/thread count for logs.o.o?20:30
clarkber specifically for wsgi in logs.o.o20:30
fungiit sounds like a plausible theory. what's the default and where is it tuned?20:31
dmsimardI'll have a patch up in a minute20:32
clarkbfungi: http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/templates/logs.vhost.erb#n16 is where we set it, those vars are at http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/manifests/logserver.pp#n3420:32
fungithose do look like small numbers20:33
dmsimardhmm, we can either update the default there20:33
clarkbinfra-root unrelated to the logs my plan is to send out the opendev email to the various -dev and -discuss tomorrow morning local time20:33
dmsimardor bump it from openstack_project::static20:33
clarkbdmsimard: I think we should bump it in openstack_project20:33
clarkbdmsimard: then consider changing the default if this fixes the issue20:33
fungiclarkb: i fully endorse your communication plan. thanks!!!20:34
mordredclarkb: ++20:34
clarkbI think it looks quite good as far as an email draft goes. Thank you all for the help20:34
clarkbfollowing up on ianw's planet investigation from last night it appears to be up and running now. Did any of us fix it?20:35
fungiclarkb: before it goes out, can we change "best practices" to something like "recommended practices" or "standard practices" or even just "practices"?20:35
fungi"best practices" makes my skin crawl20:35
clarkbfungi: ++ feel free to update to the least skin crawly version20:35
dmsimardclarkb: do we want to "dry" test at 16 processes before patching it in ?20:35
clarkbdmsimard: we can. Let me put static.o.o in the emergency file20:36
clarkbdmsimard: thats done. Do you want me to edit to 16 processes or were you planning to do it?20:37
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: Bump amount of mod_wsgi processes for static vhosts to 16  https://review.openstack.org/61629720:37
dmsimardI can20:37
clarkb(puppet can still override it depending on where it is in the run... but meh we can set it back again if that happens)20:37
clarkb(it should only overwrite the one possible time though, after that its stopped)20:37
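For the record, the manual bump amounts to roughly this on the server (a sketch; the vhost path is an assumption, the sed would touch every processes=8 in that file, and https://review.openstack.org/616297 is what makes the change stick):

    sudo sed -i 's/processes=8/processes=16/' /etc/apache2/sites-enabled/50-logs.openstack.org.conf
    sudo apache2ctl graceful
    # count apache processes to confirm the extra wsgi daemons came up
    pgrep -c apache2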
clarkbI see new processes20:38
dmsimardit's restarting now20:38
dmsimard#status log logs.o.o was put in the emergency file to test if bumping to 16 wsgi processes addresses timeout issues pending https://review.openstack.org/61629720:39
openstackstatusdmsimard: finished logging20:39
clarkbthere are 24 processes now instead of 16 (16 + 8 vs 8+8) that lgtm20:39
*** rfolco|ruck is now known as rfolco|off20:40
clarkbnow I guess we ask people to be on the lookout for subsequent slow behavior20:42
clarkbhogepodge: ^ fyi20:42
clarkbssbarnea: your line break change seems to be working too fwiw20:42
*** dpawlik has quit IRC20:43
*** dpawlik has joined #openstack-infra20:43
clarkbdmsimard: bah puppet updated the vhost file ~2 minutes after you restarted20:44
clarkbdmsimard: can you update it again and restart again? (puppet shouldn't run anymore after that last one)20:44
clarkbthe race there is in generating the ansible groups and running puppet. If that happens before we add a host to the emergency file then puppet runs one more time on that host20:44
clarkb*if generating the ansible groups happens before editing the file and then running puppet happens after20:45
ianwclarkb: i didn't do anything to it20:46
clarkbdmsimard: let me know if I should do the update instead (I don't want to step on toes)20:46
ianwplanet.o.o i mean20:46
clarkbianw: must've been cloud side fix then? I'm happy to take that20:46
ianwyep :)  the cloud giveth, the cloud taketh away20:47
*** guilhermesp has quit IRC20:47
*** jungleboyj has quit IRC20:47
*** rajinir has quit IRC20:47
*** dpawlik has quit IRC20:47
*** mrhillsman has quit IRC20:47
*** liusheng__ has quit IRC20:47
*** jbryce has quit IRC20:47
*** hogepodge has quit IRC20:47
*** kopecmartin|off has quit IRC20:47
*** sparkycollier has quit IRC20:47
*** fdegir has quit IRC20:47
*** diablo_rojo_phon has quit IRC20:47
*** neith has quit IRC20:47
*** jbryce has joined #openstack-infra20:48
*** mnaser has quit IRC20:48
*** lamt has quit IRC20:48
*** mwhahaha has quit IRC20:48
*** seongsoocho has quit IRC20:48
*** sparkycollier has joined #openstack-infra20:48
*** guilhermesp has joined #openstack-infra20:48
*** hogepodge has joined #openstack-infra20:48
*** fdegir has joined #openstack-infra20:48
*** eharney has joined #openstack-infra20:48
*** lucasagomes has quit IRC20:48
*** seongsoocho has joined #openstack-infra20:49
clarkbhrm its been a few minutes, dmsimard must've stepped away. I will make the process change again20:49
*** mwhahaha has joined #openstack-infra20:49
*** mnaser has joined #openstack-infra20:49
*** kopecmartin has joined #openstack-infra20:49
hogepodgeclarkb: I didn't write the original gate jobs, and the person who did has disappeared. I expect that they're being run in parallel to save on time, but I was thinking the same thing; I'll send up a patch to make them serial20:49
clarkbhogepodge: reading it, it appeared it was still running one image at a time?20:50
clarkbbut I am bad at understanding ansible looping so could be me misunderstanding20:50
*** kgiusti has left #openstack-infra20:50
hogepodgeclarkb: what do you mean?20:50
*** Qiming has quit IRC20:50
*** rajinir has joined #openstack-infra20:50
hogepodgeme too20:50
*** andreaf has quit IRC20:51
clarkbhogepodge: well I had missed that it has a distros with_items20:51
fungiokay, taking a break to go get very late lunch, but i'll be back20:51
clarkbhogepodge: so ya it's trying to build all the distros in parallel. /me wishes we would get away from the idea that our container images should support all the base distros20:51
hogepodgeclarkb: so it's doing docker builds with different base images (which triggers different code paths based on introspection)20:52
*** andreaf has joined #openstack-infra20:52
clarkbhogepodge: one thing you might be able to do if the time cost for doing it serially is high is move the asynchronous polling out of ansible20:52
clarkbhave a shell script do that or similar20:52
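A minimal sketch of that suggestion, assuming the per-distro builds can be driven by a plain script (build.sh and the distro list here are illustrative, not taken from the loci jobs):

    #!/bin/bash
    # kick off one build per distro in the background and let the shell do the
    # waiting instead of ansible's async/poll machinery
    set -e
    pids=()
    for distro in ubuntu centos leap; do
        ./build.sh "$distro" > "build-$distro.log" 2>&1 &
        pids+=($!)
    done
    # wait for each build and fail if any of them failed
    for pid in "${pids[@]}"; do
        wait "$pid"
    done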
hogepodgeclarkb: it's the only way to get buy in from vendors. redhat and suse and canonical are all involved in some way, and they want to use their own base container images20:52
fungiclarkb: i've done one last pass over the announcement and made some minor edits for improved accuracy20:53
hogepodgeclarkb: let me send a patch up that does it serial and we can do measurements20:53
clarkbhogepodge: ya but it also defeats the purpose of containers imo. The whole point is you shouldn't care what the surrounding env of the application is as long as it is functional20:53
clarkbfungi: thanks!20:53
*** Qiming has joined #openstack-infra20:54
dmsimardclarkb: I had indeed stepped momentarily away20:54
* fungi disappears for a little while20:54
clarkbdmsimard: no worries. I think its all set now20:54
hogepodgeIf you buy into the idea that containers don't contain, but are convenient isolation barriers that you can mix and match, that becomes a less compelling argument20:54
clarkbhogepodge: in particular the tiny images for go things that are basically libc + compiled go file are really neat20:54
clarkbiirc there are things for python like that too20:54
hogepodgeopenstack needs to touch a lot of system stuff, which complicates matters20:54
clarkbthat is true.20:55
hogepodgebut are limited in scope to python. venv doesn't solve your mysql problems20:55
clarkbhogepodge: no, but also mysql doesn't need to be run on any one of those many distros20:55
clarkband https://hub.docker.com/r/mysql/mysql-server/ has made a choice for you20:56
clarkband you aren't supposed to care if it is alpine or ubuntu or centos or something else20:56
hogepodgeTrue. I always use the mariadb provided mysql container image, and I don't know what it's built on, which is fine by me20:56
mordredhogepodge: exactly :)20:56
clarkboraclelinux20:57
hogepodgebut I may care about what neutron or nova-compute is built on, because it wants to control my lower level system20:57
clarkbnot surprising, also I don't care20:57
*** kota_ has joined #openstack-infra20:57
hogepodgefor most services it doesn't matter, totally agree.20:57
clarkbmariadb is based on ubuntu20:58
persiaInterestingly, there are more ways to do "containers" than folk who like to use the word, so there's no answer to this.  In practice, it makes sense to build whatever folk want to use: an integration test that runs on some mix of things that can be identified but doesn't match someone's plan doesn't let them reuse the result.20:58
hogepodgeright or wrong, we live in a multi-vendor world and you're going to have a hard time getting VendorA to use packages built on VendorB when each is selling their own OpenStack solution (there was an instance where we tried, and once someone figured out what was going on they were upset)20:59
mordredI agree with both hogepodge and persia that this is the state of the world20:59
clarkbhogepodge: ya that is why I think the actual answer there is a non vendor base. Like the libc + binary go stuff20:59
persiahogepodge: There are semantic ways around that, but they are hard.  Happy to discuss in depth sometime, but I suspect it will take a very long time to make progress.21:00
hogepodgefwiw someone just sent up a patch to loci to add another base image and I was strongly "nope, we have configuration to let you build your own images from your own base... we don't want to be responsible for maintaining your special needs"21:00
mordredhowever, I agree with clarkb that the specifics of that state of the world are sad, as they remove a major benefit21:00
mordredthere is a reason we only support one base image in pbrx :)21:01
persiaflatpak.org has an interesting answer to that.  They run on any distro, but ship the freedesktop SDK as a base, and build all the flatpaks against that.21:01
clarkbpersia: yup I am actually hopeful that flatpak and/or snaps (and there is a third option that does the same thing iirc) will push things more towards the nix model for packaging with containers in mind21:02
mordredpersia: I haven't been able to get myself as excited about flatpak as I'd like to be - especially given it aligns very well with things we discussed at UDS 10 years ago :)21:02
clarkbbecause one of the things we've (openstack, infra, kolla, tripleo) struggled with here is that with the multi distro vendor container stuff we end up with massive containers for tiny packages21:02
mordredalthough I will admit I have used a snap for something21:02
clarkblike tripleo rsyslog container is >600MB21:02
clarkbI'm sure the package on centos is like 2MB21:03
persiaI have only limited excitement about flatpak, due to all the issues with distributing blobs, but I am really excited about freedesktop SDK.  I hope we can reach a point where we can use that as a reference point, and distros can value-add to that ABI, rather than the current confusion about "versions" of things.21:03
clarkbI just want packages to be 2MB again and not 700MB :)21:04
clarkbso that you can realistically have 100 of them in your system and not want to cry doing CI21:04
clarkb(then multiply that by the number of supported distros)21:04
mordredclarkb: or run out of filesystem space when trying to use docker to install a web app21:04
persiaclarkb: If you don't want to cry during CI, you might consider some of the non-package models (nix is a good example) :)21:04
clarkbya nix has been high on my list of things I should look at but then when I have a free moment I get distracted by brisket and fishing and my kids21:05
persiaA few years ago I was playing with some systems that let you define your system in yaml, talked to a build farm, and let you "checkout" specific configurations by git sha of the yaml config which applied binary diffs to the filesystem, and it made everything super-fast.  Unfortunately, it didn't work in lots of other ways, and that project isn't active anymore.  We're getting closer to sane with these sorts of things.21:06
*** ifat_afek has joined #openstack-infra21:06
clarkbin a former life I did linux sysadmin for a university and nix would be perfect there because then users can install packages and get the specific version of a thing they want21:06
clarkbwhich seemed like half the work we did as admins. Oh you need valgrind version x.y.z_1 because _2 has a bug? ugh21:06
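For anyone curious, the per-user install pattern being described looks roughly like this with nix (a sketch; the nixpkgs revision placeholder is where a specific valgrind version would come from):

    # installs into the calling user's profile (~/.nix-profile); no root needed
    nix-env -iA nixpkgs.valgrind
    # pinning to a specific nixpkgs revision is one way to get an exact version
    nix-env -f https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz -iA valgrind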
persia(side note: Apache doesn't like it very much when you upgrade the live-running server by slowly switching network namespaces on each new connection to the new backend version in the new filesystem namespace)21:07
openstackgerritTobias Henkel proposed openstack-infra/zuul master: WIP: Report tenant and project specific resource usage stats  https://review.openstack.org/61630621:07
clarkbThis was also my first run in with lisp. Debugging why certain versions of valgrind would crash under lush221:08
persiaIn a former life I spent most of my time trying to untangle the mess that was created after that, and make sure that everyone could install everything they wanted on a single set of library versions.  I like nix in lots of ways, but the security implications still scare me.21:08
clarkbpersia: don't user package installs theoretically isolate the user to what they could already do? they can run a webserver on high ports but no setuid etc21:09
clarkb*nix user packages21:09
persiaclarkb: Right, which means they can run a webserver with a remote exploit, and maybe something else with a privilege escalation exploit, and ...21:09
clarkbpersia: ya but they could do that anyway (without nix helping)21:10
*** rh-jelabarre has quit IRC21:10
persiaTrue, but in a package world, when the admins update the mirrors to have patched versions, any systems using those mirrors that are configured to upgrade, upgrade.  Users have to do more work to cause issues.21:10
clarkbthis is true the barrier to entry is a bit lower21:11
persiaNote that it's possible to cause nix to deal with it: it's just harder than with package-based systems.21:11
*** gfidente|afk has quit IRC21:11
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes  https://review.openstack.org/61626221:13
*** dpawlik has joined #openstack-infra21:14
openstackgerritMerged openstack-infra/system-config master: Remove infracloud from cacti  https://review.openstack.org/61626521:14
clarkbianw: should we remove planet01 from the emergency file?21:14
dmsimardclarkb: did you have a reliable way of reproducing the logs.o.o issue ?21:17
clarkbdmsimard: no, but it was reproducing itself about once an hour or so21:17
clarkbdmsimard: http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=1&from=now-6h&to=now the large spikes there seem to correlate with it21:18
dmsimardoh, interesting21:19
clarkbdmsimard: the logprocessor nodes were backing up like web clients21:20
clarkbso their queues would grow21:20
ianwclarkb: if it's back, yep21:21
*** dpawlik has quit IRC21:22
ianwclarkb: done21:22
*** dpawlik has joined #openstack-infra21:22
clarkbthanks21:23
ianwcorvus/clarkb: https://review.openstack.org/#/c/614894/ is for pinning ansible, and came out of prior discussions on corvus' pin.  thoughts welcome :)  there's a follow-on for openstacksdk too21:23
*** ifat_afek has quit IRC21:24
corvusianw: my main concern with voting on that is github reliability -- however, we do have ansible in our zuul; we can add it to required-projects and install from there21:26
*** dpawlik has quit IRC21:27
*** rh-jelabarre has joined #openstack-infra21:28
ianwcorvus: hrrm, good point, the devel job should be able to use a local path just as easily, i'll look at that21:29
corvusianw: bonus: depends-on will work :)21:29
ianw++21:30
ianwi was a bit tunnel visioned on getting a URL working and being able to leave the version out, should have thought of pulling it via zuul21:31
*** jcoufal has quit IRC21:32
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Set ansible's default branch to devel  https://review.openstack.org/61631421:33
corvusianw: we should land that first ^ :)21:33
*** dpawlik has joined #openstack-infra21:33
corvusmordred, Shrews: ^21:34
dmsimardAt what point should we consider moving the base job from xenial to bionic ?21:35
dmsimardI had to move certain jobs to bionic because they required a more recent version of py321:35
clarkbdmsimard: the TC and QA team are still trying to figure that out I think. We've asked that infra not be the decision maker on that21:36
dmsimardfair21:36
clarkb(this is one reason that multiple tenants in zuul will be helpful)21:36
clarkbfungi: fyi I created https://etherpad.openstack.org/p/BER-opendev-feedback-and-missing-features for your forum session21:37
*** ansmith has quit IRC21:37
*** dpawlik has quit IRC21:38
*** dpawlik has joined #openstack-infra21:38
clarkbadded it to the etherpad too21:38
clarkber wiki page for etherpads21:38
*** dpawlik has quit IRC21:39
*** dpawlik_ has joined #openstack-infra21:39
clarkbfungi: prepopulated it with the list of things from the schedule which is probably a good start21:40
clarkbIf anyone else has forum sessions they intend on using an etherpad with please update https://wiki.openstack.org/wiki/Forum/Berlin2018 :)21:41
openstackgerritIan Wienand proposed openstack-infra/system-config master: run_cloud_launcher.sh : generate runtime stats  https://review.openstack.org/61635521:43
*** bobh has joined #openstack-infra21:48
*** bobh has quit IRC21:51
*** fuentess has joined #openstack-infra21:52
clarkbhrm I think there was another case of slowness at ~21:24-21:3021:56
clarkband the wsgi change went in at 20:4921:56
clarkbLogLevel info next?21:56
*** jungleboyj has joined #openstack-infra21:57
dmsimardclarkb: it's recurring every hour, do we have something that could explain that ? There's some spikes in load elsewhere http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=25&fullscreen&orgId=121:57
clarkbdmsimard: could be gate resets21:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests  https://review.openstack.org/61535621:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Remove unneeded nodepool test skips  https://review.openstack.org/61635821:58
dmsimardclarkb: the logstash workers are kinda spiky too21:58
dmsimardhttp://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=1267&rra_id=0&view_type=tree&graph_start=1541541508&graph_end=154162790821:58
clarkbdmsimard: ya ultimately its all driven by how quickly zuul pushes stuff out21:59
Shrewscorvus: what does that fix (if anything)?22:03
clarkbianw: the wsgi timeouts ring any bells for you? logs.o.o server will get slow periodically (seems to coincide with when log processing has a backlog http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=1&from=now-6h&to=now) then will go away22:03
Shrewscorvus: oh, ok. helps to read scrollback22:03
clarkbianw: best I can tell logs-dev.o.o doesn't have that issue on the same log files, implying it's something to do with apache + mod_wsgi + our wsgi apps in that particular process group22:04
corvusShrews: ok; was about to answer it's for jobs like ianw's to cross-test our playbooks with ansible, but let me know if you have further questions.22:04
corvusShrews: (basically gets rid of a bunch of override-checkouts.  i don't think it will affect openstacksdk, which *does* have a bunch of override checkouts -- i wouldn't remove them since there's complex branch logic that probably has to stay anyway)22:05
Shrewscorvus: yeah, i was seeing the "branches: devel" in the job and was confused at first22:05
*** trown is now known as trown|outtypewww22:06
hogepodgeclarkb: it looks to me like async doesn't give us any performance advantage, so I may just pull it all out22:07
clarkbhogepodge: ok, will be curious to know if it is more reliable22:07
ianwclarkb: sorry no :/22:07
clarkbdmsimard: at full steam ahead logprocessors will process 80 files at once. Though the vast majority of the time involved there should be in the file munging not downloading (so I don't expect we'd see 80 concurrent requests often). Does make me think the issue is elsewhere and log processing just sees the symptom as fungi put it22:10
openstackgerritMerged openstack-infra/project-config master: Set ansible's default branch to devel  https://review.openstack.org/61631422:11
dmsimardclarkb: need to take care of dinner, I can take another look later tonight22:13
clarkbdmsimard: ya I'm beginning to think this may be a sleep on it problem22:13
clarkbI've got a tail on the error log now. If I catch it erroring out in the near future I'll poke around and try to see if we had a gate reset or any other external event that may be clogging the pipeline22:13
corvusclarkb: or we could switch to swift logs22:14
clarkbbut otherwise I need to clear head and get some other stuff done and look at it with fresh eyes22:14
clarkbcorvus: ya there is that too22:14
clarkbcorvus: the last remaining bits were mostly edge cases that tobiash was finding?22:14
corvusoh, i'm not aware of any issues.  i thought they're ready to go; i've mostly just been hoping we can get other stuff in place in zuul to make them nicer, before the next system crash forces us to switch.22:15
clarkbgotcha22:15
*** kjackal has quit IRC22:20
*** mriedem_afk is now known as mriedem22:22
*** dpawlik_ has quit IRC22:23
*** dpawlik has joined #openstack-infra22:24
clarkbI do worry that if we change it and all get on planes the next few days we might not be able to fix bigger issues if something goes wrong (right now jobs are functioning and uploading logs and you can usually get the log files)22:25
clarkbmaybe plan for it the week after summit (when it's a bit quieter for US thanksgiving) and we can watch it a bit closer?22:26
*** dpawlik has quit IRC22:28
corvusyeah.22:30
hogepodgeclarkb: well, failure rate is still the same, some jobs are about 50% slower, but now I can actually get a log of what went wrong22:31
hogepodgefor example http://logs.openstack.org/37/616337/2/check/loci-glance/89bbd6e/job-output.txt.gz#_2018-11-07_22_26_39_54669622:31
*** rfolco|off has quit IRC22:34
openstackgerritIan Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job  https://review.openstack.org/61489422:35
openstackgerritIan Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk  https://review.openstack.org/61598222:35
*** boden has quit IRC22:36
openstackgerritIan Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk  https://review.openstack.org/61598222:36
*** ssbarnea has quit IRC22:37
clarkbwell other than that one at 21:30 ish we've not appeared to see any more slowdowns since. Maybe the 16 threads did help but not fix the problem (eg need more processes?)22:38
clarkbneed more data22:38
hogepodgeclarkb: is there a way to recheck just a single job, or do I need to do them all?22:40
*** dpawlik has joined #openstack-infra22:40
clarkbhogepodge: you have to do all of them (this is by design to prevent locking in of results for flaky jobs)22:40
ianwhogepodge: i have no idea what you're doing :)  but for intermediate testing i often edit down the zuul file to just the interesting jobs22:41
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests  https://review.openstack.org/61535622:41
hogepodgeianw: I'm trying to figure out what I'm doing. loci testing does a bunch of things. It has one job for every supported openstack project (and requirements), and within those jobs it does three builds. I think I want to do a major refactoring, but have to think about how to organize it. I'm just trying to understand why jobs have a 10% failure rate in infra22:43
hogepodgeFor that particular project. It's too high and has turned into a blocker for us.22:43
lbragstadhas anyone seen things like this crop up recently wrt reno? http://logs.openstack.org/61/610661/4/check/test-release-openstack/17eca5b/job-output.txt.gz#_2018-11-07_14_04_51_88797222:43
clarkblbragstad: I think that is the twine based linting of your python package22:44
clarkblbragstad: basically python tooling is saying your package is bad please fix it22:44
dhellmannyeah, look at the links in your readme22:44
hogepodgesamyaple set up all those jobs and is mia, so there's a bit of learning going on.22:44
*** dpawlik has quit IRC22:44
lbragstadok - so glance master would be broken, too?22:45
dhellmannlbragstad : that job only runs if someone tries to change the packaging files (including the readme)22:45
lbragstadaha - ok that makes sense22:45
dhellmannyou can run the check locally by installing twine and docutils, building an sdist, then running twine check on the output file22:46
dhellmannin this case there is almost certainly a redundant link with the title "release notes" in the readme22:46
dhellmannto fix that you can make them anonymous links (use double underscore at the end like : `blah <url>`__)22:47
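Spelled out, the local check dhellmann describes is just the following, run from the project root:

    pip install twine docutils
    python setup.py sdist
    twine check dist/*
    # a duplicate "release notes" link target in the readme can be made
    # anonymous by ending it with a double underscore: `release notes <url>`__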
*** priteau has quit IRC22:47
openstackgerritIan Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job  https://review.openstack.org/61489422:51
openstackgerritIan Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk  https://review.openstack.org/61598222:51
lbragstadthanks dhellmann clarkb22:51
*** rh-jelabarre has quit IRC22:54
*** slaweq has quit IRC23:01
openstackgerritIan Wienand proposed openstack-infra/system-config master: [dnm] testing depends-on for openstacksdk in devel job  https://review.openstack.org/61637223:06
fungiokay, back and catching up now23:11
openstackgerritIan Wienand proposed openstack-infra/system-config master: [dnm] testing depends-on for openstacksdk in devel job  https://review.openstack.org/61637223:15
clarkbI tried inducing load on apache by downloading the same file 80 times in different processes and that didn't do it either (which would be similar-ish to what the logprocessors do)23:16
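That load test is essentially the following (the URL is a placeholder for whichever log file was fetched):

    # fetch the same file 80 times in parallel and wait for all of them
    for i in $(seq 1 80); do
        curl -s -o /dev/null 'http://logs.openstack.org/path/to/job-output.txt.gz' &
    done
    wait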
*** rlandy is now known as rlandy|bbl23:29
ianwcorvus / clarkb: https://review.openstack.org/#/c/614894 & https://review.openstack.org/#/c/615982 updated to use zuul checkouts for devel job, thanks.  (it even works https://review.openstack.org/#/c/616372 :)23:31
clarkbianw: other than needing to figure out router things for citycloud did you get what you needed for brinign up the armci cloud?23:31
ianwclarkb: i think so, the cloud launcher has at least run there now so i'll try bringing up a mirror shortly23:32
ianwclarkb: have we restarted nodepool launcher with recent sdk's lately?23:35
clarkbianw: yes, I did so last week23:36
clarkband removed all the nodepool nodes from the emergency file at that time23:36
ianwclarkb: hrm, we still don't seem to be getting stats; i wonder if we missed the fixes or something else ...23:36
ianw0.19.0 according to pip23:37
ianwthat should have the fix ... hrm23:38
clarkbianw: I think I caught a bug in the ansible and sdk install changes (please double check me)23:39
clarkbon the logs front no issues since I started tailing the error log. Clearly we should just all look at the logs server sternly and then it will work23:39
clarkbianw: also we've been iterating on https://etherpad.openstack.org/p/WUNOTv8MuP for opendev messaging. If you want to take a look at that sometime. My goal is to send it out tomorrow morning local time23:41
ianwclarkb: replied, it's as i intended but i'm very open to better ways of doing it ...23:42
clarkbianw: looking at that chain of logic though we'll set the var to latest if it is not defined. Then the next block won't set the _foo_var name and we'll omit version at the end because _foo_var isn't set?23:43
clarkbianw: its that middle block not setting the _foo_var when latest that I think breaks the first block's intent?23:43
ianwclarkb: that's right, because "latest" is a special variable where we don't want to set the version:, but the state:23:44
*** dhellmann has quit IRC23:45
clarkbah23:45
clarkbthis is ansible being fancy23:45
clarkbok23:45
ianwyeah, i guess as far as the pip module is concerned a version could be anything, so it's quite happy trying to do "pip install package@latest"23:46
*** eharney has quit IRC23:47
clarkbI've updated my reviews and left a breadcrumb for anyone that is confused like me23:47
ianwi think it's a bit nicer to hide the complexity in the role, and then at the top level we just either set the version to "x.y.z" or "latest" and don't have to worry about fiddling state tags, etc23:49
clarkbya23:49
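In pip terms, the two cases ianw describes boil down to roughly this (a sketch of the intent, not the role's literal tasks; the pinned version is just the one mentioned earlier for bridge.o.o):

    # a pinned version is passed straight through to pip
    pip install 'ansible==2.7.0'
    # the special "latest" value maps to the pip module's state=latest,
    # which is roughly an unpinned upgrade to whatever is newest
    pip install --upgrade ansible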
clarkbianw: fwiw I'm not seeing anything wrong with the nodepool and sdk installations23:51
clarkbianw: is https://git.openstack.org/cgit/openstack/openstacksdk/commit/?id=dd5f0f68274df4106902aa71a3f882d70c673dab the fix for stats?23:52
clarkbianw: I want to say nodepool doesn't use the taskmanager in sdk yet23:52
ianwclarkb: that was my fix but clearly something is wrong ...23:56
clarkbianw: I think it may be because nodepool doesn't use that code? I'm working to confirm23:56
ianwi'm not seeing any "Manager %s ran task %s in %ss" messages being triggered23:57
clarkbianw: ya it subclasses the sdk class but doesn't super anything23:57
clarkbso submit_task for example is all nodepool23:57
clarkbsame with post_run_task23:58
ianwhrmm, i can't see that post_run_task is being triggered at all ... and then why did we break with the new sdk?23:59
