Wednesday, 2018-05-16

*** myoung|ruck|afk is now known as myoung|ruck00:14
*** rlandy is now known as rlandy|bbl00:16
pabelangerdmsimard|off: why does ARA need to be installed for ara-report when ara_report_type == database?00:27
pabelangerinstall on executor?00:27
pabelangeras I understand it, the log server would run ARA against the database when somebody hits the web right?00:28
pabelangeroh, It is so the database is generated to start with, when we run ansible-playbook00:29
pabelanger:)00:30
pabelangerkk, sorry for the noise00:30
dmsimard|offpabelanger: yeah, the executors need to have the callback enabled to have the database. The callback will be split into another python module in 1.0 so it's less of a pain.00:35
pabelangeryah, thanks. I forgot that step. I do think ara-report could only check to make sure the database was found, since ara-report doesn't actually need to run ara at that point00:40
pabelangerwill test tomorrow and find out, ara is in a virtualenv, and not sure if the role will find it00:40
*** elyezer has quit IRC00:58
*** elyezer has joined #zuul01:10
*** elyezer_ has joined #zuul01:12
*** elyezer has quit IRC01:16
*** ssbarnea_ has quit IRC01:34
*** rlandy|bbl has quit IRC02:16
*** snapiri has joined #zuul05:32
*** smyers has quit IRC05:44
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Support merged as requirement in github driver  https://review.openstack.org/56848805:44
*** smyers has joined #zuul05:45
*** AJaeger has quit IRC06:38
*** AJaeger has joined #zuul06:43
*** AJaeger has quit IRC07:27
*** AJaeger has joined #zuul07:29
*** gtema has joined #zuul07:42
*** jpena|off is now known as jpena07:50
*** dims has quit IRC07:59
*** dims has joined #zuul08:02
*** dims has quit IRC08:07
*** dims has joined #zuul08:07
*** ssbarnea_ has joined #zuul08:34
*** ekan is now known as johanssone09:12
*** corvus has quit IRC10:29
*** corvus has joined #zuul10:30
*** hashar has joined #zuul10:33
*** hashar has quit IRC11:02
*** ssbarnea_ has quit IRC11:14
*** jpena is now known as jpena|lunch11:47
*** ssbarnea_ has joined #zuul11:47
*** rlandy has joined #zuul12:29
*** jpena|lunch is now known as jpena12:43
*** elyezer_ has quit IRC13:30
*** gtema has quit IRC13:36
*** elyezer_ has joined #zuul13:43
*** acozine1 has joined #zuul13:43
*** gtema has joined #zuul14:20
*** dkranz has quit IRC15:07
*** dkranz has joined #zuul15:09
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Replace use of aiohttp with cherrypy  https://review.openstack.org/56795915:17
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/56833515:17
*** gtema has quit IRC16:15
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Replace use of aiohttp with cherrypy  https://review.openstack.org/56795916:30
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/56833516:30
mordredcorvus: 568028 has 3 +2s - should we hold off on landing it until the other two are ready?16:32
corvusmordred: yeah, i've miped it for now.  also we should probably land the mqtt change first; i think it might conflict16:33
corvusthat's https://review.openstack.org/53554316:33
corvusi think it's ready to go, but i didn't want to land a major change since i'm so distracted; but if others are around and want to, i think that's fine.16:34
mordredcorvus: I'm going to run to the store, but I can maybe land it and keep an eye on things when I get back ... do we have a patch anywhere to connect it to firehose?16:37
corvusmordred: i don't think so16:42
*** jpena is now known as jpena|off17:11
*** sshnaidm|rover is now known as sshnaidm|off17:29
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Status branch protection checking for github  https://review.openstack.org/53568017:39
tobiashfinally added tests to ^ :)17:42
SpamapSLooks like the split of zk so far is preventing all the nodepool waiting17:47
SpamapSand in fact with that happening, now the executor gets load up to the governor17:47
tobiashSpamapS: so happy end now :)17:48
pabelangerwoot17:48
pabelangerSpamapS: how larger was your shared zookeeper before the move?17:48
pabelangererr17:49
pabelangersmaller*17:49
SpamapSpabelanger: so I have everything on a single 16GB 8vcpu VM that tops out at around 400 IO/s17:50
SpamapSmoved Zookeeper to an 8GB 4vcpu VM on its own17:50
pabelangerSpamapS: good info, I too am testing a single VM, but don't really have a lot of jobs currently17:51
SpamapSMy executor hit the governor load of 20 for the first time a few minutes ago.17:52
SpamapSThere were probably 25 - 30 concurrent playbooks running17:52
SpamapSI should get a graphite/statsd set up so I can tell17:53
corvusSpamapS: i wonder if there's a metric or something we could log to help identify this problem?17:53
SpamapScorvus: zookeeper was warning me17:54
SpamapSApr 19 14:50:49 zuul.cloud.phx3.gdg zookeeper[2178]: 2018-04-19 14:50:49,268 - WARN  [SyncThread:0:FileTxnLog@338] - fsync-ing the write ahead log in SyncThread:0 took 3451ms which will adv17:54
SpamapScorvus: we should tell people to watch out for that17:54
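One way to watch for the warning quoted above is to scan the ZooKeeper log for it and pull out the reported duration. This is only a minimal sketch: the function name `slow_fsyncs` and the 1000 ms default threshold are assumptions for illustration, and the pattern is taken from the single log line pasted in the channel.

```python
import re

# Matches the duration in the slow-fsync warning as it was pasted in the channel.
FSYNC_WARNING = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return fsync durations (in ms) at or above threshold_ms (an assumed cutoff)."""
    durations = []
    for line in lines:
        match = FSYNC_WARNING.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            durations.append(int(match.group(1)))
    return durations


sample = [
    "2018-04-19 14:50:49,268 - WARN  [SyncThread:0:FileTxnLog@338] - "
    "fsync-ing the write ahead log in SyncThread:0 took 3451ms",
    "2018-04-19 14:50:50,001 - INFO  unrelated line",
]
print(slow_fsyncs(sample))
```

Feeding the result into statsd or a similar metrics system would be one possible next step, but that wiring is left out here.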
tobiashSpamapS: did you also enable autopurge.purgeInterval?17:55
tobiashthat makes sure that you don't fill up all your space with snapshots17:55
tobiash(or in my case it filled its data tmpfs with snapshots and oomed once a week)17:56
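The `autopurge.purgeInterval` setting tobiash mentions lives in ZooKeeper's `zoo.cfg`; it is given in hours, and the default of 0 leaves automatic purging off. The sketch below checks a config text for it. The helper names and the `dataDir` path in the example are hypothetical, and the parser only handles plain `key=value` lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def autopurge_enabled(cfg_text):
    """True if autopurge.purgeInterval is set to a positive number of hours."""
    props = parse_properties(cfg_text)
    return int(props.get("autopurge.purgeInterval", "0")) > 0


# Example config; the dataDir path is made up for illustration.
example_cfg = """
dataDir=/var/lib/zookeeper
# purge old snapshots and transaction logs every 6 hours
autopurge.purgeInterval=6
autopurge.snapRetainCount=3
"""
print(autopurge_enabled(example_cfg))
```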
SpamapStobiash: no I just run the cleanout script17:57
SpamapSbut putting that on tmpfs would probably make it pretty fast. :)17:57
tobiashok, that's the other option ;)17:57
tobiashSpamapS: yes but then you should spread it to several vms17:58
tobiashI'm running it with 5 replicas on tmpfs17:59
*** electrofelix has quit IRC18:01
SpamapStobiash: IIRC that does not improve write load18:02
tobiashSpamapS: I know, more zk are actually slower but I have 5 to reduce the risk of data loss and having to rebuild all images18:03
fungitmpfs (in linux anyway) is really just the kernel's filesystem cache layer divorced from any underlying physical block layer and granted the ability to page out to swap18:03
tobiashI think the recommendation was 3, 5 or at max 718:03
fungiso when you've got available ram, filesystem caching is roughly as performant as tmpfs, and when you don't have available ram tmpfs is using swap which makes it about the same performance as any actual block-backed fs18:05
SpamapStobiash: ah yeah, I should do that. ;)18:05
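The "3, 5 or at max 7" recommendation follows from ZooKeeper's majority quorum: a write needs acknowledgement from more than half of the ensemble, so extra members add fault tolerance but not write throughput. A small sketch of that arithmetic, with function names chosen here for illustration:

```python
def quorum_size(ensemble_size):
    """Smallest number of members that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many members can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 4 tolerates only one failure, the same as 3, which is why odd sizes are the usual choice.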
SpamapSLoss of nodepool data would mostly mean that you have to clean up all the ready nodes and images.18:06
tobiashfungi: zk does many fsync calls afaik and my nodes don't have swap ;)18:06
SpamapSfungi: in the past swapping was far less performant than filesystem flushing.18:06
ShrewsSpamapS: awesome. i suspected the issue had to be environmental, but glad you confirmed. zk gets a LOT of traffic18:09
SpamapSyeah, it's just weird because it wasn't showing as much io wait (15-20 percent) so was kinda hidden.18:10
*** elyezer_ has quit IRC18:10
*** elyezer has joined #zuul18:12
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Replace use of aiohttp with cherrypy  https://review.openstack.org/56795918:12
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/56833518:12
ShrewsSpamapS: iirc, we have nodepool code that would clean up leaked instances (due to zk data loss)18:30
Shrewsi can't remember if there is an images equivalent18:30
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: WIP: Simplify driver API  https://review.openstack.org/56870418:32
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: WIP: Simplify driver API  https://review.openstack.org/56870418:34
tobiashShrews: afaik there is no image equivalent18:36
tobiashbut maybe that would make sense18:36
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Simplify driver API  https://review.openstack.org/56870418:37
Shrewstobiash: yeah, maybe18:37
tobiashI think we would need to add a nodepool_provider_name property also to the images and add a cleanup thread18:37
Shrewsyeah, we'd need something to say "we own this"18:37
tobiashwe already have upload and build id but that doesn't tell us that information18:38
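The idea being discussed, tagging images with an ownership property and then cleaning up any that ZooKeeper no longer knows about, could look roughly like the sketch below. It is not nodepool's actual code: the data shapes, the function name, and the `nodepool_upload_id` metadata key are all hypothetical; only the `nodepool_provider_name` property comes from the conversation.

```python
def find_leaked_images(cloud_images, known_upload_ids, provider_name):
    """Return ids of images we own that have no matching record in ZooKeeper.

    cloud_images: list of dicts with "id" and "metadata" (hypothetical shape).
    known_upload_ids: set of upload ids still recorded in ZooKeeper.
    """
    leaked = []
    for image in cloud_images:
        meta = image.get("metadata", {})
        # The ownership marker says "we own this"; skip everything else.
        if meta.get("nodepool_provider_name") != provider_name:
            continue
        # Hypothetical key linking the cloud image back to its upload record.
        if meta.get("nodepool_upload_id") not in known_upload_ids:
            leaked.append(image["id"])
    return leaked


images = [
    {"id": "img-1", "metadata": {"nodepool_provider_name": "cloud-a",
                                 "nodepool_upload_id": "0001"}},
    {"id": "img-2", "metadata": {"nodepool_provider_name": "cloud-a",
                                 "nodepool_upload_id": "0002"}},
    {"id": "img-3", "metadata": {}},  # not ours, never touched
]
print(find_leaked_images(images, {"0001"}, "cloud-a"))
```

Skipping images without the ownership marker is the point of the extra property: upload and build ids alone do not show that an image belongs to this nodepool.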
ShrewstristanC (and any EasyStack folks who may silently linger here): I know 568704 is totally going to cause havoc on your driver proposals, but I think that is going to be a much easier interface, if you find time to take a quick peek.18:42
Shrewsno rush, obviously18:42
Shrewsi still need to get to the end of what i'm changing to validate that's actually going to work for us. more changes may come18:44
*** gtema has joined #zuul19:00
*** gtema has quit IRC19:07
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: WIP: Cleanup leaked images  https://review.openstack.org/56893719:09
tobiashShrews: that's a first ugly wip ^19:09
tobiashbut I think we might want to move that into the provider and add a cleanupLeakedImages function to the driver api?19:10
Shrewstobiash: it should be in the builder, not launcher19:15
tobiashoh right, there is also a cleanup worker19:18
corvusmordred: the cherrypy change is ready -- it passes tests locally, however, it's hitting process-returncode failures in the gate with no output.  do you think the stestr change would help illuminate the problem?19:47
mordredcorvus: MAYBE?19:48
mordredcorvus: I mean, it's worth a depends-on19:50
corvusmordred: any idea where that ended up? :)19:51
corvusmordred: oh, i think it was merged and reverted, and there's no unrevert19:51
corvus536882 was original19:51
corvusi'll push up an unrevert and stack on it19:52
mordredcool19:52
mordrediirc, I think there was an issue with the original too that we uncovered that we were going to fix when we unreverted - although it might have just been a documentation issue19:53
corvusyeah, i wish i had been more verbose in the revert commit :|19:53
corvusi know one of the errors i ran into was my fault; i can't recall the others19:54
mordredme either19:54
mordredhowever - I can help re-diagnose them as soon as you encounter them19:54
corvusyeah, as long as we aren't in a rush and can take some time to poke at it over the next couple weeks i'm sure we can sort it out19:55
mordred++19:55
corvuswe just switched nodepool to stestr19:55
corvusand i think Shrews may have ironed out some things there?19:55
corvusso probably worth refreshing the zuul stestr change to account for any differences that ended up in the nodepool one19:56
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Revert "Revert "Switch to stestr""  https://review.openstack.org/56894920:03
Shrewsonly issue i had with the nodepool testr change was being able to run ttrun20:08
Shrewswhich mordred fixed20:08
Shrewss/testr/stestr20:09
fungicorvus: any reason to hold off approving the mqtt publisher? lgtm but since it's accumulating +2s i didn't know if there was a reason to wait, seeking more feedback, something20:14
mordredfungi: I think it's mostly making sure there's someone to watch it since it's a big change - I was gonna do it after lunch but got sidetracked20:15
fungiokay, cool20:15
fungii'm already on the hook for watching the storyboard update unfold20:15
fungiwhich looks like it's going to apply any moment now20:16
corvusyeah, i just didn't want to accidentally make more work for someone since i'm running around with very small timeslices right now :)20:20
corvusshould be totally fine20:20
corvusmordred: ft1.2: tests.unit.test_streaming.TestStreaming.test_decode_boundaries_StringException20:49
mordredcorvus: that doesn't seem to be much more helpful20:50
corvusstill looks like alarm clock timeouts leave us with no data20:50
mordredcorvus: "StringException" isn't clear to you?20:51
corvusmordred: oh... hrm... i wonder if this actually has given us the answer...20:52
mordredcorvus: oh -like, tests.unit.test_streaming.TestStreaming.test_decode_boundaries was the test that bonged?20:52
corvusyeah, i think so20:52
corvus*maybe* stestr is better at reporting the actual failing test, whereas under testr the failure sometimes gets allocated to the wrong test?20:53
corvusi don't know if that's actually the case, or maybe it's just the case that the random number generator happened to run the failing test last this time or something :)20:54
corvusbut it definitely hangs locally20:54
mordred\o/20:56
corvuser, hrm.  no actually that only hangs with 568335.  when i run it under 567959 it works locally20:56
*** acozine1 has quit IRC21:04
*** dkranz has quit IRC21:09
ianwcorvus: I noticed this doing some debugging of aiohttp stuff as well.  i removed all the OS_CAPTURE stuff from .testr.conf and then it started spitting out exceptions for me21:14
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Replace use of aiohttp with cherrypy  https://review.openstack.org/56795921:22
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Convert streaming unit test to ws4py and remove aiohttp  https://review.openstack.org/56833521:22
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Revert "Revert "Switch to stestr""  https://review.openstack.org/56894921:22
corvusianw: yeah, if i run out of options, i may have to do that and try to capture the console output to see if i can catch the issue.  it's just that in that case we get all of the output, which is enormous, and it's all interleaved :/21:23
ianwwhat was weird was that i didn't get all the debugging logs, etc.  just the exception output21:25
dimscorvus : mordred : is there a badge functionality in zuul? (like the "build|passing" in travis - https://docs.travis-ci.com/user/status-images/)21:34
corvusdims: afaik zuul sends the right things to github for build status to show up...21:35
mordredyah - but not a badge like travis has21:35
mordredbut that's because with zuul there is no other state for a tree to be in21:36
corvusoh, that21:36
mordredso there could just be a static image badge "gated by zuul" or something21:36
corvusyeah, just link to a green box :)21:36
mordredwe should totally do that21:36
corvus++21:36
dimsright!!21:36
mordreddims: consider your feature request accepted :)21:36
dimsw00t21:36
SpamapS:-D22:10
SpamapSShould just have a badge which is the zuul logo and the number of days since zuul first merged something: "It has been __ Days since the last failed build."22:12
*** ssbarnea_ has quit IRC22:14
*** pabelanger has quit IRC22:15
*** _ari_ has quit IRC22:16
*** mhu has quit IRC22:16
*** myoung|ruck has quit IRC22:17
*** weshay has quit IRC22:17
*** weshay has joined #zuul22:22
*** weshay has quit IRC22:26
*** rlandy is now known as rlandy|bbl22:28
*** andreaf has quit IRC22:29
*** andreaf has joined #zuul22:29
*** weshay has joined #zuul22:34
*** pabelanger has joined #zuul22:34
*** _ari_ has joined #zuul22:35
*** myoung has joined #zuul22:35
*** mhu has joined #zuul22:36
openstackgerritMonty Taylor proposed openstack-infra/zuul-website master: Add a "zuul: gated" status badge  https://review.openstack.org/56897522:40
mordredSpamapS: ++22:40
mordreddims: ^^ how's that?22:40
mordredit'll be at https://zuul-ci.org/gated.png if/when that lands22:40
corvusthe preview will be ready in just a minute22:41
mordredSpamapS: I like your idea too - I sort of feel like we should have a collection of fun/snarky badges people can use22:41
corvushttp://logs.openstack.org/75/568975/1/check/zuul-website-build/328c5ec/html/gated.png22:45
SpamapSI kind of want it to be a chunk of javascript, not a .png22:48
SpamapSeven if it just displays a png now22:48
SpamapSeventually...22:48
SpamapSthere's some real fun we can have.22:48
SpamapSLike we could make it an homage to the McDonalds 1,000,032 burgers served sign... or count how many gate fails there have been and be like "Protected against 45 bad patches. You're welcome."22:49
mordredheh22:57
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Add documentation about using the badge  https://review.openstack.org/56897822:59
fungiif there's one thing i learned from scouts, it's that people love collecting badges23:01
corvusbadge jokes?  we don't need no stinkin' badge jokes!23:02
fungiwow! ;)23:02
mordredyay for jokes in comments23:03
*** pabelanger has quit IRC23:17
*** mhu has quit IRC23:17
*** weshay has quit IRC23:17
*** myoung has quit IRC23:17
*** _ari_ has quit IRC23:18
*** pabelanger has joined #zuul23:20
*** weshay has joined #zuul23:21
*** _ari_ has joined #zuul23:21
*** myoung has joined #zuul23:23
ianwnodepool@nl01:~$ nodepool list | grep arm6423:24
ianw2018-05-16 23:23:39,227 WARNING kazoo.client: Connection dropped: socket connection error: Permission denied23:24
*** mhu has joined #zuul23:24
ianwwhy do i always see that ^23:24
*** _ari_ has quit IRC23:28
*** _ari_ has joined #zuul23:28
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Add documentation about using the badge  https://review.openstack.org/56897823:53
*** myoung is now known as myoung|ruck23:56
