Wednesday, 2018-12-05

jheskethdmsimard: No, there isn't a spec (afaik). Happy to work on one with you though if you think it's useful. Otherwise hopefully the direction those patches are going in helps explain the goals.00:46
* jhesketh really needs to return to that :-s00:46
pabelangeroh, interesting, in one of my tests, nodepool-builder lost access to zookeeper, it seems during a build. This is on ovh, so I suspect the node is slow, however I wouldn't have expected nodepool-builder to delete the already completed DIB image and generate it again: http://logs.openstack.org/04/622604/1/check/windmill-src-fedora-latest/88c57a7/logs/nb01/var/log/nodepool/builder-debug.log00:51
*** rlandy has quit IRC00:59
pabelangerShrews: Looking at https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/builder.py#n795 if DIB was successful, what zk info could we have lost, if we lose the connection to zk?01:05
pabelangerShrews: basically, trying to understand why we set zk.FAILED there if the connection is lost, and not wait for it to recover01:06
pabelangerlike we did at https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/builder.py#n78101:06
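A minimal sketch of the trade-off pabelanger is asking about, using the kazoo client that nodepool builds on; wait_for_reconnect and the build object are illustrative, not nodepool's actual code:

    import time

    from kazoo.client import KazooClient

    def wait_for_reconnect(zk: KazooClient, timeout: float = 120.0) -> bool:
        """Poll until the ZooKeeper connection recovers, or give up."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if zk.connected:  # kazoo exposes the connection state here
                return True
            time.sleep(1)
        return False

    def record_build_result(zk: KazooClient, build, success: bool) -> None:
        if not zk.connected and not wait_for_reconnect(zk):
            # The branch being questioned: marking an already-completed
            # DIB build FAILED on connection loss, which forces a rebuild.
            build.state = 'FAILED'
            return
        build.state = 'READY' if success else 'FAILED'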
*** j^2 has joined #zuul02:15
tristanCShrews: not sure what changed between 3.3.0 and 3.3.1, but "deleting" nodes are leaking, we now have 16 of those... here are the logs: http://paste.openstack.org/show/736679/02:17
clarkbtristanC: ya there is a fix for that, though I don't think it was an issue in 3.3.1? it came after, iirc02:18
clarkbbasically nodepool created delete records for nodes that had exceptions booting (previously we leaked them without a node record) but the first fix didn't include pool info so we still couldn't delete them02:19
tristanCclarkb: i also thought the missing pool info was the issue because of the "OpenStackProvider: Cannot find provider pool for node" message02:21
tristanCbut the NodeDeleter doesn't seem to use the pool info to delete the node02:21
tristanCthen I looked into the "node_exists" attribute, but Shrews said those nodes should have an id, e.g.: https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/launcher.py#n9302:23
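Roughly what clarkb's description amounts to; this is an illustrative reconstruction, not nodepool's code, and the node_iterator helper is assumed:

    def cleanup_deleting_nodes(zk, providers):
        # A node that failed to boot now gets a znode in the "deleting"
        # state, but if that record was written without provider/pool
        # info, a cleanup pass that resolves the pool first will skip it
        # on every iteration -- hence the leak and the
        # "Cannot find provider pool for node" messages.
        for node in zk.node_iterator():  # assumed helper yielding records
            if node.state != 'deleting':
                continue
            provider = providers.get(node.provider)
            if provider is None or node.pool not in provider.pools:
                continue  # skipped forever until pool info is recorded
            provider.delete_node(node)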
dmsimardtristanC: that's another fix that landed02:23
dmsimardNot sure if it's been released02:23
dmsimardhttps://review.openstack.org/#/c/622403/02:26
dmsimardSaw this issue in our logs a while back02:27
tristanCdmsimard: iiuc, that's about empty zknode, which also are an issue, but a different one02:27
tristanCperhaps https://review.openstack.org/#/c/621301/ would help figure out what's the issue02:27
tristanCcommit message doesn't really help though...02:27
clarkbtristanC: I thought it needed the pool info as it was short-circuiting otherwise02:28
tristanCclarkb: alright, thanks for the tip. I'll give the fix a try02:30
pabelangertristanC: clarkb: https://review.openstack.org/621040/ is the change I think will fix leaked nodes for vexxhost in ansible-network... as for the deleting nodes: because this is rdocloud, i think you need to have a cloud admin reset the state of the VM; I believe nodepool is issuing the delete but openstack isn't completing it02:42
pabelangertristanC: you can test, by trying to manually delete the node with openstack client02:42
pabelangerif that fails, then the state needs to be toggled on openstack side02:42
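pabelanger's suggested check, sketched with openstacksdk rather than the CLI; the cloud and server names are placeholders:

    import openstack

    conn = openstack.connect(cloud='rdocloud')  # placeholder cloud name
    server = conn.compute.find_server('leaked-node-0000001234')
    if server:
        try:
            conn.compute.delete_server(server)
        except openstack.exceptions.SDKException as exc:
            # Deletion refused by the cloud: a cloud admin likely has to
            # reset the VM state before the delete can succeed.
            print(f"delete failed, state reset needed? {exc}")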
tristanCpabelanger: there are some Unauthorized exceptions too, not sure if they came with nodepool-3.3.1 though... here is an example: http://paste.openstack.org/show/736680/02:47
pabelangertristanC: if you trace the log, was that to limestone? If so, we did a password reset today and updated clouds.yaml with dmsimard02:49
tristanCpabelanger: the provider is not logged, and those are quite frequent, about 3000 since 2018-12-0102:51
pabelangertristanC: are they still happening?02:51
pabelangeryah, we should log the provider to help debug02:51
pabelangertristanC: it is likely limestone auth was down for a few days also02:52
tristanCpabelanger: last one was 2018-12-04 16:52:09,28702:52
tristanC(utc)02:52
pabelangertristanC: I think that is around the time dmsimard updated clouds.yaml on sf.io, you likely can confirm with the timestamp on the file02:52
pabelangertristanC: I am not sure the process on sf.io side where the password is stored02:53
tristanCindeed ~nodepool/.config/openstack/clouds.yaml is Dec  4 16:5202:53
pabelangercool02:54
*** bhavikdbavishi has joined #zuul02:56
*** bhavikdbavishi has quit IRC03:01
*** bhavikdbavishi has joined #zuul03:18
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210104:04
*** bjackman has joined #zuul04:05
*** mordred has quit IRC04:57
*** mordred has joined #zuul04:57
tobiashtristanC: commented ^05:09
tristanCtobiash: thanks, i agree that tests (and a better commit message too) would help us understand the new failures we are having with nodepool-3.3.105:18
tristanCtobiash: but for the moment, i'm just trying to mitigate node leaking with the last released version...05:19
tristanCfwiw, here is the list of backports that seem to help: https://softwarefactory-project.io/r/1452105:21
tobiashtristanC: I'm sure there is already a server-create-failed test scenario. That probably just misses the quota call afterwards05:23
*** j^2 has quit IRC05:29
tobiashtristanC: if you don't have time maybe I can help with the test later05:32
tristanCtobiash: https://review.openstack.org/622101 seems critical though, a provider with a failing node is not able to launch new nodes because of the type IndexError exception05:45
*** dmellado has quit IRC05:46
*** gouthamr has quit IRC05:46
tobiashOne more argument for a test. I'll look later05:46
tristanCthis was not an issue without https://review.openstack.org/621681, because the estimatedNodepoolQuotaUsed call was skipped as the node didn't have a pool set.05:47
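A condensed sketch of the failure mode tristanC describes; nodepool's real quota accounting sums cores/ram/instances per flavor, so the shape below is simplified and the helpers are assumed:

    def estimated_quota_used(zk, label_resources):
        used = 0
        for node in zk.node_iterator():  # assumed helper
            if node.pool is None:
                continue  # before 621681, broken nodes bailed out here
            try:
                label = node.type[0]  # IndexError when type is empty
            except IndexError:
                # A resilient version skips broken records instead of
                # letting the whole provider wedge (cf. 622906).
                continue
            used += label_resources.get(label, 0)
        return used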
openstackgerritMerged openstack-infra/nodepool master: Add cleanup routine to delete empty nodes  https://review.openstack.org/62261605:47
*** dmellado has joined #zuul05:48
*** gouthamr has joined #zuul05:53
*** dmellado has quit IRC05:55
*** dmellado has joined #zuul05:57
*** dmellado has quit IRC06:02
*** dmellado has joined #zuul06:14
*** gouthamr has quit IRC06:15
*** gouthamr has joined #zuul06:18
*** njohnston has quit IRC06:27
*** njohnston_ has joined #zuul06:28
*** gouthamr has quit IRC06:29
*** gouthamr has joined #zuul06:32
*** njohnston_ has quit IRC06:52
*** njohnston has joined #zuul06:55
*** gouthamr has quit IRC06:58
*** gouthamr has joined #zuul07:07
*** bhavikdbavishi has quit IRC07:09
*** bhavikdbavishi1 has joined #zuul07:09
*** pcaruana has joined #zuul07:10
*** bhavikdbavishi1 is now known as bhavikdbavishi07:12
quiquell|offtobiash: What a brain fart in my review :-(07:16
openstackgerritQuique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217507:17
*** quiquell|off is now known as quiquell07:17
tobiashquiquell: thanks07:17
quiquelltobiash: fixed07:18
quiquellthanks to you07:18
*** gouthamr has quit IRC07:40
*** gouthamr has joined #zuul07:42
*** quiquell is now known as quiquell|brb07:48
*** gtema has joined #zuul08:01
*** themroc has joined #zuul08:15
*** quiquell|brb is now known as quiquell08:22
*** bhavikdbavishi has quit IRC08:33
*** jpena|off is now known as jpena08:39
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210108:44
tobiashtristanC, corvus, Shrews: ^08:44
tobiashthere will be a second change that makes the quota calculation resilient to this, so that operators of already-wedged nodepools don't have to manually delete broken znodes08:45
*** bhavikdbavishi has joined #zuul09:06
*** bhavikdbavishi has quit IRC09:07
*** bhavikdbavishi has joined #zuul09:07
*** njohnston has quit IRC09:09
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210109:30
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290609:30
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290609:32
*** bhavikdbavishi has quit IRC09:40
*** njohnston has joined #zuul09:44
*** quiquell is now known as quiquell|brb10:21
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size  https://review.openstack.org/62201010:22
*** dkehn has quit IRC10:22
*** quiquell|brb is now known as quiquell10:52
bjackmanAny chance of a Workflow+1 on https://review.openstack.org/#/c/620838/  ?10:56
*** bhavikdbavishi has joined #zuul10:58
quiquelltobiash, corvus: The fetch-zuul-cloner role is legacy stuff, right?11:09
*** bhavikdbavishi has quit IRC11:09
*** bhavikdbavishi has joined #zuul11:09
*** bhavikdbavishi has quit IRC11:16
*** bhavikdbavishi has joined #zuul11:16
*** bhavikdbavishi has quit IRC11:20
tristanCquiquell: zuul-cloner is no longer needed in zuulv3, the repos are pushed from the executor instead11:24
quiquelltristanC: So if we have something that depends on that we have to fix it, is that right?11:26
tristanCquiquell: iiuc, the fetch-zuul-cloner let you not fix that, but it's recommended to use the new zuul.projects workspace yes11:28
quiquelltristanC: ack, thanks!11:29
tobiashquiquell: yes, that's legacy ;)11:29
quiquellsshnaidm: ^ looks like we cannot depend on the setup/install that fetch-zuul-cloner does11:29
quiquelltobiash: thanks, we have been using the install it does for python projects without knowing11:30
tobiashquiquell: you still can use it but it's only there for backwards compatibility and mainly used in openstack land11:30
quiquelltobiash, tristanC: broken requirements when using /home/zuul/src...11:30
quiquellcool cool thanks11:31
sshnaidmquiquell, the patch with the requirement is in the gate, so it should be fine11:31
quiquellsshnaidm: Just thinking about removing it from the base of our repo, so we can catch those issues11:32
quiquellsshnaidm: Like the patch, do we also have to include all the projects there?11:33
sshnaidmquiquell, well, it's not really an issue while it's working11:33
quiquellsshnaidm: ack11:33
*** gtema has quit IRC11:40
*** bhavikdbavishi has joined #zuul12:02
openstackgerritMerged openstack-infra/zuul master: Fix "reverse" Depends-On detection with new Gerrit URL schema  https://review.openstack.org/62083812:04
openstackgerritBrendan proposed openstack-infra/zuul master: Fix urllib imports in Gerrit HTTP form auth code  https://review.openstack.org/62294212:05
*** gtema has joined #zuul12:14
*** jpena is now known as jpena|lunch12:25
*** bjackman has quit IRC12:52
*** themroc has quit IRC13:21
*** themroc has joined #zuul13:23
*** jpena|lunch is now known as jpena13:33
*** dkehn has joined #zuul13:35
*** rlandy has joined #zuul13:36
mordredtristanC: refactor stack lgtm - there's an oops in the commit message in https://review.openstack.org/#/c/62139614:02
*** sshnaidm has quit IRC14:06
*** njohnston has quit IRC14:08
*** njohnston_ has joined #zuul14:09
Shrewspabelanger: re: image build, not sure what zk info would have been lost. i'd have to go back through the code again14:09
pabelangerShrews: ack, thanks. For now, I've bumped the job timeout to account for the slow nodes, but might be a good optimization to look into.14:10
*** quiquell is now known as quiquell|off14:11
Shrewspabelanger: oh, i think it's because we no longer hold a lock for building that particular image (another builder could be actively building it now)14:11
Shrewspabelanger: not so efficient if you only have one builder with that image, but no way for us to know that atm14:12
*** sshnaidm has joined #zuul14:14
pabelangerShrews: oh, right. that makes sense14:15
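A sketch of the mechanism Shrews describes, using kazoo's lock recipe; the lock path, hosts, and the build_image call are placeholders:

    from kazoo.client import KazooClient
    from kazoo.exceptions import ConnectionLoss, SessionExpiredError

    def build_image(name):
        ...  # placeholder for the DIB run

    zk = KazooClient(hosts='zk01:2181')
    zk.start()

    # The build lock lives in ZooKeeper, so it dies with the session.
    lock = zk.Lock('/nodepool/images/fedora-latest/builds/lock', 'nb01')
    lock.acquire()  # blocks until this builder owns the lock
    try:
        build_image('fedora-latest')
        lock.release()
    except (ConnectionLoss, SessionExpiredError):
        # Ownership is gone with the session: another builder may
        # already be rebuilding the same image, so the completed
        # result can't safely be recorded and the build is redone.
        pass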
tobias-urdinanybody running zuul with gerrit 2.16? planning an upgrade but can't see any supported-versions statement14:31
pabelangertobias-urdin: I've seen 1 fix for gerrit 2.16 in zuul so far: https://review.openstack.org/619533/14:41
pabelangerI figure the user who wrote that fix is running 2.16, but I do not know their irc handle14:42
tobias-urdinpabelanger: thanks, i'll wait for a while and stop at 2.15 for now, 2.16 seems pretty fresh overall14:43
pabelangertobias-urdin: I know openstack has plans to upgrade, but not sure of the timeline. I know corvus is also working on gerrit for opendev, so I assume that will maybe use a newer version of gerrit.14:45
pabelangertobias-urdin: we should see what version of gerrit zuul quickstart is using14:45
pabelangerhttps://git.zuul-ci.org/cgit/zuul/tree/doc/source/admin/examples/docker-compose.yaml#n614:46
pabelangerseems like latest version14:46
tobias-urdincool14:47
tobias-urdinnobody remembers a coward :)14:48
tobias-urdini'll give it a try14:48
pabelanger++14:49
*** sshnaidm has quit IRC14:57
*** quiquell|off has quit IRC14:59
*** ParsectiX has joined #zuul15:16
*** sshnaidm has joined #zuul15:18
*** sshnaidm is now known as sshnaidm|afk15:36
*** jesusaur has quit IRC15:37
tobiashtobias-urdin: there is a second change for 2.16 too: https://review.openstack.org/62083815:41
tobiashtobias-urdin: bjackman uses 2.16 so you could ask him how well it works atm15:41
*** jesusaur has joined #zuul15:42
tobias-urdinjust saw some plugins we are using haven't cut any stable-2.16 branches yet so i'll have to wait on 2.15 for a while :(15:51
*** hashar has joined #zuul16:04
tobiashfungi: I approved 554352 but it is in merge conflict16:09
fungitobiash: unsurprising. that change has been waiting for ages. i'll rebase it and fix up whatever conflicts it has nowish16:11
openstackgerritJeremy Stanley proposed openstack-infra/zuul master: Add instructions for reporting vulnerabilities  https://review.openstack.org/55435216:12
fungitobiash: ^ not bad. it was just the index, unsurprisingly16:12
fungithanks!16:14
tobiashnow we just need more contact persons...16:14
openstackgerritMerged openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210116:14
openstackgerritMerged openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient  https://review.openstack.org/62290616:14
*** j^2 has joined #zuul16:15
*** sshnaidm|afk has quit IRC16:15
*** pcaruana has quit IRC16:18
tobiashtristanC: ^16:19
tobiashwith tests :)16:19
tobiashShrews, corvus: were these the last fixes needed for 3.3.1 ^^ ?16:20
tobiashmaybe it makes sense to do a release probably next week?16:21
Shrewstobiash: that's probably mostly your call since you've worked more with tristanC on his issues16:22
Shrewsi'm not aware of anything outstanding atm16:22
corvuswe'd probably have the same issue if we restarted more :)16:23
corvusso i think we should get all that in and restart openstack-infra, then release when it looks good16:24
tobiash++16:24
*** sshnaidm|afk has joined #zuul16:30
pabelangerwould be great to see 3.3.1 release16:32
*** themroc has quit IRC16:37
*** gtema has quit IRC16:49
*** bjackman has joined #zuul16:57
*** bhavikdbavishi has quit IRC17:02
*** bhavikdbavishi has joined #zuul17:02
*** hashar has quit IRC17:08
*** bjackman has quit IRC17:09
*** rlandy is now known as rlandy|brb17:10
tobiashcorvus, Shrews: as far as I can see, all recent fixes since the last release have been merged17:25
openstackgerritMerged openstack-infra/zuul master: Add instructions for reporting vulnerabilities  https://review.openstack.org/55435217:25
*** rlandy|brb is now known as rlandy17:42
*** jpena is now known as jpena|off17:42
*** mrhillsman has quit IRC17:51
*** mrhillsman has joined #zuul17:52
mrhillsmanany thoughts on why web would only show Loading... on the Builds tab?18:25
tobiashmrhillsman: typically if the api requests failed18:26
tobiashmrhillsman: you should check requests in the browser debugging window18:26
tobiashmrhillsman: does zuul-web have the sql connection configured?18:27
mrhillsmanlet me confirm that18:27
mrhillsmandid not know that was required18:27
mrhillsmanbut makes sense18:28
tobiashmrhillsman: zuul-web directly queries the database without asking the scheduler18:28
mrhillsmanthx18:32
mrhillsmantobiash: which daemon handles writing to the db18:41
tobiashmrhillsman: the scheduler, but only if the sql reporter is added to the pipeline18:41
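The two pieces of configuration under discussion, sketched with placeholder names and credentials (zuul-web reads builds from the database via the sql connection; the scheduler only writes to it when the sql reporter is attached to a pipeline):

    # zuul.conf -- connection name, host and credentials are placeholders
    [connection mydatabase]
    driver=sql
    dburi=mysql+pymysql://zuul:secret@db.example.org/zuul

    # pipeline definition (excerpt) -- attach the sql reporter so the
    # scheduler records build results for zuul-web to query
    - pipeline:
        name: check
        success:
          mydatabase:
        failure:
          mydatabase: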
mrhillsmanyeah, it is there, and i can login manually18:42
mrhillsmanand pymysql is there18:42
mrhillsmani did not have mysql-client though18:42
mrhillsmannot sure if that mattered18:43
Shrewscorvus: tobiash: looks like we successfully removed around 430 empty zk nodes with the latest update18:43
tobiashShrews: cool :)18:43
mrhillsmanoh, i got it18:43
mrhillsmanthx tobiash18:43
mrhillsmani have found the error of my ways18:43
tobiashmrhillsman: you're welcome18:44
*** manjeets_ is now known as manjeets18:47
*** ParsectiX has quit IRC18:49
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add an upgrade release note for schema change  https://review.openstack.org/62304618:53
Shrewscorvus: tobiash: ^^^18:53
tobiash+218:54
*** bhavikdbavishi has quit IRC19:15
pandajhesketh: ping re: https://review.openstack.org/#/q/topic:freeze_job may I ask what you expect this will be used for? something like runner --job <job-name> that will run the playbooks locally?19:49
ShrewsSo, it seems we have a race in the test_handler_poll_session_expired nodepool unit test. I've seen it fail often enough to make me start looking into it. Not sure where the race is yet, though.21:06
Shrewsbut was able to reproduce it locally21:07
tobiashShrews: cool, I also thought about that but don't have an idea either21:09
openstackgerritMerged openstack-infra/nodepool master: Add an upgrade release note for schema change  https://review.openstack.org/62304621:17
*** rlandy is now known as rlandy|bbl23:25
*** j^2 has quit IRC23:26
*** j^2 has joined #zuul23:31
*** dkehn has quit IRC23:33
*** dkehn has joined #zuul23:34
*** j^2 has quit IRC23:47
