Tuesday, 2020-01-14

*** TxGirlGeek has quit IRC  00:04
*** TxGirlGeek has joined #starlingx  00:07
*** mpeters-wrs has joined #starlingx  00:27
*** byang has joined #starlingx  00:48
*** TxGirlGeek has quit IRC  01:28
*** sgw has quit IRC  01:50
*** mpeters-wrs has quit IRC  02:05
*** wangyi4 has joined #starlingx  02:06
*** sgw has joined #starlingx  02:54
*** mpeters-wrs has joined #starlingx  03:09
*** mpeters-wrs has quit IRC  03:43
*** mpeters-wrs has joined #starlingx  04:02
*** wangyi41 has joined #starlingx  04:16
*** cyan_ has joined #starlingx  04:25
*** rchurch_ has quit IRC  04:34
*** rchurch has joined #starlingx  04:35
*** mpeters-wrs has quit IRC  04:36
*** TxGirlGeek has joined #starlingx  04:52
*** mpeters-wrs has joined #starlingx  05:00
*** mpeters-wrs has quit IRC  05:05
*** TxGirlGeek has quit IRC  07:05
*** anran has joined #starlingx  07:54
* wangyi41 sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/QyderTfmvjBlanwZaLRCIlPG >  08:09
*** wangyi4 has quit IRC  08:10
<wangyi41> This task force includes @anran, @yan_chen, and me. More people are welcome to join in.  08:13
*** anran has quit IRC  08:55
*** sgw has quit IRC  09:45
*** byang has quit IRC  12:09
*** mpeters-wrs has joined #starlingx  12:14
*** ijolliffe has quit IRC  12:52
*** ijolliffe has joined #starlingx  13:18
*** mpeters-wrs has quit IRC  13:50
*** sgw has joined #starlingx  14:07
*** mpeters-wrs has joined #starlingx  14:07
<sgw> Morning all  14:08
*** mpeters-wrs has quit IRC  14:09
*** mpeters has joined #starlingx  14:09
*** billzvonar has joined #starlingx  14:41
<sgw> slittle1: Morning  14:45
<ijolliffe> morning - thanks wangyi41 and team - I see 4 reviews posted for the hack-a-thon  14:46
*** billzvonar has quit IRC  14:46
* sgw back in 90 or so  15:02
<slittle1> CENGN had troubles setting up the build container via stx-tools/Dockerfile ...  15:07
<slittle1> RUN pip install python-subunit junitxml --upgrade && \  15:08
<slittle1>     pip install tox --upgrade  15:08
<slittle1> failed .... anyone else observing this?  15:08
<dpenney_> A new version of more-itertools was just released a couple of days ago, so maybe we should constrain it to the older version for now: https://pypi.org/project/more-itertools/#history  15:09
<dpenney_> maybe python3-specific code was added and it can't build for python2.7?  15:11
<dpenney_> looks like the previously successful build was 20200111T023000Z, which used more-itertools 8.0.2  15:20
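
The fix being discussed amounts to constraining more-itertools in the stx-tools Dockerfile before the other pip installs. A minimal sketch of such a pin, assuming 8.0.2 (the last known-good version per the 20200111T023000Z build) is the right target; the actual merged review may differ:

    # hypothetical change to stx-tools/Dockerfile; the version pin is an assumption
    RUN pip install 'more-itertools==8.0.2' && \
        pip install python-subunit junitxml --upgrade && \
        pip install tox --upgrade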
*** TxGirlGeek has joined #starlingx  16:02
<dpenney_> I've posted a review to resolve the build failure: https://review.opendev.org/702471  16:27
*** abailey has quit IRC  16:29
*** abailey has joined #starlingx  16:30
<slittle1> looks good  16:30
*** jrichard has quit IRC  16:35
<sgw> slittle1: you around? I am working on the layered build testing and having some issues  16:54
*** mpeters has quit IRC  16:56
<slittle1> what are you seeing?  17:20
*** mpeters-wrs has joined #starlingx  17:20
<sgw> slittle1: First I tried a basic build of the compiler layer; it downloaded and built the pkgs OK, then I switched to flock  17:27
<slittle1> sgw: yes ....  17:28
<sgw> Now, I know I am not perfect, so I started with download_mirror, but I had forgotten to reset the repo/manifest  17:28
<sgw> There was an issue with the next steps, generate-cgcs-centos-repo and populate-downloads, so there needs to be some kind of error checking that the layers are set up properly.  17:29
<slittle1> to be clear .... are you using the same workspace as the former compiler layer build?  17:29
<slittle1> I've been using separate workspaces for each layer  17:30
<sgw> Yes, using the same workspace  17:31
<sgw> Also, I think the initial dir used to find centos-mirror-tools/config should be relative to MY_REPO, not to the location of the scripts  17:31
<sgw> that way the scripts get copied to /usr/local/bin (in the Dockerfile) and everything else is relative to MY_REPO  17:32
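
A minimal sketch of the change sgw is suggesting, with the subpath under MY_REPO as an assumption; the real tree layout may differ:

    # hedged sketch: resolve the config dir from $MY_REPO rather than from
    # the script's install location (/usr/local/bin in the build container)
    config_dir="${MY_REPO}/../stx-tools/centos-mirror-tools/config"   # subpath assumed
    if [ ! -d "$config_dir" ]; then
        echo "ERROR: config dir not found: $config_dir" >&2
        exit 1
    fi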
<slittle1> k  17:32
<slittle1> stepping out for a few mins to heat lunch  17:33
<sgw> Ok, ping when back  17:34
<slittle1> sgw: back  17:41
<sgw> that was a fast lunch, did you even have a chance to chew ;-)  17:41
<dpenney_> the cengn build I kicked off got further, but looks like it fails on downloading the new kata container RPMs  17:43
<dpenney_> http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20200114T172515Z/logs/  17:43
<slittle1> just heated.... still eating  17:51
<sgw> Got it.  17:52
<sgw> dpenney_: did Bart talk with you about the PBR stuff? The PRC folks are still doing the performance check  17:53
<bwensley> I did talk with him.  17:54
<bwensley> He seems OK with it, but I'll let him answer as well. :)  17:54
<slittle1> meeting  18:01
*** jrichard has joined #starlingx  18:22
*** mpeters-wrs has quit IRC  18:52
*** mpeters-wrs has joined #starlingx  18:54
*** mpeters-wrs has quit IRC  18:58
*** mpeters-wrs has joined #starlingx  18:58
<dpenney_> +1 :)  19:21
<slittle1> I think we'll need to revert stx-tools:431885231ae41256188a7c32f0f5351c4455707b to fix the CENGN build.  19:25
<dpenney_> that would require reverting maybe 8 commits, vs updating the versions of the rpms in question  19:26
<slittle1> Looks like the kata repo was updated Dec 10, and the update that just merged was never updated to track the upstream change  19:26
<dpenney_> they're all binary rpms, so we should be able to just update the LST file with the new versions, which I'm looking at now  19:27
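
For context, the LST update dpenney_ describes is a version bump on the binary rpm entries; a hedged illustration (file name and versions here are made up, not the actual change):

    # hypothetical edit to a centos-mirror-tools .lst file
    -kata-runtime-1.9.1-6.1.x86_64.rpm
    +kata-runtime-1.9.3-8.1.x86_64.rpm
    -kata-shim-1.9.1-5.1.x86_64.rpm
    +kata-shim-1.9.3-7.1.x86_64.rpm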
<slittle1> ok, I'll buy that  19:27
<dpenney_> doing a test download now  19:30
<sgw> dpenney_: that +1 was to the PBR stuff?  19:32
<sgw> slittle1: I guess we assumed the kata folks had the right versions; clearly they were working with cached RPMs  19:32
<dpenney_> yeah, I'm not concerned over the versioning impacting the update mechanism, as the versions would always be incrementing (or PBR and semver would be fundamentally broken)  19:33
<dpenney_> I'll post a review shortly for the kata fix  19:33
<sgw> slittle1: so your working assumption for the layering is that each layer is a different workspace? Are you testing mirror download and generation based on that, or are you using the corporate mirror for your default /import location? in other words, your mirror is always fully populated  19:37
<slittle1> Trying to test both  19:38
<dpenney_> review is posted: https://review.opendev.org/702506  19:38
<sgw> I can start testing with the assumption of 1 workspace/layer, but will start with an empty mirror. (I actually pass the "output" directory for my mirror to avoid an extra copy)  19:38
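
A sketch of the one-workspace-per-layer flow slittle1 describes, with the manifest name and script arguments as assumptions:

    # hypothetical flock-layer workspace, separate from the compiler one
    mkdir -p ~/starlingx/workspace-flock && cd ~/starlingx/workspace-flock
    repo init -u https://opendev.org/starlingx/manifest -m flock.xml   # manifest name assumed
    repo sync
    # then the mirror steps discussed above
    download_mirror.sh ...
    generate-cgcs-centos-repo.sh ...
    populate_downloads.sh ...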
<dpenney_> once https://review.opendev.org/702506 is merged, I'll kick off a new CENGN build again  19:42
<sgw> dpenney_: so all you tested was that they exist and download, do we know if the functionality will change?  20:00
<dpenney_> yeah, all I verified was that they could be downloaded to allow the build to proceed. Otherwise, we can revert all the kata updates and have them rebase  20:01
<sgw> dpenney_: back to the nvme follow-up: if I use the command you suggested, will that also address the storage config issues? I am not local to the machines right now; if I lock and delete the existing storage nodes, then I can't reboot them. Is there a suggested process?  20:02
<dpenney_> I would expect storage config should be fine. Configure the rootfs/boot device, and the node should install, discover resources, and populate the system database appropriately  20:06
<dpenney_> and I think deleting the node triggers a wipedisk and reboot, but I could be mistaken  20:07
<sgw> dpenney_: system host-lock 4  20:16
<sgw> Cannot lock a storage node when ceph pools are not empty and replication is lost. This may result in data loss.  20:16
<sgw> Is there a way to force the lock?  20:16
<dpenney_> rchurch, do you know the answer to sgw's question?  20:20
<rchurch> You should be able to force lock the storage host. If you don't care about the data in the cluster, you can delete the pools. That will restore HEALTH_OK and you can then lock normally  20:23
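
If the cluster data really is expendable, the pool cleanup rchurch mentions would look roughly like this; pool names vary by deployment, and ceph deliberately makes deletion awkward:

    # destructive: only if the data does not matter
    ceph osd lspools
    ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it
    # may also require mon_allow_pool_delete=true in the mon config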
<dpenney_> per Frank's email reply, I'll abandon my kata update and we can trigger a revert of the 8 kata updates  20:24
<sgw> rchurch: force lock? I don't see that in the help output  20:26
<rchurch> sgw: system host-lock [-f] <hostname or id>  20:28
<sgw> rchurch: where is that actually documented? It's not part of the system host-lock --help output  20:31
<dpenney_> system help host-lock  20:32
<rchurch> Yep. That's what I did  20:32
<sgw> Ah, you guys know your tools; many tools use --help on the sub option (think git, although it works both ways)  20:34
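
For reference, the two invocation styles and the force lock under discussion, as hedged examples:

    system help host-lock    # cgtsclient exposes help as a subcommand
    system host-lock -f 4    # force-lock host 4, per rchurch's suggestion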
<sgw> dpenney_ rchurch: so when I set up for personality=storage, do I have to define the storage disk also as nvme vs sd?  20:35
<sgw> the command that dpenney_ sent in email was just for rootfs and boot device  20:36
<dpenney_> if your disks are nvme, yes  20:36
<dpenney_> the disk configuration would be a post install step  20:36
<dpenney_> for storage OSDs  20:36
<dpenney_> the disks would get discovered and put in sysinv  20:36
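
Pulling that together, a hedged sketch of the nvme provisioning flow being described; the host id and device paths are examples:

    # point rootfs/boot at the nvme device (the command from dpenney_'s email)
    system host-update 4 personality=storage rootfs_device=/dev/nvme0n1 boot_device=/dev/nvme0n1
    # post install, disks are discovered into sysinv; then add the OSD
    system host-disk-list storage-0
    system host-stor-add storage-0 osd <disk_uuid>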
<sgw> dpenney_: that's defined in the existing documentation for doing a standard install; I will try that again, I did change those OSD commands  20:37
*** rchurch has quit IRC  20:39
<dpenney_> Kata update reversions: https://review.opendev.org/#/q/status:open+projects:starlingx+branch:master+topic:kata  20:40
*** rchurch has joined #starlingx  20:40
<bwensley> Still need a core from tools and root.  20:46
<dpenney_> sgw: can you review the two reversions for tools and root?  20:46
*** mpeters-wrs has quit IRC  20:47
*** mpeters-wrs has joined #starlingx  20:48
<sgw> dpenney_: sorry, stepped away to zap my lunch  20:49
*** mpeters-wrs has quit IRC  20:49
<sgw> looking  20:49
*** mpeters-wrs has joined #starlingx  20:49
<sgw> dpenney_: you're missing signed-off-by, but I will let it go this time  20:50
<dpenney_> I just used the "revert" button in gerrit :)  20:52
<dpenney_> didn't even notice the lack of signed-off-by ;)  20:53
<sgw> no worries  20:55
<sgw> guys, would there be any reason that host-disk-list would not list a second drive (short of it not existing)? The NUCs I ordered were all supposed to be the same, with 2 disks; is there a way to get into a compute or storage node via ssh before unlock?  21:37
<dpenney_> is it in a RAID config, maybe?  21:43
<dpenney_> once it installs, you should be able to ssh in as sysadmin, using the original password - which will prompt for an immediate change  21:43
*** mpeters-wrs has quit IRC  21:46
<abailey> hackathon review for test_pvs uploaded: https://review.opendev.org/#/c/702537/  21:49
<rchurch> The sysinv agent might not recognize the disk if the disk's major device number is not in the supported list. Check out VALID_MAJOR_LIST in sysinv/common/constants.py  21:50
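
A quick check of rchurch's theory, run on the node itself; a hedged sketch:

    # does the kernel see a second disk at all, and with what major number?
    lsblk -d -o NAME,MAJ:MIN,SIZE,TYPE
    cat /proc/partitions
    # nvme namespaces typically use the blkext major (259), which must be
    # in sysinv's VALID_MAJOR_LIST for the disk to be reported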
<abailey> which failed zuul :(  21:50
<dpenney_> kata reversions have merged and I've kicked off a new cengn build  21:51
<sgw> dpenney_: Ah, I thought all nodes got the new admin password that was set up for the controllers; that worked, and apparently I got shorted on disks! I need to double-check the hardware when I am back where they are  21:53
<sgw> dpenney_: 3 have 2 disks and 3 apparently only have 1 :-(  21:54
<dpenney_> they do get the new password, but only after the first unlocking, when the puppet apply happens  21:54
<sgw> Ah, and those are locked because they are not fully provisioned yet, so will that screw with unlocking?  21:55
<dpenney_> well that's unfortunate... I've seen cases where RAID config made two disks look like a single disk, I think  21:55
<dpenney_> nope, you changing the password won't cause a problem  21:55
<sgw> no, I don't think it RAIDs the disks; like I said, I will double-check  21:55
* sgw BTW, to all, thanks for being here on IRC and helping out, this was way faster for me to diagnose and understand my issue  21:56
<sgw> abailey: pylint gotcha!  21:57
<sgw> dpenney_: just to confirm, if I try to unlock only one storage, it will fail, as it needs to talk to storage-1 before it will configure and provision properly, correct?  21:59
<rchurch> The unlock should succeed. The cluster will not be healthy until the other storage host is unlocked and any required data replication is done.  22:01
<sgw> Yes, it unlocked and is showing operational/disabled and availability/failed  22:05
<sgw> fm alarm-list shows that storage-0 experienced a configuration failure  22:07
<sgw> and a "service affecting failure"  22:08
<rchurch> Lock the storage host. Check the puppet logs on the storage host: sudo less -R /var/log/puppet/latest/puppet.log  22:10
*** ijolliffe has quit IRC  22:22
<sgw> More strangeness ensues: it decided to reboot itself as I was looking at the logs; it looks like it had /dev/disk/by-path/ nodes for a second nvme, but no /dev/nvme1n1 entry  22:29
<sgw> This might have to wait until Friday when I am back next to the hardware  22:29
<rchurch> FWIW, I've provisioned storage hosts with NVMe only disks in the past, so I wouldn't expect a problem. Doesn't mean there isn't one, but it worked at one time.  22:33
*** abailey has quit IRC  22:37
<sgw> rchurch: thanks for that, helpful to know  22:41
*** jrichard has quit IRC  22:58
*** jrichard has joined #starlingx  23:08
<jrichard> hackathon: updated review for network api tests is up. https://review.opendev.org/702300  23:28
*** mpeters-wrs has joined #starlingx  23:31
*** mpeters-wrs has quit IRC  23:48
