Nick | Message | Time |
---|---|---|
opendevreview | Tim Burke proposed openstack/swift master: replicator: Use last-primary table https://review.opendev.org/c/openstack/swift/+/859349 | 00:47 |
opendevreview | Jianjian Huo proposed openstack/swift master: Sharder: warn when sharding appears to have stalled. https://review.opendev.org/c/openstack/swift/+/859373 | 04:49 |
opendevreview | Alistair Coles proposed openstack/swift master: proxy: refactor error limiter to a class https://review.opendev.org/c/openstack/swift/+/858790 | 14:57 |
opendevreview | Alistair Coles proposed openstack/swift master: Refactor memcache config and MemcacheRing loading https://review.opendev.org/c/openstack/swift/+/820648 | 14:57 |
opendevreview | Alistair Coles proposed openstack/swift master: Global error limiter using memcache https://review.opendev.org/c/openstack/swift/+/820313 | 14:57 |
acoles | reid_g: is it possible you have run into https://bugs.launchpad.net/liberasurecode/+bug/1886088? i.e. do you have different liberasurecode versions? | 16:00 |
reid_g | Hmm. that does look possible. The 20.04 nodes have 1.6.1-4 and the 18.04 nodes have 1.5.0-1. | 16:04 |
reid_g | How do I know whether this applies to us? "Note that this is only a problem *if your servers were using libec's alternative CRC*." | 16:05 |
timburke | reid_g, the second comment on the bug has a check_libec_crc.py script that you can run against some fragments | 16:06 |
reid_g | When I run this script against one of the quarantined fragments, it says the CRC is stored as zlib. | 16:12 |
reid_g | When I check a file fragment that isn't quarantined, it says legacy. | 16:12 |
reid_g | (both on the 18.04 box) | 16:12 |
timburke | sounds like exactly the problem, then :-( | 16:12 |
reid_g | hmm | 16:13 |
timburke | i wonder how hard it would be to drop in the jammy package on the focal boxes, so you could have them writing legacy CRCs again... unfortunately, we didn't catch & fix the bug until 1.6.2 | 16:15 |
reid_g | So it sounds like we would need to install 1.6.2 on focal, set LIBERASURECODE_WRITE_LEGACY_CRC=1, finish upgrades on all nodes, remove LIBERASURECODE_WRITE_LEGACY_CRC=1, move quarantined objects back to where they belong? | 16:18 |
timburke | i think that'd be the recommended route, yeah. the fact that there's already-quarantined data makes it a little hairy -- it's hard to tell whether you've already got an availability issue. depending on how far along the upgrade you are, you might prefer to get 1.6.0+ on the remaining bionic nodes | 16:23 |
reid_g | I wonder if this package is in the cloud archive | 16:25 |
timburke | best way to stop the bleeding is to pull the bionic nodes out of your load balancer -- as long as the proxies are all writing frags with the legacy CRC and rebuilds are infrequent, you should generally get legacy crcs everywhere | 16:25 |
timburke | this also reminds me that i should keep pushing on https://review.opendev.org/c/openstack/pyeclib/+/817498 -- ideally we'd even have zuul building binary pyeclib wheels that include libec and isa-l and publish them to pypi / https://tarballs.opendev.org/openstack/pyeclib/ | 16:32 |
reid_g | I am going to meet with my team about this. Maybe we can backport the liberasurecode1 package from jammy | 16:43 |
reid_g | The note 'This issue was fixed in the openstack/swift 2.27.0 release.' just means that you can set the LIBERASURECODE_WRITE_LEGACY_CRC=1 via swift right? If you are using 2.25 & liberasurecode1>=1.6.2, you can just set the ENV manually to write legacy crcs? | 17:46 |
reid_g | timburke | 18:25 |
timburke | reid_g, yeah -- starting in swift 2.27.0, you could set `write_legacy_ec_crc = true` in your proxies, reconstructors, and internal clients and have swift set the env var for you. if you can manage the env vars on your own, any swift can take advantage of the legacy-crc mode with libec 1.6.2+ | 18:35 |
reid_g | Ok. Thank you! | 18:42 |
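
The version rules discussed above can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the helper names (`parse_upstream_version`, `describe_libec`, `legacy_crc_environment`) are hypothetical and not part of swift or liberasurecode, the version thresholds are taken from the conversation (1.6.0 and 1.6.2), and the version parsing handles simple package strings such as `1.6.1-4` only.

```python
import os

# Environment variable named in the discussion; per the chat, libec 1.6.2+
# honors it and older releases do not.
LEGACY_CRC_ENV = "LIBERASURECODE_WRITE_LEGACY_CRC"


def parse_upstream_version(package_version):
    """Turn a package version such as '1.6.1-4' into the tuple (1, 6, 1).

    Hypothetical helper: it ignores epochs and non-numeric suffixes.
    """
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def describe_libec(package_version):
    """Summarize CRC behavior for a libec version, as described in the chat."""
    version = parse_upstream_version(package_version)
    if version < (1, 6, 0):
        # e.g. the 18.04 nodes on 1.5.0, which quarantined zlib-CRC fragments
        return "writes legacy CRCs; cannot validate zlib CRCs"
    if version < (1, 6, 2):
        # e.g. the 20.04 nodes on 1.6.1
        return "writes zlib CRCs; no legacy-write switch"
    return f"writes zlib CRCs unless {LEGACY_CRC_ENV} is set"


def legacy_crc_environment(base_env=None):
    """Return a copy of the environment with the legacy-CRC variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[LEGACY_CRC_ENV] = "1"
    return env


if __name__ == "__main__":
    for pkg in ("1.5.0-1", "1.6.1-4", "1.6.2-1"):
        print(pkg, "->", describe_libec(pkg))
```

The dictionary returned by `legacy_crc_environment()` could be passed as the `env` argument of `subprocess.run` when starting the proxy and reconstructor processes by hand. On swift 2.27.0 or later, the `write_legacy_ec_crc = true` option mentioned above has swift set the variable itself.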