Saturday, 2020-11-21

clarkbcounting off time to index 200 changes it does seem to be slowly getting quicker00:05
clarkbbut that might not be a wide enough sample to check00:05
clarkb~now is when we expected it to be done. It is not done if anyone is wondering. Still slow but maybe slowly getting quicker. I'll keep an eye on it00:34
clarkbfungi: corvus: I'll aim to be back around about 15:00 tomorrow as well00:34
clarkbbut we'll see how I do00:34
clarkb~10k changes in ~17 minutes00:37
clarkbnot great00:37
clarkbbut also watching it like this may not be great for my health. I'm gonna take a break00:37
clarkbI've discovered that there may actually have been a flag to tell the migrator to not reindex. That would have allowed us to do the gc'ing first then manually reindex. But at this point sticking to what we've tested is our best bet I think even if it takes all night01:29
corvus++01:29
corvusplan the dive and dive the plan01:29
clarkbare you mordred now?01:29
corvusi, um, used to have a long daily commute by train and read pulp adventure novels01:30
clarkbha01:30
clarkbfor anyone following along I don't really expect this to finish before I go to bed so that I can kick off the gc01:30
clarkbI'll still check on it, but probably try and return tomorrow at 15:00 UTC. Assuming it exits 0 I think fungi you can probably go ahead and start the gc? but wait on others before doing the next steps. Or if you'd prefer to wait for me to be awake I'm cool with that too01:31
corvusclarkb: it's probably going to be fungi that hits the button; but in case i (or someone else) happens to be around first... it's ....01:32
corvussorry what step?01:32
clarkbcurrently 4.3: time find /home/gerrit2/review_site/git/ -type d -name "*.git" -print0 | xargs -t -0 -P 16 -n 1 -IGITDIR sudo -H -u gerrit2 git --git-dir="GITDIR" gc --aggressive01:32
clarkbplease run echo $? when this current command finishes so we can confirm it exits 001:33
clarkbduring testing we discovered that gerrit commands don't always tell you they have errored when they error :?01:33
clarkbso its echo $? then if 0 step 4.3 from a couple lines above01:33
corvusclarkb: so 4.1 (migrate-to-notedb) that's running now; then 4.2 when that finishes, and if it's zero and nothing seems to be on fire, 4.3 (gc).  right?01:34
clarkbcorrect01:34
corvusclarkb: can i 'strikethrough' the steps done on the etherpad?01:34
clarkbcorvus: yes I think that is fine01:34
corvusdone (and i bolded 4.1)01:35
corvusclarkb: have a good evening!01:35
clarkbI'll try! :) dinner then the mandalorian I hope01:36
fungii just caught up, had two episodes to get through01:50
fungiand yeah, this looks like it's taking a while01:50
fungii'm planning to fire off the git gc when i wake up, assuming the reindex is even done by then01:51
*** hamalq has quit IRC02:59
ianwReindexing changes: project-slices: 29% (785/2697), 30% (235273/760363) (-) fyi03:47
clarkbjust crossed 300k04:47
clarkbalso I've learned that one of the things the wikimedia changes does is shuffle the project "slices". They are supposed to be broken down into smaller chunks to prevent a single repo from dominating the cost like nova04:54
clarkbhowever, that element of randomness may explain why we see times that vary so much ? at least contribute to it04:55
clarkbI haven't done as much testing as wikimedia did, but I would be really surprised if it is faster to skip around like that. it seems like you want to keep things warm in the cache04:56
clarkbeg do all of nova, then do all of neutron and so on04:56
clarkb"It does mean that reindexing after invalidating the DiffSummary cache will be expensive" another tidbit from the source (I wonder if we're in that situation perhaps induced by the notedb migration?)05:01
clarkboh neat they also split up slices based on changeid/number not actual ref count05:09
clarkbso if you've got lots of changes with lots of refs (patchsets) in certain projects those won't be balanced well05:09
clarkbthey also use mod to split them up so change 1 and 2 go in different slices and 101 and 102 go in different slices if modding by 2. When you probably want them to be in the same slice due to git tree state cache warmth? Anyway thats probably enough java for me tonight. There is likely quite a bit of room for improvement in the reindexer to be more deterministic and less reliant on luck05:11
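A minimal sketch of the slicing concern raised above. This is not Gerrit's reindexer code; it only contrasts splitting change numbers by modulo with splitting them into contiguous ranges. The function names and the slice count of 2 are assumptions chosen to match the example in the discussion.

```python
def slice_by_mod(change_numbers, slice_count):
    """Assign each change to the slice given by number % slice_count."""
    slices = [[] for _ in range(slice_count)]
    for number in change_numbers:
        slices[number % slice_count].append(number)
    return slices


def slice_contiguous(change_numbers, slice_count):
    """Split the sorted change numbers into consecutive runs."""
    ordered = sorted(change_numbers)
    if not ordered:
        return []
    size = -(-len(ordered) // slice_count)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


changes = [1, 2, 101, 102]
print(slice_by_mod(changes, 2))      # neighbouring changes end up in different slices
print(slice_contiguous(changes, 2))  # neighbouring changes stay in the same slice
```

With modulo slicing, changes 1 and 2 (and 101 and 102) are processed in different slices, which is the cache-warmth concern described in the chat; contiguous ranges keep them together.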
clarkboh and when we tested we would typically start gerrit at 2.16 and maybe that populates the DiffSummary caches? We didn't want to do that this time because to interact with it we'd have to drop our web notice. It would be funny if not starting on 2.16 without notedb was the problem05:17
mnasero/ is there an etherpad with the steps that are occurring and what was done / left to do for those curious people who want to watch from the sidelines ?05:19
mnaser(aka me)05:19
clarkbmnaser: https://etherpad.opendev.org/p/opendev-gerrit-3.2-upgrade-plan the bolded item is the one we're on05:19
clarkbmnaser: we are currently doing the last part of the notedb migration which is a full reindex (which is going slower than expected but we also planned for this long task to happen during the between days period)05:20
clarkbwhen this is done we git gc all the repos to pack up the notedb contents (makes things faster), then upgrade to 3.0, 3.1, 3.2 and reindex again05:21
mnaserCool!  So it sounds like the major migration is done05:22
clarkbthe actual data migration part is ya. Now its a bunch of house keeping around that (reindex and gc)05:22
mnaserI’d argue that the actual migration into notedb is the trickier bit, indexing is indexing05:23
mnaserAwesome05:23
mnaserSo I assume from now on, Gerrit will no longer use a database server05:24
mnaserIt will be using purely notedb I guess?05:24
clarkbunfortunately that is a bad assumption :P05:24
clarkbthe accountPatchReviewDb remains in mysql05:24
clarkbits the single table database that tracks when you have reviewed a file05:24
clarkbbut ya one of the changes I have proposed and WIP'd is one to remove the main db configuration from the gerrit config05:25
clarkbwe'll actually do that cleanup after we're settled on the new version as its ok to have the old db config in place. gerrit 3.2 will just ignore it05:25
mnaserOh I see05:26
mnaserSo in a way however the database is not that important, you’d just lose track of what patches you reviewed if that db is lost?05:26
clarkbwhat files you have reviewed05:26
clarkbthe change votes are in notedb05:26
clarkbyou know when you look at a file diff and it gives you a checkmark on that file?05:27
mnaseroh yes05:27
clarkbthats all that database is doing is tracking those checkmarks next to files for you05:27
clarkband ya its not super critical05:27
clarkbreplication to gitea will also take a bit once this is all done as all that notedb state will be replicated for changes05:28
mnaserand I guess in terms of scale there’s a few other deployments who have run at our scale or even bigger :p05:28
mnaseroh ouch, that will add a lot of additional data that is replicated across every gitea system05:29
clarkbya I haven't checked recently. I think gerrithub may be similar? But they didn't really exist until notedb was a thing? I may misremember that. I know they were a driving force for it because it meant they could store stuff in github iirc05:29
clarkbmnaser: ya the problem is refs/changes/12345/45/meta is where it goes05:29
clarkbso you can't replicate the patchsets without the notedb content (since git ref spec doesn't allow you to exclude things like that as far as I can tell)05:30
clarkbI don't expect it will cause many issues once we get the initial sync done05:30
clarkbthat will just take some time (in testing it was like 1.5 days)05:30
mnaserLooks like gerrithub is in the 500000s of changes05:30
mnaserAnd I think we’re in the 700k’s05:30
clarkb76036305:31
clarkbwe're watching a slow count up to that number on the reindex right now05:31
mnaserDoesnt Google have a big installation too?05:31
clarkbthere is the gerrit gerrit, chrome, and android05:32
clarkbhowever, google doesn't really run gerrit05:32
clarkbthey use dependency injection to replace a bunch of stuff aiui05:32
clarkbso that it ties into their proprietary internal distributed filesystems and databases and indexers etc05:32
mnaserThe chrome one is at 2.5m wow heh05:33
mnaserOh I see so they’re probably not running notedb05:33
clarkbwe discovered this the hard way when we did an upgrade once and jgit just didn't work05:33
clarkbit turned out that jgit was fine talking to their filesystem/storage/whatever it was but not to a posix fs05:33
clarkband so no one caught it until an open source deployment upgraded05:34
clarkb(us)05:34
mnaserouch05:34
corvusi think they're using notedb, but the git data store isn't what mere mortals use06:25
corvusReindexing changes: project-slices: 49% (1345/2697), 51% (390766/760363) (/)    |06:25
corvusthat's a timestamped progress status before i go to bed06:26
ianwReindexing changes: project-slices: 74% (2021/2697), 77% (587125/760363) (-)10:21
ianw25% in ~ 4 hours10:21
ianwthat puts it at about 14:00UTC to finish10:22
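The estimate above can be sketched as a linear extrapolation between two of the progress samples pasted in this log. This assumes the chat timestamps approximate when each sample was taken and that the rate stays constant, which the earlier discussion suggests it may not; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def estimate_finish(t0, done0, t1, done1, total):
    """Extrapolate a finish time from two (time, changes indexed) samples."""
    rate = (done1 - done0) / (t1 - t0).total_seconds()  # changes per second
    remaining = total - done1
    return t1 + timedelta(seconds=remaining / rate)


# Samples taken from the 06:25 and 10:21 progress lines in this log.
t0 = datetime(2020, 11, 21, 6, 25)
t1 = datetime(2020, 11, 21, 10, 21)
eta = estimate_finish(t0, 390766, t1, 587125, 760363)
print(eta.strftime("%H:%M"))
```

The result lands a little before 14:00, in line with the rough figure quoted in the chat.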
fungiyeah, awake again and it's claiming around 88% complete now12:21
fungiReindexing changes: project-slices: 87% (2373/2697), 89% (679414/760363)12:27
fungi99%!13:44
fungionce this wraps up, assuming it looks good, i'll start the git gc and then i need to run out to the hardware store to pick up an order for some tools13:46
fungi1086m41.925s14:11
fungithat's 18h6m42s14:12
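A small sketch of the conversion above, assuming the figure is the `real` value printed by the shell's `time` in its usual `<minutes>m<seconds>s` form. The helper name is hypothetical.

```python
import re


def format_real_time(value):
    """Convert a value such as '1086m41.925s' to hours, minutes and whole seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    total_seconds = round(int(match.group(1)) * 60 + float(match.group(2)))
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h{minutes}m{seconds}s"


print(format_real_time("1086m41.925s"))
```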
fungiexited 014:12
fungii've pulled the gc command back up and will start it momentarily14:13
fungijust need to switch computers to double-check our notes14:13
fungiokay, looking good and i've updated our notes to indicate which step we're on, gc is running now14:18
clarkbthanks. I'm very slowly waking up but maybe I can take it easy for another hour or two now14:19
fungiestimated time to completion is 1.25 hours so hopefully done before 16:0014:19
clarkbthe previous gc times were fairly accurate if a few minutes fast iirc14:20
fungiother than the final offline reindex, all the other steps should go quickly14:20
fungiat least up until we start gerrit again, and then there's the replication which will probably take ages14:21
fungiand the long tail of fixing things which are broken (some of which we know about, some of which we likely don't yet)14:21
fungianyway, not seeing any obvious errors stream by, so i'll take this opportunity to go pick up my order and be back in plenty of time for the rest of the upgrade14:22
clarkbthanks again14:22
corvuso/14:25
fungiokay, i'm back. if the gc finishes at 1.25 hours then that'll be ~12 minutes from now15:21
clarkbjudging by the cinder runtime when I checked about 5 minutes ago I think it will be longer but not significantly so. All the expensive repos seem to be processing at this point15:22
clarkbnova, cinder, horizon, manuals15:24
clarkboh and neutron15:24
clarkbnova is the only one running now15:33
fungi80m54.544s15:39
fungiexited 015:40
clarkbabout 6 minutes longer than estimated, much better.15:40
fungiokay, ready for the next pull?15:41
clarkbyes that looks good to me15:41
fungiopendevorg/gerrit   3.0                 fbd02764262c        46 hours ago        679MB15:41
clarkbthat looks about right15:42
clarkbif you're ready to run the init I am15:42
fungirunning15:42
fungiin testing this was near instantaneous15:42
fungi0m12.344s and exited 015:43
fungino error messages15:43
fungiready for me to work on 3.1 or want to check anything?15:43
clarkbI don't think there is anything other than the exit code to check15:44
clarkblets do 3.1. This init doesn't do any schema updates15:44
fungiand pulling15:44
fungiopendevorg/gerrit   3.1                 eae7770f89d6        46 hours ago        681MB15:45
clarkblgtm15:45
fungiready to init with 3.1?15:45
clarkbI think so. Can't think of anything else to check first15:45
fungiunderway15:45
clarkband done15:45
fungi0m11.280s15:46
fungiexited 015:46
fungiready to pull 3.2?15:46
clarkbyup15:46
fungiopendevorg/gerrit   3.2                 6fdfe303e8df        46 hours ago        681MB15:47
clarkbthat image lgtm15:47
clarkbI think we can do the reindex15:47
fungirunning15:47
clarkber no sorry I keep getting ahead of myself15:47
clarkbthe init15:47
fungiyeah, the init is what i'm running, sorry15:47
clarkbthe command you have queued looks right :)15:47
fungiokay, running now15:47
fungi0m13.628s and exited 015:48
fungi*now* it's time to reindex15:48
clarkbyup and the command you have up for that lgtm15:48
fungiokay, starting it now15:48
fungieta 41 minutes15:49
fungithen we start gerrit and begin unwinding things15:49
clarkbor 18 hours :/15:50
fungiyeah, ugh15:50
fungiwell, we're already at 1% done so hopefully not 18 hours15:51
clarkbya this is going much quicker just counting off progress at 20 second intervals15:52
clarkbwe were doing about 200 changes per 20 second interval last night. This just did like 4k15:52
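A rough worked version of the comparison above, assuming the quoted rates hold for the whole run and using the 760363 change total from the progress lines. The function name is hypothetical and the rates are only the approximate figures mentioned in the chat.

```python
TOTAL_CHANGES = 760363  # total from the reindex progress output in this log


def full_reindex_hours(changes_per_interval, interval_seconds=20, total=TOTAL_CHANGES):
    """Hours needed to index every change at a constant observed rate."""
    rate = changes_per_interval / interval_seconds  # changes per second
    return total / rate / 3600


before_gc = full_reindex_hours(200)   # rate observed before the git gc
after_gc = full_reindex_hours(4000)   # rate observed after the git gc
print(round(before_gc, 1), round(after_gc, 1))
```

At about 200 changes per 20 seconds a full pass works out to roughly 21 hours, and at about 4000 it is a little over one hour, which is the same order as the 18 hour and 59 minute runs recorded in this log.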
clarkbI think the gc'ing helps tremendously15:52
clarkbfor the unwinding it would be good for others to maybe look over what I've written down again and just sanity check it. I think my biggest concern at this point is any interaction between our ci/cd and gitea replication lag15:53
clarkbI believe in cd we pull from gerrit and not gitea so that isn't an issue but I've got us explicitly replicating our infra repos first to mitigate that15:54
clarkbas another sanity check our disk utilization has gone up about 5GB since the gc which is what we expected based on testing15:55
clarkb93GB -> 98GB on that fs15:55
clarkbthe unpacked state was about 110GB iirc15:55
clarkbalready up to 10% much much quicker this time15:57
clarkbthose exceptions in the screen scrollback are expected (small number of corrupted changes)15:59
corvusgerrit needs its coffee15:59
corvusi'm estimating ~18:00 for completion of this step16:00
corvusoh the rate seems to have just significantly improved16:00
corvusand my math was wrong16:01
clarkbcorvus needs coffee too?16:01
corvusmaybe ~17:00?16:02
clarkbya about another hour by my math16:02
clarkbit took ~14 minutes to get to 20% so another 4 blocks of 14 minutes16:03
corvusi have to run an errand; i probably won't be back until after this completes, but i'll check in when i get back and see if there's unexpected issues i can help with16:05
fungithanks!16:17
clarkbit is up to 61% now16:26
clarkbI guess the trick with the notedb migration would've been to somehow stop that process prior to reindexing, then garbage collect, then reindex manually. Reading the code there is a --reindex flag but it isn't clear to me if you can negate that somehow. Anyway we shouldn't need to do this again so not worth thinking about too much anymore16:27
clarkbfungi: not to get ahead of myself, but do you think we should block port 29418 and leave the apache message in place when we first start gerrit? then check that logs indicate it is happy before opening things up?16:28
clarkbI did have us starting gerrit before updating apache to check logs but realize that port 29418 would still be accessible16:28
fungiyeah, wouldn't hurt to temporarily remove public access to that port initially, but obviously we shouldn't start up anything which would need access either (like zuul)16:37
fungii can edit the firewall rules temporarily now to do that. i'll use a second window in that screen session16:38
clarkbya there are a number of things I think we should do before starting zuul in the etherpad16:40
fungiand done16:40
clarkbthanks16:40
clarkbI'm putting together a list of scripts to update to use the 3.2 image on review.o.o now since it occurred to me that we run manage-project type things periodically iirc16:41
fungiiptables -nL and ip6tables -nL now report no allow rule for 2941816:41
clarkband we don't want them to use the old image (its actually probably ok for them to use the old image since its the same version of jeepyb but I don't want to count on that)16:41
fungi(i left the overflow reject rules for 29418 in there for now)16:41
fungi90%16:42
clarkbdocker-compose.yaml, /usr/local/bin/manage-projects, /usr/local/bin/track-upstream seem to be the files using that variable when I grep in system-config16:43
clarkbdocker-compose is already edited but we should update the other two before starting zuul (I've made a note in the etherpad too)16:43
clarkbdone in 59 minutes16:48
fungi59m13.719s exited 016:48
fungiyep16:48
fungiokay, and 29418 is currently blocked so in theory we can start gerrit and check its service logs for obvious signs of distress16:48
clarkbyup I think that is our next step16:49
clarkbdocker-compose up -d16:49
fungiready?16:49
clarkbI guess so16:49
clarkbGerrit Code Review 3.2.5-1-g49f2331755-dirty ready16:50
clarkbthat plugin manager exception is expected. I believe it is because we don't enable the plugin manager in our config but have the plugin installed16:50
fungisomething to add to the to do list to remove or enable i guess16:50
clarkbya16:50
clarkbbefore we open things up I should add my gerrit admin ssh key. But I think you've had more experience with doing those things so maybe you want to do the force submit of the change if it still looks good to you as well as kick off replication for system-config and project-config?16:51
clarkbwe want to force merge first then replicate I think16:51
clarkbalso before we go further let me reread the etherpad notes :)16:51
fungiare you going to be able to do those things without 29418 open?16:52
clarkbno I'm saying lets just be ready for that when we open it16:52
fungioh, sure16:52
clarkbbefore we open things though why don't we fix /usr/local/bin/manage-projects and /usr/local/bin/track-upstream ?16:52
clarkbwe need to change the image version in those scripts to 3.216:53
fungionce 29418 is open i can add your openid account to project bootstrappers temporarily so you can add verify +2 and call submit16:53
clarkbfungi: do you want to do the script fix in the screen or should I just do them off screen then you can confirm on screen?16:53
fungido we have a change to update /usr/local/bin/manage-projects and /usr/local/bin/track-upstream already?16:53
fungithey're not going to get called until we reenable the crontabs16:54
clarkbfungi: yes, the change which we force merge sets gerrit_container_image in ansible vars and that is used in docker-compose and the two scripts16:54
fungiahh, okay16:54
clarkbfungi: manage-projects is called by zuul periodically iirc16:54
clarkbso once zuul is up it may try it16:54
fungiwell, ansible is still disabled for the server too16:54
clarkboh good point16:54
clarkbwell I think we should fix it anyway since its a good sanity check16:55
fungisure, i can edit those manually for now16:55
clarkbmy concern in particular is a race between the config management updates and the manage-project updates16:55
clarkbI don't know that they always go in order16:55
fungilgty?16:56
clarkbthose edits lgtm thanks16:56
clarkbok give me a minute to get situated with auth things then I guess we can turn it on and force merge the config mgmt change then replicate16:57
clarkbalright i've got keys loaded and have my totp token16:59
fungicool, so open 29418 first or undo the maintenance page in apache first?17:00
clarkbI think lets undo apache first17:00
fungidoes that look correct?17:00
clarkbyes, but we also want to remove the /p blocks too17:00
fungilike that?17:01
clarkbyup17:01
fungiready for me to reload apache2?17:01
clarkblet me just double check zuul isn't running somehow17:02
fungik17:02
clarkbps shows no zuul processes on zuul0117:02
clarkbI guess we continue unless you can think of anything else17:02
funginope, nothing comes to mind17:02
fungiand it's up17:03
fungii get the webui17:03
fungisigning in17:03
clarkbI'm signed in17:04
clarkbas my regular user. Did you want to review https://review.opendev.org/c/opendev/system-config/+/762895/1 and maybe be the one to force merge it?17:04
clarkbdoesn't look like anyone else has voted on it yet17:04
fungiyeah, signed in as my normal user too17:04
fungifiring up gertty17:05
fungiseems to be syncing okay17:05
clarkbI'm removing my WIP on that change now17:05
fungishould remember to remind gertty users that now they need to add "auth-type: basic" to their configs17:06
fungiworth noting you actually wanted me to review https://review.opendev.org/c/opendev/system-config/+/762895/2 not /117:09
fungitook a bit to realize i was looking at an old patch there17:09
clarkboh sorry thats what it redirected me to from my link in the etherpad17:10
clarkbbecause etherpad had the /1 too17:10
fungino worries, i've voted +2 on it17:12
clarkbfungi: ok do you want to submit it or do you want me to?17:12
fungii can do it, just a sec17:12
clarkbyou'll need to add the +2 verified too17:12
fungiyep17:12
fungiand workflow +1 obviously17:12
clarkbonce that force merges I want  to see if replication for system-config replicates everything or just that ref17:13
clarkbbut generally replicate system-config and project-config next I think17:13
fungifatal: "762895" no such change17:16
fungid'oh17:16
fungii was doing it to review-test ;)17:16
* fungi curses his command history17:17
fungineed to open 29418 on review.o.o for this17:17
fungiare we good with that?17:17
clarkbyes I am17:17
clarkbalso you still need the verified +2 (I assume your admin accounts will do that)17:17
fungiit will17:18
clarkbfungi: note that rules.v4 is the file now iirc17:18
clarkband if we missed actually blocking 29418 on ipv4 then oh well at this point :) it seems fine17:19
fungiyeah, i'm just keeping rules consistent with it until we confirm and clean up the cruft17:19
clarkbkk17:19
fungii edited all three17:19
clarkbgotcha17:19
fungiokay, it's merged and i've removed membership for my admin account from project bootstrappers17:20
clarkbnow lets see what is being replicated17:20
clarkbnothing in the queue so did it only replicate that ref? /me looks at gitea17:21
clarkbhttps://opendev.org/opendev/system-config/commit/2197f11a0f27da3f9bd1c009c84107dc09559f6e yes only that ref17:22
fungineat17:22
fungii suppose we need to manually trigger a full replication17:22
clarkbwhat I think that means is we could not replicate anything and let it catch up over time?17:22
clarkbya or we manually replicate. I still think we manually replicate system-config and project-config first though17:22
fungii can trigger replication for system-config first17:23
clarkbprobably ripping off this bandaid is the best option to ensure we have plenty of disk on the giteas17:23
clarkbfungi: ++ that would be great17:23
fungitriggered17:23
clarkbthere are already 4 new changes too17:24
clarkbhrm system-config is done replicating? that took suspiciously little time17:25
clarkbI see the refs on gitea01 though17:27
clarkbI wonder if part of the reason we were slow replicating in testing was network bandwidth17:27
fungicould be...17:28
fungii can trigger project-config next17:28
clarkb++17:28
fungidone17:28
clarkbthis might be overoptimization: but we may also want to do nova, neutron, cinder, horizon, openstack-manuals so that we can run the gitea gc after they are done17:28
fungii assumed we were talking openstack/project-config not opendev/project-config in this case17:28
clarkbsince they should be the biggest repos17:28
clarkbfungi: correct17:28
fungisure i can do nova next and see what happens17:29
clarkbwow it says project-config is done17:29
fungitailing replication_log in the screen session is probably not useful. it lags waaaay behind because of how verbose the log is17:30
clarkbspot checking project-config on gitea01 shows that it seems to have worked too17:30
clarkbI see refs/changes/xy/abcxy/meta17:30
clarkbbut ya lets work through that list I posted just above, check disk usage on gitea01 and run gc on all the giteas if it looks like we expanded disk use a lot17:31
clarkbthen when we're happy with that trigger full replication then start looking at zuul I guess17:31
fungithe replication log is really, really busy though, are you sure it's not actively replicating everything?17:32
clarkbfungi: gerrit show-queue -w says no17:32
fungistrange17:32
clarkbif I start a new tail on replication_log its quiet17:32
clarkbI think thats just screen and ssh buffering with large amounts of text?17:33
fungipreviously it was very noisy but now it seems to have quiesced, yeah17:33
fungiokay, i'll do nova now17:33
clarkb++17:34
fungiand it should be running17:34
clarkbI see it in the show queue17:34
fungiyeah17:34
clarkbI see disk use slowly increasing on gitea01 so it seems to be doing things17:36
fungistatus notice The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored.17:40
fungisomething like that ^?17:40
clarkbsounds good to me17:40
clarkbnova replication is done according to show queue and disk use increased by about a gig so ya I think doing some of these big ones first, gc'ing then doing everything is a good idea17:41
fungi#status notice The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored.17:41
openstackstatusfungi: sending notice17:41
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored.17:41
fungiokay, i'll do openstack-manuals next17:42
clarkb++17:42
fungiand it's running17:42
clarkband honestly at the rate these have gone I think we should start global replication, benchmark it, then see if we can wait a bit before starting zuul since it seems quick. If benchmarks say it will be all day then nevermind17:42
fungisure17:43
clarkbsince that will rule out any out of sync unexpectedness17:43
clarkbmanuals is done17:44
openstackstatusfungi: finished sending notice17:44
fungineutron next?17:44
clarkbI think you can just enqueue the others in the list and let gerrit figure out ordering17:44
clarkbI would just tell it to do neutron cinder and horizon now17:45
fungiyup, was just finding your original list in scrollback17:45
clarkbthat list is based on which things were slow to gc which implies more data/more refs17:46
fungitriggered all three17:46
clarkbhorizon is done, neutron and cinder still running17:47
* mnaser is playing around gerrit right now17:48
fungijust be aware zuul is still offline17:49
mnaserfungi: yep!  i'm just trying to see if the gerrit functionality itself seems to be okay17:49
fungithanks, appreciated!17:49
mnaseri am noticing a few things, none are critical of course, but "oh, interesting" type of things17:49
clarkbmnaser: ya I expect a lot of that :)17:49
fungisure, i'm going to hate the new ui for a while i'm sure17:50
mnaseri.e. anything except verified/code-review/workflow are under this thing called "Other labels"17:50
clarkbpolygerrit adds a bunch of new excellent features and some not so great things17:50
mnaserso roll call votes in governance are under "Other labels"17:50
mnaserbackport candidate patches seem to be affected too, not a big deal but maybe good for us to know how it decides whats other and whats not17:50
clarkbbut where we were was a dead end so we're ripping the bandaid off and going to try and work upstream and with plugins etc to make stuff better17:50
clarkbmnaser: have a link to a change so we can see that?17:50
mnasersure -- https://review.opendev.org/c/openstack/governance/+/76091717:51
fungialso i'm noticing that the gitweb links are broken, probably worth working on a proper link to gitea to replace those anyway17:51
mnaseryou can see rollcall-vote is under other labels, so is code-review in there (but i guess maybe that's cause code-review doesn't mean anything for merging inside openstack/governance)17:51
fungimight be a good time to start a post-upgrade notes etherpad where we can collect lists of things which have changed people might ask about, and things we know are broken which will either be fixed or removed17:52
mnaseryeah, i can start putting a few things in there too17:52
clarkb++17:53
mnasersome other minor things are the ordering of code review comments17:53
mnaserit seems to be verified, code-review then workflow17:53
clarkbI think it was that way before?17:53
clarkbI've already forgotten17:53
mnaseri remember you would see code-review, verified, workflow in the list17:53
mnaserzuul always came in the middle, workflow was always at the end17:53
mnaser(in the display of votes at least)17:54
clarkbfungi: ok those replications are done and we're using 4gb extra disk. I'll trigger the gc cron on all of the giteas now? any other repos you think we should replicate first?17:54
* corvus checking in17:54
clarkbcorvus: tl;dr is gerrit is up and seems ok so far. replication is much quicker than anticipated. We are manually triggering replication for "large" repos so that we can gc on the giteas to pack back down again then start global replication17:54
clarkbafter that we'll be looking at zuul17:55
fungii've started a pad here https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes17:55
corvus++17:55
fungimnaser: ^17:55
mnaserfungi: cool i'll fill those out17:56
fungiclarkb: i agree, git gc on gitea next17:56
clarkbcorvus: fungi any other repos we should manually replicate? We have done system-config project-config nova neutron cinder horizon and openstack manuals17:56
fungithen we can do a full replication17:56
corvuscan't think of any others17:56
clarkbfungi: k will give corvus a minute to bring up any other repos that may be worth doing that to then I can do the gitea gc'ing17:56
clarkbcool I'll work on gitea gc'ing now17:56
fungijust to avoid overrunning the fs with all of them at once17:57
fungithanks!17:57
mnasersomething i remember broke last time we did an update was all the bp topic links from specs17:57
mnaseri just tested one and its working just fine17:57
mnaserspecifically: https://review.opendev.org/#/q/topic:bp/action-event-fault-details from https://blueprints.launchpad.net/nova/+spec/action-event-fault-details as an example17:57
mnaseroops17:58
mnaseri found our first broken17:58
mnaserDirectly linked changes are redirecting to an incorrect port, Example: https://review.opendev.org/712697 => Location: https://review.opendev.org:80/c/openstack/nova/+/712697/17:59
mnaseri added that to the etherpad17:59
mnaseri remember fixing that inside our gerrit installation actually, let me find17:59
clarkbthat could be related to the thing fungi linked about after the bug fixing this week18:00
fungimnaser: that may be a known issue, at least wmf and eclipse ran into it and filed bugs18:00
mnaserif i remember right, we did this: `listenUrl = proxy-https://*:8080/`18:00
mnaseror maybe that was for https redirection stuff18:00
fungiapparently we can fiddle the proxy settings in apache if it's the same issue18:00
* fungi checks notes18:00
clarkball 8 giteas are gc'ing now18:01
fungimnaser: can you see if it looks like https://bugs.chromium.org/p/gerrit/issues/detail?id=1370118:02
clarkbusing /c/number works fwiw18:02
clarkbthat may be an easy workaround for now if necessary18:02
fungiif so the solution is supposedly "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}" in our vhost config18:02
corvusclarkb: well, the /# links are supposed to be "permalinks" so i don't think "use /c" is an easy solution (the problem is existing links point there)18:03
mnaserthat makes sense18:03
clarkbcorvus: yup we should fix it18:03
corvusfungi: x-forward-proto makes sense to me18:03
mnaser"X-Forwarded-Proto is now required because of underlying upgrade of the Jetty library, when Gerrit is accessed through an HTTP(/S) reverse-proxy."18:03
clarkbI think I have figured out why replication timing is so much better. its because we're not replicating all the actual git content now18:03
mnaserindeed, so yes, that does all make sense18:04
corvusanyone writing an x-forwarded-proto change?18:04
clarkbI'm not18:04
corvuslooks like i am :)18:04
clarkbin fact I need to find something to drink. back shortly18:05
* mnaser keeps looking18:05
corvusi kind of want to ninja the fix in first just to make sure it works18:05
fungicorvus: please feel free to hand-patch it into the config first18:05
corvusk will do both18:05
fungii agree the change isn't much good if the fix turns out to be incorrect for our deployment for some reason18:06
mnaseri'll add the "you'll need a new version of git-review" to "what's changed"18:06
mnaseras i guess that might come up18:06
corvusmnaser: redirect look good now?18:08
mnasercorvus: yes!  working in my browser and curl shows the right path too18:08
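The redirect problem reported above (a request for https://review.opendev.org/712697 answered with a Location on port 80) can be checked mechanically. This is a minimal Python sketch that assumes only the two URLs quoted in the log; the function name `redirect_port_mismatch` is hypothetical and is not part of Gerrit or Apache.

```python
from urllib.parse import urlsplit

# Default ports implied by a scheme when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def redirect_port_mismatch(request_url: str, location: str) -> bool:
    """Return True if a redirect keeps the scheme but points at a different port.

    Hypothetical helper for illustration only.
    """
    req = urlsplit(request_url)
    loc = urlsplit(location)
    req_port = req.port or DEFAULT_PORTS.get(req.scheme)
    loc_port = loc.port or DEFAULT_PORTS.get(loc.scheme)
    return req.scheme == loc.scheme and req_port != loc_port


# The broken redirect quoted in the log: https on port 80.
broken = redirect_port_mismatch(
    "https://review.opendev.org/712697",
    "https://review.opendev.org:80/c/openstack/nova/+/712697/",
)
# The same redirect without the stray port.
fixed = redirect_port_mismatch(
    "https://review.opendev.org/712697",
    "https://review.opendev.org/c/openstack/nova/+/712697/",
)
```

A check like this only inspects the Location value; it says nothing about whether the proxy header fix is the right one for a given deployment.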
clarkbyay18:09
mnaserseems like gerritbot is not posting changes18:10
fungimnaser: i think it's git-review>=1.2618:10
mnaseri am not sure if thats cause its turned off or18:10
fungiit probably needs to be restarted now that the event stream is accessible18:10
fungii'll do that now18:10
clarkbcorvus: fwiw if you're looking at the vhost I think there may be old cruft in there we should clean up. I always get lost when looking though18:11
mnaserfungi: i see the 1.27.0 release notes have: "Update default gerrit namespace for newer gerrit. According to Gerrit documentation for 2.15.3, refs/for/’branch’ should be used when pushing changes to Gerrit instead of refs/publish/’branch’." -- is it not that change?18:11
corvusremote:   https://review.opendev.org/c/opendev/system-config/+/763577 Add X-Forwarded-Proto to gerrit apache config [NEW]18:11
fungigerritbot has been restarted18:11
clarkbcorvus: fungi ^ should we force merge that one too?18:11
corvusclarkb: i see comments related to upgrade i will address them18:11
clarkbcorvus: well the upgrade things should be handled18:12
mnaserwell look at that, i can now post emojis in my changes without a 50018:12
mnaser:P18:12
clarkbas part of the earlier force merge18:12
fungimnaser: thanks, yeah 1.27 sounds right, i was going from memory18:12
corvusclarkb: oh, er, what do you want me to do?18:13
corvusclarkb: i agree that the TODO lines have been removed in system-config master18:13
clarkbI'm more thinking about what I think is old gitweb config. I don't think it needs doing now. I just mean someone that groks apache better than me should look at that vhost and audit it18:13
corvusclarkb: i have manually removed them from the live apache config18:13
clarkbas there may be a few cleanups we can do18:13
clarkbcorvus: thanks18:13
corvusbut they were already commented out, so that should all be a noop18:13
mnaseris the "links" part in the gerrit change display something that is customizable by the deploy (where gitweb currently is listed?). if so, probably would be neat if we added a "zuul builds" link which went to a prefiltered zuul build search using the changeid!18:14
clarkbthe gitea gc's are still going. The cron only does one repo at a time18:14
clarkbmnaser: you can probably write a plugin for that18:15
mnaserok i see, so the gitweb link comes from a plugin18:15
clarkbmnaser: gitweb is built in but gitiles is a plugin18:15
clarkbaiui18:15
mnaserhttps://review.opendev.org/q/hashtag:%22dnm%22+(status:open%20OR%20status:merged) tags stuff working pretty neatly too18:15
fungibut i feel like we should consider replacing that with a link to gitea anyway if we can18:15
corvusmnaser: https://review.opendev.org/Documentation/dev-plugins.html#links-to-external-tools may be relevant?18:16
corvuslooks like we'd need to do a tiny plugin18:16
mnaserou, that's pretty cool and seems like it would be quite straightforward too18:16
corvusnot sure if that's the right interface to put it in the 'links' section18:17
corvusbut seems pretty close to that18:17
corvuscould incorporate that into the zuul plugin18:17
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/763577 lgtm if you want to review that one and force merge it too?18:18
corvusspeaking of which https://gerrit.googlesource.com/plugins/zuul/18:18
mnaserhttps://review.opendev.org/c/openstack/project-config/+/763576 seems to work pretty well too for a WIP change that is accessible :)18:18
corvusalso https://gerrit.googlesource.com/plugins/zuul-status/18:18
corvusbtw gertty has half-implemented support for hashtags18:18
corvusi will be motivated to finish it now :)18:19
clarkbmnaser: ya one followup we can look at doing is removing workflow -118:19
mnaserit seems like i see some 3pci still reporting to cinder, so they're probably 'just fine'18:19
fungi763577 is merged18:20
mnaserit looks like you can mark a change as private, which i guess can be useful18:20
clarkbyup and gerritbot reported it18:20
clarkbmnaser: hrm I think we should actually disable that18:21
fungiindeed it did18:21
mnaseryeah i remember it was disabled before18:21
fungiwell, "drafts" were disabled18:21
clarkbmnaser: I don't want people assuming "private" is really "private" until we can check it18:21
clarkbya private is a newer thing iirc18:21
mnaseri wonder if you can enable it per project too, or for specific users18:21
fungibut gerrit removed drafts and replaced them with two features, private changes and work in progress status18:21
mnaserwould be really nice for embargo'd security changes18:21
clarkb"Do not use private changes for making security fixes (see pitfalls below)"18:22
clarkbno it won't be :P18:22
mnaseraha18:22
clarkbthis is why I don't want it enabled if we can disable it18:22
clarkbdrafts was a honeypot and private will likely be too18:22
mnaseri'll add it to "what's changed" for now18:22
clarkbhttps://gerrit-review.googlesource.com/Documentation/intro-user.html#private-changes that quote is from there18:23
clarkbwe can set change.disablePrivateChanges to true18:23
fungiyeah, it's an attractive nuisance18:23
fungii agree we should disable it18:23
clarkbin gerrit.config18:23
fungii can write that change now18:23
clarkbthanks18:23
mnaseri moved my test change back to public in case that causes some issue about disabling it with a private change already there18:24
clarkbmnaser: thanks, though I expect it's fine. usually that stuff gets enforced when you push18:25
clarkbsimilar to how we disabled drafts, the old drafts were fine18:25
clarkbgiteas are about done gc'ing18:25
fungiwhat change topic were we using for upgrade-related changes?18:25
mnaseroh that's quite cool -- if you look at a change diff and click on "show blame", it shows the blame and you can click to go to the original change behind it18:26
funginifty18:26
clarkbgitea06 only has 19GB free disk. I'm going to look at that as it's much lower than the others18:26
clarkbfungi: I was using gerrit-upgrade-prep18:26
fungithanks18:27
clarkband have a couple of wip changes there that we should land once we're properly settled18:27
clarkbfungi: I wonder if we shouldn't manually apply that change and force merge it too18:27
fungiit'll need a service restart18:28
clarkbya18:28
clarkbprobably not is the best time for those?18:28
clarkbs/not/now/18:28
clarkbsince we're telling people its not ready yet18:28
fungichange is 76357818:30
fungii'll hand edit the config now and restart gerrit18:31
fungii have the line added in the screen session if you want to double-check18:32
clarkbscreen looks correct18:32
mnaserit looks like a change owner can set an assignee for their change18:32
clarkbI'm still trying to sort out gitea06 disk18:32
mnaseri'm not too sure what an assignee really .. means18:33
clarkbthe gitea web container is using 20gb of disk in /var/lib/docker/containers18:33
fungithat may be to support workflows where reviewers are auto-assigned18:33
fungimnaser: ^18:34
clarkbwhich should be separate from the bind mounted stuff which is where we expect data to go18:34
fungiokay, restarting the service18:34
corvusmnaser: i'm also not sure who's supposed to check the "resolved" box for comments.  the author or the reviewer?18:34
clarkbI expect that if I restart gitea on 06 that will clean up after itself18:34
corvusmnaser: we'll have some more cultural things to figure out18:34
clarkbbut maybe I should exec into it first and figure out where the disk is used18:34
mnasercorvus: yep, as the assignee of a change seems to be a 1:1 mapping too18:35
mnaserclarkb: i'd probably see why it ran away with so much disk space in the first place out of my curiosity :)18:35
clarkbit is the log file18:36
clarkbI'll compress a copy into my homedir then down up the container?18:37
fungiwfm18:37
mnaserhmmmmmmm18:37
mnaseryou can change your full display name inside gerrit right now18:37
clarkbmnaser: you always could18:37
mnaseroh, i thought you could change the formatting18:38
clarkbsome people would stick their irc nicks in there or put away messages18:38
funginope, always was allowed18:38
mnaserah got it18:38
fungibut now away messages are unnecessary, because...18:38
fungiyou can set your status!18:38
mnaserindeed18:38
fungiactually what's changed around the name is that it has a separate "display name" and "full name"18:39
fungiyou can change them both18:39
fungiused to just be a full name18:39
mnaserunrelated but18:40
mnaserthe static link url to the CLA is well, ancient18:40
mnaserhttps://review.opendev.org/static/cla.html18:40
mnaser"you agree that OpenStack, LLC may assign the LLC Contribution Agreement along with all its rights and obligations under the LLC Contribution License Agreement to the Project Manager."18:40
fungimnaser: technically still accurate18:40
mnaseropenstack, llc? :p18:41
fungimnaser: yep18:41
clarkbhrm using xz because I don't want a 2GB gzip file18:41
clarkbbut this is slow18:41
clarkbfungi: has gerrit restarted?18:41
mnaserwell IANAL but if it works, it works18:42
fungimnaser: section #9 contains the previous icla18:42
fungibecause lawyers18:42
fungiit's an icla within an icla18:42
fungiclarkb: yes18:42
fungiin theory private changes should no longer appear as an option18:42
clarkbfungi: cool the change for that lgtm. I +2'd it if you want to force merge it too18:43
clarkbonce I've got gitea06 in a good spot I think we're ready to start replicating more things18:43
clarkbI'll give it say 5 minutes on the xz but if that isn't done switch to gz?18:44
fungimnaser: the short answer is that agreeing to the new license agreement carries a clause saying you agree that contributions previously made under the old agreement can be assumed to be under the new agreement, and part of doing that is specifying a copy of the old agreement18:44
fungii'll merge the private disable change now18:45
fungiclarkb: care to add a workflow +1?18:46
clarkbdone18:46
fungithanks18:46
fungiand now it's merged and i've removed my admin account from project bootstrappers again18:46
clarkbthanks for taking care of that18:46
funginp18:46
mnaserdo we have an 'opendev' plugin in ue?18:47
mnaseri was researching on how to add the opendev logo and replace 'Gerrit' by 'OpenDev', found out it was possible by writing a style plugin18:48
mnaseri've found the one used by chromium -- https://chromium.googlesource.com/infra/gerrit-plugins/chromium-style/+/refs/heads/master18:48
funginope, though that raises the question whether we'd want an aio plugin for all our stuff or separate single-purpose plugins18:48
clarkbwhat is ue?18:48
fungii assumed he meant "use"18:48
mnaserah yes, in use indeed18:48
clarkbmnaser: no what we came to realize was that if we tried to get every single thing like that done before we did the notedb transition in particular we'd just be making it harder and harder as more changes land18:49
mnaserclarkb: oh yes, of course, i agree :)18:50
clarkbinstead it felt prudent to upgrade, then figure out what we need to change as we're able to roll ahead with eg the 3.3 release18:50
clarkbthat comes out next week, maybe we'll upgrade week after18:50
mnaserby the way, funny thing18:50
mnaserin that plugin `if (window.location.host.includes("chromium-review")) {`18:50
mnaser`} else if (window.location.host.includes("chrome-internal-review")) {`18:50
mnaserhttps://chrome-internal-review.googlesource.com/ i wonder where this little guy goes :)18:51
fungibehind a firewall/vpn you can't reach, no doubt18:51
fungiit's likely full of googlicious goodness18:52
clarkbok its been more than 5 minutes and xz is still going. I'm going to stop it and see how big a gzip is18:52
fungixz takes a lot more memory/cpu to compress than gzip18:53
fungiso not surprising18:53
fungigz will probably still make it nearly as small18:53
clarkbfungi: I went with xz to start because compressing journald logs is significantly better with it than gzip18:54
clarkbthat is why the devstack jobs use xz for that purpose18:54
clarkblike an order of magnitude18:54
fungiwoah really?18:55
clarkbya18:55
fungii rarely see xz get that much of an advantage over gz. maybe 25%18:55
fungiorder of magnitude is impressive indeed18:55
clarkbits like 30MB xz and 200MB gzip iirc18:55
mnaserrest of the gerrit looks pretty good to me so far in terms of functionality at this point, i'll come try 'break' things again once zuul is back up :)18:56
fungii guess it's on super repetitive stuff18:56
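The xz versus gzip trade-off discussed above can be measured on any sample with Python's standard library. This is a rough sketch under stated assumptions: the sample is synthetic repetitive log text made up for illustration (it is not the gitea or journald data from the log), and `compressed_sizes` is a hypothetical helper. The ratio seen on real logs depends heavily on their content.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return raw, gzip and xz sizes in bytes for the same input."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "xz": len(lzma.compress(data)),  # lzma defaults to the .xz container
    }


# Synthetic, highly repetitive log lines (made-up format, illustration only).
sample = b"".join(
    b"2020-11-21 18:%02d:%02d INFO router: GET /api/v1/repos/%d\n"
    % (i // 60 % 60, i % 60, i)
    for i in range(20000)
)

sizes = compressed_sizes(sample)
ratio = sizes["gzip"] / sizes["xz"]  # how many times larger gzip is than xz
```

Running the same function on a slice of the real log file would show whether the extra time xz takes is worth it for that particular data.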
* mnaser goes for a walk18:56
mnasergl!18:56
fungithanks again mnaser!18:56
fungii'm going to need to break in an hour to light the grill and start cooking dinner18:56
clarkbcorvus: ^ if you're still around any thoughts on the zuul startup process I have on the etherpad?18:57
clarkbfungi: ya lunch here is in about an hour and I barely ate breakfast so should have something too18:57
clarkbok gzip is done. took 18GB down to 1.2GB so its probably going to give us more than enough space. I'm stopping gitea06 now using the safer process in the playbook18:58
clarkbyup 35GB available now which I think is plenty19:00
clarkbfungi: corvus I think we are ready to trigger global replication now. Gitea01 has the least free disk at 27GB but our git repo growth was about 15GB so I expect that to be plenty19:00
clarkbfungi: ^ do you want to trigger that if you agree we're good?19:01
fungisounds good, i can trigger it as soon as you're ready19:01
clarkbI guess I'm as ready as I will be. gitea06 is up now19:01
fungii've done `replication start --all --now`19:03
clarkbI see things getting queued up in show-queue19:03
clarkbit doesn't seem to load the queue items as quickly as before19:05
clarkbthe number is still climbing19:05
clarkbheh its stream events the replication scheduled events for everything19:06
clarkbpeaked at just over 17k events in the queue19:08
clarkbnumber is falling now (slowly)19:08
clarkbI'm going to remove the digest auth option from all our zuul config files as the default is basic19:10
clarkbthis is required before we start zuul back up again, but I will wait on zuul startup until we've got eyeballs19:10
clarkblooks like it may only be necessary on the scheduler? the others have it but no corresponding secret. I'll do the others for completeness19:12
fungisounds right19:15
clarkbjust under 16k events now so whatever that comes out for replicating19:16
fungionly the scheduler performs privileged actions on gerrit, the other services just pull refs (at least in our deployment)19:16
corvusclarkb: looking re zuul19:16
corvusclarkb: 6.4.1 and 6.4.2?19:17
clarkbcorvus: ya19:17
corvusclarkb: i think 6.4.2 is done already, right?19:17
clarkbyup and 6.4.1 is done as of 30 seconds ago19:18
clarkbI guess the question for you is do you think we should start zuul now or wait or do other things first?19:18
clarkbzuul can't ssh into bridge to run ansible right now19:18
clarkbso we should be able to bring it up, have it run normal ci jobs, be happy with it then work to reenable cd?19:18
corvusclarkb: sgtm.  i can't think of a reason to delay19:19
clarkblooks like zuul_start.yaml starts the scheduler, then web, then mergers, then executors19:19
clarkbdo we want to hack up a playbook to not exclude disabled or do it more manually?19:20
corvusclarkb: i'd just hack out disabled then run that19:20
clarkbok I think it has to be in the same dir as what we run out of because it includes other roles?19:21
clarkbI guess that's fine because nothing is updating system-config on bridge right now19:21
fungiare we planning on relying on ansible to undo the commented-out cronjobs or should we manually uncomment them (and when)?19:22
clarkbfungi: I was going to rely on ansible19:22
clarkbtrack-upstream isn't super critical19:22
clarkbactually lets uncomment them because the gc'ing and the log cleanup is good to have19:23
clarkbwe can probably do that now?19:23
clarkbcorvus: fungi: I've got an edited zuul start playbook in the root screen on bridge19:23
clarkbthat is a vim buffer if you want to take a look at that before we run it19:23
fungiokay, i'll uncomment the cronjobs now19:23
fungiplaybook in bridge root screen lgtm19:24
clarkbdown to 14.7k replication tasks now19:24
corvusclarkb: lgtm19:24
corvusclarkb: remember -f 20 :)19:24
clarkbcorvus: ++19:24
corvusor 50 is fine :)19:24
fungiheh, 50 it is19:25
corvus-f lots19:25
clarkbthat command was in the scrollback so easy to modify19:25
* fungi fasts fireball19:25
clarkbdoes that command look good to yall?19:25
fungier, casts19:25
fungiyeah, looks fine19:25
corvus++19:25
clarkbok running it19:25
fungisuccess!19:26
clarkblooks happy19:26
clarkbnow to see what the running service is like19:26
corvusexecutors are deleting stale dirs19:26
corvus2020-11-21 19:25:55,459 DEBUG zuul.Repo: Updating repository /var/lib/zuul/git/opendev.org/inaugust/src.sh19:27
fungicrontabs edited in root screen session on review.o.o if anyone wants to double-check those19:27
corvusthat is not going as quickly as i would expect19:27
corvusi wonder if zuul is going to have to pull a lot of new refs19:28
corvusoh okay, things are moving now19:28
corvusi think we might have been stuck at branch iteration longer than i expected19:28
corvusie, the delay wasn't git, but rather the rest api querying branches19:28
corvuscat jobs are proceeding19:29
clarkbthis takes about 5-10 minutes typically iirc19:29
corvusi'm seeing a number of errors in gertty19:31
clarkbI moved my temporary playbook into my homedir to avoid any trouble that may cause system-config syncing when we get there19:31
corvusi have no reason to think they are on the gerrit side; more likely minor api tweaks19:31
corvuszuul is running jobs in the openstack tenant19:32
clarkbhttps://review.opendev.org/763599 for that change19:32
clarkbdown to 13.3k replication tasks19:32
fungicorvus: gertty isn't logging any errors for me... did you change your auth from digest to basic?19:32
corvusfungi: oh, not yet; that's not the error i'm getting but maybe it's a secondary effect19:33
corvus2020-11-21 19:31:30,509 WARNING zuul.ConfigLoader: Zuul encountered an error while accessing the repo x/ansible-role-19:33
corvusbindep.  The error was:19:33
corvus  invalid literal for int() with base 16: 'l la'19:33
corvuszuul logged that error for a handful of repos ^19:33
clarkbcorvus: I think I saw that scroll by in the zuul scheduler debug19:33
corvusyeah19:34
clarkbshould I be digging into that or are you investigating?19:34
fungicorvus: yeah, the error i remember gertty throwing when i had the wrong auth type was opaque to say the least19:34
corvusi don't recall seeing that before, therefore i don't know if it could be upgrade related.  but it doesn't seem like it should be -- that's in-repo content over the git protocol, so i don't think anything should be different.  but i dunno.19:35
fungii've put a reminder in the post-upgrade etherpad for gertty users to update their configs19:35
clarkbcorvus: oh I see this is us talking git not api19:35
clarkbthree jobs have succeeded, but the other jobs on that change will take a while to run so will be a while before we see zuul comment back19:36
corvusfatal: https://review.opendev.org/x/ansible-role-bindep/info/refs not valid: is this a git repository?19:37
corvusthat would explain the proximate cause of the zuul error19:37
clarkbinfo/refs/ is there and file level permissions look ok19:38
clarkbansible-role-bindep doesn't show up in the error_log19:39
corvusi can clone it over ssh19:39
corvusis there a problem with "x/" repos and http?19:39
clarkbx/ranger reproduces (just a random one I remembered was in x/)19:40
clarkbI wonder if this is a permissions issue perhaps related to the bug that got mitigated?19:41
corvusjust for 'x/' though?19:41
clarkbreview-test reproduces fwiw19:41
clarkbif you search for changes in those repos you can see them19:43
clarkbin the web ui I mean19:43
corvusif i curl info/refs for x repos, i get the gerrit web app19:44
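The `invalid literal for int() with base 16` message is what Python raises when a non-hex string is parsed as a hexadecimal number. Git's smart HTTP responses are framed as pkt-lines, each prefixed by a four-hex-digit length, so a client that receives the web app's HTML instead of a ref advertisement could plausibly fail in this way. The sketch below illustrates that failure mode; it is an assumption about the mechanism, not Zuul's actual code, and `read_pkt_lines` is a hypothetical helper.

```python
def read_pkt_lines(data: bytes) -> list:
    """Split a byte string into git pkt-lines.

    Each pkt-line starts with a 4-character hex length that includes the
    4 length characters themselves; "0000" is a flush packet, stored as None.
    """
    lines = []
    pos = 0
    while pos < len(data):
        # Raises ValueError if the prefix is not hexadecimal (e.g. HTML).
        length = int(data[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            lines.append(None)
            pos += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        lines.append(data[pos + 4:pos + length])
        pos += length
    return lines


# A well-formed start of a smart HTTP ref advertisement.
good = read_pkt_lines(b"001e# service=git-upload-pack\n0000")

# An HTML page in place of the advertisement fails on the first prefix.
try:
    read_pkt_lines(b"<!DOCTYPE html><html>")
    html_error = None
except ValueError as exc:
    html_error = str(exc)
```

This matches the observation that cloning over ssh still worked: only the HTTP path was being answered by something other than the git endpoint.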
corvusi'm a little worried there's some kind of routing thing in gerrit that assumes any one-letter path component is not a repo19:45
clarkboh fun19:45
fungiyikes19:45
corvusno basis for that other than observed behavior19:45
corvusi'm going to start looking at gerrit source code19:46
clarkbok19:46
clarkbdown to 11.1k replication tasks and things look good on gitea01 disk wise19:48
clarkbits x/19:49
clarkbcorvus: java/com/google/gerrit/httpd/raw/StaticModule.java19:49
clarkbit serves something related to polygerrit judging by the path names19:49
clarkbs/path/variable/19:49
corvusclarkb: thx19:49
corvusclarkb: poly gerrit extension plugins?19:52
clarkbya the docs talk about #/x/<plugin-name>/settings19:53
corvusand /x/pluginname/*screenname*19:53
clarkbdo we need to start talking about renaming them?19:55
clarkbI did test a rename and if you move the project in gerrit's git dir everything seems to be fine except for project watches config19:56
clarkbyou can do an online reindex too19:56
clarkbor maybe this is something to pull luca in on19:56
corvusi think a surprise project rename might be disruptive19:56
clarkbagreed19:57
corvusgrepping logs, i'm not seeing any currently legit access for /x/*19:58
corvus(other than attempted clones)19:58
corvusthere are some requests for fonts: /x/fonts/roboto/Roboto-Bold.ttf19:58
corvusbut i'm not sure those are actually returning fonts (i think they may just return the app)19:58
clarkbthinking out loud here. I wonder if we can convince the gerrit http server to check for x/repo first then fallback to x/else19:59
corvusclarkb: i think long term if gerrit wants to own x/ we can't have it20:00
clarkbya agreed, I figure something like that would be so we can schedule a rename not today20:00
corvusbut short term, i'm wondering if, since it doesn't seem like our gerrit is using x/ right now, we can rebuild it without that exclusion then work on a rename plan20:00
fungii'm around to review a gerrit patch, though getting started grilling20:01
corvus(if we're right about x/ being used for plugins, then it'll become an issue as we add polygerrit plugins)20:01
fungii assume we'll want to start a thread on repo-discuss noting that polygerrit has made some repository names impossible. that seems like a bug they would be interested in fixing20:01
corvusfungi: i assume they'll fix it with a doc change saying 'don't use these'20:02
clarkbcorvus: ya maybe we can add a sed to the jobs to comment that out on the 3.2 branch which will rebuild image then pull that and use it?20:02
corvusjust like /p/ and /c/ are unavailable20:02
corvusclarkb: sounds good20:02
clarkbcorvus: do you want to write that change or should I?20:03
fungiif you can't use repositories whose names start with c/ or p/ or x/ but gerrit doesn't prevent you from creating them, that sounds like a bug20:03
corvusclarkb: you if you're available20:03
clarkbalso I think we should trim down the images so it's just 3.2 on that change20:03
clarkbok working on that now20:03
fungifor not properly separating api paths from git project paths20:03
corvusfungi: perhaps gerrit does prevent creation; we should check that20:03
corvusi imagine we should just no longer allow single-char in the initial path component of project names to be safe for the future20:04
clarkb++20:05
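The rule proposed above (no single-character first path component in project names) is simple to express as a check. This is a minimal Python sketch; `first_component_is_risky` is a hypothetical name, and nothing here reflects a check that Gerrit itself performs.

```python
def first_component_is_risky(project: str) -> bool:
    """Return True if the project's first path component is one character.

    Such names (x/..., c/..., p/...) can collide with URL prefixes that the
    web application may reserve for its own routes.
    """
    first = project.split("/", 1)[0]
    return len(first) == 1


# Project names taken from the log.
checks = {
    name: first_component_is_risky(name)
    for name in ("x/ranger", "x/ansible-role-bindep", "openstack/nova", "opendev/system-config")
}
```

A check like this could run wherever new projects are proposed, so that colliding names are caught before they are created rather than after an upgrade.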
fungior is there a more correct path prefix we should switch to using to access git repositories?20:05
clarkbI always have to spend 10 minutes figuring out how we build the gerrit wars in these jobs20:06
clarkbfungi: the download urls are rooted at /20:06
clarkbI checked that as I wondered too20:06
fungiand aren't configurable?20:06
fungibecause that seems like it would be a relatively minor fix... deprecate the / routing for project names and add a new prefix20:07
fungiand instruct users to migrate to the new prefix and then eventually remove the download routing at / in a later release20:08
*** Alex_Gaynor has joined #opendev-meeting20:10
*** Alex_Gaynor has left #opendev-meeting20:10
corvusclarkb: i came to the same conclusion20:14
corvusi mean, /p/ *used* to work :/20:14
clarkbremote:   https://review.opendev.org/c/opendev/system-config/+/763600 Handle x/ prefix projects on gerrit 3.220:15
clarkbI figure we can pull that image onto review-test and test out there first, then if that looks ok do it to prod20:15
clarkband I'll update my change so that we can land it20:16
corvusclarkb: ++20:16
corvusclarkb: what needs to be updated?20:16
clarkbcorvus: stuff around which jobs to run I think20:16
clarkbcorvus: I removed 2.13 - 3.1 since they aren't necessary to get that image20:17
ianwo/ ... well done everyone!20:17
corvusclarkb: can't we land that?20:17
fungisince luca reached out when he saw our upgrade was in progress and suggested we should let him know if we hit any snags, is this something we should give him a heads up about?20:17
clarkbif we add them back in I need to make the sed branch specific. If we don't add them back in then I need squash it into fungi's use regular stable branches change I think20:17
corvusfungi: yes20:18
clarkbcorvus: yes I think I need to update the system-config-run dependency maybe?20:18
corvusi think we should send an email saying we found this issue and our proposed solution and see if he thinks it's ok20:18
clarkbcorvus: I'm sorry these jobs always confuse me20:18
clarkbI'm basically just saying that we need to review the job updates carefully if we land this20:18
fungii've got the grill starting so i'm happy to throw a quick e-mail out there pointing to our workaround and asking for suggestions20:18
clarkbfungi: go for it20:18
clarkb5.8k on the replication20:22
clarkbgitea01 is down to 18GB free. Should have plenty for the remaining replication20:24
clarkbI'm going to find some food while I wait for zuul to build that image20:25
fungireply sent to luca, seems like my patio is experiencing unnecessary levels of packet loss so i'm less responsive than i might otherwise be at the moment20:31
clarkbmy ansible is bad20:32
clarkbfixing20:32
ianwthe new UI is so much faster, very pleasant for us high latency users20:32
clarkbnew ps has been pushed20:34
clarkbinfra-root. I added myself to project bootstrappers and admins on review-test. Then went to /plugins/ which returns a json doc of plugins20:39
clarkbthe index_url for each plugin we have is listed there and they all start with plugins/ not x/20:40
clarkb(just another data point towards the safety of this change)20:40
clarkbI think to get that document you have to be in the admins group20:41
clarkbcould probably get it via the rest api instead too20:41
corvusclarkb: yeah, if i'm following correctly x/ might be used by polygerrit plugins to serve certain resources20:41
clarkbcorvus: hrm are any of the plugins we have polygerrit plugins? I assume that some are like the codemirror-editor and download-commands?20:43
corvusclarkb: no idea20:45
corvusclarkb: ansible parse error again20:46
clarkbk, can someone look at it really quickly? I feel like my brain isn't working20:47
corvusclarkb: will do20:47
clarkbpoking at codemirror editor on review-test with ff dev tools it self hosts its static contents looks like20:48
clarkbI think I see it. shell needs to be a list20:49
corvusyes i'm on it20:50
clarkbk20:50
corvus-          "/x/*",20:50
corvus+          //"/x/*",20:50
corvusclarkb: that's the intended change, yeah?20:50
clarkbcorvus: yes20:50
corvusi'm validating it makes it all the way through ansible unscathed20:50
clarkbit comments out that line with the /x/* in it20:50
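The intended edit shown in the diff above (prefixing the `"/x/*",` line with `//`) can be sketched as a small text transformation. This Python version is an illustration only: the actual change used sed inside an Ansible task, the surrounding source lines in the sample are made up, and `comment_out_x_route` is a hypothetical name.

```python
import re


def comment_out_x_route(java_source: str) -> str:
    """Prefix the line holding the "/x/*" string literal with // and keep its indentation."""
    return re.sub(
        r'^([ \t]*)("/x/\*",)',
        r"\1//\2",
        java_source,
        flags=re.MULTILINE,
    )


# Made-up snippet; only the "/x/*" line is taken from the log.
before = (
    '          "/placeholder-a/*",\n'
    '          "/x/*",\n'
    '          "/placeholder-b/*",\n'
)
after = comment_out_x_route(before)
# Applying it twice changes nothing further, since the pattern no longer matches.
again = comment_out_x_route(after)
```

Keeping the transformation idempotent matters for a build job that may run repeatedly against the same checkout.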
corvusclarkb: pushed20:51
corvusi figured i'd double check the whole thing to save us any more round trips20:51
clarkbthanks20:51
clarkb++20:51
clarkb~600 replication tasks now20:56
fungionce this is built, pulled and restarted, do we need to restart the executors and mergers as well?20:58
clarkbits running the bazelisk build now20:59
clarkbfungi: you want to respond to luca?21:00
clarkband file the bug?21:00
clarkbreplication is done. I'm going to do another round of gc'ing on the giteas21:00
fungioh, cool, he already replied. yeah i can do that immediately after dinner21:01
clarkbfungi: specifically I think the bit that was missing in the email was that its cloning repos21:01
fungiyes21:02
clarkbgiteas are gc'ing now21:03
corvusbuild finished21:13
corvusdocker://insecure-ci-registry.opendev.org:5000/opendevorg/gerrit:f76ab6a8900f40718c6cd8a57596e3fc_3.221:13
clarkbcool I'll get that on review-test momentarily21:14
corvusi'm also running it locally for fun21:14
corvusor will, when it downloads, in a few minutes21:14
clarkbnote review-test's LE cert expired a few days ago and we decided to leave it be21:15
clarkbcloning x/ranger from review-test works now21:16
corvus\o/21:17
clarkbhttps://review-test.opendev.org/x/fonts/fonts/robotomono/RobotoMono-Regular.ttf is a 40421:18
corvusclarkb: but it's also not a real thing on prod21:19
clarkbya I guess not21:19
clarkbI just wanted to see what it does there21:19
corvusclarkb: want me to update your patch with the system-config-run change?21:20
clarkbcorvus: that would be swell21:20
clarkbthen I think it should be landable?21:20
corvusclarkb: actually... maybe we should make this 2 changes21:22
clarkbcorvus: I'm good with that too21:22
clarkbjust the sed then a cleanup?21:22
corvusyep21:22
clarkbwfm21:22
corvusi'll take care of that21:23
clarkbcorvus: remember you need to check the branch if you do that21:23
corvusclarkb: meanwhile, we have a built image -- want to go ahead and run it on prod?21:23
clarkbor have 3.2 use a different playbook21:23
corvusclarkb: how about we invert the order?21:23
clarkbcorvus: that also works21:23
corvusremove old stuff, then the x/ change21:23
clarkb++21:23
corvuswill be easy to revert21:23
clarkbfor prod any concern that this may break something else? or are we willing to find out the hard way :)21:23
corvusclarkb: i think we've done the testing we can21:24
clarkbok21:24
clarkbI'll do this in the screen fwiw21:24
corvusi'm not worried about it breaking anything in a way we can't roll back21:24
clarkbgerrit is starting back up again on prod21:26
clarkbhrm the change screen isn't loading for me though I thought I tested that on review-test too21:28
clarkboh there it goes21:28
clarkbI just need patience21:28
clarkbI can clone ranger from prod via https now too21:28
corvusremote:   https://review.opendev.org/c/opendev/system-config/+/763616 Remove container image builds for old gerrit versions [NEW]21:30
corvusremote:   https://review.opendev.org/c/opendev/system-config/+/763600 Handle x/ prefix projects on gerrit 3.221:30
corvusclarkb: i think we should do a full-reconfigure in zuul21:30
corvusi'll do that21:30
clarkboh I should go restart gerritbot now that I restarted gerrit21:30
clarkbcorvus: ++21:30
clarkbgerritbot has been restarted21:30
corvusi have more work to do on those image build changes; on it21:31
clarkbbtw zuul commented a -1 on https://review.opendev.org/c/openstack/os-brick/+/763599/ which was the first change that started running zuul jobs. That aspect of things looks good21:34
corvusclarkb, fungi, ianw: remote:   https://review.opendev.org/c/openstack/project-config/+/763617 Remove old gerrit image jobs from jeepyb [NEW]21:34
clarkb+221:35
corvuscat jobs are running21:36
clarkbcorvus: one small thing on https://review.opendev.org/c/opendev/system-config/+/76361621:37
clarkbI'm happy to fix the issue on ^ if you want to roll forward instead21:38
clarkber I mean fix it in a follow on21:38
corvusclarkb: i'll respin21:38
clarkbok21:38
corvusclarkb: respin done21:40
corvus2020-11-21 21:37:15,977 INFO zuul.Scheduler: Full reconfiguration complete (duration: 379.767 seconds)21:40
clarkband no more of those errors?21:40
fungiwas review.o.o restarted with the fix? i guess so, my tests to reproduce the error don't fail21:41
fungiwhat was the error message on attempting to clone?21:41
fungisorry, just now catching up since dinner's done21:41
corvusfungi: heh, lemme see if i have a terminal open with the error :)21:41
fungiback to nominal levels of packet loss again and can test things suitably21:41
fungithanks!21:41
fungiworking up the reply to luca now21:42
clarkbanother thing I notice is that gitweb doesn't work but gitiles seems to21:42
clarkbI think we should just stop using gitweb maybe and have it gitiles21:42
clarkbthat isn't super urgent though21:42
clarkbthen we can add in gitea when we sort that out21:42
corvusfungi: i don't, sorry :(21:42
corvus19:37 < corvus> fatal: https://review.opendev.org/x/ansible-role-bindep/info/refs not valid: is this a git repository?21:43
corvusfungi: but i pasted that ^21:43
corvusthat was about it21:43
corvusclarkb: confirmed, no new 'invalid literal' errors from zuul21:44
clarkb+2 from me on corvus' image stack21:45
corvus+2 from me on clarkb's image stack21:46
clarkbzuul still can't ssh into bridge (I think that is a good thing), once we've got these issues settled I figured we would use https://review.opendev.org/c/opendev/system-config/+/757161 this change as the canary for that?21:46
clarkbmy family has pointed out to me that I am yet to shower today though, so now might be time for me to take a break.21:46
clarkbis there anything else you'd like me to do before I pop out for a bit?21:46
funginope, go become less offensive to your family ;)21:48
corvusi think now's a good break time21:48
clarkbfungi: maybe you can include a diff for luca as well: http://paste.openstack.org/show/qz6zQ6a3jkRVluxebh8l/21:48
corvusfungi: can you +3 https://review.opendev.org/763617 ?21:48
fungicorvus: thanks! i'll try to work with that21:48
fungiyeah, will review21:48
fungiand approved21:49
clarkbgiteas are still gc'ing but free disk space is going up so we should be more than good there21:49
clarkband now break time21:49
clarkbI've also removed my normal user from privileged groups on review-test21:49
clarkbas I am done testing there for now21:50
fungii've re-replied to luca, will start putting the bug report together shortly21:55
fungiany other urgent upgrade-related tasks need my attention first?21:55
corvusfungi: i don't think so.  i'm about to +w the remaining image stack21:56
corvuserr, there's another error21:57
corvusclarkb, fungi: can you +3 https://review.opendev.org/763616 ?21:59
corvusmissed an update for the infra-prod jobs to trigger on 3.2 builds22:00
fungiyup, taking a look now22:01
corvuscurrent status: we need to merge https://review.opendev.org/763616 and https://review.opendev.org/763600 then the repos will match the image we're running in production.  then we can proceed with enabling cd.  aside from that, i think there's no known issues in prod and we're just waiting for replication to finish.22:02
fungii've approved 763616 now22:02
corvuscool, then i'm going to afk for another errand22:02
corvusinfra-root: just a highlight ping for what i think is the current status (a couple lines up ^) as i think we're all on break while waiting for tasks to complete22:03
fungiawesome, thanks again!22:04
fungihttps://bugs.chromium.org/p/gerrit/issues/detail?id=1372122:44
fungiif anyone feels inclined, please clarify mistakes or omissions therein22:45
clarkbI'll take a look in a few.22:46
clarkbgitea01 has finished gc'ing and has 22gb free which should be plenty for now22:47
clarkbthe others all have more free disk too22:48
clarkband are done as well22:48
clarkbI think that means all the replication related activities are done22:48
clarkbfungi: the bug looks good to me22:49
clarkbI'm going to start drafting a "its up, this is what we've discovered, this is where we go from here" type email in etherpad22:50
fungithanks! don't forget to incorporate notes from https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes as appropriate22:52
clarkbya was going to link to that I think22:52
clarkbhttps://etherpad.opendev.org/p/rNXB-vJe8IUeFnOKFVs8 is what I'm drafting22:55
ianwfungi: no idea if it helps but i think x/ was introduced @ https://gerrit.googlesource.com/gerrit/+/153d46c367965cd7782a3ac86212c07b298eaca822:57
ianwactually no, more to dig22:58
clarkbthe file was moved at some point which makes it difficult to go back in time with22:59
clarkbI ended up doing a git log -p and grepping for it and giving up22:59
ianwhttps://gerrit.googlesource.com/gerrit/+/7cadbc0c0c64b47204cf0de293b7c6881477465223:00
ianw+    serve("/x/*").with(PolyGerritUiIndexServlet.class);23:00
ianwthat is really the first instance.  i wonder if it's not really necessary and just been pulled along since23:00
clarkbianw: the docs hint at it but could still be dead code23:01
ianw.. at least it's in a "add x/ this is a really important path never remove" type change i guess :)23:02
clarkbnot in or in?23:04
clarkbhttps://etherpad.opendev.org/p/rNXB-vJe8IUeFnOKFVs8 ok I think thats largely put together at this point23:04
ianwclarkb: minor suggestion on maybe something that explains the x/ thing at a high level but enough for people to understand23:13
clarkbianw: something like that?23:14
ianwyeah, i think so; feel like it explains how both want to "own" the /x endpoint23:15
ianwnamespace, whatever :)23:15
clarkboh shoot, I think there is a minor but not super important issue with https://review.opendev.org/763600 it doesn't update the dockerfile so we won't promote the image23:19
clarkbcorvus: ^ maybe thats something we can figure out manually or just push up another change that does a noop dockerfile edit?23:19
clarkbdouble check me on all that first though23:20
clarkbalso I'm starting to feel the exhaustion roll in. If others want to drive things and get cd rolling again I'll do my best to help, otherwise, tomorrow morning might be good23:22
clarkbya I think the promote jobs for the 3.2 docker image tagging didn't run23:23
clarkbI'll push up a noop job now to get that rolling23:23
clarkbremote:   https://review.opendev.org/c/opendev/system-config/+/763618 Noop change to promote docker image build23:26
ianwi've got to run out, but i can get to the CD stuff early my tomorrow?  i don't think we need it before then?23:26
clarkbya I don't think its super urgent unless others really want their sunday back. I'm just wiped out23:27
clarkbfungi: corvus ^ fyi. Also any thoughts on that email? should I send that nowish?23:27
clarkbinfra-root Note that https://review.opendev.org/c/opendev/system-config/+/763618 or something like it should land before we start doing cd again23:28
ianwok, that's the new image with the x/ fix right?23:29
clarkbyes23:29
ianwi.e. we don't want to CD deploy the old image23:29
clarkbwe actually just built it when corvus' changes landed but because we didn't modify files that the promote jobs match we didn't promote it23:29
clarkbwe could also do an out of band promote via docker directly if we want23:29
clarkb763618 should also take care of it since the dockerfile is modified23:30
ianwok, i have to head out but will check back later23:30
clarkbianw: o/23:30
fungiclarkb: sorry, stepped away for a bit, reading draft e-mail now23:37
fungimade a couple of minor edits but lgtm in general23:41
clarkbcool I'll wait a bit to see if corvus is able to take a look then send that out23:42
clarkbfungi: and maybe a corresponding #status notice23:42
clarkbI'm taking a break now though. The tired hit me hard in the last little bit23:42
fungiyup, a status notice at the same time that e-mail gets sent would make sense23:43
clarkbfungi: did you see 763618 too?23:46
fungilikely not if you're asking23:47
fungiapprovidado23:48
corvusreading scrollback23:48
corvusclarkb: email lgtm23:50
clarkbcool I'll send that out momentarily23:50
clarkbhow about this for the notice #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during out upgrad window which ends at 01:00 November 23.23:55
corvusclarkb: s/out upgrad/our upgrade/23:55
clarkbI can also add "See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details"23:55
clarkbhow about this for the notice #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during our upgrade window which ends at 01:00 November 23. See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for23:55
clarkbmore details23:55
clarkbis that just short enough if I drop my prefix?23:56
fungimaybe squeeze it down a bit so it fits in a single notice23:57
fungii think statusbot will truncate it otherwise23:57
clarkblike #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details23:57
fungior, rather, statusbot doesn't know to so the irc server ends up discarding the rest23:57
fungilooks good. hopefully that's short enough23:58
clarkbI can trim it a bit more but I'll just go ahead and send it with that trimming23:58
clarkb#status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. We are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details23:59
openstackstatusclarkb: sending notice23:59
-openstackstatus- NOTICE: Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. We are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details23:59
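
Editorial sketch: the concern raised above, that an over-long notice loses its tail, follows from the IRC protocol limit of 512 bytes per message, counting the trailing CR-LF. The Python below is only a rough pre-flight check under stated assumptions. The function name, the channel name "#example-channel", and the 100-byte prefix allowance are all hypothetical; this is not how statusbot works, and the real headroom depends on the nick, user, and host the server prepends when relaying the message.

```python
# Rough estimate of whether a notice fits in one IRC protocol line.
# The 512-byte limit (including CR-LF) comes from the IRC RFCs; everything
# else here is an illustrative assumption.
IRC_LINE_LIMIT = 512


def fits_in_one_irc_line(text, target="#example-channel", prefix_allowance=100):
    """Return True if 'NOTICE <target> :<text>' plus CR-LF likely fits.

    prefix_allowance is a hypothetical reserve for the ':nick!user@host '
    prefix that the server adds when it relays the message to other clients.
    """
    line = f"NOTICE {target} :{text}\r\n".encode("utf-8")
    return len(line) + prefix_allowance <= IRC_LINE_LIMIT


if __name__ == "__main__":
    draft = "Gerrit is up and running again on version 3.2."
    print(fits_in_one_irc_line(draft))
```

Measuring the encoded bytes rather than the character count matters because the limit applies to what goes over the wire, so non-ASCII text uses up the budget faster.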
