Friday, 2020-04-10

-openstackstatus- NOTICE: review.opendev.org is being restarted for scheduled maintenance; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html16:04
fungiokay, we can start prepping for the etherpad maintenance in here i suppose16:53
corvusstatus notice etherpad.openstack will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html16:54
corvushow's that look?16:54
corvusalso, do we want to startmeeting?16:55
corvusmaybe startmeeting opendev-maintenance ?16:55
corvusinfra-root: i summon you :)16:56
fungi*poof*16:56
* fungi appears in a puff of smoke16:56
clarkbcorvus: ++ on the meeting we can try that out for records16:56
clarkband the status message lgtm16:56
fungithat lgtm16:57
fungiusing meetbot for this one would work, but not for anything where #status alert is used, as they will fight for control of the channel topic16:57
corvusor should we call it 'opendev-maint' because typing is hard?16:58
fungii'm fine with the abbrev, sure16:59
* mordred waves16:59
mordredcorvus: yes16:59
corvus"opendev-maint"  going once16:59
corvus...going twice...16:59
corvus...sold17:00
corvus#startmeeting opendev-maint17:00
openstackMeeting started Fri Apr 10 17:00:05 2020 UTC and is due to finish in 60 minutes.  The chair is corvus. Information about MeetBot at http://wiki.debian.org/MeetBot.17:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.17:00
*** openstack changes topic to " (Meeting topic: opendev-maint)"17:00
openstackThe meeting name has been set to 'opendev_maint'17:00
corvusha, apparently it's opendev_maint :)17:00
corvus#status notice etherpad.openstack.org will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html17:00
openstackstatuscorvus: sending notice17:00
* mordred is in a screen on etherpad01.opendev.org17:01
-openstackstatus- NOTICE: etherpad.openstack.org will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html17:01
corvusjoined17:01
fungijoined as well17:01
mordredk. I'm ready to rock and roll there - somebody else want to stop existing etherpad?17:01
mordred(17:02
* clarkb is joining17:02
corvusi'll stop existing etherpad17:02
mordredI'm going to warn everybody - it's like watching paint dry in the screen once this is running17:02
fungioh, for the db dump/source pipeline?17:02
clarkbI've joined17:02
mordredyup17:02
corvusold etherpad is stopped17:03
mordredok. I'm going to run the command17:03
mordredit is running17:03
corvusneat, old etherpad is running a puppetlabs mcollectived server17:04
corvuswhatever that is17:04
openstackstatuscorvus: finished sending notice17:04
mordredWOW17:04
corvusmordred: is etherpad running on the new server?17:04
mordredcorvus: it should not be17:04
mordredI only started the mariadb service17:04
corvuscool, i confirm that's the case :)17:04
clarkbmcollective was puppets message bus for doing orchestration like tasks17:05
corvusshould we start the dns change now?17:05
corvusi believe we should change etherpad.openstack.org cname to point to etherpad.opendev.org ?17:05
mordredyeah - I think that's a good idea17:05
corvusi'll get started on that while clarkb and fungi confirm :)17:06
clarkb++17:06
fungiyes definitely17:07
fungito give the change time to propagate17:07
fungipresumably the plan is to delete the existing a/aaaa rrs for etherpad.openstack.org and replace it with a cname to etherpad.opendev.org17:08
corvusetherpad.openstack.org is currently a cname for etherpad0117:09
corvusetherpad.openstack.org is currently a cname for etherpad01.openstack.org17:09
corvusi was going to change it to be a cname for etherpad.opendev.org17:09
corvusso the result will be etherpad.openstack.org -> etherpad.opendev.org -> etherpad01.opendev.org17:09
clarkbcorvus: ++17:09
mordredcorvus: I think that's correct17:09
fungiahh, right, so just update the cname, even easier17:10
corvusthere's just one problem; i don't see etherpad.openstack.org in the list of records in the rax web ui17:10
corvusit was there when i changed the ttl a few days ago17:10
fungiscroll all the way to the end and then keyword search?17:10
corvusis there some kind of limit?17:10
mordredthe rax records are paged and sorted by type17:10
corvusfungi: that is my usual procedure which i have done17:10
fungiit only pages in some at a time and you have to scroll17:10
mordredweird17:10
fungiahh, i can try17:10
clarkbthe length of the db backup is making me think about this. What's the disk situation like on the new server? it has a 50GB volume and is currently using ~3GB of that for the prod db?17:11
mordredalso - https://review.opendev.org/#/c/718764 can be landed now17:11
corvuswait i found it17:11
fungistanding down!17:11
clarkbalso ^F doesn't work properly17:11
corvusctrl-f was not bringing it up17:11
mordredcorvus: once it's loaded it's about 30G of data17:11
mordredgah17:11
mordredclarkb: ^^17:11
clarkbmordred: is 50GB big enough?17:11
corvusbut scrolling to it, it shows up (and it's highlighted)17:11
mordredthat's what the volume was on the old one17:11
clarkbmordred: ah ok17:11
clarkband we can always attach another volume and grow the lv17:12
clarkbnow that I've said ^ and checked lvs I'm far less worried :)17:12
mordred++17:12
* fungi checks paint, still sticky17:12
mordredthat said - I was totally a shemp when I attached that volume so the lv has a stupid name17:12
corvus#info updated etherpad.openstack.org. CNAME from etherpad01.openstack.org. to etherpad01.opendev.org.17:13
corvusi left the ttl at 30017:13
mordredcool17:13
corvusdo we have an ssl cert for etherpad.openstack.org on etherpad01.opendev.org?17:14
fungiyeah, i already tested that bit17:14
corvuscool, i thought so, just running through things again :)17:14
mordredif you want to watch the db size grow:17:14
mordredls -ltrah /var/etherpad/db/etherpad@002dlite/store.ibd17:14
mordredon etherpad01.opendev.org17:14
clarkbya and the LE verification failed the first time around because dns wasn't set up properly to verify that the first time17:15
fungiX509v3 Subject Alternative Name: DNS:etherpad.opendev.org, DNS:etherpad.openstack.org, DNS:etherpad01.opendev.org17:15
fungiaccording to openssl17:15
mordredwoot17:15
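The SAN check fungi describes can also be done programmatically. The following is a minimal sketch, not what was run in the channel: it assumes a certificate dict in the shape returned by Python's `ssl.SSLSocket.getpeercert()`, and the helper name `missing_sans` and the `CERT` literal are illustrative only (the hostnames are the three quoted from openssl above). It does exact-name matching and ignores wildcard entries.

```python
def missing_sans(cert, required):
    """Return the required hostnames that are absent from the cert's DNS SANs.

    `cert` is a dict shaped like the result of ssl.SSLSocket.getpeercert(),
    where "subjectAltName" is a tuple of (kind, value) pairs.
    """
    present = {
        value
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return [name for name in required if name not in present]


# Hypothetical stand-in for a fetched certificate, using the SANs quoted above.
CERT = {
    "subjectAltName": (
        ("DNS", "etherpad.opendev.org"),
        ("DNS", "etherpad.openstack.org"),
        ("DNS", "etherpad01.opendev.org"),
    )
}

# An empty list means every name we care about is covered.
print(missing_sans(CERT, ["etherpad.openstack.org", "etherpad.opendev.org"]))
```

An empty result for the old hostname is what makes the CNAME change safe from a TLS point of view: clients following etherpad.openstack.org land on a server whose certificate names that host.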
corvusetherpad.openstack.org. 300 IN CNAME etherpad.opendev.org.17:16
corvusetherpad.opendev.org. 299 IN CNAME etherpad01.opendev.org.17:16
corvusetherpad01.opendev.org. 218 IN A 104.130.124.12017:16
corvusthat's what i get from dig now17:16
clarkbcorvus: looks perfect17:17
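To make the chain corvus pasted easier to follow, here is a small sketch that walks CNAME records until it reaches an address. It is an illustration only: the function names `parse_records` and `resolve_chain` are made up for this example, and the `ANSWER` text is the dig answer from the channel written with whitespace between fields. It parses pasted text and does not query DNS itself.

```python
def parse_records(text):
    """Parse dig-style answer lines into {name: (record_type, value)}."""
    records = {}
    for line in text.strip().splitlines():
        name, _ttl, _rclass, rtype, value = line.split()
        records[name] = (rtype, value)
    return records


def resolve_chain(records, name, max_hops=8):
    """Follow CNAMEs from `name` until a non-CNAME record is reached."""
    chain = [name]
    for _ in range(max_hops):
        rtype, value = records[name]
        chain.append(value)
        if rtype != "CNAME":
            return chain
        name = value
    raise RuntimeError("CNAME chain too long or looping")


ANSWER = """
etherpad.openstack.org. 300 IN CNAME etherpad.opendev.org.
etherpad.opendev.org. 299 IN CNAME etherpad01.opendev.org.
etherpad01.opendev.org. 218 IN A 104.130.124.120
"""

print(" -> ".join(resolve_chain(parse_records(ANSWER), "etherpad.openstack.org.")))
```

The extra hop through etherpad.opendev.org means a future server replacement only needs one record changed on the opendev side, and the openstack.org name follows along.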
corvusand cool, the http redirect is working17:17
corvus(because apache is up; it's just the eplite service that's down)17:17
mordredwhile we're waiting - it occurred to me recently - is having apache on the host rather than in a docker container and in the compose file the right choice? would it make more sense to run it as an apache container as well?17:19
clarkbmordred: ya I was thinking about that back when I thought refstack might grow some momentum again. I think if we want to go away from using host networking having a host run webproxy is nice though it could be the one host network container too17:20
fungiright, i tested the redirect yesterday as well, albeit with the etherpad service down and apache serving an error for it17:20
fungiso looks like what i got from my local /etc/hosts edit17:21
mordredclarkb: yeah - I was thinking about it from a "what would be different about these container services if we decided to roll out k8s"17:21
clarkbmordred: if we rolled out k8s we'd probably use the nginx ingress controller for a good chunk of that ?17:21
corvusi'm ambivalent about whether we run apache in a container or not; if we did, we could still use host networking17:22
clarkbthough services like etherpad need rewriting which I don't know that it can do17:22
corvusclarkb: we would use *some kind* of ingress controller, not necessarily the nginx one, depending on what our load balancer situation was like17:22
clarkbfair17:22
corvusand many of them can rewrite17:22
mordredclarkb: yeah - I think we can still run apache behind the ingress controller in those cases - so that we don't have to rewrite all of our rewrites17:22
mordredbut also - cloud load balancers are a thing too17:22
mordredwhen we did the gitea setup, we used a cloud load balancer that attached to exposed service of each pod running17:23
clarkband that cloud load balancer was running haproxy not nginx :)17:23
mordredthat said - in our current clouds we can do the same thing only with nginx ingress if we use VRRP to manage which thing owns the VIP17:23
mordredif we don't want to rely on a cloud load balancer17:24
mordredI know that it's possible to create VRRP-enabled ports in neutron in vexxhost17:24
clarkbmordred: ya the basic requirement is being able to control a shared l2 network between the instances with the 3 IPs on that network17:25
clarkbthough maybe you don't even need the third ip on that network if you can vrrp separately? its been a while since I had to do vrrp17:26
corvushere's an ingress controller config for gke with a path mapping (to /, but the syntax is there to imagine other roots); so it's doing layer 7 load balancing -- https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml#31517:26
fungiclarkb: yeah, technically you can have vrrp/hsrp/carp use only two addresses (though a third makes it somewhat easier)17:27
mordredcorvus: so that ingress setup seems like it's mapping a single external ip to the resources?17:29
clarkbmordred: I think its a name not an ip17:30
clarkb(so they could do magic with dns potentially)17:30
mordredkubernetes.io/ingress.global-static-ip-name: "zuul-static-ip"17:31
mordredis what I was keying off of17:31
corvusmordred: yes, it's a single pre-allocated static ip17:31
corvus(i previously ran "gcloud get me a static ip named zuul-static-ip")17:32
clarkbah17:32
clarkbso its referencing cloud resources outside of k8s17:32
mordrednod. so pattern-wise (ignoring mechanics for a sec) - that would potentially map to the sorts of things we'd want to do17:32
corvusyep17:32
mordredso figuring out the equiv pattern for us inside of a k8s in openstack would be a key piece if we wanted to explore using k8s for services instead of compose17:33
clarkbwe are at 13GB used17:37
clarkband now 15GB this paint is sticky17:41
mordredyeah17:44
fungi"wet data, do not touch"17:44
mordredseems to be running slower today17:44
fungiit is a holiday17:48
corvuswe're expecting it to be how big?17:48
fungi~30gb clarkb said?17:48
corvus30g right?17:48
clarkbya thats what mordred said above17:49
fungioh, got it17:49
corvusso we're 36 minutes away from completion17:49
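The channel does not say how the remaining time was worked out. One simple way to get a figure of this kind is linear extrapolation from the import's average rate so far, sketched below. The inputs are taken from the log (import started around 17:03, about 15GB at 17:41, an assumed final size of 30GB); the function name `estimate_completion` is hypothetical, and the 30GB target is only the working assumption at this point in the conversation.

```python
from datetime import datetime, timedelta


def estimate_completion(start, sample_time, sample_gb, target_gb):
    """Linearly extrapolate a finish time from one progress sample.

    Assumes the average rate observed since `start` holds until the end.
    """
    elapsed = (sample_time - start).total_seconds()
    rate = sample_gb / elapsed  # GB per second so far
    remaining_seconds = (target_gb - sample_gb) / rate
    return sample_time + timedelta(seconds=remaining_seconds)


start = datetime(2020, 4, 10, 17, 3)    # "it is running"
sample = datetime(2020, 4, 10, 17, 41)  # "and now 15GB"
eta = estimate_completion(start, sample, sample_gb=15, target_gb=30)
print(eta.strftime("%H:%M UTC"))
```

With these inputs the extrapolation lands at roughly 18:19 UTC, somewhat earlier than the 18:30 announced in the channel. An estimate like this is only as good as the assumed target size.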
corvusstatus notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC17:50
corvusshould we send that?17:50
fungiyeah, warranted17:51
clarkb++17:51
corvus#status notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC17:51
openstackstatuscorvus: sending notice17:51
corvusi'm going to afk for about 30m17:51
-openstackstatus- NOTICE: The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC17:51
fungionce maintenance is concluded, it may be time to prepare for my annual viewing of "the life of brian"17:52
clarkbI'll be making a tunafish sandwich for lunch when this is done17:53
mordredfungi, clarkb : while you're waiting: https://review.opendev.org/#/c/718764/17:53
mordredand actually - I think we can not land that yet17:54
openstackstatuscorvus: finished sending notice17:54
mordredand land it once we take etherpad out of the emergency file to ... no, that's too laggy. nevermind me17:55
clarkbhttps://review.opendev.org/#/c/719051/ another good one to review though it had a post failure17:55
mordredI think we can land it whenever17:55
mordredclarkb: and this one remote:   https://review.opendev.org/719053 Set env vars pointing to correct file locations17:57
mordredand remote:   https://review.opendev.org/719052 Fix issues from rolling out containers18:01
mordredinfra-root db migration done18:02
mordredI might have been wrong about db size18:02
fungior there were a lot of zeroes at the end18:02
clarkbor newer mysql is more compact18:02
mordredI think actually 32G of free space on device is what I was looking at :)18:02
fungiso ready to start up the container?18:02
mordredyeah - I think so18:03
mordredany last concerns?18:03
funginone for me18:04
clarkbnone from me18:04
mordredk. here we go18:04
mordredk. I reloaded an openstack etherpad, it redirected to opendev and all is good18:04
fungii reconnected to a pad i already had open and got sent to the right (new) place18:05
mordredwe might want to keep our eyes on this as it gets usage - might need to tune the my.cnf settings18:05
fungididn't even reload, just clicked the reconnect button from when it got disconnected during the shutdown18:05
fungiwe did at least incorporate the apache tuning we had on the old deployment, right?18:05
mordredyeah18:06
mordredinnodb_buffer_pool_size= 256M is the one I think might be applicable18:06
fungitested out a few more pads, not seeing any problem yet18:06
clarkbmordred: thinking it may need to be bigger?18:06
mordredbut honestly, 256M of hot data isn't bad18:06
clarkband ya I think individual etherpads tend to be pretty small. Its the history data that grows (I wonder if we can tune it to prefer the newer pad data)18:07
mordredit'll do that naturally - the buffer pool will only contain the most recently touched pages18:07
mordredso I think it should be fine18:08
mordredin other news, my new dowel-style rolling pin has arrived18:09
fungihave fun! i still just use a boring old marble cylinder roller18:11
fungibut i like the extra weight18:11
mordredare you saying I'm fat?18:12
fungiheh18:13
clarkbthat post failure was due to an rsync failure fwiw18:13
clarkbmordred's approval seems to have rechecked it18:13
clarkbdo we need to send an all clear now? and maybe end the meeting?18:14
clarkbnot sure what other work there is to do other than following up on gerrit jeepyb things18:14
mordredI think we should end the meeting - don't know if we need an all clear18:18
mordredI think this one is good18:18
mordredwe might need to restart etherpad to pick up the settings.json update - but that should be a thing that can just be done - in the margin of error of an internet facing service connectivity18:19
mordredoh - we need to take etherpad01.opendev.org out of emergency - shall I do that?18:19
clarkb++18:19
clarkband then sometime next week clean up the old server and db? probably after we have backups running for the new server?18:19
mordredno - we need to land ...18:20
mordredhttps://review.opendev.org/#/c/719036/18:20
mordredand then ... one sec18:20
fungimordred: are we missing an equivalent of https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/templates/gerrit_patchset-created.erb ?18:20
clarkbmordred: comment on https://review.opendev.org/#/c/719036/118:21
funginevermind, found it at https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/patchset-created.j218:22
mordredclarkb: updated - and pushed up 2 additional18:23
mordredfungi: oh - I missed that one in the patch didn't I?18:24
fungimordred: yeah, i commented18:24
fungisince it's a template it's not in the same directory18:24
mordred++18:24
fungithough maybe make it not a template?18:24
corvuso/18:24
fungiit's only templated so we can toggle the welcome message feature on the existence or absence of a welcome_message_gerrit_ssh_private_key value18:25
clarkbmordred: and do we expect that to noop for review01.openstack.org? I guess since its already configured?18:25
mordredfungi: yeah - which doesn't exist on review-dev I think18:25
fungiwhich i expect was more transitional or for the benefit of people who might reuse our hook scripts18:25
mordredclarkb: I think the backup group is intended to be a normal group for servers we backup?18:25
fungianyway, yeah, drop the conditional, move to files, add envvar exports18:26
clarkbmordred: aha got it18:26
mordredthe backup-server is the only one we only run some times18:26
clarkbalso I accidentally added a +W on that group change. I've removed that18:26
mordred(see the two followup patches)18:26
mordredfungi: no - I think review-dev doesn't have that key18:26
mordredfungi: we'd need to add one for it - and a welcome message user18:27
mordredthat said ...18:27
mordredfungi: I updated it - I think you'll like it now18:30
mordredcorvus: does the stack at https://review.opendev.org/#/c/719077/ look right to you?18:32
corvusmordred: yeah -- though what was the conclusion about puppet managing backups on review?18:33
corvus(have we confirmed that's gone?)18:33
mordredthose would be cron jobs right?18:33
clarkbmordred: yes cron jobs18:33
clarkband since puppet isn't running its not managing it18:33
clarkbwould mostly just be ensuring ansible applies the same or similar cron jobs and bup config18:34
mordredyeah. let me remove the bup cronjob18:34
mordredthere's also 2 other cronjobs we have for root we need to add to ansible18:34
mordredbut I'll leave them for now18:34
mordreduntil we have the patch to replace them18:34
mordredk. bup cronjob on review01.opendev.org has been removed - we should expect ansible to add one now18:35
mordredlemme make a patch to add the others18:35
clarkbservice-backup should apply it18:35
clarkbwhen you add the server to the backup group18:36
clarkb(I don't know what triggers that playbook though)18:36
mordredclarkb: well - we have a patch to trigger all playbooks on inventory changes18:40
mordredthat hasn't landed18:40
mordredhttps://review.opendev.org/719088 <-- gerrit cron jobs18:40
mordredclarkb: I take it back - inventory changes trigger everything now: https://review.opendev.org/7190818:41
mordredclarkb: so adding and removing the things to groups should cause the backup playbook to run18:41
clarkbk18:41
clarkbmordred: that link is missing a digit18:41
mordredclarkb: https://review.opendev.org/#/c/717114/ is what I meant18:42
clarkbspecifically line 1716 of that change covers this case18:43
mordredyeah18:43
mordredhah18:44
corvuslooks like it's time to end the meeting18:47
corvus#endmeeting18:47
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"18:47
openstackMeeting ended Fri Apr 10 18:47:50 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)18:47
openstackMinutes:        http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-04-10-17.00.html18:47
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-04-10-17.00.txt18:47
openstackLog:            http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-04-10-17.00.log.html18:47
*** diablo_rojo has joined #opendev-meeting18:55
-openstackstatus- NOTICE: Due to a database migration error, etherpad.opendev.org is offline until further notice.20:07
*** diablo_rojo_phon has joined #opendev-meeting20:53
*** diablo_rojo has quit IRC21:54
-openstackstatus- NOTICE: Maintenance on etherpad.opendev.org is complete and the service is available again22:23
