Wednesday, 2021-09-15

*** odyssey4me is now known as Guest729004:14
snadgemy openstack-ansible ussuri pre-production platform crashed the other day, due to disk space, and now galera won't restart.. should i just re-run the playbook or try and figure out why it won't start?11:31
snadgenow it just starts and i haven't even done anything.. sigh, it's just been one of those kinda days11:37
noonedeadpunkcool that it's solved anyway:)11:41
noonedeadpunkbut for the future - I wouldn't trust galera recovery to the playbook anyway11:41
noonedeadpunkit's smth that should be done manually11:42
*** arxcruz is now known as arxcruz|pto11:55
snadgei think the first start cleared a temporary table and bailed, and all i needed to do was start it again.. but i had something urgent come up and the day was ruined, then after awful day i thought maybe I should take a look at this.. reluctantly.. and easy success12:13
*** odyssey4me is now known as Guest733512:17
*** odyssey4me is now known as Guest733812:31
opendevreviewMerged openstack/openstack-ansible stable/wallaby: Fix ceph-ansible shallow_since date  https://review.opendev.org/c/openstack/openstack-ansible/+/80899913:35
spatelnoonedeadpunk how do you take backup of mysql database for openstack?13:58
noonedeadpunkwe use mariabackup script shipped with galera role :)13:58
spatelmysqldump --opt --all-databases > openstack.sql ?13:58
spateloh! where is that script?13:58
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L218-L23813:59
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/templates/mariabackup_script.py.j213:59
spatelall i need to do is add galera_mariadb_backups_enabled: true in user_variables.yml, right? 14:00
noonedeadpunkkind of. just ensure you have that in your role because it landed not that long ago14:02
*** odyssey4me is now known as Guest735214:07
spateloh wait what do you mean in your role?14:11
spatelnoonedeadpunk ^14:12
noonedeadpunknah, sorry, I meant more osa release14:13
noonedeadpunkbecause that landed in V14:13
spateli have V :)14:18
spateli am upgrading my prod from V-W right now so want to make sure backup is latest and greatest 14:19
noonedeadpunkoh, cool14:19
noonedeadpunklet us know how that passed14:19
noonedeadpunkI'm a bit nervous about PKI stuff and rabbitmq14:20
spatelwhy?14:23
spatelif you have any doubts, please share before i press the button 14:23
noonedeadpunkwell in CI it was all working:) 14:26
noonedeadpunkand technically it should be fine14:26
spateli did test in lab and didn't see any issue 14:27
spateli have multi-node lab like production 14:27
noonedeadpunkas you might know - we're generating certificate authority (you might want to define CSR details for it!)14:27
noonedeadpunkand replace SSL used for galera and distribute CA across all hosts and containers14:27
spateloh wait.. why it didn't ask me to do in LAB?14:28
noonedeadpunkbecause we have defaults set14:28
spateli am ok with default.. as far as it works 14:28
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/ssl.yml#L33-L6614:28
spatelcan i disable SSL if i don't like ?14:29
noonedeadpunknope14:29
spatelhmm so SSL is mandatory 14:29
noonedeadpunkwell, roles allow that, but you can't iirc because some oslo.messaging dependency now requires rabbit cert to be trusted and encryption enabled14:30
noonedeadpunkoh, well, thinking about it, I think you can disable encryption for rabbit after all14:30
noonedeadpunkas that's how we workarounded the issue with self-signed cert during development of stuff14:31
spatelhow to disable?14:32
spateli just want to know all tools before i start upgrade :)14:33
noonedeadpunkyou would need setting `openstack_pki_authorities: []`  and `rabbitmq_use_ssl: false`14:33
spatelin user_variables right?14:33
noonedeadpunkyep14:34
spatelthat is for rabbitmq what about mysql?14:34
noonedeadpunkmysql not implemented yet14:35
spatelgreat! so that is easy 14:35
noonedeadpunkwell, after W you would need that anyway just in case, if you want to have live migrations14:35
spatelwe don't do live migration in our cloud14:36
spatelwe have all local storage 14:36
noonedeadpunkah, I see14:36
noonedeadpunkbut actually local storage could live migrate as well iirc14:36
noonedeadpunkit just needs ssl :)14:36
spatelblock :)14:36
spatelwhat is the connection with live migration and SSL ? 14:36
spatelnova use SSH for block migration right?14:37
noonedeadpunkThis behaviour will be removed after W14:37
spatelhmm 14:37
noonedeadpunkAnd with SSH migration you can't migrate local block storage14:37
noonedeadpunkhttps://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_tunnelled14:37
spateloh 14:38
spatellibvirtd 14:39
spateli have added galera_mariadb_backups_enabled: true now what playbook i should be running without restarting mysql :)14:39
noonedeadpunkgalera-install with some tag...14:39
spatelits production so better ask stupid question 14:39
noonedeadpunk--tags galera_server-backups14:40
spatelsweet let me run14:40
noonedeadpunkthinking about that - it's worth moving this out of galera-server part...14:41
spatelhmm nothing happened 14:42
spatellet me check if that tags is correct14:43
noonedeadpunkdoh14:43
noonedeadpunkwe should have backported https://opendev.org/openstack/openstack-ansible-galera_server/commit/677dddf21a6d976c88e87fd0230ec1452a18217f14:43
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server stable/wallaby: Improve support for tags  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/80914714:44
spatelnoonedeadpunk so it's not available?14:45
noonedeadpunkyeah, they're not working :(14:46
spatelanyway.. let me use mysqldump 14:46
spatelthat is ok14:46
spateldo you have command to take full backup with all routine + user/password?14:46
noonedeadpunkI think you can create some playbook and use tasks_from: galera_server_backups.yml14:46
spatelsure14:47
spatelis this enough to take backup. mysqldump --opt --all-databases > openstack.sql14:47
spatelor any special option you would recommend14:47
noonedeadpunkthat would block your tables during backup just in case14:48
noonedeadpunkbut it will work14:48
noonedeadpunkalso you would need to backup grants separately14:48
spatelhmm let me see how to do that14:51
spatelmysqldump -u root -p --routines --triggers --opt --quote-names --all-databases > openstack.sql 14:51
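On the locking concern raised just above: a minimal sketch of the same dump with `--single-transaction` added, which makes mysqldump read InnoDB tables from a consistent snapshot instead of taking table locks. That option is not mentioned in the log, and `backup_all_databases` is a hypothetical helper name used only for illustration. Non-InnoDB tables (some system tables, for example) are not covered by the snapshot, so treat this as a starting point to verify in a lab.

```shell
#!/bin/sh
# Sketch only: backup_all_databases is a hypothetical helper name.
# Usage: backup_all_databases /path/to/openstack.sql

backup_all_databases() {
    # --single-transaction: dump InnoDB tables from one consistent snapshot
    #   rather than locking them (assumption: not part of the original command).
    # --routines / --triggers: include stored routines and triggers.
    # -p prompts for the root password on the terminal.
    mysqldump -u root -p --single-transaction --routines --triggers \
        --quote-names --all-databases > "$1"
}
```

A restore test on a lab host, as planned in the log, is still the only real check that the dump is usable.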
spatelroutines should do grant also right?14:52
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Change pki_create_ca condition  https://review.opendev.org/c/openstack/openstack-ansible/+/80920514:52
noonedeadpunkum, no idea honestly. I usually used another request for that14:53
spateldo you have command, if you don't mind to share :)14:54
spatellet me do some restore test on LAB to verify back and restore works 14:55
noonedeadpunkNot 100% sure as I guess I lost my copy of it, but it should be smth like mysql --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql --skip-column-names -A | sed 's/$/;/g' > MySQLUserGrants.sql14:56
noonedeadpunkmaybe `mysqldump -u root -p mysql user` also works  - dunno14:57
noonedeadpunklast time I used mysqldump years ago...14:58
spatelThank you! let me give it a shot in LAB 14:58
spatelworth putting small doc in OSA (just incase someone like me struggling) 14:59
noonedeadpunkwhy? when we have mariabackup inside the role?14:59
spatelin that case also we need something to let people know right?15:00
spatelOSA doesn't have any FAQ section, it would be great to have that to just reference links in FAQ for help15:01
noonedeadpunkoh, huh, I was pretty sure we have smth 15:01
noonedeadpunkbut indeed we don't :(15:01
spatelI haven't seen any doc in OSA related recover except rebuild galera cluster 15:02
noonedeadpunkI guess it's matter of time that we don't have15:02
noonedeadpunkbut anybody can contribute to docs :)15:03
spatelme 15:03
spatelI would also like to create FAQ if you don't mind.. because that helps new folks to get onboard 15:04
spatelQuick and dirty Question / Answer 15:04
noonedeadpunkI think that can be discussed. What kind of stuff you see there?15:04
noonedeadpunkie what kind of questions?15:04
spatellike how to run specific playbook, how to use tags, how to use -vvv to debug, how to add new compute node, how to recover rabbitmq, how to recover mysql etc... 15:05
spatelwe do have doc but they are not on single page.. with reference 15:06
spatelhow to disable compute node.. / how to remove compute node / and many more small and simple question15:07
spatelsomething like this - https://docs.openstack.org/devstack/latest/faq.html15:08
noonedeadpunkeventually the thing here is to define what questions should be covered there and what's not15:09
noonedeadpunklike disablement of compute - is smth related to nova for instance. We can reference nova doc ofc there. But then also neutron comes. And cinder for some kind of deployments15:09
spatelwe don't need to decide right now but we can start with simple thing and then keep adding more stuff as we grow 15:10
noonedeadpunkI guess we tried to do smth like that in https://docs.openstack.org/openstack-ansible/latest/reference/commands/reference.html15:10
spatelI know what you saying.. 15:12
spatelnoonedeadpunk do you have any experience with rental server?15:25
noonedeadpunknope15:25
spateldamn it :)15:25
noonedeadpunkI always everywhere was working on own hardware15:25
spatelwe have huge pressure to run stuff on remote rental datacenter and trying to figure out how 15:25
spatelwe need global presence and its hard to have local DC and staff 15:26
spateljapan/Singapore/south america etc.15:26
noonedeadpunkit depends on amount of presence. Because you can rent a rack kind of everywhere15:27
spateli am going with server.com to see how it goes15:27
spatelrental is easy but we want to deploy vlan provider so need bunch of vlans in fabric 15:27
spatelwant to make sure they offer that too 15:28
spatelnoonedeadpunk you are correct we need separate dump for grant table. so i did this and getting this error - https://paste.opendev.org/show/809343/16:05
spatelusing your command to dump grant and then copy paste all grant to new DB but seems like something is missing 16:06
noonedeadpunkspatel: So first you need to restore your all_databases16:06
spateli did 16:06
spateli can see all users in mysql.user table and all data 16:06
noonedeadpunkhuh16:06
spateljust grant part is missing in my restore 16:07
noonedeadpunkyeah, exactly16:07
noonedeadpunkso in mysql.user you have users records?16:07
spatelyes every one.. 16:08
spateli compared with my db where i took backup16:08
noonedeadpunkdunno, that should have worked...16:08
noonedeadpunkI think you can try to google then how to backup grants16:08
spatelhttps://paste.opendev.org/show/809344/16:08
spatel:) let me figure out 16:09
noonedeadpunkmaybe smth has changed as again - I used that years ago...16:10
spateltotally16:11
spatellet me figure out and get back to you16:11
noonedeadpunkbut looks pretty valid to me tbh...16:12
spatelhttps://www.thegeekdiary.com/mysql-how-to-backup-user-privileges-as-create-user-and-or-grant-statements/16:12
spatelcheck this out16:12
noonedeadpunkthat exactly what I did for 5.5 haha16:13
noonedeadpunkI mean the command I gave16:13
spateli am running 10.5.12 16:13
noonedeadpunkno idea what is an alternative in mysql world16:14
noonedeadpunkI guess it's smth like 5.7...16:14
spatelmay be mariadb has something special 16:14
noonedeadpunkI doubt16:14
spatelhttps://mariadb.com/kb/en/mariabackup-overview/16:15
spatelapt-get install mariadb-backup16:15
spatelhmm it has special binary to do that16:15
noonedeadpunkmariadb-backup is whole new concept16:16
spatelhmm16:16
noonedeadpunkthat we use in role with script16:16
noonedeadpunkit's fork from percona xtrabackup16:16
noonedeadpunkwhich can make incremental backups16:16
noonedeadpunkspatel - question: does any service come to your mind that has same naming for ubuntu/centos/debian both for package name and systemd name?16:17
spatelnet-tools 16:17
spatelsystemd name... hmm16:17
spatellet me think16:18
noonedeadpunkso eventually, I'm trying to write some tests for ansible systemd module...16:18
noonedeadpunkand need smth robust not to make complicated conditions16:18
noonedeadpunkI thought about chrony at first...16:19
noonedeadpunkbut it's tricky16:19
spatelrsyslog 16:20
spatelyou need same package and service name.. hmm still thinking16:20
spatelcrond 16:20
noonedeadpunknah, package and service name might be different16:26
noonedeadpunkeventually I just need some service only)16:26
noonedeadpunkrsyslog sounds good enough!16:26
noonedeadpunkthanks a lot!16:28
spatelcool16:36
jrossernoonedeadpunk: I think there is a universal package name used in the old tests for openstack-hosts16:45
jrosseroh16:45
jrosserservice not package, ignore16:45
spatelnoonedeadpunk solution is - FLUSH PRIVILEGES;16:48
spatel:D16:48
spatelas soon as i did FLUSH PRIVILEGES; it accepted grant command16:48
noonedeadpunkah, lol16:51
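Putting the grant dump command and the `FLUSH PRIVILEGES` finding together, the flow discussed above could be sketched roughly as follows. `dump_grants` and `restore_grants` are hypothetical helper names; the query is the one quoted earlier in the log, which its author described as written for an old MySQL version, so the output should be checked against the running MariaDB release before relying on it.

```shell
#!/bin/sh
# Sketch only: dump_grants and restore_grants are hypothetical helper names.

# Emit one "SHOW GRANTS FOR 'user'@'host';" statement per account, run those
# statements, and terminate each resulting GRANT line with ";".
dump_grants() {
    mysql --skip-column-names -A \
        -e "SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" \
    | mysql --skip-column-names -A \
    | sed 's/$/;/'
}

# Reload the grant tables first (the step that made the restored server
# accept GRANT statements in the log), then replay the saved statements.
restore_grants() {
    mysql -e 'FLUSH PRIVILEGES;' && mysql < "$1"
}
```

Typical use would be `dump_grants > MySQLUserGrants.sql` on the source server and `restore_grants MySQLUserGrants.sql` on the target, after the full database dump has been restored there.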
spatelnoonedeadpunk i am seeing very odd issue in lab, i have created one mysql server outside openstack to test backup/restore and i did restore everything. now i am telling haproxy to go to my newly created db machine but getting this error https://paste.opendev.org/show/809345/17:21
spateli can connect directly to remote mysql but not able to do that via haproxy vip 17:22
spateldo we have any security stuff coming between it ?17:22
spatelnoonedeadpunk there?18:13
spatelthis is very odd, i can't use other mysql with haproxy.. i wonder some security issue coming on my way 18:14
spateleverything looks ok, then why isn't haproxy able to work with new mysql 18:16
mgariepyanyone has seen this with rabbitmq ? https://paste.openstack.org/show/809348/18:20
mgariepythat's after a restart.. 18:21
mgariepybefore i had : Channel error on connection 18:22
spatelmgariepy is your cluster status showing healthy?18:23
spateli have seen that error and i believe i re-build whole rabbitMQ 18:23
spatelnuke it 18:24
mgariepyrebuilt, one node at the time ? 18:26
spatelthis is what i did last time https://gist.github.com/satishdotpatel/f4f6cc2026da11fbb27a3527caba448a18:28
spatelif its production then just extra careful..18:28
spatelwhen i say nuke means re-build RabbitMQ / destroy and re-build 18:29
spateldo you have any notification queue ? 18:29
spateltry to purge that first 18:29
spatelhttps://gist.github.com/satishdotpatel/df751b5281726dca77065f78eab9584a18:30
spatelThis is other way to destroy rabbitMQ and rebuild from scratch - https://gist.github.com/satishdotpatel/9f11c54e86cb0f3ad59d5feac1827b1f18:30
mgariepyno notification queues.18:32
spatelgood18:32
mgariepyyep it's prod.18:33
spatelare you able to spin up vm or not?18:33
spatelif not then your downtime already started :)18:33
mgariepyi know ;p18:33
spateli would say just nuke it 18:33
spateli did many time in my cloud, because there is no easy way to fix rabbitMQ 18:34
spateltry https://gist.github.com/satishdotpatel/f4f6cc2026da11fbb27a3527caba448a  if not work then go with https://gist.github.com/satishdotpatel/9f11c54e86cb0f3ad59d5feac1827b1f18:35
mgariepyvms are still running. so it's not too bad haha :D18:38
spatelyes, they won't get impacted even your mysql is dead :)18:39
mgariepystop all rabbitmq > start all rabbitmq. seems to fix the issue.19:21
mgariepyfor now.19:21
mgariepyfrom : https://groups.google.com/g/rabbitmq-users/c/q1FEA4Q0z3Q19:50
mgariepythanks for your help spatel.19:52
spatelnice! 19:52
mgariepyi'll need to dig in the logs to see what happened tho.19:53
spatelmay be network isolation 20:00
spatelsplit-brain for few second 20:00
spateldid you stop start all node at same time or one by one?20:00
mgariepyall at the same time20:07
mgariepyansible rabbitmq_all -m service -a "name=rabbitmq-server state=stopped" ; sleep 10; ansible rabbitmq_all -m service -a "name=rabbitmq-server state=started"20:08
spatelhmm20:11
spateli thought you have to start first node and then second and third for quorum 20:11
spatellike Galera 20:11
mgariepythey had the data i guess.20:25
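The stop-all, then start-all sequence above can be wrapped in a small function. This is only a sketch: `restart_rabbitmq_cluster` is a hypothetical name, and the final `rabbitmqctl cluster_status` call is an added check that is not part of the one-liner in the log. It assumes it runs from the deploy host where the `rabbitmq_all` inventory group resolves.

```shell
#!/bin/sh
# Sketch only: restart_rabbitmq_cluster is a hypothetical helper name.
# Optional first argument: seconds to wait between stop and start (default 10).

restart_rabbitmq_cluster() {
    # Stop every RabbitMQ node in the inventory group.
    ansible rabbitmq_all -m service -a "name=rabbitmq-server state=stopped" || return 1
    sleep "${1:-10}"
    # Start them all again.
    ansible rabbitmq_all -m service -a "name=rabbitmq-server state=started" || return 1
    # Added check (not in the log): print the cluster view from each node.
    ansible rabbitmq_all -m command -a "rabbitmqctl cluster_status"
}
```

Stopping every node at once means a full messaging outage for the duration, so on production this belongs inside an already accepted downtime window.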
*** prometheanfire is now known as Guest121:20
*** promethe- is now known as prometheanfire21:53
snadgeim having trouble deleting a volume, it gets stuck in an error deleting state.. i've tried setting available and detaching and trying again, to no avail22:58
snadgethe volume appears to be mapped on the san and is online.. nothing in cinder logs23:12
