Tuesday, 2022-05-10

*** ysandeep|out is now known as ysandeep|rover04:42
noonedeadpunkmornings07:12
jrossergood morning07:38
foutatorohello jrosser07:57
jrosserhello07:57
damiandabrowski[m]morning folks!07:58
*** ysandeep|rover is now known as ysandeep|rover|lunch08:49
admin1morning09:20
*** ysandeep|rover|lunch is now known as ysandeep|rover09:47
mgariepygood morning everyone11:23
*** dviroel|afk is now known as dviroel11:28
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_tempest master: [DNM] testing if all the tests are still passing.  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/84125712:22
mgariepynoonedeadpunk, uca doesn't have a tempest plugin package. jammy does have some, but i guess it's only in universe and won't stay up to date anyway.12:23
noonedeadpunkwe were using source install for ubuntu tempest regardless if it's source or distro install12:25
noonedeadpunkI thought we had something that prevented this from failures12:26
noonedeadpunkmaybe we can disable building wheels on ubuntu, when it's distro install12:26
mgariepyi think the hard-coded "source" install does fix it12:26
mgariepywe will see.12:27
noonedeadpunkbut we don't have repo container when rest is distro install? do we?12:27
mgariepyindeed we do not.12:27
mgariepylet's see if it passes. if not i'll debug it.12:28
noonedeadpunkthen it should fail on attempt to get constraints file from repo container12:29
mgariepytempest was installing from source on distro install test a couple weeks ago.12:29
noonedeadpunkwell. I dropped some things maybe :D12:30
mgariepylol. maybe that's why i'm re-testing the role haha12:30
noonedeadpunklike with https://review.opendev.org/c/openstack/openstack-ansible/+/83784512:31
noonedeadpunkBut I don't see what would result in the issues...12:31
noonedeadpunkMaybe we also fixed not deploying repo container for distro installs when we were merging gluster12:32
mgariepywell me neither.12:32
mgariepylet's wait for the test result. 12:32
noonedeadpunkbut eventually this fails only for tempest role12:32
noonedeadpunkwhich is really interesting12:32
mgariepywe do have another role that doesn't support distro install.12:32
mgariepygnocchi.12:33
noonedeadpunkI guess we should just drop distro support there?12:36
noonedeadpunkor whole telemetry does support it?12:36
mgariepyi have no idea if there is gnocchi in uca or not. it's getting really hard i think to have all our roles patched at the same time12:37
mgariepythere are always one or 2 or 4 that are left behind.12:37
lowercaseI'm finally getting to a place where I feel comfortable uploading my work with fluentd, openstack and loki into a public repo. What's the repo where this would all go. The one that had the elk configurations and such.12:45
jrosserlowercase: openstack-ansible-ops is the repo for this sort of thing12:47
mgariepyfoutatoro, did you recover your cluster?12:48
foutatoromgariepy, good morning. 12:56
foutatoromgariepy: not yet, I have a really strange issue. previous VM disks seem to be in the ceph vms pool but I can't list them nor attach them to the appropriate VM12:58
foutatorohttps://paste.opendev.org/show/b1Mhw44QtAMH8qlQVTBL/12:58
lowercase4 in (since 6M)12:59
lowercasedid you just recover an osd?12:59
mgariepy6M i guess it month./12:59
mgariepyit's 6 months**13:00
lowercasethen why are the pgs degraded?13:00
lowercaseif he didn't lose an osd13:00
foutatorolowercase: I have 4 osds, this is a pre-prod13:00
mgariepywas an osd out for a long time ?13:01
foutatoromgariepy: no13:01
lowercasewhat happened 14 hours ago13:01
foutatorolowercase: due to an incident all infra hosts were restarted 13:02
lowercasedo your infra hosts also host osds?13:02
foutatoroyes13:03
lowercaseokay, so all osds were offline 14 hours ago13:03
foutatoroexact13:03
lowercasewhich makes a 6 month uptime for that osd impossible. So what happened 6 minutes ago?13:04
foutatorosince the restart the cinder-volume service state is also down13:04
jrosserthe backend is down, not the service13:04
foutatoronothing happens 6 minutes ago13:04
lowercaseyour ceph health status says otherwise13:04
foutatorolowercase mgariepy: is there a way to download rbd objects as qcow2 ?13:08
mgariepyfoutatoro, you can copy the image from ceph yes.13:09
lowercaseokay, i was wrong. mgariepy was correct. 12 osds: 11 up (since 2m), 12 in (since 10w)13:10
lowercasei restarted an osd in my dev cluster just to confirm.13:10
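Note on the `since` suffixes settled in the exchange above: a lowercase `m` means minutes (the freshly restarted OSD showed `since 2m`) and an uppercase `M` means months. A minimal Python sketch of reading such a summary line follows. The helper names are hypothetical, and the meaning of every suffix other than `m` and `M` is an assumption for illustration, not something confirmed in the conversation.

```python
import re

# OSD summary as quoted in the log, e.g.
# "12 osds: 11 up (since 2m), 12 in (since 10w)"
OSD_LINE = re.compile(
    r"(?P<total>\d+) osds: "
    r"(?P<up>\d+) up \(since (?P<up_since>\d+[A-Za-z])\), "
    r"(?P<in_count>\d+) in \(since (?P<in_since>\d+[A-Za-z])\)"
)

# "m" and "M" follow the conclusion reached in the conversation;
# the remaining suffixes are assumed for illustration.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days",
         "w": "weeks", "M": "months", "y": "years"}


def describe_since(value):
    """Expand a short duration such as '6M' into '6 months'."""
    number, unit = value[:-1], value[-1]
    return f"{int(number)} {UNITS[unit]}"


def parse_osd_summary(line):
    """Return OSD counts and durations from a status line (hypothetical helper)."""
    match = OSD_LINE.search(line)
    if match is None:
        raise ValueError(f"not an OSD summary line: {line!r}")
    return {
        "total": int(match["total"]),
        "up": int(match["up"]),
        "in": int(match["in_count"]),
        "up_since": describe_since(match["up_since"]),
        "in_since": describe_since(match["in_since"]),
    }


summary = parse_osd_summary("12 osds: 11 up (since 2m), 12 in (since 10w)")
print(summary)
```

Comparing `up` against `total` is the quick check here: in the quoted dev-cluster line one of the twelve OSDs had not yet come back up.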
mgariepyfoutatoro, https://paste.openstack.org/show/bj72ZpWyVrbgBcSaswTU/13:11
mgariepysimple command with a few args easy to remember by heart13:12
mgariepybut you really should try to see why cinder is not starting.13:13
jrossercinder volume backends can be down because of rabbitmq trouble13:13
mgariepyfoutatoro, ok first thing, can you list projects and users? (this will tell you if keystone works)13:14
foutatoroyes I can list projects, users, neutron networks, previous instances names ...13:14
mgariepyfor rabbitmq, what does `rabbitmqctl cluster_status` tell you?13:15
foutatoro `rabbitmqctl cluster_status`: https://paste.openstack.org/show/b3GkAwR515Mfs9unC40K/13:17
mgariepyok seems ok i guess.13:18
mgariepynow cinder did you restart it after you fixed the galera cluster ?13:18
foutatoroyes, I restart containers and all services with 'systemctl restart cinder*'13:19
mgariepyand the cinder api is online in your haproxy ?13:20
mgariepy hatop -s /var/run/haproxy.stat13:20
foutatorohttps://paste.openstack.org/show/byhWiTt4sJ5FNdEegIry/13:23
foutatorocinder-api is not marked as UP13:23
mgariepycinder_api-back seems UP.13:24
mgariepyin the cinder container13:25
mgariepywhat does the cinder log look like ?13:25
mgariepy`journalctl -u cinder.slice -f`13:25
mgariepy`systemctl status cinder.slice`13:27
foutatorohttps://paste.openstack.org/show/bFSZYN9I2hL5DkDAJLfo/13:27
jrosser`cinder service-list` 13:29
lowercaseThis might help narrow down: `journalctl -u cinder-volume -p 3` or `journalctl -u cinder.slice -f -p 3`, -p 3 only shows logs that are marked as errors.13:31
mgariepynice about -p3 13:32
mgariepyi usually do `-n 10000|grep something` :D lol13:32
mgariepyor --since with some quick google haha13:32
lowercaseanother favorite is --no-pager, which makes journalctl not view in a bad ... well pager.13:34
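The journalctl flags traded above (`-u`, `-p 3`, `-n`, `--no-pager`) can be assembled programmatically. The sketch below is a hedged illustration in Python, not anything used by the participants; `journalctl_cmd` and `shown_levels` are hypothetical helper names. It relies on the standard syslog priority numbering, where `3` is `err`, and on the documented journalctl behaviour that a single `-p` level also matches the more severe levels below it, so `-p 3` covers `emerg`, `alert`, `crit` and `err`.

```python
import shlex

# Standard syslog priorities, index == numeric level accepted by `journalctl -p`.
PRIORITIES = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")


def shown_levels(priority):
    """Levels matched by a single `-p` value: that level and all more severe ones."""
    return PRIORITIES[: PRIORITIES.index(priority) + 1]


def journalctl_cmd(unit, priority="err", lines=100, pager=False):
    """Build an argument list such as `journalctl -u UNIT -p 3 -n 100 --no-pager`."""
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    cmd = ["journalctl", "-u", unit,
           "-p", str(PRIORITIES.index(priority)),
           "-n", str(lines)]
    if not pager:
        cmd.append("--no-pager")
    return cmd


# Print the command instead of running it, so the sketch works on any host.
print(shlex.join(journalctl_cmd("cinder-volume")))
print(shown_levels("err"))
```

Passing the resulting list to `subprocess.run` would execute it on a systemd host; the sketch only prints it to stay side-effect free.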
foutatorocinder service-list returns a Bad Gateway, it tries to reach the server running on 877613:35
mgariepy`openstack volume service list`13:35
foutatorohttps://paste.openstack.org/show/bc2YxdSQWJpCnle4a1xw/13:35
foutatorohttps://paste.openstack.org/show/b6BpHCcSaf8FfG0Uf8Eh/13:36
foutatoro`openstack volume service list`:  https://paste.openstack.org/show/b6BpHCcSaf8FfG0Uf8Eh/13:36
foutatoroI'm restarting the scheduler13:37
lowercasecinder-api is prob offline13:37
lowercaseyou don't even have a cinder-api?13:37
jrosseri don't think it appears in that list anyway13:38
mgariepyindeed it doesnt13:39
lowercaseit sure doesn't13:39
lowercasehuh13:39
mgariepysystemctl restart cinder.slice13:39
mgariepyor the status before.13:40
mgariepyjust to see.13:40
jrosseri keep saying that the up/down there isnt about the service running or not :)13:40
jrosserit's the backend13:40
foutatorocinder.slice status: https://paste.openstack.org/show/bmHEid7kcMNSyWILsUU6/13:41
lowercase`journalctl -n 100 -p 3 -u cinder.slice` command please13:43
jrosseri do not think that it is correct to have both rbd:volumes@RBD and infra2-cinder-volumes-container-bdc12de4@RBD cinder-volume services listed13:44
jrosserthat is a sign that there is something wrong with the active/active parts of the config13:44
mgariepyor it wasn't cleaned up ?13:45
jrosseryes, or that13:45
mgariepyif it was installed in the good old days. and it was never cleaned up it can be there.13:46
foutatorohttps://paste.openstack.org/show/b84ZTb019u7lU7cDudwI/13:46
jrosserlooks like at least rabbitmq trouble there13:49
jrosseryou can use netstat or something to see if there are any actual connections working13:50
lowercasepymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')13:53
lowercaseyeah, both rabbit and mysql issues.13:53
lowercasea whole log of mysql issues.13:54
foutatoroI see but those errors were at 2022-05-10 08:03:24.39713:59
foutatoroand I restart services after13:59
lowercaseIs the time on the server the same time as your timezone?14:00
lowercasecause mine are set to UTC and that screws me up all the time lol14:00
mgariepyi prefer to have utc everywhere .. here we have some day light saving ( +1 / -1 every 6 months)14:01
mgariepynoonedeadpunk, tempest still pass with the static install_method.14:09
noonedeadpunkoh, ok14:09
mgariepygood enough for me :D haha14:11
mgariepyfoutatoro, https://paste.openstack.org/show/bmHEid7kcMNSyWILsUU6/ it seems to be missing the api service14:22
mgariepyunless i'm mistaken.14:22
mgariepyha. it's not in the cinder slice :/14:24
mgariepyit's in the uwsgi slice...14:25
mgariepyfun.14:25
foutatoroso I have to run 'openstack-ansible os-cinder-install.yml' ?14:26
mgariepysystemctl status cinder-api.service14:27
mgariepy`journalctl -u cinder-api.service -p 3 -n 100`14:29
mgariepyfoutatoro, i don't think running playbook will help you there.14:33
mgariepyit's better to try to find the root cause14:33
mgariepyeven more since this is a pre-prod system. you can take the time to debug it. it's not like when the production cluster has issues14:34
foutatorocinder-api status: https://paste.openstack.org/show/bDXX5EswGiW5HqJ1Ietk/14:39
foutatoromgariepy: I don't know why this message "http://infra2-cinder-api-container-632dfbb4:8776/ returned with HTTP 300" but "wget http://infra2-cinder-api-container-632dfbb4:8776/" works fine on both containers14:43
mgariepy300 is multiple choice from haproxy check i think.14:43
lowercasehttp 300 just means multiple choice, meaning that the url isn't a terminating url and there are multiple url paths that it can follow. i would try curl -L <url> and see if you get a 200 from that14:44
lowercasebut can you perform the same command with -p 3 appended to it14:45
lowercaseyour api server is clearly running, processing requests. However, since the issue is a rabbit or database issue, i want to see if the api service is complaining about either of those.14:46
foutatorolowercase: right, curl -L works but adding -p 3 makes the request not terminate 14:52
lowercase-p 3 to the journalctl command lol14:52
foutatoromy bad14:52
lowercase`journalctl -u cinder-api.service -p 3 -n 100`14:52
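On the HTTP 300 reply discussed above: 300 is "Multiple Choices", a 3xx response rather than an error, which is consistent with the API root answering normally. A small Python sketch of how a health check might classify such codes follows. `check_passes` is a hypothetical helper, and treating every 2xx and 3xx code as healthy is an assumption modelled on the usual default of HTTP health checks, not on the configuration of the deployment in this log.

```python
from http import HTTPStatus


def status_phrase(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


def check_passes(code):
    """Assumed rule: any 2xx or 3xx response counts as a healthy backend."""
    return 200 <= code < 400


# 300 is what the cinder API root returned in the log; 502 is the
# Bad Gateway seen earlier from `cinder service-list`.
for code in (200, 300, 502):
    verdict = "pass" if check_passes(code) else "fail"
    print(code, status_phrase(code), verdict)
```

Under that assumed rule the 300 from the API container would not mark the backend as down, while the earlier 502 would count as a failure.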
*** dviroel is now known as dviroel|lunch|afk14:53
foutatorolowercase: journal shows logs of yesterday14:55
foutatorohttps://paste.openstack.org/show/bgQAuugtt5ERJVxobuIi/14:55
noonedeadpunk#startmeeting openstack_ansible_meeting15:00
opendevmeetMeeting started Tue May 10 15:00:29 2022 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:00
noonedeadpunk#topic rollcall15:00
noonedeadpunko/15:00
noonedeadpunkwell, I'm actually semi-around15:01
mgariepyhey o/15:01
jrosserhello o/15:02
noonedeadpunk#topic office hours15:05
ebbexo/15:05
noonedeadpunkI will be honest - I've done nothing. I can't even recall what I was doing the whole week...15:05
noonedeadpunkLikely side-effect after moving to the new place...15:06
noonedeadpunkjrosser: you had some issues with merging repo stuff - should we discuss it?15:06
jrosseroh yes, i left it alone for a few days15:06
jrosserbut i think it's got all a bit circular15:07
noonedeadpunkWe can always disable CI to land that...15:07
jrosserwell maybe a couple of things to look at first15:07
damiandabrowski[m]hi! 15:08
jrosserthe glusterfs filesystem does not exist until we merge this https://review.opendev.org/c/openstack/openstack-ansible/+/837589/13/playbooks/repo-install.yml15:08
jrosserthe repo_install playbook needs updating to create it, as the current use of serial: breaks the installation15:08
jrosserthe tasks cannot be serial for those parts15:09
jrosserbut then logically the next patch to merge (until I thought about it) was this https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83941115:09
jrosserbut i don't think thats ever going to pass without the first one15:09
jrosseras the fs will not exist15:09
jrosseri think rather than circular patches, i mean its very hard to get everything to pass in CI without making it circular15:11
noonedeadpunkI also left comment for https://review.opendev.org/c/openstack/openstack-ansible/+/837589/13/ansible-collection-requirements.yml#40 just in case :)15:12
jrosserah yes i saw that15:12
jrosseri got kind of diverted by playing with skyline15:13
jrosserbut we should try to get this gluster stuff merged because it is a big change and needs some testing for real15:13
noonedeadpunkyeah15:13
jrosserdeleting / re-creating repo server containers has some subtleties now, for example15:13
noonedeadpunkBtw regarding rbac topic - I guess there's no real need to do changes this release since cinder/heat are still not ready15:14
noonedeadpunkBut I'd rather introduce the service role anyway, even though discussions about it are still ongoing15:14
noonedeadpunkwe can suggest dropping all repo containers at once for example as well15:15
noonedeadpunkas that should be fine I guess?15:15
jrosseri mount /openstack/glusterfs in the current patches15:15
noonedeadpunkAs there's nothing _really_ important anyway15:15
jrosseras there is a UUID that needs to be preserved, else you can't re-create/join the cluster properly15:15
jrosserso sometimes you need to keep that, sometimes you need to delete it15:15
jrosserdepends if you want to destroy the fs and start again, or to keep it15:16
noonedeadpunkI actually thought it will get removed with force_containers_data_destroy ?15:16
jrosserthat does not seem to understand whatever bind mounts get made15:16
noonedeadpunkBut not sure15:16
jrosserhowever in this case, i think that preserving it is the right thing to do for multinode15:17
jrosserthere is also an impact on re-deploying an infra node15:17
jrosserhaving said all this - i really would like other eyes / opinions on it15:18
noonedeadpunkyep, fair15:20
noonedeadpunkanother thing - do we want to have a presentation about project updates?15:20
noonedeadpunkThere's no dedicated event during summit for that, but still marketing has some plan how to promote these15:20
noonedeadpunkBasically they asked for a video 10mins tops to say about changes that were made lately15:21
jrosseri guess we would have to look back over the etherpads to see what we did / did not do15:22
noonedeadpunkyup, agree15:23
noonedeadpunkI will try to put smth into other etherpad so we could review topics next week15:23
damiandabrowski[m]okok, great15:24
noonedeadpunkok, what else we have on plate? 15:26
noonedeadpunkExcept tons of stuff that needs to land?15:27
jrosserhmmm yes - reviews / merging of lots of things15:27
jrosseri should also say that i have done a proof-of-concept with the alternative dashboard, skyline15:28
jrosserand it's ummmm - interesting to deploy15:28
noonedeadpunkI can imagine, as it's nodejs iirc?15:29
noonedeadpunkat least frontend part of it15:29
jrosserthere is a python part 'apiserver' then nodejs things for 'console'15:30
damiandabrowski[m]i will spend some time on reviews tomorrow15:30
jrosserand imho the code is very docker / kolla centric15:30
jrosserand also confuses the service code with deployment tooling, as there's an executable to generate the required nginx config /o\15:30
jrosserbut i think this is an opportunity to influence the skyline development to support wider tools and deployments15:32
jrosserdebugging this is really on the edge of my understanding though, so if anyone is interested with web development skills then please help out :)15:33
noonedeadpunktool to generate nginx config sounds as it sounds ofc...15:42
noonedeadpunkAnd also I saw there's no SSO support atm15:42
noonedeadpunkSo I'd say they have plenty of gaps as of today15:42
noonedeadpunkbut you're right, we'd better chime in earlier than later15:43
jrosserthere's kind of two parts i think - ansible'ing up the deployment, pretty much whatever-it-takes to make it work15:44
jrosserthen work on tidying that all up and making it all more OSA-like15:45
noonedeadpunk#endmeeting16:00
opendevmeetMeeting ended Tue May 10 16:00:34 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.html16:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.txt16:00
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-05-10-15.00.log.html16:00
*** ysandeep|rover is now known as ysandeep|out16:23
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Make octavia_provider_network better configurable  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/78733616:46
*** dviroel|lunch|afk is now known as dviroel\18:53
*** dviroel\ is now known as dviroel18:53
*** dviroel is now known as dviroel|out21:22
