Wednesday, 2024-12-11

<spatel> jrosser Hey! 14:18
<jrosser> hello 14:19
<spatel> Around? 14:19
<jrosser> a bit, yes :) 14:19
<spatel> https://paste.opendev.org/show/bIAsi8mx3HrA2gG0UTVd/ 14:19
<spatel> What is wrong here.. I am pulling my hair :( 14:19
<spatel> trying to install magnum-cluster-api 14:20
<spatel> Cluster type (vm, ubuntu, kubernetes) not supported 14:20
<spatel> I found this reference thread but no solution yet - https://kubernetes.slack.com/archives/C05Q8TDTK6Z/p1711704187186329 14:21
* jrosser shakes fist at slack 14:23
<jrosser> spatel: so there are messages from the magnum-cluster-api almost as soon as magnum-conductor starts (you did restart it?!), like this https://zuul.opendev.org/t/openstack/build/ce6d644ed18144869e501b512ec165b1/log/logs/openstack/aio1-magnum-container-d0d4b96e/magnum-conductor.service.journal-19-13-04.log.txt#1921 14:29
<jrosser> where it does not find the manila service, for example 14:29
<jrosser> that shows that the driver is loaded 14:29
<spatel> I did restart both api and conductor but if you think I should do conductor first then I can do in that order 14:30
<jrosser> the order should not matter 14:31
<jrosser> when it says that the cluster type is not supported it means that for some reason the driver is not loaded, like the thread in slack says 14:31
<jrosser> i have never seen this though 14:31
<spatel> hmm but i did install magnum-cluster-api with pip command as mentioned 14:33
<spatel> Does it has anything to do with mgmt cluster? 14:33
<jrosser> no i don't think so 14:34
<spatel> This is my mgmt cluster logs - https://paste.opendev.org/show/bwir9RYgJZPjFSbzUlfQ/ 14:35
<spatel> I thought mcapi go out and talk to mgmt k8s cluster and then register 14:35
<jrosser> register? 14:39
<jrosser> you are trying to create the cluster template - this is something entirely internal to magnum and its db 14:40
<jrosser> but it tells you that it does not know what to do for a cluster with the tuple (vm, ubuntu, kubernetes) 14:41
<jrosser> which suggests that the driver is not loaded for some reason 14:43
<spatel> hmm 14:52
<jrosser> it is also true that the first thing that the driver does is setup the API object for the management cluster https://github.com/vexxhost/magnum-cluster-api/blob/724a3fbb19889342fc427298729d1e3f88c35161/magnum_cluster_api/driver.py#L46 14:54
<jrosser> though if that actually touches the API at all is another matter 14:54
<spatel> Let me redeploy mgmt cluster 14:56
<spatel> something funny going on 14:57
<spatel> I am waiting for mnaser to respond :) 14:57
<jrosser> how did you deploy it? 15:07
<spatel> https://paste.opendev.org/show/bVYdx1iH7YXna9AnCT8a/ 15:11
<spatel> but i am doing latest version of clusterctl 15:11
<spatel> instead of old 15:12
<jrosser> oh well i meant magnum-cluster-api 15:12
<spatel> pip install magnum-cluster-api 15:12
<jrosser> you took care of upper-constraints etc when using pip? 15:12
<spatel> no 15:13
<spatel> All i do just pip install magnum-cluster-api 15:13
<spatel> I did deploy couple of cluster like that before and all works 15:13
<jrosser> oh sure 15:14
<jrosser> all i am saying is that maybe at that time you got lucky with the versions of things that were on pypi 15:14
<jrosser> perhaps today not so lucky 15:14
<spatel> I am also following - https://www.roksblog.de/openstack-magnum-cluster-api-driver/ 15:14
<jrosser> but i have no idea 15:14
<spatel> I am going back to older version and see 15:15
<jrosser> i am more meaning dependencies 15:15
<spatel> Hmm.. all I have to go back to older version and try 15:15
<jrosser> but like i say i have no idea about installing "open loop" like this without any constraints 15:15
<jrosser> that's why there is a ton of complexity in the way that OSA makes venvs to ensure that everything is consistent 15:16
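
The concern about installing without upper-constraints can be checked after the fact. The following is a minimal sketch, assuming a constraints file in the `name===version` style that OpenStack's upper-constraints.txt uses; the function names are hypothetical, versions are compared as plain strings, and environment markers after `;` are simply ignored. It reports packages whose installed version differs from the pinned one.

```python
import re
import sys
from importlib import metadata


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_constraints(text):
    """Parse 'name===version' lines into {normalized_name: version}.

    Comments and environment markers are dropped; if a name appears
    more than once, the last pin wins (a simplification).
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "===" not in line:
            continue
        name, version = line.split("===", 1)
        pins[normalize(name)] = version.strip()
    return pins


def installed_versions():
    """Return {normalized_name: version} for the current environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            found[normalize(name)] = dist.version
    return found


def find_mismatches(pins, installed):
    """List (name, pinned, installed) where the installed version differs."""
    return sorted(
        (name, pinned, installed[name])
        for name, pinned in pins.items()
        if name in installed and installed[name] != pinned
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_pins.py upper-constraints.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        pins = parse_constraints(handle.read())
    for name, pinned, actual in find_mismatches(pins, installed_versions()):
        print(f"{name}: pinned {pinned}, installed {actual}")
```

It has to be run with the Python interpreter of the venv that the service uses, otherwise it inspects the wrong environment.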
<spatel> hmm 15:16
<jrosser> ideally you need to use whatever mechanism kolla has to insert extra python packages at the point that the container image is built 15:17
<jrosser> (i have no idea if this is possible at all) 15:17
<sykebenX> noonedeadpunk: Good afternoon! Just wanted to thank you for your help yesterday and report back that I found out what appears to have been causing the issue. It seems that the SSH Multiplexing session from a previous run as an unprivileged user (ANSIBLE_REMOTE_USER=nonroot) was being re-used on subsequent runs and as a result, even though the SSH command that initiates the lxc-attach during gather container facts specifies 18:15
<sykebenX> does not respect that 18:15
<sykebenX> the workaround was to simply `killall ssh` to kill the multiplex sessions and re-run with ANSIBLE_REMOTE_USER=root 18:15
<sykebenX> Still not 100% sure why running as nonroot in the first place will not work, but this seems to at least be part of the problem I was having 18:16
<jrosser> sykebenX: I’m not sure if we say that non root is supported or not 18:20
<jrosser> we certainly do not test that at all 18:20
<sykebenX> Fair enough! I assumed that based on this document: https://docs.openstack.org/openstack-ansible/latest/user/security/non-root.html 18:21
<jrosser> aaaah ok 18:22
<jrosser> it would probably be not too difficult to set up a CI job to check that works properly 18:23
<jrosser> ultimately you end up with become=true so the difference with root and non root is pretty small, but I understand that some places have policy about this 18:24
<sykebenX> jrosser: Yeah I agree it's probably a bit unnecessary, but yes, it is a requirement for some orgs 18:25
<spatel> jrosser no luck 18:39
<spatel> I did downgrade everything to working environment 18:39
<spatel> I have 2023.1 running in production with mcapi and everything working well 18:40
<jrosser> I think you have to do some more active debugging 18:40
<spatel> I took 2023.1 magnum container and put it on 2024.1 cloud 18:40
<spatel> I did enable debug but not a single error 18:40
<spatel> everything looks clean... 18:40
<jrosser> you can use this to check that the driver is visible to magnum (I think) https://pypi.org/project/entry-point-inspector/ 18:41
<spatel> I am running mcapi in yoga + zed + 2023.1 with same config and same install method 18:41
<spatel> only with 2024.1 I am seeing this issue.. I don't think its 2024.1 release issue 18:41
<jrosser> you can add some debug prints to the mcapi driver to see if it even inits 18:42
<jrosser> well, if it was me I would spin an AIO and test locally 18:42
<jrosser> but then for OSA we put in ~1 year of effort to make that possible 18:42
<spatel> Let me enable debug but I can tell you its not saying anything in debug at all.. even no "capi" keyword in entire log search 18:43
<jrosser> that’s how I get confidence that it will work for me in prod when I need it - that the upstream tests are representative and working 18:43
<jrosser> then that still points to the driver not loading 18:44
<jrosser> does kolla use a venv inside the docker container? 18:44
<spatel> Question, how does magnum know that i need to load mcapi ? 18:44
<spatel> Yes docker use venv 18:44
<jrosser> through the stevedore framework I think 18:44
<jrosser> which is why I pointed you to entrypoint-inspector 18:45
<jrosser> so that you can check the visibility of the installed driver 18:45
<spatel> guide me.. how.. let me first install entrypoint-inspector 18:45
<spatel> pip install entrypoint-inspector 18:46
<jrosser> I am not at my work computer right now 18:46
<spatel> ERROR: Could not find a version that satisfies the requirement entrypoint-inspector 18:46
<spatel> let me google 18:46
<jrosser> kind of https://doughellmann.com/projects/entry-point-inspector/ 18:47
<jrosser> ah maybe entry-point-inspector 18:47
<jrosser> I am just on my phone :) 18:47
<spatel> works! 18:48
<spatel> Thank you so much for the help.. from your phone :) 18:48
<spatel> I have installed that package 18:48
<spatel> now enable debug and restart service? 18:48
<jrosser> no use the epi command to look at the drivers for magnum 18:49
<spatel> ep show 18:50
<spatel> let me post output 18:50
<spatel> https://paste.opendev.org/show/bMzDNe0qidpLyijcPetw/ 18:51
<spatel> look like driver is loaded 18:51
<jrosser> well, it’s discoverable 18:52
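
The discoverability check discussed here can also be done with the standard library alone, without installing entry-point-inspector. This is a minimal sketch, assuming Magnum's drivers are registered in an entry point group named `magnum.drivers` (an assumption; the log does not name the group); `list_drivers` is a hypothetical helper. As jrosser points out, a driver showing up in this list only proves it is discoverable, not that the service loaded it.

```python
from importlib.metadata import entry_points


def list_drivers(group="magnum.drivers"):
    """Return sorted (name, target) pairs for every entry point in the group.

    The group name is an assumption; pass a different one if needed.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: selectable entry point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in list_drivers():
        print(f"{name} -> {target}")
```

Run it with the interpreter of the venv that the magnum services use, so that it sees the same installed packages.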
<spatel> hmm 18:52
<spatel> so how to make sure its loading during service start? 18:52
<jrosser> I’m not sure 18:54
<jrosser> it almost sounds like you have magnum running in one place and the driver installed somewhere else 18:55
<jrosser> did you check the magnum service runs actually from the same venv? 18:55
<spatel> hmm! that is a good point 18:56
<spatel> let me check.. hold on.. 18:56
<noonedeadpunk> sykebenX: I think this also could be to usage of session persistence that speeds up ansible dramatically, but maybe you can play with that... https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/openstack-ansible.rc#L52 18:56
<spatel> but this is how I am doing everywhere and it works in most case.. 18:56
<spatel> jrosser its in right place - https://paste.opendev.org/show/bF2bVw5I5k1eKBaoX1Zq/ 18:57
<spatel> magnum and magnum-capi in same venv environment 18:58
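
One way to confirm that the service and the driver live in the same venv is to ask the interpreter where it would import each package from. This is a minimal sketch; `locate` is a hypothetical helper, and the module names `magnum` and `magnum_cluster_api` are taken from the package names mentioned in this log. A result of `None` means the module is not importable from this interpreter (or, for a namespace package, has no single origin file).

```python
import importlib.util
import sys


def locate(module_names):
    """Map each top-level module name to the file it would be imported from."""
    result = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)
        result[name] = spec.origin if spec is not None else None
    return result


if __name__ == "__main__":
    # sys.prefix is the root of the active (virtual) environment.
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    for name, origin in locate(["magnum", "magnum_cluster_api"]).items():
        print(f"{name}: {origin}")
```

If both paths sit under the same prefix as the interpreter that starts the service, the "driver installed somewhere else" theory can be ruled out.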
<spatel> Do you think its SDK problem not the real server 19:00
<sykebenX> noonedeadpunk: Thank you! I will investigate that 19:01
<spatel> jrosser no kidding it works.... 19:06
<spatel> look like i found something... 19:06
<jrosser> oh?! 19:06
<spatel> just hold on.. let me prove myself 19:07
<spatel> jrosser here is the culprit https://paste.opendev.org/show/brp45S4hl9YgtuKAoYBs/ 19:09
<spatel> this line was in magnum.conf file 19:09
<spatel> somehow 2024.1 has this option added 19:10
<spatel> as soon as I removed all good 19:10
<noonedeadpunk> stackhpc influence on kolla ?:) 19:10
<spatel> Yes yes.. they are pushing their driver 19:11
<noonedeadpunk> if you ever happen to try out their driver - I will be all ears to listen about it 19:11
<jrosser> that is probably because it uses the same vm/ubuntu/k8s tuple so cannot co-exist 19:12
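
The co-existence problem jrosser describes can be illustrated with a toy registry keyed by the (server_type, os, coe) tuple. This is only a sketch of the idea, not Magnum's actual code: the driver names, the `disabled` parameter and the first-registered-wins rule are all hypothetical. It shows how two drivers claiming the same tuple collide, and how excluding the drivers for a tuple yields the "Cluster type ... not supported" message seen earlier in the log.

```python
class ClusterTypeNotSupported(Exception):
    """Raised when no enabled driver claims the requested tuple."""


def build_registry(drivers, disabled=()):
    """Build {(server_type, os, coe): driver_name} from driver declarations.

    drivers maps a driver name to the list of tuples it claims.
    Hypothetical rule: the first enabled driver claiming a tuple wins.
    """
    registry = {}
    for name, cluster_types in drivers.items():
        if name in disabled:
            continue
        for cluster_type in cluster_types:
            registry.setdefault(cluster_type, name)
    return registry


def lookup(registry, server_type, os_distro, coe):
    """Return the driver name for a tuple or raise ClusterTypeNotSupported."""
    key = (server_type, os_distro, coe)
    try:
        return registry[key]
    except KeyError:
        raise ClusterTypeNotSupported(
            "Cluster type (%s, %s, %s) not supported" % key
        ) from None


if __name__ == "__main__":
    # Two hypothetical drivers that both claim the same tuple.
    drivers = {
        "driver_a": [("vm", "ubuntu", "kubernetes")],
        "driver_b": [("vm", "ubuntu", "kubernetes")],
    }
    print(lookup(build_registry(drivers), "vm", "ubuntu", "kubernetes"))
    try:
        lookup(build_registry(drivers, disabled=drivers), "vm", "ubuntu", "kubernetes")
    except ClusterTypeNotSupported as exc:
        print(exc)
```

With a single lookup key per tuple, only one driver can ever serve (vm, ubuntu, kubernetes), which is why a more specific image label is suggested later in the conversation.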
<noonedeadpunk> I wonder actually if our control cluster could be used "as is" with their driver 19:12
<noonedeadpunk> yup 19:12
<jrosser> probably 19:12
<noonedeadpunk> which is in fact stupid problem to have kinda... imo, it's high time to invent a better label for images... 19:13
<noonedeadpunk> anyway 19:13
<spatel> Agreed... 19:14
<spatel> This damn confusing with same variable but multiple drivers 19:14
<spatel> it took my 3 days worth of life :( 19:15
<noonedeadpunk> also having "ubuntu" value to select magnum drivers... is not intuitive 19:15
<noonedeadpunk> also os_distro is used to define distribution indeed, and we mark all existing images with os_distro... 19:16
<noonedeadpunk> due to https://docs.scs.community/standards/scs-0102-v1-image-metadata/#technical-requirements-and-features 19:16
<noonedeadpunk> to it produces soooo much more issues.... 19:17
<spatel> Thank you jrosser for helping me out.. I like that entry-point stuff.. good to know 19:22
-opendevstatus- NOTICE: Gerrit will undergo a short restart to pick up some bugfixes for the 3.10 release that we upgraded to. 19:24
<jrosser> spatel: https://opendev.org/openstack/kolla-ansible/commit/4879656058c43a88eb934ae0c61251aaa5ff82b9 19:45
<jrosser> this all seems to be because of not defining magnum_kubeconfig_file_path 19:45
<spatel> Just found that from slack with talking to kolla developer - magnum_kubeconfig_file_path is not defined 19:45
<jrosser> snap :) 19:46
<spatel> haha! bummer (This is bad variable.. it should throw error instead silent killer) 19:46
<jrosser> anyway good to understand - nice it’s working now 19:48
<spatel> indeed.. lots of hidden jewels we discovered 19:52
