Tuesday, 2020-04-07

flwang1given yours is a new env, pls go for fedora coreos00:01
cosmicsoundflwang1 , i tried those images in this new setup somehow they all failed to boot00:04
cosmicsoundmaybe my cli for coreos template is not great00:04
flwang1cosmicsound: i think it's because you're using old version heat00:04
flwang1what's your heat version?00:04
cosmicsoundi use kolla-ansible to deploy from source on ubuntu00:05
cosmicsoundso i should not be using old version00:05
flwang1cosmicsound: what's your heat version?00:06
*** k_mouza has joined #openstack-containers00:08
*** k_mouza has quit IRC00:10
cosmicsoundnot sure how i see it00:12
cosmicsoundcli shows me the version of the cli client00:12
cosmicsoundheat --version00:13
cosmicsound2.0.000:13
flwang1hmm.. no, not heat cli version00:31
flwang1your heat service version00:31
cosmicsoundnot sure how to get that yet00:42
cosmicsoundmust be the train stable version00:42
openstackgerritMerged openstack/magnum master: Support calico v3.3.6  https://review.opendev.org/71711601:23
cosmicsoundit seems if so i found the issue02:05
cosmicsoundor not02:08
*** k_mouza has joined #openstack-containers02:10
*** k_mouza has quit IRC02:15
cosmicsoundmanifest for kolla/ubuntu-source-magnum-api:9.3.0 not found: manifest unknown: manifest unknown02:32
openstackgerritLingxian Kong proposed openstack/magnum master: [K8S] Delete all related load balancers before deleting cluster  https://review.opendev.org/71693003:48
*** ykarel|away is now known as ykarel04:20
cosmicsoundApr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to https://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no certificate or crl found (_ssl.c:4232)')))04:23
cosmicsoundthis is only in coreos happening04:24
*** AJaeger has left #openstack-containers05:04
*** ricolin has joined #openstack-containers05:30
*** udesale has joined #openstack-containers05:41
openstackgerritFeilong Wang proposed openstack/magnum master: [WIP] Support multi AZ for k8s multi masters  https://review.opendev.org/71434705:44
brtknrcosmicsound: are you using os_distro=fedora-coreos or coreos?05:57
cosmicsoundfedora-coreos06:03
cosmicsoundtried with your scrips06:03
cosmicsoundscripts*06:04
cosmicsoundit worked for atomic06:04
cosmicsoundnot for coreos06:04
cosmicsoundI get this in logs06:04
cosmicsoundApr 07 05:45:48 k-necofnexy5va-master-0 podman[2327]: Source [heat_local] Unavailable.06:04
cosmicsoundApr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: Source [request] Unavailable.06:04
cosmicsoundApr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: /var/lib/os-collect-config/local-data not found. Skipping06:04
cosmicsoundApr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: No auth_url configured.06:04
cosmicsoundI updated to latest image from coreos06:04
*** udesale has quit IRC06:09
*** udesale has joined #openstack-containers06:13
*** xinliang has joined #openstack-containers06:26
brtknrcosmicsound: wait so coreos is booting up but not making the cluster?06:30
brtknrIf you are using kolla ansible, can you try using the master branch for magnum?06:31
brtknrI have seen that before for tls endpoints06:31
brtknreg if your keystone is https06:32
brtknrit works when it’s http06:32
brtknrplease file a bug report06:32
brtknrAnd we shall look at it06:32
brtknrBut try the master tag for magnum fiest06:33
brtknrfirst06:33
cosmicsoundyes06:36
cosmicsoundI will try06:36
cosmicsoundNow i am on train06:36
openstackgerritMerged openstack/magnum master: Cleanup py27 support  https://review.opendev.org/71754906:41
*** ttsiouts has joined #openstack-containers06:45
cosmicsoundbrtknr , with master branch and your script for atomic it gives me: Create_Failed: Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.docker_volume: Invalid input for field/attribute availability_zone. Value: . '' is too short (HTTP 400) (Request-ID: req-f3846f3b-bd55-4d64-bb1a-33779b5a43ba)07:16
*** xinliang has quit IRC07:29
cosmicsoundbrtknr , Create Failed07:29
cosmicsoundResource Create Failed: Badrequest: Resources.Kube Masters.Resources[0].Resources.Kube Node Volume: Invalid Input For Field/Attribute Availability Zone. Value: . '' Is Too Short (Http 400) (Request-Id: Req-E909dacb-8c96-467b-A859-1ae5ddb40e4a)07:29
cosmicsoundi added availability_zone=nova and then i got the second error07:29
cosmicsoundand weirdest is that on new version i cannot even remake the cluster that worked before: Resource Create Failed: Error: Resources.Kube Masters.Resources[0].Resources.Master Config Deployment: Deployment To Server Failed: Deploy Status Code: Deployment Exited With Non-Zero Status Code: 107:59
cosmicsoundhttp://paste.openstack.org/show/791719/08:00
*** born2bake has joined #openstack-containers08:02
*** k_mouza has joined #openstack-containers08:11
*** k_mouza has quit IRC08:15
*** ykarel is now known as ykarel|lunch08:48
brtknrcosmicsound: are you using cinder volume?08:51
cosmicsoundbrtknr , yes08:51
cosmicsoundcinder backed by ceph08:51
brtknrcan you disable it and try08:51
cosmicsoundnot sure how you mean that08:52
cosmicsound labels               | {'heat_container_agent_tag': '689704', 'kube_tag': 'v1.14.8'}08:52
brtknralso you need the latest heat08:52
brtknrdo you have volume_driver=cinder?08:52
cosmicsoundlet me check08:52
cosmicsoundlatest heat with latest magnum?08:53
brtknryes08:54
cosmicsoundwill give it a try08:54
cosmicsoundanyhow08:54
cosmicsoundmost occurd i find08:54
cosmicsoundthat the same template that made yesterday a healthy cluster08:55
cosmicsoundtoday it fails08:55
cosmicsoundthere is no volume_driver setup08:56
*** k_mouza has joined #openstack-containers09:37
*** ykarel|lunch is now known as ykarel09:41
*** k_mouza has quit IRC09:49
*** k_mouza has joined #openstack-containers09:57
*** k_mouza has quit IRC09:57
*** k_mouza has joined #openstack-containers09:57
*** ykarel is now known as ykarel|meeting10:02
cosmicsoundbrtknr , it did not help10:04
cosmicsoundsomehow it fails continually10:04
brtknrcosmicsound: why are you using heat_container_agent_tag 689704?10:12
brtknrussuri-dev is recommended10:12
cosmicsoundthats what you had stored in the script10:13
cosmicsoundand thus was the first working cluster10:13
cosmicsoundi will update them all to ussuri-dev and try again10:13
*** ricolin has quit IRC10:16
cosmicsoundheat_container_agent_tag=ussuri-dev,kube_tag=v1.16.010:27
cosmicsoundtrying this10:27
cosmicsounddockerd-current[1343]: time="2020-04-07T10:30:28.670451562Z" level=error msg="Handler for GET /v1.26/containers/etcd/json returned error: No such container: etcd"10:30
cosmicsoundApr 07 10:30:14 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:30:14,291] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/lib/heat-config/deployed/56955806-b478-4789-b1c6-e5e747713f43.json10:31
cosmicsoundit will stay here 2 3 mins, then will time out with os-profile not found10:32
cosmicsoundbrtknr , it goes here10:37
cosmicsoundApr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:36:02,657] (os-refresh-config) [INFO] Completed phase migration10:37
cosmicsoundApr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: INFO:os-refresh-config:Completed phase migration10:37
cosmicsoundApr 07 10:36:04 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: /var/lib/os-collect-config/local-data not found. Skipping10:37
cosmicsoundand it dies10:37
cosmicsoundno matter what labels i use10:37
brtknrcosmicsound: please check inside /var/log/heat-config as i mentioned before10:37
*** ykarel|meeting is now known as ykarel10:37
*** vishalmanchanda has joined #openstack-containers10:40
cosmicsoundi did10:41
cosmicsoundthis is from there10:41
*** k_mouza has quit IRC10:42
*** k_mouza has joined #openstack-containers10:46
cosmicsoundApr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ ssh -F /srv/magnum/.ssh/config root@localhost ls /dev/disk/by-id10:48
cosmicsoundApr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ grep 'd3212ba7-394d-45f1-9$'10:48
cosmicsoundApr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: + device_name=10:48
cosmicsoundHere comes the trouble10:48
cosmicsoundhttps://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/configure-etcd.sh#L2510:50
cosmicsoundthis one cannot be retrieved by heat-agent10:50
cosmicsoundand will fail10:50
*** ttsiouts has quit IRC10:59
cosmicsoundbrtknr , with or without hw_scsi_model=virtio-scsi11:03
cosmicsoundit seems scsi can also cause issues with the above bug we got here11:03
cosmicsoundi'll confirm soon11:03
brtknrplease show me the full log11:04
brtknrcosmicsound: well, have you specified etcd_volume_size?11:05
brtknrtry disabling it11:05
brtknrif what you say is true then that is only ever executed if this condition is met: if [ -n "$ETCD_VOLUME_SIZE" ] && [ "$ETCD_VOLUME_SIZE" -gt 0 ]; then11:06
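The guard quoted above means the etcd volume lookup only runs when ETCD_VOLUME_SIZE is both non-empty and greater than zero. A minimal Python sketch of the same test, assuming the value arrives as a string; the function name `etcd_volume_requested` is hypothetical and not part of Magnum:

```python
def etcd_volume_requested(etcd_volume_size: str) -> bool:
    """Rough mirror of: [ -n "$ETCD_VOLUME_SIZE" ] && [ "$ETCD_VOLUME_SIZE" -gt 0 ]"""
    # -n: the string must be non-empty
    if not etcd_volume_size:
        return False
    # -gt 0: the value must be an integer greater than zero
    try:
        return int(etcd_volume_size) > 0
    except ValueError:
        # A non-integer value makes the shell test fail, so treat it as false
        return False
```

So an unset or zero etcd_volume_size should skip the device lookup entirely, which is why disabling it is a quick way to rule that code path out.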
cosmicsoundbrtknr , i had no etcd specified this time11:07
cosmicsoundalso i was o scsi now on virtio11:07
*** ttsiouts has joined #openstack-containers11:09
cosmicsoundhttp://paste.openstack.org/show/791719/ here is full log from heat11:11
cosmicsoundcoreos seems to be stuck at:11:35
cosmicsound+ echo 'Trying to label master node with node-role.kubernetes.io/master=""'11:35
cosmicsound+ sleep 5s11:35
cosmicsound++ curl --silent http://127.0.0.1:8080/healthz11:35
cosmicsound+ '[' ok = '' ']'11:35
ttsioutsstrigazi, brtknr: are you guys around?12:07
strigazio/12:08
ttsioutso/12:08
ttsioutsI wanted to talk about the spec12:08
ttsioutsI kind of like brtknr's idea.12:09
ttsioutsI could start rewriting the spec based on that12:09
guilhermespthanks for sharing the conformance results flwang1 ! Not sure but for both v1.17.4 and v1.18 i'm getting the same tests failing http://paste.openstack.org/show/791730/12:26
guilhermespwhich is mostly dns tests12:26
*** ttsiouts has quit IRC12:34
born2bakeguys is there any up-to-date guide how to use magnum and deploy up-to-date k8s with magnum?12:41
born2bakeusing up-to-date fedora-coreos images12:42
born2bakeI ve tried already so many different setups :) the only one that works for me: flannel, fedora-coreos, 1 master. (autoscaler, autohealer, cloud manager are crashing at scaling)12:43
*** ttsiouts has joined #openstack-containers12:43
born2bakecalico, multi-master neither of them are working for me12:43
born2bakeopenstack: train, kolla12:43
guilhermespborn2bake: do you have octavia on your env?12:47
born2bakeyes12:47
guilhermespno logs?12:47
born2bakedeleted everything, will try again later on. I managed to have multi-master cluster with fedora-atomic 29....but I cant use fedora-atomic (it takes around 20 min to boot up just one image), worker-nodes were not able to connect either12:48
guilhermespborn2bake: https://review.opendev.org/#/c/685875/1 are you aware of?12:49
born2bakethe main problem, I have no idea how to troubleshoot using heat-config logs...I can see it stopped and failed but I do not know why12:49
born2bakeguilhermesp nope. even though I have no idea how to update my magnum setup lol12:50
born2bakebut as far as I know, in kolla ansible train magnum version is 9.2.012:50
guilhermespand the heat version?12:51
brtknrttsiouts: hello im here12:52
born2bakedocker exec -it kolla/ubuntu-source-heat-engine:train heat --version - 1.18.012:53
brtknrborn2bake: can you upload your logs?12:56
brtknrborn2bake: ssh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 133712:56
born2bakesurely I will do12:57
brtknrcosmicsound: same for you^12:57
strigaziping ttsiouts12:58
ttsioutsI'm here13:01
ttsioutsbrtknr: I really like your idea13:02
ttsiouts:)13:02
ttsioutsI just wanted to discuss with both of you a bit more13:02
brtknrttsiouts: great to hear =)13:02
*** udesale_ has joined #openstack-containers13:03
brtknrI think someone else has suggested this before but on the client side13:03
brtknre.g. by reading the cluster template labels and applying merge/override based on a flag that the server never sees13:04
ttsioutsthis solution though would not allow proper tracking of the labels that were provided at creation time.13:05
*** udesale has quit IRC13:05
brtknrttsiouts: yes i agree13:06
ttsioutsso having this option server side is what makes it work for this use case too.13:06
strigaziI think someone else has suggested this before but on the client side: HARD NO13:06
brtknrI thought it was this but looks like its a different implementation: https://review.opendev.org/#/c/65741013:07
strigazihttps://review.opendev.org/#/c/657435/ This is the patch13:07
brtknrthat looks similar13:08
strigaziThey are duplicate13:08
brtknreither way, the merge takes place on the client side13:08
strigaziHaven't we rejected this?13:09
brtknryes, i was basically trying to point out that my suggestion is not 100% original :)13:10
ttsioutsok we agree that the merge should be done server side in order to allow proper tracking of client input13:10
brtknr+113:11
ttsioutsdo you also agree that the labels field (in a cluster or nodegroup) should contain only the labels provided at creation?13:13
ttsioutswhich means that we have to persist the flag too.13:14
brtknrat cluster/nodegroup creation?13:14
ttsioutsbrtknr: yes13:14
brtknragree13:14
ttsioutsstrigazi ?13:14
strigaziyes13:15
strigaziargee13:15
strigaziagree13:16
brtknrttsiouts: e.g. after the cluster is created, --merge-label flag cannot be modified you mean right?13:16
brtknrvia the API13:16
ttsioutsbrtknr: yes13:16
ttsioutscool13:17
ttsioutssorry for going step by step but I want this to go forward as soon as possible13:18
ttsiouts:)13:18
strigaziwe are picky, so this ^^ is the only way13:18
ttsiouts:)13:19
brtknrttsiouts: any more resolutions to pass ? :)13:20
ttsioutsshould we also agree on the flag and the field name?13:20
ttsiouts--merge-labels and a boolean field in DB called merge_labels?13:21
brtknri am happy with merge-labels but open to other suggestions13:22
brtknrother ideas: --combine-labels, --smash-labels, --update-labels, --inherit-labels, --override-labels13:23
strigazioverride is probably what we want13:23
strigaziin OOP you override a method13:23
brtknrsecond thing to agree on is whether the current behaviour is override-labels=True or False13:25
strigazianimal.get_features() and dog.get_features()13:25
ttsioutsoverride though means not using what's inherited right?13:25
ttsioutsbrtknr: yes13:25
ttsioutsif we go with override then false should mean merge right?13:26
brtknrI think the current behaviour is --override-labels=True13:26
ttsioutsbrtknr: exactly13:26
brtknrSince nothing is inherited13:27
strigaziactually, the current behaviour is both. Because:13:27
cosmicsoundbrtknr , will upload logs13:28
brtknrIt would be good not to have to specify --override-labels=False as an opt-in flag13:28
strigaziin the API we do https://github.com/openstack/magnum/blob/master/magnum/api/controllers/v1/cluster.py#L47513:28
brtknrwould prefer to supply "--opt-in-flag" only if True13:28
strigazias brtknr wants13:29
brtknrstrigazi: I see your point13:30
ttsioutsstrigazi: indeed13:30
cosmicsoundbrtknr , https://seashells.io/v/VzqW9UYW13:31
brtknr--override_labels = False if cluster.label ==  wtypes.Unset else True13:31
strigazithis boolean is strange because, in the new API version we want override always True and in the old API always False.13:32
cosmicsoundWill update session as it changes i test newer versions13:32
brtknrhmm this is more of a rabbit hole than I realised :)13:33
strigaziAnother  option (a bad one) is the default API is not the actual latest which means (use always only the cluster labels), and the new API uses always both (override = True)13:35
strigaziThe problem is that in the cli we always ask the latest API version.13:35
strigazioverride=true == (get CT labels and C labels) && (get CT labels and C labels and NG labels)13:38
strigazioverride=False == (get C labels) && (get NG labels)13:38
strigazifor POST cluster and POST NG respectively13:38
strigazidefault logic override=true13:39
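The two cases listed just above can be sketched in Python as a layered lookup. This is only an illustration of the semantics as stated in the channel, not Magnum's implementation: the function name `effective_labels` is hypothetical, the label values are made up, and the existing fallback to cluster template labels when a cluster supplies none is deliberately not modelled.

```python
from typing import Dict, Optional

Labels = Dict[str, str]


def effective_labels(override: bool, *layers: Optional[Labels]) -> Labels:
    """Resolve labels from layers ordered least to most specific.

    For POST cluster the layers would be (template, cluster); for POST
    nodegroup they would be (template, cluster, nodegroup).
    """
    if not override:
        # override=False: use only the labels of the object being created
        return dict(layers[-1] or {})
    # override=True: start from the template and let each more specific
    # layer replace keys it redefines, leaving the inputs untouched
    merged: Labels = {}
    for layer in layers:
        merged.update(layer or {})
    return merged


template = {"kube_tag": "v1.16.0", "etcd_tag": "v3.4.6"}
cluster = {"kube_tag": "v1.17.4"}
nodegroup = {"availability_zone": "nova"}

print(effective_labels(True, template, cluster))
print(effective_labels(False, template, cluster, nodegroup))
```

Building a fresh dict rather than updating the template's labels in place keeps the labels supplied at creation time separate from the computed result, which is the tracking requirement raised earlier in the discussion.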
ttsioutsIMHO the True boolean option should reflect the new functionality.13:39
strigazi+10 ^^13:40
strigaziThe issue to address is: The old client will send Unset for override and latest for the API microversion.13:41
strigaziTo solve this: The API can have default logic override=false13:42
strigaziand the new client send true by default13:42
strigazifor UX experience13:42
brtknror override=True if labels is defined else False?13:43
brtknris override is Unset13:43
strigazi(UX includes experience) :)13:43
brtknrif override is Unset13:43
strigazi-1 to that13:43
strigaziThe new API should not check if Unset13:44
strigazithe old API microversion will do what you just mentioned13:44
strigazibecause it is not supposed to know about override13:44
brtknrstrigazi: ok IDK the fine details of how API microversion works atm13:45
cosmicsoundcloud_provider_tag=v1.15.0 should work for k8s v1.17.4 ?13:45
strigaziyes ^^13:45
brtknrcosmicsound: yes thats what I use13:45
guilhermespit is the default for v1.17, right?13:46
guilhermespcloud_provider_tag=v1.15.0.13:46
strigazibrtknr: ttsiouts: let's break backwards compatibility?13:47
strigazibrtknr: ttsiouts: let's break the API13:47
brtknrstrigazi: hmm?13:47
strigaziwe document and users open many tickets, at CERN they open many anyway :)13:47
brtknrstrigazi: not sure if you are being serious :)13:49
strigaziactually desperate13:49
strigazi :)13:49
cosmicsoundError: Unable to update cluster. when trying to resize cluster13:49
cosmicsoundisnt this supposed to work?13:49
cosmicsoundmaking from 1 node 2 3 nodes13:50
ttsioutsstrigazi, brtknr: let's think about this. this property is immutable. meaning that it is false for all the existing clusters13:50
ttsioutswe need to describe this with one word.13:51
brtknrthe way I see it, it only makes sense to evaluate this flag if labels is not empty at cluster scope or nodegroup scope13:52
ttsioutsbrtknr: +113:52
brtknrif labels is empty and this flag is True, the API should return an error13:52
brtknrthis should be backward compatible too13:52
ttsioutsI agree13:52
brtknrstrigazi: ^13:53
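The rule proposed above (evaluate the flag only when labels are given, and reject the flag when labels are empty) could look roughly like the following. This is a sketch under assumptions: the names `validate_merge_flag` and `InvalidLabelsRequest` are hypothetical, and the real API would raise its own HTTP error type.

```python
from typing import Dict, Optional


class InvalidLabelsRequest(ValueError):
    """Hypothetical stand-in for the API's bad-request error."""


def validate_merge_flag(labels: Optional[Dict[str, str]],
                        merge_labels: bool = False) -> bool:
    """Return the flag value to persist, or raise if the request is inconsistent."""
    if merge_labels and not labels:
        # Nothing to merge: asking for a merge without labels is an error
        raise InvalidLabelsRequest("merge flag set but no labels were provided")
    # The default stays False, so requests from older clients behave as before
    return bool(merge_labels)
```

Because the default is False, a request that never mentions the flag passes validation unchanged, which is what keeps this backward compatible.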
strigazithinking13:55
brtknr-\|/-\|/-13:55
strigaziAnd the default is True?13:56
strigaziboth API and cli13:57
strigazi?13:57
brtknri think the "opt-in" flag should signify the merge action13:57
brtknrI am starting to actually prefer the sound of combine13:58
ttsioutsthe default shouldn't be true13:58
brtknr--combine-labels13:58
brtknrso it should be False13:58
brtknr^13:58
*** dave-mccowan has joined #openstack-containers13:59
ttsioutsIn the code it will be a dict.update14:01
strigazibrtknr: So the new improved logic won't be available by default to users, correct?14:01
ttsioutsshould we go with update?14:01
ttsioutsupdate-labels14:01
brtknrstrigazi: not by default14:02
brtknrusers will need to work for it14:02
brtknrearn their keep14:02
strigaziIs this what we want?14:02
brtknrstrigazi: doesnt make sense to break current default behaviour suddenly does it?14:03
strigaziI think that what currently is described in the spec is simpler (one param less and more verbosity) for user and more code for us, thoughts?14:03
strigazidoesnt make sense to break current default behaviour suddenly does it? it doesn't14:04
brtknri disagree that it is one less param, since we are adding a new field14:04
brtknrits the same number of params14:05
brtknrttsiouts: i am also fine with update-labels14:06
brtknrttsiouts: i am also fine with update-labels but it is less obvious14:07
strigazibrtknr: true, same number of params.14:07
brtknrin fact i would argue that the purpose of dict.update is not entirely clear to a new user14:07
strigaziSo, to have the same functionality with the SPEC (and not change default behavior). override-labels false by default. And we are covered, correct?14:10
brtknrthe purpose of this flag is to avoid mutually exclusive population of labels/override_labels(from the current spec)14:10
*** dave-mccowan has quit IRC14:10
brtknrif override-labels==combine-labels, correct :)14:11
strigaziexactly14:11
brtknrif override-labels==combine-labels==update-labels, correct :)14:11
strigazi*-labels14:11
strigazidoesn't matter14:12
strigazithe new name14:12
strigaziwhatever it is14:12
brtknrwe could do a survey on ML?14:14
brtknror send a link to a survey14:14
strigazilet's conclude in the rest and leave the name out14:14
brtknrok14:14
strigaziSo it will be a new bool14:15
ttsioutsyes14:16
brtknr+114:16
strigaziit will be false by default14:16
ttsioutsyes14:17
brtknrfalse by default if labels is defined14:17
ttsioutswe need a default value for the flag in the DB14:17
ttsioutsand it should be false no matter what for existing cluster14:18
ttsioutsthe default should be false and it will be evaluated only if labels are provided.14:19
ttsioutsdoes it make sense?14:19
brtknrttsiouts: yes that makes sense to me14:19
strigazibrtknr: > false by default if labels is defined | what about when labels == unset?14:19
strigazibrtknr: false as well, no?14:20
brtknrwhat ttsiouts said14:20
brtknrstrigazi: yes14:20
brtknrlets keep it simple and leave it as false14:20
strigazisolved?14:22
brtknrttsiouts: ?14:22
brtknrstrigazi: do you guys use https for keystone at CERN?14:23
strigaziyes14:23
brtknrhow did you make this work for fedora coreos?14:23
strigazifor everything user facing14:23
ttsioutsI'm just thinking about the migration for the existing clusters. we should have something simple and not checking if things are set or not14:23
strigaziI think false for all existing clusters is fine. I don't see why we need to distinguish it14:24
brtknrstrigazi: a user reported this yesterday: Apr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to14:24
brtknrhttps://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net',14:25
brtknrport=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no14:25
brtknrcertificate or crl found (_ssl.c:4232)')))14:25
strigazittsiouts: I think false for all existing clusters is fine. I don't see why we need to distinguish it14:25
strigazibrtknr: openstack_ca_file14:25
strigazibrtknr: in magnum.conf14:25
ttsioutsstrigazi: cool for me14:25
brtknrstrigazi: ok cool thanks14:26
brtknrttsiouts: look forward to the updated spec14:26
brtknrcosmicsound: ^^14:26
strigazittsiouts: brtknr: I hope flwang1 doesn't have another idea14:26
brtknrstrigazi: me too :P14:27
ttsioutshaha14:27
ttsioutsstrigazi, brtknr: thanks guys! I'll update the spec14:27
brtknr.X,14:27
brtknr^that is a crossed finger14:27
strigazibrtknr: ttsiouts: name:14:29
strigazibrtknr: ttsiouts: https://helm.sh/docs/helm/helm_install/ For example, if both myvalues.yaml and override.yaml contained a key called ‘Test’, the value set in override.yaml would take precedence:14:29
ttsioutswe go with override? It's ok for me14:30
strigazibrtknr: ^^14:30
brtknr strigazi: im okay with override14:30
brtknryankcrime: ^^  please read the bit about openstack_ca_file14:31
ttsioutsit's the name of the spec and my jira ticket :P14:31
brtknri am getting used to it14:32
strigazittsiouts: override?14:32
ttsioutsstrigazi: yes14:32
strigazibrtknr: yankcrime: https://docs.openstack.org/magnum/latest/configuration/sample-config.html [drivers] openstack_ca_file Path to the OpenStack CA-bundle file to pass and install in all cluster nodes.14:33
*** ricolin has joined #openstack-containers14:39
*** ttsiouts has quit IRC14:42
yankcrimebrtknr: 👀14:45
yankcrimeoh is this because fedora coreos doesn't ship a cacert bundle for public / commercial CAs?14:45
strigaziyankcrime: no, because pyhton14:46
*** ttsiouts has joined #openstack-containers14:46
strigaziyankcrime: no, because python14:46
yankcrime:(14:47
strigaziyankcrime: it should work, you have an ok cert14:48
strigazifrom Sectigo14:48
*** ttsiouts_ has joined #openstack-containers14:49
*** ttsiouts has quit IRC14:49
yankcrimestrigazi: it's a letsencrypt issued cert and we still see that error that brtknr described14:50
strigaziyankcrime: not sure why you see it: podman run -it --rm --entrypoint /usr/bin/python docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))"14:53
strigazi<Response [200]>14:53
strigaziseems to work14:53
strigaziyankcrime: this is how the agent runs: https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_coreos_v1/templates/fcct-config.yaml#L19414:59
brtknrstrigazi: yankcrime has 9.2.0 release14:59
brtknrstrigazi: is this patch relevant: https://review.opendev.org/#/c/709777/15:00
brtknrthis patch is only available in 9.3.0 release15:01
strigazimaybe yes, if /etc/pki/ca-trust/source/anchors/openstack-ca.pem has something bad inside15:01
*** ttsiouts_ has quit IRC15:02
strigazibrtknr: if this file doesn't exist the error is: OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/pki/ca-trust/source/anchors/openstack-ca.pem15:03
strigazitest with: podman run -it --rm --entrypoint /usr/bin/python3 --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))"15:03
strigaziyankcrime: brtknr: ^^15:04
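The two failure modes discussed above (CA file missing versus CA file present but with something bad inside) can be told apart without any network access. A minimal sketch using only the Python standard library; the function name `check_ca_bundle` is hypothetical, and mapping its results onto the exact errors reported by requests inside the agent container is an assumption:

```python
import ssl


def check_ca_bundle(path: str) -> str:
    """Return 'ok', 'missing' or 'invalid' for a CA bundle file."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Ask OpenSSL to load the bundle the same way a TLS client would
        context.load_verify_locations(cafile=path)
    except FileNotFoundError:
        # The path does not exist at all
        return "missing"
    except ssl.SSLError:
        # The file exists but holds no usable certificate, e.g. an empty
        # file gives "no certificate or crl found"
        return "invalid"
    return "ok"


print(check_ca_bundle("/etc/pki/ca-trust/source/anchors/openstack-ca.pem"))
```

Run on a master node, an "invalid" result would point at an empty or malformed openstack-ca.pem rather than at the Keystone certificate itself.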
brtknrstrigazi: you're right15:06
born2bakebrtknr http://paste.openstack.org/show/791740/ - calico, coreos ; same on flannel. its with having loadbalancers added.15:08
born2bakeso it failed but load balancers show online http://prntscr.com/rut6b5 and they can ping machines15:09
brtknrstrigazi: not sure what other explanation there is15:11
brtknri will try running these on yankcrime's compute.sausage.cloud15:12
*** rcernin has quit IRC15:13
brtknrborn2bake: please run 9.3.0, there is a patch for TimeoutRestartSec15:17
born2bakedo I need to add label tag or something when I do 9.3.0?15:18
brtknrNo label required15:19
brtknrTimeoutRestartSec default value is 90 seconds, we have increased this to 60015:19
*** ttsiouts has joined #openstack-containers15:27
brtknrborn2bake: in 9.3.0 release15:30
born2bakeit would take some time cause I ve no idea how to create custom magnum containers in kolla so I can have the latest version :)15:31
brtknrborn2bake: you dont need to build it, the image should be usable as train tag: https://hub.docker.com/r/kolla/centos-binary-magnum-conductor/tags15:34
brtknralthough i think the CI is broken15:35
born2bakea632c4d94216        kolla/ubuntu-source-magnum-conductor:train             "dumb-init --single-…"   3 days ago          Up 3 days                                                   magnum_conductor15:35
born2bake1a200061c45b        kolla/ubuntu-source-magnum-api:train                   "dumb-init --single-…"   3 days ago          Up 3 days                                                   magnum_api15:35
born2bakei have ubuntu-source-train15:35
brtknrno wait it finally merged: https://review.opendev.org/#/c/716339/15:36
born2bakeand then just run reconfigure?15:36
brtknryou might have to wait till tomorrow because i think they build the image every 24 hours15:36
brtknrstrigazi: if i run that command you shared as sudo, with --privileged flag, i can reproduce the problem15:56
brtknrstrigazi: e.g sudo podman run  -it       --name heat-container-agent-dupe         --privileged         --volume /etc/:/etc/         --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem --net=host --rm         docker.io/openstackmagnum/heat-container-agent:ussuri-dev python3 -c "import requests ; print(requests.get('https://compute.sausage.cloud:5000/v3'))"15:59
brtknrbut with the REQUESTS_CA_BUNDLE patch, no issues15:59
brtknryankcrime: you need this patch in conclusion https://review.opendev.org/#/c/704739/2/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json16:00
brtknrborn2bake: yes reconfigure but as i mentioned in the openstack-kolla channel, i dont think the images have been built yet, according to dockerhub the last train image was built 13 days ago16:04
brtknrah sorry you are using ubuntu-source16:05
brtknrits possible 9.3.0 is available in there then16:06
brtknrone caveat is that we forgot to merge zincati auto-update disable patch16:06
brtknryou might therefore be better off using master branch for magnum16:06
brtknryou might therefore be better off using master tag for magnum16:07
brtknrthe side-effect of zincati is that for fedora coreos, heat-container-agent restarts16:07
*** udesale_ has quit IRC16:08
*** ykarel is now known as ykarel|away16:10
born2bakebrtknr ok I will try binary containers then. Also, as I mentioned previously, just created flannel 1 master 1 node cluster....run kubectl scale deployment test-autoscale --replicas=100 - http://paste.openstack.org/show/791752/ (autoscaler, autohealer, cloud manager crashing, node is created in stack though but not added)16:12
brtknrborn2bake: not sure why, they work for me16:14
brtknrborn2bake: ubuntu-source may have the correct version16:15
brtknras it was built 9 hours ago16:15
born2bakehow do I check magnum version in container?16:15
born2bake[magnum@sova magnum-base-source]$ ls16:19
born2bakemagnum-9.2.016:19
born2bake[heat@sova heat-base-source]$ ls openstack-heat-13.0.016:19
born2bakeI will try to use binary master16:22
born2bakebrtknr which one would you suggest? centos/ubuntu-binary/source-master?16:22
yankcrimebrtknr: ok will get it applied16:27
yankcrimetomorrow at this rate16:27
*** ttsiouts has quit IRC16:40
brtknrborn2bake: ubuntu-source master should also be fine16:48
cosmicsoundborn2bake , use virtio instead of scsi if case16:50
cosmicsoundit helped me on my failed scripts16:51
cosmicsounduse heat_tag: master magnum_tag: master and reconfigure16:51
born2bakeas I said, when I use virtio, image doesnt have enough entropy /dev/random and cant generate ssh keys fast. so it takes around 20 minutes for machine to boot :)16:51
cosmicsoundmake sure disks are on virtio16:51
cosmicsoundhmm16:51
cosmicsounddid you add the other one i mentioned?16:52
born2baketherefore, I am sticking to fedora-coreos images cause they are fine and newer16:52
born2bakeyes, I ve tried all :)16:52
cosmicsoundi too work now on coreos16:52
cosmicsoundand works good for me16:52
born2bakehave you tried autoscaler?16:52
born2bakeits crashing for me for some reason16:53
*** ttsiouts has joined #openstack-containers17:13
*** ttsiouts has quit IRC17:18
cosmicsoundborn2bake , i tried it17:22
cosmicsoundit made me scared when it lowered my servers17:22
cosmicsound:D17:22
cosmicsoundI did not try upscaling yet, it was downscaling by itself17:22
born2bakenone-k8s servers? :)17:22
cosmicsoundall :D17:30
cosmicsoundborn2bake , used sonobuoy?17:31
cosmicsoundanyone know how i start it?17:31
*** k_mouza has quit IRC17:31
*** ttsiouts has joined #openstack-containers17:31
born2bakecosmicsound what version do you have? kolla/ubuntu-source-magnum-conductor:master - [magnum@sova magnum-base-source]$ ls - magnum-9.1.0.dev21217:36
born2bakeI set master, and its even lower than I had17:37
*** ttsiouts has quit IRC17:46
*** vishalmanchanda has quit IRC17:47
cosmicsoundyes born2bake17:57
cosmicsoundthe one with 9.1.0 was working17:57
cosmicsoundalso need the heat master17:57
cosmicsoundi do not tag versions only master or train .17:57
cosmicsoundnumerical tags do not work17:57
cosmicsoundIf I want to edit just a extra label17:58
cosmicsoundI need to recreate the cluster?17:58
born2bakeboth flannel and calico failed for me with master tag :/18:01
born2bakehttp://paste.openstack.org/show/791756/18:01
*** ricolin has quit IRC18:02
*** k_mouza has joined #openstack-containers18:13
*** k_mouza has quit IRC18:14
*** ttsiouts has joined #openstack-containers18:25
*** ttsiouts has quit IRC18:30
born2bakehttp://paste.openstack.org/show/791757/ - flannel, with lb, 2 masters18:31
born2bakecalico failing18:31
brtknrUse etcd_tag=v3.4.618:33
brtknrWith coreos or atomic?18:33
brtknrborn2bake:18:33
born2bakecoreos18:34
brtknrare you using the terraform script?18:34
born2bakehttp://paste.openstack.org/show/791756/ - calico18:34
born2bakeyes terraform from github18:34
brtknrThat is a partial log that doesn’t tell me a lot19:01
brtknrborn2bake:19:02
brtknrborn2bake: it doesn’t say why it failed19:02
brtknrborn2bake: at the end of the log, it says etcd server request timed out19:04
brtknrcheck that etcd is running19:04
brtknrborn2bake: When copying the logs, use the seashells method I described earlier19:05
brtknrit will capture the full log19:06
born2bakessh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 1337 ?19:06
born2bakeOkay I will19:06
brtknrborn2bake: Yes19:06
born2bakeI will change etcd tag and do clusters again19:06
brtknrbut looks like you are using an incompatible etcd version19:06
brtknrwhat is the current version on the terraform script?19:07
born2bakeflannel http://paste.openstack.org/show/791757/ looks like smth with octavia19:07
brtknrin master, you now need a v before the tag19:07
born2bakebranch is up-to-date with 'origin/master'19:07
born2bakein vars.tf right?19:08
brtknrI saw those, it’s not much use because the log is incomplete19:08
brtknrbut I saw etcd timing out at the end19:09
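A minimal sketch of the tag convention brtknr describes above (on master the etcd tag needs a leading v, as in etcd_tag=v3.4.6). The helper name normalize_etcd_tag is hypothetical and is not part of Magnum or the terraform script; it only illustrates the string format, under the assumption that the tag is otherwise a plain version number.

```python
def normalize_etcd_tag(tag: str) -> str:
    """Return the tag with a leading 'v' (hypothetical helper, not a Magnum API)."""
    tag = tag.strip()
    if not tag:
        raise ValueError("empty etcd tag")
    # Leave tags that already start with 'v' untouched; prefix the rest.
    return tag if tag.startswith("v") else "v" + tag
```

For example, normalize_etcd_tag("3.4.6") gives "v3.4.6", and a tag that already reads "v3.4.6" is returned unchanged.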
*** ttsiouts has joined #openstack-containers19:14
born2bakebrtknr flannel, coreos - masters finished successfully https://seashells.io/v/QJcExtc8 ; let me see worker19:21
born2bakeon the worker node there are not even heat-config logs19:23
born2bakenow there is. worker node - https://seashells.io/v/nQvQRffK19:23
born2bakekubectl get node doesn't work on master either19:26
brtknrCan you use tail instead of cat19:28
brtknrlooks like the container agent is still running on the master19:29
cosmicsoundborn2bake , what hw labels you have on the image?19:34
cosmicsoundfor libvirt19:34
born2baketail on master - https://seashells.io/v/uASF984Q19:37
born2bakecosmicsound all ceph rbd related...can't find it now :)19:37
brtknrborn2bake: can you try with your lb disabled?19:47
*** ttsiouts has quit IRC19:48
brtknrdoes that work?19:48
brtknrlooks like your lb for octavia is not reachable19:48
born2bakebrtknr flannel: masters tail - https://seashells.io/v/uASF984Q ; workers - https://seashells.io/v/nQvQRffK ; calico: master - https://seashells.io/v/xN4gyXTD19:48
brtknrfor etcd19:48
brtknrare you sure octavia is configured correctly?19:49
born2bakeas I said, flannel with 1 master and 1 node works19:49
born2bakecant be sure :/19:50
born2bakehttps://ssup2.github.io/record/OpenStack_Stein_%EC%84%A4%EC%B9%98_Kolla-Ansible_Ubuntu_18.04_ODROID-H2_Cluster/ followed that guide for octavia19:50
brtknryou can test octavia ingress controller19:50
born2bakecreated certs, added route to docker hosts "route add -net 20.0.0.0/24 gw 192.168.0.225", then when I create lb's they are fine19:51
brtknrborn2bake: Setting up octavia is complicated, if it works with single master, sounds like problem with your octavia config19:54
born2bakecalico without lb still didn't work but I think I will focus on flannel just now...and see what's wrong with octavia19:54
born2bakeeven though autoscaler/cloud manager are still crashing for me :(19:55
brtknryou can try curling etcd port on the load balancer19:55
born2bakecurl --insecure https://10.0.15.169:2379 curl: (35) error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate19:57
born2bakeI noticed my load balancer does not have floating ip assigned19:57
born2bakeStatus: TCP 2379 Online Active Yes19:57
born2bakehowever, I do have master_lb_floating_ip_enabled = "true" set19:58
born2bakeI think the case might be that my octavia doesn't support tls/ssl19:59
brtknrWhat if you use http instead of https20:03
born2bakecurl: (52) Empty reply from server20:03
born2bakething is it does not create floating ip - lb for etcd. only for 6443 api20:04
brtknrCan you curl the k8s api?20:09
brtknrborn2bake: Anyway have fun investigating, I’m going to bed, I strongly suspect your lb config20:10
born2bakeokay, yeah wd need to do some testing on octavia20:11
born2bakethanks a lot!20:11
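A rough sketch of the reachability check discussed above, assuming all you want to know is whether the load balancer accepts TCP connections on the etcd port (2379) and the k8s API port (6443). The function name port_open is hypothetical. It does not perform a TLS handshake, so it cannot reproduce the "bad certificate" alert from the curl output; that alert likely means the TCP path was already working and the server wanted a client certificate.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it checks TCP reachability only, not TLS or etcd health.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

Calling port_open("10.0.15.169", 2379) and port_open("10.0.15.169", 6443) from a host that can route to the load balancer would show whether either listener is reachable at all before digging into certificates.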
flwang1brtknr: ping, are you there?20:28
*** born2bake has quit IRC21:36
*** ttsiouts has joined #openstack-containers21:44
*** rcernin has joined #openstack-containers22:11
*** ttsiouts has quit IRC22:18
