*** jamesmcarthur has joined #zuul | 00:02 | |
*** jamesmcarthur has quit IRC | 00:06 | |
*** jamesmcarthur has joined #zuul | 00:10 | |
*** jamesmcarthur has quit IRC | 00:27 | |
*** threestrands has joined #zuul | 00:27 | |
*** jamesmcarthur has joined #zuul | 00:39 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add remove-zuul-sshkey https://review.opendev.org/680712 | 00:41 |
*** jamesmcarthur has quit IRC | 00:44 | |
*** jamesmcarthur has joined #zuul | 01:02 | |
*** rlandy has quit IRC | 01:50 | |
*** jamesmcarthur has quit IRC | 01:59 | |
*** jamesmcarthur has joined #zuul | 02:04 | |
*** roman_g has quit IRC | 02:34 | |
*** jamesmcarthur has quit IRC | 03:00 | |
*** jamesmcarthur has joined #zuul | 03:06 | |
*** jamesmcarthur has quit IRC | 03:13 | |
*** jamesmcarthur has joined #zuul | 03:25 | |
*** rfolco has quit IRC | 03:32 | |
*** PrinzElvis has quit IRC | 03:39 | |
*** webknjaz has quit IRC | 03:43 | |
*** dcastellani has quit IRC | 03:45 | |
*** PrinzElvis has joined #zuul | 03:45 | |
*** webknjaz has joined #zuul | 03:46 | |
*** dcastellani has joined #zuul | 03:46 | |
*** jamesmcarthur has quit IRC | 03:57 | |
*** ianychoi_ has joined #zuul | 04:24 | |
*** ianychoi has quit IRC | 04:27 | |
*** pcaruana has joined #zuul | 04:42 | |
*** saneax has joined #zuul | 05:03 | |
*** pcaruana has quit IRC | 05:12 | |
*** bolg has joined #zuul | 05:45 | |
*** bolg has quit IRC | 06:09 | |
*** pcaruana has joined #zuul | 06:21 | |
*** bolg has joined #zuul | 06:26 | |
*** AJaeger has quit IRC | 06:28 | |
*** AJaeger has joined #zuul | 06:35 | |
*** hashar has joined #zuul | 06:42 | |
*** roman_g has joined #zuul | 07:16 | |
*** themroc has joined #zuul | 07:16 | |
*** jpena|off is now known as jpena | 07:37 | |
bogdando | o/ | 07:38 |
bogdando | please merge https://review.opendev.org/#/c/681182/ | 07:38 |
*** avass has joined #zuul | 07:40 | |
*** threestrands has quit IRC | 07:59 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 08:09 | |
*** ianychoi_ has quit IRC | 09:09 | |
*** hashar has quit IRC | 09:38 | |
*** saneax has quit IRC | 10:11 | |
*** saneax has joined #zuul | 10:12 | |
*** ianychoi has joined #zuul | 10:30 | |
*** avass has quit IRC | 10:37 | |
*** ianychoi has quit IRC | 10:45 | |
*** ianychoi has joined #zuul | 10:45 | |
*** shachar has quit IRC | 11:08 | |
*** snapiri has joined #zuul | 11:08 | |
*** sshnaidm|ruck is now known as sshnaidm|bbl | 11:20 | |
*** hashar has joined #zuul | 11:20 | |
*** avass has joined #zuul | 11:30 | |
*** jpena is now known as jpena|lunch | 11:40 | |
*** spsurya has joined #zuul | 11:58 | |
*** rfolco has joined #zuul | 12:06 | |
*** rlandy has joined #zuul | 12:23 | |
*** jamesmcarthur has joined #zuul | 12:25 | |
pabelanger | morning! I wanted to see how we could make a change to nodepool, to not fail if an openstack provider has been configured to not have a public IP: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/openstack/handler.py#L190 | 12:30 |
*** jamesmcarthur has quit IRC | 12:30 | |
pabelanger | we have this use case in one of our network appliances, cisco iosxr, where the management interface can come online (via dhcp) but not have a default route | 12:31 |
pabelanger | so we need to do some very weird things, to make multinode jobs happen | 12:31 |
pabelanger | but this requires both nodes to be on the same subnet with public internet (which is really hard to get these days). | 12:31 |
pabelanger | So, by removing the check above or toggling it, we want to launch a node that only has private IPs, which the zuul executor possibly won't have direct access to | 12:32 |
*** jpena|lunch is now known as jpena | 12:32 | |
*** bogdando has left #zuul | 12:43 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/676464 | 13:11 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow ensure-tox to upgrade tox version https://review.opendev.org/676464 | 13:13 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 13:13 |
sean-k-mooney | i probably should have checked this before pushing: the github driver supports depends-on with the pull request URL, right? | 13:35 |
pabelanger | yes | 13:35 |
sean-k-mooney | so this would work https://review.opendev.org/#/c/681474/ | 13:35 |
pabelanger | yup | 13:35 |
sean-k-mooney | if the intel zuul has that project in its zuul config | 13:35 |
sean-k-mooney | cool, i was expecting upstream zuul to explode because it does not | 13:36 |
sean-k-mooney | oh i forgot to add back in noop to upstream | 13:36 |
*** panda is now known as panda|ruck | 13:40 | |
*** sshnaidm|bbl is now known as sshnaidm|ruck | 13:44 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow ensure-tox to upgrade tox version https://review.opendev.org/676464 | 13:44 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 13:44 |
*** swest has quit IRC | 13:45 | |
*** panda|ruck is now known as panda|rover | 13:45 | |
*** swest has joined #zuul | 13:45 | |
*** bolg has quit IRC | 13:56 | |
sshnaidm|ruck | hi, how can I build new containers with zuul? The current containers on docker.io/zuul have zuul version 3.5.0, which is quite old | 13:59 |
Shrews | pabelanger: interface_ip is not necessarily the public IP. it's whatever IP sdk determines is used for communicating with that server, which might be private: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/cloud/meta.py#L318-L337 | 13:59 |
*** swest has quit IRC | 13:59 | |
sshnaidm|ruck | docker pull zuul/zuul; docker run -it zuul/zuul zuul --version | 13:59 |
sshnaidm|ruck | Zuul version: 3.5.0 | 13:59 |
pabelanger | Shrews: yah, agree. In this case it is a hard check: nodepool needs to find the interface_ip, which here will always be empty. I'd like to skip that check | 14:02 |
AJaeger | sshnaidm|ruck: did you confirm that the content is old - or is it just the version? | 14:02 |
AJaeger | sshnaidm|ruck: the image was updated yesterday, wasn't it? | 14:02 |
sshnaidm|ruck | AJaeger, what do you mean by content? | 14:02 |
sshnaidm|ruck | AJaeger, I need a newer zuul version, at least as new as what's in CI | 14:03 |
AJaeger | I mean: Does it include current git master but use the wrong version number? | 14:03 |
pabelanger | we can look at the publish job to see | 14:03 |
Shrews | pabelanger: i'm confused... how do you expect the zuul executor to communicate with the node then? | 14:04 |
AJaeger | https://hub.docker.com/r/zuul/zuul/tags says "updated 17hours ago" | 14:04 |
pabelanger | Shrews: it doesn't, until the primary node is able to SSH into secondary and setup route | 14:04 |
pabelanger | It needs to be done this way, because the route cannot be obtained via dhcp | 14:05 |
AJaeger | sshnaidm|ruck, pabelanger, this seems to be the job pushing the image, isn't it? http://zuul.opendev.org/t/zuul/build/e174c37169da417da31a85a644ca7976 | 14:05 |
pabelanger | only static | 14:05 |
sshnaidm|ruck | AJaeger, it doesn't include git, it has /usr/local/lib/python3.7/site-packages/zuul-3.5.0.dist-info | 14:05 |
Shrews | pabelanger: so these are nodes allocated via nodepool, but not expected to be used directly by the executor? | 14:05 |
pabelanger | Shrews: they are allocated in nodepool, but zuul / nodepool cannot route to them until a pre-run task in the base job sets up the route | 14:06 |
pabelanger | once that is done, then zuul executor can access it | 14:06 |
zbr | not sure who is generating https://zuul.opendev.org/manifest.json but it would be useful to include zuul version there. | 14:06 |
pabelanger | this is because nodepool isn't able to properly set up the network directly; it needs manual commands | 14:07 |
Shrews | pabelanger: that goes directly against our current design. i don't have any ideas on that one, atm | 14:08 |
pabelanger | we have this working today | 14:08 |
pabelanger | in zuul.a.c | 14:08 |
pabelanger | but, it works because we are using public IPs | 14:08 |
pabelanger | I'd just like to have nodepool not enforce an interface ip | 14:09 |
pabelanger | so I can flip to the Vm to private | 14:09 |
corvus | sshnaidm|ruck, AJaeger: "docker run -it zuul/zuul zuul --version" -> "Zuul version: 3.10.2.dev66" for me | 14:09 |
fungi | pabelanger: how about a reverse nat, where the builder/executor masquerade as a local address on that network when connecting to those addresses? that way the device never needs a default route and just responds via layer 2 (arp or v6nd) resolution | 14:10 |
sshnaidm|ruck | corvus, did you pull from docker.io? | 14:11 |
corvus | sshnaidm|ruck: yes | 14:11 |
sshnaidm|ruck | corvus, me too.. | 14:11 |
pabelanger | fungi: trying to understand reverse nat comment | 14:12 |
sshnaidm|ruck | lemme check on vm, maybe cache.. | 14:12 |
Shrews | pabelanger: And the executor currently runs this pre-run task on the node, right? | 14:12 |
pabelanger | Shrews: on the primary node | 14:12 |
Shrews | oh, i think i see now | 14:13 |
pabelanger | so here is an example | 14:13 |
pabelanger | https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_32/62132/b93674666faa9a116d2a6d160f48557e6c7a454e/third-party-check/ansible-test-network-integration-iosxr-python36/dd0c215/job-output.html#l598 | 14:13 |
pabelanger | we have a pre-run playbook that runs on the primary node (2-node nodeset) | 13:13 |
pabelanger | to ensure it can route to the appliance node (so they are on the same subnet) | 14:14 |
AJaeger | sshnaidm|ruck, corvus, I get the same result as corvus. So, all looks fine here as well - and I pulled from dockerhub. | 14:14 |
pabelanger | then we know we can access it and do things to it | 14:14 |
*** bolg has joined #zuul | 14:14 | |
pabelanger | if we cannot, zuul aborts and retries | 14:14 |
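A minimal sketch of the kind of pre-run check described above, assuming a hypothetical "appliance" inventory group name; nodepool.public_ipv4 is the hostvar zuul's inventory provides, and a pre-run failure is what makes zuul abort and retry:

    # playbooks/pre.yaml (illustrative)
    - hosts: primary
      tasks:
        - name: Verify the primary node can route to the appliance
          command: "ping -c 3 {{ hostvars[groups['appliance'][0]].nodepool.public_ipv4 }}"
          changed_when: false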
pabelanger | however, today we are using public IPs, in vexxhost-sjc1, since they have 1 single subnet of public IPs | 14:14 |
fungi | pabelanger: managed device lives in the "private" network and lacks routes outside that network. executor and launcher live outside the network and have static routes to a router which knows how to forward traffic into that network. last hop also performs layer 3 address translation to map the executor and launcher addresses to local addresses on that network so the device sees connections coming from a local | 14:14 |
fungi | (to it) address | 14:14 |
pabelanger | however, it is a single region we test against. In limestone, we can create a provider network that is a single private subnet, and route between the 2 nodes without default routes | 14:15 |
Shrews | pabelanger: how does the primary node determine the address of the appliance nodes? | 14:15 |
sshnaidm|ruck | corvus, AJaeger thanks, checked on different vm, it's also 3.10, seems like docker cached something | 14:15 |
pabelanger | Shrews: public IP, because nodepool knows it | 14:16 |
pabelanger | when we flip to private, we'll need to manage that via nodepool again | 14:16 |
pabelanger | as it will have the info | 14:16 |
pabelanger | https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_32/62132/b93674666faa9a116d2a6d160f48557e6c7a454e/third-party-check/ansible-test-network-integration-iosxr-python36/dd0c215/zuul-info/inventory.yaml | 14:17 |
pabelanger | is example inventory file | 14:17 |
pabelanger | fungi: so, if I understand, for that to work with multiple clouds, I'd need per-region subnets that don't conflict | 14:18 |
pabelanger | so I know which cloud to route to | 14:18 |
corvus | pabelanger: have you tried https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openstack].pools.host-key-checking ? | 14:19 |
fungi | pabelanger: a twist on that, if you do need conflicting/overlapping private networks, is to also use 1:1 nat in the other direction for those devices | 14:19 |
pabelanger | corvus: yes, we disable that by default. However we hiccup on interface_ip missing | 14:19 |
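For context, a hedged sketch of where that option sits in a nodepool provider config (provider, network, and label names are illustrative):

    providers:
      - name: limestone
        cloud: limestone
        pools:
          - name: appliances
            host-key-checking: false   # skip the ssh keyscan for these nodes
            networks:
              - private-net
            labels:
              - name: iosxr-appliance
                cloud-image: iosxr
                flavor-name: medium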
corvus | pabelanger: why are the nodes not always on the same subnet? can't you make a neutron network and put them both on it? the whole "abort and retry if they aren't on the same subnet" thing seems like it could be very problematic. | 14:24 |
pabelanger | corvus: I don't know, to be honest. I'd have to confirm with cloud providers how to do that, or whether openstack supports it. That would in fact be the easiest solution here. | 14:25 |
*** electrofelix has joined #zuul | 14:26 | |
corvus | pabelanger: i think that's worth looking into | 14:28 |
corvus | pabelanger, Shrews: but back to the interface_ip thing -- | 14:28 |
pabelanger | k, I'll work up email to openstack ML | 14:28 |
corvus | pabelanger, Shrews: it seems like we expect entire clouds to be either public or private, but iiuc, pabelanger has a cloud where he wants to get both public and private vms. so yeah, that's not accommodated by the logic in sdk. even if we set up an override option in nodepool to say "force private", that would still apply at the pool level, not the server level, so there's no way to say "use interface_ip | 14:30 |
corvus | for this server, and private_ip for this other one" | 14:30 |
corvus | pabelanger, Shrews: the only workable option i see is strictly what pabelanger suggested: disable the interface ip check and just return no data. i guess we could do that, but if we add that, we should add a warning saying people almost certainly don't want to enable this flag because it will mask all kinds of very frequent problems. | 14:32 |
Shrews | pabelanger: corvus: i wonder if we can just skip the interface_ip check if host-key-checking is disabled. presumably you don't care about that ip if you're skipping ssh keyscan??? | 14:35 |
corvus | Shrews: hrm, yeah, that may be reasonable | 14:36 |
*** nhicher has quit IRC | 14:39 | |
pabelanger | yah, looking at the code more, if we skip interface_ip when host-key-checking is false, I think that makes sense. | 14:39 |
pabelanger | so, +1 from me :) | 14:39 |
pabelanger | but also asking in openstack too | 14:39 |
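Paraphrased, the proposal amounts to something like this in the openstack launch handler (a sketch of the idea, not the actual patch that later became https://review.opendev.org/681544):

    # nodepool/driver/openstack/handler.py (illustrative shape)
    if not interface_ip:
        if self.pool.host_key_checking:
            # keyscanning is on, so an unreachable node is a hard error
            raise exceptions.LaunchNetworkException(
                "Unable to find suitable IP for server")
        # keyscanning is off: let the node through and assume the job
        # reaches it some other way (e.g. via a primary node)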
*** sshnaidm|ruck is now known as sshnaidm|rover | 14:40 | |
clarkb | pabelanger: so your appliance refuses to set a default route from dhcp? | 14:46 |
pabelanger | clarkb: yup! | 14:47 |
pabelanger | it is terrible | 14:47 |
*** panda|rover is now known as panda|ruck | 14:52 | |
fungi | if the device is a piece of networking gear, that's not entirely uncommon behavior | 14:56 |
fungi | you end up with a lot of flat management networks and/or 1:1 nat when dealing with devices like that | 14:57 |
clarkb | ime the expectation is you manually configure the device and not use dhcp (but that still allows for a default route) | 14:57 |
*** bogdando has joined #zuul | 14:58 | |
bogdando | hi, please merge https://review.opendev.org/#/c/681182/ | 14:58 |
bogdando | Shrews, clarkb: ^^ | 14:58 |
*** bogdando has left #zuul | 14:58 | |
fungi | yeah, or if the device is capable of placing its management address on any network where it has a serial/loopback then it may be able to make that routable via traditional routing protocols | 14:58 |
fungi | that's more common for actual ip routers though, less so for pure ethernet switches | 15:00 |
pabelanger | yah, usually this device is the one providing DHCP to the network, so that is why it is missing (or so I am told) | 15:01 |
pabelanger | this is basically a large hack around the idea of not adding console support into nodepool :) | 15:02 |
clarkb | in this particular case it does seem like you want to create your own network and subnet in neutron and boot all of the instances on that network | 15:04 |
clarkb | then your executor can have an interface on the network too | 15:04 |
clarkb | I believe mordred has said that while vexxhost gives you public networking by default you can still create a network and subnet and router there | 15:04 |
pabelanger | Yup, that is right; when I last tested this we couldn't get nodepool to bring the node online due to the missing interface_ip | 15:06 |
pabelanger | I did get it working with FIPs | 15:07 |
pabelanger | but some clouds don't support that | 15:07 |
pabelanger | (and FIPs have an extra cost) | 15:07 |
*** jamesmcarthur has joined #zuul | 15:08 | |
clarkb | there is a clouds.yaml setting to say the private ip is the ip to use | 15:08 |
pabelanger | my last idea, is going to be using nested virt for the appliance. But really trying to avoid doing that | 15:08 |
clarkb | if nodepool is checking that that is reachable it would still fail though | 15:08 |
pabelanger | clarkb: would I need to create a new pool for that? I mostly only want this 1 VM to be set up with private | 15:09 |
clarkb | you might have to set up a new provider for that since it is a clouds.yaml setting | 15:10 |
pabelanger | k, let me look into that. that might complicate things on the nodepool config side, but it's also an option | 15:10 |
clarkb | pabelanger: https://docs.openstack.org/os-client-config/latest/user/network-config.html | 15:11 |
clarkb | I think you set routes_externally on the network to true | 15:11 |
*** chandankumar has quit IRC | 15:11 | |
clarkb | then you'll get that IP back as "the ip" from sdk in nodepool | 15:11 |
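A sketch of the clouds.yaml network settings being described, per the os-client-config docs linked above (cloud and network names are illustrative):

    clouds:
      limestone:
        networks:
          - name: private-net
            routes_externally: true   # sdk then treats this network's IP as the one to use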
pabelanger | that might actually not be bad, if that is the case | 15:12 |
*** chandankumar has joined #zuul | 15:12 | |
pabelanger | okay, let me test that | 15:12 |
*** Goneri has joined #zuul | 15:15 | |
fungi | i strongly suspect openstack doesn't want to provide a means to request/guarantee provider network affinity, and expects you to create a network instead if you need that | 15:16 |
clarkb | ya that's the whole point of being able to configure networks and subnets yourself | 15:17 |
Shrews | clarkb: we good to merge https://review.opendev.org/681182 for bogdando? Not sure why it wasn't approved before... | 15:20 |
clarkb | Shrews: I don't know why tristanC didn't approve, but ya aiui we test the multinode roles fairly well so if tests pass I would expect that to be working and can be approved | 15:22 |
clarkb | Shrews: do you want to +A or should I? | 15:22 |
Shrews | i'll go ahead | 15:22 |
Shrews | just wanted to make sure we weren't waiting for something | 15:23 |
*** bolg has quit IRC | 15:23 | |
*** themroc has quit IRC | 15:26 | |
*** igordc has joined #zuul | 15:31 | |
*** mattw4 has joined #zuul | 15:34 | |
*** mattw4 has quit IRC | 16:04 | |
*** mattw4 has joined #zuul | 16:04 | |
openstackgerrit | Merged zuul/zuul-jobs master: Fix evaluating nodepool_ip and switch_ip facts https://review.opendev.org/681182 | 16:05 |
*** mattw4 has quit IRC | 16:10 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/676464 | 16:16 |
*** hashar has quit IRC | 16:18 | |
*** chandankumar is now known as raukadah | 16:31 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul master: For pre blocks to wrap text https://review.opendev.org/681532 | 16:32 |
pabelanger | clarkb: that seems to work, but as I guessed, it disrupts all other VMs that are also attached to that network. For some reason, openstacksdk is returning that as the interface IP, even while a public network exists and routes externally. I'm guessing there is no ordering involved when 2 networks have that setting enabled | 16:38 |
pabelanger | but, will look more into openstacksdk | 16:38 |
clarkb | pabelanger: ya you need to use a separate provider with different clouds.yaml profile | 16:40 |
clarkb | for the ordering thing you can specify the public network as routes_external false probably | 16:41 |
clarkb | or only boot the instance with a single network | 16:41 |
pabelanger | yah, I have complex network requirements and limited provider networks | 16:42 |
pabelanger | if routes_externally could be passed via nodepool.yaml file, that would work | 16:43 |
clarkb | well in this case you are wanting to not use provider networks and instead use user configured networks | 16:43 |
pabelanger | but we'd need to grow support I think | 16:43 |
clarkb | hrm? | 16:43 |
clarkb | nodepool allows you to specify which networks to attach | 16:43 |
pabelanger | yup, but I don't think we expose the setting of the network | 16:43 |
clarkb | nodepool does | 16:43 |
pabelanger | oh | 16:43 |
clarkb | https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[openstack].pools.networks | 16:44 |
*** jpena is now known as jpena|off | 16:44 | |
SpamapS | corvus: I grabbed the nodepool task on https://storyboard.openstack.org/#!/story/2006516 .. but .. feels like the task list isn't quite in line with the story. If I understand the story right, this is mostly about making the DB optional and allowing external settings. Yes? | 16:46 |
pabelanger | clarkb: isn't that just the name of the network? | 16:47 |
pabelanger | not the settings for it | 16:47 |
clarkb | pabelanger: correct the settings go in clouds.yaml | 16:49 |
corvus | SpamapS: i think the first 2 tasks are pre-reqs for the rest (or, well, at the very least, the zk task is a pre-req for nodepool) | 16:49 |
clarkb | pabelanger: what you need to do is have two profiles in clouds.yaml with different settings for the same network. Then have two providers in nodepool using different clouds.yaml profiles | 16:50 |
pabelanger | clarkb: okay, yah, that is still the 2 provider rule | 16:50 |
clarkb | yes | 16:50 |
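A minimal sketch of that two-profile arrangement, with illustrative names: the same cloud appears twice in clouds.yaml with different network settings, and nodepool defines one provider per profile:

    # clouds.yaml (auth sections omitted)
    clouds:
      limestone-public:
        networks:
          - name: private-net
            routes_externally: false
      limestone-private:
        networks:
          - name: private-net
            routes_externally: true

    # nodepool.yaml
    providers:
      - name: limestone-public
        cloud: limestone-public
      - name: limestone-private
        cloud: limestone-private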
pabelanger | which does work, I just manually forced it here: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_07/207/19fd14e960c2dbe048cc429c581f594d067252fe/check/ansible-network-iosxr-appliance/b78c8a9/job-output.html#l40 | 16:50 |
clarkb | fwiw we intentionally stopped managing network settings in nodepool and rely on clouds.yaml | 16:51 |
clarkb | so I think that is the correct way to do this | 16:51 |
pabelanger | yah, I'm just not looking forward to doubling my nodepool config from 4 providers to 8, for a single node :( | 16:52 |
corvus | SpamapS: nodepool needs a zk. the story says we should be able to provide zk connection information outside of any operators (like, imagine the IT department already runs a ZK). so the easiest way to get something working in the test job is to set up a zk (which just happens to be run by an operator in the same k8s, but zuul-operator doesn't have to know about that). then tell zuul-operator the zk | 16:52 |
corvus | connection info. later on, we can do more fancy things with the zuul-operator interacting with the zk operator. | 16:52 |
corvus | (and same thing applies to pxc and zuul) | 16:52 |
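Purely as illustration of "provide the zk connection information from outside", the operator's custom resource might carry something like the following; the field names here are hypothetical, not the actual CRD:

    apiVersion: operator.zuul-ci.org/v1alpha1   # hypothetical
    kind: Zuul
    spec:
      zookeeper:
        hosts: zk.example.com:2181   # e.g. a ZK the IT department already runs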
SpamapS | corvus: got it. I'll put my extra cycles into those first few then. | 16:53 |
corvus | kk | 16:53 |
SpamapS | Interesting experience. I recently marked our `gate` pipeline as `supercedes: check`. As a result, PRs are merging with a "pending" status on check. I wonder if we can delete a status. | 17:03 |
*** hashar has joined #zuul | 17:04 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin https://review.opendev.org/680778 | 17:06 |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: Disable interface_ip check, when host-key-checking is disable https://review.opendev.org/681544 | 17:06 |
pabelanger | corvus: clarkb: Shrews: ^that should be the host-key-checking patch we discussed this morning | 17:07 |
pabelanger | if still okay, I'll add some testing around it after I grab some lunch | 17:07 |
corvus | SpamapS: yeah, we might need a new reporter action in zuul to handle that; if you find that the github api supports it, we can do that. | 17:07 |
pabelanger | that would be the least-work approach to make this work for us | 17:07 |
pabelanger | but understand if we don't want to do it | 17:08 |
corvus | SpamapS: (reporter action meaning like "start, failure, success.... superceded") | 17:09 |
pabelanger | clarkb: just thinking about it before getting food: is there no way in clouds.yaml to define my own network name but map it to an existing provider network? I'm guessing not | 17:12 |
clarkb | pabelanger: you create a network in the cloud and use that instead of the provider network | 17:14 |
pabelanger | yah, I don't think I can create a network | 17:15 |
Shrews | pabelanger: i think we need to document the behavior in the host-key-checking doc portion too | 17:15 |
clarkb | mordred: has said you can in vexxhost | 17:15 |
pabelanger | this is limestone where I am testing | 17:15 |
clarkb | I've never done it myself though | 17:15 |
clarkb | you may be able to there too | 17:16 |
pabelanger | trying to figure it out with current resource | 17:16 |
pabelanger | the new provider works | 17:16 |
pabelanger | but that is a lot of overhead, like mirrors, quotas, etc. | 17:16 |
pabelanger | I could get behind a pool, but that is dedicated quotas there too | 17:16 |
pabelanger | Shrews: good idea | 17:17 |
clarkb | why do you need new mirrors? | 17:17 |
clarkb | if the host can't route anyway it's not talking to those | 17:17 |
pabelanger | well, updates for dns entries (if we had region mirrors) | 17:17 |
pabelanger | image uploads will be a thing | 17:18 |
pabelanger | that will have a cost to it | 17:18 |
clarkb | not sure I understand the dns entries problem either. for images considering this is an appliance I assume nodepool isn't uploading those and you are just setting a uuid? | 17:18 |
clarkb | if so then you can use the image that is already there | 17:18 |
pabelanger | yah, appliance side we can reuse, but the controller node is managed by dib | 17:19 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul master: For pre blocks to wrap text https://review.opendev.org/681532 | 17:19 |
clarkb | controller node can be launched on the other network and the limited network | 17:19 |
clarkb | but ya if using another provider that would be another image | 17:19 |
clarkb | maybe we need the concept of a subprovider which inherits images from its parent | 17:20 |
pabelanger | yah, quota is the bigger one honestly. need to share that between providers | 17:20 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Pass zuul_success to cleanup playbooks https://review.opendev.org/681552 | 17:21 |
pabelanger | okay, let me update the nodepool patch and add tests | 17:21 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul master: For pre blocks to wrap text https://review.opendev.org/681532 | 17:21 |
pabelanger | then, maybe switch to vexxhost and create private network | 17:21 |
clarkb | corvus: 681552 passes zuul_success to cleanup playbooks | 17:21 |
pabelanger | then limit which VMs use it | 17:21 |
pabelanger | I can then loop back to other provider | 17:21 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: synchronize: add support for kubectl connection https://review.opendev.org/681553 | 17:22 |
zbr | corvus: i updated https://review.opendev.org/#/c/681532/ -- rephrased and included screenshots before/after, looks ok now? | 17:23 |
tristanC | zuul-maint: https://review.opendev.org/681553 integrates https://github.com/ansible/ansible/pull/62107 so that most zuul-jobs can run in kubernetes | 17:23 |
tristanC | it's not ideal, but i don't know how long it will take for Ansible to support synchronize with kubectl connection. Could someone ask at AnsibleFest? | 17:24 |
clarkb | I think we should be very careful about adding features to ansible that won't work outside of zuul (because people will expect playbooks to work in zuul and outside of zuul) | 17:25 |
clarkb | I think we can ask (I'll be at the dev day Monday | 17:25 |
zbr | tristanC: my experience with Ansible was that if you make a PR it will be reviewed and merged quite fast, or maybe I was just lucky. | 17:25 |
tristanC | clarkb: it's not specific to zuul, it just extends the synchronize connection support | 17:26 |
zbr | tristanC: there is only one ugly aspect of ansible: if you add a new feature, it will only go into the next release. | 17:26 |
zbr | but if you hurry up, there may even be time to slip things into 2.9, not sure. pabelanger probably knows better. | 17:27 |
zbr | i know this because I was upset that they refused my fix to enable "etc-hosts" for the docker_image module, which was missing; they accepted it only for 2.9 and did not want to add it to 2.8 because it counted as a "new feature". | 17:28 |
*** sshnaidm|rover is now known as sshnaidm|off | 17:28 | |
zbr | from my point of view it was a bug: failure to pass an argument to the docker-py module, but it always depends on which angle you see it from. | 17:28 |
zbr | someone's feature is someone else's bug | 17:29 |
clarkb | tristanC: you've imported the code from the PR into zuul right? and that PR hasn't merged. So if we merge that change it will be specific to zuul until that PR merges | 17:29 |
clarkb | and it will not be clear to users that have working kubectl rsync in zuul why it doesn't work with normal ansible | 17:30 |
clarkb | (all of the other zuul changes to ansible prevent actions from being taken so tasks that run outside of zuul should still work) | 17:30 |
pabelanger | zbr: tristanC won't ship in 2.9, too late for that sadly | 17:31 |
tristanC | clarkb: yes that's correct | 17:31 |
pabelanger | but, with 2.9 we can create our own zuul collection if wanted | 17:31 |
tristanC | zbr: pabelanger: it can wait for next or next+1; some feedback on it would be great though | 17:31 |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: Disable interface_ip check, when host-key-checking is disable https://review.opendev.org/681544 | 17:32 |
*** spsurya has quit IRC | 17:32 | |
zbr | i will try to test it because I have a working local cluster and I wanted to deploy zuul locally anyway to learn more about it, so I should be able to help. | 17:32 |
tristanC | clarkb: i understand the concern, but on the other hand, without this we can't use most zuul-jobs on kubernetes | 17:33 |
zbr | tristanC: do you have a small playbook that should test that code? it could help me do the testing. | 17:34 |
tristanC | another solution would be to patch the zuul-jobs fetch roles to not do synchronize, but instead copy the artifacts to a known location and let the base job's synchronize pull them from the nodes | 17:34 |
tristanC | or perhaps we need a zuul-k8s-jobs with variants of the zuul-jobs roles that are known to work with kubectl | 17:35 |
pabelanger | tristanC: clarkb: the way to do this post 2.9 is a collection. We need to start thinking about supporting that in zuul, given most modules will likely be removed from ansible/ansible. That way, we could ship our own zuul_synchronize instead of patching ansible core functionality | 17:36 |
tristanC | zbr: ansible -m synchronize -a 'src=/tmp/dir dest=. mode=push' $pod-name | 17:36 |
clarkb | we only need to replace the pull from test node to executor right? because we can synchronize from executor to a filesystem? | 17:36 |
clarkb | in that case replacing synchronize between executor and test node seems like something to explore | 17:36 |
clarkb | pabelanger: I assume `pip install ansible` (what zuul runs) will get you some sort of usable system? However we can add zuul-ansible-bits to that pip install list too I suppose | 17:37 |
tristanC | clarkb: the zuul-jobs patch is to circumvent the oc rsync command that can't be executed on localhost from untrusted jobs | 17:37 |
pabelanger | clarkb: IIRC, new workflow is pip install ansible, galaxy install collection foobar | 17:37 |
clarkb | tristanC: right that goes into a trusted base job | 17:37 |
pabelanger | pip ansible will be super minimal | 17:37 |
pabelanger | eg: no openstack modules | 17:37 |
tristanC | clarkb: e.g. we can synchronize from a pod to the executor using oc rsync, but that requires a command | 17:37 |
clarkb | tristanC: that is the same with jobs running on openstack VMs (you can't run it from untrusted context) | 17:38 |
tristanC | clarkb: not sure to understand, you can run synchronize: mode=pull from a VM to the executor from untrusted context | 17:38 |
clarkb | oh right you just have to keep the destination in the working dir | 17:40 |
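For comparison, the pull that works from an untrusted context on VM nodes today looks roughly like this (paths follow the usual zuul-jobs shape; zuul.executor.log_root is the permitted destination):

    - name: Pull collected logs back to the executor
      synchronize:
        src: "{{ ansible_user_dir }}/zuul-output/logs/"
        dest: "{{ zuul.executor.log_root }}"
        mode: pull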
clarkb | why is exec'ing oc different than exec'ing rsync in this case? is it because it happens at the module level rather than command ? | 17:40 |
tristanC | that's because zuul authorizes synchronize to do so | 17:41 |
clarkb | got it | 17:42 |
tristanC | could we make a convention that jobs just need to copy their artifacts to a known location on the test instance, e.g. ~/job-logs? then we could have a generic "fetch-logs" role that runs from the base jobs, and it would be easy to make it work for both ssh and kubectl connections | 17:44 |
openstackgerrit | Merged zuul/zuul master: Fix timestamp race occurring on fast systems https://review.opendev.org/680937 | 17:44 |
clarkb | tristanC: I want to say that already exists but not all jobs do it | 17:46 |
clarkb | the conventional location exists I mean | 17:46 |
tristanC | clarkb: indeed there is fetch-output... so we could in theory patch the fetch-* roles to implement a "copy_output_locally" toggle to make them use ~/zuul-output instead | 17:49 |
corvus | tristanC, clarkb: yes, that was an idea that mordred worked on for a little bit, and then others carried on for a little bit, but no one has ever pushed it to completion. | 17:50 |
corvus | in general, the idea was to stop having jobs fetch things from remote nodes, and instead just put them in known directories on the remote nodes and have the base jobs do the copying back | 17:51 |
tristanC | clarkb: corvus: great, thanks for the suggestion, i'll work on that instead, that seems like the best system | 17:51 |
corvus | ++ | 17:51 |
corvus | tristanC: i think the 'fetch-output' role is the centerpiece of that. | 17:52 |
tristanC | corvus: yes, that seems like exactly what we need | 17:53 |
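Under that convention a job-side role stops synchronizing and just stages its files for fetch-output to collect later; a small sketch, with an illustrative artifact path:

    - name: Stage the tarball where fetch-output will collect it
      copy:
        src: "{{ zuul.project.src_dir }}/dist/example.tar.gz"   # illustrative
        dest: "{{ ansible_user_dir }}/zuul-output/artifacts/"
        remote_src: true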
*** nhicher has joined #zuul | 17:56 | |
*** electrofelix has quit IRC | 17:56 | |
*** jamesmcarthur has quit IRC | 18:06 | |
*** jamesmcarthur has joined #zuul | 18:07 | |
*** jamesmcarthur has quit IRC | 18:11 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: Add support for the Gerrit checks plugin https://review.opendev.org/680778 | 18:18 |
*** panda|ruck is now known as panda|ruck|off | 18:21 | |
*** hashar has quit IRC | 18:41 | |
pabelanger | clarkb: welp, I don't think the dedicated provider idea is going to work, because I need to launch 2 types of nodes, controller / appliance. So both are on the private network, but if I enable routes externally, both will get the private ip as interface_ip. | 19:20 |
pabelanger | I can't just move iosxr to the new provider, because I need the ability to do multinode across providers | 19:20 |
clarkb | pabelanger: and the executor isn't sufficient for the controller piece? | 19:21 |
pabelanger | right, we want to test ansible | 19:21 |
pabelanger | and new network connections | 19:21 |
pabelanger | I think we need an option to toggle routes externally via nodepool.yaml | 19:21 |
pabelanger | or allow multi node across provider / pools | 19:22 |
pabelanger | let me see why nodescan is using the private ip | 19:24 |
clarkb | actually this should work fine | 19:25 |
clarkb | what you need is two networks for the controller | 19:25 |
clarkb | one is the private network shared by the appliance the other is your public network | 19:25 |
clarkb | on the controller you say routes external as per normal | 19:25 |
clarkb | on the appliance you say routes external on the private network | 19:25 |
pabelanger | http://paste.openstack.org/raw/775155/ | 19:25 |
pabelanger | that is what I get on controller node | 19:25 |
pabelanger | which should be public / private | 19:26 |
pabelanger | I don't know why yet, it has private ip | 19:26 |
pabelanger | I would expect public | 19:26 |
pabelanger | https://github.com/ansible-network/windmill-config/blob/master/nodepool/nl01.sjc1.vexxhost.zuul.ansible.com.yaml#L173 is the new region | 19:27 |
pabelanger | clarkb: the part I might be missing, is how can I say routes externally, for different networks, per label | 19:27 |
pabelanger | https://github.com/ansible-network/windmill-config/blob/master/nodepool/clouds.yaml.j2#L28 is clouds.yaml | 19:28 |
clarkb | oh I see hrm | 19:28 |
clarkb | I wonder what happens if you tell nodepool to boot the appliance with only a routes_externally false network? | 19:29 |
clarkb | then you don't need different configs. Nodepool's fallback behavior may work in that case? (eg if no external network then use whatever is there?) | 19:29 |
clarkb | I don't know that it does that though | 19:29 |
pabelanger | that works, but no interface ip | 19:30 |
pabelanger | but nodepool doesn't like that | 19:30 |
pabelanger | without the patch above | 19:30 |
pabelanger | however, i am starting to not like that idea, and the inventory file for the appliance doesn't have ansible_host set | 19:31 |
pabelanger | because zuul gets that from interface_ip | 19:31 |
pabelanger | nodepool.private_ipv4 is set however | 19:31 |
clarkb | well that most accurately describes the setup you need | 19:31 |
clarkb | basically you have an unreachable instance on a network somewhere and it comes with a second node that acts as a bridge | 19:32 |
clarkb | in that case maybe we should fallback to "we have no better option so use the private ip" | 19:32 |
pabelanger | yah, I'd actually want the following inventory: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_07/207/19fd14e960c2dbe048cc429c581f594d067252fe/check/ansible-network-iosxr-appliance/b78c8a9/zuul-info/inventory.yaml | 19:32 |
pabelanger | I was able to force that by manually changing clouds.yaml between boots | 19:33 |
pabelanger | today, in nodepool you cannot have a nodeset with one node's interface_ip public and the other's private | 19:33 |
pabelanger | that's really what I'm looking for | 19:33 |
clarkb | pabelanger: where does public ip come from in that inventroy? | 19:33 |
clarkb | there should be no public ip only a private ip in my example | 19:33 |
pabelanger | openstacksdk | 19:34 |
clarkb | (and you'd get no inventory_ip according to your message above) | 19:34 |
pabelanger | because I had routes external true | 19:34 |
clarkb | ah ok | 19:34 |
pabelanger | so, interface_ip on the controller node becomes private too | 19:34 |
pabelanger | then keyscan fails | 19:34 |
clarkb | pabelanger: another way of looking at this is that if you have no external network then zuul can't talk to it so having no ip in the zuul inventory is correct | 19:35 |
clarkb | pabelanger: then your job could generate a new inventory from the supplied private ip and use that in its nested ansible | 19:35 |
clarkb | ya so we want to disable keyscanning (whichI thought we already support) | 19:35 |
pabelanger | clarkb: yah, that way does work. but need https://review.opendev.org/681544/ | 19:35 |
pabelanger | if I apply that, I can do what you said | 19:36 |
corvus | remember also that if the executor can't talk to it (ie, during the pre playbooks), then it's going to be unhappy, so we may not want it in the inventory at all. if that's the case, we may want to look into treating it like a "resource" (ie, what we do for k8s) rather than an item in the inventory. | 19:36 |
pabelanger | then deal with missing ansible_host info via playbooks | 19:36 |
clarkb | corvus: ++ I think the nested ansible (or other job content) needs to figure out how to interact with it rather than zuul | 19:36 |
pabelanger | corvus: yup, today that is what we do: if the node is in the appliance group, no pre-run or run playbooks use it, via hosts: all:!appliance | 19:37 |
pabelanger | then use your write-inventory role to add it to the 1st node's /etc/ansible folder, and control it from there | 19:38 |
pabelanger | zuul-executor doesn't touch it | 19:38 |
pabelanger | (appliance) | 19:38 |
pabelanger | if we didn't have to test ansible, I think we'd be fine with zuul-executor using it directly | 19:39 |
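A sketch of handing the appliance's private address to the nested ansible from the primary node (group name and destination are illustrative; nodepool.private_ipv4 is the hostvar mentioned above):

    - hosts: primary
      tasks:
        - name: Write an inventory pointing at the appliance's private IP
          copy:
            dest: /etc/ansible/hosts
            content: |
              appliance ansible_host={{ hostvars['appliance'].nodepool.private_ipv4 }}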
*** panda|ruck|off has quit IRC | 19:55 | |
*** panda has joined #zuul | 19:57 | |
*** jamesmcarthur has joined #zuul | 20:06 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add enqueue reporter action https://review.opendev.org/681132 | 20:15 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add no-jobs reporter action https://review.opendev.org/681278 | 20:15 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add report time to item model https://review.opendev.org/681323 | 20:15 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add Item.formatStatusUrl https://review.opendev.org/681324 | 20:15 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin https://review.opendev.org/680778 | 20:15 |
pabelanger | clarkb: corvus: Shrews: I've confirmed https://review.opendev.org/681544/ does allow the node to come online properly now, outside of the routes externally issue, if you could review when possible | 20:28 |
pabelanger | also, if you have ideas how to test this in nodepool, aside from updating the devstack test, I can add those tests | 20:28 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Also include nodepool inventory variables https://review.opendev.org/681601 | 20:39 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: fetch-javascript-tarball: introduce zuul_do_synchronize https://review.opendev.org/681603 | 20:41 |
tristanC | clarkb: corvus: should i propose something similar to https://review.opendev.org/681603 for all the other affected fetch roles? | 20:41 |
corvus | tristanC: yeah -- though maybe call it 'zuul_use_fetch_output' with a default value of '{{ zuul_site_use_fetch_output }}' so the ux is that someone sets a site variable that says "my base job uses fetch-output" ? | 20:43 |
corvus | (and i guess that's inverting the boolean, so, default of false instead of true) | 20:45 |
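A sketch of that wiring in a fetch role's defaults and tasks (zuul_site_use_fetch_output is the variable name corvus proposes here, not an existing one):

    # defaults/main.yaml (illustrative)
    zuul_use_fetch_output: "{{ zuul_site_use_fetch_output | default(false) }}"

    # tasks/main.yaml (illustrative)
    - name: Copy the tarball into zuul-output instead of synchronizing
      copy:
        src: "{{ zuul.project.src_dir }}/example.tar.gz"
        dest: "{{ ansible_user_dir }}/zuul-output/artifacts/"
        remote_src: true
      when: zuul_use_fetch_output | bool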
tristanC | corvus: good idea, thanks | 20:54 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: fetch-javascript-tarball: introduce zuul_do_synchronize https://review.opendev.org/681603 | 21:02 |
*** jamesmcarthur has quit IRC | 21:20 | |
*** avass has quit IRC | 21:32 | |
*** saneax has quit IRC | 21:57 | |
*** panda has quit IRC | 22:00 | |
*** panda has joined #zuul | 22:03 | |
*** rlandy is now known as rlandy|bbl | 22:24 | |
*** threestrands has joined #zuul | 22:37 | |
corvus | does anyone have suggestions as to how to convince sphinx to supply more information than this: Warning, treated as error: | 22:49 |
corvus | :1: (ERROR/3) Unknown interpreted text role "class". | 22:49 |
corvus | (that is *literally* all it's telling me. no idea even what file is triggering it. still happens without any changes to .rst files) | 22:50 |
fungi | no clue. did you add some text with a role named "class"? | 22:51 |
corvus | fungi: i didn't -- but even when i revert out the changes to rst files it still happens | 22:51 |
fungi | okay, so something new. yeah line numbers would be nice :/ | 22:51 |
corvus | somehow it seems that changing the python code has caused this? i also haven't changed any docstrings. | 22:51 |
corvus | if i'm parsing the error string correctly, that's: "line number 1 in the empty file" | 22:52 |
fungi | argh | 22:52 |
fungi | is this when trying to locally do `tox -e docs` on the zuul repo? | 22:53 |
corvus | yep, with 680778 checked out | 22:53 |
fungi | seems i need a full set of ansible build dependencies installed to build zuul docs | 22:59 |
corvus | the client doc pages run "zuul" to get the help output | 23:00 |
SpamapS | maybe an upstream library changed that flubs the scraped output? | 23:01 |
pabelanger | okay, super ugly, but I managed to get the job working: https://github.com/ansible/ansible-zuul-jobs/pull/207 | 23:03 |
corvus | SpamapS: if i checkout the change ahead of it it works; so i think it's something about 680778 setting it off | 23:03 |
SpamapS | ah that's handy | 23:04 |
pabelanger | the appliance comes online with a non-routable public IP, but the controller can access it. I need to figure out a better way of updating hostvars with the right private IP, so I can use the write-inventory role. | 23:05 |
pabelanger | Used the replace filter, but really want to replace I think | 23:05 |
fungi | okay, i've gotten my dev env to the point where i can reproduce the opaque sphinx error with change 680778 | 23:08 |
corvus | fungi, SpamapS: okay, by reverting each chunk of the patch in 680778 one at a time and running sphinx, i've found it's related to the addition of the GerritPoller class in gerritconnection.py | 23:08 |
SpamapS | hah I did that too and was about to say that.. ;) | 23:09 |
SpamapS | Luckily it was the 3rd hunk. ;) | 23:10 |
SpamapS | You can also just not mention that class, and it will build correctly. | 23:10 |
corvus | i don't think it's referenced in the docs | 23:11 |
SpamapS | hm, you're right | 23:12 |
*** saneax has joined #zuul | 23:12 | |
SpamapS | but I can also get the docs to build if I revert doc/source/admin/drivers/gerrit.rst | 23:12 |
SpamapS | oh wait no, they just get further | 23:13 |
SpamapS | corvus:the problem is GerritPoller needs a """ """ instead of # | 23:13 |
corvus | it's specifically the line "poller_class = GerritPoller" setting a class variable in GerritConnection | 23:13 |
SpamapS | oh hah every time I think I find it it explodes more | 23:14 |
SpamapS | in different ways.. derp | 23:14 |
SpamapS | yeah | 23:14 |
corvus | okay, i suspect the linkage here is that the testing doc does document FakeGerritConnection, which links to GerritConnection (though that is not documented); but that might be enough to get the sphinx autodoc stuff examining that class | 23:16 |
corvus | why it's treating that instance variable that way is still a mystery | 23:16 |
* SpamapS puts $4.20 on it being some bonghits python parsing corner case | 23:16 | |
*** sanjayu_ has joined #zuul | 23:18 | |
*** igordc has quit IRC | 23:18 | |
corvus | huh, i assumed the "class" in "poller_class" was the "class" it was referring to, but no -- even if i change that varname to "poller_thingy" it still barfs | 23:19 |
fungi | even commenting out the line entirely still breaks | 23:20 |
*** saneax has quit IRC | 23:20 | |
corvus | i'm able to fix it by commenting that line out | 23:21 |
fungi | maybe i'm commenting out a different line | 23:23 |
corvus | fungi: in change 680778 it's gerritconnection.py line 363 | 23:23 |
fungi | also possible i needed to git clean between runs | 23:23 |
corvus | okay, new hypothesis -- testing.rst autodocs FakeGerritConnection with "inherited-members" so it picks up stuff from GerritConnection, which has a class variable that points to another class and sphinx can't handle that. | 23:25 |
fungi | with that line commented out (same line number/file) it breaks on me with the above error | 23:25 |
fungi | maybe there's more you've also commented out? | 23:25 |
corvus | (i explicitly tested reparenting that class from threading.Thread to object to exclude the hypothesis that it's the threading class that's weird) | 23:25 |
corvus | fungi: almost certainly. let me reset to that and see. | 23:25 |
SpamapS | that actually makes sense given the error message | 23:26 |
SpamapS | It's probably looking for `str` or `bytes` or `unicode` and it's getting `class` back from `type(thatvar)` | 23:26 |
fungi | fwiw, a web search on that error mostly shows commits/discussions about developers silencing it | 23:27 |
fungi | though i never found a great explanation of why it was turning up in their cases, this explanation does make sense | 23:27 |
fungi | sphinx basically has a hard-coded list of types it expects | 23:27 |
corvus | fungi: yep, i also had removed it in base.py -- so essentially the error shows up in 2 places | 23:28 |
fungi | i wouldn't be surprised if class objects as classvars confuse it | 23:28 |
corvus | if we remove *both* of those lines it fixes it | 23:28 |
fungi | yep, confirmed | 23:30 |
fungi | i wonder if there's a way to mark variables so that autodoc will skip them | 23:32 |
corvus | well, one way to do that is to prefix them with an '_', which is a perfectly acceptable solution in this case so i think i'll go with that :) | 23:32 |
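In code form, the workaround is just a rename (class names as discussed above); autodoc skips leading-underscore attributes, so the inherited-members pass no longer trips over a class object stored as a class attribute:

    class GerritConnection(BaseConnection):
        # was: poller_class = GerritPoller
        _poller_class = GerritPoller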
*** jamesmcarthur has joined #zuul | 23:32 | |
*** igordc has joined #zuul | 23:33 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add support for the Gerrit checks plugin https://review.opendev.org/680778 | 23:34 |
corvus | fungi, SpamapS: thanks! that was "fun" :) | 23:34 |
fungi | indeed, that's a reasonable workaround i guess | 23:37 |
fungi | it's possible https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#skipping-members could be used as an alternative, but the simple solution seems fine | 23:40 |
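For reference, a sketch of that autodoc-skip-member alternative using the documented hook signature (the skip predicate here is one possible choice):

    # doc/source/conf.py (illustrative)
    def skip_class_valued_attrs(app, what, name, obj, skip, options):
        # skip any documented attribute whose value is itself a class
        if isinstance(obj, type):
            return True
        return skip

    def setup(app):
        app.connect('autodoc-skip-member', skip_class_valued_attrs)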
*** igordc has quit IRC | 23:40 | |
*** igordc has joined #zuul | 23:42 | |
*** jamesmcarthur has quit IRC | 23:46 | |
*** jamesmcarthur has joined #zuul | 23:46 | |
SpamapS | corvus: at least you didn't have to print out the hierarchy this time. ;) | 23:48 |
*** jamesmcarthur has quit IRC | 23:51 | |
*** sanjayu_ has quit IRC | 23:58 |