Tuesday, 2022-11-29

*** rlandy is now known as rlandy|out00:29
*** clarkb is now known as Guest29801:19
*** Guest298 is now known as clarkb01:20
*** atmark is now known as Guest30502:10
*** yadnesh|away is now known as yadnesh04:14
Tenguclarkb: need to read some doc about what's done for pypi in the proxy thing, but I think I get it, more or less. basically I'll have to get the S3 URI, and call the "substitute" in order to rewrite it to some "ansible-galaxy-files" location, to match a new "endpoint" in the proxy config.07:57
TenguI'll work on that.07:57
Tenguhey. wait.07:59
Tenguactually.... there's ALREADY an endpoint!07:59
* Tengu dumb for not checking beforehand07:59
Tenguhttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L127-L13308:00
Tengufungi: :) you actually created -^  via change-id Ib5664e5588f7237a19a2cdb6eec3109452e8a10708:01
*** yadnesh is now known as yadnesh|afk08:11
*** jpena|off is now known as jpena08:23
*** yadnesh|afk is now known as yadnesh08:27
*** rlandy|out is now known as rlandy11:05
*** dviroel|afk is now known as dviroel11:12
*** frenzy_friday is now known as frenzy_friday|rover12:15
fungiTengu: somehow this doesn't surprise me12:42
Tengufungi: same :)12:43
Tengufungi: go get your coffee first :]12:43
fungiaha, yes i guess the tripleo team asked to have it added roughly a year ago12:43
Tengusounds like something matching the votes :)12:43
Tenguand they never put it to use.12:43
fungiso this means they didn't end up using it? i wonder why12:43
opendevreviewMerged openstack/project-config master: Use kolla.config for kolla-ansible in gerrit  https://review.opendev.org/c/openstack/project-config/+/86568612:50
fungiTengu: i guess test it and make sure it's working, so we can adjust it12:51
Tengufungi: yeah, I'll talk with them today during the community call :)12:51
Tengufungi: I've pushed this https://review.opendev.org/c/opendev/base-jobs/+/865970 to make the ansible proxy more "visible"12:52
fungiTengu: would https work better? i have no idea if the ansible-galaxy tool cares either way12:57
Tengufungi: I didn't hit any such issue during testing, but maybe switching to tls would be better.12:57
Tenguespecially since the certificate is valid12:58
fungiwe added let's encrypt to our mirrors more recently than we set those existing envvars in the base job, but if the tool is happy either way it probably doesn't matter12:58
Tengubah, let's switch to TLS12:58
Tenguit's always better imho.12:58
Tenguand future-proof12:59
TenguTLS is in the 4443, isn't it?12:59
fungino, just the regular 44312:59
Tengureally?12:59
Tengufungi: the comment in the mirror config seems to state otherwise... ?13:01
Tengu# Dedicated port for proxy caching, as not to affect afs mirrors.    and 8080, 444313:01
Tengu(among things)13:01
Tengufun...13:02
Tenguoh. ok. /galaxy/ is defined in the BaseMirror13:02
fungithe test_galaxy_mirror test added in the change you referred to just connects to "https://%s/galaxy/" % addr where addr is just a raw ip address13:02
Tenguyep13:02
TenguI wanted to double-check with the apache config itself.13:03
Tengunow I get it: BaseMirror macro defines the galaxy, and is called for 80 and 44313:03
fungithe reason we use that BaseMirror macro is so that we can serve the same things through http on 80 and https on 443 without duplicating the configuration13:03
Tengusame goes for the ProxyMirror macro, but for other ports.13:03
Tenguit's a nice feature from httpd13:04
fungiall the higher numbered ports are for "special" things which can't have subpaths relative to the root path13:04
fungiwe try not to add those when we can help it13:04
fungibut some tools are a bit braindead in their assumptions13:04
Tenguheh - no wonder.13:05
TenguI updated my patch to reference https:// and remove the port.13:05
Tengugood catch anyway, because the :8080 would fail anyway.13:05
*** dasm|off is now known as dasm13:05
fungiahh, yeah, i didn't even spot the :8080!13:06
* fungi takes another gulp of coffee13:06
Tengu;)13:08
Tenguand I guess we can merge my NetworkManager thingy? 3x +2 is good13:08
Tenguah, thanks fungi :). Also thanks for the "-print" vote.13:18
TenguI forgot about that one actually :]13:18
fungiyeah, i still think the df that one adds won't tell you much new since we already log a df (and df -i) at the start of every job13:19
TenguI can remove it13:19
opendevreviewMerged openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf  https://review.opendev.org/c/openstack/project-config/+/86543313:19
fungirunning a df after the mv might give you more insight13:19
Tengulet's do that!13:20
fungisince then you can compare against the one from job start13:20
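
The before/after comparison fungi describes could be sketched roughly as below. This is only an illustration in Python, not what the job actually runs (the job logs plain `df` output); `snapshot` and `freed_bytes` are hypothetical helper names.

```python
import shutil


def snapshot(path="/"):
    """Return (total, used, free) in bytes for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def freed_bytes(before, after):
    """Bytes of free space gained between two snapshots (negative if lost)."""
    return after[2] - before[2]


if __name__ == "__main__":
    start = snapshot("/")
    # ... the move of the /opt contents would happen here ...
    end = snapshot("/")
    print(f"free space change on /: {freed_bytes(start, end)} bytes")
```

Taking the second snapshot after the move is what makes the number meaningful, since the snapshot at job start is already logged.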
Tengulemme correct/amend.13:20
opendevreviewCedric Jeanneret proposed openstack/openstack-zuul-jobs master: Add some output to the `find' command  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/86538313:22
Tengubetter.13:22
Tengufungi: also updated the commit message to mention the zuul-info13:22
*** frenzy_friday|rover is now known as frenzy_friday|rover|food13:43
Tengufungi: what's the ETA to get the first nodepool images built with the NetworkManager config running in the CI?13:45
fungiTengu: images are rebuilt ~daily, and you can see the list of built images at http://nl01.opendev.org/dib-image-list while the list of uploaded images in each provider is http://nl01.opendev.org/image-list13:49
Tenguah, cool! thanks 13:49
fungiTengu: if you want to see the build logs for a particular image, identify the builder it was built on from the dib-image-list and then go to it in a browser, like https://nb01.opendev.org/13:50
Tenguwow. that's neat!13:51
fungii think the zuul info we log from each build may also embed image ids for the nodes, looking now...13:51
TenguI think I've seen it in the zuul-info/13:51
Tengufungi: the "age" is Day:Hours:Minutes:Seconds I guess?13:52
Tenguyep, looks like so13:52
fungicorrect13:53
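
For anyone scripting against the image lists, the Day:Hours:Minutes:Seconds age format confirmed above could be parsed along these lines. A minimal sketch, assuming the field always has exactly four colon-separated integers; `parse_age` is a hypothetical name, not part of nodepool.

```python
from datetime import timedelta


def parse_age(age):
    """Parse a 'DD:HH:MM:SS' age string into a timedelta."""
    days, hours, minutes, seconds = (int(part) for part in age.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    print(parse_age("01:02:03:04"))
```

Converting to a `timedelta` makes it easy to sort images by age or flag ones older than a chosen threshold.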
Tenguseems there are some stalled in "deleting" state :/13:53
fungiand no, i can't seem to find the image id in the logged zuul-info, but if i'm not overlooking it then maybe that's something worth adding13:54
fungiTengu: a fun fact about image deletion. if you use boot from volume for a server instance, you can't delete the image while the server is still running. if a node is held in such a provider, or stuck deleting, then the image it was booted from can't be deleted13:55
fungiwe go through and try to clean them up manually from time to time13:55
Tengufungi: erf..13:55
Tengufungi: so we have the "image-hostname" alongside dib-builddate: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_be0/863872/14/check/tripleo-ci-centos-9-standalone/be07f2c/zuul-info/zuul-info.primary.txt13:55
Tenguthat's the colses I seem to be able to find.13:56
Tengu*closest13:56
fungiright, the dib-builddate could be used to get us close enough to identifying the image used13:57
fungithough actually logging the image id would be even better13:57
Tengui.e. generate the image-id before the actual build, inject it, and use that id while uploading?13:57
fungimore likely plumb it back through the node request to the zuul scheduler and add it to the inventory13:59
Tengu'k. well - I don't know how things are piped in there ;)14:09
*** dviroel is now known as dviroel|lunch16:11
*** frenzy_friday|rover|food is now known as frenzy_friday|rover16:21
clarkbvishalmanchanda: ok updated zuul-jobs patch pushed. We can recheck your change once that comes back green16:29
vishalmanchandaclarkb: sure, thanks.16:29
Tenguclarkb: heya! just saw your comment about the env var for ansible-galaxy proxy - there are some ansible variables already available somewhere?16:40
clarkbTengu: not for galaxy as far as I know. But other things like distro packages mirrors and pypi mirror and so on have roles that configure them16:41
clarkbTengu: there is the base mirror fqdn and then the roles tack on the service specific bits and configure them16:41
clarkblet me find an example of that16:42
Tenguhmm. care to show me? if it's just a matter of adding a role somewhere, and call it, I'd be more than happy16:42
Tengunote that tripleo is also using RDO, so maybe that's why our jobs are relying on that "old" file exposing env vars?16:42
clarkbhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/defaults/main.yaml#L2-L316:43
Tenguoh, and then it's used in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/tasks/mirror.yaml16:44
Tenguoook.16:44
Tenguand, provided the configure-mirrors role is called from within the job, we'll get the proper config directly.. ?16:44
clarkbfor the things that role configures16:45
clarkbI don't think galaxy should be configured by that role16:45
Tengui.e. I can ini_file /etc/ansible/ansible.cfg, and add the galaxy.server key and be off with that?16:45
clarkbbut I wanted to show you an example how you can use the base mirror fqdn to construct a mirror location in an ansible role16:45
Tengu'k16:45
clarkb(really I wish pypi wasn't configured by that role and it only did distro mirrors, but that is a historical artifact that is difficult to change now)16:46
Tenguzuul_site_mirror_fqdn  is something that exists and is available then?16:46
clarkbyes, we set it in opendev. That role is expected to be generic enough to run when it isn't set though hence the omit check16:46
Tenguok. I'll consider it then16:46
Tengujust need to make something that's compatible with RDO infra as well16:46
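
The idea discussed above, building a service location from the base mirror FQDN while still working when no mirror is set, might look like this. A sketch only: `galaxy_server_url` is a hypothetical helper, the `/galaxy/` path is taken from the mirror test mentioned earlier in the log, and the upstream default of `https://galaxy.ansible.com/` is an assumption about what a job without a mirror (for example in RDO) would fall back to.

```python
def galaxy_server_url(mirror_fqdn=None, upstream="https://galaxy.ansible.com/"):
    """Return the mirror's /galaxy/ URL, or the upstream URL when no mirror is set."""
    if not mirror_fqdn:
        # No mirror configured (unset or empty): keep using upstream directly.
        return upstream
    # Mirrors serve /galaxy/ over TLS on the regular 443 port, so no port is needed.
    return f"https://{mirror_fqdn}/galaxy/"


if __name__ == "__main__":
    print(galaxy_server_url("mirror.example.org"))
    print(galaxy_server_url())
```

The fallback branch plays the same part as the omit check in the role: the code stays usable in environments where the mirror variable does not exist.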
clarkbvishalmanchanda: the zuul-jobs update is green16:52
vishalmanchandaclarkb: ack.16:53
*** dviroel|lunch is now known as dviroel17:12
*** jpena is now known as jpena|off17:17
clarkbTengu: fungi: I've been looking at the /opt move and supposedly rsync might be quicker? That doesn't delete on the source which we also want though17:55
clarkbI wonder if the speed ends up equivalent once you add in the delete step after copying17:55
clarkbwe can test this17:57
fungiyeah, that change was more just to get some indication of the current performance before experimenting with alternatives17:58
clarkboh is there an existing change?17:58
opendevreviewClark Boylan proposed openstack/openstack-zuul-jobs master: Test /opt move using rsync  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/86605418:08
clarkbfungi: Tengu  ^ more debugging18:08
clarkbis there a change depending on the parent that I can update?18:09
clarkbhttps://review.opendev.org/c/openstack/devstack/+/858996 is a devstack change I already had for similar purposes I've updated it18:14
fricklerjust note that in general performance of our nodes seems to vary by +/- 50%, so comparing performance needs a large sample size18:22
clarkbyup18:24
clarkbmtreinish had good data on this once upon a time too. And the variance is crazy18:24
clarkbeven when you only look at nodes in a single provider18:24
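
To make frickler's point about sample size concrete, a comparison would want at least a mean, a spread, and a standard error per approach rather than a single run. A minimal sketch using only the standard library; `summarize` is a hypothetical name and the input is whatever list of measured runtimes is available.

```python
import math
import statistics


def summarize(runtimes):
    """Return (mean, sample stdev, standard error of the mean) for runtimes."""
    mean = statistics.mean(runtimes)
    stdev = statistics.stdev(runtimes)  # needs at least two samples
    return mean, stdev, stdev / math.sqrt(len(runtimes))


if __name__ == "__main__":
    mean, stdev, sem = summarize([10.0, 20.0, 30.0])
    print(f"mean={mean:.1f}s stdev={stdev:.1f}s sem={sem:.2f}s")
```

Because the standard error shrinks only with the square root of the number of runs, a spread of around 50% needs many samples before a modest speedup becomes distinguishable from noise.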
fungiclarkb: Tengu's change is https://review.opendev.org/865383 Add some output to the `find' command18:24
fricklerthe other question is do we really need to free the space on /? otherwise we could consider moving /opt/git to /srv/git or whatever and just symlink to that?18:28
clarkbfrickler: jobs hit the 20gb limit on rax all the time18:29
clarkbeven with cleaning out the 10gb of /opt18:29
clarkbthe problem is that /var is used by journald and docker and so on18:30
clarkbmakes it really easy to fill a few gigabytes on /18:30
fricklerhmm, from the flavor I see we should have 40G as root disk, where do you see 20G?18:40
clarkbhrm I thought it was 20GB maybe that is what we end up with free and not total size18:41
clarkbanother thing we can/should look at is trimming the contents of /opt18:41
clarkbthe bulk of the data there is git repos and maybe we've got some git repos we can prune out18:41
clarkbalso maybe the cirros images and friends can be reduced (they are very small already)18:41
frickleron a random node I see 29G of 37G free after the move to /opt has happened. /opt has 13G used. if 16G on / aren't enough (without rming), then IMO jobs need to be fixed18:47
frickleror we need to declare rax unusable for that kind of jobs18:47
clarkb"/bin/sh: 5: time: not found" fyi18:48
clarkbfrickler: it's 16GB after rming though right?18:48
opendevreviewClark Boylan proposed openstack/openstack-zuul-jobs master: Test /opt move using rsync  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/86605418:49
fricklerno, after rming we have 29G free on /. not sure about the original usage, but it can have been at most the 13G now used on /opt18:50
clarkbfwiw the /opt move is limited to openstack jobs. It's not something we do globally in base jobs18:59
clarkbI guess with a bit of testing for explosions we might be able to remove it for openstack as well. But the potential blast radius is quite large19:00
mtreinishclarkb: I think I have a subunit2sql db archived somewhere if people want hard numbers from like 4-5 yrs ago :)19:33
mtreinishlooking through old presentations on the topic I had this image in a slide: https://blog.kortar.org/wp-content/uploads/2022/11/runtime_variance.png19:47
mtreinishbut I don't remember the context of exactly what it was graphing (and the details aren't in the slide besides just saying "Runtime variance")19:47
mtreinishI assume it's just of a random tempest test across all gate runs based on the y axis19:48
Tenguclarkb: ah, i was thinking about rsync as well. though I think find might have been used for potential hidden directories?20:09
Tenguwe can of course discuss tomorrow if you want, I'm on a private device with no access but irc20:10
Tenguclarkb: "time" not found?! errrr.. is it embedded in bash? will check that out tomorrow.20:21
fungiTengu: or we're not installing the package needed to make it available20:22
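
Some background on the error: `time` is a reserved word in bash, but a plain POSIX `/bin/sh` such as dash has no such builtin and needs an external `time` binary, which is usually packaged separately. One way to sidestep the dependency is to measure the elapsed time outside the shell. A rough sketch, assuming a Python wrapper is acceptable in the job; `timed_run` is a hypothetical helper, not something from the change under review.

```python
import subprocess
import sys
import time


def timed_run(argv):
    """Run argv and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(argv, check=False)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # A trivial command stands in for the real mv or rsync invocation.
    code, elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={elapsed:.3f}s")
```

This reports wall-clock time only; it does not break the result down into user and system time the way the shell's `time` does.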
*** swalladge is now known as Guest40121:24
*** dasm is now known as dasm|off21:39
*** dviroel is now known as dviroel|out22:00
clarkbfwiw I think my test change has failed to land on rax but I need to double check that before rechecking22:18
clarkbcaught one https://zuul.opendev.org/t/openstack/stream/37dffb86cfad4ae3b3717f86ed294efc?logfile=console.log22:36
clarkbit's not looking any quicker22:39
clarkb(granted sample size of one)22:39
clarkbI'm not super surprised by that. The bottleneck is almost certainly disk io22:39
*** rlandy is now known as rlandy|out23:51
