Wednesday, 2022-08-31

04:00 *** dasm|off is now known as Guest1645
07:00 *** jm1|ruck is now known as jm1|rover
07:00 <jm1> moin #oooq :)
07:14 <chandankumar> jm1: o/
07:14 <chandankumar> jm1: need any help on ruck rover, do let me know :-)
07:28 <jm1> chandankumar: ok thanks! will ping you when i need help ^^
07:33 *** jpena|off is now known as jpena
08:10 <jpodivin> reviewbot: https://review.rdoproject.org/r/c/rdo-jobs/+/44608
08:19 <jpodivin> reviewbot: add to review list https://review.rdoproject.org/r/c/rdo-jobs/+/44608
08:19 <reviewbot> I could not add the review to Review List
09:13 <jm1> chandankumar: ./ruck_rover.py --release osp17-1 --distro rhel-9 --component all fails for me :/ does it work for you?
09:14 <chandankumar> jm1: yes it fails for me also https://paste.centos.org/view/ed91d40d
09:16 <jm1> chandankumar: yeah that's the error i get as well
09:17 <afuscoar> Hi. One little question: do you have any update about what could be happening in the dashboard that says no data to show (http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/tbUsg0Z4k/downstream-data?orgId=1), thx
09:19 <chandankumar> jm1: https://code.engineering.redhat.com/gerrit/plugins/gitiles/tripleo-environments/+/refs/heads/master/ci-scripts/dlrnapi_promoter/config/RedHat-9/component/rhos-17.1.yaml
09:19 <chandankumar> criteria is empty currently, that's why it is failing
09:19 <chandankumar> I think the ruck/rover tool does not consider this case when criteria is empty
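[editor's note: the empty-criteria crash described above can be guarded with a null-safe lookup. A minimal sketch, not the actual ruck_rover.py code — the "promoted-components" key and the jobs_in_criteria helper are illustrative — but the failure mode (an empty YAML criteria section parsing to None) is the one discussed here:]

```python
def jobs_in_criteria(config, component):
    """Return the promotion-criteria job list for a component.

    An empty section in the criteria YAML parses to None, so every
    level of the lookup is coalesced to an empty container instead
    of crashing on a key access or iteration over None.
    """
    components = (config or {}).get("promoted-components") or {}
    return components.get(component) or []

# An empty criteria file yields None or empty sections:
assert jobs_in_criteria(None, "tripleo") == []
assert jobs_in_criteria({"promoted-components": None}, "tripleo") == []
# A populated file still works:
config = {"promoted-components": {"tripleo": ["job-a", "job-b"]}}
assert jobs_in_criteria(config, "tripleo") == ["job-a", "job-b"]
```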
09:21 <jm1> afuscoar: ananya is currently working on downstream cockpit but she is on PTO this week
09:22 <jm1> chandankumar: nice catch! wanna submit a patch? ;)
09:23 <afuscoar> Thank you jm1
09:24 <chandankumar> let me put up a patch
09:53 <chandankumar> jm1: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44694
09:53 <chandankumar> jpodivin: o/ please have a look at the last comment https://review.rdoproject.org/r/c/rdo-jobs/+/44608/3#message-8eb50329a9b09c43b335f40d499ef6675ec07755
09:54 <jpodivin> chandankumar: right. Thanks for pointing that out
10:07 <jpodivin> reviewbot: add to review list https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695
10:08 <reviewbot> I could not add the review to Review List
10:11 <jm1> chandankumar: thank you! voted on your patch
10:12 <jm1> chandankumar: why is ruck_rover.py not printing any component jobs to rerun although it has a lot of red components?
10:17 * jm1 lunch
10:37 <rlandy> jm1: hey - when you are back from lunch will run through the program doc with you
10:37 <rlandy> <jm1> chandankumar: ./ruck_rover.py --release osp17-1 --distro rhel-9 --component all fails for me
10:37 <rlandy> there are no components for 17.1
10:37 <rlandy> waiting on CRE team to develop them
10:41 <chandankumar> rlandy: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44694 will fix the current issue
10:42 <rlandy> chandankumar: ok - want me to merge that or bring it to review time?
10:43 <chandankumar> rlandy: feel free to merge that
10:52 <chandankumar> rlandy: thank you :-)
11:10 <jm1> rlandy: o/ wanna sync?
11:10 <rlandy> jm1: sure
11:11 <rlandy> https://meet.google.com/qsr-cfbr-mep?pli=1&authuser=0
11:16 <chandankumar> anyone want to join the review meeting? there is not much there, all of them already reviewed
11:17 <chandankumar> India is having a public holiday, I think we can skip the meeting
11:22 <chandankumar> rlandy: please merge https://review.rdoproject.org/r/c/rdo-jobs/+/44608 https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 and https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43307
11:23 *** dviroel|out is now known as dviroel
11:23 <dviroel> o/
11:47 <rlandy> dviroel: hi - pls see pvt chat
11:50 <rlandy> chandankumar: dviroel: lol - this is fun - 5 mins pls ... https://meet.google.com/ucs-kaow-bkd?pli=1&authuser=0
11:51 <rlandy> dviroel: ^^ pls
11:51 <rlandy> I only have a few minutes
11:53 <jm1> dviroel: your kvm patch is working 🥳
11:56 <dviroel> jm1: that's great, thanks for updating it
11:57 <dviroel> jm1: i will check in a bit
12:07 <jm1> rlandy: want to add your alternative solution for BZ:2105408 (switching back to vexxhost jobs) to the cix card? :)
12:07 <rlandy> jm1: we would still need the fix for internal
12:07 <rlandy> so it's not really an alternative
12:07 <rlandy> but I will put in the testproject job to test out vexx on w and t
12:09 <jm1> rlandy: not a complete alternative solution but a partial solution which could "fix" our c9 master/wallaby jobs (and consolidate them on one platform: vexxhost)
12:09 <jm1> rlandy: will also make rerunning jobs easier because we only need rdo testprojects
12:19 <rlandy> jm1: yep - working on that testproject now
12:36 <rlandy> akahat: hello ...
12:36 <rlandy> remember the work to move sc010 all to vexx
12:36 <rlandy> do you still have those reviews?
12:36 <reviewbot> Do you want me to add your patch to the Review list? Please type something like add to review list <your_patch> so that I can understand. Thanks.
12:37 <rlandy> testprojects?
12:37 <rlandy> akahat: master sc010 on vexx is looking much more stable than psi now
12:37 <rlandy> I want to recheck your jobs
12:41 <rlandy> i think we merged the definitions
12:46 <rlandy> https://review.rdoproject.org/r/q/topic:sc10_cs9
12:46 <rlandy> ok - so maybe it was just master
12:46 <rlandy> adding wallaby c9 and train
12:49 <jm1> rlandy: we can get a c9 master promotion (integration line) today. only three jobs missing, which failed on different intermittent errors
12:49 <jm1> rlandy: internal kvm job on c9 master did not fail today so apparently it ran on a patched L0 host
12:51 <rlandy> great
12:51 <rlandy> jm1: do we need to skip promote or will it just go through?
12:51 <jm1> rlandy: simply wait
12:51 <rlandy> awesome
12:52 <jm1> rlandy: same for c9 wallaby: internal kvm job passed, so psi is maybe upgrading their nodes to rhel 8.6?
12:52 <rlandy> it's intermittent
12:53 <rlandy> sometimes you get a lucky run
12:55 <jm1> rlandy: yeah, because psi has some nodes with rhel 8.6 or code is running on amd cpus, i guess. and since these internal kvm jobs are passing more often now, we can infer they are upgrading their infra. anyway, good to see it's getting better
13:24 <jm1> rlandy: c9 wallaby integration jobs failed on different intermittent errors, hence we can simply rekick them until they pass. then c9 wallaby should promote
13:24 <rlandy> yep - give it another shot
13:25 <jm1> rlandy: already did. proceeding with c8 train
13:31 <jm1> rlandy: omg, c8 train should also promote today if we rerun jobs often enough 🙄
13:32 <rlandy> jm1: your reactions are so funny :)
13:35 <rlandy> https://review.rdoproject.org/r/c/rdo-jobs/+/44696 Add wallaby c9 and train c8 kvm jobs
13:39 <rlandy> testprojecting that
13:44 <rlandy> https://review.rdoproject.org/r/c/testproject/+/36254
13:44 <rlandy> let's see what that does
13:49 <chandankumar> dviroel: please merge https://review.rdoproject.org/r/c/rdo-jobs/+/44608 https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 and https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43307
13:50 *** Guest1645 is now known as dasm
13:50 <chandankumar> rlandy: ^^
13:50 <chandankumar> dasm: o/
13:51 <dasm> o/
13:52 <rlandy> looking
13:53 <jm1> rlandy: thanks!
13:54 <jm1> rlandy: most c9 master components should promote as well. only exception is the tripleo component, since takashi is still working on patches
13:54 <rlandy> done
13:55 <rlandy> jm1+++++++++++++++++
13:57 <chandankumar> rlandy: dviroel: on IBM cloud, we can run 5 fs01 ovb jobs successfully in one go
13:57 <chandankumar> no more retry limit issue there on OVB i think
14:00 <dviroel> chandankumar: nice, how do we control that? if there are 5 ovb jobs running there and we trigger a testproject, will it fail?
14:01 <jm1> rlandy: wanna join our meeting with phil?
14:02 <chandankumar> dviroel: those are 5 ovb fs01 jobs
14:02 <chandankumar> https://review.rdoproject.org/r/c/testproject/+/44388/10#message-13e83612b696e4d730b924d4c3ed80c3d745ad92
14:02 <chandankumar> fs01 consumes the most resources
14:04 <chandankumar> in our pipeline, we have multiple ovb jobs with different configs
14:04 <chandankumar> I think testproject will also work fine
14:05 <jpodivin> chandankumar: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 failed in gate. Something about not sharing the change queue with its dependency ...
14:05 <jpodivin> should I hit recheck or is there something else that should be done?
14:05 <chandankumar> dviroel: regarding controlling that, if we start seeing issues or retry_limit we can ask infra to increase the capacity
14:05 <chandankumar> jpodivin: recheck would be fine
14:05 <chandankumar> once the Depends-On merges
14:05 <jpodivin> chandankumar: done. Thank you very much
14:31 <jm1> rlandy: wanna join cix?
14:49 <dviroel> jm1: false positive on the kvm patch
14:50 <dviroel> jm1: not using cpu_model in the libvirt conf
14:53 <chandankumar> see ya people!
14:54 <dasm> chandankumar: o/
14:56 <dviroel> jm1: we can't remove 'libvirt_', because it is stable/train
14:59 <dviroel> jm1: if you go to "logs/undercloud/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf" you can check the cpu_model config there
14:59 <dviroel> jm1: updated the patch, rerunning testproject
15:02 <jm1> dviroel: omg true, thanks for checking and fixing this. so the issue was simply that cascadelake was used instead of skylake
15:04 <jm1> dviroel: our internal kvm jobs for c9 master and c9 wallaby passed today and my not-doing-anything-patch job passed as well. does this mean psi has upgraded their environment?
15:05 <jm1> dviroel: for example check this here, no cpu_mode=host-model and it is still passing https://sf.hosted.upshift.rdu2.redhat.com/logs/32/425432/6/check/periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2-keys/9c6085d/logs/undercloud/var/log/containers/nova/nova-compute.log
15:05 <jm1> dviroel: "no cpu_mode=host-model" => "cpu_mode=host-model"
15:07 <tosky> arxcruz: hi, I've just noticed that https://review.opendev.org/c/openinfra/python-tempestconf/+/849127/ now fails with a weird error in one of the jobs, which was working in the previous recheck
15:07 <tosky> arxcruz: should we just recheck or do you think it may be related to something else going on?
15:07 <arxcruz> tosky: checking
15:08 <dviroel> jm1: not sure, we have a mix of successes and failures I think, so it may depend on where the instance gets scheduled
15:08 <dviroel> jm1: https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2&skip=0
15:09 <arxcruz> tosky: it's related to the rpm build, i would recheck just in case, since the other jobs are running fine. if it still fails, we need to check if something changed in the cfg to build the rpm
15:11 <tosky> arxcruz: oki, thanks
15:16 <jm1> dviroel: your last failing job (before i played with it) had skylake cpus but the later jobs got icelake cpus
15:17 <jm1> dviroel: but both had rhel 8.4 below
15:22 <dviroel> jm1: i see, let's see if this workaround works for everything
15:25 * dviroel lunch
15:25 *** dviroel is now known as dviroel|lunch
15:27 <jm1> dviroel: what makes it difficult is that e.g. periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-master failed with the same error yesterday but was running on icelake cpus https://sf.hosted.upshift.rdu2.redhat.com/logs/29/426429/2/check/periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-master/e31bce2/zuul-info/host-info.primary.yaml
15:39 * jm1 bbl
15:41 <rlandy> jm1: c8 - nice clean run :) https://review.rdoproject.org/zuul/buildset/d89307d240ed41d8a15cc84e0bb6ab18
15:41 <rlandy> should promote
16:07 <rlandy> OVB stacks are not being deleted in downstream
16:20 <rlandy> lunch - brb
16:25 *** dviroel|lunch is now known as dviroel
16:43 <dviroel> jm1: damn, failed, trying again. It should be cpu_model not cpu_models
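[editor's note: for context, the setting being debugged here lives in the [libvirt] section of the undercloud's nova.conf. A hedged sketch only — the model name is illustrative, and the exact key differs by release: per the discussion, stable/train reads the singular cpu_model, while newer releases use cpu_models:]

```ini
[libvirt]
# Expose a fixed, named CPU model to guests instead of the host's own model
cpu_mode = custom
# stable/train expects the singular key; the model name is an example only
cpu_model = Skylake-Server-IBRS
```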
16:43 *** jpena is now known as jpena|off
17:58 <rlandy> dasm: hi
17:59 <rlandy> dasm: could you upload the bmc-template?
18:02 <rlandy> jm1: definite issue with component jobs and ovb in downstream
18:03 <dasm> rlandy: it's done
18:03 <rlandy> great
18:03 <dasm> i left notes on rr review list
18:51 <carloss> dviroel++ - thank you for all the reviews in Manila features for feature freeze. I owe you a beer!
18:51 <reviewbot> Do you want me to add your patch to the Review list? Please type something like add to review list <your_patch> so that I can understand. Thanks.
18:54 <dviroel> carloss: np, i will update the beer counter
18:55 <dviroel> carloss: you can ignore our friend reviewbot
18:55 <carloss> haha
18:55 <carloss> I wondered if he was talking to me or not
18:55 <carloss> s/he/it
19:09 <jm1> rlandy: fs64 and fs35 on c9 wallaby and c9 master are very unstable. both are the only jobs missing for promotion. they are failing on intermittent issues again and again :(
19:10 <rlandy> jm1: if there are only tempest tests failing and they are not the same, we can skip promote
19:10 <rlandy> there is an issue with OVB with downstream
19:10 <rlandy> metadata service
19:10 <rlandy> in a meeting with phil now
19:11 <rlandy> need to chat with rhos-ops about that after the meeting
19:19 <dviroel> jm1: worked with skylake cpu type: https://sf.hosted.upshift.rdu2.redhat.com/logs/32/425432/6/check/periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2-keys/4acf7b6/logs/undercloud/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
19:19 <dviroel> jm1: running test again, on 16-2 and 17 now
19:23 <jm1> rlandy: jobs for fs35 and fs64 on c9 master and c9 wallaby always fail for different reasons. but you will see the same intermittent errors across those different jobs
19:28 <jm1> rlandy: i reformatted the rr notes and now you can better see that intermittent errors appear very often across jobs but very rarely twice on the same job
19:29 <jm1> rlandy: for example "'arecord' not found." and "ensure apache is installed" happen often
19:29 <dasm> jm1: is there any pattern to the failures? Can we investigate some global|common fixes?
19:30 <dasm> I'd like to add some extra tweaks to make our jobs more reliable
19:32 <jm1> dasm: https://hackmd.io/94uNoMlnQgegrgy1iXV1kQ?view
19:33 <jm1> dasm: i cannot see a pattern but it's late here so...
19:33 <dasm> > Cannot download, all mirrors were already tried without success
19:33 <jm1> dasm: looks like network or connectivity is often an issue
19:33 <dasm> hmm... repos might be out of sync. Unfortunately it happens very often
19:34 <rlandy> jm1: sorry - just coming out of a meeting
19:34 <jm1> dasm: but it only happens for one run, and on the next run it passes and fails on something else
19:34 <dasm> hmm
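[editor's note: the mirror error quoted above ("Cannot download, all mirrors were already tried without success") can sometimes be softened client-side. A sketch of dnf.conf knobs that retry flaky mirrors — the values are illustrative, and this does not help when repos are genuinely out of sync:]

```ini
# /etc/dnf/dnf.conf
[main]
# attempt each package download a few times before giving up
retries = 5
# fail faster on a hung mirror so the next mirror is tried sooner
timeout = 30
```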
19:34 <rlandy> getting back to ovb downstream
19:34 <jm1> dasm, rlandy: i think this "'arecord' not found." is a real issue
19:34 <rlandy> jm1: when did it start?
19:35 <rlandy> can you track it to a specific change?
19:36 <jm1> rlandy: it's intermittent and i have seen it since yesterday, but i had no time to check older jobs
19:36 <rlandy> dasm: can I merge https://review.rdoproject.org/r/c/config/+/44595?
19:36 <jm1> rlandy: actually i started tracking intermittent issues in detail only since yesterday
19:36 <rlandy> jm1: ok - I'll look at it
19:37 <jm1> rlandy: better to deal with downstream; for these intermittent bugs we should simply create launchpad bugs tomorrow
19:39 <jm1> rlandy: please simply recheck my testprojects for c9 master and c9 wallaby. those fs35 and fs64 jobs are the only ones missing for promotion and they really try hard to fail on something else every time
19:39 <rlandy> jm1: sure - will leave you notes
19:41 <rlandy> wallaby c9 is only out on fs064
19:41 <rlandy> will recheck
19:43 <rlandy> jm1: master and wallaby are both only out on fs064
19:43 <rlandy> retesting those
19:45 <jm1> rlandy: oh fs35 just passed, awesome!
19:45 * rlandy checks fs064 failure
19:46 <jm1> rlandy: already checked it
19:46 <jm1> rlandy: it's in the rr notes
19:46 * dviroel going afk - running an errand
19:46 <rlandy> jm1: ok - late for you - I'll check it
19:47 <jm1> rlandy: fs64 both failed on "'arecord' not found."
19:47 *** dviroel is now known as dviroel|afk
19:48 <rlandy> 2022-08-31 18:45:23 | 2022-08-31 18:45:23.950835 | fa163e7d-a1a6-78e8-a52d-000000006cc1 |      FATAL | try modifying forward dns record | undercloud | error={"changed": false, "msg": "`arecord` not found."}
19:48 <rlandy> I see that
19:48 <rlandy> but it carries on
19:50 <jm1> rlandy: yes, i shortened it in the job overview to "'arecord' not found." in the rr notes. if you jump to the section "Intermittent Failures" you will find a longer description
19:50 <rlandy> jm1: think the arecord thing looks legit
19:51 <jm1> rlandy: i will file a bug tomorrow with some more details
19:51 * rlandy asks on sec channel
19:53 <jm1> rlandy: ok thanks! i am eod now
19:54 * jm1 have a nice evening #oooq 🥂
19:54 <rlandy> jm1: have a good night
19:54 <rlandy> logging a bug
19:54 <rlandy> to pass to sec team
21:40 *** dasm is now known as dasm|off
21:40 <dasm|off> o/ see you tomorrow
22:25 *** rlandy is now known as rlandy|bbl
23:12 *** dviroel|afk is now known as dviroel
23:21 <rcastillo> o/
23:21 <rcastillo> I'll be on pto until monday, see you next week
23:26 <dviroel> rcastillo: enjoy o/

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!