Thursday, 2023-07-27

opendevreviewMerged openstack/nova master: Add compute_id to Instance object  https://review.opendev.org/c/openstack/nova/+/87950001:38
opendevreviewTakashi Kajinami proposed openstack/nova master: Enforce correct choice for [DEFAULT] enabled_ssl_apis  https://review.opendev.org/c/openstack/nova/+/88978701:38
opendevreviewmelanie witt proposed openstack/nova master: block_device: Add encryption attributes to swap disks  https://review.opendev.org/c/openstack/nova/+/88431207:10
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/82675407:10
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/82675507:10
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/87093407:10
opendevreviewmelanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/87093207:10
opendevreviewmelanie witt proposed openstack/nova master: Support resize with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093307:10
opendevreviewmelanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/87093507:10
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/87093607:10
opendevreviewmelanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093707:10
opendevreviewmelanie witt proposed openstack/nova master: block_device: Add encryption attributes to swap disks  https://review.opendev.org/c/openstack/nova/+/88431207:20
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/82675407:20
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/82675507:20
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/87093407:20
opendevreviewmelanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/87093207:20
opendevreviewmelanie witt proposed openstack/nova master: Support resize with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093307:20
opendevreviewmelanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/87093507:20
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/87093607:20
opendevreviewmelanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093707:20
*** tobias-urdin-pto is now known as tobias-urdin07:34
opendevreviewSylvain Bauza proposed openstack/nova master: DNM: Verify whether we persist some wrong ReqSpec values  https://review.opendev.org/c/openstack/nova/+/88984308:54
opendevreviewalecorps proposed openstack/nova-specs master: new spec: support of vnc console for mks  https://review.opendev.org/c/openstack/nova-specs/+/88984409:02
sean-k-mooneybauzas: so we have a gate blocker in the nova-lvm job 10:20
bauzasshit shit shit10:20
sean-k-mooneytest_get_console_output_server_id_in_shutoff_status is failing with AssertionError: '' is not true : Console output was empty.10:20
sean-k-mooneyim seeing that quite a lot10:21
sean-k-mooneyalways in the lvm job for some reason and not others10:22
sean-k-mooneylooking at the test im wondering if this could be a test bug or not10:23
sean-k-mooneyhttps://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L71310:23
sean-k-mooneyim wondering if we are perhaps stoping the server before the guest writes to the console?10:23
sean-k-mooneyhttps://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(hosts_region,project,build_name),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:project,negate:!f,params:(query:openstack%2Fnova),ty10:27
sean-k-mooneype:phrase),query:(match_phrase:(project:openstack%2Fnova)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:d,query:(language:kuery,query:'message:%22Console%20output%20was%20empty.%22'),sort:!())10:27
sean-k-mooneyhum let me get a short link10:27
sean-k-mooneyhttps://tinyurl.com/kvcc928910:28
sean-k-mooney132 failures in the last 3 days10:29
bauzassean-k-mooney: ack, ok10:36
bauzassorry, was on meeting10:37
sean-k-mooneyno worries just filed https://bugs.launchpad.net/nova/+bug/202885110:37
sean-k-mooneyim going ot add a wiat for sshabel before the server stop in tempest and see if that helps10:38
bauzassean-k-mooney: it looks to me it regressed two days before https://zuul.openstack.org/builds?job_name=nova-lvm&skip=010:38
bauzaslast successful run was 2023-07-24 09:18:0310:39
sean-k-mooneyya it stated on the 24th10:39
sean-k-mooneyim not really sure what could have merged then10:39
bauzaslooking at gerrit10:39
bauzasit doesn't sound a race, rather a regression10:40
sean-k-mooneyhttps://github.com/openstack/nova/commits/master10:40
sean-k-mooneywe had one commit on the 21st and then the 25th10:40
sean-k-mooneyso i dont see anything in nova that aligns10:40
bauzasyeah10:42
bauzassean-k-mooney: but tempest has suspects https://github.com/openstack/tempest/commits/04cb0adc822ffea6c7bfccce8fa08b03739894b710:43
sean-k-mooney ya i looked at the optimize one10:44
sean-k-mooneyi dont think that is touching the relevent test10:44
bauzashttps://github.com/openstack/tempest/commit/eea2c1cfac1e5d240cad4f8be68cff7d72f220a810:45
sean-k-mooneyoh the merge commit10:45
bauzaswe have a problem with server actions but this patch touches this10:45
sean-k-mooneyyep it moved the test10:47
sean-k-mooneythis is the orginal commit https://github.com/openstack/tempest/commit/1643536c9f7407dceccfaa18a2dd95156de1419c looks like dan wrote it10:48
bauzasprobably an unnotified merge conflict when rebasing10:49
sean-k-mooneyi dont think so10:50
sean-k-mooneywell maybe so what dan did was split up the large class into 310:50
sean-k-mooneyso that we got bettter parallisum10:50
sean-k-mooneyreducing the over all time of the  tests form 15 min to 5 on there system10:51
sean-k-mooneybut that might mean we have more load10:51
sean-k-mooneyand the vm boots might now be slower10:51
bauzasyeah sounds plausible10:53
sean-k-mooneyim just going to change server = self.create_test_server(wait_until='ACTIVE')10:54
sean-k-mooneyto server = self.create_test_server(wait_until='SSHABLE')10:54
bauzasshoot10:56
bauzasand depends-on it with a nova dnm please10:56
sean-k-mooneyyep10:56
* bauzas goes eating a kebab10:56
sean-k-mooneynow i want one :)10:57
sean-k-mooneyenjoy10:57
opendevreviewsean mooney proposed openstack/nova master: use disk caching to hide slow cinder performance  https://review.opendev.org/c/openstack/nova/+/88938311:03
sean-k-mooneyill use my diskcache mode change to test it instead ^ that way we can test both and save ci time11:04
opendevreviewalecorps proposed openstack/nova-specs master: new spec: support of vnc console for mks  https://review.opendev.org/c/openstack/nova-specs/+/88985712:56
dansmithsean-k-mooney: if you could use my nick so I get a notify I'd appreciate it13:29
dansmithI can't notify on "dan" because it's far too common13:29
dansmithsean-k-mooney: so which test are you changing from active to sshable?13:30
dansmiththe split patch literally just moved code, so it shouldn't affect anything13:30
dansmithit will increase parallelism, but we usually are running that test when the workers are fully loaded with tests, not after they've run out of work,13:30
dansmithso I wouldn't expect it to really change the profile much13:31
dansmithbauzas: can you please get this queued up? you've already +2d the follow-up to address your feedback: https://review.opendev.org/c/openstack/nova/+/884752/213:35
bauzasdansmith: cool13:35
dansmithI've been rechecking these for three weeks and I'm getting close, so I want to be able to land this when the time comes13:35
bauzasdansmith: as we discussed with sean-k-mooney this morning, we have a job issue13:35
dansmithI realize that, I still would like to have it ready to go, the patch below it is still waiting so +W isn't going to send it13:36
sean-k-mooneyi dont think we should slow down reviewing bacause of the job issue13:37
dansmithagree13:37
dansmithsean-k-mooney: I see the console_output thing, that's because it's being requested as sshable and doesn't have the validation things it needs 13:37
dansmithsean-k-mooney: not sure how that changed for that test, but I can fix it easily13:38
dansmithsean-k-mooney: did you open a bug for it?13:38
sean-k-mooneyi fixed the validation thing13:38
sean-k-mooneyi opened a bug13:38
sean-k-mooneyand i have a ptach in ci right now13:38
sean-k-mooneyits almost completed running13:39
dansmithlink?13:39
sean-k-mooneyhttps://review.opendev.org/c/openstack/tempest/+/88984713:39
sean-k-mooneyhttps://launchpad.net/bugs/202885113:39
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/88938313:39
sean-k-mooneyi used my disk caching patch to run it with a depends on13:39
sean-k-mooneyhttps://zuul.openstack.org/stream/9890c9b53059410eabc0c61df0376217?logfile=console.log13:39
sean-k-mooneythat is the nova-lvm job i have it open on my other screen13:40
dansmithoh I was looking at this: https://8d44202ac4ef26a68d85-49968699300828e6c9b78fd54dff75ef.ssl.cf1.rackcdn.com/889383/4/check/nova-lvm/368f504/testr_results.html13:40
dansmiththe in_shutoff test fail13:40
dansmithwhere it claims it doesn't have validation_resourecs13:40
dansmithsorry, no the unlimited_size one13:40
dansmithwell, both13:40
dansmithah, that's yours13:40
sean-k-mooneyya in a previous revsion i enabled sshable13:40
sean-k-mooneybut not the validation resouces13:41
dansmithah, okay cool13:41
dansmithokay so yep, increased load making us lose that race makes sense13:42
dansmithit's definitely not 100% fail13:42
sean-k-mooneyits only happening in the lvm job and i think that uses lvm on a loopback file so i suspect the disk io is worse there then other jobs13:43
dansmithbecause we've merged stuff in the last couple of days13:43
dansmither13:44
dansmithI actually don't see that on those patches.. is that in experimental or something?13:44
sean-k-mooneyno https://github.com/openstack/nova/blob/master/.zuul.yaml#L230-L24413:44
sean-k-mooneybut it does not run on all changes13:44
sean-k-mooneyit only runs on change to libvirt driver 13:45
dansmithah, only on libvirt I see13:45
dansmithokay13:45
dansmithwell in that case we *definitely* shouldn't be holding up review ;P13:45
bauzassorry, on an internal meeting now13:45
bauzasdansmith: yeah, I haven't asked to hold reviews :)13:46
dansmithbauzas: sounded like you did, but okay :)13:46
bauzasdansmith: it's the nova-lvm job, so yeah no13:47
dansmithack13:48
sean-k-mooneyill bring this up as a ptg topic but we have had the vmware driver as experimatal and deprecated for almost 2 cycle now. i think in C we should consider removing it13:48
sean-k-mooneyor at least moving it out of tree13:49
bauzasI don't personnally disagree, since Caracal will be a SLURP release13:51
sean-k-mooneythe deprecation/experimental warning went out in antelope so we ahve warned people and i dotn think we have had any ci in maybe 2 years13:52
sean-k-mooneythere is a spec propsoed to add a new nova-mksproxy intree for vmware. when we added mks supprot in liberty the proxy server was maintained out of tree 13:53
sean-k-mooneyhttps://opendev.org/x/nova-mksproxy13:53
sean-k-mooneybut i dont really like the idea of adding a new proxy in tree only for vmware with no testing13:53
sean-k-mooneymaintianed is proably an overly stong statement as almost nothign has been done to it in 6 years13:54
bauzassean-k-mooney: at least I haven't heard of any operator using vmware after Train14:02
dansmithsean-k-mooney: still failing this one: https://d5c0e4c80966c29b54ec-9ed411199ab33310a8b6bd95d117cd22.ssl.cf1.rackcdn.com/889383/4/check/nova-lvm/9890c9b/testr_results.html14:27
sean-k-mooneyya.. just saw14:28
sean-k-mooneyany other idea on how to fix other then disable it14:28
dansmiththat's weird, because it's failing after it has already been sshable and shutoff14:29
dansmithoh wait, lemme check something14:29
dansmithokay, no.. I was thinking maybe validation was disabled for this job, but it's not14:30
dansmithso yeah, pretty weird14:30
dansmithmaybe the api times out getting the console from the compute and doesn't report failure or something?14:31
dansmithsean-k-mooney: man, I think that's a legit failure, I just don't know why14:45
dansmithwe ssh into the guest and then we poll the console log a LOT of times before we give up and there's nothing in it14:46
dansmithAFAICT, that basically means we lost the console for that instance, maybe because we overwrote the file or it is hooked up to a different inode than what we're reading, like if we raced to create/ensure it or something14:47
bauzasdansmith: sean-k-mooney: we should possibly skip the test until we find the root cause15:03
dansmithbauzas: sean-k-mooney is going to try to repro locally and if not, then skip yeah15:03
bauzas++15:03
fricklerif an instance is shutoff, is the console log cleared? or should the log from the last run still be shown?15:03
dansmithfrickler: the test is specifically testing that it's not cleared15:05
dansmithhowever, there are circumstances where we read the log from a pty, which is supposed to be for container stuff15:05
dansmithbut if we're getting that on the lvm config because we try not to write files, then that could be a cause15:05
dansmithin which case the test is invalid (currently) for LVM and we were just racing before to pass it15:05
dansmithbut that is what we're going to check locally15:06
sean-k-mooneyfrickler: for our default config we write it to a file on disk (or rahter libvirt/qemu does)15:07
sean-k-mooneyso we normally would epxect it to still be there when its powered off15:07
sean-k-mooneybut it gets cleard via being overwtitten on the next vm start15:07
fricklerah, o.k., I would have expected it to get removed when the instances is shut down15:09
sean-k-mooneyfor the most part we just stop the domain and dont do any cleanup of files on shutdown 15:14
sean-k-mooneyand since this file is created by qemu or virtlogd depening on your libvirt config15:14
sean-k-mooneynova does not really mange it excpetion when you delete a vm15:15
sean-k-mooneyits created under the instance dir so it gets cleaned up when that is removed15:15
fricklermakes sense when you think about it, yes15:17
sean-k-mooneythe odd thing is the test has not changed in like 6 years or more15:23
sean-k-mooneyso why it started failing 3 days ago is beyond me15:23
melwittI was just about to ask, was this passing recently15:27
sean-k-mooneyya opensearch says it started failing on the 24th15:28
sean-k-mooneydan optimised the execution time by making some of the test run across multipel workers and that seams to have exposed an existing bug15:29
sean-k-mooneywether its a bug in nova or in the test is TBD15:29
melwittI can try that locally and see if it shows up for me15:30
sean-k-mooneyhence why we are repoducing it locally now. im just waiting on devstack to complete15:30
sean-k-mooneythe more the merrier15:30
melwittlord knows I got devstacks already available15:30
sean-k-mooneywell do you have one with lvm15:31
sean-k-mooneythis is only affecting the lvm job and ther emight eb some lvm specific logic at play15:31
melwitthm, no. would it be equivalent at all to do it with a boot from lvm volume?15:32
sean-k-mooneyi dont think so but its worth a try i guess if you have it to hand15:32
sean-k-mooneythe test is passing we thing in other jobs but ill admit i didnt actully check if it was running in other jobs15:32
melwittI'll check. given that the only condition appears to be CONF.compute_feature_enabled.console_output I'd expect it is running for other jobs15:33
melwittyeah it runs and passes in tempest-integrated-compute, for example15:35
sean-k-mooneyya it is15:35
sean-k-mooneyalso in nova-ceph-multistore 15:36
dansmithyeah, and also note that sean-k-mooney has proposed changes to it, 15:41
dansmithwhich may make it even more lvm-specific15:41
sean-k-mooney... typo in my local.conf, take 2 i guess15:47
melwittwell, I'm getting weird results in both the bfv and non bfv cases I get a 404 error when requesting the console log. not sure if that's just something wrong in my environment15:51
melwittmeaning, I know it's passing in non lvm jobs so why is it doing this for me15:52
sean-k-mooneynot that could be the same issue15:53
dansmiththe lvm jobs are lvm backend right?15:54
dansmithso really have nothing to do with a bfv instance15:55
sean-k-mooneyyes image_type=lvm15:55
melwittlike, the code is relying on the guest xml to get the console log path, but a stopped instance won't have xml, so how is this ever working?15:55
melwittI must be missing something15:56
sean-k-mooneyit does15:56
sean-k-mooneywe dont undefien the xml when we power off15:56
sean-k-mooneywe delete the domain and recreate it every tiem we start a domian but power off leaves it in place15:57
melwitthm ok15:57
sean-k-mooneywe use that fact to make vGPU work by reading the mdev path out of the xml15:57
dansmithyeah, it has xml when stopped15:58
dansmiththat's the "defined" vs. "active15:59
dansmith" xml15:59
melwittok, I'm not able to get the domain with 'virsh dumpxml [--inactive] <uuid>' so far I dunno how to look at it15:59
melwittgotcha15:59
sean-k-mooneydo virsh list --all16:00
dansmithyou need to use the id not the uuid16:00
melwittok, thanks I see them16:00
melwittid is blank but it has "name" so I'll try that16:00
dansmithoh for the inactive yeah16:00
sean-k-mooneyok stacked lets boot a vm and see what happened16:01
dansmithbut uuid is just metadata for libvirt AFAIK.. I've never known to be able to use that for virsh commands16:01
melwittyou can. I had been doing that to look at inactive domain xml16:02
melwittok got it. I wasn't using sudo 🙄 16:03
sean-k-mooneyok so i have the console in horizon16:03
sean-k-mooneyill check the host fs/xml and then stop it16:03
melwittuuid works with sudo too16:03
melwittok16:03
sean-k-mooneyso there is a console.log in the instanace dir16:04
sean-k-mooneyand it has data since i can cat it so  ill power it off now16:05
melwittsame16:05
melwittsame16:05
melwittmine has data after poweroff to16:05
melwittbut, in my case console_type == 'pty' and so there must be something weird happening with that16:05
melwittthe console log file path in the xml is correct and I can use it to open the log file16:06
sean-k-mooneyhttps://paste.opendev.org/show/bMa9I7WsUnEfsrMC6iQz/16:06
melwittis the console log get API working for you?16:06
sean-k-mooneyi did it in horizon but ill check in a sec on the cli16:07
sean-k-mooneyhorizon is now showing "Unable to get log for instance "ea890cd1-c3fe-4455-bcd9-b44ea830a136"."16:07
dansmithmelwitt: interesting, I started with libvirt before there were even non-active domains and definitely since before uuids were in the xml, so I guess 15-year habits die hard :)16:08
melwittfor mine there's helpfully no log messages that say anything about why/where it's running into a problem16:08
dansmithahh, console type is pty,16:08
dansmiththat's what I was wondering if it was happening16:08
dansmiththe comment says it's only for container types, but it would explain why we're losing console on shutdown16:09
dansmithso with pty, there's no storage of the console stuff once the other side goes away, like when shutdown16:09
melwittI feel like it has to be this failing: data = nova.privsep.libvirt.readpty(console_path)16:09
dansmithmelwitt: your console path is a file or a path to a pty?16:10
melwittit's a file. but it is also type pty. which I assume is not right?16:10
dansmithidk, to be honest16:10
sean-k-mooneyon a call by the way16:10
dansmiththe code looks like nova reads directly from the pty if the type is pty, so the file wouldn't be used16:10
dansmithah, wait,16:11
dansmithit looks like it reads from the pty and writes the file if type=pty16:11
melwittthe console files are there and are properly with data16:11
dansmithbut that would only work when it's running16:11
dansmithso I wonder if the test is now breaking because we never read the log before the instance is shutdown (which is what the test was supposed to be doing)16:12
melwittoh hm ok, it must be bc I queried for the console before I shut it down16:12
melwittI'll try not querying before16:12
dansmithif we did once, then it would write the file and then we could read the file after it's shutdown16:12
dansmithso maybe the test just needs to hit the console endpoint once after ssh'able and then shutdown, and verify that it still works after16:12
melwittyeah.. for me I still can't get it via the API despite the fact that the file is there and the file path matches what's in the xml16:13
dansmithis the domain inactive?16:13
melwittyes16:13
dansmithI would think the console element would go away partially while inactive,16:14
dansmithbut is that causing us to not read from the file?16:14
melwitt    <console type='pty'>16:14
melwitt      <log file='/opt/stack/data/nova/instances/682931e5-4563-4955-a552-526a31c6957a/console.log' append='off'/>16:14
melwitt      <target type='serial' port='0'/>16:14
melwitt    </console>16:14
melwittand that file is right. it just can't get it for whatever reason16:15
dansmithyeah16:15
dansmithlemme re-read this code, hang on16:15
dansmithokay so if it's a pty, we read anything from it we can and append to the log,16:15
dansmithI guess so that we're constantly pulling pty content and filling the log file, and then we just "should" be returning the log file contents for the requester16:16
dansmithmelwitt: can you just add some logs to those functions and see what's going on?16:16
melwittyeah16:16
sean-k-mooneyok call done16:58
sean-k-mooneyi was going to check the cli16:58
sean-k-mooneydid ye fined anything in the intrim16:58
sean-k-mooneythe content is still there on disk for what its worth16:58
sean-k-mooneystack@upstream-devstack:~/repos/devstack$ openstack console log show test17:00
sean-k-mooneyResourceNotFound: 404: Client Error for url: http://192.168.122.17/compute/v2.1/servers/ea890cd1-c3fe-4455-bcd9-b44ea830a136/action, Guest does not have a console available.17:00
sean-k-mooneyoh...17:03
sean-k-mooneyFetching instance by UUID {{(pid=1349614) get /opt/stack/nova/nova/compute/api.py:297817:03
sean-k-mooneyok that is not relevent its just https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2978-L298217:05
dansmithmelwitt: so to be clear, you've got regular guests with pty consoles yeah?17:05
sean-k-mooney[None req-0042b3bd-d199-4c9d-b8f2-f1ee3a2d444e demo admin] HTTP exception thrown: Guest does not have a console available.17:06
dansmithI wonder if this was the fix for the "console log grows forever" problem and my (our?) assumptions that it's still just a file on disk is old17:06
dansmithif that's so, I don't see how we'd ever pass that test once we remove the races, if we don't get the console log once while it's running17:06
sean-k-mooneyso there is a file on disk17:07
dansmithright, but the code in libvirt driver is feeding it from the pty each time it runs17:08
melwittok, so what's happening is the xml for <console> element is different after the instance is stopped17:09
melwittso the parsing it's doing is not finding the console path17:09
melwittbefore the instance is stopped, there's <source path='/dev/pts/3'/> under console. after the instance is stopped there is not17:11
melwittand it's looking for this ('pty', "./devices/console[@type='pty']/source[@path]", 'path')17:11
sean-k-mooneyhttps://paste.opendev.org/show/bU3Vb2zX0TgOqU42xBAV/17:11
sean-k-mooneyit is be we have the log file line in both cases17:12
sean-k-mooneyso we shoudl be usign that not source17:12
sean-k-mooneyor at least falling back to the log line if source is not found17:12
melwitthttps://paste.openstack.org/show/bnLH5aJxsHlzXdZS455s/17:13
melwittyeah, I think that's the bug17:13
melwittit's not falling back17:13
sean-k-mooneywell i think log is alwasy where to im not sure if the fallback is ever needed17:13
sean-k-mooneyi woudl be incliend to prefer log and fallback ot source if its not there17:14
melwittthe log file is always there. it's just using source only17:14
melwittit looks like the code intends to fall back to the file but it's not working17:15
sean-k-mooneywe also have https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4577C1-L4580C4317:15
sean-k-mooneywe are generating what that will be17:15
sean-k-mooneywe dont ever need to parse the xml17:15
dansmithso the test not waiting for sshable was actually the reason this never failed? because we were racing with the shutdown?17:15
melwitthah probably17:16
melwittI dunno17:16
dansmithI guess I don't know why this is only failing lvm still, unless it's just lvm being slower or faster about something?17:16
melwittyeah that's what I don't understand17:16
melwittI would have wondered if console type is not pty if not lvm but my setup is rbd and I get console type pty.. and the ceph job isn't failing this test iiuc17:17
dansmithright17:18
sean-k-mooneyi think its a pty when virtlogd is installed and running17:18
sean-k-mooneyqemu is using the pty to talk to virtlogd which is writing to the file17:18
sean-k-mooneyi think17:18
sean-k-mooneybut i dont see why we woudl ever need to read form that. we can simplfy this code by using 17:19
sean-k-mooney _get_console_log_path(instance)17:19
sean-k-mooneyto return the log file and just read directly form that17:19
melwittyeah I dunno if there is or was a reason for doing the reading from pty17:20
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4462-L448817:21
sean-k-mooneyso this is doub17:21
sean-k-mooneywe use   console_log = self._get_console_log_path(instance)17:21
sean-k-mooneyto get the path which is to the file but that not the path we check for at the start17:22
sean-k-mooneyoh actully17:22
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4487-L450217:23
sean-k-mooneywe read form the pty and write to the log and then read form the log again but this time without privsep17:23
melwittthis is like rube goldberg17:24
sean-k-mooneyok so if i tail the log file and start the vm it gets the stream17:24
sean-k-mooneyso we dont actully need to write to the logfile17:25
sean-k-mooneyit already has the data form the pty17:25
sean-k-mooneyand shutting off the instance dow not affect the data17:25
melwittwhat about if the compute host were to be rebooted? just trying to think if there's any case where we need the file17:26
sean-k-mooneythe file is beiing wrttien by qemu/virtlogd17:26
sean-k-mooneyits only written by use if we do a console log show17:27
melwittoh ok17:27
sean-k-mooneyso if we removed that code from nova it would already have the data17:27
sean-k-mooneyi did sudo tail -F console.log to confirm that17:28
dansmithsean-k-mooney: certainly we need to not be reading from that pty if virtlogd is already doing it17:28
sean-k-mooneywhen the vm starts it recreates the file17:28
dansmithso where in there are we only doing that if virtlogd is not?17:28
sean-k-mooneyso is this part of the xml generated by libvirt or us17:29
sean-k-mooneyi tought we were just setting the log location and libvirt filled in the rest based on if virtlogd was deployed and how it was configured17:29
dansmithnot sure that matters, since this is on get and doesn't know if we added it or libvirt itself did right?17:30
sean-k-mooneywell we are specifying the location of the file17:30
sean-k-mooneyso my point is i think we can ignore if its a pty or not17:30
sean-k-mooneyand alwasy just go to the file17:30
dansmithnot here.. here we're just looking at the state of the xml17:31
sean-k-mooneybut we dont need to look at the xml at all17:31
sean-k-mooneythats my point we have a function that takes the instance and returs the log path to use17:31
sean-k-mooneyits always called consol.log in the instnace dir17:31
dansmithI know that's your point, I'm just not sure it's right :)17:31
sean-k-mooneyok :)17:32
dansmithI want to know how we're not tripping over us reading from the pty when virtlogd already is17:32
sean-k-mooneyah good question17:32
dansmithI mean I guess if virtlogd is select()ing from it then it has already read everything we potentially could have from the pty each time we go to read from it, and we just get lucky that we never compete17:32
dansmithbut I would expect to occasionally see some corruption there17:33
dansmithand is virtlogd always a thing in libvirt now, or is it optional?17:33
dansmithand even still,17:33
dansmithit looks to me like we *are* always returning the contents of the file at the bottom of that function,17:33
sean-k-mooneyhttps://paste.opendev.org/show/bW8SNwHjfUFkOMv4gp5N/17:34
dansmithso why are we getting blank log output when the file on disk clearly has stuff in it?17:34
sean-k-mooneythat is the orginal xml we generate17:34
sean-k-mooneythere is no console element at all we just set the serial port and pass the log path17:34
dansmiththat has no console stuff in it and is not complete17:34
sean-k-mooneyno i just didn copy it all one sec17:34
dansmithbut I don't see what it matters what we generate17:35
sean-k-mooneyhttps://paste.opendev.org/show/bdwD3vAg5f6D6iNsIHk1/17:35
melwittit's relying on <source> in the xml to get the console path, so it gets no path at all17:35
sean-k-mooneywe are telling libvirt to do whatever it need to to make sure teh serial console is in a given file17:36
dansmithmelwitt: oh okay, gotcha you said that earlier I think17:36
dansmithokay, so if we ask for pty/logfile and rely on virtlogd now it sounds like17:36
sean-k-mooneyi think we shoudl be changing this code to only use that instaed17:36
dansmithso the comment in the code there about ptys says something about container types,17:36
dansmithpresumably because they will hang a process if something doesn't consume their pty17:36
* sean-k-mooney looks17:36
dansmithso perhaps that code is leftover from long ago before we moved to virtlogd everywhere, and that all needs to be ripped out?17:37
melwittI did but I know it's hard to visualize. here's my dumb print statements showing how it goes https://paste.openstack.org/show/bpovXD89OL2QzukI3BWi/17:37
sean-k-mooneyhe virt_types kvm and qemu are the only ones17:37
sean-k-mooney            # which create a dedicated file device for the console logging.17:37
sean-k-mooneyok 17:37
dansmithkeep reaing17:37
dansmith*reading17:37
sean-k-mooneyso the pty fallback is for other virt drivers17:37
sean-k-mooneynot driver virt types like lxc17:37
dansmithit says "like lxc"17:38
sean-k-mooneyyep17:38
dansmithyou said "not virt types like lxc"17:38
sean-k-mooneyso if virt_type not in (qemu,kvm)17:38
sean-k-mooneythen we need to read from the pty17:38
sean-k-mooneyand save it to a file17:38
sean-k-mooneybut for qemu/kvm we dont17:38
dansmithanyway, it makes me super nervous to mess with stuff like this without a very clear understanding of what is going on, and I *still* don't understand why this isn't always failing especially after sean-k-mooney's change to the test17:39
sean-k-mooneyso i think hte fix is to wrap that cod in an if/else based on the virt_type17:39
melwitt+1 to be really careful messing with this17:39
dansmithsean-k-mooney: why do those virt types not also benefit from virtlogd? if anything, moving to the pty and daemon would make qemu/kvm more like lxc17:39
sean-k-mooneyya im not sure why ceph is not exploding17:40
dansmithceph or any of the other jobns17:40
sean-k-mooneyso they might but i wonder if this comment predates the creation of virtlogd17:40
sean-k-mooneyi could try deploying with lxc and find out17:41
dansmiththe fact that we don't (that we know of) see corruption of console logs due to us sometimes reading from the pty also just makes me sus that this is not actually happening17:41
sean-k-mooneybefore virtlogd was a thing qemu used to write to the file itself17:41
dansmithsean-k-mooney: right that's my point. maybe this all needs to go, not preserve some of it for lxc17:41
dansmithsean-k-mooney: right I know, and virtlogd should have made them all effectively the same17:41
sean-k-mooneyok so there are a couple of thing i can try ill add some logs to the fucntion and see if we are hitting this17:42
sean-k-mooneythen i can spin up another vm with nova and lxc17:42
sean-k-mooneydo you know if stoping virtlogd would be enough to change libvirt behavior17:42
sean-k-mooneyi coudl also try that17:42
dansmithohhh hang on17:42
dansmithmelwitt: does the running instance have a source= tag in the console/pty element?17:43
dansmithif not, then it's never finding this console_type==pty thing because the xpath won't match17:43
melwittdansmith: it does after the initial create. I haven't checked if it does after I start it after stopping it. checking now17:44
dansmithokay I thought you said it didn't on stop17:44
sean-k-mooneyit does 17:44
sean-k-mooneyafter we stop it it goes away17:44
melwittdansmith: ? I said it has <source> before stop and then it does not have <source> after stop17:44
sean-k-mooneywhen we do a dumpxml we are not actully reading a file17:44
sean-k-mooneywe are geting the responce form the libvirt deamon17:44
dansmithso after stop we should be getting the actual file on disk without the pty messing it up17:45
sean-k-mooneyi dont know if the console is ever on the file on disk17:45
sean-k-mooneyya proably does anyoen know where that file is and we can check17:45
melwittif I start it again it has <source> agian17:46
sean-k-mooneyah its in /etc/libvirt/qemu/instance-00000001.xml17:46
melwitt*again17:46
dansmithokay, so17:46
dansmithafter we shutdown, 17:46
dansmithwe match neither pty or file, the path.exists fails and we just return "" for the console log17:46
dansmithis that the thinking?17:47
dansmith(sorry if I'm just being slow)17:47
melwitt(and console log get works again)17:47
dansmithI just so don't see how this is working for the non-lvm cases17:47
melwittno it's bailing before it ever gets to the log file part17:47
sean-k-mooneyso so the file on disk is not updated with the souce element when its running so its only updated in the trainient domain while its running17:48
dansmithmelwitt: right, so here right? https://github.com/openstack/nova/blame/master/nova/virt/libvirt/driver.py#L448317:48
melwittbecause it doesn't parse <source>. I don't think that's intended probably17:48
melwittno, like this https://paste.openstack.org/show/bpovXD89OL2QzukI3BWi/17:48
melwittfails (returns) before that17:49
dansmithmelwitt: shouldn't that return an error through the api then?17:49
sean-k-mooneyoh interesting https://termbin.com/0qcj17:49
dansmithtempest is getting blank logs not an error, AFAIK17:49
sean-k-mooneyya the error iss """AssertionError: '' is not true : Console output was empty."""17:51
dansmithunless we're not actually raising that, but I thought we were in one case you pasted earleir17:51
sean-k-mooneythe test is doing self.assertTrue(output, "Console output was empty.")17:51
dansmithwait what17:51
melwittyeah... looking to see if I can see how. for me I think it goes here and raises HTTPNotFound https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/console_output.py#L5817:51
sean-k-mooneyno i think the test is not checking the return code17:51
melwittso it must be something the client is doing?17:52
dansmithomh17:52
dansmithsean-k-mooney: it's checking to see that the console is actually empty in shutoff state/17:52
sean-k-mooneyno output will be true if its not the empty string17:53
melwitton the command line I get "ResourceNotFound: 404: Client Error for url: http://127.0.0.1/compute/v2.1/servers/68b8344b-10dc-4a84-90e4-ad8be7a2bb60/action, Guest does not have a console available."17:53
melwittosc17:53
dansmithwtf17:53
dansmithso I guess I totally misread the test17:54
sean-k-mooneyya thats what i get too17:54
sean-k-mooney40417:54
sean-k-mooneybut i think the 404 has an empy body17:54
dansmithI thought the test was asserting that the console was *still* available17:54
melwittyeah and it's bc 17:54
melwitt        except (exception.InstanceNotFound,17:54
melwitt                exception.ConsoleNotAvailable) as e:17:54
melwitt            raise webob.exc.HTTPNotFound(explanation=e.format_message())17:54
melwittwait what17:54
sean-k-mooneydansmith: the test is17:54
sean-k-mooneyhttps://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L211C7-L21317:55
dansmithit's not, it's asserting that the result is "console is empty"17:55
melwitt        """Test getting console output for a server in SHUTOFF status17:55
melwitt        Should be able to GET the console output for a given server_id17:55
melwitt        in SHUTOFF status.17:55
dansmithsean-k-mooney: L21317:55
frickleryou can add --debug to the osc call to verify the body17:55
melwittuh wtf self.assertTrue(output, "Console output was empty.")17:55
sean-k-mooneyno self.assertTrue takes an object to test and a message to return if it fails17:55
dansmithmelwitt: yeah I wonder if that means "doesn't fail"17:55
melwittthis is asserting the exact opposite of what the docstring says17:56
dansmithsean-k-mooney: ahhh, right okay17:56
dansmithnot assertEqual17:56
melwittohh ok right17:56
dansmithconfusing17:56
sean-k-mooneyright 17:56
sean-k-mooneythey shoudl not be useing assertTrue relaly in this case17:56
dansmithyeah for serious17:56
sean-k-mooneyas i said they are relying on the boolean convertion to true of a non empty string17:56
melwittyeah bc they're saying "I got an error string, it must be a console!"17:57
sean-k-mooneyright so the fact this is failing17:57
sean-k-mooneyis telliing me the 404 had an emptyp body17:57
sean-k-mooneyim going to get the request id and check the api log17:57
dansmithI wish frickler wasn't here watching this circus of three seasoned nova cores trying to figure out how one of the most basic calls works :/17:57
sean-k-mooneyit should have the content lenght17:57
melwittseasoned curlies17:58
* frickler hides17:58
dansmithfrickler: I'm kidding of course, just .. not our best moment :)17:58
fricklerwell I can't say I'm not enjoying it ;)17:59
dansmithlol17:59
melwittthis means not an empty string no?17:59
melwittDEBUG nova.api.openstack.wsgi [None req-2441eccd-ea63-4c9d-94a3-29aaf1ecf5c4 demo demo] Returning 404 to user: Guest does not have a console available. {{(pid=2011570) __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:936}}17:59
melwittI guess I should do osc --debug18:00
sean-k-mooneystatus: 404 len: 86 microversion: 2.91 time: 0.71921618:00
melwittRESP BODY: {"itemNotFound": {"code": 404, "message": "Guest does not have a console available."}}18:00
sean-k-mooney86 bytes might just be the header18:00
gmannstatus code is already checked in service client side so we can just remove this assert itself ?18:00
sean-k-mooneyoh actully no18:00
sean-k-mooneyRESP BODY: {"itemNotFound": {"code": 404, "message": "Guest does not have a console available."}}18:00
melwittso I think it really is going assertTrue("Guest does not have a console available.")18:01
sean-k-mooneyya but why whoudl that be false18:01
dansmithmaybe it's ... AI...18:01
melwittmaybe for lvm it's not returning a non empty string? I dunno why18:01
melwittAI yes18:01
sean-k-mooneyno that form lvm18:02
sean-k-mooneyi have it repoduced locally18:02
melwittoh... uh wtf 18:02
sean-k-mooneyi ran osc with --debug18:02
sean-k-mooneyso it pring the actul repsocne body18:02
gmannbut how come it reach to this assert if it return 404 ? it should have failed here https://github.com/openstack/tempest/blob/master/tempest/lib/api_schema/response/compute/v2_1/servers.py#L45618:03
sean-k-mooneythats the fully output if your interested form osc https://termbin.com/fq3i18:03
dansmithgmann: yeah I was looking to see if we were ignoring 404 or something but we're not18:04
melwittI don't get how this is passing though18:04
melwitt        lines = len(output.split('\n'))18:04
melwitt        self.assertEqual(lines, 3)18:04
gmannyeah, status code in resposne schema fail if different status code than expected and it does not proceed further18:04
* dansmith notes we now have four nova cores, one tempest core, and one devstack core...and zero understanding18:05
melwittthis is some kind of black magic I think18:05
dansmithobviously the solution is to remove the nova-lvm job18:05
melwittyeah. removing job and skipping tests makes things work :D18:06
gmann:)18:06
melwittwe definitely need more logging in that tempest test cause like... lol18:06
sean-k-mooneywe only validate if the respocne is scculs or redirection18:08
sean-k-mooneyhttps://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/common/rest_client.py#L101618:08
sean-k-mooneywe got a 40418:08
sean-k-mooneyso the cleint does not validate it18:08
sean-k-mooneythat why its not failing earlier18:08
melwittand there's no else: noice18:09
sean-k-mooneyyep we fall off the end of the function18:09
melwittVALID@18:09
melwittVALID!18:09
gmannsean-k-mooney: it should have failed here https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/common/rest_client.py#L85218:12
sean-k-mooney i assume _error_checker is a decortor or called else whewer 18:13
sean-k-mooneyi didnt see that in the funcitons i was looking at18:13
sean-k-mooneythats not called form action https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/services/compute/servers_client.py#L21818:14
sean-k-mooneyor get_console_output https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/services/compute/servers_client.py#L648-L65618:14
sean-k-mooneywhich is what we are calling https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L21118:14
gmannit is called for every API call from tempest https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/common/rest_client.py#L74218:16
sean-k-mooneyso its called form this post https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/services/compute/servers_client.py#L225C27-L225C3118:16
frickleronly success is validated https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/common/rest_client.py#L101618:16
sean-k-mooneyya but gmann  is correct18:17
sean-k-mooneyon https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/common/rest_client.py#L742 we shoudl go to _error_checker18:18
gmannthat is for schema only tempest validate status code and it is <400 then only it proceed to test otherwise fail in API call itself18:18
gmannthat->validate_response18:18
sean-k-mooneyok so the post will raise 18:19
sean-k-mooneyraise exceptions.NotFound(resp_body, resp=resp)18:20
gmanni cannot see where getConsoleOutput return 400, it was always 200 https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#4016318:20
frickleraction() doesn't use request, but post() https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/lib/services/compute/servers_client.py#L218-L23318:20
gmannand output is empty {"output": "" - https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#4016718:20
sean-k-mooneygmann: no melwitt  found where we convert it18:20
sean-k-mooneyin the api log and in the debug of osc its a 40418:21
gmannfrickler: every API call go through the request only either post, delete etc18:21
gmannsean-k-mooney: I am saying in tempest it get 200 and empty output right?18:22
sean-k-mooneyhttps://github.com/openstack/nova/blob/e5ee5e035c0c85654a139131d51be5c4df76f400/nova/api/openstack/compute/console_output.py#L56-L5818:22
sean-k-mooneygmann: i dont see how it could be getting a 200 in tempest18:23
gmannso _error_checker pass and self.assertTrue(output, "Console output was empty.") is valid assert right18:23
sean-k-mooneyi guess if we have the request id i can check in the job logs18:23
sean-k-mooneygmann: that is the assert that is failing i belive yes18:24
sean-k-mooneyyes see the bug https://bugs.launchpad.net/nova/+bug/202885118:24
gmannyeah I am checking those only18:24
sean-k-mooneyoh...18:25
gmannbecause this is alst call to get console from tempest which return 200 and empty output https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#4016318:26
sean-k-mooneythe isntance is still active18:26
sean-k-mooneyits ment to be stopped18:26
gmanneven every call on wait is same 200 and empty output https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#39248-4016318:27
gmannand while waiting for non empty console output, tempest timeout and raise exception from that assert https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/base.py#L34018:29
sean-k-mooney it should not have got there18:29
sean-k-mooneyhttps://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L727C7-L727C7918:29
sean-k-mooneythat was ment to be wating for the instance ot be shutoff18:29
sean-k-mooneyso i see 2023-07-27 14:03:23,945 91441 INFO     [tempest.common.waiters] State transition "ACTIVE/powering-off" ==> "SHUTOFF/None" after 1 second wait18:31
gmannsean-k-mooney: it does and server is in shutoff state only https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#3924618:31
gmannafter that only it check for the console output18:32
fricklerso maybe this is a race? I wonder if you sleep a bit after the wait-for-shutoff whether it would always fail with 40418:34
sean-k-mooneyso the server uuid is e526f14e-5878-4fab-ba8b-4f6c2456bd2 so we shoud lbe able to confirm the status code in the api logs18:34
sean-k-mooneyfrickler: i have the vm stopped on my local devstack and it fails with 404 yes18:35
sean-k-mooneyin ci it looks like its getting 200's for about 3 minutes18:36
sean-k-mooneybut i get a 404 basically imiedaly after i power it off18:36
fricklerhttps://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/screen-n-api.txt#3445018:38
fricklercan you run console log show in a loop while you power off? also are you on a lvm setup?18:38
sean-k-mooney i have it in a watch now18:41
sean-k-mooneybut its consitently 40418:41
sean-k-mooneyPOST /compute/v2.1/servers/e526f14e-5878-4fab-ba8b-4f6c2456bd2a/action => generated 14 bytes in 66 msecs (HTTP/1.1 200) 9 headers in 356 bytes (1 switches on core 0)   18:45
sean-k-mooneyin ci its getting a 20018:45
fricklerone other thing I don't understand is why this is wrapped in a wait_for() https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L72818:45
frickler_get_output() either asserts or returns nothing, right?18:46
sean-k-mooneyyour right18:47
sean-k-mooneyit never returns anything18:47
frickleroh, wait. _get_server acts on self.server_id not on temp_server_id18:47
gmannfrickler: sean-k-mooney dansmith melwitt: ahh I got the issue here. we are using different server in that tests not the one test create :( https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L21218:47
gmannand test use differnt server https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L723-L72818:47
fricklergmann: ^518:47
melwittwuuuut18:48
gmannfrickler: yeah18:48
gmanndiffeent server id in both place in log https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#39246-3924818:48
gmannfixing it18:48
gmannjust a 4 letter mistake and took so much time :)18:48
melwittwow. fr18:48
sean-k-mooneyoh we are not passign a server id to _get_output18:48
sean-k-mooneyi checked the ides that were in use18:49
sean-k-mooneybut i missed we did not pass it to _get_output18:49
gmannboth are different https://zuul.opendev.org/t/openstack/build/9890c9b53059410eabc0c61df0376217/log/controller/logs/tempest_log.txt#39246-3924818:49
sean-k-mooneygmann: but that does not explain why its working for ceph18:50
sean-k-mooneyoh unless the first vm18:50
sean-k-mooneyis not active18:50
sean-k-mooneyby the time we get to this?18:50
gmannsean-k-mooney: might be that different server used in get_console is created in setup so it has the console output18:50
sean-k-mooneyor that ya18:50
sean-k-mooneyits runing longer18:50
gmannyeah18:51
sean-k-mooneywell its more you only get the 404 if its stopped18:51
sean-k-mooneygmann: are you workign on a pathch to optionally pass the server id to _get_output18:52
gmannsean-k-mooney: yeah18:52
gmannrunning the test18:53
sean-k-mooneythe test sill still not be really correct18:53
fricklerthe test should start failing if this is fixed, because it should get 404s all the time18:54
fricklerso the question is whether nova is broken or the test18:55
sean-k-mooneyyep18:55
sean-k-mooneyboth kind of18:55
sean-k-mooneynova could be maded to work when the vm is stopped in this case18:55
sean-k-mooneybut the test is not correct today either18:55
gmannwhy ?18:55
gmannnova should return 404 or something if no actual console is get or maybe correct if empty console for shutoff VM is ok18:56
gmannbut I do not see why test is incorrect18:56
gmannor should we expect empty console for shutoff VM as valid case?18:56
sean-k-mooneybecause its going to be asserting that "ResourceNotFound: 404: Client Error for url: http://192.168.122.17/compute/v2.1/servers/ea890cd1-c3fe-4455-bcd9-b44ea830a136/action, Guest does not have a console available." is true18:56
sean-k-mooneythe test is ment to be asserting that you can retirve the concole log while the vm is stopped18:57
sean-k-mooneythat works on my home cloud which is using cinder BFV18:58
sean-k-mooneyso there is a bug in nova as this should work18:58
sean-k-mooneyif the test catches that then the test is ok but it shoudl be failing when using the correct server id18:59
sean-k-mooneyin the lvm job19:00
sean-k-mooneyso we shoudl retirage https://bugs.launchpad.net/nova/+bug/202885119:00
sean-k-mooneyits a valid bug for both nova and tempest 19:00
sean-k-mooneydansmith: melwitt: do you agree ^19:00
sean-k-mooneyon the nova side we either need to say the console shoudl always be aviable for all dirver when the server is off19:01
sean-k-mooneyor that test is invalid for nova-lvm19:01
melwittI'm a little confused but if the tempest test is passing with the correct server id with a 404 being returned from nova then the tempest test has a bug as well19:02
sean-k-mooneyno i think that will fail19:02
sean-k-mooneythe only tempest bug is the one gmann found where its not using the correct id19:02
melwittoh gotcha. yeah. I agree there is a bug on nova too19:02
gmannhttps://review.opendev.org/c/openstack/tempest/+/88989519:04
dansmith404 should mean "that path you gave me does not exist19:04
gmannmelwitt: sean-k-mooney yeah i will fail, I tested locally and got the 404 from nova, tempest failed 19:04
dansmith404 for a POST doing an action that doesn't return anything but for a guest that does exist is extremely wrong, IMHO19:05
dansmithespecially when the same POST url just worked for shutdown and will work a minute later for start up19:05
gmanndansmith: yes it says 404 Guest does not have a console available 19:05
dansmithright, that is incorrect HTTP behavior, IMHO19:05
sean-k-mooneydansmith: its kind of abusing 404 right now19:05
dansmithyeah19:06
sean-k-mooneybut i can see why someone might have chosen it19:06
sean-k-mooneyi.e. the log was nto found19:06
gmannmaybe 400 we should return19:06
sean-k-mooneyno its a 500 or simiall19:06
sean-k-mooneythis is a server side issue19:06
dansmithhow is it a 500?19:06
dansmithIMHO, it should be 20019:06
sean-k-mooneywell it shoudl be a 200 with the content19:07
gmannso until guest is shutoff, console not available is ok19:07
sean-k-mooneywhich is what i get with a bfv vm with cinder lvm19:07
sean-k-mooneyso with nova and lvm today we can only get the console while the vm is running19:08
dansmithsean-k-mooney: did I miss why lvm differs here?19:08
sean-k-mooneywith bfv we can get the console when the vm is stopped19:08
dansmithbecause I would say that the console should always return 200, even if the console is empty, unless we know that it can't have a console or something.. and even still, 404 is not the right thing to return (nor 500)19:08
sean-k-mooneydansmith: no im still not sure why that is the case although i cna try and compare to my home cloud19:08
dansmithwe have to figure that out, because I think that's wrong (as heck)19:09
sean-k-mooneydansmith: ya i would be ok with always returnning 20019:09
dansmithbut even still, 200 even if it's empty because the operation worked, this is what we have for console log, which may be nothing19:09
sean-k-mooneyi said 500 just ebcasue there is a bug in nova 19:09
dansmithif you have your kernel set to log to something other than tty0 it'll be blank too19:09
sean-k-mooneybut this is part of the interop tests right19:09
sean-k-mooneyso this cant have backend depent behavior19:09
gmannyeah empty is still a response we got the console19:10
gmannbut console not available? 19:10
gmann'console not available' should be error case right19:11
sean-k-mooneyit depends19:11
dansmithit should never be 404, that's for sure19:12
gmannbecause this fix will get the 404 on console not avaliable for the shutoff Guest https://review.opendev.org/c/openstack/tempest/+/88989519:12
dansmithbecause /action exists19:12
dansmithit can be an error maybe if we've never started the guest yet, or *maybe* if it was lost from disk somehow19:12
dansmithbut that's different than blank19:12
sean-k-mooneydansmith: i agree on the 404. the depends is more i belive its possibel to have a vm without any console19:12
gmannI think 400 is more appropriate here19:12
dansmithmaybe 410 Gone19:13
dansmithHonestly, 4xx means the client messed up and the client hasn't messed up here19:13
dansmiththere's also 425 Too Early19:13
dansmithand 428 Precondition Required19:13
gmannbut when server is not yet active and console not availble 410 is not right. for shutoff case where console was available but now it is not19:13
dansmithyeah, I really don't understand what use ever returning 4xx for this is19:14
gmanndansmith: but if request is before server is in active or in shutoff state it is client error right?19:14
dansmithmaybe on a windows guest where there is no such console?19:14
sean-k-mooneyso its not a clinet error so 4XX is incorrect19:14
dansmithgmann: I dunno, it's debatable I think19:14
dansmithmaybe if you call it before it's active,19:14
sean-k-mooneythe user should be able to expect the console to be ableiable i think if a console exits for the instnace19:14
dansmithbut since you shoudl be able to call it in SHUTOFF, that's kinda confusing to the user that doesn't care much19:14
gmannbut does nova says console can be available for before guest become active or in shutoff state?19:15
sean-k-mooneyyes the console can be avaiable in shutoff19:15
gmanndansmith: humm that is good point19:15
sean-k-mooneygmann: we have a bug in the livbirt drvier where it does not find it properly19:15
dansmithgmann: so maybe if the instance isn't scheduled to a host yet, then 400 makes sense19:15
dansmithbut after it's on a host, I dunno, seems like 200 empty is the right move always19:16
gmannyeah, it is difficult to make differentiation on those cases19:16
dansmithso instance.host==None, 4xxsomething, but otherwise, I'd say 200 empty19:16
gmann+119:16
dansmithI don't think there's any useful distinction between "nothing has ever been written to the console" and "the console log is gone"19:17
sean-k-mooneyhehe ok fun19:17
sean-k-mooneydansmith: so my boot form volume guest is not using a pty19:17
dansmithlike I said, *maybe* for a windows guest with no actual console, I dunno how that works really19:17
dansmitheven macos on uefi has a text console.. it's set to silent by default, but it's there19:17
gmannI think I got your point of action is passed so it should be 200 irrespective of what output it give19:17
dansmithgmann: yeah19:17
sean-k-mooneydansmith: https://paste.opendev.org/show/bvGBKyn5fHQthz8TO13C/19:18
dansmithsean-k-mooney: interesting19:18
sean-k-mooneydansmith: so its using tcp for the console19:18
sean-k-mooneyand the source is still there when its stoped19:18
dansmithyeah, I dunno why that is set that way, but I can't imagine it has much to do with being lvm19:19
sean-k-mooneythat becasue its kolla-ansible i think19:19
dansmithyeah *that* I'd believe :)19:19
sean-k-mooneyagain this is related to how libvirt is configured19:19
sean-k-mooneysince its generating that so its connecting to virtlogd over tcp in this case19:20
sean-k-mooneywell to qemu i guess not virtlogd19:20
sean-k-mooneythis deployment is using molitic libvirt19:21
sean-k-mooneydansmith: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L446419:22
sean-k-mooneyso if we are usign tcp we get the path form log19:22
sean-k-mooneynot source19:22
sean-k-mooneyso thats why that is working19:22
dansmithokay but virtlogd should hide this all from us and let us use the log file yeah?19:22
sean-k-mooneyyes for qemu/kvm19:23
sean-k-mooneyi would need to double check for lxc19:23
dansmithack19:23
sean-k-mooneyhttps://libvirt.org/formatdomain.html#consoles-serial-parallel-channel-devices is the relevent docs19:34
sean-k-mooneyRegardless of the type, character devices can have an optional log file associated with them. This is expressed via a log sub-element, with a file attribute. There can also be an append attribute which takes the same values described above. Since 1.3.3 .19:34
dansmiththat implies virtlogd works for everything yeah?19:34
dansmith(as I would expect)19:35
sean-k-mooneyimplies it yes btu does not actully say it19:36
sean-k-mooneyi also cant find anything that would imply how you tell libvirt to chooese the type19:37
sean-k-mooneyim going to check my kolla cloud incase it ahs something set in the libvirt configs19:37
sean-k-mooneyi suspect pty might be the default or the default in newer libvirts19:38
sean-k-mooneyah found it19:40
sean-k-mooney# The backend to use for handling stdout/stderr output from19:40
sean-k-mooney# QEMU processes.19:40
sean-k-mooney#19:40
sean-k-mooney#  'file': QEMU writes directly to a plain file. This is the19:40
sean-k-mooney#          historical default, but allows QEMU to inflict a19:40
sean-k-mooney#          denial of service attack on the host by exhausting19:40
sean-k-mooney#          filesystem space19:40
sean-k-mooney#19:40
sean-k-mooney#  'logd': QEMU writes to a pipe provided by virtlogd daemon.19:40
sean-k-mooney#          This is the current default, providing protection19:40
sean-k-mooney#          against denial of service by performing log file19:40
sean-k-mooney#          rollover when a size limit is hit.19:40
sean-k-mooney#19:40
sean-k-mooney#stdio_handler = "logd"19:40
sean-k-mooneymy kolla cloud is using file19:41
dansmitheesh man, pastebin c'mon19:41
dansmiththat's libvirt config?19:41
sean-k-mooney:) sure ya that /etc/libvirt/qemu.conf19:41
dansmithoh, qemu.conf, interesting19:42
dansmithwell, I dunno what to say19:42
dansmithwe don't support lxc19:42
dansmithI don't want to break it for no reason, but... unless someone is going to try out a change...19:42
sean-k-mooneyill try chanign my lvm devstack to file and see if it changes to tcp19:42
sean-k-mooneyno still using pty 19:46
sean-k-mooneyok im going ot leave it there today19:46
gmannthis test was wrong since starting and not sure how https://review.opendev.org/c/openstack/tempest/+/889109 unhide it on lvm and made it always failing19:47
gmann889109 does not change anything from this test perspective19:47
frickleroh, interesting, iirc there is still a kolla bug about dos via console log filling the disk. I need to look into that19:47
sean-k-mooneymy cloud is not fully up to date19:48
fricklergmann: interaction with other tests in the class?19:48
dansmithfrickler: I doubt it19:48
sean-k-mooneybut on the kolla sid i jsu tnotice that libvirtd.conf is configured to connect to libvirt with sasl over tcp19:48
dansmithit may be because this change does increase the load with more parallelism,19:48
gmannfrickler: humm, no, it recreate the server in setup and keep it SSHABLE19:48
dansmithhowever this test class was usually running early on enough that all the workers were slammed with work19:48
dansmithI looked at a *lot* of stackviz reports before doing that :)19:49
sean-k-mooneyhttps://paste.opendev.org/show/bZMoOYeTqbcHyr79ZDwS/19:49
gmanndansmith: i doubt, server was active always in any parallel case in setup so it should have been working in same way it was working before19:50
dansmithgmann: I agree, I'm just saying load can be higher with the split, but that was the point19:50
dansmithI just rechecked sean-k-mooney's patch to get another data point.. i.e. to see if it's still only lvm and only that test19:51
gmannanyways as 889895 corrent the tests and make it fail always. I am going to skip this test for now until we figure out the nova behavior in 'guest shutoff' case 19:51
gmann*coprrect19:51
dansmithack19:52
*** elodilles is now known as elodilles_pto20:35
fricklermaybe this is related, since it has been moved to a different class https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/servers/test_server_actions.py#L332-L33721:10
frickleralso expected failures on the tempest patch going back to yoga, so not a recent regression at least21:10
gmannbut that was same before moving to different class also. this test s no issue, test_get_console_output_server_id_in_shutoff_status started failing after refactoring and not sure why21:22
gmannor maybe you are right. the execution order of test in same class might be making the server console available before test_get_console_output_server_id_in_shutoff_status  execute21:24
dansmithhow?21:29
dansmitheach class gets its own server to work on, and the tests within a class are single-threaded. if it matters now, it mattered before, so it must have already been broken21:30
gmannone chance I see if that self.server_id server get through many operation before test_get_console_output_server_id_in_shutoff_status   was executed before so console of that self.server_id was somehow available but now as it is in separate class, self.server_id is kind of fresh server and fail ?21:37
gmannidk but that is something might be happening, i cannot find any other reasoning on this21:37
gmanncorrecting and skipping this test for now https://review.opendev.org/c/openstack/tempest/+/88989521:50
dansmithyeah I don't think so.. at one point I was wondering if we had to call get_console once to process the pty data and start the log file, but with virtlogd that should definitely not be the case22:15

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!