sean-k-mooney | gmann: efried so it looks like it's failing to install os-testr | 00:05 |
---|---|---|
sean-k-mooney | http://paste.openstack.org/show/788663/ | 00:06 |
sean-k-mooney | or rather it will install os-testr, then try to replace PyYAML | 00:06 |
*** jmlowe has joined #openstack-nova | 00:11 | |
*** derekh has quit IRC | 00:16 | |
*** jmlowe has quit IRC | 00:27 | |
*** mlavalle has quit IRC | 00:29 | |
*** jmlowe has joined #openstack-nova | 00:35 | |
*** TxGirlGeek has quit IRC | 00:36 | |
*** jmlowe has quit IRC | 00:47 | |
*** mkrai_ has joined #openstack-nova | 01:43 | |
*** gyee has quit IRC | 02:04 | |
*** mkrai__ has joined #openstack-nova | 02:05 | |
*** Liang__ has quit IRC | 02:06 | |
*** mkrai_ has quit IRC | 02:08 | |
*** TxGirlGeek has joined #openstack-nova | 02:51 | |
*** gentoorax has quit IRC | 02:55 | |
openstackgerrit | melanie witt proposed openstack/nova stable/pike: Avoid redundant initialize_connection on source post live migration https://review.opendev.org/683008 | 02:55 |
*** jkulik has quit IRC | 03:00 | |
*** mvkr has quit IRC | 03:02 | |
*** gentoorax has joined #openstack-nova | 03:11 | |
*** mvkr has joined #openstack-nova | 03:13 | |
*** jkulik has joined #openstack-nova | 03:26 | |
*** nweinber__ has joined #openstack-nova | 03:42 | |
*** nweinber__ has quit IRC | 04:02 | |
*** udesale has joined #openstack-nova | 04:04 | |
*** mkrai__ has quit IRC | 04:12 | |
*** mkrai_ has joined #openstack-nova | 04:12 | |
*** links has joined #openstack-nova | 05:16 | |
*** artom has quit IRC | 05:28 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #openstack-nova | 05:34 | |
*** alex_xu has quit IRC | 05:41 | |
*** rchurch has quit IRC | 05:44 | |
*** Jeffrey4l has quit IRC | 05:45 | |
*** Jeffrey4l has joined #openstack-nova | 05:46 | |
*** rchurch has joined #openstack-nova | 05:47 | |
frickler | so I started to fix the live-migration job at https://review.opendev.org/703735 , but a better solution would be to make os-testr work with system pyyaml. the latter is depended on by e.g. cloud-init so it's very hard to remove. or set up using os-testr from within a venv only | 06:02 |
frickler | also, has anyone started work to move that job to zuul v3? | 06:03 |
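The "os-testr from within a venv only" idea above can be sketched in Python as follows. This is a minimal illustration, not what devstack or devstack-gate actually do: the helper names and the example path are hypothetical, and a POSIX layout (`bin/python`) is assumed. The point is that a venv created without system site-packages has its own PyYAML, so pip never needs to uninstall the copy owned by the distro package manager.

```python
import venv
from pathlib import Path


def create_isolated_env(venv_dir, with_pip=True):
    """Create a venv that does not see the system site-packages.

    Hypothetical helper: packages installed into this venv cannot
    replace the distro-managed PyYAML that cloud-init depends on.
    """
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=with_pip)
    builder.create(venv_dir)
    return Path(venv_dir)


def venv_pip_command(venv_dir, package):
    """Return the argv that installs `package` using the venv's interpreter.

    Assumes a POSIX layout, where the interpreter lives in <venv>/bin.
    """
    python = Path(venv_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", package]
```

A caller would then run something like `subprocess.run(venv_pip_command("/opt/stack/testr-venv", "os-testr"), check=True)`, where the path is only an example.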
*** TxGirlGeek has quit IRC | 06:06 | |
*** ociuhandu has joined #openstack-nova | 06:15 | |
*** ociuhandu has quit IRC | 06:20 | |
*** ratailor has joined #openstack-nova | 06:28 | |
*** TxGirlGeek has joined #openstack-nova | 06:36 | |
*** TxGirlGeek has quit IRC | 06:45 | |
*** vishalmanchanda has quit IRC | 07:24 | |
*** ivve has quit IRC | 07:30 | |
*** rpittau|afk is now known as rpittau | 07:37 | |
*** lpetrut has joined #openstack-nova | 07:47 | |
*** yaawang has quit IRC | 07:50 | |
*** slaweq_ has joined #openstack-nova | 07:59 | |
*** tesseract has joined #openstack-nova | 08:08 | |
*** maciejjozefczyk_ has joined #openstack-nova | 08:08 | |
*** tesseract has quit IRC | 08:09 | |
*** tesseract has joined #openstack-nova | 08:09 | |
*** bnemec has joined #openstack-nova | 08:09 | |
*** tkajinam has quit IRC | 08:10 | |
*** davee_ has quit IRC | 08:14 | |
*** davee_ has joined #openstack-nova | 08:14 | |
*** mrch_ has joined #openstack-nova | 08:16 | |
*** ratailor has quit IRC | 08:21 | |
*** ratailor has joined #openstack-nova | 08:25 | |
*** tosky has joined #openstack-nova | 08:27 | |
*** xiaolin has quit IRC | 08:30 | |
*** slaweq_ has quit IRC | 08:33 | |
*** xek_ has joined #openstack-nova | 08:34 | |
*** mrch has joined #openstack-nova | 08:35 | |
*** xek has quit IRC | 08:37 | |
*** ivve has joined #openstack-nova | 08:37 | |
*** mrch_ has quit IRC | 08:38 | |
*** ralonsoh has joined #openstack-nova | 08:51 | |
*** jaosorior has joined #openstack-nova | 08:59 | |
*** brinzhang has quit IRC | 09:08 | |
*** alex_xu has joined #openstack-nova | 09:14 | |
*** mvkr has quit IRC | 09:23 | |
*** martinkennelly has joined #openstack-nova | 09:26 | |
*** mvkr has joined #openstack-nova | 09:37 | |
*** dtantsur|afk is now known as dtantsur | 09:48 | |
*** maciejjozefczyk_ is now known as maciejjozefczyk | 10:10 | |
*** derekh has joined #openstack-nova | 10:13 | |
*** ociuhandu has joined #openstack-nova | 10:30 | |
*** ociuhandu has quit IRC | 10:31 | |
*** ociuhandu has joined #openstack-nova | 10:31 | |
*** damien_r has joined #openstack-nova | 10:32 | |
*** damien_r has quit IRC | 10:33 | |
*** iurygregory has quit IRC | 10:36 | |
stephenfin | frickler: I assume "install everything devstack'y in a virtualenv" is much harder than it seems | 10:47 |
stephenfin | frickler: and yes, mriedem has started work moving to zuul v3 and I think sean-k-mooney is continuing that | 10:47 |
bauzas | stephenfin: gibi: worth looking at an already-approved spec that I could work on in Ussuri ? https://review.opendev.org/#/c/702943/ | 10:51 |
*** damien_r has joined #openstack-nova | 10:56 | |
stephenfin | bauzas: Sure. Got some downstream release'y stuff to take care of this morning first though | 10:58 |
bauzas | hah | 10:58 |
bauzas | your turn | 10:58 |
*** pcaruana has joined #openstack-nova | 11:02 | |
*** dviroel has joined #openstack-nova | 11:03 | |
*** ociuhandu has quit IRC | 11:15 | |
*** mkrai_ has quit IRC | 11:17 | |
*** ociuhandu has joined #openstack-nova | 11:27 | |
*** ratailor has quit IRC | 11:30 | |
*** mvkr has quit IRC | 11:30 | |
openstackgerrit | John Garbutt proposed openstack/nova-specs master: Small fixes to unified limits spec https://review.opendev.org/703773 | 11:36 |
*** rpittau is now known as rpittau|bbl | 11:41 | |
*** ociuhandu has quit IRC | 11:42 | |
*** mvkr has joined #openstack-nova | 11:43 | |
*** tbachman has quit IRC | 11:44 | |
*** ccamacho has joined #openstack-nova | 11:45 | |
*** mkrai_ has joined #openstack-nova | 11:55 | |
*** ccamacho has quit IRC | 11:55 | |
*** ccamacho has joined #openstack-nova | 11:55 | |
*** yan0s has joined #openstack-nova | 11:58 | |
*** ociuhandu has joined #openstack-nova | 12:00 | |
*** udesale_ has joined #openstack-nova | 12:04 | |
*** udesale has quit IRC | 12:07 | |
*** ociuhandu has quit IRC | 12:08 | |
*** ociuhandu has joined #openstack-nova | 12:09 | |
*** ratailor has joined #openstack-nova | 12:12 | |
*** ratailor has quit IRC | 12:12 | |
*** ivve has quit IRC | 12:12 | |
*** udesale_ has quit IRC | 12:21 | |
*** udesale has joined #openstack-nova | 12:21 | |
*** jawad_axd has joined #openstack-nova | 12:24 | |
*** iurygregory has joined #openstack-nova | 12:25 | |
*** iurygregory_ has joined #openstack-nova | 12:29 | |
gibi | it seems I need to spend my time today on downstream issues | 12:30 |
*** iurygregory has quit IRC | 12:30 | |
*** iurygregory_ has quit IRC | 12:32 | |
*** zbr has quit IRC | 12:34 | |
*** iurygregory has joined #openstack-nova | 12:34 | |
*** zbr has joined #openstack-nova | 12:36 | |
*** ivve has joined #openstack-nova | 12:38 | |
*** iurygregory has quit IRC | 12:39 | |
*** iurygregory has joined #openstack-nova | 12:40 | |
*** damien_r has quit IRC | 12:40 | |
*** zbr has quit IRC | 12:46 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: virt: Provide block_device_info during rescue https://review.opendev.org/700811 | 12:47 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Add support for stable device rescue https://review.opendev.org/700812 | 12:47 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: docs: Add stable device rescue docs https://review.opendev.org/700837 | 12:47 |
*** zbr has joined #openstack-nova | 12:49 | |
*** mkrai_ has quit IRC | 13:08 | |
*** jaosorior has quit IRC | 13:12 | |
*** jaosorior has joined #openstack-nova | 13:12 | |
*** damien_r has joined #openstack-nova | 13:13 | |
*** ociuhandu has quit IRC | 13:14 | |
*** xek_ is now known as xek | 13:15 | |
*** ociuhandu has joined #openstack-nova | 13:18 | |
*** rpittau|bbl is now known as rpittau | 13:20 | |
*** ociuhandu has quit IRC | 13:22 | |
*** mkrai_ has joined #openstack-nova | 13:22 | |
*** zbr has quit IRC | 13:26 | |
*** zbr has joined #openstack-nova | 13:30 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova.image.api' module https://review.opendev.org/702451 | 13:54 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP: nova-net: Remove unused nova-network objects https://review.opendev.org/697156 | 13:54 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: nova-net: Update API reference guide https://review.opendev.org/703796 | 13:54 |
*** rcernin has quit IRC | 14:03 | |
*** ociuhandu has joined #openstack-nova | 14:18 | |
*** lbragstad has quit IRC | 14:23 | |
*** links has quit IRC | 14:23 | |
*** nweinber__ has joined #openstack-nova | 14:26 | |
*** lbragstad has joined #openstack-nova | 14:28 | |
*** Liang__ has joined #openstack-nova | 14:33 | |
*** eharney has joined #openstack-nova | 14:34 | |
*** ociuhandu has quit IRC | 14:40 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: move os-vif-ovs to be a non legacy job. https://review.opendev.org/701601 | 14:40 |
openstackgerrit | sean mooney proposed openstack/os-vif master: Revert "[Follow Up] OVS DPDK port representors support" https://review.opendev.org/703672 | 14:40 |
*** ociuhandu has joined #openstack-nova | 14:41 | |
*** TxGirlGeek has joined #openstack-nova | 14:51 | |
*** priteau has joined #openstack-nova | 14:58 | |
*** mkrai_ has quit IRC | 14:58 | |
*** Liang__ is now known as LiangFang | 14:59 | |
*** vishalmanchanda has joined #openstack-nova | 14:59 | |
*** damien_r has quit IRC | 15:03 | |
efried | sean-k-mooney, gmann: So is the PyYAML thing in n-l-m sorted yet or still busted? | 15:04 |
*** LiangFang has quit IRC | 15:05 | |
openstackgerrit | sean mooney proposed openstack/nova stable/stein: Remove 'test_cold_migrate_with_physnet_fails' test https://review.opendev.org/702971 | 15:06 |
openstackgerrit | sean mooney proposed openstack/nova stable/stein: Block rebuild when NUMA topology changed https://review.opendev.org/702972 | 15:06 |
openstackgerrit | sean mooney proposed openstack/nova stable/stein: Disable NUMATopologyFilter on rebuild https://review.opendev.org/702973 | 15:06 |
openstackgerrit | sean mooney proposed openstack/nova stable/stein: FUP for in-place numa rebuild https://review.opendev.org/702974 | 15:06 |
frickler | efried: busted, need a fix in devstack and ds-gate each | 15:06 |
sean-k-mooney | efried: it's breaking os-vif's one remaining legacy job too | 15:07 |
sean-k-mooney | frickler: i assume ds-gate is installing os-testr or something like that | 15:07 |
*** mrch has quit IRC | 15:07 | |
sean-k-mooney | the non-legacy versions seem to be fine but it's the installation of os-testr that is triggering PyYAML to be reinstalled | 15:08 |
*** jawad_axd has quit IRC | 15:08 | |
*** jawad_axd has joined #openstack-nova | 15:09 | |
efried | I can't tell if https://review.opendev.org/#/c/561597/ will fix it temporarily until https://review.opendev.org/#/c/703735/ can be landed. | 15:09 |
efried | frickler: ^ ? | 15:10 |
frickler | efried: the former isn't the fix, but the cause of the new failures | 15:10 |
*** damien_r has joined #openstack-nova | 15:10 | |
efried | oh? I was seeing them before that patch landed | 15:11 |
efried | I think | 15:11 |
frickler | fixes are https://review.opendev.org/703792 and https://review.opendev.org/#/c/703271/4 | 15:11 |
frickler | oh https://review.opendev.org/#/c/703735/ is the fix, 703271 is just the bottom of that stack | 15:12 |
*** ociuhandu has quit IRC | 15:13 | |
sean-k-mooney | frickler: i was wondering if a better fix might be to start running devstack with USE_VENV=True | 15:13 |
sean-k-mooney | that should prevent conflicts with site packages, right? | 15:13 |
efried | frickler: thanks. Who can approve these? | 15:14 |
sean-k-mooney | however if we wanted to keep the cross-project compatibility checking we might want a USE_SHARED_VENV option too | 15:14 |
frickler | sean-k-mooney: clarkb made some tests with USE_VENV, I don't think he got it into a working state | 15:15 |
sean-k-mooney | i tried it last night and it got pretty far but it failed to start glance | 15:16 |
frickler | efried: I pinged some folks in #-qa already, but devstack cores are rare these days. | 15:16 |
*** jawad_axd has quit IRC | 15:16 | |
sean-k-mooney | but yes i have not used it before because it's not tested in the gate | 15:16 |
efried | frickler: is there any value to me opening a bug? Or would that just be unnecessary red tape cause now we have to mark commit messages etc? | 15:17 |
*** noonedeadpunk has left #openstack-nova | 15:17 | |
frickler | efried: the patches are in place and seem to work, so I don't see any value in a bug | 15:17 |
efried | ack | 15:17 |
efried | sean-k-mooney: are you tracking the effort to migrate n-l-m to zv3? Or is that gmann? Clearly we're going to keep running into this kind of nonsense, and we've been suffering with the gzip thing. | 15:18 |
frickler | unless someone wants to pick up mriedem's task of feeding elastic-recheck | 15:19 |
sean-k-mooney | i dont have a bug for that currently. but i can file one | 15:21 |
*** Sundar has joined #openstack-nova | 15:21 | |
*** martinkennelly has quit IRC | 15:22 | |
sean-k-mooney | i have been tied up with a backport to queens of the in-place numa rebuild that i need to get done by friday to do a backport of a feature downstream, so i have not had time to work on the live migration job yet | 15:22 |
*** tbachman has joined #openstack-nova | 15:27 | |
*** priteau has quit IRC | 15:28 | |
*** tbachman has quit IRC | 15:28 | |
*** martinkennelly has joined #openstack-nova | 15:28 | |
*** tbachman has joined #openstack-nova | 15:29 | |
sean-k-mooney | efried: https://bugs.launchpad.net/nova/+bug/1860573 | 15:29 |
openstack | Launchpad bug 1860573 in OpenStack Compute (nova) "Nova legacy jobs should be ported to zuul v3 native jobs" [High,Triaged] | 15:29 |
*** TxGirlGeek has quit IRC | 15:32 | |
*** luksky has joined #openstack-nova | 15:33 | |
efried | thanks sean-k-mooney | 15:34 |
efried | gmann: ^ for your subscribing enjoyment | 15:35 |
*** ociuhandu has joined #openstack-nova | 15:47 | |
*** artom has joined #openstack-nova | 15:49 | |
*** TxGirlGeek has joined #openstack-nova | 16:04 | |
*** TxGirlGeek has quit IRC | 16:06 | |
*** TxGirlGeek has joined #openstack-nova | 16:07 | |
*** TxGirlGeek has quit IRC | 16:10 | |
*** ociuhandu has quit IRC | 16:11 | |
*** TxGirlGeek has joined #openstack-nova | 16:13 | |
*** ociuhandu has joined #openstack-nova | 16:15 | |
*** gyee has joined #openstack-nova | 16:16 | |
gmann | efried: ACK. | 16:16 |
gmann | devstack change is +A too | 16:16 |
*** iurygregory has quit IRC | 16:16 | |
*** mlavalle has joined #openstack-nova | 16:18 | |
*** iurygregory has joined #openstack-nova | 16:19 | |
*** mkrai_ has joined #openstack-nova | 16:20 | |
*** ociuhandu has quit IRC | 16:20 | |
*** dtantsur is now known as dtantsur|afk | 16:23 | |
*** mriedem has joined #openstack-nova | 16:24 | |
*** yan0s has quit IRC | 16:25 | |
*** ivve has quit IRC | 16:28 | |
*** tosky has quit IRC | 16:31 | |
*** TxGirlGeek has quit IRC | 16:42 | |
*** mkrai_ has quit IRC | 16:46 | |
*** luksky has quit IRC | 16:47 | |
*** TxGirlGeek has joined #openstack-nova | 16:49 | |
*** bnemec has quit IRC | 16:51 | |
kashyap | sean-k-mooney: Know where is the actual failure here? - https://zuul.opendev.org/t/openstack/build/ce5c1b08a48348bcad0d27dd05b0d076 | 16:53 |
kashyap | sean-k-mooney: It says "FAILED with status: 2" | 16:53 |
* kashyap is trying to wade through the files to find it... | 16:53 | |
kashyap | Maybe this - https://0aa0d36168d68dde4230-9fa499072a9f8bf63e024cc09284603e.ssl.cf5.rackcdn.com/616603/15/check/nova-live-migration/ce5c1b0/job-output.txt | 16:54 |
kashyap | Yep | 16:54 |
*** ociuhandu has joined #openstack-nova | 16:56 | |
efried | kashyap: n-l-m jobs are failing with PyYAML bs. | 16:56 |
efried | Fixes in devstack and devstack-gate are merging | 16:57 |
efried | https://review.opendev.org/#/c/703735/ and its dep | 16:57 |
*** iurygregory has quit IRC | 16:57 | |
kashyap | efried: Ah, thank you, good sir | 16:57 |
*** TxGirlGeek has quit IRC | 16:58 | |
efried | if you want to see the real failure, you have to look in devstacklog.txt.gz | 16:58 |
efried | which you have to unzip | 16:58 |
efried | which is a pita | 16:58 |
efried | and will go away once we migrate n-l-m to zuulv3 | 16:58 |
efried | which is nontrivial | 16:59 |
efried | but in the works | 16:59 |
efried | but has been for a while | 16:59 |
kashyap | (Okay, I quoted you on the Nova change.) | 16:59 |
kashyap | efried: Yeah, that's the "main script" the job-output.txt moans about. Thank you | 16:59 |
efried | but we've been essentially broken by it almost continuously for a couple of weeks. | 16:59 |
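For readers who hit the same gzipped `devstacklog.txt.gz`, a small Python sketch can scan it without unzipping by hand. This is only an illustration: the function name is hypothetical, and the search pattern is an assumption about what a failure might look like (the "Cannot uninstall" text is a guess at pip's wording, not taken from this log).

```python
import gzip
import re

# Assumed markers of a failure; adjust to whatever the log really contains.
ERROR_PATTERN = re.compile(r"ERROR|Traceback|Cannot uninstall", re.IGNORECASE)


def find_error_lines(path, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for lines in a .gz log matching pattern."""
    hits = []
    # "rt" decompresses and decodes on the fly; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if pattern.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

Calling `find_error_lines("devstacklog.txt.gz")` on a downloaded copy returns the matching lines with their positions, which is usually enough to locate the first real failure.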
*** TxGirlGeek has joined #openstack-nova | 16:59 | |
kashyap | Hmm | 17:00 |
efried | kashyap: If you have any ability/time to help with that migration, it would be much appreciated. | 17:01 |
efried | I've been avoiding digging in because I would be starting off so far behind the curve that by the time I figured out how to get started someone else would (hopefully) already have it figured out. | 17:01 |
kashyap | (Afraid, not this week, preparing for a work conf on Fri) | 17:02 |
efried | not that that would be time wasted for me, but in terms of priorities, I need to be doing other things. | 17:02 |
efried | okay, it was worth a shot :P | 17:02 |
kashyap | efried: Yeah, completely understand | 17:02 |
kashyap | Can't dig into every mom-n-pop failure :D | 17:02 |
kashyap | This channel is logged, /me should mind his language | 17:03 |
melwitt | kashyap: I'm just reminded, we'd appreciate your review on this proposed revert https://review.opendev.org/703596 | 17:03 |
efried | unfortunately the mom-n-pop failures have been what's killing us, and consuming a bunch of my time. | 17:03 |
*** rpittau is now known as rpittau|afk | 17:03 | |
kashyap | melwitt: Hi, /me clicks | 17:03 |
efried | mostly me floundering around trying to find someone who can help with them. | 17:03 |
kashyap | efried: :-( | 17:03 |
kashyap | melwitt: To start with, that commit doesn't even tell why on god's green earth it is trying to revert | 17:05 |
* kashyap wades through the comments | 17:05 | |
melwitt | kashyap: take that up with sean-k-mooney lol | 17:05 |
*** udesale has quit IRC | 17:05 | |
kashyap | I despise "naked commit messages | 17:05 |
kashyap | s/commit/commit"/ | 17:05 |
kashyap | :D | 17:05 |
melwitt | me too | 17:05 |
sean-k-mooney | kashyap: it is what you get from the ui by default | 17:06 |
kashyap | Okay, Sean comments on PS1 | 17:06 |
sean-k-mooney | but i left some comments on the backport then filed the revert | 17:06 |
kashyap | sean-k-mooney: It just doesn't make sense at all to do CPU comparison check on AArch64 | 17:06 |
sean-k-mooney | that is incorrect | 17:06 |
kashyap | melwitt: I documented "why" on the main change | 17:06 |
sean-k-mooney | we have the info in /proc/cpuinfo | 17:06 |
sean-k-mooney | so we can check the cpu flags and cpu model | 17:07 |
kashyap | sean-k-mooney: I relied on the expertise on the libvirt dev who wrote that code. Want to argue with him? | 17:07 |
sean-k-mooney | sure | 17:07 |
kashyap | And hell, even AArch64 folks *themselves* told that KVM guests are to be run via 'host-passthrough' | 17:07 |
sean-k-mooney | yes | 17:07 |
kashyap | s/told/tell/ | 17:08 |
sean-k-mooney | but host-passthrough means that the guest will see the host cpu flags and model | 17:08 |
kashyap | Yes, so what? | 17:08 |
sean-k-mooney | i'm not arguing that we should not use host-passthrough | 17:08 |
kashyap | That's the whole point in this case | 17:08 |
sean-k-mooney | i am saying when you do, it's more important to ensure the flags and model do not change | 17:08 |
kashyap | melwitt: sean-k-mooney: Let me quote my message from the main change, since I don't know what people have read or haven't | 17:08 |
kashyap | [quote] | 17:08 |
kashyap | So it doesn't make sense to do CPU compatibility check on AArch64. And the AArch64 folks themselves recommend that the way to run KVM guests on AArch64 is via 'host-passthrough'. | 17:08 |
kashyap | It is true that libvirt does not know how to detect host CPU model on AArch64, but even if it _wants_ to know, it cannot, because even the `/proc/cpuinfo` on AArch64 doesn't show anything interesting. There are lots of vendors making different AArch64 CPUs, and they are not easily comparable. They all differ in various ways. (This is also confirmed by Jiri Denemark of libvirt.) | 17:09 |
kashyap | [/quote] | 17:09 |
kashyap | sean-k-mooney: Zooming out: what prompted you to revert? | 17:09 |
sean-k-mooney | kashyap: that may have been true when they looked, or on a rhel kernel, but /proc/cpuinfo has cpu Flags and the cpu model name | 17:10 |
sean-k-mooney | however they have different capitalisation than x86 | 17:10 |
sean-k-mooney | kashyap: i saw the backport and i consider it to be a regression | 17:10 |
bauzas | stephenfin: so, once https://review.opendev.org/#/c/703568/ do we still have gate problems with the functional tests ? | 17:11 |
* bauzas was barely paying attention to this chan | 17:11 | |
sean-k-mooney | kashyap: there have been some patches to lscpu to allow it to parse the different way the flags and model are reported on aarch64. i suspect that libvirt just needs to have the same change applied to its parsing | 17:12 |
sean-k-mooney | kashyap: well actually the lscpu patch merged back in 2018 | 17:12 |
kashyap | My brain is starved of food, and my chest is aching after this bizarre bike fall. So I can't parse you fully yet | 17:19 |
kashyap | melwitt: sean-k-mooney: What I know is that we shouldn't merge that revert, IMHO | 17:19 |
melwitt | thanks kashyap | 17:20 |
kashyap | melwitt: sean-k-mooney: Until libvirt gets the parsing correct and libvirt/QEMU CPU modelling maintainers give guidance, we should stick with current Nova behaviour. | 17:20 |
sean-k-mooney | i disagree with that but if i'm being overruled, fine | 17:21 |
*** ociuhandu has quit IRC | 17:22 | |
sean-k-mooney | with the current patch it's likely that after a migration, if the guest is rebooted, the cpu flags and model might change | 17:22 |
sean-k-mooney | it's also not clear to me if there is a risk of the guest crashing if it tries to use an instruction not supported on the dest host | 17:23 |
sean-k-mooney | given the info regarding cpu model and cpu flags is available in /proc/cpuinfo, a correct fix in my view would have been to validate those | 17:24 |
sean-k-mooney | i also would have expected a workaround config option for this but i'm not planning on spending time to code that up | 17:25 |
kashyap | sean-k-mooney: All this requires extensive testing, and extremely careful audit - we haven't done any of that (yet). So we go with "solid heuristics". | 17:27 |
sean-k-mooney | solid heuristics of check nothing and hope it works? | 17:28 |
kashyap | The heuristic here being, "avoid needless checks" | 17:28 |
sean-k-mooney | we would not consider it safe to not check cpu compatibility on x86. it just seems wrong to have a lower standard for other archs | 17:29 |
sean-k-mooney | anyway i better go back to backporting | 17:29 |
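The kind of check sean-k-mooney describes (comparing the flags reported in `/proc/cpuinfo` on source and destination) can be sketched in Python. This is a hedged illustration only: it is not Nova's or libvirt's code, the function names are hypothetical, and it assumes the flag line is labelled `flags` on x86 and `Features` on AArch64. Whether such a comparison is sufficient on AArch64 is exactly what is disputed above.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text.

    Assumption: x86 kernels label the line "flags" and AArch64 kernels
    label it "Features", so the key is matched case-insensitively.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def missing_on_dest(source_text, dest_text):
    """Return flags present on the source host but absent on the destination."""
    return parse_cpu_flags(source_text) - parse_cpu_flags(dest_text)
```

An empty result from `missing_on_dest` would mean every flag the source reports is also present on the destination; a non-empty one lists the features a migrated guest could lose.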
*** evrardjp has quit IRC | 17:34 | |
sean-k-mooney | stephenfin: can you review https://review.opendev.org/#/c/701601 again when you have a chance | 17:34 |
*** evrardjp has joined #openstack-nova | 17:34 | |
stephenfin | sean-k-mooney: done | 17:42 |
sean-k-mooney | stephenfin: thanks | 17:42 |
sean-k-mooney | stephenfin: do you know if jan is about? he is meant to be moving this week, right, to start his new job next week | 17:43 |
stephenfin | bauzas: We shouldn't have, no. The issues are to do with Tempest tests, iirc (something to do with pip not uninstalling pyyaml since it was installed by the package manager) | 17:43 |
sean-k-mooney | that is being fixed | 17:44 |
bauzas | all good, will recheck then | 17:44 |
sean-k-mooney | or has been fixed | 17:44 |
stephenfin | not so fast, I don't think the fix has merged | 17:44 |
stephenfin | bauzas: https://review.opendev.org/#/c/703735/ | 17:44 |
sean-k-mooney | you can add a depends on and then recheck | 17:44 |
bauzas | oh ok | 17:45 |
sean-k-mooney | that was breaking os-vif as well, hence why i want to get that os-vif change merged so we no longer have legacy jobs for it to break | 17:45 |
*** TxGirlGeek has quit IRC | 17:47 | |
*** jawad_axd has joined #openstack-nova | 17:48 | |
stephenfin | ah, so that's the dependency | 17:48 |
stephenfin | gtk | 17:48 |
*** TxGirlGeek has joined #openstack-nova | 17:49 | |
*** Sundar has quit IRC | 17:50 | |
*** jawad_axd has quit IRC | 17:52 | |
*** martinkennelly has quit IRC | 17:58 | |
*** derekh has quit IRC | 18:00 | |
*** eharney has quit IRC | 18:03 | |
*** jawad_axd has joined #openstack-nova | 18:08 | |
*** jmlowe has joined #openstack-nova | 18:10 | |
openstackgerrit | Merged openstack/nova stable/stein: libvirt: remove conditional on VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY https://review.opendev.org/700773 | 18:14 |
openstackgerrit | Merged openstack/nova stable/stein: libvirt: check job status for VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED event https://review.opendev.org/700774 | 18:14 |
*** eharney has joined #openstack-nova | 18:15 | |
*** vishalmanchanda has quit IRC | 18:17 | |
*** jawad_axd has quit IRC | 18:19 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Recalculate 'RequestSpec.numa_topology' on resize https://review.opendev.org/662522 | 18:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Cleanup of '_test_resize' helper test https://review.opendev.org/664245 | 18:31 |
artom | stephenfin, oh, you've picked that up again? ^^ | 18:35 |
artom | IIRC you handed it off to me, and then I looked it, shrugged my shoulders in despair, and never touched it again :/ | 18:36 |
stephenfin | artom: I'm getting hassled downstream for it and figured you didn't have bandwidth, so yeah :) | 18:36 |
stephenfin | all good | 18:36 |
artom | Yeah, sorry. There was attempt, the Gerrit comments are evidence of that. | 18:36 |
sean-k-mooney | stephenfin: i can hassle you upstream for it too if you like | 18:38 |
sean-k-mooney | i wanted that to be in 16GA | 18:38 |
sean-k-mooney | well in train | 18:38 |
sean-k-mooney | same thing | 18:38 |
stephenfin | sean-k-mooney: https://media0.giphy.com/media/R7IYpzLLMBomk/giphy.gif?cid=790b7611c0d748cc03dc8e8f5456f63a200cd89cd6b5f14f&rid=giphy.gif | 18:39 |
sean-k-mooney | i think you still being working and only at 18:40 is enough pressure to make you fix anything | 18:40 |
*** tesseract has quit IRC | 18:40 | |
sean-k-mooney | /only/online/ | 18:41 |
stephenfin | heh, yeah, I'll be done shortly | 18:44 |
artom | stephenfin, oh, I remember where I blocked - if we update the request spec with the new numa_topology, but the resize fails in one of a myriad of ways, and we need to revert it | 18:45 |
*** tbachman has quit IRC | 18:45 | |
artom | IIRC we talked with dansmith about not persisting the numa_topology at all, and just making sure each request to the scheduler has the correct one | 18:46 |
stephenfin | artom: Just looked into that. FWICT, we don't revert the RequestSpec.flavor, let alone the RequestSpec.numa_topology, in the case of a failure | 18:46 |
stephenfin | So I need another follow-up to do that | 18:46 |
artom | stephenfin, ah yeah, the flavor was in the same boat | 18:46 |
sean-k-mooney | oh ye are talking about a different bug. i was referring to https://bugs.launchpad.net/nova/+bug/1831771 | 18:48 |
openstack | Launchpad bug 1831771 in OpenStack Compute (nova) "UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host" [Medium,In progress] - Assigned to Matthew Booth (mbooth-9) | 18:48 |
sean-k-mooney | sorry i saw you being pinged about that downstream so assumed that was the one you were referring to | 18:49 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Add emulated TPM support to Nova https://review.opendev.org/631363 | 18:52 |
openstackgerrit | Eric Fried proposed openstack/nova master: Add support for resize and cold migration of emulated TPM files https://review.opendev.org/639934 | 18:52 |
efried | gmann: What's the accepted way to add a specialized CI job to nova these days? | 18:58 |
sean-k-mooney | efried: are you asking about third party or first party | 18:59 |
efried | 1p, I would think | 18:59 |
efried | Specifically, I'm going to need to work something up for vTPM. This is going to mean that the job will need to | 19:00 |
efried | - run barbican | 19:00 |
efried | - (compile and?) install special versions of libvirt and qemu | 19:00 |
efried | - (compile and?) install some custom software | 19:00 |
efried | - build servers with custom flavors | 19:00 |
efried | - run some commands via ssh | 19:00 |
sean-k-mooney | you can just define it in tree in the .zuul.yaml | 19:00 |
sean-k-mooney | the job name should start with nova | 19:00 |
sean-k-mooney | right | 19:00 |
sean-k-mooney | am it's on my long todo list but i do plan to add a tempest job that does the libvirt/qemu compilation for that devstack plugin | 19:01 |
sean-k-mooney | that said next cycle ubuntu 20.04 should ship the ones you need | 19:02 |
sean-k-mooney | for the swtpm software | 19:02 |
efried | I don't have until next cycle ;P | 19:02 |
sean-k-mooney | i would just use an ansible pre-run playbook | 19:02 |
efried | I don't know that I care to run actual tempest | 19:02 |
efried | Unless the "build servers" and "run some commands" bits need to be done in a tempestuous framework of some kind? A plugin? | 19:03 |
efried | jroll: o/ | 19:03 |
sean-k-mooney | it could | 19:03 |
sean-k-mooney | but on the custom flavor front have you looked at my dpdk job | 19:03 |
efried | my brain goes fuzzy when I see dpdk | 19:04 |
gmann | efried: you can use zuulv3 native base jobs from devstack if you do not want to run tempest | 19:04 |
sean-k-mooney | well dpdk is not important | 19:04 |
sean-k-mooney | but you can use a pre-run playbook to create a local.sh file which will create custom flavors | 19:04 |
sean-k-mooney | e.g. request vtpm | 19:04 |
efried | gmann: is there a "making a devstack zuulv3 job for dummies" guide somewhere? | 19:04 |
sean-k-mooney | then you can run standard tempest with those flavors to test your feature | 19:05 |
sean-k-mooney | or a subset of tempest | 19:05 |
gmann | efried: hmm, no guide as such that i know of, but i can link you some examples from other projects, like Tacker etc | 19:05 |
gmann | or job description | 19:05 |
efried | k. I don't trust nova's .zuul.yaml cause I never know when there's a legacy minefield I'm walking into. | 19:06 |
*** ralonsoh has quit IRC | 19:06 | |
gmann | 'devstack' can be used with minimum effort on specific jobs - https://github.com/openstack/devstack/blob/2e45f2c267c9ababdbdfc4c505b329398391c5f9/.zuul.yaml#L352 | 19:07 |
gmann | this is tacker multinode job for their functional testing - https://github.com/openstack/tacker/blob/bdb2d52b3a1b69c58cb2ac6f903380ab8a7bd973/.zuul.yaml#L28 | 19:07 |
efried | oh, so it's still kosher to use run.yaml things | 19:08 |
efried | that's a relief. | 19:08 |
sean-k-mooney | yes | 19:08 |
KeithMnemonic | melwitt. did this change break the patch? it is failing on a live migration thing now https://review.opendev.org/#/c/683008/6..7/nova/tests/unit/compute/test_compute.py | 19:08 |
gmann | multinode support is in all base jobs of devstack which is based on nodeset used on job | 19:08 |
sean-k-mooney | by default we leave it out in most of our jobs because its coming from the parent job | 19:08 |
melwitt | KeithMnemonic: lol no, that's a unit test. the live migration job is failing. we might need to rebase. I'm looking at the failure now | 19:09 |
efried | melwitt: if it's PyYAML, we're just waiting for https://review.opendev.org/#/c/703735/ to merge | 19:09 |
melwitt | unit tests run in the openstack-tox-py27 and openstack-tox-py35 jobs | 19:09 |
KeithMnemonic | ok thank you | 19:10 |
melwitt | efried: this is stable/pike so I doubt it right? the PyYAML is a new thing? | 19:10 |
efried | oh | 19:11 |
efried | I don't know. Is devstack branchless or something? | 19:11 |
sean-k-mooney | efried: here is an example job i need to go update https://review.opendev.org/#/c/679656/12 but it shows you how to create a custom job with custom flavors and run standard tempest tests to validate things | 19:11 |
melwitt | gah gzipped logs again | 19:11 |
sean-k-mooney | efried: no devstack is branched | 19:11 |
melwitt | efried: devstack has branches but tempest is branchless | 19:11 |
sean-k-mooney | yes ^ | 19:11 |
efried | melwitt: okay, that failure looks different, ignore me. | 19:12 |
melwitt | how are these old crusty jobs having the friggin gzipped logs, dang | 19:12 |
efried | thanks sean-k-mooney | 19:12 |
*** luksky has joined #openstack-nova | 19:13 | |
efried | oh, yeah melwitt, that's going to continue to be a problem I guess, unless we plan to migrate legacy jobs to zv3 on stable. Much sad face. | 19:13 |
efried | we may have to figure out how to get that turned off in the job itself for stable. | 19:13 |
melwitt | oh geeez :''''''( | 19:13 |
sean-k-mooney | am well it should not be | 19:13 |
sean-k-mooney | oh actually | 19:13 |
efried | I thought it was an infra-level thing. | 19:14 |
sean-k-mooney | ya it is | 19:14 |
efried | so, if we're going to have to figure that out anyway, I guess we might as well do it on master, asap, so we can stop being frustrated at least partially. | 19:14 |
sean-k-mooney | so we need to modify devstack-gate to stop compressing the logs internally | 19:14 |
sean-k-mooney | and backport that to stable branches | 19:14 |
melwitt | isn't that what you did though? it didn't stop the compression for legacy jobs on master | 19:15 |
melwitt | so even if we backport it, it won't help right? | 19:15 |
sean-k-mooney | my fix fixed non legacy jobs | 19:15 |
*** TxGirlGeek has quit IRC | 19:16 | |
sean-k-mooney | i intended it to fix both but it seems that devstack gate is compressing them somewhere else too | 19:16 |
sean-k-mooney | or the legacy jobs are | 19:16 |
sean-k-mooney | it might not be in devstack gate | 19:16 |
melwitt | got it | 19:16 |
melwitt | ok, now to find where this thing is failing LM | 19:17 |
sean-k-mooney | i have been doing " curl <log url> | zcat | lnav -q" | 19:18 |
melwitt | that's helpful, thanks | 19:19 |
*** jmlowe has quit IRC | 19:21 | |
sean-k-mooney | melwitt: artom was saying you can also do "curl <url> | zless" but i like using lnav to browse logs | 19:21 |
* artom pills | zzzzz | 19:22 | |
*** TxGirlGeek has joined #openstack-nova | 19:22 | |
melwitt | I still haven't gotten around to trying lnav so I'll use zless for now | 19:23 |
sean-k-mooney | ya if you do try it, by default lnav dumps all the logs to stdout so the -q is there to stop that | 19:23 |
melwitt | \:| [instance: 5bca844b-4ca6-4c63-a579-8f00316da019] Instance spawn was interrupted before instance_claim, setting instance to ERROR state | 19:24 |
sean-k-mooney | am that is new | 19:24 |
melwitt | yeah, it's new to me | 19:24 |
sean-k-mooney | does that mean qemu crashed or something? | 19:25 |
melwitt | no idea | 19:25 |
sean-k-mooney | well that makes two of us | 19:25 |
melwitt | I don't expect it's related to the patch, and the patch has been sitting around forever so I wouldn't be surprised if it needs a rebase by now. but the nova-live-migration job is failing pretty consistently on it and so far, for unknown reasons | 19:26 |
melwitt | rechecked it like 3 times and it always fails on nova-live-migration | 19:26 |
sean-k-mooney | i can take a look at it a bit later. what is the review? | 19:27 |
sean-k-mooney | once i fix the osp 15 backport im doing ill take a quick look before moving to osp 13 | 19:28 |
melwitt | sean-k-mooney: it's this one https://review.opendev.org/683008 | 19:28 |
sean-k-mooney | cool i have it open. | 19:28 |
melwitt | note that it is very old, not rebased since september | 19:28 |
artom | sean-k-mooney, huh, that's the same thing we saw in whitebox | 19:29 |
artom | We thought it was because we were restarting nova-compute to change configs | 19:29 |
sean-k-mooney | melwitt: if you do a rebase via the ui it should not hurt | 19:29 |
sean-k-mooney | but i dont think that is the issue | 19:29 |
sean-k-mooney | zuul will do a merge with master so it should basically be the same | 19:29 |
melwitt | yeah, probably worth it to try. it's gonna have to be rebased to merge anyway | 19:29 |
melwitt | let me do it now | 19:30 |
sean-k-mooney | well its more or less the same as a recheck and will avoid the merge commit so it wont hurt | 19:30 |
sean-k-mooney | artom: did you fix it in whitebox? | 19:30 |
artom | sean-k-mooney, by extending the service restart wait delay | 19:31 |
artom | Which apparently has nothing to do with it, as we're now seeing the same thing in Nova? | 19:31 |
sean-k-mooney | we are not restarting services in this job | 19:31 |
sean-k-mooney | actually | 19:31 |
sean-k-mooney | we might be | 19:31 |
openstackgerrit | melanie witt proposed openstack/nova stable/pike: Avoid redundant initialize_connection on source post live migration https://review.opendev.org/683008 | 19:31 |
artom | Wait, n-l-m? | 19:31 |
sean-k-mooney | i think we do | 19:31 |
sean-k-mooney | ya | 19:31 |
artom | We're changing some configs IIRC | 19:32 |
sean-k-mooney | i think there is a hack that swaps the storage to ceph or something | 19:32 |
artom | So yeah, we are | 19:32 |
sean-k-mooney | i think its in the post run playbook | 19:32 |
sean-k-mooney | ok its not there | 19:33 |
sean-k-mooney | its probably in the test hook | 19:33 |
*** jmlowe has joined #openstack-nova | 19:33 | |
sean-k-mooney | ya so here https://github.com/openstack/nova/blob/master/gate/live_migration/hooks/run_tests.sh#L55-L67 | 19:34 |
artom | Ah, yeah, configure_and_start_nova | 19:35 |
*** maciejjozefczyk has quit IRC | 19:35 | |
artom | Are... are we just going to hax a sleep in there? Before the run_tempest call? | 19:35 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/gate/live_migration/hooks/ceph.sh#L71-L100 | 19:35 |
sean-k-mooney | we could hack a sleep here https://github.com/openstack/nova/blob/master/gate/live_migration/hooks/ceph.sh#L114 | 19:36 |
artom | Why in ceph? | 19:37 |
sean-k-mooney | this is where we restart nova after reconfiguring to use ceph | 19:37 |
*** gmann is now known as gmann_afk | 19:38 | |
sean-k-mooney | so in the live migration job we deploy without ceph initially, run tempest, then after that we reconfigure the deployment for ceph and do more testing | 19:38 |
*** jmlowe has quit IRC | 19:39 | |
*** damien_r has quit IRC | 19:39 | |
sean-k-mooney | so we only restart the service in the ceph testing at the end here https://github.com/openstack/nova/blob/master/gate/live_migration/hooks/run_tests.sh#L64 | 19:40 |
*** jmlowe has joined #openstack-nova | 19:40 | |
sean-k-mooney | that function gets loaded from the ceph.sh file here https://github.com/openstack/nova/blob/master/gate/live_migration/hooks/run_tests.sh#L21 | 19:40 |
*** damien_r has joined #openstack-nova | 19:42 | |
*** jmlowe has quit IRC | 19:47 | |
stephenfin | sean-k-mooney: Seeing as you're still here, does my comment here make sense https://review.opendev.org/#/c/662522/13/nova/compute/api.py@3637 | 19:53 |
stephenfin | checking it tomorrow is fine too. I am outta here | 19:53 |
sean-k-mooney | ill check | 19:56 |
sean-k-mooney | am the conductor calls the scheduler | 19:58 |
sean-k-mooney | so it could modify the numa topology before it calls the select destination function i think | 19:58 |
sean-k-mooney | i would have to check but you are right in that you definitely need to recalculate the numa topology based on the new flavor before the numa topology filter runs | 19:59 |
sean-k-mooney | meaning either in the api or in the conductor if the conductor is what is invoking the scheduler. | 20:00 |
sean-k-mooney | i dont have that part of the code loaded up in my brain right now hence the vague answer | 20:01 |
*** szaher has quit IRC | 20:03 | |
*** eharney has quit IRC | 20:04 | |
*** szaher has joined #openstack-nova | 20:09 | |
*** jmlowe has joined #openstack-nova | 20:12 | |
*** gmann_afk is now known as gmann | 20:18 | |
*** jmlowe has quit IRC | 20:18 | |
*** TxGirlGeek has quit IRC | 20:31 | |
*** gentoorax is now known as gentoorax_away | 20:42 | |
*** gentoorax_away is now known as gentoorax | 20:42 | |
*** mgariepy has quit IRC | 20:46 | |
*** panda has quit IRC | 20:49 | |
*** panda has joined #openstack-nova | 20:51 | |
*** artom has quit IRC | 20:52 | |
*** jmlowe has joined #openstack-nova | 20:55 | |
*** TxGirlGeek has joined #openstack-nova | 21:01 | |
*** jmlowe has quit IRC | 21:06 | |
*** slaweq_ has joined #openstack-nova | 21:07 | |
*** pcaruana has quit IRC | 21:07 | |
*** eharney has joined #openstack-nova | 21:23 | |
*** tbachman has joined #openstack-nova | 21:44 | |
*** rchurch has quit IRC | 21:49 | |
*** rcernin has joined #openstack-nova | 21:50 | |
*** rchurch has joined #openstack-nova | 21:51 | |
*** lucidguy has joined #openstack-nova | 21:54 | |
lucidguy | I'll be very very appreciative if someone can assist me with this, i've spent hours googling etc. https://paste.ubuntu.com/p/6wqhRX68tr/ | 21:54 |
*** xek has quit IRC | 21:54 | |
sean-k-mooney | you are booting vms with 1TB of ram | 21:56 |
sean-k-mooney | that is fun | 21:56 |
sean-k-mooney | this looks like a kvm kernel bug | 21:57 |
*** TxGirlGeek has quit IRC | 21:58 | |
*** nweinber__ has quit IRC | 21:58 | |
*** TxGirlGeek has joined #openstack-nova | 21:58 | |
lucidguy | sean-k-mooney: Suggestions? | 21:59 |
*** TxGirlGe_ has joined #openstack-nova | 22:00 | |
sean-k-mooney | ubuntu 16.04 is getting near the end of its life. have you updated to the latest 16.04 kernel available on the host? | 22:01 |
sean-k-mooney | if not, its possible this is already fixed | 22:01 |
sean-k-mooney | this is not an openstack related bug | 22:01 |
*** gentoorax has quit IRC | 22:01 | |
*** gentoorax has joined #openstack-nova | 22:01 | |
sean-k-mooney | so your best bet is to talk to the ubuntu kernel folks or reach out to the kvm community on irc | 22:02 |
sean-k-mooney | personally if you are stuck on 16.04 but can upgrade your kernel i would consider using their hardware enablement kernel which tends to be more up to date | 22:02 |
*** TxGirlGeek has quit IRC | 22:03 | |
lucidguy | sean-k-mooney: We are already using the hwe kernel | 22:05 |
sean-k-mooney | actually just re-reading this | 22:07 |
sean-k-mooney | if you launch the vm with 1TB of ram it starts fine | 22:07 |
sean-k-mooney | and if you launch it with 1.2TB it fails with this error | 22:07 |
lucidguy | Thats correct | 22:08 |
sean-k-mooney | im just wondering if you are hitting a qemu or kvm memory limit | 22:10 |
lucidguy | sean-k-mooney: That's what I'm thinking but I can't find any documentation stating that | 22:10 |
jroll | efried: did you have a question for me or want my comments on how we might do that CI? | 22:12 |
jroll | (or neither?) | 22:12 |
sean-k-mooney | lucidguy: are you using nested virt by the way | 22:12 |
lucidguy | sean-k-mooney: How do I confirm that? | 22:13 |
sean-k-mooney | cat /sys/module/kvm_intel/parameters/nested | 22:14 |
lucidguy | Y | 22:14 |
sean-k-mooney | right so in 4.15 nested virt is disabled by default in the kernel because there were still a few edge cases that did not work correctly | 22:15 |
sean-k-mooney | by kernel 4.19 or so they had been fixed and the upstream kernel defaults it to Y | 22:15 |
lucidguy | Are you reading this somewhere? | 22:16 |
sean-k-mooney | https://bugs.launchpad.net/qemu/+bug/1813165 is a similar bug to yours but in their case the cpu was emulating System Management Mode | 22:16 |
openstack | Launchpad bug 1813165 in QEMU "KVM internal error. Suberror: 1 emulation failure" [Undecided,New] | 22:16 |
sean-k-mooney | that is not the cause in your traceback as you have SMM=0 | 22:16 |
sean-k-mooney | but they mention this happened in a nested case | 22:17 |
sean-k-mooney | *also happened | 22:17 |
sean-k-mooney | lucidguy: and no, i have just been working with nested virt for years and have seen it improve and break over that time | 22:17 |
lucidguy | Since I'm using HWE, I believe I'm using the same kernel as Ubuntu 18.04. So people with the latest LTS are still having this issue? | 22:18 |
sean-k-mooney | ya nested virt in the default 18.04 kernel is broken | 22:18 |
lucidguy | To be honest, I don't exactly know what "nested" means. | 22:19 |
sean-k-mooney | so if that is what the 16.04 hwe kernel is tracking then that is likely the cause | 22:19 |
*** amodi has quit IRC | 22:19 | |
sean-k-mooney | lucidguy: it allows you to run kvm inside the vm to have 2+ layers of vms | 22:20 |
*** amodi has joined #openstack-nova | 22:20 | |
lucidguy | So should I not use nested? | 22:20 |
lucidguy | I don't think we have users running vms within their vms | 22:21 |
sean-k-mooney | with that specific kernel probably not | 22:21 |
sean-k-mooney | but in general its on by default now but only from 4.19 on i think | 22:21 |
lucidguy | Hmm, now to find out where that is set. | 22:21 |
sean-k-mooney | sean@pop-os:~$ cat /etc/modprobe.d/qemu-system-x86.conf | 22:21 |
sean-k-mooney | options kvm_intel nested=1 | 22:21 |
sean-k-mooney | change 1 to 0 | 22:21 |
sean-k-mooney | you could also downgrade your kernel to the 16.04 non hwe kernel | 22:22 |
lucidguy | I don't recall setting this anywhere, you think its on by default? | 22:22 |
lucidguy | We have hardware requiring that kernel | 22:23 |
sean-k-mooney | it should not be on by default on ubuntu 16.04 | 22:23 |
lucidguy | Hmm, set in nova somewhere? | 22:23 |
sean-k-mooney | no | 22:23 |
sean-k-mooney | nova does not set this | 22:23 |
lucidguy | odd | 22:23 |
sean-k-mooney | it can be set in two places | 22:24 |
sean-k-mooney | either in a file in /etc/modprobe.d | 22:24 |
sean-k-mooney | or on the kernel command line | 22:24 |
lucidguy | sean-k-mooney: I have to leave soon, before I forget I just want to thank you for this info | 22:24 |
sean-k-mooney | no worries. nested virt would be my best guess but i dont know its definitely that | 22:24 |
sean-k-mooney | it is definitely a kernel bug | 22:25 |
lucidguy | Still lots more to think about | 22:25 |
*** slaweq_ has quit IRC | 22:25 | |
*** dviroel has quit IRC | 22:25 | |
*** TxGirlGe_ has quit IRC | 22:25 | |
lucidguy | have to run. Thanks again. | 22:26 |
sean-k-mooney | no worries o/ | 22:26 |
sean-k-mooney | does anyone remember how to enable debug logging in functional tests? | 22:31 |
sean-k-mooney | nevermind i think i found the issue | 22:43 |
sean-k-mooney | yep | 22:43 |
sean-k-mooney | before train you should use NUMAHostInfo not HostInfo when you want it to construct a numa topology from the kwargs | 22:43 |
sean-k-mooney | i even fixed that in one of the previous patches. | 22:43 |
lucidguy | back for a few minutes | 22:56 |
lucidguy | sean-k-mooney: You supporting a large OpenStack production environment? | 22:56 |
sean-k-mooney | not directly. i work at redhat on the compute team so i mainly work upstream but i also support customers when they report bugs downstream | 22:57 |
sean-k-mooney | so we have some large cloud deployments that i support indirectly but im on the engineering team rather than support | 22:58 |
lucidguy | I'm supporting a 600+ hypervisor Queens deployment. | 22:58 |
lucidguy | I'm betting my boss is going to ask me to upgrade to Train. Lord help me. | 22:59 |
sean-k-mooney | what do you use for your installer. charms? or do you use something like kolla ansible or osa? | 22:59 |
sean-k-mooney | queens to train is a big leap but its doable | 23:00 |
lucidguy | We install everything from scratch, using custom ansible playbooks | 23:00 |
sean-k-mooney | oh ok | 23:01 |
lucidguy | Are we crazy? | 23:01 |
sean-k-mooney | ya that makes it more challenging in some ways | 23:01 |
sean-k-mooney | that depends | 23:01 |
lucidguy | But this way you truly understand how things work. | 23:01 |
sean-k-mooney | yep that is the advantage | 23:01 |
sean-k-mooney | i would suggest looking at the ansible roles from OSA | 23:01 |
sean-k-mooney | osa being openstack ansible | 23:02 |
sean-k-mooney | to see if they are of use to you | 23:02 |
lucidguy | OpenStack ansible popular ehh? | 23:02 |
sean-k-mooney | yes | 23:02 |
sean-k-mooney | it is the installer that vexxhost uses | 23:03 |
lucidguy | vexxhost? | 23:03 |
sean-k-mooney | they are a large public cloud operator that donates ci resources to the openstack community | 23:03 |
lucidguy | interesting | 23:03 |
sean-k-mooney | lucidguy: you should reach out to mnaser | 23:04 |
sean-k-mooney | i think he is still the PTL of OSA and they have done a lot to make upgrades simpler | 23:04 |
lucidguy | Doesn't canonical support OpenStack? | 23:04 |
sean-k-mooney | yes they do, they have an installer based on openstack charms deployed with juju | 23:05 |
sean-k-mooney | so if you pay for support with them that is the installer they would use | 23:05 |
lucidguy | interesting | 23:05 |
lucidguy | I would think large organizations would go with our approach. Avoid wrappers. | 23:06 |
sean-k-mooney | redhat has an openstack distribution that builds on top of the tripleo project | 23:06 |
sean-k-mooney | you would be surprised | 23:06 |
*** tkajinam has joined #openstack-nova | 23:07 | |
sean-k-mooney | some do if they have the development staff to maintain it | 23:07 |
sean-k-mooney | but its more common to see large companies contribute to and adopt the community installer | 23:07 |
sean-k-mooney | *installers | 23:07 |
lucidguy | Just sounds like a pain to have to troubleshoot another layer of software. | 23:08 |
sean-k-mooney | well when that software is built by a group of operators that have real clouds in production you would be surprised at how helpful it can be | 23:10 |
sean-k-mooney | that said i agree if its a complex installer it can be a pain to debug | 23:10 |
sean-k-mooney | so i tend to prefer the simpler ones like devstack and kolla-ansible | 23:11 |
sean-k-mooney | just never run devstack in production... | 23:11 |
*** mlavalle has quit IRC | 23:14 | |
lucidguy | ugg stupid instance stuck in deleting state.. and virsh on hv not responding FML | 23:14 |
sean-k-mooney | that sounds like the host kernel might have hung | 23:15 |
lucidguy | this is the box I disabled nesting | 23:16 |
sean-k-mooney | by the way i hope if you are running 1TB VMs you are using hugepages for the memory | 23:16 |
lucidguy | Whatever the default option is. | 23:16 |
sean-k-mooney | so no you are not | 23:16 |
sean-k-mooney | the reason i mentioned that is it can take a long time for qemu/kvm to release all the mapped memory when you kill it | 23:17 |
lucidguy | I was just reading those options in nova.conf | 23:17 |
sean-k-mooney | so if you are using the default 4k memory pages then it might take a while to fully delete the vm | 23:17 |
lucidguy | Would I not see a load on the system anywhere? | 23:18 |
lucidguy | I don't even see a qemu process running | 23:19 |
sean-k-mooney | you would see that the memory has not been freed with free -m | 23:21 |
sean-k-mooney | but in that case if virsh is not working i would check to see if libvirt is still running | 23:21 |
lucidguy | Its free | 23:21 |
lucidguy | libvirtError: failed to connect to monitor socket: No such process | 23:21 |
sean-k-mooney | so that means libvirt tried to connect to qemu but it failed because it was not running | 23:22 |
lucidguy | oooo its gone. So it did take forever. | 23:22 |
lucidguy | Now you're going to make me research how Hugepages work. Thanks a lot. heh | 23:22 |
*** mriedem has quit IRC | 23:23 | |
sean-k-mooney | hehe ya they are non trivial | 23:23 |
sean-k-mooney | but basically it involves preallocating memory so that it can be handed out in larger chunks | 23:23 |
sean-k-mooney | normally either 2MB or 1G instead of 4k | 23:23 |
sean-k-mooney | that significantly improves memory performance but it has other implications | 23:23 |
*** TxGirlGeek has joined #openstack-nova | 23:24 | |
lucidguy | Won't the mem be used less efficiently? | 23:24 |
sean-k-mooney | the main ones being no memory oversubscription and no live migration support until train | 23:24 |
sean-k-mooney | no | 23:24 |
sean-k-mooney | by using larger page allocations you actually optimise address translation | 23:25 |
sean-k-mooney | the translation lookaside buffer can only cache a limited number of page translations. if each page is bigger the cached translations cover more memory | 23:25 |
sean-k-mooney | so it can give a 30%+ performance improvement if your application is memory or latency sensitive | 23:26 |
lucidguy | Tried launching a new instance on that HV without nesting.. instance comes up paused and I can see that same error in libvirt .. :( | 23:26 |
lucidguy | Was worth a try. | 23:27 |
sean-k-mooney | ya sorry it did not work | 23:27 |
lucidguy | Was so excited about that. Installing a 5.0 kernel on a 16.04 box in production I don't think is wise | 23:28 |
sean-k-mooney | i run 5.4 on my home server based on ubuntu 18.04 | 23:28 |
sean-k-mooney | but its really something you need to evaluate for your own usecase | 23:29 |
lucidguy | Did you basically upgrade using this approach? https://www.tecmint.com/upgrade-kernel-in-ubuntu/ | 23:29 |
sean-k-mooney | no i was lazy and used ukuu the ubuntu kernel update utility | 23:30 |
sean-k-mooney | it automates that | 23:30 |
sean-k-mooney | but i am running the 5.4 mainline kernel from that ppa | 23:30 |
lucidguy | ohh, theres a cli approach. | 23:30 |
sean-k-mooney | https://www.omgubuntu.co.uk/2017/02/ukuu-easy-way-to-install-mainline-kernel-ubuntu | 23:31 |
sean-k-mooney | again you probably should not do that in production but i basically run the latest upstream lts kernel which is 5.4 at the moment | 23:32 |
sean-k-mooney | 5.0 was the previous upstream long life kernel i think | 23:32 |
sean-k-mooney | oh the previous long term support was 4.19 | 23:33 |
lucidguy | Things to think about. Thanks again. Have to run. Wife is hating me. heh | 23:33 |
sean-k-mooney | https://www.kernel.org/category/releases.html | 23:33 |
sean-k-mooney | no worries o/ | 23:33 |
lucidguy | I guess I should start by updating my local repo and giving 4.19 a try. Sadly I'm not optimistic of the outcome. | 23:34 |
lucidguy | have a good one | 23:35 |
*** damien_r has quit IRC | 23:39 | |
efried | jroll: we had talked vaguely about you being involved in the development of the CI, so I thought to involve you in that conversation :) | 23:55 |
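
Editor's note on the hugepages exchange above (23:16 to 23:26): a rough sketch of the arithmetic behind sean-k-mooney's point that bigger pages let the TLB cover more memory. The 4k, 2MB and 1G page sizes and the 1TB guest come from the log; the TLB entry count is a made-up figure for illustration only, and the helper names are hypothetical.

```python
# Page-count and TLB-reach arithmetic for a 1 TiB guest.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

# Page sizes mentioned in the log.
PAGE_SIZES = {"4k": 4 * KIB, "2MB": 2 * MIB, "1G": 1 * GIB}

# ASSUMPTION: hypothetical TLB capacity, not a figure from the log
# or from any specific CPU.
ASSUMED_TLB_ENTRIES = 1536


def pages_needed(guest_ram_bytes: int, page_size_bytes: int) -> int:
    """Number of pages needed to back the guest RAM (ceiling division)."""
    return -(-guest_ram_bytes // page_size_bytes)


def tlb_reach(entries: int, page_size_bytes: int) -> int:
    """Bytes of memory whose translations fit in the TLB at once."""
    return entries * page_size_bytes


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        pages = pages_needed(TIB, size)
        reach_mib = tlb_reach(ASSUMED_TLB_ENTRIES, size) // MIB
        print(f"{name}: {pages} pages for 1 TiB, TLB reach {reach_mib} MiB")
```

With 4k pages the guest is backed by hundreds of millions of mappings, which is also consistent with the remark that tearing down such a VM can take a long time.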