Wednesday, 2019-06-19

*** awalende has joined #openstack-nova00:06
*** gfhellma_ has joined #openstack-nova00:07
*** gfhellma_ has quit IRC00:09
*** awalende has quit IRC00:10
*** slaweq has joined #openstack-nova00:11
*** yonglihe has quit IRC00:15
*** slaweq has quit IRC00:16
*** gyee has quit IRC00:32
*** brinzhang has joined #openstack-nova00:35
<openstackgerrit> Brin Zhang proposed openstack/python-novaclient master: Microversion 2.74: Support Specifying AZ to unshelve  https://review.opendev.org/665136  00:43
*** yonglihe has joined #openstack-nova00:54
*** brinzh has joined #openstack-nova00:55
*** brinzhang has quit IRC00:58
*** takashin has joined #openstack-nova00:59
*** mriedem_away has quit IRC01:01
*** yedongcan has joined #openstack-nova01:06
*** lbragstad has quit IRC01:08
*** tetsuro has joined #openstack-nova01:10
*** brinzhang has joined #openstack-nova01:11
*** brinzh has quit IRC01:14
*** igordc has quit IRC01:16
<openstackgerrit> Merged openstack/nova master: Log quota legacy method warning only if counting from placement  https://review.opendev.org/665765  01:45
*** tetsuro has quit IRC01:48
*** tetsuro has joined #openstack-nova01:52
<openstackgerrit> Fan Zhang proposed openstack/nova master: Log disk transfer stats in live migration monitor.  https://review.opendev.org/619395  01:52
*** guozijn has joined #openstack-nova01:53
*** hongbin has joined #openstack-nova01:58
<openstackgerrit> Fan Zhang proposed openstack/nova master: Retry after hitting libvirt error VIR_ERR_OPERATION_INVALID in live migration.  https://review.opendev.org/612272  02:04
*** whoami-rajat has joined #openstack-nova02:08
*** tinwood has quit IRC02:08
*** tinwood has joined #openstack-nova02:10
*** slaweq has joined #openstack-nova02:11
*** slaweq has quit IRC02:16
<openstackgerrit> Takashi NATSUME proposed openstack/python-novaclient master: Fix duplicate object description error  https://review.opendev.org/666203  02:35
*** tetsuro has quit IRC02:47
*** _alastor_ has quit IRC02:50
<openstackgerrit> Takashi NATSUME proposed openstack/nova master: Add a live migration regression test  https://review.opendev.org/641200  02:52
*** ricolin has joined #openstack-nova02:55
*** markvoelker has joined #openstack-nova03:00
*** markvoelker has quit IRC03:06
*** slaweq has joined #openstack-nova03:11
*** slaweq has quit IRC03:15
*** guozijn has quit IRC03:18
<openstackgerrit> Merged openstack/nova master: Clean up NumInstancesFilter related docs  https://review.opendev.org/665768  03:21
*** tetsuro has joined #openstack-nova03:29
*** psachin has joined #openstack-nova03:32
*** tetsuro has quit IRC03:34
*** _alastor_ has joined #openstack-nova03:49
*** _alastor_ has quit IRC03:54
*** psachin has quit IRC04:01
*** markvoelker has joined #openstack-nova04:01
*** psachin has joined #openstack-nova04:03
*** markvoelker has quit IRC04:06
*** guozijn has joined #openstack-nova04:09
*** guozijn has quit IRC04:22
*** bhagyashris has joined #openstack-nova04:24
*** hongbin has quit IRC04:25
*** shilpasd_ has quit IRC04:27
*** udesale has joined #openstack-nova04:31
<openstackgerrit> Takashi NATSUME proposed openstack/python-novaclient master: Add irrelevant files in dsvm job  https://review.opendev.org/666217  04:47
*** cfriesen has quit IRC04:58
*** tetsuro has joined #openstack-nova05:07
*** slaweq has joined #openstack-nova05:11
<openstackgerrit> Madhuri Kumari proposed openstack/nova master: Replace deprecated with_lockmode with with_for_update  https://review.opendev.org/666221  05:16
*** slaweq has quit IRC05:16
*** dklyle_ has joined #openstack-nova05:24
*** david-lyle has quit IRC05:27
*** purplerbot has quit IRC05:27
*** purplerbot has joined #openstack-nova05:28
*** alex_xu has quit IRC05:30
*** irclogbot_0 has quit IRC05:30
*** awestin1 has quit IRC05:30
*** sridharg has joined #openstack-nova05:31
*** mgoddard has quit IRC05:31
*** irclogbot_3 has joined #openstack-nova05:31
*** awestin1 has joined #openstack-nova05:32
*** mgoddard has joined #openstack-nova05:33
*** ratailor has joined #openstack-nova05:49
*** guozijn has joined #openstack-nova05:50
*** Luzi has joined #openstack-nova05:50
<openstackgerrit> melanie witt proposed openstack/nova-specs master: Propose showing server status UNKNOWN when host status UNKNOWN  https://review.opendev.org/666181  05:55
*** guozijn has quit IRC06:03
*** tkajinam has quit IRC06:03
*** yikun has joined #openstack-nova06:08
*** tetsuro has quit IRC06:10
*** slaweq has joined #openstack-nova06:11
*** spsurya has joined #openstack-nova06:15
*** ricolin has quit IRC06:18
*** ricolin has joined #openstack-nova06:19
*** dpawlik has joined #openstack-nova06:22
*** rajinir has quit IRC06:22
*** threestrands has joined #openstack-nova06:27
<amotoki> gmann: hi  06:30
<gmann> amotoki: hi  06:30
<amotoki> gmann: there is a discussion on exceptions from novaclient in a horizon review https://review.opendev.org/#/c/661526/  06:30
<amotoki> gmann: we are struggling with how to catch a specific exception from the nova API and/or novaclient.  06:30
<amotoki> gmann: any suggestion would be appreciated.  06:31
*** hamdyk has joined #openstack-nova06:31
<amotoki> gmann: in the case of neutron, the server returns an exception type in the response body and neutronclient decodes it, so consumers of the neutronclient python binding can catch a specific error.  06:32
<gmann> checking  06:34
<amotoki> gmann: thanks. no need to rush :)  06:34
*** belmoreira has joined #openstack-nova06:40
<gmann> amotoki: BadRequest is not a specific error. I mean the nova API converts various related exceptions to an HTTP exception and then raises that  06:49
<gmann> like - https://opendev.org/openstack/nova/src/branch/master/nova/api/openstack/compute/volumes.py#L354  06:49
<gmann> HTTPBadRequest can occur for multiple reasons, and the details are in the error message  06:50
*** guozijn has joined #openstack-nova06:50
*** rcernin has quit IRC06:51
<gmann> amotoki: Horizon shows the error message which includes the error details, for example "invalid volume id in request" - is that not enough? or do you want to prepare some additional helpful error msg?  06:52
<gmann> novaclient also decodes them into ClientException - https://github.com/openstack/python-novaclient/blob/003ac57d9af74aa4658a7bf6cc6b6b3bafa58c11/novaclient/exceptions.py#L249  06:54
<amotoki> gmann: thanks for checking. The horizon patch I mentioned above tries to show the message from the nova API as-is instead of showing the generic message "Unable to attach volume".  06:56
<openstackgerrit> Takashi NATSUME proposed openstack/python-novaclient master: Add irrelevant files in dsvm job  https://review.opendev.org/666217  06:57
<amotoki> gmann: it is generally avoided in horizon (as such messages cannot be translated and sometimes they are not friendly to GUI users) but we don't have a better idea for this case.  06:57
<gmann> amotoki: i see your point.  06:58
<amotoki> gmann: my initial question is whether we can assume BadRequest is raised only for that case.  06:58
<amotoki> gmann: for example, I wonder whether BadRequest can also be raised for input validation or something else.  06:58
<gmann> amotoki: it is hard to say; there are various other errors that can be raised by Nova which are API specific, and we often add/improve the exceptions  06:59
<gmann> amotoki: how about falling back to the generic message only in the case of a 500 error code, e.g. ex.http_status == 500  07:00
<gmann> and for all the rest, horizon can use ex.msg as it is, which is more reliable because the nova API prepares that error msg explicitly.  07:01
<amotoki> gmann: in the best case, we could catch a specific exception and have horizon show an appropriate message.....  07:01
<amotoki> gmann: for example, horizon tries to hide UUIDs but most API messages include a UUID :-(  07:01
<gmann> a 2nd option is to include all possible exceptions per API, which you can get from the API ref  07:02
*** rpittau|afk is now known as rpittau07:02
<gmann> amotoki: yeah, so you can do that, but in that case you need to decode the error message, because the error message includes the details of "why it is BadRequest"  07:02
<gmann> amotoki: does neutronclient hide such info? or does horizon?  07:04
<amotoki> gmann: yeah, that's the dilemma.... if English were the only language it would be much, much simpler  07:04
<gmann> yeah. it depends on the locale  07:04
*** igordc has joined #openstack-nova07:05
<amotoki> gmann: this is the neutronclient code https://opendev.org/openstack/python-neutronclient/src/branch/master/neutronclient/v2_0/client.py#L65-L73  07:05
*** tkajinam has joined #openstack-nova07:05
*** luksky has joined #openstack-nova07:06
<amotoki> gmann: 'type' contains the exception name from the neutron server, and if a corresponding exception is defined in https://opendev.org/openstack/python-neutronclient/src/branch/master/neutronclient/common/exceptions.py#L136 neutronclient raises that exception to callers.  07:06
<gmann> amotoki: so it does include the complete 'error_message' sent from the neutron API, correct?  07:06
*** ivve has joined #openstack-nova07:07
<gmann> amotoki: novaclient does the same - https://opendev.org/openstack/python-novaclient/src/branch/master/novaclient/exceptions.py#L249-L276  07:07
<amotoki> gmann: yes, the general format of an exception message is like: {"type": "fooException", "message": "...."}  07:08
<amotoki> gmann: we add new entries to neutronclient/common/exceptions.py per requests from consumers.  07:08
<amotoki> but in this case it looks better to concatenate the (translatable) "Unable to attach volume" with the message from the nova API.  07:09
<gmann> amotoki: yeah, nova has only high-level exceptions, nothing as rich as neutronclient  07:09
<amotoki> what I am thinking is to catch the exceptions related to multi-attach and send a specific error message.  07:10
*** igordc has quit IRC07:10
<amotoki> so I wonder whether there is a way to distinguish multi-attach related exceptions from other BadRequests.  07:10
<gmann> amotoki: hmm, that is a good idea. I am just wondering how many such client exceptions we would need to add in nova's case  07:10
<amotoki> but it does not look easy right now.  07:10
<gmann> attach volume has only 5-6 exceptions, which is not so hard  07:11
*** luksky has quit IRC07:11
<gmann> but see the server boot exceptions - https://opendev.org/openstack/nova/src/branch/master/nova/api/openstack/compute/servers.py#L695-L763  07:12
<amotoki> yeah, I know it  07:12
<gmann> is it so bad to show a UUID in horizon? i mean horizon will be showing it to the owner or admin only  07:13
<gmann> or have you found that nova exception error messages include more details, which could cause a security issue?  07:13
<amotoki> in the current impl of horizon, if a UUID is shown to users, they sometimes need to fall back to the CLI.  07:15
*** ralonsoh has joined #openstack-nova07:16
<amotoki> I don't think it can be a security issue. it is just a usability topic.  07:16
<gmann> yeah  07:16
<amotoki> a UUID is shown only on the detail page in most cases.  07:16
*** tesseract has joined #openstack-nova07:20
*** hamdykhader has joined #openstack-nova07:21
*** hamdyk has quit IRC07:21
<gmann> amotoki: i replied: either include all possible exceptions or special-case the 500 error code.  07:22
<amotoki> gmann: thanks.  07:22
<amotoki> gmann: http://specs.openstack.org/openstack/api-sig/guidelines/errors.html might be a candidate (though I don't think it is implemented)  07:22
*** luksky has joined #openstack-nova07:23
<amotoki> gmann: the neutron API does a similar thing in a different way https://opendev.org/openstack/neutron/src/branch/master/neutron/api/api_common.py#L512-L524  07:23
<amotoki> gmann: anyway, thanks for the discussion. really appreciated.  07:23
*** ccamacho has quit IRC07:27
*** ccamacho has joined #openstack-nova07:27
<gmann> amotoki: yeah, a standard error format is the missing part. NovaException does not provide such a nice mechanism like neutron does; we have only 'code' and 'message'.  07:28
<gmann> amotoki: let me see sometime (when i am free) if we can have such a wrapper method in NovaException to fetch the data in a more standard way.  07:31
<gmann> thanks for reporting it.  07:31
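The neutron-style error decoding amotoki describes above (a response body like {"type": "fooException", "message": "...."} mapped back to a specific client exception class) can be sketched roughly as follows. The class names and registry here are hypothetical stand-ins for illustration, not the real neutronclient or novaclient API:

```python
import json

# Hypothetical stand-in classes -- the real hierarchies live in
# neutronclient/common/exceptions.py and novaclient/exceptions.py.
class ClientException(Exception):
    """Generic fallback, akin to novaclient's ClientException."""
    def __init__(self, message, http_status=None):
        super().__init__(message)
        self.http_status = http_status

class OverQuotaClient(ClientException):
    pass

# Registry mapping the server-side exception name from the error body's
# 'type' field to a specific client-side exception class.
_ERROR_TYPES = {
    "OverQuota": OverQuotaClient,
}

def exception_from_response(status_code, body):
    """Decode {"type": ..., "message": ...} into a specific exception,
    falling back to the generic ClientException when the type is unknown
    or the body is not JSON."""
    try:
        error = json.loads(body)
        exc_cls = _ERROR_TYPES.get(error.get("type"), ClientException)
        message = error.get("message", body)
    except ValueError:
        exc_cls, message = ClientException, body
    return exc_cls(message, http_status=status_code)

# Callers can then catch the specific class instead of parsing messages:
exc = exception_from_response(
    409, json.dumps({"type": "OverQuota", "message": "Quota exceeded"}))
print(type(exc).__name__)  # OverQuotaClient
```

With a map like this, a consumer such as horizon could `except OverQuotaClient:` and show a translated message, instead of string-matching the body of a generic BadRequest.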
*** helenafm has joined #openstack-nova07:33
*** tetsuro has joined #openstack-nova07:33
*** tssurya has joined #openstack-nova07:38
*** belmoreira has quit IRC07:38
*** tetsuro has quit IRC07:38
*** ttsiouts has joined #openstack-nova07:42
*** belmoreira has joined #openstack-nova07:44
*** dtantsur|afk is now known as dtantsur07:56
*** trident has quit IRC07:57
*** threestrands has quit IRC07:59
*** bhagyashris has quit IRC07:59
*** takashin has left #openstack-nova08:00
*** trident has joined #openstack-nova08:01
<openstackgerrit> Josephine Seifert proposed openstack/nova-specs master: Spec for the Nova part of Image Encryption  https://review.opendev.org/608696  08:04
*** ttsiouts has quit IRC08:06
*** tkajinam has quit IRC08:06
*** ttsiouts has joined #openstack-nova08:07
*** brinzh has joined #openstack-nova08:10
*** ttsiouts has quit IRC08:11
*** brinzhang has quit IRC08:12
<openstackgerrit> Josephine Seifert proposed openstack/nova-specs master: Spec for the Nova part of Image Encryption  https://review.opendev.org/608696  08:15
*** ociuhandu has joined #openstack-nova08:15
*** ttsiouts has joined #openstack-nova08:16
<openstackgerrit> jiasirui proposed openstack/nova-specs master: fix the spelling mistakes  https://review.opendev.org/666244  08:24
*** pcaruana has quit IRC08:27
*** xek has joined #openstack-nova08:27
<shilpasd> efried: i am facing an issue at n-cpu start, due to nova/db/sqlalchemy/migrate_repo/versions/397_migrations_cross_cell_move.py  08:28
<shilpasd> efried: this got committed at https://review.opendev.org/#/c/614012/  08:28
<shilpasd> the error is: Error starting thread.: RemoteError: Remote error: DBError (pymysql.err.InternalError) (1054, u"Unknown column 'migrations.cross_cell_move' in 'field list'")  08:28
<shilpasd> i have run the db manage command, but no success  08:28
<shilpasd> any solution?  08:28
<shilpasd> efried: mriedem: here is the detailed error log, http://paste.openstack.org/show/753170/  08:30
*** dikonoor has joined #openstack-nova08:34
*** imacdonn has quit IRC08:41
*** imacdonn has joined #openstack-nova08:41
<openstackgerrit> Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown.  https://review.opendev.org/666245  08:44
*** pcaruana has joined #openstack-nova08:45
*** arxcruz is now known as arxcruz|brb08:47
*** markvoelker has joined #openstack-nova08:53
*** bhagyashris__ has joined #openstack-nova08:55
*** tetsuro has joined #openstack-nova08:55
*** markvoelker has quit IRC08:58
*** tetsuro has quit IRC09:11
*** kaisers1 has quit IRC09:13
*** cdent has joined #openstack-nova09:15
*** kaisers has joined #openstack-nova09:16
*** damien_r has joined #openstack-nova09:17
*** brinzh has quit IRC09:18
*** brinzhang has joined #openstack-nova09:19
*** awalende has joined #openstack-nova09:19
*** brinzhang has quit IRC09:19
*** brinzhang has joined #openstack-nova09:19
*** brinzhang has quit IRC09:20
*** brinzhang has joined #openstack-nova09:20
*** damien_r has quit IRC09:22
*** luksky has quit IRC09:24
*** davidsha has joined #openstack-nova09:24
*** belmoreira has quit IRC09:25
*** guozijn has quit IRC09:32
*** igordc has joined #openstack-nova09:37
<openstackgerrit> Merged openstack/nova master: Fix wrong assert methods  https://review.opendev.org/665897  09:38
*** derekh has joined #openstack-nova09:39
*** janki has joined #openstack-nova09:43
*** dikonoor has quit IRC09:47
*** lpetrut has joined #openstack-nova09:49
*** mdbooth has joined #openstack-nova09:49
<openstackgerrit> zhaixiaojun proposed openstack/python-novaclient master: Modify the url of upper_constraints_file  https://review.opendev.org/665934  09:50
*** markvoelker has joined #openstack-nova09:53
*** brinzhang has quit IRC09:55
*** punith has joined #openstack-nova09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: xvp: Start using consoleauth tokens  https://review.opendev.org/652967  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: xvp: Remove use of '_LI' marker  https://review.opendev.org/665425  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: nova-status: Remove consoleauth workaround check  https://review.opendev.org/652968  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: Remove nova-consoleauth  https://review.opendev.org/652969  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: objects: Remove ConsoleAuthToken.to_dict  https://review.opendev.org/652970  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: docs: Rework nova console diagram  https://review.opendev.org/660147  09:57
<openstackgerrit> Stephen Finucane proposed openstack/nova master: docs: Integrate 'sphinx.ext.imgconverter'  https://review.opendev.org/665693  09:57
<openstackgerrit> Surya Seetharaman proposed openstack/nova master: Grab fresh info from the driver during nova start/stop actions  https://review.opendev.org/665975  09:58
*** markvoelker has quit IRC09:58
*** bhagyashris__ has quit IRC09:59
*** martinkennelly has joined #openstack-nova10:04
*** luksky has joined #openstack-nova10:07
*** lpetrut has quit IRC10:13
*** rcernin has joined #openstack-nova10:18
*** yedongcan has left #openstack-nova10:20
<openstackgerrit> Stephen Finucane proposed openstack/nova master: tests: Use consistent URL regex substitution  https://review.opendev.org/665949  10:21
*** dave-mccowan has joined #openstack-nova10:31
*** _alastor_ has joined #openstack-nova10:35
*** mdbooth has quit IRC10:36
*** mdbooth has joined #openstack-nova10:36
*** _alastor_ has quit IRC10:40
*** priteau has joined #openstack-nova10:46
*** ttsiouts has quit IRC10:47
*** ttsiouts has joined #openstack-nova10:48
*** lpetrut has joined #openstack-nova10:51
*** ttsiouts has quit IRC10:52
*** markvoelker has joined #openstack-nova10:54
*** awalende has quit IRC10:58
*** markvoelker has quit IRC10:59
<ohwhyosa> hey there! I uploaded an iso of ubuntu-server-18.04 (not live) and when connecting via console to the instance, it does detect it (it lets me get to the keymap selection part) but then it complains that no cd-rom image is mounted, any idea why that might be?  11:18
<sean-k-mooney> not really; while you can use an iso, it's a little complicated in that you generally also have to add a data volume to install to  11:21
<sean-k-mooney> when you select the iso as a disk image it gets attached as the root disk. im not sure if it is attached as a cdrom or not  11:22
<ohwhyosa> thanks sean-k-mooney! the recommendation would be to use other image formats if available then, right?  11:23
<sean-k-mooney> well, normally you would install it locally in a vm and then upload that to the cloud, or use a prebuilt image  11:23
<sean-k-mooney> you can enable a boot menu too. you can convert the iso into a data volume in cinder and then boot an instance with a blank volume and install too, but in general you are better off preparing the image first rather than trying to do it in openstack  11:25
<ohwhyosa> nova.console.websocketproxy keeps saying broken pipe, by the way; tried the openstack-ansible multinode and the aio (which should set up all the networking correctly by itself)  11:25
<ohwhyosa> It is only the console service that gets broken pipes, just saying in case there might be an issue behind it  11:26
<ohwhyosa> I used both latest and stein  11:26
<ohwhyosa> stable/stein  11:26
<sean-k-mooney> im not sure, that sounds more like a spice server or websockify issue rather than nova  11:26
<sean-k-mooney> ohwhyosa: if you have not seen https://github.com/openstack/diskimage-builder you should check it out  11:27
<sean-k-mooney> there is also virt-builder, but tools like those are how people usually prepare images for openstack  11:28
*** punith has quit IRC11:29
<ohwhyosa> sean-k-mooney, thanks a ton! (and it happened both with noVNC and spice, but it is indeed happening with websockify)  11:29
*** ttsiouts has joined #openstack-nova11:32
*** udesale has quit IRC11:53
*** arxcruz|brb is now known as arxcruz11:53
*** udesale has joined #openstack-nova11:53
*** markvoelker has joined #openstack-nova11:55
*** markvoelker has quit IRC12:00
*** awalende has joined #openstack-nova12:07
*** awalende has quit IRC12:08
*** awalende has joined #openstack-nova12:08
*** dikonoor has joined #openstack-nova12:08
*** mgariepy has joined #openstack-nova12:15
*** rcernin has quit IRC12:15
*** udesale has quit IRC12:20
*** udesale has joined #openstack-nova12:21
*** dikonoor has quit IRC12:21
*** udesale has quit IRC12:25
*** udesale has joined #openstack-nova12:25
*** priteau has quit IRC12:27
*** liuyulong has joined #openstack-nova12:28
*** awalende has quit IRC12:29
*** awalende has joined #openstack-nova12:30
*** awalende has quit IRC12:34
*** eharney has quit IRC12:40
*** alex_xu has joined #openstack-nova12:45
*** markvoelker has joined #openstack-nova12:56
*** ttsiouts has quit IRC13:00
*** markvoelker has quit IRC13:00
*** ttsiouts has joined #openstack-nova13:01
<efried> shilpasd: You've hit this: https://review.opendev.org/#/c/614012/  13:01
<efried> shilpasd: sorry, you already said that.  13:01
<efried> are your conductor & computes running at the same level?  13:02
*** ratailor has quit IRC13:02
*** awalende has joined #openstack-nova13:03
*** ttsiouts has quit IRC13:05
<shilpasd> efried: yes  13:10
<efried> shilpasd: In any case this sounds like a bug (it should behave better than that even if versions are mismatched) but it's something mriedem will have to look at once he gets here.  13:10
*** mriedem has joined #openstack-nova13:11
<efried> there he is :)  13:12
<shilpasd> efried: ok  13:12
<shilpasd> yes  13:12
<shilpasd> mriedem: i am getting http://paste.openstack.org/show/753170/ on service restart on master, can you help solve this?  13:14
<alex_xu> mriedem: we replied to your comment on vpmem. also explored the bdm, let us know your thoughts  13:14
<alex_xu> mriedem: sean-k-mooney: also, in case you don't know, we already submitted the vpmem code to gerrit https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/virtual-persistent-memory, hope that is helpful for reviewing the spec  13:15
<mriedem> shilpasd: looks like you upgraded code without syncing the database  13:15
<shilpasd> mriedem: nova-manage db sync has been run, still the same issue  13:16
<shilpasd> any other command to be run?  13:16
*** ttsiouts has joined #openstack-nova13:18
<sean-k-mooney> alex_xu: sorry i have kept deferring re-reviewing https://review.opendev.org/#/c/601596/ ill grab a cup of coffee and go review before my next meeting  13:18
<alex_xu> sean-k-mooney: no problem, thanks  13:19
<sean-k-mooney> and cool, im not sure i will have time to review the code today but ill skim over it  13:19
<alex_xu> \o/  13:19
*** lbragstad has joined #openstack-nova13:20
*** cdent has quit IRC13:21
*** bbowen has joined #openstack-nova13:24
<mriedem> shilpasd: if you check the table schema in that database, does it have the cross_cell_move column?  13:24
*** bbowen_ has quit IRC13:24
<shilpasd> mriedem: there is no cross_cell_move column  13:28
<mriedem> then you didn't run the db sync properly  13:28
<mriedem> are you sure you're syncing your nova (cell1) database?  13:28
<shilpasd> mriedem: i have a multi-node setup, and i have migration records in it, is that causing an issue?  13:29
<shilpasd> should i clean the records and run db sync?  13:29
<mriedem> existing migration records should not be a problem  13:29
<shilpasd> ok  13:30
<mriedem> shilpasd: you know that https://review.opendev.org/#/c/614012/ just merged yesterday, so you must be doing CD somewhere?  13:30
<shilpasd> mriedem: yes i know that, after pulling master today i started getting this issue  13:31
<tssurya> shilpasd: did you run it on both cell0 and cell1 ?  13:36
<shilpasd> tssurya: after syncing the DB, i checked the table schema in both the cell0 and cell1 databases, but it does not have the cross_cell_move column  13:39
*** rajinir has joined #openstack-nova13:40
<sean-k-mooney> alex_xu: erric would we be able to take a look at the CAT spec https://review.opendev.org/#/c/662264/3 johnthetubaguy i dont know if cache allocation and memory bandwidth limits are something you're interested in too for hpc but input is welcome if you are interested.  13:40
<sean-k-mooney> efried: ^  13:40
<efried> sean-k-mooney: ack  13:41
<sean-k-mooney> ill be working with lyarwood on the implementation, he will likely take the lead, but im hoping to get a yes or no on the general direction of the spec in the next week or two.  13:43
<tssurya> shilpasd: well, the nova-manage db sync command by default takes the db connection from the nova.conf config file.  13:43
<tssurya> hopefully you are running it on cell1 and not cell0  13:44
<tssurya> for cell1 syncing  13:44
<stephenfin> gibi: Could you take a look at this long-standing bugfix? https://review.opendev.org/#/c/609460/  13:44
<mriedem> shilpasd: confirm that the config file you're using when running nova-manage db sync has the [database]/connection pointed at the correct cell db  13:44
*** BjoernT has joined #openstack-nova13:44
<stephenfin> (I don't know anyone else that knows that piece of the code in detail)  13:44
<gibi> stephenfin: I've put it in my queue  13:45
<stephenfin> Thanks :)  13:45
<tssurya> shilpasd: yeah, check the database connection in the config you are running it against, I am guessing it's not pointing at the right db  13:45
<shilpasd> tssurya: i don't think so, since nova operations were working before pulling master  13:47
<tssurya> shilpasd: what I mean is that nova-manage db sync is not being run against the cell config file  13:48
<tssurya> perhaps  13:48
<tssurya> which is why it's not adding the column to the cell-level dbs  13:48
<shilpasd> i have checked all that and in the conf cell1 is configured as the DB connection  13:49
*** eharney has joined #openstack-nova13:50
<shilpasd> and checked cell1 for the cross_cell_move column but it's not there  13:50
<shilpasd> finally i am re-stacking with reclone YES, let's see further, will keep you posted  13:50
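For reference, mriedem's and tssurya's point above is that `nova-manage db sync` migrates whichever database the `[database]/connection` option in the loaded config file points at, so in a multi-cell setup it has to be run once per cell config. A rough sketch of the relevant section (hostnames, database names, and passwords are illustrative, not from this conversation):

```ini
# nova.conf (or a per-cell config passed via --config-file) used when
# running "nova-manage db sync"; the values below are illustrative.
[database]
# Must point at the cell database you intend to migrate, e.g. cell1:
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_cell1

[api_database]
# The API database is migrated separately, via "nova-manage api_db sync":
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api
```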
*** mlavalle has joined #openstack-nova13:53
<mloza> hello, I have 13 computes and 12 of them are working fine. The problematic compute node's nova-compute service goes up and down in `openstack compute service list`. From the logs, I get tons of "Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed" and RMQ broken pipes. Redeploying the compute didn't fix the issue, and neither did rebooting  13:53
<mloza> the controller nodes completely.  13:53
*** mchlumsky has joined #openstack-nova13:54
<ohwhyosa> mloza, have you checked the placement logs?  13:55
<tssurya> mloza: looks like a message queue issue due to which the compute is not ablt to communicate with the controller; it's not a nova-compute service issue I guess  13:55
<tssurya> able*  13:55
*** mdbooth has quit IRC13:56
<ohwhyosa> I had a similar problem and it turned out the host-name was duplicated (because of a previous installation)  13:56
*** mdbooth has joined #openstack-nova13:56
*** janki has quit IRC13:57
*** markvoelker has joined #openstack-nova13:57
<ohwhyosa> try OS_TOKEN=$(openstack token issue -c id -f value)  13:57
<ohwhyosa> and then curl -s -H "X-Auth-Token: ${OS_TOKEN}" http://10.0.0.11:8780/resource_providers?name=${hostname}  13:58
<ohwhyosa> assuming that 10.0.0.11:8780 is your placement endpoint  13:58
<ohwhyosa> which you could check with openstack endpoint list, I believe  13:59
<mloza> tssurya: When I redeployed the problematic compute, I changed the hostname and IP address but the issue still persists. nova-compute keeps flapping  13:59
*** rouk has joined #openstack-nova13:59
*** liuyulong has quit IRC14:01
*** markvoelker has quit IRC14:01
*** BjoernT_ has joined #openstack-nova14:01
*** BjoernT has quit IRC14:03
*** tssurya has quit IRC14:06
*** brinzhang has joined #openstack-nova14:09
<shilpasd> tssurya: mriedem: re-stacking solved the issue  14:09
<mloza> ohwhyosa: Just full of RMQ broken pipes and missed heartbeats. All logs are thrown into elasticsearch  14:09
*** cdent has joined #openstack-nova14:11
*** dpawlik has quit IRC14:11
<ohwhyosa> Did you check the api, mloza?  14:13
*** lpetrut has quit IRC14:14
<mloza> ohwhyosa: No such error in nova-api  14:17
<ohwhyosa> nope, the placement api, did you check the curl command result?  14:19
<mloza> ohwhyosa: let me check  14:20
<jroll> random question: has anyone discussed doing passthrough of a host TPM device to the guest? I see this BP from 2014 https://blueprints.launchpad.net/nova/+spec/add-libvirt-tpm . I guess what I'm wondering is if that's something that would be accepted, and if so, if anyone has ideas on the best route to implement it (as it isn't just standard pci passthrough).  14:20
<jroll> context: we have users that have a requirement for a hardware tpm (still don't understand why a vtpm isn't okay, but it is what it is)  14:21
* jroll is happy to just mail the list, but thought I'd ask here first  14:21
*** ivve has quit IRC14:23
*** JamesBenson has joined #openstack-nova14:26
<gibi> stephenfin: I'm +2 on https://review.opendev.org/#/c/609460  14:27
<stephenfin> gibi++  14:28
<stephenfin> Now to find another unwilling victim^H^H^H helpful soul :)  14:28
<mloza> ohwhyosa: https://pastebin.com/raw/bA3LScMr  14:28
<mriedem> jroll: there is an approved spec for tpm  14:29
<mriedem> https://specs.openstack.org/openstack/nova-specs/specs/train/approved/add-emulated-virtual-tpm.html  14:29
<mriedem> the code is lagging and was deferred from stein  14:29
<mriedem> windriver owns it  14:29
<mriedem> jroll: oh, this is different,  14:30
<jroll> mriedem: passthrough, not virt... yeah  14:30
<mriedem> you want passthrough  14:30
<mriedem> i'd ask cfriesen about hw passthrough tpm if/when he's around  14:31
<mloza> ohwhyosa: I did redeploy twice. The first time, we deleted the old compute in `openstack resource provider ..` and the second time we changed the hostname and IP address, but the RMQ issues are still there.  14:31
*** JamesBen_ has joined #openstack-nova14:33
<mloza> ohwhyosa: The RMQ issue is only on one specific node; the other computes are fine.  14:33
<ohwhyosa> hmmm, so the old provider doesn't appear in the query, right?  14:34
<ohwhyosa> did you search by the current hostname or the old one?  14:34
*** Luzi has quit IRC14:36
*** JamesBenson has quit IRC14:36
*** hamdykhader has quit IRC14:37
<kashyap> jroll: Heya, just noticed your random question; yeah, i was curious why vTPM (which is implemented in libvirt/QEMU) is not okay. Curious to hear when you know  14:38
*** awalende has quit IRC14:38
<jroll> kashyap: probably FUD. idk. I'll let you know if I figure it out  14:38
*** awalende has joined #openstack-nova14:38
<mriedem> jroll: let me guess, they are baremetal people that don't trust anything virtual, so even using VMs is a problem for them and they want hw passthrough wherever and whenever possible b/c they don't trust virtualization.  14:38
<jroll> mriedem: it's almost like you know where I work  14:39
<mriedem> heh  14:39
<kashyap> mriedem: Heh, that guess is perfectly reasonable  14:40
<stephenfin> dansmith: Could you take a look at https://review.opendev.org/#/c/609460 too? Someone has finally stumbled upon it  14:41
*** dklyle_ has quit IRC14:42
*** awalende has quit IRC14:42
<mloza> ohwhyosa: yes, the old provider isn't in the list  14:42
<mloza> the current one  14:42
*** dklyle has joined #openstack-nova14:42
<mloza> I checked the old one and it wasn't there anymore  14:42
<dansmith> stephenfin: I'll look when there's a CI run from this year  14:43
<stephenfin> that'll happen automatically when it goes through the gate  14:43
<dansmith> yeah, but I'm not going to spend time reviewing it if it's completely broken, you know?  14:44
*** gfhellma has joined #openstack-nova14:45
<sean-k-mooney> dansmith: ha, that could take a while to happen  14:45
<openstackgerrit> Stephen Finucane proposed openstack/nova master: Ignore hw_vif_type for direct, direct-physical vNIC types  https://review.opendev.org/609460  14:45
<stephenfin> Fair, but it's not. Just saying :)  14:46
<sean-k-mooney> stephenfin: wait, you tested something locally :P  14:46
<mloza> ohwhyosa: btw, this is a kolla-ansible deployment of the stable/rocky branch  14:47
*** brinzhang0 has joined #openstack-nova14:47
*** brinzhang has quit IRC14:47
*** luksky has quit IRC14:50
*** awalende has joined #openstack-nova14:51
*** belmoreira has joined #openstack-nova14:53
*** brinzhang0 has quit IRC14:53
*** awalende has quit IRC14:54
*** liuyulong has joined #openstack-nova14:57
*** markvoelker has joined #openstack-nova14:58
*** lpetrut has joined #openstack-nova14:58
*** lpetrut has quit IRC14:59
*** lpetrut has joined #openstack-nova14:59
<belmoreira> mriedem: are you available? Hope you're good.  15:01
<belmoreira> what's the best way to fix wrong allocations? (requires_allocation_refresh = False)  15:01
*** awalende has joined #openstack-nova15:02
<mriedem> belmoreira: coincidentally bauzas and i are talking about busted allocations in -placement  15:02
*** markvoelker has quit IRC15:02
*** cfriesen has joined #openstack-nova15:02
mriedembut should probably be talking about that here since it's a nova problem15:02
bauzasactually yeah15:03
mriedemi'm not sure what requires_allocation_refresh is15:03
mriedemresource_provider_association_refresh ?15:03
mriedemresource_provider_association_refresh doesn't touch allocations anyway15:04
belmoreirathis is only enabled for ironic (the default is false). I can confirm but I'm assuming it always updates the allocations15:05
mriedembelmoreira: well one part is fixing the many bugs that cause incorrect allocations, e.g. https://review.opendev.org/#/c/654067/15:05
mriedembelmoreira: oh, requires_allocation_refresh in the ironic driver, that was removed awhile ago15:05
belmoreiramriedem I agree. But after hitting the bugs operators need something to fix the busted allocations15:05
mriedemyeah i know15:06
*** sridharg has quit IRC15:06
mriedemi think https://bugs.launchpad.net/nova/+bug/1793569 is where we've been collecting the various notes on what some operators are doing for scripts15:06
openstackLaunchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]15:06
mriedembelmoreira: one option is if you know that some instance has the wrong allocations, you can delete the allocations in placement and then run the nova-manage placement heal_allocations command and it should fix the allocations15:07
mriedembut that won't do anything about allocations held by migration records from some failed migration15:07
ohwhyosamloza, I recommend you also check with the people at #openstack-kolla15:08
ohwhyosaI can't help you more, I'm kinda new myself (without the kinda) but since I had a similar problem recently and was helped by sean-k-mooney , I thought I'd pay the favor forward in case it would help15:08
*** _alastor_ has joined #openstack-nova15:13
bauzasmriedem: you need to stop your APIs before cleaning up allocations, right?15:14
mriedemumm15:15
belmoreiramriedem in that case the operator needs to somehow identify the wrong allocations. I was thinking of having a "nova-manage" command for that, or a periodic task15:15
mriedemi guess it depends on what you mean by cleaning up15:15
mriedembelmoreira: see comment 5 that i just posted https://bugs.launchpad.net/nova/+bug/179356915:15
openstackLaunchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]15:15
bauzasmriedem: what you just said 'delete allocations and run nova-manage'15:15
mriedembauzas: heal_allocations won't do anything to an instance whose task_state is not None,15:16
belmoreiramriedem we had that periodic task in the past, if I remember correctly15:16
bauzasif you delete allocations, you could race if you let users create new instances ;)15:16
mriedembauzas: creating new servers and their allocations isn't the problem you're trying to solve15:16
bauzasI'm misunderstanding what you propose then15:16
mriedembelmoreira: once everything is upgraded to pike, the resource tracker would stop updating allocations for non-ironic nodes15:16
mriedembauzas: if you're trying to heal allocations for a single instance, you could lock it, delete its allocations in placement, and then run heal_allocations and then unlock it15:17
bauzashah ok, that I understand15:18
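The per-instance repair sequence mriedem outlines above (lock, delete the allocations, heal, unlock) can be sketched as an ordered command plan. This is a hedged illustration only: the `openstack resource provider allocation delete` command assumes the osc-placement CLI plugin is installed, and the Python helper is purely for demonstration, not nova tooling.

```python
# Sketch of the lock -> delete allocations -> heal -> unlock sequence
# described above. The helper just builds the command list; actually
# running the commands (and having the osc-placement plugin available)
# is left to the operator.

def repair_plan(instance_uuid):
    """Return the shell commands, in order, for healing one instance."""
    return [
        # lock so no API action changes task_state mid-repair
        # (heal_allocations skips instances with a task in flight)
        'openstack server lock %s' % instance_uuid,
        # drop the busted allocations for this consumer in placement
        'openstack resource provider allocation delete %s' % instance_uuid,
        # recreate allocations from the instance's current host/flavor
        'nova-manage placement heal_allocations',
        'openstack server unlock %s' % instance_uuid,
    ]
```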
mriedembut as i've said in that bug above, and in irc many a time, neither heal_allocations nor the scripts in that bug from larsks and mnaser deal with stale allocations held by migration consumers15:18
bauzasI thought you were saying about scrubbing the whole allocations table and just do the nova-manage placement heal stuff to recreate all the records15:18
mriedembauzas: i probably wouldn't do that on a production cloud with a lot of instances15:19
bauzasof course15:19
mriedemunless you use the --dry-run option which was also recently added and not backported15:19
mriedemheal_allocations also doesn't deal with nested allocations (yet)15:19
belmoreiramriedem right. Why shouldn't it continue to happen for libvirt if we continue to require it for ironic? (at least optionally) We will always have bugs and this "sync" will always be required15:19
mriedemso our tooling for operators is definitely lagging the complicated features we're shoving in15:20
mriedembelmoreira: first, we no longer have that flag in the ironic driver as i said - it was removed some time ago, either rocky or stein15:20
bauzasbelmoreira: for the context, I have some customers that complain about the nova_api (because Queens) DB growing because of orphaned allocations15:20
mriedembelmoreira: the reason the RT doesn't manage allocations since pike is because starting in pike the scheduler creates the allocations and the RT would trample the allocations during a migration afterward, which is its own problem15:21
mriedemtrust me, we had a bunch of bugs going into Pike RC1 and post-GA related to that15:21
mriedemthat was addressed in queens with https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html15:22
mriedembut because of ^ and the lack of the RT auto healing things, if a migration fails and we don't clean up properly, you've got stale allocations in placement held by migration records15:22
mriedemand heal_allocations doesn't yet deal with those15:22
belmoreiramriedem fair enough15:22
mriedemwhich leads to the scheduler thinking you have far less capacity than you probably do15:22
belmoreirayeah, heal_allocations only creates new allocations15:23
*** mrch_ has quit IRC15:23
bauzasI guess a "fix my cloud" button would be awesome15:23
belmoreirawe need a nova-manage for placement consistency15:23
mriedemor just more people (developers/cores) working on nova that care about fixing these latent issues for operators15:24
bauzasbut then https://media.giphy.com/media/xThuW45pxrB820tD0c/giphy.gif15:24
mriedemb/c frankly i'm getting burned out on caring about finding and fixing this stuff15:24
*** Sundar has joined #openstack-nova15:24
bauzasmriedem: oh yeah I understand you, don't get me wrong15:25
belmoreiramriedem don't get me wrong. I really appreciate all your work on this15:25
bauzasI just think about the possible DB locks a "fix my cloud" button would make15:25
mriedembauzas: invariably we'd fuck up a "fix my cloud" thing anyway15:25
bauzasideally, this needs to work online15:26
bauzasbut this is the ideal15:26
mriedemas i said in https://bugs.launchpad.net/nova/+bug/1793569/comments/5 i think one low hanging fruit is probably adding something to heal_allocations that scans for allocations held by migrations which are not in progress and reports on those15:27
openstackLaunchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]15:27
bauzasbelmoreira: you'd be okay with some tool that'd fix allocations during a maintenance window?15:27
bauzasI mean, I guess the answer, it'd depend on the required time window  :-)15:27
bauzasmriedem: couldn't we just report allocations that are either for not-in-progress migrations or just not related to any instance or migration UUID ?15:28
mriedemi'm torn about how much functionality to shove into heal_allocations b/c it's going to get more complicated https://review.opendev.org/#/q/topic:bug/1819923+(status:open+OR+status:merged)15:28
mriedembauzas: sure we *could* but that's likely a separate command i'd think15:29
mriedemnova-manage placement audit_allocations or something15:29
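The audit command proposed in this exchange — report allocations whose consumer is neither an instance nor an in-progress migration — boils down to a set comparison. A minimal illustration, assuming plain dicts for migration records; the in-progress status set here is an assumption, not nova's exact list:

```python
# Toy version of the proposed "nova-manage placement audit_allocations"
# check: flag allocation consumers that are neither a known instance nor
# a migration that is still in flight. Statuses treated as in-progress
# below are illustrative assumptions.

IN_PROGRESS_STATUSES = {'queued', 'preparing', 'running', 'post-migrating'}

def find_suspect_consumers(allocation_consumers, instance_uuids, migrations):
    """Return consumer UUIDs whose allocations look stale or orphaned."""
    active_migrations = {
        m['uuid'] for m in migrations
        if m['status'] in IN_PROGRESS_STATUSES
    }
    return [
        consumer for consumer in allocation_consumers
        if consumer not in instance_uuids        # not a live instance
        and consumer not in active_migrations    # not an in-flight migration
    ]
```

A failed migration whose record ends up in `error` would show up in the output, which is exactly the "stale allocations held by migration consumers" case discussed above.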
*** lpetrut has quit IRC15:30
cdentwhich is part of how we ended up wanting consumer types15:30
cdentfor most clouds (so far) it won't matter because all allocations will come from nova15:30
mriedemi thought about that, and we could fudge around consumer types for now by checking the resource classes involved to know it's coming from nova15:30
* cdent nods15:30
bauzascdent: heh touché15:31
bauzasyou know what ?15:31
mriedemwe could also use some troubleshooting docs about "my allocations seem all f'ed up, how can i tell for sure?" but i'm not sure how easy that is to write, but it's something that comes up in here almost every week i think15:31
bauzasI've been given that escalation so I have free time to work on it15:32
*** gyee has joined #openstack-nova15:32
cdenthaving those docs and tools coming from the people who are experiencing and/or fixing the bugs would be ideal as they have the most idea of what's going on. in a perfect universe we'd have lots of people running around to do it, but we're long past that15:33
bauzasmriedem: specless BP, there is ?15:33
mriedembauzas: i don't think providing tooling for fixing our messes is a feature15:33
sean-k-mooneyits not really a bug either15:34
sean-k-mooneythe thing that is causing the mess is a bug15:34
sean-k-mooneyi guess a tool could be a partial fix for that but15:34
sean-k-mooneyare you ok with tying the tool to the bug rather than a separate thing15:34
mriedemhttps://bugs.launchpad.net/nova/+bug/1793569 is a bug15:34
openstackLaunchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]15:34
bauzascool, I'll use this bug15:35
* bauzas rolls his sleeves15:35
sean-k-mooneyam ok that feels more like an RFE than a bug but ok15:35
sean-k-mooneyno need to invent more paperwork when we dont have to15:36
mriedembauzas: i wouldn't mind you posting something to the ML with what you plan on adding before starting a bunch of work on it15:36
bauzasack, tagging ops too15:36
belmoreirabauzas yes please15:36
*** damien_r has joined #openstack-nova15:36
*** awalende has quit IRC15:38
mriedembauzas: you can reply to this thread i started last september http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004223.html15:39
bauzasmriedem: cool15:39
*** awalende has joined #openstack-nova15:39
*** gfhellma has quit IRC15:40
mriedemi'll also say i expect functional tests for anything added b/c unit tests don't cut it with this kind of stuff that involves placement15:40
*** awalende_ has joined #openstack-nova15:41
mlozaohwhyosa: how did you fix your issue?15:42
*** awalende has quit IRC15:43
bauzasbelmoreira: do you know if [ops] tag in the email subject is enough for getting ops' eye on what I write ?15:43
mriedemyes, it should be15:44
*** awalende_ has quit IRC15:44
belmoreirait's the tag that we are using15:44
mriedemit's what i used on http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#709715:44
mriedemwhich is related to all of this also - nova mis-managing placement resources15:45
artomsean-k-mooney, remind me again, what's the situation where we can have a single instance with bind-time and plug-time events?15:46
mriedembauzas: mayhap while you have been escalated to spend time upstream, you'd like to review https://review.opendev.org/#/q/topic:bug/1825537+(status:open+OR+status:merged) as well15:46
artomOVS hybrid-plugged ports for the former, what's the latter?15:47
artom'cuz hybrid-plug is a neutron-wide setting, no?15:47
mriedemartom: sriov direct-physical i thought15:47
sean-k-mooneyovs + sriov or just two different sriov ports15:47
mriedemartom: different types of ports attached to the server15:47
artommriedem, that's what I have in my commit message currently (not pushed yet), but it means I got https://review.opendev.org/#/c/664431/ completely wrong15:48
sean-k-mooneye.g. one port that is vnic_type=direct-physical + another port that is vnic_type=direct15:48
artomI think I'll push, sean-k-mooney can shoot it down ;)15:48
mriedembelmoreira: while you're around, here is another tool for ops for you https://review.opendev.org/#/c/655908/15:48
sean-k-mooneyartom: if i think its incorrect i'll leave a comment or update it for you when i review15:49
artomsean-k-mooney, appreciated :)15:49
sean-k-mooneybut ya it basically only happens if you have two different network backends attached to the same vm, which only happens if you are using sriov really15:50
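The split sean-k-mooney describes — OVS hybrid-plug ports whose network-vif-plugged event arrives at bind time, versus SR-IOV (vnic_type=direct / direct-physical) ports whose event arrives at plug time — can be sketched as a classifier over a server's ports. Field names and the `hybrid_plug` flag handling below are illustrative, not nova's actual code:

```python
# Illustrative split of a server's ports into bind-time vs plug-time
# vif-plugged events. A server mixing OVS hybrid-plug and SR-IOV ports
# ends up with entries in both buckets, which is the tricky case
# discussed above.

def split_expected_events(ports):
    """Classify which neutron events to wait for, and when, per port."""
    bind_time, plug_time = [], []
    for port in ports:
        if port.get('hybrid_plug'):
            # OVS hybrid plug: the event is sent when the port is bound
            bind_time.append(('network-vif-plugged', port['id']))
        else:
            # e.g. vnic_type=direct / direct-physical: event at plug time
            plug_time.append(('network-vif-plugged', port['id']))
    return bind_time, plug_time
```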
artommriedem, I'm off until Tuesday, so I won't be the one harassing you for reviews. That honour goes to sean-k-mooney while I'm gone (because it's still on fire internally)15:50
mriedemartom: as in you're off today?15:50
artommriedem, as of tonight, flying to Russia to see the folks15:50
artomWell, this afternoon, really15:50
mriedemok so is someone going to be updating https://review.opendev.org/#/c/644881/ sometime soon because i'm getting really tired of having to re-load the context from that change into my head every other week15:51
mriedemwhen dansmith asks me to15:51
belmoreiramriedem thanks.15:51
openstackgerritArtom Lifshitz proposed openstack/nova master: Revert resize: wait for events according to hybrid plug  https://review.opendev.org/64488115:51
openstackgerritArtom Lifshitz proposed openstack/nova master: [DNM] testing bug/1813789 revert resize events  https://review.opendev.org/66444215:51
artommriedem, ^^ :)15:51
artomAnd as I said, sean-k-mooney's carrying the torch15:51
* mriedem uncocks15:51
artomUnless it somehow merges today15:51
*** helenafm has quit IRC15:52
sean-k-mooneymriedem: i'll prioritise any review feedback that you or dansmith have. artom i'll also review this again shortly15:54
jrollkashyap: so the concern on vTPM is around where keys "in the TPM" are stored. can you help me find docs on that? in other words, we don't want these keys sitting on a disk.15:57
*** belmoreira has quit IRC15:57
*** markvoelker has joined #openstack-nova15:58
bauzasmriedem: ack, will review those15:58
bauzasif working on customer escalations means more upstream time for me, then I sign off for moar :)15:59
*** ttsiouts has quit IRC16:02
openstackgerritStephen Finucane proposed openstack/nova-specs master: Additional upgrade clarifications for cpu-resources  https://review.opendev.org/66603216:02
mriedemsean-k-mooney: i don't know what's up with https://review.opendev.org/#/c/647733/ but it's not queued in zuul and zuul hasn't reported on it16:02
*** ttsiouts has joined #openstack-nova16:03
*** markvoelker has quit IRC16:03
sean-k-mooneymriedem: patch set 7 didnt get queued either16:04
sean-k-mooneymriedem: i realised like an hour ago that its still missing a test16:04
sean-k-mooneyso ill respin it later today16:04
sean-k-mooneyi still need to add the test for backleveling the object16:05
sean-k-mooney:) you got to the - workflow before me16:06
sean-k-mooneyi think there was a zuul restart yesterday around when i pushed it and i think that messed with it16:06
mriedemyou rechecked it about 3 hours ago though16:06
mriedem*416:06
*** ttsiouts has quit IRC16:07
sean-k-mooneyya i dont know why that didnt go through.16:07
sean-k-mooneyanyway ill update it later today. sorry for the noise16:07
*** rpittau is now known as rpittau|afk16:10
*** lpetrut has joined #openstack-nova16:10
*** _erlon_ has joined #openstack-nova16:12
*** udesale has quit IRC16:12
*** mdbooth_ has joined #openstack-nova16:14
*** awalende has joined #openstack-nova16:14
*** mdbooth has quit IRC16:17
*** spsurya has quit IRC16:18
*** mdbooth_ has quit IRC16:19
*** awalende has quit IRC16:19
*** mrch_ has joined #openstack-nova16:20
*** liuyulong has quit IRC16:22
sean-k-mooneyalex_xu: just finished https://review.opendev.org/#/c/601596/14 im not sure we should default to copying pmem namespace for resize or cold migration16:27
openstackgerritjacky06 proposed openstack/os-traits master: Sync Sphinx requirement  https://review.opendev.org/66638616:32
openstackgerritjacky06 proposed openstack/os-vif master: Sync Sphinx requirement  https://review.opendev.org/66638716:34
*** gfhellma has joined #openstack-nova16:44
*** ricolin has quit IRC16:48
*** lpetrut has quit IRC16:49
*** awalende has joined #openstack-nova16:50
*** gfhellma_ has joined #openstack-nova16:50
*** gfhellma has quit IRC16:53
*** dtantsur is now known as dtantsur|afk16:56
*** davidsha has quit IRC16:57
*** martinkennelly has quit IRC16:57
*** markvoelker has joined #openstack-nova16:59
*** derekh has quit IRC17:00
*** trident has quit IRC17:02
*** markvoelker has quit IRC17:04
*** trident has joined #openstack-nova17:04
*** tesseract has quit IRC17:21
*** cdent has quit IRC17:22
*** awalende has quit IRC17:22
*** eharney has quit IRC17:25
*** mgoddard has quit IRC17:25
roukso, update_available_resource in resource_tracker.py and _refresh_associations in report.py both take so long to complete that the main thread hangs long enough to miss rmq heartbeats for 2-3 minutes at a time on a specific node, causing the nova service to be unusable and its status to flap. does this sound familiar to anyone?17:25
roukbeen debugging a specific compute node flapping/being broken for a while, narrowed it down this far, making these functions just return immediately has fixed the symptoms, but those functions are kinda needed for functioning.17:27
sean-k-mooneyrouk: no not really17:27
sean-k-mooneywhat driver are you using17:27
sean-k-mooneyand how big is the host its happening on17:27
*** mgoddard has joined #openstack-nova17:27
rouksean-k-mooney: libvirt, 256gb ram, 96 threads of cpu, epyc17:28
sean-k-mooneyand do you have a lot of running instances?17:28
roukother nodes at same spec and exact hw match are... fine, as far as i can tell, suddenly this unit became useless, fresh deploy of it still broken.17:29
sean-k-mooneyor pci deivces17:29
rouknope, totally evicted currently, its a fresh build, had to fail everyone out of it and manually recover, as nova wasnt working enough to complete any migrations.17:29
roukno crazy pcie devices, a 40gig card.17:30
sean-k-mooneythat seems strange. i have run nova on similarly sized systems in the past (88 core intel system with 192GB of ram) and seen no issues17:30
sean-k-mooneyif you do virsh capablities on the host system does it complete quickly or is it slow17:31
rouki have something like 60 other nodes at same spec without issue, this one just cant get through the libvirt calls fast enough even on a fresh deploy, fresh ip, fresh provider config, fresh hostname17:31
sean-k-mooneyim wondering if this could be a libvirt issue17:31
roukit does seem to hang around in nova's libvirt.py for a while17:32
sean-k-mooneywhat version of python are you running?17:32
roukthe very first line in rocky's update_available_resource takes sometimes upwards of 60 seconds, the 2 functions i mention trade in terms of time spent, one cycle it will timeout on one, then the other17:33
rouk2.7.1517:33
sean-k-mooneynova currently does not support python 3.7 and i have seen the compute agent hang with 3.717:33
sean-k-mooney2.7.15 on ubuntu bionic right?17:33
sean-k-mooneymriedem: if you are around there was a gate issue with 2.7.15 recently right?17:34
roukyep17:34
roukwhat command should i check for cap list?17:35
sean-k-mooneywell i was wondering if libvirt was taking a long time to respond to that command17:35
sean-k-mooneye.g. was it having trouble introspecting the platform17:36
*** tesseract has joined #openstack-nova17:36
roukcapabilities returns in under 100ms it seems17:36
sean-k-mooneyif you had a dodgy disk or a semi broken pci device it could cause that or other commands to be slow17:36
sean-k-mooneybasically i was trying to figure out: is it slow because the compute agent is running slow, or is it waiting around for io from sysfs/libvirt17:37
rouknetwork running at a stable 20gig with 0% loss to the rabbit nodes, disks are happy, 2GB/s to the root device, no other drives in the system...17:37
sean-k-mooney2GB/s to the root disk?17:38
roukyep17:38
sean-k-mooneyas in you currently can write at that speed, or is that the activity you are seeing17:38
*** eharney has joined #openstack-nova17:38
roukcapability, stable, load is nothing right now17:38
sean-k-mooneyif you are seeing 2GB/s of writes and the system is idle something is wrong, but if you ran dd and it worked at 2GB/s then your disk is fine17:39
rouk3GB/s to the docker mountpoint right now via DD from /dev/zero, no load otherwise17:40
rouk0.03 load across 96 threads17:40
sean-k-mooneyya ok so this is likely not related to the system hardware then or to libvirt/sysfs17:41
sean-k-mooneylatency17:41
roukresources = self.driver.get_available_resource(nodename) is the line that takes the longest in the first freeze spot17:41
roukthat one takes upwards of 60 seconds every 2nd cycle17:42
rouk_refresh_associations is hanging within the if refresh_sharing:17:43
roukalso every 2nd cycle, on the cycle which the other does not freeze, they trade, like some kind of shared lock17:43
roukagain, nothing special about this compute node that i can see, same config, same deploy, completely fresh, fresh rmq queues, its never even had a vm on it.17:44
rouksame containers as the others, central repo so versions always match.17:45
*** psachin has quit IRC17:48
rouksean-k-mooney: any idea for next steps? or should i just trace all the code down till i find a specific thing blocking it?17:48
sean-k-mooneyam if you have time that would help. these functions are used to update the inventory of available resources in placement so we can schedule properly17:49
*** factor__ has joined #openstack-nova17:49
*** icarusfactor has quit IRC17:49
sean-k-mooneyso without them you cant use that node but there is obviously something wrong17:50
sean-k-mooneydo you use pci passthrough on that node17:50
sean-k-mooneyif so you could try commenting out the whitelist to rule out the pci module17:50
sean-k-mooneysimilarly if you are using vgpus you could disable that for debugging in the config17:51
sean-k-mooneyif the issue still exists with pci passthrough and vgpus disabled in the conf (assuming you were using either) that narrows down the code that could be at fault17:52
rouknope, same config as all other nodes, just rbd storage, oslo memcached, lock_path = /var/lib/nova/tmp,17:52
sean-k-mooneyok so we can ignore those codepaths then17:52
roukno major customization other than standard kolla config for a compute node.17:52
sean-k-mooneywhat release did you say you were using17:52
roukcan dump you configs if it would help17:52
*** maciejjozefczyk has quit IRC17:52
roukrocky, nova==18.1.0, it was working on this node up until monday, no changes in version, got some stein deployments that are happy too.17:53
roukcould move up to 18.2.1 i guess, if a fresh one would help maybe?17:54
*** melwitt is now known as jgwentworth17:55
*** awalende has joined #openstack-nova17:56
openstackgerritAdam Spiers proposed openstack/nova master: Add extra spec parameter and image property for memory encryption  https://review.opendev.org/66442017:58
sean-k-mooneyam can you file a bug for this and include as much info as you can17:59
roukwell if nobody has heard of it, and nobody has had similar, i would need to find the exact cause before a bugreport would be any use...17:59
sean-k-mooneyit sounds like there is a deadlock or something that is preventing things from correctly yielding17:59
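A low-tech way to confirm which call is blocking the greenthread — effectively what rouk did by hand when timing update_available_resource — is to wrap the suspect methods in a timing decorator and log anything that runs long enough to threaten the rabbitmq heartbeat. A generic sketch; the 10-second default threshold is an arbitrary assumption:

```python
# Timing decorator to spot calls that block long enough to starve
# eventlet and miss rmq heartbeats. Wrap suspect methods (e.g.
# driver.get_available_resource) with it while debugging.

import functools
import logging
import time

LOG = logging.getLogger(__name__)

def warn_if_slow(threshold=10.0):
    """Decorator factory: warn when the wrapped call exceeds threshold."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > threshold:
                    # long enough to threaten the heartbeat interval
                    LOG.warning('%s blocked for %.1fs', fn.__name__, elapsed)
        return wrapper
    return decorator
```

Applied to the two functions rouk narrowed things down to, this would show per-cycle which step eats the ~60 seconds without having to stub them out entirely.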
openstackgerritAdam Spiers proposed openstack/nova master: Use fake flavor instead of empty dict in test  https://review.opendev.org/66255518:00
openstackgerritAdam Spiers proposed openstack/nova master: Pass extra_specs to flavor in vif tests  https://review.opendev.org/66255618:00
openstackgerritAdam Spiers proposed openstack/nova master: Extract SEV-specific bits on host detection  https://review.opendev.org/63633418:00
openstackgerritAdam Spiers proposed openstack/nova master: Add <launchSecurity> element to guest config for AMD SEV  https://review.opendev.org/63631818:00
openstackgerritAdam Spiers proposed openstack/nova master: Allow guest devices to include <driver iommu='on' />  https://review.opendev.org/64456418:00
openstackgerritAdam Spiers proposed openstack/nova master: Detect that SEV is required and enable iommu for devices  https://review.opendev.org/64456518:00
openstackgerritAdam Spiers proposed openstack/nova master: Use <launchSecurity> element when SEV is required  https://review.opendev.org/66255718:00
openstackgerritAdam Spiers proposed openstack/nova master: Enable memory locking if SEV is requested  https://review.opendev.org/66255818:00
*** markvoelker has joined #openstack-nova18:00
sean-k-mooneyrouk: well you dont need to fully root cause it to open the bug, you could update it as you go with your findings18:00
aspierssean-k-mooney, mriedem: this is the best I could come up with for the fake dict issue: https://review.opendev.org/#/c/664420/10/nova/virt/hardware.py@114218:00
*** awalende has quit IRC18:00
*** ociuhandu has quit IRC18:02
*** gfhellma__ has joined #openstack-nova18:03
*** markvoelker has quit IRC18:05
*** gfhellma_ has quit IRC18:06
*** pcaruana has quit IRC18:11
mriedemsean-k-mooney: the 2.7.15 python thing recently was that hacking check unit test failure18:13
mriedemhttps://bugs.launchpad.net/nova/+bug/183239218:14
openstackLaunchpad bug 1804062 in nova (Ubuntu Eoan) "duplicate for #1832392 test_hacking fails for python 3.6.7 and newer" [High,Triaged]18:14
mriedemaspiers: comment inline18:16
*** damien_r has quit IRC18:22
sean-k-mooneymriedem: kolla-ansible also had some other issue but i think rouk's issue is related to ceph network connectivity18:25
aspiersmriedem: thanks!18:26
sean-k-mooneynot 100% sure, but when they tried to connect to ceph from the node the ceph status command was failing sporadically, so that might be causing the agent to hang when it tries to get the capacity of the rbd pool18:26
*** luksky has joined #openstack-nova18:30
roukyeah, checking for ip conflicts that somehow got into the network managed only by kolla...18:31
rouksee what else we can find, its flapping like an ip conflict.18:31
*** BjoernT has joined #openstack-nova18:32
sean-k-mooneyyou mentioned you changed the hostname when redeploying, right18:32
sean-k-mooneydid you use the kolla ansible bootstrap playbook18:32
sean-k-mooneythat templates out a /etc/hosts file with static assignments for all nodes18:32
sean-k-mooneybut if you are also using dhcp that could maybe cause issues18:33
sean-k-mooneyits been a while since i worked on kolla so they may have changed that18:33
*** BjoernT_ has quit IRC18:34
*** damien_r has joined #openstack-nova18:36
mriedemsean-k-mooney: mlavalle: comments on https://review.opendev.org/#/c/644881/ and question about the assertion that ovs hybrid plug vif types are "neutron wide" configuration - that seems surprising to me18:41
mriedemi thought that configuration was per neutron agent18:41
*** damien_r has quit IRC18:43
artommriedem, thanks for the review - we're in the final throes of packing, so sean-k-mooney will have to follow up. I've left a reply for 1 thing though.18:48
*** tesseract has quit IRC18:49
*** ivve has joined #openstack-nova18:50
*** BjoernT_ has joined #openstack-nova18:51
*** BjoernT has quit IRC18:52
*** markvoelker has joined #openstack-nova19:01
*** ivve has quit IRC19:16
*** markvoelker has quit IRC19:20
*** phughk has joined #openstack-nova19:25
mriedemhttps://review.opendev.org/#/c/571265/ adds a functional test for a scheduler filter that was otherwise not tested (i don't think when i wrote it anyway), open for over a year now, has a +2 if someone can look19:40
*** maciejjozefczyk has joined #openstack-nova19:44
*** eharney has quit IRC19:47
mriedemis melwitt out this week?19:50
*** maciejjozefczyk has quit IRC19:50
mriedemdansmith: looking back on https://bugs.launchpad.net/nova/+bug/1773945 and https://bugs.launchpad.net/nova/+bug/1784074 - if we get here in conductor where the instance is deleted after the build request is created and we've done scheduling but before the instance is created in a cell, we delete the build request here and continue to the next instance we're building: https://github.com/openstack/nova/blob/74aebe0d4e5a978a40011e890aee9e70e98246c4/nova/conductor/manager.py#L135019:51
openstackLaunchpad bug 1773945 in OpenStack Compute (nova) "nova client servers.list crashes with bad marker" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)19:51
openstackLaunchpad bug 1784074 in OpenStack Compute (nova) "Instances end up with no cell assigned in instance_mappings" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)19:51
mriedemwe don't bury that instance in cell0 presumably because it's already been deleted via build request19:51
mriedemhowever, since we don't bury it, and the build request is gone, and we didn't create the instance, we could have an orphaned instance mapping, yeah?19:52
jgwentworthno, just didn't want to lose my other nick from not using it19:52
*** jgwentworth is now known as melwitt19:52
*** ralonsoh has quit IRC19:53
* melwitt feebly tries to open that link19:54
melwittconcatenating not working19:54
mriedemok, was going to ping you on ^ as well19:54
melwittwhat you're saying makes sense though19:55
melwittinterestingly recently I was trying to figure out how an instance could possibly be in both cell0 and cell1 in a single cell env, since that's happened to rdo cloud several times. but I found no way19:55
melwittso if you saw how that could be, lmk19:56
*** awalende has joined #openstack-nova19:58
dansmithmriedem: maybe?19:59
melwittso, we create instance mapping with cell_id=NULL in compute/api. so if we get to conductor and no build request found, we add a placeholder to the instance list19:59
melwittwe don't create an instance record20:00
mriedemright we'd skip those here https://github.com/openstack/nova/blob/74aebe0d4e5a978a40011e890aee9e70e98246c4/nova/conductor/manager.py#L139120:01
dansmithyeah was just reading that loop20:01
dansmithwe assume that if instance is none that it's because we buried it already I think20:01
mriedemwhile i'm looking at this code, we should probably make this fatal now :) https://github.com/openstack/nova/blob/74aebe0d4e5a978a40011e890aee9e70e98246c4/nova/conductor/manager.py#L123820:01
melwittwe delete all the stuff if quota recheck fails. but if someone has requested a delete of the thing, wouldn't the compute/api delete the instance mapping? I can't remember where we do that for delete20:01
mriedemdansmith: or it's already been deleted via the build request before we got that far20:02
mriedemmelwitt: no we don't delete the instance mapping when deleting the instance (or build request) in the api20:02
mriedemthat's why surya had to add it to the archive_deleted_rows cmd20:02
*** artom has quit IRC20:02
dansmithmriedem: yeah, we get instance=None if the build request failed, do we get it for other reasons?20:02
melwittoh, ever? if we never do, then is it an issue?20:03
dansmithohh,20:03
dansmithI see,20:03
openstackgerritAdam Spiers proposed openstack/nova master: Add extra spec parameter and image property for memory encryption  https://review.opendev.org/66442020:03
dansmithwe bury it right away if we get hostmappingnotfound20:03
dansmithbut if we get instance mapping not found, we don't bury, but instances.append(None)20:03
dansmiththe loop at the bottom assumes the former behavior20:03
dansmiththe latter looks like it was added for quota counting reasons20:04
melwittyeah, it was20:04
mriedemdansmith: you mean build request not found20:04
melwittthe append None thing20:04
mriedemhttps://github.com/openstack/nova/blob/74aebe0d4e5a978a40011e890aee9e70e98246c4/nova/conductor/manager.py#L135020:04
openstackgerritAdam Spiers proposed openstack/nova master: Use fake flavor instead of empty dict in test  https://review.opendev.org/66255520:04
openstackgerritAdam Spiers proposed openstack/nova master: Pass extra_specs to flavor in vif tests  https://review.opendev.org/66255620:04
openstackgerritAdam Spiers proposed openstack/nova master: Extract SEV-specific bits on host detection  https://review.opendev.org/63633420:04
openstackgerritAdam Spiers proposed openstack/nova master: Add <launchSecurity> element to guest config for AMD SEV  https://review.opendev.org/63631820:04
openstackgerritAdam Spiers proposed openstack/nova master: Allow guest devices to include <driver iommu='on' />  https://review.opendev.org/64456420:04
openstackgerritAdam Spiers proposed openstack/nova master: Detect that SEV is required and enable iommu for devices  https://review.opendev.org/64456520:04
openstackgerritAdam Spiers proposed openstack/nova master: Use <launchSecurity> element when SEV is required  https://review.opendev.org/66255720:04
openstackgerritAdam Spiers proposed openstack/nova master: Enable memory locking if SEV is requested  https://review.opendev.org/66255820:04
dansmithmriedem: yeah sorry20:04
dansmithmriedem: yeah that was melwitt's main counting quotas patch20:04
dansmithwhich added the instances.append(None) for the host not found bury case as well20:05
dansmithhttps://github.com/openstack/nova/commit/5c90b25e49d47deb7dc6695333d9d5e46efe8665#diff-378a96ec6159d0a2f8ec7ab71bc3843bR94620:06
dansmithalthough it looks like we were skipping those both anyway20:06
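The bury-vs-append asymmetry described above can be sketched roughly like this (a hypothetical simplification with invented names, not the actual nova conductor code): a HostMappingNotFound buries the instance in cell0 right away, while a missing instance mapping only appends None (added for quota counting), and the loop at the bottom skips None entries either way.

```python
class HostMappingNotFound(Exception):
    pass


class InstanceMappingNotFound(Exception):
    pass


def schedule_and_build_instances(requests, lookup_host_mapping, bury_in_cell0):
    """Hypothetical simplification of the flow discussed above."""
    instances = []
    for req in requests:
        try:
            instances.append(lookup_host_mapping(req))
        except HostMappingNotFound:
            # buried right away: recorded as failed in cell0
            bury_in_cell0(req)
            instances.append(None)
        except InstanceMappingNotFound:
            # only appended as None (added for quota counting); not buried
            instances.append(None)
    # the loop at the bottom skips None entries, so the un-buried
    # failures are never recorded anywhere
    return [inst for inst in instances if inst is not None]
```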
mriedemyeah, and at some point in talking about https://bugs.launchpad.net/nova/+bug/1784074 we had talked about squashing those two for loops into one again, but i've lost track of the discussion about that (though i think i know how to find it)20:07
openstackLaunchpad bug 1784074 in OpenStack Compute (nova) "Instances end up with no cell assigned in instance_mappings" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)20:07
openstackgerritEric Fried proposed openstack/nova master: Clarify --before help text in nova manage  https://review.opendev.org/66128920:07
mriedemi'm not really signing up to do that refactor now though,20:07
*** gfhellma__ has quit IRC20:07
dansmithyeah20:07
mriedemjust thinking it makes sense for us to delete the instance mapping in this block https://github.com/openstack/nova/blob/74aebe0d4e5a978a40011e890aee9e70e98246c4/nova/conductor/manager.py#L135020:07
dansmithI think we should just bury in that case, no?20:07
mriedemb/c if the BR is gone, and the instance isn't created in a cell yet, the instance mapping shouldn't exist either - it's orphaned20:07
melwittI've been wanting ("wanting") to do that refactor but, you know how it goes20:07
mriedemburying it would bring it back20:07
mriedemfrom the br zombieland20:08
dansmithmriedem: right, but isn't that the expectation of the api user?20:08
melwittbut how does that hurt anything? if we never delete instance mappings?20:08
dansmithor is this only in the multi-create case?20:08
melwitt*if we never delete instance mappings anyway?20:08
dansmithfor single create, once you get back a uuid you expect to be able to list that thing (with deleted=yes) until archive20:09
dansmithbut I guess maybe if you've deleted it it could be up for immediate archive and thus deleting the mapping is equivalent20:09
dansmithso yeah, either I guess20:09
mriedemmelwitt: it could be a problem in a tight window if you got that mapping via build request as a marker when listing/paging20:09
melwittoh, I see20:09
mriedemdansmith: if you delete the build request before it's an instance, i don't think we honor reading it as deleted=yes anyway,20:10
mriedembecause BRs are hard deleted20:10
mriedemmelwitt: at least that was part of the reasoning behind https://review.opendev.org/#/c/575556/20:10
dansmithmriedem: ack yeah I guess I remember that bit of cheating20:11
mriedemit would have been like newton-era talk about that cheating being ok, likely in a small hotel conf room in hillsboro20:11
mriedemif i remember correctly20:11
mriedemwith jpenick squashed on the floor in a corner20:12
melwittlol20:12
dansmithyar20:12
mriedemthe good old days...20:12
melwittit was warm in there20:12
mriedemanyway, i'm about to abandon https://review.opendev.org/#/c/575556/ but was just looking at my last comment there20:12
mriedemmelwitt: unless you were directly underneath where the air shot out and down on you, which was my case20:13
mriedemthen it was cold and contact-drying20:13
melwittlol!20:13
melwittwell I'll be20:13
*** markvoelker has joined #openstack-nova20:17
*** belmoreira has joined #openstack-nova20:20
*** gfhellma has joined #openstack-nova20:22
*** gfhellma_ has joined #openstack-nova20:24
*** gfhellma has quit IRC20:28
openstackgerritMatt Riedemann proposed openstack/nova master: Delete InstanceMapping in conductor if BuildRequest is already deleted  https://review.opendev.org/66643820:32
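A rough sketch of the cleanup that patch describes (invented names, with a plain dict standing in for the API DB, not the actual change): if the build request is already gone and the instance never reached a cell, the instance mapping is orphaned and can be deleted rather than buried in cell0, since burying would resurrect the already-deleted instance.

```python
def handle_deleted_build_request(instance_mappings, instance_uuid):
    """Hypothetical cleanup when the BuildRequest was deleted before the
    instance was created in a cell."""
    mapping = instance_mappings.get(instance_uuid)
    if mapping is not None and mapping.get("cell_id") is None:
        # BR gone + no cell assigned => the mapping points at nothing;
        # drop it instead of burying, which would bring the instance
        # back from "the br zombieland"
        del instance_mappings[instance_uuid]
        return "deleted"
    return "kept"
```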
*** awalende has quit IRC20:32
openstackgerritmelanie witt proposed openstack/nova master: rbd: use MAX_AVAIL stat for reporting bytes available  https://review.opendev.org/55669220:35
*** eharney has joined #openstack-nova20:36
*** markvoelker has quit IRC20:36
*** gfhellma_ has quit IRC20:39
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional test for AggregateMultiTenancyIsolation + migrate  https://review.opendev.org/57126520:40
*** awalende has joined #openstack-nova20:41
*** whoami-rajat has quit IRC20:47
openstackgerritMatt Riedemann proposed openstack/nova master: libvirt: don't log error if guest gone during interface detach  https://review.opendev.org/61072721:00
mriedemrelatively simple older patch here that has a +2 on it - it's mostly docstring https://review.opendev.org/#/c/633212/21:04
sean-k-mooneymriedem: just back after dinner ill take a look at https://review.opendev.org/#/c/644881/ now21:28
mriedemi'll be gone in 30 minutes21:31
sean-k-mooneymriedem: the use of hybrid plug is based on the local config on the compute host, which is read by the agent, and hybrid plug among other things is stored in the neutron db21:31
*** _erlon_ has quit IRC21:31
sean-k-mooneyso when the ml2 driver binds the port to the host it is passed an agent context like our host_state object21:32
mriedemmnaser: figured out why my recreate test in https://review.opendev.org/#/c/647603/ wasn't failing21:32
sean-k-mooneywhich it checks to determine if hybrid plug should be used21:32
mriedemmnaser: only took a couple of months21:32
mriedemsean-k-mooney: but that's per neutron agent right?21:32
sean-k-mooneyyes21:32
mriedemthe commit message makes it sound like if you're ovs hybrid plug, it's deployment-wide21:33
*** markvoelker has joined #openstack-nova21:33
sean-k-mooneyit does. i'll comment on the comment and correct it21:33
sean-k-mooneyfor tools like tripleo it's typically configured globally21:33
sean-k-mooneybut it's actually set per host21:33
sean-k-mooneyat bind time the ml2 driver is passed the agent context for the agent on the host that nova selected and it determines what to do based on the configuration of that host21:34
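The per-host decision described above could be sketched like this (invented names; the real neutron ML2/OVS mechanism driver is more involved): at bind time the driver inspects the agent on the host nova selected, and that agent's locally-configured firewall driver determines whether hybrid plug is used, so it is per host rather than deployment-wide.

```python
def needs_hybrid_plug(agents_by_host, host):
    """Hypothetical sketch: decide hybrid plug from the config reported
    by the agent on the host the port is being bound to."""
    agent = agents_by_host[host]
    # hybrid plug is what the iptables_hybrid firewall driver requires;
    # the agent reports its locally-configured driver to the neutron db
    return agent["configurations"].get("firewall_driver") == "iptables_hybrid"
```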
mnasermriedem: what object should it be?21:34
mriedemmnaser: nvm, i guess it doesn't recreate it21:35
mriedemfor one thing the test wasn't going through get_stashed_volume_connector when deleting the server,21:35
mriedemand _local_cleanup_bdm_volumes swallows the error anyway21:35
mriedembut if the bdm.connection_info were 'null' the test would fail before that anyway21:36
mriedemso i guess i didn't recreate it21:36
efriedI think that newton midcycle was my first in-person encounter with you people.21:39
mriedem"you people"21:39
* mriedem calls the PC police21:39
sean-k-mooneyefried: does it feel longer or shorter than that :)21:40
efriedWith my dying breath I shall continue to deny that "you people" is an acceptable PC violation, under any circumstances.21:41
Nick_Aconfig drive - is there a way to only use it at instance creation and not let instance mount it again after?21:41
efriedsean-k-mooney: I thought I started somewhere around liberty/mitaka, but that may have been way in the background, no community involvement at that point.21:42
efriedsean-k-mooney: I do remember understanding ZERO of what went on in that room though.21:42
efried(Some things never change)21:42
*** gfhellma has joined #openstack-nova21:42
sean-k-mooneyi think that is common. both initially not following the super detailed conversation and not getting involved in the community right away21:43
sean-k-mooneyi started playing with openstack at the end of havana, playing with quantum, but it was after icehouse had shipped, in early juno, that i submitted my first patch i think21:46
sean-k-mooneyhuh my first patch was a neutron spec apparently21:48
sean-k-mooneyhttps://review.opendev.org/#/c/95121/21:48
*** BjoernT_ has quit IRC21:49
*** markvoelker has quit IRC21:53
mriedemseems pypi mirrors have exploded,21:53
mriedemso for anyone waiting all day for a CI result, they likely have to recheck21:53
*** JamesBen_ has quit IRC21:54
*** awalende has quit IRC21:55
*** awalende has joined #openstack-nova21:56
*** awalende has quit IRC21:59
*** mriedem has quit IRC21:59
*** xek has quit IRC22:02
*** mlavalle has quit IRC22:08
sean-k-mooneythat probably explains why my recheck earlier never started22:10
*** belmoreira has quit IRC22:13
*** belmoreira has joined #openstack-nova22:16
*** zbr|ruck has quit IRC22:17
*** luksky has quit IRC22:39
*** Sundar has quit IRC22:44
*** belmoreira has quit IRC22:45
*** markvoelker has joined #openstack-nova22:50
*** gfhellma_ has joined #openstack-nova22:52
*** gfhellma has quit IRC22:55
*** markvoelker has quit IRC23:05
*** tkajinam has joined #openstack-nova23:06
*** JamesBenson has joined #openstack-nova23:12
*** rcernin has joined #openstack-nova23:16
*** JamesBenson has quit IRC23:17
*** gfhellma_ has quit IRC23:25
*** gfhellma_ has joined #openstack-nova23:25
*** gfhellma__ has joined #openstack-nova23:30
*** gfhellma_ has quit IRC23:31
*** gfhellma_ has joined #openstack-nova23:43
*** slaweq has quit IRC23:47
*** gfhellma__ has quit IRC23:47
alex_xusean-k-mooney: thanks, I'm ok with default to not copy the data23:56
sean-k-mooneyi think in the context of cross cell resize it's nice that the default behavior would be the same as intra cell resize23:57
sean-k-mooneyand it's also consistent with snapshotting and shelve23:57
*** gfhellma_ has quit IRC23:58

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!