Tuesday, 2018-11-20

*** prometheanfire has joined #openstack-nova00:10
prometheanfirehttp://logs.openstack.org/34/618834/1/check/cross-nova-py27/c3128e0/testr_results.html.gz new oslo.service causes failures00:10
prometheanfirehappy post-summit :D00:10
*** tetsuro has joined #openstack-nova00:11
jrollefried: done00:14
*** hoonetorg has quit IRC00:35
*** hoonetorg has joined #openstack-nova00:37
*** hoonetorg has quit IRC00:40
*** hoonetorg has joined #openstack-nova00:40
openstackgerritzhufl proposed openstack/nova master: Add missing ws seperator between words  https://review.openstack.org/61849101:21
*** Swami has quit IRC01:34
*** Dinesh_Bhor has joined #openstack-nova01:49
*** Dinesh_Bhor has quit IRC01:55
*** bhagyashris has joined #openstack-nova01:56
*** Dinesh_Bhor has joined #openstack-nova02:04
openstackgerritTetsuro Nakamura proposed openstack/nova master: Consider root id is None in the database case  https://review.openstack.org/61330502:18
*** mrsoul has quit IRC02:33
*** mhen has quit IRC02:51
*** mhen has joined #openstack-nova02:52
*** hongbin has joined #openstack-nova02:58
*** cfriesen has quit IRC03:24
*** Dinesh_Bhor has quit IRC03:42
*** dklyle has quit IRC03:45
*** david-lyle has joined #openstack-nova03:45
openstackgerritJie Li proposed openstack/nova-specs master: Support volume-backed server rebuild  https://review.openstack.org/53240703:46
*** cfriesen has joined #openstack-nova03:53
*** diga has joined #openstack-nova04:00
*** Dinesh_Bhor has joined #openstack-nova04:01
*** tetsuro has quit IRC04:04
*** udesale has joined #openstack-nova04:09
*** sridharg has joined #openstack-nova04:28
*** janki has joined #openstack-nova04:30
*** tbachman_ has joined #openstack-nova04:39
*** ivve has joined #openstack-nova04:41
*** tbachman has quit IRC04:42
*** tbachman_ is now known as tbachman04:42
openstackgerritMerged openstack/nova master: Skip double word hacking test  https://review.openstack.org/61884304:57
openstackgerritMerged openstack/nova master: Fix server query examples  https://review.openstack.org/61683404:57
*** bhagyashris has quit IRC05:04
*** hongbin has quit IRC05:07
*** tetsuro has joined #openstack-nova05:11
*** cfriesen has quit IRC05:38
*** betherly has quit IRC05:43
*** k_mouza has joined #openstack-nova05:43
*** betherly has joined #openstack-nova05:44
*** ratailor has joined #openstack-nova05:46
*** k_mouza has quit IRC05:48
*** betherly has quit IRC05:49
*** brinzhang has joined #openstack-nova05:51
*** links has joined #openstack-nova05:57
openstackgerritJie Li proposed openstack/nova-specs master: Support volume-backed server rebuild  https://review.openstack.org/53240706:04
*** sapd1 has joined #openstack-nova06:11
*** moshele has joined #openstack-nova06:12
*** ccamacho has quit IRC06:24
*** Dinesh_Bhor has quit IRC06:25
*** Dinesh_Bhor has joined #openstack-nova06:28
*** spsurya has joined #openstack-nova06:39
*** sapd1 has quit IRC06:39
*** Luzi has joined #openstack-nova06:42
*** takashin has left #openstack-nova07:01
*** sapd1 has joined #openstack-nova07:04
openstackgerritOpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata  https://review.openstack.org/61475707:10
openstackgerritRadoslav Gerganov proposed openstack/nova master: VMware: implement trigger crash dump  https://review.openstack.org/61873607:18
*** pcaruana has joined #openstack-nova07:20
*** moshele has quit IRC07:26
*** pcaruana has quit IRC07:34
*** psachin has joined #openstack-nova07:37
*** pcaruana has joined #openstack-nova07:40
*** tssurya has joined #openstack-nova07:46
*** ccamacho has joined #openstack-nova07:54
*** bhagyashris__ has joined #openstack-nova07:54
*** sahid has joined #openstack-nova07:59
*** adrianc has joined #openstack-nova08:04
*** ralonsoh has joined #openstack-nova08:24
openstackgerritSurya Seetharaman proposed openstack/nova master: Add os_compute_api:servers:create:cell_down policy  https://review.openstack.org/61478308:29
*** sapd1 has quit IRC08:37
*** sapd1 has joined #openstack-nova08:37
*** rodolof has joined #openstack-nova08:44
*** do3meli has joined #openstack-nova08:46
*** jpena|off is now known as jpena08:53
*** priteau has joined #openstack-nova08:54
*** lbragstad has joined #openstack-nova08:54
*** jpena has left #openstack-nova08:55
*** dpawlik has joined #openstack-nova08:56
*** ShilpaSD has joined #openstack-nova09:08
*** s10 has joined #openstack-nova09:17
*** k_mouza has joined #openstack-nova09:19
*** k_mouza has quit IRC09:20
*** k_mouza has joined #openstack-nova09:20
*** alexchadin has joined #openstack-nova09:22
*** sapd1 has quit IRC09:22
*** links has quit IRC09:26
*** pooja_jadhav has joined #openstack-nova09:27
*** cdent has joined #openstack-nova09:31
*** derekh has joined #openstack-nova09:37
*** bhagyashris__ has quit IRC09:48
*** slaweq__ has quit IRC09:48
*** sapd1 has joined #openstack-nova09:49
*** nehaalhat_ has joined #openstack-nova09:50
*** sapd1 has quit IRC09:54
*** dtantsur|afk is now known as dtantsur09:58
*** rodolof has quit IRC10:02
*** adrianc has quit IRC10:05
*** adrianc has joined #openstack-nova10:10
*** rodolof has joined #openstack-nova10:16
*** moshele has joined #openstack-nova10:23
*** sapd1 has joined #openstack-nova10:26
*** k_mouza has quit IRC10:27
*** k_mouza has joined #openstack-nova10:28
sean-k-mooneyo/10:30
*** sapd1 has quit IRC10:31
*** k_mouza has quit IRC10:33
openstackgerritChris Dent proposed openstack/nova master: Use external placement in functional tests  https://review.openstack.org/61794110:33
openstackgerritChris Dent proposed openstack/nova master: WIP: Delete the placement code  https://review.openstack.org/61821510:33
*** slaweq__ has joined #openstack-nova10:40
*** k_mouza has joined #openstack-nova10:51
*** Dinesh_Bhor has quit IRC10:56
*** Dinesh_Bhor has joined #openstack-nova10:58
*** Dinesh_Bhor has quit IRC10:59
*** sahid has quit IRC11:00
*** sahid has joined #openstack-nova11:00
*** tbachman_ has joined #openstack-nova11:02
*** tbachman has quit IRC11:03
*** tbachman_ is now known as tbachman11:03
*** udesale has quit IRC11:11
*** alexchadin has quit IRC11:11
*** adrianc has quit IRC11:12
*** diga has quit IRC11:16
*** pooja_jadhav has quit IRC11:22
*** adrianc has joined #openstack-nova11:23
*** adrianc_ has joined #openstack-nova11:25
*** k_mouza has quit IRC11:26
*** adrianc has quit IRC11:29
*** k_mouza has joined #openstack-nova11:29
openstackgerritMerged openstack/nova master: Nix refs to ResourceProvider obj from libvirt UT  https://review.openstack.org/61878611:29
cdenthuzzah11:32
sean-k-mooneywas that the last usage of placement objects in the nova unit tests?11:33
cdentsean-k-mooney: not quite11:34
*** k_mouza has quit IRC11:34
cdentwell, strictly speaking, yes11:34
cdentit was the last usage of the objects11:34
*** k_mouza has joined #openstack-nova11:34
cdentbut there are some other tests which are testing things that want to use the placement db11:34
sean-k-mooneybut not the last usage of placement11:34
sean-k-mooneyah ok11:34
cdentin my wip to remove the placement code I've had to remove some other tests/code11:34
cdentwhen I get mat's suggestions on https://review.openstack.org/#/c/600161/ done, I'm going to go back to https://review.openstack.org/#/c/618215/ to fix up the merge conflict and tidy up whatever else is broken11:35
prometheanfirehttp://logs.openstack.org/34/618834/1/check/cross-nova-py27/c3128e0/testr_results.html.gz new oslo.service causes failures11:40
prometheanfiresince people seem here :D11:40
openstackgerritMartin Midolesov proposed openstack/nova master: Allow driver to specify switch&port for faster lookup  https://review.openstack.org/61769511:40
*** adrianc_ has quit IRC11:43
cdentprometheanfire: oh joy. I seem to recall efried and melwitt being vaguely aware of that11:45
sean-k-mooneywe were aware of that because we backported a fix downstream that we should not have11:46
sean-k-mooneyprometheanfire: what branch is this happening on11:46
sean-k-mooneyprometheanfire: ya rocky nova is not compatible with that version of oslo.service11:47
sean-k-mooneywe should not be increasing the upper constraint on oslo.service in stable rocky anyway11:47
*** tbachman has quit IRC11:48
sean-k-mooneyoh it got backported into 1.31.6 ...11:48
*** pooja_jadhav has joined #openstack-nova11:49
*** pooja_jadhav has quit IRC11:50
*** alexchadin has joined #openstack-nova11:50
prometheanfireyep :|11:54
sean-k-mooneyhttps://review.openstack.org/#/c/616505/3 is the issue11:54
sean-k-mooneyor rather https://review.openstack.org/#/q/I62e9f1a7cde8846be368fbec58b8e0825ce0207911:55
sean-k-mooneywell i guess its the same11:55
*** alexchadin has quit IRC11:56
*** s10 has quit IRC11:57
*** dtantsur is now known as dtantsur|brb11:59
*** do3meli has left #openstack-nova12:02
*** s10 has joined #openstack-nova12:04
*** xek_ has joined #openstack-nova12:06
*** xek_ is now known as xek12:06
*** Luzi has quit IRC12:07
*** slaweq__ has quit IRC12:07
*** adrianc_ has joined #openstack-nova12:09
*** janki has quit IRC12:10
xekdid anyone try to use nova notifications lately?12:10
xekI have an issue that when an instance is deleted, I only get an update notification with task state "deleting", but no instance.delete.end notification...12:10
xekI tried both versioned and unversioned notifications12:11
jaosoriorgibi: are you around?12:15
jaosoriorgibi: I remember you had around some document where you had the new supported versioned notifications.12:16
xekjaosorior, I think it's this one: https://docs.openstack.org/nova/latest/reference/notifications.html12:16
jaosoriorxek: instance.delete.end is there in the list12:17
jaosoriorso, if the notification is not being emitted, it's a bug.12:17
sean-k-mooneyxek: going forward i believe we are dropping support for unversioned notifications. i don't believe we intend to remove them but we had discussed freezing the code and not fixing new buts12:18
sean-k-mooney*bugs12:18
jaosoriorsean-k-mooney: thanks, we're aware of the unversioned notifications deprecation. Hence why we started testing the versioned ones. instance.delete.end should be in the versioned notifications too (according to the list in the doc xek pointed at). So... if it's not being emitted, I think it's a bug. We'll file it.12:19
jaosoriorunless that doc is outdated :/12:19
sean-k-mooneyjaosorior: if the versioned notification is not being emitted it's a bug yes12:20
sean-k-mooneythat said the notification may have been lost in rabbitmq if you don't have persistence12:20
sean-k-mooneyi assume it's repeatable12:20
jaosoriorsean-k-mooney: can you elaborate on that?12:21
*** moshele has quit IRC12:21
sean-k-mooneyjaosorior: you can configure rabbitmq to persist each message to disk until it's dequeued or it can keep them just in memory12:21
*** Luzi has joined #openstack-nova12:22
sean-k-mooneyusing durable queues is a significant performance hit but if rabbit is restarted messages are not lost12:22
sean-k-mooneyi do not believe the queues used for notifications are durable12:22
*** ratailor has quit IRC12:23
jaosoriorsean-k-mooney: so, a bit of context: xek is working on getting functional tests for novajoin (a vendordata plugin for nova). These functional tests are being set up on top of devstack.... do you happen to know if this would be an issue that would hit a small devstack deployment?12:23
xeksean-k-mooney, we saw some heartbeat misses and thus connectivity issues in the logs, but we didn't restart rabbitmq12:23
jaosoriorsean-k-mooney: we ultimately use it in TripleO, however, we haven't actually had issues with the notifications (yet). So I guess the rabbitmq settings are OK :D12:23
xeksean-k-mooney, and it's pretty consistent12:23
sean-k-mooneyno idea unfortunately, i don't know what the default is12:24
xekthe default is non-persistent12:24
sean-k-mooneyoslo.messaging and rabbit by default do not guarantee message delivery as far as i am aware12:24
sean-k-mooneyif there was a temporary network issue the notification could have been lost12:24
sean-k-mooneymy understanding is that oslo.messaging will not buffer the notification and retry sending it to rabbit if the first send fails, unless you code that on top yourself12:26
*** slaweq__ has joined #openstack-nova12:27
sean-k-mooneyi could be wrong about that however as this is a part of nova i haven't really dealt with12:27
*** priteau has quit IRC12:28
jaosoriorsean-k-mooney: got it, I guess the next step is to investigate how rabbitmq is being configured in devstack, and how to modify that configuration if necessary12:28
jaosoriorsean-k-mooney: thanks for the guidance!12:28
*** moshele has joined #openstack-nova12:28
openstackgerritChris Dent proposed openstack/nova master: Use external placement in functional tests  https://review.openstack.org/61794112:35
openstackgerritChris Dent proposed openstack/nova master: WIP: Delete the placement code  https://review.openstack.org/61821512:35
openstackgerritChris Dent proposed openstack/nova master: WIP: Delete the placement code  https://review.openstack.org/61821512:36
sean-k-mooneyjaosorior: xek for what it's worth it looks like you can enable durable queues in the nova.conf and it defaults to false12:36
sean-k-mooneyhttps://docs.openstack.org/oslo.messaging/ocata/opts.html#oslo_messaging_rabbit.amqp_durable_queues12:36
sean-k-mooneyi should probably check the latest doc actually to see if that is still a thing12:37
janguttersean-k-mooney: would you perhaps be able to donate some of your copious free time towards helping with the naming of things?12:38
jaosoriorsean-k-mooney: thanks! we'll check that out.12:39
*** moshele has quit IRC12:39
sean-k-mooneyjangutter: hah i could stop procrastinating by looking at messaging stuff and look at naming instead12:39
sean-k-mooneyjaosorior: https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_rabbit.amqp_durable_queues its there on latest too12:40
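A minimal sketch of the setting discussed above, assuming a nova.conf that carries the `[oslo_messaging_rabbit]` section named in the linked docs. The sample file contents and the helper name `durable_queues_enabled` are illustrative and not taken from devstack; the code only parses the option with Python's standard configparser and does not talk to RabbitMQ.

```python
import configparser

# Hypothetical nova.conf fragment; only the section and option names
# come from the oslo.messaging docs linked in the log.
SAMPLE_NOVA_CONF = """
[oslo_messaging_rabbit]
# Documented default is false.
amqp_durable_queues = true
"""


def durable_queues_enabled(conf_text):
    """Return the amqp_durable_queues flag, falling back to false if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "amqp_durable_queues", fallback=False
    )


print(durable_queues_enabled(SAMPLE_NOVA_CONF))
print(durable_queues_enabled("[DEFAULT]\n"))
```

Checking the effective value this way in a devstack job would show whether the deployment is running with the non-persistent default xek mentions.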
janguttersean-k-mooney: bring your bikeshed, it can be any colour as long as it's red.12:40
sean-k-mooneyhaha12:40
sean-k-mooneynot pink12:41
janguttersean-k-mooney: I think we're close to peak confusion with what the heck "network offloads" mean.12:41
sean-k-mooneyjangutter: is this related to the os-vif spec12:41
sean-k-mooneyah yes12:41
*** psachin has quit IRC12:41
janguttersean-k-mooney: yaas.... anyone wanting to frighten their young'uns: https://review.openstack.org/#/c/607610/12:42
*** slaweq__ is now known as slaweq12:42
sean-k-mooneyjay highlighted it to me yesterday12:42
janguttersean-k-mooney: googling NIC offloads leads me to: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-nic-offloads12:43
sean-k-mooneyyes when i think nic offload i think of the things ethtool says the nic can offload12:43
sean-k-mooneynot hardware offloaded ovs12:44
janguttersean-k-mooney: that makes sense from that point of view.12:44
janguttersean-k-mooney: so what would a nice, intuitive, unambiguous way be to describe a packet processing pipeline that is run on a coprocessor?12:45
sean-k-mooneyit can also include things like ipsec offload too which does not show up in ethtool but is in the same vein as vxlan tunnel encap/decap offload12:45
janguttersean-k-mooney: it's a metric shedload of confusion.12:45
sean-k-mooneywell you could call it vswitch offload/acceleration12:46
gibixek, jaosorior: please file a bug. I will go and try to reproduce it12:47
janguttersean-k-mooney: In my mind, I have broadly two classifications: things that happen "at the endpoint" (like TSO/ rx/tx checksum offload) and things that happen "before it hits the other endpoint".12:47
gibixek, jaosorior: I'm a bit on and off today12:47
janguttersean-k-mooney: yeah 'vswitch' is acceptable, I guess, but you don't necessarily need a vswitch. In theory, things like IPSEC/VXLAN would fit in my "second case" too.12:48
sean-k-mooneyjangutter: there is the offload phase that modifies the packet and the classification/switching phases that decide what modification to apply and where to send it12:48
janguttersean-k-mooney: if IPSEC is regarded as a tunnel...12:48
*** panda|rover is now known as panda|rover|lch12:48
janguttersean-k-mooney: yeah, "more associated with the guest" and "more associated with the host".12:49
janguttersean-k-mooney: it would be awesome if we could classify them as "guest offloads" and "host offloads".12:49
*** brinzhang has quit IRC12:50
sean-k-mooneyjangutter: i would be ok with that split12:51
janguttersean-k-mooney: only problem is that it's a bit ambiguous.... since obviously endpoints on the host can also make use of "guest offloads".12:51
*** rodolof has quit IRC12:52
sean-k-mooneywe could call them protocol offloads and switching offloads instead12:53
janguttersean-k-mooney: hence why I thought "endpoint offloads" and "datapath offloads" are also good bikesheds....12:53
janguttersean-k-mooney: "protocol offload" and VXLAN/IPSEC will cause a bit of confusion, but I'm fine with "switching/vswitch/eswitch offloads"12:54
sean-k-mooneythen network offloads and switching offloads?12:55
sean-k-mooneyi agree the tunnels are kind of both12:55
sean-k-mooneythey are weird12:55
janguttersean-k-mooney: tell me about it.12:55
janguttersean-k-mooney: I almost want to call 'em "socket offloads".12:57
sean-k-mooneyso while we some of them dont work a l4 however12:57
sean-k-mooneyignore the so while we12:58
sean-k-mooneyi need to fully clear my buffer when i change what i was going to type12:58
sean-k-mooneyjangutter: since we are talking about naming how do you feel about my naming comments in https://review.openstack.org/#/c/572081/9/os_vif/objects/host_info.py@20512:59
sean-k-mooneye.g. the fact that the representor netdevs are really part of the control plane not data plane since they do not transmit packets13:00
*** dtantsur|brb is now known as dtantsur13:00
sean-k-mooneythey are kind of like the vhost-user sockets in that respect13:00
janguttersean-k-mooney: representors absolutely transmit packets.13:00
sean-k-mooneythe difference being if you implemented them correctly and ran tcpdump the kernel would actually allow you to sniff the packets13:01
sean-k-mooneyjangutter: the VF does but the netdev you add to ovs should not13:01
janguttersean-k-mooney: mainly, the first packet of a new flow, i.e. something that doesn't have a rule.13:01
sean-k-mooneyright the learning packet but not the rest13:01
sean-k-mooneyi forgot they are used for the exception path13:02
janguttersean-k-mooney: yep, and it's vital to things like tunnels...13:02
sean-k-mooneynetworking is hard.13:02
sean-k-mooneyi need to respond to a downstream bug but ill be back in a while13:02
sean-k-mooneydid this help?13:02
janguttersean-k-mooney: thanks very much, this was very therapeutic!13:03
sean-k-mooneyill try and review the spec today or at least this week, i have it open on my monitor in any case13:03
janguttersean-k-mooney: thanks, will be respinning to try to clarify that some "offloads" are more "off" than others.13:04
*** s10 has quit IRC13:05
*** priteau has joined #openstack-nova13:11
*** moshele has joined #openstack-nova13:11
openstackgerritdo3meli proposed openstack/nova master: Allow VMs to use unaddressed ports  https://review.openstack.org/53324913:12
*** k_mouza has quit IRC13:19
*** k_mouza has joined #openstack-nova13:19
*** moshele has quit IRC13:21
*** s10 has joined #openstack-nova13:21
*** diga has joined #openstack-nova13:22
*** k_mouza has quit IRC13:23
*** k_mouza has joined #openstack-nova13:26
*** k_mouza_ has joined #openstack-nova13:30
*** k_mouza has quit IRC13:32
dtantsurhi folks! is there a high-level description of Placement API? /cc cdent13:33
bauzasdtantsur: https://developer.openstack.org/api-ref/placement/ ?13:34
dtantsurbauzas: this is low-level, it says how to use specific endpoints. I'm more interested in high-level flow.13:35
dtantsurI want to make ironic optionally report to/consume placement13:36
dtantsurI need to understand 1. what reporting actually means, 2. how a node can be reserved via Placement.13:36
bauzasdtantsur: we also have https://docs.openstack.org/nova/latest/contributor/placement.html13:37
*** panda|rover|lch is now known as panda|rover13:37
belmoreiradtantsur are you tracking this work somewhere?13:37
dtantsurbelmoreira: mostly in my head for now.. the API design without placement bits is https://review.openstack.org/61795313:38
dtantsurlet me try a specific question13:39
dtantsurgiven that any Ironic node is represented by exactly one instance of a custom resource class13:40
dtantsurto reserve a node I need: 1. GET /resource_providers?resources=CUSTOM_BAREMETAL:1&required=list,of,traits13:40
dtantsur2. POST /allocations with suitable UUID?13:40
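The request dtantsur describes in step 1 could be assembled as below. This is only a sketch of building the query string from the message above: the base URL and the trait names are hypothetical, and whether this is the right flow for reserving a node is the open question in the log, not something the code settles.

```python
from urllib.parse import urlencode


def resource_provider_query(base_url, resource_class, traits):
    """Build the step-1 URL: one unit of a custom resource class plus required traits."""
    params = {
        "resources": f"{resource_class}:1",
        "required": ",".join(traits),
    }
    # Keep ':' and ',' literal so the URL reads like the one in the log.
    return f"{base_url}/resource_providers?{urlencode(params, safe=':,')}"


# Hypothetical endpoint and traits, for illustration only.
url = resource_provider_query(
    "http://placement.example",
    "CUSTOM_BAREMETAL",
    ["CUSTOM_GOLD", "HW_CPU_X86_VMX"],
)
print(url)
```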
*** mriedem has joined #openstack-nova13:41
*** mlavalle has joined #openstack-nova13:43
*** lpetrut has joined #openstack-nova13:44
*** tbachman has joined #openstack-nova13:45
*** whoami-rajat has quit IRC14:00
*** artom has quit IRC14:02
openstackgerritMatt Riedemann proposed openstack/nova master: Use long_rpc_timeout in select_destinations RPC call  https://review.openstack.org/60773514:04
mriedemdansmith: a couple of questions in https://review.openstack.org/#/c/617898/14:11
*** mmethot has quit IRC14:13
*** mmethot has joined #openstack-nova14:17
*** k_mouza has joined #openstack-nova14:20
dansmithjaypipes: you had feelings on this in the past, if you want to chime in ^14:20
*** k_mouza_ has quit IRC14:23
*** awaugama has joined #openstack-nova14:25
cdentdtantsur: join us in #openstack-placement14:27
*** jdillaman has joined #openstack-nova14:30
*** liuyulong has joined #openstack-nova14:30
mriedemdansmith: replied. i'd be +2 on that now unless you are going to update the little CRUD APIs thing14:37
*** s10 has quit IRC14:37
mriedemi could see value in trying to document your replies about alternatives to filtering on cell etc, but that might be more work than it's worth right now14:38
dansmithmriedem: yep, I'll fix the crud wording first14:39
efriedprometheanfire, sean-k-mooney: If we're backporting that oslo.service change, we need to backport the nova fixage to those mocks. This was a pretty big PITA when we did it on master, requiring a weird lockstep of patches in nova and requirements. Let me find it quick...14:40
efriedprometheanfire, sean-k-mooney: Okay, so I think it was, in this order:14:42
efriedRemove the mocks from nova: https://review.openstack.org/#/c/616697/14:42
efriedUpdate the requirements: https://review.openstack.org/#/c/616371/14:42
efriedUpdate nova to use the mocks and require the new release: https://review.openstack.org/#/c/615724/14:42
sean-k-mooneyefried: yes we would or we could not backport the oslo.service change at all14:43
efriedsean-k-mooney: Or we could just backport "remove the mocks". The only thing it affects is wallclock time for tox. The mocks are just avoiding real sleeps.14:44
sean-k-mooneyi would personally prefer to revert the oslo change from the 1.31.x branch but that said it only breaks the unit tests and does pass functional and tempest tests14:44
sean-k-mooneyefried: ya that is an option14:45
efriedyes, it's UT only. And it's because nova is mocking private things from oslo.service, and those private things are re/moved with that fix.14:45
efried(and that was my bad, mocking the privates)14:45
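A small sketch of why mocking a library's private members is fragile, as efried describes. The two `Loop*` classes are hypothetical stand-ins for an older and a newer release of some library, not oslo.service code; the point is only that `mock.patch.object` refuses to patch an attribute that no longer exists, so the unit test breaks even though the public behaviour is unchanged.

```python
from unittest import mock


class LoopV1:
    """Stand-in for the older release: has a private wait helper."""

    def _wait(self, seconds):
        raise RuntimeError("would really sleep")

    def run(self):
        self._wait(5)
        return "done"


class LoopV2:
    """Stand-in for the newer release: the private helper was removed."""

    def run(self):
        return "done"


def run_with_private_mock(loop_cls):
    # Test code that reaches into the private helper to avoid a real sleep.
    with mock.patch.object(loop_cls, "_wait"):
        return loop_cls().run()


print(run_with_private_mock(LoopV1))
try:
    run_with_private_mock(LoopV2)
except AttributeError as exc:
    print("patch failed:", exc)
```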
openstackgerritDan Smith proposed openstack/nova master: Add CellsV2 FAQ about API design decisions  https://review.openstack.org/61789814:45
sean-k-mooneyefried: yes but they were removed in a release of oslo.service that was above the max allowed by the upper-constraints for that release14:46
sean-k-mooneyefried: redhat has backported this internally and it broke everything so i know it will make lyarwood happy if we fixed nova upstream to work with that backport14:47
sean-k-mooneyefried: i guess https://review.openstack.org/#/c/616697/ is relatively small14:48
efriedsean-k-mooney: I'm going to take the morning off. If you and/or dhellmann and/or prometheanfire want to fix it up, cool, or bug me about it later and I can propose whatever.14:48
efriedsean-k-mooney: Yes, it's trivial.14:48
sean-k-mooneyok i can propose the backport for https://review.openstack.org/#/c/616697/214:49
sean-k-mooneywe cant bump to 1.33 on stable however14:50
sean-k-mooneyso we will need to get them to backport the sleep fixture.14:51
openstackgerritEric Fried proposed openstack/nova master: Remove v1 check in Cinder client version lookup  https://review.openstack.org/61792714:52
*** munimeha1 has joined #openstack-nova14:53
openstackgerritEric Fried proposed openstack/nova master: Consider root id is None in the database case  https://review.openstack.org/61330514:54
*** sahid has left #openstack-nova14:59
efriedsean-k-mooney: It looks like that's proposed anyway: https://review.openstack.org/#/c/617989/14:59
*** jcosmao has joined #openstack-nova14:59
sean-k-mooneyefried: yes chatting to them on the oslo channel15:01
*** efried is now known as efried_pto15:02
sean-k-mooneyill propose the backport for the 2 patches you suggested15:02
efried_ptothanks sean-k-mooney15:02
sean-k-mooneyactually hberaud is going to do it but ill keep an eye on it. enjoy your morning off15:02
*** udesale has joined #openstack-nova15:05
*** tbachman has quit IRC15:08
*** tidwellr_ has joined #openstack-nova15:09
*** cfriesen has joined #openstack-nova15:12
jaypipesdansmith: done15:13
jaypipesdansmith: thx for the heads up on that.15:13
*** edmondsw has joined #openstack-nova15:14
*** Luzi has quit IRC15:14
dansmithjaypipes: thanks15:19
openstackgerritChris Dent proposed openstack/nova master: WIP: Delete the placement code  https://review.openstack.org/61821515:19
mriedemif someone is looking to update the resurrected ops guide docs about cells https://bugs.launchpad.net/openstack-manuals/+bug/180425315:21
openstackLaunchpad bug 1804253 in openstack-manuals "Capacity planning and scaling in Operations Guide - cells information is out of date" [Undecided,New]15:21
mriedem^ is still all cells v115:21
*** BjoernT has joined #openstack-nova15:25
*** k_mouza_ has joined #openstack-nova15:25
openstackgerritHervé Beraud proposed openstack/nova stable/rocky: remove mocks of oslo.service private members  https://review.openstack.org/61901915:26
BjoernTHello, Is someone here aware of the implementation of ComputeManager._run_image_cache_manag as we run in to performance issues on a NFS mounted /var/lib/nova/instances directory and now had to increase rpc response timeout?15:26
*** k_mouza has quit IRC15:29
*** tbachman has joined #openstack-nova15:31
sean-k-mooneythat ^ sounds like an mdbooth kind of question but he does not seem to be about currently15:32
*** tbachman has quit IRC15:37
*** tbachman has joined #openstack-nova15:37
openstackgerritHervé Beraud proposed openstack/nova stable/rocky: Use SleepFixture instead of mocking _ThreadingEvent.wait  https://review.openstack.org/61902215:38
*** k_mouza_ has quit IRC15:39
*** k_mouza has joined #openstack-nova15:40
*** lpetrut has quit IRC15:40
*** dave-mccowan has joined #openstack-nova15:41
openstackgerritJack Ding proposed openstack/nova master: Add I/O Semaphore to limit concurrent disk ops  https://review.openstack.org/60918015:42
*** k_mouza has quit IRC15:44
*** k_mouza has joined #openstack-nova15:45
*** maciejjozefczyk has quit IRC15:46
*** maciejjozefczyk has joined #openstack-nova15:47
prometheanfireefried_pto: sean-k-mooney I'd say we are fine for now, nova may want to add an exclusion to its reqs.txt, or not15:47
*** liuyulong has quit IRC15:47
*** maciejjozefczyk has quit IRC15:47
prometheanfirethe update is being held back atm by reqs cross gating15:48
*** devep has joined #openstack-nova15:49
prometheanfirequestion is, are old versions of nova going to work with the oslo.service change (18.0.2 and the like), it sounds like not, which means packagers should be made aware15:49
sean-k-mooneyprometheanfire: https://review.openstack.org/#/c/619019/1 and https://review.openstack.org/#/c/619022/1 will fix the nova compatibility15:49
sean-k-mooneyprometheanfire: old versions of nova would work but the unites would not which may break packager build systems15:50
prometheanfireunites / unit tests?15:52
*** tssurya has quit IRC15:55
openstackgerritMerged openstack/nova master: Add description of custom resource classes  https://review.openstack.org/61672115:55
openstackgerritMerged openstack/nova master: Add CellsV2 FAQ about API design decisions  https://review.openstack.org/61789815:55
*** itlinux has quit IRC15:57
*** Sundar has joined #openstack-nova16:06
Sundarjaypipes, dansmith, sean-k-mooney, cdent: Thanks for discussing the Nova-Cyborg spec in IRC y'day. I caught up with that. Will remove the Cyborg API signatures. and16:09
*** bhdn has joined #openstack-nova16:10
SundarI still have some questions on what jaypipes expects. The os-acc is not going to handle devices by itself. It needs access to Cyborg db and drivers, which means the majority of work will happen in Cyborg.16:10
*** udesale has quit IRC16:11
*** k_mouza has quit IRC16:11
Sundarsean-k-mooney: Re. request groups in device profiles, it is still not clear to me how we would handle co-location without them, i.e., we want 2 accelerators from 2 different RPs in the same device.16:13
*** hamzy has quit IRC16:14
mriedemos-acc is going to have direct db access to cyborg?16:17
*** k_mouza has joined #openstack-nova16:18
*** artom has joined #openstack-nova16:19
*** artom has joined #openstack-nova16:19
*** ccamacho has quit IRC16:20
Sundarmriedem: No. os-acc needs to call Cyborg REST APIs, and those calls do the bulk of the work.16:23
jaypipesSundar: currently on a call with sean-k-mooney and jangutter about os-vif. give me a little while to respond.16:23
mriedemSundar: ok, if os-acc were like os-brick and os-vif, i would expect it to deal with the physical devices on the host16:24
mriedemand something like python-cyborgclient would be used for dealing with the cyborg rest API16:24
mriedemor the openstacksdk16:24
mriedemat least that's the model nova has for dealing with volumes and ports16:24
Sundarjaypipes, Sure, NP16:24
Sundarmriedem: I understand. os-acc is not an exact clone of os-vif or os-brick. For example, to bind an ARQ, a device may need to be configured or re-programmed. That requires a Cyborg driver which knows the details of that device.16:26
Sundarmriedem: That is more like what a Neutron mechanism driver does.16:27
mriedemhmm,16:27
mriedemos-brick and os-vif definitely have plugins/drivers that do things based on the 'type' of device16:27
mriedembut i'm just sitting in the peanut gallery here so ignore me16:28
SundarIf we were to have separate drivers for os-acc and Cyborg, it would be cumbersome -- for example, tasks needed for device discovery/initialization (handled by Cyborg drivers) and tasks required for ARQ binding (initiated via os-acc) will have many commonalities. For instance, both may need ways to reset the device (or some part of it).16:33
SundarApart from having two different driver installs/configures etc.16:34
*** devep_ has joined #openstack-nova16:34
*** devep has quit IRC16:36
SundarThe os-vif plugins, from what I have seen, are handling Linux bridges, OVS, etc., not hardware per se.16:38
mriedemi believe cinder (the service) uses os-brick16:41
mriedemto avoid doing the same things in both places16:41
mriedemjungleboyj: ^?16:41
* jungleboyj tries to catch up.16:42
*** itlinux has joined #openstack-nova16:43
dansmithmriedem: Sundar I think it's entirely legit to think that not all device programming can be contained within os-acc16:43
dansmithit's a lot more complicated of a thing than configuring an initiator or a bridge16:44
Sundardansmith: ^ +116:44
jungleboyjmriedem:  Your understanding is correct and we have different drivers in there depending on the type of device.16:44
mriedemok, again, peanut gallery16:45
jungleboyjBoth Cinder and Nova use os-brick so that we aren't duplicating code.16:45
sean-k-mooneyo/16:45
jungleboyjThat does the work locally on the compute node and then anything that required work from the volume driver is done through the Cinder-API.16:46
dansmithif we use brick as the analogy,16:47
dansmithit would be like putting all the stuff that talks to the backend volume providers into os-brick16:47
sean-k-mooneydansmith: the programming of the device is not really done by cyborg either, however16:49
sean-k-mooneyit will be delegating the fpga programming in the intel case to opae16:49
Sundarsean-k-mooney: Not quite true.16:49
dansmithsean-k-mooney: sure, just like cinder-volume doesn't actually sort the bits on the disk, but asks whatever api it has for the backend to do it16:49
SundarCyborg will indeed call a Cyborg driver, which can call into device/vendor-specific drivers, such as OPAE or i915 (for GPUs), etc.16:50
sean-k-mooneyso i'm confused: are we all happy with the statement that nova's only interaction point with cyborg should be via os-acc, and that os-acc should only interact with cyborg via its rest api?16:53
dansmithI guess I'm not sure what is confusing.. we interact with cinder via the cinderclient/brick16:55
dansmithI think we're hoping that os-acc can serve both purposes, right?16:55
mriedemthat's what i'm hearing be described16:55
mriedemos-acc is both rest api client and low-level host device thing16:55
sean-k-mooneyyes and that os-acc can hold the definitions of any data structure that nova and cyborg have to share16:56
sean-k-mooneymriedem: yes to the former, no to the latter16:56
mriedemwhat?16:56
dansmithhuh?16:56
dansmithjinx16:56
sean-k-mooneyos-acc would only be a rest client16:57
mriedemi got jinxed by a couple of 7 year old girls the other night, couldn't talk for 10 minutes, it was....not hard16:57
sean-k-mooneythe low-level programming is handled by the cyborg drivers and the tools they invoke16:57
Sundarsean-k-mooney: True. But I think what is being said is that Nova views os-acc as performing the low-level tasks, i.e, it calls os-acc and leaves the details to it16:58
jaypipesI really don't see why os-acc can't start off being a plug-the-device-into-the-VM library.16:58
dansmithsean-k-mooney: I don't know why you're so intent on declaring that os-acc is only one thing or another16:58
dansmithjaypipes: that's not a separate action16:58
jaypipesdansmith: what do you mean?16:58
dansmithjaypipes: plugging involves writing pci attachment into the virt xml for boot, right?16:58
dansmithjaypipes: if there's any massaging of the device needed, like writing to sys to discover a new pci endpoint or something, then that seems like it could/should be in os-acc16:59
sean-k-mooneyjaypipes: would you be happy saying os-acc could start as being a program-the-device-for-the-VM lib?16:59
mriedemin other news, we broke postgresql http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/a52bcf9/logs/screen-n-api.txt.gz?level=ERROR16:59
jaypipesdansmith: yes, I agree with you. I just don't believe os-acc should be a REST API client to cyborg.17:00
dansmithjaypipes: so we need a cyborgclient?17:00
dansmithjaypipes: I'm not sure what os-acc would be doing if it's not talking to cyborg17:00
sean-k-mooneydansmith: i would like to clearly scope what os-acc is so that i can understand what components perform what actions17:02
sean-k-mooneySundar: the reason i care about whether os-acc is just interacting with the rest api is because if it's not, e.g. it uses the RPC bus or cyborg db directly, then it has deployment impact17:03
sean-k-mooneye.g. the credentials and connection info17:03
dansmithsean-k-mooney: nobody is arguing for that are they?17:04
sean-k-mooneyif it interacts with the devices directly it has packaging impacts17:04
Sundarsean-k-mooney: I agree. os-acc will indeed talk to Cyborg API, not the agents or drivers directly.17:04
slaweqmriedem: bug reported: https://bugs.launchpad.net/nova/+bug/180427117:04
openstackLaunchpad bug 1804271 in OpenStack Compute (nova) "nova-api is broken in postgresql jobs" [Undecided,New]17:04
mriedemslaweq: thanks again17:04
dansmithsean-k-mooney: crossing over to the db/mq between projects would be a major issue I think17:04
jaypipes(sorry, folks, I'm in another meeting...)17:04
slaweqyw :)17:05
sean-k-mooneydansmith: in past versions of the spec it was allowed to17:05
SundarTaking a step back: pretty much everything that needs to be done either requires Cyborg db access or device poking via Cyborg drivers.17:05
dansmithSundar: you understand you can't poke the cyborg db from os-acc or nova though right?17:06
*** fanzhang has quit IRC17:06
Sundardansmith: Yea, that's why I am saying os-acc needs to call Cyborg API17:06
dansmithyeah17:07
dansmithI definitely never saw where in the spec it said that os-acc would talk directly to the cyborg db17:07
sean-k-mooneySundar: and similarly, do you want os-acc to be able to interact with the device directly? i assume no17:07
SundarYup, no direct access17:07
mriedemi think the confusion was because of this question / statement earlier, which prompted me to ask about direct db access:17:08
mriedem(10:10:21 AM) Sundar: I still have some questions on what jaypipes expects. The os-acc is not going to handle devices by itself. It needs access to Cyborg db and drivers, which means the majority of work will happen in Cyborg.17:08
Sundarsean-k-mooney: Even in past specs, os-acc wouldn't talk to Cyborg drivers -- it was talking to Cyborg agent on the same compute node -- and that was all prior to the Stein PTG17:08
dansmithright, which is saying "it can't, because .. access to db"17:08
dansmithright?17:08
mriedemyes i realize it meant, "access to the db and drivers via the cyborg rest api"17:09
sean-k-mooneySundar: that was going to be my next question17:09
dansmithwell, I think it means "it has to ask cyborg via api to do that, because only cyborg has access to the drivers and db" but.. same difference17:09
sean-k-mooneyyes at the ptg we said os-acc would not talk to the agent directly either17:09
sean-k-mooneyso does os-acc talk to anything other than the cyborg api?17:09
SundarIf you look at the os-acc API notes in the Nova spec, I have even identified which Cyborg API will be called in each scenario17:10
Sundarsean-k-mooney: No17:11
sean-k-mooneyok so the statement i made earlier that os-acc will only talk to the rest-api and is not the low-level device lib was correct17:11
mriedemin that case, why not just use openstacksdk?17:12
mriedemi realize this is bike shedding a bit,17:12
mriedembut os-acc makes me think of os-vif and os-brick which definitely do not call REST APIs in cinder or neutron,17:12
sean-k-mooneymriedem: there is no sdk support yet but that would also be a valid approach17:12
Sundarmriedem: That's a good point. There is still a need for os-acc.17:12
mriedemand if there is no python-cyborgclient, like there is no python-placementclient, we should just use openstacksdk17:12
SundarFor example, os-acc associates device RPs with  individual accelerator requests from the device profiles, because Nova/Placement don't do that17:13
dansmithmriedem: well, I think the linkage to os-vif was around defining pluggable data types, so that things other than PCI would be doable17:14
mriedem"For example, os-acc associates device RPs with  individual accelerator requests from the device profiles, because Nova/Placement don't do that",17:14
mriedemmeaning it's going to be doing things like neutron agent for bandwidth provider inventory/allocations?17:14
sean-k-mooneymriedem: meaning that when nova gets the allocation candidate from placement in the scheduler/conductor, os-acc will parse it, try to figure out which RP maps to each device profile, and tell cyborg17:15
gibimriedem: as a side note, I was pushed to the direction to do the mapping in a more generic way, between RequestGroup ovos in the RequestSpec and RPs in the allocation17:15
gibimriedem: which means the core of that code will be generic enough for cyborg use as well17:16
*** alexchadin has joined #openstack-nova17:16
SundarFolks, there is a Cyborg client: https://github.com/openstack/python-cyborgclient17:17
sean-k-mooneySundar: yes there is but that is really the commandline client right17:17
SundarYes ^17:17
dansmithbut still,17:18
SundarI am just pointing out that it exists and is different from os-acc17:18
dansmithno different than cinderclient or neutronclient right?17:18
sean-k-mooneythe openstack sdk in theory is meant to replace the python-client17:18
sean-k-mooneydansmith: true17:18
dansmithisn't the only reason we're going to have os-acc over cyborgclient is for the object definitions?17:19
dansmitha la os-vif17:19
sean-k-mooneyif gibi's rp mapping code is generic enough nova could reuse it before calling into the sdk or cyborg client to update cyborg17:19
*** dtantsur is now known as dtantsur|afk17:19
sean-k-mooneydansmith: that is one of the main reasons yes17:19
Sundardansmith: yes, and for conversions to/from such objects. For example, if a Cyborg API returns a JSON, os-acc would convert that to an OVO.17:20
*** devep_ has quit IRC17:21
dansmithSundar: that doesn't make any sense17:21
dansmithor.. I hope you're just planning to return the serialized object.. :)17:21
*** alexchadin has quit IRC17:21
Sundardansmith: The return values from Cyborg API are not necessarily the OVOs defined in os-acc. They are meant to be neutral, used even in scenarios where Cyborg may be used stand-alone17:23
dansmithSundar: well there's probably not much point in the OVOs then at that point17:23
sean-k-mooneyif we just use the python client or sdk we can just use python classes17:24
Sundardansmith: if you are OK with providing a versioned JSON to Nova as a return value from Cyborg API, that is fine too17:24
sean-k-mooneyand if we have JSON schema definitions for the cyborg api we can validate them17:24
openstackgerritJack Ding proposed openstack/nova master: Add HPET timer support for x86 guests  https://review.openstack.org/60590217:24
*** lpetrut has joined #openstack-nova17:24
Sundardansmith: sean-k-mooney: Are we now saying os-acc is not adding much value, and Nova can directly call Cyborg APIs?17:26
dansmithSundar: nova is going to use some sort of client regardless17:27
SundarYes, agreed ^17:27
dansmithSundar: honestly, I'm not sure where we're at now.. I thought os-acc was going to be the only clienty thing and that it was going to define object models to be passed over the api between the services17:27
dansmithsounds like that's not clear17:27
dansmithand there is another cyborg api client already17:27
*** tobias-urdin is now known as tobias-urdin_afk17:27
dansmithso I dunno17:27
Sundardansmith: I agree with that! That's what I thought too17:28
*** sridharg has quit IRC17:28
SundarIf Cyborg returns a JSON value, os-acc could subject that to JSON schema validation, like what Sean said17:28
mriedemi assume sean-k-mooney wanted os-acc to do ovo translation stuff ala the long-held dream of ovo version negotiation for nova/neutron and os-vif17:28
sean-k-mooneySundar: you raise tho in that case the python classes that define the cyborg api resources before they are converted to json would live in os-acc17:28
sean-k-mooneymriedem: that would be nice but not required17:29
sean-k-mooneymriedem:  i was hoping it would work like neutron-lib which contains the api definitions17:29
mriedemsounds like we're building in more complexity than we really need right now17:29
mriedemif nova just needs to call apis, then cyborgclient is the way to go17:29
mriedemadding in ovo sugar later via an os-acc library could be done when needed17:30
dansmithSundar: any client can/would/should do schema validation17:30
sean-k-mooneymriedem: one thing i would want, however, is that we do not need to make regular commits to nova to account for changes to the cyborg api17:31
dansmithSundar: not sure why that means os-acc should be separate17:31
*** sapd1 has joined #openstack-nova17:31
dansmithsean-k-mooney: that's the case for any client17:31
sean-k-mooneysure17:31
Sundardansmith: The current Cyborg client is not designed to convert the return values of APIs to what Nova expects. We would need something on top of that, and that could be os-acc17:32
sean-k-mooneyi think os-acc originally came about because of the desire to be able to talk to the cyborg agent on the same host as the nova agent17:32
sean-k-mooneythat is not part of the spec anymore17:32
mriedemmost of the existing python-*client projects in openstack don't do response validation, they just take the response payload and throw it into a dict-like object17:32
dansmithSundar: nova can digest whatever the client returns17:33
sean-k-mooneySundar: or we define a set of classes that represent the public api and nova will just use that17:33
dansmithSundar: we don't need a whole library to turn one dict into another17:33
mriedemthe caller needs to understand what is in that object, field-wise, based on version17:33
sean-k-mooneymriedem: yes that is true, which is totally fine for stable apis17:34
*** sapd1 has quit IRC17:35
sean-k-mooneythe cyborg api is not mature and stable yet and that might lead to a non-zero amount of churn on the nova side to handle, unless the python cyborg client can provide a semi-stable subset we can use17:35
Sundarmriedem: sean-k-mooney: Cyborg API is versioned. In the event of changes, we would move to v217:36
sean-k-mooneySundar: sure but nova would have to be adapted to v217:36
mriedemSundar: does cyborg use microversions?17:36
Sundarmriedem: Cyborg API return values are documented -- now in the Nova spec but eventually in a Cyborg spec17:36
Sundarmriedem: No, not today.17:37
mriedemare there plans to? or just bump major versions whenever there is an api change?17:37
mriedemGET /v65/accelerators17:38
*** lbragstad has quit IRC17:38
mriedemanyway, again, probably not really necessary for this conversation for what nova needs17:38
Sundarmriedem: Currently, I haven't planned on microversions. I think you mean a microversion per API call?17:38
mriedemmicroversions are per request yes17:38
mriedemif not specified, there is a minimum default17:39
mriedem2.1 for nova, but i'd expect 1.0 or something for cyborg17:39
SundarIf we go with major versions alone, would moving to microversions later cause upgrade issues? Not if we bump major version *and* introduce microversions with that, I suppose?17:40
mriedemnova had v2.0 and then microversions were added in 2.1, which was backward compatible with v2.017:40
sean-k-mooneySundar: the difference is usually you don't run multiple major versions at the same time17:40
mriedemnova was going to have a v3 but that became v2.117:41
SundarSounds good17:41
*** lbragstad has joined #openstack-nova17:41
mriedemcinder had v1 and v2,17:41
mriedemand they added microversions in v3.017:41
mriedemso it sounds like you'd be following the cinder model17:41
sean-k-mooneySundar: in a single boot request we could call cyborg with multiple versions if it supported microversions17:41
mriedemanywho17:42
sean-k-mooneye.g bind with 1.1, program with 1.517:42
mriedemthe advantages of microversions aren't really necessary for what nova needs initially with cyborg17:42
SundarHmm, ok. I need to think about that. Given the long task list for Cyborg in Stein, maybe we can introduce microversions later, as needed17:42
mriedemand it sounds like we don't need something translating JSON responses to OVOs17:43
sean-k-mooneySundar: what is the status of the deployable api endpoint currently?17:43
Sundardansmith: sean-k-mooney: mriedem: What I am gathering is, we need a client for Nova to call Cyborg. That can be just the Cyborg client, as os-acc is not adding much value. Is that correct?17:43
mriedemso what i'm hearing is nova just needs cyborgclient17:43
mriedemSundar: i think so17:43
mriedembefore we make this all so complicated that we never get anything done, we should probably start with that17:44
sean-k-mooneyif we can reuse gibi's resource provider mapping code for cyborg then yes17:44
Sundarsean-k-mooney: is that a long haul? Can we reasonably expect that to get in by, say, Jan?17:45
gibiSundar: the non-generic mapping code is up in gerrit, I just got the comment yesterday that it can be done a lot more generic way so I'm working on that right now. I think this week I can publish the generic code17:46
gibiSundar: https://review.openstack.org/#/c/61623917:46
Sundargibi: Excellent, thanks!17:46
dansmithSundar: as we have said, serializing all of this work is definitely going to blow all your timelines17:46
*** adrianc__ has joined #openstack-nova17:46
*** adrianc_ has quit IRC17:47
sean-k-mooneySundar: has any work progressed on the ci front17:47
Sundardansmith: sean-k-mooney: Can you please state 'for the record' that you are ok with dispensing os-acc and having Nova call Cyborg directly?17:47
Sundarsean-k-mooney: The concept of Deployables is being reworked to map to RPs. What do you mean by 'CI front'? Zuul checking for Cyborg?17:48
sean-k-mooneySundar: yes we can call directly without os-acc via the python clients.17:48
*** adrianc_ has joined #openstack-nova17:48
dansmithSundar: once again, nova will use a client library and not call directly. I do not care what the name of that thing is17:48
sean-k-mooneyi'm agreeing with dan by the way, even though we said it differently17:48
dansmithyes17:49
Sundardansmith: If it is the standard Cyborg client, Nova will have to do the necessary conversions to/from JSON. Hence the question17:49
sean-k-mooneywhat i meant by the ci front is that we talked about the need to do some basic integration testing17:49
dansmithSundar: when nova uses a client library, it has to deal with the output of that client17:49
dansmithSundar: no client library returns exactly what we need with no massaging, of course17:49
Sundardansmith: Great. I will update the spec to skip os-acc. Thanks a lot to you, sean-k-mooney and mriedem.17:50
SundarI still have the question on colocation17:50
Sundar Re. request groups in device profiles, it is still not clear to me how we would handle co-location without them, i.e., we want 2 accelerators from 2 different RPs in the same device17:51
Sundarsean-k-mooney: You mentioned NUMA affinity in your replies on the spec. I am looking at co-location within a device17:51
*** adrianc__ has quit IRC17:51
* sean-k-mooney reading one sec17:52
sean-k-mooneySundar why would you have 2 different RP on the same device instead of 2 inventories in 1 device17:53
SundarBecause they have different traits17:53
sean-k-mooney* 2 inventories in 1 rp17:53
sean-k-mooneyok then have 1 RP for the device and two nested RPs for the two accelerators17:54
SundarSay an FPGA with 2 regions: one has compression, other has encryption and we want to gang them up together17:54
sean-k-mooneythen we can use in_tree=<device rp uuid> for the colocation17:54
dansmithSundar: those aren't traits, right?17:55
Sundarsean-k-mooney: Would the traits be applied to the parent RP or the children RPs? And, more importantly, is that scenario working today?17:55
dansmithSundar: those are inventories on a single provider, no?17:55
Sundardansmith: No, they are in 2 different RPs, but on the same device (PCI card for e.g.)17:55
sean-k-mooneySundar: i was thinking the child RPs17:55
dansmithSundar: in that case, what would the inventories be?17:55
SundarBecause they represent different functions, and functions are traits, they would show up as 2 RPs17:56
sean-k-mooneythey are different resource classes also17:56
dansmithsean-k-mooney: right which is why they can be inventories on the same provider17:56
*** adrianc_ has quit IRC17:56
SundarEach RP would contain resources of the class CUSTOM_ACCELERATOR_FPGA17:56
sean-k-mooneyone is COMPRESS_MBs and the other is CRYPTO_MBs17:56
dansmithI thought the whole point of this was to expose functions as consumable things?17:57
Sundarsean-k-mooney: No, we agreed both in Rocky and Stein PTGs that the RCs reflect the device type (e.g. GPU, FPGA), not device details or functions17:57
dansmithomg17:57
dansmiththat is not what I thought we agreed17:57
sean-k-mooneyin any case i think nested RPs can handle this use case17:57
SundarSo, a request may look like: resources:CUSTOM_ACCELERATOR_FPGA=1; trait:CUSTOM_FUNCTION_A=required17:58
dansmithbecause in that case, all cyborg is ever going to expose is a thousand RPs with the same =1 inventories, decorated with super complex traits to describe what is in them at any given point right?17:58
SundarIf the RCs reflect functions, the inventories of RPs will change all the time, as devices get reconfigured17:59
dansmithso will the traits right?17:59
sean-k-mooneythe hack here is we can't delete or recreate inventories if there are allocations against them but we can change the traits17:59
Sundardansmith: Why super-complex traits? I documented a small handful (4 or 5 at the most), plus whatever custom traits that PowerVM guys asked for18:00
*** derekh has quit IRC18:00
sean-k-mooneySundar: there are more than 5 traits just to describe crypto functions18:00
dansmithsean-k-mooney: right, but we can atomically update multiple inventory things at the same time18:00
dansmithI mean, we can atomically update traits too I guess, but.. I totally thought this was going the direction of inventory being functions, and traits being actual, you know, traits about the device like brand, model capabilities, etc18:01
*** sapd1 has joined #openstack-nova18:01
Sundarsean-k-mooney: My point is they all have the same structure: CUSTOM_FUNCTION_foo, CUSTOM_DEVICE_MODEL_bar, etc.18:02
sean-k-mooneyok lets try to get the most basic version of integration working first18:03
*** spatel has joined #openstack-nova18:03
Sundardansmith: It would be simpler if all variations happened in traits, while RCs are more or less static -- a GPU accelerator will never become an FPGA accelerator.18:03
sean-k-mooneySundar: the expectation from a placement point of view is that both traits and resource classes would be largely static but could change over time18:04
Sundardansmith: This is documented in the specs -- both before and after the Stein PTG. But I am *not trying to guilt-trip you :)18:04
openstackgerritElod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification  https://review.openstack.org/50850618:05
sean-k-mooneyis there a way today to use cyborg to deploy something without special hardware that we could use as a pilot to test the workflow?18:06
dansmithSundar: " a GPU will never be an FPGA" is not an argument that means anything to me in this context18:06
Sundarsean-k-mooney: Makes sense. A device model trait is not expected to change much -- except perhaps on firmware updates. A function trait will change only when orchestration programs it (not if the VM programs it, which is the Device as a Service use case)18:06
*** sapd1 has quit IRC18:06
dansmithSundar: I thought the discussion had previously gone that a user says "I want two TLS offload devices, and they need to be able to support crypto $foo"18:07
dansmithSundar: but what you're saying is that they will need to say "I need two FPGAs and they need to have traits TLS_OFFLOAD and TLS_MECH_FOO"18:07
*** k_mouza_ has joined #openstack-nova18:07
dansmithwhich means cyborg isn't providing us much in the way of abstraction18:07
dansmithanyway, I'm about out of energy for discussing this at this point, so I'll leave it to the others that are more invested18:08
*** moshele has joined #openstack-nova18:08
Sundardansmith: Are you ok if sean-k-mooney and I continue the discussion? And are you ok with whatever conclusion we reach? :)18:09
*** k_mouza has quit IRC18:10
*** moshele has quit IRC18:10
Sundarsean-k-mooney: You have been closely following my specs (and thanks for that). Are you in alignment with this representation?18:10
dansmithSundar: you can of course discuss anything you want, and no I'm not signing off on something I haven't read18:11
sean-k-mooneySundar: i am honestly getting quite tired also; can we pick this up later in the week?18:12
Sundardansmith: Ok, tried my luck there. Can we talk tomorrow same time?18:12
*** k_mouza_ has quit IRC18:12
sean-k-mooneyperhaps we should do it on the cyborg channel so as not to flood nova18:12
dansmithSundar: honestly, I'm not sure we're making progress here18:13
dansmithSundar: and no, I can't be involved in every discussion, I'm just saying I reserve the right to be unhappy with the next round of the spec18:13
Sundarsean-k-mooney: Sure. Same time tomorrow?18:13
dansmithSundar: you need more than just sean-k-mooney in agreement on this18:13
sean-k-mooneyi would honestly love to just prototype something end to end that works and see what it looks like18:13
dansmithand getting everyone into a single irc channel at the same time is just not going to happen repetitively18:14
dansmithsean-k-mooney: ++18:14
*** moshele has joined #openstack-nova18:14
Sundardansmith: Sorry to hear that. I thought we made progress by agreeing to skip os-acc.18:14
dansmithsean-k-mooney: I'm getting spec fatigue I think.. a series of patches on both sides that actually does something we can evaluate might be a better stepping stone18:15
Sundarsean-k-mooney: I am with you, but Cyborg folks are reluctant to move till Nova spec converges, so I am facing a catch-2218:15
dansmithsean-k-mooney: it'll be trivial to look at that and evaluate how things are being done18:15
sean-k-mooneyso lets create a feature branch18:15
dansmithSundar: no, that's not a legit argument18:15
dansmithSundar: you can put up patches against nova and cyborg and test them together without merging anything18:15
dansmithwe do it all the time for big complex things like this18:15
dansmithif there isn't already, cyborg should have a fake driver that can just pretend to offer up devices and program them,18:16
*** moshele has quit IRC18:17
dansmithand that should be enough to do some interaction testing between the two services, even if nothing actually gets attached at the final step to the vm18:17
dansmithsean-k-mooney: agree with that ^ ?18:17
SundarA feature branch upstream?18:17
sean-k-mooneyyes18:17
spatelsean-k-mooney: do you have experience with rabbitmq ?18:17
sean-k-mooneyspatel: not much sorry18:17
spatelno worry!!18:18
sean-k-mooneydansmith: i would love to take that approach18:18
Sundardansmith: sean-k-mooney: OK, thanks for your time.18:20
openstackgerritArtom Lifshitz proposed openstack/nova-specs master: Re-propose numa-aware-live-migration spec  https://review.openstack.org/59958718:22
jaypipesholy crap, I missed a bunch... :( sorry, reading back up...18:23
sean-k-mooneymriedem: i found a relatively simple and reliable way to reproduce https://bugs.launchpad.net/nova/+bug/1751923 by the way18:25
openstackLaunchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)18:25
*** efried_pto is now known as efried18:28
*** Sundar has quit IRC18:31
*** tobias-urdin_afk is now known as tobias-urdin18:38
*** devep has joined #openstack-nova18:42
*** pcaruana has quit IRC18:47
*** sapd1 has joined #openstack-nova18:50
mriedemsean-k-mooney: how is that? take down the neutron agent and reboot the vm or something?18:50
sean-k-mooneymriedem: i added a script to the bug18:52
sean-k-mooneyyou can cause it or a similar effect via the api18:53
sean-k-mooneybasically if you send the api request to detach a port to neutron and reboot the vm you end up with it broken18:54
sean-k-mooneyand you can't use openstack server add port or remove port to fix it18:54
sean-k-mooneymriedem: https://bugs.launchpad.net/nova/+bug/1751923/comments/1018:55
openstackLaunchpad bug 1751923 in OpenStack Compute (nova) "_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server" [Medium,In progress] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)18:55
sean-k-mooneymriedem: i was debating if i could make this into some kind of functional regression test but not sure how yet18:55
*** sapd1 has quit IRC18:56
openstackgerritMatt Riedemann proposed openstack/nova master: Apply DISTINCT clause to CellMapping.get_by_project_id for postgres  https://review.openstack.org/61906118:58
*** ralonsoh has quit IRC19:00
mriedemslaweq: hopefully this ^ does the ojb19:01
mriedem*job19:01
mriedemi'm no pg expert though19:02
mriedemmy socks have officially been rocked off19:12
sean-k-mooneyfind something interesting19:16
*** zul has quit IRC19:22
*** hamzy has joined #openstack-nova19:25
prometheanfireI'm not one either, but do use it for openstack, pg question?19:31
prometheanfireok, above my head too :D19:32
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Add recreate test for bug 1799892  https://review.openstack.org/61907519:39
openstackbug 1799892 in OpenStack Compute (nova) "Placement API crashes with 500s in Rocky upgrade with downed compute nodes" [Medium,In progress] https://launchpad.net/bugs/1799892 - Assigned to Eric Fried (efried)19:39
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Consider root id is None in the database case  https://review.openstack.org/61907619:39
mriedemupgrade issue in rocky so would be good to get that backport series moving ^19:50
mriedemthe xenserver CI seems to be busted19:53
openstackgerritJack Ding proposed openstack/nova-specs master: Flavor Extra Spec and Image Properties Validation  https://review.openstack.org/61854219:54
dansmithmriedem: not merged in master yet right?19:54
mriedemjust rechecked it in the gate19:58
efriedmriedem: should we assign both branches of that bug back to tetsuro?20:00
efriedI at least put the master side back.20:00
mriedemin launchpad?20:01
mriedemyes20:01
dansmithmriedem: so I've been half-assedly working on trying to do the manual metadata fill on single-instance-get20:12
dansmithlots of weird things break just within db_api if we return a dict instead of a model20:12
dansmithso I can go chase and fix all those things,20:12
dansmithbut I think jaypipes once flexed his db muscle and argued there was some way to change the way we do the joining to avoid the rowsplosion20:13
dansmithso maybe we should challenge him on that before we get too far20:13
*** sapd1 has joined #openstack-nova20:13
mriedemyou know how i get when jay flexes his muscles20:14
mriedemi melt20:14
dansmithyep20:14
dansmithit's grotesque yet oddly satisfying20:14
jaypipesewww.20:14
jaypipesdansmith: are you referring to the eagerload thing?20:15
dansmithno20:15
dansmithjaypipes: in the oldentimes,20:15
dansmithwe would load an instance, joined with metadata, system_metadata, etc20:15
dansmithwhich would end up returning X*Y*Z rows for X instances, with Y rows of metadata and Z rows of sysmeta20:16
jaypipesright.20:16
dansmithwhich was the reason RAX failed to deploy icehouse after we moved flavor data to sysmeta20:16
dansmithnow we query metadata separately from the actual instance load20:16
dansmithbut I thought a convo with you a long time ago yielded you saying that we could do an inner-outer-blue-unicorn join to avoid that somehow20:17
jaypipesyes, we can do a single query to get all instance metadata (and sysmeta) for all selected instances.20:17
dansmithto be clear,20:18
jaypipesa query that would just yield (instance_uuid, key, value) tuples.20:18
dansmithno, that's not what I'm asking20:18
*** sapd1 has quit IRC20:18
dansmiththe old query was returning the instance data itself, and the metadata key,value and the sysmeta key,value20:19
dansmithso the instance data was repeated for every row in meta, sysmeta20:19
dansmiththat was a single query for the instance itself, and the metadata(s)20:19
dansmithI know we can query the instance, and then query for the metadatas efficiently20:20
dansmithbut it's two queries20:20
dansmiththat's basically what we do now20:20
jaypipesok. well, that's the most efficient way to solve this particular problem.20:20
dansmithtwo queries?20:20
jaypipesyup.20:20
dansmithokay20:21
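The row explosion dansmith describes can be reproduced with a small in-memory SQLite database. This is only a minimal sketch: the table and column names are simplified assumptions rather than nova's actual schema or queries, and the counts (2 instances, each with 3 metadata rows and 4 sysmeta rows) are made up for illustration.

```python
import sqlite3


def build_db(instances=2, meta_per_instance=3, sysmeta_per_instance=4):
    """Create a toy schema (hypothetical, far simpler than nova's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (uuid TEXT PRIMARY KEY, hostname TEXT);
        CREATE TABLE instance_metadata
            (instance_uuid TEXT, key TEXT, value TEXT);
        CREATE TABLE instance_system_metadata
            (instance_uuid TEXT, key TEXT, value TEXT);
    """)
    for i in range(instances):
        uuid = f"uuid-{i}"
        conn.execute("INSERT INTO instances VALUES (?, ?)",
                     (uuid, f"host-{i}"))
        conn.executemany(
            "INSERT INTO instance_metadata VALUES (?, ?, ?)",
            [(uuid, f"m{k}", "v") for k in range(meta_per_instance)])
        conn.executemany(
            "INSERT INTO instance_system_metadata VALUES (?, ?, ?)",
            [(uuid, f"s{k}", "v") for k in range(sysmeta_per_instance)])
    return conn


def joined_rows(conn):
    """One query joining both metadata tables.

    The instance columns are repeated for every (meta, sysmeta) pair.
    """
    return conn.execute("""
        SELECT i.uuid, i.hostname, m.key, m.value, s.key, s.value
        FROM instances i
        LEFT OUTER JOIN instance_metadata m
            ON m.instance_uuid = i.uuid
        LEFT OUTER JOIN instance_system_metadata s
            ON s.instance_uuid = i.uuid
    """).fetchall()


def separate_rows(conn):
    """Three queries (instances, metadata, sysmeta); nothing is repeated."""
    inst = conn.execute("SELECT uuid, hostname FROM instances").fetchall()
    meta = conn.execute(
        "SELECT instance_uuid, key, value FROM instance_metadata").fetchall()
    sysmeta = conn.execute(
        "SELECT instance_uuid, key, value "
        "FROM instance_system_metadata").fetchall()
    return inst, meta, sysmeta


conn = build_db()
print(len(joined_rows(conn)))                    # 2 * 3 * 4 = 24 rows
print(sum(len(r) for r in separate_rows(conn)))  # 2 + 6 + 8 = 16 rows
```

The joined result grows with the product of the per-instance row counts, while the separate queries grow with their sum, which is why the extra round trips can still be the cheaper option.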
dansmithI'm pretty sure you called me a stupid ignoramus for doing that back in the icehouse days,20:21
jaypipesI thought we were doing >1 query for grabbing instance metadata and system metadata20:21
dansmithwell, we are, but only for convenience20:21
jaypipesno, I never called you anything.20:21
dansmithYOU DID20:21
dansmithwe're doing three queries now, instance, meta, sysmeta20:22
jaypipesif I ever said anything about performance it would have been because we were doing a query on instances, then *for each instance* issuing a query to get some instance metadata.20:22
dansmithwe're not doing that20:22
jaypipesok, coools.20:22
jaypipesI can reduce the meta + sysmeta to a single query.20:22
dansmithso I guess that means I have to keep plugging at this20:22
dansmithjaypipes: yea, feel free, but that's separate from my other work here20:22
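One way the meta + sysmeta lookups might be folded into a single query yielding (instance_uuid, key, value) tuples is a UNION ALL over both tables, filtered by the already-selected instance UUIDs. The sketch below is an assumption about the shape of such a query, not jaypipes's proposal or nova code; the schema, the `source` tag column, and the `load_all_metadata` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instance_metadata
        (instance_uuid TEXT, key TEXT, value TEXT);
    CREATE TABLE instance_system_metadata
        (instance_uuid TEXT, key TEXT, value TEXT);
    INSERT INTO instance_metadata VALUES
        ('a', 'role', 'web'), ('b', 'role', 'db');
    INSERT INTO instance_system_metadata VALUES
        ('a', 'image_id', 'img-1'), ('c', 'image_id', 'img-2');
""")


def load_all_metadata(conn, uuids):
    """Fetch meta and sysmeta for all selected instances in one round trip.

    This avoids issuing one query per instance: the UUID list is passed
    once per table via an IN clause.
    """
    if not uuids:
        return []
    marks = ", ".join("?" for _ in uuids)
    sql = f"""
        SELECT 'meta' AS source, instance_uuid, key, value
        FROM instance_metadata WHERE instance_uuid IN ({marks})
        UNION ALL
        SELECT 'sysmeta' AS source, instance_uuid, key, value
        FROM instance_system_metadata WHERE instance_uuid IN ({marks})
        ORDER BY instance_uuid, source, key
    """
    # The placeholder list appears twice, so bind the UUIDs twice.
    return conn.execute(sql, list(uuids) * 2).fetchall()


rows = load_all_metadata(conn, ["a", "b"])
for row in rows:
    print(row)
```

The `source` tag lets the caller split the rows back into metadata and system_metadata per instance without a second query.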
jaypipesk. is there anything I can help you with on your work here?20:23
dansmithjaypipes: you just did20:23
dansmiththanks20:23
jaypipes:)20:23
jaypipeswell, at least that gives me something to smile about today.20:23
jaypipesthanks.20:23
* dansmith moves on to different excuses20:24
jaypipesFTR, on the cyborg thing, I *also* believe that cyborg should be modeling inventories, not RPs with tons of traits masquerading as resource classes.20:25
jaypipesdansmith: ^20:25
dansmithjaypipes: yay.20:25
* dansmith relishes in the small victories these days20:25
jaypipesindeed.20:25
mriedemdansmith: on top of the improved join on what you're doing, i think it's a 2-part change in that the metadata api doesn't need to be pre-loading on system_metadata - at least not anymore20:26
dansmithmriedem: yeah, so that will address the acute issue right?20:26
mriedemlast i looked the only thing in meta-api that would use sysmeta is a vendor data provider if configured20:26
dansmithmaybe I should punt this until we have a better reason to do this work20:26
mriedemi believe so, and i think that's what the workday ops guy said he did in the ML20:27
mriedemheh20:27
mriedemsee, i started trying to do what you said and sparks flew immediately20:27
mriedemand i gave up20:27
dansmithoh did you20:27
dansmith?20:27
mriedemlocally20:27
dansmithmaybe that should be my impetus to fix it20:27
mriedemnever pushed it up b/c tests failed horribly20:27
mriedemto show me up?20:27
dansmithyeah20:27
mriedemby all means20:27
dansmithnah, sounds hard.20:28
mriedemnext you can fix the nova/cinder cross az attach mess20:28
mriedemwhich i have a fix for, but it's fugly as all get out20:28
dansmithI like it already20:29
*** sapd1 has joined #openstack-nova20:34
*** sapd1 has quit IRC20:38
*** itlinux has quit IRC20:39
efrieddansmith, jaypipes: Modeling accelerators via specific resource classes, so like CUSTOM_FPGA_GZIP rather than rc=FPGA + traits=[GZIP] ?20:40
dansmithGZIP isn't a trait, IMHO20:41
dansmithlike, I don't ask for SOME_SILICON=1024, trait=RAM20:41
efriedum. The FPGA is capable of processing gzips?20:41
jaypipesthe resource class is a context to a GZIP program flashed to a device.20:42
dansmithright, but there's a difference between asking for an FPGA and asking for GZIP offload to me20:42
jaypipesdansmith++20:42
dansmithif I want an FPGA that I can program myself, I want an FPGA=1.. if I want a GZIP handler, I want GZIP=1,20:42
dansmithwhich might be an FPGA in the back end20:42
dansmithor it might be an ASIC20:42
dansmithor whatever20:42
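The difference between the two request styles can be sketched as the query strings a client might send to placement. This assumes the `resources` and `required` query parameters of `GET /allocation_candidates`; the names `CUSTOM_GZIP` and `CUSTOM_GZIP_CAPABLE` and the `candidates_query` helper are hypothetical, chosen only to mirror the discussion.

```python
from urllib.parse import urlencode


def candidates_query(resources, required_traits=()):
    """Build an allocation-candidates query string (illustrative only)."""
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in resources.items()),
    }
    if required_traits:
        params["required"] = ",".join(required_traits)
    # Keep ':' and ',' readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,")


# Function as a resource class: "I want one GZIP offload", whatever
# hardware happens to provide it.
by_resource_class = candidates_query({"CUSTOM_GZIP": 1})

# Device plus trait: "I want one FPGA that can do GZIP".
by_trait = candidates_query({"FPGA": 1}, ["CUSTOM_GZIP_CAPABLE"])

print(by_resource_class)
print(by_trait)
```

In the first form the consumer never names the device type, which is the abstraction dansmith is arguing for.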
efriedBut then a resource provider representing a blank (as-yet-unprogrammed) FPGA would have to show inventories of multiple resource classes, and then when one of those is consumed, we would have to nix the other resource classes (or do the reserved=total trick for them).20:44
efriedwhich is racy, as well as being ew.20:44
dansmithsame for the trait right?20:44
dansmithyou say FPGA=1, traits=GZIP,TLS,BITCOIN20:44
efriedno. GZIP_CAPABLE stays. Not sure we have to retrait every time we reprogram.20:44
efriedbut if we do, the GZIP_IS_ON_THIS_THING_AT_THE_MOMENT trait would be separate, and have separate meaning, than the GZIP_CAPABLE.20:45
dansmithyou're just providing no abstraction there20:45
efriedthe former would be used only as an optimization, if/when we have "preferred traits", to avoid reprogramming if there's one that's already set up.20:46
*** david-lyle is now known as dklyle20:46
jaypipespremature optimization...20:46
efriedI'd be happy if we skipped that whole bit for the first pass and just used the *_CAPABLE traits.20:47
efriedProgramming gets done after the claim, if and as necessary.20:47
jaypipesI'd be happy if we just skipped everything other than just using custom resource classes.20:47
efriedjaypipes: So preprogram everything?20:47
dansmithpretty sure we said the first step was assuming everything was static, no?20:48
dansmithexcept for the "user will program it themselves" case of course20:48
jaypipesdansmith++ again.20:48
dansmithwhen we talked about this in (denver I think?) I think the overwhelming majority of cases where this really applies is the pre-programmed case,20:49
dansmithbecause it provides for locality in certain FPGAs that have one code region and multiple execution contexts,20:49
dansmithsuch that if you co-locate a GZIP and a TLS, they both can't use the same FPGA, but if you get two GZIP tenants on the same box, they can20:50
dansmithand I thought we agreed to avoid boiling the ocean with "everything is completely dynamic all the time forever" until we could do, you know, fucking anything :)20:50
efriedI can buy it for a first pass. Long-term, that seems like not very cloudy. Though I suppose if the "pre"programming is done by a higher orchestrator, it could fly.20:50
dansmithmaybe that's just me missing something, but..20:50
efriedokay, thanks for the refresher.20:51
dansmithefried: well, if you do it with inventories, you can actually count usage of those things,20:51
dansmithand then your pre-programming workflow can ensure that X% of GZIP is available based on current demand20:51
dansmithbut if you do it with traits that seems a lot messier20:51
dansmithusage and capacity I mean20:52
slaweqmriedem: thx for taking care of this issue20:52
*** priteau has quit IRC20:52
dansmithefried: the orchestrator that maintains a certain amount of available inventory of stuff, I mean20:52
jaypipesI was under the impression Cyborg was gonna contain that "pre-orchestrator/pre-programming" thing...20:52
* dansmith is tripping all over himself20:52
dansmithjaypipes: yeah I dunno, could be, or could be a cron job you run once an hour that makes sure the capacity is within limits20:53
dansmithmeaning I dunno if that level of cron is something cyborg was going to do or not,20:53
efried"load balancing" your accelerators20:53
efriedguess that implies moving them around, which isn't what we're talking about20:53
efriedbut yeah, I get the idea.20:53
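The point about inventories making usage and capacity countable can be sketched with a toy calculation. The dictionaries below loosely imitate inventory records with `total`, `reserved`, and `used` fields; the `free_units` and `free_shortfall` helpers and the 25% target are hypothetical, and nothing here reflects what Cyborg actually implements.

```python
import math


def free_units(inventory):
    """Units still available for allocation on one provider."""
    return inventory["total"] - inventory["reserved"] - inventory["used"]


def free_shortfall(inventories, target_free_fraction):
    """How many more free units are needed to hit the target.

    A periodic pre-programming job (cron or otherwise) could use this
    number to decide how many more devices to flash with the function.
    """
    total = sum(inv["total"] for inv in inventories)
    free = sum(free_units(inv) for inv in inventories)
    wanted = math.ceil(total * target_free_fraction)
    return max(0, wanted - free)


# Hypothetical CUSTOM_GZIP inventories on two providers.
gzip_inventories = [
    {"total": 4, "reserved": 0, "used": 4},
    {"total": 4, "reserved": 0, "used": 3},
]

print(free_shortfall(gzip_inventories, 0.25))  # 1 more free unit needed
```

With traits alone there is no `used` or `total` to sum, so the same question would need bookkeeping outside the inventory records.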
*** itlinux has joined #openstack-nova21:13
*** dave-mccowan has quit IRC21:28
*** rcernin has joined #openstack-nova21:38
*** lpetrut has quit IRC21:41
*** hamzy has quit IRC21:42
jaypipessean-k-mooney: questions for you on https://review.openstack.org/#/c/602384/ please21:46
*** k_mouza has joined #openstack-nova21:57
*** mchlumsky has quit IRC22:00
*** k_mouza has quit IRC22:01
*** devep has quit IRC22:04
*** awaugama has quit IRC22:04
openstackgerritChris Dent proposed openstack/nova master: Use external placement in functional tests  https://review.openstack.org/61794122:09
*** BjoernT has quit IRC22:09
openstackgerritChris Dent proposed openstack/nova master: WIP: Delete the placement code  https://review.openstack.org/61821522:09
*** mchlumsky has joined #openstack-nova22:11
mriedemefried: turns out https://review.openstack.org/#/c/619061/ does fix the pg thing22:13
mriedemalso, wee lots of red http://logs.openstack.org/05/613305/7/check/tempest-full/999ec9f/controller/logs/screen-n-api.txt.gz?level=ERROR22:14
mriedemunrelated regression22:14
efriedack x222:14
efriedmriedem: What were the func test failures? Actual differences in results?22:15
mriedemdansmith: looks like an unintended side effect of using the scatter_gather_single_cell for nova show is the scatter thing logs errors from the query^ even for things we expect22:15
mriedemefried: yeah, got 5 rows when 1 expected22:16
mriedemefried: pull it down and try it out22:16
efriedokay, must be aggregate functions22:16
efriednah, higher priorities.22:16
dansmithmriedem: oh yeah, I think I called that out initially and then totally forgot :(22:16
dansmithit's spewing errors to the logs right?22:16
mriedemyes22:16
mriedemi'll open a bug22:16
efriedIf it fixes the problem, let's roll with it.22:16
*** tbachman has quit IRC22:18
mriedemhttps://bugs.launchpad.net/nova/+bug/180432522:19
openstackLaunchpad bug 1804325 in OpenStack Compute (nova) "InstanceNotFound traceback errors in n-api logs while polling for server delete" [High,Triaged]22:19
mriedemso, i think we just remove that exception line since the caller can get the actual exception type now22:20
mriedemand decide if it needs to log22:20
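The idea of letting the caller decide whether to log can be sketched as a gather helper that hands back the raised exception object instead of logging it. This is a simplified illustration with `concurrent.futures`, not nova's scatter-gather code; `scatter_gather`, `show_server`, the cell names, and the local `InstanceNotFound` class are all hypothetical stand-ins.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Stand-in for the exception expected while a delete is in flight."""


def scatter_gather(cells, fn):
    """Run fn(cell) for each cell; map cell -> result or raised exception.

    Nothing is logged here: the exception object is returned as the
    result so the caller can inspect its type.
    """
    def run(cell):
        try:
            return fn(cell)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        return dict(zip(cells, pool.map(run, cells)))


def show_server(cell):
    """Pretend lookup: the server is already gone from cell1."""
    if cell == "cell1":
        raise InstanceNotFound("server already deleted")
    return {"cell": cell, "status": "ACTIVE"}


results = scatter_gather(["cell0", "cell1"], show_server)
for cell, result in results.items():
    if isinstance(result, InstanceNotFound):
        continue  # expected while polling for delete, so stay quiet
    if isinstance(result, Exception):
        LOG.error("Unexpected failure in %s: %r", cell, result)
```

Only unexpected exception types reach the error log, so polling for a deleted server no longer produces tracebacks.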
mriedemdansmith: you want it or shall i?22:20
dansmithI don't want it22:20
dansmithI'm "finishing one email" away from disappearing22:21
mriedemare you around tomorrow?22:21
mriedemair quotes is acceptable22:21
dansmithheh22:23
dansmithI am but I have a few things going on22:23
dansmithbut if you lay on the guilt extra thick I might do something productive22:23
*** owalsh has quit IRC22:23
mriedemi'm going to be working hard at trying to figure out this espresso maker i bought22:30
mriedemit's like if dr seuss tried to make coffee22:30
*** spatel has quit IRC22:30
openstackgerritMatt Riedemann proposed openstack/nova master: Remove exception logging from scatter_gather_cells  https://review.openstack.org/61911022:32
*** owalsh has joined #openstack-nova22:39
openstackgerritMatt Riedemann proposed openstack/nova master: Add HPET timer support for x86 guests  https://review.openstack.org/60590222:39
*** imacdonn has quit IRC22:42
*** imacdonn has joined #openstack-nova22:42
*** efried is now known as efried_back_mond22:43
*** efried_back_mond is now known as efried_back_mon22:43
*** itlinux has quit IRC22:44
*** owalsh has quit IRC22:49
*** munimeha1 has quit IRC22:49
*** owalsh has joined #openstack-nova23:02
*** tbachman has joined #openstack-nova23:05
*** mugsie has quit IRC23:06
*** ivve has quit IRC23:08
*** takashin has joined #openstack-nova23:34
openstackgerritTakashi NATSUME proposed openstack/nova stable/rocky: Add description of custom resource classes  https://review.openstack.org/61912223:44
mriedemhello friends, could use some core reviews on this pretty simple straightforward spec https://review.openstack.org/#/c/612531/23:57
*** spatel has joined #openstack-nova23:59
*** cdent has quit IRC23:59
