Friday, 2019-05-10

artomWait, Blizzard genuinely run OpenStack?00:00
mriedemartom: yes00:00
artomExplains the WoW server queues00:00
artom</joke from 15 years ago>00:00
mriedemhttps://www.openstack.org/summit/denver-2019/summit-schedule/events/23379/how-blizzard-entertainment-uses-autoscaling-with-overwatch00:00
mriedemeandersson will cut you00:00
artomI probably deserve that.00:01
*** gyee has quit IRC00:01
artomOh, Designate.00:03
artomMy intro to OpenStack was on that, as an intern at eNovance00:03
mriedemit's great that when post live migration fails we just, you know, act like it was good http://logs.openstack.org/64/649464/3/check/nova-live-migration/a0fdcc9/logs/screen-n-cpu.txt.gz?level=TRACE#_May_09_23_00_39_27282500:03
mriedemhttps://review.opendev.org/#/c/649464/3/nova/compute/manager.py@688900:04
openstackgerritMatt Riedemann proposed openstack/nova master: DNM: Test theory behind bug 1822884  https://review.opendev.org/64946400:04
openstackbug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Medium,In progress] https://launchpad.net/bugs/1822884 - Assigned to sean mooney (sean-k-mooney)00:04
artom*snerk* my goat is still in there: https://github.com/openstack/designate/blob/master/designate/tests/resources/zonefiles/malformed_example.com.zone00:04
mriedemi was supposed to have started making dinner and hour ago so i'm out of here00:05
mriedemo/00:05
mriedem*an00:06
*** mriedem has quit IRC00:06
artomGood man00:06
*** slaweq has joined #openstack-nova00:11
*** slaweq has quit IRC00:15
*** hongbin has quit IRC00:21
*** sapd1_x has quit IRC00:24
*** lbragstad has joined #openstack-nova00:26
*** ricolin has joined #openstack-nova00:43
*** igordc has quit IRC00:51
*** tbachman has joined #openstack-nova01:02
*** tssurya has quit IRC01:03
*** cdent has quit IRC01:16
*** bhagyashris_ has joined #openstack-nova01:16
*** whoami-rajat has joined #openstack-nova01:18
openstackgerritMerged openstack/nova stable/queens: Use migration_status during volume migrating and retyping  https://review.opendev.org/65757901:22
*** _hemna has joined #openstack-nova01:24
*** slaweq has joined #openstack-nova01:42
*** slaweq has quit IRC01:46
bhagyashris_Artom: Hi01:49
bhagyashris_artom: Hi01:49
*** _hemna has quit IRC01:58
*** ttsiouts has joined #openstack-nova01:59
*** dasp has joined #openstack-nova02:11
*** lbragstad has quit IRC02:12
*** ttsiouts has quit IRC02:33
openstackgerritMerged openstack/nova master: Remove macs kwarg from allocate_for_instance  https://review.opendev.org/65274902:37
*** brinzhang has joined #openstack-nova02:37
*** ileixe has quit IRC02:48
*** udesale has joined #openstack-nova03:03
*** wwriverrat has quit IRC03:09
*** slaweq has joined #openstack-nova03:11
*** slaweq has quit IRC03:16
*** jobewan has joined #openstack-nova03:21
*** JamesBenson has joined #openstack-nova03:25
*** ileixe has joined #openstack-nova03:32
*** _hemna has joined #openstack-nova03:54
*** takashin has quit IRC04:08
*** dasp has quit IRC04:08
*** slaweq has joined #openstack-nova04:11
*** takashin has joined #openstack-nova04:15
*** slaweq has quit IRC04:16
*** ratailor has joined #openstack-nova04:24
*** _hemna has quit IRC04:27
*** cdent has joined #openstack-nova04:29
*** ttsiouts has joined #openstack-nova04:30
*** dasp has joined #openstack-nova04:30
*** ileixe has quit IRC04:34
*** ileixe has joined #openstack-nova04:38
*** tkajinam has quit IRC05:01
*** JamesBenson has quit IRC05:01
*** ttsiouts has quit IRC05:02
*** cdent has quit IRC05:06
*** slaweq has joined #openstack-nova05:11
*** ivve has quit IRC05:14
*** slaweq has quit IRC05:15
*** JamesBenson has joined #openstack-nova05:33
*** tkajinam has joined #openstack-nova05:34
*** JamesBenson has quit IRC05:38
*** udesale has quit IRC05:44
*** udesale has joined #openstack-nova05:45
*** ivve has joined #openstack-nova06:14
*** janki has joined #openstack-nova06:17
*** Luzi has joined #openstack-nova06:19
*** slaweq has joined #openstack-nova06:23
*** _hemna has joined #openstack-nova06:24
*** dpawlik has joined #openstack-nova06:27
*** rpittau|afk is now known as rpittau06:41
*** maciejjozefczyk has joined #openstack-nova06:48
openstackgerritMerged openstack/python-novaclient master: Use SHA256 instead of MD5 in completion cache  https://review.opendev.org/65818106:50
*** _hemna has quit IRC06:58
*** ttsiouts has joined #openstack-nova07:00
*** boxiang has quit IRC07:07
*** boxiang has joined #openstack-nova07:07
*** _hemna has joined #openstack-nova07:11
*** _hemna has quit IRC07:16
*** brault has joined #openstack-nova07:19
*** tesseract has joined #openstack-nova07:19
tobberydbergDefinitely mriedem - thanks!07:34
*** jobewan has quit IRC07:44
*** jaosorior has joined #openstack-nova07:49
*** mcgigglier has joined #openstack-nova07:52
*** ralonsoh has joined #openstack-nova07:53
*** udesale has quit IRC07:57
*** udesale has joined #openstack-nova07:58
*** ttsiouts has quit IRC08:03
*** takashin has left #openstack-nova08:03
*** udesale has quit IRC08:05
*** udesale has joined #openstack-nova08:08
*** tkajinam has quit IRC08:12
*** udesale has quit IRC08:22
*** tssurya has joined #openstack-nova08:23
*** ricolin has quit IRC08:24
*** sapd1_x has joined #openstack-nova08:29
*** sapd1_x has quit IRC08:56
*** jchhatbar has joined #openstack-nova08:57
*** jchhatbar has quit IRC08:58
openstackgerritzhufl proposed openstack/nova master: Fix broken url links  https://review.opendev.org/65831208:58
*** jchhatbar has joined #openstack-nova08:58
*** janki has quit IRC09:00
openstackgerritzhufl proposed openstack/nova master: Fix broken url links  https://review.opendev.org/65831209:09
*** _hemna has joined #openstack-nova09:12
*** sapd1_x has joined #openstack-nova09:19
*** udesale has joined #openstack-nova09:27
*** tbachman has quit IRC09:38
*** tbachman has joined #openstack-nova09:40
*** _hemna has quit IRC09:46
*** bhagyashris_ has quit IRC09:50
openstackgerritBalazs Gibizer proposed openstack/nova master: pull out functions from _heal_allocations_for_instance  https://review.opendev.org/65545709:57
openstackgerritBalazs Gibizer proposed openstack/nova master: reorder conditions in _heal_allocations_for_instance  https://review.opendev.org/65545809:57
openstackgerritBalazs Gibizer proposed openstack/nova master: Prepare _heal_allocations_for_instance for nested allocations  https://review.opendev.org/63795409:57
openstackgerritBalazs Gibizer proposed openstack/nova master: pull out put_allocation call from _heal_*  https://review.opendev.org/65545909:57
openstackgerritBalazs Gibizer proposed openstack/nova master: nova-manage: heal port allocations  https://review.opendev.org/63795509:57
*** ttsiouts has joined #openstack-nova10:00
*** tbachman has quit IRC10:03
*** rtjure has joined #openstack-nova10:16
*** ttsiouts has quit IRC10:33
*** lpetrut has joined #openstack-nova10:35
*** brinzhang has quit IRC10:45
*** vdrok has quit IRC10:48
*** vdrok has joined #openstack-nova10:49
*** Luzi has quit IRC11:00
*** rtjure has quit IRC11:00
*** panda is now known as panda|lunch11:07
*** sapd1_x has quit IRC11:09
kashyapildikov: Hi, when you're about, do you have a minute to re-read this error message you've added in _set_multiattach_support()?11:17
kashyap            LOG.debug('Volume multiattach is not supported based on current '11:17
kashyap                      'versions of QEMU and libvirt. QEMU must be less than '11:17
kashyap                      '2.10 or libvirt must be greater than or equal to 3.10.')11:17
kashyapI think the "or" there should be "and", isn't it?11:17
kashyapOtherwise, it doesn't quite resolve.11:17
*** ralonsoh has quit IRC11:28
ildikovkashyap: I think the 'or' was intentional11:39
*** samueldmq has joined #openstack-nova11:39
kashyapildikov: Hmm, in that case, I've seen someone from HP report a bug with the above error: where they have 2.10.1 QEMU and libvirt as 3.6.0.11:41
openstackgerritBalazs Gibizer proposed openstack/nova-specs master: Server move operations with ports having resource request  https://review.opendev.org/65260811:41
ildikovkashyap: yeah, that should lead to an error11:41
ildikovkashyap: as neither the QEMU version is lower nor the libvirt version is higher than suggested11:42
ildikovkashyap: it's due to how they handle caching or smth11:42
*** _hemna has joined #openstack-nova11:42
ildikovas we needed to turn that off for volumes being attached to multiple instances11:43
kashyapYeah, I was reading the related code in the source.  Let me go recheck11:43
ildikovand there were some changes in how that's handled with the flags, etc11:43
ildikovin QEMU and libvirt11:43
kashyap(Hmm, but that error message is somewhat confusing.)11:43
kashyapildikov: Right, I saw the bugzilla linked which describes the mess in libvirt11:44
kashyapildikov: Oh, bad me -- I mistook 3.6.0 as higher than 3.10.0.  Can I be blinder than that...11:45
ildikovwell, the error message only says that if the QEMU version is low enough then nothing else matters, and the same goes for the libvirt one if it's high enough11:45
ildikovkashyap: happens to everyone :)11:45
kashyapildikov: Thanks for the clarification.  Just someone reading it out loud is just what you need sometimes :-)11:49
ildikovkashyap: np, always happy to help :)11:50
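
The version rule discussed above can be sketched in a few lines. This is only an illustration of the "or" condition quoted from the log message, not nova's actual `_set_multiattach_support()` code; the names `parse_version` and `multiattach_supported` are hypothetical. It also shows why 3.6.0 is lower than 3.10.0 once versions are compared as numbers rather than as strings.

```python
# Hypothetical helper names; thresholds are taken from the quoted log message.
QEMU_UPPER_BOUND = (2, 10, 0)   # QEMU must be strictly below this ...
LIBVIRT_MINIMUM = (3, 10, 0)    # ... or libvirt must be at least this.


def parse_version(text):
    """Turn '3.6.0' into (3, 6, 0), padding short versions with zeros."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def multiattach_supported(qemu, libvirt):
    """Either condition alone is enough, hence 'or' rather than 'and'."""
    return (parse_version(qemu) < QEMU_UPPER_BOUND
            or parse_version(libvirt) >= LIBVIRT_MINIMUM)


# The reported combination: QEMU 2.10.1 with libvirt 3.6.0.
reported = multiattach_supported("2.10.1", "3.6.0")

# Comparing the raw strings gives the misleading answer, because '6' > '1'.
string_compare_says_newer = "3.6.0" > "3.10.0"
tuple_compare_says_newer = parse_version("3.6.0") > parse_version("3.10.0")
```

With these inputs `reported` is `False`, since neither condition holds, which matches ildikov's reading that the reported setup should produce the message.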
*** redrobot has joined #openstack-nova11:55
*** cgoncalves has quit IRC11:56
*** panda|lunch is now known as panda12:01
*** tbachman has joined #openstack-nova12:06
*** tbachman_ has joined #openstack-nova12:08
*** tbachman has quit IRC12:11
*** tbachman_ is now known as tbachman12:11
openstackgerritBalazs Gibizer proposed openstack/nova master: Remove unused param from _fill_provider_mapping  https://review.opendev.org/65510712:12
openstackgerritBalazs Gibizer proposed openstack/nova master: Move _fill_provider_mapping to the scheduler_utils  https://review.opendev.org/65510812:12
openstackgerritBalazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth  https://review.opendev.org/65510912:12
openstackgerritBalazs Gibizer proposed openstack/nova master: allow getting resource request of every bound ports of an instance  https://review.opendev.org/65511012:12
openstackgerritBalazs Gibizer proposed openstack/nova master: Pass network API to the conducor's MigrationTask  https://review.opendev.org/65511112:12
openstackgerritBalazs Gibizer proposed openstack/nova master: Add request_spec to server move RPC calls  https://review.opendev.org/65572112:12
openstackgerritBalazs Gibizer proposed openstack/nova master: re-calculate provider mapping during migration  https://review.opendev.org/65511212:12
openstackgerritBalazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate  https://review.opendev.org/65642212:12
openstackgerritBalazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations  https://review.opendev.org/65511412:12
openstackgerritBalazs Gibizer proposed openstack/nova master: func test for migrate server with ports having resource request  https://review.opendev.org/65511312:12
*** _hemna has quit IRC12:17
*** nowster has quit IRC12:20
*** cgoncalves has joined #openstack-nova12:20
*** nowster has joined #openstack-nova12:24
*** ttsiouts has joined #openstack-nova12:30
*** mchlumsky has joined #openstack-nova12:35
*** lbragstad has joined #openstack-nova12:39
*** ratailor has quit IRC12:43
artomDoes placement understand AZs? Or I guess a better question would be, if we make the right request to placement (whatever that request may look like), are we guaranteed to get at least some hosts in the correct AZ?12:47
amodiartom: yes, they do, https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#availability-zones-with-placement12:48
artomfrancoisp, ^^12:49
francoispyes thanks amodi, artom12:49
artommdbooth, by the way, are we going to start including francoisp in our triage call?12:51
artomDoh, wrong channel, answer downstream plz :)12:51
*** ttsiouts has quit IRC13:03
*** mriedem has joined #openstack-nova13:05
openstackgerritMerged openstack/nova master: Expose Hyper-V supported image types  https://review.opendev.org/65513713:10
*** mcgigglier has quit IRC13:14
*** mcgiggler has joined #openstack-nova13:14
*** tbachman has quit IRC13:15
*** tbachman has joined #openstack-nova13:19
*** sapd1_x has joined #openstack-nova13:27
*** jaypipes has joined #openstack-nova13:32
openstackgerritArnaud Morin proposed openstack/nova master: Always Set dhcp_server in network_info  https://review.opendev.org/65836213:37
openstackgerritMerged openstack/nova master: Make libvirt expose supported image types  https://review.opendev.org/65345413:46
*** shilpasd has quit IRC13:49
*** sapd1_x has quit IRC13:56
*** mcgiggler has quit IRC13:58
*** zbr has joined #openstack-nova14:00
*** _hemna has joined #openstack-nova14:13
*** bnemec is now known as beekneemech14:23
*** jchhatbar has quit IRC14:30
*** mlavalle has joined #openstack-nova14:33
*** JamesBenson has joined #openstack-nova14:36
*** JamesBenson has quit IRC14:38
*** JamesBenson has joined #openstack-nova14:38
*** dpawlik has quit IRC14:40
*** ivve has quit IRC14:41
*** lpetrut has quit IRC14:43
*** _hemna has quit IRC14:46
*** lpetrut has joined #openstack-nova14:47
*** jaosorior has quit IRC14:49
efriedtssurya: Quick look at https://review.opendev.org/#/c/648662/ if you please, sanity check my comment.14:50
tssuryaefried: checking14:51
*** hongbin has joined #openstack-nova14:51
tssuryain a meeting, will answer asap14:53
efriedtssurya: thanks. If I'm wrong about that wrinkle, I'll happily +2 (will let mriedem approve though)14:53
mriedemi think you're right, and likely need to use oneOf14:56
mriedemwhere the options are null with no locked_reason or object with a required locked_reason14:57
*** lpetrut has quit IRC14:57
mriedemefried: i think it just copied how 2.56 works for cold migrate where you can specify null or a dict with a host14:59
mriedemand it looks like that schema has the same issue where you can pass null or {}14:59
efriedwoot14:59
*** ttsiouts has joined #openstack-nova15:00
tssuryaefried, mriedem: yea I basically did what 2.56 did for migrate host15:10
tssuryaI thought it was more of a feature thing than a bug :)15:10
tssuryabut maybe you are right, empty dict shouldn't be allowed ?15:11
*** imacdonn has quit IRC15:12
*** imacdonn has joined #openstack-nova15:12
efriedtssurya, mriedem: IMO there's no reason to allow it; just increases the test surface for no value.15:12
efriedAnd I guess we ought to have a test regardless.15:13
tssuryaefried: ack, I'll write a unit test like mriedem said and fix this then15:13
efried++15:13
efriedthanks tssurya15:13
tssuryathanks for the detailed eyeing :)15:13
efriedhandful of nits you can fix up while you're in there if you feel like it15:14
tssuryayep15:14
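
The `oneOf` idea mriedem describes (either `null`, or an object that must carry `locked_reason`) might look roughly like the schema below. This is a sketch under assumptions, not the schema from the review: the exact keywords in the real patch are not shown in this log. The small `matches` function is a hand-rolled stand-in that understands only the keywords used here, so the example runs without a JSON Schema library.

```python
# Illustrative schema only; the real lock_server schema may differ.
LOCK_SCHEMA = {
    "type": "object",
    "properties": {
        "lock": {
            "oneOf": [
                {"type": "null"},
                {
                    "type": "object",
                    "properties": {"locked_reason": {"type": "string"}},
                    "required": ["locked_reason"],
                    "additionalProperties": False,
                },
            ]
        }
    },
    "required": ["lock"],
    "additionalProperties": False,
}

_TYPES = {"null": type(None), "object": dict, "string": str}


def matches(value, schema):
    """Minimal checker for type/required/properties/additionalProperties/oneOf."""
    if "oneOf" in schema:
        # Exactly one alternative must match.
        return sum(matches(value, sub) for sub in schema["oneOf"]) == 1
    if "type" in schema and not isinstance(value, _TYPES[schema["type"]]):
        return False
    if isinstance(value, dict):
        props = schema.get("properties", {})
        if any(key not in value for key in schema.get("required", [])):
            return False
        if schema.get("additionalProperties") is False and set(value) - set(props):
            return False
        return all(matches(value[key], props[key]) for key in value if key in props)
    return True


null_ok = matches({"lock": None}, LOCK_SCHEMA)
reason_ok = matches({"lock": {"locked_reason": "maintenance"}}, LOCK_SCHEMA)
empty_dict_ok = matches({"lock": {}}, LOCK_SCHEMA)
```

Here the empty dict is rejected while `null` and a dict with `locked_reason` are accepted, which is the behaviour efried argues for.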
*** gyee has joined #openstack-nova15:17
*** samueldmq has quit IRC15:20
openstackgerritDongcan Ye proposed openstack/nova master: Raise BuildAbortException while updating instance task_state conflict  https://review.opendev.org/63316015:21
*** maciejjozefczyk has quit IRC15:25
*** mkrai1 has joined #openstack-nova15:27
*** macza has joined #openstack-nova15:31
*** ttsiouts has quit IRC15:34
*** BjoernT has joined #openstack-nova15:40
*** BjoernT has quit IRC15:43
*** rpittau is now known as rpittau|afk15:45
*** ivve has joined #openstack-nova15:47
*** jangutter has quit IRC15:51
*** brault has quit IRC15:52
*** jobewan has joined #openstack-nova15:53
*** tbachman has quit IRC15:55
*** liuyulong has joined #openstack-nova16:03
*** wwriverrat has joined #openstack-nova16:05
*** lpetrut has joined #openstack-nova16:06
*** xek has joined #openstack-nova16:10
*** cdent has joined #openstack-nova16:11
*** udesale has quit IRC16:12
*** udesale has joined #openstack-nova16:12
*** brault has joined #openstack-nova16:15
*** brault has quit IRC16:20
*** efried is now known as fried_rolls16:28
*** _hemna has joined #openstack-nova16:43
mriedemdansmith: seems our external event routing isn't working across multiple cells, the api figures out the instance is migrating and gets the proper hosts but for whatever reason only the source host gets the event...anyway, that's the current reason why the multi-cell resize stuff is failing in the gate, will have to dig a bit after lunch.16:44
mriedemmaybe i can assert that somehow in my functional test, not sure how though16:44
dansmithhmm, maybe because of the batching to the compute api that it does16:46
dansmithit tries to collate multiple events per host where possible16:46
*** udesale has quit IRC16:49
tssuryaefried, mriedem: https://review.opendev.org/#/c/648662/11/nova/api/openstack/compute/schemas/lock_server.py I am not sure if its a good idea anymore because a lot of the actions don't have schema validation at all, that means they allow empty dicts and what not.16:53
tssuryalet me know what you guys thing when you have the time16:54
tssuryathink*16:55
*** whoami-rajat has quit IRC16:58
*** _hemna has quit IRC17:16
*** lpetrut has quit IRC17:17
*** tssurya has quit IRC17:37
*** mdbooth_ has joined #openstack-nova17:41
*** mdbooth has quit IRC17:44
*** boxiang has quit IRC17:44
*** boxiang has joined #openstack-nova17:45
*** psyton has joined #openstack-nova17:49
*** Swami has joined #openstack-nova17:59
mriedemi guess i can't really do the same thing in the functional test because of the fake driver and neutron fixture18:02
mriedemfried_rolls: replied to surya's comment but haven't gone through the rest of that change yet since the last time - if you are +2 please withhold the +W so i can take a look through it again18:04
*** whoami-rajat has joined #openstack-nova18:09
*** hamzy_ has quit IRC18:17
mriedemdansmith: ah yes,18:18
mriedemif host not in cell_contexts_by_host:18:18
mriedem                    cell_contexts_by_host[host] = instance._context18:18
mriedemthat assumes all of the hosts are in the same cell18:19
dansmithyou mean, it assumes that for a given instance, the src/dst host will be in the same cell18:19
dansmithyeah?18:19
mriedemyup18:19
dansmithis there no comment above saying "we assume..." ?18:20
mriedemi can probably recreate/test that in my functional multi-cell test by just sending an event with a migration that has hosts in different cells18:20
mriedemnot really, but https://github.com/openstack/nova/blob/0cb1544106346664b4a53114458417ea62474b8c/nova/compute/api.py#L485318:20
mriedemi wouldn't really expect it to either18:20
mriedemnothing has needed it yet18:20
mriedemhttps://github.com/openstack/nova/blob/0cb1544106346664b4a53114458417ea62474b8c/nova/compute/api.py#L486918:21
*** tssurya has joined #openstack-nova18:21
dansmithright, but I was thinking we had discussed this before,18:21
mriedemah yes18:21
mriedemConsequently we can currently assume that the context for both the source and destination hosts of a migration is the same.18:21
dansmithand putting a comment like that seems like something you would have made me do :)18:21
mriedemdidn't read far enough18:21
mriedemit was mdbooth18:21
openstackgerritMerged openstack/os-traits master: Update SEV trait docs to avoid misleading people  https://review.opendev.org/65567118:22
mriedemso if the migration record has cross_cell_move=true i'll have to tell it to pull the dest host mapping via the host mapping18:22
mriedemso still optimized for the non-cross-cell case18:23
dansmithcool18:23
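
The routing change mriedem outlines can be sketched as follows. This is a simplified model, not the code in `nova/compute/api.py`: cells are plain strings instead of request contexts, and `contexts_by_host` and `host_to_cell` are hypothetical names. Only `cross_cell_move` comes from the discussion above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Migration:
    source_compute: str
    dest_compute: str
    cross_cell_move: bool = False


def contexts_by_host(instance_cell: str, migration: Migration,
                     host_to_cell: Dict[str, str]) -> Dict[str, str]:
    """Pick the cell through which each host involved in the move is reached.

    Same-cell moves reuse the instance's cell for both hosts (the cheap,
    existing assumption). Only cross-cell moves pay for a per-host lookup.
    """
    result: Dict[str, str] = {}
    for host in (migration.source_compute, migration.dest_compute):
        if host in result:
            continue
        if migration.cross_cell_move:
            result[host] = host_to_cell[host]  # stand-in for a host mapping query
        else:
            result[host] = instance_cell
    return result


mapping = {"host1": "cell1", "host2": "cell2"}
same_cell = contexts_by_host("cell1", Migration("host1", "host1b"), mapping)
cross_cell = contexts_by_host(
    "cell1", Migration("host1", "host2", cross_cell_move=True), mapping)
```

In the same-cell case no mapping lookup happens at all, which keeps the common path as cheap as it is today.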
*** hamzy has joined #openstack-nova18:24
openstackgerritSurya Seetharaman proposed openstack/nova master: Microversion 2.73: Support adding the reason behind a server lock  https://review.opendev.org/64866218:35
*** jamesdenton has quit IRC18:37
eanderssonlol mriedem18:50
*** jobewan has quit IRC18:51
*** fried_rolls is now known as efried18:52
zzzeekjaypipes: ping18:54
mriedemhmm my genius test must not be so great http://paste.openstack.org/show/751238/18:56
mriedemsince it should fail, but doesn't18:56
mriedemwas hoping to avoid stubbing things out in the functional test to test the multi-cell event routing18:56
mriedemdansmith: is there maybe something in our rpc fixture stuff that wouldn't make this easy to test in a functional test because we're using fake rpc?18:57
dansmithmriedem: er, I wouldn't think so.. you're using the multi cell fixture?18:58
mriedemyeah, my functional tests work fine for multi-cell18:58
mriedemcould be my context manager yield pattern in ^ isn't actually asserting anything18:58
*** dklyle_ has joined #openstack-nova19:02
mriedemno that routing seems to be working at least on one host19:04
mriedem    b'2019-05-10 15:02:18,969 DEBUG [nova.compute.manager] Processing event network-vif-plugged-60c6c823-ed75-46a0-a0c6-30aadb90e78c'19:04
mriedem    b'2019-05-10 15:02:18,970 DEBUG [nova.compute.manager] Received event network-vif-plugged-60c6c823-ed75-46a0-a0c6-30aadb90e78c'19:04
*** david-lyle has quit IRC19:04
*** tesseract has quit IRC19:12
*** _hemna has joined #openstack-nova19:13
*** tssurya_ has joined #openstack-nova19:23
*** jistr has quit IRC19:28
mriedemi think it works in functional testing because we're not using multiple rpc transports https://review.opendev.org/#/c/396417/19:28
*** jistr has joined #openstack-nova19:28
*** jistr has quit IRC19:29
mriedemso i'm guessing in a multi-cell test we'd need to use per-cell rpc fixtures?19:32
*** jistr has joined #openstack-nova19:33
efriedmriedem, tssurya: I'm +2 on https://review.opendev.org/#/c/648662/ - leaving for mriedem to +W.19:39
*** jistr has quit IRC19:40
*** jistr has joined #openstack-nova19:41
mriedemoh the pressure is on19:45
*** _hemna has quit IRC19:47
gansomriedem: hey Matt. I am looking at https://review.opendev.org/#/c/658136/ ... it seems it needs https://github.com/openstack/nova/commit/94e620e87cb9349f799007f418ce94978bc33be119:47
gansomriedem: Rocky also doesn't have the methods assertFlavorMatchesUsage and assertRequestMatchesUsage19:47
gansomriedem: do you think it makes sense to backport the refactor?19:48
mriedemno19:48
mriedemthere should be assertion methods you can use in queens19:48
gansomriedem: only assertFlavorMatchesAllocation19:51
gansomriedem: so the solution would be to cut down on some asserts19:51
*** imacdonn has quit IRC19:51
mriedemwithout looking at the test patch, you could rig up your own usage assertion using https://github.com/openstack/nova/blob/stable/queens/nova/tests/functional/test_servers.py#L151419:57
mriedemnot sure if you're asserting provider usage or consumer usage19:57
mriedemi'm assuming the former19:57
gansomriedem: yup, doing that now. Basically re-coding the test to perform checks as it was done in queens19:57
gansomriedem: yea provider usage19:58
*** bbowen has joined #openstack-nova19:58
openstackgerritRodrigo Barbieri proposed openstack/nova stable/queens: [DEBUG] Add functional confirm_migration_error test  https://review.opendev.org/65813620:13
jaypipeszzzeek: pong20:22
zzzeekhey20:22
jaypipesheya :)20:22
zzzeekjaypipes: time warp back to http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/20:22
zzzeekso given that galera can have a little bit of read latency: https://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/  can that impact an UPDATE that says, "UPDATE ... SET x='bar' WHERE x = 'foo'" ?   if two transactions both update, but write latency means transaction 2 still sees "foo" ?20:24
*** slaweq has quit IRC20:24
zzzeekOR, do the UPDATE statements in a write-set get replayed such that they are serialized and it will detect this ?20:24
zzzeekAND, if the latter, what if the two UPDATE statements are changing "x" to the *same* value?  this is ultimately  a nova question20:25
zzzeekb.c. the issue is observed in the instance shelving logic20:25
*** bbowen has quit IRC20:30
*** ccamacho has quit IRC20:32
openstackgerritMatt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API  https://review.opendev.org/63826920:35
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Enable cross-cell resize in the nova-multi-cell job  https://review.opendev.org/65665620:35
openstackgerritMatt Riedemann proposed openstack/nova master: Support cross-cell moves in external_instance_event  https://review.opendev.org/65847820:35
mriedemzzzeek: a la this bug? https://bugs.launchpad.net/nova/+bug/182137320:36
openstackLaunchpad bug 1821373 in OpenStack Compute (nova) "Most instance actions can be called concurrently" [Undecided,New]20:36
zzzeekmriedem: yes20:37
zzzeekmriedem: mdbooth_ has updated me that his proposed solution won't work20:37
*** bbowen has joined #openstack-nova20:38
mriedemi think he mentioned something to that effect in irc awhile back for this bug but i don't remember what or why20:39
mriedemgotta run20:44
*** mriedem has quit IRC20:44
cdent so many things20:48
cdentzzzeek: since you're sort of around, I wanted to warn you that at some point we might be hassling you for some advice on some performance improvements in placement.20:51
cdentbut not now. later.20:51
zzzeekcdent: that should be good since placement looks to be straightforward20:51
cdentit _was_20:52
cdentbut the nested stuff being added this cycle is going to be mind bending20:52
zzzeekcdent: what kind of nesting20:55
*** JamesBenson has quit IRC20:57
*** JamesBenson has joined #openstack-nova20:59
*** JamesBenson has quit IRC21:03
*** mchlumsky has quit IRC21:04
edleafezzzeek: representing compute nodes containing NUMA nodes containing memory, for example21:07
cdentzzzeek: sorry was away. the nesting is the "nested providers" concept. The new features are described (in brief) at https://storyboard.openstack.org/#!/story/200557521:07
cdentzzzeek: edleafe has done some interesting work on switching to a graph database (which is a better fit for some of these things) but we're in the standard position of "we've got this other stuff already"21:07
zzzeekcdent: so for now are you looking at...adjacency list schema ?21:09
cdentzzzeek: no explicit decisions yet. the linked email thread has some ideas. But the lack of decisions is why I was saying we'll probably want some input, but not quite yet. It's more of a heads up for a later discussion rather than a now discussion21:10
cdenton top of that: solutions for the existing features are going to need some work to scale into the 100s of thousands of resource providers21:11
*** slaweq has joined #openstack-nova21:11
cdentbut we need to do some accurate measuring first21:11
zzzeekedleafe is dying to get off SQL :P21:19
zzzeek:)21:19
edleafezzzeek: Just for this case.21:19
zzzeekedleafe: yeah graph DBs are awesome21:19
zzzeekedleafe: huge crazy new dependencies not as much :)21:19
edleafezzzeek: I got nested providers working in a couple of days at the summit last week21:20
zzzeekedleafe: might be one of those cases where you have multiple backends21:20
edleafeI just use Neo4j in docker :)21:20
zzzeekedleafe: we still have to have RPMs that build it out, we have memory requirements that java VM adds some weight towards, etc21:22
zzzeekedleafe: by "we" I mean red hat21:22
edleafezzzeek: Sure, I understand all that. I just want to show it working, and solving the problems that we've spent years trying to solve using SQL. If that gets done and enough people feel that this is the way to go, then we can worry about what is needed for build/deploy21:23
*** jaypipes has quit IRC21:24
zzzeekedleafe: for small graphs, I use adjacency list.  if you need a huge deep graph every time, then that won't work21:24
*** slaweq has quit IRC21:24
*** jaypipes has joined #openstack-nova21:24
jaypipeszzzeek: apologies, having net issues... reading back21:25
cdentI think our model is many trees, each <= 7 levels deep, not very broad21:25
edleafecdent: for nested, agree. Shared providers is the opposite21:26
cdentdepends on how we model the shared association. in a graph db, yes21:27
cdentI tend to like shared as group, not tree21:28
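
For readers unfamiliar with the adjacency list approach zzzeek mentions, here is a minimal sketch using SQLite from the Python standard library. The `providers` table, its columns, and the sample rows are made up for illustration and are not placement's real schema. The recursive query walks one shallow tree, in the spirit of edleafe's example of compute nodes containing NUMA nodes containing memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each row only knows its direct parent.
conn.execute(
    "CREATE TABLE providers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES providers(id))"
)
conn.executemany(
    "INSERT INTO providers VALUES (?, ?, ?)",
    [
        (1, "compute1", None),
        (2, "numa0", 1),
        (3, "numa1", 1),
        (4, "numa0-mem", 2),
        (5, "compute2", None),
    ],
)


def subtree(connection, root_id):
    """Return (name, depth) for the root and all of its descendants."""
    return connection.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM providers WHERE id = ?
            UNION ALL
            SELECT p.id, p.name, tree.depth + 1
            FROM providers AS p JOIN tree ON p.parent_id = tree.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


compute1_tree = subtree(conn, 1)
```

This kind of query stays manageable for many shallow trees; as zzzeek notes, it is a poor fit when every request needs a huge, deep graph.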
openstackgerritEric Fried proposed openstack/os-resource-classes master: Propose ACCELERATOR_{FPGA|GPU} resource classes  https://review.opendev.org/65746421:35
edleafecdent: I was just commenting on the trees reference. In the graph, sharing is just another relation. With Many:1 sharing, it kind of looks like a flower, not a tree: https://bit.ly/2VUI52U21:36
jaypipeszzzeek: set global wsrep_causal_reads=1; <-- do that if you want synchronous replication behaviour.21:38
zzzeekjaypipes: OK so you can confirm that compare and swap can fail for multimaster if that's not set ?21:38
zzzeekjaypipes: this might be what's needed for https://bugs.launchpad.net/nova/+bug/182137321:39
openstackLaunchpad bug 1821373 in OpenStack Compute (nova) "Most instance actions can be called concurrently" [Undecided,New]21:39
jaypipeszzzeek: I'm not 100% sure, but I believe so. the issue, however, is that the compare-and-swap technique should be done to increment a field and check that field value is at a previous read-view in the WHERE clause. in other words, doing things like UPDATE instances SET status = 'active' WHERE status IN (<list of statuses>) AND instances.uuid = ? is inherently not as safe/efficient/retryable-with-confidence as UPDATE instances SET generation = ? + 1 WHERE generation = ? AND uuid = ?21:43
*** _hemna has joined #openstack-nova21:44
jaypipeszzzeek: that's the fundamental problem IMHO with the whole "check my instance status is in these list of states and set the status to X" checks that nova does.21:44
zzzeekjaypipes: if the UPDATE included a unique version or timestamp of some kind does that help?  if galera sees two UPDATE statements setting it to a different value ?21:45
jaypipeszzzeek: maybe? :) galera already sends around essentially a generation for the innodb records affected by a transaction writeset, AFAIK. I just think doing the "compare" part of the compare-and-swap functionality with a "loose match" like "status IN <....>" isn't as good as comparator that was specifically designed to inform the caller that "yes, someone else changed this record since you last read a view of it". hope that makes sense.21:48
jaypipeszzzeek: this is why, in placement land, we always do the compare-and-swap using the `UPDATE tbl SET generation = ($LAST_READ_GENERATION + 1) WHERE pk = $PK AND generation = $LAST_READ_GENERATION` strategy21:50
zzzeekjaypipes: the specific case in that issue we are looking for an existing status of NULL21:50
jaypipeszzzeek: yeah, but AFAIK, the code you're describing isn't really a compare-and-swap. it's more just a "hey, check that I'm not, say, in the process of deleting this instance when I try to unshelve it"21:51
zzzeekjaypipes: well yes I was saying, it's more reliable if we are setting it to a new value, however, if two transactions on different nodes hit it at the same time they will see the same LAST_READ_GENERATION value21:51
jaypipeszzzeek: yes, they will.21:52
jaypipeszzzeek: and if both attempt to update, one will fail of course.21:52
zzzeekjaypipes: one fails because of the SET clause specifically ?21:53
jaypipeszzzeek: no, one will fail because WHERE generation = $LAST_READ_GENERATION will fail.21:54
jaypipesto return a row.21:54
jaypipeszzzeek: so, the SQL won't fail, per-se, it's just the transaction will return 0 rows affected.21:54
jaypipeszzzeek: which is the thing we look for to trigger a rollback of the entire transaction.21:55
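
The generation-based compare-and-swap jaypipes describes can be tried out on a single SQLite database. This sketch shows only the "0 rows affected" signal; it says nothing about Galera replication or certification. The `instances` table with a `generation` column and the `set_status` helper are hypothetical, since the conversation itself notes that nova's status checks do not work this way.

```python
import sqlite3


def set_status(connection, uuid, new_status, last_read_generation):
    """Update only if nobody bumped the generation since we last read it.

    Returns True when the swap happened, False when 0 rows were affected,
    which is the caller's cue to roll back and retry or give up.
    """
    cursor = connection.execute(
        "UPDATE instances SET status = ?, generation = ? "
        "WHERE uuid = ? AND generation = ?",
        (new_status, last_read_generation + 1, uuid, last_read_generation),
    )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " uuid TEXT PRIMARY KEY, status TEXT, generation INTEGER NOT NULL)"
)
conn.execute("INSERT INTO instances VALUES ('abc', 'active', 0)")

# Two callers both read generation 0 and try to set the *same* status.
first_caller_won = set_status(conn, "abc", "shelving", 0)
second_caller_won = set_status(conn, "abc", "shelving", 0)
final_row = conn.execute(
    "SELECT status, generation FROM instances WHERE uuid = 'abc'"
).fetchone()
```

Because the comparison is on the generation rather than on the status value, the second caller loses even though it was trying to write an identical status.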
zzzeekjaypipes: but what if one UPDATE processes, sends out the writeset which includes the new value, however the other node gets an UPDATE, due to replication latency it also sees the same value, also emits an UPDATE, no failure21:55
zzzeekbut updates the row21:55
zzzeekbecause the new value wasn't there yet b.c. no wsrep_causal_reads21:55
zzzeekjaypipes: this gets into, I have no idea how the galera writeset certification works21:56
zzzeekjaypipes: I would think that certification should be, transaction modified this row, this other transaction is modifying the same generation of that row, so it fails21:56
zzzeekwhich means this issue is non-existent21:56
zzzeeke.g. mvcc generation21:56
jaypipesthat is essentially how it works, yes.21:57
jaypipeshttps://github.com/openstack/placement/blob/master/placement/objects/resource_provider.py#L96021:57
*** rcernin has quit IRC21:58
zzzeekjaypipes: OK.  So, if the SHELVE thing is looking explicitly for NULL and changes to SHELVING as described in the launchpad, no failure ?  how does it fail ?21:58
jaypipesthe wsrep_causal_reads is about reads only. writes that attempt to update the same record when a diff trx changed that record are never allowed.21:58
zzzeekjaypipes: right that's what I sort of thought21:59
jaypipeszzzeek: apologies, I haven't read the bug report yet.21:59
jaypipeslemme do that now. one minute.21:59
jaypipeszzzeek: I would take issue with mdbooth_'s statement "This is intended to act as a robust gate against 2 instance actions happening concurrently." :)22:00
jaypipesit's not a robust gate at all.22:00
jaypipesit's a super coarse-grained check22:01
zzzeekjaypipes: OK he is stumped on this and I dont really know the details of this system, I just wrote the UPDATE statement four years ago22:01
zzzeekdo you have anything you can add to that launchpad?22:01
jaypipesin fact, it's not a gate at all. it's nothing more than a very simple sanity check that exists outside of any transactional context AFAIK22:01
jaypipeszzzeek: yeah, I will add a note to it.22:02
zzzeekjaypipes: really?   it doesnt seem that way to me, I assume this is on enginefacade and there should be a tx22:02
jaypipeszzzeek: one would assume that. I have no real way of verifying it's in the same trx though.22:02
*** tssurya has quit IRC22:03
*** tssurya_ is now known as tssurya22:03
* zzzeek has to go do friday stuff22:04
zzzeekthanks for the chat jaypipes22:04
*** bbowen has quit IRC22:06
*** slaweq has joined #openstack-nova22:11
*** _hemna has quit IRC22:17
*** cdent has quit IRC22:20
*** mlavalle has quit IRC22:24
*** slaweq has quit IRC22:24
openstackgerritEric Fried proposed openstack/os-resource-classes master: Propose ACCELERATOR_{FPGA|GPU} resource classes  https://review.opendev.org/65746422:33
*** whoami-rajat has quit IRC22:39
*** _hemna has joined #openstack-nova22:44
*** macza has quit IRC22:45
*** macza has joined #openstack-nova22:47
openstackgerritSundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification.  https://review.opendev.org/60395522:54
*** lbragstad has quit IRC22:58
*** jaypipes has quit IRC23:01
*** macza has quit IRC23:11
*** slaweq has joined #openstack-nova23:11
*** _hemna has quit IRC23:18
openstackgerritMerged openstack/nova master: Add ironic driver image type capabilities  https://review.opendev.org/65572923:22
openstackgerritMerged openstack/nova master: Add vmware driver image type capabilities  https://review.opendev.org/65573023:22
*** mlavalle has joined #openstack-nova23:24
*** slaweq has quit IRC23:24
*** gyee has quit IRC23:26
*** Swami has quit IRC23:32
*** xek has quit IRC23:36
