Thursday, 2022-01-13

<admin1> hi all  04:12
<admin1> my instances are all in SHUTOFF state after a reboot  04:12
<admin1> and they are not booting up  04:12
<admin1> any pointers on how to start them?  04:12
<admin1> in the database/GUI, they are shown as ACTIVE  04:12
<opendevreview> Ghanshyam proposed openstack/nova master: Move rule_if_system() method to base test class  https://review.opendev.org/c/openstack/nova/+/824475  05:31
<opendevreview> Ghanshyam proposed openstack/nova master: Server actions APIs scoped to project scope  https://review.opendev.org/c/openstack/nova/+/824358  05:47
*** tbachman_ is now known as tbachman  07:31
*** hemna8 is now known as hemna  07:38
<gibi> chateaulav: no worries. I should have noticed that it was in the wrong place  07:50
*** bhagyashris_ is now known as bhagyashris  07:52
*** bhagyashris__ is now known as bhagyashris  08:07
*** bhagyashris_ is now known as bhagyashris  09:20
<sean-k-mooney[m]> admin1: there is a config option to resume guests on host reboot  09:22
<sean-k-mooney[m]> but if the system has been up for a while and the periodic task has run, it has probably updated the DB to mark them as shut down by now  09:23
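(The option referred to is most likely nova's [DEFAULT] resume_guests_state_on_host_boot. Once the periodic task has synced the instances to SHUTOFF in the API, something like the following openstacksdk sketch could start them again; the cloud name "mycloud" and the admin credentials in clouds.yaml are assumptions.)

    # A minimal sketch, assuming openstacksdk is installed and clouds.yaml
    # defines an admin cloud named "mycloud" (both assumptions), and that
    # the periodic task has already marked the instances SHUTOFF.
    import openstack

    conn = openstack.connect(cloud="mycloud")
    for server in conn.compute.servers(all_projects=True, status="SHUTOFF"):
        print(f"starting {server.id}")
        conn.compute.start_server(server)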
<sean-k-mooney[m]> chateaulav: I'm +1 on your spec, but I have a few comments; if you address them I'm +2  09:53
<sean-k-mooney> bauzas: do you have time to revisit https://review.opendev.org/c/openstack/nova-specs/+/824191 ?  10:52
<sean-k-mooney> kashyap: I might just fix the nits in https://review.opendev.org/c/openstack/nova-specs/+/824053 myself, assuming you are not working on them?  10:53
<kashyap> sean-k-mooney: Mornin'  10:53
<sean-k-mooney> morning :)  10:53
<kashyap> sean-k-mooney: Sorry, lemme just do it right away  10:54
<sean-k-mooney> cool, we can review it quickly when you push and re-approve  10:54
<kashyap> sean-k-mooney: Why does Sylvain suggest "Previous-approved: Yoga" in the commit message?  Isn't it *for* Yoga?  10:57
<sean-k-mooney> they meant Xena  10:57
<sean-k-mooney> if it's just a re-approval we typically reference the last release it was approved for, but that is not strictly required  10:58
<sean-k-mooney> https://review.opendev.org/c/openstack/nova-specs/+/799096 is the most recently approved version  10:58
<kashyap> Ah, nod  10:58
<sean-k-mooney> which had "Previous-approved: Train"  10:59
<kashyap> Yep, fixin'  11:00
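(For illustration, the convention being described: a re-proposed spec's commit message carries a footer naming the last release the spec was approved for. A reconstruction from this conversation, not a quote from the actual commit:

    Repropose "CPU selection with guest hypervisor consideration"

    Previous-approved: Xena
)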
<opendevreview> Kashyap Chamarthy proposed openstack/nova-specs master: Repropose "CPU selection with guest hypervisor consideration"  https://review.opendev.org/c/openstack/nova-specs/+/824053  11:08
<kashyap> sean-k-mooney: --^  11:08
<sean-k-mooney> bauzas: I have a pep8 issue in the spec anyway; I need to wrap the example output in a code block to make sphinx happy  11:21
<sean-k-mooney> bauzas: so I'll just address your comments now  11:21
*** bhagyashris_ is now known as bhagyashris  11:21
<sean-k-mooney> I'll also drop the log filter stuff for now  11:21
*** bhagyashris_ is now known as bhagyashris  11:45
<bauzas> sean-k-mooney: all good, ping me when you're done  11:53
<bauzas> sean-k-mooney: and I was on https://review.opendev.org/c/openstack/nova-specs/+/824191 :)  11:53
<sean-k-mooney> I'm more or less done, just tweaking some formatting to make it render nicer  11:54
<opendevreview> Dmitrii Shcherbakov proposed openstack/nova master: [yoga] Add PCI VPD Capability Handling  https://review.opendev.org/c/openstack/nova/+/808199  11:57
<opendevreview> sean mooney proposed openstack/nova-specs master: add per process healthcheck spec  https://review.opendev.org/c/openstack/nova-specs/+/821279  12:05
<sean-k-mooney> bauzas: ^  12:05
<sean-k-mooney> bauzas: also kashyap's spec https://review.opendev.org/c/openstack/nova-specs/+/824053  12:06
*** dasm|off is now known as dasm  12:08
<opendevreview> Merged openstack/nova-specs master: lightos volume driver spec  https://review.opendev.org/c/openstack/nova-specs/+/824191  12:20
<opendevreview> Iago Filipe proposed openstack/nova master: Remove deprecated opts from VNC conf  https://review.opendev.org/c/openstack/nova/+/824478  12:41
<opendevreview> Jonathan Race proposed openstack/nova-specs master: Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova-specs/+/824044  13:23
<chateaulav> sean-k-mooney: thanks for the feedback, changes have been made.  13:26
<chateaulav> patchset 6 submitted  13:26
<sean-k-mooney> chateaulav: +2 from me. gibi, can you revisit when you have time?  13:36
<gibi> on my list for today  13:39
<sean-k-mooney> thanks. I have pushed the latest version of the health check spec. I'm going to work on something else for a bit, but if anyone has questions, ping me or leave them inline in the spec and I'll try to be responsive to them  13:42
<bauzas> sean-k-mooney: looking at your last rev for the hc  13:48
<bauzas> should be easy peasy  13:48
<bauzas> chateaulav: you're my next jab  13:49
<opendevreview> Merged openstack/nova-specs master: Repropose "CPU selection with guest hypervisor consideration"  https://review.opendev.org/c/openstack/nova-specs/+/824053  14:07
<gibi> sean-k-mooney: ack, you are on my list too (just meeeetings)  14:09
*** akekane_ is now known as abhishekk  14:29
<opendevreview> Merged openstack/nova-specs master: Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova-specs/+/824044  14:40
<gibi> sean-k-mooney: I have a couple of comments on the health check spec https://review.opendev.org/c/openstack/nova-specs/+/821279 ; feel free to ping me if clarification is needed  14:51
<sean-k-mooney> I will take a look now  14:51
<gibi> ok  14:51
<sean-k-mooney> regarding /health  14:52
<sean-k-mooney> those are referring to two different API endpoints  14:52
<sean-k-mooney> the first instance is saying I will expose /health on the new TCP endpoint added by this spec  14:52
<sean-k-mooney> the second, in the REST API impact section,  14:52
<gibi> OK, I see  14:53
<sean-k-mooney> is saying I will not expose it on the nova-api endpoint  14:53
<gibi> makes sense  14:53
<sean-k-mooney> regarding the versioning  14:54
<sean-k-mooney> if it would make you feel better, I can just bump the version every time we add a possible check  14:54
<sean-k-mooney> e.g. the minor version  14:54
<sean-k-mooney> that would make it more consistent  14:54
<gibi> I'm good either way, if the rules for bumping are clear  14:54
<sean-k-mooney> I was just going to have the keys of the checks sub-dictionary unversioned  14:54
<gibi> that is fine  14:55
<gibi> then let's state that  14:55
<gibi> checks is unversioned, the rest is semver  14:55
<sean-k-mooney> well, that's the thing: I want the values of the checks to be versioned  14:55
<sean-k-mooney> just not the keys  14:55
<sean-k-mooney> so you can rely on the format of the checks  14:55
<sean-k-mooney> but the set of names is open  14:56
<gibi> aah  14:56
<sean-k-mooney> if we want to version everything in the response, the only delta from what I have now is that the set of check names would be versioned too  14:56
<sean-k-mooney> which I can do, but I'm not sure what all of them are yet  14:56
<gibi> then I would vote for versioning the set of check names too  14:57
<sean-k-mooney> ok  14:57
<sean-k-mooney> I'll make that update and fix the typo  14:57
<gibi> OK, thanks  14:57
<gibi> I will +2 it  14:57
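(Purely illustrative: one possible shape for the per-process health check response just discussed. The field and check names below are assumptions drawn from this conversation, not from the merged spec.)

    # Illustrative only: hypothetical check names and fields. Per the
    # agreement above, the whole response, including the set of check
    # names, is versioned; adding a check bumps the minor version.
    health_response = {
        "version": "1.1",  # semver
        "checks": {
            "rabbitmq_connection": {"status": "ok"},  # hypothetical check
            "database_connection": {"status": "ok"},  # hypothetical check
        },
    }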
<opendevreview> sean mooney proposed openstack/nova-specs master: add per process healthcheck spec  https://review.opendev.org/c/openstack/nova-specs/+/821279  15:08
<sean-k-mooney> gibi: there you go ^  15:08
<sean-k-mooney> if we are generally aligned on that, I might address any other nits, if there are some, in a follow-up  15:09
<gibi> sean-k-mooney: I'm +2  15:10
<gibi> thank you  15:10
<sean-k-mooney> no worries :)  15:10
<sean-k-mooney> I should probably start implementing that soon, likely next week  15:10
<bauzas> ++  15:10
<bauzas> if nothing controversial, gibi please +2/+W it then  15:11
<bauzas> or I can jab this spec again  15:11
<sean-k-mooney> bauzas: if you can, that would be great  15:11
<gibi> I will go back to it before I leave today and +W it if bauzas didn't yet  15:12
<sean-k-mooney> TheJulia: given today is spec freeze, are you planning to rework https://review.opendev.org/c/openstack/nova-specs/+/815789 ?  15:13
<bauzas> sean-k-mooney: oki doki, hold your brace  15:13
<TheJulia> sean-k-mooney: I didn't create that and I've never seen it before  15:14
<sean-k-mooney> oh, it's the spec for https://review.opendev.org/c/openstack/nova/+/813897, no?  15:15
<TheJulia> sean-k-mooney: I've been treating the issue as a bug, but I've been beyond slammed recently  15:15
<sean-k-mooney> ok, we might want to put this into the compute team backlog downstream and take it over to land it next cycle  15:16
<sean-k-mooney> given we are changing the API behavior and the rebalance semantics, upstream we were considering it a feature  15:16
<TheJulia> well, that is what it calls for  15:17
<TheJulia> at least, from a glance at the spec  15:17
<TheJulia> the bug can be solved by just doing the needful and not changing the API  15:17
<sean-k-mooney> it's not changing the API syntactically  15:18
<sean-k-mooney> I guess the side effect is not necessarily visible  15:18
<sean-k-mooney> so you could argue it's not changing it semantically either  15:18
<TheJulia> yeah, essentially invisible unless someone goes looking for it, and even then it could have been a migration in a VM context for all anyone looking at the API knows  15:18
<sean-k-mooney> if others are OK with this as a pure bugfix, and we change your initial patch to look at the disabled state rather than up/down, then I could buy that  15:19
<TheJulia> I think it was already doing that, but I've literally forgotten exactly what is in the patch, since I've not been able to pull it back up since November  15:19
<sean-k-mooney> ack, I think you didn't in v1, but I'll admit I similarly don't know what's in v5  15:20
<TheJulia> or disabled was contextually redundant to the data on hand, or something like that  15:20
<sean-k-mooney> well, one of the concerns we had was making sure this only happens if the operator opts in  15:21
<TheJulia> I was also going for keeping it as lightweight on the DB as possible, given multi-thousand-node clusters will make things cry regardless  15:21
<sean-k-mooney> and the way to do that was to delete the failed service or disable it  15:21
<TheJulia> Yeah, and I think the operator opt-in is where I never got to  15:21
<TheJulia> opt-in in general is kind of crap for our end users, since they need to be aware; otherwise they are exposed to... not great behavior  15:22
* TheJulia shrugs  15:22
<sean-k-mooney> right, but arbitrarily changing instance.host, effectively at any time, is not something we wanted to allow either  15:22
<sean-k-mooney> we were more or less OK with it in a failure or maintenance event  15:22
<TheJulia> it is not actually that arbitrary; it would be during a hash ring rebalance, which is not an arbitrary event in the first place  15:23
<TheJulia> which would have been triggered by a failure or maintenance event  15:24
<sean-k-mooney> true. I'm a little worried about what happens in the case of a network partition, but I'll try to review this again and see what the current patch does  15:24
<TheJulia> maybe we're using the same words with different causes in mind  15:24
<TheJulia> computes don't directly interact with the DB, do they?  15:25
<TheJulia> in terms of an open socket to it?  15:25
<sean-k-mooney> no, they connect via the conductor  15:25
<sean-k-mooney> so they make RPCs, which the conductor then executes against the DB  15:25
<TheJulia> so whatever has conductor connectivity would win, and ultimately the partitioned cluster would cease to really function, but would still be able to carry the requests if they get them via another route  15:25
<TheJulia> that is, as long as it doesn't hit the DB  15:26
<sean-k-mooney> well, the case I was thinking of is one where the compute can't connect to rabbit temporarily and the heartbeat expires  15:26
<TheJulia> at which point, deadlocked compute process  15:26
<sean-k-mooney> it gets marked as down  15:26
<sean-k-mooney> you start to rebalance, and it heartbeats and is marked as up  15:26
<TheJulia> well, the object access is over rabbit, yes?  15:27
<sean-k-mooney> we don't want to continually rebalance in that case, when it's flapping  15:27
<TheJulia> wouldn't the process just halt on the DB queries through rabbit in that case?  15:27
<TheJulia> since they are remoted objects  15:27
<sean-k-mooney> it depends on what's happening  15:28
* bauzas just catching up on the convo  15:28
<sean-k-mooney> but the compute service, if the rabbit connection drops, would continue running normally, and then if it needed to do something with the conductor it would retry  15:28
<sean-k-mooney> if the connection to rabbit is re-established, the service will continue to work as normal  15:29
<TheJulia> yes, but if we're pulling a list of things, or even a reference to an object, through rabbit to satisfy even the existing logic, then wouldn't that compute be stuck at that point anyway, until rabbit connectivity is re-established and new requests can be picked up off the message bus?  15:29
<TheJulia> yes, I think we're on the same page about connectivity with rabbit  15:30
<sean-k-mooney> so ideally, if this drops and reconnects in the space of, say, 5 seconds  15:31
<sean-k-mooney> that should not trigger a rebalance  15:31
<sean-k-mooney> with the patch as written, I think we could get unlucky in that case, yes?  15:31
<TheJulia> no; isn't the liveness a remote object save to the DB every 30 seconds?  15:31
<sean-k-mooney> um, yes, I think that is the default interval  15:32
<TheJulia> so losing connectivity briefly wouldn't cause the machine to drop from the list returned from the DB  15:32
<TheJulia> unless the conductor is watching like a hawk and overriding the table contents  15:32
<sean-k-mooney> in the general case, yes. If it's a little more protracted, then it would  15:33
<TheJulia> yes, but at that point the compute is out of service or unable to be used  15:33
<sean-k-mooney> yep  15:34
<TheJulia> I know that in ironic we don't drop it from the working list until after 3 failures have occurred, but I'd need to check the code in nova  15:34
<sean-k-mooney> I think it's similar  15:34
<sean-k-mooney> we don't mark it as down until multiple heartbeats are missed  15:34
<TheJulia> which means it wouldn't be disqualified, I think, until 90 seconds have actually passed  15:34
<sean-k-mooney> I think it's also 3  15:34
<sean-k-mooney> ya  15:34
<sean-k-mooney> how long does rebalancing typically take?  15:35
<sean-k-mooney> is it seconds or minutes? I assume the former  15:35
<TheJulia> seconds for the actual mapping at worst; the actual updates and iterating through them is the painful part  15:35
<sean-k-mooney> without using disabled as a distributed lock  15:35
<sean-k-mooney> how do you prevent the compute, when it comes back, from racing with the rebalance?  15:36
<sean-k-mooney> that was the other use case for it  15:36
<TheJulia> I don't think we've ever measured a startup rebalance on a large site in ironic, but I've generally heard of a couple of minutes  15:36
<TheJulia> sean-k-mooney: the code, as I remember it, would check to see if the prior compute is back, and then break out of the rebalance apply loop if so  15:37
<TheJulia> since surely in a little bit it was going to undo some of what it just did, since the hash ring is back to what it was prior to the failure  15:37
<sean-k-mooney> that makes sense  15:38
<TheJulia> an "oh, the universe just changed on us, stop what we're doing!"  15:38
<TheJulia> check  15:38
<sean-k-mooney> yep, but it needs to also roll back, so likely everything needs to be in one big transaction  15:38
<sean-k-mooney> so we either get the rebalanced state or the old state  15:39
<TheJulia> a giant transaction would actually harm the ability to understand what is going on, because it also checks current record state; if that is effectively hidden in a giant transaction which takes a minute or two to commit, then we're introducing more issues for ourselves  15:43
<sean-k-mooney> it should not take minutes to commit  15:44
<sean-k-mooney> we should just compute the desired end state, then commit the change in one transaction  15:44
<TheJulia> a couple thousand specific column field updates in a heavily used table?  15:44
<sean-k-mooney> you just need to use a CASE statement to update instance.host on all the instances in one go  15:44
<TheJulia> eh... then again, performance has changed drastically since the days I did that often manually  15:45
<sean-k-mooney> so I think it's just one column on one DB table, with many rows  15:46
<sean-k-mooney> that should be quick  15:46
<TheJulia> eh, the last time I did something like that manually, like 4 thousand rows in a 100k-row table, it was a couple of minutes; but again, that was like a decade ago  15:46
<TheJulia> Anyway, I'd need to double-check a few things, but a transaction, at least to me, seems more harmful in that the compute node's rebalance is hidden state, so we could get a bunch of conflicting transactions piling up, all failing past the first one, and thrashing to try and correct some of it as time goes on, consistency-wise  15:47
<sean-k-mooney> well, if we don't have a transaction, you will need to roll back the partially rebalanced state  15:49
<sean-k-mooney> manually, right  15:49
<sean-k-mooney> or trigger a full rebalance again  15:50
<sean-k-mooney> to even out the load on the compute services  15:50
<opendevreview> Merged openstack/nova-specs master: add per process healthcheck spec  https://review.opendev.org/c/openstack/nova-specs/+/821279  15:50
<TheJulia> well... a transaction could work: try to generate the updates, then check whether things have changed and only commit if they have not; but I'd need to dig into the dbapi internals, since doing that likely means RPC functionality would be required to batch it all up properly outside of the normal transactions oslo_db drives  15:52
<TheJulia> since we're doing it one at a time instead of as a bulk change  15:52
<sean-k-mooney> we would need a DB method for this, yes; not just loop over the instances, set the field and call save  15:54
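(A minimal sketch of that idea in SQLAlchemy Core, not nova's actual DB API: the "instances" table is cut down to two columns and the rebalance mapping is hypothetical. The desired end state is computed first, then applied as a single CASE-statement UPDATE inside one all-or-nothing transaction.)

    # A minimal sketch, assuming SQLAlchemy >= 1.4; table and data are
    # illustrative stand-ins for nova's instances table.
    from sqlalchemy import Column, MetaData, String, Table, case, create_engine

    engine = create_engine("sqlite:///:memory:")  # stand-in for the nova DB
    metadata = MetaData()
    instances = Table(
        "instances", metadata,
        Column("uuid", String, primary_key=True),
        Column("host", String),
    )
    metadata.create_all(engine)

    # Hypothetical output of the hash ring calculation: uuid -> new host.
    rebalanced = {"uuid-1": "compute-2", "uuid-2": "compute-3"}

    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(instances.insert(), [
            {"uuid": "uuid-1", "host": "compute-1"},
            {"uuid": "uuid-2", "host": "compute-1"},
        ])
        # One UPDATE with a CASE over the uuid column moves every affected
        # instance in a single statement and a single transaction.
        conn.execute(
            instances.update()
            .where(instances.c.uuid.in_(list(rebalanced)))
            .values(host=case(rebalanced, value=instances.c.uuid))
        )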
<TheJulia> I guess I'm worried about overthinking it and then over-engineering it, versus whether a model of eventual consistency is a happier place to be. I'm very much of the eventual-consistency mindset, since for ironic it *really* doesn't matter which node proxies the request, as long as it is online  15:54
<TheJulia> it all matters for RPC request routing, ultimately  15:54
<sean-k-mooney> I wonder if it would be better for ironic to have a single shared topic queue  15:54
<TheJulia> what would that change/gain us in this situation?  15:55
<sean-k-mooney> so that all the computes share one queue, and any of them could dequeue from it  15:55
<sean-k-mooney> it would mean instance.host would not matter anymore  15:55
<TheJulia> hmm, that could entirely do away with the hash ring  15:55
<TheJulia> well, kind of  15:55
<TheJulia> hmmmm, it would have to know it is ironic  15:56
<TheJulia> but that could actually be navigated upgrade-wise too  15:56
<TheJulia> that doesn't fix the UX issues, which are ultimately bugs in past releases, though  15:56
<sean-k-mooney> yes, but we do know the hypervisor type, at least on the compute node side; I'm not sure about the instance object  15:56
<TheJulia> I don't think it is on the instance object  15:57
<TheJulia> but again, I've not looked at its structure in a while  15:57
<sean-k-mooney> well, the way to hack it is to decide instance.host would always be set to "ironic" or similar for ironic-controlled instances  15:58
<sean-k-mooney> anyway, the important thing is that you still care about fixing this issue  15:58
<sean-k-mooney> but don't currently have time to work on it  15:58
<sean-k-mooney> so we should consider whether we (the Red Hat compute team) have capacity to help this cycle or next  15:59
<sean-k-mooney> we could land your patch as is, but I'm not sure it's the best approach long-term; it would stop the bleeding in the short term though  16:00
<sean-k-mooney> which is the whole "perfect is the enemy of good enough" argument  16:00
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: trigger nova-next with new tempest test  https://review.opendev.org/c/openstack/nova/+/824607  16:40
<TheJulia> sean-k-mooney: yeah. :(  16:43
<TheJulia> sorry, been distracted looking at a blocker issue  16:43
<admin1> hi all .. how do I figure out what puts messages in versioned_notifications.info when there are no consumers?  16:53
<admin1> so the queue just grows and grows .. and then I have to manually delete it  16:54
<gibi> admin1: you can disable notifications  16:54
<admin1> gibi, from where/how?  16:55
<gibi> sec...  16:55
<gibi> admin1: https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_notifications.driver  16:56
<gibi> admin1: so in the nova service configuration files  16:56
<gibi> admin1: set [oslo_messaging_notifications]driver=noop  16:56
gibisean-k-mooney: I don't think so16:59
sean-k-mooneythe noop driver is the defult16:59
gibisean-k-mooney: I think the default is messaging_v216:59
sean-k-mooneyi dont think so16:59
gibihm16:59
gibiinteresting16:59
gibiaccording to the doc you are right16:59
gibiI alwas remembered it is enable by default16:59
sean-k-mooneydevstack enables it by default17:00
sean-k-mooneyand it also defaluts to unversioned if i remember17:00
gibiyes, the unversioned is the default17:00
gibiwe never switched it to versioned17:00
sean-k-mooneywe really should17:00
sean-k-mooneyand eventurlly remove the deprecated unversioned notifications17:01
sean-k-mooneythey have been deprecated since mitaka17:01
<gibi> sean-k-mooney: the problem is that there are OpenStack services using the unversioned ones  17:01
<gibi> so we would need to migrate them first  17:01
<sean-k-mooney> ya, but honestly we should propose that as a cross-project goal  17:01
<sean-k-mooney> the problem really is having people to do the work  17:02
<sean-k-mooney> maybe a topic for the next PTG  17:02
<sean-k-mooney> do we know which ones rely on it?  17:02
<gibi> I have to dig  17:03
<gibi> ceilometer is one, I'm sure  17:03
<sean-k-mooney> I suspect ceilometer, cloudkitty, heat or masakari are the only ones that might be impacted  17:03
<sean-k-mooney> maybe watcher  17:03
<sean-k-mooney> as long as we are not adding any new unversioned notifications, I guess it does not hurt us too much  17:04
<gibi> I agree, it does not hurt  17:04
<gibi> and we have a test in place to forbid new unversioned notifications  17:05
<bauzas> yuval: around? I don't see a blueprint filed for https://review.opendev.org/c/openstack/nova-specs/+/824191  17:10
<yuval> I am here  17:10
<bauzas> yuval: could you please create one using the link you gave, i.e. https://blueprints.launchpad.net/nova/spec/nova-support-lightos-driver  17:11
<yuval> yes, I created a blueprint  17:11
<bauzas> ?  17:11
<yuval> just a second  17:11
<bauzas> ok, then the URL is wrong  17:11
<bauzas> we can change it  17:11
<bauzas> yuval: found it: https://blueprints.launchpad.net/nova/+spec/nova-support-lightos-driver  17:12
*** Uggla is now known as Uggla|afk  17:12
<bauzas> oh, yeah, I see  17:12
<bauzas> s/spec/+spec  17:12
<yuval> https://blueprints.launchpad.net/nova/+spec/nova-support-lightos-driver  17:12
<bauzas> yuval: could you create a change fixing the URL in https://review.opendev.org/c/openstack/nova-specs/+/824191/6/specs/yoga/approved/lightos_volume_driver.rst#11 ?  17:13
<bauzas> yuval: I'll fast-approve it  17:13
<yuval> https://blueprints.launchpad.net/nova/spec/nova-support-lightos-driver  17:13
<yuval> this is what is currently in the spec  17:13
<sean-k-mooney> yep  17:13
<yuval> it's missing the "+"  17:13
<sean-k-mooney> it's missing the +  17:13
<sean-k-mooney> yep  17:13
<yuval> before the "spec"  17:13
<bauzas> yup, please update the link in the spec  17:13
<yuval> just a sec  17:14
<bauzas> (18:12:46) bauzas: s/spec/+spec  17:14
<bauzas> in the rst file  17:14
<opendevreview> yuval proposed openstack/nova-specs master: lightos- fix blueprint url  https://review.opendev.org/c/openstack/nova-specs/+/824633  17:34
<yuval> sorry for the wait  17:34
<yuval> had some git issues  17:34
<bauzas> yuval: thanks, and no worries  17:35
<opendevreview> Ghanshyam proposed openstack/nova master: Update centos 8 py36 functional job nodeset to centos stream 8  https://review.opendev.org/c/openstack/nova/+/824637  18:06
<opendevreview> Rajat Dhasmana proposed openstack/nova master: WIP: Add support for volume backed server rebuild  https://review.opendev.org/c/openstack/nova/+/820368  18:17
<opendevreview> Merged openstack/nova-specs master: lightos- fix blueprint url  https://review.opendev.org/c/openstack/nova-specs/+/824633  18:20
<opendevreview> melanie witt proposed openstack/nova master: Remove deprecated opts from VNC conf  https://review.opendev.org/c/openstack/nova/+/824478  19:22
*** lbragstad2 is now known as lbragstad  19:39
*** dasm is now known as dasm|off  21:45
