Wednesday, 2023-09-13

JayFzigo: bobcat is not 2.0. If you have a need on a timeline please put that on the list, although it won't change bobcat it can help decide future priorities 00:51
*** ozzzo1 is now known as ozzzo07:02
zigoJayF: I thought it was 2.0. Yes, I do need SQLA 2.0 support in Bobcat, and will write about it in the list, thanks for your care!07:18
bauzaszigo: the situation is quite simple, most projects try to support the new DSL of 1.407:28
bauzaswhich is considered a bridge release speaking old pre-1.4 and new 2.0 style07:28
bauzasbut then we need some guidance, as for the moment, none of the distros ship SQLA 2.007:29
bauzasit's then a classic chicken-and-egg situation, if everyone awaits the other :D07:29
frickleralso see https://review.opendev.org/c/openstack/requirements/+/879743 for a (possibly incomplete) list of projects that still struggle07:36
zigoWell, for Debian, you should consider that Experimental represents the next OpenStack stable, so SQLA 2.x is the way ...07:38
zigofrickler: Where do I see it in that patch?!?07:38
fricklerzigo: look at the failing cross jobs. I just saw https://review.opendev.org/c/openstack/requirements/+/887261/comments/6b1f2573_60f48f1c has some more detailed analysis of the issues. but this is getting a bit offtopic for the nova channel07:41
zigofrickler: So, if I read it well, we are down to only 4 packages, the biggest concern being heat, but it's probably still working, right?08:04
fricklerheat seems very strongly broken, see https://review.opendev.org/c/openstack/heat/+/886549 and also https://08a8510b7f90952cf4e9-7691b2eb12cb7f63209fe7ead2f9c479.ssl.cf1.rackcdn.com/887261/2/check/openstacksdk-functional-devstack/f9d3c13/controller/logs/screen-h-api.txt08:11
bauzasstephenfin: please, just one more modification needed for sg associations in https://review.opendev.org/c/openstack/nova/+/86082909:10
sean-k-mooneyzigo: so timeline-wise caracal (2024.1) will still be based on ubuntu 22.04 and the D cycle (2024.2) should be based on 24.0409:32
sean-k-mooneyzigo: ^ for nova i would like to merge the final patches so that we can run on 2.0 and im hoping the rest of openstack catches up in caracal09:32
sean-k-mooneybut im not expecting it to become our default until D09:33
sean-k-mooneyalthough i would be fine with nova running most of our jobs with 2.0 provided we still had 1.4 coverage in at least one job09:33
sean-k-mooneyright now we have the opposite situation where we run most of our jobs on 1.4 but have a single job testing 2.0 and we enable all the warnings in our unit/functional tests09:34
sean-k-mooneyonce we are 2.0 compatible it would be really nice if we could make that warning an error by the way09:34
sean-k-mooneyzigo: i dont know if you can take the same approach tumbleweed did and have a second package for SQLA09:36
zigosean-k-mooney: I can't, that's not how Debian works.09:36
sean-k-mooneythey moved the main package to 2.0 and added a second package to track 1.409:36
zigoThere would be a path clash ...09:36
zigo"import sqlalchemy" would need to be rewritten to "import sqlalchemy1" or something ...09:37
sean-k-mooneysure on any one system 09:37
sean-k-mooneyyou could  only have one09:37
zigoPlus many many hacks.09:37
sean-k-mooneybut you could declare one package as conflicting with the other09:37
zigoFirst, that's forbidden by the Debian policy to do that. Second, I'm not packaging for chroot or venvs... packages are supposed to be co-installable.09:38
sean-k-mooneywell there are packages on debian that do that09:38
sean-k-mooneyand yes i know they are meant to be co-installable09:38
sean-k-mooneyi was just giving you options since bobcat wont be 2.0 compatible in its entirety09:38
sean-k-mooneyalthough caracal should be09:39
zigoThat's for packages that are supposed to be different implementations of the same functionality, like gawk vs mawk. This is *NOT* for having multiple versions of the *SAME* package.09:39
zigoSo, your option is unfortunately not an option ... :/09:40
zigosean-k-mooney: Really, my only viable option is to have patches backported from master ...09:40
sean-k-mooneyya where i saw it was different pipewire backend plugins09:40
sean-k-mooneyzigo: so it was not a community goal to get this done this cycle09:42
sean-k-mooneyso they may or may not be written09:42
sean-k-mooneythat said several of us really wanted to get to that point since it was discussed as a possible goal a few times 09:42
zigoThis fact (that it was not a community goal) doesn't change anything about the situation, unfortunately ...09:42
zigoI can live with a broken heat package in Unstable for some time, that's ok-ish, the backport will probably work.09:43
sean-k-mooneywell the question is do you need bobcat to run with 2.0 for debian or is the 2.0 change going to happen later 09:43
zigoBut it has to be taken seriously and it'd be best if we have a patch soon.09:43
zigoI'd have to discuss with the SQLAlchemy maintainer...09:44
zigoThough he's pressing to have the package migrate to Unstable ...09:44
sean-k-mooneywhat im really getting at is what was your original plan and timeline09:44
sean-k-mooneyya so i think it would be good to get it in unstable09:44
sean-k-mooneyso that we can hopefully get it in ubuntu 24.0409:44
zigoMy original plan was to have Bobcat support 2.0, and upload it together with OpenStack when Bobcat is released.09:44
sean-k-mooneyack09:45
zigoI may follow that path even with a broken heat package in Unstable...09:45
sean-k-mooneyfor what its worth i was really really hoping all of openstack would have got to 2.0 compatibility in bobcat too09:45
zigoFYI, that's orthogonal to maintaining the unofficial bookworm-bobcat backport repository that most Debian users will consume instead of unstable/testing.09:45
opendevreviewStephen Finucane proposed openstack/nova master: db: Replace use of backref  https://review.opendev.org/c/openstack/nova/+/86082909:46
opendevreviewStephen Finucane proposed openstack/nova master: objects: Stop fetching from security_groups table  https://review.opendev.org/c/openstack/nova/+/86085009:46
opendevreviewStephen Finucane proposed openstack/nova master: db: Remove unused relationships  https://review.opendev.org/c/openstack/nova/+/89463509:46
zigoAlso, the real question we should ask zzzeek about: how come he's breaking everyone ?!?09:46
zigoThe Linux kernel is *never* breaking userland, I don't see why this must happen every weekend in the Python ecosystem. :(09:46
sean-k-mooneyzigo: this is not zzzeek's fault09:47
zigoI admit I don't have the details (didn't have time to look).09:47
sean-k-mooneyzigo: in fact the 1.4 bridge release is them going well out of their way to provide a smooth upgrade path09:47
WJeffsHey all, is anyone running AMD genoa cpus, that I could query some stuff to them?09:48
zigoWell, there should be no need for an upgrade path, there should be backward compat forever, that's the point I'm making.09:48
sean-k-mooneyzigo: right, which as a software maintainer you should know would be unreasonable to continue in a supported way 09:48
sean-k-mooneyhe could freeze the old apis09:48
sean-k-mooneybut there were fundamental changes to the core 09:49
sean-k-mooneyto use and adapt to new python changes09:49
zigoHow do you explain the Linux kernel does it the proper way then?09:49
sean-k-mooneyand to solve some correctness issues that needed the api to change09:49
zigoAnd the kernel is a *WAY* bigger and complicated...09:49
sean-k-mooneyzigo: by letting the old api bit rot09:50
zigoWell, the current situation shows it was the way to go... :)09:50
sean-k-mooneyim not really sure it does09:51
sean-k-mooneywe are not going to be dropping 1.4 support in openstack for some time09:51
sean-k-mooneyand i dont think zzzeek plans to discontinue it for some time either09:51
sean-k-mooneyyes the fact breaking changes are required is unfortunate09:52
sean-k-mooneythat does not mean we should never do them, just infrequently, when its the best option09:52
zigoDon't get me wrong, I do love SQLAlchemy, though in the past, every upgrade to a minor version has broken all of OpenStack. The switch from 1.4 to 2.x is kind of very painful, as we're discussing. Well, that's *very* bad practice that zzzeek should learn to *not* reproduce ever again.09:53
sean-k-mooneyits actually not that painful to adapt to09:54
sean-k-mooneybut it was not a priority for most of the maintainers09:54
sean-k-mooneyonly a few have wanted to spend time on it and done most of the work09:54
zigoBut the old code should still be working with the new version ...09:54
sean-k-mooneyit is in 1.4 which is a long term support branch to make it work09:54
zigoDoesn't work for me...09:55
sean-k-mooneyzigo: to be clear some of the api had different behavior on different backends due to bugs09:55
zigoOnly a single version will live in the distro, as we discussed...09:55
sean-k-mooneythat could not be fixed without breaking changes09:55
zigoOk.09:55
zigo:)09:55
zigoThis makes a lot more sense then.09:55
sean-k-mooneyits because there is no one SQL09:55
sean-k-mooneythere is TSQL PSQL and whatever mariadb calls it09:56
zigoMaybe zzzeek can help fixing Heat and oslo.db?09:56
sean-k-mooneyoslo.db i think basically works09:56
sean-k-mooneyheat im not sure about09:56
zigoOh, btw, should I revert my packaging to oslo.db 12.3.2 ?09:56
sean-k-mooneystephenfin: ^09:57
zigo(ie: leave 14.0.0 in Experimental, and just bump to 12.3.2 in unstable...)09:57
zigoAnyways thanks a lot for discussing the matter with me, that's very helpful.09:58
zigoAt least, I know what's going on.09:58
sean-k-mooneyhttps://review.opendev.org/q/topic:sqlalchemy-20+status:open09:58
sean-k-mooneythat is the topic with all the pending patches09:58
sean-k-mooneywhich unfortunately still has nova and placement in it09:59
* zigo bookmarked this and will probably use unmerged patches10:06
zigostephenfin: Most patches are from you, so I have to thank you as well ! :)10:09
sean-k-mooneyyep stephenfin went out of their way to try and update projects that did not have people stepping up to do it10:10
sean-k-mooneyunfortunately that still needs each project's core team to review it10:10
bauzassean-k-mooney: again, please understand that we have review priorities10:25
bauzasand sqla 2.0 isn't really a prio10:25
sean-k-mooneybauzas: it really was meant to be10:26
sean-k-mooneyand i find the fact you didnt consider it to be one a problem10:26
sean-k-mooneyit means we are not communicating what our priorities are as a team properly10:27
stephenfinbauzas: I struggle to see how adapting to the new major version of a critical library is anything but a priority, especially when it's being treated as such by every other project team10:44
bauzassorry was at lunch11:53
bauzasI don't want to argue about why we didn't have time to review the series yet, but I'm just trying to merge it before RC111:53
bauzasso, please understand my concerns and the fact that we still also need to review other changes11:54
bauzasstephenfin: will you have time to upload a new PS due to my comment ?12:08
stephenfinI already did, but it's crashing and burning12:09
stephenfinhttps://review.opendev.org/c/openstack/nova/+/860829/312:09
sean-k-mooneythat looks like the change you made is actually the problem12:11
sean-k-mooneyand the previous version was likely correct12:11
stephenfinyup12:11
stephenfinI did test with another reproducer locally but I must have missed something12:11
sean-k-mooneywell its actually failing in tempest too12:12
sean-k-mooneyso its not just a test artifact12:12
sean-k-mooney sqlalchemy.exc.InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'mapped class BlockDeviceMapping->block_device_mapping'. Original exception was: Instance.block_device_mapping and back-reference BlockDeviceMapping.instance are both of the same direction symbol('MANYTOONE').  Did you mean to12:13
sean-k-mooneyset remote_side on the many-to-one side ?12:13
bauzasthat's why I want to be on par with the existing12:14
bauzasI'm very afraid of any performance hit we may introduce if we wrongly do 12:14
sean-k-mooneyit was working before he added the extra changes12:14
bauzasthe problem here is that we don't know if the other-way relationship is needed12:14
sean-k-mooneyyou mean even though it passes tempest, unit test, functional test and api test12:15
sean-k-mooneywithout this change12:15
sean-k-mooneyyou still think we dont actually have a strong enough indication that its not used12:15
sean-k-mooneythe reverse relation would be looking up instance by a security group12:17
sean-k-mooneyhttps://docs.openstack.org/api-ref/compute/#list-security-groups-by-server12:17
sean-k-mooneythe proxy api only allows you to look up security groups of an instance12:17
sean-k-mooneyso we have no public api for the reverse correlation12:18
opendevreviewStephen Finucane proposed openstack/nova master: db: Replace use of backref  https://review.opendev.org/c/openstack/nova/+/86082912:18
opendevreviewStephen Finucane proposed openstack/nova master: objects: Stop fetching from security_groups table  https://review.opendev.org/c/openstack/nova/+/86085012:18
opendevreviewStephen Finucane proposed openstack/nova master: db: Remove unused relationships  https://review.opendev.org/c/openstack/nova/+/89463512:18
bauzassean-k-mooney: yeah, while it was difficult to find whether we use instance.bdm, I just guess that instance.securit_group is far less widely used :)12:21
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/db/main/api.py#L2992-L322812:21
sean-k-mooneylooking at the code there is nothing that would be trying to look up instance by the security group12:22
sean-k-mooneythe closest thing is security_group_in_use12:22
sean-k-mooneyand that just directly uses the SecurityGroupInstanceAssociation table12:22
sean-k-mooneybauzas: anyway looks like stephen removed  foreign_keys=uuid,12:25
sean-k-mooneyso that presumably fixes it locally?12:25
stephenfinyup12:25
sean-k-mooneybut honestly i would have been much happier merging the older version before we added this. i dont want stephen to respin to drop it12:25
stephenfinshouldn't have been there since there's no foreign key column on that side (many-to-one)12:25
sean-k-mooneybut i dont like adding dead code12:25
bauzasthe code is already dead12:28
bauzasstephenfin explained to me that backref implicitly creates a return relationship12:28
bauzasso we already have this, silently12:28
bauzassome "code" may use this implicit relationship for getting the values12:29
bauzaslike, instance.bdm12:29
bauzasfortunately, we have nova objects that are the DB facade12:29
auniyalwhile writing a functional test for server having tags I am getting this error: https://paste.opendev.org/show/bM4cijMNqvGY70FuEDFG/12:39
auniyalI have set     api_version = '2.43'12:39
auniyalbecause its required in CLI12:40
auniyalis there anything else I should look for ?12:40
auniyalbauzas, sean-k-mooney ^12:41
sean-k-mooneyits tags not tag12:49
sean-k-mooneywell there are two ways12:49
sean-k-mooneywhat exactly are you executing12:49
sean-k-mooneyyou can add a tag by doing a put to /servers/{server_id}/tags/{tag}12:50
sean-k-mooneyor you do a put to /servers/{server_id}/tags with {"tags": ["tag1", "tag2"]} as the body12:50
sean-k-mooneyhttps://docs.openstack.org/api-ref/compute/#server-tags-servers-tags12:51
sean-k-mooneyif you're doing this as part of server create its also tags not tag12:52
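A minimal sketch of the two request shapes described above, using only the paths and body quoted in the discussion. The helper names are hypothetical and nothing here talks to a real API; each function just returns the method, path and body a client would send.

```python
import json


def add_single_tag(server_id, tag):
    """Hypothetical helper: add one tag via PUT /servers/{server_id}/tags/{tag}.

    The tag is carried in the URL, so there is no request body.
    """
    return ("PUT", f"/servers/{server_id}/tags/{tag}", None)


def replace_all_tags(server_id, tags):
    """Hypothetical helper: send the whole list via PUT /servers/{server_id}/tags.

    Note the key is "tags" (plural), as pointed out in the chat.
    """
    body = json.dumps({"tags": list(tags)})
    return ("PUT", f"/servers/{server_id}/tags", body)


if __name__ == "__main__":
    print(add_single_tag("abc", "tag1"))
    print(replace_all_tags("abc", ["tag1", "tag2"]))
```

These are server tags; device tags on networks or block devices in a server create request are a separate attribute, which is where the conversation goes next.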
auniyalI am adding tag while creating server - https://paste.opendev.org/show/bJBibgysdZSdO5iMAZEg/13:06
sean-k-mooneythat's not adding a tag to a server13:06
sean-k-mooneythats adding a tag to a network13:06
sean-k-mooneyits not the same thing13:06
auniyalack13:06
auniyalso I am trying device tagging13:07
auniyalonce device is tagged, I'll verify it in metadata.json13:07
sean-k-mooneyper the api docs 13:07
sean-k-mooneyA bug has caused the tag attribute to no longer be accepted starting with version 2.37. Therefore, network interfaces could only be tagged in versions 2.32 to 2.36 inclusively. Version 2.42 has restored the tag attribute.13:07
sean-k-mooneyso in the network you call it tag13:08
sean-k-mooneyand you're using 2.4313:08
sean-k-mooneyso that should be fixed13:08
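The version window quoted from the API docs can be written down as a small check. This is only a sketch of that one sentence (network `tag` accepted in 2.32 to 2.36 inclusive, and again from 2.42); the function names are made up for illustration and are not part of nova.

```python
def parse_microversion(value):
    """Turn a string such as '2.43' into a comparable (major, minor) tuple."""
    major, minor = value.split(".")
    return int(major), int(minor)


def network_tag_accepted(microversion):
    """Hypothetical check: is the network 'tag' attribute accepted?

    Per the quoted docs: accepted in 2.32-2.36, dropped by a bug from
    2.37, restored in 2.42.
    """
    version = parse_microversion(microversion)
    return (2, 32) <= version <= (2, 36) or version >= (2, 42)


if __name__ == "__main__":
    for v in ("2.36", "2.37", "2.41", "2.42", "2.43"):
        print(v, network_tag_accepted(v))
```

Comparing tuples rather than floats avoids treating a version such as 2.9 as newer than 2.42.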
auniyalyes, 2.4313:08
sean-k-mooneyso https://paste.opendev.org/show/bE1IV21v1cdSzRKNWKrw/13:09
sean-k-mooneyshould be valid13:09
sean-k-mooneyjust s/tags/tag/13:09
sean-k-mooneyhttps://github.com/openstack/nova/commit/e80e2511cf825671a479053cc8d41463aab1caaa13:11
sean-k-mooneythat is the change that fixed it13:11
auniyalthis is full what I have right now - https://paste.opendev.org/show/bymbRPlBXG3nmY1Nm4zy/13:12
sean-k-mooneythis is the api sample test that validates it https://github.com/openstack/nova/blob/master/nova/tests/functional/api_sample_tests/api_samples/servers/v2.42/server-create-req.json.tpl13:13
sean-k-mooneyyou do not need admin_api=True by the way13:14
auniyalack13:15
sean-k-mooneyso i think the problem is 13:15
sean-k-mooney self.api = self.useFixture(13:15
sean-k-mooney            nova_fixtures.OSAPIFixture(api_version='v2.1')).api13:15
sean-k-mooneyyou are setting api_version but i dont think that will work as you expect13:16
bauzassorry auniyal can't help, today is the last day before RC113:17
bauzasIMHO you need to set the microversion correctly13:18
auniyalso yeah, I tried this  nova_fixtures.OSAPIFixture(api_version='v2.42')) and got https://paste.opendev.org/show/bHOTRcnwwpPb1lP8BbH5/13:18
bauzasthat's not how you set the microversion13:18
sean-k-mooneyya so when we set version in the class like that13:18
bauzasplease look at other tests13:18
sean-k-mooneywe have explicitly coded the base class to use them13:18
bauzasyou have a class value13:18
sean-k-mooney(test.TestCase, integrated_helpers.InstanceHelperMixin) wont make api_version = '2.42'13:19
sean-k-mooneywork at the class level13:19
sean-k-mooneythis is the base you're inheriting from 13:20
sean-k-mooneyhttps://github.com/openstack/nova/blob/53012f1c55072c42ced267a2b1adef0a669d9f45/nova/test.py#L15113:20
sean-k-mooneythe functionality you are trying to use is part of _IntegratedTestBase13:23
sean-k-mooneyhttps://github.com/openstack/nova/blob/53012f1c55072c42ced267a2b1adef0a669d9f45/nova/tests/functional/integrated_helpers.py#L123913:23
sean-k-mooneyit does https://github.com/openstack/nova/blob/53012f1c55072c42ced267a2b1adef0a669d9f45/nova/tests/functional/integrated_helpers.py#L1307-L131113:24
sean-k-mooneyif this is a regression test you should just set the microversion on the fixture13:24
auniyalmicroversion as latest13:24
sean-k-mooneyif this is for general testing then you probably should be extending the existing test13:24
sean-k-mooneyhttps://github.com/openstack/nova/blob/53012f1c55072c42ced267a2b1adef0a669d9f45/nova/tests/functional/test_servers.py#L61C7-L61C1813:25
sean-k-mooneyauniyal: microversion as latest means nothing if your parent classes dont read that class variable13:25
sean-k-mooneythats the point im making13:25
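The point about class variables can be shown without any nova code. In this sketch all class names are hypothetical stand-ins: one base class ignores a `microversion` class attribute, the other reads it in `setUp`, which is the behaviour the chat attributes to `_IntegratedTestBase`.

```python
class PlainBase:
    """Stand-in for a base class that never looks at 'microversion'."""

    def setUp(self):
        # Always falls back to the default, whatever subclasses declare.
        self.requested_version = "2.1"


class VersionAwareBase:
    """Stand-in for a base class coded to honour the class attribute."""

    microversion = None

    def setUp(self):
        self.requested_version = self.microversion or "2.1"


class TestOnPlainBase(PlainBase):
    microversion = "2.43"  # silently ignored: nothing reads it


class TestOnAwareBase(VersionAwareBase):
    microversion = "2.43"  # picked up by VersionAwareBase.setUp


if __name__ == "__main__":
    for cls in (TestOnPlainBase, TestOnAwareBase):
        test = cls()
        test.setUp()
        print(cls.__name__, test.requested_version)
```

With a base class of the first kind, the version has to be set explicitly on whatever object issues the requests, which is the suggestion that follows in the conversation.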
sean-k-mooneyso first question is why are you writing this test?13:26
auniyalthis bug https://bugs.launchpad.net/nova/+bug/183638913:27
auniyalI am working on reproducer of it13:27
sean-k-mooneyok so if its a regression test13:27
sean-k-mooneythen you should just set the api version in the fixture definition13:27
auniyalalso, as I could not find many simple device tagging functional tests, I thought I'd create them for all devices13:28
sean-k-mooneynot in the same patch13:28
auniyalyes yes13:28
sean-k-mooneythe reproducer should only test the specific bug13:28
auniyalone patch will be a general module having all tagging tests13:29
auniyaland then reproducer13:29
auniyalbut yeah, if I could write either one first, the next should be easy13:29
auniyalso update api version in OSAPIFixture ?13:30
sean-k-mooneythat or do self.api.microversion=2.4313:30
sean-k-mooneyi generally woudl do that instead13:30
auniyalack13:31
sean-k-mooneylike this https://github.com/openstack/nova/blob/53012f1c55072c42ced267a2b1adef0a669d9f45/nova/tests/functional/regressions/test_bug_1806064.py#L54-L5713:31
auniyalsean-k-mooney, still same error, "tag was unexpected"13:34
auniyalso I removed api_version and microversion at class level, and added self.api.microversion=2.43 at setup13:36
auniyalafter defining self.api 13:36
opendevreviewStephen Finucane proposed openstack/nova master: Add job to test with SQLAlchemy master (2.x)  https://review.opendev.org/c/openstack/nova/+/88623013:40
stephenfinbauzas, sean-k-mooney: I addressed merge conflicts and removed the job from the gate pipeline ^13:53
bauzasstephenfin: saw it, you're great14:00
bauzasI definitely want to merge your patch because of the effort you made14:00
bauzasstephenfin: could you make it non-voting at first ?14:10
stephenfinI'd rather not. There's no point in non-voting jobs because no one looks at them.14:12
stephenfinIf it breaks our gate and we need to get stuff merged, we can simply disable it until we get around to fixing it. However, outside of a release week, a regression with our SQLAlchemy 2.x support should be prevented from merging.14:12
dansmiththen we can just wait until after rc1 to merge?14:14
bauzaswe agreed on the Bobcat support envelope14:15
bauzaswhich is 1.414:15
bauzasideally new-style, hence https://review.opendev.org/c/openstack/nova/+/860829/ being actively reviewed14:16
bauzasso, yeah, I'm ok with awaiting RC1 to be tagged for the zuul job14:16
bauzasyup14:17
stephenfinack14:26
bauzasstephenfin: I'll keep my +2 but you can ping me once we deliver RC114:27
bauzasI'll add this zuul patch into our RC tracking etherpad14:27
bauzaswfy ?14:27
stephenfinsure14:27
greatgatsby_Hello.  Our host aggregates seem to get out of sync with our provider aggregates.  There seems to be a `nova-manage placement sync_aggregates` command, but we're confused how they're getting out of sync in the first place.  Any suggestions of what could be the cause?  This is deployed via kolla-ansible yoga14:28
greatgatsby_we don't have any idea even where to start looking.  Searching logs didn't uncover anything14:28
dansmithgreatgatsby_: is the sync command fixing them?14:29
greatgatsby_it does fix them, but then we'll notice they're suddenly out of sync again.  The last time we lost all the provider aggs and couldn't spin up VMs14:29
dansmithsuddenly out of sync? in what way?14:30
dansmiththey should only change if you add/remove compute nodes from aggregates14:30
greatgatsby_the host aggs do not match the provider aggs.  They do initially, then they keep going out of sync14:30
greatgatsby_we're not removing nodes14:31
dansmiththere is nothing to synchronize if you're not changing the aggregates...14:31
dansmithso I'm not sure what could be going on.. are you doing DB replication and perhaps something is undoing the sync?14:31
greatgatsby_so the aggs that the sync command sync go out of sync with us (to the best of our knowledge) not doing anything to cause that14:32
greatgatsby_DB is just setup how kolla-ansible sets it up, we're pretty hands off with the DB right now14:33
dansmithto be clear, the synchronization is 1. make sure all nova aggs exist in placement and 2. make sure host assignments to aggs is mirrored from nova to placement14:33
dansmithso if you're not changing aggs or mappings, there is nothing changing that needs to be sync'd14:33
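The second half of that definition (host membership mirrored from nova to placement) can be checked with a small comparison. This is a sketch under stated assumptions: the two input dictionaries are a hypothetical shape you would have to build yourself from the nova and placement listings, and the function only looks for hosts that nova has in an aggregate but placement does not.

```python
def find_aggregate_drift(nova_aggs, placement_aggs):
    """Report hosts whose placement aggregates lag behind nova.

    nova_aggs:      {aggregate_uuid: set of hostnames}       (assumed shape)
    placement_aggs: {hostname: set of aggregate uuids}       (assumed shape)
    Returns {aggregate_uuid: sorted list of hosts missing it in placement}.
    """
    drift = {}
    for agg_uuid, hosts in nova_aggs.items():
        missing = sorted(
            host for host in hosts
            if agg_uuid not in placement_aggs.get(host, set())
        )
        if missing:
            drift[agg_uuid] = missing
    return drift


if __name__ == "__main__":
    nova = {"agg-1": {"cmp01", "cmp02"}, "agg-2": {"cmp03"}}
    placement = {"cmp01": {"agg-1"}, "cmp02": set(), "cmp03": {"agg-2"}}
    print(find_aggregate_drift(nova, placement))
```

Running something like this on a schedule and logging the result with a timestamp would at least narrow down when a mapping disappears.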
dansmithpresumably nova looks right and placement looks wrong?14:33
greatgatsby_I agree with that, except we lose the placement aggs.  They start off ok, we even have a script that compares the sync, it's all ok for a few days, then some (or all) of the placement aggs get dropped14:34
greatgatsby_I hope I'm using the correct terminology14:34
dansmiththe aggs are disappearing or the mappings?14:34
dansmith(or both)?14:34
greatgatsby_nova is always right, placement disappear14:34
tobias-urdininteresting observation, stopping a nova-conductor should call ConductorManager's stop() and wait() which isn't implemented or am I blind? the nova.service part calls that. If I stop nova-conductor while nova-compute is mid report_state or doing a periodic task it would fail with MessageTimeout because in nova-conductor we don't finish processing14:36
tobias-urdinbefore stopping the service? for example we dont stop(), process things until queue is empty (calling wait method)...? perhaps somebody knows before I go digging14:36
greatgatsby_so when we do `openstack aggregate show <agg-name>` that always shows the correct hosts.  When we drill into the resource provider aggregates, those go out of sync14:36
dansmithgreatgatsby_: biab, call14:38
greatgatsby_appreciate the help!  I'm just kind of stuck trying to figure out where to even look to figure out what's causing this14:39
dansmithtobias-urdin: yeah nova has no safe graceful shutdown procedure that lets RPC finish what it's doing, unfortunately14:48
dansmithgreatgatsby_: so again, the aggregates themselves are there, it's the host mappings (i.e. which hosts are in which aggregate) that are wrong?14:48
dansmithgreatgatsby_: I think you need to look at the placement access log to see if there is something doing aggregate operations in between it being in sync and being out of sync.. to narrow it down to some sort of DB issue or some external actor un-syncing your aggregates14:49
dansmithgreatgatsby_: I'm assuming you're just using normal libvirt and not anything like ironic?14:50
greatgatsby_correct.  We have dev and prod environments, and on prod, 2 hosts were removed from a placement agg but the other 20 or so are still fine.  AFAIK nothing unique was done to those computes14:50
greatgatsby_I grepped through the placement logs for the agg uuid or anything related to aggs and didn't see anything.  I'll dig some more there though14:51
greatgatsby_we're not using ironic14:52
tobias-urdindansmith: ack good to know, then it's not just me being blind :) out of curiosity does that apply to nova-compute as well, a service stop could interrupt building an instance? (not something one should do but is it possible)14:52
dansmithgreatgatsby_: okay, also grep for any operations happening on the hosts that get removed, like if their provider is getting deleted and re-added or something14:54
dansmithgreatgatsby_: changing hostnames (which sometimes happens because of bad DNS) or other things could be causing the computes to delete and re-create their providers in placement, which would have the same effect14:54
dansmithtobias-urdin: yep :/14:54
dansmithtobias-urdin: for instance create, best is to disable the host and let it quiesce, but other things could still be started on that compute like a resize. we were just discussing making this better a bit ago14:55
greatgatsby_dansmith: thanks, I'll start looking for that14:56
greatgatsby_greatly appreciate the help!14:56
dansmithgreatgatsby_: fwiw, someone a few weeks ago mentioned the same sort of thing, but I never saw a RCA, so I'll be interested to hear the findings14:57
dansmiththat was more just out of sync one time and running the sync fixed it IIRC14:58
dansmithcertainly possible we've got a bug, but lots of people should be screaming if so14:58
*** blarnath is now known as d34dh0r5315:02
tobias-urdindansmith: ack, thanks for the info! always nice to get educated on specifics15:06
dansmithsean-k-mooney: were you going to fix this post check? https://review.opendev.org/c/openstack/nova/+/89354015:30
greatgatsby_dansmith: I'm going to do an hourly `openstack resource provider list` to a log file.  Just to confirm, if something was deleting/re-creating the providers in placement, I would expect to see the uuid change?15:45
dansmithgreatgatsby_: not necessarily, although I think in yoga that's probably true15:46
dansmithworth a shot15:47
greatgatsby_ok - if I find anything I'll be sure to comment back in here15:47
opendevreviewDan Smith proposed openstack/nova master: Make our nova-ovs-hybrid-plug job omit cinder  https://review.opendev.org/c/openstack/nova/+/89354015:48
dansmithgreatgatsby_: you might also just query the DB for resource providers for the "created_at" field and see if any of them look too recent15:55
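One way to act on that suggestion, once the rows are out of the database, is to flag providers whose `created_at` is newer than expected. The sketch below assumes you already have a mapping of provider name to creation time; the function name, the input shape and the seven-day threshold are all illustrative choices, not anything placement provides.

```python
from datetime import datetime, timedelta


def recently_created(providers, now, max_age=timedelta(days=7)):
    """Return names of providers created less than max_age before 'now'.

    providers: {provider_name: created_at datetime}   (assumed shape)
    A long-lived compute node showing up here may have had its provider
    deleted and re-created.
    """
    return sorted(
        name for name, created_at in providers.items()
        if now - created_at < max_age
    )


if __name__ == "__main__":
    now = datetime(2023, 9, 13, 16, 0)
    rows = {
        "cmp01": datetime(2023, 1, 10, 9, 0),
        "cmp02": datetime(2023, 9, 12, 3, 30),
    }
    print(recently_created(rows, now))
```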
opendevreviewJay Faulkner proposed openstack/nova master: [ironic] Use openstacksdk version with shard support  https://review.opendev.org/c/openstack/nova/+/89483315:58
greatgatsby_dansmith: excellent, thanks15:59
gmanndansmith: test_evacuate.sh also need to be updated https://review.opendev.org/c/openstack/nova/+/893540/5/roles/run-evacuate-hook/files/test_negative_evacuate.sh#3816:18
dansmithgmann: I'm not sure it does.. it seems to blow past the failed create just fine16:18
gmanndansmith: but that will be run after negative evacuate test pass https://github.com/openstack/nova/blob/master/roles/run-evacuate-hook/tasks/main.yaml#L8716:19
dansmithah okay, I thought you meant the setup_evacuate_resources.sh which seems to be fine16:20
dansmithbut yeah got it16:20
gmannk16:21
bauzaswe are one day before RC1 and still a lot of changes in flight mode :/16:21
bauzaseven if the gate is not flipping that much, that's still limbo dance16:21
opendevreviewDan Smith proposed openstack/nova master: Make our nova-ovs-hybrid-plug job omit cinder  https://review.opendev.org/c/openstack/nova/+/89354016:22
dansmithbauzas: things needing review or just waiting to merge? your etherpad seemed like everything was pretty well on its way when I looked16:22
sean-k-mooneydansmith: oh i forgot about that am not today but i can look at it tomorrow i guess16:22
dansmithsean-k-mooney: already on it16:23
sean-k-mooneyoh ok thanks16:23
bauzasdansmith: everything is accepted except the prelude (which is still on my laptop atm) so just a gate update16:27
dansmithbacack16:27
dansmithor ack even16:27
bauzasbah ack16:27
bauzas:p16:27
opendevreviewSylvain Bauza proposed openstack/nova master: Add a Bobcat prelude section  https://review.opendev.org/c/openstack/nova/+/89494016:34
bauzasJayF: I'm lost in translation, does https://review.opendev.org/c/openstack/nova/+/894833/2 mean that ironic shards won't work until we deliver an openstacksdk release ?16:36
bauzasif so, /me is a very sad panda16:36
gmannI will also check etherpad if anything new change needed to merge, 16:36
JayFbauzas: johnthetubaguy and I are on a call right now trying to find alternatives16:39
JayFbauzas: I assure you I am the saddest panda16:39
bauzasgmann: prelude just arrived, but maybe we should remove the first bullet point items16:40
bauzasJayF: we still have time to remove the shards from the highlights and remove the conf options, if you want MHO16:42
bauzasor just revert the 3rd patch tbc16:42
JayFthat is the worst of all options, I think we have a chance of getting a cleaner fix16:42
bauzasI can hold RC1 just for that16:42
bauzasJayF: a clean fix can't be a client change16:42
JayFwe are trying to manipulate it to do what we want w/o the client change16:43
bauzasit's a long-term solution, but this can't happen in the timeframe we have16:43
JayFthat patch is up as the known-working example16:43
bauzaswell, the latest patchset is just a noop but yeah16:43
gmannbauzas: noted, will wait for that16:44
bauzasthe fact is, I'm surprised we're facing this at the very end, but urgency requires me to do the post-mortem after RC1, not now16:44
bauzasbut I'd enjoy any very quick fix16:45
dansmithbauzas: I think it makes sense to communicate the "not super tested" nature of it if we're going to leave it in the prelude, personally16:45
bauzastrust me about it16:45
dansmithgiven what we normally expect for validation of headline features like that16:45
bauzasdansmith: yeah, and trust me, I can even remove it from the prelude 16:45
bauzasactually the highlights worry me much16:45
bauzassince they will be used by the marketing folks for the marketing fest16:46
bauzasfortunately, this isn't merged yet, I'm gonna put a -W until we clarify the state16:46
bauzasJayF: I'll drop now for 2.5 hours but I'll come back16:54
bauzasin case you have options16:54
sean-k-mooneyJayF: bauzas  i think we use ironic client of rthe shard stuff no?17:12
JayFbauzas: I am convinced a revert is the best path. I'd personally prefer even reverting the peer_list deprecation; but if we leave that deprecation in, can I get some assurance that we won't *remove* peer_list until sharding lands?17:12
sean-k-mooneyJayF: stephenfin had a series to move ironic to the sdk17:12
JayFsean-k-mooney: we use a mix of client and sdk17:12
sean-k-mooneyok but we didnt land the changes to move to sdk only17:13
JayFsean-k-mooney: which is part of why this was complex to figure out what was going on17:13
JayFhttps://etherpad.opendev.org/p/nova-sharding-rca I've created this17:13
sean-k-mooneyok17:13
JayFputting notes from our technical research for follow-ups17:13
sean-k-mooneyso are we using ironic client for this or not?17:13
JayFall node fetches in Ironic for node listing use sdk17:13
JayFmost node updates/writes use client17:13
bauzasJayF: so you basically wanna revert the whole series ?17:13
JayFbut that is a very rough line17:13
sean-k-mooneylets not do that just yet17:14
JayFbauzas: I think that's what we have to do. We'd need an SDK change, and there's a sniff of a bug in the Ironic API handling too, which is what pushed me to revert17:14
sean-k-mooneyif the specific calls we need are functional then i would not revert17:14
bauzassean-k-mooney: we are on the edge of RC117:14
JayFthe entire basis of ironic/nova node sharding is that we have the ability to query nodes limited by shard17:14
bauzasand we highlighted to the marketing about the ironic features17:14
JayFon ironic side we index on shard to make that a fast query17:14
JayFand so if SDK can't add that shard query, doing late filtering or anything like that basically guts the purpose of the change17:15
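[Editor's note] JayF's point above is that sharding only pays off if the shard filter is applied where the index lives, not after every node has already been fetched. A minimal sketch of that difference follows. It is a toy in-memory model, not Ironic or openstacksdk code; `NODES`, `SHARD_INDEX`, and both function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the Ironic node table (not real Ironic data).
NODES = [
    {"uuid": "node-1", "shard": "a"},
    {"uuid": "node-2", "shard": "b"},
    {"uuid": "node-3", "shard": "a"},
    {"uuid": "node-4", "shard": None},
]


def build_shard_index(nodes):
    """Group nodes by shard, mimicking an index on the shard column."""
    index = defaultdict(list)
    for node in nodes:
        index[node["shard"]].append(node)
    return index


SHARD_INDEX = build_shard_index(NODES)


def list_nodes_server_side(shard):
    """Filter at the source: only nodes in the requested shard are returned."""
    return list(SHARD_INDEX.get(shard, []))


def list_nodes_late_filter(shard):
    """Fetch every node, then filter afterwards.

    Returns (matching_nodes, number_of_nodes_fetched) so the extra
    transfer cost is visible to the caller.
    """
    fetched = list(NODES)
    matching = [n for n in fetched if n["shard"] == shard]
    return matching, len(fetched)
```

Both functions return the same nodes, but the late filter still pulls the whole node list for every compute service, which is the cost the shard key was meant to remove.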
dansmithif there's any question, we should revert for sure17:16
sean-k-mooneyok its using the sdk17:16
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L61617:16
JayFYeah I've been on a zoom w/ johnthetubaguy troubleshooting this since like 7am local time, 3 hours17:16
JayFand julia in here too17:16
dansmithwe should leave the deprecation in place but we probably need to adjust the text that says to use the other thing17:16
JayFdansmith: please please lets not remove peer_list until shard lands though :/17:16
bauzasdansmith: jay is even proposing to revert the whole stack17:16
JayFbauzas: my care about peer_list is this: we need a single slurp release with peer_list + sharding for migration17:17
sean-k-mooneywe are not removing peer_list until D17:17
dansmithbauzas: right, which is probably best17:17
JayFI have no concerns with deprecating peer_list now, but if we remove it in D17:17
JayFthen SLURP users don't get a migratory release17:17
sean-k-mooneyyes they do, the migratory release is C17:17
bauzasokay, then I'm preparing the reverts17:17
sean-k-mooneyit will be deprecated in B and C and then removed in D. C will have shard support and peer_list support17:18
JayFbauzas: I'm extremely sorry for this, we should've had devstack testing lined up as a prereq17:18
bauzasJayF: this is not about being sorry, I prefer we found the bug before we deliver17:19
bauzasso thanks for having spotted it17:19
sean-k-mooneyi would prefer to just update the docs honestly rather than revert17:19
JayFbauzas: that's why I made darn sure I was hooking up devstack before release time17:19
dansmithnova should have expected a devstack job showing it working before merging, so the fault lies with us all17:19
sean-k-mooneydansmith: also true ...17:20
JayFdansmith: I literally just said that on the zoom call I'm still on17:20
dansmithyup, I didn't review the later patches so I didn't even know we were missing that17:20
dansmithbut this is a good demonstration of why we should (and usually do) require such things17:21
bauzasagreed17:21
dansmithat least we found it before release17:21
sean-k-mooneyJayF: i assume landing the required change in sdk was also rejected, right17:22
JayFit's already landed in master17:22
JayFbut we saw concerning behavior from the /v1/nodes?shard=blah Ironic API endpoint17:22
JayFwhich is why I hit the revert button17:22
sean-k-mooneyok but it has not been released yet, right17:22
JayFI want it to land; but I don't want to break people and my confidence in this feature functioning is extremely low17:22
JayFsean-k-mooney: That's been out since A17:22
JayFsean-k-mooney: so I likely need to validate that broken behavior and fix it17:23
JayFsean-k-mooney: or it's possible it's not broken; but I need to check that and I don't think RC is the time to be answering questions like that17:23
sean-k-mooneyack 17:24
dansmithindeed.17:24
sean-k-mooneyit sounds like there are enough things in flight that we won't know for a while :(17:24
opendevreviewSylvain Bauza proposed openstack/nova master: Revert "Make compute node rebalance safter"  https://review.opendev.org/c/openstack/nova/+/89494417:25
opendevreviewSylvain Bauza proposed openstack/nova master: Revert "Add nova-manage ironic-compute-node-move"  https://review.opendev.org/c/openstack/nova/+/89494517:25
opendevreviewSylvain Bauza proposed openstack/nova master: Revert "Limit nodes by ironic shard key"  https://review.opendev.org/c/openstack/nova/+/89494617:25
opendevreviewSylvain Bauza proposed openstack/nova master: Revert "Deprecate ironic.peer_list"  https://review.opendev.org/c/openstack/nova/+/89494717:25
dansmithbauzas: we need to leave the deprecation in place, are you going to do a new one to deprecate with different wording?17:25
bauzasdansmith: I thought JayF said "unplug the whole stack since we don't want to deprecate until shards exist"17:26
dansmithbauzas: no, we should deprecate now, but not remove until shards is in place17:26
JayFMy opinion on deprecation is null; my opinion on *removal* is "after there has been at least one SLURP with shard+peer_list in a release together"17:27
dansmithyup17:27
sean-k-mooney+117:27
bauzaso ok17:27
dansmithbauzas: good with ninja approvals on the reverts right?17:27
bauzasdansmith: I think it's even documented in our docs17:28
sean-k-mooneyi mean i can also just hit them17:28
sean-k-mooneybauzas: it is but not really for this17:28
sean-k-mooneythe fast revert policy does allow it however17:28
bauzaswell, this is a broken piece17:28
bauzasfast reverts apply to this17:28
dansmithbauzas: yep, just confirming.. did the first three17:29
bauzasso17:29
bauzasI need to look at the deprecation patch17:29
bauzasto see the wordings17:29
dansmithspecifically we need to keep something like "Running multiple nova-compute processes that point at the same conductor group is now deprecated"17:30
dansmithand the conf rename and the deprecation reason there17:30
dansmithhowever we do that is fine17:30
bauzasI just did read the whole patch17:31
bauzasand I think I can abandon my revert17:31
bauzasthe deprecation just says the truth17:31
bauzasdansmith: JayF: remind me my SLURP knowledge17:33
bauzasif we say 'we deprecate in B'17:33
bauzasthat means we can hardly remove in C, given ops wouldn't have had the memo17:33
sean-k-mooneyif we deprecate in B we still have to keep it to have the deprecation released in C17:33
dansmithwe also need to deprecate in C and then we can remove in E, assuming C has the shards as usable17:33
JayFYeah, that matches my understanding.17:33
sean-k-mooneyso first removal is D regardless of when we deprecate17:33
bauzasyeah17:34
dansmithright, we could remove in D 17:34
dansmithfirst slurp it can be missing from is E17:34
bauzaswe have a couple of deprecations this cycle17:34
bauzashttps://docs.openstack.org/releasenotes/nova/unreleased.html#deprecation-notes17:34
sean-k-mooneyyep which is fine17:34
bauzasso I'll ensure we don't remove anything next cycle17:34
sean-k-mooneywe will be removing things next cycle17:35
sean-k-mooneyjust not those17:35
sean-k-mooneywe have deprecated things from A and older that we can remove17:35
sean-k-mooneythat is ok because A was a slurp17:35
sean-k-mooneyso if it was deprecated in A we can remove, if it was deprecated in B we can't until D17:36
sean-k-mooneyok im going to call it a day17:36
sean-k-mooneyo/17:36
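[Editor's note] The SLURP rule worked out above can be written down as a small sketch. It assumes, as the discussion does, that releases are named by single letters, that A is a SLURP, and that SLURPs alternate (A, C, E, ...). The function names are hypothetical and this is not an official policy tool; it only encodes the rule as stated in this conversation: a deprecation must ship in at least one SLURP before the option can be removed.

```python
import string

# Assumption: single-letter release names, A is a SLURP, SLURPs alternate.
RELEASES = string.ascii_uppercase


def is_slurp(release):
    """True for A, C, E, ... under the alternating-SLURP assumption."""
    return RELEASES.index(release) % 2 == 0


def first_slurp_at_or_after(release):
    """Return the release itself if it is a SLURP, else the next one."""
    i = RELEASES.index(release)
    # Sketch only: does not handle running off the end of the alphabet.
    return RELEASES[i] if is_slurp(release) else RELEASES[i + 1]


def earliest_removal(deprecated_in):
    """First release that may drop the option.

    The deprecation has to be present in one SLURP release; removal can
    then happen in the release right after that SLURP.
    """
    carrier = first_slurp_at_or_after(deprecated_in)
    return RELEASES[RELEASES.index(carrier) + 1]


def first_slurp_without(deprecated_in):
    """First SLURP release in which the option can be absent."""
    return first_slurp_at_or_after(earliest_removal(deprecated_in))
```

With these assumptions, `earliest_removal("B")` gives `"D"` and `first_slurp_without("B")` gives `"E"`, matching what sean-k-mooney and dansmith concluded. For peer_list specifically, JayF's extra condition still applies on top of this: a SLURP must ship with both peer_list and working shards before removal.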
bauzasJayF: dansmith: updated the cycle highlights in order to remove any occurrence of the shards https://review.opendev.org/c/openstack/releases/+/89421317:37
bauzasI'd appreciate if you could quickly review it17:38
dansmithbauzas: waaaay ahead of you17:38
bauzas:)17:38
bauzasokay, I'm gonna call it a wrap too, but I'll poke around after dinner, since we have a couple of changes upfront in the gate17:39
opendevreviewSylvain Bauza proposed openstack/nova master: Add a Bobcat prelude section  https://review.opendev.org/c/openstack/nova/+/89494017:42
dansmithbauzas: I've seen this fail twice this morning already, not sure if something changed or not: https://b5ee1ed3653a458879e1-60fa9bbec8248937c3af4b3a8047f40b.ssl.cf2.rackcdn.com/893540/6/check/nova-live-migration/6e12356/testr_results.html17:42
bauzasdansmith: yeah that's a known bug17:42
bauzashttps://bugs.launchpad.net/neutron/+bug/1940425 17:43
bauzasit's old but its occurrences have risen recently17:43
dansmithbauzas: yeah, but I haven't seen it very much in the past few months, but several times today now17:43
dansmithokay17:43
dansmithcritical but unfixed in neutron since 2021..eesh17:44
bauzasI haven't pinged lajoskatona ralonsoh and the other neutron cores17:44
bauzas(yet)17:44
bauzasdansmith: well, I guess they just don't check the number of critical bugs they have on a weekly basis, that's it :D17:45
gmanndansmith: seems we need to put this whole thing under a cinder check https://review.opendev.org/c/openstack/nova/+/893540/6/roles/run-evacuate-hook/files/test_evacuate.sh#5818:18
gmannit is failing there https://zuul.opendev.org/t/openstack/build/5efc0aaf88874c45b2eb68d8e9cf4a0d/console18:18
dansmithgmann: dammit18:18
dansmithI was just going to fix one thing, see how it goes, fix another, etc18:18
dansmithyou keep pointing out all the problems and are making me look bad :)18:18
opendevreviewDan Smith proposed openstack/nova master: Make our nova-ovs-hybrid-plug job omit cinder  https://review.opendev.org/c/openstack/nova/+/89354018:29
lajoskatonadansmith, bauzas: I think we now have this bug for that issue: https://bugs.launchpad.net/neutron/+bug/2033887 and this is the patch series for that: https://review.opendev.org/q/Ifc2d37e2042fad43dd838821953defd99a5f866518:43
dansmithlajoskatona: col18:45
dansmither cool even18:45
gmanndansmith: you doing so much work on gate stability always makes you look better, not bad :)18:54
dansmithheh18:54
* bauzas just ducks out19:01
bauzasdansmith: fwiw, also seeing more kernel crashes like https://bb94d2825af897cfcd12-f6f1806a4829a343b7540be166a34ea9.ssl.cf5.rackcdn.com/860829/4/check/nova-next/bf91602/testr_results.html19:03
bauzasagain an already known issue but more failures19:04
dansmithbauzas: always with no space left on device?19:04
bauzasdon't think so19:05
bauzasI have to doublecheck tho19:05
dansmiththat's during boot, so it looks more like maybe a broken image creation or if it's on ceph, maybe a backend problem19:05
dansmithoh volume boot, so yeah something wrong with the actual volume19:06
bauzasyeah probably a ceph RC19:06
dansmithit never mounted and thus couldn't pivot over and the ENOSPC comes from having not mounted something writable on the target19:06
bauzashmpfff19:20
bauzasSep 13 13:49:37.042893 np0035238841 nova-compute[40581]: FileNotFoundError: [Errno 2] No such file or directory: 'multipathd'19:21
dansmithalways logged by brick AFAIK :(19:21
bauzasI wish I would be Neo 'operator, get me all the knowledge about chopters, err. volume bindings in OpenStack"19:23
bauzas"Tank, I need a pilot program for B-212 helicopter. Hurry.”19:24
bauzas(found the right catchphrase)19:24
opendevreviewDan Smith proposed openstack/nova master: Make our nova-ovs-hybrid-plug job omit cinder  https://review.opendev.org/c/openstack/nova/+/89354019:25
dansmithgmann: looks like it worked this time, but one fix on the cleanup part19:25
dansmith"choppers" or "copters" not "chopters" :P19:25
bauzasso the backref thing is probably about to be punted19:25
bauzasyeah, damn me, it's late19:26
bauzasand when I was young, this was dubbed in French19:26
bauzasso, the multipathd error is just a "normal error" ?19:27
bauzasthat's fun19:27
JayFmultipathd is only needed for some wacky hardware; we have a similar setup in IPA where we opportunistically load it19:29
dansmithby wacky you mean almost anyone using FC :)19:31
JayFheh, I wasn't exactly sure what storage tech so you know, wacky hardware ;) 19:34
JayFnothing we have in CI/gate was more the point :D 19:34
opendevreviewMerged openstack/nova master: Update compute rpc alias for bobcat  https://review.opendev.org/c/openstack/nova/+/89374420:08
opendevreviewMerged openstack/nova master: Revert "Make compute node rebalance safter"  https://review.opendev.org/c/openstack/nova/+/89494421:32
