Friday, 2020-07-24

*** jmlowe has joined #openstack-nova00:01
*** brinzhang0 has joined #openstack-nova00:06
*** brinzhang_ has quit IRC00:09
*** aj_mailing has joined #openstack-nova00:10
*** xek_ has quit IRC00:14
openstackgerritGhanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies  https://review.opendev.org/74277100:15
*** brinzhang has joined #openstack-nova00:17
openstackgerritGhanshyam Mann proposed openstack/nova master: Add test coverage of tenant networks policies  https://review.opendev.org/74276500:18
openstackgerritGhanshyam Mann proposed openstack/nova master: Introduce scope_types in tenant networks policy  https://review.opendev.org/74276600:19
openstackgerritGhanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies  https://review.opendev.org/74277100:19
*** brinzhang0 has quit IRC00:20
openstackgerritGhanshyam Mann proposed openstack/nova master: Pass the actual target in tenant networks policy  https://review.opendev.org/74277200:23
*** aj_mailing has quit IRC00:33
*** aj_mailing has joined #openstack-nova00:34
*** songwenping__ has joined #openstack-nova00:45
*** xiaolin has joined #openstack-nova00:53
*** jmlowe has quit IRC00:58
*** yaawang has quit IRC01:00
*** jmlowe has joined #openstack-nova01:00
*** yaawang has joined #openstack-nova01:00
*** songwenping_ has joined #openstack-nova01:07
*** songwenping__ has quit IRC01:10
openstackgerritGhanshyam Mann proposed openstack/nova master: Add test coverage of volumes policies  https://review.opendev.org/74277301:15
*** masterpe has quit IRC01:16
*** masterpe has joined #openstack-nova01:20
openstackgerritGhanshyam Mann proposed openstack/nova master: Introduce scope_types in volumes policy  https://review.opendev.org/74277401:22
openstackgerritGhanshyam Mann proposed openstack/nova master: Add new default roles in security_groups policies  https://review.opendev.org/74276301:23
openstackgerritGhanshyam Mann proposed openstack/nova master: Pass the actual target in security_groups policy  https://review.opendev.org/74276401:23
openstackgerritYingji Sun proposed openstack/nova master: Set different VirtualDevice.key  https://review.opendev.org/71356501:38
*** aj_mailing has quit IRC01:47
openstackgerritGhanshyam Mann proposed openstack/nova master: Add new default roles in volumes policies  https://review.opendev.org/74277701:57
*** yaawang has quit IRC02:04
openstackgerritGhanshyam Mann proposed openstack/nova master: Pass the actual target in volumes policy  https://review.opendev.org/74277902:07
*** songwenping__ has joined #openstack-nova02:09
*** songwenping_ has quit IRC02:12
*** mkrai has joined #openstack-nova02:20
*** yaawang has joined #openstack-nova02:24
*** aj_mailing has joined #openstack-nova02:25
*** dave-mccowan has quit IRC02:26
alex_xustephenfin: gibi, I saw you mentioned the upgrade issue for provider config yaml. I didn't follow the spec discussion in the beginning, could you remind me what it is about? Then I think I can help tony_su go through the problem.02:28
*** lbragstad_ has joined #openstack-nova02:30
openstackgerritMerged openstack/nova stable/stein: compute: Allow snapshots to be created from PAUSED volume backed instances  https://review.opendev.org/72917602:30
*** aj_mailing has quit IRC02:31
*** lbragstad has quit IRC02:32
*** gyee has quit IRC02:33
*** lbragstad_ has quit IRC02:35
*** gyee has joined #openstack-nova02:40
*** Yumeng has joined #openstack-nova02:43
openstackgerritMerged openstack/nova stable/ussuri: objects: Update keypairs when saving an instance  https://review.opendev.org/74263102:50
*** yaawang has quit IRC03:00
*** yaawang has joined #openstack-nova03:01
*** huaqiang has joined #openstack-nova03:09
openstackgerritXinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification  https://review.opendev.org/74278503:15
*** songwenping__ has quit IRC03:19
*** songwenping__ has joined #openstack-nova03:19
*** mriedem has left #openstack-nova03:23
tony_sugibi: stephenfin: A status update for the provider-config-file patches. I am handling your comments, which are all valuable. Most of them are easy and can be addressed by simply updating the patches. But a few, like refactoring the schema into code or adding new test coverage, require more consideration and more days ...03:26
*** aj_mailing has joined #openstack-nova03:27
*** tony_su has left #openstack-nova03:27
*** tony_su has joined #openstack-nova03:28
*** yaawang has quit IRC03:31
openstackgerritYingji Sun proposed openstack/nova master: Set different VirtualDevice.key  https://review.opendev.org/71356503:32
*** yaawang has joined #openstack-nova03:32
*** brinzhang_ has joined #openstack-nova03:33
*** brinzhang has quit IRC03:36
*** psachin has joined #openstack-nova03:36
*** huaqiang has quit IRC03:40
*** yaawang has quit IRC04:09
*** yaawang has joined #openstack-nova04:09
*** gyee has quit IRC04:14
openstackgerritXinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification  https://review.opendev.org/74278504:16
*** aj_mailing has quit IRC04:28
*** udesale has joined #openstack-nova04:33
*** mkrai has quit IRC04:34
*** mkrai has joined #openstack-nova04:44
*** songwenping_ has joined #openstack-nova04:54
*** eharney has quit IRC04:55
*** amodi has quit IRC04:55
*** songwenping__ has quit IRC04:57
*** aj_mailing has joined #openstack-nova05:02
*** eharney has joined #openstack-nova05:08
*** yaawang has quit IRC05:11
*** yaawang has joined #openstack-nova05:12
*** ratailor has joined #openstack-nova05:14
*** aj_mailing has quit IRC05:17
*** aj_mailing has joined #openstack-nova05:25
*** links has joined #openstack-nova05:37
*** songwenping__ has joined #openstack-nova05:47
*** songwenping_ has quit IRC05:51
*** jsuchome has joined #openstack-nova06:31
*** tinwood is now known as tinwood-afk06:33
*** yaawang has quit IRC06:59
*** yaawang has joined #openstack-nova06:59
*** aj_mailing has quit IRC07:05
*** aj_mailing has joined #openstack-nova07:06
*** aj_mailing has quit IRC07:09
*** tesseract has joined #openstack-nova07:13
*** ralonsoh has joined #openstack-nova07:28
gibitony_su: don't worry. I appreciate your work on that series and I will look at it when you are ready07:29
gibialex_xu: I'm not sure I can recall an upgrade issue in the provider config series (but it is Friday so my brain is already slow) do you have a reference?07:31
bauzasgibi: do you know the answer of https://review.opendev.org/#/c/739211/5/nova/tests/unit/test_crypto.py@2107:32
bauzas?07:32
bauzasthat's a horrible import07:32
gibibauzas: looking...07:32
bauzashmmm, can't find a castellanclient kind of thing07:34
gibibauzas: is castellan just an interface, so that by having castellan we don't have to pull in a whole key manager backend like barbican07:34
gibi?07:34
bauzasI'm not a specialist of any OpenStack key manager07:35
bauzasbut if you're right, that explains my readings07:35
* bauzas goes looking at the castellan docs07:35
bauzasmmmm https://docs.openstack.org/castellan/latest/user/index.html#basic-usage07:36
bauzaslooks like you're right indeed07:36
*** tosky has joined #openstack-nova07:37
*** yaawang has quit IRC07:40
*** yaawang has joined #openstack-nova07:40
*** mkrai has quit IRC07:44
*** maciejjozefczyk has joined #openstack-nova07:45
*** xinranwang__ has joined #openstack-nova08:05
*** markvoelker has joined #openstack-nova08:11
*** markvoelker has quit IRC08:15
*** nightmare_unreal has joined #openstack-nova08:27
*** mkrai has joined #openstack-nova08:32
*** xek_ has joined #openstack-nova08:35
stephenfinbauzas, gibi: The fix for that o.vo version issue is here, btw https://review.opendev.org/#/c/742650/108:41
gibistephenfin: thanks08:41
stephenfinalex_xu: As gibi said, I don't think anyone noted any upgrade issues with provider.yaml. Perhaps you're confusing it with the investigation of upgrade issues bauzas was doing for the vTPM series?08:42
gibistephenfin: ahh, that was the upgrade discussion yesterday ^^08:43
gibiI knew there was something08:43
gibiI just did not remember what08:43
openstackgerritStephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver'  https://review.opendev.org/68439308:45
*** tinwood-afk is now known as tinwood08:45
alex_xustephenfin: ah, thanks :)08:46
*** derekh has joined #openstack-nova08:51
*** dtantsur|afk is now known as dtantsur08:53
*** janno has quit IRC08:53
*** janno has joined #openstack-nova08:54
*** janno has quit IRC08:55
*** janno has joined #openstack-nova08:55
*** ociuhandu has joined #openstack-nova09:06
*** ratailor_ has joined #openstack-nova09:06
*** ratailor has quit IRC09:08
*** ociuhandu has quit IRC09:09
*** xek_ has quit IRC09:12
*** jraju__ has joined #openstack-nova09:23
*** links has quit IRC09:23
openstackgerritMerged openstack/nova master: scheduler: Request vTPM trait based on flavor or image  https://review.opendev.org/73921009:23
openstackgerritMerged openstack/nova master: crypto: Add support for creating, destroying vTPM secrets  https://review.opendev.org/73921109:24
openstackgerritMerged openstack/nova master: manager: Prevent compute startup on invalid vTPM config  https://review.opendev.org/73921209:24
openstackgerritMerged openstack/nova master: tests: Rename tests for '_create_guest_with_network'  https://review.opendev.org/74046409:24
openstackgerritMerged openstack/nova master: tests: Move single use constants to their callers  https://review.opendev.org/74128009:24
openstackgerritMerged openstack/nova master: tests: Define constants in '_IntegratedTestBase'  https://review.opendev.org/74128109:24
openstackgerritMerged openstack/nova master: tests: Remove 'test_servers.ServersTestBase'  https://review.opendev.org/74128209:24
openstackgerritMerged openstack/nova master: tests: Add 'PlacementHelperMixin', 'PlacementInstanceHelperMixin'  https://review.opendev.org/74128309:25
openstackgerritMerged openstack/nova master: tests: Make '_IntegratedTestBase' subclass 'PlacementInstanceHelperMixin'  https://review.opendev.org/74128409:25
*** mkrai has quit IRC09:27
*** mkrai_ has joined #openstack-nova09:27
*** yaawang has quit IRC09:30
*** yaawang has joined #openstack-nova09:30
stephenfinHoly s***, they all merged in one go. No CI failures :O09:31
openstackgerritStephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver'  https://review.opendev.org/68439309:31
gibistephenfin: that was a nice set09:31
stephenfinbauzas, gibi: Can you look at ^ again real quick? Turns out 'scp' cares about the order of arguments. CI caught it for us and will catch it again if it's wrong09:31
gibistephenfin: looking09:32
stephenfin(from https://zuul.opendev.org/t/openstack/build/7e8c6c6ddaba44e09a90a847dfe6ee46/log/logs/screen-n-cpu.txt)09:32
stephenfinThanks09:32
bauzasack09:32
* bauzas wonders then why CI didn't catch it 09:32
stephenfinIt did09:32
bauzashah09:33
bauzasfwiw https://linux.die.net/man/1/scp09:33
stephenfinI just thought it was intermittent failures and wasn't looking at it often enough to spot the trend :)09:33
stephenfinzuul++09:33
bauzasstephenfin: i don't see any required ordering in the scp manpage09:34
stephenfinbauzas: neither did I, but the CI failure is fairly unambiguous09:34
stephenfinprobably the implementation of getopt they're using is borked09:34
bauzasin theory, you could also scp -rC09:35
* bauzas prefers tar over nc 09:35
gibiI can reproduce the ordering requirement of scp locally09:35
gibiso the manpage is incomplete :)09:35
bauzasgibi: I honestly never used the -C flag09:35
bauzaslike I said, I tend to use tar over nc when I want to transfer large files09:36
stephenfintbf, parsing command line arguments is hard work09:36
bauzaswaaaaay more efficient09:36
* stephenfin suggests looking at the bug list for argparse /o\09:36
gibiscp is secure, tar + nc is fast; it is a tradeoff :)09:36
stephenfinso broken :-(09:36
stephenfinto the point that click (which is actually awesome) uses the deprecated optparse. Less magical and more reliable, apparently09:37
* stephenfin goes back to breaking stuff09:38
* gibi hugs zuul both for being stable and for catching bugs09:39
gibistephenfin: btw https://that.guru/blog/the-numa-scheduling-story-in-nova/ is a great article that made me think about where and when nova selects the resources to consume09:42
bauzasstephenfin: gibi: that's an argparse bug http://paste.openstack.org/show/796277/09:42
bauzasdefinitely not scp-related09:42
stephenfingibi: You can thank sean-k-mooney for most of that. I just spell checked and reorganized :)09:43
stephenfinbauzas: Put '-C' at the end09:43
bauzasoh that09:43
stephenfinthe issue isn't with the order of the positionals09:43
bauzasof course, it won't work then09:43
stephenfinoptions09:43
stephenfinit's with options coming after positionals09:43
bauzasyou shock me if you thought it would work :p09:44
stephenfinbut it does in many applications!09:44
bauzasbut I honestly haven't paid attention to the argparse result :)09:44
openstackgerritMerged openstack/nova master: trivial: Test object backporting against correct version  https://review.opendev.org/74265009:44
bauzasit NEVER worked with scp then :)09:44
bauzasand many BSD commands09:44
bauzas(many many)09:44
stephenfinbauzas: http://paste.openstack.org/show/796278/09:46
stephenfinrun that with e.g. 'python test.py 123 MB -b test'09:46
stephenfinit'll work just fine09:46
stephenfinso optparse (or whatever scp is using) is just plain broken09:47
stephenfinbut hey, I'm not going to fix it :)09:47
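The behaviour discussed above (options accepted after positionals) can be sketched with argparse. This is a minimal illustration, not the script from the paste: the argument names `size`, `unit` and `--backend` are hypothetical and only mirror the shape of the 'python test.py 123 MB -b test' invocation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments: two positionals and one option, chosen only
    # to mirror the shape of 'test.py 123 MB -b test'.
    parser = argparse.ArgumentParser(prog="test.py")
    parser.add_argument("size", type=int)
    parser.add_argument("unit")
    parser.add_argument("-b", "--backend")
    return parser


parser = build_parser()

# Option placed after the positionals, as in the chat example.
after = parser.parse_args(["123", "MB", "-b", "test"])

# Option placed before the positionals, as the POSIX guidelines expect.
before = parser.parse_args(["-b", "test", "123", "MB"])

# argparse allows options and positionals to be interspersed, so both
# orderings produce the same namespace.
print(after == before)
```

A tool that follows the POSIX utility syntax guidelines strictly would instead treat everything after the first operand as an operand, which matches the ordering requirement seen with scp.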
gibiyeah 'grep foo ./ -R' works too09:48
gibisean-k-mooney: good article https://that.guru/blog/the-numa-scheduling-story-in-nova/ :)09:49
bauzasstephenfin: fyk https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html09:50
bauzastl;dr: options != operands09:50
bauzasargparse was probably written by Linux geeks who didn't know about UNIX :p09:51
bauzastime for a quote09:51
bauzasBSD is what you get when a bunch of UNIX hackers sit down to try to port a UNIX system to the PC. Linux is what you get when a bunch of PC hackers sit down and try to write a UNIX system for the PC09:52
toskynice as a quote, even though iirc historically incorrect: when BSD started, there were no PCs09:53
bauzasthat's not coming from me :)09:54
bauzasbut I used to play with some BSD OSes in the past, and this pun was very well known09:54
bauzasdo people know that 'ps' has a very specific POSIX syntax that people can use regardless of the OS?09:55
*** mkrai_ has quit IRC10:01
*** markvoelker has joined #openstack-nova10:03
*** markvoelker has quit IRC10:08
*** k_mouza has joined #openstack-nova10:08
stephenfinbauzas: I was taught to use e.g. 'ps aux' which I think is BSD compatible too10:08
bauzasthat's correct, and that's the old syntax10:09
bauzaswe made it forward compatible10:09
bauzaswhoops10:09
bauzasthey, not we10:10
bauzasI'm not THAT modest10:10
*** zhanglong has quit IRC10:10
bauzastl;dr: options without the dash come from BSD10:10
bauzasand Linux ported them10:11
bauzasbut in theory, you *should* always follow the POSIX syntax to be 100% compliant across all platforms10:11
bauzashttps://askubuntu.com/questions/484982/what-is-the-difference-between-standard-syntax-and-bsd-syntax10:12
bauzasor slightly better https://man7.org/linux/man-pages/man1/ps.1.html10:15
*** spatel has joined #openstack-nova10:18
*** k_mouza has quit IRC10:19
*** brinzhang_ has quit IRC10:20
*** mkrai_ has joined #openstack-nova10:22
*** spatel has quit IRC10:22
*** martinkennelly has joined #openstack-nova10:26
*** k_mouza has joined #openstack-nova10:29
*** psachin has quit IRC10:34
*** links has joined #openstack-nova10:40
*** jraju__ has quit IRC10:41
*** mkrai_ has quit IRC10:43
*** mkrai__ has joined #openstack-nova10:43
*** k_mouza has quit IRC10:45
*** yaawang has quit IRC10:50
*** yaawang has joined #openstack-nova10:51
*** k_mouza has joined #openstack-nova10:51
*** k_mouza has quit IRC10:53
*** k_mouza has joined #openstack-nova10:53
*** ociuhandu has joined #openstack-nova10:54
*** ociuhandu has quit IRC10:59
*** k_mouza has quit IRC11:01
*** Yumeng has quit IRC11:08
*** udesale_ has joined #openstack-nova11:11
*** k_mouza has joined #openstack-nova11:12
*** udesale has quit IRC11:13
*** k_mouza has quit IRC11:17
*** mkrai__ has quit IRC11:42
openstackgerritStephen Finucane proposed openstack/nova master: scheduler: Default request group to None  https://review.opendev.org/74265111:52
openstackgerritStephen Finucane proposed openstack/nova master: tests: Add helpers for suspend, resume and reboot of server  https://review.opendev.org/74128511:52
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Pass context, instance to '_create_domain'  https://review.opendev.org/74128611:52
openstackgerritStephen Finucane proposed openstack/nova master: api: Reject non-spawn operations for vTPM  https://review.opendev.org/74150011:52
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Add emulated TPM support to Nova  https://review.opendev.org/63136311:52
openstackgerritStephen Finucane proposed openstack/nova master: docs: Add docs for vTPM support  https://review.opendev.org/73921311:52
openstackgerritStephen Finucane proposed openstack/nova master: Don't unset Instance.old_flavor, new_flavor until necessary  https://review.opendev.org/74199511:52
openstackgerritStephen Finucane proposed openstack/nova master: Add support for resize and cold migration of emulated TPM files  https://review.opendev.org/63993411:52
openstackgerritStephen Finucane proposed openstack/nova master: Add type hints to 'nova.compute.manager'  https://review.opendev.org/74286311:53
openstackgerritStephen Finucane proposed openstack/nova master: privsep: Add support for recursive chown, move_tree operations  https://review.opendev.org/74286411:53
openstackgerritStephen Finucane proposed openstack/nova master: Add type hints to 'nova.virt.libvirt.utils'  https://review.opendev.org/74286511:53
*** markvoelker has joined #openstack-nova12:04
*** markvoelker has quit IRC12:05
*** markvoelker has joined #openstack-nova12:06
*** k_mouza has joined #openstack-nova12:07
*** k_mouza has quit IRC12:12
*** k_mouza has joined #openstack-nova12:18
*** psachin has joined #openstack-nova12:18
*** k_mouza has quit IRC12:23
*** derekh has quit IRC12:24
*** k_mouza has joined #openstack-nova12:28
*** k_mouza has quit IRC12:31
*** k_mouza has joined #openstack-nova12:31
*** ratailor_ has quit IRC12:34
*** ociuhandu has joined #openstack-nova12:43
*** ociuhandu has quit IRC12:47
*** derekh has joined #openstack-nova12:51
*** lbragstad has joined #openstack-nova13:05
*** mriedem has joined #openstack-nova13:07
*** artom has joined #openstack-nova13:09
*** zigo has quit IRC13:19
*** gokhani has joined #openstack-nova13:25
*** ociuhandu has joined #openstack-nova13:30
*** zigo has joined #openstack-nova13:31
openstackgerritElod Illes proposed openstack/nova stable/rocky: compute: Allow snapshots to be created from PAUSED volume backed instances  https://review.opendev.org/72917713:41
*** sean-k-mooney has joined #openstack-nova13:46
openstackgerritArtom Lifshitz proposed openstack/nova master: Add regression test for bug 1879787  https://review.opendev.org/74123013:48
openstackbug 1879787 in OpenStack Compute (nova) "post_live_migration does not handle Neutron errors" [Medium,In progress] https://launchpad.net/bugs/1879787 - Assigned to Artom Lifshitz (notartom)13:48
openstackgerritArtom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration()  https://review.opendev.org/72976313:48
*** gokhani has quit IRC13:48
openstackgerritMerged openstack/nova stable/ussuri: libvirt: Handle VIR_ERR_DEVICE_MISSING when detaching devices  https://review.opendev.org/74241413:49
sean-k-mooneygibi: hi o/ i was away for a funeral this week so i'm just seeing your comment on the attach/detach patch now. i can take a look at it more closely next week but i'm still pretty burnt out today. hopefully i'll be less mentally exhausted after the weekend13:54
sean-k-mooneyso ya i think you're right, there is a preexisting bug related to macvtap detach in the libvirt driver13:56
sean-k-mooneywell, two: 1) it's not updating the domain correctly because it's not finding the device properly and 2) it's not releasing the vf claim because we just don't do that today for sriov detach13:57
sean-k-mooneyproblem 1 is what prevents the vf mac from being reset and the macvtap being removed on detach13:58
gibisean-k-mooney: no worries. take your time to recover13:58
*** mlavalle has joined #openstack-nova13:58
gibisean-k-mooney: I think I've just found the reason for 213:58
gibiand I think I can fix it13:58
gibiI will be away next week13:59
gibiso feel free to touch my code or add patches to the series while I'm away and I will continue the week after13:59
sean-k-mooneywhat probably makes sense is to have 3 patches: one that blocks detach in the api, then your current one, and a final patch for macvtap13:59
gibiyes, it makes sense to have separate patches for the separate issues14:00
*** k_mouza has quit IRC14:00
sean-k-mooneywe also need a patch for direct-physical but it is basically the same issue: the device lookup fails, although it fails for a different reason14:00
*** psachin has quit IRC14:01
sean-k-mooneyit fails because the mac is not present rather than the target_dev14:01
sean-k-mooneybut it's still failing in the same if, i believe https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L25714:01
gibisean-k-mooney: cool. I haven't had time to try direct-physical yet14:02
sean-k-mooneywhat i think makes sense is to just have 2 code paths. if it's an sriov interface, find it by the pci address and remove it14:03
sean-k-mooneyif not find it by its mac and remove it14:03
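The two lookup paths proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `GuestDevice`, `find_device` and `SRIOV_VNIC_TYPES` are hypothetical names invented here, not Nova's actual classes or helpers, and the real code would have to work on the libvirt guest config objects.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: the vnic types treated as SR-IOV for this sketch.
SRIOV_VNIC_TYPES = {"direct", "macvtap", "direct-physical"}


@dataclass
class GuestDevice:
    """Hypothetical stand-in for a device parsed from the domain XML."""
    mac_addr: Optional[str] = None
    pci_address: Optional[str] = None


def find_device(
    devices: Iterable[GuestDevice],
    vnic_type: str,
    mac: Optional[str],
    pci_address: Optional[str],
) -> Optional[GuestDevice]:
    """Return the device matching the vif, or None if nothing matches."""
    if vnic_type in SRIOV_VNIC_TYPES:
        # SR-IOV path: match on the PCI address, since a PF passed
        # through as a hostdev carries no MAC to compare against.
        if pci_address is None:
            return None
        return next(
            (d for d in devices if d.pci_address == pci_address), None)
    # Default path: match on the MAC address.
    if mac is None:
        return None
    return next((d for d in devices if d.mac_addr == mac), None)


devices = [
    GuestDevice(mac_addr="fa:16:3e:00:00:01"),
    GuestDevice(pci_address="0000:81:10.6"),
]
pf = find_device(devices, "direct-physical", None, "0000:81:10.6")
normal = find_device(devices, "normal", "fa:16:3e:00:00:01", None)
```

Keeping the two paths separate avoids one combined condition that has to cope with devices missing either attribute.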
*** xek_ has joined #openstack-nova14:04
sean-k-mooneyi'll have to look at the code and see if that makes sense in practice however, as i'm not sure if we have the vnic_type or vif type available14:05
gibithe code that searches for the interface has access to the vif 124: enp129s16f6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 100014:06
gibibah14:06
gibithe code that searches for the interface has access to the vif 124: enp129s16f6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 100014:06
gibimy copy paste buffer is broken :/14:07
*** ociuhandu has quit IRC14:07
sean-k-mooneyok cool14:07
gibihttps://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L25714:07
*** ociuhandu has joined #openstack-nova14:07
gibinah, this is the place where the matching between the current domain and the vif being detached happens https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L25714:07
sean-k-mooney  same link :) i think you meant https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/driver.py#L219914:09
sean-k-mooneyand yes it has the vif14:09
stephenfinlyarwood: Are you the person I need to shout at for nova-ceph-multistore failing? :P14:10
sean-k-mooneyso we could add a get_interface_by_pci_address and call that instead for sriov devices.14:10
gibisean-k-mooney: yes, both yours and mine point to the code that causes the failure14:10
stephenfinjk, but heads up I'm seeing a lot of failures on that today. Haven't investigated yet though14:10
*** dpawlik2 has quit IRC14:11
*** k_mouza has joined #openstack-nova14:11
lyarwoodstephenfin: dansmith introduced it while I was out so no ;)14:12
lyarwoodstephenfin: what's up?14:12
openstackgerritBalazs Gibizer proposed openstack/nova master: [WIP] Support SRIOV interface attach and detach  https://review.opendev.org/74099514:12
dansmithstephenfin: link?14:12
sean-k-mooneystephenfin: it's a modified version of the previous ceph job14:12
stephenfindansmith: https://review.opendev.org/#/c/741286/14:12
sean-k-mooneyso the tests are the same but the config is slightly different to enable multistore and the image import from copy feature14:13
dansmithstephenfin: thanks will look through it in a sec14:13
gibisean-k-mooney: https://review.opendev.org/#/c/740995/5/nova/virt/libvirt/guest.py@240 this change fixes the macvtap detach issue in my env, but I agree that the condition might need refactoring into two conditions, one for pci and another for mac14:14
* lyarwood wonders if this is a space issue again14:14
sean-k-mooneygibi: ya so that will work for macvtap but we will still fail for direct-physical14:15
gibisean-k-mooney: yes, probably, haven't tried14:15
sean-k-mooneyinterfaces = self.get_all_devices(14:15
sean-k-mooney                vconfig.LibvirtConfigGuestInterface)14:16
sean-k-mooneythat wont return the direct-physical interfaces14:16
sean-k-mooneysince they are not <interface ...> elements and use <hostdev ...>14:16
gibiohh14:16
gibiinteresting14:16
sean-k-mooneyalso they don't have a mac in the hostdev element14:16
sean-k-mooneylibvirt can't pass through a pf with <interface type=hostdev>, only VFs14:17
sean-k-mooneyand the hostdev element does not have a mac either so interface.mac_addr == cfg.mac_addr would fail14:18
sean-k-mooneyprobably with an attribute error if we got that far14:18
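The point about direct-physical devices not showing up among the interfaces can be illustrated with a small, hand-written domain XML fragment. The XML below is a hypothetical example in the libvirt domain format (addresses and MAC invented for illustration); it is parsed with the standard library rather than Nova's config classes.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML: one VF attached as <interface type='hostdev'>
# and one PF attached as a plain <hostdev>, which carries no <mac>.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <interface type='hostdev' managed='yes'>
      <mac address='fa:16:3e:00:00:01'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x81' slot='0x10' function='0x6'/>
      </source>
    </interface>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

root = ET.fromstring(DOMAIN_XML)

# A lookup restricted to <interface> elements only sees the VF.
interfaces = root.findall("./devices/interface")
interface_macs = [i.find("mac").get("address") for i in interfaces]

# The PF lives under <hostdev> and has no <mac> child, so a MAC-based
# comparison has nothing to match against.
hostdevs = root.findall("./devices/hostdev")
hostdev_macs = [h.find("mac") for h in hostdevs]

print(interface_macs, hostdev_macs)
```

A lookup for the PF therefore has to walk the hostdev elements and compare the PCI source address instead.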
dansmithstephenfin: did you look into those fails at all? looks to me like just novalidhost on at least one of the three failed tests, and it's a conflict from placement during scheduling:14:20
dansmithhttps://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-sch.txt#349314:20
dansmithmeaning, are you sure it's just that job failing more? because that fails way before the point where we get to any of the new (i.e. ceph or multistore) stuff14:21
sean-k-mooneythere are traces in the n-cpu log https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-cpu.txt#14267-1432314:21
sean-k-mooney nova.exception.ImageNotFound: Image a549f544-e4e3-4f66-962e-03c1514ee21f could not be found14:21
stephenfindansmith: Barely. I'm seeing image retrieval failures in n-cpu14:21
stephenfinyeah, those ^14:21
*** links has quit IRC14:21
dansmithhmm, maybe the first test I picked was a rando failure then14:22
stephenfinbut there are a couple of patches in that series failing and I don't think they're related to the code14:22
sean-k-mooneyif those tests are uploading new images maybe they are not ready when the boot is started because the import/conversion takes longer or something14:23
dansmithah yeah, I see now14:23
dansmithsean-k-mooney: yeah that could be14:24
dansmithI think we should still be able to GET the image though14:24
sean-k-mooneylooks like that is not the case for rescue at least https://github.com/openstack/tempest/blob/257f3b009f7978723a8748f9f5b413aa8eb38e3a/tempest/api/compute/servers/test_server_rescue.py#L55-L6714:25
dansmithsean-k-mooney: what is not the case for rescue?14:26
sean-k-mooneyya it just does rescue without specifying an image so it will use the image the vm was booted with or the image specified in the config. i wonder if it failed before that14:26
sean-k-mooneydansmith: the rescue test is not uploading any images14:26
dansmithare you looking at a different fail?14:26
sean-k-mooneytempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance14:27
sean-k-mooneyits the second failure in the test report14:27
dansmithack, the first thing you linked is to an ImagesTest not rescue right?14:27
dansmithit's definitely doing a snapshot14:27
sean-k-mooneyactually, looking at the server uuid, it's not in the n-cpu log so the novalidhost looks like it really could not fit14:29
dansmithsean-k-mooney: right, that's what I was saying, I just picked poorly on the first test to look at :)14:29
dansmithsean-k-mooney: hmm, I see a DELETE of the image just before the failed GET in the glance logs, for that snapshot one, which is odd14:29
sean-k-mooneyyep  Got no allocation candidates from the Placement API.14:30
sean-k-mooneyoh downstream call14:31
dansmithah14:31
dansmithso,14:31
dansmithI think that stack trace from sean-k-mooney is a red herring14:31
dansmithI think that's an images test that tries to delete the image whilst snapshotting or something14:31
dansmithit's not even one of the tests that failed in the testr report :)14:31
dansmithall three of those tests are novalidhost14:32
dansmithso maybe we're actually reporting something different to placement and running out of disk or something?14:32
sean-k-mooneyya maybe14:32
sean-k-mooneywe have 80G of disk in the ci vms but it may not all be available in /opt14:33
sean-k-mooneyso i don't know, we might have run out of space14:33
dansmithwell,14:34
dansmithit might be a reporting thing or something and not actually out of space,14:34
dansmithbecause we're not seeing problems, just placement is refusing to find space14:34
dansmithJul 24 12:44:22.575632 ubuntu-bionic-ovh-bhs1-0018770257 devstack@placement-api.service[50512]: DEBUG placement.wsgi_wrapper [req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 req-c57d5bd6-fc4e-469d-9784-cdfe1652d653 service placement] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'DISK_GB' on resource provider 'e786426a-5ae2-4732-8cf6-16325fd2bf2a'. The requested amount would exceed14:36
dansmith the capacity. {{(pid=50513) call_func /opt/stack/placement/placement/wsgi_wrapper.py:31}}14:37
dansmithOver capacity for DISK_GB on resource provider e786426a-5ae2-4732-8cf6-16325fd2bf2a. Needed: 1, Used: 10, Capacity: 10.014:37
dansmith10G doesn't sound right14:37
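The "Over capacity" message above follows from placement's capacity arithmetic, where capacity is (total - reserved) * allocation_ratio and an allocation is refused when used plus requested exceeds it. A minimal sketch, assuming an inventory of total=10, reserved=0 and allocation_ratio=1.0 (values chosen to reproduce the logged Capacity: 10.0, not read from the job's inventory); the function names are hypothetical.

```python
def capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Usable capacity of one resource class on a provider."""
    return (total - reserved) * allocation_ratio


def fits(needed: int, used: int, cap: float) -> bool:
    """True if the requested amount still fits within the capacity."""
    return used + needed <= cap


# Assumed inventory for DISK_GB, chosen to match "Capacity: 10.0".
disk_capacity = capacity(total=10, reserved=0, allocation_ratio=1.0)

# Values from the log line: Needed: 1, Used: 10.
can_allocate = fits(needed=1, used=10, cap=disk_capacity)
print(disk_capacity, can_allocate)
```

With ten 1 GB allocations already recorded, the eleventh request is rejected regardless of how much space the backing store really has, so the question becomes why the reported total is only 10.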
*** mlavalle has quit IRC14:40
mriedemrandom drive by comment but https://review.opendev.org/#/c/586363/14:40
*** eharney has quit IRC14:40
mriedemanyway related to ceph ci jobs?14:41
dansmithI'm trying to figure out, but we are running a ceph df right before we report inventory14:41
* mriedem ducks back into hole14:41
openstackgerritAlex Deiter proposed openstack/nova master: Detach is broken for multi-attached fs-based volumes  https://review.opendev.org/74171214:41
*** mlavalle has joined #openstack-nova14:43
*** k_mouza has quit IRC14:43
*** k_mouza has joined #openstack-nova14:53
*** eharney has joined #openstack-nova14:53
sean-k-mooneydansmith: by the way if the traceback is unrelated then we likely have another silent bug as we are not catching the exception in the missing image case14:54
dansmithsean-k-mooney: yep14:54
dansmithso we're calling ceph df to get the total size of the pool and reporting that14:55
sean-k-mooneydansmith: i think you are right that its unrelated14:55
dansmithas best I can tell, the ceph is backed by a 24G partition14:55
*** udesale_ has quit IRC14:55
dansmithso I dunno where the 10G is coming from14:55
sean-k-mooneythis is using the ceph image backend in nova so the local_GB should be the ceph pool size right14:56
dansmithwell, it should be yes14:56
dansmithceph has 24G, so I'm trying to find where our images pool would be limited to 10G but not seeing it14:56
dansmithone thing that might explain this,14:56
dansmithis that our normal ceph job was using qcow on rbd, which is not what you're supposed to do,14:57
sean-k-mooneyoh ya because we have to flatten it14:57
sean-k-mooneyit should be raw14:57
sean-k-mooneyto get the cow optimization14:57
dansmithand so we convert the image to raw, which is 44M per image instead of 12 or something.. although we shouldn't really be using that much space, so... hmm14:57
dansmithand this is just placement saying we're out of space, not ceph14:57
dansmithI wonder if glance is incorrectly determining the size of the new image after it flattens or something14:58
sean-k-mooneywell after the first image import it is all cow clones in ceph right14:58
dansmithand telling us we need a lot more than we do or something14:58
dansmithright14:58
dansmithcheck this out: Jul 24 12:44:22.270293 ubuntu-bionic-ovh-bhs1-0018770257 nova-scheduler[55176]: WARNING nova.scheduler.host_manager [None req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 tempest-MultipleCreateTestJSON-968818181 tempest-MultipleCreateTestJSON-968818181] Host ubuntu-bionic-ovh-bhs1-0018770257 has more disk space than database expected (8 GB > 1 GB)15:01
sean-k-mooneyreserved_host_disk_mb IS 0 TOO15:01
sean-k-mooneythat is strange do we have the hoststate update enabled15:02
*** k_mouza has quit IRC15:03
sean-k-mooneyim pretty sure we do15:03
*** eharney has quit IRC15:03
sean-k-mooneyya we do15:03
*** derekh has quit IRC15:03
sean-k-mooneydisk_allocation_ratio=1.0,disk_available_least=8,free_disk_gb=10,f15:04
dansmithwe're only asking placement for DISK_GB=1 allocation so I don't think we're getting a bad number from glance or anything15:04
sean-k-mooneywhat do our flavors look like15:06
sean-k-mooneyactually no never mind15:06
sean-k-mooneythis is not bfv15:06
sean-k-mooneythe flavor should be either 1 or 2GB per instance i think15:06
*** jsuchome has quit IRC15:07
dansmithand it seems like 1 since we're asking for that size allocation15:07
sean-k-mooneyya its based on the image size https://github.com/openstack/devstack/blob/2ecd1823850ae0e00ad0ecebbbceb312be60ccf4/lib/tempest#L204-L20615:09
sean-k-mooneyso for cirros image it will be 1g15:09
dansmithsudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 815:10
dansmiththat's 8G for the vms pool15:10
dansmithI dunno where we're getting 10G15:10
sean-k-mooneyi dont think that is the size15:10
sean-k-mooneyi think that is the buckets to share it in15:10
sean-k-mooneylet me check15:10
dansmithhmm, okay it seems like size15:11
sean-k-mooneyi think its the placement groups but its been a while15:11
dansmithokay yeah, maybe you're right15:11
sean-k-mooneyceph osd pool create <pool-name> <pg-num> <pgp-num> [replicated] \15:12
sean-k-mooney         [crush-ruleset-name] [expected-num-objects]15:12
sean-k-mooneyso ya its not the size15:12
dansmithyeah15:13
dansmithI still dunno where we're getting 10G,15:13
sean-k-mooneysame15:13
*** ociuhandu has quit IRC15:14
dansmithbecause CEPH_LOOPBACK_DISK_SIZE=24G15:14
sean-k-mooneyso it should be 8 https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/settings#L1715:14
sean-k-mooneyby default15:15
dansmithit's overridden in our job somewhere15:15
dansmithyou can see in the devstacklog15:15
sean-k-mooneyCEPH_LOOPBACK_DISK_SIZE is15:15
sean-k-mooneyis VOLUME_BACKING_FILE_SIZE15:15
dansmithyes15:15
dansmithboth are15:15
sean-k-mooneyok cool15:15
dansmithVOLUME_BACKING_FILE_SIZE=24G15:16
dansmithand the df shows 24G on /var/lib/ceph15:16
sean-k-mooneyah yes it does15:17
*** eharney has joined #openstack-nova15:17
dansmithwe run "ceph df" to get the DISK_GB we report,15:17
dansmithand don't really do much to it,15:17
dansmithso it really seems like we're being told 10G15:18
*** k_mouza has joined #openstack-nova15:19
dansmithlyarwood: do you know anything about what ceph df may be telling us about total pool size that differs from the backing store's size?15:19
lyarwooddansmith: nope, AFAIK it just reports the size of the images_rbd_pool15:20
* lyarwood looks15:20
dansmithseems straightforward :)15:21
lyarwoodhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 - ah well melwitt has a handy comment here that might help15:21
lyarwoodgah highlights are a few lines off but you get the point15:21
dansmithoh I read that,15:21
dansmithbut didn't grok until now15:22
dansmithso replication makes the thing look smaller I guess?15:22
dansmithseems weird to go from 24G to 10G, as that's not an even factor15:22
-openstackstatus- NOTICE: We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.15:22
dansmither, no wait15:23
dansmiththat's for max_avail, which is "free" not total right?15:23
*** k_mouza has quit IRC15:23
lyarwoodright sorry and you're seeing 10 reported as the total capacity right?15:24
dansmithcorrect15:24
lyarwoodkk sorry then that isn't it15:24
*** dklyle has joined #openstack-nova15:25
*** maciejjozefczyk has quit IRC15:26
dansmithI guess one thing we could do is increase the ceph backing size to 36G and see if DISK_GB goes up15:26
bauzascan someone tell me what the fuck is ? http://paste.openstack.org/show/796292/15:26
bauzastl;dr: ssh: connect to host review.openstack.org port 29418: Network is unreachable15:26
bauzashave I missed a memo ?15:27
melwittthere's an openstackstatus notice above ^ that said there will be a short outage15:27
openstackgerritSylvain Bauza proposed openstack/nova-specs master: WIP: Offline Reshape tool spec  https://review.opendev.org/74290815:29
bauzasyay, it worked15:29
bauzasmelwitt: thanks15:29
bauzascalling it a day15:29
*** k_mouza has joined #openstack-nova15:32
melwittdansmith: MAX_AVAIL should be total actually, just taking number of replicas into account. if you only have 1 replica (default NUM_REPLICAS=1) then MAX_AVAIL should match whatever total says in 'ceph df'15:32
dansmithmelwitt: you're reporting free as max_avail though in that thing aren't you?15:32
dansmithor does MAX_AVAIL != max_avail ?15:32
melwittbut if you've set NUM_REPLICAS=2 when you deployed a devstack, then since the devstack ceph plugin creates 2 OSDs on the same HDD in that case, it would be 2x the real disk15:32
melwittno MAX_AVAIL is a ceph thing15:33
melwitt(if you're referring to what is written about ceph df in rbd_utils.py)15:33
dansmithyou mean half the disk I assume15:33
dansmithyeah15:33
melwittno like the old behavior used to report 20G if you had a 10G disk, if you had created 2 OSDs that point at the same HDD15:33
dansmithso maybe (24 - overhead) / 2 == 10 or something15:34
melwittyou're using NUM_REPLICAS=1 right? you didn't set it in the job15:34
melwittif so, there shouldn't be a difference15:34
dansmithI'm not setting it, but let me look if it's getting set15:34
melwittI doubt it, I've never seen it set in CI before. I had to set it locally to do the testing for that MAX_AVAIL change15:35
dansmithyeah I don't even see that variable anywhere15:35
dansmithis that a devstack-plugin-ceph thing?15:35
melwittyeah sec15:35
lyarwooddansmith: https://docs.ceph.com/docs/jewel/rados/operations/pools/#create-a-pool ; sudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 8 ; that doesn't mean create a 8GB pool15:36
melwittbah sorry it's CEPH_REPLICAS https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L10915:36
*** k_mouza has quit IRC15:36
dansmithlyarwood: yeah we established that :)15:36
lyarwoodah sorry wasn't watching irc15:36
dansmithlyarwood: somewhere in the plugin I saw a comment that made it sound like that was size15:36
melwitt10G honestly I would have thought is just the cloud image's disk size, no?15:37
dansmithmelwitt: yeah 115:37
melwittor do we probably use something larger in CI15:37
dansmithmelwitt: no, said above, it's 24G15:37
dansmithmelwitt: https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/df.txt15:37
dansmithand it's overridden to 24G in the devstack log15:37
lyarwoodthat's total for the three different pools15:37
*** k_mouza has joined #openstack-nova15:37
lyarwoodvms images and volumes?15:37
melwittoh I see15:38
dansmithlyarwood: and are the pools set to something specific for size? that's what we're trying to find and can't :)15:38
dansmithlyarwood: the way it looks now I'd assume it just reports that they're all 24G in size, with various amounts free like zfs does for filesystems on a pool,15:38
dansmithbut I'm just guessing15:38
dansmithI'm stacking a ceph devstack so I can poke but right now all I have is logs15:38
dansmithif total decreases as we use space, then we're not really reporting the right thing to placement15:39
dansmithwhich could be part of the problem of course15:39
lyarwooddansmith: yup true and that's also going to bounce around a lot during a tempest run15:40
dansmithyep15:40
dansmithI'm pretty sure this is not a consequence of my job, by the way, I think mine is just a little slower because we have some glance features turned on, so we probably have a little more of a logjam than normal15:40
*** xek_ has quit IRC15:41
dansmithoh jeez, you know what I just realized?15:41
dansmithwe might be snapshotting to the file store and not the ceph store in some cases, actually15:41
dansmithhmm15:41
dansmithnova does the snapshots itself so maybe not, but if we ever do a raw image upload.. the default store is the file store15:41
*** gyee has joined #openstack-nova15:42
dansmithnot that that would cause this, but it might be changing the timing characteristics15:42
*** k_mouza has quit IRC15:42
dansmithI'll have to think on that a bit15:42
melwittwell, this doesn't look promising for MAX_AVAIL, it sounds like it would decrease with use and is not a total https://access.redhat.com/solutions/353796115:42
dansmithah yeah15:43
dansmithmelwitt: did you read this? https://access.redhat.com/solutions/227395115:43
dansmithwe're not replicated I guess so maybe that doesn't affect us in CI, but probably has some impact for real users of this15:44
melwittno15:44
melwittso there are multiple reasons MAX_AVAIL shouldn't be used :(15:45
*** bnemec is now known as beekneemech15:46
dansmithnot it!15:46
*** k_mouza has joined #openstack-nova15:47
dansmiththe other problem I'm guessing,15:47
melwittyeah... I'm thinking whether to revert that or tweak it to take total and divide by pool size, the latter would do what was actually desired and report total with replication considered15:47
dansmithis that if we report the real actual total (even minus replication overhead), but other pools can consume space from the same store,15:48
dansmithwe will tell placement we have more room than it can allocate15:48
dansmithso really we need to sum up all the pools on the same store, and then set reserved= for any space they use I guess, but then we race with those other uses in our reporting15:48
dansmithand could go negative15:48
melwittyeah, I'm trying to remember, I could have sworn this get_pool_info was only used to report free space, not total space, but I could be totally making that up15:49
melwittor that that's what it's used for ultimately in higher layers15:49
melwittlet me look up what "total" used to be, maybe it meant "total available"15:50
melwittno, looks like it was total. had total, total used, and total available15:51
*** k_mouza has quit IRC15:51
sean-k-mooneydansmith: one thing that i just thought of15:52
*** dtantsur is now known as dtantsur|afk15:53
sean-k-mooneyby default replicated pools have a replication factor of 315:53
sean-k-mooneyso if we have 24G of space we would only have 8 usable15:53
melwittbut looking at the clip again https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 I did parse out total_bytes to go with 'total', max_avail to go with 'free', and bytes_used to go with 'used'. so this should be fine....15:53
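(Aside: a minimal sketch of the field mapping melwitt describes here, total_bytes to 'total', max_avail to 'free', bytes_used to 'used'. The nesting of the sample dict is an assumption about what `ceph df --format=json` returns, the numbers are invented for illustration, and `pool_info` is a hypothetical helper, not nova's actual code.)

```python
GIB = 1 << 30

# Hand-written sample; layout assumed, values made up for illustration.
sample_df = {
    "stats": {"total_bytes": 10 * GIB},
    "pools": [
        {"name": "vms", "stats": {"max_avail": 9 * GIB, "bytes_used": 0}},
        {"name": "images", "stats": {"max_avail": 9 * GIB, "bytes_used": 0}},
    ],
}


def pool_info(df, pool_name):
    """Map ceph df fields to total/free/used for one pool (hypothetical)."""
    for pool in df["pools"]:
        if pool["name"] == pool_name:
            return {
                # Cluster-wide total, shared by every pool on the store.
                "total": df["stats"]["total_bytes"],
                # Per-pool value that already accounts for replication.
                "free": pool["stats"]["max_avail"],
                "used": pool["stats"]["bytes_used"],
            }
    raise KeyError(pool_name)


info = pool_info(sample_df, "vms")
print(info["total"] // GIB, info["free"] // GIB, info["used"])
```

Note that in this shape 'total' is the same for every pool, which is the concern raised earlier about pools sharing one store.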
dansmithI'm confused about whether we're replicating or not15:53
dansmithsean-k-mooney: ^15:53
sean-k-mooneythat is the default unless we create an erasure coded pool15:53
dansmithand even still, 24/3==10 only for very small values of 3 :P15:53
dansmithhmm, okay what is CEPH_REPLICAS then?15:54
melwittthat's the number of replicas for when it creates the pools15:54
sean-k-mooneywell we have 24G for ceph but we have multiple pools right?15:54
*** k_mouza has joined #openstack-nova15:54
sean-k-mooneythe images pool will also be using that15:54
dansmithright, vms and images15:55
sean-k-mooneyhttps://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L10915:56
sean-k-mooneyits 115:56
sean-k-mooneyCEPH_REPLICAS15:57
sean-k-mooneywhich for ci makes sense15:57
dansmithright, I think we established that earlier :)15:57
melwittyeah, I was saying earlier I've never seen CI use anything other than the default of 115:57
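(Aside: the replication arithmetic being discussed, as a small sketch. It assumes usable capacity is simply raw capacity divided by the replica count and ignores any overhead; `usable_gib` is a hypothetical helper.)

```python
def usable_gib(raw_gib, replicas):
    """Usable capacity if every object is stored `replicas` times."""
    return raw_gib / replicas


# 24G backing store with the replica counts mentioned in the discussion.
for replicas in (1, 2, 3):
    print(replicas, usable_gib(24, replicas))
```

None of these counts yields 10G from a 24G store, which fits the conclusion that replication is not the source of the 10G figure.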
sean-k-mooneywell we dont need to test anything else in our ci since we are not really testing ceph15:58
dansmithhttps://pastebin.com/6gcGhTHQ15:58
sean-k-mooneyjust ceph integration with other things15:58
dansmiththis is what my ceph df shows on a clean devstack15:58
*** gibi is now known as gibi_pto15:58
dansmithinterestingly I didn't update my backing size from 8 to 24, but still got 2415:58
gibi_ptoso I'm going away for a week. I will be back on 3rd of Aug15:58
dansmithgibi_pto: o/15:59
*** k_mouza has quit IRC15:59
gibi_ptoo/15:59
lyarwood\o15:59
sean-k-mooneydansmith: i think VOLUME_BACKING_FILE_SIZE is a devstack setting15:59
dansmithoh, I see, and ceph plugin uses that, gotcha16:00
sean-k-mooneyyes https://github.com/openstack/devstack/blob/e0d06adffcf4c8da1aefebc66f2de9a440badbf6/stackrc#L76616:00
sean-k-mooneyand devstack defaults it to 2416:00
sean-k-mooneyso that is where that is coming from16:00
sean-k-mooneythat was originally for cinder16:01
sean-k-mooneyoh16:03
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph_log.txt16:03
sean-k-mooneypgmap v5: 0 pgs: ; 0 B data, 704 KiB used, 9.0 GiB / 10 GiB avail16:03
sean-k-mooneyso ceph does think it has only 10G16:03
dansmithoh nice, but from where?16:03
*** k_mouza has joined #openstack-nova16:04
sean-k-mooneythere is a ceph folder at the root of the controller logs16:04
dansmithfrom my devstack: 2020-07-24 08:25:52.400945 mgr.x client.14099 192.168.201.41:0/3299763660 2 : cluster [DBG] pgmap v5: 0 pgs: ; 0B data, 188MiB used, 23.8GiB / 24.0GiB avail16:04
dansmithno I mean where is it getting the 10G16:04
sean-k-mooneyim wondering if we are using the filestore backend and didnt resize the filesystem or something?16:04
sean-k-mooneyalthough df on the host shows 24G right16:05
dansmithit does,16:05
sean-k-mooneyis that the block device size or filesystem16:05
dansmithand my local devstack shows the 24G16:05
dansmithfilesystem16:05
dansmithdoesn't look like we grab the ceph configs16:06
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#1016:06
sean-k-mooneyso its using filestore16:06
sean-k-mooneybut why is that 10G16:06
*** songwenping_ has joined #openstack-nova16:06
sean-k-mooneyoh its using bluestore not file store16:06
sean-k-mooneybut same question16:06
dansmithyou mean files in /var/lib/ceph right?16:07
sean-k-mooney bluestore(/var/lib/ceph/osd/ceph-0) _setup_block_symlink_or_file resized block file to 10 GiB16:07
sean-k-mooneyits not using the mount16:07
dansmithnot using the mount for what?16:07
sean-k-mooneyits creating a ceph-0 file inside it i think16:07
dansmithsure, that's inside the mount16:07
dansmithit's creating a 10G flat file right?16:08
sean-k-mooneyi think so16:08
*** k_mouza has quit IRC16:08
sean-k-mooneythat is then being used for the osd16:08
dansmithright16:08
sean-k-mooneyso we are creating a 24G flat file and attaching it as a loopback device then mounting it on /mnt/ceph16:09
sean-k-mooneysorry16:09
sean-k-mooney/var/lib/ceph16:09
dansmithright16:09
sean-k-mooneythen inside that they are creating another flatfile16:09
dansmithand then it's creating a file called block inside there as the actual thing the osd uses16:09
sean-k-mooneyand using that for the osd16:09
sean-k-mooneyyep16:10
sean-k-mooneyso this is wrong16:10
dansmithand that thing is 10G16:10
*** songwenping__ has quit IRC16:10
sean-k-mooneyi think we are expecting them to use the /var/lib/ceph mount directly for the osd16:10
sean-k-mooneyi suspect this behavior changed when we changed from the filestore to bluestore backend16:10
sean-k-mooneywe should mount the loopback device at /var/lib/ceph/osd/ceph-0/block16:11
sean-k-mooneyinstead that way it would have the full 24G16:11
dansmithI dunno what "directly" means.. they still have to store their data in their special format right?16:11
dansmithand it's normally a raw disk they want, so if we give them a filesystem they need to create a flat file to emulate the block device on no?16:11
dansmithfwiw, I don't have a block file (yet) and mine is reporting 24G16:12
dansmithso i dunno why it's different16:12
dansmithah, my osd0.log:16:13
dansmithxfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf16:13
dansmithso that's different than bluestore I guess you're saying?16:13
sean-k-mooneyi think if we just left /var/lib/ceph under / as part of the root filesystem and moved where we mount the 24G loopback device file to /var/lib/ceph/osd/ceph-0/block ceph would have all 24G16:13
sean-k-mooneydansmith: yes that is the filestore backend16:13
sean-k-mooneythat needs a folder to use16:13
dansmithah, CI is using the nautilus version of ceph, I'm on luminous16:13
sean-k-mooneyluminous is the default in the devstack plugin ya16:14
dansmithbut CI is using nautilus16:14
sean-k-mooneybut ci i guess is overriding it16:14
dansmithceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-osd, pid 895216:14
sean-k-mooneyya16:14
dansmithcan we set the backing store driver?16:14
dansmithback to xfs?16:14
sean-k-mooneywe could but i think the better solution is to change how we do the mounting16:15
sean-k-mooneybluestore is not the default16:15
sean-k-mooneyupstream16:15
sean-k-mooneyin ceph and downstream as of osp 1616:15
sean-k-mooneyso its nice to test with bluestore16:15
*** k_mouza has joined #openstack-nova16:15
dansmithunless you see where we're setting it to bluestore, it would seem maybe the default changed in nautilus?16:15
sean-k-mooneywell after luminous in any case but yes i dont think we currently set it directly16:16
dansmithyou said "bluestore is not the default" above16:16
dansmithso I'm confused about what you're proposing16:17
sean-k-mooneyoh i meant is16:17
sean-k-mooneyit is now the default in ceph16:17
*** k_mouza has quit IRC16:17
sean-k-mooneyfilestore used to be the default before16:17
dansmithokay that's what I was saying16:18
*** k_mouza has joined #openstack-nova16:18
dansmithI still don't get where the 10G comes from, other than that something is clearly different with blue vs xfs stores16:18
sean-k-mooneybluestore has been the default for a few releases now. filestore is deprecated upstream and downstream in osp16:18
sean-k-mooneydansmith: i think that is the default size that the ceph tool uses16:19
sean-k-mooneywhen its creating a backing file16:19
dansmithokay I don't see that anywhere16:19
sean-k-mooneyits being created by the ceph osd itself here https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#416:20
dansmithI imagine that keeping the loopback mount for var lib ceph is ideal for the plugin as long as we have stable branches that use that16:20
dansmithsean-k-mooney: yeah I get that :)16:20
dansmithsean-k-mooney: I'm saying I don't know where 10G is set or assumed or whatever ;)16:20
sean-k-mooneyyes we can proably change that in the job?16:21
melwittdansmith: is it not here? https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#1816:21
dansmithlol16:21
dansmithyes, I understand 10G is being used16:21
sean-k-mooneyor if the devstack plugin is branched we can change it only on the branches that use nautilus16:21
dansmithI'm saying I don't see a config for that16:21
dansmithhttps://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/16:21
sean-k-mooneyif there is a config it would be the ceph config file16:21
melwittoh, sorry, just saying that's a command that is setting 10G deliberately16:22
dansmithmelwitt: yep I think that's understood now16:22
melwittfrom what sean-k-mooney was saying, I thought no one saw a deliberate setting of it yet16:22
melwittthat it was happening "automatically"16:22
melwittI am caught up now16:22
dansmithall I'm saying is, I imagine that bluestore can have more than 10G of backing store, and if it's not basing that on actual disk free space, it's probably a config somewhere or something :)16:23
melwittyeah, I understand now16:23
melwittI misunderstood what sean was saying earlier16:24
dansmiththe bluestore config actually seems to be mostly focused on using physical devices16:25
*** songwenping__ has joined #openstack-nova16:25
sean-k-mooneymelwitt: it happens frequently that people misunderstand me. well not that often if i type what i meant to type but i am bad at not doing that16:27
*** songwenping_ has quit IRC16:27
*** nightmare_unreal has quit IRC16:27
melwittsean-k-mooney: eh, I often have trouble understanding people so by our powers combined... !16:28
dansmithI HAVE NO IDEA WHAT YOU PEOPLE ARE SAYING16:30
melwittI AM GOOD AT DEALING WITH PEOPLE16:31
* stephenfin decides this is too much weirdness and bails16:31
sean-k-mooneystephenfin: talking about storage does this to people16:31
sean-k-mooneyjust taking a step back16:32
sean-k-mooneywe are happy we know where the 10G size is coming from now16:32
melwittStOrAGE16:32
*** eharney has quit IRC16:32
sean-k-mooneyand that we are probably just hitting a real no valid host error because we are actually providing 10G to ceph instead of 2416:33
sean-k-mooneyso the ceph job failures are not related to dansmith's recent changes to the job16:33
sean-k-mooneyyes?16:33
*** ociuhandu has joined #openstack-nova16:34
dansmithsean-k-mooney: yeah I thought we were assuming that16:34
dansmithbecause it's clearly just placement16:34
dansmithmy job might be slower (or faster) causing us to hit it more than we were or something16:34
sean-k-mooneyso we just either a.) swap back to file store to get the old behavior or b.) mount our loopback file in such a way that the bluestore block device uses our 24G loopback device instead of creating its own16:35
dansmithyeah so I figured going back to xfs would be ideal for compatibility with everything16:35
dansmithmy system clearly gets xfs16:35
dansmithI assume the workers are getting blue because they're newer ubuntu or something16:35
dansmithI'm still on bionic16:36
sean-k-mooneydansmith: sure but eventually we will have to move since i think filestore is deprecated in ceph16:36
dansmithsure16:36
dansmithpain now or pain later16:36
dansmithpain later might be someone else's pain :P16:36
sean-k-mooneyso i guess what we are looking for is a ceph config option to select filestore for the osd backend16:36
sean-k-mooneythat or we set it on the osd create command16:37
dansmithyeah16:38
sean-k-mooneyso this code https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L475-L48416:38
*** ociuhandu has quit IRC16:39
sean-k-mooneythat initial sudo ceph -c ${CEPH_CONF_FILE} osd create16:39
dansmithwell, the other option is to figure out how to make blue use 20ish G instead of 10,16:39
dansmithwhich would be less impactful than retooling the mount stuff in the ceph plugin16:39
sean-k-mooneywell we are mounting it on /var/lib/ceph16:40
sean-k-mooneyso i guess this is already plugin specific16:40
sean-k-mooneywe are likely reusing the same function that is used for cinder and just passing the mount path16:40
sean-k-mooneyya we are just calling create_disk https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L38716:40
dansmithcinder just wants a loop not a mounted fs though right?16:41
sean-k-mooneymaybe this is the function in devstack https://github.com/openstack/devstack/blob/eee60c76719c02c08dba7b7fb703798a056b22b9/functions#L758-L78916:41
sean-k-mooneythat kind of looks like a hack16:42
sean-k-mooneye.g. that does not look like it was created originally for ceph16:42
melwitthm, I found this https://forum.proxmox.com/threads/proxmox-ceph-osd-partition-created-with-only-10gb.55291/16:43
sean-k-mooneyoh its for swift originally16:43
*** maciejjozefczyk has joined #openstack-nova16:44
sean-k-mooneyi think the  "sudo ceph-osd -c ${CEPH_CONF_FILE} -i ${OSD_ID} --mkfs" is the one we would need to modify16:46
sean-k-mooneymelwitt: that does seem like the same issue more or less16:49
melwittyeah, I'm having trouble understanding it16:49
melwittthe last comment links to another post https://forum.proxmox.com/threads/where-can-i-tune-journal-size-of-ceph-bluestore.44000/ where they're talking about tuning journal size and bluestore_block_db_size and bluestore_block_wal_size16:50
melwittand I don't know what any of that is or means16:51
melwitt(in ceph.conf)16:51
*** markvoelker has quit IRC16:52
*** maciejjozefczyk has quit IRC16:59
*** k_mouza has quit IRC16:59
sean-k-mooneythose are not related to the data storage size of the osd17:01
sean-k-mooneybluestore has an embedded database that tracks where the logical blocks are located on disk17:01
sean-k-mooneywal i think is the write ahead log or something like that17:02
sean-k-mooneyits part of how it does write journalling17:02
sean-k-mooneyin both cases they are tuning parameters for how bluestore saves its metadata17:03
sean-k-mooneyunlike filestore it can save it inline in the block device it is managing or it can save it on external devices and they support tuning the sizing of them independently17:03
openstackgerritArtom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration()  https://review.opendev.org/72976317:04
melwittsean-k-mooney: found a new thing https://bugzilla.redhat.com/show_bug.cgi?id=159704817:09
openstackbugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard17:09
dansmithimagine that :)17:10
melwittwhat17:11
sean-k-mooneyit should be 3.7TB but is 10G17:11
dansmithmelwitt: "not showing correct disk size"17:11
melwittyeah?17:11
melwittI'm still googling for why bluestore is maxed out at 10G17:12
dansmithmelwitt: just saying, I think we've stumbled into a realization that our df reporting on ceph in libvirt es no bueno right?17:12
melwittno17:12
dansmithoh did I miss something? I thought those RHN articles were indicating that we're reporting the wrong thing still17:13
sean-k-mooneyhttps://bugzilla.redhat.com/show_bug.cgi?id=1597048#c817:13
openstackbugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard17:13
dansmithlike, if we're reporting the total size of the osd, but that's shared by vms and images, we'll be telling placement it can allocate all that space for instances but it can't17:14
sean-k-mooneyso it looks like they never actually got to the root cause of why the bluestore file was a 10G file17:14
sean-k-mooneythey just redeployed with filestore and ignored it17:14
*** songwenping_ has joined #openstack-nova17:14
melwittyeah.. but then what does this mean? "The BlueStore block device was a file named with a block, not a symlink to block device partition of this disk and that file size was 10G hence it was showing the size of the OSD as 10G."17:15
sean-k-mooneyi think they meant17:15
sean-k-mooneythat instead of it being a symlink to /dev/sdX17:15
sean-k-mooneyit was a file named block17:15
sean-k-mooneythat was 10G17:15
sean-k-mooneyhence -rw-r--r--. 1 ceph ceph 10737418240 Jul  2 16:51 block17:16
melwittright.. so you think it's correct that it's pointing at a file named block? and that the problem is that the file is not larger than 10G?17:16
sean-k-mooneythey were expecting it to be a symlink to the actual hdd17:16
sean-k-mooneywell in that case yes17:16
sean-k-mooneyand also likely in our case17:16
melwittok. from reading that I thought maybe it was pointing wrongly at a file17:17
*** songwenping__ has quit IRC17:17
*** k_mouza has joined #openstack-nova17:17
sean-k-mooney/var/lib/ceph/osd/ceph-0/block is likely a 10G file17:17
dansmithI think that bug is that they deployed on file instead of having the bluestore osd use the disk they wanted17:17
sean-k-mooneydansmith: yes17:17
dansmithwe want file, they wanted disk, right?17:17
sean-k-mooneyyes17:18
melwittoh17:18
melwittok so why is /var/lib/ceph/osd/ceph-0/block only 10G ... who creates it ...17:18
dansmithright, that I think we still don't know.. where the 10G comes from and how we change it17:19
dansmithbecause the osd itself (the driver) seems to create that as a flat 10G file if it's not there17:19
melwittyeah, at least before now I did not know that the 10G comes from the size of the file named "block" so now I'm gonna see if I can find where that file is created17:20
*** k_mouza has quit IRC17:21
*** k_mouza has joined #openstack-nova17:22
*** k_mouza has quit IRC17:27
melwitthm https://github.com/ceph/ceph/blob/master/src/common/legacy_config_opts.h#L94017:28
*** hamalq has joined #openstack-nova17:29
dansmithlol17:29
melwitthttps://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L412217:29
dansmithso maybe we're supposed to create that block file to be what we want it to be17:30
melwittlooks like they changed the default to 100G at some point17:30
dansmithdefinitely obscure though17:30
sean-k-mooneydansmith: yes we are meant to create the file/partition first normally when deploying ceph17:30
melwittwell I think you can set bluestore_block_size in ceph.conf no?17:30
melwittoh17:30
sean-k-mooneyam likely17:31
sean-k-mooneybut ceph does not expect to have to create this normally17:31
melwitthttps://github.com/ceph/ceph/commit/57890fce7064811780823e298b31e7fced2fa0e317:31
sean-k-mooneyif you use the tooling they provide they create the partitions ahead of time17:31
melwittthat's more recent, change from 1 TB -> 100G default. but in older versions the default was 10G, trying to see when that was so we can compare with what version we're running17:32
sean-k-mooneythis is the function that actually creates the file https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L593417:33
melwittv15.1.0 is Octopus17:33
sean-k-mooneyif the block file is not present it creates it https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5931-L593417:34
melwittand somehow 'size' is passed in from the config option I assume17:34
sean-k-mooneythat is what im currently trying to find yes17:34
sean-k-mooneythis maybe https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5943-L594417:36
sean-k-mooneyah no its here17:38
sean-k-mooneyhttps://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L6050-L605217:38
sean-k-mooneyso in the mkfs call17:38
sean-k-mooneyso when we do this17:39
sean-k-mooneyhttps://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L482-L48317:39
melwittah yup, and it's pulling the conf option17:39
sean-k-mooneyit causes the mkfs function to be invoked on the backend store17:39
sean-k-mooneywhich for bluestore uses that config option to create the 10G file17:39
melwittthe interesting thing is, I wonder why it uses the legacy option and not the new one. I don't understand how that works in their code. cause in nautilus they have both the 10G and 100G default in the legacy conf vs the non17:40
sean-k-mooneyif /var/lib/ceph/osd/ceph-0/block is not a symlink to a device17:40
sean-k-mooneymelwitt: its legacy on master17:41
sean-k-mooneyit might not be on nautilus17:41
melwittoh, I thought you mentioned earlier that CI is using nautilus17:41
sean-k-mooneyactually its also here on master https://github.com/ceph/ceph/blob/master/src/common/options.cc#L4127-L413117:41
sean-k-mooneymelwitt: yes it is17:42
sean-k-mooneyactually just above that17:42
sean-k-mooneyhttps://github.com/ceph/ceph/blob/master/src/common/options.cc#L4122-L412517:42
melwittyeah I'm saying it's weird that it's not defaulting to 100G like that is showing17:42
melwittthe old default was 10G17:42
sean-k-mooneyyep17:43
sean-k-mooneywe are pulling 14.2.2 https://github.com/ceph/ceph/blob/v14.2.2/src/common/options.cc#L433917:43
sean-k-mooneywhich is 1017:44
sean-k-mooneythey backported the 100G change to nautilus17:44
sean-k-mooneybut its not in the tag we are pulling17:44
sean-k-mooneyi think legacy_config_opts.h is just an old way to define config options17:45
melwittohhh17:45
melwittgood find. ok at least everything makes sense now17:45
sean-k-mooneyrather than deprecated by the way17:45
sean-k-mooneyya so i guess we just set that config option to say 20G?17:45
sean-k-mooneyin ceph.conf17:46
melwittyeah, seems like it17:46
sean-k-mooneywhich we can do here https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L415-L42817:46
melwittyarp. just have to double check whether it's a "global" or what. are those config groups or?17:47
sean-k-mooneyi like how this is basically undocumented other than in the source code17:47
sean-k-mooneyi think in global yes17:47
melwittyeah, I know. they have a bluestore config doc but zero mention of this https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref17:48
sean-k-mooneyiniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "20"17:48
sean-k-mooneyis that right?17:48
sean-k-mooneyi was searching for 10_G but i think _G is a user-defined suffix17:49
sean-k-mooneyso now i need to find that17:49
*** aj_mailing has joined #openstack-nova17:50
sean-k-mooneyyep https://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L343-L34517:50
melwittoh, is the unit GB or something else?17:51
sean-k-mooneyits in bytes i think17:52
sean-k-mooney10_G is doing 10 << 3217:52
sean-k-mooneyit's a C++11 user-defined literal https://en.cppreference.com/w/cpp/language/user_literal17:52
sean-k-mooneyactually it's << 30 not 3217:53
sean-k-mooneybut ya still bytes17:53
sean-k-mooney unsigned long long .... i'm glad they also defined a better way to name integers in c++11 so you don't have to use that c way of naming types17:54
melwittok so you can't just put "20" in the conf17:54
sean-k-mooneyi think we have to do 20<<3017:54
sean-k-mooneyso 2147483648017:55
melwittright17:55
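The arithmetic above can be checked with a short shell sketch. It assumes, as concluded in the discussion, that bluestore_block_size is expressed in bytes and that the _G suffix in ceph's options.cc amounts to a left shift by 30. The helper name gib_to_bytes is hypothetical and is not part of devstack or ceph.

```shell
# Hypothetical helper (not part of devstack or ceph): convert GiB to bytes.
# Mirrors the effect of the _G literal discussed above: value << 30.
gib_to_bytes() {
    local gib=$1
    echo $(( gib << 30 ))
}

gib_to_bytes 10   # the old 10G default, in bytes
gib_to_bytes 20   # the 20G value proposed for ceph.conf, in bytes
```

The second call yields 21474836480, the figure quoted in the conversation.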
*** tesseract has quit IRC17:59
sean-k-mooneyi'll pretend they are not putting an unsigned long long into a size_t variant without asserting it fits17:59
melwitt:)18:00
*** k_mouza has joined #openstack-nova18:00
*** aj_mailing has quit IRC18:01
*** aj_mailing has joined #openstack-nova18:02
*** k_mouza has quit IRC18:05
*** gmann is now known as gmann_lunch18:13
dansmithhave ya'll fixed it yet?18:16
sean-k-mooneyi'm looking at a linux bridge issue from the neutron channel currently but it looks like we just need one more line here to set the config option https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L42918:18
sean-k-mooneydansmith: can you test it with your local setup18:18
sean-k-mooneyjust add iniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "21474836480"18:19
dansmithyup18:19
dansmithoh wait, I can't18:19
dansmithbecause mine doesn't use blue18:19
dansmithbut I can float a patch and get jobs going18:19
sean-k-mooneyya that works18:19
sean-k-mooneyi dont have a ceph env currently i could set one up but its almost half past 7 on a friday so dont want to wait for it to stack :)18:20
*** ociuhandu has joined #openstack-nova18:22
dansmithsean-k-mooney: dude, you need to cut yourself off :)18:24
dansmithhttps://review.opendev.org/#/c/742961/18:25
*** ociuhandu has quit IRC18:27
dansmithI think the nova team needs to have the keys to sean-k-mooney's irc bouncer so we can turn it off when it's time for him to sleep18:27
sean-k-mooneyhehe i don't use one i just don't turn my laptop off :P18:28
dansmithlike giving car keys to the bartender18:28
melwittsean-k-mooney laptop and dev box permanently ON18:28
dansmithsean-k-mooney: well, then an ssh account to your laptop I guess18:29
sean-k-mooneymelwitt: yes they more or less are.18:29
dansmithmelwitt: more like sean-k-mooney permanently ON18:30
melwittman, what was I doing earlier18:30
melwitttrue18:30
dansmithlaptop sleep timer be like "jesus when is he going to go to bed, I'm exhausted"18:30
melwitthaha yeah18:30
artomdansmith, I think we'll need remote access to his fuse box...18:31
artomFirst we'll need to invent an SSHable fuse box...18:31
* artom checks18:31
dansmithartom: he's a property owner now, so we can't go to the landlord18:31
dansmithartom: no need to cut the power just reset his luks key18:32
* artom was half expecting remotable fuse boxes to exist, because IoT18:32
artomBut then they'd have a crappy web UI with 'password' hardcoded as the admin password18:33
artomSo maybe not18:33
melwittyeah, I would not be surprised18:33
sean-k-mooneydansmith: oh you mean resetting the luks key on my laptop would be a pain to fix18:34
dansmithsean-k-mooney: no, we can reset it back to the one you know when you should be online18:34
sean-k-mooneyah ok18:34
dansmithhah18:35
sean-k-mooneyspeaking of which o/18:35
dansmithgood :)18:35
melwittwait, weren't you on pto today too? what the heck18:35
*** slaweq has joined #openstack-nova18:35
sean-k-mooneyyesterday18:35
sean-k-mooneywell until today18:36
dansmithmaybe /melwitt/ needs the sleep18:36
sean-k-mooneyi was at a funeral18:36
sean-k-mooneyso ya back for a "shortish" day of not very stressful things18:36
sean-k-mooneyi planned to leave after the bug call but got distracted18:36
sean-k-mooneyanyway food18:36
sean-k-mooneyo/18:37
melwittwell we swapped running the bug call today so I was like why is sean here18:37
melwitthave a nice weekend o/18:37
artommelwitt, he was on PTO wednesday, so unable to send out the email18:37
artomAnd usual email + run the call go hand in hand18:37
melwittno, I swapped with him and I was supposed to send the email18:37
melwittI just forgot to18:37
artomRight, so swap means you get to run the call as well18:37
melwittI know, and I did18:38
artomBut... he's allowed to be there for the call18:38
melwittI know he's allowed to be there lol18:38
artomWELL WHAT THE HELL ARE WE ARGUING ABOUT18:38
melwittI just thought if we swapped cause he was out, I was surprised when he was there18:38
melwittI DONT KNOW18:38
artomWHY ARE WE YELLING18:38
melwittWE ARE HAVING TROUBLE CONTROLLING THE VOLUME OF OUR VOICE18:39
artomOH RIGHT I STARTED IT IM SO SORRY18:39
melwitts/TROUBLE/DIFFICULTY/18:40
artomUmm, how about we do real work for a bit? Is there a way to see stats for a particular job? nova-ceph-multistore just failed twice on me18:40
dansmithlol18:40
dansmithdude18:40
artommriedem would have hacked up a logstash query in seconds18:40
dansmithhave you like paid attention to the last three hours in here at all?18:40
melwittomg18:40
dansmithand also, logstash is still fubar I think18:40
melwittno u di'nt18:40
artomdansmith, me? Pay attention? lol u cray cray18:41
dansmithapparently ;)18:41
*** xinranwang__ has quit IRC19:04
*** huaqiang has joined #openstack-nova19:18
*** gmann_lunch is now known as gmann19:19
mriedemi only do logdna queries these days now anyway19:22
*** dklyle has quit IRC19:26
*** dklyle has joined #openstack-nova19:30
*** dklyle has quit IRC19:40
melwittdansmith: I opened https://bugs.launchpad.net/nova/+bug/1888895 for the gate failure. going to mail the ML now19:43
openstackLaunchpad bug 1888895 in devstack-plugin-ceph "nova-ceph-multistore job fails often with 'No valid host was found. There are not enough hosts available.'" [Undecided,In progress]19:43
dansmithcool19:43
melwittgreat, the WIP patch just failed19:44
melwitt[errno 110] error connecting to the cluster wtf19:44
dansmithmaybe that config made it fail to start?19:46
dansmithno ceph logs19:47
dansmithhrm, don't even see it set the ini,19:47
dansmithso maybe it didn't even get that far19:47
melwittyeah must be unless it's a fluke coincidence that ceph totally bombed this time19:50
dansmithI think it must be a fluke bombing because it didn't run that line19:51
*** dklyle has joined #openstack-nova19:51
dansmithah here we go: 2020-07-24 18:32:36.045 | /opt/stack/devstack-plugin-ceph/devstack/lib/ceph: line 429: (24G: value too great for base (error token is "24G")19:53
melwittah19:55
dansmithhacky fix19:55
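The "value too great for base" message is what shell arithmetic reports when it is handed a token such as 24G, since G is not a valid digit. The log does not show what the hacky fix actually was, so the following is only a sketch of one way to avoid the error: strip the suffix before doing arithmetic. The helper name size_to_bytes is hypothetical, and it assumes a bare number is also meant as GiB.

```shell
# Hypothetical helper: turn a size such as "24G" (or a bare "24") into bytes.
# Assumption: a value without a suffix is also interpreted as GiB.
size_to_bytes() {
    local size=$1
    local num=${size%[Gg]}   # drop one trailing G or g, if present
    case $num in
        ''|*[!0-9]*)
            # Anything other than plain digits would break $(( ... )).
            echo "invalid size: $size" >&2
            return 1
            ;;
    esac
    echo $(( num << 30 ))
}

size_to_bytes 24G
```

The resulting byte count could then be passed to the iniset call suggested earlier in place of a hand-computed literal.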
*** ralonsoh has quit IRC19:57
*** dklyle has quit IRC20:04
dansmithmelwitt: oops, should have put the bug on that, sorry20:05
dansmithbut it'll need cleanup20:05
melwittah yeah20:05
melwittI was just thinking, I think all ceph jobs in openstack are broken over this, not just ours20:05
dansmithpotentially, but like I said, my job may be running slower or have different behaviors20:06
dansmithactually20:07
melwittyeah. I was thinking of replying to mention other ceph jobs may be affected too. I see the older version 14.2.2 being pulled in openstack/tempest, for example20:07
dansmithheh, I just saw a glance failure on the plain ceph job which is the same novalidhost20:07
dansmithso I think yeah.20:07
dansmithyeah20:07
* melwitt nods20:07
dansmiththis is from one of my glance patches: https://e81e6b81331830d4903c-5acdef5dc10478cee5291df1596ec66a.ssl.cf1.rackcdn.com/742065/9/check/devstack-plugin-ceph-tempest-py3/292fd21/testr_results.html20:08
melwittah yeah. and this is the tempest job failure I was looking at https://1ca2ee7583d21788b1d8-42b9b3ca9891e58d539431fcfb5b799d.ssl.cf2.rackcdn.com/742836/2/check/devstack-plugin-ceph-tempest-py3/547fab7/testr_results.html20:09
melwitt(I picked a recent patch proposed to the tempest repo)20:09
dansmithcool20:09
*** dklyle has joined #openstack-nova20:09
dansmiththat's awesome because of two things:20:09
dansmith1. I didn't break stuff20:10
dansmith2. I get to steal the credit from sean-k-mooney for fixing more people!20:10
melwittlol awww20:10
*** ociuhandu has joined #openstack-nova20:10
dansmithdang, now I better give him honorable mention in the commit message ;P20:10
melwitthey, I found the default config value too20:11
melwittafter that he blew past me finding how/where it was used20:11
dansmithheh20:12
dansmithhe gave me a line to copy/paste, so...20:12
melwittyeah, that's what it's all about20:13
*** ociuhandu has quit IRC20:15
*** dave-mccowan has joined #openstack-nova20:25
*** mriedem has left #openstack-nova20:58
*** ociuhandu has joined #openstack-nova22:02
*** ociuhandu has quit IRC22:07
*** _erlon_ has quit IRC22:23
*** raildo has quit IRC22:28
*** dave-mccowan has quit IRC22:44
*** mlavalle has quit IRC23:02
*** martinkennelly has quit IRC23:09
*** hamalq has quit IRC23:10
*** tonyb[m] has left #openstack-nova23:21
*** bbowen has quit IRC23:34
*** bbowen has joined #openstack-nova23:35
*** tosky has quit IRC23:55
