Friday, 2022-07-08

opendevreviewAmit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down  https://review.opendev.org/c/openstack/nova/+/84888604:58
bauzasgood Friday Nova07:34
Ugglabauzas, hello did you upgrade to F36 ?08:04
bauzasyup08:04
bauzashad an issue with my network, but this was on me08:04
bauzasmy perf issues with Wayland and any video stream eating all my CPU aren't fixed08:05
bauzasI have to try to see whether I could use hardware acceleration (I have an intel gpu) for those streams08:05
bauzasintel soc I mean08:06
Ugglabauzas, any diff if  you use xorg ?08:06
bauzasUggla: haven't tested yet08:07
bauzasbut I need a reproducible test08:07
bauzasI'll use webrtc08:07
bauzassar gives me a lot of information08:07
bauzasbut it is clear this is just a CPU-bound task, that's it08:08
bauzasnot a lot of context switching08:08
bauzasmy sys % is high, about 20%08:08
bauzasand user % is about 60%08:08
bauzasso I straced08:08
bauzasand I see lots of polling08:08
gibigood morning08:08
Ugglabauzas, mine is also slow sometime when it is getting hot. I think the cooling system is a bit limited.08:09
Ugglagibi, o/08:09
bauzasUggla: good point, I should dig sar's power information08:09
gibiUggla: give me 20 mins and I'm all yours if neede08:10
gibid08:10
bauzasand yeah, in general, my fan is yelling08:10
bauzasI suspected that the CPU is stepping down08:10
bauzashence the perf issues08:10
bauzasbut that's a guess08:11
Ugglabauzas, in such case, I can not switch the power balance to performance, it sticks to balanced.08:11
bauzasUggla: me too, can't modify it08:11
bauzasthis is like I was running my laptop unplugged08:11
bauzaswhich shouldn't be the rule08:11
bauzasif my laptop is plugged, I shall expect max performance and CPU power not stepping down 08:12
bauzasbut yeah, haven't checked the temperature08:12
bauzasyou make me realize a thing08:12
bauzasI think my fan was yelling because the CPU was intensively high08:13
bauzasthought*08:13
bauzasbut that could be the contrary08:13
bauzasif the temperature rises, then the fan yells and then the CPU could step down08:13
bauzasall of this seems tied to Fedora power management tools08:14
bauzasbut yeah, needs a reproducer to validate 08:14
bauzasUggla: either way, I was having performance trouble with F35 already08:14
bauzasso F36 isn't a regression08:15
Ugglabauzas, I think it is what happened in my case. And that's dramatically long  when you run the openstack tests suite.08:15
bauzaspoint is, the TB3 system doesn't help08:16
bauzaspreviously with my other laptops, those were docked08:16
bauzasso the fan was having room for air08:16
bauzasnow, with the tb3 plug, the laptop is just sit down08:16
Ugglaclearly08:16
bauzaswill continue to dig into it08:17
bauzasthere are actions I could do08:17
bauzasand I'd like to verify08:17
bauzasmaybe the cooling system is sufficient and it is only Fedora being too conservative in terms of power consumption08:18
bauzashaving the fan burning isn't a problem08:18
bauzasthat's the CPU stepping down which is an issue08:18
Ugglayep I think so as well.08:19
gibiI also have perf issues with google meet (on T14, debian sid, xorg, i3, chrome). During our calls sometimes chrome starts to hog the CPU and the audio and video start lagging and clipping. I haven't figured out yet what triggers the CPU hogging08:19
gibibut as the CPU gets hot, thermal throttling kicks in and worsens the situation08:20
Ugglagibi, welcome to the club :)08:21
gibithen after 1 minute chrome stops hogging the CPU and after an additional minute google meet audio/video recovers08:21
gibiI suspect that in my case it is chrome that does something crazy08:21
Ugglabauzas, maybe we should have a look at the firmware as well. 08:23
* Uggla have no idea if this laptop is updated.08:24
Uggla*has08:24
Ugglagibi, have you a moment to look at my issue ?08:25
gibiyes, I'm ready now08:25
gibilets test some google meet ;)08:25
Ugglayep08:25
Uggla:)08:25
bauzasok, I tuned sar to report my CPU clock freq08:34
bauzaswe'll see, I have to join auniyal_'s meeting now :)08:34
bauzashah ! 08:39
bauzasmy clock freq turns down to 800MHz while the temperature on the devices is only about 50°C08:39
gibithat sounds bad08:40
bauzaswhen I'm running gmeet I mean08:40
gibimine can sustain 1.8GHz while hot08:40
bauzassar reports the frequency constantly flipping between 866MHz and 1.5GHz08:41
bauzasevery second08:41
bauzasyeah, found a direct correlation08:42
bauzashuzzah08:42
bauzasbetween the CPU clock freq and the CPU usage08:43
bauzashttps://paste.opendev.org/show/bHWo1z8ElaEDqW2b95fZ/08:47
bauzas[sbauza@sbauza ~]$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor08:51
bauzaspowersave08:51
bauzashmmmpfff08:51
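The check above (governor per CPU, plus the clock frequency that sar was reporting) can be sketched in a few lines of Python. This is only an illustrative sketch: the `scaling_governor` path comes from the log, while `scaling_cur_freq` (reported by the kernel in kHz) and the helper name `read_cpufreq` are assumptions added here.

```python
from pathlib import Path


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_mhz)} for every CPU exposing cpufreq."""
    result = {}
    for cpufreq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        governor = (cpufreq_dir / "scaling_governor").read_text().strip()
        # scaling_cur_freq is expressed in kHz, convert it to MHz
        cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text().strip())
        result[cpufreq_dir.parent.name] = (governor, cur_khz / 1000)
    return result


if __name__ == "__main__":
    for cpu, (governor, mhz) in read_cpufreq().items():
        print(f"{cpu}: governor={governor} freq={mhz:.0f} MHz")
```

Running it in a loop while a video call is active should make it possible to see whether the frequency drops line up with the governor in use.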
kashyapbauzas: What laptop is yours?08:56
bauzas11:02:30        CPU       MHz09:04
bauzas11:02:31        all    410,3409:04
bauzast14s09:04
kashyapI see09:05
Ugglabauzas, maybe you can run sudo fwupdmgr get-devices it will give you fw revision on your laptop. So we could compare with mine and gibi's.09:30
bauzason a meeting with auniyal_09:31
bauzaswill reply later09:31
bauzasbut I think this is the CPU governor's fault09:31
gibiI have a ThinkPad T14s Gen 109:33
gibihm, the powersave governor could be a problem09:33
Ugglabauzas, from the latest system release note: • Update includes thermal optimization.09:34
Ugglaclearly mine is not at the latest version 0.1.15 vs 0.1.2109:35
Ugglaso bauzas, as you received this laptop first, maybe your fws are really "old".09:38
UgglaMy current version: https://paste.openstack.org/show/b9cnjsnSNOqzjw52AA8L/09:39
gibithis is mine https://paste.opendev.org/show/b48RVDFNHGRaeHUq2qMg/09:47
sean-k-mooney1bauzas: powersave keeps the cpu at the minimum frequency10:10
*** sean-k-mooney1 is now known as sean-k-mooney10:10
sean-k-mooneyyou should set it to the hardware pstate driver or use ondemand10:10
sean-k-mooneyalso i recommend not using tuned10:10
sean-k-mooneyi have had several issues with it in the past so i dont trust it anymore10:11
sean-k-mooneyby issues i mean putting one or both ports in a bond to sleep or disabling my mouse/keyboard randomly10:12
bauzassean-k-mooney1: yup, was considering to use ondemand10:14
bauzassean-k-mooney1: but I'd like a different governor depending on whether I'm plugged in or not10:15
bauzasI thought Fedora was managing it but apparently not10:15
sean-k-mooneybauzas: i used to use udev rules to do that10:15
bauzasyup, that's what I thought I need to do10:15
sean-k-mooneyill see if i have them but i dont know if i have it on this laptop since i reinstalled10:16
bauzasbut this sucks for users10:16
bauzashow many people know about udev rules ?10:16
bauzasand systemctl ?10:16
sean-k-mooneywell gnome/cinnamon/kde often can10:16
sean-k-mooneyits normally built into the desktop environment10:16
bauzaswhy can't we have a standard behaviour to be ondemand when plugged and powersave when unplugged ?10:16
sean-k-mooneywith a nice gui10:16
bauzassean-k-mooney: yup, and I saw it10:17
bauzassean-k-mooney1: but now with F35 and F36, this is stuck to powersave apparently10:17
bauzasI still have the GUI but clicking has no effect10:17
sean-k-mooneyhave you tried removing tuned10:17
sean-k-mooneyits not using tuned to implement this10:17
bauzasand I can't define a plugged/unplugged strategy like before10:17
sean-k-mooneyso it could be fighting with it10:17
bauzasI haven't investigated it10:18
sean-k-mooneyhttps://linux.die.net/man/1/gnome-power-manager10:18
bauzasI'm already glad I found the root cause...10:18
sean-k-mooneythat is what should be doing it on the gnome side10:18
bauzasbut again, how many users know how to collect power metrics with sar ?10:19
bauzasor even know what sar is ?10:19
sean-k-mooneywell you're not going to find me defending fedora10:19
sean-k-mooneyyou know i strongly dislike it10:19
bauzasif you want MHO, Fedora is too opinionated about what people should do10:19
sean-k-mooneyjust pointing out its probably tuned that is the issue10:20
bauzasI'll look into it10:20
bauzasthanks10:20
* bauzas goes cooking for his wife10:20
sean-k-mooneyi havent used fedora since f33 or f32 but it did not use to have tuned installed so i wonder when that got added10:21
kashyapsean-k-mooney: I'm on F36, and I don't have 'tuned' installed by default (mine is a 'Workstation' install)10:22
sean-k-mooneyack10:23
sean-k-mooneyi wonder why bauzas has it then10:23
bauzasI haven't said I have tuned10:23
sean-k-mooneyoh i thought you did10:24
sean-k-mooneyignore the noise then10:24
bauzas[sbauza@sbauza ~]$ rpm -qa | grep tuned10:24
bauzas[sbauza@sbauza ~]$ 10:24
bauzasI'll continue on it this afternoon10:24
bauzaslooks like a Hack&Hustle session10:25
sean-k-mooneyya maybe10:26
sean-k-mooneylet us know if you figure it out either way10:26
bauzasmy sar figures are explicit10:27
bauzasmy temperature was still fine10:27
bauzasbut I was struggling with CPU clock freq at 400Mhz10:27
bauzasand my audio was then choppy as the load was spiking up to 8.010:27
bauzas(8 CPUs, which is normal)10:27
bauzaswhen the CPU clock was about 1500MHz, then the audio was better (and the load down to 4.0)10:28
bauzasf**** governor10:28
sean-k-mooneywisely or not im using tlp to manage my power https://paste.opendev.org/show/b01fBUjtb7EA4fQDmvuM/10:28
sean-k-mooneyhttps://linrunner.de/tlp/index.html10:29
bauzas  power-profiles-daemon.service                                                             loaded active running Power Profiles daemon10:30
* bauzas needs to see what this daemon is doing10:30
sean-k-mooneyya that is one of the things you have to remove if you are using tlp10:31
sean-k-mooneyjust noticed im actually using schedutil for my governor10:31
sean-k-mooneyif you are not using the intel pstate driver thats actually the better option vs ondemand10:31
bauzasConflicts=tlp.service shutdown.target auto-cpufreq.service system76-power.service tuned.service10:31
bauzashuhuh10:31
bauzasI'll have to dig into it10:32
bauzasbut... lunch10:32
sean-k-mooneyyep its one of the ones i tried10:32
sean-k-mooneyit didnt work well for me when i use the power-profiles-daemon10:32
sean-k-mooneyi think my issue was mainly fan noise and it was not really going down from max cpu10:33
sean-k-mooneylike idling on ac was full non turbo/boost clock speed10:34
opendevreviewMerged openstack/python-novaclient master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/python-novaclient/+/84880410:40
*** tosky_ is now known as tosky11:11
bauzasfound the culprit https://fedoraproject.org/wiki/Changes/Power_Profiles_Daemon12:01
bauzas"The "performance" mode is only available on select systems and is  implemented by different "drivers" based on the system or systems it  targets. "12:02
bauzasmaybe it's just me but I thought we previously had different power management profiles depending on whether the laptop was docked or not12:10
bauzas(in Gnome, I mean)12:10
bauzashttps://forums.lenovo.com/t5/Other-Linux-Discussions/Lenovo-L14-Gen1-AMD-Lap-Mode-Throttling/m-p/515419712:20
* bauzas facepalms12:20
bauzasbut my surface is steady...12:20
sean-k-mooneybauzas: so ya i would just remove that and go with one of the many other tools12:25
bauzaschecked, I'm not in lap mode12:25
bauzasbut there are a ton of issues with PPD with Thinkpads https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues12:25
bauzasand that's the issue I hit when changing the PPD profile to 'perf' https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues/7212:26
bauzassean-k-mooney: any power management tool you'd recommend that could set the power based on charging vs. discharging ?12:28
bauzaspowersave seems reasonable to me if I'm running on battery12:28
bauzasbut I don't want to clockdown if I'm plugged12:29
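The plugged/unplugged policy asked for here can be sketched as a small decision helper. This is a minimal sketch under assumptions, not what tlp or power-profiles-daemon actually do: it assumes the AC adapter shows up under `/sys/class/power_supply` with `type` equal to `Mains` (some USB-C chargers report a different type), and the function names `on_ac_power` and `pick_governor` are hypothetical. It only decides which governor to use; it does not write anything to sysfs.

```python
from pathlib import Path


def on_ac_power(power_supply_root="/sys/class/power_supply"):
    """Return True if any supply of type 'Mains' reports online == 1."""
    for supply in Path(power_supply_root).glob("*"):
        type_file = supply / "type"
        online_file = supply / "online"
        if not (type_file.is_file() and online_file.is_file()):
            continue
        is_mains = type_file.read_text().strip() == "Mains"
        is_online = online_file.read_text().strip() == "1"
        if is_mains and is_online:
            return True
    return False


def pick_governor(plugged_in, ac_governor="ondemand", battery_governor="powersave"):
    """Choose a governor name from the power state (the defaults are assumptions)."""
    return ac_governor if plugged_in else battery_governor


if __name__ == "__main__":
    print(pick_governor(on_ac_power()))
```

A udev rule or a systemd unit could call such a helper whenever the power supply state changes, which is roughly the approach mentioned earlier in the discussion.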
opendevreviewMerged openstack/nova master: Retry attachment delete API call for 504 Gateway Timeout  https://review.opendev.org/c/openstack/nova/+/84554312:30
sean-k-mooneybauzas: i use tlp12:33
sean-k-mooneyi dont actually use power save on battery but i configure it to limit the max frequency12:33
sean-k-mooneybut you can certainly set the profile to power save too if you prefer12:33
sean-k-mooneyhttps://paste.opendev.org/show/bPVClbF8MnuFjrhBE2z3/12:34
sean-k-mooneythat is my tlp config12:34
sean-k-mooneybauzas: so you could change line 6 to powersaver12:34
sean-k-mooneyor like i do on line 1112:35
sean-k-mooneylimit the cpu frequency12:35
sean-k-mooneybauzas: tlp also has a tlp-stats tool which is kind of nice12:36
sean-k-mooneyyou can see that i still have 90% capacity on my battery after 4 years12:37
sean-k-mooneyhttps://paste.opendev.org/show/bNtQTQ0T2Tx4Bd2nFyLu/12:37
sean-k-mooneypartly because i only let it charge to 80%12:37
sean-k-mooneyso i went with tlp since i could use it to do more than save power, it can also extend battery life by controlling charge and i could have a little more control over all12:38
sean-k-mooneyhttps://termbin.com/ic06r is the full set of stats it has12:39
sean-k-mooneyand you can tweak many of the parameters if you care about it12:39
opendevreviewAmit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down  https://review.opendev.org/c/openstack/nova/+/84888612:49
opendevreviewAmit Uniyal proposed openstack/nova master: add regression test case for bug 1978983  https://review.opendev.org/c/openstack/nova/+/84910412:49
bauzasok, eventually fixed my issue, upgraded the firmware12:50
gibisean-k-mooney: I'm wondering whether we reintroduced an issue by re-enabling greendns. See my fresh bugreport https://bugs.launchpad.net/nova/+bug/198108012:51
bauzaswhen the lid is closed, you can't upgrade your firmware, bizarre but ok12:51
bauzasso this one was PEBKAC12:51
bauzasI'll need to use to change the profile based on the docking info12:51
bauzasneed to see*12:52
sean-k-mooneygibi: i was still getting that warning before that13:11
gibihm, I don't get it if I turn off greendns13:11
sean-k-mooneywe could perhaps disable greendns in our func tests but i dont think a revert would be correct13:12
sean-k-mooneywe should not be doing dns lookup in the func tests13:12
gibithis is not about func test, there it is probably cosmetic. But I wonder if we reintroduced https://bugs.launchpad.net/nova/+bug/180895113:13
gibias the fix for that was to blacklist urllib3 13:13
gibinow we see that urllib3 is imported before monkey patching13:14
gibiso we might retrigger https://bugs.launchpad.net/nova/+bug/180895113:14
sean-k-mooneywell we now use eventlet directly in the nova-api13:15
sean-k-mooneyfor scatter gather so we cant skip monkey patching in general there anymore13:15
gibiI don't follow you how that is related13:16
sean-k-mooneywe know that disabling greendns can lead to the nova-api and other services locking up on dns queries13:16
sean-k-mooneyso if we revert the other patch we would break things13:16
gibiOK, so if we disable greendns we break things, but now that we enabled it we might break other things13:17
sean-k-mooneywe might but i dont think we will13:17
sean-k-mooneyhave you found a code path where urllib3 is imported before monkeypatching?13:17
gibiyes, it is the dns code in eventlet that imports it13:18
sean-k-mooneyso https://github.com/openstack/nova/blob/90c0c687a487601e009c72f60c88be92f6a55264/nova/monkey_patch.py#L30=13:18
sean-k-mooneyimports urllib3?13:18
gibiindireclt yes13:18
gibiindirectly13:19
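One way to confirm the kind of early import discussed here is to look at `sys.modules` right before monkey patching runs. The sketch below is an assumption-laden illustration: the helper name `imported_too_early` is hypothetical, and `urllib3` is the only module the discussion names as sensitive.

```python
import sys


def imported_too_early(names, modules=None):
    """Return the sorted subset of `names` already present in the module table."""
    modules = sys.modules if modules is None else modules
    return sorted(name for name in names if name in modules)


if __name__ == "__main__":
    # Call this just before the monkey patching step to see what slipped in.
    early = imported_too_early(["urllib3"])
    if early:
        print("imported before monkey patching:", ", ".join(early))
```

Because it only inspects the module table, the check itself does not import anything and cannot change the import order it is trying to observe.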
sean-k-mooneythat sounds like an eventlet bug that we cannot fix then13:19
sean-k-mooneysince we cant monkeypatch until after we import eventlet13:19
sean-k-mooneyso ya maybe we need to remove urllib3 from the problem list if eventlet have fixed the issue13:20
sean-k-mooneyif not then we should file an eventlet bug13:20
sean-k-mooneyunless from nova import debugger13:20
sean-k-mooneyis pulling it in instead of eventlet13:20
sean-k-mooneynot that i can see 13:21
sean-k-mooneyhttps://github.com/openstack/nova/blob/90c0c687a487601e009c72f60c88be92f6a55264/nova/debugger.py13:21
gibihm I'm assuming that deployers had some workaround for our disabled greendns as we had that disabled for a long time13:25
gibiso we might have fixed a known and worked-around problem but re-introduced a break somewhere else. Still I feel like reverting the greendns patch would be safer while we figure out if urllib3 is safe 13:27
gibianyhow updated the bug report linking to this discussion as I have no bandwidth to dig deeper into this now13:29
* gibi moves back tracking down a locking issue in our CellDatabases fixture found by Uggla13:30
Ugglagibi, sorry for that.... Uggla is a black cat that attracts bugs.13:32
gibiUggla: no worries. I like these kind of challenges :)13:33
*** dasm|off is now known as dasm13:56
sean-k-mooneygibi: it was breaking our downstream customers14:13
sean-k-mooneyso we cant assume that14:13
sean-k-mooneygibi: when i did the revert and re-enabled greendns14:14
sean-k-mooneyi noted you could still disable it by setting the env var14:14
sean-k-mooneygibi: we were hardcoding os.environ['EVENTLET_NO_GREENDNS'] = 'yes'14:14
sean-k-mooneybefore14:14
sean-k-mooneybut you can just export that before you run nova-api14:15
sean-k-mooneyor set it in the wsgi env if you need to14:15
sean-k-mooneybut i dont think you should14:15
gibisean-k-mooney: so if somebody gets hit by https://bugs.launchpad.net/nova/+bug/1808951 then they can set EVENTLET_NO_GREENDNS=yes in their api service script to get back the old behavior14:18
gibigood point. I agree that is a viable workaround14:19
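The workaround agreed on here can be sketched as follows. The variable name and the value `yes` come from the log; the helper name `disable_greendns` is hypothetical, and the sketch assumes the variable has to be in the environment before `eventlet` is first imported, since that is when it is read.

```python
import os


def disable_greendns(environ=os.environ):
    """Opt out of eventlet's green DNS unless the operator already chose a value."""
    # setdefault keeps an explicit operator setting (for example 'no') untouched
    environ.setdefault("EVENTLET_NO_GREENDNS", "yes")
    return environ["EVENTLET_NO_GREENDNS"]


if __name__ == "__main__":
    disable_greendns()
    # import eventlet only after this point, e.g. at the top of a wsgi script
    print(os.environ["EVENTLET_NO_GREENDNS"])
```

Exporting the variable in the service's environment achieves the same result without touching any code, which is the option suggested in the discussion.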
gibiI will note that in https://bugs.launchpad.net/nova/+bug/180895114:19
sean-k-mooneyi just did in https://bugs.launchpad.net/nova/+bug/1981080/comments/214:19
sean-k-mooneybut yes you could note that in the original bug14:20
sean-k-mooneygibi: do you happen to know what the original urllib3 bug is and if it was fixed14:20
sean-k-mooneyi have read over https://bugs.launchpad.net/nova/+bug/180895114:20
sean-k-mooneyand that references https://github.com/eventlet/eventlet/issues/37114:21
opendevreviewSergii Golovatiuk proposed openstack/nova master: Replace "db archive" with "db archive_deleted_raws"  https://review.opendev.org/c/openstack/nova/+/84796314:21
sean-k-mooneythat seems to have been fixed and broken 14:21
sean-k-mooneybased on different python releases14:21
gibiI think it is https://github.com/eventlet/eventlet/issues/371 as you noted14:22
gibiat least https://github.com/eventlet/eventlet/issues/371#issuecomment-1047336652 reports that it still exists14:22
sean-k-mooneyhttps://github.com/eventlet/eventlet/issues/72614:22
sean-k-mooneyso they have fixed some version of this14:23
sean-k-mooneygibi: did you actually see this cause issues14:23
sean-k-mooneyor did you just notice the warning14:23
sean-k-mooneywe use uwsgi in the devstack jobs14:23
sean-k-mooneyand i have not seen them fail14:23
sean-k-mooneyso just trying to understand if there is an actual functional issue14:24
sean-k-mooneyor just a warning message in our test14:24
gibiI only noticed the warning14:24
gibiI did not see actual failures14:24
sean-k-mooneyok then i think we should just remove the warning unless someone reports an error14:25
sean-k-mooneysince it appears to work properly14:25
sean-k-mooneygiven it has been running upstream and in production for downstream customers for a while now14:26
sean-k-mooneyhum actually14:26
opendevreviewAmit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down  https://review.opendev.org/c/openstack/nova/+/84888614:26
sean-k-mooneygibi: ok we have not shipped it downstream yet14:27
sean-k-mooneymy backport is still pending14:27
sean-k-mooneyit has been runin in the gate though for a hile14:27
sean-k-mooney*while14:28
gibiat least we are not setting EVENTLET_NO_GREENDNS so we should see it failing in the gate14:30
gibi*not setting in devstack14:30
sean-k-mooneyyep14:30
sean-k-mooneyand since we dont i think its still correct14:31
gibido we test with ssl in the gate?14:33
sean-k-mooneywe use the tls_proxy14:38
sean-k-mooneyso we use apache to do tls termination using a private ca in front of uwsgi14:38
sean-k-mooneyi dont think uwsgi supports tls directly or its a pain to configure which is why we do it that way14:39
gibithen we probably don't trigger the ssl code that caused the original problem 14:40
sean-k-mooneymaybe, not sure; if the api calls other services we would open an ssl connection14:41
sean-k-mooneyso i would expect the keystone middleware to trigger it no?14:41
sean-k-mooneyi dont think the recursion was on the server side14:42
gibihm, yeah it was in keystonemiddleware14:43
AiramekHi! I know this isn't the place to ask for support, but #openstack is kinda dead. If anyone has a spare minute, could you look into this?(https://lists.openstack.org/pipermail/openstack-discuss/2022-July/029454.html). Thanks in advance!14:52
opendevreviewBalazs Gibizer proposed openstack/nova master: Add extra info about limitation of CellDatabase fixture  https://review.opendev.org/c/openstack/nova/+/84912314:59
gibiUggla: ^^ I added extra info to the lock escalation error so that next time we can track down the cause easier14:59
opendevreviewsean mooney proposed openstack/nova master: add regression test case for bug 1978983  https://review.opendev.org/c/openstack/nova/+/84910415:07
opendevreviewsean mooney proposed openstack/nova master: Adds check, if admin has set compute service down  https://review.opendev.org/c/openstack/nova/+/84888615:07
gibiI'm done for the week. Have a nice weekend folks!15:53
opendevreviewAlexey Stupnikov proposed openstack/nova master: Remove deleted projects from flavor access list  https://review.opendev.org/c/openstack/nova/+/84913116:06
opendevreviewRajesh Tailor proposed openstack/placement master: Fix typos  https://review.opendev.org/c/openstack/placement/+/84863616:17
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: api: Drop generating a keypair and add special chars to naming  https://review.opendev.org/c/openstack/nova/+/84913317:03
*** dasm is now known as dasm|off21:36
