Monday, 2021-05-10

*** dchen has quit IRC00:02
*** dychen is now known as dchen00:16
*** __ministry has joined #openstack-infra01:17
*** ysandeep|away is now known as ysandeep|SL02:07
*** ykarel has joined #openstack-infra04:21
*** ralonsoh has joined #openstack-infra05:03
*** stevebaker has quit IRC05:15
*** stevebaker has joined #openstack-infra05:22
*** slaweq has joined #openstack-infra06:23
*** happyhemant has joined #openstack-infra06:37
*** sboyron has joined #openstack-infra06:46
*** jcapitao has joined #openstack-infra07:12
*** andrewbonney has joined #openstack-infra07:19
*** slaweq has quit IRC07:21
*** ociuhandu has joined #openstack-infra07:21
*** slaweq has joined #openstack-infra07:23
*** rcernin has quit IRC07:24
*** hashar has joined #openstack-infra07:28
*** tosky has joined #openstack-infra07:36
*** rcernin has joined #openstack-infra07:38
*** ociuhandu has quit IRC07:45
*** rpittau|afk is now known as rpittau07:48
*** jpena|off is now known as jpena07:55
*** rcernin has quit IRC07:56
*** ykarel is now known as ykarel|lunch07:58
*** lucasagomes has joined #openstack-infra08:06
*** kopecmartin has quit IRC08:10
*** Guest50777 has joined #openstack-infra08:14
*** Guest50777 is now known as geguileo08:14
*** dpawlik has quit IRC08:21
*** kopecmartin has joined #openstack-infra08:25
*** dpawlik1 has joined #openstack-infra08:28
*** dtantsur|afk is now known as dtantsur08:42
*** ykarel|lunch has quit IRC08:42
*** ykarel_ has joined #openstack-infra08:42
*** ykarel_ has quit IRC08:43
*** ykarel_ has joined #openstack-infra08:43
*** whoami-rajat has joined #openstack-infra08:46
*** ykarel_ is now known as ykarel08:46
*** ociuhandu has joined #openstack-infra08:49
*** ociuhandu has quit IRC08:53
*** ociuhandu has joined #openstack-infra09:06
*** sshnaidm|afk is now known as sshnaidm09:08
*** gfidente has joined #openstack-infra09:15
*** rcernin has joined #openstack-infra09:21
*** bauzas has quit IRC09:24
*** bauzas has joined #openstack-infra09:27
*** lpetrut has joined #openstack-infra09:27
*** __ministry1 has joined #openstack-infra09:54
*** hjensas_ is now known as hjensas|lunch09:55
*** __ministry has quit IRC09:55
*** __ministry1 is now known as __ministry09:55
*** dciabrin has joined #openstack-infra10:07
*** dciabrin_ has quit IRC10:07
*** rcernin has quit IRC10:21
*** rcernin has joined #openstack-infra10:35
*** carloss has joined #openstack-infra10:38
*** ociuhandu has quit IRC10:41
*** jcapitao is now known as jcapitao_lunch10:47
*** rcernin has quit IRC10:49
*** kopecmartin has quit IRC11:01
*** rcernin has joined #openstack-infra11:02
*** dpawlik1 has quit IRC11:03
*** kopecmartin has joined #openstack-infra11:08
*** dpawlik0 has joined #openstack-infra11:11
*** ociuhandu has joined #openstack-infra11:11
*** rcernin has quit IRC11:15
*** ociuhandu has quit IRC11:19
*** hjensas|lunch is now known as hjensas11:20
*** jpena is now known as jpena|lunch11:26
*** ociuhandu has joined #openstack-infra11:32
*** ociuhandu has quit IRC11:33
*** __ministry has quit IRC11:38
*** rlandy has joined #openstack-infra11:40
*** lajoskatona has joined #openstack-infra11:44
*** ociuhandu has joined #openstack-infra11:50
*** jcapitao_lunch is now known as jcapitao11:52
*** dpawlik0 is now known as dpawlik11:56
*** ociuhandu has quit IRC12:15
*** nweinber has joined #openstack-infra12:19
*** ociuhandu has joined #openstack-infra12:23
*** jpena|lunch is now known as jpena12:27
*** ociuhandu has quit IRC12:27
*** ociuhandu has joined #openstack-infra12:31
*** ociuhandu has quit IRC12:31
*** ociuhandu has joined #openstack-infra12:35
*** ociuhandu has quit IRC12:41
*** ociuhandu has joined #openstack-infra12:48
lajoskatonaHi, I proposed a patch for renaming tap-as-a-service: https://review.opendev.org/c/openstack/project-config/+/790093 , is it necessary to participate in the weekly infra meeting?12:56
lajoskatonaI added this topic to the meeting wiki: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting12:57
fungilajoskatona: not really, no, that's just the next time i expect to have enough people around to coordinate the gerrit outage we'll need in order to perform the rename12:58
*** ociuhandu has quit IRC13:03
lajoskatonafungi: thanks13:05
*** ociuhandu has joined #openstack-infra13:14
ttxfungi: re: skyline, if you add me to skyline-core and skyline-release I can add their initial core group and remove myself.13:16
fungisure, happy to13:17
fungijust a sec13:17
fungittx: done!13:18
fungiso much faster via command line13:18
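For reference, the command-line route here is Gerrit's SSH interface; adding an account to groups looks roughly like the following (the admin account name shown is illustrative):

```shell
# Run by a Gerrit administrator against the opendev Gerrit SSH API
ssh -p 29418 admin@review.opendev.org gerrit set-members --add ttx skyline-core
ssh -p 29418 admin@review.opendev.org gerrit set-members --add ttx skyline-release
```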
ttxalright, on it now13:18
ttxok all set13:21
*** ociuhandu has quit IRC13:22
fungithey should feel free to let us know if they need help with anything13:22
*** tosky_ has joined #openstack-infra13:27
*** tosky has quit IRC13:27
*** tosky_ is now known as tosky13:27
*** ociuhandu has joined #openstack-infra13:28
*** ykarel_ has joined #openstack-infra13:31
*** ykarel has quit IRC13:31
*** ykarel_ is now known as ykarel13:32
*** vishalmanchanda has joined #openstack-infra13:37
*** ociuhandu has quit IRC13:46
*** ociuhandu has joined #openstack-infra13:47
*** ociuhandu has quit IRC13:57
*** ociuhandu has joined #openstack-infra13:59
*** rcernin has joined #openstack-infra14:11
*** rcernin has quit IRC14:16
*** lpetrut has quit IRC14:22
*** ociuhandu has quit IRC14:26
*** ociuhandu has joined #openstack-infra14:30
*** happyhemant has quit IRC14:37
*** dklyle has joined #openstack-infra14:38
*** ociuhandu has quit IRC14:39
*** ociuhandu has joined #openstack-infra14:39
*** gyee has joined #openstack-infra15:44
*** hashar is now known as hasharDinner15:47
*** ykarel has quit IRC15:57
*** lucasagomes has quit IRC16:02
*** rlandy is now known as rlandy|biab16:03
*** rpittau is now known as rpittau|afk16:03
*** ociuhand_ has joined #openstack-infra16:04
*** ociuhand_ has quit IRC16:06
*** ociuhandu has quit IRC16:07
*** ociuhandu has joined #openstack-infra16:21
*** d34dh0r53 has quit IRC16:26
*** ralonsoh has quit IRC16:27
*** ociuhandu has quit IRC16:28
*** d34dh0r53 has joined #openstack-infra16:30
*** fungi has quit IRC16:57
*** jpena is now known as jpena|off16:59
*** tdasilva_ has joined #openstack-infra17:05
*** tdasilva has quit IRC17:07
*** tdasilva_ has quit IRC17:08
*** tdasilva_ has joined #openstack-infra17:09
*** andrewbonney has quit IRC17:09
*** lajoskatona has quit IRC17:13
*** gfidente is now known as gfidente|afk17:14
*** dtantsur is now known as dtantsur|afk17:19
*** rlandy|biab is now known as rlandy17:22
*** fungi has joined #openstack-infra17:35
*** jcapitao has quit IRC17:37
sean-k-mooneyclarkb: o/ regarding elk and the rest. losing them i think would be a pretty big blow to how we track issues in the upstream ci17:44
sean-k-mooneythat said i get the resource-constrained element, just expressing that losing the ability to check http://logstash.openstack.org/ to see if an issue is a one-off, and to quantify it, would be a significant ux regression when looking at the health of the openstack gate17:46
sean-k-mooneyby the way the topic of how we can add new elastic-recheck queries also came up at the nova ptg, we were hoping to figure out if we could keep the queries in tree for example17:47
sean-k-mooneyor get commit rights to the elastic-recheck repo to merge new queries17:47
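For context, elastic-recheck keeps each known gate bug as a small YAML file in its queries/ directory, named after the Launchpad bug and holding a single elasticsearch query; the bug number and query string below are invented purely to show the shape:

```yaml
# queries/1234567.yaml -- hypothetical bug number and signature, for illustration only
query: >
  message:"Timed out waiting for the resource to become ACTIVE"
  AND tags:"screen-n-cpu.txt"
```

Keeping files of that shape in each project's own tree, or granting review rights on the elastic-recheck repo, are the two options being floated here.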
clarkbsean-k-mooney: yes I agree, the problem is basically that no one has put any effort into improving them, upgrading them, or making them upgradeable in years (melwitt did improve some of the scripts recently though)17:48
clarkbbasically they need to be redone from scratch17:48
sean-k-mooneyya17:48
clarkband that's a lot of work that the people involved right now don't seem to have time for17:48
sean-k-mooneyfor what it's worth, if they were done from scratch that would be fine too17:48
sean-k-mooneyya ok17:49
sean-k-mooneyi'm not sure what that involves, by the way17:49
sean-k-mooneybut i assume if we were going to redeploy we would also likely want to use the amazon fork17:49
melwitt! what's wrong17:49
openstackmelwitt: Error: "what's" is not a valid command.17:49
melwittoh whoops, I didn't know that was a command flag17:50
dansmithmelwitt: clarkb is proposing decommissioning elasticsearch and associated tooling17:50
dansmithmelwitt: email to the list just now17:50
clarkbeverything :) the config management needs to be replaced since puppet doesn't work on newer distros (in our env anyway). The elasticsearch cluster needs to be upgraded to a modern version that is open source (the amazon fork?). This will precipitate updating logstash and kibana. Last time I tried to update kibana I rage quit because they basically make it impossible to use without paying17:50
sean-k-mooneymelwitt: it's all running on ubuntu 16.0417:50
melwittoh no :*****(17:50
clarkbfor their security stuff (so not open source)17:50
sean-k-mooneymelwitt: which is eol17:50
melwittoof17:51
sean-k-mooneyya so if we want to keep this around we would need to set it all up from scratch17:51
clarkbbasically it's been stuck in the dark ages because no one has had time to redo it all. And now the platform it runs on isn't supported17:51
melwittyeah, ok17:51
dansmithclarkb: to be fair, security updates through 2024 though right?17:52
sean-k-mooneyby the way, it's part of software factory: https://softwarefactory-project.io/17:52
clarkbit should be noted that other people have had similar struggles with ELK and that is ultimately why amazon forked17:52
clarkbdansmith: no, that is only for paying ubuntu advantage customers17:52
clarkb(we do have access to that for a small number of servers, but nowhere near enough to cover the ELK stuff so we haven't prioritized it)17:52
clarkb^ that/it == ubuntu advantage17:53
dansmithclarkb: okay I thought critical security updates would be around for longer17:53
sean-k-mooneybasically if we wanted to keep it we would need to move to https://opendistro.github.io/for-elasticsearch/17:53
dansmithbut maybe we could ask canonical for some more of "that" ?17:53
clarkbdansmith: I suppose it's possible, they would update the normal repos when necessary. But it is my understanding that that isn't the case17:53
clarkbdansmith: we could. But even then I'm not sure I want to keep running what is extremely old and needs help17:54
clarkbreally the major issue is it is a complicated system that has limped along for years now. The xenial eol is showing us that we can't keep limping it along reliably17:54
dansmithpresumably we could also keep it running until we hit a CVE that affects stuff that is actually running there17:54
dansmithunderstand17:54
sean-k-mooneyso i'm wondering, if we wanted to replace it, are there alternatives we could use, or would updating the automation to deploy a supportable version be the way to go17:55
sean-k-mooneyor just from a resource point of view is it too heavy17:56
sean-k-mooneye.g. even if we had the people is it too much to continue hosting17:56
dansmithI dunno what to say.. it's heavy, hard, unmaintained, and unsexy.. but man, when you need it you need it17:56
clarkbsean-k-mooney: in my head it is too resource heavy for the amount of effort people seem to put into it17:56
sean-k-mooneyclarkb: i think you said it was 1/4 of the resources?17:56
clarkbsean-k-mooney: basically there is a major imbalance there and it is hard to justify17:56
clarkbsean-k-mooney: depends on how you slice it. 1/4 of servers, 1/2 of memory, 1/3 of disk17:57
sean-k-mooney20 x 4 vcpu + 4GB RAM logstash-worker servers17:57
clarkbdansmith: I agree, I mean I came up with the idea years ago and have tried to limp it along as far as I could because I do think it is a powerful tool. But I'm sort of stuck in a position now where it just doesn't make sense given the effort put into it17:57
dansmithI feel like we'll basically stop fixing real complex failures that affect other projects and just move to recheck grinding (which we're already doing too much of), but this will eliminate the argument we can even make against it17:57
sean-k-mooneythat line alone makes me cringe17:57
dansmithbut maybe we're past that point anyway17:57
melwittclarkb: how would it look if someone were to try and modernize it? would we be able to use an infra resource or would we have to host it outside of openstack infra?17:58
sean-k-mooneyso i'm wondering if we can maybe have a zuul post job that would do something similar17:58
sean-k-mooneywell, or a role17:58
clarkbmelwitt: I suspect that the resource needs may change dramatically when components are upgraded (though in which direction I'm not sure). If the resource needs stay flat or increase I don't think we'll be able to keep hosting it. If we can modernize it and reduce the footprint then it may be possible to keep hosting it17:59
sean-k-mooneye.g. if we could have something process the logs in the vms used by the jobs and give a list of bugs that might have been hit in a comment or file17:59
clarkbsean-k-mooney: that will increase the runtime of every job which is why we haven't done it17:59
clarkband it will be proportional to the amount of logs produced which means longer running jobs will run even longer17:59
sean-k-mooneyah that's fair, although could we do it only if the job failed?18:00
dansmithit's also amazing how slow and behind it is normally, and how bad it can be at the worst times.. like days behind, which defeats a lot of the use18:00
clarkbyup that info is available within the jobs18:00
dansmithso I guess I can't imagine resource usage going down really18:00
clarkbdansmith: that usually happens because the cluster itself has fallen over. Supposedly new elasticsearch is better at that stuff. But I don't know from experience18:00
sean-k-mooneyso what i'm wondering is could we have something in-repo where each project team could list some things to check for18:01
clarkbbut ya it's always been unwieldy which is why we have detached it from the success/fail path in zuul18:01
clarkbit's nice to have if it works but if it breaks we didn't want it to hold anyone up18:01
clarkbsean-k-mooney: ya I mean zuul lets you run pretty arbitrary things. You should be able to do things like that. I think tristanC and dirk had done similar with some of their log filtering in the past18:02
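One way the kind of in-repo, failure-only triage being described here could be wired up is sketched below; the .failure-signatures file, log paths, and playbook are hypothetical, not an existing opendev role:

```yaml
# playbooks/log-triage.yaml -- post-run sketch; assumes projects keep grep-able
# signature patterns in a .failure-signatures file in their own tree
- hosts: all
  tasks:
    - name: Scan collected logs for project-maintained failure signatures
      # zuul supplies zuul_success to post-run playbooks, so the scan can be
      # limited to failed builds and skipped on success
      when: not (zuul_success | default(true) | bool)
      shell: >
        grep -E -f {{ zuul.project.src_dir }}/.failure-signatures
        {{ ansible_user_dir }}/logs/*.txt
        > {{ ansible_user_dir }}/logs/possible-known-bugs.txt || true
```

The resulting possible-known-bugs.txt would then be collected with the rest of the job logs, so the cost is paid per failed build rather than by a central cluster.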
sean-k-mooneyya so we could maybe explore that without holding up the removal of the services18:03
sean-k-mooneyi think once they are gone we will learn how much we actually use them and how painful that removal actually is18:04
sean-k-mooneyand that will probably motivate us to either address it or not18:04
clarkboh that's another thing to point out. I don't think we can currently replace any one of the elasticsearch nodes due to a lack of quota breathing room18:05
sean-k-mooneyack so once they are dead they are gone for good at least for now18:05
sean-k-mooneye.g. unless more quota is found18:06
clarkb* replace without removing an existing one first18:06
fungiprobably also warrants pointing out, the *platform* reaching eol and not getting security updates isn't really the main concern, it's just a recent addition to a much bigger risk... all the components which make up this service are not deployed from packages in the distro anyway, they're all already unsupported ancient versions of software which could end up with major security holes at any time (or18:06
fungimay even already be vulnerable to some widely known ones and we just haven't found out)18:07
clarkbfwiw I don't think there are good answers here. Any choice we make is likely to involve some sort of compromise. I'm happy to think through the options and don't intend on turning things off today :)18:09
sean-k-mooneycheeky question, but none of the cloud providers we use have a production logging service we could use? :)18:09
clarkbsean-k-mooney: not that I am aware of18:09
sean-k-mooney:) ok so we can't outsource it to them, hehe that would be hard across providers anyway18:10
clarkbbut ya basically we need to switch to ansible (and docker if that makes it easier), upgrade elasticsearch which will precipitate upgrades of logstash and kibana. We need to be careful to avoid the no longer open source lineage of ES and find one that has a future in open source.18:10
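For a sense of what the open source fork side of that migration could look like, here is a minimal single-node sketch using the opendistro-for-elasticsearch image; the tag and compose layout are assumptions for illustration, and a real replacement would need a multi-node cluster plus logstash and kibana alongside it:

```yaml
# docker-compose.yaml -- illustration only, not a production layout
version: "3"
services:
  elasticsearch:
    image: amazon/opendistro-for-elasticsearch:1.13.2   # assumed tag
    environment:
      - discovery.type=single-node                      # single node for demo purposes
    ports:
      - "9200:9200"
```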
fungibut also, as mentioned in clarkb's post to the ml, nothing about this service suite requires privileged access to our other systems. anyone who wanted to run an equivalent service and slurp in our build logs could do so18:10
clarkband ya if we're already doing all that effort then maybe it makes more sense for ^ ? I dunno18:11
sean-k-mooneyfungi: i guess the main issue is just the data transfer18:11
clarkbI know I don't have time for that effort. historically it's been a fairly full-time job to sort that stuff out over a few weeks because ES and friends hide all the important features behind their products18:11
fungisean-k-mooney: no more than it is now. we already transfer that data back and forth between the systems doing this18:11
clarkbThat said I can probably help guide others going through the process as I've done it a couple of times years ago18:11
sean-k-mooneyfungi: right, i just meant if i wanted to self-host an elk stack and have a zuul job that just listened for the first-party ci comment and then uploaded the logs to my instance to process, it's probably more data than my isp would be happy with18:13
sean-k-mooneyi'm assuming we are talking about several TBs of logs a day18:14
fungisean-k-mooney: maybe. kinda figured "someone" running an equivalent service would be somebody who could convince their employer to have some free systems and bandwidth on the internet somewhere18:14
sean-k-mooneyhehe that would be the sensible approach18:15
clarkbsean-k-mooney: it's 6TB total but 5TB effective storage due to the way replicas work. We have set our rotation to 7 days18:15
clarkbsean-k-mooney: so a little bit less than 1TB / day18:15
clarkbif we didn't filter debug logs it would fill our disks in one day probably18:15
fungiand that's the exploded size the way es indexes it all too. the compressed logfiles would be orders of magnitude less18:15
fungioh, good point, you do wind up pulling debug loglines i suppose18:16
*** sshnaidm is now known as sshnaidm|afk18:17
clarkband to be clear I'm totally ok with discussing this further and taking our time to work through it. That is largely why I didn't say "I'm doing this next week". Because I expected some discussionj and wanted to make sure we listened :)18:18
fungiactually, cacti probably provides a convenient window into the bandwidth requirements18:20
fungihttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1275&rra_id=all18:21
sean-k-mooneyit would also probably be less if i was monitoring only a subset of the projects i work on rather than everything, or just failed jobs18:21
* sean-k-mooney this is totally not an excuse to use the excessive amount of home storage i bought a few months ago18:22
fungithat's one of the 20 workers, looks like the inbound is <2Mbps but outbound is much larger, i wouldn't have expected that. still, if the elk stack was colocated with the workers then that would be "local"18:22
fungiso based on that sample, it would probably peak at <50Mbps download18:23
sean-k-mooneyso about 15.5TB per month18:25
sean-k-mooneyif peak was continuous18:25
fungiwell, if it were continuous which it's not, i was talking about peak utilization18:25
fungiyeah18:25
fungii'm guessing it's closer to half that18:25
fungicacti says average throughput for the month on logstash-worker01 was 651.23Kbps18:26
clarkbfungi: inbound is compressed outbound is not18:27
fungiright18:27
fungibut outbound is also filtered?18:27
clarkbfungi: only to remove debug lines, which is only done in files where debug is formatted such that we can filter it :)18:28
fungisean-k-mooney: my math says based on that average logstash-worker01 would have transferred just shy of 200GiB in 30.5 days, so call it 3.9TiB/mo18:30
clarkbfungi: * 2018:30
fungiclarkb: yep, that's with the *20 thrown in18:31
clarkbah ok18:31
fungiif all the workers are transferring equal amounts, then together they would have downloaded a little under 4TiB from various swift endpoints in a month18:32
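A quick back-of-the-envelope check of those numbers, assuming the cacti average of 651.23 Kbps is kilobits per second and that all 20 workers see similar traffic:

```python
# Sanity check of the per-worker and total monthly transfer quoted above.
avg_kbps = 651.23    # cacti monthly average for logstash-worker01, kilobits/s
workers = 20         # number of logstash worker servers
days = 30.5

bytes_per_worker = avg_kbps * 1000 / 8 * days * 86400   # bytes per worker per month
gib_per_worker = bytes_per_worker / 2**30               # roughly 200 GiB
tib_total = gib_per_worker * workers / 1024             # roughly 3.9 TiB

print(f"{gib_per_worker:.0f} GiB per worker, {tib_total:.1f} TiB total per month")
```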
sean-k-mooneyya i know, my fair usage limit used to be 1TB a month, i think it's higher now, but i probably could only host a subset of the logs realistically at home18:34
clarkbone thing to keep in mind is that data on the wire doesn't map to data on disk either due to how ES works so don't equate those two18:35
sean-k-mooneywell unless i upgrade to a business package, which i have been threatening to do for a while18:35
clarkbelasticsearch takes X data and stores it with X*somenumber bytes18:35
fungiworker02 was lower than 01 by roughly a third: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1253&rra_id=all18:35
sean-k-mooneyclarkb: i may or may not have 80TB of raw storage in a disk shelf at home currently....18:35
clarkbwow18:35
sean-k-mooneyi probably should have spent it on a compute server instead but i bought one in november18:36
sean-k-mooneyi got a PowerVault MD3060e18:37
clarkbworth mentioning that we haven't been idle either. I've been upgrading the entirety of our zuul, nodepool, and zookeeper cluster. Other than a single service restart to pick up the new zk server locations I don't know that anyone has noticed :) There will be one more restart when we replace the scheduler (this week I hope). I have also been working on replacing config management for18:42
clarkbmailman with ansible. And we'll do an upgrade of that server soon too I hope. ianw has been working on a gerrit server upgrade which increases available memory. Once that migration is done I also want to upgrade to gerrit 3.318:42
sean-k-mooneyyep i know ye do a lot of work to keep everything running and i definitely appreciate that18:43
*** lajoskatona has joined #openstack-infra18:46
clarkbwe also got a new cloud enrolled to nodepool (though with a small number of resources. We can expand those resources but it will require us to do provider-specific executors and some other things I haven't wanted to think about yet)18:47
clarkbAnother struggle is we try to go slushy when everyone is doing releases and now we are in catch-up mode18:47
*** lajoskatona has quit IRC18:49
*** fungi has quit IRC18:54
*** fungi has joined #openstack-infra18:59
*** fungi has quit IRC19:20
*** ysirndjuro has joined #openstack-infra19:26
*** fungi has joined #openstack-infra19:29
*** nweinber has quit IRC20:16
*** zxiiro has joined #openstack-infra20:23
*** rlandy is now known as rlandy|biab20:24
*** vishalmanchanda has quit IRC20:27
*** jamesden_ has quit IRC20:37
*** sboyron has quit IRC20:42
*** jamesdenton has joined #openstack-infra21:07
*** rlandy|biab is now known as rlandy21:11
*** hasharDinner has quit IRC22:02
*** iurygregory has quit IRC22:07
*** iurygregory has joined #openstack-infra22:11
*** rcernin has joined #openstack-infra22:41
*** hamalq has joined #openstack-infra22:42
*** rcernin has quit IRC22:55
*** tosky has quit IRC22:57
*** rcernin has joined #openstack-infra23:01
*** rcernin has quit IRC23:03
*** rcernin has joined #openstack-infra23:03
*** chaconpiza has quit IRC23:55
