Monday, 2021-05-10

*** dchen has quit IRC00:02
*** dychen is now known as dchen00:16
*** __ministry has joined #openstack-infra01:17
*** ysandeep|away is now known as ysandeep|SL02:07
*** ykarel has joined #openstack-infra04:21
*** ralonsoh has joined #openstack-infra05:03
*** stevebaker has quit IRC05:15
*** stevebaker has joined #openstack-infra05:22
*** slaweq has joined #openstack-infra06:23
*** happyhemant has joined #openstack-infra06:37
*** sboyron has joined #openstack-infra06:46
*** jcapitao has joined #openstack-infra07:12
*** andrewbonney has joined #openstack-infra07:19
*** slaweq has quit IRC07:21
*** ociuhandu has joined #openstack-infra07:21
*** slaweq has joined #openstack-infra07:23
*** rcernin has quit IRC07:24
*** hashar has joined #openstack-infra07:28
*** tosky has joined #openstack-infra07:36
*** rcernin has joined #openstack-infra07:38
*** ociuhandu has quit IRC07:45
*** rpittau|afk is now known as rpittau07:48
*** jpena|off is now known as jpena07:55
*** rcernin has quit IRC07:56
*** ykarel is now known as ykarel|lunch07:58
*** lucasagomes has joined #openstack-infra08:06
*** kopecmartin has quit IRC08:10
*** Guest50777 has joined #openstack-infra08:14
*** Guest50777 is now known as geguileo08:14
*** dpawlik has quit IRC08:21
*** kopecmartin has joined #openstack-infra08:25
*** dpawlik1 has joined #openstack-infra08:28
*** dtantsur|afk is now known as dtantsur08:42
*** ykarel|lunch has quit IRC08:42
*** ykarel_ has joined #openstack-infra08:42
*** ykarel_ has quit IRC08:43
*** ykarel_ has joined #openstack-infra08:43
*** whoami-rajat has joined #openstack-infra08:46
*** ykarel_ is now known as ykarel08:46
*** ociuhandu has joined #openstack-infra08:49
*** ociuhandu has quit IRC08:53
*** ociuhandu has joined #openstack-infra09:06
*** sshnaidm|afk is now known as sshnaidm09:08
*** gfidente has joined #openstack-infra09:15
*** rcernin has joined #openstack-infra09:21
*** bauzas has quit IRC09:24
*** bauzas has joined #openstack-infra09:27
*** lpetrut has joined #openstack-infra09:27
*** __ministry1 has joined #openstack-infra09:54
*** hjensas_ is now known as hjensas|lunch09:55
*** __ministry has quit IRC09:55
*** __ministry1 is now known as __ministry09:55
*** dciabrin has joined #openstack-infra10:07
*** dciabrin_ has quit IRC10:07
*** rcernin has quit IRC10:21
*** rcernin has joined #openstack-infra10:35
*** carloss has joined #openstack-infra10:38
*** ociuhandu has quit IRC10:41
*** jcapitao is now known as jcapitao_lunch10:47
*** rcernin has quit IRC10:49
*** kopecmartin has quit IRC11:01
*** rcernin has joined #openstack-infra11:02
*** dpawlik1 has quit IRC11:03
*** kopecmartin has joined #openstack-infra11:08
*** dpawlik0 has joined #openstack-infra11:11
*** ociuhandu has joined #openstack-infra11:11
*** rcernin has quit IRC11:15
*** ociuhandu has quit IRC11:19
*** hjensas|lunch is now known as hjensas11:20
*** jpena is now known as jpena|lunch11:26
*** ociuhandu has joined #openstack-infra11:32
*** ociuhandu has quit IRC11:33
*** __ministry has quit IRC11:38
*** rlandy has joined #openstack-infra11:40
*** lajoskatona has joined #openstack-infra11:44
*** ociuhandu has joined #openstack-infra11:50
*** jcapitao_lunch is now known as jcapitao11:52
*** dpawlik0 is now known as dpawlik11:56
*** ociuhandu has quit IRC12:15
*** nweinber has joined #openstack-infra12:19
*** ociuhandu has joined #openstack-infra12:23
*** jpena|lunch is now known as jpena12:27
*** ociuhandu has quit IRC12:27
*** ociuhandu has joined #openstack-infra12:31
*** ociuhandu has quit IRC12:31
*** ociuhandu has joined #openstack-infra12:35
*** ociuhandu has quit IRC12:41
*** ociuhandu has joined #openstack-infra12:48
lajoskatonaHi, I proposed a patch for renaming tap-as-a-service: https://review.opendev.org/c/openstack/project-config/+/790093 , is it necessary to participate in the weekly infra meeting?12:56
lajoskatonaI added this topic to the meeting wiki: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting12:57
fungilajoskatona: not really, no, that's just the next time i expect to have enough people around to coordinate the gerrit outage we'll need in order to perform the rename12:58
*** ociuhandu has quit IRC13:03
lajoskatonafungi: thanks13:05
*** ociuhandu has joined #openstack-infra13:14
ttxfungi: re: skyline, if you add me to skyline-core and skyline-release I can add their initial core group and remove myself.13:16
fungisure, happy to13:17
fungijust a sec13:17
fungittx: done!13:18
fungiso much faster via command line13:18
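For reference, the command-line route here is Gerrit's SSH interface; adding an account to groups looks roughly like the following (the admin account name shown is illustrative):

```shell
# Run by a Gerrit administrator against the opendev Gerrit SSH API
ssh -p 29418 admin@review.opendev.org gerrit set-members --add ttx skyline-core
ssh -p 29418 admin@review.opendev.org gerrit set-members --add ttx skyline-release
```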
ttxalright, on it now13:18
ttxok all set13:21
*** ociuhandu has quit IRC13:22
fungithey should feel free to let us know if they need help with anything13:22
*** tosky_ has joined #openstack-infra13:27
*** tosky has quit IRC13:27
*** tosky_ is now known as tosky13:27
*** ociuhandu has joined #openstack-infra13:28
*** ykarel_ has joined #openstack-infra13:31
*** ykarel has quit IRC13:31
*** ykarel_ is now known as ykarel13:32
*** vishalmanchanda has joined #openstack-infra13:37
*** ociuhandu has quit IRC13:46
*** ociuhandu has joined #openstack-infra13:47
*** ociuhandu has quit IRC13:57
*** ociuhandu has joined #openstack-infra13:59
*** rcernin has joined #openstack-infra14:11
*** rcernin has quit IRC14:16
*** lpetrut has quit IRC14:22
*** ociuhandu has quit IRC14:26
*** ociuhandu has joined #openstack-infra14:30
*** happyhemant has quit IRC14:37
*** dklyle has joined #openstack-infra14:38
*** ociuhandu has quit IRC14:39
*** ociuhandu has joined #openstack-infra14:39
*** gyee has joined #openstack-infra15:44
*** hashar is now known as hasharDinner15:47
*** ykarel has quit IRC15:57
*** lucasagomes has quit IRC16:02
*** rlandy is now known as rlandy|biab16:03
*** rpittau is now known as rpittau|afk16:03
*** ociuhand_ has joined #openstack-infra16:04
*** ociuhand_ has quit IRC16:06
*** ociuhandu has quit IRC16:07
*** ociuhandu has joined #openstack-infra16:21
*** d34dh0r53 has quit IRC16:26
*** ralonsoh has quit IRC16:27
*** ociuhandu has quit IRC16:28
*** d34dh0r53 has joined #openstack-infra16:30
*** fungi has quit IRC16:57
*** jpena is now known as jpena|off16:59
*** tdasilva_ has joined #openstack-infra17:05
*** tdasilva has quit IRC17:07
*** tdasilva_ has quit IRC17:08
*** tdasilva_ has joined #openstack-infra17:09
*** andrewbonney has quit IRC17:09
*** lajoskatona has quit IRC17:13
*** gfidente is now known as gfidente|afk17:14
*** dtantsur is now known as dtantsur|afk17:19
*** rlandy|biab is now known as rlandy17:22
*** fungi has joined #openstack-infra17:35
*** jcapitao has quit IRC17:37
sean-k-mooneyclarkb: o/ regarding elk and the rest. losing them i think would be a pretty big blow to how we track issues in the upstream ci17:44
sean-k-mooneythat said i get the resource-constrained element, just expressing that losing the ability to check http://logstash.openstack.org/ to see if an issue is a one-off, and to quantify it, would be a significant ux regression when looking at the health of the openstack gate17:46
sean-k-mooneyby the way the topic of how we can add new elastic-recheck queries also came up at the nova ptg, we were hoping to figure out if we could keep the queries in tree for example17:47
sean-k-mooneyor get commit rights to the elastic-recheck repo to merge new queries17:47
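For context, elastic-recheck keeps each known gate bug as a small YAML file in its queries/ directory, named after the Launchpad bug and holding a single elasticsearch query; the bug number and query string below are invented purely to show the shape:

```yaml
# queries/1234567.yaml -- hypothetical bug number and signature, for illustration only
query: >
  message:"Timed out waiting for the resource to become ACTIVE"
  AND tags:"screen-n-cpu.txt"
```

Keeping files of that shape in each project's own tree, or granting review rights on the elastic-recheck repo, are the two options being floated here.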
clarkbsean-k-mooney: yes I agree, the problem is basically that no one has put any effort into improving them, upgrading them, or making them upgradeable in years (melwitt did improve some of the scripts recently though)17:48
clarkbbasically they need to be redone from scratch17:48
sean-k-mooneyya17:48
clarkband that's a lot of work that the people involved right now don't seem to have time for17:48
sean-k-mooneyfor what it's worth, if they were done from scratch that would be fine too17:48
sean-k-mooneyya ok17:49
sean-k-mooneyi'm not sure what that involves, by the way17:49
sean-k-mooneybut i assume if we were going to redeploy we would also likely want to use the amazon fork17:49
melwitt! what's wrong17:49
openstackmelwitt: Error: "what's" is not a valid command.17:49
melwittoh whoops, I didn't know that was a command flag17:50
dansmithmelwitt: clarkb is proposing decommissioning elasticsearch and associated tooling17:50
dansmithmelwitt: email to the list just now17:50
clarkbeverything :) the config management needs to be replaced since puppet doesn't work on newer distros (in our env anyway). The elasticsearch cluster needs to be upgraded to a modern version that is open source (the amazon fork?). This will precipitate updating logstash and kibana. Last time I tried to update kibana I rage quit because they basically make it impossible to use without paying17:50
sean-k-mooneymelwitt: it's all running on ubuntu 16.0417:50
melwittoh no :*****(17:50
clarkbfor their security stuff (so not open source)17:50
sean-k-mooneymelwitt: which is eol17:50
melwittoof17:51
sean-k-mooneyya so if we want to keep this around we would need to set it all up from scratch17:51
clarkbbasically it's been stuck in the dark ages because no one has had time to redo it all. And now the platform it runs on isn't supported17:51
melwittyeah, ok17:51
dansmithclarkb: to be fair, security updates through 2024 though right?17:52
sean-k-mooneyby the way, it's part of software factory: https://softwarefactory-project.io/17:52
clarkbit should be noted that other people have had similar struggles with ELK and that is ultimately why amazon forked17:52
clarkbdansmith: no, that is only for paying ubuntu advantage customers17:52
clarkb(we do have access to that for a small number of servers, but nowhere near enough to cover the ELK stuff so we haven't prioritized it)17:52
clarkb^ that/it == ubuntu advantage17:53
dansmithclarkb: okay I thought critical security updates would be around for longer17:53
sean-k-mooneybasically if we wanted to keep it we would need to move to https://opendistro.github.io/for-elasticsearch/17:53
dansmithbut maybe we could ask canonical for some more of "that" ?17:53
clarkbdansmith: I suppose it's possible, they would update the normal repos when necessary. But it is my understanding that that isn't the case17:53
clarkbdansmith: we could. But even then I'm not sure I want to keep running what is extremely old and needs help17:54
clarkbreally the major issue is it is a complicated system that has limped along for years now. The xenial eol is showing us that we can't keep limping it along reliably17:54
dansmithpresumably we could also keep it running until we hit a CVE that affects stuff that is actually running there17:54
dansmithunderstand17:54
sean-k-mooneyso i'm wondering, if we wanted to replace it, are there alternatives we could use, or would updating the automation to deploy a supportable version be the way to go17:55
sean-k-mooneyor just from a resource point of view is it too heavy17:56
sean-k-mooneye.g. even if we had the people is it too much to continue hosting17:56
dansmithI dunno what to say.. it's heavy, hard, unmaintained, and unsexy.. but man, when you need it you need it17:56
clarkbsean-k-mooney: in my head it is too resource heavy for the amount of effort people seem to put into it17:56
sean-k-mooneyclarkb: i think you said it was 1/4 of the resources?17:56
clarkbsean-k-mooney: basically there is a major imbalance there and it is hard to justify17:56
clarkbsean-k-mooney: depends on how you slice it. 1/4 of servers, 1/2 of memory, 1/3 of disk17:57
sean-k-mooney20 x 4 vcpu + 4GB RAM logstash-worker servers17:57
clarkbdansmith: I agree, I mean I came up with the idea years ago and have tried to limp it along as far as I could because I do think it is a powerful tool. But I'm sort of stuck in a position now where it just doesn't make sense given the effort put into it17:57
dansmithI feel like we'll basically stop fixing real complex failures that affect other projects and just move to recheck grinding (which we're already doing too much of), but this will eliminate the argument we can even make against it17:57
sean-k-mooneythat line alone makes me cringe17:57
dansmithbut maybe we're past that point anyway17:57
melwittclarkb: how would it look if someone were to try and modernize it? would we be able to use an infra resource or would we have to host it outside of openstack infra?17:58
sean-k-mooneyso i'm wondering if we can maybe have a zuul post job that would do something similar17:58
sean-k-mooneywell, or a role17:58
clarkbmelwitt: I suspect that the resource needs may change dramatically when components are upgraded (though in which direction I'm not sure). If the resource needs stay flat or increase I don't think we'll be able to keep hosting it. If we can modernize it and reduce the footprint then it may be possible to keep hosting it17:59
sean-k-mooneye.g. if we could have something process the logs in the vms used by the jobs and give a list of bugs that might have been hit in a comment or file17:59
clarkbsean-k-mooney: that will increase the runtime of every job which is why we haven't done it17:59
clarkband it will be proportional to the amount of logs produced which means longer running jobs will run even longer17:59
sean-k-mooneyah that's fair, although could we do it only if the job failed?18:00
dansmithit's also amazing how slow and behind it is normally, and how bad it can be at the worst times.. like days behind, which defeats a lot of the use18:00
clarkbyup that info is available within the jobs18:00
dansmithso I guess I can't imagine resource usage going down really18:00
clarkbdansmith: that usually happens because the cluster itself has fallen over. Supposedly new elasticsearch is better at that stuff. But I don't know from experience18:00
sean-k-mooneyso what i'm wondering is could we have something in-repo where each project team could list some things to check for18:01
clarkbbut ya it's always been unwieldy which is why we have detached it from the success/fail path in zuul18:01
clarkbit's nice to have if it works but if it breaks we didn't want it to hold anyone up18:01
clarkbsean-k-mooney: ya I mean zuul lets you run pretty arbitrary things. You should be able to do things like that. I think tristanC and dirk had done similar with some of their log filtering in the past18:02
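One way the kind of in-repo, failure-only triage being described here could be wired up is sketched below; the .failure-signatures file, log paths, and playbook are hypothetical, not an existing opendev role:

```yaml
# playbooks/log-triage.yaml -- post-run sketch; assumes projects keep grep-able
# signature patterns in a .failure-signatures file in their own tree
- hosts: all
  tasks:
    - name: Scan collected logs for project-maintained failure signatures
      # zuul supplies zuul_success to post-run playbooks, so the scan can be
      # limited to failed builds and skipped on success
      when: not (zuul_success | default(true) | bool)
      shell: >
        grep -E -f {{ zuul.project.src_dir }}/.failure-signatures
        {{ ansible_user_dir }}/logs/*.txt
        > {{ ansible_user_dir }}/logs/possible-known-bugs.txt || true
```

The resulting possible-known-bugs.txt would then be collected with the rest of the job logs, so the cost is paid per failed build rather than by a central cluster.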
sean-k-mooneyya so we could maybe explore that without holding up the removal of the services18:03
sean-k-mooneyi think once they are gone we will learn how much we actually use them and how painful that removal actually is18:04
sean-k-mooneyand that will probably motivate us to either address it or not18:04
clarkboh that's another thing to point out. I don't think we can currently replace any one of the elasticsearch nodes due to a lack of quota breathing room18:05
sean-k-mooneyack so once they are dead they are gone for good at least for now18:05
sean-k-mooneye.g. unless more quota is found18:06
clarkb* replace without removing an existing one first18:06
fungiprobably also warrants pointing out, the *platform* reaching eol and not getting security updates isn't really the main concern, it's just a recent addition to a much bigger risk... all the components which make up this service are not deployed from packages in the distro anyway, they're all already unsupported ancient versions of software which could end up with major security holes at any time (or18:06
fungimay even already be vulnerable to some widely known ones and we just haven't found out)18:07
clarkbfwiw I don't think there are good answers here. Any choice we make is likely to involve some sort of compromise. I'm happy to think through the options and don't intend on turning things off today :)18:09
sean-k-mooneycheeky question, but none of the cloud providers we use have a production logging service we could use? :)18:09
clarkbsean-k-mooney: not that I am aware of18:09
sean-k-mooney:) ok so we can't outsource it to them, hehe that would be hard across providers anyway18:10
clarkbbut ya basically we need to switch to ansible (and docker if that makes it easier), upgrade elasticsearch which will precipitate upgrades of logstash and kibana. We need to be careful to avoid the no longer open source lineage of ES and find one that has a future in open source.18:10
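For a sense of what the open source fork side of that migration could look like, here is a minimal single-node sketch using the opendistro-for-elasticsearch image; the tag and compose layout are assumptions for illustration, and a real replacement would need a multi-node cluster plus logstash and kibana alongside it:

```yaml
# docker-compose.yaml -- illustration only, not a production layout
version: "3"
services:
  elasticsearch:
    image: amazon/opendistro-for-elasticsearch:1.13.2   # assumed tag
    environment:
      - discovery.type=single-node                      # single node for demo purposes
    ports:
      - "9200:9200"
```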
fungibut also, as mentioned in clarkb's post to the ml, nothing about this service suite requires privileged access to our other systems. anyone who wanted to run an equivalent service and slurp in our build logs could do so18:10
clarkband ya if we're already doing all that effort then maybe it makes more sense for ^ ? I dunno18:11
sean-k-mooneyfungi: i guess the main issue is just the data transfer18:11
clarkbI know I don't have time for that effort. historically it's been a fairly full-time job to sort that stuff out over a few weeks because ES and friends hide all the important features behind their products18:11
fungisean-k-mooney: no more than it is now. we already transfer that data back and forth between the systems doing this18:11
clarkbThat said I can probably help guide others going through the process as I've done it a couple of times years ago18:11
sean-k-mooneyfungi: right, i just meant if i wanted to self-host an elk stack and have a zuul job that just listened for the first-party ci comment and then uploaded the logs to my instance to process, it's probably more data than my isp would be happy with18:13
sean-k-mooneyi'm assuming we are talking about several TBs of logs a day18:14
fungisean-k-mooney: maybe. kinda figured "someone" running an equivalent service would be somebody who could convince their employer to have some free systems and bandwidth on the internet somewhere18:14
sean-k-mooneyhehe that would be the sensible approach18:15
clarkbsean-k-mooney: it's 6TB total but 5TB effective storage due to the way replicas work. We have set our rotation to 7 days18:15
clarkbsean-k-mooney: so a little bit less than 1TB / day18:15
clarkbif we didn't filter debug logs it would fill our disks in one day probably18:15
fungiand that's the exploded size the way es indexes it all too. the compressed logfiles would be orders of magnitude less18:15
fungioh, good point, you do wind up pulling debug loglines i suppose18:16
*** sshnaidm is now known as sshnaidm|afk18:17
clarkband to be clear I'm totally ok with discussing this further and taking our time to work through it. That is largely why I didn't say "I'm doing this next week". Because I expected some discussionj and wanted to make sure we listened :)18:18
fungiactually, cacti probably provides a convenient window into the bandwidth requirements18:20
fungihttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1275&rra_id=all18:21
sean-k-mooneyit would also probably be less if i was monitoring only a subset of the projects i work on rather than everything, or just failed jobs18:21
* sean-k-mooney this is totally not an excuse to use the excessive amount of home storage i bought a few months ago18:22
fungithat's one of the 20 workers, looks like the inbound is <2Mbps but outbound is much larger, i wouldn't have expected that. still, if the elk stack was colocated with the workers then that would be "local"18:22
fungiso based on that sample, it would probably peak at <50Mbps download18:23
sean-k-mooneyso about 15.5TB per month18:25
sean-k-mooneyif peak was continuous18:25
fungiwell, if it were continuous which it's not, i was talking about peak utilization18:25
fungiyeah18:25
fungii'm guessing it's closer to half that18:25
fungicacti says average throughput for the month on logstash-worker01 was 651.23Kbps18:26
clarkbfungi: inbound is compressed outbound is not18:27
fungiright18:27
fungibut outbound is also filtered?18:27
clarkbfungi: only to remove debug lines, which is only done in files where debug is formatted such that we can filter it :)18:28
fungisean-k-mooney: my math says based on that average logstash-worker01 would have transferred just shy of 200GiB in 30.5 days, so call it 3.9TiB/mo18:30
clarkbfungi: * 2018:30
fungiclarkb: yep, that's with the *20 thrown in18:31
clarkbah ok18:31
fungiif all the workers are transferring equal amounts, then together they would have downloaded a little under 4TiB from various swift endpoints in a month18:32
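A quick back-of-the-envelope check of those numbers, assuming the cacti average of 651.23 Kbps is kilobits per second and that all 20 workers see similar traffic:

```python
# Sanity check of the per-worker and total monthly transfer quoted above.
avg_kbps = 651.23    # cacti monthly average for logstash-worker01, kilobits/s
workers = 20         # number of logstash worker servers
days = 30.5

bytes_per_worker = avg_kbps * 1000 / 8 * days * 86400   # bytes per worker per month
gib_per_worker = bytes_per_worker / 2**30               # roughly 200 GiB
tib_total = gib_per_worker * workers / 1024             # roughly 3.9 TiB

print(f"{gib_per_worker:.0f} GiB per worker, {tib_total:.1f} TiB total per month")
```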
sean-k-mooneyya i know, my fair usage limit used to be 1TB a month, i think it's higher now, but i probably could only host a subset of the logs realistically at home18:34
clarkbone thing to keep in mind is that data on the wire doesn't map to data on disk either due to how ES works so don't equate those two18:35
sean-k-mooneywell unless i upgrade to a business package, which i have been threatening to do for a while18:35
clarkbelasticsearch takes X data and stores it with X*somenumber bytes18:35
fungiworker02 was lower than 01 by roughly a third: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1253&rra_id=all18:35
sean-k-mooneyclarkb: i may or may not have 80TB of raw storage in a disk shelf at home currently....18:35
clarkbwow18:35
sean-k-mooneyi probably should have spent it on a compute server instead but i bought one in november18:36
sean-k-mooneyi got a PowerVault MD3060e18:37
clarkbworth mentioning that we haven't been idle either. I've been upgrading the entirety of our zuul, nodepool, and zookeeper cluster. Other than a single service restart to pick up the new zk server locations I don't know that anyone has noticed :) There will be one more restart when we replace the scheduler (this week I hope). I have also been working on replacing config management for18:42
clarkbmailman with ansible. And we'll do an upgrade of that server soon too I hope. ianw has been working on a gerrit server upgrade which increases available memory. Once that migration is done I also want to upgrade to gerrit 3.318:42
sean-k-mooneyyep i know ye do a lot of work to keep everything running and i definitely appreciate that18:43
*** lajoskatona has joined #openstack-infra18:46
clarkbwe also got a new cloud enrolled to nodepool (though with a small number of resources. We can expand those resources but it will require us to do provider-specific executors and some other things I haven't wanted to think about yet)18:47
clarkbAnother struggle is we try to go slushy when everyone is doing releases and now we are in catch-up mode18:47
*** lajoskatona has quit IRC18:49
*** fungi has quit IRC18:54
*** fungi has joined #openstack-infra18:59
*** fungi has quit IRC19:20
*** ysirndjuro has joined #openstack-infra19:26
*** fungi has joined #openstack-infra19:29
*** nweinber has quit IRC20:16
*** zxiiro has joined #openstack-infra20:23
*** rlandy is now known as rlandy|biab20:24
*** vishalmanchanda has quit IRC20:27
*** jamesden_ has quit IRC20:37
*** sboyron has quit IRC20:42
*** jamesdenton has joined #openstack-infra21:07
*** rlandy|biab is now known as rlandy21:11
*** hasharDinner has quit IRC22:02
*** iurygregory has quit IRC22:07
*** iurygregory has joined #openstack-infra22:11
*** rcernin has joined #openstack-infra22:41
*** hamalq has joined #openstack-infra22:42
*** rcernin has quit IRC22:55
*** tosky has quit IRC22:57
*** rcernin has joined #openstack-infra23:01
*** rcernin has quit IRC23:03
*** rcernin has joined #openstack-infra23:03
*** chaconpiza has quit IRC23:55
