clarkb | my intention today is to reboot the server and resync data and ensure that reboots behave as expected and syncing is still about as quick as the last time I did it. If that checks out then I'd like to reduce the dns ttl today | 14:46 |
---|---|---|
clarkb | ok last warning re the review03 reboot. In particular it will kill the screen session there. I'll plan to start that in ~10-15 minutes | 15:21 |
fungi | no objection from me | 15:21 |
fungi | though i need to step away to run a couple of quick errands in a few minutes | 15:21 |
clarkb | ya I don't expect problems mostly just warning people about losing that context which isn't the end of the world | 15:22 |
fungi | okay, headed out now, should be back shortly | 15:32 |
clarkb | and review03 is rebooting | 15:33 |
clarkb | when I rebooted the containers were running. When it comes back I expect them to not be running | 15:34 |
clarkb | wow that reboot was very fast. And confirmed `docker ps -a` shows the containers have exited and are not running | 15:34 |
clarkb | I'll leave things shut down for the moment as I'm going to resync data before starting again so that I can collect more sync timing data | 15:35 |
clarkb | I think I figured out why the first index sync was so large/slow. We (gerrit really) seems to keep old index versions around. So we had to copy all of the old data and the active data. Now that the old data is synced it isn't updated and we ignore it on subsequent rsyncs | 15:46 |
opendevreview | James E. Blair proposed openstack/project-config master: Temporarily stop loading nodesets from zuul-providers https://review.opendev.org/c/openstack/project-config/+/947605 | 15:46 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add a copy of the nodepool nodesets https://review.opendev.org/c/opendev/zuul-providers/+/947606 | 15:47 |
opendevreview | James E. Blair proposed openstack/project-config master: Use zuul-providers for nodesets in opendev/zuul tenants https://review.opendev.org/c/openstack/project-config/+/947607 | 15:49 |
corvus | clarkb: if you have a sec to review those, that would be great. the plan is in the first commit msg | 15:49 |
clarkb | corvus: yup just started on that. I left a comment on the second one with a question | 15:50 |
clarkb | the two bookend changes lgtm and make sense. But the one in the middle has a data mapping thing I'm not sure about | 15:51 |
corvus | replied... and i'm writing a change that may make it more clear when you see it in action. will have that in a min. | 15:53 |
clarkb | corvus: aha that explains it. Thanks | 15:53 |
clarkb | because on the node request side everything is going into the same queue and niz will handle it if it can and nodepool will handle it if it can and the labels are disjoint right now so they don't fight over labels just over quota | 15:54 |
corvus | yep. and we'll keep the labels disjoint until nodepool is completely retired. | 15:54 |
corvus | but we're now at a stage where, if we want to switch tenants back and forth, we should no longer keep the nodesets disjoint | 15:54 |
corvus | (because disjoint nodesets means way too many changes to individual projects) | 15:55 |
corvus | by changing the meaning of the "ubuntu-noble" nodeset, we can switch over a whole bunch of jobs in a tenant at once | 15:56 |
corvus | (we won't catch jobs that define their own nodesets, but we'll get a lot) | 15:56 |
corvus | when we're ready for everything to use niz, then we switch the labels over | 15:56 |
corvus | (and even that can still be done one at a time) | 15:57 |
clarkb | yup | 15:57 |
clarkb | review03 has been synced with additional timing data added to the etherpad. I have also started containers again and things continue to appear to be happy | 15:57 |
clarkb | I think we are ready to land https://review.opendev.org/c/opendev/zone-opendev.org/+/947136 and also do the associated record update in the openstack.org zone cc fungi for when you get back | 15:58 |
clarkb | but feel free to ssh in look at the server, update your /etc/hosts to point review.o.o at the server and login and perform read/write operations through the web ui or even push a change if you want | 15:59 |
clarkb | I'm going to keep doing that sort of testing through the day, but I've been trying to keep the laptop as the only host with /etc/hosts overridden so that I don't confuse myself and I'm not on the laptop right now | 15:59 |
clarkb | fungi: I've also thought about the dns problem for the actual switchover and I really like that we can use the dns update as test of replication config before we actually cut over. I think the risk with rolling back an update to that repo is low because dns uses a serial number on the zone file contents making this an excellent guinea pig | 16:01 |
clarkb | basically even if something goes wrong we'll continue to be in a known state rolling forward and can sort out from there | 16:02 |
clarkb | the downside is as you point out a longer outage | 16:02 |
clarkb | but I think that it may be worthwhile in this case? | 16:02 |
clarkb | ok I synced indexes after git repos and that creates problems. I thought this would be the correct way but I guess gerrit looks at the index then expects any info in there to be present in the git repos | 16:09 |
clarkb | and when they aren't you get exceptions. I'm going to stop gerrit again. And resync git repos so that the index is older and get these exceptions to clean up (that way we don't have the noise while testing) | 16:09 |
fungi | okay, back and catching up | 16:11 |
clarkb | gerrit on 03 is happier now that I've done things in the other order. I'll update the etherpad to note that the old assumed correct way is wrong | 16:15 |
fungi | on using the dns change as a canary, there's always the option of making a separate no-op change ahead of time that we can test with immediately following cut-over | 16:16 |
clarkb | true, I guess my thought was dns is resilient to rolling forward and back if things go wrong. But we could pick a noop change in a low impact project too (eg not project-config or system-config. Maybe bindep or similar) | 16:19 |
fungi | also speaking of replication it just dawned on me to check the iptables rules on gitea backends, but looks like they allow ssh from everywhere and aren't limited to gerrit servers | 16:19 |
fungi | we could even still use the dns zone repo, just increment the soa serial with no other records changed | 16:20 |
clarkb | oh thats a good thing to check. But ya I think we rely on auth for that | 16:20 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Move some nodepool nodesets to niz https://review.opendev.org/c/opendev/zuul-providers/+/947609 | 16:20 |
clarkb | fungi: oh I like that. dig would still show the soa serial value and we could look at gitea repo head | 16:20 |
fungi | yeah, gives us end-to-end test of a simple deploy sequence too | 16:22 |
clarkb | if we go that route the plan would basically be put review02 and 03 in the emergency file, do a pre sync of index and git data, then just before 1600 UTC merge the dns update (maybe force merge it so that we get in before the hourly jobs), once deployed shutdown gerrit on both 02 and 03, rerun syncs, copy replication config from 02 to 03, start gerrit on 03, quickly force merge the | 16:28 |
clarkb | serial bump change and confirm replication is happy, this should also theoretically automatically deploy if zuul has reconnected. Then followup with cleanups and so on | 16:28 |
clarkb | fungi: ^ does that roughly sound right to you? I'll work on updating the etherpad if so | 16:28 |
opendevreview | Merged openstack/project-config master: Temporarily stop loading nodesets from zuul-providers https://review.opendev.org/c/openstack/project-config/+/947605 | 16:30 |
fungi | clarkb: yeah, that seems like the best option we have for minimizing the duration of the overall outage | 16:32 |
clarkb | fungi: ok I updated the etherpad. If you have a moment a quick skim of that would be great | 16:36 |
clarkb | its a bit more details than my notes above | 16:36 |
clarkb | fungi: and then if you think we're safe to reduce TTLs I think today is a good day to do that since you're out tomorrow and the review.openstack.org record will need your intervention | 16:39 |
fungi | clarkb: the dns update for review.openstack.org currently on line 108 could move up to between lines 95 and 96? | 16:40 |
fungi | that'll give it more time to propagate, similar to the review.opendev.org record | 16:40 |
clarkb | ++ | 16:40 |
fungi | anyway, plan as outlined there lgtm, i'll get started on the advance dns updates | 16:44 |
corvus | i sent the email drafted tues/wed about images to the service-discuss list | 16:45 |
fungi | thanks! | 16:45 |
corvus | https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/ZLZ7OUFAOAZ7OS2PO2MHGJJKOBYVWB3G/ | 16:46 |
opendevreview | Merged opendev/zone-opendev.org master: Reduce the review.o.o record TTL https://review.opendev.org/c/opendev/zone-opendev.org/+/947136 | 16:46 |
opendevreview | Merged opendev/zuul-providers master: Add a copy of the nodepool nodesets https://review.opendev.org/c/opendev/zuul-providers/+/947606 | 16:51 |
corvus | what's changing about review.openstack.org? | 16:52 |
clarkb | corvus: it points at review02.opendev.org like review.opendev.org does and we need to switch the CNAME record to point to review03.opendev.org as part of the outage and swap. | 16:53 |
corvus | isn't it just cname? | 16:53 |
clarkb | corvus: but openstack.org is hosted in cloudflare now so fungi needs to do it | 16:53 |
clarkb | oh wait maybe it points at review.opendev.org? | 16:53 |
corvus | it's not a cname to review.opendev.org | 16:53 |
clarkb | ok sorry for all the confusion it does point to review.opendev.org which points to review02.opendev.org | 16:54 |
clarkb | so I guess we don't have to do anything for it | 16:54 |
corvus | ++ | 16:54 |
clarkb | fungi: ^ fyi | 16:54 |
fungi | good point | 16:54 |
fungi | yeah, i'll just set the ttl on it back to "auto" | 16:54 |
clarkb | I thought I looked this up and it was not this way but I must've misread the dig output since both are listed | 16:54 |
clarkb | fungi: thanks | 16:54 |
clarkb | I'll update the etherpad to drop openstack.org mentions | 16:54 |
corvus | yeah dig being helpful i think | 16:54 |
fungi | it didn't even dawn on me that it would be a cname to a cname, since that was highly discouraged back in ye olden times | 16:54 |
fungi | parts of my mind are still stuck in the dawn of the internet | 16:55 |
corvus | a good choice in this instance though i think | 16:55 |
clarkb | ++ | 16:56 |
clarkb | I see the new ttl showing up in dns too so I'll mark those steps done on the etherpad as well | 16:56 |
fungi | yeah, i don't think it's really problematic these days with ~everyone relying on caching resolvers that helpfully include records in advance that they think you're going to query next | 16:56 |
fungi | so any more if you make a query for review.openstack.org your resolver is probably going to tell you it's a cname to review.opendev.org and also mention that review.opendev.org is a cname to review02.opendev.org and even go so far as to include the a and aaaa records for review02.opendev.org while at it | 16:57 |
fungi | and you end up making only one dns query instead of 3 | 16:58 |
opendevreview | Merged openstack/project-config master: Use zuul-providers for nodesets in opendev/zuul tenants https://review.opendev.org/c/openstack/project-config/+/947607 | 16:59 |
clarkb | that change is deploying now. Got in just before the hourlies | 17:00 |
clarkb | infra-root https://etherpad.opendev.org/p/opendev_newsletter was meant to go out last month but got delayed for reasons. I've done a quick edit to make it relevant to this month and I expect it to go out this month | 17:09 |
clarkb | let me know if you see any problems with my edits | 17:09 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Move some nodepool nodesets to niz https://review.opendev.org/c/opendev/zuul-providers/+/947609 | 17:11 |
corvus | clarkb: ^ that one makes the switch | 17:11 |
corvus | newsletter lgtm | 17:12 |
clarkb | corvus: +2 | 17:13 |
corvus | haha look that change is building new images -- because we switched the underlying images for those jobs from nodepool to niz! | 17:13 |
clarkb | corvus: we can recheck https://review.opendev.org/c/opendev/system-config/+/947165 after 947609 lands and that should use the jammy nodes | 17:13 |
clarkb | oh except we use custom nodesets in the system-config-run jobs | 17:13 |
clarkb | so maybe not | 17:13 |
corvus | i find that really amusing, but also, wonderfully correct! | 17:14 |
clarkb | bindep and git-review are probably better | 17:14 |
clarkb | corvus: but doesn't niz already have those images? | 17:14 |
corvus | clarkb: all the system-config jobs run in the openstack tenant, right? | 17:15 |
clarkb | corvus: yup I just realized the tenant is wrong too | 17:15 |
clarkb | corvus: bindep and git review are better canaries | 17:15 |
corvus | clarkb: it does, but because they use the nodesets, that change is switching the *label* they use | 17:15 |
clarkb | corvus: oh! the image build jobs have a new label and therefore we need to rebuild images because the job updated | 17:16 |
clarkb | now I understand | 17:16 |
corvus | if we wanted to insulate the build jobs from this (perhaps so we could more easily flip things back and forth) we could take a one-time hit to move those jobs to specifying labels rather than using the nodesets, then we can change the nodesets instantly without affecting the jobs | 17:16 |
corvus | i think i may do that. | 17:16 |
clarkb | more iterations on things from the testing is probably not a bad idea. But your described plan may also better represent a transition in other zuuls | 17:17 |
clarkb | as you'd bootstrap with image A then switch to image B to build images later | 17:17 |
corvus | what i like about it is that it lets us instantly revert to nodepool if we see a problem | 17:17 |
corvus | otherwise i wouldn't care too much | 17:18 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Move some nodepool nodesets to niz https://review.opendev.org/c/opendev/zuul-providers/+/947609 | 17:21 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Temporarily pin image build job labels https://review.opendev.org/c/opendev/zuul-providers/+/947613 | 17:21 |
corvus | clarkb: what do you think of that? if i'm reading that right, neither of those should trigger image builds now. | 17:21 |
clarkb | corvus: there is a typo error in the base one | 17:22 |
clarkb | but I think that looks right to me | 17:22 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Temporarily pin image build job labels https://review.opendev.org/c/opendev/zuul-providers/+/947613 | 17:22 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Move some nodepool nodesets to niz https://review.opendev.org/c/opendev/zuul-providers/+/947609 | 17:22 |
clarkb | hrm its still rebuilding. | 17:23 |
clarkb | maybe because the nodeset itself is changing from foo to anonymous? | 17:23 |
clarkb | and once we're anonymous it will stop rebuilding? | 17:23 |
corvus | hrm, yeah, i guess the comparison is done before dereferencing that | 17:23 |
corvus | if that's the case... then maybe we should drop that change | 17:24 |
corvus | no... hrm. i'm puzzled. | 17:25 |
corvus | oh, it's probably the fact that the anonymous nodeset doesn't have a name, so it still looks different. | 17:27 |
clarkb | ya that is what I wondered about. The name changes even if the labels don't so we get a delta detected. Once this change is in then we should stabilize and be able to flip back and forth? | 17:28 |
corvus | okay, i think we should stick with this stack. i think we're likely to take the one-time hit to switch to the anonymous nodeset, then -- yes, exactly -- we should be able to change the named ones without affecting it | 17:28 |
corvus | (and when we switch the image jobs back to using named nodesets, one more image build round) | 17:28 |
corvus | i've gone ahead and approved those. they can do their image builds in gate. | 17:29 |
clarkb | sounds good | 17:29 |
clarkb | no response to my question about sigint vs sighup handling in gerrit on discord yet | 17:31 |
clarkb | I think my rough plan for today is to switch to the laptop after lunch and start more targeted testing of gerrit functionality that way | 17:31 |
clarkb | otherwise I think I just need to get that SOA record edit pushed up so it is ready for us and I'm feeling fairly ready | 17:32 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Update the SOA serial https://review.opendev.org/c/opendev/zone-opendev.org/+/947614 | 17:34 |
clarkb | fungi: ^ that is what you had in mind right? | 17:35 |
fungi | yeah, though when i read the plan i thought you were also wanting to test pushing a change to gerrit, so figured it would be created at that time | 17:36 |
clarkb | I want to do both things but I don't want to couple them together. The primary reason is that I can test everything but replication ahead of time so I'm less worried about that stuff | 17:42 |
fungi | https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/O4THT732BMZWACYFZKVRDFHAYOG2G44Z/ mentions the test node image call for help, to see if we can get some attention on it from openstack quarters | 17:42 |
clarkb | so I want to get replication tested as quickly as possible | 17:42 |
fungi | makes sense | 17:42 |
clarkb | and in theory replication should work as we manage host keys for giteas in ansible and that was historically the main snag | 17:43 |
clarkb | we can do another soa serial bump as a followup to test the other bits if replication is happy | 17:45 |
fungi | sure, simple enough | 17:45 |
clarkb | I skipped breakfast today. Going to figure out early lunch now then maybe do some quick yardwork before looking at review03 testing from my laptop | 18:00 |
NeilHanlon | o/ | 18:05 |
* NeilHanlon | answers the batman signal for image help | 18:05 |
corvus | fungi: thanks! | 18:06 |
corvus | also nice that worked :) | 18:06 |
corvus | NeilHanlon: i think my message had some links at the bottom to get started. please let me/us know if you have any questions! | 18:07 |
NeilHanlon | corvus++ yep, thank you :) gonna see if I can drag some coworkers in, too :D | 18:08 |
corvus | great! | 18:08 |
JayF | fungi: to be clear re: the email; will we be unable to run arm unit tests for openstack if nobody volunteers to update the image? | 18:41 |
fungi | JayF: basically, yes | 18:44 |
fungi | and help with ongoing maintenance tasks from time to time | 18:45 |
JayF | ack. interestingly enough, it would *not* cause Ironic's aarch64 guest tempest testing to fail | 18:46 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Move OSA sync to integrated repository https://review.opendev.org/c/openstack/project-config/+/947628 | 19:37 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Deprecate openstack-ansible-tests repository https://review.opendev.org/c/openstack/project-config/+/947629 | 19:37 |
fungi | infra-root: oh, also a reminder, i'm afk all day tomorrow | 19:51 |
clarkb | ok back. My plan is to create a change on review03. Then I'm going to shutdown gerrit on review03, resync git and indexes and delete caches and make sure that when I turn it back on again there are no problems with that change number being looked up after being updated from 02 | 20:34 |
clarkb | because in theory I'm going to create a change id collision and my sync process should clear that out and I want to check that | 20:34 |
opendevreview | Merged opendev/zuul-providers master: Temporarily pin image build job labels https://review.opendev.org/c/opendev/zuul-providers/+/947613 | 20:36 |
opendevreview | Merged opendev/zuul-providers master: Move some nodepool nodesets to niz https://review.opendev.org/c/opendev/zuul-providers/+/947609 | 20:36 |
corvus | i rechecked a change in the zuul tenant, so that should be using mostly niz nodes now | 20:37 |
corvus | clarkb: sounds logical | 20:37 |
clarkb | compare https://review03.opendev.org/c/opendev/bindep/+/947611 to https://review.opendev.org/c/openstack/governance/+/947611 | 20:41 |
clarkb | I think that shows you can push changes to the new server and that works. | 20:42 |
clarkb | I'll leave that as is for a few so that yall can look if you like then proceed with cleaning that up on 03 and ensuring its happy | 20:42 |
corvus | clarkb: a strange thing happened... 1 sec | 20:51 |
corvus | https://imgur.com/a/r64rSpG | 20:52 |
corvus | first image is what happened when i clicked on the diff for test_depends.py the first time. i opened a new empty tab, went to the change url, clicked the file again and got that. | 20:52 |
clarkb | corvus: and this was using review03 urls both times (so we probably don't need to worry about caching oddities with review.o.o?) | 20:53 |
corvus | maybe my browser had a sad. maybe the internet did. or maybe gerrit had an error? | 20:53 |
corvus | correct, haven't looked at the governance change yet | 20:53 |
corvus | i opened the devtools panel on the bad tab, and it said it crashed due to oom. the good tab works fine. | 20:54 |
clarkb | weird. fwiw that did not happen to me either via review03 urls or my /etc/hosts overridden urls on laptop. | 20:54 |
clarkb | I just tried in incognito too | 20:54 |
clarkb | that looks similar to what happens when caches are stale on gerrit startup though | 20:55 |
clarkb | except in this case you got line numbers (the stale cache gets you one line I think) | 20:55 |
corvus | i'd like to say this is probably just local browser problem. like 95% confidence based on evidence so far. but hey, we're testing, so i mention it. | 20:55 |
clarkb | I'm going to look at server logs just to see if there is anything interesting there but I agree | 20:55 |
corvus | i will now quit my browser and restart to free the memories. :) | 20:56 |
clarkb | corvus: do you want me to leave review03 up or are you done? | 20:56 |
clarkb | nothing in the logs that stands out | 20:56 |
corvus | done. i checked one more time with a fresh browser and everything looks good. | 20:57 |
clarkb | ok I'm going to restore and resync and then 03 should load the governance change instead | 20:57 |
clarkb | https://review03.opendev.org/c/openstack/governance/+/947611 now loads | 21:05 |
clarkb | and https://review03.opendev.org/q/project:opendev/bindep has no knowledge of my previous test change | 21:06 |
clarkb | so I think this works out. I did clear caches too which other than git and index would be the only place I expect problems | 21:07 |
clarkb | and the log is clear after that last restart | 21:09 |
clarkb | at this point I feel like doing more and more testing is likely to have diminishing returns | 21:11 |
clarkb | the synchronization process seems reliable and basic functionality has been confirmed afterwards | 21:11 |
clarkb | I'm thinking about testing sighup vs sigint in gerrit. I don't think strace will do what I want. I need the java version of strace but if I had that I could compare the two and be confident that sigint is ok | 21:46 |
fungi | why wouldn't strace work? it should at least indicate the engagement of the top-level signal handler in that process, right? | 21:47 |
fungi | or you want to see what happens internally to confirm it's still a graceful shutdown? | 21:48 |
clarkb | right I want to see that the code running in the jvm is executing graceful shutdown | 21:53 |
clarkb | that is entirely opaque to strace I think | 21:53 |
clarkb | looks like bcc javaflow.sh is what I want | 21:53 |
clarkb | let me hold a node then I can try that | 21:53 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to test sigint vs sighup https://review.opendev.org/c/opendev/system-config/+/893571 | 21:57 |
clarkb | can also run strace at the same time and see if it does expose useful info | 21:58 |
clarkb | but what I want to see in particular is that the jvm calls gerrit internal shutdown method for graceful stopping whether sigint or sighup is used | 21:59 |
clarkb | and can confirm that it doesn't happen with sigkill | 21:59 |
clarkb | https://github.com/iovisor/bcc/blob/master/tools/lib/uflow_example.txt should be able to do it based on that documentation | 21:59 |
fungi | yeah, makes sense | 22:00 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to test sigint vs sighup https://review.opendev.org/c/opendev/system-config/+/893571 | 22:28 |
clarkb | the first pass tried to build docker images and hit rate limits there. I've updated the change to use the current image and hopefully we'll avoid the rate limits | 22:29 |
clarkb | oh shoot I disabled the system-config run job not the image build job | 22:29 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to test sigint vs sighup https://review.opendev.org/c/opendev/system-config/+/893571 | 22:31 |
clarkb | lets see if that works better | 22:32 |
clarkb | that did manage to hold a node. I installed the bcc ebpf tools but uflow and javaflow don't end up tracing anything. I'll have to dig in more. Maybe things need to be enabled more properly in the kernel or something | 23:00 |
clarkb | but I think thats about it for today. I'll dig in more tomorrow | 23:00 |
clarkb | if this works it would be a really neat tool to have on the toolbelt | 23:01 |
clarkb | `startup flag "-XX:+ExtendedDTraceProbes" is required` | 23:02 |
clarkb | that explains it | 23:02 |
clarkb | got it "working" | 23:08 |
clarkb | the -C flag didn't work like I expected it to (I get no content after trying to filter for the class I was interested in). So then I tried doing a collect everything run and after a few seconds it wrote a half gig file | 23:09 |
clarkb | ok I thought I was done until I discovered that piece of documentation and I think I have successfully tested this | 23:17 |
clarkb | https://paste.opendev.org/show/biQgQ57k5gXuv2UdkP5t/ | 23:18 |
clarkb | the kill and uflow commands are in different shells so not run in order that way I just wanted to show the pids matching up with the output behavior | 23:18 |
clarkb | both -INT and -HUP produce the same calls | 23:18 |
clarkb | -9 doesn't call it at all. This is all what I expected given what the docs say and I think it is safe to use sigint now | 23:18 |
clarkb | also this might be the coolest tool ever. Works with python too if you build it with dtrace support | 23:19 |
corvus | clarkb: those are the same calls? | 23:20 |
corvus | ShutdownCallbackhodHand != ShutdownCallbackdSelect | 23:21 |
clarkb | oh hrm. | 23:21 |
corvus | but also, i don't know what a Callbackhod is -- could it be just terminal formatting or something? | 23:22 |
clarkb | ya I'm wondering I don't recall those strings showing up in the gerrit source but I'll double check | 23:22 |
corvus | or maybe something in uflow's reflection or whatever is slightly corrupt... | 23:23 |
clarkb | private static class ShutdownCallback extends Thread | 23:23 |
corvus | maybe just missing a terminating null | 23:23 |
clarkb | but there isn't a ShutdownCallbackdSelect or ShutdownCallbackhodHand | 23:23 |
corvus | and we should just read that as ShutdownCallback....... | 23:24 |
clarkb | I think the bcc tool is just a python script that compiles some ebpf. I may be able to dig into that and widen the output or something | 23:24 |
clarkb | ya I wonder if this being a thread is something that causes it to show up oddly depending on when the ebpf catches it? | 23:25 |
clarkb | I also noticed when I got the half gig of raw output that it emits notes to stderr that it is skipping calls too | 23:25 |
clarkb | I guess not a clear indicator yet but a promising path | 23:25 |
clarkb | I'll dig in more tomorrow and try to get a better understanding | 23:25 |
corvus | yeah. i think i'm positing that looks a lot like a strncpy of something missing a null terminator | 23:26 |
corvus | i agree though, the stack looks suspiciously similar and based on the fact that the difference is gibberish, it's probably a good chance they're the same | 23:26 |
Clark[m] | My laptop just died. I didn't think it was low battery but I guess it was | 23:27 |
corvus | clarkb: you could repeat the experiment, see if you get different gibberish | 23:27 |
corvus | that might help confirm the theory that, whatever the cause, we can ignore those 7 chars. | 23:27 |
Clark[m] | ++ I was going to suggest repeating it a few times and then laptop said I'm done | 23:27 |
clarkb | ok it was low battery and decided to hibernate to disk because everything is still here after plugging in. Neat | 23:30 |
corvus | you hit the jackpot | 23:32 |
clarkb | but ya I'll do a more scientific approach repeating the experiment for both sigint and sighup a few times to see if those values vary | 23:33 |
clarkb | and really this is a neat tool once you can narrow down what you are looking for | 23:33 |
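
The comparison corvus suggests at the end of the log (ignore the trailing characters that differ between runs and check whether the remaining calls match) could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, the two short lists stand in for symbols copied by hand from the discussion rather than parsed uflow output, and it assumes the class name quoted from the Gerrit source (`ShutdownCallback`) is the trusted prefix.

```python
# Hypothetical helpers for comparing traced symbols whose tails may be garbage.
# Nothing here parses real uflow output; the traces are hand-written samples.

def strip_trailing_garbage(symbol, known_names):
    """Return the longest known name that prefixes symbol, else symbol unchanged."""
    matches = [name for name in known_names if symbol.startswith(name)]
    return max(matches, key=len) if matches else symbol


def same_calls(trace_a, trace_b, known_names):
    """Compare two call sequences after normalizing every symbol."""
    norm_a = [strip_trailing_garbage(s, known_names) for s in trace_a]
    norm_b = [strip_trailing_garbage(s, known_names) for s in trace_b]
    return norm_a == norm_b


# Assumption: ShutdownCallback is the only class name we trust from the source.
known = {"ShutdownCallback"}
sigint_trace = ["ShutdownCallbackhodHand"]   # symbol as quoted in the log
sighup_trace = ["ShutdownCallbackdSelect"]   # symbol as quoted in the log

print(same_calls(sigint_trace, sighup_trace, known))
```

Repeating the experiment several times and feeding each run through a check like this would show whether only the tail varies, which is the signal that the extra characters can be ignored.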