View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002735 | Ham Radio Deluxe | Bug | public | 2018-05-23 10:25 | 2018-09-28 21:42 |
Reporter | KB3NPH | ||||
Assigned To | K7ZCZ | ||||
Priority | high | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | |||||
Target Version | Fixed in Version | 6.4.0.888 | |||
Summary | 0002735: Logbook Lockup | ||||
Description | Another customer reporting the Logbook just stops working for no reason. We have received many of these reports with V6.4.0.840 and really need something done to find the issue. I've attached a screenshot of the display when this happens. I also got a dump file that can be looked at to see if there is any indication there as to what is causing the problem. | ||||
Additional Information | Ticket #986640 | ||||
Tags | No tags attached. | ||||
Module | Logbook | ||||
Sub-Module | Functional | ||||
Testing | Beta Successful | ||||
related to | 0002813 | closed | K7ZCZ | Ham Radio Deluxe | Logbook: crashes when freeing REQUEST_SIGNAL_DATA structure in CountryLookup() function |
related to | 0002732 | closed | PD9FER | 4 - Closed w/o Action | Logbook crashing when idle |
related to | 0002862 | closed | K7ZCZ | Ham Radio Deluxe | Logbook: Bad thread synchronization around lookup calls and REQUEST_SIGNAL_DATA structure |
|
|
|
Dump file located: \Team Drives\HRD Software\Dumps\Mantis ID 0002735Minideluxe.v2.1.zip |
|
The dump file doesn't contain a dump. Instead, it's got a program named "MiniDeluxe" in it. Please provide a dump file so this issue can be investigated. |
|
Ticket #787583 - Here is a dump file from a customer who I instructed on how to do the "forced" dump using Task Manager. I had spent considerable time in his computer trying to get the issue to act up so I could get the dump myself, but was unable to. Hope this helps get this problem resolved once and for all. The dump file is located in the Google drive DUMPS folder. Mantis ID 0002735 - HRDLogbook.zip |
|
I had several customers tell me that this problem with the Logbook stopping working for no reason seemed to start in the release where we removed the IE from the software and started sending all those calls to the external browsers. Over Field Day, Kevin was working using the current version. He called me on the phone and told me he was having that problem where the Logbook stopped working quite often. Today we were talking about that and he said that during Field Day, he was NOT connected to the internet. He said that when he got home and was able to connect to the internet, he still had the problem, however it was not as frequent. Is it possible that something that was put in or not removed from the code in order to access the external browser could be the problem with the current issue we're having? |
|
To believe the browser change was related to this issue, I'd need some amount of evidence greater than coincidence. Many changes were made in that same release, and any one of them could have caused (or revealed) this problem. |
|
Debugging hangs without any repro case is exceptionally hard, since the apps state doesn't stop at the problem like it does with a crash. The dump provides a snapshot of the app as it's hanging -- we get no other clues. The report provided here contains no information other than the report that the "logbook just stops working for no reason". In examining the associated ticket (#986640) I see that it was opened on April 8, 2018. The code to remove the internal browser wasn't checked in until April 26, and didn't ship until a week after that. These facts exclude not only the external browser change, but that whole build altogether, as the customer reports very frequent hangs ("10 time in one hours"). The investigation of this issue is hampered by the lack of information. Further examination of the ticket indicates that what the user has reported doesn't match what's reported here in Mantis. The customer says he's using Build 805 of the product, but the dump is from Build 846. The customer says that he gets a "looking for problem" message from Windows. That indicates there was a crash, not a hang. It also means that our crash handler has not been hit (if it were, it would write the minidump). But in this condition, it should be possible to get Windows to write a minidump of its own directly, right at the crash point. Instructions for doing this can be found here: https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps There should also be an entry in the Windows Event Log about the crash, which might indicate a location of a collected dump. Recovering that information should give us a much better lead toward solving the problem. It would also help to know: Was the customer using the Logbook alone, or was he using other apps in the suite? What were the last actions he took before encountering the crash? Which features of the logbook were active? (DX Cluster connected? Background log uploads? Backup? Anything else ...?) The more information we have about the symptoms of this issue, the better chance we have at discovering it's cause. |
|
Debugging hangs without any repro case is exceptionally hard, since the apps state doesn't stop at the problem like it does with a crash. The dump provides a snapshot of the app as it's hanging -- we get no other clues. The report provided here contains no information other than the report that the "logbook just stops working for no reason". In examining the associated ticket (#986640) I see that it was opened on April 8, 2018. The code to remove the internal browser wasn't checked in until April 26, and didn't ship until a week after that. These facts exclude not only the external browser change, but that whole build altogether, as the customer reports very frequent hangs ("10 time in one hours"). The investigation of this issue is hampered by the lack of information. Further examination of the ticket indicates that what the user has reported doesn't match what's reported here in Mantis. The customer says he's using Build 805 of the product, but the dump is from Build 846. The customer says that he gets a "looking for problem" message from Windows. That indicates there was a crash, not a hang. It also means that our crash handler has not been hit (if it were, it would write the minidump). But in this condition, it should be possible to get Windows to write a minidump of its own directly, right at the crash point. Instructions for doing this can be found here: https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps There should also be an entry in the Windows Event Log about the crash, which might indicate a location of a collected dump. Recovering that information should give us a much better lead toward solving the problem. It would also help to know: Was the customer using the Logbook alone, or was he using other apps in the suite? What were the last actions he took before encountering the crash? Which features of the logbook were active? (DX Cluster connected? Background log uploads? Backup? Anything else ...?) The more information we have about the symptoms of this issue, the better chance we have at discovering it's cause. Turning my attention to the attached minidump to see what I can learn: The supplied dump is a 64-bit image, and requires the use of the "!wow64exts.sw" to switch windbg to 32-bit mode. The supplied dump is of build 846. The attached dump contains on crash record. The process is active, so it's not in the state that it would generate an "application not responding" message. All the threads in the dump are marked "frozen" which makes it hard to tell if they were runnable or not. One thread sticks out, which is this one: 0:009:x86> kb # ChildEBP RetAddr Args to Child 00 0ecfc8dc 77a8879f 0000167c 00000001 00000000 ntdll_779e0000!NtWaitForSingleObject+0x15 01 0ecfc960 77a888cd 0ecfcb00 0ecfcb50 00000000 ntdll_779e0000!RtlReportExceptionEx+0x14b 02 0ecfc9b8 77aae98e 0ecfcb00 0ecfcb50 00000000 ntdll_779e0000!RtlReportException+0x86 03 0ecfc9cc 77aaea0b c0000374 0ecfca00 77a54fb4 ntdll_779e0000!RtlpTerminateFailureFilter+0x14 04 0ecfc9d8 77a54fb4 00000000 0ecfceb0 77a0d0d0 ntdll_779e0000!RtlReportCriticalFailure+0x67 05 0ecfc9ec 77a54e59 00000000 00000000 00000000 ntdll_779e0000!_EH4_CallFilterFunc+0x12 06 0ecfca14 77a434c1 fffffffe 0ecfcea0 0ecfcb50 ntdll_779e0000!_except_handler4+0x8e 07 0ecfca38 77a43493 0ecfcb00 0ecfcea0 0ecfcb50 ntdll_779e0000!ExecuteHandler2+0x26 08 0ecfca5c 77a43434 0ecfcb00 0ecfcea0 0ecfcb50 ntdll_779e0000!ExecuteHandler+0x24 09 0ecfcae8 779f0163 00cfcb00 0ecfcb50 0ecfcb00 ntdll_779e0000!RtlDispatchException+0x127 0a 0ecfcb08 77aae9fb 00000001 77ae4270 79638fc1 ntdll_779e0000!KiUserExceptionDispatcher+0xf 0b 0ecfceb0 77aaf901 c0000374 77ae4270 0ecfcef4 ntdll_779e0000!RtlReportCriticalFailure+0x57 0c 0ecfcec0 77aaf9e1 00000002 79638fa1 00000000 ntdll_779e0000!RtlpReportHeapFailure+0x21 0d 0ecfcef4 77a5d9ac 00000009 01b10000 0ee6f974 ntdll_779e0000!RtlpLogHeapFailure+0xa1 0e 0ecfcf24 754014ad 01b10000 00000000 0ee6f974 ntdll_779e0000!RtlFreeHeap+0x64 0f 0ecfcf38 0042db82 01b10000 00000000 0ee6f974 kernel32!HeapFree+0x14 10 0ecfcf4c 007b287d 0ee6f974 efb54ef1 162cffb0 HRDLogbook!free+0x1a [f:\dd\vctools\crt\crtw32\heap\free.c @ 51] 11 0ecffac8 0079a20d 0ee6f974 0ee6f7ec efb54ec1 HRDLogbook!CBackgroundProcessingThread::CountryLookup+0x894d [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadlookups.cpp @ 5259] 12 0ecffaf8 0079a526 162cffb0 00c5a854 efb54f09 HRDLogbook!CBackgroundProcessingThread::ProcessData+0x36d [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthread.cpp @ 582] 13 0ecffb30 0097e210 efb54f61 06e43bb8 01b47518 HRDLogbook!CBackgroundProcessingThread::DoWork+0xa6 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthread.cpp @ 311] 14 0ecffb58 0097e156 efb54fad 00000000 06e43bb8 HRDLogbook!CThinThread::Run+0x80 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 188] 15 0ecffb94 00432cfb 00c5a808 efb54ff5 00000000 HRDLogbook!CThinThread::Start+0x46 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 236] 16 0ecffbcc 00432e23 00000000 0ecffbe4 7540343d HRDLogbook!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 17 0ecffbd8 7540343d 06e43bb8 0ecffc24 77a19832 HRDLogbook!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 18 0ecffbe4 77a19832 06e43bb8 7963bd71 00000000 kernel32!BaseThreadInitThunk+0xe 19 0ecffc24 77a19805 00432da7 06e43bb8 00000000 ntdll_779e0000!__RtlUserThreadStart+0x70 1a 0ecffc3c 00000000 00432da7 06e43bb8 00000000 ntdll_779e0000!_RtlUserThreadStart+0x1b The CountryLookup() function in the BackgroundProcessingThreadLookups thread in the Logbook is an example of terrible software engineering. The function has awful error handling. It's also approximately 6000 lines long. Yes: one single function is 6000 lines long. The call trapped here shows that the function is about to clean up, it's freeing memory. But the call to free() ends up throwing an exception. It's very odd that the call stack goes through the _except_handler() call without hitting our exception handler (which writes our minidumps). From this call stack, it would seem that the pSignalData parameter to the lookup function contains bad data. The parameter is essentially unused: the only member from the structure ever touched is the m_hWnd in side it, which is used to respond to the request. A good guess might be that the data has already been freed, but in the debugger I have no problem viewing the data. If it were previously deleted, I shouldn't be able to dump the data at the pointer. The only lead we have, then, is that something's wrong with this memory at the moment the thread tried to free it. |
|
The referenced memory is meant to be a REQUEST_SIGNAL_DATA structure. But it's got crap data in it:0:009:x86> dx -r1 ((HRDLogbook!REQUEST_SIGNAL_DATA *)0xee6f974) ((HRDLogbook!REQUEST_SIGNAL_DATA *)0xee6f974) : 0xee6f974 [Type: REQUEST_SIGNAL_DATA *] [+0x000] bFree : 153 [Type: int] [+0x004] hEvent : 0x3 [Type: void *] [+0x008] lpszReturn : 0x229 : "--- memory read error at address 0x00000229 ---" [Type: wchar_t *] [+0x00c] hWnd : 0x1f0 [Type: HWND__ *] [+0x010] lpCallback : 0x3cc [Type: void (*)(wchar_t *,long)] [+0x014] lParam : 3 [Type: long] [+0x018] bEnableLogbook : -2130673663 [Type: int] [+0x01c] bEnableCountry : 368 [Type: int] 0:009:x86> db 0x0ee6f974 0ee6f974 99 00 00 00 03 00 00 00-29 02 00 00 f0 01 00 00 ........)....... 0ee6f984 cc 03 00 00 03 00 00 00-01 80 00 81 70 01 00 00 ............p... 0ee6f994 02 02 00 00 f0 ce 51 16-01 80 00 81 a9 01 00 00 ......Q......... 0ee6f9a4 a6 92 be ef ec f8 e6 0e-7d 2f 52 75 30 fa e6 0e ........}/Ru0... 0ee6f9b4 67 2f ca 71 ce 07 90 90-fe ff ff ff 40 fa e6 0e g/.q........@... 0ee6f9c4 3e 67 52 75 40 00 00 00-e0 fa e6 0e 00 00 00 00 >gRu@........... 0ee6f9d4 00 00 00 00 94 fa e6 0e-24 fa e6 0e d1 c8 9e ef ........$....... 0ee6f9e4 28 6e 4b 16 01 00 00 00-68 b0 20 16 35 88 0c 00 (nK.....h. .5... It might mean that the data was freed the a new block allocated at the same address; or the data was overwritten. The CountryLookup() method is only invoked by two spots in the cdoe. One is synchronous and the memory is allocated on the stack. The other is dynamically allocated and handed off. If the synchronous code were the sender, it would still be waiting for its reply. There's no call stack at that place (which is rom the IP Listener, where a "CountryLookup" or "CountryCallsignLookup" command is being handled.) So the other, asynchronous sender must be the culprit. Thing is, that code is pretty simple: it allocates its memory; then it queues the request to the thread. There's no apparent opportunity for the memory to be freed early and reallocated. |
|
I believe this is fixed with Mantis 2862 |
|
Logbook is stable for me. |
|
Resolved as part of the 888 build. |
Date Modified | Username | Field | Change |
---|---|---|---|
2018-05-23 10:25 | KB3NPH | New Issue | |
2018-05-23 10:25 | KB3NPH | File Added: Logbook Failure.jpg | |
2018-05-23 10:32 | KB3NPH | Note Added: 0005095 | |
2018-05-27 10:08 | K7ZCZ | Assigned To | => KB3NPH |
2018-05-27 10:08 | K7ZCZ | Status | new => feedback |
2018-05-27 10:08 | K7ZCZ | Note Added: 0005115 | |
2018-06-23 17:38 | WA9PIE | Summary | Ticket #986640 - Logbook Lockup => Logbook Lockup |
2018-06-23 17:38 | WA9PIE | Additional Information Updated | View Revisions |
2018-06-27 08:45 | KB3NPH | Note Added: 0005497 | |
2018-06-27 13:06 | KB3NPH | Note Added: 0005502 | |
2018-06-27 14:14 | KB3NPH | Assigned To | KB3NPH => K7ZCZ |
2018-06-28 20:20 | K7ZCZ | Note Added: 0005528 | |
2018-06-28 21:30 | K7ZCZ | Note Added: 0005531 | |
2018-06-28 21:57 | K7ZCZ | Note Added: 0005532 | |
2018-06-28 23:10 | K7ZCZ | Note Added: 0005533 | |
2018-07-23 10:52 | K7ZCZ | Relationship added | related to 0002813 |
2018-07-29 16:44 | K7ZCZ | Relationship added | related to 0002732 |
2018-08-28 15:34 | K7ZCZ | Relationship added | related to 0002862 |
2018-09-11 19:04 | K7ZCZ | Status | feedback => resolved |
2018-09-11 19:04 | K7ZCZ | Resolution | open => fixed |
2018-09-11 19:04 | K7ZCZ | Note Added: 0006201 | |
2018-09-28 07:47 | K7ZCZ | Project | 1 - Backlog => 3 - Current Dev List |
2018-09-28 07:47 | K7ZCZ | Fixed in Version | => 6.4.0.887 |
2018-09-28 10:57 | g3ucq | Note Added: 0006246 | |
2018-09-28 21:40 | WA9PIE | Project | 3 - Current Dev List => Ham Radio Deluxe |
2018-09-28 21:41 | WA9PIE | Testing | Not Started => Beta Successful |
2018-09-28 21:41 | WA9PIE | Fixed in Version | 6.4.0.887 => 6.4.0.888 |
2018-09-28 21:42 | WA9PIE | Status | resolved => closed |
2018-09-28 21:42 | WA9PIE | Note Added: 0006250 |