View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003245||5 - Closed w/o Action||Bug||public||2019-03-13 22:46||2019-06-16 17:06|
|Status||closed||Resolution||no change required|
|Summary||0003245: Logbook freezes and crashes on startup|
|Description||Since I have been creating, deleting, and restoring logs, I began having problems where Logbook freezes and generates a report to Microsoft. I have also forced the creation of a couple dump files.|
|Steps To Reproduce||I can't cause this to happen, so I don't have steps.|
|Tags||No tags attached.|
||The DMP files have been located on Google Drive under "\Team Drives\HRD Software\Dumps\Mantis 3245"|
Two of the dump files are from build 202 (HRDLogbook.DMP and "HRDLogbook (2).DMP") and two are from build 203 ("HRDLogbook (3).DMP" and "HRDLogbook (4).DMP").
Debugging is a bit difficult because there's no crash, just a hang. The captured files are 64-bit dumps, so some aspects of execution aren't apparent.
The dumps all appear to have the same call stack on one thread, which has entered the Access database engine, then called the kernel's Sleep() function. That's pretty odd, but here's what I see:
0:000:x86> ~*kb . 0 Id: 5f14.2700 Suspend: 0 Teb: 01c8e000 Unfrozen # ChildEBP RetAddr Args to Child 00 01efcae8 756e8feb 00000000 01efcb2c 8605abd8 ntdll_77ab0000!NtDelayExecution+0xc 01 01efcb50 756e8f8f 00000032 00000000 00000001 KERNELBASE!SleepEx+0x4b 02 01efcb60 55812758 00000032 0a457510 00000000 KERNELBASE!Sleep+0xf 03 01efcba4 5580972e 00001096 01efcc68 01efcc68 msjet40!System::AllocatePages+0x168 04 01efcbc0 55808d76 0a45752c 00000001 01efcc68 msjet40!PageDesc::WireCurrentPage+0x2e 05 01efcbe0 557ef0c5 078cbfd4 078ece08 00000001 msjet40!PageDesc::ReadPage+0xe6 06 01efcc04 5581af9c 078cbfd4 0000d4ad 00000001 msjet40!Database::ReadPage+0x75 07 01efcc34 5581b707 01efcc68 01efcc68 00000002 msjet40!TableMover::GetNextRow+0xcc 08 01efcc4c 557f693e 00000001 00000002 01efcc68 msjet40!TableMover::Move+0x97 09 01efcc78 557cc811 079713c8 14791380 00000001 msjet40!ErrIsamMove+0x6e 0a 01efcc8c 558109f6 079713c8 000007fa 00000001 msjet40!ErrDispMove2+0x41 0b 01efd64c 557cbfb6 079713c8 000007fa 147907e8 msjet40!ErrIsamCopyRecords+0xb76 0c 01efd674 5583c8e7 079713c8 000007fa 000007f5 msjet40!ErrDispCopyRecords+0x56 0d 01efd8dc 5583c1f6 079713c8 163781d8 00000000 msjet40!ErrMaterializeRows+0x1b8 0e 01efd93c 5583c581 079713c8 16370000 163781d8 msjet40!ErrJPIOpenSort+0x17c 0f 01efd960 5583578c 079713c8 16370000 1637ddc0 msjet40!ErrJPOpenSort+0x113 10 01efd990 558545ab 079713c8 00000409 01efd9c4 msjet40!ErrJPOpenRvt+0x177 11 01efda18 557cab84 16370000 01efda2c 0000001c msjet40!ErrExecuteQuery+0xb68 12 01efda60 75705fcf ffffffff 16370000 00000000 msjet40!ErrUtilVReallocSeg+0x14 13 07b396c0 00000000 ffffffff 00000000 ffffffff KERNELBASE!VirtualQuery+0x1f
In the dumps, the main thread isn't clearly indicated. I don't see any thread which has a WinMain-like entry point. This seems pretty bogus. I do see threads in the IP listener and the DX Cluster; all are waiting for work. But I don't see anything like a message pump or a a WinMain entry.
The main issue I notice, though, is that the version of the Access Database engine in use is almost twenty years old:
0:000:x86> lmDvmmsjet40 Browse full module list start end module name 55790000 55918000 msjet40 (pdb symbols) f:\mysymbolcache\msjet40.pdb\AC52DC0716BF42499778F345076B93C11\msjet40.pdb Loaded symbol image file: msjet40.dll Image path: C:\Windows\SysWOW64\msjet40.dll Image name: msjet40.dll Browse all global symbols functions data Timestamp: Fri Jan 4 11:50:34 2019 (5C2FB90A) CheckSum: 00140CF6 ImageSize: 00188000 File version: 4.0.9801.8 Product version: 4.0.9801.8 File flags: 0 (Mask 3F) File OS: 40004 NT Win32 File type: 2.0 Dll File date: 00000000.00000000 Translations: 0000.04b0 CompanyName: Microsoft Corporation ProductName: Microsoft (R) Jet InternalName: MSJET40.DLL OriginalFilename: MSJET40.DLL ProductVersion: 4.00.9801.8 FileVersion: 4.00.9801.8 FileDescription: Microsoft Jet Engine Library LegalCopyright: Copyright (C) Microsoft Corp. 1993-1999
Customers should be running the newest version of the engine. Jet 4 was released in 1999.
While I don't have a smoking gun, it seems like one way to start would be to have the customer update to a current, supported version fo the Access runtimes.
#1 - I'm "the customer" in this case. I'm the one having the problem on a machine that I've been running each release on.
#2 - I've now got a customer complaint on this one. Here's the ticket: https://support.hamradiodeluxe.com/scp/tickets.php?id=27833
This is a prospective customer.
I'll update a "current supported version of the Access runtime"... but I haven't changed anything or gone backwards with it. If this works, I'll make the same suggestion to the customer. (It doesn't happen every time.)
In other words... this is the same runtime we include in our build, right?
||And by the way... why don't we install the latest version of the Access Runtime (rather than the 2007 version... or Jet version whatever it is that hangs on my computer)? It would seem that we may have fewer problems like this if we moved to a more modern version of Access Runtime.|
Thanks for the clarification. It's not always possible to tell who is reporting an issue, since copy and paste from tickets (for example) doesn't make it clear who the first person is.
Mantis 2827 tracks the older runtime being in setup.
Mantis 2099 tracks the fact that setup doesn't actually install or upgrade the Access runtime.
I haven't worked on these for a couple of different reasons. First, it's always seemed that some other high-priority issue has come up. Next, there's a lot of folk lore around ODBC on the team. Fr example, I recently learned that the support team is trained to install Access 2007 when customers have problems. I can't understand why they're not trained to install a current version of the run times. Until I could investigate that, I didn't want to make any changes.
The version of the Access run times in the minidump is very old; it is from Access 98, I believe. Access 2007 would install ACEODBC.DLL with a version number something like 12.xx.
The newest Access Runtime I know about is for 2013:
This redistributable is 212 megs in size, so it would more than double the size of the HRD download.
There appears to be a newer version of the Access Runtime for Office 2019, which is bundled with the Office 365 runtime kit:
At that link, I don't see a redistributable package; the package downloaded from the links there is a setup runner, which is only 5 megs ... but require an internet connection to successfully download the actual runtime.
Because modern Windows installs have the Access runtime built-in (for example, Windows 10 includes ODBCJT32.DLL with version 10.0.xx by default), it's not clear what the best process for installing the run times is.
||I wonder what other software makers are doing when they have similar needs.|
I'm not sure what needs you're asking about. I assume that other vendors debug just the same way that I am -- piecing together clues and dumps to try to find bad code and fix it. Do you think there's a better way to proceed?
For sure, this is a struggle. Aside from the dump files, there's precious little information provided in this report. I spent a couple of hours trying recreate the trashed call stack to see what more I could find. Here's as far as I got:
0:000:x86> k L=02aada78 # ChildEBP RetAddr 00 02aaca34 76f5c390 ntdll_776e0000!NtReadFile+0xc 01 02aada78 5579a9a9 KERNELBASE!ReadFile+0x90 02 02aada9c 558224ad msjet40!ErrUtilFreeSeg+0x9 03 02aadb7c 55824926 msjet40!ErrQEMCompileQuery+0x5df 04 02aadbd4 5582623d msjet40!ErrExecuteTempQuery+0x10e 05 02aadc04 559c8e9e msjet40!JetExecuteTempQuery+0x90 WARNING: Stack unwind information not available. Following frames may be wrong. 06 02aadec0 559c6a1f odbcjt32!SQLEndTran+0x3ae 07 02aaded0 56e13f42 odbcjt32!SQLExecDirectW+0x8f 08 02aadefc 56e14159 odbc32!SQLExecDirectCover+0x24b 09 02aadf28 013ed312 odbc32!SQLExecDirectW+0xb9 0a 02aadf64 013ed181 HRDLogbook!CRecordset::PrepareAndExecute+0x2e 0b 02aadf94 0115a3bb HRDLogbook!CRecordset::Open+0xf0 0c 02aae1c4 0115903e HRDLogbook!CHRDLogbookView::ReloadNonCustom+0xfeb 0d 02aae260 011492a5 HRDLogbook!CHRDLogbookView::Reload+0x27e 0e 02aaecb8 0114accc HRDLogbook!CHRDLogbookView::LoadDatabase+0x1815 0f 02aaece0 013d9469 HRDLogbook!CHRDLogbookView::OnDatabase+0x5c 10 02aaedb0 013da56e HRDLogbook!CWnd::OnWndMsg+0x803 11 02aaedd4 013d57cf HRDLogbook!CWnd::WindowProc+0x2d 12 02aaee48 013d5f90 HRDLogbook!AfxCallWndProc+0xc6 13 02aaee68 7747635b HRDLogbook!AfxWndProc+0x34 14 02aaee94 7746729c user32!_InternalCallWinProc+0x2b 15 02aaef78 774667ef user32!UserCallWinProcCheckWow+0x3ac 16 02aaefb0 70bb5315 user32!CallWindowProcW+0x7f 17 02aaf038 70bb53b3 comctl32!CallNextSubclassProc+0x155 18 02aaf0b8 70bb50b1 comctl32!CallNextSubclassProc+0x1f3 19 02aaf114 7747635b comctl32!MasterSubclassProc+0xa1 1a 02aaf140 7746729c user32!_InternalCallWinProc+0x2b 1b 02aaf224 77466a4d user32!UserCallWinProcCheckWow+0x3ac 1c 02aaf288 77466713 user32!SendMessageWorker+0x1fd 1d 02aaf2b8 01201909 user32!SendMessageW+0x123 1e (Inline) -------- HRDLogbook!CWnd::SendMessageW+0x15 1f 02aaf458 0113cd55 HRDLogbook!CMainFrame::RestoreLayout+0xf29 20 02aaf940 018f9473 HRDLogbook!CHRDLogbookApp::InitInstance+0x20b5 21 02aaf958 01532e61 HRDLogbook!AfxWinMain+0x5f 22 (Inline) -------- HRDLogbook!invoke_main+0x1a 23 02aaf9a4 74d90179 HRDLogbook!__scrt_common_main_seh+0xf8 24 02aaf9b4 7774662d kernel32!BaseThreadInitThunk+0x19 25 02aafa10 777465fd ntdll_776e0000!__RtlUserThreadStart+0x2f 26 02aafa20 00000000 ntdll_776e0000!_RtlUserThreadStart+0x1b
The "OnDatabase" function is what loads a database in the Logbook. From the call stack, we can see that this call is happening right as the application starts up and tries to restore its layout -- it's trying to open the windows that were previously open, which leads it to try load one of the database shown in a Logbook database view.
From that point, the call into CRecordset to open the record set is completely normal. Nothing wonky is going on here; the database is simply being queried for all of the rows it has so that the view can be loaded into memory. The code hasn't yet started fetching individual rows -- it's just executing the query to start getting them back.
There's nothing here that's actionable for me: the application has called into ODBC to ask for rows, and they're either taking a long time to arrive, or the database engine is getting sick when trying to provide them. The query never returns, so there's no issue of the application getting sick on data that was returned -- the data isn't coming back in the first place. The involved code works correctly in every other situation where we use it, so we have to figure something about the user's database is to blame.
The original stack from my first post has frame addresses that fit in between frames 05 and 06 above, so we've gained a bit of insight ... but all we really know is that we're deep in the bowels of the ODBC / Access / Jet implementation. There's not much more I could do: even if I could piece together the stack, I'd be guessing at what the Access implementation was (or wasn't) doing when it got sick.
This stack trace was reported at the Microsoft partner dashboard on 2019-02-21 against build 184.108.40.206. It's the only stack for this bucket, but the bucket is about 3.8% of reported crashes.
Frame Image Function Offset 0 ntdll NtDelayExecution 0xC 1 kernelbase SleepEx 0x9B 2 kernelbase Sleep 0xF 3 msjet40 System::AllocatePages 0x168 4 msjet40 PageDesc::WireCurrentPage 0x2E 5 msjet40 PageDesc::ReadPage 0xE6 6 msjet40 Database::ReadPage 0x75 7 msjet40 TableMover::GotoBookmark 0xEC 8 msjet40 ErrIsamGotoBookmark 0x39 9 msjet40 ErrDispReadAhead 0x48 10 msjet40 ErrJPMoveBMP 0x250 11 msjet40 ErrDispMove 0x41 12 msjet40 ErrMaterializeRows 0x290 13 msjet40 ErrJPIOpenSort 0x17C 14 msjet40 ErrJPOpenSort 0x113 15 msjet40 ErrJPOpenRvt 0x177 16 msjet40 ErrExecuteQuery 0xB68 17 msjet40 Table::~Table 0x193 18 msjet40 ErrSetValcolSortinfo 0x7C 19 msjet40 PPIGenSortValcols 0x63 20 msjet40 CopyQgrefJpref 0xAE 21 msjet40 PostProcessor 0x2AB 22 msjet40 ErrQEMCompileQuery 0x5DF 23 msjet40 JetExecuteTempQuery 0x90 24 odbcjt32 SQLInternalExecute 0x71C 25 odbc32 SQLExecDirectCover 0x257 26 odbc32 SQLExecDirectW 0xBD 27 hrdlogbook 0x29FDF2 28 hrdlogbook 0x29FC61 29 hrdlogbook 0x6DFE9 30 hrdlogbook 0xB10EB 31 hrdlogbook 0x78AA90 32 kernel32 BaseThreadInitThunk 0x24 33 ntdll __RtlUserThreadStart 0x2F 34 ntdll _RtlUserThreadStart 0x1B
This just shows that there are other instances of this hang, and reaffirms that I got a decent call stack from the dumps. Still nothing I can take action on just yet.
I think you said you were going to try the newer Access runtimes; and also ask the customer to do the same.
What were the results of that test?
I'm using the 32-bit version of Access Runtime 2013.
I'll try moving to the 2106.
Above, we said we'd get the customer to try configuring the data source to use newer Access drivers. WA9PIE said they'd try newer Access drivers, too.
What were the results of using the newer versions of Access?
I had the customer install the 32-bit Access 2013 Runtime... create a new database... and restore the log from backup to that database.
No recurrence of the problem.
Closing it. No change required.
|2019-03-13 22:46||WA9PIE||New Issue|
|2019-03-13 22:53||WA9PIE||Note Added: 0007685|
|2019-03-13 22:53||WA9PIE||Assigned To||=> K7ZCZ|
|2019-03-13 22:53||WA9PIE||Status||new => assigned|
|2019-03-14 11:35||K7ZCZ||Note Added: 0007686|
|2019-03-14 11:35||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-03-17 01:55||WA9PIE||Note Added: 0007692|
|2019-03-17 01:55||WA9PIE||Assigned To||WA9PIE => K7ZCZ|
|2019-03-17 02:00||WA9PIE||Note Edited: 0007692||View Revisions|
|2019-03-17 02:09||WA9PIE||Note Added: 0007693|
|2019-03-17 10:42||K7ZCZ||Note Added: 0007694|
|2019-03-17 12:06||WA9PIE||Note Added: 0007696|
|2019-03-18 00:02||K7ZCZ||Note Edited: 0007694||View Revisions|
|2019-03-18 12:44||K7ZCZ||Note Added: 0007699|
|2019-03-19 15:33||K7ZCZ||Note Added: 0007709|
|2019-03-28 17:11||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-03-28 17:11||K7ZCZ||Assigned To||WA9PIE => K7ZCZ|
|2019-03-28 17:11||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-03-28 17:12||K7ZCZ||Status||assigned => feedback|
|2019-03-28 17:12||K7ZCZ||Note Added: 0007789|
|2019-04-18 14:48||WA9PIE||Note Added: 0007871|
|2019-04-18 15:09||WA9PIE||Project||3 - Current Dev List => 1 - Backlog|
|2019-04-18 15:10||WA9PIE||Priority||normal => low|
|2019-04-25 14:32||WA9PIE||Assigned To||WA9PIE => K7ZCZ|
|2019-04-25 14:32||WA9PIE||Priority||low => normal|
|2019-04-25 14:32||WA9PIE||Project||1 - Backlog => 3 - Current Dev List|
|2019-04-26 10:58||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-04-26 11:00||K7ZCZ||Note Added: 0007892|
|2019-06-16 17:06||WA9PIE||Status||feedback => closed|
|2019-06-16 17:06||WA9PIE||Resolution||open => no change required|
|2019-06-16 17:06||WA9PIE||Note Added: 0008101|
|2019-06-16 17:06||WA9PIE||Project||3 - Current Dev List => 5 - Closed w/o Action|