View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003192||3 - Current Dev List||Bug||public||2019-02-22 18:33||2019-06-17 18:15|
|Target Version||Fixed in Version|
|Summary||0003192: Lat/Long, direction, and distance are incorrect in the Logbook ALE; "Log Coordinates" has opposite affect|
|Description||When using one of the callsign lookup methods, the Lat/Long, direction, and distance are incorrect in the Logbook ALE.|
This is odd, because the direction and distance in the Lookup Pane are correct.
Regardless of the location information obtained by the lookup source (if he has grid, lat/long, or both), the ALE displays the lat/long, distance, and direction for the center of the country. I can demonstrate this with US callsigns. It's difficult to test this in other continents without loading a station location in that continent. Therefore, it's worth considering that the code is wrong for regardless of what country the user's station is located in.
|Steps To Reproduce||First, let's establish that the ALE will display the same lat/long, direction, and distance... regardless of what call is looked up.|
- Open Logbook with the following lookup methods enabled - Logbook, QRZ Subscription, and Country List.
- Display the Lookup Pane
- Open an ALE
- Display the Location tab
- Enter the callsign N8TWO and then either hit the "tab" key or click the "Lookup" button
>> To the right of "Rotator" in the upper section of the ALE, you will see "39.0000N 97.5278W 354/174, 401mi"
>>>> This lat/long is exactly what is in the Country List (ignoring whatever comes in the enabled lookup method)
>> Notice in the Lookup Pane that the lat/long is 42.529048 by -92.359994; 686 miles 19 degrees
>>>> This is the data that comes from QRZ.com
>> Notice the lat/long and distance in the Location tab match the data that came from the lookup source
- Hover over "Log Coordinates" and it says "Save the Rotator coordinates in the Logbook"
- Click the "Log Coordinates" button. Oddly, this does not save the Rotator coordinates in the Lookup. It actually puts the coordinates from the lookup source in the fields to the right of Rotator. The distance is recalculated and is now incorrect. I have no idea which is distance is correct (before or after clicking this button).
Repeat this process with the following calls - W6MKO (which has lat/long and grid in QRZ data), KB3NPH (which has only grid in QRZ data), and KS7Y (which has lat/long and grid in QRZ data and is north of K7ZCZ).
> The lat/long, direction and distance will always show "39.0000N 97.5278W 354/174, 401mi", rather than the correct coordinates
> The lookup pane will always be correct
> Clicking the "Save the Rotator coordinates in the Logbook" will always result in pulling the coordinates from the Location tab and moving them into the upper section of the ALE to the right of Rotator. The distance is recalculated.
|Additional Information||What should happen is that the information to the right of Rotator in the upper section of the ALE should get its lat/long, direction, and distance correct with the initial lookup. Clicking on the "Rotator" button in the ALE correctly causes the Rotator app to move to that direction (beam heading).|
The "Save the Rotator coordinates in the Logbook" function should cause the coordinates from the upper section of the ALE to be pushed into the Location tab. Overall, this is a really really dumb "feature".
|Tags||No tags attached.|
NOTE: The coordinates you get may not match what I see from my "location" ("39.0000N 97.5278W 354/174, 401mi"), but the pattern described is the point here.
Additionally, it doesn't matter what country the callsign you enter belongs to... the ALE will ALWAYS show the lat/long (and calculated distance) from the Country List... ignoring the callsign lookup information.
I'm sure that, to the outside observer, the ALE implementation is presumed to be simple: get some data from the user, look it up in a couple of online sources, and stick it into a database. Maybe it should be simple, but the actual implementation is fragile and complicated. As far as I can tell, the main issue is that no specification exists for precisely how the feature should behave. Changes have incrementally been made to address bugs without any overarching specification, and nothing has been documented. At this point, all we have is the code, which is poorly written and exercises constructs which all but guarantee inconsistent results. But there it is: it's the de facto standard for how this part of the product works.
So, how does it work? Well, let's look.
When the user causes a lookup operation in the ALE, here's what happens:
1) The list of configured call sign lookup data sources is consulted for a response for the given callsign.
2) The responses are built into a list, in order that the user has configured them. If a data source doesn't return any data, then it doesn't add anything to the list.
3) On my machine, after the lookups are done per the repro steps in this issue, result from the "Unique Callsign Database", result is the response from the QRZ XML service, and result is from the "Country List". The Logbook is consulted, but doesn't produce a match and therefore doesn't add an entry to the list. Because this code is not self-consistent with any rule, the "Unique Callsign Database" adds an element to the list at result even though the given call sign isn't found in the "Unique Callsign Database". The results return these values:
Result (UCDB): country = "", lat = "", lon ="" Result (QRZ.COM): country = "United States", lat = "41.239430", lon = "-80.777369" Result (Country Index): country = "United States", lat = "", lon = ""
4) If there's more than one result (there are three) and the first result (from the UCDB) doesn't have a country name (it doesn't), the list is scanned to see if any of the countries have an HDRCN (HRD Country Number?) that is either empty or different than any other in the list. We have an empty country, so we proceed to step 5.
5) Each result is examined in turn:
5a) If the country name is empty, it is skipped.
5b) If the country name has an HRD country number of 10000 (which is Kosovo) and the call begins with "Z6", it is skipped.
5c) The result is copied to a new list; let's call it the "used list".
5d) The results in the rest of the list are examined one by one; if step #5 has us visiting result, we'll examine result and so on, until the end of the list.
5d1) If the additional result doesn't have a country, it is skipped
5d2) If the additional result has the same country as the original result, it is "merged" as follows:
5d2a) The source name of the additional is appended to the current result in the used list. We end up with "QRZ.COM, Country Index", as the source name in this issue's scenario.
5d2b) The country name in the additional result is erased.
5d2c) For each of these fields: CQ, ITU, IOTA, Lat, Lon, County, State, Continent, Locator, QSLMgr, QTH:
5d2c1) If the field in the current entry of the used list is not empty, and the field in the currently examined additional result is empty, then the result in the currently used list is overwritten with the empty value.
A bit more descriptively: step 5 is a nested loop. It compares each returned value with every other returned value, in order. If a later returned value (lower in the user's defined list) has the same country as an earlier returned value, the earlier returned values are erased completely. The erasure happens in step 5d2c1) above. The erasure doesn't seem intuitive. It seems like it would be more appropriate to copy the later value into the earlier value, overwriting what's there instead of erasing it.
In the scenario we have in this issue, at the end of step 5 (including all the loop steps), we end up with this entry in the new "used list" of results:
usedList (QRZ.COM, Country Index): country = "United States", lat = "", lon = ""
6) Depending on the size of the used list, an entry is selected:
6a) If the used list has one entry, it is selected
6b) If the used list has multiple entries, then the user is prompted to choose an entry
6c) If the used list is empty, the first result from the original result list is selected
7) If the country in the selected result is in the "Countries" drop down, it is selected there; else, the IOTA drop down is searched for the country name. If found, it is selected.
8) If the DXCC number in the selected result is empty, then we try to find a DXCC number by country name. The resulting DXCC number is retained in the dialogs data, but not displayed. If the DXCC number is 0 or 777, then some special processing is done. (Skipped here.)
9) If the HRD country number in the selected result is empty, then the country name of the selected record is used to look it up. If the HRD country number is 777 or 0, then some additional processing is done. (Skipped in our walk through.)
10) The DXCC and HRD Country Numbers are displayed in the ALE dialog.
11) If the Lat and Lon are empty, and the QTH and the State are not empty, then the QTH and State are used to find Lat and Lon, plus the grid locator. (Not done in our case, since QTH and State are both empty).
12) If the Lat and Lon are not empty, and the QTH OR the State is known, then an attempt is made to use the ZIP code to find the Lat and Lon, and then compute a grid locator. (Not done in our case, since no ZIP is known and state and QTH are both empty.)
13) If the CQ or ITU regions are known, these are displayed in the dialog box. Else, the dialog shows an empty string for these fields.
14) If the selected record has a continent, it is selected in the continent drop-down list in the ALE. Otherwise, the continent is determined from the country name, and that continent is selected from the drop-down.
15) If the selected record still doesn't have a latitude but does have a country name, the lat and lon are determined from the country name. Their values are set into the estimated lat/lon fields. In our example, this is where the "39.00N" and "97.5278W" values come from. That location is a little north of Salina, Kansas. Looks like the N8TWO QTH is a lot closer to Waterloo, Iowa.
16) Otherwise (if not #15) then the previously discovered Lat and Long are set to the Actual Lat and Lon values in the "Location" tab ... but not directly set to the rotator values.
17) The State, County, Age, QSLMgr, and QTH are unconditionally set from the selected record into the ALE dialog box.
18) The Name from the selected record is set, only if the Name field in the dialog is empty.
19) The start and end time of the QSO are set to the ALE using the current time.
20) The Rotator location is updated. This computes a distance, but doesn't change the lat/lon location for the rotator.
21) The country tab is updated.
Note that this process doesn't explain how the correct values are found in the Location tab. How does that happen?
Turns out that, between step #1 and Step #2, each call sign data source is queried. Before it adds its results into the results[n] list we build, it also sets individual controls in the ALE with the values that it got from the answer. This process is kind of concurrent with the rest of the steps performed here: there is something of a race, and that race might not have a predictable winner. If a lookup operation has a lat/lon value and pokes it into the ALE's controls; and the steps above produce a lat/long value and it gets set into some fields in the ALE, what is displayed? Which one completes first?
It might vary depending on the sources, how long they take to respond, the speed of the machine, other installed software. Odds are, that race is at the root cause of Mantis 3187; as it has been at the root cause of several other issues in the ALE.
Here's a description of what happens when we receive a result from a single lookup data source -- this is the code that asynchronously executes between steps #1 and #2 above:
1) The start time and end time values are retrieved from the ALE controls.
2) These are used to figure out the DXCC for the call sign. AFAICT, this is done using some regular expressions in the country list.
3) The data from the lookup operation is collected and processed as follows:
4) if the DXCC number value can be converted to a country name, that converted country name name is used. Otherwise, the country name in the address is used.
5) The countries list is checked to see if the chosen country name is in it.
6) IF not in the country list, some translations are tried: "St." is translated to "Saint". "Eatern" is translated to "East". "Western" is translated to "West". The list is checked again with the translated names, and if a hit is found, that translated country name is used.
7) If a continent wasn't found, a continent is computed from the country name.
8) If the country name in the address is empty, the computed country name is used.
9) If an option to use only the first name is set, the code uses only the first name as the name field. Otherwise, the name field is the first name plus the last name.
10) If the QTH is empty and the country is the US, and ...
10a) ... and the ZIP code field is supplied, then the city and state name are discovered by looking up the ZIP. If a hit is found, then the QTH becomes the "City, State" up to the first comma found. Otherwise ...
10b) ... and the address2 field is supplied, then Address2 is used as the QTH
10c) ... and the address1 field is supplied, then Address1 is used as the QTH
10d) ... and the county (subdivision) field is supplied, then the county is used as the QTH
10e) ... and the state is field is supplied, then the state is used as the QTH.
11) If we found a QTH name, however we found it, it's trimmed to end at the first comma.
12) If the state field is empty, and ...
12a) ... and the country is the US, then the ZIP code is looked up to find the "City, State" and the state field becomes the data returned after the comma.
12b) ... and the address2 field is empty, and the address 2 field include a comma, everything after the comma becomes the State.
13) The Biography and Address fields are trimmed to make newlines look nice.
14) If the address field is empty (note that "Address" is different than "address1" and "address2"), the address field is bult by:
14a) Adding the first and last name, ignoring the option about first name only
14b) The City, State, and ZIP are built on one lines
14c) If the country is the US, then the QTH and ZIP are used as the last entry in the address and address1 and address2 are the two lines after the name
14d) else, if the country is not the US, then the Address1 and Address2 lines are used directly.
14e) If the QTH appears twice (which it would, because 14C is broken and adds it twice) it is trimmed from the address.
15) If the lat or long are empty but the grid square is provided, the lat and long are calculated from the grid square
16) The QTH field is (again!) stripped of data before the first comma it contains.
17) Age is computed from the "Born" field, if supplied. T his might overwrite a previous value in the "Age" field.
18) At this point, we have a list of fields with values. Values for these fields are copied to the result list and given back to Step #3, above: Country, county, DXCC, IOTA, CQ, ITU, Lat, Long, QSLMgr, State, Locator, Continent, Age, QTH, Name, Source.
19) The results are labelled as coming from QRZCOM regardless of where they came from.
20) If a call sign is found and the call sign isn't empty
20a) test is done to see if the call sign is "away from home".
20b) for each of the fields in a list of displayed fields in the ALE and its tabs:
20b1) if the call sign is "away from home" and the field is on the "Location" tab, or the field is one of QTH, Locator, or State, do nothing with this field.
20b2) Get the value from the currently displayed matching UI control.
20b3) If the current displayed field value is not empty, and we have a value to add, and the field is the CQZone or ITU Zone, prompt the user before overwriting it.
20b4) else, or if the user agreed, overwrite the field.
OK, so now we have two lists of steps that are performed to populate the call sign lookup results into the ALE; and eventually into a logbook QSO database record.
Believe it or not, we're not done!
In the second set of steps, step #20 and its sub-steps pokes values into different fields in the ALE window. Maybe it selects the country from the drop-down list of countries; maybe it slams a name string into the "Name" field.
Either way, some of the controls are programmed to take action when their content changes. Their content might change when the user types or clicks on them, but their change-response actions are also invoked when the content of the controls is programatically changed -- like in response to the work in step #20.
Often, that programmatic response is to sync the control with a similar, redundant control in some other part of the UI. For this issue, a very good example is the Location tab, which has lat and long and locator fields. Step #20 pokes the locator field into the "Location" tab's locator field. That control has code which syncs it with the "Locator" field on the main ALE dialog so that when the user changes one, the other changes. This same code assures that Step #20 copies the location from a lookup into both places.
But that same code does not copy the lat/long values. This seems to be deliberate, again highlighting the difference between the "Estimated" and "actual" lat/long values.
The locator fields are just examples; there are a handful of other fields with reactionary code in them.
All of the field setting and computation here is racing with the field setting and results building of the first set of steps in the first section of this write-up. Usually, it seems the right logic wins. But sometimes not, particularly when we think of the secondary responses to changed fields. Most of the bugs I've fixed in the ALE, from crashes to fields that aren't populating correctly, are a result of this code's architecture and implementation. Normally, software isn't wirtten to race itself becuase predictable results are desired. When a race exists, the results usually aren't predictable.
Practically any change to this code will cause unexpected changes in behavior. We're playing whack-a-mole with the changes we want and the results we get. Looks like a good fix, so we try it. But a few weeks later, we notice some other problem has popped up. Maybe it was always there, maybe the race changed, maybe the third time (but not the second or the first times) the country was converted to a DXCC number now has a bug.
And that's how it works. I've not checked my translation for errors; I wrote this in one pass, and it took several hours. But now the status quo is documented. Now that we've got a good idea of how it does work, how do we want it to work?
In chat, it was noted that we don't know how far back this problem goes.
We can see that, in this checkin:
The code which overwrites good data with empty data (step #5d2c1) was introduced ... along with a bunch of other code to manipulate the different fields. I have no documentation of the developer's intention when this change was implemented; just the comment "prompt for overwrites" as is on the change list.
The race condition (the second set of instructions that run partially, concurrently with the first) existed as far back as I looked.
Why did you write all this up?
I wrote this up because I've walked through the code several times during my involvement in this project. As reported above, it's quite complicated. Each time I walk through it, I forget some detail I learned the last time I read the code. On the other hand, each time I go through the code, I learn something new. Writing it out helps me remember what I learned.
It also provides documentation. To my knowledge, there's no documentation of how the ALE or the call sign lookup feature are meant to work. With this write-up, we have some documentation of how they do, in practice, work. I think a great way to make progress would be to critically think about how they do work, identify problems, and suggest remedies.
Aren't you over-stating the problems with this code?
No. In fact, I'm under-stating the problem. The involved code is about 1500 lines of C++ source. That doesn't include any of the supporting code (the lookup operations themselves), and doesn't include any of the data structures involved. Distilling a few thousand lines of code into a list coulpe lists of 20-some steps each is a simplification and leaves out numerous details.
What's the point?
This code is a lot like a Rube Goldberg machine. Except that, by all evidence I have, Goldberg's machines actually worked most of the time. When they didn't, the cause was obvious: the bird didn't eat the cracker, for example.
The ALE window as a whole does not have reliable behavior; it doesn't have a specification for its use or function. It also isn't readily inspectable to determine the cause of its behaviour -- it can't be, since it would be so hard to understand. The call sign lookup operation doesn't have a specification, either.
Both the ALE itself and the call sign lookup feature are specifically at the heart of the issue reported here, and I think it's necessary to understand the status quo before changes are made.
Why do you write so much? Can't you summarize this?
No, I can't. Really, since I've written a couple pages to describe a few thousand lines of code, this is a summary.
Mostly, the detail in these notes are for me so I can refer back to them rather than re-reading the code again and again. Anyone wanting a description of this feature would also want a description that had enough fidelity to describe the behavior in detail, I think.
But I insist that the team get to the point where it's comfortably able to compare the existing behavior with the desired behavior and articulate the difference clearly and completely. The best I can do is express what I think the next steps might be.
I'm happy to clarify anything at all; I'm happy to answer specific questions. But I believe correctly fixing complex issues demands informed decision-making.
What are the next steps?
Given what I've read, there are a few possible outcomes.
First, I note that the code tracks two locations. One is the "estimated location" and is shown on the main ALE window (not in a tab). The other is the "location", displayed on the location tab. These are different. I have no idea if they should or shouldn't be different; maybe one should set the other, or they set both, or something else. But it's obvious to me that, at least for a time, they were meant to be different -- one was an estimate, the other was actual.
Next, it seems like the resetting of previously found fields is a bug. Step #5d2c1 in the first bunch of instructions, seems like it can be nothing but wrong. If we fix it, though, we radically change the way the ALE behaves during call sign lookup operations. Would the resulting, different behaviour be acceptable to us and our users?
Even if we fix Step #5, we still have this problem of a race. There are two bits of code doing overlapping work and there's no code that deliberately determines which result should be used, kept, or shown to the user. It's a fight over time. Remnants of one computation remain and can be found; that's why the "Save the Rotator coordinates in the Logbook" button seems to work backwards. It's doing what it should -- it's just using an outdated copy of the data it has instead of what's actually displayed.
From a more technical perspective, the complexity in this code is unforgivable. One layer of bad design is spackled over another layer of bad design. Emotionally, it's compelling to tear it down and start over. But, starting from scratch, what do we write? With no specification and a bunch of customers who expect certain familiar behaviour, how do we rebuild something that will be percieved as acceptable and correct?
Thinking solely of the lat/long/heading issue reported here: a simple and direct fix might be to not (ever!) try to guess the location of the remote station. Not guesses: using a reported specific Lat/Lon or using a Lat/Long from a locator grid square. Guesses: Using the ZIP to find a state to use the state, or just using the center of a country. Not guessing means that, if a specific locator or lat/long were not provided, the fields go empty.
If we think guesses are always (ALWAYS!) wrong, we can remove just that code and it will never execute again. But there will certainly be cases where users don't get any lat/long data when they previously did -- even if it had been bad data.
Maybe the simple and direct fix is fine for the short term, but it isn't particularly helpful toward fixing the ALE and the call sign lookup process holistically. The race still remains, so other fields will become problematic. The double- and triple-pass computation of the same field will still make the code hard to maintain. And the notion of overwriting data that we've retrieved still exists.
These attributes seem like deficiencies just as severe (if not more so) than the location issue described here.
We decide to fix this promptly, but not compeltely. This checkin does a conservative fix:
As it stands, it's an acceptable interim fix.
See the attached image. Fortunately, there's a typo in Tim's QRZ entry for his Longitude that enables us to see the difference. But the heading and distance are "close enough" for now until we get to 3001.
Lat-Long.png (189,464 bytes)
Lat-Long.png (189,464 bytes)
||Mike, I don't believe my lat/lon is a very good example. The reason is this, and it might be shared by others as well. My QRZ Lat/Lon is according to GPS location. If you based the my location on my zip code, that is completely wrong. My physical location is the next town over from my zip code. My MAILING address is Spring Run, PA, while I actually live in Dry Run, PA. This is common out in very rural areas like I live in.|
Ok, I wasn't exactly correct here.
Tim's coordinates are correct (per GPS coordinates). What's in the ALE was not based on zip code.
But the Latitude and Longitude in the ALE still show the values that come from the Country List - rather than the coordinates for Tim as found in the lookup source. But the heading and distance in the ALE are close enough to solve the pointing of the rotor problem.
||What, specifically, needs to be fixed? Isn't the country list lookup meant to be fixed with 3001 (where we decide that we'll ignore the country list), or is something required above and beyond that?|
||Yes. We'll fix this as part of 3001. I'll leave this assigned to me and I'll test it along the way (3001).|
|2019-02-22 18:33||WA9PIE||New Issue|
|2019-02-22 18:33||WA9PIE||Assigned To||=> K7ZCZ|
|2019-02-22 18:33||WA9PIE||Status||new => assigned|
|2019-02-23 21:51||WA9PIE||Note Added: 0007493|
|2019-02-23 21:55||WA9PIE||Note Edited: 0007493||View Revisions|
|2019-02-24 07:08||K7ZCZ||Note Added: 0007494|
|2019-02-24 07:16||K7ZCZ||Note Added: 0007495|
|2019-02-24 07:27||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-02-24 07:27||K7ZCZ||Status||assigned => feedback|
|2019-02-24 07:27||K7ZCZ||Note Added: 0007496|
|2019-02-26 17:32||K7ZCZ||Note Added: 0007508|
|2019-02-28 01:33||WA9PIE||Relationship added||related to 0003001|
|2019-03-02 00:26||WA9PIE||Status||feedback => acknowledged|
|2019-03-13 14:11||WA9PIE||File Added: Lat-Long.png|
|2019-03-13 14:11||WA9PIE||Note Added: 0007682|
|2019-03-15 07:33||KB3NPH||Note Added: 0007688|
|2019-03-24 23:57||WA9PIE||Note Added: 0007734|
|2019-06-16 17:13||WA9PIE||Assigned To||WA9PIE => K7ZCZ|
|2019-06-16 23:23||K7ZCZ||Note Added: 0008113|
|2019-06-16 23:23||K7ZCZ||Assigned To||K7ZCZ => WA9PIE|
|2019-06-17 18:15||WA9PIE||Note Added: 0008123|