The Hidden Menace:
If you have an information system that utilizes a database engine, be it MS Access, SQL Server, Oracle, or some other form of data management, you still may not have 100% accurate data. And we're not talking about the accuracy of what gets entered. We're talking about things that can go wrong if the underlying data structure is incorrect.
Only persons with experience in database design principles and techniques can structure your data properly. And this underlying structure is often the basic building block for the rest of the application. So, depending on how much data design knowledge was used to build the application that you currently use, the underlying data may, or may not be accurate. We know this may sound improbable to you, but we're talking about our real world experiences here, so bare with us for a moment.
Oh, your reports may look good, and they may be more than 90% accurate, but there's a chance that what looks correct on the screen, and on paper, is not the whole story. Here's Why...
What Orphans - - ?
We have analyzed and redesigned systems (more than one) that were actually being used, in which we discovered hundreds of orphaned child records! What are those you ask...?
Let's assume that you are tracking clients for a company, and each client has data on hand about the products that they have purchased. A client may have purchased several of your products. Or, it may be that they have not purchased their first product yet.
Now, let's further assume that the data structure is incorrect, and that as users enter new purchases, the product numbers for each purchase can, on rare occasions, become "disassociated" from the client record. (We've seen this happen!)
(Note: The user interface can be partly responsible for this disconnect. But, the creation of that poorly designed screen is driven by the poorly designed data structure. Another reason why data design principles and concepts are so important.)
In this scenario, the client is the "parent" record, and each product purchased is a "child" record, (as in a parent can have one-or-more children, or in this case, a client can buy one-or-more products). If the child record becomes disconnected from the parent record, it's called an "orphaned" record.
So, if there are hundreds of orphaned child records, that report that you print showing all client purchases, is not really very accurate is it? It may look pretty good, as most clients may have a product or two listed, but you have no way of knowing that there might be hundreds of purchased products that do not show up. This data is lost forever, as there is no way to re-connect an orphaned product number to the proper client.
Believe it or not, this is a real world situation, that we have encountered more than once. It's just one potential result of a poor data design.
Here's Another Real-World Issue - - -
Now, let's assume that you are tracking patients at a doctors office. Let's further assume that each patient has a unique combination of First Name, Last Name, and Date of Birth. But, instead of using or encoding these three fields to identify "unique" records, the previous (and less experienced) developer used the MS Access "auto-number" field. Now, each record has a unique identifier of simply, 1, 2, 3, etc.
This is a really bad design decision. Because now, the data engine cannot stop the user from entering the same First Name, Last Name, and Birth Date more than once. John Doe 1/1/1965 can be record #37, and record #1152 can also be John Doe 1/1/1965. There are ways to prevent the duplicates, but without a proper "primary key" for the patient records, it's more difficult and costly (in terms of application performance) to do so.
We have actually seen such a data system, in use, with more than 400 duplicate patient records. One record would show that the patient received a test, and the other record (for the same patient) had no indication of that test being performed. So, which record was correct? It was anybody's guess!
Data Integrity - - -
Data Integrity means so much more than just making sure that users enter accurate data. It also means that duplicate records (i.e. two different master records for the same client or patient) cannot exist. And, it means that no child record ever gets left behind.
Proper data structure also allows the application to respond to user input more quickly and it assures that the reports that you want stand a much better chance of being completely accurate.
Data Validation - Required Fields - Proper Primary Keys - Correct Relationships - And More.
There's a saying in the database world: The fields in a record (any record) should be about "The Key, The Whole Key, and Nothing But the Key". (We make sure this is true...!)
We've been doing "Data" for a long time, and we know how to do it right!