You may find that the database being shipped with OpenOffice (ver.2 and higher) delights you as much as it has me. This page tries to help you use it.
Forget anything you may have heard about Adabas, which came with Star Office, the commercial version of Open Office 1. The current Open Office's database, "Base", aka "ooBase", is unrelated. And remember that Open Office, including ooBase, is free! But don't let that fool you. And it's not new. Big organizations, government and civilian, are adopting it as their standard office suite... and saving million$, but still Getting The Job Done.
There's more about ooBase in the main index to this material.
This page is "browser friendly". Make your browser window as wide as you want it. The text will flow nicely for you. It is easier to read in a narrow window. With most browsers, pressing plus, minus or zero while the control key (ctrl) is held down will change the texts size. (Enlarge, reduce, restore to default, respectively.) (This is more fully explained, and there's another tip, at my Power Browsing page.)
Page contents © TK Boyd, Sheepdog Software ®, 1/09 - 1/11.
Please note the title: "Further thoughts on Primary Keys". I have also written an introductory page about primary keys, covering the basics.
Database tables should (usually) have primary keys.
As discussed in the introductory article, the values in the primary key field of a table must all be unique. That is to say that the value in the primary key field of every record must be different from the value in every other record in the table.
Often, to simplify meeting this requirement, the designer of the database decides to merely use an otherwise meaningless number. That works, but it isn't always necessary to take such a dull approach
Some database projects will give you an easy time. If you are doing a database concerning some automobiles, as long as you can be sure that none will swap license plates (registration numbers)... a not-impossible event, but most of us could discount it in our database planning... then that could be a good primary key.
If the database is to manage, say, a collection of CDs with computer software on them, the owner may already be giving each a unique serial number, in order to simplify other record keeping.
Books have ISBNs. If the database was for recording your opinion of different books, the ISBN would be a good primary key. If it was for tracking books in a library, it would not, by itself, be satisfactory... the library might buy two copies of a popular book. We will come back to this example in a moment.
Some databases are about events tied to specific times. Perhaps the database is about openings and closings of a bank vault, or some premises protected by a burglar alarm. In either case, the date and time of the event would be a wonderful primary key... meaningful, something that would probably have to be recorded anyway. See how a primary key doesn't have to be a simple number?
There is nothing wrong with using a single field for your primary key. I lean towards doing so, because it Keeps things Simple. But there are times when it makes sense to treat two (or more) fields as combined into a single field, and make that "virtual field" (not a "proper" term) the primary key. When you do this, you are using what is properly called a "compound key".
Here's an example.
Boarding schools are sometimes required to monitor the weight of their pupils. Even in our world of men with clipboards, I'll take the chance that the authorities will never require a school to take more than one weight record per day for any child.
Thus, that information would be stored in a database with at least the following fields: Child's name, date of weighing, weight on that day.
See the primary key to use? Yes! A compound key consisting of the child's name and the date.
Notice I said "take the chance" a moment ago? In a perfect world, database designers can find primary keys which the designer knows will never result in two records with the same value in the primary key field. Part of the skills you must develop, if you are going to be successful in computing, is the skill of imagining the things that lesser mortals fail to imagine.
So... that's what a compound primary key is. Now we're going to make one.
Usually, you establish the primary key, compound or otherwise, when you first design the table, before you have added any data. You often can re-define the primary key, including changing it from a simple key to a compound key, after some data has been entered into the table... with the following constraint: At the time you designate a field or fields as a primary key, the data (in combination, if you are making a compound key) must be unique in each of the fields. If it isn't, a quite helpful error message will arise. If this happens, you merely edit the data in the table, and then try again to make the primary key as you want it.
To designate a compound primary key...
Close the table if it is open.
In the ooBase main project management window, select "tables" in the column at the left, then right-click on the table you wish to re-design. Click on Edit in the pop up menu.
This will open the table in design view. The fields will be displayed in rows at the top of the page, one field per row.
Select the fields you want to combine into your compound key. A field is selected if the whole row is highlighted. Just click on the left hand column of the first row you want included in the key; the row should immediately become highlighted. Hold down the control key... until further notice. Click on the left hand column of any rows you want to add to the key. (Click on a row again to de-select it, if you click on a row by mistake).
When- not before- you have selected the rows of all the fields you want in the compound key, release the control key. Right click again in the left hand column of one of the selected rows. Click on "Primary key" in the pop up menu. You should see a little brass key at the left hand end of each of the rows. Job done! (Don't worry about the green triangle which is also present by one of the keys. It is just indicating the currently selected row.
Be sure to re-save the table's design. ooBase will prompt you to do that if you close the design window, but you might go off and try your new table, and unless you've done the save, I can't promise what the result will be.
Onward....
Go back to the "bank vault" example given earlier. What if the door on the vault is wired directly to the computer, and there's no way to "fudge" the data. It could easily happen that someone would one day nearly close the door, close it far enough to trigger the creation of a record, and then remember one more thing to go into the vault, re-open the door.... in the same minute as it was "closed". Of course, if the time of closing was recorded to the second, then it becomes next to impossible for two records to have the same time... but do you want to fill up all that disc space recording what second the door closed?
(To "get real" for a moment: The records are probably going to be kept by hand, and when they are entered into the computer, if someone was daft enough to record "closed 10:12, (re)opened 10:12, closed again 10:18", then...
Let's thing further about recording timestamps. So far in this essay, I've been assuming that a combined "date-time" will be entered in a single field. For the bank vault, this probably makes sense.
What about a database of incidents responded to by ambulances?
In this case, it would make sense to record the date in one field, and the time in another, and I'll tell you why.
Managers of any emergency service are always interested in the patterns in the demands upon the service. The database of incidents responded to, if properly set up, can tell the users when the busy times of day are. A medium sized city probably needs more fire engines on duty between 4pm and 9pm (rush hour and cooking times) than it does at 4am. Thus, in that database, it makes sense to put the time in one field, the date in another. As long as there aren't two incidents starting in the same minute, the date field value and the time field value can be used together in a compound key for the primary key. If the control center is much larger, taking calls for a large population, then a third field containing the id of the person taking the call could be added to the compound key, thus keeping the system from being artificial. (The emergency operator's "ID" would be a primary key in the service's table of employees... it would be vital that no two employees have the same id. Otherwise, in theory, a problematic situation could arise: The situation where two employees with the same "id" (not a very good ID(ea) that the circumstance should exist!) were both taking an incident report at the same time.)
That covers what I wanted to add to what was said in the tutorial introducing primary keys. The rest of this essay, apart from the usual helpful (I hope!) bits at the bottom of the page, is just more examples, exploring pitfalls and possibilities.
It is frequently the case that each record in a table refers to a person.
Obviously, even using first name plus surname ("last" name) won't do... you will soon have two John Smiths, or two other instances of a common name.
Even if there weren't two John Smiths, however, the full name might not be a good choice for a primary key.
If the database has other tables, tables in which people's names are going to feature, then it will be helpful if there's a quicker way of making an entry relating to someone, quicker (and more reliable) than typing "J-o-h-n S-m-i-t-h"... hence abbreviation schemes built upon initials.
You may need an abbreviation that is easy for people to remember... the data subject may be contributing data, for instance. (When the bank vault door is closed, perhaps the person who closes it would be required to put their name into the database, for example.)
Not only are there many John Smiths in the world, but a good number of Jim Smiths, and even some Jane Stevensons. "JS" is just not going to hack it!
But you want something everyone can remember. So just take the first two letters of each part of the name! Our three "JS"s become JoSm, JiSm and JaSt. Hurrah! But of course, there will soon be a second John Smith. Even Joe Smith would wreck things.
So you add something extra to the end.
Beware just using "1", "2", "3".... "JoSm2" may feel insulted.
In any case, "a", "b", "c", etc, make better differentiators... it takes longer to run out, and reach the need of a double letter differentiator. (It is easy to get too concerned with keeping data short, but when you can keep it short, that's good!
One thing that can be used to differentiate people is the day of the month they were born on. Using the full date has problems of unwarranted display of personal information, but I don't think many people will have a problem with the day of the month they were born is used as part of their unique ID. And if they do, or if you have two JoSm's born on the same day of the month, lie! Allow one or the other, or both of them to choose two digits OTHER THAN the day they were born as the last two digits of their ID.
Other quick sources of memorable differentiators: Home telephone numbers, mother's maiden name, father's first name, etc. You just have to exercise some imagination. But in addition to thinking of something that people will be able to remember, you have to anticipate ways in which two different people could end up with the same ID through your system. And you have to find a way to "break" those "ties".
Oh, no! I'm not done with using their names to make up IDs for people yet!
But I'm done with the root of the ID, the JoSm23, or whatever.
Now I want to add a frill. It won't always be possible to force this frill on your users, but when it can be, I'd recommend it.
Take the characters in the root ID, e.g. JoSm23, and apply a formula to them which will generate one more character. Add that character to the ID. So, if by your formula, you get, say, "w", Joe's ID becomes JoSm23w. That final character is called a checksum.
Why? What formula?
Why: To catch mistakes. If your formula is well chosen, then JoSm23 can only give rise to "w". If someone makes a mistake when recording Joe's id somewhere, or if there's a typo when someone is trying to enter "JoSm23w" into the computer, then, if the computer re-calculates what the final character should be (for the six first characters given), and compares it to the 7th character actually given, then mistakes can be caught.
(Making the computer do this is possible in ooBase. Not easy... but possible. In any case, it is quite simple to check the ID entered against the table holding all of the known IDs. If "w" is the correct checksum to go with JoSm23w, then there won't be another employee with, say, JoSm23x as their employee ID. Good checksum formulas can ensure that no single character mistake, or even a transposition, e.g. JoSm32w, can lead to a string that passes the checksum (or reference to list of possible IDs) test.
What formula: Let's start with a bad... but easy to understand... formula, just to give you the idea.
Every character you can type, and the space, which really is "a character", although it is odd to us to think of it as such, has an "ASCII code". The code for "A" is 65, the lower case version, "a", has a different code (97 as it happens.)
So, you could take the ASCII codes for J, o, S, m, 2 and 3, and add them all up. That would give you quite a big number. Let's say the total is 347. (It isn't, by the way.) Forget all of the number except the last two digits... 47. Divide that by 4, throwing away any fractional part. That leaves 23 by a rule that could create numbers from zero up to 24 (99 divided by 4 being 24 and 3/4s.)
With that formula, the final step is easy: The user's checksum character is the letter of the alphabet you get by counting forward as many places as the number you got a moment ago requires. E.g. if the number were zero, the checksum character would be "A", if the number were 1, the checksum character would be "B", etc.
As I said... that's not a very good checksum rule, for several reasons. JoSm23 and JoSm32 would give the same checksum character, and they should not. A good checksum rule will detect transpositions. Also, JoSm23 and JoSm24 would have the same checksum... if I have been lucky with where the boundary lies. (I didn't do the calculation. JoSm23 will have the same checksum as three neighbors in the sequence JoSm20... JoSm29, for sure.)
If you want to know more about checksums, Wikipedia has a good article, as you would expect, I hope!
By the way: Yes, I do know about the use of modular arithmetic, which is very useful in such matters. (Avoids the need for the clumsy "divide by 4 and throw away any fractional part" mechanism used above.) But not all readers will know about it.
Oh... and how do you make a good formula for calculating a checksum?? That's a matter for another day. If you're ready to try to do this, you probably already know of places you can go for help.
If only people were computers, even Windows computers. Computers are much more reliable than people. Suppose we've generated our short form IDs for people in order that openings and closings of that bank vault can be recorded. Depending on how tamper-proof you need the system, it could be that you merely keep a clipboard near the vault, with the columns in your database table on the page:
Employee ID code? Yes! You don't want the poor soul who has to type the information into the computer to have to constantly look up those codes do you? Or write the names out in full?
However, even though it will not usually be needed, I'd suggest... this is the practical matter... that you add a column to the page for Employee Name. It will take the employee but a moment to put it down, and when the data entry person has a question about what is written in the ID code column, the data entry person will have the final column to turn to for help.
If the checksum approach has been used, then the computer can be "taught" to "look at" the employee ID codes as they are typed in, and "complain" if the ID code is impossible. ISBNs, the ID numbers assigned to books, have a character within the ISBN which is a checksum digit, for just such use.
Rather abrupt again, I fear. I'll try one day to make better ends for many of my pages on the web!
I dislike 'fancy' websites with more concern for a flashy appearance than for good content. For a pretty picture, I can go to an art gallery. Of course, an attractive site WITH content deserves praise... as long as that pretty face doesn't cost download time. In any case....
I am trying to present this material in a format which makes it easy for you to USE it. There are two aspects to that: The way it is split up, and the way it is posted. See the main index to this material for more information about the way it is split up, and the way it is posted.
PLEASE >>> Click here to visit editor's Sheepdog Software (tm) freeware, shareware pages <<< PLEASE
If you liked this ooBase tutorial, see the main index for information other help from the same author.
Editor's email address. Suggestions welcomed! - - - Want a site hosted, or email? I like 1&1's services.Page tested for compliance with INDUSTRY (not MS-only) standards, using the free, publicly accessible validator at validator.w3.org
. . . . . P a g e . . . E n d s . . . . .