Open Office Database Tutorials
Field Types... aka Data Types
You may find that the database being shipped with OpenOffice (ver.2 and higher) delights you as much as it has me. This page tries to help you use it.
Forget anything you may have heard about Adabas, which came with Star Office, the commercial version of Open Office 1. The current Open Office's database, "Base", aka "ooBase", is unrelated. And remember that Open Office, including ooBase, is free! But don't let that fool you. And it's not new. Big organizations, government and civilian, are adopting it as their standard office suite... and saving million$, but still Getting The Job Done.
This page is "browser friendly". Make your browser window as wide as you want it. The text will flow nicely for you. It is easier to read in a narrow window. With most browsers, pressing plus, minus or zero while the control key (ctrl) is held down will change the texts size. (Enlarge, reduce, restore to default, respectively.) (This is more fully explained, and there's another tip, at my Power Browsing page.)
Most of the pages in this website are keystroke- by- keystroke tutorials showing you how to accomplish something with OpenOffice's database, "Base". This page is a more theoretical discourse, telling you about field types. (They are also known as data types.)
I know that inside my car, there are cylinders, pistons, etc, and when the car goes it is because the engine is burning fuel... but I don't know very much about all of that, and I think about it less.
Most people know that inside a computer "everything is numbers". And they are quite happy not knowing much more about it than that.... until they try to understand what they are doing. Until they try to know a bit more than just "What button do I press?". When you understand what you are doing, you don't need to rely on others to tell you what to do.
Consider a simple database to keep track of the calls being dealt with by some company's service department. If you were the manager, you would probably want to record what time each call arrives, how long it lasts, the date, which customer service representative deals with the call, and you'd want a brief note about what the call concerns.
As we said earlier, everything inside the computer is a number.
If I write "ABC" with my wordprocessor, then somewhere inside the computer, the numbers 65, 66 and 67 will be held. Those numbers, in a commonly used computer code, stand for the shapes "A", "B" and "C" which you see on the screen now. Note that your web browser and my word processor are using the same code which pairs up shapes and numbers. Inside the computer, or on the disc: 65. On the screen: "A".
By the way... the word "code" sometimes gets people thinking about secret codes or encryption. In this discourse, we are not using "code" in that sense. Think of Morse code... "Dot-Dot-Dot" stands for "S". It is a code... but not a secret code.
The code which assigns "A" to 65 is called the ASCII code. It provides for all letters, both as capitals and as lower case characters. (The code for "a" is not the same as the code for "A"). It provides for the digits (0-9). It provides for many punctuation marks. It is very widely used.
For our database, we could simply use ASCII to store all of the data we want to store... but it wouldn't be the best way to go.
For a whole bunch of reasons, computer scientists have invented many different codes, and ooBase lets you use quite a selection of them.
Before we start on the range of codes available to you, a little word about why you would want more than ASCII.
When the Romans wanted to write twenty seven, they wrote XXVII. Thirty four was XXXIV. Fine. This is another code. You can show 12 as "12", or "twelve" or "* * * * * * * * * * * *" (12 stars) or "a dozen"... those are different codes.
There's nothing wrong with the Roman's code for saying "27", or "34"... until you try to subtract 27 from 34. Then the code the Arabs gave us for expressing numbers works a lot better!
The different codes computer scientists have developed for depicting things inside the computer have strengths and weaknesses. Read this web page to learn more, so that you can choose well.
"Choose well? When do I choose codes?", you ask.
When you set up a database table. One of the things you have to specify for each field is its type. When you specify a type for a field, you "set" the database to use a particular code for the data stored within that field. I won't use the term "code" again, but that is what we're talking about when we speak of "types".
The word "type", in this context, isn't exactly used as a synonym for "kind", as in "sort". (Good grief, Charlie Brown. This gets worse the further I go! I don't mean "kind" as in "nice", nor "sort" as in "arrange in sequence".... but I trust you already understood that. How we accomplish anything in the midst of English's monumental ambiguity, I'll never know!)
We say "The right type for that field is "integer"". "Type" is a noun in this context. While "kind" is also a noun, it is rarely(?) used without an "...of [something]" clause after it, which wouldn't be the case with "type", as used in our context.
Open Office's data types________
Open Office is able to work with data stored in many different ways. How it is stored is the data's "type". In the Open Office documentation, you often see the term "field type", which is just a concise way of saying "The data type used for storing the data held in this field."
Pros and cons...
When you know that a given field is only ever going to hold a whole number.... for example, in our service department database, let's say that we're noting the length of each call in minutes, to the nearest whole minute.... it will pay us to use a data type which is
using minimal space inside the computer
being added, subtracted, etc
... although the "price" will be that it is...
An example of that data type being inflexible: you won't be able to say that the call took "about 10" minutes.
Before you dismiss too quickly the business about using minimal space inside the computer, remember that not only will that save you disc space (not, perhaps, too important in many cases today), but it will mean that when the database program is working with your data for you, there won't be so much data to deal with, so it will be able to produce results more quickly.
On the other hand, suppose the field is for the name of the caller. We can use a data type that can't supply numbers for arithmetic, so some of the problems associated with numeric fields become irrelevant. On the other hand, to store a name, we have to be able to cope with more characters than just 0, 1, 2, 3..... 8, 9.
Many times your choice of data type involves making a compromise between the space the type requires to store something, and the flexibility you need for the data involved.
Part of the "art" of designing databases is being able to see how our chaotic world can be tamed for the relatively inflexible world of the computer.
Sometimes you employ "tricks". Years ago, dealing with fractions was difficult for small computers. If you wanted to use one for tracking your finances, you could impose the difficulty of fractions on the project... which made things run slowly, in those days... or you could be creative and do your accounts in pennies. It really was no burden for my brain to look at "Car serviced: 12550" and interpret it as $125.50 rather than $12,550.00
Open Office's data types- Broad categories________
The broad categories of data types available to you are:
Types for stings of characters, be they letters, digits, other
Types for dates and/ or times
Types for numbers
That's one way of grouping the types... and probably the most important way to do it.
You could also split all of the types into these two categories:
Variable length types
Fixed length types
In fact, in Base, all fields are of "fixed" length... for a given instance of a field. The "variable length" types allow you to say how much space to set aside for each field in a given database's table. You can't, in Base, have records within which a field has a different length compared to the length of that same field in a different record within the same database.
If you have data, say people's names, (which are of course of different lengths), you set the length for that field so that it is long enough for the longest name you need to enter. The computer will happily deal with the "wasted" space in the record holding "Bill" in a field which could hold "Ermyntrude". (I actually had a great aunt of that name!)
Again, we come back to the "art" of working with databases. If you set the maximum length for the name field to 8, then dear aunt Ermyntrude may have to tolerate "Eymyntru". For some purposes, this would be acceptable. For others, it would not be. Field lengths can (usually) be re-specified later... but doing so can be a pain, and is an activity you should seek to avoid.
Fields from the second group of data types always take the same number of "spaces" in the computer's memory and on its disc. These are the "fixed length" data types. They take the same amount of space for every field of a particular data type in every ooBase database ever made.
There are databases... not ooBase... which allow you to have fields which differ in their length from record to record. I.e. the "name" field in Bill's record uses less space than the name field of Ermyntrude's record. (You can, of course, have, say, a "Street Address" field which is a different length from the "Name" field.
A little aside, demonstrating the sort of "trick" that helps you overcome the limitations of database: If you were doing a database, say, of regional pronunciations of different words, and you had small .wav files of individual words being spoken, you might find that the lengths of those files varied enormously. Within ooBase, what you could do is incorporate those .wav files by reference. You wouldn't put the .wav file into your database, but you would put the path to it in a field, e.g. C:\MyDocs\RegionalDialects\Yorkshire\HeyUp.wav. The value of adding that complication would be that the database (and associated files) would take much less disc space. If you had 100 words recorded, and the longest .wav file was 2kb, even though most were only about 1kb, you would save something like 100kb of disc space.
But I digress!
Open Office's data types- The details________
Now we will go through the data types available in ooBase, organizing them by the broad categories I set out above. I won't pretend to know everything about each data type. If you can clarify any of the things I've left vague, I'd be pleased to hear from you!
When you are setting the specifications for your data table, there is a listbox, from which you choose a type. In each case, there is a name for the type, and then, in square brackets, something like [VARCHAR]. I suspect that the less- than- human- friendly name in the square brackets is a name for the type used in the broader SQL community. (ooBase is based on HSQL, a variant of SQL.)
You may notice that sometimes there seem to be multiple names for (almost?) the same thing. I believe this has happened because the project team is trying to make ooBase inter-operate with a wide selection of alternative database engines.
Data types for strings of characters, be they letters, digits, punctuation marks or other: the text and char types
I mentioned "digits" to emphasize that you can put digits into fields of the text and char types. ooBase won't "see" them as numbers. It will be as confused by 4 + 3, if those digits are in a text-type field as it would be by $ + %... but you can store numbers in text-type fields. (There are frequently good reasons for doing so.)
Text-type fields give you great flexibility. If in doubt: Use a text-type field. You may thereby limit your access to special features (like doing arithmetic with the data), but you won't find yourself prevented from storing something you want to in the field concerned, unless it is too long.
There is a text type field of fixed size. It's size is 2 gigabytes per record, so I'm not sure who would find it useful!
It is the "Memo" type, marked [longvarchar].
There are three text types which allow you to specify the number of characters to be used per field:
I tend to use the first, because that is what the wizard generally gives you. I would guess that the second lets you find "FISHCAKES" even if you search for "fishcakes", but there are better ways to make searches ignore case. I'm not sure what is "fixed" about the last data type. You are certainly allowed to change the number of characters devoted to the field. It may have something to do with how the string of characters is stored. In many computer applications there are two basic schemes. In one, if you want to store "ABC", you store a 3, then the codes for A and B and C. In the other, you store the A, the B and the C, and then a zero. If that's not enough confusion, consider the following. Within a database, we could have a situation where the data is required to be the same length in each case, e.g. you could not (in these cases) accept both Bill and Ermyntrude. A US Social Security Number, for instance, always has 9 digits. In those cases, assuming a nine character field, you would store just the 9 digits. You would not need to use either of the solutions which allow you to use less than all of the space that has been put aside for what is to be stored in the field.
The good news? Usually you don't have to think about any of that!
Data types for dates and/ or times
Dates and times can be a Big Pain In The Butt. Happily, in database work, they are less problematic that they sometimes are elsewhere... if you use date or time data types for such fields! (The fun starts when you embark on questions like "How long has it been since that customer last called us? How many days was it between the two calls?" (Quick: How many days is it between February 15th and May 15th? See... you can't do it... and computers have trouble to. Was it a leap year?)
Base gives you three date/ time data types. The memory / disc space required is not indicated.
If you cannot always be sure of having an accurate or complete date or time, and you need to preserve the fact that the information is approximate, there are two solutions (at least) open to you.
For each date or time, you can add an extra field called "Approximate", and put Y or N in that field. Or you could use a numeric field and put 5 in it if the date, say, might be wrong by up to 5 days.
I half remember that if you were entering data into a form, ooBase would automatically insert the current date, or time, or a combined date and time datestamp into fields of the appropriate type, as you entered the data. This doesn't seem to happen any more. Automatically putting a date/timestamp on a record when it is entered isn't trivial, but if you need it now, there's a discussion in a discussion at oooForum. You have to scroll down a bit.
To stray a little from our discussion of data types to show you some more artfulness: Imagine you are dealing with things for which you will sometimes have a year, but no month, or perhaps a year and month, but no day. What you might, in some cases, want to do would be to set up separate fields for year, month, and day. When the month or day information was unreliable, you would enter 0. (That's an example of a "rogue value"... you don't mean that the month was "zero"... the value "zero" carries a special message, by agreement between all of the people using the database.) Having the year, month and day in separate fields is fraught with shortcomings, however. If you enter a date of 30 February 2006 into a field of type "date", the computer will moan; it will protect you from yourself. A data entry form can be "taught" to check data, even if it has to look at the values in several fields to apply a test... but it is much more work for you.
Data types for numbers- whole numbers
Whole books are written on the subject of data types for numbers.
In many cases, you won't much care which type you use, but you should begin to have some respect for the options open to you.
The options arise due to the efforts by computer scientists to optimize our computing experience.
If you can be certain that your number will always be a whole number, i.e. that there will be no fractional part, then you want one of the integer types, or "real", or "float" or "double". The latter three always take 17 "spaces" of memory or disc space. Because of "clever" things they do, they can hold Really Big numbers... number more than 17 digits long. Unless you are doing astrophysics, I doubt you'll need them.
There are four integer types. They differ in the space they take up, which is given after the "S" below:
I think those numbers may be misleading, or just wrong, but they're the best I have.
Why the different types, if all of these are for whole numbers?
I'm not sure if there are examples in ooBase's data types, but there ARE data types which can only hold positive whole numbers. If you are designing a database to keep track of the children in different houses of a school, such a data type would be fine. For giving up the same amount of disk space to each field, one of these types can store bigger numbers than the ones that can store negative numbers as well.
As hinted in what I just said, all types are limited to a certain biggest possible, and a certain smallest (or most negative) possible number. The limits are different for different data types.
Let's look at the specifics of what ooBase's types can manage....
I believe that a Tiny Integer can hold negative 128 to 128, and will only tie up 1 "space" of memory or disc space per datum. (I know that statement disagrees with what I reported in the list above, but I found conflicting information in my researches.) (See "Experiments" below, if you want to dig into this deeper. I don't recommend it, for the general reader!)
In general, I would just use the "Integer" type, and be done with it.
You may find that some, but probably not all, of the various integer types do allow you to enter negative numbers in a field. Experiment. Tell me what you discover?
Data types for numbers- numbers with fractions
ooBase offers two data types for numbers with fractions. I'm surprised there aren't more... many more are possible!
The two they admit to are....
In both cases, you can tell ooBase how many decimal places to record the number to. In neither case are we told how much space the field will occupy. The difference between the types? I don't know. At least things are simple here for us!
In general, the different data types used to represent numbers with fractions involve all of the trade- offs we've already discussed, plus another one called "precision". A certain type may be able to record a huge number, but not be able to store the fraction part to very many decimal places. Another may not be able to cope with large numbers, but still be able to store it with many decimal places of the fractional part of the number.
Scientific notation is usually involved, and I could discuss that side of things, but I doubt many readers would thank me. Send me an email if you disagree!
Miscellaneous special data types
We'll start with something simple: The "Yes/No [boolean]" type.
If you assign this type to a field, then you can only store "Yes" or "No" in the field. (This may be called "True" or "False", "1" or "0", or be shown with a tick (or absence of tick) in a box.) (One mystery arose as I was researching this for you: The ooBase form designer does not seem to be able to deal with fields which are of type "Boolean". I suspect you add them to a form the same way as you add image-type data fields to a form. (See next paragraph.)) ooBase will probably check what you've entered, and reject invalid entries, or just turn everything except, say, "Y" into "N". You can mimic this type with a simple text type field, length 1. I can almost certainly... although I couldn't quickly find how just now... write something to force the user to entering a y or an n, not, say, an x. I might want to allow the user "?" as well... which "Yes/No" won't allow (although you could fudge it by adding "nothing in field" as an option. But that would not be good. "Nothing" might signify "don't know", or "forgot to fill this field in". (There's more on matters like this in my tutorial on data validation.)
Base offers four other "special" data types. At first, from reading the documentation, I thought that you couldn't change the space devoted to any of them; they each seemed take up 2 gigabytes of space per record... at least I suspect that I mis- read it... there must be a way to store shorter binary- type data. Since my initial study of the subject, I explored creating databases with images stored in them, in Image [longvarbinary]-type fields. In that case, at least, if you add a 200k image to a database, the size of the database only grows by about 200k. Whew.
The Good Thing about such types is that you can store anything in those fields. The Bad News is that you can't expect ooBase (or any other databases) to have a built-in capability for presenting all of the sorts of "stuff" coded into the numbers you have stored in the field. (ooBase can however present many things stored in widely used formats, at least many sorts of image files.) Your numbers could represent an image, a sound, the key to decrypt an encrypted file, etc, etc.
That's the worst of this essay behind you!
That's it! Well done, you, struggling on to this point.
Just in conclusion, to assure you that you've understood the above, what about the customer helpdesk example I started with? These are the data types you might assign to the various fields. These are not the only possible reasonable answers.
Date the call arrived: Date type field
Time the call arrived: Time
How long it was (in whole minutes): Integer, or Small Integer
Who dealt with the call: Text (see below)
Note about what the call concerned: Text
(I put the date and time that the call arrived in separate fields to facilitate asking "What times of the day are our busiest?")
While the "Who dealt with the call" is held in a simple text- type field, there is great scope for being "clever" here, in the pursuit of accurate and complete data.... but that's a story for another time. (In a nutshell- I'd use employee codes.)
Congratulations (and thanks) for sticking with this rather dry and comprehensive presentation.
Some experiments ________
In order to clear up some of the gray areas implicit in the above, I've done some experiments.
With what ooBase calls a "Tiny Integer": If you enter -120 or 120, you get what you entered. If you enter 128, when you leave or save the record, ooBase changes it to negative 128. This is not as strange as it seems, if you understand the underlying mathematics. I infer that a TinyInt, in technical terms, is a signed 8 bit number. (This is the basis of what I said above about it requiring 1 "space" (byte) of memory or storage space right.... Perhaps the documentation writers were afraid that people might think that the biggest TinyInt you could enter was 9. But you can't enter 128, let alone 999, so they just created a different confusion, if that was their thinking.) If a TinyInt is a signed 8 bit number, it can store negative 128 to positive 127, inclusive. If you enter numbers outside that range, you will be in a situation similar to that of a car with only 5 digits on its odometer when it goes a mile past 99999. With the car, you go back to 0. With a signed number, you go from the biggest number that can be shown to the smallest. The good news is that ooBase doesn't fall over, at least in the examples I've tried. If you enter 260, when you leave or save the record, it will be changed to 4. This initially lead me astray... I thought maybe a TinyInt was an unsigned 8 bit number. They can store 0-255 inclusive. The 4 makes sense, but it is easy to mis-interpret it! It would use one "space" in memory or on your disc.
With what ooBase calls a "Small Integer": You can enter numbers from negative 32768 to 32767. Try to go outside this range, and your number will change as soon as you leave the record, or try to store it. This suggests that an Small Integer is a signed 16-bit number. As such, it would use two "spaces" in memory or on your disc.
With what ooBase calls an "Integer": You can enter numbers from negative 2147483648 to 2147483647. Try to go outside this range, and your number will change as soon as you leave the record, or try to store it. This suggests that an ooBase Integer is a signed 32-bit number. As such, it would use four "spaces" in memory or on your disc.
I dislike 'fancy' websites with more concern for a flashy
appearance than for good content. For a pretty picture, I can go to
an art gallery. Of course, an attractive site WITH content deserves
praise... as long as that pretty face doesn't cost download time. In
I am trying to present this material in a format which makes it
easy for you to USE it. There are two aspects to that: The way it is
split up, and the way it is posted. See the main index to this material for more information about the way it is split up, and the way it is posted.
Ad from page's editor: Yes.. I do enjoy
compiling these things for you... I hope they are helpful. However..
this doesn't pay my bills!!! If you find this stuff useful, (and you
run an MS-DOS or Windows PC) please visit my freeware and shareware
page, download something, and circulate it for me? Links on your page
to this page would also be appreciated!