|
|
||||||||||||
|
|
|||||||||
|
|||||||||
|
Project on RDBMS and ORDBMS |
|||||||||
|
We are providing Projects for your business growth and to meet new challenges. Here are some projects prepared by our team of "Developing New Projects" for the Guarantee of your business growth
History of Database
Database
It is often said that we
live in an information society and that information is a very valuable resource
(or, as some people say, information is power). In this information society, the
term database has become a rather common term although its meaning seems to have
become somewhat vague as the importance of database systems has grown. Some
people use the term database of an organization to mean all the data in the
organization (whether computerized or not). Other people use the term to mean
the software that manages the data. We would use it to mean a collection of
computerized information such that it is available to many people for various
uses.
“A database is a well organized
collection of data that are related in a meaningful way which can be accessed in
different logical orders but are stored only once. The data in the database is
therefore integrated, structured, and shared.”
Or
“A Database can be considered an
organized collection of information.”
For example, it is easy to find a book in
a library because books are arranged in an organized manner. The tools that help
us
Ü
Store Data
Ü
Adding new
Information
Ü
Changing the
existing information
Ü
Deleting the
existing information
Ü
Viewing the
information
The data is managed by a
Database Management System.
The phone book is a good example of a
simple database. For example, to find the phone number of Mr. Jonathan
Smith, living at No.5 4 th Avenue Apt 15, New York, NY, one may use the
following process: Get the phone book for the city of New York. Flip to the
section where the last name is ‘Smith’. More than one Mr. Smith may be listed.
In that case, look for Mr. Smith whose first name is Jonathan. Look for Jonathan
Smith living at No. 5 4 th Avenue Apt 15, New York, NY to get the corresponding
phone number. In the example above, the phone number can be found because the
information in the phone book was organized in a certain manner.
In the early days of
computing, computers were mainly used for solving numerical problems. Even in
those days, it was observed that some of the tasks were common in many problems
that were being solved. It therefore was considered desirable to build special
subroutines which perform frequently occurring computing tasks such as computing
Sin(x). This lead to the development of mathematical subroutine libraries that
are now considered integral part of any computer system.
By the late fifties
storage, maintenance and retrieval of non-numeric data had become very
important. Again, it was observed that some of the data processing tasks
occurred quite frequently e.g. sorting. This lead to the development of
generalized routines for some of the most common and frequently used tasks. A
general routine by its nature tends to be somewhat less efficient than a routine
designed for a specific problem. In the late fifties and early sixties, when the
hardware costs were high, use of general routines was not very popular since it
became a matter of trade-off between hardware and software costs. In the last
two decades, hardware costs have gone down dramatically while the cost of
building software has gone up. It is no more a trade-off between hardware and
software costs since hardware is relatively cheap and building reliable software
is expensive.
Database management
systems evolved from generalized routines for file processing as the users
demanded more extensive and more flexible facilities for managing data. Database
technology has undergone major changes during the 1970's. An interesting history
of database systems is presented by Fry and Sibley (1976).
Types of Databases
There are many techniques and technologies for managing
Ü
DBMS
(Database Management Systems)
§
dBase, Fox Pro
Ü
HDBMS
(Hierarchical DBMS)
§
IMS
Ü
NDBMS
(Network DBMS)
§
IDMS
Ü
RDBMS
(Relational DBMS)
§
Oracle, SQL Server, Sybase, Informix, etc.
Ü
ORDBMS
(Object Relational DBMS)
§
Oracle
DBMS: -
Database Management Systems have been
used for quite a while. Software packages like dBase; Clipper, etc had
facilities for managing data. The user had to take care of two aspects – specify
what data was
to be retrieved and
how to go and get the
data.
HDBMS: -
Managed the data in a hierarchical fashion. IBM’s product IMS is a Hierarchical
Database. It is used in mainframe computers. It is still in use in some
installations. Over a period of time the demands of the users prompted a search
for alternate approaches.
NDBMS: -
Managed the data according to network
models. IBM’s product IDMS is a Network Database. It is also used in mainframe
computers. Like the HDBMS (IMS), it is still used in a few installations. Over a
period of time disadvantages of the network database technology prompted the
search for an alternate solution. With more powerful personal computer systems
and the inherent limitations of the Hierarchical and Network database models,
the popularity of the Relational Database Model increased.
RDBMS: -
Was invented by the team lead by Dr.
Edmund F. Codd and funded by IBM in the early 1970’s. The Relational Model is
based on the principles of relational algebra. This model also known as the
Relational Database Management System is very popular and is in use by a
majority of the Fortune 500 companies. This model will be used in great detail
in this course. Many vendors sell products that conform to this model. Some of
these vendors are: Oracle, SQL Server, Sybase, Informix, Ingress, Gupta SQL,
DB2, and Microsoft Access.
ORDBMS: -
This model has been proposed and is presently being implemented by the Oracle
Corporation. This model addresses the shortcomings of the RDBMS model. The
Object Relational Database Management System is not purely object oriented.
However, it is implemented via the same relational engine that drives the
Relational Database Management System.
Data is stored in tables. A table is the combination of columns and rows. Each
column contains a certain type of data about a person. For example, the
first column contains the last name, the second column contains the first name
and the third column contains the address. Each row contains all types of
data about a person.
Client / Server Database Concepts
The Client computer sends requests to the server. The
Server processes the requests and delivers results to the client.
Server: -
This is usually a high powered computer. Typically it could consist of several
processors (2, 3 or 4), a very large amount of RAM (2GB or 4GB), a very large
amount of hard disk storage space (possibly a few hundred GB), high speed
network data transfer rates, etc. The server need not be from a specific vendor.
It could be purchased from any vendor. The server platform may be UNIX, Windows
NT, AIX, Sun Solaris, Novell, etc.
Client: -
These are usually personal computers. They have processing power of their own.
The client operating system may be Windows 95, Windows 98, Windows 2000, Windows
NT Workstation, Sun Sparc Station, Mac OS, etc.
Operation: -
Operation of the client server network is as follows:
• The server is powered on and is ready to receive connections / requests. The
network administrator configures the server.
• Each client is powered on and is required to provide user identification and
authentication information (such as a user ID and a password) to connect to the
server. In a multiple server environment, the client computer may also be
required to specify the location of the server to which it must connect. This
setup creates the infrastructure for other programs to be used on this
configuration. In most environments, databases, e-mail servers, web servers are
run on a server computer. Clients use the resources offered by these databases,
websites, etc.
DBMS
(Database Management System)
As discussed earlier, a
database is a well organized collection of data. To be able to carry out
operations like insertion, deletion and retrieval, the database needs to be
managed by a substantial package of software. This software is usually called a
Database Management System (DBMS). The primary purpose of a DBMS is to allow a
user to store, update and retrieve data in abstract terms and thus make it easy
to maintain and retrieve information from a database. A DBMS relieves the user
from having to know about exact physical representations of data and having to
specify detailed algorithms for storing, updating and retrieving data.
A DBMS is usually a very
large software package that carries out many different tasks including the
provision of facilities to enable the user to access and modify information in
the database. The database is an intermediate link between the physical
database, the computer and the operating system, and on the other hand, the
users. To provide the various facilities to different types of users, a DBMS
normally provides one or more specialized programming languages often called
Database Languages. Different DBMS provide different database languages although
a language called SQL has recently taken on the role of a de facto standard.
Database languages come in different forms. A language is needed to describe the
database to the DBMS as well as provide facilities for changing the database and
for defining and changing physical data structures. Another language is needed
for manipulating and retrieving data stored in the DBMS. These languages are
called Data Description Languages (DDL) and Data Manipulation Languages (DML)
respectively.
To summarize, a database
system consists of
Some DBMS packages are marketed by computer manufacturers that will run only on
that manufacturer's machines (e.g. IBM's IMS) but increasingly independent
software houses are designing and selling DBMS software that would run on
several different types of machines (e.g. ORACLE).
Evolution of Database Management Systems
The history of Database Management Systems (DBMS) is an evolutionary
history. Each successive wave of innovation can be seen as a response either to
some limiting aspect of the technology preceding it or to demands from new
application domains. In economic terms, successful information technology lowers
the costs incurred when implementing a solution to a particular problem.
Lowering development costs increases the range of information management
problems that can be dealt with in a cost-effective way, leading to new waves of
systems development.
As an example of changing development costs, consider that early mainframe
information systems became popular because they could make numerical
calculations with unprecedented speed and accuracy (by the standards of the
time). This had a major impact on how businesses managed financial assets and
how engineering projects were undertaken. It took another evolutionary step—the
invention of higher level languages such as COBOL—to get to the point that the
economic return from a production management system justified the development
cost. It took still another innovation—the adoption of relational DBMS before
customer management and human resource systems could be automated. Today it is
difficult to imagine how you could use COBOL and ISAM to build the number and
variety of e-commerce Web sites that have sprung up over the last three years.
And looking ahead, as we move into an era of ubiquitous, mobile computers and
the growing importance of digital media, it is difficult to see how RDBMS
technology can cope.
Early Information Systems
Pioneering information systems of the late 1960s and early 1970s were
constrained by the capacity of the computers on which they ran. Hardware was
relatively expensive, and by today’s standards, quite limited. Consequently,
considerable human effort was devoted to optimizing programming code. Diverting
effort from functionality to efficiency constrained the ambition of early
developers, but the diversion of effort was necessary because it was the only
way to develop an information system that ran with adequate performance.
Information systems of this era were implemented as monolithic programs
combining user-interface, data processing logic, and data management operations
into a single executable on a single machine. Such programs intermingled
low-level descriptions of data structure details, how records were laid out on
disk, how records were interrelated, with user-interface management code.
User-interface management code would dictate, for example, where an element from
a disk record went on the screen. Maintaining and evolving these systems, even
to perform tasks as simple as adding new data fields and indexes, required
programmers to make changes involving most aspects of the system. Experience has
shown that development is the least expensive stage of an information system’s
life cycle. Programming and debugging take time and money, but by far the
largest percentage of the total cost of an information system is incurred after
it goes into production. As the business environment changes, there is
considerable pressure to change systems with it. Changing a pre-relational
information system, either to fix bugs or to make enhancements, is extremely
difficult. Yet, there is always considerable pressure to do so. In addition,
hardware constantly gets faster and cheaper. A second problem with these early
systems is that as their complexity increased, it became increasingly important
to detect errors in the design stage. The most pervasive design problem was
redundancy, which occurs when storage structure designs record an item of
information twice.
Different DBMS Architectures
There are three types of Architectures in the Database. They are as under
Hierarchical Model
The hierarchical model is
the basis of the oldest database management systems, which grew out of early
attempts to organize data for the U.S space program. Since these databases were
ad-hoc solution to immediate problems, they were created without the strong
theoretical foundations that later systems had. Their designers were familiar
with file organizations and data structures, and used these concepts to solve
the immediate data representation problems of users. The hierarchical model uses
the tree as its basic data structure. Nodes of the trees in the hierarchical
model represent data records or record segments, which are the portions of the
data records. Relationships are represented as links or pointers between nodes.
The Network Model
The network uses a network
or plex structure, which is data structure consisting of nodes and branches.
Unlike a tree, a plex structure allows a node to have more than one parent. The
nodes of the network represent records of various types. Relationships between
records are represented as links, which become pointers in the implementation.
Relational Model
The relational model was
proposed by Codd in 1970 and continues to be the subject of much research. It is
now widely used by both mainframe and microcomputers-based DBMSs because of its
simplicity from user’s point of view and its power. The relation model uses the
theory of relations from mathematics and adapts it for use in database theory.
In relation model both entities and relationships are represented by relations
which are physically represented as tables or two-dimensional arrays, and
attributes as columns of those tables.
Before going into the
detail of the Relational and Object Relational Database Management Systems we
should have a knowledge of the Relational Design.
RDBMS
(Relational Database Management System)
In 1970 an IBM researcher named Ted Codd wrote a paper that described a new
approach to the management of “large shared data banks.” In his paper Codd
identifies two objectives for managing shared data. The first of these is
data independence, which dictates that applications using a database be
maintained independent of the physical details by which the database organizes
itself. Second, he describes a series of rules to ensure that the shared data is
consistent by eliminating any redundancy in the database’s design. Codd’s
paper deliberately ignores any consideration of how his model might be
implemented. He was attempting to define an abstraction of the problem of
information management: a general framework for thinking about and working with
information. The Relational Model Codd described had three parts: a data model,
a means of expressing operations in a high-level language, and a set of design
principles that ensured the elimination of certain kinds of redundant data
problems. Codd’s relational model views data as being stored in tables
containing a variable number of rows (or records), each of which has a
fixed number of columns. Something like a telephone directory or a
registry of births, deaths, and marriages, is a good analogy for a table. Each
entry contains different information but is identical in its form to all other
entries of the same kind. The relational model also describes a number of
logical operations that could be performed over the data. These operations
included a means of getting a subset of rows (all names in the telephone
directory with the surname “Brown”), a subset of columns (just the name and
number), or data from a combination of tables (a person who is married to a
person with a particular phone number). By borrowing mathematical techniques
from predicate logic, Codd was able to derive a series of design principles that
could be used to guarantee that the database’s structure was free of the kinds
of redundancy so problematic in other systems. Greatly expanded by later
writers, these ideas formed the basis of the theory of normal forms. Properly
applied, the system of normal form rules can ensure that the database’s logical
design is free of redundancy and, therefore, any possibility of anomalies in the
data.
Elements of a Relational Database
There are four basic elements which are necessary for the Relational Database.
Those are given below.
Ü
Relations
Ü
Attributes
Ü
Domains
Ü
The Relational
Operators.
Codd describe a data storage system that possessed
three characteristics that were sorely needed at that time:
1.
Logical data
independence:
- This desirable characteristic means that changes made to an attribute
(column), for example, an increase or decrease in size have no perceivable
effect on other attributes for the same relation (table). Logical data
independence was attractive to data processing organizations because it could
substantially reduce the cost of software maintenance.
2.
Referential and
data integrity: -
Unlike other database systems, a relational database would relieve the
application software of the burden of enforcing integrity constraints. Codd
described two characteristics that would be maintained by a relational database,
referential integrity and data integrity.
3.
Ad hoc query: -
This characteristic would enable the user to tell the database which data to
retrieve without indicating how to accomplish the task.
Some time passed before a commercial product actually implemented some of the
relational database features that Codd described. Today many vendors sell Relational
Database Management Systems; some of the more well-known vendors are
Oracle, Sybase, IBM, Informix, Microsoft,
and Computer Associates. Of these
vendors, Oracle has emerged as the leader. The Oracle
RDBMS engine has been ported to more platforms than any other database
product. Because of Oracle's multiplatform support, many application software
vendors have chosen Oracle as their database platform. And now, Oracle
Corporation has ported the same RDBMS engine to the desktop environment with its
release of Personal Oracle.
Relational Database Implementation
Today's relational databases
implement a number of extremely useful features that Codd did not mention in his
original article. However, as of this writing, no commercially available
database is a full implementation of Codd's rules for relational
databases.
During the early 1970s, several parallel research projects set out to implement
a working RDBMS. This turned out to be very hard. It wasn’t until the late 1980s
that RDBMS products worked acceptably in the kinds of high-end, online
transactions processing applications served so well by earlier technologies.
Despite the technical shortcomings RDBMS technology exploded in popularity
because even the earliest products made it cheaper, quicker, and easier to build
information systems. For an increasing number of applications, economics favored
spending more on hardware and less on people. RDBMS technology made it possible
to develop information systems that, while desirable from a business management
point of view, had been deemed too expensive. To emphasize the difference
between the relational and pre-relational approaches, a four hundred line C
program can be replaced by the SQL-92 expression in Listing below.
CREATE TABLE Employees (
Name VARCHAR(128),
DOB DATE,
Salary DECIMAL(10,2),
Address VARCHAR(128)
);
The code in list implements considerably more functionality than a C program
because RDBMS provide transactional guarantees for data changes. They
automatically employ locking, logging, and backup and recovery facilities to
guarantee the integrity of the data they store. Also, RDBMS provide elaborate
security features. Different tables in the same database can be made accessible
only by different groups of users. All of this built-in functionality means that
developers focus more of their effort on their system’s functionality and less
on complex technical details.
Relational Database
Theory
With relational database, systems can be considered complete unless it discusses
the basic concepts of relational database theory. In a nutshell:
Ü
Every entity has a
set of attributes that uniquely define an instance of that entity.
Ü
No part of the
primary key can be null.
Ü
Every entity has a
set of attributes that uniquely identify each row in that entity.
Ü
A table cannot have
duplicate rows.
Ü
Tables are related
to one another.
Ü
The order of the
rows in a table is arbitrary.
Ü
The order of the
columns in a table is arbitrary.
There are three standard levels for Relational Database management system
Ü
Conceptual level
Ü
External level
Ü
Internal level
External Level
The external level consist of many different external views or external models
of the database. Each user has a model of the real world represented in a form
that is suitable for that user. A particular user interacts with only certain
entities in the real world and is interested in only some of their attributes
and relationships. Therefore, that user’s view contain only information about
those aspects of the real world.
Conceptual Level
The conceptual level
consists of base tables, which are physical stored tables. These tables are
created by Database Administrator using a CREATE Table command. Once the table
is created the DBA can create “VIEW” for Users. A view may be a subset of single
base table or may be created by combining base tables or performing other
Operations on them.
Internal Level
The level which covers the
physical implementation of the database. It includes the data structure and file
organization used to store data on physical storage devices. The internal
schema, written in DDL, is a complete description of the internal model. It
includes such items as how data is represented, how records are sequenced, what
index exist, what pointers exist. An internal record is a single stored record.
It is the unit that is passed up to the internal level.
Primary Key
Every entity has a set of attributes that uniquely define an instance of that
entity. This set of attributes is called the primary key. The primary key
may be composed of a single attribute
No Part of the Primary Key Is Null
A basic tenet of relational theory is that no part of the primary key can be
null. If you think about that idea for a moment, it seems intuitive: The primary
key must uniquely identify each row in an entity; therefore, if the primary key
(or a part of it) is null, it wouldn't be able to identify anything.
Data Integrity
According to relational theory, every entity has a set of attributes that
uniquely identify each row in that entity. Relational theory also states that no
duplicate rows can exist in a table.
Referential Integrity
Tables are related to one another through foreign keys. A foreign key is
one table column for which the set of possible values is found in the primary
key of a second table. Referential integrity is achieved when the set of values
in a foreign key column is restricted to the primary key that it references or
to the null value. Once the database designer declares primary and foreign keys,
enforcing data and referential integrity is the responsibility of the DBMS. Five
Normal Forms in Relational Database Theory
INTRODUCTION
The normal forms defined in relational database theory represent guidelines for
record design. The guidelines corresponding to first through fifth normal forms
are presented here, in terms that do not require an understanding of relational
theory. The design guidelines are meaningful even if one is not using a
relational database system. We present the guidelines without referring to the
concepts of the relational model in order to emphasize their generality, and
also to make them easier to understand. Our presentation conveys an intuitive
sense of the intended constraints on record design, although in its informality
it may be imprecise in some technical details. A comprehensive treatment of the
subject is provided by Date.
The normalization rules are designed to prevent update anomalies and data
inconsistencies. With respect to performance tradeoffs, these guidelines are
biased toward the assumption that all non-key fields will be updated frequently.
They tend to penalize retrieval, since data which may have been retrievable from
one record in an un-normalized design may have to be retrieved from several
records in the normalized form. There is no obligation to fully normalize all
records when actual performance requirements are taken into account. FIRST NORMAL FORM
First normal form deals with the "shape" of a record type. Under first normal
form, all occurrences of a record type must contain the same number of fields.
First normal form excludes variable repeating fields and groups. This is not so
much a design guideline as a matter of definition. Relational database theory
doesn't deal with records having a variable number of fields. SECOND AND THIRD NORMAL FORMS
Second and third normal forms deal with the relationship between non-key and key
fields. Under second and third normal forms, a non-key field must provide a fact
about the key, us the whole key, and nothing but the key. In addition, the
record must satisfy first normal form. We deal now only with "single-valued"
facts. The fact could be a one-to-many relationship, such as the department of
an employee, or a one-to-one relationship, such as the spouse of an employee.
Thus the phrase "Y is a fact about X" signifies a one-to-one or one-to-many
relationship between Y and X. In the general case, Y might consist of one or
more fields, and so might X. In the following example, QUANTITY is a fact about
the combination of PART and WAREHOUSE. Second
Normal Form
Second normal form is violated when a non-key field is a fact about a subset of
a key. It is only relevant when the key is composite, i.e., consists of several
fields. Consider the following inventory record: --------------------------------------------------- | PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS | ====================-------------------------------
The key here consists of the PART and WAREHOUSE fields together, but
WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The basic problems with
this design are:
To satisfy second normal form, the record shown above should be decomposed into
(replaced by) the two records: ------------------------------------------------------------ | PART | WAREHOUSE | QUANTITY |WAREHOUSE | WAREHOUSE-ADDRESS | ====================----------- =============-----------------
When a data design is changed in this way, replacing unnormalized records with
normalized records, the process is referred to as normalization. The term
"normalization" is sometimes used relative to a particular normal form. Thus a
set of records may be normalized with respect to second normal form but not with
respect to third. The normalized design enhances the integrity of the data, by
minimizing redundancy and inconsistency, but at some possible performance cost
for certain retrieval applications. Consider an application that wants the
addresses of all warehouses stocking a certain part. In the unnormalized form,
the application searches one record type. With the normalized design, the
application has to search two record types, and connect the appropriate pairs.
Third Normal Form
Third normal form is violated when a non-key field is a fact about another
non-key field, as in ------------------------------------ | EMPLOYEE | DEPARTMENT | LOCATION | ============------------------------
The EMPLOYEE field is the key. If each department is located in one place, then
the LOCATION field is a fact about the DEPARTMENT -- in addition to being a fact
about the EMPLOYEE. The problems with this design are the same as those caused
by violations of second normal form:
To satisfy third normal form, the record shown above should be decomposed into
the two records: ------------------------- ------------------------- | EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION | ============------------- ==============-----------
To summarize, a record is in second and third normal forms if every field is
either part of the key or provides a (single-valued) fact about exactly the
whole key and nothing else.
Functional Dependencies
In relational database theory, second and third normal forms are defined in
terms of functional dependencies, which correspond approximately to our
single-valued facts. A field Y is "functionally dependent" on a field (or
fields) X if it is invalid to have two records with the same X-value but
different Y-values. That is, a given X-value must always occur with the same
Y-value. When X is a key, then all fields are by definition functionally
dependent on X in a trivial way, since there can't be two records having the
same X value.
There is a slight technical difference between functional dependencies and
single-valued facts as we have presented them. Functional dependencies only
exist when the things involved have unique and singular identifiers
(representations). For example, suppose a person's address is a single-valued
fact, i.e., a person has only one address. If we don't provide unique
identifiers for people, then there will not be a functional dependency in the
data: ---------------------------------------------- | PERSON | ADDRESS | -------------+-------------------------------- | John Smith | 123 Main St., New York | | John Smith | 321 Center St., San Francisco | ----------------------------------------------
Although each person has a unique address, a given name can appear with several
different addresses. Hence we do not have a functional dependency corresponding
to our single-valued fact.
Similarly, the address has to be spelled identically in each occurrence in order
to have a functional dependency. In the following case the same person appears
to be living at two different addresses, again precluding a functional
dependency. --------------------------------------- | PERSON | ADDRESS | -------------+------------------------- | John Smith | 123 Main St., New York | | John Smith | 123 Main Street, NYC | ---------------------------------------
We are not defending the use of non-unique or non-singular representations. Such
practices often lead to data maintenance problems of their own. We do wish to
point out, however, that functional dependencies and the various normal forms
are really only defined for situations in which there are unique and singular
identifiers. Thus the design guidelines as we present them are a bit stronger
than those implied by the formal definitions of the normal forms.
For instance, we as designers know that in the following example there is a
single-valued fact about a non-key field, and hence the design is susceptible to
all the update anomalies mentioned earlier. ------------------------------------------------------ | EMPLOYEE | FATHER | FATHER'S-ADDRESS | |============------------+---------------------------- | Art Smith | John Smith | 123 Main St., New York | | Bob Smith | John Smith | 123 Main Street, NYC | | Cal Smith | John Smith | 321 Center St., San Francisco | ------------------------------------------------------
However, in formal terms, there is no functional dependency here between
FATHER'S-ADDRESS and FATHER, and hence no violation of third normal form. FOURTH AND FIFTH NORMAL FORMS
Fourth and fifth normal forms deal with multi-valued facts. The multi-valued
fact may correspond to a many-to-many relationship, as with employees and
skills, or to a many-to-one relationship, as with the children of an employee
(assuming only one parent is an employee). By "many-to-many" we mean that an
employee may have several skills, and a skill may belong to several employees.
Note that we look at the many-to-one relationship between children and fathers
as a single-valued fact about a child but a multi-valued fact about a father.
In a sense, fourth and fifth normal forms are also about composite keys. These
normal forms attempt to minimize the number of fields involved in a composite
key, as suggested by the examples to follow. Fourth
Normal Form
Under fourth normal form, a record type should not contain two or more
independent multi-valued facts about an entity. In addition, the record must
satisfy third normal form.
The term "independent" will be discussed after considering an example.
Consider employees, skills, and languages, where an employee may have several
skills and several languages. We have here two many-to-many relationships, one
between employees and skills, and one between employees and languages. Under
fourth normal form, these two relationships should not be represented in a
single record such as ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | ===============================
Instead, they should be represented in the two records -------------------- ----------------------- | EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE | ==================== =======================
Note that other fields, not involving multi-valued facts, are permitted to occur
in the record, as in the case of the QUANTITY field in the earlier
PART/WAREHOUSE example.
The main problem with violating fourth normal form is that it leads to
uncertainties in the maintenance policies. Several policies are possible for
maintaining two independent multi-valued facts in one record:
(1) A disjoint format, in which a record contains either a skill or a language,
but not both: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | | | Smith | type | | | Smith | | French | | Smith | | German | | Smith | | Greek | -------------------------------
This is not much different from maintaining two separate record types. (We note
in passing that such a format also leads to ambiguities regarding the meanings
of blank fields. A blank SKILL could mean the person has no skill, or the field
is not applicable to this employee, or the data is unknown, or, as in this case,
the data may be found in another record.)
(2) A random mix, with three variations:
(a) Minimal number of records, with repetitions: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | French | | Smith | type | German | | Smith | type | Greek | -------------------------------
(b) Minimal number of records, with null values: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | French | | Smith | type | German | | Smith | | Greek | -------------------------------
(c) Unrestricted: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | French | | Smith | type | | | Smith | | German | | Smith | type | Greek | -------------------------------
(3) A "cross-product" form, where for each employee, there must be a record for
every possible pairing of one of his skills with one of his languages: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | French | | Smith | cook | German | | Smith | cook | Greek | | Smith | type | French | | Smith | type | German | | Smith | type | Greek | -------------------------------
Other problems caused by violating fourth normal form are similar in spirit to
those mentioned earlier for violations of second or third normal form. They take
different variations depending on the chosen maintenance policy:
Fourth normal form minimizes such update problems.
1 Independence
We mentioned independent multi-valued facts earlier, and we now illustrate what
we mean in terms of the example. The two many-to-many relationships, employee:
skill and employee: language, are "independent" in that there is no direct
connection between skills and languages. There is only an indirect connection
because they belong to some common employee. That is, it does not matter which
skill is paired with which language in a record; the pairing does not convey any
information. That's precisely why all the maintenance policies mentioned earlier
can be allowed.
In contrast, suppose that an employee could only exercise certain skills in
certain languages. Perhaps Smith can cook French cuisine only, but can type in
French, German, and Greek. Then the pairings of skills and languages becomes
meaningful, and there is no longer an ambiguity of maintenance policies. In the
present case, only the following form is correct: ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | |----------+-------+----------| | Smith | cook | French | | Smith | type | French | | Smith | type | German | | Smith | type | Greek | -------------------------------
Thus the employee: skill and employee: language relationships are no longer
independent. These records do not violate fourth normal form. When there is
interdependence among the relationships, then it is acceptable to represent them
in a single record.
2 Multivalued Dependencies
For readers interested in pursuing the technical background of fourth normal
form a bit further, we mention that fourth normal form is defined in terms of
multivalued dependencies, which correspond to our independent multi-valued
facts. Multivalued dependencies, in turn, are defined essentially as
relationships which accept the "cross-product" maintenance policy mentioned
above. That is, for our example, every one of an employee's skills must appear
paired with every one of his languages. It may or may not be obvious to the
reader that this is equivalent to our notion of independence: since every
possible pairing must be present, there is no "information" in the pairings.
Such pairings convey information only if some of them can be absent, that is,
only if it is possible that some employee cannot perform some skill in some
language. If all pairings are always present, then the relationships are really
independent.
We should also point out that multivalued dependencies and fourth normal form
apply as well to relationships involving more than two fields. For example,
suppose we extend the earlier example to include projects, in the following
sense:
If there is no direct connection between the skills and languages that an
employee uses on a project, then we could treat this as two independent
many-to-many relationships of the form EP:S and EP:L, where "EP" represents a
combination of an employee with a project. A record including employee, project,
skill, and language would violate fourth normal form. Two records, containing
fields E,P,S and E,P,L, respectively, would satisfy fourth normal form. Fifth
Normal Form
Fifth normal form deals with cases where information can be reconstructed from
smaller pieces of information that can be maintained with less redundancy.
Second, third, and fourth normal forms also serve this purpose, but fifth normal
form generalizes to cases not covered by the others.
We will not attempt a comprehensive exposition of fifth normal form, but
illustrate the central concept with a commonly used example, namely one
involving agents, companies, and products. If agents represent companies,
companies make products, and agents sell products, then we might want to keep a
record of which agent sells which product for which company. This information
could be kept in one record type with three fields: ----------------------------- | AGENT | COMPANY | PRODUCT | |-------+---------+---------| | Smith | Ford | car | | Smith | GM | truck | -----------------------------
This form is necessary in the general case. For example, although agent Smith
sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or
GM cars. Thus we need the combination of three fields to know which combinations
are valid and which are not.
But suppose that a certain rule was in effect: if an agent sells a certain
product, and he represents a company making that product, then he sells that
product for that company. ----------------------------- | AGENT | COMPANY | PRODUCT | |-------+---------+---------| | Smith | Ford | car | | Smith | Ford | truck | | Smith | GM | car | | Smith | GM | truck | | Jones | Ford | car | -----------------------------
In this case, it turns out that we can reconstruct all the true facts from a
normalized form consisting of three separate record types, each containing two
fields: ------------------- --------------------- ------------------- | AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT | |-------+---------| |---------+---------| |------+---------| | Smith | Ford | | Ford | car | | Smith | car | | Smith | GM | | Ford | truck | | Smith | truck | | Jones | Ford | | GM | car | | Jones | car | ------------------- | GM | truck | ------------------- ---------------------
These three record types are in fifth normal form, whereas the corresponding
three-field record shown previously is not.
Roughly speaking, we may say that a record type is in fifth normal form when its
information content cannot be reconstructed from several smaller record types,
i.e., from record types each having fewer fields than the original record. The
case where all the smaller records have the same key is excluded. If a record
type can only be decomposed into smaller records which all have the same key,
then the record type is considered to be in fifth normal form without
decomposition. A record type in fifth normal form is also in fourth, third,
second, and first normal forms.
Fifth normal form does not differ from fourth normal form unless there exists a
symmetric constraint such as the rule about agents, companies, and products. In
the absence of such a constraint, a record type in fourth normal form is always
in fifth normal form.
One advantage of fifth normal form is that certain redundancies can be
eliminated. In the normalized form, the fact that Smith sells cars is recorded
only once; in the un-normalized form it may be repeated many times.
It should be observed that although the normalized form involves more record
types, there may be fewer total record occurrences. This is not apparent when
there are only a few facts to record, as in the example shown above. The
advantage is realized as more facts are recorded, since the size of the
normalized files increases in an additive fashion, while the size of the
un-normalized file increases in a multiplicative fashion. For example, if we add
a new agent who sells x products for y companies, where each of these companies
makes each of these products, we have to add x+y new records to the normalized
form, but xy new records to the un-normalized form.
It should be noted that all three record types are required in the normalized
form in order to reconstruct the same information. From the first two record
types shown above we learn that Jones represents Ford and that Ford makes
trucks. But we can't determine whether Jones sells Ford trucks until we look at
the third record type to determine whether Jones sells trucks at all.
The following example illustrates a case in which the rule about agents,
companies, and products is satisfied, and which clearly requires all three
record types in the normalized form. Any two of the record types taken alone
will imply something untrue. ----------------------------- | AGENT | COMPANY | PRODUCT | |-------+---------+---------| | Smith | Ford | car | | Smith | Ford | truck | | Smith | GM | car | | Smith | GM | truck | | Jones | Ford | car | | Jones | Ford | truck | | Brown | Ford | car | | Brown | GM | car | | Brown | Toyota | car | | Brown | Toyota | bus | ----------------------------- ------------------- --------------------- ------------------- | AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT | |-------+---------| |---------+---------| |-------+---------| | Brown | Ford | | GM | truck | | Jones | truck | | Brown | GM | | Toyota | car | | Brown | car | | Brown | Toyota | | Toyota | bus | | Brown | bus | ------------------- --------------------- ------------------- Fifth Normal Form
Observe that:
Fourth and fifth normal forms both deal with combinations of multivalued facts.
One difference is that the facts dealt with under fifth normal form are not
independent, in the sense discussed earlier. Another difference is that,
although fourth normal form can deal with more than two multivalued facts, it
only recognizes them in pair wise groups. We can best explain this in terms of
the normalization process implied by fourth normal form. If a record violates
fourth normal form, the associated normalization process decomposes it into two
records, each containing fewer fields than the original record. Any of this
violating fourth normal form is again decomposed into two records, and so on
until the resulting records are all in fourth normal form. At each stage, the
set of records after decomposition contains exactly the same information as the
set of records before decomposition.
In the present example, no pairwise decomposition is possible. There is no
combination of two smaller records which contains the same total information as
the original record. All three of the smaller records are needed. Hence an
information-preserving pairwise decomposition is not possible, and the original
record is not in violation of fourth normal form. Fifth normal form is needed in
order to deal with the redundancies in this case. UNAVOIDABLE REDUNDANCIES
Normalization certainly doesn't remove all redundancies. Certain redundancies
seem to be unavoidable, particularly when several multivalued facts are
dependent rather than independent. In the example shown, it seems unavoidable
that we record the fact that "Smith can type" several times. Also, when the rule
about agents, companies, and products is not in effect, it seems unavoidable
that we record the fact that "Smith sells cars" several times. INTER-RECORD REDUNDANCY
The normal forms discussed here deal only with redundancies occurring within a
single record type. Fifth normal form is considered to be the "ultimate" normal
form with respect to such redundancies.
Other redundancies can occur across multiple record types. For the example
concerning employees, departments, and locations, the following records are in
third normal form in spite of the obvious redundancy:
In fact, two copies of the same record type would constitute the ultimate in
this kind of undetected redundancy.
Inter-record redundancy has been recognized for some time , and has recently
been addressed in terms of normal forms and normalization . CONCLUSION
While we have tried to present the normal forms in a simple and understandable
way, we are by no means suggesting that the data design process is
correspondingly simple. The design process involves many complexities which are
quite beyond the scope of this paper. In the first place, an initial set of data
elements and records has to be developed, as candidates for normalization. Then
the factors affecting normalization have to be assessed:
And, finally, the desirability of normalization has to be assessed, in terms of
its performance impact on retrieval applications.
RDBMS & The Oracle
Product Line
As the world's leading vendor of relational database software, Oracle
Corporation supports its flagship product, the Oracle RDBMS, on more than 90
platforms. The release of Personal Oracle for Windows extends Oracle's reach to
the most popular desktop operating system; Microsoft Windows.
The Oracle RDBMS is available in the following
three configurations:
Ü
The Oracle
Universal Server can support many users on highly scaleable platforms such
as Sun, HP, Pyramid, and Sequent. The various options that are available with
this configuration include the distributed option, by which several Oracle
databases on separate computers can function as a single logical database. The
Oracle Enterprise Server is available for a wide variety of operating systems
and hardware configurations. Oracle Web Server ;an integrated system for
dynamically generating HTML output from the content of an Oracle database; is
also included with the Universal Server.
Ü
The Oracle
Workgroup Server is designed for workgroups and is available on NetWare,
Windows NT, SCO UNIX, and UnixWare. The Oracle Workgroup Server is a
cost-effective and low-maintenance solution for supporting small groups of
users. Oracle Web Server is also available as an option with the Oracle
Workgroup Server.
Ü
Personal Oracle
is a Windows-based version of the Oracle database engine that offers the same
functionality that exists in the Oracle Universal Server and the Oracle
Workgroup Server. Even though Personal Oracle cannot function as a database
server by supporting multiple users, it still provides an excellent environment
for experimentation and prototyping.
We are discussed here generally about personal oracle As in case of relational
database management system
Oracle with Structured
Query Language (SQL)
You communicate with Personal Oracle through Oracle's version of the Structured
Query Language (SQL, usually pronounced sequel). SQL is a nonprocedural
language; unlike C or COBOL in which you must describe exactly how to access and
manipulate data, SQL specifies what to do. Internally, Oracle determines how to
perform the request. SQL exists as an ANSI standard as well as an industry
standard. Oracle's implementation of SQL adheres to Level 2 of the ANSI
X3.135-1989/ISO 9075-1989 standard with full implementation of the Integrity
Enhancement Feature. Oracle (as well as other database vendors) provides many
extensions to ANSI SQL.
In addition, Oracle's implementation of SQL adheres to the U.S. government
standard as described in the Federal Information Processing Standard Publication
(FIPS PUB) 127, entitled Database Language SQL.
Oracle Data Types
The Oracle DBMS provides a
number of different data types for the storage of the different forms of data in
a manner that is most suitable to the manipulations that are likely to be
performed. As part of the Data Modeling process it is important that the most
appropriate data types are identified. Where there is any doubt (e.g. storing a
year as a number or as a date), the data type chosen should reflect the way the
data will be used most often. It is possible to inter-convert between data
types, but this will reduce the efficiency of queries. The use of these data
types in table design is "Manipulating Oracle Tables".
The available Oracle data types the most widely used are these:
Ü
VARCHAR2
Ü
CHAR
Ü
NUMBER
Ü
DATE
Ü
Long
VARCHAR2
This data type should be used for any columns which may contain characters. This
includes alphabetic letters, together with _, -, !, ?, + etc. and numbers as a
character representation (i.e. they look like numbers when displayed, but they
cannot be subjected to arithmetic manipulation. Similarly an attempt to do a
greater than (>), or less than (<) operation will not have the correct
arithmetic result- VARCHAR2
values are compared character by character up to the first character that
differs. Whichever value has the "greater" character in that position is
considered "greater". Characters are compared via the ASCII character set with
the largest ASCII value (i.e. 255) is considered greatest.) VARCHAR2
columns may be up to 2000 characters wide.
CHAR
This data type is now effectively replaced by VARCHAR2 - CHAR should no longer
be used, although is still recognized and valid. The CHAR data type is fixed
length up to a maximum of 255 characters, whereas VARCHAR2 is of variable length
up to 2000 characters. The variable length of VARCHAR2 gives it significant
storage and performance advantages over CHAR.
Any kind of number;
positive, negative, integer or real. Up to 22 digits may be entered. For
comparison, a larger value is considered greater than a smaller value, with all
negative values smaller than all positive values.
DATE
A special data type with
some of the characteristics of both Character and Number Data-types. It is used
for the storage of
date and time information. The operators =, > or < etc. can be
used for dates but in where clauses such as
WHERE HIREDATE = '01-DEC-95'
the date value, on the
right, must appear in quotes. For comparison, a later date is considered greater
than an earlier date. Oracle dates must lie in the range 1st January, 4712 BC to
31st December, 4712 AD.
The standard Oracle date format is:
Ü
Digit Day
followed by
Ü
Letter Month
and
Ü
Digit Year
All should be separated by hyphens e.g.
'01-JAN-95'
LONG
Can hold strings of
characters up to 2 gigabytes long. Only one column of LONG data
type can be defined per table. Binary data can only be inserted into these using
the programming language interface (an example of binary data storage in a LONG
data type would be storing a picture in an oracle table).
Importantly, there are significant restrictions on the use of character
information stored in LONG data types. While the
INSERT and
UPDATE statements can be used to
insert and modify LONG data types, and the
SELECT statement to retrieve this data, no functions can be used to
manipulate a retrieved a LONG data type and a LONG column can never be used in a
WHERE clause.
When retrieving LONG data types the display length of the column is set to 80
characters, and anything stored in the column beyond this will be truncated. The
following command is required to extend the display length of a LONG field,
within SQL*Plus:
SQL> SET LONG xxxx
The value
xxxx should be replaced with a number representing thelongest value
likely to be stored in the LONG column.
Above are the general use data types in oracle, after similar with the data
Types of oracle the next step in sense of RDBMS is that to clear knowledge of
Creating the tables In Oracle
DBMS Vs RDBMS
The characteristic that differentiates a DBMS from an RDBMS is that
the RDBMS provides a set-oriented database language. For most RDBMSs, this
set-oriented database language is SQL. Set oriented means that SQL
processes sets of data in groups.
Two standards
organizations, the American National Standards Institute (ANSI) and the
International Standards Organization (ISO), currently promote SQL standards to
industry. The ANSI-92 standard is the standard for the SQL used throughout this
book. Although these standard-making bodies prepare standards for database
system designers to follow, all database products differ from the ANSI standard
to some degree. In addition, most systems provide some proprietary extensions to
SQL that extend the language into a true procedural language.
Problems with RDBMS
Starting in the late 1980s, several deficiencies in relational DBMS products
began receiving a lot of attention. The first deficiency is that the dominant
relational language, SQL-92, is limiting in several important respects. For
instance, SQL-92 supports a restricted set of built-in types that accommodate
only numbers and strings, but many database applications began to include deal
with complex objects such as geographic points, text, and digital signal data. A
related problem concerns how this data is used. Conceptually simple questions
involving complex data structures turn into lengthy SQL-92 queries.
The second deficiency is that the
relational model suffers from certain structural shortcomings. Relational tables
are flat and do not provide good support for nested structures, such as sets and
arrays. Also, certain kinds of relationships, such as sub typing, between
database objects are hard to represent in the model. (Subtyping occurs when we
say that one kind of thing, such as a SalesPerson, is a subtype of another kid
of thing, such as an Employee.) SQL-92 supports only independent tables of rows
and columns. The third deficiency is
that RDBMS technology did not take advantage of object-oriented (OO) approaches
to software engineering which have gained widespread acceptance in industry. OO
techniques reduce development costs and improve information system quality by
adopting an object-centric view of software development. This involves
integrating the data and behavior of a real-world entity into a single software
module or component. A complex data structure or algorithmically
sophisticated operation can be hidden behind a set of interfaces. This allows
another programmer to make use of the complex functionality without having to
understand how it is implemented. The relational model did a pretty good job
handling most information management problems. But for an emerging class of
problems RDBMS technology could be improved upon.
Object-Oriented DBMS
Object-Oriented Database Management Systems (OODBMS)
are an extension of OO
programming language techniques into the field of persistent data management.
For many applications, the performance, flexibility, and development cost of
OODBMS are significantly better than RDBMS or ORDBMS. The chief advantage of
OODBMS lies in the way they can achieve a tighter integration between OO
languages and the DBMS. Indeed, the main standards body for OODBMS, the Object
Database Management Group (ODMG) defines an OODBMS as a system that integrates
database capabilities with object-oriented programming language capabilities.
The idea behind this is that so far as an application developer is concerned, it
would be useful to ignore not only questions of how an object is implemented
behind its interface, but also how it is stored and retrieved.
All developers have to do is implement their application using their favorite OO
programming language, such as C++, Smalltalk, or Java, and the OODBMS takes care
of data caching, concurrency control, and disk storage. In addition to this
seamless integration, OODBMS possess a number of interesting and useful features
derived mainly from the object model. In order to solve the finite type system
problem that constrains SQL-92, most OODBMS feature an extensible type
system. Using this technique, an OODBMS can take the complex objects that are
part of the application and store them directly. An OODBMS can be used to invoke
methods on these objects, either through a direct call to the object or through
a query interface. And finally, many of the structural deficiencies in SQL-92
are overcome by the use of OO ideas such as inheritance and allowing sets and
arrays. OODBMS products saw a surge of academic and commercial interest in the
early 1990s, and today the annual global market for OODBMS products runs at
about $50 million per year. In many application domains, most notably
computer-aided design or manufacturing (CAD/CAM), the expense of building a
monolithic system to manage both database and application is balanced by the
kinds of performance such systems deliver.
Problems with OODBMS
Regrettably, much of the considerable energy of the OODBMS community has been
expended relearning the lessons of twenty years ago.
First, OODBMS vendors have
rediscovered the difficulties of tying database design too closely to
application design. Maintaining and evolving an OODBMS-based information system
is an arduous undertaking.
Second, they relearned that
declarative languages such as SQL-92 bring such tremendous productivity gains
that organizations will pay for the additional computational resources they
require. You can always buy hardware, but not time. Third, they re-discovered
the fact that a lack of a standard data model leads to design errors and
inconsistencies. In spite of these shortcomings OODBMS technology provides
effective solutions to a range of data management problems. Many ideas pioneered
by OODBMS have proven themselves to be very useful and are also found in ORDBMS.
Object-relational systems include features such as complex object extensibility,
encapsulation, inheritance, and better interfaces to OO languages.
ORDBMS
(Object Relational Database Management System)
ORDBMS Evolution from RDBMS
One of the most popular sort algorithms is a called insertion sort.
Insertion sort is an O (N2) algorithm. Roughly speaking, the time it takes to
sort an array of records increases with the square of the number of records
involved, so it should only be used for record sets containing less than about
25 rows. But insertion sort is extremely efficient when the input is almost in
sorted order, such as when you are sorting the result of an index scan. List
presents an implementation of the basic insertion sort algorithm for an array of
integers.
InsertSort( integer arTypeInput[] )
{ integer nTypeTemp;
integer InSizeArray, nOuter, nInner;
nSizeArray = arTypeInput[].getSize();
for ( nOuter = 2; nOuter <= nSizeArray; nOuter ++)
{
vTypeTemp = arTypeInput[nOuter];
nInner = nOuter - 1;
while (( nInner > 0 ) &&
(arTypeInput[nInner] > vTypeTemp)) {
arTypeInput[nInner+1] =
arTypeInput[nInner];
nInner--;
}
arTypeInput[nInner+1] = vTypeTemp;
}}
Sorting algorithms such as this can be generalized to make them work with any
data type. A generalized version of this insert sort algorithm appears in list,
as shown above. All this algorithm requires is logic to compare two type
instances. If one value is greater, the algorithm swaps the two values. In the
generalized version of the algorithm, a function pointer is passed into
the sort as an argument. A function pointer is simply a reference to a memory
address where the Compare() function’s actual implementation can be found. At
the critical point, when the algorithm decides whether to swap two values, it
passes the data to this function and makes its decision based on the function’s
return result.
InsertSort( Type arTypeInput[],
(int) Compare( Type, Type) )
{
Type vTypeTemp;
integer InSizeArray, nOuter, nInner;
nSizeArray = arTypeInput[].getSize();
for ( nOuter = 2; nOuter <= nSizeArray; nOuter ++)
{
vTypeTemp = arTypeInput[nOuter];
nInner = nOuter - 1;
while (( nInner > 0 ) &&
Compare(arTypeInput[nInner],vTypeTemp)
> 0 )
{
Swap(arTypeInput[nInner+1],arTypeInput[nInner]);
nInner--;
}
Swap(arTypeInput[nInner+1],
vTypeTemp);
}}
Note how the functionality of the Swap() operation is something that can be
handled by IDS without requiring a type specific routine. The ORDBMS knows how
big the object being swapped is, and whether the object is in memory or is
written out to disk. To use the generalized algorithm in the List, all that IDS
needs is the Compare() logic. The ORDBMS handles the looping, branching, and
exception handling. Almost all sorting algorithms involve looping and branching
around a Compare(), as does B-Tree indexing and aggregate algorithms such as
MIN() and MAX(). Part of the process of extending the ORDBMS framework with new
data types involves creating functions such as Compare() that IDS can use to
manage instances of the type. All data management operations implemented in the
ORDBMS is generalized in this way. In an RDBMS, the number of data types was
small so that the Compare() routines for each could be hard-coded within the
engine.
Sort( void ** parRecords, IFX_TYPE * pRecKey ... )
{
switch(pRecKey->typenum)
{
case IFX_INT_TYPE:
case IFX_CHAR_TYPE:
InsertSort(parRecords, IntCompare);
break;
case IFX_FLOAT_TYPE:
InsertSort(parRecords, DoubleCompare);
break;
// etc for each SQL-92 type
default:
ifx_internal_raiseerror('No Such Type');
break;
}}
To turn an RDBMS into an ORDBMS, you need to modify the code shown in List to
allow the engine to access Compare() routines other than the ones it has
built-in. If the data type passed as the second argument is one of the SQL-92
types, the sorting function proceeds as it did before. But if the data type is
not one of the SQL-92 types, the ORDBMS assumes it is being asked to sort an
extended type Every user-defined function embedded within the ORDBMS is recorded
in its system catalogs, which are tables that the DBMS uses to store
information about databases. When asked to sort a pile of records using a
user-defined type, the ORDBMS looks in these system catalogs to find a
user-defined function called compare that takes two instances of the type and
returns an INTEGER. If such a function exists, the ORDBMS uses it in place of a
built-in logic. If the function is not found, IDS generates an exception. The
listing below shows the modified sorting facility.
Sort( void ** parRecords, IFX_TYPE * pRecKey . . . )
{
switch(pRecKey->typenum)
{
case IFX_INT_TYPE:
case IFX_CHAR_TYPE:
InsertSort(parRecords, IntCompare);
break;
case IFX_FLOAT_TYPE:
InsertSort (parRecords, DoubleCompare);
break;
// etc for each SQL-92 type
. .
default: // ah! Not SQL-92. Must be user-defined.
if( (pUDRCompare=udr_lookup(pRecKey->typenum,
"Compare")) ==NULL)
ifx_internal_error("No Such Function.');
else
InsertSort (parRecords, pUDRCompare );
break;
}}
Using the ORDBMS generalization of the sort utility has several implications:
Ü
When you take a
database application developed to use the previous generation of RDBMS
technology and upgrade to an ORDBMS, you should see no changes at all. The size
of the ORDBMS executable is slightly increased, and there are some new
directories and install options, but if all you do is to run the RDBMS
application on the ORDBMS, the new code and facilities are never invoked.
Ü
This scheme implies
extensive modifications to the RDBMS code. You not only need to find every place
in the engine that such modifications are necessary, but also need to provide
the infrastructure in the engine to allow the extensions to execute within it.
Ü
Finally, you should
know how general such an extensibility mechanism is. As long as you can specify
the structure of your data type, and define an algorithm to compare one instance
of the type with another, you can use the engine’s facilities to sort or index
instances of the type and you can use OR-SQL to specify how you want this done.
In practice, renovating an RDBMS in this manner is an incredibly complex and
demanding undertaking.
ORDBS or Object Relational Database Management System is already defined. Now we
are looking forward to its features which are giving different facilities to
their users.
Features of ORDBMS
ORDBMS synthesize the features of RDBMS with the best ideas of OODBMS.
Ü
Although ORDBMS
reuse the relational model as SQL interprets it, the OR data model is opened up
in novel ways. New data types and functions can be implemented using
general-purpose languages such as C and Java. In other words, ORDBMS allow
developers to embed new classes of data objects into the relational data model
abstraction.
Ü
ORDBMS schema has
additional features not present in RDBMS schema. Several OO structural features
such as inheritance and polymorphism are part of the OR data model.
Ü
ORDBMS adopt the
RDBMS query-centric approach to data management. All data access in an ORDBMS is
handled with declarative SQL statements. There is no procedural, or
object-at-a-time, navigational interface. ORDBMS persist with the idea of a data
language that is fundamentally declarative, and therefore mismatched with
procedural OO host languages. This significantly affects the internal design of
the ORDBMS engine, and it has profound implications for the interfaces
developers use to access the database.
Ü
From a systems
architecture point of view, ORDBMS are implemented as a central server program
rather than as distributed data architectures typical in OODBMS products.
However, ORDBMS extend the functionality of DBMS products significantly, and an
information system using an ORDBMS can be deployed over multiple machines. To
see what this looks such as, consider again the Employees example. Using an
ORDBMS, you would represent this data as shown below.
CREATE TABLE Employees (
Name PersonName NOT NULL,
DOB DATE NOT NULL,
Salary Currency NOT NULL,
Address StreetAddress NOT NULL
PRIMARY KEY ( Name, DOB )
);
An object-relational table is structurally very similar to its relational
counterpart and the same data integrity and physical organization rules can be
enforced over it. The difference between object-relational and relational tables
can be seen in the section stipulating column types. In the object-relational
table, readers familiar with RDBMS should recognize the DATE type, but the other
column types are completely new. From an object-oriented point of view, these
types correspond to class names, which are software modules that encapsulate
state and (as we shall see) behavior.
INSERT INTO SendPage
( Pager_Number, Pass_Code, Message )
SELECT E.Pager_Number,
E.Pass_Code,
Print(E.Name) ||
': Call 1-800-TEMPS-R-US for immediate INFORMIX DBA
job'
FROM Temporary_Employees E
WHERE Contains (GeoCircle('(-122.514, 37.221)',
'60 miles')),
E.LivesAt )
AND DocContains ( E.Resume,
'INFORMIX and Database
Administrator')
AND NOT IsBooked ( Period(TODAY, TODAY + 7),
E.Booked );
In a SQL-92 DBMS, SendPage could be only a table. The effect of this query would
then be to insert some rows into the SendPage table. However, in an ORDBMS,
SendPage might actually be an active table, which is an interface to the
communications system used to send electronic messages. The effect of this query
would then be to communicate with the matching temporary workers!
Motivation
Relational model (70’s): Clean and simple.
Ü
Great
for administrative data.
Ü
Not as
good for other kinds of data (e. g., multimedia, networks, CAD).
Object- Oriented models (80’s): Complicated, but some influential ideas.
Ü
Complex
data types.
Ü
Object
identity/ references.
Ü
ADTs
(encapsulation, behavior goes with data).
Ü
Inheritance.
Idea: Build DBMS based on OO model.
Object- Relational
Idea: Add OO features to the type system of
SQL. I. e. “plain old SQL”, but...
Ü
Columns
can be of new types (ADTs)
Ü
User-
defined methods on ADTs
Ü
Columns
can be of complex types
Ü
Reference types and “deref”
Ü
Inheritance and collection inheritance
Ü
Old SQL
schemas still work!
(Backwards compatibility)
Complex Types
User can use type constructors to generate new types:
Ü
Setof (
foo)
Ü
arrayof( foo)
Ü
listof(
foo)
Ü
Row (n1
t1... nk tk)
Can be nested:
Ü
Setof
(arrayof (int))
ADTs: User- Defined Atomic Types
Built- in SQL types (int, float, text, etc.) are limited.
Ü
Even
these types have simple methods associated with them (math, LIKE,
etc.)
ORDBMS: User can define new atomic types (& methods) if a type
cannot be naturally defined in terms of the built- in types:
create type jpeg (internallength = variable,
input = jpeg_ in, output = jpeg_ out);
Need input & output methods for types. e. g., Convert from text to
internal type and back.
Reference Types & Deref.
In most ORDBMS, every object has an OID. So, can “point” to objects
--reference types!
Ü
ref (
theater_ t)
Don’t confuse reference and complex types!
Ü
mytheater row( tno integer, name text,
address text, phone integer)
Ü
Theaterref( theater_ t)
Both look same at output, but are different!!
Ü
Deletion, update, “sharing”
Ü
Similar
to “by value” vs. “by reference” in PL
Modifications to support ORDBMS
Parsing
Ü
Type-
checking for methods pretty complex.
Query Rewriting
Ü
Often
useful to turn path expression into joins!
Ü
Collection hierarchy’ Unions
Optimization
Ü
New
algebra operators needed for complex types.
o
Must
know how to integrate them into optimization.
Ü
WHERE
clause expression can be expensive!
ORDBMS Advantages
ORDBMS technology improves upon what came before it in three ways. The first
improvement is that can enhance a system’s overall performance. The IDS
product can achieve higher levels of data throughput or better response times
than is possible using RDBMS technology and external logic because ORDBMS
extensibility makes it possible to move logic to where the data resides, instead
of always moving data to where the logic is running. This effect is particularly
pronounced for data intensive applications such as decision support systems and
in situations in which the objects in your application are large, such as
digital signal or time series data. But in general, any application can benefit.
The second improvement relates to the way that integrating functional aspects of
the information system within the framework provided by the ORDBMS improves the
flexibility of the overall system. Multiple unrelated object definitions
can be combined within a single ORDBMS database. At runtime, they can be mingled
within a query expression created to answer some high-level question. Such
flexibility is very important because it reduces the costs associated with
information system development and ongoing maintenance. The third benefit of an
ORDBMS relates to the way information systems are built and managed. An ORDBMS
system catalogs become a metadata repository that records information about the
modules of programming logic integrated into the ORDBMS. Over time, as new
functionality is added to the application and as the programming staff changes,
the system’s catalogs can be used to determine the extent of the current
system’s functionality and how it all fits together. The fourth benefit is that
the IDS product’s features make it possible to incorporate into the database
data sets that are stored outside it. This allows you to build federated
databases. From within single servers, you can access data distributed over
multiple places.
Flexibility and Functionality
Explaining flexibility is more difficult because the advantages of flexibility
are harder to quantify. But it is probably more valuable than performance over
the long run. To understand why, let’s continue with the financial company
example.
As it turned out, the more profound result of the integration effort undertaken
by our financial firm was that the CalcValue() operation was liberated from its
procedural setting. Before, developers had to write and compile (and debug) a
procedural C program every time they wanted to use CalcValue() to answer a
business question. With the ORDBMS, they found that they could simply write a
new OR-SQL query instead. For example, the query in which is Listed below is
showing a join involving a table that, while in the database, was beyond the
scope of the original (portfolio) problem. “What is the SUM() of the values
of instruments in our portfolio grouped by region and sector?”
SELECT IS.Country,
IS.Region,
SUM (CalcValue ( I.Issuer_Class, I.Principal, I.Issue_Date, I.Rate ))
FROM Instruments I, Issuers IS
WHERE I.Portfolio = :Portfolio_Argument
AND I.Issuer = IS.Name
GROUP BY IS.Country, IS.Region;
With the addition of a small Web front end, end users could use the database to
find out which the most valuable instrument in their portfolio was or which
issuer’s instruments performed most poorly, and so on. It was known that the
system had the data necessary to answer all these questions before. The problem
was that the cost of answering them using C and SQL-92 was simply too high.
Maintainability
After some investigation, the developers discovered that, over time, there had
been several versions of the CalcValue() function. Also, once the CalcValue()
algorithm was explained to end users, they had suggestions that might improve
it. With the previous system, such speculative changes were extremely difficult
to implement. They required a recode-recompile relink redistribute cycle. But
with the ORDBMS, the alternative algorithms could be integrated with ease. In
fact, none of the components of the overall information system had to be brought
down. The developers simply wrote the alternative function in C, compiled it
into a shared library, and dynamically linked it into the engine with another
name. What all of this demonstrates is that the ORDBMS permitted the financial
services company to react to changing business requirements far more rapidly
than it could before. By embedding the most valuable modules of logic into a
declarative language, they reduced the amount of code they had to write and the
amount of time required to implement new system functionality. None of this was
interesting when viewed through the narrow perspective of system performance.
But is made a tremendous difference to the business’s operational efficiency.
Object-Relational DBMS Applications
Extensible databases provide a significant boost for developers building
traditional business data processing applications. By implementing a database
that constitutes a better model of the application’s problem domain the
information system can be made to provide more flexibility and functionality at
lower development cost. Business questions such as the one in the List might be
answered in systems using an SQL-92 and C. Doing so, however, involves a more
complex implementation and requires considerably more effort. The more important
effect of the technology is that it makes it possible to build information
systems to address data management problems usually considered to be too
difficult. One way to characterize
applications in which an object-relational DBMS is the best platform is to focus
on the kind of data involved. For thirty years, software engineers have used the
term “data entry” to describe the process by which information enters the
system. Human users enter data using a keyboard. Today many information
systems employ electronics to capture information. Video cameras, environmental
sensors, and specialized monitoring equipment record data in rich media systems,
industrial routing applications, and medical imaging systems. Object-relational
DBMS technology excels at this kind of application. It would be a mistake to say
that ORDBMS are only good for digital content applications. As we shall see in
this book OR techniques provide considerable advantages over more low-level
RDBMS approaches even in traditional business data processing applications. But
as other technology changes move us towards applications in which data is
recorded rather than entered, ORDBMS will become increasingly
necessary.
ORDBMS Concepts and Terminology
Application Domain & Description
Complex data analysis: -
You can integrate sophisticated statistical and special purpose analytic
algorithms into the ORDBMS and use them in knowledge discovery or data mining
applications. For example, it is possible to answer questions such as “Which
attribute of my potential customers indicates most strongly that they will spend
money with me?”
Text and documents: -
Simple cases permit you to find all documents that include some word or phrase.
More complex uses would include creating a network that reflected similarity
between documents.
Digital asset management: -
The ORDBMS can manage digital media such as video, audio, and still images. In
this context, manage means more than store and retrieve. It also means “convert
format,” “detect scene changes in video and extract first frame from new scene,”
and even “What MP3 audio tracks do I have that include this sound?”
Geographic data: -
For traditional applications, this might involve “Show me the lat/long
coordinates corresponding to this street address.” This might be extended to
answer requests such as “Show me all house and contents policy holders within a
quarter mile of a tidal water body.” For next-generation applications, with a
GPS device integrated with a cellular phone, it might even be able to answer the
perpetual puzzler “Car 54, where are you?”
Bio-medical: -
Modern medicine gathers lots of digital signal data such as CAT scans and
ultrasound imagery. In the simplest case, you can use these images to filter out
“all cell cultures with probable abnormality.” In the more advanced uses, you
can also answer questions such as “show me all the cardiograms which are ‘like’
this cardiogram.”
Data Model
A data model is a way of thinking about data, and the object-relational data
model amounts to objects in a relational framework. An ORDBMS chief task
is to provide a flexible framework for organizing and manipulating software
objects corresponding to real-world phenomenon.
The object-relational data model can be broken into three areas:
Ü
Structural
Features. This aspect of the data model deals with how a database’s data can be
structured or organized.
Ü
Manipulation.
Because a single data set often needs to support multiple user views, and
because data values need to be continually updated to reflect the state of the
world, the data model provides a means to manipulate data.
Ü
Integrity and
Security. A DBMS data model allows the developers to declare rules that ensure
the correctness of the data values in the database. In the first two chapters of
this book, we introduce and describe the features of an ORDBMS that developers
use to build information systems.
Enhanced Table Structures
An OR database consists of a group of tables made up of rows. All rows in a
table are structurally identical in that they all consist of a fixed number of
values of specific data types stored in columns that are named as part of
the table’s definition. The most important distinction between SQL-92 tables and
object relational database tables is the way that ORDBMS columns are not limited
to a
standardized set of data types. Figure illustrates what an object-relational
table looks like. The first thing to note about this table is the way that its
column headings consist of both a name and a data type. Second, note how several
columns have internal structure. In a SQL-92 DBMS, such structure would be
broken up into several separate columns, and operations over a data value such
as Employee’s Name would need to list every component column. Third, this table
contains several instances of unconventional data types. Lives At is a
geographic point, which is a latitude/longitude pair that describes a position
on the globe. Resume contains documents, which is a kind of Binary Large Object
(BLOB). In addition to defining the structure of a table, you can include
integrity constraints in its definition. Tables should all have a key,
which is a subset of attributes whose data values can never be repeated in the
table. Keys are not absolutely required as part of the table’s definition, but
they are a very good idea. A table can have several keys, but only one of these
is granted the title of primary key. In our example table, the
combination of the Name and DOB columns contains data values that are unique
within the table. On balance, it is far more likely that an end user made a data
entry mistake than two employees share names and dates of birth.
Object-Oriented Schema Features
Another difference between relational DBMS and ORDBMS is the way in which
object-relational database schema supports features co-opted from
object-oriented approaches to software engineering. We have already seen that an
object-relational table can contain exotic data types. In addition,
object-relational tables can be organized into new kinds of relationships, and a
table’s columns can contain sets of data objects. In an ORDBMS, tables can be
typed; that is, developers can create a table with a record structure that
corresponds to the definition of a data type. The type system includes a notion
of inheritance in which data types can be organized into hierarchies.
This naturally supplies a mechanism whereby tables can be arranged into
hierarchies too. In the figure illustrates how the Employees table might look as
part of such a
hierarchy. In most object-oriented development environments, the concept of
inheritance is limited to the structure and behavior of object classes. However,
in an object-relational database, queries can address data values through the
hierarchy. When you write an OR-SQL statement that addresses a table, all the
records in its subtables become involved in the query too.
Extensibility: - User-Defined Types and User-Defined Functions
The concept of extensibility is a principal innovation of ORDBMS
technology One of the problems you encounter developing information systems
using SQL-92 is that modeling complex data structures and implementing complex
functions can be difficult. One way to deal with this problem is for the DBMS
vendor to build more data types and functions into their products. Because the
number of interesting new data types is very large, however, this is not a
reasonable solution. A better approach is to build the DBMS engine so that it
can accept the addition of new, application specific functionality. Developers
can specialize or extend many of the features of an ORDBMS: the data types,
OR-SQL expressions, the aggregates, and so on. In fact, it is useful to think of
the core ORDBMS as being a kind of software backplane, which is a framework into
which specialized software modules are embedded.
Database Stored Procedures
Almost all RDBMS allow you to create database procedures that implement business
processes. This allows developers to move considerable portions of an
information system’s total functionality into the DBMS. Although centralizing
CPU and memory requirements on a single machine can limit scalability, in many
situations it can improve the system’s overall throughput and simplify its
management. By implementing application objects within the server, using Java,
for examples, it becomes possible, though not always desirable, to push code
implementing one of an application-level object’s behaviors into the ORDBMS. The
interface in the external program simply passes the work back into the IDS
engine. Figure represents the contrasting approaches. An important point to
remember is that with Java, the same logic can be deployed either within the
ORDBMS or within
an external program without changing the code in any way, or even recompiling
it. The novel idea that the ORDBMS can be used to implement many of the
operational features of certain kinds of middleware. Routine extensibility, and
particularly the way it can provide the kind of functionality illustrated in
Figure, is a practical application of these ideas. But making such system
scalable requires using other features of the ORDBMS: the distributed database
functionality, commercially available gateways, and the open storage manager
(introduced below). Combining these facilities provides the kind of location
transparency necessary for the development of distributed information
systems.
Storage Management
Traditionally the main purpose of a DBMS was to centralize and organize data
storage. A DBMS program ran on a single, large machine. It would take blocks of
that machine’s disk space under its control and store data in them. Over time,
RDBMS products came to include ever more sophisticated data structures and ever
more efficient techniques for memory caching, data scanning, and storage of
large data objects. In spite of these improvements, only a fraction of an
organization’s data can ever be brought together into one physical location.
Data is often distributed among many systems, which is the consequence of figure.
Routine Extensibility and the ORDBMS as the Object-Server
Architecture autonomous information systems development using a variety of
technologies, or through organizational mergers, or because the data is simply
not suitable for storage in any DBMS. To address this, the IDS product adds a
new extensible storage management facility.
In the figure, we illustrate this Virtual Table Interface concept. ORDBMS
possess storage manager facilities similar to RDBMS. Disk space is taken under
the control of the ORDBMS, and data is written into it according to whatever
administrative rules are specified. All the indexing, query processing, and
cache management techniques that are part of an RDBMS are also used in an
ORDBMS. Further, distributed database techniques can be adapted to incorporate
user-defined types and functions. However, all of these mechanisms must be
re-implemented to generalize them so that they can work for user-defined types.
For example, page management is generalized to cope with variable length OPAQUE
type objects. You can also integrate code into the engine to implement an
entirely new storage manager. Developers still use OR-SQL as the primary
interface to this data, but instead of relying on internal storage, the ORDBMS
can use the external file system to store data. Any data set that can be
represented as a table can be accessed using this technique. Developers can also
use this technique to get a snapshot of the current state of live information.
It is possible to represent the current state of the operating system’s
processes or the current state of the file system as a table. You can imagine,
for example, an application intended to help manage a fleet of trucks and
employees servicing air conditioners or elevators. Each truck has a device
combining a Global Positioning System (GPS) with a cellular phone
that allows a central service to poll all trucks and to have them “phone in”
their current location. With the ORDBMS, you can embed Java code that activates
the paging and response service to implement a virtual table, and then write
queries over this new table, as shown in Listing below. “Find repair trucks
and drivers within ’50 miles’ of ‘Alan Turing’ who are qualified to repair the
‘T-20’ air conditioning unit.”
SELECT T.Location, T.DriversName
FROM Trucks T, Customers C
WHERE Distance (T.Location, C.Location) < '50 miles'
AND C.Name = 'Alan Turing'
AND DocContains ( T.DriverQualifications,
'Repair for T-20');
This functionality should fundamentally change the way you think about a DBMS.
In an object-relational DBMS, SQL becomes the medium used to combine components
to perform operations on data. Most data will be stored on a local disk under
the control of the DBMS, but that is not necessarily the case. The Virtual Table
Interface tutorial describes these features in more detail.
The Real Benefits of Object-Relational DB-Technology for Object-Oriented
Software Development
Abstract:
-
Object-oriented programming languages
(OOPLs like C++, Java, etc.) have established themselves in the development of
complex software systems for more than a decade. With the integration of
object-oriented concepts, object-relational database management systems (ORDBMS)
aim at supporting new generation software systems better and more efficiently.
Facing the situation that nowadays more and more software development teams use
OOPLs ‘on top of’ (O) RDBMS, i.e. access (object-) relational databases from
applications developed in OOPL, this benefit is on our investigations on
assessing the contribution of object-relational database technology to
object-oriented software development. First, a conceptual examination shows that
there is still a considerable gap between the object-relational paradigm (as
represented by the SQL: 1999 standard) and the object-oriented paradigm. Second,
empirical studies (performed by using our new benchmark approach) point at
mechanisms, which are not part of SQL: 1999 but would allow reducing the
mentioned gap. Thus, we encourage the integration of such mechanisms, e. g.,
support for navigation and complex objects (structured query results), into
ORDBMS in order to be really beneficial for new generation software systems.
Motivation
Object-oriented programming languages (OOPL), such as C++, Java, SmallTalk,
etc., have established themselves in the development of complex software system
for more than a decade. Both, the structure of these systems as well as the
structure of objects managed by these systems have become very complex.
Object-oriented concepts offered by OOPL are well suited for managing complex
structured objects. However, there are additional requirements, such as
persistence and transaction-protected manipulation, which can only be fulfilled
efficiently by integrating a database management system (DBMS). Consequently,
database technology becomes one of the core technologies of modern software
systems. In times of the ‘breakthrough’ of object-oriented system development,
two kinds of DBMS were of practical relevance: relational DBMS (RDBMS) and
object-oriented DBMS (OODBMS). Using OODBMS has proved inefficient/inflexible
for reasons.
Using an RDBMS, on one hand, requires to overcome the well known impedance
mismatch , i.e., performing the non-trivial task of mapping complex object
structures and navigational data processing (at the OOPL layer) to the
set-oriented, descriptive query language (SQL92), which supports just a simple,
flat data model. Despite this considerable mapping overhead, mature RDBMS
technology (index structures, optimization, integrity control, etc.), on the
other hand, contributes to keep the overall system performance acceptable.
Several commercial systems mapping object-oriented structures onto the
relational data model are currently available. Such systems are often referred
to as Persistent Object System built on Relation (shortly: POS).
The object-relational wave in database technology has decisively reduced the gap
between RDBMS and OOPL. Although object-relational DBMS (ORDBMS) are able to
(internally) manage object-oriented structures the required seamless coupling of
OOPL and ORDBMS is not yet possible, be-cause (as in SQL92) results of SQL:1999
queries are rather (sets of data) topples than (desired sets of) objects. In
summary, the gap between OOPL and ORDBMS can be traced back to a whole bunch of
modeling and operational aspects, as we will detail in the following sections.
Furthermore, the SQL: 1999 standard and the commercially available ORDBMS differ
very much in their object-oriented features. Thus, it is by no means clear, how
a given object-oriented design can be mapped to a given ORDBMS (most)
efficiently, or which features should be offered by ORDBMS in general in order
to enable an efficient mapping of object-oriented structures, respectively. Our
long-term objective is to influence the further development of ORDBMS to-wards a
better support of object-oriented software development (minimal mapping
overhead). Thus, we have proposed a new benchmark approach in allowing to assess
a given ORDBMS by taking into account both, its own performance as well as the
required mapping overhead.
Conceptual Considerations
There is a multiplicity of object data models, for example ODMG, UML, COM, C++
and Java. All these models support the basic concepts of the OO paradigm;
however, there are certain differences. Independently from the modeling language
used in the OO software development (e. g., UML), SQL: 1999 must be coupled with
a concrete OOPL. In accordance to their overall relevance and conceptual
vicinity, we concentrate on the object model of C++ and ODMG and compare it with
the SQL: 1999 standard.
Modeling Aspects
Object Orientation in OOPL.
The concept object represents the foundation of the OO paradigm. An
object is the encapsulation of data representing a semantic unit w. r. t. its
structures/values and its behavior. It conforms to a particular class. In fact,
a class implements an object type (classification) which is characterized
by a name as well as a set of attributes and methods. Each attribute conforms to
a certain data type and is either single-valued or set-valued (collection
types). Furthermore, a data type can be scalar (e. g., integer, Boolean,
string, etc.) or complex. In the latter case corresponding values can be
references (association) or objects of other classes (aggregation)
so that complex structures can be modeled. A class may implement methods
(behavior) which can be invoked in order to change the object’s state.
Classes may be arranged within class hierarchies. A class inherits structures
and behavior from its super classes (inheritance), but may refine these
definitions (specialization). Due to space restrictions we do not give a
more detailed description of the OO paradigm, but discuss how the OR data model
conforms to OO concepts.
Object Orientation in SQL: 1999.
While the relational data model (SQL2) did not support semantic modeling
concepts sufficiently, in SQL: 1999 the fundamental extension supporting
object-orientation is the structured user-defined data type (UDT). UDT,
which can be considered as object types, can be treated in the same way as
pre-defined data types (built-in data types). Consequently, similarly to
the type system of OOPL the type system of SQL: 1999 is extensible. UDT may be
complex structured and, therefore, may not only contain predefined data types
but also set-valued attributes (collection types) and even other UDT
(aggregation) or references (associations). Obviously, UDT are
comparable to the classes of the OO paradigm. However, according to the SQL:
1999 standard a UDT must be associated with a table. The notion of typed table,
also referred to as object table, allows to persistently managing
instances of a certain UDT within a table. Each topple of such a table
represents an instance (object) of a particular UDT and is identified by
a unique object identifier (OID) which can be sys-tem- generated or
user-defined. Besides instantiable UDT, SQL: 1999 also supports non-instantiable
UDT, which conforms to the notion of abstract classes in OOPL. In addition, UDT
may have methods (behavior) which are either system-generated or
implemented by users. They may participate in type hierarchies, in which more
specialized types (subtypes) inherit structure and behavior from more
general types (super-types), but may specialize corresponding
definitions. Thus, SQL: 1999 supports polymorphism and substitutability,
however, multiple-inheritance is not supported. Due to the association of UDT
with tables SQL: 1999 does not support encapsulation and, consequently, there is
nothing like the degree of encapsulation known from OOPL (public, protected,
private).
Operational Aspects
Besides the fundamental modeling aspects discussed so far, we also have to
examine operational aspects in order to figure out the conceptual distance
adequately. The following aspects are most relevant to our consideration:
Descriptive Queries vs. Navigational Processing: -
While OOPL processing is inherently
navigational, SQL supports a set-oriented, descriptive query language. Both
navigational and set-oriented query processing are important to modern software
systems. Therefore, ORDBMS should also directly facilitate navigational
processing to fulfill this requirement of OO applications. Direct support of
navigational access by the DBMS would mean that a database object referred
to by its OID can be provided as in-stance of an OOPL class. However, to the
best of our knowledge none of the currently available ORDBMS directly supports
this notion of navigation. A naive coupling of OOPL with descriptive SQL
requires to issue one or several corresponding SQL queries to the database for
processing a dereferencing operation, e. g. GetObject(Ref), and
retrieving the requested object from the database server. Such a processing
scheme will surely lead to bad runtime behaviour of the entire system, since the
costs of transforming a navigational operation to SQL queries, of evaluating
these queries in the DBS, and of the client/ server communication can be very
high. Obviously, the lack of DB APIs in directly supporting navigational access
impairs the system efficiency badly. Thus, either direct support for navigation
1 must be provided or efficient perfecting mechanisms exploit-1. In section 5 we
will see that one of the commercially available ORDBMS provides some basic means
for a direct access to objects by OID. Measurement results show that this is at
least a step into the right direction set-oriented database access and, thereby,
reducing the number of database round-trips must be applied in order to
effectively couple OOPLs with ORDBMS.
Structured Query Results:
-
As already mentioned several times before, OOPL support complex structured
object types, especially by the possibility of nesting complex data types as
well as using collection types and references. We have also mentioned previously
that these facilities of modelling complex structured objects have been
integrated into SQL by the SQL: 1999 standard. Unfortunately, because of the
traditional basic concepts of SQL, complex structures (actually supported both
in OOPL and ORDBMS) get lost at the DBMS interface, since only (sets of) simply
structured, flat data tuples can be retrieved. Therefore, if we want to couple
an OOPL with an (O)RDBMS, it is necessary to separately retrieve simple
fragments of complex objects by issuing several SQL queries, and then rebuild
complex object structures at the programming language level (see Fig. 1). The
mentioned problem even gets worse, if not individual complex objects, but
complex structures (object graphs) containing numerous related objects
interconnected by object references are to be selected as units. Obviously, the
lack of direct support for complex structured objects at the DB API reveals a
bottleneck between the two paradigms, and prevents new generation software
systems from exploiting the potential power of ORDBMS most effectively.
Object Behaviour:
-.
Of course, the operational aspects also encompass the object behaviour
implemented in the database. Because of special implementation aspects these
methods (UDF) can almost exclusively be executed at the server side, or, if
these UDF or special client invokable pendants are executed at the client side,
it cannot be guaranteed that these pendants perform the original semantics. For
example, there may be complex dependencies between UDF and integrity
constraints, e. g., referential integrity constraints and triggers, which are
implemented by using SQL and are automatically ensured by the DBMS. Thus, it is
almost impossible to support calling UDF at the OOPL level in the same
(‘natural’) way as usually object methods can be called. There-fore, we do not
consider a mapping of object methods in this paper and restrict our
considerations to navigational and set-oriented access.
Mapping Rules
In the previous section, we outlined the conceptual distance between the OO
paradigm and SQL: 1999. Considering an individual ORDBMS, its OO features
determine the overhead which has at least to be spent in order to bridge this
distance. Nevertheless, in theory there is an entire spectrum of possibilities
to design the required mapping layer. At this point, we want to mention that
there are some more aspects of ORDBMS, which OO applications may benefit from,
but which cannot be captured in this paper, e. g., and facilities for
integrating external data sources into database processing. The other hand, on
how far the OO features are to be exploited. Regarding the first point
(‘natural’ coding), we demand that the programmer must not be burdened by having
to take data management aspects into account. Thus, programming must be
independent of the database as well as the mapping layer design. Regarding the
second point (degree of exploiting OO features), we want to outline the two
extremes of the mentioned spectrum, i.e., pure relational mapping and full
exploitation of the OO features offered by the considered ORDBMS
Pure Relational Mapping:
-
As mentioned before, there are several commercial POS mapping OO structures to
relational tables. Objects are represented by table rows. Since RDBMS do not
support set-valued attributes, user-defined data types, and object references,
additional tables are required to store corresponding data and to connect them
with the corresponding class tables via foreign keys. Thus, several tables may
be required to map a given class. Principally, there are several ways of
representing a class hierarchy in the relational model. After studying pros and
cons, we decided to use the horizontal partitioning approach, since it provides
good performance in most cases, and is also used in most commercial POS.
Object-Relational Mapping:
-
Exploiting the OO features of ORDBMS is commonly argued to be more promising,
but the real benefits in comparison to the pure relational mapping are not very
well studied yet. This paper will give some performance evaluations later on.
Before outlining general mapping rules exploiting OO features of ORDBMS, we have
to re-emphasize the following point. Our benchmark approach, which will be
out-lined in the subsequent section, assesses a certain ORDBMS by taking the
required mapping overhead into account. In order to be fair, the mapping layer
used throughout the measurements must be designed in an optimal way w. r. t. the
capabilities of the ORDBMS considered. Therefore, the design of the mapping
layer may differ with the ORDBMS to be assessed. In the following, we just
outline general mapping rules, which are based on the SQL: 1999 concepts, in
order to provide some basic understanding on how a mapping layer can be
designed. A C++ class maps to a UDT in SQL: 1999. Non-instantiable UDT
corresponds to abstract C++ classes. A UDT is associated with exactly one table
(typed table) to initialize its instances. Each tuple in this table
represents a persistent instance (object) of a particular class and is
associated with a system-generated OID. Embedded objects (aggregation)
entirely belong to their top-level object and, therefore, do not own an OID.
Extents are mapped to the list constructor of C++ STL. Keys are managed at the
mapping layer by applying the map constructor of C++ STL. A C++ class hierarchy
maps to a hierarchy of structured UDT. However, SQL: 1999 only supports
single-inherit-. So that multi-inheritance has to be simulated at the mapping
layer. SQL references are mapped to C++ pointers. Since SQL: 1999 does not
support diametric references, a relationship type is broken down into two
separate primary-key/foreign-key connections and the mapping layer maintains the
referential integrity. Except for mutator and observer methods, which are
generated by the mapping layer w. r. t. the constraints de-fined in the user
database schema, object behaviour is not yet considered in our performance
investigation. Navigation is supported by offering the function
GetObject(Ref) which, in the case that the DB API does not directly support
OID-based object fetching, is implicitly transformed into an SQL query. After
having discussed (modelling and operational) discrepancies between ORDBMS and
OOPL as well as the mapping rules needed to bridge the gap, we proceed with our
performance evaluations.
Performance Evaluation
Our discussion shows that there is only a small difference between the OO and OR
paradigms w. r. t. modelling aspects, but a considerable distance w. r. t. the
operational aspects and the application semantics. In order to further evaluate
this distance as well as to quantify the overhead required for bridging this
gap, we propose a configurable benchmark approach [18, 26]. Remind, we do not
consider OODBMS, but ORDBMS, because we more and more have to face the situation
that people are using OOPL for software development and (O) RDBMS for data
management purposes so that there is a need for a more detailed examination of
the efficiency of possible coupling mechanisms. Consequently, the OO-Benchmark
representing an important standard for benchmarking OO systems is not
appropriate for our purposes. The performance of RDBMS or ORDBMS has
traditionally been evaluated in isolation by applying a standard benchmark
directly at the DBMS interface. Sample benchmarks are the Wisconsin benchmark,
the TPC benchmark as well as the Bucky benchmark [6]. These benchmarks are very
suitable for comparing different DBMS with each other. However, none of these
benchmarks helps to assess the contributions of a DBMS to OO software
development. Consequently, these other approaches do not take the typical
application server architecture and the fact that the DBMS capabilities
determine the overhead of the required application/mapping layer into account.
Furthermore, data types as well as operations of the applications we consider
may differ significantly (double-edged sword), so that a standard benchmark can
not cover the entire spectrum. Therefore, we propose an open, configurable
benchmark system allowing examining the entire system (incl. mapping layer) w.
r. t. to its typical applications. Such a system will also help us to get
results transcending those reported on in this paper (see succeeding sections),
e. g., and more detailed examinations of navigational support. In the following,
we outline our first prototype.
Benchmark System
An open, configurable benchmark system is not necessarily difficult to be
applied, as our approach proves. Indeed, our current prototype offers predefined
configurations, which w. r. t. data-base size are small, medium
and large in order to be sufficiently scalable. Both, structures (data
type and type hierarchies) as well as complexity of data in the standard
configurations are determined in cooperation with one of the eading software
vendors for business standard software and, thus, represent a wide spectrum of
typical application domains. In addition, our benchmark system can be simply
configured according to the particular properties of a concrete application.
Among other possibilities, users can directly control the generation process of
the benchmark database, e. g., specify the database size, the complexity of the
class hierarchy and the complexity of the individual objects, in order to take
care that the special requirements of the application in mind are taken into
account. After having generated the benchmark database the user may select from
a given set of query templates and indicate how many times each template is to
be instantiated.
New query templates can be easily added, if the existing templates do not
reflect application characteristics sufficiently. Based on these user
selections/ specifications the load generator creates a set of queries which is
passed to the query executor, which, in turn, serves as a kind of driver for
measurements. Users can also specify which kinds of measurement data are to be
collected by the system, i. e., amount of time spent at the DB or the mapping
layer for query transformation, or the time spent for SQL query evaluation, data
loading, and/or result set construction. Corresponding values are collected by
the data collector during execution of the query set and after-wards stored in
the DBS for further evaluations. As explained in more detail in , the special
challenges of this benchmark approach are, on one hand, to properly take into
account the requirements of OO system development, and, on the other hand, to
guarantee an optimal mapping w. r. t. the particular capabilities of an
individual ORDBMS.
Benchmark and Measurements
In order to get a complete performance evaluation, we concentrate on answering
the following questions: 1. which performance gains offer ORDBMS in comparison
to RDBMS regarding their usage in OO software development?
2. Which additional overhead has to be spent at the mapping layer in order to
bridge the gap between the OO and OR paradigms and how does it behave facing
different query types?
3. To which extent is the system performance influenced by the capabilities of
the (O) RDBMS API?
In order to be able to answer the first two questions, we have selected a set of
typical benchmarking queries according to a long-term study of a leading
software company. These queries represent a wide spectrum of typical operations
in the target applications of ORDBMS. We have compared a purely relational
mapping with an object-relational one (by means of exploiting its OO modelling
power) by using a currently available commercial ORDBMS. This way we ‘measured’
how OO software development can leverage from the OO extensions offered by
ORDBMS (e. g., structured UDT, references, etc.). The operations considered for
that purpose are implemented as query tem-plates and grouped in following
categories:
Navigation operations:
Navigation operations,
such as GetObject(OID), are not directly supported by almost all
currently available ORDBMS. Considering such operations helps us to assess the
performance of ORDBMS in supporting navigational processing. We hope that
corresponding results ‘help’ ORDBMS vendors to make ORDBMS as efficient as
OODBMS are in this concern.
Queries with simple predicates on scalar attributes:
Queries of this category
have simple predicates just containing a single comparison operation on a scalar
attribute. This group mainly serves to provide a performance baseline that can
be helpful when interpreting results of more complex queries.
Queries with predicates on UDT:
This group contains queries with
simple predicates (a single comparison operation) on attributes of structured,
non-atomic data types. Thus, it mainly serves for assessing the efficiency of
mapping UDT to (O) RDBMS.
Queries with predicates on set-valued attributes:
This group contains
queries with simple predicates (a single IN operation) on nested sets. ORDBMS
directly support set-valued data types. In the relational mapping, several
tables (according to the degree of nesting), which are connected by
primary/foreign-keys, are necessary.
Queries with path predicates:
This group contains queries evaluating
simple predicates after path traversals. These queries allow to evaluate the
efficiency of processing dereferencing operations (path traversals) in ORDBMS.
Queries with complex predicates: Queries of this group contain
complex predicates challenging both query transformation as well as query
optimization.
Queries on the class hierarchy:
While all other queries exclusively
deliver direct in-stances of a single queried class, queries of this group
deliver transitive instances as well. Predicates conform to those of the second
category. This group of queries allows to evaluate the efficiency of the ORDBMS
in handling class hierarchies (inheritance). The comparison with the relational
mapping has been expected to demonstrate the advantages of ORDBMS. The third
question posed at the beginning of this section deals with the capabilities of
the DB interface especially w. r. t. support for complex structured objects and
navigational access. In order to examine these aspects, we performed
measurements on two different (commercially successful) ORDBMS. One of these
systems offers the more traditional interface, whereas the second one provides
some basic means of supporting complex structures objects. We performed our
measurements 4 on a benchmarking database with 100 classes and 250000 instances
(configuration medium). In order to use a representatively structured
class hierarchy, we studied typical application scenarios of a renowned vendor
of business standard software and parameterized our population algorithm
accordingly. We measured the database time (DB time) and the total system time
(TS time). The DB time of SQL queries is the time elapsed between delegating the
queries to the DBMS and receiving back the results (open cursors, traverse
iterators). It includes the time for client/ server communication, the time for
evaluating the queries within the DBS and the time for loading the complete
result sets. This has to be taken into account, when analyzing the measurement
results. The TS time is defined as the total elapsed time from issuing a query
operation at the OOPL level until having received the complete result set. It
contains the time spent within the mapping layer as well as the DB time. We
think that these 3 questions have to be answered before we can think about, how
OR technology can be improved in order to support OOPL better and more
efficiently. In the following section, we report on our measurement results.
Measurement Results and Observations
ORDBMS vs. RDBMS
In the first test series, we have compared a purely relational mapping with an
OR mapping by using one of the leading currently available commercial ORDBMS.
The hardware and the soft-ware configurations are left unspecified, to avoid the
usual legal and competitive problems with publishing performance numbers for
commercial products.
All performance measurements are averages of multiple trials, with more trials
for higher variance measurements. For each DBMS tested, we put much effort in
optimization (e.g., indexes) and mapping layer design in order to achieve the
best performance possible. Vestigation aims at quantifying the benefits of OO
extensions offered by ORDBMS in more detail. Due to space restrictions, it is
not possible to analyze all results in detail. It can be ob-served that the OR
mapping outperforms the purely relational mapping in all query categories.
Although the OR mapping shows only tiny ad-vantages in retrieving small result
sets, it provides performance gains of up to 40% in retrieving large result
sets, or processing queries on class hierarchies. The reasons are twofold.
First, the OO features provided by the ORDBMS contribute to keep the complexity
of the mapping layer low (better query evaluation strategy, less overhead for
synthesizing the result set) and to reduce client/ server communication (less
queries). Second, the implementation of the ORDBMS (we used) enables performance
improvements, (even) if using OO extensions. In the purely relational case, it
is not possible to map a (complex) class to exactly one table. Mapping
set-valued attributes, aggregations and (m:n)-relationships properly demands
several tables interconnected by primary/foreign keys.
DB Time
Performance Characteristics of the Mapping Layer
In order to characterize the performance of the mapping layer adequately, we
have investigated simple queries with different selectivity. The results are
presented in Figure. As already mentioned in section 2, the DB APIs of almost
all currently available ORDBMS do not support navigational access directly.
Hence, navigational operation, such as GetObject (Ref), must be transformed to a database query (SQL: 1999), such
as "Select * from... Where OID = Ref", by the mapping layer. Such a query strategy, especially
when intensively dealing with navigational operations as usually required by
most OO applications, obviously leads to high processing overhead spent in the
database system as well as very high communication costs (over 70% of the entire
system time). The (probably not very astonishing) observation is that the
traditional query strategy is not adequate for supporting navigational access.
The DB time of set-oriented queries shows only a slight ascent with increasing
result sets, while the additional mapping overhead increases rapidly. When
retrieving 1250 objects, the time spent at the mapping layer even exceeds 86% of
the total system time. This observation can be explained as follows. In the
early days of ORDBMS, these systems comparable to RDBMS were not very successful
in supporting navigational access, but excellent in processing set-oriented
access (as they are still today). Unfortunately, OO applications can hardly
benefit from this advantage, because the ORDBMS API is ‘inherited’ from
traditional RDBMS and, therefore, still only supports simple, flat data. In lack
of an extensible DB API which may generically support complex data types
defined by the user, complex objects in an ORDBMS have to be first
‘disassembled’ into scalar values, and afterwards reconstructed (at the mapping
layer) to objects of a certain class in the particular OOPL. This kind of
overhead gets dramatic with increasing result set cardinality and impairs the
en-tire system efficiency significantly. Regarding these measurement results, we
can draw the following conclusions. In order to be able to support navigational
access better, the ORDB API should directly sup-port navigational operations
like GetObject(Ref), so that the costs of transforming navigational
operations to SQL queries and for evaluating these queries can be avoided.
Furthermore, it should also support the notion of complex objects directly and
offer the possibility of retrieving complex objects as units. According to our
examinations, such improvements can increase the entire system efficiency by up
to 400%.
Supports for Complex Objects
As already mentioned before, the lack of direct support for complex objects and
navigational access, the API level extremely impairs the overall system
efficiency. Fortunately, a leading ORDBMS vendor already offers an extended call
level interface, which, as we can see later in this section, directly supports
navigational access as well as retrieval of complex objects as units, and, in
addition, even retrieval of complex object graphs as units. Navigation is
enabled by the possibility of autonomously retrieving complex structured objects
(by OID) as instances of C structures. This simplifies the mapping to OOPL, such
as C++, considerably and, therefore, is undoubtedly the first step into the
right direction, although this mechanism does not yet supports the actually
wanted seamless coupling (transparent transformation from a database object to
an instance of an OOPL class). The mentioned support for complex objects at the
level of the DB API allows to directly retrieving a complex object’s data from
the database into the main memory by specifying its OID or a predicate.
Therefore, the expensive query processing strategy described earlier, that can
be avoided. Remind that the measurements described in section earlier, that have
been performed on an ORDBMS that does not possess a DB API as the one described
in this section. To show the importance of and the corresponding demand on a
suitable support for complex objects at the DB API level, we repeated the
measurements described earlier, that on the ORDBMS referred to in this section
and providing the mentioned complex object support at its API. Illustrates the
measurement results. Obviously, the additional overhead spent at the mapping
layer is now independent from the cardinality of the query result sets. Thus,
the direct supports of complex objects at the DB API results in a clear
performance gain (up to 400%). The direct support for navigational access at the
DB API level mentioned in this section, allowing to directly access objects by
calling a function like Get-Object( Ref), avoids expensive processes
(query transformation, data types conversion and object re-construction). This
obviously contributes to improve performance significantly. Fig. 6a shows a
comparison between a query strategy (transforming a navigational operation to an
SQL query) and a navigational strategy (directly calling a GetObject(OID)
function at the DB API). The advantages of the navigational strategy are
obvious. With a direct support of navigational access the entire system
efficiency increases by approximately 200%. OO applications often want a set of
objects interconnected by object references (object graph) to be retrieved
completely within just a single database interaction. Fig. 6b shows a comparison
of two strategies for retrieving complex object graphs. In this measurement, we
used the ORDBMS directly supporting navigation as well as retrieval of complex
object graphs. It can be seen clearly that strategy I exploiting the ability of
retrieving object graphs exhibits a performance gain of about 100% already at a
result set cardinality of 13 objects. The design of a new DB API, which directly
supports complex structured objects, is by no means an easy job and requires
generic design methods, because user-defined data types can be arbitrarily
structured, e. g., contain other complex data types, such as UDT, references and
set-valued attributes. Furthermore, a DB API has always to be multi-lingual
requiring to support all common programming languages simultaneously and,
therefore, making it very difficult to offer the best of both worlds (DBMS and
OOPL) without any compromises.
Conclusions
We have emphasized the importance of assessing ORDBMS w. r. t. their
capabilities of supporting OOPL. We have first qualitatively considered ORDBMS
and OOPL regarding modelling and operational aspects relevant for
object-oriented software development. As the object-relational (SQL: 1999) and
the object-oriented data model are essentially coming together, the operational
distance between these two paradigms is still considerable so that an additional
mapping layer is necessary to over-come this gap. Regrettably, such an
additional layer impairs the performance of the overall system considerably.
Additionally, we performed quantitative examinations (measurements) in order to
assess ORDBMS in their capabilities of supporting OOPL. Indirectly, these
measurements are supposed to contribute to promoting the optimal utilization of
currently available ORDBMS in object-oriented system development and to guide
the future development of ORDBMS in a way that the support for OOPL is improved.
Regarding our performance examinations, we have motivated the necessity of an
open, configurable benchmark approach, because not only the performance of
ORDBMS themselves but also the additional overhead, which is necessary for
bridging the conceptual and operational distance between ORDBMS and OOPL, have
to be taken into account and, therefore, properly characterized. It has been
clearly illustrated by our performance measurements that object-relational
database technology gets more and more mature, not only conceptually (data
model, query processing), but also w. r. t. performance. The model facilities
contribute to keep the mapping layer ‘thin’ in contrast to RDBMS. This, on one
hand, reduces the implementation efforts, and, on the other hand, increases the
entire system efficiency. Despite the available object-oriented ex-tensions,
which entail an unambiguous gain in comparison to RDBMS, the potential benefit
of object-relational database technology in our opinion is not yet exhausted,
since the traditional DB API is so far not capable of successfully supporting
object-oriented principles. The DB API of almost all ORDBMS still can not
support navigational operations and complex object structures directly so that
new generation software systems can not take advantage of object-relational
database technology in an optimal way. It can be called the ‘bottleneck’ between
the object-relational and object-oriented paradigms. Our examinations have shown
clearly that a new DB API directly supporting navigational operations and
complex objects is necessary. Obviously, it is not easy to equip ORDBMS with
such a new interface. For example, such an interface must be multi-lingual. For
answering the question, how user-defined data types can be effectively
represented at the OOPL level, further research efforts are required.
Altogether, the problem of a seamless and effective mapping of SQL: 1999 to
OOPL, such as C++ and Java, has to be worked on further. Our future work will
mainly be looking for possible solutions. Furthermore, we plan intensive studies
of ORDBMS capabilities for supporting navigational access, because ORDBMS are
still behind OODBMS in this concern. Generally, after having characterized the
performance aspects in more details, our long-term objective is to develop
applicable concepts contributing to increase the performance of ORDBMS.
Bibliography
References
1. Bernhard, R., Flehmig, M., Mahdoui, A., Ritter, N., Steiert, H.-P., Zhang,
W.P.:
“Building a Persistent Class System on top of (O)RDBMS - Concepts and
Evalua-tions”,
Internal Report, University of Kaiserslautern, 1999
2. Bernstein, P.A., Harry, B., Sanders, P.J., Shutt, D., Zander, J.: “The
Microsoft Re-pository”,
Proc. VLDB Conf., 1997, pp. 3-12
3. Bernstein, P.A., Pal, S., Shutt, D.: “Context-Based Prefetch for
Implementing Ob-jects
on Relations”,
Proc. VLDB Conf., 1999, pp. 327-338
4. Bitton, D., DeWitt, D.J., Turbyfill, C.: “Benchmarking Database Systems: A
Sys-tematic
Approach”,
Proc. VLDB Conf., 1983, pp. 8-19
5. Carey, M.J., DeWitt, D.J.: “Of Objects and Databases: A Decade of Turmoil”,
Proc. VLDB Conf., 1996, pp. 3-14
6. Carey, M.J., DeWitt, D.J., Naughton, J.F., Asgarian, M., Brown, P., Gehrke,
J.E.,
Shah, D.N.: “The Bucky Object-Relational Benchmark”, Proc. VLDB Conf.,
1996,
pp. 135-146
7. Carey, M.J., DeWitt, D.J., Kant, C., Naughton, J.F.: “A Status Report on
the OO7
OODBMS Benchmarking Effort”,
Proc. ACM OOPSLA, 1994, pp. 414-426
8. Carey, M.J., Doole, D., Mattos, N.M.: “O-O, What Have They Done to DB2?”,
Proc. 1999 25th. VLDB Conf., pp. 542-553
9. Cattell, R.G.G., Barry, D., Bartels, D., et al: “The Object Database
Stand-ard:
ODMG 2.0”,
Morgan-Kaufman Publishers, San Mateo, 1997
10. Gray, J.: “The Benchmark Handbook for Database and Transaction Processing
Systems”,
Morgen Kaufmann Publishers, San Mateo, CA, USA, 2nd Ed., 1993
11. Gulutzan, P., Pelzer, T.: “SQL-99 Complete, Really”, R&D
Publications, 1999
12. Keller, A., Jensen, R., Agrawal, S.: “Persistence Software: Bridging
Object-Ori-ented
Programming and Relational Database”, Proc. ACM SIGMOD Conf., 1993,
pp. 523-528
13. Mahnke, W., Steiert, H.-P.: “The Application Protential of ORDBMS in the
Design
Environments“,
Proc. CAD 2000, Berlin, pp. 219-239 (in German)
14. Ontos Business Data Server, http://www.ontos.com
15. Poet Object Server, POET Software, POET SQL Object Factory,
http://poet.com/
16. Rao, B.R.: “Object-oriented Databases: Technology, Applications, and
Products”,
McGraw-Hill, New York, 1994
17. RogueWave Software, DBTools.h++,
http://www.roguewav3e.com/products/dbtools/
|
|
||||||||
|