A Virtual Iterative Design Tool
Physical Database Design
Database Development Process
Ming Wang , Russell K. Chan , in Encyclopedia of Information Systems, 2003
I.E. Physical Design
The aim of physical database design is to decide how the logical database design will be implemented. For a relational database, this involves:
- Defining the set of table structures, data types for fields, and constraints on these tables, such as primary key, foreign key, unique key, not null, and domain definitions that check whether data fall outside the permitted range (see the SQL sketch following this list).
- Identifying the specific storage structures and access methods needed to retrieve data efficiently, for example, adding a secondary index to a relation.
- Designing security features for the database system, including account creation, privilege granting/revocation, access protection, and security level assignment.
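As a rough illustration of these three activities, the sketch below uses generic SQL with invented table, column, and role names; the exact syntax for constraints, indexes, and privileges varies from one DBMS to another.

```sql
-- Constraint definitions (invented example tables).
CREATE TABLE project (
    project_id   INTEGER      PRIMARY KEY,          -- primary key
    project_code VARCHAR(10)  NOT NULL UNIQUE       -- not null and unique key
);

CREATE TABLE task (
    task_id    INTEGER      PRIMARY KEY,
    project_id INTEGER      NOT NULL REFERENCES project(project_id),  -- foreign key
    task_name  VARCHAR(80)  NOT NULL,
    hours      DECIMAL(6,2) CHECK (hours BETWEEN 0 AND 999)           -- domain range check
);

-- Storage structure / access method: a secondary index for frequent lookups by name.
CREATE INDEX idx_task_name ON task (task_name);

-- Security: grant a limited privilege to a reporting role (assumed to exist).
GRANT SELECT ON task TO reporting_role;
```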
Physical design is DBMS-specific, whereas logical design is DBMS-independent. Logical design is concerned with the what; physical database design is concerned with the how. In short, physical design is the process of implementing a database on secondary storage with a specific DBMS.
URL: https://www.sciencedirect.com/science/article/pii/B0122272404000265
Physical Database Considerations
Charles D. Tupper , in Data Architecture, 2011
Queries, Reports, and Transactions
Part of the consideration for physical database design is the activity being passed against it. The transaction, query, or report creates a unit of work that threads its way through the database in a traversal route that can be mapped. Some of the process mapping has been covered in Chapters 9 and 10, but a small recap would not hurt here. Functional decomposition in those chapters was defined as the breakdown of activity requirements in terms of a hierarchical ordering and is the tool for analysis of activity. The function is at the top of the hierarchy and is defined as a continuously occurring activity within the corporation. Within each function are many processes. Processes have a start activity, a process activity, and a termination activity, which completes the process. Each process may or may not be broken down into subprocesses. Each subprocess or event also has an initiation, an activity state, and a termination, and differs from the process in that it represents activity at the lowest level.
URL: https://www.sciencedirect.com/science/article/pii/B9780123851260000152
Database Administration
Ming Wang , in Encyclopedia of Information Systems, 2003
II.C. Physical Design
Database administration is typically responsible for physical database design and much of database implementation. Physical design is the process of choosing specific structures and access paths for database files to achieve good performance for the various database applications. Each DBMS provides a variety of options for file organization and access paths. These include various types of indexing and clustering of related records on disk blocks. Once a specific DBMS is selected, the physical design process is restricted to choosing the most appropriate structure for the database files from the options offered by that DBMS. One of the advantages of a relational database is that users are able to access relations and rows without specifying where or how the rows are stored. The internal storage representation for relations should be transparent to users in a relational database.
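As one hedged example of such an option, the Oracle-flavored sketch below clusters related customer and order rows on the same disk blocks; the object names are invented, and other DBMSs expose clustering differently (for instance, as clustered indexes).

```sql
-- Oracle-flavored sketch (names invented): store rows that share a customer_id
-- on the same disk blocks so customer-plus-orders lookups read fewer blocks.
CREATE CLUSTER customer_order_clu (customer_id NUMBER) SIZE 512;
CREATE INDEX customer_order_clu_idx ON CLUSTER customer_order_clu;

CREATE TABLE customers (
    customer_id NUMBER PRIMARY KEY,
    name        VARCHAR2(100)
) CLUSTER customer_order_clu (customer_id);

CREATE TABLE orders (
    order_id    NUMBER PRIMARY KEY,
    customer_id NUMBER REFERENCES customers,
    order_date  DATE
) CLUSTER customer_order_clu (customer_id);
```

Queries against customers and orders are written exactly as they would be for unclustered tables, which is the transparency of storage representation described above.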
URL: https://www.sciencedirect.com/science/article/pii/B0122272404000253
Basic Requirements for Physical Design
Charles D. Tupper , in Data Architecture, 2011
Data Access
In order to do a proper physical database design, it is important to understand how, and how frequently, data will be accessed. Where does this information come from? Ideally, process models should contain references to business functions that indicate how frequently a business process will be followed. This can be translated into pseudo-SQL (pseudo-code that does not need to parse but needs to contain access and ordering information). The criticality and concurrency of transactions are also important. This section covers the following subparts of information vital to the physical design of a high-performance database system.
- Access implications: Data gathering and analysis must be done in the manner in which the user accesses the data. Additionally, the tools used for the access must be taken into consideration. For example, reporting tools are often broad spectrum—that is, they work with many different DBMSs and as such use very generic methods for access. Unless they have a pass-through option, like WebFocus does for Microsoft Access and SQL Server, the passed-through query will have poor access performance. If the access method is through a GUI front end that invokes DBMS stored procedures, triggers, or functions, then it is far more tunable for performance.
- Concurrent access: Concurrent access is of concern for two reasons: network load and locking contention. Network load is not discussed here. Locking implications depend on the required access. If the data are required to be held static—that is, unchanged—an exclusive lock must be secured by the program executing the action. This exclusive lock prevents others from accessing the data while it is in use. There is an option to allow a read of the information while it is locked, knowing it will be changed; this is known as a dirty read and is done when the data needed are not those being updated. When too many programs try to access the same data, locking contention develops and a lock protocol is invoked, depending on the DBMS involved. In some cases the lock is escalated to the next higher object level in order to prevent a buildup of processes waiting to execute (see the locking sketch after this list).
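The following sketch, written in generic SQL with invented table and column names, contrasts an exclusive-style lock with a dirty read; lock behavior and the availability of the READ UNCOMMITTED isolation level differ between DBMSs, so treat this purely as an illustration.

```sql
-- Hold the selected row static until commit: FOR UPDATE takes a lock that
-- blocks other writers (and, in lock-based DBMSs, other readers) of that row.
BEGIN;
SELECT balance
  FROM account
 WHERE account_id = 42
   FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE account_id = 42;
COMMIT;

-- Dirty read: read without honoring writers' locks, accepting that
-- uncommitted (and possibly rolled-back) values may be seen.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM account WHERE region = 'EU';
```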
URL: https://www.sciencedirect.com/science/article/pii/B9780123851260000140
Data Warehousing and Caching
AnHai Doan , ... Zachary Ives , in Principles of Data Integration, 2012
10.1.1 Data Warehouse Design
Designing a data warehouse can be even more involved than designing a mediated schema in a data integration setting because the warehouse must support very demanding queries, possibly over data archived over time. Physical database design becomes critical: effective use of partitioning across multiple machines or multiple disk volumes, creation of indices, and definition of materialized views that can be used by the query optimizer. Most data warehouse DBMSs are configured for query-only workloads, as opposed to transaction processing workloads, for performance: this disables most of the (expensive) consistency mechanisms used in a transactional database.
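As a hedged illustration of these techniques, the Oracle-flavored sketch below range-partitions a fact table by date, adds a partitioned index, and defines a materialized view the optimizer may use to answer aggregate queries; all object names are invented, and other warehouse DBMSs use different syntax for partitioning and materialized (or indexed) views.

```sql
-- Oracle-flavored sketch (invented names): partition the fact table by month-range
-- so queries and maintenance touch only the relevant partitions or disk volumes.
CREATE TABLE sales_fact (
    sale_date  DATE         NOT NULL,
    product_id NUMBER       NOT NULL,
    amount     NUMBER(10,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),
    PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01')
);

-- A locally partitioned index supporting selective product lookups.
CREATE INDEX sales_fact_prod_ix ON sales_fact (product_id) LOCAL;

-- A materialized view the query optimizer can rewrite aggregate queries against,
-- instead of scanning the full fact table each time.
CREATE MATERIALIZED VIEW monthly_sales_mv
ENABLE QUERY REWRITE AS
SELECT TRUNC(sale_date, 'MM') AS sale_month,
       product_id,
       SUM(amount)            AS total_amount
  FROM sales_fact
 GROUP BY TRUNC(sale_date, 'MM'), product_id;
```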
Since the early 2000s, all of the major commercial DBMSs have attempted to simplify the tasks of physical database design for data warehouses. Most tools have "index selection wizards" and "view selection wizards" that take a log of a typical query workload and perform a (usually overnight) search over alternative indices or materialized views, seeking the combination that best improves performance. Such tools help, but there is still a need for expert database administrators and "tuners" to obtain the best data warehouse performance.
URL: https://www.sciencedirect.com/science/article/pii/B9780124160446000107
Foreword
John Zachman , in Information Modeling and Relational Databases (Second Edition), 2008
There is one more interesting dimension of these rigorous, precise semantic models: they have to be transformed into databases for implementation. The authors describe in detail and by illustration the transformation to logical models, to physical database design, and to implementation. In this context, it is easy to evaluate and compare the various database implementation possibilities including relational databases, object-oriented databases, object-relational databases, and declarative databases; and they throw in star schemas and temporal databases for good measure! Once again, I cannot remember seeing so dispassionate and objective an evaluation and comparison of the various database structures. Within this context, it is straightforward to make a considered and realistic projection of database technology trends into the foreseeable future.
URL: https://www.sciencedirect.com/science/article/pii/B9780123735683500011
Designing a Warehouse
Lilian Hobbs , ... Pete Smith , in Oracle 10g Data Warehousing, 2005
2.1.1 Don't Use Entity-Relationship (E-R) Modeling
The typical approach used to construct a transaction-processing system is to construct an entity-relationship (E-R) diagram of the business. It is then ultimately used as the basis for creating the physical database design, because many of the entities in our model become tables in the database. If you have never designed a data warehouse before but are experienced in designing transaction-processing systems, then you will probably think that a data warehouse is no different from any other database and that you can use the same approach.
Unfortunately, that is not the case, and warehouse designers will quickly discover that the entity-relationship model is not really suitable for designing a data warehouse. Leading authorities on the subject, such as Ralph Kimball, advocate using the dimensional model, and we have found this approach to be ideal for a data warehouse.
An entity-relationship diagram can show us, in considerable detail, the interaction between the numerous entities in our system, removing redundancy in the system whenever possible. The result is a very flat view of the enterprise, where hundreds of entities are described along with their relationships to other entities. While this approach is fine in the transaction-processing world, where we require this level of detail, it is far too complex for the data warehouse. If you ask a database administrator (DBA) if he or she has an entity-relationship diagram, the DBA will probably respond that he or she did once, when the system was first designed. But due to its size and the numerous changes that have occurred in the system during its lifetime, the entity-relationship diagram hasn't been updated, and it is now only partially accurate.
If we use a different approach for the data warehouse, one that results in a much simpler picture, then it should be very easy to keep it up-to-date and also to give it to end users, to help them understand the data warehouse. Another factor to consider is that entity-relationship diagrams tend to result in a normalized database design, whereas in a data warehouse, a denormalized design is often used.
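To make the contrast concrete, here is a minimal star schema sketch in generic SQL; the table and column names are invented, and a real dimensional model would carry many more attributes. One central fact table references a handful of deliberately denormalized dimension tables, which is what keeps the picture simple enough to share with end users.

```sql
-- Dimension tables: wide, denormalized, easy for users to browse.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR(12),
    year_number   INTEGER
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)   -- stored inline rather than normalized into its own table
);

-- Fact table: one row per sale, foreign keys into each dimension plus measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      DECIMAL(12,2)
);
```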
URL: https://www.sciencedirect.com/science/article/pii/B9781555583224500047
CASE Tools for Logical Database Design
Toby Teorey , ... H.V. Jagadish , in Database Modeling and Design (Fifth Edition), 2011
Introduction to the CASE Tools
In this chapter we will introduce some of the most popular and powerful products available for helping with logical database design: IBM's Rational Data Architect, Computer Associates' AllFusion ERwin Data Modeler, and Sybase's PowerDesigner. These CASE tools help the designer develop a well-designed database by walking through a process of conceptual design, logical design, and physical creation, as shown in Figure 11.2.
Figure 11.2. Database design process.
Computer Associates' AllFusion ERwin Data Modeler has been around the longest. A stand-alone product, AllFusion ERwin's strengths stem from relatively strong support of physical database modeling, the broadest set of technology partners, and third-party training. What it does it does well, but in recent years it has lagged in some advanced features. Sybase's PowerDesigner has come on strong in the past few years, challenging AllFusion ERwin. It has some advantages in reporting, and advanced features that will be described later in this chapter. IBM's Rational Data Architect is a new product that supplants IBM's previous product Rational Rose Data Modeler. Its strength lies in strong design checking; rich integration with IBM's broad software development platform, including products from their Rational, Information Management, and Tivoli divisions; and advanced features that will be described below.
In previous chapters, we have discussed the aspects of logical database design that CASE tools help design, annotate, apply, and modify. These include, for example, entity–relationship (ER) and Unified Modeling Language (UML) modeling, and how this modeling can be used to develop a logical database design. Within the ER design, there are several types of entity definitions and relationship modeling (unrelated, one-to-many, and many-to-many). These relationships are combined and normalized into schema patterns known as normal forms (e.g., 3NF, snowflake schema). An effective design requires the clear definition of keys, such as the primary key, the foreign key, and unique keys within relationships. The addition of constraints to limit the usage (and abuses) of the system within reasonable bounds or business rules is also critical. The effective logical design of the database will have a profound impact on the performance of the system, as well as the ease with which the database system can be maintained and extended.
There are several other CASE products that we will not discuss in this book. A few additional products worth investigating include Datanamic's DeZign for Databases, QDesigner by Quest Software, Visible Analyst by Standard, and Embarcadero ER/Studio. The Visual Studio .NET Enterprise Architect edition includes a version of Visio with some database design stencils that can be used to create ER models. The cost and function of these tools varies wildly, from open-source products up through enterprise software that costs thousands of dollars per license.
The full development cycle includes an iterative cycle of understanding business requirements; defining product requirements; analysis and design; implementation; test (component, integration, and system); deployment; administration and optimization; and change management. No single product currently covers that entire scope. Instead, product vendors provide, to varying degrees, suites of products that focus on portions of that cycle. CASE tools for database design largely focus on the analysis and design portion, and to a lesser degree, the testing portion of this iterative cycle.
CASE tools provide software that simplifies or automates some of the steps described in Figure 11.2. Conceptual design includes steps such as describing the business entities and functional requirements of the database; logical design includes definition of entity relationships and normal forms; and physical database design helps transform the logical design into actual database objects, such as tables, indexes, and constraints. The software tools provide significant value to database designers by:
1. Dramatically reducing the complexity of conceptual and logical design, both of which can be rather difficult to do well. This reduced complexity results in better database design in less time and with lower skill requirements for the user.
2. Automating transformation of the logical design to the physical design (at least the basic physical design). This not only reduces time and skill requirements for the designer, but also greatly reduces the chance of manual error in performing the conversion from the logical model to the physical data definition language (DDL), which the database server will "consume" (i.e., take as input) to create the physical database (see the DDL sketch after this list).
3. Providing the reporting, roundtrip engineering, and reverse engineering that make such tools invaluable in maintaining systems over a long period of time. System design can and does evolve over time due to changing and expanding business needs. Also, the people who design the system (sometimes teams of people) may not be the same as those charged with maintaining the system. The complexity of large systems combined with the need for continuous adaptability virtually necessitates the use of CASE tools to help visualize, reverse engineer, and track the system design over time.
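As a hedged example of item 2, the DDL below approximates what such a tool might emit when it converts a logical many-to-many relationship (say, students enrolling in courses) into physical tables; the identifiers are invented, and generated DDL will differ by tool and by target DBMS.

```sql
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    full_name  VARCHAR(80) NOT NULL
);

CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     VARCHAR(120) NOT NULL
);

-- The logical many-to-many relationship becomes an associative (junction) table
-- whose composite primary key is built from the two foreign keys.
CREATE TABLE enrollment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_id   INTEGER NOT NULL REFERENCES course(course_id),
    enrolled_on DATE,
    PRIMARY KEY (student_id, course_id)
);
```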
You can find a broader list of available database design tools at the website Database Answers ( www.databaseanswers.com/modelling_tools.htm ), maintained by David Alex Lamb at Queen's University in Kingston, Canada.
URL: https://www.sciencedirect.com/science/article/pii/B9780123820204000136
Data Modeling: Entity-Relationship Data Model
Salvatore T. March , in Encyclopedia of Information Systems, 2003
I.B. Data Models and Database Implementations
A data model does not specify the physical storage of the data. It provides a precise representation of the data content, structure, and constraints required by an application. These must be supported by the database and software physically implemented for the application. The process of developing a database implementation (schema) from a data model is termed physical database design. In short, the data model defines what data must be represented in the application, and the database schema defines how that data is stored. The goal of data modeling, also termed conceptual database design, is to accurately and completely represent the data requirements. The goal of physical database design is to implement a database that efficiently meets those requirements.
Clearly there must be a correspondence between a data model and the database schema developed to implement it. For example, a data model may specify that each employee must report to exactly one department at any point in time. This is represented as a relationship between employees and departments in the data model. This relationship must have a physical implementation in the database schema; however, how it is represented is not of concern to the data model. That is a concern for the physical database design process. In a relational DBMS (RDBMS), relationships are typically represented by primary key-foreign key pairs. That is, the department identifier (primary key) of the department to which an employee reports is stored as a column (foreign key) in the employee's record (i.e., row in the Employee table). In an object DBMS relationships can be represented in a number of ways, including complex objects and embedded object identifiers (OIDs).
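For instance, the relational realization described above might look like the following sketch; the SQL is generic, the column names are invented, and this is only one of several ways the relationship could be implemented physically.

```sql
CREATE TABLE department (
    dept_id   INTEGER     PRIMARY KEY,   -- primary key of Department
    dept_name VARCHAR(60) NOT NULL
);

CREATE TABLE employee (
    emp_id   INTEGER     PRIMARY KEY,
    emp_name VARCHAR(80) NOT NULL,
    dept_id  INTEGER     NOT NULL              -- NOT NULL enforces "exactly one department"
             REFERENCES department(dept_id)    -- foreign key realizes the relationship
);
```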
Numerous data modeling formalisms have been proposed; however, the entity-relationship (ER) model and variations loosely termed binary-relationship models are the most widely known and the most commonly used. Such formalisms have come to be known as semantic data models to differentiate them from the storage structures used by commercial DBMSs to define a database schema. Data modeling has become a common component of system development methodologies. A number of object-oriented system development approaches, such as the Unified Modeling Language, have extended data models into what has been termed class diagrams. These use the same basic constructs as data models to represent the semantic data structure of the system, but typically extend the representation to include operations, system dynamics, and complex constraints and assertions.
URL: https://www.sciencedirect.com/science/article/pii/B0122272404000344
Data Virtualization, Information Management, and Data Governance
Rick F. van der Lans , in Data Virtualization for Business Intelligence Systems, 2012
11.2 Impact of Data Virtualization on Information Modeling and Database Design
Data virtualization has an impact on certain aspects of how databases are designed. To show clearly where and what the differences are, this book considers this design process to consist of three steps: information modeling, logical database design, and physical database design.
One of the tasks when developing a business intelligence system is to analyze the users' information needs. On which business objects do they need reports? What are the properties of those business objects? On which level of detail do they need the data? How do they define those business objects? This is information modeling, which is about getting a precise understanding of the business processes, the data these processes need, and the corresponding decision-making processes. It's an activity that requires little to no knowledge of database technology. What's needed is business knowledge. The more an analyst understands of the business and its needs, the better the results of information modeling. This step is sometimes referred to as data modeling, conceptual data modeling, or information analysis. The term information modeling is used in this book because it's the most commonly used term.
The result of information modeling, called the information model, is a nontechnical but formal description of the information needs of a group of users. Usually, it consists of a diagram describing all the core business objects, their properties, and their interrelationships. Diagramming techniques used are normally based on entity-relationship diagramming (see, for example, [54]). Another diagramming technique used regularly in business intelligence environments is based on multidimensional modeling (see [55]).
In the second step—logical database design—the information model is transformed into tables consisting of columns and keys that are implemented in a staging area, data warehouse, or data mart. These tables will hold the users' information needs. This is a semitechnical step. Normally, the result is simply a description or model of all the tables with their column and key structures.
The third step—physical database design—focuses on finding the most effective and efficient implementation of these tables for the database server in use. In this step, database specialists study aspects such as which columns need indexes, whether tables have to be partitioned, and how the physical parameters of table spaces should be set. They can even decide to restructure tables to improve performance. For example, data from two tables is joined to form a more denormalized structure, or derived and aggregated data is added to existing tables. The result of physical database design is a database model showing all the tables, their columns, and their keys. An example of such a database model is shown in Figure 11.1.
Figure 11.1. An example of a database model.
Reprinted with permission of Composite Software.
Compared to logical database design, physical database design is a very database server-specific step. This means that the best imaginable solution for an Oracle database server doesn't have to be the best solution for a Microsoft database server.
For business intelligence systems with a more classic architecture, early on in the project designers decide which data stores are needed. Should the system be built around a data warehouse, is a staging area needed, and should data marts be developed? These decisions don't have to be made when data virtualization forms the heart of a business intelligence system. Initially, only a data warehouse is created, so no data marts or personal data stores are developed at the start of the project. For performance reasons, they might be created later on.
Using data virtualization has an impact on information modeling and database design:
- Impact 1—Less Database Design Work: When a business intelligence system is developed, that three-step design process has to be applied to all the data stores needed. So information modeling and logical and physical database design have to be performed, for example, for the data warehouse, the staging area, and the data marts. An information model has to be created, and a database model has to be developed for each of these data stores. For a system based on data virtualization, information modeling is still necessary, but database design only applies to the data warehouse because there are no other data stores. Because there are fewer data stores, there is less database design work.
- Impact 2—Normalization Is Applied to All Tables: In a classic system, different database design approaches are used: normalization is quite often applied to the data warehouse, whereas the data marts usually receive a star schema or snowflake schema (see Section 2.6). In a system based on data virtualization, by contrast, all the tables of the data warehouse initially receive normalized structures. The reason they are normalized is that this is still the most neutral form of a data structure—neutral in the sense that it can support the widest range of queries and reports. Next, virtual tables are designed (according to the rules in Chapter 7). But for these virtual tables, no physical database design is needed because there are no data stores.
- Impact 3—Information Modeling and Database Design Become More Iterative: An iterative approach for information modeling and database design is easier to deploy when data virtualization is used. The initial design of a data warehouse doesn't have to include the information needs of all the users, and new information needs can be implemented step by step.
But why is this easier to deploy? When new information needs are implemented, new tables have to be added, columns may have to be added to existing tables, and existing table structures might have to be changed. In a system with a classic architecture, making these changes requires a lot of time. Not only do the tables in the data warehouse have to be changed, but the data marts and the ETL scripts that copy the data must be changed as well. And changing the tables in the data marts leads to changes in existing reports as well. Reporting code has to be changed to show the same results.
This is not the case when data virtualization is used. If the information needs change, the tables in the data warehouse have to be changed, but this doesn't apply to data marts and ETL scripts. Those changes can be hidden in the mappings of the virtual tables accessed by the existing reports. The consequence is that the extra amount of work needed to keep the existing reports unchanged is considerably less. The changes to the real tables are hidden from the reports. This is why a more iterative approach is easier to use when data virtualization is deployed.
- Impact 4—Logical Database Design Becomes More Interactive and Collaborative: Usually, logical database design is quite an abstract exercise. The designers come up with a set of table definitions. In the eyes of the business users, especially if they don't have a computing background, those definitions are quite abstract. It's sometimes difficult for them to see how those tables together represent their information needs. The main reason is that they don't always think in terms of data structures but in terms of the data itself. For example, a designer thinks in terms of customers and invoices, while a user thinks in terms of customer Jones based in London and invoice 6473, which was sent to customer Metheny Metals. Therefore, it can be hard for a user to determine whether the table structures resulting from logical database design are really what he needs.
It would be better if the data structures plus the real data are shown so the users can see what those tables represent. When data virtualization is used, a logical database model can be implemented as virtual tables. The advantage is that when a virtual table is defined, its (virtual) contents can be shown instantaneously—in other words, both the analyst and the user can browse the contents and the user can confirm that what he sees satisfies his information needs. Logical database design becomes a more collaborative and more interactive process.
- Impact 5—Physical Database Design Decisions Can Be Postponed: Physical database design changes in two ways. First, instead of having to make all the right physical design decisions upfront, many can be postponed. For example, if a report is too slow, a cache can be defined. That cache can be created instantaneously, and no existing reports have to be changed for that. A more drastic solution might be to create a data mart to which the virtual tables are redirected.
The assumption made here is that derived data stores are not needed initially and therefore require no physical database design. Second, there is less to design. If, because of data virtualization, fewer databases have to be designed, then there is less physical database design work to do. Whereas a classic architecture requires both the data warehouse and the data marts to be designed, here only the former has to be designed. This makes the process simpler.
- Impact 6—Denormalization Is Less Negative: When designing real tables, denormalization leads to duplication of data, increases the size of a database (in bytes), slows down updates and inserts, and can lead to inconsistencies in the data. These have always been seen as the main disadvantages of denormalization. Every database designer knows this, and it's on page one of every book on database design. If denormalization is applied when designing virtual tables, these assumptions are no longer true, and these disadvantages don't apply anymore. The point is that a virtual table doesn't have physical content. So if a virtual table has a denormalized structure, no redundant data is stored, the database doesn't grow, updates and inserts are not by definition slowed down, and the data doesn't become inconsistent. However, if a cache is defined for a denormalized virtual table, then the cache does contain duplicated data (see the sketch after this list).
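The sketch below ties Impacts 2, 5, and 6 together in generic SQL. It is only an analogy: an ordinary view stands in for a data virtualization server's virtual table, a materialized view stands in for its cache, and all object names are invented; real products define virtual tables and caches through their own tooling.

```sql
-- Normalized real tables in the data warehouse (Impact 2).
CREATE TABLE dw_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(80),
    city          VARCHAR(60)
);

CREATE TABLE dw_invoice (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER REFERENCES dw_customer(customer_id),
    invoice_date DATE,
    amount       DECIMAL(12,2)
);

-- A denormalized virtual table (Impact 6): nothing is stored, so no data is
-- duplicated and updates of the real tables are not slowed down by it.
CREATE VIEW v_invoice_wide AS
SELECT i.invoice_id, i.invoice_date, i.amount,
       c.customer_id, c.customer_name, c.city
  FROM dw_invoice i
  JOIN dw_customer c ON c.customer_id = i.customer_id;

-- A postponed physical design decision (Impact 5): only if reports on the
-- virtual table become too slow is a cache added, without changing the reports.
CREATE MATERIALIZED VIEW v_invoice_wide_cache AS
SELECT i.invoice_id, i.invoice_date, i.amount,
       c.customer_id, c.customer_name, c.city
  FROM dw_invoice i
  JOIN dw_customer c ON c.customer_id = i.customer_id;
```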
URL: https://www.sciencedirect.com/science/article/pii/B9780123944252000113
Source: https://www.sciencedirect.com/topics/computer-science/physical-database-design