 Glossary

Acquisition Architecture

The system topology and technology supporting the extraction, transformation, integration, transport and loading of data into the data warehouse.

Aggregation

A type of data consolidation whereby data is rolled up or summarized. See consolidation.
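
As a minimal sketch, assuming atomic-level transaction records held as Python dictionaries with hypothetical `product`, `month` and `amount` fields, aggregation rolls the detailed rows up to one summarized row per product and month:

```python
from collections import defaultdict

# Hypothetical atomic-level transaction records.
transactions = [
    {"product": "widget", "month": "2002-01", "amount": 10.0},
    {"product": "widget", "month": "2002-01", "amount": 5.0},
    {"product": "gadget", "month": "2002-01", "amount": 7.5},
    {"product": "widget", "month": "2002-02", "amount": 2.0},
]

def roll_up(rows, keys, measure):
    """Summarize rows by the given key fields, summing the measure."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row[measure]
    return dict(totals)

monthly_sales = roll_up(transactions, ["product", "month"], "amount")
# {('widget', '2002-01'): 15.0, ('gadget', '2002-01'): 7.5, ('widget', '2002-02'): 2.0}
```

The four detail rows become three summary rows; the individual transactions are no longer visible in the result.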

Architecture

The form, structure and topology of any physical or conceptual system made up of inter-related components. See acquisition architecture, information architecture.

Archiving

Moving data to long-term off-line storage.

Artifact

A design consideration for retaining the time-based (temporal) relationships of data.

Atomic Data

See organizationally structured, atomic level.

Atomic Data Acquisition

See data acquisition, acquisition architecture.

Atomic Level

The data warehouse level with the finest grain, that is, the most detailed data. Atomic level data is used to populate data marts, departmental data stores, or other data warehouse levels.

Business Metadata

The non-technical definitions and descriptions of the data in the DW. Typically this includes items such as business rules, data quality, data qualifications, load and refresh schedules, relationships to other data, and business purpose. This information comes mainly from existing sources of data documentation such as CASE tools, data dictionaries and business system experts. See also metadata, technical metadata.

Business Problem

An opportunity to increase revenue or decrease costs that can be addressed through enhanced decision support or analytical capabilities. Each iteration of a data warehouse should focus on a particular business problem.

Business Requirements

The needs or drivers identified as the objectives to be met by the DW. They are defined by the end users and management during the business analysis activities of a DW project. Acceptance testing is used to ensure that the DW meets the established business requirements.

Business Rule

An algorithm or logical statement that governs some aspect of a business function and can be expressed through some type of data transformation.

Cleanliness

See data cleanliness.

Cleansing

The manual or automated elimination or correction of data errors at the source. See scrubbing.

Completeness

See data completeness.

Conceptual Data Model

The highest, most abstract level at which the relationships between data entities are described and diagrammed.

Controlled Redundancy

The replication of data within the data warehouse for the purposes of improved data access or understandability. Also known as "managed redundancy".

Conversion

Altering and moving data from one form or structure to another.

Corporate Data Model

A data model whose scope or domain encompasses all subject areas and functions of the corporation. Corporate data models are of limited utility on data warehouse projects. 

Data Access

The act of reading the data warehouse by end users via reporting or analytical tools. 

Data Acquisition

The extraction, transformation, integration, transport and loading of data from one structure to another. Applies to the population of atomic level and secondary level data structures. See also acquisition architecture.

Data Cleanliness

A quality indicator of a data store related to how much errant data it contains. See also data quality, data completeness, and data correctness.

Data Completeness

A quality indicator of a data store related to how much data it may be missing. See also data quality, data cleanliness, and data correctness.

Data Conversion

See conversion.

Data Correctness

A quality indicator of a data store related to how accurate the data is. See also data quality, data cleanliness, and data completeness.

Data Extract

See extract.

Data Fusion

See data integration.

Data Integration

See integration.

Data Load

Populating a database from a load file. This may include use of a batch data load utility. Also refers to the load file itself.

Data Mapping

Matching source and target data elements to specify where data warehouse elements will be sourced from.

Data Mart

A form of simplified DW implementation, or a component of an architected data warehouse, which is based on data collected directly from source systems. This data is used for a specific departmental rather than an enterprise level decision support purpose.

Data Mining

Automated analysis of data to unearth previously undiscovered data correlations.

Data Model

A data representation which illustrates data entities and elements and the relationships among them. See conceptual data model, logical data model, and physical data model.

Data Quality

A term that pertains to one or more of the following issues: accuracy, integrity, cleanliness, correctness, completeness, and consistency. The quality of data is often evaluated to determine usability and to establish the processes necessary for improving data quality. Data quality may be measured objectively or subjectively.
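
Objective measures can be as simple as ratios over a data store. The sketch below is one possible approach, not a standard metric definition: it assumes hypothetical customer records, treats `None` as missing (completeness), and uses an assumed five-digit ZIP rule to count errant values (cleanliness).

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "state": "CA", "zip": "94105"},
    {"id": 2, "state": None, "zip": "10001"},
    {"id": 3, "state": "NY", "zip": "ABCDE"},
    {"id": 4, "state": "TX", "zip": None},
]

def completeness(rows, field):
    """Fraction of rows in which the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def cleanliness(rows, field, is_valid):
    """Fraction of populated values that pass a validity rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in values) / len(values) if values else 1.0

def is_five_digit_zip(value):
    # Assumed rule for illustration: a valid ZIP code is exactly five digits.
    return len(value) == 5 and value.isdigit()

state_completeness = completeness(records, "state")               # 0.75
zip_cleanliness = cleanliness(records, "zip", is_five_digit_zip)  # 2 of 3 populated values
```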

Data Staging Area

An area set aside in a database or on disk for loading and testing data before it is loaded into the production environment. The staging area provides the opportunity to process or validate data without affecting the production DW environment.

Data Steward

An individual who is largely responsible for data definitions, data usage, and data quality.

Data Warehouse

An integrated, subject oriented collection of data that captures data at successive moments in time and retains it for a long period. Data warehouses are usually designed to enhance decision support functions. Architected data warehouses contain a single atomic level and multiple secondary levels of data to support particular decision support functions.

Data Warehouse Architecture

See acquisition architecture, information architecture.

Data Warehouse Characteristics

Those parameters of the processes that typify the nature of an iteration (e.g. focus, activities to be performed, etc.).

Data Warehouse Data

See data warehouse.

Data Warehouse Model

A data model that supports informational or analytical processing rather than operational processing. Data warehouse models introduce controlled redundancy, subject orientation, and summary tables among other non-conventional modeling techniques. See also data warehouse, information architecture.

Decision Support System (DSS)

An information system used to assist in tactical or strategic business decision-making activities. Usually DSS involves the analysis of integrated, subject oriented, historical data.

Default Values

The value assigned to a data element when no other business rule or process assigns one.
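
A minimal sketch of applying default values during loading, assuming a hypothetical customer record and a hypothetical table of defaults. A field keeps any value already assigned upstream and receives the default only when it is missing or `None`:

```python
# Hypothetical defaults for a customer dimension.
DEFAULTS = {"region": "UNKNOWN", "credit_limit": 0}

def apply_defaults(record, defaults=DEFAULTS):
    """Fill fields that no earlier rule or process has assigned."""
    result = dict(record)  # leave the input record unchanged
    for field, value in defaults.items():
        if result.get(field) is None:
            result[field] = value
    return result

row = apply_defaults({"id": 7, "region": None})
# {'id': 7, 'region': 'UNKNOWN', 'credit_limit': 0}
```

In SQL the same effect is often achieved with `COALESCE(column, default)` in the load query.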

Departmental Data Warehouse

A limited data warehouse used expressly to solve a particular business problem for a single business unit. It may include aggregated or sub-selected data acquired directly from source systems. See also data mart.

Departmental Level

The tables in a data warehouse used for a specific departmental purpose or to solve a particular business problem. This may include aggregated or sub-selected data from the atomic layer. See also atomic level, data mart, and secondary level.

Departmentally Structured

See departmental level.

Derived Data

Data that is created or calculated from other data by applying some business rule.
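
For illustration only, assume a hypothetical business rule: an order line's net amount is quantity times unit price, less a 5% discount on orders of 100 units or more. The net amount is then derived data; it is not extracted from the source but calculated from the primitive quantity and price values.

```python
def net_amount(quantity, unit_price):
    """Hypothetical business rule: 5% discount at 100 units or more."""
    gross = quantity * unit_price
    discount = 0.05 if quantity >= 100 else 0.0
    return round(gross * (1 - discount), 2)

order_line = {"quantity": 120, "unit_price": 2.50}
order_line["net_amount"] = net_amount(**order_line)  # derived, not sourced
# order_line["net_amount"] == 285.0
```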

Drill Down

The activity of navigating from high level data to detailed data (e.g. summary level to atomic level) using a data query/analysis tool.

Enterprise Data Model

See corporate data model.

Exception Processing

The actions taken to process records that contain unanticipated or errant data.
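
One common pattern, sketched here under assumed field names (`customer_id`, `amount`) and assumed validation rules, is to route records that fail validation to an exception set, with the reasons recorded, so that the clean records can still be loaded:

```python
def validate(record):
    """Return a list of rule violations (empty means the record is clean)."""
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("invalid amount")
    return errors

def split_exceptions(records):
    """Separate clean records from errant ones, keeping the error reasons."""
    clean, exceptions = [], []
    for rec in records:
        errors = validate(rec)
        if errors:
            exceptions.append({"record": rec, "errors": errors})
        else:
            clean.append(rec)
    return clean, exceptions

clean, exceptions = split_exceptions([
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C3", "amount": -1},
])
```

The exception list can then be reviewed, corrected and reprocessed without blocking the rest of the load.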

Executive Information System

Decision support systems designed for high level management characterized by simple interfaces and the visual presentation of summarized information.

Extract

Selecting data from one environment prior to transporting it to another environment. Or, a file created by an extract process.

Filter

A conditional statement or process that includes or excludes data based on its value. Typically seen as a SQL "where" clause.
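
A small example using Python's built-in `sqlite3` module and a hypothetical `orders` table: the `WHERE` clause acts as the filter, so only rows satisfying its condition are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EAST", 120.0), (2, "WEST", 80.0), (3, "EAST", 40.0)],
)

# Filter: keep only EAST orders with an amount above 50.
rows = conn.execute(
    "SELECT id FROM orders WHERE region = ? AND amount > ? ORDER BY id",
    ("EAST", 50.0),
).fetchall()
# [(1,)]
```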

Full Population Update

Performing a complete load of the data warehouse, or of a particular table, irrespective of when it was last updated, rather than loading only the changed data. This process is usually used for initial seeding (initialization) of the data warehouse, or to take a complete snapshot of a particular table that has no cyclical processing.
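
The contrast with a changed-data load can be sketched with `sqlite3`. This assumes a hypothetical `customer` table whose source rows carry an ISO-format `updated` date; the full population update replaces everything, while the incremental load (shown for comparison) applies only rows changed since the last load.

```python
import sqlite3

def full_population_update(conn, source_rows):
    """Replace the entire target table, regardless of what changed."""
    conn.execute("DELETE FROM customer")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", source_rows)

def incremental_update(conn, source_rows, last_load):
    """Apply only rows updated after the previous load (for comparison)."""
    changed = [r for r in source_rows if r[2] > last_load]  # ISO dates sort as text
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", changed)
    return len(changed)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated TEXT)")
source = [(1, "Acme", "2002-01-05"), (2, "Globex", "2002-02-10")]
full_population_update(conn, source)          # initial seeding
source[1] = (2, "Globex Corp", "2002-03-01")  # a later change in the source
changed = incremental_update(conn, source, last_load="2002-02-28")
```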

Granularity

The measure of the level of detail in the data. Low granularity implies that the data has been summarized or aggregated. High granularity implies that the data is more transaction-oriented. See atomic level, secondary level.

Individual Level

The most specific level of the data warehouse information architecture. This level consists of highly summarized or highly filtered data that is available or pertinent only to some small group or executive.

Individually Structured

See individual level.

Information Architecture

The DBMS model supporting the data warehouse, which usually consists of multiple levels of data summarization, transformation, derivation and/or sub-selection.

Information Directory

A data store of metadata that is accessed to understand data context and how to navigate the data warehouse.

Information Model

See data warehouse model.

Informational Processing

The processing of data for analytical and decision support needs versus operational needs.

Informational Systems

Computer systems that allow organizations to report on and analyze the business. The data warehouse supports systems of this type, which include decision support, executive information, enhanced reporting and data mining applications.

Infrastructure

The basic facilities, components, and structures of a system (e.g. a computer network infrastructure consists of wiring, routers, bridges and computers).

Integration

The process of bringing data elements from different systems together to form a coherent "integrated" database.

Integrity Constraints And Relationships

The rules that define the relationship between elements or record types. A constraint is a relationship that must exist for one or the other of the elements to exist (e.g. a child record must point to a valid parent record; it is not allowed to exist as an orphan).
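
The orphan rule in the example can be declared as a foreign key constraint. This sketch uses SQLite through Python's `sqlite3` module, with hypothetical `customer` (parent) and `orders` (child) tables. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # valid parent: accepted

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99: an orphan
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True  # the constraint prevented the orphan child record
```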

Intelligent Agent

A program, such as a query, that when launched can run without supervision, interacting with many databases or system components across many networks to return the necessary results.

Iteration

A repetition of a set of processes while retaining and applying the knowledge gained from the previous iteration of those same processes.

Iteration Processing

The technique of re-visiting a process or processes until the results are correct or optimal.

Lightly Summarized

Describes data whose level of detail has been only slightly reduced.

Load

To insert data values into a database that was previously empty.

Logical Data Model

The level of data modeling where the attributes and primary keys of each entity in the conceptual data model are identified and defined.

Mapping

The process of relating a source element to a target and describing the transformation that satisfies the relationship.
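
A mapping specification can be represented as a table of target element, source element and transformation. The sketch below assumes hypothetical source element names (`CUST_NM`, `ST`, `CR_LMT`) and an assumed rule that the source stores credit limits in thousands:

```python
# Hypothetical mapping: target element -> (source element, transformation).
MAPPING = {
    "customer_name": ("CUST_NM", str.strip),
    "state_code": ("ST", str.upper),
    "credit_limit": ("CR_LMT", lambda v: int(v) * 1000),  # assumed: source in thousands
}

def apply_mapping(source_row, mapping=MAPPING):
    """Build a target row by transforming each mapped source element."""
    return {target: transform(source_row[source])
            for target, (source, transform) in mapping.items()}

target_row = apply_mapping({"CUST_NM": "  Acme Ltd ", "ST": "ca", "CR_LMT": "25"})
# {'customer_name': 'Acme Ltd', 'state_code': 'CA', 'credit_limit': 25000}
```

Keeping the mapping as data rather than code makes it easy to review with business users and to record as metadata.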

Massively Parallel Processor (MPP)

A computer hardware architecture in which many independent processors, each with its own memory space, work in coordination.

Metadata

Data about data including the description of the structure, content, keys, indexes, etc., of data.

Metadata Access Tool

A software application that provides access to metadata, such as Prism Solutions' Directory Manager.

Methodology

A strategy and approach to achieving some goal presented as a framework in which related processes made up of activities or steps are grouped. A methodology is normally used as a guideline rather than as a strict set of instructions.

Migration

The process by which frequently used items of data are moved to more readily accessible areas of storage and infrequently used items of data are moved to less readily accessible areas of storage.

Multidimensional Structures

Data structures that have relationships beyond those of a two-dimensional table. A multidimensional data set is often referred to as a "cube".
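
A toy illustration, assuming three hypothetical dimensions (product, region, quarter): each cell of the cube is addressed by one member of every dimension. Slicing fixes one dimension to a single member; totaling rolls the cube up to one dimension.

```python
# Each key is (product, region, quarter); each value is a sales measure.
cube = {
    ("widget", "EAST", "Q1"): 100,
    ("widget", "WEST", "Q1"): 80,
    ("gadget", "EAST", "Q1"): 50,
    ("widget", "EAST", "Q2"): 120,
}

def slice_cube(cube, position, member):
    """Fix the dimension at `position` to `member`, keeping the other dimensions."""
    return {key[:position] + key[position + 1:]: value
            for key, value in cube.items() if key[position] == member}

def total_by(cube, position):
    """Sum the measure over all dimensions except the one at `position`."""
    totals = {}
    for key, value in cube.items():
        totals[key[position]] = totals.get(key[position], 0) + value
    return totals

east = slice_cube(cube, 1, "EAST")  # {('widget', 'Q1'): 100, ('gadget', 'Q1'): 50, ('widget', 'Q2'): 120}
by_quarter = total_by(cube, 2)      # {'Q1': 230, 'Q2': 120}
```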

New Subject Iteration

Repeating the methodology specifically to bring a new "subject area" into the DW.

Non-Volatile

Describes data whose value rarely changes.

OLAP

Acronym for On-line Analytical Processing - a set of tools and techniques in DSS environments.

Operational Data

Data used to support a company's day-to-day processing.

Operational Data Model

A data model of current transaction processing business systems. This model is used to understand the availability of data for the data warehouse. 

Operational Data Model

A data model whose structure is designed for systems that carry out record level or element level updates. Typically this type of data model is highly "normalized".

Operational Data Store

A subject oriented database, similar in structure to a data warehouse, whose refresh frequency approaches real time.

Operational System

Computer systems that run the business, including transaction processing systems.

Organizationally Structured

Refers to a level of the data warehouse architecture, and the data found at that level. Also known as "atomic", "corporate" or "current detail" data, this level, the heart of the data warehouse, is structured to meet the informational requirements of the entire organization. Data that has been archived ("archived detail") is also considered to belong to this level.

Physical Data Model

The level of data modeling where the entities and attributes are converted to definitions compatible with the data dictionary language, and physical implementation constraints are resolved and tailored to the actual database type (Oracle, Sybase, Informix, DB2, etc.).

Post-Load Processing

The act of processing data after it has been loaded into a database.

Primitive Data

Data that is unchanged from its original form in the source system.

Quality

See data quality.

Redundancy

The desired or undesired duplication of data within a data model. Effective data warehouses implement controlled redundancy to improve data accessibility.

Retention

Defines the period of time that the data is to be maintained in the database before being considered for archival, roll-up, or deletion.

Retention

Refers to the act of keeping data in the DW over a specified period of time.

Roll-Out

To implement in the field or make available to end-users.

Sample Population

A statistically significant subset of data.

Sampling

The technique of randomly acquiring a small percentage of data from a source. The technique is based on the theory that analyzing a statistically significant sample of a data set will reveal the same, or close to the same, information as analyzing the complete data set. It is often used in lieu of extracting and reviewing every row or data element in a file when the data volume is very large.
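
A minimal sketch of simple random sampling without replacement, using Python's `random` module; the list of integers is a stand-in for the rows of a large source file. The 1% fraction is an arbitrary assumption: whether a sample is statistically significant depends on the analysis, not on a fixed percentage.

```python
import random

def sample_rows(rows, fraction, seed=None):
    """Draw a simple random sample, without replacement, of about `fraction` of the rows."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

rows = list(range(10_000))                 # stand-in for a large source file
sample = sample_rows(rows, 0.01, seed=42)  # 100 distinct rows
```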

Scheduling

Setting the execution sequence and timing for movement of data files from the source system environment to the DW target environment.

Scrubbing

The automated correction of data anomalies during data warehouse processing. See cleansing.

Secondary Level

A data warehouse level that is populated from another data warehouse level, usually the atomic level.

Sizing

Determining needed disk, CPU and communications configurations. 

Factors that impact data warehouse sizing include data volumes, data transport volumes and frequency, data load volumes and frequency, typical data access and user report volumes.
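
A back-of-the-envelope disk estimate multiplies daily row volume by average row size and retention period, then adds overhead for indexes. The sketch below assumes a single fact table, hypothetical volumes and a hypothetical 50% index overhead; it covers disk only and does not attempt CPU or communications sizing.

```python
def estimate_storage_gb(rows_per_day, bytes_per_row, retention_days,
                        index_overhead=0.5, growth=0.0):
    """Rough storage estimate for one table, in GB (1024**3 bytes).

    index_overhead and growth are assumed fractions (0.5 means +50%).
    """
    base_bytes = rows_per_day * bytes_per_row * retention_days
    return base_bytes * (1 + index_overhead) * (1 + growth) / 1024**3

# Hypothetical volumes: 1 million rows per day, 200-byte rows, 2 years retained.
estimate = estimate_storage_gb(1_000_000, 200, 730)  # roughly 204 GB
```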

Snapshot

A view of the data at a particular instant in time.

Source System

The system of record or the system that contains the operational data that is to be extracted and loaded into the DW.

Source/Target

See source system / target system.

Star-Schema

A data model of a particular topology that is typical of the subject oriented data relationships in a DW. The star-schema topology is typified by a large "fact" table joined to a number of smaller "dimension" tables, each of which has a "one to many" relationship with the fact table.
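
A small star schema can be sketched in SQLite via Python's `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical. The typical query joins the fact table to its dimensions, groups by dimension attributes, and sums the fact measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Smaller dimension tables, one row per member.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
    -- Large fact table: each dimension row relates to many fact rows.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date VALUES (20020101, '2002-01'), (20020201, '2002-02');
    INSERT INTO fact_sales VALUES
        (1, 20020101, 10.0), (1, 20020101, 5.0),
        (2, 20020101, 7.5),  (1, 20020201, 2.0);
""")

# A typical star join: group by dimension attributes, sum the fact measure.
result = conn.execute("""
    SELECT p.name, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.name, d.month
    ORDER BY p.name, d.month
""").fetchall()
```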

Stress Test

A test to determine how many resources will be used by a system.

Subject Area

A major subset of corporate data such as customer, transaction, product, part, vendor.

Subject Area Model

A data model of a particular subject area in the DW.

Subject-Oriented

To focus on the subject versus the business process or organization. The data architecture of a DW is subject oriented versus process oriented.

Summarized

See lightly summarized and highly summarized.

Symmetric Multi-Processor (SMP)

A computer hardware architecture that allows multiple processes or threads to execute concurrently on multiple CPUs sharing a common memory space.

Systems Of Record

The source system that has been selected as the best and most accurate source of a particular subject area or subset of data for the data warehouse.

Technical Metadata

The portion of metadata that describes the data definition and programmatic aspects of the data in the data warehouse (e.g. data type, format, domain, physical table name and definition, etc.)

Time Stamp

To associate a date and time with a record or element that captures the moment of some event.

Time Variant

Data whose accuracy is relevant to some moment in time. The three forms of time variant data are continuous time span, event discrete, and periodic discrete data.

Transformation

Converting data to another form in order to enhance its understandability or accessibility by end users. Transformations are defined by a set of rules or algorithms.

Transformation

Modifying a data type or data value. A complex transformation may include mathematical or conditional business logic.

Transport

The process that moves data from one system environment to another via some communication protocol.

User Types

A categorization of user proficiency or access frequency. Generally there are three types of DW users: high level users, ad-hoc users, and power users.

Validation

To ensure data correctness.

Visualization

A technique of graphically displaying the relationships between data and their values using graphs, colors or objects.

Volatility

Describes the tendency of data values to change; data whose values change frequently has high volatility.

Volume Test

A test used to determine data volumes as part of a capacity planning process.

Copyright © 2002