Saturday, 18 November 2017

Relational versus Non-Relational Databases


A database is an organized collection of data. A database-management system (DBMS) is a computer-software application that interacts with end-users, other applications, and the database itself to capture and analyze data. A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, EnterpriseDB, MongoDB, MariaDB, Microsoft SQL Server, Oracle, Sybase, SAP HANA, MemSQL, SQLite and IBM DB2.  Databases can be classified into two broad categories 1) relational 2) Non-relational or No-SQL


Relational versus Non-Relational


Relational databases are also called Relational Database Management Systems (RDBMS) or SQL databases. Data is represented in rows vs columns format in tables or views.  Historically, the most popular of these have been Microsoft SQL Server, Oracle Database, MySQL, and IBM DB2. The RDBMS’s are used mostly in large enterprise scenarios, with the exception of MySQL, which is also used to store data for Web applications.  All relational databases can be used to manage transaction-oriented applications (OLTP), and most non-relational databases, in the categories of Document Stores and Column Stores.

RDBMSs support ACID transactions (Atomicity, Consistency, Isolation, and Durability).  RDBMSs have provided for data integrity needs for decades, but the exponential growth of data over the past 10 years or so, along with many new data types have changed the data equation entirely, and so non-relational databases have grown from such a need.


Example of Relational DBs: Oracle, MS SQL Server, MySQL, Postgre SQL and etc.,


Hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value.   The idea behind hierarchical database models is useful for a certain type of data storage, but it is not extremely versatile. Its limitations mean that it is confined to some very specific uses. For example, where each individual person in a company may report to a given department, the department can be used as a parent record and the individual employees will represent secondary records, each of which links back to that one parent record in a hierarchical structure.


Object database: [Cache DB] .  An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented. Object-relational databases are a hybrid of both approaches.  Object-oriented database management systems (OODBMSs) also called ODBMS(Object Database Management System) combines database capabilities with object-oriented programming language capabilities. OODBMSs allow object-oriented programmers to develop the product, store them as objects, and replicate or modify existing objects to make new objects within the OODBMS. Because the database is integrated with the programming language, the programmer can maintain consistency within one environment, in that both the OODBMS and the programming language will use the same model of representation. Below are few referneces of OODBMS.


Below are few primary characterstics of Relational or Hierarchical DB


Data Integrity: Data integrity is the overall completeness, accuracy and consistency of data. 

  • Entity Integrity: This is concerned with the concept of primary keys. The rule states that every table must have its own primary key and that each has to be unique and not null.
  • Referential Integrity: This is the concept of foreign keys. The rule states that the foreign key value can be in two states. The first state is that the foreign key value would refer to a primary key value of another table, or it can be null. Being null could simply mean that there are no relationships, or that the relationship is unknown.
  • Domain Integrity: This states that all columns in a relational database are in a defined domain.


DB Normalization: Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity. Normalization is also the process of simplifying the design of a database so that it achieves the optimal structure composed of atomic elements.


ACID: Automicity, Cosistency, Isolation, and Durability

Atomicity is an all-or-none proposition.

Consistency guarantees that a transaction never leaves your database in a half-finished state.

Isolation keeps transactions separated from each other until they’re finished.

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.


image




Non-relational databases are also called NoSQL databases. NoSQL has become an industry standard term, but the name is beginning to lose popularity since it doesn’t fully cover the complexity and range of non-relational data stores that are available. Some of the most known NoSQL or non-relational DBs that discussed are MongoDB, DocumentDB, Cassandra, Coachbase, HBase, Redis, and Neo4j. There are literally hundreds, if not thousands, more.


Amazon AWS definition for NoSQL is a term used to describe high-performance, non-relational databases. NoSQL databases utilize a variety of data models, including document, graph, key-value, and columnar. NoSQL databases are widely recognized for ease of development, scalable performance, high availability, and resilience.


Example of Non-Relational DBs: CouchDB, MongoDB, DynamoDB and etc., 


Find more NoSQL DBs examples and its pupose at


Use Cases and types of NoSQL DBs

Number of different non-relational use cases, a few of these mentioned are:

  • Key-Value Stores: [Redis, Dynamo DB] Used for cache, queues fit in memory, rapidly changing data, and store blob data. Examples: sensor data, shopping cart, leaderboards, graph data bases, stock prices. Fastest “performance.”
  • Document Stores: [MongoDB, Dynamo DB] Have a flexible schemas, dynamic queries, defined indexes, good performance on big DB. Examples: order data, customer data, log data, product catalog, user generated content (chat sessions, tweets, blog posts, ratings, comments). Fastest development.
  • Wide-Column Stores: [Cassandra] Come with real-time querying of random (non-sequential) data, huge number of writes, sensors. Examples: Web analytics, time series analytics, real-time data analysis, banking industry.



Below are few other type of Databases which are being used for Analytic and Research purpose.


Data Warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for knowledge workers throughout the enterprise. The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the DW for reporting.


image



image



The typical Extract, transform, load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema in MS SSAS. The access layer helps users retrieve data.


Data Mart

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data.This enables each department to isolate the use, manipulation and development of their data. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.


Data Lake

A data lake is a method of storing data within a system or repository, in its natural format,that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of data lake is to have a single store of all data in the enterprise ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video) thus creating a centralized data store accommodating all forms of data. A data swamp is a deteriorated data lake, that is inaccessible to its intended users and provides little value.



Big data

Big data is high-volume and high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision-making, and process automation.” Big Data analytics find insights that help organizations make better business decisions.


Business Intelligence and Business Intelligence Tools

Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

Business intelligence software or tools is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that have been previously stored, often, though not necessarily, in a data warehouse or data mart.


The key general categories of business intelligence applications are:



Main Differences Between Relational and Non-Relational Databases


Relational Databases

  Pros

  • Relational databases work with structured data.
  • They support ACID transactional consistency and support “joins.”
  • They come with built-in data integrity and a large eco-system.
  • Relationships in this system have constraints.
  • There is limitless indexing. Strong SQL.

Cons

  • Relational Databases do not scale out horizontally very well (concurrency and data size), only vertically, (unless you use sharding).
  • Data is normalized, meaning lots of joins, which affects speed.
  • They have problems working with semi-structured data.


Non-relational / NoSQL

Pros

  • They scale out horizontally and work with unstructured and semi-structured data. Some support ACID transactional consistency.
  • Schema-free or Schema-on-read options.
  • High availability.
  • While many NoSQL databases are open source and so “free”, there are often considerable training, setup, and developments costs. There are now also numerous commercial products available.

Cons

  • Weaker or eventual consistency (BASE) instead of ACID.
  • Limited support for joins.
  • Data is denormalized, requiring mass updates (i.e. product name change).
  • Does not have built-in data integrity (must do in code).
  • Limited indexing.



When to choose NoSQL or Non-Relational DBs

Some of those reasons include as below:

  • When bringing in new data with a lot of volume and/or variety.
  • Data is non-relational/semi-structured.
  • Your team will be trained in these new technologies (NoSQL).
  • You have enough information to correctly select the type and product of NoSQL for your situation.
  • You can relax transactional consistency when scalability or performance is more important.
  • You can service a large number of user requests vs rigorously enforcing business rules.


Conclusion

We could assume that “RDBMS is for enterprise OLTP and ACID compliance, or databases under 1 terabyte. NoSQL is for scaled OLTP and JSON documents. Hadoop is for Big Data Analytics.” The choices on the market today are numerous, but so are the needs of different enterprises. How (and when) to choose the right database system is something that every enterprise must now contend with to maintain marketplace advantages.

No comments:

Post a Comment