Thursday, 23 November 2017

Data Science Vs Data Analytics


This article discusses data science, one of the buzzwords of recent days alongside IoT, cloud, and robotics. We will also look at the differences between data science and data analytics, and at the important characteristics of each.


Data Analytics (The Past)

In the past, data analysis was expensive and slow, and data was difficult to collect.


Data Analytics (The Present)

Data analysis, also known as analysis of data or data analytics, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analytics is a process for obtaining raw data and converting it into information useful for decision-making by users, and many effective tools for it, such as Excel and Tableau, are available today. Data analytics is also the application of data science practices in the business world.





How Data Analytics Works: A Step-by-Step Approach


image



image



image



image



image



image



image




Below are a few other types of data stores that are used for analytics and research.


Data Warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used for creating analytical reports for knowledge workers throughout the enterprise. The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing to ensure data quality before it is used in the DW for reporting.





The typical extract, transform, load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema, a model also used by tools such as Microsoft SQL Server Analysis Services (SSAS). The access layer helps users retrieve data.
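The ETL flow described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the source rows and the table names (staging_sales, dim_product, fact_sales) are hypothetical, one in-memory database stands in for the separate staging and warehouse databases, and the optional ODS step is skipped.

```python
import sqlite3

# Hypothetical rows extracted from a source sales system.
SOURCE_ROWS = [
    ("2017-11-01", "P1", "Widget", "north", 3, 2.50),
    ("2017-11-01", "P2", "Gadget", "south", 1, 10.00),
    ("2017-11-02", "P1", "Widget", "north", 2, 2.50),
]


def build_warehouse(rows):
    """Extract into staging, transform, then load a one-dimension star schema."""
    con = sqlite3.connect(":memory:")
    # Staging layer: raw data stored exactly as extracted.
    con.execute(
        "CREATE TABLE staging_sales (sale_date TEXT, product_code TEXT, "
        "product_name TEXT, region TEXT, qty INTEGER, unit_price REAL)"
    )
    con.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Warehouse layer: a product dimension and a sales fact table (hypothetical names).
    con.execute(
        "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, "
        "product_code TEXT UNIQUE, product_name TEXT)"
    )
    con.execute(
        "CREATE TABLE fact_sales (product_key INTEGER, sale_date TEXT, "
        "region TEXT, qty INTEGER, revenue REAL)"
    )
    # Transform and load: one dimension row per product, revenue computed per fact.
    con.execute(
        "INSERT INTO dim_product (product_code, product_name) "
        "SELECT DISTINCT product_code, product_name FROM staging_sales"
    )
    con.execute(
        "INSERT INTO fact_sales SELECT d.product_key, s.sale_date, s.region, "
        "s.qty, s.qty * s.unit_price FROM staging_sales s "
        "JOIN dim_product d ON d.product_code = s.product_code"
    )
    con.commit()
    return con


def revenue_by_product(con):
    """Access layer: an aggregate over the facts, labelled through the dimension."""
    return con.execute(
        "SELECT d.product_name, SUM(f.revenue) FROM fact_sales f "
        "JOIN dim_product d ON d.product_key = f.product_key "
        "GROUP BY d.product_name ORDER BY d.product_name"
    ).fetchall()


print(revenue_by_product(build_warehouse(SOURCE_ROWS)))
```

Keeping product names in a dimension table and only keys plus measures in the fact table is what makes this a (very small) star schema; a real warehouse would typically add date, region, and customer dimensions in the same way.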


Data Mart

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart, including all the hardware, software and data. This enables each department to isolate the use, manipulation and development of its data. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.
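One simple way to picture a data mart as "a subset of the warehouse oriented to one department" is a database view per department. This is only a sketch: the table warehouse_sales, its columns, and the view names are hypothetical, and many real data marts are physically separate databases rather than views.

```python
import sqlite3


def build_warehouse_with_marts():
    """Create an enterprise-wide table plus one view per department (the data marts)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical enterprise-wide warehouse table.
    con.execute(
        "CREATE TABLE warehouse_sales (department TEXT, customer TEXT, "
        "product TEXT, amount REAL)"
    )
    con.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
        [
            ("marketing", "acme", "ad campaign", 500.0),
            ("sales", "acme", "widget", 120.0),
            ("sales", "globex", "gadget", 80.0),
        ],
    )
    # Each mart exposes only the rows one department needs. The f-string is
    # safe here only because dept comes from this fixed tuple, not user input.
    for dept in ("sales", "marketing"):
        con.execute(
            f"CREATE VIEW {dept}_mart AS SELECT customer, product, amount "
            f"FROM warehouse_sales WHERE department = '{dept}'"
        )
    return con


con = build_warehouse_with_marts()
print(con.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0])
```

A view keeps a single copy of the data, which suits marts that share conformed dimensions; a department that needs to own its hardware and data outright would instead copy the subset into its own database.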


Data Lake

A data lake is a method of storing data within a system or repository in its natural format, which facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of a data lake is to have a single store of all data in the enterprise, ranging from raw data (an exact copy of source system data) to transformed data used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video), thus creating a centralized data store accommodating all forms of data. A data swamp is a deteriorated data lake that is inaccessible to its intended users and provides little value.
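The raw-versus-transformed idea can be sketched with plain files. This is a minimal sketch assuming a hypothetical folder layout (raw/&lt;source&gt;/ for untouched copies, curated/ for transformed output) and hypothetical function names; real data lakes usually sit on object storage, but the principle of landing data unchanged and transforming it later is the same.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_root, source, filename, payload):
    """Store source bytes unchanged in the raw zone (native format, no schema)."""
    target = Path(lake_root) / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def curate_orders(lake_root):
    """Read raw JSON order files and write one transformed CSV to the curated zone."""
    root = Path(lake_root)
    rows = []
    for path in sorted((root / "raw" / "orders").glob("*.json")):
        for order in json.loads(path.read_text()):
            rows.append({"order_id": order["id"], "total": order["qty"] * order["price"]})
    out = root / "curated" / "orders.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "total"])
        writer.writeheader()
        writer.writerows(rows)
    return out


with tempfile.TemporaryDirectory() as lake:
    # Semi-structured (JSON) and binary (PNG header bytes) data land side by side.
    land_raw(lake, "orders", "2017-11-23.json", b'[{"id": 1, "qty": 2, "price": 5}]')
    land_raw(lake, "images", "logo.png", b"\x89PNG\r\n")
    print(curate_orders(lake).read_text())
```

Because nothing in the raw zone is ever modified, the curated output can always be rebuilt from it; letting raw folders grow without such documented layouts is one way a lake drifts toward a data swamp.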



Big data

“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision-making, and process automation.” Big data analytics finds insights that help organizations make better business decisions.


Business Intelligence and Business Intelligence Tools

Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

Business intelligence software (BI tools) is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often, though not necessarily, in a data warehouse or data mart.
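Much of what BI reporting and online analytical processing do is aggregate the same facts at several levels of detail. The sketch below shows such a roll-up in plain Python; the SALES rows and the "ALL" marker for subtotals are hypothetical illustrations, not the behavior of any particular BI product.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, revenue).
SALES = [
    ("north", "Q1", 100.0),
    ("north", "Q2", 50.0),
    ("south", "Q1", 70.0),
]


def rollup(rows):
    """Aggregate revenue at three levels, as an OLAP roll-up would."""
    totals = defaultdict(float)
    for region, quarter, revenue in rows:
        totals[(region, quarter)] += revenue  # finest grain: region and quarter
        totals[(region, "ALL")] += revenue    # subtotal per region
        totals[("ALL", "ALL")] += revenue     # grand total
    return dict(totals)


for key, value in sorted(rollup(SALES).items()):
    print(key, value)
```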


The key general categories of business intelligence applications are:



Data Science (The Future, data driven decision making)

Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured. Data science sits at the intersection of computer science, math and statistics, and domain knowledge. It is the next level of data analysis, using various modern tools and technologies. The goal of data science is to transform data into knowledge for making decisions and future business predictions.


Data science is a multidisciplinary blend of data inference, algorithm development, and technology used to solve analytically complex problems.





A data scientist is a professional who understands data from a business point of view. They are responsible for making predictions that help businesses take accurate decisions. Data scientists come with a solid foundation in computer applications, modeling, statistics and math. What sets them apart is their business acumen coupled with strong communication skills, which lets them deal with both business and IT leaders. They are good at picking the right problems: those whose solutions will add value to the organization.






Data science – discovery of data insight

This aspect of data science is all about uncovering findings from data: diving in at a granular level to mine and understand complex behaviors, trends, and inferences.

For example:

  • Netflix data mines movie viewing patterns to understand what drives user interest, and uses that to make decisions on which Netflix original series to produce.



Data science – development of data product

A "data product" is a technical asset that: (1) utilizes data as input, and (2) processes that data to return algorithmically-generated results.

For example:

  • Amazon's recommendation engines suggest items for you to buy, determined by their algorithms. Netflix recommends movies to you. Spotify recommends music to you.



Data Science Skills and Tools

Below are the most important skills in the data science world.


image



image



Data Science Process and Focus on Trends


image




Below are a few generic steps that organizations typically follow today in data analytics and data science work.


Data Requirements (Data Analytics and Data Science)

The data necessary as inputs to the analysis is specified based on the requirements of those directing the analysis or of the customers who will use the finished product of the analysis.


Data Collection (Data Analytics and Data Science)

Data is collected from a variety of sources. The requirements may be communicated by analysts to custodians of the data, such as information technology personnel within an organization. The data may also be collected from sensors in the environment, such as traffic cameras, satellites, recording devices, etc. It may also be obtained through interviews, downloads from online sources, or reading documentation.


Data Processing (Data Analytics and Data Science)

Data initially obtained must be processed or organised for analysis. For instance, this may involve placing data into rows and columns in a table format (i.e., structured data) for further analysis, such as within a spreadsheet or statistical software.


Data Cleaning (Data Analytics and Data Science)

Once processed and organised, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccurate data, assessing the overall quality of existing data, deduplication, and column segmentation.
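A few of these cleaning tasks (normalising values, record matching, deduplication, and setting aside incomplete records) can be sketched in a single pass. The assumption that two records match when their normalised email addresses are equal is a hypothetical rule chosen for illustration; real record matching is often fuzzier.

```python
def clean(records):
    """Normalise fields, deduplicate on email, and separate incomplete records."""
    seen = set()
    cleaned, incomplete = [], []
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        name = (rec.get("name") or "").strip()
        if not email or not name:
            incomplete.append(rec)  # missing fields: set aside for review
            continue
        if email in seen:
            continue  # duplicate: same normalised email as an earlier record
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned, incomplete


records = [
    {"name": " Alice ", "email": "ALICE@example.com"},
    {"name": "Alice", "email": "alice@example.com "},
    {"name": "Bob", "email": None},
]
print(clean(records))
```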


Exploratory data analysis (Data Analytics and Data Science)

Once the data is cleaned, it can be analyzed. Analysts may apply a variety of techniques referred to as exploratory data analysis to begin understanding the messages contained in the data. The process of exploration may result in additional data cleaning or additional requests for data, so these activities may be iterative in nature.
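A typical first exploratory step is a summary of each numeric column. The sketch below uses Python's statistics module on a hypothetical list of values; an unexpectedly large maximum or standard deviation is exactly the kind of finding that sends an analyst back to data cleaning.

```python
import statistics


def describe(values):
    """Basic summary statistics, a common starting point for exploratory analysis."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```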


Modeling and algorithms (Data Science)

Mathematical formulas or models (also known as algorithms) may be applied to the data to identify relationships among the variables, such as correlation or causation. In general terms, models may be developed to evaluate a particular variable in the data based on other variable(s) in the data.
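As a minimal example of modeling one variable from another, the sketch below computes Pearson's correlation coefficient and fits a straight line by ordinary least squares. The variables (advertising spend as x, sales as y) and their values are hypothetical. Note that a strong correlation alone does not establish causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


ad_spend = [1, 2, 3, 4]  # hypothetical predictor variable
sales = [3, 5, 7, 9]     # hypothetical target variable
print(pearson_r(ad_spend, sales), fit_line(ad_spend, sales))
```

The fitted slope and intercept form the "model": given a new advertising spend, it predicts sales as slope times spend plus intercept.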


Data Product or Model (Data Science)

A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm. An example is an application that analyzes data about customer purchasing history and recommends other purchases the customer might enjoy.
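The purchase-history recommender mentioned above can be sketched with simple co-occurrence counting: items frequently bought together with what a customer already owns are suggested first. This is a swapped-in illustrative technique with hypothetical baskets and function names, not the algorithm used by Amazon, Netflix, or Spotify.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same purchase history."""
    pairs = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def recommend(pairs, owned, k=2):
    """Suggest up to k items most often bought alongside what the customer owns."""
    scores = Counter()
    for item in owned:
        scores.update(pairs.get(item, Counter()))
    for item in owned:
        scores.pop(item, None)  # never recommend something already owned
    return [item for item, _ in scores.most_common(k)]


# Hypothetical purchase histories, one list per customer.
baskets = [["book", "lamp"], ["book", "pen"], ["book", "lamp", "desk"], ["pen", "ink"]]
pairs = co_purchase_counts(baskets)
print(recommend(pairs, {"book"}, k=1))
```

Feeding each new purchase back into the baskets and recomputing the counts is what turns this from a one-off analysis into a data product that feeds its outputs back into the environment.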


Communication (Data Analytics and Data Science)

Once the data is analyzed, it may be reported in many formats to the users of the analysis to support their requirements. The users may have feedback, which results in additional analysis. As such, much of the analytical cycle is iterative.



Difference between Data Science and Data Analytics


Below is a skill-based comparison that explains the difference between data scientists and data analysts.


image



Below is a role-based comparison that explains the difference between data scientists and data analysts.


"Analyst" is somewhat of an ambiguous job title that can represent many different types of roles (data analyst, marketing analyst, operations analyst, financial analyst, etc). What does this mean in comparison to data scientist?

  • Data Scientist: Specialty role with abilities in math, technology, and the business domain. Data scientists work at the raw database level to derive insights and build data products.
  • Analyst: This can mean a lot of things. The common thread is that analysts look at data to try to gain insights. Analysts may interact with data at both the database level and the summarized report level.





Future of Data Science

In the future, data science will be a blend of data analytics, IoT, machine learning, and big data. Each of these areas will exchange data with the others to fulfill data science requirements. Eventually, this will deliver fully autonomous systems, such as “smart systems”, “cloud robotics”, and “cyber-physical systems”.


