Sunday, 26 November 2017

Machine Learning


Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed to do so. Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder for the overall field.


image


Statistics is about quantifying data. It offers many tools, mathematical formulas and functions for finding relevant properties of the data, and it is close to pure mathematics.

Data mining uses statistics together with programming methods to find patterns hidden in the data so that you can explain some phenomenon. Human intervention is needed to carry out the data mining process.

Machine learning uses data mining techniques and other learning algorithms to build models of what is happening behind some data so that it can predict future outcomes. Math is the basis for many of the algorithms, but the work leans more towards programming. The result is an autonomous program that builds a model for use by applications and AI.




Artificial Intelligence (The Past)

Artificial intelligence (AI, also machine intelligence, MI) is intelligence displayed by machines. In this earlier form, an AI machine had to be programmed explicitly to perform each task and operated successfully only in very constrained environments, so the bulk of the decisions still had to be made by humans.

image


Machine Learning (Today)

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed to do so. Machine learning and statistics are closely related fields. Machine learning is a subfield of AI; it uses mathematical and statistical analysis of data to predict future outcomes and make decisions.
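To make "learning without being explicitly programmed" concrete, here is a minimal sketch in Python. It fits a straight line to a handful of examples using ordinary least squares and then predicts an unseen value. The data (hours studied against exam score) and the function names are made up for illustration; real projects would normally use a library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied -> exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)
print(predict(model, 6))
```

Nobody wrote the rule "each extra hour adds two points" into the program; the slope and intercept come from the data, which is the basic idea behind the statistical side of machine learning.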


image



image




Deep Learning (The Future)

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised or unsupervised. It is widely seen as the future of AI. A deep learning model stacks multiple layers of learning algorithms, as shown below. On many tasks it is more powerful than earlier machine learning methods and produces more accurate predictions, because the data passes through more layers of processing.
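The idea of stacked layers can be sketched in a few lines of Python. The example below pushes an input through two hidden layers and one output layer. The weights are hand-picked, hypothetical numbers chosen only to show the data flow; in a real network they would be learned from data, and a framework would do this work.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron plus its bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def forward(inputs, layers):
    """Pass the inputs through every hidden layer, then the output layer."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activations, weights, biases)[0])


# Hypothetical weights: (weight matrix, bias vector) per layer.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer 1: 2 inputs -> 2 neurons
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),   # hidden layer 2: 2 -> 2 neurons
    ([[2.0, 1.0]], [-1.0]),                    # output layer: 2 -> 1 neuron
]

output = forward([2.0, 1.0], layers)
print(output)
```

Each layer transforms the output of the previous one, so later layers work on representations rather than on the raw input.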


image



image



image



image





What are Machine Learning processes in the real world?

Below are a few machine learning activities in real-world scenarios.


image


image


image


image


image


image



Below are popular technologies used in the machine learning era.


image



image




Machine Learning Process

The images below explain the basic machine learning process.


image



image



image



image



image



image



image



image




Closer look at Machine Learning Process

The points below give a closer look at the machine learning process and its activities.


image



image




image



image



image



image



image



image



image



image

Internet of Things (IoT)


The Internet of Things (IoT) is the network of physical devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, actuators, and network connectivity which enable these objects to connect and exchange data. Each thing is uniquely identifiable through its embedded computing system but is able to inter-operate within the existing Internet infrastructure. Experts estimate that the IoT will consist of about 30 billion objects by 2020.


image



image



image



The IoT allows objects to be sensed or controlled remotely across existing network infrastructure, creating opportunities for more direct integration of the physical world into computer-based systems, and resulting in improved efficiency, accuracy and economic benefit in addition to reduced human intervention.


The Internet of Things (IoT) is a system of interrelated computing devices, mechanical and digital machines, objects, animals or people that are provided with unique identifiers and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.

Example:  A thing, in the Internet of Things, can be a person with a heart monitor implant, a farm animal with a biochip transponder, an automobile that has built-in sensors to alert the driver when tire pressure is low -- or any other natural or man-made object that can be assigned an IP address and provided with the ability to transfer data over a network.
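As a rough illustration of the tire-pressure example, the Python sketch below packages one sensor reading as a JSON message tagged with the device's unique identifier. The field names, the device ID and the 200 kPa threshold are assumptions made for this sketch, not part of any IoT standard; sending the message over a network is left out.

```python
import json

LOW_PRESSURE_KPA = 200  # assumed alert threshold, chosen only for this sketch


def build_reading(device_id, pressure_kpa):
    """Package one sensor reading as a JSON message for the given device."""
    return json.dumps({
        "device_id": device_id,                      # unique identifier of the thing
        "tire_pressure_kpa": pressure_kpa,           # measured value
        "low_pressure_alert": pressure_kpa < LOW_PRESSURE_KPA,
    }, sort_keys=True)


# Hypothetical device ID for the front-left tire sensor of one car.
message = build_reading("car-0042-front-left", 180)
print(message)
```

The receiving application only needs the identifier and the payload to know which thing reported what, without any human-to-computer interaction.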


According to AWS, AWS IoT is a managed cloud platform that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT can support billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely. With AWS IoT, your applications can keep track of and communicate with all your devices, all the time, even when they aren’t connected.

AWS IoT makes it easy to use AWS services like AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Machine Learning, Amazon DynamoDB, Amazon CloudWatch, AWS CloudTrail, and Amazon Elasticsearch Service with built-in Kibana integration, to build IoT applications that gather, process, analyze and act on data generated by connected devices, without having to manage any infrastructure.

The Internet of Things (IoT) is a term coined by Kevin Ashton, a British technology pioneer working on radio-frequency identification (RFID) who conceived a system of ubiquitous sensors connecting the physical world to the Internet. Although things, Internet, and connectivity are the three core components of IoT, the value is in closing the gap between the physical and digital world in self-reinforcing and self-improving systems.


Radio-frequency identification (RFID) uses electromagnetic fields to automatically identify and track tags attached to objects. The tags contain electronically stored information. Passive tags collect energy from a nearby RFID reader's interrogating radio waves. Active tags have a local power source (such as a battery) and may operate hundreds of meters from the RFID reader.


image



image



IoT creates these systems by connecting things, animate or inanimate, to the Internet with unique identifiers that provide context, giving visibility into the network, the devices themselves, and their environment. Equipped with rich data sets and using advanced analytics, IoT can give us enormous insight into our world: Measuring vibrations from wind turbine blades and performing real-time analysis to determine maintenance needs before the blades fail. Reducing energy consumption in buildings by controlling lighting on floors where no one is present. Or creating self-driving vehicles that process environmental information to make split-second decisions to stop and avoid accidents. The collective knowledge about the physical world, gained through IoT, becomes the input for more efficiency, new business models, lower pollution, and better health.


image

Thursday, 23 November 2017

Data Science Vs Data Analytics


This article discusses data science, a recent buzzword alongside IoT, cloud and robotics. It also covers the differences between data science and data analytics and the important characteristics of each.


Data Analysis (The Past)

In the past, data analysis was expensive and slow, and data was difficult to collect.


Data Analytics (The Present)

Data analytics, also known as analysis of data, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. It takes raw data and converts it into information that users can act on, helped by the many effective tools available today, such as Excel and Tableau. Data analytics is also the application of data science practices in the business world.


image



image



image



How does Data Analytics work, step by step?


image



image



image



image



image



image



image




Below are a few other types of data stores that are used for analytics and research purposes.


Data Warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for knowledge workers throughout the enterprise. The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the DW for reporting.


image_thumb4



image_thumb6



The typical Extract, transform, load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema in MS SSAS. The access layer helps users retrieve data.
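The layers described above can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not a production design: the table names, columns and sample rows are hypothetical, everything lives in one in-memory database, and the separate operational data store is skipped.

```python
import sqlite3


def run_etl(raw_rows):
    """Load raw (customer, product, amount) strings into a tiny star schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE staging_sales (customer TEXT, product TEXT, amount TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (product_id INTEGER, customer TEXT, amount REAL);
    """)
    # Staging layer: store the raw extract unchanged.
    db.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)
    # Integration layer: clean names and types, then load dimension and fact tables.
    staged = db.execute("SELECT customer, product, amount FROM staging_sales").fetchall()
    for customer, product, amount in staged:
        name = product.strip().lower()
        db.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (name,))
        product_id = db.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)
        ).fetchone()[0]
        db.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (product_id, customer.strip(), float(amount)),
        )
    return db


def sales_by_product(db):
    """Access layer: aggregate the facts, grouped by the product dimension."""
    return db.execute("""
        SELECT d.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()


# Hypothetical raw extract with inconsistent spacing and casing.
raw = [("alice", " Laptop", "1200.50"), ("bob", "laptop ", "999.50"), ("alice", "Mouse", "25")]
db = run_etl(raw)
print(sales_by_product(db))
```

The fact table holds the measurements (sale amounts) and the dimension table holds the descriptive attribute (product), which is the fact-and-dimension split that a star schema is built on.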


Data Mart

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data. This enables each department to isolate the use, manipulation and development of their data. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.


Data Lake

A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of a data lake is to have a single store of all data in the enterprise, ranging from raw data (which implies an exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video), thus creating a centralized data store accommodating all forms of data. A data swamp is a deteriorated data lake that is inaccessible to its intended users and provides little value.



Big data

Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision-making, and process automation. Big data analytics finds insights that help organizations make better business decisions.


Business Intelligence and Business Intelligence Tools

Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

Business intelligence software or tools is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that have been previously stored, often, though not necessarily, in a data warehouse or data mart.


The key general categories of business intelligence applications are:



Data Science (The Future, data driven decision making)

Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured. It sits at the intersection of computer science, math and statistics, and domain knowledge. Data science is the next level of data analysis, using various modern tools and technologies. Its goal is to transform data into knowledge for making decisions and predicting future business outcomes.


Data science is a multidisciplinary blend of data inference, algorithm development, and technology in order to solve analytically complex problems.


image



A Data Scientist is a professional who understands data from a business point of view. He is in charge of making predictions to help businesses take accurate decisions. Data scientists come with a solid foundation of computer applications, modeling, statistics and math. What sets them apart is their brilliance in business coupled with great communication skills, to deal with both business and IT leaders. They are efficient in picking the right problems, which will add value to the organization after resolving it.



image



Data science – discovery of data insight

This aspect of data science is all about uncovering findings from data. Diving in at a granular level to mine and understand complex behaviors, trends, and inferences.

For example:

  • Netflix data mines movie viewing patterns to understand what drives user interest, and uses that to make decisions on which Netflix original series to produce.



Data science – development of data product

A "data product" is a technical asset that: (1) utilizes data as input, and (2) processes that data to return algorithmically-generated results.

For example:

    • Amazon's recommendation engines suggest items for you to buy, determined by their algorithms. Netflix recommends movies to you. Spotify recommends music to you.



Data Science Skills and Tools

Below are very important skills for data science world.


image



image



Data Science Process and Focus on Trends


image




Below are a few generic steps followed in most organizations today within the scope of data analytics and data science.


Data Requirements (Data Analytics and Data Science)

The data necessary as inputs to the analysis are specified based upon the requirements of those directing the analysis or of the customers who will use the finished product of the analysis.


Data Collection (Data Analytics and Data Science)

Data is collected from a variety of sources. The requirements may be communicated by analysts to custodians of the data, such as information technology personnel within an organization. The data may also be collected from sensors in the environment, such as traffic cameras, satellites, recording devices, etc. It may also be obtained through interviews, downloads from online sources, or reading documentation.


Data Processing (Data Analytics and Data Science)

Data initially obtained must be processed or organised for analysis. For instance, these may involve placing data into rows and columns in a table format (i.e., structured data) for further analysis, such as within a spreadsheet or statistical software.
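A minimal Python sketch of this step, using the standard csv module: loosely structured records are arranged into rows and columns with a fixed column order. The record fields (a sensor name and a count) are hypothetical examples.

```python
import csv
import io


def to_table(records, columns):
    """Arrange a list of dicts into CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",              # fill missing values with an empty cell
        extrasaction="ignore",   # drop fields that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical raw records; the second one is missing its count.
records = [{"sensor": "cam-1", "count": 12}, {"sensor": "cam-2"}]
table = to_table(records, ["sensor", "count"])
print(table)
```

The output can be opened directly in a spreadsheet or loaded by statistical software for the later steps.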


Data Cleaning (Data Analytics and Data Science)

Once processed and organised, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning will arise from problems in the way that data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccuracy of data, overall quality of existing data, deduplication, and column segmentation.
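Some of these cleaning tasks can be sketched in plain Python. The example below normalizes values, drops incomplete records, removes duplicates by matching on a normalized email address, and splits the name column into two. The field names and the choice of email as the matching key are assumptions made for this sketch.

```python
def clean(rows):
    """Normalize, drop incomplete rows, deduplicate, and split the name column."""
    seen_emails = set()
    cleaned = []
    for row in rows:
        # Collapse repeated whitespace and fix capitalization.
        name = " ".join(row.get("name", "").split()).title()
        email = row.get("email", "").strip().lower()
        if not name or not email:
            continue  # incomplete record
        if email in seen_emails:
            continue  # duplicate, matched on the normalized email
        seen_emails.add(email)
        # Column segmentation: split the full name into first and last name.
        first, _, last = name.partition(" ")
        cleaned.append({"first_name": first, "last_name": last, "email": email})
    return cleaned


# Hypothetical input with messy spacing, a duplicate and an incomplete row.
rows = [
    {"name": "  ada   lovelace", "email": "ADA@example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": ""},
]
print(clean(rows))
```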


Exploratory data analysis (Data Analytics and Data Science)

Once the data is cleaned, it can be analyzed. Analysts may apply a variety of techniques referred to as exploratory data analysis to begin understanding the messages contained in the data. The process of exploration may result in additional data cleaning or additional requests for data, so these activities may be iterative in nature.
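A common first move in exploratory data analysis is a set of summary statistics. The sketch below uses Python's standard statistics module on a small, made-up list of values; it is only one of many possible exploratory techniques.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(values)
print(summary)
```

A large gap between the mean and the median, or an unexpectedly large maximum, is often the first hint that more cleaning is needed.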


Modeling and algorithms (Data Science)

Mathematical formulas or models called algorithms may be applied to the data to identify relationships among the variables, such as correlation or causation. In general terms, models may be developed to evaluate a particular variable in the data based on other variable(s) in the data.
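As one example of such a formula, the Python sketch below computes the Pearson correlation coefficient between two variables. The sample numbers are made up. Note that a high correlation only measures how closely two variables move together; on its own it does not show that one causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical variables: advertising spend and units sold.
spend = [1, 2, 3, 4]
units = [2, 4, 6, 8]
print(pearson(spend, units))
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.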


Data Product or Model (Data Science)

A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm. An example is an application that analyzes data about customer purchasing history and recommends other purchases the customer might enjoy.
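The purchase-recommendation example can be sketched as a very small data product in Python. It suggests items bought by other customers who share at least one purchase with the target customer. The purchase history and the scoring rule are hypothetical and far simpler than what real recommendation engines use.

```python
from collections import Counter


def recommend(history, customer, top_n=2):
    """Suggest items bought by customers who share a purchase with this one."""
    owned = history[customer]
    scores = Counter()
    for other, items in history.items():
        if other == customer or not (owned & items):
            continue  # skip the customer themselves and unrelated customers
        for item in items - owned:
            scores[item] += 1
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked][:top_n]


# Hypothetical purchase history: customer -> set of purchased items.
history = {
    "ann": {"laptop", "mouse"},
    "ben": {"laptop", "mouse", "keyboard"},
    "cat": {"mouse", "keyboard", "monitor"},
    "dan": {"desk"},
}
print(recommend(history, "ann"))
```

The input is data (purchase history) and the output is generated by an algorithm, which matches the definition of a data product given above.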


Communication (Data Analytics and Data Science)

Once the data is analyzed, it may be reported in many formats to the users of the analysis to support their requirements. The users may have feedback, which results in additional analysis. As such, much of the analytical cycle is iterative.



Difference between Data Science and Data Analytics


Below is a skill-based comparison that explains the difference between data scientists and data analysts.


image



Below is a role-based comparison that explains the difference between data scientists and data analysts.


"Analyst" is somewhat of an ambiguous job title that can represent many different types of roles (data analyst, marketing analyst, operations analyst, financial analyst, etc). What does this mean in comparison to data scientist?

  • Data Scientist: A specialty role with abilities in math, technology, and the business domain. Data scientists work at the raw database level to derive insights and build data products.
  • Analyst: This can mean a lot of things. The common thread is that analysts look at data to try to gain insights. Analysts may interact with data at either the database level or the summarized report level.


image



Future of Data Science

In the future, data science will be a blend of data analytics, IoT, machine learning, and big data. Each of these will exchange data with the others to fulfil data science requirements. Eventually, this will deliver fully autonomous systems, described by terms such as “smart system”, “cloud robotics”, and “cyber-physical system”.


image



image