Friday, 27 October 2017

Domain Name System - Part 2


The Domain Name System (DNS) is a hierarchical decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. Most prominently, it translates more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols.

image


In the Internet world, there are two types of DNS search mechanisms: 1) Recursive and 2) Iterative. The pictures below explain these search types.
Recursive DNS Search:
image

Iterative DNS Search:

image
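To make the difference concrete, here is a minimal Python sketch that simulates both mechanisms against a hypothetical, in-memory name server hierarchy. There is no real network traffic, and the server names and the 192.0.2.10 address are made up for illustration. In the iterative style the resolver itself follows each referral; in the recursive style (the textbook picture above) each server asks the next one on the caller's behalf. In practice a client usually sends one recursive query to its resolver, which then queries the root, TLD and authoritative servers iteratively.

```python
from __future__ import annotations

# Hypothetical name server hierarchy: each server either knows the answer
# or refers the caller to a more specific server (a "referral").
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"example.com.": "example-auth"}},
    "example-auth": {"answers": {"www.example.com.": "192.0.2.10"}},
}

def ask(server: str, name: str) -> tuple[str, str]:
    """Return ('answer', ip) or ('referral', next_server) for a single query."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name.endswith(suffix):
            return "referral", next_server
    raise LookupError(f"{server} cannot resolve {name}")

def resolve_iterative(name: str, server: str = "root") -> tuple[str, list[str]]:
    """The resolver follows every referral itself, recording the servers it asked."""
    path = []
    while True:
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        server = value  # send a new query to the referred server

def resolve_recursive(name: str, server: str = "root") -> str:
    """Each server queries the next one on the caller's behalf."""
    kind, value = ask(server, name)
    if kind == "answer":
        return value
    return resolve_recursive(name, value)

ip, path = resolve_iterative("www.example.com.")
print(ip, path)  # 192.0.2.10 ['root', 'com-tld', 'example-auth']
print(resolve_recursive("www.example.com."))  # 192.0.2.10
```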







What is IPv4 and IPv6?
image


Top Level Domains
image
image


IANA (Internet Assigned Numbers Authority)
image


Domain Registrars
image


ICANN (Internet Corporation for Assigned Names and Numbers)
image


WHOIS DB for Domain Name Search
image


InterNIC Service of ICANN
image



Below is a simple example of how a browser resolves a domain name to an IP address.
image
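A program can ask the operating system's resolver to do the same lookup. The sketch below uses Python's standard `socket.getaddrinfo`, which consults the hosts file, local caches and the configured DNS servers; this is roughly the step a browser's network stack performs, although browsers may also keep their own DNS cache. The `resolve` helper name is illustrative. The example uses the numeric address 127.0.0.1 so that it runs without network access.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses the OS resolver reports for hostname."""
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})

print(resolve("127.0.0.1"))  # a numeric address "resolves" to itself: ['127.0.0.1']
# resolve("www.example.com") would perform a real DNS lookup (needs network access).
```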



Below is a simple example of how a hosting server change is made on the domain registrar's site.
image


Hosted Zone
A hosted zone is a collection of resource record sets for a specified domain. You create a hosted zone for a domain (such as example.com), and then you create resource record sets to tell the Domain Name System how you want traffic to be routed for that domain. When you create a hosted zone, Amazon Route 53 automatically creates a name server (NS) record and a start of authority (SOA) record for the zone. The NS record identifies the four name servers that you give to your registrar or your DNS service so that DNS queries are routed to Amazon Route 53 name servers.



DNS Record Types
SOA (Start Of Authority) Record:

SOA stands for Start of Authority and is a significant part of a zone file in the Domain Name System (DNS). An SOA record contains important management information about the zone, especially regarding zone transfers. Keeping an SOA record on the DNS server is standard practice; it is used when the zone file is changed and transferred from the primary to the secondary servers.

Background:
Normally, DNS name servers are set up in clusters. The database within these clusters is synchronized through zone transfers. The SOA record in the zone file contains the data that controls the zone transfer: the serial number and various time spans. It also contains the e-mail address of the person responsible for the zone, as well as the name of the primary master server. Usually the SOA record is located at the top of the zone. A zone without an SOA record does not meet the standard and is therefore not transferable.

image

The SOA record is perhaps the least understood record in the entire zone file, yet it controls the speed at which any update is propagated throughout the Internet. The purpose of the SOA record is to:
  • Identify the DNS server that is authoritative for all information within the domain.
  • List the email address of the person in charge of the domain.
  • Control how often secondary servers check for changes to the zone file.
  • Control how long secondary servers keep the zone file active when the primary server cannot be contacted.
  • Control how long a negative response is cached by a DNS resolver (but for some DNS servers, this is also how long a DNS resolver should cache any response).
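The SOA data fields map directly onto the purposes listed above. The sketch below parses a one-line zone-file SOA entry into the RDATA fields defined in RFC 1035 (MNAME, RNAME, SERIAL, REFRESH, RETRY, EXPIRE, MINIMUM) and turns RNAME back into an e-mail address. The record values are hypothetical, and the parser assumes the whole record is on one line; real zone files often spread the SOA over several lines inside parentheses, which this sketch does not handle.

```python
from dataclasses import dataclass

@dataclass
class SOARecord:
    mname: str    # primary (master) name server for the zone
    rname: str    # responsible person's mailbox, written as a domain name
    serial: int   # incremented on every change; secondaries compare it
    refresh: int  # seconds between secondary checks for a new serial
    retry: int    # seconds to wait before retrying a failed refresh
    expire: int   # seconds a secondary keeps answering without reaching the primary
    minimum: int  # negative-caching TTL (RFC 2308)

    @property
    def email(self) -> str:
        # The first dot in RNAME stands for '@' (escaped dots are ignored here).
        local, _, domain = self.rname.rstrip(".").partition(".")
        return f"{local}@{domain}"

def parse_soa(line: str) -> SOARecord:
    """Parse 'name TTL class SOA mname rname serial refresh retry expire minimum'."""
    fields = line.split()
    idx = fields.index("SOA")
    mname, rname, *numbers = fields[idx + 1 : idx + 8]
    return SOARecord(mname, rname, *map(int, numbers))

soa = parse_soa("example.com. 3600 IN SOA ns1.example.com. hostmaster.example.com. "
                "2017102701 7200 3600 1209600 3600")
print(soa.email, soa.serial, soa.refresh)  # hostmaster@example.com 2017102701 7200
```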

NS (Name Server) Record:
image
Note: A name server here is one of the physical DNS servers of AWS's Route 53 service. This DNS server holds all the records used to resolve a domain name to an IP address when a user requests a website through a browser.

A (Address) Record:
image


TTL (Time to Live) Value:
image


CNAME (Canonical Name) Record:
image


Alias Record:
image
image

Below are a few other DNS record types used in the industry.
image


Routing Policies
When you create a resource record set, you choose a routing policy, which determines how Amazon Route 53 responds to queries. The routing policies available in AWS are listed below.
  • Simple routing policy – Use for a single resource that performs a given function for your domain, for example, a web server that serves content for the example.com website.
image


image
  • Failover routing policy – Use when you want to configure active-passive failover.
image

image

  • Geoproximity routing policy – Use when you want to route traffic based on the location of your resources and, optionally, shift traffic from resources in one location to resources in another.
image


image


  • Latency routing policy – Use when you have resources in multiple locations and you want to route traffic to the resource that provides the best latency.
image

image

  • Multivalue answer routing policy – Use when you want Amazon Route 53 to respond to DNS queries with up to eight healthy records selected at random.
  • Weighted routing policy – Use to route traffic to multiple resources in proportions that you specify.
image
image
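The decision each policy makes can be simulated in a few lines. The Python sketch below is a toy model of that decision logic under stated assumptions, not Route 53's actual implementation or API: the function names, the 192.0.2.x addresses, the weights, the region names, the latency figures and the health flags are all hypothetical.

```python
from __future__ import annotations

import random

def weighted_choice(records: dict[str, int], rng: random.Random) -> str:
    """Weighted routing: pick one record with probability weight / total weight."""
    names = list(records)
    return rng.choices(names, weights=[records[name] for name in names], k=1)[0]

def multivalue_answer(health: dict[str, bool], rng: random.Random, limit: int = 8) -> list[str]:
    """Multivalue answer routing: up to `limit` healthy records, chosen at random."""
    healthy = [ip for ip, is_healthy in health.items() if is_healthy]
    return rng.sample(healthy, k=min(limit, len(healthy)))

def failover(primary: str, secondary: str, primary_healthy: bool) -> str:
    """Failover routing (active-passive): use the secondary only if the primary is unhealthy."""
    return primary if primary_healthy else secondary

def lowest_latency(latency_ms: dict[str, float]) -> str:
    """Latency routing: answer with the resource whose measured latency is lowest."""
    return min(latency_ms, key=latency_ms.get)

rng = random.Random(42)  # seeded so the simulation is repeatable
picks = [weighted_choice({"192.0.2.1": 70, "192.0.2.2": 30}, rng) for _ in range(10_000)]
print(picks.count("192.0.2.1") / len(picks))  # close to 0.7
print(lowest_latency({"us-east-1": 120.0, "eu-west-1": 35.0}))  # eu-west-1
```

Seeding the random generator keeps the simulation repeatable; a real DNS service would draw fresh randomness for every query.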


























Saturday, 14 October 2017

Digitalization


Welcome to the new era of Digitalization. It is truly a buzzword in our everyday lives nowadays. This article explains the fundamental concept of Digitalization, which brings value to a business and its revenue system.


Below are a few key definitions of Digitalization.

“Digitalization is the use of digital technologies to change a business model and provide new revenue and value-producing opportunities; it is the process of moving to a digital business.” 

“Digitalization is the integration of digital technologies into everyday life by the digitization of everything that can be digitized. The literal meaning of digitalization gives an apparent idea of development and technology dependent world. In this chapter, digitalization means computerization of systems and jobs for better ease and accessibility.”


Digital Transformation

Digital Transformation is a once-in-a-lifetime technology shift from our current business state. It transforms every industry and every business in the world for the digital era. At present, many firms are in the IT industrialization phase and are looking for new business opportunities and business value. The concept of “Digitalization” helps them achieve these strategic business objectives.


The picture below explains our current state and the expected changes in the digital world.

image



There is a sequence of steps to move a current legacy business model to digitalization.


1) The first step in the digital transformation process is to fundamentally change the way you think about your business.


image



2) Move from the traditional business model to a newly invented one built on the latest emerging digital technologies.


image



3) You need to address these changes in your new business model before your competitors do.


image



4) Digital Transformation will have a big impact on your business in terms of revenue, customer satisfaction, competitive advantage, and productivity.


image



Below are a few popular firms that have recently moved into the “Digitalization” world with emerging technologies like Big Data, Cloud, Containers, Machine Learning, and so on:

                  • UBER
                  • AIRBNB
                  • TESLA



Below are a few key technologies and key terms used in the “Digital Transformation” process of a business.


Open Source:

Open source denotes software for which the original source code is made freely available and may be redistributed and modified.

Open source software is software with source code that anyone can inspect, modify, and enhance. "Source code" is the part of software that most computer users don't ever see; it's the code computer programmers can manipulate to change how a piece of software (a "program" or "application") works. Programmers who have access to a computer program's source code can improve that program by adding features to it or fixing parts that don't always work correctly.


image


image



Big Data:

Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.


image


Big data Infrastructure Stack

image



Cloud:

Cloud computing is the on-demand delivery of compute power, database storage, applications, and other IT resources through a cloud services platform via the internet with pay-as-you-go pricing. Amazon Web Services (AWS) is currently the leading cloud provider and offers a very broad catalog of services through its cloud platform.


image



image



Containers:

A container packages software into a small, isolated unit. A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments, and help reduce conflicts between teams running different software on the same infrastructure.


Containers are a way to package software in a format that can run isolated on a shared operating system. Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.  Docker automates the repetitive tasks of setting up and configuring development environments so that developers can focus on what matters: building great software.


Docker is one of the most popular software container platforms. Developers use Docker to eliminate “works on my machine” problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side-by-side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines to ship new features faster, more securely and with confidence for Linux, Windows Server, and Linux-on-mainframe apps.


image



DevOps (Development + Operations):

DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.


image


Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes, these two teams are merged into a single team where the engineers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function. Quality assurance and security teams may also become more tightly integrated with development and operations and throughout the application lifecycle.

These teams use practices to automate processes that historically have been manual and slow. They use a technology stack and tooling which help them operate and evolve applications quickly and reliably. These tools also help engineers independently accomplish tasks (for example, deploying code or provisioning infrastructure) that normally would have required help from other teams, and this further increases a team’s velocity.


Conclusion

Organizations can achieve Digitalization, or Digital Transformation, more easily with the help of open source software and cloud platforms. This article references only a few of the open source applications and platforms used in the digitalization process. However, a large number of open source technologies are freely available to start the digital transformation of your business. I will update this article with popular open source software and tools once I have gathered enough information about them. Until then, see you.




File Formats and Structure


A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats, however, are designed for storage of several different types of data: the OGG format can act as a container for different types of multimedia including any combination of audio and video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, including possible control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, scalable vector graphics, and the source code of computer software are text files with defined syntaxes that allow them to be used for specific purposes.


Identifying the file type

Different operating systems have traditionally taken different approaches to determining a particular file's format, with each approach having its own advantages and disadvantages. Most modern operating systems and individual applications need to use all of the following approaches to read "foreign" file formats, if not work with them completely.


Filename extension

One popular method used by many operating systems, including Windows, Mac OS X, CP/M, DOS, VMS, and VM/CMS, is to determine the format of a file based on the end of its name—the letters following the final period. This portion of the filename is known as the filename extension. For example, HTML documents are identified by names that end with .html (or .htm), and GIF images by .gif. In the original FAT file system, file names were limited to an eight-character identifier and a three-character extension, known as an 8.3 filename. There are only so many three-letter extensions, so, often any given extension might be linked to more than one program. Many formats still use three-character extensions even though modern operating systems and application programs no longer have this limitation. Since there is no standard list of extensions, more than one format can use the same extension, which can confuse both the operating system and users.
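The extension approach is easy to sketch, and so is its main weakness. The Python example below uses the standard `os.path.splitext` to take the text after the final period and looks it up in a tiny, hypothetical table. The `.ts` entry shows a real ambiguity: the same extension is used both for TypeScript source files and for MPEG transport streams.

```python
import os

# A tiny, hypothetical extension table; real systems keep much larger maps.
EXTENSIONS = {
    ".html": ["HTML document"],
    ".htm": ["HTML document"],
    ".gif": ["GIF image"],
    ".ts": ["TypeScript source file", "MPEG transport stream"],  # one extension, two formats
}

def formats_for(filename: str) -> list:
    """Guess candidate formats from the letters after the final period (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return EXTENSIONS.get(ext.lower(), [])

print(formats_for("INDEX.HTM"))  # ['HTML document']
print(formats_for("clip.ts"))    # ['TypeScript source file', 'MPEG transport stream']
print(formats_for("README"))     # [] (no extension, no guess)
```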



Internal metadata


A second way to identify a file format is to use information regarding the format stored inside the file itself, either information meant for this purpose or binary strings that happen to always be in specific locations in files of some formats. Since the easiest place to locate them is at the beginning, such an area is usually called a file header when it is greater than a few bytes, or a magic number if it is just a few bytes long.


File header

The metadata contained in a file header are usually stored at the start of the file, but might be present in other areas too, often including the end, depending on the file format or the type of data contained. Character-based (text) files usually have character-based headers, whereas binary formats usually have binary headers, although this is not a rule. Text-based file headers usually take up more space, but being human-readable, they can easily be examined by using simple software such as a text editor or a hexadecimal editor.


As well as identifying the file format, file headers may contain metadata about the file and its contents. For example, most image files store information about image format, size, resolution and color space, and optionally authoring information such as who made the image, when and where it was made, what camera model and photographic settings were used (Exif), and so on. Such metadata may be used by software reading or interpreting the file during the loading process and afterwards.
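A GIF file shows both roles of a header in its first ten bytes. According to the GIF87a/GIF89a specifications, the file starts with a 6-byte signature and version, followed by the logical screen width and height as little-endian 16-bit integers. The Python sketch below reads those fields with the standard `struct` module. The sample bytes are hand-built for the example and are not a complete image.

```python
import struct

def gif_header(data: bytes) -> dict:
    """Read the version and logical screen size from a GIF file's header."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Bytes 6-9: width and height as little-endian unsigned 16-bit integers.
    width, height = struct.unpack("<HH", data[6:10])
    return {"version": data[3:6].decode("ascii"), "width": width, "height": height}

sample = b"GIF89a" + struct.pack("<HH", 640, 480) + b"\x00\x00\x00"
print(gif_header(sample))  # {'version': '89a', 'width': 640, 'height': 480}
```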


Magic number or Shebang identifier in file contents

One way to incorporate file type metadata, often associated with Unix and its derivatives, is just to store a "magic number" inside the file itself. Originally, this term was used for a specific set of 2-byte identifiers at the beginnings of files, but since any binary sequence can be regarded as a number, any feature of a file format which uniquely distinguishes it can be used for identification. GIF images, for instance, always begin with the ASCII representation of either GIF87a or GIF89a, depending upon the standard to which they adhere. Many file types, especially plain-text files, are harder to spot by this method. HTML files, for example, might begin with the string <html> (which is not case sensitive), or an appropriate document type definition that starts with <!DOCTYPE HTML>, or, for XHTML, the XML identifier, which begins with <?xml. The files can also begin with HTML comments, random text, or several empty lines, but still be usable HTML.


The magic number approach offers better guarantees that the format will be identified correctly, and can often determine more precise information about the file. Since reasonably reliable "magic number" tests can be fairly complex, and each file must effectively be tested against every possibility in the magic database, this approach is relatively inefficient, especially for displaying large lists of files (in contrast, file name and metadata-based methods need to check only one piece of data, and match it against a sorted index). On the other hand, a valid magic number does not guarantee that the file is not corrupt or is of the correct type. So-called shebang lines in script files are a special case of magic numbers. Here, the magic number is human-readable text that identifies a specific command interpreter and options to be passed to the command interpreter.
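The Python sketch below compares a file's leading bytes against a handful of well-known signatures and falls back to shebang detection for scripts. The signature list is a small hand-picked assumption; tools such as the Unix `file` command rely on a much larger magic database with offsets, masks and nested tests.

```python
# A few well-known signatures, all found at offset 0.
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]

def sniff(data: bytes) -> str:
    """Identify content by its leading bytes, falling back to a shebang check."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    if data.startswith(b"#!"):
        # The rest of the first line names the interpreter, e.g. /bin/sh.
        interpreter = data[2:].split(b"\n", 1)[0].strip().decode("ascii", "replace")
        return f"script for {interpreter}"
    return "unknown"

print(sniff(b"GIF89a..."))                  # GIF image
print(sniff(b"#!/bin/sh\necho hello\n"))    # script for /bin/sh
```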


External metadata


A final way of storing the format of a file is to explicitly store information about the format in the file system, rather than within the file itself. This approach keeps the metadata separate from both the main data and the name, but is also less portable than either file extensions or "magic numbers", since the format has to be converted from file system to file system. While this is also true to an extent with filename extensions—for instance, for compatibility with MS-DOS's three character limit—most forms of storage have a roughly equivalent definition of a file's data and name, but may have varying or no representation of further metadata.


Note that zip files or archive files solve the problem of handling metadata. A utility program collects multiple files together along with metadata about each file and the folders/directories they came from all within one new file (e.g. a zip file with extension .zip). The new file is also compressed and possibly encrypted, but now is transmissible as a single file across operating systems by FTP systems or attached to email. At the destination, it must be unzipped by a compatible utility to be useful, but the problems of transmission are solved this way.


MIME (Multipurpose Internet Mail Extensions) types


MIME types are widely used in many Internet-related applications, and increasingly elsewhere, although their usage for on-disc type information is rare. These consist of a standardized system of identifiers (managed by IANA) consisting of a type and a sub-type, separated by a slash—for instance, text/html or image/gif. These were originally intended as a way of identifying what type of file was attached to an e-mail, independent of the source and target operating systems. There are problems with the MIME types though; several organizations and people have created their own MIME types without registering them properly with IANA, which makes the use of this standard awkward in some cases.
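Python's standard `mimetypes` module maps file names to MIME types, which makes the type/sub-type structure easy to see. Note that `mimetypes.guess_type` works from the extension (using a built-in table plus any system `mime.types` files it finds), so it inherits the weaknesses of the extension approach. The `mime_parts` helper is a hypothetical name used only for this sketch.

```python
import mimetypes
from typing import Optional, Tuple

def mime_parts(filename: str) -> Optional[Tuple[str, str]]:
    """Guess a MIME type from the file name and split it into (type, sub-type)."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return None
    top_level, _, subtype = mime.partition("/")
    return top_level, subtype

print(mime_parts("index.html"))  # ('text', 'html')
print(mime_parts("logo.gif"))    # ('image', 'gif')
```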


File format identifiers (FFIDs)


File format identifiers (FFIDs) are another, not widely used, way to identify file formats according to their origin and their file category. They were created for the Description Explorer suite of software. An FFID is composed of several digits of the form NNNNNNNNN-XX-YYYYYYY. The first part indicates the originating organization or maintainer (this number represents a value in a company/standards organization database), and the two following digits categorize the type of file in hexadecimal. The final part is composed of the usual file extension of the file or the international standard number of the file, padded left with zeros. For example, the PNG file specification has the FFID 000000001-31-0015948, where 31 indicates an image file, 0015948 is the standard number and 000000001 indicates the ISO organization.
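Because an FFID has fixed parts, splitting one is a simple pattern match. The Python sketch below assumes the field widths given by the NNNNNNNNN-XX-YYYYYYY form above (9 digits, 2 hexadecimal digits, 7 characters); the `FFID` type and `parse_ffid` function are hypothetical names. The category is kept as text, as the example above writes it ("31").

```python
import re
from typing import NamedTuple

class FFID(NamedTuple):
    organization: str  # origin/maintainer number (000000001 = ISO in the example above)
    category: str      # two hexadecimal digits giving the file category
    identifier: str    # extension or standard number, left-padded with zeros

FFID_PATTERN = re.compile(r"^(\d{9})-([0-9A-Fa-f]{2})-([0-9A-Za-z]{7})$")

def parse_ffid(text: str) -> FFID:
    """Split an FFID string into its three parts, rejecting malformed input."""
    match = FFID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an FFID: {text!r}")
    return FFID(*match.groups())

png = parse_ffid("000000001-31-0015948")
print(png.category, png.identifier.lstrip("0"))  # 31 15948
```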


File content based format identification


Another, less popular way to identify the file format is to examine the file contents for patterns that distinguish file types. The contents of a file are a sequence of bytes, and a byte has 256 possible values (0–255). Thus, counting the occurrences of each byte value, often referred to as the byte frequency distribution, gives distinguishable patterns for identifying file types. There are many content-based file type identification schemes that use the byte frequency distribution to build representative models for each file type and then apply statistical and data mining techniques to identify file types.
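A minimal version of this idea fits in a few lines of Python. The sketch builds a normalized 256-bin histogram per file and assigns a new file to the type whose average profile (centroid) is nearest in Euclidean distance. Nearest-centroid is just one simple stand-in for the statistical techniques mentioned above, and the two training samples are toy data chosen for the example.

```python
from __future__ import annotations

import math
from collections import Counter

def byte_frequency(data: bytes) -> list[float]:
    """256-bin histogram of byte values, normalized to sum to 1."""
    counts = Counter(data)  # iterating over bytes yields ints 0-255
    total = len(data) or 1
    return [counts.get(value, 0) / total for value in range(256)]

def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two byte frequency distributions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(data: bytes, centroids: dict[str, list[float]]) -> str:
    """Return the file type whose profile is closest to this data's profile."""
    profile = byte_frequency(data)
    return min(centroids, key=lambda name: distance(profile, centroids[name]))

centroids = {
    "text": byte_frequency(b"plain text " * 10),         # toy text sample
    "binary": byte_frequency(bytes(range(256)) * 4),     # toy uniform binary sample
}
print(classify(b"a plain text", centroids))   # text
print(classify(bytes(range(256)), centroids)) # binary
```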



File structure and Types

File structure describes how each file type stores its encoded content on disk. There are several ways to structure data in a file; the most common ones are described below.


Unstructured formats (Raw memory dumps)


Earlier file formats used raw data formats that consisted of directly dumping the memory images of one or more structures into the file. This has several drawbacks. Unless the memory images also have reserved spaces for future extensions, extending and improving this type of structured file is very difficult. It also creates files that might be specific to one platform or programming language (for example a structure containing a Pascal string is not recognized as such in C). On the other hand, developing tools for reading and writing these types of files is very simple.
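The platform dependence of raw dumps is easy to demonstrate with Python's standard `struct` module. The C record in the comment is a hypothetical example. With the `@` prefix, `struct` uses the machine's native byte order, sizes and alignment, much as a raw memory dump would; with `<` it uses a fixed little-endian layout without padding, which is what a portable format would specify.

```python
import struct

# Hypothetical C record:  struct sample { int16_t id; int32_t value; };
# A raw dump copies the in-memory bytes, including alignment padding and the
# CPU's byte order, so the resulting file layout depends on the platform.
raw_dump = struct.pack("@hi", 7, 1000)   # native order, sizes and alignment
portable = struct.pack("<hi", 7, 1000)   # little-endian, standard sizes, no padding

print(struct.calcsize("@hi"))  # typically 8: two padding bytes after the 16-bit id
print(struct.calcsize("<hi"))  # always 6
print(portable.hex())          # 0700e8030000

# Reading the data back only works if the reader assumes exactly the same layout.
print(struct.unpack("<hi", portable))  # (7, 1000)
```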


Chunk-based formats


In this kind of file structure, each piece of data is embedded in a container that somehow identifies the data. The container's scope can be identified by start- and end-markers of some kind, by an explicit length field somewhere, or by fixed requirements of the file format's definition. Throughout the 1970s, many programs used formats of this general kind. For example, word-processors such as troff, Script, and Scribe, and database export files such as CSV. Electronic Arts and Commodore-Amiga also used this type of file format in 1985, with their IFF (Interchange File Format) file format. A container is sometimes called a "chunk", although "chunk" may also imply that each piece is small, and/or that chunks do not contain other chunks; many formats do not impose those requirements.
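PNG files use the same general chunk idea as IFF, with a simple, well-documented layout: each chunk is a 4-byte big-endian length, a 4-byte type tag, the data, and a 4-byte CRC computed over tag and data. The Python sketch below walks those chunks using the standard `struct` and `zlib` modules. The test stream is hand-built for the example (an IHDR chunk and an IEND chunk) and is not a decodable image.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunks(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (chunk_type, chunk_data) pairs from a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Chunk layout: length (4 bytes), type tag (4), data (length), CRC (4).
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        tag = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(tag + body) != crc:
            raise ValueError(f"bad CRC in {tag!r} chunk")
        yield tag.decode("ascii"), body
        pos += 12 + length
        if tag == b"IEND":
            break

def make_chunk(tag: bytes, body: bytes) -> bytes:
    """Build one chunk (used here only to create test data)."""
    return struct.pack(">I", len(body)) + tag + body + struct.pack(">I", zlib.crc32(tag + body))

# IHDR holds width, height and five single-byte fields (13 bytes in total).
stream = (PNG_SIGNATURE
          + make_chunk(b"IHDR", struct.pack(">II", 2, 1) + bytes(5))
          + make_chunk(b"IEND", b""))
for tag, body in png_chunks(stream):
    print(tag, len(body))  # IHDR 13, then IEND 0
```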

The information that identifies a particular "chunk" may be called many different things, often terms including "field name", "identifier", "label", or "tag". The identifiers are often human-readable, and classify parts of the data: for example, as a "surname", "address", "rectangle", "font name", etc. These are not the same thing as identifiers in the sense of a database key or serial number (although an identifier may well identify its associated data as such a key).


MIME headers do this with a colon-separated label at the start of each logical line. MIME headers cannot contain other MIME headers, though the data content of some headers has sub-parts that can be extracted by other conventions. CSV and similar files often do this using a header record with field names, and with commas to mark the field boundaries. Like MIME, CSV has no provision for structures with more than one level. XML and its kin can be loosely considered a kind of chunk-based format, since data elements are identified by markup that is akin to chunk identifiers. JSON is similar to XML without schemas, cross-references, or a definition for the meaning of repeated field-names, and is often convenient for programmers.
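Both labelling conventions can be read with Python's standard library: `csv.DictReader` takes its field names from the header record, and `email.parser.HeaderParser` reads colon-separated MIME-style labels. The sample data is made up for the example, and both results are flat, one-level structures.

```python
import csv
import io
from email.parser import HeaderParser

# CSV: the header record supplies the field names; commas mark field boundaries.
csv_text = "surname,address\nSmith,12 High St\nJones,3 Low Rd\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["surname"], rows[1]["address"])  # Smith 3 Low Rd

# MIME-style headers: a colon-separated label at the start of each logical line.
headers = HeaderParser().parsestr("Content-Type: text/html\nSubject: Report\n\n")
print(headers["Content-Type"])  # text/html
```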


Directory-based formats


This is another extensible format that closely resembles a file system (OLE Documents are actual file systems), where the file is composed of 'directory entries' that contain the location of the data within the file itself as well as its signatures (and in certain cases its type). Good examples of these types of file structures are disk images, OLE documents and TIFF images.



A Few Popular File Types


Text File Types

image



Data File Types

image



Audio File Types

image


Note:

MIDI (Musical Instrument Digital Interface) is a technical standard that describes a communications protocol, digital interface and electrical connectors and allows a wide variety of electronic musical instruments, computers and other related music and audio devices to connect and communicate with one another.[1] A single MIDI link can carry up to sixteen channels of information, each of which can be routed to a separate device.
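The sixteen channels are visible in the message format itself: in a channel message such as Note On, the high 4 bits of the status byte give the message type (0x9 for Note On) and the low 4 bits give the channel number (0-15, usually shown to musicians as 1-16). The Python sketch below builds and inspects such a message; the helper names are illustrative.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note On message; channel is 0-15."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI has 16 channels: 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("data bytes must be in 0-127")
    # Status byte: high nibble 0x9 = Note On, low nibble = channel number.
    return bytes([0x90 | channel, note, velocity])

def channel_of(message: bytes) -> int:
    """Extract the channel number from a channel message's status byte."""
    return message[0] & 0x0F

msg = note_on(channel=9, note=60, velocity=100)  # note 60 is middle C
print(msg.hex(), channel_of(msg))  # 993c64 9
```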




Video File Types

image


Note:

MP4 is an abbreviated term for MPEG-4 Part 14, a standard developed by the Moving Picture Experts Group (MPEG), which is responsible for setting industry standards for digital audio and video. MP4 is commonly used for sharing video files on the Web.



Executable File Types

image



Web File Types

image



Compressed File Types

image



System File Types

image



Settings File Types

image



Encoded File Types

image



Font File Types

image



Plugin File Types

image



Disk Image File Types

image



Developer File Types

image



Backup File Types

image




Image (Photo / Graphics / Animation) File Types


Basics of Digital Images

Our digital images are dimensioned in pixels (not bytes, and definitely not inches). A pixel is simply a color definition: the color that this tiny dot of sampled image area ought to be. Put all those colored dots together, and our brain sees the image. The image data losses we are talking about are changes to the colors of the pixels.

Image data consists of pixels, and pixels are "colors", simply the storage of the three RGB data components. Any 24-bit RGB image uses three bytes per pixel. So, for example, the image data from any 10 megapixel camera will occupy 3 x 10 = 30 million bytes, by definition of RGB color. This number is the "data size" (when opened into computer memory for use). A TIF file will be near that size (and is lossless), but JPG is normally compressed very heavily (lossy, not lossless) into a JPG file of perhaps 1/10 this size (varying with the JPG Quality setting), which is the "file size" (not the image size and not the data size). This example image size is still 10 megapixels (dimensioned in pixels, width x height), and the data size is 30 million bytes, but the JPG file size might be 3 MB (lossy compression takes a few liberties). The image will still come out of the JPG file as the same 10 megapixels and the same 30 million bytes when the 3 MB JPG file is opened. We hope its quality also comes out about the same (the JPG losses are altered color values of some of the pixels).

Image size (pixels) determines how we can use the image; everything is about the pixels. All photo editor programs support the following file formats, which can store images in these color modes:


Color Data Modes of File Types (bits per pixel)


JPG

RGB - 24-bits (8-bit color), or Grayscale - 8-bits

Always uses lossy JPG compression, but its degree is selectable, for higher quality and larger files, or lower quality and smaller files. JPG is for photo images, and is the worst choice for most graphics or text data.


TIF

Versatile, many formats supported.
Mode: RGB or CMYK or LAB, and others, almost anything.
8 or 16-bits per color channel, called 8 or 16-bit "color" (24 or 48-bit RGB files).
Grayscale - 8 or 16-bits,
Indexed color - 1 to 8-bits,
Line Art (bilevel)- 1-bit

For TIF files, most programs allow either no compression or LZW compression (LZW is lossless, but is less effective for color images). Adobe Photoshop also provides JPG or ZIP compression in TIF files (but this greatly reduces third-party compatibility of those TIF files). "Document programs" allow CCITT G3 or G4 compression for 1-bit text (fax files are G3 or G4 TIF files), which is lossless and tremendously effective (small). Many specialized image file types (like camera RAW files) are based on the TIF file format, but use special proprietary data tags.

24-bits is called 8-bit color, three 8-bit bytes for RGB (256x256x256 = 16.7 million colors maximum.)
Or 48-bits is called 16-bit color, three 16-bit words (65536x65536x65536 = trillions of colors conceptually)
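The color counts above follow from one formula: the number of values per channel, raised to the number of channels. A short Python check:

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors representable: (2 ** bits) ** channels."""
    return (2 ** bits_per_channel) ** channels

print(f"{color_count(8):,}")       # 16,777,216 (24-bit RGB, "16.7 million")
print(f"{color_count(16):,}")      # 281,474,976,710,656 (48-bit RGB, about 281 trillion)
print(color_count(8, channels=1))  # 256 shades for 8-bit grayscale
```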


PNG

RGB - 24 or 48-bits (called 8-bit or 16-bit "color"),
Alpha channel for RGB transparency - 32 bits
Grayscale - 8 or 16-bits,
Indexed color - 1 to 8-bits,
Line Art (bilevel) - 1-bit

Supports transparency in regular indexed color, and also there can be a fourth channel (called Alpha) which can map RGB graduated transparency (by pixel location, instead of only one color, and graduated, instead of only on or off).

The APNG version also supports animation (like GIF), showing several sequential frames fast to simulate motion.

PNG uses ZIP-style (Deflate) compression, which is lossless and somewhat more effective for color images than GIF or TIF LZW. For photo data, PNG produces somewhat smaller files than TIF LZW, but larger files than JPG (however, PNG is lossless and JPG is not). PNG is a newer format than the others, designed to be both versatile and royalty-free, back when the patent on LZW compression was disputed for GIF and TIF files.


GIF

Indexed color - 1 to 8-bits (8-bit indexes, limiting the image to 256 colors maximum). Each palette color is a 24-bit color, but there can be only 256 of them.

One color in indexed color can be marked transparent, allowing the underlying background to be seen (very important for text, for example). GIF is a format for on-screen images; the file contains no dpi information for printing. It was designed by CompuServe for online images in the days of dial-up and 8-bit indexed computer video, whereas other file formats can be 24-bit now. However, GIF is still great for web graphics containing only a few colors, where it makes a small lossless file, much smaller and better than JPG for this.

GIF uses lossless LZW compression.

GIF also supports animation, showing several sequential frames fast to simulate motion.


Note that if your image size is, say, 3000x2000 pixels, then this is 3000x2000 = 6 million pixels (6 megapixels). Assuming this 6 megapixel image data is 24-bit RGB color (3 bytes per pixel of RGB color information), the size of this image data is 6 million x 3 bytes = 18 million bytes. That is simply how large your image data is. File compression like JPG or LZW can make the file smaller, but when you open the image in computer memory for use, it is always still 3000x2000 pixels and 18 million bytes, even if a JPG no longer has quite the same image quality. This is simply how large your 6 megapixel RGB image data is (megapixels x 3 bytes per pixel).
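The same arithmetic as a small Python helper, using the 3000x2000 example. The extra `channels` and `bytes_per_channel` parameters are an illustrative extension that also covers the 16-bit (48-bit RGB) and grayscale modes listed earlier; compressed file size is deliberately left out, since it depends on the format and the image content.

```python
def image_data_size(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Bytes occupied by the uncompressed pixels once the image is opened in memory."""
    return width * height * channels * bytes_per_channel

width, height = 3000, 2000
print(width * height / 1_000_000)                           # 6.0 megapixels
print(image_data_size(width, height))                       # 18000000 bytes (24-bit RGB)
print(image_data_size(width, height, bytes_per_channel=2))  # 36000000 bytes (48-bit RGB)
print(image_data_size(width, height, channels=1))           # 6000000 bytes (8-bit grayscale)
```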


Type of Images

There are two types of images in the computer world: 1) Raster Images and 2) Vector Images.


Raster Image:

In computer graphics, a raster graphic or bitmap image is a dot matrix data structure representing a generally rectangular grid of pixels, or points of color, viewable via a monitor, paper, or other display medium. Raster images are stored in image files with varying formats. Raster graphics are best used for non-line art images, specifically digitized photographs, scanned artwork or detailed graphics. Non-line art images are best represented in raster form because these typically include subtle chromatic gradations, undefined lines and shapes, and complex composition.

To maximize the quality of a raster image, you must keep in mind that the raster format is resolution-specific — meaning that raster images are defined and displayed at one specific resolution. Resolution in raster graphics is measured in dpi, or dots per inch. The higher the dpi, the better the resolution. Remember also that the resolution you actually observe on any output device is not a function of the file’s own internal specifications, but the output capacity of the device itself. Thus, high resolution images should only be used if your equipment has the capability to display them at high resolution.
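Because a raster image has a fixed number of pixels, its printed size depends on the dpi it is printed at: inches = pixels / dpi. A minimal sketch, reusing the 3000x2000 example from earlier and assuming a 300 dpi print (strictly, this number is pixels per inch).

```python
def print_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    # A fixed pixel grid spread over more or fewer inches: inches = pixels / dpi.
    return width_px / dpi, height_px / dpi

w, h = print_size_inches(3000, 2000, 300)
print(f"{w:.2f} x {h:.2f} inches")  # 10.00 x 6.67 inches
```

Printing the same pixels larger lowers the dpi, which is why a raster image cannot be enlarged indefinitely without losing quality.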

Better resolution, however, comes at a price. Just as raster files are significantly larger than comparable vector files, high resolution raster files are significantly larger than low resolution raster files. Overall, as compared to vector graphics, raster graphics are less economical, slower to display and print, less versatile and more unwieldy to work with. Remember though that some images, like photographs, are still best displayed in raster format. Common raster formats include TIFF, JPEG, GIF, PCX and BMP files. Despite its shortcomings, raster format is still the Web standard — within a few years, however, vector graphics will likely surpass raster graphics in both prevalence and popularity.




Vector Image:

Unlike pixel-based raster images, vector graphics are based on mathematical formulas that define geometric primitives such as polygons, lines, curves, circles and rectangles. Because vector graphics are composed of true geometric primitives, they are best used to represent more structured images, like line art graphics with flat, uniform colors. Most created images (as opposed to natural images) meet these specifications, including logos, letterhead, and fonts.

Inherently, vector-based graphics are more malleable than raster images — thus, they are much more versatile, flexible and easy to use. The most obvious advantage of vector images over raster graphics is that vector images are quickly and perfectly scalable. There is no upper or lower limit for sizing vector images. Just as the rules of mathematics apply identically to computations involving two-digit numbers or two-hundred-digit numbers, the formulas that govern the rendering of vector images apply identically to graphics of any size.

Further, unlike raster graphics, vector images are not resolution-dependent. Vector images have no fixed intrinsic resolution; rather, they display at the resolution capability of whatever output device (monitor, printer) is rendering them. Also, because vector graphics need not store the contents of millions of tiny pixels, these files tend to be considerably smaller than their raster counterparts. Overall, vector graphics are more efficient and versatile. Common vector formats include AI, EPS, CGM, WMF and PICT (Mac).
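A minimal sketch of the raster/vector difference: a vector circle is just three numbers (center and radius), so scaling it only multiplies those numbers, and it is turned into pixels only when an output device needs them, at that device's resolution. The `Circle` class and `rasterize` function are hypothetical illustrations, not any real file format's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Circle:
    # A vector primitive: a center and a radius, nothing else.
    cx: float
    cy: float
    r: float

    def scaled(self, factor: float) -> "Circle":
        # Scaling a vector shape multiplies its numbers; no detail is lost.
        return Circle(self.cx * factor, self.cy * factor, self.r * factor)

def rasterize(circle: Circle, width: int, height: int) -> list[list[int]]:
    # Render at the output resolution: a pixel is on (1) if its center is inside the circle.
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = x + 0.5 - circle.cx
            dy = y + 0.5 - circle.cy
            row.append(1 if dx * dx + dy * dy <= circle.r * circle.r else 0)
        grid.append(row)
    return grid

logo = Circle(5, 5, 4)                          # fits a 10x10 canvas
small = rasterize(logo, 10, 10)                 # 100 pixels
large = rasterize(logo.scaled(10), 100, 100)    # same 3 numbers, 10,000 pixels
print(sum(map(sum, small)), sum(map(sum, large)))
```

The vector description stays the same size at any scale, while the raster output grows with the square of the resolution, and the large version is drawn with smooth edges rather than enlarged blocky pixels.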






Difference Between Photo (Raster) and Graphic (Vector) Images

Photo images have continuous tones, meaning that adjacent pixels often have very similar colors, for example, a blue sky might have many shades of blue in it. Normally this is 24-bit RGB color, or 8-bit grayscale, and a typical color photo may contain perhaps a hundred thousand RGB colors, out of the possible set of 16 million colors in 24-bit RGB color.

Graphic images are normally not continuous tone (gradients are possible in graphics, but are seen less often). Graphics are drawings, not photos, and they use relatively few colors, maybe only two or three, often fewer than 16 colors in the entire image. In a color graphic cartoon, the entire sky will be only one shade of blue, where a photo might have dozens of shades. A map, for example, is graphics: maybe 4 or 5 map colors plus 2 or 3 colors of text, plus blue water and white paper, often fewer than 16 colors overall. Line art is a special case with only two colors (black and white, with no gray), for example clip art, fax, and of course text. Low resolution line art (like cartoons on the web) is often better as grayscale, which adds anti-aliasing to hide the jaggies.
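One practical way to tell the two kinds apart is to count the distinct colors: a graphic usually fits within the 256-color limit of indexed color, while a continuous-tone photo does not. A minimal sketch with two hypothetical 100x100 images flattened into lists of RGB pixels.

```python
TRUE_COLOR_CHOICES = 2 ** 24  # 16,777,216 possible 24-bit RGB colors

def count_colors(pixels: list[tuple[int, int, int]]) -> int:
    # Number of distinct RGB colors actually used in the image.
    return len(set(pixels))

def fits_indexed_color(pixels: list[tuple[int, int, int]], max_colors: int = 256) -> bool:
    # True if the image could be stored as indexed color without changing any pixel.
    return count_colors(pixels) <= max_colors

cartoon = [(40, 90, 200)] * 5000 + [(255, 255, 255)] * 5000   # flat sky + white paper
photo = [(x, y, 200) for y in range(100) for x in range(100)]  # smooth gradient

print(count_colors(cartoon), fits_indexed_color(cartoon))  # 2 True
print(count_colors(photo), fits_indexed_color(photo))      # 10000 False
```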



Below are a few popular Picture/Image File Formats and Types

The most common image file formats, the most important for cameras, printing, scanning, and internet use, are JPG, TIF, PNG, and GIF.


3D Image Types

image


Raster Image Types

image


Vector Image Types

image



  • JPG is the most used image file format. JPG is the file extension for JPEG files (Joint Photographic Experts Group, a committee of ISO and ITU). Digital cameras and web pages normally use JPG files, because JPG heroically compresses the data to be very much smaller in the file. However, JPG uses lossy compression to accomplish this feat, which is a strong downside. A smaller file, yes, there is nothing like JPG for small, but this comes at the cost of image quality. The degree of compression is selectable (with an option setting named JPG Quality): lower quality for smaller files, or higher quality for larger files. In general today, JPG is rather unique in this regard, using lossy compression to allow very small files of lower quality, whereas almost any other file type uses lossless compression (and is larger). The meaning of Lossy is discussed below.

    Frankly, JPG is used when small file size is more important than maximum image quality (web pages, email, memory cards, etc). But JPG is good enough in many cases, if we don't overdo the compression, and perhaps good enough for some uses even if we do overdo it (web pages, etc). But if you are concerned with maximum quality for archiving your important images, then you need to know two things: 1) always choose a higher JPG Quality setting and accept a larger file, and 2) do NOT keep editing and saving your JPG images repeatedly, because more quality is lost every time you save as JPG (in the form of added JPG artifacts: pixels become colors they ought not to be, which is what lossy means). More at the JPG link at page bottom.

  • TIF is lossless (including its LZW compression option), and is considered the highest quality format for commercial work. The TIF format is not necessarily any "higher quality" per se (the same RGB image pixels are what they are), and most formats other than JPG are lossless too. TIF simply has no JPG artifacts, no additional losses to degrade and detract from the original. And TIF is the most versatile, except that web pages don't show TIF files. For other purposes, however, TIF does most anything you might want, from 1-bit to 48-bit color, RGB, CMYK, LAB, or Indexed color. Most of the "special" file types (for example, camera RAW files, fax files, or multipage documents) are based on TIF format, but with unique proprietary data tags, making them incompatible unless expected by their special software.
  • GIF was designed by CompuServe in the early days of 8-bit computer video, before JPG, for video display at dial-up modem speeds. GIF discards all Exif data, and while GIF is fine for video screen purposes, GIF does not retain printing resolution values. GIF always uses lossless LZW compression, but it is always an indexed color file (1 to 8 bits per pixel). GIF can have a palette of 24-bit colors, but only 256 of them maximum (which colors depends on your image colors). GIF colors are rather limited for color photos, but GIF is generally great for graphics. Repeating: don't use indexed color for color photos today; the color is too limited. GIF offers transparency and animation. PNG and TIF files can also optionally use the same indexed color mode that GIF uses, but they are more versatile, with other choices too (RGB, 16 bits, etc). GIF is still very good for web graphics (i.e., with a limited number of colors). For graphics of only a few colors, GIF can be much smaller than JPG, with clearer, purer colors than JPG. Indexed Color is described at Color Palettes.
  • PNG can replace GIF today (web browsers show both), and PNG also offers many of the options of TIF (indexed or RGB, 1 to 48 bits, etc). PNG was invented more recently than the others, designed to bypass possible LZW compression patent issues with GIF, and since it was more modern, it offers other options too (RGB color modes, 16 bits, etc). One additional feature of PNG is transparency for 24-bit RGB images. Normally PNG files are a little smaller than LZW-compressed TIF or GIF files (all of these use lossless compression, of different types), but PNG is slower to read or write. That patent situation has gone away now, but PNG remains excellent lossless compression. It is less used than TIF or JPG, but PNG is another good choice for lossless quality work.
  • Camera RAW files are very important of course, but RAW files must be processed to regular formats (JPG, TIF, etc) to be viewable and usable in any way. However, the point is that RAW offers substantial benefits in doing that, one of which is that we can choose our settings AFTER we can see the image, what it needs, and what helps it. The debate goes on: some cannot imagine NOT taking advantage of the greater opportunities of RAW. Others think any extra step is too much trouble, and are satisfied with JPG. My own biased opinion is they just don't know yet. :) More detail below.

    We could argue that there really is no concept of RAW files from a scanner. Vuescan does offer an output called RAW, which is 16-bit, but it is RGB, not raw. It includes the fourth (infrared) noise correction channel data, if any, and defers gamma correction. Vuescan itself is the only post-processor for these. But scanner color images are already RGB color, instead of Bayer pattern raw data like from cameras. Camera RAW images are not RGB, and must be converted to RGB for any use.
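To make "lossy" concrete, here is a minimal sketch of the kind of rounding a lossy encoder performs. A real JPEG encoder applies it to DCT coefficients of 8x8 pixel blocks, using quantization tables scaled by the Quality setting; this sketch skips the DCT and rounds hypothetical tone values directly, so it only illustrates the idea, with a small step standing in for high Quality and a large step for low Quality.

```python
def quantize(values: list[int], step: int) -> list[int]:
    # Lossy step: round each value to the nearest multiple of `step`.
    # Whatever lay between the multiples is discarded for good.
    return [round(v / step) * step for v in values]

def max_error(original: list[int], restored: list[int]) -> int:
    # Largest change any single value suffered.
    return max(abs(a - b) for a, b in zip(original, restored))

samples = [12, 13, 15, 18, 22, 27, 33, 40]  # hypothetical smooth tone values

high_quality = quantize(samples, 2)   # small step: small errors, more distinct values
low_quality = quantize(samples, 16)   # large step: larger errors, fewer distinct values

print(max_error(samples, high_quality), len(set(high_quality)))
print(max_error(samples, low_quality), len(set(low_quality)))
```

Fewer distinct values compress into a smaller file, which is the trade-off the JPG Quality setting controls. The lossless formats (TIF, PNG, GIF) never perform this rounding step.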




Major considerations to choose the necessary file type include:

  • Compression quality - Lossy for smallest files (JPG), or Lossless for best quality images (TIF, PNG).
  • Full RGB color for photos (TIF, PNG, JPG), or Indexed Color for graphics (PNG, GIF, TIF).
  • 16-bit color (48-bit RGB data) is sometimes desired (TIF and PNG).
  • Transparency or Animation is used in graphics (GIF and PNG).
  • Documents - line art, multi-page, text, fax, etc - this will be TIF.
  • CMYK color is certainly important for commercial prepress (TIF).

See chart near bottom of page. We select the file type that supports the options we need.
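The selection rule above ("pick the file type that supports every option we need") can be sketched as a set intersection. The support table is transcribed from the list of considerations above, plus the web-browser formats from the next paragraph; the need names and the function are hypothetical labels, not any real tool's API.

```python
# Which formats support each need, per the considerations listed above.
FORMAT_SUPPORT = {
    "lossy": {"JPG"},
    "lossless": {"TIF", "PNG"},
    "rgb": {"TIF", "PNG", "JPG"},
    "indexed": {"PNG", "GIF", "TIF"},
    "16bit": {"TIF", "PNG"},
    "transparency": {"GIF", "PNG"},
    "animation": {"GIF", "PNG"},
    "documents": {"TIF"},
    "cmyk": {"TIF"},
    "web": {"JPG", "GIF", "PNG"},
}

def choose_formats(needs: list[str]) -> list[str]:
    # Start with every format and keep only those supporting all requested needs.
    candidates = {"JPG", "TIF", "PNG", "GIF"}
    for need in needs:
        candidates &= FORMAT_SUPPORT[need]
    return sorted(candidates)

print(choose_formats(["rgb", "16bit"]))             # ['PNG', 'TIF']
print(choose_formats(["web", "rgb", "lossless"]))   # ['PNG']
print(choose_formats(["cmyk"]))                     # ['TIF']
```

An empty result means no single format covers that combination, and one of the needs has to give.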

The only reason for using lossy compression is smaller file size, usually due to internet transmission speed or storage space. Web pages require JPG, GIF, or PNG image types, because some browsers do not show TIF files. On the web, JPG is the clear choice for photo images (smallest file, with image quality being less important than file size), and GIF is common for graphic images, but indexed color is not normally used for color photos (PNG can do either on the web).

Other than the web, TIF file format is the undisputed leader when best quality is desired, largely because TIF is so important in commercial printing environments. High Quality JPG can be pretty good too, but don't ruin it by making the files too small. If the goal is high quality, you don't want small. Instead, make JPG files large (a high Quality setting), and plan your work so that you save them as JPG only one or two times. Adobe RGB color space may be OK for your home printer and profiles, but if you send your pictures out to be printed, mass market printing labs normally accept only JPG files and process only sRGB color space.