Friday 10 November 2017

Explain the Architecture of Apache Pig?

Apache Pig is a tool/platform, generally used with Hadoop, for analyzing large data sets. It was developed at Yahoo in 2006. It has gone through various releases, and the latest version is 0.17, released in June 2017. Much of the data manipulation in Hadoop can be done using Apache Pig. For data analysis, Pig provides a high-level language known as Pig Latin: programmers write scripts in Pig Latin to analyze data with Pig, and those scripts are internally converted into Map and Reduce tasks. Apache Pig contains a component known as the Pig Engine, which accepts Pig Latin as input and converts it into MapReduce jobs. Pig enables data workers to write complex transformations without prior knowledge of Java. Pig can also invoke code in languages such as Java, Jython and JRuby through its User Defined Functions (UDFs). Get more information at Hadoop Admin online course Bangalore.
Pig works with data from many sources, both structured and unstructured, and stores the results in the Hadoop Distributed File System. It is part of the Hadoop ecosystem, which includes Hive, HBase, ZooKeeper and other utilities that fill functionality gaps in the framework. A major advantage of Pig is its multi-query approach, which reduces the number of times the data has to be scanned, and it can cut development time by a factor of nearly 16.
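To give a feel for Pig Latin, here is a minimal sketch of a word-count style job driven through Pig's Java API (PigServer), which is one of the execution mechanisms discussed below. The input file 'input.txt', the aliases and the output directory 'wordcount_out' are illustrative assumptions, not part of the original post; the same statements could equally be typed at the Grunt shell.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCountSketch {
        public static void main(String[] args) throws Exception {
            // Local mode for experimentation; ExecType.MAPREDUCE would run against a Hadoop cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Hypothetical input file and aliases -- adjust the paths to your environment.
            pig.registerQuery("lines  = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("longer = FILTER words BY SIZE(word) > 3;");   // example of a built-in operator
            pig.registerQuery("grpd   = GROUP longer BY word;");
            pig.registerQuery("counts = FOREACH grpd GENERATE group AS word, COUNT(longer) AS n;");

            // The Pig engine turns these statements into MapReduce jobs and runs them.
            pig.store("counts", "wordcount_out");
        }
    }

Behind the scenes, the Pig engine parses these statements, optimizes the plan and compiles it into one or more MapReduce jobs, exactly as described in the architecture section below.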

 Architecture:

 To perform a particular task, programmers write a script in the Pig Latin language and execute it through any of the execution mechanisms (the Grunt shell, a script file, or an embedded program). During execution, the script goes through a series of transformations to produce the desired output.



Components:
Pig has several components, which together make up its architecture. Let us discuss them in detail.

Parser:
Initially, Pig scripts are handled by the Parser. It checks the syntax of the script and performs type checking and other miscellaneous checks. The output of the parser is a DAG (Directed Acyclic Graph), which represents the Pig Latin statements and logical operators.

Optimizer: The output of the parser is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Compiler: The compiler compiles the optimized logical plan into a series of MapReduce jobs (the EXPLAIN sketch after this list shows how these plans can be inspected).
Execution Engine: The execution engine submits the MapReduce jobs to Hadoop in sorted order. Finally, these MapReduce jobs are executed on Hadoop to produce the desired results.
Map Reduce: It splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework takes care of scheduling and monitoring the tasks and re-executes a task if it fails.
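One way to observe the parser, optimizer and compiler stages described above is Pig's EXPLAIN facility, which prints the logical, physical and MapReduce plans for an alias. Below is a minimal, hedged Java sketch using PigServer.explain(); the input file and aliases are illustrative assumptions, not part of the original post.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigExplainSketch {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Hypothetical two-statement script.
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("big   = FILTER lines BY SIZE(line) > 80;");

            // Prints the logical plan (parser and optimizer output), the physical plan,
            // and the MapReduce plan (compiler output) for the alias 'big'.
            pig.explain("big", System.out);
        }
    }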
  
Features of Pig:
UDFs: Pig provides the facility to create User Defined Functions, much as in other programming languages such as Java, and invoke them in Pig scripts (a sample Java UDF is sketched after this list).

Extensibility: Using the existing operators, users can develop their own functions to read, process and write data.
Rich set of operators: Operations such as join, sort and filter can be performed using its rich set of operators.
Effective handling: Pig handles all kinds of data, both structured and unstructured, and stores the results in HDFS.
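As an illustration of the UDF facility mentioned in the feature list above, here is a minimal sketch of a Java eval function that upper-cases a chararray field. EvalFunc is Pig's standard base class for eval UDFs; the class name is hypothetical.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // A simple eval UDF: returns the upper-cased form of its first argument.
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;   // be defensive about empty or null input tuples
            }
            return ((String) input.get(0)).toUpperCase();
        }
    }

In a script the UDF would then be registered and invoked along the lines of REGISTER myudfs.jar; followed by upper = FOREACH lines GENERATE com.example.UpperCase(line); where the jar name and package are again only placeholders.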

Advantages of Pig:
In comparison to SQL, Pig has the following advantages:
It declares execution plans directly, rather than leaving the plan entirely to a query optimizer.
It uses lazy evaluation.
It can store data at any point during a pipeline.
It supports Extract, Transform, Load (ETL) style processing.
MapReduce tasks can be expressed easily in the Pig Latin language.

Applications:
Processing time-sensitive data loads.
Processing huge data sources such as web logs.

Recommended Audience:

Software developers
ETL  developers
Project Managers
Team Leads
Business Analysts

Prerequisites:
There are not many prerequisites for learning Big Data Hadoop. It is good to have some knowledge of OOP concepts, but it is not mandatory; our trainers will teach you those OOP concepts if you do not already know them.
Become a master of the Pig framework with OnlineITGuru experts through the Hadoop Admin online course Bangalore.

Tuesday 31 October 2017

Briefly explain HDFS?

A file system is the method and data structures that an operating system uses to keep track of files on a disk or partition and to store those files.
Drawbacks of Distributed File System:
A traditional distributed file system stores and processes the data sequentially.
In a network, if one file is lost, the entire file system can collapse.
Performance decreases as the number of users of the file system increases.
To overcome these problems, HDFS was introduced.
Get in touch with OnlineITGuru for mastering the Hadoop Admin Online Training Bangalore.

Hadoop Distributed File System:

 HDFS is the Hadoop Distributed File System, which provides high-performance access to data across Hadoop clusters. It stores huge amounts of data across multiple machines and provides easy access to that data for processing. The file system was designed to be highly fault tolerant: it enables rapid transfer of data between compute nodes and allows the Hadoop system to continue operating if a node fails.
When HDFS loads data, it breaks the data into separate blocks and distributes them across different nodes in a cluster, which allows parallel processing of the data. A major advantage of this file system is that each block of data is stored multiple times across different nodes in the cluster. HDFS uses a master-slave architecture, with each cluster consisting of a single NameNode that manages file system operations and supporting DataNodes that manage data storage on individual nodes.
Architecture of HDFS:
Name Node: This is commodity hardware that runs the NameNode software on a GNU/Linux operating system. Any machine that supports Java can run the NameNode or DataNode. The system hosting the NameNode acts as the master server and does the following tasks:
It executes file system operations such as opening, closing and renaming files and directories.
It regulates clients' access to files.
It manages the file system namespace.
Data Node: This is also commodity hardware, with the DataNode software installed on a GNU/Linux operating system. Every slave node in the cluster runs a DataNode, which is responsible for managing the storage attached to that node (a hedged sketch just below shows how a client can ask the NameNode which DataNodes hold a file's blocks).
It performs read-write operations on the file system as per client requests.
It also performs block creation, deletion and replication according to the instructions of the NameNode.
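To make the split between NameNode metadata and DataNode storage concrete, here is a minimal, hedged sketch using Hadoop's Java FileSystem API that asks the NameNode which DataNodes hold each block of a file. The class name and the file path are hypothetical, and the client is assumed to be configured against a running cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/sample.txt");   // hypothetical file already in HDFS
            FileStatus status = fs.getFileStatus(file);

            // The NameNode answers this metadata query; the blocks themselves live on DataNodes.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + " length " + block.getLength()
                        + " hosts " + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }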
Block: Data is stored in HDFS in the form of files. A file stored in HDFS is divided into one or more segments, which are stored on individual DataNodes; these file segments are known as blocks. The default size of each block is 64 MB (128 MB in newer Hadoop releases), and a block is the minimum unit of data that HDFS reads or writes.
Replication: The number of copies kept of each block. By default, HDFS keeps 3 replica copies of each block, i.e. the replication factor is 3.
HDFS New File Creation: User applications access HDFS through the HDFS client, which exports the HDFS file system interface.
When an application reads a file, the HDFS client asks the NameNode for the list of DataNodes that hold replicas of the file's blocks; this list is sorted by network topology relative to the client. The client then contacts a DataNode directly and requests the transfer of the desired block. When the client writes data into a file, it first asks the NameNode to choose DataNodes to host the replicas of the first block of the file. When the first block is filled, the client asks the NameNode to choose new DataNodes to host the replicas of the next block, and so on. A hedged Java sketch of these client-side operations follows below.
The default replication factor is 3 and can be changed based upon requirements.
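To make the read and write flow above concrete, here is a minimal, hedged sketch using Hadoop's Java FileSystem API. The class name, the path, the replication factor of 2 and the sample text are illustrative assumptions; under the hood these calls perform exactly the NameNode/DataNode conversation described above.

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/sample.txt");   // hypothetical path

            // Write: the client asks the NameNode for DataNodes to host each block's replicas.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Change the replication factor for this file (the cluster-wide default is 3).
            fs.setReplication(file, (short) 2);

            // Read: the client gets the block locations from the NameNode,
            // then streams the data directly from a DataNode.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, conf, false);
            }

            fs.close();
        }
    }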
Features of HDFS:
Streaming access to file system data.
Suitable for distributed storage and processing.
Provides a command-line interface for interacting with HDFS.
The built-in NameNode and DataNode servers help users to easily check the status of the cluster.
Get in touch with OnlineITGuru for mastering the Hadoop Admin Online Training Bangalore.
Recommended Audience:
Team Leads
ETL developers
Software developers
Project Managers
Prerequisites:
There are no prior technology requirements to start learning Big Data through Hadoop Admin Online Training Bangalore, although some basic knowledge of Java concepts is helpful.
It is good to have some knowledge of OOP concepts and Linux commands.