Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

Course Syllabus

Module 1: Introduction to Digital Marketing

  • Introduction and Course Overview
  • History of the Internet
  • Who Is Online & What Do They Do Online?
  • What Do The Statistics Tell Us?
  • Real vs. Digital World
  • Careers and Prospects in Digital Marketing

Introduction to Big Data and Hadoop:-

  • Big Data Introduction
  • Hadoop Introduction
  • What is Hadoop? Why Hadoop?
  • History of Hadoop
  • Components of the Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, ZooKeeper, and others
  • What is the scope of Hadoop?

Deep Dive into HDFS (Storing the Data):-

  • Introduction to HDFS
  • HDFS Design
  • Role of HDFS in Hadoop
  • Features of HDFS
  • Hadoop Daemons and Their Functions
      • NameNode
      • Secondary NameNode
      • JobTracker
      • DataNode
      • TaskTracker
  • Anatomy of a File Write
  • Anatomy of a File Read
  • Network Topology
      • Nodes
      • Racks
      • Data Center
  • Parallel Copying using DistCp
  • Basic Configuration for HDFS
  • Data Organization
      • Blocks and Replication
  • Rack Awareness
  • Heartbeat Signal
  • How to Store Data in HDFS
  • How to Read Data from HDFS
  • Accessing HDFS (Introduction to Basic UNIX Commands)
      • HDFS CLI Commands
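The "Blocks and Replication" topic above reduces to simple arithmetic. The following Java sketch is a hypothetical helper (the class and method names are illustrative and not part of Hadoop). It assumes a 128 MB block size, the `dfs.blocksize` default in Hadoop 2.x and later (Hadoop 1.x defaulted to 64 MB), and the default replication factor of 3 (`dfs.replication`). It shows how a file maps to blocks, that the final block is not padded to the full block size, and how replication multiplies raw disk usage.

```java
// Hypothetical helper (not a Hadoop API): estimates how HDFS splits a file
// into blocks and how much raw disk space its replicas consume.
public class HdfsBlockMath {

    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for a file (ceiling division). */
    public static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Size of the final block; HDFS does not pad it to the full block size. */
    public static long lastBlockBytes(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        long rem = fileBytes % blockBytes;
        return rem == 0 ? blockBytes : rem;
    }

    /** Raw cluster storage: logical file size times the replication factor. */
    public static long rawStorageBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;  // assumed dfs.blocksize (Hadoop 2.x+ default)
        int replication = 3;        // assumed dfs.replication (default)
        long fileSize = 300 * MB;   // example file size

        long blocks = blockCount(fileSize, blockSize);
        System.out.println("blocks = " + blocks);                                        // blocks = 3
        System.out.println("block replicas = " + blocks * replication);                  // block replicas = 9
        System.out.println("last block MB = " + lastBlockBytes(fileSize, blockSize) / MB); // last block MB = 44
        System.out.println("raw storage MB = " + rawStorageBytes(fileSize, replication) / MB); // raw storage MB = 900
    }
}
```

The 44 MB final block occupies only 44 MB on each DataNode that holds a replica, so raw usage is exactly three times the file size rather than three times the total of full-size blocks.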

MapReduce using Java (Processing the Data):-

  • Introduction to MapReduce
  • MapReduce Architecture
  • Data Flow in MapReduce
      • Splits
      • Mapper
      • Partitioning
      • Sort and Shuffle
      • Combiner
      • Reducer
  • Difference Between a Block and an InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce Life Cycle
      • Driver Code
      • Mapper
      • Reducer
  • How MapReduce Works
  • Writing and Executing a Basic MapReduce Program in Java
  • Submission and Initialization of a MapReduce Job
  • File Input/Output Formats in MapReduce Jobs
      • TextInputFormat
      • KeyValueTextInputFormat
      • SequenceFileInputFormat
      • NLineInputFormat
  • Joins
      • Map-Side Joins
      • Reduce-Side Joins
  • Word Count Example
  • Partitioner MapReduce Program
  • Side Data Distribution
      • Distributed Cache (with Program)
  • Counters (with Program)
      • Types of Counters
          • Task Counters
          • Job Counters
          • User-Defined Counters
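The data-flow topics above (mapper, combiner, partitioning, shuffle and sort, reducer) and the word-count and partitioner programs can be previewed without a cluster. The following is a minimal plain-Java simulation under stated assumptions, not the Hadoop `Mapper`/`Reducer` API: each input line stands in for one input split, the combiner reuses the reducer's summing logic, and the partition formula mirrors the one used by Hadoop's default `HashPartitioner`. Because it hashes `java.lang.String` rather than Hadoop's `Text`, the reducer a given word lands on will not necessarily match a real job. All class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the word-count data flow
// (map -> combine -> partition -> shuffle/sort -> reduce). Not the Hadoop API.
public class WordCountFlow {

    // Map: emit (word, 1) for every lowercase token in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
        return out;
    }

    // Combiner: locally pre-sum one mapper's output (same logic as the reducer).
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) local.merge(e.getKey(), e.getValue(), Integer::sum);
        return local;
    }

    // Partitioner: same formula as HashPartitioner, applied to String hash codes.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs the flow; returns one sorted output map per reducer.
    public static List<Map<String, Integer>> run(List<String> splits, int numReducers) {
        // Shuffle and sort: a TreeMap per reducer keeps keys sorted and groups values.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) shuffled.add(new TreeMap<>());

        for (String split : splits) {
            for (Map.Entry<String, Integer> e : combine(map(split)).entrySet()) {
                int r = partition(e.getKey(), numReducers);
                shuffled.get(r).computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Reduce: sum the grouped partial counts for each key.
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (TreeMap<String, List<Integer>> bucket : shuffled) {
            Map<String, Integer> out = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) sum += v;
                out.put(e.getKey(), sum);
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("the quick brown fox", "the lazy dog", "The fox");
        List<Map<String, Integer>> result = run(splits, 2);
        for (int r = 0; r < result.size(); r++) {
            System.out.println("reducer " + r + ": " + result.get(r));
        }
    }
}
```

In a real job the framework, not user code, performs the shuffle and sort, and a combiner may run zero, one, or several times. That is why combiner logic must be safe to reapply, as summation is; each key still reaches exactly one reducer because the partition function is deterministic.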