Course Duration
- 3 days
Course Benefits
- Store, manage, and analyze unstructured data
- Select the correct big data stores for disparate data sets
- Process large data sets using Hadoop to extract value
- Query large data sets in near real time with Pig and Hive
- Plan and implement a big data strategy for your organization
Course Outline
- Introduction to Big Data
- Defining Big Data
- The four dimensions of Big Data: volume, velocity, variety, veracity
- Introducing the Storage, MapReduce and Query Stack
- Delivering business benefit from Big Data
- Establishing the business importance of Big Data
- Addressing the challenge of extracting useful data
- Integrating Big Data with traditional data
- Storing Big Data
- Analyzing your data characteristics
- Selecting data sources for analysis
- Eliminating redundant data
- Establishing the role of NoSQL
- Overview of Big Data stores
- Data models: key-value, graph, document, column-family
- Hadoop Distributed File System
- HBase
- Hive
- Cassandra
- Amazon S3
- BigTable
- DynamoDB
- MongoDB
- Redis
- Riak
- Neo4J
- Selecting Big Data stores
- Choosing the correct data stores based on your data characteristics
- Moving code to data
- Messaging with Kafka
- Implementing polyglot data store solutions
- Aligning business goals to the appropriate data store
- Processing Big Data
- Integrating disparate data stores
- Mapping data to the programming framework
- Connecting and extracting data from storage
- Transforming data for processing
- Subdividing data in preparation for Hadoop MapReduce
- Employing Hadoop MapReduce
- Creating the components of Hadoop MapReduce jobs
- Executing Hadoop MapReduce jobs
- Monitoring the progress of job flows
- The building blocks of Hadoop MapReduce
- Distinguishing Hadoop daemons
- Investigating the Hadoop Distributed File System
- Selecting appropriate execution modes: local, pseudo-distributed and fully distributed
- Accelerating processing with Spark
- Handling streaming data
- Comparing real-time processing models
- Leveraging Storm to extract live events
- Leveraging Spark Streaming to extract live events
- Combining streaming and batch processing in a Lambda architecture
- Tools and Techniques to Analyze Big Data
- Abstracting Hadoop MapReduce jobs with Pig
- Communicating with Hadoop in Pig Latin
- Executing commands using the Grunt Shell
- Streamlining high-level processing
- Performing ad hoc Big Data querying with Hive
- Persisting metadata in the Hive Metastore
- Performing queries with HiveQL
- Investigating Hive file formats
- Creating business value from extracted data
- Mining data with Mahout
- Visualizing processed results with reporting tools
- Querying in real time with Impala
- Developing a Big Data Strategy
- Defining a Big Data strategy for your organization
- Establishing your Big Data needs
- Meeting business goals with timely data
- Evaluating commercial Big Data tools
- Managing organizational expectations
- Enabling analytic innovation
- Focusing on business importance
- Framing the problem
- Selecting the correct tools
- Achieving timely results
- Implementing a Big Data Solution
- Selecting suitable vendors and hosting options
- Balancing costs against business value
- Keeping ahead of the curve
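The short sketches below illustrate, in Python, a few of the technologies named in the outline above. They are illustrative only: host names, ports, topic and table names, and sample data are assumptions, not course materials.

The first sketch shows the key-value data model using the redis-py client, assuming a Redis server is reachable on localhost:6379.

```python
# Key-value data model sketch using the redis-py client.
# Assumes a Redis server on localhost:6379 and the "redis" package installed.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# The entire model: store a value under a key, then look it up by that key.
r.set("customer:1001:last_order", "2024-01-15")
print(r.get("customer:1001:last_order"))  # b'2024-01-15'
```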
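Next, a minimal Hadoop MapReduce word count, written for Hadoop Streaming so the mapper and reducer can stay in Python; the job, file, and path names are assumptions.

```python
# wordcount.py -- mapper and reducer for Hadoop Streaming.
# Submit with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/text -output /data/counts \
#     -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" \
#     -file wordcount.py
import sys

def mapper():
    # Emit (word, 1) for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so counts for one word arrive contiguously.
    current_word, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(n)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

The same pipeline can be tested locally without a cluster, for example with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.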
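Messaging with Kafka, sketched with the kafka-python client; the broker address, topic name, and event payload are assumptions.

```python
# Publish a small JSON event to a Kafka topic using the kafka-python package.
# Assumes a broker on localhost:9092 and a topic named "clickstream".
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)
producer.send("clickstream", {"user": 42, "page": "/home"})
producer.flush()  # block until the event has actually been sent
```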
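A Spark Streaming sketch (the older DStream API) that counts words arriving on a TCP socket in five-second micro-batches; the host, port, and a test source such as `nc -lk 9999` are assumptions.

```python
# Count words from a text stream in 5-second micro-batches with Spark Streaming.
# Assumes a local Spark installation; feed test data with e.g. "nc -lk 9999".
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, batchDuration=5)

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's counts to the driver console

ssc.start()
ssc.awaitTermination()
```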
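Finally, an ad hoc HiveQL query submitted from Python through the PyHive client; the HiveServer2 host, port, and the web_logs table are assumptions.

```python
# Run an ad hoc HiveQL query against HiveServer2 using the PyHive package.
# Assumes HiveServer2 on localhost:10000 and a table named web_logs.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()
cursor.execute(
    "SELECT status, COUNT(*) AS hits "
    "FROM web_logs "
    "GROUP BY status "
    "ORDER BY hits DESC "
    "LIMIT 10"
)
for status, hits in cursor.fetchall():
    print(status, hits)
```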
Class Materials
Each student will receive a comprehensive set of materials, including course notes and all the class examples.
Prerequisites
Experience in the following is required for this Hadoop class:
- Working knowledge of the Microsoft Windows platform and basic database concepts.
Instructor-led courses are offered via a live Web connection, at client sites throughout Europe, and at our Geneva Training Center.