Spark
1.High level - Spark Architecture, Deployment modes, Spark ecosystem components, Running, Installing and Configuring a Spark setup, and supported programming languages
2.Spark Shell
3.RDD transformations and actions
4.Datasets and DataFrames
5.Working with DataFrames and Schemas
6.Analyzing Data with DataFrame Queries
7.Querying Tables and Views with SQL (spark sql)
8.Spark APIs - High level
9.Spark Concepts, Streaming, API
https://www.javatpoint.com/apache-spark-architecture
https://spark.apache.org/docs/latest/quick-start.html
https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
https://dzone.com/articles/understanding-of-spark-sql-dataframes-and-datasets
https://www.youtube.com/watch?v=UTpQxMtw58M
https://www.edureka.co/blog/spark-sql-tutorial/
https://spark.apache.org/docs/latest/api.html
https://www.javatpoint.com/what-is-big-data
1.RDD persistence and caching
2.RDD lineage
3.Shared variables
4.DataFrame Operations
5.Spark Distributed Processing - basics
6.Introduction to Structured Streaming
https://techvidvan.com/tutorials/persistence-and-caching-mechanism/
https://data-flair.training/blogs/rdd-lineage/
https://spark.apache.org/docs/latest/rdd-programming-guide.html#shared-variables
https://www.tutorialspoint.com/spark_sql/spark_sql_dataframes.htm
https://www.youtube.com/watch?v=MUMFAHMkLbE
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
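RDD transformations and actions, lineage, and caching, all listed above, fit together like this: a transformation (map, filter, ...) only records a step in the RDD's lineage, an action (collect, count, ...) replays that lineage to produce a result, and cache() keeps the computed data so later actions skip the replay. The sketch below is a toy, single-process model of that behavior, not the PySpark API; the class ToyRDD and its compute_count counter are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, List, Optional


class ToyRDD:
    """Toy model of an RDD: transformations are lazy, actions trigger computation."""

    def __init__(self, source: List[Any], lineage: Optional[List[tuple]] = None):
        self._source = source
        self.lineage = lineage if lineage is not None else []  # recorded (op, function) steps
        self._cache_enabled = False
        self._cached: Optional[List[Any]] = None
        self.compute_count = 0  # how many times the lineage has been replayed

    # Transformations: return a new ToyRDD with one more lineage step, compute nothing.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._source, self.lineage + [("map", f)])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._source, self.lineage + [("filter", pred)])

    def cache(self) -> "ToyRDD":
        # Only marks the data for caching; it is stored when the first action runs.
        self._cache_enabled = True
        return self

    def _compute(self) -> List[Any]:
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = list(self._source)
        for op, f in self.lineage:  # replay the lineage from the source data
            data = [f(x) for x in data] if op == "map" else [x for x in data if f(x)]
        if self._cache_enabled:
            self._cached = data
        return data

    # Actions: trigger computation and return a result to the caller.
    def collect(self) -> List[Any]:
        return list(self._compute())

    def count(self) -> int:
        return len(self._compute())


squares = ToyRDD(list(range(10))).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print([op for op, _ in squares.lineage])        # ['filter', 'map']
print(squares.collect())                        # [0, 4, 16, 36, 64]
print(squares.count(), squares.compute_count)   # 5 2  (no cache: lineage replayed twice)
```

Replaying the recorded lineage is also the idea behind fault tolerance: a lost partition can be rebuilt from its lineage instead of from a stored copy.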
Hive
1.Hive on Hadoop Cluster & Purpose
2.Different Components of Hive
3.Hive Data Definition Language (DDL)
4.Hive Data Manipulation Language (DML)
https://www.edureka.co/blog/apache-hive-installation-on-ubuntu
https://www.geeksforgeeks.org/architecture-and-working-of-hive/
https://data-flair.training/blogs/hive-ddl-commands/
https://www.javatpoint.com/hive-dml-commands
https://www.javatpoint.com/what-is-big-data
1.Hive Architecture & Hive Metastore
2.Hive data model - Bucketing/Partitioning
3.Hive Extensibility Features - Buckets, Analytics Functions, Windowing, Joins and Join Optimization
4.Functions in Hive - Built in functions
5.Functions in Hive - Custom User-defined Functions
https://medium.com/plumbersofdatascience/hive-architecture-in-depth-ba44e8946cbc
https://dzone.com/articles/hive-metastore-a-basic-introduction
https://youtu.be/r_k4zkT7Z5o
https://data-flair.training/blogs/apache-hive-partitions/
https://data-flair.training/blogs/bucketing-in-hive/
https://data-flair.training/blogs/hive-join/
https://data-flair.training/blogs/hive-optimization-techniques/
https://www.tutorialspoint.com/hive/hive_built_in_functions.htm
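Partitioning and bucketing (topic 2 above) decide where each row lands on disk. A table partitioned by country and year gets one sub-directory per value combination (for example country=US/year=2020), and a table declared CLUSTERED BY a column INTO N BUCKETS spreads the rows of each partition across N files by hashing that column. The pure-Python sketch below models only that placement rule; place_row and layout are hypothetical helpers, and the hash is simplified to the non-negative integer value itself because the hash function Hive actually uses varies between versions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def place_row(row: dict, partition_cols: List[str], bucket_col: str,
              num_buckets: int) -> Tuple[str, int]:
    """Return (partition directory, bucket number) for one row.

    Simplification: the bucketing column is assumed to be an integer whose
    hash is the value itself; real Hive versions use different hash functions.
    """
    # Partitioning: one directory level per partition column, e.g. "country=US/year=2020".
    part_dir = "/".join(f"{col}={row[col]}" for col in partition_cols)
    # Bucketing: non-negative hash modulo the fixed bucket count.
    bucket = (row[bucket_col] & 0x7FFFFFFF) % num_buckets
    return part_dir, bucket


def layout(rows: List[dict], partition_cols: List[str], bucket_col: str,
           num_buckets: int) -> Dict[Tuple[str, int], List[int]]:
    """Group row ids by (partition, bucket) to show which file each row would go to."""
    files: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for row in rows:
        files[place_row(row, partition_cols, bucket_col, num_buckets)].append(row["id"])
    return dict(files)


rows = [
    {"id": 1, "country": "US", "year": 2020},
    {"id": 2, "country": "US", "year": 2020},
    {"id": 5, "country": "IN", "year": 2021},
    {"id": 6, "country": "US", "year": 2020},
]
for (part, bucket), ids in sorted(layout(rows, ["country", "year"], "id", 4).items()):
    print(part, "bucket", bucket, ids)
```

A query filtering on country and year only has to read the matching directories (partition pruning), while equal bucketing on a join key lets matching buckets be joined file by file.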
Oozie
1.Oozie - Basic concepts: Scheduling, Workflow and Purpose
2.Property files to configure Workflow
3.Creating an Oozie workflow for a given scenario
https://www.edureka.co/blog/apache-oozie-tutorial/
https://www.tutorialspoint.com/apache_oozie/apache_oozie_property_file.htm
https://www.tutorialspoint.com/apache_oozie/apache_oozie_workflow.htm
https://www.javatpoint.com/apache-kafka
HDFS
1.NameNode and DataNode concepts
2.HDFS Commands
3.HDFS Data Blocks - Block Size, Distribution, Replication and Redundancy related
https://www.hadoopinrealworld.com/namenode-and-datanode/
https://www.geeksforgeeks.org/hdfs-commands/
https://data-flair.training/blogs/data-block/
https://www.javatpoint.com/what-is-big-data
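Block size, distribution and replication (topic 3 above) come down to simple arithmetic: HDFS splits a file into fixed-size blocks (the dfs.blocksize property, 128 MB by default in Hadoop 2.x and later) and stores each block dfs.replication times (default 3) on different DataNodes. The final partial block only uses as much disk as the data it holds. The helper hdfs_block_plan below is a hypothetical name for this back-of-the-envelope calculation; it does not talk to a cluster.

```python
MB = 1024 * 1024


def hdfs_block_plan(file_size: int, block_size: int = 128 * MB, replication: int = 3) -> dict:
    """Estimate how HDFS would split and replicate a file of `file_size` bytes."""
    if file_size <= 0:
        return {"blocks": 0, "last_block_bytes": 0, "replicas": 0, "raw_bytes": 0}
    blocks = (file_size + block_size - 1) // block_size   # ceiling division
    last_block = file_size - (blocks - 1) * block_size     # the final, possibly partial, block
    return {
        "blocks": blocks,
        "last_block_bytes": last_block,
        "replicas": blocks * replication,       # block copies spread over DataNodes
        "raw_bytes": file_size * replication,   # disk used cluster-wide (partial blocks are not padded)
    }


plan = hdfs_block_plan(500 * MB)
print(plan["blocks"], plan["last_block_bytes"] // MB, plan["replicas"], plan["raw_bytes"] // MB)
# 4 116 12 1500
```

So a 500 MB file becomes three full 128 MB blocks plus one 116 MB block, i.e. 12 block replicas and about 1.5 GB of raw storage at the default replication factor.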
1.HDFS - Deep dive
2.Concepts - Rack Awareness, High Availability, Fault Tolerance, Federation
3.NameNodes and Namespaces
4.Performance and handling of failure scenarios
https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788999830/3
https://data-flair.training/blogs/rack-awareness-hadoop-hdfs/
https://pdfs.semanticscholar.org/e72b/9c83fa5ff3f876b3ccec0484b0d02b569418.pdf
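Rack awareness (topic 2 above) shapes where the NameNode places replicas. With the default policy and a replication factor of 3, the first replica goes on the writer's node if the writer is itself a DataNode (otherwise on a randomly chosen node), the second on a node in a different rack, and the third on a different node in that same remote rack. A block then survives the loss of a whole rack while only one copy crosses racks. The sketch below models that rule only; place_replicas is a hypothetical function, and it picks nodes deterministically where HDFS chooses randomly and also weighs free space and load.

```python
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, List[str]], writer: Optional[str]) -> List[str]:
    """Choose 3 DataNodes for one block following the rack-aware default rule.

    topology maps rack name -> DataNode names. Simplified: deterministic choices,
    and no fallback when the chosen remote rack has a single node.
    """
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the writer's own node if it is a DataNode, else some node.
    first = writer if writer in node_rack else sorted(node_rack)[0]
    # 2nd replica: a node on a rack other than the first replica's rack.
    remote_racks = sorted(r for r in topology if r != node_rack[first])
    if not remote_racks:
        raise ValueError("rack awareness needs at least two racks")
    remote = remote_racks[0]
    second = sorted(topology[remote])[0]
    # 3rd replica: a different node on the same rack as the second.
    peers = sorted(n for n in topology[remote] if n != second)
    if not peers:
        raise ValueError("the remote rack needs at least two DataNodes")
    return [first, second, peers[0]]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"], "rack3": ["dn5", "dn6"]}
print(place_replicas(topology, writer="dn2"))   # ['dn2', 'dn3', 'dn4']
```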
BigData concepts
1.Why Bigdata
2.The 3Vs of Bigdata
3.Bigdata Life Cycle
4.Benefits of Bigdata
https://www.ibmbigdatahub.com/blog/why-big-data
https://www.flydata.com/blog
http://www.informit.com/articles/article.aspx?p=2473128&seqNum=11
https://www.dataversity.net/many-benefits-big-data-company//3-vs-of-big-data/
https://www.tutorialspoint.com/big_data_tutorials.htm
1.Identifying BigData Scenarios
2.Differentiation between DWH and Datalake
3.Structured, Semi-structured and Unstructured data
4.Different file formats - Understanding
5.Batch and streaming data processing
https://www.it4nextgen.com/big-data-applications/
https://www.guru99.com/data-lake-vs-data-warehouse.html
https://www.cloudmoyo.com/blog/data-architecture/difference-between-a-data-warehouse-and-a-data-lake/
https://www.forbes.com/sites/bernardmarr/2019/10/18/whats-the-difference-between-structured-semi-structured-and-unstructured-data/#6ecbfbcd2b4d
https://techmagie.wordpress.com/category/big-data/data-formats/
https://www.bmc.com/blogs/batch-processing-stream-processing-real-time/
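The difference between batch and streaming processing (topic 5 above) is easiest to see on a single aggregation. A batch job reads a bounded dataset once and produces a final answer; a streaming job receives records over time (many engines, Spark Structured Streaming among them, handle the stream as a series of micro-batches) and keeps running state so that each update reflects everything seen so far. The pure-Python sketch below uses hypothetical names (batch_count, StreamingCounter) and no real streaming engine; it only shows that both approaches converge on the same counts.

```python
from collections import Counter
from typing import Dict, Iterable, List


def batch_count(events: List[str]) -> Dict[str, int]:
    """Batch: the whole bounded dataset is available, so process it in one pass."""
    return dict(Counter(events))


class StreamingCounter:
    """Streaming: records keep arriving; keep running state and update it per micro-batch."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process_micro_batch(self, batch: Iterable[str]) -> Dict[str, int]:
        self.state.update(batch)   # fold the new records into the running counts
        return dict(self.state)    # up-to-date result after this micro-batch


events = ["click", "view", "click", "buy", "view", "click"]
print(batch_count(events))

stream = StreamingCounter()
for micro_batch in (events[:2], events[2:4], events[4:]):
    print(stream.process_micro_batch(micro_batch))   # counts grow as data arrives
```

The trade-off shows up even here: the batch result exists only after all data is in, while the streaming result is available early but must carry state between micro-batches.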
NoSQL Databases (HBase)
1.NoSQL Databases - High level
2.Different types of NoSQL Databases and purpose
3.Kafka Concepts
https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d
https://www.guru99.com/nosql-tutorial.html
https://aws.amazon.com/nosql/
https://www.tutorialspoint.com/apache_oozie/index.htm
https://www.upgrad.com/blog/apache-oozie-tutorial/
https://www.guru99.com/learn-oozie-in-5-minutes.html
https://www.javatpoint.com/nosql-databases
https://www.w3resource.com/mongodb/nosql.php
1.HBase and its purpose
2.HBase table
3.HBase Schema Design
4.Basic data access with the HBase API
5.HBase read and write path
6.HBase replication & backup
7.HBase performance - High level
https://www.ibm.com/analytics/hadoop/hbase
https://blog.cloudera.com/approaches-to-backup-and-disaster-recovery-in-hbase/
https://www.guru99.com/hbase-limitations-advantage-problems.html
https://intellipaat.com/blog/tutorial/hbase-tutorial/performance-tunning/
https://blog.eduonix.com/bigdata-and-hadoop/use-hbase-nosql-db/
https://www.guru99.com/hbase-shell-general-commands.html
https://www.guru99.com/handling-tables-hbase.html
https://mapr.com/blog/guidelines-hbase-schema-design/
https://intellipaat.com/blog/tutorial/hbase-tutorial/client-api-the-basics/
https://www.corejavaguru.com/bigdata/hbase-tutorial/read-and-write
https://acadgild.com/blog/read-write-operations-hbase
https://www.javatpoint.com/what-is-big-data
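An HBase table (topics 2 to 5 above) behaves like a sparse, sorted map: row key → column (family:qualifier) → timestamp → value. Rows are kept in row-key order, so a scan over a key range (such as every key starting with user#) is cheap, and a plain read returns the newest version of a cell; this is why row-key design sits at the center of HBase schema design. The in-memory class below is a toy model of that data model, not the HBase client API; ToyHBaseTable and its methods are hypothetical, although put, get and scan mirror the names of the HBase shell commands.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyHBaseTable:
    """Toy model of HBase's data model: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families: List[str], max_versions: int = 1):
        self.families = set(families)      # column families are fixed when the table is created
        self.max_versions = max_versions   # versions kept per cell
        self.row_keys: List[str] = []      # kept sorted, like HBase's row-key order
        self.rows: Dict[str, Dict[str, Dict[int, str]]] = {}

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, {})
        versions[ts] = value
        for old_ts in sorted(versions)[:-self.max_versions]:   # drop versions beyond the limit
            del versions[old_ts]

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of one cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return rows with start <= row key < stop, newest value per column."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return [(key, {col: v[max(v)] for col, v in self.rows[key].items()})
                for key in self.row_keys[lo:hi]]


users = ToyHBaseTable(["info"], max_versions=2)
users.put("user#001", "info:name", "Asha", ts=1)
users.put("user#001", "info:name", "Asha K", ts=2)
users.put("user#002", "info:name", "Ben", ts=1)
users.put("order#001", "info:total", "42", ts=1)
print(users.get("user#001", "info:name"))   # Asha K
print(users.scan("user#", "user$"))          # only the two user rows, in key order
```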
Hadoop Concepts
1.Tools and Technologies - High level
2.Hadoop Basic Architecture - For bigdata Processing
3.Understanding of Hadoop Distributions
4.Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml
5.Resource Managers - Understanding
6.Hadoop Services and MapReduce - Basic concepts
7.MapReduce - Basic concepts
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
https://www.trifacta.com/blog/hadoop-distribution/
https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html
https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
https://www.guru99.com/bigdata-tutorials.html
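The MapReduce model (topics 6 and 7 above) runs in three phases: map tasks turn each input record into intermediate key/value pairs, the framework shuffles and sorts those pairs so that all values for one key reach the same reducer, and reduce tasks combine each key's values. The classic word count below simulates those phases in a single Python process. On a real cluster the same logic would be written against Hadoop's Java Mapper/Reducer classes (or run through Hadoop Streaming); the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in mapper(line))   # one "map task" per line
    return dict(reducer(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```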
1.Cluster Computing Framework
2.MapReduce - Deep dive concepts
3.Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml - Manipulation
4.Resource Manager - High level technical understanding
5.Data Pipeline - Ingestion, Transformation, Store and Visualization
6.Workflow Management
https://www.tutorialspoint.com/Clustered-Systems
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml
https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
https://dzone.com/articles/what-is-a-data-pipeline
https://en.wikipedia.org/wiki/Workflow_management_system
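Hadoop's main configuration files (topic 3 above: core-site.xml, hdfs-site.xml, mapred-site.xml) share one format: a configuration root element holding property elements, each with a name and a value. Typical entries are fs.defaultFS in core-site.xml, dfs.replication in hdfs-site.xml and mapreduce.framework.name in mapred-site.xml. The sketch below reads and edits that format with Python's standard xml.etree.ElementTree; the host namenode:9000 and the helpers read_properties and set_property are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Example core-site.xml content; the NameNode host and port are assumptions.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def set_property(xml_text: str, name: str, value: str) -> str:
    """Update a property if present, otherwise append a new <property> element."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value
            break
    else:  # no existing property with that name
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(read_properties(CORE_SITE))
hdfs_site = set_property("<configuration></configuration>", "dfs.replication", "2")
print(read_properties(hdfs_site))   # {'dfs.replication': '2'}
```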
Kafka
1.Kafka Use cases
2.Kafka Clusters and Brokers - High level understanding
3.Topics and Partitions - Overview
4.Stream Processing Concepts
5.Producers, Consumers and Consumer Groups
6.Producer and Consumer APIs - Understanding
7.Operations
8.Kafka Streams and Connect APIs - Understanding
9.NoSQL fundamentals
https://kafka.apache.org/uses
https://www.tutorialspoint.com/apache_kafka/apache_kafka_cluster_architecture.htm
https://data-flair.training/blogs/kafka-topic-architecture/
https://kafka.apache.org/11/documentation/streams/core-concepts
https://kafka.apache.org/documentation/#theproducer
https://kafka.apache.org/documentation/#theconsumer
https://kafka.apache.org/documentation/#producerapi
https://kafka.apache.org/documentation/#consumerapi
https://kafka.apache.org/documentation/#operations
https://kafka.apache.org/24/documentation/streams/
https://kafka.apache.org/documentation/#connectapi
https://www.tutorialspoint.com/apache_kafka/index.htm
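Consumer groups (topic 5 above) split a topic's partitions among the group's members: each partition is read by exactly one consumer in the group, one consumer may own several partitions, and consumers beyond the number of partitions sit idle. How partitions are divided depends on the consumer's partition.assignment.strategy setting. The sketch below is a simplified model of the range strategy (Kafka's RangeAssignor) for a single topic: members are sorted, each takes a contiguous block, and the first partitions % members consumers take one extra. range_assign is a hypothetical helper, not part of the Kafka client API.

```python
from typing import Dict, List


def range_assign(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Assign partitions 0..partitions-1 of one topic to the members of one group."""
    if not consumers:
        return {}
    members = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)   # first `extra` members get one more
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(7, ["c2", "c1", "c3"]))   # {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
print(range_assign(2, ["c1", "c2", "c3"]))   # c3 gets no partition and stays idle
```

Because every partition has exactly one owner in the group, adding consumers only increases parallelism up to the partition count, which is one reason the number of partitions is chosen with the expected consumer count in mind.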