Sunday, May 2, 2021

BigData Links

Spark

1.High level - Spark architecture, deployment modes, Spark ecosystem components, installing, configuring and running Spark, and supported programming languages

2.Spark Shell

3.RDD transformations and actions

4.Datasets and DataFrames

5.Working with DataFrames and Schemas

6.Analyzing Data with DataFrame Queries

7.Querying Tables and Views with SQL (spark sql)

8.Spark APIs - High level

9.Spark Concepts, Streaming, API


https://www.javatpoint.com/apache-spark-architecture

https://spark.apache.org/docs/latest/quick-start.html

https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations

https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions

https://dzone.com/articles/understanding-of-spark-sql-dataframes-and-datasets

https://www.youtube.com/watch?v=UTpQxMtw58M

https://www.edureka.co/blog/spark-sql-tutorial/

https://spark.apache.org/docs/latest/api.html

https://www.javatpoint.com/what-is-big-data


1.RDD persistence and caching

2.RDD lineage

3.Shared variables

4.DataFrame Operations

5.Spark Distributed Processing - basics

6.Introduction to Structured Streaming


https://techvidvan.com/tutorials/persistence-and-caching-mechanism/

https://data-flair.training/blogs/rdd-lineage/

https://spark.apache.org/docs/latest/rdd-programming-guide.html#shared-variables

https://www.tutorialspoint.com/spark_sql/spark_sql_dataframes.htm

https://www.youtube.com/watch?v=MUMFAHMkLbE

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html


Hive

1.Hive on Hadoop Cluster & Purpose

2.Different Components of Hive

3.Hive Data Definition language

4.Hive Data Manipulation language


https://www.edureka.co/blog/apache-hive-installation-on-ubuntu

https://www.geeksforgeeks.org/architecture-and-working-of-hive/

https://data-flair.training/blogs/hive-ddl-commands/

https://www.javatpoint.com/hive-dml-commands

https://www.javatpoint.com/what-is-big-data


1.Hive Architecture & Hive Metastore

2.Hive data model - Bucketing/Partitioning

3.Hive Extensibility Features - Buckets, Analytics Functions, Windowing, Joins and Join Optimization

4.Functions in Hive - Built in functions

5.Functions in Hive - Custom User-defined Functions


https://medium.com/plumbersofdatascience/hive-architecture-in-depth-ba44e8946cbc

https://dzone.com/articles/hive-metastore-a-basic-introduction

https://youtu.be/r_k4zkT7Z5o

https://data-flair.training/blogs/apache-hive-partitions/

https://data-flair.training/blogs/bucketing-in-hive/

https://data-flair.training/blogs/hive-join/

https://data-flair.training/blogs/hive-optimization-techniques/

https://www.tutorialspoint.com/hive/hive_built_in_functions.htm


Oozie

1.Oozie - Basic concepts: Scheduling, Workflow and Purpose

2.Property files to configure Workflow

3.Creation of an Oozie workflow - Given a Scenario


https://www.edureka.co/blog/apache-oozie-tutorial/

https://www.tutorialspoint.com/apache_oozie/apache_oozie_property_file.htm

https://www.tutorialspoint.com/apache_oozie/apache_oozie_workflow.htm

https://www.javatpoint.com/apache-kafka


HDFS

1.NameNode and DataNode concepts

2.HDFS Commands

3.HDFS Data Blocks - Block Size, Distribution, Replication and Redundancy


https://www.hadoopinrealworld.com/namenode-and-datanode/

https://www.geeksforgeeks.org/hdfs-commands/

https://data-flair.training/blogs/data-block/

https://www.javatpoint.com/what-is-big-data


1.HDFS - Deep dive

2.Concepts - Rack Awareness, High Availability, Fault Tolerance, Federation

3.Name Nodes and Name spaces

4.Performance and handling of failure scenarios


https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788999830/3

https://data-flair.training/blogs/rack-awareness-hadoop-hdfs/

https://pdfs.semanticscholar.org/e72b/9c83fa5ff3f876b3ccec0484b0d02b569418.pdf


BigData concepts

1.Why Bigdata

2.The 3 Vs of Bigdata

3.Bigdata Life Cycle

4.Benefits of Bigdata


https://www.ibmbigdatahub.com/blog/why-big-data

https://www.flydata.com/blog

http://www.informit.com/articles/article.aspx?p=2473128&seqNum=11

https://www.dataversity.net/many-benefits-big-data-company//3-vs-of-big-data/

https://www.tutorialspoint.com/big_data_tutorials.htm


1.Identification of BigData scenarios

2.Differentiation between DWH and Datalake

3.Structured, semi-structured and unstructured data

4.Different file formats - Understanding

5.Batch and streaming data processing


https://www.it4nextgen.com/big-data-applications/

https://www.guru99.com/data-lake-vs-data-warehouse.html

https://www.cloudmoyo.com/blog/data-architecture/difference-between-a-data-warehouse-and-a-data-lake/

https://www.forbes.com/sites/bernardmarr/2019/10/18/whats-the-difference-between-structured-semi-structured-and-unstructured-data/#6ecbfbcd2b4d

https://techmagie.wordpress.com/category/big-data/data-formats/

https://www.bmc.com/blogs/batch-processing-stream-processing-real-time/


NoSQL Databases (HBase)

1.NoSQL Databases - High level

2.Different types of NoSQL Databases and purpose

3.Kafka Concepts


https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d

https://www.guru99.com/nosql-tutorial.html

https://aws.amazon.com/nosql/

https://www.tutorialspoint.com/apache_oozie/index.htm

https://www.upgrad.com/blog/apache-oozie-tutorial/

https://www.guru99.com/learn-oozie-in-5-minutes.html

https://www.javatpoint.com/nosql-databases

https://www.w3resource.com/mongodb/nosql.php


1.HBase and its purpose

2.HBase table

3.HBase Schema Design

4.Basic data access with HBase API

5.HBase read and write path

6.HBase replication & backup

7.HBase performance - High level


https://www.ibm.com/analytics/hadoop/hbase

https://blog.cloudera.com/approaches-to-backup-and-disaster-recovery-in-hbase/

https://www.guru99.com/hbase-limitations-advantage-problems.html

https://intellipaat.com/blog/tutorial/hbase-tutorial/performance-tunning/

https://blog.eduonix.com/bigdata-and-hadoop/use-hbase-nosql-db/

https://www.guru99.com/hbase-shell-general-commands.html

https://www.guru99.com/handling-tables-hbase.html

https://mapr.com/blog/guidelines-hbase-schema-design/

https://intellipaat.com/blog/tutorial/hbase-tutorial/client-api-the-basics/

https://www.corejavaguru.com/bigdata/hbase-tutorial/read-and-write

https://acadgild.com/blog/read-write-operations-hbase

https://www.javatpoint.com/what-is-big-data


Hadoop Concepts

1.Tools and Technologies - High level

2.Hadoop Basic Architecture - For bigdata Processing

3.Understanding of Hadoop Distributions

4.Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml

5.Resource Managers - Understanding

6.Hadoop Services, MapReduce - Basic concepts

7.MapReduce - Basic concepts


https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

https://www.trifacta.com/blog/hadoop-distribution/

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html

https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

https://www.guru99.com/bigdata-tutorials.html


1.Cluster Computing Framework

2.MapReduce - Deep dive concepts

3.Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml - Manipulation

4.Resource Manager - High level technical understanding

5.Data Pipeline - Ingestion, Transformation, Store and Visualization

6.Workflow Management


https://www.tutorialspoint.com/Clustered-Systems

https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

Configuration files - core-site.xml, hdfs-site.xml, mapred-site.xml

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

https://dzone.com/articles/what-is-a-data-pipeline

https://en.wikipedia.org/wiki/Workflow_management_system

Kafka

1.Kafka Use cases

2.Kafka Clusters and Brokers - High level understanding

3.Topics and Partitions - Overview

4.Stream Processing Concepts

5.Producers, Consumers and Consumer Groups

6.Producer and Consumer APIs - Understanding

7.Operations

8.Kafka Streams and Connect APIs - Understanding

9.NoSQL fundamentals


https://kafka.apache.org/uses

https://www.tutorialspoint.com/apache_kafka/apache_kafka_cluster_architecture.htm

https://data-flair.training/blogs/kafka-topic-architecture/

https://kafka.apache.org/11/documentation/streams/core-concepts

https://kafka.apache.org/documentation/#theproducer

https://kafka.apache.org/documentation/#theconsumer

https://kafka.apache.org/documentation/#producerapi

https://kafka.apache.org/documentation/#consumerapi

https://kafka.apache.org/documentation/#operations

https://kafka.apache.org/24/documentation/streams/

https://kafka.apache.org/documentation/#connectapi

https://www.tutorialspoint.com/apache_kafka/index.htm
