Big Data and Kinesis

EMR (Elastic MapReduce) - Managed Hadoop and Spark service. Removes the challenges of setting up and maintaining a Hadoop cluster.

Storage options
HDFS - Default block size of 128 MB.
EBS - For temporary data storage.
EMRFS - Reads and writes to S3 through an HDFS-compatible interface.

Instance types
General purpose - M4
Machine learning - C4
Deep learning - P3
Large HDFS - D2
Large-scale interactive analysis - X1

Node types
Master - Manages the cluster. Runs YARN to manage resources, and runs Ganglia and Zeppelin. An EMR cluster can have 1 or 3 master nodes.
Core nodes - 1 to many. Run HDFS and execute tasks assigned by the master node.
Task nodes - Do computation only and don't run HDFS. Can be 0 to as many as needed; more can be added to accelerate data processing.

EMR cluster types
Transient - Terminates automatically after the workload completes, for example a 1-hour job run 10 times a day. Initialization takes 15-30 minutes.
Long running - Must be terminated manually, for example a 2-hour job run 12 times a day.

Lifecycles of EMR ...
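
The two cluster-type examples in these notes (a 1-hour job run 10 times a day, and a 2-hour job run 12 times a day) can be compared with some quick arithmetic. The sketch below is only an illustration: the function names and the 24-hour threshold are assumptions made here, not an AWS rule, and the 15-30 minute initialization range is taken from the notes.

```python
def daily_cluster_hours(job_hours, runs_per_day, init_minutes):
    """Hours per day a transient cluster is up: every run pays startup plus job time."""
    return runs_per_day * (job_hours + init_minutes / 60)


def suggest_cluster_type(job_hours, runs_per_day, init_minutes=30):
    # Assumed rule of thumb: if transient runs would keep clusters up for a
    # full day or more, a long-running cluster avoids the repeated startup.
    transient_hours = daily_cluster_hours(job_hours, runs_per_day, init_minutes)
    return "transient" if transient_hours < 24 else "long-running"


for job_hours, runs in [(1, 10), (2, 12)]:
    low = daily_cluster_hours(job_hours, runs, 15)   # 15 min initialization
    high = daily_cluster_hours(job_hours, runs, 30)  # 30 min initialization
    print(f"{runs} x {job_hours} h job: {low:.1f}-{high:.1f} cluster-hours/day ->",
          suggest_cluster_type(job_hours, runs))
```

With these numbers the first workload needs 12.5 to 15 cluster-hours a day, so a transient cluster sits idle-free and terminates between runs. The second workload already fills 24 hours with job time alone, so adding startup to every run would only add overhead, which is why it fits a long-running cluster.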