Posts

Showing posts from February, 2022

AWS - Moving data and Migrations

AWS Data Pipeline

To run activities on on-premises resources, AWS Data Pipeline supplies a Task Runner package that can be installed on your on-premises hosts. This package continuously polls the AWS Data Pipeline service for work to perform. When it is time to run a particular activity on your on-premises resources, AWS Data Pipeline issues the appropriate command to the Task Runner.

The Data Pipeline service does not support data streams.

Task runners call PollForTask to receive a task to perform from AWS Data Pipeline. If tasks are ready in the work queue, PollForTask returns a response immediately. If no tasks are available in the queue, PollForTask uses long polling and holds a poll connection open for up to 90 seconds, during which time any newly scheduled tasks are handed to the task agent.

Migration

CloudEndure Migration is a block-level replication tool that simplifies the process of migrating applications from physical, virtual, and cloud-based se
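The long-polling behavior of PollForTask described in this excerpt can be sketched with a local queue. This is only a simulation under stated assumptions: the `poll_for_task` function, the task names, and the short timeouts are hypothetical stand-ins, and nothing here calls the real AWS API.

```python
import queue
import threading

POLL_TIMEOUT_SECONDS = 90  # upper bound on how long one poll is held open


def poll_for_task(work_queue, timeout=POLL_TIMEOUT_SECONDS):
    """Return a queued task at once, or wait up to `timeout` seconds for one.

    Hypothetical local stand-in for the service call; returns None when
    the poll expires without any task being scheduled.
    """
    try:
        return work_queue.get(timeout=timeout)
    except queue.Empty:
        return None


work_queue = queue.Queue()

# Case 1: a task is already queued, so the poll returns immediately.
work_queue.put("copy-activity")
ready = poll_for_task(work_queue, timeout=0.5)

# Case 2: the queue is empty, but a task is scheduled while the poll is open.
threading.Timer(0.1, work_queue.put, args=("shell-activity",)).start()
late = poll_for_task(work_queue, timeout=2)

# Case 3: nothing arrives before the timeout, so the poll comes back empty.
empty = poll_for_task(work_queue, timeout=0.2)

print(ready, late, empty)
```

The short timeouts only keep the demonstration fast; the 90-second figure is the one stated in the text.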

Big Data and Kinesis

EMR

Elastic MapReduce is a managed Hadoop and Spark service that removes the challenges of setting up and maintaining a Hadoop cluster.

Storage options
- HDFS - Default block size of 128 MB.
- EBS - For temporary data storage.
- EMRFS - Reads from and writes to S3 through an HDFS-style interface.

Instance types
- General purpose - M4
- Machine learning - C4
- Deep learning - P3
- Large HDFS - D2
- Large-scale interactive analysis - X1

Node types
- Master - Manages the cluster. Runs YARN to manage resources, and runs Ganglia and Zeppelin. An EMR cluster can have 1 or 3 master nodes.
- Core - 1 to many. Run HDFS and execute tasks from the master nodes.
- Task - Do computation only and don't run HDFS. Can be 0 to as many as needed; more can be added to accelerate data processing.

EMR cluster types
- Transient - Terminates automatically after the workload completes. Takes 15-30 minutes to initialize. Example: running a 1-hour job 10 times a day.
- Long running - Must be terminated manually. Example: running a 2-hour job 12 times a day.

Lifecycles of EMR Starting Boots
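One way to read the two cluster-type examples in this excerpt is as a cluster-hours comparison. The sketch below is a rough illustration, not a pricing model: it assumes jobs run one at a time, that a transient cluster pays the initialization time on every run, and it takes the upper 30-minute initialization figure. The function name is hypothetical.

```python
HOURS_PER_DAY = 24.0


def transient_cluster_hours(job_hours, runs_per_day, init_hours):
    """Cluster hours per day when a new cluster is started for every run."""
    return runs_per_day * (job_hours + init_hours)


# 1-hour job, 10 times a day, 30 minutes of initialization per run.
light_workload = transient_cluster_hours(1, 10, 0.5)

# 2-hour job, 12 times a day, 30 minutes of initialization per run.
heavy_workload = transient_cluster_hours(2, 12, 0.5)

print(light_workload, heavy_workload, HOURS_PER_DAY)
```

Under these assumptions the first workload needs fewer cluster hours than an always-on cluster, while the second already fills the whole day with job time, so repeated initialization would only add overhead.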

KMS

- Create and manage customer master keys (CMKs)
- Enable and disable CMKs
- Schedule CMKs for deletion
- Configure key policies and grants

Key material (KM)
- AWS managed - Default. The KM is automatically rotated every 3 years (cannot be changed).
- Customer managed, AWS-generated KM - Rotated automatically every year.
- Customer managed, customer-generated KM - Rotated manually; it is up to the customer when to rotate.

Key rotation - Accomplished by updating the backing key material. Old material is retained for decryption.

Key types
- Symmetric - A single key for both encrypt and decrypt operations. Default and best practice.
- Asymmetric - A public and private key pair for encrypt and decrypt operations.

Key deletion - There is a waiting period of between 7 and 30 days before a key is deleted.

For external key material - A wrapping algorithm is used to encrypt the KM.

Master keys (CMKs) created this way can directly encrypt at most 4 KB of data.

Envelope encryption - Master keys are used to generate data keys through an API call. Data ke
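The key rotation note in this excerpt (new backing material for new encryptions, old material retained for decryption) can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyKey` is a hypothetical class, and the XOR keystream is not a secure cipher and is not how KMS encrypts data.

```python
import hashlib
import os


def _keystream(material, length):
    """Derive `length` pseudo-random bytes from key material (toy only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(material + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


class ToyKey:
    """One logical key whose backing material can be rotated."""

    def __init__(self):
        self.versions = [os.urandom(32)]  # every backing material, oldest first

    def rotate(self):
        # New material is appended; old material is kept for decryption.
        self.versions.append(os.urandom(32))

    def encrypt(self, plaintext):
        version = len(self.versions) - 1  # always use the newest material
        stream = _keystream(self.versions[version], len(plaintext))
        return version, bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, version, ciphertext):
        stream = _keystream(self.versions[version], len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))


key = ToyKey()
old_blob = key.encrypt(b"written before rotation")
key.rotate()
new_blob = key.encrypt(b"written after rotation")

# Both blobs stay readable because each records which material encrypted it.
print(key.decrypt(*old_blob), key.decrypt(*new_blob))
```

The point of the sketch is only the bookkeeping: the ciphertext carries a reference to the material that produced it, so rotation does not require re-encrypting existing data.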

AWS Storage Services

Storage Gateway
- Hybrid cloud storage
- Caches frequently used data for low-latency access from on-premises
- Supports NFS, SMB, iSCSI, iSCSI-VTL
- HIPAA and PCI-DSS compatible

Use cases
- Backup and archiving
- Disaster recovery
- Transfer data for cloud workflows
- Tiered storage

Types
- File - Stores data in S3 (object storage) using the SMB and NFS protocols. Backs up at object level. Supports POSIX-style file metadata on objects. Max file size of 5 TB, same as S3. S3 classes supported: Standard, Standard-IA and One Zone-IA. Uses multipart uploads to speed up the upload process. Objects managed through the gateway should not be modified outside the gateway.
- Volume - Uses block storage with iSCSI and stores data in S3 as well. Integrates with AWS Backup. The S3 bucket is managed by the gateway and we don't have control over it. Unlike File gateway, data cannot be accessed through the S3 API. Supports up to 32 volumes.
  - Cache mode - Primary data is in S3 with a local cache. 32 TB per volume and a max of 1 PB.
  - Stored mode - Stored l
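The cache mode limits quoted in this excerpt follow from one another, as the short check below shows. It assumes binary units (1 PB taken as 1024 TB); the constant names are made up for this example.

```python
MAX_VOLUMES = 32          # volumes per gateway, from the text
CACHED_VOLUME_TB = 32     # size limit per cached volume, from the text
TB_PER_PB = 1024          # assumption: binary units

total_tb = MAX_VOLUMES * CACHED_VOLUME_TB
total_pb = total_tb / TB_PER_PB

print(total_tb, total_pb)
```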

AWS Compute Services

Containers

A task is a logical group of running containers. Previously, tasks running on Amazon ECS shared the elastic network interface of their EC2 host. Now, the awsvpc networking mode lets you attach an elastic network interface directly to a task.

With the default bridge network mode, containers on an instance are connected to each other using the docker0 bridge. Containers use this bridge to communicate with endpoints outside of the instance, using the primary elastic network interface of the instance on which they are running.

The awsvpc networking mode enables you to run multiple copies of the container on the same instance using the same container port without needing to do any port mapping or translation, simplifying the application architecture.

Associating security group rules with a container or containers in a task allows you to restrict the ports and IP addresses from which your application accepts network traffic.

Task Role = Instead of creating and
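The port-sharing difference between the bridge and awsvpc modes described in this excerpt can be modeled in a few lines. This is a simplified simulation, not ECS behavior in full: it assumes bridge mode is used with a static host port equal to the container port, and the IP addresses, `place_tasks`, and `PortConflict` are hypothetical.

```python
class PortConflict(Exception):
    """Raised when two tasks would bind the same (ip, port) endpoint."""


def place_tasks(network_mode, copies, container_port, host_ip="10.0.0.5"):
    """Return the (ip, port) endpoints bound by `copies` tasks on one instance."""
    bound = set()
    for i in range(copies):
        if network_mode == "awsvpc":
            # Each task gets its own network interface, hence its own address.
            ip = f"10.0.1.{10 + i}"
        else:
            # Bridge mode with a static host port: every task shares the
            # instance's primary network interface address.
            ip = host_ip
        endpoint = (ip, container_port)
        if endpoint in bound:
            raise PortConflict(f"{ip}:{container_port} is already in use")
        bound.add(endpoint)
    return sorted(bound)


# Two copies on port 80 coexist when each task has its own interface.
awsvpc_endpoints = place_tasks("awsvpc", 2, 80)

# The same placement collides when both tasks share the host's address.
try:
    place_tasks("bridge", 2, 80)
    bridge_conflict = False
except PortConflict:
    bridge_conflict = True

print(awsvpc_endpoints, bridge_conflict)
```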