AWS Storage Services

Storage Gateway
  • Hybrid cloud storage
  • Caches frequently used data for low latency access from on-premises
  • Supports NFS, SMB, iSCSI, and iSCSI-VTL
  • HIPAA and PCI-DSS compatible
  • Use cases
    • Backup and Archiving
    • Disaster recovery
    • Transfer data for cloud workflows
    • Tiered storage
  • Types
    • File - Stores files as objects in S3 (object storage) using the SMB and NFS protocols. Backup is at the object level.
      • Supports POSIX style file metadata on objects.
      • Max file size of 5TB - same as S3.
      • S3 Classes supported - Standard, Standard IA and One Zone IA.
      • Uses multi-part uploads to speed up upload process.
      • Objects managed through the gateway should not be modified outside the gateway.
    • Volume - Uses block storage with iSCSI and stores in S3 as well. 
      • Integrates with AWS Backup
      • The S3 bucket is managed by the gateway; we have no control over it.
      • Unlike File Gateway, data cannot be accessed through the S3 API.
      • Supports up to 32 volumes.
      • Cached mode - Primary data is in S3 with a local cache. 32 TiB per volume, up to 1 PiB in total.
      • Stored mode - Data is stored locally with asynchronous backup to S3. 16 TiB per volume, up to 512 TiB in total.
    • Tape - Virtual tape library for backups, with data stored in S3. Tapes can be archived to Glacier and Glacier Deep Archive. As with Volume Gateway, the bucket where the data is stored cannot be accessed directly.
      • S3 Standard and Glacier supported.
      • Max of 1,500 tapes and a total size limit of 1 PiB. No limits for archives.
      • Archived data can be retrieved from Glacier in 3-5 hours and from Deep Archive in 12 hours.
  • Storage gateway running as VM in DC or cloud. Connects to AWS to access S3 alone.
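The Volume Gateway totals listed above follow from multiplying the volume count by the per-volume size. A minimal Python sketch of that arithmetic, using only the figures quoted in these notes (the function and constant names are hypothetical):

```python
# Figures come from the notes above; the names are illustrative only.
MAX_VOLUMES = 32

def max_gateway_capacity_tib(per_volume_tib: int, volumes: int = MAX_VOLUMES) -> int:
    """Total capacity of one gateway in TiB."""
    return per_volume_tib * volumes

cached_total = max_gateway_capacity_tib(32)   # cached mode: 32 TiB per volume
stored_total = max_gateway_capacity_tib(16)   # stored mode: 16 TiB per volume
print(cached_total, "TiB cached =", cached_total // 1024, "PiB")
print(stored_total, "TiB stored")
```

So 32 cached volumes reach 1,024 TiB (1 PiB) and 32 stored volumes reach 512 TiB.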
  • Storage Gateway
    1. In AWS Storage Gateway, your iSCSI initiators connect to your volumes as iSCSI targets. Storage Gateway uses Challenge-Handshake Authentication Protocol (CHAP) to authenticate iSCSI and initiator connections. CHAP provides protection against playback attacks by requiring authentication to access storage volume targets. For each volume target, you can define one or more CHAP credentials. You can view and edit these credentials for the different initiators in the Configure CHAP credentials dialog box.
    2. You can protect Storage Gateway cached volumes using Amazon Elastic Block Store (Amazon EBS) snapshots and clones. Amazon EBS stored volumes can be up to 16 TiB, and Storage Gateway cached volumes can be up to 32 TiB. If your Storage Gateway volume is larger than 16 TiB, you cannot create an EBS volume from a snapshot.
    3. The Amazon S3 File Gateway enables you to store and retrieve objects in Amazon S3 using file protocols such as Network File System (NFS) and Server Message Block (SMB). Objects written through S3 File Gateway can be directly accessed in S3.
    4. The Amazon FSx File Gateway enables you to store and retrieve files in Amazon FSx for Windows File Server using the SMB protocol. 
    5. The Volume Gateway provides block storage to your on-premises applications using iSCSI connectivity. Data on the volumes is stored in Amazon S3 and you can take point in time copies of volumes which are stored in AWS as Amazon EBS snapshots. You can also take copies of volumes and manage their retention using AWS Backup. You can restore EBS snapshots to a Volume Gateway volume or an EBS volume. 
      1. In the cached mode, your primary data is written to S3, while retaining your frequently accessed data locally in a cache for low-latency access.
      2. In the stored mode, your primary data is stored locally and your entire dataset is available for low-latency access while asynchronously backed up to AWS.
      3. You can manage backup and retention policies for cached and stored volume modes of Volume Gateway through AWS Backup.
    6. The Tape Gateway provides your backup application with an iSCSI virtual tape library (VTL) interface. Virtual tapes are stored in Amazon S3 and can be archived to Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.

  1. EFS supports cross-Region replication through the EFS Replication feature.
  2. EFS - If your application is not highly available and your on-premises server cannot access the mount target because its AZ becomes unavailable, the application should implement restart logic and connect to a mount target in a different AZ.
  1. FSx
    1. You can host user shares in the cloud for on-premises access, and you can also use it to support your backup and disaster recovery model.
    2. Amazon FSx for Lustre - Use it for workloads where speed matters, such as machine learning, high performance computing (HPC), video processing, and financial modeling.
    3. When you create an Amazon FSx file system, Amazon FSx provisions one or more elastic network interfaces (ENIs). These network interfaces allow your clients to communicate with the file system.
    4. Multi-AZ should be defined when the FS is created. This will create an active file server and a hot standby, each with their own storage, and synchronous replication across AZs to the standby. If the active file server fails, Amazon FSx will automatically fail over to the standby, so that you can maintain operations without losing any data. Failover typically takes less than 30 seconds. The DNS name remains unchanged, making replication and failover transparent, even during planned maintenance windows. 
    5. Multi-AZ file systems automatically fail over from the preferred file server to the standby file server if an AZ outage occurs or the preferred server becomes unavailable.
    6. By default, Amazon FSx takes an automatic daily backup of your file system. The default retention period for automatic daily backups is 7 days. You can set the retention period to be between 0–90 days. With Amazon FSx, you can manually take backups of your file systems at any time. 
    7. You can use AWS DataSync to schedule periodic replication of your FSx for Windows File Server file system to a second file system. DataSync can also be used for migration from on-premises by installing a DataSync agent.
  1. AWS Import/Export or Snowball Edge 
    1. Can handle up to 50 TB. It takes about 2 days to get the device after the request, and the end-to-end transfer may take 5-7 days. By comparison, transferring 10 TB over a 1 Gbps connection takes about 1 day. S3 Transfer Acceleration (S3 TA) costs more than 60% over Snowball, so S3 TA suits fast transfer of smaller data sets such as 10-20 TB. At around 50 TB, Snowball is cheaper and the time would still be about 5 days.
    2. Transferring smaller files reduces your transfer speed due to increased overhead. If you have many small files (< 1 MB), we recommend that you zip them up into larger archives before transferring them onto a Snowball Edge device.
    3. The data transfer rate using the file interface is typically between 25 MB/s and 40 MB/s. If you need to transfer data faster than this, use the Amazon S3 Adapter for Snowball which has a data transfer rate typically between 250 MB/s and 400 MB/s.
    4. AWS recommends that you should use Snowmobile to migrate large datasets of 10PB or more in a single location. For datasets less than 10PB or distributed in multiple locations, you should use Snowball. 
    5. With Snowmobile, you can also import your data directly into Glacier.
    6. You can improve the transfer speed from your data source to the device in the following ways:
      1. Perform multiple write operations at one time
      2. Transfer small files in batches
      3. Write from multiple computers
      4. Don’t perform other operations on files during transfer
      5. Reduce local network use
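The network-versus-Snowball comparison above rests on a simple transfer-time estimate. A rough Python sketch of that calculation, assuming decimal terabytes and a fully usable link (the function name and the `utilization` parameter are hypothetical):

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8                          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * utilization)   # usable bits per second
    return seconds / 86_400                            # seconds per day

print(round(transfer_days(10, 1), 2))   # 10 TB over 1 Gbps
print(round(transfer_days(50, 1), 2))   # 50 TB over 1 Gbps
```

At full utilization, 10 TB over 1 Gbps works out to a little under one day and 50 TB to a little under five days, which is why the Snowball turnaround of 5-7 days starts to look competitive around 50 TB. Real links rarely sustain full line rate, so a lower `utilization` gives a more cautious estimate.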
  2. S3
    1. By default, Amazon S3 allows both HTTP and HTTPS requests. To comply with the s3-bucket-ssl-requests-only rule (only accepting HTTPS connections), your bucket policy should explicitly deny access to HTTP requests. To determine HTTP or HTTPS requests in a bucket policy, use a condition that checks for the key "aws:SecureTransport". When this key is set to true, this means that the request is sent through HTTPS. To be sure to comply with the s3-bucket-ssl-requests-only rule, create a bucket policy that explicitly denies access when the request meets the condition "aws:SecureTransport": "false". This policy explicitly denies access to HTTP requests.
    2. You can protect data at rest in Amazon S3 by using three different modes of server-side encryption: SSE-S3, SSE-C, or SSE-KMS. Server-side encryption is the encryption of data at its destination by the application or service that receives it.
      1. SSE-S3 requires that Amazon S3 manage the data and the encryption keys. Customer master keys (CMKs) are used in SSE-KMS, not in SSE-S3. SSE-S3 provides strong multi-factor encryption in which each object is encrypted with a unique key. It also encrypts the key itself with a master key that it rotates regularly.
      2. SSE-C requires that you manage the encryption key.
      3. SSE-KMS requires that AWS manage the data key but you manage the customer master key (CMK) in AWS KMS.
    3. With Amazon S3 Select, you can scan a subset of an object by specifying a range of bytes to query using the ScanRange parameter. This capability lets you parallelize scanning the whole object by splitting the work into separate Amazon S3 Select requests for a series of non-overlapping scan ranges. Use the Amazon S3 Select ScanRange parameter and Start at (Byte) and End at (Byte). 
    4. Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object. This helps you achieve higher aggregate throughput versus a single whole-object request. Fetching smaller ranges of a large object also allows your application to improve retry times when requests are interrupted.
    5. You can enable object-level logging for an S3 bucket to send logs to CloudTrail for object-level API operations such as GetObject, DeleteObject, and PutObject. These events are called data events. By default, CloudTrail trails don't log data events, but you can configure trails to log data events for S3 buckets that you specify, or to log data events for all the Amazon S3 buckets in your AWS account.
    6. Access Analyzer for S3 might show that a bucket has read or write access provided through a bucket access control list (ACL), a bucket policy, or an access point policy. It cannot be used for near real-time detection of a new public object uploaded on S3. 
    7. With S3 Access Points, customers can create unique access control policies for each access point to easily control access to shared datasets. 
      1.  Access points can easily scale access for hundreds of applications by creating individualized access points with names and permissions customized for each application. 
      2. Each S3 Access Point is configured with an access policy specific to a use case or application. 
      3. Every access point is associated with a single bucket and contains a network origin control, and a Block Public Access control.  
      4. With Access points you no longer have to manage a single, complex bucket policy with hundreds of different permission rules that need to be written, read, tracked, and audited. 
      5. With S3 Access Points you can specify VPC Endpoint policies that permit access only to access points (and thus buckets) owned by specific account IDs. This simplifies the creation of access policies that permit access to buckets within the same account, while rejecting any other S3 access via the VPC Endpoint. 
      6. An S3 Access Point can limit all S3 storage access to happen from a Virtual Private Cloud (VPC). You can also create a Service Control Policy (SCP) and require that all access points be restricted to a Virtual Private Cloud (VPC), firewalling your data to within your private networks.
      7. You will also be able to use CloudFormation templates to get started with access points. You can monitor and audit access point operations such as “create access point” and “delete access point” through AWS CloudTrail logs. You can control access point usage using AWS Organizations support for AWS SCPs.
    8. Amazon S3 Multi-Region Access Points accelerate performance by up to 60% when accessing data sets that are replicated across multiple AWS Regions. Based on AWS Global Accelerator, S3 Multi-Region Access Points consider factors like network congestion and the location of the requesting application to dynamically route your requests over the AWS network to the lowest latency copy of your data. This automatic routing allows you to take advantage of the global infrastructure of AWS. S3 Multi-Region Access Points provide a single global endpoint to access a data set that spans multiple S3 buckets in different AWS Regions.
    9. S3 Storage Lens is the first cloud storage analytics solution to provide a single view of object storage usage and activity across hundreds, or even thousands, of accounts in an organization, with drill-downs to generate insights at the account, bucket, or even prefix level. After you activate S3 Storage Lens in the S3 Console, you will receive an interactive dashboard containing pre-configured views to visualize storage usage and activity. You can also export metrics in CSV or Parquet format to an S3 bucket. 
    10. With Amazon S3 Replication, you can configure Amazon S3 to automatically replicate S3 objects across different AWS Regions by using S3 Cross-Region Replication (CRR) or between buckets in the same AWS Region by using S3 Same-Region Replication (SRR). S3 Replication offers the flexibility of replicating to multiple destination buckets in the same, or different AWS Regions. S3 Replication supports two-way replication between two or more buckets in the same or different AWS Regions.  
    11. You can use S3 Batch Replication to backfill a newly created bucket with existing objects, retry objects that were previously unable to replicate, migrate data across accounts, or add new buckets to your data lake. Customers needing a predictable replication time backed by a Service Level Agreement (SLA) can use Replication Time Control (RTC) to replicate objects in less than 15 minutes. You can monitor replication progress by tracking bytes pending, operations pending, and replication latency between your source and destination buckets using the S3 management console or Amazon CloudWatch. You can also set up S3 Event Notifications to receive replication failure notifications to quickly diagnose and correct configuration issues.
    12. Amazon S3 Cross-Region Replication (CRR) now supports object filtering based on S3 object tags. S3 object tags are key-value pairs applied to S3 objects that allow you to better organize, secure, and manage your data stored in S3. By using S3 object tags to determine which objects to replicate using CRR, you now have fine grained control to selectively replicate your storage to another AWS Region to backup critical data for compliance and disaster recovery.
    13. Standard S3 GET requests made through an S3 Object Lambda access point will now invoke the specified Lambda function. From that point forward, S3 will automatically call your Lambda function to process any data retrieved through the S3 Object Lambda Access Point, returning a transformed result back to the application. 
    14. The Amazon S3 Intelligent-Tiering storage class is designed to optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change. S3 Intelligent-Tiering automatically stores objects in three access tiers: one tier optimized for frequent access, a lower-cost tier optimized for infrequent access, and a very-low-cost tier optimized for rarely accessed data. 
    15. For archive data that needs immediate access, such as medical images, news media assets, or genomics data, choose the S3 Glacier Instant Retrieval storage class. For archive data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases, choose S3 Glacier Flexible Retrieval (formerly S3 Glacier), with retrieval in minutes or free bulk retrievals in 5-12 hours. To save even more on long-lived archive storage such as compliance archives and digital media preservation, choose S3 Glacier Deep Archive, the lowest cost storage in the cloud with data retrieval from 12-48 hours. 
    16. Glacier has vaults, which are containers for storing archives. We can have 1,000 vaults per Region. Archives can be up to 40 TB and are uploaded to a vault in a single operation or as a multipart upload. A vault can be deleted only if it contains no archives. An existing archive cannot be updated; it has to be deleted and uploaded again.
    17. Vault inventory helps retrieve a list of archives in a vault with information such as archive ID, creation date, and size for each archive.
    18. Data retrieval requests are asynchronous operations, are queued and most jobs take about four hours to complete. Glacier supports a notification mechanism to an SNS topic when the job completes. S3 Glacier allows retrieving an archive either in whole (default) or a range, or portion.
    19. S3 Glacier Vault Lock helps deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.
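Returning to the HTTPS-only rule in the S3 notes above: the policy is a single Deny statement conditioned on "aws:SecureTransport": "false". Below is a minimal Python sketch that builds such a policy document. The bucket name, the statement ID and the function name are hypothetical, and the sketch only constructs the JSON; it does not attach it to a bucket.

```python
import json

def https_only_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any request not sent over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",   # illustrative statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }

policy = https_only_policy("example-bucket")   # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Because the statement is an explicit Deny, it takes precedence over any Allow elsewhere in the policy, so plain HTTP requests are rejected regardless of other grants.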
  1. RDS
    1. Amazon RDS supports three types of instance classes: Standard, Memory Optimized, and Burstable Performance.
      1. Burstable performance instances are ideal for workloads that usually require a constant baseline performance with occasional peaks in demand for a limited time. For instance, an e-commerce site will usually have a relatively constant load on the database that will increase dramatically on Cyber Monday and Black Friday
      2. Memory optimized instances are best for scenarios where you need to load a lot of rows (hundreds of thousands) into memory, such as after running a SELECT query on a really large table. 
    2. For each of the instance types, there are three types of storage available. General purpose SSD storage, provisioned IOPS SSD storage, and magnetic storage.
      1. Baseline I/O performance for General Purpose SSD storage is 3 IOPS for each GiB, with a minimum of 100 IOPS. This relationship means that larger volumes have better performance. For example, baseline performance for a 100-GiB volume is 300 IOPS. Baseline performance for a 1-TiB volume is 3,000 IOPS.
      2. Provisioned IOPS SSD storage costs more than General Purpose storage, but it should provide better and more consistent performance.
    3. RDS instances are priced by the hour and billed by the second. If an instance is stopped, you stop paying for it. However, you still pay for any storage you’re using.
    4. You can separately modify your Amazon RDS DB instance to increase the allocated storage space or improve the performance by changing the storage type (such as General Purpose SSD to Provisioned IOPS SSD).
    5. To increase the I/O capacity of a DB instance, do any or all of the following:

      • Migrate to a different DB instance class with high I/O capacity.
      • Convert from magnetic storage to either General Purpose or Provisioned IOPS storage.
      • When Enhanced Monitoring is enabled, Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on. 
      • Memory usage consistently above 75% indicates that you should check your workload or upgrade your instance.
      • Investigate disk space consumption if space used is consistently at or above 85 percent of the total disk space. 
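The General Purpose SSD baseline rule quoted earlier in the RDS notes (3 IOPS per GiB, minimum 100) can be written as a one-line function. This is a sketch of that rule only; the function name is hypothetical, and any upper cap or burst behaviour is deliberately not modelled.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for General Purpose SSD: 3 per GiB, with a floor of 100."""
    return max(100, 3 * size_gib)

for size in (20, 100, 1024):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

A 100 GiB volume gives 300 IOPS, matching the example in the notes. A full 1 TiB (1,024 GiB) gives 3,072, which the notes round to 3,000. Small volumes such as 20 GiB stay at the 100 IOPS floor.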
    1. RDS Backups - In addition to the daily automated backup, Amazon RDS archives database change logs. This enables you to recover your database to any point in time during the backup retention period, up to the last five minutes of database usage.
    2. Amazon RDS stores multiple copies of your data, but for Single-AZ DB instances these copies are stored in a single availability zone. If for any reason a Single-AZ DB instance becomes unusable, you can use point-in-time recovery to launch a new DB instance with the latest restorable data.
    3. To allow communication between RDS and your on-premises network, you must first set up a VPN or an AWS Direct Connect connection. Once that is done, follow the steps below to set up the replication:

      1. Prepare an instance of MySQL running external to Amazon RDS.
      2. Configure the MySQL DB instance to be the replication source.
      3. Use mysqldump to transfer the database from the Amazon RDS instance to the instance external to Amazon RDS (e.g. on-premises server)
      4. Start replication to the instance running external to Amazon RDS.
    4. DB parameter group acts as a container for engine configuration values that are applied to one or more DB instances.
    5. When you associate a new DB parameter group with a DB instance, the modified static and dynamic parameters are applied only after the DB instance is rebooted. However, if you modify dynamic parameters in the newly associated DB parameter group, these changes are applied immediately without a reboot.
    6. Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances.
    7. Read Replicas are updated asynchronously, whether cross-AZ or cross-Region.
    8. To minimize cost, create the Read Replica in the same Region as the primary, since data transfer within a single Region is free.
    9. Only the standby replica is replicated synchronously.
    10. When the DB instance is deleted, the Read Replica is promoted to a standalone DB instance.
  1. Aurora
    1. Amazon Aurora (Aurora) is a fully managed relational database engine that’s compatible with MySQL and PostgreSQL.
    2. Aurora Auto Scaling dynamically adjusts the number of Aurora Replicas provisioned for an Aurora DB cluster using single-master replication. When the connectivity or workload decreases, Aurora Auto Scaling removes unnecessary Aurora Replicas so that you don’t pay for unused provisioned DB instances. 
    3. You define and apply a scaling policy to an Aurora DB cluster. The scaling policy defines the minimum and maximum number of Aurora Replicas that Aurora Auto Scaling can manage. 
    4. You cannot set Auto Scaling for the master database on Amazon Aurora. You can only manually resize the instance size of the master node.
    5. Aurora can publish audit logs, slow query logs and error logs to CloudWatch Logs.
  1. DynamoDB 
    1. DynamoDB uses the concept of read and write units. One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. One write capacity unit represents one write per second for an item up to 1 KB in size. The defaults are 5 read and 5 write units, which means 20 KB of strongly consistent reads/second and 5 KB of writes/second.
    2. DynamoDB scaling is best suited for user growth changes over time. For hourly spikes use caching. The typical approach to caching is known as lazy population or cache aside. This means that the cache is checked, and if the value is not in cache (a cache miss), the record is retrieved, stored in cache, and returned.
    3. DynamoDB - Using a hexadecimal string such as a hash key or checksum is one easy strategy to inject randomness.
    4. DynamoDB on-demand mode might still throttle if you exceed double your previous traffic peak within 30 minutes.
    5. You cannot add a local secondary index (LSI) to an existing DynamoDB table; it must be defined at table creation. A global secondary index (GSI), however, can be added later.
    6. An LSI has the same partition key as the table but can have a different sort key. The idea is to re-sort the table using the new sort key.
    7. A GSI can have a different partition key and sort key. The default quota is 20 GSIs per table.
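The lazy population (cache-aside) approach mentioned in the caching note above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production cache: the class name is hypothetical, the backing store is a plain dictionary standing in for a DynamoDB lookup, and there is no TTL or eviction.

```python
from typing import Callable, Dict, Optional

class CacheAside:
    """Lazy-population cache in front of a slower record store."""

    def __init__(self, load_record: Callable[[str], Optional[dict]]):
        self._load_record = load_record      # e.g. a DynamoDB lookup (assumed)
        self._cache: Dict[str, dict] = {}
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:               # cache hit
            return self._cache[key]
        self.misses += 1                     # cache miss: go to the store
        record = self._load_record(key)
        if record is not None:
            self._cache[key] = record        # populate for later reads
        return record

# Stand-in for the real table, purely for illustration.
fake_table = {"user#1": {"name": "Ada"}}
cache = CacheAside(fake_table.get)
first = cache.get("user#1")    # miss, loaded from the store
second = cache.get("user#1")   # hit, served from the cache
print(first, second, cache.misses)
```

Only the first read of a key reaches the store, which is what lets a cache absorb short spikes without raising provisioned capacity.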
Aurora
  • Fully managed; best suited for HA and reliability
  • MySQL and PostgreSQL compatible.
  • An Aurora cluster can grow up to 128 TiB.
  • An Aurora cluster comprises one or more DB instances and a DB storage volume that spans multiple AZs (within the same Region), with each AZ holding a copy of the data.
  • For high availability across multiple Regions, use Aurora Global Database. Automatic replication and failover are supported (failover completes in 30 secs).
  • Replication lag between primary and replicas is < 100 ms.
  • Ensure that clients don't cache the instance DNS data for more than 30 secs (TTL). This avoids connection errors when an instance fails over but the client is still using the cached address.
  • DB instance class determines memory and computation capacity of DB.
    • Memory optimized
    • Burstable performance
  • Storage volume uses SSD.
  • Security - IAM database authentication can be used with both Aurora MySQL and Aurora PostgreSQL.
  • Connection to the cluster is through a cluster endpoint.
  • Reader endpoint provides load balancing and re-routing when some instance becomes unavailable.
  • Instance endpoints can also be used to connect to specific instance.
  • In a single master cluster:
    • Primary DB instance for both read and write operations.
    • Replicas handle only read operations. Up to 15 replicas.
  • In a multi-master cluster:
    • All DB instances can perform write operations.
    • Read replicas are hence not applicable.
  • Manages automatic clustering, replication, storage allocation, automatic failover when primary fails.
  • Option groups
  • Parameter groups - A container for DB configuration that can be applied to instances (RDS and Aurora). Cluster parameter groups get applied to every DB instance in the cluster.
  • Max data API requests/second is 1000 per region. If this quota is exceeded we get Throttling error - Rate exceeded.  Reduce number of calls in such cases.
  • Max concurrent requests is 500.
  • Serverless Aurora is a DB cluster which auto scales compute capacity.
    • Use cases 
      • Variable, unpredictable workload applications, Dev/Test dbs
      • Multi-tenant - database capacity for each application is managed automatically
    • Limitations
      • Features not supported
        • IAM DB authentication 
        • Replicas
        • Multi-master clusters
        • Global databases
        • Proxies
  • Global databases - Span multiple Regions, with 1 primary DB cluster in one Region and up to 5 secondary DB clusters in other Regions. Replication is asynchronous.
    • Writes go to the primary Region, and Aurora replicates the data to the secondary Regions.
    • The secondary DB clusters are read-only and support read operations closer to users (low latency).
    • Each secondary can have up to 15 read replicas.
    • Low cross-Region replica lag of < 1 sec.
    • < 1 min downtime after a Region becomes unavailable.
  • RDS Proxy - Helps pool and share DB connections. Throttles application connections that can't be served immediately from the pool of connections. The proxy is reached through its own endpoint.
  • Storage
    • Data is stored in 10 GB logical groups called protection groups.
    • Each protection group is replicated on 6 storage nodes across 3 AZs. A write is considered successful when 4 of the 6 nodes acknowledge it.
    • Self-healing mechanism - storage is automatically scanned and replaced when faulty.
  • DAS - Database Activity Streams - Provides a near-real-time (NRT) stream of database activity. Once enabled, the stream is sent to Kinesis Data Streams, where it can be analyzed using Kinesis Data Analytics.
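The 4-of-6 write rule in the Storage notes above is a quorum check. Here is a toy Python sketch of it, assuming the 6 copies are spread as two storage nodes per AZ; the names are hypothetical and this is not how Aurora is implemented internally.

```python
REPLICAS = 6       # copies of each protection group, per the notes
WRITE_QUORUM = 4   # acknowledgements needed for a successful write

def write_succeeded(acks: list) -> bool:
    """acks holds one boolean per storage node: True if it acknowledged."""
    if len(acks) != REPLICAS:
        raise ValueError("expected one ack flag per replica")
    return sum(acks) >= WRITE_QUORUM

# Assumed layout: 2 nodes per AZ, so losing one AZ leaves 4 acknowledgements.
one_az_down = [True, True, True, True, False, False]
three_nodes_down = [True, True, True, False, False, False]
print(write_succeeded(one_az_down), write_succeeded(three_nodes_down))
```

Under that assumption, writes continue when a whole AZ is lost but stop once a third node is also unavailable.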

DynamoDB
  • Fully managed NoSQL database.
  • Unlike traditional databases, which require a persistent network connection to the DB, DynamoDB is a web service and the connection is opened per interaction. Instead of SQL statements, queries are made through HTTP(S) requests.
  • A DynamoDB table has items (records), and each item has attributes (columns).
  • Uses a primary key whose partition key uniquely identifies an item. The partition key is also referred to as the hash attribute.
  • Additionally, we can have a sort key, in which case the two together act as a composite primary key. The sort key is also referred to as the range attribute.
  • More than one item can have the same partition key; however, each must then have its own sort key.
  • Besides the primary key, we can define secondary indexes on the table for querying:
    • Global - an index with a partition and sort key which can be different from that of the table. Max 20 per table.
    • Local - has same partition key as the table but different sort key. Max 5 per table.
  • A DynamoDB table is unique to a region. By default the data is replicated across AZs in the region.
  • ECR - Eventually consistent reads - may not contain the most recent write data and may return stale data.
  • SCR - Strongly consistent reads - Use more throughput than ECR, are not supported on global secondary indexes, and may have higher latency than ECR.
  • On-Demand Read/Write Capacity mode
    • Use for unknown workloads, unpredictable traffic scenarios. This is a flexible pay mode where we pay for use.
    • 1 Read Request Unit represents 1 SCR or 2 ECR for a 4 KB item. For an 8 KB item we need 2 RRU.
    • 1 Write Request Unit represents 1 Write for a 1 KB item.
  • Provisioned mode - The read/write capacity is planned and specified. We can use auto scaling to adjust provisioned capacity automatically. Use this mode for predictable, consistent traffic.
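The read and write unit rules above (4 KB blocks for reads, 1 KB blocks for writes, eventually consistent reads at half cost) reduce to a small calculation. A minimal Python sketch, with hypothetical function names:

```python
import math

def read_units(item_kb: float, strongly_consistent: bool = True) -> float:
    """Read units for one read per second of an item of item_kb kilobytes."""
    blocks = math.ceil(item_kb / 4)          # reads are metered in 4 KB blocks
    return blocks if strongly_consistent else blocks / 2

def write_units(item_kb: float) -> int:
    """Write units for one write per second of an item of item_kb kilobytes."""
    return math.ceil(item_kb)                # writes are metered in 1 KB blocks

print(read_units(8))                              # 8 KB strongly consistent read
print(read_units(8, strongly_consistent=False))   # same item, eventually consistent
print(write_units(2.5))                           # 2.5 KB write
```

An 8 KB strongly consistent read needs 2 units, as stated in the notes, while the same read done eventually consistently needs 1. Sizes round up, so a 2.5 KB write costs 3 write units.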
Global Tables

  • For massively scaled apps with a globally distributed user base, instead of setting up the table in multiple Regions and managing replication among them, we can define the table as a global table and list the Regions where it will be accessed.

DynamoDB Streams

  • When streams are enabled on a table, a stream record is created whenever an item is added to, updated in, or deleted from the table. Each stream record is retained for 24 hours.
  • Similar to Oracle triggers, if we want a specific action to be taken when table data changes, we can enable streams, attach a Lambda function to read the stream data, and build logic around it.
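The trigger-style pattern described above can be sketched as a Lambda handler that walks the stream records in each batch. The handler below is a minimal Python illustration: the sample event is hand-written with hypothetical data, and whether "NewImage" is present depends on the stream view type chosen for the table.

```python
def handler(event, context):
    """Summarise the changes carried by one batch of stream records."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record["eventName"]          # INSERT, MODIFY or REMOVE
        keys = record["dynamodb"]["Keys"]         # key attributes of the item
        if event_name == "INSERT":
            new_image = record["dynamodb"].get("NewImage", {})
            print("new item", keys, new_image)    # trigger-style logic goes here
        summary[event_name] = summary.get(event_name, 0) + 1
    return summary

# Hand-written sample event for a local dry run (hypothetical data).
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"pk": {"S": "order#1"}},
                      "NewImage": {"pk": {"S": "order#1"}, "total": {"N": "42"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"pk": {"S": "order#0"}}}},
    ]
}
result = handler(sample_event, None)
print(result)
```

Running the handler locally against a sample event like this is a cheap way to check the branching logic before wiring the function to the table's stream.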
Table Classes
  • Each table is associated with a table class.
  • Standard table class is default for most workloads.
  • Standard-IA table class is optimized for tables where storage is dominant factor over read/writes. Basically infrequently accessed data like logs, old order history, etc 
DynamoDB Accelerator

  • DynamoDB normally offers single-digit millisecond performance. If we need microsecond performance, we can use DAX (DynamoDB Accelerator), an in-memory caching service that reduces the response time of ECR workloads.
  • Real time bidding, Social gaming, trading applications are ideal use cases for DAX.
  • For read heavy applications, instead of increasing provisioned read throughput, the read activity can be offloaded to DAX cluster.
  • DAX is not suited to SCR or write-intensive apps.
  • A DAX cluster runs inside a VPC. It can interact with DynamoDB only in the same Region as the cluster.
  • The application connects to DAX using a DAX client, which should be installed along with the application.
  • A cluster has one or more nodes. One acts as the primary and the others as read replicas.
  • When a client requests an item that is available in DAX, it is returned from the cache; otherwise DAX passes the request to DynamoDB for that item.
  1. Elasticsearch
    1. Index snapshots are a popular way to migrate from a self-managed Elasticsearch cluster to Amazon OpenSearch Service.
    2. Amazon OpenSearch Service uses a blue/green deployment process when updating domains. Blue/green typically refers to the practice of running two production environments, one live and one idle, and switching the two as you make software changes. In the case of OpenSearch Service, it refers to the practice of creating a new environment for domain updates and routing users to the new environment after those updates are complete. The practice minimizes downtime and maintains the original environment in the event that deployment to the new environment is unsuccessful.
    3. To prevent data loss and minimize Amazon OpenSearch Service cluster downtime in the event of a service disruption, you can distribute nodes across two or three Availability Zones in the same Region, a configuration known as Multi-AZ. 
    4. Snapshots in Amazon OpenSearch Service are backups of a cluster's indices and state. 
      1. Automated snapshots are only for cluster recovery. You can use them to restore your domain in the event of red cluster status or data loss. 
      2. Manual snapshots are for cluster recovery or for moving data from one cluster to another.  These snapshots are stored in your own Amazon S3 bucket.
    5. Sample architecture - Data created in DynamoDB can be streamed to OpenSearch using DynamoDB Streams. OpenSearch then indexes it and optimizes it for search.
