Glue

  • AWS Glue is a serverless, fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.
  • AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that generates Python or Scala code, and a scheduler that handles dependency resolution, job monitoring, and retries.
  • AWS Glue is designed to work with semi-structured data.
  • You can use AWS Glue to organize, cleanse, validate, and format data for storage in a data warehouse or data lake. 
  • Glue discovers and catalogs metadata about your data stores into a central catalog. 
  • Scheduled crawler programs populate the AWS Glue Data Catalog with table definitions.
  • AWS Glue can catalog your Amazon Simple Storage Service (Amazon S3) data, making it available for querying with Amazon Athena and Amazon Redshift Spectrum.
  • You can run your ETL jobs as soon as new data becomes available in Amazon S3 by invoking your AWS Glue ETL jobs from an AWS Lambda function.
  • Glue supports data sources such as Amazon S3, Amazon RDS, Amazon DynamoDB, MongoDB, Amazon DocumentDB, and third-party databases accessible over JDBC. Crawlers can scan these sources and update the Data Catalog with their metadata. AWS Glue can also generate a script to transform your data.
  • Glue can also read from Apache Kafka or Amazon Kinesis data streams.
  • You can run your job on demand, or you can set it up to start when a specified trigger occurs. The trigger can be a time-based schedule or an event.
  • Tables and databases in AWS Glue are objects in the AWS Glue Data Catalog. They contain metadata; they don't contain data from a data store. This is similar to the Apache Hive metastore.
  • Each AWS account has one AWS Glue Data Catalog per AWS Region.
  • A data store is a repository for persistently storing your data. Examples include Amazon S3 buckets and relational databases. A data target is a data store that a process or transform writes to.
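
The point above about invoking a Glue ETL job from an AWS Lambda function when new data lands in Amazon S3 could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the job name "my-etl-job" and the argument names "--source_bucket" and "--source_key" are hypothetical, and the event is assumed to be a standard S3 event notification. The only AWS call used is the boto3 Glue client's start_job_run.

```python
from urllib.parse import unquote_plus


def start_glue_job_for_s3_event(event, glue_client, job_name):
    """Start one Glue job run per S3 object listed in an S3 event notification."""
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,
            # Hypothetical argument names; the job script would read them itself.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    # boto3 is imported lazily so the helper above can be exercised without AWS.
    import boto3

    glue = boto3.client("glue")
    # "my-etl-job" is a placeholder for an existing Glue job name.
    return {"job_run_ids": start_glue_job_for_s3_event(event, glue, "my-etl-job")}
```

Passing the client in as a parameter keeps the helper easy to test with a stub. The same job could instead be started on a schedule or by an event trigger, as the notes describe.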

  • The AWS Glue Schema Registry lets you centrally discover, control, and evolve data stream schemas. A schema defines the structure and format of a data record.
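
To make "a schema defines the structure and format of a data record" concrete, here is a small sketch that builds the request for registering an Avro schema through the boto3 Glue client's create_schema call. The record layout (Click with user_id, url, ts), the registry name, and the schema name are all hypothetical, and BACKWARD compatibility is just one of the available modes, chosen for illustration.

```python
import json

# Hypothetical Avro schema describing one click-stream record.
CLICK_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}


def build_create_schema_request(registry_name, schema_name, avro_schema):
    """Build keyword arguments for the Glue client's create_schema call."""
    return {
        "RegistryId": {"RegistryName": registry_name},
        "SchemaName": schema_name,
        "DataFormat": "AVRO",
        # Assumed compatibility mode for this sketch.
        "Compatibility": "BACKWARD",
        # The schema definition is sent as a JSON string.
        "SchemaDefinition": json.dumps(avro_schema),
    }


request = build_create_schema_request("my-registry", "click-events", CLICK_SCHEMA)
# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("glue").create_schema(**request)
```

Building the request separately from sending it keeps the schema definition easy to review and version alongside the producers and consumers that depend on it.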
