Glue
- AWS Glue is a serverless, fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.
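- AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries.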
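- AWS Glue is designed to work with semi-structured data. Its DynamicFrame abstraction handles records whose schema is not known in advance or varies between records.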
- You can use AWS Glue to organize, cleanse, validate, and format data for storage in a data warehouse or data lake.
- Glue discovers metadata about your data stores and records it in a central catalog.
- Crawlers populate the AWS Glue Data Catalog with table definitions. A crawler can run on demand or on a schedule.
- AWS Glue can catalog your Amazon Simple Storage Service (Amazon S3) data, making it available for querying with Amazon Athena and Amazon Redshift Spectrum.
- You can run your ETL jobs as soon as new data becomes available in Amazon S3 by invoking your AWS Glue ETL jobs from an AWS Lambda function.
- Glue supports data sources such as Amazon S3, Amazon RDS, Amazon DynamoDB, MongoDB, Amazon DocumentDB, and third-party databases accessible over JDBC. Crawlers connect to these sources and update the Data Catalog with their metadata. AWS Glue can also generate a script to transform your data.
- Glue streaming ETL jobs can read from Amazon Kinesis Data Streams and Apache Kafka data streams.
- You can run your job on demand, or you can set it up to start when a specified trigger occurs. The trigger can be a time-based schedule or an event.
- Tables and databases in AWS Glue are objects in the AWS Glue Data Catalog. They contain metadata, not the data itself from a data store. In this respect the Data Catalog is similar to an Apache Hive metastore.
- Each AWS account has one AWS Glue Data Catalog per region.
- A data store is a repository for persistently storing your data. Examples include Amazon S3 buckets and relational databases. A data target is a data store that a process or transform writes to.
- The AWS Glue Schema Registry lets you centrally discover, control, and evolve data stream schemas. A schema defines the structure and format of a data record.
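The crawler bullets above can be sketched with the Glue client from the AWS SDK for Python (boto3), whose `create_crawler` call takes a name, an IAM role, a target Data Catalog database, the data stores to crawl, and an optional `cron(...)` schedule. This is a minimal sketch, not a definitive setup: the crawler name, role ARN, database, S3 path, and default schedule are hypothetical placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any


def create_scheduled_crawler(glue: Any, name: str, role_arn: str, database: str,
                             s3_path: str, schedule: str = "cron(0 2 * * ? *)") -> dict:
    """Create a crawler that catalogs an S3 prefix on a schedule.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns the request that was sent, for inspection.
    """
    request = {
        "Name": name,
        "Role": role_arn,                               # IAM role the crawler assumes to read the data
        "DatabaseName": database,                       # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # data store to crawl
        "Schedule": schedule,                           # default: every day at 02:00 UTC
    }
    glue.create_crawler(**request)
    return request
```

Usage might look like `create_scheduled_crawler(boto3.client("glue"), "sales-crawler", role_arn, "sales_db", "s3://example-bucket/sales/")`. Leaving out `Schedule` would create a crawler that only runs on demand.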
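The bullet about starting ETL jobs from AWS Lambda when new data lands in Amazon S3 might look like the handler below. It is a sketch under stated assumptions: the job name `my-etl-job` and the argument names `--source_bucket` and `--source_key` are hypothetical, and the Lambda function is assumed to be subscribed to the bucket's object-created notifications. It uses boto3's `start_job_run`, which passes custom job arguments as `--name` keys. Inside the job, they could be read with `getResolvedOptions`.

```python
from typing import Any
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "my-etl-job"  # hypothetical Glue job name


def job_arguments(event: dict) -> list[dict]:
    """Turn each S3 notification record into arguments for one Glue job run."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        runs.append({"--source_bucket": bucket, "--source_key": key})
    return runs


def lambda_handler(event: dict, context: Any, glue: Any = None) -> list[str]:
    """Start one Glue job run per new object and return the JobRunIds."""
    if glue is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        glue = boto3.client("glue")
    run_ids = []
    for args in job_arguments(event):
        response = glue.start_job_run(JobName=GLUE_JOB_NAME, Arguments=args)
        run_ids.append(response["JobRunId"])
    return run_ids
```

The optional `glue` parameter is a testing convenience. Lambda itself calls `lambda_handler(event, context)`, so the real client is created on first use.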
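The bullet about running jobs on demand or from a trigger can be illustrated with two trigger definitions for boto3's `create_trigger`. One is time-based (`SCHEDULED`). The other is a `CONDITIONAL` trigger that fires on an event, here the successful completion of an upstream job. This is a sketch, not the only option: Glue also has `ON_DEMAND` and `EVENT` trigger types, and all trigger and job names here are hypothetical.

```python
from typing import Any


def scheduled_trigger(name: str, job_name: str, cron: str) -> dict:
    """Trigger that starts `job_name` on a time-based schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron,                    # e.g. "cron(0 6 * * ? *)" = daily at 06:00 UTC
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,             # activate now instead of starting it separately later
    }


def on_success_trigger(name: str, upstream_job: str, job_name: str) -> dict:
    """Conditional trigger that starts `job_name` after `upstream_job` succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": upstream_job,
                "State": "SUCCEEDED",
            }],
        },
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_triggers(glue: Any, *requests: dict) -> list[str]:
    """Register each trigger with a boto3 Glue client; returns the created trigger names."""
    return [glue.create_trigger(**req)["Name"] for req in requests]
```

Chaining a scheduled trigger with a conditional one is a common way to build a small pipeline: the first job runs on a timer, and the second waits for it to succeed.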
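The bullets on Data Catalog tables and on one catalog per account per Region can be made concrete by listing a database's tables. Each table returned by boto3's paginated `get_tables` call holds only metadata. Its `StorageDescriptor.Location` points at the data store where the data actually lives. The sketch below assumes a boto3 Glue client is passed in. The database name is a placeholder.

```python
from typing import Any


def table_locations(glue: Any, database: str) -> dict[str, str]:
    """Map each table in a Data Catalog database to the data location it describes."""
    locations = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            # The table is metadata only; Location (if present) points at the real data.
            locations[table["Name"]] = table.get("StorageDescriptor", {}).get("Location", "")
    return locations
```

Because each Region has its own catalog, the client's Region decides which catalog is read. For example, `table_locations(boto3.client("glue", region_name="eu-west-1"), "sales_db")` only sees tables cataloged in eu-west-1.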
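The Schema Registry bullet says that a schema defines the structure and format of a record. The sketch below registers a small Avro schema with boto3's `create_schema`. The registry name, schema name, and clickstream fields are hypothetical. The `BACKWARD` compatibility mode is one choice among several the registry offers: under it, consumers using a new schema version can still read data written with the previous version.

```python
import json
from typing import Any

# Hypothetical Avro schema: fixes the field names and types of each stream record.
CLICK_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}


def register_schema(glue: Any, registry: str, name: str, schema: dict) -> str:
    """Register an Avro schema in a Glue Schema Registry and return its ARN."""
    response = glue.create_schema(
        RegistryId={"RegistryName": registry},
        SchemaName=name,
        DataFormat="AVRO",
        Compatibility="BACKWARD",            # new versions must be able to read the previous one
        SchemaDefinition=json.dumps(schema),  # the definition is sent as a JSON string
    )
    return response["SchemaArn"]
```

Later versions of the schema would be checked against the compatibility mode before the registry accepts them. This is what lets stream schemas evolve without breaking existing consumers.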