
AWS Entity Resolution Documentation

Flexible and customizable data preparation

AWS Entity Resolution is designed to read your data and use it as data input for match processing. Each row of the data input table is designed to be processed as a record, with a unique identifier serving as a primary key. AWS Entity Resolution can operate on encrypted datasets. You can also define a schema mapping that tells AWS Entity Resolution which input fields you want to use in your matching workflow. You can bring your own data schema, or blueprint, from an existing AWS Glue data input or build your custom schema using an interactive user interface or JSON editor. Data inputs are designed to be normalized prior to matching. You can turn off normalization if your data input has already been normalized.
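
To make the normalization step concrete, the following is a minimal sketch of what normalizing a record before matching can look like. The field names (name, email, phone) and the specific cleanup steps are assumptions chosen for illustration; they are not the normalization rules that AWS Entity Resolution applies.

```python
import re


def normalize_email(value: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    """Keep digits only, dropping spaces, dashes, and parentheses."""
    return re.sub(r"\D", "", value)


def normalize_name(value: str) -> str:
    """Lowercase and collapse runs of whitespace to a single space."""
    return " ".join(value.lower().split())


# Hypothetical mapping from field name to the normalizer used for it.
NORMALIZERS = {
    "email": normalize_email,
    "phone": normalize_phone,
    "name": normalize_name,
}


def normalize_record(record: dict) -> dict:
    """Return a copy with known fields normalized; other fields pass through."""
    return {
        field: NORMALIZERS[field](value) if field in NORMALIZERS else value
        for field, value in record.items()
    }


raw = {
    "id": "r1",
    "name": "  Jane   DOE ",
    "email": " Jane.Doe@Example.COM ",
    "phone": "(555) 010-0199",
}
clean = normalize_record(raw)
```

In this sketch the unique identifier (id) passes through unchanged, so a record keeps its primary key while its matchable fields are put into a comparable form.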

Configurable entity matching workflows

An entity matching workflow is a sequence of steps you set up to tell AWS Entity Resolution how to match your data input and where to write the consolidated data output. You can set up one or more matching workflows to compare different data inputs and use various matching techniques. You can also view the job status and metrics of existing matching workflows.

Ready-to-use rule-based matching

This matching technique is designed to include a set of ready-to-use rules in the AWS Management Console or command line interface (CLI) to find matches, based on your input fields. You can also customize the rules, delete rules, rearrange the priority of rules, create new rules, and reset the rules. The data output in your S3 bucket is designed to have match groups. They are generated by AWS Entity Resolution using rule-based matching, where each match group is designed to have the number of the rule that generated the match associated with it.
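
The idea of prioritized rules that tag each match group with a rule number can be sketched as follows. This is a simplified illustration under stated assumptions, not the service's algorithm: the rule definitions, the field names, and the choice to remove records from consideration once a rule has matched them are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rules in priority order. Each rule lists the fields that must all agree.
RULES = [
    ["email", "phone"],  # rule 1: strictest
    ["email"],           # rule 2
    ["name", "phone"],   # rule 3
]


def match_records(records, rules):
    """Group records, recording the number of the rule that produced each group."""
    unmatched = list(records)
    groups = []
    for rule_number, fields in enumerate(rules, start=1):
        buckets = defaultdict(list)
        for rec in unmatched:
            key = tuple(rec.get(f) for f in fields)
            if all(key):  # skip records missing any field the rule needs
                buckets[key].append(rec)
        matched_ids = set()
        for members in buckets.values():
            if len(members) > 1:
                ids = [m["id"] for m in members]
                groups.append({"rule": rule_number, "ids": ids})
                matched_ids.update(ids)
        # Records matched by this rule are not offered to lower-priority rules.
        unmatched = [r for r in unmatched if r["id"] not in matched_ids]
    # Records that matched nothing become singleton groups with no rule number.
    groups.extend({"rule": None, "ids": [r["id"]]} for r in unmatched)
    return groups


records = [
    {"id": "a", "name": "jane doe", "email": "jane@example.com", "phone": "5550100"},
    {"id": "b", "name": "jane doe", "email": "jane@example.com", "phone": "5550100"},
    {"id": "c", "name": "sam lee", "email": "sam@example.com", "phone": "5550111"},
    {"id": "d", "name": "s. lee", "email": "sam@example.com", "phone": ""},
    {"id": "e", "name": "lee kim", "email": "lee@example.com", "phone": "5550122"},
]
groups = match_records(records, RULES)
```

Here records a and b are grouped under rule 1, records c and d fall through to rule 2 because d has no phone number, and record e stays on its own. Rearranging the order of RULES changes which rule number a group is attributed to.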

Preconfigured ML matching

This matching technique is designed to include a preconfigured ML model to find matches across your data inputs. The model is designed to use input fields associated with name, email address, phone number, address, and date of birth data types. The model can generate match groups of related records with a confidence score in each group designed to explain the quality of the match relative to other match groups. The data output in your S3 bucket is designed to have match groups.

Matching records with data service providers

With AWS Entity Resolution you can match, link, and enhance your records with data service providers to better understand, reach, and engage your customers. With this matching workflow you can enhance your records through column appends, or you can translate customer data into data service provider IDs to meet your business goals. You can create a matching workflow with data service providers.

Bulk and incremental processing

Data processing helps you convert your data inputs into a consolidated data output table in which similar records share a common match ID generated using entity matching workflow configurations. Using the API and the AWS Management Console or CLI, you can then run manual bulk processing, based on your existing extract, transform, and load (ETL) data pipeline. This is designed to reprocess data for any new matches and updates to existing matches. Also, for rule-based matching scenarios, you can initiate incremental processing. When new data is available in your S3 bucket, the service is designed to read those new records and compare them against existing records.

Matching

For rule-based matching scenarios, you can initiate resolution. You can look up, match, or create a new match ID through the AWS Entity Resolution Generate Match ID API, AWS Management Console, or CLI. This is designed to process new data for any matches, update existing matches, or create new match IDs for new records. AWS Entity Resolution is designed to hash the attributes you provide for data protection and retrieve or create the corresponding entity match ID to link and match the customer. This enables you to match records.
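
The hash-then-retrieve-or-create flow described above can be sketched with an in-memory index. Everything in this example is an assumption made for illustration: the MatchIdStore class, the use of SHA-256, the canonical string format, and the UUID-based match IDs are hypothetical and do not describe how AWS Entity Resolution hashes attributes or generates match IDs.

```python
import hashlib
import uuid


def hash_attributes(attributes: dict) -> str:
    """Hash normalized attributes into a stable lookup key (SHA-256 is an assumption)."""
    canonical = "|".join(
        f"{key}={attributes[key].strip().lower()}" for key in sorted(attributes)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MatchIdStore:
    """In-memory stand-in for a match ID index; hypothetical, for illustration only."""

    def __init__(self):
        self._index = {}

    def lookup(self, attributes: dict):
        """Return the existing match ID, or None if these attributes are unknown."""
        return self._index.get(hash_attributes(attributes))

    def get_or_create(self, attributes: dict) -> str:
        """Return the existing match ID, or mint and store a new one."""
        key = hash_attributes(attributes)
        if key not in self._index:
            self._index[key] = uuid.uuid4().hex
        return self._index[key]


store = MatchIdStore()
first = store.get_or_create({"email": "Jane@Example.com", "phone": "5550100"})
again = store.get_or_create({"phone": "5550100", "email": " jane@example.com "})
missing = store.lookup({"email": "sam@example.com", "phone": "5550111"})
```

Because only the hash is stored as the key, the index never holds the raw attribute values. The second call returns the same match ID as the first, since the attributes are equal after normalization, while the lookup for an unknown customer returns nothing.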

Rule-based fuzzy matching

For rule-based matching scenarios, you can apply advanced fuzzy-matching techniques based on various algorithms. These algorithms can be used to set similarity, distance, and phonetic thresholds on string fields to match records. Using the AWS Entity Resolution Create Matching Workflow API, AWS Management Console, or CLI, you can combine multiple algorithms using logical operators to customize your rules. This is designed to process data for any relevant matches, based on the order of the rules you configure with the fuzzy algorithms you select. This enables you to match approximate records containing variations, typos, and inconsistencies.
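
As an illustration of combining distance and similarity thresholds with logical operators, the sketch below uses Levenshtein edit distance and the similarity ratio from Python's standard difflib module. The algorithms, thresholds, field names, and the rule itself are assumptions chosen for the example; the text above does not specify which algorithms the service offers or how its rules are expressed.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] from difflib; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def is_fuzzy_match(left: dict, right: dict) -> bool:
    """Hypothetical rule: (name within 2 edits AND city similar) OR same non-empty email."""
    name_close = levenshtein(left["name"], right["name"]) <= 2
    city_close = similarity(left["city"], right["city"]) >= 0.8
    same_email = left["email"] != "" and left["email"] == right["email"]
    return (name_close and city_close) or same_email


a = {"name": "jon smith", "city": "seattle", "email": ""}
b = {"name": "john smyth", "city": "seatle", "email": ""}
c = {"name": "maria garcia", "city": "austin", "email": ""}

typo_match = is_fuzzy_match(a, b)
different = is_fuzzy_match(a, c)
```

Records a and b differ only by small typos in the name and city, so the hypothetical rule treats them as a match, while a and c do not match. Tightening or loosening the two thresholds trades missed matches against false matches.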

Lookup

Looking up entity match IDs through the AWS Entity Resolution Get Match ID API helps you retrieve an existing match ID. You can call AWS Entity Resolution with personally identifiable information (PII) attributes acquired through multiple sources and channels. AWS Entity Resolution is designed to hash those attributes and retrieve the corresponding entity match ID to link and match the customer.

Data protection and regionalization by design

AWS Entity Resolution offers an encryption capability that can help you protect your data and help provide you with an encryption key for data input into the service. Both AWS Entity Resolution and its data encryption capability are designed to support regionalization to where the data is processed, and they are designed to operate in the same AWS Region where you are using data for matching. Finally, you can also encrypt and hash the data output in Amazon Simple Storage Service (Amazon S3) before using your resolved data in other applications.

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.