Entity Resolution (and how GOST does it differently)
by Kristina Drye, on Mar 17, 2021 8:35:52 AM
Entity resolution is one of the most important capabilities of a risk- or security-oriented organization, yet it is often misunderstood. In this blog post, we will define entity resolution, explain the risks that accompany poor entity resolution, and show how GOST does it differently.
In the real world, an entity is anything tangible (think of what usually constitutes a noun: people, places, and things are all technically “entities”). In the risk and security world, entities usually refer to people or companies. In data science and computing, entity resolution (ER) is “the task of disambiguating records that correspond to real-world entities across and within datasets” (Source).
These two definitions give us the two important parts of ER: the process of entity resolution and its outcome. The process is the work a professional does to learn about a real-world entity, which means finding information online about the entity in question. The outcome is the degree to which the information they find actually corresponds to that entity.
A successful entity resolution capability is essential to any screening program, primarily because of what are known as false positives and false negatives, illustrated in the diagram below.
As you can see, any decision has two true outcomes and two false outcomes. The true outcomes are “true positive” and “true negative”: the computer either correctly matched information to a person (true positive), or correctly indicated that the information it pulled does not match that person (true negative). The problem lies with the false outcomes, a “false positive” or a “false negative”. A false positive is when the computer incorrectly matches information to a person, and a false negative is when the computer fails to identify a match that actually exists.
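To make the four outcomes concrete, here is a minimal Python sketch that labels screening decisions and tallies them. It assumes each decision can be reduced to a pair of booleans (did the system report a match, and is there actually a match); the function names are hypothetical and are not part of GOST.

```python
from collections import Counter


def classify_outcome(predicted_match: bool, actual_match: bool) -> str:
    """Label one screening decision with one of the four possible outcomes."""
    if predicted_match and actual_match:
        return "true positive"   # correctly matched information to the person
    if predicted_match and not actual_match:
        return "false positive"  # matched information to the wrong person
    if not predicted_match and actual_match:
        return "false negative"  # missed a match that really exists
    return "true negative"       # correctly reported no match


def tally(decisions):
    """Count outcomes over (predicted_match, actual_match) pairs."""
    return Counter(classify_outcome(p, a) for p, a in decisions)


# Hypothetical example: four screening decisions, one of each outcome.
decisions = [(True, True), (True, False), (False, True), (False, False)]
counts = tally(decisions)
print(counts["false positive"], counts["false negative"])  # prints "1 1"
```

Reducing false positives and false negatives, in these terms, means shrinking the two "false" counts while the two "true" counts grow.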
Both of these situations can be problematic for a risk department or security institution. In the case of a false positive, an incorrect identification can cause problems both for the entity identified and for the institution that incorrectly took action. In the case of a false negative, the institution misses a risk or a threat, which can then continue to engage in illicit activity or negative behavior.
To make more precise and accurate entity resolution assessments, professionals are always on the hunt for tools that reduce false positives and false negatives. GOST performs entity resolution at two levels: the extraction of information and the aggregation of that information. Because of this, GOST’s patented approach to entity resolution is unmatched in the current screening market.
The first level, the extraction and organization of information, is content-level reliability. At the content level, GOST creates concepts and measures the conceptual distance between the entity as searched and the entity in the data GOST returns. By a concept, we mean a behavior in question. For example, we might model the behaviors associated with human traffickers, so we know how human traffickers behave and how people who are not human traffickers behave. The same approach can be repeated for any behavior: money laundering, fraud, terrorism, or corruption, for example.
It’s helpful to think of this as a simple thought experiment with colors. What if someone handed you the image below and asked you to decide whether it was yellow or green? What would you decide?
You have two choices, green or yellow:
The color you were given (chartreuse) looks a bit like both: a little yellow, a little green. But we have to choose which one it is most like. To do this, we measure the distance between the two concepts, or colors. What we see here is that chartreuse is closer to green than it is to yellow. That is the “conceptual distance” GOST measures, except that instead of colors, the concepts are behaviors. The actual process is more complex, but this is a simple way to grasp what is happening.
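The color thought experiment can be sketched in a few lines of Python. This is a toy illustration, not GOST’s method: it assumes each color is an RGB triple (pure yellow #FFFF00, pure green #00FF00, and CSS chartreuse #7FFF00) and uses plain Euclidean distance as the stand-in for “conceptual distance”.

```python
import math

# Toy "concepts" (assumption): each is a single reference point in RGB space.
CONCEPTS = {
    "yellow": (255, 255, 0),  # #FFFF00
    "green": (0, 255, 0),     # #00FF00
}

# The sample we must classify: CSS chartreuse, #7FFF00.
chartreuse = (127, 255, 0)


def nearest_concept(sample, concepts):
    """Return the closest concept name and the distance to every concept."""
    distances = {name: math.dist(sample, ref) for name, ref in concepts.items()}
    best = min(distances, key=distances.get)
    return best, distances


best, distances = nearest_concept(chartreuse, CONCEPTS)
print(best)  # prints "green"
```

Note how narrow the margin is in this toy space (a distance of 127 to green versus 128 to yellow). Borderline cases like this are where the choice of representation and distance measure matters most.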
This process of concepts and measurement lets the GOST user see content rank-ordered by how closely it relates to the searched entity. Because GOST indexes the open and deep webs and also includes any data the user provides, the body of information is very large, which supports a more precise measurement.
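A rank-ordered list like this can be sketched as sorting content by its similarity to the searched entity. The sketch below assumes that both the entity and each piece of content have already been turned into numeric feature vectors (a hypothetical step; GOST’s actual representation is not described here) and uses cosine similarity as a stand-in for conceptual distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_content(entity_vector, documents):
    """Return (doc_id, similarity) pairs, most closely related first."""
    scored = [
        (doc_id, cosine_similarity(entity_vector, vec))
        for doc_id, vec in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors for a searched entity and three documents.
entity = [1.0, 0.0, 1.0]
documents = {
    "doc-a": [1.0, 0.0, 0.9],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.5, 0.5, 0.5],
}

# Prints doc-a first, then doc-c, then doc-b.
for doc_id, score in rank_content(entity, documents):
    print(doc_id, round(score, 3))
```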
The second level, the aggregation of that information into a top-level score that differentiates between entities, is known as entity-level reliability, or ELR. ELR represents the likelihood that the searched entity does in fact match across the body of information.
To produce this reliability score, GOST’s algorithms aggregate the content-level reliability to indicate whether the GOST user should expect highly resolved information to be available in the respective entity’s search results. Reliability scores across a set of data let the user rank-order entities by their level of entity resolution: which segment of my entire dataset is highly resolved, so that I should look at it first? Alternatively, which segment of my entire dataset is not highly resolved, and might take longer or be a less productive use of my time?
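The aggregation and triage described here might look something like the following. Everything in this sketch is an assumption for illustration: averaging each entity’s top three content-level scores, the 0.7 cutoff for “highly resolved”, and the function names are all hypothetical and do not describe GOST’s proprietary algorithms.

```python
from statistics import mean


def entity_reliability(content_scores, top_k=3):
    """Hypothetical aggregation: mean of the top_k content-level scores (0..1)."""
    if not content_scores:
        return 0.0
    top = sorted(content_scores, reverse=True)[:top_k]
    return mean(top)


def triage(entities, threshold=0.7):
    """Rank entities by aggregated score and split them at a hypothetical cutoff."""
    scored = {name: entity_reliability(scores) for name, scores in entities.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    high = [name for name in ranked if scored[name] >= threshold]  # review first
    low = [name for name in ranked if scored[name] < threshold]    # likely slower
    return ranked, high, low


# Hypothetical content-level scores for three searched entities.
entities = {
    "Entity A": [0.95, 0.9, 0.85, 0.2],
    "Entity B": [0.6, 0.4, 0.3],
    "Entity C": [0.8, 0.75],
}
ranked, high, low = triage(entities)
print(high)  # prints "['Entity A', 'Entity C']"
```

Averaging only the strongest results is one simple design choice: it keeps a handful of weak, unrelated hits from dragging down an entity that is otherwise well resolved.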
This two-level entity resolution process, which keeps improving iteratively through its machine learning capabilities, is more effective than other industry-standard tools. It helps your team reduce false positives and false negatives while increasing what matters: true positives and true negatives. With GOST, your team can screen out the bad guys while letting everyone else transact seamlessly.