They all say, “Less false positives guaranteed!”

by Lauralee Dhabhar, on Mar 9, 2023 7:00:00 AM

Determining the best adverse media or sanctions screening tool for your organization is a complex task. Beyond the varying lists of features and cost considerations are a host of ambiguous differentiators. Every website, every salesperson promises the same thing: “We are different. We will deliver fewer false positives.”

How? Compared to what?

NLP resize

To answer the first question, virtually all vendors today will point to their tool’s machine-learning component and tell you they use natural language processing (NLP). This is fair. Even the most basic set of algorithms is going to provide better matches than nothing at all. However, there is a very wide gulf between rules-based systems and the highly sophisticated named entity recognition capabilities of a tool like GOST. To determine which product will serve you with the most accurate information available, you must understand the capabilities and limitations of the machine-learning processes driving each solution.

What is natural language processing?

NLP is not new. The idea that a machine could be created to understand the nuances and structure of human languages originated in the 1940s. Since this time, computer scientists have studied the problem of language from a variety of angles, developing basic rules-based systems in the 1950s and moving forward in the 1980s and 1990s to empirical and probability-based models. Today, you can find NLP at work in all of its forms from the first algorithms of its infancy to the near-human brain performance of some of the latest tools on the market.

In order to understand the capabilities of the machine-learning models contained within an adverse media or sanctions solution, you need to first understand the type of NLP employed. Three common terms you will run across while conducting your solutions search are fuzzy matching, rules-based matching, and named entity recognition.

Fuzzy matching is used in most commercial search engines. It identifies similar but not identical data strings. When employed in an adverse media tool, fuzzy matching will return media sources about anyone with similar identifiers. It may return some exact matches, but it will also provide you with a large amount of irrelevant information. Tools using fuzzy matching logic will always have high false positive rates and rely upon significant manual labor to clear risk alerts.

Rules-based systems go a step beyond fuzzy matches by using rules to categorize the language under analysis. Therefore, not only will a rules-based system return exact matches to your query, but it will use a library of language rules to determine if similar or related results also qualify as matches.

For instance, perhaps you are onboarding a new client named John Smith living in Anytown, Nebraska. His birthday is April 1, 1997. A rules-based system could consider all the following probable matches depending on the rules in its library.

NLP Computer

While the rules-based approach reduces the occurrence of false positives in comparison to fuzzy matching, it will only be as good as the set of rules governing its performance. This leaves room for a large variation in tool efficiency and effectiveness.

Named Entity Recognition (NER) is an NLP technique that identifies entities within a text and classifies them into predefined categories. An entity in this case does not necessarily refer to an individual. NER models can identify organizations, locations, times, monetary values, behaviors of interest, and more. Tools like GOST leverage the power of NER to navigate unstructured data. Eighty percent of available data in the world is in an unstructured format; meaning that this information isn’t organized in an easily accessible format such as lists and charts. It is messy, difficult to categorize, and so abundant that a human could not possibly make use of this information without the help of advanced technology. Imagine trying to read, organize, and understand the entire internet.

This ability to understand data regardless of its format is what differentiates NER from the less sophisticated NLP tools mentioned above. While NER is able to find meaning and provide context in any environment, fuzzy matching and rules-based matching only operate well on pre-categorized and structured data. Adverse media and sanctions screening tools employing NER will deliver the most accurate results with the least false positives assuming the machine-learning models are ethically and appropriately trained.

The most advanced named entity recognition models employ neural networks known as transformers to enable the study of data from many different perspectives. An NER tool built on transformers behaves much like an investigator interviewing witnesses at a crime scene. Every witness has a different perspective of the event. The investigator must merge all the variations to obtain the most accurate picture of the occurrence. A transformer-guided NER tool does the same with language in order to understand the context of the words.

GOST and its many components are made of many machine learning models including some of the most advanced examples of NER on the market including GONER, which you can read more about here.

With this knowledge, you can now determine how effective a platform is likely to be in reducing your false positive rate. However, a vendor still needs to be able to provide a comparison for their reduction claim. How many times have you visited a product website and read a claim of an 80% reduction in false positives? 50%? It is difficult to determine the value of a specific platform without a basis for comparison.

Fortunately, at Giant Oak, we had the opportunity to run a head-to-head test at a top 5 enterprise-level financial institution between their incumbent system, a well-respected organization in the adverse media and watchlisting industry utilizing rules-based matching, and GOST. For this study, 3.4 million files were screened through both systems. The incumbent system produced one true positive for every 32 false positives. GOST returned one true positive for every 17 false positives. In the same population, GOST discovered 83% more actionable instances of risk while providing a 47% reduction in false positives. This evidence is why GOST is now the institution’s incumbent technology for adverse media screening.

Advanced technology does not have to represent an unknown risk to your risk and compliance program. Ensure you have confidence in coverage by understanding what makes the technology work and asking for proof.

They all say, “Less false positives guaranteed!”

Giant Oak Newsletter

Recent Posts