Machine learning can be used for a lot more than just RPA, virtual assistants, and autonomous vehicles. We can also use machine learning to identify duplicates in your Salesforce environment. In this article, we will take a look at how machine learning algorithms can be trained to dedupe Salesforce.
How Does Machine Learning Match Two Records?
When a human looks at two records, they can usually tell at a glance whether the records are duplicates. For example, consider the pair below:
First Name | Last Name | Address |
Boris | johnson | Downing St. 10 |
Boris | Johnson | 10 Downing Street |
While it may be pretty obvious that the records above are duplicates, it can be challenging to explain exactly why. You might start by pointing out the similarities: the first name, the last name, and the street address. This is a good first step, but you would then need to stipulate exactly what you mean by “similar.” How do you measure such similarities, and how would a machine learning system go about identifying them? One common approach is to use string metrics. There are many string metrics, such as the Hamming distance, which counts the number of substitutions required to turn one string into another string of the same length. In the example above, it takes only one substitution to turn “johnson” into “Johnson”; therefore, the Hamming distance is 1. There are many other string metrics, and they all give an ML system a numerical way to express how similar two strings of data are.
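To make the Hamming distance concrete, here is a minimal Python sketch. The function name `hamming_distance` is our own choice for illustration and does not come from any particular deduplication product.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # The Hamming distance is only defined for strings of equal length.
        raise ValueError("strings must have the same length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


print(hamming_distance("johnson", "Johnson"))  # 1
```

Because the two address values in the example have different lengths, the Hamming distance cannot compare them directly; a metric that also allows insertions and deletions, such as the Levenshtein edit distance, would be a better fit for that field.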
How Can Machine Learning Be Used to Dedupe Your Salesforce?
There are several ways you can look at Salesforce records. One way is as a block of text:
Record 1 | Record 2 |
Boris johnson Downing St. 10 | Boris Johnson 10 Downing Street |
Another option is to compare each field independently:
Field | Record 1 | Record 2 |
First Name | Boris | Boris |
Last Name | johnson | Johnson |
Address | Downing St. 10 | 10 Downing Street |
The single-block approach is not very convenient because it does not allow you to place emphasis on a particular field. The field-by-field approach is much more effective since it allows you to place a specific weight on each field, with the most important field having the highest weight. Salesforce deduping tools that use this sort of technology allow you to set weights for each field and create a model, so that the approach can be codified and reused in any comparison.
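The field-by-field idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights in `FIELD_WEIGHTS` are made up, the field names are hypothetical, and the per-field similarity uses `difflib.SequenceMatcher` from the Python standard library rather than whatever metric a real deduping tool might use.

```python
from difflib import SequenceMatcher

# Hypothetical weights, chosen only for illustration.
FIELD_WEIGHTS = {"first_name": 0.2, "last_name": 0.5, "address": 0.3}


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec1: dict, rec2: dict, weights: dict) -> float:
    """Weighted average of the per-field similarities."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * field_similarity(rec1[field], rec2[field])
        for field, weight in weights.items()
    )
    return weighted_sum / total_weight


record_1 = {"first_name": "Boris", "last_name": "johnson", "address": "Downing St. 10"}
record_2 = {"first_name": "Boris", "last_name": "Johnson", "address": "10 Downing Street"}

score = record_similarity(record_1, record_2, FIELD_WEIGHTS)
print(f"Weighted similarity: {score:.2f}")
```

A pair would then be flagged as a likely duplicate when the score exceeds some threshold. Raising the weight on a field makes disagreements in that field count for more in the final score.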
Why Use Machine Learning to Dedupe Salesforce?
When it comes to deduplication, every company’s dataset is unique and comes with its own set of issues. When one of your team members marks a set of records as duplicates (or as unique), the system learns from that decision and adjusts the algorithm so it can identify future duplicates without human intervention. This is called “active learning,” and it allows the system to continuously adjust the weights assigned to each field based on users’ decisions, thereby improving duplicate detection.
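One way such a feedback loop could work is sketched below. It rests on assumptions that are ours rather than a description of any specific product: each candidate pair is represented by three per-field similarity scores, the model is a simple logistic function of those scores, the pair shown to the reviewer is the one the model is least sure about (a strategy often called uncertainty sampling), and all numbers are invented.

```python
import math


def predict(weights, bias, features):
    """Probability that a pair is a duplicate, given its per-field similarities."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def most_uncertain(weights, bias, candidates):
    """Pick the candidate pair whose predicted probability is closest to 0.5."""
    return min(candidates, key=lambda feats: abs(predict(weights, bias, feats) - 0.5))


def update(weights, bias, features, label, learning_rate=0.5):
    """Take one gradient step on the log loss for a single user-labelled pair."""
    error = predict(weights, bias, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias - learning_rate * error


# Hypothetical starting model and similarity scores
# (first name, last name, address) for three unlabelled pairs.
weights, bias = [1.0, 2.0, 1.0], -2.0
candidates = [(1.0, 1.0, 0.6), (0.2, 0.1, 0.3), (1.0, 0.4, 0.9)]

pair = most_uncertain(weights, bias, candidates)
before = predict(weights, bias, pair)

# Suppose the reviewer confirms that this pair is a duplicate (label 1).
weights, bias = update(weights, bias, pair, label=1)
after = predict(weights, bias, pair)
print(pair, round(before, 3), round(after, 3))
```

After the reviewer's decision is fed back, the model's estimate for that pair moves toward the label, and the adjusted weights carry over to every future comparison.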
It is important to note that setting the field weights accurately by hand is difficult. For example, is the Last Name field twice as important as the First Name field, or 1.7 times as important? For a human, making such a calculation is impractical. Computers powered by machine learning, on the other hand, can process very large amounts of data quickly and efficiently, limited mainly by the available computing power. These algorithms can calculate accurate weights for each field using a technique known as regularized logistic regression.
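As a rough illustration of how field weights could be learned rather than guessed, the sketch below fits an L2-regularized logistic regression with plain gradient descent. The training pairs, their labels, and the hyperparameters (`l2`, `learning_rate`, `epochs`) are all invented for this example; a production system would more likely rely on an established library and far more labelled data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, l2=0.01, learning_rate=0.5, epochs=2000):
    """Fit field weights by gradient descent on the L2-regularized log loss."""
    n_fields = len(pairs[0])
    n = len(pairs)
    weights, bias = [0.0] * n_fields, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_fields, 0.0
        for features, label in zip(pairs, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # The l2 term shrinks the weights toward zero; the bias is not penalized.
        weights = [w - learning_rate * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up similarity scores (first name, last name, address) for six record
# pairs, with labels assigned by a reviewer: 1 = duplicate, 0 = not a duplicate.
pairs = [
    (1.0, 1.0, 0.7), (1.0, 0.9, 0.8), (0.9, 1.0, 0.6),
    (1.0, 0.2, 0.1), (0.3, 0.1, 0.2), (0.8, 0.3, 0.3),
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(pairs, labels)
print("Learned field weights:", [round(w, 2) for w in weights])
```

The regularization strength `l2` keeps any single weight from growing without bound when the labelled examples are few, which is typically the case when the labels come from manual review.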
The Added Value of Deduping With Machine Learning
If we take a look at the popular deduping apps on the AppExchange, we see that they are rule-based. This means that every time a new kind of duplicate is detected, your Salesforce admin needs to create an additional rule to prevent it from recurring. Not only is this time-consuming and unsustainable, but it is also nearly impossible to account for every possible type of “fuzzy” duplicate. You can try to set the weightings for each field yourself or use other metrics to catch the duplicates, but that is also slow and will not catch every issue. A machine learning system does this work for you because it simulates the human thought process when comparing records, saving you a lot of time and hassle.
Consider Using Machine Learning to Dedupe Your Salesforce
There are many other advantages to using this type of artificial intelligence (AI). The algorithm is fully customizable, and there is no need for a complicated setup process. Remember, if you are using a tool that relies on complex rules, someone needs to set up the rules and then maintain them. A machine learning tool eliminates this effort completely, allowing you to simply download the product and start using it right away.