Clean Data Is Valuable Data

by Lou DiGiacomo, VP of Product

In the AdTech industry, low match rates are a major problem that can lead to wasted marketing budget and missed opportunities. Let's dive into the basics of data hygiene, including the importance of recency, normalization, and hashing.

The (Very) Basics of Data Hygiene

Recency 

A quintessential enemy of match rates is the short lifespan of data. People get new phones, new shipping addresses, and new computers, or switch to new browsers. When they do, the data previously collected on them through cookies, IP addresses, and mobile ad IDs is dead in the water. If you’re purchasing a list of third-party data, there’s a good chance some of that data is already outdated and won’t match an active in-market shopper … and withered data means wasted marketing budget.

Normalization 

In the context of marketing and advertising, data normalization is the process of standardizing and organizing different data inputs, such as customer details, to make them consistent and easily comparable. Companies differ in their formatting rules and abbreviations for addresses and other personal information. Challenges arise when trying to match “123 Main Street” to “123 Main St” and similar formatting mismatches. Normalization is essentially a set of agreed-upon formatting rules that ensures everyone labels their data the same way. Partners who use different formatting rules are a common cause of low match rates.

Normalization is a necessary precursor to hashing, a term that has recently become common in the AdTech world.
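
As a rough illustration of what agreed-upon formatting rules can look like, here is a minimal Python sketch. The suffix table and the normalize_address function are hypothetical and exist only for this example; real partners would settle on their own, much fuller rule set.

```python
import re

# Hypothetical abbreviation table, for illustration only. A real rule set
# would be whatever the partners have agreed on.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd"}

def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, abbreviate suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", raw.lower())  # remove punctuation
    words = cleaned.split()                         # split() also collapses extra spaces
    return " ".join(SUFFIXES.get(word, word) for word in words)

print(normalize_address("123 Main Street"))  # 123 main st
print(normalize_address(" 123  MAIN St. "))  # 123 main st
```

Both spellings reduce to the same string, which is the property two partners need before they compare records.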

Hashing 

Hashing is the process of taking plain text input data of any size and irreversibly encoding it into a fixed-length string of new characters. It’s used as a safeguard for personally identifiable information. Rather than sending a list of plain text email addresses, for example, an API would send a list of hashed addresses.
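
To make the "any size in, fixed length out" idea concrete, here is a small Python sketch. It assumes SHA-256 as the hash function, which this article does not specify, and the email addresses are made up.

```python
import hashlib

def hash_value(value: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text.

    SHA-256 is an assumption made for this sketch; use whichever
    algorithm your partners have agreed on.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Made-up inputs of very different lengths.
short_hash = hash_value("a@example.com")
long_hash = hash_value("a.much.longer.address.with.many.parts@mail.example.com")

print(len(short_hash), len(long_hash))  # 64 64
```

Whatever the input length, the hexadecimal digest is always 64 characters, and the original address cannot be read back out of it.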

This added layer of security is important, but it also makes data sets difficult to match if normalization has not taken place.

Check out the following example: 

Even a simple change, like capitalizing the “s,” will lead to an entirely different hashed value. You can imagine how these challenges compound for a more complex data point, such as a postal address, where there are more opportunities for each partner to write the same value differently. So, even the slightest difference in normalization changes a hash, which results in significant disparities in match rates. 
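
The effect on match rates can be sketched in a few lines of Python. Everything here is illustrative: the email addresses are invented, SHA-256 is an assumed choice of hash, and normalize_email and match_rate are hypothetical helpers, not part of any real product.

```python
import hashlib

def sha256_hex(value: str) -> str:
    # SHA-256 is assumed here; the principle holds for other hash functions.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def normalize_email(raw: str) -> str:
    # Hypothetical minimal rule set: trim whitespace and lowercase.
    return raw.strip().lower()

def match_rate(ours: list[str], theirs: list[str], normalize: bool) -> float:
    """Share of our records whose hash also appears in the partner's list."""
    prep = normalize_email if normalize else (lambda value: value)
    our_hashes = {sha256_hex(prep(value)) for value in ours}
    their_hashes = {sha256_hex(prep(value)) for value in theirs}
    return len(our_hashes & their_hashes) / len(our_hashes)

# Invented records: the same two people, written differently by each partner.
ours = ["sam@example.com", "lee@example.com"]
theirs = ["Sam@example.com", " lee@example.com "]

print(match_rate(ours, theirs, normalize=False))  # 0.0
print(match_rate(ours, theirs, normalize=True))   # 1.0
```

Without normalization, a capital letter or a stray space is enough to make every hash differ, so nothing matches. With both sides applying the same rules before hashing, the two lists line up completely.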

Recency, normalization, and hashing are key components of data hygiene that can unlock high match rates in the AdTech industry, and we see the proof every day. At fullthrottle.ai™, our strict data hygiene standards for ourselves and our partners allow us to deliver an unprecedented 90% average match rate for clients. Beyond practicing great data hygiene, you should also diversify your data.

 


To read more about match rates, check out the articles below:

How To Reach More of Your Audience With Higher Match Rates

The Illusion of First-Party Data Control in Walled Gardens

Unlocking High Match Rates in Data-Driven Marketing