Data science drives big decisions at Kohl’s
The American department store chain modernized its data strategy around Google BigQuery, bringing in third-party data sets and algorithms to further hone its personalization and merchandising efforts.

Long before the advent of customer data platforms (CDPs), Kohl’s business model centered on collecting and cultivating customer data.

“We’ve had a homegrown customer data environment for decades,” says Paul Gaffney, chief technology and supply chain officer at the $19.4 billion American department store chain. “And we’re quite happy with our custom implementation.”

The Milwaukee, Wis.-based retailer originally built its homegrown on-premises CDP on Netezza, creating robust customer profiles based on the chain’s large credit card portfolio and “a historical approach to cultivating customer loyalty and attachment that is very personalized,” Gaffney says.

But for the past several years, Kohl’s has made a big push to the cloud as part of a “technology modernization” that Gaffney says makes the most of machine learning, personalization, enhanced demographic data sets, and “hyper-localization” insights to deliver the most relevant merchandise to local stores.

The transformation sees the retailer, which is currently up for sale, running workloads on Google Cloud Platform and on private on-premises Google Cloud servers running VMware, as well as some utility workloads on Amazon Web Services, the CTO says. While the company’s current on-premises cloud uses a comprehensive suite of tools, including Qlik for advanced analytics and data visualization, Kohl’s long-term plan for data is all about Google BigQuery, Gaffney says.

“Four years ago, we started focusing on BigQuery as our primary data environment,” a decision Gaffney says he inherited. Kohl’s has since built a sophisticated data science practice around the Google platform, with most of the retailer’s critical data, including customer, product, and business performance views, now residing in that modernized data environment.

But Gaffney is far from finished.

“We’ve got about two more years to go to get to a place where I would describe us as a fully data-native organization, using automated decision processes instead of using data just augmenting human decision processes,” says Gaffney.

Key to that push is a strategy to make the most of machine learning and third-party data in service of customer personalization and the “hyper-localization” of merchandising decisions, Gaffney says.

The power of third-party data

Kohl’s, which employs 1,000 people in its IT organization, including 50 data scientists, started its data automation push 18 months ago. The chain is currently migrating its ample collection of first-party customer data, along with licensed third-party data sets, to BigQuery so it can apply advanced machine learning models and enhanced personalization technology to bolster sales, Gaffney says.

Like many retailers, Kohl’s also uses publicly available machine learning models on the Google platform and has used Google’s Vertex AI platform. The retailer also licensed a data set called Demand Brain from Deloitte, focused on understanding and forecasting consumer demand, says Gaffney, explaining that all the big consulting firms have data subscription products and ML engines available for licensing.

Gartner analyst Erick Brethenoux says use of consultant data and ML models is gaining steam, especially among retailers.

“Many organizations employ third parties to build models for them,” Brethenoux says, noting that consulting firms also use third-party data sets to pre-build models to embed in client systems or, in rare cases, use both their own technology and their own data to build models for retailers and other clients.

Kohl’s, for example, has licensed a platform from Deloitte called InSightIQ and is working with another partner, Acxiom, to enhance its first-party data with other data sets. Working with partners is essential for distinguishing which data signals are useful and which are noise, Gaffney says.

“One of the most interesting things in the technology landscape right now is the proliferation of these syndicated third-party data sets,” he says.

For example, Kohl’s uses a combination of customer spending algorithms to predict the next best offer for a customer based on that customer’s recent purchases. Much of that is based on first-party data about Kohl’s customers online and in stores. But now, to learn more about its loyal customers, Kohl’s can use licensed third-party data sets to gain valuable information about a customer’s employment or recreational activities, for example.

“We’ve started augmenting first-party data with third-party data to determine what type of job they do when they’re not shopping and that has an impact on the footwear we should offer them, and that’s only one example out of dozens,” says Gaffney, adding that the investment community has been utilizing third-party data sets for many years, while the general business community is in the early days of putting them to use.

“In the past six months, we’ve started adding, alongside these deterministic non-learning algorithms, new machine learning models to help us get more precise about the kinds of offers we should make [to shoppers], who we should make them to, and when we should be making them,” he says.

Gaffney sees nothing but opportunity in the personalization space. “We’ve been very effective at using data science to better target our historical marketing campaigns,” the CTO says. “I think we’re no more than six months away from shifting away from a campaign-based approach to a truly personalized approach and another good three years to five years of continuous improvement.”

Better decisions with data

With its modernized CDP and personalization strategy fully in place, Kohl’s could be poised to make other, larger business moves. For example, Kohl’s tapped into its customer data to form a marketing partnership with cosmetics giant Sephora, with a goal of building a $2 billion beauty business. Kohl’s will have Sephora shops in 850 of its 1,100-plus stores by 2023, according to Kohl’s officials.

For Gaffney, hyper-localization is among the most “exciting” applications of third-party data. One goal, he explains, is to apply machine learning to a mix of first- and third-party data to make highly targeted merchandising decisions and to determine where to open stores based on a matrix of thousands of data points.

This could prove valuable as the company plans to add 100 new small-format stores to its fleet of department stores over the next four years. In decades past, using solely its own customer data, Kohl’s would offer an identical assortment of products in each store based on core demographic data such as income, demand information, local competition, and neighborhood ethnicity. Just two years ago, by applying third-party data sets in addition to its first-party data, Kohl’s was able to generate roughly 35 different footwear assortments for various stores based on population, weather, and other third-party data, Gaffney says.

And that number has exploded as the number of machine learning models and third-party data sets has increased. “We now have a matrix that’s about 1,500 cells instead of just 35,” the CTO says. “That’s what’s next: … build on this underlying paradigm to find better data and use better data science to make the data more granular and therefore make more effective decisions.”
