The secret to successful citizen data science programs: Good governance
Putting data and visualization tools in the hands of business people is one thing; getting IT out of the business of running reports is quite another.

What’s one way to get CIOs griping and venting about their data strategies? Ask them how successfully they get their business users to migrate off their mega spreadsheets and onto data visualization and other self-service business intelligence platforms.

Then, ask chief data officers (CDOs) how hard it is to lead data governance programs that include more support for citizen data scientists who want to integrate, prep, analyze, and share insights over a growing number of data sets.

I ran a workshop at CIO’s recent Future of Work Summit on governing citizen development programs that leverage no-code and low-code platforms. I elected to focus on citizen data science, knowing that many CIOs and CDOs look for advice to build data governance into these programs. After writing two articles for InfoWorld, one on how spreadsheets are killing your business and another on replacing spreadsheets with business workflows, I was anxious to hear the challenges from the IT and data leaders in attendance.

Full disclosure, I know a thing or two about developing center of excellence programs in citizen data science and rolled out my first programs as a CIO over a decade ago. I share some of the stories and lessons in my new book, Digital Trailblazer, in the chapter on “Buried in bad data.”

Survey says!

I ran a quick survey during the workshop to get a sense of attendees’ challenges and perceptions around citizen data science. Although the sample size of 60 respondents is too small to support firm conclusions, the survey suggests that these IT leaders are still in the early stages of rolling out citizen data science programs:

  • When asked to pick the top two ways business departments typically view data, respondents pointed to spreadsheets they develop themselves (53%) and automated reports managed by IT and data teams (43%). Forty-three percent said self-service BI was among the top ways business departments view data, but just 13% of them said their self-service BI had strong governance.
  • The group reported that the functions with the most to gain from data analytics, and the least served by it, are customer experience (35%) and product development (28%).
  • One question asked for the top three challenges in getting business, data specialists, and technologists to collaborate around data-driven practices. The top answer (reported by 40% of respondents) was that business leaders just want IT to fix the data and deliver reports.

Data visualization and prep tools went mainstream ten years ago, so this apparent lack of progress is far from encouraging. To get things moving in the right direction, IT and data leaders must ramp up data governance programs that support citizen data science efforts.

Turn compliance risks into citizen data science force multipliers

The problem with spreadsheets is that they were rolled out to business users well before there were data governance practices. Business analysts downloaded data sets, created multiple spreadsheets, and emailed them to colleagues. Today, swap spreadsheets for your favorite data visualization tools and, if they are left ungoverned, you could end up with even bigger problems.

Problems include:

  • Sharing private and confidential information and creating compliance risks;
  • Leaking information to unauthorized people outside of the organization;
  • Misunderstanding data definitions and making wrong decisions based on assumptions;
  • Sharing analytics and insights without testing the algorithms and validating results;
  • Building visualizations without standards or style guidelines, thus making it more difficult for employees to understand the results.

Of course, today the risks are magnified because most enterprises analyze big data sets, use multiple analytics tools, and develop custom code for proprietary machine learning models. Analytic models are used across the organization for revenue-generating activities and operational efficiencies, and mistakes can be costly. Data governance aims to address the compliance requirements, knowledge gaps, and data quality goals that can turn risk into an accelerating force in citizen data science programs.

Where to start with proactive data governance

The primary drivers behind many data governance programs are compliance and security requirements, but proactive data governance aims to achieve those objectives while also enabling the data-driven organization. These programs define transparent data access and usage policies so that it’s clear who can use what data sets for their analysis. Data catalogs are updated whenever an analysis or visualization includes new formulas, segments, and other parameterizations. There are ongoing efforts to reduce data debt, improve data quality, and automate data integrations. Dashboards, analytics, and machine learning models are versioned and have a support lifecycle defined.
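For teams that want to make "who can use what data sets" concrete, here is a minimal Python sketch of a deny-by-default access check. Everything in it is an assumption for illustration: the `DatasetPolicy` record, the `can_use` function, the role and data set names, and the `pii_trained` flag are hypothetical and not taken from any particular governance or catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical policy record for one data set in a data catalog."""
    dataset: str
    allowed_roles: frozenset
    contains_pii: bool = False


# Illustrative policies only; data set and role names are made up.
POLICIES = {
    "sales_pipeline": DatasetPolicy(
        "sales_pipeline", frozenset({"sales_analyst", "finance_analyst"})
    ),
    "customer_profiles": DatasetPolicy(
        "customer_profiles", frozenset({"cx_analyst"}), contains_pii=True
    ),
}


def can_use(role: str, dataset: str, pii_trained: bool = False) -> bool:
    """Return True only if the role is explicitly allowed to use the data set."""
    policy = POLICIES.get(dataset)
    if policy is None:
        return False  # uncataloged data sets are denied by default
    if role not in policy.allowed_roles:
        return False
    if policy.contains_pii and not pii_trained:
        return False  # assumed rule: PII data requires completed training
    return True
```

Under these assumptions, `can_use("sales_analyst", "sales_pipeline")` is allowed, while a request for a data set that is not in the catalog is refused. The point of the design is transparency: the rules live in one readable place rather than in individual workbooks or dashboards.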

Fail or fall behind in creating these data governance practices, and this generation of citizen data science analytics will look just as bad as last decade’s mega spreadsheets.
