The plight of the data team

There exists a never-ending quest for knowledge in organizations, with data teams drowning in quick questions while chasing the elusive dream of self-service. Our current systems for distributing and accessing information aren't working. This piece explores why, and what we might want to try instead.

I have a core thesis that people ask data questions just because. No actionable insights were generated, no needles were moved, and no low-hanging fruit was grabbed. Just because. What starts as existential questions in childhood only develops further in adulthood; the difference is that one is about astronomy and the other is about, well, literally everything.

This is not news to anyone with “data” in their job title. The countless ad-hoc questions create busy work that is undoubtedly disconnected from future outcomes of the business. Early in my career, I spent entire days in BI tools, jumping from simple questions like “how much did we spend on Facebook last week?” to complex quests like “why do different customer segments use our app during different times of the day?”. While it was a great learning experience, I find it hard to believe that addressing those questions led to increased revenue or lower costs for the business.

TFW you answer a tough ad-hoc request.

Generally speaking, there are two ways to handle these requests.

Option A: succumb to the ask. Put your tail between your legs, look into the “quick question”, and get on with your actual work. Unfortunately, what first seems like a five-minute question takes up half of your day because the field you thought existed in the LookML model does not. Further, there’s a “Great! What about [enter additional dimension]…” follow-up that inevitably leads to more digging. You’re frustrated, but at least your stakeholder is happy.

Option B: push back and encourage self-service. Question the urgency of the ask while guiding them to the right dataset(s) with “Can you give this a try yourself and let us know how it goes?”. Inevitably, when that does not work, ask that they document the request in a JIRA ticket. Explain that while your roadmap is flexible, you promised an additional ARR breakout for your CRO to present to the board, so this one will have to wait. Review those backlogged tickets bi-weekly, never get to most of them, and archive the ticket when the stakeholder leaves the company. Your stakeholder is frustrated, but at least you kept your sanity.

I’ve spent what feels like a third of my life grappling with some version of Option B, perpetuating the idea that we can build a system that solves the knowledge needs of the organization: a system in which a team of people is responsible for sourcing, preparing, and delivering all of the information that any one individual within the organization might possibly ask for at any given time, while ensuring that the system remains accessible to the majority of the organization. Are we, the collective data people, chasing the impossible?

Out in the wild

In the recent past, one of my team’s main responsibilities was to support client success managers (CSMs) in delivering sector trends and scouting potential partners for our clients. In my experience, external conversations around data are often no different than internal conversations; the only difference is that externally facing teams are less equipped to handle these questions.

Conversations between CSMs and their clients typically followed this pattern:

Client: “Wow, this is great. Can we split [chart A] by geography?”

CSM: “Hmm, this isn’t something we’ve looked into, but we can check with our data team and get back to you!”

CSM (to us): “Hey, can you help us with this?”

Data team: “Why is it important? How will the client use this information, or what decision(s) will the client make from knowing this information?”

CSM: *blank stare*

While this may seem like a knock against the CSMs, it’s not.

For one, managing clients is a multivariate job that involves onboarding, report building, and synchronous and asynchronous touchpoints with numerous clients of differing needs. Their main priority is increasing customer satisfaction, sometimes at any cost (if the deal size is large enough). It’s not in their - nor the company’s - best interests to constantly push back on these types of questions.

Secondly, who is to say that addressing all client (read: stakeholder) needs is not the most prudent course for the business? At worst, clients are building context around a problem they need to solve. At best, a direct decision is made as a result of this information. Either way, they will remember our organization as the partner who helped them solve it. Who are we to tell a client what is important versus what is not?

The self-service movement

Ten to fifteen years ago, BI products like Tableau, Looker, and Power BI cropped up, and a self-serve analytics movement began. The goal of self-serve analytics is to remove the data team as a bottleneck for data requests: the data team builds and maintains the system, and the rest of the organization can easily access it. In an ideal world, these two sides work in perfect harmony: there is an exact list of fields ready at the user’s disposal, flexible enough to answer any business question and meaningful enough to be understood by any user. Meanwhile, stakeholders are proficient enough in the tool to self-serve, i.e., find answers to their own questions.

In reality, we’ve witnessed that maintaining the present-day state of the business in the system is nearly impossible. SKU launches, deal nuances, and upstream schema changes throw off the sense of stability stakeholders expect in an internal data platform. Managing these changes requires a sizable team, which is particularly difficult given that most data teams are viewed as cost centers rather than revenue drivers.

Further, accessing the system is never truly easy; in fact, it requires a lot of hands-on training. Navigating your way around data tooling is a skill. There are fairly complex data concepts such as joining tables, understanding semantics (dim what?), and building tables for specific visualizations. Lastly, it doesn’t help when the data team changes tooling every two years.1

Give the people what they want

I previewed this piece to a colleague, and they asked me point blank: “what’s the answer?” Frankly, I don’t know. But, I have a few hunches on the principles that would lead to a better system.

Product-first approach

Data analysts should not be the stopgap for quick questions. This is where software can uniquely fit into the equation as a more cost-efficient way to distribute knowledge.

The future of analytics lies in creating product experiences that emulate the expertise of data analysts without requiring their constant intervention. While not a silver bullet, AI may be a crucial piece to this puzzle. However, rather than building chatbots that simply query databases, we need intelligent systems that understand business context and guide users through their analytical journey. I envision an extension of ThoughtSpot's approach, where the data team curates an analytics-ready data warehouse and provides context to an LLM, while the end stakeholder manages the analysis.2

However, this requires a paradigm shift in how data teams operate by eliminating ad-hoc request support and doubling down on:

  • Building robust data models that capture business logic.

  • Creating comprehensive metadata that helps AI understand context.

  • Developing guardrails that prevent misinterpretation of data.

  • Continuously improving the system based on user interactions.
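To make those bullets concrete, here is a minimal sketch of the first three ideas: curated model metadata rendered as LLM context, plus a crude guardrail that rejects questions the semantic layer doesn't cover. Every name here (`fct_orders`, the `Field`/`Model` classes, the keyword-match guardrail) is hypothetical; a real system would use a proper semantic layer and far smarter intent matching.

```python
from dataclasses import dataclass, field as dc_field

@dataclass
class Field:
    """One curated column: name, business definition, and stakeholder synonyms."""
    name: str
    description: str
    synonyms: list = dc_field(default_factory=list)

@dataclass
class Model:
    """One analytics-ready table with its grain documented."""
    name: str
    grain: str
    fields: list

def build_context(model: Model) -> str:
    """Render curated metadata as plain-text context for an LLM prompt."""
    lines = [f"Model: {model.name} (grain: {model.grain})"]
    for f in model.fields:
        lines.append(f"- {f.name}: {f.description}")
    return "\n".join(lines)

def covered(model: Model, question: str) -> bool:
    """Guardrail: only pass a question through if it mentions a known field
    or synonym; everything else is routed back rather than misinterpreted."""
    q = question.lower()
    terms = {f.name for f in model.fields} | {
        s for f in model.fields for s in f.synonyms
    }
    return any(t in q for t in terms)

orders = Model(
    name="fct_orders",
    grain="one row per order",
    fields=[
        Field("order_total", "Gross order value in USD", ["revenue", "spend"]),
        Field("geography", "Customer country at order time", ["country", "region"]),
    ],
)

print(covered(orders, "Can we split revenue by geography?"))  # True
print(covered(orders, "What's our churn rate by cohort?"))    # False: not modeled
```

The point of the sketch is the division of labor: the data team maintains `Model`/`Field` definitions and the guardrail, while the stakeholder only ever interacts with the system in business language.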

Empathize with your client stakeholders

Data teams need to fundamentally reimagine their relationship with stakeholders by treating them as clients rather than internal users. Using the CSM example from earlier, when clients asked questions, CSMs didn't respond with "Have you checked our documentation?" They engaged meaningfully with the request, understanding that clients 1) had neither the technical nor the domain expertise of our organization and 2) should not be expected to "figure it out".

This client-service mindset should extend to how we build and maintain data systems. In practice, this means investing in truly understanding users' workflows. What decisions are they making, and what is their skill level in getting to that decision point? This means regular shadowing sessions, feedback loops, and user testing – just like product teams do. Data teams are providing both a product and a service, and it’s time we start acting like it.

What’s next?

I’ve been heads-down on a new project, built on the problem statement above: there will always be more questions, and current data systems are ineffective in filling this gap. While not directly seeking to address the plight of the data team, it revolves around the CSM example above (niched way down) and involves AI (of course it does). More to come in a follow-up post 👀

1  The number of times BI tools change because a decision-maker is “comfortable with that one” is non-negligible.

2  Note how I use “manage” here. I don’t foresee the end stakeholder building content (as exists in self-service today), but rather interacting with the LLM as the data analyst to build content.
