Beyond (Data) Contracts: A Response to Benn Stancil

September 23, 2022

This essay by Benn Stancil provoked me so deeply that my intended “comment” evolved into a full-fledged blog post:

Fine, let’s talk about data contracts

Benn’s “rant” feels profound on so many levels, especially if I can assume he’s captured the zeitgeist of our industry as accurately as he usually does.

My first observation is that he seems to (wisely!) invert Postel’s Law for data: be strict in what you accept, and generous in what you emit. The profound truth here is that we cannot control other people. We can only honestly and gracefully fail if we are not getting what we need to succeed.

We can only honestly and gracefully fail if we are not getting what we need to succeed.

I can’t help but wonder how much of the energy around “data contracts” is the desire to avoid facing exactly that reality.

Next, the corollary to this is something I literally wrote last night in an internal planning document: “transparency is more important than compliance”. The context is that I don’t want employees worried about merely “appearing” to reach nominal goals. I want them to be ruthlessly honest with us about the true risks to delivering genuine impact.

“Transparency is more important than Compliance”

Third, the profound implication of this is that we must shift power from centralized hierarchies to decentralized networks. We have to stop chasing Xanadu (the mythical demo of reliable hyperlinks) and embrace the chaotic generativity of the World Wide Web. That is the only kind of system that ever truly scales.

Shift power from centralized hierarchies to decentralized networks

Finally, Benn is right that it is foolish to replace a technical problem with a human problem. But I fear you can never avoid the human problem, only squish it somewhere else. The challenge is finding the “right” human problem to solve, so the rest of the system can support that as efficiently as possible.

Finding the “right” human problem to solve, so the rest of the system can support that as efficiently as possible.

I think Benn is calling for pipelines to “fail quickly” when it is better for consumers to get explicitly old data rather than implicitly wrong data. But that implies non-fatal errors must be communicated transparently yet efficiently throughout the stack.
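To make the fail-fast idea concrete, here is a minimal Python sketch under assumptions of my own: a batch is a list of dicts, “strict in what you accept” is reduced to a single hypothetical rule (a numeric `amount` field), and the names `Result`, `validate`, and `publish` are invented for illustration, not taken from Benn’s essay or any particular tool. When a batch fails validation, consumers get the last good snapshot, explicitly marked stale, along with the reasons.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Result:
    """What a hypothetical pipeline stage hands to its consumers."""
    rows: list            # the data itself (a list of dicts in this sketch)
    as_of: datetime       # when this data was last known to be good
    stale: bool           # True if we fell back to an older snapshot
    warnings: list = field(default_factory=list)  # non-fatal problems, passed downstream


def validate(rows: list) -> list:
    """Be strict in what we accept: return a list of problems (empty means OK).

    The single rule checked here (a numeric 'amount' field) is only an example.
    """
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row.get("amount"), (int, float)):
            problems.append(f"row {i}: 'amount' is missing or non-numeric")
    return problems


def publish(new_rows: list, last_good: Result, now: datetime) -> Result:
    """Serve fresh data if it validates; otherwise serve explicitly old data."""
    problems = validate(new_rows)
    if not problems:
        return Result(rows=new_rows, as_of=now, stale=False)
    # Fail honestly: keep serving the last good snapshot, flag it as stale,
    # and pass the reasons along instead of silently emitting bad rows.
    return Result(rows=last_good.rows, as_of=last_good.as_of,
                  stale=True, warnings=problems)


if __name__ == "__main__":
    yesterday = Result(rows=[{"amount": 10}],
                       as_of=datetime(2022, 9, 22, tzinfo=timezone.utc),
                       stale=False)
    today = datetime(2022, 9, 23, tzinfo=timezone.utc)
    result = publish([{"amount": "N/A"}], yesterday, today)
    print(result.stale, result.as_of.date(), result.warnings)
    # prints: True 2022-09-22 ["row 0: 'amount' is missing or non-numeric"]
```

The point of the sketch is that the failure path still returns something usable. The honesty lives in the `stale` flag, the `as_of` timestamp, and the `warnings` list, which every downstream stage would need to keep propagating.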

Doing that well is literally impossible (à la Masnick’s Impossibility Theorem), but I believe it is THE human problem that must be addressed, even if we can never solve it! Once we embrace that ugly truth, we can devote all of our effort to doing the best we can technically, while giving each other grace to recognize our human limits.

That’s a contract I’m willing to sign up for. How about you?

PipeBook: UX Design Brief

July 9, 2022

A key design goal of PipeBook is to break away from the single-browser-window user experience of traditional data notebooks, to take full advantage of the large screens on today’s laptops and desktops.

The PipeBook Multi-Window User Experience (Live prototype, annotated)

PipeBook.yml: Reimagining Notebooks as Resilient Data Pipelines

June 30, 2022


Overview

The modern data notebook has its roots in academic tools for mathematical research. Because of that, notebooks are fantastic for open-ended exploration, but an awkward match for production data pipelines. In particular, they don’t:

  • Explicitly declare and track dependencies
  • Enforce organizational quality and reproducibility standards
  • Enable easy testing, validation, and alerting

PipeBooks are a simple but radical re-imagining of notebooks as “tools for iteratively constructing resilient data pipelines.” The key is a novel data format called FRIDAAY that allows us to:

  • Express arbitrary data transformations
  • As a series of idempotent Data Actions
  • Via a single, easy-to-parse YAML file

Analytics Anonymous: The Missing Peace of the Modern Data Stack

May 31, 2022

Pitch 2 for Coalesce 2022 (unsubmitted)

Pitch: Data is a Feature, not a Product

May 12, 2022

Communal Decision-Making Platforms and the End of the Modern Data Stack

Session Proposal for Coalesce 2022

TL;DR: Businesses may start by developing a technical solution, but only succeed by integrating around a human problem. The same is true of the Modern Data Stack.


How to Build LightDash from Source

November 11, 2021

LightDash is a super-cool Open Source business intelligence tool built on top of DBT (which I think of as node for SQL). While it is distributed as open source, the usual way to deploy it locally is by simply running a Docker container.

If you want to actually build LightDash directly from source yourself, you need to follow the instructions under CONTRIBUTING. However, what was written there (as of November 11, 2021) did not quite work for me, so here are my workarounds.

I will also file this as a GitHub issue, and they are super-responsive, so hopefully this page will be obsolete soon!


From “Zombie Data” to “Smart Reports”

October 4, 2021

The bane of my IT existence is a business user who says, “Please get me the latest version of <random Excel file I have never seen before, named using idiosyncratic or ambiguous words>. Oh, and I need it tomorrow or else we won’t {make our numbers | pass our audit | satisfy the board}.”

I call this “zombie data” because it:

  • Lacks any self-awareness
  • Doesn’t remember where it came from
  • Has no relationship to its current context
  • Infects everyone it touches with that same mindlessness

The Reporting Control Center

September 5, 2021

aka Quilt Data Hub or Lightdash 2.0?

Challenge

Can I evangelize
a corporate data platform
by just emailing out reports
with sufficiently smart URLs?

Rationale

I don’t have the power
to pull others onto a new platform.
But I can push useful data to others
in a way that inspires them to participate more directly with the platform

Proposal

Replace friendly Salesforce Reports and powerful NetSuite Saved Searches with a unified interface for viewing, editing, sharing, and managing:

  • versioned reports
  • personalized alerts
  • variant analyses

that are delivered via self-contained emails that also onboard people into greater use of the platform

Definitions

Friendly

  • Browseable
  • Drag and Drop
  • Live previews

Powerful

  • Complex formulas
  • Scalable notifications
  • Easy joins and relabeling

Motivation

The main value of Quilt to my business
is as a point of leverage
to shift the culture of communication
from “zombie data” in tables
to “smart reports” in a repository

The Coherency Manifesto: Towards Communal Data Platforms

August 21, 2021

Version 1.0: Sep 11, 2021 (Interdependence Day)

As a community
who produces, consumes, and manages data
we hold these truths to be self-evident:


Psycho-Analytic Engineering (Coalesce 2021)

June 6, 2021

Using Data to Differentiate Our Selves

Keynote Talk Proposal for Coalesce 2021

Google Slides

Based on “DBT as Organizational Therapy”


Where Am I?

You are currently browsing entries tagged with data at iHack, therefore iBlog.