Beyond (Data) Contracts: A Response to Benn Stancil

September 23, 2022 § Leave a comment

This essay by Benn Stancil provoked me so deeply that my intended “comment” evolved into a full-fledged blog post:

Fine, let’s talk about data contracts

Benn’s “rant” feels profound on so many levels, especially if I can assume he’s captured the zeitgeist of our industry as accurately as he usually does.


PipeBook: UX Design Brief

July 9, 2022 § 1 Comment

A key design goal of PipeBook is to break away from the single-browser-window user experience of traditional data notebooks, to take full advantage of the large screens on today’s laptops and desktops.

The PipeBook Multi-Window User Experience (Live prototype, annotated)

PipeBook.yml: Reimagining Notebooks as Resilient Data Pipelines

June 30, 2022 § 1 Comment

Overview

The modern data notebook has its roots in academic tools for mathematical research. Because of that, notebooks are fantastic for open-ended exploration, but an awkward match for production data pipelines. In particular, they don’t:

  • Explicitly declare and track dependencies
  • Enforce organizational quality and reproducibility standards
  • Enable easy testing, validation, and alerting

PipeBooks are a simple but radical re-imagining of notebooks as “tools for iteratively constructing resilient data pipelines.” The key is a novel data format called FRIDAAY that allows us to:

  • Express arbitrary data transformations
  • As a series of idempotent Data Actions
  • Via a single, easy-to-parse YAML file
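To make the idea of "a series of idempotent Data Actions" concrete, here is a minimal Python sketch. It does not use the actual FRIDAAY schema, which this excerpt does not show: the `pipeline` structure, the `needs` and `run` keys, and the `run_pipeline` helper are all hypothetical names invented for illustration. The list of dicts stands in for what a parsed YAML file might look like.

```python
# Hypothetical stand-in for a parsed pipeline file; NOT the actual FRIDAAY schema.
# Each action names its output, declares its dependencies, and computes a value
# purely from those declared inputs.
pipeline = [
    {"name": "raw", "needs": [], "run": lambda inputs: [3, 1, 2, 3]},
    {"name": "deduped", "needs": ["raw"],
     "run": lambda inputs: sorted(set(inputs["raw"]))},
    {"name": "total", "needs": ["deduped"],
     "run": lambda inputs: sum(inputs["deduped"])},
]


def run_pipeline(actions, store=None):
    """Run actions in order, writing each result under the action's name.

    Because every action overwrites its own named output and reads only its
    declared inputs, rerunning the pipeline leaves the store unchanged.
    """
    store = {} if store is None else store
    for action in actions:
        missing = [dep for dep in action["needs"] if dep not in store]
        if missing:
            raise ValueError(f"{action['name']} is missing inputs: {missing}")
        inputs = {dep: store[dep] for dep in action["needs"]}
        store[action["name"]] = action["run"](inputs)
    return store


first = run_pipeline(pipeline)
second = run_pipeline(pipeline, dict(first))  # rerun on top of earlier results
print(first["total"], first == second)
```

The explicit `needs` list is what lets a runner detect an undeclared dependency up front, and the overwrite-by-name rule is one simple way to get idempotent reruns.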

Analytics Anonymous: The Missing Peace of the Modern Data Stack

May 31, 2022 § Leave a comment

Pitch 2 for Coalesce 2022 (unsubmitted)

Pitch: Data is a Feature, not a Product

May 12, 2022 § 1 Comment

Communal Decision-Making Platforms and the End of the Modern Data Stack

Session Proposal for Coalesce 2022

TL;DR: Businesses may start by developing a technical solution, but they only succeed by integrating around a human problem. The same is true of the Modern Data Stack.


How to Build Lightdash from Source

November 11, 2021 § Leave a comment

Lightdash is a super-cool open-source business intelligence tool built on top of dbt (which I think of as Node for SQL). Although it is distributed as open source, the usual way to deploy it locally is to run a Docker container.

If you want to build Lightdash directly from source yourself, you need to follow the instructions under CONTRIBUTING. However, what was written there (as of November 11, 2021) did not quite work for me, so here are my workarounds.

I will also file this as a GitHub issue, and they are super-responsive, so hopefully this page will be obsolete soon!


From “Zombie Data” to “Smart Reports”

October 4, 2021 § Leave a comment

The bane of my IT existence is a business user who says, “Please get me the latest version of <random Excel file I have never seen before, named using idiosyncratic or ambiguous words>. Oh, and I need it tomorrow or else we won’t {make our numbers | pass our audit | satisfy the board}.”

I call this “zombie data” because it:

  • Lacks any self-awareness
  • Doesn’t remember where it came from
  • Has no relationship to its current context
  • Infects everyone it touches with that same mindlessness

The Reporting Control Center

September 5, 2021 § 1 Comment

aka Quilt Data Hub or Lightdash 2.0?

Challenge

Can I evangelize
a corporate data platform
by just emailing out reports
with sufficiently smart URLs?

Rationale

I don’t have the power
to pull others onto a new platform.
But I can push useful data to others
in a way that inspires them to participate more directly with the platform.

Proposal

Replace friendly Salesforce Reports and powerful NetSuite Saved Searches with a unified interface for viewing, editing, sharing, and managing:

  • versioned reports
  • personalized alerts
  • variant analyses

that are delivered via self-contained emails that also onboard people into greater use of the platform.

Definitions

Friendly

  • Browsable
  • Drag and Drop
  • Live previews

Powerful

  • Complex formulas
  • Scalable notifications
  • Easy joins and relabeling

Motivation

The main value of Quilt to my business
is as a point of leverage
to shift the culture of communication
from “zombie data” in tables
to “smart reports” in a repository

The Coherency Manifesto: Towards Communal Data Platforms

August 21, 2021 § 2 Comments

Version 1.0: Sep 11, 2021 (Interdependence Day)

As a community
who produces, consumes, and manages data
we hold these truths to be self-evident:


Psycho-Analytic Engineering (Coalesce 2021)

June 6, 2021 § Leave a comment

Using Data to Differentiate Our Selves

Keynote Talk Proposal for Coalesce 2021

Google Slides

Based on “DBT as Organizational Therapy”


Where Am I?

You are currently browsing entries tagged with data at iHack, therefore iBlog.