Pitch: Data is a Feature, not a Product
May 12, 2022 § Leave a comment
Communal Decision-Making Platforms and the End of the Modern Data Stack
Session Proposal for Coalesce 2022
TL;DR Businesses may start by developing a technical solution, but only succeed by integrating around a human problem. The same is true of the Modern Data Stack.
« Read the rest of this entry »
How to Build LightDash from Source
November 11, 2021 § Leave a comment
LightDash is a super-cool Open Source business intelligence tool built on top of DBT (which I think of as node for SQL). While it is distributed as open source, the usual way to deploy it locally is by simply running a docker container.
If you want to actually build LightDash from source yourself, you need to follow the instructions under CONTRIBUTING. However, what was written there (as of November 11, 2021) did not quite work for me, so here are my workarounds.
I will also file this as a GitHub issue, and they are super-responsive so hopefully this page will be obsolete soon!
« Read the rest of this entry »
From “Zombie Data” to “Smart Reports”
October 4, 2021 § Leave a comment
The bane of my IT existence is a business user who says, “Please get me the latest version of <random Excel file I have never seen before, named using idiosyncratic or ambiguous words>. Oh, and I need it tomorrow or else we won’t {make our numbers | pass our audit | satisfy the board}.”
I call this “zombie data” because it:
- Lacks any self-awareness
- Doesn’t remember where it came from
- Has no relationship to its current context
- Infects everyone it touches with that same mindlessness.
The Reporting Control Center
September 5, 2021 § 1 Comment
aka Quilt Data Hub or Lightdash 2.0?
Challenge
Can I evangelize
a corporate data platform
by just emailing out reports
with sufficiently smart URLs?
Rationale
I don’t have the power
to pull others onto a new platform.
But I can push useful data to others
in a way that inspires them to participate more directly with the platform
Proposal
Replace friendly Salesforce Reports and powerful NetSuite Saved Searches with a unified interface for viewing, editing, sharing, and managing:
- versioned reports
- personalized alerts
- variant analyses
that are delivered via self-contained emails that also onboard people into greater use of the platform
Definitions
Friendly
- Browseable
- Drag and Drop
- Live previews
Powerful
- Complex formulas
- Scalable notifications
- Easy joins and relabeling
Motivation
The main value of Quilt to my business
is as a point of leverage
to shift the culture of communication
from “zombie data” in tables
to “smart reports” in a repository
The Coherency Manifesto: Towards Communal Data Platforms
August 21, 2021 § 1 Comment
Version 1.0: Sep 11, 2021 (Interdependence Day)
As a community
who produces, consumes, and manages data
we hold these truths to be self-evident:
Psycho-Analytic Engineering (Coalesce 2021)
June 6, 2021 § Leave a comment
Using Data to Differentiate Our Selves
Keynote Talk Proposal for Coalesce 2021
Based on “DBT as Organizational Therapy”
« Read the rest of this entry »
DBT as the “Couch” for Organizational Therapy
May 13, 2021 § 1 Comment
Or, “How ELTT is the Key to World Peace”
Draft Submission Script for Coalesce 2021 « Read the rest of this entry »
SyncHouse: MVC for Enterprise SaaS
May 2, 2021 § 1 Comment
A concrete proposal for Imagining a Data Resort as enforcing a Model-View-Controller architecture across multiple Software-as-a-Service applications. The key is replacing transient enterprise data integrations with a persistent “sync house,” and making that the one full-service Source of Truth for data, schemas, and business logic.
- Ingest data from Salesforce, NetSuite, etc. (e.g., Stitch/Talend, FiveTran)
- Store raw data in a LakeHouse (e.g., Databricks, Delta Lake; or just Redshift)
- Aka “ELT vs ETL”
- Manage schemas via dbt (e.g., dbt Cloud)
- View and report on appropriate data (e.g., Mode, Data Studio)
- Push updates (reverse ETL) back to source applications (e.g., Celigo, Get Census)
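As a rough illustration of the flow in that list, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the `SyncHouse` class, its method names, and the sample tables are stand-ins invented for this example, not the API of any of the tools named above. In practice each step would be handled by a dedicated service (an ingestion tool, a warehouse, dbt, a BI tool, a reverse-ETL tool).

```python
from dataclasses import dataclass, field


@dataclass
class SyncHouse:
    """Hypothetical in-memory stand-in for the persistent store (the Model)."""
    raw: dict = field(default_factory=dict)     # raw tables, keyed by source name
    models: dict = field(default_factory=dict)  # transformed tables, keyed by model name

    def ingest(self, source, rows):
        # Extract + Load: copy the rows in unchanged (ELT, not ETL).
        self.raw[source] = [dict(r) for r in rows]

    def transform(self, name, fn):
        # Transform inside the store; roughly the role dbt plays in the list above.
        self.models[name] = fn(self.raw)

    def view(self, name):
        # View: hand a copy of the transformed rows to a reporting layer.
        return list(self.models[name])

    def push_updates(self, name, key):
        # Reverse ETL: per-record updates, keyed by the source application's id.
        return {row[key]: row for row in self.models[name]}


def accounts_with_revenue(raw):
    # Join CRM accounts to ERP invoices and total the invoice amounts.
    totals = {}
    for inv in raw["netsuite_invoices"]:
        totals[inv["account_id"]] = totals.get(inv["account_id"], 0) + inv["amount"]
    return [
        {**acct, "revenue": totals.get(acct["id"], 0)}
        for acct in raw["salesforce_accounts"]
    ]


house = SyncHouse()
house.ingest("salesforce_accounts", [{"id": "A1", "name": "Acme"},
                                     {"id": "A2", "name": "Globex"}])
house.ingest("netsuite_invoices", [{"account_id": "A1", "amount": 100},
                                   {"account_id": "A1", "amount": 50}])
house.transform("accounts_with_revenue", accounts_with_revenue)

report = house.view("accounts_with_revenue")
updates = house.push_updates("accounts_with_revenue", key="id")
```

The point of the sketch is the shape rather than the implementation: raw data is kept untouched, business logic lives in one place (the transform), and both the reports and the pushed-back updates are derived from the same modeled tables.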
My First Date with Quilt Data
July 21, 2020 § Leave a comment
I’ve known the good folks at Quilt Data for a long time. A company hackathon gave me a good excuse to use their tools “in anger” for an actual demo. These are my notes on how to configure quilt3 and create my first package (and pandas data frame) from a CSV.
« Read the rest of this entry »