September 23, 2022 § Leave a comment
This essay by Benn Stancil provoked me so deeply my intended “comment” evolved into a full-fledged blog post:
Benn’s “rant” feels profound on so many levels, especially if I can assume he’s captured the zeitgeist of our industry as accurately as he usually does.
My first observation is that he seems to (wisely!) invert Postel’s Law for data: be strict in what you accept, and generous in what you emit. The profound truth here is that we cannot control other people. All we can do is fail honestly and gracefully when we are not getting what we need to succeed.
I can’t help but wonder how much of the energy around “data contracts” is the desire to avoid facing exactly that reality.
Next, the corollary to this is something I literally wrote last night in an internal planning document: “transparency is more important than compliance”. The context is that I don’t want employees worried about “appearing” to reach nominal goals. I want them to be ruthlessly honest with us about the true risks to delivering genuine impact.
Third, the profound implication of this is that we must shift power from centralized hierarchies to decentralized networks. We have to stop chasing Xanadu (the mythical demo of reliable hyperlinks) and embrace the chaotic generativity of the World Wide Web. That is the only kind of system that ever truly scales.
Finally, Benn is right that it is foolish to replace a technical problem with a human problem. But I fear you can never avoid the human problem, only squish it somewhere else. The challenge is finding the “right” human problem to solve, so the rest of the system can support that as efficiently as possible.
I think Benn is calling for pipelines to “fail quickly” when it is better for consumers to get explicitly old data versus implicitly wrong data. But that implies non-fatal errors must be communicated transparently yet efficiently throughout the stack.
Communicating errors that way is literally impossible (per Masnick), but I believe it is THE human problem that must be addressed, even if we can never solve it! Once we embrace that ugly truth, we can devote all of our effort to doing the best we can technically, while giving each other grace to recognize our human limits.
That’s a contract I’m willing to sign up for. How about you?
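For the more code-minded, here is a minimal sketch of what "strict in what you accept" plus "explicitly old rather than implicitly wrong" might look like. Everything in it is my own assumption for illustration: the `REQUIRED_FIELDS` schema, the `Snapshot` type, and the `refresh` function are hypothetical names, not anything from Benn's essay or a real library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: the fields (and types) this consumer needs to succeed.
REQUIRED_FIELDS = {"order_id": int, "amount": float}


class RejectedBatch(Exception):
    """Raised when an incoming batch is not what we need to succeed."""


def validate(rows):
    """Strict acceptance: reject the whole batch on the first bad row."""
    for i, row in enumerate(rows):
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(row.get(name), expected):
                raise RejectedBatch(f"row {i}: {name!r} is not {expected.__name__}")
    return rows


@dataclass
class Snapshot:
    rows: list
    loaded_at: datetime
    stale: bool = False                           # True means "explicitly old"
    warnings: list = field(default_factory=list)  # non-fatal errors, passed downstream


def refresh(previous: Optional[Snapshot], incoming: list) -> Snapshot:
    """Accept a good batch, or keep serving the last good one, marked stale."""
    try:
        good = validate(incoming)
    except RejectedBatch as err:
        if previous is None:
            raise  # nothing trustworthy to fall back on, so fail loudly
        return Snapshot(previous.rows, previous.loaded_at, stale=True,
                        warnings=previous.warnings + [str(err)])
    return Snapshot(good, datetime.now(timezone.utc))


first = refresh(None, [{"order_id": 1, "amount": 9.5}])
second = refresh(first, [{"order_id": "2", "amount": 3.0}])  # wrong type: rejected
print(second.stale, second.warnings)
```

The point of the design is that the consumer still gets the rows from `first`, but with `stale` set and the reason attached, so nobody mistakes old data for fresh data.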
July 9, 2022 § 1 Comment
June 30, 2022 § 1 Comment
- The Data Config by Benn Stancil (Medium)
- https://github.com/TheSwanFactory/pipebook (App)
- PipeBook: UX Design Brief (Blog)
- https://github.com/TheSwanFactory/fridaay (Framework)
- Data on Rails: Solving the Data App Imperative (YouTube)
The modern data notebook has its roots in academic tools for mathematical research. Because of that, notebooks are fantastic for open-ended exploration, but an awkward match for production data pipelines. In particular, they don’t:
- Explicitly declare and track dependencies
- Enforce organizational quality and reproducibility standards
- Enable easy testing, validation, and alerting
PipeBooks are a simple but radical re-imagining of notebooks as “tools for iteratively constructing resilient data pipelines.” The key is a novel data format called FRIDAAY that allows us to:
- Express arbitrary data transformations
- As a series of idempotent Data Actions
- Via a single, easy-to-parse YAML file
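To make "a series of idempotent Data Actions" concrete, here is a rough sketch in Python. It is only an illustration of the idea: the action names (`drop_nulls`, `lowercase`, `dedupe`), the `do` key, and the list-of-dicts layout are assumptions made up for this example and are not the actual FRIDAAY schema. The `PIPELINE` list stands in for what a parsed YAML file might contain.

```python
# Hypothetical action names and layout; NOT the actual FRIDAAY schema.
PIPELINE = [
    {"do": "drop_nulls", "column": "email"},
    {"do": "lowercase", "column": "email"},
    {"do": "dedupe", "column": "email"},
]


def drop_nulls(rows, column):
    """Keep only rows where the column has a value."""
    return [r for r in rows if r.get(column) is not None]


def lowercase(rows, column):
    """Return copies of the rows with the column lowercased."""
    return [{**r, column: r[column].lower()} for r in rows]


def dedupe(rows, column):
    """Keep the first row seen for each distinct column value."""
    seen, out = set(), []
    for r in rows:
        if r[column] not in seen:
            seen.add(r[column])
            out.append(r)
    return out


ACTIONS = {"drop_nulls": drop_nulls, "lowercase": lowercase, "dedupe": dedupe}


def run(pipeline, rows):
    """Apply each declared action in order, passing its other keys as parameters."""
    for step in pipeline:
        params = {k: v for k, v in step.items() if k != "do"}
        rows = ACTIONS[step["do"]](rows, **params)
    return rows


raw = [{"email": "A@x.org"}, {"email": None}, {"email": "a@x.org"}]
once = run(PIPELINE, raw)
twice = run(PIPELINE, once)
print(once == twice)  # idempotent: re-running changes nothing
```

Because every action returns the same result when applied to its own output, re-running the whole file after a partial failure is safe, which is the property that makes this style a better fit for production pipelines than ad hoc notebook cells.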