Imagining a Data Resort
April 21, 2021
A data resort is where data comes to get pampered, so that it is prepared to get back to work.
Motivation
The good news is that I finally understand how we really need to be managing all the business data in my organization. The bad news is that I don’t know how to articulate that in terms of industry-standard terminology (examples below). Worse, I’d probably use the wrong term (or the right term incorrectly), leading to endless rounds of frustrating conversation.
Therefore I’ve coined a new term (“data resort”) which I can define as the exact thing I want. Hopefully you, my dear readers, can help translate that into something concrete I can efficiently buy+build today!
Let me know your suggestions in the comments or via email.
Requirements
- Data Lake: schema-on-read access to an immutable, append-only data store
- Connectors: easily import and export data via dynamic schemas over multiple protocols (e.g., HTTP, SMTP) to REST, Salesforce, NetSuite, etc. Should use (or be as easy as) Celigo’s integrator.io.
- Logic Layer: git repository plus execution engine for queries, triggers, and automations using standard languages (e.g., JavaScript, SQL)
- Console: unified web interface for browsing, configuring, debugging — and ideally editing — queries, reports, data and logic
- Time Travel: trivial to run (and idempotently rerun) any version of logic against any snapshot of data, e.g., staging servers, sandboxes, 2019 data against 2021 logic, vice versa, etc.
- Access Control: fine-grained permissions of who can do what
- Auditing: fine-grained tracking of who did what when
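To make the list above a little more concrete, here is a toy sketch of how the Data Lake, Time Travel, and Auditing requirements could fit together. Everything in it is an assumption of mine for illustration: the names (`AppendOnlyStore`, `revenue_v1`, `revenue_v2`, `run`), the in-memory list standing in for real storage, and the dictionary of functions standing in for a git repository of logic. It is not a proposal for the real implementation, just the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    seq: int       # 1-based position in the log; also serves as a snapshot id
    source: str    # hypothetical connector name, e.g. "salesforce"
    payload: dict  # raw data kept as received; no schema is enforced on write


class AppendOnlyStore:
    """Toy in-memory data lake: records can be added but never changed."""

    def __init__(self) -> None:
        self._log: List[Record] = []

    def append(self, source: str, payload: dict) -> int:
        # Copy the payload so later edits by the caller do not leak into the log.
        record = Record(len(self._log) + 1, source, dict(payload))
        self._log.append(record)
        return record.seq

    def snapshot(self, as_of: int) -> Tuple[Record, ...]:
        """Return the data as it stood right after record number `as_of`."""
        return tuple(self._log[:as_of])


# Two hypothetical versions of the same business logic. Each one applies its
# own interpretation to the raw payloads when it reads them (schema-on-read).
def revenue_v1(records: Tuple[Record, ...]) -> int:
    return sum(r.payload.get("amount", 0) for r in records)


def revenue_v2(records: Tuple[Record, ...]) -> int:
    # The newer logic knows about refunds and subtracts them.
    total = 0
    for r in records:
        amount = r.payload.get("amount", 0)
        total += -amount if r.payload.get("refund") else amount
    return total


LOGIC: Dict[str, Callable[[Tuple[Record, ...]], int]] = {
    "v1": revenue_v1,
    "v2": revenue_v2,
}
AUDIT: List[Tuple[str, str, int]] = []  # (who, which logic, which snapshot)


def run(store: AppendOnlyStore, version: str, as_of: int, user: str) -> int:
    """Run any version of the logic against any snapshot, and record who did it."""
    AUDIT.append((user, version, as_of))
    return LOGIC[version](store.snapshot(as_of))


store = AppendOnlyStore()
store.append("salesforce", {"amount": 100})
store.append("netsuite", {"amount": 50, "refund": True})

print(run(store, "v1", 2, "alice"))  # old logic, current data: 150
print(run(store, "v2", 2, "alice"))  # new logic, current data: 50
print(run(store, "v2", 1, "bob"))    # new logic, earlier snapshot: 100
```

Because nothing in the store is ever modified and the logic functions only read from a snapshot, rerunning the same version against the same snapshot always gives the same answer. That is the property I mean by "idempotently rerun".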
My dream is that this would subsume (and eventually eliminate or at least deprecate):
- Analytics Engines (Tableau, Qubole)
- Data Warehouses
- In-App Scripting and Alerts (Salesforce, NetSuite)
- Manually-Prepared Audits
- Mutable Databases for Business Data
- Storage as a Service (Box, Dropbox)
- Sync Engines (Celigo, Dell Boomi)
Procurement
I would love to buy a turnkey solution for this from a top-tier vendor. Unfortunately, all the ones I know about have strong incentives to avoid empowering customers to this extent.
Intriguingly, I hear rumors that large enterprises “at scale” organically end up with an architecture like this, but by that point it is so customized and idiosyncratic that it can’t be used by anyone else. My hope is that the various components have finally become mature enough that a dedicated amateur can cobble something together at a low-enough cost for our medium-sized business.
Therefore, I suspect this will require some combination of:
- Best-of-breed commercial solutions for key components
- Open source modules for commodity functionality
- A hungry young startup eager to vertically integrate into a full-stack solution
- Custom development to fill in the gaps
Existing Data- Terminology
- Analytics
- Base
- Broker
- Catalog
- Center
- Connector
- Custodian
- Frame
- Governance
- Integration
- Integrity
- Lake
- Lab
- Library
- Loader
- Management
- Mart
- Mesh
- Mining
- Model
- Path
- Record
- Repository
- Schema
- Set
- Source
- Strategy
- Store
- Table
- Tree
- Verse
- View
- Warehouse
See Also
QuantOps 2030: Ten Years into the Analog Revolution