Post-Modern Data Stack: Starlake's Declarative Approach

The "Modern Data Stack" (MDS) gave us cloud agility, but it also delivered complexity: fragmented tools, opaque lineage, and dialect lock-in. It's time to move past the modern toward the declarative.

The Declarative Data Stack isn't just another layer on the MDS; it’s a foundational rethinking of data engineering, offering a simpler, faster, and more secure pathway to production-grade data.

Starlake is distinguished by its declarative, configuration-as-code approach: every feature and setting, from ingestion to semantic modeling, is defined in a simple, unified, declarative YAML language. This lets data pipelines be treated as robust, version-controlled software artifacts.


1. Quality-First Ingestion: Moving Beyond Simple Data Movement

Traditional data stacks focus heavily on ETL/ELT—moving data from Point A to Point B. But what good is fast movement if the data is garbage? Starlake starts with quality.

  • Validation is Core: Starlake thoroughly validates incoming data against your defined schemas and business rules before it enters your transformation pipeline.
  • Built-in Data Quality: Instead of requiring a separate, bolted-on data quality tool, Starlake integrates validation, cleansing, and quality checks into the very first layer of ingestion.

The Benefit: By validating data at the source, Starlake ensures only high-quality, trusted data flows into your warehouse, reducing errors downstream and eliminating the "garbage in, garbage out" problem.
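
To make this concrete, here is the kind of gate such validation implies, written as plain SQL. This is only an illustrative sketch: Starlake expresses these rules in YAML configuration rather than hand-written SQL, and the table and column names are hypothetical.

```sql
-- Illustrative only: the kind of quality gate a declarative ingestion
-- layer applies before data reaches the transformation pipeline.
SELECT *
FROM incoming_orders
WHERE order_id IS NULL          -- required field is missing
   OR amount < 0                -- business rule violated
   OR email NOT LIKE '%@%';     -- value fails a basic format check
-- Rows matched by these predicates are rejected or quarantined;
-- only the remaining rows are loaded into the warehouse.
```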


2. SQL, Unchained: Pure Transformation Logic

The MDS typically embeds logic (Jinja templating or proprietary domain-specific languages) directly in your SQL. This adds complexity and breaks portability.

| Modern Data Stack Way | Starlake Declarative Way |
| --- | --- |
| Uses Jinja to manage table relationships and lineage. | Uses SQL only. No complex templating language required. |
| SQL cannot be easily copy-pasted or run outside the tool. | SQL is pure and portable. Copy/paste your transformation logic from any tool (like DBeaver or Snowsight) and it just works. |

The Benefit: By isolating transformation logic to pure SQL, Starlake automatically derives table and column lineage, making your pipelines transparent, auditable, and accessible to any SQL developer, regardless of their background or the tools they use.
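
As an illustration, a transform in this style is nothing more than a standard SQL statement (the table and column names below are hypothetical):

```sql
-- No Jinja tags such as {{ ref('orders') }}: the statement is plain SQL,
-- so it runs unchanged in DBeaver, Snowsight, or DuckDB.
-- Lineage for the transform's target table (say, kpi.daily_revenue) is
-- parsed from the query itself: it depends on sales.orders, and the
-- daily_revenue column derives from amount.
SELECT
    order_date,
    SUM(amount) AS daily_revenue
FROM sales.orders
WHERE status = 'COMPLETED'
GROUP BY order_date;
```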


3. Local-First Development for Global Deployment

Why should developing a data pipeline require a costly, minutes-long compile-and-test cycle on your production data warehouse? With Starlake, you don't have to.

Starlake embraces a local-first philosophy thanks to transparent transpilation:

  • Develop Locally, Deploy Globally: Develop and debug your entire pipeline using DuckDB on your laptop. DuckDB offers lightning-fast, zero-cost execution on your sample data.
  • Transparent Transpilation: Starlake automatically transpiles the SQL you write in your data warehouse's dialect (Snowflake, BigQuery, Spark, ...) into the DuckDB dialect when running locally. When you deploy, your original SQL runs unchanged on the warehouse.
  • Faster CI/CD: You can run full Continuous Integration tests on your SQL transformations without ever hitting your data warehouse, dramatically reducing CI/CD costs and iteration time.

The Benefit: Achieve faster development cycles and reduced costs by decoupling the development environment from the costly production warehouse.
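
Here is a sketch of what that transpilation can look like in practice. The exact rewrites depend on the transpiler and the dialects involved; this Snowflake-to-DuckDB pair is illustrative only.

```sql
-- What you write, in the Snowflake dialect of your target warehouse:
SELECT
    DATEADD(day, -7, CURRENT_DATE)       AS week_ago,
    TO_VARCHAR(order_date, 'YYYY-MM-DD') AS order_day
FROM sales.orders;

-- What runs locally after automatic transpilation to the DuckDB dialect:
SELECT
    CURRENT_DATE - INTERVAL 7 DAY        AS week_ago,
    STRFTIME(order_date, '%Y-%m-%d')     AS order_day
FROM sales.orders;
```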


4. Git-Style Data Branching (Zero Copy)

Branching code is standard. Branching data is often a nightmare of expensive copies and complex synchronization. Starlake solves this using the data warehouse's powerful SNAPSHOT feature.

  • Zero-Copy Data Branching: Starlake creates a logical "branch" of your production data. This is not a costly physical copy; it's a zero-copy operation, a pointer to the production data at a moment in time.
  • Production Data Safety: Users get read-only rights on the production data, allowing safe exploration, testing, and development without any risk of accidentally updating or corrupting the live environment.

The Benefit: Enable agile data development and experimentation by allowing teams to work on production data safely and efficiently, just as developers work on code branches in Git.
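
For a feel of the underlying mechanism, here is the warehouse primitive this maps to on Snowflake, where cloning is a zero-copy, metadata-only operation. The statements Starlake actually issues may differ, and all names here are hypothetical.

```sql
-- A zero-copy "branch": a metadata-only operation that points at the
-- production data as of clone time; no rows are physically copied.
CREATE DATABASE analytics_dev CLONE analytics_prod;

-- Production itself stays read-only for developers, so experiments on
-- the branch cannot corrupt the live environment.
GRANT USAGE  ON DATABASE analytics_prod TO ROLE developer;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics_prod.public TO ROLE developer;
```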


5. Orchestration Agnostic: Your Orchestrator, Our Lineage

No vendor lock-in. Just clean, native orchestration, powered by lineage.

The MDS often requires you to use a specific tool (or a proprietary orchestrator) just to manage pipeline dependencies. Starlake believes orchestration should be a pluggable utility.

  • Automatic Execution Graph: Through its analysis of the pure SQL lineage, Starlake automatically generates the execution dependencies (the DAG).
  • Orchestration-Agnostic Deployment: Instead of wiring dependencies manually, Starlake generates event-driven, dataset-aware DAGs from the SQL lineage for your orchestrator of choice (e.g., Snowflake Tasks, Google Cloud Composer, Airflow, Dagster).

The Benefit: You leverage the native orchestration capabilities of your data cloud, leading to simplified infrastructure and reduced dependency management.
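
As an example, when targeting Snowflake Tasks, such a generated DAG boils down to native task definitions whose AFTER clauses mirror the SQL lineage. This is a hypothetical sketch; the code Starlake actually generates, and every name below, is illustrative.

```sql
-- Parent task: loads the source table on a schedule.
CREATE TASK load_orders
  WAREHOUSE = transform_wh
  SCHEDULE  = '60 MINUTE'
AS
  COPY INTO sales.orders FROM @sales_stage;

-- Child task: the AFTER clause encodes the dependency derived
-- automatically from the SQL lineage (daily_revenue <- orders).
CREATE TASK build_daily_revenue
  WAREHOUSE = transform_wh
  AFTER load_orders
AS
  INSERT INTO kpi.daily_revenue
  SELECT order_date, SUM(amount)
  FROM sales.orders
  WHERE status = 'COMPLETED'
  GROUP BY order_date;
```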


6. Agnostic Semantic Modeling

Semantic models should serve your business logic, not be locked to a single BI tool. Starlake breaks down the silos between business users and data visualization developers.

Business logic shouldn’t live and die inside multiple dashboards. With Starlake, your metrics and relationships are declared once at the business level and then automatically available in multiple semantic formats:

  • Snowflake Cortex Analyst semantic model
  • Power BI TMDL
  • Looker LookML

That means your KPIs stay consistent across dashboards: no more "metric drift" between tools.

The Benefit: Ensure consistency across your BI landscape and avoid expensive, error-prone manual translations of business metrics between tools.


7. CLI and GUI Support: Code When You Want, Click When You Don't

The Declarative Data Stack recognizes that different users have different needs.

  • CLI (Code): Data Engineers and developers can use the Command Line Interface for scripting, automation, and complex pipeline management.
  • GUI (Click): Data Analysts and less technical users can use the Graphical User Interface to configure simple ingestion, view lineage, and monitor health.

The Benefit: Unify your data team by providing tools optimized for both code-first engineers and visual analysts, maximizing efficiency for everyone.


Conclusion: The MDS vs. The DDS

The shift is from fragmented, templated, and expensive data engineering to a stack that is pure, local-first, and quality-driven.

| Feature | Modern Data Stack (MDS) | Starlake (Declarative Data Stack) |
| --- | --- | --- |
| Data Quality | Validation added after movement. | Thoroughly validated at ingestion; quality is built in. |
| Transformation Logic | SQL mixed with Jinja or Python templating. | SQL only. Pure, portable, and easily auditable. |
| Development Cycle | Remote execution on the production DW; slow, costly iteration. | Local-first (DuckDB) with transparent SQL transpilation. Fast, cheap cycles. |
| CI/CD | Requires access to the live data warehouse. | Runs on locally transpiled SQL; no DW access needed. |
| Data Experimentation | Costly full data copies or manual sandbox creation. | Git-style data branching. Safe, read-only, Git-like control over production data. |
| Orchestration | Often mandates proprietary orchestrators. | Automated DAG generation and agnostic deployment. Generates the execution graph for your orchestrator of choice. |
| Semantic Layer | Locked to a single BI platform (e.g., LookML only). | Agnostic. Available as Cortex Analyst semantic model, Power BI TMDL, or Looker LookML. |

It’s data engineering that feels like software engineering again.

Ready to see the future of data engineering? Learn more about Starlake today!