3 posts tagged with "Databricks"

Introducing Starlake.ai

2 min read
Abdelhamide El Arib
Starlake Core Team

We're excited to unveil Starlake.ai, a groundbreaking platform designed to streamline your data workflows and unlock the full potential of your data. 🚀

The Challenges We Solve

In the modern data landscape, businesses often face these challenges:

  • Overwhelming complexity in managing data pipelines
  • Inefficiencies in transforming and orchestrating data workflows
  • Lack of robust governance and data quality assurance

Starlake tackles these problems head-on, offering a declarative data pipeline solution that simplifies the entire data lifecycle.

How to unit test your data pipelines

6 min read
Bounkong Khamphousone
Starlake Core Team

In today's data-driven landscape, ensuring the reliability and accuracy of your data warehouse is paramount. The cost of not testing your data can be astronomical, leading to critical business decisions based on faulty data and eroding trust. 

The path to rigorous data testing comes with its own set of challenges. In this article, I will highlight how you can confidently deploy your data pipelines by leveraging Starlake JSQLTranspiler and DuckDB, while also reducing costs. We will go beyond testing your transforms, which are usually written in SQL, and see how to test ingestion jobs as well.
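To give a taste of the approach, here is a minimal sketch of such a unit test using DuckDB's Python API. The table, the SQL transform, and the expected rows are illustrative assumptions, not the article's actual example; the premise is that the transform has already been transpiled to DuckDB's dialect (for instance by JSQLTranspiler from a BigQuery or Snowflake original).

```python
import duckdb

# In-memory DuckDB database standing in for the real warehouse.
con = duckdb.connect(database=":memory:")

# Seed a tiny, hand-crafted input table (hypothetical schema).
con.execute("CREATE TABLE orders (customer_id INTEGER, amount DOUBLE)")
con.execute("INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5)")

# The transform under test, already in DuckDB dialect.
transform_sql = """
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""

# Run it locally and assert on the result: no warehouse, no cost.
rows = con.execute(transform_sql).fetchall()
assert rows == [(1, 15.0), (2, 7.5)], rows
print("transform test passed")
```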

Polars versus Spark

6 min read
Hayssam Saleh
Starlake Core Team

Introduction

Polars is often compared to Spark. In this post, I will highlight the main differences and the best use cases for each in my data engineering activities.

As a Data Engineer, I primarily focus on the following goals:

  1. Parsing files, validating their input, and loading the data into the target data warehouse.
  2. Once the data is loaded, applying transformations by joining and aggregating the data to build KPIs (both goals are sketched in the example after this list).
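To make these two goals concrete, here is a minimal Polars sketch covering both steps; the file name, columns, and validation rule are hypothetical, and Spark's DataFrame API would express the same flow almost line for line.

```python
import polars as pl

# Goal 1: parse a file and validate its input (hypothetical schema).
raw = pl.read_csv("sales.csv")  # assumed columns: country, amount
valid = raw.filter(pl.col("amount").is_not_null() & (pl.col("amount") >= 0))
# In a real pipeline, rejected rows would be logged or quarantined
# before the valid ones are loaded into the target data warehouse.

# Goal 2: join/aggregate the loaded data to build a KPI.
kpi = (
    valid
    .group_by("country")
    .agg(pl.col("amount").sum().alias("total_sales"))
    .sort("total_sales", descending=True)
)
print(kpi)
```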

However, on a daily basis, I also need to develop on my laptop and test my work locally before delivering it to the CI pipeline and then to production.

What about my fellow data scientist colleagues? They need to run their workload on production data through their favorite notebook environment.