5 posts tagged with "BigQuery"

Introducing Starlake.ai

· 2 min read
Abdelhamide El Arib
Starlake Core Team

We're excited to unveil Starlake.ai, a groundbreaking platform designed to streamline your data workflows and unlock the full potential of your data. 🚀

The Challenges We Solve

In the modern data landscape, businesses often face these challenges:

  • Overwhelming complexity in managing data pipelines
  • Inefficiencies in transforming and orchestrating data workflows
  • Lack of robust governance and data quality assurance

Starlake tackles these problems head-on, offering a declarative data pipeline solution that simplifies the entire data lifecycle.

How to Load and Transform into BigQuery Wildcard Tables

· 5 min read
Hayssam Saleh
Starlake Core Team

Sharding

BigQuery Wildcard Tables

When loading files into BigQuery, you may need to split your data into multiple partitions to reduce data size, improve query performance, and lower costs. However, BigQuery’s native partitioning only supports columns with date/time or integer values. While partitioning on string columns isn’t directly supported, BigQuery provides a workaround with wildcard tables, offering nearly identical benefits.

In this example, we demonstrate how Starlake simplifies the process by seamlessly loading your data into wildcard tables.
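
As a rough illustration of the result (independent of Starlake, with hypothetical project, dataset, and shard names), querying sharded tables through a wildcard looks like this in Python:

```python
from google.cloud import bigquery

# Hypothetical shards such as `my_project.sales.orders_FR`, `orders_US`, ...
# created by splitting the data on a string column (here, the country code).
client = bigquery.Client()

# The wildcard table `orders_*` spans all shards; the _TABLE_SUFFIX
# pseudo-column filters them much like a partition column would.
query = """
    SELECT _TABLE_SUFFIX AS country, SUM(amount) AS total
    FROM `my_project.sales.orders_*`
    WHERE _TABLE_SUFFIX IN ('FR', 'US')
    GROUP BY country
"""
for row in client.query(query).result():
    print(row.country, row.total)
```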

How to unit test your data pipelines

· 6 min read
Bounkong Khamphousone
Starlake Core Team

In today's data-driven landscape, ensuring the reliability and accuracy of your data warehouse is paramount. The cost of not testing your data can be astronomical, leading to critical business decisions based on faulty data and eroding trust. 

The path to rigorous data testing comes with its own set of challenges. In this article, I will highlight how you can confidently deploy your data pipelines by leveraging Starlake JSQLTranspiler and DuckDB, while also reducing costs. We will go beyond testing your transformations, which are usually written in SQL, and see how we can also test our ingestion jobs.
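
To give a flavor of the approach before diving in, here is a minimal sketch of testing a SQL transform against DuckDB; it is not Starlake's actual test harness, and the table and column names are hypothetical:

```python
import duckdb

# In-memory database: fast, free, and isolated per test run.
con = duckdb.connect(":memory:")

# Fixture data standing in for the source table.
con.execute("CREATE TABLE raw_orders(order_id INTEGER, status VARCHAR, updated_at DATE)")
con.execute("""
    INSERT INTO raw_orders VALUES
        (1, 'pending', DATE '2024-01-01'),
        (1, 'shipped', DATE '2024-01-02'),
        (2, 'pending', DATE '2024-01-01')
""")

# The transform under test: keep the latest status per order.
rows = con.execute("""
    SELECT order_id, status
    FROM raw_orders
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1
    ORDER BY order_id
""").fetchall()

assert rows == [(1, 'shipped'), (2, 'pending')]
```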

Handling Dynamic Partitioning and Merge with Spark on BigQuery

· 7 min read
Hayssam Saleh
Starlake Core Team

Data Loading Strategies

When loading data into BigQuery, you may want to:

  • Overwrite the existing data and replace it with the incoming data.
  • Append incoming data to the existing data.
  • Dynamic partition overwrite, where only the partitions to which the incoming data belongs are overwritten.
  • Merge incoming data with existing data, keeping the newest version of each record (see the sketch after this list).
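
As a rough sketch of that merge strategy in plain PySpark (not Starlake's implementation; all names are hypothetical), the newest version of each record can be kept with a window function:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("merge-sketch").getOrCreate()

# Hypothetical frames: `existing` mirrors the target table, `incoming` the new batch.
existing = spark.createDataFrame(
    [(1, "a", "2024-01-01"), (2, "b", "2024-01-01")],
    ["id", "value", "updated_at"],
)
incoming = spark.createDataFrame(
    [(1, "a2", "2024-01-02")],
    ["id", "value", "updated_at"],
)

# Union both sides, then keep only the newest version of each record.
w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
merged = (
    existing.unionByName(incoming)
    .withColumn("rn", F.row_number().over(w))
    .where("rn = 1")
    .drop("rn")
)
merged.show()  # id=1 keeps the incoming row, id=2 keeps the existing one
```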

For performance reasons, when dealing with huge amounts of data, tables are usually split into multiple partitions. BigQuery supports range partitioning, which is uncommon, and date/time partitioning, which is the most widely used type of partitioning.
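
Continuing the sketch above, writing the merged result as a date-partitioned table with the open-source spark-bigquery connector looks roughly like this; the option values are placeholders, and the available options depend on the connector version:

```python
# Continuing the sketch above: write the merged result as a date-partitioned
# BigQuery table via the spark-bigquery connector. All option values are
# placeholders, and the available options depend on the connector version.
(
    merged.withColumn("updated_at", F.to_date("updated_at"))
    .write.format("bigquery")
    .option("table", "my_project.my_dataset.orders")
    .option("partitionField", "updated_at")   # must be a date/time or integer column
    .option("partitionType", "DAY")
    .option("temporaryGcsBucket", "my-temp-bucket")  # staging bucket for indirect writes
    .mode("overwrite")
    .save()
)
```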