Skip to main content

Working with historized tables

In some cases, you might have historized tables that store historical data, and you want to query the most recent data by default. However, you might also want to query the data over time or for a specific time period. In this tutorial, we'll show you how to handle historized tables using Veezoo's virtual_table and default_filter.

Example scenario

Let's consider a historized table named historized_reviews that contains review data for various platforms. By default, you want to query the latest review data, for example, "What's our review star rating on Play Store?".

However, you might also want to query the data over time or for a specific time period, such as "How did our review star rating develop over time for Play Store?" or "What was our review score in the second week of Feb 2023 for each platform?".

Step 1: Draft a SQL that adds a flag for the most recent data

First, you need to create a flag that indicates the most recent data for each platform. You can do this by adding a query with a CTE (Common Table Expression) that ranks the records by the load_timestamp:

WITH ranked_records AS (
SELECT DISTINCT
r."platform" as "platform",
r."load_timestamp" as "load_timestamp",
r."review_star" as "review_star",
ROW_NUMBER() OVER (
PARTITION BY r."platform"
ORDER BY r."load_timestamp" DESC
) as row_num
FROM
"your_database"."your_schema"."historized_reviews" r
)
SELECT
rr."platform",
rr."load_timestamp",
rr."review_star",
CASE
WHEN rr.row_num = 1 THEN 1
ELSE 0
END as is_most_recent
FROM
ranked_records rr

This query will have a new column is_most_recent that has a value of 1 for the most recent record and 0 for others.

Now that you have the flag for the most recent data, you need to set up the logic in Veezoo.

Step 2: Change our historized_reviews to be a virtual_table

First, we will make historized_reviews be a virtual_table instead, including the new is_most_recent flag.

To do that you will first need to switch the sidebar in Studio to show the file tree instead:

Switch to file tree

Then open the table definition file for the historized_reviews table.

Switch to file tree

We will change from this:

db.storage.db.your_schema {
table historized_reviews {
identifier: "historized_reviews"

...
}
}

...to include the virtual_table and virtual_table_sql instead:

db.storage.db.your_schema {
virtual_table historized_reviews {
identifier: "historized_reviews"
virtual_table_sql:
"""
WITH ranked_records AS (
SELECT DISTINCT
r."platform" as "platform",
r."load_timestamp" as "load_timestamp",
r."review_star" as "review_star",
ROW_NUMBER() OVER (
PARTITION BY r."platform"
ORDER BY r."load_timestamp" DESC
) as row_num
FROM
"your_database"."your_schema"."historized_reviews" r
)
SELECT
rr."platform",
rr."load_timestamp",
rr."review_star",
CASE
WHEN rr.row_num = 1 THEN 1
ELSE 0
END as is_most_recent
FROM
ranked_records rr"""

// manually added

column is_most_recent {
identifier: "is_most_recent"
primitive_type: "BIT"
}

...
}
}

Step 3: Add a boolean for the is_most_recent flag

kb {
class Historized_Reviews {
...

boolean is_Most_Recent {
name: "most recent"
sql: "${historized_reviews.is_most_recent}"
}
}
}

Step 4: Add a default_filter with a condition

Add a default_filter with a condition that only applies the filter when the Load_Timestamp is not being used in the question:

kb {
class Historized_Reviews {
...

default_filter: {
filter: "this.is_Most_Recent = true"
// this means: only use this filter when Load_Timestamp is not being used in the
// question (e.g., 'review star over time' should not use the filter)
condition: "@missing { this.Load_Timestamp }"
}

...
}
}

Now, Veezoo will handle historized tables with the most recent data by default. When you ask questions involving time, such as "How did our review star rating develop over time for Play Store?", Veezoo will not use the filter, allowing you to query historical data.

Best practice: Move the SQL to dbt

As a best practice, we recommend moving the SQL used for the virtual table to dbt. This makes it easier to maintain in the long term. To do this, create a new dbt model for the historized_reviews table with the is_most_recent flag, and replace the virtual table in Veezoo with a reference to the new table generated by the dbt model.