Working with historized tables
In some cases, you might have historized tables that store historical data, and you want to query the most recent data by default. However, you might also want to query the data over time or for a specific time period. In this tutorial, we'll show you how to handle historized tables using Veezoo's virtual_table
and default_filter
.
Example scenario
Let's consider a historized table named historized_reviews
that contains review data for various platforms. By default, you want to query the latest review data, for example, "What's our review star rating on Play Store?".
However, you might also want to query the data over time or for a specific time period, such as "How did our review star rating develop over time for Play Store?" or "What was our review score in the second week of Feb 2023 for each platform?".
Step 1: Draft a SQL that adds a flag for the most recent data
First, you need to create a flag that indicates the most recent data for each platform. You can do this by adding a query with a CTE (Common Table Expression) that ranks the records by the load_timestamp
:
WITH ranked_records AS (
SELECT DISTINCT
r."platform" as "platform",
r."load_timestamp" as "load_timestamp",
r."review_star" as "review_star",
ROW_NUMBER() OVER (
PARTITION BY r."platform"
ORDER BY r."load_timestamp" DESC
) as row_num
FROM
"your_database"."your_schema"."historized_reviews" r
)
SELECT
rr."platform",
rr."load_timestamp",
rr."review_star",
CASE
WHEN rr.row_num = 1 THEN 1
ELSE 0
END as is_most_recent
FROM
ranked_records rr
This query will have a new column is_most_recent
that has a value of 1 for the most recent record and 0 for others.
Now that you have the flag for the most recent data, you need to set up the logic in Veezoo.
Step 2: Change our historized_reviews to be a virtual_table
First, we will make historized_reviews
be a virtual_table
instead, including the new is_most_recent
flag.
To do that you will first need to switch the sidebar in Studio to show the file tree instead:
Then open the table definition file for the historized_reviews
table.
We will change from this:
db.storage.db.your_schema {
table historized_reviews {
identifier: "historized_reviews"
...
}
}
...to include the virtual_table
and virtual_table_sql
instead:
db.storage.db.your_schema {
virtual_table historized_reviews {
identifier: "historized_reviews"
virtual_table_sql:
"""
WITH ranked_records AS (
SELECT DISTINCT
r."platform" as "platform",
r."load_timestamp" as "load_timestamp",
r."review_star" as "review_star",
ROW_NUMBER() OVER (
PARTITION BY r."platform"
ORDER BY r."load_timestamp" DESC
) as row_num
FROM
"your_database"."your_schema"."historized_reviews" r
)
SELECT
rr."platform",
rr."load_timestamp",
rr."review_star",
CASE
WHEN rr.row_num = 1 THEN 1
ELSE 0
END as is_most_recent
FROM
ranked_records rr"""
// manually added
column is_most_recent {
identifier: "is_most_recent"
primitive_type: "BIT"
}
...
}
}
Step 3: Add a boolean for the is_most_recent
flag
kb {
class Historized_Reviews {
...
boolean is_Most_Recent {
name: "most recent"
sql: "${historized_reviews.is_most_recent}"
}
}
}
Step 4: Add a default_filter with a condition
Add a default_filter
with a condition that only applies the filter when the Load_Timestamp
is not being used in the question:
kb {
class Historized_Reviews {
...
default_filter: {
filter: "this.is_Most_Recent = true"
// this means: only use this filter when Load_Timestamp is not being used in the
// question (e.g., 'review star over time' should not use the filter)
condition: "@missing { this.Load_Timestamp }"
}
...
}
}
Now, Veezoo will handle historized tables with the most recent data by default. When you ask questions involving time, such as "How did our review star rating develop over time for Play Store?", Veezoo will not use the filter, allowing you to query historical data.
Best practice: Move the SQL to dbt
As a best practice, we recommend moving the SQL used for the virtual table to dbt. This makes it easier to maintain in the long term. To do this, create a new dbt model for the historized_reviews
table with the is_most_recent
flag, and replace the virtual table in Veezoo with a reference to the new table generated by the dbt model.