Knowledge Graph Sync
Why keep your Knowledge Graph synchronized?
The Knowledge Graph is a semantic representation of your data and, because of that, it needs to know the values (entities) in your data.
When we talk about synchronizing the Knowledge Graph, we mean keeping these database values up-to-date, not the database schema.
What does "synchronization" mean?
To understand the importance of synchronization, let's consider an example. You have a table in your database, where each row represents an order. The existing data might look something like this:
order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
---|---|---|---|---|---|---|---|---|
1 | 1 | John Doe | 2 | Jeans | Comfortable blue denim jeans that are perfect for everyday wear | Clothing | 50 | 2023-01-01 |
2 | 2 | Jane Smith | 1 | T-shirt | Simple white cotton t-shirt that pairs well with any outfit | Clothing | 20 | 2023-01-02 |
3 | 1 | John Doe | 3 | Watch | Durable stainless steel watch with a modern design | Accessories | 150 | 2023-01-03 |
Let's say your data gets updated, and a new row is added where a new product (e.g. a "Dress") is sold in a new category (e.g. "Formal Wear").
order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
---|---|---|---|---|---|---|---|---|
4 | 3 | Bob Brown | 4 | Dress | Elegant red silk evening gown with beautiful beading and a dramatic silhouette. Ideal for formal occasions. | Formal Wear | 100 | 2023-01-04 |
Without a sync, Veezoo will not understand the words 'Dress' or 'Formal Wear' when you mention them in a question.
You would want to synchronize your Knowledge Graph so that it is aware of the new product "Dress" and the new category "Formal Wear". This is necessary to allow Veezoo to understand questions related to these new entities, e.g. "How many orders of Dresses were there this year?"
In contrast, if another row is added where an existing customer buys an existing product, like so:
order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
---|---|---|---|---|---|---|---|---|
5 | 1 | John Doe | 1 | T-shirt | Simple white cotton t-shirt that pairs well with any outfit | Clothing | 20 | 2023-01-04 |
In this case, there are no new entities that the Knowledge Graph needs to be aware of. So even if no sync has run since the new order came in, Veezoo will still understand a question about 'T-shirt' or 'Clothing'.
In other words, the Knowledge Graph is used to understand a question, not to answer it. The actual answers to your questions are computed based on the most recent state of your database, using SQL.
The Knowledge Graph will contain the individual distinct values of a database column (e.g. your products), but not all the rows in your database (e.g. all the transactions from these products and the revenue, etc).
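The distinction between rows and distinct values can be made concrete with a small sketch. It is only an illustration using Python's built-in sqlite3 module and the example orders above, not a description of how Veezoo performs a sync; the table layout, the helper `distinct_values`, and the variable names are assumptions made for this example.

```python
import sqlite3

# Rows from the example tables above, reduced to the columns that matter here:
# (order_id, customer_name, product_name, category, price)
ORDERS = [
    (1, "John Doe", "Jeans", "Clothing", 50),
    (2, "Jane Smith", "T-shirt", "Clothing", 20),
    (3, "John Doe", "Watch", "Accessories", 150),
]
ORDER_4 = (4, "Bob Brown", "Dress", "Formal Wear", 100)  # new product and category
ORDER_5 = (5, "John Doe", "T-shirt", "Clothing", 20)     # only existing values

ALLOWED_COLUMNS = {"product_name", "category"}


def distinct_values(conn, column):
    """Return the set of distinct values in one column of the orders table."""
    # The column name is checked against a fixed allow-list before it is
    # placed into the SQL text.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    rows = conn.execute(f"SELECT DISTINCT {column} FROM orders")
    return {value for (value,) in rows}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_name TEXT, "
    "product_name TEXT, category TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDERS)

# What a sync at this point would capture: distinct values, not rows.
synced_products = distinct_values(conn, "product_name")
synced_categories = distinct_values(conn, "category")

# Order 4 introduces values that the earlier sync never saw.
conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDER_4)
missing_products = distinct_values(conn, "product_name") - synced_products
missing_categories = distinct_values(conn, "category") - synced_categories

# Simulate a new sync, then add order 5: a new row, but no new distinct value.
synced_products = distinct_values(conn, "product_name")
conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDER_5)
missing_after_order_5 = distinct_values(conn, "product_name") - synced_products

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(missing_products)        # {'Dress'}
print(missing_categories)      # {'Formal Wear'}
print(missing_after_order_5)   # set()
print(row_count, len(synced_products))  # 5 4
```

Five orders yield only four distinct products, and only order 4 changes the set of values that would need to be known to understand a question.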
What should you sync?
Because the purpose of a sync is to keep entities up to date, you can only sync classes in Veezoo. Not every class is worth syncing, however.
Following the example above, here is what would make sense to sync:
- Products: If new products are being added, you'd want to sync the Product class to reflect these changes. This way, when you ask Veezoo "How many orders of Dresses did we sell last month?" it will be able to understand and answer the question correctly, even for newly added products.
- Categories: If new product categories are being introduced, the Category class should be synced. This will allow Veezoo to understand questions like "How much revenue did we generate from the Clothing category last year?"
- (Potentially) Customers: If you're interested in asking questions related to specific customers, then syncing the Customer class can be useful. This allows Veezoo to understand queries like "What did John Doe purchase last month?". If there are millions of customers though, Veezoo does not support syncing them, and you may want to ask using a string or the onto.ID pattern instead.
Here is what you CANNOT sync or MAY NOT want to sync:
- Price: Since it is not a class, you cannot sync Price. Questions like "What is the total revenue from T-shirts last month?" don't require the Knowledge Graph to know the individual prices. The calculation of revenue happens at the level of the database, not at the level of the Knowledge Graph.
- Product Description: The description column does not need to be synced because it typically doesn't contribute to the understanding of a question. The Knowledge Graph is used to understand questions and map them to the appropriate entities, and a product's description wouldn't typically be part of a question posed to Veezoo. The best practice is also to model Description as a string and not a class.
- Order Date: The order_date column cannot be synced because it is not a class and doesn't play a role in understanding the questions.
- Order ID: The Orders class is usually not synced, because users often don't care about asking questions about specific Order IDs. Even if they want to do so, you should follow the onto.ID pattern instead.
How do I synchronize it?
Manual Sync
By default, Veezoo does not sync the Knowledge Graph automatically.
So the most common way to sync it is manually:
- In Veezoo Studio, open the file for the class you want to sync (under knowledge-base/classes/...). You can get there quickly by going to the Knowledge Graph sidebar, hovering over the class and clicking "Open in Studio" in the info panel.
- You will notice a sync icon in the top bar of the file.
- Click on it and select the classes you want to sync.
- Wait... and that's it.
Scheduled Sync
If new individual values are added regularly, you may want to schedule your sync to run at a certain frequency.
- To do this, go to the Synchronization view in Studio.
- Click on the + SCHEDULE button.
- Select when the sync should start and how frequently it should run (once, every hour, every day, every week).
- Now you can choose which concepts should be synced or whether you want all concepts with an explicit sync policy set to be synced (see next section).
Scheduled Sync with Sync Policy
To schedule a sync for all concepts that have an explicit sync policy, you will need to:
- Open Veezoo Studio and open the file for each class whose sync you want to schedule.
- Inside each class you want to sync, add a sync_policy:
  - If you want to keep changes you may have made to the entities (e.g. synonyms, renaming), add:
    sync_policy: "merge"
  - If you want to always replace the entities with the new ones, add:
    sync_policy: "replace"
kb {
    class Product {
        name.en: "Product"
        from_table: product
        sql: "${product.product_id}"
        name_sql.en: "${product.product_name}"
        extends: onto.Product
        sync_policy: "replace"
        class Category {
            name.en: "Category"
            sync_policy: "merge"
            sql: "${product.category}"
        }
        class Sub_Category {
            name.en: "Sub Category"
            sql: "${product.sub_category}"
        }
    }
}
  - If you want a class to have a name_sql that is shown in the queries, but without syncing its entities because they contain too many values, add:
    sync_policy: "ignore"
kb {
    // There may be millions of customers in your database and these should not be indexed in Veezoo (we also have a hard limit here)
    // but you don't want customers to be displayed with just their id, but rather with "first_name last_name"
    // To achieve this, set sync_policy: "ignore"
    class Customer {
        name.en: "Customer"
        from_table: customer
        sql: "${customer.id}"
        name_sql.en: "${customer.first_name} || ' ' || ${customer.last_name}"
        sync_policy: "ignore"
        ...
    }
}
- Save and go to the Synchronization view again for your Knowledge Graph in Veezoo Studio.
- Follow the steps in the previous section, Scheduled Sync, but choose to sync "All concepts with a sync policy". If you only set sync_policy: "ignore", you don't need to do this.
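The difference between the "merge" and "replace" policies described above can be pictured with a small conceptual model. This is a sketch under stated assumptions, not Veezoo's implementation: the dictionary layout (a name plus a list of synonyms per entity) and the functions `merge_sync` and `replace_sync` are hypothetical. It also deliberately leaves out what happens to entities that have disappeared from the database, since that case is not covered here.

```python
def replace_sync(kg_entities, db_values):
    """Model of sync_policy "replace": rebuild every entity from the database."""
    # Manual edits in kg_entities (renames, synonyms) are intentionally dropped.
    return {key: {"name": name, "synonyms": []} for key, name in db_values.items()}


def merge_sync(kg_entities, db_values):
    """Model of sync_policy "merge": add new entities, keep edited ones as they are."""
    merged = dict(kg_entities)
    for key, name in db_values.items():
        if key not in merged:
            merged[key] = {"name": name, "synonyms": []}
    return merged


# Hypothetical Knowledge Graph state: the T-shirt entity was renamed by hand
# and given synonyms.
kg = {
    1: {"name": "Tee", "synonyms": ["T-shirt", "shirt"]},
    2: {"name": "Jeans", "synonyms": []},
}
# Current database values (product_id -> product_name), including a new product.
db = {1: "T-shirt", 2: "Jeans", 4: "Dress"}

after_merge = merge_sync(kg, db)
after_replace = replace_sync(kg, db)

print(after_merge[1])    # {'name': 'Tee', 'synonyms': ['T-shirt', 'shirt']}
print(after_replace[1])  # {'name': 'T-shirt', 'synonyms': []}
print(sorted(after_merge) == sorted(after_replace))  # True
```

In this model both policies pick up the new product "Dress"; they differ only in whether the manual rename and synonyms on an existing entity survive the sync.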