
Knowledge Graph Sync

Why keep your Knowledge Graph synchronized?

The Knowledge Graph is a semantic representation of your data and, because of that, it needs to know the values (entities) in your data.

When we talk about synchronizing the Knowledge Graph, we mean keeping these database values up-to-date, not the database schema.

What does "synchronization" mean?

To understand the importance of synchronization, let's consider an example. You have a table in your database, where each row represents an order. The existing data might look something like this:

| order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | John Doe | 2 | Jeans | Comfortable blue denim jeans that are perfect for everyday wear | Clothing | 50 | 2023-01-01 |
| 2 | 2 | Jane Smith | 1 | T-shirt | Simple white cotton t-shirt that pairs well with any outfit | Clothing | 20 | 2023-01-02 |
| 3 | 1 | John Doe | 3 | Watch | Durable stainless steel watch with a modern design | Accessories | 150 | 2023-01-03 |

Let's say your data gets updated, and a new row is added where a new product (e.g. a "Dress") is sold in a new category (e.g. "Formal Wear").

| order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
|---|---|---|---|---|---|---|---|---|
| 4 | 3 | Bob Brown | 4 | Dress | Elegant red silk evening gown with beautiful beading and a dramatic silhouette. Ideal for formal occasions. | Formal Wear | 100 | 2023-01-04 |

Without syncing, Veezoo will not understand if you mention the word 'Dress' or 'Formal Wear' in a question.

You would want to synchronize your Knowledge Graph so that it is aware of the new product "Dress" and the new category "Formal Wear". This is necessary to allow Veezoo to understand questions related to these new entities, e.g. "How many orders of Dresses were there this year?"

In contrast, suppose another row is added in which an existing customer buys an existing product:

| order_id | customer_id | customer_name | product_id | product_name | product_description | category | price | order_date |
|---|---|---|---|---|---|---|---|---|
| 5 | 1 | John Doe | 1 | T-shirt | Simple white cotton t-shirt that pairs well with any outfit | Clothing | 20 | 2023-01-04 |

In this case, there are no new entities that the Knowledge Graph needs to be aware of. So even if a sync did not run since the new order came in, Veezoo will still understand a question about 'T-shirt' or 'Clothing'.

In other words, the Knowledge Graph is used to understand a question, not to answer it. The actual answers to your questions are computed based on the most recent state of your database, using SQL.

The Knowledge Graph will contain the individual distinct values of a database column (e.g. your products), but not all the rows in your database (e.g. all the transactions from these products and the revenue, etc).
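The difference between the two new orders can be sketched in a few lines of Python. This is only an illustration of the idea that a sync is about distinct column values, not about rows; it is not how Veezoo implements syncing. The in-memory SQLite table, the `distinct_values` helper, and the `known_products` set are all assumptions made for this example.

```python
import sqlite3

# Hypothetical in-memory copy of (part of) the orders table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_name TEXT, "
    "product_name TEXT, category TEXT, price INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John Doe", "Jeans", "Clothing", 50, "2023-01-01"),
        (2, "Jane Smith", "T-shirt", "Clothing", 20, "2023-01-02"),
        (3, "John Doe", "Watch", "Accessories", 150, "2023-01-03"),
    ],
)


def distinct_values(column):
    """Return the set of distinct values in one column of the orders table."""
    # The column name is interpolated only because it comes from this script,
    # never from user input.
    rows = conn.execute(f"SELECT DISTINCT {column} FROM orders").fetchall()
    return {row[0] for row in rows}


# Stand-in for what a sync would have captured before the new orders arrived.
known_products = distinct_values("product_name")

# Order 4 introduces a new product, so there is something new to sync.
conn.execute(
    "INSERT INTO orders VALUES (4, 'Bob Brown', 'Dress', 'Formal Wear', 100, '2023-01-04')"
)
new_after_order_4 = distinct_values("product_name") - known_products

# Pretend a sync ran here, then add order 5, which only repeats existing values.
known_products |= new_after_order_4
conn.execute(
    "INSERT INTO orders VALUES (5, 'John Doe', 'T-shirt', 'Clothing', 20, '2023-01-04')"
)
new_after_order_5 = distinct_values("product_name") - known_products

print(new_after_order_4)  # {'Dress'}
print(new_after_order_5)  # set()
```

Order 5 adds a row but no new distinct value, which is why questions about "T-shirt" keep working without another sync.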

What should you sync?

Because a sync is about entities, as described above, only classes can be synced in Veezoo. Not every class is worth syncing, however.

Following the example above, here is what would make sense to sync:

  1. Products: If new products are being added, you'd want to sync the Product class to reflect these changes. This way, when you ask Veezoo "How many orders of Dresses did we sell last month?" it will be able to understand and answer the question correctly, even for newly added products.

  2. Categories: If new product categories are being introduced, the Category class should be synced. This will allow Veezoo to understand questions like "How much revenue did we generate from the Clothing category last year?"

  3. (Potentially) Customers: If you're interested in asking questions related to specific customers, then syncing the Customer class can be useful. This allows Veezoo to understand queries like "What did John Doe purchase last month?". If there are millions of customers, though, Veezoo does not support syncing them, and you may want to ask about them using a string or the onto.ID pattern instead.

Here is what you CANNOT sync or MAY NOT want to sync:

  1. Price: Since it is not a class, you cannot sync Price. Questions like "What is the total revenue from T-shirts last month?" don't require the Knowledge Graph to know the individual prices. The calculation of revenue happens at the level of the database, not at the level of the Knowledge Graph.

  2. Product Description: The description column is not necessary to sync because it typically doesn't contribute to the understanding of a question. The Knowledge Graph is used to understand questions and map them to the appropriate entities. The description of a product wouldn't typically be part of a question posed to Veezoo. The best practice is also to model Description as a string and not a class.

  3. Order Date: The order_date column cannot be synced because it is not a class, and it doesn't play a role in understanding questions.

  4. Order ID: The Orders class is usually not synced, because often users won't care about asking questions about specific Order IDs. Even if they want to do so, you should follow the onto.ID pattern instead.
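The two lists above boil down to three checks: is the column modelled as a class, is the number of distinct values manageable, and do users actually mention the values in questions. The Python sketch below restates that reasoning as a rule of thumb. It is not a Veezoo API: the function name, the column facts, and in particular `ASSUMED_MAX_ENTITIES` are hypothetical, since this guide does not state the actual limit.

```python
# Hypothetical cap on how many distinct values are practical to sync.
# The real limit is not stated in this guide.
ASSUMED_MAX_ENTITIES = 100_000


def sync_recommendation(is_class, distinct_count, asked_by_name):
    """Return 'sync', 'skip', or 'cannot sync' for one column."""
    if not is_class:
        return "cannot sync"  # only classes can be synced
    if distinct_count > ASSUMED_MAX_ENTITIES:
        return "skip"  # too many values to index
    if not asked_by_name:
        return "skip"  # users rarely mention these values in questions
    return "sync"


# Facts about the example orders table (five rows in total).
columns = {
    # name: (modelled as a class?, distinct values, mentioned by name in questions?)
    "product_name": (True, 4, True),
    "category": (True, 3, True),
    "customer_name": (True, 3, True),
    "product_description": (False, 4, False),
    "price": (False, 4, False),
    "order_date": (False, 4, False),
    "order_id": (True, 5, False),
}

recommendations = {
    name: sync_recommendation(*facts) for name, facts in columns.items()
}

for name, verdict in recommendations.items():
    print(f"{name}: {verdict}")
```

With millions of customers, `sync_recommendation(True, 5_000_000, True)` returns "skip", which matches the advice to fall back to a string or the onto.ID pattern.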

How do I synchronize it?

Manual Sync

By default, Veezoo does not sync the Knowledge Graph on a schedule.

The most common way to sync it is therefore to do so manually:

  1. In Veezoo Studio, open the file for the class you want to sync (under knowledge-base/classes/...). You can get there quickly by going to the Knowledge Graph sidebar, hovering over the class, and clicking "Open in Studio" in the info panel.

[Screenshot: Open in Veezoo Studio]

  2. In the top bar of the file, you will see a sync icon.

[Screenshot: Sync Button]

  3. Click on it and select the classes you want to sync.

[Screenshot: Choose Classes]

  4. Wait for the sync to finish, and that's it.

[Screenshot: Success]

Scheduled Sync

If new values are added all the time, you may want to schedule your sync to run at a regular frequency.

  1. To do this, go to the Synchronization view in Studio.

[Screenshot: Synchronization]

  2. Click on the + SCHEDULE button.

  3. Select when the sync should start and how frequently it should run (once, every hour, every day, every week).

[Screenshot: Start and Frequency]

  4. Choose which concepts should be synced, or choose to sync all concepts that have an explicit sync policy set (see the next section).

[Screenshot: Concepts]
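To get a feel for what a start time and frequency imply, the short Python sketch below lists the run times that a schedule would produce. It only illustrates the four options named above; the `run_times` helper and the `INTERVALS` mapping are hypothetical and not part of Veezoo.

```python
from datetime import datetime, timedelta

# Intervals for the repeating frequencies offered when creating a schedule.
INTERVALS = {
    "every hour": timedelta(hours=1),
    "every day": timedelta(days=1),
    "every week": timedelta(weeks=1),
}


def run_times(start, frequency, count):
    """Return up to `count` run times for a schedule that begins at `start`."""
    if frequency == "once":
        return [start]  # a one-off sync has a single run
    step = INTERVALS[frequency]
    return [start + i * step for i in range(count)]


start = datetime(2023, 1, 4, 2, 0)
daily = run_times(start, "every day", 3)

for moment in daily:
    print(moment.isoformat())
```

A daily schedule starting at 02:00 on 2023-01-04 would therefore run again at 02:00 on the 5th and the 6th, so a product added during the day becomes known to the Knowledge Graph the following night.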

Scheduled Sync with Sync Policy

To have a scheduled sync cover every class that has an explicit sync policy, you will need to:

  1. Open Veezoo Studio and open the file of each class whose sync you want to schedule.

  2. Inside each class you want to sync, add a sync_policy:

    • If you want to keep changes you have made to the entities (e.g. synonyms, renaming), add: sync_policy: "merge"

    • If you want to always replace the entities with the new ones, add: sync_policy: "replace"


kb {
    class Product {
        name.en: "Product"

        from_table: product
        sql: "${product.product_id}"
        name_sql.en: "${product.product_name}"

        extends: onto.Product

        sync_policy: "replace"

        class Category {
            name.en: "Category"

            sync_policy: "merge"

            sql: "${product.category}"
        }

        class Sub_Category {
            name.en: "Sub Category"

            sql: "${product.sub_category}"
        }
    }
}

    • If you want a class to have a name_sql that is used for display, but you do not want its entities to be synced because there are too many of them, add: sync_policy: "ignore"

kb {

    // There may be millions of customers in your database and these should not be indexed in Veezoo (we also have a hard limit here),
    // but you don't want customers to be displayed with just their id, but rather with "first_name last_name".
    // To achieve this, set sync_policy: "ignore"
    class Customer {
        name.en: "Customer"

        from_table: customer
        sql: "${customer.id}"
        name_sql.en: "${customer.first_name} || ' ' || ${customer.last_name}"

        sync_policy: "ignore"

        ...
    }
}

  3. Save and go to the Synchronization view again for your Knowledge Graph in Veezoo Studio.

  4. Follow the steps in the previous section, Scheduled Sync, but choose to sync All concepts with a sync policy. If the only policy you set is sync_policy: "ignore", you don't need to do this.

[Screenshot: All concepts with a sync policy]
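The three policies can be compared with a small conceptual model in Python. This is a sketch of the behaviour described above, not Veezoo's implementation: the `apply_sync` function and the entity layout are hypothetical, and the handling of entities that have disappeared from the database is an assumption, since this guide does not cover that case.

```python
def apply_sync(existing, from_database, policy):
    """Return the entities a class would hold after one sync run.

    existing:      {entity_id: {"name": ..., "synonyms": [...]}} currently in the graph
    from_database: {entity_id: name} as currently read from the database
    """
    if policy == "ignore":
        return existing  # the class is never synced

    # Entities exactly as the database describes them, without manual edits.
    fresh = {
        eid: {"name": name, "synonyms": []} for eid, name in from_database.items()
    }
    if policy == "replace":
        return fresh  # manual edits are discarded
    if policy == "merge":
        # Keep the edited version where one exists and add entities that are new.
        # Assumption: entities no longer present in the database are dropped.
        return {eid: existing.get(eid, entity) for eid, entity in fresh.items()}
    raise ValueError(f"unknown sync policy: {policy}")


# "Jeans" was given a synonym by hand; "Dress" is new in the database.
existing = {2: {"name": "Jeans", "synonyms": ["denims"]}}
from_database = {2: "Jeans", 4: "Dress"}

merged = apply_sync(existing, from_database, "merge")
replaced = apply_sync(existing, from_database, "replace")
ignored = apply_sync(existing, from_database, "ignore")

print(merged[2]["synonyms"])    # ['denims']
print(replaced[2]["synonyms"])  # []
print(sorted(ignored))          # [2]
```

In this model, "merge" and "replace" both pick up the new "Dress" entity, but only "merge" keeps the hand-written synonym, and "ignore" leaves the class untouched.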