Configure a Knowledge Graph from Scratch
Overview
This tutorial will guide you through the necessary steps to configure a Knowledge Graph from scratch to provide a great self-service analytics experience.
Step 0: Preparation
First, you will have to select which data you will start with and which questions you want to answer.
Always start small, prioritizing the most important data and questions. You can always add more later!
We recommend starting with a single fact table (e.g. Orders / Order Items) and the related dimension tables (e.g. Customers, Products).
Create a list of 10 questions you want to be able to answer in Veezoo and make sure you have the data to answer them.
Once you have this mapped out, create a new Knowledge Graph in Studio, select the tables and columns needed (and only those needed!) and let Veezoo create a first Knowledge Graph for you.
Remember to ignore columns that are "technical" in nature, e.g. a timestamp column Updated_At may not really matter for the business user, but Created_At may be important.
Step 1: Naming
After you have selected the tables and columns and Veezoo created the first version of the Knowledge Graph, open the Knowledge Graph in the sidebar and click to see it visually.
The first thing we will want to address is the naming of the concepts in the Knowledge Graph.
We want to make sure we don't have any "technical names", e.g. 'Fact Orders', 'Dim Customers', 'Order Data', 'Created At', 'Sales Product', etc.
Instead, you want to "think semantically". What do these things mean? What do they map to in the business domain?
Example: instead of 'Fact Orders', call it 'Orders'. Instead of 'Created At', call it 'Creation Date' or 'Order Date' (if on the order).
Veezoo may provide you with suggestions that you can simply click to accept. But some things will require your domain knowledge.
So, go over the nodes in the Knowledge Graph (or the entries in the sidebar) and rename them for your business users.
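Before renaming by hand, it can help to pre-clean the obvious technical names. The following is a minimal Python sketch of that idea, not a Veezoo feature: the prefix list and the function name suggest_business_name are assumptions made for illustration.

```python
import re

# Hypothetical prefixes that often mark warehouse-style table names.
TECHNICAL_PREFIXES = ("fact", "dim", "stg")

def suggest_business_name(technical_name: str) -> str:
    """Turn a technical identifier into a candidate business-friendly name."""
    # Insert a space at camelCase boundaries, e.g. "customerName" -> "customer Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", technical_name)
    # Split on underscores and whitespace, dropping empty pieces.
    words = [w for w in re.split(r"[_\s]+", spaced) if w]
    # Drop a leading technical prefix such as "fact" or "dim".
    if len(words) > 1 and words[0].lower() in TECHNICAL_PREFIXES:
        words = words[1:]
    return " ".join(w.capitalize() for w in words)

for name in ["fact_orders", "Dim Customers", "created_at"]:
    print(name, "->", suggest_business_name(name))
```

A script like this only removes the mechanical noise. Deciding that 'Created At' should really be called 'Order Date' still requires domain knowledge.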
Step 2: Defining Relationships
The next step is to check the relationships between the nodes. The best way to do this is again by inspecting the Knowledge Graph visually.
If your Knowledge Graph has many disconnected "clusters", you will very likely have to define relationships for it.
Remember that a relationship needs the foreign key column to the other table in its sql, so you should always use db (database) resources in the sql instead of kb ones. Example: use sql: "${orders.customer_id}" or sql: "${db.database.schema.table.column}" instead of sql: "${kb.Customer}".
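If the graph is too large to inspect by eye, the disconnected "clusters" mentioned above can also be found programmatically. The sketch below is plain Python over a hypothetical list of node names and relationship pairs that you would have to export or write down yourself; it does not use any Veezoo API.

```python
from collections import defaultdict

def find_clusters(nodes, relationships):
    """Group nodes into connected clusters, treating relationships as undirected."""
    neighbours = defaultdict(set)
    for source, target in relationships:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# Hypothetical example: Shipping has no relationship to the other tables yet.
nodes = ["Orders", "Customers", "Products", "Shipping"]
relationships = [("Orders", "Customers"), ("Orders", "Products")]
clusters = find_clusters(nodes, relationships)
print(clusters)
```

More than one cluster in the result is a hint that a relationship is probably still missing.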
Step 3: Verifying Types
Common things to verify here are that attributes that are rather free-text are defined as string instead of class.
Examples would be:
- Name, Full Name, Last Name, First Name
- Descriptions, Comments, URLs, Email
- Certain IDs, e.g. if you have a table for Customers, you should model Customer as a class, but you may still want the Customer ID as a string attribute
Other common cases:
- Verify that booleans are actually modelled as boolean and not as class with values like 'Yes' / 'No'.
- Verify that dates are modelled as date. This may require a transformation/casting in the sql.
- Sometimes columns that represent years are imported as integer, but we recommend modelling them as date with a datetime_format set to YearFormat instead.
If you have a number/integer that should never be summed up, because it acts more like a categorical attribute, you may consider modelling it as a class instead or adding at least a tag: KB_NotSummable. One example would be things like 'Zip Code' or certain (personal) identification numbers.
For other numbers that make no sense to sum up but can reasonably be averaged, consider adding default_aggregation: average.
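The checks in this step can be turned into a rough pre-screening script. This is a heuristic sketch in Python under stated assumptions: the function suggest_type, the keyword lists and the year range are all made up for illustration, and the returned labels only mirror the type names used above.

```python
def suggest_type(column_name, sample_values):
    """Suggest a type for a column from its name and a few sample values (heuristic)."""
    name = column_name.lower()
    values = [v for v in sample_values if v is not None]
    as_text = {str(v).strip().lower() for v in values}

    # Yes/No style columns are better modelled as boolean than as class.
    if as_text and as_text <= {"yes", "no", "true", "false", "y", "n"}:
        return "boolean"
    # Integer columns that hold years are better modelled as date.
    if values and "year" in name and all(
        isinstance(v, int) and 1900 <= v <= 2100 for v in values
    ):
        return "date"
    # Free-text attributes should be string rather than class.
    if any(hint in name for hint in ("name", "description", "comment", "url", "email")):
        return "string"
    # Identifier-like numbers such as zip codes should not be summed.
    if any(hint in name for hint in ("zip", "postal")):
        return "class"
    if values and all(isinstance(v, (int, float)) for v in values):
        return "number"
    return "class"

print(suggest_type("is_active", ["Yes", "No"]))
print(suggest_type("order_year", [2021, 2022]))
print(suggest_type("zip_code", [8001, 8002]))
```

Treat the output as a list of columns to double-check, not as a final decision. A name-based heuristic will misclassify some columns.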
Step 4: Removing Ambiguity
The Knowledge Graph works best if you remove ambiguity, which is why we prefer working with normalized/dimensional modelling.
Look at your Knowledge Graph and see if you have attributes that are repeated across different tables but actually mean the same thing.
Example:
You have a dataset with Orders, Customers and Products. But in the Orders table, you again see the Customer Name, Customer Age and Customer Country, apart from the Customer ID. In this case, you only need the relationship between Orders and Customers, using the Customer ID foreign key. You can manually remove the redundant attributes in Studio (or set them to hidden: true).
Sometimes though, you may not have a dedicated table with the values, but still they appear in multiple tables. Example: You have Countries in different tables (e.g. Customer, Orders, Shipping), but no dim_countries table. Or you have Status values that reappear consistently in different tables.
You can still normalize it in your Knowledge Graph. Just follow the steps here: Normalizing the KG.
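To spot candidates for removal or normalization, you can list the column names that occur in more than one table. The following Python sketch assumes a hypothetical dictionary of table names to column names; it is an illustration, not part of Veezoo.

```python
from collections import defaultdict

def find_repeated_columns(tables):
    """Return columns that appear in more than one table, with the tables containing them."""
    owners = defaultdict(list)
    for table, columns in tables.items():
        for column in columns:
            owners[column.lower()].append(table)
    return {column: names for column, names in owners.items() if len(names) > 1}

# Hypothetical schema matching the Orders / Customers / Products example.
tables = {
    "orders": ["order_id", "customer_id", "customer_name", "customer_country"],
    "customers": ["customer_id", "customer_name", "customer_country", "customer_age"],
    "products": ["product_id", "product_name"],
}
repeated = find_repeated_columns(tables)
for column, names in repeated.items():
    print(column, "appears in", names)
```

Here customer_id shows up as well, which is expected because it is the foreign key. The other repeated columns are the redundant attributes to remove or hide.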
Step 5: Adding Synonyms
In the Knowledge Graph, we can define synonyms to aid Veezoo in understanding user questions involving business-specific terms. For instance, suppose we have a measure called Cost per Revenue. If a user asks 'What was our CPR last week?', Veezoo may not recognize what is meant by CPR. To address this, we can define CPR as a synonym for Cost per Revenue. As a result, Veezoo will correctly interpret questions about CPR, while continuing to refer to the measure as Cost per Revenue.
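Conceptually, a synonym is an extra lookup key that points to the same canonical name. The Python sketch below illustrates only that idea with hypothetical helper functions; it does not show how Veezoo stores or matches synonyms internally.

```python
# Canonical measure name mapped to its synonyms (example from the text above).
SYNONYMS = {
    "Cost per Revenue": ["CPR"],
}

def build_lookup(synonyms):
    """Map every known term (canonical or synonym, lower-cased) to the canonical name."""
    lookup = {}
    for canonical, alternatives in synonyms.items():
        for term in [canonical, *alternatives]:
            lookup[term.lower()] = canonical
    return lookup

def resolve(term, lookup):
    """Return the canonical name for a term, or None if the term is unknown."""
    return lookup.get(term.strip().lower())

lookup = build_lookup(SYNONYMS)
print(resolve("CPR", lookup))
```

The question may use CPR, but the resolved name is always the canonical one, which matches the behaviour described above.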
Step 6: Defining Defaults
Default Date
If a concept has multiple timestamps associated with it, questions about points in time may be ambiguous. For instance, an Order can have an Order Date, a Shipping Date, and a Payment Date. In this case, when Veezoo is asked 'Show me Orders last month', it might not be clear which date is meant. To clarify this, we can define a default date for a concept as follows.
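The effect of a default date can be illustrated with a small Python sketch. This is not Veezoo configuration syntax: the dictionaries and the function date_for_time_filter are hypothetical and only model how a time filter is resolved when the question does not name a specific date.

```python
# Hypothetical data: the date attributes of each concept and the chosen default.
DATE_ATTRIBUTES = {"Order": ["Order Date", "Shipping Date", "Payment Date"]}
DEFAULT_DATE = {"Order": "Order Date"}

def date_for_time_filter(concept, mentioned_date=None):
    """Pick the date attribute that a time filter such as 'last month' applies to."""
    candidates = DATE_ATTRIBUTES.get(concept, [])
    if mentioned_date is not None:
        # The question names a date explicitly, so the default is not needed.
        if mentioned_date not in candidates:
            raise ValueError(f"{concept} has no date attribute {mentioned_date!r}")
        return mentioned_date
    if len(candidates) == 1:
        # Only one date exists, so there is nothing to disambiguate.
        return candidates[0]
    # Several dates exist: fall back to the configured default (None if unset).
    return DEFAULT_DATE.get(concept)

print(date_for_time_filter("Order"))
print(date_for_time_filter("Order", "Shipping Date"))
```

With the default in place, 'Show me Orders last month' filters on Order Date, while a question that mentions the Shipping Date explicitly still uses that one.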
Default Filters
Step 7: Defining Measures and new Dimensions
Step 8: Improving Explainability
Adding Descriptions
Adding descriptions helps with Data Literacy, as business users get presented with definitions in multiple places:
- Sidebar
- Knowledge Graph Visual Mode
- AutoComplete when typing
- Answers (e.g. title and filters)
Check how to add descriptions here: description
Adding Display Names
Especially for relationships, it helps to add a custom display_name that explains what the relationship is about.
Example: A relationship between Customer and Country could mean multiple things like 'lives in' or 'born in'.
Check display_name, relationships and to_name to learn more.
Step 9: Improving How Answers are Displayed
Displaying only relevant fields
In Veezoo Studio, you can define which attributes of a concept are displayed by default. This is particularly useful when dealing with concepts that have numerous attributes, some of which are more relevant than others. For instance, when asking about Customers, you might prefer to display attributes like name and address by default and hide less relevant details such as customer id.
Displaying on Maps
Veezoo can visualize data associated with locations on a map. Here is how you configure it:
Displaying Links
Integrating direct links from Veezoo answers to your CRM or other external applications simplifies the workflow of users. See below how this feature can be configured in Veezoo Studio.
Step 10: Improving On-Boarding Experience
Creating and Sharing Boards
Setting Up a Discovery Page
Customizing the Colors of Veezoo
If you want Veezoo to visually better match the style of your company, you can easily adjust the look by changing the colour scheme or replacing the Veezoo logo with your own.
Maintenance & Observability
Identifying Issues with MetaVeezoo
Meta Veezoo allows administrators to observe and investigate the usage of Veezoo by business users. For instance, it allows inspecting commonly asked questions, detecting recurring challenges, and identifying missing configuration.
Development Branches
It's essential to maintain a continuous workflow for your business users even while you are actively enhancing your models or integrating new functionalities. To facilitate this, Veezoo enables the creation of development branches for your knowledge graph. This allows creators to independently work on the knowledge graph, without affecting the live version currently in use. Additionally, Veezoo has testing capabilities to ensure that your changes do not disrupt any existing boards or answers.