
Configure a Knowledge Graph from Scratch

Overview

This tutorial will guide you through the necessary steps to configure a Knowledge Graph from scratch to provide a great self-service analytics experience.

Step 0: Preparation

First, decide which data to start with and which questions you want to answer.

Always start small, prioritizing the most important data and questions. You can always add more later!

We recommend starting with a single fact table (e.g. Orders / Order Items) and the related dimension tables (e.g. Customers, Products).

Create a list of 10 questions you want to be able to answer in Veezoo and make sure you have the data to answer them.

Once you have this mapped out, create a new Knowledge Graph in Studio, select the tables and columns needed (and only those needed!) and let Veezoo create a first Knowledge Graph for you.

Remember to ignore columns that are "technical" in nature, e.g. a timestamp column Updated_At may not really matter for the business user, but Created_At may be important.

Step 1: Naming

After you have selected the tables and columns and Veezoo created the first version of the Knowledge Graph, open the Knowledge Graph in the sidebar and click to see it visually.

The first thing we will want to address is the naming of the concepts in the Knowledge Graph.

We want to make sure we don't have any "technical names", e.g. 'Fact Orders', 'Dim Customers', 'Order Data', 'Created At', 'Sales Product', etc.

Instead, you want to "think semantically". What do these things mean? What do they map to in the business domain?

Example: instead of 'Fact Orders', call it 'Orders'. Instead of 'Created At', call it 'Creation Date' or 'Order Date' (if on the order).

[Figures: a "Don't" example (instead of this) followed by a "Do" example (do this)]

Veezoo may provide you with suggestions that you can simply click to accept. But some things will require your domain knowledge.

So, go over the nodes in the Knowledge Graph (or the entries in the sidebar) and rename them for your business users.
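As a starting point for the renaming pass, a small script can propose cleaner labels for obviously technical names. The Python sketch below is only an illustration and not part of Veezoo: the function name `suggest_name` and the prefix list are assumptions to adapt to your own warehouse conventions.

```python
import re

# Hypothetical list of technical prefixes; adjust to your own conventions.
TECHNICAL_PREFIXES = ("fact", "dim", "stg")

def suggest_name(technical_name: str) -> str:
    """Suggest a business-friendly label for a technical table or column name."""
    # Split on underscores and whitespace: "dim_customers" -> ["dim", "customers"].
    words = [w for w in re.split(r"[_\s]+", technical_name.strip()) if w]
    # Drop a leading technical prefix such as "fact" or "dim".
    if len(words) > 1 and words[0].lower() in TECHNICAL_PREFIXES:
        words = words[1:]
    return " ".join(w.capitalize() for w in words)

candidates = ["Fact Orders", "dim_customers", "created_at"]
suggestions = {name: suggest_name(name) for name in candidates}
```

A script like this only removes mechanical noise. Deciding that 'Created At' should really be called 'Order Date' still requires domain knowledge.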

Step 2: Defining Relationships

The next step is to check the relationships between the nodes. The best way to do this is again by inspecting the Knowledge Graph visually.

If your Knowledge Graph has many disconnected "clusters", you will very likely have to define relationships for it.
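If the graph is too large to inspect comfortably by eye, the same check can be approximated offline. The following Python sketch assumes you can export a list of tables and the relationships already defined between them (both lists here are made-up examples) and groups the tables into connected clusters.

```python
from collections import defaultdict

def find_clusters(tables, relationships):
    """Group tables into connected clusters using the given relationships."""
    neighbours = defaultdict(set)
    for left, right in relationships:
        neighbours[left].add(right)
        neighbours[right].add(left)

    seen, clusters = set(), []
    for table in tables:
        if table in seen:
            continue
        # Depth-first walk over everything reachable from this table.
        stack, cluster = [table], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(neighbours[current] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters

# Example data (assumed): Shipping has no relationship defined yet.
tables = ["Orders", "Customers", "Products", "Shipping"]
relationships = [("Orders", "Customers"), ("Orders", "Products")]
clusters = find_clusters(tables, relationships)
```

More than one cluster in the result suggests that a relationship is still missing, here between Shipping and the rest of the model.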

Remember that the sql of a relationship must reference the foreign key column that points to the other table, so always use db (database) resources in the sql rather than kb ones. Example: use sql: "${orders.customer_id}" or sql: "${db.database.schema.table.column}" instead of sql: "${kb.Customer}"
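This rule is easy to check mechanically. The Python sketch below is a hypothetical lint step, not a Veezoo feature: it assumes you have collected the sql expressions of your relationships into a dictionary and flags every expression that contains a "${kb. ...}" placeholder.

```python
import re

# Matches a "${kb....}" placeholder anywhere in an sql expression.
KB_REFERENCE = re.compile(r"\$\{\s*kb\.[^}]*\}")

def uses_kb_reference(sql: str) -> bool:
    """Return True if the sql expression references a kb resource."""
    return KB_REFERENCE.search(sql) is not None

# Hypothetical relationship labels mapped to their sql expressions.
relationship_sql = {
    "Order -> Customer": "${orders.customer_id}",
    "Order -> Product": "${kb.Product}",
}
flagged = [name for name, sql in relationship_sql.items() if uses_kb_reference(sql)]
```

Here only the second relationship is flagged, because it points at a kb resource instead of a foreign key column.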

Step 3: Verifying Types

A common thing to verify here is that free-text attributes are defined as string rather than class.

Examples would be:

  • Name, Full Name, Last Name, First Name
  • Descriptions, Comments, URLs, Email
  • Certain IDs, e.g. if you have a table for Customers, you should model Customer as a class, but you may still want the Customer ID as a string attribute

Other common cases:

  1. Verify that booleans are actually modelled as boolean and not as class with values like 'Yes' / 'No'.
  2. Verify that dates are modelled as date. This may require a transformation/casting in the sql.
  3. Sometimes columns that represent years are imported as integer, but we recommend modelling them as date with a datetime_format set to YearFormat instead.

If you have a number/integer that should never be summed up because it acts more like a categorical attribute, consider modelling it as a class instead, or at least adding a tag: KB_NotSummable. Examples are 'Zip Code' or certain (personal) identification numbers.

For other numbers that make no sense to sum up but can reasonably be averaged, consider adding default_aggregation: average.
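Some of the checks in this step can be pre-screened with simple heuristics before you review the types by hand. The Python sketch below is an assumption-laden illustration: `suggest_type`, the hint list and the sample values are invented for this example, and the returned strings only mirror the type names used above.

```python
# Name fragments that usually indicate free text (assumed list, extend as needed).
FREE_TEXT_HINTS = ("name", "description", "comment", "url", "email")
BOOLEAN_VALUES = {"yes", "no", "true", "false", "0", "1"}

def suggest_type(column_name: str, sample_values: list) -> str:
    """Suggest a type for a column from its name and a few sample values."""
    name = column_name.lower()
    values = {str(v).strip().lower() for v in sample_values}

    # Yes/No style columns read better as boolean than as class.
    if values and values <= BOOLEAN_VALUES:
        return "boolean"
    # Free-text attributes are better modelled as string than as class.
    if any(hint in name for hint in FREE_TEXT_HINTS):
        return "string"
    # Four-digit years imported as integers are candidates for date.
    if "year" in name and values and all(v.isdigit() and len(v) == 4 for v in values):
        return "date"
    return "unchanged"

print(suggest_type("is_active", ["Yes", "No"]))
print(suggest_type("last_name", ["Smith", "Lee"]))
print(suggest_type("fiscal_year", [2021, 2022]))
```

Treat the output as a list of candidates to review, not as a decision: a quantity column that happens to contain only 0 and 1 in the sample would be flagged as boolean as well.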

Step 4: Removing Ambiguity

The Knowledge Graph works best if you remove ambiguity, which is why we prefer working with normalized/dimensional modelling.

Look at your Knowledge Graph and see if you have attributes that are repeated in different tables but actually mean the same thing.

Example:

You have a dataset with Orders, Customers and Products. In the Orders table, however, the Customer Name, Customer Age and Customer Country appear again alongside the Customer ID. In this case, you only need the relationship between Orders and Customers, using the Customer ID foreign key. You can manually remove the redundant attributes in Studio (or set them to hidden: true).

Sometimes, though, values appear in multiple tables without a dedicated table holding them. Example: You have Countries in different tables (e.g. Customer, Orders, Shipping), but no dim_countries table. Or you have Status values that reappear consistently in different tables.

You can still normalize it in your Knowledge Graph. Just follow the steps here: Normalizing the KG.
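To find candidates for this kind of clean-up, you can list the column names that occur in more than one table. The Python sketch below assumes a hand-written `schema` dictionary (table name to column names) made up for this example; in practice you would fill it from your database's metadata.

```python
from collections import defaultdict

def repeated_columns(schema: dict) -> dict:
    """Map each column name that occurs in several tables to those tables."""
    tables_by_column = defaultdict(list)
    for table, columns in schema.items():
        for column in columns:
            tables_by_column[column.lower()].append(table)
    return {c: t for c, t in tables_by_column.items() if len(t) > 1}

# Assumed example schema, following the Orders / Customers / Shipping example above.
schema = {
    "orders": ["order_id", "customer_id", "customer_name", "country"],
    "customers": ["customer_id", "customer_name", "country"],
    "shipping": ["shipping_id", "order_id", "country"],
}
repeats = repeated_columns(schema)
```

Foreign keys such as customer_id are expected to repeat. The interesting entries are descriptive columns such as customer_name (redundant next to the Customers relationship) and country (a candidate for normalization).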

Step 5: Adding Synonyms

In the Knowledge Graph, we can define synonyms to help Veezoo understand user questions involving business-specific terms. For instance, suppose we have a measure called Cost per Revenue. If a user asks 'What was our CPR last week?', Veezoo may not recognize what is meant by CPR. To address this, we can define CPR as a synonym for Cost per Revenue. As a result, Veezoo will correctly interpret questions about CPR, while continuing to refer to the measure as Cost per Revenue.
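Conceptually, a synonym is an additional lookup key that resolves to the same concept while the displayed name stays unchanged. The Python sketch below illustrates only this idea; the data structure and the function `build_lookup` are assumptions and say nothing about how Veezoo stores or matches synonyms internally.

```python
# Assumed representation: each measure has one display name and optional synonyms.
measures = [{"name": "Cost per Revenue", "synonyms": ["CPR"]}]

def build_lookup(measures):
    """Map every name and synonym (case-insensitively) to the display name."""
    lookup = {}
    for measure in measures:
        for label in [measure["name"], *measure["synonyms"]]:
            lookup[label.lower()] = measure["name"]
    return lookup

lookup = build_lookup(measures)
resolved = lookup.get("cpr")        # the term the user typed
unknown = lookup.get("roi")         # no synonym defined, so nothing is found
```

Both "CPR" and "Cost per Revenue" resolve to the same measure, and the answer is still labelled with the display name.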

Step 6: Defining Defaults

Default Date

If a concept has multiple timestamps associated with it, questions about points in time may be ambiguous. For instance, an Order can have an Order Date, a Shipping Date, and a Payment Date. In this case, when Veezoo is asked 'Show me Orders last month', it might not be clear which date is meant. To clarify this, we can define a default date for the concept.
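To illustrate the effect of a default date (this is not Veezoo's configuration syntax, which is not shown here), the Python sketch below filters a few made-up orders by a period. The mapping `DEFAULT_DATE`, the function `in_period` and the sample rows are all hypothetical.

```python
from datetime import date

# Hypothetical mapping from a concept to its default date attribute.
DEFAULT_DATE = {"Order": "order_date"}

orders = [
    {"id": 1, "order_date": date(2024, 5, 28), "shipping_date": date(2024, 6, 2)},
    {"id": 2, "order_date": date(2024, 6, 10), "shipping_date": date(2024, 6, 12)},
]

def in_period(rows, concept, start, end, date_attribute=None):
    """Filter rows by a date range, falling back to the concept's default date."""
    attribute = date_attribute or DEFAULT_DATE[concept]
    return [row for row in rows if start <= row[attribute] <= end]

june = (date(2024, 6, 1), date(2024, 6, 30))
by_default = in_period(orders, "Order", *june)
by_shipping = in_period(orders, "Order", *june, date_attribute="shipping_date")
```

With Order Date as the default, a question about June matches only order 2, whereas the same question interpreted by Shipping Date would match both orders. Fixing the default removes that ambiguity.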

Default Filters

Step 7: Defining Measures and new Dimensions

Step 8: Improving Explainability

Adding Descriptions

Adding descriptions helps with Data Literacy, as business users get presented with definitions in multiple places:

  1. Sidebar
  2. Knowledge Graph Visual Mode
  3. AutoComplete when typing
  4. Answers (e.g. title and filters)

Check how to add descriptions here: description

Adding Display Names

Especially for relationships, it helps to add a custom display_name that explains what the relationship is about.

Example: A relationship between Customer and Country could mean multiple things like 'lives in' or 'born in'.

Check display_name, relationships and to_name to learn more.

Step 9: Improving How Answers are Displayed

Displaying only relevant fields

In Veezoo Studio, you can define which attributes of a concept are displayed by default. This is particularly useful for concepts that have numerous attributes, some of which are more relevant than others. For instance, when asking about Customers, you might prefer to display attributes like name and address by default and hide less relevant details such as customer id.

Displaying on Maps

Veezoo can visualize data associated with locations on a map. Here is how you configure it:

Integrating direct links from Veezoo answers to your CRM or other external applications simplifies your users' workflow. See below how this feature can be configured in Veezoo Studio.

Step 10: Improving On-Boarding Experience

Creating and Sharing Boards

Setting Up a Discovery Page

Customizing the Colors of Veezoo

If you want Veezoo to better match the visual style of your company, you can easily adjust the look by changing the color scheme or replacing the Veezoo logo with your own.

Maintenance & Observability

Identifying Issues with MetaVeezoo

Meta Veezoo allows administrators to observe and investigate the usage of Veezoo by business users. For instance, it allows inspecting commonly asked questions, detecting recurring challenges, and identifying missing configuration.

Development Branches

It's essential to maintain a continuous workflow for your business users even while you are actively enhancing your models or integrating new functionalities. To facilitate this, Veezoo enables the creation of development branches for your Knowledge Graph. This allows creators to work on the Knowledge Graph independently, without affecting the live version currently in use. Additionally, Veezoo has testing capabilities to ensure that your changes do not disrupt any existing boards or answers.