Welcome to our new article! 👋 We'll show how to quickly and effectively integrate dbt with GoodData using a set of Python scripts. In the previous article, How To Build a Modern Data Pipeline, we provided a guide on how to build a robust data pipeline that solves the typical problems analytics engineers face. This new article describes a more in-depth integration with dbt because, as we wrote in the article GoodData and dbt Metrics, we think dbt metrics are fine for simple use cases, but advanced analytics calls for a more robust tool like GoodData.
Even though our solution is tightly coupled with GoodData, we want to provide a general guide on how to integrate with dbt! Let's start 🚀.
First things first: why would you want to integrate with dbt at all? Before you start writing your own code, it is a good idea to research existing dbt plugins. It is a well-known fact that dbt has a very strong community with a lot of data professionals. Unless your use case is very exotic or proprietary to your solution, I would bet that a similar plugin already exists.
One example is worth a thousand words. A few months ago, we were developing our first prototype with dbt and ran into a problem with referential integrity constraints. We basically had two options:
- Write custom code to solve the problem.
- Find a plugin that solves the problem.
Fortunately, we found the dbt Constraints package, and the solution then became quite simple:
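The original snippet is not reproduced here, but as a minimal sketch, usage of the package typically amounts to installing it and declaring key tests in the schema file (the model and column names below are made up, and the version pin is illustrative):

```yaml
# packages.yml: pull in the dbt Constraints package (version is illustrative)
packages:
  - package: Snowflake-Labs/dbt_constraints
    version: 0.8.0

# models/schema.yml: declare keys as tests; the package then creates the
# corresponding constraints in the warehouse (names below are made up)
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - dbt_constraints.primary_key
  - name: orders
    columns:
      - name: customer_id
        tests:
          - dbt_constraints.foreign_key:
              pk_table_name: ref('customers')
              pk_column_name: customer_id
```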
Lesson learned: search for an existing solution before writing any code. If you still want to build your own integration with dbt, let's move on to the next section.
Implementation: How To Integrate With dbt?
In the following sections, we cover the most important aspects of the integration with dbt. If you want to explore the full implementation, check out the repository.
Setup
Before we start writing custom code, we need to do some setup. The first important step is to create a profile file:
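The file itself is not reproduced here; the following is a minimal sketch of what a dbt profiles.yml with separate dev and prod targets can look like, assuming a PostgreSQL warehouse (all names and values are placeholders):

```yaml
# profiles.yml: one profile, two targets, so every pipeline stage
# runs against the right database
dbt-gooddata:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: demo
      schema: output_stage_dev
    prod:
      type: postgres
      host: "{{ env_var('DBT_PROD_HOST') }}"
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: demo
      schema: output_stage
```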
It is basically a configuration file with the database connection details. The interesting part here is the split between dev and prod. If you explore the repository, you will find a CI/CD pipeline (described in How To Build a Modern Data Pipeline). The dev and prod environments ensure that each stage of the pipeline is executed against the right database.
The next step is to create a standard Python package. It allows us to run our proprietary code within the dbt environment.
The whole dbt-gooddata package is in GitLab. Within the package, we can then run commands like:
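The actual commands live in the repository; the invocations below are only a sketch of the idea, and the sub-command names are assumptions rather than the package's real CLI:

```bash
# Illustrative only: sub-command names are assumptions
python -m dbt_gooddata deploy_models   # generate the GoodData LDM from dbt models
python -m dbt_gooddata deploy_metrics  # convert dbt metrics to GoodData metrics
```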
Transformation
The transformation step was essential for our use case. The output of dbt is a set of materialized tables in the so-called output stage schema. The output stage schema is the point where GoodData connects, but in order to successfully start creating analytics (metrics, reports, dashboards), we first need to do a few things, such as connecting to the data source (the output stage schema) or, the most interesting part, converting dbt metrics to GoodData metrics.
Let's start with the basics. In GoodData, we have a concept called the Physical Data Model (PDM) that describes the tables of your database and represents how the actual data is organized and stored there. On top of the PDM, we also create a Logical Data Model (LDM), which is an abstract view of your data in GoodData. The LDM is a set of logical objects and their relationships that represent the data objects and relationships in your database through the PDM.
In simpler terms that are common in our industry: the PDM is tightly coupled to the database, while the LDM is tightly coupled to the analytics (GoodData). Almost everything you do in GoodData (metrics, reports) is based on the LDM. Why do we use the LDM concept at all? Imagine you change something in your database, for example the name of a column. If GoodData did not have the extra LDM layer, you would need to change the column name everywhere (in every metric, every report, etc.). With the LDM, you change a single property of the LDM, and the change is automatically propagated throughout your analytics. There are other benefits too, but we will not cover them here; you can read about them in the documentation.
Now that we have covered a little theory, let's look at the more interesting part: how do we create the PDM, LDM, metrics, etc. from the output stage schemas generated by dbt? First of all, the schema description is the ultimate source of truth for us:
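The schema file is not reproduced here; as a sketch, it could look as follows, where the block under meta.gooddata is the custom metadata mentioned below (the exact keys are assumptions):

```yaml
# models/schema.yml: a dbt schema annotated with GoodData metadata
# (keys under meta.gooddata are illustrative assumptions)
models:
  - name: orders
    columns:
      - name: order_id
        data_type: int
        meta:
          gooddata:
            ldm_type: primary_key
      - name: status
        data_type: varchar
        meta:
          gooddata:
            ldm_type: attribute
      - name: created_at
        data_type: timestamp
        meta:
          gooddata:
            ldm_type: date
```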
You can see that we use standard dbt properties like data_type, but we also introduced metadata that helps us convert things from dbt to GoodData. For this metadata, we created data classes that guide us through the application code:
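A minimal sketch of what such data classes can look like (the class and field names are assumptions, not the repository's actual code):

```python
# Data classes describing the GoodData metadata attached to dbt columns
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GoodDataLdmType(Enum):
    PRIMARY_KEY = "primary_key"
    REFERENCE = "reference"
    ATTRIBUTE = "attribute"
    FACT = "fact"
    DATE = "date"


@dataclass
class GoodDataColumnMeta:
    ldm_type: Optional[GoodDataLdmType] = None
    referenced_table: Optional[str] = None  # only used when ldm_type is REFERENCE
```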
These data classes are then used in the methods where we create LDM objects (for example, date datasets):
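A sketch of such a method, assuming the declarative LDM is built as plain dictionaries (the helper name make_date_datasets comes from the article; everything else is illustrative):

```python
# Turn every column flagged as ldm_type=date into a declarative date dataset
def make_date_datasets(tables: list[dict]) -> list[dict]:
    date_datasets = []
    for table in tables:
        for column in table["columns"]:
            meta = column.get("meta", {}).get("gooddata", {})
            if meta.get("ldm_type") == "date":
                date_datasets.append({
                    "id": f"{table['name']}.{column['name']}",
                    "title": column["name"],
                    "granularities": ["DAY", "WEEK", "MONTH", "QUARTER", "YEAR"],
                    "granularitiesFormatting": {
                        "titleBase": "",
                        "titlePattern": "%titleBase - %granularityTitle",
                    },
                })
    return date_datasets
```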
You can see that we work with the metadata, which helps us convert everything correctly. We then use the result of the make_date_datasets method, together with the other results, to create an LDM in GoodData through its API, or more precisely with the help of the GoodData Python SDK:
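A sketch of that final step with the gooddata-sdk package (the host, token, and workspace id are placeholders):

```python
# Push the declarative LDM to a GoodData workspace via the Python SDK
from gooddata_sdk import CatalogDeclarativeModel, GoodDataSdk

sdk = GoodDataSdk.create(host_="http://localhost:3000", token_="<api_token>")

# datasets would be built from the dbt schema description;
# date_datasets is the output of make_date_datasets above
declarative_model = CatalogDeclarativeModel.from_dict({
    "ldm": {
        "datasets": datasets,
        "dateInstances": date_datasets,
    }
})
sdk.catalog_workspace_content.put_declarative_ldm("demo_workspace", declarative_model)
```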
If you want to explore how we convert dbt metrics to GoodData metrics as well, you can check out the full implementation.
Big Picture
We understand that the previous chapter may have been a bit overwhelming. Before the demonstration, let's use one image that shows how it all works, for better understanding.
Demonstration: Generate Analytics From dbt
For the demonstration, we skip the extract part and start with the transformation, which means we need to run dbt:
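A standard dbt invocation is enough here; the profile directory and target are placeholders matching the setup above:

```bash
dbt run --profiles-dir profile --target dev
```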
The result is an output stage schema with the following structure:
Now we need to get this output into GoodData so we can start analyzing the data. Normally, you would need to perform a number of manual steps, either in the UI or through the API / GoodData Python SDK. Thanks to the integration described in the implementation section, only one command needs to be run:
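As above, the actual command lives in the repository; this invocation is only illustrative, with an assumed sub-command name:

```bash
# Illustrative only: the sub-command name is an assumption
python -m dbt_gooddata deploy_models
```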
Here are the logs from a successful run:
The final result is a successfully created Logical Data Model (LDM) in GoodData:
The last step is to deploy the dbt metrics as GoodData metrics. The command is similar to the previous one:
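Again, illustrative only, with an assumed sub-command name:

```bash
# Illustrative only: the sub-command name is an assumption
python -m dbt_gooddata deploy_metrics
```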
Here are the logs from a successful run:
Now we can check how a dbt metric was converted into a GoodData metric:
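To give an idea of what such a conversion can look like (the metric itself and all identifiers are made up), a simple dbt metric definition such as:

```yaml
# dbt metric (illustrative)
metrics:
  - name: revenue
    label: Revenue
    model: ref('orders')
    calculation_method: sum
    expression: amount
```

could end up as a GoodData MAQL metric along the lines of:

```
SELECT SUM({fact/orders.amount})
```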
The most important thing is that you can now take the generated dbt metrics and build more complex metrics on top of them in GoodData. You can then build reports and dashboards, and once you are happy with the result, you can store the whole declarative analytics with one command and version it in git:
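The article's one command lives in the repository; as a sketch, the same snapshot can be taken programmatically with the gooddata-sdk package (the host, token, workspace id, and file name are placeholders):

```python
# Fetch the declarative analytics model and store it for versioning in git
import json
from pathlib import Path

from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create(host_="http://localhost:3000", token_="<api_token>")
analytics = sdk.catalog_workspace_content.get_declarative_analytics_model("demo_workspace")

Path("analytics_model.json").write_text(json.dumps(analytics.to_dict(), indent=2))
```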
For those of you who like automation, you can take inspiration from our article where we describe how to automate data analytics using CI/CD.
What's Next?
This article describes our approach to integrating with dbt. It is our very first prototype, and in order to productize it, we would need to finalize a few things and then publish the integration as a standalone plugin. We hope this article can serve as an inspiration for your company if you decide to integrate with dbt. If you take another approach, we would love to hear about it! Thanks for reading!
If you want to try this on your own, you can register for the GoodData trial and play with it yourself.