In this blog we are going to look at a fairly new concept in Databricks called Delta Live Tables. We are also going to see how to create the pipeline step by step, in a crisp and clear way, along with examples. Hope you are excited, let us get into the topic.

Delta Live Tables:

    It is a framework for building reliable, maintainable and testable data processing pipelines. It manages task orchestration, cluster management, monitoring, data quality and error handling for you.
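
To get a feel for the declarative style, here is a minimal sketch of a table definition in DLT SQL (the table name and source path are hypothetical); once this is declared, the framework takes care of running it, provisioning the cluster and tracking data quality:

CREATE OR REFRESH LIVE TABLE events_raw
COMMENT "Hypothetical example table; DLT manages the orchestration around it."
AS SELECT * FROM json.`/hypothetical/landing/path/`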

Now, let us see the step by step process we have to do in Databricks.

Here we have to follow the multi-hop (medallion) architecture, moving raw data through Bronze, Silver and Gold layers, which is the pattern Delta Live Tables is built around.

Step 1:

Declare Bronze Table:

        The Bronze table holds only the raw data. Here we are going to see an example where we create 2 bronze tables.

CREATE OR REFRESH STREAMING LIVE TABLE bronze_table_1_name
AS SELECT * FROM
cloud_files("path", "json", map("cloudFiles.schemaHints", "time DOUBLE"))

In the first bronze table of this example we load the files in an incremental fashion.
That is achieved by using cloud_files (Auto Loader). In this example the files are in JSON format, so we mention that in the query.
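
As a side note, a common pattern is to tag each record with ingestion metadata while loading. A small sketch, under the assumption that the _metadata column is available in your runtime (DBR 10.5+) and that the added column names do not clash with the source columns:

CREATE OR REFRESH STREAMING LIVE TABLE bronze_table_1_enriched
AS SELECT
*,
current_timestamp() AS ingest_time,   -- when the record was ingested
_metadata.file_path AS source_file    -- which file it came from
FROM cloud_files("path", "json", map("cloudFiles.schemaHints", "time DOUBLE"))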

As mentioned before, we now create the second bronze table as an example,

CREATE OR REFRESH STREAMING LIVE TABLE bronze_table_2_name
AS SELECT * FROM
cloud_files("path", "csv", map("cloudFiles.inferColumnTypes", "true"))

Here we consider the files to be in CSV format.
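
For CSV sources, ordinary CSV reader options can also be passed through the same map alongside the cloudFiles options. A small sketch, assuming the files have a header row and comma-separated fields:

CREATE OR REFRESH STREAMING LIVE TABLE bronze_table_2_with_header
AS SELECT * FROM
cloud_files("path", "csv",
map("cloudFiles.inferColumnTypes", "true",
    "header", "true",
    "delimiter", ","))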

Step 2:

Declare Silver Table:

Remember, here we can give CONSTRAINT, EXPECT and ON VIOLATION clauses; these are the key points. The Silver table contains the cleaned version of the raw data, and the source for the Silver zone is the Bronze zone.

CREATE OR REFRESH STREAMING LIVE TABLE silver_table_name
(
CONSTRAINT valid_col_name EXPECT (col_name > 0) ON VIOLATION DROP ROW
)
AS SELECT a.* FROM STREAM(live.bronze_table_1_name) a
INNER JOIN STREAM(live.bronze_table_2_name) b ON a.key_col = b.key_col

From this example we can see that each constraint has a name, with its rule given via the EXPECT keyword. On violation of the EXPECT rule we chose DROP ROW, which means any record violating the rule will be dropped.

The types of violation actions we can provide are listed below, with a combined sketch after the list:

  • FAIL UPDATE:

                The pipeline update fails when the constraint is violated.

  • DROP ROW:

                Discards the records that violate the constraint.

  • Omitted:

                When we do not give any ON VIOLATION clause (the default), records violating the constraint are kept in the target dataset, but the violations are reported in the pipeline's data quality metrics.
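
To see all three options side by side, here is a hedged sketch on a hypothetical table (the table, constraint and column names are made up for this illustration):

CREATE OR REFRESH STREAMING LIVE TABLE expectations_demo
(
CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION FAIL UPDATE,  -- fail the whole update
CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW,  -- discard the record
CONSTRAINT recent_event EXPECT (event_time > '2020-01-01')             -- keep the record, report in metrics
)
AS SELECT * FROM STREAM(live.bronze_table_1_name)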

Step 3:

Declare Gold Table:


The Gold table consists of aggregated values, and the statistics computed here flow downstream to the presentation layer, that is, they are given to visualization tools like Power BI, Tableau etc.

CREATE OR REFRESH LIVE TABLE gold_table_name
AS SELECT col1, col2, SUM(col3) AS col3_total
FROM live.silver_table_name
GROUP BY col1, col2

Note that the gold table is declared as a plain live table (a materialized view) rather than a streaming one: it computes an aggregate over the whole silver table, which DLT recomputes for us, so no STREAM() wrapper is needed here.
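
Once the pipeline has run with a target database configured, the gold table can be queried directly from the presentation layer. A minimal sketch, assuming a hypothetical target schema named my_target_db:

SELECT col1, col2, col3_total
FROM my_target_db.gold_table_name
ORDER BY col3_total DESC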

 

Step 4:

Follow the steps below in Databricks:

  • Go to the Jobs section
  • Click the Delta Live Tables tab at the top of the screen
  • Click Create Pipeline
  • There you can add the notebook that we created in the above steps.

     Thus, in this blog we saw what Delta Live Tables are, how to create the pipeline step by step, and the types of violation actions we can provide. Hope this content is helpful and gives you a clear idea of Delta Live Tables.

Thank You!!!