
Databricks Certified Data Engineer Associate|Dumps-Datacloudy

 


In this blog, we are going to look at the "Databricks Certified Data Engineer Associate" exam details, tips, the key points you need to prepare, and crisp explanations. Please go through every topic mentioned in this blog, along with the links provided in some of the topics. This will help you for sure.

Certification Details in short:

The certification exam has 45 multiple-choice questions. The question distribution below gives an overview of the exam.

Databricks Lakehouse Platform – 24% (11/45)

ELT with Spark SQL and Python – 29% (13/45)

Incremental Data Processing – 22% (10/45)

Production Pipelines – 16% (7/45)

Data Governance – 9% (4/45)


You need to focus on the points below; these are important topics that are very likely to come up in the exam. I have given crisp details for some of them.

Please focus on the following topics for the exam:

1. What is DESCRIBE DETAIL:

    It gives the details of a Delta table, such as its file location, number of files, size in bytes, etc.

    DESCRIBE DETAIL table_name



2.  Rollback:

    Delta Lake keeps a version history, so even after deleting or changing data we can restore an earlier state of the table.

    RESTORE TABLE table_name TO VERSION AS OF 8

    The above example restores the table to version 8 of its history.
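To build intuition for time travel, here is a plain-Python sketch of the idea (illustration only, not Delta internals): each commit creates a new version, and RESTORE points the table back at an earlier one.

```python
# Illustration only (not Delta internals): a table's history maps each
# version number to the table's contents at that commit.
history = {
    7: ["row1", "row2"],
    8: ["row1", "row2", "row3"],
    9: ["row1"],               # an accidental delete happened here
}

current = history[9]           # latest version of the table

# In spirit: RESTORE TABLE t TO VERSION AS OF 8
current = history[8]
print(current)                 # ['row1', 'row2', 'row3']
```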



3. Vacuum:

    Used to delete old, unreferenced data files, as it is difficult to maintain all of them in production. By default the retention period must be at least 7 days, that is 168 hours; if we give less than that it won't work. In that case we can disable the safety check with the steps below.

SET spark.databricks.delta.retentionDurationCheck.enabled = false;

SET spark.databricks.delta.vacuum.logging.enabled = true;

VACUUM table_name RETAIN 0 HOURS DRY RUN

    RETAIN 0 HOURS keeps only the current version; DRY RUN only lists the files that would be deleted, without removing them.
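To see what the retention window means, here is a plain-Python sketch (an illustrative assumption, not the actual Delta implementation): a vacuum keeps every file that is still referenced, plus unreferenced files newer than the retention cutoff.

```python
from datetime import datetime, timedelta

# Hypothetical sketch, NOT the real Delta implementation: VACUUM removes
# data files that are no longer referenced and older than the retention window.
def vacuum(files, retention_hours, now):
    """Return the files that survive a vacuum with the given retention."""
    cutoff = now - timedelta(hours=retention_hours)
    return [f for f in files if f["referenced"] or f["modified"] >= cutoff]

now = datetime(2023, 1, 10)
files = [
    {"name": "part-0001", "referenced": True,  "modified": now - timedelta(days=30)},
    {"name": "part-0002", "referenced": False, "modified": now - timedelta(days=10)},
    {"name": "part-0003", "referenced": False, "modified": now - timedelta(hours=1)},
]

# With the default 168-hour (7-day) retention, only the old unreferenced file goes.
kept = vacuum(files, retention_hours=168, now=now)
print([f["name"] for f in kept])   # ['part-0001', 'part-0003']
```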



4. Views:

    This is going to be an important topic, and there will be at least one question on views.
We are going to concentrate on temporary views and global temporary views as well.

  Click here to read a crisp and clear explanation of views.
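The key idea to remember: a view stores a query, not data, so it always reflects the current state of the underlying table (a temporary view lasts only for the session, while a global temporary view is visible across sessions on the same cluster). A plain-Python sketch of that semantics (illustration only, not Spark code):

```python
# Plain-Python sketch of view semantics (illustration only, not Spark code):
# a view stores a query, not data, so it always sees the current table.
table = [1, 2, 3, 4]

# In spirit: CREATE TEMP VIEW evens AS SELECT * FROM table WHERE value % 2 = 0
def evens_view():
    return [v for v in table if v % 2 == 0]

print(evens_view())   # [2, 4]
table.append(6)       # the underlying table changes...
print(evens_view())   # [2, 4, 6]  ...and the view sees the change
```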



5. Cloning:

    This is also an important topic, and we can expect questions on it.
Cloning creates a copy of an existing Delta table.
The two types are DEEP CLONE and SHALLOW CLONE.

    * Deep Clone:

CREATE OR REPLACE TABLE table_name
DEEP CLONE table_name2

A deep clone fully copies both the data and the metadata of the source table.

    * Shallow Clone:

A shallow clone copies only the Delta transaction logs (metadata); the data files themselves are not copied.
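A loose plain-Python analogy for the difference (illustration only, not how Delta works internally): a deep clone copies the data itself, while a shallow clone only points at the source's data.

```python
import copy

# Loose analogy, illustration only: a deep clone copies the data itself,
# while a shallow clone shares the underlying data.
source = {"metadata": {"name": "events"}, "data": [1, 2, 3]}

deep = copy.deepcopy(source)   # DEEP CLONE: independent copy of everything
shallow = copy.copy(source)    # SHALLOW CLONE: new top level, shared data

source["data"].append(4)       # the source table changes

print(deep["data"])       # [1, 2, 3]    -- the deep clone is unaffected
print(shallow["data"])    # [1, 2, 3, 4] -- the shallow copy shares the data
```

The analogy only covers what is and isn't copied: in Delta, a shallow clone records pointers to the source's files at clone time, so vacuuming the source can break a shallow clone.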



6. Writing to a table:

    * Overwrite:

CREATE OR REPLACE TABLE events AS
SELECT * FROM parquet.`path`

This overwrites the existing table and accepts schema changes; old versions can still be retrieved via time travel.

    * Insert Overwrite:

INSERT OVERWRITE table_name
SELECT * FROM parquet.`path`

It is the same as overwrite, but schema changes are not allowed; the new data must match the existing table schema.
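The schema rule is the part worth memorizing. A plain-Python sketch of the contrast (an illustrative assumption, not Spark internals):

```python
# Illustrative sketch, NOT Spark internals: CREATE OR REPLACE accepts a new
# schema, while INSERT OVERWRITE requires the schema to match.
def create_or_replace(table, new_schema, new_rows):
    table["schema"] = new_schema          # schema changes are accepted
    table["rows"] = new_rows
    return table

def insert_overwrite(table, new_schema, new_rows):
    if new_schema != table["schema"]:     # schema must match the existing table
        raise ValueError("schema mismatch: INSERT OVERWRITE cannot change schema")
    table["rows"] = new_rows
    return table

events = {"schema": ["id", "ts"], "rows": [(1, "2023-01-01")]}

# Overwrite with an extra column: accepted
events = create_or_replace(events, ["id", "ts", "country"], [(1, "2023-01-01", "US")])
print(events["schema"])   # ['id', 'ts', 'country']

# Insert overwrite with a different schema: rejected
try:
    insert_overwrite(events, ["id"], [(2,)])
except ValueError as e:
    print(e)
```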



7. count_if:

    There may be a question based on count_if.

SELECT count_if(user_id IS NULL) AS a FROM table_name

The above query counts the number of rows where user_id is NULL.
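A plain-Python sketch of what count_if computes, counting the rows where the condition holds (the user_id values here are made up for illustration):

```python
# Sketch of count_if semantics over a hypothetical user_id column:
# count the elements for which the condition is true.
user_ids = [101, None, 205, None, None]

null_count = sum(1 for u in user_ids if u is None)   # count_if(user_id IS NULL)
print(null_count)   # 3
```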



8. Filter in JSON:

SELECT id, FILTER(items, i -> i.item_id LIKE '%k') FROM t

Don't get confused by the above query: for each element of the items array, it checks whether the item_id key/column ends with 'k'. It is as simple as a LIKE condition in a WHERE clause.
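A plain-Python sketch of that higher-order FILTER, keeping only the array elements whose item_id ends with 'k' (the row data is made up for illustration):

```python
# Sketch of FILTER(items, i -> i.item_id LIKE '%k') on one row's array.
row = {
    "id": 1,
    "items": [
        {"item_id": "notebook"},
        {"item_id": "desk"},
        {"item_id": "pen"},
    ],
}

filtered = [i for i in row["items"] if i["item_id"].endswith("k")]
print([i["item_id"] for i in filtered])   # ['notebook', 'desk']
```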



9. Transform in JSON:

    TRANSFORM is useful when we want to apply a function to each element of an array.
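A plain-Python sketch of TRANSFORM: apply a lambda to every element of an array column (the item names are made up for illustration):

```python
# Sketch of TRANSFORM(items, i -> upper(i)): apply a function element-wise.
items = ["book", "desk", "pen"]

upper_items = [s.upper() for s in items]
print(upper_items)   # ['BOOK', 'DESK', 'PEN']
```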



10. Incremental load using Auto Loader and Structured Streaming:

    Click here to get details on the above topic.



11. Multi-Hop Architecture:

    Click here to get details on the above topic.
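The multi-hop (medallion) idea is that raw bronze records are cleaned into silver, then aggregated into gold. A plain-Python sketch of the three hops (illustration only, with made-up records):

```python
# Illustration only: bronze (raw) -> silver (cleaned) -> gold (aggregated).
bronze = [
    {"user": "a", "amount": "10"},
    {"user": "b", "amount": "oops"},   # a bad record arriving in the raw layer
    {"user": "a", "amount": "5"},
]

# Silver: validated, typed records (bad rows filtered out)
silver = [
    {"user": r["user"], "amount": int(r["amount"])}
    for r in bronze
    if r["amount"].isdigit()
]

# Gold: business-level aggregate (total amount per user)
gold = {}
for r in silver:
    gold[r["user"]] = gold.get(r["user"], 0) + r["amount"]

print(gold)   # {'a': 15}
```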


12. Delta Live Tables:

    It is a framework for building reliable, maintainable, and testable data processing pipelines.
It is also an important topic for this certification exam.

    Click here to get details on the above topic.



13. Data Governance Overview:

    i) Data Access Control
    ii) Data Access Audit
    iii) Data Lineage
    iv) Data Discovery



14. Unity Catalog:

    It is a unified governance layer that works across multiple cloud infrastructures, letting you manage permissions on various resources in one place.


15. Permissions:

    Learn about permissions; there may be questions on this topic.
    Check the example below at a glance.

GRANT USAGE, CREATE ON CATALOG `hive_metastore` TO `users`;
SHOW GRANTS ON CATALOG `hive_metastore`;


In addition to this, once you have prepared for the examination, I would recommend checking the key points of the Databricks Certified Associate Developer for Apache Spark 3 Exam Preparation as well; it will give you some clear points on Spark.

Thus, in this blog we looked at the key points and rule-of-thumb topics to prepare for the "Databricks Certified Data Engineer Associate" exam. I hope this helps you clear the exam.


Thank you!!!


 





