Mount ADLS with Databricks | SAS Token - Datacloudy


In this blog we are going to see how to mount ADLS with Databricks using a SAS token. As we know, ADLS stands for Azure Data Lake Storage. In simple words, consider mounting to be linking ADLS with Databricks so that we can access the files in ADLS from Databricks. There are many ways to do this. One method is the SAS token method; another simple method is using the access key, and to know about that method click here.

Before going through the steps, let us quickly discuss the SAS token. SAS stands for Shared Access Signature. A Shared Access Signature (SAS) token is a security mechanism in Azure Storage used to grant temporary access to resources such as Azure Data Lake Storage (ADLS) accounts. A SAS token is a unique string that encodes the access level, expiry time, and resource scope.

SAS tokens provide a secure way to grant temporary access to resources in ADLS without revealing the account key or user credentials. This makes them ideal for scenarios where you need to grant temporary access to a resource, such as when sharing data with a third-party application or when providing access to a specific user or group for a limited time.

Here, we generate the SAS token in ADLS by setting up the correct permissions and the lifetime of the token, and then use that generated token to access the ADLS files. That is, the token grants the user or application access to the resource for the specified duration, after which it expires and access is revoked.
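
For illustration, an account-level SAS token is just a URL query string. The values below are made up, but the parameter names are the real ones: sv is the storage service version, ss the allowed services, srt the allowed resource types, sp the granted permissions, st and se the start and expiry times, and sig the cryptographic signature that Azure validates on each request.

sv=2022-11-02&ss=b&srt=sco&sp=rwdlacup&st=2024-01-01T00:00:00Z&se=2024-01-02T00:00:00Z&spr=https&sig=<signature>
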
So let us see the steps.

Step 1: The first thing you have to do is get the SAS token from the ADLS Gen2 account. Go to your ADLS Gen2 storage account and click on the Shared access signature menu in the left pane.


Step 2: Tick the following check boxes under Allowed resource types:

        i) Service

        ii) Container

        iii) Object

Step 3: Grant all the permissions that are needed by ticking the check boxes. The available options are:

        i) Read

        ii) Write

        iii) Delete

        iv) List

        v) Add

        vi) Create

        vii) Update

        viii) Process

        ix) Immutable storage

        x) Permanent Delete


Step 4: Set the start and expiry date/time of the token, which define how long it stays valid.


Step 5: Press the Generate SAS and connection string button, and copy the generated SAS token.
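
If you prefer to script Steps 2 to 5 instead of clicking through the portal, the azure-storage-blob SDK can generate an account-level SAS. This is just a sketch, assuming the azure-storage-blob package is installed; the account name and key below are placeholders you must fill in:

from datetime import datetime, timedelta
from azure.storage.blob import generate_account_sas, ResourceTypes, AccountSasPermissions

sas_token = generate_account_sas(
    account_name="<storage-account>",        # placeholder
    account_key="<account-key>",             # placeholder
    resource_types=ResourceTypes(service=True, container=True, object=True),  # Step 2
    permission=AccountSasPermissions(read=True, write=True, list=True),       # a subset of the Step 3 options
    expiry=datetime.utcnow() + timedelta(hours=2),                            # Step 4
)
# sas_token is the string you would otherwise copy from the portal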


Step 6: Now go to your Databricks notebook. There are two ways to do it; let us see them one by one.


    Type 1:

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")

spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")

spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")    


In the above code, replace <storage-account> with your storage account name and paste the copied SAS token in place of <token>.
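
If you set this up often, the three settings can be wrapped in a small helper. This is just a convenience sketch; configure_sas_access is a name made up for this post, and spark is the SparkSession that is already available in a Databricks notebook:

def configure_sas_access(storage_account: str, sas_token: str) -> None:
    # Point all three SAS-related settings at the same ADLS Gen2 endpoint
    endpoint = f"{storage_account}.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{endpoint}", "SAS")
    spark.conf.set(f"fs.azure.sas.token.provider.type.{endpoint}",
                   "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    spark.conf.set(f"fs.azure.sas.fixed.token.{endpoint}", sas_token)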


Now the storage account is authorized with the SAS token through the Spark configuration. To access the files in ADLS, follow the step below.

df = spark.read.csv("abfss://<container_name>@<storage_name>.dfs.core.windows.net/path/file.csv")


Replace <container_name> with your container name and <storage_name> with the storage account name. Mention the respective path and file name to access it.
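
Before reading a specific file, you can sanity-check the authentication by listing the container root (same placeholders as above):

dbutils.fs.ls("abfss://<container_name>@<storage_name>.dfs.core.windows.net/")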


    Type 2: This is another way in which you can use the SAS token, by mounting the container.


dbutils.fs.mount(
    # The wasbs driver mounts the blob endpoint of the storage account
    source=f"wasbs://{container_name}@{storage_name}.blob.core.windows.net",
    mount_point=f"/mnt/{mount_point}",
    # extra_configs must be a dict mapping the SAS config key to the token
    extra_configs={f"fs.azure.sas.{container_name}.{storage_name}.blob.core.windows.net": SAS_token}
)

Replace container_name with the name of the container, storage_name with the storage account name, mount_point with the path where it needs to be mounted, and SAS_token with the generated SAS token you copied.

Now the container of the storage account is mounted. Let us see how we can access a file:


    df = spark.read.csv("/mnt/<mount_point>/path/file.csv")


So, we can use the mount path to access the file as shown above.
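
To verify that the mount exists, or to remove it when you no longer need it, Databricks provides built-in utilities:

# List all current mounts; the new one should appear under /mnt/<mount_point>
display(dbutils.fs.mounts())

# Unmount when it is no longer needed
dbutils.fs.unmount("/mnt/<mount_point>")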


Thus we saw, in a step-by-step manner, how to mount ADLS with Databricks using a SAS token, and the two different approaches to do it. Hope this information is very helpful. You don't need to memorize all these things; just remember the approach and save this for future reference.


Thank You !!!

