In this blog we are going to see how to mount ADLS with Databricks using a SAS token. As you may know, ADLS stands for Azure Data Lake Storage. In simple words, mounting attaches an ADLS container to the Databricks file system under a path such as /mnt/<name>, so that we can access files in ADLS from Databricks like regular paths. There are many ways to do this. One of them is the SAS token method; another simple method uses the storage account access key. To learn about that method, click here.
Before going through the steps, let us quickly discuss SAS tokens. SAS stands for Shared Access Signature. A Shared Access Signature (SAS) token is a security mechanism provided by Azure Storage to grant temporary, limited access to resources in Azure Data Lake Storage (ADLS) accounts. A SAS token is a string of URL query parameters that encodes the granted permissions, the allowed services and resource types, the start and expiry time, and a signature that Azure verifies on every request.
SAS tokens provide a secure way to grant temporary access to resources in ADLS without revealing the account key or user credentials. This makes them ideal for scenarios where you need to grant temporary access to a resource, such as when sharing data with a third-party application or when providing access to a specific user or group for a limited time.
Here, we generate the SAS token in the ADLS storage account with the required permissions and validity period, and then use that token to access the ADLS files. The token grants the user or application access to the resource for the specified duration; after it expires, access is revoked.
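To make the idea of "a string that contains information" concrete, here is a minimal Python sketch that splits a SAS token into its query parameters. The token below is a made-up example (its signature is fake), and the field descriptions are an assumed subset of the account SAS parameters documented by Azure Storage; the exact fields in your token depend on the options you pick in the portal.

```python
from urllib.parse import parse_qs

# Readable names for common account SAS query parameters (assumed subset, not exhaustive).
FIELD_NAMES = {
    "sv": "signed version",
    "ss": "signed services",
    "srt": "signed resource types",
    "sp": "signed permissions",
    "st": "start time",
    "se": "expiry time",
    "spr": "allowed protocols",
    "sig": "signature",
}


def parse_sas_token(token: str) -> dict:
    """Split a SAS token (a URL query string) into its individual fields."""
    # The portal shows the token with a leading '?', which would otherwise
    # become part of the first field name.
    query = token.lstrip("?")
    # parse_qs URL-decodes each value and returns a list per key; keep the first.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical token for illustration only; the signature is fake and will not work.
example_token = (
    "?sv=2022-11-02&ss=b&srt=sco&sp=rl"
    "&se=2030-01-01T00:00:00Z&st=2024-01-01T00:00:00Z&spr=https&sig=FAKE"
)

for key, value in parse_sas_token(example_token).items():
    print(f"{FIELD_NAMES.get(key, key)}: {value}")
```

Printing the fields of a token you just generated is a quick way to confirm that the permissions and expiry match what you selected.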
So let us see the steps.
Step 1: The first thing you have to do is to get the SAS token from the ADLS Gen2 account. Go to your ADLS Gen2 storage account in the Azure portal and click Shared access signature in the left pane.
Step 2: Under Allowed resource types, select the following check boxes:
i) Service
ii) Container
iii) Object
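These three check boxes end up in the token's srt parameter as one letter each. The sketch below assumes the standard account SAS letters s, c and o (the helper names are hypothetical) and lets you confirm that a copied token includes all three resource types.

```python
# Assumed mapping of the 'srt' (signed resource types) letters in an account SAS
# to the check boxes shown in the Azure portal.
RESOURCE_TYPES = {"s": "Service", "c": "Container", "o": "Object"}


def resource_types(srt: str) -> list:
    """Translate an 'srt' value such as 'sco' into readable resource type names."""
    return [RESOURCE_TYPES[letter] for letter in srt if letter in RESOURCE_TYPES]


def has_all_resource_types(srt: str) -> bool:
    """Return True when Service, Container and Object were all selected."""
    return set(RESOURCE_TYPES) <= set(srt)


print(resource_types("sco"))         # ['Service', 'Container', 'Object']
print(has_all_resource_types("co"))  # False
```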
Step 3: Select the check boxes for the permissions you need, and grant only what your workload requires. The available options are:
i) Read
ii) Write
iii) Delete
iv) List
v) Add
vi) Create
vii) Update
viii) Process
ix) Immutable storage
x) Permanent Delete
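Each selected permission appears as a single letter in the token's sp parameter. The sketch below decodes those letters; the letter-to-name table is an assumption based on the account SAS documentation (in particular i for immutable storage and y for permanent delete), so double-check it against your own token. For read-only access from Databricks, Read and List are typically enough.

```python
# Assumed mapping of account SAS 'sp' (signed permissions) letters to the
# portal check boxes listed above.
PERMISSIONS = {
    "r": "Read",
    "w": "Write",
    "d": "Delete",
    "l": "List",
    "a": "Add",
    "c": "Create",
    "u": "Update",
    "p": "Process",
    "i": "Immutable storage",
    "y": "Permanent delete",
}


def granted_permissions(sp: str) -> list:
    """Readable names for each permission letter in an 'sp' value."""
    return [PERMISSIONS.get(letter, f"unknown ({letter})") for letter in sp]


def missing_permissions(sp: str, required: str) -> list:
    """Permissions listed in 'required' that the token does not grant."""
    return [PERMISSIONS.get(letter, letter) for letter in required if letter not in sp]


print(granted_permissions("rl"))       # ['Read', 'List']
print(missing_permissions("r", "rl"))  # ['List']
```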
Step 4: Set the start and expiry date and time, which define how long the token is valid.
Step 5: Click the Generate SAS and connection string button and copy the SAS token.
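The start and expiry you choose become the token's st and se values, in UTC. Here is a hedged sketch for checking whether a token is currently usable and how long it has left. The accepted timestamp layouts are assumptions (Azure accepts a few ISO 8601 forms), treating the expiry as exclusive is a simplification, and the fixed "now" only makes the example deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Timestamp layouts assumed for st/se values; the portal typically uses the first one.
SAS_TIME_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d")


def parse_sas_time(value: str) -> datetime:
    """Parse an st/se value into a timezone-aware UTC datetime."""
    for fmt in SAS_TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised SAS time: {value!r}")


def is_valid(start: str, expiry: str, now: Optional[datetime] = None) -> bool:
    """True if 'now' lies between the token's start and expiry times."""
    now = now or datetime.now(timezone.utc)
    return parse_sas_time(start) <= now < parse_sas_time(expiry)


def time_remaining(expiry: str, now: Optional[datetime] = None) -> timedelta:
    """Time left until expiry (negative once the token has expired)."""
    now = now or datetime.now(timezone.utc)
    return parse_sas_time(expiry) - now


# A fixed 'now' keeps the example deterministic.
fixed_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_valid("2024-01-01T00:00:00Z", "2030-01-01T00:00:00Z", fixed_now))  # True
print(time_remaining("2025-01-02T00:00:00Z", fixed_now))  # 1 day, 0:00:00
```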
Step 6: Now go to your Databricks notebook. There are two ways to use the token; let us see them one by one.
Type 1: Set the SAS token in the Spark configuration.
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")
In the above code, replace <storage-account> with your storage account name and paste the copied SAS token in place of <token>.
The Spark session can now authenticate to the storage account with the SAS token. Note that this approach does not create a mount point; instead, you read the files directly by their ADLS path:
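If you connect to more than one storage account, it can help to generate the three settings from the account name instead of editing the strings by hand. A minimal sketch: the helper name sas_spark_conf is hypothetical, and the notebook lines at the end assume the token was stored in a Databricks secret scope (placeholders <scope-name> and <secret-name>) rather than pasted into the notebook, which keeps it out of the notebook's source.

```python
# Provider class used for a fixed SAS token, as in Type 1 above.
FIXED_SAS_PROVIDER = "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"


def sas_spark_conf(storage_account: str, sas_token: str) -> dict:
    """Return the three Spark settings that authorize one storage account with a SAS token."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}": FIXED_SAS_PROVIDER,
        f"fs.azure.sas.fixed.token.{suffix}": sas_token,
    }


# In a Databricks notebook (spark and dbutils exist only there), something like:
# token = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")
# for key, value in sas_spark_conf("<storage-account>", token).items():
#     spark.conf.set(key, value)
```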
df = spark.read.csv("abfss://<container_name>@<storage_name>.dfs.core.windows.net/path/file.csv")
Replace <container_name> with your container name and <storage_name> with the storage account name, and replace path/file.csv with the path and file name you want to read.
Type 2: Mount the container with dbutils.fs.mount using the SAS token.
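The ADLS path follows a fixed pattern, scheme://container@account.dfs.core.windows.net/path, where abfss is the TLS-encrypted variant of the abfs scheme. A small sketch that assembles it; the helper name and the example container, account and file names are hypothetical.

```python
def abfss_path(container: str, storage_account: str, path: str) -> str:
    """Build an abfss:// URI for a file in an ADLS Gen2 container."""
    # Drop any leading '/' so the URI does not end up with a double slash.
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


print(abfss_path("raw", "mystorage", "/sales/2024.csv"))
# abfss://raw@mystorage.dfs.core.windows.net/sales/2024.csv

# In a notebook, after the Type 1 configuration (header=True treats the first row as column names):
# df = spark.read.csv(abfss_path("<container_name>", "<storage_name>", "path/file.csv"), header=True)
```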
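container_name = "<container_name>"  # container to mount
storage_name = "<storage_name>"      # storage account name
mount_point = "<mount_point>"        # folder name to create under /mnt
SAS_token = "<SAS_token>"            # SAS token copied in Step 5
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_name}.blob.core.windows.net",
    mount_point=f"/mnt/{mount_point}",
    extra_configs={f"fs.azure.sas.{container_name}.{storage_name}.blob.core.windows.net": SAS_token}
)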
Set container_name to the name of the container, storage_name to the storage account name, mount_point to the folder name to use under /mnt, and SAS_token to the SAS token you copied.
The container is now mounted. Let us see how to access a file in it:
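Running dbutils.fs.mount a second time on the same mount point fails because the directory is already mounted, so a notebook that may be re-run can check dbutils.fs.mounts() first. A sketch under that assumption: mount_args and is_mounted are hypothetical helper names, and FakeMount only imitates the mountPoint attribute of the entries that dbutils.fs.mounts() returns, so the logic can be tried outside Databricks.

```python
from collections import namedtuple


def mount_args(container_name: str, storage_name: str, mount_point: str, sas_token: str) -> dict:
    """Keyword arguments for dbutils.fs.mount, matching the Type 2 call above."""
    return {
        "source": f"wasbs://{container_name}@{storage_name}.blob.core.windows.net",
        "mount_point": f"/mnt/{mount_point}",
        "extra_configs": {
            f"fs.azure.sas.{container_name}.{storage_name}.blob.core.windows.net": sas_token
        },
    }


def is_mounted(mount_point: str, mounts) -> bool:
    """True if mount_point appears in the output of dbutils.fs.mounts()."""
    return any(m.mountPoint == mount_point for m in mounts)


# Hypothetical stand-in for dbutils.fs.mounts() entries, for running outside Databricks.
FakeMount = namedtuple("FakeMount", ["mountPoint", "source"])

args = mount_args("raw", "mystorage", "raw", "<sas-token>")
print(args["source"])  # wasbs://raw@mystorage.blob.core.windows.net
print(is_mounted(args["mount_point"], [FakeMount("/mnt/raw", args["source"])]))  # True

# In a notebook:
# if not is_mounted(args["mount_point"], dbutils.fs.mounts()):
#     dbutils.fs.mount(**args)
```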
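df = spark.read.csv(f"/mnt/{mount_point}/path/file.csv")
Replace path/file.csv with the location of your file inside the container. As shown above, the mount path can be used like any other path in Databricks to access the file.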
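Because the SAS token is stored in the mount's configuration, reads through the mount path will start failing once the token expires. One way to handle this, sketched below, is to unmount and mount again with a freshly generated token. refresh_mount is a hypothetical helper that takes the dbutils object as a parameter, and FakeFs is a hypothetical in-memory stand-in so the logic can be tried outside Databricks; depending on your runtime, dbutils.fs.updateMount may also be an option.

```python
from types import SimpleNamespace


def refresh_mount(dbutils, mount_point: str, source: str, extra_configs: dict) -> None:
    """Unmount mount_point if it is mounted, then mount it again with extra_configs."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)


class FakeFs:
    """Hypothetical in-memory stand-in for dbutils.fs, for trying the helper locally."""

    def __init__(self):
        self.table = {}  # mount point -> (source, extra_configs)

    def mounts(self):
        return [SimpleNamespace(mountPoint=p, source=s) for p, (s, _) in self.table.items()]

    def mount(self, source, mount_point, extra_configs):
        if mount_point in self.table:
            raise RuntimeError(f"Directory already mounted: {mount_point}")
        self.table[mount_point] = (source, extra_configs)

    def unmount(self, mount_point):
        del self.table[mount_point]


fake_dbutils = SimpleNamespace(fs=FakeFs())
source = "wasbs://raw@mystorage.blob.core.windows.net"
sas_key = "fs.azure.sas.raw.mystorage.blob.core.windows.net"

refresh_mount(fake_dbutils, "/mnt/raw", source, {sas_key: "old-token"})
refresh_mount(fake_dbutils, "/mnt/raw", source, {sas_key: "new-token"})
print(fake_dbutils.fs.table["/mnt/raw"][1][sas_key])  # new-token
```

In a notebook you would pass the real dbutils object instead of fake_dbutils, together with the regenerated SAS token.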
Thus we saw, step by step, how to connect ADLS to Databricks using a SAS token, with two different approaches to do it. Hope this information is very helpful. You don't need to memorize all of this; just remember the approach and save this post for future reference.
Thank you!!!