Uploading Files to Azure Data Lake Using a .NET App

Azure Data Lake Store is an extensible store for cloud data in Azure. You can move data to and from Azure Data Lake Store via Azure Data Factory or Azure SQL Database and connect to a variety of data sources. You can script the upload of files from on-premises or local servers to Azure Data Lake Store using the Azure Data Lake Store .NET SDK. Ambily KK shows how easy it is to get started by setting up a script or app to move data to Azure Data Lake Store.

Azure Data Lake lets you ingest and store data quickly, whatever its form, and run any type of processing and analysis on it across platforms and languages. It is ideal for batch, streaming and interactive analytics because it provides a unified repository for diverse enterprise data requirements.

There are many ways to upload files to Data Lake Store: Azure Data Factory, the Azure Data Lake Store .NET SDK, the Java SDK, Node, the REST API and so on. In this article, we will look at how to use the Azure Data Lake Store .NET SDK to upload files from a local system to Azure Data Lake Store. To do this, you will need Visual Studio 2015/2017 and an Azure subscription.

If you haven’t already done so, the first step is to create a Data Lake Store in Azure to get started.

Register Azure App

All external application interactions are controlled and secured using an Azure Active Directory App. We will start off by creating an Azure Active Directory App for our Data Lake solution. Go to https://portal.azure.com and select ‘Azure Active Directory’ from the left-side navigation bar.

Select the ‘App registration’ option from the Azure Active Directory blade. This will open a blade with a list of registered apps and the option to register a new application. Select the ‘New application registration’ button at the top of the blade.

This will open the ‘Create’ blade to create a new app. Specify the name and Sign-on URL. Select the application type as ‘Web App/API’ or ‘Native’. ‘Native’ indicates that the application will be installed on the user’s device or computer. ‘Web App/API’ means that authentication comes from a web-based application.

Because we are planning to use a console application for uploading the files, we will add a dummy Sign-on URL such as http://appname.

Grant Permission in Data Lake Store

This AD app will be used to authenticate users who upload files from the local system to Azure Data Lake Store. The application needs appropriate permission to perform activities in the target resource, that is, the Data Lake Store. We can provide the necessary permission by adding the app under the Access section of the Data Lake Store.

Navigate to the Data Lake Store and select the ‘Data Explorer’ option.

Select the ‘Access’ option from the Data Lake Store blade. You can select a specific folder and specify access at the folder level, if required. If you receive a “Forbidden” exception when copying a file, your app doesn’t have sufficient permission for the operation. Verify the app permissions at root level to fix the exception.

Note the Examples folder in the Store list. We will be using this folder for uploading files from .NET code.

The ‘Access’ blade displays the list of allowed users and apps, along with the type of permission that each user or app has for this particular folder. Click on the ‘Add’ option at the top to grant permission to a new user or app.

Select the previously created app name from the ‘Select User or Group’ blade and click ‘OK’. You can search for a particular user or app by entering a few characters in the ‘Select’ box, which displays a filtered list of matching users and apps.

Select the required permissions for the app from the Permissions list. We can also restrict the app’s access to a specific folder or its subfolders. An Access permission entry applies to a file or folder and controls access to that object. A Default permission entry applies to a folder and determines the access permissions given to child items created under it.

File Upload Solution

Now we can create the solution to upload files from the local system to Azure Data Lake Store. Open VS 2015/2017 and create a new console application.

Add the following two NuGet packages to the project.

  • Microsoft.Azure.Management.DataLake.Store
  • Microsoft.Rest.ClientRuntime.Azure.Authentication

Right-click on the project and select ‘Manage NuGet Packages…’ to add the required packages.

Navigate to ‘Browse’ and enter the NuGet package name in the search box to find the latest version of the package. Install the selected package by clicking ‘Install’.

I have used

  • Microsoft.Azure.Management.DataLake.Store v2.2.0
  • Microsoft.Rest.ClientRuntime.Azure.Authentication v2.3.1

for this walk-through.

Before we start coding, we should collect the required data from Azure resources. We need details from the Azure Active Directory App for authentication. Navigate to the previously created AD App blade.

Copy the Application ID mentioned in the App blade, which will be used as Client Id in the .NET application. Click on ‘Keys’ to generate a new key or client secret.

Enter a key name in the ‘Description’ field and select the expiry duration from the drop-down. Click on the ‘Save’ icon at the top to generate the key.

Copy the Key, which will be hidden after you leave this blade. This key will be used as a client secret in the .NET application.

Now, go back to our .NET Console application and start coding. Declare the Data Lake Store Account Management Client and Data Lake Store File System Management Client objects.

Declare the corresponding assembly namespace with a using directive.
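The declarations described above might look like the following sketch. The class name `Program` and the field names `adlsClient` and `adlsFileSystemClient` are illustrative assumptions; the two client types come from the Microsoft.Azure.Management.DataLake.Store package.

```csharp
using Microsoft.Azure.Management.DataLake.Store;

internal static class Program
{
    // Client for account-level operations on the Data Lake Store.
    private static DataLakeStoreAccountManagementClient adlsClient;

    // Client for file and folder operations, including uploads.
    private static DataLakeStoreFileSystemManagementClient adlsFileSystemClient;

    private static void Main()
    {
        // The clients are instantiated later, once authentication is in place.
    }
}
```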

Next, define the Data Lake Store property values. To find them, select the newly created Data Lake Store from the list of all resources and navigate to the Data Lake Store blade.
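As a rough sketch, the property values could be held in constants like these. The class name, the constant names and every value are placeholders, and the choice of resource group alongside the account name and subscription ID is an assumption rather than a list taken from the walk-through.

```csharp
internal static class DataLakeSettings
{
    // Placeholder values (assumptions): replace them with the values
    // shown on your own Data Lake Store blade.
    public const string AdlsAccountName = "<data-lake-store-name>";
    public const string ResourceGroupName = "<resource-group-name>";
    public const string AzureSubscriptionId = "<subscription-id>";
}
```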

Create the authentication token using the Azure Active Directory App details. Open the Azure Active Directory blade and select ‘Domain names’ to capture the domain name.

You can also get the domain name associated with the app from the App ID URI listed under the app’s properties, because the App ID URI begins with the domain name.

Include the required assembly references.
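One way to obtain the credentials is sketched below, assuming service-principal login with the client secret. The helper class `DataLakeAuth` and its `Login` method are hypothetical names; `ClientCredential` comes from the Active Directory authentication library that the Microsoft.Rest.ClientRuntime.Azure.Authentication package depends on.

```csharp
using System.Threading;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Microsoft.Rest.Azure.Authentication;

internal static class DataLakeAuth
{
    // domain: the Azure AD domain name
    // clientId: the Application ID of the AD app
    // clientSecret: the key generated under 'Keys'
    public static ServiceClientCredentials Login(
        string domain, string clientId, string clientSecret)
    {
        // Set a synchronization context before logging in,
        // as Microsoft's get-started sample does.
        SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());

        var clientCredential = new ClientCredential(clientId, clientSecret);

        // Block until the silent login completes and return the credentials.
        return ApplicationTokenProvider
            .LoginSilentAsync(domain, clientCredential)
            .GetAwaiter()
            .GetResult();
    }
}
```

A call such as `DataLakeAuth.Login("<aad-domain-name>", "<application-id>", "<client-secret>")` would then return credentials that both client objects accept.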

Now we can create instances of the Data Lake Store Account Management Client and the Data Lake Store File System Management Client and assign a subscription ID.
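A minimal sketch of this step follows. The factory class and method names are hypothetical, and the sketch assumes the subscription ID is set on the account management client, which is the client that exposes a `SubscriptionId` property.

```csharp
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.Rest;

internal static class DataLakeClientFactory
{
    // Creates the account management client and assigns the subscription ID.
    public static DataLakeStoreAccountManagementClient CreateAccountClient(
        ServiceClientCredentials creds, string subscriptionId)
    {
        return new DataLakeStoreAccountManagementClient(creds)
        {
            SubscriptionId = subscriptionId
        };
    }

    // Creates the file system client used for uploads.
    public static DataLakeStoreFileSystemManagementClient CreateFileSystemClient(
        ServiceClientCredentials creds)
    {
        return new DataLakeStoreFileSystemManagementClient(creds);
    }
}
```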

Define the source and destination file paths. The source file path is the physical file location on the server or local system. The destination is the Azure Data Lake Store folder together with the target file name. The file does not have to keep its original name; you can upload it under a new name.

This code will copy the Hello.txt file from D:\Agile with a new name file.txt into the Azure Data Lake Store folder /Example.
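The two paths for this example could be defined as follows. The class name is illustrative, and the sketch assumes the file name is appended directly to the /Example folder path.

```csharp
internal static class UploadPaths
{
    // Local source file; the verbatim string keeps the backslashes literal.
    public const string SourceFilePath = @"D:\Agile\Hello.txt";

    // Destination folder plus the new file name in the Data Lake Store.
    public const string DataLakeStoreFilePath = "/Example/file.txt";
}
```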

We can copy the file to Data Lake Store using the UploadFile method.

Parameters are:

  • adlsAccountName – account name or the Data Lake Store name
  • sourceFilePath – source file path
  • dataLakeStoreFilePath – destination file path
  • 1 – the number of threads to use for the upload. By default, the SDK calculates this from the file size
  • False – indicates that this upload is a new one, not a continuation of a previously paused upload
  • True – whether to overwrite the file if it already exists
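Put together, the call with these six arguments might look like this sketch. The wrapper class `Uploader` and its `Upload` method are hypothetical; the arguments are passed positionally in the order listed above.

```csharp
using Microsoft.Azure.Management.DataLake.Store;

internal static class Uploader
{
    public static void Upload(
        DataLakeStoreFileSystemManagementClient adlsFileSystemClient,
        string adlsAccountName,
        string sourceFilePath,
        string dataLakeStoreFilePath)
    {
        adlsFileSystemClient.FileSystem.UploadFile(
            adlsAccountName,        // Data Lake Store name
            sourceFilePath,         // local source file
            dataLakeStoreFilePath,  // destination path in the store
            1,                      // number of upload threads
            false,                  // new upload, not a resumed one
            true);                  // overwrite an existing file
    }
}
```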

Here is the complete application code for uploading a file from the local system to the Azure Data Lake Store folder. You can customize this by adding a foreach loop or Parallel.ForEach to upload multiple files at a time.
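The following sketch shows how the steps in this walk-through can fit together in one console application, with a foreach loop added to illustrate uploading several files. It is a sketch under stated assumptions: all constant values are placeholders, and the choice to upload every file in D:\Agile under its own name is purely illustrative.

```csharp
using System;
using System.IO;
using System.Threading;
using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;

internal static class Program
{
    // Placeholder values (assumptions): replace with your own details.
    private const string AdlsAccountName = "<data-lake-store-name>";
    private const string AzureSubscriptionId = "<subscription-id>";
    private const string Domain = "<aad-domain-name>";
    private const string ClientId = "<application-id>";
    private const string ClientSecret = "<client-secret>";

    private static DataLakeStoreAccountManagementClient adlsClient;
    private static DataLakeStoreFileSystemManagementClient adlsFileSystemClient;

    private static void Main()
    {
        // Authenticate with the Azure Active Directory App details.
        SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
        var clientCredential = new ClientCredential(ClientId, ClientSecret);
        var creds = ApplicationTokenProvider
            .LoginSilentAsync(Domain, clientCredential)
            .GetAwaiter()
            .GetResult();

        // Create the client objects.
        adlsClient = new DataLakeStoreAccountManagementClient(creds)
        {
            SubscriptionId = AzureSubscriptionId
        };
        adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);

        // Upload a single file under a new name.
        string sourceFilePath = @"D:\Agile\Hello.txt";
        string dataLakeStoreFilePath = "/Example/file.txt";
        adlsFileSystemClient.FileSystem.UploadFile(
            AdlsAccountName, sourceFilePath, dataLakeStoreFilePath, 1, false, true);

        // Upload every file in the local folder, keeping the original names.
        foreach (string path in Directory.GetFiles(@"D:\Agile"))
        {
            string destination = "/Example/" + Path.GetFileName(path);
            adlsFileSystemClient.FileSystem.UploadFile(
                AdlsAccountName, path, destination, 1, false, true);
        }

        Console.WriteLine("Upload complete.");
    }
}
```

Replacing the foreach loop with Parallel.ForEach would start several uploads at once, at the cost of more simultaneous connections to the store.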

Reference

https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-net-sdk

Conclusion

Azure SDKs support the automation of various repetitive operations in Azure. One common requirement is to upload files from on-premises or local servers to Azure Data Lake Store at regular intervals. This allows enterprise analytics applications to derive meaningful insights from enterprise data in various formats, using Data Lake Analytics and Azure HDInsight.