{"id":95819,"date":"2023-02-15T21:55:58","date_gmt":"2023-02-15T21:55:58","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=95819"},"modified":"2023-02-09T22:09:31","modified_gmt":"2023-02-09T22:09:31","slug":"azure-machine-learning-introduction-part-1-overview-and-prep-work","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/data-science\/azure-machine-learning-introduction-part-1-overview-and-prep-work\/","title":{"rendered":"Azure Machine Learning Introduction: Part 1 Overview and prep work"},"content":{"rendered":"<p>The five-part series is designed to jump-start any IT professional\u2019s journey in the fascinating world of Data Science with Azure Machine Learning (Azure ML). Readers don\u2019t need prior knowledge of Data Science, Machine Learning, Statistics, or Azure to begin this adventure.<\/p>\n<p>All you will need is an Azure subscription and I will show you how to get a free one that you can use to explore some of Azure\u2019s features before I show you how to set up the Azure ML environment.<\/p>\n<ul>\n<li><strong>Part 1<\/strong> introduces readers to Azure ML and walks through the prep work of setting up an Azure ML workspace.<\/li>\n<li><strong>Part 2<\/strong> demonstrates the vital steps of data ingestion, data cleaning, and exploratory data analysis with Azure Machine Learning Studio.<\/li>\n<li><strong>Part 3<\/strong> walks readers through the core Machine Learning steps of training and scoring models.<\/li>\n<li><strong>Part 4<\/strong> demonstrates Azure ML capabilities for deploying trained models.<\/li>\n<li><strong>Part 5<\/strong> familiarizes readers with the easy-to-use Automated Machine Learning feature in Azure ML.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/machine-learning\/#product-overview\">Azure Machine Learning<\/a> (Azure ML), part of Microsoft\u2019s public cloud offerings, is a set of managed cloud services and tools to help 
Data Science and Machine Learning professionals build, test, deploy and share machine learning solutions.<\/p>\n<p>This first article in the Azure Machine Learning series covers the fundamentals of Azure Machine Learning, walks readers through setting up their Azure ML workspace, and familiarizes them with Azure ML Studio, Azure ML Datastores, and Data assets.<\/p>\n<h2>Overview of Azure Machine Learning<\/h2>\n<p>Azure ML is a cloud service intended for Machine Learning professionals, data scientists, and engineers to train and deploy machine learning models, manage Machine Learning Operations (MLOps) and enhance their daily workflows. It\u2019s designed to help accelerate and manage machine learning project lifecycles with collaboration, automation, deployment, scaling, integration, and security tools. Azure Machine Learning enables professionals to create models natively, or use models built with open-source frameworks like <a href=\"https:\/\/pytorch.org\/\">PyTorch<\/a>, <a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a>, <a href=\"https:\/\/scikit-learn.org\/stable\/\">scikit-learn<\/a>, etc., and monitor, retrain and redeploy models using its MLOps tools.<\/p>\n<p>The typical flow for a machine learning process is shown in Figure 1.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"1525\" class=\"wp-image-95825\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/word-image-95819-1-1.png\" \/><\/p>\n<p><strong>Figure 1. Typical Machine Learning project lifecycle<\/strong><\/p>\n<p>The Azure Machine Learning Studio (Azure ML Studio) facilitates quick model creation and deployment, with an easy-to-use graphical user interface (GUI). 
The automated Machine Learning (AutoML) feature in Azure ML speeds up the repetitive and time-consuming process of feature and algorithm selection.<\/p>\n<p>While Azure ML comes with a large selection of pre-built Machine Learning algorithms and modules, it also allows for extending your models with custom R and Python scripts. Azure ML facilitates easy deployment of models as web services, to be consumed by applications in real-time as well as batch processing modes. All these tasks can be done using the Azure ML Studio graphical user interface (GUI) or the Python SDK.<\/p>\n<h3>Azure Machine Learning Architecture Overview<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"1265\" class=\"wp-image-95827\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/word-image-95819-2-1.png\" \/><\/p>\n<p><strong>Figure 2. Azure Machine Learning high-level architecture<\/strong><\/p>\n<p>A high-level architecture overview of Azure ML will familiarize readers with its various components and how they work together.<\/p>\n<p><em>Note: There are a lot of terms and concepts, but they should become clearer throughout the series as I make use of them to build out examples.<\/em><\/p>\n<p>An Azure \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-resource-manager\/management\/manage-resource-groups-portal\">Resource group<\/a><\/strong>\u201d is a container for the related resources of a solution, which typically share the same lifecycle. 
It stores metadata about your resources and is tied to a geographical region.<\/p>\n<ul>\n<li>Azure \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-workspace\">Machine Learning workspace<\/a><\/strong>\u201d is a top-level resource for Azure ML and is the centralized place to manage resources used to train and deploy models.<\/li>\n<li>Azure ML \u201cAssets\u201d like Environments, Experiments, Pipelines, Datasets, Models, and Endpoints are created while using Azure ML.\n<ul>\n<li>An \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-environments\">Environment<\/a><\/strong>\u201d is a collection of Python or R packages and libraries, environment variables, and various settings that encapsulate the needs of the machine learning model\u2019s training and scoring scripts<\/li>\n<li>A single execution of a training script, along with the record of its metadata, metrics, and output, is called a \u201c<strong>Run<\/strong>\u201d. An \u201c<strong>Experiment<\/strong>\u201d is a collection of multiple runs of a particular training script.<\/li>\n<li>Machine Learning \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-ml-pipelines\">Pipelines<\/a><\/strong>\u201d are used to create and manage workflows that orchestrate various phases of machine learning like data preparation, model training, model deployment, scoring, etc.<\/li>\n<li>An Azure ML \u201c<strong>Dataset<\/strong>\u201d is a reference to the data source location and a copy of its metadata. It is also known as a \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-create-data-assets?tabs=cli\">Data Asset<\/a><\/strong>\u201d.<\/li>\n<li>Creating a Machine Learning model involves choosing an algorithm, training it with data, and hyperparameter tuning. 
A trained \u201c<strong>Model<\/strong>\u201d can accept input and produce (infer) the output, which is commonly referred to as a prediction.<\/li>\n<li>Machine learning models are registered in the Azure Machine Learning workspace and deployed from the registry as a service \u201c<strong>endpoint<\/strong>\u201d, which is simply an instance of the model hosted in the cloud as a web service. It can be a \u201c<strong>Real-time<\/strong>\u201d or \u201c<strong>batch-processing<\/strong>\u201d endpoint<\/li>\n<\/ul>\n<\/li>\n<li>An Azure ML \u201c<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-datastore\">Datastore<\/a>\u201d is a reference to an existing storage account on Azure and secures the connection information without risking the authentication credentials and integrity of the source of data. It is used by Data Assets (also known as datasets) to securely connect to Azure storage services.<\/li>\n<li>\u201c<strong>Dependencies<\/strong>\u201d are the other Azure Resources like \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/storage\/common\/storage-account-overview\">Azure Storage Account(s)<\/a><\/strong>\u201d, \u201c<strong><a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/container-registry\">Azure Container Registry<\/a><\/strong>\u201d, \u201c<strong><a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/key-vault\/\">Azure Key Vault<\/a><\/strong>\u201d and \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-monitor\/app\/app-insights-overview?tabs=net\">Azure Application Insights<\/a><\/strong>\u201d, used by Azure ML workspace.<\/li>\n<li>A \u201c<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-compute-target\">Compute target<\/a>\u201d is a designated compute resource of the environment where the ML model training script is run, or inference service is deployed to and hosted. 
It\u2019s called a \u201c<strong>Linked Service<\/strong>\u201d when its location is the Machine Learning professional\u2019s local machine, on-premises resources, or cloud-based resources not hosted on Azure. When the compute target is hosted in and fully managed by Azure Machine Learning, it\u2019s called a \u201c<strong>Managed Resource<\/strong>\u201d. Azure Machine Learning offers two types of fully managed, cloud-based virtual machines (VMs) for machine learning development tasks.<\/li>\n<li>\u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-compute-instance\">Compute instance<\/a><\/strong>\u201d is intended to serve as the Machine Learning professional\u2019s development workstation. It\u2019s a VM with multiple tools and environments pre-installed for common machine-learning tasks.<\/li>\n<li>\u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-create-attach-compute-cluster?tabs=python#what-is-a-compute-cluster\">Compute cluster<\/a><\/strong>\u201d is a set of VMs capable of scaling to multiple nodes when needed for large workloads. It scales up automatically when a job is submitted and is well-suited for dev\/test deployments.<\/li>\n<\/ul>\n<p>When the compute target used to host the production deployment of an ML model for inference is fully managed by Azure Machine Learning, it\u2019s called an \u201cinference cluster\u201d. This currently includes \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-endpoints\">Azure Machine Learning endpoints<\/a><\/strong>\u201d and \u201c<strong><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-attach-kubernetes-anywhere\">Azure Machine Learning Kubernetes<\/a><\/strong>\u201d. 
Both can be used for Real-time and Batch inference.<\/p>\n<h3>Azure Machine Learning Workflow Design<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"1935\" class=\"wp-image-95829\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/word-image-95819-3-1.png\" \/><\/p>\n<p><strong>Figure 3. High-level steps of a typical Azure Machine Learning Workflow<\/strong><\/p>\n<p>The typical high-level design of an Azure Machine Learning Workflow involves the following steps:<\/p>\n<ol>\n<li><strong>Acquire Data<\/strong> \u2013 This is typically the first step for any Machine Learning workflow and involves making the raw data available to the Azure Machine Learning experiment. Azure ML offers several options to gather data including manual data entry, Azure Blob storage, Azure SQL Database, Web URLs, compressed files, etc.<\/li>\n<li><strong>Data Preparation<\/strong> \u2013 Azure ML offers numerous modules to prepare and transform data. It includes tools for filtering, cleaning missing values, adding rows and columns, changing data types, splitting the data set for training and testing, etc.<\/li>\n<li><strong>Feature Engineering<\/strong> \u2013 Azure ML provides various methods for feature engineering like Filter-based feature selection, Fisher Linear Discriminant Analysis, Permutation Feature Importance, etc.<\/li>\n<li><strong>Select and implement Machine Learning Algorithms<\/strong> \u2013 Azure ML comes with a wide array of built-in Machine Learning Algorithms and options to tune their parameters.<\/li>\n<li><strong>Train Machine Learning Model<\/strong> \u2013 Azure ML provides modules to quickly train and score Machine Learning Models.<\/li>\n<li><strong>Evaluate Machine Learning Models<\/strong> \u2013 Azure ML provides modules to easily evaluate the performance of a trained model using industry-standard metrics.<\/li>\n<\/ol>\n<p>In the rest of this article, we are going to start the process of building an Azure ML 
example.<\/p>\n<h2>Azure Machine Learning workspace setup<\/h2>\n<p>The Azure Machine Learning workspace is the central hub for all Azure ML artifacts and the repository for storing experiments, logs, etc. This section walks you through the steps of setting up your Azure Machine Learning workspace and uploading a demo file to your storage account.<\/p>\n<p><em>Note: An Azure account with access to the Azure Portal is needed to perform these steps. Please follow this<\/em> <em><a href=\"https:\/\/azure.microsoft.com\/en-ca\/free\/\">link<\/a><\/em> <em>to learn more about creating your free account with Microsoft Azure.<\/em><\/p>\n<p>This first step is designed to help you understand the process of setting up your workspace and uploading the data to be processed.<\/p>\n<ol>\n<li>Log into the Azure Portal and click on \u201c<strong>Create a resource<\/strong>\u201d &gt; select the \u201c<strong>AI + Machine Learning<\/strong>\u201d category &gt; type \u201c<strong>Azure Machine Learning<\/strong>\u201d in the search bar as shown in Figure 4, then select \u201c<strong>Azure Machine Learning<\/strong>\u201d from the list of results:<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1736\" height=\"830\" class=\"wp-image-95831\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-application-description-5.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 4. Create the Azure Machine Learning Resource<\/strong><\/p>\n<ul>\n<li>This brings up the \u201c<strong>Azure Machine Learning<\/strong>\u201d product page with a product overview and tabs for additional details about Plans, Usage Information, Support, and product reviews. 
Click the \u2018<strong>Create<\/strong>\u2019 button to launch the machine learning workspace creation wizard, as shown in Figure 5.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1769\" height=\"1131\" class=\"wp-image-95833\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-application-email-desc-1.png\" alt=\"Graphical user interface, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 5. Azure Machine Learning product page<\/strong><\/p>\n<ol>\n<li>Fill in the \u201c<strong>Basics<\/strong>\u201d tab of the machine learning workspace creation wizard. You can see my choices in Figure 6, but your values will likely differ. The exact names are not important as long as they pass validation. Many of the fields offer a default value that will work for this example, especially if you plan to delete everything after following along with the article.<\/li>\n<\/ol>\n<ul>\n<li>Select your \u201c<strong>Subscription<\/strong>\u201d from the dropdown list.<\/li>\n<li>Select your \u201c<strong>Resource group<\/strong>\u201d from the dropdown list (You can also opt to create a new resource group using the \u201ccreate new\u201d option under this field)<\/li>\n<li>Enter the \u201c<strong>Workspace name<\/strong>\u201d you would like to create (Please follow the prompts for naming rules)<\/li>\n<li>Select the \u201c<strong>Region<\/strong>\u201d for your workspace from the dropdown list.<\/li>\n<li>Enter the name for your \u201c<strong>Storage account<\/strong>\u201d (or create a new one. Please follow the prompts for naming rules)<\/li>\n<li>Enter the name for your \u201c<strong>Key vault<\/strong>\u201d (or create a new one. Please follow the prompts for naming rules)<\/li>\n<li>Enter the name for your \u201c<strong>Application insights<\/strong>\u201d (or create a new one. 
Please follow the prompts for naming rules)<\/li>\n<li>Enter the name for your \u201c<strong>Container registry<\/strong>\u201d (or create a new one. Please follow the prompts for naming rules)<\/li>\n<li>Then click the \u2018<strong>Review + create<\/strong>\u2019 button, which runs validation on your entries.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1012\" height=\"1141\" class=\"wp-image-95835\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-application-description-6.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 6. Create Machine Learning workspace <\/strong><\/p>\n<ul>\n<li>In Figure 7, after validation has passed, you will be presented with a summary of your entries for review. Click the \u2018<strong>Create\u2019<\/strong> button.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"704\" height=\"934\" class=\"wp-image-95837\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-12.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 7. Review entries and create machine learning workspace<\/strong><\/p>\n<ul>\n<li>This initiates a deployment to create your machine-learning workspace. 
Once your deployment is complete (typical run time is under one minute, but it may take a few minutes), Azure will take you to the deployment overview screen, as shown in Figure 8.\n<p>Please review this page to confirm the green check mark against all Resources and the Status \u201c<strong>OK<\/strong>\u201d<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1655\" height=\"685\" class=\"wp-image-95839\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-descr-5.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 8. Azure Machine Learning workspace deployment status<\/strong><\/p>\n<p>Click on the \u201c<strong>Go to resource<\/strong>\u201d button to navigate to your newly created machine learning workspace. This section will familiarize readers with the newly created machine-learning workspace. Figure 9 shows the screen that should follow, with the values that match your choices.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1648\" height=\"1033\" class=\"wp-image-95841\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-14.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 9. Azure Machine Learning workspace overview<\/strong><\/p>\n<ol>\n<li>The \u201c<strong>Overview<\/strong>\u201d section gives basic details of the Machine Learning workspace. The \u201c<strong>Studio web URL<\/strong>\u201d or \u201c<strong>Launch studio<\/strong>\u201d button is used to launch the Azure ML Studio\n<ol>\n<li>The \u201c<strong>Download config.json<\/strong>\u201d button downloads a file containing workspace configuration details, for use in Python scripts or Azure ML SDK notebooks that are outside of your workspace. 
This is used to connect to the workspace from any laptop or desktop.<\/li>\n<\/ol>\n<\/li>\n<li>The \u201c<strong>Tags<\/strong>\u201d menu is used to create, update, and delete tags associated with your workspace. Tags help organize, identify, and maintain Azure resources based on your company\u2019s needs, like grouping by department, application, functionality, purpose, etc.<\/li>\n<li>\u201c<strong>Access Control (IAM)<\/strong>\u201d has options to manage access for other users\/groups to the workspace, defining various access levels via roles and assigning roles to users\/groups. In addition to granting access to users\/groups, you can block individual users or entire groups from accessing your workspace with \u201c<strong>Deny<\/strong>\u201d. Such fine-grained access controls help implement complex scenarios like \u201call members of the team, except the quality assurance tester, should have direct access to the team\u2019s workspace\u201d.<\/li>\n<li>The \u201c<strong>Settings<\/strong>\u201d menu has\n<ol>\n<li>The \u201c<strong>Networking<\/strong>\u201d option for enabling\/disabling public network access to your workspace, and configuring \u201c<strong>private endpoint connections<\/strong>\u201d.<\/li>\n<li>The \u201c<strong>Properties<\/strong>\u201d option listing out all details of the workspace.<\/li>\n<li>The \u201c<strong>Locks<\/strong>\u201d option to configure the workspace as \u201c<strong>Read-only<\/strong>\u201d or to block users from the \u201c<strong>Delete<\/strong>\u201d action.<\/li>\n<\/ol>\n<\/li>\n<li>The remaining menu options include \u201c<strong>Monitoring and alerts<\/strong>\u201d, \u201c<strong>Automation<\/strong>\u201d and \u201c<strong>Support and troubleshooting<\/strong>\u201d.<\/li>\n<\/ol>\n<p>Let&#8217;s upload a test data file into the storage account (I will use it to create a Datastore). 
Navigate to the Azure Portal\u2019s left-side <strong>Main menu<\/strong> &gt; <strong>Storage accounts<\/strong> &gt; <strong>mhatreredgatedemostgacct<\/strong> &gt; <strong>Containers<\/strong>, to review the list of existing blob storage containers.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1981\" height=\"1028\" class=\"wp-image-95843\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-15.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 10. Create a new Blob storage container<\/strong><\/p>\n<p>As shown in Figure 10, click the \u2018<strong>+ Container<\/strong>\u2019 button on the top left &gt; enter a unique name in the Name text box for the new container &gt; click the \u2018<strong>Create<\/strong>\u2019 button. Once the new container is created and shows up in the container list, click to open it.<\/p>\n<p>Download the \u201c<strong>datastore_test.txt<\/strong>\u201d file from my GitHub repository using <a href=\"https:\/\/github.com\/SQLSuperGuru\/AzureMLDemos\/blob\/main\/datastore_test.txt\">this URL<\/a> and upload it to this container as shown in Figure 11. Note that the file is very small and contains just one record.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2093\" height=\"722\" class=\"wp-image-95845\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-16.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 11. 
Upload text file to the Azure blob storage container<\/strong><\/p>\n<p>After the file upload completes, navigate back to the storage account (I will use this file in a subsequent step to create the data asset).<\/p>\n<p>In the left-side menu of the storage account, navigate to \u201c<strong>Access keys<\/strong>\u201d under the \u201c<strong>Security + networking<\/strong>\u201d section as shown in Figure 12. Access keys can be used to authenticate your application\u2019s requests to this storage account. Copy one of the keys for use in the subsequent step that creates the Datastore.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1476\" height=\"1132\" class=\"wp-image-95847\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-descr-7.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 12. Copy the Storage account access key<\/strong><\/p>\n<h3>Azure Machine Learning Studio<\/h3>\n<p>Azure ML Studio (the successor of Machine Learning Studio classic) is the pivotal platform for most machine learning tasks in Azure. This powerful platform lets machine learning professionals with varying skill levels and preferences work with options that range from <em>no-code<\/em> to <em>extensive-coding<\/em>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1474\" height=\"916\" class=\"wp-image-95849\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-18.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 13. 
Launch Azure Machine Learning Studio<\/strong><\/p>\n<p>Click on the \u2018<strong>Launch Studio<\/strong>\u2019 button to open the Azure ML Studio as shown in Figure 13 (Note: you may be prompted to log in again as you are going to a different tool in a different window.) This opens the tool as shown in Figure 14.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1043\" height=\"810\" class=\"wp-image-95851\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-application-description-8.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 14. Azure Machine Learning Studio<\/strong><\/p>\n<p>The left side menu consists of three key sections: <strong>Author<\/strong>, <strong>Assets<\/strong>, and <strong>Manage<\/strong><\/p>\n<p>The \u201c<strong>Author<\/strong>\u201d section is geared towards creators of Machine learning pipelines and currently offers three authoring experiences.<\/p>\n<ul>\n<li>\u201c<strong>Notebooks<\/strong>\u201d is the code-first experience popular with seasoned machine learning professionals who are comfortable coding with Python. The notebooks created here are stored in the default storage account associated with the workspace. The notebook user interface experience is fully integrated within Azure ML studio. Azure offers numerous tutorials and samples to help accelerate model development<\/li>\n<li>\u201c<strong>Designer<\/strong>\u201d is the Drag and Drop tool for creating and deploying machine learning models. The Graphical User Interface look and feel is comparable to Azure Data Factory studio. Experiments are created using pipelines in the designer interface, which offers a wide array of pre-built and configurable components for numerous tasks.<\/li>\n<li>\u201c<strong>Automated ML<\/strong>\u201d is the no-code approach to building machine learning models. 
It takes dataset, problem class, evaluation metric, and prediction target as inputs and automatically performs steps of data preprocessing, feature selection and engineering, algorithm selection, model training, testing, and hyper-parameter tuning. A machine learning professional can review the performance of various models trained through this process and directly deploy the best one.<\/li>\n<li>The \u201c<strong>Assets<\/strong>\u201d section enables machine learning professionals to create, customize and manage the numerous assets generated during the authoring process.\n<ul>\n<li>The \u201c<strong>Data<\/strong>\u201d sub-section is used to register and manage Datastores. Datastores enable a secure connection to storage services on Azure by storing the connection information, thus eliminating the need to provide credentials in scripts for data access. The \u201c<strong>Dataset monitor<\/strong>\u201d feature (currently in preview when this article was published) can be configured to detect data drift between training and inference data.<\/li>\n<li>The \u201c<strong>Jobs<\/strong>\u201d sub-section is used to create new experiments, or run sample experiments with Notebooks and with code using the Python SDK<\/li>\n<li>\u201c<strong>Components<\/strong>\u201d are the basic building blocks used to perform a given task like data processing, model training, scoring, etc. They have predefined input\/output ports, parameters, and environments that can be shared and reused in multiple pipelines. This sub-section enables machine learning professionals to register code from GitHub, Azure DevOps, or local files to create shareable components that can be used as building blocks for several machine learning projects.<\/li>\n<li>Pipelines authored using the designer can be viewed and orchestrated via the \u201c<strong>Pipelines<\/strong>\u201d sub-section. 
The \u201c<strong>Pipeline jobs<\/strong>\u201d tab shows details of pipeline runs, while the \u201c<strong>Pipeline drafts<\/strong>\u201d tab lists pipelines that have not yet been run. When Azure ML pipelines are published as a REST endpoint (for parameterized reuse or invoking jobs from external systems), they are listed under the \u201c<strong>Pipeline endpoints<\/strong>\u201d tab.<\/li>\n<li>Environments specify the Docker image, Python packages, and software settings for executing your training and scoring scripts. They are managed and versioned entities that enable reproducible, auditable, and portable machine-learning workflows across different compute targets. The \u201c<strong>Environments<\/strong>\u201d sub-section contains a list of \u201c<strong>curated environments<\/strong>\u201d, and an option for machine learning professionals to create their own user-defined \u201c<strong>custom environments<\/strong>\u201d. Curated environments are predefined environments that offer good starting points for building your own. They are backed by cached Docker images, which reduces the run preparation cost. Custom environments are user-defined environments created from a Docker image, a Docker build context, or a conda specification with a Docker image.<\/li>\n<\/ul>\n<\/li>\n<li>The \u201c<strong>Models<\/strong>\u201d sub-section enables machine learning professionals to create, manage and track their registered models. The model registry provides useful features like version tracking and metadata tagging. All models created under the workspace are listed here.<\/li>\n<li>Azure Machine Learning \u201c<strong>Endpoints<\/strong>\u201d empower machine learning professionals to deploy machine learning models as web services.\n<ul>\n<li>Real-time endpoints are used for real-time inferencing. They contain deployments ready to receive data from clients and send responses back in real time. 
They are listed under the \u201c<strong>Real-time endpoints<\/strong>\u201d tab.<\/li>\n<li>Batch endpoints are used to run inference on large volumes of data in batch processing mode, in jobs that can run for long durations. They take pointers to data and run batch jobs asynchronously to distribute the workload on compute clusters. Their outputs can be stored for further analysis in a data store. Batch endpoints are listed under the \u201c<strong>Batch endpoints<\/strong>\u201d tab.<\/li>\n<\/ul>\n<\/li>\n<li>The \u201c<strong>Manage<\/strong>\u201d section is equipped with options for creating and managing Compute, connecting with external Linked Services, and managing the labeling of data sets.\n<ul>\n<li>Machine learning professionals use the \u201c<strong>Compute<\/strong>\u201d sub-section to create and manage various types of compute targets like \u201cCompute instances\u201d, \u201cCompute clusters\u201d and \u201cInference clusters\u201d. The \u201cAttached computes\u201d tab allows machine learning professionals to bring their own compute, like HDInsight clusters, virtual machines, Databricks clusters, etc., to use as compute targets in the Azure Machine Learning workspace.<\/li>\n<li>\u201c<strong>Linked Services<\/strong>\u201d is a collection of external Azure services (cloud services that are outside of the Azure ML workspace) that can connect with the Azure Machine Learning workspace. The guided integrations experience for linked services is currently in preview.<\/li>\n<li>The \u201c<strong>Data Labeling<\/strong>\u201d sub-section helps machine learning professionals to create, scale, and manage labeling efforts for projects involving image classification, object identification, text classification, and text Named Entity Recognition. 
Its \u201c<strong>ML Assist<\/strong>\u201d feature improves the speed and accuracy of labeling large datasets by leveraging the power of machine learning.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Azure Machine Learning Datastore and Data asset<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"2054\" class=\"wp-image-95853\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/word-image-95819-15-1.png\" \/><\/p>\n<p><strong>Figure 15. Azure Machine Learning Datastores and Data assets (Datasets)<\/strong><\/p>\n<p>Data assets and Datastores are the two key components in an Azure ML workspace that connect the actual sources of data with the machine learning authoring tools. Figure 15 shows how a Data asset provides a logical representation of the actual sources of data, while the Datastore safeguards the connection details to those sources by keeping the credentials in a separate secure location (represented by the lock icon).<\/p>\n<p>A \u201c<strong>Data asset<\/strong>\u201d (also known as a dataset) in Azure Machine Learning studio is a reference to a collection of related data sources, used in various authoring mechanisms like Automated ML, Notebooks, Designer, and Experiments. The data referenced in a Data asset can come from a wide variety of sources, such as local or web files, Azure Blob Storage and datastores, Azure Open Datasets, Azure Data Lake, and numerous cloud databases.<\/p>\n<p>A \u201c<strong>Datastore<\/strong>\u201d in Azure Machine Learning facilitates the connection between Data assets and the various sources of data by securely storing the connection and authentication information. 
With Datastores in the picture, machine learning professionals no longer need to provide credentials and data source connection details in their scripts, pipelines, or any other authoring tools.<\/p>\n<p>Azure best practices recommend keeping business data in a separate storage account so that it can be stored, managed, secured, and access-controlled separately from workspace data. However, for the simplicity of demonstrations in these articles, I am using the same storage account \u201c<strong>mhatreredgatedemostgacct<\/strong>\u201d for storing my demo files.<\/p>\n<p>In Azure Machine Learning Studio, navigate to <strong>Data<\/strong> &gt; <strong>Datastores<\/strong> as seen in Figure 16.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1508\" height=\"979\" class=\"wp-image-95854\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-descr-8.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 16. Azure Machine Learning Datastores list<\/strong><\/p>\n<p>The Azure Machine Learning workspace has a default Azure Blob Storage Datastore, \u201c<strong>workspaceblobstore (Default)<\/strong>\u201d, for temporarily saving various files generated during the authoring process. Three other Datastores found in the Storage Account, \u201c<strong>workspacefilestore<\/strong>\u201d, \u201c<strong>workspaceworkingdirectory<\/strong>\u201d, and \u201c<strong>workspaceartifactstore<\/strong>\u201d, are set up as part of creating the workspace. 
Click on each of them to reveal details such as the storage account name, storage URI, and creation date.<\/p>\n<p>To create a new datastore and link it to the blob storage container created previously, click on the \u2018<strong>+ Create<\/strong>\u2019 button as shown in Figure 17.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1573\" height=\"1121\" class=\"wp-image-95855\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-20.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 17. Create Azure Machine Learning Datastore<\/strong><\/p>\n<p>On the \u201c<strong>Create datastore<\/strong>\u201d screen:<\/p>\n<ol>\n<li>Enter a name for the new datastore.<\/li>\n<li>Select Datastore type \u201c<strong>Azure Blob Storage<\/strong>\u201d from the drop-down list.<\/li>\n<li>Choose \u201c<strong>From Azure subscription<\/strong>\u201d as the Account selection method.<\/li>\n<li>Select your \u201c<strong>Subscription ID<\/strong>\u201d and \u201c<strong>Storage account<\/strong>\u201d values from the drop-down lists.<\/li>\n<li>Select the \u201c<strong>Blob container<\/strong>\u201d name from the drop-down list (I am using the blob container created earlier in this article, which is part of the Storage account).<\/li>\n<li>Set the \u201c<strong>Authentication type<\/strong>\u201d value to \u201c<strong>Account key<\/strong>\u201d from the drop-down list.<\/li>\n<li>Paste the access key copied in the earlier step into the \u201c<strong>Account Key<\/strong>\u201d field.<\/li>\n<li>Click the \u2018<strong>Create<\/strong>\u2019 button at the bottom of the screen.<\/li>\n<\/ol>\n<p>Once the new datastore is created, it shows up under the list of datastores. 
Click it to open the \u201c<strong>Overview<\/strong>\u201d page, then navigate to the \u201c<strong>Browse preview<\/strong>\u201d section to confirm that the \u201c<strong>datastore_test.txt<\/strong>\u201d file is listed, as shown in Figure 18.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1334\" height=\"1108\" class=\"wp-image-95856\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-text-application-email-21.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 18. Datastore with test file from a blob container<\/strong><\/p>\n<p>To create the data asset from this datastore, click the \u2018<strong>create data asset<\/strong>\u2019 button and, in the \u201c<strong>Data type<\/strong>\u201d section:<\/p>\n<ul>\n<li>Enter a name for the new Data asset into the \u201c<strong>Name<\/strong>\u201d field.<\/li>\n<li>Enter a description for the Data asset into the \u201c<strong>Description<\/strong>\u201d field.<\/li>\n<li>Select \u201c<strong>File (uri_file)<\/strong>\u201d from the drop-down list in the \u201c<strong>Type<\/strong>\u201d field.<\/li>\n<\/ul>\n<p>Click \u2018<strong>Next<\/strong>\u2019 to navigate to the \u201c<strong>Storage path<\/strong>\u201d section as seen in Figure 19.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1762\" height=\"1822\" class=\"wp-image-95857\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/02\/graphical-user-interface-application-description-10.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 19. 
Create data asset<\/strong><\/p>\n<p>After setting the values, click <strong>Next<\/strong> to choose the storage path.<\/p>\n<ul>\n<li>Choose the \u201c<strong>Browse to storage path<\/strong>\u201d radio button option.<\/li>\n<li>Select the \u201c<strong>datastore_test.txt<\/strong>\u201d file.<\/li>\n<\/ul>\n<p>Click <strong>Next<\/strong> to navigate to the \u201c<strong>Review<\/strong>\u201d section, review the data asset settings, and click the \u2018<strong>Create<\/strong>\u2019 button.<\/p>\n<p>This creates the data asset for the datastore_test.txt file and makes it available for the machine learning authoring processes and tools in Azure Machine Learning.<\/p>\n<h2>Conclusion<\/h2>\n<p>This article introduced readers to Azure Machine Learning and gave an overview of its various components and high-level architecture. It showcased key features and capabilities of the platform that enable machine learning professionals to build, deploy, and manage high-quality machine learning models. I demonstrated the steps for setting up an Azure Machine Learning workspace, Azure Blob storage container, Datastore, and Data asset, and familiarized readers with the Azure Machine Learning studio.<\/p>\n<h2>References\/Further reading:<\/h2>\n<ul>\n<li>Azure Machine Learning Product overview: <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/machine-learning\/#product-overview\">https:\/\/azure.microsoft.com\/en-us\/products\/machine-learning\/#product-overview<\/a><\/li>\n<li>Azure Machine Learning Documentation: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/overview-what-is-azure-machine-learning\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/overview-what-is-azure-machine-learning<\/a><\/li>\n<li>Azure ML architecture and key concepts: <a 
href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/v1\/concept-azure-machine-learning-architecture\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/v1\/concept-azure-machine-learning-architecture<\/a><\/li>\n<li>Compute targets in Azure Machine Learning: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-compute-target\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-compute-target<\/a><\/li>\n<li>Creating Azure ML workspace: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/quickstart-create-resources\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/quickstart-create-resources<\/a><\/li>\n<li>Azure ML Environments: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-environments\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/concept-environments<\/a><\/li>\n<li>Azure ML Data assets: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-create-data-assets?tabs=cli\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-create-data-assets?tabs=cli<\/a><\/li>\n<li>Azure ML datastores: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-datastore?tabs=cli-identity-based-access%2Ccli-adls-identity-based-access%2Ccli-azfiles-account-key%2Ccli-adlsgen1-identity-based-access\">https:\/\/learn.microsoft.com\/en-us\/azure\/machine-learning\/how-to-datastore?tabs=cli-identity-based-access%2Ccli-adls-identity-based-access%2Ccli-azfiles-account-key%2Ccli-adlsgen1-identity-based-access<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The five-part series is designed to jump-start any IT professional\u2019s journey in the fascinating world of Data Science with Azure Machine Learning (Azure ML). 
Readers don\u2019t need prior knowledge of Data Science, Machine Learning, Statistics, or Azure to begin this adventure. All you will need is an Azure subscription and I will show you how&#8230;&hellip;<\/p>\n","protected":false},"author":317671,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[47,53],"tags":[],"coauthors":[101710],"class_list":["post-95819","post","type-post","status-publish","format-standard","hentry","category-data-science","category-featured"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95819","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/317671"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=95819"}],"version-history":[{"count":3,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95819\/revisions"}],"predecessor-version":[{"id":95859,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95819\/revisions\/95859"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=95819"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=95819"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=95819"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=95819"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}