{"id":1589,"date":"2013-02-18T00:00:00","date_gmt":"2013-02-18T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/data-science-laboratory-system-instrumentation\/"},"modified":"2024-08-30T14:12:55","modified_gmt":"2024-08-30T14:12:55","slug":"data-science-laboratory-system-instrumentation","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/data-science\/data-science-laboratory-system-instrumentation\/","title":{"rendered":"Data Science Laboratory System &#8211; Instrumentation"},"content":{"rendered":"<div class=\"article-content\">\n<p class=\"start\">This is the second in a series on setting up a Data Science Laboratory server &#8211; <a href=\"http:\/\/www.simple-talk.com\/sql\/database-administration\/setting-up-a-data-science-laboratory\/\">the first is located here<\/a>.<\/p>\n<p>My plan is to set up a system that allows me to install and test various methods to store, process and deliver data. These systems range from simple text manipulation to Relational Databases and distributed file and compute environments. Where possible, I plan to install and configure the platforms and code locally.<\/p>\n<p>This article covers <em>Instrumentation<\/em> &#8211; the way that you can measure the time that processes take, and evaluate resource allocation, on the laboratory system.<\/p>\n<p>The laboratory system&#8217;s first function is to serve as a place to load, learn and test software that deals with data; it is important to be able to measure what you do. Some of these tools and processes you&#8217;ll be familiar with, others may be new or updated significantly. 
As always, it&#8217;s best to test these tools out yourself as you read along.<\/p>\n<h2>The Story So Far<\/h2>\n<p>My goal is to work with the following types of data systems:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.simple-talk.com\/cloud\/data-science\/data-science-laboratory-system---testing-the-text-tools-and-sample-data\/\">Text Systems<\/a><\/li>\n<li>Instrumentation (This article)<\/li>\n<li>Interactive Data Tools<\/li>\n<li>Programming and Scripting Languages<\/li>\n<li>Relational Database Management Systems<\/li>\n<li>Key\/Value Pair Systems<\/li>\n<li>Document Store Databases<\/li>\n<li>Graph Databases<\/li>\n<li>Object-Oriented Databases<\/li>\n<li>Distributed File and Compute Data Handling Systems<\/li>\n<\/ul>\n<p>I&#8217;ll repeat a disclaimer I made in the previous articles:<\/p>\n<div class=\"note\">\n<p class=\"note\"><strong>Note<\/strong>: This information is not an endorsement or recommendation to use any particular vendor, software or platform; it is an explanation of the factors that influenced my choices. You can choose any other platform, cloud provider or other software that you like &#8211; the only requirement is that it fits your needs. As I always say &#8211; <em>use what works for you<\/em>. You can examine the choices I&#8217;ve made here, change the decisions to fit your needs and come up with your own system. The choices here are illustrative only, and not meant to sell you on a software package or vendor.<\/p>\n<\/div>\n<p>The first article in this series explains my choices for the platform and the operating system, and details a few tools for text-based data interaction. The second article deals with a few examples for those text tools. Even with a few examples however, there is no substitute for studying and reading up on the tools &#8211; this series isn&#8217;t meant to be an exhaustive examination of each one. 
Hopefully the examples give you an idea of what the tool can do, and then you can decide how much further you want to investigate it.<\/p>\n<h2>The Monitoring Process<\/h2>\n<p>Performance monitoring isn&#8217;t a major objective of the instrumentation for the laboratory, since the system isn&#8217;t designed to have adequate hardware for that type of testing. In fact, you normally wouldn&#8217;t test a process for performance using a Virtual Machine, unless the production system is also a virtualized environment. There&#8217;s actually quite an established scientific method to proper performance tuning: even if you have identical environments, it&#8217;s almost impossible to completely simulate the load on the production system.<\/p>\n<p>In any scientific process, measurement is of paramount importance. The experimenter aims to change one of the many factors that together determine the behavior of whatever he is observing, whilst keeping the others constant, or at least ensuring that they exert a consistent influence across the changes in the factor being observed. One factor is changed and all others are &#8216;controlled&#8217; to try to ensure that the deliberate change is what causes the observed effects.<\/p>\n<p>The process for measurement is very important. In order to rely on the numbers you obtain from the monitoring you&#8217;ll need to &#8216;control&#8217; factors other than the one you&#8217;re experimenting with to ensure that you follow the same process under the same conditions each time you run your experiment, and carefully document as many factors as you can that could distort the results. Any note-taking tool is sufficient. You should record a narrative and the numbers from your test, and include graphics and screenshots where appropriate. 
Formats vary, depending on the type of test run, and this is where an electronic note-taking utility such as OneNote is useful, because it allows for graphics and embedded objects.<\/p>\n<div class=\"note\">\n<p class=\"note\"><em>Reference:<\/em> <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ff647791.aspx\">Monitoring .NET Applications<\/a> is a great primer on the monitoring process, and something you can use to develop your notebook layout.<\/p>\n<\/div>\n<h2>Measuring Time<\/h2>\n<p>I&#8217;ve already mentioned that it isn&#8217;t a good idea to use the lab system for formal performance tuning. The lab system by definition has more software loaded and is configured in a different way than a production system. Production systems should have no more software than is absolutely necessary to perform a task, and should be configured for that task with the best settings possible. The lab system has lots of extraneous tools for testing and doesn&#8217;t keep the same configuration for any length of time.<\/p>\n<p>That being said, there are generalizations you can make about the time it takes for a process to run. Having that number at least provides a starting point, and you can use it to compare processes, the software tools used for a run, and the impact of configuration changes.<\/p>\n<p>Most data software packages have built-in measurement tools and commands. I&#8217;ll defer the discussion of each of those and instead focus on the operating system tools that show the time aspects of a given process.<\/p>\n<p>At its most general, the outline for timing a process looks like this:<\/p>\n<ol>\n<li>Record current time <em>t1<\/em><\/li>\n<li>Run process<\/li>\n<li>Record current time <em>t2<\/em><\/li>\n<li>Show elapsed time between <em>t1<\/em> and <em>t2<\/em>: <em>t2 - t1 = complete process time<\/em><\/li>\n<\/ol>\n<p>This is a common practice in software development, but in some cases you don&#8217;t have access to the source code of a process. 
You can use operating system commands and third-party utilities to get the time that a process takes. For instance, as a simple test of duration, you could use the following batch file in Windows to get the time for a simple directory search for the files created in the last article:<\/p>\n<pre class=\"listing\">copy con c:\\users\\Administrator\\Documents\\testtime.cmd\r\ntime \/t &gt; c:\\users\\administrator\\documents\\testtime.txt\r\ndir c:\\*.tsv \/s\r\ntime \/t &gt;&gt; c:\\users\\administrator\\documents\\testtime.txt\r\ntype c:\\users\\administrator\\documents\\testtime.txt\r\n&lt;&lt;Press CTRL-Z to close the file&gt;&gt;<\/pre>\n<p>This creates a <em>cmd<\/em> batch file that first uses the <em>time<\/em> command in Windows with the <em>\/t<\/em> switch, recording the current time. Using the <em>&gt;<\/em> redirection operator, the value is placed in a file called <em>testtime.txt<\/em>. Following that, the batch runs whatever commands or processes you want, and then the same time command is run, but this time using the <em>&gt;&gt;<\/em> operator to <em>append<\/em> to the text file rather than creating it. After the run, the start and end times are displayed.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image1.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image1small.png\" alt=\"1750-image1small.png\" \/><\/a><\/p>\n<p>If your process involves more than one step, this is actually a good way to show the complete elapsed time, since it measures the start and completion of all commands in the batch, and you could also append more than one time-check throughout the run to measure each part.<\/p>\n<p>There are some drawbacks, however. The <code>time \/t<\/code> command in Windows doesn&#8217;t show seconds or milliseconds, and you may want that level of precision for the smaller workloads that you&#8217;ll test on your system. 
Also, while it&#8217;s minuscule, the time it takes to open and write the information in the <em>testtime.txt<\/em> file is included in the measurements. That isn&#8217;t exactly what you&#8217;re after.<\/p>\n<p>Using PowerShell you&#8217;ll get far more information and granularity, and you won&#8217;t have to write out the information into a file. Here&#8217;s the same process using PowerShell&#8217;s <code>Measure-Command<\/code> cmdlet:<\/p>\n<pre class=\"listing\">Measure-Command {Get-ChildItem c:\\*.tsv -Recurse}<\/pre>\n<p>The <code>{}<\/code> in PowerShell is a script block, so you can time much more than a single <code>DIR<\/code> command. You&#8217;ll get far more information, and you can combine that with other commands to output the data to a file, e-mail, Excel, or anything else PowerShell supports.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image2.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image2small.png\" alt=\"1750-image2small.png\" \/><\/a><\/p>\n<p>In a POSIX-compliant system, and in UNIX variants like Linux, you normally use a set of tools built in at the shell level to monitor a process. Using the Cygwin tools from the last tutorial, I&#8217;ll demonstrate testing one of the first text-based tools from the last article: the Cygwin tool &#8220;<code>wc<\/code>&#8221;.<\/p>\n<p>The <code>wc<\/code> tool is a simple, albeit powerful, utility. It counts the lines, words and bytes in a file. It can even work on very large files &#8211; and I wanted to find out how it does that. Does it read the entire file into memory, or stream it through and perform some calculation? 
Of course, I could look this information up, but this is an example of discovering how a tool works, a practice I&#8217;ll use in the data systems that follow.<\/p>\n<p>The command for <code>wc<\/code> is quite simple &#8211; it&#8217;s <code>wc<\/code> and then the name of the file. If you&#8217;re ever unsure about the syntax, use and format for a Cygwin command you can type &#8220;<code>man<\/code>&#8221; and the name of the command for more information.<\/p>\n<p>Using the largest of my sample data files from the last article I started the monitoring tools and then opened the Cygwin terminal, navigated to the directory where I stored that file, and then ran the following command:<\/p>\n<pre class=\"listing\">time wc long_abstracts_en.nq<\/pre>\n<p>I received the following output a little over two minutes later:<\/p>\n<pre class=\"listing\">   3769928  362732492 2801266780 long_abstracts_en.nq\r\n\r\nreal    2m17.558s\r\nuser    1m32.171s\r\nsys     0m2.312s<\/pre>\n<p>This shows the actual elapsed time (<em>real<\/em>), the CPU time the process spent running in user mode (<em>user<\/em>) and the CPU time spent in the kernel on behalf of the process (<em>sys<\/em>). Note that the format returned varies from one POSIX implementation to another, and I&#8217;m showing the Cygwin format here. Your Unix system may show another format, but they all normally contain this same information.<\/p>\n<h2>Measuring Resources<\/h2>\n<p>Measuring the time a process takes is usually the first step, followed by measuring the resources consumed by a process. Excepting specific object monitoring, most technical professionals focus on the &#8220;big four&#8221; components in a computing system: CPU, Memory, I\/O (usually the storage subsystem) and Networking. 
Showing the use of each of these components throughout an experiment informs you about the areas to tune, which areas cannot be tuned and more.<\/p>\n<p>There is an extremely rich set of tools that are designed to monitor a Windows-based system, from broad to very specific levels. The most useful for the &#8220;big four&#8221; monitoring are the built-in <em>Task Manager, Resource Monitor<\/em>, and <em>Performance Monitor<\/em>.<\/p>\n<p>If you&#8217;re writing your own code on the laboratory system you may want to take a look at this article which covers stacks and trace monitoring using a new suite of tools from Microsoft:\u00a0<a href=\"http:\/\/blogs.microsoft.co.il\/sasha\/2013\/02\/06\/windows-performance-analyzer\/\">http:\/\/blogs.microsoft.co.il\/sasha\/2013\/02\/06\/windows-performance-analyzer\/<\/a><\/p>\n<p>You should take the time to read up further on each of these tools, since they are useful not only in the laboratory system but in your day-to-day job as a technical professional &#8211; regardless of the data technology you use on Windows. This article focuses on a general overview.<\/p>\n<p>My process is to start with the broadest measurements, and then focus in on the components that show the highest pressure. This allows me to focus my efforts on what part of a process uses the most resources.<\/p>\n<h3>Task Manager<\/h3>\n<p>One of the most basic tools in Windows to analyze a process is Task Manager. This tool has been installed in the Windows operating system since the earliest releases of Windows NT 4.0, and if you haven&#8217;t evaluated it in the latest releases you&#8217;ll find significant changes in the interface and what you can collect with the tool. Start with this tool to gain the broadest set of information about resource utilization on the system as you run a process. 
You can pivot off to other performance tools from the same window.<\/p>\n<p>To start Task Manager you have several options:<\/p>\n<ul>\n<li>Press <em>CTRL+ALT+DELETE<\/em>, and then click <em>Task Manager<\/em>.<\/li>\n<li>Press <em>CTRL+SHIFT+ESC<\/em>.<\/li>\n<li>Right-click an empty area of the taskbar, and then click <em>Task Manager<\/em><\/li>\n<\/ul>\n<p>Once you start the tool, you can see the main process sheets:<em> Processes, Performance, Users, Details<\/em> and <em>Services<\/em>.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image3.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image3small.png\" alt=\"1750-image3small.png\" \/><\/a><\/p>\n<p>You&#8217;ll notice from the graphic that you have access to information on the CPU, Memory and Network components, but not the I\/O subsystem. Right-clicking in this tab allows you to show a smaller version of the graphs for the components on the system, or expand the components to show individual graphs rather than red-yellow-blue health indicators.<\/p>\n<p>Selecting the <em>Details<\/em> panel allows you to see individual processes, which can lead to the resources used by those processes.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image4.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image4small.png\" alt=\"1750-image4small.png\" \/><\/a><\/p>\n<p>Right-clicking the headers in the details pane allows you to add more details about a process, such as CPU time, I\/O time and more.<\/p>\n<p>You can learn more about this tool here:<\/p>\n<ul>\n<li><a 
href=\"http:\/\/blogs.technet.com\/b\/askperf\/archive\/2012\/10\/27\/windows-8-windows-server-2012-the-new-task-manager.aspx\">http:\/\/blogs.technet.com\/b\/askperf\/archive\/2012\/10\/27\/windows-8-windows-server-2012-the-new-task-manager.aspx<\/a><\/li>\n<li><a href=\"http:\/\/4sysops.com\/archives\/overview-of-the-task-manager-in-windows-server-2012\/\">http:\/\/4sysops.com\/archives\/overview-of-the-task-manager-in-windows-server-2012\/<\/a><\/li>\n<\/ul>\n<h3>Resource Monitor<\/h3>\n<p>The next level of detail is available in the <em>Resource Monitor<\/em> tool. You can get to this tool from the <em>Task Manager<\/em>, or:<\/p>\n<ul>\n<li>Press the <em>Windows<\/em> key and type &#8220;<em>Resource Monitor<\/em>&#8220;<\/li>\n<li>The Command: <em>%windir%\\system32\\perfmon.exe \/res<\/em><\/li>\n<li>The Command: <em>%windir%\\system32\\resmon.exe<\/em><\/li>\n<\/ul>\n<p>Resource Monitor focuses on the big four, specifically more real-time monitoring as a process runs.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image5.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image5small.png\" alt=\"1750-image5small.png\" \/><\/a><\/p>\n<p>It&#8217;s a different view than the <em>Task Manager<\/em>, starting with the four main components listed on the first panel, along with graphs and small health monitors next to each component. From there you can drill in to a specific component by selecting the appropriate tab.<\/p>\n<p>I use this tool to start with the <em>Disk<\/em> and <em>Network<\/em> components. 
By opening the <em>Disk<\/em> tab and sorting the columns by <em>Response time<\/em>, I can quickly locate which processes are taking the most time on the disk, and start identifying the chain of processes that are affected.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image6.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image6small.png\" alt=\"1750-image6small.png\" \/><\/a><\/p>\n<p>You can read more about <em>Resource Monitor<\/em> and how to use it here:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.makeuseof.com\/tag\/closer-windows-resource-monitor\/\">http:\/\/www.makeuseof.com\/tag\/closer-windows-resource-monitor\/<\/a><\/li>\n<li><a href=\"http:\/\/blogs.technet.com\/b\/askperf\/archive\/2012\/02\/01\/using-resource-monitor-to-troubleshoot-windows-performance-issues-part-1.aspx\">http:\/\/blogs.technet.com\/b\/askperf\/archive\/2012\/02\/01\/using-resource-monitor-to-troubleshoot-windows-performance-issues-part-1.aspx<\/a><\/li>\n<\/ul>\n<h3>Performance Monitor<\/h3>\n<p>So far the tools shown work in real time, which means you&#8217;re sitting at the console watching a particular sequence run. While that&#8217;s common in a laboratory system, there are times that the process will take longer than you&#8217;re willing to sit and watch the screen.<\/p>\n<p>Most of the time you&#8217;re also going to want to record the performance and the impact on the big four components of your system. To do that, you&#8217;ll need a way to record the results of your measurements, and for that one of the best tools is the <em>Windows Performance Monitor<\/em>. This tool has been around since the earliest versions of Windows Server, and is the primary tool for monitoring not only the operating system, but software that plugs into this architecture. 
It&#8217;s been upgraded significantly in the latest releases of Windows Server, so it deserves another look if you&#8217;ve used it before.<\/p>\n<p>I&#8217;ll provide some resources that show you examples and practical uses of this tool, since there are so many references available for that. Rather than focus on each specific use-case for Performance Monitor, I&#8217;ll explain generally how it works, and then how you can use it with your laboratory system. I&#8217;ll also explain a command-line tool that allows you to create monitoring sessions, start them, and record their output, all without having to use the graphical interface.<\/p>\n<p>You can start the <em>Performance Monitor<\/em> by pressing the <em>Windows<\/em> key and typing &#8220;<em>Performance Monitor<\/em>&#8220;, or you can type &#8220;<em>perfmon<\/em>&#8221; at the command-line in CMD or PowerShell.<\/p>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image7.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image7small.png\" alt=\"1750-image7small.png\" \/><\/a><\/p>\n<p>Once inside, you&#8217;ll see there are three primary areas you have access to:<\/p>\n<ul>\n<li><em>Monitoring Tools <\/em><\/li>\n<li><em>Data Collector Sets<\/em><\/li>\n<li><em>Reports<\/em><\/li>\n<\/ul>\n<p><em>Performance Monitor<\/em> works using Objects, such as the <em>CPU, Counters<\/em> on those <em>Objects<\/em>, such as <em>%Processor Time<\/em>, and displaying <em>Values<\/em> for those <em>Counters<\/em> as time goes by. Each <em>Object\/Counter<\/em> pair determines the time interval they will present for monitoring, and the <em>Values<\/em> are collected based on that interval and then whatever granularity on that interval you want.<\/p>\n<p>A monitoring session involves picking <em>Objects<\/em> and <em>Counters<\/em> and examining the <em>Values<\/em> they present. 
You could, for instance, select various <em>Counters<\/em> from the <em>Memory, CPU, Network Interface<\/em> and <em>I\/O<\/em> subsystem <em>Objects<\/em> at one time to form a picture of the activity on the system. Think of it as a manual selection of the <em>Objects<\/em> shown in <em>Task Manager<\/em> or the <em>Resource Monitor<\/em>. By default, the values are collected and displayed in a graphical line chart on the screen as they occur.<\/p>\n<p>You can, however, change the behavior and have <em>Performance Monitor<\/em> record the values to a file, either <em>Binary<\/em> or <em>Text<\/em>, and examine them later. My process is to set up the monitoring to write to a text file and then allow it to run for a short period of time. I then fire off my experiments, allow them to complete, and wait a few moments after completion to stop the collection. I then open the text file in Excel or R and study and graph the results. I allow the &#8220;white space&#8221; at the beginning and end of the run to ensure I catch the background noise of the normal system operation. I will repeat the process during subsequent runs of my experiment as I change variables to see which components are exercised during the various test runs.<\/p>\n<p>Just below the <em>Monitoring Tools<\/em> item are <em>Data Collector Sets<\/em> (my apologies to the SQL Server experts; at Microsoft we do tend to re-use terms sometimes). These collect <em>Event Tracing for Windows<\/em> (<em>ETW<\/em>) events that can be analyzed and shown in the <em>Reports<\/em> item just below <em>Data Collector Sets<\/em>. There are two default <em>Data Collector Sets<\/em> you can run: one reports on the system configuration and health, and the other is a 60-second performance collection. The reports are quite valuable, and you can also create your own.<\/p>\n<p>You can also create, schedule, run and collect ETW traces using the <em>Logman<\/em> command-line tool. 
I&#8217;ll reference the documentation to that tool in a moment.<\/p>\n<p>One word of caution when using the <em>Performance Monitor<\/em> tool &#8211; it&#8217;s not to be blindly trusted. You must carefully understand how the numbers for each counter are presented, since some of them are written by Microsoft and others by third-party vendors, if you are using <em>Performance Monitor Counters<\/em> with their products. Also, it&#8217;s important to consider the granularity and other equality conditions when combining <em>Objects<\/em> and <em>Counters<\/em>.<\/p>\n<p>Here are a few references you should read, and experiment with on your system:<\/p>\n<ul>\n<li>General Documentation on Windows Performance Monitor: <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/cc749249.aspx\">http:\/\/technet.microsoft.com\/en-us\/library\/cc749249.aspx<\/a><\/li>\n<li>A &#8220;Using Windows Performance Monitor&#8221; Guide: <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/cc771692.aspx\">http:\/\/technet.microsoft.com\/en-us\/library\/cc771692.aspx<\/a> (For Windows 2008 but holds for Windows 2012 as well)<\/li>\n<li>Logman reference page: <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/cc753820(v=WS.10).aspx\">http:\/\/technet.microsoft.com\/en-us\/library\/cc753820%28v=WS.10%29.aspx<\/a><\/li>\n<li>Practical example of Logman: <a href=\"http:\/\/blogs.technet.com\/b\/askperf\/archive\/2008\/05\/13\/two-minute-drill-logman-exe.aspx\">http:\/\/blogs.technet.com\/b\/askperf\/archive\/2008\/05\/13\/two-minute-drill-logman-exe.aspx<\/a><\/li>\n<\/ul>\n<h3>Sysinternals<\/h3>\n<p>Another common set of tools to monitor very specific areas of your system is the <em>Sysinternals<\/em> suite of software, formerly its own company and now part of Microsoft. 
The entire set of tools is listed here: <a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb545027\">http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb545027<\/a> and there are many utilities you can use to monitor your system. I downloaded the entire suite (<a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb842062\">http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb842062<\/a>) since I&#8217;ll use many of these tools to evaluate my system.<\/p>\n<p>I focus primarily on the following tools from this suite:<\/p>\n<ul>\n<li><a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb896645\">Process Monitor<\/a> &#8211; Shows file system, Registry, process, thread and DLL activity<\/li>\n<li><a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb896682\">PsList<\/a> &#8211; Shows processes and threads<\/li>\n<li><a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb896656\">ListDLLs<\/a> &#8211; Shows information on currently loaded DLLs. This is more useful than you may think.<\/li>\n<li><a href=\"http:\/\/technet.microsoft.com\/en-us\/sysinternals\/bb896655\">Handle<\/a> &#8211; Shows the files opened by each process<\/li>\n<\/ul>\n<p>This is a short list, but I find all of the tools useful from time to time. When you download these utilities as a complete package, they come in a &#8220;zipped&#8221; file, so you&#8217;ll need to expand them into whatever directory you like, and add them to the path if you want them to be available directly from the command line. 
I copied mine to a <em>c:\\Program Files\\Sysinternals<\/em> directory, and then used the following process to set the path:<\/p>\n<ul>\n<li>Open <em>Control Panel\\System and Security\\System<\/em><\/li>\n<li>Click on <em>Change Settings<\/em><\/li>\n<li>Open the <em>Advanced<\/em> tab<\/li>\n<li>Click on <em>Environment Variables<\/em><\/li>\n<li>Click the <em>Path<\/em> variable in the bottom part of the panel and then <em>Edit<\/em><\/li>\n<li>Add <em>;c:\\PROGRA~1\\Sysinternals<\/em><\/li>\n<li>Close all panels<\/li>\n<\/ul>\n<p class=\"illustration center\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image8.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1750-image8small.png\" alt=\"1750-image8small.png\" \/><\/a><\/p>\n<p>Note that you don&#8217;t have to add these utilities to the path of your system; it just makes things simpler to type at the command line.<\/p>\n<p>The &#8220;Help&#8221; files in these utilities are quite good &#8211; start there for the most information, and then look at the download links above for more specifics on each. Note &#8211; if you use the process described, you&#8217;ll need to open the rights in that directory so that the help files can launch. You could also place the tools in another directory to avoid this concern.<\/p>\n<p>In the next article in this series, I&#8217;ll cover tools you can use to interact directly with data, meant to be used (although not exclusively) at the terminal.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>It is sensible to check the performance of different solutions to data analysis in &#8216;lab&#8217; conditions. 
Measurement by instrumentation makes it easier to develop systems that are efficient.&hellip;<\/p>\n","protected":false},"author":221875,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[47],"tags":[5336,5764,4364,5818],"coauthors":[45107],"class_list":["post-1589","post","type-post","status-publish","format-standard","hentry","category-data-science","tag-cloud","tag-data-science","tag-monitoring","tag-resource-management"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1589","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221875"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1589"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1589\/revisions"}],"predecessor-version":[{"id":79412,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1589\/revisions\/79412"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1589"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1589"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1589"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1589"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}