{"id":80208,"date":"2018-08-03T15:24:02","date_gmt":"2018-08-03T15:24:02","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=80208"},"modified":"2026-04-16T09:03:27","modified_gmt":"2026-04-16T09:03:27","slug":"the-crimes-of-chicago-working-with-data-in-mongodb","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/the-crimes-of-chicago-working-with-data-in-mongodb\/","title":{"rendered":"Getting Started with MongoDB and PowerShell: A Real-World Data Walkthrough"},"content":{"rendered":"<p><strong>This article walks through a complete MongoDB workflow using PowerShell: downloading a real-world public dataset (6.5 million Chicago crime records), importing it into MongoDB from CSV, creating indexes for query performance, running aggregation queries to summarise the data, and exporting results for Excel reporting. It uses Studio 3T as the MongoDB GUI equivalent of SSMS. The article is a hands-on introduction for SQL Server developers exploring MongoDB for the first time.<\/strong><\/p>\n<p>Why bother with MongoDB? Although MongoDB is not yet, to my eye, an obvious choice for processing enterprise-level data, it has a useful role in the enterprise. It can be a powerful tool for automating the process of getting hold of data and transforming it into a form that can then be consumed easily by reporting tools or for updating data in a relational database. It can be a useful adjunct to scripting.<\/p>\n<p>In this article, I\u2019ll be showing how to achieve this, using an open-data example. MongoDB\u2019s obvious attraction is that it can read document data in JSON or BSON: it can even ingest CSV. It allows you to do a wide range of transformations of document data containing objects and arrays, far more than is possible with SQL Server alone.<\/p>\n<h2>Hands-on Approach to Getting Started<\/h2>\n<p>There are a number of third-party tools that allow you to manipulate and use MongoDB without having to engage in a lot of learning. For me, the equivalent to SSMS is<a href=\"https:\/\/studio3t.com\/\"> Studio3T<\/a>. You can get most of what you initially want by judicious clicking. It allows you to use SQL to generate mongo shell queries for reasonably simple joins and grouping aggregations. Its Aggregation editor is a lifeline because it allows development of complex pipelines to be done in easy steps. At some stage, however, once you\u2019ve developed MongoDB queries and aggregations, you\u2019ll want to supplement this with some form of automation processes. It could be that you want an automated data feed, or a system for collecting performance baselines. You may need to do routine reports or graphics. You will need some sort of processes that will test data, sequence MongoDB code and execute such code conditionally.<\/p>\n<p>MongoDB itself comes with several command line-interface tools which allow for easy automation. In this article, we\u2019ll just get started by using the part of MongoDB shell that provides a JavaScript API for database operations. The shell also provides higher-level commands that are not JavaScript, that are geared for admin: administrating roles, users, and logs. To run queries and get all the information that we want about database objects, we need the JavaScript Shell.<\/p>\n<p>If you are using MS Windows, then you\u2019ll generally want to use PowerShell. With PowerShell, you have several choices for automating MongoDB. You can use a module to do the work for you, you can use ODBC with a suitable driver, you can use the C# interface, or you can use MongoDB.exe. If you need to do asynchronous queries or anything elaborate, then consider the C# interface, but I like a simple life, am not too bothered about small performance issues, and I like the comfort of knowing that my interface is always up-to-date, so I use the command-line interface.<\/p>\n<h2>Installing MongoDB.<\/h2>\n<p>MongoDB is best installed on its own server, but it is happy with a reasonably-sized PC. The more memory you can provide it, the happier it is. However, if you install it on your slow workstation or laptop, you will be more attuned to performance issues and keener to deal with them. I usually use <a href=\"https:\/\/chocolatey.org\/\">Chocolatey<\/a> to install MongoDB and <a href=\"https:\/\/studio3t.com\/\">Studio3T<\/a>, and I use it to subsequently keep my MongoDB instances up-to-date.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">choco install studio3t -y\nchoco install  mongodb -y<\/pre>\n<p>At the time of writing, Version 4 is out and features a proper windows installer. This is not yet on Chocolatey, so you would need to download the installer from the MongoDB site. There are instructions in the <a href=\"https:\/\/docs.mongodb.com\/manual\/tutorial\/install-mongodb-on-windows\/#get-mongodb-community-edition\">Install MongoDB Community Edition<\/a>. I don\u2019t want to repeat these here as they may change over time.<\/p>\n<p>For the details of how to install MongoDB on a Windows server with authentication, I\u2019ve written an <a href=\"https:\/\/studio3t.com\/knowledge-base\/articles\/setup-mongodb-windows-powershell\/\">article<\/a> with my advice on how to go about it.<\/p>\n<p>There is a good introduction to getting started with MongoDB for Windows here <a href=\"https:\/\/www.red-gate.com\/simple-talk\/dotnet\/net-framework\/mongodb-basics-for-net-by-example\/\">MongoDB Basics for .NET by Example<\/a> which gives more detail.<\/p>\n<p>The database server can be started using MongoD.exe or run as a service, again using MongoD.exe. To start from PowerShell, navigate to the MongoDB Bin directory and run this command:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">.\\mongod.exe<\/pre>\n<p>To prevent the MongoDB database server from shutting down, keep this window open and launch a new PowerShell window to follow along with the rest of the article. MongoDB can be alternatively <a href=\"https:\/\/docs.mongodb.com\/manual\/tutorial\/install-mongodb-on-windows\/#configure-a-windows-service-for-mongodb-community-edition\"> run as a service<\/a>. (instructions in the link) If you run it on a server as a service and wish to access it across a network, you will need to ensure that there is access to the port you select<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">&lt;# Cut a hole in the firewall for the designated port #&gt;\nNew-NetFirewallRule &lt;#now allow it through the firewall #&gt; `\n    -DisplayName \"Allow MongoDB\" `\n    -Direction Inbound `\n    -Protocol TCP `\n    -LocalPort $MongoPort `\n    -Action Allow<\/pre>\n<p>Where $mongoport is the port you choose \u2013 usually 27017. You will also need to set the <strong>&#8211;bind_ip_all<\/strong> parameter in mongod.exe to allow it to respond to all the network cards on the server. This server-based installation requires authentication but setting this up is outside the scope of this article. I give instructions <a href=\"https:\/\/studio3t.com\/knowledge-base\/articles\/setup-mongodb-windows-powershell\/\">here<\/a>.<\/p>\n<h2>Using PowerShell<\/h2>\n<p>We\u2019ll assume that you\u2019ve got MongoDB installed in the <em>C:\\Program Files\\MongoDB\\Server\\<\/em> directory and you are using version 4.0. Obviously, the version will change over time. Run this code to set the directory and alias. Using this trick will allow the MongoDB commands to behave a bit like PowerShell cmdlets.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">&lt;#The default MongoDB Community Edition location #&gt;\n$mongoDirectory='C:\\Program Files\\MongoDB\\Server\\4.0\\bin'\nSet-Alias mongo \"$mongoDirectory\\mongo.exe\"\n#so we can test it out\nmongo --help<\/pre>\n<p>This will display the basic command-line options and parameters for the Mongo CLI.<\/p>\n<p>Let\u2019s just create a database, create a collection within that database, and finally insert some data into the collection. Having done that, we find what we selected and bring it back to the client. We\u2019ll use shell methods to achieve this. We will start out by assuming that we have not yet set up authentication and are using the local server. We could do it all in one script, but let\u2019s not get carried away yet. We will blithely assume that everything goes well<\/p>\n<pre class=\"lang:ps theme:powershell-ise\"># create the database by first accessing it ....\n$db = (mongo --quiet --eval \"dbnew=db.getSiblingDB('deleteMe')\")\n# ... and now creating a collection (normally we'd want to create collations etc)\nmongo $db --quiet --eval \"db.createCollection('sample');\"\n# write a document to the collection we have just created \nmongo $db --quiet --eval \"db.sample.save( { 'item': 'book', 'qty': 40 } )\"\n#return the resulting document as a BSON document string\nmongo $db --quiet --eval \"db.sample.find().pretty().shellPrint()\"<\/pre>\n<p>This should have returned \u2026<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80264\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/Imgcode2.png\" alt=\"\" width=\"1400\" height=\"136\" \/><\/p>\n<p>We ought to tidy this up a bit.<\/p>\n<p>For a start, once you get beyond accessing MongoDB on the same machine, then you\u2019ve got a lot more parameters to specify. Now, it is likely that we will want to use a network-hosted MongoDB server that has authentication. We might even be using an encrypted connection if we are using a cloud-based server. You have your credentials <strong>(&#8211;username, &#8211;password\u00a0and\u00a0\u2013authenticationDatabase<\/strong>) \u00a0, the host name (<strong>&#8211;host<\/strong>), the port it uses (<strong>&#8211;port<\/strong>), the specification of the TLS\/SSL and so on. Soon, you end up with a mess of detail for the connection. Fortunately, there are two solutions. You can connect to the host via a connection string that contains all the connection information, or you can splat.<\/p>\n<p>First set up your details<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$MongoPort=27017  #TCP port 27017 is the default port used by MongoDB.\n$Mongohost='MyServer' \n$Admin='Admin'\n$AdminPassword='MyAdminPassword'<\/pre>\n<p>We can set up a connection string and use it with Mongo<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$connectionString=\"mongodb:\/\/$($Admin):$($AdminPassword)@$($Mongohost):$($MongoPort)\/admin?readPreference=primary\"\n \nmongo --quiet --host $connectionString --eval \"db.adminCommand('listDatabases')\"<\/pre>\n<p>We can also do an array splat.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\"><a id=\"post-80208-_Hlk520216301\"><\/a>$p=@('--quiet',\n \u00a0\u00a0\u00a0\u00a0'--host',\"MyMongoHost\",\n \u00a0\u00a0\u00a0\u00a0'--port',\" \",\n \u00a0\u00a0\u00a0\u00a0'-u',\"`\"$Admin`\"\",\n \u00a0\u00a0\u00a0\u00a0'-p',\"`\"$AdminPassword`\"\",\n \u00a0\u00a0\u00a0\u00a0'--authenticationDatabase','\"admin\"',\n \u00a0\u00a0\u00a0\u00a0'--eval')<\/pre>\n<p>Now, we can use mongo much more tidily:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\"># ensure that the admin password works\n$Databases=(mongo  @p \"db.adminCommand('listDatabases')\")|ConvertFrom-JSON\n$Databases.databases|format-table<\/pre>\n<p>You can alter values on the fly, though this is unlikely to be necessary. Here we change the port value<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$p.IndexOf('--port')|foreach{if ($_ -ne -1){$p[$_+1]='12234'}}<\/pre>\n<p>If you don\u2019t have any security, then leave out those parameters<\/p>\n<pre class=\"lang:ps theme:powershell-ise\"><a id=\"post-80208-_Hlk520216381\"><\/a>$p=@('--quiet',\n \u00a0\u00a0\u00a0\u00a0'--host',\"MyMongoHost\",\n \u00a0\u00a0\u00a0\u00a0'--port',\" \",\n \u00a0\u00a0\u00a0\u00a0'--authenticationDatabase','\"admin\"',\n \u00a0\u00a0\u00a0\u00a0'--eval')<\/pre>\n<p>And if you are also using your local workstation then you don\u2019t even need the <strong>\u2013host<\/strong> parameter.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$p=@('--quiet',\n \u00a0\u00a0\u00a0\u00a0'--port',\" \",\n \u00a0\u00a0\u00a0\u00a0'--authenticationDatabase','\"admin\"',\n \u00a0\u00a0\u00a0\u00a0'--eval')<\/pre>\n<p>The other thing we need to do is to put in some error handling.<\/p>\n<p>Some shell functions return a JSON string that tells you how it did. We can read this in PowerShell to create native PowerShell objects so we can check the status. Also, we would rather like to save the BSON (binary JSON) result as a native PowerShell object.<\/p>\n<p>That <strong>ObjectId()<\/strong> wrapper isn\u2019t JSON, but BSON. It has extra functions that tell us what datatype the value is. It can\u2019t be parsed by<strong> ConvertFrom-JSON. <\/strong>The same is true of <strong>Date()<\/strong> and <strong>ISODate()<\/strong> functions embedded in the <strong>BSON()<\/strong>. We can heave out the <strong>ObjectId<\/strong>() with a simple regex. If we wanted to strip out all three functions we\u2019d want to use the slightly more complicated regex replacement which would use syntax like this\u2026.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$result = $subject -ireplace '(NumberInt\\(|ObjectId\\(|ISODate\\(|Date\\(|NumberLong\\()(?&lt;innerds&gt;[\"]?[\\w\\.:\\-\\+]{1,50}[\"]?)\\)', '${innerds}'<\/pre>\n<p>So here is our improved code, showing how the <strong>ireplace<\/strong> is used. To avoid an error if you have already created the database and collection, you can rename them as needed.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\"># create the database by first accessing it ....\n$db = (mongo @p \"dbnew=db.getSiblingDB('deleteMe')\")\n# ... and now creating a collection (normally we'd want to create collations etc)\n$success = (mongo $db @p \"db.createCollection('sample');\") | ConvertFrom-JSON\nif ($success.ok -ne 0) # if that went well ...\n{\n\t# write a document to the collection we have just created\n\tmongo $db @p \"db.sample.save( { 'item': 'book', 'qty': 40 } )\";\n\t#and now read it back out\n\t$result =\n\t((mongo $db @p \"db.sample.find().pretty().shellPrint()\") `\n\t\t-ireplace 'ObjectId\\(\"(?&lt;innerds&gt;[\\w]{1,50})\"\\)', '\"${innerds}\"') |\n\tConvertFrom-JSON\n}\nelse #save the resulting document as a PowerShell object\n{ write-warning $success.errmsg }\n$result #and display what we got back<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80276\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcod8-1.png\" alt=\"\" width=\"1175\" height=\"61\" \/><\/p>\n<p>We can now check to see what databases we have, and what collections and views are present in each database.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">#OK, lets list our databases\n$DatabaseCollection = @()\n((mongo @p \"db.adminCommand('listDatabases')\") |\n\tConvertFrom-JSON).databases | foreach{\n\t$This = @{ };\n\t$This.DatabaseName = $_.name;\n\t$This.Collections =\n\t(mongo $_.name @p 'db.getCollectionNames()' | ConvertFrom-JSON);\n\t$DatabaseCollection += $This\n}\n$DatabaseCollection<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80262\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/Imgcode1.png\" alt=\"\" width=\"1397\" height=\"216\" \/><\/p>\n<p>And it is easy to see what is there. To see the information in JSON format, run:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$DatabaseCollection|ConvertTo-JSON<\/pre>\n<h2>Preparing a Database<\/h2>\n<p>In order to demonstrate the basics, we\u2019ll use a real database. Particularly when dealing with database performance, nothing works as well as real data of a real size.<\/p>\n<p>I wanted a database that provided a good learning potential and had plenty of data. Human nature being what it is, the open data police records are perfect for databases of a reasonable size. It is ideal as a practice database because it has a number of interesting details, including location data. I was interested to find out much more of the detail of the trends in crime in Chicago, so I downloaded the <a href=\"https:\/\/data.cityofchicago.org\/Public-Safety\/Crimes-2001-to-present\/ijzp-q8t2\">Chicago database of crimes<\/a> from 2001 to the present provided by the Chicago police department.<\/p>\n<p>Crime levels were steeply on the decline in Chicago, and I wanted to find out more of the detail. Were all types of crime in decline or were we just seeing fewer of the more common crimes such as theft? Was there any truth in the idea that crimes increased significantly on particular days of the week?<\/p>\n<p>The data can be accessed from this page \u2026<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"408\" height=\"210\" class=\"wp-image-80243\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-19.png\" \/><\/p>\n<p>\u2026 by clicking this link.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"420\" height=\"434\" class=\"wp-image-80244\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-20.png\" \/><\/p>\n<p>The <strong>MongoImport<\/strong> utility can import data in Extended JSON, CSV, or TSV. Tab-Separated format is probably the safest. We\u2019ll create the Crimes database and a staging collection. Then we\u2019ll stock the staging collection with the data (replace the path to the file, of course).<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">Set-Alias MongoImport  \"$mongoDirectory\\mongoimport.exe\"\n#set this to the location of the exported file\n$ImportedFile='C:\\Downloads\\Crimes_-_2001_to_present.tsv'\n$db = (mongo @p \"dbnew=db.getSiblingDB('Crimes')\")\n$success=(mongo $db  @p \"db.createCollection('staging');\")|ConvertFrom-JSON\nif ($success.ok -ne 0)\n  {mongoimport --db Crimes --collection staging --type tsv --headerline --file $ImportedFile}\nelse\n    {write-warning $success.errmsg }<\/pre>\n<p>If the import is working, you\u2019ll see the status in the PowerShell window:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"328\" class=\"wp-image-80245\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-21.png\" \/><\/p>\n<p>We\u2019ll keep this string value for the database in for further examples, but you can simply substitute the literal string \u2018Crimes\u2019 if you prefer.<\/p>\n<p>We can now examine a record<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p 'db.staging.find({}).limit(1).pretty()\u2019;<\/pre>\n<p>Assuming that it all went well, we\u2019ll see this.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80266\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcode3.png\" alt=\"\" width=\"1399\" height=\"673\" \/><\/p>\n<p>Beyond changing the two dates to an ISO form that is suitable for MongoDB, there is going to be very little preparation work that we will need to do , which is just as well if you are re-reading the database from its source regularly to keep it up-to-date.<\/p>\n<p>This following command inserts a slightly modified form of the collection into the empty Crimes Collection of the Crimes database. We are using the \u2018staging\u2019 collection merely to make any data conversions or cleaning that we require. You can, of course, do the same thing by importing straight into the Crimes collection and updating the data. Those 1.0 values don\u2019t represent the data, it tells the $project stage to include that item (zero if you don\u2019t want it).<\/p>\n<p>NOTE: In MongoDB, the term \u2018aggregate\u2019 is confusing for database people, because in an RDBMS aggregation will almost always result in a summary of the data. In MongoDB, any data transformation other than returning fields from a set of documents within a collection is an aggregation. The <strong>find()<\/strong> method is really just that (See my article <a href=\"https:\/\/studio3t.com\/whats-new\/mongodb-find-method\/\">How to Use the MongoDB find() Method)<\/a>.We can only do a limited range of projection options with the <strong>find()<\/strong> method so we\u2019ll need an aggregation method.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$aggregate=@\"\ndb.Crimes.insert(db.staging.aggregate(\n    [\n        { \n            '`$project' : {\n                '_id' : 1.0, \n                'ID' : 1.0, \n                'Case Number' : 1.0, \n                'Date' : {\n                    '`$dateFromString' : {\n                        'dateString' : '`$Date'\n                    }\n                }, \n                'Block' : 1.0, \n                'IUCR' : 1.0, \n                'Primary Type' : 1.0, \n                'Description' : 1.0, \n                'Location Description' : 1.0, \n                'Arrest' : 1.0, \n                'Domestic' : 1.0, \n                'Beat' : 1.0, \n                'District' : 1.0, \n                'Ward' : 1.0, \n                'Community Area' : 1.0, \n                'FBI Code' : 1.0, \n                'X Coordinate' : 1.0, \n                'Y Coordinate' : 1.0, \n                'Year' : 1.0, \n                'Updated On' : {\n                    '`$dateFromString' : {\n                        'dateString' : '`$Updated On'\n                    }\n                }, \n                'Latitude' : 1.0, \n                'Longitude' : 1.0, \n                'Location' : 1.0\n            }\n        }\n    ], \n    { \n        'allowDiskUse' : false\n    }\n).toArray());\n\"@\nmongo $db @p $aggregate<\/pre>\n<p>There were 6526582 documents of 12 fields, taking 2.8 GB, Here is a typical database record. (using a slight modification of the previous <strong>find <\/strong>code &#8211; <strong>db.Crimes.find({}).limit(1).pretty();<\/strong><\/p>\n<pre class=\"theme:powershell-output lang:ps decode:true\">{ \n    \"_id\" : ObjectId(\"5a8f06e7318582208846f358\"), \n    \"ID\" : NumberLong(634), \n    \"Case Number\" : \"G000705   \", \n    \"Date\" : ISODate(\"2001-01-01T10:40:00.000+0000\"), \n    \"Block\" : \"024XX W MONROE ST\", \n    \"IUCR\" : \"0110\", \n    \"Primary Type\" : \"HOMICIDE\", \n    \"Description\" : \"FIRST DEGREE MURDER\", \n    \"Location_Description\" : \"CHA STAIRWELL\", \n    \"Arrest\" : false, \n    \"Domestic\" : false, \n    \"Beat\" : NumberInt(1125), \n    \"District\" : NumberInt(11), \n    \"Ward\" : NumberInt(2), \n    \"Community Area\" : NumberInt(28), \n    \"FBI Code\" : \"01A\", \n    \"X Coordinate\" : NumberLong(1159921), \n    \"Y Coordinate\" : NumberLong(1899604), \n    \"Year\" : NumberInt(2001), \n    \"Updated On\" : ISODate(\"2015-08-17T14:03:40.000+0000\"), \n    \"Latitude\" : 41.880224549, \n    \"Longitude\" : -87.688248952, \n    \"Location\" : \"(41.880224549, -87.688248952)\"\n}<\/pre>\n<p>There are several interesting points here. Each crime is given an IUCR code (Illinois Uniform Crime Reporting code) which are four digit codes that law enforcement agencies in Illinois generally have adopted to use to classify criminal incidents when taking individual reports. Other states will have their own set of codes. The Chicago Police Department currently uses more than 350 IUCR codes to classify criminal offenses. It provides the meaning of each code as well in each document, but this, in database terms redundant. The Chicago version of these codes can be obtained from here: <a href=\"https:\/\/data.cityofchicago.org\/Public-Safety\/Chicago-Police-Department-Illinois-Uniform-Crime-R\/c7ck-438e\/data\">Chicago Police Department &#8211; Illinois Uniform Crime Reporting (IUCR) Codes:<\/a><\/p>\n<p>There is an FBI code for \u2018Index\u2019 offenses that are collected by the FBI to document crime trends, and include major crimes such as murder, criminal sexual assault, robbery, aggravated assault &amp; battery, burglary, theft, motor vehicle theft, and arson. The FBI Code tells you which, of 26 different categories (if any) the crime comes into.<\/p>\n<p>To save you having to prepare this extra data, I have supplied it with the article. (See the Downloads section at the bottom of the article.)<\/p>\n<p>You can read it in simply like this<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongoimport --db Crimes --collection iucr --type json --jsonArray --stopOnError --file S:\\work\\programs\\Mongo\\Crimes.iucr.json\nmongoimport --db Crimes --collection NIBRS --type json --jsonArray --stopOnError --file S:\\work\\programs\\Mongo\\Crimes.NIBRS.json<\/pre>\n<p>These routines currently throw spurious errors that are not visible to PowerShell\u2019s Try\/Catch. Check the error values which are usually messages to say that the records were imported!<\/p>\n<p>If you need to subsequently update data, you can use <strong>MongoImpor<\/strong>t to merge or \u2018upsert\u2019 into the existing data within MongoDB. You need MongoImport\u2019s <strong>\u2013mode<\/strong> option To do this, you\u2019ll need to define an <strong>upsertField<\/strong>: this would be the original <strong>ID <\/strong>or maybe the <strong>Case Number<\/strong> in our data.<\/p>\n<h2>Tackling the Database.<\/h2>\n<p>The Crimes database, at six and a half million records, is too big to go into a spreadsheet but fits happily in either a relational database or MongoDB on the laptop, so it is ideal for experimentation. We aim to get to the level of investigation shown below, in this case using Sparklines. To do this, we need to get the data into Excel.<\/p>\n<p>To get the data to the point where we can do reporting on it within Excel, we need to reduce our working data within this database. We need to have an aggregation that just has the information that we need for our analysis. Once we have extracted just the essential information, then we\u2019re ready to do the analysis on a dataset that is quick and easy to deal with. The objective is to reduce this data down to something that can be queried easily or exported into an application that allows you to slice and dice the data to explore trends.<\/p>\n<p>In studying the data, we can reduce the detail down to a manageable size to create a collection that can be aggregated to provide data split a number of different ways. We don\u2019t need all the crime descriptions on every record, and, if we can provide all the totals for each type of crime down to a week or month, then that is sufficient. We aren\u2019t going to do any mapping for this exercise, so the location codes can go. We still have the full data, so we can extract another aggregation when it suits us.<\/p>\n<p>The sort of queries we\u2019ll want to perform are ones like this, which shows the sparklines for each type of crime between 2001 and 2018.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"458\" height=\"761\" class=\"wp-image-80247\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-23.png\" \/><\/p>\n<h2>Driving the Machine<\/h2>\n<p>Before we can query the data in the Crimes collection, we will need to index it. Without an index in the imported collection beyond the default primary key, each query takes half a minute. If you try using lookups to information in other collections, such as the IUCR codes or FBI codes, the individual queries soon run into minutes. Here\u2019s how to <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/method\/db.collection.createIndex\/\">create a MongoDB index<\/a>. We\u2019ll index the CrimeID with a unique index.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p @'\ndb.collection.createIndex({\n   ID: 1 \/\/field to include\n},\n  {\n    'name' : 'CrimeID', \n    'unique' : true\n})\n'@<\/pre>\n<p>This returns a json document that you can parse<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80268\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcode4.png\" alt=\"\" width=\"1402\" height=\"165\" \/><\/p>\n<p>The type of additional index you\u2019d want is based on the sort of querying you\u2019d want to do. You can do covering indexes very effectively and they are a great performance booster.<\/p>\n<p>Before we start using the collection methods for querying and aggregation, I must prepare you for a shock. We are using JavaScript. Whereas, in SQL, we are able to express the result that we want, in MongoDB, we have to explain how to get it, using JavaScript. Actually, MongoDB has a query optimiser and will decide the best indexes to use and so on. More recently, it has been able to change the order of pipeline operators to optimise the process and ensure that any filtering is done as early as possible in the process.<\/p>\n<p>It is possible to create queries and statements in SQL and convert them to MongoDB JavaScript. There are third-party utilities that do this to a small extent but generally they fail to translate any but the simplest SQL code, or you can use a good ODBC driver which will allow you to use SQL to do selection and grouping. With the profiler that <a href=\"https:\/\/studio3t.com\/whats-new\/mongodb-query-performance\/\">I explain here<\/a>, you can then examine, modify, adapt and use the MongoDB code that you retrieve.<\/p>\n<h2>Creating Aggregated Collections<\/h2>\n<p>If we want to do effective reporting with a tool such as Excel, we will need the quantity of crimes of every category for every month, every weekday, and every year. We\u2019re not, for the purposes of this demonstration, interested in doing it for every quarter. As well as the IUCR categories, we will want broader categories such as the FBI\u2019s <strong>NIBRS<\/strong> categories<\/p>\n<p>I have supplied both the <strong>iucr <\/strong>codes and the <strong>NIBRS<\/strong> codes and, if you\u2019ve been following along, you will have loaded them. If you prefer, you can import the complete IUCR codes from the <a href=\"https:\/\/data.cityofchicago.org\/Public-Safety\/Chicago-Police-Department-Illinois-Uniform-Crime-R\/c7ck-438e\/data\">Chicago Police Department &#8211; Illinois Uniform Crime Reporting (IUCR) Codes:<\/a> or alternatively just get the codes and their meanings from the Crimes collection.<\/p>\n<p>So now we have three collections. <strong>Crimes<\/strong>, <strong>iucr <\/strong>and<strong> NIBRS<\/strong>.<\/p>\n<p>Here is a typical document of the <strong>iucr<\/strong> collection that we can retrieve via this PowerShell statement<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p \"db.iucr.find( { 'IUCR' : '501H' }).pretty();\u201d<\/pre>\n<p>This will return this JSON document<\/p>\n<pre class=\"theme:powershell-output lang:ps decode:true\">{ \n    \"_id\" : ObjectId(\"5a99936b3185822088aa8dd2\"), \n    \"IUCR\" : \"501H\", \n    \"NIBRS\" : \"26\", \n    \"category\" : \"none\", \n    \"details\" : \"hazardous materials violation \", \n    \"indexCode\" : \"N \", \n    \"offense\" : \"other offense\"\n}<\/pre>\n<p>You\u2019ll notice that it maps the NIBRS or \u2018FBI Code\u2019 in the main collection. This, in turn, can link us to the information on the type of crime in the NIBRS associated with that. We can view this via this script.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p \"db.NIBRS.find( { 'Code' : '26' });.pretty()\"<\/pre>\n<p>\u2026which returns \u2026<\/p>\n<pre class=\"theme:powershell-output lang:ps decode:true \">{ \n    \"_id\" : ObjectId(\"5a902cb73185822088aa8bba\"), \n    \"Name\" : \"MISC NON-INDEX OFFENSE \", \n    \"Code\" : \"26\", \n    \"Type\" : \"CRIME AGAINST SOCIETY\", \n    \"Definition\" : \"DEFINITION: THE VIOLATION OF MISCELLANEOUS LAWS OR ORDINANCES.\"\n}<\/pre>\n<p>Before we present the full aggregation, we\u2019ll go over the operations that we need to include in the final query<\/p>\n<p>The first thing we\u2019ll want to do is to extract from the full database just the lowest aggregation we are likely to need for our purposes. We\u2019d want to do something like this.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$JSONresult= @'\ndb.Crimes.aggregate([\n\t\/\/ Stage 1 : Group by year, month and day of week\n    { $group: { _id: { crime: '$IUCR', date_year: { $year: '$Date' }, \n      date_month: { $month: '$Date' }, date_dayofweek: { $dayOfWeek: '$Date' } },\n      count: { $sum: 1 } } },\n \t\/\/ Stage 2 : specify the output \n    {$jason\n        $project: {\n            'IUCR': '$_id.crime',\n            'Year': '$_id.date_year',\n            'Month': '$_id.date_month',\n            'Weekday': '$_id.date_dayofweek',\n            'count': 1,\n            '_id': 0\n        }\n    }],\n    { allowDiskUse: true }).toArray();\n'@ | foreach {mongo $db @p  $_ }<\/pre>\n<p>This will give the number of crimes committed for every date period in any month, broken down by the day of the week. (e.g. Monday, Tuesday etc).<\/p>\n<p>We will also need to join to the IUCR table to get a description of each crime.<\/p>\n<p>We can do this via a lookup (JOIN to you) which we can now add<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$JSONresult= @'\ndb.Crimes.aggregate([\n  \/\/ Stage 1 : Group by year, month and day of week\n  {\n    $group: {\n      _id: {\n        crime: '$IUCR',\n        date_year: {\n          $year: '$Date'\n        },\n        date_month: {\n          $month: '$Date'\n        },\n        date_dayofweek: {\n          $dayOfWeek: '$Date'\n        }\n      },\n      count: {\n        $sum: 1\n      }\n    }\n  },\n  \/\/ Stage 2 : specify the output \n  {\n    $project: {\n      'IUCR': '$_id.crime',\n      'Year': '$_id.date_year',\n      'Month': '$_id.date_month',\n      'Weekday': '$_id.date_dayofweek',\n      'count': 1,\n      '_id': 0\n    }\n  }, { \/\/ Stage 3 : join to the IUCR table\n    $lookup: {\n      from: 'iucr',\n      localField: 'IUCR',\n      foreignField: 'IUCR',\n      as: 'IUCRDetails'\n    }\n  }\n], {\n  allowDiskUse: true\n}).toArray();\n'@ | foreach {mongo $db @p  $_ }<\/pre>\n<p>Which will give a collection of documents like this<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80280\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcode-9.png\" alt=\"\" width=\"942\" height=\"330\" \/><\/p>\n<p>And we also need to look up the FBI crime categories in the same way, so we can do crime figures split by their categories.<\/p>\n<p>We do not actually want the output arrays in the final aggregation so we remove just the data we want and remove the rest.<\/p>\n<p>We can now put all this altogether in a final aggregation that we can then use to slice and dice. This is similar to the SELECT \u2026 INTO statement, because it creates a new collection that then becomes the basis of all further aggregation.<\/p>\n<p>We, of course, want to use an automated version so we can re-aggregate when we refresh the data<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$aggregation=@'\ndb.CrimeAggregation.insert(db.Crimes.aggregate(\n    [\n        { \/\/ Group by year, month and day of week \n            '$group' : {\n                '_id' : {\n                    'crime' : '$IUCR', \n                    'date_year' : {\n                        '$year' : '$Date'\n                    }, \n                    'date_month' : {\n                        '$month' : '$Date'\n                    }, \n                    'date_dayofweek' : {\n                        '$dayOfWeek' : '$Date'\n                    }\n                }, \n                'count' : {\n                    '$sum' : 1.0\n                }\n            }\n        }, \n        { \/\/ specify the output from the grouping\n            '$project' : {\n                'IUCR' : '$_id.crime', \n                'Year' : '$_id.date_year', \n                'Month' : '$_id.date_month', \n                'Weekday' : '$_id.date_dayofweek', \n                'count' : 1.0, \n                '_id' : 0.0\n            }\n        }, \n        { \/\/join to the iucr aggregation. Make sure both are numbers!\n            '$lookup' : {\n                'from' : 'iucr', \n                'localField' : 'IUCR', \n                'foreignField' : 'IUCR', \n                'as' : 'IUCRDetails'\n            }\n        }, \n        { \/\/join to the NIBRS aggregation. Make sure both are numbers!\n            '$lookup' : {\n                'from' : 'NIBRS', \n                'localField' : 'IUCRDetails.NIBRS', \n                'foreignField' : 'Code', \n                'as' : 'NIBRS'\n            }\n        }, \n        { \/\/flatten all the values we need from the subarrays\n            '$project' : {\n                'NIBRScategory' : {\n                    '$reduce' : {\n                        'input' : '$NIBRS.Type', \n                        'initialValue' : '', \n                        'in' : {\n                            '$concat' : [\n                                '$value', \n                                '$this'\n                            ]\n                        }\n                    }\n                }, \n                'typeOfCrime' : {\n                    '$reduce' : {\n                        'input' : '$IUCRDetails.offense', \n                        'initialValue' : '', \n                        'in' : {\n                            '$concat' : [\n                                '$value', \n                                '$this'\n                            ]\n                        }\n                    }\n                }, \n                'category' : {\n                    '$reduce' : {\n                        'input' : '$IUCRDetails.category', \n                        'initialValue' : '', \n                        'in' : {\n                            '$concat' : [\n                                '$value', \n                                '$this'\n                            ]\n                        }\n                    }\n                }, \n                'NIBRS' : {\n                    '$reduce' : {\n                        'input' : '$IUCRDetails.NIBRS', \n                        'initialValue' : '', \n                        'in' : {\n                            '$concat' : [\n                                '$value', \n                                '$this'\n                            ]\n                        }\n                    }\n                }, \n                'count' : 1.0, \n                'Year' : 1.0, \n                'Month' : 1.0, \n                'Weekday' : 1.0, \n                'IUCR' : 1.0\n            }\n        }\n    ], \n    { \/\/ set the collation for the new collection\n        'allowDiskUse' : true, \n        'collation' : {\n            'locale' : 'en_US', \n            'caseLevel' : true, \n            'caseFirst' : 'off', \n            'strength' : NumberInt(1)\n        }\n    }\n).toArray())\n'@\nmongo $db @p  $aggregation<\/pre>\n<p>We now have the basic aggregation collection we need, and here is a sample document using the powerShell code<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p \"db.CrimeAggregation.find({}).limit(1).pretty();\"<\/pre>\n<p>\u2026 which returns this JSON document<\/p>\n<pre class=\"theme:powershell-output lang:ps decode:true\">{ \n    \"_id\" : ObjectId(\"5ae9e6327ddc2daf6a5c0102\"), \n    \"count\" : 6.0, \n    \"IUCR\" : 110.0, \n    \"Year\" : 2018.0, \n    \"Month\" : 1.0, \n    \"Weekday\" : 1.0, \n    \"NIBRScategory\" : \"CRIMES AGAINST PERSONS\", \n    \"typeOfCrime\" : \"homicide\", \n    \"category\" : \"murder\", \n    \"NIBRS\" : \"01A\"\n}<\/pre>\n<p>So now we have all the information we need for some pretty cool reporting. We now have a much more manageable collection with just 245,000 records and taking a mere 20Mb of space. This extract will be a lot more serviceable.<\/p>\n<p>we can run all the obvious aggregate queries from it such as the number of crimes per weekday.<\/p>\n<p>As you\u2019d expect, there are more crimes on a Saturday, and fewer on a Monday, but it is only around ten percent difference. This query ran in half a second on a very slow laptop.<\/p>\n<p>In PowerShell we would do this:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$aggregation=@'\ndb.CrimeAggregation.aggregate(\n    [\n        { \n            '$group' : {\n                '_id' : {\n                    'Weekday' : '$Weekday'\n                }, \n                'gp1' : {\n                    '$first' : '$Weekday'\n                }, \n                'gp2' : {\n                    '$sum' : '$count'\n                }\n            }\n        }, \n        { \n            '$sort' : {\n                'gp1' : 1.0\n            }\n        }, \n        { \n            '$project' : {\n                \n                '_id': 0,\n                'Weekday': '$gp1',\n                'CrimeCount': '$gp2',\n                'DayName': {\n                    $switch: {\n                      branches: [{\n                          case: {$eq: ['$gp1', 1]}, then: 'Sunday'},{\n                          case: {$eq: ['$gp1', 2]}, then: 'Monday'}, {\n                          case: {$eq: ['$gp1', 3]}, then: 'Tuesday'}, {\n                          case: {$eq: ['$gp1', 4]}, then: 'Wednesday'},{\n                          case: {$eq: ['$gp1', 5]}, then: 'Thursday'}, {\n                          case: {$eq: ['$gp1', 6]}, then: 'Friday'}, {\n                          case: {$eq: ['$gp1', 7]}, then: 'Saturday'}\n                      ],\n                      default: 'Neverday'\n                    }\n                  }\n                }       \n             }\n        ], \n    { \n        'allowDiskUse' : false\n    }\n).toArray(); \/\/you always need to do this to return data to Powershell\n'@\n(mongo $db @p  $aggregation)|convertfrom-json<\/pre>\n<p>Notice that we must use single quotes and pass the results back as an array to allow PowerShell to convert it into a PowerShell object<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80271\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcode6.png\" alt=\"\" width=\"1399\" height=\"243\" \/><\/p>\n<p>We can also see which are the commonest crimes<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">mongo $db @p @'\ndb.CrimeAggregation.aggregate(\n    [\n        { \n            '$group' : {\n                '_id' : {\n                    'typeOfCrime' : '$typeOfCrime'\n                }, \n                'COUNT(*)' : {\n                    '$sum' : NumberInt(1)\n                }\n            }\n        }, \n        { \n            '$project' : {\n                '_id' : NumberInt(0), \n                'Type Of Crime' : '$_id.typeOfCrime', \n                'Incidents' : '$COUNT(*)'\n            }\n        }, \n        { \n            '$sort' : {\n                'Incidents' : -1.0\n            }\n        }, \n        { \n            '$limit' : 20.0\n        }\n    ], \n    { \n        'allowDiskUse' : false\n    }\n);\n'@|Convertfrom-json| format-table<\/pre>\n<h1><img loading=\"lazy\" decoding=\"async\" width=\"549\" height=\"419\" class=\"wp-image-80251\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-27.png\" \/><\/h1>\n<h2>Moving on to Reporting<\/h2>\n<p>We are getting to the point where we can do more exploration using an appropriate tool such as Excel, so we export it to a CSV format that excel can read in. MongoExport.exe throws an unspecified category of error in PowerShell even if it works successfully, hence the awkward code. It works fine in the command shell but fails in PowerShell.<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">$mongoDirectory = 'C:\\Program Files\\MongoDB\\Server\\3.6\\bin'\n$MongoExportPath = \"$mongoDirectory\\mongoexport.exe\"\n$MongoExportFile = 'C:\\Downloads\\Crimeaggregation.csv'\ntry\n{\n\t&amp; <strong>cmd.exe<\/strong> \/c \"`\"$MongoExportPath`\" \/db:Crimes \/collection:CrimeAggregation \/fields:count,IUCR,Year,Month,Weekday,NIBRScategory,typeOfCrime,category,NIBRS \/type:csv --out $MongoExportFile\"\n}\ncatch\n{\n\tif ($_.Exception -like '*error*') { throw $_.Exception }\n}\n<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-80273\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/imgcode7.png\" alt=\"\" width=\"1169\" height=\"242\" \/><\/p>\n<p>We can now import this from within Excel and do a pivot table. This gives you a good summary over the years \u2026<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1382\" height=\"735\" class=\"wp-image-80252\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-28.png\" \/><\/p>\n<p>\u2026 from which we can draw sparklines<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"577\" height=\"678\" class=\"wp-image-80253\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-29.png\" \/><\/p>\n<p>We can also see how the crime level varies according to the time of the year, and a surface map tells us that February is a quiet time for all sorts of crime, both against persons and property, whereas July and August are the most difficult months.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"961\" height=\"763\" class=\"wp-image-80254\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/08\/word-image-30.png\" \/><\/p>\n<h2>Conclusions<\/h2>\n<p>So we\u2019ve done a quick walkthrough of using PowerShell with MongoDB all the way through from getting the raw data from the internet, reading it into MongoDB, and creating an extract of the full collection that allows us to do whatever ad-hoc aggregations we want. It is now so small in size that we can read it into Excel and produce graphs and charts to drill into the data and provide reports. At the up-front cost of around two minutes to reduce the full data down to our summary, we now have opened the road to quick exploration.<\/p>\n<p>By using PowerShell with MongoDB together, we\u2019ve effectively delivered the cleaned and aggregated data to the final destination, whether it be for reporting or a feed for a relational database. It is a combination with a great deal of potential.<\/p>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"https:\/\/studio3t.com\/\">Studio 3T &#8211; The GUI for MongoDB<\/a><\/li>\n<li><a href=\"https:\/\/chocolatey.org\/\">Chocolatey &#8211; The package manager for Windows<\/a><\/li>\n<li><a href=\"https:\/\/docs.mongodb.com\/manual\/tutorial\/install-mongodb-on-windows\/\">Install MongoDB Community Edition on Windows \u2014 MongoDB Manual<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/knowledge-base\/articles\/setup-mongodb-windows-powershell\/\">Setting Up a MongoDB Service Under Windows via PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/glossary\/\">Glossary \u2014 MongoDB Manual<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/dotnet\/net-framework\/mongodb-basics-for-net-by-example\/\">MongoDB Basics for .NET by Example &#8211; Simple Talk<\/a><\/li>\n<li><a href=\"https:\/\/data.cityofchicago.org\/Public-Safety\/Crimes-2001-to-present\/ijzp-q8t2\">Crimes &#8211; 2001 to present | City of Chicago | Data Portal<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/mongodb-find-method\/\">How to Use the MongoDB find() Method and Indexes to Find Data Quickly | Studio 3T<\/a><\/li>\n<li><a href=\"https:\/\/data.cityofchicago.org\/Public-Safety\/Chicago-Police-Department-Illinois-Uniform-Crime-R\/c7ck-438e\/data\">Chicago Police Department &#8211; Illinois Uniform Crime Reporting (IUCR) Codes | City of Chicago | <\/a><\/li>\n<li><a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/method\/db.collection.createIndex\/\">db.collection.createIndex() \u2014 MongoDB Manual<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/mongodb-query-performance\/\">How to Investigate MongoDB Query Performance | Studio 3T<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/introduction-to-mongodb-views\/\">A Practical Introduction to MongoDB Views<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/prepare-data-mongodb-collection\/\">How to Prepare Data Before Creating a MongoDB Collection<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/mongodb-index-strategy\/\">How to Develop an Effective MongoDB Indexing Strategy<\/a><\/li>\n<li><a href=\"https:\/\/studio3t.com\/whats-new\/improve-database-performance-with-mongodb-aggregation\/\">MongoDB, A Database with Porpoise<\/a><\/li>\n<\/ul>\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: The Crimes of Chicago: Working with Data in MongoDB<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. How do I import CSV data into MongoDB?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use mongoimport.exe with the &#8211;type csv flag, specifying the CSV file path, the target database and collection name, and whether the first row contains field headers (&#8211;headerline). Example: mongoimport &#8211;db crimes &#8211;collection chicago &#8211;type csv &#8211;headerline &#8211;file crimes.csv. For large files, mongoimport handles chunked loading automatically.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. How do I run MongoDB queries from PowerShell?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Load the MongoDB .NET driver or use the Mongo shell via Invoke-Expression from PowerShell. The MongoDB .NET driver provides a C# API that PowerShell can use natively. Alternatively, run mongo.exe shell commands from PowerShell using Start-Process or the &amp; operator, passing JSON query strings as arguments. For production automation, the MongoDB C# driver with proper connection management is the recommended approach.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. What is the difference between MongoDB and SQL Server for data analysis?<\/h3>\n            <div class=\"faq-answer\">\n                <p>MongoDB is a document database that stores data as JSON-like documents (BSON) without a fixed schema. SQL Server is a relational database with a fixed schema and strong ACID transaction support. MongoDB is well-suited for hierarchical, variable-structure, or write-heavy workloads; SQL Server is better for structured analytical workloads with complex joins and aggregations. For analytical reporting, SQL Server typically provides better performance at scale. MongoDB&#8217;s aggregation pipeline can handle many reporting needs but requires different query thinking than T-SQL.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. What is Studio 3T and do I need it for MongoDB?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Studio 3T (formerly Robomongo\/MongoChef) is a GUI tool for MongoDB that provides a similar experience to SSMS for SQL Server: query editor, connection management, collection browser, and visual explain plans. It is not required &#8211; the MongoDB shell and mongoimport\/mongoexport command-line tools are sufficient for scripted workflows. Studio 3T is useful when exploring data interactively or for teams who prefer a GUI over command-line access.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Learn MongoDB with PowerShell using a real 6.5 million record dataset: install MongoDB, import CSV data, create indexes, run aggregation queries, and export results for Excel reporting.&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":true,"footnotes":""},"categories":[159161,35,143531],"tags":[5618],"coauthors":[6813],"class_list":["post-80208","post","type-post","status-publish","format-standard","hentry","category-mongodb","category-powershell","category-t-sql-programming-sql-server","tag-mongodb"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/80208","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=80208"}],"version-history":[{"count":22,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/80208\/revisions"}],"predecessor-version":[{"id":109912,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/80208\/revisions\/109912"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=80208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=80208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=80208"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=80208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}