{"id":68089,"date":"2016-09-16T13:44:02","date_gmt":"2016-09-16T13:44:02","guid":{"rendered":"https:\/\/www.simple-talk.com\/?p=68089"},"modified":"2021-02-23T15:48:44","modified_gmt":"2021-02-23T15:48:44","slug":"start-big-data-apache-spark","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/cloud\/big-data\/start-big-data-apache-spark\/","title":{"rendered":"How to Start Big Data with Apache Spark"},"content":{"rendered":"<p>There is no single definition of Big Data, but there is currently a lot of hype surrounding it. An accurate operational definition is that organizations have to use Big Data when their data processing needs get too big for traditional relational database systems (RDBMS). Knowing that Big Data is really big and more common every day, it does not mean that starting from tomorrow you will be analyzing petabytes of data from <a href=\"http:\/\/wlcg.web.cern.ch\/\">The Large Hadron Collider<\/a> (I wish you). However; there is a quite big chance that you will come across the project where you will have to process the data that can\u2019t be stored on your personal flash drive.<\/p>\n<p>Hence, in this article I will start by explaining why it might be a good idea for a developer or data professional to get familiar with Apache Spark &#8211; a fast and general engine for large-scale data processing. First I explain the advantages of it, next we\u2019ll see some basic examples how to use its shell to write your first application. Then we will jump into Databricks &#8211; a simple, integrated development environment, where you can even use your familiar SQL skills to tackle the analysis of vast quantities semi-structured data.<\/p>\n<h2>Do you really need Apache Spark?<\/h2>\n<p>There are many data frameworks that allow you to process data and you probably have your favorite tool already in your tool belt. 
You\u2019re familiar with its syntax and you know its caveats and limitations; so what reason is there to learn yet another framework?<\/p>\n<p>If you are struggling with a project that has to deal with a high volume of unstructured data, and you need to get a measure of business insight from it, you probably need a completely different computing framework; one that allows you to process the data with ease, without waiting hours for the results.<\/p>\n<h3>Old school approach<\/h3>\n<p>The first question that arises is, why can\u2019t you use Relational Database Systems (RDBMS) with many disks to do large-scale analysis? Why would you need a completely new data framework? The answer to these questions comes from the way that disk drives are evolving: seek time is improving more slowly than transfer rate. If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than it would to stream through it sequentially at the speed of the transfer rate. You could also buy a Massively-parallel processing (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Massively_parallel_(computing)\">MPP<\/a>) SQL appliance that could do the job for you. Beware, though, that <a href=\"http:\/\/download.microsoft.com\/download\/1\/2\/B\/12B251C0-09F0-472B-85F3-F03F91BF4976\/Microsoft_Analytics_Platform_System_Delivers_Best_TCO_to_Performance_white_paper_EN_US.pdf\">MPP runs on expensive<\/a>, specialized hardware tuned for CPU, storage and network performance, whereas a cheaper solution such as Hadoop runs on a cluster of commodity servers.<\/p>\n<h3>Meet Apache Hadoop<\/h3>\n<p>You may be asking, if the old-school RDBMS approach is not likely to be appropriate for very large unstructured data, why not just use Hadoop with its classical MapReduce approach? 
It seems to be a mature technology, as companies like Facebook, Twitter and LinkedIn use it to harness their Big Data.<\/p>\n<p>The Hadoop ecosystem emerged as a cost-effective way of working with such large data sets. It imposes a divide-and-conquer programming model, called MapReduce. Computation tasks are broken into units that can be distributed around a cluster of commodity servers, thereby providing cost-effective, horizontal scalability. Underneath this computation model is a distributed file system called the Hadoop Distributed Filesystem (HDFS).<\/p>\n<p>MapReduce involves a two-step batch process:<\/p>\n<ul>\n<li><strong>Map phase<\/strong> &#8211; first, the data is partitioned and sent to mappers, which generate key-value pairs;<\/li>\n<li><strong>Reduce phase<\/strong> &#8211; the key-value pairs are then collated, so that the values for each key are together, and then the reducer processes the key-value pairs to calculate one value per key.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-359.png\" width=\"528\" height=\"357\" \/><\/p>\n<p class=\"caption\">Figure 1: MapReduce in one picture<\/p>\n<h3>Apache Hive is coming!<\/h3>\n<p>If we\u2019re sure that RDBMS is not the right tool for our job, let\u2019s use Hadoop! However, a challenge remains: how do you move an existing data infrastructure based on traditional relational databases to Hadoop? What about the large base of SQL users, database developers and administrators, as well as casual users who use SQL on a daily basis?<\/p>\n<p>SQL, first developed by IBM in the early 1970s, is widespread for a reason. It\u2019s an effective, intuitive model for organizing and using data. Mapping these familiar data operations to the low-level MapReduce Java API can be daunting, even for experienced Java developers. This is where Apache Hive comes in. 
Apache Hive grew up at Facebook, which needed a way to manage and learn from the huge volumes of data that its burgeoning social network produced every day. After trying a few different systems, the team chose Hadoop for storage and processing, since it was cost-effective and met the scalability requirements. Apache Hive was created to make it possible for analysts with strong SQL skills, but meager Java programming skills, to run queries on the huge data volumes stored in HDFS.<\/p>\n<p>Apache Hive provides a SQL dialect, called the Apache Hive Query Language (abbreviated Apache HiveQL or just HQL), for querying data stored in a Hadoop cluster. Behind the scenes, it translates most queries to MapReduce jobs, thereby exploiting the scalability of Hadoop while presenting a familiar SQL abstraction. Apache Hive does this drudgery for you, so you can focus on the query itself. Hive even supports an ODBC driver, so that existing applications are easier to convert.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-360.png\" \/><\/p>\n<p class=\"caption\">Figure 2: Facebook data scientist reality<\/p>\n<p>Sooner or later, data scientists at Facebook realized that working with datasets that always had to be loaded from disk was very slow. It turned out that the two most important performance killers are disk IO operations and data serialization and replication in HDFS. They wanted a framework that operates in memory, instead of writing and reading every intermediate step of a computation to and from disk.<\/p>\n<h2>Getting started with Apache Spark<\/h2>\n<p>Spark is known for being able to keep large working datasets in memory between jobs. Thanks to this, many distributed computations, even ones that process terabytes of data across dozens of machines, can run in a few seconds. It provides a performance boost that is up to <a href=\"http:\/\/spark.apache.org\/\">100 times faster than Hadoop<\/a>. 
Unlike most of the other Big Data processing frameworks, Spark does not use MapReduce as an execution engine; instead, it uses its own distributed runtime for executing work on a cluster.<\/p>\n<p>Spark can be used with Python, Java, Scala, R, SQL and, recently, <a href=\"https:\/\/databricks.com\/blog\/2016\/08\/03\/developing-apache-spark-applications-in-net-using-mobius.html\">.NET<\/a>, so it is up to you to select your favorite programming language. It is written in Scala and runs on the Java Virtual Machine (JVM). You can run it either on your laptop or on a computing cluster; all you need is an installation of Java 6+.<\/p>\n<p>The first step in using Spark is to download and unpack it. Let\u2019s start by <a href=\"http:\/\/spark.apache.org\/downloads.html\">downloading<\/a> a recent precompiled version of Spark. Select the package type \u201cPre-built for Hadoop 2.7 and later,\u201d and click \u201cDirect Download.\u201d This will download a compressed file.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-361.png\" \/><\/p>\n<p class=\"caption\">Figure 3: Apache Spark official <a href=\"http:\/\/spark.apache.org\/downloads.html\">download page<\/a><\/p>\n<p>If you are a Windows user like me, you may run into issues installing Spark into a directory with a space in the name. Instead, install Spark in a directory with no spaces (e.g., C:\\spark). You also don\u2019t need a Hadoop cluster in place, but as a Windows user you do need to mimic the Hadoop environment. To do this:<\/p>\n<ol>\n<li>Install the Java Development Kit on your machine.<\/li>\n<li>Create a local directory (e.g. 
C:\\spark-hadoop\\bin\\) and copy the <a href=\"https:\/\/github.com\/steveloughran\/winutils\">Windows binaries for Hadoop<\/a> (winutils.exe) there.<\/li>\n<li>Add an environment variable called HADOOP_HOME and set its value to the directory above the bin folder that contains winutils.exe (e.g. C:\\spark-hadoop).<\/li>\n<\/ol>\n<p>To start with Spark, let\u2019s run an interactive session in your command shell of choice. Go to your Spark bin directory and start up the shell with the following command: <em>spark-shell<\/em>. The shell prompt should appear within a few seconds. It is a <a href=\"http:\/\/www.scala-lang.org\/\">Scala<\/a> REPL (read-eval-print loop) with a few Spark additions.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-362.png\" \/><\/p>\n<p class=\"caption\">Figure 4: Apache Spark shell welcome screen<\/p>\n<p>The easiest way to demonstrate the power of Spark is to walk through the example from the <a href=\"http:\/\/spark.apache.org\/docs\/latest\/quick-start.html\">Quick Start Guide<\/a> in the official Spark documentation. Spark\u2019s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). Once created, RDDs offer two types of operations: transformations and actions. Actions compute a result based on an RDD. 
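<\/p>\n<p>Before we touch the shell, the split between the two kinds of operations can be felt with a minimal plain-Scala sketch (no Spark required): a lazy Iterator plays the role of a transformation chain, and nothing is computed until a terminal call &#8211; the \u201caction\u201d &#8211; forces it. The data here is made up purely for illustration.<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">\/\/ Plain-Scala analogy: transformations are lazy, actions force the work\r\nval lines = Iterator(\"# Apache Spark\", \"Spark is fast\", \"a plain line\")\r\nval withSpark = lines.filter(_.contains(\"Spark\")) \/\/ \"transformation\": nothing runs yet\r\nval n = withSpark.size                            \/\/ \"action\": evaluation happens here\r\nprintln(n) \/\/ 2\r\n<\/pre>\n<p>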
Let\u2019s make a new RDD from the text of the README file in the Spark source directory:<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">scala&gt; val textFile = sc.textFile(\"README.md\")\r\ntextFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at :24\r\nscala&gt; textFile.count() \/\/ Number of items in this RDD\r\nres0: Long = 99\r\nscala&gt; textFile.first() \/\/ First item in this RDD\r\nres1: String = # Apache Spark\r\n<\/pre>\n<p class=\"caption\">Figure 5: First action in Spark shell<\/p>\n<p>One example of an action, called in the example above, is <a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">count<\/a>, which returns the number of elements in an RDD. Transformations, on the other hand, construct a new RDD from a previous one. So let\u2019s use a <a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">filter<\/a> transformation to return a new RDD with a subset of the items in the file.<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">scala&gt; val linesWithSpark = textFile.filter(line =&gt; line.contains(\"Spark\"))\r\nlinesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at :27\r\n\r\nscala&gt; linesWithSpark.take(10)\r\nres0: Array[String] = Array(# Apache Spark, Spark is a fast and general cluster computing system for Big Data. It provides, rich set of higher-level tools including Spark SQL for SQL and DataFrames,, and Spark Streaming for stream processing., ...)\r\n<\/pre>\n<p class=\"caption\">Figure 6: Transformation example in Spark shell<\/p>\n<p>Although you can define new RDDs using transformations at any time, Spark computes them only in a lazy fashion &#8211; that is, the first time an action is used. This approach might seem unusual at first, but it makes a lot of sense when you are working with Big Data. Consider the above example, where we defined a text file and then filtered the lines that include the \u201cSpark\u201d keyword. 
If Spark were to load and store all the lines from the file as soon as we wrote lines = sc.textFile(&#8230;), it would waste a lot of storage space, given that we then immediately filter out many lines. Instead, it computes and returns only the data that results from applying all the transformations.<\/p>\n<p>Our picture of a Big Data framework wouldn\u2019t be complete without a word count example, and Spark can implement the MapReduce pattern easily. So, let\u2019s do this in a last example:<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">scala&gt; val wordCounts = { textFile\r\n     | .flatMap(line =&gt; line.split(\" \"))\r\n     | .map(word =&gt; (word, 1))\r\n     | .reduceByKey((a, b) =&gt; a + b)\r\n     | }\r\nwordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[9] at reduceByKey at :29\r\nscala&gt; wordCounts.collect()\r\nres6: Array[(String, Int)] = Array((means,1), (under,2), (this,3), ...)\r\n<\/pre>\n<p class=\"caption\">Figure 7: Word count example in Spark shell<\/p>\n<p>Here, we have combined the\u00a0<a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">flatMap<\/a>,\u00a0<a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">map<\/a>, and\u00a0<a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">reduceByKey<\/a>\u00a0transformations to compute the per-word counts. To collect the word counts, we can use the\u00a0<a href=\"http:\/\/spark.apache.org\/docs\/latest\/programming-guide.html\">collect<\/a>\u00a0action. As you can see, it is pretty simple to implement this word count example on a single machine. Keep in mind that, in a distributed framework, it is a classic challenge, because it involves processing data from many nodes.<\/p>\n<p>Spark records the detailed progress information produced by the driver and executor processes into log files. However, there is a better way to learn about the behavior and performance of your application. 
Out of the box, Spark offers a built-in web UI that contains detailed information about the jobs being executed. It is available on the machine where the driver is running (on port 4040 by default). You can easily analyze and monitor the progress of execution, which helps you tweak your code for better performance.<\/p>\n<h2><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-363.png\" \/><\/h2>\n<p class=\"caption\">Figure 8: Spark\u2019s built-in web UI<\/p>\n<p>In Spark, a job is associated with a chain of RDD dependencies organized in a directed acyclic graph (DAG). Another useful feature of the Spark UI is DAG visualization. It is a great way to learn how the application is executed step by step. You can see that each RDD maintains a pointer to one or more parents, along with metadata about the type of relationship they have. This constitutes the lineage of an RDD.<\/p>\n<h2>Databricks Community Edition \u2013 you owe it to yourself to have a cluster!<\/h2>\n<p>Apache Spark is a sophisticated distributed computation framework for parallel code execution across many machines. While the abstractions and interfaces are simple, the same isn\u2019t true for managing clusters of computers and ensuring production-level stability. <a href=\"https:\/\/databricks.com\/\">Databricks<\/a>, a company founded by the creators of Spark, makes Big Data simple by providing Apache Spark as a hosted solution. The free <a href=\"https:\/\/databricks.com\/blog\/2016\/02\/17\/introducing-databricks-community-edition-apache-spark-for-all.html\">Databricks Community Edition<\/a> (DCE) service enables everyone to learn and explore Apache Spark by providing access to a simple, integrated development environment. That means you don\u2019t have to learn complex cluster management concepts or perform tedious maintenance tasks to take advantage of Spark! 
All you need to do is <a href=\"https:\/\/community.cloud.databricks.com\">register<\/a>, click one button and voil\u00e0, you have your own Big Data cluster in the cloud!<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-364.png\" \/><\/p>\n<p class=\"caption\">Figure 9: Databricks service welcome page<\/p>\n<p>The first thing you should do after logging into DCE is to read the great publication <a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/346304\/2168141618055043\/484361\/latest.html\">A Gentle Introduction to Apache Spark on Databricks<\/a>. After that, you will be familiar with concepts such as the workspace and the notebook. The next step is to create a cluster with a single click of your mouse.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-365.png\" \/><\/p>\n<p class=\"caption\">Figure 10: Creating a cluster in DCE<\/p>\n<p>At the time of writing this article, you will receive a free cluster with 48 GB of memory and 7 CPU cores for worker nodes, and 6 GB and almost one CPU core for the driver. This sounds promising, so let\u2019s go and write some code! Instead of doing \u201chello world\u201d examples, let\u2019s jump to some real-world data from Wikipedia. DCE offers access to <a href=\"https:\/\/datahub.io\/dataset\/wikipedia-clickstream\/\">3.2 billion requests from Wikipedia collected during the month of February 2015<\/a>. Let\u2019s use it in our first notebook. 
We will use the <a href=\"https:\/\/github.com\/databricks\/spark-csv\">Databricks CSV library<\/a> to read the file.<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">\/\/ Load the raw dataset stored as a CSV file\r\nval clickstreamRaw = sqlContext.read\r\n   .format(\"com.databricks.spark.csv\")\r\n   .option(\"header\", \"true\")\r\n   .option(\"delimiter\", \"\\t\")\r\n   .option(\"inferSchema\", \"true\")\r\n   .load(\"dbfs:\/\/\/databricks-datasets\/wikipedia-datasets\/data-001\/clickstream\/raw-uncompressed\")\r\n<\/pre>\n<p class=\"caption\">Figure 11: Loading the Wikipedia data source in a DCE notebook<\/p>\n<p>As we loaded around 1 GB of aggregated data, it would be valuable to use a more efficient format to store it. Without going into the details, as that is out of scope for this article, the best option is to use a columnar storage format such as <a href=\"https:\/\/parquet.apache.org\/\">Parquet<\/a>.<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">\/\/ Convert the dataset to a more efficient format to speed up our analysis\r\nclickstreamRaw.write\r\n   .mode(\"overwrite\")\r\n   .format(\"parquet\")\r\n   .save(\"\/datasets\/wiki-clickstream\")\r\n<\/pre>\n<p class=\"caption\">Figure 12: Converting CSV to a columnar format<\/p>\n<p>After saving to Parquet, the dataset shrank from circa 1 GB to 280 MB, which is a pretty good compression ratio. It also allowed us to reduce the execution time of a scan over the data from 40 sec to 0.68 sec!<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">\/\/ Read the Parquet file\r\nval clicks = sqlContext.read.parquet(\"\/datasets\/wiki-clickstream\")\r\n\/\/ Register a temporary SQL table in the SQLContext\r\nclicks.registerTempTable(\"myWikiTable\")\r\n<\/pre>\n<p class=\"caption\">Figure 13: Reading the compressed file and registering a SQL table<\/p>\n<p>With all the Wikipedia clickstream data from February 2015 at hand, we can now use it to find out what the most interesting topic at that moment was. 
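<\/p>\n<p>The query we are about to issue boils down to a filter (drop the \u201cother\u201d buckets and the main page as referrers) followed by a descending sort on the click count <em>n<\/em>. This plain-Scala sketch shows the same logic on a few made-up rows &#8211; the titles and counts here are hypothetical, purely for illustration, not taken from the real dataset:<\/p>\n<pre class=\"theme:vs2012 lang:scala decode:true\">\/\/ Hypothetical sample rows: (prev_title, curr_title, n)\r\ncase class Click(prevTitle: String, currTitle: String, n: Long)\r\nval sample = Seq(\r\n  Click(\"other-google\", \"Fifty_Shades_of_Grey\", 100L),\r\n  Click(\"Main_Page\", \"Deaths_in_2015\", 80L),\r\n  Click(\"David_Carr_(journalist)\", \"Deaths_in_2015\", 60L),\r\n  Click(\"87th_Academy_Awards\", \"Birdman_(film)\", 40L)\r\n)\r\nval top = sample\r\n  .filter(c =&gt; !c.prevTitle.startsWith(\"other\") &amp;&amp; c.prevTitle != \"Main_Page\")\r\n  .sortBy(-_.n)\r\n  .take(10) \/\/ keeps the 60L and 40L rows, highest count first\r\n<\/pre>\n<p>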
Since Spark is a unified platform, we can switch between different programming languages and choose the best one for the job. In this case, we will use Spark SQL.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">%sql\r\n\r\n-- Find the most popular wiki site\r\nselect \r\n    prev_title,\r\n    curr_title,\r\n    n\r\nfrom\r\n    myWikiTable\r\nwhere \r\n    prev_title not like 'other%'\r\nand prev_title != 'Main_Page'\r\norder by \r\n    n desc\r\nlimit 10\r\n<\/pre>\n<p class=\"caption\">Figure 14: Finding the most popular Wikipedia site in February 2015<\/p>\n<p>Unsurprisingly, the most interesting article was about the movie Fifty Shades of Grey, because February was Oscar season. The movie\u2019s wiki page was visited around 370k times.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2016\/09\/word-image-366.png\" \/><\/p>\n<p class=\"caption\">Figure 15: Notebook query results presented in tabular format<\/p>\n<h2>Documentation and books<\/h2>\n<p>There are a lot of books, blogs and official technical documentation available on the <a href=\"http:\/\/spark.apache.org\/docs\/latest\/index.html\">Apache Spark<\/a> and <a href=\"https:\/\/databricks.com\/spark\/about\">Databricks<\/a> sites. There is also an official Apache Spark channel on YouTube, where you can find many interesting webinars, including <a href=\"https:\/\/www.youtube.com\/watch?v=2b-0xddTzEU\">Spark Essentials with Adam Breindel<\/a> and <a href=\"https:\/\/www.infoq.com\/articles\/apache-spark-introduction\">Srini Penchikala\u2019s introduction to Spark<\/a>.<\/p>\n<p>If you would like to start your journey with Spark and are looking for a book, I would recommend <a href=\"http:\/\/shop.oreilly.com\/product\/0636920028512.do\">Learning Spark<\/a> as a starter. 
<a href=\"https:\/\/databricks.com\/blog\/2016\/02\/17\/introducing-databricks-community-edition-apache-spark-for-all.html\">Databricks Community Edition<\/a> is also a great platform to learn Spark. Since its launch, tens of US universities have already used it for teaching, including UC Berkeley and Stanford. To make learning Apache Spark even easier, DCE gives you three notebooks to provide a \u201cgentler\u201d introduction to Apache Spark. You can find these new notebooks here:<\/p>\n<ul>\n<li><a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/346304\/2168141618055043\/484361\/latest.html\">A Gentle Introduction to Apache Spark on Databricks<\/a><\/li>\n<li><a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/346304\/2168141618055194\/484361\/latest.html\">Apache Spark on Databricks for Data Engineers<\/a><\/li>\n<li><a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/346304\/2168141618055109\/484361\/latest.html\">Apache Spark on Databricks for Data Scientists<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>Spark is growing fast and the community becomes larger and larger day by day. Recently, <a href=\"https:\/\/spark.apache.org\/releases\/spark-release-2-0-0.html\">the new 2.0 version<\/a> has been released with a lot of new features. It is considered as a next-generation ETL framework, thanks to its flexibility, scalability, conciseness, team-based development capabilities and great performance. 
If you are a data scientist or an engineer interested in modern data processing, Apache Spark should be at the top of your learning list.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is worth getting familiar with Apache Spark because it is a fast and general engine for large-scale data processing, and you can use your existing SQL skills to get going with analysis of the type and volume of semi-structured data that would be awkward for a relational database. With an IDE such as Databricks you can very quickly get hands-on experience with an interesting technology.&hellip;<\/p>\n","protected":false},"author":239951,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[137094],"tags":[],"coauthors":[21219],"class_list":["post-68089","post","type-post","status-publish","format-standard","hentry","category-big-data"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/68089","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/239951"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=68089"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/68089\/revisions"}],"predecessor-version":[{"id":90068,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/68089\/revisions\/90068"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=68089"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?po
st=68089"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=68089"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=68089"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}