{"id":104693,"date":"2024-12-17T17:26:55","date_gmt":"2024-12-17T17:26:55","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=104693"},"modified":"2025-01-16T19:33:29","modified_gmt":"2025-01-16T19:33:29","slug":"comparing-groups-for-similarities-in-power-query-using-cosine-similarity","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/data-science\/comparing-groups-for-similarities-in-power-query-using-cosine-similarity\/","title":{"rendered":"Comparing Groups for Similarities in Power Query \u2013 Using Cosine Similarity"},"content":{"rendered":"\n<p>I&#8217;ll admit upfront\u2014I am not a data scientist by trade. Instead, I&#8217;ve picked up my data science skills over time, learning through a combination of osmosis from talented colleagues and tackling real-world data challenges. It\u2019s been a journey of trial, error, and refinement, as I&#8217;ve worked to bridge gaps between complex data science techniques and tools available to me.<\/p>\n\n\n\n<p>Recently, my skills were put to the test when I needed to compare hundreds of Active Directory and SharePoint Groups to find similarities in their memberships. With only Power Query available in the production environment, no Python or R to ease the process, I faced the task of finding a method to finding similarities from scratch in Power Query. In this guide, I&#8217;ll walk you through the solution I developed, highlighting the steps that made it possible.<\/p>\n\n\n\n<p><em>You can download the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/Cosine-Similarity-Commented.zip\">PowerBI .pbix file here in a .zip format<\/a> that is used in this article.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-cosine-similarity\">Cosine Similarity<\/h2>\n\n\n\n<p>Let\u2019s first start with the data science technique I used called Cosine similarity. Cosine similarity is a metric used to measure the similarity between two vectors by calculating the cosine of the angle between them. My eyes glazed over the first time I heard that explanation, but this technique is widely applied to quantify similarity in text analysis to find similarities in documents and recommendation engines \u2013 think Netflix or Amazon recommendations.<\/p>\n\n\n\n<p>Cosine similarity ranges from -1 to 1, where 1 indicates perfectly aligned vectors (high similarity), 0 denotes orthogonal vectors (no similarity), and -1 suggests opposite alignment. If you want to read about the whole math that defines it, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cosine_similarity\">this wiki article<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Real-world Application<\/h2>\n\n\n\n<p>So, in my case, I was looking for Active Directory\/SharePoint groups that had memberships that were identical (cosine similarity of 1) or nearly identical (cosine similarity &gt; .95). With hundreds of groups that range from 1 person to over 1,000 members, this was an analytical challenge. The goal was to look for ways to consolidate groups and reduce the overhead of managing multiple groups and access to applications.<\/p>\n\n\n\n<p>Calculating cosine similarity in Power Query presents a few challenges, however. Power Query lacks built-in vector or matrix operations, which makes performing linear algebra tasks complex. Python\/R developers could just import a library and use a function to calculate this, but alas this was not an option in the environment I was working in (just Power BI with no Power BI gateways). If you are in the same boat, here\u2019s how to do it:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li><strong>Original Data<\/strong> \u2013 Let\u2019s assume the data starts with a table where each row identifies a person and a single group they are associated with. Table 1 displays an example of group membership we\u2019ll use during the rest of this article.<\/li>\n<\/ol>\n<\/div>\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li><\/li>\n<\/ol>\n<\/div>\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>\n<p><strong>Group Name<\/strong><\/p>\n<\/td><td>\n<p><strong>Person<\/strong><\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>Jane.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>Jane.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>Jane.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>Jane.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>Bruce.Wayne<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>Jane.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>Bruce.Wayne<\/p>\n<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Table 1 \u2013 Original Table<\/strong><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n<div class=\"block-core-list\">\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Set up The Table<\/strong> \u2013 Before applying the cosine similarity algorithm you need to create a table where each row represents the group and each column represents each individual member (e.g., universal principal name\/email address). Table 2 provides an example. Notice that 1 represents that the individual member is a member of the group and 0 represents that the individual member is NOT a member of the group.<\/li>\n<\/ol>\n<\/div>\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li><\/li>\n<\/ol>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>\n<p><strong>Group Name<\/strong><\/p>\n<\/th><th>\n<p><strong>Jane.Doe<\/strong><\/p>\n<\/th><th>\n<p><strong>John.Doe<\/strong><\/p>\n<\/th><th>\n<p><strong>Barry.Allen<\/strong><\/p>\n<\/th><th>\n<p><strong>Diana.Prince<\/strong><\/p>\n<\/th><th>\n<p><strong>Bruce.Wayne<\/strong><\/p>\n<\/th><\/tr><\/thead><tbody><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Table 2 \u2013 Set up The Table<\/strong><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n<div class=\"block-core-list\">\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Convert the Columns to Vectors<\/strong> \u2013 Using Power Query we get the individual members (column names) and their value (1,0) and convert that to a Power Query list. We are left with a table with two columns Group Name, and Membership List. Table 3 provides an example of the transformed table.<\/li>\n<\/ol>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>\n<p><strong>Group Name<\/strong><\/p>\n<\/th><th>\n<p><strong>Membership as Vectors<\/strong><\/p>\n<\/th><\/tr><\/thead><tbody><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Table 3 \u2013 Example of Table with Membership as Vectors (Membership List column)<\/strong><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n<div class=\"block-core-list\">\n<ol start=\"4\" class=\"wp-block-list\">\n<li><strong>Cross Join the Table<\/strong> \u2013 Next, we need to Cross Join the Tables. This will create a permutation of each Group to compare. We also remove rows that would compare the same group and remove duplicate combinations; moreover, we don\u2019t need both Red Jaguars compared to Blue Barracudas and Blue Barracudas compared to Red Jaguars. Table 4 provides an example output.<\/li>\n<\/ol>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>\n<p><strong>Group Name<\/strong><\/p>\n<\/th><th>\n<p><strong>Membership as Vectors<\/strong><\/p>\n<\/th><th>\n<p><strong>Group Name 2<\/strong><\/p>\n<\/th><th>\n<p><strong>Membership as Vectors 2<\/strong><\/p>\n<\/th><\/tr><\/thead><tbody><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Table 4 \u2013 Cross Joined Table<\/strong><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n<div class=\"block-core-list\">\n<ol start=\"5\" class=\"wp-block-list\">\n<li><strong>Conduct Cosine Similarity<\/strong> \u2013 Now it\u2019s time to apply the algorithm by taking each <em>Membership as Vectors<\/em> and <em>Membership as Vectors 2<\/em> and then calculate the dot product of the two lists (a.k.a. vectors), calculate the magnitude of each vector, calculate the square root for each magnitude and then divide the dot product by the square roots multiplied. This must be done with a custom Power Query function, and Figure 1 provides the current code.<\/li>\n<\/ol>\n<\/div>\n\n\n<pre class=\"wp-block-preformatted\">(group1 as list, group2 as list) =&gt;\n    let\n        \/\/ Extract values of both groups\n        vec1 = group1,\n        vec2 = group2,\n        \n        \/\/ Calculate dot product\n        dotProduct = List.Sum(List.Transform(List.Zip({vec1, vec2}), \n                                                   each _{0} * _{1})),\n        \n        \/\/ Calculate magnitudes of both vectors\n        magnitude1 = List.Sum(List.Transform(vec1, each _ * _)),\n        magnitude2 = List.Sum(List.Transform(vec2, each _ * _)),\n        \n        \/\/ Handle case when magnitudes are zero \n        \/\/(to avoid division by zero)\n        magnitude1Sqrt = if magnitude1 = 0 then 0 else \n                                   Number.Sqrt(magnitude1),\n        magnitude2Sqrt = if magnitude2 = 0 then 0 \n                              else Number.Sqrt(magnitude2),\n        \n        \/\/ Calculate cosine similarity\n        cosineSimilarity = if magnitude1Sqrt = 0 or magnitude2Sqrt = 0\n              then 0 else dotProduct \/ (magnitude1Sqrt * magnitude2Sqrt)\n    in\n        cosineSimilarity<\/pre>\n\n\n\n<p><strong>Listing 1 \u2013 fnConsineSimilarity function<\/strong><\/p>\n\n\n\n<p>And you now have a calculation that defines the similarity of the group memberships. You can see that the Green Monkeys and Orange Monkeys have the same membership as the Cosine Similarity is 1. We also see the Red Jaguars and Silver Snakes have very similar membership makeup (the difference being just one person).<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>\n<p><strong>Group Name<\/strong><\/p>\n<\/th><th>\n<p><strong>Membership as Vectors (String)<\/strong><\/p>\n<\/th><th>\n<p><strong>Group Name 2<\/strong><\/p>\n<\/th><th>\n<p><strong>Membership as Vectors (String) 2<\/strong><\/p>\n<\/th><th>\n<p><strong>Cosine Similarity<\/strong><\/p>\n<\/th><th>\n<p><strong>List Differences by Name<\/strong><\/p>\n<\/th><\/tr><\/thead><tbody><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>0.816496581<\/p>\n<\/td><td>\n<p>Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>0.816496581<\/p>\n<\/td><td>\n<p>John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>0.816496581<\/p>\n<\/td><td>\n<p>John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne, Jane.Doe, Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Blue Barracudas<\/p>\n<\/td><td>\n<p>{1, 0, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><td>\n<p>0.707106781<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>0.666666667<\/p>\n<\/td><td>\n<p>John.Doe, Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>0.666666667<\/p>\n<\/td><td>\n<p>John.Doe, Diana.Prince<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>0.40824829<\/p>\n<\/td><td>\n<p>Bruce.Wayne, Jane.Doe, Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Red Jaguars<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><td>\n<p>0.866025404<\/p>\n<\/td><td>\n<p>Bruce.Wayne<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>&nbsp;<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne, Jane.Doe, John.Doe, Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Green Monkeys<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><td>\n<p>0.577350269<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne, John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>0<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne, Jane.Doe, John.Doe, Barry.Allen<\/p>\n<\/td><\/tr><tr><td>\n<p>Orange Iguanas<\/p>\n<\/td><td>\n<p>{1, 1, 1, 0, 0}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><td>\n<p>0.577350269<\/p>\n<\/td><td>\n<p>Diana.Prince, Bruce.Wayne, John.Doe<\/p>\n<\/td><\/tr><tr><td>\n<p>Purple Parrots<\/p>\n<\/td><td>\n<p>{0, 0, 0, 1, 1}<\/p>\n<\/td><td>\n<p>Silver Snakes<\/p>\n<\/td><td>\n<p>{1, 0, 1, 1, 1}<\/p>\n<\/td><td>\n<p>0.707106781<\/p>\n<\/td><td>\n<p>Jane.Doe, Barry.Allen<\/p>\n<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Table 5 &#8211; Conduct Cosine Similarity<\/strong><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>While the examples above are trivial, running this process across hundreds of groups with thousands of users may take some time; yet you can let Power BI do the hard work of identifying groups with similar memberships. You can also extend this to other comparison needs, like finding similar respondents to survey questions or product recommendations.<\/p>\n\n\n\n<p>If all you have is Power BI, calculating cosine similarity is possible and can help you compare group memberships to find similar groups. If you\u2019re interested in trying this out, we have an example file to share that contains the example tables and code used to calculate cosine similarity.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ll admit upfront\u2014I am not a data scientist by trade. Instead, I&#8217;ve picked up my data science skills over time, learning through a combination of osmosis from talented colleagues and tackling real-world data challenges. It\u2019s been a journey of trial, error, and refinement, as I&#8217;ve worked to bridge gaps between complex data science techniques and&#8230;&hellip;<\/p>\n","protected":false},"author":344919,"featured_media":104703,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[47,53,159166],"tags":[5764],"coauthors":[159224],"class_list":["post-104693","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-featured","category-powerbi","tag-data-science"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/344919"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=104693"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104693\/revisions"}],"predecessor-version":[{"id":105104,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104693\/revisions\/105104"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104703"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=104693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=104693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=104693"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=104693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}