{"id":86540,"date":"2020-03-02T18:56:01","date_gmt":"2020-03-02T18:56:01","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=86540"},"modified":"2021-05-03T14:04:43","modified_gmt":"2021-05-03T14:04:43","slug":"translating-a-sql-server-schema-into-a-cassandra-table-part-ii-integrity-constraints","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/nosql\/translating-a-sql-server-schema-into-a-cassandra-table-part-ii-integrity-constraints\/","title":{"rendered":"Translating a SQL Server Schema into a Cassandra Table: Part II Integrity Constraints"},"content":{"rendered":"<p><strong>The series so far:<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/translating-a-sql-server-schema-into-a-cassandra-table-part-i-problem-space-and-cassandra-primary-key\/\">Translating a SQL Server Schema into a Cassandra Table: Part I Problem Space and Cassandra Primary Key<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/translating-a-sql-server-schema-into-a-cassandra-table-part-ii-integrity-constraints\/\">Translating a SQL Server Schema into a Cassandra Table: Part II Integrity Constraints<\/a>\u00a0<\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/translating-a-sql-server-schema-into-a-cassandra-table-iii-many-to-many-attribute-closure-and-solution-space\/\">Translating a SQL Server Schema into a Cassandra Table: Part III Many-to-Many, Attribute Closure and Solution Space\u00a0<\/a><\/li>\n<\/ul>\n\n<p>Integrity constraints, including functional dependencies, guide the modeling process for relational database systems. They are no less important when designing Cassandra structures \u2013 although support can be limited.<\/p>\n<p>This second of the series revolves around four separate problems to solve, each presenting modeling principles in Cassandra and some in SQL Server as well. Each highlights different facets. In the first, tracking surgeries in an OLTP relational model is straightforward, but there are alternatives in Cassandra.<\/p>\n<p>In the second, complex integrity constraints for plane rentals are covered in several ways in SQL Server but prove problematic in Cassandra. The third problem presents a series of unrelated functional dependencies from which I deduce rules for Cassandra key design.<\/p>\n<p>The last to solve then breaks the rules with architects and blueprints to reuse a Cassandra base table as the basis for other tables.<\/p>\n<p>Each example is meant to be an enjoyable logic puzzle in itself. You need a background in Cassandra key and table design as gotten from Part I or elsewhere as well as experience with relational systems.<\/p>\n<h2>Collections and Key Uniqueness<\/h2>\n<p>Consider the following entity-relationship diagram (ERD):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"490\" height=\"300\" class=\"wp-image-86541\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-picture-containing-text-map-description-automa.png\" alt=\"A picture containing text, map\n\nDescription automatically generated\" \/><\/p>\n<p class=\"caption\">Figure 1. Conceptual surgeries model<\/p>\n<p>In a normalized relational physical model, the multivalued <em>procedures<\/em> attribute would comprise its own table, its composite key containing a foreign key referencing the associative table (from the associative entity) for <em>Surgery<\/em> and using appropriate cascading referential integrity to prevent orphans. The associative table, in turn, could have three unique constraints: the primary key as shown, plus (Patient, Start Date Time), and (Surgery ID). This is straightforward.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"730\" height=\"416\" class=\"wp-image-86542\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-social-media-post-description-a.png\" alt=\"A screenshot of a social media post\n\nDescription automatically generated\" \/><\/p>\n<p class=\"caption\">Figure 2. SQL Server physical solution<\/p>\n<p>In designing a Cassandra table, though, often there are more choices. Assume this access pattern:<\/p>\n<p>Q<sub>k<\/sub>. <em>Find lead surgeons and procedures performed at a surgery center for a given week<\/em><\/p>\n<p>Here is one possible design (all code examples can be found <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/Cassandra-II-Scripts.7z\">here<\/a>):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"428\" height=\"250\" class=\"wp-image-86543\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>Numeric columns <em>year<\/em> and <em>week<\/em> together identify the week (see readout below). Column <em>year<\/em> as a partition key prevents the partition from growing unbounded.<\/p>\n<p>All told, by inserting nested collection type <em>surgery<\/em> <em>details<\/em>, the design is apropos for the query, and the primary key meets the uniqueness criteria. The problems are, however, the complexity and size of surgery details.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"734\" height=\"257\" class=\"wp-image-86544\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-1.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The <em>DataStax CQL<\/em> shell (cqlsh) displays query result partition columns in red, clustering columns in aqua, regular (non-key) columns in mauve and statics in white.<\/p>\n<p>There is not a way in CQL to retrieve a smaller segment of the potentially substantial surgery details column, e.g. by surgeon or date. Solely by removing the lead surgeon from the nested collection and appending it to the clustering column list, though, the collection\u2019s size and complexity shrinks; its number of collections reduces from five to three:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"222\" class=\"wp-image-86545\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-2.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>This key is also unique! Because of collections, the infamous 1NF violator, there is no danger of compromising Cassandra data by making keys out of non-minimal superkeys as happens in poor relational designs.<\/p>\n<p>Notice that the same query can be reused on this revised table. One more row is returned, each with less (inapplicable) data:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"651\" height=\"401\" class=\"wp-image-86546\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-3.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>A query now has the flexibility to restrict on the lead surgeon as well if known via the application\u2019s workflow. Further, an aggregate query can count the distinct doctors performing surgery in the week.<\/p>\n<p>The process can be extended. Surgery start time could also be taken from surgery details and appended after lead surgeon in the key, leaving surgery details as a single set of procedure codes. Similarly, the codes could be appended and surgery details removed. Each clustering column addition places the rowset at a finer granularity, i.e. more rows albeit with less, but more focused, information.<\/p>\n<p>Another option affecting the number of rows in a partition is to slide keys in or out of the partition.<\/p>\n<p>In this exercise, placement and suggested possible placement of key attributes all follow the correct ordering of attributes. This concept will be explored in section <em>Placement of Functional Dependencies<\/em> below.<\/p>\n<h2>Enforcing Integrity Constraints<\/h2>\n<p>Say a proof-of-concept Cassandra database monitors a single flight club under the auspices of the Federal Aviation Administration (FAA). A transactional plane rental table to be designed is bound by these integrity constraints (IC; aka enterprise\/business rules):<\/p>\n<ol>\n<li>A member can rent at most one plane on any day<\/li>\n<li>A plane can be rented at most once on any day<\/li>\n<li>Rentals require a single credit card payment<\/li>\n<li>A set of credit cards uniquely identifies a member<\/li>\n<li>Violation of club or FAA or other rules disqualifies a member from renting a plane<\/li>\n<\/ol>\n<p>In a relational system, point 5 requires procedural code to enforce. Functional dependencies (FD), a special class of IC, can be deduced from points 1 through 4. Here they are using letter abbreviations M: member, D: day, P: plane, C: credit card, and \u2018\u2192\u2019 for functionally determines:<\/p>\n<ol>\n<li>DM \u2192 P IC 1.<\/li>\n<li>DP \u2192 M IC 2.<\/li>\n<li>C \u2192 M IC 4.<\/li>\n<li>DM \u2192 C IC 1. and 3.<\/li>\n<\/ol>\n<p>Together these FDs imply three overlapping candidate keys. This can be proven using Armstrong\u2019s Axioms and secondary rules. I\u2019ll show one \u2013 DC as candidate key \u2013 and omit the rest.<\/p>\n<ol start=\"5\">\n<li>DC \u2192 DM (3, augmentation)<\/li>\n<li>DC \u2192 P (1 and 5, transitivity)<\/li>\n<li>DC \u2192 DMP (5 and 6, union)<\/li>\n<li>DC \u2192 DCMP (7, augmentation)<\/li>\n<\/ol>\n<p>The candidate keys can be displayed graphically, the bold <strong>R<\/strong> as shorthand for all columns in the relation:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"503\" height=\"196\" class=\"wp-image-86547\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/word-image.png\" \/><\/p>\n<p class=\"caption\">Figure 3. Two functional dependency diagrams for the Plane Rental table<\/p>\n<p>The diagrams imply an interesting barebones relational table: there are no non-prime attributes. A non-prime attribute doesn\u2019t appear in any candidate key. Further, owing to functional dependency C \u2192 M, the table would be in 3NF, with redundancy occurring with each repeated credit card-member pair.<\/p>\n<p>This sketch is enough to start modeling. I\u2019ll defer for the moment the design considerations in Cassandra and continue with a sample physical model of the transactional table in SQL Server:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"577\" height=\"200\" class=\"wp-image-86548\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-picture-containing-indoor-description-automatic.png\" alt=\"A picture containing indoor\n\nDescription automatically generated\" \/><\/p>\n<p>As expected, foreign keys reference tables with more related attributes. Unique constraints represent the three candidate keys, the Plane-Rental Date pair having been elevated to the primary key. This satisfies ICs 1 and 2 and the implied IC involving 3. Member credit card as non-null implies the credit card payment in point 3. What remains is addressing integrity constraints 4 and 5.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"431\" height=\"134\" class=\"wp-image-86549\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-4.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The code snippet is taken from an insert-update trigger on the table to enforce point 4, the rule that credit cards functionally determine members. (This can also serve as a rudimentary basis for a fraud detection system (members as drug smugglers, e.g.)). A more advanced prototype trigger would include joining on the registered credit card table (not shown).<\/p>\n<p>This next snippet is from a second insert-update trigger that joins another table to satisfy rule 5, ensuring a member is cleared to fly:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"294\" height=\"134\" class=\"wp-image-86550\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-5.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The relational table Plane Rental in its very definition covers every integrity constraint required of it. Now switch the effort to Cassandra.<\/p>\n<p>This is the data access pattern for which the table \u2013 the precomputed, sorted result set \u2013 is to be modeled:<\/p>\n<p>Q<sub>n.<\/sub> <em>Find member rentals for a given plane.<\/em><\/p>\n<p>The phrase \u201cfor a given plane\u201d signals that attribute plane should be in the partition key. The choices for the full primary key, then, are these: ((P)) or ((PD)) or ((P)D), the inner parentheses separating partition keys from clustering keys. The first choice ((P)), a deviant (unique) key, creates a single row for each plane with a possibly large collection, unsuitable for a transactional workload. This situation was discussed in the previous section.<\/p>\n<p>The second ((PD)) addresses the access pattern but is non-performant as Q<sub>n<\/sub>. queries may access many nodes over the cluster.<\/p>\n<p>That leaves ((P)D).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"437\" height=\"173\" class=\"wp-image-86551\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-6.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The table definition allows for one manageable row for each plane-day pair on one partition. Below is a sampling of data legal for the table:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"788\" height=\"115\" class=\"wp-image-86552\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-black-sign-with-white-text-description-automati.png\" alt=\"A black sign with white text\n\nDescription automatically generated\" \/><\/p>\n<p>The result is failure on all but one integrity constraint. Restricting a plane rental to at most once in a day, FD point 2 (DP \u2192 M), is satisfied by the primary key. Point 1, though, limiting a member to at most one rental a day {DM \u2192 P}, is shown violated by rows two and four; Cassandra doesn\u2019t support alternate keys (think of the complexity and expense in a distributed data system).<\/p>\n<p>Requiring credit card payment, point 3, fails in the first row; in the Cassandra wide-row model, any regular value may be missing from a row, displayed here as null. Point 4, in which a set of credit cards uniquely identifies a member (C \u2192 M), is not enforced, as seen in rows two and three. Recall that it was verified in the SQL Server trigger.<\/p>\n<p>The readout does not indicate if point 5, which disqualifies a member in violation of rules from renting a plane, is being enforced \u2013 another trigger-based test not available in Cassandra.<\/p>\n<p>Integrity constraints don\u2019t go away when the storage\/retrieval engine changes. Most enforcement in Cassandra must be written in the client app(s). Bulletproofing that was almost paint-by-the-numbers simple now requires more effort.<\/p>\n<h2>Placement of Functional Dependencies<\/h2>\n<p>As seen in the previous section, functional dependencies are an important type of integrity constraint. For this section, consider these attribute sets and their many-to-one (\u2018\u2192\u2019 ) relationships in the unified relation:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"599\" height=\"49\" class=\"wp-image-86553\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/word-image-1.png\" \/><\/p>\n<p class=\"caption\">Figure 4. Unrelated functional dependencies<\/p>\n<p>In Part I, I introduced the unified relation as a container for entities, attributes and relationships of interest to an access pattern in the logical design phase. Besides placing focus on relationships to support, much Cassandra analysis relies on functional dependencies, which exist only in relations.<\/p>\n<p>Attribute sets in a functional dependency relationship can appear anywhere in the primary key or as regular attributes or spanning both. This diagram shows placement \u2013 poor placement, to be discussed \u2013 using Cassandra conceptual row nesting disc layout:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"471\" height=\"321\" class=\"wp-image-86554\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/word-image-2.png\" \/><\/p>\n<p class=\"caption\">Figure 5. Functional dependency errors in partition and clustering keys<\/p>\n<p>The first problem occurs with the partition keys. CQL queries in support of an access pattern mostly restrain on all partition attributes. It may make sense for the access pattern to place the A attribute set in the partition, or the B set. Importantly, however, it makes no sense to place them both in the partition. Doing so cannot affect logical partitioning and can only complicate querying. One is in, and one is made static, assuming there are clustering keys.<\/p>\n<p>(Recall that a static column\u2019s value is stored once per partition, not once per row. Making a column static only makes sense when it is in a relationship with a subset of the partition columns and clustering columns exist. Depending on the relationship type (not necessarily functional), a static may store an atomic value or be a collection type or user-defined type (UDT).)<\/p>\n<p>The second problem occurs with clustering keys C and D. In the diagram, the dependent D follows the determinant C. As a result, many C values converge on a single D value. This buys nothing. Rather, the D dependent side cannot affect searching, sorting or uniqueness, and can only complicate queries.<\/p>\n<p>Now imagine the D dependent occurring before the C determinant. The search space now correctly expands in the conceptual row nesting. If C is holes and D is a golf course, each course correctly diverges into many holes.<\/p>\n<p>These two issues suggest a pair of general rules for primary key design given non-overlapping attribute sets X and Y such that X \u2192 Y. X must be irreducible (no attribute can be removed or the dependency is invalidated) and all attributes of X are in the key. For Y, take it to mean any non-empty subset of Y appearing in the key:<\/p>\n<ul>\n<li>X and Y can never appear together in the partition<\/li>\n<li>Regardless of where X and Y appear in the primary key, dependent Y must precede determinant X<\/li>\n<\/ul>\n<p>In clarification of the second point: only (a subset of) Y may (or may not) be in the partition, as per the first point.<\/p>\n<p>I illustrate both rules with an example.<\/p>\n<p>Say the table is to be built for this access pattern:<\/p>\n<p>Q<sub>n<\/sub>. <em>Find product lines by industry and manufacturer<\/em><\/p>\n<p>Identifiers for product line (P), industry (I) and manufacturer (M) are in a hierarchical FD relationship such that P \u2192 M \u2192 I as in Figure 4. Here is the initial physical table developed from the logical model (not shown); it follows both rules:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"143\" class=\"wp-image-86555\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-7.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The advantages are single-pass querying over one partition and redundancy limited to the manufacturer columns. If \u2018I\u2019 and \u2018M\u2019 together would not be available through all relevant, connected application workflow paths, however, Q<sub>n<\/sub>. and the table may need to be rethought, or the deficient workflow paths\/access patterns amended. For the exercise, assume the former tactic \u2013 but consider how the solution next still alters the workflow path.<\/p>\n<p>The table is split in two. The first table is made by sliding over the \u2018M\u2019 to make it a partitioning key: ((IM)P). By the first rule, this reduces to ((M)P) with \u2018I\u2019 as static (if needed). It also remains in agreement with the second (and eliminates redundancy on manufacturer columns).<\/p>\n<p>Could another access pattern require its reverse, ((P)M)? This would falsely imply that a product line can be made by multiple manufacturers, in violation of the integrity constraint formalized as P \u2192 M (and the second rule). Instead, a single-row partition ((P)) may hold all product line details for access later in the workflow.<\/p>\n<p>I think you can deduce the second table: ((I)M), also in accord with the rules. Here they are:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"408\" height=\"233\" class=\"wp-image-86556\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-8.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>There is, though, a significant exception to these rules. See the next section.<\/p>\n<h2>Materialized Views<\/h2>\n<p>In the problem for this section, tables for queries on custom homes need to be designed. Here are the integrity constraints:<\/p>\n<ul>\n<li>Each blueprint is attributed to one architect<\/li>\n<li>An architect develops one or more blueprints<\/li>\n<li>A blueprint is registered before the house is built<\/li>\n<li>A blueprint cannot be reused<\/li>\n<\/ul>\n<p>This entity-occurrence diagram captures the integrity constraints:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"564\" height=\"204\" class=\"wp-image-86557\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-picture-containing-drawing-description-automati.png\" alt=\"A picture containing drawing\n\nDescription automatically generated\" \/><\/p>\n<p class=\"caption\">Figure 6. Merged occurrence diagrams<\/p>\n<p>The architect-blueprint \u201cdesigner of\u201d relationship is one-to-many (1:n) with total participation, and the blueprint-house \u201cplan for\u201d relationship is one-to-one (1:1) with blueprint having partial participation.<\/p>\n<p>There are several access patterns needing tables that reference the three entities and their attributes. Unlike in preceding designs, though, the tables can be defined on a single base table and automatically managed and updated by Cassandra as the base table changes.<\/p>\n<p>This is the access pattern for the base table:<\/p>\n<p>Q<sub>i<\/sub>. <em>Find blueprints and houses for a given architect.<\/em><\/p>\n<p>Below is the sample table definition:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"553\" height=\"157\" class=\"wp-image-86558\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-9.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>Acronym NCARB means National Council of Architectural Registration Boards.<\/p>\n<p>The base table purposely and necessarily follows the two rules from the previous section and has all the data \u2013 there is no loss of information:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"808\" height=\"131\" class=\"wp-image-86559\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-picture-containing-electronics-description-auto.png\" alt=\"A picture containing electronics\n\nDescription automatically generated\" \/><\/p>\n<p>It is also a rare table in that there is no redundancy. (Don\u2019t let the readout fool you: although static <em>is ncarb certified<\/em> appears in every row in the readout, it is stored once per architect (partition)).<\/p>\n<p>The first materialized view table is defined for this access pattern:<\/p>\n<p>Q<sub>i<\/sub>. <em>Find architect and blueprint information for a given house address.<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"547\" height=\"112\" class=\"wp-image-86560\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-10.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>The base table is referenced in the FROM clause. In the WHERE clause, each primary key column is restricted at minimum to be non-null for key correctness.<\/p>\n<p>Notice that static column <em>is ncarb certified<\/em> is not in the SELECT list. There is now some information loss as static columns cannot be included in materialized view definitions:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"637\" height=\"109\" class=\"wp-image-86561\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-close-up-of-a-sign-description-automatically-ge.png\" alt=\"A close up of a sign\n\nDescription automatically generated\" \/><\/p>\n<p>More critically, now the rule placing determinants after dependents is violated. House address, elevated from regular column to partition key, functionally determines both clustering keys architect and blueprint. This may happen in the view. All primary key columns in the base must be, in any order, in the view primary key to preserve the 1:1 mapping from base rows to a subset of rows in the view. Failure to do so could collapse multiple rows into one from base to view and is prohibited.<\/p>\n<p>This is a second access pattern:<\/p>\n<p>Q<sub>k<\/sub>. <em>Find all houses constructed for an architect.<\/em><\/p>\n<p>This mirrors the previous view except that the house address moves from the partition to the end of the clustering key list. In restricting the address to non-null, not all of Suzie\u2019s base table rows are in the view as the readout following the definition shows:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"534\" height=\"112\" class=\"wp-image-86562\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screen-shot-of-a-social-media-post-description.png\" alt=\"A screen shot of a social media post\n\nDescription automatically generated\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"114\" class=\"wp-image-86563\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-close-up-of-a-computer-description-automaticall.png\" alt=\"A close up of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>This is the final access pattern:<\/p>\n<p>Q<sub>l<\/sub>. <em>Find architect and house design information by blueprint.<\/em><\/p>\n<p>To realize the view, simply reverse the partition and clustering keys from the base. In contrast to the previous view, house address is not in the key lest it incorrectly remove rows given blueprint\u2019s partial participation with it.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"539\" height=\"95\" class=\"wp-image-86564\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-11.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"112\" class=\"wp-image-86565\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-close-up-of-a-sign-description-automatically-ge-1.png\" alt=\"A close up of a sign\n\nDescription automatically generated\" \/><\/p>\n<p>As with the base table, the view table has no redundancy as any attributes in a cardinality relationship with the architect should be static, and statics aren\u2019t carried into the view.<\/p>\n<h2>Conclusion<\/h2>\n<p>You saw that integrity constraints and functional dependencies are just as crucial for Cassandra design and, although only implied, table updates and maintenance as well.<\/p>\n<p>On the relational side, built-in features and theory are tremendous aids in modeling. For the surgery problem, the way was clear in translating entities and a multivalued attribute into interconnected tables and using cascading referential integrity appropriately. For the plane rental problem, it was translating some ICs into FDs, and reasoning from them to apply a mixed strategy of unique and referential constraints and triggers to cover all ICs.<\/p>\n<p>In Cassandra, the first lesson was about moving non-key attributes to the key and enlisting (nested) collections to ensure the key is always unique. In the next, you saw how, having only a primary key for IC enforcement, a legal instance of a Cassandra table can violate almost all ICs.<\/p>\n<p>The third presented two rules for placement of FDs in the primary key: disallowing determinants and dependents both to reside in the partition and ordering of dependents before determinants to ensure proper nested row layout.<\/p>\n<p>Finally, the discussion on materialized views showed that the base table must follow the rules, but the views built on the base necessarily don\u2019t. Materialized views enable reusing of data with automatic synchronization.<\/p>\n<h2>Last Word<\/h2>\n<p>Imagine building a SQL Server backend for a medium- to large-size OLTP application. You enforce a clean separation of concerns between app and database, allowing only execute permission on stored procedures in a handful of schemas. Whether you agree with me that this is a best practice \u2013 a topic for another day \u2013 one thing is clear: coupling between the two is as loose as can be.<\/p>\n<p>All I\u2019ve discussed points to the fact that, in contrast, Cassandra and OLTP apps are as tightly coupled as possible. Do you want integrity constraint enforcement? Mostly, write it in the app.<\/p>\n<p>Looking ahead, the last article in the series is in three parts. The first discusses the common case of many-to-many relationships in the ERD. The second presents a function for doing attribute closure given a set of functional dependencies. This process has important implications for key and table design and functional dependency enforcement.<\/p>\n<p>The last part returns to the original problem; solutions, given all the background, now come easily.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Complex integrity constraints are more challenging to enforce in Cassandra than they are in a relational database. In this article, Shel Burkow walks through four modelling examples in Cassandra involving constraints.&hellip;<\/p>\n","protected":false},"author":230507,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143535],"tags":[5966],"coauthors":[20272],"class_list":["post-86540","post","type-post","status-publish","format-standard","hentry","category-featured","category-nosql","tag-monitor"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/230507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=86540"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86540\/revisions"}],"predecessor-version":[{"id":86568,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86540\/revisions\/86568"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=86540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=86540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=86540"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=86540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}