Lossless join decomposition

In database design, a lossless join decomposition is a decomposition of a relation $r$ into relations $r_{1},r_{2}$ such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.^[1] Lossless join can also be called non-additive.^[2]

Definition

A relation $r$ on schema $R$ decomposes losslessly onto schemas $R_{1}$ and $R_{2}$ if $\pi _{R_{1}}(r)\bowtie \pi _{R_{2}}(r)=r$ , that is $r$ is the natural join of its projections onto the smaller schemas. A pair $(R_{1},R_{2})$ is a lossless-join decomposition of $R$ or said to have a lossless join with respect to a set of functional dependencies $F$ if any relation $r(R)$ that satisfies $F$ decomposes losslessly onto $R_{1}$ and $R_{2}$ .^[3]

Decompositions into more than two schemas can be defined in the same way.^[4]

Criteria

A decomposition $R=R_{1}\cup R_{2}$ has a lossless join with respect to $F$ if and only if the closure of $R_{1}\cap R_{2}$ includes $R_{1}\setminus R_{2}$ or $R_{2}\setminus R_{1}$ . In other words, one of the following must hold:^[4]

$(R_{1}\cap R_{2})\to (R_{1}\setminus R_{2})\in F^{+}$
$(R_{1}\cap R_{2})\to (R_{2}\setminus R_{1})\in F^{+}$

Criteria for multiple sub-schemas

Multiple sub-schemas $R_{1},R_{2},...,R_{n}$ have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the schemas have been joined into a single schema. Once we have a new sub-schema made from a lossless join, we are not allowed to use any of its isolated sub-schema to join with any of the other schemas. For example, if we can do a lossless join on a pair of schemas $R_{i},R_{j}$ to form a new schema $R_{i,j}$ , we use this new schema (rather than $R_{i}$ or $R_{j}$ ) to form a lossless join with another schema $R_{k}$ (which may already be joined (e.g., $R_{k,l}$ )).^[vague]

Example

Let $R=\{A,B,C,D\}$ be the relation schema, with attributes $A$ , $B$ , $C$ and $D$ .
Let $F=\{A\rightarrow BC\}$ be the set of functional dependencies.
Decomposition into $R_{1}=\{A,B,C\}$ and $R_{2}=\{A,D\}$ is lossless under $F$ because $R_{1}\cap R_{2}=A$ and we have a functional dependency $A\rightarrow BC$ . In other words, we have proven that $(R_{1}\cap R_{2}\rightarrow R_{1}\setminus R_{2})\in F^{+}$ .^[5]^[6]

References

^ Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.
^ Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777.
^ Maier, David (1983). The theory of relational databases (PDF). Computer Science Press. p. 101. ISBN 0-914894-42-0. Retrieved 16 August 2024.
^ ^a ^b Ullman, Jeffrey D. (1988). Principles of Database and Knowledge-base Systems (PDF) (1 ed.). Computer Science Press. p. 397. ISBN 0-88175188-X. Retrieved 16 August 2024.
^ "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.
^ "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.

[1] Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.

[Elmasri-2] Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777.

[3] Maier, David (1983). The theory of relational databases (PDF). Computer Science Press. p. 101. ISBN 0-914894-42-0. Retrieved 16 August 2024.

[Ullman1988-4] Ullman, Jeffrey D. (1988). Principles of Database and Knowledge-base Systems (PDF) (1 ed.). Computer Science Press. p. 397. ISBN 0-88175188-X. Retrieved 16 August 2024.

[5] "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.

[6] "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.

[1]

[2]

[3]

[4]

[5]

[6]