# The Basic Relation Calculation in Database

This article mainly introduces the basic operations and examples of relational algebra such as union, difference, selection, projection and join.

## Set

### Set Union

The union operation in relational algebra combines all tuples from two relations (i.e., tables) to form a new relation. The union operator is typically denoted by $∪$. Its mathematical definition is as follows:

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(A_1, A_2, \ldots, A_n)$, their union $R ∪ S$ is defined as:

$R ∪ S = \{ t \mid t \in R \text{ or } t \in S \}$where $t$ represents a tuple. $R$ and $S$ must be **union-compatible**, meaning they have the same attributes (columns).

**Note:** The union operation removes duplicate tuples, ensuring that the tuples in the resulting relation are unique.

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

The union of relations $R$ and $S$, denoted as $R ∪ S$, results in the following relation:

**Relation $R ∪ S$:**

In this example, the tuples $(2, Bob)$ and $(3, Carol)$ that are present in both relations $R$ and $S$ appear only once in the resulting relation $R ∪ S$. The union result includes all distinct tuples from both relations.

### Set Difference

The difference operation in relational algebra is used to remove tuples present in one relation from another relation. The difference operator is typically denoted by $−$.

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(A_1, A_2, \ldots, A_n)$, their difference $R - S$ is defined as:

$R - S = \{ t \mid t \in R \text{ and } t \notin S \}$where $t$ represents a tuple. The result of the difference operation contains only those tuples that are present in $R$ but not in $S$.

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

The difference operation $R - S$ results in the following relation:

**Relation $R - S$:**

In this example, the tuple $(1, Alice)$ exists only in $R$ and not in $S$, so it appears in the resulting relation $R - S$. The other tuples $(2, Bob)$ and $(3, Carol)$ also exist in $S$, so they are excluded from the difference result.

## Projection

The projection operation in relational algebra is used to select specific columns (attributes) from a relation to create a new relation. The projection operator is typically denoted by $\pi$ (Greek letter pi).

**Formal Definition:**

Given a relation $R(A_1, A_2, \ldots, A_n)$ and a subset of attributes $\{A_i, A_j, \ldots, A_k\}$, the projection operation $\pi_{A_i, A_j, \ldots, A_k}(R)$ is defined as:

$\pi_{A_i, A_j, \ldots, A_k}(R) = \{ \{A_i, A_j, \ldots, A_k\}(t) \mid t \in R \}$where $\{A_i, A_j, \ldots, A_k\}(t)$ represents the values of attributes $A_i, A_j, \ldots, A_k$ extracted from tuple $t$. The projection operation removes duplicate tuples in the result.

Assume we have a relation $R$ with the following structure and data:

**Relation $R$:**

Now, performing projection on relation $R$, selecting attributes $Name$ and $Age$:

$\pi_{Name, Age}(R)$**Resulting Relation:**

In this example, the projection operation selects the $Name$ and $Age$ columns from $R$, resulting in a new relation that includes only the data for these two attributes.

If the projection result contains duplicate tuples, the projection operation will automatically remove these duplicates, ensuring that the tuples in the resulting relation are unique.

## Selection

The selection operation in relational algebra is used to select tuples from a relation that satisfy a specific condition. The selection operator is typically denoted by $\sigma$ (Greek letter sigma).

**Formal Definition:**

Given a relation $R(A_1, A_2, \ldots, A_n)$ and a condition $\theta$ (such as a comparison or logical expression), the selection operation $\sigma_{\theta}(R)$ is defined as:

$\sigma_{\theta}(R) = \{ t \mid t \in R \text{ and } t \text{ satisfies condition } \theta \}$where $t$ represents a tuple and $\theta$ is a Boolean expression defining the selection condition.

Assume we have a relation $R$ with the following structure and data:

**Relation $R$:**

Now, performing selection on relation $R$, selecting tuples where $Age$ is greater than $25$:

$\sigma_{\text{Age} > 25}(R)$**Resulting Relation:**

In this example, the selection operation filters the tuples in $R$ where $Age$ is greater than $25$. The resulting relation includes all tuples that meet this condition.

The selection operation allows us to extract specific data from a relation by specifying conditions, which is very common in database queries.

## Rename

The renaming operation in relational algebra is used to change the name of a relation or its attributes (columns). The renaming operator is typically denoted by $\rho$ (Greek letter rho).

**Formal Definition:**

Given a relation $R(A_1, A_2, \ldots, A_n)$ and a new name $S(B_1, B_2, \ldots, B_n)$, the renaming operation $\rho_{S(B_1, B_2, \ldots, B_n)}(R)$ is defined as:

$\rho_{S(B_1, B_2, \ldots, B_n)}(R) = S(B_1, B_2, \ldots, B_n)$where $S(B_1, B_2, \ldots, B_n)$ represents the new relation name and attribute names.

Assume we have a relation $R$ with the following structure and data:

**Relation $R$:**

Now, performing the renaming operation on relation $R$, changing the relation name to $Person$, and renaming attributes $ID$ to $PersonID$, $Name$ to $FullName$, and $Age$ to $Years$:

$\rho_{\text{Person}(\text{PersonID}, \text{FullName}, \text{Years})}(R)$**Resulting Relation:**

In this example, the renaming operation changes the name of the relation $R$ to $Person$ and updates the attribute names. Renaming is useful, especially when you need to alias a relation or adjust the output format of query results.

## Inner Join

### Natural Join

The natural join operation in relational algebra combines two relations based on their common attributes (i.e., attributes with the same name), creating a new relation. The natural join operator is usually denoted by $\bowtie$.

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n, B_1, B_2, \ldots, B_m)$ and $S(B_1, B_2, \ldots, B_m, C_1, C_2, \ldots, C_k)$, their natural join $R \bowtie S$ is defined as:

$R \bowtie S = \{ t \mid t = (r \cup s) \text{ and } r \in R, s \in S \text{ and } r[B_i] = s[B_i] \text{ for all common attributes } B_i \}$The result of the natural join includes all combinations of common attribute values that are the same in both $R$ and $S$, while removing duplicate common attributes.

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

Now, performing the natural join $R \bowtie S$ on their common attribute $DeptID$:

$R \bowtie S$**Resulting Relation:**

In this example, only tuples with the same $DeptID$ values in both $R$ and $S$ are combined and appear in the result. The tuple $(3, Carol, 103)$ in $R$ does not have a matching $DeptID$ in $S$, so it does not appear in the result. Similarly, the tuple with $DeptID$ 104 in $S$ has no matching $DeptID$ in $R$, so it is not included in the result.

The natural join is commonly used in database queries, especially when combining multiple tables based on common fields, as it automatically removes duplicate common attributes, simplifying the query results.

### θ-Join

The θ-join (Theta Join) in relational algebra is a more general join operation that allows tuples from two relations to be joined based on any condition (θ). The θ-join is denoted by $R \bowtie_{\theta} S$, where $\theta$ is a condition expression that may include comparison operators (such as $=$, $<$, $>$, $<=$, $>=$, $<>$, etc.).

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(B_1, B_2, \ldots, B_m)$, and a condition $\theta$, the θ-join $R \bowtie_{\theta} S$ is defined as:

$R \bowtie_{\theta} S = \{ t \mid t = (r \cup s), r \in R, s \in S, \text{ and } r \\ \text{ and } s \text{ satisfy the condition } \theta \}$In a θ-join, the resulting relation contains all tuples $(r, s)$ that satisfy the condition $\theta$ and includes all attributes from $R$ and $S$.

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

Now, performing the θ-join on $R$ and $S$ with the condition $R.Name = S.Name$:

$R \bowtie_{R.Name = S.Name} S$**Resulting Relation:**

In this example, the condition for the θ-join is $R.Name = S.Name$, so only tuples with matching $Name$ values in $R$ and $S$ are joined and appear in the result. Tuples with names that do not match (e.g., $Carol$ and $Dave$) are not included in the result.

The θ-join is a flexible join operation that can combine two relations based on any condition. Its generality allows it to represent various join operations, including natural joins and equijoins.

### Equijoin

As mentioned, the equijoin is a special case of the θ-join where the condition $\theta$ is equality $=$.

### Semi Join

The semi-join is a relational algebra operation that selects tuples from one relation that match tuples in another relation. Unlike join operations, a semi-join result includes only attributes from the first relation and does not combine all attributes from both relations.

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(B_1, B_2, \ldots, B_m)$, the semi-join $R \ltimes S$ is defined as:

$R \ltimes S = \{ t \in R \mid \exists s \in S \text{ such that } t \\ \text{ and } s \text{ satisfy the join condition } \theta \}$where $\theta$ is the join condition, typically $t[A_i] = s[B_j]$ (i.e., equality join condition).

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

Now, performing the semi-join $R \ltimes S$ with the condition $R.DeptID = S.DeptID$:

$R \ltimes S$**Resulting Relation:**

In this example, the semi-join selects tuples from $R$ where $DeptID$ has a match in $S$, returning these matching tuples from $R$. Unlike join operations, the result includes only attributes from $R$, excluding attributes from $S$.

**Summary:** The semi-join is used to extract tuples from one relation that have matches in another relation without including the second relation's attributes. It is useful for query optimization and reducing data redundancy.

### Anti Join

The anti-join is a relational algebra operation used to select tuples from one relation that do not have matches in another relation. In other words, an anti-join returns tuples that do not satisfy the join condition.

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(B_1, B_2, \ldots, B_m)$, the anti-join $R \mathbin{\ltimes} S$ (or $R \setminus (R \bowtie S)$) is defined as:

$R \ltimes S = \{ t \mid t \in R \text{ and there is no } s \in S \text{ such that } t \\ \text{ and } s \text{ satisfy the join condition } \theta \}$where $\theta$ is the join condition, typically $t[A_i] = s[B_j]$ (i.e., equality join condition).

Assume we have two relations $R$ and $S$ with the following structure and data:

**Relation $R$:**

**Relation $S$:**

Now, performing the anti-join $R \ltimes S$ with the condition $R.DeptID = S.DeptID$:

$R \ltimes S$**Resulting Relation:**

In this example, the anti-join selects tuples from $R$ where $DeptID$ does not have a corresponding match in $S$. Tuples with $DeptID$ values 103 and 104 are included in the result because they have no matches in $S$.

**Summary:** The anti-join is used to find tuples from one relation that do not match any tuples in another relation based on the join condition. It is useful for identifying records that do not meet certain criteria, such as finding entities not included in a specific group or condition.

## Division

In relational algebra, the division operation is a higher-level operation used to select tuples from one relation that are associated with all tuples in another relation. In other words, the result of a division operation includes those tuples from the dividend relation that match every tuple in the divisor relation.

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(B_1, B_2, \ldots, B_m)$, where $B_1, B_2, \ldots, B_m$ is a subset of $A_1, A_2, \ldots, A_n$ (denoted as $B_i \subseteq A_i$), the division operation $R \div S$ is defined as:

$R \div S = \\ \{ t[A_1, A_2, \ldots, A_{n-m}] \mid \forall s \in S, \exists r \in R, \\ \text{ such that } r[B_1, B_2, \ldots, B_m] = s \}$The resulting relation includes tuples that represent the values of attributes $A_1, A_2, \ldots, A_{n-m}$ in $R$ that are associated with every tuple in $S$.

Suppose we have two relations, $R$ and $S$, with the following structures and data:

**Relation $R$:**

**Relation $S$:**

Now, performing the division operation $R \div S$ yields the students who have taken all the subjects listed in $S$:

$R \div S$**Resulting Relation:**

In this example, $R$ is a relation that includes students and the subjects they have taken, while $S$ is a list of subjects of interest. The result of the division operation is the set of students who have taken every subject in $S$, so only $Alice$ and $Bob$ appear in the result, as they have taken both $Math$ and $Physics$.

**Conclusion:** The division operation is particularly useful in database queries when looking for tuples that meet all specified conditions. It helps in finding a subset of tuples that fulfill a particular requirement.

## Outer Join

In relational algebra, an outer join is an extended join operation that preserves tuples that do not satisfy the join condition. Outer joins are classified into three types: Left Outer Join, Right Outer Join, and Full Outer Join. The result of these operations includes not only the tuples that satisfy the join condition but also the unmatched tuples, with $NULL$ used to fill in missing attribute values.

### Left Outer Join

**Formal Definition:**

Given two relations $R(A_1, A_2, \ldots, A_n)$ and $S(B_1, B_2, \ldots, B_m)$, the Left Outer Join $R⟕S$ is defined as:

$R⟕S = \{ t \mid t = (r \cup s) \text{ and } r \in R, s \in S \text{ satisfy the join condition } \theta \} \\ \cup \{ r \mid r \in R \text{ and there is no } s \in S \text{ that satisfies the join condition } \theta \}$The Left Outer Join returns all tuples from $R$. If a tuple in $R$ does not have a match in $S$, the corresponding $S$ attributes in the result are filled with $NULL$.

**Relation $R$:**

**Relation $S$:**

**Left Outer Join $R⟕S$ Result:**

In this example, $Carol$ and $Dave$ have $DeptID$s that do not match any $DeptID$ in $S$, so the $DeptName$ column is $NULL$ for these rows in the result.

### Right Outer Join

**Formal Definition:**

The Right Outer Join is similar to the Left Outer Join, except that it preserves all tuples from the second relation. If a tuple in the second relation has no match in the first relation, the corresponding attributes from the first relation are filled with $NULL$.

$R⟖S = \{ t \mid t = (r \cup s) \text{ and } r \in R, s \in S \text{ satisfy the join condition } \theta \} \\ \cup \{ s \mid s \in S \text{ and there is no } r \in R \text{ that satisfies the join condition } \theta \}$**Right Outer Join $R⟖S$ Result:**

In this example, the tuple with $DeptID$ 105 in $S$ has no matching tuple in $R$, so the $ID$ and $Name$ columns are $NULL$ for this row in the result.

### Full Outer Join

**Formal Definition:**

A Full Outer Join returns all tuples from both relations, including those that do not match. If a tuple in one relation has no corresponding match in the other, the attributes from the other relation are filled with $NULL$.

$R⟗S = (R⟕S) \cup (R⟖S)$**Full Outer Join $R⟗S$ Result:**

In a Full Outer Join, the result includes all tuples from both $R$ and $S$, with unmatched portions filled with $NULL$.

**Conclusion:**

**Left Outer Join**preserves all tuples from the left table, filling unmatched parts with $NULL$.**Right Outer Join**preserves all tuples from the right table, filling unmatched parts with $NULL$.**Full Outer Join**preserves all tuples from both tables, filling unmatched parts with $NULL$.

These outer join operations are particularly useful in database queries when we want to retain unmatched records in the result.