Relational analysis

Introduction in relational analysis

Relations are heart of Relational Databases. In general, relation topic is not complex to understand, however it is hardly can be comprehensively gathered from theory of SQL statements, if you just start from SQL learning.

This short guide will help to get basic information about relations and relational analysis.

Teide can help you with providing of comprehensive information of relation, showing data in different views, and quick investigation of relations without develoment of long SQL statements.

Sometimes getting of specific relations can be really take significant time and mental efforts.

Terms

Table is a structure related to some uniform objects. For example, persons, cars, timeslots in schedule etc

Data in database reflect data of real objects. It means that some information about object can be coded, some information can be simplified, some information can be omited. Field of table should be considered as a property of object. This is main premise to getting idea of relations.

If some property is absent, it has null value.

Primary key is one or several fields of table that unique for all objects. Other words, you can directly find the only object based on field or fields value of primary key. For example, for cars Primary key can be VIN code, for people can be combination of given name (first name, middle name), family name and birthdate. You can select fields for primary key when you design the table.

If you cannot find appropriate fields to include into primary key, you can assign artificial fields such as unique identifier like GUID or counter. If you specify this field as primary key, the key is called as surrogate key.

If your table includes some other field or field combination with unique values, and it may be considered as key, it is called as alternative key. Please don't use term of secondary key, because this is incorrect term.

Foreign key is a reference from one table to other table. If you want to refer from one table to other table, you can add fields of primary key of table you want refer to. Thus, you will able to directly describe object in other table (that you refer to), and directly find object in other table.

Relation is a rule of linking of two tables. One table should be master table, another table is subordinate table or linkable and linking table. However this is just technical specifics of relation. In reality, Relational data model allows much more various relations.

Most simple relation is use of foreign key in one table, with reference to other table.

However, you can get relation based on any fields or properties of object. For example, you can link people and cars based on favorite color of people and color of cars.

Result of link in relational databases are expressed via query execution. A query can include one, two or many tables. The result of query is called tuple. Fields in tuple are called attributes.

Cartesian product is a term from combinatorics. This term is very important in relations. It helps to answer (this is main answer), what happens if the same property will meet in related tables for more than one time.

Cartesian product function returns result of all combinations. It means the following. For example, you relate people and cars based on favorite color of people and color of car. In the list of people you have two persons Irene and Marta who likes red color. Also among cars you have red Opel and red Ferrari. Result of Cartesian product will be 4 pairs: Irene - Opel, Irene - Ferrari, Marta - Opel, Marta - Ferrari.

Cartesian product frees us from the question, how to link two same values in two related tables: the result is versatile and voluminous. This approach works good for most of queries.

However if you need getting exactly the same number of records as in one of tables, this approach is not appropriate. For example, you can find car for every people based on their color preference. In this way you need to match people and cars.

Relation results

Query with relation of two tables, you can be interested in different results. The best way of necessary results can be represented in Euler-Venn diagrams:

(a1 ( a2/b2 ) b1)

Euler-Venn diagram allows to understand specifics of intersection data. For example, you intersects people and cars by color (favorite color and car color).

Table 1. People
Name Favorite color
Maria Green
Irene Red
Marta Red
Dora Yellow
Olivia Dark blue
Table 2. Car
Car Color
Opel Red
Ferrari Red
Nissan Yellow
Peugeot Green
Volvo Silver

Result of relation is intersection that in designated as join: Result = People join Car (on Favorite color of People = Car Color).

In this expression People is left table, Car is right table.

Intersection function is symmetric. It means that "People join Car" returns the same result as "Car join People". However if we impose some constraints (for example, only European cars or names started from consolant letters), we need to know what table is in left and what is in right.

As you can see from these tables:

1) Maria, Irene, Marta, Dora have pair with Opel, Ferrari,Nissan, Peugeot

2) Olivia does not have pair in People table

3) Volvo does not have pair in Car table

In terms of relations:

Pairs from 1) form natural join (Maria-Peugeot, Irene-Opel, Irene-Ferrari, Marta-Opel, Marta-Ferrari, Dora-Nissan)

Records from 2) form left residue (Olivia).

Records from 3) form right residue (Volvo).

Combination of 1), 2) and 3) also have own names:

2) + 1) is left join (Olivia-null, Maria-Peugeot, Irene-Opel, Irene-Ferrari, Marta-Opel, Marta-Ferrari, Dora-Nissan)

1) + 3) is right join (Maria-Peugeot, Irene-Opel, Irene-Ferrari, Marta-Opel, Marta-Ferrari, Dora-Nissan, null-Volvo)

2) + 1) + 3) is full join (Olivia-null, Maria-Peugeot, Irene-Opel, Irene-Ferrari, Marta-Opel, Marta-Ferrari, Dora-Nissan, null-Volvo)

2) + 3) is full residue (Olivia-null, null-Volvo).

Sometimes relations are designated via relation types, standardly it is related with constraints as allowed relations.

Constraints are used for controlling of possible relations among tables. Some tables by their meaning should be mandatory linked with other tables. For example, every appartment should be located in some house. Appartment cannot exists out of house. It means that table of appartments should have correct and existing (not null) reference to table of houses.

Relation is designated via number of records. Colon separates number of allowed records in left and in right tables.

The following examples show different types of relations:

0, 1:0, 1 - means that left table should have 0 or 1 records, right table should have also 0 or 1 records.

1:1 - both tables should have the only one records. One table should fully complies another one. For this relation both of tables can be merged.

1: 0, 1, .., n - one record in left table should have zero, one or many records in right table. For example, left table is buildings, right table is apartments. House can have zero, one or many apartments but apartments cannot be with no reference to house.

1: 1, .., n - one record in left table should have one or many records in right table. For example, university group should consists of one or many students. Groups with no students are not allowed. And all students should be assigned to some group.

n:n - the most free relation. Many students can study in manu groups. The important point is that this relation cannot be designed in tables directly. If you need to design this relation, you have to enter auxiliary table. For example, 1.Groups, 2.Students, 3.Students teaching in groups.