BİL 354 Veritabanı Sistemleri Relational Database Design (İlişkisel Veri Tabanı Tasarımı)
Integrity Constraints (Bütünlük Kısıtlamaları) Domain Constraints (Alan Kısıtlamaları) Referential Integrity (Referans Kısıtlamaları) Others (Nitelikler Arası Bağımlılıklar) Functional Dependencies (İşlevsel Bağımlılıklar)
Redundancy Redundancy is at the root of several problems associated with relational schemas: Redundant storage: repeatedly. Some information is stored Update anomalies: If one copy is updated, all copies must be similarly updated. Delete anomalies: It may not be possible to delete some information without losing some other information as well. Insert anomalies: It may not be possible to store some information unless some other information is stored as well.
Update Anomaly
Insertion Anomaly
Deletion Anomaly
How about null values? Hourly Emps(ssn, name, lot, rating, hourly_wages, hours_worked) Null values cannot eliminate redundant storage or update anomalies! However they can address insertion and deletion anomalies to some extent. Deletion : rating and hourly_wages is non-null, ssn is null?
Use of Decompositions Hourly_Emps2(ssn, name, lot, rating, hours worked) Wages(rating, hourly wages)
Decomposition (Ayrıştırma) Decomposition should be used carefully, or it may create more problems. Two questions that should be asked repeatedly: Is there reason to decompose a relation? Normal Forms proposed to address this question What problems (if any) does the decomposition cause? Lossless join Dependency preserving
Decomposition (Ayrıştırma) Lossless-join Enable us to recover any instance of decomposed relation from corresponding instances of the smaller relations. Dependency-preservation o o Enable us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations. Thus, we need not performs joins of the smaller relations to check whether a constraint on the original relation is violated.
Example Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Example : Decomposition Decompose the relation schema Lending-schema into: Branch-customer = (branch-name, branch-city,assets, customername) customer-loan = (customer-name, loan-number, branch-name, amount) All attributes of an original schema (R) must appear in the decomposition (R 1, R 2 ): R = R 1 R 2 Lossless-join decomposition. For all possible relations r on schema R r = R1 (r) R2 (r)
Example : Decomposition
Lossy-join Decomposition The Relation branch-customer customer-loan
Desirable Properties of Decomposition It may become necessary to decompose a relation into several smaller relations. Lossless-Join Dependency Preservation Non-Repetition of Information Using functional dependencies we can define normal forms that represent «good» db design.
Desirable Properties of Decomposition 2 Lossless-Join: enable to recover any instance of decomposed relation from corresponding instances of smaller relations. Dependency Preservation: enable to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations. We need not perform joins of the smaller relations to check whether a constraint on the relation is violated. Non-Repetition of Information
Functional Dependencies (İşlevsel Bağımlılık) A functional dependency(fd) X Y holds over relation R if, for every allowable instance r of R: t1 r, t2 r, (t1) = (t2) implies Y (t1) = X Y (t2) X i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.) X Y iff each X value is associated with precisely one Y value. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute
Functional Dependencies A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Example: StudentId Name but not Name StudentId If X Y then if X is updated, Y must also be updated
Functional Dependencies An FD is a statement about all allowable relations. Must be identified based on semantics of application. Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R! K is a candidate key for R means that K R However, K R does not require K to be minimal! Consider Enrolled(sid, cid, cname, semester) (sid,cid) and (sid, cname) are candidate keys (sid, cid, cname) is a superkey
Functional Dependency Kavramsal Düzeyde FNO FADI, FADRESİ ÜKODU, FNO SFİYATI Gerçek dünyada FADI biricik ise: FADI FNO, FADRESİ Olgu (örnek) Düzeyinde FNO FADI, FADRESİ FADI FNO, FADRESİ ÜKODU, FNO SFİYATI FADRESİ FNO, FADI
Example TAŞIT (PLAKANO, MARKA, MODEL, YIL, AĞIRLIK, RENK) Bu örnekteki işlrevsel bağımlılıklar şunlardır: PLAKANO MARKA, MODEL, YIL, AĞIRLIK, RENK MARKA, MODEL AĞIRLIK
Example R ilişki olgusunun işlevsel bağımlılıkları : D C A, B C, D A, C B, D A, D B, C B, C D
Types of Functional Dependencies 1/2 R (ÖNO, ÖADI, BNO, BADI, FAKNO, DKODU, DADI, KRD, NOTU) Kısmi İşlevsel Bağımlılık. Eğer X A yı belirliyorsa ve X in en az bir öz altkümesi de A yı belirliyorsa (X A ve Z X : Z A) X A işlevsel bağımlılığına kısmi işlevsel bağımlılık denir. ÖNO, ÖADI BNO ÖNO, DKODU KRD Tam işlevsel bağımlılık. Eğer X A yı belirliyorsa ve X in hiçbir öz altkümesi A yı belirlemiyorsa (X A ve Z X : Z A) X A işlevsel bağımlılığına tam işlevsel bağımlılık denir. ÖNO ÖADI, BNO ÖNO, DKODU NOTU
Types of Functional Dependencies 2/2 Önemsiz İşlevsel Bağımlılık. Eğer X A yı belirliyorsa ve A X in bir altkümesi ise (X A ve A X ) X A işlevsel bağımlılığına önemsiz işlevsel bağımlılık denir. ÖNO, ÖADI ÖADI BNO, BADI, FAKNO BNO, FAKNO Önemli İşlevsel Bağımlılık. Eğer X A yı belirliyorsa ve A X in bir altkümesi değilse (X A ve A X ) X A işlevsel bağımlılığına önemli işlevsel bağımlılık denir. ÖNO, ÖADI, DKODU BNO, KRD ÖNO, DKODU NOTU Geçişli işlevsel bağımlılık. Eğer X Y yi, Y de Z i belirliyorsa (X Y ve Y Z ) X Z işlevsel bağımlılığına geçişli işlevsel bağımlılık denir. ÖNO FAKNO (ÖNO BNO, BNO FAKNO)
Armstrong s Axioms (İ.B. Türetme Kuralları) Reflexivity (dönüşlülük): If X Y, then Y X Augmentation (arttırma): If X Y, then XZ Y for any Z Transitivity (geçişlilik): If X Y and Y Z, then X Z
Other Rules Union (Birleşim): If X Y and X Z, then X YZ Decomposition (Ayrışım): If X YZ, then X Y and X Z Pseudo Transitivity (Sözde Geçişlilik): If X Y, then YZ W and XZ W Accumulation rule: If X YZ and Z W, then X YZW
Example R (A, B, C, D, E, G) F: A BD BC EG D E DG E (artırma kuralına göre) A E ( ayrıştırma ve geçişlilik kurallarına göre) A ABD (dönüşlülük ve birleşim kurallarına göre) A B ve A D (ayrıştırma kuralına göre) AC EG (ayrıştırma ve sözde geçişlilik kurallarına göre)
Closure of a Set of Functional Dependencies Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F +. We can find all of F + by applying Armstrong s Axioms:
Computing F + F + = F; loop { For each f in F, apply the reflexivity and augmentation rules and add the new FDs to F +. For each pair of FDs in F, apply the transitivity rule and add the new FDs to F + } until F + does not change any more.
Artıklık Algoritması 1. Başlangıçta T = { X } yap 2. F { f } teki her W Z işlevsel bağımlılığı için: eğer {W} T ise T = T {Z} yap 3. T değiştiği sürece 2. adımı tekrarla 4. Sonuçta eğer Y T ise (f : X Y) işlevsel bağımlılığı F de artıktır. Ayrıştırma yapılır. Artık olan her f F den çıkarılır.
Example 1/2 R (A, B, C, D, E, G) F: A BCDE G BD BC E CG A BDE ACG F : A B G B BDE A A C G D BDE C A D BC E BDE G A E CG A
Example 2/2 F de A B artık mı? T = {A,C,D,E} elde edilir hayır F de A C artık mı? T = {A,B,D,E,C,G} elde edilir evet F 1 = F {A C} F 1 de A D artık mı? T = {A,B,E} elde edilir hayır F 1 de A E artık mı? T = {A,B,D} elde edilir hayır F 1 de G B artık mı? T = {G,D} elde edilir hayır F 1 de G D artık mı?t = {G,B} elde edilir hayır F 1 de BC E artık mı? T = {B,C} elde edilir hayır F 1 de CG A artık mı? T = {C,G,B,D,E,A} elde edilir evet F 2 = F 1 {CG A}
Functional Dependencies Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!) Typically, we just want to check if a given FD X Y is in the closure of a set of FDs F. An efficient check: Compute attribute closure of X (denoted ) Set of all attributes A such that X A is in There is a linear time algorithm to compute this. Check if Y is in Does F = {A B, B C, C D E } imply A E? i.e, is A E in the closure F? Equivalently, is E in? X
Computation of Attribute Closure X + F closure := X; --since X X + F Repeat until no change in closure if there is an FD Z V in F such that Z closure then closure := closure V If T closure then X T is entailed by F
Example A B C A D E B D A F B {A, B} + = {A, F} + = {A, B, C, D, E} {A, F, B, D, C, E}
Example X X F + F: AB C A D D E AC B Is AB E entailed by F? Yes Is D C entailed by F? No Result: X F + allows us to determine FDs of the form X Y entailed by F A {A, D, E} AB {A, B, C, D, E} (Hence AB is a key) B {B} D {D, E}
What is the attribute closure good for? Test if X is a superkey compute X +, and check if X + contains all attrs of R Check if X Y holds by checking if Y is contained in X + Another (not so clever) way to compute closure S + of FDs for each subset of attributes X in relation R, compute X + with respect to S for each subset of attributes Y in X +, output the FD X Y 37
Normal Forms (Normal Biçimler) Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed! If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposing the relation will help. 38
Normal Forms First Normal Form (1NF): A relation is in 1NF, if every field contains atomic values. This requirement is implicit in our definition of the relational model. Second Normal Form (2NF): (obsolete) each nonkey attribute in the relation must be functionally dependent upon the primary key. Third Normal Form (3NF) Boyce-Code Normal Form (BCNF) Fourth Normal Form (4NF) (for the interested) Increasing requirements 39
1NF ÖĞRENCİ (ÖNO, ÖADI, DERS (DADI, NOTU)) ÖĞRDERS (ÖNO, ÖADI, DADI, NOTU)
2NF Asal Nitelik: ilişki anahtarlarında yeralan nitelikler. Asal Olmayan Nitelik. Bir ilişki 1NF ise Asal olmayan niteliklerden hiçbiri anahtarlardan hiçbirine kısmi işlevsel bağımlı değil ise (tam bağımlı). SATICI (ÜKODU, FNO, FADI, FADRESİ, SFİYATI) FNO FADI, FADRESİ
3NF Reln R with FDs F is in 3NF if, for all X A in A X (called a trivial FD), or X is a superkey, or F A is part of some key for R. Minimality of a key is crucial in third condition above! If R is in BCNF, obviously in 3NF. If R is in 3NF, some redundancy is possible. Lossless-join,dependency-preserving decomposition of R into a collection of 3NF relations always possible.
BCNF (Boyce Codd Normal Form) Reln R with FDs F is in BCNF if, for all X A in A X (called a trivial FD), or X contains a key for R. (X is a superkey) In other words, R is in BCNF if the only nontrivial FDs that hold over R are key constraints. F If example relation is in BCNF, the 2 tuples must be identical (since X is a key).
BCNF doesn t always have a dependency-preserving decomposition. Third normal form may be preferable to having to take a join to check dependencies after an update. 44
Lossless-Join (Yitimsiz Birleştirme) A decomposition should not lose information Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F: (r) (r) = r It is always true that r (r) (r) In general, the other direction does not hold! If it does, the decomposition is lossless-join. X Y X Y
Lossy Decomposition The following is always the case: r r 1 r 2... r n But the following is not always true: r r 1 r 2... r n Example: r r 1 r 2 SSN Name Address SSN Name Name Address 1111 Joe 1 Pine 1111 Joe Joe 1 Pine 2222 Alice 2 Oak 2222 Alice Alice 2 Oak 3333 Alice 3 Pine 3333 Alice Alice 3 Pine The tuples (2222, Alice, 3 Pine) and (3333, Alice, 2 Oak) are in the join, but not in the original
Lossy Decompositions: What is Actually Lost? In the previous example, the tuples (2222, Alice, 3 Pine) and (3333, Alice, 2 Oak) were gained, not lost! Why do we say that the decomposition was lossy? What was lost is information: That 2222 lives at 2 Oak: In the decomposition, 2222 can live at either 2 Oak or 3 Pine That 3333 lives at 3 Pine: In the decomposition, 3333 can live at either 2 Oak or 3 Pine
Is this decomposition lossless? Tournament Year Winner Winner Date of Birth Indiana Invitational 1998 Al Fredrickson 21 July 1975 Cleveland Open 1999 Bob Albertson 28 September 1968 Des Moines Masters 1999 Al Fredrickson 21 July 1975 Indiana Invitational 1999 Chip Masterson 14 March 1977 Tournament Year Winner Indiana Invitational 1998 Al Fredrickson Cleveland Open 1999 Bob Albertson Des Moines Masters 1999 Al Fredrickson Indiana Invitational 1999 Chip Masterson Winner Winner Date of Birth Al Fredrickson 21 July 1975 Bob Albertson 28 September 1968 Al Fredrickson 21 July 1975 Chip Masterson 14 March 1977
Is this decomposition lossless? StudentId CourseId Dept Grade 12 CS101 IE B- 12 CS102 IE C+ 12 IE201 IE C- 7 CS201 CS A 7 CS352 CS D StudentId CourseId Grade 12 CS101 B- 12 CS102 C+ 12 IE201 C- 7 CS201 A 7 CS352 D StudentId Dept 12 IE 7 CS
Ayrıştırmaların Yitimsizlik Sınaması
Dependency Preserving Decomposition (İşlevsel Bağımlılıkların Korunması) Consider CSJDPQV, C is key, JP C and SD P. BCNF decomposition: CSJDQV and SDP Problem: Checking JP C requires a join! Dependency preserving decomposition (Intuitive): If R is decomposed into X, Y and Z, and we enforce the FDs that hold on X, on Y and on Z, then all FDs that were given to hold on R must also hold.
Dependency Preserving Decompositions Decomposition of R into X and Y is dependency preserving if (F X union F Y ) + = F + ABC, A B, B C, C A, decomposed into AB and BC. Is this dependency preserving? Is C preserved? A Dependency preserving does not imply lossless join: ABC, A And vice-versa! B, decomposed into AB and BC.
İşlevsel Bağımlılıkların Korunması Alg.
Decomposition into BCNF Input: relation R, set S of FDs over R. Output: a set of relations in BCNF. 1. Compute S +. 2. Compute keys for R (from ER or from S + ). 3. Use S + and keys to check if R is in BCNF. If not: a. Pick a violation FD A B. b. Expand B as much as possible, by computing A +. c. Create R1 = A B, and R2 = R B. d. Find the FDs over R1, using S +. Repeat for R2. e. Recurse on R1 & its set of FDs. Repeat for R2. 4. Else R is already in BCNF; add R to the output.
Decomposition into BCNF Given: relation R with FD s F. Look among the given FD s for a BCNF violation X ->B. Compute X +. Replace R by relations with schemas: 1. R 1 = X +. 2. R 2 = R (X + X ). cont. with two new relations.. 55
Example Drinkers(name, addr, beersliked, manf, favbeer) F = name->addr, name -> favbeer, beersliked->manf Pick BCNF violation name->addr. Close the left side: {name} + = {name, addr, favbeer}. Decomposed relations: Drinkers1(name, addr, favbeer) Drinkers2(name, beersliked, manf) 56
Example, Continued We are not done; we need to check Drinkers1 and Drinkers2 for BCNF. For Drinkers1(name, addr, favbeer), relevant FD s are name->addr and name->favbeer. Thus, {name} is the only key and Drinkers1 is in BCNF. 57
Example, Continued For Drinkers2(name, beersliked, manf), the only FD is beersliked->manf, and the only key is {name, beersliked}. Violation of BCNF. beersliked + = {beersliked, manf}, so we decompose Drinkers2 into: 1. Drinkers3(beersLiked, manf) 2. Drinkers4(name, beersliked) 58
Example, Concluded The resulting decomposition of Drinkers : 1. Drinkers1(name, addr, favbeer) 2. Drinkers3(beersLiked, manf) 3. Drinkers4(name, beersliked) Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like. 59
Decomposition into 3NF Reading