Software Development Lifecycle and System Design Principles

Phases in the SDLC

  • Project Planning Phase
  • Analysis Phase
  • Design Phase
  • Implementation Phase
  • Support Phase

Modality

  • Relationships have a modality of either “required” or “optional”, which refers to whether an instance of an entity can exist without a related instance in the related entity.
  • Can we have a customer instance without a related custom drone order instance?
  • Can we have a custom drone order instance without a customer instance?
  • Modality indicates whether the relationship between an entity instance and an instance of the related entity is null (optional) or not null (required).
  • Place a zero on the relationship line next to the parent entity if nulls are allowed.
  • Place a bar on the relationship line next to the parent entity if nulls are not allowed.
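
One way to see modality in practice is through a NOT NULL constraint on a foreign key. The sketch below uses Python's built-in sqlite3 module. The table and column names are hypothetical, and it assumes that a customer may exist without any orders while every order must belong to a customer; the notes pose these as questions and do not answer them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Required modality (assumed): every order must reference a customer,
# so the foreign key column is declared NOT NULL.
conn.execute(
    """
    CREATE TABLE drone_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    )
    """
)

# Optional modality (assumed): a customer with no orders is accepted.
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ada')")


def try_insert_order_without_customer() -> bool:
    """Return True if the database accepts an order that has no customer."""
    try:
        conn.execute(
            "INSERT INTO drone_order (order_id, customer_id) VALUES (1, NULL)"
        )
        return True
    except sqlite3.IntegrityError:
        # The NOT NULL constraint rejects the row.
        return False
```

Dropping NOT NULL from customer_id would turn the relationship into an optional one, because an order row could then exist with a null reference.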

Reliability

  • The ability to continue working correctly, even when things go wrong.
  • The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
  • Fault is not the same as failure.
    • Fault: One component of the system deviating from its spec.
    • Failure: When the system as a whole stops providing the required service.
  • It is impossible to reduce the probability of a fault to zero, so we need to design systems that prevent faults from causing failures.
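
The difference between a fault and a failure can be illustrated with a small sketch. The names below (FaultyComponent, read_with_fallback, broken, healthy) are hypothetical, and falling back to another component is only one of many possible fault-tolerance techniques.

```python
class FaultyComponent(Exception):
    """Raised when a single component deviates from its spec (a fault)."""


def read_with_fallback(components, key):
    """Try each component in turn so that one fault does not become a failure."""
    for component in components:
        try:
            return component(key)
        except FaultyComponent:
            continue  # tolerate the fault and try the next component
    # Every component is faulty: the system as a whole stops providing service.
    raise RuntimeError("all components faulty: system failure")


def broken(key):
    raise FaultyComponent("simulated disk error")


def healthy(key):
    return f"value-for-{key}"
```

Calling read_with_fallback([broken, healthy], "a") still returns a value even though one component is faulty; only when every component is faulty does the caller observe a failure.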

What is Scalability?

  • Scalability describes the system’s ability to cope with increased load.
    • The system has grown from 1000 concurrent users to 1 million, causing performance degradation.
  • Scalability is not a one-dimensional label.
    • It is meaningless to say “X is scalable” or “Y doesn’t scale”.
    • It is about “if the system grows in a particular way, what are our options to cope with the growth?”.

Design Principles for High Maintainability

  • Operability
    • Make it easier for operations teams to keep the system running smoothly.
  • Simplicity
    • Make it easier for new engineers to understand the system.
  • Evolvability
    • Make it easier for engineers to make changes to the system in the future.

Relational Data Model

  • Data is organized into relations (called tables in SQL), and each relation is an unordered collection of tuples (rows in SQL).

Analysis of JSON Model

  • The JSON model reduces the mismatch between the application code and the storage layer.
  • The JSON representation has better locality than the multi-table schema.
    • To fetch a profile in the relational example, we need to perform multiple queries or a multi-way join.
    • To fetch a profile in the JSON model, we just retrieve a single entry.
  • The JSON representation makes the one-to-many relationship explicit, using a tree structure.
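
The locality argument can be sketched in a few lines of Python. The data, field names, and the two fetch functions are made up for illustration; the in-memory dictionaries merely stand in for tables and a document store.

```python
import json

# Relational shape: one profile is spread across two "tables".
users = {251: {"first_name": "Ada", "last_name": "Lovelace"}}
positions = [
    {"user_id": 251, "job_title": "Analyst", "organization": "Example Corp"},
    {"user_id": 251, "job_title": "Engineer", "organization": "Sample Ltd"},
    {"user_id": 999, "job_title": "Clerk", "organization": "Other Inc"},
]


def fetch_profile_relational(user_id):
    """Assemble a profile by combining rows from both tables (a manual join)."""
    profile = dict(users[user_id])
    profile["positions"] = [
        {"job_title": p["job_title"], "organization": p["organization"]}
        for p in positions
        if p["user_id"] == user_id
    ]
    return profile


# Document shape: the whole profile is one JSON entry, and the one-to-many
# relationship (user -> positions) is an explicit nested list.
documents = {
    251: json.dumps(
        {
            "first_name": "Ada",
            "last_name": "Lovelace",
            "positions": [
                {"job_title": "Analyst", "organization": "Example Corp"},
                {"job_title": "Engineer", "organization": "Sample Ltd"},
            ],
        }
    )
}


def fetch_profile_document(user_id):
    """Retrieve the profile with a single lookup."""
    return json.loads(documents[user_id])
```

Both functions return the same profile, but the document version needs one lookup while the relational version has to touch two collections.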

Online Transaction Processing (OLTP)

  • An application typically looks up a small number of records, and new records are inserted or updated based on the user’s inputs. These applications are interactive.
  • This is the basic access pattern of business transactions, comments on blog posts, actions in a game, etc.
  • Some example transactional queries:
    • Adding a new order to the order database.
    • Changing the address of a customer.
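
The two example queries might look like this with Python's built-in sqlite3 module. The schema and the function names add_order and change_address are assumptions made for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT)")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
db.execute("INSERT INTO customers (id, address) VALUES (1, '1 Old Road')")


def add_order(customer_id, total):
    """Insert one new order row and return its generated id."""
    with db:  # commits on success, rolls back on an exception
        cursor = db.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
    return cursor.lastrowid


def change_address(customer_id, new_address):
    """Update a single customer row, looked up by its key."""
    with db:
        db.execute(
            "UPDATE customers SET address = ? WHERE id = ?",
            (new_address, customer_id),
        )
```

Each call touches a single row identified by a key, which is the typical shape of a transactional workload.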

Online Analytical Processing (OLAP)

  • OLAP has very different access patterns from OLTP.
  • An OLAP query typically scans a huge number of records, reads only a few columns per record, and calculates aggregate statistics (such as count or sum) rather than returning the raw data.
  • Some example analytical queries:
    • What was the total revenue of each of our stores in January?
    • How many more bananas than usual did we sell during our latest promotion?

Star Schemas for Analytics

  • A star schema (also known as dimensional modeling) is a way to organize data for analytics.
  • At the center of the schema is a fact table. Each row of the fact table represents an event that occurred at a particular time.
  • Some of the columns in the fact table are attributes, others are foreign key references to other tables, called dimension tables.
  • As each row in the fact table represents an event, the dimensions represent the who, what, where, when, how, and why of the events.
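
A minimal star schema and one analytical query over it can be sketched with sqlite3. The table names (fact_sales, dim_store, dim_date), their columns, and the sample rows are invented for illustration, and net_price is assumed to be a per-unit price.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript(
    """
    -- Dimension tables describe the "where" and "when" of each event.
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one row per sale, with foreign keys into the dimensions
    -- plus attribute columns (quantity, net_price).
    CREATE TABLE fact_sales (
        date_key  INTEGER REFERENCES dim_date(date_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        quantity  INTEGER,
        net_price REAL
    );

    INSERT INTO dim_store VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO dim_date  VALUES (20240105, 2024, 1), (20240210, 2024, 2);
    INSERT INTO fact_sales VALUES
        (20240105, 1, 2, 10.0),
        (20240105, 2, 1, 5.0),
        (20240105, 1, 1, 10.0),
        (20240210, 1, 4, 10.0);
    """
)


def january_revenue_by_store():
    """Aggregate over the fact table, filtering and grouping via dimensions."""
    rows = dw.execute(
        """
        SELECT s.city, SUM(f.quantity * f.net_price)
        FROM fact_sales AS f
        JOIN dim_date  AS d ON d.date_key = f.date_key
        JOIN dim_store AS s ON s.store_key = f.store_key
        WHERE d.month = 1
        GROUP BY s.city
        ORDER BY s.city
        """
    ).fetchall()
    return dict(rows)
```

The query answers "What was the total revenue of each of our stores in January?" by scanning the fact table and returning one aggregate per store rather than the raw rows.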

Read and Write in a Replication Environment

  • If the data that is replicated does not change over time, replication is easy: just copy the data to every node once.
  • Handling changes to replicated data is difficult, because we need to ensure that changes are made on all nodes, with very low latency.
  • Every write to the database needs to be processed by every replica.

Leaders and Followers

  • Each node that stores a copy of the database is called a replica.
  • To ensure data consistency, one common solution is using leader-based replication (also called active/passive replication).
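
A heavily simplified, in-memory sketch of leader-based replication follows. It assumes that all writes go through the leader, that the leader forwards each change to its followers synchronously, and that nothing ever fails; the Replica and Leader classes are hypothetical, and real systems must also handle replication lag and node outages.

```python
class Replica:
    """A node that stores a full copy of the data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Leader(Replica):
    """The single replica that accepts writes and forwards them to followers."""

    def __init__(self, followers):
        super().__init__()
        self.followers = list(followers)

    def write(self, key, value):
        # Apply the change locally first, then send it to every follower,
        # so that every write is processed by every replica.
        self.apply(key, value)
        for follower in self.followers:
            follower.apply(key, value)
```

Clients send writes only to the leader, while reads can be served by the leader or by any follower.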

Partitioning and Replication

  • Partitioning is usually combined with replication so that copies of each partition are stored on multiple nodes.
    • Even if each record only belongs to exactly one partition, it is still stored on multiple nodes, for fault tolerance purposes.
  • A node may store more than one partition.
  • The choice of partitioning scheme is independent of the choice of replication scheme.

How to Partition Data?

  • How do we decide which records to store on which nodes, when partitioning a large amount of data?
  • Our goal of partitioning: spread data and query load evenly across nodes.
    • Ideally, 10 nodes should be able to handle 10 times as much data and 10 times the throughput of a single node.
  • If the partitioning is unfair, so that some partitions have more data or queries than others, we call it skewed.
    • The presence of skew makes partitioning less effective.
    • A partition with disproportionately high load is called a hot spot. Imagine that all the load ends up on one partition and 9 out of 10 nodes are idle.

Random Partitioning

  • The simplest approach to avoid hot spots is to assign records to nodes randomly.
  • Advantage: Distribute the data quite evenly across the nodes.
  • Disadvantage: When we try to read a particular item, there is no way for us to know which node it is on – so we need to query all nodes in parallel.
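
The trade-off can be sketched as follows. RandomlyPartitionedStore is a hypothetical class, nodes are modeled as plain dictionaries, and each key is assumed to be written only once. The read loop below visits the nodes one after another, whereas a real system would query them in parallel.

```python
import random


class RandomlyPartitionedStore:
    def __init__(self, num_nodes, seed=None):
        self.nodes = [{} for _ in range(num_nodes)]
        self._rng = random.Random(seed)

    def put(self, key, value):
        # Advantage: a random choice spreads records roughly evenly.
        self._rng.choice(self.nodes)[key] = value

    def get(self, key):
        # Disadvantage: nothing tells us which node holds the key,
        # so every node has to be asked.
        hits = [node[key] for node in self.nodes if key in node]
        return hits[0] if hits else None
```

Every call to get touches all nodes, which is the cost that key-based partitioning schemes try to avoid.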

Partition by Key Range

  • One way of partitioning is to assign a continuous range of keys to each partition, like a paper encyclopedia.
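
With key ranges, the partition for a key can be found by a binary search over the range boundaries. The boundaries below are arbitrary example values, and partition_for is a hypothetical helper.

```python
import bisect

# Partition 0 holds keys before "g", partition 1 holds "g" up to (not
# including) "n", partition 2 holds "n" up to "t", and partition 3 the rest.
BOUNDARIES = ["g", "n", "t"]


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(BOUNDARIES, key.lower())
```

Because each partition covers a contiguous range, a lookup goes straight to one partition, much like picking the right volume of an encyclopedia by the first letter.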

Design Introduction

  • Interface design: The process of defining how the system interacts with the external entities.
  • User interface components:
    • Navigation mechanism: The way in which the user gives instructions to the system.
      • Example: buttons, menus.
    • Input mechanism: The way in which the system captures information.
      • Example: forms for adding new customers.
    • Output mechanism: The way in which the system provides information to the user.
      • Example: reports.
  • From user interface (UI) to user experience (UX).
    • Users’ feelings and motivations, as well as efficiency, effectiveness, and satisfaction, are all considered.

Designed Objects Must Be Usable

  • Effective: Does the thing.
  • Safe: Does not harm anyone while doing the thing.
  • Efficient: Does the thing (more) quickly and easily.
  • Learnable: Lets the user master it quickly.
  • Integrated: Fits in with other objects and practices.
  • Satisfying: Is enjoyable and aesthetically pleasing.
  • Accessible: Presents no barriers to its users.

Implementation Phase Objectives

  • Be familiar with the system construction process.
  • Explain different types of tests and when to use them.
  • Describe how to develop user documentation.

Testing Concepts

  • A test component: A part of the system that can be isolated for testing.
  • A fault (also called bug or defect): A design or coding mistake that may cause abnormal component behavior.
  • An erroneous state: A manifestation of a fault during the execution of the system.
    • Caused by one or more faults and can lead to a failure.
  • A failure: A deviation between the specification and the actual behavior.
    • Triggered by one or more erroneous states. Not all erroneous states trigger a failure.

Testing Concepts

  • A test case: A set of inputs and expected results that exercises a test component with the purpose of causing failures and detecting faults.
  • A test stub: A partial implementation of components on which the tested component depends.
  • A test driver: A partial implementation of a component that depends on the test component. Test stubs and drivers enable components to be isolated from the rest of the system for testing.
  • A correction: A change to a component.
    • The purpose is to repair a fault.
    • Note that a correction can introduce new faults.
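
The roles of test component, stub, driver, and test case can be sketched together. All names here (Checkout, PaymentGatewayStub, run_test_case) are hypothetical and chosen only for illustration.

```python
class PaymentGatewayStub:
    """Test stub: a partial implementation of a component that the
    tested component depends on. It returns a canned answer."""

    def __init__(self, approve):
        self.approve = approve
        self.charged = []  # records calls so the test can inspect them

    def charge(self, amount):
        self.charged.append(amount)
        return self.approve


class Checkout:
    """Test component: the part of the system isolated for testing."""

    def __init__(self, gateway):
        self.gateway = gateway

    def place_order(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return "confirmed" if self.gateway.charge(amount) else "declined"


def run_test_case(approve, amount):
    """Test driver: stands in for the code that would normally call
    Checkout, feeding it the inputs of one test case."""
    stub = PaymentGatewayStub(approve)
    result = Checkout(stub).place_order(amount)
    return result, stub.charged
```

A test case then pairs inputs with expected results, for example expecting run_test_case(True, 25) to return ("confirmed", [25]).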

Types of Tests

  • Blackbox tests: Focus on the input/output behavior of the component.
    • Do not deal with the internal aspects of the component, nor with the behavior or the structure of the components.
  • Whitebox tests: Focus on the internal structure of the component.
    • Make sure that, independently of the particular input/output behavior, every state in the dynamic model of the object and every interaction among the objects is tested.

Model Transformation

  • A model transformation is applied to an object model and results in another object model.
  • The purpose of object model transformation is to simplify or optimize the original model, bringing it into closer compliance with all requirements in the specification.

Transformation Principles

  • Each transformation must address a single criterion.
    • A transformation should improve the system with respect to only one design goal.
  • Each transformation must be local.
    • A transformation should change only a few methods or a few classes at once.
  • Each transformation must be applied in isolation from other changes.
    • To further localize changes, transformations should be applied one at a time.
  • Each transformation must be followed by a validation step.

Optimizing the Object Design Model

  • Four simple but common optimizations:
    • Add associations to optimize access paths.
    • Collapse objects into attributes.
    • Delay expensive computations.
    • Cache the results of expensive computations.
  • When applying optimizations, developers must strike a balance between efficiency and clarity.
    • Optimizations increase the efficiency of the system but also the complexity of the models, making it more difficult to understand the system.
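
Delaying and caching an expensive computation can be sketched with functools.cached_property from the Python standard library. The Report class and its computations counter are hypothetical; the counter exists only to make the behavior observable.

```python
from functools import cached_property


class Report:
    def __init__(self, line_items):
        self.line_items = list(line_items)
        self.computations = 0  # instrumentation for this example only

    @cached_property
    def total(self):
        # Delayed: this runs on first access, not in __init__.
        # Cached: the result is stored on the instance and reused afterwards.
        self.computations += 1
        return sum(self.line_items)
```

The price of the optimization is clarity: if line_items changes after the first access, the cached total is stale unless the developer remembers to invalidate it (for example with `del report.total`).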

Mapping Contracts to Exceptions

  • Treat each operation in the contract individually and add code within the method body.
  • Checking preconditions:
    • Preconditions should be checked at the beginning of the method, before any processing is done.
  • Checking postconditions:
    • Postconditions should be checked at the end of the method, after all the work has been accomplished and the state changes are finalized.
  • Checking invariants:
    • When treating each operation contract individually, invariants are checked at the same time as postconditions.
  • Dealing with inheritance:
    • The checking code for preconditions and postconditions should be encapsulated into separate methods that can be called from subclasses.
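
The placement of the checks can be sketched as follows. The Tournament class, its contract (a player may be added only once and only while there is room), and the exception classes are assumptions made for this example.

```python
class ContractViolation(Exception):
    pass


class PreconditionViolation(ContractViolation):
    pass


class PostconditionViolation(ContractViolation):
    pass


class InvariantViolation(ContractViolation):
    pass


class Tournament:
    def __init__(self, max_players):
        self.max_players = max_players
        self.players = []

    # Checking code lives in separate methods so subclasses can call it.
    def _check_add_player_pre(self, player):
        if player in self.players:
            raise PreconditionViolation("player already added")
        if len(self.players) >= self.max_players:
            raise PreconditionViolation("tournament is full")

    def _check_add_player_post(self, player, count_before):
        if player not in self.players or len(self.players) != count_before + 1:
            raise PostconditionViolation("player was not added exactly once")

    def _check_invariant(self):
        if self.max_players <= 0 or len(self.players) > self.max_players:
            raise InvariantViolation("player count out of bounds")

    def add_player(self, player):
        self._check_add_player_pre(player)  # first, before any processing
        count_before = len(self.players)
        self.players.append(player)
        self._check_add_player_post(player, count_before)  # after the work
        self._check_invariant()  # invariants alongside postconditions
```

A subclass that overrides add_player can reuse the same _check_ methods instead of duplicating the checking code.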

Mapping Contracts to Exceptions

  • Heuristics for mapping contracts to exceptions:
    • Omit checking code for postconditions and invariants.
      • Checking code is usually redundant with the code accomplishing the functionality of the class, and is written by the developer of the method. It is not likely to detect many bugs unless it is written by a separate tester.
    • Focus on subsystem interfaces and omit the checking code associated with private and protected methods.
      • System boundaries do not change as often as internal interfaces and represent a boundary between different developers.
    • Focus on contracts for components with the longest life, that is, on code most likely to be reused and to survive successive releases.
      • Entity objects usually fulfill these criteria, whereas boundary objects associated with the user interface do not.
    • Reuse constraint checking code.
      • Many operations have similar preconditions. Encapsulate constraint checking code into methods so that they can be easily invoked and so that they share the same exception classes.

Mapping Object Models to Relational Databases

  • A schema is a description of the data, that is, a meta-model for data.
  • Relational databases store persistent data in the form of tables (relations).
  • A table is structured in columns, each of which represents an attribute.
  • A primary key of a table is a set of attributes whose values uniquely identify the data records in a table.
    • Sets of attributes that could be used as a primary key are called candidate keys.

Mapping Object Models to Relational Databases

  • A foreign key is an attribute (or a set of attributes) that references the primary key of another table.

Mapping Classes and Attributes

  • We map each class to a table with the same name.
    • Each data record in the table corresponds to an instance of the class.
  • For each attribute, we add a column in the table with the name of the attribute in the class.
  • Select a data type for the database column.
  • Select primary key, two options:
    • Identify a set of class attributes that uniquely identifies the object.
    • Add a unique identifier attribute that we generate.
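
The mapping steps can be sketched by generating a CREATE TABLE statement from a Python dataclass. The User class, the SQL_TYPES table, and create_table_sql are hypothetical, and the sketch takes the second primary-key option by adding a generated id attribute.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int  # generated unique identifier, used as the primary key
    first_name: str
    login: str
    email: str


# Chosen data type for each attribute type (SQLite type names).
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


def create_table_sql(cls, primary_key="id"):
    """Map a class to a table of the same name, one column per attribute."""
    columns = []
    for field in fields(cls):
        column = f"{field.name} {SQL_TYPES[field.type]}"
        if field.name == primary_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__} ({', '.join(columns)})"


def save(connection, user):
    """Store one instance as one data record in the table."""
    connection.execute(
        "INSERT INTO User (id, first_name, login, email) VALUES (?, ?, ?, ?)",
        (user.id, user.first_name, user.login, user.email),
    )
```

For the class above, create_table_sql(User) produces a table named User with the columns id, first_name, login, and email, and each saved instance becomes one row.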