Understanding Software Architecture: CAP Theorem, Design Principles, and Quality Attributes

CAP Theorem

The CAP theorem states that any networked shared-data system can have at most two of three desirable properties:

  • Immediate Consistency (C): Equivalent to having a single up-to-date copy of the data. A system is consistent if every modification is applied to all nodes at the same logical time, so that any subsequent read returns the same result no matter which node answers it. All nodes see the same information at the same time.
  • High Availability (A): Of that data, for updates. Every request received by a non-failing node must produce a response, even if all other nodes fail.
  • Tolerance to Network Partitions (P): A request must be processed by the system even if arbitrary messages are lost between some or all of the system nodes; that is, if a node is separated from the network, the system remains available.

However, by explicitly handling partitions, designers can optimize consistency and availability, thereby achieving some trade-off of all three.

Example:

C: The Percentiles database is a function of the Pneumonia model. However, updating the Percentiles database is far from immediate: the database is updated once every hour, while the Pneumonia model changes every second. Therefore, there is no immediate consistency.

A: If the first node fails but not the second, we still have the percentiles, which can be regarded as a “summary” of the original data. If the second node fails but not the first one, we still have the original pneumonia model. However, no availability tactics have been applied to the solution described in the exercise, so we cannot claim that this system provides high availability.

P: If the network connecting both nodes fails, i.e., if a partition occurs, the first node can still receive emails, extract information, and update the model. So at least that node is partition-tolerant. In the case of a partition, the second node could not update its statistics every hour. The problem statement does not indicate that the second node checks the network to see if it is eventually working.

Three Types of Systems (CAP Combinations):

  • CA: The system will always respond to requests, and the processed data will be consistent. No loss of communication between nodes is allowed.
  • AP: It will always respond to requests, even if communication between nodes is lost. The processed data may be inconsistent.
  • CP: It will perform operations consistently, even if communication between nodes is lost, but it does not ensure that the system responds.
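The AP/CP distinction can be illustrated with a toy two-node store. The following is a minimal Python sketch, where the class names, the mode flag, and the partition handling are illustrative assumptions rather than any real system:

```python
# Toy model of a two-node replicated store under a network partition.
# All names (ReplicatedStore, mode, partitioned) are illustrative.

class Node:
    def __init__(self):
        self.data = {}

class ReplicatedStore:
    def __init__(self, mode):
        self.mode = mode            # "AP" or "CP"
        self.nodes = [Node(), Node()]
        self.partitioned = False    # True = nodes cannot reach each other

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than diverge (gives up availability).
                raise RuntimeError("partition: write rejected to keep consistency")
            # AP: accept the write on the reachable node only (may diverge).
            self.nodes[0].data[key] = value
        else:
            for n in self.nodes:    # no partition: replicate everywhere
                n.data[key] = value

    def read(self, key, node_index=0):
        # Under AP, an isolated node may return stale data.
        return self.nodes[node_index].data.get(key)

store = ReplicatedStore("AP")
store.write("model", "v1")
store.partitioned = True
store.write("model", "v2")          # accepted, but only node 0 sees it
print(store.read("model", 0))       # v2
print(store.read("model", 1))       # v1 (stale: inconsistency under AP)
```

Switching the mode to "CP" makes the write during the partition fail instead, which preserves consistency at the cost of availability.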

Software Architecture and Architectural Views

The software architecture of a system is the set of structures needed to reason about the system, which comprises software elements, the relationships between them, and the properties of both.

Architectural Views:

  • Module View: A module is an abstraction of source code. The module view represents the modules and relates them through usage dependencies and encapsulations.
  • Component-and-Connector View (C&C): Components and connectors are run-time elements such as data in a database, instances of data structures (e.g., an instance of a linked list), and threads (we will use the terms “thread” and “process” as synonyms). Connectors denote relations among components.
  • Allocation View: For our purposes, this view will represent the hardware elements where the software is deployed and executed.

Guidelines for Obtaining a Preliminary Sketch:

  • Create a module for every machine domain obtained from the problem analysis.
  • Create a module for every model obtained from the problem analysis. This module will represent the source code needed for implementing the data structures of the model; however, the data contained in the model belong to a C&C view.
  • For the remaining designed domains obtained from the problem analysis, consider the necessity of adding additional modules for representing them.
  • Create one or several modules for the user interface(s).
  • Create modules representing virtual machines for communication with physical devices and external software systems.
  • Establish usage dependencies among modules.
  • By default, each module obtained from a machine domain will be executed by its own thread unless it can be proven that module A must be executed before module B (in that case, modules A and B will share the same thread).
  • Apply the CAP theorem.

Quality Attributes: Trade-offs

For this course, we consider three quality attributes that impact software architecture design:

  • Modifiability: “The ease with which changes can be made to a system” [1, p.27]. “Our interest in it centers on the cost and risk of making changes” [1, p.117].
  • Performance: “The software system’s ability to meet timing requirements”.
  • Availability: “The ability of a system to mask or repair faults such that the cumulative service outage period does not exceed a required value over a specified time interval” [1, p.79]. There are four types of faults [1, p.85]:
    • Omission
    • Crash
    • Timing
    • Response
    The artifact (resource that is required to be highly available) may be a processor, communication channel, process, or storage [1, p.85].

Trade-offs

  • Select tactics for each quality attribute relevant to the problem.
  • Take into account that a tactic may improve a quality attribute while damaging others.
  • Create modules for implementing the selected tactics.
  • For every quality attribute relevant to the problem, specify which parts of your architectural design promote the attribute and which parts damage it.

Tactics for Improving Modifiability:

  • Split a Module with a Great Deal of Capability: Modifying a module with a great deal of capability is very costly, so it is better to divide it into smaller modules. Reduces module size.
  • Increase Semantic Coherence: If the responsibilities of modules A and B do not serve the same purpose, consider assigning them to different modules. This implies creating a new module or moving a responsibility to a different module. Increases cohesion.
  • Use an Intermediary: Breaks the dependency between two responsibilities A and B, reducing the coupling between them. Example: a virtual machine. Reduces coupling.
  • Restrict Dependencies: Reduces the number of modules that interact with a given one. This tactic is achieved by restricting the visibility of a module. Example: a layered architecture, where a layer can only use lower layers and wrappers. Reduces coupling.
  • Refactor Modules: When two modules have, at least partially, the same responsibility, the code must be modified to assign that responsibility to only one of them; otherwise, a change to it affects both modules. The common factor of responsibility is extracted into a single place.
  • Abstract Common Service: If two modules provide similar (but not identical) services, it may be profitable to implement the service only once, in a more general form. Any modification to the common service then happens in a single place, reducing modification costs.
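The Use an Intermediary tactic can be sketched in Python. The interface and class names below (PercentileStore, OracleStore, StatisticsModule) are hypothetical, loosely inspired by the pneumonia example:

```python
# Sketch of "Use an Intermediary": the statistics code no longer depends
# directly on a concrete database module; both depend on an interface.
# All class names are illustrative.
from abc import ABC, abstractmethod

class PercentileStore(ABC):          # the intermediary (abstract interface)
    @abstractmethod
    def save(self, percentiles): ...

class OracleStore(PercentileStore):  # concrete module behind the interface
    def __init__(self):
        self.rows = []
    def save(self, percentiles):
        self.rows.append(percentiles)

class StatisticsModule:
    """Depends only on the intermediary, so swapping the database
    does not force a change here (reduced coupling)."""
    def __init__(self, store: PercentileStore):
        self.store = store
    def publish(self, percentiles):
        self.store.save(percentiles)

store = OracleStore()
StatisticsModule(store).publish({"p50": 10, "p90": 42})
print(store.rows)   # [{'p50': 10, 'p90': 42}]
```

Replacing OracleStore with any other PercentileStore implementation leaves StatisticsModule untouched, which is exactly the modifiability gain the tactic promises.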

Tactics for Increasing Performance:

There are two categories of tactics:

For Controlling Resource Demand:

  • Reduce Sampling Frequency: If we can reduce the sampling frequency without losing significant information, this will reduce demand. Common in signal processing systems.
  • Limit Event Response: When events arrive at the system faster than they can be processed, this tactic suggests queuing them until they can be processed; this requires a software component that places incoming events in the queue.
  • Prioritize Events: Events in a queue can be prioritized or, with multiple queues, each queue can be given a different priority. Ignoring low-priority events consumes minimal resources –> higher performance.
  • Reduce Computational Overhead: Four approaches:
    • Remove Intermediaries: We gain performance but lose modifiability (the opposite of the Use an Intermediary tactic).
    • Asynchrony Everywhere: A caller issues a request and continues doing other work while waiting for the response.
    • Data Partitioning: We partition the data across several databases (see the CAP theorem).
    • Stateless Modules: Modules that keep no state between requests (the state of an object is the value of its attributes) are cheaper to replicate and coordinate.
  • Bound Execution Times: A limit is placed on the execution time used to respond to an event. This tactic is frequently combined with managing the sampling frequency.
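Limit Event Response and Prioritize Events can be combined in a single bounded priority queue. A minimal Python sketch, where the capacity, priorities, and event names are all illustrative:

```python
# Sketch combining "Limit Event Response" and "Prioritize Events":
# a bounded priority queue that drops the least urgent event when full.
import heapq

class EventQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []          # min-heap ordered by priority (lower = more urgent)
        self.counter = 0        # tie-breaker preserving arrival order

    def push(self, priority, event):
        heapq.heappush(self.heap, (priority, self.counter, event))
        self.counter += 1
        if len(self.heap) > self.capacity:
            # Over capacity: discard the least urgent event (largest priority).
            self.heap.remove(max(self.heap))
            heapq.heapify(self.heap)

    def pop(self):
        return heapq.heappop(self.heap)[2]

q = EventQueue(capacity=2)
q.push(3, "log rotation")
q.push(1, "sensor alarm")
q.push(2, "status update")      # queue is full: "log rotation" is dropped
print(q.pop())                  # sensor alarm
print(q.pop())                  # status update
```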

For Improving Resource Management:

  • Maintain Multiple Copies of Computations: Keep copies of the computations. In a client-server architecture, the existence of many servers allows several copies of the same computation to run in parallel, reducing the contention that arises when all computations run on a single server.
  • Maintain Multiple Copies of Data: Keep copies of the data in storage with different access speeds (problem: writing to a hard drive takes time). 2 types:
    • Caching: Storage with different access speeds.
    • Data Replication: Maintaining separate copies of data to reduce contention from multiple simultaneous accesses.
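A minimal Python sketch of the caching variant, with an illustrative slow backing store, showing how repeated reads avoid the slow path:

```python
# Sketch of "Maintain Multiple Copies of Data" via caching: keep a fast
# in-memory copy in front of a slow store. All names are illustrative.
class SlowStore:
    def __init__(self, data):
        self.data = data
        self.reads = 0          # counts expensive accesses
    def get(self, key):
        self.reads += 1
        return self.data[key]

class CachedStore:
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
    def get(self, key):
        if key not in self.cache:           # cache miss: go to the slow backend
            self.cache[key] = self.backend.get(key)
        return self.cache[key]              # cache hit: fast path

backend = SlowStore({"p50": 10})
store = CachedStore(backend)
store.get("p50"); store.get("p50"); store.get("p50")
print(backend.reads)    # 1 (only the first read hit the slow store)
```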

Tactics for Increasing Availability:

There are two categories of tactics:

For Detecting Faults:

  • Ping/Echo: Refers to a pair of asynchronous request/response messages exchanged between nodes, used to determine the accessibility of a component and the round-trip delay; the echo also confirms that the pinged component is alive and responding correctly.
  • Monitor (Watchdog): A component used to monitor the state of health of other parts of the system; a monitor can detect faults. When the detection mechanism uses a counter or timer that is periodically reset by the monitored component, the monitor is called a watchdog.
  • Heartbeat: Fault detection mechanism that employs a periodic exchange of messages between a system monitor and a process being monitored.
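The Heartbeat tactic can be sketched as a monitor that tracks the time of each process's last message. Process names and the injected clock below are illustrative (a real monitor would read the wall clock):

```python
# Sketch of the Heartbeat tactic: the monitor declares a process failed
# if no heartbeat arrives within a timeout. Times are injected so the
# example is deterministic; all names are illustrative.
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}

    def beat(self, process, now):
        self.last_beat[process] = now       # process reports it is alive

    def failed(self, now):
        # Processes whose last beat is older than the timeout are suspected dead.
        return [p for p, t in self.last_beat.items() if now - t > self.timeout]

monitor = HeartbeatMonitor(timeout=5)
monitor.beat("model-updater", now=0)
monitor.beat("stats-updater", now=0)
monitor.beat("model-updater", now=4)        # stats-updater stops beating
print(monitor.failed(now=7))                # ['stats-updater']
```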

For Recovering from Faults:

  • Active Redundancy (Hot Spare): Refers to the configuration where all nodes in a protection group (a group of processing nodes where one or more nodes are active with the remaining nodes in the protection group serving as redundant spares) receive and process identical inputs in parallel, allowing the redundant spare to maintain state synchronously with the active nodes.
  • Passive Redundancy (Warm Spare): Configuration in which only the active members of the protection group process incoming traffic; they provide the redundant spares with periodic state updates.
  • Spare (Cold Spare): Configuration in which redundant spares in a protection group remain out of service until a failure occurs.
  • Rollback (Use of Checkpoint): This tactic allows the system to return to the previous known good state. (Rollback timeline upon detection of a failure).
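A minimal Python sketch of Rollback with checkpoints; the model state and update operations are illustrative:

```python
# Sketch of the Rollback tactic: checkpoint the state periodically and
# restore the last known good checkpoint when a fault is detected.
import copy

class CheckpointedModel:
    def __init__(self):
        self.state = {"cases": 0}
        self.checkpoints = []

    def checkpoint(self):
        # Deep copy so later updates cannot corrupt the saved snapshot.
        self.checkpoints.append(copy.deepcopy(self.state))

    def update(self, delta):
        self.state["cases"] += delta

    def rollback(self):
        # Return to the previous known good state.
        self.state = self.checkpoints.pop()

m = CheckpointedModel()
m.update(3)
m.checkpoint()                  # known good state: {"cases": 3}
m.update(99)                    # a faulty update is detected afterwards
m.rollback()
print(m.state)                  # {'cases': 3}
```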

Notation for the Module View

A dashed arrow from ModuleA to ModuleB means that ModuleA uses the services offered by ModuleB. If we draw a module A inside another module B, we mean that A is encapsulated into B. Modules implementing a data repository (e.g., a module abstracting the Oracle Database source code) must be accompanied by UML class diagrams describing the conceptual schema. The specific data are run-time elements and therefore do not belong to the module view but to the C&C view.

Notation for the C&C View

If we need to represent data, we will use the UML object notation. If we need to represent threads (processes), we will denote them with simple names (T1, T2, …) using the notation Thread (T1), Thread (T2), etc. We will use a matrix to relate threads with the modules that they execute. Alternatively, we can use the notation Runs (T, M1, …, Mn), where T is the name of a thread and M1, …, Mn are the names of the modules run by T. If we want to denote that threads T1 and T2 need to run in synchronized mode (for reads and writes performed by some module M), then we will write Syn (T1, T2, M), where M is the name of the module whose thread-safe execution must be guaranteed by the operating system.

Notation for the Allocation View

A cube denotes a node: a physical object with computing capabilities. A solid line denotes a physical link between two nodes; it is also called a “communication channel”, “input-output channel”, or “I/O channel”. The node name and type are shown, underlined, inside the cube. The modules deployed on every node are shown in a traceability matrix. Alternatively, we can use the notation Dep (N, M1, …, Mn), where M1, …, Mn are module names and N is the name of the node where those modules are deployed.

Abstractions About Computer Nodes

We assume the following properties about computer nodes employed in allocation views:

  • They are equipped with an operating system capable of providing threads (processes) and mechanisms for synchronizing those threads.
  • There is always an initial thread called Init.
  • Any thread can request the creation of new threads to the operating system and link those new threads with specific modules.
  • The module containing the User Interface is always executed in its own thread; that thread is automatically provided by the operating system (there is no need to request it explicitly). We will refer to such a thread by the identifier UIT.
  • For every I/O channel connected to a computer node, the operating system automatically provides a thread that executes whatever callback methods (contained in modules) we link to the processing of the events associated with that I/O channel. There is no need to request the creation of those I/O threads. Notation: if an I/O channel is named C, then the associated I/O thread will be denoted by Thread(C).
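These abstractions can be mimicked with Python's threading module: an initial thread creates worker threads and links them to modules, mirroring the Runs (T, M1, …, Mn) notation. Thread and module names below are illustrative:

```python
# Sketch of the node abstractions: an Init thread requests worker threads
# and links each one to specific modules. The runs dictionary mirrors the
# Runs (T, M1, ..., Mn) notation; all names are illustrative.
import threading

runs = {}       # thread name -> modules it executes (the Runs relation)
results = {}    # records which thread actually ran each module

def link(thread_name, modules):
    """Record Runs(thread, modules) and return the thread's body."""
    runs[thread_name] = modules
    def body():
        for m in modules:
            results[m] = f"{m} executed by {thread_name}"
    return body

# The Init thread creates the workers and links them with modules.
t1 = threading.Thread(name="T1", target=link("T1", ["EmailReceiver", "ModelUpdater"]))
t2 = threading.Thread(name="T2", target=link("T2", ["StatisticsModule"]))
t1.start(); t2.start()
t1.join(); t2.join()
print(runs["T1"])   # ['EmailReceiver', 'ModelUpdater']
```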

Alloy Code Example:

sig PneumoniaCause {} {
    some d : DiagnosedPneumoniaCase | this = d.cause  // (a)
}  // (b)

lone sig ViralInfection, BacterialInfection, MicroorganismCause,
    MedicationCause, AutoimmuneDisease extends PneumoniaCause {}  // (c)

sig Hospital {}
sig Treatment {}
sig PneumoniaCase {}

sig DiagnosedPneumoniaCase in PneumoniaCase {  // (d)
    cause: PneumoniaCause,
    hospital: Hospital
}

sig RecoveredPneumoniaCase in PneumoniaCase {  // (e)
    treatment: Treatment
} {
    this in DiagnosedPneumoniaCase  // (f)
}

fact {
    PneumoniaCase = DiagnosedPneumoniaCase + RecoveredPneumoniaCase  // (g)
}

assert NoRecoveryWithoutDiagnosis {
    no (RecoveredPneumoniaCase - DiagnosedPneumoniaCase)  // (h)
}

assert NoIsolatedCause {
    all p: PneumoniaCause | some d: DiagnosedPneumoniaCase | p = d.cause  // (i)
}

Explanation:

  • Pneumonia causes cannot exist in isolation: if a certain cause exists, then at least a case must have been diagnosed for such a cause.
  • Pneumonia causes are disjoint.
  • Every pneumonia cause has one atom at most.
  • Pneumonia cases are either diagnosed or recovered; every recovered case must have been diagnosed as well.
  • Assertion NoRecoveryWithoutDiagnosis checks that every recovered case must have been diagnosed as well.
  • Assertion NoIsolatedCause checks that there are no causes for which no case has been diagnosed.
