Distributed Systems: Security, Fault Tolerance, and Recovery

Posted on Feb 28, 2025 in Business Administration and Innovation Management

Security Issues in Distributed Systems

Distributed systems face two primary security challenges:

Secure Communication

Ensuring authentication, message integrity, and confidentiality requires secure channels to prevent eavesdropping, tampering, and message forgery.
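As a minimal sketch of the integrity and authentication part, the example below tags each message with an HMAC computed from a shared secret. The key, the message contents, and the helper names `sign` and `verify` are assumptions made for illustration. Confidentiality is not covered here and would additionally require encryption.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; a real system would agree on it via a key-exchange protocol.
SHARED_KEY = secrets.token_bytes(32)

def sign(message: bytes, key: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)

if __name__ == "__main__":
    msg = b"transfer 100 to account 42"
    tag = sign(msg, SHARED_KEY)
    print(verify(msg, tag, SHARED_KEY))                            # genuine message
    print(verify(b"transfer 900 to account 42", tag, SHARED_KEY))  # tampered message
```

A receiver that holds the same key rejects any message whose tag does not match, which covers both tampering and forgery by a party without the key.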

Authorization

Once a client has been authenticated, verifying whether it is permitted to perform the requested operation.

Threat Models

A threat model identifies security risks and vulnerabilities in a system.

Steps in Threat Modeling

Identify Security Objectives: Confidentiality, integrity, availability.
Decompose the Application: Analyze system components, data flow, and entry points.
Identify and Rank Threats: Evaluate risks based on severity.
Develop Countermeasures: Propose fixes like encryption or secure protocols.
Document Findings: Create a comprehensive report for stakeholders.
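The ranking step can be sketched as a simple scoring pass. The likelihood-times-impact score, the 1 to 5 scales, and the sample threats below are assumptions for illustration; the notes only say that threats are ranked by severity.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Assumed scoring rule: risk grows with both likelihood and impact.
        return self.likelihood * self.impact

def rank_threats(threats):
    """Return the threats ordered from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)

if __name__ == "__main__":
    sample = [
        Threat("eavesdropping on an unencrypted channel", 4, 4),
        Threat("forged client request", 2, 5),
        Threat("verbose error messages", 3, 1),
    ]
    for threat in rank_threats(sample):
        print(threat.risk, threat.name)
```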

Benefits: Early identification of vulnerabilities, saving cost and time.

Challenges: Time-consuming, requires skilled personnel.

Authentication and Authorization

Authentication: Verifies the identity of users (e.g., password, biometrics).

Authorization: Ensures the user has appropriate access rights after authentication.

Techniques

Authentication: Passwords, 2FA, biometrics.
Authorization: Role-Based Access Control (RBAC), where permissions are granted to roles and users acquire them through the roles they hold.
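A minimal RBAC check might look like the sketch below. The role names, permissions, and users are hypothetical, and the check assumes the caller has already been authenticated.

```python
# Hypothetical role-to-permission and user-to-role tables.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"viewer"},
}

def is_authorized(user: str, permission: str) -> bool:
    """Return True if any role held by the user grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)

if __name__ == "__main__":
    print(is_authorized("alice", "delete"))  # admin role grants delete
    print(is_authorized("bob", "write"))     # viewer role does not grant write
```

Unknown users and unknown roles fall through to an empty set, so access is denied by default.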

Encryption and Decryption

Encryption: Converts plaintext into ciphertext to ensure confidentiality.

Decryption: Converts ciphertext back to plaintext using a key.

Techniques

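Symmetric Key Encryption (e.g., AES): The same secret key encrypts and decrypts.
Public Key Encryption (e.g., RSA): A public key encrypts, and only the matching private key decrypts.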
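To make the public-key idea concrete, here is a toy RSA round trip with tiny primes. The primes, exponent, and message are chosen only for illustration. Real RSA uses keys of 2048 bits or more together with padding, so this sketch must not be used for actual security.

```python
# Toy RSA with tiny primes: illustrative only, trivially breakable.
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

def encrypt(m: int) -> int:
    """Encrypt an integer m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(c, d, n)

if __name__ == "__main__":
    ciphertext = encrypt(65)
    print(ciphertext, decrypt(ciphertext))
```

Anyone may encrypt with `(e, n)`, but recovering `d` requires factoring `n`, which is what makes large moduli necessary.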

Fault Tolerance in Distributed Systems

Fault tolerance ensures the system continues operating correctly despite failures.

Types of Faults

Transient Faults: Occur once and then disappear (e.g., a temporary network glitch).
Intermittent Faults: Appear, vanish, and reappear unpredictably (e.g., unstable hardware).
Permanent Faults: Persist until the faulty component is repaired or replaced (e.g., a burned-out chip).

Failure Models

Crash Failures: The system halts without warning, having worked correctly until that point.
Omission Failures: A request is never received or a response is never sent.
Timing Failures: Response time falls outside the allowed interval.
Response Failures: Incorrect output or unexpected state transitions.
Byzantine Failures: Arbitrary incorrect outputs at arbitrary times, whether malicious or random.

Fault Tolerance Techniques

Redundancy
- Information Redundancy: Error correction codes (e.g., Hamming codes).
- Time Redundancy: Retry operations after failures.
- Physical Redundancy: Extra hardware or software for backup.
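Time redundancy can be sketched as a retry loop with exponential backoff. The sketch assumes transient faults surface as `ConnectionError`. The helper names `retry` and `make_flaky` and the delay values are hypothetical.

```python
import time

def retry(operation, attempts=3, base_delay=0.01):
    """Call operation until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: treat the fault as non-transient
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying

def make_flaky(failures):
    """Build a test operation that fails a fixed number of times, then succeeds."""
    state = {"left": failures}
    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("temporary glitch")
        return "ok"
    return operation

if __name__ == "__main__":
    print(retry(make_flaky(2)))  # succeeds on the third attempt
```

Retrying only helps with transient and some intermittent faults; a permanent fault exhausts the attempts and the error is re-raised.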
Replication
Use multiple identical components (e.g., servers, processes) to handle failures.
- Primary-Backup: A primary handles requests and forwards state updates to its backups; a backup takes over if the primary fails.
- Active Replication: All replicas process every request simultaneously.
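Active replication is often paired with voting on the replicas' answers so that a minority of faulty replicas is masked. The sketch below assumes deterministic replicas modeled as plain functions. The function name `active_replicate` and the sample replicas are hypothetical.

```python
from collections import Counter

def active_replicate(replicas, request):
    """Send the request to every replica and return the majority answer."""
    answers = [replica(request) for replica in replicas]
    value, votes = Counter(answers).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return value

def correct_replica(x):
    return x * x

def faulty_replica(x):
    return -1  # simulated response failure

if __name__ == "__main__":
    replicas = [correct_replica, correct_replica, faulty_replica]
    print(active_replicate(replicas, 4))  # the faulty answer is outvoted
```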
Consensus Algorithms
- Paxos: A value is chosen once a majority of nodes accept it, which ensures agreement on a single value.
- Raft: Elects a leader that replicates a log of commands to the other nodes for distributed agreement.
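The sketch below is not an implementation of Paxos or Raft. It only illustrates the majority-quorum arithmetic both rely on: any two majorities share at least one node, so two conflicting decisions cannot both gather a majority. The function names are hypothetical.

```python
from itertools import combinations

def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

def tolerated_crashes(n: int) -> int:
    """Crash failures a majority-based protocol can survive with n nodes."""
    return (n - 1) // 2

def quorums_always_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a node."""
    q = majority_quorum(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority_quorum(n), tolerated_crashes(n), quorums_always_intersect(n))
```

Note that going from 3 to 4 nodes does not increase the number of tolerated crashes, which is why clusters usually have an odd size.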

Recovery Mechanisms

Backward Recovery: Restores a previous correct state using checkpoints.
Forward Recovery: Corrects errors without rolling back to a previous state.

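Example: Retransmitting lost packets is a backward recovery technique, because the sender returns to the earlier state in which the packet had not yet been sent. Reconstructing a missing packet from other packets that did arrive (erasure correction) is a forward recovery technique.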
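Backward recovery with checkpoints can be sketched as follows. The class name `CheckpointedStore` and its in-memory checkpoint are assumptions; a real system would write checkpoints to stable storage so they survive a crash.

```python
import copy

class CheckpointedStore:
    """Key-value state that can be rolled back to its last checkpoint."""

    def __init__(self):
        self.state = {}
        self._checkpoint = {}

    def checkpoint(self):
        # Save a deep copy so later updates cannot alter the checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Discard everything done since the last checkpoint.
        self.state = copy.deepcopy(self._checkpoint)

if __name__ == "__main__":
    store = CheckpointedStore()
    store.state["balance"] = 100
    store.checkpoint()
    store.state["balance"] = -999  # simulated erroneous update
    store.rollback()
    print(store.state)
```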

Distributed Mutual Exclusion

Ensures only one process accesses a critical section (CS) at a time.

Approaches

Token-Based: A unique token is passed between processes.
Non-Token-Based: Processes exchange messages to decide access.
Quorum-Based: Processes seek permissions from a subset of nodes.
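The token-based approach can be sketched as a single-machine simulation of a token ring. It assumes a fixed ring of `n` processes and no token loss. The names `TokenRing` and `simulate` are hypothetical.

```python
class TokenRing:
    """A ring of n processes sharing one token."""

    def __init__(self, n: int):
        self.n = n
        self.holder = 0  # process that currently holds the token

    def pass_token(self):
        # Hand the token to the next process in the ring.
        self.holder = (self.holder + 1) % self.n

def simulate(n: int, wanting):
    """Return the order in which the requesting processes enter the CS."""
    ring = TokenRing(n)
    pending = {p for p in wanting if 0 <= p < n}  # ignore invalid process ids
    order = []
    while pending:
        if ring.holder in pending:
            # Only the token holder may enter; it leaves before passing the token.
            order.append(ring.holder)
            pending.discard(ring.holder)
        ring.pass_token()
    return order

if __name__ == "__main__":
    print(simulate(4, {2, 0, 3}))
```

Because there is exactly one token, at most one process is in the critical section at any time, and every requester is served within one full trip around the ring.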
