Information Storage and Management: A Comprehensive Guide
1. Active-Active vs. Active-Passive
AA-Symm
Hosts can perform I/Os through any controller.
AP-VNX
Hosts can perform I/Os only through the controller that owns the LUN (MODULE 4)
7. Advantages of IP (5)
- Uses existing network
- Easily scalable
- Reduced hardware cost
- Uses existing security options
- Uses existing long-distance recovery solutions
3. Application
Software program that provides logic for computing operations (MODULE 2)
4. Application Virtualization
Presenting an application to an end-user without installation or dependency on the user’s computing platform (MODULE 2)
5. Archive Architecture Components (3)
- Agent: Installed software on the application server scans data to be archived and leaves a stub reference file (small)
- Server: Software on the server defines archiving policies
- Storage Device: Stores fixed content
6. Backup Architecture (3)
- Backup Client: Gathers data to be backed up and sends it to the storage node
- Backup Server: Manages backup operations and keeps a catalog of backups
- Storage Node: Writes data to the storage device and manages it
7. Backup in Virtualized Environment (2)
- Traditional: Agent is located on the VM or on the hypervisor, taking precious processing power away from the host system
- Image-Based: Agent is on the hypervisor, takes a snapshot of the whole system, and saves it as an image file on the proxy server
8. Backup Methods (3)
- Hot: Application is running, and users may access data during backup
- Cold: Requires application shutdown
- Bare-Metal: Full system start from scratch
9. Backup Operation (8 Steps)
- Backup server initiates backup
- Retrieves information from the catalog
- Backup server tells the node to tell storage to load the storage medium
- Backup server tells clients to send data to the device
- Clients send data
- Node sends data to storage
- Node sends metadata to the backup server
- Backup server updates the catalog
10. Backup Targets (3)
- Tape: Portable, sequential, shoe-shining effect
- Disk: Random access, reliable
- Virtual Tape: Tricks existing data center software into thinking that they’re still using tape, not disk
11. Backup Topologies (4)
- Direct-Attached: Application server/client/node are all on the same device connected over LAN to the backup server and FC to the backup device
- LAN-Based Backup: Everything is separate and connected by a LAN
- SAN-Based Backup: FC used for everything except for LAN for metadata from the client to the backup server
- Mixed-Backup: Combination of LAN and SAN topologies
12. Benefits of OSD (4)
- Security and reliability (object ID)
- Platform independence
- Scalability
- Manageability (e.g., ATMOS)
13. Beta Methods to Make FCoE Lossless (Converged Enhanced Ethernet) (4)
- Priority-Based Flow Control (PFC): Uses pause functionality
- Enhanced Transmission Selection (ETS): Intelligently allocates bandwidth
- Congestion Notification (CN): Identifies congestion and notifies the host
- Data Center Bridging Exchange Protocol (DCBX): New advanced Ethernet protocol
14. Big Data
Data sets beyond the capacity of common software tools (e.g., Greenplum analytics software) (MODULE 1)
15. Block-Level Storage Virtualization
Using a virtual appliance, groups LUNs into massive pools of portable, non-disruptive virtual LUN volumes (e.g., EMC VPLEX)
16. Block vs. File Level Access
- Block: File system created on the host, requests for raw data sent over the network
- File Level: Request sent over the network to a separate file system, which gets data quickly but is more expensive (MODULE 2)
17. Business Continuity
Process that prepares for, responds to, and recovers from system outages (downtime)
18. Business Continuity Lifecycle (5 Steps)
- Establishing objectives
- Analyzing
- Designing
- Implementing
- Training/Maintaining
19. Business Continuity Technology Solutions (3)
- Removing single points of failure with redundancy (failure of one component that terminates the availability of the entire system)
- Multipathing software (reroutes I/Os down different paths if one goes dead)
- Backup and replication
20. Business Impact Analysis
Identifies which business units are essential to the survival of the business and what a loss will cost a company
21. Cache
Large amount of volatile semiconductor memory. Retrieves data in nanoseconds (MODULE 4)
22. Cache Data Protection (2)
- Mirroring: Each write is sent to two memory locations
- Vaulting: Runs cache with battery until all memory is placed in vault drives (MODULE 4)
23. Cache Tiering
Uses DRAM for primary and FLASH drives for secondary cache memory
24. CAS Features (10)
- Content authenticity (uses binary blob to create ID)
- Location independence
- Single-instance storage (won’t copy the same object twice)
- Retention enforcement (for policy reasons)
- Data protection
- Faster retrieval
- Load balancing (multiple nodes)
- Scalability
- Self-diagnosis
- Audit-trail (e.g., CENTERA)
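The content-authenticity and single-instance features above can be sketched in a few lines. This is only an illustration: SHA-256 as the hash and an in-memory dictionary as the store are assumptions, and `ContentStore`, `put`, and `get` are hypothetical names, not a real CAS API.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the object ID is a hash of the content."""

    def __init__(self):
        self._objects = {}  # object ID -> bytes

    def put(self, data: bytes) -> str:
        # Derive the address from the content itself (SHA-256 is an assumption).
        object_id = hashlib.sha256(data).hexdigest()
        # Single-instance storage: identical content maps to the same ID,
        # so a second copy is never stored.
        self._objects.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._objects[object_id]
        # Content authenticity: recompute the hash and compare on retrieval.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("stored object does not match its address")
        return data

    def __len__(self):
        return len(self._objects)
```

Storing the same bytes twice returns the same ID and leaves one object in the store, which is the "won't copy the same object twice" behavior.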
25. Cloud Computing
Model for convenient, on-demand access to a shared pool of computing resources that requires minimal management effort
26. Cloud Deployment Models (4)
- Public: Open use by everybody
- Private: Operated solely for a single organization
- Community: Shared by several organizations with common concerns
- Hybrid: Some public/some private
27. COFW Principle vs. COFA Principle
COFW
Uses a bitmap that tracks changed blocks on the production FS – reads come from the production FS
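A rough sketch of the COFW idea, under stated assumptions: the production FS is modeled as a Python list of blocks, and a separate "save area" (an assumption, not named in the entry above) holds the original content of changed blocks. `CofwSnapshot` and its methods are hypothetical names.

```python
class CofwSnapshot:
    """Toy copy-on-first-write snapshot of a list of blocks."""

    def __init__(self, production):
        self.production = production              # live blocks, modified in place
        self.changed = [False] * len(production)  # bitmap of changed blocks
        self.save_area = {}                       # block index -> original content

    def write(self, index, data):
        # On the first write to a block, preserve the original before overwriting.
        if not self.changed[index]:
            self.save_area[index] = self.production[index]
            self.changed[index] = True
        self.production[index] = data

    def read_snapshot(self, index):
        # Unchanged blocks are read from production; changed ones from the save area.
        if self.changed[index]:
            return self.save_area[index]
        return self.production[index]
```

Only the first write to a block triggers a copy; later writes to the same block go straight to production.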
28. Common I/O Characteristics of an Application (3)
- Read vs. write-intensive
- Sequential vs. random
- I/O size (MODULE 2)
29. Components of an Object (4)
- Object ID (unique algorithm ensures data integrity forever)
- Data
- Metadata
- Attributes
30. Components of iSCSI (3)
- Initiator (like host-based iSCSI HBA)
- Target (array or gateway)
- Network (IP)
31. Components of LVM
- Physical volumes (individual disks)
- Volume groups (groups of physical disks)
- Logical volumes (disk partitions) (MODULE 2)
32. Components of NAS Head
- CPU/Memory
- NIC
- OS
- Protocols
- Ports
33. Components of SAN (5)
- Node ports
- Cables
- Connectors
- Interconnecting devices
- Management software
34. Components of Unified Storage (4)
- Storage controller
- NAS head
- OSD node
- Storage
35. Compute Hardware Components (3)
- CPU
- Memory
- I/O devices (MODULE 2)
36. Compute (Host)
Resource that runs applications (MODULE 2)
37. Compute Software Components (4)
- OS
- Device drivers
- File system
- Volume manager (MODULE 2)
38. Compute Virtualization
Running multiple OS’s on one host by masking hardware components (MODULE 2)
39. Concatenation
LVM groups several disks to appear as one large disk (MODULE 2)
40. Connectors (3)
- SC-Standard
- LC-Smaller, better
- ST-Fiber patch panels
41. Content Address Storage (CAS)
Archiving solution designed to store fixed content
42. Continuous Data Protection Components (3)
Occurs at the network layer. Three components: journal volume, CDP appliance, and write splitter (e.g., EMC RecoverPoint)
43. Core Elements of a Data Center
“All Data Centers Seem Nice”
- Application
- Database
- Compute
- Storage
- Network (MODULE 1)
44. DAS
Direct Attached Storage. Storage attached directly to the host (internal or external). Pros: Cheap. Cons: Bad scaling and no storage pooling (MODULE 2)
45. Data
Collection of raw facts (MODULE 1)
46. Data Archive
A place where fixed content is stored: online, nearline, or offline
47. Data Deduplication
Process of identifying and reducing redundant data
48. Data Migration Replication
Moves data between heterogeneous
49. Data Transfer Rate (Internal/External)
Average amount of data per unit time that the drive can deliver to the HBA. Internal: platter -> R/W head -> buffer. External: buffer -> interface -> HBA (MODULE 2)
50. DBMS
Database Management System. Organizes structured data (e.g., ORACLE, MYSQL). Application requests data, the database supplies it and tells the OS to get it from storage (MODULE 2)
51. Dedicated vs. Global Cache
- Dedicated: VNX separate memory for read and separate for writes
- Global: SYMM I/O from any memory location, great, only one set of addresses (MODULE 4)
52. Deduplication Implementations (2)
- Source-Based: Deduplicated at the source, less traffic over the network but increased overhead on the client
- Target-Based: Dedups on the target, offloading processing on the client but more network bandwidth used
53. Deduplication Methods (2)
- File Level or Single Instance: After one file is stored, all identical copies refer to it
- Subfile Level: More advanced, detects redundancy within files
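Subfile-level deduplication can be sketched as follows. Fixed-size chunks and SHA-256 are assumptions made for the illustration (real products may use variable-size chunks), and `dedupe` and `restore` are hypothetical helper names.

```python
import hashlib


def dedupe(files, chunk_size=4):
    """Store each unique fixed-size chunk once; return (chunk store, recipes)."""
    chunks = {}   # chunk hash -> chunk bytes
    recipes = {}  # file name -> ordered list of chunk hashes
    for name, data in files.items():
        recipe = []
        for start in range(0, len(data), chunk_size):
            piece = data[start:start + chunk_size]
            digest = hashlib.sha256(piece).hexdigest()
            chunks.setdefault(digest, piece)  # skip chunks already stored
            recipe.append(digest)
        recipes[name] = recipe
    return chunks, recipes


def restore(name, chunks, recipes):
    """Rebuild a file by concatenating its chunks in order."""
    return b"".join(chunks[d] for d in recipes[name])
```

Two files that share most of their content end up sharing most of their chunks, which file-level deduplication would miss because the files are not identical.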
54. Defense-in-Depth
Strategy of applying security controls at multiple layers, so that breaching one layer does not expose everything
55. Define Backup and Uses (3)
Additional copy of data with the sole purpose of recovering lost data
- Disaster recovery
- Operational recovery
- Archive
56. Define iSCSI
Encapsulates SCSI I/O into IP packets and transports them with TCP/IP
57. Define NAS
IP-based, high-speed, dedicated file sharing and storage device
58. Define RAID
It is a technique that combines multiple disk drives into a logical unit (RAID set) and provides protection, performance, or both (MODULE 3)
59. Define SAN
High-speed, dedicated network of servers and shared storage devices. 15 million devices per network
60. Describe a Hot Spare
A hot spare refers to a spare drive in a RAID array that temporarily replaces a failed disk drive by taking the identity of the failed disk drive (MODULE 3)
61. Describe Dependent Write I/O Principle
A database issues a write only after the write it depends on has completed; a replica is consistent only if it captures all dependent writes in the correct order
62. Describe FC-AL Connectivity Option
Nodes must share the network through a hub that supports up to 126 nodes or without a hub in a ring of devices
63. Describe FCIP Topology
Uses two parallel FCIP gateways that translate local FC data into IP packets for long-distance travel and convert them back into FC data at the other end
64. Describe FC-SW
Creates a logical space (fabric) in which all nodes communicate directly with each other through switches
65. Describe Flushing Host Buffers
Data buffered in host RAM must be flushed to disk before the replica is created, so that the replica is consistent
66. Desktop Virtualization
Allows the whole user desktop environment (OS, user settings, apps) to be managed centrally and available dynamically to all types of devices (MODULE 2)
67. Device Driver
Software that enables the OS to recognize a specific device (MODULE 2)
68. Difference Between Backup and Replication
Replication is an exact copy that is instantly mirrored; in case of disaster, it offers seamless business continuance. However, backups are advantageous because they offer point-in-time historical data retrieval before user/environmental corruption occurred (http://bit.ly/yAiibn)
69. Disaster Recovery vs. Disaster Restart
- Disaster Recovery: All about RESTORING systems to where they were before the disaster, think backups
- Disaster Restart: All about RESTARTING business operations with replication technology. Think of a McDonald’s burning down: disaster recovery is restoring the building to how it was before the fire, and disaster restart is the grand reopening where they start selling burgers again
70. Disk Buffered Remote Replication
Host writes data to the source, where it is locally replicated, which in turn is remotely replicated to another site, which is again locally replicated
71. Disk Drive Components (5)
- Platter
- Spindle
- R/W head
- Actuator arm
- Drive controller board (MODULE 2)
72. EMP
Enterprise Management Platform. It is a suite of applications that provides a great way to manage and monitor components
73. Essential Cloud Characteristics (5)
- On-demand self-service (need more devices/storage? Take them!)
- Broad network access (can use on several devices)
- Resource pooling
- Rapid elasticity (quickly scaled resources)
- Measured service (MR. ROB)
74. Fabric Services (4)
- Fabric login
- Name server
- Fabric controller (RSCN – “Hey all, node x joined the fabric, welcome him”)
- Management server
75. Fabric-Wide Access Control: Access Control Lists
Uses policy control, which specifies which HBAs and storage nodes can be connected to a particular switch
76. Fabric-Wide Access Control: Fabric Binding
Prevents an unauthorized switch from joining an existing switch in the network
77. Fabric-Wide Access Control: RBAC
Role-based access control enables the security admin to assign specific roles to different users
78. Factors for Digital Data Growth (4)
- Increased processing capabilities
- Low cost of storage
- Affordable communication technology
- Increasing number of smartphones and applications (MODULE 1)
79. Factors for Disk Drive Performance (3)
- Seek time
- Rotational latency
- Data transfer rate (MODULE 2)
80. Factors to Consider to Judge Vulnerability (3)
- Attack Surface: Refers to potential entry points
- Attack Vectors: Steps taken to launch an attack
- Work Factor: Refers to the amount of time and effort to exploit an attack vector
81. FC Addressing and Format
Assigned to nodes at fabric login, dynamic. Domain ID, Area ID, Port ID. 8 bits each. 239x256x256
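The 24-bit layout and the 239x256x256 figure can be checked with a short sketch. The helper names `pack_fc_address` and `unpack_fc_address` are hypothetical; the only facts used are the three 8-bit fields and the 239 usable domain IDs quoted above.

```python
MAX_DOMAINS = 239  # usable domain IDs, per the figure quoted above


def pack_fc_address(domain: int, area: int, port: int) -> int:
    """Combine three 8-bit fields into one 24-bit address."""
    for field in (domain, area, port):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (domain << 16) | (area << 8) | port


def unpack_fc_address(address: int):
    """Split a 24-bit address back into (domain, area, port)."""
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF


total_addresses = MAX_DOMAINS * 256 * 256  # 15,663,104
```

The product works out to 15,663,104, which is where the "15 million devices" figure for a SAN comes from.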
82. FC Exchange
Occurs in level 4. Composed of one or more sequences. The protocol occurs between two hosts in an exchange
83. FC Frame
Fundamental transfer unit at level 2 that typically transfers SCSI data
84. FC Interconnecting Options (3)
- Point-to-point
- FC-AL (arbitrated loop)
- FC-SW (switched fabric)
85. FCIP
Long-distance solution, often associated with disaster recovery. This protocol allows virtual FC links across distributed FC data islands
86. FCoE
FCoE is a protocol that translates FC data over Ethernet, reducing cost and management
87. FCoE Components (3)
- Converged Network Adapter (CNA): Combination of FC HBA and standard NIC on the same card
- Cable: Copper and fiber optic
- FCoE Switch: Has Ethernet and FC switch capabilities
88. FC Protocol Stack (5)
- 0 – Defines the physical interface
- 1 – Defines how data is encoded
- 2 – Defines the structure of frames, routing, flow control, etc.
- 4 – Defines the protocols
89. FC SAN Topologies (4)
- Full Mesh: Each switch is connected to every other switch; only one hop needed
- Partial Mesh: Not all switches are connected; multiple hops required
- Core-Edge Single: Switches attach to the director, which attaches to storage
- Dual Core-Edge: Same as single but with interconnecting directors
90. FC Sequence
Contiguous set of frames sent from one port to another
91. File
Collection of records/data stored as one unit (MODULE 2)
92. File-Level Virtualization
Virtualized appliance allowing a simple, non-disruptive, file-mobility solution by allowing logical paths to files instead of physical paths
93. File System
Hierarchical storage of files (e.g., FAT32, NTFS for Windows and UFS for Unix) (MODULE 2)
94. Fixed Content
Data that doesn’t change, at the end of its lifecycle (e.g., bank checks, x-rays)
95. Four Pillars of Multitenancy (4)
- Secure separation
- Service assurance
- Availability
- Management
96. Front End
Provides an interface between the host and storage system (MODULE 4)
97. Hardware vs. Software RAID Implementation
Software RAID uses host-based software and sacrifices CPU cycles, while hardware RAID uses a controller on the host or array (MODULE 3)
98. Host-Based Remote Replication Methods (2)
- LVM-Based Replication: LVM sends data to two different sites
- Log Shipping: Database-related – everything is sent to a log (buffer) and periodically sent to the target
99. Host-Based Replication Methods (2)
- LVM-Based Mirroring: LVM writes each host write to two different physical locations
- File system snapshot
100. Hypervisor
Sits between VMs and hardware (MODULE 2)
101. ILM
Information Lifecycle Management, a proactive strategy to lower costs. Information is less valuable tomorrow than it is today
102. Information
Knowledge derived from data (MODULE 1)
103. Information Availability
Ability of IT infrastructure to function according to business expectations, during a specified time of operation. Uptime / (uptime + downtime)
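A small worked example of the uptime ratio, plus the downtime budget that a given availability target implies. The function names are hypothetical, and a 365-day year is assumed.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Uptime / (uptime + downtime), as a fraction between 0 and 1."""
    return uptime_hours / (uptime_hours + downtime_hours)


def allowed_downtime_minutes_per_year(target: float) -> float:
    """Minutes of downtime per 365-day year that still meet the target."""
    minutes_per_year = 365 * 24 * 60
    return (1 - target) * minutes_per_year


# A "five nines" target leaves a little over five minutes of downtime per year.
five_nines_budget = allowed_downtime_minutes_per_year(0.99999)
```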
104. Information Availability Defined by (3)
- Accessibility: Accessible to the user when required
- Reliability: Reliable and correct
- Timeliness: Window when it should be available
105. Information-Centric Storage
Environment where storage devices are managed centrally and shared across multiple servers (MODULE 1)
106. Information Security Framework (4)
- Confidentiality: Only authorized users have access
- Integrity: Information remains unaltered
- Availability: Users have reliable and timely access to systems
- Accountability: Log of events that ensure every process has an owner
107. Intelligent Storage System
RAID array with highly optimized I/O processing capabilities (MODULE 4)
108. Interconnecting Devices (3)
- Hubs
- Switches
- Directors
109. I/O Controller Percentage
Basically says that you shouldn’t use a controller past 70% of capacity, or you’ll lose performance (response time) (MODULE 2)
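110. IOPS Problem Steps (2)
- Disks needed for capacity: total capacity required / capacity of a single disk
- Disks needed for performance: peak IOPS the application needs / IOPS of a single disk
- Disk service time = seek time + (0.5 / (disk rpm / 60)) + (data block size / data transfer rate); IOPS of disk = 1 / disk service time. Use the larger of the two disk counts (MODULE 2)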
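A worked sketch of the sizing steps, under these assumptions: disk service time is seek time plus half a rotation plus transfer time, disk IOPS is the reciprocal of service time, and the disk is run at the 70% utilization ceiling from the I/O Controller Percentage entry. Function and parameter names are hypothetical, and the sample numbers are made up.

```python
import math


def disk_service_time(seek_s, rpm, block_kb, transfer_mb_s):
    """Seconds to service one I/O: seek + rotational latency + transfer."""
    rotational_latency = 0.5 / (rpm / 60)            # half a rotation, in seconds
    transfer_time = (block_kb / 1024) / transfer_mb_s
    return seek_s + rotational_latency + transfer_time


def disks_required(total_capacity_gb, disk_capacity_gb, peak_iops,
                   seek_s, rpm, block_kb, transfer_mb_s, utilization=0.7):
    """Return the larger of the capacity-based and IOPS-based disk counts."""
    service_time = disk_service_time(seek_s, rpm, block_kb, transfer_mb_s)
    iops_per_disk = utilization / service_time
    for_capacity = math.ceil(total_capacity_gb / disk_capacity_gb)
    for_performance = math.ceil(peak_iops / iops_per_disk)
    return max(for_capacity, for_performance)


# Made-up workload: 1,464 GB on 146 GB disks, 4,900 peak IOPS,
# 5 ms seek, 15,000 rpm, 4 KB blocks, 40 MB/s transfer rate.
example = disks_required(1464, 146, 4900, 0.005, 15000, 4, 40)
```

In this example capacity alone would need 11 disks, but performance needs 50, so the answer is 50.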
111. IP Protocols (2)
- iSCSI
- FCIP
112. iSCSI Security Implementations (2)
- CHAP: Challenge Handshake Authentication Protocol uses a hash
- iSNS: Similar to FC zoning, devices must be configured in the same domain
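The CHAP hash step can be sketched as below. It follows the response calculation in RFC 1994 (MD5 over the identifier, the shared secret, and the challenge); the function names are hypothetical, and this is not how any particular iSCSI stack exposes it.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """One-way hash over identifier, shared secret, and challenge (MD5)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def authenticator_verifies(identifier: int, secret: bytes,
                           challenge: bytes, response: bytes) -> bool:
    """The authenticator recomputes the hash with its own copy of the secret."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# The secret itself never crosses the network, only the challenge and the hash.
challenge = os.urandom(16)
response = chap_response(1, b"shared-secret", challenge)
```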
113. iSCSI Discovery Methods (2)
- SendTargets Discovery: Manually give the initiator the target’s information
- Internet Storage Naming Service (iSNS): Uses host-based software that auto-registers all targets
114. iSCSI Host Connectivity Options (3)
- Standard NIC with iSCSI software
- TCP Offload Engine (TOE): Like a NIC but offloads TCP processing to the card
- iSCSI HBA: Offloads all processing to the HBA
115. iSCSI Name (2)
Unique ID used to identify initiators and targets
- iqn: Uses the organization’s registered domain name
- eui: Burned-in global unique ID
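A quick way to tell the two name formats apart, as a sketch only. The patterns assume the common layouts `iqn.<yyyy-mm>.<reversed domain>[:<identifier>]` and `eui.<16 hex digits>`; they do not check that the date is valid, and `iscsi_name_type` is a hypothetical helper.

```python
import re

# iqn.<yyyy-mm>.<reversed domain>[:<optional identifier>]
IQN_PATTERN = re.compile(
    r"^iqn\.(\d{4})-(\d{2})\.([a-z0-9][a-z0-9.\-]*)(?::(.+))?$"
)
# eui.<16 hexadecimal digits>
EUI_PATTERN = re.compile(r"^eui\.[0-9A-Fa-f]{16}$")


def iscsi_name_type(name: str) -> str:
    """Classify a name as 'iqn', 'eui', or 'invalid'."""
    if IQN_PATTERN.match(name):
        return "iqn"
    if EUI_PATTERN.match(name):
        return "eui"
    return "invalid"
```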
116. iSCSI Topologies (3)
- Native: All components run on an IP-based network
- Bridged: Uses an iSCSI gateway to translate IP data to FC data
- Combination of native and bridged using an iSCSI and FC port on the array
117. ISL
Interswitch Link. Connection between switches. E-PORT
118. Key Characteristics of a Data Center
“PC MI ASS!”
- Performance
- Capacity
- Manageability
- Integrity
- Availability
- Security
- Scalability (MODULE 1)
119. Key Data Center Management Activities
- Monitoring
- Reporting
- Provisioning (MODULE 1)
120. Key Storage Components for Monitoring (3)
- Servers
- Networks
- Storage arrays
121. Local Replication in a Virtual Environment (2)
- Mirroring at the virtual volume
- Replication of virtual machines (clone & snapshot)
122. Local Replication Technologies (3)
- Host-based
- Network-based
- Storage array-based
123. Logical Volume Manager
Sits between the file system and physical disks. It can divide disks (partition) or join disks to appear bigger to the file system (concatenation)
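Concatenation can be pictured as a lookup from a logical block number to a physical disk and offset. The sketch below assumes disks are laid end to end in list order; `locate_block` is a hypothetical name and the sizes are arbitrary block counts.

```python
def locate_block(logical_block: int, disk_sizes):
    """Map a logical block to (disk index, block offset) on concatenated disks."""
    if logical_block < 0:
        raise IndexError("logical block must be non-negative")
    remaining = logical_block
    for disk_index, size in enumerate(disk_sizes):
        if remaining < size:
            return disk_index, remaining
        remaining -= size  # skip past this disk and keep looking
    raise IndexError("logical block is beyond the end of the volume")
```

With disks of 100, 50, and 200 blocks, the file system sees one 350-block volume; logical block 100 is the first block of the second disk.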
124. Login Types in Switched Fabric (3)
- FLOGI: Between N&F “I would love to enter the fabric, may I? Yes, you may”
- PLOGI: Between N&N “May I introduce myself to you, fellow node?”
- PRLI: Between N&N “Let’s speak SCSI”
125. LRU vs. MRU
- LRU: Least Recently Used – ditches data that hasn’t been accessed for a long time (common)
- MRU: Most Recently Used – discards the most recently used data (MODULE 4)
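LRU eviction can be sketched with an ordered dictionary. This is a toy model, not how an array's cache is actually built; `LruCache` is a hypothetical name and the capacity is counted in entries rather than pages.

```python
from collections import OrderedDict


class LruCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._pages = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self._pages:
            return None  # miss
        self._pages.move_to_end(key)  # mark as most recently used
        return self._pages[key]

    def put(self, key, value):
        if key in self._pages:
            self._pages.move_to_end(key)
        self._pages[key] = value
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used
```

An MRU policy would instead discard the entry that was touched most recently.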
126. LUN Masking
Occurs on the storage array. Decides which LUNs will be seen by which hosts (MODULE 4)
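LUN masking amounts to a lookup table kept on the array. In the sketch below, identifying hosts by initiator WWN is an assumption, the WWNs and LUN numbers are made up, and `visible_luns` is a hypothetical helper.

```python
# Hypothetical masking table: host initiator WWN -> LUN IDs it may see.
masking_table = {
    "10:00:00:00:c9:aa:bb:01": {0, 1},
    "10:00:00:00:c9:aa:bb:02": {2},
}


def visible_luns(host_wwn: str, table=masking_table):
    """Return the LUNs presented to a host; unknown hosts see nothing."""
    return sorted(table.get(host_wwn, set()))
```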
127. Magnetic Tape
Low-cost, sequential data solution for long-term storage (archiving) (MODULE 2)
128. Memory Virtualization
Fools the application into thinking there is more memory available than there is by using disk space and a swap-file process. Inactive memory is moved to disk and, when needed, swapped back to RAM (memory) (MODULE 2)
129. MetaLUN (2)
Is a method to expand LUNs.
- Concatenated: Provides capacity but no additional performance
- Striped: Adds capacity and performance (MODULE 4)
130. MTBF
Mean Time Between Failure. Calculation for a RAID set: MTBF of a single drive / # of drives (MODULE 3)
131. MTBF
Average uptime or time between failures. Total uptime / # of failures. The availability goal is 99.999% per year
132. MTTR
Average time to repair an outage. Total downtime / # of failures
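The two averages can be computed from simple totals, as in this sketch. The function names and the sample figures are made up; the last helper uses the standard identity availability = MTBF / (MTBF + MTTR).

```python
def mtbf(total_uptime_hours: float, failures: int) -> float:
    """Mean time between failures: total uptime / number of failures."""
    return total_uptime_hours / failures


def mttr(total_downtime_hours: float, failures: int) -> float:
    """Mean time to repair: total downtime / number of failures."""
    return total_downtime_hours / failures


def availability_from(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability expressed through the two averages."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Made-up year: 8,750 hours up, 10 hours down, 5 failures.
sample_mtbf = mtbf(8750, 5)   # 1750.0 hours
sample_mttr = mttr(10, 5)     # 2.0 hours
```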
133. Multitenancy
Multiple tenants using the same set of storage resources
134. Multi vs. Single-Mode Fiber Cables
- Multi-Mode: Only covers up to 500m because of modal dispersion
- Single-Mode: Up to 10km
135. NAS Backup Implementations (4)
- Server-Based: Bad – data is sent over LAN, overloading the network
- Serverless: Eliminates the client, one less hop, still over LAN
- NDMP 2-Way: Only metadata is sent over LAN, all else is sent off the network, good!
- NDMP 3-Way: Common private LAN for backup data. All implementations use NAS heads
136. NAS Device Components
- NAS head
- Storage
137. NAS File Sharing Protocols (2)
- CIFS: Stateful (auto-restores if connectivity is lost)
- NFS: Unix-based
138. NAS Implementations (3)
- Unified: Single storage platform for block/file, easy management
- Gateway NAS: Uses external/independent storage with a separate Gateway NAS head (allows FC in the back-end)
- Scale-Out: Isilon, pools multiple nodes together into a cluster for an easily scalable NAS device that pools resources
139. Object-Based Data
Unstructured data that’s at the end of its lifecycle stored as objects in flat-addressed space based on content, not address. Accessed over IP using REST and SOAP (XML-based)
140. Operating System
Software between applications and hardware responsible for controlling the entire environment (MODULE 2)
141. OSD System Components (3)
- OSD nodes: Servers on the device that assign and store the object ID (metadata service) and map storage locations (storage service) using an OS
- Internal network
- Storage
142. Parameters Managed (5)
- Capacity
- Reporting
- Accessibility
- Performance
- Security
143. Parameters Monitored (4)
- Capacity
- Accessibility
- Performance
- Security
144. Partitioning
LVM divides one disk into several logical disks (MODULE 2)
145. PCIe Card
Caching done on the host that uses smart technology (i.e., data used by the application frequently is kept on the host) (MODULE 4)
146. Physical Components of Connectivity (3)
- Host adapter (HBA or NIC)
- Port
- Cable (MODULE 2)
147. Physical Disk Structure
Platter -> tracks -> sectors (512 bytes) (MODULE 2)
148. Point-in-Time vs. Continuous
Point-in-time is timestamped like a backup, whereas continuous is in-sync with production data at all times
149. Point-to-Point Connectivity
Direct connection between nodes, used in a DAS environment
150. Port Types (4)
- N-Port: End of fabric, connects to node devices
- E-Port: Between two switches
- F-Port: On the switch that connects to the N-Port
- G-Port: Generic port on the back of the switch
151. Primary Cloud Service Models (3)
- Software-as-a-Service: Consumers use the provider’s applications on the cloud infrastructure
- Platform-as-a-Service: Consumer-created apps are housed on the provider’s OS and infrastructure
- Infrastructure-as-a-Service: Consumers deploy software, OS, and applications on the provider’s infrastructure (e.g., EC2)
152. Profiling
In a server-type backup, the process of taking a snapshot of application server CONFIGURATIONS
153. Protocol
Defined format for device communication done through controllers (e.g., IDE, ATA, SCSI) (MODULE 2)
154. RAID Array
Enclosure that contains physical and logical (RAID sets) drives with a RAID controller (MODULE 3)
155. Read Hit vs. Read Miss
- Read Hit: Host issues a request, the processor finds it in the cache
- Read Miss: Host issues a request, the processor gets it from disks, copies it to the cache, and sends it to the host (MODULE 4)
156. Remote Replication
Process of creating replicas at remote sites