NoSQL and MongoDB: Scalable Database Solutions
Week 7: NoSQL
What is Big Data?
Scale – Dimensions
- Workload – Number of concurrent sessions and operations
- Operation mix (create, read, update, delete)
- Read – query mix
- Generally, each system use case represents a distinct workload
- Data Sets – Volume, Velocity, Variety
- Number of records
- Record size
- Record structure (e.g., sparse records)
- Homogeneity/heterogeneity of structure/schema
- Elasticity
- Runtime peaks and valleys – how frequently, how quickly, how much
Scalability
“Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.”
- Scale out: adding more nodes to a system
- Scale up: adding more resources to a single node
- Not considering temporal aspects
- how fast, how often and what granularity scaling actions can be performed
Characterizing Scalability
- Scalability = Dependability at Scale
- Criteria is often “cost increases linearly as <x> increases”
- Scalability is a composite quality that includes:
- Throughput
- Latency
- Availability
- Consistency
- Security
- and more
- Need to define this for each system
- The challenge is that the qualities are not independent – requirements must account for tradeoffs as system scales
- Quality Attribute Scenarios are one useful tool
- Scalability = Dependability at Scale
- Criteria is often “cost increases linearly as <x> increases”
- Scalability is a composite quality that includes:
- Throughput
- Latency
- Availability
- Consistency
- Security
- and more
- Need to define this for each system
- The challenge is that the qualities are not independent – requirements must account for tradeoffs as system scales
- Quality Attribute Scenarios are one useful tool
“Vertical” Scaling
- Scale up – “big iron”
- Monolithic compute resource
- Shared disk
- More load = bigger processor and disk
- Partitioned databases on disk
- Optimizes the data placement on separate files or disks
- Monolithic compute resource
- Separate database instances
- Partition database across database engine instance
- Functional partitioning common (e.g., customers, orders, stock)
- More compute, more license costs
“Horizontal” Scaling
- Spread data and workload across large clusters of commodity-class hardware
- For example, data sharding
- Horizontal data partitioning
- Many possible partitioning schemes, e.g. value based (region, customer ID) or arbitrary (hashing)
- Issues:
- Evenly distributing read load, write load, and data volume
- Handling shard failures
Sharding
Sharding is horizontal scaling that divides the data set and distributes the data over multiple servers, or shards.
Each shard is an independent database, and collectively, the shards make up a single logical database.
Horizontal Scaling Distributes Data
- Distributed systems theory is hard but well-established
- Lamport’s “Time, clocks and ordering of events” (1978), “Byzantine generals” (1982), and “Part-time parliament” (1990)
- Gray’s “Notes on database operating systems” (1978)
- Lynch’s “Distributed algorithms” (1996, 906 pages)
- Implementing the theory is hard, but possible
- Google’s “Paxos made live” (2007)
- Introduces fundamental tradeoff among “CAP” qualities
- Consistency, Availability, Partition tolerance (see Brewer)
- “When Partition occurs, tradeoff Availability against Consistency, Else tradeoff Latency against Consistency” (PACELC, see Abadi)
NoSQL – Horizontally-Scalable Database Technology
- NoSQL – Originally implied “No SQL”, but stance has moderated over time
- Designed to scale horizontally and provide high performance for a particular type of problem
- Most originated to solve a particular system problem/use case
- Later were generalized (somewhat) and many are available as open-source packages
NoSQL Databases – Common Characteristics
- No relational schema
- De-normalized data models (application manages redundancy)
- No join queries (use an aggregation framework like MapReduce instead)
- Dynamic schema – no enforcement by database, all enforcement in application
- Simple physical data models – less mismatch between programming language constructs and database constructs
- Built-in, automatic distribution
- sharding (partitioning across nodes)
- replication (copying across nodes)
- Application controls durability and consistency at the operation level
- Coordinate write and read settings to achieve desired quality
Four Types of NoSQL
Type 1: Document Store
- Key-value store where “value” is a document (XML, JSON, BSON, …)
- Examples – MongoDB, Couchbase
“id”: “1” “Name”: “John” “Employer”: “SEI” ; “id”: “2” “Name”: “Ian” “Employer”: “SEI” “Previous”: “PNNL”
Type 2: Key-Value Store
- Simple hash table – associates unique key with a piece of un-typed data
- Examples – Dynamo, riak, Redis, Oracle NoSQL
“key”: “1” value { “Name”: “John” “Employer”: “SEI”}; “key”: “2” value {“Name”: “Ian” “Employer”: “SEI” “Previous”: “PNNL”}
Type 3: Column Store
- Rows store multiple key-value pairs, indexed and sorted by key
- Examples – Cassandra, Accumulo, HBase, Amazon SimpleDB
“row”: “1” , “Name” “Employer”
“John” “SEI”
“row”: “2” “Name” “Employer” “Previous”
“Ian” “SEI” “PNNL”
Type 4: Graph Store
- Records represent nodes and edges/links – properties attached to each store the data
- Example – Neo4J
MongoDB
- MongoDB is an open source project mainly driven by the company 10gen Inc
- Features:
- Document Data Model and a relatively rich query language,
- Data partitioning by sharding,
- Leader – worker mode of replication,
- No data versioning
- Prominent users:
- SourceForge.net,
- Foursquare,
- New York Times
Data Model
- A MongoDB hosts a number of databases
- A database is a physical data container of a set of collections
- A collection contains a set of documents
- A document contains a set of key-value (aka field_name/field_value) pairs {field1: value1,…, fieldn: valuen}
MongoDB Documents
- The value can be:
- Any of the BSON (Binary JSON) data types,
- Other documents,
- Arrays, and
- Arrays of documents
- Field names are strings
- The field {name}_id is reserved for the document’s primary key that is indexed
- Unique in the collection
- Immutable, and
- Any type except an array
MongoDB Documents
- The basic unit of data
- Analogous to a JSON object ;
- Analogous to structures in programming languages that associate keys with values
- dictionaries, hashes, maps, and associative arrays
A Document Example
{
class_id: ObjectId (“509980df3”),
course: {code: “SWEN432”,
title: “Advanced DB”},
year: 2014,
students: [“Matt”, “Jack”,…, “Lingshu”],
no_of_st: 11
}
- The document contains values of varying types
- The primary key is class_id and it is of the ObjectId type,
- The field course is a subdocument, and
- students is an array of strings
MongoDB Documents
Documents have a dynamic schema:
- Documents in the same collection do not need to have the same structure
- Common fields belonging to different documents may have different data types
- However, it is a common practice that documents of a collection have a similar structure
MongoDB Collections
Collections are analogous to a table in relational databases.
CRUD Operations
The acronym CRUD stands for:
- Create (insert),
- Read,
- Update, and
- Delete
- MongoDB CRUD operations target a specific collection
- In addition to the basic queries, MongoDb also supports aggregation
Write Operations
- There are three classes of write operations in MongoDB:
- Insert that adds a new document to a collection,
- Update that modifies an existing document, and
- Remove that deletes an existing document from a collection.
- The update and remove operations allow specifying criteria or conditions that identify documents (to be modified, or removed)
Insert Operation
The following operation inserts a new document into the collection courses:
db.courses.insert (
{code: “SWEN432”,
title: “Advanced Database Design and
Implementation”,
trimester: 1,
year: 2014}
)
- The inserted document is going to have five fields: code, title, trimester, year, and _id
- MongoDB adds the field _id automatically and populate it with a unique ObjectId (unless specified by the application)
Update and Delete Operations
The following operation updates a document that satisfies the update criteria {code: “SWEN432”}:
db.courses.update (
{code: “SWEN432”},
{set: {year: 2015}}
)
The following operation deletes multiple documents that satisfy the remove criteria
{year: {$lt:
2014}}:
db.courses.remove (
{year: {$lt: 2014}}
)
Read Operation
- MongoDB provides a db.collection.find() method
- The method accepts:
- Query criteria (conditions),
- Projection list,
- Modifiers (sort, limit, skip), and
- Returns a cursor to the matching documents
Index
Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form. Ascending: -1; Descending: 1
Projections
- Queries return all fields in all matching documents by default
- Projections are defined in the second argument of find() (the first is the query selector)
- Projections may either specify:
- A list of fields to return (designated by {<field_name>: 1}), or
- A list of fields to exclude (designated by {<field_name>: 0}) in the result,
- Only the exclusion of _id field can be mixed with fields to return
- Examples:
db.courses.find({no_of_st: { $gt: 8 }}
{title: 1, _id: 0})
db.courses.find({no_of_st: { $gt: 8 }} {title: 0})
Data Modeling
- The key challenge in data modeling is balancing:
- The needs of the application,
- The performance characteristics of the database engine, and
- The data retrieval patterns
- The key decision in structuring a document is how to represent relationships between data:
- Embedding or
- Referencing
Embedded Relationships
- By embedding, related data is stored using sub- documents within a document
- This way related data is going to be stored in the same place
- Embedded relationship schemes are also said to be denormalized
- Recommended for use in case of:
- “Contains” relationship between entities,
- One-to-one, or one-to-many relationships between entities
Relationships by Referencing
- Implementing relationships by referencing is also called the normalized data model:
- Subdocuments are implemented like individual documents, and
- Relationships are implemented by linking subdocuments from the “parent” document using primary keys of the subdocuments
Relationships by Referencing
- Recommended to use:
- When gain in read performance does not compensate losses because of data duplication caused by embedding,
- To represent a many-to-many relationship, and
- To build large hierarchical data sets
- Disadvantage:
- A query asking for data from the parent and children documents has to be implemented using a sequence of queries
Sharding
- In MongoDB, data sharding is a synonym for horizontal data partitioning
- Sharding is performed on the level of a collection
- A collection is divided into sub collections called shards
- Each shard is an independent database stored on a cluster of servers that is also called shard
- All shards make up a logical database
- Example:
- Assume a collection contains 1 TB of data
- It can be divided into four shards having 256 GB each
Sharded Cluster
- Sharded clusters implement sharding.
- Shards are MongoDB instances that hold a subset of a collection’s data. Each shard is either a single mongod instance or a replica set.
- Config Servers are mongod instances that hold metadata about the cluster. The metadata maps chunks to shards.
- Router is a mongod instance that routes reads and writes from apps to the shards. Applications do not access the shard directly.
Shard Key
- Sharding partitions a collection’s data by the shard key
- A shard key is either an indexed field or an indexed composite field that exists in every document in the collection
- MongoDB divides the shard key value range into chunks and distributes the chunks evenly across the shards
- Dividing shard key values into chunks is done either by:
- Range based partitioning or
- Hash based partitioning
Range Based Sharding
- Range based sharding:
- The space of shard key values is divided into non overlapping ranges called chuks
- Each chunk is stored on a shard cluster
- Documents having close shard key values are likely to be stored on the same shard
- Favors range queries by shard key
- May result in an uneven distribution of data and the work load
Hash Based Sharding
- Hash based sharding:
- The space of hashes of shard key values is divided into non overlapping chuks
- Documents having close shard key values are likely to be stored on different shards
- Favors an even distribution of data and the work load
Balanced Data Distribution
- The addition of new data or servers can result in data distribution imbalances
- A shard can contain significantly more chunks than another, or
- Size of a chunk may become significantly greater then another
- MongoDB uses two background processes to keep a cluster balanced:
- Splitting and
- Balancer
Splitting
When a chunk grows beyond a specified size, MongoDB:
- Splits the chunk into two halves,
- Does not migrate any data (the shard contains more chunks now)
Balancing
When the distribution of chunks in a sharded collection in a cluster becomes uneven, a balancer that runs in all routers:
- Migrates whole chunks from the shard with the largest number of chunks to the shard with the least number of chunks,
- The origin shard keeps the chunk and answers queries until the chunk has been successfully transferred to the destination shard
- Only then, MongoDB deletes the chunk from the origin shard
Adding and Removing Shards
- When a new shard joins a cluster it creates imbalance, since the new shard has no chunks
- A balancer starts migrating data immediately
- When a shard needs to be removed from a cluster
- A balancer migrates all of its chunks to the other shards,
- Updates the meta data on config servers appropriately, and
- Only then, the shard can be removed
Replication
- In MongoDB, replication is implemented on the level of a shard
- A replica set contains replicas of a shard
- A replica set is a cluster of MongoDB servers
- There is a primary daemon process called mongod running on each server,
- mongod handles data requests, manages data format, and performs background management operations
- Comment on the term daemon:
- In multitasking computer programming systems, a daemon] is a program that runs as a background process, rather than being under the direct control of an interactive user
- Traditionally daemon names end with the letter d:
Leader and Worker
- One mongod – the primary (or leader):
- Accepts all write request from clients,
- Updates its data set, and
- Records update operations in its operation log
- All other servers of the replica set – secondaries:
- Receive operations from the primary’s operation logs Apply operations on their data sets
- If clients read from the primary, the replica set provides a strict consistency
- If clients are allowed to read from secondaries, the replica set provides an eventual consistency
Distributed Queries
- Applications issue operations to one of mongos instances of a sharded cluster
- Read operations are most efficient when a query includes the collections shard key
- Otherwise the mongos must direct the query to all shards in the cluster (scatter gather query) and that might be inefficient
- By default, MongoDB always reads data from a replica set’s primary
- That behavior can be modified by changing the preference mode:
- To balance the work load,
- To allow reads during failover, but
- Eventual consistency can be guaranteed, only
Write Operations on Sharded Clusters
For sharded collections in a sharded cluster, the mongos directs write operations from applications to shards that are responsible for the portion of the data set using the sharding key value
The mongos gets needed metadata information from the config database residing on config servers
Write Operations on Replica Sets
- In replica sets, all write operations go to the set’s primary
- The primary applies the write operations and then records the operations on its operation log (oplog)
- Oplog is a reproducible sequence of operations to the data set
- Secondary members of the set continuously replicate the oplog by applying operations to themselves in an asynchronous process
Failover
- Replica set members send heartbeats (pings) to each other every two seconds
- If a heartbeat does not return within 10 seconds, the other members mark the not responding node inaccessible
- If a secondary becomes inaccessible, the replica set can continue to function without it
- If the primary becomes inaccessible, the remaining members of the replica set have to elect a new primary by voting
- The replica set can not accept any write request if there is no primary
- The first secondary that gets majority of votes becomes a new primary and the replica set resumes the normal mode of operation.
Leader Election
Problem Description In a cloud-based system that implements horizontal scaling, multiple instances of the same task could be running simultaneously with each instance servicing a different user. If these instances write to a shared resource, it may be necessary to coordinate their actions to prevent each instance from blindly overwriting the changes made by the others If the tasks are performing individual elements of a complex calculation in parallel, the results will need to be aggregated when they all complete.
Leader Election
- Solution Selecting the task instance with the lowest-ranked instance or process ID. Racing to obtain a shared distributed mutex. The first task instance that acquires the mutex is the leader. However, the system must ensure that, if the leader terminates or becomes disconnected from the rest of the system, the mutex is released to allow another task instance to become the leader. Implementing one of the common leader election algorithms such as the Bully Algorithm or the Ring Algorithm. These algorithms assume that each candidate participating in the election has a unique ID, and that they can communicate with the other candidates in a reliable manner.
Example Token Ring Election Algorithm
1. We start with 6 processes, connected in a logical ring. Process 6 is the leader, as it has the highest number.
2. Process 6 fails.
3. Process 3 notices that Process 6 does not respond. So it starts an election, sending a message containing its id to the next node in the ring.
4. Process 5 passes the message on, adding its own id to the message.
5. Process 0 passes the message on, adding its own id to the message.
6. Process 1 passes the message on, adding its own id to the message.
7. Process 4 passes the message on, adding its own id to the message.
8. When Process 3 receives the message back, it knows the message has gone around the ring, as its own id is in the list. Picking the highest id in the list, it starts the coordinator message “5 is the leader” around the ring.
9. Process 5 passes on the coordinator message.
10. Process 0 passes on the coordinator message.
11. Process 1 passes on the coordinator message.
12. Process 4 passes on the coordinator message.
13. Process 3 receives the coordinator message, and stops it.
Summary
- MongoDB is a popular free source CDBMS with a number of prominent users
- Data model: document oriented
- A database is a container for a number of document collections
- Documents of a collection may have but don’t have to have the same structure
- A document is a set of key-value pairs represented as a JSON structure
- The main issue with data modeling is how to represent relationships between entities
- Embedded relationships
- Relationships by references
- Data partitioning: sharding (horizontal partitioning) by
a shard key, contained in all documents of a collection - Data replication: performed on the level of a shard on the primary leader-worker approach
- Leader is an instance of MongoDB that performs all writes and propagates them to workers
- If all reads also go to leader – strict consistency, otherwise – eventual consistency
- Leader is a single point of failure
- If leader fails, a worker is elected as a new leader
- Query: query selectors, projections, sorting, limit,…
- aggregation operations is supported