MongoDB
MongoDB is a document-oriented database,
A document-oriented database replaces the concept of a “row” with a more flexible model, the “document.”
By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
There are also no predefined schemas: a document’s keys and values are not of fixed types or sizes.
MongoDB’s document-oriented data model makes it easier for it to split up data across multiple servers (scaling out).
MongoDB automatically takes care of balancing data and load across a cluster, redistributing documents automatically and routing user requests to the correct machines.
Some features common to relational databases are not present in MongoDB, notably joins and complex multirow transactions.
A document is the basic unit of data for MongoDB and is roughly equivalent to a row in a relational database management system.
A collection can be thought of as a table with a dynamic schema. This means that the documents within a single collection can have any number of different “shapes.”
A single instance of MongoDB can host multiple independent databases, each of which can have its own collections.
Every document has a special key, "_id", that is unique within a collection.
MongoDB comes with a simple but powerful JavaScript shell, which is useful for the administration of MongoDB instances and data manipulation.
Documents are an ordered set of keys with associated values. Most documents will contain multiple key/ value pairs. The keys in a document are strings.
Replica Sets
Replication is a way of keeping identical copies of your data on multiple servers.
With MongoDB, you set up replication by creating a replica set.
A replica set is a group of servers with one primary, the server taking client requests, and multiple secondaries, servers that keep copies of the primary’s data.
If the primary crashes, the secondaries can elect a new primary from amongst themselves.
Replica Sets Quirks
A majority of the servers in your set must agree on the primary. Even numbers of servers (like 2) don’t work well.
If you have only 2 servers, you can set an arbiter, whose only purpose is to participate in elections. Arbiters hold no data and aren’t used by clients: they just provide a majority for two-member sets.
Replicas only address durability, not your ability to scale You can take advantage of reading from secondaries but this is generally not recommended. The database will still go into read-only mode for a bit while a new primary is elected (remember Availability in the CAP theorem).
Sharding
Sharding refers to the process of splitting data up across machines; the term partitioning is also sometimes used to describe this concept.
MongoDB supports autosharding, which tries to both abstract the architecture away from the application and simplify the administration of such a system.
Shards store the data. Each shard is a replica set.
Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards.
Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards.