Why Another MongoDB Blog? Pt 1

My first experience with MongoDB and noSQL databases

Why Another MongoDB Blog When There Are Already So Many?

Last week I had a phone screen for a full stack software engineering position at a top-50 start-up company. The interview went very well, but it is now up to the engineering team to decide whether or not to give me a technical interview. It is only an entry-level position, but the company’s growth rate is exponential. They need to know that I can hit the ground running, or that I can handle what is potentially a steep learning curve. So, what to do? Fill in the gaps, and flatten the curve. That’s what.

MongoDB is an essential part of the stack that the company uses. It’s also a part that I had little-to-no experience with. I had mostly worked with relational databases, so I was excited to have an opportunity to learn something new with more than sheer curiosity as a driving force.

Hence, another blog about MongoDB.

A Quick Note

Relational Databases vs noSQL Databases

Relational / SQL Databases

Relational databases are composed of tables. The rows of a table are individual entries with unique ID keys, and the columns are the properties of each entry. The properties and value types of each table are clearly defined in a pre-written schema. For example, a table of customers and a table of products may look something like Fig 1 and Fig 2, respectively.

Fig 1: Customers
Fig 2: Products

If a customer purchases an item from the products table, there needs to be a way to track it, so we build something called a join table. A join table consists of at least two foreign keys that hold the IDs of entries in other tables, as well as any other pertinent information. In Fig 3 we see a join table that represents sales. It includes the customer’s ID and the item’s ID, as well as a couple of other bits of useful information.

Fig 3: Sales
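
To make the join table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are my own assumptions for illustration (the figures are not reproduced here), and `quantity` stands in for the "other bits of useful information" a sales table might carry.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- The join table: two foreign keys plus extra details about the sale.
    CREATE TABLE sales (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product_id  INTEGER REFERENCES products(id),
        quantity    INTEGER
    );
    INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com'),
                                 (2, 'Grace', 'grace@example.com');
    INSERT INTO products  VALUES (1, 'Keyboard', 49.99), (2, 'Monitor', 199.0);
    INSERT INTO sales     VALUES (1, 1, 1, 2), (2, 1, 2, 1), (3, 2, 1, 1);
""")

# Resolve the foreign keys back into readable rows.
rows = conn.execute("""
    SELECT c.name, p.name, s.quantity
    FROM sales s
    JOIN customers c ON c.id = s.customer_id
    JOIN products  p ON p.id = s.product_id
    ORDER BY s.id
""").fetchall()

for customer, product, quantity in rows:
    print(f"{customer} bought {quantity} x {product}")
```

One customer appears in several sales and one product is sold to several customers, which is the many-to-many shape the join table exists to capture.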

This is what’s called a many-to-many relationship. Customers can purchase many items, and the items can be purchased by many different customers. Two other types of relationships are one-to-one and one-to-many. A one-to-one relationship might be something like a customer and their user preferences. Each customer has one set of user preferences, and each set of user preferences belongs to only one customer. A one-to-many relationship is like an auto manufacturer and its cars. A manufacturer can build multiple different car types, but each car type is built by only one manufacturer.

Fig 4: Visual representation of relationships that tables can have.

Software engineers have used these types of databases for years for many reasons. The original reason is that storage used to be expensive. I mean, really expensive. This style of database ensures that there isn’t ever any duplicated data. Everything has a single source of truth. For instance, if I need to change the email address of a customer, I change it once in the customers table, and every other table that references that customer through the customer_id property sees the new value.
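
The single-source-of-truth idea can be sketched in a few lines of Python with sqlite3. The schema and data here are assumptions made up for the example; the point is only that one UPDATE on one row is enough.

```python
import sqlite3

# Hypothetical two-table schema, kept in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE sales (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item        TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada', 'old@example.com');
    INSERT INTO sales VALUES (1, 1, 'Keyboard'), (2, 1, 'Monitor');
""")

# One UPDATE touches one row in one table.
conn.execute("UPDATE customers SET email = ? WHERE id = ?", ("new@example.com", 1))

# Every sale reaches the email through customer_id, so each one sees the new value.
emails = [
    row[0]
    for row in conn.execute(
        "SELECT c.email FROM sales s JOIN customers c ON c.id = s.customer_id"
    )
]
print(emails)
```

The sales rows never stored the email themselves, so there was nothing else to keep in sync.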

Relational databases use SQL to query their tables and serve up data to the user.

noSQL Databases

Since we know about relational databases already, understanding the layout of document databases is an easy step. In relational databases, the database is composed of tables, which are composed of entries. In document databases, the database is composed of collections, which in turn house documents. For example, a ‘Users’ collection will have a document for each individual user. Documents are quite different from the entries we are used to in relational databases.

Documents are stored as binary JSON (BSON). The structure mirrors JSON, but the binary encoding supports data types that plain JSON lacks, such as dates and binary data, and drivers for many languages can read and write it. Fig 5 is the return from a sample data set that MongoDB offers when using their free version’s sandbox mode. It looks an awful lot like the JSON objects that we are already so well versed in, right?

Fig 5: Return from a sample weather data document.

Documents store data in a way that most closely resembles the way it will be used, and they do not require a schema. Engineers often provide one for consistency, but different documents in the same collection can hold different data. For instance, were we to compare another sample_weather document to the one in Fig 5, we may find that the location doesn’t collect ‘skyCondition’ data. In a relational database we’d be forced to return that property with a value of null, or with some reference to the fact that it doesn’t exist. In a document, it just wouldn’t be listed. This offers incredible flexibility when building the database.
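
Here is a small sketch of what that flexibility looks like from the reading side, using plain Python dictionaries as stand-ins for documents. The station names, the nested field names, and the values are all hypothetical; only ‘skyCondition’ and ‘wind’ come from the sample data discussed above.

```python
# Two hypothetical weather documents that live in the same collection.
station_a = {
    "station": "A",
    "skyCondition": {"ceilingHeight": 1200},
    "wind": {"speed": 4.1},
}
station_b = {
    "station": "B",
    "wind": {"speed": 7.3},  # this location simply has no skyCondition key
}
collection = [station_a, station_b]


def describe_sky(doc):
    """Return a readable sky condition, tolerating documents that lack the field."""
    sky = doc.get("skyCondition")
    if sky is None:
        return "not reported"
    return f"ceiling at {sky['ceilingHeight']} m"


for doc in collection:
    print(doc["station"], describe_sky(doc))
```

The trade-off is that the code reading the documents has to handle missing fields itself, since no schema guarantees they are there.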

Documents also allow for embedding rather than only referencing. Looking to Fig 5 again, we see multiple properties that look like objects: skyCondition, visibility, and wind are just a few. These are called embedded documents. They are documents within the document. In a relational database we would normally create a separate table for each of these properties. We would have a ‘wind’ table, a ‘skyCondition’ table, etc. For the wind property, the table would most likely look something like Fig 6.

Fig 6: Example Wind table entry

In order to get this data, we would have to do multiple queries (or joins) to retrieve the location, the wind table data, and the data from any other tables that refer to our location’s weather information. By using a document to store our data, it is all retrieved at once, greatly reducing the response time of queries.
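
The difference between referencing and embedding can be sketched with plain Python dictionaries standing in for tables and documents. Every name and value below is an assumption for illustration, not the real sample data.

```python
# Referenced layout: each property lives in its own "table", keyed by location id.
locations = {1: {"name": "Station A"}}
wind = {1: {"speed": 4.1, "direction": 270}}
sky_condition = {1: {"ceilingHeight": 1200}}


def load_referenced(location_id):
    """Assemble one weather reading from three separate lookups."""
    return {
        **locations[location_id],
        "wind": wind[location_id],
        "skyCondition": sky_condition[location_id],
    }


# Embedded layout: the document already contains its sub-documents.
documents = {
    1: {
        "name": "Station A",
        "wind": {"speed": 4.1, "direction": 270},
        "skyCondition": {"ceilingHeight": 1200},
    }
}


def load_embedded(location_id):
    """Return the whole reading with a single lookup."""
    return documents[location_id]


print(load_referenced(1) == load_embedded(1))
```

Both functions return the same reading; the embedded version just gets there in one step instead of three.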

Pros and Cons

Reading/Writing

Relational Database Pros:

  • Data is normalized with a strict schema
  • Data has a single source of truth. Updates are very fast, because each piece of information lives in only one place
  • Great for applications that have to update data often

Document Database Pros:

  • Schema-less: documents offer great flexibility when it comes to expansion. Project parameters often change, so the documents can easily adapt.
  • Related information lives in one place. Read operations are very fast, because they only need to return the single document.
  • No over-fetching & no under-fetching

Relational Database Cons:

  • n+1 problem
  • Over-fetching & under-fetching
  • Slow queries if data is distributed across many tables
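
For anyone who hasn't met the n+1 problem before, here is a minimal sketch with sqlite3. The schema and the `run` helper are hypothetical; the helper only exists to count how many queries each approach sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO sales VALUES (1, 1, 'Keyboard'), (2, 2, 'Monitor'), (3, 3, 'Mouse');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two approaches can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# n+1: one query for the list of customers, then one more query per customer.
counter["queries"] = 0
for customer_id, name in run("SELECT id, name FROM customers"):
    run("SELECT item FROM sales WHERE customer_id = ?", (customer_id,))
n_plus_one_queries = counter["queries"]

# Alternative: a single JOIN fetches the same information in one query.
counter["queries"] = 0
joined = run(
    "SELECT c.name, s.item FROM customers c JOIN sales s ON s.customer_id = c.id"
)
join_queries = counter["queries"]

print(n_plus_one_queries, join_queries)
```

With three customers the first approach sends four queries and the JOIN sends one, and the gap grows with every customer added.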

Document Database Cons:

  • No single source of truth. Updating data can be cumbersome if it exists in multiple collections.
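
A quick sketch of why that gets cumbersome, again with Python lists of dictionaries standing in for collections. The `users` and `orders` collections and the `update_email` helper are hypothetical, and the orders deliberately embed a copy of the customer's email.

```python
# Hypothetical collections: the email is stored once in users
# and copied into every order that user has placed.
users = [{"_id": 1, "name": "Ada", "email": "old@example.com"}]
orders = [
    {"_id": 101, "item": "Keyboard", "customer": {"_id": 1, "email": "old@example.com"}},
    {"_id": 102, "item": "Monitor", "customer": {"_id": 1, "email": "old@example.com"}},
]


def update_email(user_id, new_email):
    """Update every copy of the email and return how many documents were touched."""
    touched = 0
    for user in users:
        if user["_id"] == user_id:
            user["email"] = new_email
            touched += 1
    for order in orders:
        if order["customer"]["_id"] == user_id:
            order["customer"]["email"] = new_email
            touched += 1
    return touched


touched = update_email(1, "new@example.com")
print(touched)
```

One logical change turns into three writes here, and any copy the update forgets about stays out of date.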

Scaling

Relational Database Pros:

  • Vertical scaling is possible

Document Database Pros:

  • Vertical scaling is possible.
  • Horizontal scaling is very easy. Documents are already separated, so housing them on multiple servers is simple. (In MongoDB, a sharded cluster spreads a collection’s documents across multiple servers. More on that in Pt 2)
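
To show why self-contained documents are easy to spread out, here is a toy sketch of hash-based sharding in Python. This is my own simplification and not MongoDB's actual routing logic; `SHARD_COUNT`, `shard_for`, and `find` are hypothetical names, and each "server" is just a list in memory.

```python
import hashlib

SHARD_COUNT = 3  # hypothetical number of servers


def shard_for(key: str) -> int:
    """Map a document key to a shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


# Each shard stands in for a separate server holding part of the collection.
shards = {i: [] for i in range(SHARD_COUNT)}
for n in range(12):
    doc = {"_id": f"user-{n}", "name": f"User {n}"}
    shards[shard_for(doc["_id"])].append(doc)


def find(doc_id: str):
    """Look only at the shard that can contain the document."""
    for doc in shards[shard_for(doc_id)]:
        if doc["_id"] == doc_id:
            return doc
    return None


print(find("user-5"))
```

Because each document carries everything it needs, a lookup only has to visit one shard. A relational row that depends on joins across tables is harder to split up this way.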

Relational Database Cons:

  • Horizontal scaling is difficult and expensive by comparison

Document Database Cons:

  • Greater chance of returning stale data, depending on query settings.

Pt 1 Conclusion

A climate scientist turned software engineer