MongoDB is a popular NoSQL database that stores data in collections. MongoDB collections are comprised of one or more documents that contain the actual data in JSON format. Documents are comparable to rows in traditional relational SQL databases, while collections are analogous to tables.
A key functionality in databases is the ability to query the data stored in the database. Querying data allows for the retrieval of specific information, data analysis, data reporting, and also data integration.
To be able to query a database effectively, it is crucial to be able to combine data from multiple tables, in the case of SQL databases or multiple collections in NOSQL databases, into a single result set.
In MongoDB $lookup users to combine information from two collections when querying. It performs the equivalent of a left outer join in an SQL database.
$lookup’s Use and Objective
An important function of databases is data processing to get meaningful information out of raw data.
For instance, if you’re running a restaurant business, you might want to analyze your restaurant’s data to find out how much you make every day, which foods are on demand on weekends, or even find out how many cups of coffee you sell at each hour of the day.
For such needs, simple database queries will not suffice. You need to perform advanced queries on the data you have stored. To address such needs, MongoDB has a feature called an aggregation pipeline.
An aggregation pipeline is a system comprised of composable operations called stages, which are used to process data to produce a final aggregate result. Examples of stages on the aggregation pipeline include $sort, $match, $group, $merge, $count, and $lookup, among others.
These stages can be applied in any order in an aggregation pipeline. At each stage in an aggregation pipeline, different operations are performed on the data being passed through the aggregation pipeline.
$lookup is thus a stage in the MongoDB aggregation pipeline. $Lookup is used to perform a left outer join between two collections in a MongoDB database. A left outer join combines all the documents or entries on the left with matching documents or entries on the right.
For instance, consider the two collections below, which have been represented in tabular format for easier understanding:
If we perform a left outer join on the above collections using the customer_id field, which appears in the order_collection, with the order_collection being the left collection and the customers_collection being the right collection, the result will contain all the documents in the Orders Collection and documents on the Customers Collection that have a customer_num that matches a customer_id of any of the records in the Orders Collection.
The final result of the left outer join operation on the orders and customers collections looks like this when represented in tabular format:
Notice that for the customer with customer_id 101 in the Orders Collection, which had no matching customer_num value in the Customers Collection, missing corresponding values from the customer table have been filled with null.
$lookup performs strict equality comparison between fields and retrieves the entire document that matched, and not just the fields that matched.
The syntax for $lookup is as follows:
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
$lookup has four parameters:
from – represents the collection from which we want to lookup documents. In our earlier example using orders_collection and customers_collection, we’d put customers_collection as the from the collection.
localField – this is a field in the working or primary collection that we use to compare to fields in our from collection(customers_collection in our case). In the example above, the localField would be customer_id which is found in the orders_collection.
foreignField – this is the field we want to compare to in the collection we specify in from. In our example, this would be customer_num found in the customer_collection which we use as our value in from
as – this is a new field name we specify to represent the field that will show up in our document, which contains documents resulting from matches between the localField and the foreignField. All these matches are put in an array in this field. If there are no matches, this field will contain an empty array.
From our two earlier collections, we would use the following code to perform a $lookup operation on the two collections with the orders_collection as our working or primary collection.
Note that the as field can be any string value. However, if you give it a name that already exists in the working document, that field will be overwritten.
Joining Data From Multiple Collections
MongoDB $lookup is a useful stage in an aggregation pipeline in MongoDB. Although it is not a requirement that an aggregation pipeline in MongoDB must have a $lookup stage, the stage is crucial when performing complex queries that require joining data across multiple collections.
The $lookup stage performs a left outer join on two collections which results in a new field being created or an existing field’s values being overwritten with an array containing documents from another collection.
These documents are selected based on whether they have values that match the values of the field they are being compared with. The end result is a field containing an array of documents in case matches were found or an empty array in case no matches were found.
Consider the employee’s and project collections shown below.
We can use the following code to join the two collections:
The result of this operation is a combination of the two collections. The result is the projects and all the employees assigned to each project. The employees are represented in an array.
Pipeline Stages that can be used together with $lookup
As mentioned before, $lookup is a stage in a MongoDB aggregation pipeline, and it can be used together with other aggregation pipeline stages. To show how these stages can be used together with $lookup, we’ll use the following two collections for illustration purposes.
In MongoDB, they are stored in JSON format. This is what the above collections look like in MongoDB.
Some examples of aggregation pipeline stages that can be used together with $lookup include:
$match is an aggregation pipeline stage used to filter the document stream to allow only those documents that meet the given condition to proceed to the next stage in the aggregation pipeline. This stage is best used early in the pipeline to remove documents that won’t be needed and thus optimize the aggregation pipeline.
Using the two earlier collections, you can combine $match and $lookup like so:
$match is used to filter for users from the USA. The result from $match is then combined with $lookup to get the order details of users from the USA. The result of the above operation is shown below:
$project is a stage used to reshape documents by specifying which fields to include, exclude or add to documents. For instance, in case you’re processing documents with ten fields each, but only four fields in the documents contain data that you need for your data processing, you can use $project to filter outer the fields you do not need.
This allows you to avoid sending unnecessary data to the next stage of your aggregation pipeline.
The above combines the users and orders collections using $lookup, then $project is used to only display the name of each user and the amount spent by each user. $project is also used to remove the _id field from the results. The result of the above operation is shown below:
$unwind is an aggregation stage used to deconstruct or unwind an array field creating new documents for each element in the array. This is useful in case you want to run some aggregation on the array field values.
For instance, in the example below, in case you want to run aggregation on the hobbies field, you can’t do so because it is an array. However, you can use unwind it using $unwind and then perform aggregations on the resulting documents.
Using the users and orders collections, we can use $lookup and $unwind together like so:
In the code above, $lookup returns an array field called orders. $unwind is then used to unwind the array field. The result of this operation is shown below: Notice Alice appears twice because she had two orders.
Examples of $lookup Use Cases
When performing data processing, $lookup is a useful tool. For instance, you might have two collections that you want to join based on fields on the collections which have similar data. A simple $lookup stage can be used to do this and add a new field in the primary collections, which contain documents gotten from another collection.
Considers the users and orders collections shown below:
The two collections can be combined using $lookup to give the result shown below:
$lookup can also be used to perform more complex joins. $lookup is not just limited to performing joining on two collections. You can implement multiple $lookup stages to perform joins on more than two collections. Consider the three collections shown below:
We can use the code below to perform a more complex join across the three collections to get all the orders that were made and also details of the products that were ordered.
When performing data processing involving multiple collections, $lookup can be useful as that allows you to join data and draw conclusions based on data stored in multiple collections. Data processing rarely relies on just one collection.
To draw meaningful conclusions from data, joining data across multiple collections is a key step. Therefore, consider utilizing the $lookup stage in your MongoDB aggregation pipeline to allow you to process your data better and draw meaningful insights from raw data stored across collections.
Collins Kariuki is a software developer and technical writer for Geekflare. He has over four years experience in software development, a background in Computer Science and has also written for Argot, Daily Nation and the Business Daily Newspaper.
Cloud data warehouses provide a scalable solution for storing and analyzing data in real-time. They enable businesses to reduce costs and improve efficiency by eliminating the need for on-premises hardware and maintenance. Here is our list of the best cloud data warehouse platforms.