Talend & MongoDB: An Introduction to Simple Relational Mapping into MongoDB
This article was written using the open source version of Talend Open Studio for Big Data which can be downloaded below:
I am looking into ways to map relational data models into MongoDB, and am starting off simple to test a few methods. To begin with I just want to take two related tables from Oracle and create a hierarchical JSON document to represent the one-to-many relationship within a MongoDB document.
I am using Oracle XE with the pre-installed HR database, which has two tables “DEPARTMENTS” and “EMPLOYEES” related on “DEPARTMENT_ID“.
The task is to try and use Talend to create a document in MongoDB, which mirrors this one to many relationship, using standard components.
Below is the end result, we have one document per department with the “many” side of the relationship mapped as an array of employees.
There are likely lots of ways to achieve this, but for this simple example the following seemed to work for me. If I am doing this incorrectly, or there are better ways to do it, then it would be great to know.
Step 1: Create the Connections to Oracle and to MongoDB
Step 2: Pull in the Department and Employee Data
Use “tOracleInput” components to retrieve the “Departments” and “Employees” tables:
Step 3: Join the Departments and Employees data
For this I am using a “tMap” component, with an an inner join, and “All Matches” as the match type, which is good enough for this example:
Step 4: Sort the data by Department
Later on I need to do some grouping of the data by the department, so I will need to sort the data by “DEPARTMENT_ID” at a minimum. I do this via a simple “tSortRow“. The important thing is to sort by “DEPARTMENT_ID“, but I will then sort by employee “FIRST_NAME“:
Step 5: Add a tMongoDBOutput component
I add the “tMongoDBOutput” component, and configure it to “Generate JSON Document“. This gives me some control over the JSON document structure:
Step 6: Configure the JSON Tree
We now have to think about the JSON structure and our loop elements. Loop elements are a little confusing (at least to me), and this was a little trial and error. The end result is the JSON tree below. We have a root element which we will actually remove later called “department“.
Below the “department” root we have mapped sub-elements of “DEPARTMENT_ID” and “DEPARTMENT_NAME” from the data coming in from the tMAP.
Then we create a parent node called “employees” which we set as the loop element, as this will hold the array of employees related to the department.
Below the “employees” node we map the various fields of interest which will form part of each employee record.
Step 7: Run the job
I chose to “Remove root node” and ran the job. Remove root node in this instance means the specific “department” root node we defined in the JSON tree above:
Step 8: Check MongoDB
We are close, we have a sensible looking document, but we have one document per department/employee combination. The “Shipping” department below has multiple documents, one per employee. What we want is an array of employees within each department.
Step 9: Group by department
In the Advanced settings in “tMongoDBOutput“, we choose to add a “Group By” to the “DEPARTMENT_ID“. This is why I sorted earlier by DEPARTMENT_ID.
Step 10: Run the job
We seem to have achieved the right outcome. Here is the “Finance” department, with an array of all associated employees in MongoDB.
The job is not that complicated to look at either:
The XML and JSON components are really powerful in Talend, but they are also quite complicated when you first start out (as I am!). But I managed to work out this simple example in a few hours with some intuition and trial and error.
I need to understand more the impact of the “loop element” and how it applies to this example. I know things will get more complicated if I need to map more than one relationship and build a deeper hierarchy, as I can only specify a single loop element in the JSON Tree it seems.
As a matter of interest if I put the “Loop element” on the “DEPARTMENT_ID” field instead:
We get this in MongoDB…
Disclaimer: All opinions expressed in this article are my own and do not necessarily reflect the position of my employer.