Talend & MongoDB: Iterating Over Files Using tMongoDBBulkLoad


 

This is a quick post, but one that can be extremely useful when working with MongoDB. Today I needed to do a bulk import of 75 CSV files into MongoDB. I tend to prefer working with lots of smaller files that I can open in Notepad++ when there is an issue, rather than dealing with 5 GB CSV files I can’t open.

Download >> Talend Open Studio for Data Integration

I added a “tFileList” and then a “tMongoDBBulkLoad” component, and was surprised to find that I couldn’t connect the “tFileList” to the “tMongoDBBulkLoad” with an “iterate” connector. I was working in a 6.0 environment, so maybe this works in a later version, but this is what I got:

 

The solution was to iterate from the “tFileList” into a “tJava” component (which doesn’t really do anything) and then trigger the “tMongoDBBulkLoad” component via an “OnComponentOk” trigger link.

Below is the “tFileList” which is iterating over the 75 regulatory documents:

 

This is connected to a “tJava” which does nothing – other than outputting the current file name. The “tJava” doesn’t need to do this, but I may as well have it do something:

 

Then I triggered the “tMongoDBBulkLoad” component from the “tJava” component using an “OnComponentOk” trigger. The name of the file to load is read from the globalMap, where the “tFileList” publishes it on each iteration.
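For reference, here is a minimal sketch of that globalMap lookup. It assumes the component is named tFileList_1 (Talend's default for the first one on the canvas) and uses a made-up file path. In a real job, Talend provides the globalMap and only the cast-and-lookup expression goes into the “tJava” body and the data file field of the “tMongoDBBulkLoad”. The wrapper class here is just a stand-in so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

class FileListGlobalMapSketch {

    // Key under which tFileList stores the path of the file for the current
    // iteration, assuming the component instance is named tFileList_1.
    static final String CURRENT_FILEPATH_KEY = "tFileList_1_CURRENT_FILEPATH";

    // The lookup itself: globalMap holds Objects, so the value is cast to String.
    static String currentFilePath(Map<String, Object> globalMap) {
        return (String) globalMap.get(CURRENT_FILEPATH_KEY);
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend job provides at run time.
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical path, for illustration only.
        globalMap.put(CURRENT_FILEPATH_KEY, "C:/data/regulatory/doc_01.csv");

        // Equivalent of the tJava body: print the file being processed.
        System.out.println("Processing: " + currentFilePath(globalMap));
    }
}
```

Inside the job itself, the expression would be just ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")), adjusted to whatever your tFileList is actually called.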

 

Because of the job design, I couldn’t tick the “Drop collection if exists” option. The “tMongoDBBulkLoad” runs once per file, so with that option enabled the collection would be dropped on every iteration and I would only ever end up with the contents of the last file processed.
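To make the effect concrete, here is a toy simulation in plain Java. It does not talk to MongoDB at all: the “collection” is just an in-memory list, and the file contents are invented. It only illustrates what dropping before every load does compared with never dropping inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DropCollectionSketch {

    // Loads every file's rows into one in-memory "collection".
    // dropEachTime = true mimics enabling "Drop collection if exists" on a
    // component that is executed once per file.
    static List<String> load(List<List<String>> files, boolean dropEachTime) {
        List<String> collection = new ArrayList<>();
        for (List<String> rows : files) {
            if (dropEachTime) {
                collection.clear(); // the drop wipes everything loaded so far
            }
            collection.addAll(rows);
        }
        return collection;
    }

    public static void main(String[] args) {
        // Three made-up files with a few made-up rows each.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1"),
                Arrays.asList("c1", "c2"));

        System.out.println(load(files, true));  // [c1, c2]
        System.out.println(load(files, false)); // [a1, a2, b1, c1, c2]
    }
}
```

With the drop inside the loop, only the rows of the last file survive; without it, every file's rows accumulate, which is why the collection has to be emptied once, before the iteration starts.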

I cheated and manually dropped the collection in MongoDB before the job started. If I were doing it properly, I would add this step to the start of the job via a “tMongoDBRow”, so that it runs once before the iteration begins.

 

There you have it! As I said at the start, this was a quick overview, but hopefully a useful one. We’ve got a few more MongoDB posts coming, so keep an eye out for more tips and tricks!


Disclaimer: All opinions expressed in this article are my own and do not necessarily reflect the position of my employer. 
