Talend & MongoDB: Iterating Over Files Using tMongoDBBulkLoad
This is a quick blog post, but one that can be extremely useful when working with MongoDB. Today I needed to do a bulk import of 75 CSV files into MongoDB. I tend to prefer working with lots of smaller files that I can open in Notepad++ when there is an issue, rather than dealing with 5 GB CSV files I can't open.
I added my "tFileList" and then a "tMongoDBBulkLoad" component, but was then surprised that I couldn't connect my "tFileList" via an "iterate" connector to my "tMongoDBBulkLoad" component. Today I was working in a 6.0 environment, so maybe this works in a later version, but this is what I got:
The solution was to iterate from "tFileList" into a "tJava" component (which doesn't really do anything) and then trigger the "tMongoDBBulkLoad" component via an "onComponentOK" trigger link.
Below is the "tFileList" which is iterating over the 75 regulatory documents:
This is connected to a "tJava" which does nothing - other than outputting the current file name. The "tJava" doesn't need to do this, but I may as well have it do something:
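For illustration, here is a minimal sketch of what that "tJava" code could look like. It assumes the file list component is named "tFileList_1" (the default name for the first one in a job), and the small map below is only a stand-in for Talend's globalMap so the snippet can run outside the Studio. Inside the real "tJava" you would only need the two lines in the method body, printing the message rather than returning it.

```java
import java.util.HashMap;
import java.util.Map;

class CurrentFileLogger {
    // Stand-in for Talend's job-level globalMap (assumption: only needed outside Talend).
    static final Map<String, Object> globalMap = new HashMap<>();

    // The body of this method mirrors what would go inside the tJava component.
    static String describeCurrentFile() {
        // tFileList publishes the name of the file for the current iteration under this key.
        String fileName = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        return "Processing file: " + fileName;
    }

    public static void main(String[] args) {
        // Hypothetical file name, standing in for what tFileList would set on each iteration.
        globalMap.put("tFileList_1_CURRENT_FILE", "example_01.csv");
        System.out.println(describeCurrentFile());
    }
}
```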
Then, I triggered the "tMongoDBBulkLoad" component from the "tJava" component using an "onComponentOK" trigger. The file name is extracted from the globalMap, courtesy of the "tFileList".
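To make the flow a bit more concrete, the sketch below emulates the three steps in plain Java: the file list publishes the current path, the placeholder "tJava" step succeeds, and only then does the load step read the path back out of the globalMap. The component name "tFileList_1" and the file paths are assumptions, and the list of "loaded" paths merely stands in for the actual bulk load. The cast expression on the marked line is the kind of thing you would put in the data file field of the "tMongoDBBulkLoad".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IterateThenTrigger {
    // Emulates: tFileList --iterate--> tJava --onComponentOK--> tMongoDBBulkLoad
    static List<String> run(List<String> csvPaths) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        List<String> bulkLoaded = new ArrayList<>();     // stand-in for the bulk load itself

        for (String path : csvPaths) {
            // tFileList: publish the full path of the current file for this iteration.
            globalMap.put("tFileList_1_CURRENT_FILEPATH", path);

            // tJava: placeholder step that does nothing and therefore completes OK.
            boolean tJavaOk = true;

            // onComponentOK: the load only runs once the tJava step has finished.
            if (tJavaOk) {
                // Expression for the data file field (marked line).
                String dataFile = ((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"));
                bulkLoaded.add(dataFile);
            }
        }
        return bulkLoaded;
    }

    public static void main(String[] args) {
        // Hypothetical paths; in the real job these come from the tFileList directory and mask.
        System.out.println(run(Arrays.asList("/data/in/file_01.csv", "/data/in/file_02.csv")));
    }
}
```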
Because of the job design, I couldn't select the "Drop collection if exists" option. With it enabled, the collection would be dropped on every iteration, and I would only ever end up with the contents of the last file processed in the collection.
I cheated and manually dropped the collection in MongoDB before the job started, but I would add this step to the start of the job via a "tMongoDBRow" if I were doing it properly.
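If the drop behaviour is hard to picture, this small simulation may help. It is not MongoDB code: an in-memory list stands in for the collection, and each inner list stands in for the rows of one CSV file (all values are made up). It compares dropping on every load with dropping once before the loop starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DropCollectionSimulation {
    // Loads every "file" into one "collection", optionally dropping it before each load.
    static List<String> loadAll(List<List<String>> files, boolean dropOnEveryLoad) {
        // Starting empty models dropping the collection once, before the job runs.
        List<String> collection = new ArrayList<>();
        for (List<String> rows : files) {
            if (dropOnEveryLoad) {
                collection.clear(); // models "Drop collection if exists" firing on each iteration
            }
            collection.addAll(rows);
        }
        return collection;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1"),
                Arrays.asList("c1", "c2"));

        System.out.println(loadAll(files, true));  // prints [c1, c2]
        System.out.println(loadAll(files, false)); // prints [a1, a2, b1, c1, c2]
    }
}
```

With the drop inside the loop only the last file survives, which is why the drop belongs in a single step ahead of the iteration.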
There you have it! As I stated earlier, this was a quick overview, but hopefully useful. We've got a few more blogs with MongoDB coming along here so keep your eyes out for more tips and tricks!
Disclaimer: All opinions expressed in this article are my own and do not necessarily reflect the position of my employer.