Talend & MongoDB: Iterating Over Files Using tMongoDBBulkLoad
This is a quick blog post, but one that can be extremely useful when working with MongoDB. Today I needed to bulk import 75 CSV files into MongoDB. I prefer working with lots of smaller files that I can open in Notepad++ when there is an issue, rather than dealing with a 5 GB CSV file I can't open.
I added a "tFileList" and then a "tMongoDBBulkLoad" component, but was surprised to find that I couldn't connect the "tFileList" to the "tMongoDBBulkLoad" component with an "iterate" connector. I was working in a 6.0 environment today, so this may work in a later version, but in 6.0 the Studio would not accept that link.
The solution was to iterate from "tFileList" into a "tJava" component (which doesn't really do anything) and then trigger the "tMongoDBBulkLoad" component via an "onComponentOK" trigger link.
The "tFileList" component iterates over the 75 regulatory documents.
It is connected to a "tJava" component that does nothing other than print the current file name. The "tJava" doesn't need to do this, but it may as well do something.
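To make the iterate link concrete, here is a stand-alone Java sketch (not Talend-generated code) that mimics one pass of the loop: "tFileList" publishes the current file in the globalMap under keys such as tFileList_1_CURRENT_FILE and tFileList_1_CURRENT_FILEPATH (assuming the component is named tFileList_1), and the "tJava" body reads it back. Inside the Studio, the tJava body would just be a line like System.out.println((String)globalMap.get("tFileList_1_CURRENT_FILE")); the class name, method name and file paths below are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stand-alone simulation of tFileList --Iterate--> tJava.
// In a real job Talend generates this loop and the globalMap itself; this
// only mimics the shape so the globalMap keys are easy to see.
class FileListIterateSketch {

    // Runs the "tJava body" once per file, after publishing the same
    // per-iteration keys that tFileList puts in globalMap.
    static List<String> iterate(List<File> files, Map<String, Object> globalMap) {
        List<String> printed = new ArrayList<>();
        for (File f : files) {
            globalMap.put("tFileList_1_CURRENT_FILE", f.getName());
            globalMap.put("tFileList_1_CURRENT_FILEPATH", f.getPath());

            // --- what the tJava component contains: just log the file name ---
            String line = "Processing " + (String) globalMap.get("tFileList_1_CURRENT_FILE");
            System.out.println(line);
            printed.add(line);
        }
        return printed;
    }

    public static void main(String[] args) {
        // Placeholder paths standing in for the regulatory CSV files.
        List<File> files = new ArrayList<>();
        files.add(new File("/data/regulatory/doc_001.csv"));
        files.add(new File("/data/regulatory/doc_002.csv"));
        iterate(files, new HashMap<>());
    }
}
```

The point of the sketch is that the globalMap entry is overwritten on every iteration, so whatever runs next in the same iteration sees the current file.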
Then I triggered the "tMongoDBBulkLoad" component from the "tJava" component using an "onComponentOK" trigger. The file name is read from the globalMap, where the "tFileList" stores it on each iteration.
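The file field on "tMongoDBBulkLoad" can be set to an expression such as ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")), again assuming the file-list component is tFileList_1. For anyone who wants to reproduce the per-file load outside Talend, the sketch below builds a comparable mongoimport command from that same globalMap entry. It is an illustration under assumptions, not a description of how "tMongoDBBulkLoad" works internally; the database name regdb and collection name documents are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the per-iteration load expressed as a mongoimport
// command. It does not claim to mirror tMongoDBBulkLoad's internals.
class BulkLoadCommandSketch {

    // Builds one mongoimport invocation for the file tFileList is currently on.
    // "regdb" and "documents" are placeholder database/collection names.
    static List<String> commandFor(Map<String, Object> globalMap) {
        // Same globalMap lookup you would type into the component's file field.
        String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
        return new ArrayList<>(Arrays.asList(
                "mongoimport",
                "--db", "regdb",
                "--collection", "documents",
                "--type", "csv",
                "--headerline",      // first CSV row holds the field names
                "--file", path));    // no --drop, so each file appends
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/data/regulatory/doc_001.csv");
        // Prints: mongoimport --db regdb --collection documents --type csv --headerline --file /data/regulatory/doc_001.csv
        System.out.println(String.join(" ", commandFor(globalMap)));
    }
}
```

Passing the returned list to new ProcessBuilder(...).inheritIO().start() would run it, provided mongoimport is on the PATH. Leaving out --drop is deliberate: dropping on every file would wipe out the earlier ones.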
Because of this job design, I couldn't tick the "Drop collection if exists" option. In this configuration, the collection would be dropped on every iteration and I would only ever have the contents of the last file processed.
I cheated and manually dropped the collection in MongoDB before the job started; if I were doing it properly, I would add this step to the start of the job with a "tMongoDBRow" component.
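The drop problem is really about ordering, and a toy in-memory model makes it easy to see. In the sketch below (hypothetical names, no MongoDB involved), a collection is just a list of rows: ticking a drop option on the per-file load clears it on every iteration, whereas dropping once before the loop, as the manual drop or a "tMongoDBRow" at job start would, keeps every file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model: a "collection" is a list of rows and each
// file is a list of rows. It only illustrates ordering, not MongoDB itself.
class DropOrderingSketch {

    // Drop option ticked on the per-file load: the drop runs on every
    // iteration, so only the last file's rows survive.
    static List<String> dropEveryIteration(List<String> existing, List<List<String>> files) {
        List<String> collection = new ArrayList<>(existing);
        for (List<String> rows : files) {
            collection.clear();       // drop
            collection.addAll(rows);  // load this file
        }
        return collection;
    }

    // Drop once before the loop (manually, or a tMongoDBRow at job start),
    // then append every file without dropping.
    static List<String> dropOnceThenAppend(List<String> existing, List<List<String>> files) {
        List<String> collection = new ArrayList<>(existing);
        collection.clear();           // one-off drop before the first file
        for (List<String> rows : files) {
            collection.addAll(rows);  // append this file
        }
        return collection;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("old1");  // rows left from a previous run
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1"),
                Arrays.asList("c1", "c2", "c3"));
        System.out.println(dropEveryIteration(existing, files)); // [c1, c2, c3]
        System.out.println(dropOnceThenAppend(existing, files)); // [a1, a2, b1, c1, c2, c3]
    }
}
```

Dropping once up front also clears rows left over from an earlier run, so re-running the job doesn't duplicate data.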
There you have it! As I said, this was a quick overview, but hopefully a useful one. We've got a few more MongoDB blog posts coming, so keep an eye out for more tips and tricks!
Disclaimer: All opinions expressed in this article are my own and do not necessarily reflect the position of my employer.