In this installment of A Day in the Life of a Data Integration Developer, you will learn how to configure the Activity Monitoring Console (AMC) to capture statistics for a particular data integration Job.
- Part 1: Introduction to Talend Studio
- Part 2: How to Build Your First Job in Talend Studio
- Part 3: Running, Testing, and Debugging
- Part 4: AMC Studio Basic Features
- Part 5: Basic Job Design Features
- Part 6: How to Self-Document Any Data Integration Job
- Part 7: How to Import a License File
The AMC gives you critical information about how long individual components ran, the overall performance of a Job, and how that Job has performed historically.
So let’s take a look at the airline dimension load table data integration Job.
If you click on the Job tab and open the Stats & Logs settings, you'll see that in this instance the Job has been configured to capture all of these statistics, along with the workflow and data flow points of the Job.
Everything is being logged into an AMC database on MySQL, and the settings tell the Job which tables to write to, as well as which statistics to capture from the different components. Furthermore, I can drill into each component to capture component-specific stats, which are then logged into the AMC tables. I have that checked on all three of the aggregate components because I want to know if any one of them is taking longer than the others.
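Because the AMC writes its statistics to ordinary database tables, you can query them directly outside the Studio. The sketch below is illustrative only: it uses an in-memory SQLite database as a stand-in for the MySQL AMC database, and the table name, column names, and Job name are simplified assumptions, not the real AMC schema.

```python
import sqlite3

# Stand-in for the AMC stats table. The real table created by
# Stats & Logs has more columns; "stats", "origin" (component name),
# and "duration" here are simplified assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (job TEXT, origin TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("airline_dim_load", "tFileInputDelimited_1", 1500),  # hypothetical timings
        ("airline_dim_load", "tAggregateRow_1", 500),
    ],
)

# Longest-running components first -- the same question the AMC
# component breakdown answers in the Studio.
rows = conn.execute(
    "SELECT origin, SUM(duration) AS total_ms FROM stats "
    "WHERE job = ? GROUP BY origin ORDER BY total_ms DESC",
    ("airline_dim_load",),
).fetchall()
```

The same `GROUP BY origin` query pattern would apply against the real AMC tables on MySQL, once you know the actual table and column names your Job settings configured.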
Once the process runs, you can open the AMC perspective within the Studio. If you don't see it, click the perspectives icon to open the list of available perspectives and select AMC.
Up at the top, select the project and connection you want, then hit Refresh; this brings up all the past Jobs that ran with AMC data logged.
So here’s the dimension load table and all the historical runs that are in the database.
If you click on one of the runs and then open the tab for the logs, it shows the breakdown by component. Here I can see the file read took 25% of the time, and the rest of the time is split evenly among the aggregation components, so no single component is taking disproportionately more time than the others.
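The percentage breakdown the AMC shows is just each component's share of the total runtime. A minimal sketch of that arithmetic, using hypothetical durations chosen to match the figures above (file read at 25%, three aggregates splitting the rest evenly):

```python
# Hypothetical per-component durations in milliseconds, shaped like
# the (component, duration) pairs the AMC logs for a single run.
durations = {
    "tFileInputDelimited_1": 500,
    "tAggregateRow_1": 500,
    "tAggregateRow_2": 500,
    "tAggregateRow_3": 500,
}

total = sum(durations.values())
# Each component's share of the total Job runtime, as a percentage.
shares = {name: round(100 * ms / total) for name, ms in durations.items()}
# The file read accounts for 25% here; the aggregates split the rest evenly.
```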
I also turned monitoring on, so I can see whether any records are being rejected and at what percentage, and you can set thresholds on that. The gauge here is green, showing 100% of the data was loaded, so there are no errors or issues to be concerned about.
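The green/red gauge boils down to comparing the rejection percentage against a threshold. Here is a small sketch of that check; the function name, the 5% default threshold, and the "green"/"red" labels are my own illustrative choices, not AMC settings.

```python
def load_status(loaded: int, rejected: int, warn_at_pct: float = 5.0) -> str:
    """Return 'green' if the rejection percentage stays below the
    threshold, 'red' otherwise. Threshold and labels are illustrative."""
    total = loaded + rejected
    pct_rejected = 100 * rejected / total if total else 0.0
    return "green" if pct_rejected < warn_at_pct else "red"
```

With 100% of the records loaded, as in the run above, the gauge stays green; once rejections cross the threshold, it would flip to red.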
For more details, watch the complete video, or click through to learn about basic Job design features.