Whether it is from a database or a file, sourcing data is one of the most basic and necessary elements of data integration. Talend Open Studio for Data Integration allows for easy access to your data with a wide array of components that support database connectivity as well as standard and complex file formats. In this tutorial, you will see just how easy it is to access data within a standard comma-separated file format.
This tutorial uses Talend Open Studio for Data Integration version 6.
1. Create a New Job
- Ensure that the Integration perspective is selected.
- In the Project Repository, right-click Job Designs and click Create Standard Job in the menu.
- In the Name field of the New Job wizard, fill in the name of the Job as readCSVFile.
- It is good practice to add a purpose and a description to a Job. Then, click Finish to create your Job.
The Job Designer opens an empty Job.
2. Add a tFileInputDelimited component
3. Configure the tFileInputDelimited_1 component
- In the Job Designer, click the tFileInputDelimited_1.
- To define the Basic settings for the component, click the Component tab to open the Component view.
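Property Type defines where the component's settings are stored: Built-In (set locally in this component) or Repository (reused from metadata stored in the Repository).
File Name/Stream shows the complete path of the file to read. You can either type the path manually or use the ellipsis button […] to provide the file path.
Row and Field Separators define the characters that separate rows and the characters that separate fields within a row.
Header and Footer indicate the number of rows at the start and at the end of the file that should be ignored.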
Limit shows the maximum number of lines to read in the file.
Schema defines the data structure of the file.
- To specify the path and name of the file to be read, click […] next to the File Name field, select the file from the local disk, and click Open.
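To make the Basic settings more concrete, the following is a minimal Java sketch of how the separators, Header, Footer, and Limit values could shape what is read. It is illustrative only and is not the code that Talend generates; the class name `DelimitedReaderSketch`, the `read` method, and the sample content (including the `title` column) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not the code Talend generates.
class DelimitedReaderSketch {

    /**
     * Splits content into rows and fields, skipping `header` leading rows and
     * `footer` trailing rows, and returning at most `limit` rows.
     * A negative limit means "no limit" in this sketch.
     */
    static List<String[]> read(String content, String rowSeparator, String fieldSeparator,
                               int header, int footer, int limit) {
        // Pattern.quote treats each separator as literal text, not as a regular expression.
        String[] rows = content.split(Pattern.quote(rowSeparator));
        List<String[]> result = new ArrayList<>();
        int end = rows.length - footer;
        for (int i = header; i < end; i++) {
            if (limit >= 0 && result.size() >= limit) {
                break;
            }
            // The -1 argument keeps trailing empty fields such as "a;b;".
            result.add(rows[i].split(Pattern.quote(fieldSeparator), -1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; only the movieID column name comes from the tutorial.
        String content = "movieID;title\n1;First\n2;Second\n3;Third\n-- end of file --";
        // One header row, one footer row, and a limit of two data rows.
        for (String[] row : read(content, "\n", ";", 1, 1, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With a Header of 1, a Footer of 1, and a Limit of 2, the sketch skips the first and last rows and stops after two data rows.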
4. Define the schema for the tFileInputDelimited_1 component
- To define the schema for the tFileInputDelimited_1 component, click […] next to the Edit schema field.
The Schema of the tFileInputDelimited_1 wizard opens.
The [+] button adds a column to the schema.
The [x] button removes the selected columns from the schema.
The [↑] and [↓] buttons move the selected columns up or down in the schema.
- In the Schema wizard, click the [+] icon to add a column.
- In the Column column, enter the field name as movieID.
- To designate this field as the key, select the Key check box.
- In the Type column, select Integer.
- Ensure that the Nullable column is unchecked, so that any null value for this column is rejected.
- In the Length column, enter 4.
- Repeat the preceding steps (add a column, then set its name, type, nullability, and length) for each remaining field in the CSV file.
- To close the Schema wizard, click OK.
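The schema settings chosen above for movieID can be pictured as a small data structure. The following Java sketch is a hypothetical model of one schema column, not a Talend API: the `SchemaColumnSketch` and `Column` names are invented for illustration, and treating Length as a maximum character count is a simplifying assumption of the sketch.

```java
// Illustrative only: a hypothetical model of one schema column, not a Talend API.
class SchemaColumnSketch {

    static final class Column {
        final String name;
        final boolean key;
        final Class<?> type;
        final boolean nullable;
        final int length;

        Column(String name, boolean key, Class<?> type, boolean nullable, int length) {
            this.name = name;
            this.key = key;
            this.type = type;
            this.nullable = nullable;
            this.length = length;
        }

        /** Converts a raw field to the column type, rejecting values the schema forbids. */
        Object parse(String raw) {
            if (raw == null || raw.isEmpty()) {
                if (!nullable) {
                    throw new IllegalArgumentException(name + " must not be null");
                }
                return null;
            }
            // Assumption of this sketch: Length is a maximum number of characters.
            if (raw.length() > length) {
                throw new IllegalArgumentException(name + " exceeds length " + length);
            }
            if (type == Integer.class) {
                return Integer.valueOf(raw);
            }
            return raw;
        }
    }

    public static void main(String[] args) {
        // Mirrors the tutorial: movieID is the key, an Integer, not nullable, length 4.
        Column movieId = new Column("movieID", true, Integer.class, false, 4);
        System.out.println(movieId.parse("42"));
        try {
            movieId.parse("");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, an empty value for movieID is rejected because the Nullable option is cleared for that column.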
5. Add the logging component and propagate the data
- Add a tLogRow component to the Job. The tLogRow component will display in the console all the rows of data it receives.
- To propagate data from the tFileInputDelimited_1 component to the tLogRow_1 component, in the Job Designer, right-click tFileInputDelimited_1, hold, and drag to tLogRow_1.
Alternative method: To link the components, you can also right-click the source component, click Row > Main, and then click the target component.
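Conceptually, the Row > Main link passes each row from the source component to the logging component. The Java sketch below shows that idea with a plain `Consumer`. It is a simplified illustration rather than Talend's generated code: the `RowFlowSketch` class, its methods, and the sample rows are hypothetical, and the "|" separator is an assumption about how a row might be printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: hypothetical stand-ins for a source component and a logging component.
class RowFlowSketch {

    /** Plays the role of the source: pushes every row to whatever is linked to it. */
    static void emit(List<String[]> rows, Consumer<String[]> mainLink) {
        for (String[] row : rows) {
            mainLink.accept(row);
        }
    }

    /** Plays the role of the logger: formats one row for the console. */
    static String format(String[] row) {
        // Assumption of this sketch: fields are joined with "|".
        return String.join("|", row);
    }

    public static void main(String[] args) {
        // Hypothetical rows, as if they had already been read from the file.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "First"});
        rows.add(new String[] {"2", "Second"});
        // The lambda is the "link": each emitted row is formatted and printed.
        emit(rows, row -> System.out.println(format(row)));
    }
}
```

The source does not need to know what is linked to it, which is why the same input component can feed a logger, a transformation, or an output file.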
6. Run the Job
In the Run view for the Job readCSVFile, click Run.
The file was read by the tFileInputDelimited component, and its content was displayed on the console by the tLogRow component.