Reading a File
In this tutorial, you learn how to read data from a simple delimited file.
1. Create a New Job
- Ensure that the Integration perspective is selected.
- In the Project Repository, right-click Job Designs and click Create Standard Job in the menu.
- In the Name field of the New Job wizard, enter readCSVFile as the name of the Job.
- It is good practice to also fill in the Purpose and Description fields of the Job. Then, click Finish to create your Job.
The Job Designer opens an empty Job.
2. Add a tFileInputDelimited component
3. Configure the tFileInputDelimited_1 component
- In the Job Designer, click tFileInputDelimited_1.
- To define the component's Basic settings, click the Component tab to open the Component view.
Property Type defines whether the component's properties are Built-In (set for this component only) or stored centrally in the Repository.
File Name/Stream is the full path of the file to read. You can either type the path manually or use the [...] button to browse for the file.
Row Separator defines the character that marks the end of a row, and Field Separator defines the character that separates the fields within a row.
Header and Footer indicate the number of rows to ignore at the beginning and at the end of the file, respectively.
Limit sets the maximum number of rows to read from the file.
Schema defines the data structure of the file.
- To specify the path and name of the file to read, click [...] next to the File Name/Stream field, select the file on your local disk, and click Open.
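All of these settings are made in the GUI. As a rough illustration of what Row Separator, Field Separator, Header, Footer, and Limit mean together, here is a minimal plain-Java sketch that applies them to the content of a delimited file. It is not the code Talend Studio generates; the class and method names (`DelimitedFileSketch`, `parse`, `read`), the sample data, and the choice to treat a limit of 0 or less as "no limit" are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical plain-Java illustration of the Basic settings; not Talend-generated code.
final class DelimitedFileSketch {

    // Applies Row Separator, Header, Footer, Limit, and Field Separator to the file content.
    static List<String[]> parse(String content, String rowSeparator, String fieldSeparator,
                                int header, int footer, int limit) {
        // Row Separator: split the content into rows. Pattern.quote treats the separator
        // literally, and String.split drops trailing empty strings, so a final newline
        // does not produce an extra empty row.
        List<String> rows = Arrays.asList(content.split(Pattern.quote(rowSeparator)));

        // Header and Footer: ignore rows at the top and at the bottom of the file.
        int from = Math.min(header, rows.size());
        int to = Math.max(from, rows.size() - footer);
        List<String> body = rows.subList(from, to);

        // Limit: keep at most this many data rows (this sketch treats limit <= 0 as "no limit").
        if (limit > 0 && body.size() > limit) {
            body = body.subList(0, limit);
        }

        // Field Separator: split each remaining row into fields; -1 keeps empty trailing fields.
        List<String[]> result = new ArrayList<>();
        for (String row : body) {
            result.add(row.split(Pattern.quote(fieldSeparator), -1));
        }
        return result;
    }

    // File Name/Stream: read the whole file at the given path, then parse its content.
    static List<String[]> read(Path file, String rowSeparator, String fieldSeparator,
                               int header, int footer, int limit) throws IOException {
        return parse(Files.readString(file), rowSeparator, fieldSeparator, header, footer, limit);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one header row followed by three data rows.
        String csv = "movieID;title\n1;Alien\n2;Heat\n3;Ran\n";
        for (String[] fields : parse(csv, "\n", ";", 1, 0, 2)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

With the sample input, Header = 1 skips the column-name row and Limit = 2 stops after the second data row. If the file uses Windows line endings, a row separator of `\n` leaves a trailing `\r` on each row, so `\r\n` would be the matching setting.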
4. Define the schema for the tFileInputDelimited_1 component
- To define the schema for the tFileInputDelimited_1 component, click [...] next to the Edit schema field.
The Schema wizard for tFileInputDelimited_1 opens.
The [+] button adds a column to the schema.
The [x] button removes the selected columns from the schema.
The [↑] and [↓] buttons move the selected columns up or down.
- In the Schema wizard, click the [+] button to add a column.
- In the Column column, enter movieID as the field name.
- To designate this field as the key, select the Key check box.
- In the Type column, select Integer.
- Ensure that the Nullable check box is cleared so that null values are rejected for this column.
- In the Length column, enter 4.
- Repeat the previous six steps, from adding a column to setting its length, for each field in the CSV file.
- To close the Schema wizard, click OK.
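To make the schema settings concrete, the following plain-Java sketch maps one row of fields onto a typed record. It encodes the movieID rules from the steps above (Integer type, key, not nullable, length 4). The `title` column, the `MovieRow` name, and the decision to enforce Length as a maximum number of characters are hypothetical. How Talend Studio itself applies Length and Key at run time is not covered by this tutorial, so read the sketch as an illustration of the idea, not as Talend's behavior.

```java
// Hypothetical plain-Java illustration of the schema; not Talend-generated code.
// movieID: Type Integer, Key, Nullable cleared, Length 4 (from the tutorial).
// Key: movieID identifies a row; this sketch documents it but does not check uniqueness.
// title: a hypothetical second column, added only for illustration.
record MovieRow(int movieID, String title) {

    // Length 4: this sketch treats it as the maximum number of characters in movieID.
    static final int MOVIE_ID_LENGTH = 4;

    // Builds a typed row from raw fields, rejecting values that break the movieID rules.
    static MovieRow fromFields(String[] fields) {
        String rawId = fields.length > 0 ? fields[0].trim() : "";

        // Nullable cleared: an empty movieID is rejected instead of becoming null.
        if (rawId.isEmpty()) {
            throw new IllegalArgumentException("movieID is not nullable");
        }
        if (rawId.length() > MOVIE_ID_LENGTH) {
            throw new IllegalArgumentException("movieID exceeds length " + MOVIE_ID_LENGTH);
        }

        // Type Integer: Integer.parseInt throws NumberFormatException for non-numeric text.
        int id = Integer.parseInt(rawId);

        // The hypothetical title column is allowed to be missing (null).
        String title = fields.length > 1 ? fields[1] : null;
        return new MovieRow(id, title);
    }

    public static void main(String[] args) {
        System.out.println(fromFields(new String[] {"1", "Alien"}));
        try {
            fromFields(new String[] {"", "Untitled"});
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected row: " + e.getMessage());
        }
    }
}
```

Because `NumberFormatException` is a subclass of `IllegalArgumentException`, a single `catch` handles both a missing value and a non-numeric one.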
5. Add the logging component and propagate the data
- Add a tLogRow component to the Job. The tLogRow component displays all the rows of data it receives in the console.
- To propagate data from the tFileInputDelimited_1 component to the tLogRow_1 component, in the Job Designer, right-click tFileInputDelimited_1, hold the mouse button, and drag to tLogRow_1.
Alternative method: You can also right-click the source component, click Row > Main, and then click the target component.
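Conceptually, the Row > Main link passes each row from the input component to the next component, which handles it as it arrives. The sketch below imitates that flow in plain Java with a `Consumer` callback. The class and method names are hypothetical, and the `|` separator used to print fields is an assumption of this sketch; tLogRow's own output format depends on its settings.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java illustration of a Row > Main link; not Talend-generated code.
final class LogRowSketch {

    // Formats one row for printing by joining its fields with '|' (an assumption of this sketch).
    static String format(String[] fields) {
        return String.join("|", fields);
    }

    // Stand-in for tFileInputDelimited_1: pushes each row downstream over the "Main" link.
    static void emitRows(List<String[]> rows, Consumer<String[]> main) {
        for (String[] row : rows) {
            main.accept(row);
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows, as if already read and split by the input component.
        List<String[]> rows = List.of(new String[] {"1", "Alien"}, new String[] {"2", "Heat"});

        // Stand-in for tLogRow_1: prints every row it receives to the console.
        emitRows(rows, row -> System.out.println(format(row)));
    }
}
```

Passing the downstream step as a callback mirrors the link: the input side does not need to know what the next component does with each row.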
6. Run the Job
- In the Run view for the readCSVFile Job, click Run.
The tFileInputDelimited component reads the file, and the tLogRow component displays its content in the console.