Performing a Structural Analysis

Task outline

You can profile your data using many types of analysis available in Talend Studio. A structural analysis provides basic information about your databases and catalogs, including the number of tables, rows per table, indexes, and primary keys. This overview is often the starting point for a comparison of separate data sources, or for making sure data includes the structure needed to meet data quality requirements.

After starting Talend Studio, you will establish a connection to the local database server. Then you will use a structural analysis to create an overview of the databases stored on the MySQL server, focusing on the CRM database.

Your training environment uses a single virtual machine (VM) that contains all the software you need to complete this tutorial, including Talend Studio and MySQL.

Starting Talend Studio

  1. To start the virtual machine, open a new tab with the same page and click the following link : START VM!

    The VM is launched in your web browser. Wait for Windows to start.

    A script is automatically launched. You can close it by clicking the X button.

  2. Select Yes on the Networks panel on the right side of your screen.

  3. To start Studio, on your VM desktop, double-click the Talend Studio shortcut.

  4. The Talend Data Fabric window opens.

    A project named DQ Essentials is available.

    Click Finish.

  5. Verify that the splash screen appears.

  6. Click the Start now! button (you may need to scroll down).

    The window opens.

Depending on your preferences and available tools, your screen may vary.

You can see the commonly used areas:

  • The DQ Repository, where data analysis and metadata are stored, is in the upper left corner.
  • The workspace, where you create and modify Jobs, run analyses, and examine results, is in the center.

Creating database connection metadata

  1. If you are not already in the Profiling perspective, click the Profiling icon.

  2. Create Metadata. In DQ Repository, expand Metadata.

    Right-click DB connections and click Create Connection.

    You are creating a connection to a collection of databases already configured in your training environment.

    1. In the Name text box, enter StagingDB.

    2. Click Next.

    3. On the DB Type list, click MySQL. Do not modify the default value in the Db Version box.

    4. Enter these settings:

        • In the Login and Password text boxes, enter root

        • In the Server text box, enter localhost

        • Leave the DataBase text box empty (you want to be able to access all databases configured on the local server)

    5. To verify your connection information, click the Check button.

    6. In the Check Connection window, click OK or make any necessary corrections and again click Check.
    7. Click Finish.

      Your new database connection, StagingDB, appears in the repository.

      The workspace displays the configuration information for the connection.

      If needed, make changes to the connection settings.

Creating a connection overview analysis

  1. Create a folder.
    1. In DQ Repository, expand Data Profiling.

    2. To create a folder for your analyses, right-click Analyses and select Create Folder.

      Name it CRM_Analysis and click Finish.

  2. Create the analysis.

    1. Right-click the CRM_Analysis folder and click New Analysis.
    2. Expand Structural Analysis and click Connection Overview Analysis.

    3. Read the background information and click Next.
  3. Set up the analysis.

    1. Now you will identify the analysis.

      In the Name text box, enter Database_Server_Connection_Analysis.

    2. Click Next.
    3. Choose a connection to analyze. Expand DB connections and click StagingDB (the database connection metadata you created earlier).

    4. Click Next.
    5. You can limit the analysis to specific tables or views. Leave the text boxes empty so that the analysis includes all information about the database connection.

    6. Click Finish.

      The analysis is open in the workspace with no results.

Analyzing the results

  1. Run the analysis.
    1. At the top of the workspace, click the Run icon.

    2. When the analysis runs successfully, the Analysis Results tab appears.

      You can switch between the Analysis Settings and Analysis Results tabs using the tab bar in the lower section of the workspace.

      Global statistics for each database are displayed in the Statistical Information section.

  2. Explore the CRM catalog.
    1. To display additional information about a catalog, in Statistical Information, in the Catalog column, click crm.

      The table on the lower left displays the number of rows, primary keys, and indexes for each table in the selected catalog.

    2. In the table on the lower left, right-click in the country row and click View keys.

    The display changes in several ways. By clicking View keys, you change from the Profiling to the Data Explorer perspective. Your current perspective is indicated in the upper right corner, where you can switch between perspectives with a single click.

    Notice the new elements displayed in the Data Explorer perspective:

    • The Database Detail view opens below the analysis results. It shows basic information on the primary key of the country table.

      More details about this table are available on the other tabs in the Database Detail view.

    • The Database Structure tree is displayed on the right side of the window.

      The database selected in the screenshot is the same as displayed in the Database Detail view. The primary key of the table is displayed with a specific icon.

    • On the tabs displayed in the upper left corner, you see basic connection and SQL history information.

  3. Again click the Profiling button.

In this tutorial, you started Talend Studio and learned about perspectives. You created and stored database connection information as metadata in the DQ repository. You finished by learning about structural analyses, creating and running a connection overview analysis to display a complete overview of the type of information in your database.

You have finished the tutorial.