
Best Practices for Using Context Variables with Talend – Part 3


  • Richard Hall
    With more than 10 years of data integration consulting experience (much of it spent implementing Talend), Richard really knows his stuff. He has provided solutions for companies on 6 of the 7 continents and has consulted across many different market verticals. Richard is a keen advocate of open source software, which is one of the reasons he first joined Talend in 2012. He is also a firm believer in engaging developers in “cool ways”, which is why he looks for opportunities to demonstrate Talend’s uses with technologies found around the home: hooking his Sonos sound system to Twitter via Talend, getting Google Home to call Talend web services, and controlling his TV with Talend calling Universal Plug and Play services, to name a handful. Prior to 2019, Richard had been running his own business providing Talend solutions. During that time he became a prominent contributor on Talend Community, providing examples both of how to solve business problems and of how to do some of the cool stuff mentioned above. In 2019 he was invited to return to Talend as the Technical Community Manager.

Hello and welcome to Part 3 of my best practices guide on context variables! Before I get started, note that this post builds on concepts discussed in Part 1 and Part 2, so I recommend reading those first.

Let’s say that we want our Talend Jobs to be able to run on any environment after they have been compiled, without having to compile them again. We want to maintain our Context variable values in a database (whose location we do not know at design time), and we want to keep the database connection details hidden so that they cannot easily be found by someone who might gain access to the servers. How can this be done? Will this make the framework incredibly complicated?

What if I said you can do this and keep the system incredibly dynamic and developer friendly, using nothing more than a couple of Operating System Environment Variables, a flat file, and a relatively simple Talend Routine? All you would need to do is configure the Environment Variables on the servers that jobs will be run on and place a flat file on those servers. After that, the jobs will automatically pick up the Context variable values when the jobs start, regardless of which environment they run on, and without any need to change the individual jobs. Would that be useful?

The Solution

The solution I will give to the above problem is one I have evolved over several years and several projects, where some or all of the requirements above (and sometimes more complicated ones) have had to be met.

The Context variable Table

The first thing you will need to do is to set up a database table to hold your context variable values. The schema that I generally use can be seen below (this was written for SQL Server):

CREATE TABLE [context_variables](
        [id] [bigint] NOT NULL,
        [env] [varchar](255) NULL,
        [key] [varchar](255) NULL,
        [value] [varchar](255) NULL,
        [description] [varchar](255) NULL
)

This is a bare-bones table. I’ve not added any primary keys (although “id” would be the one I’d use), indexes or any other potentially useful columns; you can add those and configure the table as you wish. The important columns in this example are “key”, “value” and “env”. “key” and “value” MUST be named this way: the Implicit Context Load expects the column holding the Context variable name to be called “key” and the column holding the Context variable value to be called “value”. Both of these columns are varchars, and the Implicit Context Load will implicitly cast (convert) each value into an object of the correct class. I am using the “env” column to demonstrate how you can hold different environments’ Context variables in the same table if you wish. I will get to this later.
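The Implicit Context Load does the String-to-type conversion for you, so you never write this yourself. Purely to illustrate what “implicitly cast” involves, here is a minimal sketch. The class name ContextValueConverter and the handful of supported type names are hypothetical; this is not Talend’s internal code.

```java
import java.math.BigDecimal;

/**
 * Hypothetical illustration only: converts a varchar "value" column entry into an
 * object of the type declared for the Context variable. Talend performs its own
 * conversion internally; this is not that code.
 */
final class ContextValueConverter {

    static Object convert(String value, String declaredType) {
        if (value == null) {
            return null; // a NULL column stays null whatever the declared type
        }
        switch (declaredType) {
            case "Integer":
                return Integer.valueOf(value.trim());
            case "Long":
                return Long.valueOf(value.trim());
            case "Boolean":
                return Boolean.valueOf(value.trim());
            case "BigDecimal":
                return new BigDecimal(value.trim());
            case "String":
                return value; // left exactly as stored
            default:
                throw new IllegalArgumentException("Unsupported type: " + declaredType);
        }
    }
}
```

Keeping every value as a varchar is what lets one generic table serve every Context variable; the trade-off is that a badly typed value (say “14x3” for a port) only fails at runtime, here as a NumberFormatException.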

The Operating System Environment Variables

In order to enable this solution, every server that Talend jobs might be run on (including development environments) will require two Operating System Environment Variables: “FILEPATH” and “ENCRYPTIONKEY”. The “FILEPATH” variable points to a flat file (.properties file) containing the database connection settings (encrypted where required), and the “ENCRYPTIONKEY” variable holds the encryption/decryption key. These must be System Environment Variables, and they must be set before the Talend components (Studio, JobServers, etc.) are started.
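Because a missing variable only shows up when a job starts, it can be handy to check a server up front. The following is a minimal sketch of such a check; the class EnvironmentCheck is hypothetical and not part of the routine. It takes a Map so it can be tested, and in production you would pass System.getenv().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-flight helper: reports which of the required Operating System
 * Environment Variables are missing or blank.
 */
final class EnvironmentCheck {

    // The two names this solution relies on (hardcoded in the routine as well)
    static final String[] REQUIRED = {"FILEPATH", "ENCRYPTIONKEY"};

    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name); // absent or holds only whitespace
            }
        }
        return missing;
    }
}
```

Calling EnvironmentCheck.missingVariables(System.getenv()) from a small utility job or a deployment script gives an empty list on a correctly configured server.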

The Properties File

This is a simple flat file which holds the connection details of the database holding the Context variables. It is pointed to by the “FILEPATH” Operating System Environment Variable. The keys (variable names) I am using in this example are pretty generic and are loosely related to the required Implicit Context Load parameters. Since they are referenced elsewhere, it makes sense to keep them and just change the values. If you do want to rename them, make sure you also change any names I have hardcoded in the Routine (TalendContextEnvironment, for example). The file format can be seen below:

TalendContextAdditionalParams=instance=TALEND_DEV
TalendContextDbName=context_db
TalendContextEnvironment=DEV
TalendContextHost=MyDBHost
TalendContextPassword=ENC(4mW0zXPwFQJu/S6zJw7MIJtHPnZCMAZB)
TalendContextPort=1433
TalendContextUser=TalendUser
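Two details of the .properties format are worth knowing here. Only the first “=” on a line separates the key from the value, so TalendContextAdditionalParams holds instance=TALEND_DEV; and a plain java.util.Properties object returns the ENC(...) wrapper untouched, because decryption is the job of the EncryptableProperties object used in the routine. This small sketch (the class name PropertiesFormatDemo is just for illustration) loads some of the lines above with the standard Properties class:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Loads .properties text with plain java.util.Properties to show how each line is split. */
final class PropertiesFormatDemo {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // same parsing rules as loading from a file
        } catch (IOException e) {
            // A StringReader should never fail, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

One more format rule: a backslash is an escape character in .properties files, so any Windows-style path or value containing “\” must have it doubled.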

The ImplicitContextUtils Routine

I have written a basic routine which allows the Implicit Context Load settings to be set to values supplied in your properties file (pointed to by the “FILEPATH” Operating System Environment Variable). The routine can be seen below:

package routines;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.jasypt.encryption.pbe.StandardPBEStringEncryptor;
import org.jasypt.properties.EncryptableProperties;

/*
 * This routine is used to point the Implicit Context Load to the correct database and select the
 * correct Context variable environment. It is also used to automatically decide whether to supply
 * Context variables (for Parent jobs only).
 */
public class ImplicitContextUtils {

    // A static Properties object used to hold the database connection settings in memory
    // after they have been read from the properties file.
    private static Properties properties;

    /**
     * getImplicitContextParameterValue: used to return the appropriate parameter for the Implicit Context Load
     * configuration.
     *
     * {talendTypes} String
     *
     * {Category} Implicit Context Load
     *
     * {param} string("TalendContextDbName") parameter: the parameter name to be returned
     * {param} string("agh565") rootPID: the root process id
     * {param} string("agh565") jobPID: the job process id
     *
     * {example} getImplicitContextParameterValue("TalendContextDbName", "adfr54", "adfr54") # returns "context_db"
     */
    public static String getImplicitContextParameterValue(String parameter, String rootPID, String jobPID) {
        // If the properties are null, call the getProperties method to populate them
        if (properties == null) {
            getProperties();
        }

        // Guard against a missing file or variable: return "" rather than throw a NullPointerException
        String returnVal = (properties == null) ? "" : properties.getProperty(parameter, "");

        // Handles formatting the environment WHERE clause condition when the
        // TalendContextEnvironment parameter is requested
        if (parameter.trim().equalsIgnoreCase("TalendContextEnvironment")) {
            if (!jobPID.equals(rootPID)) {
                // Not a parent job: "1=0" is always false, so no data will be returned
                returnVal = "env='" + returnVal + "' AND 1=0";
            } else {
                returnVal = "env='" + returnVal + "'";
            }
        }
        return returnVal;
    }

    /**
     * getProperties: used to populate the properties variable from the file named by the
     * FILEPATH Environment Variable, decrypting ENC(...) values with the ENCRYPTIONKEY value
     *
     * {talendTypes} void
     *
     * {Category} Implicit Context Load
     *
     * {example} getProperties() # populates the properties variable
     */
    private static synchronized void getProperties() {
        if (properties != null) {
            return; // already populated by another caller
        }
        String propFile = getEnvironmentVariable("FILEPATH");
        String encryptionKey = getEnvironmentVariable("ENCRYPTIONKEY");
        if (propFile == null || encryptionKey == null) {
            return; // getEnvironmentVariable has already reported which variable is missing
        }

        // First, create the encryptor for decrypting the values in the .properties file
        StandardPBEStringEncryptor encryptor = new StandardPBEStringEncryptor();
        encryptor.setAlgorithm("PBEWithMD5AndDES");
        encryptor.setPassword(encryptionKey);

        // Create our EncryptableProperties object. This is used to decrypt
        // any values surrounded by "ENC(" and ")"
        Properties loaded = new EncryptableProperties(encryptor);

        // try-with-resources closes the file even if load() fails
        try (InputStream fileInput = new FileInputStream(propFile)) {
            loaded.load(fileInput);
            properties = loaded;
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }
    }

    /**
     * getEnvironmentVariable: used to retrieve Operating System Environment Variables
     *
     * {talendTypes} String
     *
     * {Category} Implicit Context Load
     *
     * {param} string("FILEPATH") variableName: the name of the Environment Variable to be returned
     *
     * {example} getEnvironmentVariable("FILEPATH") # returns the path of the properties file
     */
    public static String getEnvironmentVariable(String variableName) {
        String returnVal = System.getenv(variableName);
        if (returnVal == null || returnVal.trim().isEmpty()) {
            System.out.println(variableName + " does not exist or holds no value");
            return null;
        }
        return returnVal;
    }
}

There are 4 key parts to this routine which need some explaining.

The “getEnvironmentVariable” Method

This method is a public static method and is used solely to retrieve Operating System Environment Variables. We have two Operating System Environment Variables configured in this example: “FILEPATH” and “ENCRYPTIONKEY”. The “getProperties” method uses it to retrieve both values: “FILEPATH” to locate your database connection settings properties file, and “ENCRYPTIONKEY” to decrypt the encrypted values within it.

The “getProperties” Method

This method is a private static method (mainly because it is not expected to be used on its own by anything outside this Routine) and is used to populate the “properties” variable with decrypted property values as key/value pairs. It uses the “getEnvironmentVariable” method to retrieve the Operating System Environment Variables we have set up. The names have been hardcoded within this routine, so your variables MUST use the same names, or you must change the hardcoded names.

You will also notice that I am using some slightly more complicated Jasypt code here. It creates a StandardPBEStringEncryptor initialized with the “ENCRYPTIONKEY” value and then reads the file named by “FILEPATH” into an EncryptableProperties object. That object decrypts any values wrapped in “ENC(” and “)” and makes them available via the “properties” variable.
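To put a value such as the password into the properties file as ENC(...), it has to be encrypted with the same algorithm and key. Jasypt’s distribution ships command-line scripts for this (encrypt.sh / encrypt.bat). As an alternative, here is a sketch using only the JDK’s javax.crypto classes. ASSUMPTION: the layout it produces (an 8-byte random salt prepended to the ciphertext, 1,000 key-obtention iterations, Base64 output) is what I believe Jasypt’s StandardPBEStringEncryptor uses by default, so verify that a value round-trips through the routine before relying on it. The class name PbeValueTool is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

/**
 * Sketch of a PBEWithMD5AndDES helper built on javax.crypto.
 * ASSUMPTION: salt + ciphertext, 1,000 iterations, Base64 is believed to match
 * Jasypt's StandardPBEStringEncryptor defaults; check against your Jasypt version.
 */
final class PbeValueTool {

    private static final String ALGORITHM = "PBEWithMD5AndDES";
    private static final int SALT_BYTES = 8;     // DES block size
    private static final int ITERATIONS = 1000;  // assumed Jasypt default
    private static final SecureRandom RANDOM = new SecureRandom();

    private static Cipher cipher(int mode, String key, byte[] salt) throws GeneralSecurityException {
        SecretKeyFactory factory = SecretKeyFactory.getInstance(ALGORITHM);
        SecretKey secret = factory.generateSecret(new PBEKeySpec(key.toCharArray()));
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(mode, secret, new PBEParameterSpec(salt, ITERATIONS));
        return cipher;
    }

    /** Encrypts plainText; wrap the result as ENC(...) in the properties file. */
    static String encrypt(String plainText, String key) {
        try {
            byte[] salt = new byte[SALT_BYTES];
            RANDOM.nextBytes(salt); // fresh salt, so the same input encrypts differently each time
            byte[] body = cipher(Cipher.ENCRYPT_MODE, key, salt)
                    .doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[SALT_BYTES + body.length];
            System.arraycopy(salt, 0, out, 0, SALT_BYTES);
            System.arraycopy(body, 0, out, SALT_BYTES, body.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Reverses encrypt(): splits off the salt and decrypts the remainder. */
    static String decrypt(String encoded, String key) {
        try {
            byte[] all = Base64.getDecoder().decode(encoded);
            byte[] salt = Arrays.copyOfRange(all, 0, SALT_BYTES);
            byte[] body = Arrays.copyOfRange(all, SALT_BYTES, all.length);
            return new String(cipher(Cipher.DECRYPT_MODE, key, salt).doFinal(body), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed (wrong key?)", e);
        }
    }
}
```

Bear in mind that PBEWithMD5AndDES is an old, weak algorithm: it keeps the password out of plain sight in the file, but it should not be regarded as strong protection.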

The “properties” Variable

The “properties” variable is a private static variable used to keep the database connection properties stored in memory so that the file only needs to be read once per job.
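The caching idea is simply “load on first use, then reuse the static field”. Here is a minimal sketch of it in isolation, with the loader injected as a Supplier so the behaviour can be shown without a real file; CachedProperties and its call counter are hypothetical and exist only for illustration.

```java
import java.util.Properties;
import java.util.function.Supplier;

/**
 * Minimal sketch of the "read once, keep in a static field" idea behind the
 * routine's properties variable.
 */
final class CachedProperties {

    private static Properties properties; // shared by every caller in this JVM
    private static int loaderCalls;       // only here to make the caching observable

    // synchronized so two threads cannot both see null and load the file twice
    static synchronized Properties get(Supplier<Properties> loader) {
        if (properties == null) {
            properties = loader.get();
            loaderCalls++;
        }
        return properties;
    }

    static synchronized int loaderCalls() {
        return loaderCalls;
    }
}
```

Strictly speaking, a static field is populated once per loaded class, that is once per JVM, so jobs launched in separate JVMs each read the file themselves.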

The “getImplicitContextParameterValue” Method

This method is a public static method used to retrieve a value by key from the “properties” variable, first calling the “getProperties” method if the variable has not yet been populated. This method is added to each of the Implicit Context Load parameter boxes, with the properties file key (for example, TalendContextHost) supplied as the first argument, followed by the root PID (Process ID) and the Job PID.

The reason for the Root PID and the Job PID is that we do not want to retrieve Context variables in child jobs. If we are passing our Context variables from the Parent Job to the Child Jobs, we may want to keep any changes which have taken place along the way. To allow this, I have put a little “hack” into the code above, which makes use of the Implicit Context Load’s “Query Condition” parameter.

I use this parameter to filter by the “env” (environment) column in our context_variables table. The property in the properties file for this is called “TalendContextEnvironment”. Again, I have hardcoded this name into my Routine since it will be consistent throughout my entire project, but you can change this if you want. When “TalendContextEnvironment” is passed into the “getImplicitContextParameterValue” method, the method returns a String for the “Query Condition”: a condition for the WHERE clause against the table we built earlier. If the rootPID and jobPID are the same, the method knows this is the Parent Job (rootPID and jobPID are only the same for the Parent Job) and returns something like the following:

env='DEV'

This allows the Implicit Context Load to query the database for the correct “env” value. If the rootPID and jobPID are different, however, we are dealing with a child job, and we do not want ANY Context variables returned. In this case, the method returns the following for the “TalendContextEnvironment” property:

env='DEV' AND 1=0

You will notice the addition of “1=0”. This is ALWAYS false, so no Context variables are returned, and the Job therefore keeps the Context values passed from the Parent Job.
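To see the effect of the two conditions side by side, here is an in-memory simulation. It does not parse SQL; it only mimics what “env='DEV'” versus “env='DEV' AND 1=0” does to a set of rows. The Row type, the sample keys and the PID values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory illustration of the parent/child Query Condition behaviour. */
final class QueryConditionDemo {

    static final class Row {
        final String env;
        final String key;
        final String value;

        Row(String env, String key, String value) {
            this.env = env;
            this.key = key;
            this.value = value;
        }
    }

    static List<Row> select(List<Row> table, String env, String rootPid, String pid) {
        boolean childJob = !pid.equals(rootPid); // mirrors the routine's jobPID/rootPID test
        List<Row> result = new ArrayList<>();
        for (Row row : table) {
            // env='DEV' must match; for a child job "AND 1=0" is false for every row
            if (row.env.equals(env) && !childJob) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Passing the same hardcoded String (for example "" and "") as both PIDs makes every job behave like a parent, which is the override described next.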

Now you may not wish to do this, in which case you can either modify the code or (more easily) pass the same hardcoded String as both the rootPID and jobPID arguments.

Hooking it all together

Now you should have your Operating System Environment Variables, your Properties file, your Talend Routine, and your Context variable database table. If you have all of these set up, you just need to configure your Implicit Context Load. You can do this per job, but it makes much more sense to do it for your whole project. To do it for your whole project, go to “File” in your Studio and select “Edit Project Properties”. The following screen should pop up.

I have partially configured this in the screenshot above. Notice that I have ticked the “Implicit tContextLoad” box to reveal “From File” and “From Database”. I have also selected “From Database” and set the “Property Type”, “Db Type” and “Db Version” for SQL Server (set yours to whichever database type you are using). These values cannot be dynamic, unfortunately.

To configure the rest of the parameters, you can use the values below (tweaked to your configuration if you have made changes). If you want to force the Implicit Context Load to run for child jobs, replace rootPid and pid with the same hardcoded String, for example "" and "".

Note: rootPid and pid are variables used internally by ALL Talend jobs, so they must be typed exactly as shown. rootPid clearly corresponds to the routine’s rootPID parameter; pid is less obvious, and corresponds to jobPID.

 

Host:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextHost",rootPid,pid)

 

Port:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextPort",rootPid,pid)

 

DB Name:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextDbName",rootPid,pid)

 

Additional Parameters:

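routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextAdditionalParams",rootPid,pid)

User:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextUser",rootPid,pid)

Password: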

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextPassword",rootPid,pid)

Note: the Password field hides what you enter, since it normally holds a plain-text password. To add this expression, click the ellipsis button next to the password field and enter the code above without quotes surrounding it.

 

Table Name:

"context_variables"

NB: At this point, I noticed that the table name had not been configured in the Properties File example I put together above. In my eagerness to get this final blog post of the series out, I had made a mistake and hardcoded it in the Implicit Context Load settings. I decided to leave it hardcoded to show that Routine methods and hardcoded values can be used side by side in the Implicit Context Load settings. This was not in any way because I didn’t want to revisit everything I had already written….honestly :). If you would rather not hardcode it, simply add a key to the Properties file to hold the table name, for example TalendContextTableName=context_variables. Then all you need to do is replace the hardcoded value above with code like the following:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextTableName",rootPid,pid)

Query Condition:

routines.ImplicitContextUtils.getImplicitContextParameterValue("TalendContextEnvironment",rootPid,pid)
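If you move the table name into the Properties file as the NB above suggests, a forgotten key would leave the Table Name setting empty. One way to soften that is a lookup with a fallback. The following is a minimal sketch; ParameterLookup and valueOrDefault are hypothetical names, not methods of the routine, and you would need to add something similar to the routine yourself.

```java
import java.util.Properties;

/**
 * Hypothetical variant of the routine's lookup: falls back to a hardcoded default
 * when the key is missing or blank.
 */
final class ParameterLookup {

    static String valueOrDefault(Properties props, String key, String defaultValue) {
        String value = (props == null) ? null : props.getProperty(key);
        // Treat a missing or whitespace-only entry the same way: use the default
        return (value == null || value.trim().isEmpty()) ? defaultValue : value.trim();
    }
}
```

This keeps the behaviour of the hardcoded "context_variables" as a safety net while still letting each server override it.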

Running a Job

Once your Implicit Context Load settings are populated, you can run the Job from the Studio, build the Job and run it from the command line (if the machine has the properties file and Operating System Environment Variables configured), or run it on a JobServer via TAC (again, so long as the properties file and Operating System Environment Variables are configured on the JobServer machine). Wherever the Job is run, you simply have to ensure that the server has the correct Environment Variables and Properties file on it.
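Since everything hinges on each server having a complete properties file, a quick check before the first run can save some head-scratching. This sketch is hypothetical (PropertiesPreflight is not part of the routine); its key list mirrors the example file. TalendContextAdditionalParams is left out on the assumption that it may legitimately be empty, and you would extend the list if you add keys such as TalendContextTableName.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical pre-flight check: confirms the properties file defines every key the
 * Implicit Context Load settings ask the routine for.
 */
final class PropertiesPreflight {

    static final String[] REQUIRED_KEYS = {
        "TalendContextHost", "TalendContextPort", "TalendContextDbName",
        "TalendContextUser", "TalendContextPassword", "TalendContextEnvironment"
    };

    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

A plain Properties object loaded from the file named by FILEPATH is enough here: the check only tests that values are present, so the ENC(...) entries do not need decrypting.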

At first glance, this might appear to be quite complicated, but once it is set up, this solution allows you to build your jobs without caring about environments. You will KNOW that your jobs will run against the environment that is configured for the machine that they are running on.

There you have it! That's close to everything you need to know about using Context Variables with Talend. We have one more part coming out next week. Come back to finish the series!
