Best Practices for Using Context Variables with Talend: Part 1

A question I was regularly asked when working on different customer sites and answering questions on forums was “What is the best practice when using context variables?”

My years of working with Talend have led me to work with context variables in a way that minimizes the effort I need to put into ongoing maintenance and moving them between environments. This blog series is intended to give you an insight into the best practices I use as well as highlight the potential pitfalls that can arise from using the Talend context variable functionality without fully understanding it.

Contexts, Context Variables and Context Groups

To start, I want to ensure that we are all on the same page with regard to terminology. The word “Context” is used in three ways in Talend:

  • Context variable: A variable which can be set either at compile time or runtime. It can be changed and allows variables which would otherwise be hardcoded to be more dynamic.
  • Context: The environment or category of the value held by the context variable. Most of the time Contexts are DEV, TEST, PROD, UAT, etc. This allows you to set up one context variable and assign a different value per environment.
  • Context Group: A group of context variables which are packaged together for ease of use. Context Groups can be dragged and dropped into jobs so that you do not have to set up the same context variables in different jobs. They can also be updated (added to) in one location and then the changes can be distributed to the jobs that use those Context Groups.
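To make the relationship between the three terms concrete, here is a minimal Java sketch. It is not Talend's generated code. The class `ContextTerms`, the methods `databaseGroup` and `resolve`, the variable names `db_host` and `db_port`, and the host names are all hypothetical and exist only to illustrate the definitions above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not Talend's generated code: it only illustrates
// how a Context Group, its context variables, and Contexts relate.
public class ContextTerms {

    // A Context Group: variable name -> (Context name -> value).
    static Map<String, Map<String, String>> databaseGroup() {
        Map<String, Map<String, String>> group = new LinkedHashMap<>();
        // One context variable, with a different value per Context.
        group.put("db_host", Map.of(
                "DEV", "dev-db.example.com",
                "PROD", "prod-db.example.com"));
        // A variable may also hold the same value in every Context.
        group.put("db_port", Map.of(
                "DEV", "5432",
                "PROD", "5432"));
        return group;
    }

    // Resolve one context variable for the chosen Context.
    static String resolve(Map<String, Map<String, String>> group,
                          String variable, String context) {
        return group.get(variable).get(context);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> group = databaseGroup();
        System.out.println(resolve(group, "db_host", "DEV"));
        System.out.println(resolve(group, "db_host", "PROD"));
    }
}
```

The point of the model is that the Context is only a selector: the job always asks for `db_host`, and the chosen Context decides which value comes back.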

I’ve found that many people refer to “context variables” as “contexts”. When the terms are mixed up like this in online discussions, it really can confuse the issue. So, now that we have a common set of definitions, let’s move forward.

Potential Pitfalls with Contexts

While context variables are incredibly useful when working with Talend, they can also introduce some unforeseen problems if not fully understood. In my experience, the biggest cause of problems is the Contexts themselves. Quite simply, I do not use anything but a Default Context.

At the beginning of your Talend journey, Contexts come across as a genius idea: developers build against one environment, using that environment’s context variable values, and when the code is ready to test, they change the Context at the flick of a switch. That is true (kind of), but mainly for smaller data integration jobs. More often than not, Contexts expose developers and testers to horrible and time-consuming unexpected behavior. Below is just one scenario demonstrating this.

Let’s say a developer has built a job which uses a Context Group configured to supply database connection parameters. She has set up 4 Contexts (DEV1, DEV2, TEST and PROD) and has configured the different Context Variable values for each Context. In her main job, she reads from the database and then passes some of the data to Child Jobs using tRunJob components. Some of these Child Jobs have their own Child Jobs and all Child Jobs make use of the database. Thus, all jobs make use of the Context Group holding the database credentials. While she is developing, she sets the Context within the tRunJobs to DEV1. This is great. She can debug her Job until she is happy that it is working. However, she needs to test on DEV2 because it has a slightly cleaner environment. When she runs the Parent Job, she changes the default Context from DEV1 to DEV2 and runs the Job. It seems to work, but she cannot see the database updates in her DEV2 database. Why? She then realizes that her Child Jobs are all defaulted to use DEV1 and not DEV2.
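The scenario above can be simulated with a small Java sketch. This is a hypothetical model rather than anything Talend generates: the `Job` class, its `run` method, and `runParentWith` are invented names, and the model assumes that each Child Job simply runs with the Context configured on its own tRunJob, ignoring whatever the Parent Job was started with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the parent/child scenario; Job and run()
// are illustrative names, not Talend classes.
public class ChildContextPitfall {

    static class Job {
        final String name;
        final String configuredContext; // Context fixed in the job / tRunJob settings
        final List<Job> children = new ArrayList<>();

        Job(String name, String configuredContext) {
            this.name = name;
            this.configuredContext = configuredContext;
        }

        // Records which Context each job actually runs with.
        void run(String requestedContext, List<String> log) {
            String effective = requestedContext != null
                    ? requestedContext
                    : configuredContext;
            log.add(name + "=" + effective);
            for (Job child : children) {
                // Assumption: the child does not receive the parent's choice
                // and falls back to its own configured Context.
                child.run(null, log);
            }
        }
    }

    static List<String> runParentWith(String context) {
        Job parent = new Job("Parent", "DEV1");
        Job child = new Job("Child", "DEV1");
        Job grandchild = new Job("Grandchild", "DEV1");
        child.children.add(grandchild);
        parent.children.add(child);

        List<String> log = new ArrayList<>();
        parent.run(context, log);
        return log;
    }

    public static void main(String[] args) {
        // prints [Parent=DEV2, Child=DEV1, Grandchild=DEV1]
        System.out.println(runParentWith("DEV2"));
    }
}
```

Only the Parent Job picks up DEV2. Every job below it still writes to DEV1, which is why the updates never appear where the developer is looking.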

Now, there are ways around this. She could ensure that all of her tRunJobs are set with the correct Context. But what if she has dozens of them? How long will that take? She could ensure that “Transmit whole context” is set in each tRunJob. But what happens if a Child Job is using a Context variable or Context Group that is not used by any of the Parent Jobs? We are back to the same problem of having to change all of the tRunJob Contexts. But this doesn’t affect us outside of the Talend Studio, right? Wrong.

If the developer compiled that job to use on the command-line, even if she sets “Apply Context to children jobs” on the Build Job page, all this does is hardcode all of the Child Jobs’ Contexts to that selected in the Context scripts drop down. When you run it, if you change the Context that the Job needs to run for, the Child Jobs stick with the one that has been compiled. The same thing happens in the Talend Administration Center (TAC) as well.

Now, this does have some uses. Maybe your Contexts are not for environments and you want to be able to use different Contexts within the same environment? That is a legitimate (if slightly unusual) scenario. There are other examples of these sorts of problems, but I think you get the idea.

In the early days of Talend, Contexts were brilliant. But these days (unless you have a particular use case where multiple Contexts are used within a single environment), there are better ways of handling Context variables for multiple environments. I’ll cover all of those ways and best practices in parts two and three.
