Best Practices for Using Context Variables with Talend – Part 1

  • Richard Hall
    With more than 10 years of data integration consulting experience (much of it spent implementing Talend), Richard really knows his stuff. He’s provided solutions for companies on 6 of the 7 continents and has consulted across many different market verticals. Richard is a keen advocate of open source software, which is one of the reasons he first joined Talend in 2012. He is also a firm believer in engaging developers in “cool ways”, which is why he looks for opportunities to demonstrate Talend’s uses with technologies found around the home. Hooking his Sonos sound system to Twitter via Talend, getting Google Home to call Talend web services, and controlling his TV with Talend calling universal plug and play services are a handful of examples. Prior to 2019, Richard had been running his own business providing Talend solutions. During that time he became a prominent contributor on Talend Community, providing examples of how to solve business problems and of how to do some of the cool stuff mentioned above. In 2019 he was invited to return to Talend as the Technical Community Manager.

A question I was regularly asked when working on different customer sites and answering questions on forums was “What is the best practice when using context variables?”

My years of working with Talend have led me to work with context variables in a way that minimizes the effort I need to put into ongoing maintenance and moving them between environments. This blog series is intended to give you an insight into the best practices I use as well as highlight the potential pitfalls that can arise from using the Talend context variable functionality without fully understanding it.

Contexts, Context Variables and Context Groups

To start, I want to ensure that we are all on the same page with regard to terminology. There are 3 ways “Context” is used in Talend:

  • Context variable: A variable whose value can be set either at compile time or at runtime. It lets values which would otherwise be hardcoded be changed without editing the job.
  • Context: The environment or category of the value held by the context variable. Most of the time Contexts are DEV, TEST, PROD, UAT, etc. This allows you to set up one context variable and assign a different value per environment.
  • Context Group: A group of context variables which are packaged together for ease of use. Context Groups can be dragged and dropped into jobs so that you do not have to set up the same context variables in different jobs. They can also be updated (added to) in one location and then the changes can be distributed to the jobs that use those Context Groups.
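
To make the three terms concrete, here is a minimal Java sketch of how they relate. It is only an illustration of the definitions above and not Talend's API: the class `ContextGroupSketch`, its methods, the variable name `db_host` and the host names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the terminology above; this is not Talend's API.
final class ContextGroupSketch {
    // Context name (e.g. "DEV") -> context variable name -> value.
    private final Map<String, Map<String, String>> valuesByContext = new LinkedHashMap<>();

    // Assign a value to one context variable within one Context.
    void set(String context, String variable, String value) {
        valuesByContext.computeIfAbsent(context, k -> new LinkedHashMap<>()).put(variable, value);
    }

    // Look up the value a context variable holds in the chosen Context.
    String resolve(String context, String variable) {
        Map<String, String> values = valuesByContext.get(context);
        if (values == null || !values.containsKey(variable)) {
            throw new IllegalArgumentException(
                "No value for '" + variable + "' in Context '" + context + "'");
        }
        return values.get(variable);
    }

    public static void main(String[] args) {
        // One context variable (db_host) with a different value per Context.
        ContextGroupSketch database = new ContextGroupSketch();
        database.set("DEV", "db_host", "dev-db.example.internal");
        database.set("PROD", "db_host", "prod-db.example.internal");
        System.out.println(database.resolve("PROD", "db_host"));
    }
}
```

In this model the Context Group is the whole object, a Context is one of the outer keys, and a context variable is one of the inner keys.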

I’ve found that many people refer to “context variables” as “contexts”. Using the terms interchangeably, particularly in online discussions, makes it hard to tell which one is meant. So, now that we have a common set of definitions, let’s move forward.

Potential Pitfalls with Contexts

While context variables are incredibly useful when working with Talend, they can also introduce some unforeseen problems if not fully understood. In my experience, the biggest cause of problems is the Contexts themselves. Quite simply, I do not use anything but a Default Context.

At the beginning of your Talend journey, Contexts come across as a genius idea: developers build against one environment, using that environment’s context variable values, and then, when the code is ready to test, change the Context at the flick of a switch. That is true (kind of), but mainly for smaller data integration Jobs. More often than not, Contexts open up developers and testers to horrible and time-consuming unexpected behavior. Below is just one scenario demonstrating this.

Let’s say a developer has built a Job which uses a Context Group configured to supply database connection parameters. She has set up 4 Contexts (DEV1, DEV2, TEST and PROD) and has configured the Context variable values for each Context. In her main Job, she reads from the database and then passes some of the data to Child Jobs using tRunJob components. Some of these Child Jobs have their own Child Jobs, and all Child Jobs make use of the database. Thus, all Jobs use the Context Group holding the database credentials. While she is developing, she sets the Context within the tRunJobs to DEV1. This is great. She can debug her Job until she is happy that it is working. However, she needs to test on DEV2 because it has a slightly cleaner environment. When she runs the Parent Job, she changes the default Context from DEV1 to DEV2 and runs the Job. It seems to work, but she cannot see the database updates in her DEV2 database. Why? She then realizes that her Child Jobs are all still set to use DEV1, so their updates went to the DEV1 database instead.
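
The scenario can be mimicked with a small Java simulation. This is a sketch under simplifying assumptions and not the code Talend generates: `JobSketch`, `ChildCall` and the Job names are hypothetical, and each tRunJob is reduced to "run this Child Job with the Context that was selected on the component".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the scenario; this is not Talend-generated code.
final class JobSketch {
    // Stands in for a tRunJob component: a Child Job plus the Context selected on it.
    static final class ChildCall {
        final JobSketch child;
        final String pinnedContext;

        ChildCall(JobSketch child, String pinnedContext) {
            this.child = child;
            this.pinnedContext = pinnedContext;
        }
    }

    private final String name;
    private final List<ChildCall> children = new ArrayList<>();

    JobSketch(String name) {
        this.name = name;
    }

    JobSketch callChild(JobSketch child, String pinnedContext) {
        children.add(new ChildCall(child, pinnedContext));
        return this;
    }

    // Run this Job with the given Context and report which Context each Job used.
    List<String> run(String context) {
        List<String> log = new ArrayList<>();
        runInto(context, log);
        return log;
    }

    private void runInto(String context, List<String> log) {
        log.add(name + " -> " + context);
        for (ChildCall call : children) {
            // The Child Job ignores the Context of its parent and uses the pinned one.
            call.child.runInto(call.pinnedContext, log);
        }
    }

    public static void main(String[] args) {
        JobSketch grandchild = new JobSketch("LoadAddresses");
        JobSketch child = new JobSketch("LoadCustomers").callChild(grandchild, "DEV1");
        JobSketch parent = new JobSketch("LoadOrders").callChild(child, "DEV1");
        // Prints: [LoadOrders -> DEV2, LoadCustomers -> DEV1, LoadAddresses -> DEV1]
        System.out.println(parent.run("DEV2"));
    }
}
```

Only the Parent Job follows the Context chosen at run time. Every Job below it keeps writing to DEV1.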

Now, there are ways around this. She could ensure that all of her tRunJobs are set with the correct Context. But what if she has dozens of them? How long will that take? She could ensure that “Transmit whole context” is set in each tRunJob. But what happens if a Child Job is using a Context variable or Context Group that is not used by any of the Parent Jobs? We are back to the same problem of having to change all of the tRunJob Contexts. But this doesn’t affect us outside of the Talend Studio, right? Wrong.
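
The gap left by “Transmit whole context” can be sketched the same way. The Java below is a simplified, hypothetical model and not Talend's implementation: it assumes the Child Job starts from the values of the Context selected on its tRunJob and that the parent's values replace only those variables the parent also has. The method name `effectiveChildValues` and the variables `db_host` and `archive_dir` are made up for illustration, and variables the child does not declare are left out of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the behavior described above; not Talend's implementation.
final class TransmitContextSketch {
    // Values the Child Job ends up with when the parent transmits its whole context.
    static Map<String, String> effectiveChildValues(
            Map<String, String> childPinnedValues, Map<String, String> parentValues) {
        // Start from the Context selected on the tRunJob (for example DEV1).
        Map<String, String> effective = new LinkedHashMap<>(childPinnedValues);
        for (Map.Entry<String, String> entry : parentValues.entrySet()) {
            // Only variables the child also declares are overridden by the parent.
            if (effective.containsKey(entry.getKey())) {
                effective.put(entry.getKey(), entry.getValue());
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> childDev1 = new LinkedHashMap<>();
        childDev1.put("db_host", "dev1-db.example.internal");
        childDev1.put("archive_dir", "/data/dev1/archive"); // used only by the child

        Map<String, String> parentDev2 = new LinkedHashMap<>();
        parentDev2.put("db_host", "dev2-db.example.internal");

        // db_host follows the parent (DEV2); archive_dir stays on the DEV1 value.
        System.out.println(effectiveChildValues(childDev1, parentDev2));
    }
}
```

Under these assumptions the shared variable follows the parent, while the variable known only to the Child Job still takes its value from the Context pinned on the tRunJob.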

If the developer builds that Job to run from the command line, even if she sets “Apply Context to children jobs” on the Build Job page, all this does is hardcode the Context of every Child Job to the one selected in the Context scripts drop-down. If you then run the Job with a different Context, the Child Jobs stick with the one they were built with. The same thing happens in the Talend Administration Center (TAC).

Now, this does have some uses. Maybe your Contexts are not for environments and you want to be able to use different Contexts within the same environment? That is a legitimate (if slightly unusual) scenario. There are other examples of these sorts of problems, but I think you get the idea.

In the early days of Talend, Contexts were brilliant. But these days (unless you have a particular use case where multiple Contexts are used within a single environment), there are better ways of handling Context variables for multiple environments. I’ll cover those ways and the related best practices in parts two and three of this blog series, coming out next week. Until next time!
