Data Preparation: Empowering The Business User
As companies become more and more “data-driven”, a growing number of business users with limited programming knowledge are taking an interest in data integration and data quality. From marketing to logistics, customer service to executive management, HR to finance, data analysis has become a means for every department to improve the productivity and efficiency of its processes. However, even with the cloud-based graphical development environments offered by vendors like Talend, these tools remain largely reserved for IT professionals and specialists in developing data integration jobs.
For example, a marketing manager who wants to launch a campaign today has to go to the IT department to obtain specifically targeted and segmented data. The marketing manager has to spend time describing the requirements in detail, the IT department has to set aside time to develop the project, and then both have to run initial tests to validate that the result is relevant. At a time when reaction time means everything, and against a backdrop of global competition in which real time has become the norm, this process is no longer a valid option.
Business managers simply don't have time to waste and need shared self-service tools to help them reach their goals. The widespread use of Excel is proof. Business users do their best to make their data usable, which means they spend 70 to 80% of their time preparing it, with no assurance that the resulting data is of good quality. Furthermore, the lack of centralized governance creates risks in the very use of the data, including privacy and compliance issues and even problems with usage rights (such as licensing issues).
These constraints are very common, and users need specific tools to handle enrichment, quality and problem detection. Intended for business users, this new type of data preparation solution must be based on a shared, Excel-like interface and must offer a broad spectrum of data quality functions. In addition, it must offer visualization and, it goes without saying, transformation functions that are easier to use than Excel macros and specialized for the most commonly used domains, in order to ensure adoption by business users.
For example, by offering a semantic recognition function, the solution could enable automatic model detection and categorization while simultaneously flagging potentially missing or non-compliant values. A visual representation mode based on color codes and pictograms would also help users better understand their data. In addition, an automatic data class recognition function (personal information, social security number, credit card number, URL, email, etc.) would further facilitate the user's task.
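To make the idea of data class recognition concrete, here is a minimal sketch of how a column of values might be classified and its missing or non-compliant cells flagged. It is not how any particular product works: the pattern table, the function names and the "dominant class" rule are all assumptions made for illustration, and real tools would rely on much richer dictionaries and validators.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns (assumptions for illustration only).
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$", re.IGNORECASE),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def classify_value(value):
    """Return the name of the first pattern the value matches, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value):
            return name
    return None


def profile_column(values):
    """Guess a column's dominant class and flag missing or non-compliant cells.

    Values are assumed to be strings or None.
    """
    cleaned = [(v or "").strip() for v in values]
    # Empty cells are reported as missing, by row index.
    missing = [i for i, v in enumerate(cleaned) if not v]
    # Classify every non-empty cell.
    classes = {i: classify_value(v) for i, v in enumerate(cleaned) if v}
    counts = Counter(c for c in classes.values() if c is not None)
    dominant = counts.most_common(1)[0][0] if counts else None
    # Non-empty cells that do not match the dominant class are non-compliant.
    invalid = [i for i, c in classes.items() if c != dominant] if dominant else []
    return {"class": dominant, "missing": missing, "invalid": invalid}
```

Calling `profile_column` on a column of mostly email addresses would report the class as "email" along with the row indexes of empty and non-matching cells, which is the kind of information an interface could then render with color codes and pictograms.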
But if the company is content with providing self-service tools, it addresses only part of the challenge and neglects the issues related to the lack of data governance. The IT department, as competent as it may be, generally controls the data, which sometimes leads to a “Tower of Babel” when users each extract their own version of the original data. A data inventory function would address this: in companies open to “self-service”, data sets could be itemized or certified by the IT department, yet managed directly by business users. This would enable the implementation of a truly centralized and collaborative platform, giving access to secure and reliable data while reducing the proliferation of different versions.
What's more, this shared and centralized platform can help IT control the use of data through indicators such as the popularity of data sets and the monitoring of their use, or through alerts programmed to detect problems with data quality, compliance or privacy as early as possible. Tracking is the first step in a good governance plan. All in all, it is a win-win situation: business users are happy to have access to self-service data sets and to be self-reliant and agile in carrying out the data transformations their work requires; at the same time, IT can delegate more to its users while putting the conditions for good data governance in place.
However, a new pitfall of the “self-service” model is that it encourages a new type of proliferation: that of personal preparation scripts. In reality, many preparations can be automated, such as recurring operations that have to be carried out every month or every quarter, like the accounting close. This is what we refer to as “operationalization”: the IT department puts recurring data preparations into production, and their output can then be retrieved as a verified, certified and official flow of information. By operationalizing their preparations, users benefit from the reliability guarantees of the information system, including for very large volumes, and at a fraction of the cost thanks to Hadoop. In the end, this virtuous circle meets the dual need highlighted by companies: the reactivity (even proactivity) of business users, who have to make decisions in less and less time, and the IT department's need for governance and a rationalized architecture.
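One way to picture operationalization is as a preparation saved once as an ordered list of steps and then replayed on every new batch, instead of living in someone's personal script. The sketch below is only an illustration under that assumption: the step functions, the field names and the recipe itself are hypothetical, and a production system would run such a recipe on its own scheduling and processing infrastructure.

```python
# Each step takes a list of row dictionaries and returns a new list.
# All step and field names here are hypothetical examples.

def trim_whitespace(rows):
    """Strip leading and trailing spaces from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_rows_missing(field):
    """Build a step that removes rows where `field` is empty or absent."""
    def step(rows):
        return [row for row in rows if row.get(field)]
    step.__name__ = f"drop_rows_missing_{field}"
    return step


def uppercase(field):
    """Build a step that upper-cases the string stored in `field`."""
    def step(rows):
        return [{**row, field: str(row.get(field, "")).upper()} for row in rows]
    step.__name__ = f"uppercase_{field}"
    return step


def run_recipe(recipe, rows):
    """Apply each step in order and keep a simple audit log of row counts."""
    log = []
    for step in recipe:
        before = len(rows)
        rows = step(rows)
        log.append((step.__name__, before, len(rows)))
    return rows, log


# A saved recipe that could be replayed at every monthly close.
MONTHLY_CLOSE_RECIPE = [
    trim_whitespace,
    drop_rows_missing("account"),
    uppercase("currency"),
]
```

Because the recipe is plain data (a list of steps), it can be reviewed, certified and scheduled centrally, while the audit log gives a trace of what each run did to the data.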