How To Operationalize Meta-Data in Talend with Dynamic Schemas
This blog addresses how to operationalize meta-data usage in data management with Talend. Files and tables have schema definitions, and those schemas hold information such as name, data type, and length. This information can be imported or keyed in with a visual editor like Talend Studio at design time; however, doing so can add a lot of extra work and is prone to errors. If you could extract this information from a known, governed source and use it at run time, you could automate the creation of tables or file structures. Once tested and verified, the tables can be manually imported if stored meta-data schemas are desired.

What is a schema?
A schema is the definition of the data formats, fields, types and lengths.

Persisted Schema
A persisted meta-data schema is created at design time in Talend.

Data Dictionary File Example
A data dictionary file can be used instead of defining a static schema in Talend Meta-Data. The data dictionaries used in this process will have a static meta-data schema sourced from a data modeling tool like Erwin or from a database schema. Your dictionaries should be transformed to a common format, which allows for more code reuse.

Schema for a Data Dictionary
A basic layout for a data dictionary uses Column Name, Type and Length. An exhaustive list of Talend dictionary items is given below.
- Load the data dictionary
- Define input and output as a single data item
  - String for Big Data
  - Dynamic for files, tables and internal schemas
- Use Java APIs to operationalize the use of data dictionaries
  - For non-Big-Data, use the Talend API, which will be demonstrated in subsequent examples
  - For Big Data, Java string utilities will be used
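
The String-based variant of these steps can be sketched in plain Java. This is a minimal illustration, not Talend's own API: the class `DataDictionary`, its methods, the comma-separated `name,type,length` dictionary layout, and the `|` record delimiter are all assumptions made for the example. It loads a dictionary and then uses it to name the fields of a record that arrives as a single String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: applies a data dictionary at run time.
class DataDictionary {

    /** One dictionary entry: column name, type and length. */
    static final class Column {
        final String name;
        final String type;
        final int length;

        Column(String name, String type, int length) {
            this.name = name;
            this.type = type;
            this.length = length;
        }
    }

    /** Parses lines of the assumed form "name,type,length"; blank and '#' lines are skipped. */
    static List<Column> load(List<String> lines) {
        List<Column> columns = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split(",", -1);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad dictionary line: " + line);
            }
            columns.add(new Column(parts[0].trim(), parts[1].trim(),
                    Integer.parseInt(parts[2].trim())));
        }
        return columns;
    }

    /** Splits a record held as one String into fields named by the dictionary. */
    static Map<String, String> parseRecord(String record, String delimiter, List<Column> columns) {
        // Pattern.quote keeps delimiters such as "|" from being read as regex syntax.
        String[] values = record.split(Pattern.quote(delimiter), -1);
        if (values.length != columns.size()) {
            throw new IllegalArgumentException(
                    "Expected " + columns.size() + " fields but found " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            row.put(columns.get(i).name, values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<Column> dict = load(Arrays.asList(
                "# name,type,length", "id,Integer,10", "name,String,50", "city,String,30"));
        System.out.println(parseRecord("42|Ada|London", "|", dict));
        // prints {id=42, name=Ada, city=London}
    }
}
```

Because the record travels as one String and the field names come from the dictionary, the same code can serve any file whose dictionary is in the common format.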
- Virtualize schemas for files
- Fixed (NOTE: There are Talend components to do this as well)
- Virtualize schemas for Tables
- Virtualize schemas for Big Data elements like HDFS or Hive Schemas
- Virtualize Internal Schemas used for Data services or Queues
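
For fixed-width files, the Length column of the dictionary is enough to cut a record into fields. The sketch below assumes plain Java rather than a Talend component; `FixedWidthSchema` and `slice` are hypothetical names, and it assumes fields are space-padded and laid out in dictionary order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Talend: slices a fixed-width record by dictionary lengths.
class FixedWidthSchema {

    /** Cuts the record into consecutive fields; names and lengths come from the dictionary. */
    static Map<String, String> slice(String record, String[] names, int[] lengths) {
        if (names.length != lengths.length) {
            throw new IllegalArgumentException("names and lengths must have the same size");
        }
        Map<String, String> row = new LinkedHashMap<>();
        int offset = 0;
        for (int i = 0; i < names.length; i++) {
            // Short records yield empty trailing fields instead of throwing.
            int end = Math.min(offset + lengths[i], record.length());
            String raw = offset < record.length() ? record.substring(offset, end) : "";
            row.put(names[i], raw.trim());
            offset += lengths[i];
        }
        return row;
    }

    public static void main(String[] args) {
        // Build a 21-character record: 5 + 10 + 6 characters, space-padded on the right.
        String record = String.format("%-5s%-10s%-6s", "00042", "Ada", "London");
        System.out.println(slice(record,
                new String[] {"id", "name", "city"}, new int[] {5, 10, 6}));
        // prints {id=00042, name=Ada, city=London}
    }
}
```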
- Dynamic Schemas for traditional files
- Dynamic Schemas for Big Data Files
- Dynamic Schemas for NoSQL Tables
- Operationalizing with the Meta Data Bridge (Available in a future release of Talend)
- Does your Talend use case favor code reuse and the use of governed 3rd-party data dictionaries?
  - Use Dynamic Schemas
- Does your Talend use case favor persisted meta-data? Is this meta-data the vehicle for schema governance?
  - Use Persisted Schemas
- Do you want to use dynamic schemas to create meta-data for testing and POCs which will eventually end up as persisted meta-data?
  - Use a Hybrid Approach by dynamically creating files or tables that can be imported as persisted schemas.
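
One way to picture the hybrid approach is to generate a table definition from the dictionary, create the table, and later import it as a persisted schema. The following is a rough sketch under stated assumptions: `DdlGenerator` is a hypothetical class, the type mapping covers only three illustrative dictionary types, and identifiers are taken as-is on the assumption that the dictionary comes from a governed source.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper, not part of Talend: turns dictionary entries into a CREATE TABLE statement.
class DdlGenerator {

    /** Maps an assumed dictionary type name to a generic SQL type. */
    static String sqlType(String type, int length) {
        switch (type.toLowerCase(Locale.ROOT)) {
            case "string":
                return "VARCHAR(" + length + ")";
            case "integer":
                return "INTEGER";
            case "date":
                return "DATE";
            default:
                throw new IllegalArgumentException("Unmapped dictionary type: " + type);
        }
    }

    /** Each dictionary entry is {column name, type, length}. Names are not quoted or validated. */
    static String createTable(String table, String[][] dictionary) {
        StringJoiner columns = new StringJoiner(", ", "CREATE TABLE " + table + " (", ")");
        for (String[] entry : dictionary) {
            columns.add(entry[0] + " " + sqlType(entry[1], Integer.parseInt(entry[2])));
        }
        return columns.toString();
    }

    public static void main(String[] args) {
        String[][] dictionary = {
            {"id", "Integer", "10"},
            {"name", "String", "50"},
            {"created", "Date", "0"}
        };
        System.out.println(createTable("customer", dictionary));
        // prints CREATE TABLE customer (id INTEGER, name VARCHAR(50), created DATE)
    }
}
```

The generated statement would still need to be adapted to the target database's type names before use.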