Talend Data Mapper, Spark and Electronic Data Interchange
We all know that mapping complex files can be cumbersome, and large files often cannot be processed in a day. Want to reduce complex data integration from days, or even weeks, to minutes? Talend Data Mapper is your answer.
In the video below, we show you how to read EDI (Electronic Data Interchange) data coming from health claims and quickly map it to a simple structure.
What exactly is EDI? Electronic Data Interchange is the transfer of documents or data from one computer to another in a standard format. Compared to other types of exchange, such as paper, electronic exchange reduces cost and errors and improves speed.
Standard formats are important in EDI. Because EDI documents must be processed by computers, a standard format must be used so the computer is able to understand the documents. Moreover, the format describes what each piece of information is.
Several EDI standards are in use, including ANSI ASC X12 and UN/EDIFACT. Each standard also has many different versions. Most businesses today use EDI translators to convert EDI documents into formats their internal systems can work with.
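To make the idea of a standard format concrete, here is a minimal Python sketch of how an X12-style document breaks down into segments and elements, and how a few of those elements can be mapped to a simple flat record. It is not how Talend Data Mapper works internally. The sample string is made up and heavily simplified (it is not a complete or valid interchange), the `*` and `~` delimiters are the commonly seen defaults rather than something read from an ISA header, and the function and field names (`parse_segments`, `flatten_claim`, `claim_id` and so on) are hypothetical.

```python
# Made-up, simplified X12-style fragment for illustration only.
# Real interchanges carry ISA/GS envelopes and many more segments.
SAMPLE = "ST*837*0001~CLM*CLAIM123*150.00~NM1*IL*1*DOE*JANE~SE*4*0001~"


def parse_segments(raw, segment_terminator="~", element_separator="*"):
    """Split raw EDI text into segments, each with an id and its elements."""
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final terminator
        elements = chunk.split(element_separator)
        segments.append({"id": elements[0], "elements": elements[1:]})
    return segments


def flatten_claim(segments):
    """Map a few segment elements to a flat record (hypothetical field names)."""
    record = {}
    for seg in segments:
        if seg["id"] == "CLM":
            record["claim_id"] = seg["elements"][0]
            record["amount"] = float(seg["elements"][1])
        elif seg["id"] == "NM1":
            record["last_name"] = seg["elements"][2]
            record["first_name"] = seg["elements"][3]
    return record


if __name__ == "__main__":
    print(flatten_claim(parse_segments(SAMPLE)))
```

Because the segment id (`CLM`, `NM1`) tells the program what each position means, the same small mapping can be applied to every document that follows the standard.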
At Talend, we have run load tests specifically on EDI X12, and we would like to share the results with you. On a 10-node Cloudera's Distribution including Apache Hadoop (CDH) 5.7.0 cluster with 288 vcores, 152 GB of EDI data were processed in only 13 minutes! One of the main benefits of using Apache Spark, as this test shows, is that it can map a very large number of EDI documents in parallel in a matter of minutes.
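For a rough sense of scale, the figures above can be turned into throughput numbers. The short Python sketch below does only that arithmetic. It assumes decimal units (1 GB = 1000 MB) and an even spread of work across nodes and vcores, neither of which is stated in the test description, so treat the results as back-of-the-envelope estimates.

```python
# Figures taken from the load test described above.
DATA_GB = 152
MINUTES = 13
NODES = 10
VCORES = 288

# Assumption: decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000


def throughput_mb_per_s(data_gb, minutes):
    """Aggregate throughput in MB/s for the whole cluster."""
    return data_gb * MB_PER_GB / (minutes * 60)


cluster_rate = throughput_mb_per_s(DATA_GB, MINUTES)
per_node_rate = cluster_rate / NODES    # assumes work is spread evenly
per_vcore_rate = cluster_rate / VCORES  # assumes work is spread evenly

if __name__ == "__main__":
    print(f"Cluster:   {cluster_rate:.1f} MB/s")
    print(f"Per node:  {per_node_rate:.1f} MB/s")
    print(f"Per vcore: {per_vcore_rate:.2f} MB/s")
```

Under those assumptions the cluster as a whole moves a little under 200 MB of EDI data per second, while each vcore handles well under 1 MB per second. The speed comes from running many modest mapping tasks in parallel.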
To summarize what you are about to see in the video, we are taking a complex data type, simplifying it and running the process on Apache Spark. This approach offers heightened efficiency and productivity while handling data. Watch the video below!