Friday, October 21, 2016

Initialize, Load and Save Data in ECL

In a previous article, I introduced HPCC and ECL, the data-centric declarative programming language for HPCC.  In this first article on ECL, I'll share what I have learned after experimenting with the language for a while.  Since ECL and HPCC are all about extracting, transforming and loading data, I'll demonstrate with a toy data set consisting of person names and dates of birth.  I'll keep using this data set in the following articles on ECL.

Without further ado, we first declare the format of the incoming data.

Record_Member_Raw := RECORD
    UNSIGNED8 Id;
    STRING15 LastName;
    STRING15 FirstName;
    STRING20 Birthdate;
END;

The syntax is rather intuitive. We're declaring a data row, or record, with 4 fields: an 8-byte unsigned integer, a 15-byte string, another 15-byte string and a 20-byte string.  ECL strings are space-padded, not null-terminated.  Also, ECL keywords and identifiers are not case-sensitive, so Record_Member_Raw is the same as record_member_raw, LastName is the same as lastname, and UNSIGNED8 is the same as unsigned8.  String values themselves are still compared case-sensitively.
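
As a quick check of the padding and case rules, here is a minimal sketch.  The name padded_name is made up purely for illustration.  The first two outputs should report 15 and 4, since LENGTH counts the padding while TRIM strips the trailing spaces first.

STRING15 padded_name := 'Data';

OUTPUT(LENGTH(padded_name));        // declared width, trailing spaces included

OUTPUT(LENGTH(TRIM(padded_name)));  // length once the trailing spaces are trimmed

output(PADDED_NAME);                // same definition; the case of keywords and names is ignored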

In the real world, you will get your data from an existing data source, but in this case I'm hardcoding it as an inline data set named members_file_raw:

members_file_raw := DATASET([
    {1,'Picard','Jean-Luc','July 13, 2305'},
    {2,'Riker','William','2335'},
    {3,'La Forge','Geordi','February 16, 2335'},
    {4,'Yar','Tasha','2337'},
    {5,'Worf','','2340'},
    {6,'Crusher','Beverly','October 13, 2324'},
    {7,'Troi','Deanna','March 29, 2336'},
    {8,'Data','','February 2, 2338'},
    {9,'Crusher','Wesley','July 29, 2349'},
    {10,'Pulaski','Katherine','2309'},
    {11,'O\'Brien','Miles','September 2328'},
    {12,'Guinan','','1293'}], Record_Member_Raw);

Our data set consists of 12 records.  The records are of type Record_Member_Raw.  To display the data set, or any recordset result, add the following code to your ECL script.

OUTPUT(members_file_raw);

OUTPUT(members_file_raw(lastname='Data' OR id=3));

OUTPUT(members_file_raw[1]);

OUTPUT(members_file_raw[1..3]); 

The first output dumps the entire data set.  The second selects only the records that meet the filter condition.  The third outputs only the first record.  Note that ECL indexing starts at 1, not 0.  An index can also be a range, as in the last output, which returns the first 3 records.  You can also save the output to a file:

OUTPUT(members_file_raw, ,'~FOO::BAR::Members', OVERWRITE);

The OUTPUT action will be used frequently in debugging ECL code.  To learn about what each ECL action does, the ECL Language Reference is your best (and only) source of help.
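
For example, when debugging it helps to label each result so it is easy to find in the workunit.  Here is a minimal sketch, continuing the script above and relying on the NAMED option of OUTPUT.  The result names MemberCount and CrusherFamily are made up for illustration.

OUTPUT(COUNT(members_file_raw), NAMED('MemberCount'));  // number of records in the data set

OUTPUT(members_file_raw(lastname = 'Crusher'), NAMED('CrusherFamily'));  // filtered records under their own label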

Remember that I said earlier that in the real world, you will get your data from an existing data source?  Well, now you have a data source: the file you just created.  To load the file, the command is:

loaded_members_file := DATASET('~FOO::BAR::Members', Record_Member_Raw, THOR);
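
To confirm that the load worked, you can output the loaded data set just like the inline one.  Here is a minimal sketch, assuming it runs as a separate job after the file above has been written.  It repeats the record definition and the load so that it stands on its own.

Record_Member_Raw := RECORD
    UNSIGNED8 Id;
    STRING15 LastName;
    STRING15 FirstName;
    STRING20 Birthdate;
END;

loaded_members_file := DATASET('~FOO::BAR::Members', Record_Member_Raw, THOR);

OUTPUT(loaded_members_file);         // the records read back from the file

OUTPUT(COUNT(loaded_members_file));  // should match the number of records saved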

In this article, I showed how to initialize, load and save data in ECL.  In the next article, I'll pre-process the data by parsing the date of birth into separate month, day and year fields.

