Thursday, October 20, 2016

First Encounter with HPCC and ECL

I have been trying out HPCC for a few days now.  HPCC is an open source big data platform developed by LexisNexis which can run on commodity computing clusters, including cloud services like Amazon AWS.  HPCC includes its own declarative programming language, ECL, for extracting, transforming and loading large-scale data.  Being declarative, ECL belongs to the same class of languages as SQL, as opposed to imperative languages like C/C++, Java, Python or PHP.  At first sight, though, ECL looks a bit like C++.  Perhaps that has to do with the fine-grained control ECL permits, such as file pointers and byte allocation, as we shall see.
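To make the declarative versus imperative distinction a little more concrete, here is a minimal sketch in Python rather than ECL.  The sample data and function names are made up for illustration and have nothing to do with HPCC itself.  The first function spells out every step; the second only describes the result wanted, which is closer in spirit to how SQL or ECL reads.

```python
# Hypothetical sample data, invented for this illustration only.
people = [
    {"name": "Carol", "age": 45},
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 27},
]


def over_30_imperative(rows):
    """Imperative style: state each step (loop, test, append, sort)."""
    result = []
    for row in rows:
        if row["age"] > 30:
            result.append(row["name"])
    result.sort()
    return result


def over_30_declarative(rows):
    """Declarative flavour: describe the result, not the steps."""
    return sorted(row["name"] for row in rows if row["age"] > 30)


print(over_30_imperative(people))
print(over_30_declarative(people))
```

Both functions return the same names.  The difference is only in how much of the "how" the programmer has to write down.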


Setting up

So how does one get started with HPCC quickly and for free?  For me, the path of least resistance was to download the HPCC virtual image and run it with VirtualBox.  This deploys a pre-configured, Linux-based HPCC guest server on your local machine for demo purposes.

Next, you'll want to install an IDE to interact with the server and speed up ECL programming.  Though an IDE is not strictly required and command line tools are available for download, many of the online tutorials assume you're using an IDE, in particular the ECL IDE for Windows.  I was lucky to have an old copy of Windows 7 lying around, on which I installed ECL IDE.  So now I have both the HPCC server and Windows 7 with ECL IDE running as guests on my Mac using VirtualBox, and everything is working okay (after some hiccups).


Getting Acquainted

While there are various learning resources available on the HPCC website, they are scattered across different pages.  It can be frustrating not knowing which ones to start with and in what order.  Also, some of the resources appear to be reserved for customers or require access credentials.  Hopefully, by the time you're reading this, the resources are better organized.

  1. In hindsight, I would start by reading Running HPCC in a Virtual Machine to help with the installation and usage of ECL IDE.

  2. To gain a little more insight into ECL, I read HPCC Data Tutorial and followed the short programming examples.

  3. Depending on your preference, you could watch some of the short tutorial videos.

  4. What has helped me the most so far is the ECL Programmers Guide.  It's my Rosetta stone to ECL.  I hope they will continue to expand the guide with more examples.  When reading the guide, you will need to consult the ECL Language Reference frequently.

I haven't read everything there is, and most likely there are other useful resources I haven't stumbled upon yet.  Hopefully, these are enough to get you started with HPCC.  In my next article, I'll share what I've learned so far about programming with ECL.
