Pig Latin (loading, counting and filtering records)

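Load data from the three CSV files. The ? in the path is a glob that matches exactly one character, so a single LOAD statement reads every file whose name fits the pattern. PigStorage(',') splits each line on commas.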

grunt> fd = load '/user/horton/flightdelays/flight_delays?.csv' using PigStorage(',');
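To make the two ideas in this step concrete, here is a small Python sketch of what they do. It is not Pig code. It assumes a hypothetical directory listing (the file names are made up for illustration) and uses Python's fnmatch, whose ? also matches exactly one character. The second helper, split_like_pigstorage (a hypothetical name), mimics the fact that PigStorage splits on the delimiter alone and does not interpret CSV quoting.

```python
import fnmatch


def matching_files(names, pattern):
    """Return the names matched by a glob pattern in which '?' stands for exactly one character."""
    return sorted(name for name in names if fnmatch.fnmatchcase(name, pattern))


def split_like_pigstorage(line, delimiter=","):
    """Split a line on the delimiter alone; quotes get no special treatment."""
    return tuple(line.rstrip("\n").split(delimiter))


# Hypothetical directory listing, for illustration only.
listing = [
    "flight_delays1.csv",
    "flight_delays2.csv",
    "flight_delays3.csv",
    "flight_delays10.csv",  # two characters where '?' allows one: no match
    "flight_delays.csv",    # zero characters: no match
    "notes.txt",
]

print(matching_files(listing, "flight_delays?.csv"))
# ['flight_delays1.csv', 'flight_delays2.csv', 'flight_delays3.csv']
print(split_like_pigstorage("2008,1,3,4,NA\n"))
# ('2008', '1', '3', '4', 'NA')
print(split_like_pigstorage('"Dallas, TX",NA'))
# ('"Dallas', ' TX"', 'NA')
```

The last line shows why a quoted field containing a comma would be split into two fields by PigStorage.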

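Group all the records into a single group. GROUP ... ALL does not group by any column; it puts every record of fd into one bag, which is what a total count needs.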

grunt> G = group fd all;
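A rough Python model of the difference between GROUP ... ALL and grouping by a column may help. The records below are hypothetical two-field tuples standing in for full rows, and group_all / group_by_first are illustrative names, not Pig functions. In Pig the key of the single ALL group is the string 'all'.

```python
from collections import defaultdict


def group_all(records):
    """Mimic GROUP ... ALL: one group, keyed 'all', holding every record."""
    return [("all", list(records))]


def group_by_first(records):
    """Mimic GROUP ... BY $0: one group per distinct first-column value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return list(groups.items())


# Hypothetical two-field records standing in for full rows.
fd = [("2007", "NA"), ("2008", "1343"), ("2008", "2003")]

G = group_all(fd)
print(len(G), G[0][0], len(G[0][1]))  # 1 all 3
print(len(group_by_first(fd)))        # 2
```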

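Now count the records with FOREACH ... GENERATE. Because G contains a single group, COUNT runs once over the bag of all records. COUNT skips tuples whose first field is null, so use COUNT_STAR instead if every record must be counted.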

grunt> wc = foreach G generate COUNT(fd.$0); 
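The null handling of COUNT versus COUNT_STAR can be sketched in Python as follows. This is a model, not Pig itself: None stands in for a Pig null, the records are hypothetical, and pig_count / pig_count_star are illustrative names. The projection mirrors fd.$0, which turns each record into a one-field tuple.

```python
def pig_count(bag):
    """Like Pig's COUNT: skip tuples whose first field is null (None here)."""
    return sum(1 for t in bag if t[0] is not None)


def pig_count_star(bag):
    """Like Pig's COUNT_STAR: count every tuple, nulls included."""
    return len(bag)


# Hypothetical records; None stands in for a Pig null in the first field.
fd = [("2008", "NA"), (None, "1343"), ("2008", "2003")]

# fd.$0 projects the first field, giving a bag of one-field tuples.
projected = [(record[0],) for record in fd]

print(pig_count(projected))       # 2
print(pig_count_star(projected))  # 3
```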

Verify the result by dumping the relation wc:

grunt> dump wc;



Now filter on the 5th column, “DEPARTURE TIME” ($4, since fields are numbered from $0), keeping only the records whose value is not “NA”:

grunt> fdf = filter fd by $4 != 'NA';
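Here is a Python sketch of what this filter keeps, assuming Pig's null handling as we understand it: a comparison involving null yields null, and FILTER passes a record only when its condition is true. The sample records are hypothetical, None models a Pig null, and treating a record with no fifth field as null is an assumption of the sketch. The helper names are illustrative.

```python
def pig_not_equal(a, b):
    """Pig-style comparison: if either side is null (None), the result is null."""
    if a is None or b is None:
        return None
    return a != b


def filter_not_na(records, index=4):
    """Keep a record only when the condition is true; null counts as not true."""
    kept = []
    for record in records:
        # Assumption: a record too short to have the field yields null.
        value = record[index] if index < len(record) else None
        if pig_not_equal(value, "NA") is True:
            kept.append(record)
    return kept


# Hypothetical records; the last one is too short to have a fifth field.
fd = [
    ("2008", "1", "3", "4", "NA"),
    ("2008", "1", "3", "4", "1343"),
    ("2008", "1", "3", "4", "2003"),
    ("2008", "1", "3"),
]

fdf = filter_not_na(fd)
print(len(fdf))  # 2
```

Under these assumptions both the “NA” rows and rows with a null departure time drop out.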

grunt> dump fdf;






Author: rajukv

Hadoop (Big Data) architect and Hadoop security architect who designs and builds Hadoop systems for a range of data science projects.
