Pig Latin (loading, counting records)

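Load data from 3 CSV files. The ? in the path is a glob that matches any single character, so one load statement reads all of them.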

grunt> fd = load '/user/horton/flightdelays/flight_delays?.csv' using PigStorage(',');
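Because the load statement has no schema, the fields of fd have no names and can only be referenced by position ($0, $4, and so on). The following is a minimal sketch of how a position could be given a name and a type with foreach. The alias fd_named and the field name dep_time are illustrative assumptions, not part of the original session.

```pig
-- Sketch only: fd_named and dep_time are illustrative names, not from the original post.
-- Project the fifth field ($4), cast it to chararray and give it a name.
fd_named = foreach fd generate (chararray)$4 as dep_time;

-- Print the schema of the new relation to confirm the name and type.
describe fd_named;
```

Other positions can be named the same way by adding them to the generate list.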

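Group all the records into a single group. Note that group ... all does not group by a column; it collects every record into one bag so that they can be counted together.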

grunt> G = group fd all;

Now count the records in that single group using foreach

grunt> wc = foreach G generate COUNT(fd.$0);

Verify the result by dumping the relation wc

grunt> dump wc;

OUTPUT:

(30000)
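One thing to keep in mind is that COUNT skips tuples whose first field is null, so a record with an empty first column would not be counted. If every record should be counted regardless, COUNT_STAR can be used instead. A minimal sketch, where the alias wc_all is an illustrative assumption:

```pig
-- Sketch only: wc_all is an illustrative alias, not from the original post.
-- COUNT_STAR counts every tuple in the bag, including those whose first field is null.
wc_all = foreach G generate COUNT_STAR(fd);
dump wc_all;
```

If the first column is never empty, both functions give the same number.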

Now let’s filter on the 5th column, “DEPARTURE TIME”, keeping only the records whose value is not “NA”

grunt> fdf = filter fd by $4 != 'NA';

grunt> dump fdf;

(2008,1,6,7,900,905,1009,1025,WN,469,N720WN,69,80,57,-16,-5,LAX,SFO,337,6,6,0,,0,NA,NA,NA,NA,NA)
(2008,1,6,7,2000,1955,2121,2115,WN,593,N720WN,81,80,63,6,5,LAX,SFO,337,5,13,0,,0,NA,NA,NA,NA,NA)
(2008,1,6,7,1624,1620,1742,1740,WN,618,N720WN,78,80,66,2,4,LAX,SFO,337,4,8,0,,0,NA,NA,NA,NA,NA)
(2008,1,6,7,1946,1805,2059,1930,WN,646,N283WN,73,85,61,89,101,LAX,SFO,337,4,8,0,,0,0,0,6,0,83)
(2008,1,6,7,1549,1430,1706,1550,WN,656,N283WN,77,80,68,76,79,LAX,SFO,337,3,6,0,,0,0,48,0,0,28)
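Dumping fdf prints every record that passed the filter, which can be a lot of output. A possible follow-up, sketched below, is to look at only a few records with limit and to count how many records remain after the filter. The aliases fdf_first, fdf_grp and fdf_cnt are illustrative assumptions.

```pig
-- Sketch only: fdf_first, fdf_grp and fdf_cnt are illustrative aliases.
-- Show at most 5 of the filtered records instead of all of them.
fdf_first = limit fdf 5;
dump fdf_first;

-- Count the records that passed the filter, using the same
-- group-all-then-count pattern as for the full data set.
fdf_grp = group fdf all;
fdf_cnt = foreach fdf_grp generate COUNT_STAR(fdf);
dump fdf_cnt;
```

Comparing this count with the total from wc shows how many records had “NA” as the departure time.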


Author: rajukv

Hadoop (Big Data) Architect and Hadoop Security Architect who designs and builds Hadoop systems to support various data science projects.
