PIG: filter by Destination and count the values

denver_total.pig

FDCLN = load '/user/horton/flightdelays_clean/part*' using PigStorage(',') as (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum, ArrDelay, Origin, Dest);

FDCLN_FLTR = FILTER FDCLN by Dest == 'DEN';

FDCLN_G = GROUP FDCLN_FLTR by Dest;

FDCLN_DEN_CNT = FOREACH FDCLN_G GENERATE COUNT(FDCLN_FLTR);

store FDCLN_DEN_CNT into '/user/horton/denver_total' using PigStorage();

Next, a second Pig script, "denver_late.pig", counts the flights into Denver that arrived 60 minutes or more late (ArrDelay >= 60).

denver_late.pig

FDCLN = load '/user/horton/flightdelays_clean/part*' using PigStorage(',') as (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum, ArrDelay, Origin, Dest);

FDCLN_FLTR = FILTER FDCLN by Dest == 'DEN' and ArrDelay >= 60;

FDCLN_G = GROUP FDCLN_FLTR by Dest;

FDCLN_DEN_CNT = FOREACH FDCLN_G GENERATE COUNT(FDCLN_FLTR);

store FDCLN_DEN_CNT into '/user/horton/denver_late' using PigStorage();
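
The two scripts above read the same input twice. As a rough sketch only, both counts could probably be produced in one pass with a nested FOREACH. The relation names (DEN, DEN_G, DEN_CNT), the output directory /user/horton/denver_counts, and the explicit int and chararray types on ArrDelay and Dest are assumptions made for illustration, not part of the original scripts.

```pig
-- Sketch: compute the total and late Denver counts in a single job.
-- Assumption: ArrDelay holds whole minutes, so it is declared as int.
FDCLN = LOAD '/user/horton/flightdelays_clean/part*' USING PigStorage(',')
        AS (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum,
            ArrDelay:int, Origin, Dest:chararray);

-- Keep only flights whose destination is Denver.
DEN = FILTER FDCLN BY Dest == 'DEN';

-- GROUP ALL collects every remaining row into a single bag.
DEN_G = GROUP DEN ALL;

-- Inside the nested block, filter the bag for late arrivals,
-- then count the full bag and the late bag side by side.
DEN_CNT = FOREACH DEN_G {
    late = FILTER DEN BY ArrDelay >= 60;
    GENERATE COUNT(DEN) AS total, COUNT(late) AS late_total;
};

-- Hypothetical output directory; one line of the form total,late_total.
STORE DEN_CNT INTO '/user/horton/denver_counts' USING PigStorage(',');
```

Note that COUNT skips rows whose first field is null, so both figures assume Year is populated in the cleaned data.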


Author: rajukv

Hadoop (Big Data) architect and Hadoop security architect who can design and build Hadoop systems to support a variety of data science projects.
