PIG: filter by Destination and count the values

denver_total.pig

FDCLN = load '/user/horton/flightdelays_clean/part*' using PigStorage(',') as (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum, ArrDelay, Origin, Dest);

FDCLN_FLTR = FILTER FDCLN by Dest == 'DEN';

FDCLN_G = GROUP FDCLN_FLTR by Dest;

FDCLN_DEN_CNT = FOREACH FDCLN_G GENERATE COUNT(FDCLN_FLTR);

store FDCLN_DEN_CNT into '/user/horton/denver_total' using PigStorage();

Next, a second Pig script, “denver_late.pig”, counts the flights to Denver that arrived 60 or more minutes late.

denver_late.pig

FDCLN = load '/user/horton/flightdelays_clean/part*' using PigStorage(',') as (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum, ArrDelay, Origin, Dest);

FDCLN_FLTR = FILTER FDCLN by Dest == 'DEN' and ArrDelay >= 60;

FDCLN_G = GROUP FDCLN_FLTR by Dest;

FDCLN_DEN_CNT = FOREACH FDCLN_G GENERATE COUNT(FDCLN_FLTR);

store FDCLN_DEN_CNT into '/user/horton/denver_late' using PigStorage();
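
Both scripts load and filter the same data, so the two counts could also come from a single pass. The following is only a sketch of that idea, not a replacement for the scripts above: it assumes ArrDelay always holds an integer (hence the added `ArrDelay:int` type), and the relation names, the field names `total_flights` and `late_count`, and the output path `/user/horton/denver_summary` are all made up for illustration.

```pig
-- Load once; ArrDelay is typed as int (an assumption) so the >= comparison is numeric.
FDCLN = LOAD '/user/horton/flightdelays_clean/part*' USING PigStorage(',')
        AS (Year, Month, DayofMonth, DepTime, UniqueCarrier, FlightNum,
            ArrDelay:int, Origin, Dest:chararray);

-- Keep only flights whose destination is Denver.
DEN = FILTER FDCLN BY Dest == 'DEN';

-- Put every remaining row into one group so it can be counted as a whole.
DEN_G = GROUP DEN ALL;

-- Nested FILTER: count all Denver flights and, separately, those 60+ minutes late.
DEN_CNT = FOREACH DEN_G {
    late_flights = FILTER DEN BY ArrDelay >= 60;
    GENERATE COUNT(DEN) AS total_flights, COUNT(late_flights) AS late_count;
};

-- Hypothetical output path; writes one line holding both counts.
STORE DEN_CNT INTO '/user/horton/denver_summary' USING PigStorage(',');
```

Note that COUNT skips tuples whose first field is null, so rows with an empty Year would not be counted here, just as in the original scripts.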

 

 

 
