
Pigitos is a set of tiny, but highly useful UDFs for Apache Pig.


kawaa/Pigitos


Pigitos

About

Pigitos is a set of tiny, but highly useful Java UDFs for Apache Pig.

Contents

UDFs for manipulating maps

Pigitos provides UDFs for manipulating maps, for example calculating the size of a map or retrieving its keys (or values, or key/value pairs) as a bag. Such UDFs are very useful when working with Apache HBase tables whose dynamically created column qualifiers hold meaningful information that you want to process.

There appear to be no such UDFs in Apache Pig itself or in the Piggybank library. I have found only UDFs such as TOBAG and TOTUPLE, and they do not take a map as an input parameter.

Currently, it contains the following UDFs:

  • MapSize – takes a map and returns the number of entries in the map
  • MapKeysToBag – takes a map and produces a bag that contains all keys from that map
  • MapValuesToBag – takes a map and produces a bag that contains all values from that map
  • MapEntriesToBag – takes a map and produces a bag of tuples, where each tuple consists of two fields, key and value, and corresponds to one key/value pair from the map
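The behaviour described in the list above can be illustrated with a small plain-Java sketch. This is not the Pigitos source and does not use the Pig API: the class name `MapUdfSketch`, the method names, and the sample data are hypothetical, a bag is modelled as a `List`, and a tuple as a two-element `List`. Pig bags are unordered, so the ordering seen here is only a property of the `LinkedHashMap` used for the demo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the four map UDFs' described behaviour.
// Hypothetical names; not the Pigitos implementation.
final class MapUdfSketch {

    // MapSize: the number of entries in the map.
    static int mapSize(Map<String, Object> map) {
        return map.size();
    }

    // MapKeysToBag: a bag (here a List) holding every key.
    static List<String> mapKeysToBag(Map<String, Object> map) {
        return new ArrayList<>(map.keySet());
    }

    // MapValuesToBag: a bag (here a List) holding every value.
    static List<Object> mapValuesToBag(Map<String, Object> map) {
        return new ArrayList<>(map.values());
    }

    // MapEntriesToBag: a bag of (key, value) tuples, one per map entry.
    static List<List<Object>> mapEntriesToBag(Map<String, Object> map) {
        List<List<Object>> bag = new ArrayList<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(entry.getKey());
            tuple.add(entry.getValue());
            bag.add(tuple);
        }
        return bag;
    }

    public static void main(String[] args) {
        // Made-up sample map standing in for an HBase column family.
        Map<String, Object> friendMap = new LinkedHashMap<>();
        friendMap.put("alice", 1L);
        friendMap.put("bob", 2L);

        System.out.println(mapSize(friendMap));
        System.out.println(mapKeysToBag(friendMap));
        System.out.println(mapValuesToBag(friendMap));
        System.out.println(mapEntriesToBag(friendMap));
    }
}
```

How the real UDFs treat a null or empty map is not stated here, so the sketch leaves that case unhandled.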

Here is a quick example:

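-- Load each user's row key plus all columns of the 'friend' family as a map
-- keyed by column qualifier.
User = LOAD 'hbase://user' USING HBaseStorage('friend:*', '-loadKey true')
  AS (username:chararray, friendMap:map[]);
-- Emit one (username, friendUsername) row per key in friendMap.
UserFriend = FOREACH User
  GENERATE username, FLATTEN(MapKeysToBag(friendMap)) AS friendUsername;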
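To make the result of the example above concrete, the following plain-Java sketch mimics what flattening the bag of keys does to a single input row: each key in the map becomes its own (username, friendUsername) row. It only illustrates the row shape and does not use Pig or Pigitos. The class name `FlattenKeysSketch`, the method `flattenKeys`, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of FLATTEN over a bag of map keys.
// Hypothetical names; not Pig or Pigitos code.
final class FlattenKeysSketch {

    // For one input row (username, friendMap), produce one
    // {username, friendUsername} row per key in friendMap.
    static List<String[]> flattenKeys(String username, Map<String, Object> friendMap) {
        List<String[]> rows = new ArrayList<>();
        for (String friendUsername : friendMap.keySet()) {
            rows.add(new String[] {username, friendUsername});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up row: user "carol" with two friend column qualifiers.
        Map<String, Object> friendMap = new LinkedHashMap<>();
        friendMap.put("alice", "2012-01-01");
        friendMap.put("bob", "2012-02-01");

        for (String[] row : flattenKeys("carol", friendMap)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In this sketch a user with an empty map contributes no output rows.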

Acknowledgements

Pigitos is primarily developed at the Centre for Open Science (CeON) at the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw (UW).
