In my first job I worked for a company that developed a management information system for UK police forces; the system produced the statutory HMIC (Her Majesty’s Inspectorate of Constabulary) reports and allowed OLAP exploration of the datasets loaded into cubes from the data warehouse tables.
One of the areas that I implemented was the key performance indicators for Road Traffic Collisions, so I was intrigued to discover that the fuller, anonymised STATS19 dataset is now available on data.gov.uk. If you’re interested in the STATS19 form, you can see it here.
Apache Pig is designed for analysing large data sets using a high-level language (Pig Latin) whose operations can be parallelised. Pig Latin scripts are compiled into sequences of MapReduce jobs that are executed on Hadoop.
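To illustrate the idea, here is a minimal Python sketch that simulates the map, shuffle and reduce stages a simple Pig Latin `GROUP ... BY` / `COUNT` script would be broken into. It runs in a single process, so it shows the shape of the computation rather than Pig's actual compiled plan, and the file name, field names and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# The Pig Latin being mimicked (hypothetical file and field names):
#   collisions  = LOAD 'collisions.csv' USING PigStorage(',') AS (id:chararray, severity:chararray);
#   by_severity = GROUP collisions BY severity;
#   counts      = FOREACH by_severity GENERATE group, COUNT(collisions);


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (key, 1) pair for each CSV record, keyed on severity."""
    for line in lines:
        _record_id, severity = line.strip().split(",")
        yield severity, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: gather all values that share a key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_severity(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages, as Hadoop would across many machines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["1,Slight", "2,Serious", "3,Slight", "4,Fatal", "5,Slight"]
    print(count_by_severity(sample))  # {'Slight': 3, 'Serious': 1, 'Fatal': 1}
```

On a real cluster the map and reduce functions run in parallel over partitions of the input, which is what lets Pig scale the same script from a sample file to the full dataset.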
This post collects some of the Apache Pig tips tweeted as “Apache #Pig tip of the day”.
Posted in Notes
Tagged big data, Hadoop, Pig
Grant Ingersoll started the conference with a keynote on search + big data, “it’s still all about the user”. It was a refreshing talk on keeping end users’ needs, both internal and external, in mind rather than fixating on specific technologies or technical bandwagons.