Update date for blog post
chimerasaurus committed May 27, 2016
1 parent 49ae0d9 commit 0a1d6c8
Showing 6 changed files with 29 additions and 14 deletions.
4 changes: 3 additions & 1 deletion _posts/2016-05-20-where-is-my-pcollection-dot-map.md
@@ -1,14 +1,16 @@
---
layout: post
title: "Where's my PCollection.map()?"
date: 2016-05-20 11:00:00 -0700
date: 2016-05-27 09:00:00 -0700
excerpt_separator: <!--more-->
categories: blog
authors:
- robertwb
---
Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.

<!--more-->

Though Beam is relatively new, its design draws heavily on many years of experience with real-world pipelines. One of the primary inspirations is [FlumeJava](http://research.google.com/pubs/pub35650.html), which is Google's internal successor to MapReduce first introduced in 2009.

The original FlumeJava API has methods like `count` and `parallelDo` on the PCollections. Though slightly more succinct, this approach has many disadvantages to extensibility. Every new user to FlumeJava wanted to add transforms, and adding them as methods to PCollection simply doesn't scale well. In contrast, a PCollection in Beam has a single `apply` method which takes any PTransform as an argument.
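The excerpt above argues that a PCollection should expose a single `apply` method and that every operation should be packaged as a transform object. The following is a minimal toy sketch of that shape only, written in plain Python with no Beam dependency. The classes `ToyPCollection`, `ToyTransform`, `ToyMap`, `ToyCount`, and `ToyCountWords` are hypothetical illustrations, not Beam's actual classes or API.

```python
class ToyTransform:
    """Hypothetical base class: an operation packaged as an object."""

    def expand(self, pcoll: "ToyPCollection") -> "ToyPCollection":
        raise NotImplementedError


class ToyPCollection:
    """Toy in-memory stand-in for a PCollection (not Beam's class)."""

    def __init__(self, elements):
        self.elements = list(elements)

    def apply(self, transform: ToyTransform) -> "ToyPCollection":
        # The single entry point: built-in and user-defined operations
        # are all passed in as transform objects.
        return transform.expand(self)


class ToyMap(ToyTransform):
    """Applies a function to every element."""

    def __init__(self, fn):
        self.fn = fn

    def expand(self, pcoll):
        return ToyPCollection(self.fn(x) for x in pcoll.elements)


class ToyCount(ToyTransform):
    """Produces a one-element collection holding the element count."""

    def expand(self, pcoll):
        return ToyPCollection([len(pcoll.elements)])


class ToyCountWords(ToyTransform):
    """A user-defined composite built from another transform;
    ToyPCollection needs no new method to support it."""

    def expand(self, pcoll):
        words = ToyPCollection(w for line in pcoll.elements for w in line.split())
        return words.apply(ToyCount())


lines = ToyPCollection(["to be", "or not to be"])
print(lines.apply(ToyMap(str.upper)).elements)  # ['TO BE', 'OR NOT TO BE']
print(lines.apply(ToyCountWords()).elements)    # [6]
```

In this sketch, adding an operation such as `ToyCountWords` requires no change to `ToyPCollection`, which is the extensibility point the excerpt makes against putting methods like `count` or `parallelDo` directly on the collection.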
@@ -13,7 +13,7 @@
<link rel="stylesheet" href="/css/theme.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js"></script>
<script src="/js/bootstrap.min.js"></script>
<link rel="canonical" href="http://beam.incubator.apache.org/blog/2016/05/20/where-is-my-pcollection-dot-map.html">
<link rel="canonical" href="http://beam.incubator.apache.org/blog/2016/05/27/where-is-my-pcollection-dot-map.html">
<link rel="alternate" type="application/rss+xml" title="Apache Beam (incubating)" href="http://beam.incubator.apache.org/feed.xml">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
@@ -99,14 +99,16 @@

<header class="post-header">
<h1 class="post-title" itemprop="name headline">Where's my PCollection.map()?</h1>
<p class="post-meta"><time datetime="2016-05-20T11:00:00-07:00" itemprop="datePublished">May 20, 2016</time> • Robert Bradshaw
<p class="post-meta"><time datetime="2016-05-27T09:00:00-07:00" itemprop="datePublished">May 27, 2016</time> • Robert Bradshaw
</p>
</header>

<div class="post-content" itemprop="articleBody">
<p>Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.
<p>Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.</p>

<!--more-->
Though Beam is relatively new, its design draws heavily on many years of experience with real-world pipelines. One of the primary inspirations is <a href="http://research.google.com/pubs/pub35650.html">FlumeJava</a>, which is Google’s internal successor to MapReduce first introduced in 2009.</p>

<p>Though Beam is relatively new, its design draws heavily on many years of experience with real-world pipelines. One of the primary inspirations is <a href="http://research.google.com/pubs/pub35650.html">FlumeJava</a>, which is Google’s internal successor to MapReduce first introduced in 2009.</p>

<p>The original FlumeJava API has methods like <code class="highlighter-rouge">count</code> and <code class="highlighter-rouge">parallelDo</code> on the PCollections. Though slightly more succinct, this approach has many disadvantages to extensibility. Every new user to FlumeJava wanted to add transforms, and adding them as methods to PCollection simply doesn’t scale well. In contrast, a PCollection in Beam has a single <code class="highlighter-rouge">apply</code> method which takes any PTransform as an argument.</p>

6 changes: 3 additions & 3 deletions content/blog/index.html
@@ -104,16 +104,16 @@ <h1 class="post-title">Apache Beam Blog</h1>
<p>This is the blog for the Apache Beam project. This blog contains news and updates
for the project.</p>

<h3 id="a-classpost-link-hrefblog20160520where-is-my-pcollection-dot-maphtmlwheres-my-pcollectionmapa"><a class="post-link" href="/blog/2016/05/20/where-is-my-pcollection-dot-map.html">Where’s my PCollection.map()?</a></h3>
<p><i>May 20, 2016 • Robert Bradshaw
<h3 id="a-classpost-link-hrefblog20160527where-is-my-pcollection-dot-maphtmlwheres-my-pcollectionmapa"><a class="post-link" href="/blog/2016/05/27/where-is-my-pcollection-dot-map.html">Where’s my PCollection.map()?</a></h3>
<p><i>May 27, 2016 • Robert Bradshaw
</i></p>

<p>Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.</p>

<!-- Render a "read more" button if the post is longer than the excerpt -->

<p>
<a class="btn btn-default btn-sm" href="/blog/2016/05/20/where-is-my-pcollection-dot-map.html#read-more" role="button">
<a class="btn btn-default btn-sm" href="/blog/2016/05/27/where-is-my-pcollection-dot-map.html#read-more" role="button">
Read more&nbsp;<span class="glyphicon glyphicon-menu-right" aria-hidden="true"></span>
</a>
</p>
4 changes: 4 additions & 0 deletions content/capability-matrix/index.html
@@ -95,7 +95,11 @@

<div class="container">
<h1 id="apache-beam-capability-matrix">Apache Beam Capability Matrix</h1>
<<<<<<< 23feafc0bb80d835bfd9a05a1d98b0c997aafc84
<p><span style="font-size:11px;float:none">Last updated: 2016-05-27 09:58 PDT</span></p>
=======
<p><span style="font-size:11px;float:none">Last updated: 2016-05-27 09:51 PDT</span></p>
>>>>>>> Update date for blog post

<p>Apache Beam (incubating) provides a portable API layer for building sophisticated data-parallel processing engines that may be executed across a diversity of execution engines, or <i>runners</i>. The core concepts of this layer are based upon the Beam Model (formerly referred to as the <a href="http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf">Dataflow Model</a>), and implemented to varying degrees in each Beam runner. To help clarify the capabilities of individual runners, we’ve created the capability matrix below.</p>

17 changes: 12 additions & 5 deletions content/feed.xml
@@ -6,15 +6,22 @@
</description>
<link>http://beam.incubator.apache.org/</link>
<atom:link href="http://beam.incubator.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
<<<<<<< 23feafc0bb80d835bfd9a05a1d98b0c997aafc84
<pubDate>Fri, 27 May 2016 09:58:31 -0700</pubDate>
<lastBuildDate>Fri, 27 May 2016 09:58:31 -0700</lastBuildDate>
=======
<pubDate>Fri, 27 May 2016 09:51:00 -0700</pubDate>
<lastBuildDate>Fri, 27 May 2016 09:51:00 -0700</lastBuildDate>
>>>>>>> Update date for blog post
<generator>Jekyll v3.1.3</generator>

<item>
<title>Where&#39;s my PCollection.map()?</title>
<description>&lt;p&gt;Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.
<description>&lt;p&gt;Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.&lt;/p&gt;

&lt;!--more--&gt;
Though Beam is relatively new, its design draws heavily on many years of experience with real-world pipelines. One of the primary inspirations is &lt;a href=&quot;http://research.google.com/pubs/pub35650.html&quot;&gt;FlumeJava&lt;/a&gt;, which is Google’s internal successor to MapReduce first introduced in 2009.&lt;/p&gt;

&lt;p&gt;Though Beam is relatively new, its design draws heavily on many years of experience with real-world pipelines. One of the primary inspirations is &lt;a href=&quot;http://research.google.com/pubs/pub35650.html&quot;&gt;FlumeJava&lt;/a&gt;, which is Google’s internal successor to MapReduce first introduced in 2009.&lt;/p&gt;

&lt;p&gt;The original FlumeJava API has methods like &lt;code class=&quot;highlighter-rouge&quot;&gt;count&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;parallelDo&lt;/code&gt; on the PCollections. Though slightly more succinct, this approach has many disadvantages to extensibility. Every new user to FlumeJava wanted to add transforms, and adding them as methods to PCollection simply doesn’t scale well. In contrast, a PCollection in Beam has a single &lt;code class=&quot;highlighter-rouge&quot;&gt;apply&lt;/code&gt; method which takes any PTransform as an argument.&lt;/p&gt;

@@ -93,9 +100,9 @@ PCollection&amp;lt;O&amp;gt; output = input
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Although it’s tempting to add methods to PCollections, such an approach is not scalable, extensible, or sufficiently expressive. Putting a single apply method on PCollection and all the logic into the operation itself lets us have the best of both worlds, and avoids hard cliffs of complexity by having a single consistent style across simple and complex pipelines, and between predefined and user-defined operations.&lt;/p&gt;
</description>
<pubDate>Fri, 20 May 2016 11:00:00 -0700</pubDate>
<link>http://beam.incubator.apache.org/blog/2016/05/20/where-is-my-pcollection-dot-map.html</link>
<guid isPermaLink="true">http://beam.incubator.apache.org/blog/2016/05/20/where-is-my-pcollection-dot-map.html</guid>
<pubDate>Fri, 27 May 2016 09:00:00 -0700</pubDate>
<link>http://beam.incubator.apache.org/blog/2016/05/27/where-is-my-pcollection-dot-map.html</link>
<guid isPermaLink="true">http://beam.incubator.apache.org/blog/2016/05/27/where-is-my-pcollection-dot-map.html</guid>


<category>blog</category>
2 changes: 1 addition & 1 deletion content/index.html
@@ -177,7 +177,7 @@ <h2 id="getting-started-with-apache-beam">Getting Started with Apache Beam</h2>
<h3>Blog</h3>
<div class="list-group">

<a class="list-group-item" href="/blog/2016/05/20/where-is-my-pcollection-dot-map.html">May 20, 2016 - Where's my PCollection.map()?</a>
<a class="list-group-item" href="/blog/2016/05/27/where-is-my-pcollection-dot-map.html">May 27, 2016 - Where's my PCollection.map()?</a>

<a class="list-group-item" href="/blog/2016/05/18/splitAtFraction-method.html">May 18, 2016 - Dynamic work rebalancing for Beam</a>
