<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Mark Cavage]]></title>
  <link href="http://mcavage.me/atom.xml" rel="self"/>
  <link href="http://mcavage.me/"/>
  <updated>2013-07-24T14:29:43-07:00</updated>
  <id>http://mcavage.me/</id>
  <author>
    <name><![CDATA[Mark Cavage]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Using node_modules in Manta]]></title>
    <link href="http://mcavage.me/blog/2013/07/19/using-node-modules-in-manta/"/>
    <updated>2013-07-19T13:47:00-07:00</updated>
    <id>http://mcavage.me/blog/2013/07/19/using-node-modules-in-manta</id>
    <content type="html"><![CDATA[<h2>How to use Node.js with node_modules in Manta</h2>

<p>Continuing in the series of &ldquo;common real world Manta questions&rdquo;, another common
one we hear a lot, at least from newcomers, is how to run
<a href="http://nodejs.org/">node</a> with <code>node_modules</code>.  If you&rsquo;re not familar with
node, it&rsquo;s basically the same problem as using ruby gems, perl modules, python
eggs, etc.  You have a particular VM, and a particularly built set of add-ons
that go with it, and you want all of that available to run your program.</p>

<p>This post is going to walk you though writing a node program that uses several
add-on modules (including native ones) to accomplish a real world task, which is
an ad-hoc comparison of how
<a href="https://code.google.com/p/chromium-compact-language-detector/">Google&rsquo;s Compact Language Detector</a>
compares to the &ldquo;language&rdquo; attribute that exists on
<a href="https://dev.twitter.com/docs/platform-objects/tweets">tweets</a>.  Because tweets
are obviously very small, I was actually just curious myself how well this
worked, so I built a small <code>node</code> script that uses extra <code>npm</code> modules to test
this out.</p>

<h2>Concepts</h2>

<p>If you&rsquo;re not familiar yet with <a href="http://apidocs.joyent.com/manta">Manta</a>, Manta
is an object store with a twist: you can run compute in-situ on objects stored
there.  While the compute environment comes preloaded with a ton of standard
utilities and libraries, sometimes you need custom code that isn&rsquo;t available, or
is customized in some way.  To accomplish this you leverage two Manta concepts:
<a href="http://apidocs.joyent.com/manta/jobs-reference.html#assets-assets-property">assets</a>
and <a href="http://apidocs.joyent.com/manta/jobs-reference.html#initializing-compute-instances-init-property">init</a>;
often these two are used together, as I will show you here.</p>

<p>The gist is that you create an asset of your necessary code and upload it as an
object to Manta.  When you submit a compute job, you specify the path to that
object as an <code>asset</code> and Manta will automatically make it available to you in
your compute environment on the filesystem, under <code>/assets/$MANTA_USER/...</code>.
While you can then just unpack it as part of your <code>exec</code> line, this is actually
fairly heavyweight, as <code>exec</code> gets run on <em>every</em> input object (recall that if
Manta can, it will optimize by not evicting you from the virtual machine between
every object). <code>init</code> allows you to run this once for the the full slice of
time you get in the compute container.</p>

<h2>Step 0: acquire some data</h2>

<p>Since most of the twitter datasets are for-pay, I needed to get a sample dataset
up.  I wrote a small <code>node2manta</code> daemon that just buffers up tweets into 1GB
files locally and then pushes those up into Manta under a date-based folder.
Beyond pointing you at the source (below, and the only npm module in play
besides <code>manta</code> is <a href="https://github.com/ttezel/twit">twit</a>), I won&rsquo;t go into any
detail as it&rsquo;s pretty straight-forward. In the scheme I used, we have 1 GB of
tweets per object under the scheme <code>/$MANTA_USER/stor/twitter/$DATE.json</code>.  Note
you need a <a href="https://dev.twitter.com/">twitter developer account</a> and application
to fill in the <code>...</code> credentials in this snippet.</p>

<figure class='code'><figcaption><span>pump tweet stream into Manta</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
<span class='line-number'>54</span>
<span class='line-number'>55</span>
<span class='line-number'>56</span>
<span class='line-number'>57</span>
<span class='line-number'>58</span>
<span class='line-number'>59</span>
<span class='line-number'>60</span>
<span class='line-number'>61</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">assert</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;assert&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">manta</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;manta&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">Twit</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;twit&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">util</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;util&#39;</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">M</span> <span class="o">=</span> <span class="nx">manta</span><span class="p">.</span><span class="nx">createClient</span><span class="p">({</span>
</span><span class='line'>  <span class="nx">sign</span><span class="o">:</span> <span class="nx">manta</span><span class="p">.</span><span class="nx">privateKeySigner</span><span class="p">({</span>
</span><span class='line'>    <span class="nx">key</span><span class="o">:</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">HOME</span> <span class="o">+</span> <span class="s1">&#39;/.ssh/id_rsa&#39;</span><span class="p">,</span> <span class="s1">&#39;utf8&#39;</span><span class="p">),</span>
</span><span class='line'>    <span class="nx">keyId</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_KEY_ID</span><span class="p">,</span>
</span><span class='line'>    <span class="nx">user</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span>
</span><span class='line'>  <span class="p">}),</span>
</span><span class='line'>  <span class="nx">user</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span><span class="p">,</span>
</span><span class='line'>  <span class="nx">url</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_URL</span>
</span><span class='line'><span class="p">});</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">T</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Twit</span><span class="p">({</span>
</span><span class='line'>  <span class="nx">consumer_key</span><span class="o">:</span> <span class="s1">&#39;...&#39;</span><span class="p">,</span>
</span><span class='line'>  <span class="nx">consumer_secret</span><span class="o">:</span> <span class="s1">&#39;...&#39;</span><span class="p">,</span>
</span><span class='line'>  <span class="nx">access_token</span><span class="o">:</span> <span class="s1">&#39;...&#39;</span><span class="p">,</span>
</span><span class='line'>  <span class="nx">access_token_secret</span><span class="o">:</span> <span class="s1">&#39;...&#39;</span>
</span><span class='line'><span class="p">});</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">bytes</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">f</span> <span class="o">=</span> <span class="s1">&#39;/var/tmp/tweets.json&#39;</span><span class="p">;</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">live</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">fstream</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">createWriteStream</span><span class="p">(</span><span class="nx">f</span><span class="p">,</span> <span class="p">{</span><span class="nx">encoding</span><span class="o">:</span> <span class="s1">&#39;utf8&#39;</span><span class="p">});</span>
</span><span class='line'><span class="nx">fstream</span><span class="p">.</span><span class="nx">once</span><span class="p">(</span><span class="s1">&#39;open&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
</span><span class='line'>  <span class="kd">var</span> <span class="nx">k</span> <span class="o">=</span> <span class="nx">util</span><span class="p">.</span><span class="nx">format</span><span class="p">(</span><span class="s1">&#39;/%s/stor/twitter&#39;</span><span class="p">,</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span><span class="p">);</span>
</span><span class='line'>  <span class="nx">M</span><span class="p">.</span><span class="nx">mkdir</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="nx">assert</span><span class="p">.</span><span class="nx">ifError</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">live</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>  <span class="p">});</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">tstream</span> <span class="o">=</span> <span class="nx">T</span><span class="p">.</span><span class="nx">stream</span><span class="p">(</span><span class="s1">&#39;statuses/sample&#39;</span><span class="p">);</span>
</span><span class='line'><span class="nx">tstream</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;tweet&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">tweet</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>  <span class="k">if</span> <span class="p">(</span><span class="nx">live</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">s</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">tweet</span><span class="p">)</span> <span class="o">+</span> <span class="s1">&#39;\n&#39;</span><span class="p">;</span>
</span><span class='line'>    <span class="nx">bytes</span> <span class="o">+=</span> <span class="nx">Buffer</span><span class="p">.</span><span class="nx">byteLength</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">fstream</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="nx">bytes</span> <span class="o">&gt;=</span> <span class="mi">1000000000</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>      <span class="nx">live</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
</span><span class='line'>      <span class="nx">fstream</span><span class="p">.</span><span class="nx">end</span><span class="p">();</span>
</span><span class='line'>      <span class="kd">var</span> <span class="nx">k</span> <span class="o">=</span> <span class="nx">util</span><span class="p">.</span><span class="nx">format</span><span class="p">(</span><span class="s1">&#39;/%s/stor/twitter/%s.json&#39;</span><span class="p">,</span>
</span><span class='line'>                          <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span><span class="p">,</span>
</span><span class='line'>                          <span class="k">new</span> <span class="nb">Date</span><span class="p">().</span><span class="nx">toISOString</span><span class="p">());</span>
</span><span class='line'>      <span class="kd">var</span> <span class="nx">opts</span> <span class="o">=</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">type</span><span class="o">:</span> <span class="s1">&#39;application/json&#39;</span>
</span><span class='line'>      <span class="p">};</span>
</span><span class='line'>      <span class="nx">fstream</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">createReadStream</span><span class="p">(</span><span class="nx">f</span><span class="p">);</span>
</span><span class='line'>      <span class="nx">M</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span> <span class="nx">fstream</span><span class="p">,</span> <span class="nx">opts</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">assert</span><span class="p">.</span><span class="nx">ifError</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
</span><span class='line'>        <span class="nx">fstream</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">createWriteStream</span><span class="p">(</span><span class="nx">f</span><span class="p">,</span> <span class="p">{</span><span class="nx">encoding</span><span class="o">:</span> <span class="s1">&#39;utf8&#39;</span><span class="p">});</span>
</span><span class='line'>        <span class="nx">fstream</span><span class="p">.</span><span class="nx">once</span><span class="p">(</span><span class="s1">&#39;open&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
</span><span class='line'>          <span class="nx">live</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>          <span class="nx">bytes</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'>        <span class="p">});</span>
</span><span class='line'>      <span class="p">});</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'>  <span class="p">}</span>
</span><span class='line'><span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>


<p>After running that script for a while, I had this:</p>

<pre><code>$ mls /$MANTA_USER/stor/twitter
2013-07-23T00:11:23.772Z.json
2013-07-23T01:47:49.732Z.json
2013-07-23T03:18:16.774Z.json
2013-07-23T04:49:40.730Z.json
2013-07-23T06:41:58.752Z.json
2013-07-23T09:03:19.772Z.json
2013-07-23T11:20:00.741Z.json
2013-07-23T13:07:43.800Z.json
2013-07-23T14:37:33.797Z.json
2013-07-23T16:03:44.764Z.json
2013-07-23T17:36:13.063Z.json
</code></pre>

<h2>Step 1: mlogin, and write your code</h2>

<p>Ok so we&rsquo;ve got some data, now it&rsquo;s time to write our map script.  In this case
I&rsquo;m going to develop the <em>entire workflow</em> out of Manta using
<a href="http://apidocs.joyent.com/manta/mlogin.html">mlogin</a>.  If you&rsquo;ve not seen
<code>mlogin</code> before it&rsquo;s basically the REPL of Manta.  <code>mlogin</code> allows you to login
to a temporary compute container with one of your objects mounted.  This is
actually critical for us in building an asset with <code>node_modules</code> as we need an
OS environment (i.e., compilers, shared libraries, etc) that matches what our
code will run on.  So, I just fired up <code>mlogin</code>, and setup my project with <code>npm</code>
(the <code>export HOME</code> bit is only to make
<a href="https://github.com/TooTallNate/node-gyp">gyp</a> happy).  Then I just hacked out a
script <em>in the manta VM</em> by prototyping with this:</p>

<pre><code>$ mlogin /mark.cavage/stor/twitter/2013-07-23T00:11:23.772Z.json
mark.cavage@manta # export HOME=/root
mark.cavage@manta # cd $HOME
mark.cavage@manta # npm install cld
mark.cavage@manta # emacs lang.js
mark.cavage@manta # head -1 $MANTA_INPUT_FILE | node lang.js
</code></pre>

<p>And the script I ended up with was:</p>

<figure class='code'><figcaption><span>Parsing a stream of newline separated tweets</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">cld</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;cld&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">readline</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;readline&#39;</span><span class="p">);</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">sprintf</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;util&#39;</span><span class="p">).</span><span class="nx">format</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'><span class="kd">var</span> <span class="nx">FMT</span> <span class="o">=</span> <span class="s1">&#39;%d %s %s&#39;</span><span class="p">;</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">rl</span> <span class="o">=</span> <span class="nx">readline</span><span class="p">.</span><span class="nx">createInterface</span><span class="p">({</span>
</span><span class='line'>  <span class="nx">input</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">stdin</span><span class="p">,</span>
</span><span class='line'>  <span class="nx">output</span><span class="o">:</span> <span class="kc">false</span>
</span><span class='line'><span class="p">});</span>
</span><span class='line'>
</span><span class='line'><span class="nx">rl</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;line&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">l</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>  <span class="k">try</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">obj</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">l</span><span class="p">);</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">lang</span> <span class="o">=</span> <span class="nx">cld</span><span class="p">.</span><span class="nx">detect</span><span class="p">(</span><span class="nx">obj</span><span class="p">.</span><span class="nx">text</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">sprintf</span><span class="p">(</span><span class="nx">FMT</span><span class="p">,</span> <span class="p">(</span><span class="nx">lang</span><span class="p">.</span><span class="nx">code</span> <span class="o">===</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">lang</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">),</span> <span class="nx">lang</span><span class="p">.</span><span class="nx">code</span><span class="p">,</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">lang</span><span class="p">));</span>
</span><span class='line'>  <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{}</span>
</span><span class='line'><span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>


<p>So every tweet gets mapped to a 3 column output of <code>$match $cld $twitter</code>, which
we can reduce on.  Anyway, now that we&rsquo;ve got this coded up, let&rsquo;s tar and save
into manta (again, from the <code>mlogin</code> session):</p>

<pre><code>mark.cavage@manta # tar -cf tweet_lang_detect.tar lang.js node_modules
mark.cavage@manta # mput -f tweet_lang_detect.tar /$MANTA_USER/stor
...avage/stor/tweet_lang_detect.tar ==========================&gt;] 100%  14.00MB
mark.cavage@manta #
</code></pre>

<p>To be pedantic while we&rsquo;re here, we&rsquo;ll go ahead an write the reduce step as
well, even though it&rsquo;s trivial.  I&rsquo;m just going to output two numbers, the first
being the number of matches, and the second being the total dataset size.  Note
the reduce line below uses <a href="http://apidocs.joyent.com/manta/maggr.html">maggr</a>,
which is just a simple &ldquo;math utility&rdquo; Manta provides for common summing/average
operations. Other uses report success using
<a href="https://code.google.com/p/crush-tools/">crush-tools</a>.  Use what you like,
that&rsquo;s the power of Manta :)</p>

<pre><code>mark.cavage@manta # head -10 $MANTA_INPUT_FILE | node lang.js | maggr -c1='sum,count'
6,10
mark.cavage@manta #
</code></pre>

<p>So given 10 inputs, we&rsquo;ve got a 60% success rate with <code>cld</code>. Let&rsquo;s see how it
does on a larger sample set.</p>

<p>You can now exit the <code>mlogin</code> session, we&rsquo;re ready to rock.</p>

<h2>Step 2: Run a Map/Reduce Job</h2>

<p>Ok, so to recap, we hacked up a map/reduce script with an asset using
<code>mlogin</code>, and now we want to run a job on our dataset.  Twitter
throttles your ability to suck their feed pretty aggressively, so by
the time I wrote this blog I only had 11GB of data. That said, they&rsquo;re
just text files, so that should be a fairly large number of tweets.
Let&rsquo;s see how it does:</p>

<pre><code>$ mfind -t o /$MANTA_USER/stor/twitter | \
    mjob create -o -s /$MANTA_USER/stor/tweet_lang_detect.tar \
                --init 'tar -xf /assets/$MANTA_USER/stor/tweet_lang_detect.tar' \
                -m 'node lang.js' \
                -r 'maggr -c1="sum,count"'
added 11 inputs to f7af6bcf-2126-4b1d-b9d5-c0f25a162786
2610121,3860111
</code></pre>

<p>How did I figure out what the <code>-s</code> and <code>--init</code> options should be you ask?
Simple, I ran <code>mlogin</code> again with <code>-s</code> specified, and tested out what my untar
line should be.</p>

<p>Side point, if you&rsquo;re interested, this initial prototype took 1m19s to run. An
enterprising individual would likely be able to cut that (at least) in half by
&ldquo;pre-reducing&rdquo; as part of the map phase; in my case, ~1m latency was fine,
because I&rsquo;m lazy.  Also, the entire time it took me to prototype this from no
code to actually having my answer was about 20m (not counting the time it took
to ingest data &ndash; I just ran that overnight).</p>

<h2>Step 3: There is no step 3</h2>

<p>We&rsquo;re actually done now.  Clearly you could go figure out more interesting
statistics here, but really I just wanted to quickly see what <code>cld</code> did on a
reasonable dataset (I had ~3.8M tweets); it turned out surprisingly close to my
original prototype of 60% (~67% accurate on tweets).</p>

<p>Also, while this example used <code>node</code> to illustrate a real world &ldquo;custom code&rdquo;
problem, the same technique would apply to <code>python</code>, <code>ruby</code>, etc.; you need
to get your &ldquo;tarball&rdquo; built up in the same way, and just push it back into
Manta for future jobs to consume.</p>

<p>Hopefully that helps!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Uploading Files to Manta from a browser]]></title>
    <link href="http://mcavage.me/blog/2013/07/10/uploading-files-to-manta-from-a-browser/"/>
    <updated>2013-07-10T09:09:00-07:00</updated>
    <id>http://mcavage.me/blog/2013/07/10/uploading-files-to-manta-from-a-browser</id>
    <content type="html"><![CDATA[<h2>How to use Signed URLs and CORS to Upload Files to Manta</h2>

<p>One of the very first questions that came up after launching
<a href="https://www.joyent.com/products/manta">Manta</a> was how to have a browser
directly upload an object into Manta.  There are several reasons you would want
this, but most obvious is that it allows clients to bypass your web server(s),
which (1) reduces your bandwidth costs, and (2) reduces client latency.
Unfortunately the browser security model is fairly complicated here as
<a href="http://www.w3.org/TR/cors/">CORS</a> is brought into play.  To answer this
question, I created a small (and ugly!)
<a href="https://github.com/mcavage/manta-browser-upload">sample application</a> to
illustrate how to accomplish this.  This blog post will explain the tricky
pieces of the sample application; it will not stand alone without looking
at the sample code.</p>

<h2>A few concepts</h2>

<h3>Signed URLs</h3>

<p>There seems to be some general confusion around what
<a href="https://apidocs.joyent.com/manta/api.html#signed-urls">signed urls</a> are used
for.  Basically, signed URLs are a way for you to hand out an expiring &ldquo;ticket&rdquo;
that allows somebody else to see a single, private Manta object.  For example,
suppose I have an MP3 that I wanted to share with a few friends over email;
rather than putting the file in <code>/:login/public</code>, and getting sued by the RIAA,
I would place it in <code>/:login/stor/song.mp3</code>, generate a signed URL to it,
and just send the URL to them.
<a href="https://apidocs.joyent.com/manta/msign.html">msign</a> is the command line utility
that will generate a presigned URL, but in this example we&rsquo;ll be generating it
programatically.</p>

<h3>CORS</h3>

<p><a href="http://www.w3.org/TR/cors/">CORS</a>, or &ldquo;Cross-Origin-Resource-Sharing&rdquo; is a
mechanism that allows browsers to access resources that did not originate on the
same domain.  While functional, it&rsquo;s very complicated (personally, there is
little else in the web world I hate more than CORS); for a gentler introduction
than the W3C spec, see the
<a href="https://developer.mozilla.org/en-US/docs/HTTP/Access_control_CORS">MDN page</a>.
Manta fully supports CORS on a per-directory and per-object basis, so that you
are empowered to be as restrictive or permissive as you like.  To achieve direct
upload, you will need to set CORS headers on your directories.  In the examples
below, I&rsquo;ve basically set them &ldquo;wide open.&rdquo;</p>

<h2>The General Flow</h2>

<p>The gist is that you are still running some web application, albeit a light one,
that a browser does interact with to get &ldquo;tickets,&rdquo; that allow the browser to
directly write into some object under your control.  There are other ways to
accomplish doling out &ldquo;tickets,&rdquo; but this is the most practical.  In the example
I made, each browser session gets their own &ldquo;dropbox,&rdquo; that the server sets up
for them (in reality it would be tied to your webapp&rsquo;s users, not a session).
The browser has some little HTML form, and when the user selects a file and
submits the HTML form, the browser asks your webserver for a location to upload.
Your webapp generates a Manta
<a href="https://apidocs.joyent.com/manta/api.html#signed-urls">signed url</a>, and gives
that to the browser.  The browser then creates an Ajax request and sends the
bytes up. Here&rsquo;s an illustration of all that text:</p>

<p><img src="http://mcavage.me/images/browser_upload_diagram.svg" alt="Upload Sequence Diagram" /></p>

<p>Of course the devil is in the details&hellip;</p>

<h3>Our Storage Layout</h3>

<p>In this example I&rsquo;m using <code>/$MANTA_USER/stor/dropbox</code>, as the root for uploads.
Note that <code>/:login/stor/</code> is &ldquo;private,&rdquo;  so only you can read and write data.
For each browser session that comes in, our webserver creates
<code>/:login/stor/dropbox/:session</code> (which is just a random number in this example).</p>

<p>When a user selects a file to upload, we send it to
<code>/:login/stor/dropbox/:session/:filename</code>.  If the user uploads the same file
multiple times, it just gets overwritten.</p>

<h2>The Web Server</h2>

<p>We&rsquo;ll start with an examination of the important parts of the web server.  I
used no dependencies in this example so there&rsquo;s no confusion about which toolkit
makes more sense, etc.; it&rsquo;s all just &ldquo;straight up&rdquo;
<a href="http://nodejs.org/api/http.html">node http</a>. I&rsquo;m not going to walk through
every line of the example application, but instead just give some more context
on the particularly tricky parts that may not be clear.</p>

<h3>Creating a per-session directory</h3>

<p>When we see a new user session (to reemphasize again, you assuredly want this
based off your user&rsquo;s name or id or something), we create a &ldquo;private directory&rdquo;.
Ignoring all the setup and node HTTP stuff, here&rsquo;s the bits that creates a
per-session directory &ndash; I&rsquo;ve slightly modified the example code here to be
readable out of context:</p>

<figure class='code'><figcaption><span>Creating a Directory with CORS options</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="c1">// &quot;cookie&quot; is the sessionid</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">dir</span> <span class="o">=</span> <span class="s1">&#39;/&#39;</span> <span class="o">+</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span> <span class="o">+</span> <span class="s1">&#39;/stor/dropbox/&#39;</span> <span class="o">+</span> <span class="nx">cookie</span><span class="p">;</span>
</span><span class='line'><span class="kd">var</span> <span class="nx">opts</span> <span class="o">=</span> <span class="p">{</span>
</span><span class='line'>    <span class="nx">headers</span><span class="o">:</span> <span class="p">{</span>
</span><span class='line'>            <span class="s1">&#39;access-control-allow-headers&#39;</span><span class="o">:</span> <span class="s1">&#39;access-control-allow-origin, accept, origin, content-type&#39;</span><span class="p">,</span>
</span><span class='line'>            <span class="s1">&#39;access-control-allow-methods&#39;</span><span class="o">:</span> <span class="s1">&#39;PUT,GET,HEAD,DELETE&#39;</span><span class="p">,</span>
</span><span class='line'>            <span class="s1">&#39;access-control-allow-origin&#39;</span><span class="o">:</span> <span class="s1">&#39;*&#39;</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'><span class="p">};</span>
</span><span class='line'><span class="nx">mantaClient</span><span class="p">.</span><span class="nx">mkdir</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span> <span class="nx">opts</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="nx">assert</span><span class="p">.</span><span class="nx">ifError</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1">// HTML is just the static string of our webapp</span>
</span><span class='line'>    <span class="nx">res</span><span class="p">.</span><span class="nx">setHeader</span><span class="p">(</span><span class="s1">&#39;set-cookie&#39;</span><span class="p">,</span> <span class="s1">&#39;name=&#39;</span> <span class="o">+</span> <span class="nx">res</span><span class="p">.</span><span class="nx">cookie</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">res</span><span class="p">.</span><span class="nx">setHeader</span><span class="p">(</span><span class="s1">&#39;content-type&#39;</span><span class="p">,</span> <span class="s1">&#39;text/html&#39;</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">res</span><span class="p">.</span><span class="nx">setHeader</span><span class="p">(</span><span class="s1">&#39;content-length&#39;</span><span class="p">,</span> <span class="nx">Buffer</span><span class="p">.</span><span class="nx">byteLength</span><span class="p">(</span><span class="nx">HTML</span><span class="p">));</span>
</span><span class='line'>    <span class="nx">res</span><span class="p">.</span><span class="nx">writeHead</span><span class="p">(</span><span class="mi">200</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">res</span><span class="p">.</span><span class="nx">end</span><span class="p">(</span><span class="nx">HTML</span><span class="p">);</span>
</span><span class='line'><span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>


<p>The important aspect to point out here is the <code>header</code> block we pass into the
options block on <code>mkdir</code>.  When a write request comes into Manta in a &ldquo;CORS
scenario,&rdquo; the server honors the CORS settings <em>on the parent directory</em>.  So
setting up the requisite CORS headers on the directory we want to write into
allows the browser to go through all the preflight garbage and send the headers
it needs to for uploading an object directly.</p>

<h3>Signing a Request</h3>

<p>This portion is actually pretty straightforward and handled by the
<a href="https://github.com/joyent/node-manta">Manta SDK</a>.  The only thing of interest
here is that we&rsquo;re signing a request to the given URL with two methods:
<code>OPTIONS</code> and <code>PUT</code>.  Normally you&rsquo;d only hand out a signed URL with one method
signed, but for this case as the browser preflighs the request with the same URL
we need the server side to honor both.  Again I&rsquo;ve slightly modified the example
application code here:</p>

<figure class='code'><figcaption><span>Signing a URL for the browser</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">var</span> <span class="nx">body</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">;</span>
</span><span class='line'><span class="nx">req</span><span class="p">.</span><span class="nx">setEncoding</span><span class="p">(</span><span class="s1">&#39;utf8&#39;</span><span class="p">);</span>
</span><span class='line'><span class="nx">req</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="nx">body</span> <span class="o">+=</span> <span class="nx">chunk</span><span class="p">;</span>
</span><span class='line'><span class="p">});</span>
</span><span class='line'>
</span><span class='line'><span class="nx">req</span><span class="p">.</span><span class="nx">once</span><span class="p">(</span><span class="s1">&#39;end&#39;</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">params</span> <span class="o">=</span> <span class="nx">qs</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">body</span><span class="p">)</span> <span class="o">||</span> <span class="p">{};</span>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">params</span><span class="p">.</span><span class="nx">file</span><span class="p">)</span>
</span><span class='line'>      <span class="c1">// send error</span>
</span><span class='line'>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">p</span> <span class="o">=</span> <span class="s1">&#39;/&#39;</span> <span class="o">+</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_USER</span> <span class="o">+</span> <span class="s1">&#39;/stor/dropbox/&#39;</span> <span class="o">+</span> <span class="nx">cookie</span> <span class="o">+</span> <span class="s1">&#39;/&#39;</span> <span class="o">+</span> <span class="nx">params</span><span class="p">.</span><span class="nx">file</span><span class="p">;</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">opts</span> <span class="o">=</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">expires</span><span class="o">:</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">().</span><span class="nx">getTime</span><span class="p">()</span> <span class="o">+</span> <span class="p">(</span><span class="mi">3600</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">),</span> <span class="c1">// 1hr</span>
</span><span class='line'>        <span class="nx">path</span><span class="o">:</span> <span class="nx">p</span>
</span><span class='line'>        <span class="nx">method</span><span class="o">:</span> <span class="p">[</span><span class="s1">&#39;OPTIONS&#39;</span><span class="p">,</span> <span class="s1">&#39;PUT&#39;</span><span class="p">],</span>
</span><span class='line'>    <span class="p">};</span>
</span><span class='line'>    <span class="nx">mantaClient</span><span class="p">.</span><span class="nx">signURL</span><span class="p">(</span><span class="nx">opts</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">signature</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">assert</span><span class="p">.</span><span class="nx">ifError</span><span class="p">(</span><span class="nx">err</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>        <span class="kd">var</span> <span class="nx">signed</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">({</span>
</span><span class='line'>            <span class="nx">url</span><span class="o">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">MANTA_URL</span> <span class="o">+</span> <span class="nx">signature</span>
</span><span class='line'>        <span class="p">});</span>
</span><span class='line'>        <span class="nx">res</span><span class="p">.</span><span class="nx">setHeader</span><span class="p">(</span><span class="s1">&#39;content-type&#39;</span><span class="p">,</span> <span class="s1">&#39;application/json&#39;</span><span class="p">);</span>
</span><span class='line'>        <span class="nx">res</span><span class="p">.</span><span class="nx">setHeader</span><span class="p">(</span><span class="s1">&#39;content-length&#39;</span><span class="p">,</span> <span class="nx">Buffer</span><span class="p">.</span><span class="nx">byteLength</span><span class="p">(</span><span class="nx">signed</span><span class="p">));</span>
</span><span class='line'>        <span class="nx">res</span><span class="p">.</span><span class="nx">writeHead</span><span class="p">(</span><span class="mi">200</span><span class="p">);</span>
</span><span class='line'>        <span class="nx">res</span><span class="p">.</span><span class="nx">end</span><span class="p">(</span><span class="nx">signed</span><span class="p">);</span>
</span><span class='line'>    <span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>


<p>So in the example above, the webapp <code>POST</code>&rsquo;d a form to us with the file name
the user wants to upload.  A &ldquo;real app&rdquo; would want to sanitize that, and stop
<a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a> attacks, etc.,
but that&rsquo;s outside the scope of this little application.  Here we blindly sign
it, and spit the URL back to the browser.</p>

<p>At this point we&rsquo;re basically done with what our little webapp needed to do.</p>

<h2>Client Side</h2>

<p>First, a disclaimer: I am <em>awful</em> at client-side code, so please don&rsquo;t be wed to
anything I did here.  Anyway, so I made a single HTML page with an upload form
and some <a href="http://jquery.com/">jQuery</a> pieces:  specifically I used their
<a href="http://api.jquery.com/jQuery.ajax/">Ajax API</a> where it made sense, along with
their <a href="http://api.jquery.com/category/forms/">Form Helpers</a>.  Lastly, so you can
see progress information, I stuck in a
<a href="http://jqueryui.com/progressbar/">progress bar</a>.</p>

<p>Ok, enough preamble, let&rsquo;s see the code!</p>

<figure class='code'><figcaption><span>The tried and true form</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='html'><span class='line'><span class="nt">&lt;form</span> <span class="na">enctype=</span><span class="s">&quot;multipart/form-data&quot;</span><span class="nt">&gt;</span>
</span><span class='line'>  <span class="nt">&lt;input</span> <span class="na">name=</span><span class="s">&quot;file&quot;</span> <span class="na">type=</span><span class="s">&quot;file&quot;</span> <span class="na">accept=</span><span class="s">&quot;image/jpeg&quot;</span> <span class="nt">/&gt;</span>
</span><span class='line'>  <span class="nt">&lt;input</span> <span class="na">type=</span><span class="s">&quot;button&quot;</span> <span class="na">value=</span><span class="s">&quot;Upload&quot;</span> <span class="nt">/&gt;</span>
</span><span class='line'><span class="nt">&lt;/form&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p>Yup, that&rsquo;s a form.  What do we do when the user submits? As per our flows
above, we first need to request a place to write the file to, so we ask the
webserver to sign the file name &ndash; note this is &ldquo;straight up&rdquo; jQuery and ajax,
nothing fancy about this:</p>

<figure class='code'><figcaption><span>jQuery Sign The filename</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="nx">$</span><span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">file</span><span class="p">;</span>
</span><span class='line'>    <span class="nx">$</span><span class="p">(</span><span class="s1">&#39;:file&#39;</span><span class="p">).</span><span class="nx">change</span><span class="p">(</span><span class="kd">function</span><span class="p">(){</span>
</span><span class='line'>        <span class="nx">file</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">files</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
</span><span class='line'>    <span class="p">});</span>
</span><span class='line'>
</span><span class='line'>    <span class="nx">$</span><span class="p">(</span><span class="s2">&quot;:button&quot;</span><span class="p">).</span><span class="nx">click</span><span class="p">(</span><span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
</span><span class='line'>        <span class="nx">$</span><span class="p">.</span><span class="nx">ajax</span><span class="p">({</span>
</span><span class='line'>            <span class="nx">url</span><span class="o">:</span> <span class="s1">&#39;sign&#39;</span><span class="p">,</span>
</span><span class='line'>            <span class="nx">type</span><span class="o">:</span> <span class="s1">&#39;POST&#39;</span><span class="p">,</span>
</span><span class='line'>            <span class="nx">data</span><span class="o">:</span> <span class="p">{</span>
</span><span class='line'>                <span class="nx">file</span><span class="o">:</span> <span class="nx">file</span><span class="p">.</span><span class="nx">name</span>
</span><span class='line'>            <span class="p">}</span>
</span><span class='line'>        <span class="p">}).</span><span class="nx">done</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">signature</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>            <span class="c1">// signature looks like</span>
</span><span class='line'>            <span class="c1">// {</span>
</span><span class='line'>            <span class="c1">//   url: &#39;https://...&#39;</span>
</span><span class='line'>            <span class="c1">// }</span>
</span><span class='line'>        <span class="p">});</span>
</span><span class='line'>    <span class="p">});</span>
</span></code></pre></td></tr></table></div></figure>


<h3>Uploading the file to Manta</h3>

<p>At long last, we can now push the raw bytes into Manta.  Our own webserver gave
us a URL we can write to for an hour.  We now make an
<a href="http://www.w3.org/TR/XMLHttpRequest2/">XHR2</a> request and directly <code>PUT</code> the
data.</p>

<figure class='code'><figcaption><span>XHR2.send() data</span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='javascript'><span class='line'><span class="kd">function</span> <span class="nx">onSignature</span><span class="p">(</span><span class="nx">signature</span><span class="p">)</span> <span class="p">{</span>
</span><span class='line'>    <span class="kd">var</span> <span class="nx">xhr</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">XMLHttpRequest</span><span class="p">();</span>
</span><span class='line'>
</span><span class='line'>    <span class="nx">xhr</span><span class="p">.</span><span class="nx">open</span><span class="p">(</span><span class="s1">&#39;PUT&#39;</span><span class="p">,</span> <span class="nx">signature</span><span class="p">.</span><span class="nx">url</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">xhr</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s1">&#39;accept&#39;</span><span class="p">,</span> <span class="s1">&#39;application/json&#39;</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">xhr</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s1">&#39;access-control-allow-origin&#39;</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">xhr</span><span class="p">.</span><span class="nx">setRequestHeader</span><span class="p">(</span><span class="s1">&#39;content-type&#39;</span><span class="p">,</span> <span class="s1">&#39;image/jpeg&#39;</span><span class="p">);</span>
</span><span class='line'>    <span class="nx">xhr</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">file</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>A few notes:</p>

<ul>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest?redirectlocale=en-US&amp;redirectslug=DOM%2FXMLHttpRequest#send(">XHR2.send()</a>)
is the only way I know of to actually send an uninterpreted byte stream. XHR
and jQuery&rsquo;s Ajax both send a <code>multipart</code> framed message, which Manta will not
interpret; meaning you would end up with an object that still has HTTP framing
noise in it.</li>
<li>In this example I sent <code>access-control-allow-origin: *</code>.  That&rsquo;s specifically
so that <em>future GET requests</em> will work from a web browser as well.  When
reading an object, the CORS semantics are inferred from the object itself.</li>
</ul>


<p>That&rsquo;s pretty much it &ndash; once the browser completes you can see it using <code>mls</code> or
whatever other tool you want.</p>

<h2>Conclusion</h2>

<p>This article explained how to construct a web application that allows clients to
directly upload to <a href="https://www.joyent.com/products/manta">Manta</a>.  It
highlighted the relevant portions of a
<a href="https://github.com/mcavage/manta-browser-upload">sample application</a> I created
that does this using <a href="http://api.jquery.com/jQuery.ajax/">Ajax</a> and
<a href="http://www.w3.org/TR/XMLHttpRequest2/">XHR Level 2</a>.  Comments welcome!</p>
]]></content>
  </entry>
  
</feed>
