

								<!DOCTYPE html>

								<!-- Generated by pkgdown: do not edit by hand --><html>

								<head>

								<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

								<meta charset="utf-8">

								<meta http-equiv="X-UA-Compatible" content="IE=edge">

								<meta name="viewport" content="width=device-width, initial-scale=1.0">

								<title>Tools to Transform and Query Data with 'Apache' 'Drill' • sergeant</title>

								<!-- jquery --><script src="https://code.jquery.com/jquery-3.1.0.min.js" integrity="sha384-nrOSfDHtoPMzJHjVTdCopGqIqeYETSXhZDFyniQ8ZHcVy08QesyHcnOUpMpqnmWq" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">

								<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script><!-- Font Awesome icons --><link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-T8Gy5hrqNKT+hzMclPo118YTQO6cYprQmhrYwIiQ/3axmI1hQomh7Ud2hPOy8SP1" crossorigin="anonymous">

								<!-- pkgdown --><link href="pkgdown.css" rel="stylesheet">

								<script src="jquery.sticky-kit.min.js"></script><script src="pkgdown.js"></script><!-- mathjax --><script src="https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><!--[if lt IE 9]>

								<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>

								<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>

								<![endif]-->

								</head>

								<body>

								    <div class="container template-vignette">

								      <header><div class="navbar navbar-default navbar-fixed-top" role="navigation">

								  <div class="container">

								    <div class="navbar-header">

								      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">

								        <span class="icon-bar"></span>

								        <span class="icon-bar"></span>

								        <span class="icon-bar"></span>

								      </button>

								      <a class="navbar-brand" href="index.html">sergeant</a>

								    </div>

								    <div id="navbar" class="navbar-collapse collapse">

								      <ul class="nav navbar-nav">

								<li>

								  <a href="reference/index.html">Reference</a>

								</li>

								<li>

								  <a href="news/index.html">News</a>

								</li>

								      </ul>

								<ul class="nav navbar-nav navbar-right">

								<li>

								  <a href="http://github.com/hrbrmstr/sergeant">

								    <span class="fa fa-github fa-lg"></span>


								  </a>

								</li>

								      </ul>

								</div>

								<!--/.nav-collapse -->

								  </div>

								<!--/.container -->

								</div>

								<!--/.navbar -->


								      </header><div class="row">

								  <div class="col-md-9">


								<div class="contents">

								<!-- README.md is generated from README.Rmd. Please edit that file -->


								<p><img src="sergeant.png" width="33" align="left" style="padding-right:20px"></p>

								<p><code>sergeant</code> : Tools to Transform and Query Data with ‘Apache’ ‘Drill’</p>

								<p>Drill + <code>sergeant</code> is (IMO) a nice alternative to Spark + <code>sparklyr</code> if you don’t need the ML components of Spark (e.g. you just need to query “big data” sources, interface with parquet, or combine disparate data source types (JSON, CSV, parquet, RDBMS) for aggregation). Drill also supports spatial queries.</p>

								<p>I find writing SQL queries to parquet files with Drill on a local Linux or macOS workstation to be more performant than doing the data ingestion work with R (for large or disparate data sets). I also work with many tiny JSON files on a daily basis, and Drill makes that much easier. YMMV.</p>

								<p>You can download Drill from <a href="https://drill.apache.org/download/" class="uri">https://drill.apache.org/download/</a> (use “Direct File Download”). I use <code>/usr/local/drill</code> as the install directory. <code>drill-embedded</code> is a super-easy way to get started playing with Drill on a single workstation and most of my workflows can get by using Drill this way. If there is sufficient desire for an automated downloader and a way to start the <code>drill-embedded</code> server from within R, please file an issue.</p>
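								<p>As a minimal sketch of checking such an install from R: the helper below, <code>drill_embedded_path()</code>, is hypothetical (it is not part of <code>sergeant</code>) and assumes the standard Drill layout in which the launcher lives under <code>bin/</code> in the install directory. It only builds and checks the path; it does not start Drill.</p>

```r
# Hypothetical helper (not part of sergeant): build the path to the
# drill-embedded launcher under a Drill install directory.
drill_embedded_path <- function(drill_home = "/usr/local/drill") {
  file.path(drill_home, "bin", "drill-embedded")
}

launcher <- drill_embedded_path()

if (file.exists(launcher)) {
  message("Start Drill from a terminal with: ", launcher)
} else {
  message("No launcher at ", launcher, "; adjust drill_home to your install directory.")
}
```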

								<p>There are a few convenience wrappers for various informational SQL queries (like <code><a href="reference/drill_version.html">drill_version()</a></code>). Please file a PR if you add more.</p>

								<p>The package has been written with retrieval of rectangular data sources in mind. If you need/want a version of <code><a href="reference/drill_query.html">drill_query()</a></code> that will enable returning of non-rectangular data (which is possible with Drill) then please file an issue.</p>

								<p>Some of the more “controlling vs data ops” REST API functions aren’t implemented. Please file a PR if you need those.</p>

								<p>Finally, I run most of this locally and at home, so it’s all been coded with no authentication or encryption in mind. If you want/need support for that, please file an issue. If there is demand for this, it will change the R API a bit (I’ve already thought out what to do but have no need for it right now).</p>

								<p>The following functions are implemented:</p>

								<p><strong><code>DBI</code></strong></p>

								<ul>

								<li>A reasonably complete R <code>DBI</code> driver has been implemented using the Drill REST API, mostly to facilitate the <code>dplyr</code> interface. Use the <code>RJDBC</code> driver interface if you need more <code>DBI</code> functionality.</li>

								<li>This also means that SQL functions unique to Drill have been “implemented” (i.e. made accessible to the <code>dplyr</code> interface). If you have custom Drill SQL functions that need to be implemented, please file an issue on GitHub.</li>

								</ul>

								<p><strong><code>RJDBC</code></strong></p>

								<ul>

								<li>

								<code>drill_jdbc</code>: Connect to Drill using JDBC, enabling use of <code>DBI</code> idioms. See <code>RJDBC</code> for more info.</li>

								<li>NOTE: The fully qualified path to the Drill JDBC driver JAR must be set in the <code>DRILL_JDBC_JAR</code> environment variable. For interactive work this is best done via <code>~/.Renviron</code>, e.g. <code>DRILL_JDBC_JAR=/usr/local/drill/jars/drill-jdbc-all-1.9.0.jar</code>

								</li>

								</ul>
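								<p>A quick way to confirm the environment variable is usable before attempting a JDBC connection is sketched below. The helper <code>drill_jdbc_jar_ok()</code> is hypothetical (not part of <code>sergeant</code>) and uses only base R; the temporary file stands in for a real driver JAR purely for illustration.</p>

```r
# Hypothetical helper (not part of sergeant): TRUE when DRILL_JDBC_JAR is set
# and points at an existing path, FALSE otherwise.
drill_jdbc_jar_ok <- function(path = Sys.getenv("DRILL_JDBC_JAR")) {
  nzchar(path) && file.exists(path)
}

# Illustration only: simulate a configured environment with a temporary file.
# In real use the variable would be set in ~/.Renviron instead.
fake_jar <- tempfile(fileext = ".jar")
file.create(fake_jar)
Sys.setenv(DRILL_JDBC_JAR = fake_jar)

drill_jdbc_jar_ok()
```

								<p>If the check fails, fix the path in <code>~/.Renviron</code> and restart R so the new value is picked up.</p>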

								<p><strong><code>dplyr</code></strong>:</p>

								<ul>

								<li>

								<code>src_drill</code>: Connect to Drill (using dplyr) + supporting functions</li>

								</ul>

								<p>See <code>dplyr</code> for the <code>dplyr</code> operations (light testing shows they work in basic SQL use-cases but Drill’s SQL engine has issues with more complex queries).</p>

								<p><strong>Drill APIs</strong>:</p>

								<ul>

								<li>

								<code>drill_connection</code>: Set up parameters for a Drill server/cluster connection</li>

								<li>

								<code>drill_active</code>: Test whether Drill HTTP REST API server is up</li>

								<li>

								<code>drill_cancel</code>: Cancel the query that has the given query id</li>

								<li>

								<code>drill_jdbc</code>: Connect to Drill using JDBC</li>

								<li>

								<code>drill_metrics</code>: Get the current memory metrics</li>

								<li>

								<code>drill_options</code>: List the name, default, and data type of the system and session options</li>

								<li>

								<code>drill_profile</code>: Get the profile of the query that has the given query id</li>

								<li>

								<code>drill_profiles</code>: Get the profiles of running and completed queries</li>

								<li>

								<code>drill_query</code>: Submit a query and return results</li>

								<li>

								<code>drill_set</code>: Set Drill SYSTEM or SESSION options</li>

								<li>

								<code>drill_settings_reset</code>: Changes (optionally, all) session settings back to system defaults</li>

								<li>

								<code>drill_show_files</code>: Show files in a file system schema.</li>

								<li>

								<code>drill_show_schemas</code>: Returns a list of available schemas.</li>

								<li>

								<code>drill_stats</code>: Get Drillbit information, such as port numbers</li>

								<li>

								<code>drill_status</code>: Get the status of Drill</li>

								<li>

								<code>drill_storage</code>: Get the list of storage plugin names and configurations</li>

								<li>

								<code>drill_system_reset</code>: Changes (optionally, all) system settings back to system defaults</li>

								<li>

								<code>drill_threads</code>: Get information about threads</li>

								<li>

								<code>drill_uplift</code>: Turn columnar query results into a type-converted tbl</li>

								<li>

								<code>drill_use</code>: Change to a particular schema.</li>

								<li>

								<code>drill_version</code>: Identify the version of Drill running</li>

								</ul>

								<div id="installation" class="section level3">

								<h3 class="hasAnchor">

								<a href="#installation" class="anchor"></a>Installation</h3>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">devtools<span class="op">::</span><span class="kw">install_github</span>(<span class="st">"hrbrmstr/sergeant"</span>)</code></pre></div>

								</div>

								<div id="experimental-dplyr-interface" class="section level3">

								<h3 class="hasAnchor">

								<a href="#experimental-dplyr-interface" class="anchor"></a>Experimental <code>dplyr</code> interface</h3>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(sergeant)


								ds &lt;-<span class="st"> </span><span class="kw"><a href="reference/src_drill.html">src_drill</a></span>(<span class="st">"localhost"</span>)  <span class="co"># use localhost if running standalone on same system otherwise the host or IP of your Drill server</span>

								ds

								<span class="co">#&gt; src:  DrillConnection</span>

								<span class="co">#&gt; tbls: INFORMATION_SCHEMA, cp.default, dfs.default, dfs.root, dfs.tmp, sys</span>


								db &lt;-<span class="st"> </span><span class="kw">tbl</span>(ds, <span class="st">"cp.`employee.json`"</span>)


								<span class="co"># without `collect()`:</span>

								<span class="kw">count</span>(db, gender, marital_status)

								<span class="co">#&gt; # Source:   lazy query [?? x 3]</span>

								<span class="co">#&gt; # Database: DrillConnection</span>

								<span class="co">#&gt; # Groups:   gender</span>

								<span class="co">#&gt;   marital_status gender     n</span>

								<span class="co">#&gt;            &lt;chr&gt;  &lt;chr&gt; &lt;int&gt;</span>

								<span class="co">#&gt; 1              S      F   297</span>

								<span class="co">#&gt; 2              M      M   278</span>

								<span class="co">#&gt; 3              S      M   276</span>

								<span class="co">#&gt; 4              M      F   304</span>


								<span class="co"># ^^ gets translated to:</span>

								<span class="co"># </span>

								<span class="co"># SELECT *</span>

								<span class="co"># FROM (SELECT  gender ,  marital_status , COUNT(*) AS  n </span>

								<span class="co">#       FROM  cp.`employee.json` </span>

								<span class="co">#       GROUP BY  gender ,  marital_status )  govketbhqb </span>

								<span class="co"># LIMIT 1000</span>


								<span class="kw">count</span>(db, gender, marital_status) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">collect</span>()

								<span class="co">#&gt; # A tibble: 4 x 3</span>

								<span class="co">#&gt; # Groups:   gender [2]</span>

								<span class="co">#&gt;   marital_status gender     n</span>

								<span class="co">#&gt; *          &lt;chr&gt;  &lt;chr&gt; &lt;int&gt;</span>

								<span class="co">#&gt; 1              S      F   297</span>

								<span class="co">#&gt; 2              M      M   278</span>

								<span class="co">#&gt; 3              S      M   276</span>

								<span class="co">#&gt; 4              M      F   304</span>


								<span class="co"># ^^ gets translated to:</span>

								<span class="co"># </span>

								<span class="co"># SELECT  gender ,  marital_status , COUNT(*) AS  n </span>

								<span class="co"># FROM  cp.`employee.json` </span>

								<span class="co"># GROUP BY  gender ,  marital_status </span>


								<span class="kw">group_by</span>(db, position_title) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">count</span>(gender) -&gt;<span class="st"> </span>tmp2


								<span class="kw">group_by</span>(db, position_title) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">count</span>(gender) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">ungroup</span>() <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">mutate</span>(<span class="dt">full_desc=</span><span class="kw">ifelse</span>(gender<span class="op">==</span><span class="st">"F"</span>, <span class="st">"Female"</span>, <span class="st">"Male"</span>)) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">collect</span>() <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">select</span>(<span class="dt">Title=</span>position_title, <span class="dt">Gender=</span>full_desc, <span class="dt">Count=</span>n)

								<span class="co">#&gt; # A tibble: 30 x 3</span>

								<span class="co">#&gt;                     Title Gender Count</span>

								<span class="co">#&gt;  *                  &lt;chr&gt;  &lt;chr&gt; &lt;int&gt;</span>

								<span class="co">#&gt;  1              President Female     1</span>

								<span class="co">#&gt;  2     VP Country Manager   Male     3</span>

								<span class="co">#&gt;  3     VP Country Manager Female     3</span>

								<span class="co">#&gt;  4 VP Information Systems Female     1</span>

								<span class="co">#&gt;  5     VP Human Resources Female     1</span>

								<span class="co">#&gt;  6          Store Manager Female    13</span>

								<span class="co">#&gt;  7             VP Finance   Male     1</span>

								<span class="co">#&gt;  8          Store Manager   Male    11</span>

								<span class="co">#&gt;  9           HQ Marketing Female     2</span>

								<span class="co">#&gt; 10 HQ Information Systems Female     4</span>

								<span class="co">#&gt; # ... with 20 more rows</span>


								<span class="co"># ^^ gets translated to:</span>

								<span class="co"># </span>

								<span class="co"># SELECT  position_title ,  gender ,  n ,</span>

								<span class="co">#         CASE WHEN ( gender  = 'F') THEN ('Female') ELSE ('Male') </span><span class="re">END</span><span class="co"> AS  full_desc </span>

								<span class="co"># FROM (SELECT  position_title ,  gender , COUNT(*) AS  n </span>

								<span class="co">#       FROM  cp.`employee.json` </span>

								<span class="co">#       GROUP BY  position_title ,  gender )  dcyuypuypb </span>


								<span class="kw">arrange</span>(db, <span class="kw">desc</span>(employee_id)) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">print</span>(<span class="dt">n=</span><span class="dv">20</span>)

								<span class="co">#&gt; # Source:     table&lt;cp.`employee.json`&gt; [?? x 16]</span>

								<span class="co">#&gt; # Database:   DrillConnection</span>

								<span class="co">#&gt; # Ordered by: desc(employee_id)</span>

								<span class="co">#&gt;    store_id gender department_id birth_date supervisor_id  last_name          position_title  hire_date</span>

								<span class="co">#&gt;       &lt;int&gt;  &lt;chr&gt;         &lt;int&gt;     &lt;date&gt;         &lt;int&gt;      &lt;chr&gt;                   &lt;chr&gt;     &lt;dttm&gt;</span>

								<span class="co">#&gt;  1       18      F            18 1914-02-02          1140      Stand Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt;  2       18      M            18 1914-02-02          1140    Burnham Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt;  3       18      F            18 1914-02-02          1139  Doolittle Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt;  4       18      M            18 1914-02-02          1139     Pirnie Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt;  5       18      M            17 1914-02-02          1140     Younce Store Permanent Stocker 1998-01-01</span>

								<span class="co">#&gt;  6       18      F            17 1914-02-02          1140    Biltoft Store Permanent Stocker 1998-01-01</span>

								<span class="co">#&gt;  7       18      M            17 1914-02-02          1139   Detwiler Store Permanent Stocker 1998-01-01</span>

								<span class="co">#&gt;  8       18      F            17 1914-02-02          1139     Ciruli Store Permanent Stocker 1998-01-01</span>

								<span class="co">#&gt;  9       18      F            16 1914-02-02          1140     Bishop Store Temporary Checker 1998-01-01</span>

								<span class="co">#&gt; 10       18      F            16 1914-02-02          1140  Cutwright Store Temporary Checker 1998-01-01</span>

								<span class="co">#&gt; 11       18      F            16 1914-02-02          1139   Anderson Store Temporary Checker 1998-01-01</span>

								<span class="co">#&gt; 12       18      F            16 1914-02-02          1139  Swartwood Store Temporary Checker 1998-01-01</span>

								<span class="co">#&gt; 13       18      M            15 1914-02-02          1140 Curtsinger Store Permanent Checker 1998-01-01</span>

								<span class="co">#&gt; 14       18      F            15 1914-02-02          1140      Quick Store Permanent Checker 1998-01-01</span>

								<span class="co">#&gt; 15       18      M            15 1914-02-02          1139      Souza Store Permanent Checker 1998-01-01</span>

								<span class="co">#&gt; 16       18      M            15 1914-02-02          1139   Compagno Store Permanent Checker 1998-01-01</span>

								<span class="co">#&gt; 17       18      M            11 1961-09-24          1139  Jaramillo  Store Shift Supervisor 1998-01-01</span>

								<span class="co">#&gt; 18       18      M            11 1972-05-12            17     Belsey Store Assistant Manager 1998-01-01</span>

								<span class="co">#&gt; 19       12      M            18 1914-02-02          1069    Eichorn Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt; 20       12      F            18 1914-02-02          1069  Geiermann Store Temporary Stocker 1998-01-01</span>

								<span class="co">#&gt; # ... with more rows, and 8 more variables: management_role &lt;chr&gt;, salary &lt;dbl&gt;, marital_status &lt;chr&gt;, full_name &lt;chr&gt;,</span>

								<span class="co">#&gt; #   employee_id &lt;int&gt;, education_level &lt;chr&gt;, first_name &lt;chr&gt;, position_id &lt;int&gt;</span>


								<span class="co"># ^^ gets translated to:</span>

								<span class="co"># </span>

								<span class="co"># SELECT *</span>

								<span class="co"># FROM (SELECT *</span>

								<span class="co">#       FROM  cp.`employee.json` </span>

								<span class="co">#       ORDER BY  employee_id  DESC)  lvpxoaejbc </span>

								<span class="co"># LIMIT 5</span>


								<span class="kw">mutate</span>(db, <span class="dt">position_title=</span><span class="kw">tolower</span>(position_title)) <span class="op">%&gt;%</span>

								<span class="st">  </span><span class="kw">mutate</span>(<span class="dt">salary=</span><span class="kw">as.numeric</span>(salary)) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">mutate</span>(<span class="dt">gender=</span><span class="kw">ifelse</span>(gender<span class="op">==</span><span class="st">"F"</span>, <span class="st">"Female"</span>, <span class="st">"Male"</span>)) <span class="op">%&gt;%</span>

								<span class="st">  </span><span class="kw">mutate</span>(<span class="dt">marital_status=</span><span class="kw">ifelse</span>(marital_status<span class="op">==</span><span class="st">"S"</span>, <span class="st">"Single"</span>, <span class="st">"Married"</span>)) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">group_by</span>(supervisor_id) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">summarise</span>(<span class="dt">underlings_count=</span><span class="kw">n</span>()) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span><span class="kw">collect</span>()

								<span class="co">#&gt; # A tibble: 112 x 2</span>

								<span class="co">#&gt;    supervisor_id underlings_count</span>

								<span class="co">#&gt;  *         &lt;int&gt;            &lt;int&gt;</span>

								<span class="co">#&gt;  1             0                1</span>

								<span class="co">#&gt;  2             1                7</span>

								<span class="co">#&gt;  3             5                9</span>

								<span class="co">#&gt;  4             4                2</span>

								<span class="co">#&gt;  5             2                3</span>

								<span class="co">#&gt;  6            20                2</span>

								<span class="co">#&gt;  7            21                4</span>

								<span class="co">#&gt;  8            22                7</span>

								<span class="co">#&gt;  9             6                4</span>

								<span class="co">#&gt; 10            36                2</span>

								<span class="co">#&gt; # ... with 102 more rows</span>


								<span class="co"># ^^ gets translated to:</span>

								<span class="co"># </span>

								<span class="co"># SELECT  supervisor_id , COUNT(*) AS  underlings_count </span>

								<span class="co"># FROM (SELECT  employee_id ,  full_name ,  first_name ,  last_name ,  position_id ,  position_title ,  store_id ,  department_id ,  birth_date ,  hire_date ,  salary ,  supervisor_id ,  education_level ,  gender ,  management_role , CASE WHEN ( marital_status  = 'S') THEN ('Single') ELSE ('Married') </span><span class="re">END</span><span class="co"> AS  marital_status </span>

								<span class="co">#       FROM (SELECT  employee_id ,  full_name ,  first_name ,  last_name ,  position_id ,  position_title ,  store_id ,  department_id ,  birth_date ,  hire_date ,  salary ,  supervisor_id ,  education_level ,  marital_status ,  management_role , CASE WHEN ( gender  = 'F') THEN ('Female') ELSE ('Male') </span><span class="re">END</span><span class="co"> AS  gender </span>

								<span class="co">#             FROM (SELECT  employee_id ,  full_name ,  first_name ,  last_name ,  position_id ,  position_title ,  store_id ,  department_id ,  birth_date ,  hire_date ,  supervisor_id ,  education_level ,  marital_status ,  gender ,  management_role , CAST( salary  AS DOUBLE) AS  salary </span>

								<span class="co">#                   FROM (SELECT  employee_id ,  full_name ,  first_name ,  last_name ,  position_id ,  store_id ,  department_id ,  birth_date ,  hire_date ,  salary ,  supervisor_id ,  education_level ,  marital_status ,  gender ,  management_role , LOWER( position_title ) AS  position_title </span>

								<span class="co">#                         FROM  cp.`employee.json` )  cnjsqxeick )  bnbnjrubna )  wavfmhkczv )  zaxeyyicxo </span>

								<span class="co"># GROUP BY  supervisor_id </span></code></pre></div>

								</div>

								<div id="usage" class="section level3">

								<h3 class="hasAnchor">

								<a href="#usage" class="anchor"></a>Usage</h3>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(sergeant)


								<span class="co"># current version</span>

								<span class="kw">packageVersion</span>(<span class="st">"sergeant"</span>)

								<span class="co">#&gt; [1] '0.5.0'</span>


								dc &lt;-<span class="st"> </span><span class="kw"><a href="reference/drill_connection.html">drill_connection</a></span>(<span class="st">"localhost"</span>)


								<span class="kw"><a href="reference/drill_active.html">drill_active</a></span>(dc)

								<span class="co">#&gt; [1] TRUE</span>


								<span class="kw"><a href="reference/drill_version.html">drill_version</a></span>(dc)

								<span class="co">#&gt; [1] "1.10.0"</span>


								<span class="kw"><a href="reference/drill_storage.html">drill_storage</a></span>(dc)<span class="op">$</span>name

								<span class="co">#&gt; [1] "cp"    "dfs"   "hbase" "hive"  "kudu"  "mongo" "s3"</span></code></pre></div>

								<p>Working with the built-in JSON data sets:</p>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(dc, <span class="st">"SELECT * FROM cp.`employee.json` limit 100"</span>)

								<span class="co">#&gt; Parsed with column specification:</span>

								<span class="co">#&gt; cols(</span>

								<span class="co">#&gt;   store_id = col_integer(),</span>

								<span class="co">#&gt;   gender = col_character(),</span>

								<span class="co">#&gt;   department_id = col_integer(),</span>

								<span class="co">#&gt;   birth_date = col_date(format = ""),</span>

								<span class="co">#&gt;   supervisor_id = col_integer(),</span>

								<span class="co">#&gt;   last_name = col_character(),</span>

								<span class="co">#&gt;   position_title = col_character(),</span>

								<span class="co">#&gt;   hire_date = col_datetime(format = ""),</span>

								<span class="co">#&gt;   management_role = col_character(),</span>

								<span class="co">#&gt;   salary = col_double(),</span>

								<span class="co">#&gt;   marital_status = col_character(),</span>

								<span class="co">#&gt;   full_name = col_character(),</span>

								<span class="co">#&gt;   employee_id = col_integer(),</span>

								<span class="co">#&gt;   education_level = col_character(),</span>

								<span class="co">#&gt;   first_name = col_character(),</span>

								<span class="co">#&gt;   position_id = col_integer()</span>

								<span class="co">#&gt; )</span>

								<span class="co">#&gt; # A tibble: 100 x 16</span>

								<span class="co">#&gt;    store_id gender department_id birth_date supervisor_id last_name         position_title  hire_date   management_role</span>

								<span class="co">#&gt;  *    &lt;int&gt;  &lt;chr&gt;         &lt;int&gt;     &lt;date&gt;         &lt;int&gt;     &lt;chr&gt;                  &lt;chr&gt;     &lt;dttm&gt;             &lt;chr&gt;</span>

								<span class="co">#&gt;  1        0      F             1 1961-08-26             0    Nowmer              President 1994-12-01 Senior Management</span>

								<span class="co">#&gt;  2        0      M             1 1915-07-03             1   Whelply     VP Country Manager 1994-12-01 Senior Management</span>

								<span class="co">#&gt;  3        0      M             1 1969-06-20             1    Spence     VP Country Manager 1998-01-01 Senior Management</span>

								<span class="co">#&gt;  4        0      F             1 1951-05-10             1 Gutierrez     VP Country Manager 1998-01-01 Senior Management</span>

								<span class="co">#&gt;  5        0      F             2 1942-10-08             1   Damstra VP Information Systems 1994-12-01 Senior Management</span>

								<span class="co">#&gt;  6        0      F             3 1949-03-27             1  Kanagaki     VP Human Resources 1994-12-01 Senior Management</span>

								<span class="co">#&gt;  7        9      F            11 1922-08-10             5   Brunner          Store Manager 1998-01-01  Store Management</span>

								<span class="co">#&gt;  8       21      F            11 1979-06-23             5  Blumberg          Store Manager 1998-01-01  Store Management</span>

								<span class="co">#&gt;  9        0      M             5 1949-08-26             1     Stanz             VP Finance 1994-12-01 Senior Management</span>

								<span class="co">#&gt; 10        1      M            11 1967-06-20             5  Murraiin          Store Manager 1998-01-01  Store Management</span>

								<span class="co">#&gt; # ... with 90 more rows, and 7 more variables: salary &lt;dbl&gt;, marital_status &lt;chr&gt;, full_name &lt;chr&gt;, employee_id &lt;int&gt;,</span>

								<span class="co">#&gt; #   education_level &lt;chr&gt;, first_name &lt;chr&gt;, position_id &lt;int&gt;</span>


								<span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(dc, <span class="st">"SELECT COUNT(gender) AS gender FROM cp.`employee.json` GROUP BY gender"</span>)

								<span class="co">#&gt; Parsed with column specification:</span>

								<span class="co">#&gt; cols(</span>

								<span class="co">#&gt;   gender = col_integer()</span>

								<span class="co">#&gt; )</span>

								<span class="co">#&gt; # A tibble: 2 x 1</span>

								<span class="co">#&gt;   gender</span>

								<span class="co">#&gt; *  &lt;int&gt;</span>

								<span class="co">#&gt; 1    601</span>

								<span class="co">#&gt; 2    554</span>


								<span class="kw"><a href="reference/drill_options.html">drill_options</a></span>(dc)

								<span class="co">#&gt; # A tibble: 113 x 4</span>

								<span class="co">#&gt;                                              name value   type    kind</span>

								<span class="co">#&gt;  *                                          &lt;chr&gt; &lt;chr&gt;  &lt;chr&gt;   &lt;chr&gt;</span>

								<span class="co">#&gt;  1                 planner.enable_hash_single_key  TRUE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  2      store.parquet.reader.pagereader.queuesize     2 SYSTEM    LONG</span>

								<span class="co">#&gt;  3             planner.enable_limit0_optimization FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  4              store.json.read_numbers_as_double FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  5                planner.enable_constant_folding  TRUE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  6                      store.json.extended_types FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  7   planner.memory.non_blocking_operators_memory    64 SYSTEM    LONG</span>

								<span class="co">#&gt;  8                  planner.enable_multiphase_agg  TRUE SYSTEM BOOLEAN</span>

								<span class="co">#&gt;  9                  exec.query_profile.debug_mode FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 10 planner.filter.max_selectivity_estimate_factor     1 SYSTEM  DOUBLE</span>

								<span class="co">#&gt; # ... with 103 more rows</span>


								<span class="kw"><a href="reference/drill_options.html">drill_options</a></span>(dc, <span class="st">"json"</span>)

								<span class="co">#&gt; # A tibble: 7 x 4</span>

								<span class="co">#&gt;                                                    name value   type    kind</span>

								<span class="co">#&gt;                                                   &lt;chr&gt; &lt;chr&gt;  &lt;chr&gt;   &lt;chr&gt;</span>

								<span class="co">#&gt; 1                     store.json.read_numbers_as_double FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 2                             store.json.extended_types FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 3                              store.json.writer.uglify FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 4                store.json.reader.skip_invalid_records FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 5 store.json.reader.print_skipped_invalid_record_number FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 6                              store.json.all_text_mode FALSE SYSTEM BOOLEAN</span>

								<span class="co">#&gt; 7                    store.json.writer.skip_null_fields  TRUE SYSTEM BOOLEAN</span></code></pre></div>

								</div>

								<div id="working-with-parquet-files" class="section level2">

								<h2 class="hasAnchor">

								<a href="#working-with-parquet-files" class="anchor"></a>Working with Parquet files</h2>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(dc, <span class="st">"SELECT * FROM dfs.`/usr/local/drill/sample-data/nation.parquet` LIMIT 5"</span>)

								<span class="co">#&gt; Parsed with column specification:</span>

								<span class="co">#&gt; cols(</span>

								<span class="co">#&gt;   N_COMMENT = col_character(),</span>

								<span class="co">#&gt;   N_NAME = col_character(),</span>

								<span class="co">#&gt;   N_NATIONKEY = col_integer(),</span>

								<span class="co">#&gt;   N_REGIONKEY = col_integer()</span>

								<span class="co">#&gt; )</span>

								<span class="co">#&gt; # A tibble: 5 x 4</span>

								<span class="co">#&gt;              N_COMMENT    N_NAME N_NATIONKEY N_REGIONKEY</span>

								<span class="co">#&gt; *                &lt;chr&gt;     &lt;chr&gt;       &lt;int&gt;       &lt;int&gt;</span>

								<span class="co">#&gt; 1  haggle. carefully f   ALGERIA           0           0</span>

								<span class="co">#&gt; 2 al foxes promise sly ARGENTINA           1           1</span>

								<span class="co">#&gt; 3 y alongside of the p    BRAZIL           2           1</span>

								<span class="co">#&gt; 4 eas hang ironic, sil    CANADA           3           1</span>

								<span class="co">#&gt; 5 y above the carefull     EGYPT           4           4</span></code></pre></div>

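								<p>Drill can also query multiple Parquet files spread across different directories in a single statement. Wildcards are supported in the path, and the directory each row came from is returned in the <code>dir0</code> column:</p>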

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(dc, <span class="st">"SELECT * FROM dfs.`/usr/local/drill/sample-data/nations*/nations*.parquet` LIMIT 5"</span>)

								<span class="co">#&gt; Parsed with column specification:</span>

								<span class="co">#&gt; cols(</span>

								<span class="co">#&gt;   N_COMMENT = col_character(),</span>

								<span class="co">#&gt;   N_NAME = col_character(),</span>

								<span class="co">#&gt;   N_NATIONKEY = col_integer(),</span>

								<span class="co">#&gt;   N_REGIONKEY = col_integer(),</span>

								<span class="co">#&gt;   dir0 = col_character()</span>

								<span class="co">#&gt; )</span>

								<span class="co">#&gt; # A tibble: 5 x 5</span>

								<span class="co">#&gt;              N_COMMENT    N_NAME N_NATIONKEY N_REGIONKEY      dir0</span>

								<span class="co">#&gt; *                &lt;chr&gt;     &lt;chr&gt;       &lt;int&gt;       &lt;int&gt;     &lt;chr&gt;</span>

								<span class="co">#&gt; 1  haggle. carefully f   ALGERIA           0           0 nationsMF</span>

								<span class="co">#&gt; 2 al foxes promise sly ARGENTINA           1           1 nationsMF</span>

								<span class="co">#&gt; 3 y alongside of the p    BRAZIL           2           1 nationsMF</span>

								<span class="co">#&gt; 4 eas hang ironic, sil    CANADA           3           1 nationsMF</span>

								<span class="co">#&gt; 5 y above the carefull     EGYPT           4           4 nationsMF</span></code></pre></div>

								<div id="a-preview-of-the-built-in-support-for-spatial-ops" class="section level3">

								<h3 class="hasAnchor">

								<a href="#a-preview-of-the-built-in-support-for-spatial-ops" class="anchor"></a>A preview of the built-in support for spatial ops</h3>

								<p>Via: <a href="https://github.com/k255/drill-gis" class="uri">https://github.com/k255/drill-gis</a></p>

								<p>A common use case is to select data within the boundary of a given polygon:</p>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(dc, <span class="st">"</span>

								<span class="st">select columns[2] as city, columns[4] as lon, columns[3] as lat</span>

								<span class="st">    from cp.`sample-data/CA-cities.csv`</span>

								<span class="st">    where</span>

								<span class="st">        ST_Within(</span>

								<span class="st">            ST_Point(columns[4], columns[3]),</span>

								<span class="st">            ST_GeomFromText(</span>

								<span class="st">                'POLYGON((-121.95 37.28, -121.94 37.35, -121.84 37.35, -121.84 37.28, -121.95 37.28))'</span>

								<span class="st">                )</span>

								<span class="st">            )</span>

								<span class="st">"</span>)

								<span class="co">#&gt; Parsed with column specification:</span>

								<span class="co">#&gt; cols(</span>

								<span class="co">#&gt;   city = col_character(),</span>

								<span class="co">#&gt;   lon = col_double(),</span>

								<span class="co">#&gt;   lat = col_double()</span>

								<span class="co">#&gt; )</span>

								<span class="co">#&gt; # A tibble: 7 x 3</span>

								<span class="co">#&gt;          city       lon      lat</span>

								<span class="co">#&gt; *       &lt;chr&gt;     &lt;dbl&gt;    &lt;dbl&gt;</span>

								<span class="co">#&gt; 1     Burbank -121.9316 37.32328</span>

								<span class="co">#&gt; 2    San Jose -121.8950 37.33939</span>

								<span class="co">#&gt; 3        Lick -121.8458 37.28716</span>

								<span class="co">#&gt; 4 Willow Glen -121.8897 37.30855</span>

								<span class="co">#&gt; 5 Buena Vista -121.9166 37.32133</span>

								<span class="co">#&gt; 6    Parkmoor -121.9308 37.32105</span>

								<span class="co">#&gt; 7   Fruitdale -121.9327 37.31086</span></code></pre></div>

								</div>

								<div id="jdbc" class="section level3">

								<h3 class="hasAnchor">

								<a href="#jdbc" class="anchor"></a>JDBC</h3>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(RJDBC)

								<span class="co">#&gt; Loading required package: rJava</span>


								<span class="co"># Use this if connecting to a cluster with ZooKeeper</span>

								<span class="co"># con &lt;- drill_jdbc("drill-node:2181", "drillbits1") </span>


								<span class="co"># Use the following if running drill-embedded</span>

								con &lt;-<span class="st"> </span><span class="kw"><a href="reference/drill_jdbc.html">drill_jdbc</a></span>(<span class="st">"localhost:31010"</span>, <span class="dt">use_zk=</span><span class="ot">FALSE</span>)

								<span class="co">#&gt; Using [jdbc:drill:drillbit=localhost:31010]...</span>


								<span class="kw"><a href="reference/drill_query.html">drill_query</a></span>(con, <span class="st">"SELECT * FROM cp.`employee.json`"</span>)

								<span class="co">#&gt; # A tibble: 1,155 x 16</span>

								<span class="co">#&gt;    employee_id         full_name first_name last_name position_id         position_title store_id department_id</span>

								<span class="co">#&gt;  *       &lt;dbl&gt;             &lt;chr&gt;      &lt;chr&gt;     &lt;chr&gt;       &lt;dbl&gt;                  &lt;chr&gt;    &lt;dbl&gt;         &lt;dbl&gt;</span>

								<span class="co">#&gt;  1           1      Sheri Nowmer      Sheri    Nowmer           1              President        0             1</span>

								<span class="co">#&gt;  2           2   Derrick Whelply    Derrick   Whelply           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  3           4    Michael Spence    Michael    Spence           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  4           5    Maya Gutierrez       Maya Gutierrez           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  5           6   Roberta Damstra    Roberta   Damstra           3 VP Information Systems        0             2</span>

								<span class="co">#&gt;  6           7  Rebecca Kanagaki    Rebecca  Kanagaki           4     VP Human Resources        0             3</span>

								<span class="co">#&gt;  7           8       Kim Brunner        Kim   Brunner          11          Store Manager        9            11</span>

								<span class="co">#&gt;  8           9   Brenda Blumberg     Brenda  Blumberg          11          Store Manager       21            11</span>

								<span class="co">#&gt;  9          10      Darren Stanz     Darren     Stanz           5             VP Finance        0             5</span>

								<span class="co">#&gt; 10          11 Jonathan Murraiin   Jonathan  Murraiin          11          Store Manager        1            11</span>

								<span class="co">#&gt; # ... with 1,145 more rows, and 8 more variables: birth_date &lt;chr&gt;, hire_date &lt;chr&gt;, salary &lt;dbl&gt;, supervisor_id &lt;dbl&gt;,</span>

								<span class="co">#&gt; #   education_level &lt;chr&gt;, marital_status &lt;chr&gt;, gender &lt;chr&gt;, management_role &lt;chr&gt;</span>


								<span class="co"># The same connection also works with DBI/RJDBC functions such as dbGetQuery()</span>

								<span class="kw">dbGetQuery</span>(con, <span class="st">"SELECT * FROM cp.`employee.json`"</span>) <span class="op">%&gt;%</span><span class="st"> </span>

								<span class="st">  </span>tibble<span class="op">::</span><span class="kw">as_tibble</span>()

								<span class="co">#&gt; # A tibble: 1,155 x 16</span>

								<span class="co">#&gt;    employee_id         full_name first_name last_name position_id         position_title store_id department_id</span>

								<span class="co">#&gt;  *       &lt;dbl&gt;             &lt;chr&gt;      &lt;chr&gt;     &lt;chr&gt;       &lt;dbl&gt;                  &lt;chr&gt;    &lt;dbl&gt;         &lt;dbl&gt;</span>

								<span class="co">#&gt;  1           1      Sheri Nowmer      Sheri    Nowmer           1              President        0             1</span>

								<span class="co">#&gt;  2           2   Derrick Whelply    Derrick   Whelply           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  3           4    Michael Spence    Michael    Spence           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  4           5    Maya Gutierrez       Maya Gutierrez           2     VP Country Manager        0             1</span>

								<span class="co">#&gt;  5           6   Roberta Damstra    Roberta   Damstra           3 VP Information Systems        0             2</span>

								<span class="co">#&gt;  6           7  Rebecca Kanagaki    Rebecca  Kanagaki           4     VP Human Resources        0             3</span>

								<span class="co">#&gt;  7           8       Kim Brunner        Kim   Brunner          11          Store Manager        9            11</span>

								<span class="co">#&gt;  8           9   Brenda Blumberg     Brenda  Blumberg          11          Store Manager       21            11</span>

								<span class="co">#&gt;  9          10      Darren Stanz     Darren     Stanz           5             VP Finance        0             5</span>

								<span class="co">#&gt; 10          11 Jonathan Murraiin   Jonathan  Murraiin          11          Store Manager        1            11</span>

								<span class="co">#&gt; # ... with 1,145 more rows, and 8 more variables: birth_date &lt;chr&gt;, hire_date &lt;chr&gt;, salary &lt;dbl&gt;, supervisor_id &lt;dbl&gt;,</span>

								<span class="co">#&gt; #   education_level &lt;chr&gt;, marital_status &lt;chr&gt;, gender &lt;chr&gt;, management_role &lt;chr&gt;</span></code></pre></div>

								</div>

								<div id="test-results" class="section level3">

								<h3 class="hasAnchor">

								<a href="#test-results" class="anchor"></a>Test Results</h3>

								<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(sergeant)

								<span class="kw">library</span>(testthat)

								<span class="co">#&gt; </span>

								<span class="co">#&gt; Attaching package: 'testthat'</span>

								<span class="co">#&gt; The following object is masked from 'package:dplyr':</span>

								<span class="co">#&gt; </span>

								<span class="co">#&gt;     matches</span>


								<span class="kw">date</span>()

								<span class="co">#&gt; [1] "Sat Jun 17 20:47:11 2017"</span>


								devtools<span class="op">::</span><span class="kw">test</span>()

								<span class="co">#&gt; Loading sergeant</span>

								<span class="co">#&gt; Testing sergeant</span>

								<span class="co">#&gt; basic functionality: ..</span>

								<span class="co">#&gt; </span>

								<span class="co">#&gt; DONE ===================================================================================================================</span></code></pre></div>

								</div>

								<div id="code-of-conduct" class="section level3">

								<h3 class="hasAnchor">

								<a href="#code-of-conduct" class="anchor"></a>Code of Conduct</h3>

								<p>Please note that this project is released with a <a href="CONDUCT.md">Contributor Code of Conduct</a>. By participating in this project you agree to abide by its terms.</p>

								</div>

								</div>

								</div>

								  </div>


								  <div class="col-md-3 hidden-xs hidden-sm" id="sidebar">

								    <h2 class="hasAnchor">

								<a href="#sidebar" class="anchor"></a>Links</h2>

								<ul class="list-unstyled">

								<li>Browse source code at <br><a href="http://github.com/hrbrmstr/sergeant">http://github.com/hrbrmstr/sergeant</a>

								</li>

								<li>Report a bug at <br><a href="https://github.com/hrbrmstr/sergeant/issues">https://github.com/hrbrmstr/sergeant/issues</a>

								</li>

								</ul>

								<h2>License</h2>

								<p><a href="https://opensource.org/licenses/mit-license.php">MIT</a> + file <a href="LICENSE.html">LICENSE</a></p>

								<h2>Developers</h2>

								<ul class="list-unstyled">

								<li>Bob Rudis <br><small class="roles"> Author, maintainer </small> </li>

								<li><a href="authors.html">All authors...</a></li>

								</ul>

								<h2>Dev status</h2>

								<ul class="list-unstyled">

								<li><a href="https://travis-ci.org/hrbrmstr/sergeant"><img src="https://travis-ci.org/hrbrmstr/sergeant.svg?branch=master" alt="Travis-CI Build Status"></a></li>

								</ul>

								</div>


								</div>


								      <footer><div class="copyright">

								  <p>Developed by Bob Rudis.</p>

								</div>


								<div class="pkgdown">

								  <p>Site built with <a href="http://hadley.github.io/pkgdown/">pkgdown</a>.</p>

								</div>


								      </footer>

								</div>


								  </body>

								</html>