Access and Query Amazon Athena via DBI/JDBC

---
output: rmarkdown::github_document
editor_options:
  chunk_output_type: console
---

# `metis`

Helpers for Accessing and Querying Amazon Athena

Including a lightweight RJDBC shim.

In Greek mythology, Metis was Athena's "helper".

## Description

Still fairly beta-quality, but getting there.

The goal is to get around enough of the "gotchas" that prevent raw RJDBC Athena connections from "just working" with `dplyr` v0.6.0+, and also to get around the `fetchSize` problem without having to give up `dbGetQuery()`.

The `AthenaJDBC42_2.0.2.jar` JAR file is included for convenience, but it will likely move to a separate package as this gets closer to prime time and if it goes to CRAN.

NOTE that the updated driver *REQUIRES JDK 1.8+*.

See the **Usage** section for an example.


Since R 3.5 (I don't remember this happening in R 3.4.x), signals sent when interrupting Athena JDBC calls crash the R interpreter. You need to set the JVM `-Xrs` option to avoid signals being passed on to the JVM owner. That has to be done _before_ `rJava` is loaded, so you either need to remember to put it at the top of all scripts _or_ stick this in your local `~/.Rprofile` and/or site-wide `Rprofile`:

    # Append -Xrs to the JVM parameters unless it is already present
    if (!any(grepl("-Xrs", getOption("java.parameters", ""), fixed = TRUE))) {
      options("java.parameters" = c(getOption("java.parameters", default = NULL), "-Xrs"))
    }

## What's Inside The Tin?

The following functions are implemented:

Easy-interface connection helper:

- `athena_connect`: Make a JDBC connection to Athena

Custom JDBC Classes:

- `Athena`: AthenaJDBC (make a new Athena con obj)
- `AthenaConnection-class`: AthenaJDBC
- `AthenaDriver-class`: AthenaJDBC
- `AthenaResult-class`: AthenaJDBC

Custom JDBC Class Methods:

- `dbConnect-method`: AthenaJDBC
- `dbExistsTable-method`: AthenaJDBC
- `dbGetQuery-method`: AthenaJDBC
- `dbListFields-method`: AthenaJDBC
- `dbListTables-method`: AthenaJDBC
- `dbReadTable-method`: AthenaJDBC
- `dbSendQuery-method`: AthenaJDBC

Pulled in from other `cloudyr` pkgs:

- `read_credentials`: Use Credentials from .aws/credentials File
- `use_credentials`: Use Credentials from .aws/credentials File
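
To make clear what these two helpers work from, here is a minimal base-R sketch that reads a file in the standard AWS shared-credentials layout (`[profile]` headers followed by `key = value` lines). It is not the package's implementation: `parse_credentials()` is a hypothetical name, and the key values are the placeholder examples from AWS's own documentation.

```r
# Hypothetical helper, for illustration only: parse an AWS-style
# credentials file into a named list of profiles.
parse_credentials <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # drop blank lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]

  profiles <- list()
  current <- NULL
  for (ln in lines) {
    if (grepl("^\\[.*\\]$", ln)) {
      # a "[name]" header starts a new profile
      current <- sub("^\\[(.*)\\]$", "\\1", ln)
      profiles[[current]] <- list()
    } else if (!is.null(current) && grepl("=", ln, fixed = TRUE)) {
      # split "key = value" on the first "=" only
      key <- trimws(sub("=.*$", "", ln))
      val <- trimws(sub("^[^=]*=", "", ln))
      profiles[[current]][[key]] <- val
    }
  }
  profiles
}

# Write a throwaway file with placeholder values, then parse it
cred_file <- tempfile(fileext = ".credentials")
writeLines(c(
  "[default]",
  "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",
  "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
), cred_file)

creds <- parse_credentials(cred_file)
names(creds)
creds$default$aws_access_key_id
```

In real use the file would normally live at `~/.aws/credentials`, and you would call `read_credentials()` or `use_credentials()` rather than parsing it by hand.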

## Installation

```{r eval=FALSE}
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
```

## Usage

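```{r message=FALSE, warning=FALSE, error=FALSE}
library(metis)
library(dplyr) # for %>%
library(readr) # for type_convert()

# current version
packageVersion("metis")
```

```{r message=FALSE, warning=FALSE, error=FALSE}
# open a connection with the easy-interface helper
athena_connect(
  default_schema = "sampledb",
  s3_staging_dir = "s3://accessible-bucket",
  log_path = "/tmp/athena.log",
  log_level = "DEBUG"
) -> ath

dbListTables(ath, schema = "sampledb")

dbExistsTable(ath, "elb_logs", schema = "sampledb")

dbListFields(ath, "elb_logs", "sampledb")

# run a query and let readr re-guess the column types
dbGetQuery(ath, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>%
  type_convert()
```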
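
The `type_convert()` step in the query pipeline may be easier to follow on a local data frame, with no Athena connection involved. This is a sketch that assumes query results can come back with character columns, which the pipeline hints at but this README does not state. The column names and values are invented stand-ins.

```r
library(readr)

# Hypothetical stand-in for a query result: every column is a character
# vector. Column names and values are invented for illustration.
raw <- data.frame(
  request_day = c("2015-01-01", "2015-01-02"),
  elb_name = c("elb_demo_001", "elb_demo_002"),
  received_bytes = c("1024", "2048"),
  stringsAsFactors = FALSE
)

# type_convert() re-guesses the type of each character column;
# suppressMessages() hides the column specification it prints
converted <- suppressMessages(type_convert(raw))

# first class of each column after conversion
sapply(converted, function(col) class(col)[1])
```

Columns that look like dates or numbers are parsed as such, while free-text columns stay as character.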

## Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.