Amazon Athena JDBC Driver Wrapper Supporting the 'metis' Package
boB Rudis a01ab3351f
initial commit
7 years ago
R initial commit 7 years ago
inst initial commit 7 years ago
man initial commit 7 years ago
tests initial commit 7 years ago
.Rbuildignore initial commit 7 years ago
.codecov.yml initial commit 7 years ago
.gitignore initial commit 7 years ago
.travis.yml initial commit 7 years ago
DESCRIPTION initial commit 7 years ago
NAMESPACE initial commit 7 years ago
NEWS.md initial commit 7 years ago
README.Rmd initial commit 7 years ago
README.md initial commit 7 years ago
metis.Rproj initial commit 7 years ago

README.md

metis : Helpers for Accessing and Querying Amazon Athena

Including a lightweight RJDBC shim.

THIS IS SUPER ALPHA QUALITY. NOTHING TO SEE HERE. MOVE ALONG.

The goal is to get around enough of the "gotchas" that prevent raw RJDBC Athena connections from "just working" with dplyr v0.6.0+, and to get around the fetchSize problem without having to give up dbGetQuery().

It will also support more than the vanilla id/secret auth mechanism (it currently supports the default basic auth and temp token auth, the latter via environment variables).
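A minimal sketch of supplying temporary credentials through environment variables before connecting. The variable names are the standard AWS ones and are an assumption here; this README does not say which names metis actually reads. The values are placeholders.

```r
# Assumption: the standard AWS environment variable names are the ones consulted.
# Placeholder values only; never hard-code real credentials in a script.
Sys.setenv(
  AWS_ACCESS_KEY_ID     = "your_access_key_id",
  AWS_SECRET_ACCESS_KEY = "your_secret_access_key",
  AWS_SESSION_TOKEN     = "your_session_token"  # only relevant for temp token auth
)

# Check that all three variables are visible to the current R session
creds_set <- nzchar(Sys.getenv(c(
  "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"
)))
stopifnot(all(creds_set))
```

In practice these would more likely live in ~/.Renviron or be exported by whatever tool issues the temporary token, so that they never appear in source files.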

See the Usage section for an example.

The following functions are implemented:

  • athena_connect: Make a JDBC connection to Athena (this returns an AthenaConnection object, which extends its vanilla RJDBC counterpart)
  • Athena: AthenaJDBC
  • AthenaConnection-class: AthenaJDBC
  • AthenaDriver-class: AthenaJDBC
  • AthenaResult-class: AthenaJDBC
  • dbConnect-method: AthenaJDBC
  • dbGetQuery-method: AthenaJDBC
  • dbSendQuery-method: AthenaJDBC

Installation

devtools::install_github("hrbrmstr/metis")

Usage

library(metis)
library(dplyr)

# current version
packageVersion("metis")
## [1] '0.1.0'
ath <- athena_connect("your_schema_name")

res <- dbGetQuery(ath, "
SELECT format_datetime(timestamp, 'yyyy-MM-dd HH:00:00') timestamp,
        port as field, count(port) cnt_field FROM your_schema_name.your_table_name
        WHERE CONTAINS(ARRAY['201705'], date)
        AND port IN (445, 139, 3389)
        AND timestamp > date '2017-05-01'
        AND timestamp <= date '2017-05-22'
GROUP BY format_datetime(timestamp, 'yyyy-MM-dd HH:00:00'), port LIMIT 1000000
")
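
Since dbGetQuery() returns an ordinary data frame, the result can be handed straight to dplyr. The sketch below uses a small hypothetical stand-in for `res` with the three columns the query selects (timestamp, field, cnt_field); the values are made up for illustration only, and treating the timestamps as UTC is an assumption.

```r
library(dplyr)

# Hypothetical stand-in for `res`; the values are invented for illustration
res_example <- data.frame(
  timestamp = c("2017-05-01 00:00:00", "2017-05-01 00:00:00", "2017-05-01 01:00:00"),
  field     = c(445L, 3389L, 445L),
  cnt_field = c(10, 4, 7),
  stringsAsFactors = FALSE
)

hourly <- res_example %>%
  # parse the 'yyyy-MM-dd HH:00:00' strings (UTC is an assumption)
  mutate(timestamp = as.POSIXct(timestamp, tz = "UTC")) %>%
  group_by(field) %>%
  summarise(total = sum(cnt_field)) %>%  # total count per port
  arrange(desc(total))
```

With a live connection, replacing `res_example` with `res` should be the only change needed, provided the driver returns the count column as a numeric type.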