# metis Access and Query Amazon Athena via DBI/JDBC ## Description In Greek mythology, Metis was Athena’s “helper” so methods are provided to help you accessing and querying Amazon Athena via DBI/JDBC and/or `dplyr`. \#’ Methods are provides to connect to ‘Amazon’ ‘Athena’, lookup schemas/tables, ## IMPORTANT Since R 3.5 (I don’t remember this happening in R 3.4.x) signals sent from interrupting Athena JDBC calls crash the R interpreter. You need to set the `-Xrs` option to avoid signals being passed on to the JVM owner. That has to be done *before* `rJava` is loaded so you either need to remember to put it at the top of all scripts *or* stick this in your local `~/.Rprofile` and/or sitewide `Rprofile`: ``` r if (!grepl("-Xrs", getOption("java.parameters", ""))) { options( "java.parameters" = c(getOption("java.parameters", default = NULL), "-Xrs") ) } ``` ## What’s Inside The Tin? The following functions are implemented: Easy-interface connection helper: - `athena_connect` Simplified Athena JDBC connection helper Custom JDBC Classes: - `Athena`: AthenaJDBC (make a new Athena con obj) - `AthenaConnection-class`: AthenaJDBC - `AthenaDriver-class`: AthenaJDBC - `AthenaResult-class`: AthenaJDBC Custom JDBC Class Methods: - `dbConnect-method` - `dbExistsTable-method` - `dbGetQuery-method` - `dbListFields-method` - `dbListTables-method` - `dbReadTable-method` - `dbSendQuery-method` Pulled in from other `cloudyr` pkgs: - `read_credentials`: Use Credentials from .aws/credentials File - `use_credentials`: Use Credentials from .aws/credentials File ## Installation ``` r devtools::install_git("https://git.sr.ht/~hrbrmstr/metis-lite") # OR devtools::install_gitlab("hrbrmstr/metis-lite") # OR devtools::install_github("hrbrmstr/metis-lite") ``` ## Usage ``` r library(metis.lite) # current verison packageVersion("metis.lite") ``` ## [1] '0.3.0' ``` r library(rJava) library(RJDBC) library(metis.lite) library(magrittr) library(dbplyr) library(dplyr) dbConnect( drv = metis.lite::Athena(), schema_name = "sampledb", provider = "com.simba.athena.amazonaws.auth.PropertiesFileCredentialsProvider", AwsCredentialsProviderArguments = path.expand("~/.aws/athenaCredentials.props"), s3_staging_dir = "s3://aws-athena-query-results-569593279821-us-east-1", ) -> con dbListTables(con, schema="sampledb") ``` ## [1] "elb_logs" ``` r dbExistsTable(con, "elb_logs", schema="sampledb") ``` ## [1] TRUE ``` r dbListFields(con, "elb_logs", "sampledb") ``` ## [1] "timestamp" "elbname" "requestip" "requestport" ## [5] "backendip" "backendport" "requestprocessingtime" "backendprocessingtime" ## [9] "clientresponsetime" "elbresponsecode" "backendresponsecode" "receivedbytes" ## [13] "sentbytes" "requestverb" "url" "protocol" ``` r dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>% glimpse() ``` ## Observations: 10 ## Variables: 16 ## $ timestamp "2014-09-29T18:18:51.826955Z", "2014-09-29T18:18:51.920462Z", "2014-09-29T18:18:52.2725… ## $ elbname "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo",… ## $ requestip "255.48.150.122", "249.213.227.93", "245.108.120.229", "241.112.203.216", "241.43.107.2… ## $ requestport 62096, 62096, 62096, 62096, 56454, 33254, 18918, 64352, 1651, 56454 ## $ backendip "244.238.214.120", "248.99.214.228", "243.3.190.175", "246.235.181.255", "241.112.203.2… ## $ backendport 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888 ## $ requestprocessingtime 9.0e-05, 9.7e-05, 8.7e-05, 9.4e-05, 7.6e-05, 8.3e-05, 6.3e-05, 5.4e-05, 8.2e-05, 8.7e-05 ## $ backendprocessingtime 0.007410, 0.256533, 0.442659, 0.016772, 0.035036, 0.029892, 0.034148, 0.014858, 0.01518… ## $ clientresponsetime 0.000055, 0.000075, 0.000131, 0.000078, 0.000057, 0.000043, 0.000033, 0.000043, 0.00007… ## $ elbresponsecode "302", "302", "200", "200", "200", "200", "200", "200", "200", "200" ## $ backendresponsecode "200", "200", "200", "200", "200", "200", "200", "200", "200", "200" ## $ receivedbytes 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ## $ sentbytes 0, 0, 58402, 152213, 20766, 32370, 3408, 3884, 84245, 3831 ## $ requestverb "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET" ## $ url "http://www.abcxyz.com:80/", "http://www.abcxyz.com:80/accounts/login/?next=/", "http:/… ## $ protocol "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HT… ### Check types ``` r dbGetQuery(con, " SELECT CAST('chr' AS CHAR(4)) achar, CAST('varchr' AS VARCHAR) avarchr, CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday, CAST(100.1 AS DOUBLE) AS justadbl, CAST(127 AS TINYINT) AS asmallint, CAST(100 AS INTEGER) AS justanint, CAST(100000000000000000 AS BIGINT) AS abigint, CAST(('GET' = 'GET') AS BOOLEAN) AS is_get, ARRAY[1, 2, 3] AS arr1, ARRAY['1', '2, 3', '4'] AS arr2, MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp, CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw, CAST('{\"a\":1}' AS JSON) js FROM elb_logs LIMIT 1 ") %>% glimpse() ``` ## Observations: 1 ## Variables: 13 ## $ achar "chr " ## $ avarchr "varchr" ## $ tsday 2014-09-26 ## $ justadbl 100.1 ## $ asmallint 127 ## $ justanint 100 ## $ abigint 100000000000000000 ## $ is_get TRUE ## $ arr1 "1, 2, 3" ## $ arr2 "1, 2, 3, 4" ## $ mp "{bar=2, foo=1}" ## $ rw "{x=1, y=2.0}" ## $ js "\"{\\\"a\\\":1}\"" #### dplyr ``` r tbl(con, sql(" SELECT CAST('chr' AS CHAR(4)) achar, CAST('varchr' AS VARCHAR) avarchr, CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday, CAST(100.1 AS DOUBLE) AS justadbl, CAST(127 AS TINYINT) AS asmallint, CAST(100 AS INTEGER) AS justanint, CAST(100000000000000000 AS BIGINT) AS abigint, CAST(('GET' = 'GET') AS BOOLEAN) AS is_get, ARRAY[1, 2, 3] AS arr, ARRAY['1', '2, 3', '4'] AS arr, MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp, CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw, CAST('{\"a\":1}' AS JSON) js FROM elb_logs LIMIT 1 ")) %>% glimpse() ``` ## Observations: ?? ## Variables: 13 ## Database: AthenaConnection ## $ achar "chr " ## $ avarchr "varchr" ## $ tsday 2014-09-27 ## $ justadbl 100.1 ## $ asmallint 127 ## $ justanint 100 ## $ abigint 100000000000000000 ## $ is_get TRUE ## $ arr "1, 2, 3" ## $ arr "1, 2, 3, 4" ## $ mp "{bar=2, foo=1}" ## $ rw "{x=1, y=2.0}" ## $ js "\"{\\\"a\\\":1}\"" ## Code of Conduct Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.