Access and Query Amazon Athena via DBI/JDBC

  1. # metis
  2. Access and Query Amazon Athena via DBI/JDBC
  3. ## Description
  4. In Greek mythology, Metis was Athena’s “helper”, so methods are provided
  5. to help you access and query Amazon Athena via DBI/JDBC and/or
  6. `dplyr`. Methods are provided to connect to Amazon Athena and
  7. look up schemas/tables.
  8. ## IMPORTANT
  9. Since R 3.5 (I don’t remember this happening in R 3.4.x) signals sent
  10. from interrupting Athena JDBC calls crash the R interpreter. You need to
  11. set the `-Xrs` option to avoid signals being passed on to the JVM owner.
  12. That has to be done *before* `rJava` is loaded so you either need to
  13. remember to put it at the top of all scripts *or* stick this in your
  14. local `~/.Rprofile` and/or sitewide `Rprofile`:
  15. ``` r
  16. if (!any(grepl("-Xrs", getOption("java.parameters", ""), fixed = TRUE))) {
  17. options(
  18. "java.parameters" = c(getOption("java.parameters", default = NULL), "-Xrs")
  19. )
  20. }
  21. ```
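As a minimal sketch of the same idea, the check can be wrapped in a small function so it is safe to call from several scripts. `ensure_java_param()` is a hypothetical helper name, not something `metis.lite` or `rJava` provides, and the warning at the end only assumes that a loaded `rJava` namespace may mean the JVM was started without the flag.

``` r
# Hypothetical helper (not part of metis.lite): append a JVM flag to
# `java.parameters` only when it is not already present.
ensure_java_param <- function(flag = "-Xrs") {
  current <- getOption("java.parameters", default = character(0))
  if (!flag %in% current) {
    options(java.parameters = c(current, flag))
  }
  invisible(getOption("java.parameters"))
}

ensure_java_param("-Xrs")
ensure_java_param("-Xrs")  # the second call leaves the option unchanged

# If rJava is already loaded, the JVM may have been started without the flag.
if ("rJava" %in% loadedNamespaces()) {
  warning("rJava is already loaded; restart R so that -Xrs can take effect.")
}
```

Calling the helper before any `library(rJava)` or `library(RJDBC)` line keeps the ordering requirement described above.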
  22. ## What’s Inside The Tin?
  23. The following functions are implemented:
  24. Easy-interface connection helper:
  25. - `athena_connect` Simplified Athena JDBC connection helper
  26. Custom JDBC Classes:
  27. - `Athena`: AthenaJDBC (creates a new Athena connection object)
  28. - `AthenaConnection-class`: AthenaJDBC
  29. - `AthenaDriver-class`: AthenaJDBC
  30. - `AthenaResult-class`: AthenaJDBC
  31. Custom JDBC Class Methods:
  32. - `dbConnect-method`
  33. - `dbExistsTable-method`
  34. - `dbGetQuery-method`
  35. - `dbListFields-method`
  36. - `dbListTables-method`
  37. - `dbReadTable-method`
  38. - `dbSendQuery-method`
  39. Pulled in from other `cloudyr` pkgs:
  40. - `read_credentials`: Use Credentials from .aws/credentials File
  41. - `use_credentials`: Use Credentials from .aws/credentials File
  42. ## Installation
  43. ``` r
  44. devtools::install_git("https://git.sr.ht/~hrbrmstr/metis-lite")
  45. # OR
  46. devtools::install_gitlab("hrbrmstr/metis-lite")
  47. # OR
  48. devtools::install_github("hrbrmstr/metis-lite")
  49. ```
  50. ## Usage
  51. ``` r
  52. library(metis.lite)
  53. # current version
  54. packageVersion("metis.lite")
  55. ```
  56. ## [1] '0.3.0'
  57. ``` r
  58. library(rJava)
  59. library(RJDBC)
  60. library(metis.lite)
  61. library(magrittr)
  62. library(dbplyr)
  63. library(dplyr)
  64. dbConnect(
  65. drv = metis.lite::Athena(),
  66. schema_name = "sampledb",
  67. provider = "com.simba.athena.amazonaws.auth.PropertiesFileCredentialsProvider",
  68. AwsCredentialsProviderArguments = path.expand("~/.aws/athenaCredentials.props"),
  69. s3_staging_dir = "s3://aws-athena-query-results-569593279821-us-east-1",
  70. ) -> con
  71. dbListTables(con, schema="sampledb")
  72. ```
  73. ## [1] "elb_logs"
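The connection above points `AwsCredentialsProviderArguments` at a properties file. As a hedged sketch, the snippet below writes such a file. It assumes the provider follows the AWS SDK for Java convention of reading the `accessKey` and `secretKey` properties. `write_athena_props()` is a hypothetical helper, and the key values are placeholders written to a temporary file rather than to `~/.aws`.

``` r
# Hypothetical helper (not part of metis.lite): write a two-line
# properties file holding an access key and a secret key.
write_athena_props <- function(path, access_key, secret_key) {
  writeLines(
    c(
      paste0("accessKey=", access_key),
      paste0("secretKey=", secret_key)
    ),
    con = path
  )
  Sys.chmod(path, mode = "0600")  # restrict the file to its owner where supported
  invisible(path)
}

# Demo with placeholder values; real keys would go in a private location.
props_path <- tempfile(fileext = ".props")
write_athena_props(props_path, "AKIAEXAMPLEKEYID", "example-secret-key")
props_lines <- readLines(props_path)
```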
  74. ``` r
  75. dbExistsTable(con, "elb_logs", schema="sampledb")
  76. ```
  77. ## [1] TRUE
  78. ``` r
  79. dbListFields(con, "elb_logs", "sampledb")
  80. ```
  81. ## [1] "timestamp" "elbname" "requestip" "requestport"
  82. ## [5] "backendip" "backendport" "requestprocessingtime" "backendprocessingtime"
  83. ## [9] "clientresponsetime" "elbresponsecode" "backendresponsecode" "receivedbytes"
  84. ## [13] "sentbytes" "requestverb" "url" "protocol"
  85. ``` r
  86. dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>%
  87. glimpse()
  88. ```
  89. ## Observations: 10
  90. ## Variables: 16
  91. ## $ timestamp <chr> "2014-09-29T18:18:51.826955Z", "2014-09-29T18:18:51.920462Z", "2014-09-29T18:18:52.2725…
  92. ## $ elbname <chr> "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo",…
  93. ## $ requestip <chr> "255.48.150.122", "249.213.227.93", "245.108.120.229", "241.112.203.216", "241.43.107.2…
  94. ## $ requestport <int> 62096, 62096, 62096, 62096, 56454, 33254, 18918, 64352, 1651, 56454
  95. ## $ backendip <chr> "244.238.214.120", "248.99.214.228", "243.3.190.175", "246.235.181.255", "241.112.203.2…
  96. ## $ backendport <int> 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888
  97. ## $ requestprocessingtime <dbl> 9.0e-05, 9.7e-05, 8.7e-05, 9.4e-05, 7.6e-05, 8.3e-05, 6.3e-05, 5.4e-05, 8.2e-05, 8.7e-05
  98. ## $ backendprocessingtime <dbl> 0.007410, 0.256533, 0.442659, 0.016772, 0.035036, 0.029892, 0.034148, 0.014858, 0.01518…
  99. ## $ clientresponsetime <dbl> 0.000055, 0.000075, 0.000131, 0.000078, 0.000057, 0.000043, 0.000033, 0.000043, 0.00007…
  100. ## $ elbresponsecode <chr> "302", "302", "200", "200", "200", "200", "200", "200", "200", "200"
  101. ## $ backendresponsecode <chr> "200", "200", "200", "200", "200", "200", "200", "200", "200", "200"
  102. ## $ receivedbytes <S3: integer64> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  103. ## $ sentbytes <S3: integer64> 0, 0, 58402, 152213, 20766, 32370, 3408, 3884, 84245, 3831
  104. ## $ requestverb <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET"
  105. ## $ url <chr> "http://www.abcxyz.com:80/", "http://www.abcxyz.com:80/accounts/login/?next=/", "http:/…
  106. ## $ protocol <chr> "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HT…
  107. ### Check types
  108. ``` r
  109. dbGetQuery(con, "
  110. SELECT
  111. CAST('chr' AS CHAR(4)) achar,
  112. CAST('varchr' AS VARCHAR) avarchr,
  113. CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday,
  114. CAST(100.1 AS DOUBLE) AS justadbl,
  115. CAST(127 AS TINYINT) AS asmallint,
  116. CAST(100 AS INTEGER) AS justanint,
  117. CAST(100000000000000000 AS BIGINT) AS abigint,
  118. CAST(('GET' = 'GET') AS BOOLEAN) AS is_get,
  119. ARRAY[1, 2, 3] AS arr1,
  120. ARRAY['1', '2, 3', '4'] AS arr2,
  121. MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp,
  122. CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw,
  123. CAST('{\"a\":1}' AS JSON) js
  124. FROM elb_logs
  125. LIMIT 1
  126. ") %>%
  127. glimpse()
  128. ```
  129. ## Observations: 1
  130. ## Variables: 13
  131. ## $ achar <chr> "chr "
  132. ## $ avarchr <chr> "varchr"
  133. ## $ tsday <date> 2014-09-26
  134. ## $ justadbl <dbl> 100.1
  135. ## $ asmallint <int> 127
  136. ## $ justanint <int> 100
  137. ## $ abigint <S3: integer64> 100000000000000000
  138. ## $ is_get <lgl> TRUE
  139. ## $ arr1 <chr> "1, 2, 3"
  140. ## $ arr2 <chr> "1, 2, 3, 4"
  141. ## $ mp <chr> "{bar=2, foo=1}"
  142. ## $ rw <chr> "{x=1, y=2.0}"
  143. ## $ js <chr> "\"{\\\"a\\\":1}\""
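The output above shows that `ARRAY`, `MAP` and `ROW` values arrive as plain character strings. The following is a minimal sketch of turning those strings back into R vectors. Both helpers are hypothetical and not part of `metis.lite`, and they assume the `", "` separator and `{key=value}` layout seen in this output. Elements that themselves contain `", "` cannot be recovered: `arr2` was built from `'1', '2, 3', '4'` but prints as `"1, 2, 3, 4"`.

``` r
# Hypothetical helpers (not part of metis.lite) for the string forms shown above.

# "1, 2, 3" becomes c("1", "2", "3")
parse_athena_array <- function(x) {
  if (is.na(x) || !nzchar(x)) return(character(0))
  strsplit(x, ", ", fixed = TRUE)[[1]]
}

# "{bar=2, foo=1}" becomes c(bar = "2", foo = "1")
parse_athena_map <- function(x) {
  inner <- sub("^\\{(.*)\\}$", "\\1", x)          # drop the surrounding braces
  if (!nzchar(inner)) return(character(0))
  pairs <- strsplit(inner, ", ", fixed = TRUE)[[1]]
  eq <- as.integer(regexpr("=", pairs, fixed = TRUE))  # first "=" in each pair
  values <- substring(pairs, eq + 1L)
  names(values) <- substring(pairs, 1L, eq - 1L)
  values
}

arr1 <- as.integer(parse_athena_array("1, 2, 3"))
mp <- parse_athena_map("{bar=2, foo=1}")
rw <- parse_athena_map("{x=1, y=2.0}")
```

Values stay as character strings after parsing, so any numeric conversion (as done for `arr1`) is left to the caller.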
  144. #### dplyr
  145. ``` r
  146. tbl(con, sql("
  147. SELECT
  148. CAST('chr' AS CHAR(4)) achar,
  149. CAST('varchr' AS VARCHAR) avarchr,
  150. CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday,
  151. CAST(100.1 AS DOUBLE) AS justadbl,
  152. CAST(127 AS TINYINT) AS asmallint,
  153. CAST(100 AS INTEGER) AS justanint,
  154. CAST(100000000000000000 AS BIGINT) AS abigint,
  155. CAST(('GET' = 'GET') AS BOOLEAN) AS is_get,
  156. ARRAY[1, 2, 3] AS arr,
  157. ARRAY['1', '2, 3', '4'] AS arr,
  158. MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp,
  159. CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw,
  160. CAST('{\"a\":1}' AS JSON) js
  161. FROM elb_logs
  162. LIMIT 1
  163. ")) %>%
  164. glimpse()
  165. ```
  166. ## Observations: ??
  167. ## Variables: 13
  168. ## Database: AthenaConnection
  169. ## $ achar <chr> "chr "
  170. ## $ avarchr <chr> "varchr"
  171. ## $ tsday <date> 2014-09-27
  172. ## $ justadbl <dbl> 100.1
  173. ## $ asmallint <int> 127
  174. ## $ justanint <int> 100
  175. ## $ abigint <S3: integer64> 100000000000000000
  176. ## $ is_get <lgl> TRUE
  177. ## $ arr <chr> "1, 2, 3"
  178. ## $ arr <chr> "1, 2, 3, 4"
  179. ## $ mp <chr> "{bar=2, foo=1}"
  180. ## $ rw <chr> "{x=1, y=2.0}"
  181. ## $ js <chr> "\"{\\\"a\\\":1}\""
  182. ## Code of Conduct
  183. Please note that this project is released with a [Contributor Code of
  184. Conduct](CONDUCT.md). By participating in this project you agree to
  185. abide by its terms.