---
output: rmarkdown::github_document
editor_options:
  chunk_output_type: console
---

# metis

Access and Query Amazon Athena via DBI/JDBC

## Description

In Greek mythology, Metis was Athena's "helper", so methods are provided to help you access and query Amazon Athena via DBI/JDBC and/or `dplyr`.

## IMPORTANT

Since R 3.5 (I don't remember this happening in R 3.4.x), interrupting an Athena JDBC call sends a signal that crashes the R interpreter. You need to set the JVM `-Xrs` option, which stops the JVM from installing its own handlers for those signals. It has to be set _before_ `rJava` is loaded, so either remember to put this at the top of every script _or_ add it to your local `~/.Rprofile` and/or site-wide `Rprofile`:

```r
# Append -Xrs to any existing JVM parameters, but only once.
# any() is needed because java.parameters may hold several values.
if (!any(grepl("-Xrs", getOption("java.parameters", ""), fixed = TRUE))) {
  options(
    "java.parameters" = c(getOption("java.parameters", default = NULL), "-Xrs")
  )
}
```

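Because `-Xrs` only helps if it is in place before `rJava` is loaded, it can be worth failing fast when it is missing. A minimal sketch, assuming the option is only read when the JVM starts; `xrs_ready()` is a hypothetical helper, not part of `metis.lite`:

```r
# Hypothetical helper (not part of metis.lite).
# Returns TRUE if "-Xrs" is among the queued JVM parameters, and warns when
# rJava is already loaded, since the option must be set before that happens.
xrs_ready <- function() {
  params <- getOption("java.parameters", default = character(0))
  has_xrs <- any(grepl("-Xrs", params, fixed = TRUE))
  if ("rJava" %in% loadedNamespaces()) {
    warning("rJava is already loaded; '-Xrs' must be set before it loads")
  }
  has_xrs
}
```

Calling `stopifnot(xrs_ready())` just before `library(rJava)` or `library(RJDBC)` turns a forgotten `~/.Rprofile` entry into an early, readable error instead of a session that crashes on the first interrupted query.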
## What's Inside The Tin?

The following functions are implemented:

Easy-interface connection helper:

- `athena_connect`: Simplified Athena JDBC connection helper

Custom JDBC Classes:

- `Athena`: AthenaJDBC (make a new Athena connection object)
- `AthenaConnection-class`: AthenaJDBC
- `AthenaDriver-class`: AthenaJDBC
- `AthenaResult-class`: AthenaJDBC

Custom JDBC Class Methods:

- `dbConnect-method`
- `dbExistsTable-method`
- `dbGetQuery-method`
- `dbListFields-method`
- `dbListTables-method`
- `dbReadTable-method`
- `dbSendQuery-method`

Pulled in from other `cloudyr` pkgs:

- `read_credentials`: Read and parse a `.aws/credentials` file
- `use_credentials`: Use credentials from a profile in a `.aws/credentials` file

## Installation

```{r eval=FALSE}
devtools::install_git("https://git.sr.ht/~hrbrmstr/metis-lite")
# OR
devtools::install_gitlab("hrbrmstr/metis-lite")
# OR
devtools::install_github("hrbrmstr/metis-lite")
```

```{r message=FALSE, warning=FALSE, include=FALSE}
options(width = 120)
```

## Usage

```{r message=FALSE, warning=FALSE}
library(metis.lite)

# current version
packageVersion("metis.lite")
```

```{r message=FALSE, warning=FALSE}
library(rJava)
library(RJDBC)
library(metis.lite)
library(magrittr)
library(dbplyr)
library(dplyr)

# Credentials are read from a Java properties file by the JDBC driver
dbConnect(
  drv = metis.lite::Athena(),
  schema_name = "sampledb",
  provider = "com.simba.athena.amazonaws.auth.PropertiesFileCredentialsProvider",
  AwsCredentialsProviderArguments = path.expand("~/.aws/athenaCredentials.props"),
  s3_staging_dir = "s3://aws-athena-query-results-569593279821-us-east-1"
) -> con

dbListTables(con, schema = "sampledb")

dbExistsTable(con, "elb_logs", schema = "sampledb")

dbListFields(con, "elb_logs", "sampledb")

dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>%
  glimpse()
```

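The connection above reads its keys from `~/.aws/athenaCredentials.props` via `PropertiesFileCredentialsProvider`. Assuming the Simba-bundled class behaves like the AWS SDK for Java (v1) provider of the same name, that file is a plain Java properties file with `accessKey` and `secretKey` entries. A minimal sketch for creating it from R; `write_athena_props()` is a hypothetical helper and the key values are placeholders:

```r
# Hypothetical helper: write a Java properties file with the two keys that
# the AWS SDK for Java's PropertiesCredentials expects (accessKey, secretKey).
write_athena_props <- function(path, access_key, secret_key) {
  writeLines(
    c(
      paste0("accessKey=", access_key),
      paste0("secretKey=", secret_key)
    ),
    con = path
  )
  # Restrict the file to its owner (on Windows only the read-only flag changes)
  Sys.chmod(path, mode = "0600")
  invisible(path)
}

# Usage with placeholder values (never commit real keys):
# write_athena_props(path.expand("~/.aws/athenaCredentials.props"),
#                    "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
```

Keeping the keys in this file, rather than in the `dbConnect()` call, means the R script itself never contains secrets.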
### Check types

```{r}
dbGetQuery(con, "
  SELECT
    CAST('chr' AS CHAR(4)) achar,
    CAST('varchr' AS VARCHAR) avarchr,
    CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday,
    CAST(100.1 AS DOUBLE) AS justadbl,
    CAST(127 AS TINYINT) AS asmallint,
    CAST(100 AS INTEGER) AS justanint,
    CAST(100000000000000000 AS BIGINT) AS abigint,
    CAST(('GET' = 'GET') AS BOOLEAN) AS is_get,
    ARRAY[1, 2, 3] AS arr1,
    ARRAY['1', '2, 3', '4'] AS arr2,
    MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp,
    CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw,
    CAST('{\"a\":1}' AS JSON) js
  FROM elb_logs
  LIMIT 1
") %>%
  glimpse()
```

#### dplyr

```{r}
tbl(con, sql("
  SELECT
    CAST('chr' AS CHAR(4)) achar,
    CAST('varchr' AS VARCHAR) avarchr,
    CAST(SUBSTR(timestamp, 1, 10) AS DATE) AS tsday,
    CAST(100.1 AS DOUBLE) AS justadbl,
    CAST(127 AS TINYINT) AS asmallint,
    CAST(100 AS INTEGER) AS justanint,
    CAST(100000000000000000 AS BIGINT) AS abigint,
    CAST(('GET' = 'GET') AS BOOLEAN) AS is_get,
    ARRAY[1, 2, 3] AS arr1,
    ARRAY['1', '2, 3', '4'] AS arr2,
    MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) AS mp,
    CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)) AS rw,
    CAST('{\"a\":1}' AS JSON) js
  FROM elb_logs
  LIMIT 1
")) %>%
  glimpse()
```

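The DBI and `dplyr` type checks above run the same column list, typed out twice. A minimal sketch of building that `SELECT` from one named character vector, so each alias is written once and duplicate aliases are rejected before the query reaches Athena; `build_select()` and `type_checks` are hypothetical names, and only a subset of the columns is shown:

```r
# Hypothetical helper: turn c(alias = "expression", ...) into
# "SELECT expression AS alias, ... FROM <from> LIMIT <limit>".
build_select <- function(exprs, from, limit = 1L) {
  stopifnot(
    is.character(exprs),
    !is.null(names(exprs)),
    !anyDuplicated(names(exprs))  # catches repeated aliases
  )
  cols <- paste0("  ", exprs, " AS ", names(exprs), collapse = ",\n")
  sprintf("SELECT\n%s\nFROM %s\nLIMIT %d", cols, from, as.integer(limit))
}

# A subset of the type-check columns; extend with the rest as needed
type_checks <- c(
  achar    = "CAST('chr' AS CHAR(4))",
  avarchr  = "CAST('varchr' AS VARCHAR)",
  justadbl = "CAST(100.1 AS DOUBLE)",
  arr1     = "ARRAY[1, 2, 3]",
  arr2     = "ARRAY['1', '2, 3', '4']"
)

# With `con` from the connection example:
# dbGetQuery(con, build_select(type_checks, "elb_logs")) %>% glimpse()
# tbl(con, sql(build_select(type_checks, "elb_logs"))) %>% glimpse()
```

Generating both queries from one vector keeps the DBI and `dplyr` examples in sync, at the cost of the SQL being slightly less readable inline.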
## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.