Access and Query Amazon Athena via DBI/JDBC

[`metis`](https://en.wikipedia.org/wiki/Metis_(mythology)) : Helpers for Accessing and Querying Amazon Athena

Including a lightweight RJDBC shim.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Winged_goddess_Louvre_F32.jpg/300px-Winged_goddess_Louvre_F32.jpg)

THIS IS SUPER ALPHA QUALITY. NOTHING TO SEE HERE. MOVE ALONG.

The goal is to get around enough of the "gotchas" that prevent raw RJDBC Athena connections from "just working" with `dplyr` v0.6.0+, and also to get around the [`fetchSize` problem](https://www.reddit.com/r/aws/comments/6aq22b/fetchsize_limit/) without giving up `dbGetQuery()`.

It will also support more than the vanilla id/secret auth mechanism (it currently supports the default basic auth and temp token auth, the latter via environment variables).

See the **Usage** section for an example.
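
Since the temp token auth path reads credentials from environment variables, it can help to check them before connecting. A minimal sketch, assuming the standard AWS variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`); this README does not say which names `metis` actually reads, and `missing_aws_vars()` is a hypothetical helper, not part of the package:

``` r
# Assumption: standard AWS credential variable names; metis may look for others.
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")

# Hypothetical helper: return the names of any variables that are unset or empty.
missing_aws_vars <- function(vars = aws_vars) {
  vals <- Sys.getenv(vars, unset = "")
  vars[!nzchar(vals)]
}

missing <- missing_aws_vars()
if (length(missing) > 0) {
  message("Set these before connecting: ", paste(missing, collapse = ", "))
}
```

Keeping the values in the environment (for example in `~/.Renviron`) rather than in scripts avoids committing secrets alongside analysis code.
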
The following functions are implemented:

- `athena_connect`: Make a JDBC connection to Athena (this returns an `AthenaConnection` object, which extends its vanilla RJDBC counterpart)
- `Athena`: AthenaJDBC
- `AthenaConnection-class`: AthenaJDBC
- `AthenaDriver-class`: AthenaJDBC
- `AthenaResult-class`: AthenaJDBC
- `dbConnect-method`: AthenaJDBC
- `dbGetQuery-method`: AthenaJDBC
- `dbSendQuery-method`: AthenaJDBC

### Installation

``` r
devtools::install_github("hrbrmstr/metis")
```

### Usage

``` r
library(metis)
library(dplyr)

# current version
packageVersion("metis")
```

    ## [1] '0.1.0'

``` r
ath <- athena_connect("your_schema_name")

res <- dbGetQuery(ath, "
SELECT format_datetime(timestamp, 'yyyy-MM-dd HH:00:00') timestamp,
       port as field, count(port) cnt_field FROM your_schema_name.your_table_name
WHERE CONTAINS(ARRAY['201705'], date)
      AND port IN (445, 139, 3389)
      AND timestamp > date '2017-05-01'
      AND timestamp <= date '2017-05-22'
GROUP BY format_datetime(timestamp, 'yyyy-MM-dd HH:00:00'), port LIMIT 1000000
")
```
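
For context on the `fetchSize` problem mentioned earlier: the usual manual workaround is to skip `dbGetQuery()` and pull rows in small batches yourself. A rough sketch of that loop in base R, assuming the limit applies to the number of rows requested per fetch (the default of 1000 below is an assumed value). `collect_chunks()` is a hypothetical helper, not part of `metis`; it accepts any function that returns up to `n` rows as a data frame, so it could wrap something like `DBI::dbFetch()`:

``` r
# Hypothetical helper: call fetch_chunk(n) repeatedly until it returns
# fewer than n rows, then bind all chunks into one data frame.
collect_chunks <- function(fetch_chunk, chunk_size = 1000L) {
  chunks <- list()
  repeat {
    chunk <- fetch_chunk(chunk_size)
    chunks[[length(chunks) + 1L]] <- chunk
    # A short (or empty) chunk means the result set is exhausted.
    if (nrow(chunk) < chunk_size) break
  }
  do.call(rbind, chunks)
}

# With a DBI result set (not run here; needs a live connection):
#   res <- DBI::dbSendQuery(ath, "SELECT ...")
#   out <- collect_chunks(function(n) DBI::dbFetch(res, n = n))
#   DBI::dbClearResult(res)
```

Stopping on a short chunk rather than on a completion flag is a deliberate choice in this sketch, since it only relies on the row count each fetch returns.
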