# decapitated
Headless ‘Chrome’ Orchestration
## Description
The ‘Chrome’ browser <https://www.google.com/chrome/> has a headless
mode which can be instrumented programmatically. Tools are provided to
perform headless ‘Chrome’ instrumentation on the command line, including
retrieving the JavaScript-executed web page, PDF output or screenshot
of a URL.
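To make the command-line instrumentation concrete, here is a minimal sketch of calling the Chrome binary directly from R with its `--headless`, `--dump-dom`, `--print-to-pdf` and `--screenshot` switches. The helper `headless_args()` is hypothetical and is not part of this package; whether the package builds its calls this way is an assumption.

``` r
# Hypothetical helper (not part of the package): build the command-line
# arguments for one headless Chrome task.
headless_args <- function(url, task = c("dom", "pdf", "screenshot")) {
  task <- match.arg(task)
  flag <- switch(task,
    dom = "--dump-dom",
    pdf = "--print-to-pdf",
    screenshot = "--screenshot"
  )
  # system2() does not quote its arguments, so quote the URL here.
  c("--headless", "--disable-gpu", flag, shQuote(url))
}

chrome <- Sys.getenv("HEADLESS_CHROME")

# Only call the binary if the variable points at an existing file.
if (nzchar(chrome) && file.exists(chrome)) {
  html <- system2(chrome, headless_args("http://httpbin.org/"), stdout = TRUE)
}
```

With `task = "dom"` the rendered page is written to standard output, which `system2()` captures as a character vector; the other two tasks write a file into the working directory instead.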
7 years ago
### IMPORTANT
You’ll need to set an environment variable `HEADLESS_CHROME` to one of
these values:

- Windows(32bit): `C:/Program Files/Google/Chrome/Application/chrome.exe`
- Windows(64bit): `C:/Program Files (x86)/Google/Chrome/Application/chrome.exe`
- macOS: `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome`
- Linux: `/usr/bin/google-chrome`

If `HEADLESS_CHROME` is not set, the package guesses the location of the
binary, but that guess is not verified yet.

It’s best to use `~/.Renviron` to store this value for the time being.
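As a rough illustration of the setup described above, the sketch below looks for Chrome at the conventional locations, sets `HEADLESS_CHROME` for the current session, and prints the line you would add to `~/.Renviron`. The helper `guess_chrome_path()` is hypothetical and uses only base R; it is not the package's own guessing logic. Paths are written without shell escapes because R does not need them.

``` r
# Hypothetical helper, not the package's own logic: return the first
# conventional Chrome location that exists on disk, or NA if none does.
guess_chrome_path <- function() {
  candidates <- switch(Sys.info()[["sysname"]],
    Windows = c(
      "C:/Program Files/Google/Chrome/Application/chrome.exe",
      "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
    ),
    Darwin = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    Linux = "/usr/bin/google-chrome",
    character(0)
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[[1]] else NA_character_
}

# Use the existing value if there is one; otherwise fall back to the guess.
chrome_path <- Sys.getenv("HEADLESS_CHROME", unset = NA)
if (is.na(chrome_path) || !nzchar(chrome_path)) {
  chrome_path <- guess_chrome_path()
}

if (!is.na(chrome_path)) {
  # Applies to the current session only.
  Sys.setenv(HEADLESS_CHROME = chrome_path)
  # The equivalent line for ~/.Renviron, which R reads at startup.
  cat("HEADLESS_CHROME=", chrome_path, "\n", sep = "")
}
```

Entries in `~/.Renviron` use plain `NAME=value` syntax rather than R code, and they take effect the next time R starts.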
## What’s in the tin?
The following functions are implemented:
- `chrome_dump_pdf`: “Print” to PDF
- `chrome_read_html`: Read a URL via headless Chrome and return the
  raw or rendered `<body>` `innerHTML` DOM elements
- `chrome_shot`: Capture a screenshot
- `chrome_version`: Get Chrome version
- `get_chrome_env`: Get the `HEADLESS_CHROME` environment variable
- `set_chrome_env`: Set the `HEADLESS_CHROME` environment variable
## Installation
``` r
devtools::install_github("hrbrmstr/decapitated")
```
## Usage
``` r
library(decapitated)

# current version
packageVersion("decapitated")
## [1] '0.1.0'
```

``` r
chrome_version()
chrome_read_html("http://httpbin.org/")
## {xml_document}
## <html>
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta http-equiv="content-type" valu ...
## [2] <body id="manpage">\n<a href="http://github.com/kennethreitz/httpbin"><img style="position: absolute; top: 0; rig ...
```
``` r
chrome_dump_pdf("http://httpbin.org/")
```
``` r
chrome_shot("http://httpbin.org/")
## format width height colorspace filesize
## 1 PNG 1600 1200 sRGB 215680
```
![screenshot.png](screenshot.png)