decapitated
Headless ‘Chrome’ Orchestration
Description
The ‘Chrome’ browser (https://www.google.com/chrome/) has a headless mode that can be instrumented programmatically. Tools are provided to perform headless ‘Chrome’ instrumentation on the command line, including retrieving the JavaScript-executed web page, PDF output, or a screenshot of a URL.
IMPORTANT
You’ll need to set the environment variable HEADLESS_CHROME to the value that matches your platform:
- Windows (32bit): C:/Program Files/Google/Chrome/Application/chrome.exe
- Windows (64bit): C:/Program Files (x86)/Google/Chrome/Application/chrome.exe
- macOS: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
- Linux: /usr/bin/google-chrome
If HEADLESS_CHROME is not set, a guess is made (but not verified yet).
It’s best to use ~/.Renviron to store this value for the time being.
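As a rough illustration of the setup described above, the following base R sketch looks for Chrome at the locations listed and sets HEADLESS_CHROME for the current session when it is unset. The helper `guess_chrome_path()` is hypothetical and is not part of the package. The package lists its own `set_chrome_env`, but its arguments are not shown here, so only base R calls are used.

```r
# Hypothetical helper (not part of decapitated): return the first listed
# Chrome location that exists on this machine, or NA if none is found.
guess_chrome_path <- function(sysname = Sys.info()[["sysname"]]) {
  candidates <- switch(
    sysname,
    Windows = c(
      "C:/Program Files/Google/Chrome/Application/chrome.exe",
      "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
    ),
    # Written without shell-style backslash escapes so file.exists() can check it.
    Darwin = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    Linux = "/usr/bin/google-chrome",
    character(0) # unknown platform: no candidates
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[[1]] else NA_character_
}

# Only set the variable for this session if it is currently unset or empty.
if (!nzchar(Sys.getenv("HEADLESS_CHROME"))) {
  path <- guess_chrome_path()
  if (!is.na(path)) Sys.setenv(HEADLESS_CHROME = path)
}

# To persist the value across sessions, a line such as the following could go
# in ~/.Renviron (Linux path shown as an example):
# HEADLESS_CHROME=/usr/bin/google-chrome
```

`Sys.setenv()` only affects the running R session, which is why ~/.Renviron is the better place for a permanent setting.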
What’s in the tin?
The following functions are implemented:
- chrome_dump_pdf: “Print” to PDF
- chrome_read_html: Read a URL via headless Chrome and return the raw or rendered ‘innerHTML’ DOM elements
- chrome_shot: Capture a screenshot
- chrome_version: Get Chrome version
- get_chrome_env: Get the environment variable ‘HEADLESS_CHROME’
- set_chrome_env: Set the environment variable ‘HEADLESS_CHROME’
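For orientation, the operations listed above correspond to things headless Chrome can do directly from the command line. The sketch below builds such a call with base R. The helpers `headless_args()` and `run_headless()` are hypothetical, and this README does not say which switches the package itself passes, so treat this as an assumption-laden illustration rather than the package's implementation. The switches used (`--headless`, `--dump-dom`, `--print-to-pdf`, `--screenshot`) are Chrome command-line switches.

```r
# Hypothetical helper (not part of decapitated): build the argument vector
# for one headless Chrome invocation.
headless_args <- function(url, mode = c("html", "pdf", "shot"), out = NULL) {
  mode <- match.arg(mode)
  # Append "=value" only when an output path is supplied.
  with_value <- function(flag, value) {
    if (is.null(value)) flag else paste0(flag, "=", value)
  }
  action <- switch(
    mode,
    html = "--dump-dom",                      # write the rendered DOM to stdout
    pdf  = with_value("--print-to-pdf", out), # "print" the page to a PDF file
    shot = with_value("--screenshot", out)    # capture the page as an image
  )
  c("--headless", action, url)
}

# Hypothetical runner: calls the binary named by HEADLESS_CHROME.
# It is defined here but not executed, since it needs a local Chrome install.
run_headless <- function(url, mode = "html", out = NULL,
                         chrome = Sys.getenv("HEADLESS_CHROME")) {
  stopifnot(nzchar(chrome))
  system2(chrome, args = shQuote(headless_args(url, mode, out)), stdout = TRUE)
}

# Inspect the arguments without launching a browser.
headless_args("http://httpbin.org/", mode = "pdf", out = "httpbin.pdf")
```

Keeping argument construction separate from execution makes the command easy to inspect and test on machines without Chrome.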
Installation
devtools::install_github("hrbrmstr/decapitated")
Usage
library(decapitated)
# current version
packageVersion("decapitated")
## [1] '0.1.0'
chrome_version()
chrome_read_html("http://httpbin.org/")
## {xml_document}
## <html>
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta http-equiv="content-type" valu ...
## [2] <body id="manpage">\n<a href="http://github.com/kennethreitz/httpbin"><img style="position: absolute; top: 0; rig ...
chrome_dump_pdf("http://httpbin.org/")
chrome_shot("http://httpbin.org/")
## format width height colorspace filesize
## 1 PNG 1600 1200 sRGB 215680