decapitated

Headless ‘Chrome’ Orchestration

Description

The ‘Chrome’ browser https://www.google.com/chrome/ has a headless mode which can be instrumented programmatically. Tools are provided to perform headless ‘Chrome’ instrumentation on the command-line, including retrieving the JavaScript-executed web page, PDF output or screenshot of a URL.

IMPORTANT

You’ll need to set the environment variable HEADLESS_CHROME to the location of the Chrome binary. Typical locations are:

  • Windows (32-bit): C:/Program Files/Google/Chrome/Application/chrome.exe
  • Windows (64-bit): C:/Program Files (x86)/Google/Chrome/Application/chrome.exe
  • macOS: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
  • Linux: /usr/bin/google-chrome

If HEADLESS_CHROME is not set, a guess is made at the location (the guess is not yet verified).
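
To illustrate what such a guess could look like, here is a minimal sketch in base R. It is not the package's own logic: `guess_chrome_path()` is a hypothetical helper that picks from the typical locations listed above by operating system and, unlike the package at present, checks that the file exists.

```r
# Hypothetical helper, not part of decapitated: choose a default Chrome
# location from the typical paths listed above, based on the operating system.
guess_chrome_path <- function(sysname = Sys.info()[["sysname"]]) {
  candidates <- switch(
    sysname,
    Windows = c(
      "C:/Program Files/Google/Chrome/Application/chrome.exe",
      "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
    ),
    # Spaces are not backslash-escaped inside an R string.
    Darwin = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    Linux = "/usr/bin/google-chrome",
    character(0) # unknown platform: no candidates
  )
  # Keep only candidates that exist on disk; return NA if none do.
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[[1]] else NA_character_
}

guess_chrome_path()
```

Returning `NA` rather than an unverified path makes a missing browser visible before any headless call is attempted.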

It’s best to use ~/.Renviron to store this value for the time being.
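
In ~/.Renviron this is a single line of the form `HEADLESS_CHROME=/usr/bin/google-chrome`. As a rough sketch, the same value can also be set for the current session only, using base R and then checked. The path below is an assumption; substitute the location of Chrome on your system.

```r
# Assumed path for illustration; replace with your own Chrome location.
chrome_path <- "/usr/bin/google-chrome"

# Set the variable for the current R session only
# (a line in ~/.Renviron makes it persistent across sessions).
Sys.setenv(HEADLESS_CHROME = chrome_path)

# Read the value back and check that it points at an existing file.
configured <- Sys.getenv("HEADLESS_CHROME")
if (!file.exists(configured)) {
  message("No file found at: ", configured)
}
```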

What’s in the tin?

The following functions are implemented:

  • chrome_dump_pdf: “Print” to PDF
  • chrome_read_html: Read a URL via headless Chrome and return the raw or rendered ‘innerHTML’ DOM elements
  • chrome_shot: Capture a screenshot
  • chrome_version: Get Chrome version
  • get_chrome_env: Get the environment variable ‘HEADLESS_CHROME’
  • set_chrome_env: Set the environment variable ‘HEADLESS_CHROME’

Installation

devtools::install_github("hrbrmstr/decapitated")

Usage

library(decapitated)

# current version
packageVersion("decapitated")
## [1] '0.1.0'
chrome_version()

chrome_read_html("http://httpbin.org/")
## {xml_document}
## <html>
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta http-equiv="content-type" valu ...
## [2] <body id="manpage">\n<a href="http://github.com/kennethreitz/httpbin"><img style="position: absolute; top: 0; rig ...
chrome_dump_pdf("http://httpbin.org/")
chrome_shot("http://httpbin.org/")

##   format width height colorspace filesize
## 1    PNG  1600   1200       sRGB   215680

screenshot.png