% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read-html.r
\name{chrome_read_html}
\alias{chrome_read_html}
\title{Read a URL via headless Chrome and return the raw or rendered \code{<body>} \code{innerHTML} DOM elements}
\usage{
chrome_read_html(url, render = TRUE, prime = TRUE, work_dir = NULL,
  chrome_bin = Sys.getenv("HEADLESS_CHROME"))
}
\arguments{
\item{url}{URL to read from}

\item{render}{if \code{TRUE}, return an \code{xml_document}; otherwise return the raw HTML (invisibly)}

\item{prime}{if \code{TRUE}, preliminary URL retrieval requests will be sent to "prime" the
headless Chrome cache. This seems to be necessary primarily on recent versions of macOS.
If numeric, that number of "prime" requests will be sent ahead of the capture request.
If \code{FALSE}, no priming requests will be sent.}

\item{work_dir}{directory to use for headless Chrome operations. See the section on
working around headless Chrome & OS security restrictions.}

\item{chrome_bin}{the path to Chrome (auto-set from the \code{HEADLESS_CHROME} environment variable)}
}
\description{
Read a URL via headless Chrome and return the raw or rendered \code{<body>} \code{innerHTML} DOM elements
}
\note{
This only grabs the \code{<body>} \code{innerHTML} contents
}
\section{Working around headless Chrome & OS security restrictions}{

Security restrictions on various operating systems and OS configurations can cause
headless Chrome execution to fail. As a result, headless Chrome operations should
use a special directory for \code{decapitated} package operations. You can pass this
in as \code{work_dir}. If \code{work_dir} is \code{NULL}, a \code{.rdecapdata} directory will be
created in your home directory and used for the data, crash dumps and utility
directories for Chrome operations.\cr
\cr
\code{tempdir()} does not always meet these requirements (based on testing on various
macOS 10.13 systems), as Chrome sets certain attributes during
some of its file operations.
\cr
If you pass in a \code{work_dir}, it must be one that does not violate OS security
restrictions or headless Chrome will not function.
}
\examples{
chrome_read_html("https://www.r-project.org/")
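
# A hedged sketch of the other arguments, not part of the original examples.
# The directory name "decap-work" is hypothetical, and creating it before the
# call is an assumption rather than a documented requirement.
work_dir <- file.path(path.expand("~"), "decap-work")

# Only run when HEADLESS_CHROME is set, since chrome_bin defaults to it.
if (nzchar(Sys.getenv("HEADLESS_CHROME"))) {
  dir.create(work_dir, showWarnings = FALSE)

  # render = FALSE returns the raw HTML invisibly, so assign the result.
  raw_html <- chrome_read_html("https://www.r-project.org/", render = FALSE,
                               work_dir = work_dir)

  # A numeric prime sends that many priming requests before the capture.
  doc <- chrome_read_html("https://www.r-project.org/", prime = 2,
                          work_dir = work_dir)
}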
}