Tools to Work with the 'Splash' JavaScript Rendering Service in R

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/render-html.R
\name{render_html}
\alias{render_html}
\title{Return the HTML of the JavaScript-rendered page.}
\usage{
render_html(splash_obj = splash_local, url, base_url, timeout = 30,
  resource_timeout, wait = 0, proxy, js, js_src, filters,
  allowed_domains, allowed_content_types, forbidden_content_types,
  viewport = "1024x768", images, headers, body, http_method, save_args,
  load_args, raw_html = FALSE)
}
\arguments{
\item{splash_obj}{Object created by a call to \code{\link[=splash]{splash()}}.}

\item{url}{The URL to render (required).}

\item{base_url}{The base URL to render the page with.}

\item{timeout}{A timeout (in seconds) for the render (defaults to 30). The
maximum allowed value is 60 seconds unless the Splash server itself (not this
package) is started with different parameters.}

\item{resource_timeout}{A timeout (in seconds) for individual network requests.}

\item{wait}{Time (in seconds) to wait for updates after the page is loaded (defaults to 0).}

\item{proxy}{Proxy profile name or proxy URL.}

\item{js}{JavaScript profile name.}

\item{js_src}{JavaScript code to be executed in page context.}

\item{filters}{Comma-separated list of request filter names.}

\item{allowed_domains}{Comma-separated list of allowed domain names. If present, Splash
will not load anything from domains that are not in this list, nor from subdomains
of domains that are not in this list.}

\item{allowed_content_types}{Comma-separated list of allowed content types. If present,
Splash will abort any request whose response content type does not match any of
the content types in this list. Wildcards are supported.}

\item{forbidden_content_types}{Comma-separated list of forbidden content types. If
present, Splash will abort any request whose response content type matches
any of the content types in this list. Wildcards are supported.}

\item{viewport}{Width and height (in pixels) of the browser viewport used to render the
web page. Format is \code{"<width>x<height>"}, e.g. \code{"800x600"}. Defaults to
\code{"1024x768"}.}

\item{images}{Whether to download images.}

\item{headers}{HTTP headers to set for the first outgoing request.}

\item{body}{Body of the HTTP POST request to be sent if the method is POST.}

\item{http_method}{HTTP method of the outgoing Splash request.}

\item{save_args}{A list of argument names to put in the cache.}

\item{load_args}{Parameter values to load from the cache.}

\item{raw_html}{If \code{TRUE}, return a character vector instead of an XML document.
Only valid for \code{render_html}.}
}
\value{
An XML document. Note that this is processed by
\code{\link[xml2:read_html]{xml2::read_html()}}, so it will not be the pristine,
raw, rendered HTML from the site. Use \code{raw_html = TRUE} if you do not want it
to be processed by \code{xml2} first; in that case you get back a character vector.
}
\description{
A dynamic equivalent of \code{rvest::read_html}: the page is rendered by Splash,
JavaScript included, before its HTML is returned.
}
\references{
\href{http://splash.readthedocs.io/en/stable/index.html}{Splash docs}
}
\seealso{
Other splash_renderers: \code{\link{execute_lua}},
  \code{\link{render_har}}, \code{\link{render_jpeg}},
  \code{\link{render_json}}, \code{\link{render_png}}
}
\concept{splash_renderers}
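% Illustrative sketch only (not generated from R/render-html.R). It assumes a
% Splash server is already running and reachable through the default
% splash_local object, and it uses https://example.com/ as a placeholder URL.
% It shows the two return types described in the value section.
\examples{
\dontrun{
library(splashr)

# Parsed result: an XML document that xml2 or rvest functions can work on.
# wait = 2 gives page scripts two seconds to update the page after it loads.
pg <- render_html(splash_local, url = "https://example.com/", wait = 2)
xml2::xml_text(xml2::xml_find_first(pg, "//title"))

# Unparsed result: a character vector holding the rendered HTML as returned
# by Splash, without the xml2 processing step.
raw_pg <- render_html(url = "https://example.com/", raw_html = TRUE)
substr(raw_pg, 1, 200)
}
}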