wch/harbor docker helper functions

master
boB Rudis 3 years ago
commit 513b3d57cf
10 changed files with 164 additions and 14 deletions
  1. DESCRIPTION (+1, -0)
  2. NAMESPACE (+3, -0)
  3. R/docker.r (+45, -0)
  4. README.Rmd (+15, -1)
  5. README.md (+27, -13)
  6. img/cap.jpg (BIN)
  7. img/cap.png (BIN)
  8. man/install_splash.Rd (+21, -0)
  9. man/start_splash.Rd (+26, -0)
  10. man/stop_splash.Rd (+26, -0)

DESCRIPTION (+1, -0)

@@ -30,3 +30,4 @@ Imports:
magick,
HARtools
RoxygenNote: 6.0.0
+Remotes: wch/harbor

NAMESPACE (+3, -0)

@@ -6,6 +6,7 @@ S3method(print,splash_status)
export("%>%")
export(HARviewer)
export(HARviewerOutput)
+export(install_splash)
export(renderHARviewer)
export(render_har)
export(render_html)
@@ -15,6 +16,8 @@ export(render_png)
export(splash)
export(splash_active)
export(splash_debug)
+export(start_splash)
+export(stop_splash)
export(writeHAR)
import(httr)
import(magick)


R/docker.r (+45, -0)

@@ -0,0 +1,45 @@
#' Retrieve the Docker image for Splash
#'
#' @return `harbor` `host` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
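install_splash <- function() {
  # `localhost` is exported by harbor; qualify it so it resolves without attaching the package
  harbor::docker_pull(harbor::localhost, "scrapinghub/splash")
}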

#' Start a Splash server Docker container
#'
#' @note You need Docker running on your system and must have pulled the image with
#'       [install_splash] for this to work. You should save the resultant `container`
#'       object for use in [stop_splash].
#' @return `harbor` `container` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
start_splash <- function() {
  # Run detached and publish the Splash ports on the local machine
  harbor::docker_run(harbor::localhost, image = "scrapinghub/splash", detach = TRUE,
                     docker_opts = "-p 5023:5023 -p 8050:8050 -p 8051:8051")
}

#' Stop a running Splash server Docker container
#'
#' @param splash_container Docker `container` object created by [start_splash]
#' @note You need Docker running on your system, must have pulled the image with
#'       [install_splash], and must have started the Splash container with
#'       [start_splash]. Pass the `container` object returned by [start_splash].
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
stop_splash <- function(splash_container) {
  harbor::container_rm(splash_container, force = TRUE)
}
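
Review note: the three helpers above are meant to be used as a start/stop pair, so one way to use them safely is to tie the cleanup to `on.exit()`. The following is only a sketch under stated assumptions: `with_splash`, `start_fn`, `stop_fn`, and `wait` are hypothetical names that do not appear in this commit, and the three-second pause is an assumed grace period, not a documented requirement of Splash.

```r
# Hypothetical helper (not part of this commit): evaluate `code` while a
# Splash container is running, and always remove the container afterwards.
# `start_fn` and `stop_fn` default to the wrappers added in R/docker.r; they
# are parameters so the helper can also be exercised without Docker.
with_splash <- function(code,
                        start_fn = start_splash,
                        stop_fn = stop_splash,
                        wait = 3) {
  container <- start_fn()
  # Register cleanup immediately so it runs even if `code` raises an error.
  on.exit(stop_fn(container), add = TRUE)
  # Assumed grace period (in seconds) for the server to start listening.
  Sys.sleep(wait)
  # `code` is a lazily evaluated argument, so it only runs at this point.
  force(code)
}
```

With the package attached and Docker running, a call might look like `with_splash(splash("localhost", 8050L) %>% splash_active())`. The container is removed when the call returns, whether or not the check succeeds.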

README.Rmd (+15, -1)

@@ -17,6 +17,17 @@ You can also get it running with two commands:
(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)

+If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
+
+    install_splash()
+    splash_container <- start_splash()
+
+and then run:
+
+    stop_splash(splash_container)
+
+when done. All of that happens on your localhost, so use `localhost` as the Splash server parameter.

You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.

All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
@@ -32,6 +43,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
+- `install_splash`: Retrieve the Docker image for Splash
+- `start_splash`: Start a Splash server Docker container
+- `stop_splash`: Stop a running Splash server Docker container

Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.

@@ -42,7 +56,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
-- _Possibly_ writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from with R directly, possibly using [`harbor`](https://github.com/wch/harbor)
+- <strike>_Possibly_ writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from with R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters

### Installation


README.md (+27, -13)

@@ -14,6 +14,17 @@ You can also get it running with two commands:

(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)

+If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
+
+    install_splash()
+    splash_container <- start_splash()
+
+and then run:
+
+    stop_splash(splash_container)
+
+when done. All of that happens on your localhost, so use `localhost` as the Splash server parameter.

You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.

All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
@@ -29,6 +40,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
+- `install_splash`: Retrieve the Docker image for Splash
+- `start_splash`: Start a Splash server Docker container
+- `stop_splash`: Stop a running Splash server Docker container

Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.

@@ -39,7 +53,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
-- *Possibly* writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from with R directly, possibly using [`harbor`](https://github.com/wch/harbor)
+- <strike>*Possibly* writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from with R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters

### Installation
@@ -73,7 +87,7 @@ splash("splash", 8050L) %>%
splash_active()
```

-## Status of splash instance on [http://splash:8050]: ok. Max RSS: 402407424
+## Status of splash instance on [http://splash:8050]: ok. Max RSS: 412110848

``` r
splash("splash", 8050L) %>%
@@ -89,7 +103,7 @@ splash("splash", 8050L) %>%
## ..$ LuaRuntime: int 1
## ..$ QTimer : int 1
## ..$ Request : int 1
-## $ maxrss : int 392976
+## $ maxrss : int 402452
## $ qsize : int 0
## $ url : chr "http://splash:8050"
## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@@ -104,8 +118,8 @@ splash("splash", 8050L) %>%

## {xml_document}
## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
-## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=1.114.1_widgets_244" type="text/javascri ...
-## [2] <body>\n<iframe src="http://tpc.googlesyndication.com/safeframe/1-0-6/html/container.html" style="visibility: hid ...
+## [1] <head>\n<script type="text/javascript" async="" id="tealium-tag-3005" src="http://b.scorecardresearch.com/c2/1526 ...
+## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...

``` r
read_html("http://marvel.com/universe/Captain_America_(Steve_Rogers)")
@@ -136,21 +150,21 @@ print(har)
## --------HAR PAGES--------
## Page id: 1 , Page title: Poynter – A global leader in journalism. Strengthening democracy.
## --------HAR ENTRIES--------
-## Number of entries: 55
+## Number of entries: 50
## REQUESTS:
## Page: 1
-## Number of entries: 55
+## Number of entries: 50
## - http://www.poynter.org/
## - http://www.poynter.org/wp-content/plugins/easy-author-image/css/easy-author-image.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css?ver=2016_06_24.1
## - http://cloud.webtype.com/css/162ac332-3b31-4b73-ad44-da375b7f2fe3.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css?ver=2016_06_24.1
## ........
-## - http://ntvcld-a.akamaihd.net/image/upload/w_286,h_161,c_fill,g_auto,f_auto/assets/C6B95A2AECA04462AC9FCD7C9802256...
-## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486264735645&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
-## - https://tpc.googlesyndication.com/simgad/15471443418029360623
-## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsu3mzkIuC8SYIGCp5136h6q7AtaZDrZ109tKADwc544iipyqEmWMxVMC...
-## - data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMgAAACgCAYAAABJ/yOpAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccl...
+## - https://stats.g.doubleclick.net/r/collect?v=1&aip=1&t=dc&_r=3&tid=UA-2072784-1&cid=1992506909.1486267047&jid=1325...
+## - http://srv-2017-02-05-03.config.parsely.com/config/poynter.org
+## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486267047731&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
+## - https://tpc.googlesyndication.com/simgad/10025351500812357522
+## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsv3IVwW6mP5Eu79tajcj_fXJXhJhWb5xWUMF31OW8pkuhKz-68Gbdb1m...

You can use [`HARtools::HARviewer`](https://github.com/johndharrison/HARtools/blob/master/R/HARviewer.R) (which this pkg imports/exports) to view the HAR in an interactive HTML widget.

@@ -179,7 +193,7 @@ library(testthat)
date()
```

-## [1] "Sat Feb 4 22:19:00 2017"
+## [1] "Sat Feb 4 22:57:33 2017"

``` r
test_dir("tests/")


img/cap.jpg (BIN)

Before: Width: 1024 | Height: 768 | Size: 123KB
After: Width: 1024 | Height: 768 | Size: 118KB

img/cap.png (BIN)

Before: Width: 1024 | Height: 768 | Size: 433KB
After: Width: 1024 | Height: 768 | Size: 433KB

man/install_splash.Rd (+21, -0)

@@ -0,0 +1,21 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{install_splash}
\alias{install_splash}
\title{Retrieve the Docker image for Splash}
\usage{
install_splash()
}
\value{
`harbor` `host` object
}
\description{
Retrieve the Docker image for Splash
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

man/start_splash.Rd (+26, -0)

@@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{start_splash}
\alias{start_splash}
\title{Start a Splash server Docker container}
\usage{
start_splash()
}
\value{
`harbor` `container` object
}
\description{
Start a Splash server Docker container
}
\note{
You need Docker running on your system and must have pulled the image with
[install_splash] for this to work. You should save the resultant `container`
object for use in [stop_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

man/stop_splash.Rd (+26, -0)

@@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{stop_splash}
\alias{stop_splash}
\title{Stop a running Splash server Docker container}
\usage{
stop_splash(splash_container)
}
\arguments{
\item{splash_container}{Docker `container` object created by [start_splash]}
}
\description{
Stop a running Splash server Docker container
}
\note{
You need Docker running on your system, must have pulled the image with
[install_splash], and must have started the Splash container with
[start_splash]. Pass the `container` object returned by [start_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}
