wch/harbor docker helper functions

master
boB Rudis 7 years ago
parent
commit
513b3d57cf
  1. 1
      DESCRIPTION
  2. 3
      NAMESPACE
  3. 45
      R/docker.r
  4. 16
      README.Rmd
  5. 40
      README.md
  6. Binary
      img/cap.jpg
  7. Binary
      img/cap.png
  8. 21
      man/install_splash.Rd
  9. 26
      man/start_splash.Rd
  10. 26
      man/stop_splash.Rd

1
DESCRIPTION

@@ -30,3 +30,4 @@ Imports:
magick,
HARtools
RoxygenNote: 6.0.0
Remotes: wch/harbor

3
NAMESPACE

@@ -6,6 +6,7 @@ S3method(print,splash_status)
export("%>%")
export(HARviewer)
export(HARviewerOutput)
export(install_splash)
export(renderHARviewer)
export(render_har)
export(render_html)
@@ -15,6 +16,8 @@ export(render_png)
export(splash)
export(splash_active)
export(splash_debug)
export(start_splash)
export(stop_splash)
export(writeHAR)
import(httr)
import(magick)

45
R/docker.r

@@ -0,0 +1,45 @@
#' Retrieve the Docker image for Splash
#'
#' @return `harbor` `host` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
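install_splash <- function() {
  # `localhost` is the host object exported by harbor; qualify it so the
  # call works without attaching harbor
  harbor::docker_pull(harbor::localhost, "scrapinghub/splash")
}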
#' Start a Splash server Docker container
#'
#' @note You need Docker running on your system and must have pulled the image with
#' [install_splash] for this to work. You should save the resultant `container`
#' object for use in [stop_splash].
#' @return `harbor` `container` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
start_splash <- function() {
  # Run detached and publish the Splash ports on the same host ports
  harbor::docker_run(harbor::localhost, image = "scrapinghub/splash", detach = TRUE,
                     docker_opts = "-p 5023:5023 -p 8050:8050 -p 8051:8051")
}
#' Stop a running Splash server Docker container
#'
#' @param splash_container Docker `container` object created by [start_splash]
#' @note You need Docker running on your system, must have pulled the image with
#' [install_splash], and must have started the Splash container with [start_splash]
#' for this to work. Pass the `container` object returned by [start_splash].
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
stop_splash <- function(splash_container) {
  harbor::container_rm(splash_container, force = TRUE)
}
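
Review note: `start_splash()` hard-codes its port mappings in `docker_opts`. As a minimal sketch, assuming one wanted to build that string from a vector of ports, a small helper could look like the following. `splash_port_opts()` is a hypothetical name and is not part of this commit or of `harbor`; it only reproduces the string already used above.

```r
# Hypothetical helper (not in this commit): build the `docker_opts` string
# passed to harbor::docker_run(), publishing each port on the same host port.
splash_port_opts <- function(ports = c(5023L, 8050L, 8051L)) {
  # sprintf() is vectorised over `ports`, giving one "-p host:container" per port
  paste(sprintf("-p %d:%d", ports, ports), collapse = " ")
}

splash_port_opts()
# [1] "-p 5023:5023 -p 8050:8050 -p 8051:8051"
```

With the defaults this yields exactly the string in `start_splash()`, so it could be dropped in as `docker_opts = splash_port_opts()` without changing behavior.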

16
README.Rmd

@@ -17,6 +17,17 @@ You can also get it running with two commands:
(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)
If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
install_splash()
splash_container <- start_splash()
and then run:
stop_splash(splash_container)
when done. All of that happens on your localhost so use `localhost` as the Splash server parameter.
You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.
All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
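
Review note: the install/start/stop sequence above leaves cleanup to the caller. A minimal sketch of one way to make sure the container is always removed, even when the work in between fails, is shown below. `with_splash()`, `start_fn` and `stop_fn` are hypothetical names, not part of this package; the defaults assume the package is attached so that `start_splash()` and `stop_splash()` resolve.

```r
# Hypothetical wrapper (not part of the package): start a Splash container,
# run `fn` with it, and always remove the container afterwards.
with_splash <- function(fn,
                        start_fn = start_splash,
                        stop_fn = stop_splash) {
  container <- start_fn()
  # on.exit() fires on normal return and on error alike
  on.exit(stop_fn(container), add = TRUE)
  fn(container)
}

# Usage (needs Docker and a pulled image, so left commented out):
# with_splash(function(container) {
#   splash_active(splash("localhost", 8050L))
# })
```

Passing the start and stop functions as arguments keeps the sketch testable without Docker; the defaults are only evaluated when they are actually used.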
@@ -32,6 +43,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
- `install_splash`: Retrieve the Docker image for Splash
- `start_splash`: Start a Splash server Docker container
- `stop_splash`: Stop a running Splash server Docker container
Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.
@@ -42,7 +56,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
- _Possibly_ writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, possibly using [`harbor`](https://github.com/wch/harbor)
- <strike>_Possibly_ writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters
### Installation

40
README.md

@@ -14,6 +14,17 @@ You can also get it running with two commands:
(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)
If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
install_splash()
splash_container <- start_splash()
and then run:
stop_splash(splash_container)
when done. All of that happens on your localhost so use `localhost` as the Splash server parameter.
You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.
All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
@@ -29,6 +40,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
- `install_splash`: Retrieve the Docker image for Splash
- `start_splash`: Start a Splash server Docker container
- `stop_splash`: Stop a running Splash server Docker container
Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.
@@ -39,7 +53,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
- *Possibly* writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, possibly using [`harbor`](https://github.com/wch/harbor)
- <strike>*Possibly* writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters
### Installation
@@ -73,7 +87,7 @@ splash("splash", 8050L) %>%
splash_active()
```
## Status of splash instance on [http://splash:8050]: ok. Max RSS: 402407424
## Status of splash instance on [http://splash:8050]: ok. Max RSS: 412110848
``` r
splash("splash", 8050L) %>%
@@ -89,7 +103,7 @@ splash("splash", 8050L) %>%
## ..$ LuaRuntime: int 1
## ..$ QTimer : int 1
## ..$ Request : int 1
## $ maxrss : int 392976
## $ maxrss : int 402452
## $ qsize : int 0
## $ url : chr "http://splash:8050"
## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@@ -104,8 +118,8 @@ splash("splash", 8050L) %>%
## {xml_document}
## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=1.114.1_widgets_244" type="text/javascri ...
## [2] <body>\n<iframe src="http://tpc.googlesyndication.com/safeframe/1-0-6/html/container.html" style="visibility: hid ...
## [1] <head>\n<script type="text/javascript" async="" id="tealium-tag-3005" src="http://b.scorecardresearch.com/c2/1526 ...
## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...
``` r
read_html("http://marvel.com/universe/Captain_America_(Steve_Rogers)")
@@ -136,21 +150,21 @@ print(har)
## --------HAR PAGES--------
## Page id: 1 , Page title: Poynter – A global leader in journalism. Strengthening democracy.
## --------HAR ENTRIES--------
## Number of entries: 55
## Number of entries: 50
## REQUESTS:
## Page: 1
## Number of entries: 55
## Number of entries: 50
## - http://www.poynter.org/
## - http://www.poynter.org/wp-content/plugins/easy-author-image/css/easy-author-image.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css?ver=2016_06_24.1
## - http://cloud.webtype.com/css/162ac332-3b31-4b73-ad44-da375b7f2fe3.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css?ver=2016_06_24.1
## ........
## - http://ntvcld-a.akamaihd.net/image/upload/w_286,h_161,c_fill,g_auto,f_auto/assets/C6B95A2AECA04462AC9FCD7C9802256...
## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486264735645&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
## - https://tpc.googlesyndication.com/simgad/15471443418029360623
## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsu3mzkIuC8SYIGCp5136h6q7AtaZDrZ109tKADwc544iipyqEmWMxVMC...
## - data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMgAAACgCAYAAABJ/yOpAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccl...
## - https://stats.g.doubleclick.net/r/collect?v=1&aip=1&t=dc&_r=3&tid=UA-2072784-1&cid=1992506909.1486267047&jid=1325...
## - http://srv-2017-02-05-03.config.parsely.com/config/poynter.org
## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486267047731&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
## - https://tpc.googlesyndication.com/simgad/10025351500812357522
## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsv3IVwW6mP5Eu79tajcj_fXJXhJhWb5xWUMF31OW8pkuhKz-68Gbdb1m...
You can use [`HARtools::HARviewer`](https://github.com/johndharrison/HARtools/blob/master/R/HARviewer.R), which this pkg imports/exports, to view the HAR in an interactive HTML widget.
@@ -179,7 +193,7 @@ library(testthat)
date()
```
## [1] "Sat Feb 4 22:19:00 2017"
## [1] "Sat Feb 4 22:57:33 2017"
``` r
test_dir("tests/")

Binary
img/cap.jpg

Binary file cannot be displayed.

Before: 123 KiB

After: 118 KiB

Binary
img/cap.png

Binary file cannot be displayed.

Before: 433 KiB

After: 433 KiB

21
man/install_splash.Rd

@@ -0,0 +1,21 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{install_splash}
\alias{install_splash}
\title{Retrieve the Docker image for Splash}
\usage{
install_splash()
}
\value{
`harbor` `host` object
}
\description{
Retrieve the Docker image for Splash
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

26
man/start_splash.Rd

@@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{start_splash}
\alias{start_splash}
\title{Start a Splash server Docker container}
\usage{
start_splash()
}
\value{
`harbor` `container` object
}
\description{
Start a Splash server Docker container
}
\note{
You need Docker running on your system and must have pulled the image with
[install_splash] for this to work. You should save the resultant `container`
object for use in [stop_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

26
man/stop_splash.Rd

@@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{stop_splash}
\alias{stop_splash}
\title{Stop a running Splash server Docker container}
\usage{
stop_splash(splash_container)
}
\arguments{
\item{splash_container}{Docker `container` object created by [start_splash]}
}
\description{
Stop a running Splash server Docker container
}
\note{
You need Docker running on your system, must have pulled the image with
[install_splash], and must have started the Splash container with [start_splash]
for this to work. Pass the `container` object returned by [start_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}