wch/harbor docker helper functions

master
boB Rudis 7 years ago
parent
commit
513b3d57cf
  1. 1
      DESCRIPTION
  2. 3
      NAMESPACE
  3. 45
      R/docker.r
  4. 16
      README.Rmd
  5. 40
      README.md
  6. BIN
      img/cap.jpg
  7. BIN
      img/cap.png
  8. 21
      man/install_splash.Rd
  9. 26
      man/start_splash.Rd
  10. 26
      man/stop_splash.Rd

1
DESCRIPTION

@ -30,3 +30,4 @@ Imports:
magick,
HARtools
RoxygenNote: 6.0.0
Remotes: wch/harbor

3
NAMESPACE

@ -6,6 +6,7 @@ S3method(print,splash_status)
export("%>%")
export(HARviewer)
export(HARviewerOutput)
export(install_splash)
export(renderHARviewer)
export(render_har)
export(render_html)
@ -15,6 +16,8 @@ export(render_png)
export(splash)
export(splash_active)
export(splash_debug)
export(start_splash)
export(stop_splash)
export(writeHAR)
import(httr)
import(magick)

45
R/docker.r

@ -0,0 +1,45 @@
#' Retrieve the Docker image for Splash
#'
#' @return `harbor` `host` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
install_splash <- function() {
harbor::docker_pull(harbor::localhost, "scrapinghub/splash")
}
#' Start a Splash server Docker container
#'
#' @note You need Docker running on your system and must have pulled the image with
#' [install_splash] for this to work. You should save the resulting `container`
#' object for use in [stop_splash].
#' @return `harbor` `container` object
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
start_splash <- function() {
harbor::docker_run(harbor::localhost, image = "scrapinghub/splash", detach = TRUE,
docker_opts = "-p 5023:5023 -p 8050:8050 -p 8051:8051")
}
#' Stop a running Splash server Docker container
#'
#' @param splash_container Docker `container` object created by [start_splash]
#' @note You need Docker running on your system and must have pulled the image with
#' [install_splash] and started the Splash container with [start_splash] for this
#' to work. You will also need the `container` object returned by [start_splash].
#' @export
#' @examples \dontrun{
#' install_splash()
#' splash_container <- start_splash()
#' stop_splash(splash_container)
#' }
stop_splash <- function(splash_container) {
harbor::container_rm(splash_container, force=TRUE)
}
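
Review note: `start_splash()` hard-codes its port mapping in `docker_opts`. As a minimal sketch of how that string could be assembled from a vector of ports, the block below uses a hypothetical helper, `splash_port_opts()`, which is not part of this commit or of `harbor`. It assumes each container port is published on the same host port, as the hard-coded string does.

```r
# Hypothetical helper (not part of this commit): build the "-p host:container"
# flags that start_splash() passes to harbor::docker_run() via docker_opts.
splash_port_opts <- function(ports = c(5023L, 8050L, 8051L)) {
  ports <- as.integer(ports)
  # Reject empty input, non-numeric values, and out-of-range port numbers.
  stopifnot(length(ports) > 0, !anyNA(ports), all(ports > 0L & ports < 65536L))
  # Publish each container port on the same host port.
  paste(sprintf("-p %d:%d", ports, ports), collapse = " ")
}

splash_port_opts()
# [1] "-p 5023:5023 -p 8050:8050 -p 8051:8051"
```

With the defaults this reproduces the string used in `start_splash()`; passing a shorter vector, for example `splash_port_opts(8050L)`, would publish only the HTTP port.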

16
README.Rmd

@ -17,6 +17,17 @@ You can also get it running with two commands:
(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)
If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
install_splash()
splash_container <- start_splash()
and then run:
stop_splash(splash_container)
when done. All of that happens on your localhost so use `localhost` as the Splash server parameter.
You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.
All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
@ -32,6 +43,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
- `install_splash`: Retrieve the Docker image for Splash
- `start_splash`: Start a Splash server Docker container
- `stop_splash`: Stop a running Splash server Docker container
Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.
@ -42,7 +56,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
- _Possibly_ writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, possibly using [`harbor`](https://github.com/wch/harbor)
- <strike>_Possibly_ writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters
### Installation

40
README.md

@ -14,6 +14,17 @@ You can also get it running with two commands:
(Do whatever you Windows ppl do with Docker on your systems to make ^^ work.)
If using the [`harbor`](https://github.com/wch/harbor) package you can use the convenience wrappers in this package:
install_splash()
splash_container <- start_splash()
and then run:
stop_splash(splash_container)
when done. All of that happens on your localhost so use `localhost` as the Splash server parameter.
You can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don't have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.
All you need for this package to work is a running Splash instance. You provide the host/port for it and it's scrape-tastic fun from there!
@ -29,6 +40,9 @@ The following functions are implemented:
- `render_jpeg`: Return an image (in JPEG format) of the javascript-rendered page.
- `render_png`: Return an image (in PNG format) of the javascript-rendered page.
- `splash`: Configure parameters for connecting to a Splash server
- `install_splash`: Retrieve the Docker image for Splash
- `start_splash`: Start a Splash server Docker container
- `stop_splash`: Stop a running Splash server Docker container
Some functions from `HARtools` are imported/exported and `%>%` is imported/exported.
@ -39,7 +53,7 @@ Suggest more in a feature req!
- <strike>Implement `render.json`</strike>
- Implement `execute` (you can script Splash!)
- <strike>Add integration with [`HARtools`](https://github.com/johndharrison/HARtools)</strike>
- *Possibly* writing R function wrappers to start Splash which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, possibly using [`harbor`](https://github.com/wch/harbor)
- <strike>*Possibly* writing R function wrappers to install/start/stop Splash</strike> which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, using [`harbor`](https://github.com/wch/harbor)
- Testing results with all combinations of parameters
### Installation
@ -73,7 +87,7 @@ splash("splash", 8050L) %>%
splash_active()
```
## Status of splash instance on [http://splash:8050]: ok. Max RSS: 402407424
## Status of splash instance on [http://splash:8050]: ok. Max RSS: 412110848
``` r
splash("splash", 8050L) %>%
@ -89,7 +103,7 @@ splash("splash", 8050L) %>%
## ..$ LuaRuntime: int 1
## ..$ QTimer : int 1
## ..$ Request : int 1
## $ maxrss : int 392976
## $ maxrss : int 402452
## $ qsize : int 0
## $ url : chr "http://splash:8050"
## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@ -104,8 +118,8 @@ splash("splash", 8050L) %>%
## {xml_document}
## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=1.114.1_widgets_244" type="text/javascri ...
## [2] <body>\n<iframe src="http://tpc.googlesyndication.com/safeframe/1-0-6/html/container.html" style="visibility: hid ...
## [1] <head>\n<script type="text/javascript" async="" id="tealium-tag-3005" src="http://b.scorecardresearch.com/c2/1526 ...
## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...
``` r
read_html("http://marvel.com/universe/Captain_America_(Steve_Rogers)")
@ -136,21 +150,21 @@ print(har)
## --------HAR PAGES--------
## Page id: 1 , Page title: Poynter – A global leader in journalism. Strengthening democracy.
## --------HAR ENTRIES--------
## Number of entries: 55
## Number of entries: 50
## REQUESTS:
## Page: 1
## Number of entries: 55
## Number of entries: 50
## - http://www.poynter.org/
## - http://www.poynter.org/wp-content/plugins/easy-author-image/css/easy-author-image.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css?ver=2016_06_24.1
## - http://cloud.webtype.com/css/162ac332-3b31-4b73-ad44-da375b7f2fe3.css?ver=2016_06_24.1
## - http://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css?ver=2016_06_24.1
## ........
## - http://ntvcld-a.akamaihd.net/image/upload/w_286,h_161,c_fill,g_auto,f_auto/assets/C6B95A2AECA04462AC9FCD7C9802256...
## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486264735645&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
## - https://tpc.googlesyndication.com/simgad/15471443418029360623
## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsu3mzkIuC8SYIGCp5136h6q7AtaZDrZ109tKADwc544iipyqEmWMxVMC...
## - data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMgAAACgCAYAAABJ/yOpAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccl...
## - https://stats.g.doubleclick.net/r/collect?v=1&aip=1&t=dc&_r=3&tid=UA-2072784-1&cid=1992506909.1486267047&jid=1325...
## - http://srv-2017-02-05-03.config.parsely.com/config/poynter.org
## - http://srv-2017-02-05-03.pixel.parsely.com/plogger/?rand=1486267047731&idsite=poynter.org&url=http%3A%2F%2Fwww.po...
## - https://tpc.googlesyndication.com/simgad/10025351500812357522
## - https://securepubads.g.doubleclick.net/pcs/view?xai=AKAOjsv3IVwW6mP5Eu79tajcj_fXJXhJhWb5xWUMF31OW8pkuhKz-68Gbdb1m...
You can use [`HARtools::HARviewer`](https://github.com/johndharrison/HARtools/blob/master/R/HARviewer.R), which this package imports and re-exports, to view the HAR in an interactive HTML widget.
@ -179,7 +193,7 @@ library(testthat)
date()
```
## [1] "Sat Feb 4 22:19:00 2017"
## [1] "Sat Feb 4 22:57:33 2017"
``` r
test_dir("tests/")

BIN
img/cap.jpg

Binary file not shown.

Before: 123 KiB

After: 118 KiB

BIN
img/cap.png

Binary file not shown.

Before: 433 KiB

After: 433 KiB

21
man/install_splash.Rd

@ -0,0 +1,21 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{install_splash}
\alias{install_splash}
\title{Retrieve the Docker image for Splash}
\usage{
install_splash()
}
\value{
`harbor` `host` object
}
\description{
Retrieve the Docker image for Splash
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

26
man/start_splash.Rd

@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{start_splash}
\alias{start_splash}
\title{Start a Splash server Docker container}
\usage{
start_splash()
}
\value{
`harbor` `container` object
}
\description{
Start a Splash server Docker container
}
\note{
You need Docker running on your system and must have pulled the image with
[install_splash] for this to work. You should save the resulting `container`
object for use in [stop_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}

26
man/stop_splash.Rd

@ -0,0 +1,26 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docker.r
\name{stop_splash}
\alias{stop_splash}
\title{Stop a running Splash server Docker container}
\usage{
stop_splash(splash_container)
}
\arguments{
\item{splash_container}{Docker `container` object created by [start_splash]}
}
\description{
Stop a running Splash server Docker container
}
\note{
You need Docker running on your system and must have pulled the image with
[install_splash] and started the Splash container with [start_splash] for this
to work. You will also need the `container` object returned by [start_splash].
}
\examples{
\dontrun{
install_splash()
splash_container <- start_splash()
stop_splash(splash_container)
}
}
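
Review note: every example in this commit follows the same install, start, stop sequence, and the caller has to remember to pass the container to `stop_splash()`. As a rough sketch of how that cleanup could be made automatic, the block below defines a hypothetical wrapper, `with_splash()`, which is not part of this commit. Its `start_fun` and `stop_fun` arguments are assumptions added so the wrapper can be exercised without Docker; they default to the helpers added here.

```r
# Hypothetical wrapper (not part of this commit): evaluate `code` while a
# Splash container is running and always remove the container afterwards,
# even if `code` raises an error.
# The defaults refer to start_splash()/stop_splash() from this commit and are
# only looked up when with_splash() is called without overrides.
with_splash <- function(code,
                        start_fun = start_splash,
                        stop_fun = stop_splash) {
  container <- start_fun()
  # Register cleanup before the user's code is evaluated.
  on.exit(stop_fun(container), add = TRUE)
  # `code` is a lazily evaluated argument; force it now that the container is up.
  force(code)
}

# Usage (needs Docker, harbor, and the pulled image), not run here:
# with_splash({
#   splash("localhost", 8050L) %>% splash_active()
# })
```

Registering the cleanup with `on.exit()` means the container is removed whether the body returns normally or fails. This sketch does not wait for Splash to finish starting inside the container, so real use might need a short retry around the first request.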