some final tweaks

master
boB Rudis 7 years ago
parent
commit
7adc6c38d1
No known key found for this signature in database.
GPG Key ID: 2A514A4997464560
  1. DESCRIPTION (14 changes)
  2. NEWS.md (6 changes)
  3. R/splashr-package.R (16 changes)
  4. README.Rmd (7 changes)
  5. README.md (12 changes)
  6. img/cap.png (BIN)
  7. man/splashr.Rd (16 changes)

14
DESCRIPTION

@@ -1,20 +1,18 @@
 Package: splashr
 Type: Package
 Title: Tools to Work with the 'Splash' 'JavaScript' Rendering Service
-Version: 0.3.0
-Date: 2017-02-14
+Version: 0.4.0
+Date: 2017-08-26
 Encoding: UTF-8
 Author: Bob Rudis (bob@rud.is)
 Maintainer: Bob Rudis <bob@rud.is>
 Description: 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
 It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
 and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
-R pacakges but with a Java-free footprint. The (twisted) 'QT' reactor is used to make the
-sever fully asynchronous allowing to take advantage of 'webkit' concurrency via 'QT' main loop.
-Some of 'Splash' features include the ability to process multiple webpages in parallel;
-retrieving 'HTML' results and/or take screenshots; disabling images or use 'Adblock Plus' rules
-to make rendering faster; executing custom 'JavaScript' in page context; getting detailed
-rendering info in 'HAR' format.
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 URL: http://github.com/hrbrmstr/splashr
 BugReports: https://github.com/hrbrmstr/splashr/issues
 License: AGPL

6
NEWS.md

@@ -1,3 +1,9 @@
+0.4.0
+* moved to 'docker' pacakge since it's on CRAN
+* temporarily removed `render_file()` support
+* added code coverage
 0.3.0
 * added basic pkg tests

16
R/splashr-package.R

@@ -1,14 +1,12 @@
 #' Tools to Work with the 'Splash' JavaScript Rendering Service
 #'
-#' 'Splash' <https://github.com/scrapinghub/splash> is a javascript rendering service.
-#' It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-#' 'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-#' 'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-#' used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-#' concurrency via QT main loop. Some of Splash features include the ability to process
-#' multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-#' disabling images or use Adblock Plus rules to make rendering faster; executing custom
-#' JavaScript in page context; getting detailed rendering info in HAR format.
+#' 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
+#' It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+#' and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+#' R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+#' multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+#' images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+#' page context; getting detailed rendering info in 'HAR' format.
 #'
 #' @md
 #' @name splashr

7
README.Rmd

@@ -8,7 +8,7 @@ output: rmarkdown::github_document
 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.
-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.
@@ -133,8 +133,6 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)
 # current verison
@@ -222,6 +220,9 @@ splash_local %>%
 <img src="img/flash.png" width="50%"/>
 ```{r echo=FALSE, eval=FALSE}
+library(htmlwidgets)
+library(DiagrammeR)
 ### Rendering Widgets
 {r eval=FALSE}
 splash_vm <- start_splash(add_tempdir = TRUE)

12
README.md

@@ -5,7 +5,7 @@
 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.
-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.
@@ -130,15 +130,13 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)
 # current verison
 packageVersion("splashr")
 ```
-## [1] '0.3.0'
+## [1] '0.4.0'
 ``` r
 splash_active()
@@ -159,7 +157,7 @@ splash_debug()
 ## ..$ LuaRuntime: int 1
 ## ..$ QTimer : int 1
 ## ..$ Request : int 1
-## $ maxrss : int 75556
+## $ maxrss : int 76308
 ## $ qsize : int 0
 ## $ url : chr "http://localhost:8050"
 ## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@@ -173,7 +171,7 @@ render_html(url = "http://marvel.com/universe/Captain_America_(Steve_Rogers)")
 ## {xml_document}
 ## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
-## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=release%2F1.116.0_widgets_767" type="tex ...
+## [1] <head>\n<script type="text/javascript" async="async" src="http://dpm.demdex.net/id?d_rtbd=json&amp;d_ver=2&amp;d_ ...
 ## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...
 ``` r
@@ -286,7 +284,7 @@ library(testthat)
 date()
 ```
-## [1] "Sun Aug 27 08:27:02 2017"
+## [1] "Sun Aug 27 09:01:57 2017"
 ``` r
 test_dir("tests/")

BIN
img/cap.png

Binary file not shown. Size before: 521 KiB. Size after: 521 KiB.

16
man/splashr.Rd

@@ -6,15 +6,13 @@
 \alias{splashr-package}
 \title{Tools to Work with the 'Splash' JavaScript Rendering Service}
 \description{
-'Splash' \url{https://github.com/scrapinghub/splash} is a javascript rendering service.
-It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-concurrency via QT main loop. Some of Splash features include the ability to process
-multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-disabling images or use Adblock Plus rules to make rendering faster; executing custom
-JavaScript in page context; getting detailed rendering info in HAR format.
+'Splash' \url{https://github.com/scrapinghub/splash} is a 'JavaScript' rendering service.
+It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 }
 \author{
 Bob Rudis (bob@rud.is)
