some final tweaks

master
boB Rudis 2 years ago
parent
commit
7adc6c38d1
No known key found for this signature in database GPG Key ID: 2A514A4997464560
7 changed files with 35 additions and 36 deletions
  1. DESCRIPTION (+6, -8)
  2. NEWS.md (+6, -0)
  3. R/splashr-package.R (+7, -9)
  4. README.Rmd (+4, -3)
  5. README.md (+5, -7)
  6. img/cap.png (BIN)
  7. man/splashr.Rd (+7, -9)

DESCRIPTION (+6, -8)

@@ -1,20 +1,18 @@
 Package: splashr
 Type: Package
 Title: Tools to Work with the 'Splash' 'JavaScript' Rendering Service
-Version: 0.3.0
-Date: 2017-02-14
+Version: 0.4.0
+Date: 2017-08-26
 Encoding: UTF-8
 Author: Bob Rudis (bob@rud.is)
 Maintainer: Bob Rudis <bob@rud.is>
 Description: 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
 It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
 and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
-R pacakges but with a Java-free footprint. The (twisted) 'QT' reactor is used to make the
-sever fully asynchronous allowing to take advantage of 'webkit' concurrency via 'QT' main loop.
-Some of 'Splash' features include the ability to process multiple webpages in parallel;
-retrieving 'HTML' results and/or take screenshots; disabling images or use 'Adblock Plus' rules
-to make rendering faster; executing custom 'JavaScript' in page context; getting detailed
-rendering info in 'HAR' format.
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 URL: http://github.com/hrbrmstr/splashr
 BugReports: https://github.com/hrbrmstr/splashr/issues
 License: AGPL


NEWS.md (+6, -0)

@@ -1,3 +1,9 @@
+0.4.0
+
+* moved to 'docker' pacakge since it's on CRAN
+* temporarily removed `render_file()` support
+* added code coverage
+
 0.3.0

 * added basic pkg tests


R/splashr-package.R (+7, -9)

@@ -1,14 +1,12 @@
 #' Tools to Work with the 'Splash' JavaScript Rendering Service
 #'
-#' 'Splash' <https://github.com/scrapinghub/splash> is a javascript rendering service.
-#' It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-#' 'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-#' 'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-#' used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-#' concurrency via QT main loop. Some of Splash features include the ability to process
-#' multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-#' disabling images or use Adblock Plus rules to make rendering faster; executing custom
-#' JavaScript in page context; getting detailed rendering info in HAR format.
+#' 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
+#' It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+#' and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+#' R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+#' multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+#' images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+#' page context; getting detailed rendering info in 'HAR' format.
 #'
 #' @md
 #' @name splashr


README.Rmd (+4, -3)

@@ -8,7 +8,7 @@ output: rmarkdown::github_document

 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.

-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.

 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.

@@ -133,8 +133,6 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)

 # current verison
@@ -222,6 +220,9 @@ splash_local %>%
 <img src="img/flash.png" width="50%"/>

 ```{r echo=FALSE, eval=FALSE}
+library(htmlwidgets)
+library(DiagrammeR)
+
 ### Rendering Widgets
 {r eval=FALSE}
 splash_vm <- start_splash(add_tempdir = TRUE)


README.md (+5, -7)

@@ -5,7 +5,7 @@

 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.

-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.

 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.

@@ -130,15 +130,13 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)

 # current verison
 packageVersion("splashr")
 ```

-## [1] '0.3.0'
+## [1] '0.4.0'

 ``` r
 splash_active()
@@ -159,7 +157,7 @@ splash_debug()
 ## ..$ LuaRuntime: int 1
 ## ..$ QTimer : int 1
 ## ..$ Request : int 1
-## $ maxrss : int 75556
+## $ maxrss : int 76308
 ## $ qsize : int 0
 ## $ url : chr "http://localhost:8050"
 ## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@@ -173,7 +171,7 @@ render_html(url = "http://marvel.com/universe/Captain_America_(Steve_Rogers)")

 ## {xml_document}
 ## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
-## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=release%2F1.116.0_widgets_767" type="tex ...
+## [1] <head>\n<script type="text/javascript" async="async" src="http://dpm.demdex.net/id?d_rtbd=json&amp;d_ver=2&amp;d_ ...
 ## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...

``` r
@@ -286,7 +284,7 @@ library(testthat)
 date()
 ```

-## [1] "Sun Aug 27 08:27:02 2017"
+## [1] "Sun Aug 27 09:01:57 2017"

 ``` r
 test_dir("tests/")


img/cap.png (BIN)

Binary file changed. Before: 1024 × 1451 px, 521KB. After: 1024 × 1451 px, 521KB.

man/splashr.Rd (+7, -9)

@@ -6,15 +6,13 @@
 \alias{splashr-package}
 \title{Tools to Work with the 'Splash' JavaScript Rendering Service}
 \description{
-'Splash' \url{https://github.com/scrapinghub/splash} is a javascript rendering service.
-It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-concurrency via QT main loop. Some of Splash features include the ability to process
-multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-disabling images or use Adblock Plus rules to make rendering faster; executing custom
-JavaScript in page context; getting detailed rendering info in HAR format.
+'Splash' \url{https://github.com/scrapinghub/splash} is a 'JavaScript' rendering service.
+It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 }
 \author{
 Bob Rudis (bob@rud.is)

