
some final tweaks

master
boB Rudis 7 years ago
parent
commit
7adc6c38d1
No known key found for this signature in database. GPG key ID: 2A514A4997464560
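  1. DESCRIPTION (14 lines changed)
  2. NEWS.md (6 lines changed)
  3. R/splashr-package.R (16 lines changed)
  4. README.Rmd (7 lines changed)
  5. README.md (12 lines changed)
  6. img/cap.png (binary)
  7. man/splashr.Rd (16 lines changed)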

14
DESCRIPTION

@@ -1,20 +1,18 @@
 Package: splashr
 Type: Package
 Title: Tools to Work with the 'Splash' 'JavaScript' Rendering Service
-Version: 0.3.0
-Date: 2017-02-14
+Version: 0.4.0
+Date: 2017-08-26
 Encoding: UTF-8
 Author: Bob Rudis (bob@rud.is)
 Maintainer: Bob Rudis <bob@rud.is>
 Description: 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
 It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
 and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
-R pacakges but with a Java-free footprint. The (twisted) 'QT' reactor is used to make the
-sever fully asynchronous allowing to take advantage of 'webkit' concurrency via 'QT' main loop.
-Some of 'Splash' features include the ability to process multiple webpages in parallel;
-retrieving 'HTML' results and/or take screenshots; disabling images or use 'Adblock Plus' rules
-to make rendering faster; executing custom 'JavaScript' in page context; getting detailed
-rendering info in 'HAR' format.
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 URL: http://github.com/hrbrmstr/splashr
 BugReports: https://github.com/hrbrmstr/splashr/issues
 License: AGPL

6
NEWS.md

@@ -1,3 +1,9 @@
+0.4.0
+* moved to 'docker' pacakge since it's on CRAN
+* temporarily removed `render_file()` support
+* added code coverage
 0.3.0
 * added basic pkg tests

16
R/splashr-package.R

@@ -1,14 +1,12 @@
 #' Tools to Work with the 'Splash' JavaScript Rendering Service
 #'
-#' 'Splash' <https://github.com/scrapinghub/splash> is a javascript rendering service.
-#' It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-#' 'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-#' 'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-#' used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-#' concurrency via QT main loop. Some of Splash features include the ability to process
-#' multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-#' disabling images or use Adblock Plus rules to make rendering faster; executing custom
-#' JavaScript in page context; getting detailed rendering info in HAR format.
+#' 'Splash' <https://github.com/scrapinghub/splash> is a 'JavaScript' rendering service.
+#' It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+#' and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+#' R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+#' multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+#' images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+#' page context; getting detailed rendering info in 'HAR' format.
 #'
 #' @md
 #' @name splashr

7
README.Rmd

@@ -8,7 +8,7 @@ output: rmarkdown::github_document
 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.
-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do _everything_ Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.
@@ -133,8 +133,6 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)
 # current verison
@@ -222,6 +220,9 @@ splash_local %>%
<img src="img/flash.png" width="50%"/>
```{r echo=FALSE, eval=FALSE}
library(htmlwidgets)
library(DiagrammeR)
### Rendering Widgets
{r eval=FALSE}
splash_vm <- start_splash(add_tempdir = TRUE)

12
README.md

@@ -5,7 +5,7 @@
 TL;DR: This package works with Splash rendering servers which are really just a REST API & `lua` scripting interface to a QT browser. It's an alternative to the Selenium ecosystem which was really engineered for application testing & validation.
-Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over your meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
+Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do *everything* Selenium can in pure R (though, the Lua interface is equally as powerful and accessible via R), but if you're just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.
 It's also an alternative to `phantomjs` (which you can use in R within or without a Selenium context as it's it's own webdriver) and it may be useful to compare renderings between this package & `phantomjs`.
@@ -130,15 +130,13 @@ library(splashr)
 library(magick)
 library(rvest)
 library(anytime)
-library(htmlwidgets)
-library(DiagrammeR)
 library(tidyverse)
 # current verison
 packageVersion("splashr")
 ```
-## [1] '0.3.0'
+## [1] '0.4.0'
 ``` r
 splash_active()
@@ -159,7 +157,7 @@ splash_debug()
 ## ..$ LuaRuntime: int 1
 ## ..$ QTimer : int 1
 ## ..$ Request : int 1
-## $ maxrss : int 75556
+## $ maxrss : int 76308
 ## $ qsize : int 0
 ## $ url : chr "http://localhost:8050"
 ## - attr(*, "class")= chr [1:2] "splash_debug" "list"
@@ -173,7 +171,7 @@ render_html(url = "http://marvel.com/universe/Captain_America_(Steve_Rogers)")
 ## {xml_document}
 ## <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
-## [1] <head>\n<script src="http://widget-cdn.rpxnow.com/manifest/login?version=release%2F1.116.0_widgets_767" type="tex ...
+## [1] <head>\n<script type="text/javascript" async="async" src="http://dpm.demdex.net/id?d_rtbd=json&amp;d_ver=2&amp;d_ ...
 ## [2] <body id="index-index" class="index-index" onload="findLinks('myLink');">\n\n\t<div id="page_frame" style="overfl ...
 ``` r
@@ -286,7 +284,7 @@ library(testthat)
 date()
 ```
-## [1] "Sun Aug 27 08:27:02 2017"
+## [1] "Sun Aug 27 09:01:57 2017"
 ``` r
 test_dir("tests/")

BIN
img/cap.png

Binary file not shown.

Before: size 521 KiB

After: size 521 KiB

16
man/splashr.Rd

@@ -6,15 +6,13 @@
 \alias{splashr-package}
 \title{Tools to Work with the 'Splash' JavaScript Rendering Service}
 \description{
-'Splash' \url{https://github.com/scrapinghub/splash} is a javascript rendering service.
-It’s a lightweight web browser with an 'HTTP' API, implemented in Python using
-'Twisted'and 'QT' and provides some of the core functionality of the 'RSelenium' or
-'seleniumPipes'R packages but with a Java-free footprint. The (twisted) 'QT' reactor is
-used to make the sever fully asynchronous allowing to take advantage of 'webkit'
-concurrency via QT main loop. Some of Splash features include the ability to process
-multiple webpages in parallel; retrieving HTML results and/or take screenshots;
-disabling images or use Adblock Plus rules to make rendering faster; executing custom
-JavaScript in page context; getting detailed rendering info in HAR format.
+'Splash' \url{https://github.com/scrapinghub/splash} is a 'JavaScript' rendering service.
+It’s a lightweight web browser with an 'HTTP' API, implemented in 'Python' using 'Twisted'
+and 'QT' and provides some of the core functionality of the 'RSelenium' or 'seleniumPipes'
+R pacakges in a lightweight footprint. Some of 'Splash' features include the ability to process
+multiple webpages in parallel; retrieving 'HTML' results and/or take screenshots; disabling
+images or use 'Adblock Plus' rules to make rendering faster; executing custom 'JavaScript' in
+page context; getting detailed rendering info in 'HAR' format.
 }
 \author{
 Bob Rudis (bob@rud.is)
