boB Rudis 028ad69341


hgr

Tools to Work with the ‘Postlight’ ‘Mercury’ ‘API’

Description

The ‘Postlight’ ‘Mercury’ ‘API’ (https://mercury.postlight.com) takes any web article and returns only the relevant content (headline, author, body text, images and more), free from clutter and with only minimal markup. Tools are provided to access the ‘API’ and to further clean up retrieved text through the application of ‘XSLT’ style sheets. An ‘RStudio’ ‘Addin’ is also provided, which makes it possible to preview the cleaned content from a ‘URL’ on the clipboard.

You need an API key, which you can get from the Postlight Mercury site.

What’s inside the tin?

The following functions are implemented:

  • just_the_facts: Retrieve parsed content of a URL processed by the Postlight Mercury API
  • clean_text: Remove all HTML/XML tags from an HTML document/atomic character vector
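
To give a rough sense of what removing HTML/XML tags from a character vector involves, here is a minimal base R sketch. It is not how `clean_text` is implemented (the Description above says the package relies on ‘XSLT’ style sheets); `strip_tags_sketch` is a hypothetical name, and the regular-expression approach is an assumption chosen only because it runs without extra packages.

```r
# Hypothetical helper: a crude, regex-based approximation of tag stripping.
# This is NOT the method used by hgr::clean_text().
strip_tags_sketch <- function(x) {
  x <- gsub("<[^>]+>", " ", x)       # replace anything that looks like a tag with a space
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace into one space
  trimws(x)                          # drop leading and trailing whitespace
}

strip_tags_sketch("<div><h1>Title</h1><p>Body text</p></div>")
## [1] "Title Body text"
```

A regex like this is easily confused by comments, scripts, or a `>` inside an attribute value, which is one reason a parser-based approach is generally the safer choice for real documents.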

Installation

devtools::install_github("hrbrmstr/hgr")

Usage

library(hgr)

# current version
packageVersion("hgr")
## [1] '0.3.0'
story <- "https://www.nytimes.com/2017/04/18/world/asia/aircraft-carrier-north-korea-carl-vinson.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news&_r=0"

doc <- just_the_facts(story)

dplyr::glimpse(doc)
## List of 12
##  $ title         : chr "Aircraft Carrier Wasn’t Sailing to Deter North Korea, as U.S. Suggested"
##  $ content       : chr "<div><article id=\"story\" class=\"story theme-main   \">\n\n    \n\n                        \n    \n\n    \n\n"| __truncated__
##  $ author        : chr "Mark Landler and Eric Schmitt"
##  $ date_published: POSIXct[1:1], format: "2017-04-18 17:57:41"
##  $ lead_image_url: chr "https://static01.nyt.com/images/2017/04/19/world/19carrier-sub/19carrier-sub-facebookJumbo.jpg"
##  $ url           : chr "https://www.nytimes.com/2017/04/18/world/asia/aircraft-carrier-north-korea-carl-vinson.html"
##  $ domain        : chr "www.nytimes.com"
##  $ excerpt       : chr "The saga might never have come to light had the Navy not posted a photograph of the Carl Vinson sailing through"| __truncated__
##  $ word_count    : int 1499
##  $ direction     : chr "ltr"
##  $ total_pages   : int 1
##  $ rendered_pages: int 1
##  - attr(*, "row.names")= int 1
##  - attr(*, "class")= chr "hgr"
substr(doc$content, 1, 100)
## [1] "<div><article id=\"story\" class=\"story theme-main   \">\n\n    \n\n                        \n    \n\n    \n\n  "
plain <- clean_text(doc$content)

substr(plain, 1, 100)
## [1] "WASHINGTON — Just over a week ago, the White House declared that ordering an American aircraft carri"
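
The example above handles one URL. When processing several, it may help to keep one failing request from aborting the whole batch. The sketch below is one possible way to do that; `fetch_many` and `fake_fetch` are hypothetical names that are not part of hgr, and the fetcher is passed in as an argument so the code runs without an API key or network access.

```r
# Hypothetical helper (not part of hgr): apply a fetcher function to several
# URLs, turning any error into NULL so the remaining URLs are still processed.
fetch_many <- function(urls, fetch) {
  results <- lapply(urls, function(u) {
    tryCatch(
      fetch(u),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- urls
  results
}

# Stand-in fetcher used only for illustration; it fails on non-URL input.
fake_fetch <- function(u) {
  if (!grepl("^https?://", u)) stop("not a URL")
  list(url = u, word_count = 0L)
}

docs <- fetch_many(c("https://example.com/a", "oops"), fake_fetch)
```

With the package loaded and an API key available, passing `just_the_facts` as the `fetch` argument would presumably give the real-world equivalent, assuming it signals an R error on failure.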

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.