You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 

2.2 KiB

---
output: rmarkdown::github_document
editor_options:
chunk_output_type: console
---

[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/hgr.svg?branch=master)](https://travis-ci.org/hrbrmstr/hgr)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/hrbrmstr/hgr?branch=master&svg=true)](https://ci.appveyor.com/project/hrbrmstr/hgr)
[![Coverage Status](https://img.shields.io/codecov/c/github/hrbrmstr/hgr/master.svg)](https://codecov.io/github/hrbrmstr/hgr?branch=master)

# hgr

Tools to Work with the 'Postlight' 'Mercury' 'API'

## Description

The 'Postlight' 'Mercury' 'API' <https://mercury.postlight.com> takes any web article and returns only the relevant content - headline, author, body text, images and more - free from any clutter and including only minimal markup. Tools are provided to access the 'API' and also further clean up retrieved text through the the application of 'XSLT' style sheets. An 'RStudio' 'Addin' is also provided which makes it possible to preview the cleaned content from a 'URL' on the clipboard.

You need an API key which you can get from [here](https://mercury.postlight.com).

## What's inside the tin?

The following functions are implemented:

- `just_the_facts`: Retrieve parsed content of a URL processed by the Postlight Mercury API
- `clean_text`: Remove all HTML/XML tags from an HTML document/atomic character vector

## Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/hgr")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage

```{r message=FALSE, warning=FALSE, error=FALSE}
library(hgr)

# current verison
packageVersion("hgr")

story <- "https://www.nytimes.com/2017/04/18/world/asia/aircraft-carrier-north-korea-carl-vinson.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news&_r=0"

doc <- just_the_facts(story)

dplyr::glimpse(doc)

substr(doc$content, 1, 100)

plain <- clean_text(doc$content)

substr(plain, 1, 100)
```

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.