mirror of https://git.sr.ht/~hrbrmstr/crux
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
2.5 KiB
2.5 KiB
---
output: rmarkdown::github_document
editor_options:
chunk_output_type: inline
---
```{r pkg-knitr-opts, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE, fig.retina=2, message=FALSE, warning=FALSE)
options(width=120)
```
[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/crux.svg?branch=master)](https://travis-ci.org/hrbrmstr/crux)
[![Coverage Status](https://codecov.io/gh/hrbrmstr/crux/branch/master/graph/badge.svg)](https://codecov.io/gh/hrbrmstr/crux)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/crux)](https://cran.r-project.org/package=crux)
# crux
Identify the Crux of an Article
## Description
Methods are provided to retrieve HTML content and return extracted
metadata and summarised plain text. Further methods are provided to classify
URLs with or without making network calls. Based on <https://github.com/chimbori/crux>.
## What's Inside The Tin
The following functions are implemented:
- `classify_url`: Classify a URL with or without making network calls
- `is_ad_image`: Classify a URL with or without making network calls
- `is_likely_archive`: Classify a URL with or without making network calls
- `is_likely_article`: Classify a URL with or without making network calls
- `is_likely_audio`: Classify a URL with or without making network calls
- `is_likely_binary_doc`: Classify a URL with or without making network calls
- `is_likely_executable`: Classify a URL with or without making network calls
- `is_likely_image`: Classify a URL with or without making network calls
- `is_likely_video`: Classify a URL with or without making network calls
- `is_web_scheme`: Classify a URL with or without making network calls
- `summarise_url`: Summarise the contents at a URL to essential bits
## Installation
```{r install-ex, eval=FALSE}
install.packages(c("cruxjars", "crux"), repos = "https://cinc.rud.is/")
```
## Usage
```{r lib-ex}
library(crux)
# current version
packageVersion("crux")
```
```{r}
str(
summarise_url("http://time.com/5541738/joe-biden-backtracks-pence-praise-criticism/"), 1
)
```
```{r}
str(
classify_url("https://www.washingtonpost.com/powerpost/house-democrats-explode-in-recriminations-as-liberals-lash-out-at-moderates/2019/02/28/c3d163fe-3b87-11e9-a06c-3ec8ed509d15_story.html")
)
```
## crux Metrics
```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md).
By participating in this project you agree to abide by its terms.