---
output: rmarkdown::github_document
---
[![Travis-CI Build Status](](
<!-- is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  fig.path = "README-"
)
```
`htmltidy` — Clean up gnarly HTML/XML
Inspired by [this SO question]( and by the sheer amount of cruddy HTML out there that needs fixing before it can be used properly when scraping data.
NOTE: Requires [`libtidy`]( and is presently super-basic (there's no way to set options, and it pretty much only handles HTML).
You'll first need to run `brew install tidy-html5` on macOS or `apt-get install libtidy-dev` on Ubuntu/Debian to get this to work. NOTE that the Linux libraries may be older and return slightly different (but no less tidy) HTML.
This works well enough for me to use in a pinch. It should be straightforward (but tedious) to:
- enable passing options in a `list`
- bundle `libtidy` _with the package_ and get it to work on Windows, Linux & macOS, since the library compiles on all three given the necessary tools.
The following functions are implemented:
- `tidy` : Clean up gnarly HTML/XML
### Installation
```{r eval=FALSE}
```

```{r echo=FALSE}
```
### Usage
```{r}
library(htmltidy)
# current version
packageVersion("htmltidy")
cat(tidy("<b><p><a href=''>google &gt</a></p></b>"))
```
### Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](
By participating in this project you agree to abide by its terms.