---
output: rmarkdown::github_document
---
[![Travis-CI Build Status](](

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
# Chunk defaults used when knitting this README
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  fig.path = "README-"
)
```

`htmltidy` — Clean up gnarly HTML/XML

Inspired by [this SO question]( and because there's a great deal of cruddy HTML out there that needs fixing to use properly when scraping data.

NOTE: Requires [`libtidy`]( and is presently super-basic (no way to set options and pretty much only does HTML).

You'll first need to do a `brew install tidy-html5` on macOS or `apt-get install libtidy-dev` on Ubuntu/Debian to get this to work. NOTE that the Linux libraries may be older and return slightly different (but no less tidy) HTML.
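
As a small convenience, the two install commands above can be looked up from within R. This is only a sketch in base R: `libtidy_install_hint()` is a hypothetical helper and not part of `htmltidy`, and it assumes that any Linux system is Debian/Ubuntu-like, which will not always hold.

```r
# Hypothetical helper (not part of htmltidy): returns the system command
# this README suggests for installing libtidy on the current platform.
libtidy_install_hint <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Darwin = "brew install tidy-html5",     # macOS reports itself as "Darwin"
    Linux  = "apt-get install libtidy-dev", # assumes Ubuntu/Debian
    NA_character_                           # no suggestion for other platforms
  )
}

libtidy_install_hint()
```

The helper only returns the command as a string; it does not run anything or check whether `libtidy` is already present.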


This works enough for me to use in a pinch. It should be straightforward (but tedious) to:

- enable passing options in a `list`
- bundle `libtidy` _with the package_ and get it to work on Windows, Linux & macOS, as the library compiles on all three with the necessary tools.
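
For the first item, one possible shape for the R side is sketched below. Nothing here exists in the package yet: `tidy_opts()` is a hypothetical helper, and the idea of handing the C layer plain name/value strings is an assumption. The sketch flattens a named `list` into the `name: value` form used in tidy configuration files, where booleans are written as `yes`/`no`.

```r
# Hypothetical helper (not part of htmltidy): turn a named list of options
# into character name/value pairs, mirroring tidy's config-file style.
tidy_opts <- function(opts = list()) {
  if (length(opts) == 0) {
    return(data.frame(name = character(0), value = character(0),
                      stringsAsFactors = FALSE))
  }
  nms <- names(opts)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("every option must be named", call. = FALSE)
  }
  vals <- vapply(opts, function(x) {
    if (length(x) != 1) {
      stop("each option must be a single value", call. = FALSE)
    }
    if (is.logical(x)) {
      # logical values become "yes"/"no"; anything other than TRUE maps to "no"
      if (isTRUE(x)) "yes" else "no"
    } else {
      as.character(x)
    }
  }, character(1), USE.NAMES = FALSE)
  data.frame(name = nms, value = vals, stringsAsFactors = FALSE)
}

tidy_opts(list(indent = TRUE, wrap = 80, "output-xhtml" = FALSE))
```

Keeping the conversion in R would let the compiled code stay small, since it would only need to loop over string pairs. The sketch does not validate option names, so that would still have to happen somewhere.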

The following functions are implemented:

- `tidy` : Clean up gnarly HTML/XML

### Installation

```{r eval=FALSE}
```

```{r echo=FALSE}
```

### Usage


```{r}
library(htmltidy)

# current version
packageVersion("htmltidy")

cat(tidy("<b><p><a href=''>google &gt</a></p></b>"))
```

### Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](
By participating in this project you agree to abide by its terms.