---
output: rmarkdown::github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  fig.retina = 2,
  fig.path = "README-"
)
```

`htmltidy` : Clean up gnarly HTML/XML

Inspired by [this SO question](http://stackoverflow.com/questions/37061873/identify-a-weblink-in-bold-in-r) and by the great deal of cruddy HTML out there that needs fixing before it can be scraped properly.

NOTE: Requires [`libtidy`](http://www.html-tidy.org/) and is presently super-basic (there is no way to set options, and it pretty much only handles HTML). To get this to work, you'll first need to run `brew install tidy-html5` on macOS or `apt-get install libtidy-dev` on Ubuntu/Debian.

**SEEKING COLLABORATORS** This works well enough for me to use in a pinch. It should be straightforward (but tedious) to:

- enable passing options in a `list`
- bundle `libtidy` _with the package_ and get it to work on Windows, Linux & macOS, since the library compiles on all three with the necessary tools.

The following functions are implemented:

- `tidy` : Clean up gnarly HTML/XML

### Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/htmltidy")
```

```{r echo=FALSE}
options(width=120)
```

### Usage

```{r}
library(htmltidy)

# current version
packageVersion("htmltidy")

cat(tidy("

google >

"))
```

### Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.