## [1] "<!DOCTYPE html>\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta name=\"generator\" content=\n\"HTML Tidy for HTML5 for R version 5.0.0\" />\n<style>\n<![CDATA[\nbody{font-family:sans-serif;}\n]]>\n</style>\n<title></title>\n</head>\n<body>\n<b>This is some <i>really</i> poorly formatted HTML as is this\n<span id=\"sp\">portion</span></b>\n<div><span id=\"sp\"></span></div>\n</body>\n</html>\n"
## <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
## <html xmlns="http://www.w3.org/1999/xhtml">
## <head>
## <meta name="generator" content="HTML Tidy for HTML5 for R version 5.0.0">
## <style>
## <![CDATA[
## body { font-family: sans-serif; }
## ]]>
## </style>
## <title></title>
## </head>
## <body>
## <b>This is some <i>really</i> poorly formatted HTML as is this
## <span id="sp">portion</span></b>
## <div><span id="sp"></span></div>
## </body>
## </html>
##
```
### Testing Options
@ -90,21 +172,36 @@ txt <- "<html>
</html>"
cat(tidy_html(txt, option=opts))
## <!DOCTYPE html>
## <html>
## <head>
## <meta name="generator" content="HTML Tidy for HTML5 for R version 5.0.0">
## <style>
## p { color: red; }
## </style>
## <title></title>
## </head>
## <body>
## <p>Test</p>
## </body>
## </html>
```
But you're probably better off running it on plain HTML source.
Since it's C/C++-backed, it's pretty fast:
``` r
book <- readLines("http://singlepageappbook.com/single-page.html")
sum(map_int(book, nchar)) # map_int() comes from purrr
## [1] 207501
system.time(tidy_book <- tidy_html(book))
##    user  system elapsed
##   0.022   0.001   0.022
```
(It usually takes between 20 and 25 milliseconds to process those 202 kilobytes of HTML.) Not too shabby.
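
For reference, the size and throughput figures can be reproduced with a little arithmetic. This is only a rough sketch: the variable names are made up for illustration, the two input numbers are copied from the output above, and it assumes one byte per character, which may not hold for non-ASCII content.

``` r
# Figures copied from the benchmark output above.
n_chars   <- 207501   # result of sum(map_int(book, nchar))
elapsed_s <- 0.022    # "elapsed" column reported by system.time()

# Assumption: one byte per character (roughly true for mostly-ASCII HTML).
size_kib  <- n_chars / 1024
mib_per_s <- (n_chars / 1024^2) / elapsed_s

round(size_kib, 1)   # about 202.6 kilobytes
round(mib_per_s, 1)  # about 9 megabytes per second
```

A single `system.time()` call is noisy at this scale, so treat the throughput as an order-of-magnitude estimate rather than a benchmark.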
### Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.