Break Down the Walls of 'HTML' Tags into Usable Text

Package: jericho
Type: Package
Title: Break Down the Walls of 'HTML' Tags into Usable Text
Version: 0.2.0
Date: 2019-03-01
Authors@R: c(
    person("Bob", "Rudis", role = c("aut", "cre"), email = "bob@rud.is"),
    person("Martin", "Jericho", role = c("ctb"), comment = "Jericho HTML Parser")
  )
Maintainer: Bob Rudis <bob@rud.is>
SystemRequirements: Java
Description: Structured 'HTML' content can be useful when you need to parse data tables
    or other tagged data from within a document. However, it is also useful to obtain
    "just the text" from a document free from the walls of tags that surround it. Tools
    are provided that wrap methods in the 'Jericho HTML Parser' Java library by
    Martin Jericho <http://jericho.htmlparser.net/docs/index.html>. Martin's library
    is used in many at-scale projects, including 'The Internet Archive'.
URL: https://gitlab.com/hrbrmstr/jericho
BugReports: https://gitlab.com/hrbrmstr/jericho/issues
License: Apache License 2.0 | file LICENSE
Encoding: UTF-8
Suggests:
    testthat,
    covr
Depends:
    R (>= 3.2.0),
    rJava,
    jerichojars
RoxygenNote: 6.1.1
Remotes:
    url::https://git.sr.ht/~hrbrmstr/jerichojars