You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
3.1 KiB
3.1 KiB
---
output: rmarkdown::github_document
editor_options:
chunk_output_type: console
---
```{r pkg-knitr-opts, include=FALSE}
hrbrpkghelpr::global_opts()
```
```{r badges, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::stinking_badges()
```
```{r description, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::yank_title_and_description()
```
## What's Inside The Tin
The following functions are implemented:
```{r ingredients, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::describe_ingredients()
```
## Installation
You'll need `libfuzzy` installed and available for linking. See <https://ssdeep-project.github.io/ssdeep/index.html#platforms> for platform support.
On Ubuntu/Debian you can do:
```shell
sudo apt install libfuzzy-dev
```
On macOS you can do:
```shell
brew install ssdeep
```
The library works on Windows, I just need to do some manual labor for that.
Package installation:
```{r install-ex, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::install_block()
```
## Usage
```{r lib-ex}
library(ssdeepr)
# current version
packageVersion("ssdeepr")
```
- `index.html` is a static copy of a blog main page with a bunch of `<div>`s with article snippets
- `index1.html` is the same file as `index.htmnl` with a changed cache timestamp at the end
- `index2.html` is the same file as `index.html` with one article snippet removed
- `RMacOSX-FAQ.html` is the CRAN 'R for Mac OS X FAQ'
```{r u-01}
system.file("extdat", package="ssdeepr") %>%
list.files(full.names = TRUE, pattern = "html$", include.dirs = FALSE) %>%
hash_file() -> hashes
hashes
hash_compare(hashes$hash[1], hashes$hash[1])
hash_compare(hashes$hash[1], hashes$hash[2])
hash_compare(hashes$hash[1], hashes$hash[3])
hash_compare(hashes$hash[1], hashes$hash[4])
```
Works with Connections, too. All three should be the same if the Wikipedia page hasn't changed since making local copies in the package.
Using `hash_con()` has the advantage of not requiring the entire contents of the target blob to be read into memory in exchange for a tiny cost to speed for most files (see Benchmarks).
NOTE that retrieving the URL contents with different user-agent strings and/or with javascript-enabled may/will likely generate different content and, thus, a different hash.
```{r u-02}
(k1 <- hash_con(url("https://en.wikipedia.org/wiki/Donald_Knuth",
header = setNames(splashr::ua_ios_safari, "User-Agent"))))
(k2 <- hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))))
(k3 <- hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))))
hash_compare(k1, k2)
hash_compare(k1, k3)
hash_compare(k2, k3)
```
Benchmarks
```{r u-03}
microbenchmark::microbenchmark(
con = hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))),
gzc = hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))),
fil = hash_file(system.file("knuth", "local.html", package = "ssdeepr"))
)
```
## ssdeepr Metrics
```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```
## Code of Conduct
Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.