Context Triggered Piecewise Hash Computation Using 'ssdeep'
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

3.1 KiB

output: rmarkdown::github_document
chunk_output_type: console
```{r pkg-knitr-opts, include=FALSE}

```{r badges, results='asis', echo=FALSE, cache=FALSE}

```{r description, results='asis', echo=FALSE, cache=FALSE}

## What's Inside The Tin

The following functions are implemented:

```{r ingredients, results='asis', echo=FALSE, cache=FALSE}

## Installation

You'll need `libfuzzy` installed and available for linking. See <> for platform support.

On Ubuntu/Debian you can do:

sudo apt install libfuzzy-dev

On macOS you can do:

brew install ssdeep

The library works on Windows, I just need to do some manual labor for that.

Package installation:

```{r install-ex, results='asis', echo=FALSE, cache=FALSE}

## Usage

```{r lib-ex}

# current version


- `index.html` is a static copy of a blog main page with a bunch of `<div>`s with article snippets
- `index1.html` is the same file as `index.htmnl` with a changed cache timestamp at the end
- `index2.html` is the same file as `index.html` with one article snippet removed
- `RMacOSX-FAQ.html` is the CRAN 'R for Mac OS X FAQ'

```{r u-01}
system.file("extdat", package="ssdeepr") %>%
list.files(full.names = TRUE, pattern = "html$", include.dirs = FALSE) %>%
hash_file() -> hashes


hash_compare(hashes$hash[1], hashes$hash[1])
hash_compare(hashes$hash[1], hashes$hash[2])
hash_compare(hashes$hash[1], hashes$hash[3])
hash_compare(hashes$hash[1], hashes$hash[4])

Works with Connections, too. All three should be the same if the Wikipedia page hasn't changed since making local copies in the package.

Using `hash_con()` has the advantage of not requiring the entire contents of the target blob to be read into memory in exchange for a tiny cost to speed for most files (see Benchmarks).

NOTE that retrieving the URL contents with different user-agent strings and/or with javascript-enabled may/will likely generate different content and, thus, a different hash.

```{r u-02}
(k1 <- hash_con(url("",
header = setNames(splashr::ua_ios_safari, "User-Agent"))))

(k2 <- hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))))

(k3 <- hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))))

hash_compare(k1, k2)

hash_compare(k1, k3)

hash_compare(k2, k3)


```{r u-03}
con = hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))),
gzc = hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))),
fil = hash_file(system.file("knuth", "local.html", package = "ssdeepr"))

## ssdeepr Metrics

```{r cloc, echo=FALSE}

## Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.