boB Rudis
ssdeepr
Context Triggered Piecewise Hash Computation Using ‘ssdeep’
Description
The ssdeep project provides an open source library (https://github.com/ssdeep-project/ssdeep/) for context triggered piecewise hashing. Methods are provided to compute and compare hashes from character/byte streams.
What’s Inside The Tin
The following functions are implemented:
- hash_compare: Compare two hashes
- hash_con: Return CTP hash of data collected from a connection
- hash_file: Return CTP hash of one or more files
- hash_raw: Return CTP hash of a raw vector
Installation
You’ll need libfuzzy
installed and available for linking. See
https://ssdeep-project.github.io/ssdeep/index.html#platforms for
platform support.
On Ubuntu/Debian you can do:
sudo apt install libfuzzy-dev
On macOS you can do:
brew install ssdeep
The library works on Windows; I just need to do some manual labor to support it in this package.
Package installation:
remotes::install_git("https://git.rud.is/hrbrmstr/ssdeepr.git")
# or
remotes::install_git("https://git.sr.ht/~hrbrmstr/ssdeepr")
# or
remotes::install_gitlab("hrbrmstr/ssdeepr")
# or
remotes::install_bitbucket("hrbrmstr/ssdeepr")
# or
remotes::install_github("hrbrmstr/ssdeepr")
NOTE: To use the ‘remotes’ install options you will need to have the {remotes} package installed.
Usage
library(ssdeepr)
# current version
packageVersion("ssdeepr")
## [1] '0.2.0'
- index.html is a static copy of a blog main page with a bunch of <div>s with article snippets
- index1.html is the same file as index.html with a changed cache timestamp at the end
- index2.html is the same file as index.html with one article snippet removed
- RMacOSX-FAQ.html is the CRAN ‘R for Mac OS X FAQ’
library(magrittr) # provides the %>% pipe used below
system.file("extdat", package="ssdeepr") %>%
list.files(full.names = TRUE, pattern = "html$", include.dirs = FALSE) %>%
hash_file() -> hashes
hashes
## path
## 1 /Library/Frameworks/R.framework/Versions/3.6/Resources/library/ssdeepr/extdat/index.html
## 2 /Library/Frameworks/R.framework/Versions/3.6/Resources/library/ssdeepr/extdat/index1.html
## 3 /Library/Frameworks/R.framework/Versions/3.6/Resources/library/ssdeepr/extdat/index2.html
## 4 /Library/Frameworks/R.framework/Versions/3.6/Resources/library/ssdeepr/extdat/RMacOSX-FAQ.html
## hash
## 1 1536:rwjgwyHuuoH3yHgHJBH5H3YHwHuXiOXd6uEk9SWLIL7ERKvc4wHc+sius234Y4NY:rZvb7HHc+sius234Y4N4pqwrCihwnUui
## 2 1536:twjgwyHuuoH3yHgHJBH5H3YHwHuXiCe6uEk9SWLIL7ERKvc4wbc+sius234Y4N4j:tZvbPobc+sius234Y4N4pqwrCihwnUua
## 3 1536:twjgwyHuuoH3yHgHJBH5H3YHwHuXiCJEk9SWLIL7ERKvc4wbc+sius234Y4N4pqs:tZvbPHbc+sius234Y4N4pqwrCihwnUum
## 4 1536:3ExSauOOiqyq5tfAJqE3+OmEvqVtEYsSWiWB/H5ZJ:0x9fqyqtfAJqEu8vOWYsLd5r
hash_compare(hashes$hash[1], hashes$hash[1])
## [1] 100
hash_compare(hashes$hash[1], hashes$hash[2])
## [1] 91
hash_compare(hashes$hash[1], hashes$hash[3])
## [1] 88
hash_compare(hashes$hash[1], hashes$hash[4])
## [1] 0
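The one-to-one comparisons above extend naturally to a full similarity matrix. The following is a minimal sketch, not part of ssdeepr: `pairwise_scores()` is a hypothetical helper that takes any two-argument comparison function, and it assumes the score does not depend on argument order, so each pair is compared only once.

```r
# Hypothetical helper (not part of ssdeepr): build a matrix of pairwise
# similarity scores from a character vector of hashes.
# `compare_fn` is any function that takes two hashes and returns one score;
# with ssdeepr loaded you would pass `hash_compare`.
pairwise_scores <- function(hashes, compare_fn, labels = seq_along(hashes)) {
  n <- length(hashes)
  labels <- as.character(labels)
  scores <- matrix(NA_real_, nrow = n, ncol = n, dimnames = list(labels, labels))
  for (i in seq_len(n)) {
    for (j in seq_len(i)) {
      # Assumption: the score is symmetric, so fill both cells from one call.
      s <- compare_fn(hashes[i], hashes[j])
      scores[i, j] <- s
      scores[j, i] <- s
    }
  }
  scores
}

# Usage with the data frame from above (requires ssdeepr and libfuzzy):
# pairwise_scores(hashes$hash, hash_compare, basename(hashes$path))
```

Passing the comparison function as an argument keeps the helper independent of the package, which also makes it easy to try out with a stand-in function.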
It works with connections, too. All three hashes should be the same if the Wikipedia page hasn’t changed since the local copies in the package were made.
NOTE: retrieving the URL contents with a different user-agent string, or with JavaScript enabled, will likely produce different content and, thus, a different hash.
(k1 <- hash_con(url("https://en.wikipedia.org/wiki/Donald_Knuth")))
## [1] "3072:u2dfqECHC6NPsWzqFg2qDKgNYsVeJb19pEDTlfrd5czRsZNqqelzPFKsuXs6X9pU:PQli6NPsWzcg2/EYsVUY6sI"
(k2 <- hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))))
## [1] "3072:u2dfqECHC6NPsWzqFg2qDKgNYsVeJb19pEDTlfrd5czRsZNqqelzPFKsuXs6X9pU:PQli6NPsWzcg2/EYsVUY6sI"
(k3 <- hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))))
## [1] "3072:u2dfqECHC6NPsWzqFg2qDKgNYsVeJb19pEDTlfrd5czRsZNqqelzPFKsuXs6X9pU:PQli6NPsWzcg2/EYsVUY6sI"
hash_compare(k1, k2)
## [1] 100
hash_compare(k1, k3)
## [1] 100
hash_compare(k2, k3)
## [1] 100
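hash_raw is the one listed function not exercised above. The sketch below reads a file into a raw vector and hashes it in memory. `read_raw()` is a hypothetical helper, not part of ssdeepr, and the call assumes hash_raw accepts the raw vector as its first argument, as its description suggests. The hashing step is guarded so it only runs when the package is installed.

```r
# Hypothetical helper (not part of ssdeepr): read a whole file into a raw vector.
read_raw <- function(path) {
  readBin(path, what = "raw", n = file.size(path))
}

# Only run the hashing step when ssdeepr (and libfuzzy) are available.
if (requireNamespace("ssdeepr", quietly = TRUE)) {
  f <- system.file("knuth", "local.html", package = "ssdeepr")
  # Assumption: hash_raw() takes the raw vector as its first argument.
  from_raw <- ssdeepr::hash_raw(read_raw(f))
  # Compare the in-memory hash against the connection-based one.
  print(ssdeepr::hash_compare(from_raw, ssdeepr::hash_con(file(f))))
}
```

This route is handy when the bytes are already in memory, for example after downloading or decompressing content, and writing a temporary file just to hash it would be wasteful.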
ssdeepr Metrics
Lang | # Files | (%) | LoC | (%) | Blank lines | (%) | # Lines | (%) |
---|---|---|---|---|---|---|---|---|
C++ | 2 | 0.15 | 67 | 0.33 | 21 | 0.23 | 8 | 0.06 |
R | 8 | 0.62 | 62 | 0.30 | 28 | 0.30 | 71 | 0.50 |
Bourne Shell | 2 | 0.15 | 54 | 0.26 | 9 | 0.10 | 14 | 0.10 |
Rmd | 1 | 0.08 | 22 | 0.11 | 34 | 0.37 | 49 | 0.35 |
Code of Conduct
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.