boB Rudis fdb3bac05c


tlsh

Local Sensitivity Hashing Using the ‘Trend Micro’ ‘TLSH’ Implementation

Description

‘Trend Micro’ provides an open source library (https://github.com/trendmicro/tlsh/) for local sensitivity hashing, more commonly called locality-sensitive hashing. This package provides methods to compute and compare hashes from character/byte streams.

References

TODO

  • [ ] File input utilities
  • [ ] File input DSL verb
  • [ ] Docs
  • [ ] Tests
  • [ ] toString() method
  • [X] Reference class-backed DSL

What’s Inside The Tin

The following functions are implemented:

“Simple” interface (quick and dirty hashing):

  • tlsh_simple_hash: Compute TLSH hash for a character or raw vector and return hash fingerprint
  • tlsh_simple_diff: Compute the difference between two character hashes
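Because tlsh_simple_hash accepts a raw vector, one way to hash a file is to read its bytes first. The following is a minimal sketch, not part of the package: `read_raw_file` and `hash_file` are hypothetical helper names, and the hashing step only runs if the tlsh package is installed.

```r
# Hypothetical helper (not part of tlsh): read a whole file as a raw vector.
read_raw_file <- function(path) {
  stopifnot(file.exists(path))
  readBin(path, what = "raw", n = file.size(path))
}

# Hypothetical helper: apply any hashing function to a file's raw bytes.
hash_file <- function(path, hash_fun) {
  hash_fun(read_raw_file(path))
}

# Only attempt the real hash when the tlsh package is available.
if (requireNamespace("tlsh", quietly = TRUE)) {
  path <- system.file("extdat", "index.html", package = "tlsh")
  print(hash_file(path, tlsh::tlsh_simple_hash))
}
```

Hashing the raw file bytes may not give the same fingerprint as hashing the output of as.character(xml2::read_html(...)), since the latter re-serializes the document.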

DSL: (WIP)

  • tlsh: Create a new ‘tlsh’ object
  • tlsh_reset: Clear content and hash computation from a ‘tlsh’ object fingerprint
  • tlsh_update: Update the ‘tlsh’ object with content
  • tlsh_finalize: Finalize a ‘tlsh’ object hash
  • tlsh_is_valid: Test if a ‘tlsh’ hash object is valid
  • tlsh_hash: Retrieve the hex-encoded hash string for a ‘tlsh’ object
  • tlsh_dist: Compute distance between two TLSH objects
  • tlsh_stats: Return a data frame of the l_value and q1/q2 ratios from a ‘tlsh’ object

TODO: Document DSL

Installation

devtools::install_github("hrbrmstr/tlsh")

Usage

library(tlsh)
library(tidyverse)

# current version
packageVersion("tlsh")
## [1] '0.1.0'

Example

  • index.html is a static copy of a blog main page with a bunch of <div>s with article snippets
  • index1.html is the same file as index.html with a changed cache timestamp at the end
  • index2.html is the same file as index.html with one article snippet removed
  • RMacOSX-FAQ.html is the CRAN ‘R for Mac OS X FAQ’

doc1 <- as.character(xml2::read_html(system.file("extdat", "index.html", package="tlsh")))
doc2 <- as.character(xml2::read_html(system.file("extdat", "index1.html", package="tlsh")))
doc3 <- as.character(xml2::read_html(system.file("extdat", "index2.html", package="tlsh")))
doc4 <- as.character(xml2::read_html(system.file("extdat", "RMacOSX-FAQ.html", package="tlsh")))

# generate hashes
 
(h1 <- tlsh_simple_hash(doc1))
## [1] "B253F9F3168DC8354B2363E2A585771CD25A803BCEA099C1FBED54ACA790EB5B137346"
(h2 <- tlsh_simple_hash(doc2))
## [1] "6153E8F3168DC8355B2363E2A585771CD26A803BCEA099C1FBED44AC9790EB5B137346"
(h3 <- tlsh_simple_hash(doc3))
## [1] "6443E8F3168DC8355B6262F2A9C5771CD25A802BCEA099C1FBED54AC9780FF4A137346"
(h4 <- tlsh_simple_hash(doc4))
## [1] "B8B3A52F93C0233E0F1216576F192FA812FD5C7EA3802188B557C67F8712D9A47666BB"

# compute distance

tlsh_simple_diff(h1, h2)
## [1] 7
tlsh_simple_diff(h1, h3)
## [1] 18
tlsh_simple_diff(h1, h4)
## [1] 334
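When more than a couple of documents are involved, it can help to see all pairwise distances at once. The sketch below is one possible approach, not something the package provides: `pairwise_distances` is a hypothetical helper that takes any two-argument distance function. It assumes the distance is symmetric and that a hash compared with itself has distance 0, so only the upper triangle is computed.

```r
# Hypothetical helper (not part of tlsh): build a symmetric matrix of
# pairwise distances. `diff_fun` is any function(a, b) returning one number.
# Assumption: diff_fun is symmetric and diff_fun(x, x) is 0.
pairwise_distances <- function(hashes, diff_fun) {
  n <- length(hashes)
  m <- matrix(0, nrow = n, ncol = n,
              dimnames = list(names(hashes), names(hashes)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i < j) {
        d <- diff_fun(hashes[[i]], hashes[[j]])
        m[i, j] <- d
        m[j, i] <- d
      }
    }
  }
  m
}

# The four fingerprints shown above, named by source file.
hashes <- c(
  index  = "B253F9F3168DC8354B2363E2A585771CD25A803BCEA099C1FBED54ACA790EB5B137346",
  index1 = "6153E8F3168DC8355B2363E2A585771CD26A803BCEA099C1FBED44AC9790EB5B137346",
  index2 = "6443E8F3168DC8355B6262F2A9C5771CD25A802BCEA099C1FBED54AC9780FF4A137346",
  faq    = "B8B3A52F93C0233E0F1216576F192FA812FD5C7EA3802188B557C67F8712D9A47666BB"
)

# Only call into tlsh when the package is available.
if (requireNamespace("tlsh", quietly = TRUE)) {
  print(pairwise_distances(hashes, tlsh::tlsh_simple_diff))
}
```

Passing the distance function as an argument keeps the helper usable with other comparison functions, which also makes it easy to check without the package installed.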

DSL

doc1 <- as.character(xml2::read_html(system.file("extdat", "index.html", package="tlsh")))

tlsh() %>% 
  tlsh_update(doc1) %>% 
  tlsh_finalize() -> x

tlsh_hash(x)
## [1] "B253F9F3168DC8354B2363E2A585771CD25A803BCEA099C1FBED54ACA790EB5B137346"
tlsh_is_valid(x)
## [1] TRUE
tlsh_stats(x)
## # A tibble: 1 x 3
##   l_value q1_ratio q2_ratio
##     <int>    <int>    <int>
## 1      53       15        9

doc2 <- charToRaw(as.character(xml2::read_html(system.file("extdat", "index1.html", package="tlsh"))))

tlsh() %>% 
  tlsh_update(doc2) %>% 
  tlsh_finalize() -> y

tlsh_dist(x, y)
## [1] 7
tlsh_reset(x)
tlsh_reset(y)
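The values returned by tlsh_stats() above also appear to be readable from the hex string itself. The sketch below is an illustration in base R, not how the package computes them: `decode_tlsh_header` is a hypothetical name, and it assumes the 70-character hex layout shown in this README, where characters 3-4 hold the l_value with its two hex digits swapped and characters 5 and 6 hold q1_ratio and q2_ratio. That layout matches the stats output for the hash above but is not a documented guarantee of this package.

```r
# Hypothetical helper (not part of tlsh): read header fields from a
# 70-character TLSH hex string.
# Assumption: chars 3-4 are the l_value byte with its hex digits swapped,
# char 5 is q1_ratio and char 6 is q2_ratio.
decode_tlsh_header <- function(hash) {
  stopifnot(is.character(hash), length(hash) == 1,
            grepl("^[0-9A-Fa-f]{70}$", hash))
  # Undo the digit swap before converting from hex.
  l_hex <- paste0(substr(hash, 4, 4), substr(hash, 3, 3))
  data.frame(
    l_value  = strtoi(l_hex, base = 16L),
    q1_ratio = strtoi(substr(hash, 5, 5), base = 16L),
    q2_ratio = strtoi(substr(hash, 6, 6), base = 16L)
  )
}

print(decode_tlsh_header(
  "B253F9F3168DC8354B2363E2A585771CD25A803BCEA099C1FBED54ACA790EB5B137346"
))
```

For this hash the sketch yields l_value 53, q1_ratio 15 and q2_ratio 9, the same numbers tlsh_stats() reports above.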

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.