
gettin classy

master
boB Rudis 8 years ago
parent
commit
25c8fe4df4
No known key found for this signature in database GPG Key ID: 1D7529BE14E2BBA9
  1. 1
      .Rbuildignore
  2. 9
      DESCRIPTION
  3. 13
      INSTALL
  4. 2
      R/aaa.r
  5. 31
      R/datasets.r
  6. 8
      R/wand-package.R
  7. 49
      R/wand.r
  8. 4
      R/zzz.r
  9. 25
      README.Rmd
  10. 36
      README.md
  11. 4
      builder/make_mime_db.r
  12. 45
      configure
  13. BIN
      data/mime_db.rda
  14. 0
      inst/extdata/db/new/magic.mgc.zip
  15. 0
      inst/extdata/db/old/magic.mgc.zip
  16. 0
      inst/extdata/img/Rlogo.jpg
  17. 0
      inst/extdata/img/Rlogo.pdf
  18. 0
      inst/extdata/img/Rlogo.png
  19. 0
      inst/extdata/img/Rlogo.svg
  20. 0
      inst/extdata/img/Rlogo.tiff
  21. 0
      inst/extdata/img/example.c
  22. 0
      inst/extdata/img/example.html
  23. 0
      inst/extdata/img/example.r
  24. 0
      inst/extdata/img/example.rtf
  25. 0
      inst/extdata/img/example_dir/test.txt
  26. 2
      man/incant.Rd
  27. 2
      man/magic_wand_file.Rd
  28. 40
      man/mime_db.Rd
  29. 11
      man/wand.Rd
  30. 1
      src/wand.cpp
  31. 2
      tests/testthat/test-wand.R

1
.Rbuildignore

@ -7,3 +7,4 @@
^NOTES\.*html$
^tools$
^cran-comments\.md$
^builder$

9
DESCRIPTION

@ -3,15 +3,19 @@ Type: Package
Title: Retrieve 'Magic' Attributes from Files and Directories
Version: 0.2.0
Date: 2016-08-14
Author: Bob Rudis (@hrbrmstr), Christos Zoulas [libmagic], Mans Rullgard [file]
Author: Bob Rudis (@hrbrmstr), Christos Zoulas [libmagic], Mans Rullgard [file],
Jonathan Ong <me@jongleberry.com> [mime-db]
Maintainer: Bob Rudis <bob@rud.is>
Description: The 'libmagic' library provides functions to determine
'MIME' type and other metadata from files through their "magic"
attributes. This is useful when you do not wish to rely solely on
the honesty of a user or the extension on a file name.
the honesty of a user or the extension on a file name. It also
incorporates other metadata from the mime-db database
<https://github.com/jshttp/mime-db>.
URL: http://github.com/hrbrmstr/wand
BugReports: https://github.com/hrbrmstr/wand/issues
NeedsCompilation: yes
LazyData: true
SystemRequirements: libmagic (>= 5.14) for Unix/Linux/macOS; Rtools 3.3+ for Windows
License: AGPL
Suggests:
@ -28,5 +32,6 @@ Imports:
tidyr,
utils,
Rcpp
Encoding: UTF-8
LinkingTo: Rcpp
RoxygenNote: 5.0.1

13
INSTALL

@ -0,0 +1,13 @@
For Linux/UNIX/macOS you need 'libmagic' installed which is a component of the
'file' utility: <https://github.com/file/file>. You can find out more information
on 'libmagic' and 'file' at this URL: <http://www.darwinsys.com/file/>.
Here are the incantations you must use to get magic for your environment:
- `apt-get install libmagic-dev` on Ubuntu/Debian-ish systems
- `brew install libmagic` on macOS
- `yum install file-devel` on RHEL/CentOS/Fedora
For Windows you will need Rtools <https://cran.r-project.org/bin/windows/Rtools/>
version 3.3 or higher (it may work with older ones, but it's only been tested on
Rtools version 3.3 & 3.4).

2
R/aaa.r

@ -1 +1 @@
response <- encoding <- NULL
extensions <- mime_type <- response <- encoding <- NULL

31
R/datasets.r

@ -0,0 +1,31 @@
#' @title MIME Types Database
#' @description This is a dataset of all mime types. It aggregates data from the
#' following sources:
#'
#' \itemize{
#' \item \url{http://www.iana.org/assignments/media-types/media-types.xhtml}
#' \item \url{http://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types}
#' \item \url{http://hg.nginx.org/nginx/raw-file/default/conf/mime.types}
#' }
#'
#' There are a total of four possible fields per element:
#'
#' \itemize{
#' \item \code{source}: where the mime type is defined. If not set, it's
#' probably a custom media type. One of \code{apache}, \code{iana} or \code{nginx}.
#' \item \code{extensions}: a character vector of known extensions associated with this mime type.
#' \item \code{compressible}: whether a file of this type can be "gzipped" (mostly
#' useful in the context of serving up web content).
#' \item \code{charset}: the default charset associated with this type, if any.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name mime_db
#'
#' @references Ingested from \url{https://github.com/jshttp/mime-db}.
#' @usage data(mime_db)
#' @note Last updated 2016-08-14; the only guaranteed field is \code{source}
#' @format A list with 1,883 elements and four named fields: \code{source},
#' \code{compressible}, \code{extensions} & \code{charset}.
NULL
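
Reviewer note: the documentation above describes a named list keyed by MIME type, where only some fields are present per element. A minimal sketch of how such a structure behaves on lookup follows. `toy_db` and its two entries are hypothetical stand-ins written for illustration, not values read from the packaged `mime_db`.

```r
# Hypothetical two-entry stand-in for the structure documented above.
# The real dataset has far more entries; these values are illustrative only.
toy_db <- list(
  "image/png"      = list(source = "iana", compressible = FALSE, extensions = "png"),
  "text/x-made-up" = list(source = "apache")  # an entry that only carries `source`
)

# Look up one MIME type by name, then pull a field from it.
png_exts <- toy_db[["image/png"]]$extensions

# Fields that are absent from an element come back as NULL rather than NA,
# so callers need to handle NULL explicitly.
missing_exts <- toy_db[["text/x-made-up"]]$extensions

print(png_exts)
print(is.null(missing_exts))
```

Because absent fields are `NULL`, code that consumes the list generally needs a fallback value, which is what the `%||% NA` in the `R/wand.r` hunk appears to be for.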

8
R/wand-package.R

@ -1,5 +1,13 @@
#' Retrieve 'Magic' Attributes from Files and Directories
#'
#' The 'libmagic' library provides functions to determine 'MIME' type and other
#' metadata from files through their "magic" attributes. This is useful when you
#' do not wish to rely solely on the honesty of a user or the extension on a
#' file name. It also incorporates other metadata from the mime-db database
#' <https://github.com/jshttp/mime-db>
#'
#' Based on \code{file} / \code{libmagic} - \url{https://github.com/file/file}
#'
#' @name wand
#' @docType package
#' @author Bob Rudis (@@hrbrmstr)

49
R/wand.r

@ -17,7 +17,7 @@
#' @examples
#' library(dplyr)
#'
#' system.file("img", package="filemagic") %>%
#' system.file("extdata/img", package="filemagic") %>%
#' list.files(full.names=TRUE) %>%
#' incant() %>%
#' glimpse()
@ -37,8 +37,8 @@ incant <- function(path, magic_db="system") {
if (!found_file) {
stop(paste0("'file.exe' not found. Please install 'Rtools' and restart R. ",
"See 'https://github.com/stan-dev/rstan/wiki/Install-Rtools-for-Windows' ",
"for more information on how to install 'Rtools'", collapse=""),
"See 'https://github.com/stan-dev/rstan/wiki/Install-Rtools-for-Windows' ",
"for more information on how to install 'Rtools'", collapse=""),
call.=FALSE)
}
@ -49,17 +49,17 @@ incant <- function(path, magic_db="system") {
suppressMessages(
suppressWarnings(
system2(file_exe,
c("--mime-type", "--mime-encoding", "--no-buffer", "--preserve-date",
'--separator "||"',
sprintf('--files-from "%s"', tf)),
stdout=TRUE))) -> output_1
system2(file_exe,
c("--mime-type", "--mime-encoding", "--no-buffer", "--preserve-date",
'--separator "||"',
sprintf('--files-from "%s"', tf)),
stdout=TRUE))) -> output_1
suppressMessages(
suppressWarnings(system2(file_exe,
c("--no-buffer", "--preserve-date", '--separator "||"',
sprintf('--files-from "%s"', tf)),
stdout=TRUE))) -> output_2
c("--no-buffer", "--preserve-date", '--separator "||"',
sprintf('--files-from "%s"', tf)),
stdout=TRUE))) -> output_2
unlink(tf)
@ -74,13 +74,36 @@ incant <- function(path, magic_db="system") {
setNames(c("file", "description")) -> df2
left_join(df1, df2, by="file") %>%
mutate_all(stri_trim_both)
mutate_all(stri_trim_both) -> ret
} else {
incant_(path, magic_db)
ret <- incant_(path, magic_db)
}
if (!("extensions" %in% colnames(ret))) ret$extensions <- NA
mutate(ret, extensions=ifelse(extensions=="???", NA, extensions)) %>%
mutate(extensions=map_exts(mime_type, extensions))
}
map_exts <- function(mime_type, current_extensions) {
exts <- stri_split_regex(current_extensions, "/")
map2(mime_type, exts, function(mt, xt) {
ret <- wand::mime_db[[mt]]$extensions %||% NA
ret <- sort(unique(c(xt, ret)))
ret <- ret[!is.na(ret)]
if (length(ret)==0) ret <- NA
ret
})
}
#' ripped from rappdirs (ty Hadley!)
get_os <- function () {
if (.Platform$OS.type == "windows") {
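
Reviewer note: the new `map_exts()` merges the extensions reported by `libmagic` (a `/`-separated string) with those listed for the same MIME type in `mime_db`. The sketch below walks through that merge for a single file in base R. It is an illustration under assumptions, not the package's code: `merge_exts()`, `toy_db`, and the local `%||%` definition are hypothetical, since the diff does not show where the package gets its own `%||%`.

```r
# Null-coalescing helper, defined locally so the sketch is self-contained.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical one-entry stand-in for mime_db.
toy_db <- list("image/jpeg" = list(extensions = c("jpeg", "jpg", "jpe")))

# Merge the extensions already known for a file with those in the database.
merge_exts <- function(mime_type, current, db) {
  found <- db[[mime_type]]$extensions %||% NA  # unknown type or missing field -> NA
  out <- sort(unique(c(current, found)))       # de-duplicate and order
  out <- out[!is.na(out)]                      # drop NA placeholders
  if (length(out) == 0) NA else out            # nothing left -> single NA
}

# libmagic-style "a/b/c" string split into a character vector first.
from_magic <- strsplit("jpeg/jpg/jpe/jfif", "/", fixed = TRUE)[[1]]

jpeg_exts <- merge_exts("image/jpeg", from_magic, toy_db)
dir_exts  <- merge_exts("inode/directory", NA, toy_db)

print(jpeg_exts)
print(dir_exts)
```

The result for each file is a vector of varying length, which is consistent with the `extensions` column changing from `<chr>` to `<list>` in the README output in this commit.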

4
R/zzz.r

@ -18,7 +18,7 @@
#' @examples
#' library(dplyr)
#'
#' system.file("img", package="filemagic") %>%
#' system.file("extdata/img", package="filemagic") %>%
#' list.files(full.names=TRUE) %>%
#' incant(magic_wand_file()) %>%
#' glimpse()
@ -32,7 +32,7 @@ magic_wand_file <- function(refresh=FALSE) {
if (lib_version() >= 528) vers <- "new" else vers <- "old"
if (refresh | (!file.exists(file.path(rappdirs::user_cache_dir("wandr"), "magic.mgc")))) {
unzip(system.file("db", vers, "magic.mgc.zip", package="wand"),
unzip(system.file("extdata", "db", vers, "magic.mgc.zip", package="wand"),
exdir=cache, overwrite=TRUE)
}

25
README.Rmd

@ -8,8 +8,9 @@ output: rmarkdown::github_document
The `libmagic` library must be installed on *nix/macOS and available to use this.
- `apt-get install libmagic-dev` on Debian-ish systems
- `apt-get install libmagic-dev` on Ubuntu/Debian-ish systems
- `brew install libmagic` on macOS
- `yum install file-devel` on RHEL/CentOS/Fedora
While the package was developed using the 5.28 version of `libmagic` it has been configured to work with older versions. Note that some fields in the resultant data frame might not be available with older library versions. When using the function `magic_wand_file()` it checks for which version of `libmagic` is installed on your system and provides a suitable `magic.mgc` file for it.
@ -20,6 +21,10 @@ The following functions are implemented:
- `incant` : returns the "magic" metadata of the files in the input vector (as a data frame)
- `magic_wand_file` : provides a full path to the package-provided `magic` file
The following datasets are included:
- `mime_db` : a database of all mime types from <https://github.com/jshttp/mime-db>
### Installation
```{r eval=FALSE}
@ -34,22 +39,34 @@ options(width=120)
```{r message=FALSE}
library(wand)
library(magrittr)
library(dplyr)
system.file("img", package="wand") %>%
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant() %>%
glimpse()
```
```{r message=FALSE}
# Use a non-system magic-file
system.file("img", package="wand") %>%
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant(magic_wand_file()) %>%
select(description) %>%
unlist(use.names=FALSE)
```
```{r message=FALSE}
# what kinds of extensions are associated with these mime types
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant(magic_wand_file()) %>%
select(extensions) %>%
as.data.frame()
```
```{r message=FALSE}
# current verison
packageVersion("wand")

36
README.md

@ -17,6 +17,10 @@ The following functions are implemented:
- `incant` : returns the "magic" metadata of the files in the input vector (as a data frame)
- `magic_wand_file` : provides a full path to the package-provided `magic` file
The following datasets are included:
- `mime_db` : a database of all mime types from <https://github.com/jshttp/mime-db>
### Installation
``` r
@ -27,10 +31,9 @@ devtools::install_github("hrbrmstr/wand")
``` r
library(wand)
library(magrittr)
library(dplyr)
system.file("img", package="wand") %>%
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant() %>%
glimpse()
@ -38,16 +41,16 @@ system.file("img", package="wand") %>%
## Observations: 10
## Variables: 5
## $ file <chr> "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/wand/img/example_dir", "/Librar...
## $ file <chr> "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/wand/extdata/img/example_dir", ...
## $ mime_type <chr> "inode/directory", "text/x-c", "text/html", "text/plain", "text/rtf", "image/jpeg", "applicatio...
## $ encoding <chr> "binary", "us-ascii", "us-ascii", "us-ascii", "us-ascii", "binary", "binary", "binary", "us-asc...
## $ extensions <chr> NA, "???", "???", "???", "???", "jpeg/jpg/jpe/jfif", "???", "???", "???", "???"
## $ extensions <list> [NA, <"c", "cc", "cpp", "cxx", "dic", "h", "hh">, <"htm", "html", "shtml">, <"conf", "def", "i...
## $ description <chr> "directory", "C source, ASCII text", "HTML document, ASCII text, with CRLF line terminators", "...
``` r
# Use a non-system magic-file
system.file("img", package="wand") %>%
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant(magic_wand_file()) %>%
select(description) %>%
@ -66,6 +69,27 @@ system.file("img", package="wand") %>%
## [10] "TIFF image data, big-endian"
``` r
# what kinds of extensions are associated with these mime types
system.file("extdata", "img", package="wand") %>%
list.files(full.names=TRUE) %>%
incant(magic_wand_file()) %>%
select(extensions) %>%
as.data.frame()
```
## extensions
## 1 NA
## 2 c, cc, cpp, cxx, dic, h, hh
## 3 htm, html, shtml
## 4 conf, def, in, ini, list, log, text, txt
## 5 rtf
## 6 jfif, jpe, jpeg, jpg
## 7 pdf
## 8 png
## 9 conf, def, in, ini, list, log, text, txt
## 10 tif, tiff
``` r
# current verison
packageVersion("wand")
```
@ -81,7 +105,7 @@ library(testthat)
date()
```
## [1] "Mon Aug 15 10:19:22 2016"
## [1] "Mon Aug 15 11:54:15 2016"
``` r
test_dir("tests/")

4
builder/make_mime_db.r

@ -0,0 +1,4 @@
JSON_DB_URL <- "https://raw.githubusercontent.com/jshttp/mime-db/master/db.json"
mime_db <- jsonlite::fromJSON(JSON_DB_URL, flatten=TRUE)
use_data(mime_db)

45
configure

@ -0,0 +1,45 @@
echo "Checking to see if libmagic is available..."
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
echo "could not determine R_HOME"
exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
CXXFLAGS=`"${R_HOME}/bin/R" CMD config CXXFLAGS`
LDFLAGS=`"${R_HOME}/bin/R" CMD config LDFLAGS`
DYLIB_LDFLAGS=`"${R_HOME}/bin/R" CMD config DYLIB_LDFLAGS`
SHLIB_LDFLAGS=`"${R_HOME}/bin/R" CMD config SHLIB_LDFLAGS`
temp_src=$(mktemp)
cat > ${temp_src} <<EOF
#include "src/magic.h"
int main() {
magic_t t = magic_open(MAGIC_NONE);
return(0)
}
EOF
temp_exe=$(mktemp)
${CC} ${CFLAGS} ${CPPFLAGS} ${LD_FLAGS} ${CXXFLAGS} ${DYLIB_LDFLAGS} ${SHLIB_LDFLAGS} -L/usr/local/lib -L/opt/local/lib -L/usr/lib -lmagic -o ${temp_exe} ${temp_src} &> /dev/null
ccerr=$?
rm ${temp_src} ${temp_exe}
if [ "$ccerr" == 1 ] ; then
echo
echo
echo "The libmagic library was not found."
echo
echo "Please install it before installing this package."
echo
echo
exit 1
fi
exit 0

BIN
data/mime_db.rda

Binary file not shown.

0
inst/db/new/magic.mgc.zip → inst/extdata/db/new/magic.mgc.zip

0
inst/db/old/magic.mgc.zip → inst/extdata/db/old/magic.mgc.zip

0
inst/img/Rlogo.jpg → inst/extdata/img/Rlogo.jpg

0
inst/img/Rlogo.pdf → inst/extdata/img/Rlogo.pdf

0
inst/img/Rlogo.png → inst/extdata/img/Rlogo.png

0
inst/img/Rlogo.svg → inst/extdata/img/Rlogo.svg

0
inst/img/Rlogo.tiff → inst/extdata/img/Rlogo.tiff

0
inst/img/example.c → inst/extdata/img/example.c

0
inst/img/example.html → inst/extdata/img/example.html

0
inst/img/example.r → inst/extdata/img/example.r

0
inst/img/example.rtf → inst/extdata/img/example.rtf

0
inst/img/example_dir/test.txt → inst/extdata/img/example_dir/test.txt

2
man/incant.Rd

@ -30,7 +30,7 @@ Various fields might not be available depending on the version
\examples{
library(dplyr)
system.file("img", package="filemagic") \%>\%
system.file("extdata/img", package="filemagic") \%>\%
list.files(full.names=TRUE) \%>\%
incant() \%>\%
glimpse()

2
man/magic_wand_file.Rd

@ -28,7 +28,7 @@ cache directory has been cleared.
\examples{
library(dplyr)
system.file("img", package="filemagic") \%>\%
system.file("extdata/img", package="filemagic") \%>\%
list.files(full.names=TRUE) \%>\%
incant(magic_wand_file()) \%>\%
glimpse()

40
man/mime_db.Rd

@ -0,0 +1,40 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/datasets.r
\docType{data}
\name{mime_db}
\alias{mime_db}
\title{MIME Types Database}
\format{A list with 1,883 elements and four named fields: \code{source},
\code{compressible}, \code{extensions} & \code{charset}.}
\usage{
data(mime_db)
}
\description{
This is a dataset of all mime types. It aggregates data from the
following sources:
\itemize{
\item \url{http://www.iana.org/assignments/media-types/media-types.xhtml}
\item \url{http://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types}
\item \url{http://hg.nginx.org/nginx/raw-file/default/conf/mime.types}
}
There are a total of four possible fields per element:
\itemize{
\item \code{source}: where the mime type is defined. If not set, it's
probably a custom media type. One of \code{apache}, \code{iana} or \code{nginx}.
\item \code{extensions}: a character vector of known extensions associated with this mime type.
\item \code{compressible}: whether a file of this type can be "gzipped" (mostly
useful in the context of serving up web content).
\item \code{charset}: the default charset associated with this type, if any.
}
}
\note{
Last updated 2016-08-14; the only guaranteed field is \code{source}
}
\references{
Ingested from \url{https://github.com/jshttp/mime-db}.
}
\keyword{datasets}

11
man/wand.Rd

@ -1,12 +1,19 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wand-package.R
% Please edit documentation in R/wand-package.r
\docType{package}
\name{wand}
\alias{wand}
\alias{wand-package}
\title{Retrieve 'Magic' Attributes from Files and Directories}
\description{
Retrieve 'Magic' Attributes from Files and Directories
The 'libmagic' library provides functions to determine 'MIME' type and other
metadata from files through their "magic" attributes. This is useful when you
do not wish to rely solely on the honesty of a user or the extension on a
file name. It also incorporates other metadata from the mime-db database
<https://github.com/jshttp/mime-db>
}
\details{
Based on \code{file} / \code{libmagic} - \url{https://github.com/file/file}
}
\author{
Bob Rudis (@hrbrmstr)

1
src/wand.cpp

@ -1,5 +1,4 @@
#include <Rcpp.h>
using namespace Rcpp;
#ifdef _WIN32

2
tests/testthat/test-wand.R

@ -1,7 +1,7 @@
context("basic functionality")
test_that("we can do something", {
tmp <- incant(list.files(system.file("img", package="wand"),
tmp <- incant(list.files(system.file("extdata", "img", package="wand"),
full.names=TRUE),
magic_wand_file())
tmp <- tmp$description
