#' Count Lines of Code, Comments and Whitespace in Source Files and Archives
#'
#' Counts blank lines, comment lines, and physical lines of source code in source
#' files/trees/archives. An R wrapper to the Perl `cloc` utility
#' <https://github.com/AlDanial/cloc> by @@AlDanial.
#'
#' @md
#' @section How it works:
#' `cloc`'s method of operation resembles [`SLOCCount`](https://www.dwheeler.com/sloccount/)'s:
#' First, create a list of files to consider. Next, attempt to determine whether or not
#' found files contain recognized computer language source code. Finally, for files
#' identified as source files, invoke language-specific routines to count the number of
#' source lines.
#'
#' A more detailed description:
#'
#' 1. If the input file is an archive (such as a `.tar.gz` or `.zip` file),
#' create a temporary directory and expand the archive there using a
#' system call to an appropriate underlying utility (`tar`, `bzip2`, `unzip`,
#' etc.), then add this temporary directory as one of the inputs. (This
#' works more reliably on Unix than on Windows.)
#' 2. Use Perl's `File::Find` to recursively descend the input directories and make
#' a list of candidate file names. Ignore binary and zero-sized files.
#' 3. Make sure the files in the candidate list have unique contents:
#' first compare file sizes, then, for similarly sized files,
#' compare MD5 hashes of the file contents with Perl's `Digest::MD5`. For each
#' set of identical files, keep only the first copy, as determined
#' by a lexical sort, and remove the rest. The removed
#' files are not included in the report.
#' 4. Scan the candidate file list for file extensions which `cloc`
#' associates with programming languages. Files which match are classified as
#' containing source code for that language. Each file without an extension is opened
#' and its first line read to see if it is a Unix shell script
#' (anything that begins with `#!`). If it is a shell script, the file is
#' classified by that scripting language (if the language is
#' recognized). If the file does not have a recognized extension or is
#' not a recognized scripting language, the file is ignored.
#' 5. All remaining files in the candidate list should now be source files
#' for known programming languages. For each of these files:
#'
#'     1. Read the entire file into memory.
#'     2. Count the number of lines (= L _original_).
#'     3. Remove blank lines, then count again (= L _non-blank_).
#'     4. Loop over the comment filters defined for this language. (For
#'        example, C++ has two filters: (1) remove lines that start with
#'        optional whitespace followed by `//` and (2) remove text between
#'        `/*` and `*/`.) Apply each filter to the code to remove comments.
#'        Count the remaining lines (= L _code_).
#'     5. Save the counts for this language:
#'        * blank lines = L _original_ - L _non-blank_
#'        * comment lines = L _non-blank_ - L _code_
#'        * code lines = L _code_
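#'
#' The counting arithmetic in the last step can be illustrated with a minimal R
#' sketch. This is not `cloc`'s actual Perl implementation:
#' `count_lines_sketch()` is a hypothetical helper, and it hard-codes only the
#' two C++ comment filters mentioned in the list.
#'
#' ```r
#' # Hypothetical helper, for illustration only (not part of this package).
#' count_lines_sketch <- function(lines) {
#'   n_original <- length(lines)
#'
#'   # Remove blank lines, then count again.
#'   non_blank <- lines[!grepl("^[[:space:]]*$", lines)]
#'   n_non_blank <- length(non_blank)
#'
#'   # Filter 1: drop lines that start with optional whitespace followed by //.
#'   code <- non_blank[!grepl("^[[:space:]]*//", non_blank)]
#'
#'   # Filter 2: remove text between /* and */, which may span several lines.
#'   txt <- paste(code, collapse = "\n")
#'   txt <- gsub("(?s)/[*].*?[*]/", "", txt, perl = TRUE)
#'
#'   # Lines left empty by the filters no longer count as code.
#'   code <- strsplit(txt, "\n", fixed = TRUE)[[1]]
#'   code <- code[!grepl("^[[:space:]]*$", code)]
#'   n_code <- length(code)
#'
#'   c(
#'     blank   = n_original - n_non_blank,
#'     comment = n_non_blank - n_code,
#'     code    = n_code
#'   )
#' }
#'
#' sample_cpp <- c(
#'   "// header",
#'   "",
#'   "int main() {",
#'   "  /* multi",
#'   "     line */",
#'   "  return 0; // trailing",
#'   "}"
#' )
#' count_lines_sketch(sample_cpp)
#' # blank = 1, comment = 3, code = 3
#' ```
#'
#' The trailing `//` comment stays on its code line in this sketch, because the
#' first filter as described only removes lines that begin with `//`.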
#' @name cloc-package
#' @docType package
#' @author Bob Rudis (bob@@rud.is)
#' @importFrom DT datatable formatPercentage
#' @importFrom htmltools html_print HTML
#' @importFrom knitr purl kable
#' @importFrom rprojroot find_package_root_file
#' @import rstudioapi
#' @importFrom git2r clone
#' @importFrom processx run
#' @importFrom utils read.table contrib.url download.file download.packages tail
#' @importFrom utils setTxtProgressBar txtProgressBar
NULL