README

Branch: master
Author: boB Rudis (6 years ago)
Commit: 90b6bde934
Signature: no known key found for this signature in database (GPG Key ID: 2A514A4997464560)
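  1. .Rbuildignore (2 lines changed)
  2. NEWS.md (3 lines changed)
  3. R/decapitated-package.R (2 lines changed)
  4. README.Rmd (29 lines changed)
  5. README.md (33 lines changed)
  6. inst/include/easywsclient.hpp (72 lines changed)
  7. inst/include/ws.h (28 lines changed)
  8. man/decapitated.Rd (2 lines changed)
  9. output.pdf (binary)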

.Rbuildignore (2 lines changed)

@@ -8,3 +8,5 @@
^\.codecov\.yml$
^README_files$
^doc$
+^screenshot\.png$
+^output\.pdf$
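`R CMD build` uses these lines to decide which files to leave out of the source tarball: each line is a Perl-like regular expression, matched case-insensitively against file and directory names relative to the package root. A rough C++ sketch of that check, assuming `std::regex`'s ECMAScript grammar behaves like R's regexes for simple anchored patterns such as these; `kIgnorePatterns` and `is_build_ignored` are hypothetical names, not part of R or this package.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// The five patterns visible in the hunk above (three existing, two added).
const std::vector<std::string> kIgnorePatterns = {
    R"(^\.codecov\.yml$)",
    R"(^README_files$)",
    R"(^doc$)",
    R"(^screenshot\.png$)",
    R"(^output\.pdf$)",
};

// Returns true if a '/'-separated path relative to the package root matches
// any pattern. Matching is case-insensitive, as R CMD build's is.
bool is_build_ignored(const std::string& rel_path,
                      const std::vector<std::string>& patterns = kIgnorePatterns) {
  for (const auto& pattern : patterns) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    if (std::regex_search(rel_path, re)) {
      return true;
    }
  }
  return false;
}
```

With these patterns, `output.pdf` and `screenshot.png` stay out of the built package while `R/decapitated-package.R` is kept.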

NEWS.md (3 lines changed)

@@ -2,6 +2,9 @@
* Re-design of how the Chrome binary is set
* env var functions to help with ^^
* switch to using processx
* added ability to specify "work" dirs
* changed license
* removed all traces of experimental C code
* options for naming & placing PDF & screenshot files
0.1.0

R/decapitated-package.R (2 lines changed)

@@ -31,7 +31,7 @@
#'
#' A guess is made (but not verified yet) if `HEADLESS_CHROME` is non-existent.
#'
-#' Use `~/.Renviron` to store this value for the time being.
+#' It's best to use `~/.Renviron` to store this value.
#'
#' @md
#' @name decapitated
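The lookup described in this documentation block amounts to "use `HEADLESS_CHROME` if it is set, otherwise guess". A minimal C++ sketch of that fallback, assuming an unset or empty variable triggers the guess; `resolve_chrome_bin` is hypothetical, and `kGuessedChrome` is simply the usual macOS Chrome location, not necessarily the path the package guesses.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical, unverified fallback (the standard macOS install location).
const char* const kGuessedChrome =
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";

// Returns HEADLESS_CHROME when it is set and non-empty; otherwise returns
// the guess without checking that the file exists.
std::string resolve_chrome_bin() {
  const char* env = std::getenv("HEADLESS_CHROME");
  if (env != nullptr && env[0] != '\0') {
    return std::string(env);
  }
  return std::string(kGuessedChrome);
}
```

In R, the equivalent lookup is `Sys.getenv("HEADLESS_CHROME")`, which returns `""` when the variable is unset; a `HEADLESS_CHROME=...` line in `~/.Renviron` makes the value available in every session.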

README.Rmd (29 lines changed)

@@ -23,7 +23,34 @@ You'll need to set an environment variable `HEADLESS_CHROME` to one of these two
A guess is made (but not verified yet) if `HEADLESS_CHROME` is non-existent.
-It's best to use `~/.Renviron` to store this value for the time being.
+It's best to use `~/.Renviron` to store this value.
+## Working around headless Chrome & OS security restrictions:
+Security restrictions on various operating systems and OS configurations can cause
+headless Chrome execution to fail. As a result, headless Chrome operations should
+use a special directory for `decapitated` package operations. You can pass this
+in as `work_dir`. If `work_dir` is `NULL`, a `.rdecapdata` directory will be
+created in your home directory and used for the data, crash dumps and utility
+directories for Chrome operations.
+In testing on various macOS 10.13 systems, `tempdir()` did not always meet these
+requirements, as Chrome sets file attributes in unusual ways during some of its
+file operations.
+If you pass in a `work_dir`, it must be one that does not violate OS security
+restrictions, or headless Chrome will not function.
+## Helping it "always work"
+The three core functions have a `prime` parameter. In testing (again, especially on macOS),
+I noticed that the first one or two requests to a URL often resulted in an empty `<body>`
+response. I don't use Chrome as my primary browser anymore, so I'm not sure if that has
+something to do with it, but requests after the first one or two do return content. The
+`prime` parameter accepts `TRUE`, `FALSE` or a numeric value; when priming is on, the URL
+is retrieved multiple times before a result is returned (or a PDF or PNG is generated).
+This is a workaround until there is more granular control over the command-line
+execution of headless Chrome.
## What's in the tin?
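The `work_dir` rule in this README can be sketched in C++17 as follows. This is not the package's R code: it assumes an absent `work_dir` (R's `NULL`) maps to `std::nullopt`, and the subdirectory names `data`, `crashes` and `utility`, like `ChromeDirs` and `resolve_work_dir`, are hypothetical because the README does not spell them out. It shows only the fallback and directory creation; it does not check OS security restrictions.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Directories handed to Chrome (hypothetical layout).
struct ChromeDirs {
  fs::path root;
  fs::path data;
  fs::path crash_dumps;
  fs::path utility;
};

// With no work_dir, fall back to <home>/.rdecapdata, then create the
// subdirectories (create_directories also creates root and is a no-op
// for directories that already exist).
ChromeDirs resolve_work_dir(const std::optional<fs::path>& work_dir,
                            const fs::path& home) {
  ChromeDirs dirs;
  dirs.root = work_dir ? *work_dir : home / ".rdecapdata";
  dirs.data = dirs.root / "data";
  dirs.crash_dumps = dirs.root / "crashes";
  dirs.utility = dirs.root / "utility";
  for (const fs::path& p : {dirs.data, dirs.crash_dumps, dirs.utility}) {
    fs::create_directories(p);
  }
  return dirs;
}
```

Taking the home directory as a parameter instead of reading `$HOME` keeps the sketch easy to exercise against a scratch directory.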

README.md (33 lines changed)

@@ -27,7 +27,36 @@ these two values:
A guess is made (but not verified yet) if `HEADLESS_CHROME` is
non-existent.
-It’s best to use `~/.Renviron` to store this value for the time being.
+It’s best to use `~/.Renviron` to store this value.
+## Working around headless Chrome & OS security restrictions:
+Security restrictions on various operating systems and OS configurations
+can cause headless Chrome execution to fail. As a result, headless
+Chrome operations should use a special directory for `decapitated`
+package operations. You can pass this in as `work_dir`. If `work_dir` is
+`NULL`, a `.rdecapdata` directory will be created in your home directory
+and used for the data, crash dumps and utility directories for Chrome
+operations.
+In testing on various macOS 10.13 systems, `tempdir()` did not always
+meet these requirements, as Chrome sets file attributes in unusual ways
+during some of its file operations.
+If you pass in a `work_dir`, it must be one that does not violate OS
+security restrictions, or headless Chrome will not function.
+## Helping it “always work”
+The three core functions have a `prime` parameter. In testing (again,
+especially on macOS), I noticed that the first one or two requests to a
+URL often resulted in an empty `<body>` response. I don’t use Chrome as
+my primary browser anymore, so I’m not sure if that has something to do
+with it, but requests after the first one or two do return content. The
+`prime` parameter accepts `TRUE`, `FALSE` or a numeric value; when
+priming is on, the URL is retrieved multiple times before a result is
+returned (or a PDF or PNG is generated). This is a workaround until
+there is more granular control over the command-line execution of
+headless Chrome.
## What’s in the tin?
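The `prime` behaviour described in this README is essentially "warm up, then fetch". A C++ sketch, where `std::variant<bool, int>` stands in for R's `TRUE`/`FALSE`/numeric argument and `fetch` stands in for one headless Chrome request. Mapping `TRUE` to a single priming request, and issuing one extra final fetch, are assumptions; the README does not say exactly how many requests each value triggers. `Prime`, `priming_requests` and `fetch_with_prime` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>

// R's TRUE / FALSE / numeric modelled as bool or int.
using Prime = std::variant<bool, int>;

// Number of throwaway requests to issue before the real one.
// Assumption: TRUE means one; negative numbers are treated as zero.
int priming_requests(const Prime& prime) {
  if (const bool* flag = std::get_if<bool>(&prime)) {
    return *flag ? 1 : 0;
  }
  const int n = std::get<int>(prime);
  return n > 0 ? n : 0;
}

// Issues the priming requests (results discarded), then returns the
// result of one final request.
std::string fetch_with_prime(const std::function<std::string()>& fetch,
                             const Prime& prime) {
  const int warmups = priming_requests(prime);
  for (int i = 0; i < warmups; ++i) {
    fetch();
  }
  return fetch();
}
```

With a fetcher that returns an empty `<body>` twice before returning content, `prime = 2` yields the content on the third request, while `FALSE` returns the empty body from the first.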
@@ -58,7 +87,7 @@ library(decapitated)
packageVersion("decapitated")
```
-## [1] '0.1.0'
+## [1] '0.2.0'
``` r
chrome_version()

inst/include/easywsclient.hpp (72 lines changed, file deleted)

@@ -1,72 +0,0 @@
#ifndef EASYWSCLIENT_HPP_20120819_MIOFVASDTNUASZDQPLFD
#define EASYWSCLIENT_HPP_20120819_MIOFVASDTNUASZDQPLFD
// This code comes from:
// https://github.com/dhbaird/easywsclient
//
// To get the latest version:
// wget https://raw.github.com/dhbaird/easywsclient/master/easywsclient.hpp
// wget https://raw.github.com/dhbaird/easywsclient/master/easywsclient.cpp
#include <string>
#include <vector>
namespace easywsclient {
struct Callback_Imp { virtual void operator()(const std::string& message) = 0; };
struct BytesCallback_Imp { virtual void operator()(const std::vector<uint8_t>& message) = 0; };
class WebSocket {
  public:
    typedef WebSocket * pointer;
    typedef enum readyStateValues { CLOSING, CLOSED, CONNECTING, OPEN } readyStateValues;
    // Factories:
    static pointer create_dummy();
    static pointer from_url(const std::string& url, const std::string& origin = std::string());
    static pointer from_url_no_mask(const std::string& url, const std::string& origin = std::string());
    // Interfaces:
    virtual ~WebSocket() { }
    virtual void poll(int timeout = 0) = 0; // timeout in milliseconds
    virtual void send(const std::string& message) = 0;
    virtual void sendBinary(const std::string& message) = 0;
    virtual void sendBinary(const std::vector<uint8_t>& message) = 0;
    virtual void sendPing() = 0;
    virtual void close() = 0;
    virtual readyStateValues getReadyState() const = 0;
    template<class Callable>
    void dispatch(Callable callable)
        // For callbacks that accept a string argument.
    { // N.B. this is compatible with both C++11 lambdas, functors and C function pointers
        struct _Callback : public Callback_Imp {
            Callable& callable;
            _Callback(Callable& callable) : callable(callable) { }
            void operator()(const std::string& message) { callable(message); }
        };
        _Callback callback(callable);
        _dispatch(callback);
    }
    template<class Callable>
    void dispatchBinary(Callable callable)
        // For callbacks that accept a std::vector<uint8_t> argument.
    { // N.B. this is compatible with both C++11 lambdas, functors and C function pointers
        struct _Callback : public BytesCallback_Imp {
            Callable& callable;
            _Callback(Callable& callable) : callable(callable) { }
            void operator()(const std::vector<uint8_t>& message) { callable(message); }
        };
        _Callback callback(callable);
        _dispatchBinary(callback);
    }
  protected:
    virtual void _dispatch(Callback_Imp& callable) = 0;
    virtual void _dispatchBinary(BytesCallback_Imp& callable) = 0;
};
} // namespace easywsclient
#endif /* EASYWSCLIENT_HPP_20120819_MIOFVASDTNUASZDQPLFD */
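The removed header's `dispatch()` template is a small type-erasure trick: it wraps any callable (lambda, functor or function pointer) in a local struct derived from `Callback_Imp`, so the virtual, non-template `_dispatch()` can invoke it. The sketch below reproduces that pattern without any networking. `FakeSocket`, `Wrapper` and `dispatch_impl` are hypothetical names (the header itself uses `_Callback` and `_dispatch`), and draining a message queue is an assumption made for the demo.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Non-template callback interface, same shape as easywsclient's.
struct Callback_Imp {
  virtual void operator()(const std::string& message) = 0;
  virtual ~Callback_Imp() = default;
};

// In-memory stand-in for WebSocket: incoming messages are just a queue.
class FakeSocket {
 public:
  void push(const std::string& message) { inbox_.push_back(message); }

  // Template front end: adapts any callable to Callback_Imp.
  template <class Callable>
  void dispatch(Callable callable) {
    struct Wrapper : public Callback_Imp {
      Callable& callable;
      explicit Wrapper(Callable& c) : callable(c) {}
      void operator()(const std::string& message) override { callable(message); }
    };
    Wrapper wrapper(callable);
    dispatch_impl(wrapper);
  }

 private:
  // Non-template back end: delivers each queued message, then drops it.
  void dispatch_impl(Callback_Imp& callback) {
    while (!inbox_.empty()) {
      callback(inbox_.front());
      inbox_.pop_front();
    }
  }

  std::deque<std::string> inbox_;
};
```

Because `dispatch()` takes the callable by value and the wrapper holds a reference to that copy, a lambda that captures by reference can still update state in the caller.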

inst/include/ws.h (28 lines changed, file deleted)

@@ -1,28 +0,0 @@
#include <Rcpp.h>
#ifdef _WIN32
#pragma comment( lib, "ws2_32" )
#include <WinSock2.h>
#endif
#include <assert.h>
#include <stdio.h>
#include <string>
#include <iostream>
#include <thread>
#include <chrono>
#include <condition_variable>
#include "easywsclient.hpp"
typedef struct _chromeWs chromeWs;
typedef chromeWs *chromeWsPtr;
struct _chromeWs {
  easywsclient::WebSocket::pointer ws;
  std::string response;
  bool ready;
};
inline void finaliseWs(chromeWsPtr ws) {}
typedef Rcpp::XPtr<chromeWs,Rcpp::PreserveStorage,finaliseWs> XPtrWs;
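In the removed `ws.h`, `finaliseWs()` has an empty body, so the finaliser attached to `XPtrWs` releases neither the `chromeWs` struct nor its socket when it runs. For comparison, here is a plain C++ sketch (no Rcpp) of a handle whose finaliser does clean up, using `std::unique_ptr` with a function-pointer deleter in place of `Rcpp::XPtr`. `FakeWs`, `ChromeWs`, `finalise_ws` and `make_handle` are hypothetical, and `FakeWs` does no networking.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for easywsclient::WebSocket; counts live instances.
struct FakeWs {
  static inline int live = 0;
  FakeWs() { ++live; }
  ~FakeWs() { --live; }
  void close() {}  // no-op in this stand-in
};

// Same fields as _chromeWs, but the socket is owned.
struct ChromeWs {
  std::unique_ptr<FakeWs> ws;
  std::string response;
  bool ready = false;
};

// Finaliser that actually releases resources: close the socket, then
// delete the struct (which destroys the socket via unique_ptr).
inline void finalise_ws(ChromeWs* p) {
  if (p == nullptr) return;
  if (p->ws) p->ws->close();
  delete p;
}

using ChromeWsHandle = std::unique_ptr<ChromeWs, void (*)(ChromeWs*)>;

ChromeWsHandle make_handle() {
  ChromeWsHandle h(new ChromeWs, &finalise_ws);
  h->ws = std::make_unique<FakeWs>();
  return h;
}
```

In Rcpp, the analogous change would be passing a function like `finalise_ws` as the finaliser template argument of `Rcpp::XPtr`.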

man/decapitated.Rd (2 lines changed)

@@ -41,7 +41,7 @@ You'll need to set an environment variable \code{HEADLESS_CHROME} to one of thes
A guess is made (but not verified yet) if \code{HEADLESS_CHROME} is non-existent.
-Use \code{~/.Renviron} to store this value for the time being.
+It's best to use \code{~/.Renviron} to store this value.
}
\author{

output.pdf (binary)

Binary file not shown.