# Parse and Test Robots Exclusion Protocol Files and Rules
## Description
The 'Robots Exclusion Protocol' (<http://www.robotstxt.org/orig.html>) documents a set of standards for allowing or excluding robot/spider crawling of different areas of site content. Tools are provided which wrap the [`rep-cpp`](https://github.com/seomoz/rep-cpp) C++ library for processing these `robots.txt` files.
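For reference, a `robots.txt` file groups `User-agent` declarations with `Allow`/`Disallow` rules, plus optional `Crawl-delay` and `Sitemap` directives. A small illustrative example:

```
User-agent: *
Crawl-delay: 10
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```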
Bundled C++ libraries:

- [`rep-cpp`](https://github.com/seomoz/rep-cpp)
- [`url-cpp`](https://github.com/seomoz/url-cpp)
## What’s Inside the Tin
The following functions are implemented:
- `robxp`: Parse a `robots.txt` file & create a `robxp` object
- `can_fetch`: Test URL paths against a `robxp` `robots.txt` object
- `crawl_delays`: Retrieve all agent crawl delay values in a `robxp` `robots.txt` object
- `print.robxp`: Custom printer for `robxp` objects
- `sitemaps`: Retrieve a character vector of sitemaps from a parsed `robots.txt` object
## Installation
``` r
remotes::install_github("hrbrmstr/spiderbar")
# or
remotes::install_gitlab("hrbrmstr/spiderbar")
```
NOTE: To use the `remotes` install options you will need to have the [remotes](https://github.com/r-lib/remotes) package installed.

## Usage
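A minimal sketch of typical usage (the inline `robots.txt` content, example paths, and sitemap URL are illustrative):

``` r
library(spiderbar)

# parse robots.txt content into a robxp object; robxp() accepts a
# character vector of lines or a connection
rt <- robxp(c(
  "User-agent: *",
  "Crawl-delay: 10",
  "Disallow: /private/",
  "",
  "Sitemap: https://example.com/sitemap.xml"
))

can_fetch(rt, "/private/secret.html") # FALSE
can_fetch(rt, "/public/page.html")    # TRUE

crawl_delays(rt) # data frame of agent / crawl-delay values
sitemaps(rt)     # "https://example.com/sitemap.xml"
```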
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.