view README.md @ 17:10ef42e217cc

Update Readme
author Lewin Bormann <lbo@spheniscida.de>
date Mon, 21 Sep 2020 17:27:15 +0200
parents 99031188b089
children
line wrap: on
line source

# `scrapeprice`

is a small one-binary Rust framework for fetching and analyzing web pages. It
honors robots.txt and is based on tokio. The logic for specific websites is
implemented in Rust -- alternatively, a configuration-based system can be
implemented.

To implement your own scraper, you need to implement three traits according to
your needs; the implementations are stored in a `driver::DriverLogic` object.

- `driver::Explorer` tells the driver which URLs to visit. It returns a list of
URLs to initially visit and extracts new URLs from fetched documents.
- `driver::Extractor<T>` extracts items of type `T` from a fetched web page.
- `driver::Storage<T>` stores the items returned by an `Extractor`, whether in a
log file, CSV or other text format, or into a database.

Once you have implemented those traits, check out the `main.rs` file as example.
Not much more is needed now:

```rust
    let logic = driver::DriverLogic {    
        explore: Box::new(implem::YourExplorer::new()),    
        store: Box::new(implem::YourStorage {}),    
        extract: Box::new(implem::YourItemPriceExtractor {}),    
    };    
    let mut driver = driver::Driver::new(logic, None);    
     
    let mut ival = tokio::time::interval(tokio::time::Duration::from_millis(2000));    
    
    loop {    
        ival.tick().await;     
        if let Err(e) = driver.drive().await {    
            warn!("Error from driver:  {}", e);    
        }    
    } 
```

For an example, you can check out the `audiophil` module in the `examples`
directory, which scrapes the website of my local photo store.