add readme & format code
giom-l committed Jan 26, 2020
1 parent a8c756a commit fbeba36
Showing 2 changed files with 12 additions and 5 deletions.
8 changes: 8 additions & 0 deletions readme.md
@@ -0,0 +1,8 @@
# Rust crawler to gather a list of links from a webpage

## Compilation
Run `cargo fmt --all -- --check && cargo clippy && cargo build --release --bin rust-crawler` so that the code is format-checked, linted, and built in release mode.
*NB*: Rust `1.36` or newer is required.

## Usage
The link to be crawled is hardcoded for now, so just launch the binary you built.
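
Since the crawl target is hardcoded in `src/main.rs` (see the diff below), a natural next step would be to read it from the command line. A minimal sketch of that idea, not part of this commit; the argument handling and the fallback are assumptions:

```rust
use std::env;

fn main() {
    // Hypothetical: take the start URL from the first CLI argument and fall
    // back to the URL currently hardcoded in this commit.
    let start_url = env::args()
        .nth(1)
        .unwrap_or_else(|| "https://training.engineering.publicissapient.fr/".to_owned());
    println!("Crawling {}", start_url);
}
```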
9 changes: 4 additions & 5 deletions src/main.rs
@@ -1,15 +1,15 @@
-extern crate url_crawler;
-use url_crawler::*;
-
+use std::sync::Arc;
+
+use url_crawler::*;

/// Function for filtering content in the crawler before a HEAD request.
///
/// Only allow directory entries, and files that have the `deb` extension.
fn apt_filter(url: &Url) -> bool {
let url = url.as_str();
url.ends_with("/") || url.ends_with(".deb")
}

pub fn main() {
// Create a crawler designed to crawl the given website.
let crawler = Crawler::new("https://training.engineering.publicissapient.fr/".to_owned())
@@ -19,7 +19,6 @@ pub fn main() {
.pre_fetch(Arc::new(apt_filter))
// Initialize the crawler and begin crawling. This returns immediately.
.crawl();

// Process url entries as they become available
for file in crawler {
println!("{:#?}", file);
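
As a quick sanity check of the `apt_filter` logic above, a test along these lines should pass (a sketch, not part of this commit; it assumes `Url` is the same re-export the filter's signature uses and that `Url::parse` is available):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn filter_keeps_directories_and_deb_files() {
        // Directory listings and .deb files pass the pre-fetch filter.
        assert!(apt_filter(&Url::parse("https://example.com/pool/").unwrap()));
        assert!(apt_filter(&Url::parse("https://example.com/foo.deb").unwrap()));
        // Everything else is rejected before any HEAD request is made.
        assert!(!apt_filter(&Url::parse("https://example.com/index.html").unwrap()));
    }
}
```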