WaybackByrd

A web archiving service similar to archive.is, built with Rust. WaybackByrd allows you to create permanent, searchable snapshots of web pages with HTML archiving, screenshot capture, PDF export, and full-text search capabilities.

Features

  • Web Archiving: Capture and preserve web pages with all their content
  • Screenshot Capture: Generate full-page screenshots using headless Chrome
  • PDF Export: Convert archived pages to PDF format
  • Full-Text Search: Search through archived content using PostgreSQL full-text search
  • Duplicate Detection: Automatically detect and reuse existing archives
  • REST API: JSON API for programmatic access
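
Duplicate detection depends on deciding when two URLs refer to the same page. The README does not describe how WaybackByrd does this, so the following is only a minimal sketch of one plausible approach: reduce each URL to a canonical key before looking it up. The `normalize_url` helper and its rules (lowercase scheme and host, drop the fragment, drop the default port, treat an empty path as `/`) are assumptions for illustration, not the project's actual logic.

```rust
/// Hypothetical helper: reduce a URL to a canonical key for duplicate lookup.
/// Returns None for anything that is not an http(s) URL.
/// Note: userinfo ("user:pass@") is not handled specially in this sketch.
fn normalize_url(input: &str) -> Option<String> {
    // Split "scheme://rest"; reject inputs with no scheme separator.
    let (scheme, rest) = input.trim().split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    if scheme != "http" && scheme != "https" {
        return None;
    }
    // Fragments are never sent to the server, so they are dropped.
    let rest = rest.split('#').next().unwrap_or("");
    // The authority ends at the first '/' or '?'.
    let split_at = rest
        .find(|c: char| c == '/' || c == '?')
        .unwrap_or(rest.len());
    let (authority, path_and_query) = rest.split_at(split_at);
    if authority.is_empty() {
        return None;
    }
    // Host names are case-insensitive.
    let mut host = authority.to_ascii_lowercase();
    // Drop the port when it is the default for the scheme.
    let default_port = if scheme == "https" { ":443" } else { ":80" };
    if let Some(stripped) = host.strip_suffix(default_port) {
        host = stripped.to_string();
    }
    let (path, query) = match path_and_query.split_once('?') {
        Some((p, q)) => (p, Some(q)),
        None => (path_and_query, None),
    };
    // "http://example.com" and "http://example.com/" name the same resource.
    let path = if path.is_empty() { "/" } else { path };
    let mut key = format!("{scheme}://{host}{path}");
    if let Some(q) = query.filter(|q| !q.is_empty()) {
        key.push('?');
        key.push_str(q);
    }
    Some(key)
}
```

With these rules, `HTTPS://Example.COM:443/a/b?x=1#top` and `https://example.com/a/b?x=1` map to the same key. Stricter or looser rules (for example, sorting query parameters) are a design choice this sketch leaves open.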

Technology Stack

  • Backend: Rust with the Axum web framework
  • Database: PostgreSQL with SQLx
  • Templating: Tera (Jinja2-like syntax)
  • Headless Browser: Chromiumoxide (Chrome DevTools Protocol)
  • HTTP Client: reqwest with async support

Prerequisites

  • Rust 1.70+ (install from rustup.rs)
  • PostgreSQL 14+
  • Chrome or Chromium browser
  • Docker and Docker Compose (optional, for easy PostgreSQL setup)

Quick Start

1. Clone the Repository

git clone https://github.com/yourusername/WaybackByrd.git
cd WaybackByrd

2. Set Up Environment Variables

cp .env.example .env

Edit .env and configure your settings:

DATABASE_URL=postgresql://waybackbyrd:password@localhost:5432/waybackbyrd
BIND_ADDRESS=127.0.0.1:3000
CHROME_PATH=/usr/bin/chromium
MAX_CONTENT_SIZE=52428800
REQUEST_TIMEOUT=30
RUST_LOG=info,waybackbyrd=debug

3. Start PostgreSQL

Using Docker Compose:

docker-compose up -d

Or use your own PostgreSQL instance and create the database:

createdb waybackbyrd

4. Run Database Migrations

Migrations run automatically on startup, but you can also run them manually:

cargo install sqlx-cli --no-default-features --features postgres
sqlx migrate run

5. Build and Run

cargo build --release
cargo run --release

The server listens on http://127.0.0.1:3000 (or the BIND_ADDRESS set in .env).

Usage

Web Interface

  1. Open your browser and navigate to http://127.0.0.1:3000
  2. Enter a URL in the archive form and click "Archive"
  3. Wait for the archive to complete (the page will auto-refresh)
  4. View the archived page, download a screenshot, or export to PDF

API Endpoints

Create Archive

curl -X POST http://127.0.0.1:3000/archive \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Response:

{
  "id": 1,
  "status": "pending"
}

Get Archive Status

curl http://127.0.0.1:3000/api/archive/1/status
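
Because a new archive starts out as "pending", a client typically polls the status endpoint until the value changes. The sketch below shows that loop using only the standard library. The `fetch_status` closure is a stand-in for an HTTP GET to `/api/archive/{id}/status`, and `wait_for_archive` is a hypothetical name. The README documents only the "pending" status, so the sketch treats any other value as final rather than assuming specific names.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll `fetch_status` until it reports something other than "pending".
/// `fetch_status` is a placeholder for a request to the status endpoint;
/// it returns the status string or an error message.
fn wait_for_archive<F>(
    mut fetch_status: F,
    max_attempts: u32,
    delay: Duration,
) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
{
    for attempt in 1..=max_attempts {
        // Propagate transport or parsing errors immediately.
        let status = fetch_status()?;
        if status != "pending" {
            return Ok(status);
        }
        // Do not sleep after the final attempt.
        if attempt < max_attempts {
            sleep(delay);
        }
    }
    Err(format!("still pending after {max_attempts} attempts"))
}
```

Passing the fetch step as a closure keeps the loop independent of any particular HTTP client, and makes it easy to exercise with a fake status source.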

List Recent Archives

curl http://127.0.0.1:3000/api/archives

View Archive

curl http://127.0.0.1:3000/archive/1

Get Screenshot

curl http://127.0.0.1:3000/archive/1/screenshot -o screenshot.png

Get PDF

curl http://127.0.0.1:3000/archive/1/pdf -o archive.pdf

Search Archives

curl "http://127.0.0.1:3000/search?q=example"
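
Search terms that contain spaces or reserved characters must be percent-encoded before they go into the `q` parameter. The following is a small standard-library sketch of that step. The `encode_query_value` and `search_url` helpers are hypothetical names, not part of WaybackByrd; only the `/search?q=` path comes from the example above.

```rust
/// Percent-encode a string for use as a query parameter value.
/// Unreserved characters (RFC 3986) pass through unchanged;
/// every other byte of the UTF-8 encoding becomes %XX.
fn encode_query_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the search URL for an instance rooted at `base`
/// (for example "http://127.0.0.1:3000").
fn search_url(base: &str, query: &str) -> String {
    format!(
        "{}/search?q={}",
        base.trim_end_matches('/'),
        encode_query_value(query)
    )
}
```

For instance, the query `rust & axum` is sent as `rust%20%26%20axum`, so the ampersand is not mistaken for a parameter separator.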

Project Structure

WaybackByrd/
├── Cargo.toml              # Rust dependencies and project metadata
├── docker-compose.yml      # PostgreSQL container configuration
├── migrations/             # Database migrations
│   └── 001_initial_schema.sql
├── src/
│   ├── main.rs            # Application entry point
│   ├── config.rs          # Configuration management
│   ├── routes/            # HTTP route handlers
│   │   ├── home.rs       # Homepage and search routes
│   │   ├── archive.rs    # Archive viewing routes
│   │   └── api.rs        # JSON API endpoints
│   ├── services/         # Business logic
│   │   ├── archiver.rs   # Archive orchestration
│   │   ├── fetcher.rs    # HTTP fetching and HTML parsing
│   │   ├── screenshot.rs # Screenshot capture
│   │   ├── pdf.rs        # PDF generation
│   │   └── search.rs     # Search functionality
│   ├── models/           # Database models
│   │   └── archive.rs
│   ├── db/               # Database layer
│   │   └── repository.rs # Database queries
│   └── templates/        # Tera HTML templates
├── static/               # CSS, JavaScript, images
└── tests/               # Test files

Development

Running Tests

cargo test

Database Migrations

Create a new migration:

sqlx migrate add <migration_name>

Run migrations:

sqlx migrate run

Revert last migration:

sqlx migrate revert

Linting and Formatting

cargo fmt
cargo clippy

Configuration

Environment Variables

  • DATABASE_URL: PostgreSQL connection string
  • BIND_ADDRESS: Server bind address and port
  • CHROME_PATH: Path to Chrome/Chromium executable
  • MAX_CONTENT_SIZE: Maximum content size in bytes (default: 52428800, i.e. 50 MiB)
  • REQUEST_TIMEOUT: HTTP request timeout in seconds (default: 30)
  • RUST_LOG: Logging level (e.g., info, debug, trace)
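
As an illustration of how these variables could be turned into typed settings, here is a minimal sketch using only the standard library. The `Config` struct, its field names, and `from_lookup` are hypothetical and are not taken from `src/config.rs`. The defaults for MAX_CONTENT_SIZE and REQUEST_TIMEOUT follow the list above; treating DATABASE_URL as required and defaulting BIND_ADDRESS to 127.0.0.1:3000 are assumptions.

```rust
use std::net::SocketAddr;
use std::time::Duration;

/// Hypothetical typed view of the environment variables listed above.
#[derive(Debug, PartialEq)]
struct Config {
    database_url: String,
    bind_address: SocketAddr,
    chrome_path: Option<String>,
    max_content_size: u64,
    request_timeout: Duration,
}

impl Config {
    /// Build a Config from any key lookup, so it can be tested
    /// without touching the real process environment.
    fn from_lookup<F>(get: F) -> Result<Config, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Assumption: there is no sensible default for the database.
        let database_url = get("DATABASE_URL").ok_or("DATABASE_URL is required")?;
        // Assumption: fall back to the address used in the Quick Start.
        let bind_address = get("BIND_ADDRESS")
            .unwrap_or_else(|| "127.0.0.1:3000".to_string())
            .parse::<SocketAddr>()
            .map_err(|e| format!("invalid BIND_ADDRESS: {e}"))?;
        let max_content_size = match get("MAX_CONTENT_SIZE") {
            Some(raw) => raw
                .parse::<u64>()
                .map_err(|e| format!("invalid MAX_CONTENT_SIZE: {e}"))?,
            None => 50 * 1024 * 1024, // 52428800 bytes
        };
        let timeout_secs = match get("REQUEST_TIMEOUT") {
            Some(raw) => raw
                .parse::<u64>()
                .map_err(|e| format!("invalid REQUEST_TIMEOUT: {e}"))?,
            None => 30,
        };
        Ok(Config {
            database_url,
            bind_address,
            chrome_path: get("CHROME_PATH"),
            max_content_size,
            request_timeout: Duration::from_secs(timeout_secs),
        })
    }

    /// Read the same settings from the process environment.
    fn from_env() -> Result<Config, String> {
        Config::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Parsing once at startup means a malformed value such as `REQUEST_TIMEOUT=abc` is reported immediately instead of surfacing later during a request.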

Security Considerations

  • URLs are validated to prevent local file access
  • Localhost URLs are blocked by default
  • Content size is limited to prevent resource exhaustion
  • HTML is sandboxed in iframes for safe viewing
  • No support for file:// or other non-HTTP(S) schemes
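
The scheme and localhost rules above can be sketched as a pre-flight check that runs before any fetch. This is an assumption-laden illustration, not WaybackByrd's actual validator: `is_archivable` is a hypothetical name, and rejecting private and link-local IPv4 ranges goes beyond what the list states.

```rust
use std::net::IpAddr;

/// Hypothetical pre-flight check: accept only http(s) URLs whose host
/// is not localhost or an obviously internal IP literal.
fn is_archivable(url: &str) -> Result<(), String> {
    let (scheme, rest) = url.trim().split_once("://").ok_or("missing scheme")?;
    if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
        return Err(format!("scheme '{scheme}' is not allowed"));
    }
    // The authority is everything before the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    // Drop optional userinfo ("user:pass@").
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    // Separate the host from an optional port; IPv6 literals are bracketed.
    let host = if let Some(inner) = host_port.strip_prefix('[') {
        inner.split(']').next().unwrap_or("")
    } else {
        host_port.split(':').next().unwrap_or("")
    };
    if host.is_empty() {
        return Err("missing host".into());
    }
    let lower = host.to_ascii_lowercase();
    if lower == "localhost" || lower.ends_with(".localhost") {
        return Err("localhost is blocked".into());
    }
    if let Ok(ip) = host.parse::<IpAddr>() {
        // Assumption: private and link-local ranges are treated as internal too.
        let blocked = match ip {
            IpAddr::V4(v4) => {
                v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
            }
            IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
        };
        if blocked {
            return Err(format!("address {ip} is blocked"));
        }
    }
    Ok(())
}
```

A string-level check like this does not catch a public hostname that resolves to an internal address, so a complete defense would also need to inspect the resolved IP at connection time.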

Deployment

Production Build

cargo build --release

The binary will be located at target/release/waybackbyrd.

Running in Production

  1. Set up a reverse proxy (nginx, Caddy) for SSL/TLS
  2. Configure PostgreSQL for production use
  3. Set appropriate environment variables
  4. Run the binary:
./target/release/waybackbyrd

Docker Deployment

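A production Dockerfile might look like this:

# Build stage: compile the release binary
FROM rust:1.70 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

# Runtime stage: slim image with Chromium for screenshots and PDF export
FROM debian:bookworm-slim
RUN apt-get update \
    && apt-get install -y chromium postgresql-client ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/waybackbyrd /usr/local/bin/
CMD ["waybackbyrd"]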

Roadmap

  • User authentication and private archives
  • Archive collections and tagging
  • Advanced search with filters
  • Archive comparison (diff between versions)
  • Scheduled re-archiving
  • Archive API rate limiting
  • CDN integration for assets
  • Replace PostgreSQL FTS with Tantivy
  • Redis caching layer

License

MIT License - see LICENSE file for details

© 2026 BallsGang

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Acknowledgments

Built with inspiration from archive.is and the Internet Archive's Wayback Machine.