WaybackByrd
A web archiving service similar to archive.is, built with Rust. WaybackByrd allows you to create permanent, searchable snapshots of web pages with HTML archiving, screenshot capture, PDF export, and full-text search capabilities.
Features
- Web Archiving: Capture and preserve web pages with all their content
- Screenshot Capture: Generate full-page screenshots using headless Chrome
- PDF Export: Convert archived pages to PDF format
- Full-Text Search: Search through archived content using PostgreSQL full-text search
- Duplicate Detection: Automatically detect and reuse existing archives
- REST API: JSON API for programmatic access
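The README does not say how duplicate detection decides that two URLs are "the same". One plausible approach, sketched here purely as an assumption, is to reduce each URL to a normalized key before looking it up. The function name `dedup_key` and the specific normalization rules are hypothetical, not taken from the WaybackByrd source.

```rust
/// Hypothetical helper: reduce a URL to a lookup key so that trivially
/// different spellings of the same address compare equal.
pub fn dedup_key(url: &str) -> String {
    // Fragments are never sent to the server, so they are dropped.
    let without_fragment = url.trim().split('#').next().unwrap_or("");

    // Separate the scheme from the rest; give up on inputs without "://".
    let (scheme, rest) = match without_fragment.split_once("://") {
        Some((s, r)) => (s.to_ascii_lowercase(), r),
        None => return without_fragment.to_string(),
    };

    // The authority (host and optional port) ends at the first '/' or '?'.
    let split_at = rest
        .find(|c: char| c == '/' || c == '?')
        .unwrap_or(rest.len());
    let (authority, path_and_query) = rest.split_at(split_at);

    // Host names are case-insensitive; paths and queries are left untouched.
    let mut host = authority.to_ascii_lowercase();

    // Strip the port when it is the default for the scheme.
    let default_port = match scheme.as_str() {
        "http" => ":80",
        "https" => ":443",
        _ => "",
    };
    if !default_port.is_empty() && host.ends_with(default_port) {
        host.truncate(host.len() - default_port.len());
    }

    // Treat an empty path and "/" as the same resource.
    let path = if path_and_query.is_empty() { "/" } else { path_and_query };

    format!("{scheme}://{host}{path}")
}
```

With this sketch, `dedup_key("HTTPS://Example.COM")` and `dedup_key("https://example.com/")` produce the same key, so a second archive request could reuse the first result.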
Technology Stack
- Backend: Rust with Axum web framework
- Database: PostgreSQL with SQLx
- Templating: Tera (Jinja2-like syntax)
- Headless Browser: Chromiumoxide (Chrome DevTools Protocol)
- HTTP Client: reqwest with async support
Prerequisites
- Rust 1.70+ (install from rustup.rs)
- PostgreSQL 14+
- Chrome or Chromium browser
- Docker and Docker Compose (optional, for easy PostgreSQL setup)
Quick Start
1. Clone the Repository
git clone https://github.com/yourusername/WaybackByrd.git
cd WaybackByrd
2. Set Up Environment Variables
cp .env.example .env
Edit .env and configure your settings:
DATABASE_URL=postgresql://waybackbyrd:password@localhost:5432/waybackbyrd
BIND_ADDRESS=127.0.0.1:3000
CHROME_PATH=/usr/bin/chromium
MAX_CONTENT_SIZE=52428800
REQUEST_TIMEOUT=30
RUST_LOG=info,waybackbyrd=debug
3. Start PostgreSQL
Using Docker Compose:
docker-compose up -d
Or use your own PostgreSQL instance and create the database:
createdb waybackbyrd
4. Run Database Migrations
Migrations run automatically on startup, but you can also run them manually:
cargo install sqlx-cli --no-default-features --features postgres
sqlx migrate run
5. Build and Run
cargo build --release
cargo run --release
The server will start on http://127.0.0.1:3000 (or the address specified in .env).
Usage
Web Interface
- Open your browser and navigate to http://127.0.0.1:3000
- Enter a URL in the archive form and click "Archive"
- Wait for the archive to complete (the page will auto-refresh)
- View the archived page, download a screenshot, or export to PDF
API Endpoints
Create Archive
curl -X POST http://127.0.0.1:3000/archive \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Response:
{
"id": 1,
"status": "pending"
}
Get Archive Status
curl http://127.0.0.1:3000/api/archive/1/status
List Recent Archives
curl http://127.0.0.1:3000/api/archives
View Archive
curl http://127.0.0.1:3000/archive/1
Get Screenshot
curl http://127.0.0.1:3000/archive/1/screenshot -o screenshot.png
Get PDF
curl http://127.0.0.1:3000/archive/1/pdf -o archive.pdf
Search Archives
curl "http://127.0.0.1:3000/search?q=example"
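Search terms that contain spaces or characters such as `&` need to be percent-encoded before they go into the `q` parameter. The following is a minimal sketch using only the standard library; `encode_query_value` and `search_url` are illustrative names, not part of WaybackByrd, and the `/search?q=` path is taken from the curl example above.

```rust
/// Percent-encode a string for use as a query parameter value.
/// Unreserved characters (RFC 3986) pass through unchanged; every other
/// byte of the UTF-8 encoding is written as %XX.
pub fn encode_query_value(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Build the search URL for a given server base address and query string.
pub fn search_url(base: &str, query: &str) -> String {
    format!(
        "{}/search?q={}",
        base.trim_end_matches('/'),
        encode_query_value(query)
    )
}
```

For example, `search_url("http://127.0.0.1:3000", "rust & axum")` yields `http://127.0.0.1:3000/search?q=rust%20%26%20axum`.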
Project Structure
WaybackByrd/
├── Cargo.toml # Rust dependencies and project metadata
├── docker-compose.yml # PostgreSQL container configuration
├── migrations/ # Database migrations
│ └── 001_initial_schema.sql
├── src/
│ ├── main.rs # Application entry point
│ ├── config.rs # Configuration management
│ ├── routes/ # HTTP route handlers
│ │ ├── home.rs # Homepage and search routes
│ │ ├── archive.rs # Archive viewing routes
│ │ └── api.rs # JSON API endpoints
│ ├── services/ # Business logic
│ │ ├── archiver.rs # Archive orchestration
│ │ ├── fetcher.rs # HTTP fetching and HTML parsing
│ │ ├── screenshot.rs # Screenshot capture
│ │ ├── pdf.rs # PDF generation
│ │ └── search.rs # Search functionality
│ ├── models/ # Database models
│ │ └── archive.rs
│ ├── db/ # Database layer
│ │ └── repository.rs # Database queries
│ └── templates/ # Tera HTML templates
├── static/ # CSS, JavaScript, images
└── tests/ # Test files
Development
Running Tests
cargo test
Database Migrations
Create a new migration:
sqlx migrate add <migration_name>
Run migrations:
sqlx migrate run
Revert last migration:
sqlx migrate revert
Linting and Formatting
cargo fmt
cargo clippy
Configuration
Environment Variables
- DATABASE_URL: PostgreSQL connection string
- BIND_ADDRESS: Server bind address and port
- CHROME_PATH: Path to Chrome/Chromium executable
- MAX_CONTENT_SIZE: Maximum content size in bytes (default: 50MB)
- REQUEST_TIMEOUT: HTTP request timeout in seconds (default: 30)
- RUST_LOG: Logging level (e.g., info, debug, trace)
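To illustrate how these variables and their defaults could be read, here is a small standard-library sketch. It is not the contents of `src/config.rs`: the `Settings` struct, the function names, and the fallback for `BIND_ADDRESS` are assumptions made for the example. Only the 50MB and 30 second defaults come from the list above.

```rust
use std::env;
use std::time::Duration;

/// Hypothetical settings struct; fields mirror the documented variables.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub database_url: String,
    pub bind_address: String,
    pub chrome_path: Option<String>,
    pub max_content_size: u64,
    pub request_timeout: Duration,
}

/// Build settings from any key lookup function, so the parsing logic can
/// be exercised without touching the real process environment.
pub fn settings_from<F>(lookup: F) -> Result<Settings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // No sensible default exists for the database, so it is required.
    let database_url =
        lookup("DATABASE_URL").ok_or_else(|| "DATABASE_URL must be set".to_string())?;

    // Assumed fallback: the README shows this value only as an example.
    let bind_address = lookup("BIND_ADDRESS").unwrap_or_else(|| "127.0.0.1:3000".to_string());

    let max_content_size = match lookup("MAX_CONTENT_SIZE") {
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("MAX_CONTENT_SIZE: {e}"))?,
        None => 50 * 1024 * 1024, // documented default (52428800 bytes)
    };

    let timeout_secs = match lookup("REQUEST_TIMEOUT") {
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("REQUEST_TIMEOUT: {e}"))?,
        None => 30, // documented default, in seconds
    };

    Ok(Settings {
        database_url,
        bind_address,
        chrome_path: lookup("CHROME_PATH"),
        max_content_size,
        request_timeout: Duration::from_secs(timeout_secs),
    })
}

/// Convenience wrapper that reads the real process environment.
pub fn settings_from_env() -> Result<Settings, String> {
    settings_from(|key| env::var(key).ok())
}
```

Taking the lookup as a closure keeps the parsing testable: unit tests can pass a fixed map of values instead of mutating process-wide environment variables.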
Security Considerations
- URLs are validated to prevent local file access
- Localhost URLs are blocked by default
- Content size is limited to prevent resource exhaustion
- HTML is sandboxed in iframes for safe viewing
- No support for file:// or other non-HTTP(S) schemes
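The first, second, and last points can be sketched as a single pre-flight check. This is an illustration using only the standard library, not the project's actual validator: `check_archive_url` and `UrlRejection` are hypothetical names, and rejecting private and link-local IPv4 ranges goes beyond what the list above states. A hostname that merely resolves to a loopback address is not caught here; that would need a check after DNS resolution.

```rust
use std::net::IpAddr;

/// Hypothetical reasons for refusing to archive a URL.
#[derive(Debug, PartialEq)]
pub enum UrlRejection {
    UnsupportedScheme,
    MissingHost,
    LocalAddress,
}

/// Accept only http(s) URLs whose host is not obviously local.
pub fn check_archive_url(url: &str) -> Result<(), UrlRejection> {
    let lower = url.trim().to_ascii_lowercase();

    // Only http:// and https:// are accepted; file:// and others are refused.
    let rest = if let Some(r) = lower.strip_prefix("https://") {
        r
    } else if let Some(r) = lower.strip_prefix("http://") {
        r
    } else {
        return Err(UrlRejection::UnsupportedScheme);
    };

    // The authority ends at the first '/', '?', or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");

    // Drop optional userinfo ("user:pass@").
    let host_port = authority.rsplit('@').next().unwrap_or("");

    // Separate host from port, handling bracketed IPv6 literals.
    let host = if let Some(stripped) = host_port.strip_prefix('[') {
        stripped.split(']').next().unwrap_or("")
    } else {
        host_port.split(':').next().unwrap_or("")
    };
    if host.is_empty() {
        return Err(UrlRejection::MissingHost);
    }

    if host == "localhost" || host.ends_with(".localhost") {
        return Err(UrlRejection::LocalAddress);
    }

    // IP literals: refuse loopback, unspecified, and (assumed) private ranges.
    if let Ok(ip) = host.parse::<IpAddr>() {
        let local = match ip {
            IpAddr::V4(v4) => {
                v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
            }
            IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
        };
        if local {
            return Err(UrlRejection::LocalAddress);
        }
    }

    Ok(())
}
```

Under these assumptions, `check_archive_url("file:///etc/passwd")` returns `Err(UrlRejection::UnsupportedScheme)` and `check_archive_url("http://localhost:3000/")` returns `Err(UrlRejection::LocalAddress)`, while `https://example.com` passes.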
Deployment
Production Build
cargo build --release
The binary will be located at target/release/waybackbyrd.
Running in Production
- Set up a reverse proxy (nginx, Caddy) for SSL/TLS
- Configure PostgreSQL for production use
- Set appropriate environment variables
- Run the binary:
./target/release/waybackbyrd
Docker Deployment
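A minimal production Dockerfile might look like this:
# Build stage: compile the release binary
FROM rust:1.70 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
# Runtime stage: slim image with Chromium for screenshots and PDF export
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y chromium postgresql-client \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/waybackbyrd /usr/local/bin/
CMD ["waybackbyrd"]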
Roadmap
- User authentication and private archives
- Archive collections and tagging
- Advanced search with filters
- Archive comparison (diff between versions)
- Scheduled re-archiving
- Archive API rate limiting
- CDN integration for assets
- Replace PostgreSQL FTS with Tantivy
- Redis caching layer
License
MIT License - see LICENSE file for details
Copyright
© 2026 BallsGang
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Acknowledgments
Built with inspiration from archive.is and the Internet Archive's Wayback Machine.