tumblr original post scraper by kayos [fixed]

ibotzhub f23e27718a fix: closure race, error handling, dead imports, nil checks, MustWaitOpen		2026-02-24 09:10:26 -08:00
.gitignore	init	2025-06-28 12:21:57 -07:00
browser.go	init	2025-06-28 12:21:57 -07:00
CHANGELOG.md	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00
go.mod	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00
go.sum	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00
logger.go	init	2025-06-28 12:21:57 -07:00
main.go	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00
README.md	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00
tumble	fix: closure race, error handling, dead imports, nil checks, MustWaitOpen	2026-02-24 09:10:26 -08:00

tumble

scrapes a target tumblr blog using jetblack's original post finder and browser automation via rod.

images are deduplicated by blake2b content hash. already-downloaded content is pre-hashed on startup so runs are safely resumable.
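
a minimal sketch of the pre-hash-then-dedupe idea, not tumble's actual code. assumptions: `crypto/sha256` stands in for blake2b so the example needs only the standard library (blake2b lives in `golang.org/x/crypto/blake2b`), and the names `hashBytes`, `preHashDir` and `markNew` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// hashBytes returns a hex content hash.
// ASSUMPTION: sha256 is a stand-in; tumble itself uses blake2b.
func hashBytes(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// preHashDir hashes every regular file directly under dir and returns
// the set of content hashes already on disk. A missing directory is
// treated as a first run and yields an empty set.
func preHashDir(dir string) (map[string]struct{}, error) {
	seen := make(map[string]struct{})
	entries, err := os.ReadDir(dir)
	if err != nil {
		if os.IsNotExist(err) {
			return seen, nil
		}
		return nil, err
	}
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		seen[hashBytes(data)] = struct{}{}
	}
	return seen, nil
}

// markNew reports whether body is new content and records its hash if so.
func markNew(seen map[string]struct{}, body []byte) bool {
	h := hashBytes(body)
	if _, dup := seen[h]; dup {
		return false
	}
	seen[h] = struct{}{}
	return true
}

func main() {
	// hypothetical blog name; the path mirrors ./tumblr/<blogname>/
	seen, err := preHashDir(filepath.Join("tumblr", "some-blog"))
	if err != nil {
		fmt.Println("pre-hash failed:", err)
		return
	}
	fmt.Println("files already known:", len(seen))
	fmt.Println("first sight is new:", markNew(seen, []byte("image bytes")))
	fmt.Println("second sight is new:", markNew(seen, []byte("image bytes")))
}
```

because the set is keyed on content rather than filename, an interrupted run can restart and skip anything whose bytes are already on disk.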

output lands in ./tumblr/<blogname>/

requirements

Go 1.20+
chromium or chrome installed and in PATH
XVFB if running headless on a display-less server

usage

tumble [flags] <blogname>

flag	description
`-v`, `--show-browser`	show the browser window (disables XVFB)
`-h`, `--help`	show help

# headless (default)
tumble some-blog

# with visible browser window
tumble -v some-blog

kill chromium when done:

pkill chromium

build

git clone https://github.com/ibotzhub/tumble
cd tumble
go build -o tumble .

behavior

opens jetblack's original post finder in a headless chromium instance
enters the blog name, submits
on page load: clicks "show more posts", collects image elements, scrolls, downloads
skips 250px thumbnails, fetches 1280px originals
deduplicates by content hash, not filename
if a file exists with matching name but different hash, renames with hash suffix
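
the last three steps can be sketched as below. this is a guess at the shape, not the real implementation. assumptions: the size appears in the url as a `_250.` marker before the extension, the newly downloaded file (not the existing one) gets the suffix, and the suffix is the first 8 hex characters of the hash. `upgradeURL` and `targetName` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// upgradeURL rewrites a 250px thumbnail URL to its 1280px variant.
// ASSUMPTION: the size is encoded as "_250." right before the extension.
// URLs without that marker are returned unchanged.
func upgradeURL(u string) string {
	const thumb = "_250."
	i := strings.LastIndex(u, thumb)
	if i < 0 {
		return u
	}
	return u[:i] + "_1280." + u[i+len(thumb):]
}

// targetName decides where a download should be written.
// existingHash is the content hash of a file already on disk under name,
// or "" if no such file exists. write is false when the content is a duplicate.
func targetName(name, existingHash, newHash string) (out string, write bool) {
	switch {
	case existingHash == "":
		return name, true // no collision
	case existingHash == newHash:
		return name, false // same content, skip
	default:
		// same name, different content: add a hash suffix (hypothetical format)
		ext := filepath.Ext(name)
		base := strings.TrimSuffix(name, ext)
		short := newHash
		if len(short) > 8 {
			short = short[:8]
		}
		return base + "_" + short + ext, true
	}
}

func main() {
	fmt.Println(upgradeURL("https://example.invalid/tumblr_abc_250.jpg"))
	name, write := targetName("tumblr_abc_1280.jpg", "aaaa", "deadbeefcafe0123")
	fmt.Println(name, write)
}
```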

changelog

ibot (this fork)

fixes:

goroutine closure race -- link *string was captured by reference into goroutines. by the time goroutines fired, the pointer pointed to whatever the next loop iteration had written. wrong images downloaded, some skipped entirely. fixed by capturing linkVal := *link before the goroutine.
GetPage error silently swallowed -- err from b.GetPage(...) was immediately overwritten by os.MkdirAll before being checked. a failed browser session would have continued and panicked later. fixed: mkdir and GetPage are now in the correct order with proper error checks.
p.MustWaitOpen() is not a valid method -- MustWaitOpen exists on rod.Browser, not rod.Page. would panic at runtime. removed. MustActivate() (already called immediately after) covers the intent.
err shared across goroutines -- the outer err var was being assigned by multiple concurrent goroutines. data race. all err assignments inside goroutines are now local via :=.
unbounded strings.Split index -- strings.Split(lnk, "tumblr_")[1] would panic on any URL that doesn't contain tumblr_. added bounds check with a warn-and-skip.
empty body hash panic -- resp.Body()[0:len(body)/2] on a zero-length response would panic. added empty body guard.
deferred releases moved before use -- fasthttp.ReleaseRequest/Response defers were placed after the body was already read. moved to immediately after acquire so they release correctly on all paths.
typo Headeres -- renamed to Headers.
dead import git.tcp.direct/kayos/common -- kayos's personal git server is offline. import swapped to the github mirror github.com/yunginnanet/common. module path updated from git.tcp.direct/kayos/tumble to github.com/ibotzhub/tumble.
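
three of the fixes above (copying the link value before the goroutine, keeping err local, and the bounds check on the split) fit in one small sketch. this is illustrative only and not lifted from main.go. `tumblrID` and `collectIDs` are hypothetical names, and the urls are made up.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// tumblrID returns the part of lnk after "tumblr_".
// It checks the split length instead of indexing [1] blindly.
func tumblrID(lnk string) (string, error) {
	parts := strings.Split(lnk, "tumblr_")
	if len(parts) < 2 {
		return "", errors.New("no tumblr_ marker in " + lnk)
	}
	return parts[1], nil
}

// collectIDs processes links concurrently and returns the sorted ids.
func collectIDs(links []*string) []string {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		ids []string
	)
	for _, link := range links {
		if link == nil {
			continue // nil check before dereferencing
		}
		// copy the string now, so the goroutine does not depend on
		// whatever the pointer refers to later
		linkVal := *link
		wg.Add(1)
		go func() {
			defer wg.Done()
			// err is declared with := inside the goroutine, so no
			// outer variable is shared between goroutines
			id, err := tumblrID(linkVal)
			if err != nil {
				fmt.Println("warn, skipping:", err)
				return
			}
			mu.Lock()
			ids = append(ids, id)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(ids)
	return ids
}

func main() {
	a := "https://example.invalid/tumblr_one_1280.jpg"
	b := "https://example.invalid/no-marker.jpg"
	fmt.Println(collectIDs([]*string{&a, nil, &b}))
}
```

running a sketch like this under `go run -race` is a cheap way to confirm that no shared variable is written from more than one goroutine.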

kayos (original)

initial implementation
browser automation via go-rod + stealth
fasthttp download with blake2b deduplication
250px -> 1280px thumbnail upgrade
resume support via pre-hashing existing files
XVFB headless mode

kayos+ibot 5ever <3