tumblr original post scraper by kayos [fixed]

tumble

scrapes a target tumblr blog using jetblack's original post finder and browser automation via rod.

images are deduplicated by blake2b content hash. already-downloaded content is pre-hashed on startup so runs are safely resumable.
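a minimal sketch of the resume/dedup idea, not the actual tumble code: walk the output dir once, hash every file, then skip any download whose body hash is already in the set. sha256 from the standard library stands in for blake2b so this builds without extra modules (blake2b.Sum256 from golang.org/x/crypto/blake2b has the same []byte -> [32]byte shape and could be swapped in). preHash, hashBytes and isDuplicate are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// hashBytes returns a hex content hash. sha256 is a stdlib stand-in for the
// blake2b hash tumble actually uses.
func hashBytes(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// preHash reads every regular file under dir and records its content hash,
// so a resumed run can recognise images saved by an earlier run.
func preHash(dir string) (map[string]bool, error) {
	seen := make(map[string]bool)
	err := filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, etc.
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		seen[hashBytes(data)] = true
		return nil
	})
	return seen, err
}

// isDuplicate reports whether body's content is already on disk.
func isDuplicate(seen map[string]bool, body []byte) bool {
	return seen[hashBytes(body)]
}

func main() {
	dir, err := os.MkdirTemp("", "tumble-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// pretend an earlier run already saved this image
	if err := os.WriteFile(filepath.Join(dir, "a.jpg"), []byte("image-a"), 0o644); err != nil {
		panic(err)
	}

	seen, err := preHash(dir)
	if err != nil {
		panic(err)
	}
	fmt.Println(isDuplicate(seen, []byte("image-a"))) // true: skip
	fmt.Println(isDuplicate(seen, []byte("image-b"))) // false: download
}
```

keying on content rather than filename is what makes a rerun safe: an image saved under a different name on a previous run is still recognised.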

output lands in ./tumblr/<blogname>/


requirements

  • Go 1.20+
  • chromium or chrome installed and in PATH
  • XVFB if running headless on a display-less server

usage

tumble [flags] <blogname>
flag                 description
-v, --show-browser   show the browser window (disables XVFB)
-h, --help           show help

# headless (default)
tumble some-blog

# with visible browser window
tumble -v some-blog

kill any leftover chromium processes when done:

pkill chromium

build

git clone https://github.com/ibotzhub/tumble
cd tumble
go build -o tumble .

behavior

  • opens jetblack's original post finder in a headless chromium instance
  • enters the blog name, submits
  • on page load: clicks "show more posts", collects image elements, scrolls, downloads
  • upgrades 250px thumbnail URLs to their 1280px originals instead of downloading the thumbnails
  • deduplicates by content hash, not filename
  • if a file exists with matching name but different hash, renames with hash suffix
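a sketch of the last two bullets, under loudly stated assumptions: upgradeThumb assumes the size is encoded as a _250. suffix right before the extension (the real URL rewrite may differ), and collisionName assumes the new file is the one that gets an 8-character hash suffix while the existing file keeps its name. both functions and the name-to-hash map are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// upgradeThumb rewrites a 250px thumbnail URL to its 1280px variant.
// ASSUMPTION: the size is encoded as "_250." right before the extension,
// e.g. ".../tumblr_abc123_250.jpg". only the last occurrence is replaced.
func upgradeThumb(u string) (string, bool) {
	i := strings.LastIndex(u, "_250.")
	if i < 0 {
		return u, false // not a 250px thumbnail; leave untouched
	}
	return u[:i] + "_1280." + u[i+len("_250."):], true
}

// collisionName decides where to save content with the given hash.
// onDisk maps existing filenames to their content hash (hypothetical bookkeeping).
//   - name free:                 save under name
//   - same name, same hash:      skip (already have it)
//   - same name, different hash: save as name_<hash prefix>.ext
// ASSUMPTION: the new file gets the suffix; the existing file is left alone.
func collisionName(name, hash string, onDisk map[string]string) (string, bool) {
	old, ok := onDisk[name]
	if !ok {
		return name, true
	}
	if old == hash {
		return "", false
	}
	short := hash
	if len(short) > 8 {
		short = short[:8]
	}
	ext := filepath.Ext(name)
	return strings.TrimSuffix(name, ext) + "_" + short + ext, true
}

func main() {
	u, ok := upgradeThumb("https://example.com/tumblr_abc123_250.jpg")
	fmt.Println(u, ok) // https://example.com/tumblr_abc123_1280.jpg true

	saved := map[string]string{"tumblr_abc123_1280.jpg": "aaaaaaaaaaaa"}
	name, save := collisionName("tumblr_abc123_1280.jpg", "bbbbbbbbbbbb", saved)
	fmt.Println(name, save) // tumblr_abc123_1280_bbbbbbbb.jpg true
}
```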

changelog

ibot (this fork)

fixes:

  • goroutine closure race -- link *string was captured by reference into goroutines. by the time the goroutines ran, the string it pointed to had already been overwritten by a later loop iteration, so wrong images were downloaded and some were skipped entirely. fixed by copying linkVal := *link before launching each goroutine.
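a minimal sketch of the fixed pattern, with hypothetical names: one *string is reused across iterations (standing in for the shared link pointer), and each goroutine gets linkVal, a copy taken before go runs. fetch is a placeholder for the real download.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchAll launches one goroutine per link and collects fetch results.
// link is deliberately one *string reused across iterations, mimicking the
// shared pointer from the bug; linkVal is the per-goroutine copy (the fix).
func fetchAll(links []string, fetch func(string) string) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []string
	)
	link := new(string)
	for _, l := range links {
		*link = l
		linkVal := *link // copy the value now; later writes to *link can't reach it
		wg.Add(1)
		go func() {
			defer wg.Done()
			r := fetch(linkVal)
			mu.Lock()
			out = append(out, r)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(out) // goroutines finish in any order
	return out
}

func main() {
	res := fetchAll([]string{"c", "a", "b"}, func(s string) string { return "img:" + s })
	fmt.Println(res) // [img:a img:b img:c]
}
```

without the linkVal copy, a goroutine that reads *link after the loop has moved on sees a later URL, which is the wrong-image/skipped-image symptom described above.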

  • GetPage error silently swallowed -- err from b.GetPage(...) was immediately overwritten by os.MkdirAll before being checked. a failed browser session would have continued and panicked later. fixed: mkdir and GetPage are now in the correct order with proper error checks.

  • p.MustWaitOpen() is not a valid method -- MustWaitOpen exists on rod.Browser, not rod.Page. would panic at runtime. removed. MustActivate() (already called immediately after) covers the intent.

  • err shared across goroutines -- the outer err var was being assigned by multiple concurrent goroutines. data race. all err assignments inside goroutines are now local via :=.

  • unbounded strings.Split index -- strings.Split(lnk, "tumblr_")[1] would panic on any URL that doesn't contain tumblr_. added bounds check with a warn-and-skip.
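the warn-and-skip guard, sketched as a small helper (mediaID is a hypothetical name): strings.Split returns a one-element slice when the separator is absent, so [1] has to be bounds-checked first.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// mediaID returns the text after "tumblr_" (up to any second "tumblr_"),
// and false when the URL has no "tumblr_" segment at all.
func mediaID(lnk string) (string, bool) {
	parts := strings.Split(lnk, "tumblr_")
	if len(parts) < 2 {
		return "", false // indexing parts[1] here would panic
	}
	return parts[1], true
}

func main() {
	for _, lnk := range []string{
		"https://example.com/tumblr_abc123_1280.jpg",
		"https://example.com/avatar.png",
	} {
		id, ok := mediaID(lnk)
		if !ok {
			log.Printf("warn: no tumblr_ segment, skipping %s", lnk)
			continue
		}
		fmt.Println(id) // abc123_1280.jpg
	}
}
```

strings.Cut(lnk, "tumblr_") (Go 1.18+) gives the same found/not-found signal in one call, though it returns everything after the first tumblr_ rather than only the text up to the next one.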

  • empty body hash panic -- resp.Body()[0:len(body)/2] on a zero-length response would panic. added empty body guard.

  • deferred releases moved before use -- fasthttp.ReleaseRequest/Response defers were placed after the body was already read. moved to immediately after acquire so they release correctly on all paths.

  • typo Headeres -- renamed to Headers.

  • dead import git.tcp.direct/kayos/common -- kayos's personal git server is offline. import swapped to the github mirror github.com/yunginnanet/common. module path updated from git.tcp.direct/kayos/tumble to github.com/ibotzhub/tumble.


kayos (original)

  • initial implementation
  • browser automation via go-rod + stealth
  • fasthttp download with blake2b deduplication
  • 250px -> 1280px thumbnail upgrade
  • resume support via pre-hashing existing files
  • XVFB headless mode

kayos+ibot 5ever <3