tumble
scrapes a target tumblr blog using jetblack's original post finder and browser automation via rod.
images are deduplicated by blake2b content hash. already-downloaded content is pre-hashed on startup so runs are safely resumable.
output lands in ./tumblr/<blogname>/
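a minimal sketch of the dedup-and-resume idea described above, not tumble's actual code. it uses `crypto/sha256` from the standard library as a stand-in for blake2b so it runs without extra modules, and the names `seenHashes`, `claim`, `prehash` and `saveIfNew` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// seenHashes maps a hex content digest to the first path that produced it.
type seenHashes map[string]string

// digest returns the hex digest of data.
// sha256 is a stand-in here; the README says tumble itself uses blake2b.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// claim records the digest of data and reports whether the content was new.
func (s seenHashes) claim(data []byte, path string) bool {
	sum := digest(data)
	if _, dup := s[sum]; dup {
		return false
	}
	s[sum] = path
	return true
}

// prehash records the digest of every regular file already in dir,
// so a later run can skip content it has downloaded before.
func prehash(dir string) (seenHashes, error) {
	seen := make(seenHashes)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		path := filepath.Join(dir, e.Name())
		data, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		seen.claim(data, path)
	}
	return seen, nil
}

// saveIfNew writes data to dir/name unless identical content was already seen.
func saveIfNew(seen seenHashes, dir, name string, data []byte) (bool, error) {
	path := filepath.Join(dir, name)
	if !seen.claim(data, path) {
		return false, nil
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		delete(seen, digest(data)) // forget the digest so a retry can write it
		return false, err
	}
	return true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "tumble-demo")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	// first "run": nothing on disk yet, so the image is written.
	seen, err := prehash(dir)
	if err != nil {
		fmt.Println("prehash:", err)
		return
	}
	wrote, err := saveIfNew(seen, dir, "a.jpg", []byte("pixels"))
	fmt.Println(wrote, err) // true <nil>

	// second "run": prehash finds a.jpg, so the same bytes under a new name are skipped.
	seen, err = prehash(dir)
	if err != nil {
		fmt.Println("prehash:", err)
		return
	}
	wrote, err = saveIfNew(seen, dir, "b.jpg", []byte("pixels"))
	fmt.Println(wrote, err) // false <nil>
}
```

keying on the digest rather than the filename is what makes a restart safe: a file that is already on disk is recognized by its bytes even if the blog serves it under a different name.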
requirements
- Go 1.20+
- chromium or chrome installed and in PATH
- XVFB if running headless on a display-less server
usage
tumble [flags] <blogname>
| flag | description |
|---|---|
| -v, --show-browser | show the browser window (disables XVFB) |
| -h, --help | show help |
# headless (default)
tumble some-blog
# with visible browser window
tumble -v some-blog
kill chromium when done:
pkill chromium
build
git clone https://github.com/ibotzhub/tumble
cd tumble
go build -o tumble .
behavior
- opens jetblack's original post finder in a headless chromium instance
- enters the blog name, submits
- on page load: clicks "show more posts", collects image elements, scrolls, downloads
- skips 250px thumbnails, fetches 1280px originals
- deduplicates by content hash, not filename
- if a file exists with matching name but different hash, renames with hash suffix
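the last three steps can be sketched roughly as below. this is an illustration under assumptions, not the tool's code: the `_250.` / `_1280.` size markers are a guess at the URL shape, the 8-character suffix length is arbitrary, and `upgradeURL`, `suffixedName` and `targetName` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// upgradeURL rewrites a 250px thumbnail URL to its 1280px counterpart.
// The "_250." and "_1280." markers are an assumption about the URL shape.
func upgradeURL(link string) string {
	const thumb = "_250."
	i := strings.LastIndex(link, thumb)
	if i < 0 {
		return link // not a thumbnail URL, leave it alone
	}
	return link[:i] + "_1280." + link[i+len(thumb):]
}

// suffixedName inserts a short hash before the extension,
// e.g. "cat.jpg" -> "cat_deadbeef.jpg".
func suffixedName(name, hexHash string) string {
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	short := hexHash
	if len(short) > 8 {
		short = short[:8]
	}
	return base + "_" + short + ext
}

// targetName decides where new content should go.
// existingHash is the hash of the file already stored under name, or "" if none.
// It returns the name to write and whether a write is needed at all.
func targetName(name, existingHash, newHash string) (string, bool) {
	switch existingHash {
	case "":
		return name, true // no file with this name yet
	case newHash:
		return name, false // same content, nothing to do
	default:
		return suffixedName(name, newHash), true // same name, different content
	}
}

func main() {
	fmt.Println(upgradeURL("https://example.invalid/tumblr_abc_250.jpg"))
	// https://example.invalid/tumblr_abc_1280.jpg

	name, write := targetName("cat.jpg", "0011223344", "deadbeefcafe")
	fmt.Println(name, write) // cat_deadbeef.jpg true

	name, write = targetName("cat.jpg", "deadbeefcafe", "deadbeefcafe")
	fmt.Println(name, write) // cat.jpg false
}
```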
changelog
ibot (this fork)
fixes:
- goroutine closure race -- `link *string` was captured by reference into goroutines. by the time goroutines fired, the pointer pointed to whatever the next loop iteration had written. wrong images downloaded, some skipped entirely. fixed by capturing `linkVal := *link` before the goroutine.
- GetPage error silently swallowed -- `err` from `b.GetPage(...)` was immediately overwritten by `os.MkdirAll` before being checked. a failed browser session would have continued and panicked later. fixed: mkdir and GetPage are now in the correct order with proper error checks.
- `p.MustWaitOpen()` is not a valid method -- `MustWaitOpen` exists on `rod.Browser`, not `rod.Page`. would panic at runtime. removed. `MustActivate()` (already called immediately after) covers the intent.
- `err` shared across goroutines -- the outer `err` var was being assigned by multiple concurrent goroutines. data race. all `err` assignments inside goroutines are now local via `:=`.
- unbounded `strings.Split` index -- `strings.Split(lnk, "tumblr_")[1]` would panic on any URL that doesn't contain `tumblr_`. added bounds check with a warn-and-skip.
- empty body hash panic -- `resp.Body()[0:len(body)/2]` on a zero-length response would panic. added empty body guard.
- deferred releases moved before use -- `fasthttp.ReleaseRequest/Response` defers were placed after the body was already read. moved to immediately after acquire so they release correctly on all paths.
- typo `Headeres` -- renamed to `Headers`.
- dead import `git.tcp.direct/kayos/common` -- kayos's personal git server is offline. import swapped to the github mirror `github.com/yunginnanet/common`. module path updated from `git.tcp.direct/kayos/tumble` to `github.com/ibotzhub/tumble`.
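three of the fixes above (value capture before the goroutine, goroutine-local `err`, and the bounds-checked split) follow a common pattern, sketched here under assumptions. this is not the code from `main.go`: `fileKey` and `collect` are hypothetical names, the URLs are placeholders, and the reused `link` pointer only imitates the situation the changelog describes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fileKey returns the part of link after "tumblr_", or an error when the
// marker is missing, instead of indexing [1] unconditionally.
func fileKey(link string) (string, error) {
	parts := strings.Split(link, "tumblr_")
	if len(parts) < 2 {
		return "", fmt.Errorf("no tumblr_ marker in %q", link)
	}
	return parts[1], nil
}

// collect starts one goroutine per URL and gathers the file keys.
func collect(urls []string) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		keys []string
	)
	link := new(string) // one pointer, rewritten on every iteration
	for _, u := range urls {
		*link = u
		linkVal := *link // copy the value now; the goroutine never touches link
		wg.Add(1)
		go func() {
			defer wg.Done()
			// err is declared with := so it is local to this goroutine.
			key, err := fileKey(linkVal)
			if err != nil {
				return // the real tool warns and skips here
			}
			mu.Lock()
			keys = append(keys, key)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(keys) // goroutine order is not deterministic
	return keys
}

func main() {
	fmt.Println(collect([]string{
		"https://example.invalid/tumblr_a.jpg",
		"https://example.invalid/nope.png",
		"https://example.invalid/tumblr_b.jpg",
	}))
	// [a.jpg b.jpg]
}
```

if the goroutine dereferenced `link` itself, the loop could overwrite the string before the goroutine ran, which matches the wrong-image and skipped-image symptoms described in the first fix.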
kayos (original)
- initial implementation
- browser automation via go-rod + stealth
- fasthttp download with blake2b deduplication
- 250px -> 1280px thumbnail upgrade
- resume support via pre-hashing existing files
- XVFB headless mode
kayos+ibot 5ever <3