Konachan.com is one of the famous anime/game/CG imageboards, devoted mostly to wallpapers and landscape images.
It has a good image selection and a strong community that scores images adequately.
Its verbose tagging system is very close to danbooru/safebooru standards.
Konachan also offers a wide variety of picture composition and quality, from almost-empty or
background-dominated wallpapers to clear art, from schematic line art to pure full-color digital.
That makes Konachan a good source for investigating non-photographic images and their metadata
and for building tools to auto-classify all of that (or simply for making your eyes happy).
**This release covers the interval from the start up to ID 310.100 (04.07.2020) and contains:**
- **142.756 images of "sample" quality, or "original" files where they were smaller than the samples**
there were two sampling policies:
* max width = 2000 px up to ID=91817 (30.12.2010), when a large share of images got samples
* max width = 1500 px since then, when almost all images get samples
because most samples have a luxurious (99-100%) JPEG quality, I recompressed them (with ImageMagick mogrify) to 92%
- **full JSON** metadata for all **246.421 posts** except those that failed to download (deleted posts etc.)
* with a simple Python script showing how to grab it
* with a pretty-printed example to illustrate structure and content
- additional TSV (tab-separated text) metadata
* key parameters of the 143.921 images initially grabbed, including some calculated stats
~ derived from JSON
~ computed with ImageMagick over the above-mentioned "samples"
* tag-to-post relations (2.736.167) as a separate table
~ non-ASCII characters and characters unsuitable for file names replaced or suppressed
~ used for file renaming wherever possible
- some database (Oracle SQL), shell (Windows BAT) and Python scripts
* data structure definitions
* key processing steps in the database, some query examples
* tools for computing
* not completely “ready to use”, but the key “building blocks”
- more detailed readme for DATA
- BONUS: an example of using the data to reveal the "mogrify effect" of changing JPEG quality from 99-100% to 92%
* mogrify parameters based on [research](https://www.smashingmagazine.com/2015/06/efficient-image-resizing-with-imagemagick/)
* 99.759 images affected, total size went down from 92 to 35 GBytes (not bad, is it?)
* only a few specific images [e.g. ID=120404](https://konachan.com/post/show/120404/blonde_hair-seeu-twintails-vocaloid) got visible artifacts
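
The two sampling policies above can be sketched in a few lines of Python. This is only an illustration, not one of the scripts from the release: the function names are made up, the ID boundary is assumed to be inclusive, and only the width is checked because the notes mention only a maximum width.

```python
# Sketch of the two sampling policies described in the release notes.
# Assumptions: ID 91817 itself still belongs to the old policy, and an
# original gets a sample only when its width exceeds the limit.

OLD_POLICY_LAST_ID = 91817  # last post under the 2000 px policy (30.12.2010)
OLD_MAX_WIDTH = 2000
NEW_MAX_WIDTH = 1500


def sample_max_width(post_id: int) -> int:
    """Return the maximum sample width assumed to apply to this post ID."""
    return OLD_MAX_WIDTH if post_id <= OLD_POLICY_LAST_ID else NEW_MAX_WIDTH


def expect_sample(post_id: int, image_width: int) -> bool:
    """True if the original is wider than the limit, so a sample is expected."""
    return image_width > sample_max_width(post_id)


if __name__ == "__main__":
    # The same 1920 px wide image falls on different sides of the two limits.
    for post_id in (50000, 200000):
        print(post_id, sample_max_width(post_id), expect_sample(post_id, 1920))
```

With such a helper it is easy to tell, for any row of the TSV metadata, whether the grabbed file should be a sample or the original.
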
**The release includes "samples" only for posts with "good enough" images, worth getting the originals for:**
- file_ext in ('jpg','png')
- greatest(image_height,image_width)>=1200 and least(image_height,image_width)>=1000 -- not too small
and image_height * image_width>=1310720 -- (1280*1024)
and image_width / image_height between 0.4 and 2.1 -- not too disproportional
- **rating in ('s','q')** in separate folders/zips
* some (272) explicit-like samples excluded from 'questionable'
- grabbed files renamed to contain **ID - up_to_3_copyrights ~ up_to_5_characters (up_to_2_artists)**
* tags concatenated via “+”, spaces replaced with underscores
* maximum file name length is 220 characters; character tags may be truncated if too long
* this enables file system search and sampling (with XCOPY, UNZIP etc.)
- some gentle deduplication done
* some (893) images were thrown out, preferring the 's' rating and newer posts
* only where there was no visible artistic difference, just possible technical issues
* a lot of (~2000?) similar images are left
- no filter applied by score and/or tags
* the initial idea was to include only "the best of" and exclude "banned tags"
* the border of "acceptable quality" turned out to be fuzzy
* user score vs tags vs metadata remains a field for research
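
The selection conditions and the renaming scheme listed above can be expressed roughly like this in Python. It is a minimal sketch, not the release's own Oracle SQL or scripts: the field names are taken from the conditions in the list, while the helper names and the truncation strategy (dropping character tags from the end until the name fits) are my assumptions.

```python
# Sketch of the "good enough" filter and the file renaming scheme.
# Helper names and the truncation strategy are assumptions for illustration.

MAX_NAME_LEN = 220  # maximum file name length stated in the release notes


def is_good_enough(post: dict) -> bool:
    """Apply the format, size, proportion and rating conditions to one post."""
    w, h = post["image_width"], post["image_height"]
    return (
        post["file_ext"] in ("jpg", "png")
        and max(h, w) >= 1200
        and min(h, w) >= 1000           # not too small
        and h * w >= 1310720            # 1280 * 1024
        and 0.4 <= w / h <= 2.1         # not too disproportional
        and post["rating"] in ("s", "q")
    )


def build_file_name(post_id, copyrights, characters, artists, ext):
    """Build 'ID - copyrights ~ characters (artists).ext'."""
    def join(tags, limit):
        # Tags are concatenated via "+", spaces replaced with underscores.
        return "+".join(t.replace(" ", "_") for t in tags[:limit])

    chars = list(characters[:5])
    while True:
        parts = [str(post_id)]
        if copyrights:
            parts.append("- " + join(copyrights, 3))
        if chars:
            parts.append("~ " + join(chars, 5))
        if artists:
            parts.append("(" + join(artists, 2) + ")")
        name = " ".join(parts) + "." + ext
        # Stop when the name fits or no character tags are left to drop.
        if len(name) <= MAX_NAME_LEN or not chars:
            return name
        chars.pop()  # assumed truncation: drop the last character tag


if __name__ == "__main__":
    post = {"file_ext": "jpg", "image_width": 1920,
            "image_height": 1200, "rating": "s"}
    print(is_good_enough(post))
    # The artist tag here is a placeholder, not real metadata.
    print(build_file_name(120404, ["vocaloid"], ["seeu"], ["some_artist"], "jpg"))
```

The order of checks in `is_good_enough` matters a little: the size tests run before the width/height ratio, so a zero height never reaches the division.
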
Images are archived in groups of 10.000 IDs, named NNxxxx.[Safe/Questionable][Files/Samples], where NN=00..30.
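
To find the archive that should hold a given post, the naming pattern can be sketched as below. This is a guess at the pattern rather than a documented rule: I assume the two bracketed parts are concatenated without a separator, and the function name is hypothetical. The notes list NN=00..30, so how the few IDs from 310.000 upwards are grouped is not covered here.

```python
# Sketch of the archive naming pattern NNxxxx.[Safe/Questionable][Files/Samples].
# Assumption: the rating word and the kind word are joined without a separator.

RATING_NAMES = {"s": "Safe", "q": "Questionable"}
KINDS = ("Files", "Samples")


def archive_name(post_id: int, rating: str, kind: str) -> str:
    """Return the archive base name (without extension) for one post."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}")
    group = post_id // 10000  # 10.000 IDs per archive
    return f"{group:02d}xxxx.{RATING_NAMES[rating]}{kind}"


if __name__ == "__main__":
    print(archive_name(120404, "s", "Samples"))
```
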
I recommend using FastStone MaxView to browse images inside the zips.
[HERE](https://sukebei.nyaa.si/view/3219520) is a release for yande.re created the same way.
[THERE ARE](https://nyaa.si/user/AlexPUA) some rips on the Nyaa tracker for Safebooru and Zerochan. No nipples there.