konachan 2020 samples and metadata

Category:
Date:
2020-12-19 04:52 UTC
Submitter:
Seeders:
2
Information:
No information.
Leechers:
0
File size:
53.8 GiB
Completed:
144
Info hash:
b632ee61ca85150d7a1207acae53ad9ae04b0811
Konachan.com is one of the best-known anime/game/CG imageboards, devoted mostly to wallpapers and landscape images. It has a good image selection and a strong community that scores images adequately. Its verbose tagging system is very close to the Danbooru/Safebooru standards. Konachan also offers a wide variety of picture composition and quality: from almost-empty or background-dominated wallpapers to clear art, and from schematic line art to pure full-color digital work. That's why Konachan is a good source for investigating non-photographic images and their metadata, in order to build tools that auto-classify all of that (or simply to make your eyes happy).

**This release covers the interval from the start up to post ID 310,100 (2020-07-04) and contains:**

- **142,756 images, either "samples" or the "original" files when those were smaller than the samples.** There were two sampling policies:
  * max width = 2000 px up to ID=91817 (2010-12-30), when a large share of images were downsampled
  * max width = 1500 px from then on, when almost all images are downsampled
  * because most samples were saved at a luxurious 99-100% JPEG quality, I recompressed them to 92% (with ImageMagick mogrify)
- **Full JSON** metadata for all **246,421 posts**, except those that could not be grabbed (deleted posts, etc.)
  * with a simple Python script showing how to grab it
  * with a pretty-printed example illustrating the structure and content
- Additional TSV (tab-separated text) metadata
  * key parameters of the 143,921 images initially grabbed, including some calculated stats
    + derived from the JSON
    + computed with ImageMagick over the above-mentioned "samples"
  * tag-to-post relations (2,736,167) as a separate table
    + non-ASCII symbols and symbols not suitable for file names replaced or suppressed
    + used for file renaming wherever possible
- Some database (Oracle SQL), shell (Windows BAT) and Python scripts
  * data structure definitions
  * key processing steps in the database, plus some query examples
  * tools for computing
  * not completely "ready to use", but the key "building blocks"
- A more detailed readme for DATA
- BONUS: an example of usage, discovering the "mogrify effect" of changing JPEG quality from 99-100% to 92%
  * mogrify parameters based on [research](https://www.smashingmagazine.com/2015/06/efficient-image-resizing-with-imagemagick/)
  * 99,759 images affected; total size shrank from 92 to 35 GB (not bad, is it?)
  * only a few specific images ([e.g. ID=120404](https://konachan.com/post/show/120404/blonde_hair-seeu-twintails-vocaloid)) got visible artifacts

**The release includes "samples" only for posts with "good enough" images, i.e. ones worth getting the originals of:**

- `file_ext in ('jpg','png')`
- `greatest(image_height,image_width)>=1200 and least(image_height,image_width)>=1000` (not too small)
  and `image_height * image_width>=1310720` (i.e. 1280*1024)
  and `image_width / image_height between 0.4 and 2.1` (not too disproportionate)
- **`rating in ('s','q')`**, kept in separate folders/zips
  * some (272) explicit-like samples excluded from "questionable"
- Grabbed files renamed to contain **ID - up_to_3_copyrights ~ up_to_5_characters (up_to_2_artists)**
  * tags concatenated with "+", spaces replaced with underscores
  * maximum file name length is 220 characters; character tags may be truncated if too long
  * this enables file-system search and sampling (with XCOPY, UNZIP, etc.)
- Some gentle deduplication
  * 893 images thrown out, preferring the 's' rating and newer posts
  * only where there was no visible artistic difference (at most technical issues)
  * a lot of (~2000?) similar images remain
- No filter applied by score and/or tags
  * the initial idea was to include only "the best of" and exclude "banned tags"
  * the border of "acceptable quality" turned out to be fuzzy
  * user score vs. tags vs. metadata will be a field of research

Images are archived in groups of 10,000 IDs as `NNxxxx.XY.zip`, where NN = 00..30, X = s (Safe) or q (Questionable), and Y = f (Files) or s (Samples).

I recommend FastStone MaxView for browsing images inside the zips.
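The two sampling policies can be expressed as a small rule that decides whether a post's original or its sample was grabbed. This is a sketch under assumptions: it treats ID 91817 as the last post under the 2000 px limit, applies the limit to width only (as the notes say), and the names `sample_max_width` and `pick_download` are hypothetical, not taken from the release's scripts.

```python
# Hypothetical sketch of the two Konachan sampling policies described above.
# Assumption: ID 91817 is the last post sampled at 2000 px; the limit applies to width.
POLICY_SWITCH_ID = 91817


def sample_max_width(post_id: int) -> int:
    """Return the maximum sample width assumed to be in effect for a post ID."""
    return 2000 if post_id <= POLICY_SWITCH_ID else 1500


def pick_download(post_id: int, width: int) -> str:
    """Grab the original when it is no wider than the sample limit, else the sample."""
    return "original" if width <= sample_max_width(post_id) else "sample"
```

For example, a 1920 px wide image posted before the switch would be grabbed as an original, while the same width after the switch would come from the sample.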
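The TSV key parameters are described as derived from the JSON. A minimal sketch of such a flattening, assuming Moebooru-style field names (`id`, `rating`, `score`, `width`, `height`, `file_size`, `md5`, `file_url`, `tags`); check them against `kona_json_pretty.json`, since the actual column set of `kona_posts.tsv` is not documented here. `post_to_tsv_row` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed Moebooru-style field names; verify against kona_json_pretty.json.
FIELDS = ("id", "rating", "score", "width", "height", "file_size", "md5")


def post_to_tsv_row(post: dict) -> str:
    """Flatten one decoded post into a tab-separated row of key parameters."""
    values = [str(post.get(name, "")) for name in FIELDS]
    # Derive file_ext from the URL path suffix, e.g. ".../x.jpg" -> "jpg".
    suffix = PurePosixPath(urlparse(post.get("file_url", "")).path).suffix
    values.append(suffix.lstrip(".").lower())
    # Tags are a single space-separated string; store their count.
    values.append(str(len(post.get("tags", "").split())))
    return "\t".join(values)
```

Feed it one decoded post at a time, e.g. `post_to_tsv_row(json.loads(text))`.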
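The tag-to-post table and the replacement of non-ASCII or file-name-unsafe symbols might look roughly like this. The exact rules the author used are not stated; this sketch assumes tags are space-separated (as in Moebooru JSON), transliterates accented letters, drops other non-ASCII characters, and replaces the characters Windows forbids in file names with underscores. `sanitize_tag` and `tag_relations` are hypothetical names.

```python
import re
import unicodedata

# Characters Windows forbids in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_tag(tag: str) -> str:
    """Assumed cleanup: strip accents, drop other non-ASCII, replace unsafe chars."""
    decomposed = unicodedata.normalize("NFKD", tag)
    ascii_tag = decomposed.encode("ascii", "ignore").decode("ascii")
    return _UNSAFE.sub("_", ascii_tag)


def tag_relations(post_id: int, tags: str) -> list[tuple[int, str, str]]:
    """Explode a space-separated tag string into (post_id, tag, safe_tag) rows."""
    return [(post_id, tag, sanitize_tag(tag)) for tag in tags.split()]
```

Keeping both the raw and the sanitized tag in each row lets the relation table serve searches on real tag names while the safe form is used for renaming.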
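A sketch of the recompression step that re-encodes only JPEGs saved above 92% quality. It is not a copy of `#mogrify.bat`: the release's real parameters follow the linked research, while this version uses only the standard `-path` and `-quality` options. It assumes ImageMagick's `identify` and `mogrify` are on the PATH (ImageMagick 7 may need `magick identify` / `magick mogrify` instead); the function names are hypothetical.

```python
import subprocess
from pathlib import Path

TARGET_QUALITY = 92  # target JPEG quality from the release notes


def jpeg_quality(path: Path) -> int:
    """Ask ImageMagick's identify for the estimated JPEG quality (%Q escape)."""
    result = subprocess.run(["identify", "-format", "%Q", str(path)],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def needs_recompress(quality: int, target: int = TARGET_QUALITY) -> bool:
    """Only re-encode files saved at a higher quality than the target."""
    return quality > target


def mogrify_cmd(out_dir: Path, files: list[Path], target: int = TARGET_QUALITY) -> list[str]:
    """Build a minimal mogrify call that writes recompressed copies into out_dir."""
    return ["mogrify", "-path", str(out_dir), "-quality", str(target), *map(str, files)]
```

For example: `files = [f for f in Path("samples").glob("*.jpg") if needs_recompress(jpeg_quality(f))]`, then `subprocess.run(mogrify_cmd(Path("out"), files), check=True)`. Writing to a separate directory with `-path` keeps the originals available for before/after size comparisons.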
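The selection conditions can be restated as a Python predicate, handy for re-applying them to the JSON or TSV metadata. Parameter names follow the SQL (`file_ext`, `image_width`, `image_height`, `rating`); `is_good_enough` is a hypothetical name. SQL `BETWEEN` is inclusive, hence `<=` on both ends of the aspect-ratio check.

```python
def is_good_enough(file_ext: str, image_width: int, image_height: int, rating: str) -> bool:
    """Python restatement of the release's SQL selection criteria."""
    if file_ext not in ("jpg", "png"):
        return False
    if rating not in ("s", "q"):
        return False
    # Not too small: long side >= 1200 px, short side >= 1000 px.
    if max(image_width, image_height) < 1200 or min(image_width, image_height) < 1000:
        return False
    # At least 1280*1024 = 1,310,720 pixels.
    if image_width * image_height < 1280 * 1024:
        return False
    # Not too disproportionate (BETWEEN is inclusive).
    return 0.4 <= image_width / image_height <= 2.1
```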
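A hypothetical reconstruction of the renaming pattern `ID - copyrights ~ characters (artists)`. It assumes tags are already grouped by type, applies the 3/5/2 limits and the 220-character cap, and truncates only the character part, as the notes describe. How the real scripts handle an empty character or artist group is not stated, so dropping that part here is an assumption.

```python
MAX_NAME = 220  # maximum file name length stated in the release notes


def build_name(post_id: int, copyrights: list[str], characters: list[str],
               artists: list[str], ext: str, max_len: int = MAX_NAME) -> str:
    """Build 'ID - copyrights ~ characters (artists).ext' within max_len characters."""
    def join(tags: list[str], limit: int) -> str:
        # Keep at most `limit` tags, underscores for spaces, joined with "+".
        return "+".join(t.replace(" ", "_") for t in tags[:limit])

    head = f"{post_id} - {join(copyrights, 3)}"
    tail = (f" ({join(artists, 2)})" if artists else "") + f".{ext}"
    sep = " ~ "
    # Only the character part is truncated when the name would be too long.
    room = max_len - len(head) - len(sep) - len(tail)
    chars = join(characters, 5)[:max(room, 0)]
    return f"{head}{sep}{chars}{tail}" if chars else f"{head}{tail}"
```

For example, `build_name(1, ["vocaloid"], ["hatsune_miku"], ["someone"], "jpg")` gives `1 - vocaloid ~ hatsune_miku (someone).jpg`, which a plain wildcard such as `*hatsune_miku*` can then match in XCOPY or UNZIP.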
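The deduplication preference (keep the 's'-rated and newer post) can be sketched as a sort key. It assumes groups of near-duplicates have already been found (the notes do not say how) and that the rating takes priority over recency; `pick_keeper` and `ids_to_drop` are hypothetical names.

```python
def pick_keeper(group: list[dict]) -> dict:
    """Among near-duplicate posts, prefer rating 's', then the newest (highest ID)."""
    return max(group, key=lambda post: (post["rating"] == "s", post["id"]))


def ids_to_drop(group: list[dict]) -> list[int]:
    """Return the IDs of every post in the group except the keeper."""
    keeper = pick_keeper(group)
    return [post["id"] for post in group if post is not keeper]
```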
[HERE](https://sukebei.nyaa.si/view/3219520) is a yande.re release created the same way. [THERE ARE](https://nyaa.si/user/AlexPUA) also some rips of Safebooru and Zerochan on the Nyaa tracker. No nipples there.

File list

  • Konachan_2020S
    • DATA
      • #DATA_readme.txt (4.7 KiB)
      • KONA_JSON.tsv (433.8 MiB)
      • kona_json_pretty.json (3.5 KiB)
      • kona_mogrify.tsv (14.5 MiB)
      • kona_posts.tsv (159.1 MiB)
      • kona_rip.tsv (108.7 MiB)
      • kona_tags.tsv (57.4 MiB)
    • TOOLS
      • #IM__K.bat (666 Bytes)
      • #IM_looY.bat (586 Bytes)
      • #kona_im.bat (71 Bytes)
      • #kona_im.ctl (339 Bytes)
      • #kona_load.bat (154 Bytes)
      • #kona_load.ctl (120 Bytes)
      • #list.bat (294 Bytes)
      • #load_exif.bat (157 Bytes)
      • #load_exif.ctl (295 Bytes)
      • #mogrify.bat (229 Bytes)
      • _kona_sql_DDL.sql (3.8 KiB)
      • _kona_sql_load.sql (5.7 KiB)
      • _kona_sql_out.bat (39 Bytes)
      • _kona_sql_out.sql (5.7 KiB)
      • _kona_sql_rip.sql (3.2 KiB)
      • aria.bat (161 Bytes)
      • aria_urls.lst (64.4 KiB)
      • grab.bat (99 Bytes)
      • grab_files_DB.py (1.3 KiB)
      • grab_json.py (1.2 KiB)
      • kona_list.lst (4.3 KiB)
    • 00xxxx.qf.zip (155.9 MiB)
    • 00xxxx.qs.zip (48.4 MiB)
    • 00xxxx.sf.zip (1.6 GiB)
    • 00xxxx.ss.zip (353.6 MiB)
    • 01xxxx.qf.zip (191.8 MiB)
    • 01xxxx.qs.zip (26.3 MiB)
    • 01xxxx.sf.zip (1.2 GiB)
    • 01xxxx.ss.zip (192.7 MiB)
    • 02xxxx.qf.zip (264.8 MiB)
    • 02xxxx.qs.zip (141.3 MiB)
    • 02xxxx.sf.zip (1.5 GiB)
    • 02xxxx.ss.zip (334.1 MiB)
    • 03xxxx.qf.zip (718.3 MiB)
    • 03xxxx.qs.zip (335.3 MiB)
    • 03xxxx.sf.zip (1.3 GiB)
    • 03xxxx.ss.zip (434.0 MiB)
    • 04xxxx.qf.zip (576.2 MiB)
    • 04xxxx.qs.zip (303.8 MiB)
    • 04xxxx.sf.zip (1.5 GiB)
    • 04xxxx.ss.zip (460.1 MiB)
    • 05xxxx.qf.zip (351.2 MiB)
    • 05xxxx.qs.zip (159.8 MiB)
    • 05xxxx.sf.zip (1.4 GiB)
    • 05xxxx.ss.zip (431.1 MiB)
    • 06xxxx.qf.zip (479.9 MiB)
    • 06xxxx.qs.zip (142.3 MiB)
    • 06xxxx.sf.zip (1.4 GiB)
    • 06xxxx.ss.zip (498.0 MiB)
    • 07xxxx.qf.zip (342.6 MiB)
    • 07xxxx.qs.zip (170.5 MiB)
    • 07xxxx.sf.zip (1.6 GiB)
    • 07xxxx.ss.zip (451.8 MiB)
    • 08xxxx.qf.zip (453.6 MiB)
    • 08xxxx.qs.zip (179.3 MiB)
    • 08xxxx.sf.zip (1.4 GiB)
    • 08xxxx.ss.zip (492.4 MiB)
    • 09xxxx.qf.zip (174.0 MiB)
    • 09xxxx.qs.zip (411.3 MiB)
    • 09xxxx.sf.zip (476.8 MiB)
    • 09xxxx.ss.zip (1.1 GiB)
    • 10xxxx.qf.zip (56.1 MiB)
    • 10xxxx.qs.zip (422.1 MiB)
    • 10xxxx.sf.zip (200.9 MiB)
    • 10xxxx.ss.zip (1.0 GiB)
    • 11xxxx.qf.zip (48.9 MiB)
    • 11xxxx.qs.zip (340.8 MiB)
    • 11xxxx.sf.zip (193.2 MiB)
    • 11xxxx.ss.zip (1.2 GiB)
    • 12xxxx.qf.zip (32.1 MiB)
    • 12xxxx.qs.zip (311.8 MiB)
    • 12xxxx.sf.zip (219.5 MiB)
    • 12xxxx.ss.zip (1.2 GiB)
    • 13xxxx.qf.zip (42.6 MiB)
    • 13xxxx.qs.zip (375.2 MiB)
    • 13xxxx.sf.zip (164.4 MiB)
    • 13xxxx.ss.zip (1.0 GiB)
    • 14xxxx.qf.zip (38.3 MiB)
    • 14xxxx.qs.zip (361.8 MiB)
    • 14xxxx.sf.zip (172.6 MiB)
    • 14xxxx.ss.zip (1.1 GiB)
    • 15xxxx.qf.zip (40.2 MiB)
    • 15xxxx.qs.zip (251.6 MiB)
    • 15xxxx.sf.zip (193.4 MiB)
    • 15xxxx.ss.zip (912.0 MiB)
    • 16xxxx.qf.zip (38.1 MiB)
    • 16xxxx.qs.zip (302.7 MiB)
    • 16xxxx.sf.zip (171.6 MiB)
    • 16xxxx.ss.zip (930.7 MiB)
    • 17xxxx.qf.zip (43.6 MiB)
    • 17xxxx.qs.zip (244.4 MiB)
    • 17xxxx.sf.zip (189.0 MiB)
    • 17xxxx.ss.zip (929.5 MiB)
    • 18xxxx.qf.zip (42.7 MiB)
    • 18xxxx.qs.zip (234.9 MiB)
    • 18xxxx.sf.zip (183.7 MiB)
    • 18xxxx.ss.zip (880.2 MiB)
    • 19xxxx.qf.zip (28.5 MiB)
    • 19xxxx.qs.zip (325.8 MiB)
    • 19xxxx.sf.zip (159.3 MiB)
    • 19xxxx.ss.zip (885.0 MiB)
    • 20xxxx.qf.zip (32.0 MiB)
    • 20xxxx.qs.zip (262.6 MiB)
    • 20xxxx.sf.zip (139.5 MiB)
    • 20xxxx.ss.zip (946.4 MiB)
    • 21xxxx.qf.zip (29.0 MiB)
    • 21xxxx.qs.zip (206.1 MiB)
    • 21xxxx.sf.zip (133.2 MiB)
    • 21xxxx.ss.zip (813.9 MiB)
    • 22xxxx.qf.zip (22.6 MiB)
    • 22xxxx.qs.zip (179.7 MiB)
    • 22xxxx.sf.zip (131.2 MiB)
    • 22xxxx.ss.zip (760.1 MiB)
    • 23xxxx.qf.zip (29.4 MiB)
    • 23xxxx.qs.zip (198.5 MiB)
    • 23xxxx.sf.zip (109.7 MiB)
    • 23xxxx.ss.zip (751.2 MiB)
    • 24xxxx.qf.zip (23.7 MiB)
    • 24xxxx.qs.zip (201.1 MiB)
    • 24xxxx.sf.zip (171.7 MiB)
    • 24xxxx.ss.zip (990.5 MiB)
    • 25xxxx.qf.zip (26.3 MiB)
    • 25xxxx.qs.zip (249.8 MiB)
    • 25xxxx.sf.zip (130.4 MiB)
    • 25xxxx.ss.zip (880.3 MiB)
    • 26xxxx.qf.zip (44.3 MiB)
    • 26xxxx.qs.zip (285.0 MiB)
    • 26xxxx.sf.zip (120.9 MiB)
    • 26xxxx.ss.zip (768.0 MiB)
    • 27xxxx.qf.zip (48.0 MiB)
    • 27xxxx.qs.zip (292.7 MiB)
    • 27xxxx.sf.zip (131.3 MiB)
    • 27xxxx.ss.zip (907.9 MiB)
    • 28xxxx.qf.zip (47.7 MiB)
    • 28xxxx.qs.zip (368.2 MiB)
    • 28xxxx.sf.zip (129.7 MiB)
    • 28xxxx.ss.zip (966.9 MiB)
    • 29xxxx.qf.zip (49.2 MiB)
    • 29xxxx.qs.zip (381.6 MiB)
    • 29xxxx.sf.zip (108.3 MiB)
    • 29xxxx.ss.zip (1.0 GiB)
    • 30xxxx.qf.zip (38.6 MiB)
    • 30xxxx.qs.zip (384.0 MiB)
    • 30xxxx.sf.zip (90.4 MiB)
    • 30xxxx.ss.zip (1005.7 MiB)
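The zip names above follow the `NNxxxx.XY.zip` pattern from the description. A minimal sketch mapping a post to its archive, assuming NN is the post ID divided by 10,000 (zero-padded to two digits), X is the rating (s or q) and Y is f for original files or s for samples; `archive_name` is a hypothetical helper.

```python
GROUP_SIZE = 10_000  # posts per archive group


def archive_name(post_id: int, rating: str, is_sample: bool) -> str:
    """Map a post to its zip name, e.g. ID 120404, safe sample -> '12xxxx.ss.zip'."""
    if rating not in ("s", "q"):
        raise ValueError(f"rating {rating!r} is not included in this release")
    group = post_id // GROUP_SIZE
    kind = "s" if is_sample else "f"
    return f"{group:02d}xxxx.{rating}{kind}.zip"
```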